Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13632 )
Change subject: KUDU-2823 Place tablet replicas based on deminsion ...................................................................... Patch Set 11: (33 comments) Thanks for your comments. I fixed some of the issues mentioned in the comments. But there are still some comments, I want to confirm it again, so I didn't update it in this patch. :) http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@7 PS10, Line 7: deminsion > dimension Done http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@9 PS10, Line 9: the newly : created tablet to be evenly distributed on each tablet server > maybe, rephrase for clarity: Done http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@25 PS10, Line 25: deminsion > dimension Done http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@27 PS10, Line 27: deminsion > dimension Done http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@30 PS10, Line 30: deminsion > dimension Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h File src/kudu/client/client.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@882 PS10, Line 882: the tablet > Which tablet? Or you meant 'table'? Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@1210 PS10, Line 1210: to be created. D > to be created ? Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc File src/kudu/integration-tests/create-table-itest.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@228 PS10, Line 228: vector<string> master_fl > maybe, drop this since it's empty anyways. Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@229 PS10, Line 229: "--master_place_tablet_replicas_based_on_dimension=true", : "--tserver_last_replica_creations_halflife_ms=10", : }; > nit: the shorter form might be Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@233 PS10, Line 233: {}, mast > nit: {} Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@321 PS10, Line 321: kN > kNumServers ? Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@708 PS10, Line 708: > why boost::optional<string>* ? It seems string* as a type for the > output parameter is good enough here, no? I'm very sorry, this is my mistake. This code should be removed. We should use the dimension label of the tablet instead of the range. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3733 PS10, Line 3733: > dimension ? Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3748 PS10, Line 3748: } : DCHECK_GE(replication_factor, 1); : const auto num_tservers_needed = : > I think that in case of failure to get partition's debug string > it's safe enough to just log a warning and proceed with empty/none > dimension label. > > The idea is that this method is used for re-replication as well, > and it's better to place a replica maybe a bit off from the > dimension-dictated destination, but still preserve availability of > the tablet's data. Thank you for your explanation. I think after drop GetRangePartitionDebugString function, we won't need to modify the code here. :) http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688 PS10, Line 4688: > is enabled Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688 PS10, Line 4688: > dimension Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto@321 PS10, Line 321: in re > drop Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc File src/kudu/master/placement_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@904 PS10, Line 904: { "ts1", 1000, { { "labelA", 1000 }, } }, : { "ts2", 1000, { { "labelA", 1000 }, > It would be nice to add a few scenarios where: > > * tablets for multiple dimensions label are being added (like 5 > different dimensions) for the same table > * the initial distribution of tablet replicas has a few 'deeps' and > 'peaks' around some average level: e.g., 10, 20, 30 > * the replicas are placed for tablets with replication factor of 3 > (i.e. the first argument for PlaceTabletReplicas() is 3, not only > 1). > * verify how PlaceExtraTabletReplica() works as well > * there scenarios should cover the cases where the replication > factor is a) equal to the number of tablet servers b) replication > factor + 1 (e.g., for RF=3 that's 4 tablet servers) c) two times of > the replication factor (e.g. 6 tablet servers or more for RF=3). > > In the end, it's necessary to validate the overall distribution of > the replicas cluster-wise and label-wise. Ok, I will add tests for these scenarios on the next update. I have a question, how to validate the overall distribution of the replicas cluster-wise? validate cluster-wise is skewed? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@910 PS10, Line 910: fun > nit: func Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@930 PS10, Line 930: boost::none > It would be nice to have a coverage for the scenario when > label-specific replicas are added in the case of huge imbalance of > total number of replicas. This is to verify that there is skew when placing a newly created tablet without dimension label. If this is not necessary, I think I can remove these code. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@933 PS10, Line 933: ++placement_stats[ts_uuid]; > drop this: it does not provide much of the semantically meaningful coverage Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@941 PS10, Line 941: > Did you run it many times (like thousands) to be sure that's a good > enough threshold to avoid flakiness in the future runs? I ran this case 1000 times, the smallest stddev is 19.1. So I changed this to 15.0, I think it will be safer. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: ; > For clarity, it's better to use some other label which is not easily confla Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: info) > with dimension label ? Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: SSERT_O > Place Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@957 PS10, Line 957: > I'm not sure this assert has some semantically meaningful coverage: sure, t Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@964 PS10, Line 964: > Is it crucial to output from the test? If that was just for > debugging purposes, consider removing this once the test is ready. Ok, I will remove it. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc File src/kudu/master/placement_policy.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@56 PS10, Line 56: // TODO (oclarms): get the number of times this tablet server has recently been : // selected to create a tablet replica by dimension. > Is this about somehow amplifying the numbers already accounted for > by RecentReplicaCreations()? I'm curious why that is needed. Does > it mean the current way of accounting is not good enough? I think so, when we add tablets of different dimensions at the same time, this may result in uneven distribution of tablets in some dimension. However, I don't think this will happen often, so I plan to improve it in the next patch. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@58 PS10, Line 58: return desc->RecentReplicaCreations() + desc->num_live_replicas(std::move(dimension)); > What if measuring the load of two tablet servers, where nothing has > been created so far (so both RecentReplicaCreations() and > num_live_replicas(label) are 0), but first has 10, but second 20 > tablet replicas total? I think the former one should be considered > less loaded than the second. Or you want those dimension to be > completely independent from the overall load? > > Maybe, add num_live_replicas() / 10000 or something like that into > the sum? Ideally, that should be something that expresses the > preference of using dimension-only count against weighted 'all > other counts'. I'm not sure what's the best way to express it > here, though. Probably, using 1/10000 is good enough initially > given it's not expected to host that many replicas per tablet > server :) I think these dimensions are completely independent of the overall load to be a good choice. Maybe using the rebalance tool to solve the problem of uneven overall load will be better to me. :) http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@73 PS10, Line 73: if > nit: If --> if Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h File src/kudu/master/ts_descriptor.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h@136 PS10, Line 136: total > total number Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h@204 PS10, Line 204: Get th > nit: it's not the result value of this method per se, so maybe replace 'Ret Done http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc@1248 PS10, Line 1248: DCHEC > nit: it's safe to use DCHECK here since in release build it would crash wit Done -- To view, visit http://gerrit.cloudera.org:8080/13632 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a225e221eb42ef2f5489687e80a151d8dc1a42 Gerrit-Change-Number: 13632 Gerrit-PatchSet: 11 Gerrit-Owner: Yao Xu <ocla...@gmail.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Yao Xu <ocla...@gmail.com> Gerrit-Comment-Date: Tue, 09 Jul 2019 03:01:41 +0000 Gerrit-HasComments: Yes