Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14608 )
Change subject: KUDU-2987 Intra location rebalance will crash in special case. ...................................................................... Patch Set 3: (23 comments) Overall looks good, just some nits. Thank you very much for fixing the bug! http://gerrit.cloudera.org:8080/#/c/14608/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14608/3//COMMIT_MSG@7 PS3, Line 7: will crash crashes http://gerrit.cloudera.org:8080/#/c/14608/3//COMMIT_MSG@8 PS3, Line 8: Maybe, describe what were the conditions which lead to the crash scenario. Something like the following, I guess: The crash manifested itself in cases where a Kudu cluster had a location that didn't host even a single replica of a tablet. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc File src/kudu/tools/rebalancer_tool-test.cc: http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@483 PS3, Line 483: // by the set of the specified locations. Document the newly introduced parameter 'table_name_pattern': it' not obvious from reading the parameter list and the doc what it's about. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@487 PS3, Line 487: const string& table_name_pattern = kTableNamePattern) { code style: input parameters should come before output parameters in the function/methods signatures. Move this new parameter before 'table_names'. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@523 PS3, Line 523: not elsewhere but not http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@524 PS3, Line 524: This is to build the imbalance inside : // the locations. Drop. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@526 PS3, Line 526: Tserver Tservers http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@531 PS3, Line 531: excluded_tserver_uuid nit: excluded_tserver_uuids http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@537 PS3, Line 537: excluded_tservers.count(ts->location) Is it possible to use ContainsKey() here instead? http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@545 PS3, Line 545: // Check the validation of excluded_tserver_by_location. Drop: this sentence isn't adding any clarity and from the code it's obvious what's going on. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@547 PS3, Line 547: elem.second, 0 nit: the expected value comes first (i.e. swap these two) http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1716 PS3, Line 1716: /*rep_factor=*/ 3, : /*num_tservers=*/ 11) { : } nit: fix the alignment of these two lines http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1725 PS3, Line 1725: Rebalance_KUDU_2987 nit: I think it's better to name this scenario to reflect the essence of what it does, but add the information that this is KUDU-2987 regression test into the comment for this TEST_F() scope. I guess the name for this scenario might be, e.g., LocationsWithEmptyTabletServers http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1733 PS3, Line 1733: { "/D", 2} }; nit: add space http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1746 PS3, Line 1746: // Create special intra location imbalance in location "/D". I think it's important to note that there are 3 tablets in each table, so total it's 9 tablet replicas per table. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1747 PS3, Line 1747: replica tablet replicas http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1749 PS3, Line 1749: table0 in table0 with tablet replicas in locations /A, /B, and /C. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1750 PS3, Line 1750: doesn't over 1 is not greater than 1 http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1750 PS3, Line 1750: the : // table0 won't be moved no replica of table0 will be moved http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1764 PS3, Line 1764: // The run of the location-aware rebalancing tool should report the cluster : // as balanced. Drop since this is a bit confusing -- the rebalancer actually moves one replica. I thinks it's worth adding a sentence explaining that prior to the fix added in this changelist the rebalancer would simply crash while running. http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1774 PS3, Line 1774: rebalance rebalancing http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1774 PS3, Line 1774: supposed move should move http://gerrit.cloudera.org:8080/#/c/14608/3/src/kudu/tools/rebalancer_tool-test.cc@1777 PS3, Line 1777: // All the violations of the placement policy should be : // corrected. There were no placement policy violations ever prior to running the rebalancer tool, right? -- To view, visit http://gerrit.cloudera.org:8080/14608 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea39472e55178fca688b390249432a0f7fefaaba Gerrit-Change-Number: 14608 Gerrit-PatchSet: 3 Gerrit-Owner: ZhangYao <triplesheep0...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: ZhangYao <triplesheep0...@gmail.com> Gerrit-Comment-Date: Thu, 07 Nov 2019 19:22:30 +0000 Gerrit-HasComments: Yes