Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )
Change subject: [tools] rebalancer in the kudu CLI tool ...................................................................... Patch Set 20: Code-Review+1 (13 comments) A couple of nits and a couple of questions but overall I think it looks good. Great job making it easier to understand! http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@15 PS20, Line 15: result nit: extra word http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@18 PS20, Line 18: the nit: a http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@21 PS20, Line 21: We should also cover the limitations (if any) this tool has working against older versions. http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc File src/kudu/client/client.cc: http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc@548 PS20, Line 548: resp.tablet_locations_size() == 0 This is funny. How did this come up? Looking at the master code, we return a NotFound error in the RPC if the tablet is unknown. The only other ways I see for no replicas to be returned is for the tablet to have no replicas assigned, which would be odd, or for a VOTER_REPLICA filter to be applied with all replicas non-voters, which would also be odd. http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1353 PS20, Line 1353: ASSERT_FALSE(s.ok()); nit: ASSERT_TRUE on the sort of status it is. Here It looks like that is IllegalState. http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1383 PS20, Line 1383: constexpr auto kTserverUnresponsiveMs = 5000; nit: Can we lower this and speed up the test? Does it make things flaky if it's lowered? http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1403 PS20, Line 1403: 3 nit: (kRepFactor + 1) http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1427 PS20, Line 1427: if (kRepFactor > 1) Ah, makes sense to me that writes can't replicate during replica movement when RF = 1 under 3-2-3. However, shouldn't it work during 3-4-3 replica movement? nit: Also, could you update the comments in this test to say that a workload isn't run during 3-2-3 replica movement with RF = 1 because the tablet is unavailable during the move? http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h File src/kudu/tools/rebalancer.h: http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h@78 PS20, Line 78: size_t max_state_syncs_in_a_row This parameter seems like a nice safety valve but also hard to use. How long would it take for 1000 max state syncs to happen? Do we need an idle timeout instead? So, the semantics would be "if you haven't been able to do a move for X seconds, give up as being unable to make progress". http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc File src/kudu/tools/rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@243 PS20, Line 243: resync_state = false; The rsync happens in GetNextMoves but that's not obvious from reading this function. Could you note somewhere that we resync in GetNextMoves? http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@539 PS20, Line 539: nit: are http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@545 PS20, Line 545: // UUIDs of tablets of the selected table at the source tserver which replicas : // are non-leaders. Howabout "Tablet ids of replicas on the source tserver that are non-leaders." http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@548 PS20, Line 548: // UUIDs of tablets of the selected table at the source tserver which replicas : // are leaders. "Tablet ids of replicas on the source tserver that are leaders." -- To view, visit http://gerrit.cloudera.org:8080/10399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238 Gerrit-Change-Number: 10399 Gerrit-PatchSet: 20 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Tue, 05 Jun 2018 18:45:57 +0000 Gerrit-HasComments: Yes