Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 20: Code-Review+1

(13 comments)

A couple of nits and a couple of questions but overall I think it looks good. 
Great job making it easier to understand!

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@15
PS20, Line 15: result
nit: extra word


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@18
PS20, Line 18: the
nit: a


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@21
PS20, Line 21:
We should also cover the limitations (if any) this tool has working against 
older versions.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc@548
PS20, Line 548: resp.tablet_locations_size() == 0
This is funny. How did this come up? Looking at the master code, we return a 
NotFound error in the RPC if the tablet is unknown. The only other ways I see 
for no replicas to be returned is for the tablet to have no replicas assigned, 
which would be odd, or for a VOTER_REPLICA filter to be applied with all 
replicas non-voters, which would also be odd.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1353
PS20, Line 1353: ASSERT_FALSE(s.ok());
nit: ASSERT_TRUE on the sort of status it is. Here It looks like that is 
IllegalState.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1383
PS20, Line 1383: constexpr auto kTserverUnresponsiveMs = 5000;
nit: Can we lower this and speed up the test? Does it make things flaky if it's 
lowered?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1403
PS20, Line 1403: 3
nit: (kRepFactor + 1)


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1427
PS20, Line 1427: if (kRepFactor > 1)
Ah, makes sense to me that writes can't replicate during replica movement when 
RF = 1 under 3-2-3. However, shouldn't it work during 3-4-3 replica movement?

nit: Also, could you update the comments in this test to say that a workload 
isn't run during 3-2-3 replica movement with RF = 1 because the tablet is 
unavailable during the move?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h
File src/kudu/tools/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h@78
PS20, Line 78: size_t max_state_syncs_in_a_row
This parameter seems like a nice safety valve but also hard to use. How long 
would it take for 1000 max state syncs to happen? Do we need an idle timeout 
instead? So, the semantics would be "if you haven't been able to do a move for 
X seconds, give up as being unable to make progress".


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@243
PS20, Line 243: resync_state = false;
The rsync happens in GetNextMoves but that's not obvious from reading this 
function. Could you note somewhere that we resync in GetNextMoves?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@539
PS20, Line 539:
nit: are


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@545
PS20, Line 545: // UUIDs of tablets of the selected table at the source tserver 
which replicas
              :   // are non-leaders.
Howabout "Tablet ids of replicas on the source tserver that are non-leaders."


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@548
PS20, Line 548: // UUIDs of tablets of the selected table at the source tserver 
which replicas
              :   // are leaders.
"Tablet ids of replicas on the source tserver that are leaders."



--
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 18:45:57 +0000
Gerrit-HasComments: Yes

Reply via email to