Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14111 )

Change subject: KUDU-2069 pt 1: add a maintenance mode
......................................................................


Patch Set 3:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG@10
PS2, Line 10: failures of T will not be considered when determining
            : whether a given tablet is under-replicated
> I'm curious whether it mandates to have some sort of special response error
I don't explicitly mention this in the design doc, but going by what I 
described, I don't think so. What would the error code do? Lead the client to 
retry?

Or maybe the master should create the table anyway, but with 2 replicas instead 
of 3, and the master will eventually trigger re-replication back up to 3 once 
out of maintenance mode. That doesn't seem too unreasonable, but I think that 
functionality could be built in later.


http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG@23
PS2, Line 23: whitelist
> nit: it sounds like rather a "blackgrey/list" or "the usual suspects".  Jus
I was struggling to come to a term that aptly described the behavior and landed 
on whitelist:
"a list of people or things considered to be acceptable or trustworthy."

The thought here being, even if these servers are failed, their failed status 
is acceptable. Note that the whitelisting is _only_ for counting towards 
under-replication.

The prevention of placement on maintenance mode servers is more 
"blacklist"-like.


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/integration-tests/maintenance_mode-itest.cc@114
PS2, Line 114:   // Return the number of failed replicas there are in the 
cluster, according
> warning: the const qualified parameter 'ts_map' is copied for each invocati
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc@5018
PS2, Line 5018:   MaintenanceState state;
> warning: the variable 'tserver_uuid' is copy-constructed from a const refer
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc@5039
PS2, Line 5039:   master_->ts_manager()->SetMaintenanceState(tserver_uuid, 
state);
> warning: the parameter 'user' is copied for each invocation but only used a
Ack


http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc
File src/kudu/master/maintenance_state-test.cc:

http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@26
PS1, Line 26: #include "kudu/common/common.pb.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@116
PS1, Line 116:   // report type, populating 'resp' with the response if 
non-null.
> warning: the const qualified parameter 'consensus_health_map' is copied for
Done


http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@113
PS1, Line 113:   }
             :
             :   // Sends a heartbeat to the master from the given tserver with 
the given
             :   // report type, populating 'resp' with the response if 
non-null.
             :   Status SendHeartbeat(const string& tserver,
             :
> Will remove.
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc
File src/kudu/master/maintenance_state-test.cc:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@55
PS2, Line 55: using strings::Substitute;
> warning: using decl 'unordered_map' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@84
PS2, Line 84:
> affiliated
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@100
PS2, Line 100:                                         
mini_master_->bound_rpc_addr().host()));
             :   }
             :
             :   /
> nit: move it up to be just after SetUp() -- that would look a bit better fr
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@170
PS2, Line 170: tateTest, TestReloadMaintenanceState)
> looks like an incomplete sentence
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master.proto@816
PS2, Line 816: optional MaintenanceStatePB state
> Maybe, it makes sense to allow setting the maintenance state in batches?
I don't think that is necessary right now, but I'll write it so we can in the 
future if we want.


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master_service.cc@282
PS2, Line 282:     // If we previously needed a full tablet report for the 
tserver (e.g.
             :     // because we need to recheck replica states after exiting 
from maintenance
             :     /
> Could you document the reasoning behind this code?
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h
File src/kudu/master/ts_descriptor.h:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@113
PS2, Line 113: std::string &permanent_uuid() cons
> Why boost::optional?  Would simple enum suffice?
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@207
PS2, Line 207: tenance
> nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@207
PS2, Line 207:  serv
> needs to send ?
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h
File src/kudu/master/ts_manager.h:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h@90
PS2, Line 90:
> them ?
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h@132
PS2, Line 132:
> nit: could you follow the same naming notation that was used for 'servers_b
Done


http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/ts_manager.cc
File src/kudu/master/ts_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/ts_manager.cc@182
PS1, Line 182:                           new_tserver ? "Registered new" : 
"Re-registered known",
             :                           descriptor->ToString());
             :   *desc = std::move(descriptor);
> move up
Done


http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.cc
File src/kudu/master/ts_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.cc@68
PS2, Line 68: const char* MaintenanceStateAsString(MaintenanceState state) {
            :   switch (state) {
            :     case kNone:
            :       return "NONE";
            :     case kMaintenanceMode:
            :       return "MAINTENANCE MODE";
            :   }
            :   LOG(DFATAL) << Substitute("UNKNOWN STATE ($0)", state);
            :   return "";
            : }
> nit: I think this method can return const reference to strings, returning s
Done, returning const char*



--
To view, visit http://gerrit.cloudera.org:8080/14111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
Gerrit-Change-Number: 14111
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 23 Aug 2019 19:45:51 +0000
Gerrit-HasComments: Yes

Reply via email to