Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14111 )
Change subject: KUDU-2069 pt 1: add a maintenance mode ...................................................................... Patch Set 3: (21 comments) http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG@10 PS2, Line 10: failures of T will not be considered when determining : whether a given tablet is under-replicated > I'm curious whether it mandates to have some sort of special response error I don't explicitly mention this in the design doc, but going by what I described, I don't think so. What would the error code do? Lead the client to retry? Or maybe the master should create the table anyway, but with 2 replicas instead of 3, and the master will eventually trigger re-replication back up to 3 once out of maintenance mode. That doesn't seem too unreasonable, but I think that functionality could be built in later. http://gerrit.cloudera.org:8080/#/c/14111/2//COMMIT_MSG@23 PS2, Line 23: whitelist > nit: it sounds like rather a "blackgrey/list" or "the usual suspects". Jus I was struggling to come to a term that aptly described the behavior and landed on whitelist: "a list of people or things considered to be acceptable or trustworthy." The thought here being, even if these servers are failed, their failed status is acceptable. Note that the whitelisting is _only_ for counting towards under-replication. The prevention of placement on maintenance mode servers is more "blacklist"-like. http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/integration-tests/maintenance_mode-itest.cc File src/kudu/integration-tests/maintenance_mode-itest.cc: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/integration-tests/maintenance_mode-itest.cc@114 PS2, Line 114: // Return the number of failed replicas there are in the cluster, according > warning: the const qualified parameter 'ts_map' is copied for each invocati Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc@5018 PS2, Line 5018: MaintenanceState state; > warning: the variable 'tserver_uuid' is copy-constructed from a const refer Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/catalog_manager.cc@5039 PS2, Line 5039: master_->ts_manager()->SetMaintenanceState(tserver_uuid, state); > warning: the parameter 'user' is copied for each invocation but only used a Ack http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc File src/kudu/master/maintenance_state-test.cc: http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@26 PS1, Line 26: #include "kudu/common/common.pb.h" > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@116 PS1, Line 116: // report type, populating 'resp' with the response if non-null. > warning: the const qualified parameter 'consensus_health_map' is copied for Done http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/maintenance_state-test.cc@113 PS1, Line 113: } : : // Sends a heartbeat to the master from the given tserver with the given : // report type, populating 'resp' with the response if non-null. : Status SendHeartbeat(const string& tserver, : > Will remove. Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc File src/kudu/master/maintenance_state-test.cc: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@55 PS2, Line 55: using strings::Substitute; > warning: using decl 'unordered_map' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@84 PS2, Line 84: > affiliated Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@100 PS2, Line 100: mini_master_->bound_rpc_addr().host())); : } : : / > nit: move it up to be just after SetUp() -- that would look a bit better fr Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/maintenance_state-test.cc@170 PS2, Line 170: tateTest, TestReloadMaintenanceState) > looks like an incomplete sentence Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master.proto@816 PS2, Line 816: optional MaintenanceStatePB state > Maybe, it makes sense to allow setting the maintenance state in batches? I don't think that is necessary right now, but I'll write it so we can in the future if we want. http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/master_service.cc@282 PS2, Line 282: // If we previously needed a full tablet report for the tserver (e.g. : // because we need to recheck replica states after exiting from maintenance : / > Could you document the reasoning behind this code? Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h File src/kudu/master/ts_descriptor.h: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@113 PS2, Line 113: std::string &permanent_uuid() cons > Why boost::optional? Would simple enum suffice? Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@207 PS2, Line 207: tenance > nit: drop Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_descriptor.h@207 PS2, Line 207: serv > needs to send ? Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h File src/kudu/master/ts_manager.h: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h@90 PS2, Line 90: > them ? Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.h@132 PS2, Line 132: > nit: could you follow the same naming notation that was used for 'servers_b Done http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/ts_manager.cc File src/kudu/master/ts_manager.cc: http://gerrit.cloudera.org:8080/#/c/14111/1/src/kudu/master/ts_manager.cc@182 PS1, Line 182: new_tserver ? "Registered new" : "Re-registered known", : descriptor->ToString()); : *desc = std::move(descriptor); > move up Done http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.cc File src/kudu/master/ts_manager.cc: http://gerrit.cloudera.org:8080/#/c/14111/2/src/kudu/master/ts_manager.cc@68 PS2, Line 68: const char* MaintenanceStateAsString(MaintenanceState state) { : switch (state) { : case kNone: : return "NONE"; : case kMaintenanceMode: : return "MAINTENANCE MODE"; : } : LOG(DFATAL) << Substitute("UNKNOWN STATE ($0)", state); : return ""; : } > nit: I think this method can return const reference to strings, returning s Done, returning const char* -- To view, visit http://gerrit.cloudera.org:8080/14111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b Gerrit-Change-Number: 14111 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 23 Aug 2019 19:45:51 +0000 Gerrit-HasComments: Yes