Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17267 )
Change subject: [master][tool] KUDU-2181 Tool to orchestrate adding a master
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc
File src/kudu/tools/tool_action_master.cc:

http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@172
PS2, Line 172:   MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(FLAGS_wait_secs);
             :   do {
             :     Status wait_status = new_master->WaitNoBlock();
             :     if (!wait_status.IsTimedOut()) {
             :       return Status::RuntimeError("Failed to bring up new master");
             :     }
             :     if (is_catalog_mngr_running(proxy.get())) {
             :       *new_master_out = std::move(new_master);
             :       return Status::OK();
             :     }
             :     SleepFor(MonoDelta::FromMilliseconds(100));
             :   } while (MonoTime::Now() < deadline);
             :
             :   return Status::TimedOut("Timed out waiting for the new master to come up");
Here and in general, if there's an error, should we kill the subprocess? Using a cancelable ScopedCleanup might be useful (see util/scoped_cleanup.h).


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@191
PS2, Line 191:   auto it = flags_map.find(flag_name);
             :   if (it == flags_map.end()) {
             :     return Status::NotFound(Substitute("Flag $0 not supplied", flag_name));
             :   }
nit: would FindOrNull() work here?


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@349
PS2, Line 349:   // Get the flags that'll be needed to bring up new master and for system catalog copy.
             :   GFlagsMap flags_map = GetFlagsMap();
Thinking about the typical CM-run Kudu deployment, there could be a few other important flags that get set that we may care about, like the block_manager type, security configs, etc. Did you explore passing the arguments as a space-separated, quote-wrapped string? Such a string can easily be found in CM's Kudu-running scripts (and I imagine it wouldn't be tough to do for other orchestration tools). GFlags are last-flag-wins, so it should be easy to override some of them if needed.

I suppose that's somewhat annoying to generate in tests, but I also suppose we could build such a string using this GetFlagsMap() call as well.


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@363
PS2, Line 363: [[maybe_unused]]
Hm, what might be unused here?


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@363
PS2, Line 363:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@376
PS2, Line 376: kudu master
nit: "the Kudu Master logs" for consistency


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@403
PS2, Line 403:   WARN_NOT_OK(new_master->KillAndWait(SIGTERM), "Failed stopping master");
Here and below, why are we only warning instead of returning an error? Sure, it's not catastrophic, but it _will_ prevent a further startup of the Kudu master, won't it?


http://gerrit.cloudera.org:8080/#/c/17267/2/src/kudu/tools/tool_action_master.cc@427
PS2, Line 427: kudu master lo
nit: same here



--
To view, visit http://gerrit.cloudera.org:8080/17267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f99cf2b3f1738c4127c7e7288beab61daf42e7b
Gerrit-Change-Number: 17267
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 05 Apr 2021 18:26:03 +0000
Gerrit-HasComments: Yes
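Illustration for the Line 172 comment: the "cancelable cleanup" idea can be sketched roughly as below. This is a self-contained stand-in, not the class from util/scoped_cleanup.h and not the patch's code; CancelableCleanup, MakeCancelableCleanup, FakeProcess, and StartAndWait are hypothetical names made up for the example. The guard kills the process on every early-return path and is cancelled only once startup has succeeded.

```cpp
#include <utility>

// Hypothetical stand-in for a cancelable scope guard: runs the callback
// on destruction unless cancel() was called first.
template <typename F>
class CancelableCleanup {
 public:
  explicit CancelableCleanup(F f) : f_(std::move(f)) {}
  // Moving transfers responsibility for the cleanup to the new guard.
  CancelableCleanup(CancelableCleanup&& other)
      : f_(std::move(other.f_)), cancelled_(other.cancelled_) {
    other.cancelled_ = true;
  }
  CancelableCleanup(const CancelableCleanup&) = delete;
  CancelableCleanup& operator=(const CancelableCleanup&) = delete;
  ~CancelableCleanup() {
    if (!cancelled_) f_();
  }
  void cancel() { cancelled_ = true; }

 private:
  F f_;
  bool cancelled_ = false;
};

template <typename F>
CancelableCleanup<F> MakeCancelableCleanup(F f) {
  return CancelableCleanup<F>(std::move(f));
}

// Hypothetical stand-in for the new master subprocess.
struct FakeProcess {
  bool killed = false;
  void Kill() { killed = true; }
};

// Returns true if the process "came up". On any failure path the guard
// kills the process; on success the guard is cancelled and the process
// is left running for the caller.
bool StartAndWait(FakeProcess* proc, bool comes_up) {
  auto kill_on_error = MakeCancelableCleanup([proc] { proc->Kill(); });
  if (!comes_up) {
    return false;  // guard fires here and kills the process
  }
  kill_on_error.cancel();  // success: keep the process alive
  return true;
}
```

With this shape, adding another early return inside the wait loop does not require remembering to stop the subprocess at that spot.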
