Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14217 )
Change subject: KUDU-2069 p1: add persistent tserver maintenance mode ...................................................................... Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@493 PS4, Line 493: ts_manager_->ResetAllTServerStates(); Seems unintuitive to bury this rather important call in the loader's constructor. Wouldn't it be more intuitive if it were done in VisitTServerStatesUnlocked, before the visiting starts? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@502 PS4, Line 502: auto ts_lock = ts_manager_->GetOrCreateTServerStateLock(tserver_id); Might be nice to use a "create only" variant here, as the master tablet schema guarantees that we'll only ever see a particular tserver ID once. http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@4999 PS4, Line 4999: RETURN_NOT_OK_PREPEND(sys_catalog_->RemoveTServerState(tserver_uuid), Why not also remove the entry from the TSManager map? That way the on-disk and in-memory invariants match: if a tserver has no in-progress state, there's neither an on-disk nor an in-memory object tracking it. http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto@229 PS4, Line 229: message SysTServerStateEntryPB { I thought it might be nice to add a timestamp here, but I think you wanted to wait until this patch series landed first? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto@809 PS4, Line 809: // TODO(KUDU-1827): add state for decommissioning. Do you expect decommissioning (or any future state) to be additive? Or can a tserver only be in one such state at a time? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@471 PS4, Line 471: // Schema for the unified SysCatalogTable: : // : // (entry_type, entry_id) -> metadata : // : // entry_type is a enum defined in sys_tables. It indicates : // whether an entry is a table or a tablet. : // : // entry_type is the first part of a compound key as to allow : // efficient scans of entries of only a single type (e.g., only : // scan all of the tables, or only scan all of the tablets). : // : // entry_id is either a table id or a tablet id. For tablet entries, : // the table id that the tablet is associated with is stored in the : // protobuf itself. This is pretty important to know, and may deserve to be moved (and expanded) in sys_catalog.h. http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@775 PS4, Line 775: std:: Remove prefix. http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@790 PS4, Line 790: enc.Add(RowOperationsPB::UPSERT, row); Why UPSERT and not INSERT? I can take my answer in the form of a code comment. http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_manager.cc File src/kudu/master/ts_manager.cc: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_manager.cc@216 PS4, Line 216: LOG(INFO) << "Reseting tserver states"; Do you still want this? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc File src/kudu/master/ts_state-test.cc: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@92 PS4, Line 92: mini_master_.reset(new MiniMaster(GetTestPath("master"), : HostPort("127.0.0.1", 0))); Note that this pattern leads to flaky tests when combined with restarting (see KUDU-2431 for details). http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@96 PS4, Line 96: ASSERT_OK(master_->WaitUntilCatalogManagerIsLeaderAndReadyForTests( : MonoDelta::FromSeconds(5))); No reason not to be more generous, right? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@111 PS4, Line 111: // Sends a heartbeat to the master from the given tserver. : Status SendHeartbeat(const string& tserver) { Could you reword this a bit so the syntheticness is more clear (i.e. that the test is pretending to be a tserver)? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@134 PS4, Line 134: Master* master_; This isn't used by any of the tests themselves; maybe remove it and tolerate the extra indirection in some fixture calls? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h File src/kudu/master/ts_state.h: http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@27 PS4, Line 27: enum TServerState { : kNone, : kMaintenanceMode, : }; What value does this abstraction add? Why not work with the TServerStatePB enum directly? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@57 PS4, Line 57: class TServerStateLock : public RefCountedThreadSafe<TServerStateLock> { Why does the class need to support shared ownership? Why via scoped_refptr and not std::shared_ptr? http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@81 PS4, Line 81: ~TServerStateLock() {} Should this assert that l_ is unlocked? The underlying RWMutex destructor will sort of do that in that the call to pthread_rwlock_destroy will return a non-zero value, but it's not always super obvious what that means. -- To view, visit http://gerrit.cloudera.org:8080/14217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib669b43b3cee171c4c7dbd54041e29c30cb9f767 Gerrit-Change-Number: 14217 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Greg Solovyev <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 12 Sep 2019 23:33:50 +0000 Gerrit-HasComments: Yes
