Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14217 )

Change subject: KUDU-2069 p1: add persistent tserver maintenance mode
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@493
PS4, Line 493:     ts_manager_->ResetAllTServerStates();
Seems unintuitive to bury this rather important call in the loader's 
constructor. Wouldn't it be more intuitive if it were done in 
VisitTServerStatesUnlocked, before the visiting starts?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@502
PS4, Line 502:     auto ts_lock = 
ts_manager_->GetOrCreateTServerStateLock(tserver_id);
Might be nice to use a "create only" variant here, as the master tablet schema 
guarantees that we'll only ever see a particular tserver ID once.


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/catalog_manager.cc@4999
PS4, Line 4999:     
RETURN_NOT_OK_PREPEND(sys_catalog_->RemoveTServerState(tserver_uuid),
Why not also remove the entry from the TSManager map? That way the on-disk and 
in-memory invariants match: if a tserver has no in-progress state, there's 
neither an on-disk nor an in-memory object tracking it.


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto@229
PS4, Line 229: message SysTServerStateEntryPB {
I thought it might be nice to add a timestamp here, but I think you wanted to 
wait until this patch series landed first?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/master.proto@809
PS4, Line 809: // TODO(KUDU-1827): add state for decommissioning.
Do you expect decommissioning (or any future state) to be additive? Or can a 
tserver only be in one such state at a time?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@471
PS4, Line 471: // Schema for the unified SysCatalogTable:
             : //
             : // (entry_type, entry_id) -> metadata
             : //
             : // entry_type is a enum defined in sys_tables. It indicates
             : // whether an entry is a table or a tablet.
             : //
             : // entry_type is the first part of a compound key as to allow
             : // efficient scans of entries of only a single type (e.g., only
             : // scan all of the tables, or only scan all of the tablets).
             : //
             : // entry_id is either a table id or a tablet id. For tablet 
entries,
             : // the table id that the tablet is associated with is stored in 
the
             : // protobuf itself.
This is pretty important to know, and may deserve to be moved (and expanded) in 
sys_catalog.h.


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@775
PS4, Line 775: std::
Remove prefix.


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/sys_catalog.cc@790
PS4, Line 790:   enc.Add(RowOperationsPB::UPSERT, row);
Why UPSERT and not INSERT? I can take my answer in the form of a code comment.


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_manager.cc
File src/kudu/master/ts_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_manager.cc@216
PS4, Line 216:   LOG(INFO) << "Reseting tserver states";
Do you still want this?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc
File src/kudu/master/ts_state-test.cc:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@92
PS4, Line 92:     mini_master_.reset(new MiniMaster(GetTestPath("master"),
            :                                       HostPort("127.0.0.1", 0)));
Note that this pattern leads to flaky tests when combined with restarting (see 
KUDU-2431 for details).


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@96
PS4, Line 96:     
ASSERT_OK(master_->WaitUntilCatalogManagerIsLeaderAndReadyForTests(
            :         MonoDelta::FromSeconds(5)));
No reason not to be more generous, right?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@111
PS4, Line 111:   // Sends a heartbeat to the master from the given tserver.
             :   Status SendHeartbeat(const string& tserver) {
Could you reword this a bit so the syntheticness is more clear (i.e. that the 
test is pretending to be a tserver)?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state-test.cc@134
PS4, Line 134:   Master* master_;
This isn't used by any of the tests themselves; maybe remove it and tolerate 
the extra indirection in some fixture calls?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h
File src/kudu/master/ts_state.h:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@27
PS4, Line 27: enum TServerState {
            :   kNone,
            :   kMaintenanceMode,
            : };
What value does this abstraction add? Why not work with the TServerStatePB enum 
directly?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@57
PS4, Line 57: class TServerStateLock : public 
RefCountedThreadSafe<TServerStateLock> {
Why does the class need to support shared ownership? Why via scoped_refptr and 
not std::shared_ptr?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@81
PS4, Line 81:   ~TServerStateLock() {}
Should this assert that l_ is unlocked? The underlying RWMutex destructor will 
sort of do that in that the call to pthread_rwlock_destroy will return a 
non-zero value, but it's not always super obvious what that means.



--
To view, visit http://gerrit.cloudera.org:8080/14217
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib669b43b3cee171c4c7dbd54041e29c30cb9f767
Gerrit-Change-Number: 14217
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Greg Solovyev <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 12 Sep 2019 23:33:50 +0000
Gerrit-HasComments: Yes

Reply via email to