David Ribeiro Alves has posted comments on this change.

Change subject: [catalog manager] fixed deadlock on catalog shutdown
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/6134/4//COMMIT_MSG
Commit Message:

PS4, Line 7: [catalog manager] fixed deadlock on catalog shutdown
Not so important now, but when I meant trace I meant the stack trace showing 
deadlock


PS4, Line 10:  
remove extra space


PS4, Line 13: system
the system


PS4, Line 22: like
missing colon after like


PS4, Line 35: spitting messages like
missing colon after like


http://gerrit.cloudera.org:8080/#/c/6134/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS4, Line 722: scoped_refptr<Consensus> consensus =
             :       sys_catalog_->tablet_peer()->shared_consensus();
             :   if (!consensus || !consensus->IsRunning()) {
             :     return Status::IllegalState("no active consensus");
             :   }
> The WaitUntilCaughtUpAsLeader method is called only by the 'elected-as-a-le
so how can consensus be null here?


PS4, Line 807:  This could be just a raw pointer since the tablet peer shutdown 
happens
             :   // after this task finishes.
so why isn't it? :)


PS4, Line 811: consensus->IsRunning()
do you care that it is running? can it be "not running"? 
(consensus->ConsensusState doesn't depend on whether its running)


PS4, Line 988: scoped_refptr<Consensus> consensus =
             :       sys_catalog_->tablet_peer()->shared_consensus();
             :   if (!consensus || !consensus->IsRunning()) {
             :     return RaftPeerPB::UNKNOWN_ROLE;
             :   }
             :   return consensus->role();
could you replace this and other checks (and avoid reaching into the consensus 
locks twice, once in IsRunning() and another in role()) by having a test on the 
state of the catalog_manager itself?

If the catalog manager is kRunning, then consensus must be alive, right? If it 
is alive then it doesn't matter if its running, at least in this particular 
case.


PS4, Line 1022: AbortAndWaitForAllTasks(copy);
After this statement you can short-circuit if the there is no sync_catalog_. 

The only instance when that might happen is when tests issue a shutdown before 
initialization finished meaning there will be no consensus or 
leader_election_pool_ tasks. This can simplify a bit by avoiding the extra ifs.

Also if there is a sync_catalog_ then you don't need to ref count consensus 
(sync_catalog has a counted ref to tablet peer which has a counted ref to 
consensus, this method is idempotent)


PS4, Line 1024: consensus
nit "consensus configuration" or "raft config" (it's not ideal but it's what 
we've been calling it)


PS4, Line 1026: consensus
same


PS4, Line 1030: Raft quorum
prefer consensus configuration or raft configuration


PS4, Line 1033: tablets
tablet


PS4, Line 1040: if (consensus && consensus->IsRunning()) {
as I mentioned above this if unecessary, if we got here we have a consensus and 
ShutDown() is idempotent


PS4, Line 1045: activity
tasks


-- 
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to