Alexey Serbin has posted comments on this change. Change subject: [catalog manager] fixed deadlock on catalog shutdown ......................................................................
Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/6134/4//COMMIT_MSG Commit Message: PS4, Line 7: [catalog manager] fixed deadlock on catalog shutdown > Not so important now, but when I meant trace I meant the stack trace showin There is no stack trace on the deadlock: it's not a deadlock on synchronization primitives, it's a logical deadlock. PS4, Line 10: > remove extra space Done PS4, Line 13: system > the system Done PS4, Line 22: like > missing colon after like Done PS4, Line 35: spitting messages like > missing colon after like Done http://gerrit.cloudera.org:8080/#/c/6134/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: PS4, Line 722: scoped_refptr<Consensus> consensus = : sys_catalog_->tablet_peer()->shared_consensus(); : if (!consensus || !consensus->IsRunning()) { : return Status::IllegalState("no active consensus"); : } > so how can consensus be null here? It cannot. I used the common pattern to handle consensus as a shared pointer here as well. PS4, Line 807: This could be just a raw pointer since the tablet peer shutdown happens : // after this task finishes. > so why isn't it? :) Just for uniformity. PS4, Line 811: consensus->IsRunning() > do you care that it is running? can it be "not running"? (consensus->Consen It the consensus is not running then the code below will definitely fail: it tries to read/write into the system table. However, as you mentioned elsewhere, if the consensus is not running (which can happen only when catalog state is not kRunning), this code should not be executed at all. PS4, Line 988: scoped_refptr<Consensus> consensus = : sys_catalog_->tablet_peer()->shared_consensus(); : if (!consensus || !consensus->IsRunning()) { : return RaftPeerPB::UNKNOWN_ROLE; : } : return consensus->role(); > could you replace this and other checks (and avoid reaching into the consen This sounds like a good idea. At least, I cannot see that there are any separate operations with tablet_peer() which could affect consensus separately, not affecting state of the catalog manager. PS4, Line 1022: AbortAndWaitForAllTasks(copy); > After this statement you can short-circuit if the there is no sync_catalog_ I'm not sure I follow. What is sync_catalog_? Where could I find it? PS4, Line 1024: consensus > nit "consensus configuration" or "raft config" (it's not ideal but it's wha This extremely confusing. Why then such a name for the member variable? :) Also, the class comment for the Consensus class starts with: // The external interface for a consensus peer. Anyway, I'll replace this with 'consensus configuration'. PS4, Line 1026: consensus > same Done PS4, Line 1030: Raft quorum > prefer consensus configuration or raft configuration Why 'configuration'? The message here is that other members of the Raft quorum for the tablet go down. PS4, Line 1033: tablets > tablet I meant read metadata for tablets. All right, I'll just remove this. PS4, Line 1040: if (consensus && consensus->IsRunning()) { > as I mentioned above this if unecessary, if we got here we have a consensus ok, this sounds good to me. As you could see, I just used the same pattern for consensus probing everywhere. After noting that consensus cannot be null if sys_catalog_ is not null and the catalog was running, it's not needed, indeed. PS4, Line 1045: activity > tasks I'll replace with 'tasks', but what's wrong with 'activity'? -- To view, visit http://gerrit.cloudera.org:8080/6134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes