Todd Lipcon has posted comments on this change.

Change subject: KUDU-1863: improve overall safety of graceful server shutdown
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7183/2//COMMIT_MSG
Commit Message:

PS2, Line 15: we can guarantee that there
            : are no reactor threads running RPC callbacks within the 
subsystems that we
            : may be trying to shut down
to guarantee lack of RPC callbacks, wouldn't step 1/2 also need to be ensuring 
that _outbound_ RPCs are all marked failed/aborted/whatever?


Line 31:   generic subsystems.
does this have any issues with hitting the webserver after things like 
TSTabletManager are shut down, etc?


PS2, Line 36: fixing KUDU-1863 in
            : the process
isn't KUDU-1863 still an issue because the SyncWrite might be blocked waiting 
for an operation to be committed even when all peers are down? Or do we 
properly abort it?


http://gerrit.cloudera.org:8080/#/c/7183/2/src/kudu/kserver/kserver.cc
File src/kudu/kserver/kserver.cc:

PS2, Line 87: RPC callbacks
can you be a bit more explicit about inbound vs outbound? i.e there might be 
RPC callbacks executing (due to the respnose of an outbound RPC) and there 
might be inbound RPCs executing (on the RPC worker threads or pending but not 
occupying an explicit RPC thread, eg waiting somewhere in consensus).

Messenger::Shutdown() is enough to ensure that the RPC workers shut down and 
are joined, perhaps, but what about the case where the RPC worker has deferred 
an RpcContext off to some other thread? What will happen then when that other 
thread calls RpcContext::Respond? Is that OK? (I suppose it would be fine to 
just black-hole the response?)


http://gerrit.cloudera.org:8080/#/c/7183/2/src/kudu/rpc/messenger.cc
File src/kudu/rpc/messenger.cc:

Line 387:   // Release the map outside of the lock.
technically I dont think the extra scope is necessary, since 'guard' is defined 
after 'to_release' it will be destructed first, right?

that said, maybe there's illustrative value in the extra scope, so if you think 
it's nice, feel free to leave it.

Another option might be to use unique_lock and then add an explicit .unlock()


Line 400:   // Release the service outside of the lock.
same


http://gerrit.cloudera.org:8080/#/c/7183/2/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

PS2, Line 475:   // First, shut down incoming connections and wait for any 
outstanding RPCs to
             :   // finish processing.
similar to comment elsewhere, not sure this actually finishes all RPCs since 
they may be deferring responses to other worker pools


-- 
To view, visit http://gerrit.cloudera.org:8080/7183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbda29dd471e4dc1a8cb4f472cc456ccdb1e48cc
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to