On 9/9/17 12:16 AM, William Allen Simpson wrote:
On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:
On 09/08/2017 09:07 AM, William Allen Simpson wrote:
On 9/7/17 10:47 PM, Malahal Naineni wrote:
Last time I tried, I got the same. A thread was waiting in epoll_wait() with a
29 second timeout, and shutdown only proceeded after that timeout expired.

I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown now calls nanosleep() in 1 second intervals (was 5 seconds)
while waiting for that last thread.
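
For readers following along, the polling wait described above could be
sketched roughly as follows.  This is not the ntirpc code:
wait_for_workers(), its parameters, and the atomic thread counter are
hypothetical names used only for illustration.

```c
#include <stdatomic.h>
#include <time.h>

/*
 * Hypothetical sketch: poll a live-thread counter, sleeping a fixed
 * interval between checks.  Returns the number of sleeps taken, or -1
 * if the counter is still non-zero after max_sleeps intervals.
 */
static int wait_for_workers(atomic_int *n_threads, long interval_ms,
                            int max_sleeps)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };
    int sleeps = 0;

    while (atomic_load(n_threads) > 0) {
        if (sleeps >= max_sleeps)
            return -1;          /* gave up waiting */
        nanosleep(&ts, NULL);   /* 1000 ms in the case described above */
        sleeps++;
    }
    return sleeps;
}
```

With this shape, a shorter interval only reduces the granularity of the
wait; a thread stuck in epoll_wait() still holds up shutdown until its
own timeout fires.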

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().


I'm looking at using an eventfd to wake up threads on shutdown.  That way, we 
can sleep for a long time while polling.
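
A minimal sketch of the eventfd idea, assuming Linux epoll and eventfd.
The helper names (make_epoll_with_wakeup, signal_shutdown,
wait_for_event) are made up for this example and are not ntirpc
functions:

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an epoll instance with a shutdown eventfd already registered.
 * Returns the epoll fd, or -1 on error. */
static int make_epoll_with_wakeup(int *wake_fd)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    *wake_fd = eventfd(0, EFD_NONBLOCK);
    if (*wake_fd < 0) {
        close(ep);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = *wake_fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, *wake_fd, &ev) < 0) {
        close(*wake_fd);
        close(ep);
        return -1;
    }
    return ep;
}

/* Make the eventfd readable so any epoll_wait() on it returns at once. */
static int signal_shutdown(int wake_fd)
{
    uint64_t one = 1;
    return write(wake_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Returns 1 if woken by the shutdown eventfd, 0 on timeout or any other
 * event, -1 on error. */
static int wait_for_event(int ep, int wake_fd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(ep, &ev, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return ev.data.fd == wake_fd ? 1 : 0;
}
```

Because the eventfd stays readable until it is read, a thread that
enters epoll_wait() after signal_shutdown() has been called still wakes
immediately, provided it waits on the same epoll fd.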

There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and
over, which sets up another transport epoll fd and then deletes it after
each reply.

Presumably this is unregistering services.  Should probably unregister
services *before* nfs_rpc_dispatch_stop() kills the listeners?

Done.  Removed nfs_rpc_dispatch_stop() entirely.


Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call()
repeatedly instead.  No need to emulate UDP with TCP!

This still needs to be looked at, but not in this patch.


(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(),
svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports.  The
fix is to kill the channels from step #1 again, even though they no
longer have any open transports.

Done.  Especially as #1 is removed.
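
As a toy model of the step (5) fix only (the real svc_rqst structures
are different; struct chan and shutdown_all_channels() are hypothetical),
the point is that shutdown must not skip a channel just because its
transport count is zero:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event channel; not the ntirpc layout. */
struct chan {
    int n_xprts;             /* open transports on this channel */
    bool active;             /* channel has a thread that may be polling */
    bool shutdown_signaled;  /* set once shutdown has been requested */
};

/* Signal every active channel, whether or not it still has transports.
 * Returns the number of channels signaled. */
static size_t shutdown_all_channels(struct chan *chans, size_t n)
{
    size_t signaled = 0;

    for (size_t i = 0; i < n; i++) {
        if (!chans[i].active)
            continue;
        /* Deliberately no test of n_xprts here: an empty channel can
         * still have a thread parked in epoll_wait(). */
        chans[i].shutdown_signaled = true;
        signaled++;
    }
    return signaled;
}
```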


(6) work_pool_shutdown() waited until timeout caused that one remaining
channel for the epoll fd (step #2) to terminate.

Still takes an extra second or two for all the cleanup threads to complete.



This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags.  This fix means they are
not needed anymore.


_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
