On 9/9/17 12:16 AM, William Allen Simpson wrote:
On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:
On 09/08/2017 09:07 AM, William Allen Simpson wrote:
On 9/7/17 10:47 PM, Malahal Naineni wrote:
Last time I tried, I got the same. A thread was waiting in epoll_wait() with a 29
second timeout, and things only worked after that timeout expired.
I have seen the same, after I sped up the work pool shutdown. The work
pool shutdown will nanosleep in 1 second intervals (was 5 seconds) waiting
for that last thread.
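
For illustration only, the polling wait described above might look roughly like the sketch below. This is not the ntirpc code: struct demo_pool, its n_threads counter, and demo_pool_wait_drained() are hypothetical names made up for the example. Only nanosleep() and the C11 atomics are real APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a work pool; NOT the ntirpc structure. */
struct demo_pool {
	atomic_int n_threads;	/* worker threads still running */
};

/*
 * Poll until every worker has exited, sleeping one interval between
 * checks, for at most max_polls intervals.
 * Returns true if the pool drained, false if we gave up.
 */
static bool demo_pool_wait_drained(struct demo_pool *pool,
				   const struct timespec *interval,
				   int max_polls)
{
	assert(pool != NULL && interval != NULL);

	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&pool->n_threads) == 0)
			return true;
		/* A signal may cut the sleep short; the loop just re-checks. */
		nanosleep(interval, NULL);
	}
	return atomic_load(&pool->n_threads) == 0;
}
```

With a 1 second interval, a worker stuck in a long epoll_wait() still holds this loop until its own timeout fires, which matches the delay reported here.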
I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.
Probably need yet another flag in svc_rqst_shutdown().
I'm looking at using an eventfd to wake up threads on shutdown. That way, we
can sleep for a long time while polling.
There's already a signal to awaken the threads on shutdown.
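
For readers unfamiliar with the eventfd idea mentioned above, here is a minimal sketch of the general technique on Linux. It is not what ntirpc does (the existing signal mechanism is different), and the demo_poller_* names are hypothetical. Only epoll_create1(), eventfd(), epoll_ctl(), epoll_wait(), write() and close() are real calls.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create an epoll instance with a shutdown eventfd registered on it.
 * Returns the epoll fd (and stores the eventfd in *shutdown_fd), or -1.
 */
static int demo_poller_init(int *shutdown_fd)
{
	assert(shutdown_fd != NULL);

	int epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;

	int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (efd < 0) {
		close(epfd);
		return -1;
	}

	/* Level-triggered: stays readable until somebody reads it. */
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
		close(efd);
		close(epfd);
		return -1;
	}
	*shutdown_fd = efd;
	return epfd;
}

/* Make the eventfd readable, waking any thread blocked in epoll_wait(). */
static int demo_poller_request_shutdown(int shutdown_fd)
{
	uint64_t one = 1;

	return write(shutdown_fd, &one, sizeof(one)) == (ssize_t)sizeof(one)
		? 0 : -1;
}

/* Returns 1 if shutdown was requested, 0 on timeout/other event, -1 on error. */
static int demo_poller_wait(int epfd, int shutdown_fd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n == 1 && ev.data.fd == shutdown_fd) ? 1 : 0;
}
```

Because the eventfd is never read in this sketch, it stays readable, so every later epoll_wait() on that epoll fd returns immediately even with a long timeout.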
Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:
(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.
(2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and
over, which sets up another transport epoll fd and then deletes it after
each reply.
Presumably this is unregistering services. Should probably unregister
services *before* nfs_rpc_dispatch_stop() kills the listeners?
Done. Removed nfs_rpc_dispatch_stop() entirely.
Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call()
repeatedly instead. No need to emulate UDP with TCP!
This still needs to be looked at, but not in this patch.
(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(),
svc_rqst_shutdown(), and work_pool_shutdown().
(4) svc_xprt_shutdown() kills any remaining open transports.
(5) svc_rqst_shutdown() didn't kill epolls that have no transports. The
fix is to kill the channels previously killed in step #1 again, even though
they no longer have any open transports.
Done. Especially as #1 is removed.
(6) work_pool_shutdown() waited until a timeout caused that one remaining
channel for the epoll fd (step #2) to terminate.
Still takes an extra second or two for all the cleanup threads to complete.
This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags. This fix means they are
not needed anymore.
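
As a rough illustration of the step (5) fix, and not the actual patch: the idea is that shutdown signals every channel that still has a thread serving it, whether or not it has open transports and whether or not it was signalled before. struct demo_channel and demo_shutdown_channels() below are hypothetical stand-ins, not ntirpc types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event channel; NOT the ntirpc structure. */
struct demo_channel {
	int n_xprts;	/* open transports registered on this channel */
	bool active;	/* a thread is still serving this channel's epoll fd */
	bool shutdown;	/* shutdown has been requested */
};

/*
 * Request shutdown on every active channel.
 * Returns the number of channels signalled.
 */
static size_t demo_shutdown_channels(struct demo_channel *ch, size_t n)
{
	size_t signalled = 0;

	assert(ch != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!ch[i].active)
			continue;
		/*
		 * Deliberately no "n_xprts > 0" test and no "already
		 * signalled" test: a channel with zero transports can
		 * still have a thread parked in epoll_wait().
		 */
		ch[i].shutdown = true;
		signalled++;
	}
	return signalled;
}
```

Skipping channels with n_xprts == 0 would reproduce the situation in step (6), where one leftover thread only exits when its epoll timeout expires.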
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel