On 8/31/17 9:14 AM, Daniel Gryniewicz wrote:
> On 08/30/2017 10:06 PM, Pradeep wrote:
>> Hi all,
>> I'm hitting a crash in TIRPC with Ganesha 2.6-dev.5. It appears to me that
>> there is a race between an incoming RPC message on a new xprt (for which
>> accept() was done on the FD) and TIRPC setting the process_cb on the new xprt.
>> We set xprt->xp_dispatch.process_cb from the rendezvous function
>> (nfs_rpc_dispatch_tcp_NFS in the case of NFS/TCP), which is called at the end
>> of svc_vc_rendezvous(). But before this happens, an RPC request could already
>> be invoking svc_vc_recv(), because we have already called accept(). Shouldn't
>> we set up the xprt before accept()?
> Not the accept itself, but adding the accepted fd to epoll, which is also
> happening before the rendezvous. I think the call to svc_rqst_xprt_register()
> needs to be last, or a lock needs to be taken.
> Bill?
Yes, that's a problem. I checked v2.5 (ntirpc 1.5) and that has the
same issue. It registers with epoll before doing other essential
things, such as setting up recvsize and sendsize and calling the (old)
xp_recv_user_data (now named nfs_rpc_dispatch_tcp_NFS).
My guess is you're seeing it because the 2.6 epoll loop is much faster.
We're expecting to find more of these timing and code ordering errors.
But it looks like a relatively easy fix.
Thanks for the excellent detailed report. So helpful!
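
For illustration only, here is a minimal sketch of the ordering discussed
above: finish setting up the transport first, and add the fd to epoll as the
very last step. It is not the ntirpc code. struct demo_xprt, demo_process(),
demo_rendezvous(), demo_poll_once() and demo_selftest() are hypothetical names
invented for this example, the buffer sizes are arbitrary, and it assumes
Linux epoll.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a transport handle; NOT the real SVCXPRT. */
struct demo_xprt {
    int fd;
    unsigned int recvsize;
    unsigned int sendsize;
    int (*process_cb)(struct demo_xprt *xprt); /* NULL until setup is done */
};

/* Hypothetical dispatch callback: read whatever arrived, return its length. */
static int demo_process(struct demo_xprt *xprt)
{
    char buf[64];
    ssize_t n = read(xprt->fd, buf, sizeof(buf));

    return n > 0 ? (int)n : -1;
}

/* Fill in every field first; only then make the fd visible to the loop. */
static int demo_rendezvous(int epfd, struct demo_xprt *xprt, int fd)
{
    struct epoll_event ev;

    assert(xprt != NULL);
    xprt->fd = fd;
    xprt->recvsize = 65536; /* arbitrary value for the sketch */
    xprt->sendsize = 65536; /* arbitrary value for the sketch */
    xprt->process_cb = demo_process;

    /* Registration is the last step, so no event can see a half-built xprt. */
    ev.events = EPOLLIN;
    ev.data.ptr = xprt;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One iteration of a toy event loop: wait for an event and dispatch it. */
static int demo_poll_once(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    struct demo_xprt *xprt;

    if (epoll_wait(epfd, &ev, 1, timeout_ms) <= 0)
        return -1;
    xprt = ev.data.ptr;
    if (xprt->process_cb == NULL)
        return -1; /* this is the window the reported crash falls into */
    return xprt->process_cb(xprt);
}

/* Data is already queued on the socket before setup finishes. */
static int demo_selftest(void)
{
    int sv[2], epfd, rc = -1;
    struct demo_xprt xprt = {0};

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        if (write(sv[1], "ping", 4) == 4 &&
            demo_rendezvous(epfd, &xprt, sv[0]) == 0)
            rc = demo_poll_once(epfd, 1000);
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

demo_selftest() queues data on the socket before the transport is set up, then
returns the number of bytes the callback read once the fd is registered. The
point is only that the epoll_ctl() call comes after process_cb and the sizes
are assigned. Taking a lock around setup and dispatch, the other option
mentioned above, is not shown here.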
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel