Hi Matt and Bill, we were able to reproduce this crash very easily by adding
a sleep after closing "fd". With my fix applied, things worked fine. The
changes are numerous but mostly trivial. I'd appreciate any high-level review.
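
For anyone wondering why a sleep after the close makes this so easy to hit:
the kernel hands out the lowest free descriptor number, so a socket created
during that window can receive the same number the old "rec" still refers to.
Below is a minimal sketch of just that reuse. The helper name is made up for
illustration, and it uses an AF_UNIX datagram socket rather than the real NSM
UDP socket. That the crash depends on this reuse is what the thread implies,
but it is an assumption here, not something this snippet proves.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a socket created right after close() got the same
 * descriptor number, 0 if it got a different one, -1 on error.
 */
static int fd_number_is_reused(void)
{
	int old_fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (old_fd < 0)
		return -1;
	close(old_fd);

	/* This is the window the sleep widens: anything still holding
	 * old_fd now holds a number that no longer names a socket. */

	int new_fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (new_fd < 0)
		return -1;

	int reused = (new_fd == old_fd);

	close(new_fd);
	return reused;
}
```

In a single-threaded process nothing else can take the number in between, so
the new socket lands on the old descriptor number.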

Ganesha changes (second-to-last commit):
https://github.com/ganltc/nfs-ganesha/commits/ibm2.3

Corresponding ntirpc commit (last commit):
https://github.com/ganltc/ntirpc/commits/ibm2.3
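
For a high-level picture without opening the commits, the "struct gfd" idea
quoted below (an fd plus a generation number that is bumped whenever an fd is
created) might look roughly like this. It is a sketch under assumptions, not
the code in the commits above: the field names, the fixed-size table, and the
gfd_open/gfd_close/gfd_is_current helpers are all hypothetical, and there is
no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the thread only says "fd" plus a generation number. */
struct gfd {
	int fd;
	uint64_t gen;
};

#define GFD_MAX 1024	/* illustrative table size, not from the thread */

/* Generation currently associated with each fd number; 0 means closed. */
static uint64_t gfd_current_gen[GFD_MAX];
static uint64_t gfd_next_gen = 1;	/* not thread-safe in this sketch */

/* Call right after socket() or accept() returns fd. */
static struct gfd gfd_open(int fd)
{
	struct gfd g = { .fd = fd, .gen = 0 };

	assert(fd >= 0 && fd < GFD_MAX);
	g.gen = gfd_next_gen++;
	gfd_current_gen[fd] = g.gen;
	return g;
}

/* True only if g still names the same incarnation of the descriptor. */
static bool gfd_is_current(struct gfd g)
{
	return g.fd >= 0 && g.fd < GFD_MAX &&
	       gfd_current_gen[g.fd] == g.gen;
}

/* Call right before close(g.fd); a stale g leaves the table untouched. */
static void gfd_close(struct gfd g)
{
	if (gfd_is_current(g))
		gfd_current_gen[g.fd] = 0;
}
```

A holder of an old struct gfd can then check gfd_is_current() before touching
the descriptor, so a reused fd number is seen as a different generation
instead of being mistaken for the original socket.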

On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni <mala...@gmail.com> wrote:

> Unfortunately, I need a fix for this issue against ganesha2.3.
>
> Regards, Malahal.
>
> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson <
> william.allen.simp...@gmail.com> wrote:
>
>> On 8/13/17 11:50 PM, Malahal Naineni wrote:
>>
>>> >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
>>> >> UDP. It's a mess, and has been a mess for a long time.
>>>
>>> We get a file descriptor fd and then create "rec", but while destroying
>>> things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
>>> clnt_dg_destroy() (and other places) might help fix this issue, but I am
>>> not positive as I am not familiar with this code.
>>>
>>> I am also working on a blind replacement of "fd" by "struct gfd" where
>>> struct gfd has the "fd" as well as a "generation number". The generation
>>> number is incremented whenever such an "fd" is created (e.g. accept() call or
>>> socket() call). The changes are many but they are trivial.
>>>
>>> Any thoughts?
>>>
>> It's not really interesting for the current code base.  In V2.5, I've
>> already eliminated all the various copies of fd, and every SVCXPRT is
>> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
>> tree (svc_rqst).  So there's no longer any possibility of multiple
>> generations of fd.
>>
>> That said, the last remaining problem is clnt_dg clnt_call, where the
>> fd can be passed to poll() at the same time as another copy is passed to
>> (or being removed from) epoll().  Requires a complete re-write.
>>
>> I'd started doing the re-write long long ago, even made the rpc_ctx
>> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
>> But there are still many problems redesigning with async callbacks.
>>
>> I'm looking at the short-term fix I've mentioned earlier, that we should
>> try TCP before UDP, but given our current code base doesn't even compile,
>> I've given up until next week.
>>
>
>
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel