> A few people know this code.  One is Frank.  I don't immediately see the
> reason for strong concern, feel free to improve.

Part of why that code looks bizarre is that the NLM ASYNC RPC procedures
are themselves bizarre...

The NLM ASYNC procedures DON'T have a normal RPC call response. Instead, the
host handling the call (normally the server, but the client in the case of
NLM_GRANTED for lock grant callbacks) makes an RPC CALL back to the sender!
The RPC library, at least at the time of writing this, had no mechanism to
fire off RPC calls and not care about a response...
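
To make the pairing concrete, here is a minimal sketch of which procedure
the receiver calls back with for each ASYNC request. The procedure numbers
are the ones from the NLM protocol definition, where the reply procedures
are spelled _RES (what I call NLM_xxxx_RSP in this mail). The helper
nlm_reply_proc_for() is a hypothetical name for illustration, not a Ganesha
or ntirpc function.

```c
#include <assert.h>

/* ASYNC request and reply procedure numbers from the NLM protocol. */
enum nlm_async_proc {
	NLM_TEST_MSG = 6,
	NLM_LOCK_MSG = 7,
	NLM_CANCEL_MSG = 8,
	NLM_UNLOCK_MSG = 9,
	NLM_GRANTED_MSG = 10,
	NLM_TEST_RES = 11,
	NLM_LOCK_RES = 12,
	NLM_CANCEL_RES = 13,
	NLM_UNLOCK_RES = 14,
	NLM_GRANTED_RES = 15
};

/*
 * Hypothetical helper: given an ASYNC request procedure, return the
 * procedure the receiving host uses when it makes its own RPC call back
 * to the sender. Returns -1 if msg_proc is not an ASYNC request.
 */
static int nlm_reply_proc_for(int msg_proc)
{
	int reply;

	if (msg_proc < NLM_TEST_MSG || msg_proc > NLM_GRANTED_MSG)
		return -1;

	/* Each _RES procedure sits a fixed offset above its _MSG. */
	reply = msg_proc + (NLM_TEST_RES - NLM_TEST_MSG);
	assert(reply >= NLM_TEST_RES && reply <= NLM_GRANTED_RES);
	return reply;
}
```

Note that both the _MSG and the _RES are RPC calls in their own right, so
each one still gets (or is supposed to get) an empty RPC reply underneath.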

The problem is the client NEVER sends a response to the NLM_xxxx_RSP RPC
callback. So I coded a short timeout so we didn't actually wait for a
response we would never get... This code was also written in haste. I had a
day or so to get it re-written and tested during one of the few
Connectathons I attended with Ganesha that Apple also attended... Since
then, I have never had a Mac client to even make sure it still works...
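
Since the timeout value is in question in the quoted mail below, here is a
small sketch of building a short relative timeout without slipping a unit.
struct timeval carries seconds plus microseconds, so a value meant as
milliseconds has to be scaled before it goes into tv_usec. Whether that is
what happened in the code under discussion is not something I am asserting
here; ms_to_timeval() is a hypothetical helper, not an existing function.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical helper: convert a relative timeout in milliseconds into a
 * struct timeval (seconds + microseconds).
 */
static struct timeval ms_to_timeval(long timeout_ms)
{
	struct timeval tv;

	assert(timeout_ms >= 0);
	tv.tv_sec = timeout_ms / 1000;
	/* Remaining milliseconds, scaled to microseconds. */
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	return tv;
}
```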

The server does need to wait for the NLM_GRANTED_RSP call to be able to
complete the lock transaction, thus some of the craziness. The short
5-second wait may not really be enough, and NLM may get very clunky under
poor network conditions. But it won't totally fail, because the clients
never actually trust that the server will send an NLM_GRANTED_MSG
(apparently some client that was identified by name, at least at one time,
in the Linux source got this whole thing wrong... No big surprise, it's a
mess to try and understand).

There's assuredly a better way to handle all of this, especially if the
ntirpc library directly supported fire-and-forget RPC calls, and then maybe
even had something to wait for the NLM_GRANTED_RSP call back from the
client after the server sends the NLM_GRANTED_MSG call (which is really a
callback).

The problem is that if we muck in this area at all, we need to get a Mac
client (or maybe a FreeBSD client) that uses the NLM ASYNC calls...

Amusingly, the NLM ASYNC calls which the Mac client (and maybe the FreeBSD
client) uses were designed out of a mistaken belief that MS-DOS, not being
multi-tasking, couldn't handle the whole RPC call/response thing...
NLM_SHARE was put in the NLM protocol to support MS-DOS 4.0...

It would be cool if we didn't actually need to support the NLM ASYNC stuff
except for the NLM_GRANTED_MSG and NLM_GRANTED_RSP calls that the clients
all seem to expect (though it might be worth a check to see if they would
actually be ok getting an NLM_GRANTED call).

So if you mess with this area, at an absolute minimum make sure Connectathon
lock tests still work from a Linux client using NFS v3. Better, use the
multilock test suite to MAKE SURE you have actually tested blocking locks
(the Connectathon suite has insufficient synchronization to assure it
actually ever sees a blocked lock). Ideally locking should be tested between
two separate clients (to make sure you aren't just testing a client's local
blocked lock handling; the Linux client used to handle blocked locks
locally...). If using multilock, one of the ml_posix_client processes can be
run on the server just directly accessing the exported filesystem which
would allow reasonable testing with just two hosts, though it's better to
use 3 hosts (server, 2 clients). I never do this level of testing anymore
because it's a pain for me to actually run a 2nd (let alone a 3rd) VM (and
my work laptop has a firewall setup that prevents it from being an RPC server
- which means it can't be an NLM client since an NLM client is ALSO an RPC
server...).

With all the mucking about that's been done, we also need to make sure that
someone properly tests server failure while using NLM...

Gee, don't you wish you had never touched the RPC code... :-) :-) :-)
Seriously, thanks for all the work there, I'm just making sure all the
implications of change here are understood because almost no one actually
understands this stuff (I barely understand it myself...).

Frank

> On Wed, Nov 1, 2017 at 5:20 AM, William Allen Simpson
> <william.allen.simp...@gmail.com> wrote:
> > I'm flummoxed.  Who knows this code?
> >
> > Problem 1: the timeout is set to 10 microseconds.  Holy heck?  And
> > historically, that's the maximum total wait time, so it would try at
> > least three (3) times within 10 *MICRO*seconds?
> >
> > Probably should be milliseconds.
> >
> > Problem 2: there's a retry loop that sets up and tears down a TCP (or
> > UDP) connection 3 times.  On top of the 3 tries in RPC itself?
> >
> > This looks like a lot of self-flagellation, maybe because the timeout
> > above was set too short?
> >
> > Problem 3: this isn't really async -- nlm_send_async() actually runs a
> > pthread_cond_timedwait() before returning.  That's sync!
> >
> > But we already have a timedwait in RPC.  And a signal.  So this
> > completely duplicates the underlying RPC library code.  Why?
> >
> >
> > ----------------------------------------------------------------------
> > -------- Check out the vibrant tech community on one of the world's
> > most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > _______________________________________________
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 
> 
> 
> --
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 

