I've run into another TLS problem.  It's been there since tirpc.

Apparently, once upon a time, rpc_createerr was a static global.
It still says that in the man pages.

When a client create function fails, they stash the error there,
and return NULL for the CLIENT.  Basically, you check for NULL,
and then check rpc_createerr....

This is also used extensively by the RPC bind code.

Then, they made it a keyed thread local to try and salvage it
without a major code re-write.

With async threads, that's not working.  We've got memory leaks.
AFAICT, only on errors.  But accessing them on a different
thread gives the wrong error code (or none at all).  Not good.

All the functions that use it are still not MT-safe, usually
because they stash a string in global memory without locking.
They need re-definition to alloc/free the string.

Worse, it's not a good definition.

rpc_createerr has both clnt_stat and rpc_err, but struct rpc_err
also has a clnt_stat (since original tirpc).  clnt_stat is not
consistently set properly, as it is in two places.  So the error
checking code is often wrong.

I'd like to get rid of the whole mess, but that means every client
create would have new semantics.  Fortunately, there aren't many
(in Ganesha).  Plus we already have new definitions -- all named
*_ncreate with a tirpc_compat.h to munge them back.

But should we do it now?  Or in 2.7?  We've been living with it for
years, although recent code changes have made it worse.  Again, it
only happens on errors.  Especially for RPC bind.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

Reply via email to