On 09/21/2017 07:45 PM, Frank Filz wrote:
Philippe discovered that recent Ganesha will no longer allow compiling the
Linux kernel due to dangling open file descriptors.

I'm not sure there is any true leak; the simple test of echo foo >
/mnt/foo does show a remaining open fd for /mnt/foo, but that is the
global fd opened in the course of doing a getattrs in FSAL_VFS.

We have been talking about how the current management of open file
descriptors doesn't really work, so I have a couple of proposals:

1. We really should have a limit on the number of states we allow. Now that
NLM locks and shares also have a state_t, it would be simple to keep a count
of how many are in use, and return a resource error if an operation requires
creating a new one past the limit. This can be a hard limit with no grace:
if the limit is hit, alloc_state fails.

This I agree with.
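
A minimal sketch of what that hard limit could look like, assuming a new
configurable state_limit parameter (all names below are hypothetical, not
existing Ganesha code):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical globals: current count and configured limit. */
    static atomic_uint_fast32_t states_in_use;
    static uint_fast32_t state_limit = 100000;

    /* Hard limit with no grace: refuse to go past the limit. */
    static bool take_state_slot(void)
    {
        if (atomic_fetch_add(&states_in_use, 1) >= state_limit) {
            atomic_fetch_sub(&states_in_use, 1);
            return false;   /* caller returns a resource error */
        }
        return true;
    }

    static void release_state_slot(void)
    {
        atomic_fetch_sub(&states_in_use, 1);
    }

alloc_state would call take_state_slot() first and fail with a resource
error (e.g. NFS4ERR_RESOURCE) when it returns false; freeing a state_t
would call release_state_slot().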


2. Management of the global fd is more complex, so here goes:

Part of the proposal is a way for the FSAL to indicate that an FSAL call
used the global fd in a way that consumes some kind of resource the FSAL
would like managed.

FSAL_PROXY should never indicate that (anonymous I/O should be done using a
special stateid, and a simple file create should result in the open stateid
immediately being closed; if that's not the case, then it's easy enough to
indicate use of a limited resource).

FSAL_VFS would indicate use of the resource any time it utilizes the global
fd. If it uses a temp fd that is closed after performing the operation, it
would not indicate use of the limited resource.
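
As a rough sketch of how FSAL_VFS could report that (the stand-in handle
type, the global_fd_in_use flag, and the simplified getattrs below are all
illustrative, not existing code):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <sys/stat.h>

    /* Minimal stand-in for the real object handle. */
    struct handle_sketch {
        int global_fd;           /* -1 when not open */
        bool global_fd_in_use;   /* proposed indication */
    };

    static int getattrs_sketch(struct handle_sketch *hdl,
                               const char *path, struct stat *st)
    {
        if (hdl->global_fd < 0)
            hdl->global_fd = open(path, O_RDONLY);
        if (hdl->global_fd < 0 || fstat(hdl->global_fd, st) < 0)
            return -1;
        /* The global fd stays open after we return, so flag that
         * this handle now consumes the limited resource.  An op
         * that used a temp fd and closed it would not set this. */
        hdl->global_fd_in_use = true;
        return 0;
    }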

FSAL_GPFS, FSAL_GLUSTER, and FSAL_CEPH should all be similar to FSAL_VFS.

FSAL_RGW only has a global fd, and I don't quite understand how it is
managed.

If only PROXY doesn't set this, then maybe it's added complexity we don't need. Just assume it's set.

The main part of the proposal is to actually create a new LRU queue for
objects that are using the limited resource.

If we are at the hard limit on the limited resource and an entry that is not
already in the LRU uses the resource, then we would reap an existing entry
and call fsal_close on it to release the resource. If an entry was not
available to be reaped, we would temporarily exceed the limit just like we
do with mdcache entries.
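
A sketch of that reap path (every name here is a hypothetical stand-in;
fsal_close is the real call that would release the fd):

    struct entry_sketch;               /* stand-in for mdcache_entry */
    extern unsigned open_fd_count;     /* fds currently in use */
    extern unsigned fd_hard_limit;     /* configured hard limit */
    struct entry_sketch *fd_lru_try_reap(void);
    void fd_lru_insert_mru_l1(struct entry_sketch *entry);
    void fsal_close_sketch(struct entry_sketch *entry);

    /* Called when an entry not already in the fd LRU consumes
     * the limited resource. */
    void fd_lru_note_use(struct entry_sketch *entry)
    {
        if (open_fd_count >= fd_hard_limit) {
            struct entry_sketch *victim = fd_lru_try_reap();

            if (victim != NULL) {
                fsal_close_sketch(victim);  /* release its fd */
                open_fd_count--;
            }
            /* else: temporarily exceed the limit, just as with
             * mdcache entries */
        }
        fd_lru_insert_mru_l1(entry);
        open_fd_count++;
    }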

If an FSAL call resulted in use of the resource and the entry was already in
the resource LRU, then it would be bumped to MRU of L1.

The LRU run thread for the resource would demote objects from LRU of L1 to
MRU of L2, and would call fsal_close on objects at the LRU of L2 and remove
them. I think it should work to close any files that have not been used
within some amount of time, with L1 and L2 really there to give a shorter
life to objects for which the resource is used once and then never again,
whereas a file that is accessed multiple times would have more resistance to
being closed. The exact mechanics here may need some tuning, but that's the
general idea.
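
One background pass might then look like this (again all hypothetical
stand-ins, reusing entry_sketch, open_fd_count, and fsal_close_sketch from
the previous sketch):

    struct entry_sketch *l1_pop_lru_if_idle(void);   /* hypothetical */
    struct entry_sketch *l2_pop_lru_if_idle(void);   /* hypothetical */
    void l2_insert_mru(struct entry_sketch *entry);  /* hypothetical */

    /* One pass of the fd LRU background thread. */
    void fd_lru_run_once(void)
    {
        struct entry_sketch *e;

        /* Anything still idle at the LRU end of L2 has gone a
         * full pass without use: close it and free the fd. */
        while ((e = l2_pop_lru_if_idle()) != NULL) {
            fsal_close_sketch(e);
            open_fd_count--;
        }

        /* Demote idle L1 entries to MRU of L2.  An entry used
         * again is bumped back to MRU of L1, so repeat users
         * resist closing while one-shot opens age out quickly. */
        while ((e = l1_pop_lru_if_idle()) != NULL)
            l2_insert_mru(e);
    }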

The idea here is to be constantly closing files that have not been accessed
recently, and also to better track the count of files for which we are
actually using the resource, rather than keeping a file open just because
for some reason we do lots of lookups or stats on it (we might have to open
it for getattrs, but if we then serve a bunch of cached attrs that never go
to disk, we might as well close the fd).

This sounds almost exactly like the existing LRU thread, except that it ignores refcount. If you remove the global FD from the obj_handle, then the LRU as it currently exists becomes unnecessary for MDCACHE entries, as they only need a simple, single-level LRU based only on initial refcounts. The current, multi-level LRU only exists to close the global FD when transitioning LRU levels.

So, what it sounds like to me is that you're splitting the LRU for entries from the LRU for global FDs. Is this correct? If so, I think this complicates the two sets of LRU transitions, but probably not insurmountably so.

I also propose making the limit for the resource configurable independently
of the ulimit for file descriptors, though if an FSAL is loaded that
actually uses file descriptors for open files, we should check that the
ulimit is big enough; that check should account for the limit on state_t as
well. Of course it will be impossible to account for file descriptors used
for sockets, log files, config files, or random libraries that like to open
files...
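
A sketch of that startup check (getrlimit is standard POSIX; the fd_limit
and state_limit parameters and the headroom value are hypothetical):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Warn if the configured fd limit plus the state_t limit
     * can't fit under RLIMIT_NOFILE with some slack left for
     * sockets, log files, config files, and libraries. */
    static void check_fd_limits(rlim_t fd_limit, rlim_t state_limit)
    {
        struct rlimit rl;
        const rlim_t headroom = 1024;   /* arbitrary slack */

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return;
        if (fd_limit + state_limit + headroom > rl.rlim_cur)
            fprintf(stderr,
                    "fd limit %llu + state limit %llu may exceed "
                    "ulimit -n (%llu)\n",
                    (unsigned long long)fd_limit,
                    (unsigned long long)state_limit,
                    (unsigned long long)rl.rlim_cur);
    }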

Hmmm... I don't think we can do any kind of checking if we're not going to use ulimit by default, since it depends on which FSALs are in use at any given time. I say we either default the limits to ulimit, or just ignore ulimit entirely and log an appropriate error when EMFILE is returned.
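
For the second route, the EMFILE handling could be as simple as this sketch
(the wrapper and message are illustrative; real code would go through
Ganesha's logging macros rather than stderr):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    /* Surface fd exhaustion clearly instead of checking up front. */
    static int open_with_emfile_log(const char *path, int flags)
    {
        int fd = open(path, flags);

        if (fd < 0 && errno == EMFILE)
            fprintf(stderr,
                    "open(%s): out of file descriptors; raise "
                    "ulimit -n or lower the configured limits\n",
                    path);
        return fd;
    }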

Daniel
