Hmm, this discussion got stalled, but Patrice reminded me that we need to
continue it...

> On 09/21/2017 07:45 PM, Frank Filz wrote:
> > Philippe discovered that recent Ganesha will no longer allow compiling
> > the Linux kernel due to dangling open file descriptors.
> >
> > I'm not sure if there is any true leak; the simple test of echo foo >
> > /mnt/foo does show a remaining open fd for /mnt/foo, but that is
> > the global fd opened in the course of doing a getattrs in FSAL_VFS.
> >
> > We have been talking about how the current management of open file
> > descriptors doesn't really work, so I have a couple proposals:
> >
> > 1. We really should have a limit on the number of states we allow. Now
> > that NLM locks and shares also have a state_t, it would be simple to
> > keep a count of how many are in use, and return a resource error if an
> > operation requires creating a new one past the limit. This can be a
> > hard limit with no grace: if the limit is hit, then alloc_state fails.
> 
> This I agree with.
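
For illustration, here is a minimal sketch of what that hard limit could
look like. All names here (state_count, state_limit,
alloc_state_limited) are invented for the sketch, not the actual Ganesha
symbols:

#include <stdint.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_uint_fast32_t state_count;
static uint32_t state_limit = 100000;	/* hypothetical config knob */

/* Fails (returns NULL) once the hard limit is reached; the caller would
 * map that to a resource error such as NFS4ERR_RESOURCE. */
void *alloc_state_limited(size_t size)
{
	void *state;

	if (atomic_fetch_add(&state_count, 1) >= state_limit) {
		atomic_fetch_sub(&state_count, 1);
		return NULL;
	}
	state = calloc(1, size);
	if (state == NULL)
		atomic_fetch_sub(&state_count, 1);
	return state;
}

void free_state_limited(void *state)
{
	free(state);
	atomic_fetch_sub(&state_count, 1);
}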
> 
> >
> > 2. Management of the global fd is more complex, so here goes:
> >
> > Part of the proposal is a way for the FSAL to indicate that an FSAL
> > call used the global fd in a way that consumes some kind of resource
> > the FSAL would like managed.
> >
> > FSAL_PROXY should never indicate that (anonymous I/O should be done
> > using a special stateid, and a simple file create should result in the
> > open stateid immediately being closed); if that's not the case, then
> > it's easy enough to indicate use of a limited resource.
> >
> > FSAL_VFS would indicate use of the resource any time it utilizes the
> > global fd. If it uses a temp fd that is closed after performing the
> > operation, it would not indicate use of the limited resource.
> >
> > FSAL_GPFS, FSAL_GLUSTER, and FSAL_CEPH should all be similar to
> > FSAL_VFS.
> >
> > FSAL_RGW only has a global fd, and I don't quite understand how it is
> > managed.
> 
> If only PROXY doesn't set this, then maybe it's added complexity we don't
> need.  Just assume it's set.
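
If we do keep the hint, here is a rough sketch of the shape it could
take. The flag, struct, and function names are all made up for
illustration; the real FSAL API would need an agreed place to carry
this back to MDCACHE:

#include <fcntl.h>
#include <sys/stat.h>

#define FSAL_USED_GLOBAL_FD	0x01	/* hint bit, hypothetical */

struct fsal_call_result {
	int status;		/* 0 on success, -1 with errno set */
	unsigned int flags;	/* resource hints back to the cache layer */
};

/* FSAL_VFS-style getattrs: report the hint only when the global fd
 * (kept open in the handle) is used; a temp fd opened and closed
 * around the operation would not set it. */
struct fsal_call_result vfs_getattrs(int *global_fd, const char *path,
				     struct stat *st)
{
	struct fsal_call_result res = {0, 0};

	if (*global_fd < 0) {
		*global_fd = open(path, O_RDONLY);
		if (*global_fd < 0) {
			res.status = -1;
			return res;
		}
	}
	if (fstat(*global_fd, st) < 0)
		res.status = -1;
	res.flags |= FSAL_USED_GLOBAL_FD;	/* consumed the resource */
	return res;
}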

Matt, could you chime in on RGW? It sounds like FSAL_RGW and/or the RGW
library really manage the open/close state of the file. If so, then you
don't need hints from MDCACHE LRU...

> > The main part of the proposal is to actually create a new LRU queue
> > for objects that are using the limited resource.
> >
> > If we are at the hard limit on the limited resource and an entry that
> > is not already in the LRU uses the resource, then we would reap an
> > existing entry and call fsal_close on it to release the resource. If
> > an entry was not available to be reaped, we would temporarily exceed
> > the limit just like we do with mdcache entries.
> >
> > If an FSAL call resulted in use of the resource and the entry was
> > already in the resource LRU, then it would be bumped to MRU of L1.
> >
> > The LRU run thread for the resource would demote objects from the LRU
> > end of L1 to MRU of L2, and call fsal_close and remove objects from
> > the LRU end of L2. It should work to close any files that have not
> > been used within some amount of time, using L1 and L2 to give a
> > shorter life to objects for which the resource is used once and then
> > not used again, whereas a file that is accessed multiple times would
> > have more resistance to being closed. I think the exact mechanics here
> > may need some tuning, but that's the general idea.
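
To make the queue mechanics concrete, here is a rough sketch of the
resource LRU under the assumptions above: use bumps an entry to MRU of
L1, reaping at the hard limit closes the oldest entry, and a background
pass demotes from the LRU end of L1 to MRU of L2 and closes from the
LRU end of L2. List handling uses the queue(3) TAILQ macros;
fsal_close_cb stands in for the real fsal_close, and all names are
illustrative:

#include <pthread.h>
#include <sys/queue.h>

struct res_entry {
	TAILQ_ENTRY(res_entry) link;
	int level;			/* 1, 2, or 0 when not queued */
	void (*fsal_close_cb)(struct res_entry *);
};

TAILQ_HEAD(res_queue, res_entry);

static struct res_queue lru_l1 = TAILQ_HEAD_INITIALIZER(lru_l1);
static struct res_queue lru_l2 = TAILQ_HEAD_INITIALIZER(lru_l2);
static pthread_mutex_t lru_mtx = PTHREAD_MUTEX_INITIALIZER;

/* FSAL reported use of the resource: insert, or bump to MRU of L1. */
void res_lru_use(struct res_entry *e)
{
	pthread_mutex_lock(&lru_mtx);
	if (e->level == 1)
		TAILQ_REMOVE(&lru_l1, e, link);
	else if (e->level == 2)
		TAILQ_REMOVE(&lru_l2, e, link);
	TAILQ_INSERT_HEAD(&lru_l1, e, link);	/* MRU of L1 */
	e->level = 1;
	pthread_mutex_unlock(&lru_mtx);
}

/* At the hard limit: reap the oldest entry (prefer L2) and close it to
 * release the resource.  Returns 0 if nothing could be reaped, in which
 * case we temporarily exceed the limit. */
int res_lru_reap(void)
{
	struct res_entry *e;

	pthread_mutex_lock(&lru_mtx);
	e = TAILQ_LAST(&lru_l2, res_queue);
	if (e != NULL)
		TAILQ_REMOVE(&lru_l2, e, link);
	else if ((e = TAILQ_LAST(&lru_l1, res_queue)) != NULL)
		TAILQ_REMOVE(&lru_l1, e, link);
	if (e != NULL) {
		e->level = 0;
		e->fsal_close_cb(e);
	}
	pthread_mutex_unlock(&lru_mtx);
	return e != NULL;
}

/* One pass of the LRU run thread: demote from the LRU end of L1 to MRU
 * of L2, then close and drop entries from the LRU end of L2. */
void res_lru_run_once(int demote, int close_n)
{
	struct res_entry *e;

	pthread_mutex_lock(&lru_mtx);
	while (demote-- > 0 && (e = TAILQ_LAST(&lru_l1, res_queue))) {
		TAILQ_REMOVE(&lru_l1, e, link);
		TAILQ_INSERT_HEAD(&lru_l2, e, link);	/* MRU of L2 */
		e->level = 2;
	}
	while (close_n-- > 0 && (e = TAILQ_LAST(&lru_l2, res_queue))) {
		TAILQ_REMOVE(&lru_l2, e, link);
		e->level = 0;
		e->fsal_close_cb(e);	/* release the limited resource */
	}
	pthread_mutex_unlock(&lru_mtx);
}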
> >
> > The idea here is to be constantly closing files that have not been
> > accessed recently, and also to better manage a count of the files for
> > which we are actually using the resources, not keeping a file open
> > just because for some reason we do lots of lookups or stats of it (we
> > might have to open it for getattrs, but then we might serve a bunch of
> > cached attrs, which doesn't go to disk, so we might as well close the
> > fd).
> 
> This sounds almost exactly like the existing LRU thread, except that it
> ignores refcount.  If you remove the global FD from the obj_handle, then
> the LRU as it currently exists becomes unnecessary for MDCACHE entries,
> as they only need a simple, single-level LRU based only on initial
> refcounts.  The current, multi-level LRU only exists to close the global
> FD when transitioning LRU levels.

The multi-level LRU for the handle cache still has some value for scan
resistance.

> So, what it sounds like to me is that you're splitting the LRU for
> entries from the LRU for global FDs.  Is this correct?  If so, I think
> this complicates the two sets of LRU transitions, but probably not
> insurmountably so.
> 
> > I also propose making the limit for the resource configurable
> > independently of the ulimit for file descriptors, though if an FSAL
> > that actually uses file descriptors for open files is loaded, we
> > should check that the ulimit is big enough; the check should also
> > include the limit on state_t. Of course it will be impossible to
> > account for file descriptors used for sockets, log files, config
> > files, or random libraries that like to open files...
> 
> Hmmm... I don't think we can do any kind of checking, if we're not going
> to use ulimit by default, since it depends on which FSALs are in use at
> any given time.  I say we either default the limits to ulimit, or just
> ignore ulimit entirely and log an appropriate error when EMFILE is
> returned.

Yeah, that certainly would be the simplest mechanism, along with
documentation. Specific products which know which FSAL they are using and
may know other quirks could always set ulimit to the config value plus
some reasonable amount to cover things like log file open/close and
sockets (hmm, do we have a configurable limit on the number of TCP
connections we accept?).
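
For what it's worth, here is a simple sketch of the kind of soft check
we could do at startup, assuming hypothetical config values fd_limit
(global fd budget) and state_limit. It only warns rather than fails,
since descriptors used by sockets, log files, config files, and
libraries cannot be accounted for:

#include <stdio.h>
#include <sys/resource.h>

/* Warn if the configured budgets cannot fit under ulimit -n;
 * fd_limit and state_limit are hypothetical config values. */
void check_fd_ulimit(rlim_t fd_limit, rlim_t state_limit)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
		perror("getrlimit");
		return;
	}
	if (fd_limit + state_limit > rl.rlim_cur)
		fprintf(stderr,
			"warning: fd budget %llu + state limit %llu "
			"exceeds ulimit -n (%llu); expect EMFILE\n",
			(unsigned long long)fd_limit,
			(unsigned long long)state_limit,
			(unsigned long long)rl.rlim_cur);
}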

Frank


