On 10/18/17 15:40, Frank Filz wrote:
Hmm, this discussion got stalled, but Patrice reminded me that we need to
continue it...
On 09/21/2017 07:45 PM, Frank Filz wrote:
Philippe discovered that recent Ganesha will no longer allow compiling
the linux kernel due to dangling open file descriptors.
I'm not sure there is any true leak. The simple test of echo foo >
/mnt/foo does show a remaining open fd for /mnt/foo; however, that is
the global fd opened in the course of doing a getattrs on FSAL_VFS.
We have been talking about how the current management of open file
descriptors doesn't really work, so I have a couple proposals:
1. We really should have a limit on the number of states we allow. Now
that NLM locks and shares also have a state_t, it would be simple to
have a count of how many are in use, and return a resource error if an
operation requires creating a new one past the limit. This can be a
hard limit with no grace: if the limit is hit, then alloc_state fails.
This I agree with.
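As a rough illustration of proposal 1, here is a minimal sketch of a hard cap on allocated states. The names state_count, state_limit, state_reserve and state_release are hypothetical and are not existing Ganesha symbols; the assumption is that alloc_state would call something like state_reserve first and return a resource error when it fails.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not actual Ganesha symbols. */
static _Atomic uint64_t state_count;   /* states currently allocated */
static uint64_t state_limit = 4;       /* hard cap; would come from config */

/* Try to reserve a slot for a new state. Returns false when the hard
 * limit is reached, in which case the caller fails the allocation. */
static bool state_reserve(void)
{
	uint64_t cur = atomic_load(&state_count);

	while (cur < state_limit) {
		/* On failure, cur is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&state_count, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Give the slot back when the state is freed. */
static void state_release(void)
{
	atomic_fetch_sub(&state_count, 1);
}
```

The compare-and-swap loop keeps the count from ever exceeding the limit, which matches the "no grace" behaviour described above.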
2. Management of the global fd is more complex, so here goes:
Part of the proposal is a way for the FSAL to indicate that an FSAL
call used the global fd in a way that consumes some kind of resource
the FSAL would like managed.
FSAL_PROXY should never indicate that (anonymous I/O should be done
using a special stateid, and a simple file create should result in the
open stateid immediately being closed). If that's not the case, then
it's easy enough to indicate use of a limited resource.
FSAL_VFS would indicate use of the resource any time it utilizes the
global fd. If it uses a temp fd that is closed after performing the
operation, it would not indicate use of the limited resource.
FSAL_GPFS, FSAL_GLUSTER, and FSAL_CEPH should all be similar to
FSAL_VFS.
FSAL_RGW only has a global fd, and I don't quite understand how it is
managed.
If only PROXY doesn't set this, then maybe it's added complexity we don't
need. Just assume it's set.
Matt, could you chime in on RGW? It sounds like FSAL_RGW and/or the RGW
library really manage the open/close state of the file. If so, then you
don't need hints from MDCACHE LRU...
The main part of the proposal is to actually create a new LRU queue
for objects that are using the limited resource.
If we are at the hard limit on the limited resource and an entry that
is not already in the LRU uses the resource, then we would reap an
existing entry and call fsal_close on it to release the resource. If
an entry was not available to be reaped, we would temporarily exceed
the limit just like we do with mdcache entries.
If an FSAL call resulted in use of the resource and the entry was
already in the resource LRU, then it would be bumped to MRU of L1.
The LRU run thread for the resource would demote objects from LRU of L1
to MRU of L2, and call fsal_close on and remove objects from LRU of L2. I
think it should work to close any files that have not been used in some
amount of time, using L1 and L2 to give a shorter life to
objects for which the resource is used once and then not used again,
whereas a file that is accessed multiple times would have more
resistance to being closed. I think the exact mechanics here may need
some tuning, but that's the general idea.
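The queue mechanics described above can be sketched roughly as follows. This is only an illustration under assumptions: struct fd_entry, struct fd_lru, fd_lru_touch and fd_lru_run are hypothetical names, close_entry stands in for fsal_close, there is no locking, and the run pass simply closes everything in L2 and then demotes everything in L1, which is one possible tuning among many.

```c
#include <stdbool.h>
#include <stddef.h>

enum fd_queue { Q_NONE, Q_L1, Q_L2 };

struct fd_entry {
	struct fd_entry *prev, *next;
	enum fd_queue q;
	bool open;              /* stands in for "global fd is open" */
};

struct fd_lru {
	struct fd_entry l1, l2; /* sentinels: next is MRU end, prev is LRU end */
	size_t count, limit;
};

static void fd_lru_init(struct fd_lru *lru, size_t limit)
{
	lru->l1.prev = lru->l1.next = &lru->l1;
	lru->l2.prev = lru->l2.next = &lru->l2;
	lru->count = 0;
	lru->limit = limit;
}

static void unlink_entry(struct fd_entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->q = Q_NONE;
}

static void push_mru(struct fd_entry *head, struct fd_entry *e, enum fd_queue q)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
	e->q = q;
}

/* Stand-in for fsal_close(): drop the entry and release the resource. */
static void close_entry(struct fd_lru *lru, struct fd_entry *e)
{
	unlink_entry(e);
	e->open = false;
	lru->count--;
}

/* Called when an FSAL call reports that it used the limited resource. */
static void fd_lru_touch(struct fd_lru *lru, struct fd_entry *e)
{
	if (e->q != Q_NONE) {
		unlink_entry(e);        /* already tracked: bump below */
	} else {
		if (lru->count >= lru->limit) {
			struct fd_entry *victim = NULL;

			if (lru->l2.prev != &lru->l2)
				victim = lru->l2.prev;
			else if (lru->l1.prev != &lru->l1)
				victim = lru->l1.prev;
			if (victim != NULL)
				close_entry(lru, victim);
			/* else: nothing to reap, temporarily exceed the limit */
		}
		e->open = true;
		lru->count++;
	}
	push_mru(&lru->l1, e, Q_L1);
}

/* One pass of the LRU run thread. */
static void fd_lru_run(struct fd_lru *lru)
{
	while (lru->l2.prev != &lru->l2)
		close_entry(lru, lru->l2.prev);
	while (lru->l1.prev != &lru->l1) {
		struct fd_entry *e = lru->l1.prev;

		unlink_entry(e);
		push_mru(&lru->l2, e, Q_L2);
	}
}
```

With this arrangement an entry touched once is closed on the second run after its use, while an entry that keeps being touched is moved back to L1 each time and stays open.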
The idea here is to be constantly closing files that have not been
accessed recently, and also to better manage a count of the files for
which we are actually using the resources, and not keep a file open
just because for some reason we do lots of lookups or stats of it (we
might have to open it for getattrs, but then we might serve a bunch of
cached attrs, which doesn't go to disk, might as well close the fd).
This sounds almost exactly like the existing LRU thread, except that it
ignores refcount. If you remove global FD from the obj_handle, then the LRU
as it currently exists becomes unnecessary for MDCACHE entries, as they only
need a simple, single-level LRU based only on initial refcounts. The
current, multi-level LRU only exists to close the global FD when
transitioning LRU levels.
The multi-level LRU for the handle cache still has some value for scan
resistance.
So, what it sounds like to me is that you're splitting the LRU for entries
from the LRU for global FDs. Is this correct? If so, I think this
complicates the two sets of LRU transitions, but probably not
insurmountably so.
I also propose making the limit for the resource configurable
independent of the ulimit for file descriptors, though if an FSAL is
loaded that actually uses file descriptors for open files, we should check
that the ulimit is big enough; that check should also include the limit on
state_t. Of course it will be impossible to account for file
descriptors used for sockets, log files, config files, or random
libraries that like to open files...
Hmmm... I don't think we can do any kind of checking, if we're not going
to use ulimit by default, since it depends on which FSALs are in use at any
given time. I say we either default the limits to ulimit, or just ignore
ulimit entirely and log an appropriate error when EMFILE is returned.
Yea, that certainly would be the simplest mechanism. Along with
documentation. Specific products which know which FSAL they are using and
may know other quirks could always set ulimit to the config value + some
reasonable amount to cover things like log file open/close and sockets (hmm,
do we have a configurable limit on the number of TCP connections we
accept?).
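For a product that does want to sanity-check its settings, a minimal sketch using POSIX getrlimit might look like the following. The function names fd_budget_fits and check_nofile, and the idea of a separate headroom value for sockets and log files, are assumptions made for illustration and are not part of Ganesha.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

/* True if fd_limit plus headroom fits under soft_limit (overflow-safe). */
static bool fd_budget_fits(uint64_t soft_limit, uint64_t fd_limit,
			   uint64_t headroom)
{
	return fd_limit <= soft_limit && headroom <= soft_limit - fd_limit;
}

/* Compare the configured limit against the process's RLIMIT_NOFILE. */
static bool check_nofile(uint64_t fd_limit, uint64_t headroom)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
		perror("getrlimit");
		return false;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		return true;
	if (!fd_budget_fits((uint64_t)rl.rlim_cur, fd_limit, headroom)) {
		fprintf(stderr,
			"fd limit %llu + headroom %llu exceeds ulimit %llu\n",
			(unsigned long long)fd_limit,
			(unsigned long long)headroom,
			(unsigned long long)rl.rlim_cur);
		return false;
	}
	return true;
}
```

This only warns about an obviously too-small ulimit; it cannot account for descriptors opened by libraries, so logging on EMFILE would still be needed.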
Frank
Isn't it only a question of layering the code? If limited resources
are used by an FSAL, why not let each FSAL manage them?
The FSAL-API seems to be designed without any reference to the "global
fd" concept, whereas some parts of the code outside of the FSALs seem to
be closely tied to this concept. Isn't using a "global fd" purely an FSAL
matter? A very common one (the vfs, ceph, gpfs and gluster FSALs all seem
to make this choice), but only at the FSAL level. It looks like use of a
global fd is "hardcoded" outside of the FSAL, whereas there is no trace
of it in the FSAL-API.
Best regards,
--
Patrice LUCAS
Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S
tel : +33 (0)1 69 26 47 86
e-mail : patrice.lu...@cea.fr
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel