Andrew: Thanks for taking the time to dive into this. I agree that at this point the discussion really needs to be taken to openafs-devel in order to ensure that the right audience is paying attention. This e-mail will hopefully take care of that.
Jeffrey Altman On 3/22/2011 4:13 PM, Andrew Deason wrote: > On Mon, 21 Mar 2011 23:25:36 -0500 > Andrew Deason <[email protected]> wrote: > >> However, I don't think this fully explains the behavior. I haven't >> checked it out yet (and will not today), but this shouldn't really be >> causing the fd to be left open. We are calling IH_REALLYCLOSE when the >> volume goes offline, and we do appear to be going through the code >> path that closes all relevant file descriptors. But after that is >> done, I still see an FdHandle_t with a refcount of 0, in the _OPEN >> state holding open the problematic inode (this is without the fix >> mentioned above). So, it seems like ih_fdclose or whatever isn't >> doing its job, and it seems like that warrants investigation as well. > > We're pretty far into -devel territory now, I think, but... > > I think this is another separate bug, though I don't know if it > manifests as an actual noticeable issue when gerrit 4272 is in play. > While we do issue IH_REALLYCLOSE for all of the vnode handles for the > volume, the handle we're leaving open isn't part of the volume that's > going offline (in my usage, anyway). This may be a bit hard to explain, > but it's something like this... > > We open some vnode in RW vol for writing, and CoW it; our old handle for > inode A is closed, and write to the new handle B (the RW copy). When the > vol goes offline, handle B sticks around because of the leak fixed by > 4272. The volume is released, and we write to the vnode again. We CoW it > so now the RW has a new handle C, and FDH_REALLYCLOSE's handle B. > However, handle B never goes away, since while we do IH_RELEASE it in > the CoW/StoreData code, the ref count doesn't go down to 0, since we > leaked a positive ref earlier. > > In theory, this should just be a resource leak, and the handle for B > should not stay open because FDH_REALLYCLOSE is supposed to close the > underlying fd. However, due to CoW refactoring in 1.6, the original > target handle is opened twice. StoreData_RXStyle holds the ref in > origfdP and CopyOnWrite opens another FdHandle_t from the target vnode's > ihandle. So, when HAVE_PIO is defined, this means that when CoW > IH_OPEN's the targetvnode->handle, it gets back the same FdHandle_t as > StorData_RXStyle has, but the ref count is inc'd. And if you look > closely at fd_reallyclose, you'll see that it is effectively a no-op > when fd_refcnt > 1. > > This isn't an issue when gerrit 4272 fixes the ihandle leak, since when > the ihandle refcnt reaches 0, we close all associated file descriptors > anyway. > > A conservative fix for this I think would be to add some field to > FdHandle_t to indicate that a reallyclose is needed, so a subsequent > call to FDH_CLOSE causes an effective FDH_REALLYCLOSE. (Like how > ihandle_t has the IH_REALLY_CLOSED flag.) A "better" fix would be to add > a new fd_status type, but I'd be worried for stable 1.6.x about screwing > up the other fd_status checks. >
signature.asc
Description: OpenPGP digital signature
