Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 11:26:21PM +0300, Antti Kantee wrote: > On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote: > > [explanations] > > This is starting to sound sensible now. You obviously have invested > quite a bit of thought in investigating the problem. Yes, it was annoying enough :) > > > > Blah, I didn't even want to think about this migrane-inducer now. > > > Maybe people who have recently worked on vnode reclaiming could instead > > > be the ones to comment? > > > > that would be nice :) > > You don't like my comments? ;) Ho I do, and you certainly know more than I do in this area ... :) -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote: > [explanations] This is starting to sound sensible now. You obviously have invested quite a bit of thought in investigating the problem. > > Blah, I didn't even want to think about this migrane-inducer now. > > Maybe people who have recently worked on vnode reclaiming could instead > > be the ones to comment? > > that would be nice :) You don't like my comments? ;)
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote: > On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote: > > that's not an issue with the reference count. It's an issue with vclean() > > calling VOP_RECLAIM() even if the refcount is greater than 1, and > > vrelel() calling vclean() even if the refcount is greater than 1, > > when the file has been unlink(2)ed. There's a comment about this in > > vrelel(), look for variable "recycle". > > Ah, ic, probably would've been easier if I read the PR first ;) > > So basically someone can vget the vnode (via fhtovp, since it's gone > from the directory namespace) between VOP_INACTIVE and clean. > > Your fix doesn't really fix this problem, since depending on timings > the inode might still be recycled after you check it's "valid". I don't think so because at this point the vnode is locked. I think the race window is between vn_lock() releasing the interlock and getting the vn_lock. After that the reference count is high enouth to prevent vrelel() trying to clean it (vtryrele() at the start of vrelel will make it return, and we hold the interlock at this point). > > Hmm, ok, so things which might happen: > > 1: VOP_REMOVE, vnode is locked, vrele is called > 2: fhtovp before inode is reclaimed, blocks on vn_lock It would fist block on the interlock at this point, I guess. > 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode > 2: gets lock, does check that it's still the same inode > 1: reclaims vnode > 2: boom > > Since vget takes the interlock, a dirty & cheap trick might be to check > that the reference count is still one before trying to clean the vnode in > vrelel(), otherwise punting and letting the next release-to-0 reclaim it? So basically ignoring what VOP_INACTIVE() says. I think this would need another flag so that new vget() against this vnode can drop it early. Otherwise we could have a livelock with the NFS server always getting references to the vnode, preventing it from being recycled and blocking other threads waiting on the inode to reuse it. > > Blah, I didn't even want to think about this migrane-inducer now. > Maybe people who have recently worked on vnode reclaiming could instead > be the ones to comment? that would be nice :) -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote: > that's not an issue with the reference count. It's an issue with vclean() > calling VOP_RECLAIM() even if the refcount is greater than 1, and > vrelel() calling vclean() even if the refcount is greater than 1, > when the file has been unlink(2)ed. There's a comment about this in > vrelel(), look for variable "recycle". Ah, ic, probably would've been easier if I read the PR first ;) So basically someone can vget the vnode (via fhtovp, since it's gone from the directory namespace) between VOP_INACTIVE and clean. Your fix doesn't really fix this problem, since depending on timings the inode might still be recycled after you check it's "valid". Hmm, ok, so things which might happen: 1: VOP_REMOVE, vnode is locked, vrele is called 2: fhtovp before inode is reclaimed, blocks on vn_lock 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode 2: gets lock, does check that it's still the same inode 1: reclaims vnode 2: boom Since vget takes the interlock, a dirty & cheap trick might be to check that the reference count is still one before trying to clean the vnode in vrelel(), otherwise punting and letting the next release-to-0 reclaim it? Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment?
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 09:51:13PM +0300, Antti Kantee wrote: > > >From what I understand the vnode is not on the free list yet so it can't > > be reused for something else. > > For which one? ;) > > No, when the reference count (and hold count) goes to zero, the vnode > goes onto the freelist. However, before use, it must be cleaned. But v_usecount is not 0. vget() did increment it before sleeping. > > > > If there is a race in vfs and XLOCK is not used properly, I think that > > > should be investigated and fixed instead of patching file systems here > > > and there. > > > > I don't know how XLOCK is supposed to be used (and I even less know how > > to change things without creating deadlocks). As I see it XLOCK is set > > while the vnode is being cleaned, not before or after cleaning it, so > > XLOCK is not going to avoid this race anyway. > > XLOCK is supposed to be set when the vnode is being cleaned. vget() > can then detect this early on, wait for the vnode to actually be cleaned > (i.e. VOP_RECLAIM() has done its thing so a new vnode can be crafted > without problems) and return an error. This is what it does, yes. But what happens here is that XLOCK was not set, so vget() tried to aquire the vn_lock. It sleeps here, and the vnode is being VOP_RECLAIM'ed while it sleeps. Before it sleeps XLOCK is not set, when it's woken up XLOCK is not set any more and the vnode is clean. > > I am *guessing* that the selection for a vnode cleanup is racy and > the vnode is selected before the reference count is checked (i.e. > the situation before vget() upped the refcount for the duration of > its operation). But I might be completely wrong too, as that doesn't > really explain what the relationship with fhtovnode is. Maybe it just > more readily triggers the race? that's not an issue with the reference count. It's an issue with vclean() calling VOP_RECLAIM() even if the refcount is greater than 1, and vrelel() calling vclean() even if the refcount is greater than 1, when the file has been unlink(2)ed. There's a comment about this in vrelel(), look for variable "recycle". > > However, I'm still very far from convinced the current fix is appropriate. I'm not either. But at last it makes the NFS server useable, so I think it's OK as a stopgap measure for netbsd-5. > You should at least file a big allcaps PR about the issue. there is one: kern/41147 (this is about the consequences, not the cause though). -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 20:34:50 +0200, Manuel Bouyer wrote: > > > It depends on what you mean with "race-free". If you mean that the > > > vnode returned by vget() can't be recygled, I think this is true. > > > If you mean that vget() can't return a clean vnode then this is false: > > > vget() can sleep in vn_lock(), and it releases the v_interlock mutex > > > before > > > sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even > > > if v_usecount is > 1. > > > > What is the practical difference of "cleaned" and "recycled" for the > > file system driver? > > >From what I understand the vnode is not on the free list yet so it can't > be reused for something else. For which one? ;) No, when the reference count (and hold count) goes to zero, the vnode goes onto the freelist. However, before use, it must be cleaned. > > If there is a race in vfs and XLOCK is not used properly, I think that > > should be investigated and fixed instead of patching file systems here > > and there. > > I don't know how XLOCK is supposed to be used (and I even less know how > to change things without creating deadlocks). As I see it XLOCK is set > while the vnode is being cleaned, not before or after cleaning it, so > XLOCK is not going to avoid this race anyway. XLOCK is supposed to be set when the vnode is being cleaned. vget() can then detect this early on, wait for the vnode to actually be cleaned (i.e. VOP_RECLAIM() has done its thing so a new vnode can be crafted without problems) and return an error. I am *guessing* that the selection for a vnode cleanup is racy and the vnode is selected before the reference count is checked (i.e. the situation before vget() upped the refcount for the duration of its operation). But I might be completely wrong too, as that doesn't really explain what the relationship with fhtovnode is. Maybe it just more readily triggers the race? However, I'm still very far from convinced the current fix is appropriate. You should at least file a big allcaps PR about the issue.
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 09:23:05PM +0300, Antti Kantee wrote: > On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote: > > On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: > > > > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed > > > > because > > > > vget() can sleep. After vget returns, check that vp is still connected > > > > with > > > > ip, and that ip still points to the inode we want. This fix the NULL > > > > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. > > > > > > Um, hold the phone. The whole point of vget() is to provide race-free > > > access to the weak vnode reference held by the file system. Are you > > > saying this does not hold anymore? > > > > It depends on what you mean with "race-free". If you mean that the > > vnode returned by vget() can't be recygled, I think this is true. > > If you mean that vget() can't return a clean vnode then this is false: > > vget() can sleep in vn_lock(), and it releases the v_interlock mutex before > > sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even > > if v_usecount is > 1. > > What is the practical difference of "cleaned" and "recycled" for the > file system driver? >From what I understand the vnode is not on the free list yet so it can't be reused for something else. > > If there is a race in vfs and XLOCK is not used properly, I think that > should be investigated and fixed instead of patching file systems here > and there. I don't know how XLOCK is supposed to be used (and I even less know how to change things without creating deadlocks). As I see it XLOCK is set while the vnode is being cleaned, not before or after cleaning it, so XLOCK is not going to avoid this race anyway. I asked for help on tech-kern about this but didn't get any reply ... -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote: > On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: > > > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed > > > because > > > vget() can sleep. After vget returns, check that vp is still connected > > > with > > > ip, and that ip still points to the inode we want. This fix the NULL > > > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. > > > > Um, hold the phone. The whole point of vget() is to provide race-free > > access to the weak vnode reference held by the file system. Are you > > saying this does not hold anymore? > > It depends on what you mean with "race-free". If you mean that the > vnode returned by vget() can't be recygled, I think this is true. > If you mean that vget() can't return a clean vnode then this is false: > vget() can sleep in vn_lock(), and it releases the v_interlock mutex before > sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even > if v_usecount is > 1. What is the practical difference of "cleaned" and "recycled" for the file system driver? If there is a race in vfs and XLOCK is not used properly, I think that should be investigated and fixed instead of patching file systems here and there.
Re: CVS commit: src/sys/ufs/ufs
On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: > > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because > > vget() can sleep. After vget returns, check that vp is still connected with > > ip, and that ip still points to the inode we want. This fix the NULL > > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. > > Um, hold the phone. The whole point of vget() is to provide race-free > access to the weak vnode reference held by the file system. Are you > saying this does not hold anymore? It depends on what you mean with "race-free". If you mean that the vnode returned by vget() can't be recygled, I think this is true. If you mean that vget() can't return a clean vnode then this is false: vget() can sleep in vn_lock(), and it releases the v_interlock mutex before sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even if v_usecount is > 1. -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --