Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
> > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
> > vget() can sleep. After vget returns, check that vp is still connected with
> > ip, and that ip still points to the inode we want. This fix the NULL
> > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
> 
> Um, hold the phone.  The whole point of vget() is to provide race-free
> access to the weak vnode reference held by the file system.  Are you
> saying this does not hold anymore?

It depends on what you mean with "race-free". If you mean that the
vnode returned by vget() can't be recygled, I think this is true.
If you mean that vget() can't return a clean vnode then this is false:
vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
if v_usecount is > 1.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote:
> On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
> > > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed 
> > > because
> > > vget() can sleep. After vget returns, check that vp is still connected 
> > > with
> > > ip, and that ip still points to the inode we want. This fix the NULL
> > > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
> > 
> > Um, hold the phone.  The whole point of vget() is to provide race-free
> > access to the weak vnode reference held by the file system.  Are you
> > saying this does not hold anymore?
> 
> It depends on what you mean with "race-free". If you mean that the
> vnode returned by vget() can't be recygled, I think this is true.
> If you mean that vget() can't return a clean vnode then this is false:
> vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
> sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
> if v_usecount is > 1.

What is the practical difference of "cleaned" and "recycled" for the
file system driver?

If there is a race in vfs and XLOCK is not used properly, I think that
should be investigated and fixed instead of patching file systems here
and there.


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 09:23:05PM +0300, Antti Kantee wrote:
> On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote:
> > On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
> > > > In ufs_ihashget(), vget() can return a vnode that has been vclean'ed 
> > > > because
> > > > vget() can sleep. After vget returns, check that vp is still connected 
> > > > with
> > > > ip, and that ip still points to the inode we want. This fix the NULL
> > > > pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
> > > 
> > > Um, hold the phone.  The whole point of vget() is to provide race-free
> > > access to the weak vnode reference held by the file system.  Are you
> > > saying this does not hold anymore?
> > 
> > It depends on what you mean with "race-free". If you mean that the
> > vnode returned by vget() can't be recygled, I think this is true.
> > If you mean that vget() can't return a clean vnode then this is false:
> > vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
> > sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
> > if v_usecount is > 1.
> 
> What is the practical difference of "cleaned" and "recycled" for the
> file system driver?

>From what I understand the vnode is not on the free list yet so it can't
be reused for something else.

> 
> If there is a race in vfs and XLOCK is not used properly, I think that
> should be investigated and fixed instead of patching file systems here
> and there.

I don't know how XLOCK is supposed to be used (and I even less know how
to change things without creating deadlocks). As I see it XLOCK is set
while the vnode is being cleaned, not before or after cleaning it, so
XLOCK is not going to avoid this race anyway.

I asked for help on tech-kern about this but didn't get any reply ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 20:34:50 +0200, Manuel Bouyer wrote:
> > > It depends on what you mean with "race-free". If you mean that the
> > > vnode returned by vget() can't be recygled, I think this is true.
> > > If you mean that vget() can't return a clean vnode then this is false:
> > > vget() can sleep in vn_lock(), and it releases the v_interlock mutex 
> > > before
> > > sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
> > > if v_usecount is > 1.
> > 
> > What is the practical difference of "cleaned" and "recycled" for the
> > file system driver?
> 
> >From what I understand the vnode is not on the free list yet so it can't
> be reused for something else.

For which one? ;)

No, when the reference count (and hold count) goes to zero, the vnode
goes onto the freelist.  However, before use, it must be cleaned.

> > If there is a race in vfs and XLOCK is not used properly, I think that
> > should be investigated and fixed instead of patching file systems here
> > and there.
> 
> I don't know how XLOCK is supposed to be used (and I even less know how
> to change things without creating deadlocks). As I see it XLOCK is set
> while the vnode is being cleaned, not before or after cleaning it, so
> XLOCK is not going to avoid this race anyway.

XLOCK is supposed to be set when the vnode is being cleaned.  vget()
can then detect this early on, wait for the vnode to actually be cleaned
(i.e. VOP_RECLAIM() has done its thing so a new vnode can be crafted
without problems) and return an error.

I am *guessing* that the selection for a vnode cleanup is racy and
the vnode is selected before the reference count is checked (i.e.
the situation before vget() upped the refcount for the duration of
its operation).  But I might be completely wrong too, as that doesn't
really explain what the relationship with fhtovnode is.  Maybe it just
more readily triggers the race?

However, I'm still very far from convinced the current fix is appropriate.
You should at least file a big allcaps PR about the issue.


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 09:51:13PM +0300, Antti Kantee wrote:
> > >From what I understand the vnode is not on the free list yet so it can't
> > be reused for something else.
> 
> For which one? ;)
> 
> No, when the reference count (and hold count) goes to zero, the vnode
> goes onto the freelist.  However, before use, it must be cleaned.

But v_usecount is not 0. vget() did increment it before sleeping.

> 
> > > If there is a race in vfs and XLOCK is not used properly, I think that
> > > should be investigated and fixed instead of patching file systems here
> > > and there.
> > 
> > I don't know how XLOCK is supposed to be used (and I even less know how
> > to change things without creating deadlocks). As I see it XLOCK is set
> > while the vnode is being cleaned, not before or after cleaning it, so
> > XLOCK is not going to avoid this race anyway.
> 
> XLOCK is supposed to be set when the vnode is being cleaned.  vget()
> can then detect this early on, wait for the vnode to actually be cleaned
> (i.e. VOP_RECLAIM() has done its thing so a new vnode can be crafted
> without problems) and return an error.

This is what it does, yes. But what happens here is that XLOCK was not
set, so vget() tried to aquire the vn_lock. It sleeps here, and the
vnode is being VOP_RECLAIM'ed while it sleeps.
Before it sleeps XLOCK is not set, when it's woken up XLOCK is not
set any more and the vnode is clean.

> 
> I am *guessing* that the selection for a vnode cleanup is racy and
> the vnode is selected before the reference count is checked (i.e.
> the situation before vget() upped the refcount for the duration of
> its operation).  But I might be completely wrong too, as that doesn't
> really explain what the relationship with fhtovnode is.  Maybe it just
> more readily triggers the race?

that's not an issue with the reference count. It's an issue with vclean()
calling VOP_RECLAIM() even if the refcount is greater than 1, and
vrelel() calling vclean() even if the refcount is greater than 1,
when the file has been unlink(2)ed. There's a comment about this in
vrelel(), look for variable "recycle".

> 
> However, I'm still very far from convinced the current fix is appropriate.

I'm not either. But at last it makes the NFS server useable, so I think
it's OK as a stopgap measure for netbsd-5.

> You should at least file a big allcaps PR about the issue.

there is one: kern/41147 (this is about the consequences, not the cause
though).

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote:
> that's not an issue with the reference count. It's an issue with vclean()
> calling VOP_RECLAIM() even if the refcount is greater than 1, and
> vrelel() calling vclean() even if the refcount is greater than 1,
> when the file has been unlink(2)ed. There's a comment about this in
> vrelel(), look for variable "recycle".

Ah, ic, probably would've been easier if I read the PR first ;)

So basically someone can vget the vnode (via fhtovp, since it's gone
from the directory namespace) between VOP_INACTIVE and clean.

Your fix doesn't really fix this problem, since depending on timings
the inode might still be recycled after you check it's "valid".

Hmm, ok, so things which might happen:

1: VOP_REMOVE, vnode is locked, vrele is called
2: fhtovp before inode is reclaimed, blocks on vn_lock
1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode
2: gets lock, does check that it's still the same inode
1: reclaims vnode
2: boom

Since vget takes the interlock, a dirty & cheap trick might be to check
that the reference count is still one before trying to clean the vnode in
vrelel(), otherwise punting and letting the next release-to-0 reclaim it?

Blah, I didn't even want to think about this migrane-inducer now.
Maybe people who have recently worked on vnode reclaiming could instead
be the ones to comment?


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote:
> On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote:
> > that's not an issue with the reference count. It's an issue with vclean()
> > calling VOP_RECLAIM() even if the refcount is greater than 1, and
> > vrelel() calling vclean() even if the refcount is greater than 1,
> > when the file has been unlink(2)ed. There's a comment about this in
> > vrelel(), look for variable "recycle".
> 
> Ah, ic, probably would've been easier if I read the PR first ;)
> 
> So basically someone can vget the vnode (via fhtovp, since it's gone
> from the directory namespace) between VOP_INACTIVE and clean.
> 
> Your fix doesn't really fix this problem, since depending on timings
> the inode might still be recycled after you check it's "valid".

I don't think so because at this point the vnode is locked. I think the
race window is between vn_lock() releasing the interlock and getting the
vn_lock. After that the reference count is high enouth to prevent
vrelel() trying to clean it (vtryrele() at the start of vrelel will
make it return, and we hold the interlock at this point).


> 
> Hmm, ok, so things which might happen:
> 
> 1: VOP_REMOVE, vnode is locked, vrele is called
> 2: fhtovp before inode is reclaimed, blocks on vn_lock

It would fist block on the interlock at this point, I guess.

> 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode
> 2: gets lock, does check that it's still the same inode
> 1: reclaims vnode
> 2: boom
> 
> Since vget takes the interlock, a dirty & cheap trick might be to check
> that the reference count is still one before trying to clean the vnode in
> vrelel(), otherwise punting and letting the next release-to-0 reclaim it?

So basically ignoring what VOP_INACTIVE() says. I think this would need
another flag so that new vget() against this vnode can drop it early.
Otherwise we could have a livelock with the NFS server always getting
references to the vnode, preventing it from being recycled and
blocking other threads waiting on the inode to reuse it.

> 
> Blah, I didn't even want to think about this migrane-inducer now.
> Maybe people who have recently worked on vnode reclaiming could instead
> be the ones to comment?

that would be nice :)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote:
> [explanations]

This is starting to sound sensible now.  You obviously have invested
quite a bit of thought in investigating the problem.

> > Blah, I didn't even want to think about this migrane-inducer now.
> > Maybe people who have recently worked on vnode reclaiming could instead
> > be the ones to comment?
> 
> that would be nice :)

You don't like my comments? ;)


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 11:26:21PM +0300, Antti Kantee wrote:
> On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote:
> > [explanations]
> 
> This is starting to sound sensible now.  You obviously have invested
> quite a bit of thought in investigating the problem.

Yes, it was annoying enough :)

> 
> > > Blah, I didn't even want to think about this migrane-inducer now.
> > > Maybe people who have recently worked on vnode reclaiming could instead
> > > be the ones to comment?
> > 
> > that would be nice :)
> 
> You don't like my comments? ;)

Ho I do, and you certainly know more than I do in this area ... :)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--