Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-12 Thread Jeff Layton
On Sat, 2016-06-11 at 23:15 -0400, Oleg Drokin wrote:
> On Jun 11, 2016, at 10:50 PM, Jeff Layton wrote:
> 
> > 
> > On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote:
> > > 
> > > 
> > > Hm. I am trying to lock the newly initialized one and that seems to be 
> > > holding up
> > > well (but I want 24 hours just to be extra sure).
> > > Hn, I just noticed a bug in this, so that'll reset the clock back.
> > > 
> > > But I think we cannot return with locked one if we found existing one due 
> > > to lock
> > > inversion?
> > > I see that normally first we lock the state rwsem (now mutex) and then
> > > lock the fi_lock.
> > > Now if we make init_open_stateid() to lock the new state mutex while the 
> > > fi_lock
> > > is locked - that's probably ok, because we can do it before adding it to 
> > > the list,
> > > so nobody can find it.
> > > Now the existing state that we find, we cannot really lock while holding 
> > > that fi_lock,
> > > because what if there's a parallel thread that already holds the mutex 
> > > and now
> > > wants the fi_lock?
> > > And so it's probably best to return with existing state unlocked and let 
> > > caller lock it?
> > > Or do you think it's best to separately lock the found stp outside of 
> > > spinlock
> > > just for consistency?
> > I think we just have to ensure that if the new stateid is hashed that
> > its mutex is locked prior to being inserted into the hashtable. That
> > should prevent the race you mentioned.
> > 
> > If we find an existing one in the hashtable in init_open_stateid, then
> > we _can_ take the mutex after dropping the spinlocks, since we won't
> > call release_open_stateid in that case anyway.
> Yes.
> 
> > 
> > We'll also need to consider what happens if nfs4_get_vfs_file fails
> > after we hashed the stateid, but then another task finds it while
> > processing another open. So we might have to have release_open_stateid
> > unlock the mutex after unhashing the stateid, but before putting the
> > reference, and then have init_open_stateid check to see if the thing is
> > still hashed after it gets the mutex.
> Hm.
> So what's going to go wrong if another user reuses the unhashed stateid?
> As long as they drop it once they are done it'll be freed and all is fine, no?
> Are there other implications?
> Hm, it looks like free_ol_stateid_reaplist() just frees the thing without any 
> looking
> into mutexes and stuff?
> 

The problem there is that you're sending a very soon to be defunct
stateid to the client as valid, which means the client will end up in
state recovery (at best).

The initial open is somewhat of a special case...we are hashing a new
stateid. If the operation that hashes it fails, then we want to make
like it never happened at all. The client will never see it, so we
don't want to leave it hanging around on the server. We need to unhash
the stateid in that case.

So, yes, the mutex doesn't really prevent the thing from being
unhashed, but if you take the mutex and the thing is still hashed
afterward, then you know that the above situation didn't happen. The
initial incarnation of the stateid will have been sent to the client.

> Ok, so we get the mutex, check that the stateid is hashed, it's not anymore
> (actually unhashing could be done without mutex too, right? so just mutex held
> is not going to protect us), then we need to drop the mutex and restart the 
> search
> from scratch (including all relocking), I assume?

Yeah, that's what I was thinking. It should be a fairly rare race so
taking an extra hit on the locking in that case shouldn't be too bad. I
think there's quite a bit of opportunity to simplify the open path by
reworking this code as well.

What we probably need to do is turn nfsd4_find_existing_open into
something that intializes and hashes the stateid if it doesn't find one
, and returns it with the mutex locked. It could also do the check to
see if an existing stateid was still hashed after taking the mutex and
could redrive the search if it isn't. That would also make the open
code more resilient in the face of OPEN vs. CLOSE races, which would
also be nice...

> I guess I'll have it as a separate follow on patch.
> We'll probably also need some fault-injection here to trigger this case, as 
> triggering
> it "naturally" will be a tough problem even on my mega-racy setup.
> 
> Something like:
> if (swapstp) {
> …
>   }
>   if (FAULTINJECTION) {
>   msleep(some_random_time);
>   status = nfserr_eio;
>   } else
>   status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, 
> open); 
> 
> should increase the chance.

Sure. Look for CONFIG_NFSD_FAULT_INJECTION. You might be able to use
that framework (though it is a bit manky).

> Ideally there'd be a way to trigger this case more deterministically,
> how do I have two OPEN requests in parallel in NFS for the same file,
> just have two threads do it and that wo

Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-11 Thread Oleg Drokin

On Jun 11, 2016, at 10:50 PM, Jeff Layton wrote:

> On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote:
>> 
>> Hm. I am trying to lock the newly initialized one and that seems to be 
>> holding up
>> well (but I want 24 hours just to be extra sure).
>> Hn, I just noticed a bug in this, so that'll reset the clock back.
>> 
>> But I think we cannot return with locked one if we found existing one due to 
>> lock
>> inversion?
>> I see that normally first we lock the state rwsem (now mutex) and then
>> lock the fi_lock.
>> Now if we make init_open_stateid() to lock the new state mutex while the 
>> fi_lock
>> is locked - that's probably ok, because we can do it before adding it to the 
>> list,
>> so nobody can find it.
>> Now the existing state that we find, we cannot really lock while holding 
>> that fi_lock,
>> because what if there's a parallel thread that already holds the mutex and 
>> now
>> wants the fi_lock?
>> And so it's probably best to return with existing state unlocked and let 
>> caller lock it?
>> Or do you think it's best to separately lock the found stp outside of 
>> spinlock
>> just for consistency?
> 
> I think we just have to ensure that if the new stateid is hashed that
> its mutex is locked prior to being inserted into the hashtable. That
> should prevent the race you mentioned.
> 
> If we find an existing one in the hashtable in init_open_stateid, then
> we _can_ take the mutex after dropping the spinlocks, since we won't
> call release_open_stateid in that case anyway.

Yes.

> We'll also need to consider what happens if nfs4_get_vfs_file fails
> after we hashed the stateid, but then another task finds it while
> processing another open. So we might have to have release_open_stateid
> unlock the mutex after unhashing the stateid, but before putting the
> reference, and then have init_open_stateid check to see if the thing is
> still hashed after it gets the mutex.

Hm.
So what's going to go wrong if another user reuses the unhashed stateid?
As long as they drop it once they are done it'll be freed and all is fine, no?
Are there other implications?
Hm, it looks like free_ol_stateid_reaplist() just frees the thing without any 
looking
into mutexes and stuff?

Ok, so we get the mutex, check that the stateid is hashed, it's not anymore
(actually unhashing could be done without mutex too, right? so just mutex held
is not going to protect us), then we need to drop the mutex and restart the 
search
from scratch (including all relocking), I assume?
I guess I'll have it as a separate follow on patch.
We'll probably also need some fault-injection here to trigger this case, as 
triggering
it "naturally" will be a tough problem even on my mega-racy setup.

Something like:
if (swapstp) {
…
}
if (FAULTINJECTION) {
msleep(some_random_time);
status = nfserr_eio;
} else
status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, 
open); 

should increase the chance.
Ideally there'd be a way to trigger this case more deterministically,
how do I have two OPEN requests in parallel in NFS for the same file,
just have two threads do it and that would 100% result in two requests,
no merging anywhere along the way that I need to be aware of?


Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-11 Thread Jeff Layton
On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote:
> On Jun 11, 2016, at 9:33 PM, Jeff Layton wrote:
> 
> > On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote:
> > > On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote:
> > > 
> > > > On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote:
> > > > > On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
> > > > > > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
> > > > > > 
> > > > > > > Currently there's an unprotected access mode check in
> > > > > > > nfs4_upgrade_open
> > > > > > > that then calls nfs4_get_vfs_file which in turn assumes whatever
> > > > > > > access mode was present in the state is still valid which is racy.
> > > > > > > Two nfs4_get_vfs_file van enter the same path as result and get 
> > > > > > > two
> > > > > > > references to nfs4_file, but later drop would only happens once
> > > > > > > because
> > > > > > > access mode is only denoted by bits, so no refcounting.
> > > > > > > 
> > > > > > > The locking around access mode testing is introduced to avoid this
> > > > > > > race.
> > > > > > > 
> > > > > > > Signed-off-by: Oleg Drokin 
> > > > > > > ---
> > > > > > > 
> > > > > > > This patch performs equally well to the st_rwsem -> mutex
> > > > > > > conversion,
> > > > > > > but is a bit ligher-weight I imagine.
> > > > > > > For one it seems to allow truncates in parallel if we ever want 
> > > > > > > it.
> > > > > > > 
> > > > > > > fs/nfsd/nfs4state.c | 28 +---
> > > > > > > 1 file changed, 25 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > > index f5f82e1..d4b9eba 100644
> > > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
> > > > > > > svc_rqst *rqstp, struct nfs4_file *fp,
> > > > > > > 
> > > > > > >   spin_lock(&fp->fi_lock);
> > > > > > > 
> > > > > > > + if (test_access(open->op_share_access, stp)) {
> > > > > > > + spin_unlock(&fp->fi_lock);
> > > > > > > + return nfserr_eagain;
> > > > > > > + }
> > > > > > > +
> > > > > > >   /*
> > > > > > >* Are we trying to set a deny mode that would conflict with
> > > > > > >* current access?
> > > > > > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
> > > > > > > struct nfs4_file *fp, struct svc_fh *c
> > > > > > >   __be32 status;
> > > > > > >   unsigned char old_deny_bmap = stp->st_deny_bmap;
> > > > > > > 
> > > > > > > - if (!test_access(open->op_share_access, stp))
> > > > > > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > > > > > open);
> > > > > > > +again:
> > > > > > > + spin_lock(&fp->fi_lock);
> > > > > > > + if (!test_access(open->op_share_access, stp)) {
> > > > > > > + spin_unlock(&fp->fi_lock);
> > > > > > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > > > > > open);
> > > > > > > + /*
> > > > > > > +  * Somebody won the race for access while we did
> > > > > > > not hold
> > > > > > > +  * the lock here
> > > > > > > +  */
> > > > > > > + if (status == nfserr_eagain)
> > > > > > > + goto again;
> > > > > > > + return status;
> > > > > > > + }
> > > > > > > 
> > > > > > >   /* test and set deny mode */
> > > > > > > - spin_lock(&fp->fi_lock);
> > > > > > >   status = nfs4_file_check_deny(fp, open->op_share_deny);
> > > > > > >   if (status == nfs_ok) {
> > > > > > >   set_deny(open->op_share_deny, stp);
> > > > > > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
> > > > > > > struct svc_fh *current_fh, struct nf
> > > > > > >   status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
> > > > > > > open);
> > > > > > >   if (status) {
> > > > > > >   up_read(&stp->st_rwsem);
> > > > > > > + /*
> > > > > > > +  * EAGAIN is returned when there's a
> > > > > > > racing access,
> > > > > > > +  * this should never happen as we are the
> > > > > > > only user
> > > > > > > +  * of this new state, and since it's not
> > > > > > > yet hashed,
> > > > > > > +  * nobody can find it
> > > > > > > +  */
> > > > > > > + WARN_ON(status == nfserr_eagain);
> > > > > > 
> > > > > > Ok, some more testing shows that this CAN happen.
> > > > > > So this patch is inferior to the mutex one after all.
> > > > > > 
> > > > > 
> > > > > Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
> > > > > you can get this when there is a lease break in progress, and that may
> > > > > be occurring for a completely different stateid (or because of samba,
> > > > > etc...)
> > > > > 
> > > > > It may be possible to do something like this, but we'd need to audit
> > > > > all of the handling of st_access_bmap (and the deny bmap) to ensure
> > >

Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-11 Thread Oleg Drokin

On Jun 11, 2016, at 9:33 PM, Jeff Layton wrote:

> On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote:
>> On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote:
>> 
>>> On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote:
 On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
> On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
> 
>> Currently there's an unprotected access mode check in
>> nfs4_upgrade_open
>> that then calls nfs4_get_vfs_file which in turn assumes whatever
>> access mode was present in the state is still valid which is racy.
>> Two nfs4_get_vfs_file van enter the same path as result and get two
>> references to nfs4_file, but later drop would only happens once
>> because
>> access mode is only denoted by bits, so no refcounting.
>> 
>> The locking around access mode testing is introduced to avoid this
>> race.
>> 
>> Signed-off-by: Oleg Drokin 
>> ---
>> 
>> This patch performs equally well to the st_rwsem -> mutex
>> conversion,
>> but is a bit ligher-weight I imagine.
>> For one it seems to allow truncates in parallel if we ever want it.
>> 
>> fs/nfsd/nfs4state.c | 28 +---
>> 1 file changed, 25 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index f5f82e1..d4b9eba 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
>> svc_rqst *rqstp, struct nfs4_file *fp,
>> 
>>  spin_lock(&fp->fi_lock);
>> 
>> +if (test_access(open->op_share_access, stp)) {
>> +spin_unlock(&fp->fi_lock);
>> +return nfserr_eagain;
>> +}
>> +
>>  /*
>>   * Are we trying to set a deny mode that would conflict with
>>   * current access?
>> @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
>> struct nfs4_file *fp, struct svc_fh *c
>>  __be32 status;
>>  unsigned char old_deny_bmap = stp->st_deny_bmap;
>> 
>> -if (!test_access(open->op_share_access, stp))
>> -return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
>> open);
>> +again:
>> +spin_lock(&fp->fi_lock);
>> +if (!test_access(open->op_share_access, stp)) {
>> +spin_unlock(&fp->fi_lock);
>> +status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
>> open);
>> +/*
>> + * Somebody won the race for access while we did
>> not hold
>> + * the lock here
>> + */
>> +if (status == nfserr_eagain)
>> +goto again;
>> +return status;
>> +}
>> 
>>  /* test and set deny mode */
>> -spin_lock(&fp->fi_lock);
>>  status = nfs4_file_check_deny(fp, open->op_share_deny);
>>  if (status == nfs_ok) {
>>  set_deny(open->op_share_deny, stp);
>> @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
>> struct svc_fh *current_fh, struct nf
>>  status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
>> open);
>>  if (status) {
>>  up_read(&stp->st_rwsem);
>> +/*
>> + * EAGAIN is returned when there's a
>> racing access,
>> + * this should never happen as we are the
>> only user
>> + * of this new state, and since it's not
>> yet hashed,
>> + * nobody can find it
>> + */
>> +WARN_ON(status == nfserr_eagain);
> 
> Ok, some more testing shows that this CAN happen.
> So this patch is inferior to the mutex one after all.
> 
 
 Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
 you can get this when there is a lease break in progress, and that may
 be occurring for a completely different stateid (or because of samba,
 etc...)
 
 It may be possible to do something like this, but we'd need to audit
 all of the handling of st_access_bmap (and the deny bmap) to ensure
 that we get it right.
 
 For now, I think just turning that rwsem into a mutex is the best
 solution. That is a per-stateid mutex so any contention is going to be
 due to the client sending racing OPEN calls for the same inode anyway.
 Allowing those to run in parallel again could be useful in some cases,
 but most use-cases won't be harmed by that serialization.
>>> 
>>> OK, so for now my plan is to take "nfsd: Always lock state exclusively"
>>> for 4.7.  Thanks to both of you for your work on this….
>> 
>> 
>> FYI, I just hit this again with the "Always lock state exclusive

Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-11 Thread Jeff Layton
On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote:
> On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote:
> 
> > On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote:
> > > On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
> > > > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
> > > > 
> > > > > Currently there's an unprotected access mode check in
> > > > > nfs4_upgrade_open
> > > > > that then calls nfs4_get_vfs_file which in turn assumes whatever
> > > > > access mode was present in the state is still valid which is racy.
> > > > > Two nfs4_get_vfs_file van enter the same path as result and get two
> > > > > references to nfs4_file, but later drop would only happens once
> > > > > because
> > > > > access mode is only denoted by bits, so no refcounting.
> > > > > 
> > > > > The locking around access mode testing is introduced to avoid this
> > > > > race.
> > > > > 
> > > > > Signed-off-by: Oleg Drokin 
> > > > > ---
> > > > > 
> > > > > This patch performs equally well to the st_rwsem -> mutex
> > > > > conversion,
> > > > > but is a bit ligher-weight I imagine.
> > > > > For one it seems to allow truncates in parallel if we ever want it.
> > > > > 
> > > > > fs/nfsd/nfs4state.c | 28 +---
> > > > > 1 file changed, 25 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index f5f82e1..d4b9eba 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
> > > > > svc_rqst *rqstp, struct nfs4_file *fp,
> > > > > 
> > > > >   spin_lock(&fp->fi_lock);
> > > > > 
> > > > > + if (test_access(open->op_share_access, stp)) {
> > > > > + spin_unlock(&fp->fi_lock);
> > > > > + return nfserr_eagain;
> > > > > + }
> > > > > +
> > > > >   /*
> > > > >* Are we trying to set a deny mode that would conflict with
> > > > >* current access?
> > > > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
> > > > > struct nfs4_file *fp, struct svc_fh *c
> > > > >   __be32 status;
> > > > >   unsigned char old_deny_bmap = stp->st_deny_bmap;
> > > > > 
> > > > > - if (!test_access(open->op_share_access, stp))
> > > > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > > > open);
> > > > > +again:
> > > > > + spin_lock(&fp->fi_lock);
> > > > > + if (!test_access(open->op_share_access, stp)) {
> > > > > + spin_unlock(&fp->fi_lock);
> > > > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > > > open);
> > > > > + /*
> > > > > +  * Somebody won the race for access while we did
> > > > > not hold
> > > > > +  * the lock here
> > > > > +  */
> > > > > + if (status == nfserr_eagain)
> > > > > + goto again;
> > > > > + return status;
> > > > > + }
> > > > > 
> > > > >   /* test and set deny mode */
> > > > > - spin_lock(&fp->fi_lock);
> > > > >   status = nfs4_file_check_deny(fp, open->op_share_deny);
> > > > >   if (status == nfs_ok) {
> > > > >   set_deny(open->op_share_deny, stp);
> > > > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
> > > > > struct svc_fh *current_fh, struct nf
> > > > >   status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
> > > > > open);
> > > > >   if (status) {
> > > > >   up_read(&stp->st_rwsem);
> > > > > + /*
> > > > > +  * EAGAIN is returned when there's a
> > > > > racing access,
> > > > > +  * this should never happen as we are the
> > > > > only user
> > > > > +  * of this new state, and since it's not
> > > > > yet hashed,
> > > > > +  * nobody can find it
> > > > > +  */
> > > > > + WARN_ON(status == nfserr_eagain);
> > > > 
> > > > Ok, some more testing shows that this CAN happen.
> > > > So this patch is inferior to the mutex one after all.
> > > > 
> > > 
> > > Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
> > > you can get this when there is a lease break in progress, and that may
> > > be occurring for a completely different stateid (or because of samba,
> > > etc...)
> > > 
> > > It may be possible to do something like this, but we'd need to audit
> > > all of the handling of st_access_bmap (and the deny bmap) to ensure
> > > that we get it right.
> > > 
> > > For now, I think just turning that rwsem into a mutex is the best
> > > solution. That is a per-stateid mutex so any contention is going to be
> > > due to the client sending racing OPEN calls for the same inode anyway.
> > > Allowing those to run in parallel again could be useful in some cases,
> > > but most use-cases won't be harmed by that serialization.

Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-11 Thread Oleg Drokin

On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote:

> On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote:
>> On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
>>> On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
>>> 
 Currently there's an unprotected access mode check in
 nfs4_upgrade_open
 that then calls nfs4_get_vfs_file which in turn assumes whatever
 access mode was present in the state is still valid which is racy.
 Two nfs4_get_vfs_file van enter the same path as result and get two
 references to nfs4_file, but later drop would only happens once
 because
 access mode is only denoted by bits, so no refcounting.
 
 The locking around access mode testing is introduced to avoid this
 race.
 
 Signed-off-by: Oleg Drokin 
 ---
 
 This patch performs equally well to the st_rwsem -> mutex
 conversion,
 but is a bit ligher-weight I imagine.
 For one it seems to allow truncates in parallel if we ever want it.
 
 fs/nfsd/nfs4state.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)
 
 diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
 index f5f82e1..d4b9eba 100644
 --- a/fs/nfsd/nfs4state.c
 +++ b/fs/nfsd/nfs4state.c
 @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
 svc_rqst *rqstp, struct nfs4_file *fp,
 
spin_lock(&fp->fi_lock);
 
 +  if (test_access(open->op_share_access, stp)) {
 +  spin_unlock(&fp->fi_lock);
 +  return nfserr_eagain;
 +  }
 +
/*
 * Are we trying to set a deny mode that would conflict with
 * current access?
 @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
 struct nfs4_file *fp, struct svc_fh *c
__be32 status;
unsigned char old_deny_bmap = stp->st_deny_bmap;
 
 -  if (!test_access(open->op_share_access, stp))
 -  return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
 open);
 +again:
 +  spin_lock(&fp->fi_lock);
 +  if (!test_access(open->op_share_access, stp)) {
 +  spin_unlock(&fp->fi_lock);
 +  status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
 open);
 +  /*
 +   * Somebody won the race for access while we did
 not hold
 +   * the lock here
 +   */
 +  if (status == nfserr_eagain)
 +  goto again;
 +  return status;
 +  }
 
/* test and set deny mode */
 -  spin_lock(&fp->fi_lock);
status = nfs4_file_check_deny(fp, open->op_share_deny);
if (status == nfs_ok) {
set_deny(open->op_share_deny, stp);
 @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
 struct svc_fh *current_fh, struct nf
status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
 open);
if (status) {
up_read(&stp->st_rwsem);
 +  /*
 +   * EAGAIN is returned when there's a
 racing access,
 +   * this should never happen as we are the
 only user
 +   * of this new state, and since it's not
 yet hashed,
 +   * nobody can find it
 +   */
 +  WARN_ON(status == nfserr_eagain);
>>> 
>>> Ok, some more testing shows that this CAN happen.
>>> So this patch is inferior to the mutex one after all.
>>> 
>> 
>> Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
>> you can get this when there is a lease break in progress, and that may
>> be occurring for a completely different stateid (or because of samba,
>> etc...)
>> 
>> It may be possible to do something like this, but we'd need to audit
>> all of the handling of st_access_bmap (and the deny bmap) to ensure
>> that we get it right.
>> 
>> For now, I think just turning that rwsem into a mutex is the best
>> solution. That is a per-stateid mutex so any contention is going to be
>> due to the client sending racing OPEN calls for the same inode anyway.
>> Allowing those to run in parallel again could be useful in some cases,
>> but most use-cases won't be harmed by that serialization.
> 
> OK, so for now my plan is to take "nfsd: Always lock state exclusively"
> for 4.7.  Thanks to both of you for your work on this….


FYI, I just hit this again with the "Always lock state exclusively" patch too.
I hate when that happens. But it is much harder to hit now.

the trace is also in the nfs4_get_vfs_file() that's called directly from
nfsd4_process_open2().

Otherwise the symptoms are pretty same - first I get the warning in set_access
that the flag is already set and then the nfsd4_free_file_rcu() one and then
unmount of underlying fs fails.

What's strange is I am not sure what else can set the flag.
Basically set_access is called from nfs4_get_vfs_f

Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-10 Thread J . Bruce Fields
On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote:
> On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
> > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
> > 
> > > Currently there's an unprotected access mode check in
> > > nfs4_upgrade_open
> > > that then calls nfs4_get_vfs_file which in turn assumes whatever
> > > access mode was present in the state is still valid which is racy.
> > > Two nfs4_get_vfs_file van enter the same path as result and get two
> > > references to nfs4_file, but later drop would only happens once
> > > because
> > > access mode is only denoted by bits, so no refcounting.
> > > 
> > > The locking around access mode testing is introduced to avoid this
> > > race.
> > > 
> > > Signed-off-by: Oleg Drokin 
> > > ---
> > > 
> > > This patch performs equally well to the st_rwsem -> mutex
> > > conversion,
> > > but is a bit ligher-weight I imagine.
> > > For one it seems to allow truncates in parallel if we ever want it.
> > > 
> > > fs/nfsd/nfs4state.c | 28 +---
> > > 1 file changed, 25 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index f5f82e1..d4b9eba 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
> > > svc_rqst *rqstp, struct nfs4_file *fp,
> > > 
> > >   spin_lock(&fp->fi_lock);
> > > 
> > > + if (test_access(open->op_share_access, stp)) {
> > > + spin_unlock(&fp->fi_lock);
> > > + return nfserr_eagain;
> > > + }
> > > +
> > >   /*
> > >* Are we trying to set a deny mode that would conflict with
> > >* current access?
> > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
> > > struct nfs4_file *fp, struct svc_fh *c
> > >   __be32 status;
> > >   unsigned char old_deny_bmap = stp->st_deny_bmap;
> > > 
> > > - if (!test_access(open->op_share_access, stp))
> > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > open);
> > > +again:
> > > + spin_lock(&fp->fi_lock);
> > > + if (!test_access(open->op_share_access, stp)) {
> > > + spin_unlock(&fp->fi_lock);
> > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > > open);
> > > + /*
> > > +  * Somebody won the race for access while we did
> > > not hold
> > > +  * the lock here
> > > +  */
> > > + if (status == nfserr_eagain)
> > > + goto again;
> > > + return status;
> > > + }
> > > 
> > >   /* test and set deny mode */
> > > - spin_lock(&fp->fi_lock);
> > >   status = nfs4_file_check_deny(fp, open->op_share_deny);
> > >   if (status == nfs_ok) {
> > >   set_deny(open->op_share_deny, stp);
> > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
> > > struct svc_fh *current_fh, struct nf
> > >   status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
> > > open);
> > >   if (status) {
> > >   up_read(&stp->st_rwsem);
> > > + /*
> > > +  * EAGAIN is returned when there's a
> > > racing access,
> > > +  * this should never happen as we are the
> > > only user
> > > +  * of this new state, and since it's not
> > > yet hashed,
> > > +  * nobody can find it
> > > +  */
> > > + WARN_ON(status == nfserr_eagain);
> > 
> > Ok, some more testing shows that this CAN happen.
> > So this patch is inferior to the mutex one after all.
> > 
> 
> Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
> you can get this when there is a lease break in progress, and that may
> be occurring for a completely different stateid (or because of samba,
> etc...)
> 
> It may be possible to do something like this, but we'd need to audit
> all of the handling of st_access_bmap (and the deny bmap) to ensure
> that we get it right.
> 
> For now, I think just turning that rwsem into a mutex is the best
> solution. That is a per-stateid mutex so any contention is going to be
> due to the client sending racing OPEN calls for the same inode anyway.
> Allowing those to run in parallel again could be useful in some cases,
> but most use-cases won't be harmed by that serialization.

OK, so for now my plan is to take "nfsd: Always lock state exclusively"
for 4.7.  Thanks to both of you for your work on this

--b.


Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-10 Thread Jeff Layton
On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote:
> On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:
> 
> > Currently there's an unprotected access mode check in
> > nfs4_upgrade_open
> > that then calls nfs4_get_vfs_file which in turn assumes whatever
> > access mode was present in the state is still valid which is racy.
> > Two nfs4_get_vfs_file van enter the same path as result and get two
> > references to nfs4_file, but later drop would only happens once
> > because
> > access mode is only denoted by bits, so no refcounting.
> > 
> > The locking around access mode testing is introduced to avoid this
> > race.
> > 
> > Signed-off-by: Oleg Drokin 
> > ---
> > 
> > This patch performs equally well to the st_rwsem -> mutex
> > conversion,
> > but is a bit ligher-weight I imagine.
> > For one it seems to allow truncates in parallel if we ever want it.
> > 
> > fs/nfsd/nfs4state.c | 28 +---
> > 1 file changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index f5f82e1..d4b9eba 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct
> > svc_rqst *rqstp, struct nfs4_file *fp,
> > 
> > spin_lock(&fp->fi_lock);
> > 
> > +   if (test_access(open->op_share_access, stp)) {
> > +   spin_unlock(&fp->fi_lock);
> > +   return nfserr_eagain;
> > +   }
> > +
> > /*
> >  * Are we trying to set a deny mode that would conflict with
> >  * current access?
> > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp,
> > struct nfs4_file *fp, struct svc_fh *c
> > __be32 status;
> > unsigned char old_deny_bmap = stp->st_deny_bmap;
> > 
> > -   if (!test_access(open->op_share_access, stp))
> > -   return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > open);
> > +again:
> > +   spin_lock(&fp->fi_lock);
> > +   if (!test_access(open->op_share_access, stp)) {
> > +   spin_unlock(&fp->fi_lock);
> > +   status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp,
> > open);
> > +   /*
> > +    * Somebody won the race for access while we did
> > not hold
> > +    * the lock here
> > +    */
> > +   if (status == nfserr_eagain)
> > +   goto again;
> > +   return status;
> > +   }
> > 
> > /* test and set deny mode */
> > -   spin_lock(&fp->fi_lock);
> > status = nfs4_file_check_deny(fp, open->op_share_deny);
> > if (status == nfs_ok) {
> > set_deny(open->op_share_deny, stp);
> > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
> > struct svc_fh *current_fh, struct nf
> > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp,
> > open);
> > if (status) {
> > up_read(&stp->st_rwsem);
> > +   /*
> > +    * EAGAIN is returned when there's a
> > racing access,
> > +    * this should never happen as we are the
> > only user
> > +    * of this new state, and since it's not
> > yet hashed,
> > +    * nobody can find it
> > +    */
> > +   WARN_ON(status == nfserr_eagain);
> 
> Ok, some more testing shows that this CAN happen.
> So this patch is inferior to the mutex one after all.
> 

Yeah, that can happen for all sorts of reasons. As Andrew pointed out,
you can get this when there is a lease break in progress, and that may
be occurring for a completely different stateid (or because of samba,
etc...)

It may be possible to do something like this, but we'd need to audit
all of the handling of st_access_bmap (and the deny bmap) to ensure
that we get it right.

For now, I think just turning that rwsem into a mutex is the best
solution. That is a per-stateid mutex so any contention is going to be
due to the client sending racing OPEN calls for the same inode anyway.
Allowing those to run in parallel again could be useful in some cases,
but most use-cases won't be harmed by that serialization.

> > release_open_stateid(stp);
> > goto out;
> > }
> > -- 
> > 2.7.4
> 
-- 
Jeff Layton 


Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-09 Thread Oleg Drokin

On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote:

> Currently there's an unprotected access mode check in nfs4_upgrade_open
> that then calls nfs4_get_vfs_file which in turn assumes whatever
> access mode was present in the state is still valid which is racy.
> Two nfs4_get_vfs_file van enter the same path as result and get two
> references to nfs4_file, but later drop would only happens once because
> access mode is only denoted by bits, so no refcounting.
> 
> The locking around access mode testing is introduced to avoid this race.
> 
> Signed-off-by: Oleg Drokin 
> ---
> 
> This patch performs equally well to the st_rwsem -> mutex conversion,
> but is a bit ligher-weight I imagine.
> For one it seems to allow truncates in parallel if we ever want it.
> 
> fs/nfsd/nfs4state.c | 28 +---
> 1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index f5f82e1..d4b9eba 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst 
> *rqstp, struct nfs4_file *fp,
> 
>   spin_lock(&fp->fi_lock);
> 
> + if (test_access(open->op_share_access, stp)) {
> + spin_unlock(&fp->fi_lock);
> + return nfserr_eagain;
> + }
> +
>   /*
>* Are we trying to set a deny mode that would conflict with
>* current access?
> @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct 
> nfs4_file *fp, struct svc_fh *c
>   __be32 status;
>   unsigned char old_deny_bmap = stp->st_deny_bmap;
> 
> - if (!test_access(open->op_share_access, stp))
> - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
> +again:
> + spin_lock(&fp->fi_lock);
> + if (!test_access(open->op_share_access, stp)) {
> + spin_unlock(&fp->fi_lock);
> + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
> + /*
> +  * Somebody won the race for access while we did not hold
> +  * the lock here
> +  */
> + if (status == nfserr_eagain)
> + goto again;
> + return status;
> + }
> 
>   /* test and set deny mode */
> - spin_lock(&fp->fi_lock);
>   status = nfs4_file_check_deny(fp, open->op_share_deny);
>   if (status == nfs_ok) {
>   set_deny(open->op_share_deny, stp);
> @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct 
> svc_fh *current_fh, struct nf
>   status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
>   if (status) {
>   up_read(&stp->st_rwsem);
> + /*
> +  * EAGAIN is returned when there's a racing access,
> +  * this should never happen as we are the only user
> +  * of this new state, and since it's not yet hashed,
> +  * nobody can find it
> +  */
> + WARN_ON(status == nfserr_eagain);

Ok, some more testing shows that this CAN happen.
So this patch is inferior to the mutex one after all.

>   release_open_stateid(stp);
>   goto out;
>   }
> -- 
> 2.7.4



[PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file

2016-06-09 Thread Oleg Drokin
Currently there's an unprotected access mode check in nfs4_upgrade_open
that then calls nfs4_get_vfs_file which in turn assumes whatever
access mode was present in the state is still valid which is racy.
Two nfs4_get_vfs_file van enter the same path as result and get two
references to nfs4_file, but later drop would only happens once because
access mode is only denoted by bits, so no refcounting.

The locking around access mode testing is introduced to avoid this race.

Signed-off-by: Oleg Drokin 
---

This patch performs equally well to the st_rwsem -> mutex conversion,
but is a bit ligher-weight I imagine.
For one it seems to allow truncates in parallel if we ever want it.

 fs/nfsd/nfs4state.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f5f82e1..d4b9eba 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, 
struct nfs4_file *fp,
 
spin_lock(&fp->fi_lock);
 
+   if (test_access(open->op_share_access, stp)) {
+   spin_unlock(&fp->fi_lock);
+   return nfserr_eagain;
+   }
+
/*
 * Are we trying to set a deny mode that would conflict with
 * current access?
@@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct 
nfs4_file *fp, struct svc_fh *c
__be32 status;
unsigned char old_deny_bmap = stp->st_deny_bmap;
 
-   if (!test_access(open->op_share_access, stp))
-   return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
+again:
+   spin_lock(&fp->fi_lock);
+   if (!test_access(open->op_share_access, stp)) {
+   spin_unlock(&fp->fi_lock);
+   status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
+   /*
+* Somebody won the race for access while we did not hold
+* the lock here
+*/
+   if (status == nfserr_eagain)
+   goto again;
+   return status;
+   }
 
/* test and set deny mode */
-   spin_lock(&fp->fi_lock);
status = nfs4_file_check_deny(fp, open->op_share_deny);
if (status == nfs_ok) {
set_deny(open->op_share_deny, stp);
@@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct 
svc_fh *current_fh, struct nf
status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
if (status) {
up_read(&stp->st_rwsem);
+   /*
+* EAGAIN is returned when there's a racing access,
+* this should never happen as we are the only user
+* of this new state, and since it's not yet hashed,
+* nobody can find it
+*/
+   WARN_ON(status == nfserr_eagain);
release_open_stateid(stp);
goto out;
}
-- 
2.7.4