Re: NFS/d_splice_alias breakage

2016-06-20 Thread Anna Schumaker
Hi,

On 06/20/2016 10:08 AM, Al Viro wrote:
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.
> 
> I assumed it would go through NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (oops is triggerable
> starting from 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.

I was going to put together an nfs bugfixes pull request for 4.7 this week, so 
I can include the patch there if this is easy to hit.

Anna

> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: NFS/d_splice_alias breakage

2016-06-20 Thread Trond Myklebust

> On Jun 20, 2016, at 11:43, Anna Schumaker  wrote:
> 
> Hi,
> 
> On 06/20/2016 10:08 AM, Al Viro wrote:
>> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>>> It looks like this patch was totally forgotten?
>>> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.
>> 
>> I assumed it would go through NFS tree, seeing that it's NFS-specific and
>> has nothing to do with any of the recent VFS changes (oops is triggerable
>> starting from 3.11); I can certainly put it through vfs.git, and there
>> will be changes nearby, but this one should go into -stable as a separate
>> patch.
> 
> I was going to put together an nfs bugfixes pull request for 4.7 this week, 
> so I can include the patch there if this is easy to hit.
> 

Hi Anna,

Please do, and please keep the Cc: stable…

Thanks
 Trond



Re: NFS/d_splice_alias breakage

2016-06-20 Thread Oleg Drokin

On Jun 20, 2016, at 11:43 AM, Anna Schumaker wrote:

> Hi,
> 
> On 06/20/2016 10:08 AM, Al Viro wrote:
>> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>>> It looks like this patch was totally forgotten?
>>> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.
>> 
>> I assumed it would go through NFS tree, seeing that it's NFS-specific and
>> has nothing to do with any of the recent VFS changes (oops is triggerable
>> starting from 3.11); I can certainly put it through vfs.git, and there
>> will be changes nearby, but this one should go into -stable as a separate
>> patch.
> 
> I was going to put together an nfs bugfixes pull request for 4.7 this week, 
> so I can include the patch there if this is easy to hit.

Yes, it is very easy to hit.



Re: NFS/d_splice_alias breakage

2016-06-20 Thread Al Viro
On Mon, Jun 20, 2016 at 02:54:36PM +, Trond Myklebust wrote:
> 
> > On Jun 20, 2016, at 10:08, Al Viro  wrote:
> > 
> > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> >> It looks like this patch was totally forgotten?
> >> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.
> > 
> > I assumed it would go through NFS tree, seeing that it's NFS-specific and
> > has nothing to do with any of the recent VFS changes (oops is triggerable
> > starting from 3.11); I can certainly put it through vfs.git, and there
> > will be changes nearby, but this one should go into -stable as a separate
> > patch.
> > 
> 
> I’ll take it through the NFS tree.

OK.  It's really -stable fodder, BTW - all you need to trigger that oops is
a hashed negative dentry from an earlier lookup + a symlink created from another
client + an attempt to open from ours.  That gets you d_splice_alias() (or
d_materialise_unique() prior to 3.19) with a hashed dentry, and that triggers the
BUG_ON, leaving us with the parent directory locked.


Re: NFS/d_splice_alias breakage

2016-06-20 Thread Trond Myklebust

> On Jun 20, 2016, at 10:08, Al Viro  wrote:
> 
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.
> 
> I assumed it would go through NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (oops is triggerable
> starting from 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.
> 

I’ll take it through the NFS tree.

Cheers
  Trond



Re: NFS/d_splice_alias breakage

2016-06-20 Thread Al Viro
On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> It looks like this patch was totally forgotten?
> I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.

I assumed it would go through NFS tree, seeing that it's NFS-specific and
has nothing to do with any of the recent VFS changes (oops is triggerable
starting from 3.11); I can certainly put it through vfs.git, and there
will be changes nearby, but this one should go into -stable as a separate
patch.


Re: NFS/d_splice_alias breakage

2016-06-20 Thread Oleg Drokin
It looks like this patch was totally forgotten?
I don't see it in either the vfs or nfs trees, and yet it fixes an easy-to-trigger crash in the nfs code. And I think it's unrelated to the other parallel case too.

On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
> 
>> This one cures the insta-crash I was having, and I see no other ill-effects 
>> so far.
> 
> OK...  I can take it through vfs.git, but I think it'd be better off in
> NFS tree.  Is everyone OK with something like the following?
> 
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
> 
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> the unconditional d_drop() after ->open_context() had been removed.  It had
> been correct for success cases (there ->open_context() itself had been doing
> the dcache manipulations), but not for error ones.  Only one of those (ENOENT)
> got a compensatory d_drop() added in that commit, but in fact it should've
> been done for all errors.  As it is, the case of an O_CREAT non-exclusive open
> on a hashed negative dentry racing with e.g. symlink creation from another
> client ended up with ->open_context() returning an error and proceeding to
> call nfs_lookup() on a hashed dentry, which would've instantly triggered the
> BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
> d_splice_alias()).
> 
> Cc: sta...@vger.kernel.org # v3.10+
> Tested-by: Oleg Drokin 
> Signed-off-by: Al Viro 
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		err = PTR_ERR(inode);
>  		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>  		put_nfs_open_context(ctx);
> +		d_drop(dentry);
>  		switch (err) {
>  		case -ENOENT:
> -			d_drop(dentry);
>  			d_add(dentry, NULL);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;



Re: NFS/d_splice_alias breakage

2016-06-10 Thread Oleg Drokin

On Jun 9, 2016, at 9:33 PM, Oleg Drokin wrote:

> 
> On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:
> 
>> Well, I have some bad news.
>> 
>> This patch does not fix the issue 100% of the time apparently, I just hit it 
>> again.
> 
> Ok, so now finally a piece of good news.
> Whatever was causing this other, much harder to hit crash is gone in -rc2; gone are some other disturbing things I saw.

Famous last words, I guess.
It all returned overnight.

But this time it's different from the past couple.

0x813aa8bb is in nfs4_do_open (/home/green/bk/linux/fs/nfs/nfs4proc.c:2482).
2477			d_drop(dentry);
2478			alias = d_exact_alias(dentry, state->inode);
2479			if (!alias)
2480				alias = d_splice_alias(igrab(state->inode), dentry);
2481			/* d_splice_alias() can't fail here - it's a non-directory */

So it appears the dentry that was negative turned positive in the middle of the
d_exact_alias call… and despite us doing d_drop(dentry), it's already hashed again?
If this can happen in the middle of our operations here, the same thing can
presumably happen in the other place we patched up in nfs_atomic_open -
we do d_drop there, but by the time we get into d_splice_alias it's all hashed
again by some racing thread. That would explain some of the rarer earlier
crashes I had in -rc1 after the initial fix.

I wonder if it could be a remote-fs-specific open vs. open race?

Say we have an atomic open for parent1/file1; this locks parent1 and proceeds
to look up file1.
Now, before the lookup commences, some other thread moves file1 to parent2,
and yet another thread goes to open parent2/file1.
Eventually it all converges with two threads trying to instantiate file1;
if we get it "just wrong", then a clash like what we see can happen, no?
Hm, I guess then both opens would have their own dentry and it's only the inode
that's shared, so that cannot be it.
How can anything find a dentry we presumably just allocated and hash it while we
are not done with it, I wonder?
Also I wonder if all of this is somehow related to this other problem I am having
where drop_nlink reports going into negative territory on a rename() call. I hit
this twice already, and I guess I just need to convert that to a BUG_ON instead
for a closer examination.


The dentry is (I finally have a crashdump):

crash> p *(struct dentry *)0x880055dbd2e8
$4 = {
  d_flags = 4718796, 
  d_seq = {
sequence = 4, 
dep_map = {
  key = 0x8337b1c0 <__key.41115>, 
  class_cache = {0x0, 0x0}, 
  name = 0x81c79c66 "&dentry->d_seq", 
  cpu = 6, 
  ip = 651061426900570
}
  }, 
  d_hash = {
next = 0x0, 
pprev = 0xc90fd9c0
  }, 
  d_parent = 0x8800079d1008, 
  d_name = {
{
  {
hash = 2223188567, 
len = 2
  }, 
  hash_len = 10813123159
}, 
name = 0x880055dbd358 "10"
  }, 
  d_inode = 0x880066d8ab38, 
  d_iname = "10\000", 
  d_lockref = {
{
  {
lock = {
  {
rlock = {
  raw_lock = {
val = {
  counter = 0
}
  }, 
  magic = 3735899821, 
  owner_cpu = 4294967295, 
  owner = 0x, 
  dep_map = {
key = 0x8337b1c8 <__key.41114>, 
class_cache = {0x82c1c3a0 , 0x0}, 
name = 0x81c65bc8 "&(&dentry->d_lockref.lock)->rlock", 
cpu = 3, 
ip = 18446744071583760701
  }
}, 
{
  __padding = "\000\000\000\000╜N╜ч", 
  dep_map = {
key = 0x8337b1c8 <__key.41114>, 
class_cache = {0x82c1c3a0 , 0x0}, 
name = 0x81c65bc8 "&(&dentry->d_lockref.lock)->rlock", 
cpu = 3, 
ip = 18446744071583760701
  }
}
  }
}, 
count = 3
  }
}
  }, 
  d_op = 0x81a46780 , 
  d_sb = 0x88004eaf3000, 
  d_time = 651061426956154, 
  d_fsdata = 0x0, 
  {
d_lru = {
  next = 0x880066d7e3e8, 
  prev = 0x8800036736c8
}, 
d_wait = 0x880066d7e3e8
  }, 
  d_child = {
next = 0x8800268629b8, 
prev = 0x8800079d1128
  }, 
  d_subdirs = {
next = 0x880055dbd408, 
prev = 0x880055dbd408
  }, 
  d_u = {
d_alias = {
  next = 0x0, 
  pprev = 0x880066d8ad20
}, 
d_in_lookup_hash = {
  next = 0x0, 
  pprev = 0x880066d8ad20
}, 
d_rcu = {
  next = 0x0, 
  func = 0x880066d8ad20
}
  }
}

[22845.516232] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[22845.516864] invalid opcode:  [#1] SMP
[22845.517350] Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq joyd

Re: NFS/d_splice_alias breakage

2016-06-09 Thread Oleg Drokin

On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:

> Well, I have some bad news.
> 
> This patch does not fix the issue 100% of the time apparently, I just hit it 
> again.

Ok, so now finally a piece of good news.
Whatever was causing this other, much harder to hit crash is gone in -rc2; gone are some other disturbing things I saw.

Hopefully this patch will get propagated everywhere soonish.


Re: NFS/d_splice_alias breakage

2016-06-06 Thread Oleg Drokin
Well, I have some bad news.

This patch does not fix the issue 100% of the time apparently; I just hit it again.

Only this time it's much harder to trigger, but the stack is the same
(it looks a bit different due to a compiler change). It must be some much
narrower race now.

I still don't have a crashdump, though (apparently the makedumpfile used by
kexec-tools is behind the times and does not work with 4.7.0-rc1 kernels), so
I cannot tell you more about the dentry still.

[12470.365211] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[12470.366336] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[12470.366927] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis virtio_console tpm nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw virtio_blk floppy
[12470.368917] CPU: 7 PID: 15952 Comm: cat Not tainted 4.7.0-rc1-vm-nfs+ #115
[12470.369554] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[12470.370137] task: 8800447447c0 ti: 880049a48000 task.ti: 880049a48000
[12470.371214] RIP: 0010:[]  [] d_splice_alias+0x1e1/0x470
[12470.372340] RSP: 0018:880049a4bab8  EFLAGS: 00010286
[12470.372906] RAX: 8800393372a8 RBX: 88003c781000 RCX: 0001
[12470.373525] RDX: 1895 RSI: 88003c781000 RDI: 8800393372a8
[12470.374145] RBP: 880049a4baf0 R08: 1353641935c2 R09: 
[12470.374777] R10: 0001 R11:  R12: 88003a7b9300
[12470.375407] R13:  R14: 88003bf0d2a8 R15: 
[12470.376016] FS:  7fbb07feb700() GS:88006b80() knlGS:
[12470.377106] CS:  0010 DS:  ES:  CR0: 80050033
[12470.377693] CR2: 55a498c7bdc8 CR3: 479b3000 CR4: 06e0
[12470.378311] Stack:
[12470.378823]  880040008f00 29e67876 88003c781000 88003a7b9300
[12470.379946]   88003bf0d2a8  880049a4bb48
[12470.381075]  8137d63c ffeb 8800 
[12470.382228] Call Trace:
[12470.382766]  [] nfs_lookup+0x15c/0x420
[12470.383363]  [] nfs_atomic_open+0xb1/0x700
[12470.383961]  [] lookup_open+0x2ea/0x770
[12470.384570]  [] path_openat+0x7ff/0x1030
[12470.385169]  [] ? getname_flags+0x4f/0x1f0
[12470.385770]  [] ? kvm_sched_clock_read+0x25/0x40
[12470.386361]  [] do_filp_open+0x91/0x100
[12470.386945]  [] ? _raw_spin_unlock+0x27/0x40
[12470.387559]  [] ? __alloc_fd+0x100/0x200
[12470.388158]  [] do_sys_open+0x130/0x220
[12470.388758]  [] SyS_open+0x1e/0x20
[12470.389327]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
[12470.389929] Code: 83 c4 10 4c 89 f8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 df e8 f1 d6 ff ff 4c 89 f7 e8 19 2a 60 00 45 31 ff eb d9 49 89 ff eb d4 <0f> 0b 48 8b 43 50 4c 8b 78 68 49 8d 97 c8 03 00 00 eb 02 f3 90 
[12470.392299] RIP  [] d_splice_alias+0x1e1/0x470
[12470.392299] RIP  [] d_splice_alias+0x1e1/0x470


On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
> 
>> This one cures the insta-crash I was having, and I see no other ill-effects 
>> so far.
> 
> OK...  I can take it through vfs.git, but I think it'd be better off in
> NFS tree.  Is everyone OK with something like the following?
> 
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
> 
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> the unconditional d_drop() after ->open_context() had been removed.  It had
> been correct for success cases (there ->open_context() itself had been doing
> the dcache manipulations), but not for error ones.  Only one of those (ENOENT)
> got a compensatory d_drop() added in that commit, but in fact it should've
> been done for all errors.  As it is, the case of an O_CREAT non-exclusive open
> on a hashed negative dentry racing with e.g. symlink creation from another
> client ended up with ->open_context() returning an error and proceeding to
> call nfs_lookup() on a hashed dentry, which would've instantly triggered the
> BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
> d_splice_alias()).
> 
> Cc: sta...@vger.kernel.org # v3.10+
> Tested-by: Oleg Drokin 
> Signed-off-by: Al Viro 
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		err = PTR_ERR(inode);
>  		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>  		put_nfs_open_context(ctx);
> +		d_drop(dentry);
>  		switch (err) {
>  		case -ENOENT:
> -			d_drop(dentry);
>  			d_add(dentry, NULL);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;



Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:

> This one cures the insta-crash I was having, and I see no other ill-effects 
> so far.

OK...  I can take it through vfs.git, but I think it'd be better off in
NFS tree.  Is everyone OK with something like the following?

make nfs_atomic_open() call d_drop() on all ->open_context() errors.

In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
the unconditional d_drop() after ->open_context() had been removed.  It had
been correct for success cases (there ->open_context() itself had been doing
the dcache manipulations), but not for error ones.  Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors.  As it is, the case of an O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() returning an error and proceeding to
call nfs_lookup() on a hashed dentry, which would've instantly triggered the
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).

Cc: sta...@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin 
Signed-off-by: Al Viro 
---
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
+		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
-			d_drop(dentry);
 			d_add(dentry, NULL);
 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 			break;


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Oleg Drokin

On Jun 3, 2016, at 12:26 AM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:
> 
>>> Which of the call sites had that been and how does one reproduce that fun?
>>> If you feel that posting a reproducer in the open is a bad idea, just send
>>> it off-list...
>> 
>> This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.
> 
> Bloody hell...  Was that sucker hashed on the entry into nfs_lookup()?
> If we get that with call as a method, we are really fucked.
> 
> 
> 
> Ho-hum...  One of the goto no_open; in nfs_atomic_open()?  That would
> mean a stale negative dentry in dcache that has escaped revalidation...
> Wait a minute, didn't nfs ->d_revalidate() skip everything in some
> cases, leaving it to ->atomic_open()?
> 
> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup().  If not earlier, actually...  How
> about the following?

This one cures the insta-crash I was having, and I see no other ill-effects so 
far.

> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		err = PTR_ERR(inode);
>  		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>  		put_nfs_open_context(ctx);
> +		d_drop(dentry);
>  		switch (err) {
>  		case -ENOENT:
> -			d_drop(dentry);
>  			d_add(dentry, NULL);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;



Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Fri, Jun 03, 2016 at 05:42:53AM +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:
> 
> > Looks like the right thing to do would be to do d_drop() at no_open:,
> > just before we call nfs_lookup().  If not earlier, actually...  How
> > about the following?
> 
> A bit of rationale: dentry in question is negative and attempt to open
> it has just failed; in case it's really negative we did that d_drop()
> anyway (and then unconditionally rehashed it).  In case when we proceed to
> nfs_lookup() and it does not fail, we'll have it rehashed there (with the
> right inode).  What do we lose from doing d_drop() on *any* error here?
> It's negative, with dubious validity.  In the worst case, we had open
> and lookup fail, but it's just us - the sucker really is negative and
> somebody else would be able to revalidate it.  If we drop it here (and
> not rehash later), that somebody else will have to allocate an in-core
> dentry before doing lookup or atomic_open.  Which is negligible extra
> burden.

I suspect that it got broken in commit 275bb3078 (NFSv4: Move dentry
instantiation into the NFSv4-specific atomic open code).  Prior to that
commit, d_drop() was there (error or no error).  Looks like 3.10+, indeed.
I agree that we shouldn't drop it on successful open, but it needs to be
done on errors.  All of them, not just ENOENT, as in that commit.


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:

> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup().  If not earlier, actually...  How
> about the following?

A bit of rationale: dentry in question is negative and attempt to open
it has just failed; in case it's really negative we did that d_drop()
anyway (and then unconditionally rehashed it).  In case when we proceed to
nfs_lookup() and it does not fail, we'll have it rehashed there (with the
right inode).  What do we lose from doing d_drop() on *any* error here?
It's negative, with dubious validity.  In the worst case, we had open
and lookup fail, but it's just us - the sucker really is negative and
somebody else would be able to revalidate it.  If we drop it here (and
not rehash later), that somebody else will have to allocate an in-core
dentry before doing lookup or atomic_open.  Which is negligible extra
burden.

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		err = PTR_ERR(inode);
>  		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>  		put_nfs_open_context(ctx);
> +		d_drop(dentry);
>  		switch (err) {
>  		case -ENOENT:
> -			d_drop(dentry);
>  			d_add(dentry, NULL);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:

> > Which of the call sites had that been and how does one reproduce that fun?
> > If you feel that posting a reproducer in the open is a bad idea, just send
> > it off-list...
> 
> This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.

Bloody hell...  Was that sucker hashed on the entry into nfs_lookup()?
If we get that with call as a method, we are really fucked.



Ho-hum...  One of the goto no_open; in nfs_atomic_open()?  That would
mean a stale negative dentry in dcache that has escaped revalidation...
Wait a minute, didn't nfs ->d_revalidate() skip everything in some
cases, leaving it to ->atomic_open()?

Looks like the right thing to do would be to do d_drop() at no_open:,
just before we call nfs_lookup().  If not earlier, actually...  How
about the following?

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
+		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
-			d_drop(dentry);
 			d_add(dentry, NULL);
 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 			break;


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Oleg Drokin

On Jun 2, 2016, at 11:37 PM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
>> Hello!
>> 
>>   I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug) that seems to have been present since at least 2014 and lets users crash the nfs client locally.
> 
>>> * Cluster filesystems may call this function with a negative, hashed dentry.
>>> * In that case, we know that the inode will be a regular file, and also this
>>> * will only occur during atomic_open. So we need to check for the dentry
>>> * being already hashed only in the final case.
> 
> Comment is long obsolete and should've been removed.  "Cluster filesystem"
> in question was GFS2 and it had been dealt with there.  Mea culpa - should've
> removed the comment as soon as that was done.

Oh, ok. I assumed it was still valid, especially considering the issue at hand,
where what it describes actually happens, and NFS is also a cluster filesystem
of sorts ;)

>> The problem was there at least since 3.10, it appears, where the fs/nfs/dir.c code was calling d_materialise_unique(), which did require the dentry to be unhashed.
>> 
>> Not sure how this was not hit earlier. The crash looks like this (I added a printk to ensure this is what is going on indeed and not some other weird race):
> 
>> [   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode 88010f500c70
> 
> Which of the call sites had that been and how does one reproduce that fun?
> If you feel that posting a reproducer in the open is a bad idea, just send
> it off-list...

This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.

I'll send you the scripts with instructions separately for now.


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Fri, Jun 03, 2016 at 04:26:25AM +0100, Al Viro wrote:
> On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:
> 
> > > Al, I’ve been distracted by personal matters in the past 6 months. What 
> > > is there to guarantee exclusion of the readdirplus dentry instantiation 
> > > and the NFSv4 atomic open in the brave new world of VFS, June 2016 
> > > vintage?
> 
> d_alloc_parallel() use by all parties involved.

Grr...  That should've been a reply to Trond, obviously.


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
> Hello!
> 
>    I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug) that seems to have been present since at least 2014 and lets users crash the nfs client locally.

> >  * Cluster filesystems may call this function with a negative, hashed dentry.
> >  * In that case, we know that the inode will be a regular file, and also this
> >  * will only occur during atomic_open. So we need to check for the dentry
> >  * being already hashed only in the final case.

Comment is long obsolete and should've been removed.  "Cluster filesystem"
in question was GFS2 and it had been dealt with there.  Mea culpa - should've
removed the comment as soon as that was done.

> Removing the BUG_ON head-on is not going to work, since the d_rehash of old is now folded into __d_add, and we might not want to move that condition there.

No, it is not.  It really should not be called that way.

> The problem was there at least since 3.10, it appears, where the fs/nfs/dir.c code was calling d_materialise_unique(), which did require the dentry to be unhashed.
> 
> Not sure how this was not hit earlier. The crash looks like this (I added a printk to ensure this is what is going on indeed and not some other weird race):

> [   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode 88010f500c70

Which of the call sites had that been and how does one reproduce that fun?
If you feel that posting a reproducer in the open is a bad idea, just send
it off-list...


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Fri, Jun 03, 2016 at 12:44:51AM +, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
> 
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);
> 
> IOW: something would have to be acting between the d_drop() and d_splice_alias() above...

How?  The dentry is
* negative (it had better be, or we are _really_ fucked)
* unhashed

How does whoever's rehashing it stumble across that thing?


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Al Viro
On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:

> > Al, I’ve been distracted by personal matters in the past 6 months. What is 
> > there to guarantee exclusion of the readdirplus dentry instantiation and 
> > the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

d_alloc_parallel() use by all parties involved.


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Oleg Drokin

On Jun 2, 2016, at 8:44 PM, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
> 
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);

We live in a reality where there are no more tight races left.
A one-instruction race is now huge, due to commonplace CPU-overcommitted VMs
(and hyperthreading); nobody knows when your VM's CPU is going to be preempted
by the host (while the other CPUs of the same VM are running ahead at full steam).
Stuff that used to take ages to trigger (and trigger again, to understand it
better) is instant now.

> IOW: something would have to be acting between the d_drop() and d_splice_alias() above…

The other CPU ;)
I checked the readdirplus theory and that's not it; there's some other sort of lookup going on - the dentry we crashed on never came through nfs_prime_dcache.

> Al, I’ve been distracted by personal matters in the past 6 months. What is 
> there to guarantee exclusion of the readdirplus dentry instantiation and the 
> NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Note this is not a new 2016-vintage bug. I hit this in 3.10 as well (a baseline kernel from one well-known enterprise vendor, only there we hit it in d_materialise_unique).


Re: NFS/d_splice_alias breakage

2016-06-02 Thread Trond Myklebust


On 6/2/16, 18:46, "linux-nfs-ow...@vger.kernel.org on behalf of Oleg Drokin" 
 wrote:

>Hello!
>
>   I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug) that seems to have been present since at least 2014 and lets users crash the nfs client locally.
>
>   Here's some interesting comment quote first from d_obtain_alias:
>
>>  * Cluster filesystems may call this function with a negative, hashed dentry.
>>  * In that case, we know that the inode will be a regular file, and also this
>>  * will only occur during atomic_open. So we need to check for the dentry
>>  * being already hashed only in the final case.
>>  */
>> struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
>> {
>> 	if (IS_ERR(inode))
>> 		return ERR_CAST(inode);
>> 
>> 	BUG_ON(!d_unhashed(dentry));
>^^ - This does not align well with the quote above.
>
>It got imported here by commit b5ae6b15bd73e35b129408755a0804287a87e041
>
>Removing the BUG_ON head-on is not going to work, since the d_rehash of old is now folded into __d_add, and we might not want to move that condition there.
>
>doing an
>if (d_unhashed(dentry))
>   __d_add(dentry, inode);
>else
>   d_instantiate(dentry, inode);
>
>also does not look super pretty, and who knows how all of the preexisting code
>like __d_find_any_alias would react.
>
>Al, I think you might want to chime in here on how to better handle this?
>
>The problem appears to have been there since at least 3.10, where the 
>fs/nfs/dir.c code
>was calling d_materialise_unique(), which did require the dentry to be unhashed.
>
>Not sure how this was not hit earlier. The crash looks like this (I added
>a printk to ensure this is what is going on indeed and not some other weird 
>race):
>
>[   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode 
>(null) inode 88010f500c70
>[   64.489549] [ cut here ]
>[   64.489642] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
>[   64.489750] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
>[   64.489853] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr 
>acpi_cpufreq i2c_piix4 tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw 
>virtio_blk
>[   64.49] CPU: 6 PID: 7125 Comm: file_concat.sh Not tainted 
>4.7.0-rc1-vm-nfs+ #99
>[   64.492069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>[   64.492489] task: 880110a801c0 ti: 8800c283c000 task.ti: 
>8800c283c000
>[   64.493159] RIP: 0010:[]  [] 
>d_splice_alias+0x31f/0x480
>[   64.493866] RSP: 0018:8800c283fb20  EFLAGS: 00010282
>[   64.494238] RAX: 0067 RBX: 88010f500c70 RCX: 
>
>[   64.494625] RDX: 0067 RSI: 8800d08d2ed0 RDI: 
>88010f500c70
>[   64.495021] RBP: 8800c283fb78 R08: 0001 R09: 
>
>[   64.495407] R10: 0001 R11: 0001 R12: 
>8800d08d2ed0
>[   64.495804] R13:  R14: 8800d4a56f00 R15: 
>8800d0bb8c70
>[   64.496199] FS:  7f94ae25c700() GS:88011f58() 
>knlGS:
>[   64.496859] CS:  0010 DS:  ES:  CR0: 80050033
>[   64.497235] CR2: 55e5d3fb46a4 CR3: ce364000 CR4: 
>06e0
>[   64.497626] Stack:
>[   64.497961]  8132154e 8800c283fb68 81325916 
>
>[   64.498765]   8801109459c0  
>8800d0bb8c70
>[   64.499578]  8800c283fba0 8800d08d2ed0 8800cb927080 
>8800c283fc18
>[   64.500385] Call Trace:
>[   64.500727]  [] ? nfs_lookup+0x17e/0x320
>[   64.501103]  [] ? __put_nfs_open_context+0xc6/0xf0
>[   64.501477]  [] nfs_atomic_open+0x8b/0x430
>[   64.501850]  [] lookup_open+0x29f/0x7a0
>[   64.50]  [] path_openat+0x4de/0xfc0
>[   64.502591]  [] do_filp_open+0x7e/0xe0
>[   64.502964]  [] ? __alloc_fd+0xbc/0x170
>[   64.503347]  [] ? _raw_spin_unlock+0x27/0x40
>[   64.503719]  [] ? __alloc_fd+0xbc/0x170
>[   64.504097]  [] do_sys_open+0x116/0x1f0
>[   64.504465]  [] SyS_open+0x1e/0x20
>[   64.504831]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
>[   64.505218] Code: 01 e8 46 20 5b 00 85 db 74 0b 4c 89 ff 4c 63 fb e8 87 d8 
>ff ff 4c 89 e7 e8 2f 3c 00 00 4c 89 f8 e9 5e fe ff ff 0f 0b 48 89 f8 c3 <0f> 
>0b 48 8b 43 40 4c 8b 78 58 49 8d 8f 58 03 00 00 eb 02 f3 90 
>[   64.508754] RIP  [] d_splice_alias+0x31f/0x480
>[   64.509176]  RSP 

That would have to be a really tight race, since the code in 
_nfs4_open_and_get_state() currently reads:

d_drop(dentry);
alias = d_exact_alias(dentry, state->inode);
if (!alias)
alias = d_splice_alias(igrab(state->inode), dentry);

IOW: something would have to be acting between the d_drop() and 
d_splice_alias() above...

Al, I’ve been distracted by personal matters in the past 6 months. What is 
there to guarantee exclusion of the readdirplus dentry instantiation and the 
NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

NFS/d_splice_alias breakage

2016-06-02 Thread Oleg Drokin
Hello!

   I just came across a bug (trying to run some Lustre test scripts against 
NFS, while hunting for another nfsd bug) that seems to have been present 
since at least 2014 and lets users crash the NFS client locally.

   Here's some interesting comment quote first from d_obtain_alias:

>  * Cluster filesystems may call this function with a negative, hashed dentry.
>  * In that case, we know that the inode will be a regular file, and also this
>  * will only occur during atomic_open. So we need to check for the dentry
>  * being already hashed only in the final case.
>  */
> struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> {
> if (IS_ERR(inode))
> return ERR_CAST(inode);
> 
> BUG_ON(!d_unhashed(dentry));
^^ - This does not align well with the quote above.

It got imported here by commit b5ae6b15bd73e35b129408755a0804287a87e041

Removing the BUG_ON head-on is not going to work, since the d_rehash of old
is now folded into __d_add, and we might not want to move that condition there.

doing an
if (d_unhashed(dentry))
   __d_add(dentry, inode);
else
   d_instantiate(dentry, inode);

also does not look super pretty, and who knows how all of the preexisting code
like __d_find_any_alias would react.

Al, I think you might want to chime in here on how to better handle this?

The problem appears to have been there since at least 3.10, where the fs/nfs/dir.c code
was calling d_materialise_unique(), which did require the dentry to be unhashed.

Not sure how this was not hit earlier. The crash looks like this (I added
a printk to ensure this is what is going on indeed and not some other weird 
race):

[   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode 
(null) inode 88010f500c70
[   64.489549] [ cut here ]
[   64.489642] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[   64.489750] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[   64.489853] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr 
acpi_cpufreq i2c_piix4 tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw 
virtio_blk
[   64.49] CPU: 6 PID: 7125 Comm: file_concat.sh Not tainted 
4.7.0-rc1-vm-nfs+ #99
[   64.492069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   64.492489] task: 880110a801c0 ti: 8800c283c000 task.ti: 
8800c283c000
[   64.493159] RIP: 0010:[]  [] 
d_splice_alias+0x31f/0x480
[   64.493866] RSP: 0018:8800c283fb20  EFLAGS: 00010282
[   64.494238] RAX: 0067 RBX: 88010f500c70 RCX: 
[   64.494625] RDX: 0067 RSI: 8800d08d2ed0 RDI: 88010f500c70
[   64.495021] RBP: 8800c283fb78 R08: 0001 R09: 
[   64.495407] R10: 0001 R11: 0001 R12: 8800d08d2ed0
[   64.495804] R13:  R14: 8800d4a56f00 R15: 8800d0bb8c70
[   64.496199] FS:  7f94ae25c700() GS:88011f58() 
knlGS:
[   64.496859] CS:  0010 DS:  ES:  CR0: 80050033
[   64.497235] CR2: 55e5d3fb46a4 CR3: ce364000 CR4: 06e0
[   64.497626] Stack:
[   64.497961]  8132154e 8800c283fb68 81325916 

[   64.498765]   8801109459c0  
8800d0bb8c70
[   64.499578]  8800c283fba0 8800d08d2ed0 8800cb927080 
8800c283fc18
[   64.500385] Call Trace:
[   64.500727]  [] ? nfs_lookup+0x17e/0x320
[   64.501103]  [] ? __put_nfs_open_context+0xc6/0xf0
[   64.501477]  [] nfs_atomic_open+0x8b/0x430
[   64.501850]  [] lookup_open+0x29f/0x7a0
[   64.50]  [] path_openat+0x4de/0xfc0
[   64.502591]  [] do_filp_open+0x7e/0xe0
[   64.502964]  [] ? __alloc_fd+0xbc/0x170
[   64.503347]  [] ? _raw_spin_unlock+0x27/0x40
[   64.503719]  [] ? __alloc_fd+0xbc/0x170
[   64.504097]  [] do_sys_open+0x116/0x1f0
[   64.504465]  [] SyS_open+0x1e/0x20
[   64.504831]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[   64.505218] Code: 01 e8 46 20 5b 00 85 db 74 0b 4c 89 ff 4c 63 fb e8 87 d8 
ff ff 4c 89 e7 e8 2f 3c 00 00 4c 89 f8 e9 5e fe ff ff 0f 0b 48 89 f8 c3 <0f> 0b 
48 8b 43 40 4c 8b 78 58 49 8d 8f 58 03 00 00 eb 02 f3 90 
[   64.508754] RIP  [] d_splice_alias+0x31f/0x480
[   64.509176]  RSP 

Bye,
Oleg