Re: nfsv2 ref leak in 2.6.24?

2007-10-21 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> 
> On Sat, 2007-10-20 at 17:35 -0400, Erez Zadok wrote:
> 
> > Trond, I verified that w/ the above patch the problem is w/ nfs: the client
> > leaves .nfsXXX files behind for every file unlinked while open.  Let me know
> > when you get a fix and I'll test it.
> 
> Doh... Another typo.
> 
> Trond
> -- CUT HERE 
> From: Trond Myklebust <[EMAIL PROTECTED]>
> Date: Sun, 21 Oct 2007 12:02:22 -0400
> NFS: Fix a typo in nfs_call_unlink()
> 
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
> 
>  fs/nfs/unlink.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
> index ce558c2..233ad38 100644
> --- a/fs/nfs/unlink.c
> +++ b/fs/nfs/unlink.c
> @@ -171,7 +171,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct 
> nfs_unlinkdata *data)
>   if (parent == NULL)
>   goto out_free;
>   dir = parent->d_inode;
> - if (nfs_copy_dname(dentry, data) == 0)
> + if (nfs_copy_dname(dentry, data) != 0)
>   goto out_dput;
>   /* Non-exclusive lock protects against concurrent lookup() calls */
>   spin_lock(&dir->i_lock);

With this patch, all my nfs2/3/4 tests passed just fine.

Thanks,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-21 Thread Trond Myklebust

On Sat, 2007-10-20 at 17:35 -0400, Erez Zadok wrote:

> Trond, I verified that w/ the above patch the problem is w/ nfs: the client
> leaves .nfsXXX files behind for every file unlinked while open.  Let me know
> when you get a fix and I'll test it.

Doh... Another typo.

Trond
-- CUT HERE 
From: Trond Myklebust <[EMAIL PROTECTED]>
Date: Sun, 21 Oct 2007 12:02:22 -0400
NFS: Fix a typo in nfs_call_unlink()

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/unlink.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index ce558c2..233ad38 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -171,7 +171,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct 
nfs_unlinkdata *data)
if (parent == NULL)
goto out_free;
dir = parent->d_inode;
-   if (nfs_copy_dname(dentry, data) == 0)
+   if (nfs_copy_dname(dentry, data) != 0)
goto out_dput;
/* Non-exclusive lock protects against concurrent lookup() calls */
spin_lock(&dir->i_lock);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-20 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Erez Zadok writes:
> In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
[...]
> > Looking at
> > nfs_proc_create(), there is indeed a missing call to
> > nfs_mark_for_revalidate(). The reason why you need such a call being the
> > usual one: NFSv2 doesn't provide post-op attributes for the directory.
> > 
> > The patch below ought to fix the problem.
> 
> It fixes some, but breaks others.  The test script I sent yesterday indeed
> passes.  And more of my unionfs-on-nfs2 tests pass, but not all.  Three of
> my unionfs tests (create w/ copyup, unlink open files, and unlink with a
> whiteout) fail b/c they detect leftover silly-renamed files.  Worse, now the
> same three tests also fail when I use unionfs on top of nfs3/nfs4: before
> the one line fix below, unionfs-on-nfsv3/4 worked fine.
> 
> Was there any significant semantic change in the behaviour of silly-renamed
> files in nfs in 2.6.24?  If so, then I may have to change how unionfs
> handles refcounts and such.
> 
> I'll try to dig deeper and see if I can come up with a small test case that
> doesn't involve unionfs.
> 
> Thanks,
> Erez.
> 
> > Cheers
> >   Trond
> > -- CUT HERE ---
> > From: Trond Myklebust <[EMAIL PROTECTED]>
> > Date: Sat, 20 Oct 2007 13:07:21 -0400
> > NFSv2: Ensure that the directory metadata gets revalidated on file create
> > 
> > Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> > ---
> > 
> >  fs/nfs/proc.c |1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> > index 97669ed..4f80d88 100644
> > --- a/fs/nfs/proc.c
> > +++ b/fs/nfs/proc.c
> > @@ -211,6 +211,7 @@ nfs_proc_create(struct inode *dir, struct dentry 
> > *dentry, struct iattr *sattr,
> > nfs_fattr_init(&fattr);
> > dprintk("NFS call  create %s\n", dentry->d_name.name);
> > status = rpc_call_sync(NFS_CLIENT(dir), &msg, 0);
> > +   nfs_mark_for_revalidate(dir);
> > if (status == 0)
> > status = nfs_instantiate(dentry, &fhandle, &fattr);
> > dprintk("NFS reply create: %d\n", status);
> 
> Erez.

Trond, I verified that w/ the above patch the problem is w/ nfs: the client
leaves .nfsXXX files behind for every file unlinked while open.  Let me know
when you get a fix and I'll test it.

Cheers,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-20 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> 
> On Fri, 2007-10-19 at 22:33 -0400, Erez Zadok wrote:
> > Trond, good news.  I was able to narrow down the problem to purely the
> > client-side, probably dcache/readdir related, and I have a shell script that
> > deterministically triggers the problem each time for me (this is a FC6 image
> > under Vmware 6.0.1).  Here's a short shell script which reliably triggers
> > the "lost files" problem -- I create a file via nfs2 on the client side, and
> > sometimes it doesn't show up in readdir, but it is there if you stat it
> > directly.
> 
> Ah... I got confused as to what you were measuring and where.

Yeah, I wasn't sure myself until I narrowed it down to a small test case.

> Looking at
> nfs_proc_create(), there is indeed a missing call to
> nfs_mark_for_revalidate(). The reason why you need such a call being the
> usual one: NFSv2 doesn't provide post-op attributes for the directory.
> 
> The patch below ought to fix the problem.

It fixes some, but breaks others.  The test script I sent yesterday indeed
passes.  And more of my unionfs-on-nfs2 tests pass, but not all.  Three of
my unionfs tests (create w/ copyup, unlink open files, and unlink with a
whiteout) fail b/c they detect leftover silly-renamed files.  Worse, now the
same three tests also fail when I use unionfs on top of nfs3/nfs4: before
the one line fix below, unionfs-on-nfsv3/4 worked fine.

Was there any significant semantic change in the behaviour of silly-renamed
files in nfs in 2.6.24?  If so, then I may have to change how unionfs
handles refcounts and such.

I'll try to dig deeper and see if I can come up with a small test case that
doesn't involve unionfs.

Thanks,
Erez.

> Cheers
>   Trond
> -- CUT HERE ---
> From: Trond Myklebust <[EMAIL PROTECTED]>
> Date: Sat, 20 Oct 2007 13:07:21 -0400
> NFSv2: Ensure that the directory metadata gets revalidated on file create
> 
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> ---
> 
>  fs/nfs/proc.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> index 97669ed..4f80d88 100644
> --- a/fs/nfs/proc.c
> +++ b/fs/nfs/proc.c
> @@ -211,6 +211,7 @@ nfs_proc_create(struct inode *dir, struct dentry *dentry, 
> struct iattr *sattr,
>   nfs_fattr_init(&fattr);
>   dprintk("NFS call  create %s\n", dentry->d_name.name);
>   status = rpc_call_sync(NFS_CLIENT(dir), &msg, 0);
> + nfs_mark_for_revalidate(dir);
>   if (status == 0)
>   status = nfs_instantiate(dentry, &fhandle, &fattr);
>   dprintk("NFS reply create: %d\n", status);

Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-20 Thread Trond Myklebust

On Fri, 2007-10-19 at 22:33 -0400, Erez Zadok wrote:
> Trond, good news.  I was able to narrow down the problem to purely the
> client-side, probably dcache/readdir related, and I have a shell script that
> deterministically triggers the problem each time for me (this is a FC6 image
> under Vmware 6.0.1).  Here's a short shell script which reliably triggers
> the "lost files" problem -- I create a file via nfs2 on the client side, and
> sometimes it doesn't show up in readdir, but it is there if you stat it
> directly.

Ah... I got confused as to what you were measuring and where. Looking at
nfs_proc_create(), there is indeed a missing call to
nfs_mark_for_revalidate(). The reason why you need such a call being the
usual one: NFSv2 doesn't provide post-op attributes for the directory.

The patch below ought to fix the problem.

Cheers
  Trond
-- CUT HERE ---
From: Trond Myklebust <[EMAIL PROTECTED]>
Date: Sat, 20 Oct 2007 13:07:21 -0400
NFSv2: Ensure that the directory metadata gets revalidated on file create

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/proc.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 97669ed..4f80d88 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -211,6 +211,7 @@ nfs_proc_create(struct inode *dir, struct dentry *dentry, 
struct iattr *sattr,
nfs_fattr_init(&fattr);
dprintk("NFS call  create %s\n", dentry->d_name.name);
status = rpc_call_sync(NFS_CLIENT(dir), &msg, 0);
+   nfs_mark_for_revalidate(dir);
if (status == 0)
status = nfs_instantiate(dentry, &fhandle, &fattr);
dprintk("NFS reply create: %d\n", status);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-19 Thread Erez Zadok
Trond, good news.  I was able to narrow down the problem to purely the
client-side, probably dcache/readdir related, and I have a shell script that
deterministically triggers the problem each time for me (this is a FC6 image
under Vmware 6.0.1).  Here's a short shell script which reliably triggers
the "lost files" problem -- I create a file via nfs2 on the client side, and
sometimes it doesn't show up in readdir, but it is there if you stat it
directly.

BTW, since this is a client-side bug, I also tried your last set of posted
client patches for Linus, but it didn't help.

Happy hunting. :-)

Erez.

##
#!/bin/sh

dd if=/dev/zero of=/tmp/fs.$$.0 bs=1024k count=1 seek=100
losetup /dev/loop0 /tmp/fs.$$.0
mkfs -t ext2 -q /dev/loop0
mkdir -p /n/export/b0
mount -t ext2 /dev/loop0 /n/export/b0
exportfs -o no_root_squash,rw localhost:/n/export/b0
mkdir -p /n/client/b0
mount -t nfs -o nfsvers=2 localhost:/n/export/b0 /n/client/b0

count=0
while true; do
cfile="/n/client/b0/F.$count"

if touch $cfile ; then
echo $cfile created
else
echo "touch $cfile failed"
exit 1
fi

if `ls -1 /n/client/b0 | egrep -q F.$count'$'` ; then
:
else
echo "$cfile doesn't exist (but it should)"
ls -l /n/client/b0
ls -l /n/export/b0
stat $cfile
exit 1
fi

let count=count+1
done
##
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-19 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> 
> On Fri, 2007-10-19 at 17:40 -0400, Erez Zadok wrote:
[...]
> > Trond, I was able to narrow down the problem w/o using unionfs at all (yay!
> > :-).  All I do is setup a loop device, mkfs it as ext2, mount it, then
> > export it to localhost and mount it locally as nfs2.  I go and touch a new
> > file in the ext2 directory.  Then I readdir the export point to find the new
> > file indeed.  Then I stat(1) the new file, and get the appropriate stat
> > output.  Now the strange thing is that right after the stat through the
> > export point, the file DISAPPEARS from the lower ext2 dir, but REAPPEARS a
> > few seconds later.
> 
> Let me get this straight:
> 
>   1) You create the file on the server
>   2) You readdir the file from the client
>   3) You stat the file from the client
> 
> with the result that the file disappears from sight on the server?

Yup.

> That would indicate some dcache issues with the server code. What export
> options are you using?

The explicit command I use is:

exportfs -o no_root_squash,rw localhost:/n/export/b0

The options recorded in exportfs -v:

/n/export/b0  
localhost.localdomain(rw,wdelay,no_root_squash,no_subtree_check,anonuid=65534,anongid=65534)

Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-19 Thread Trond Myklebust

On Fri, 2007-10-19 at 17:40 -0400, Erez Zadok wrote:
> In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> > 
> > On Fri, 2007-10-19 at 01:49 -0400, Erez Zadok wrote:
> > > I'm testing unionfs on top of nfsv2/3/4, using 2.6.24 as of linus's commit
> > > 4fa4d23fa20de67df919030c1216295664866ad7.  A lot of my unionfs regression
> > > tests are failing on nfs2, b/c files that should be deleted, aren't.  It
> > > feels like there may be a ref leak that prevents the files from being
> > > deleted, or maybe an unlink issue.  It doesn't happen in all of my 
> > > previous
> > > kernels w/ identical unionfs (code 2.6.9--2.6.23).  And in 2.6.24 it 
> > > happens
> > > only w/ nfs2 -- nfs3/4 are fine.
> > > 
> > > I'm not sure if this is a client or server issue, and I'm only starting to
> > > dig deeper.  But I thought I'd ask you in case this is a known problem and
> > > you have a fix.  If this is the first you hear of this problem, let me 
> > > know
> > > and I'll try to narrow it down further.
> > 
> > A couple of questions:
> > 
> >   * Are these files being sillyrenamed?
> >   * Are they shown as being removed by the server?
> > 
> > Cheers
> >   Trond
> 
> Trond, I was able to narrow down the problem w/o using unionfs at all (yay!
> :-).  All I do is setup a loop device, mkfs it as ext2, mount it, then
> export it to localhost and mount it locally as nfs2.  I go and touch a new
> file in the ext2 directory.  Then I readdir the export point to find the new
> file indeed.  Then I stat(1) the new file, and get the appropriate stat
> output.  Now the strange thing is that right after the stat through the
> export point, the file DISAPPEARS from the lower ext2 dir, but REAPPEARS a
> few seconds later.

Let me get this straight:

  1) You create the file on the server
  2) You readdir the file from the client
  3) You stat the file from the client

with the result that the file disappears from sight on the server?

That would indicate some dcache issues with the server code. What export
options are you using?

> It doesn't happen all the time, so it feels like some sort of a race or
> timing-related bug.  And it only happens w/ nfs2 on 2.6.24 (v3/4 work fine;
> v2/3/4 work find in previous kernels).
> 
> I'm trying to get a script that'll be able to reproduce this for you more
> deterministically.
> 
> BTW, does your just-posted set of patches, subject "[GIT] NFS client fixes
> for 2.6.23++" possibly fix this?  I can try it and let you know.

I doubt the new patches fix the problem if I did indeed get your method
right.

Cheers
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-19 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> 
> On Fri, 2007-10-19 at 01:49 -0400, Erez Zadok wrote:
> > I'm testing unionfs on top of nfsv2/3/4, using 2.6.24 as of linus's commit
> > 4fa4d23fa20de67df919030c1216295664866ad7.  A lot of my unionfs regression
> > tests are failing on nfs2, b/c files that should be deleted, aren't.  It
> > feels like there may be a ref leak that prevents the files from being
> > deleted, or maybe an unlink issue.  It doesn't happen in all of my previous
> > kernels w/ identical unionfs (code 2.6.9--2.6.23).  And in 2.6.24 it happens
> > only w/ nfs2 -- nfs3/4 are fine.
> > 
> > I'm not sure if this is a client or server issue, and I'm only starting to
> > dig deeper.  But I thought I'd ask you in case this is a known problem and
> > you have a fix.  If this is the first you hear of this problem, let me know
> > and I'll try to narrow it down further.
> 
> A couple of questions:
> 
>   * Are these files being sillyrenamed?
>   * Are they shown as being removed by the server?
> 
> Cheers
>   Trond

Trond, I was able to narrow down the problem w/o using unionfs at all (yay!
:-).  All I do is setup a loop device, mkfs it as ext2, mount it, then
export it to localhost and mount it locally as nfs2.  I go and touch a new
file in the ext2 directory.  Then I readdir the export point to find the new
file indeed.  Then I stat(1) the new file, and get the appropriate stat
output.  Now the strange thing is that right after the stat through the
export point, the file DISAPPEARS from the lower ext2 dir, but REAPPEARS a
few seconds later.

It doesn't happen all the time, so it feels like some sort of a race or
timing-related bug.  And it only happens w/ nfs2 on 2.6.24 (v3/4 work fine;
v2/3/4 work find in previous kernels).

I'm trying to get a script that'll be able to reproduce this for you more
deterministically.

BTW, does your just-posted set of patches, subject "[GIT] NFS client fixes
for 2.6.23++" possibly fix this?  I can try it and let you know.

Hope this helps for now.

Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nfsv2 ref leak in 2.6.24?

2007-10-19 Thread Trond Myklebust

On Fri, 2007-10-19 at 01:49 -0400, Erez Zadok wrote:
> I'm testing unionfs on top of nfsv2/3/4, using 2.6.24 as of linus's commit
> 4fa4d23fa20de67df919030c1216295664866ad7.  A lot of my unionfs regression
> tests are failing on nfs2, b/c files that should be deleted, aren't.  It
> feels like there may be a ref leak that prevents the files from being
> deleted, or maybe an unlink issue.  It doesn't happen in all of my previous
> kernels w/ identical unionfs (code 2.6.9--2.6.23).  And in 2.6.24 it happens
> only w/ nfs2 -- nfs3/4 are fine.
> 
> I'm not sure if this is a client or server issue, and I'm only starting to
> dig deeper.  But I thought I'd ask you in case this is a known problem and
> you have a fix.  If this is the first you hear of this problem, let me know
> and I'll try to narrow it down further.

A couple of questions:

  * Are these files being sillyrenamed?
  * Are they shown as being removed by the server?

Cheers
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/