Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

Dave Hansen Mon, 05 Nov 2007 08:58:24 -0800

On Mon, 2007-11-05 at 15:40 +0000, Hugh Dickins wrote:
> The second problem was a hang: all cpus in
> handle_write_count_underflow
> doing lock_and_coalesce_cpu_mnt_writer_counts: new -mm stuff from Dave
> Hansen.  At first I thought that was a locking problem in Dave's code,
> but I now suspect it's that your unionfs reference counting is wrong
> somewhere, and the error accumulates until __mnt_writers drops below
> MNT_WRITER_UNDERFLOW_LIMIT, but the coalescence does nothing to help
> and we're stuck in that loop.


I've never actually seen this happen in practice, but I do know exactly
what you're talking about.

> but I hope Dave can
> also make handle_write_count_underflow more robust, it's unfortunate
> if refcount errors elsewhere first show up as a hang there.

Actually, I think your s/while/if/ change is probably a decent fix.
Barring any other races, that loop should always have made progress on
mnt->__mnt_writers the way it is written.  If we get to:

>                 lock_and_coalesce_cpu_mnt_writer_counts();
----------------->HERE
>                 mnt_unlock_cpus();

and don't have a positive mnt->__mnt_writers, we know something is going
badly.  We WARN_ON() there, which should at least give an earlier
warning that the system is not doing well.  But it doesn't fix the
inevitable.  Could you try the attached patch and see if it at least
warns you earlier?

I have a decent guess what the bug is, too.  In the unionfs code:

> int init_lower_nd(struct nameidata *nd, unsigned int flags)
> {
> ...
> #ifdef ALLOC_LOWER_ND_FILE
>                 file = kzalloc(sizeof(struct file), GFP_KERNEL);
>                 if (unlikely(!file)) {
>                         err = -ENOMEM;
>                         break; /* exit switch statement and thus return */
>                 }
>                 nd->intent.open.file = file;
> #endif /* ALLOC_LOWER_ND_FILE */

The r/o bind mount code will mnt_drop_write() on that file's f_vfsmnt at
__fput() time.  Since that code never got a write on the mount, we'll
see an imbalance if the file was opened for a write.  I don't see this
file's mnt set anywhere, so I'm not completely sure that this is it.  In
any case, rolling your own 'struct file' without using alloc_file() and
friends is a no-no.

BTW, I have some "debugging" code in my latest set of patches that I
think should fix this kind of imbalance with the mnt->__mnt_writers().
It ensures that before we do that mnt_drop_write() at __fput() that we
absolutely did a mnt_want_write() at some point in the 'struct file's
life.  

-- Dave

 linux-2.6.git-dave/fs/namespace.c        |   31 ++++++++++++++++++++++---------
 linux-2.6.git-dave/include/linux/mount.h |    1 +
 2 files changed, 23 insertions(+), 9 deletions(-)

diff -puN fs/namei.c~fix-naughty-loop fs/namei.c
diff -puN fs/namespace.c~fix-naughty-loop fs/namespace.c
--- linux-2.6.git/fs/namespace.c~fix-naughty-loop       2007-11-05 
08:03:59.000000000 -0800
+++ linux-2.6.git-dave/fs/namespace.c   2007-11-05 08:35:06.000000000 -0800
@@ -225,16 +225,29 @@ static void lock_and_coalesce_cpu_mnt_wr
  */
 static void handle_write_count_underflow(struct vfsmount *mnt)
 {
-       while (atomic_read(&mnt->__mnt_writers) <
-               MNT_WRITER_UNDERFLOW_LIMIT) {
-               /*
-                * It isn't necessary to hold all of the locks
-                * at the same time, but doing it this way makes
-                * us share a lot more code.
-                */
-               lock_and_coalesce_cpu_mnt_writer_counts();
-               mnt_unlock_cpus();
+       if (atomic_read(&mnt->__mnt_writers) >=
+           MNT_WRITER_UNDERFLOW_LIMIT)
+               return;
+       /*
+        * It isn't necessary to hold all of the locks
+        * at the same time, but doing it this way makes
+        * us share a lot more code.
+        */
+       lock_and_coalesce_cpu_mnt_writer_counts();
+       /*
+        * If coalescing the per-cpu writer counts did not
+        * get us back to a positive writer count, we have
+        * a bug.
+        */
+       if ((atomic_read(&mnt->__mnt_writers) < 0) &&
+           !(mnt->mnt_flags & MNT_IMBALANCED_WRITE_COUNT)) {
+               printk("leak detected on mount(%p) writers count: %d\n",
+                       mnt, atomic_read(&mnt->__mnt_writers));
+               WARN_ON(1);
+               /* use the flag to keep the dmesg spam down */
+               mnt->mnt_flags |= MNT_IMBALANCED_WRITE_COUNT;
        }
+       mnt_unlock_cpus();
 }
 
 /**
diff -puN include/linux/mount.h~fix-naughty-loop include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~fix-naughty-loop        2007-11-05 
08:22:21.000000000 -0800
+++ linux-2.6.git-dave/include/linux/mount.h    2007-11-05 08:28:20.000000000 
-0800
@@ -32,6 +32,7 @@ struct mnt_namespace;
 #define MNT_READONLY   0x40    /* does the user want this to be r/o? */
 
 #define MNT_SHRINKABLE 0x100
+#define MNT_IMBALANCED_WRITE_COUNT     0x200 /* just for debugging */
 
 #define MNT_SHARED     0x1000  /* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE 0x2000  /* if the vfsmount is a unbindable mount */
_


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

Reply via email to