Sam, Kevin, and I arrived at a solution that is now checked into both trunk and the 2-7 branch. The short story is that we rolled back to the old dcache approach and then fixed the old "simul #7" bug by adding a pvfs2_d_delete() function. The VFS ended up being too dependent on dcache entries staying put for us to invalidate them automatically.
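
For readers unfamiliar with the d_delete hook, here is a minimal user-space sketch of how such a hook affects caching. It is not the PVFS2 code: this thread does not show the body of pvfs2_d_delete(). The model_* names and the inode_is_bad flag are hypothetical, and the policy (drop the dentry only when its inode is bad) is an assumption made for illustration. The one thing it relies on from the 2.6 VFS is that when the last reference to a dentry is released, a nonzero return from d_delete tells the VFS to drop the dentry rather than keep it cached.

```c
#include <assert.h>
#include <stddef.h>

struct model_dentry;

/* Hypothetical stand-in for the d_delete slot of struct dentry_operations. */
struct model_dentry_ops {
    int (*d_delete)(struct model_dentry *);
};

/* Hypothetical, heavily simplified stand-in for struct dentry. */
struct model_dentry {
    int d_count;        /* reference count */
    int hashed;         /* 1 while the entry is findable in the cache */
    int inode_is_bad;   /* assumption: marks an entry that must not be reused */
    const struct model_dentry_ops *d_op;
};

/* Assumed policy: ask for removal only when the inode is known to be bad. */
int model_d_delete(struct model_dentry *dentry)
{
    return dentry->inode_is_bad ? 1 : 0;
}

/* Model of releasing a reference: consult d_delete only on the last put. */
void model_dput(struct model_dentry *dentry)
{
    assert(dentry != NULL && dentry->d_count > 0);

    if (--dentry->d_count > 0)
        return;                 /* still in use, leave it alone */

    if (dentry->d_op && dentry->d_op->d_delete &&
        dentry->d_op->d_delete(dentry))
        dentry->hashed = 0;     /* dropped instead of staying cached */
}
```

In this model a healthy entry stays hashed after its last model_dput(), so later lookups keep finding the same structure, while a bad entry is removed only once nobody holds a reference to it.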

To my knowledge, all of the mv, simul, touch, and dbench issues related to the dcache are now resolved.

This may also fix the NFS re-exporting issue in trunk, but I have not tried it yet.

-Phil

Phil Carns wrote:
Ok, I think I found the root cause (thanks to UML and GDB). Within the kernel's do_rename() function in namei.c, it makes a call to a function called lock_rename() with two dentry arguments, p1 and p2.

The first dentry is for the old parent directory. The second dentry is for the new parent directory. In the case that I caught in GDB they should both be dirE because I was running "mv dirE/file1 dirE/file2".

At the top of lock_rename(), the kernel first checks to see if the old parent and the new parent are the same by checking "if (p1 == p2)". It uses a different locking strategy when that is true. In this case p1 and p2 _should_ be equivalent, but the pointers are actually different even though they refer to the same thing:

(gdb) print p1
$1 = (struct dentry *) 0x2345655c
(gdb) print p2
$2 = (struct dentry *) 0x234566f4

Here are the contents; note that d_iname is "dirE" for both:

(gdb) print *p1
$3 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
    raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
    next = 0x23456700, pprev = 0x9021d6c}, d_parent = 0x2711bddc, d_name = {
    hash = 25991251, len = 4, name = 0x234565b4 "dirE"}, d_lru = {
    next = 0x23456580, prev = 0x23456580}, d_u = {d_child = {
      next = 0x23456720, prev = 0x2711be10}, d_rcu = {next = 0x23456720,
      func = 0x2711be10}}, d_subdirs = {next = 0x23456590, prev = 0x23456590},
  d_alias = {next = 0x23456730, prev = 0x24098de0}, d_time = 1515870810,
  d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
  d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}
(gdb) print *p2
$4 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
    raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
    next = 0x0, pprev = 0x23456568}, d_parent = 0x2711bddc, d_name = {
    hash = 25991251, len = 4, name = 0x2345674c "dirE"}, d_lru = {
    next = 0x23456718, prev = 0x23456718}, d_u = {d_child = {
      next = 0x23108ad8, prev = 0x23456588}, d_rcu = {next = 0x23108ad8,
      func = 0x23456588}}, d_subdirs = {next = 0x23456728, prev = 0x23456728},
  d_alias = {next = 0x24098de0, prev = 0x23456598}, d_time = 1515870810,
  d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
  d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}

I assume this is an artifact of how we are invalidating dcache entries now? Did the kernel get a physically different dentry structure when it looked up the old dir and the new dir back to back?

It looks like for rename to work, we really need the kernel to be able to get the same dentry structure consistently if it looks up the same directory twice in a row. Right now we are actually hanging within this lock_rename() function before getting to the actual PVFS2 rename function.
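
To make the failure mode concrete, here is a rough user-space model of the parent-locking decision. It is only a sketch: the model_* names are hypothetical, the lock is a plain flag instead of a kernel mutex, and the ancestor-ordering logic that the real lock_rename() performs is left out. It assumes that, when the pointers differ, the kernel goes on to lock each parent's inode in turn. Under that assumption, two distinct dentry structures that share one d_inode lead to a second lock attempt on a lock the caller already holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with a non-recursive lock. */
struct model_inode {
    int locked;                     /* 1 while the lock is held */
};

/* Hypothetical stand-in for struct dentry; only d_inode matters here. */
struct model_dentry {
    struct model_inode *d_inode;
};

/* Returns 0 on success, -1 where a real mutex would block. */
int model_lock(struct model_inode *inode)
{
    if (inode->locked)
        return -1;
    inode->locked = 1;
    return 0;
}

/*
 * Simplified model of locking the two parents of a rename.
 * Returns 0 if locking succeeds, -1 if the caller would hang.
 */
int model_lock_rename(struct model_dentry *p1, struct model_dentry *p2)
{
    assert(p1 != NULL && p2 != NULL);

    /* Same parent: identity is decided by pointer comparison alone. */
    if (p1 == p2)
        return model_lock(p1->d_inode);

    /* Different pointers: treated as two directories, two locks. */
    if (model_lock(p1->d_inode) != 0)
        return -1;
    if (model_lock(p2->d_inode) != 0)
        return -1;                  /* same inode locked twice: hang */
    return 0;
}
```

With a single dentry for dirE passed as both arguments, the model takes the lock once and succeeds. With two separate dentries that both point at dirE's inode, as in the GDB output above, the second model_lock() call fails, which corresponds to the hang inside lock_rename().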

-Phil

Phil Carns wrote:
It looks like dbench is hanging on a rename of a file within a subdirectory. I can replicate it outside of dbench like this:

  root@(none):/mnt/pvfs2# pwd
  /mnt/pvfs2
  root@(none):/mnt/pvfs2# mkdir testdir
  root@(none):/mnt/pvfs2# touch testdir/foo
  root@(none):/mnt/pvfs2# mv testdir/foo testdir/bar

Everything works fine if I instead try to rename something with no subdirectory:

  root@(none):/mnt/pvfs2# touch foo
  root@(none):/mnt/pvfs2# mv foo bar                  # so far so good

I don't think the "mkdir" and "touch" steps are relevant other than just for an example. I think I get the same hang if the file and subdir already exist when pvfs is mounted.

-Phil

Phil Carns wrote:
Whoops - it looks like the pvfs2-client-core crash didn't really have much to do w/ dbench. That was a red herring (a system/compile problem on my end was what really triggered that), but I think the fixes to address the bufmap race conditions ended up being a good thing anyway :)

Anyway, I can't vouch for the current state of dbench with pvfs2 trunk; I think it probably still hangs the client machine just as before.

-Phil

Robert Latham wrote:
On Thu, Jan 17, 2008 at 06:37:52PM -0600, Phil Carns wrote:
> 1) dbench somehow kills pvfs2-client-core (not sure why yet)

Hi Phil,

Thanks for taking a look at the dbench failures. I confirmed yesterday that revision 1.32 of src/kernel/linux-2.6/dcache.c is the last revision to pass dbench. Sam already knows that's a suspect area. I'm just saying that if you start with HEAD and revert that one file to 1.32, dbench (and all the other nightlies) pass.

Why do subsequent revisions make pvfs2-client-core blow up? No idea. It's no small change, and if I remember correctly it is related to Sam and Kevin debugging some BGP issues, so reverting it doesn't seem like the right thing to do.

==rob

_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers