Re: [Pvfs2-developers] dbench and pvfs2_bufmap race

Phil Carns Wed, 23 Jan 2008 12:04:32 -0800

Sam, Kevin, and I arrived at a solution that is now checked into both trunk and the 2-7 branch. The short story is that we rolled back to the old dcache approach and then fixed the old "simul #7" bug by adding a pvfs2_d_delete() function. The VFS ended up being too dependent on dcache entries staying put for us to invalidate them automatically.

To my knowledge, all of the mv, simul, touch, and dbench issues related to the dcache are now resolved.

This may also fix the NFS re-exporting issue in trunk, but I have not tried it yet.
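
For readers unfamiliar with the hook: in the Linux VFS, a filesystem's d_delete dentry operation is consulted when the last reference to a dentry is dropped, and a nonzero return tells the dcache not to keep the entry. The userspace sketch below only models that contract. The struct names, the "stale" flag, and the policy inside model_d_delete() are hypothetical illustrations; the message above does not say what pvfs2_d_delete() actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cache_entry;

/* Mirrors the shape of the VFS hook: nonzero means "do not cache". */
struct cache_entry_ops {
    int (*d_delete)(struct cache_entry *entry);
};

struct cache_entry {
    int refcount;                       /* number of active users */
    bool cached;                        /* true while still in the model cache */
    bool stale;                         /* hypothetical marker, not a real field */
    const struct cache_entry_ops *d_op;
};

/* Hypothetical policy: drop only entries already known to be stale. */
static int model_d_delete(struct cache_entry *entry)
{
    return entry->stale ? 1 : 0;
}

static const struct cache_entry_ops model_ops = { model_d_delete };

/* Models releasing a reference. The hook runs only when the last
 * reference goes away, so an entry that is still in use stays put. */
static void model_dput(struct cache_entry *entry)
{
    if (--entry->refcount > 0)
        return;
    if (entry->d_op && entry->d_op->d_delete &&
        entry->d_op->d_delete(entry))
        entry->cached = false;
}
```

The point of the model is the timing: a stale entry with two users survives the first model_dput() and is dropped only on the second, which is consistent with the observation that the VFS depends on dcache entries staying put while it is using them.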


-Phil

Phil Carns wrote:

Ok, I think I found the root cause (thanks to UML and GDB). Within the kernel's do_rename() function in namei.c, it makes a call to a function called lock_rename() with two dentry arguments, p1 and p2.
The first dentry is for the old parent directory. The second dentry is for the new parent directory. In the case that I caught in GDB, they should both be dirE because I was running "mv dirE/file1 dirE/file2".
At the top of lock_rename(), the kernel first checks to see if the old parent and the new parent are the same by checking "if (p1 == p2)". It needs to use a different locking strategy if that is true. In this case p1 and p2 _should_ be equivalent, but the pointers are actually different even though they refer to the same thing:
(gdb) print p1
$1 = (struct dentry *) 0x2345655c
(gdb) print p2
$2 = (struct dentry *) 0x234566f4

Here are the contents; note that d_iname is "dirE" for both:

(gdb) print *p1
$3 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
    raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
    next = 0x23456700, pprev = 0x9021d6c}, d_parent = 0x2711bddc, d_name = {
    hash = 25991251, len = 4, name = 0x234565b4 "dirE"}, d_lru = {
    next = 0x23456580, prev = 0x23456580}, d_u = {d_child = {
      next = 0x23456720, prev = 0x2711be10}, d_rcu = {next = 0x23456720,
      func = 0x2711be10}}, d_subdirs = {next = 0x23456590, prev = 0x23456590},
  d_alias = {next = 0x23456730, prev = 0x24098de0}, d_time = 1515870810,
  d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
  d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}
(gdb) print *p2
$4 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
    raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
    next = 0x0, pprev = 0x23456568}, d_parent = 0x2711bddc, d_name = {
    hash = 25991251, len = 4, name = 0x2345674c "dirE"}, d_lru = {
    next = 0x23456718, prev = 0x23456718}, d_u = {d_child = {
      next = 0x23108ad8, prev = 0x23456588}, d_rcu = {next = 0x23108ad8,
      func = 0x23456588}}, d_subdirs = {next = 0x23456728, prev = 0x23456728},
  d_alias = {next = 0x24098de0, prev = 0x23456598}, d_time = 1515870810,
  d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
  d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}
I assume this is an artifact of how we are invalidating dcache entries now? Did the kernel get a physically different dentry structure when it looked up the old dir and the new dir back to back?
It looks like for rename to work, we really need the kernel to be able to get the same dentry structure consistently if it looks up the same directory twice in a row. Right now we are actually hanging within this lock_rename() function before getting to the actual PVFS2 rename function.
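
The hang can be illustrated with a small userspace model. This is a sketch, not kernel source: it assumes that when the two parent pointers differ, lock_rename() ends up locking each parent's inode in turn. Both dentries in the GDB output above point at the same d_inode (0x24098dc8), so the second lock would be taken on an inode that is already locked. The names model_inode, model_dentry, model_lock(), and model_lock_rename() are made up for this example, and the stand-in lock reports failure instead of blocking so that the program terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields that
 * matter for this illustration are modelled. */
struct model_inode {
    bool locked;                /* stands in for the inode's mutex */
};

struct model_dentry {
    struct model_inode *d_inode;
};

/* Non-blocking stand-in for a mutex lock: returns false where a real,
 * non-recursive mutex would block forever on a second acquisition. */
static bool model_lock(struct model_inode *inode)
{
    if (inode->locked)
        return false;
    inode->locked = true;
    return true;
}

/* Models only the decision described above: compare the parent dentry
 * pointers, take one lock if they match, otherwise lock both parents.
 * Returns true if every lock was acquired, false if the caller would
 * have hung. */
static bool model_lock_rename(struct model_dentry *p1,
                              struct model_dentry *p2)
{
    if (p1 == p2)
        return model_lock(p1->d_inode);

    if (!model_lock(p1->d_inode))
        return false;
    return model_lock(p2->d_inode);   /* fails if p2 shares p1's inode */
}
```

Passing the same model_dentry twice succeeds, as does passing two dentries with different inodes. Passing two distinct dentries that share one inode, which is the situation in the GDB session, makes the second lock fail in the model and would block in the real kernel.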
-Phil

Phil Carns wrote:
It looks like dbench is hanging on a rename of a file within a subdirectory. I can replicate it outside of dbench like this:
  root@(none):/mnt/pvfs2# pwd
  /mnt/pvfs2
  root@(none):/mnt/pvfs2# mkdir testdir
  root@(none):/mnt/pvfs2# touch testdir/foo
  root@(none):/mnt/pvfs2# mv testdir/foo testdir/bar
Everything works fine if I instead try to rename something with no subdirectory:
  root@(none):/mnt/pvfs2# touch foo
  root@(none):/mnt/pvfs2# mv foo bar                  # so far so good
I don't think the "mkdir" and "touch" steps are relevant other than just for an example. I think I get the same hang if the file and subdir already exist when pvfs is mounted.
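
The same steps can be driven from a small C helper, which may be handier than the shell when scripting a regression check. This is a minimal sketch using only standard POSIX calls; the function name rename_within_subdir() and the buffer sizes are choices made for this example. On an affected mount the rename() call itself would be expected to hang rather than return an error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Repeats the shell steps under `base`:
 *   mkdir testdir; touch testdir/foo; mv testdir/foo testdir/bar
 * Returns 0 on success and -1 on any error. */
static int rename_within_subdir(const char *base)
{
    char dir[1024];
    char src[1040];
    char dst[1040];
    int fd;

    /* Reject a base path that would not fit in the buffer. */
    if (snprintf(dir, sizeof(dir), "%s/testdir", base) >= (int)sizeof(dir))
        return -1;
    snprintf(src, sizeof(src), "%s/foo", dir);
    snprintf(dst, sizeof(dst), "%s/bar", dir);

    /* An already existing directory is fine. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    /* Equivalent of "touch": create the file if it is missing. */
    fd = open(src, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* The step that hangs on an affected PVFS2 mount. */
    return rename(src, dst) == 0 ? 0 : -1;
}
```

Calling rename_within_subdir("/mnt/pvfs2") repeats the transcript above; pointing it at a local directory such as /tmp gives a baseline where the call returns 0.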
-Phil

Phil Carns wrote:
Whoops - it looks like the pvfs2-client-core crash didn't really have much to do w/ dbench. That was a red herring (a system/compile problem on my end was what really triggered that), but I think the fixes to address the bufmap race conditions ended up being a good thing anyway :)
Anyway, I can't vouch for the current state of dbench with pvfs2 trunk; I think it probably still hangs the client machine just as before.
-Phil

Robert Latham wrote:
On Thu, Jan 17, 2008 at 06:37:52PM -0600, Phil Carns wrote:
1) dbench somehow kills pvfs2-client-core (not sure why yet)
Hi Phil
Thanks for taking a look at the dbench failures. I confirmed
yesterday that revision 1.32 of src/kernel/linux-2.6/dcache.c is the
last revision to pass dbench.
Sam already knows that's a suspect area.  I'm just saying that by
starting with HEAD and reverting that one file to 1.32, dbench (and
all the other nightlies) pass.
Why do subsequent revisions make pvfs2-client-core blow up? No idea.
It's no small change, and related, if I remember correctly, to Sam and
Kevin debugging some BGP issues, so reverting it doesn't seem like the
right thing to do.

==rob


_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
