Ok, I think I found the root cause (thanks to UML and GDB). Within the
kernel's do_rename() function in namei.c, it makes a call to a function
called lock_rename() with two dentry arguments, p1 and p2.
The first dentry is for the old parent directory. The second dentry is
for the new parent directory. In the case that I caught in GDB they
should both be dirE because I was running "mv dirE/file1 dirE/file2".
At the top of lock_rename(), the kernel first checks whether the old
parent and the new parent are the same by testing "if (p1 == p2)". It
uses a different locking strategy when that is true. In this case
p1 and p2 _should_ be identical, but the pointers are actually
different even though they refer to the same directory:
(gdb) print p1
$1 = (struct dentry *) 0x2345655c
(gdb) print p2
$2 = (struct dentry *) 0x234566f4
Here are the contents; note that d_iname is "dirE" and d_inode is
0x24098dc8 for both:
(gdb) print *p1
$3 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
next = 0x23456700, pprev = 0x9021d6c}, d_parent = 0x2711bddc, d_name = {
hash = 25991251, len = 4, name = 0x234565b4 "dirE"}, d_lru = {
next = 0x23456580, prev = 0x23456580}, d_u = {d_child = {
next = 0x23456720, prev = 0x2711be10}, d_rcu = {next = 0x23456720,
func = 0x2711be10}}, d_subdirs = {next = 0x23456590, prev = 0x23456590},
d_alias = {next = 0x23456730, prev = 0x24098de0}, d_time = 1515870810,
d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}
(gdb) print *p2
$4 = {d_count = {counter = 1}, d_flags = 0, d_lock = {
raw_lock = {<No data fields>}}, d_inode = 0x24098dc8, d_hash = {
next = 0x0, pprev = 0x23456568}, d_parent = 0x2711bddc, d_name = {
hash = 25991251, len = 4, name = 0x2345674c "dirE"}, d_lru = {
next = 0x23456718, prev = 0x23456718}, d_u = {d_child = {
next = 0x23108ad8, prev = 0x23456588}, d_rcu = {next = 0x23108ad8,
func = 0x23456588}}, d_subdirs = {next = 0x23456728, prev = 0x23456728},
d_alias = {next = 0x24098de0, prev = 0x23456598}, d_time = 1515870810,
d_op = 0x28815494, d_sb = 0x263bd0bc, d_fsdata = 0x0, d_mounted = 0,
d_iname = "dirE\000", 'Z' <repeats 30 times>, "?"}
I assume this is an artifact of how we are invalidating dcache entries
now? Did the kernel get a physically different dentry structure when it
looked up the old dir and the new dir back to back?
It looks like for rename to work, we really need the kernel to be able
to get the same dentry structure consistently if it looks up the same
directory twice in a row. Right now we are actually hanging within this
lock_rename() function before getting to the actual PVFS2 rename function.
-Phil
Phil Carns wrote:
It looks like dbench is hanging on a rename of a file within a
subdirectory. I can replicate it outside of dbench like this:
root@(none):/mnt/pvfs2# pwd
/mnt/pvfs2
root@(none):/mnt/pvfs2# mkdir testdir
root@(none):/mnt/pvfs2# touch testdir/foo
root@(none):/mnt/pvfs2# mv testdir/foo testdir/bar
Everything works fine if I instead try to rename something with no
subdirectory:
root@(none):/mnt/pvfs2# touch foo
root@(none):/mnt/pvfs2# mv foo bar # so far so good
I don't think the "mkdir" and "touch" steps are relevant other than
as an example. I think I get the same hang if the file and subdir
already exist when pvfs is mounted.
-Phil
Phil Carns wrote:
Whoops - it looks like the pvfs2-client-core crash didn't really have
much to do w/ dbench. That was a red herring (a system/compile
problem on my end was what really triggered that), but I think the
fixes to address the bufmap race conditions ended up being a good
thing anyway :)
Anyway, I can't vouch for the current state of dbench with pvfs2
trunk; I think it probably still hangs the client machine just as
before.
-Phil
Robert Latham wrote:
On Thu, Jan 17, 2008 at 06:37:52PM -0600, Phil Carns wrote:
1) dbench somehow kills pvfs2-client-core (not sure why yet)
Hi Phil
Thanks for taking a look at the dbench failures. I confirmed
yesterday that revision 1.32 of
src/kernel/linux-2.6/dcache.c is the last revision to pass dbench.
Sam already knows that's a suspect area. I'm just saying that by
starting with HEAD and reverting that one file to 1.32, dbench (and all
the other nightlies) pass.
Why do subsequent revisions make pvfs2-client-core blow up? No idea.
It's no small change, and related, if I remember correctly, to Sam and
Kevin debugging some BGP issues, so reverting it doesn't seem like the
right thing to do.
==rob