Sam and I looked at this some more on Friday, and we don't think that the i_mutex lock is needed in d_revalidate at this point. It looks like the main problem was really fixed by Sam's reorganization of the attribute copying functions.

I went ahead and backed the lock out of cvs trunk and will run some simul tests to make sure everything is in working order.

-Phil

Phil Carns wrote:
I've narrowed down a deadlock situation that I need some help untangling.

I got started down this path trying to reproduce a bug that Bart reported in this thread: http://www.beowulf-underground.org/pipermail/pvfs2-developers/2008-June/004080.html

Before I got to that point I got sidetracked with the rename06 test case within LTP. Rename06 deadlocks the file system and becomes unkillable (even with kill -9). The particular case that causes this can be reproduced on the command line with trunk as follows:

  mkdir dir1
  mv dir1 dir1/dir2

That mv command should gracefully return an error about moving dir1 to a subdirectory of itself, but it deadlocks instead.

After chasing this around a bit, I found that it is hanging on dcache.c:112:

http://www.pvfs.org/fisheye/browse/PVFS/src/kernel/linux-2.6/dcache.c?r=1.41#l112

The pvfs2_inode_lock() call is locking the i_mutex variable in this case. This lock was added to solve the simul open problem that Sam ran into a while back:

http://www.beowulf-underground.org/pipermail/pvfs2-developers/2008-March/003975.html

Unfortunately, in this rename case the kernel is already holding i_mutex when it calls d_revalidate. The culprit is the do_rename() function, which locks the mutex within a subroutine called lock_rename():

http://lxr.linux.no/linux+v2.6.21/fs/namei.c#L2520

... and then indirectly triggers a d_revalidate as part of lookup_hash() a few lines later while still holding it:

http://lxr.linux.no/linux+v2.6.21/fs/namei.c#L2522

These steps happen as the kernel is trying to find out information about the two directories in order to perform rename error checking.

Anyone have any ideas what to do?

The copy_attributes_to_inode() function (the core problem in the d_revalidate/simul problem) already uses a different and apparently safe lock for some things: i_lock by way of pvfs2_lock_inode(). It does not currently cover the perms or mode though. I kind of wonder if expanding that lock to cover more fields in copy_attributes_to_inode() would be enough to fix the simul problem, but I can't find any documentation for which fields i_lock is supposed to protect compared to i_mutex.

I also apologize for there being both a pvfs2_lock_inode() and pvfs2_inode_lock() macro that do different things. I added the latter recently when we needed a configure test for i_mutex changing to i_sem, but I didn't notice that there was a confusingly similar macro already. We should definitely rename it :)

-Phil






_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to