Hi, Mark

On 2014/2/6 7:31, Mark Fasheh wrote:
> On Fri, Jan 24, 2014 at 12:47:03PM -0800, [email protected] wrote:
>> From: Yiwen Jiang <[email protected]>
>> Subject: ocfs2: fix a tiny race when running dirop_fileop_racer
>>
>> When running dirop_fileop_racer we found a dead lock case.
>>
>> 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
>> /race/16/1 in the filesystem, and let the inode number of dir 16 is less
>> than the inode number of dir race.
>>
>> Node A                          Node B
>> mv /race/16/1 /race/
>>                                 right after Node A has got the
>>                                 EX mode of /race/16/, and tries to
>>                                 get EX mode of /race
>>                                 ls /race/16/
>>
>> In this case, Node A has got the EX mode of /race/16/, and wants to get EX
>> mode of /race/. Node B has got the PR mode of /race/, and wants to get
>> the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead
>> lock happens.
>
> I am confused as to how this race happens.
>
> Something like "ls /race/16' shouldn't hold locks on 'race' and '16' at the
> same time. It should look more like:
>
> <userspace does readdir /race/16>
> PR race
> <kernel looks up '16' in 'race'>
> Unlock PR race
> PR 16
> <get dirents from '16'>
> Unlock PR 16
> <return dirents to userspace>
>
> Can you please explain where I may be going wrong? Also an strace of the
> locked up 'ls' as well as the output of sysrq-t when it's deadlocked would
> help show what's going on.
>       --Mark
>

When doing 'ls /race/16', it calls
vfs_fstatat->..->d_alloc()->ocfs2_lookup() after readdir().
ocfs2_lookup() first gets the PR lock on race, and then gets the PR lock
on 16 in ocfs2_iget() without unlocking PR race.

-- joyce.xue

> --
> Mark Fasheh
>
> _______________________________________________
> Ocfs2-devel mailing list
> [email protected]
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
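
To make the lock-ordering argument above easier to check, here is a minimal
sketch in Python. It is a toy model, not OCFS2 or DLM code: the names
COMPATIBLE, wait_for_edges and has_cycle are hypothetical, and the only
lock modes modeled are PR and EX as described in this thread. It builds a
wait-for graph from "who holds what" and "who wants what", then looks for a
cycle. It compares the reported case (Node B still holds PR on /race while
asking for PR on /race/16) with the expected sequence (Node B drops PR on
/race first).

```python
# Toy model of the scenario in this thread; NOT OCFS2/DLM code.
# Assumption: PR is compatible only with PR; EX is compatible with nothing.
COMPATIBLE = {
    ("PR", "PR"): True,
    ("PR", "EX"): False,
    ("EX", "PR"): False,
    ("EX", "EX"): False,
}


def wait_for_edges(held, wanted):
    """Return {waiter: set of nodes blocking it}.

    held:   {node: {resource: mode}}   locks each node currently holds
    wanted: {node: (resource, mode)}   the one lock each node is requesting
    """
    edges = {}
    for waiter, (resource, mode) in wanted.items():
        blockers = set()
        for holder, locks in held.items():
            if holder == waiter or resource not in locks:
                continue
            if not COMPATIBLE[(locks[resource], mode)]:
                blockers.add(holder)
        edges[waiter] = blockers
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        for blocker in edges.get(node, ()):
            if visit(blocker):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in edges)


# Reported case: B keeps PR on /race while requesting PR on /race/16.
held = {"A": {"/race/16": "EX"}, "B": {"/race": "PR"}}
wanted = {"A": ("/race", "EX"), "B": ("/race/16", "PR")}
deadlocked = has_cycle(wait_for_edges(held, wanted))

# Expected sequence: B has already dropped PR on /race before requesting 16.
held_released = {"A": {"/race/16": "EX"}, "B": {}}
deadlocked_after_release = has_cycle(wait_for_edges(held_released, wanted))

print(deadlocked, deadlocked_after_release)
```

In this model the first case forms the cycle A -> B -> A, while the second
leaves only B waiting on A, so A can proceed and eventually release
/race/16. Whether the real lookup path behaves like the first case is the
claim under discussion in this thread, not something the sketch shows.
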
