On Tue, Aug 22, 2017 at 11:06:03AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 22, 2017 at 03:46:03PM +1000, Dave Chinner wrote:
> > Even if I ignore the fact that buffer completions are run on
> > different workqueues, there seems to be a bigger problem with this
> > sort of completion checking.
> >
> > That is, the trace looks plausible because we are definitely holding an
> > inode locked deep inside a truncate operation where the completion
> > is flagged. Indeed, some transactions that would flag like this
> > could be holding up to 5 inodes locked and have tens of other
> > metadata objects locked. There are potentially tens (maybe even
> > hundreds) of different paths into this IO wait point, and all have
> > different combinations of objects locked when it triggers. So
> > there's massive scope for potential deadlocks....
> >
> > .... and so we must have some way of avoiding this whole class of
> > problems that lockdep is unaware of.
>
> So I did the below little hack, which basically wipes the entire lock
> history when we start a work and thereby disregards/loses the
> dependency on the work 'lock'.

Ok, so now it treats workqueue worker threads like any other process?

> It makes my test box able to boot and build a kernel on XFS, so while I
> see what you're saying (I think), it doesn't appear to instantly show.
>
> Should I run xfstests or something to further verify things are OK? Does
> that need a scratch partition (I keep forgetting how to run that stuff
> :/).

A couple of 4-8GB ramdisks/fake pmem regions is all you need. Put this
in the configs/<hostname>.config file, modifying the devices to suit:

[xfs]
FSTYP=xfs
TEST_DIR=/mnt/test
TEST_DEV=/dev/pmem0
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/pmem1

and run "./check -s xfs -g auto" from the root of the xfstests source
tree.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com