On 7 March 2017 at 17:52, Mike Snitzer <snit...@redhat.com> wrote: > > On 06.03.2017 21:18, Jens Axboe wrote: > > > I like the change, and thanks for tackling this. It's been a pending > > > issue for way too long. I do think we should squash Jack's patch > > > into the original, as it does clean up the code nicely. > > > > > > Do we have a proper test case for this, so we can verify that it > > > does indeed also work in practice? > > > > > Hi Jens, > > > > I can trigger deadlock with in RAID1 with test below: > > > > I create one md with one local loop device and one remote scsi > > exported by SRP. running fio with mix rw on top of md, force_close > > session on storage side. mdx_raid1 is wait on free_array in D state, > > and a lot of fio also in D state in wait_barrier. > > > > With the patch from Neil above, I can no longer trigger it anymore. > > > > The discussion was in link below: > > http://www.spinics.net/lists/raid/msg54680.html > > In addition to Jack's MD raid test there is a DM snapshot deadlock test, > albeit unpolished/needy to get running, see: > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html > > But to actually test block core's ability to handle this, upstream > commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 ("dm: flush queued bios > when process blocks to avoid deadlock") would need to be reverted. > > Also, I know Lars had a drbd deadlock too. Not sure if Jack's MD test > is sufficient to coverage for drbd. Lars? >
As this is just a slightly different implementation, trading some bytes of stack for more local, self-contained, "obvious" code changes (good job!), but follows the same basic idea as my original RFC [*] (see the "inspired-by" tag) I have no doubt it fixes the issues we are able to provoke with DRBD. [*] https://lkml.org/lkml/2016/7/19/263 (where I also already suggest to fix the device-mapper issues by losing the in-device-mapper loop, relying on the loop in generic_make_request()) Cheers, Lars