On Tue, 2024-04-30 at 17:29 -0400, Benjamin Marzinski wrote:
> On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > >
> > > 1. create a multipath device with a kpartx partition on top of it and
> > > no_path_retry set to either "queue" or something long enough to run all
> > > the commands in the reproducer before it disables queueing.
> > > 2. disable all the paths to the device with something like:
> > > # echo offline > /sys/block/<path_dev>/device/state
> > > 3. Write directly to the multipath device with something like:
> > > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > > 4. delete all the paths to the device with something like:
> > > # echo 1 > /sys/block/<path_dev>/device/delete
> >
> > I've tried to reproduce the issue with these commands. Test system was
> > using a LIO iSCSI target with 2 paths. I created a test script
> > (attached) to try the offline / IO / delete procedure repeatedly.
> > I haven't been able to make multipathd hang even once.
> >
> > I also played around with dd options. If I use oflag=sync or
> > oflag=direct, the dd command itself hangs.
> >
> > Did I set up anything wrongly, or does the behavior perhaps depend on
> > the kernel, or something else perhaps? Mine was a 6.4 kernel. This is
> > not to say there's something wrong with your patch, but I'd like to
> > understand the error situation better, as it doesn't seem to be
> > trigger-able on my test system.
> >
> > multipath.conf:
> >
> > defaults {
> > verbosity 3
> > flush_on_last_del yes
>
> If you set flush_on_last_del to "yes", then you won't be able to hit
> this, because you will never be queueing when multipathd tries to
> autoremove the device. The goal of my patch was to make sure multipathd
> never hung on an autoremove, regardless of the no_path_retry setting and
> the flush_on_last_del setting.

Stupid me. In my defense, I'd set "flush_on_last_del yes" because I had
previously been unable to reproduce the multipathd hang with the
default setting "flush_on_last_del no", and thought I'd misunderstood
something about flush_on_last_del. But apparently I'd made some other
mistake at that point, which kept the issue from reproducing.

I just set "flush_on_last_del no" and indeed reproduced the issue with
my script, immediately.
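
For anyone following along, the offline / IO / delete sequence from the
reproducer can be sketched roughly as below. This is not the attached
script, just a minimal illustration. The device names (mpatha, sdb, sdc),
the DRY_RUN switch and the run/reproduce helpers are placeholders made up
for the example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of reproducer steps 2-4 (offline paths, write, delete paths).
# Assumptions: MPATH_DEV and PATH_DEVS are placeholder names; adjust them
# to the actual multipath map and its path devices before a real run.
MPATH_DEV=${MPATH_DEV:-mpatha}
PATH_DEVS=${PATH_DEVS:-"sdb sdc"}
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

reproduce() {
    # Step 2: take every path offline
    for p in $PATH_DEVS; do
        run "echo offline > /sys/block/$p/device/state"
    done
    # Step 3: buffered write to the multipath device (no oflag=sync/direct,
    # so dd itself should return while the IO stays queued)
    run "dd if=/dev/zero of=/dev/mapper/$MPATH_DEV bs=4K count=1"
    # Step 4: delete every path
    for p in $PATH_DEVS; do
        run "echo 1 > /sys/block/$p/device/delete"
    done
}

reproduce
```

Running it with DRY_RUN=0 as root performs the steps for real. Per the
discussion above, hitting the hang also needs queueing to be active at
that point, i.e. "flush_on_last_del no" and a suitable no_path_retry.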
Thanks,
Martin