On Tue, Aug 26, 2025 at 10:44:22AM +0200, Martin Wilck wrote:
> On Mon, 2025-08-25 at 20:51 -0400, Benjamin Marzinski wrote:
> > On Sun, Aug 24, 2025 at 05:26:50PM +0200, Martin Wilck wrote:
> > 
> > > 
> > > > +       /*
> > > > +        * Cannot free the reservation because the path that is holding it
> > > > +        * is not usable. Work around this by:
> > > > +        * 1. Suspending the device
> > > > +        * 2. Preempting the reservation to move it to a usable path
> > > > +        *    (this removes the registered keys on all paths except the
> > > > +        *    preempting one. Since the device is suspended, no IO can
> > > > +        *    go to these unregistered paths and fail).
> > > > +        * 3. Releasing the reservation on the path that now holds it.
> > > > +        * 4. Resuming the device (since it no longer matters that most of
> > > > +        *    the paths no longer have a registered key)
> > > > +        * 5. Reregistering keys on all the paths
> > > > +        */
> > > > +
> > > > +       if (!dm_simplecmd_noflush(DM_DEVICE_SUSPEND, mpp->alias, 0)) {
> > > > +               condlog(0, "%s: release: failed to suspend dm device.",
> > > 
> > > Why do you use dm_simplecmd_noflush() here? Shouldn't queued IO be
> > > flushed from the dm device to avoid it being sent to paths that are
> > > going to be unregistered?
> > > 
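A toy model of the five steps in the quoted comment may help when checking the ordering. Everything in it is hypothetical: struct mpath_model and the helper functions are stand-ins that only track which path holds the reservation and which paths have a registered key. They do not issue PR commands or talk to device-mapper. The model assumes, as the comment says, that preempting from one path removes the registrations on all the other paths.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PATHS 8

/* Hypothetical model of a multipath device: no real PR commands are sent. */
struct mpath_model {
	bool suspended;
	bool suspend_fails;          /* simulate a failed DM suspend */
	int  holder;                 /* path holding the reservation, -1 if none */
	int  npaths;
	bool registered[MAX_PATHS];  /* is the key registered on each path? */
};

static bool suspend_dev(struct mpath_model *m)
{
	if (m->suspend_fails)
		return false;
	m->suspended = true;
	return true;
}

static void resume_dev(struct mpath_model *m)
{
	m->suspended = false;
}

/* PREEMPT from path p: p takes the reservation, other paths lose their key. */
static bool preempt_from(struct mpath_model *m, int p)
{
	if (!m->registered[p])
		return false;
	for (int i = 0; i < m->npaths; i++)
		if (i != p)
			m->registered[i] = false;
	m->holder = p;
	return true;
}

/* RELEASE from path p: only the current holder can release. */
static bool release_from(struct mpath_model *m, int p)
{
	if (m->holder != p)
		return false;
	m->holder = -1;
	return true;
}

static void register_all(struct mpath_model *m)
{
	for (int i = 0; i < m->npaths; i++)
		m->registered[i] = true;
}

/* Steps 1-5 of the comment; 'usable' is a path that can carry PR commands. */
static bool release_via_preempt(struct mpath_model *m, int usable)
{
	assert(usable >= 0 && usable < m->npaths);
	if (!suspend_dev(m))                      /* 1: suspend */
		return false;
	bool ok = preempt_from(m, usable) &&      /* 2: preempt to usable path */
		  release_from(m, usable);        /* 3: release from it */
	resume_dev(m);                            /* 4: resume */
	register_all(m);                          /* 5: reregister all paths */
	return ok;
}
```

In this sketch, reregistration runs even if the preempt or release step fails, because a preempt that succeeded followed by a failed release would otherwise leave the other paths without a key.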
> > 
> > I'm pretty certain that DM will still flush all the IO from the target
> > to DM core before suspending, even when using dm_simplecmd_noflush().
> > In request-based multipath, queued IOs are never stored in the target.
> > In bio-based multipath, they are, but they will get flushed back up to
> > DM core when suspending and queued there. No IO should happen through
> > the target after the suspend, until the resume. dm_simplecmd_noflush()
> > just keeps multipath from failing any IO that it had queued, and it's
> > only really necessary when we resize the device, because if we shrink
> > the device, outstanding IO might be outside the new bounds.
> 
> OK, thanks for the clarification. I guess I've never fully understood
> the way queueing works in dm.
> 
> What about queueing in the path devices? We'll be removing registration
> keys, so IO sent by the SCSI layer may end up with RESERVATION CONFLICT
> errors. To my understanding, without the DM_NOFLUSH_FLAG the kernel
> will freeze the queue and flush everything, as if the device was closed
> during shutdown. If DM_NOFLUSH_FLAG is set, this won't happen. What's
> preventing the SCSI layer from sending IO while we're modifying the
> registrations?

In __dm_suspend() we block all new IOs to the dm device here:
https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2955-L2966

Once we know that no new IOs are getting sent to the target, we wait for
all the IOs that were sent to the target to complete by calling
dm_wait_for_completion() here:

https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2973

Any IOs that are already inside the multipath target will be handled,
either while being mapped or when the path IO ends, by
multipath_clone_and_map(), __multipath_map_bio(), multipath_end_io(), or
multipath_end_io_bio(). These either complete the IOs or send them back
to DM core for queueing there (which also satisfies
dm_wait_for_completion()).

So by the time the suspend command returns, there won't be any IOs in
flight for the SCSI layer to send to the target, and there can't be
new ones coming in through DM until we resume.
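The ordering described above can be illustrated with a single-threaded toy model. struct dm_model, submit_io(), end_io(), model_suspend() and model_resume() are hypothetical names, not kernel code. The loop in model_suspend() only stands in for dm_wait_for_completion(), and the outcomes array plays the role of multipath deciding whether each in-flight IO completes or goes back to DM core. The model assumes the caller supplies at least one outcome per in-flight IO, and it simplifies resume to "parked IO is dispatched again".

```c
#include <assert.h>
#include <stdbool.h>

enum io_outcome { IO_COMPLETED, IO_REQUEUED };

/* Toy model only; not the kernel's data structures. */
struct dm_model {
	bool blocked;     /* new IO is held off at DM core */
	int  in_flight;   /* IOs currently inside the multipath target */
	int  core_queue;  /* IOs parked in DM core until resume */
	int  completed;
};

/* A new IO: once the device is blocked, it never reaches the target. */
static void submit_io(struct dm_model *dm)
{
	if (dm->blocked)
		dm->core_queue++;
	else
		dm->in_flight++;
}

/* An in-flight IO leaves the target: it either completes or is handed
 * back to DM core for queueing. Both cases drop the in-flight count. */
static void end_io(struct dm_model *dm, enum io_outcome out)
{
	assert(dm->in_flight > 0);
	dm->in_flight--;
	if (out == IO_REQUEUED)
		dm->core_queue++;
	else
		dm->completed++;
}

/* Suspend: first block new IO, then wait until nothing is in flight. */
static void model_suspend(struct dm_model *dm, const enum io_outcome *outcomes)
{
	dm->blocked = true;
	for (int i = 0; dm->in_flight > 0; i++)  /* ~ dm_wait_for_completion() */
		end_io(dm, outcomes[i]);
}

/* Resume: unblock and dispatch whatever DM core was holding. */
static void model_resume(struct dm_model *dm)
{
	dm->blocked = false;
	dm->in_flight += dm->core_queue;
	dm->core_queue = 0;
}
```

The property to check is that in_flight is zero when model_suspend() returns, whatever mix of completions and requeues happened, and that IO submitted while suspended stays in core_queue until model_resume().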

-Ben 
 
> Martin

