On Tue, 2025-08-26 at 15:36 -0400, Benjamin Marzinski wrote:
> On Tue, Aug 26, 2025 at 10:44:22AM +0200, Martin Wilck wrote:
> > On Mon, 2025-08-25 at 20:51 -0400, Benjamin Marzinski wrote:
> > > On Sun, Aug 24, 2025 at 05:26:50PM +0200, Martin Wilck wrote:
> > > 
> > > > 
> > > > > +     /*
> > > > > +      * Cannot free the reservation because the path that
> > > > > is
> > > > > holding it
> > > > > +      * is not usable. Workaround this by:
> > > > > +      * 1. Suspending the device
> > > > > +      * 2. Preempting the reservation to move it to a
> > > > > usable
> > > > > path
> > > > > +      *    (this removes the registered keys on all paths
> > > > > except
> > > > > the
> > > > > +      *    preempting one. Since the device is suspended,
> > > > > no
> > > > > IO
> > > > > can
> > > > > +      *    go to these unregistered paths and fail).
> > > > > +      * 3. Releasing the reservation on the path that now
> > > > > holds
> > > > > it.
> > > > > +      * 4. Resuming the device (since it no longer
> > > > > matters
> > > > > that
> > > > > most of
> > > > > +      *    that paths no longer have a registered key)
> > > > > +      * 5. Reregistering keys on all the paths
> > > > > +      */
> > > > > +
> > > > > +     if (!dm_simplecmd_noflush(DM_DEVICE_SUSPEND, mpp-
> > > > > >alias,
> > > > > 0))
> > > > > {
> > > > > +             condlog(0, "%s: release: failed to suspend
> > > > > dm
> > > > > device.",
> > > > 
> > > > Why do you use dm_simplecmd_noflush() here? Shouldn't queued IO
> > > > be
> > > > flushed from the dm device to avoid it being sent to paths that
> > > > are
> > > > going to be unregistered?
> > > > 
> > > 
> > > I'm pretty certain that DM will still flush all the IO from the
> > > target
> > > to DM core before suspending, even with dm_simplecmd_noflush()
> > > set.
> > > In
> > > request based multipath, queued IOs are never stored in the
> > > target.
> > > In
> > > bio based multipath, they are, but they will get flushed back up
> > > to
> > > DM
> > > core when suspending and queued there. No IO should happen
> > > through
> > > the
> > > target after the suspend, until the resume.
> > > dm_simplecmd_noflush()
> > > just
> > > keeps multipath from failing any IO that it had queueing, and
> > > it's
> > > only
> > > really necessary when we resize the device, because if we shrink
> > > the
> > > device, outstanding IO might be outside the new bounds.
> > 
> > OK, thanks for the clarification. I guess I've never fully
> > understood
> > the way queueing works in dm.
> > 
> > What about queueing in the path devices? We'll be removing
> > registration
> > keys, so IO sent by the SCSI layer may end up with RESERVATION
> > CONFLICT
> > errors. To my understanding, without the DM_NOFLUSH_FLAG the kernel
> > will freeze the queue and flush everything, as if the device was
> > closed
> > during shutdown. If DM_NOFLUSH_FLAG is set, this won't happen.
> > What's
> > preventing the SCSI layer from sending IO while we're modifying the
> > registrations?
> 
> In __dm_suspend() we block all new IOs to the dm device here:
> https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2955-L2966
> 
> Once we know that no new IOs are getting sent to the target, we wait
> for
> all the IOs that were send to the target to get completed by calling
> dm_wait_for_completion() here:
> 
> https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2973

I saw that code. I realize now that dm_wait_for_completion() will
finish all IO on underlying path devices. I wasn't 100% certain about
that in the first place.

I also saw that lock_fs() is not called if we set the noflush flag. And
I am wondering if that's what we want. If there's dirty data on the
file system and we drop our registration key, the file system will most
probably error out.

Martin

Reply via email to