On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <[email protected]> wrote:
>
> On Mon, Dec 01 2025, Pasha Tatashin wrote:
>
> > On Wed, Nov 26, 2025 at 2:36 PM David Matlack <[email protected]> wrote:
> [...]
> >> FLB Locking
> >>
> >>   I don't see a way to properly synchronize pci_flb_finish() with
> >>   pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> >>   dropped by liveupdate_flb_get_incoming() when it returns the pointer
> >>   to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> >>   could result in a deadlock due to reversing the lock ordering.
>
> My mental model for FLB is that it is a dependency for files, so it
> should always be created (aka prepare) before _any_ of the files, and
> always destroyed (aka finish) after _all_ of the files.
>
> By the time the FLB is being finished, all the files for that FLB should
> also be finished, so there should no longer be a user of the FLB.
>
> Once all of the files are finished, it should be LUO's responsibility to
> make sure liveupdate_flb_get_incoming() returns an error _before_ it
> starts doing the FLB finish. And in FLB finish you should not be needing
> to take any locks.
>
> Why do you want to use the FLB when it is being finished?

The next patch looks at the PCI FLB anytime a device is probed, which
could race with the last device file getting finished, causing the FLB
to be freed.

However, it looks like I am going to drop that patch. But the PCI FLB
is still used in PATCH 08 [1] whenever userspace opens a VFIO cdev or
issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check if the underlying
PCI device was preserved. Offline, Jason suggested decoupling those
checks from the FLB, so I'll look into doing that in the next version.

[1] https://lore.kernel.org/kvm/[email protected]/

>
> >
> > I will re-introduce _lock/_unlock API to solve this issue.
> >
> >>
> >> FLB Retrieving
> >>
> >>   The first patch of this series includes a fix to prevent an FLB from
> >>   being retrieved again after it is finished. I am wondering if this is the
> >>   right approach or if subsystems are expected to stop calling
> >>   liveupdate_flb_get_incoming() after an FLB is finished.
>
> IMO once the FLB is finished, LUO should make sure it cannot be
> retrieved, mainly so subsystem code is simpler and less bug-prone.

+1, and I think Pasha is going to do that in the next version of FLB.
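
To make the expected behavior concrete, here is a rough userspace model of
"get_incoming fails once the FLB is finished". It is only a sketch: the
flb_model_* names, the files_left counter, the pthread mutex standing in for
the incoming FLB mutex, and the -ENOENT return value are all assumptions made
for illustration, not the actual LUO API or its error codes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an incoming FLB; not the real LUO structure. */
struct flb_model {
	pthread_mutex_t lock;	/* stands in for the incoming FLB mutex */
	void *obj;		/* object handed out to subsystems */
	int files_left;		/* dependent files that have not finished yet */
	bool finished;		/* set once the FLB itself has been finished */
};

/*
 * Hand out the object only while the FLB is still live. The lock is dropped
 * before returning, as described in the thread, so this alone does not
 * protect a caller that keeps using the pointer.
 */
static int flb_model_get_incoming(struct flb_model *flb, void **objp)
{
	int ret = 0;

	pthread_mutex_lock(&flb->lock);
	if (flb->finished)
		ret = -ENOENT;	/* assumed error code */
	else
		*objp = flb->obj;
	pthread_mutex_unlock(&flb->lock);

	return ret;
}

/* Finish one file; the FLB is finished only after the last file is. */
static void flb_model_file_finish(struct flb_model *flb)
{
	pthread_mutex_lock(&flb->lock);
	assert(flb->files_left > 0);
	if (--flb->files_left == 0) {
		/* Mark finished before dropping the object. */
		flb->finished = true;
		flb->obj = NULL;
	}
	pthread_mutex_unlock(&flb->lock);
}
```

This only models the retrieval check. A caller that is not tied to a file's
lifetime and still holds a pointer returned earlier is the separate locking
problem discussed above.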
