On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <[email protected]> wrote:
>
> On Mon, Dec 01 2025, Pasha Tatashin wrote:
>
> > On Wed, Nov 26, 2025 at 2:36 PM David Matlack <[email protected]> wrote:
> [...]
> >> FLB Locking
> >>
> >> I don't see a way to properly synchronize pci_flb_finish() with
> >> pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> >> dropped by liveupdate_flb_get_incoming() when it returns the pointer
> >> to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> >> could result in a deadlock due to reversing the lock ordering.
>
> My mental model for FLB is that it is a dependency for files, so it
> should always be created (aka prepare) before _any_ of the files, and
> always destroyed (aka finish) after _all_ of the files.
>
> By the time the FLB is being finished, all the files for that FLB should
> also be finished, so there should no longer be a user of the FLB.
>
> Once all of the files are finished, it should be LUO's responsibility to
> make sure liveupdate_flb_get_incoming() returns an error _before_ it
> starts doing the FLB finish. And in FLB finish you should not be needing
> to take any locks.
>
> Why do you want to use the FLB when it is being finished?

The next patch looks at the PCI FLB anytime a device is probed, which
could race with the last device file getting finished, causing the FLB
to be freed. However, it looks like I am going to drop that patch.

But the PCI FLB is still used in PATCH 08 [1] whenever userspace opens a
VFIO cdev or issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check if the
underlying PCI device was preserved. Offline, Jason suggested decoupling
those checks from the FLB, so I'll look into doing that in the next
version.

[1] https://lore.kernel.org/kvm/[email protected]/

> >
> > I will re-introduce _lock/_unlock API to solve this issue.
> >
> >>
> >> FLB Retrieving
> >>
> >> The first patch of this series includes a fix to prevent an FLB from
> >> being retrieved again it is finished. I am wondering if this is the
> >> right approach or if subsystems are expected to stop calling
> >> liveupdate_flb_get_incoming() after an FLB is finished.
>
> IMO once the FLB is finished, LUO should make sure it cannot be
> retrieved, mainly so subsystem code is simpler and less bug-prone.

+1, and I think Pasha is going to do that in the next version of FLB.

