On Wed, Dec 03, 2025 at 09:29:27AM -0800, David Matlack wrote:
> On Wed, Dec 3, 2025 at 7:46 AM Pasha Tatashin <[email protected]> 
> wrote:
> >
> > On Wed, Dec 3, 2025 at 7:55 AM Alex Mastro <[email protected]> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 07:35:53PM +0000, David Matlack wrote:
> > > > From: Vipin Sharma <[email protected]>
> > > >  static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args 
> > > > *args)
> > > >  {
> > > > -     return -EOPNOTSUPP;
> > > > +     struct vfio_pci_core_device_ser *ser;
> > > > +     struct vfio_device *device;
> > > > +     struct folio *folio;
> > > > +     struct file *file;
> > > > +     int ret;
> > > > +
> > > > +     folio = kho_restore_folio(args->serialized_data);
> > > > +     if (!folio)
> > > > +             return -ENOENT;
> > >
> > > Should this be consistent with the behavior of pci_flb_retrieve() which 
> > > panics
> > > on failure? The short circuit failure paths which follow leak the folio,
> 
> Thanks for catching the leaked folio. I'll fix that in the next version.
> 
> > > which seems like a hygiene issue, but the practical significance is moot 
> > > if
> > > vfio_pci_liveupdate_retrieve() failure is catastrophic anyways?
> >
> > pci_flb_retrieve() is used during boot. If it fails, we risk DMA
> > corrupting any memory region, so a panic makes sense. In contrast,
> > this retrieval happens once we are already in userspace, allowing the
> > user to decide how to handle the failure to recover the preserved
> > cdev.
> 
> This is what I was thinking as well. vfio_pci_liveupdate_retrieve()
> runs in the context of the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD, so we
> can just return an error up to userspace if anything goes wrong and
> let userspace initiate the reboot to recover the device if/when it's
> ready.
> 
> OTOH, pci_flb_retrieve() gets called by the kernel during early boot
> to determine what devices the previous kernel preserved. If the kernel
> can't determine which devices were preserved by the previous kernel
> and once the kernel starts preserving I/O page tables, that could lead
> to corruption, so panicking is warranted.

Make sense, thanks for elaborating David and Pasha. 

Reply via email to