RE: [PATCH RFC] pci: report surprise removal events

Parav Pandit Sun, 29 Jun 2025 21:11:29 -0700


> From: Keith Busch <[email protected]>
> Sent: 30 June 2025 05:10 AM
> 
> On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote:
> > On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote:
> > > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote:
> > >
> > > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84.
> >
> > I see. What other way is there to prevent dev->driver from going away,
> > though? I guess I can add a new spinlock and take it both here and
> > when
> > dev->driver changes? Acceptable?
> 
> You're already holding the pci_bus_sem here, so the final device 'put'
> can't have been called yet, so the device is valid and thread safe in this
> context. I think maintaining the desired lifetime of the instantiated driver 
> is
> just a matter of reference counting within your driver.
> 
> Just a thought on your patch, instead of introducing a new callback, you could
> call the existing '->error_detected()' callback with the previously set
> 'pci_channel_io_perm_failure' status. That would totally work for nvme to
> kick its cleanup much quicker than the blk_mq timeout handling we currently
> rely on for this scenario.


error_detected() callback is also called while holding the device_lock() by 
report_error_detected().
So when remove() callback is ongoing for graceful removal and driver is waiting 
for the request completions,

If the error_detected() will be stuck on device lock.

We likely need async notification from driver kernel core without holding the 
device lock, so driver can initiate cleanup in this corner case.

RE: [PATCH RFC] pci: report surprise removal events

Reply via email to