> From: Michael S. Tsirkin <m...@redhat.com>
> Sent: 26 June 2025 12:04 PM
> To: Parav Pandit <pa...@nvidia.com>
> Cc: Stefan Hajnoczi <stefa...@redhat.com>; ax...@kernel.dk;
> virtualizat...@lists.linux.dev; linux-block@vger.kernel.org;
> sta...@vger.kernel.org; NBU-Contact-Li Rongqing (EXTERNAL)
> <lirongq...@baidu.com>; Chaitanya Kulkarni <chaitan...@nvidia.com>;
> xuanz...@linux.alibaba.com; pbonz...@redhat.com;
> jasow...@redhat.com; alok.a.tiw...@oracle.com; Max Gurtovoy
> <mgurto...@nvidia.com>; Israel Rukshin <isra...@nvidia.com>
> Subject: Re: [PATCH v5] virtio_blk: Fix disk deletion hang on device surprise
> removal
> 
> On Thu, Jun 26, 2025 at 06:29:09AM +0000, Parav Pandit wrote:
> > > > > yes however this is not at all different that hotunplug right after 
> > > > > reset.
> > > > >
> > > > For hotunplug after reset, we likely need a timeout handler.
> > > > Because block driver running inside the remove() callback waiting
> > > > for the IO,
> > > may not get notified from driver core to synchronize ongoing remove().
> > >
> > >
> > > Notified of what?
> > Notification that surprise-removal occurred.
> >
> > > So is the scenario that graceful remove starts, and meanwhile a
> > > surprise removal happens?
> > >
> > Right.
> 
> 
> where is it stuck then? can you explain?

I am not sure I understood the question.

Let me try:
The following scenario will hang even with the current fix:

Say,
1. Graceful removal is in progress in the remove() callback, where
del_gendisk() is deleting the disk and waiting for the outstanding requests
to complete.

2. A few requests have not completed yet, and a surprise removal starts.

At this point, the virtio block driver will not get notified by the driver
core layer, because the driver core is likely serializing the remove()
triggered by user/driver unload against the device removal initiated by the
PCI hotplug driver.
So the vblk driver doesn't know that the device is removed, and the block
layer keeps waiting for request completions that never arrive.
So del_gendisk() gets stuck.

This needs some kind of timeout handling to make removal more robust.
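
To make the idea concrete, here is a rough userspace model of a drain wait
with a deadline. It is only a sketch: struct model_disk, model_tick() and
model_drain() are hypothetical names made up for illustration, not kernel or
virtio_blk APIs, and the "tick" is simulated time rather than a real timer.
It only shows that a bounded wait can fail the leftover requests instead of
waiting forever once completions stop arriving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_disk {
	int  inflight;        /* requests submitted but not yet completed */
	bool device_present;  /* false once the device is surprise-removed */
	int  failed;          /* requests failed by the timeout path */
};

/* One tick of simulated time: a present device completes one request. */
static void model_tick(struct model_disk *d)
{
	if (d->device_present && d->inflight > 0)
		d->inflight--;
}

/*
 * Wait for in-flight requests to drain, for at most max_ticks ticks.
 * remove_at_tick is the tick at which the surprise removal happens
 * (negative means it never happens).
 *
 * Returns 0 if every request completed normally, or -1 if the deadline
 * expired and the remaining requests were failed instead.
 */
static int model_drain(struct model_disk *d, int max_ticks, int remove_at_tick)
{
	assert(max_ticks >= 0);

	for (int tick = 0; tick < max_ticks && d->inflight > 0; tick++) {
		if (tick == remove_at_tick)
			d->device_present = false;  /* completions stop here */
		model_tick(d);
	}

	if (d->inflight == 0)
		return 0;

	/* Deadline expired: fail what is left so the waiter can proceed. */
	d->failed = d->inflight;
	d->inflight = 0;
	return -1;
}
```

With three requests in flight and the device removed at tick 1, one request
completes, the other two are failed when the deadline expires, and the wait
returns instead of hanging. Without the removal, all three complete and
nothing is failed.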

Did that answer it, or did I misunderstand the question?
