On Thu, Jun 26, 2025 at 09:19:49AM +0000, Parav Pandit wrote:
>
> > From: Michael S. Tsirkin <m...@redhat.com>
> > Sent: 26 June 2025 12:04 PM
> > To: Parav Pandit <pa...@nvidia.com>
> > Cc: Stefan Hajnoczi <stefa...@redhat.com>; ax...@kernel.dk;
> > virtualizat...@lists.linux.dev; linux-block@vger.kernel.org;
> > sta...@vger.kernel.org; NBU-Contact-Li Rongqing (EXTERNAL)
> > <lirongq...@baidu.com>; Chaitanya Kulkarni <chaitan...@nvidia.com>;
> > xuanz...@linux.alibaba.com; pbonz...@redhat.com;
> > jasow...@redhat.com; alok.a.tiw...@oracle.com; Max Gurtovoy
> > <mgurto...@nvidia.com>; Israel Rukshin <isra...@nvidia.com>
> > Subject: Re: [PATCH v5] virtio_blk: Fix disk deletion hang on device
> > surprise removal
> >
> > On Thu, Jun 26, 2025 at 06:29:09AM +0000, Parav Pandit wrote:
> > > > > > yes however this is not at all different that hotunplug right after
> > > > > > reset.
> > > > > >
> > > > > For hotunplug after reset, we likely need a timeout handler.
> > > > > Because block driver running inside the remove() callback waiting
> > > > > for the IO, may not get notified from driver core to synchronize
> > > > > ongoing remove().
> > > >
> > > >
> > > > Notified of what?
> > > Notification that surprise-removal occurred.
> > >
> > > > So is the scenario that graceful remove starts, and meanwhile a
> > > > surprise removal happens?
> > > >
> > > Right.
> >
> >
> > where is it stuck then? can you explain?
>
> I am not sure I understood the question.
>
> Let me try:
> Following scenario will hang even with the current fix:
>
> Say,
> 1. the graceful removal is ongoing in the remove() callback, where disk
> deletion del_gendisk() is ongoing, which waits for the requests to complete,
>
> 2. Now few requests are yet to complete, and surprise removal started.
>
> At this point, virtio block driver will not get notified by the driver core
> layer, because it is likely serializing remove() happening by user/driver
> unload and PCI hotplug driver-initiated device removal.
> So vblk driver doesn't know that device is removed, block layer is waiting
> for requests completions to arrive which it never gets.
> So del_gendisk() gets stuck.
>
> This needs some kind of timeout handling to improve the situation to make
> removal more robust.
>
> Did I answer or I didn't understand the question?

You did, thanks! How do other drivers handle this?
The issue seems generic.

--
MST
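
For illustration only, the hang described above and the effect of a bounded
wait can be modelled in a few lines of userspace C. This is a sketch under
stated assumptions, not kernel code and not what the patch does: the names
model_dev, model_tick and model_remove are hypothetical, time is counted in
abstract ticks, and "failing" a request just means dropping it from the
in-flight count. Without the timeout bound, the loop in model_remove() would
never exit once the device disappears, which is the del_gendisk() hang in
miniature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a device; not a kernel structure. */
struct model_dev {
	int inflight;    /* requests submitted but not yet completed */
	bool present;    /* becomes false once the device is surprise-removed */
	int removed_at;  /* tick at which surprise removal happens, -1 = never */
};

/* One tick of device time: a device that is still present completes one request. */
static void model_tick(struct model_dev *d, int now)
{
	if (d->removed_at >= 0 && now >= d->removed_at)
		d->present = false;
	if (d->present && d->inflight > 0)
		d->inflight--;
}

/*
 * Model of the wait inside remove(): keep waiting for in-flight requests
 * to drain, but give up after timeout_ticks. Whatever is still in flight
 * at that point is failed instead of being waited on forever.
 * Returns the number of requests that had to be failed.
 */
static int model_remove(struct model_dev *d, int timeout_ticks)
{
	assert(timeout_ticks >= 0);

	for (int now = 0; now < timeout_ticks && d->inflight > 0; now++)
		model_tick(d, now);

	int failed = d->inflight;
	d->inflight = 0;  /* stand-in for completing the requests with an error */
	return failed;
}
```

With a device that stays present, model_remove() drains everything and fails
nothing. With removed_at set to a tick that falls inside the wait, the
remaining requests never complete on their own and are only released when the
bound is hit. A bound that is too short also fails requests on a healthy
device, which is the usual trade-off with any timeout of this kind.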