> On 18 Sep 2023, at 10:58 PM, John Levon <[email protected]> wrote:
>
>
> Observed with base of qemu 6.2.0, but from code inspection it looks to me
> like it's still current in upstream master. Apologies if I have missed a fix
> in this area.
>
> Symptom: run a UEFI-booted SATA CD Windows installer. When it hits "Loading
> files.." screen, run an eject e.g.
>
> virsh qemu-monitor-command 64c6e190-ea7f-49e2-b2d5-6ba1814b00ae
> '{"execute":"eject", "arguments": { "id": "sata0-0-0" } }'
>
> qemu will get stuck like so:
>
> (gdb) bt
> #0 0x00007f8ba4b16036 in ppoll () from /lib64/libc.so.6
> #1 0x0000561813c48ed5 in ppoll (__ss=0x0, __timeout=0x7ffcbd981a70,
> __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:62
> #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
> timeout=timeout@entry=999896128) at ../util/qemu-timer.c:348
> #3 0x0000561813c29be9 in fdmon_poll_wait (ctx=0x56181516e070,
> ready_list=0x7ffcbd981af0, timeout=999896128) at ../util/fdmon-poll.c:80
> #4 0x0000561813c297e1 in aio_poll (ctx=ctx@entry=0x56181516e070,
> blocking=blocking@entry=true) at ../util/aio-posix.c:607
> #5 0x0000561813ae2fad in bdrv_do_drained_begin (poll=true,
> ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x56181533fcc0) at
> ../block/io.c:483
> #6 bdrv_do_drained_begin (bs=0x56181533fcc0, recursive=<optimized out>,
> parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at
> ../block/io.c:446
> #7 0x0000561813ad9982 in blk_drain (blk=0x5618161c1f10) at
> ../block/block-backend.c:1741
> #8 0x0000561813ad9b8c in blk_remove_bs (blk=blk@entry=0x5618161c1f10) at
> ../block/block-backend.c:852
> #9 0x000056181382b8ab in blockdev_remove_medium (has_device=<optimized out>,
> device=<optimized out>, has_id=<optimized out>, id=<optimized out>,
> errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:232
> #10 0x000056181382bfb1 in qmp_eject (has_device=<optimized out>, device=0x0,
> has_id=<optimized out>, id=0x561815e6efe0 "sata0-0-0", has_force=<optimized
> out>, force=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:45
>
> We are stuck forever here:
>
> 351 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> 352 bool poll)
> ...
> 380 if (poll) {
> 381 BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> 382 }
>
> Because the blk root's ->in_flight is 1, as tested by the condition in
> blk_root_drained_poll().
>
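> For reference, that poll condition bottoms out in blk_root_drained_poll(),
> which keeps reporting "busy" as long as blk->in_flight is non-zero. A rough
> paraphrase of block/block-backend.c (not the exact source text):
>
> static bool blk_root_drained_poll(BdrvChild *child)
> {
>     BlockBackend *blk = child->opaque;
>     bool busy = false;
>
>     assert(qatomic_read(&blk->quiesce_counter));
>     if (blk->dev_ops && blk->dev_ops->drained_poll) {
>         busy = blk->dev_ops->drained_poll(blk->dev_opaque);
>     }
>     /* The drain loop keeps polling while any request is still in flight. */
>     return busy || !!blk->in_flight;
> }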
>
> Our blk->in_flight user is stuck here:
>
> 1298 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
> ...
> 1310 blk_dec_in_flight(blk);
> 1311 qemu_co_queue_wait(&blk->queued_requests,
> &blk->queued_requests_lock);
> 1312 blk_inc_in_flight(blk);
>
> Note that before entering this stanza, blk->in_flight was 2. This turns out
> to be due to the ide atapi code. In particular, the guest VM is generating
> lots of read I/O. The "first IO" arrives into blk via:
>
> cd_read_sector()->ide_buffered_readv()->blk_aio_preadv()
>
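> Roughly, the ATAPI read path kicks off an asynchronous read and registers
> cd_read_sector_cb() as its completion callback (paraphrased sketch of
> hw/ide/atapi.c; details and error handling elided):
>
> static int cd_read_sector(IDEState *s)
> {
>     ...
>     /* async read of one 2048-byte CD sector (4 x 512-byte sectors) */
>     ide_buffered_readv(s, (int64_t)s->lba << 2, &s->bus->dma->qiov, 4,
>                        cd_read_sector_cb, s);
>     s->status |= BUSY_STAT;
>     return 0;
> }
>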
> This initial IO completes:
>
> blk_aio_read_entry()->blk_aio_complete()
>
> 1560 static void blk_aio_complete(BlkAioEmAIOCB *acb)
> 1561 {
> 1562 if (acb->has_returned) {
> 1563 acb->common.cb(acb->common.opaque, acb->rwco.ret);
> 1564 blk_dec_in_flight(acb->rwco.blk);
> 1565 qemu_aio_unref(acb);
> 1566 }
> 1567 }
>
> Line 1564 is what we need to move blk->in_flight down to zero, but that is
> never reached! This is because of what happens at line 1563:
>
> acb->common.cb()->cd_read_sector_cb()->ide_atapi_cmd_reply_end()->cd_read_sector_sync()->blk_pread()
>
> That is, the IO callback in the atapi code itself triggers more - synchronous
> - IO.
>
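> Concretely, the synchronous tail of that chain looks roughly like this
> (paraphrased from hw/ide/atapi.c; arguments and error handling elided):
>
> static int cd_read_sector_sync(IDEState *s)
> {
>     int ret;
>     ...
>     /* Synchronous read, issued from inside the completion callback of the
>      * first (asynchronous) read, i.e. while blk->in_flight is still held. */
>     ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS, ...);
>     ...
>     return ret;
> }
>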
> In the meantime, we start processing the blk_drain() code, so by the time this
> blk_pread() actually gets handled, quiesce is set, and we get stuck in the
> blk_wait_while_drained().
>
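> That is because the coroutine behind blk_pread() parks itself as soon as the
> BlockBackend is quiesced; roughly (paraphrased, the exact function name
> differs between versions):
>
> static int coroutine_fn blk_co_do_preadv(BlockBackend *blk, int64_t offset,
>                                          int64_t bytes, QEMUIOVector *qiov,
>                                          BdrvRequestFlags flags)
> {
>     /* Parks on blk->queued_requests once blk->quiesce_counter > 0. */
>     blk_wait_while_drained(blk);
>     ...
> }
>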
> I don't know the qemu block stack well enough to propose an actual fix.
>
> Experimentally, waiting for ->in_flight to drop to zero *before* we quiesce
> in blk_remove_bs() via an AIO_WAIT_WHILE() avoids the symptom, but I'm pretty
> sure that's just a band-aid instead of fixing the deadlock.
>
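> (For illustration only - a hypothetical sketch of that experiment, not a
> proposed fix; blk_remove_bs() already calls blk_drain(), the extra wait just
> runs before it:)
>
> void blk_remove_bs(BlockBackend *blk)
> {
>     ...
>     /* Hypothetical band-aid: let outstanding requests (including the ATAPI
>      * callback's nested synchronous read) finish before quiescing. */
>     AIO_WAIT_WHILE(blk_get_aio_context(blk),
>                    qatomic_read(&blk->in_flight) > 0);
>
>     blk_drain(blk);
>     ...
> }
>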
> Any suggestions/clues/thoughts?
>
> thanks
> john
>
>
Hi Kevin, Hanna,

Did we ever end up fixing this? I am seeing a similar deadlock, on QEMU 9.1.0,
when migrating the same kind of VM described originally. Due to the deadlock,
the VM on the source goes into a paused state and stays there indefinitely.

QEMU backtrace:
Thread 14 (Thread 0x7fc07e7d7640 (LWP 693747) "mig/src/main"):
#0 0x00007fc087301fce in ppoll () from target:/lib/../lib64/libc.so.6
#1 0x00005624c07ecb63 in qemu_poll_ns (fds=0x7fbe44118f90, timeout=<optimized
out>, nfds=<optimized out>) at ../util/qemu-timer.c:339
#2 fdmon_poll_wait (ctx=0x5624d7fea900, ready_list=0x7fc084047f78,
timeout=<optimized out>) at ../util/fdmon-poll.c:79
#3 0x00005624c07ebdb7 in aio_poll (ctx=0x5624d7fea900, blocking=<optimized
out>) at ../util/aio-posix.c:670
#4 0x00005624c04f5b03 in bdrv_drain_all_begin () at ../block/io.c:531
#5 0x00005624c013f7ab in bdrv_drain_all () at ../block/io.c:576
#6 do_vm_stop (state=<optimized out>, send_stop=true) at ../system/cpus.c:308
#7 0x00005624c0140ebd in vm_stop (state=<optimized out>) at
../system/cpus.c:715
#8 vm_stop_force_state (state=<optimized out>) at ../system/cpus.c:784
#9 0x00005624c0187e8e in migration_stop_vm (s=0x5624d7fc6490,
state=RUN_STATE_FINISH_MIGRATE) at ../migration/migration.c:243
#10 migration_completion_precopy (s=0x5624d7fc6490,
current_active_state=<optimized out>) at ../migration/migration.c:2747
#11 migration_completion (s=0x5624d7fc6490) at ../migration/migration.c:2801
#12 migration_iteration_run (s=0x5624d7fc6490) at ../migration/migration.c:3246
#13 migration_thread (opaque=0x5624d7fc6490) at ../migration/migration.c:3517
#14 0x00005624c07f0baa in qemu_thread_start (args=0x5624d82d7bc0) at
../util/qemu-thread-posix.c:543
#15 0x00007fc08728a19a in start_thread () from target:/lib/../lib64/libc.so.6
#16 0x00007fc08730e3f4 in clone () from target:/lib/../lib64/libc.so.6

The rest of the symptoms are pretty much the same as in the original thread;
the only difference this time is that the deadlock is hit in the migration
use case.

I was wondering whether it's feasible to introduce a workaround like the one
below: essentially, waiting for all in-flight IOs to finish *just* before we
enable quiescing. Something like:

diff --git a/block/block-backend.c b/block/block-backend.c
index ba7dc44998..4599bd48f2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2822,6 +2822,9 @@ static void blk_root_drained_begin(BdrvChild *child)
     BlockBackend *blk = child->opaque;
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 
+    AIO_WAIT_WHILE(blk_get_aio_context(blk),
+                   qatomic_load_acquire(&blk->in_flight) > 0);
+
     if (qatomic_fetch_inc(&blk->quiesce_counter) == 0) {
         if (blk->dev_ops && blk->dev_ops->drained_begin) {
             blk->dev_ops->drained_begin(blk->dev_opaque);

Thoughts?

Regards,
Tejus