Re: [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path

Denis V. Lunev Mon, 11 May 2026 14:54:25 -0700
On 4/24/26 12:39, Denis V. Lunev wrote:
> Problem
> -------
>
> The qemu shutdown / blockdev-close path can deadlock permanently on
> upstream master.  The main thread enters ppoll(timeout=-1) holding
> BQL, no other thread has a wake source that points back at it, and
> qemu has to be SIGKILLed.  The hang has no timeout -- it is a hard
> deadlock, not a slow operation.  Because the main thread still holds
> BQL, RCU, the vCPUs and every iothread path that needs BQL stall
> along with it.
>
> Two independent missed-wakeup races in the block layer contribute.
> Both share the same shape: a waiter arms on one side, the waker
> reads stale state on its fast path and silently skips the kick, and
> nothing else on the AioContext will fire to recover.  They are
> different bugs in different subsystems and each patch stands on its
> own; they are posted together because they surface through the same
> test and the same symptom and are easiest to diagnose side by side.
>
> Depending on which race fires, the main thread backtrace at the
> moment of hang is one of:
>
>   ppoll -> aio_poll -> bdrv_graph_wrlock -> blk_remove_bs
>       (patch 1 -- block/graph-lock)
>
>   ppoll -> aio_poll -> cache_clean_timer_del_and_wait -> qcow2_close
>       (patch 2 -- block/qcow2 cache_clean_timer)
>
> Race diagrams and the exact stale-state read are in each patch's
> commit message.
>
> Reproducer
> ----------
>
> Environment used for the numbers below: 4-vCPU VM guest,
> kernel 6.12.x, upstream master at bb230769b4.  On modern bare-metal
> hosts the window is narrow enough that the hangs rarely reproduce
> without a VM -- a VM guest under full CPU saturation is what makes
> the timing reliable.  Downstream trees that still use plain
> bdrv_graph_wrlock() in blk_remove_bs() hit the graph-lock race on
> the first iteration without any stress at all.
>
>     # reproducer
>     stress-ng --cpu "$(nproc)" --timeout 0 &
>     for r in $(seq 20); do
>         timeout 120 ./build/tests/qemu-iotests/check -qcow2 iothreads-create
>     done
>     kill %1
>
> With `stress-ng --cpu $(nproc)` both races surface.  With
> `stress-ng --cpu $(($(nproc) - 1))` or without a stressor neither
> reproduces reliably across 20 iterations.
>
> When a race fires, the Python QMP client times out on vm.run_job()
> after 5 s, the qemu process keeps running but never makes forward
> progress, and the outer `timeout 120` eventually kills it.  Attach
> gdb before the timeout kills qemu to capture the main thread stack
> and tell which of the two races fired.
>
> Results
> -------
>
> Same guest, 20 iterations of the loop above:
>
>   upstream master:            10/20 FAIL (first fail at iter #2)
>   master + both patches:      20/20 PASS
>
> Signed-off-by: Denis V. Lunev <[email protected]>
> Cc: Kevin Wolf <[email protected]>
> Cc: Hanna Reitz <[email protected]>
> Cc: Stefan Hajnoczi <[email protected]>
> Cc: Fiona Ebner <[email protected]>
> Cc: Hanna Czenczek <[email protected]>
>
> Denis V. Lunev (2):
>   block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()
>   block/qcow2: fix hangup in cache_clean_timer cancellation
>
>  block/graph-lock.c | 12 +++++-------
>  block/qcow2.c      | 28 +++++++++++++++++-----------
>  2 files changed, 22 insertions(+), 18 deletions(-)
>
> --
> 2.51.0
ping
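
For reviewers who want to see the race shape from the cover letter
without the block layer around it, here is a minimal, hypothetical model.
It is not QEMU code and does not mirror either patch line by line;
struct state, waiter_step(), waker_buggy_step(), waker_fixed_step() and
count_missed_wakeups() are illustrative names only.  It runs every
interleaving of a two-step waiter and a two-step waker and counts the
ones that end with the waiter asleep and nothing left to kick it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one waiter and one waker; not QEMU code. */
struct state {
    bool armed;      /* waiter announced "kick me before I sleep" */
    bool done;       /* condition the waiter is waiting for */
    bool seen_armed; /* waker's (possibly stale) snapshot of armed */
    bool kicked;     /* waker sent a wakeup, e.g. aio_notify() */
    bool slept;      /* waiter blocked without seeing done */
};

typedef void (*step_fn)(struct state *s, int step);

/* Waiter: step 0 arms, step 1 re-checks the condition and sleeps if unset. */
static void waiter_step(struct state *s, int step)
{
    if (step == 0) {
        s->armed = true;
    } else if (!s->done) {
        s->slept = true;
    }
}

/* Buggy waker: samples 'armed' on its fast path before publishing 'done'. */
static void waker_buggy_step(struct state *s, int step)
{
    if (step == 0) {
        s->seen_armed = s->armed;   /* may be stale by the time it is used */
    } else {
        s->done = true;
        if (s->seen_armed) {
            s->kicked = true;
        }
    }
}

/* Fixed waker: publish 'done' first, then check whether anyone is armed. */
static void waker_fixed_step(struct state *s, int step)
{
    if (step == 0) {
        s->done = true;
    } else if (s->armed) {
        s->kicked = true;
    }
}

/*
 * Run every interleaving of two waiter steps and two waker steps
 * (sequential consistency, one step at a time) and count the ones that
 * end with the waiter asleep and no kick: a permanent hang.
 */
static int count_missed_wakeups(step_fn waker)
{
    int hangs = 0;

    for (unsigned mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (unsigned m = mask; m; m >>= 1) {
            bits += m & 1;
        }
        if (bits != 2) {
            continue;   /* the waiter must get exactly two of the four slots */
        }

        struct state s = {0};
        int w = 0, k = 0;
        for (int slot = 0; slot < 4; slot++) {
            if (mask & (1u << slot)) {
                waiter_step(&s, w++);
            } else {
                waker(&s, k++);
            }
        }
        if (s.slept && !s.kicked) {
            hangs++;
        }
    }
    return hangs;
}
```

Here count_missed_wakeups(waker_buggy_step) finds exactly one hanging
interleaving (the waker samples armed, the waiter then arms, re-checks
and sleeps, and the waker publishes done without kicking), while
count_missed_wakeups(waker_fixed_step) finds none.  The model assumes
sequential consistency; on real hardware each side's store-then-load
also needs a full barrier (for example QEMU's smp_mb(), or seq_cst
atomics), otherwise store buffering can let both sides read the other's
old value.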