Am 24.04.2026 um 12:39 hat Denis V. Lunev geschrieben:
> cache_clean_timer_del_and_wait() cancels the cache-cleaner coroutine
> by setting s->cache_clean_interval = 0 and calling qemu_co_sleep_wake()
> to cut short its qemu_co_sleep_ns_wakeable(). qemu_co_sleep_wake() is
> fire-and-forget: it reads w->to_wake and silently returns when it is
> NULL. A sleeper that is between two iterations -- has just released
> s->lock but has not yet set w->to_wake inside qemu_co_sleep() -- loses
> the wake:
>
> iothread0 timer coroutine main thread (qcow2 close)
> ------------------------- -------------------------
> while-body (holding s->lock):
> read interval = 600
> wait_ns = 600 * NS
> release s->lock
> take s->lock
> interval = 0
> qemu_co_sleep_wake(w):
> w->to_wake == NULL -> skip
> return
> qemu_co_queue_wait(exit, s->lock):
> release s->lock
> yield
> qemu_co_sleep_ns_wakeable:
> aio_timer_init(+600 s)
> qemu_co_sleep:
> cas scheduled NULL -> "qsns"
> w->to_wake = co
> yield [sleeps 600 s]
>
> cache_clean_timer_del_and_wait() is now stuck waiting for
> cache_clean_timer_exit; the timer will not signal it until its
> original 600 s expiry fires. qcow2_close() is on the main thread
> holding BQL, so RCU, VCPUs and every iothread path that needs BQL
> stall behind it.
>
> qemu_co_sleep_wake() has always been a hint: it has no way to
> rendezvous with a sleeper still arming. Rather than mutate it (which
> would change semantics for every other user -- mirror, stream,
> backup), fix the caller.
Would changing it be a problem for any caller?
I don't see the calls in mirror and stream, but for the backup job it
seems to be two cases: Cancelling the block job and updating the speed.
Cancelling is essentially the exact same case as closing qcow2, so
changing it fixes an unwanted delay. If updating the speed should always
wake up the job is more debatable, but once we've taken the decision
that it should do so, doing that always (even under the race condition)
doesn't seem like a problem either.
I'm asking because the workaround in this patch both makes the qcow2
code more complicated and still doesn't fully solve the problem - you
still get a delay, it's just shortened to what we think is acceptable.
If we fixed the qemu_co_sleep_wake() interface, the wakeup would be
instantaneous and the code in the callers would be simpler. (And I don't
think it would be much worse in the implementation either.)
> Split the sleep in cache_clean_timer() into steps of at most one
> second and move the s->cache_clean_interval check to the top of the
> loop so it is re-evaluated under s->lock between steps. The
> loop/wait structure itself is unchanged. The stop decision is now
> made under the same lock that the teardown caller holds to set
> cache_clean_interval = 0, so it cannot be missed.
> qemu_co_sleep_wake() is still called opportunistically to cut short
> the current step; if it misses, the next 1 s tick catches the change.
> Worst-case cancellation latency is bounded at 1 s, independent of
> cache_clean_interval.
>
> Fixes: f86dde9a15 ("qcow2: Fix cache_clean_timer")
> Signed-off-by: Denis V. Lunev <[email protected]>
> Cc: Hanna Czenczek <[email protected]>
> Cc: Kevin Wolf <[email protected]>
Kevin