On 5/15/26 14:58, Kevin Wolf wrote:
> Am 24.04.2026 um 12:39 hat Denis V. Lunev geschrieben:
>> cache_clean_timer_del_and_wait() cancels the cache-cleaner coroutine
>> by setting s->cache_clean_interval = 0 and calling qemu_co_sleep_wake()
>> to cut short its qemu_co_sleep_ns_wakeable(). qemu_co_sleep_wake() is
>> fire-and-forget: it reads w->to_wake and silently returns when it is
>> NULL. A sleeper that is between two iterations -- has just released
>> s->lock but has not yet set w->to_wake inside qemu_co_sleep() -- loses
>> the wake:
>>
>> iothread0 timer coroutine main thread (qcow2 close)
>> ------------------------- -------------------------
>> while-body (holding s->lock):
>> read interval = 600
>> wait_ns = 600 * NS
>> release s->lock
>> take s->lock
>> interval = 0
>> qemu_co_sleep_wake(w):
>> w->to_wake == NULL -> skip
>> return
>> qemu_co_queue_wait(exit, s->lock):
>> release s->lock
>> yield
>> qemu_co_sleep_ns_wakeable:
>> aio_timer_init(+600 s)
>> qemu_co_sleep:
>> cas scheduled NULL -> "qsns"
>> w->to_wake = co
>> yield [sleeps 600 s]
>>
>> cache_clean_timer_del_and_wait() is now stuck waiting for
>> cache_clean_timer_exit; the timer will not signal it until its
>> original 600 s expiry fires. qcow2_close() is on the main thread
>> holding BQL, so RCU, VCPUs and every iothread path that needs BQL
>> stall behind it.
>>
>> qemu_co_sleep_wake() has always been a hint: it has no way to
>> rendezvous with a sleeper still arming. Rather than mutate it (which
>> would change semantics for every other user -- mirror, stream,
>> backup), fix the caller.
> Would changing it be a problem for any caller?
>
> I don't see the calls in mirror and stream, but for the backup job it
> seems to be two cases: Cancelling the block job and updating the speed.
> Cancelling is essentially the exact same case as closing qcow2, so
> changing it fixes an unwanted delay. If updating the speed should always
> wake up the job is more debatable, but once we've taken the decision
> that it should do so, doing that always (even under the race condition)
> doesn't seem like a problem either.
>
> I'm asking because the workaround in this patch both makes the qcow2
> code more complicated and still doesn't fully solve the problem - you
> still get a delay, it's just shortened to what we think is acceptable.
> If we fixed the qemu_co_sleep_wake() interface, the wakeup would be
> instantaneous and the code in the callers would be simpler. (And I don't
> think it would be much worse in the implementation either.)
>
>> Split the sleep in cache_clean_timer() into steps of at most one
>> second and move the s->cache_clean_interval check to the top of the
>> loop so it is re-evaluated under s->lock between steps. The
>> loop/wait structure itself is unchanged. The stop decision is now
>> made under the same lock that the teardown caller holds to set
>> cache_clean_interval = 0, so it cannot be missed.
>> qemu_co_sleep_wake() is still called opportunistically to cut short
>> the current step; if it misses, the next 1 s tick catches the change.
>> Worst-case cancellation latency is bounded at 1 s, independent of
>> cache_clean_interval.
>>
>> Fixes: f86dde9a15 ("qcow2: Fix cache_clean_timer")
>> Signed-off-by: Denis V. Lunev <[email protected]>
>> Cc: Hanna Czenczek <[email protected]>
>> Cc: Kevin Wolf <[email protected]>
> Kevin
>
ok. Let me try to write this.
Den