Am 10.11.2025 um 16:48 hat Hanna Czenczek geschrieben:
> The cache-cleaner runs as a timer CB in the BDS AioContext. With
> multiqueue, it can run concurrently to I/O requests, and because it does
> not take any lock, this can break concurrent cache accesses, corrupting
> the image. While the chances of this happening are low, it can be
> reproduced e.g. by modifying the code to schedule the timer CB every
> 5 ms (instead of at most once per second) and modifying the last (inner)
> while loop of qcow2_cache_clean_unused() like so:
>
>     while (i < c->size && can_clean_entry(c, i)) {
>         for (int j = 0; j < 1000 && can_clean_entry(c, i); j++) {
>             usleep(100);
>         }
>         c->entries[i].offset = 0;
>         c->entries[i].lru_counter = 0;
>         i++;
>         to_clean++;
>     }
>
> i.e. making it wait on purpose for the point in time where the cache is
> in use by something else.
>
> The solution chosen for this in this patch is not the best solution, I
> hope, but I admittedly can’t come up with anything strictly better.
>
> We can protect from concurrent cache accesses either by taking the
> existing s->lock, or we introduce a new (non-coroutine) mutex
> specifically for cache accesses. I would prefer to avoid the latter so
> as not to introduce additional (very slight) overhead.
>
> Using s->lock, which is a coroutine mutex, however means that we need to
> take it in a coroutine, so the timer must run in a coroutine. We can
> transform it from the current timer CB style into a coroutine that
> sleeps for the set interval. As a result, however, we can no longer
> just deschedule the timer to instantly guarantee it won’t run anymore,
> but have to await the coroutine’s exit.
>
> (Note that even before this patch, this was not always guaranteed:
> anything calling cache_clean_timer_del() from the QEMU main AioContext
> could have been running concurrently with an existing timer CB
> invocation.)
>
> Polling until the timer has actually settled seems very complicated for
> something that is rather a minor problem, but I can’t come up with any
> better solution that doesn’t again just overlook potential problems.
>
> (Not Cc-ing qemu-stable, as the issue is quite unlikely to be hit, and
> I’m not too fond of this solution.)
>
> Signed-off-by: Hanna Czenczek <[email protected]>
> -static void cache_clean_timer_cb(void *opaque)
> +static void coroutine_fn cache_clean_timer(void *opaque)
>  {
> -    BlockDriverState *bs = opaque;
> -    BDRVQcow2State *s = bs->opaque;
> -    qcow2_cache_clean_unused(s->l2_table_cache);
> -    qcow2_cache_clean_unused(s->refcount_block_cache);
> -    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> -              (int64_t) s->cache_clean_interval * 1000);
> +    BDRVQcow2State *s = opaque;
> +    uint64_t wait_ns;
> +
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
> +    }
> +
> +    while (wait_ns > 0) {
> +        qemu_co_sleep_ns_wakeable(&s->cache_clean_timer_wake,
> +                                  QEMU_CLOCK_VIRTUAL, wait_ns);
> +
> +        WITH_QEMU_LOCK_GUARD(&s->lock) {
> +            if (s->cache_clean_interval > 0) {
> +                qcow2_cache_clean_unused(s->l2_table_cache);
> +                qcow2_cache_clean_unused(s->refcount_block_cache);
> +            }
> +
> +            wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
> +        }
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        s->cache_clean_timer_co = NULL;
> +        qemu_co_queue_restart_all(&s->cache_clean_timer_exit);
> +    }
>  }
> +/**
> + * Delete the cache clean timer and await any yet running instance.
> + * Called holding s->lock.
> + */
> +static void coroutine_fn
> +cache_clean_timer_co_locked_del_and_wait(BlockDriverState *bs)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +
> +    if (s->cache_clean_timer_co) {
> +        s->cache_clean_interval = 0;
> +        qemu_co_sleep_wake(&s->cache_clean_timer_wake);
> +        qemu_co_queue_wait(&s->cache_clean_timer_exit, &s->lock);
> +    }
> +}
I don't want to count the number of context switches this dance between
cache_clean_timer_co_locked_del_and_wait() and cache_clean_timer()
takes! Good that it's not a hot path. :-)
> +/* s_locked specifies whether s->lock is held or not */
>  static void qcow2_update_options_commit(BlockDriverState *bs,
> -                                        Qcow2ReopenState *r)
> +                                        Qcow2ReopenState *r,
> +                                        bool s_locked)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      int i;
>
> +    /*
> +     * We need to stop the cache-clean-timer before destroying the metadata
> +     * table caches
> +     */
> +    if (s_locked) {
> +        cache_clean_timer_co_locked_del_and_wait(bs);
> +    } else {
> +        cache_clean_timer_del_and_wait(bs);
> +    }
> +
>      if (s->l2_table_cache) {
>          qcow2_cache_destroy(s->l2_table_cache);
>      }
> @@ -1228,6 +1312,10 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>      }
>      s->l2_table_cache = r->l2_table_cache;
>      s->refcount_block_cache = r->refcount_block_cache;
> +
> +    s->cache_clean_interval = r->cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> +
>      s->l2_slice_size = r->l2_slice_size;
>
>      s->overlap_check = r->overlap_check;
> @@ -1239,12 +1327,6 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>
>      s->discard_no_unref = r->discard_no_unref;
>
> -    if (s->cache_clean_interval != r->cache_clean_interval) {
> -        cache_clean_timer_del(bs);
> -        s->cache_clean_interval = r->cache_clean_interval;
> -        cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> -    }
> -
>      qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
>      s->crypto_opts = r->crypto_opts;
>  }
Is there any specific reason why you move cache_clean_timer_init()
earlier? I don't see an actual problem with the code as it is after this
change, but s->l2_slice_size is related to the cache in a way, so it
would feel safer if the cache cleaner were only started once all the
settings are updated and not only those that it actually happens to
access at the moment.
Kevin