Hi, I'm working on a v3 of my query-block series [1] and I'm a bit
confused about how to convert a QMP command into a coroutine.

In case you missed the context:

 In that series I'm turning query-block into a coroutine so we can avoid
 holding the BQL for too long in the case of a misbehaving (slow)
 syscall at the end of the call chain (get_allocated_file_size -> fstat
 in my case).

The issue:

After converting qmp_query_block into a coroutine, I'm hitting the
assert(false) in qcow2_get_specific_info(), a bug that was already
fixed for the non-coroutine case [2]. The bug was caused by
qmp_query_block() running during bdrv_activate_all():

bdrv_activate_all
  ...
  bdrv_invalidate_cache
    bdrv_poll_co
    |-> aio_co_enter
    |   ...
    |   qcow2_co_invalidate_cache
    |     memset(s, 0, ...)
    |     qcow2_do_open
    |       blk_co_pread
    |       ...
    |       qemu_coroutine_yield
    |-> AIO_WAIT_WHILE
    |   aio_poll
    |     reschedule of qmp_dispatch
    |     qmp_query_block
    |     ...
    |     qcow2_get_specific_info
    |       sees s->qcow_version == 0
    |       assert(false)
  
So my question is: how are we expected to convert a QMP command into a
coroutine if qmp_dispatch reschedules all coroutines into
qemu_aio_context? I don't see how to prevent an arbitrary aio_poll from
dispatching the coroutine in the middle of something else.
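
To make the failure mode concrete, here is a toy model of the trace
above. It is not QEMU code: ToyState, ToyContext, toy_poll() and the
other toy_* names are made up for illustration, and a single pending
callback stands in for the rescheduled QMP coroutine. It only sketches
the ordering problem, assuming the nested poll runs whatever is pending
in the context being polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the qcow2 driver state (hypothetical type). */
typedef struct {
    int qcow_version;
} ToyState;

typedef void (*ToyCallback)(void *opaque);

/* A context holding at most one pending callback (hypothetical type). */
typedef struct {
    ToyCallback cb;
    void *opaque;
} ToyContext;

static int observed_version = -1;

/* Plays the role of qcow2_get_specific_info() reading qcow_version. */
static void toy_query(void *opaque)
{
    ToyState *s = opaque;
    observed_version = s->qcow_version;
}

/* Plays the role of one aio_poll() iteration: run what is pending. */
static bool toy_poll(ToyContext *ctx)
{
    assert(ctx != NULL);
    if (!ctx->cb) {
        return false;
    }
    ToyCallback cb = ctx->cb;
    ctx->cb = NULL;
    cb(ctx->opaque);
    return true;
}

/* Plays the role of the invalidate path: wipe, wait, re-open. */
static void toy_invalidate_cache(ToyState *s, ToyContext *polled)
{
    memset(s, 0, sizeof(*s));   /* state is now half-initialized */
    toy_poll(polled);           /* nested poll while "I/O" is in flight */
    s->qcow_version = 3;        /* re-open completes */
}

/* Returns the qcow_version that the query callback observed. */
static int toy_run(bool query_in_polled_ctx)
{
    ToyState s = { .qcow_version = 3 };
    ToyContext polled = { 0 };
    ToyContext other = { 0 };
    ToyContext *target = query_in_polled_ctx ? &polled : &other;

    target->cb = toy_query;
    target->opaque = &s;
    observed_version = -1;

    toy_invalidate_cache(&s, &polled);
    toy_poll(target);           /* no-op if the nested poll already ran it */
    return observed_version;
}
```

In this model toy_run(true) reports the zeroed version because the
query is dispatched between the memset and the re-open, while
toy_run(false) reports the re-opened value because the nested poll
never sees the query.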

If I keep the QMP command in the iohandler context, then the bug never
happens. Rescheduling back into the iohandler would also work, were it
not for the HMP path which only polls on qemu_aio_context and causes a
deadlock.
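
The deadlock can be sketched with a similar toy model. Again this is
not QEMU code and all toy_* names are hypothetical: the waiter polls
only one context, and the command's completion is queued in either that
context or a different one. A real wait loop would spin forever; the
toy gives up after a fixed number of iterations so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*ToyCallback)(void *opaque);

/* A context holding at most one pending callback (hypothetical type). */
typedef struct {
    ToyCallback cb;
    void *opaque;
} ToyContext;

/* Completion of the command: sets the flag the waiter is watching. */
static void toy_finish(void *opaque)
{
    *(bool *)opaque = true;
}

/* One poll iteration: run the pending callback of this context only. */
static bool toy_poll(ToyContext *ctx)
{
    assert(ctx != NULL);
    if (!ctx->cb) {
        return false;
    }
    ToyCallback cb = ctx->cb;
    ctx->cb = NULL;
    cb(ctx->opaque);
    return true;
}

/* Waiter that polls a single context, bounded so the toy terminates. */
static bool toy_wait(ToyContext *polled, const bool *done, int max_iters)
{
    for (int i = 0; i < max_iters && !*done; i++) {
        toy_poll(polled);
    }
    return *done;
}

/* Returns true if the command completes while the waiter is polling. */
static bool toy_command_completes(bool resume_in_polled_ctx)
{
    ToyContext main_ctx = { 0 };       /* the only context polled */
    ToyContext iohandler_ctx = { 0 };  /* never polled by the waiter */
    bool done = false;
    ToyContext *resume_ctx =
        resume_in_polled_ctx ? &main_ctx : &iohandler_ctx;

    resume_ctx->cb = toy_finish;
    resume_ctx->opaque = &done;
    return toy_wait(&main_ctx, &done, 1000);
}
```

Here toy_command_completes(false) never sees the completion, which is
the shape of the hang I get when the coroutine is rescheduled into the
iohandler while the caller polls only qemu_aio_context.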

What's the recommended approach here?

Thank you

[1] https://lore.kernel.org/r/20230609201910.12100-1-faro...@suse.de
[2] https://gitlab.com/qemu-project/qemu/-/issues/1933
