Am 02.01.24 um 16:24 schrieb Hanna Czenczek: > > I’ve attached the preliminary patch that I didn’t get to send (or test > much) last year. Not sure if it has the same CPU-usage-spike issue > Fiona was seeing, the only functional difference is that I notify the vq > after attaching the notifiers instead of before. >
Applied the patch on top of c12887e1b0 ("block-coroutine-wrapper: use qemu_get_current_aio_context()") because it conflicts with b6948ab01d ("virtio-blk: add iothread-vq-mapping parameter"). I'm happy to report that I cannot reproduce the CPU-usage-spike issue with the patch, but I did run into an assertion failure when trying to verify that it fixes my original stuck-guest-IO issue. See below for the backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934 > I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after the > virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both > virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire the > device’s context, so this should be directly callable from any context. I guess this is not true anymore now that the AioContext locking was removed? Back to the CPU-usage-spike issue: I experimented around and it doesn't seem to matter whether I notify the virt queue before or after attaching the notifiers. But there's another functional difference. My patch called virtio_queue_notify() which contains this block: > if (vq->host_notifier_enabled) { > event_notifier_set(&vq->host_notifier); > } else if (vq->handle_output) { > vq->handle_output(vdev, vq); In my testing, the first branch was taken, calling event_notifier_set(). Hanna's patch uses virtio_queue_notify_vq() and there, vq->handle_output() will be called. That seems to be the relevant difference regarding the CPU-usage-spike issue. Best Regards, Fiona [0]: > #0 __pthread_kill_implementation (threadid=<optimized out>, > signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44 > #1 0x00007ffff60e3d9f in __pthread_kill_internal (signo=6, > threadid=<optimized out>) at ./nptl/pthread_kill.c:78 > #2 0x00007ffff6094f32 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/posix/raise.c:26 > #3 0x00007ffff607f472 in __GI_abort () at ./stdlib/abort.c:79 > #4 0x00007ffff607f395 in __assert_fail_base (fmt=0x7ffff61f3a90 "%s%s%s:%u: > %s%sAssertion `%s' failed.\n%n", > assertion=assertion@entry=0x555556246bf8 "ctx == > qemu_get_current_aio_context()", > file=file@entry=0x555556246baf "../system/dma-helpers.c", > line=line@entry=123, > function=function@entry=0x555556246c70 <__PRETTY_FUNCTION__.1> > "dma_blk_cb") at ./assert/assert.c:92 > #5 0x00007ffff608de32 in __GI___assert_fail (assertion=0x555556246bf8 "ctx > == qemu_get_current_aio_context()", > file=0x555556246baf "../system/dma-helpers.c", line=123, > function=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb") > at ./assert/assert.c:101 > #6 0x0000555555b83425 in dma_blk_cb (opaque=0x55555804f150, ret=0) at > ../system/dma-helpers.c:123 > #7 0x0000555555b839ec in dma_blk_io (ctx=0x555557404310, sg=0x5555588ca6f8, > offset=70905856, align=512, > io_func=0x555555a94a87 <scsi_dma_readv>, io_func_opaque=0x55555817ea00, > cb=0x555555a8d99f <scsi_dma_complete>, opaque=0x55555817ea00, > dir=DMA_DIRECTION_FROM_DEVICE) at ../system/dma-helpers.c:236 > #8 0x0000555555a8de9a in scsi_do_read (r=0x55555817ea00, ret=0) at > ../hw/scsi/scsi-disk.c:431 > #9 0x0000555555a8e249 in scsi_read_data (req=0x55555817ea00) at > ../hw/scsi/scsi-disk.c:501 > #10 0x0000555555a897e3 in scsi_req_continue (req=0x55555817ea00) at > ../hw/scsi/scsi-bus.c:1478 > #11 0x0000555555d8270e in virtio_scsi_handle_cmd_req_submit > (s=0x555558669af0, req=0x5555588ca6b0) at ../hw/scsi/virtio-scsi.c:828 > #12 0x0000555555d82937 in virtio_scsi_handle_cmd_vq (s=0x555558669af0, > vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:870 > #13 0x0000555555d829a9 in virtio_scsi_handle_cmd (vdev=0x555558669af0, > vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:883 > #14 0x0000555555db3784 in virtio_queue_notify_vq (vq=0x555558672550) at > ../hw/virtio/virtio.c:2268 > #15 0x0000555555d8346a in virtio_scsi_drained_end (bus=0x555558669d88) at > ../hw/scsi/virtio-scsi.c:1179 > #16 0x0000555555a8a549 in scsi_device_drained_end (sdev=0x555558105000) at > ../hw/scsi/scsi-bus.c:1774 > #17 0x0000555555a931db in scsi_disk_drained_end (opaque=0x555558105000) at > ../hw/scsi/scsi-disk.c:2369 > #18 0x0000555555ee439c in blk_root_drained_end (child=0x5555574065d0) at > ../block/block-backend.c:2829 > #19 0x0000555555ef0ac3 in bdrv_parent_drained_end_single (c=0x5555574065d0) > at ../block/io.c:74 > #20 0x0000555555ef0b02 in bdrv_parent_drained_end (bs=0x555557409f80, > ignore=0x0) at ../block/io.c:89 > #21 0x0000555555ef1b1b in bdrv_do_drained_end (bs=0x555557409f80, parent=0x0) > at ../block/io.c:421 > #22 0x0000555555ef1b5a in bdrv_drained_end (bs=0x555557409f80) at > ../block/io.c:428 > #23 0x0000555555efcf64 in mirror_exit_common (job=0x5555588b8220) at > ../block/mirror.c:798 > #24 0x0000555555efcfde in mirror_abort (job=0x5555588b8220) at > ../block/mirror.c:814 > #25 0x0000555555ec53ea in job_abort (job=0x5555588b8220) at ../job.c:825 > #26 0x0000555555ec54d5 in job_finalize_single_locked (job=0x5555588b8220) at > ../job.c:855 > #27 0x0000555555ec57cb in job_completed_txn_abort_locked (job=0x5555588b8220) > at ../job.c:958 > #28 0x0000555555ec5c20 in job_completed_locked (job=0x5555588b8220) at > ../job.c:1065 > #29 0x0000555555ec5cd5 in job_exit (opaque=0x5555588b8220) at ../job.c:1088 > #30 0x000055555608342e in aio_bh_call (bh=0x7fffe400dfd0) at > ../util/async.c:169 > #31 0x0000555556083549 in aio_bh_poll (ctx=0x55555718ade0) at > ../util/async.c:216 > #32 0x0000555556065203 in aio_dispatch (ctx=0x55555718ade0) at > ../util/aio-posix.c:423 > #33 0x0000555556083988 in aio_ctx_dispatch (source=0x55555718ade0, > callback=0x0, user_data=0x0) at ../util/async.c:358 > #34 0x00007ffff753e7a9 in g_main_context_dispatch () from > /lib/x86_64-linux-gnu/libglib-2.0.so.0 > #35 0x00005555560850ae in glib_pollfds_poll () at ../util/main-loop.c:290 > #36 0x000055555608512b in os_host_main_loop_wait (timeout=0) at > ../util/main-loop.c:313 > #37 0x0000555556085239 in main_loop_wait (nonblocking=0) at > ../util/main-loop.c:592 > #38 0x0000555555b8d501 in qemu_main_loop () at ../system/runstate.c:782 > #39 0x0000555555e55587 in qemu_default_main () at ../system/main.c:37 > #40 0x0000555555e555c2 in main (argc=68, argv=0x7fffffffd8b8) at > ../system/main.c:48