Am 02.01.24 um 16:24 schrieb Hanna Czenczek:
> 
> I’ve attached the preliminary patch that I didn’t get to send (or test
> much) last year.  Not sure if it has the same CPU-usage-spike issue
> Fiona was seeing, the only functional difference is that I notify the vq
> after attaching the notifiers instead of before.
> 

Applied the patch on top of c12887e1b0 ("block-coroutine-wrapper: use
qemu_get_current_aio_context()") because it conflicts with b6948ab01d
("virtio-blk: add iothread-vq-mapping parameter").

I'm happy to report that I cannot reproduce the CPU-usage-spike issue
with the patch, but I did run into an assertion failure when trying to
verify that it fixes my original stuck-guest-IO issue. See below for the
backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934:

> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after the 
> virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both 
> virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire the 
> device’s context, so this should be directly callable from any context.

I guess this is not true anymore now that the AioContext locking was
removed?
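
To make the suspected mismatch concrete, here is a minimal standalone
model of it. This is a sketch, not QEMU code: ModelCtx, ModelDev,
current_ctx and the model_* functions are all hypothetical names. The
only thing taken from the real code is the shape of the check that
fails in the backtrace [0], "ctx == qemu_get_current_aio_context()",
which is modelled here as a boolean return value instead of an assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext; only its identity matters. */
typedef struct ModelCtx {
    const char *name;
} ModelCtx;

static ModelCtx main_ctx = { "main-loop" };
static ModelCtx iothread_ctx = { "iothread" };

/*
 * Stand-in for what qemu_get_current_aio_context() would return for the
 * calling thread. The model starts out "in" the main loop.
 */
static ModelCtx *current_ctx = &main_ctx;

/* Hypothetical device whose request processing is bound to dev_ctx. */
typedef struct ModelDev {
    ModelCtx *dev_ctx;
} ModelDev;

/*
 * Models the check from the backtrace: returns true only if the caller
 * runs in the device's context. The real code asserts instead.
 */
static bool model_handle_output(const ModelDev *dev)
{
    return dev->dev_ctx == current_ctx;
}

/* Direct call from whatever context the caller happens to be in. */
static bool model_notify_direct(const ModelDev *dev)
{
    assert(dev != NULL);
    return model_handle_output(dev);
}

/*
 * Models running the handler in the device's own context, e.g. because
 * it was scheduled there rather than called directly.
 */
static bool model_notify_in_dev_ctx(const ModelDev *dev)
{
    assert(dev != NULL && dev->dev_ctx != NULL);
    ModelCtx *saved = current_ctx;
    current_ctx = dev->dev_ctx;
    bool ok = model_handle_output(dev);
    current_ctx = saved;
    return ok;
}
```

With a device bound to iothread_ctx, model_notify_direct() called from
the main loop reports a mismatch, while model_notify_in_dev_ctx() does
not. Whether that is really what happens here is exactly my question
above.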

Back to the CPU-usage-spike issue: I experimented a bit, and it doesn't
seem to matter whether I notify the virt queue before or after attaching
the notifiers. But there's another functional difference. My patch
called virtio_queue_notify() which contains this block:

>     if (vq->host_notifier_enabled) {
>         event_notifier_set(&vq->host_notifier);
>     } else if (vq->handle_output) {
>         vq->handle_output(vdev, vq);

In my testing, the first branch was taken, calling event_notifier_set().
Hanna's patch uses virtio_queue_notify_vq() instead, which calls
vq->handle_output() directly. That seems to be the relevant difference
regarding the CPU-usage-spike issue.
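
The two notification paths described above can be sketched as a small
standalone model. Again, this is not QEMU code: ModelVq, its fields and
the model_* functions are hypothetical, the eventfd is reduced to a
boolean, and any additional checks the real functions perform are
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelVq ModelVq;
typedef void (*ModelHandler)(ModelVq *vq);

/* Hypothetical, heavily reduced virt queue. */
struct ModelVq {
    bool host_notifier_enabled;  /* mirrors vq->host_notifier_enabled */
    bool notifier_pending;       /* stands in for the host notifier's state */
    int handled_inline;          /* number of synchronous handler calls */
    ModelHandler handle_output;
};

/* Trivial handler that only records that it ran in the caller's stack. */
static void model_handler(ModelVq *vq)
{
    vq->handled_inline++;
}

/*
 * Shaped like the virtio_queue_notify() excerpt quoted above: with the
 * host notifier enabled, only the notifier is signalled and the handler
 * runs later, wherever the notifier is being polled.
 */
static void model_queue_notify(ModelVq *vq)
{
    assert(vq != NULL);
    if (vq->host_notifier_enabled) {
        vq->notifier_pending = true;  /* event_notifier_set() analogue */
    } else if (vq->handle_output) {
        vq->handle_output(vq);
    }
}

/*
 * Shaped like virtio_queue_notify_vq() as described above: the handler
 * is called synchronously, regardless of the host notifier state.
 */
static void model_queue_notify_vq(ModelVq *vq)
{
    assert(vq != NULL);
    if (vq->handle_output) {
        vq->handle_output(vq);
    }
}
```

In this model, with host_notifier_enabled set, model_queue_notify()
only marks the notifier as pending, whereas model_queue_notify_vq()
runs the handler immediately in the caller's context.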

Best Regards,
Fiona

[0]:

> #0  __pthread_kill_implementation (threadid=<optimized out>, 
> signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007ffff60e3d9f in __pthread_kill_internal (signo=6, 
> threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007ffff6094f32 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #3  0x00007ffff607f472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007ffff607f395 in __assert_fail_base (fmt=0x7ffff61f3a90 "%s%s%s:%u: 
> %s%sAssertion `%s' failed.\n%n", 
>     assertion=assertion@entry=0x555556246bf8 "ctx == 
> qemu_get_current_aio_context()", 
>     file=file@entry=0x555556246baf "../system/dma-helpers.c", 
> line=line@entry=123, 
>     function=function@entry=0x555556246c70 <__PRETTY_FUNCTION__.1> 
> "dma_blk_cb") at ./assert/assert.c:92
> #5  0x00007ffff608de32 in __GI___assert_fail (assertion=0x555556246bf8 "ctx 
> == qemu_get_current_aio_context()", 
>     file=0x555556246baf "../system/dma-helpers.c", line=123, 
> function=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb")
>     at ./assert/assert.c:101
> #6  0x0000555555b83425 in dma_blk_cb (opaque=0x55555804f150, ret=0) at 
> ../system/dma-helpers.c:123
> #7  0x0000555555b839ec in dma_blk_io (ctx=0x555557404310, sg=0x5555588ca6f8, 
> offset=70905856, align=512, 
>     io_func=0x555555a94a87 <scsi_dma_readv>, io_func_opaque=0x55555817ea00, 
> cb=0x555555a8d99f <scsi_dma_complete>, opaque=0x55555817ea00, 
>     dir=DMA_DIRECTION_FROM_DEVICE) at ../system/dma-helpers.c:236
> #8  0x0000555555a8de9a in scsi_do_read (r=0x55555817ea00, ret=0) at 
> ../hw/scsi/scsi-disk.c:431
> #9  0x0000555555a8e249 in scsi_read_data (req=0x55555817ea00) at 
> ../hw/scsi/scsi-disk.c:501
> #10 0x0000555555a897e3 in scsi_req_continue (req=0x55555817ea00) at 
> ../hw/scsi/scsi-bus.c:1478
> #11 0x0000555555d8270e in virtio_scsi_handle_cmd_req_submit 
> (s=0x555558669af0, req=0x5555588ca6b0) at ../hw/scsi/virtio-scsi.c:828
> #12 0x0000555555d82937 in virtio_scsi_handle_cmd_vq (s=0x555558669af0, 
> vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:870
> #13 0x0000555555d829a9 in virtio_scsi_handle_cmd (vdev=0x555558669af0, 
> vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:883
> #14 0x0000555555db3784 in virtio_queue_notify_vq (vq=0x555558672550) at 
> ../hw/virtio/virtio.c:2268
> #15 0x0000555555d8346a in virtio_scsi_drained_end (bus=0x555558669d88) at 
> ../hw/scsi/virtio-scsi.c:1179
> #16 0x0000555555a8a549 in scsi_device_drained_end (sdev=0x555558105000) at 
> ../hw/scsi/scsi-bus.c:1774
> #17 0x0000555555a931db in scsi_disk_drained_end (opaque=0x555558105000) at 
> ../hw/scsi/scsi-disk.c:2369
> #18 0x0000555555ee439c in blk_root_drained_end (child=0x5555574065d0) at 
> ../block/block-backend.c:2829
> #19 0x0000555555ef0ac3 in bdrv_parent_drained_end_single (c=0x5555574065d0) 
> at ../block/io.c:74
> #20 0x0000555555ef0b02 in bdrv_parent_drained_end (bs=0x555557409f80, 
> ignore=0x0) at ../block/io.c:89
> #21 0x0000555555ef1b1b in bdrv_do_drained_end (bs=0x555557409f80, parent=0x0) 
> at ../block/io.c:421
> #22 0x0000555555ef1b5a in bdrv_drained_end (bs=0x555557409f80) at 
> ../block/io.c:428
> #23 0x0000555555efcf64 in mirror_exit_common (job=0x5555588b8220) at 
> ../block/mirror.c:798
> #24 0x0000555555efcfde in mirror_abort (job=0x5555588b8220) at 
> ../block/mirror.c:814
> #25 0x0000555555ec53ea in job_abort (job=0x5555588b8220) at ../job.c:825
> #26 0x0000555555ec54d5 in job_finalize_single_locked (job=0x5555588b8220) at 
> ../job.c:855
> #27 0x0000555555ec57cb in job_completed_txn_abort_locked (job=0x5555588b8220) 
> at ../job.c:958
> #28 0x0000555555ec5c20 in job_completed_locked (job=0x5555588b8220) at 
> ../job.c:1065
> #29 0x0000555555ec5cd5 in job_exit (opaque=0x5555588b8220) at ../job.c:1088
> #30 0x000055555608342e in aio_bh_call (bh=0x7fffe400dfd0) at 
> ../util/async.c:169
> #31 0x0000555556083549 in aio_bh_poll (ctx=0x55555718ade0) at 
> ../util/async.c:216
> #32 0x0000555556065203 in aio_dispatch (ctx=0x55555718ade0) at 
> ../util/aio-posix.c:423
> #33 0x0000555556083988 in aio_ctx_dispatch (source=0x55555718ade0, 
> callback=0x0, user_data=0x0) at ../util/async.c:358
> #34 0x00007ffff753e7a9 in g_main_context_dispatch () from 
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #35 0x00005555560850ae in glib_pollfds_poll () at ../util/main-loop.c:290
> #36 0x000055555608512b in os_host_main_loop_wait (timeout=0) at 
> ../util/main-loop.c:313
> #37 0x0000555556085239 in main_loop_wait (nonblocking=0) at 
> ../util/main-loop.c:592
> #38 0x0000555555b8d501 in qemu_main_loop () at ../system/runstate.c:782
> #39 0x0000555555e55587 in qemu_default_main () at ../system/main.c:37
> #40 0x0000555555e555c2 in main (argc=68, argv=0x7fffffffd8b8) at 
> ../system/main.c:48

