Hi,
On 04.11.25 at 5:11 AM, Stefan Hajnoczi wrote:
> AioContext has its own io_uring instance for file descriptor monitoring.
> The disk I/O io_uring code was developed separately. Originally I
> thought the characteristics of file descriptor monitoring and disk I/O
> were too different, requiring separate io_uring instances.
>
> Now it has become clear to me that it's feasible to share a single
> io_uring instance for file descriptor monitoring and disk I/O. We're not
> using io_uring's IOPOLL feature or anything else that would require a
> separate instance.
>
> Unify block/io_uring.c and util/fdmon-io_uring.c using the new
> aio_add_sqe() API that allows user-defined io_uring sqe submission. Now
> block/io_uring.c just needs to submit readv/writev/fsync and most of the
> io_uring-specific logic is handled by fdmon-io_uring.c.
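
To make sure I understand the new flow (and where the slowdown might come
in), this is roughly how I read the submission path now. The CqeHandler
fields, the exact aio_add_sqe() signature and the LuringRequest struct
below are partly from memory and partly made up for illustration, so
please treat it as a sketch rather than the actual code:

    /* Sketch only; names and signatures approximate */
    typedef struct {
        CqeHandler cqe_handler;  /* embedded so the completion cb can find us */
        Coroutine *co;
        QEMUIOVector *qiov;
        int fd;
        uint64_t offset;
        int ret;
    } LuringRequest;

    static void luring_prep_readv(struct io_uring_sqe *sqe, void *opaque)
    {
        LuringRequest *req = opaque;

        /* Only fill in the sqe; fdmon-io_uring.c owns submission and batching */
        io_uring_prep_readv(sqe, req->fd, req->qiov->iov, req->qiov->niov,
                            req->offset);
    }

    static void luring_readv_cb(CqeHandler *handler)
    {
        LuringRequest *req = container_of(handler, LuringRequest, cqe_handler);

        /* Completion is picked up by the AioContext's own io_uring instance */
        req->ret = handler->cqe.res;
        aio_co_wake(req->co);
    }

    /* ...in the request path: */
    req->cqe_handler.cb = luring_readv_cb;
    aio_add_sqe(luring_prep_readv, req, &req->cqe_handler);
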
>
> There are two immediate advantages:
> 1. Fewer system calls. There is no need to monitor the disk I/O io_uring
> ring fd from the file descriptor monitoring io_uring instance. Disk
> I/O completions are now picked up directly. Also, sqes are
> accumulated in the sq ring until the end of the event loop iteration
> and there are fewer io_uring_enter(2) syscalls.
> 2. Less code duplication.
>
> Note that error_setg() messages are not supposed to end with
> punctuation, so I removed a '.' for the non-io_uring build error
> message.
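
As an aside, the convention referred to here is that error_setg() messages
carry no trailing period, e.g. (message text made up for illustration):

    error_setg(errp, "io_uring is not supported by this build");
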
>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> Reviewed-by: Eric Blake <[email protected]>

After commit 047dabef97 ("block/io_uring: use aio_add_sqe()"), using
'aio=io_uring' for an 'ide-hd' drive becomes prohibitively slow: for me,
a Debian 12 guest [0] doesn't even reach the GRUB menu after several
minutes. Using aio=threads makes the issue go away, as does building
from the previous commit, i.e. 1eebdab3c3 ("aio-posix: add aio_add_sqe()
API for user-defined io_uring requests"). The issue is still present on
current master.

There are lots and lots of 512-byte read requests. MAX_BUFFERED_REQS in
ide_buffered_readv() is only 16, maybe that has something to do with it?
I haven't had the time to dig into the issue yet, and I wasn't able to
reproduce it with 'qemu-img bench' or with anything other than 'ide-hd'.
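
For anyone who wants to try: a bench invocation that roughly approximates
the guest's pattern (512-byte reads at a queue depth capped like
MAX_BUFFERED_REQS) would be something like the following; flags are from
memory, so please double-check against the qemu-img man page:

> ./qemu-img bench -f raw -t none -i io_uring -d 16 -s 512 -c 100000 \
>     /mnt/pve/dir/images/500/vm-500-disk-0.raw
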
Best Regards,
Fiona
[0]:
> ./qemu-system-x86_64 \
> -accel kvm \
> -chardev 'socket,id=qmp,path=/var/run/qemu-server/500.qmp,server=on,wait=off' \
> -mon 'chardev=qmp,mode=control' \
> -pidfile /var/run/qemu-server/500.pid \
> -nodefaults \
> -vnc 'unix:/var/run/qemu-server/500.vnc,password=on' \
> -m 4096 \
> -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
> -drive 'file=/mnt/pve/dir/images/500/vm-500-disk-0.raw,if=none,id=drive-ide0,format=raw,aio=io_uring' \
> -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0'