On 28.04.26 at 6:18 PM, Stefan Hajnoczi wrote:
> On Tue, Apr 28, 2026 at 02:10:02PM +0200, Fiona Ebner wrote:
>> Hi Stefan,
>>
>> On 27.04.26 at 9:12 PM, Stefan Hajnoczi wrote:
>>> On Fri, Apr 24, 2026 at 12:25:41PM +0200, Fiona Ebner wrote:
>>>> Dear maintainers,
>>>>
>>>> since QEMU 10.2, if io_uring is enabled, it will be used for the event
>>>> loop of iothreads and this causes an IO pressure stall value of nearly
>>>> 100 when idle.
>>>>
>>>> The issue was also reported on the kernel mailing list [0]. The
>>>> suggestion from Jens Axboe was to just turn off the iowait accounting
>>>> completely. But since, for block/file-posix.c, there is actual IO
>>>> submitted via the same ring, I wasn't sure if that is the right approach.
>>>>
>>>> So the idea was to keep track of whether the event loop is otherwise
>>>> idle and only use the IORING_ENTER_NO_IOWAIT flag in that case [1].
>>>>
>>>> However, doing so would only help for block/file-posix.c, which submits
>>>> IO via luring_co_submit() -> fdmon_io_uring_add_sqe(). For example, for
>>>> block/rbd.c, only a poll SQE for the AioHandler node's fd is used. When
>>>> submitting that poll SQE in the iothread, we would need to be able to
>>>> know if IO for RBD is currently in-flight or not to be able to decide
>>>> whether to use the IORING_ENTER_NO_IOWAIT flag or not. Is there a good
>>>> way to do this (in a general way)?
>>>>
>>>> Or should the flag really always be used (if supported by the kernel)?
>>>> Is there a way to tell io_uring/kernel that we are an event loop and our
>>>> waiting should only be accounted for when there is actual IO in-flight?
>>>>
>>>> Happy to hear your opinions and suggestions!
>>>>
>>>> [0]:
>>>> https://lore.kernel.org/io-uring/[email protected]/T/
>>>
>>> Hi Fiona,
>>> Jens replied yesterday and confirmed your suspicion that the number of
>>> inflight requests is not being tracked correctly.
>>>
>>> Is there still a problem after fixing the kernel's inflight counting? If
>>> not, then no QEMU change is necessary and that seems like the cleanest
>>> solution anyway. The kernel should know whether there is I/O in flight
>>> and so it doesn't seem right that userspace needs to hint this.
>>
>>
>> Unfortunately, yes. Even with the kernel fix [2], the real problem with
>> poll SQEs described above remains. I'm still seeing high IO pressure
>> stall values when using QEMU. In add_poll_add_sqe(), QEMU submits poll
>> SQEs for the AioHandler node fd, and that does count as pending IO. A
>> small reproducer modeling this is in [3].
> 
> Does the kernel account POLL_ADD SQEs as blocking I/O activity?

Apparently yes. See the C program below [3].

> That behavior is inconsistent if select(2)/poll(2)/epoll_wait(2)
> syscalls do not count as blocking I/O activity. The kernel io_uring code
> should account them correctly and not rely on a userspace hint.

@Jens Axboe: should there be a separate internal counter for
poll/timeout SQEs, so that they do not count towards IO wait by default?

> 
> Stefan
> 
>>
>> So the question from above, how to deal with this for block drivers not
>> going through file-posix.c, remains.
>>
>> Best Regards,
>> Fiona
>>
>> [2]:
>> https://lore.kernel.org/io-uring/[email protected]/T/
>>
>> [3]:
>>
>> #include <assert.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <liburing.h>
>> #include <sys/eventfd.h>
>>
>> int main(void) {
>>     int fd;
>>     int ret;
>>     struct io_uring ring;
>>     struct io_uring_sqe *sqe;
>>
>>     fd = eventfd(0, 0);
>>     assert(fd >= 0);
>>
>>     ret = io_uring_queue_init(128, &ring, 0);
>>     assert(ret == 0);
>>
>>     sqe = io_uring_get_sqe(&ring);
>>     assert(sqe);
>>
>>     io_uring_prep_poll_add(sqe, fd, 1);
>>
>>     ret = io_uring_submit_and_wait(&ring, 1);
>>     printf("got ret %d\n", ret);
>>
>>     io_uring_queue_exit(&ring);
>>
>>     return 0;
>> }
>>
>>


