Am 30.10.2025 um 16:21 hat Stefan Hajnoczi geschrieben:
> v5:
> - Explain how fdmon-io_uring.c differs from other fdmon implementations
> in commit message [Kevin]
> - Move test-nested-aio-poll aio_get_g_source() removal into commit that
> touches test case [Kevin]
> - Avoid g_source_add_poll() use-after-free in fdmon_poll_update() [Kevin]
> - Avoid duplication in fdmon_epoll_gsource_dispatch(), use fdmon_epoll_wait()
> [Kevin]
> - Drop unnecessary revents checks in fdmon_poll_gsource_dispatch() [Kevin]
> - Mention in commit message that fdmon-io_uring.c is the new default [Kevin]
> - Add comments explaining how to clean up resources in error paths [Kevin]
> - Indicate error in return value from function with Error *errp arg [Kevin]
> - Add patch to unindent fdmon_io_uring_destroy() [Kevin]
> - Add patch to introduce FDMonOps->dispatch() callback [Kevin]
> - Drop patch with hacky BH optimization for fdmon-io_uring.c [Kevin]
> - Replace cqe_handler_bh with FDMonOps->dispatch() [Kevin]
> - Rename AioHandler->cqe_handler field to ->internal_cqe_handler [Kevin]
> - Consolidate fdmon-io_uring.c trace-events changes into this commit
> - Reduce #ifdef HAVE_IO_URING_PREP_WRITEV2 code duplication [Kevin]
The changes look good to me.
However, the test cases are still failing. I just tried to see where
test-aio is stuck, and while I looked for a backtrace first, I noticed
that just attaching gdb to the process and immediately detaching again
makes the test unstuck. Very strange.
This is the backtrace, maybe a bit unsurpring:
(gdb) bt
#0 0x00007ffff7e6fec6 in __io_uring_submit () from /lib64/liburing.so.2
#1 0x00005555556f4394 in fdmon_io_uring_wait (ctx=0x555556409950,
ready_list=0x7fffffffcda0, timeout=749993088) at ../util/fdmon-io_uring.c:410
#2 0x00005555556ed29f in aio_poll (ctx=0x555556409950, blocking=true) at
../util/aio-posix.c:699
#3 0x0000555555681547 in test_timer_schedule () at ../tests/unit/test-aio.c:413
#4 0x00007ffff6f30e7e in test_case_run (tc=0x55555640d340,
test_run_name=0x55555640de10 "/aio/timer/schedule", path=<optimized out>) at
../glib/gtestutils.c:3115
#5 g_test_run_suite_internal (suite=suite@entry=0x5555558696d0,
path=path@entry=0x0) at ../glib/gtestutils.c:3210
#6 0x00007ffff6f30df3 in g_test_run_suite_internal
(suite=suite@entry=0x555555867480, path=path@entry=0x0) at
../glib/gtestutils.c:3229
#7 0x00007ffff6f30df3 in g_test_run_suite_internal
(suite=suite@entry=0x555555867720, path=path@entry=0x0) at
../glib/gtestutils.c:3229
#8 0x00007ffff6f313aa in g_test_run_suite (suite=suite@entry=0x555555867720)
at ../glib/gtestutils.c:3310
#9 0x00007ffff6f31440 in g_test_run () at ../glib/gtestutils.c:2379
#10 g_test_run () at ../glib/gtestutils.c:2366
#11 0x000055555567e204 in main (argc=1, argv=0x7fffffffd488) at
../tests/unit/test-aio.c:872
And running it under strace shows that we're indeed hanging in the
syscall:
write(1, "# Start of timer tests\n", 23) = 23
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9
io_uring_enter(7, 1, 0, 0, NULL, 8) = 1
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffc239bec80) = 0
io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8
Of course, if I start the test without strace and then attach strace to
the running process, that gets it unstuck like attaching gdb (not very
surprising, I guess, it's both just ptrace).
Finally I tried Ctrl-C while having strace logging to a file, and now
the io_uring_enter() returns 1 (rather than EINTR or 0 or whatever):
io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ killed by SIGINT +++
Not sure what to make of this.
I think you already said you run the same kernel version, but just to be
sure, I'm running 6.17.5-200.fc42.x86_64.
Kevin