Hi all,

I've been struggling with a bug that seems to be linked to several issues in the polling system on Windows hosts.
When connecting gdb to qemu-system (this happens with every emulation target I've tried), I've noticed that latency sometimes appears. It affects all commands, but it is really noticeable with "call" commands, which can take more than 20s to complete. While investigating, it appears that the polling system misses some events and therefore waits for the g_poll timeout (1s) before handling them. This can be seen with any program launched with the gdbstub_io_command trace event enabled:

    $ qemu-system-arm -s -S ...
    gdbstub_io_command Received: m422650,8
    gdbstub_io_command Received: m422650,8
    <freeze for less than one second>
    gdbstub_io_command Received: P1f=d09fca0000000000
    gdbstub_io_command Received: m422650,8
    ....

The freezes are random, but very obvious when they happen.

An important point is that the problem is triggered by newer versions of glib. We have a QEMU 6 built with glib 2.54 where everything is fine, but when rebuilding it with glib 2.60 the problem appears. I haven't checked glib 2.56 or 2.58 yet because those still use the autoconf build instead of meson. In any case, I didn't find any obvious glib commit that could have introduced this. If anyone more experienced with glib has an idea, I'm interested.

I then dug into the QEMU core to see how it sets up the connection between gdb and QEMU, and I have several questions/ideas about what is happening. IIUC, the gdb connection is handled using an io/channel-watch. This adds a GSource for the socket (-s setting up a TCP connection) to be polled by the main loop. On Windows, qio_channel_socket_source_check is the function used for the check operation. It calls both WSAEnumNetworkEvents and select: the first seems to be there only to reset the events, while the second retrieves them. However, the two calls together are not atomic, so my guess is that some events are lost between them.
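For readers less familiar with the check step, here is a minimal POSIX analog of its select() half: a zero-timeout readiness poll whose result is mapped onto condition bits. It is not QEMU's code. It assumes POSIX sockets instead of Winsock (so select() needs sock + 1 as its first argument, whereas Winsock ignores that parameter), the WSAEnumNetworkEvents reset has no POSIX equivalent and is left out, and COND_IN/COND_OUT/COND_PRI and check_socket_ready are hypothetical stand-ins for the G_IO_* flags and the GSource check callback.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical condition bits standing in for G_IO_IN / G_IO_OUT / G_IO_PRI. */
enum {
    COND_IN  = 1 << 0,
    COND_OUT = 1 << 1,
    COND_PRI = 1 << 2,
};

/*
 * Poll `sock` once without blocking and return the subset of `wanted`
 * conditions that are ready right now (0 if none, or if select() fails).
 */
static unsigned check_socket_ready(int sock, unsigned wanted)
{
    fd_set rfds, wfds, xfds;
    struct timeval tv0 = { 0, 0 };   /* zero timeout: never block */
    unsigned revents = 0;

    assert(sock >= 0 && sock < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&xfds);
    if (wanted & COND_IN)  FD_SET(sock, &rfds);
    if (wanted & COND_OUT) FD_SET(sock, &wfds);
    if (wanted & COND_PRI) FD_SET(sock, &xfds);

    /* POSIX select() needs the highest fd + 1; Winsock ignores this argument. */
    if (select(sock + 1, &rfds, &wfds, &xfds, &tv0) <= 0) {
        return 0;
    }
    if (FD_ISSET(sock, &rfds)) revents |= COND_IN;
    if (FD_ISSET(sock, &wfds)) revents |= COND_OUT;
    if (FD_ISSET(sock, &xfds)) revents |= COND_PRI;
    return revents;
}
```

With a socketpair, check_socket_ready(sv[0], COND_IN) returns 0 until the peer writes a byte, and COND_IN afterwards. Because the poll is a snapshot, anything that happens after it returns is only noticed on the next check, which is why the surrounding wakeup mechanism matters.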
I've tried several fixes around that: moving WSAEnumNetworkEvents after select, replacing it with WSAResetEvent, and using auto/manual reset in CreateEvent. None of them worked. I then tried replacing select with WSAEnumNetworkEvents alone, which is supposed to be enough, but I ran into another issue. We have two sources attached to the same socket, with different conditions: G_IO_HUP vs G_IO_IN | G_IO_OUT | ... This is fine on Linux, but on Windows it seems problematic: I'm getting the read event on the GSource that has only G_IO_HUP. That is somewhat logical, since the Windows API only knows about the HANDLE, which is the same in both cases. I made a quick attempt at creating another HANDLE for the second GSource, but it didn't work.

The GSource with G_IO_HUP is created by:

    #0 qio_channel_create_socket_watch (... condition=G_IO_HUP) at io/channel-watch.c
    #1 qio_channel_create_watch at io/channel.c
    #2 update_ioc_handlers at chardev/char-socket.c
    #3 tcp_chr_connect at chardev/char-socket.c
    #4 tcp_chr_new_client at chardev/char-socket.c
    #5 qio_net_listener_channel_func at io/net-listener.c
    #6 g_main_dispatch at glib/gmain.c
    #7 g_main_context_dispatch at glib/gmain.c
    #8 os_host_main_loop_wait at util/main-loop.c:480
    ...

The other one is created during poll_prepare and added as a child_source of the first one:

    #0 qio_channel_create_socket_watch (..., condition=(G_IO_IN | G_IO_OUT | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c
    #1 qio_channel_create_watch at io/channel.c
    #2 io_watch_poll_prepare at chardev/char-io.c
    #3 io_watch_poll_prepare at chardev/char-io.c
    #4 g_main_context_prepare at glib/gmain.c
    #5 os_host_main_loop_wait at util/main-loop.c
    ...

I'm not familiar enough with glib to know whether these child sources work properly on Windows. I'm now trying a different approach: instead of creating a new source, I want to update the existing one. But that requires some significant modifications.
As I'm a bit pressed for time, I'm looking for a workaround and any advice on this. So far, the only workaround I've found is to reduce the g_poll timeout so that missed events are caught sooner...

@Paolo, you implemented this part of io/channel-watch in a5897205677. Do you have any ideas or suggestions?

I'll try to send an update with a reproducer, but I haven't had time to write it yet.

Thanks in advance,
Clément