On Mon, Sep 5, 2022 at 2:04 PM Marc-André Lureau <marcandre.lur...@gmail.com> wrote: > > Hi > > On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng...@gmail.com> wrote: >> >> On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau >> <marcandre.lur...@gmail.com> wrote: >> > >> > Hi >> > >> > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng...@gmail.com> wrote: >> >> >> >> From: Bin Meng <bin.m...@windriver.com> >> >> >> >> Random failure was observed when running qtests on Windows due to >> >> "Broken pipe" detected by qmp_fd_receive(). What happened is that >> >> the qtest executable sends testing data over a socket to the QEMU >> >> under test but no response is received. The errno of the recv() >> >> call from the qtest executable indicates ETIMEOUT, due to the qmp >> >> chardev's tcp_chr_read() is never called to receive testing data >> >> hence no response is sent to the other side. >> >> >> >> tcp_chr_read() is registered as the callback of the socket watch >> >> GSource. The reason of the callback not being called by glib, is >> >> that the source check fails to indicate the source is ready. There >> >> are two socket watch sources created to monitor the same socket >> >> event object from the char-socket backend in update_ioc_handlers(). >> >> >> >> During the source check phase, qio_channel_socket_source_check() >> >> calls WSAEnumNetworkEvents() to discovers occurrences of network >> >> events for the indicated socket, clear internal network event records, >> >> and reset the event object. Testing shows that if we don't reset the >> >> event object by not passing the event handle to WSAEnumNetworkEvents() >> >> the symptom goes away and qtest runs very stably. >> >> >> >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we >> >> don't parse the result of WSANETWORKEVENTS returned from this API. >> >> We use select() to poll the socket status. Fix this instability by >> >> dropping the WSAEnumNetworkEvents() call. >> >> >> >> Signed-off-by: Bin Meng <bin.m...@windriver.com> >> > >> > >> > What clears the event then? >> > >> >> It seems we don't need to clear the event as everything still works as >> expected. > > > Well, it can "work" but are you sure it doesn't have a busy loop?
You mean busy loop in g_poll()? >> >> >> >> --- >> >> During the testing, I removed the following codes in >> >> update_ioc_handlers(): >> >> >> >> remove_hup_source(s); >> >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); >> >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, >> >> chr, NULL); >> >> g_source_attach(s->hup_source, chr->gcontext); >> >> >> >> and such change also makes the symptom go away. >> >> >> >> And if I moved the above codes to the beginning, before the call to >> >> io_add_watch_poll(), the symptom also goes away. >> >> >> >> It seems two sources watching on the same socket event object is >> >> the key that leads to the instability. The order of adding a source >> >> watch seems to also play a role but I can't explain why. >> >> Hopefully a Windows and glib expert could explain this behavior. >> >> >> > >> > Feel free to leave that comment in the commit message. >> >> Sure, will add the above message into the commit in v2. >> >> > >> > This is strange, as both sources should have different events, clearing >> > one shouldn't affect the other. >> >> Both sources have the same event, as one QIO channel only has one >> socket, and one socket can only be bound to one event. > > > "one socket can only be bound to one event", is that a WSAEventSelect > limitation? > Yes, please see the MSDN: It is not possible to specify different event objects for different network events. The following code will not work; the second call will cancel the effects of the first, and only the FD_WRITE network event will be associated with hEventObject2: rc = WSAEventSelect(s, hEventObject1, FD_READ); rc = WSAEventSelect(s, hEventObject2, FD_WRITE); //bad >> >> > >> > I guess it's WSAEnumNetworkEvents clearing of the internal network event >> > records that is problematic. >> > >> > Can you check if you replace the call with ResetEvent() everything works? >> >> No, ResetEvent() does not work, and is not recommended by MSDN [1] >> too, which says: > > > It probably works to some extent (I see glib is using it > https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What > you mean is that it doesn't solve the issue, I guess. Correct, it does not solve the issue. >> >> >> The proper way to reset the state of an event object used with the >> WSAEventSelect function is to pass the handle of the event object to >> the WSAEnumNetworkEvents function in the hEventObject parameter. This >> will reset the event object and adjust the status of active FD events >> on the socket in an atomic fashion. >> > > This is not what you want though if you have multiple event objects for the > same socket. > >> >> [1] >> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect Regards, Bin