On 2016-01-02 15:40:03 +0100, Andres Freund wrote:
> I wonder if the following is the problem: The docs for WSAEventSelect()
> says:
> "Having successfully recorded the occurrence of the network event (by
> setting the corresponding bit in the internal network event record) and
> signaled the associated event object, no further actions are taken for
> that network event until the application makes the function call that
> implicitly reenables the setting of that network event and signaling of
> the associated event object."
> and also notes specifically for FD_CLOSE that there's no re-enabling
> functions.
>
> See
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576%28v=vs.85%29.aspx
> which goes on to talk about some level triggered events (FD_READ, ...)
> and others being edge triggered. It's not clear to me from that whether
> FD_CLOSE is supposed to be edge or level triggered.
>
> If FD_CLOSE is indeed edge and not level triggered - which imo would be
> supremely insane - we'd be in trouble. It'd explain why some failures
> are noticed and others not.

I found a few more resources confirming that FD_CLOSE is edge triggered.
That probably doesn't just make our code buggy when waiting twice on the
same socket, but also makes it very timing dependent: as the event is
only triggered when the close actually occurs, it's possible that we
don't have any event object associated with the socket at that moment.
We only have one associated for short amounts of time, in
WaitLatchOrSocket() and pgwin32_waitforsinglesocket().

A bit of searching around brought up that we saw issues around this
before:
http://www.postgresql.org/message-id/4351.1336927...@sss.pgh.pa.us

Right now I can see only two somewhat surgical fixes:

1) We do a nonblocking poll or select() *after* registering our events,
   both in WaitLatchOrSocket() and pgwin32_waitforsinglesocket(). Since
   select/poll are explicitly level triggered, that should make us
   notice any events we might have missed. select() appears to have
   been available for a fair while.

2) We explicitly shutdown(SD_BOTH) the socket whenever we get an
   FD_CLOSE event. I *think* this should trigger errors in WSARecv,
   WSAEventSelect et al. It doesn't solve the problem that we might
   miss important events, though.

Given that 2) isn't a complete fix, and that I can't find reliable
documentation on since when shutdown() has been supported, I'm inclined
to go with 1).

Better ideas?

Greetings,

Andres Freund
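
To illustrate why the level-triggered re-check in 1) would catch a close
that happened before the wait started, here is a minimal sketch. It is
not the PostgreSQL code: it uses POSIX socketpair() and select() as a
stand-in for the Winsock calls, and socket_readable_now() and
demo_peer_close() are hypothetical names made up for this example. In
the real fix the zero-timeout check would presumably sit between
WSAEventSelect() and the wait on the event handle.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): a zero-timeout
 * select() asking whether "sock" is readable right now.  A socket whose
 * peer has closed counts as readable, because a read would return EOF
 * without blocking.  Returns 1 if readable, 0 if not, -1 on error.
 */
int
socket_readable_now(int sock)
{
    fd_set          rfds;
    struct timeval  tv = {0, 0};    /* zero timeout: never block */
    int             rc;

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);

    rc = select(sock + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return FD_ISSET(sock, &rfds) ? 1 : 0;
}

/*
 * Demo: the peer closes *before* we ever look at the socket.  An
 * edge-triggered notification armed afterwards would have nothing to
 * report; the level-triggered check still sees the closed state, and
 * keeps seeing it on every call.
 *
 * out[0] = state before the close, out[1] and out[2] = two consecutive
 * checks after it.  Returns 0 on success, -1 if socketpair() fails.
 */
int
demo_peer_close(int out[3])
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    out[0] = socket_readable_now(sv[0]);    /* nothing pending yet */
    close(sv[1]);                           /* the "missed" close */
    out[1] = socket_readable_now(sv[0]);    /* first check after close */
    out[2] = socket_readable_now(sv[0]);    /* asking again still works */

    close(sv[0]);
    return 0;
}
```

The point of the second check is that the answer does not depend on
having been registered at the moment the close happened, which is
exactly the property FD_CLOSE lacks if it is edge triggered.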