On Wed, Jan 12, 2022 at 4:00 AM Alexander Lakhin <exclus...@gmail.com> wrote:
> So here we get similar hanging on WaitLatchOrSocket().
> Just to make sure that it's indeed the same issue, I've removed socket
> shutdown&close and the test executed to the end (several times).

Argh.
Ouch. I think our options at this point are:

1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (ie a normal TCP stack with graceful
shutdown/linger on process exit).

2. Put your poll() check into the READABLE side. There's some
precedent for that sort of kludge on the WRITEABLE side (and a
rejection of the fragile idea that clients of latch.c should only
perform "safe" sequences):

    /*
     * Windows does not guarantee to log an FD_WRITE network event
     * indicating that more data can be sent unless the previous send()
     * failed with WSAEWOULDBLOCK.  While our caller might well have made
     * such a call, we cannot assume that here.  Therefore, if waiting for
     * write-ready, force the issue by doing a dummy send().  If the dummy
     * send() succeeds, assume that the socket is in fact write-ready, and
     * return immediately.  Also, if it fails with something other than
     * WSAEWOULDBLOCK, return a write-ready indication to let our caller
     * deal with the error condition.
     */
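
For illustration only, here is a rough sketch of what a READABLE-side check in the spirit of option 2 could look like. It is not the poll() check from the earlier mail (that code is not shown here) and it is not latch.c code. It assumes the check amounts to a zero-timeout poll() done before blocking, and it uses plain POSIX calls with a socketpair so that it can be run anywhere; a Windows version would presumably need the Winsock equivalents. The names socket_read_pending() and demo_peer_closed() are made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from latch.c): return 1 if fd already has
 * data, EOF, or an error pending, so the caller does not have to rely on
 * a readiness event that may have been reported and lost earlier.
 * Return 0 if nothing is pending right now.
 */
static int
socket_read_pending(int fd)
{
    struct pollfd pfd;
    int         rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Zero timeout: report the current state without blocking. */
    rc = poll(&pfd, 1, 0);
    if (rc < 0)
        return 1;               /* let the caller hit the error in recv() */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) != 0;
}

/*
 * Demo: the peer shuts down and closes before we start waiting.  The
 * level-triggered check still sees the EOF.  Returns 1 if the check
 * behaved as expected, 0 if not, -1 if the socketpair could not be made.
 */
static int
demo_peer_closed(void)
{
    int         sv[2];
    int         before;
    int         after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    before = socket_read_pending(sv[0]);    /* nothing pending yet */
    shutdown(sv[1], SHUT_RDWR);
    close(sv[1]);
    after = socket_read_pending(sv[0]);     /* EOF is visible now */
    close(sv[0]);

    return before == 0 && after == 1;
}
```

Like the dummy send() on the WRITEABLE side, the idea is to ask the socket for its current state rather than trust that the caller only performed "safe" sequences, and to report readiness on error so the caller's own recv() surfaces the problem.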