On Tue, Mar 30, 2021 at 6:25 AM Maksim Milyutin <milyuti...@gmail.com> wrote: > > Hi Thomas! Thanks for working on this patch. > > I have attached a new version with some typo corrections of doc entry, > removing of redundant `include` entries and trailing whitespaces. Also I > added in doc the case when single query transaction with disconnected > client might be eventually commited upon completion in autocommit mode > if it doesn't return any rows (doesn't communicate with user) before > sending final commit message. This behavior might be unexpected for > clients and hence IMO it's worth noticing.
Thanks! > > + * pq_recvbuf_ext - load some bytes into the input buffer > AFAICS, the above fragment is not related with primary fix directly. Right, that was some stuff left over from the v7 patch that I forgot to remove. > AFAICS, there are the following open items that don't allow to treat the > current patch completed: > > * Absence of tap tests emulating some scenarios of client disconnection. > IIRC, you wanted to rewrite the test case posted by Sergey. Yeah, it's a bit tricky to write a test that won't fail randomly on slow animals, but I think I see how now (I think you need to send SELECT pg_sleep(60), then in another connection, wait until that shows as running in pg_stat_activity, and then kill the first process, and then wait until the log contains a disconnected-for-this-reason message). If we can answer the big question below, I'll invest the time to do this. > * Concerns about portability of `pq_check_connection()`A implementation. Yes. This is the main problem for me. I just did a bit of testing to refresh my memory of what doesn't work. On FreeBSD, with TCP keepalives enabled (I tested with tcp_keepalives_idle=1s, tcp_keepalives_count=5, tcp_keepalives_interval=1s to make it aggressive), the POLLUP patch works if you pull out the ethernet cable. But... it doesn't work if you kill -9 the remote psql process (the remote kernel sends a FIN). So this is a patch that only works the way we need it to work on Linux. Other OSes can only tell you about this condition when you try to read or write. > BTW, on windows postgres with this patch have not been built [1]. Huh, the manual says they have struct pollfd, and we include winsock2.h. But I guess it doesn't work reliably on Windows anyway. https://docs.microsoft.com/en-us/windows/win32/api/winsock2/ns-winsock2-wsapollfd > * Absence of benchmark results to show lack of noticeable performance > regression after applying non-zero timeout for checking client liveness. True, we should do that, but I'll be very surprised if it's still a problem: the regression was caused by calling setitimer() for every query, but we fixed that in commit 09cf1d52. If we want to ship this in v14 we have to make a decision ASAP: 1. Ship the POLLHUP patch (like v9) that only works reliably on Linux. Maybe disable the feature completely on other OSes? 2. Ship the patch that tries to read (like v7). It should work on all systems, but it can be fooled by pipelined commands (though it can detect a pipelined 'X'). Personally, I lean towards #2. A third option was mentioned, but we have no patch: 3. We could try to *send* a message. This should work on all systems. Unfortunately our protocol doesn't seem to have an existing "no-op" or "heartbeat" message we could use for this, and sending a NOTICE would be noisy. I suspect this would require some protocol evolution, and there is no time for that in v14.