On 18.11.21 at 03:04, Tom Lane wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
>> I realise now that the experiments we did a while back to try to
>> understand this across a few different operating systems[2] had missed
>> this subtlety, because that Python script had an explicit close()
>> call, whereas PostgreSQL exits.  It still revealed that the client
>> isn't allowed to read any data after its write failed, which is a
>> known source of error messages being eaten.
>
> Yeah.  After re-reading that thread, I'm a bit confused about how
> to square the results we got then with Lars' report.  The Windows
> documentation he pointed to does claim that the default behavior if you
> issue closesocket() is to do a "graceful close in the background", which
> one would think means allowing sent data to be received.  That's not what
> we saw.  It's possible that we would get different results if we re-tested
> with a scenario where the client doesn't attempt to send data after the
> server-side close; but I'm not sure how much it's worth to improve that
> case if the other case still fails hard.

From my experimentation, the Winsock implementation has the two issues I explained. First, it drops all received but not yet retrieved data as soon as it receives a RST packet. Second, at process termination it sends a RST packet on every socket that wasn't send-closed, regardless of whether there is any pending data.
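Roughly, a minimal server along the following lines triggers the second issue (a sketch only; the port number and message text are arbitrary, error handling is omitted, and it needs to be linked with ws2_32). It writes one final message and then terminates without send-closing the socket, so Winsock emits a RST and a client that hasn't called recv() yet loses the message:

    #include <winsock2.h>

    int
    main(void)
    {
        WSADATA wsa;
        SOCKET  lsock, csock;
        struct sockaddr_in addr = {0};

        WSAStartup(MAKEWORD(2, 2), &wsa);

        lsock = socket(AF_INET, SOCK_STREAM, 0);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(54321);   /* arbitrary test port */
        bind(lsock, (struct sockaddr *) &addr, sizeof(addr));
        listen(lsock, 1);

        csock = accept(lsock, NULL, NULL);
        send(csock, "FATAL: goodbye\n", 15, 0);

        /* No shutdown()/closesocket() here: at ExitProcess() Winsock
         * sends a RST, and a client that hasn't retrieved the message
         * above by then finds it discarded. */
        ExitProcess(0);
    }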

Sending data to a socket that was already closed from the other side is only one way to trigger a RST packet; closing a socket with l_linger=0 is another, and process termination is a third. All of them can lead to data loss on the receiver side, presumably because of the RST flag.
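For reference, forcing the RST via the linger option is just a setsockopt() call before closesocket(); a minimal sketch (abortive_close is an illustrative name, error handling omitted):

    #include <winsock2.h>

    static void
    abortive_close(SOCKET sock)
    {
        struct linger lg;

        lg.l_onoff = 1;     /* enable SO_LINGER */
        lg.l_linger = 0;    /* timeout 0: closesocket() sends a RST */
        setsockopt(sock, SOL_SOCKET, SO_LINGER,
                   (const char *) &lg, sizeof(lg));
        closesocket(sock);  /* peer's unread buffered data is dropped */
    }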

An alternative to closesocket() is shutdown(sock, SD_SEND). It doesn't free the socket resource, but it leads to a graceful shutdown. However, the FIN packet is sent when the shutdown() or closesocket() function is called, and that's still shortly before the process terminates. I did some more testing with different linger options, but it didn't change the behavior substantially. So I didn't find any way to close the socket with a FIN packet at the moment of process termination.
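To make that concrete, the graceful variant looks roughly like this (illustrative sketch, error handling omitted): send-close the socket so a FIN rather than a RST goes out, drain whatever the peer still sends, then release the resource:

    #include <winsock2.h>

    static void
    graceful_close(SOCKET sock)
    {
        char discard[1024];

        shutdown(sock, SD_SEND);    /* FIN goes out now, not at exit */

        /* Drain until the peer closes its side, so that no RST is
         * provoked later by unread data. */
        while (recv(sock, discard, sizeof(discard), 0) > 0)
            ;

        closesocket(sock);
    }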

The other way around would be to make sure on the client side that the last message is retrieved before the RST packet arrives, so that no data is lost. This mostly works well through the sync API of libpq, but with the async API the trigger for data reception is outside the scope of libpq, so there's no way to ensure recv() is called quickly enough, after the data was received but before the RST arrives. On a local client+server combination there is only a gap of 0.5 milliseconds or so. I also didn't find a way to retrieve the enqueued data after the RST arrived. Maybe there's a nasty hack to retrieve the data afterwards, but I didn't dig into the assembly code and memory layout of Winsock internals.
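With the async API, the best an application can do is consume input the moment the socket becomes readable, which narrows but cannot close that window. Something along these lines (drain_pending_input is a made-up helper, not libpq API; error handling omitted):

    #include <winsock2.h>   /* select() and fd_set on Windows */
    #include <libpq-fe.h>

    static int
    drain_pending_input(PGconn *conn)
    {
        fd_set  readable;
        int     sock = PQsocket(conn);
        struct timeval nowait = {0, 0};     /* poll, don't block */

        if (sock < 0)
            return -1;

        FD_ZERO(&readable);
        FD_SET((SOCKET) sock, &readable);

        /* Pull buffered bytes into libpq as soon as they are readable,
         * before a possible RST discards them; the caller would then
         * use PQisBusy()/PQgetResult() as usual. */
        if (select(0, &readable, NULL, NULL, &nowait) > 0)
            return PQconsumeInput(conn);

        return 1;
    }

Even called from a tight event loop, this still leaves the sub-millisecond race described above open.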


> In any case, our previous
> results definitely show that issuing an explicit close() is no panacea.
I don't fully understand the issue with closing the socket before process termination. Sure, it can be valuable information that the corresponding backend process has definitely terminated, at least in the context of regression testing or so. But I think that losing messages from the backend is far more critical than a non-synchronous process termination. Am I missing something?

--

Regards,
Lars Kanis



