On 18.11.21 at 03:04, Tom Lane wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
>> I realise now that the experiments we did a while back to try to
>> understand this across a few different operating systems[2] had missed
>> this subtlety, because that Python script had an explicit close()
>> call, whereas PostgreSQL exits.  It still revealed that the client
>> isn't allowed to read any data after its write failed, which is a
>> known source of error messages being eaten.
>
> Yeah.  After re-reading that thread, I'm a bit confused about how
> to square the results we got then with Lars' report.  The Windows
> documentation he pointed to does claim that the default behavior if you
> issue closesocket() is to do a "graceful close in the background", which
> one would think means allowing sent data to be received.  That's not what
> we saw.  It's possible that we would get different results if we re-tested
> with a scenario where the client doesn't attempt to send data after the
> server-side close; but I'm not sure how much it's worth to improve that
> case if the other case still fails hard.

From my experimentation, the Winsock implementation has the two issues I explained. First, it drops all received but not yet retrieved data as soon as it receives a RST packet. Second, at process termination it sends a RST packet on every socket that wasn't send-closed, regardless of whether there is any pending data.
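Roughly, a minimal server along the following lines triggers the second issue (a sketch only; the port number and message text are arbitrary, error handling is omitted, and it needs to be linked with ws2_32). It writes one final message and then terminates without send-closing the socket, so Winsock emits a RST and a client that hasn't called recv() yet loses the message:

    #include <winsock2.h>

    int
    main(void)
    {
        WSADATA wsa;
        SOCKET  lsock, csock;
        struct sockaddr_in addr = {0};

        WSAStartup(MAKEWORD(2, 2), &wsa);

        lsock = socket(AF_INET, SOCK_STREAM, 0);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(54321);   /* arbitrary test port */
        bind(lsock, (struct sockaddr *) &addr, sizeof(addr));
        listen(lsock, 1);

        csock = accept(lsock, NULL, NULL);
        send(csock, "FATAL: goodbye\n", 15, 0);

        /* No shutdown()/closesocket() here: at ExitProcess() Winsock
         * sends a RST, and a client that hasn't retrieved the message
         * above by then finds it discarded. */
        ExitProcess(0);
    }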

Sending data to a socket that was already closed from the other side is only one way to trigger a RST packet; closing a socket with l_linger=0 is another, and process termination is a third. All of them can lead to data loss on the receiver side, presumably because of the RST flag.
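For reference, forcing the RST via the linger option is just a setsockopt() call before closesocket(); a minimal sketch (abortive_close is an illustrative name, error handling omitted):

    #include <winsock2.h>

    static void
    abortive_close(SOCKET sock)
    {
        struct linger lg;

        lg.l_onoff = 1;     /* enable SO_LINGER */
        lg.l_linger = 0;    /* timeout 0: closesocket() sends a RST */
        setsockopt(sock, SOL_SOCKET, SO_LINGER,
                   (const char *) &lg, sizeof(lg));
        closesocket(sock);  /* peer's unread buffered data is dropped */
    }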

An alternative to closesocket() is shutdown(sock, SD_SEND). It doesn't free the socket resource, but it leads to a graceful shutdown. However, the FIN packet is sent when the shutdown() or closesocket() function is called, and that's still shortly before the process terminates. I did some more testing with different linger options, but it didn't change the behavior substantially. So I didn't find any way to close the socket with a FIN packet at the moment of process termination.
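To make that concrete, the graceful variant looks roughly like this (illustrative sketch, error handling omitted): send-close the socket so a FIN rather than a RST goes out, drain whatever the peer still sends, then release the resource:

    #include <winsock2.h>

    static void
    graceful_close(SOCKET sock)
    {
        char discard[1024];

        shutdown(sock, SD_SEND);    /* FIN goes out now, not at exit */

        /* Drain until the peer closes its side, so that no RST is
         * provoked later by unread data. */
        while (recv(sock, discard, sizeof(discard), 0) > 0)
            ;

        closesocket(sock);
    }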

The other way around would be to make sure on the client side that the last message is retrieved before the RST packet arrives, so that no data is lost. This mostly works well through the sync API of libpq, but with the async API the trigger for data reception is outside the scope of libpq, so there's no way to ensure recv() is called quickly enough, after the data was received but before the RST arrives. On a local client+server combination there is only a gap of 0.5 milliseconds or so. I also didn't find a way to retrieve the enqueued data after the RST arrived. Maybe there's a nasty hack to retrieve the data afterwards, but I didn't dig into the assembly code and memory layout of Winsock internals.
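With the async API, the best an application can do is consume input the moment the socket becomes readable, which narrows but cannot close that window. Something along these lines (drain_pending_input is a made-up helper, not libpq API; error handling omitted):

    #include <winsock2.h>   /* select() and fd_set on Windows */
    #include <libpq-fe.h>

    static int
    drain_pending_input(PGconn *conn)
    {
        fd_set  readable;
        int     sock = PQsocket(conn);
        struct timeval nowait = {0, 0};     /* poll, don't block */

        if (sock < 0)
            return -1;

        FD_ZERO(&readable);
        FD_SET((SOCKET) sock, &readable);

        /* Pull buffered bytes into libpq as soon as they are readable,
         * before a possible RST discards them; the caller would then
         * use PQisBusy()/PQgetResult() as usual. */
        if (select(0, &readable, NULL, NULL, &nowait) > 0)
            return PQconsumeInput(conn);

        return 1;
    }

Even called from a tight event loop, this still leaves the sub-millisecond race described above open.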


> In any case, our previous
> results definitely show that issuing an explicit close() is no panacea.
I don't fully understand the issue with closing the socket before process termination. Sure, it can be valuable information that the corresponding backend process has definitely terminated, at least in the context of regression testing or so. But I think that losing messages from the backend is far more critical than a non-synchronous process termination. Am I missing something?

--

Regards,
Lars Kanis



