Thomas Munro <thomas.mu...@gmail.com> writes:
> On Tue, Mar 19, 2019 at 9:11 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> One thing that isn't real clear to me is how much timing sensitivity
>>> there is in "when no more input data is available". Can we assume that
>>> if we've gotten ECONNRESET or an allied error from a write, then any
>>> data the far end might've wished to send us is already available to
>>> read?

> Following a trail beginning at
> https://en.wikipedia.org/wiki/Transmission_Control_Protocol I see that
> RFC 1122 4.2.2.13 discusses this topic and possible variations in this
> area. I don't know enough about any of this stuff to draw hard
> conclusions from primary sources, but it does appear that an
> implementation might be within its rights to jettison that data,
> unfortunately.

I spent some time looking at that, and as far as I can see, the behavior
reported for Windows is flat out forbidden by TCP. RFC1122 does discuss
the possibility that an O/S might not support half-closed connections;
but that would only matter if (in Unix terms) we issued
shutdown(sock, SHUT_WR) and expected to continue to be able to read,
which we do not. In any other scenario, TCP is supposed to deliver any
data it successfully received.

We can't, of course, work magic and retrieve data the TCP stack never
got, nor is there much we can do about it if the stack throws away data
despite the spec. So I'm not inclined to fuss about the corner cases.
The main thing for us is to ensure that if a server error message is
available to read, we do so and return it rather than returning a
less-helpful bleat about being unable to write. The proposed patch does
that. It will also tend to report bleats about read failure rather than
write failure, even if from libpq's perspective the write failure
happened first; but that seems fine to me.

>>> The reason I'm concerned is that I don't think it'd be bright to ignore a
>>> send error until we see input EOF, which'd be the obvious way to solve a
>>> timing problem if there is one. If our send error was some transient
>>> glitch and the connection is really still open, then we might not get EOF,
>>> but we won't get a server reply either because no message went to the
>>> server. You could imagine waiting some small interval of time before
>>> deciding that it's time to report the write failure, but ugh.

I'm likewise inclined to dismiss this worry, because I don't see how it
could happen, given that the server doesn't use shutdown(2). A send
error should imply either that the kernel saw RST from the remote end,
or that the connection is local and the kernel knows that the server
closed its socket or crashed. In either scenario the kernel should
consider that the incoming data stream is ended; maybe it'll give us
whatever data it received or maybe not, but it shouldn't allow us to sit
and wait for more data. Or in other words: there are no client-visible
"transient glitches" in TCP. Either the connection is still up, or it's
been reset.

So I'm inclined to (1) commit the patch as-proposed in HEAD, and
(2) hack the ssl test cases in v11 as you suggested. If we see field
complaints about this, we can consider reverting (2) in favor of a
back-patch once v12 beta is over.

			regards, tom lane
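
For illustration only, here is a minimal sketch of the behavior discussed
above: when a write fails, read whatever the peer already sent and report
that in preference to the write error, without waiting for more input.
This is not the proposed libpq patch. It is written in Python rather than
C for brevity, it uses a local socketpair instead of a real TCP
connection, and the names send_query, drain_pending and demo, as well as
the "FATAL" message text, are all made up for the example.

```python
import socket


def drain_pending(sock):
    """Collect whatever input is already available, without waiting."""
    chunks = []
    sock.setblocking(False)
    try:
        while True:
            try:
                chunk = sock.recv(4096)
            except OSError:
                # Either nothing more is available right now (would block)
                # or the read failed as well; in both cases stop here.
                break
            if not chunk:
                break  # EOF: the peer closed its end
            chunks.append(chunk)
    finally:
        sock.setblocking(True)
    return b"".join(chunks)


def send_query(sock, payload):
    """Send payload; if the write fails, prefer a pending peer message.

    Returns ("sent", None) on success, ("server_error", text) when the
    write failed but the peer left data behind, and ("write_error", msg)
    when there is nothing better to report.
    """
    try:
        sock.sendall(payload)
        return ("sent", None)
    except OSError as write_exc:
        pending = drain_pending(sock)
        if pending:
            return ("server_error", pending.decode(errors="replace"))
        return ("write_error", str(write_exc))


def demo():
    """Simulate a server that sends an error message and then goes away."""
    client, server = socket.socketpair()
    server.sendall(b"FATAL: terminating connection\n")  # hypothetical text
    server.close()
    try:
        return send_query(client, b"SELECT 1;")
    finally:
        client.close()
```

On a typical Unix system, demo() should hit a write failure on the client
side and still hand back the message the "server" sent before closing,
which is the outcome we want the client to see. Note that drain_pending
never waits for further input, matching the point that a send error means
the incoming stream has ended.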