On Tue, May 16, 2017 at 3:13 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > FWIW, I think the position most of us were taking is that this feature > is meant to retry transport-level connection failures, not cases where > we successfully make a connection to a server and then it rejects our > login attempt. I would classify a timeout as a transport-level failure > as long as it occurs before we got any server response --- if it happens > during the authentication protocol, that's less clear. But it might not > be very practical to distinguish those two cases. > > In short, +1 for retrying on timeout during connection, and I'm okay with > retrying a timeout during authentication if it's not practical to treat > that differently.
Sensible argument here. It could happen that a server is unresponsive, for example in split-brains (?). I have been playing a bit with the patch. + * + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if it timed out. */ pqWait is internal to libpq, so we are free to set up what we want here. Still I think that we should be consistent with what pqSocketCheck returns: * >0 means that the socket is readable/writable, counting things. * 0 is for timeout. * -1 on failure. + int ret = 0; + int timeout = 0; The declaration of "ret" should be internal in the for(;;) loop. + /* Attempt connection to the next host, starting the connect_timeout timer */ + pqDropConnection(conn, true); + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; + conn->status = CONNECTION_NEEDED; + finish_time = time(NULL) + timeout; + } I think that it would be safer to not set finish_time if conn->connect_timeout is NULL. I agree that your code works because pqWaitTimed() will never complain on timeout reached if finish_time is -1. That's for robustness sake. The docs say that for connect_timeout: <para> Maximum wait for connection, in seconds (write as a decimal integer string). Zero or not specified means wait indefinitely. It is not recommended to use a timeout of less than 2 seconds. This timeout applies separately to each connection attempt. For example, if you specify two hosts and both of them are unreachable, and <literal>connect_timeout</> is 5, the total time spent waiting for a connection might be up to 10 seconds. </para> It seems to me that this implies that if a timeout occurs on the first connection, the counter is reset, which is what this patch is doing. So we are all set. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers