On Sun, May 14, 2017 at 11:45 PM, Noah Misch <n...@leadboat.com> wrote:
>> I'll add this item in the PostgreSQL 10 Open Items.
>
> [Action required within three days.  This is a generic notification.]

I think there is a good argument that the existing behavior is as per the documentation, but I think we may want to change it anyway. What the documentation is saying - or at least what I believe I intended for it to say - is that connect_timeout is restarted for each new host, so you could end up waiting longer than connect_timeout - but not forever - if you specify multiple hosts. And I believe that statement to be correct.

Takayuki Tsunakawa is saying something different. He's saying that when connect_timeout expires, we should try the next host instead of giving up. That may or may not be a good idea, but it doesn't contradict the passage from the documentation which he quoted. That passage doesn't say anything at all about what happens when connect_timeout expires. It only talks about how much time might pass before that happens.

Takayuki Tsunakawa raised a very similar issue in another thread related to another open item, namely

https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05

in which he argued that libpq ought to try the next host after a connection failure regardless of the reason for the connection failure. Tom, Michael Paquier, and I all disagreed; none of us believe that this feature was intended to retry the connection to a different host after an arbitrary error reported by the remote server.

This thread is essentially the same issue, except that here the question isn't what should happen after we connect to a server and it returns an error, but rather what should happen when we time out waiting to connect to a server. When that happens, should we give up, or try the next server? Despite the chorus of support for the opposite conclusion on the other thread, I'm inclined to think that it would be best to change the behavior here as per the proposed patch.

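To make the two behaviors concrete, here is a minimal sketch in Python. It is a toy model of the decision only, not libpq code: the names Outcome, Attempt, try_hosts, and advance_on_timeout are hypothetical, and the model assumes that an outright socket() or connect() failure moves on to the next host while an error reported by the server stops the loop. The timeout is restarted for each host, so the total wait can exceed connect_timeout but is bounded by the number of hosts times connect_timeout.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    CONNECTED = "connected"
    CONNECT_FAILED = "connect_failed"  # socket()/connect() failed outright
    TIMED_OUT = "timed_out"            # connect_timeout expired for this host
    SERVER_ERROR = "server_error"      # server answered, but with an error


@dataclass
class Attempt:
    """Hypothetical record of what happens when one host is tried."""
    host: str
    outcome: Outcome
    elapsed: float  # seconds the attempt would take if never cut off


def try_hosts(
    attempts: List[Attempt],
    connect_timeout: float,
    advance_on_timeout: bool,
) -> Tuple[Optional[str], float]:
    """Return (connected host or None, total seconds waited).

    advance_on_timeout=False models the existing behavior (give up when
    connect_timeout expires); True models the proposed behavior (treat a
    timeout like a failed connect() and try the next host).
    """
    waited = 0.0
    for attempt in attempts:
        # The timer is restarted per host, so each host costs at most
        # connect_timeout seconds.
        waited += min(attempt.elapsed, connect_timeout)
        if attempt.outcome is Outcome.CONNECTED:
            return attempt.host, waited
        if attempt.outcome is Outcome.CONNECT_FAILED:
            continue  # assumed: outright connection failure advances
        if attempt.outcome is Outcome.TIMED_OUT and advance_on_timeout:
            continue  # proposed behavior only
        # Server-reported error, or a timeout under the existing behavior.
        return None, waited
    return None, waited


hosts = [
    Attempt("host1", Outcome.TIMED_OUT, 10.0),
    Attempt("host2", Outcome.CONNECTED, 0.5),
]
existing = try_hosts(hosts, connect_timeout=5.0, advance_on_timeout=False)
proposed = try_hosts(hosts, connect_timeout=5.0, advance_on_timeout=True)
print(existing)
print(proposed)
```

With host1 timing out and host2 healthy, the existing policy gives up after the first timeout, while the proposed policy reaches host2. A server-reported error stops the loop under either policy.
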
The point of being able to specify multiple hosts is to be able to have multiple database servers (or perhaps, multiple ways to access the same database server) and use whichever one of those servers is currently up. I think that when the server fails with a complaint like "I've never heard of the database to which you want to connect" that's not a case of the server being down, but some other kind of trouble that the administrator really ought to fix; thus it's best to stop and report the error. But if connect_timeout expires, that sounds a whole lot like the server being down. It sounds morally equivalent to socket() or connect() failing outright, which *would* trigger advancing to the next host.

So I'm inclined to accept the patch, but as a definitional change rather than a bug fix. However, I'd like to hear some other opinions. I'll wait until Friday for such opinions to arrive, and then update on next steps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company