> Was the primary running and accepting connections when you encountered this 
> error?  That is, if you specified host="host1 host2", host1 was the non-hot 
> standby and host2 was a running primary?  Or only the non-hot standby was 
> running?
>
> If a primary was running, I'd say it's a bug...  Perhaps the following part in 
> libpq gives up connection attempts when the above FATAL error is returned from 
> the server.  Maybe libpq should differentiate errors using SQLSTATE and 
> continue connection attempts on other hosts.

Yes, the primary was running, but the non-hot standby is listed before the 
primary in the connection string.
Hao Wu and I wrote a patch to fix this problem. The client-side libpq should 
try the other hosts in the connection string when it is rejected by a non-hot 
standby, or when the first host encounters network problems during the libpq 
handshake.
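
To illustrate the intended behavior, here is a minimal, self-contained sketch 
in C. It is not the attached patch and does not use libpq; AttemptResult, 
should_try_next_host() and pick_host() are hypothetical names made up for this 
example. The one grounded detail is that the server reports "the database 
system is starting up" with SQLSTATE 57P03 (ERRCODE_CANNOT_CONNECT_NOW).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* SQLSTATE sent with "the database system is starting up"
 * (ERRCODE_CANNOT_CONNECT_NOW). */
#define SQLSTATE_CANNOT_CONNECT_NOW "57P03"

/* Hypothetical outcome of one connection attempt; not a libpq type. */
typedef struct
{
    bool        network_error;  /* handshake failed before any server reply */
    const char *sqlstate;       /* SQLSTATE of a server error, NULL on success */
} AttemptResult;

/* Hypothetical helper: is this failure worth retrying on the next host? */
static bool
should_try_next_host(const AttemptResult *res)
{
    if (res->network_error)
        return true;            /* network problem during the handshake */
    if (res->sqlstate == NULL)
        return false;           /* connected, nothing to retry */
    /* Rejected by a server that cannot accept connections right now,
     * for example a non-hot standby. */
    return strcmp(res->sqlstate, SQLSTATE_CANNOT_CONNECT_NOW) == 0;
}

/*
 * Walk the hosts in connection-string order.  Return the index of the
 * first host that accepted the connection, or -1 if none did or if a
 * non-retryable error (such as an authentication failure) was hit.
 */
static int
pick_host(const AttemptResult *results, size_t nhosts)
{
    for (size_t i = 0; i < nhosts; i++)
    {
        if (!results[i].network_error && results[i].sqlstate == NULL)
            return (int) i;
        if (!should_try_next_host(&results[i]))
            return -1;
    }
    return -1;
}
```

With a non-hot standby listed first and a running primary second, pick_host() 
skips the standby and returns the index of the primary. Whether errors other 
than 57P03 should also be treated as retryable is a design question this 
sketch leaves open.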

> Please send emails in text format.  Your email was in HTML, and I changed this 
> reply to text format.

Thanks. Is this email in text format now? I am using Outlook in Chrome. Let me 
know if it is still in HTML format.

Hubert & Hao Wu

________________________________
From: tsunakawa.ta...@fujitsu.com <tsunakawa.ta...@fujitsu.com>
Sent: Tuesday, October 27, 2020 5:30 PM
To: Hubert Zhang <zhub...@vmware.com>
Cc: pgsql-hack...@postgresql.org <pgsql-hack...@postgresql.org>
Subject: RE: Multiple hosts in connection string failed to failover in non-hot 
standby mode

Please send emails in text format.  Your email was in HTML, and I changed this 
reply to text format.


From: Hubert Zhang <zhub...@vmware.com>
> Libpq supports specifying multiple hosts in the connection string and fails 
> over automatically when the previous PostgreSQL instance cannot be accessed.
> But when I tried to use this feature with a non-hot standby, it did not fail 
> over and printed the following message.
>
> psql: error: could not connect to server: FATAL:  the database system is 
> starting up

Was the primary running and accepting connections when you encountered this 
error?  That is, if you specified host="host1 host2", host1 was the non-hot 
standby and host2 was a running primary?  Or only the non-hot standby was 
running?

If a primary was running, I'd say it's a bug...  Perhaps the following part in 
libpq gives up connection attempts when the above FATAL error is returned from 
the server.  Maybe libpq should differentiate errors using SQLSTATE and 
continue connection attempts on other hosts.

[fe-connect.c]
                /* Handle errors. */
                if (beresp == 'E')
                {
                    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
...
#endif

                    goto error_return;
                }

                /* It is an authentication request. */
                conn->auth_req_received = true;

                /* Get the type of request. */


Regards
Takayuki Tsunakawa

Attachment: 0001-Enhance-libpq-to-support-multiple-host-for-non-hot-s.patch
