Re: Multiple hosts in connection string failed to failover in non-hot standby mode

Justin Pryzby Thu, 06 May 2021 09:27:11 -0700

On Sun, Jan 10, 2021 at 05:38:50PM -0500, Tom Lane wrote:
> I wrote:
> > The problems that I see in this area are first that there's no
> > real standardization in libpq as to whether to append error messages
> > together or just flush preceding messages; and second that no effort
> > is made in multi-connection-attempt cases to separate the errors from
> > different attempts, much less identify which error goes with which
> > host or IP address.  I think we really need to put some work into
> > that.
> 
> I spent some time on this, and here is a patch set that tries to
> improve matters.
> 
> 0001 changes the libpq coding rules to be that all error messages should
> be appended to conn->errorMessage, never overwritten (there are a couple
> of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only
> at the start of a connection request or new query.  This is something
> that's been bugging me for a long time and I'm glad to get it cleaned up.
> Formerly it seemed that a dartboard had been used to decide whether to use
> "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter.
> We can also get rid of several hacks that were used to get around the
> mess and force appending behavior.
> 
> 0002 then changes the reporting rules in fe-connect.c as I suggested,
> so that you might get errors like this:
> 
> $ psql -h localhost,/tmp -p 12345
> psql: error: could not connect to host "localhost" (::1), port 12345: 
> Connection refused
>         Is the server running on that host and accepting TCP/IP connections?
> could not connect to host "localhost" (127.0.0.1), port 12345: Connection 
> refused
>         Is the server running on that host and accepting TCP/IP connections?
> could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory
>         Is the server running locally and accepting connections on that 
> socket?
> 
> and we have a pretty uniform rule that errors coming back from a
> connection attempt will be prefixed with the server address.


52a10224 broke sqlsmith, of all things.

It was using errmsg as a test of success, instead of checking if the connection
result wasn't null:

     conn = PQconnectdb(conninfo.c_str());
     char *errmsg = PQerrorMessage(conn);
     if (strlen(errmsg))
         throw dut::broken(errmsg, "08001");

That's clearly the wrong thing to do, but maybe this should be described in the
release notes as a compatibility issue, in case other people had the same idea.
Clearing errorMessage during success is an option..

-- 
Justin

Re: Multiple hosts in connection string failed to failover in non-hot standby mode

Reply via email to