Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, Sep 13, 2021 at 04:09:26PM -0400, Tom Lane wrote: > I got around to looking at this issue today, and verified that only one > place needs to be changed, as attached. Thanks! This looks fine to me at quick glance. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > On Sun, May 30, 2021 at 08:25:00PM -0500, Justin Pryzby wrote: >> ..But I think it's not useful to put details into errorMessage on success, >> unless you're going to document that. It would never have occurred to me to >> look there, or that it would even be safe. > Yeah. On the contrary, it could be confusing if one sees an error > message but there is nothing to worry about, because things are > working in the scope of what the user wanted at connection time. I got around to looking at this issue today, and verified that only one place needs to be changed, as attached. Although I was initially thinking that maybe we should leave the code as-is, I now agree that resetting errorMessage is a good idea, because what tends to be in it at this point is something like "connection to server on socket "/tmp/.s.PGSQL.5432" failed: " (ie the string made by emitHostIdentityInfo). Anybody who does look at that is likely to be confused, because the connection *didn't* fail. There might be some value in my original idea of preserving a trace of the servers we tried before succeeding. But it would take additional work to present it in a non-confusing way, and given the lack of any field requests for that, I'm not excited about doing it right now. (One could also argue that it ought to get tied into the PQtrace facilities somehow, rather than being done in this ad-hoc way.) regards, tom lane diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 49eec3e835..b288d346f9 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -3654,6 +3654,13 @@ keep_going: /* We will come back to here until there is /* We can release the address list now. */ release_conn_addrinfo(conn); +/* + * Contents of conn->errorMessage are no longer interesting + * (and it seems some clients expect it to be empty after a + * successful connection). + */ +resetPQExpBuffer(>errorMessage); + /* We are open for business! */ conn->status = CONNECTION_OK; return PGRES_POLLING_OK;
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
I wrote: > Now I'm inclined to go back to the first-draft patch I had, which just > dropped the first problematic test case, and added gssencmode=disable > to the second one. Done that way. If we figure out why the GSS code is acting strangely on hamerkop, maybe this can be reverted --- but it seems like we already spent more time than is justified looking for band-aids. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > On Wed, Jun 09, 2021 at 12:05:10PM -0400, Tom Lane wrote: >> Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression, > Using ecpg2_regression for the role goes a bit against the recent rule > to not create any role not suffixed by "regress_" as part of the > regression tests, but I am fine to live with that here. Oh dear, I forgot to check that carefully. I'd been thinking the rule was that such names must *contain* "regress", but looking at user.c, it's stricter: #ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS if (strncmp(stmt->role, "regress_", 8) != 0) elog(WARNING, "roles created by regression test cases should have names starting with \"regress_\""); #endif Meanwhile, the rule for database names is: #ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS if (IsUnderPostmaster && strstr(dbname, "regression") == NULL) elog(WARNING, "databases created by regression test cases should have names including \"regression\""); #endif So unless we want to relax one or both of those, we can't have a user name that matches the database name. Now I'm inclined to go back to the first-draft patch I had, which just dropped the first problematic test case, and added gssencmode=disable to the second one. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Wed, Jun 09, 2021 at 12:05:10PM -0400, Tom Lane wrote: > Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression, > which matches the name of one of the databases used, allowing the test > cases with defaulted database name to succeed. That gets rid of one of > the problematic diffs. Yeah, I agree that this does not matter much for this one, as we want to stress the quotes and the grammar for the connections here, as 99a5619 implies. It is good to check for the failure path as well, so what you have here looks fine to me. > As it stood, though, that meant that connect/test5 > wasn't exercising the connection-failure code path at all, which didn't > seem like what we want. So I adjusted the second place that had been > failing to again fail on no-such-database, and stuck in gssencmode=disable > so that we shouldn't get any test diff on hamerkop. Using ecpg2_regression for the role goes a bit against the recent rule to not create any role not suffixed by "regress_" as part of the regression tests, but I am fine to live with that here. The changes for test1 with MinGW look right, I have not been able to test them. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
I wrote: > ... I'd be okay with dropping that test; or maybe we could > fix things so that the default case succeeds? Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression, which matches the name of one of the databases used, allowing the test cases with defaulted database name to succeed. That gets rid of one of the problematic diffs. As it stood, though, that meant that connect/test5 wasn't exercising the connection-failure code path at all, which didn't seem like what we want. So I adjusted the second place that had been failing to again fail on no-such-database, and stuck in gssencmode=disable so that we shouldn't get any test diff on hamerkop. regards, tom lane diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile index be53b7b94d..abea3fcc85 100644 --- a/src/interfaces/ecpg/test/Makefile +++ b/src/interfaces/ecpg/test/Makefile @@ -77,7 +77,7 @@ $(remaining_files_build): $(abs_builddir)/%: $(srcdir)/% endif # Common options for tests. Also pick up anything passed in EXTRA_REGRESS_OPTS -REGRESS_OPTS = --dbname=ecpg1_regression,ecpg2_regression --create-role=regress_ecpg_user1,regress_ecpg_user2 $(EXTRA_REGRESS_OPTS) +REGRESS_OPTS = --dbname=ecpg1_regression,ecpg2_regression --create-role=regress_ecpg_user1,ecpg2_regression $(EXTRA_REGRESS_OPTS) check: all $(with_temp_install) ./pg_regress $(REGRESS_OPTS) --temp-instance=./tmp_check $(TEMP_CONF) --bindir= $(pg_regress_locale_flags) $(THREAD) --schedule=$(srcdir)/ecpg_schedule sql/twophase diff --git a/src/interfaces/ecpg/test/connect/test1.pgc b/src/interfaces/ecpg/test/connect/test1.pgc index 961bd72ef2..77e571ec86 100644 --- a/src/interfaces/ecpg/test/connect/test1.pgc +++ b/src/interfaces/ecpg/test/connect/test1.pgc @@ -26,16 +26,16 @@ exec sql end declare section; exec sql connect to ecpg2_regression@localhost as main; exec sql disconnect main; - exec sql connect to @localhost as main user regress_ecpg_user2; + exec sql connect to @localhost as main user ecpg2_regression; exec sql disconnect main; - /* exec sql connect to :@TEMP_PORT@ as main user regress_ecpg_user2; + /* exec sql connect to :@TEMP_PORT@ as main user ecpg2_regression; exec sql disconnect main; */ exec sql connect to tcp:postgresql://localhost/ecpg2_regression user regress_ecpg_user1 identified by connectpw; exec sql disconnect; - exec sql connect to tcp:postgresql://localhost/ user regress_ecpg_user2; + exec sql connect to tcp:postgresql://localhost/ user ecpg2_regression; exec sql disconnect; strcpy(pw, "connectpw"); diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc index e712fa8778..96a7b3e892 100644 --- a/src/interfaces/ecpg/test/connect/test5.pgc +++ b/src/interfaces/ecpg/test/connect/test5.pgc @@ -21,7 +21,7 @@ exec sql end declare section; ECPGdebug(1, stderr); exec sql connect to ecpg2_regression as main; - exec sql alter user regress_ecpg_user2 ENCRYPTED PASSWORD 'insecure'; + exec sql alter user ecpg2_regression ENCRYPTED PASSWORD 'insecure'; exec sql alter user regress_ecpg_user1 ENCRYPTED PASSWORD 'connectpw'; exec sql commit; exec sql disconnect; /* <-- "main" not specified */ @@ -40,7 +40,7 @@ exec sql end declare section; exec sql connect to 'ecpg2_regression' as main; exec sql disconnect main; - exec sql connect to as main user regress_ecpg_user2/insecure; + exec sql connect to as main user ecpg2_regression/insecure; exec sql disconnect main; exec sql connect to ecpg2_regression as main user regress_ecpg_user1/connectpw; @@ -61,7 +61,7 @@ exec sql end declare section; exec sql connect to "unix:postgresql://200.46.204.71/ecpg2_regression" as main user regress_ecpg_user1/connectpw; exec sql disconnect main; - exec sql connect to unix:postgresql://localhost/ as main user regress_ecpg_user2 IDENTIFIED BY insecure; + exec sql connect to "unix:postgresql://localhost/no_such_db?gssencmode=disable" as main user ecpg2_regression IDENTIFIED BY insecure; exec sql disconnect main; /* connect twice */ diff --git a/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr b/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr index c5b5248eb2..e5f16bd854 100644 --- a/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr +++ b/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr @@ -14,30 +14,18 @@ [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: opening database on localhost port for user regress_ecpg_user2 -[NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: connection to server failed: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: opening database on localhost port for user ecpg2_regression [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed [NO_PID]: sqlca: code:
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > On Tue, Jun 08, 2021 at 11:21:34AM -0400, Tom Lane wrote: >> IIUC, what you are proposing to do is replace pgwin32_setenv with >> src/port/setenv.c. I don't think that's an improvement. setenv.c >> leaks memory on repeat calls, because it cannot know what >> pgwin32_setenv knows about how putenv works on that platform. > Is gaur the only animal that needs this file, by the way? I think it is. setenv has been in POSIX for awhile, so probably only very old systems would need that. (This is why I don't care that much that setenv.c leaks memory. But we can't start using it on platforms where we *do* care about performance.) >> (3) Don't try to use the environment variable for this >> purpose. I'd originally tried to change test5.pgc to just >> specify gssmode=disable in-line, but that only works >> nicely for one of the two failing cases. The other one >> is testing the case of a completely defaulted connection >> target, so there's no place to add an option without >> breaking the only unique aspect of that test case. > FWIW, I'd be rather in favor of doing (3) because this remains simple > just to take care of an edge case, even if that partially breaks the > promise to rely on a default connection. Yeah, it doesn't seem like we need to test that case all that badly. I'd be okay with dropping that test; or maybe we could fix things so that the default case succeeds? regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Tue, Jun 08, 2021 at 11:21:34AM -0400, Tom Lane wrote: > IIUC, what you are proposing to do is replace pgwin32_setenv with > src/port/setenv.c. I don't think that's an improvement. setenv.c > leaks memory on repeat calls, because it cannot know what > pgwin32_setenv knows about how putenv works on that platform. Is gaur the only animal that needs this file, by the way? > (1) allow just this one ECPG test to include port.h (or > probably c.h). However, there's a whole other can of worms > there, which is that I wonder if we aren't doing it wrong > on the Unix side by linking libpgport when we shouldn't. > We've not been bit by that yet, but I wonder if it isn't > just a matter of time. The MSVC build, by not linking > those libs in the first place, is really doing this the > correct way. I don't really want to include this stuff in the ECPG tests just to bypass an environment configuration. > (2) Let pg_regress_ecpg.c pass down the environment setting. > > (3) Don't try to use the environment variable for this > purpose. I'd originally tried to change test5.pgc to just > specify gssmode=disable in-line, but that only works > nicely for one of the two failing cases. The other one > is testing the case of a completely defaulted connection > target, so there's no place to add an option without > breaking the only unique aspect of that test case. > (2) is starting to seem attractive now that we've seen > the downsides of (1) and (3). FWIW, I'd be rather in favor of doing (3) because this remains simple just to take care of an edge case, even if that partially breaks the promise to rely on a default connection. (4) would be to revisit the decision to make libpq report all the errors stored in its stack with multiple attempts. That would bring back the buildfarm to green at least, and we still need to take a decision about that for 14 anyway as it involves a compatibility breakage. But I agree that we also should do something for 12~ for those tests. > (BTW, I just noticed that regress.c is unsetenv'ing the > SSL connection environment variables, but not the GSS ones. > Seens like that needs work similar to 8279f68a1.) Yes, I saw that I was able to break things is many fancy ways when working on 8279f68a1, but the list of parameters to reset needs to diverge a bit compared to the TAP tests. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > Now, I also see that using pgwin32_setenv() instead of > src/port/setenv.c causes cl to be confused once we update > ecpg_regression.proj because it cannot find setenv(). Bringing the > question, why is it necessary to have both setenv.c and > pgwin32_setenv() on HEAD? setenv.c should be enough once you have the > fallback implementation of putenv() available. IIUC, what you are proposing to do is replace pgwin32_setenv with src/port/setenv.c. I don't think that's an improvement. setenv.c leaks memory on repeat calls, because it cannot know what pgwin32_setenv knows about how putenv works on that platform. It'd be okay to do it like that for the ECPG tests, perhaps, because we don't really care about small leaks in those. But I don't want it to happen across-the-board. Thinking more, the real problem is that use of libpgport goes hand-in-hand with #including port.h; it's not going to work real well if you do one without the other. And I don't think we want to include port.h in the ECPG test programs, because those are trying to model the environment that typical user applications see. Alternatives seem to be (1) allow just this one ECPG test to include port.h (or probably c.h). However, there's a whole other can of worms there, which is that I wonder if we aren't doing it wrong on the Unix side by linking libpgport when we shouldn't. We've not been bit by that yet, but I wonder if it isn't just a matter of time. The MSVC build, by not linking those libs in the first place, is really doing this the correct way. (2) Let pg_regress_ecpg.c pass down the environment setting. (3) Don't try to use the environment variable for this purpose. I'd originally tried to change test5.pgc to just specify gssmode=disable in-line, but that only works nicely for one of the two failing cases. The other one is testing the case of a completely defaulted connection target, so there's no place to add an option without breaking the only unique aspect of that test case. (2) is starting to seem attractive now that we've seen the downsides of (1) and (3). (BTW, I just noticed that regress.c is unsetenv'ing the SSL connection environment variables, but not the GSS ones. Seens like that needs work similar to 8279f68a1.) regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, Jun 07, 2021 at 10:38:03AM -0400, Tom Lane wrote: > Hmm. We do include "-lpgcommon -lpgport" when building the ecpg test > programs on Unix, so I'd assumed that the MSVC scripts did the same. > Is there a good reason not to make them do so? I was looking at that this morning, and yes we need to add more references here. Actually, adding only libpgport.lib allows the compilation and the tests to work, but I agree to add also libpgcommon.lib so as we don't fall into the same compilation trap again in the future. Now, I also see that using pgwin32_setenv() instead of src/port/setenv.c causes cl to be confused once we update ecpg_regression.proj because it cannot find setenv(). Bringing the question, why is it necessary to have both setenv.c and pgwin32_setenv() on HEAD? setenv.c should be enough once you have the fallback implementation of putenv() available. Attached is the patch I am finishing with, that also brings all this stuff closer to what I did in 12 and 13 for hamerkop. The failing test is passing for me now with MSVC and GSSAPI builds. Thoughts? -- Michael diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h index 05c5a53442..e25f65b054 100644 --- a/src/include/port/win32_port.h +++ b/src/include/port/win32_port.h @@ -490,11 +490,9 @@ extern void _dosmaperr(unsigned long); /* in port/win32env.c */ extern int pgwin32_putenv(const char *); -extern int pgwin32_setenv(const char *name, const char *value, int overwrite); extern int pgwin32_unsetenv(const char *name); #define putenv(x) pgwin32_putenv(x) -#define setenv(x,y,z) pgwin32_setenv(x,y,z) #define unsetenv(x) pgwin32_unsetenv(x) /* in port/win32security.c */ diff --git a/src/port/win32env.c b/src/port/win32env.c index a03556078c..c8d43af381 100644 --- a/src/port/win32env.c +++ b/src/port/win32env.c @@ -1,7 +1,7 @@ /*- * * win32env.c - * putenv(), setenv(), and unsetenv() for win32. + * putenv() and unsetenv() for win32. * * These functions update both the process environment and caches in * (potentially multiple) C run-time library (CRT) versions. @@ -117,35 +117,6 @@ pgwin32_putenv(const char *envval) return _putenv(envval); } -int -pgwin32_setenv(const char *name, const char *value, int overwrite) -{ - int res; - char *envstr; - - /* Error conditions, per POSIX */ - if (name == NULL || name[0] == '\0' || strchr(name, '=') != NULL || - value == NULL) - { - errno = EINVAL; - return -1; - } - - /* No work if variable exists and we're not to replace it */ - if (overwrite == 0 && getenv(name) != NULL) - return 0; - - envstr = (char *) malloc(strlen(name) + strlen(value) + 2); - if (!envstr)/* not much we can do if no memory */ - return -1; - - sprintf(envstr, "%s=%s", name, value); - - res = pgwin32_putenv(envstr); - free(envstr); - return res; -} - int pgwin32_unsetenv(const char *name) { diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc index e712fa8778..f7263935ce 100644 --- a/src/interfaces/ecpg/test/connect/test5.pgc +++ b/src/interfaces/ecpg/test/connect/test5.pgc @@ -18,6 +18,9 @@ exec sql begin declare section; char *user="regress_ecpg_user1"; exec sql end declare section; + /* disable GSSENC to ensure stability of connection failure reports */ + setenv("PGGSSENCMODE", "disable", 1); + ECPGdebug(1, stderr); exec sql connect to ecpg2_regression as main; diff --git a/src/interfaces/ecpg/test/expected/connect-test5.c b/src/interfaces/ecpg/test/expected/connect-test5.c index 6ae5b589de..a86a5e4331 100644 --- a/src/interfaces/ecpg/test/expected/connect-test5.c +++ b/src/interfaces/ecpg/test/expected/connect-test5.c @@ -38,37 +38,33 @@ main(void) #line 19 "test5.pgc" + /* disable GSSENC to ensure stability of connection failure reports */ + setenv("PGGSSENCMODE", "disable", 1); + ECPGdebug(1, stderr); { ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); } -#line 23 "test5.pgc" - - { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user2 encrypted password 'insecure'", ECPGt_EOIT, ECPGt_EORT);} -#line 24 "test5.pgc" - - { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user1 encrypted password 'connectpw'", ECPGt_EOIT, ECPGt_EORT);} -#line 25 "test5.pgc" - - { ECPGtrans(__LINE__, NULL, "commit");} #line 26 "test5.pgc" - { ECPGdisconnect(__LINE__, "CURRENT");} + { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user2 encrypted password 'insecure'", ECPGt_EOIT, ECPGt_EORT);} #line 27 "test5.pgc" + + { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user1 encrypted password 'connectpw'", ECPGt_EOIT, ECPGt_EORT);} +#line 28 "test5.pgc" + + { ECPGtrans(__LINE__, NULL, "commit");} +#line 29 "test5.pgc" + + { ECPGdisconnect(__LINE__, "CURRENT");} +#line 30 "test5.pgc" /* <-- "main" not specified */
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > On Sun, Jun 06, 2021 at 05:27:49PM -0400, Tom Lane wrote: >> So I took >> a look at disabling GSSENC in these test cases to try to silence >> hamerkop's test failure that way. Here's a proposed patch. >> It relies on setenv() being available, but I think that's fine >> because we link the ECPG test programs with libpgport. > No, that's not it. The compilation of the tests happens when > triggering the tests as of ecpgcheck() in vcregress.pl so I think that > this is going to fail. This requires at least the addition of a > reference to libpgport in ecpg_regression.proj, perhaps more. Hmm. We do include "-lpgcommon -lpgport" when building the ecpg test programs on Unix, so I'd assumed that the MSVC scripts did the same. Is there a good reason not to make them do so? regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Sun, Jun 06, 2021 at 05:27:49PM -0400, Tom Lane wrote: > It seems like nobody's terribly interested in figuring out why > pg_GSS_have_cred_cache() is misbehaving on Windows. I have been investigating that for a couple of hours in total, but nothing to report yet. > So I took > a look at disabling GSSENC in these test cases to try to silence > hamerkop's test failure that way. Here's a proposed patch. > It relies on setenv() being available, but I think that's fine > because we link the ECPG test programs with libpgport. No, that's not it. The compilation of the tests happens when triggering the tests as of ecpgcheck() in vcregress.pl so I think that this is going to fail. This requires at least the addition of a reference to libpgport in ecpg_regression.proj, perhaps more. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
It seems like nobody's terribly interested in figuring out why pg_GSS_have_cred_cache() is misbehaving on Windows. So I took a look at disabling GSSENC in these test cases to try to silence hamerkop's test failure that way. Here's a proposed patch. It relies on setenv() being available, but I think that's fine because we link the ECPG test programs with libpgport. regards, tom lane diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc index e712fa8778..f7263935ce 100644 --- a/src/interfaces/ecpg/test/connect/test5.pgc +++ b/src/interfaces/ecpg/test/connect/test5.pgc @@ -18,6 +18,9 @@ exec sql begin declare section; char *user="regress_ecpg_user1"; exec sql end declare section; + /* disable GSSENC to ensure stability of connection failure reports */ + setenv("PGGSSENCMODE", "disable", 1); + ECPGdebug(1, stderr); exec sql connect to ecpg2_regression as main; diff --git a/src/interfaces/ecpg/test/expected/connect-test5.c b/src/interfaces/ecpg/test/expected/connect-test5.c index 6ae5b589de..a86a5e4331 100644 --- a/src/interfaces/ecpg/test/expected/connect-test5.c +++ b/src/interfaces/ecpg/test/expected/connect-test5.c @@ -38,37 +38,33 @@ main(void) #line 19 "test5.pgc" + /* disable GSSENC to ensure stability of connection failure reports */ + setenv("PGGSSENCMODE", "disable", 1); + ECPGdebug(1, stderr); { ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); } -#line 23 "test5.pgc" +#line 26 "test5.pgc" { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user2 encrypted password 'insecure'", ECPGt_EOIT, ECPGt_EORT);} -#line 24 "test5.pgc" +#line 27 "test5.pgc" { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user1 encrypted password 'connectpw'", ECPGt_EOIT, ECPGt_EORT);} -#line 25 "test5.pgc" +#line 28 "test5.pgc" { ECPGtrans(__LINE__, NULL, "commit");} -#line 26 "test5.pgc" +#line 29 "test5.pgc" { ECPGdisconnect(__LINE__, "CURRENT");} -#line 27 "test5.pgc" +#line 30 "test5.pgc" /* <-- "main" not specified */ strcpy(db, "ecpg2_regression"); strcpy(id, "main"); { ECPGconnect(__LINE__, 0, db , NULL, NULL , id, 0); } -#line 31 "test5.pgc" - - { ECPGdisconnect(__LINE__, id);} -#line 32 "test5.pgc" - - - { ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); } #line 34 "test5.pgc" - { ECPGdisconnect(__LINE__, "main");} + { ECPGdisconnect(__LINE__, id);} #line 35 "test5.pgc" @@ -86,21 +82,21 @@ main(void) #line 41 "test5.pgc" - { ECPGconnect(__LINE__, 0, "" , "regress_ecpg_user2" , "insecure" , "main", 0); } + { ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); } #line 43 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 44 "test5.pgc" - { ECPGconnect(__LINE__, 0, "ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } + { ECPGconnect(__LINE__, 0, "" , "regress_ecpg_user2" , "insecure" , "main", 0); } #line 46 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 47 "test5.pgc" - { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } + { ECPGconnect(__LINE__, 0, "ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } #line 49 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} @@ -114,48 +110,55 @@ main(void) #line 53 "test5.pgc" - { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , user , "connectpw" , "main", 0); } + { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } #line 55 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 56 "test5.pgc" - { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=latin1" , "regress_ecpg_user1" , "connectpw" , "main", 0); } + { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , user , "connectpw" , "main", 0); } #line 58 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 59 "test5.pgc" - { ECPGconnect(__LINE__, 0, "unix:postgresql://200.46.204.71/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } + { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=latin1" , "regress_ecpg_user1" , "connectpw" , "main", 0); } #line 61 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 62 "test5.pgc" - { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/" , "regress_ecpg_user2" , "insecure" , "main", 0); } + { ECPGconnect(__LINE__, 0, "unix:postgresql://200.46.204.71/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); } #line 64 "test5.pgc" { ECPGdisconnect(__LINE__, "main");} #line 65 "test5.pgc" + { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/" , "regress_ecpg_user2" , "insecure" , "main",
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, May 31, 2021 at 09:07:38PM -0400, Tom Lane wrote: > Michael Paquier writes: >>> Agreed, that seems bogus. > >> There may be others, and I have not checked yet. I'd rather do a >> backpatch for this part, would you agree? > > +1 Playing with all those variables and broken values here and there, I have been able to break a bunch of tests. Most of the failures were in the authentication and SSL tests, but there were also fancier cases. For example, PGCLIENTENCODING would cause a failure with pg_ctl, for any TAP test. I got surprised that enforcing values for most of the PGSSL* ones did not cause a failure when it came to the certs, CRLs keys and root certs now. Still, I think that we'd be safer to cancel these as well. Attached is the list I am finishing with. I'd like to fix that, so please let me know if there are any comments or objections. -- Michael diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index d6c3eb8723..d4f9fc5f2b 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -104,13 +104,30 @@ BEGIN delete $ENV{LC_ALL}; $ENV{LC_MESSAGES} = 'C'; + delete $ENV{PGCHANNELBINDING}; + delete $ENV{PGCLIENTENCODING}; delete $ENV{PGCONNECT_TIMEOUT}; delete $ENV{PGDATA}; delete $ENV{PGDATABASE}; + delete $ENV{PGGSSENCMODE}; + delete $ENV{PGGSSLIB}; delete $ENV{PGHOSTADDR}; + delete $ENV{PGKRBSRVNAME}; + delete $ENV{PGPASSFILE}; + delete $ENV{PGPASSWORD}; + delete $ENV{PGREQUIREPEER}; delete $ENV{PGREQUIRESSL}; delete $ENV{PGSERVICE}; + delete $ENV{PGSERVICEFILE}; + delete $ENV{PGSSLCERT}; + delete $ENV{PGSSLCRL}; + delete $ENV{PGSSLCRLDIR}; + delete $ENV{PGSSLKEY}; + delete $ENV{PGSSLMAXPROTOCOLVERSION}; + delete $ENV{PGSSLMINPROTOCOLVERSION}; delete $ENV{PGSSLMODE}; + delete $ENV{PGSSLROOTCERT}; + delete $ENV{PGSSLSNI}; delete $ENV{PGUSER}; delete $ENV{PGPORT}; delete $ENV{PGHOST}; signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: >>> Another thing that strikes me as incorrect is that we don't unset >>> PGGSSENCMODE or PGGSSLIB in TestLib.pm. Just noting it on the way.. >> Agreed, that seems bogus. > There may be others, and I have not checked yet. I'd rather do a > backpatch for this part, would you agree? +1 regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, May 31, 2021 at 09:36:20AM -0400, Tom Lane wrote: > Interesting --- I was considering running such a test locally, but > didn't get round to it yet. Just to be clear, that's my Windows dev box. > It seems like the ideal solution would be to make pg_GSS_have_cred_cache > smarter, so that we don't attempt a GSS connection cycle here. I am not sure yet what would be adapted here. That requires diving a bit into the upstream code. >> Another thing that strikes me as incorrect is that we don't unset >> PGGSSENCMODE or PGGSSLIB in TestLib.pm. Just noting it on the way.. > > Agreed, that seems bogus. There may be others, and I have not checked yet. I'd rather do a backpatch for this part, would you agree? -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > On Mon, May 31, 2021 at 12:05:12AM -0400, Tom Lane wrote: >> What is not clear is why GSS is acting that way. We wouldn't >> have tried a GSS connection unless pg_GSS_have_cred_cache >> succeeded ... so how come that worked but then gss_init_sec_context >> complained "Credential cache is empty"? > I suspect that's just the way the upstream installation works with a > credentials cache created from the beginning, as I see the same > behavior and the same error on my own host for HEAD with a KRB5 server > set up once the upstream installation runs. Interesting --- I was considering running such a test locally, but didn't get round to it yet. > Leaving the specific > topic of this thread aside for one moment, would there be an argument > for just enforcing gssencmode=disable in this set of tests down to 12? It seems like the ideal solution would be to make pg_GSS_have_cred_cache smarter, so that we don't attempt a GSS connection cycle here. But if we can't, adding gssencmode=disable to these test cases is what I was thinking about, too. > Another thing that strikes me as incorrect is that we don't unset > PGGSSENCMODE or PGGSSLIB in TestLib.pm. Just noting it on the way.. Agreed, that seems bogus. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, May 31, 2021 at 12:05:12AM -0400, Tom Lane wrote: > Yeah, I was looking at that earlier today. Evidently libpq is > trying a GSS-encrypted connection, and that doesn't work, so > it falls back to a regular connection where we get the expected > error. Probably all the connections in this test are hitting the > GSS failure, but only the ones where the second attempt fails > show a visible issue. Yep. This wastes cycles. > What is not clear is why GSS is acting that way. We wouldn't > have tried a GSS connection unless pg_GSS_have_cred_cache > succeeded ... so how come that worked but then gss_init_sec_context > complained "Credential cache is empty"? > > My rough guess is that Windows has implemented the GSS APIs in > such a way that what pg_GSS_have_cred_cache is testing isn't > sufficient to tell whether a sane credential actually exists. > > Or there's something particularly weird about how hamerkop > is set up. I suspect that's just the way the upstream installation works with a credentials cache created from the beginning, as I see the same behavior and the same error on my own host for HEAD with a KRB5 server set up once the upstream installation runs. Leaving the specific topic of this thread aside for one moment, would there be an argument for just enforcing gssencmode=disable in this set of tests down to 12? It is worth noting that the problem does not show up in 12 and 13 once the compilation works, because we just mask the error there, but the code path is still taken. Another thing that strikes me as incorrect is that we don't unset PGGSSENCMODE or PGGSSLIB in TestLib.pm. Just noting it on the way.. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Michael Paquier writes: > In my recent quest to look at GSSAPI builds on Windows, I have bumped > into another failure that's related to this thread. hamerkop > summarizes the situation here: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2021-05-29%2010%3A15%3A42 > There are two failures like this one as errorMessage piles up on > failures, as of connect/test5: > -[NO_PID]: ECPGconnect: connection to server failed: FATAL: database > "regress_ecpg_user2" does not exist > +[NO_PID]: ECPGconnect: connection to server failed: could not > initiate GSSAPI security context: Unspecified GSS failure. Minor > code may provide more information: Credential cache is empty > +connection to server failed: FATAL: database "regress_ecpg_user2" > does not exist Yeah, I was looking at that earlier today. Evidently libpq is trying a GSS-encrypted connection, and that doesn't work, so it falls back to a regular connection where we get the expected error. Probably all the connections in this test are hitting the GSS failure, but only the ones where the second attempt fails show a visible issue. What is not clear is why GSS is acting that way. We wouldn't have tried a GSS connection unless pg_GSS_have_cred_cache succeeded ... so how come that worked but then gss_init_sec_context complained "Credential cache is empty"? My rough guess is that Windows has implemented the GSS APIs in such a way that what pg_GSS_have_cred_cache is testing isn't sufficient to tell whether a sane credential actually exists. Or there's something particularly weird about how hamerkop is set up. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Mon, May 31, 2021 at 11:00:55AM +0900, Michael Paquier wrote: > Yeah. On the contrary, it could be confusing if one sees an error > message but there is nothing to worry about, because things are > working in the scope of what the user wanted at connection time. In my recent quest to look at GSSAPI builds on Windows, I have bumped into another failure that's related to this thread. hamerkop summarizes the situation here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2021-05-29%2010%3A15%3A42 There are two failures like this one as errorMessage piles up on failures, as of connect/test5: -[NO_PID]: ECPGconnect: connection to server failed: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: connection to server failed: could not initiate GSSAPI security context: Unspecified GSS failure. Minor code may provide more information: Credential cache is empty +connection to server failed: FATAL: database "regress_ecpg_user2" does not exist -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Sun, May 30, 2021 at 08:25:00PM -0500, Justin Pryzby wrote: > ..But I think it's not useful to put details into errorMessage on success, > unless you're going to document that. It would never have occurred to me to > look there, or that it would even be safe. Yeah. On the contrary, it could be confusing if one sees an error message but there is nothing to worry about, because things are working in the scope of what the user wanted at connection time. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Thu, May 06, 2021 at 01:22:27PM -0400, Tom Lane wrote: > Justin Pryzby writes: > > 52a10224 broke sqlsmith, of all things. > > > It was using errmsg as a test of success, instead of checking if the > > connection > > result wasn't null: > > > conn = PQconnectdb(conninfo.c_str()); > > char *errmsg = PQerrorMessage(conn); > > if (strlen(errmsg)) > > throw dut::broken(errmsg, "08001"); > > > That's clearly the wrong thing to do, but maybe this should be described in > > the > > release notes as a compatibility issue, in case other people had the same > > idea. > > Clearing errorMessage during success is an option.. > > Hm. I'd debated whether to clear conn->errorMessage at the end of > a successful connection sequence, and decided not to on the grounds > that it might be interesting info (eg it could tell you why you > ended up connected to server Y and not server X). But perhaps > it's too much of a compatibility break for this small benefit. I don't care if applications break because they check the errorMessage instead of the return value... ..But I think it's not useful to put details into errorMessage on success, unless you're going to document that. It would never have occurred to me to look there, or that it would even be safe. -- Justin
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
> On 11 May 2021, at 09:29, Michael Paquier wrote: > FWIW, I think that the case of getting some information about any > failed connections while a connection has been successfully made > within the scope of the connection string parameters provided by the > user is rather thin, and I really feel that this is going to cause > more pain to users than this is worth it. So my vote would be to > clean up conn->errorMessage after a successful connection. Agreed, given how conservative we typically are with backwards compatibility it seems a too thin benefit to warrant potential breakage. My vote would too be to restore the behavior by clearing conn->errorMessage. -- Daniel Gustafsson https://vmware.com/
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Thu, May 06, 2021 at 01:22:27PM -0400, Tom Lane wrote: > Hm. I'd debated whether to clear conn->errorMessage at the end of > a successful connection sequence, and decided not to on the grounds > that it might be interesting info (eg it could tell you why you > ended up connected to server Y and not server X). But perhaps > it's too much of a compatibility break for this small benefit. > > I'm curious though why it took this long for anyone to complain. > I'd supposed that people were running sqlsmith against HEAD on > a pretty regular basis. FWIW, I think that the case of getting some information about any failed connections while a connection has been successfully made within the scope of the connection string parameters provided by the user is rather thin, and I really feel that this is going to cause more pain to users than this is worth it. So my vote would be to clean up conn->errorMessage after a successful connection. Now, I would not mind either if we finish by taking a decision here after beta1, to see if there are actual complains on the matter based on the feedback we get. -- Michael signature.asc Description: PGP signature
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
> On Thu, May 06, 2021 at 01:22:27PM -0400, Tom Lane wrote: >> I'm curious though why it took this long for anyone to complain. >> I'd supposed that people were running sqlsmith against HEAD on >> a pretty regular basis. Last time I ran it was November 27. I'm neglecting it on my spare time and there is hardly any opportunity to sneak it onto my agenda at work. I'll do my best to try to get either of these fixed. Justin Pryzby writes: > I think it's also becase sqlsmith would need to run against the v14 *client* > library. I don't know about anyone else's workflow, but I tend not to "make > install", but work with binaries out of ./tmp_install. My playbooks don't grab the client libraries of the test target either. I'll change them. > There's a few changes needed on sqlsmith HEAD, but I guess nobody would have > gotten that far if the connection was failing (or rather, detected as such). Thanks for the patch. regards, andreas
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Thu, May 06, 2021 at 01:22:27PM -0400, Tom Lane wrote: > I'm curious though why it took this long for anyone to complain. > I'd supposed that people were running sqlsmith against HEAD on > a pretty regular basis. I think it's also becase sqlsmith would need to run against the v14 *client* library. I don't know about anyone else's workflow, but I tend not to "make install", but work with binaries out of ./tmp_install. There's a few changes needed on sqlsmith HEAD, but I guess nobody would have gotten that far if the connection was failing (or rather, detected as such). diff --git a/grammar.cc b/grammar.cc index 62aa8e9..76491ff 100644 --- a/grammar.cc +++ b/grammar.cc @@ -327,7 +327,11 @@ query_spec::query_spec(prod *p, struct scope *s, bool lateral) : search = bool_expr::factory(this); - if (d6() > 2) { + if (d6() > 4) { +ostringstream cons; +cons << "order by 1 fetch first " << d100() + d100() << " rows with ties"; +limit_clause = cons.str(); + } else if (d6() > 2) { ostringstream cons; cons << "limit " << d100() + d100(); limit_clause = cons.str(); diff --git a/postgres.cc b/postgres.cc index f2a3627..1c0c55f 100644 --- a/postgres.cc +++ b/postgres.cc @@ -30,6 +30,7 @@ bool pg_type::consistent(sqltype *rvalue) case 'c': /* composite type */ case 'd': /* domain */ case 'r': /* range */ + case 'm': /* multirange */ case 'e': /* enum */ return this == t;
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Justin Pryzby writes: > 52a10224 broke sqlsmith, of all things. > It was using errmsg as a test of success, instead of checking if the > connection > result wasn't null: > conn = PQconnectdb(conninfo.c_str()); > char *errmsg = PQerrorMessage(conn); > if (strlen(errmsg)) > throw dut::broken(errmsg, "08001"); > That's clearly the wrong thing to do, but maybe this should be described in > the > release notes as a compatibility issue, in case other people had the same > idea. > Clearing errorMessage during success is an option.. Hm. I'd debated whether to clear conn->errorMessage at the end of a successful connection sequence, and decided not to on the grounds that it might be interesting info (eg it could tell you why you ended up connected to server Y and not server X). But perhaps it's too much of a compatibility break for this small benefit. I'm curious though why it took this long for anyone to complain. I'd supposed that people were running sqlsmith against HEAD on a pretty regular basis. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Sun, Jan 10, 2021 at 05:38:50PM -0500, Tom Lane wrote: > I wrote: > > The problems that I see in this area are first that there's no > > real standardization in libpq as to whether to append error messages > > together or just flush preceding messages; and second that no effort > > is made in multi-connection-attempt cases to separate the errors from > > different attempts, much less identify which error goes with which > > host or IP address. I think we really need to put some work into > > that. > > I spent some time on this, and here is a patch set that tries to > improve matters. > > 0001 changes the libpq coding rules to be that all error messages should > be appended to conn->errorMessage, never overwritten (there are a couple > of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only > at the start of a connection request or new query. This is something > that's been bugging me for a long time and I'm glad to get it cleaned up. > Formerly it seemed that a dartboard had been used to decide whether to use > "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter. > We can also get rid of several hacks that were used to get around the > mess and force appending behavior. > > 0002 then changes the reporting rules in fe-connect.c as I suggested, > so that you might get errors like this: > > $ psql -h localhost,/tmp -p 12345 > psql: error: could not connect to host "localhost" (::1), port 12345: > Connection refused > Is the server running on that host and accepting TCP/IP connections? > could not connect to host "localhost" (127.0.0.1), port 12345: Connection > refused > Is the server running on that host and accepting TCP/IP connections? > could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory > Is the server running locally and accepting connections on that > socket? > > and we have a pretty uniform rule that errors coming back from a > connection attempt will be prefixed with the server address. 52a10224 broke sqlsmith, of all things. It was using errmsg as a test of success, instead of checking if the connection result wasn't null: conn = PQconnectdb(conninfo.c_str()); char *errmsg = PQerrorMessage(conn); if (strlen(errmsg)) throw dut::broken(errmsg, "08001"); That's clearly the wrong thing to do, but maybe this should be described in the release notes as a compatibility issue, in case other people had the same idea. Clearing errorMessage during success is an option.. -- Justin
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Hubert Zhang writes: > As for patch 0004, find ':' after "could not connect to" may failed when > error message like: > "could not connect to host "localhost" (::1), port 12345: Connection > refused", where p2 will point to "::1" instead of ": Connection refused". But > since it's only used for test case, we don't need to filter the error message > precisely. Excellent point, and I think that could happen on a Windows installation. We can make it look for ": " instead of just ':', and that'll reduce the odds of trouble. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Hi Tom, I agree to get detailed error message for each failed host as your patch 0001. As for patch 0004, find ':' after "could not connect to" may failed when error message like: "could not connect to host "localhost" (::1), port 12345: Connection refused", where p2 will point to "::1" instead of ": Connection refused". But since it's only used for test case, we don't need to filter the error message precisely. ``` ecpg_filter_stderr(const char *resultfile, const char *tmpfile) { .. char *p1 = strstr(linebuf.data, "could not connect to "); if (p1) { char *p2 = strchr(p1, ':'); if (p2) memmove(p1 + 17, p2, strlen(p2) + 1); } } ``` Thanks, Hubert From: Tom Lane Sent: Monday, January 11, 2021 10:56 AM To: Hubert Zhang Cc: tsunakawa.ta...@fujitsu.com ; pgsql-hack...@postgresql.org Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode I wrote: > I feel pretty good about 0001: it might be committable as-is. 0002 is > probably subject to bikeshedding, plus it has a problem in the ECPG tests. > Two of the error messages are now unstable because they expose > chosen-at-random socket paths: > ... > I don't have any non-hacky ideas what to do about that. The extra detail > seems useful to end users, but we don't have any infrastructure that > would allow filtering it out in the ECPG tests. So far the only solution that comes to mind is to introduce some infrastructure to do that filtering. 0001-0003 below are unchanged, 0004 patches up the ecpg test framework with a rather ad-hoc filtering function. I'd feel worse about this if there weren't already a very ad-hoc filtering function there ;-) This set passes check-world for me; we'll soon see what the cfbot thinks. regards, tom lane
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
I wrote: > The problems that I see in this area are first that there's no > real standardization in libpq as to whether to append error messages > together or just flush preceding messages; and second that no effort > is made in multi-connection-attempt cases to separate the errors from > different attempts, much less identify which error goes with which > host or IP address. I think we really need to put some work into > that. I spent some time on this, and here is a patch set that tries to improve matters. 0001 changes the libpq coding rules to be that all error messages should be appended to conn->errorMessage, never overwritten (there are a couple of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only at the start of a connection request or new query. This is something that's been bugging me for a long time and I'm glad to get it cleaned up. Formerly it seemed that a dartboard had been used to decide whether to use "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter. We can also get rid of several hacks that were used to get around the mess and force appending behavior. 0002 then changes the reporting rules in fe-connect.c as I suggested, so that you might get errors like this: $ psql -h localhost,/tmp -p 12345 psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory Is the server running locally and accepting connections on that socket? and we have a pretty uniform rule that errors coming back from a connection attempt will be prefixed with the server address. Then 0003 is the part of your patch that I'm happy with. Given 0001+0002 we could certainly consider looping back and retrying for more cases, but I still want to tread lightly on that. I don't see a lot of value in retrying errors that seem to be on the client side, such as failure to set socket properties; and in general I'm hesitant to add untestable code paths here. I feel pretty good about 0001: it might be committable as-is. 0002 is probably subject to bikeshedding, plus it has a problem in the ECPG tests. Two of the error messages are now unstable because they expose chosen-at-random socket paths: diff -U3 /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr --- /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr 2020-08-04 14:59:45.617380050 -0400 +++ /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr 2021-01-10 16:12:22.539433702 -0500 @@ -36,7 +36,7 @@ [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed @@ -73,7 +73,7 @@ [NO_PID]: sqlca: code: -220, state: 08003 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed I don't have any non-hacky ideas what to do about that. The extra detail seems useful to end users, but we don't have any infrastructure that would allow filtering it out in the ECPG tests. regards, tom lane diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c index b76f0befd0..8b60378379 100644 --- a/src/interfaces/libpq/fe-auth-scram.c +++ b/src/interfaces/libpq/fe-auth-scram.c @@ -208,13 +208,13 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen, { if (inputlen == 0) { - printfPQExpBuffer(>errorMessage, + appendPQExpBuffer(>errorMessage, libpq_gettext("malformed SCRAM message (empty message)\n")); goto error; } if (inputlen != strlen(input)) { - printfPQExpBuffer(>errorMessage, + appendPQExpBuffer(>errorMessage, libpq_gettext("malformed SCRAM message (length mismatch)\n")); goto error; } @@ -258,14 +258,14 @@ pg_fe_scram_exchange(void *opaq, char *input,
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Hubert Zhang writes: > [ 0001-Enhance-libpq-to-support-multiple-host-for-non-hot-s.patch ] I took a quick look at this. TBH, I'd just drop the first three hunks, as they've got nothing to do with any failure mode that there's evidence for in this thread or the prior one, and I'm afraid they're more likely to create trouble than fix it. As for the last hunk, why is it after rather than before the SSL/GSS checks? I doubt that retrying with/without SSL is going to change a CANNOT_CONNECT_NOW result, unless maybe by slowing things down to the point where recovery has finished ;-) The bigger picture though is (1) what set of failures should we retry on? I think CANNOT_CONNECT_NOW is reasonable, but are there others? (2) what does this do to the quality of the error messages in cases where all the connection attempts fail? I think that error message quality was not thought too much about in the original development of the multi-host feature, so to some extent I'm asking you to clean up someone else's mess. Nonetheless, I feel that we do need to clean it up before we do things that risk making it even more confusing. The problems that I see in this area are first that there's no real standardization in libpq as to whether to append error messages together or just flush preceding messages; and second that no effort is made in multi-connection-attempt cases to separate the errors from different attempts, much less identify which error goes with which host or IP address. I think we really need to put some work into that. In some cases you can infer what happened from breadcrumbs we already put into the text, for example $ psql -h localhost,/tmp -p 12345 psql: error: could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 12345? could not connect to server: Connection refused Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 12345? could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.12345"? but this doesn't seem particularly helpfully laid out to me, and we don't provide the breadcrumbs at all for a lot of other error cases. I'm vaguely imagining that we could do something more like could not connect to host "localhost" (::1), port 12345: Connection refused could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory Not quite sure if the "Is the server running" hint is worth preserving. We'd have to reword it quite a bit, and it'd be very duplicative. The implementation of this might involve sticking the initial string (up to the colon, in this example) into conn->errorMessage speculatively as we try each host. If we then append an error to it and go around again, we're good. If we successfully connect, then the contents of conn->errorMessage don't matter. regards, tom lane
RE: Multiple hosts in connection string failed to failover in non-hot standby mode
From: Hubert Zhang > Hao Wu and I wrote a patch to fix this problem. Client side libpq should try > another hosts in connection string when it is rejected by a non-hot standby, > or the first host encounter some n/w problems during the libpq handshake. Thank you. Please add it to the November Commitfest. > Thanks. Is this email in text format now? I just use outlook in chrome. Let > me know if it still in html format. I'm afraid not. The Outlook's title bar says that it's in HTML format. I'm using Outlook 2016 client app on Windows 10. I may have failed to convert my previous email to text, but it should be text format this time. Regards Takayuki Tsunakawa
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Was the primary running and accepting connections when you encountered this error? That is, if you specified host="host1 host2", host1 was the non-hot standby and host2 was a running primary? Or only the non-hot standby was running? If a primary was running, I'd say it's a bug... Perhaps the following part in libpq gives up connection attempts wen the above FATAL error is returned from the server. Maybe libpq should differentiate errors using SQLSTATE and continue connection attempts on other hosts. Yes, the primary was running, but non-hot standby is in front of the primary in connection string. Hao Wu and I wrote a patch to fix this problem. Client side libpq should try another hosts in connection string when it is rejected by a non-hot standby, or the first host encounter some n/w problems during the libpq handshake. Please send emails in text format. Your email was in HTML, and I changed this reply to text format. Thanks. Is this email in text format now? I just use outlook in chrome. Let me know if it still in html format. Hubert & Hao Wu From: tsunakawa.ta...@fujitsu.com Sent: Tuesday, October 27, 2020 5:30 PM To: Hubert Zhang Cc: pgsql-hack...@postgresql.org Subject: RE: Multiple hosts in connection string failed to failover in non-hot standby mode Please send emails in text format. Your email was in HTML, and I changed this reply to text format. From: Hubert Zhang > Libpq has supported to specify multiple hosts in connection string and enable > auto failover when the previous PostgreSQL instance cannot be accessed. > But when I tried to enable this feature for a non-hot standby, it cannot do > the failover with the following messages. > > psql: error: could not connect to server: FATAL: the database system is > starting up Was the primary running and accepting connections when you encountered this error? That is, if you specified host="host1 host2", host1 was the non-hot standby and host2 was a running primary? Or only the non-hot standby was running? If a primary was running, I'd say it's a bug... Perhaps the following part in libpq gives up connection attempts wen the above FATAL error is returned from the server. Maybe libpq should differentiate errors using SQLSTATE and continue connection attempts on other hosts. [fe-connect.c] /* Handle errors. */ if (beresp == 'E') { if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3) ... #endif goto error_return; } /* It is an authentication request. */ conn->auth_req_received = true; /* Get the type of request. */ Regards Takayuki Tsunakawa 0001-Enhance-libpq-to-support-multiple-host-for-non-hot-s.patch Description: 0001-Enhance-libpq-to-support-multiple-host-for-non-hot-s.patch
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Tue, Oct 27, 2020 at 07:14:14AM +, Hubert Zhang wrote: > Libpq has supported to specify multiple hosts in connection string and enable > auto failover when the previous PostgreSQL instance cannot be accessed. > But when I tried to enable this feature for a non-hot standby, it cannot do > the failover with the following messages. > > psql: error: could not connect to server: FATAL: the database system is > starting up > > Document says ' If a connection is established successfully, but > authentication fails, the remaining hosts in the list are not tried.' > I'm wondering is it a feature by design or a bug? If it's a bug, I plan to > fix it. I felt it was a bug, but the community as a whole may or may not agree: https://postgr.es/m/flat/16508-1a63222835164566%40postgresql.org
RE: Multiple hosts in connection string failed to failover in non-hot standby mode
Please send emails in text format. Your email was in HTML, and I changed this reply to text format. From: Hubert Zhang > Libpq has supported to specify multiple hosts in connection string and enable > auto failover when the previous PostgreSQL instance cannot be accessed. > But when I tried to enable this feature for a non-hot standby, it cannot do > the failover with the following messages. > > psql: error: could not connect to server: FATAL: the database system is > starting up Was the primary running and accepting connections when you encountered this error? That is, if you specified host="host1 host2", host1 was the non-hot standby and host2 was a running primary? Or only the non-hot standby was running? If a primary was running, I'd say it's a bug... Perhaps the following part in libpq gives up connection attempts wen the above FATAL error is returned from the server. Maybe libpq should differentiate errors using SQLSTATE and continue connection attempts on other hosts. [fe-connect.c] /* Handle errors. */ if (beresp == 'E') { if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3) ... #endif goto error_return; } /* It is an authentication request. */ conn->auth_req_received = true; /* Get the type of request. */ Regards Takayuki Tsunakawa
Multiple hosts in connection string failed to failover in non-hot standby mode
Hi hackers, Libpq has supported to specify multiple hosts in connection string and enable auto failover when the previous PostgreSQL instance cannot be accessed. But when I tried to enable this feature for a non-hot standby, it cannot do the failover with the following messages. psql: error: could not connect to server: FATAL: the database system is starting up Document says ' If a connection is established successfully, but authentication fails, the remaining hosts in the list are not tried.' I'm wondering is it a feature by design or a bug? If it's a bug, I plan to fix it. Thanks, Hubert Zhang