Hi, On 2025-03-06 22:49:20 +0200, Heikki Linnakangas wrote: > In short, all the 4 patches look good to me. Thanks for picking this up! > > On 06/03/2025 22:16, Andres Freund wrote: > > On 2025-03-05 20:49:33 -0800, Noah Misch wrote: > > > > This behaviour makes it really hard to debug problems. It'd have been a > > > > lot > > > > easier to understand the problem if we'd seen psql's stderr before the > > > > test > > > > died. > > > > > > > > I guess that mean at the very least we'd need to put an eval {} around > > > > the > > > > ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error? > > > > > > That sounds right. > > > > In the attached patch I did that for wait_connect(). I did verify that it > > works by implementing the wait_connect() fix before fixing > > 002_connection_limits.pl, which fails if a sleep(1) is added just before the > > proc_exit(1) for FATAL. > > +1. For the archives sake, I just want to clarify that this pump stuff is > all about getting better error messages on a test failure. It doesn't help > with the original issue.
Agreed. > This is all annoyingly complicated, but getting good error messages is worth > it. Yea. I really look forward to having a way to write stuff like this that doesn't involve hackily driving psql from 100m away using rubber bands. > > On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:>> Why not adding an > > injection point with a WARNING or a LOG generated, > then > > > check the server logs for the code path taken based on the elog() > > > generated > > > with the point name? > > > > I think the log_min_messages approach is a lot simpler. If we need something > > like this more widely we can reconsider injection points... > > +1. It's a little annoying to depend on a detail like the "client backend > process exited" debug message, but seems like the best fix for now. We use the same message for LOG messages too, for other types of backends, so I think it's not that likely to change. But stilll not great. > While we're at it, attached are a few more cleanups I noticed. I assume you'll apply that yourself? Commits with updated commit messages attached. I wonder if we should apply the polishing of connect_ok()/connect_fails() and the wait_connect() debuggability improvements to the backbranches? Keeping TAP infrastructure as similar as possible between branches has proven worthwhile IME. Greetings, Andres Freund
>From d6dbf4c4a1e723a27df8a08b7e75352b8fb29d05 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Fri, 7 Mar 2025 09:44:00 -0500 Subject: [PATCH v2 1/4] tests: Improve test names in connect_fails()/connect_ok() connect_fails() didn't mention that stderr matched, whereas connect_ok() did. Neither connect_fails() nor connect_ok() mentioned what they were checking when checking psql's return status. Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi> Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw --- src/test/perl/PostgreSQL/Test/Cluster.pm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index b105cba05a6..883532e1cd3 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -2554,7 +2554,7 @@ sub connect_ok connstr => "$connstr", on_error_stop => 0); - is($ret, 0, $test_name); + is($ret, 0, "$test_name: connect succeeds, as expected"); if (defined($params{expected_stdout})) { @@ -2619,11 +2619,11 @@ sub connect_fails extra_params => ['-w'], connstr => "$connstr"); - isnt($ret, 0, $test_name); + isnt($ret, 0, "$test_name: connect fails, as expected"); if (defined($params{expected_stderr})) { - like($stderr, $params{expected_stderr}, "$test_name: matches"); + like($stderr, $params{expected_stderr}, "$test_name: stderr matches"); } $self->log_check($test_name, $log_location, %params); -- 2.48.1.76.g4e746b1a31.dirty
>From e514fe32c2566c524f1f18410266a1e2efdc7644 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Fri, 7 Mar 2025 09:44:00 -0500 Subject: [PATCH v2 2/4] tests: Add note if BackgroundPsql::wait_connect() fails If wait_connect() failed due to psql exiting, all that we'd see is a "process ended prematurely" error thrown by IPC::Run, without ever seeing psql's error message. Address that by wrapping the pump() call in eval and taking note of stdout & stderr in case of failure. We might want to do that in pump_until() as well, but that seems to require API changes, so let's do the easily achievable bit first. Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi> Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw --- .../perl/PostgreSQL/Test/BackgroundPsql.pm | 26 ++++++++++++++++--- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm index c611a61cf4e..1deb410c133 100644 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm +++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm @@ -154,10 +154,28 @@ sub wait_connect my $banner = "background_psql: ready"; my $banner_match = qr/(^|\n)$banner\r?\n/; $self->{stdin} .= "\\echo $banner\n\\warn $banner\n"; - $self->{run}->pump() - until ($self->{stdout} =~ /$banner_match/ - && $self->{stderr} =~ /$banner\r?\n/) - || $self->{timeout}->is_expired; + + # IPC::Run throws in case psql exits while we're pumping. To make it + # easier to diagnose that, catch the error, report stdout/stderr at time + # of death and reraise. + eval { + $self->{run}->pump() + until ($self->{stdout} =~ /$banner_match/ + && $self->{stderr} =~ /$banner\r?\n/) + || $self->{timeout}->is_expired; + }; + if ($@) + { + chomp(my $stdout = $self->{stdout}); + chomp(my $stderr = $self->{stderr}); + chomp(my $err = $@); + diag qq(psql died while connecting: + stdout: $stdout + stderr: $stderr + perl error: $err +); + die "psql died while connecting"; + } note "connect output:\n", explain { -- 2.48.1.76.g4e746b1a31.dirty
>From 6bd317053557030d7d5b1818b1254aebf2230f08 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Fri, 7 Mar 2025 09:44:00 -0500 Subject: [PATCH v2 3/4] tests: Fix race condition in postmaster/002_connection_limits The test occasionally failed due to unexpected connection limit errors being encountered after having waited for FATAL errors on another connection. These spurious failures were caused by the the backend reporting FATAL errors to the client before detaching from the PGPROC entry. Adding a sleep(1) before proc_exit() makes it easy to reproduce that problem. To fix the issue, add a helper function that waits for postmaster to notice the process having exited. For now this is implemented by waiting for the DEBUG2 message that postmaster logs in that case. That's not the prettiest fix, but simple. If we notice this problem elsewhere, it might be worthwhile to make this more general, e.g. by adding an injection point. Reported-by: Tomas Vondra <to...@vondra.me> Diagnosed-by: Heikki Linnakangas <hlinn...@iki.fi> Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi> Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw --- .../postmaster/t/002_connection_limits.pl | 35 +++++++++++++++++-- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/src/test/postmaster/t/002_connection_limits.pl b/src/test/postmaster/t/002_connection_limits.pl index 8cfa6e0ced5..2c185eef6eb 100644 --- a/src/test/postmaster/t/002_connection_limits.pl +++ b/src/test/postmaster/t/002_connection_limits.pl @@ -20,6 +20,7 @@ $node->append_conf('postgresql.conf', "max_connections = 6"); $node->append_conf('postgresql.conf', "reserved_connections = 2"); $node->append_conf('postgresql.conf', "superuser_reserved_connections = 1"); $node->append_conf('postgresql.conf', "log_connections = on"); +$node->append_conf('postgresql.conf', "log_min_messages=debug2"); $node->start; $node->safe_psql( @@ -45,13 +46,39 @@ sub background_psql_as_user extra_params => [ '-U', $user ]); } +# Like connect_fails(), except that we also wait for the failed backend to +# have exited. +# +# This tests needs to wait for client processes to exit because the error +# message for a failed connection is reported before the backend has detached +# from shared memory. If we didn't wait, subsequent tests might hit connection +# limits spuriously. +# +# This can't easily be generalized, as detecting process exit requires +# log_min_messages to be at least DEBUG2 and is not concurrency safe, as we +# can't easily be sure the right process exited. In this test that's not a +# problem though, we only have one new connection at a time. +sub connect_fails_wait +{ + local $Test::Builder::Level = $Test::Builder::Level + 1; + my ($node, $connstr, $test_name, %params) = @_; + + my $log_location = -s $node->logfile; + + $node->connect_fails($connstr, $test_name, %params); + $node->wait_for_log(qr/DEBUG: client backend.*exited with exit code 1/, + $log_location); + ok(1, "$test_name: client backend process exited"); +} + my @sessions = (); my @raw_connections = (); push(@sessions, background_psql_as_user('regress_regular')); push(@sessions, background_psql_as_user('regress_regular')); push(@sessions, background_psql_as_user('regress_regular')); -$node->connect_fails( +connect_fails_wait( + $node, "dbname=postgres user=regress_regular", "reserved_connections limit", expected_stderr => @@ -60,7 +87,8 @@ $node->connect_fails( push(@sessions, background_psql_as_user('regress_reserved')); push(@sessions, background_psql_as_user('regress_reserved')); -$node->connect_fails( +connect_fails_wait( + $node, "dbname=postgres user=regress_regular", "reserved_connections limit", expected_stderr => @@ -68,7 +96,8 @@ $node->connect_fails( ); push(@sessions, background_psql_as_user('regress_superuser')); -$node->connect_fails( +connect_fails_wait( + $node, "dbname=postgres user=regress_superuser", "superuser_reserved_connections limit", expected_stderr => qr/FATAL: sorry, too many clients already/); -- 2.48.1.76.g4e746b1a31.dirty
>From f8c5ee4adae736b275185c837a0971a5b2dbbc40 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Fri, 7 Mar 2025 09:44:00 -0500 Subject: [PATCH v2 4/4] tests: Don't fail due to high default timeout in postmaster/003_start_stop Some BF animals use very high timeouts due to their slowness. Unfortunately postmaster/003_start_stop fails if a high timeout is configured, due to authentication_timeout having a fairly low max. As this test is reasonably fast, the easiest fix seems to be to cap the timeout to 600. Per buildfarm animal skink. Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi> Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw --- src/test/postmaster/t/003_start_stop.pl | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/test/postmaster/t/003_start_stop.pl b/src/test/postmaster/t/003_start_stop.pl index 036b296f72b..4dc394139d9 100644 --- a/src/test/postmaster/t/003_start_stop.pl +++ b/src/test/postmaster/t/003_start_stop.pl @@ -20,6 +20,10 @@ use Test::More; # "pg_ctl stop" will error out before the authentication timeout kicks # in and cleans up the dead-end backends. my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default; + +# Don't fail due to hitting the max value allowed for authentication_timeout. +$authentication_timeout = 600 unless $authentication_timeout < 600; + my $stop_timeout = $authentication_timeout / 2; # Initialize the server with low connection limits, to test dead-end backends -- 2.48.1.76.g4e746b1a31.dirty