Hello Peter and Euler,
17.06.2024 14:04, Peter Eisentraut wrote:
On 07.06.24 05:49, Euler Taveira wrote:
Here is a patch series to fix the issues reported in recent discussions.
Patches 0001 and 0003 aim to fix the buildfarm issues. Patch 0002 removes
synchronized failover slots on the subscriber since they have no use there. I
also included an optional patch, 0004, that improves usability by checking both
servers even if a subscriber check has already failed.
I have committed 0001, 0002, and 0003. Let's keep an eye on the buildfarm to see if that stabilizes things. So far it looks good.
For 0004, I suggest inverting the result values from check_publisher() and create_subscriber() so that they return true if the check is ok.
As a recent buildfarm failure [1] shows, that test addition introduced
new instability:
### Starting node "node_s"
# Running: pg_ctl -w -D /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_createsubscriber/data/t_040_pg_createsubscriber_node_s_data/pgdata -l /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_createsubscriber/log/040_pg_createsubscriber_node_s.log -o --cluster-name=node_s start
waiting for server to start.... done
server started
# Postmaster PID for node "node_s" is 416482
error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the received slot sync LSN 0/30047F0 for slot "failover_slot" is ahead of the standby position 0/3004708'
while running 'psql -XAtq -d port=51506 host=/tmp/pqWohdD5Qj dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots()' at /home/bf/bf-build/piculet/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2126.
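I could reproduce this failure by delaying the WAL receiver's flush by 300 ms (pg_usleep() takes microseconds) and then running the test: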
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -517,6 +517,7 @@ WalReceiverMain(char *startup_data, size_t startup_data_len)
 				 * let the startup process and primary server know about
 				 * them.
 				 */
+				pg_usleep(300000);
 				XLogWalRcvFlush(false, startpointTLI);
make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"
# +++ tap check in src/bin/pg_basebackup +++
t/040_pg_createsubscriber.pl .. 22/? # Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 23.
t/040_pg_createsubscriber.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 23 subtests passed
Test Summary Report
-------------------
t/040_pg_createsubscriber.pl (Wstat: 7424 Tests: 23 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output
Files=1, Tests=23, 4 wallclock secs ( 0.01 usr 0.01 sys + 0.49 cusr 0.44 csys = 0.95 CPU)
Moreover, this test can also fail when autovacuum runs aggressively, for example with:
echo "
autovacuum_naptime = 1
autovacuum_analyze_threshold = 1
" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"
# +++ tap check in src/bin/pg_basebackup +++
t/040_pg_createsubscriber.pl .. 24/?
# Failed test 'failover slot is synced'
# at t/040_pg_createsubscriber.pl line 273.
# got: ''
# expected: 'failover_slot'
t/040_pg_createsubscriber.pl .. 28/? # Looks like you failed 1 test of 33.
t/040_pg_createsubscriber.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/33 subtests
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-06-28%2004%3A42%3A48
Best regards,
Alexander