Re: Andres Freund > Hm, that could just be a "harmless" race. Does it still happen if you apply > the attached patch in addition?
Putting that patch on top of v8 made it pass 294 times before exiting like this: [08:52:34.134](0.032s) ok 1 - buffer pin conflict: cursor with conflicting pin established Waiting for replication conn standby's replay_lsn to pass 0/3430000 on primary done timed out waiting for match: (?^:User was holding shared buffer pin for too long) at t/031_recovery_conflict.pl line 318. But admittedly, this build machine is quite sluggish at times, likely due to neighboring VMs on the same host. (Perhaps that even explains the behavior from before this patch.) I'll still attach the logs since I frankly can't judge what errors are acceptable here and which aren't. Christoph
2023-08-08 08:52:33.312 UTC [1734403] LOG: starting PostgreSQL 17devel (Debian 17~~devel-1) on s390x-ibm-linux-gnu, compiled by gcc (Debian 13.2.0-1) 13.2.0, 64-bit 2023-08-08 08:52:33.312 UTC [1734403] LOG: listening on Unix socket "/tmp/UNCayI_17k/.s.PGSQL.63827" 2023-08-08 08:52:33.318 UTC [1734406] LOG: database system was shut down at 2023-08-08 08:52:33 UTC 2023-08-08 08:52:33.350 UTC [1734403] LOG: database system is ready to accept connections 2023-08-08 08:52:33.477 UTC [1734411] 031_recovery_conflict.pl LOG: statement: CREATE TABLESPACE test_recovery_conflict_tblspc LOCATION '' 2023-08-08 08:52:33.484 UTC [1734413] 031_recovery_conflict.pl LOG: received replication command: SHOW data_directory_mode 2023-08-08 08:52:33.484 UTC [1734413] 031_recovery_conflict.pl STATEMENT: SHOW data_directory_mode 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl LOG: received replication command: SHOW wal_segment_size 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl STATEMENT: SHOW wal_segment_size 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl LOG: received replication command: IDENTIFY_SYSTEM 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl STATEMENT: IDENTIFY_SYSTEM 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl LOG: received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client') 2023-08-08 08:52:33.485 UTC [1734413] 031_recovery_conflict.pl STATEMENT: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client') 2023-08-08 08:52:33.490 UTC [1734404] LOG: checkpoint starting: immediate force wait 2023-08-08 08:52:33.500 UTC [1734404] LOG: checkpoint complete: wrote 7 buffers (5.5%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.011 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11350 kB, estimate=11350 kB; lsn=0/2000060, redo lsn=0/2000028 2023-08-08 08:52:33.507 UTC [1734414] 031_recovery_conflict.pl LOG: received replication command: SHOW data_directory_mode 2023-08-08 08:52:33.507 UTC [1734414] 031_recovery_conflict.pl STATEMENT: SHOW data_directory_mode 2023-08-08 08:52:33.508 UTC [1734414] 031_recovery_conflict.pl LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_1734414" TEMPORARY PHYSICAL ( RESERVE_WAL) 2023-08-08 08:52:33.508 UTC [1734414] 031_recovery_conflict.pl STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_1734414" TEMPORARY PHYSICAL ( RESERVE_WAL) 2023-08-08 08:52:33.513 UTC [1734414] 031_recovery_conflict.pl LOG: received replication command: IDENTIFY_SYSTEM 2023-08-08 08:52:33.513 UTC [1734414] 031_recovery_conflict.pl STATEMENT: IDENTIFY_SYSTEM 2023-08-08 08:52:33.518 UTC [1734414] 031_recovery_conflict.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_1734414" 0/2000000 TIMELINE 1 2023-08-08 08:52:33.518 UTC [1734414] 031_recovery_conflict.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_1734414" 0/2000000 TIMELINE 1 2023-08-08 08:52:33.596 UTC [1734413] 031_recovery_conflict.pl LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp1734413.0", size 137324 2023-08-08 08:52:33.596 UTC [1734413] 031_recovery_conflict.pl STATEMENT: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client') 2023-08-08 08:52:33.741 UTC [1734425] standby LOG: received replication command: IDENTIFY_SYSTEM 2023-08-08 08:52:33.741 UTC [1734425] standby STATEMENT: IDENTIFY_SYSTEM 2023-08-08 08:52:33.766 UTC [1734425] standby LOG: received replication command: START_REPLICATION 0/3000000 TIMELINE 1 2023-08-08 08:52:33.766 UTC [1734425] standby STATEMENT: START_REPLICATION 0/3000000 TIMELINE 1 2023-08-08 08:52:33.823 UTC [1734427] 031_recovery_conflict.pl LOG: statement: CREATE DATABASE test_db 2023-08-08 08:52:33.878 UTC [1734429] 031_recovery_conflict.pl LOG: statement: CREATE TABLE test_recovery_conflict_table1(a int, b int); 2023-08-08 08:52:33.887 UTC [1734429] 031_recovery_conflict.pl LOG: statement: INSERT INTO test_recovery_conflict_table1 SELECT i % 3, 0 FROM generate_series(1,20) i; 2023-08-08 08:52:33.898 UTC [1734429] 031_recovery_conflict.pl LOG: statement: CREATE TABLE test_recovery_conflict_table2(a int, b int); 2023-08-08 08:52:33.936 UTC [1734431] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn() 2023-08-08 08:52:33.962 UTC [1734433] 031_recovery_conflict.pl LOG: statement: SELECT '0/342B100' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:52:34.013 UTC [1734437] 031_recovery_conflict.pl LOG: statement: BEGIN; 2023-08-08 08:52:34.018 UTC [1734437] 031_recovery_conflict.pl LOG: statement: INSERT INTO test_recovery_conflict_table1 VALUES (1,0); 2023-08-08 08:52:34.024 UTC [1734437] 031_recovery_conflict.pl LOG: statement: ROLLBACK; 2023-08-08 08:52:34.029 UTC [1734437] 031_recovery_conflict.pl LOG: statement: BEGIN; 2023-08-08 08:52:34.034 UTC [1734437] 031_recovery_conflict.pl LOG: statement: LOCK test_recovery_conflict_table1; 2023-08-08 08:52:34.040 UTC [1734437] 031_recovery_conflict.pl LOG: statement: COMMIT; 2023-08-08 08:52:34.072 UTC [1734439] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn() 2023-08-08 08:52:34.097 UTC [1734441] 031_recovery_conflict.pl LOG: statement: SELECT '0/342B1C0' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:52:34.149 UTC [1734443] 031_recovery_conflict.pl LOG: statement: VACUUM test_recovery_conflict_table1; 2023-08-08 08:52:34.192 UTC [1734445] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn() 2023-08-08 08:52:34.211 UTC [1734447] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:52:34.348 UTC [1734449] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:52:34.479 UTC [1734451] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:52:34.617 UTC [1734453] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver') 2023-08-08 08:55:38.427 UTC [1734403] LOG: received immediate shutdown request 2023-08-08 08:55:38.437 UTC [1734403] LOG: database system is shut down
2023-08-08 08:52:33.722 UTC [1734420] LOG: starting PostgreSQL 17devel (Debian 17~~devel-1) on s390x-ibm-linux-gnu, compiled by gcc (Debian 13.2.0-1) 13.2.0, 64-bit 2023-08-08 08:52:33.722 UTC [1734420] LOG: listening on Unix socket "/tmp/UNCayI_17k/.s.PGSQL.63828" 2023-08-08 08:52:33.728 UTC [1734423] LOG: database system was interrupted; last known up at 2023-08-08 08:52:33 UTC 2023-08-08 08:52:33.728 UTC [1734423] LOG: entering standby mode 2023-08-08 08:52:33.729 UTC [1734423] LOG: redo starts at 0/2000028 2023-08-08 08:52:33.729 UTC [1734423] WARNING: unexpected directory entry "16384" found in pg_tblspc/ 2023-08-08 08:52:33.729 UTC [1734423] DETAIL: All directory entries in pg_tblspc/ should be symbolic links. 2023-08-08 08:52:33.729 UTC [1734423] HINT: Remove those directories, or set allow_in_place_tablespaces to ON transiently to let recovery complete. 2023-08-08 08:52:33.729 UTC [1734423] LOG: consistent recovery state reached at 0/2000100 2023-08-08 08:52:33.733 UTC [1734420] LOG: database system is ready to accept read-only connections 2023-08-08 08:52:33.768 UTC [1734424] LOG: started streaming WAL from primary at 0/3000000 on timeline 1 2023-08-08 08:52:34.108 UTC [1734435] 031_recovery_conflict.pl LOG: statement: BEGIN; 2023-08-08 08:52:34.113 UTC [1734435] 031_recovery_conflict.pl LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1; 2023-08-08 08:52:34.119 UTC [1734435] 031_recovery_conflict.pl LOG: statement: FETCH FORWARD FROM test_recovery_conflict_cursor; 2023-08-08 08:52:34.128 UTC [1734435] 031_recovery_conflict.pl LOG: statement: ; 2023-08-08 08:55:38.434 UTC [1734424] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2023-08-08 08:55:38.434 UTC [1734423] LOG: invalid record length at 0/347E6A8: expected at least 24, got 0 2023-08-08 08:55:38.440 UTC [1734495] FATAL: could not connect to the primary server: connection to server on socket "/tmp/UNCayI_17k/.s.PGSQL.63827" failed: No such file or directory Is the server running locally and accepting connections on that socket? 2023-08-08 08:55:38.441 UTC [1734423] LOG: waiting for WAL to become available at 0/347E6C0 2023-08-08 08:55:38.536 UTC [1734420] LOG: received immediate shutdown request 2023-08-08 08:55:38.543 UTC [1734420] LOG: database system is shut down
# Checking port 63827 # Found port 63827 Name: primary Data directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata Backup directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/backup Archive directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/archives Connection string: port=63827 host=/tmp/UNCayI_17k Log file: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_primary.log # Running: initdb -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -A trust -N The files belonging to this database system will be owned by user "myon". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider: libc LC_COLLATE: en_US.utf8 LC_CTYPE: en_US.utf8 LC_MESSAGES: C LC_MONETARY: en_US.utf8 LC_NUMERIC: en_US.utf8 LC_TIME: en_US.utf8 The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default time zone ... Etc/UTC creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok Sync to disk skipped. The data directory might become corrupt if the operating system crashes. Success. You can now start the database server using: pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -l logfile start # Running: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress --config-auth /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata ### Starting node "primary" # Running: pg_ctl -w -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -l /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_primary.log -o --cluster-name=primary start waiting for server to start.... done server started # Postmaster PID for node "primary" is 1734403 # Taking pg_basebackup my_backup from node "primary" # Running: pg_basebackup -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/backup/my_backup -h /tmp/UNCayI_17k -p 63827 --checkpoint fast --no-sync # Backup finished # Checking port 63828 # Found port 63828 Name: standby Data directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata Backup directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/backup Archive directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/archives Connection string: port=63828 host=/tmp/UNCayI_17k Log file: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log # Initializing node "standby" from backup "my_backup" of node "primary" ### Enabling streaming replication for node "standby" ### Starting node "standby" # Running: pg_ctl -w -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata -l /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log -o --cluster-name=standby start waiting for server to start.... done server started # Postmaster PID for node "standby" is 1734420 Waiting for replication conn standby's replay_lsn to pass 0/342B100 on primary done Waiting for replication conn standby's replay_lsn to pass 0/342B1C0 on primary done [08:52:34.101](1.605s) # issuing query via background psql: # BEGIN; # DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1; # FETCH FORWARD FROM test_recovery_conflict_cursor; [08:52:34.134](0.032s) ok 1 - buffer pin conflict: cursor with conflicting pin established Waiting for replication conn standby's replay_lsn to pass 0/3430000 on primary done timed out waiting for match: (?^:User was holding shared buffer pin for too long) at t/031_recovery_conflict.pl line 318. # Postmaster PID for node "primary" is 1734403 ### Stopping node "primary" using mode immediate # Running: pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -m immediate stop waiting for server to shut down.... done server stopped # No postmaster PID for node "primary" # Postmaster PID for node "standby" is 1734420 ### Stopping node "standby" using mode immediate # Running: pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata -m immediate stop waiting for server to shut down.... done server stopped # No postmaster PID for node "standby" [08:55:38.639](184.505s) # Tests were run but no plan was declared and done_testing() was not seen. [08:55:38.639](0.000s) # Looks like your test exited with 255 just after 1.