On 5/5/23 11:29 AM, Amit Kapila wrote:
On Fri, May 5, 2023 at 1:16 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:


After multiple attempts, I got one failing one.

Issue is that we expect this file to be removed:

[07:24:27.261](0.899s) #WAL file is 
/Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003

But the standby emits:

2023-05-05 07:24:27.216 UTC [17909][client backend] 
[035_standby_logical_decoding.pl][3/6:0] LOG:  statement: checkpoint;
2023-05-05 07:24:27.216 UTC [17745][checkpointer] LOG:  restartpoint starting: 
immediate wait
2023-05-05 07:24:27.259 UTC [17745][checkpointer] LOG:  attempting to remove 
WAL segments older than log file 000000000000000000000002

So it seems the test is not right (missing activity??), not sure why yet.


Can you try to print the value returned by
XLogGetReplicationSlotMinimumLSN() in KeepLogSeg() on standby? Also,
please try to print "attempting to remove WAL segments ..." on the
primary. We can see, if by any chance some slot is holding us to
remove the required WAL file.


I turned DEBUG2 on. We can also see on the primary:

2023-05-05 08:23:30.843 UTC [16833][checkpointer] LOCATION:  
CheckPointReplicationSlots, slot.c:1576
2023-05-05 08:23:30.844 UTC [16833][checkpointer] DEBUG:  00000: snapshot of 
0+0 running transaction ids (lsn 0/40000D0 oldest xid 746 latest complete 745 
next xid 746)
2023-05-05 08:23:30.844 UTC [16833][checkpointer] LOCATION:  
LogCurrentRunningXacts, standby.c:1377
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOG:  00000: BDT1 about to 
call RemoveOldXlogFiles in CreateCheckPoint
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  CreateCheckPoint, 
xlog.c:6835
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOG:  00000: attempting to 
remove WAL segments older than log file 000000000000000000000002
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  
RemoveOldXlogFiles, xlog.c:3560
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: recycled write-ahead log 
file "000000010000000000000001"
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  RemoveXlogFile, 
xlog.c:3708
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: recycled write-ahead log 
file "000000010000000000000002"
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  RemoveXlogFile, 
xlog.c:3708
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: 
SlruScanDirectory invoking callback on pg_subtrans/0000

So, 000000010000000000000003 is not removed on the primary.

It has been recycled on:

2023-05-05 08:23:38.605 UTC [16833][checkpointer] DEBUG:  00000: recycled write-ahead log 
file "000000010000000000000003"

Which is later than the test:

[08:23:31.931](0.000s) not ok 19 - invalidated logical slots do not lead to 
retaining WAL

FWIW, the failing test with DEBUG2 can be found there: 
https://cirrus-ci.com/task/5615316688961536

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to