Hello,

We are seeing an odd standby behavior that happens about once every 3-4 
days.

The standby suddenly disconnects from the primary and logs the error 
“LOG: invalid record length at <LSN>: wanted 24, got 0”.

It then tries to restore the WAL file from the archive. Because write 
activity on the primary is low, the WAL file is switched and archived only 
after 1 hour.

So the standby gets stuck in a loop, switching the WAL source back and forth 
between streaming and the archive, without replicating from the primary.

Because this standby is a synchronous standby, this causes a write outage on 
the primary.

Suspecting that the WAL file was not being flushed correctly, we changed 
“wal_sync_method” from “fdatasync” to “fsync”.

But the problem still occurs with “fsync”.

Restarting the standby sometimes fixes the problem, but detecting it 
automatically is difficult: the postgres standby process itself keeps 
running fine, while the WAL receiver process goes up and down continuously 
as the WAL source is switched.
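
To illustrate the detection problem, here is a minimal sketch of one possible 
check. It assumes the standby is polled at a fixed interval with something 
like "SELECT pid FROM pg_stat_wal_receiver" and that the results are 
collected in a list (None when the view returns no row). The function names 
and the restart threshold are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence


def count_receiver_restarts(pids: Sequence[Optional[int]]) -> int:
    """Count how often the WAL receiver came back as a new process.

    Each element is the pid from one poll of pg_stat_wal_receiver on the
    standby, or None when no WAL receiver was running at that moment.
    """
    restarts = 0
    last_pid = None  # most recent non-None pid seen so far
    for pid in pids:
        if pid is None:
            continue  # receiver was down during this poll
        if last_pid is not None and pid != last_pid:
            restarts += 1  # a different pid means the receiver was restarted
        last_pid = pid
    return restarts


def is_flapping(pids: Sequence[Optional[int]], max_restarts: int = 2) -> bool:
    """Treat more than max_restarts restarts in the window as flapping.

    The threshold is an assumption and would need tuning per environment.
    """
    return count_receiver_restarts(pids) > max_restarts
```

A window in which every poll returns None (receiver never up) is not counted 
as flapping by this sketch and would need a separate check.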


How can we fix this? Any suggestions would be appreciated.


Postgres Version: 13.6
OS: RHEL Linux


Thank you,


Best,
Harinath.
