Hi all,

Over the last couple of weeks, there has been a collection of reports complaining that the renaming of WAL segments is broken:
https://www.postgresql.org/message-id/3861ff1e-0923-7838-e826-094cc9bef...@hot.ee
https://www.postgresql.org/message-id/16874-c3eecd319e36a...@postgresql.org
https://www.postgresql.org/message-id/095ccf8d-7f58-d928-427c-b17ace23c...@burgess.co.nz
https://www.postgresql.org/message-id/16927-67c570d968c99567%40postgresql.org

These have happened on a variety of Windows versions, with 2019 and 2012 R2 being mentioned, when segments are recycled. The number of those failures is alarming, and the information gathered points at 13.1 and 13.2 as the versions where those failures are happening, so I am inclined to believe that there is a regression in 13.

FWIW, I have also been doing some tests on my side, and while I was not able to trigger the reported failure, I have been able to trigger the same error with an archive_command doing a simple cp that failed continuously on EACCES.

Fujii-san has mentioned this on Twitter, but one area that changed during the v13 cycle is aaa3aed, where the code recycling segments was switched from a pgrename() (with a retry loop) to a CreateHardLinkA()+pgunlink() (with a retry loop for the second). One theory that I had in mind here is the case where we create the hard link but fail to complete the pgunlink() on the xlogtemp.N file, though after some testing it did not seem to have any impact.

I am running more tests with several scenarios (aggressive segment recycling or segment rotation) to get a more reliable way to reproduce the problem, but I was wondering if anybody had ideas around that.

So, thoughts?
--
Michael