Hi, On Wed, Oct 15, 2025 at 4:49 AM John H <[email protected]> wrote:
> Hi, > > On Fri, Oct 10, 2025 at 12:45 AM Srinath Reddy Sadipiralla > <[email protected]> wrote: > > > > ... > > XLogFilePath , then validate this new path with the given path > > ,this helps to catch invalid xlog files like > pg_wal/00000001FFFFFFFFFFFFFF10. > > ... > > Are you concerned that somehow these files, which are named like XLog > files but actually > aren't, are somehow created therefore we should sync them in case? > I'm trying to understand how these files would be generated in the first > place. > the problem is not when server generates them because filename is properly calculated using the XLogRecPtr in XLogWrite->XLogFileInit->XLogFileInitInternal->XLogFilePath ,the main problem is when if someone manually places an invalid WAL file in pg_wal like 00000001FFFFFFFFFFFFFF10, IsXLogFileName will consider it as valid ,so with the approach as i mentioned earlier we can catch such cases. > > > > > instead of passing last_common_segno down the call stack directly, > > we can have a struct maybe called "rewindState" which will have the > common > > information related to both clusters involved in rewind ,so that in > future > > if we want to pass more such information down we won't be increasing > > the number of arguments that need to be passed down to all functions in > stack. > > > > I don't feel too strongly about this. Not sure how we view > future-proofing things, > as in do we proactively wrap around structs or wait until there's an > actual need. > > This is the first time it's been touched since it was introduced in 14 for > what > it's worth. > TBH first it seemed to me a good coding practice for future proofing, then i checked if there's any place in postgres we are doing something similar ,then found 1 in src/include/access/gist.h typedef struct GISTDeletedPageContents { /* last xid which could see the page in a scan */ FullTransactionId deleteXid; } GISTDeletedPageContents; > > > > > we can improve the TAP test file header comment as > > # Test the situation where we skip copying the same WAL files from > source to target > > # except if WAL file size differs. > > > > let me put it this way > > 1) WALs are written to 000000010000000000000002 in primary. > > ... > > 7) so the last common WAL segment was 000000010000000000000003. > > Updated comments. > > > > > ... > > where entry->target_size != entry->source_size and we do copy, > > like if stat->mtime is changed (which means file has been copied) > > and target_stat->size != soucrce_stat->size here no error expected. > > > > Updated patch with one additional segment has a byte appended to it > on target but not source. > thanks for updating, i have attached a diff patch on top of v9-0001 patch , where i tried to add more tests to improve the validation that we copy the WAL file even when it exists on both source and target but the size differs. I also tried to rename some variables for readability,while testing this out i found that usleep(1000); is not enough to show that there has been a copy which changed the stat->mtime of the file because i think the copy was very fast (less than 1 millisecond) so the mtime of the file after rewind's copy didn't change or filesystem precision didn't caught the change, the copying of the file is confirmed because the same file has different size before rewind and same after rewind which these tests proved ok($corrupt_wal_seg_stat_before_rewind->size != $corrupt_wal_seg_source_stat->size,"WAL segment $corrupt_wal_seg has different size in source vs target before rewind"); ok 11 - WAL segment 000000010000000000000004 has different size in source vs target before rewind ok($corrupt_wal_seg_stat_after_rewind->size == $corrupt_wal_seg_source_stat->size, "WAL segment $corrupt_wal_seg was copied: file sizes are same between target and source after rewind"); ok 12 - WAL segment 000000010000000000000004 was copied: file sizes are same between target and source after rewind and more over the pg_rewind confirmed this with a new debug message which I added in this diff patch diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index 69d7728ce44..743484f4833 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -732,11 +732,11 @@ decide_wal_file_action(const char *fname, XLogSegNo last_common_segno, */ if (file_segno < last_common_segno && source_size == target_size) { pg_log_debug("WAL segment \"%s\" not copied to target", fname); return FILE_ACTION_NONE; } + pg_log_debug("WAL segment \"%s\" is copied to target", fname); return FILE_ACTION_COPY; } ok 6 - run pg_rewind stderr /(?^:WAL segment \"000000010000000000000004\" is copied to target)/ also did this instead of using catfile -my $wal_skipped_path = File::Spec->catfile($node_primary->data_dir, 'pg_wal', $wal_seg_skipped); +my $wal_skipped_path = $node_primary->data_dir . '/pg_wal/' . $wal_seg_skipped; seems consistent with other TAP tests. -- Thanks, Srinath Reddy Sadipiralla EDB: https://www.enterprisedb.com/
v9-0001.diff
Description: Binary data
