On Thu, Oct 16, 2025 at 11:48 PM Robert Haas <[email protected]> wrote:
> On Thu, Oct 9, 2025 at 3:09 PM Srinath Reddy Sadipiralla
> <[email protected]> wrote:
> > just a second late :( i was about to post a patch addressing the
> > refactors which Robert mentioned, anyway will have a look at your
> > latest patch John, thanks :), curious about the tap test.
> >
> > while i was writing the patch something suddenly struck me, that is
> > why we are even depending on last_common_segno, because once we
> > reach decide_wal_file_action it means that the file exists in both
> > target and source. AFAIK this can only happen with WAL segments older
> > than or equal to last_common_segno, because once the promotion
> > completes the filenames of the WAL files change to use the new
> > timelineID (2). for ex: if the last_common_segno is
> > 000000010000000000000003 then, based on the rules in
> > XLogInitNewTimeline:
> > 1) if the timeline switch happens in the middle of a segment, copy
> > data from the last WAL segment and create a WAL file with the same
> > segno but a different timelineID; in this case the starting WAL file
> > for the new timeline will be 000000020000000000000003
> > 2) if the timeline switch happens at a segment boundary, just create
> > the next segment; in this case the starting WAL file for the new
> > timeline will be 000000020000000000000004
> >
> > so basically the files which exist in source and not in target, like
> > the new timeline WAL segments, will be copied to target in total
> > before we reach decide_wal_file_action, so i think we don't need to
> > think about copying WAL files after the divergence point by
> > calculating and checking against last_common_segno, which we are
> > doing in our current approach. i think we can just do
>
> What makes me nervous about this is that it isn't necessarily the case
> that the servers were perfectly in sync at the time of the failure.
> Suppose that the primary was in the middle of writing
> 000000010000000000000003. The standby might also have this file, but
> it might contain less valid data than the one on the primary;
> therefore, if we don't copy the file, the two servers might not have
> an identical file. Maybe that wouldn't really matter, in the sense
> that the extra valid data that exists on the original primary
> shouldn't prevent it from replaying WAL on the new primary's timeline,
> which is probably all we really care about. But it feels dangerous to
> me.

Thanks Robert, I want to understand this point more, and will get back.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
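
PS: to make the file names in the quoted example concrete, here is a minimal Python sketch of how they are formed. It is not PostgreSQL code. It assumes the default 16MB wal_segment_size, and the helper names (wal_file_name, first_new_timeline_file) are made up for illustration. It only restates the two rules quoted above and says nothing about how much valid data each copy of a segment holds, which is the concern Robert raised.

```python
# Assumption: default wal_segment_size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4GB "xlogid" (256 with 16MB segments).
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def wal_file_name(tli: int, segno: int) -> str:
    """Build the 24-hex-digit WAL file name for a timeline and segment number."""
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def first_new_timeline_file(new_tli: int, last_common_segno: int,
                            switch_mid_segment: bool) -> str:
    """Hypothetical helper restating the two rules quoted above.

    Rule 1: switch in the middle of a segment -> same segno, new timeline ID.
    Rule 2: switch at a segment boundary      -> next segno, new timeline ID.
    """
    segno = last_common_segno if switch_mid_segment else last_common_segno + 1
    return wal_file_name(new_tli, segno)


if __name__ == "__main__":
    print(wal_file_name(1, 3))
    print(first_new_timeline_file(2, 3, switch_mid_segment=True))
    print(first_new_timeline_file(2, 3, switch_mid_segment=False))
```

Under these assumptions, a file name starting with 00000002 can never collide with one that exists only on the old timeline, so a name that is present on both source and target has to belong to timeline 1.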
