Added the patch to the upcoming commitfest: https://commitfest.postgresql.org/patch/5908/

Thanks & Regards,
Sunil S

On Wed, Jul 9, 2025 at 12:01 AM sunil s <sunilfe...@gmail.com> wrote:

> Hello Hackers,
>
> I recently had the opportunity to continue the effort originally led by a
> valued contributor.
> I’ve addressed most of the previously reported feedback and issues, and
> would like to share the updated patch with the community.
>
> IMHO, starting the WAL receiver eagerly offers significant advantages for
> the following reasons:
>
> 1. If recovery_min_apply_delay is set high (for various operational
>    reasons) and the primary crashes, the mirror can recover quickly,
>    thereby improving overall High Availability.
> 2. For setups without archive-based recovery, restore and recovery
>    operations complete faster.
> 3. When synchronous_commit is enabled, faster mirror recovery reduces
>    offline time and helps avoid prolonged commit/query wait times during
>    failover/recovery.
> 4. This approach also improves resilience by limiting the impact of
>    network interruptions on replication.
>
> > In common cases, I believe archive recovery is faster than
> > replication. If a segment is available from archive, we don't need to
> > prefetch it via stream.
>
> I completely agree: restoring from the archive is significantly faster
> than streaming.
> Attempting to stream from the last available WAL in the archive would
> introduce complexity and risk.
> Therefore, we can limit this feature to crash recovery scenarios and skip
> it when archiving is enabled.
>
> > The "FATAL: could not open file" message from walreceiver means that
> > the walreceiver was operationally prohibited to install a new wal
> > segment at the time.
>
> This was caused by an additional fix added in upstream to address a race
> condition between the archiver and checkpointer.
> It has been resolved in the latest patch, which also includes a TAP test
> to verify the fix. Thanks for testing and bringing this to our attention.
> For now we will skip the WAL receiver early start, since enabling write
> access for the WAL receiver would reintroduce the bug that commit
> cc2c7d65fc27e877c9f407587b0b92d46cd6dd16
> <https://github.com/postgres/postgres/commit/cc2c7d65fc27e877c9f407587b0b92d46cd6dd16>
> fixed previously.
>
> I've attached the rebased patch with the necessary fix.
>
> Thanks & Regards,
> Sunil S (Broadcom)
>
> On Tue, Jul 8, 2025 at 11:01 AM Kyotaro Horiguchi <horikyota....@gmail.com>
> wrote:
>
>> At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty <
>> soumyadeep2...@gmail.com> wrote in
>> > Sure, that makes more sense. Fixed.
>>
>> I played with this briefly. I started a standby from a backup that
>> has access to an archive, and I steadily got the following log lines.
>>
>> [139535:postmaster] LOG: database system is ready to accept read-only connections
>> [139542:walreceiver] LOG: started streaming WAL from primary at 0/2000000 on timeline 1
>> cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
>> [139542:walreceiver] FATAL: could not open file "pg_wal/000000010000000000000003": No such file or directory
>> cp: cannot stat '/home/horiguti/data/arc_work/00000002.history': No such file or directory
>> cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
>> [139548:walreceiver] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>
>> The "FATAL: could not open file" message from walreceiver means that
>> the walreceiver was operationally prohibited to install a new wal
>> segment at the time. Thus the walreceiver ended as soon as started.
>> In short, the eager replication is not working at all.
>>
>> I have a comment on the behavior and objective of this feature.
>>
>> In the case where archive recovery is started from a backup, this
>> feature lets walreceiver start while the archive recovery is ongoing.
>> If walreceiver (or the eager replication) worked as expected, it would
>> write WAL files while archive recovery writes the same set of WAL
>> segments to the same directory. I don't think that is sane behavior.
>> Or, to put it more modestly, it is unintended behavior.
>>
>> In common cases, I believe archive recovery is faster than
>> replication. If a segment is available from archive, we don't need to
>> prefetch it via stream.
>>
>> If this feature is intended to be used only for crash recovery of a
>> standby, it should fire only when it is needed.
>>
>> If not, that is, if it is intended to work also for archive recovery,
>> I think the eager replication should start from the next segment of
>> the last WAL in archive, but that would invite more complex problems.
>>
>> regards.
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
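
For readers following the thread, the policy it converges on (start the WAL receiver early only for crash recovery of a standby, and skip it when an archive is configured) can be sketched as below. This is an illustrative sketch only, not code from the patch: StandbyConfig and should_start_walreceiver_early are hypothetical names, and treating "archive is configured" as "restore_command is non-empty" is an assumption. Only restore_command and primary_conninfo are real PostgreSQL settings.

```python
from dataclasses import dataclass


@dataclass
class StandbyConfig:
    """Hypothetical snapshot of the settings the policy looks at."""
    standby_mode: bool      # True when the server runs as a standby
    restore_command: str    # empty string: no archive to restore from (assumption)
    primary_conninfo: str   # empty string: no primary to stream from


def should_start_walreceiver_early(cfg: StandbyConfig) -> bool:
    """Decide whether to start streaming before local WAL replay finishes.

    Hypothetical helper mirroring the thread's conclusion, not the patch.
    """
    if not cfg.standby_mode:
        # Plain crash recovery of a primary never streams WAL.
        return False
    if not cfg.primary_conninfo.strip():
        # Nothing to connect to, so there is nothing to start early.
        return False
    if cfg.restore_command.strip():
        # Archive recovery would write the same segments into pg_wal,
        # so the early start is skipped when an archive is configured.
        return False
    return True


if __name__ == "__main__":
    streaming_only = StandbyConfig(True, "", "host=primary port=5432")
    with_archive = StandbyConfig(True, "cp /archive/%f %p", "host=primary port=5432")
    print(should_start_walreceiver_early(streaming_only))
    print(should_start_walreceiver_early(with_archive))
```

Keeping the decision in one predicate makes the "fire only when it is needed" condition from the review explicit and easy to cover with a test.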