On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
> >> I wonder if it would be better to simply remove this extra polling of
> >> pg_wal as a prerequisite to your patch. The existing commentary leads me
> >> to think there might not be a strong reason for this behavior, so it could
> >> be a nice way to simplify your patch.
> >
> > I don't think it's a good idea to remove that completely. As said
> > above, it might help someone, we never know.
>
> It would be great to hear whether anyone is using this functionality. If
> no one is aware of existing usage and there is no interest in keeping it
> around, I don't think it would be unreasonable to remove it in v16.

It seems that exhausting all the WAL in pg_wal before switching to
streaming, after failing to fetch from the archive, cannot be removed.
I found this while experimenting with it. Here are my findings:

1. The standby has to recover the initial WAL files in the pg_wal
directory even in the normal post-restart/first-time-start case, that
is, in the non-crash-recovery case.

2. If the standby receives WAL files from the primary quickly (the
walreceiver just writes and flushes the received WAL to WAL files under
pg_wal) and/or standby recovery is slow, and then both the standby's
connection to the primary and its connection to the archive break for
whatever reason, the standby still has WAL files to recover in the
pg_wal directory.

I think the fundamental behaviour for the standby is that it has to
fully recover to the end of WAL under pg_wal no matter who copies WAL
files there. I fully understand the consequences of manually copying
WAL files into pg_wal; for that matter, manually copying or tinkering
with any other files under the data directory is something we neither
recommend nor encourage.

In summary, the standby state machine in WaitForWALToBecomeAvailable()
exhausts all the WAL in pg_wal before switching to streaming after
failing to fetch from the archive. The v8 patch proposed upthread
deviates from this behaviour. Hence, I'm attaching the v9 patch, which
keeps the behaviour as-is: the standby exhausts all the WAL in pg_wal
before switching to streaming after fetching WAL from the archive for
at least streaming_replication_retry_interval milliseconds.

Please review the v9 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

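
To make the ordering in the summary above concrete, here is a toy model
of the source switching. It is only a sketch, not the code in the patch
or in WaitForWALToBecomeAvailable(): the names WalSource, StandbyState
and next_source are hypothetical, and treating a retry interval of 0 as
"never switch on time alone" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: a toy model, not PostgreSQL's actual types. */
typedef enum { SRC_ARCHIVE, SRC_PG_WAL, SRC_STREAM } WalSource;

typedef struct
{
    WalSource source;            /* where the standby reads WAL from now */
    int       ms_in_archive;     /* time spent fetching from archive so far */
    int       retry_interval_ms; /* models streaming_replication_retry_interval;
                                  * 0 is assumed to disable the timed switch */
} StandbyState;

/*
 * Advance the model by one read attempt on the current source.
 * got_wal says whether that attempt returned WAL; elapsed_ms is the
 * time the attempt took. Returns the source to use for the next attempt.
 */
WalSource
next_source(StandbyState *s, bool got_wal, int elapsed_ms)
{
    assert(s != NULL);

    switch (s->source)
    {
        case SRC_ARCHIVE:
            s->ms_in_archive += elapsed_ms;
            /* Leave the archive on failure, or once the interval is up. */
            if (!got_wal ||
                (s->retry_interval_ms > 0 &&
                 s->ms_in_archive >= s->retry_interval_ms))
            {
                s->ms_in_archive = 0;
                s->source = SRC_PG_WAL;   /* never jump straight to stream */
            }
            break;

        case SRC_PG_WAL:
            /* Keep replaying until pg_wal is exhausted, then stream. */
            if (!got_wal)
                s->source = SRC_STREAM;
            break;

        case SRC_STREAM:
            /* Streaming failed: fall back to the archive. */
            if (!got_wal)
                s->source = SRC_ARCHIVE;
            break;
    }
    return s->source;
}
```

The point of the model is that there is no direct edge from archive to
streaming: whether the archive fetch fails or the interval elapses, the
standby first drains whatever is in pg_wal.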
v9-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch