Re: Replication failure, slave requesting old segments

Adrian Klaver Mon, 13 Aug 2018 06:52:43 -0700

On 08/13/2018 05:39 AM, Stephen Frost wrote:

Greetings,


* Phil Endecott ([email protected]) wrote:

Adrian Klaver wrote:

On 08/12/2018 02:56 PM, Phil Endecott wrote:

Anyway.  Do others agree that my issue was the result of
wal_keep_segments=0 ?


Only as a sub-issue of the slave losing contact with the master. The basic
problem is maintaining two separate operations, archiving and streaming,
in sync. If either, or some combination of both, loses synchronization, then
it is anyone's guess what is appropriate for wal_keep_segments.


Uh, no, having an archive_command and a restore_command configured
correctly should remove the need to worry about what wal_keep_segments is
set to because anything not on the primary really should be available
through what's been archived and PG shouldn't have any trouble figuring
that out and working with it.
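To make the archive-plus-streaming argument concrete, here is a minimal Python sketch of the question a replica effectively faces for each WAL segment it still needs: is it still on the primary, only in the archive, or nowhere? This is a simplified model, not PostgreSQL's actual recovery logic; the function name and the set-based inputs are hypothetical.

```python
from typing import Dict, Iterable, Set


def classify_segments(needed: Iterable[str],
                      on_primary: Set[str],
                      in_archive: Set[str]) -> Dict[str, str]:
    """Label each WAL segment a replica still needs by where it can come from.

    "primary" -> still retained in the primary's pg_wal, so it can be streamed
    "archive" -> recycled on the primary but fetchable via restore_command
    "missing" -> in neither place: a gap the replica cannot replay past
    """
    sources = {}
    for seg in needed:
        if seg in on_primary:
            sources[seg] = "primary"
        elif seg in in_archive:
            sources[seg] = "archive"
        else:
            sources[seg] = "missing"
    return sources
```

With wal_keep_segments = 0 the on_primary set can be very small, so everything older depends on in_archive being complete; a single "missing" entry means the replica is stuck until that segment turns up.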

If all you've got is streaming replication then, sure, you have no idea
what to set wal_keep_segments to because the replica could be offline
for an indeterminate amount of time, but as long as you're keeping track
of all the WAL through archive_command, that shouldn't be an issue.
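The streaming-only uncertainty can be put into rough numbers. The sketch below assumes the default 16 MB WAL segment size and a steady, hypothetical WAL generation rate; it ignores WAL the primary retains for other reasons, so treat the result as a ballpark rather than a guarantee.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default 16 MB segment size


def streaming_only_window_seconds(wal_keep_segments: int,
                                  wal_bytes_per_second: float,
                                  segment_bytes: int = WAL_SEGMENT_BYTES) -> float:
    """Approximate how long a streaming-only replica can stay offline before
    the WAL held back purely by wal_keep_segments is used up.

    Ignores WAL the primary keeps for other reasons (for example, WAL not
    yet past a checkpoint), so the real window may be somewhat longer.
    """
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL generation rate must be positive")
    return wal_keep_segments * segment_bytes / wal_bytes_per_second
```

For example, with wal_keep_segments = 64 and about 1 MiB/s of WAL, the window is 1024 seconds (roughly 17 minutes); with wal_keep_segments = 0 this extra window is zero. Since an outage can last arbitrarily long, no fixed value covers every case.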

Therein lies the rub. As I stated previously, the bigger issue is syncing two different operations, archiving and streaming. The OP got caught short assuming the archiving would handle the situation where the streaming was down for a period. In his particular setup, and for this particular situation, a wal_keep_segments of 1 would have helped. I do not see this as a default value, though, as it depends on too many variables outside the reach of the database, most notably the success of the archive command. First, is the command even valid? Second, is the network link reliable? Third, is there even a network link, and is there more than one? Fourth, is the restore command valid? That is just off the top of my head; with more caffeine I could come up with more.

Saying that having archiving, streaming and wal_keep_segments=1 has you covered is misleading. I don't see it as detrimental to performance, but I do see more posts down the road from folks who are surprised when it does not cover their case. Personally, I think it is better to be up front that this requires more thought, or a third-party solution that has done the thinking.
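On the "is the archive command even valid" point: one common way to reduce risk is to have archive_command call a small script that exits non-zero on any failure (so PostgreSQL keeps the segment and retries) and refuses to overwrite an existing archive file. Below is a minimal Python sketch of such a script's core; the function name, the temporary-file approach and the flat archive directory are assumptions for illustration, not a recommended production tool.

```python
import os
import shutil


def archive_segment(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir and return a shell-style exit code.

    src_path corresponds to %p and file_name to %f in archive_command.
    Returns 0 only once the file is fully written under its final name,
    so on any failure PostgreSQL keeps the segment and retries later.
    """
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Never overwrite an existing archive file; report failure instead.
        return 1
    tmp = dest + ".tmp"
    try:
        # Write to a temporary name first so a crash never leaves a
        # truncated file under the real segment name.
        with open(src_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())
        os.rename(tmp, dest)
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    # A fuller version would also fsync archive_dir so the rename is durable.
    return 0
```

It would need a thin command-line wrapper passing %p, %f and the archive directory. It checks only the local copy step; it says nothing about network links or the restore side, which is why a maintained third-party tool is often the safer choice.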

Really?  I thought the intention was that the system should be
able to recover reliably when the slave reconnects after a
period of downtime, subject only to there being sufficient
network/CPU/disk bandwidth etc. for it to eventually catch up.
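The "eventually catch up" condition has a simple arithmetic core: the replica only closes its backlog if it applies WAL faster than the primary keeps generating it. A minimal sketch, assuming constant rates (real rates fluctuate) and using hypothetical parameter names:

```python
import math


def catch_up_seconds(backlog_bytes: float,
                     generated_bytes_per_second: float,
                     applied_bytes_per_second: float) -> float:
    """Time for a replica to clear a WAL backlog, assuming constant rates.

    The backlog shrinks only by the difference between the replica's
    apply rate and the primary's generation rate; if that difference is
    not positive, the replica never catches up and math.inf is returned.
    """
    surplus = applied_bytes_per_second - generated_bytes_per_second
    if surplus <= 0:
        return math.inf
    return backlog_bytes / surplus
```

For instance, a 10 GiB backlog with 1 MiB/s generated and 11 MiB/s applied clears in 1024 seconds. This only holds if every segment in the backlog is still obtainable somewhere.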


Yes, that's correct, the replica should always be able to catch back up
presuming there are no gaps in the WAL between when the replica failed and
where the primary is at.
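The "no gaps" condition can be sanity-checked against an archive directory listing. WAL segment files are named with 24 hex digits (an 8-digit timeline, then two 8-digit parts that together give the segment position), and with the default 16 MB segment size the last 8 digits run from 00000000 to 000000FF before the middle part increments. The sketch below assumes that default size and a flat list of file names; it skips non-segment files such as .history or .backup and compares segments only within one timeline, so it is a rough check rather than a full archive verifier.

```python
import re
from typing import Dict, Iterable, List, Tuple

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; check your initdb setting
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MiB
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def parse_segment(name: str) -> Tuple[int, int]:
    """Return (timeline, absolute segment number) for a WAL file name."""
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_XLOGID + seg


def find_gaps(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (before, after) pairs of archived segments with a hole between them."""
    by_timeline: Dict[int, List[Tuple[int, str]]] = {}
    for name in names:
        if not SEGMENT_NAME.fullmatch(name):
            continue  # skip .history, .backup, .partial and other files
        tli, num = parse_segment(name)
        by_timeline.setdefault(tli, []).append((num, name))
    gaps = []
    for tli in sorted(by_timeline):
        entries = sorted(by_timeline[tli])
        for (a_num, a_name), (b_num, b_name) in zip(entries, entries[1:]):
            if b_num - a_num > 1:
                gaps.append((a_name, b_name))
    return gaps
```

A replica whose needed range spans a reported gap cannot replay across it from the archive alone; it would need the missing segments to still be on the primary.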

Thanks!

Stephen



--
Adrian Klaver
[email protected]
