On 22 February 2016 at 23:39, Oleksii Kliukin <al...@hintbits.com> wrote:
> What it’s doing is calling pg_basebackup first to initialize the replica, > and that actually failed with: > > _basebackup: unexpected termination of replication stream: ERROR: > requested WAL segment 000000010000000000000000 has already been removed > > The segment name definitely looks bogus to me. > > The actual command causing the failure was an attempt to clone the replica > using pg_basebackup, turning on xlog streaming: > > pg_basebackup --pgdata data/postgres1 --xlog-method=stream > --dbname="host=localhost port=5432 user=replicator” > > I checked the same command against the git master without the patches > applied and could not reproduce this problem there. > That's a bug. In testing whether we need to return a lower LSN for minimum WAL for BASE_BACKUP it failed to properly test for InvalidXLogRecPtr . Good catch. > On the code level, I have no comments on 0001, it’s well documented and I > have no questions about the approach, although I might be not too > knowledgable to judge the specifics of the implementation. > The first patch is the most important IMO, and the one I think needs the most thought since it's ... well, timelines aren't simple. > slots.c:294 > elog(LOG, "persistency is %i", (int)slot->data.persistency); > > Should be changed to DEBUG? > That's an escapee log statement I thought I'd already rebased out. Well spotted, fixed. > > slot.c:468 > Why did you drop “void" as a parameter type of ReplicationSlotDropAcquired? > That's an editing error on my part that I'll reverse. Since the prototype declares (void) it doesn't matter, but it's a pointless change. Fixed. > walsender.c: 1509 at PhysicalConfirmReceivedLocation > > I’ve noticed a comment stating that we don’t need to call > ReplicationSlotSave(), but that pre-dated the WAL-logging of replication > slot changes. Don’t we need to call it now, the same way it’s done for > the logical slots in logical.c:at LogicalConfirmReceivedLocation? > No, it's safe here. All we must ensure is that a slot is advanced on the replica when it's advanced on the master. For physical slots even that's a weak requirement, we just have to stop them from falling *too* far behind and causing too much xlog retention. For logical slots we should ensure we advance the slot on the replica before any vacuum activity that might remove catalog tuples still needed by that slot gets replayed. Basically the difference is that logical slots keep track of the catalog xmin too, so they have (slightly) stricter requirements. This patch doesn't touch either of those functions except for renaming ReplicationSlotsComputeRequiredLSN to ReplicationSlotsUpdateRequiredLSN . Which, by the way, I really don't like doing, but I couldn't figure out a name to give the function that computes-and-returns the required LSN that wouldn't be even more confusing in the face of having a ReplicationSlotsComputeRequiredLSN function as well. Ideas welcome. Updated patch -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services