> On 23 Feb 2016, at 11:30, Craig Ringer <[email protected]> wrote:
>
>
> Updated patch
>
> ... attached
>
> I've split it up a bit more too, so it's easier to tell what change is for
> what and fixed the issues mentioned by Oleksii. I've also removed some
> unrelated documentation changes.
>
> Patch 0001, timeline switches for logical decoding, is unchanged since the
> last post.
Thank you, I read the user-interface part now, looks good to me.
I found the following issue when shutting down a master with a connected
replica that uses a physical failover slot:
2016-02-23 20:33:42.546 CET,,,54998,,56ccb3f3.d6d6,3,,2016-02-23 20:33:07
CET,,0,DEBUG,00000,"performing replication slot checkpoint",,,,,,,,,""
2016-02-23 20:33:42.594 CET,,,55002,,56ccb3f3.d6da,4,,2016-02-23 20:33:07
CET,,0,DEBUG,00000,"archived transaction log file
""000000010000000000000003""",,,,,,,,,""
2016-02-23 20:33:42.601 CET,,,54998,,56ccb3f3.d6d6,4,,2016-02-23 20:33:07
CET,,0,PANIC,XX000,"concurrent transaction log activity while database system
is shutting down",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,5,,2016-02-23 20:33:07
CET,,0,LOG,00000,"checkpointer process (PID 54998) was terminated by signal 6:
Abort trap",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,6,,2016-02-23 20:33:07
CET,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,
Basically, the issue is that CreateCheckPoint calls CheckpointReplicationSlots,
which currently produces WAL, and this violates the assumption at line
xlog.c:8492
if (shutdown && checkPoint.redo != ProcLastRecPtr)
ereport(PANIC,
(errmsg("concurrent transaction log activity
while database system is shutting down")));
There are a couple of incorrect comments
logical.c: 90
There's some things missing to allow this: I think it should be “There are some
things missing to allow this:”
logical.c:93
"we need we would need”
slot.c:889
"and there's no latch to set, so poll” - clearly there is a latch used in the
code below.
Also, slot.c:301 emits an error message for an attempt to create a failover
slot on the replica after acquiring and releasing the locks and getting the
shared memory slot, even though all the data to check for this condition is
available right at the beginning of the function. Shouldn’t it avoid the extra
work if it’s not needed?
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Training & Services
> <0001-Allow-logical-slots-to-follow-timeline-switches.patch><0002-Allow-replication-slots-to-follow-failover.patch><0003-Retain-extra-WAL-for-failover-slots-in-base-backups.patch><0004-Add-the-UI-and-for-failover-slots.patch><0005-Document-failover-slots.patch><0006-Add-failover-to-pg_replication_slots.patch><0007-not-for-inclusion-Test-script-for-failover-slots.patch>
Kind regards,
--
Oleksii