> On 23 Feb 2016, at 11:30, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
>  
> Updated patch 
> 
> ... attached
> 
> I've split it up a bit more too, so it's easier to tell what change is for 
> what and fixed the issues mentioned by Oleksii. I've also removed some 
> unrelated documentation changes.
> 
> Patch 0001, timeline switches for logical decoding, is unchanged since the 
> last post. 

Thank you, I read the user-interface part now, looks good to me.

I found the following issue when shutting down a master with a connected 
replica that uses a physical failover slot:

2016-02-23 20:33:42.546 CET,,,54998,,56ccb3f3.d6d6,3,,2016-02-23 20:33:07 
CET,,0,DEBUG,00000,"performing replication slot checkpoint",,,,,,,,,""
2016-02-23 20:33:42.594 CET,,,55002,,56ccb3f3.d6da,4,,2016-02-23 20:33:07 
CET,,0,DEBUG,00000,"archived transaction log file 
""000000010000000000000003""",,,,,,,,,""
2016-02-23 20:33:42.601 CET,,,54998,,56ccb3f3.d6d6,4,,2016-02-23 20:33:07 
CET,,0,PANIC,XX000,"concurrent transaction log activity while database system 
is shutting down",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,5,,2016-02-23 20:33:07 
CET,,0,LOG,00000,"checkpointer process (PID 54998) was terminated by signal 6: 
Abort trap",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,6,,2016-02-23 20:33:07 
CET,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,

Basically, the issue is that CreateCheckPoint calls CheckpointReplicationSlots, 
which currently produces WAL, and this violates the assumption at line 
xlog.c:8492

        if (shutdown && checkPoint.redo != ProcLastRecPtr)
                ereport(PANIC,
                                (errmsg("concurrent transaction log activity 
while database system is shutting down")));


There are a couple of incorrect comments

logical.c: 90
There's some things missing to allow this: I think it should be “There are some 
things missing to allow this:”

logical.c:93
"we need we would need”

slot.c:889
"and there's no latch to set, so poll” - clearly there is a latch used in the 
code below.

Also,  slot.c:301 emits an error message for an attempt to create a failover 
slot on the replica after acquiring and releasing the locks and getting the 
shared memory slot, even though all the data to check for this condition is 
available right at the beginning of the function. Shouldn’t it avoid the extra 
work if it’s not needed? 

> 
> 
> -- 
>  Craig Ringer                   http://www.2ndQuadrant.com/ 
> <http://www.2ndquadrant.com/>
>  PostgreSQL Development, 24x7 Support, Training & Services
> <0001-Allow-logical-slots-to-follow-timeline-switches.patch><0002-Allow-replication-slots-to-follow-failover.patch><0003-Retain-extra-WAL-for-failover-slots-in-base-backups.patch><0004-Add-the-UI-and-for-failover-slots.patch><0005-Document-failover-slots.patch><0006-Add-failover-to-pg_replication_slots.patch><0007-not-for-inclusion-Test-script-for-failover-slots.patch>


Kind regards,
--
Oleksii

Reply via email to