Re: [HACKERS] Logical decoding on standby
On 21 June 2017 at 17:30, sanyam jain wrote:
> Hi,
> After changing
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> to
> sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
> I was facing another issue.
> On promotion of a cascaded server ThisTimeLineID in the standby server
> having logical slot becomes 0.
> Then i added a function call to GetStandbyFlushRecPtr in
> StartLogicalReplication which updates ThisTimeLineID.
>
> After the above two changes timeline following is working. But i'm not sure
> whether this is correct or not. In any case please someone clarify.

That's a reasonable thing to do, and again, I thought I did it in a later
revision, but apparently not (?). I've been working on other things and have
lost track of progress here a bit. I'll check more closely.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 21 June 2017 at 13:28, sanyam jain wrote:
> Hi,
>
> In this patch in walsender.c sendTimeLineIsHistoric is set to true when
> current and ThisTimeLineID are equal.
>
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
>
> Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less than
> ThisTimeLineID?

Correct, that was a bug. I thought it got fixed upthread though.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
Hi,

> After changing
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> to
> sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
> I was facing another issue.
> On promotion of a cascaded server ThisTimeLineID in the standby server
> having logical slot becomes 0.
> Then i added a function call to GetStandbyFlushRecPtr in
> StartLogicalReplication which updates ThisTimeLineID.
>
> After the above two changes timeline following is working. But i'm not sure
> whether this is correct or not. In any case please someone clarify.

Could anyone with experience please confirm whether the steps I have taken
are correct?

Thanks,
Sanyam Jain
Re: [HACKERS] Logical decoding on standby
Hi,

After changing

sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;

to

sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;

I was facing another issue. On promotion of a cascaded server,
ThisTimeLineID in the standby server having the logical slot becomes 0.
Then I added a call to GetStandbyFlushRecPtr in StartLogicalReplication,
which updates ThisTimeLineID.

After the above two changes timeline following is working, but I'm not sure
whether this is correct or not. In any case, could someone please clarify?

Thanks,
Sanyam Jain
Re: [HACKERS] Logical decoding on standby
Hi,

In this patch, in walsender.c, sendTimeLineIsHistoric is set to true when
state->currTLI and ThisTimeLineID are equal:

sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;

Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less than
ThisTimeLineID?

When I applied the timeline following patch alone, pg_recvlogical quits in
the startup phase, but when I made the above change pg_recvlogical works,
although timeline following doesn't work.

Thanks,
Sanyam Jain

From: pgsql-hackers-ow...@postgresql.org on behalf of Robert Haas
Sent: Wednesday, April 5, 2017 3:25:50 PM
To: Andres Freund
Cc: Craig Ringer; Simon Riggs; Thom Brown; Michael Paquier; Petr Jelinek; PostgreSQL Hackers
Subject: Re: [HACKERS] Logical decoding on standby

On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund wrote:
> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>> On 5 April 2017 at 04:19, Andres Freund wrote:
>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> >> I'm much happier with this. I'm still fixing some issues in the tests
>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> >> reviewed in their proper context now.
>> >
>> > To me this very clearly is too late for v10, and now should be moved to
>> > the next CF.
>>
>> I tend to agree that it's late in the piece. It's still worth cleaning
>> it up into a state ready for early pg11 though.
>
> Totally agreed.

Based on this exchange, marked as "Moved to next CF".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Logical decoding on standby
On 5 April 2017 at 23:25, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund wrote:
>> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>>> On 5 April 2017 at 04:19, Andres Freund wrote:
>>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>>> >> I'm much happier with this. I'm still fixing some issues in the tests
>>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>>> >> reviewed in their proper context now.
>>> >
>>> > To me this very clearly is too late for v10, and now should be moved to
>>> > the next CF.
>>>
>>> I tend to agree that it's late in the piece. It's still worth cleaning
>>> it up into a state ready for early pg11 though.
>>
>> Totally agreed.
>
> Based on this exchange, marked as "Moved to next CF".

Yeah. Can't say I like it, but I have to agree. Can get this rolling in
early pg11, and that way we can hopefully get support for it into logical
replication too.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund wrote:
> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>> On 5 April 2017 at 04:19, Andres Freund wrote:
>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> >> I'm much happier with this. I'm still fixing some issues in the tests
>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> >> reviewed in their proper context now.
>> >
>> > To me this very clearly is too late for v10, and now should be moved to
>> > the next CF.
>>
>> I tend to agree that it's late in the piece. It's still worth cleaning
>> it up into a state ready for early pg11 though.
>
> Totally agreed.

Based on this exchange, marked as "Moved to next CF".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Logical decoding on standby
On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
> On 5 April 2017 at 04:19, Andres Freund wrote:
> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> >> I'm much happier with this. I'm still fixing some issues in the tests
> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> >> reviewed in their proper context now.
> >
> > To me this very clearly is too late for v10, and now should be moved to
> > the next CF.
>
> I tend to agree that it's late in the piece. It's still worth cleaning
> it up into a state ready for early pg11 though.

Totally agreed.

- Andres
Re: [HACKERS] Logical decoding on standby
On 5 April 2017 at 04:19, Andres Freund wrote:
> On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> I'm much happier with this. I'm still fixing some issues in the tests
>> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> reviewed in their proper context now.
>
> To me this very clearly is too late for v10, and now should be moved to
> the next CF.

I tend to agree that it's late in the piece. It's still worth cleaning
it up into a state ready for early pg11 though.

I've just fixed an issue where hot_standby_feedback on a physical slot
could cause oldestCatalogXmin to go backwards. When the slot's
catalog_xmin was 0 and is being set for the first time, the standby's
supplied catalog_xmin is trusted. To fix it, in
PhysicalReplicationSlotNewXmin, when setting catalog_xmin from 0, clamp
the value to the master's GetOldestSafeDecodingTransactionId().

Tests are cleaned up and fixed.

This series adds full support for logical decoding on a standby.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 24e2baea15c4f435789c7fda5ddc9feae8a7012f Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH 1/3] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys to
know the oldest xid they can safely create a historic snapshot for. They
can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in a new xl_catalog_xmin_advance
record, emitted before vacuum or periodically by the bgwriter. WAL is
only written if the lowest catalog_xmin needed by any replication slot
has advanced.
--- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/xactdesc.c | 9 ++ src/backend/access/rmgrdesc/xlogdesc.c | 3 +- src/backend/access/transam/varsup.c | 15 src/backend/access/transam/xact.c | 36 src/backend/access/transam/xlog.c | 23 - src/backend/postmaster/bgwriter.c | 9 ++ src/backend/replication/logical/decode.c| 12 +++ src/backend/replication/walreceiver.c | 2 +- src/backend/replication/walsender.c | 46 +- src/backend/storage/ipc/procarray.c | 134 ++-- src/bin/pg_controldata/pg_controldata.c | 2 + src/include/access/transam.h| 5 ++ src/include/access/xact.h | 12 ++- src/include/catalog/pg_control.h| 1 + src/include/storage/procarray.h | 5 +- src/test/recovery/t/006_logical_decoding.pl | 90 +-- 17 files changed, 383 insertions(+), 24 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index d7f65a5..d1400ec 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state) if (!state->rs_logical_rewrite) return; - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); + /* Use oldestCatalogXmin here */ + ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); /* * If there are no logical slots in progress we don't need to do anything, diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c index 735f8c5..96ea163 100644 --- a/src/backend/access/rmgrdesc/xactdesc.c +++ b/src/backend/access/rmgrdesc/xactdesc.c @@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record) appendStringInfo(buf, "xtop %u: ", xlrec->xtop); xact_desc_assignment(buf, xlrec); } + else if (info == XLOG_XACT_CATALOG_XMIN_ADV) + { + xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record); + + appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin); + } } const char * @@ -324,6 +330,9 @@ xact_identify(uint8 info) case 
XLOG_XACT_ASSIGNMENT: id = "ASSIGNMENT"; break; + case XLOG_XACT_CATALOG_XMIN_ADV: + id = "CATALOG_XMIN"; + break; } return id; diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index 5f07eb1..a66cfc6 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; " "oldest xid %u in DB %u; oldest multi %u in DB %u; " "oldest/newest commit timestamp xid: %u/%u; " - "oldest running xid %u; %s", + "oldest running xid %u; oldest catalog xmin %u; %s", (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo, checkpoint->ThisTimeLineID, checkpoint->PrevTimeLineID, @@ -63,6 +63,7 @@ xlog_desc(StringI
Re: [HACKERS] Logical decoding on standby
On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

To me this very clearly is too late for v10, and now should be moved to
the next CF.

- Andres
Re: [HACKERS] Logical decoding on standby
On 4 April 2017 at 22:32, Craig Ringer wrote:
> Hi all
>
> Here's the final set of three patches on top of what's already committed.
>
> The first is catalog_xmin logging, which is unchanged from the prior post.
>
> The 2nd is support for conflict with recovery, with changes that
> should address Andres's concerns there.
>
> The 3rd actually enables decoding on standby. Unlike the prior
> version, no attempt is made to check the walsender configuration for
> slot use, etc. The ugly code to try to mitigate races is also removed.
> Instead, if wal_level is logical the catalog_xmin sent by
> hot_standby_feedback is now the same as the xmin if there's no local
> slot holding it down. So we're always sending a catalog_xmin in
> feedback and we should always expect to have a valid local
> oldestCatalogXmin once hot_standby_feedback kicks in. This makes the
> race in slot creation no worse than the existing race between
> hot_standby_feedback establishment and the first queries run on a
> downstream, albeit with more annoying consequences. Apps can still
> ensure a slot created on standby is guaranteed safe and conflict-free
> by having a slot on the master first.
>
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

Dammit. Attached.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 9b8b1236eb32819430062031ff76750ed8bc1661 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH 1/3] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys to
know the oldest xid they can safely create a historic snapshot for. They
can then refuse to start decoding from a slot or raise a recovery
conflict.
The catalog_xmin advance is logged in a new xl_catalog_xmin_advance record, emitted before vacuum or periodically by the bgwriter. WAL is only written if the lowest catalog_xmin needed by any replication slot has advanced. --- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/xactdesc.c | 9 ++ src/backend/access/rmgrdesc/xlogdesc.c | 3 +- src/backend/access/transam/varsup.c | 15 src/backend/access/transam/xact.c | 36 src/backend/access/transam/xlog.c | 23 - src/backend/postmaster/bgwriter.c | 9 ++ src/backend/replication/logical/decode.c| 12 +++ src/backend/replication/walreceiver.c | 2 +- src/backend/replication/walsender.c | 8 ++ src/backend/storage/ipc/procarray.c | 134 ++-- src/bin/pg_controldata/pg_controldata.c | 2 + src/include/access/transam.h| 5 ++ src/include/access/xact.h | 12 ++- src/include/catalog/pg_control.h| 1 + src/include/storage/procarray.h | 5 +- src/test/recovery/t/006_logical_decoding.pl | 90 +-- 17 files changed, 348 insertions(+), 21 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index d7f65a5..d1400ec 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state) if (!state->rs_logical_rewrite) return; - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); + /* Use oldestCatalogXmin here */ + ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); /* * If there are no logical slots in progress we don't need to do anything, diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c index 735f8c5..96ea163 100644 --- a/src/backend/access/rmgrdesc/xactdesc.c +++ b/src/backend/access/rmgrdesc/xactdesc.c @@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record) appendStringInfo(buf, "xtop %u: ", xlrec->xtop); xact_desc_assignment(buf, xlrec); } + else if (info == XLOG_XACT_CATALOG_XMIN_ADV) + { + 
xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record); + + appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin); + } } const char * @@ -324,6 +330,9 @@ xact_identify(uint8 info) case XLOG_XACT_ASSIGNMENT: id = "ASSIGNMENT"; break; + case XLOG_XACT_CATALOG_XMIN_ADV: + id = "CATALOG_XMIN"; + break; } return id; diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index 5f07eb1..a66cfc6 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u;
Re: [HACKERS] Logical decoding on standby
Hi all

Here's the final set of three patches on top of what's already committed.

The first is catalog_xmin logging, which is unchanged from the prior post.

The 2nd is support for conflict with recovery, with changes that should
address Andres's concerns there.

The 3rd actually enables decoding on standby. Unlike the prior version, no
attempt is made to check the walsender configuration for slot use, etc.
The ugly code to try to mitigate races is also removed. Instead, if
wal_level is logical the catalog_xmin sent by hot_standby_feedback is now
the same as the xmin if there's no local slot holding it down. So we're
always sending a catalog_xmin in feedback and we should always expect to
have a valid local oldestCatalogXmin once hot_standby_feedback kicks in.
This makes the race in slot creation no worse than the existing race
between hot_standby_feedback establishment and the first queries run on a
downstream, albeit with more annoying consequences. Apps can still ensure
a slot created on standby is guaranteed safe and conflict-free by having a
slot on the master first.

I'm much happier with this. I'm still fixing some issues in the tests for
03 and tidying them up, but 03 should allow 01 and 02 to be reviewed in
their proper context now.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 3 April 2017 at 15:27, Craig Ringer wrote:
> On 3 April 2017 at 13:46, Craig Ringer wrote:
>
>> OK, updated catalog_xmin logging patch attached.
>
> Ahem, that should be v5.

... and here's v6, which returns to the separate
xl_xact_catalog_xmin_advance approach. pgindented.

This is what I favour proceeding with. Now updating/amending the recovery
conflict patch.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 353e987584e22d268b8ab1c10c46d7e8c74ef552 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys to
know the oldest xid they can safely create a historic snapshot for. They
can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in a new xl_xact_catalog_xmin_advance
record, emitted before vacuum or periodically by the bgwriter. WAL is
only written if the lowest catalog_xmin needed by any replication slot
has advanced.
--- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/xactdesc.c | 9 ++ src/backend/access/rmgrdesc/xlogdesc.c | 3 +- src/backend/access/transam/varsup.c | 15 src/backend/access/transam/xact.c | 36 src/backend/access/transam/xlog.c | 20 - src/backend/commands/vacuum.c | 9 ++ src/backend/postmaster/bgwriter.c | 10 +++ src/backend/replication/logical/decode.c| 12 +++ src/backend/replication/walreceiver.c | 2 +- src/backend/replication/walsender.c | 8 ++ src/backend/storage/ipc/procarray.c | 132 ++-- src/bin/pg_controldata/pg_controldata.c | 2 + src/include/access/transam.h| 5 ++ src/include/access/xact.h | 12 ++- src/include/catalog/pg_control.h| 1 + src/include/storage/procarray.h | 5 +- src/test/recovery/t/006_logical_decoding.pl | 90 +-- 18 files changed, 353 insertions(+), 21 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index d7f65a5..d1400ec 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state) if (!state->rs_logical_rewrite) return; - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); + /* Use oldestCatalogXmin here */ + ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); /* * If there are no logical slots in progress we don't need to do anything, diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c index 735f8c5..96ea163 100644 --- a/src/backend/access/rmgrdesc/xactdesc.c +++ b/src/backend/access/rmgrdesc/xactdesc.c @@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record) appendStringInfo(buf, "xtop %u: ", xlrec->xtop); xact_desc_assignment(buf, xlrec); } + else if (info == XLOG_XACT_CATALOG_XMIN_ADV) + { + xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record); + + appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin); + } } const char * @@ -324,6 +330,9 @@ 
xact_identify(uint8 info) case XLOG_XACT_ASSIGNMENT: id = "ASSIGNMENT"; break; + case XLOG_XACT_CATALOG_XMIN_ADV: + id = "CATALOG_XMIN"; + break; } return id; diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index 5f07eb1..a66cfc6 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; " "oldest xid %u in DB %u; oldest multi %u in DB %u; " "oldest/newest commit timestamp xid: %u/%u; " - "oldest running xid %u; %s", + "oldest running xid %u; oldest catalog xmin %u; %s", (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo, checkpoint->ThisTimeLineID, checkpoint->PrevTimeLineID, @@ -63,6 +63,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) checkpoint->oldestCommitTsXid, checkpoint->newestCommitTsXid, checkpoint->oldestActiveXid, + checkpoint->oldestCatalogXmin, (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online"); } else if (info == XLOG_NEXTOID) diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c index 5efbfbd..ffabf1c 100644 --- a/src/backend/access/transam/varsup.c +++ b/src/backend/access/transam/varsup.c @@ -414,6 +414,21 @@ SetTransactionIdLimit(TransactionId oldest_datfrozen
Re: [HACKERS] Logical decoding on standby
On 3 April 2017 at 13:46, Craig Ringer wrote: > OK, updated catalog_xmin logging patch attached. Ahem, that should be v5. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 7f742f582e1f6f8f23c4e9d78cd0298180e5387c Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 13:36:49 +0800 Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples Before advancing the effective catalog_xmin we use to remove old catalog tuple versions, make sure it is written to WAL. This allows standbys to know the oldest xid they can safely create a historic snapshot for. They can then refuse to start decoding from a slot or raise a recovery conflict. The catalog_xmin advance is logged in the next xl_running_xacts records, so vacuum of catalogs may be held back up to 10 seconds when a replication slot with catalog_xmin is holding down the global catalog_xmin. --- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/standbydesc.c | 5 +- src/backend/access/rmgrdesc/xlogdesc.c | 3 +- src/backend/access/transam/varsup.c | 15 + src/backend/access/transam/xlog.c | 26 +++- src/backend/postmaster/bgwriter.c | 2 +- src/backend/replication/slot.c | 2 +- src/backend/replication/walreceiver.c | 2 +- src/backend/replication/walsender.c | 8 +++ src/backend/storage/ipc/procarray.c | 61 --- src/backend/storage/ipc/standby.c | 60 +-- src/bin/pg_controldata/pg_controldata.c | 2 + src/include/access/transam.h| 5 ++ src/include/catalog/pg_control.h| 1 + src/include/storage/procarray.h | 3 +- src/include/storage/standby.h | 8 ++- src/include/storage/standbydefs.h | 1 + src/test/recovery/t/006_logical_decoding.pl | 93 +++-- 18 files changed, 269 insertions(+), 31 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index d7f65a5..d1400ec 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -812,7 +812,8 @@ 
logical_begin_heap_rewrite(RewriteState state) if (!state->rs_logical_rewrite) return; - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); + /* Use oldestCatalogXmin here */ + ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); /* * If there are no logical slots in progress we don't need to do anything, diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c index 278546a..4aaae59 100644 --- a/src/backend/access/rmgrdesc/standbydesc.c +++ b/src/backend/access/rmgrdesc/standbydesc.c @@ -21,10 +21,11 @@ standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec) { int i; - appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u", + appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u oldestCatalogXmin %u", xlrec->nextXid, xlrec->latestCompletedXid, - xlrec->oldestRunningXid); + xlrec->oldestRunningXid, + xlrec->oldestCatalogXmin); if (xlrec->xcnt > 0) { appendStringInfo(buf, "; %d xacts:", xlrec->xcnt); diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index 5f07eb1..a66cfc6 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; " "oldest xid %u in DB %u; oldest multi %u in DB %u; " "oldest/newest commit timestamp xid: %u/%u; " - "oldest running xid %u; %s", + "oldest running xid %u; oldest catalog xmin %u; %s", (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo, checkpoint->ThisTimeLineID, checkpoint->PrevTimeLineID, @@ -63,6 +63,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record) checkpoint->oldestCommitTsXid, checkpoint->newestCommitTsXid, checkpoint->oldestActiveXid, + checkpoint->oldestCatalogXmin, (info == XLOG_CHECKPOINT_SHUTDOWN) ? 
"shutdown" : "online"); } else if (info == XLOG_NEXTOID) diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c index 5efbfbd..ffabf1c 100644 --- a/src/backend/access/transam/varsup.c +++ b/src/backend/access/transam/varsup.c @@ -414,6 +414,21 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid) } } +/* + * Set the global oldest catalog_xmin used to determine when tuples + * may be removed from catalogs and user-catalogs accessible from logical + * decoding. + * + * Only to be called from the startup process or from LogCurrentRunningXacts() + * which ensures the update is properly written to xlog
Re: [HACKERS] Logical decoding on standby
On 31 March 2017 at 12:49, Craig Ringer wrote:
> On 31 March 2017 at 01:16, Andres Freund wrote:
>>> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
>>>     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
>>>
>>>     /*
>>> +    * There can be no concurrent writers to oldestCatalogXmin during
>>> +    * recovery, so no need to take ProcArrayLock.
>>> +    */
>>> +   ShmemVariableCache->oldestCatalogXmin = checkPoint.oldestCatalogXmin;
>>
>> s/writers/writes/?
>
> I meant writers, i.e. nothing else can be writing to it. But writes works too.
> Fixed.

>>> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>>>
>>>     ReplicationSlotsComputeRequiredXmin(true);
>>>
>>> +   /*
>>> +    * If this is the first slot created on the master we won't have a
>>> +    * persistent record of the oldest safe xid for historic snapshots yet.
>>> +    * Force one to be recorded so that when we go to replay from this slot we
>>> +    * know it's safe.
>>> +    */
>>> +   force_standby_snapshot =
>>> +       !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);
>>
>> s/first slot/first logical slot/. Also, the reference to master doesn't
>> seem right.
>
> Unsure what you mean re reference to master not seeming right.
>
> If oldestCatalogXmin is 0 we'll ERROR when trying to start decoding
> from the new slot so we need to make sure it gets advanced once we've
> decided on our starting catalog_xmin.

Moved to next patch, will address there.

>>>     LWLockRelease(ProcArrayLock);
>>>
>>> +   /* Update ShmemVariableCache->oldestCatalogXmin */
>>> +   if (force_standby_snapshot)
>>> +       LogStandbySnapshot();
>>
>> The comment and code don't quite square to me - it's far from obvious
>> that LogStandbySnapshot does something like that. I'd even say it's a
>> bad idea to have it do that.
>
> So you prefer the prior approach with separate xl_catalog_xmin advance
> records?
>
> I don't have much preference; I liked the small code reduction of
> Simon's proposed approach, but it landed up being a bit awkward in
> terms of ordering and locking. I don't think catalog_xmin tracking is
> really closely related to the standby snapshot stuff and it feels a
> bit like it's a tacked-on afterthought where it is now.

This code moved to next patch. But we do need to agree on the best
approach. If we're not going to force a standby snapshot here, then it's
probably better to use the separate xl_catalog_xmin design instead.

>>>     /*
>>>      * tell the snapshot builder to only assemble snapshot once reaching the
>>>      * running_xact's record with the respective xmin.
>>> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>>>         start_lsn = slot->data.confirmed_flush;
>>>     }
>>>
>>> +   EnsureActiveLogicalSlotValid();
>>
>> This seems like it should be in a separate patch, and separately
>> reviewed. It's code that's currently unreachable (and thus untestable).
>
> It is reached and is run, those checks run whenever creating a
> non-initial decoding context on master or replica.

Again, moved to next patch.

>>>     /*
>>> +    * Update our knowledge of the oldest xid we can safely create historic
>>> +    * snapshots for.
>>> +    *
>>> +    * There can be no concurrent writers to oldestCatalogXmin during
>>> +    * recovery, so no need to take ProcArrayLock.
>>
>> By now I think is pretty flawed logic, because there can be concurrent
>> readers, that need to be safe against oldestCatalogXmin advancing
>> concurrently.
>
> You're right, we'll need a lock or suitable barriers here to ensure
> that slot conflict with recovery and startup of new decoding sessions
> doesn't see outdated values. This would be the peer of the
> pg_memory_barrier() above. Or could just take a lock; there's enough
> other locking activity in redo that it should be fine.

Now takes ProcArrayLock briefly.
oldestCatalogXmin is also used in GetOldestSafeDecodingTransactionId, and there we want to prevent it from being advanced. But on further thought, relying on oldestCatalogXmin there is actually unsafe; on the master, we might've already logged our intent to advance it to some greater value of procArray->replication_slot_catalog_xmin and will do so as ProcArrayLock is released. On standby we're also better off relying on procArray->replication_slot_catalog_xmin since that's what we'll be sending in feedback. Went back to using replication_slot_catalog_xmin here, with comment * * We don't use ShmemVariableCache->oldestCatalogXmin here because another * backend may have already logged its intention to advance it to a higher * value (still <= replication_slot_catalog_xmin) and just be waiting on * ProcArrayLock to actually apply the change. On a standby * replication_slot_catalog_xmin is what the walreceiver sends in feedback anyway. */
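The ordering hazard described above — a backend may have logged its intent to advance the limit but not yet applied it in shared memory — can be sketched as a standalone toy model. This is not PostgreSQL source; the variable and function names are assumptions made for illustration, and locking is elided since the model is single-threaded.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for procArray->replication_slot_catalog_xmin and
 * ShmemVariableCache->oldestCatalogXmin. */
TransactionId slot_catalog_xmin = 0;    /* lowest catalog_xmin the slots require */
TransactionId logged_catalog_xmin = 0;  /* last value written to (model) WAL */
TransactionId oldest_catalog_xmin = 0;  /* value applied in shared memory */

/* Step 1: a backend logs its intent to advance the effective limit. */
void log_catalog_xmin_advance(void)
{
    logged_catalog_xmin = slot_catalog_xmin;
}

/* Step 2: later, once (modelled) ProcArrayLock is re-acquired, the logged
 * value is applied.  Between the two steps the shared value is stale. */
void apply_catalog_xmin_advance(void)
{
    oldest_catalog_xmin = logged_catalog_xmin;
}

/* The safe limit for deciding what may still be decoded must therefore come
 * from the slots' requirement, not from the possibly-stale applied value. */
TransactionId oldest_safe_decoding_xid(void)
{
    return slot_catalog_xmin;
}
```

Reading `oldest_catalog_xmin` between the two steps would report an xid that is about to become unsafe, which is exactly why the patch went back to `replication_slot_catalog_xmin` in `GetOldestSafeDecodingTransactionId`.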
Re: [HACKERS] Logical decoding on standby
On 31 March 2017 at 12:49, Craig Ringer wrote: > On 31 March 2017 at 01:16, Andres Freund wrote: >> The comment and code don't quite square to me - it's far from obvious >> that LogStandbySnapshot does something like that. I'd even say it's a >> bad idea to have it do that. > > So you prefer the prior approach with separate xl_catalog_xmin advance > records? Alternately, we can record the creation timeline on slots, so we know if there's been a promotion. It wouldn't make sense to do this if that were the only use of timelines on slots. But I'm aware you'd rather keep slots timeline-agnostic and I tend to agree. Anyway, per your advice will split out the validation step. (I'd like replication origins to be able to track time alongside lsn, and to pass the timeline of each LSN to output plugin callbacks so we can detect if a physical promotion results in us backtracking down a fork in history, but that doesn't affect slots.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 31 March 2017 at 01:16, Andres Freund wrote: >> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record) >> SetTransactionIdLimit(checkPoint.oldestXid, >> checkPoint.oldestXidDB); >> >> /* >> + * There can be no concurrent writers to oldestCatalogXmin >> during >> + * recovery, so no need to take ProcArrayLock. >> + */ >> + ShmemVariableCache->oldestCatalogXmin = >> checkPoint.oldestCatalogXmin; > > s/writers/writes/? I meant writers, i.e. nothing else can be writing to it. But writes works too. >> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin, >> >> ReplicationSlotsComputeRequiredXmin(true); >> >> + /* >> + * If this is the first slot created on the master we won't have a >> + * persistent record of the oldest safe xid for historic snapshots yet. >> + * Force one to be recorded so that when we go to replay from this >> slot we >> + * know it's safe. >> + */ >> + force_standby_snapshot = >> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin); > > s/first slot/first logical slot/. Also, the reference to master doesn't > seem right. Unsure what you mean re reference to master not seeming right. If oldestCatalogXmin is 0 we'll ERROR when trying to start decoding from the new slot so we need to make sure it gets advanced once we've decided on our starting catalog_xmin. >> LWLockRelease(ProcArrayLock); >> >> + /* Update ShmemVariableCache->oldestCatalogXmin */ >> + if (force_standby_snapshot) >> + LogStandbySnapshot(); > > The comment and code don't quite square to me - it's far from obvious > that LogStandbySnapshot does something like that. I'd even say it's a > bad idea to have it do that. So you prefer the prior approach with separate xl_catalog_xmin advance records? I don't have much preference; I liked the small code reduction of Simon's proposed approach, but it landed up being a bit awkward in terms of ordering and locking. 
I don't think catalog_xmin tracking is really closely related to the standby snapshot stuff and it feels a bit like it's a tacked-on afterthought where it is now. >> /* >>* tell the snapshot builder to only assemble snapshot once reaching >> the >>* running_xact's record with the respective xmin. >> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn, >> start_lsn = slot->data.confirmed_flush; >> } >> >> + EnsureActiveLogicalSlotValid(); > > This seems like it should be in a separate patch, and separately > reviewed. It's code that's currently unreachable (and thus untestable). It is reached and is run, those checks run whenever creating a non-initial decoding context on master or replica. The failure case is reachable if a replica has hot_standby_feedback off or it's not using a physical slot and loses its connection. If promoted, any slot existing on that replica (from a file system level copy when the replica was created) will fail. I agree it's contrived since we can't create and maintain slots on replicas, which is the main point of it. >> +/* >> + * Test to see if the active logical slot is usable. >> + */ >> +static void >> +EnsureActiveLogicalSlotValid(void) >> +{ >> + TransactionId shmem_catalog_xmin; >> + >> + Assert(MyReplicationSlot != NULL); >> + >> + /* >> + * A logical slot can become unusable if we're doing logical decoding >> on a >> + * standby or using a slot created before we were promoted from standby >> + * to master. > > Neither of those is currently possible... Right. Because it's foundations for decoding on standby. >> If the master advanced its global catalog_xmin past the >> + * threshold we need it could've removed catalog tuple versions that >> + * we'll require to start decoding at our restart_lsn. >> + * >> + * We need a barrier so that if we decode in recovery on a standby we >> + * don't allow new decoding sessions to start after redo has advanced >> + * the threshold. 
>> + */ >> + if (RecoveryInProgress()) >> + pg_memory_barrier(); > > I don't think this is a meaningful locking protocol. It's a bad idea to > use lock-free programming without need, especially when the concurrency > protocol isn't well defined. Yeah. The intended interaction is with recovery conflict on standby which doesn't look likely to land in this release due to patch split/cleanup etc. (Not a complaint). Better to just take a brief shared ProcArrayLock. >> diff --git a/src/backend/replication/walsender.c >> b/src/backend/replication/walsender.c >> index cfc3fba..cdc5f95 100644 >> --- a/src/backend/replication/walsender.c >> +++ b/src/backend/replication/walsender.c >> @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn) >>* be energy wasted - the worst lost information can do here is give us >>* wrong information in a statistics view - we'll just potentially be more >>* conservative in removing files.
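The validity test `EnsureActiveLogicalSlotValid()` performs, as discussed in the review above, amounts to comparing the slot's catalog_xmin against the oldest xid for which complete catalog state still exists. The sketch below is a self-contained model of that check, not the patch's actual code: names are assumptions, the shared value would really be read under ProcArrayLock (per the follow-up above), and wraparound-aware xid comparison is simplified to plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Stand-in for ShmemVariableCache->oldestCatalogXmin, read under a
 * (modelled) shared ProcArrayLock. */
TransactionId shmem_oldest_catalog_xmin = 0;

/* A slot is still usable only if every catalog tuple version from its
 * catalog_xmin onwards is guaranteed to exist.  If the master advanced the
 * global limit past the slot's requirement, needed tuples may be gone. */
bool
active_slot_valid(TransactionId slot_catalog_xmin)
{
    /* A slot with no catalog_xmin reserved cannot be decoded from. */
    if (slot_catalog_xmin == InvalidTransactionId)
        return false;
    return slot_catalog_xmin >= shmem_oldest_catalog_xmin;
}
```

In the real code an invalid slot would raise an ERROR when a decoding context is created, or a recovery conflict when decoding on a standby.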
Re: [HACKERS] Logical decoding on standby
On 31 March 2017 at 01:16, Andres Freund wrote: > On 2017-03-29 08:01:34 +0800, Craig Ringer wrote: >> On 28 March 2017 at 23:22, Andres Freund wrote: >> >> >> --- a/doc/src/sgml/protocol.sgml >> >> +++ b/doc/src/sgml/protocol.sgml >> >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: >> >> >> >>Drops a replication slot, freeing any reserved server-side >> >> resources. If >> >>the slot is currently in use by an active connection, this command >> >> fails. >> >> + If the slot is a logical slot that was created in a database other >> >> than >> >> + the database the walsender is connected to, this command fails. >> >> >> >> >> >> >> > >> > Shouldn't the docs in the drop database section mention this? >> >> DROP DATABASE doesn't really discuss all the resources it drops, but >> I'm happy to add mention of replication slots handling. > > I don't think that's really comparable, because the other things aren't > global objects, which replication slots are. Fine by me. Patch fix-slot-drop-docs.patch, upthread, adds the passage + + + Active logical + replication slots count as connections and will prevent a + database from being dropped. Inactive slots will be automatically + dropped when the database is dropped. + to the notes section of the DROP DATABASE docs; that should do the trick. I'm not convinced it's worth going into the exceedingly unlikely race with concurrent slot drop, and we don't seem to in other places in the docs, like the race you mentioned with connecting to a db as it's being dropped. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-30 19:40:08 +0100, Simon Riggs wrote: > On 30 March 2017 at 18:16, Andres Freund wrote: > > >> /* > >> * Each page of XLOG file has a header like this: > >> */ > >> -#define XLOG_PAGE_MAGIC 0xD097 /* can be used as WAL version > >> indicator */ > >> +#define XLOG_PAGE_MAGIC 0xD100 /* can be used as WAL version > >> indicator */ > > > > We normally only advance this by one, it's not tied to the poistgres > > version. > > That was my addition. I rounded it up cos this is release 10. No biggie. We'll probably upgrade that more than once again this release... > (Poistgres? Is that the Manhattan spelling?) Tiredness spelling ;) > We've been redesigning the mechanisms for 2 years now, so it seems > unlikely that further redesign can be required. I don't think that's true *at all* - the mechanism was previously fundamentally different. The whole topic has largely seen activity shortly before the code freeze, both last time round and now. I don't think it's surprising that it thus doesn't end up being ready. > If it is required, > this patch is fairly low touch and change is possible later, > incremental development etc. As regards overhead, this adds a small > amount of time to a background process executed every 10 secs, > generates no new WAL records. > > So I don't see any reason not to commit this feature, after the minor > corrections. It doesn't have any benefit on its own, the locking model doesn't seem fully there. I don't see much reason to get this in before the release. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 30 March 2017 at 18:16, Andres Freund wrote: >> /* >> * Each page of XLOG file has a header like this: >> */ >> -#define XLOG_PAGE_MAGIC 0xD097 /* can be used as WAL version >> indicator */ >> +#define XLOG_PAGE_MAGIC 0xD100 /* can be used as WAL version >> indicator */ > > We normally only advance this by one, it's not tied to the poistgres version. That was my addition. I rounded it up cos this is release 10. No biggie. (Poistgres? Is that the Manhattan spelling?) > I'm sorry, but to me this patch isn't ready. I'm also doubtful that it > makes a whole lot of sense on its own, without having finished the > design for decoding on a standby - it seems quite likely that we'll have > to redesign the mechanisms in here a bit for that. For 10 this seems to > do nothing but add overhead? I'm sure we can reword the comments. We've been redesigning the mechanisms for 2 years now, so it seems unlikely that further redesign can be required. If it is required, this patch is fairly low touch and change is possible later, incremental development etc. As regards overhead, this adds a small amount of time to a background process executed every 10 secs, generates no new WAL records. So I don't see any reason not to commit this feature, after the minor corrections. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-29 08:01:34 +0800, Craig Ringer wrote: > On 28 March 2017 at 23:22, Andres Freund wrote: > > >> --- a/doc/src/sgml/protocol.sgml > >> +++ b/doc/src/sgml/protocol.sgml > >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: > >> > >>Drops a replication slot, freeing any reserved server-side > >> resources. If > >>the slot is currently in use by an active connection, this command > >> fails. > >> + If the slot is a logical slot that was created in a database other > >> than > >> + the database the walsender is connected to, this command fails. > >> > >> > >> > > > > Shouldn't the docs in the drop database section mention this? > > DROP DATABASE doesn't really discuss all the resources it drops, but > I'm happy to add mention of replication slots handling. I don't think that's really comparable, because the other things aren't global objects, which replication slots are. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record) > SetTransactionIdLimit(checkPoint.oldestXid, > checkPoint.oldestXidDB); > > /* > + * There can be no concurrent writers to oldestCatalogXmin > during > + * recovery, so no need to take ProcArrayLock. > + */ > + ShmemVariableCache->oldestCatalogXmin = > checkPoint.oldestCatalogXmin; s/writers/writes/? > @@ -9731,6 +9748,15 @@ xlog_redo(XLogReaderState *record) > > checkPoint.oldestXid)) > SetTransactionIdLimit(checkPoint.oldestXid, > > checkPoint.oldestXidDB); > + > + /* > + * There can be no concurrent writers to oldestCatalogXmin > during > + * recovery, so no need to take ProcArrayLock. > + */ > + if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin, > + > checkPoint.oldestCatalogXmin) > + ShmemVariableCache->oldestCatalogXmin = > checkPoint.oldestCatalogXmin; dito. > @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin, > > ReplicationSlotsComputeRequiredXmin(true); > > + /* > + * If this is the first slot created on the master we won't have a > + * persistent record of the oldest safe xid for historic snapshots yet. > + * Force one to be recorded so that when we go to replay from this slot > we > + * know it's safe. > + */ > + force_standby_snapshot = > + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin); s/first slot/first logical slot/. Also, the reference to master doesn't seem right. > LWLockRelease(ProcArrayLock); > > + /* Update ShmemVariableCache->oldestCatalogXmin */ > + if (force_standby_snapshot) > + LogStandbySnapshot(); The comment and code don't quite square to me - it's far from obvious that LogStandbySnapshot does something like that. I'd even say it's a bad idea to have it do that. > /* >* tell the snapshot builder to only assemble snapshot once reaching the >* running_xact's record with the respective xmin. 
> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn, > start_lsn = slot->data.confirmed_flush; > } > > + EnsureActiveLogicalSlotValid(); This seems like it should be in a separate patch, and seperately reviewed. It's code that's currently unreachable (and thus untestable). > +/* > + * Test to see if the active logical slot is usable. > + */ > +static void > +EnsureActiveLogicalSlotValid(void) > +{ > + TransactionId shmem_catalog_xmin; > + > + Assert(MyReplicationSlot != NULL); > + > + /* > + * A logical slot can become unusable if we're doing logical decoding > on a > + * standby or using a slot created before we were promoted from standby > + * to master. Neither of those is currently possible... > If the master advanced its global catalog_xmin past the > + * threshold we need it could've removed catalog tuple versions that > + * we'll require to start decoding at our restart_lsn. > + * > + * We need a barrier so that if we decode in recovery on a standby we > + * don't allow new decoding sessions to start after redo has advanced > + * the threshold. > + */ > + if (RecoveryInProgress()) > + pg_memory_barrier(); I don't think this is a meaningful locking protocol. It's a bad idea to use lock-free programming without need, especially when the concurrency protocol isn't well defined. With what other barrier does this pair with? What prevents the data being out of date by the time we actually do the check? > diff --git a/src/backend/replication/walsender.c > b/src/backend/replication/walsender.c > index cfc3fba..cdc5f95 100644 > --- a/src/backend/replication/walsender.c > +++ b/src/backend/replication/walsender.c > @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn) >* be energy wasted - the worst lost information can do here is give us >* wrong information in a statistics view - we'll just potentially be > more >* conservative in removing files. 
> + * > + * We don't have to do any effective_xmin / effective_catalog_xmin > testing > + * here either, like for LogicalConfirmReceivedLocation. If we received > + * the xmin and catalog_xmin from downstream replication slots we know > they > + * were already confirmed there, >*/ > } This comment reads as if LogicalConfirmReceivedLocation had justification for not touching / checking catalog_xmin. But it does. > /* > + * Update our knowledge of the oldest xid we can safely create historic > + * snapshots for. > + * > +
Re: [HACKERS] Logical decoding on standby
On 30 March 2017 at 15:27, Andres Freund wrote: > On 2017-03-30 15:26:02 +0100, Simon Riggs wrote: >> On 30 March 2017 at 09:07, Craig Ringer wrote: >> >> > Attached. >> >> * Cleaned up in 3 places >> * Added code for faked up RunningTransactions in xlog.c >> * Ensure catalog_xmin doesn't go backwards >> >> All else looks good. Comments before commit? > > Can you give me till after lunch? Sure, np -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-30 15:26:02 +0100, Simon Riggs wrote: > On 30 March 2017 at 09:07, Craig Ringer wrote: > > > Attached. > > * Cleaned up in 3 places > * Added code for faked up RunningTransactions in xlog.c > * Ensure catalog_xmin doesn't go backwards > > All else looks good. Comments before commit? Can you give me till after lunch? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 30 March 2017 at 09:07, Craig Ringer wrote: > Attached. * Cleaned up in 3 places * Added code for faked up RunningTransactions in xlog.c * Ensure catalog_xmin doesn't go backwards All else looks good. Comments before commit? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services log-catalog-xmin-advances-v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 30 March 2017 at 11:34, Craig Ringer wrote: > On 29 March 2017 at 23:13, Simon Riggs wrote: >> On 29 March 2017 at 10:17, Craig Ringer wrote: >>> On 29 March 2017 at 16:44, Craig Ringer wrote: >>> * Split oldestCatalogXmin tracking into separate patch >>> >>> Regarding this, Simon raised concerns about xlog volume here. >>> >>> It's pretty negligible. >>> >>> We only write a new record when a vacuum runs after catalog_xmin >>> advances on the slot with the currently-lowest catalog_xmin (or, if >>> vacuum doesn't run reasonably soon, when the bgworker next looks). >> >> I'd prefer to slow things down a little, not be so eager. >> >> If we hold back update of the catalog_xmin until when we run >> GetRunningTransactionData() we wouldn't need to produce any WAL >> records at all AND we wouldn't need to have VACUUM do >> UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra >> task. >> >> That would also make this patch about half the length it is. >> >> Let me know what you think. > > Good idea. > > We can always add a heuristic later to make xl_running_xacts get > emitted more often at high transaction rates if it's necessary. > > Patch coming soon. Attached. A bit fiddlier than expected, but I like the result more. In the process I identified an issue with both the prior patch and this one where we don't check slot validity for slots that existed on standby prior to promotion of standby to master. We were just assuming that being the master was good enough, since it controls replication_slot_catalog_xmin, but that's not true for pre-existing slots. Fixed by forcing update of the persistent safe catalog xmin after the first slot is created on the master - which is now done by doing an immediate LogStandbySnapshot() after assigning the slot's catalog_xmin. 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 0df4f4ae04f8d37c623d3a533699966c3cc0479a Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 13:36:49 +0800 Subject: [PATCH v2] Log catalog_xmin advances before removing catalog tuples Before advancing the effective catalog_xmin we use to remove old catalog tuple versions, make sure it is written to WAL. This allows standbys to know the oldest xid they can safely create a historic snapshot for. They can then refuse to start decoding from a slot or raise a recovery conflict. The catalog_xmin advance is logged in the next xl_running_xacts records, so vacuum of catalogs may be held back up to 10 seconds when a replication slot with catalog_xmin is holding down the global catalog_xmin. --- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/standbydesc.c | 5 ++- src/backend/access/transam/varsup.c | 1 - src/backend/access/transam/xlog.c | 26 ++- src/backend/replication/logical/logical.c | 54 +++ src/backend/replication/walreceiver.c | 2 +- src/backend/replication/walsender.c | 13 ++ src/backend/storage/ipc/procarray.c | 68 +++-- src/backend/storage/ipc/standby.c | 25 +++ src/bin/pg_controldata/pg_controldata.c | 2 + src/include/access/transam.h| 11 + src/include/catalog/pg_control.h| 1 + src/include/storage/procarray.h | 3 +- src/include/storage/standby.h | 6 +++ src/include/storage/standbydefs.h | 1 + src/test/recovery/t/006_logical_decoding.pl | 15 ++- 16 files changed, 214 insertions(+), 22 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index d7f65a5..d1400ec 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state) if (!state->rs_logical_rewrite) return; - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); + /* Use oldestCatalogXmin here */ + 
ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); /* * If there are no logical slots in progress we don't need to do anything, diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c index 278546a..4aaae59 100644 --- a/src/backend/access/rmgrdesc/standbydesc.c +++ b/src/backend/access/rmgrdesc/standbydesc.c @@ -21,10 +21,11 @@ standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec) { int i; - appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u", + appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u oldestCatalogXmin %u", xlrec->nextXid, xlrec->latestCompletedXid, - xlrec->oldestRunningXid); + xlrec->oldestRunningXid, + xlrec->oldestCatalogXmin); if (xlrec->xcnt > 0) { appendStringInfo(buf, "; %d xacts:", xlrec->xcnt); diff --git a/src/backend/access/transam/varsup.c b/src/b
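The fix described in the message above — forcing a standby snapshot record right after the first logical slot's catalog_xmin is assigned, so a persistent safe limit exists before anyone decodes from it — can be modelled as a small standalone sketch. This is not the patch's code; the names and the "snapshot counter" are assumptions made purely to show the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-in for ShmemVariableCache->oldestCatalogXmin; 0 means no
 * persistent record of a safe limit has ever been written. */
TransactionId oldest_catalog_xmin = 0;
int standby_snapshots_logged = 0;   /* counts (model) LogStandbySnapshot calls */

/* Model of LogStandbySnapshot(): the emitted record carries the current
 * catalog_xmin, so replay establishes the persistent safe limit. */
void log_standby_snapshot(TransactionId slot_xmin)
{
    standby_snapshots_logged++;
    oldest_catalog_xmin = slot_xmin;
}

/* Model of slot creation: only the first logical slot (no safe limit
 * recorded yet) forces an immediate standby snapshot. */
void create_logical_slot(TransactionId starting_catalog_xmin)
{
    bool force = (oldest_catalog_xmin == 0);
    if (force)
        log_standby_snapshot(starting_catalog_xmin);
}
```

Later slots need no forced record because the periodic xl_running_xacts emission keeps the value current.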
Re: [HACKERS] Logical decoding on standby
On 29 March 2017 at 23:13, Simon Riggs wrote: > On 29 March 2017 at 10:17, Craig Ringer wrote: >> On 29 March 2017 at 16:44, Craig Ringer wrote: >> >>> * Split oldestCatalogXmin tracking into separate patch >> >> Regarding this, Simon raised concerns about xlog volume here. >> >> It's pretty negligible. >> >> We only write a new record when a vacuum runs after catalog_xmin >> advances on the slot with the currently-lowest catalog_xmin (or, if >> vacuum doesn't run reasonably soon, when the bgworker next looks). > > I'd prefer to slow things down a little, not be so eager. > > If we hold back update of the catalog_xmin until when we run > GetRunningTransactionData() we wouldn't need to produce any WAL > records at all AND we wouldn't need to have VACUUM do > UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra > task. > > That would also make this patch about half the length it is. > > Let me know what you think. Good idea. We can always add a heuristic later to make xl_running_xacts get emitted more often at high transaction rates if it's necessary. Patch coming soon. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 29 March 2017 at 10:17, Craig Ringer wrote: > On 29 March 2017 at 16:44, Craig Ringer wrote: > >> * Split oldestCatalogXmin tracking into separate patch > > Regarding this, Simon raised concerns about xlog volume here. > > It's pretty negligible. > > We only write a new record when a vacuum runs after catalog_xmin > advances on the slot with the currently-lowest catalog_xmin (or, if > vacuum doesn't run reasonably soon, when the bgworker next looks). I'd prefer to slow things down a little, not be so eager. If we hold back update of the catalog_xmin until when we run GetRunningTransactionData() we wouldn't need to produce any WAL records at all AND we wouldn't need to have VACUUM do UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra task. That would also make this patch about half the length it is. Let me know what you think. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
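Simon's suggestion above — piggyback the catalog_xmin on the periodic xl_running_xacts record instead of emitting dedicated WAL records — matches the field the later patch adds (the `oldestCatalogXmin %u` in standbydesc.c). A simplified sketch, with an assumed field layout rather than the real struct:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified model of the xl_running_xacts record as extended by the
 * patch; only the fields shown in standby_desc_running_xacts appear. */
typedef struct xl_running_xacts_model
{
    TransactionId nextXid;
    TransactionId latestCompletedXid;
    TransactionId oldestRunningXid;
    TransactionId oldestCatalogXmin;   /* the field added by the patch */
} xl_running_xacts_model;

/* Stand-in for procArray->replication_slot_catalog_xmin. */
TransactionId slot_catalog_xmin = 0;

/* Stamp the record at GetRunningTransactionData() time.  No extra WAL
 * records are needed; catalog tuple cleanup just lags by up to one
 * emission cycle (roughly 10 seconds from the background writer). */
void stamp_running_xacts(xl_running_xacts_model *rec)
{
    rec->oldestCatalogXmin = slot_catalog_xmin;
}
```

This is why the thread notes that vacuum of catalogs "may be held back up to 10 seconds" when a slot holds down the global catalog_xmin.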
Re: [HACKERS] Logical decoding on standby
On 29 March 2017 at 16:44, Craig Ringer wrote: > * Split oldestCatalogXmin tracking into separate patch Regarding this, Simon raised concerns about xlog volume here. It's pretty negligible. We only write a new record when a vacuum runs after catalog_xmin advances on the slot with the currently-lowest catalog_xmin (or, if vacuum doesn't run reasonably soon, when the bgworker next looks). So at worst on a fairly slow moving system or one with a super high vacuum rate we'll write one per commit. But in most cases we'll write a lot fewer than that. When running t/006_logical_decoding.pl for example: $ ../../../src/bin/pg_waldump/pg_waldump tmp_check/data_master_daPa/pgdata/pg_wal/00010001 | grep CATALOG rmgr: Transaction len (rec/tot): 4/30, tx: 0, lsn: 0/01648D50, prev 0/01648D18, desc: CATALOG_XMIN catalog_xmin 555 rmgr: Transaction len (rec/tot): 4/30, tx: 0, lsn: 0/0164C840, prev 0/0164C378, desc: CATALOG_XMIN catalog_xmin 0 pg_waldump: FATAL: error in WAL record at 0/16BBF10: invalid record length at 0/16BBF88: wanted 24, got 0 and of course, none at all unless you use logical decoding. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 29 March 2017 at 08:11, Craig Ringer wrote: > On 29 March 2017 at 08:01, Craig Ringer wrote: > >> I just notice that I failed to remove the docs changes regarding >> dropping slots becoming db-specific, so I'll post a follow-up for that >> in a sec. > > Attached. ... and here's the next in the patch series. Both this and the immediately prior minor patch fix-drop-slot-docs.patch are pending now. Notable changes in this patch since review: * Split oldestCatalogXmin tracking into separate patch * Critically, fix use of procArray->replication_slot_catalog_xmin in GetSnapshotData's setting of RecentGlobalXmin and RecentGlobalDataXmin so it instead uses ShmemVariableCache->oldestCatalogXmin . This could've led to tuples newer than oldestCatalogXmin being removed. * Memory barrier in UpdateOldestCatalogXmin and SetOldestCatalogXmin. It still does a pre-check before deciding if it needs to take ProcArrayLock, recheck, and advance, since we don't want to unnecessarily contest ProcArrayLock. * Remove unnecessary volatile usage (retained in UpdateOldestCatalogXmin due to barrier) * Remove unnecessary test for XLogInsertAllowed() in XactLogCatalogXminUpdate * EnsureActiveLogicalSlotValid(void) - add (void) * pgidented changes in this diff; have left unrelated changes alone Re: > what does > > + TransactionId oldestCatalogXmin; /* oldest xid where complete catalog > state > + * > is guaranteed to still exist */ > > mean? I complained about the overall justification in the commit > already, but looking at this commit alone, the justification for this > part of the change is quite hard to understand. The patch now contains TransactionId oldestCatalogXmin; /* oldest xid it is guaranteed to be safe * to create a historic snapshot for; see * also * procArray->replication_slot_catalog_xmin * */ which I think is an improvement. 
I've also sought to explain the purpose of this change better with /* * If necessary, copy the current catalog_xmin needed by replication slots to * the effective catalog_xmin used for dead tuple removal and write a WAL * record recording the change. * * This allows standbys to know the oldest xid for which it is safe to create * a historic snapshot for logical decoding. VACUUM or other cleanup may have * removed catalog tuple versions needed to correctly decode transactions older * than this threshold. Standbys can use this information to cancel conflicting * decoding sessions and invalidate slots that need discarded information. * * (We can't use the transaction IDs in WAL records emitted by VACUUM etc for * this, since they don't identify the relation as a catalog or not. Nor can a * standby look up the relcache to get the Relation for the affected * relfilenode to check if it is a catalog. The standby would also have no way * to know the oldest safe position at startup if it wasn't in the control * file.) */ void UpdateOldestCatalogXmin(void) { ... Does that help? (Sidenote for later: ResolveRecoveryConflictWithLogicalDecoding will need a read barrier too, when the next patch adds it.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 4b8e3aaa52539ef8cf3c79d1ed0319cc44800a32 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 13:36:49 +0800 Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples Write a WAL record before advancing the oldest catalog_xmin preserved by VACUUM and other tuple removal. Previously GetOldestXmin would use procArray->replication_slot_catalog_xmin as the xid limit for vacuuming catalog tuples, so it was not possible for standbys to determine whether all catalog tuples needed for a catalog snapshot for a given xid would still exist. Logging catalog_xmin advances allows standbys to determine if a logical slot on the standby has become unsafe to use. 
It can then refuse to start logical decoding on that slot or, if decoding is in progress, raise a conflict with recovery. Note that we only emit new WAL records if catalog_xmin changes, which happens due to changes in slot state. So this won't generate WAL whenever oldestXmin advances. --- src/backend/access/heap/rewriteheap.c | 3 +- src/backend/access/rmgrdesc/xactdesc.c | 9 +++ src/backend/access/transam/varsup.c | 14 src/backend/access/transam/xact.c | 35 src/backend/access/transam/xlog.c | 12 ++- src/backend/commands/vacuum.c | 9 +++ src/backend/postmaster/bgwriter.c | 10 +++ src/backend/replication/logical/decode.c| 11 +++ src/backend/replication/logical/logical.c | 38 + src/backend/replication/walreceiver.c |
Re: [HACKERS] Logical decoding on standby
On 29 March 2017 at 08:01, Craig Ringer wrote: > I just notice that I failed to remove the docs changes regarding > dropping slots becoming db-specific, so I'll post a follow-up for that > in a sec. Attached. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 5fe01aef643905ec1f6dcffd0f5d583809fc9a21 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 29 Mar 2017 08:03:06 +0800 Subject: [PATCH] Documentation amendments for slot drop on db drop The "Cleanup slots during drop database" patch incorrectly documented that dropping logical slots must now be done from the database the slot was created on. This was the case in an earlier variant of the patch, but not the committed version. Also document that idle logical replication slots will be dropped by DROP DATABASE. --- doc/src/sgml/func.sgml | 3 +-- doc/src/sgml/protocol.sgml | 2 -- doc/src/sgml/ref/drop_database.sgml | 7 +++ 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 78508d7..ba6f8dd 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18876,8 +18876,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); Drops the physical or logical replication slot named slot_name. Same as replication protocol -command DROP_REPLICATION_SLOT. For logical slots, this must -be called when connected to the same database the slot was created on. +command DROP_REPLICATION_SLOT. diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 5f97141..b3a5026 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2034,8 +2034,6 @@ The commands accepted in walsender mode are: Drops a replication slot, freeing any reserved server-side resources. If the slot is currently in use by an active connection, this command fails. 
- If the slot is a logical slot that was created in a database other than - the database the walsender is connected to, this command fails. diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml index 740aa31..3427139 100644 --- a/doc/src/sgml/ref/drop_database.sgml +++ b/doc/src/sgml/ref/drop_database.sgml @@ -81,6 +81,13 @@ DROP DATABASE [ IF EXISTS ] name instead, which is a wrapper around this command. + + + Active logical + replication slots count as connections and will prevent a + database from being dropped. Inactive slots will be automatically + dropped when the database is dropped. + -- 2.5.5 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 28 March 2017 at 23:22, Andres Freund wrote: >> --- a/doc/src/sgml/protocol.sgml >> +++ b/doc/src/sgml/protocol.sgml >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: >> >>Drops a replication slot, freeing any reserved server-side resources. >> If >>the slot is currently in use by an active connection, this command >> fails. >> + If the slot is a logical slot that was created in a database other >> than >> + the database the walsender is connected to, this command fails. >> >> >> > > Shouldn't the docs in the drop database section mention this? DROP DATABASE doesn't really discuss all the resources it drops, but I'm happy to add mention of replication slots handling. I just notice that I failed to remove the docs changes regarding dropping slots becoming db-specific, so I'll post a follow-up for that in a sec. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
Hi, On 2017-03-27 16:03:48 +0800, Craig Ringer wrote: > On 27 March 2017 at 14:08, Craig Ringer wrote: > > > So this patch makes ReplicationSlotAcquire check that the slot > > database matches the current database and refuse to acquire the slot > > if it does not. > > New patch attached that drops above requirement, so slots can still be > dropped from any DB. > > This introduces a narrow race window where DROP DATABASE may ERROR if > somebody connects to a different database and runs a > pg_drop_replication_slot(...) for one of the slots being dropped by > DROP DATABASE after we check for active slots but before we've dropped > the slot. But it's hard to hit and it's pretty harmless; the worst > possible result is dropping one or more of the slots before we ERROR > out of the DROP. But you clearly didn't want them anyway, since you > were dropping the DB and dropping some slots at the same time. > > I think this one's ready to go. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001 > From: Craig Ringer > Date: Wed, 22 Mar 2017 13:21:09 +0800 > Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB > > Automatically drop all logical replication slots associated with a > database when the database is dropped. 
> --- > doc/src/sgml/func.sgml | 3 +- > doc/src/sgml/protocol.sgml | 2 + > src/backend/commands/dbcommands.c | 32 +--- > src/backend/replication/slot.c | 88 > ++ > src/include/replication/slot.h | 1 + > src/test/recovery/t/006_logical_decoding.pl| 40 +- > .../recovery/t/010_logical_decoding_timelines.pl | 30 +++- > 7 files changed, 182 insertions(+), 14 deletions(-) > > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml > index ba6f8dd..78508d7 100644 > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM > pg_walfile_name_offset(pg_stop_backup()); > > Drops the physical or logical replication slot > named slot_name. Same as replication protocol > -command DROP_REPLICATION_SLOT. > +command DROP_REPLICATION_SLOT. For logical slots, this > must > +be called when connected to the same database the slot was created > on. > > > > diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml > index b3a5026..5f97141 100644 > --- a/doc/src/sgml/protocol.sgml > +++ b/doc/src/sgml/protocol.sgml > @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: > >Drops a replication slot, freeing any reserved server-side resources. > If >the slot is currently in use by an active connection, this command > fails. > + If the slot is a logical slot that was created in a database other than > + the database the walsender is connected to, this command fails. > > > Shouldn't the docs in the drop database section about this? 
> +void > +ReplicationSlotsDropDBSlots(Oid dboid) > +{ > + int i; > + > + if (max_replication_slots <= 0) > + return; > + > +restart: > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (i = 0; i < max_replication_slots; i++) > + { > + ReplicationSlot *s; > + NameData slotname; > + int active_pid; > + > + s = &ReplicationSlotCtl->replication_slots[i]; > + > + /* cannot change while ReplicationSlotCtlLock is held */ > + if (!s->in_use) > + continue; > + > + /* only logical slots are database specific, skip */ > + if (!SlotIsLogical(s)) > + continue; > + > + /* not our database, skip */ > + if (s->data.database != dboid) > + continue; > + > + /* Claim the slot, as if ReplicationSlotAcquire()ing. */ > + SpinLockAcquire(&s->mutex); > + strncpy(NameStr(slotname), NameStr(s->data.name), NAMEDATALEN); > + NameStr(slotname)[NAMEDATALEN-1] = '\0'; > + active_pid = s->active_pid; > + if (active_pid == 0) > + { > + MyReplicationSlot = s; > + s->active_pid = MyProcPid; > + } > + SpinLockRelease(&s->mutex); > + > + /* > + * We might fail here if the slot was active. Even though we > hold an > + * exclusive lock on the database object a logical slot for > that DB can > + * still be active if it's being dropped by a backend connected > to > + * another DB or is otherwise acquired. > + * > + * I
Re: [HACKERS] Logical decoding on standby
On 27 March 2017 at 16:20, Simon Riggs wrote: > On 27 March 2017 at 09:03, Craig Ringer wrote: > >> I think this one's ready to go. > > Looks like something I could commit. Full review by me while offline > today, aiming to commit tomorrow barring issues raised. Great. Meanwhile I'm going to be trying to work with Stas on 2PC logical decoding, while firming up the next patches in this series to see if we can progress a bit further. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 27 March 2017 at 09:03, Craig Ringer wrote: > I think this one's ready to go. Looks like something I could commit. Full review by me while offline today, aiming to commit tomorrow barring issues raised. -- Simon Riggs    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
On 27 March 2017 at 14:08, Craig Ringer wrote: > So this patch makes ReplicationSlotAcquire check that the slot > database matches the current database and refuse to acquire the slot > if it does not. New patch attached that drops above requirement, so slots can still be dropped from any DB. This introduces a narrow race window where DROP DATABASE may ERROR if somebody connects to a different database and runs a pg_drop_replication_slot(...) for one of the slots being dropped by DROP DATABASE after we check for active slots but before we've dropped the slot. But it's hard to hit and it's pretty harmless; the worst possible result is dropping one or more of the slots before we ERROR out of the DROP. But you clearly didn't want them anyway, since you were dropping the DB and dropping some slots at the same time. I think this one's ready to go. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 13:21:09 +0800 Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB Automatically drop all logical replication slots associated with a database when the database is dropped. --- doc/src/sgml/func.sgml | 3 +- doc/src/sgml/protocol.sgml | 2 + src/backend/commands/dbcommands.c | 32 +--- src/backend/replication/slot.c | 88 ++ src/include/replication/slot.h | 1 + src/test/recovery/t/006_logical_decoding.pl| 40 +- .../recovery/t/010_logical_decoding_timelines.pl | 30 +++- 7 files changed, 182 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ba6f8dd..78508d7 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); Drops the physical or logical replication slot named slot_name. Same as replication protocol -command DROP_REPLICATION_SLOT. 
+command DROP_REPLICATION_SLOT. For logical slots, this must +be called when connected to the same database the slot was created on. diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index b3a5026..5f97141 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: Drops a replication slot, freeing any reserved server-side resources. If the slot is currently in use by an active connection, this command fails. + If the slot is a logical slot that was created in a database other than + the database the walsender is connected to, this command fails. diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 5a63b1a..c0ba2b4 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -845,19 +845,22 @@ dropdb(const char *dbname, bool missing_ok) errmsg("cannot drop the currently open database"))); /* - * Check whether there are, possibly unconnected, logical slots that refer - * to the to-be-dropped database. The database lock we are holding - * prevents the creation of new slots using the database. + * Check whether there are active logical slots that refer to the + * to-be-dropped database. The database lock we are holding prevents the + * creation of new slots using the database or existing slots becoming + * active. 
*/ - if (ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active)) + (void) ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active); + if (nslots_active) + { ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("database \"%s\" is used by a logical replication slot", + errmsg("database \"%s\" is used by an active logical replication slot", dbname), - errdetail_plural("There is %d slot, %d of them active.", - "There are %d slots, %d of them active.", - nslots, - nslots, nslots_active))); + errdetail_plural("There is %d active slot", + "There are %d active slots", + nslots_active, nslots_active))); + } /* * Check for other backends in the target database. (Because we hold the @@ -915,6 +918,11 @@ dropdb(const char *dbname, bool missing_ok) dropDatabaseDependencies(db_id); /* + * Drop db-specific replication slots. + */ + ReplicationSlotsDropDBSlots(db_id); + + /* * Drop pages for this database that are in the shared buffer cache. This * is important to ensure that no remaining backend tries to write out a * dirty buffer to the dead database later... @@ -2124,11 +2132,17 @@ dbase_redo(XLogReaderState *record) * InitPostgres() cannot fully re-execute concurrentl
Re: [HACKERS] Logical decoding on standby
Hi Here's the next patch in the split-up series, drop db-specific (logical) replication slots on DROP DATABASE. Current behaviour is to ERROR if logical slots exist on the DB, whether in-use or not. With this patch we can DROP a database if it has logical slots so long as they are not active. I haven't added any sort of syntax for this, it's just done unconditionally. I don't see any sensible way to stop a slot becoming active after we check for active slots and begin the actual database DROP, since ReplicationSlotAcquire will happily acquire a db-specific slot for a different DB and the only lock it takes is a shared lock on ReplicationSlotControlLock, which we presumably don't want to hold throughout DROP DATABASE. So this patch makes ReplicationSlotAcquire check that the slot database matches the current database and refuse to acquire the slot if it does not. The only sensible reason to acquire a slot from a different DB is to drop it, and then it's only a convenience at best. Slot drop is the only user-visible behaviour change, since all other activity on logical slots happened when the backend was already connected to the slot's DB. Appropriate docs changes have been made and tests added. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From c126a10e40aba0c39a43a97da591492d6240659c Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 13:21:09 +0800 Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB Automatically drop all logical replication slots associated with a database when the database is dropped. As a side-effect, pg_drop_replication_slot(...) may now only drop logical slots when connected to the slot's database. 
--- contrib/test_decoding/sql/slot.sql | 8 ++ doc/src/sgml/func.sgml | 3 +- doc/src/sgml/protocol.sgml | 2 + src/backend/commands/dbcommands.c | 32 +-- src/backend/replication/logical/logical.c | 12 +-- src/backend/replication/slot.c | 97 +- src/include/replication/slot.h | 1 + src/test/recovery/t/006_logical_decoding.pl| 34 +++- .../recovery/t/010_logical_decoding_timelines.pl | 30 ++- 9 files changed, 194 insertions(+), 25 deletions(-) diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql index 7ca83fe..22b22f3 100644 --- a/contrib/test_decoding/sql/slot.sql +++ b/contrib/test_decoding/sql/slot.sql @@ -48,3 +48,11 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_ -- both should error as they should be dropped on error SELECT pg_drop_replication_slot('regression_slot1'); SELECT pg_drop_replication_slot('regression_slot2'); + +CREATE DATABASE testdb; +\c testdb +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_otherdb', 'test_decoding'); +\c regression +SELECT pg_drop_replication_slot('regression_slot_otherdb'); +\c testdb +SELECT pg_drop_replication_slot('regression_slot_otherdb'); diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ba6f8dd..78508d7 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); Drops the physical or logical replication slot named slot_name. Same as replication protocol -command DROP_REPLICATION_SLOT. +command DROP_REPLICATION_SLOT. For logical slots, this must +be called when connected to the same database the slot was created on. diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index b3a5026..5f97141 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are: Drops a replication slot, freeing any reserved server-side resources. 
If the slot is currently in use by an active connection, this command fails. + If the slot is a logical slot that was created in a database other than + the database the walsender is connected to, this command fails. diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 5a63b1a..7fe2c2b 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -845,19 +845,22 @@ dropdb(const char *dbname, bool missing_ok) errmsg("cannot drop the currently open database"))); /* - * Check whether there are, possibly unconnected, logical slots that refer - * to the to-be-dropped database. The database lock we are holding - * prevents the creation of new slots using the database. + * Check whether there are active logical slots that refer to the + * to-be-dropped database. The database lock we are holding prevents the + * creation of new slots using the dat
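The DROP DATABASE behaviour this patch implements — ERROR if any of the database's logical slots is active, otherwise drop them all — can be modelled in miniature. This is a toy sketch: `MiniSlot`, `count_db_slots` and `drop_db_slots` are illustrative stand-ins for the real structures, with all locking, WAL and error-reporting machinery omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Toy stand-in for ReplicationSlot: only the fields the checks consult. */
typedef struct
{
    bool in_use;
    bool is_logical;
    Oid  database;      /* 0 for physical slots */
    int  active_pid;    /* 0 when nobody has the slot acquired */
} MiniSlot;

/* Like ReplicationSlotsCountDBSlots(): count the DB's logical slots. */
static void
count_db_slots(const MiniSlot *slots, int n, Oid dboid,
               int *nslots, int *nactive)
{
    *nslots = *nactive = 0;
    for (int i = 0; i < n; i++)
    {
        if (!slots[i].in_use || !slots[i].is_logical ||
            slots[i].database != dboid)
            continue;
        (*nslots)++;
        if (slots[i].active_pid != 0)
            (*nactive)++;
    }
}

/*
 * The dropdb() rule from the patch: refuse (return false, standing in
 * for ERROR) if any slot for the database is active, otherwise drop
 * all of its logical slots.
 */
static bool
drop_db_slots(MiniSlot *slots, int n, Oid dboid)
{
    int nslots, nactive;

    count_db_slots(slots, n, dboid, &nslots, &nactive);
    if (nactive > 0)
        return false;
    for (int i = 0; i < n; i++)
        if (slots[i].in_use && slots[i].is_logical &&
            slots[i].database == dboid)
            slots[i].in_use = false;
    return true;
}
```

The narrow race the thread discusses lives between the count and the drop loop: in the real code another backend can acquire or drop a slot in that window, which is why the actual implementation re-checks each slot's `active_pid` under its spinlock while claiming it.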
Re: [HACKERS] Logical decoding on standby
On 20 March 2017 at 17:33, Andres Freund wrote: > Have you checked how high the overhead of XLogReadDetermineTimeline is? > A non-local function call, especially into a different translation-unit > (no partial inlining), for every single page might end up being > noticeable. That's fine in the cases it actually adds functionality, > but for a master streaming out data, that's not actually adding > anything. I haven't been able to measure any difference. But, since we require the caller to ensure a reasonably up to date ThisTimeLineID, maybe it's worth adding an inlineable function for the fast-path that tests the "page cached" and "timeline is current and unchanged" conditions? //xlogutils.h: static inline void XLogReadDetermineTimeline(...) { ... first test for page already read-in and valid ... ... second test for ThisTimeLineId ... XLogReadCheckTimeLineChange(...) } XLogReadCheckTimeLineChange(...) { ... rest of function } (Yes, I know "inline" means little, but it's a hint for readers) I'd rather avoid using a macro since it'd be pretty ugly, but it's also an option if an inline func is undesirable. #define XLOG_READ_DETERMINE_TIMELINE \ do { \ ... same as above ... } while (0); Can be done after CF if needed anyway, it's just fiddling some code around. Figured I'd mention though. >> + /* >> + * To avoid largely duplicating ReplicationSlotDropAcquired() >> or >> + * complicating it with already_locked flags for ProcArrayLock, >> + * ReplicationSlotControlLock and >> ReplicationSlotAllocationLock, we >> + * just release our ReplicationSlotControlLock to drop the >> slot. >> + * >> + * There's no race here: we acquired this slot, and no slot >> "behind" >> + * our scan can be created or become active with our target >> dboid due >> + * to our exclusive lock on the DB. 
>> + */ >> + LWLockRelease(ReplicationSlotControlLock); >> + ReplicationSlotDropAcquired(); >> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > I don't see much problem with this, but I'd change the code so you > simply do a goto restart; if you released the slot. Then there's a lot > less chance / complications around temporarily releasing > ReplicationSlotControlLock I don't quite get this. I suspect I'm just not seeing the implications as clearly as you do. Do you mean we should restart the whole scan of the slot array if we drop any slot? That'll be O(n log m) but since we don't expect to be working on a big array or a lot of slots it's unlikely to matter. The patch coming soon will assume we'll restart the whole scan, anyway. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
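The fast-path split proposed in the mail above might look roughly like this as compilable code. All names and the exact guard conditions are illustrative; the real `XLogReadDetermineTimeline` checks more state (cached page validity, historic-timeline lookups) than this toy version:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Toy reader state: only the fields the fast-path guard inspects. */
typedef struct
{
    TimeLineID currTLI;            /* timeline the cached page came from */
    XLogRecPtr currTLIValidUntil;  /* 0 = valid indefinitely */
} MiniReaderState;

static int slow_path_calls = 0;

/* Stand-in for the out-of-line "rest of function". */
static void
check_timeline_change(MiniReaderState *state, XLogRecPtr wantPage,
                      TimeLineID thisTLI)
{
    (void) wantPage;            /* a real implementation would use this */
    slow_path_calls++;
    state->currTLI = thisTLI;
    state->currTLIValidUntil = 0;
}

/*
 * The proposed inlineable wrapper: skip the timeline-history lookup
 * entirely when reading the server's current timeline within its
 * known-valid range.
 */
static inline void
determine_timeline(MiniReaderState *state, XLogRecPtr wantPage,
                   TimeLineID thisTLI)
{
    if (state->currTLI == thisTLI &&
        (state->currTLIValidUntil == 0 ||
         wantPage < state->currTLIValidUntil))
        return;                 /* fast path, no out-of-line call */
    check_timeline_change(state, wantPage, thisTLI);
}
```

A master streaming its own current timeline would take the early `return` on every page, which is the overhead concern the split is meant to address.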
Re: [HACKERS] Logical decoding on standby
On 24 March 2017 at 06:23, Craig Ringer wrote: > On 23 March 2017 at 17:44, Craig Ringer wrote: > > Minor update to catalog_xmin walsender patch to fix failure to > parenthesize definition of PROCARRAY_PROC_FLAGS_MASK . > > This one's ready to go. Working on drop slots on DB drop now. Committed. Next! -- Simon Riggs    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 17:44, Craig Ringer wrote: Minor update to catalog_xmin walsender patch to fix failure to parenthesize definition of PROCARRAY_PROC_FLAGS_MASK . This one's ready to go. Working on drop slots on DB drop now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From b5e34ecaa8f43825fe41ae2e2bbf0a97258cb56a Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 12:29:13 +0800 Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby feedback The catalog_xmin of slots on a standby was reported as part of the standby's xmin, causing the master's xmin to be held down. This could cause considerable unnecessary bloat on the master. Instead, report catalog_xmin as a separate field in hot_standby_feedback. If the upstream walsender is using a physical replication slot, store the catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a slot and has only a PGPROC entry behaviour doesn't change, as we store the combined xmin and catalog_xmin in the PGPROC entry. There's no backward compatibility concern here, as nothing except another postgres instance of the same major version has any business sending hot standby feedback and it's only used on the physical replication protocol. --- doc/src/sgml/protocol.sgml | 33 ++- src/backend/replication/walreceiver.c | 45 +++-- src/backend/replication/walsender.c| 110 +++-- src/backend/storage/ipc/procarray.c| 12 ++- src/include/storage/proc.h | 5 + src/include/storage/procarray.h| 11 +++ .../recovery/t/010_logical_decoding_timelines.pl | 38 ++- 7 files changed, 202 insertions(+), 52 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 48ca414..b3a5026 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1916,10 +1916,11 @@ The commands accepted in walsender mode are: - The standby's current xmin. 
This may be 0, if the standby is - sending notification that Hot Standby feedback will no longer - be sent on this connection. Later non-zero messages may - reinitiate the feedback mechanism. + The standby's current global xmin, excluding the catalog_xmin from any + replication slots. If both this value and the following + catalog_xmin are 0 this is treated as a notification that Hot Standby + feedback will no longer be sent on this connection. Later non-zero + messages may reinitiate the feedback mechanism. @@ -1929,7 +1930,29 @@ The commands accepted in walsender mode are: - The standby's current epoch. + The epoch of the global xmin xid on the standby. + + + + + + Int32 + + + + The lowest catalog_xmin of any replication slots on the standby. Set to 0 + if no catalog_xmin exists on the standby or if hot standby feedback is being + disabled. + + + + + + Int32 + + + + The epoch of the catalog_xmin xid on the standby. diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 31c567b..0f22f17 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed) { TimestampTz now; TransactionId nextXid; - uint32 nextEpoch; - TransactionId xmin; + uint32 xmin_epoch, catalog_xmin_epoch; + TransactionId xmin, catalog_xmin; static TimestampTz sendTime = 0; /* initially true so we always send at least one feedback message */ static bool master_has_standby_xmin = true; @@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed) * everything else has been checked. 
*/ if (hot_standby_feedback) - xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT); + { + TransactionId slot_xmin; + + /* + * Usually GetOldestXmin() would include the global replication slot + * xmin and catalog_xmin in its calculations, but we don't want to hold + * upstream back from vacuuming normal user table tuples just because + * they're within the catalog_xmin horizon of logical replication slots + * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin + * then deal with them ourselves. + */ + xmin = GetOldestXmin(NULL, + PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN); + + ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin); + + if (TransactionIdIsValid(slot_xmin) && + TransactionIdPrecedes(slot_xmin, xmin)) + xmin = slot_xmin; + } else + { xmin = InvalidTransactionId; + catalog_xmin = InvalidTransactionId; + } /* * Get epoch
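The feedback computation in the hunk above — take `GetOldestXmin()` computed *without* slot horizons, fold the slots' data xmin back in, and send catalog_xmin as its own field — can be captured in a small standalone sketch. `compute_feedback` is an illustrative name, and the special-XID handling of the real `TransactionIdPrecedes` is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Wraparound-aware comparison, minus the special-XID cases. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Compute the xmin and catalog_xmin a standby would put into its hot
 * standby feedback message.  oldest_xmin is the GetOldestXmin() result
 * that already excludes slot horizons; the slots' data xmin is folded
 * back in, while catalog_xmin travels as a separate field so it no
 * longer holds down the master's global xmin.
 */
static void
compute_feedback(bool hs_feedback,
                 TransactionId oldest_xmin,
                 TransactionId slot_xmin,
                 TransactionId slot_catalog_xmin,
                 TransactionId *xmin_out,
                 TransactionId *catalog_xmin_out)
{
    if (!hs_feedback)
    {
        /* both zero: feedback is being disabled on this connection */
        *xmin_out = InvalidTransactionId;
        *catalog_xmin_out = InvalidTransactionId;
        return;
    }
    *xmin_out = oldest_xmin;
    if (slot_xmin != InvalidTransactionId &&
        xid_precedes(slot_xmin, *xmin_out))
        *xmin_out = slot_xmin;
    *catalog_xmin_out = slot_catalog_xmin;
}
```

This matches the protocol change documented earlier in the patch: both fields being 0 is the "feedback off" signal, and each xid is accompanied by its epoch on the wire.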
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 16:07, Craig Ringer wrote: > If preferred I can instead add > > proc.h: > > #define PROC_RESERVED 0x20 > > procarray.h: > > #define PROCARRAY_REPLICATION_SLOTS 0x20 > > and then test for (flags & PROCARRAY_REPLICATION_SLOTS) Attached done that way. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 2e887ee19c9c1bae442b9f0682169f9b0c61268a Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 22 Mar 2017 12:29:13 +0800 Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby feedback The catalog_xmin of slots on a standby was reported as part of the standby's xmin, causing the master's xmin to be held down. This could cause considerable unnecessary bloat on the master. Instead, report catalog_xmin as a separate field in hot_standby_feedback. If the upstream walsender is using a physical replication slot, store the catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a slot and has only a PGPROC entry behaviour doesn't change, as we store the combined xmin and catalog_xmin in the PGPROC entry. There's no backward compatibility concern here, as nothing except another postgres instance of the same major version has any business sending hot standby feedback and it's only used on the physical replication protocol. --- doc/src/sgml/protocol.sgml | 33 ++- src/backend/replication/walreceiver.c | 45 +++-- src/backend/replication/walsender.c| 110 +++-- src/backend/storage/ipc/procarray.c| 12 ++- src/include/storage/proc.h | 5 + src/include/storage/procarray.h| 11 +++ .../recovery/t/010_logical_decoding_timelines.pl | 38 ++- 7 files changed, 202 insertions(+), 52 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 244e381..d8786f0 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1911,10 +1911,11 @@ The commands accepted in walsender mode are: - The standby's current xmin. 
This may be 0, if the standby is - sending notification that Hot Standby feedback will no longer - be sent on this connection. Later non-zero messages may - reinitiate the feedback mechanism. + The standby's current global xmin, excluding the catalog_xmin from any + replication slots. If both this value and the following + catalog_xmin are 0 this is treated as a notification that Hot Standby + feedback will no longer be sent on this connection. Later non-zero + messages may reinitiate the feedback mechanism. @@ -1924,7 +1925,29 @@ The commands accepted in walsender mode are: - The standby's current epoch. + The epoch of the global xmin xid on the standby. + + + + + + Int32 + + + + The lowest catalog_xmin of any replication slots on the standby. Set to 0 + if no catalog_xmin exists on the standby or if hot standby feedback is being + disabled. + + + + + + Int32 + + + + The epoch of the catalog_xmin xid on the standby. diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 31c567b..0f22f17 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed) { TimestampTz now; TransactionId nextXid; - uint32 nextEpoch; - TransactionId xmin; + uint32 xmin_epoch, catalog_xmin_epoch; + TransactionId xmin, catalog_xmin; static TimestampTz sendTime = 0; /* initially true so we always send at least one feedback message */ static bool master_has_standby_xmin = true; @@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed) * everything else has been checked. 
*/ if (hot_standby_feedback) - xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT); + { + TransactionId slot_xmin; + + /* + * Usually GetOldestXmin() would include the global replication slot + * xmin and catalog_xmin in its calculations, but we don't want to hold + * upstream back from vacuuming normal user table tuples just because + * they're within the catalog_xmin horizon of logical replication slots + * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin + * then deal with them ourselves. + */ + xmin = GetOldestXmin(NULL, + PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN); + + ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin); + + if (TransactionIdIsValid(slot_xmin) && + TransactionIdPrecedes(slot_xmin, xmin)) + xmin = slot_xmin; + } else + { xmin = InvalidTransactionId; + catalog_xmin = Invali
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 00:13, Simon Riggs wrote: > On 22 March 2017 at 08:53, Craig Ringer wrote: > >> I'm splitting up the rest of the decoding on standby patch set with >> the goal of getting minimal functionality for creating and managing >> slots on standbys in, so we can maintain slots on standbys and use >> them when the standby is promoted to master. >> >> The first, to send catalog_xmin separately to the global xmin on >> hot_standby_feedback and store it in the upstream physical slot's >> catalog_xmin, is attached. >> >> These are extracted directly from the logical decoding on standby >> patch, with comments by Petr and Andres made re the relevant code >> addressed. > > I've reduced your two patches back to one with a smaller blast radius. > > I'll commit this tomorrow morning, barring objections. This needs rebasing on top of commit af4b1a0869bd3bb52e5f662e4491554b7f611489 Author: Simon Riggs Date: Wed Mar 22 16:51:01 2017 + Refactor GetOldestXmin() to use flags Replace ignoreVacuum parameter with more flexible flags. Author: Eiji Seki Review: Haribabu Kommi That patch landed up using PROCARRAY flags directly as flags to GetOldestXmin, so it doesn't make much sense to add a flag like PROCARRAY_REPLICATION_SLOTS . There won't be any corresponding PROC_ flag for PGXACT->vacuumFlags, replication slot xmin and catalog_xmin are global state not tracked in individual proc entries. Rather than add some kind of "PROC_RESERVED" flag in proc.h that would never be used and only exist to reserve a bit for use for PROCARRAY_REPLICATION_SLOTS, which we'd use a flag to GetOldestXmin, I added a new argument to GetOldestXmin like the prior patch did. If preferred I can instead add proc.h: #define PROC_RESERVED 0x20 procarray.h: #define PROCARRAY_REPLICATION_SLOTS 0x20 and then test for (flags & PROCARRAY_REPLICATION_SLOTS) but that's kind of ugly to say the least, I'd rather just add another argument. 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From ffa43fae35857dbff0efe83ef199df165d887d97 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby feedback

The catalog_xmin of slots on a standby was reported as part of the standby's xmin, causing the master's xmin to be held down. This could cause considerable unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If the upstream walsender is using a physical replication slot, store the catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a slot and has only a PGPROC entry, behaviour doesn't change, as we store the combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another postgres instance of the same major version has any business sending hot standby feedback and it's only used on the physical replication protocol.
---
 contrib/pg_visibility/pg_visibility.c            |   6 +-
 contrib/pgstattuple/pgstatapprox.c               |   2 +-
 doc/src/sgml/protocol.sgml                       |  33 ++-
 src/backend/access/transam/xlog.c                |   4 +-
 src/backend/catalog/index.c                      |   3 +-
 src/backend/commands/analyze.c                   |   2 +-
 src/backend/commands/vacuum.c                    |   5 +-
 src/backend/replication/walreceiver.c            |  44 +++--
 src/backend/replication/walsender.c              | 110 +++--
 src/backend/storage/ipc/procarray.c              |  11 ++-
 src/include/storage/procarray.h                  |   2 +-
 .../recovery/t/010_logical_decoding_timelines.pl |  38 ++-
 12 files changed, 198 insertions(+), 62 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ee3936e..2db5762 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,9 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 				 * a buffer lock. And this shouldn't happen often, so it's
 				 * worth being careful so as to avoid false positives.
 				 */
-				RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+				RecomputedOldestXmin = GetOldestXmin(NULL,
+													 PROCARRAY_FLAGS_VACUUM,
+													 false);
 
 				if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
 					record_corrupt_item(items, &tuple.t_self);
diff --git a/contrib/pgstattu
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 12:41, Andres Freund wrote: > On 2017-03-23 12:14:02 +0800, Craig Ringer wrote: >> On 23 March 2017 at 09:39, Andres Freund wrote: >> > I still think decoding-on-standby is simply not the right approach as >> > the basic/first HA approach for logical rep. It's a nice later-on >> > feature. But that's an irrelevant aside. >> >> I don't really agree that it's irrelevant. > > I'm not sure we have enough time for either getting some parts of your > patch in, or for figuring out long term goals. But we definitely don't > have time for both. Fair. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-23 12:14:02 +0800, Craig Ringer wrote: > On 23 March 2017 at 09:39, Andres Freund wrote: > > I still think decoding-on-standby is simply not the right approach as > > the basic/first HA approach for logical rep. It's a nice later-on > > feature. But that's an irrelevant aside. > > I don't really agree that it's irrelevant. I'm not sure we have enough time for either getting some parts of your patch in, or for figuring out long term goals. But we definitely don't have time for both. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 00:13, Simon Riggs wrote:
> On 22 March 2017 at 08:53, Craig Ringer wrote:
>
>> I'm splitting up the rest of the decoding on standby patch set with
>> the goal of getting minimal functionality for creating and managing
>> slots on standbys in, so we can maintain slots on standbys and use
>> them when the standby is promoted to master.
>>
>> The first, to send catalog_xmin separately to the global xmin on
>> hot_standby_feedback and store it in the upstream physical slot's
>> catalog_xmin, is attached.
>>
>> These are extracted directly from the logical decoding on standby
>> patch, with comments by Petr and Andres made re the relevant code
>> addressed.
>
> I've reduced your two patches back to one with a smaller blast radius.
>
> I'll commit this tomorrow morning, barring objections.

Thanks. I was tempted to refactor GetOldestXmin to use flags myself, but thought it might be at higher risk of objections. Since Eiji Seki has shown that there are other uses for excluding particular things from GetOldestXmin, and that's committed now, it's nice to have the impact of this patch reduced.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 09:39, Andres Freund wrote:
> We can't just assume that snapbuild is going to work correctly when its
> prerequisite - a pinned xmin horizon - isn't working.

Makes sense.

>> What do _you_ see as the minimum acceptable way to achieve the ability
>> for a logical decoding client to follow failover of an upstream to a
>> physical standby? In the end, you're one of the main people whose view
>> carries weight in this area, and I don't want to develop yet another
>
> I think your approach here wasn't that bad? There's a lot of cleaning
> up/shoring up needed, and we probably need a smarter feedback system. I
> don't think anybody here has objected to the fundamental approach?

That's useful, thanks. I'm not arguing that the patch as it stands is ready, and I appreciate the input re the general design.

> I still think decoding-on-standby is simply not the right approach as
> the basic/first HA approach for logical rep. It's a nice later-on
> feature. But that's an irrelevant aside.

I don't really agree that it's irrelevant.

Right now Pg has no HA capability for logical decoding clients. We've now added logical replication, but it has no way to provide for upstream node failure and ensure a consistent switch-over, whether to a logical or physical replica. Since real-world servers fail or need maintenance, this is kind of a problem for practical production use.

Because of transaction serialization for commit-time order replay, logical replication experiences saw-tooth replication lag, where large or long xacts such as batch jobs effectively stall all later xacts until they are fully replicated. We cannot currently start replicating a big xact until it commits on the upstream, so that lag can easily be ~2x the runtime on the upstream. So while you can do sync rep on a logical standby, it tends to result in big delays on COMMITs relative to physical rep, even if apps are careful to keep transactions small.
When the app DR planning people come and ask you what the max data loss window / max sync rep lag is, you have to say "dunno? depends on what else was running on the server at the time."

AFAICT, changing those things will require the ability to begin streaming reorder buffers for big xacts before commit, which as the logical decoding on 2PC thread shows is not exactly trivial. We'll also need to be able to apply them concurrently with other xacts on the other end. Those are both big and complex things IMO, and I'll be surprised if we can do either in Pg11 given that AFAIK nobody has even started work on either of them or has a detailed plan.

Presuming we get some kind of failover to logical replica upstreams into Pg11, it'll have significant limitations relative to what we can deliver to users by using physical replication, especially when it comes to bounded-time lag for HA, sync rep, etc. And I haven't seen a design for it, though Petr and I have discussed some with regards to pglogical.

That's why I think we need to do HA on the physical side first. Because it's going to take a long time to get equivalent functionality for logical rep based upstreams, and when it arrives we'll still have to teach management tools and other non-logical-rep logical decoding clients about the new way of doing things. Whereas for physical HA setups to support logical downstreams requires only relatively minor changes, and gets us all the physical HA features _now_.

That's why we pursued failover slots - as a simple, minimal solution to allowing logical decoding clients to inter-operate with Pg in a physical HA configuration. TBH, I still think we should just add them. Sure, they don't help us achieve decoding on standby, but they're a lot simpler and they help Pg's behaviour with slots match user expectations for how the rest of Pg behaves, i.e. if it's on the master it'll be on the replica too.
And as you've said, decoding on standby is a nice-to-have, whereas I think some kind of HA support is rather more important. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-23 09:14:07 +0800, Craig Ringer wrote:
> On 23 March 2017 at 07:31, Andres Freund wrote:
> > On 2017-03-23 06:55:53 +0800, Craig Ringer wrote:
> >> I was thinking that by disallowing snapshot use and output plugin
> >> invocation we'd avoid the need to support cancellation on recovery
> >> conflicts, etc, simplifying things considerably.
> >
> > That seems like it'd end up being pretty hacky - the likelihood that
> > we'd run into snapbuild error cross-checks seems very high.
>
> TBH I'm not following this. But I haven't touched snapbuild much yet,
> Petr's done much more with snapbuild than I have.

We can't just assume that snapbuild is going to work correctly when its prerequisite - a pinned xmin horizon - isn't working.

> We're not going to have robust logical replication that's suitable for
> HA and failover use on high load systems until 2020 or so, with Pg 12.
> We'll need concurrent decoding and apply, which nobody's even started
> on AFAIK, we'll need sequence replication, and more.

These seem largely unrelated to the topic at hand (nor do I agree on all of them).

> So I'd really, really like to get some kind of HA picture other than
> "none" in for logical decoding based systems. If it's imperfect, it's
> still something.

I still think decoding-on-standby is simply not the right approach as the basic/first HA approach for logical rep. It's a nice later-on feature. But that's an irrelevant aside.

I don't understand why you're making a "fundamental" argument here - I'm not arguing against the goals of the patch at all. I want as much stuff committed as we can in good shape.

> What do _you_ see as the minimum acceptable way to achieve the ability
> for a logical decoding client to follow failover of an upstream to a
> physical standby? In the end, you're one of the main people whose view
> carries weight in this area, and I don't want to develop yet another

I think your approach here wasn't that bad?
There's a lot of cleaning up/shoring up needed, and we probably need a smarter feedback system. I don't think anybody here has objected to the fundamental approach? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 07:31, Andres Freund wrote: > On 2017-03-23 06:55:53 +0800, Craig Ringer wrote: >> I was thinking that by disallowing snapshot use and output plugin >> invocation we'd avoid the need to support cancellation on recovery >> conflicts, etc, simplifying things considerably. > > That seems like it'd end up being pretty hacky - the likelihood that > we'd run into snapbuild error cross-checks seems very high. TBH I'm not following this. But I haven't touched snapbuild much yet, Petr's done much more with snapbuild than I have. We're not going to have robust logical replication that's suitable for HA and failover use on high load systems until 2020 or so, with Pg 12. We'll need concurrent decoding and apply, which nobody's even started on AFAIK, we'll need sequence replication, and more. So I'd really, really like to get some kind of HA picture other than "none" in for logical decoding based systems. If it's imperfect, it's still something. I wish we'd just proceeded with failover slots. They were blocked in favour of decoding on standby, and HA is possible if we have decoding on standby with some more work by the application. But now we'll have neither. If we'd just done failover slots we could've had logical replication able to follow failover in Pg 10. What do _you_ see as the minimum acceptable way to achieve the ability for a logical decoding client to follow failover of an upstream to a physical standby? In the end, you're one of the main people whose view carries weight in this area, and I don't want to develop yet another approach only to have it knocked back once the work is done. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 23 March 2017 at 00:17, Andres Freund wrote:
> On 2017-03-22 15:59:42 +, Simon Riggs wrote:
>> On 22 March 2017 at 13:06, Andres Freund wrote:
>>
>> >> The parts I think are important for Pg10 are:
>> >
>> >> * Ability to create logical slots on replicas
>> >
>> > Doesn't this also imply recovery conflicts on DROP DATABASE?
>>
>> Not needed until the slot is in use, which is a later patch.
>
> Hm? We need to drop slots, if they can exist / be created, on a standby,
> and they're on a dropped database. Otherwise we'll reserve resources,
> while anyone connecting to the slot will likely just receive errors
> because the database doesn't exist anymore. It's also one of the
> patches that can quite easily be developed / reviewed, because there
> really isn't anything complicated about it. Most of the code is already
> in Craig's patch, it just needs some adjustments.

Right, I'm not too concerned about doing that, and it's next on my TODO as I clean up the split patch series.

>> >> * Ability to advance (via feedback or via SQL function) - no need to
>> >> actually decode and call output plugins at al
>> >
>> > That pretty much requires decoding, otherwise you really don't know how
>> > much WAL you have to retain.
>>
>> Knowing how much WAL to retain is key.
>>
>> Why would decoding tell you how much WAL to retain?
>
> Because decoding already has the necessary logic? (You need to retain
> enough WAL to restart decoding for all in-progress transactions etc).

Indeed; after all, standby status updates from the decoding client only contain the *flushed* LSN. The downstream doesn't know the restartpoint LSN; it must be tracked by the upstream.

It's also necessary to maintain our catalog_xmin correctly on the standby so we can send it via hot_standby_feedback to a physical replication slot used on the master, ensuring the master doesn't remove catalog tuples we may still need.

> I don't know what you're on about with that statement. I've spent a
> good chunk of time looking at the 0003 patch, even though it's large
> and contains a lot of different things. I suggested splitting things up.
> I even suggested what to move earlier after Craig agreed with that
> sentiment, in the mail you're replying to, because it seems
> independently doable.

I really appreciate the review, as I'm all too aware of how time-consuming it can be.

From my PoV, the difficulty I'm in is that this patch series has languished for most of the Pg 10 release cycle with no real input from stakeholders in the logical decoding area, so while the review is important, the fact that it's only coming now means that it pretty comprehensively blocks the patch for Pg 10. I asked on list for input on structure (if/how to split it up) literally months ago, for example.

I've been struggling to get some kind of support for logical decoding on standbys for most of two release cycles, and there are people climbing the walls wanting it. I'm trying to make sure it's done right, but I can't do that alone, and it's hard to progress when I don't know what will be expected until it's too late to do anything about it.

I guess all we can do at this point is get the foundations in place and carry on for Pg 11, where the presence of in-core logical replication will offer a lever to actually push this in. In the meantime I'll have to continue carrying the out-of-tree failover slots patch for people who use logical decoding and want it to be reliable.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-23 06:55:53 +0800, Craig Ringer wrote: > On 22 March 2017 at 21:06, Andres Freund wrote: > > Hi, > > > > On 2017-03-21 09:05:26 +0800, Craig Ringer wrote: > >> > 0002 should be doable as a whole this release, I have severe doubts that > >> > 0003 as a whole has a chance for 10 - the code is in quite a raw shape, > >> > there's a significant number of open ends. I'd suggest breaking of bits > >> > that are independently useful, and work on getting those committed. > >> > >> That would be my preference too. > > > > > >> The parts I think are important for Pg10 are: > > > >> * Ability to create logical slots on replicas > > > > Doesn't this also imply recovery conflicts on DROP DATABASE? Besides, > > allowing to drop all slots using a database upon DROP DATABASE, is a > > useful thing on its own. > > Definitely beneficial, otherwise recovery will stop until you drop > slots, which isn't ideal. s/isn't ideal/not acceptable/ ;) > >> * Ability to advance (via feedback or via SQL function) - no need to > >> actually decode and call output plugins at al > > > > That pretty much requires decoding, otherwise you really don't know how > > much WAL you have to retain. > > Yes, and to update restart_lsn and catalog_xmin correctly. > I was thinking that by disallowing snapshot use and output plugin > invocation we'd avoid the need to support cancellation on recovery > conflicts, etc, simplifying things considerably. That seems like it'd end up being pretty hacky - the likelihood that we'd run into snapbuild error cross-checks seems very high. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 March 2017 at 21:06, Andres Freund wrote: > Hi, > > On 2017-03-21 09:05:26 +0800, Craig Ringer wrote: >> > 0002 should be doable as a whole this release, I have severe doubts that >> > 0003 as a whole has a chance for 10 - the code is in quite a raw shape, >> > there's a significant number of open ends. I'd suggest breaking of bits >> > that are independently useful, and work on getting those committed. >> >> That would be my preference too. > > >> The parts I think are important for Pg10 are: > >> * Ability to create logical slots on replicas > > Doesn't this also imply recovery conflicts on DROP DATABASE? Besides, > allowing to drop all slots using a database upon DROP DATABASE, is a > useful thing on its own. Definitely beneficial, otherwise recovery will stop until you drop slots, which isn't ideal. >> * Ability to advance (via feedback or via SQL function) - no need to >> actually decode and call output plugins at al > > That pretty much requires decoding, otherwise you really don't know how > much WAL you have to retain. Yes, and to update restart_lsn and catalog_xmin correctly. I was thinking that by disallowing snapshot use and output plugin invocation we'd avoid the need to support cancellation on recovery conflicts, etc, simplifying things considerably. >> * Ability to drop logical slots on replicas > > That shouldn't actually require any changes, no? It doesn't, it works as-is. I have NFI why I wrote that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-22 15:59:42 +, Simon Riggs wrote: > On 22 March 2017 at 13:06, Andres Freund wrote: > > >> The parts I think are important for Pg10 are: > > > >> * Ability to create logical slots on replicas > > > > Doesn't this also imply recovery conflicts on DROP DATABASE? > > Not needed until the slot is in use, which is a later patch. Hm? We need to drop slots, if they can exist / be created, on a standby, and they're on a dropped database. Otherwise we'll reserve resources, while anyone connecting to the slot will likely just receive errors because the database doesn't exist anymore. It's also one of the patches that can quite easily be developed / reviewed, because there really isn't anything complicated about it. Most of the code is already in Craig's patch, it just needs some adjustments. > > Besides, > > allowing to drop all slots using a database upon DROP DATABASE, is a > > useful thing on its own. > > Sure but that's a separate feature unrelated to this patch and we're > running out of time. Hm? The patch implemented it. > >> * Ability to advance (via feedback or via SQL function) - no need to > >> actually decode and call output plugins at al > > > > That pretty much requires decoding, otherwise you really don't know how > > much WAL you have to retain. > > Knowing how much WAL to retain is key. > > Why would decoding tell you how much WAL to retain? Because decoding already has the necessary logic? (You need to retain enough WAL to restart decoding for all in-progress transactions etc). > We tried to implement this automatically from the master, which was > rejected. So the only other way is manually. We need one or the other. I think the current approach is roughly the right way - but that doesn't make the patch ready. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 March 2017 at 08:53, Craig Ringer wrote: > I'm splitting up the rest of the decoding on standby patch set with > the goal of getting minimal functionality for creating and managing > slots on standbys in, so we can maintain slots on standbys and use > them when the standby is promoted to master. > > The first, to send catalog_xmin separately to the global xmin on > hot_standby_feedback and store it in the upstream physical slot's > catalog_xmin, is attached. > > These are extracted directly from the logical decoding on standby > patch, with comments by Petr and Andres made re the relevant code > addressed. I've reduced your two patches back to one with a smaller blast radius. I'll commit this tomorrow morning, barring objections. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Report-catalog_xmin-separately-to-xmin-in-hot-standb.v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 March 2017 at 13:06, Andres Freund wrote: >> The parts I think are important for Pg10 are: > >> * Ability to create logical slots on replicas > > Doesn't this also imply recovery conflicts on DROP DATABASE? Not needed until the slot is in use, which is a later patch. > Besides, > allowing to drop all slots using a database upon DROP DATABASE, is a > useful thing on its own. Sure, but that's a separate feature unrelated to this patch and we're running out of time. >> * Ability to advance (via feedback or via SQL function) - no need to >> actually decode and call output plugins at al > > That pretty much requires decoding, otherwise you really don't know how > much WAL you have to retain. Knowing how much WAL to retain is key. Why would decoding tell you how much WAL to retain? We tried to implement this automatically from the master, which was rejected. So the only other way is manually. We need one or the other. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 2017-03-22 14:58:29 +, Simon Riggs wrote: > On 22 March 2017 at 13:06, Andres Freund wrote: > > > But I have to admit, I've *severe* doubts about getting the whole > > infrastructure for slot creation on replica into 10. The work is far > > from ready, and we're mere days away from freeze. > > If Craig has to guess what would be acceptable, then it's not long enough. I don't know what you're on about with that statement. I've spent a good chunk of time looking at the 0003 patch, even though it's large and contains a lot of different things. I suggested splitting things up. I even suggested what to move earlier after Craig agreed with that sentiment, in the mail you're replying to, because it seems independently doable. > It would be better if you could outline a specific approach so he can > code it. Coding takes about a day for most things, since Craig knows > the code and what we're trying to achieve. I find that fairly unconvincing. What we have here is a patch that isn't close to being ready, contains a lot of complicated pieces, a couple days before freeze. If we can pull useful pieces out: great. But it's too late for major new development. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 March 2017 at 13:06, Andres Freund wrote: > But I have to admit, I've *severe* doubts about getting the whole > infrastructure for slot creation on replica into 10. The work is far > from ready, and we're mere days away from freeze. If Craig has to guess what would be acceptable, then it's not long enough. It would be better if you could outline a specific approach so he can code it. Coding takes about a day for most things, since Craig knows the code and what we're trying to achieve. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
Hi, On 2017-03-21 09:05:26 +0800, Craig Ringer wrote: > > 0002 should be doable as a whole this release, I have severe doubts that > > 0003 as a whole has a chance for 10 - the code is in quite a raw shape, > > there's a significant number of open ends. I'd suggest breaking off bits > > that are independently useful, and work on getting those committed. > > That would be my preference too. > The parts I think are important for Pg10 are: > * Ability to create logical slots on replicas Doesn't this also imply recovery conflicts on DROP DATABASE? Besides, allowing to drop all slots using a database upon DROP DATABASE is a useful thing on its own. But I have to admit, I've *severe* doubts about getting the whole infrastructure for slot creation on replica into 10. The work is far from ready, and we're mere days away from freeze. > * Ability to advance (via feedback or via SQL function) - no need to > actually decode and call output plugins at al That pretty much requires decoding, otherwise you really don't know how much WAL you have to retain. > * Ability to drop logical slots on replicas That shouldn't actually require any changes, no? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 March 2017 at 10:51, Craig Ringer wrote:
> Hi all
>
> Updated timeline following patch attached.
>
> There's a change in read_local_xlog_page to ensure we maintain
> ThisTimeLineID properly, otherwise it's just comment changes.

OK, so we're looking OK with the TL following.

I'm splitting up the rest of the decoding on standby patch set with the goal of getting minimal functionality for creating and managing slots on standbys in, so we can maintain slots on standbys and use them when the standby is promoted to master.

The first, to send catalog_xmin separately to the global xmin on hot_standby_feedback and store it in the upstream physical slot's catalog_xmin, is attached. These are extracted directly from the logical decoding on standby patch, with comments by Petr and Andres made re the relevant code addressed.

I will next be working on a bare-minimum facility for creating and advancing logical slots on a replica, without support for buffering changes, creating historic snapshots or invoking output plugins. The slots will become usable after the replica is promoted. They'll track their own restart_lsn, etc, and will keep track of xids so they can manage their catalog_xmin, so there'll be no need for dodgy slot syncing from the master, but they'll avoid most of the complex and messy bits. The application will be expected to make sure a slot on the master exists and is advanced before the corresponding slot on the replica to protect required catalogs.

Then, if there's agreement that it's the right way forward, I can add the catalog_xmin xlog'ing stuff as the next patch.
-- 
Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From b719c0b556a6823c9b48c0f4042aaf77a8d5f69e Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 22 Mar 2017 12:11:17 +0800
Subject: [PATCH 1/2] Allow GetOldestXmin to ignore replication slot xmin

For walsender to report replication slots' catalog_xmin separately, it's
necessary to be able to ask GetOldestXmin to ignore replication slots.
---
 contrib/pg_visibility/pg_visibility.c |  4 ++--
 contrib/pgstattuple/pgstatapprox.c    |  2 +-
 src/backend/access/transam/xlog.c     |  4 ++--
 src/backend/catalog/index.c           |  2 +-
 src/backend/commands/analyze.c        |  2 +-
 src/backend/commands/vacuum.c         |  4 ++--
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/ipc/procarray.c   | 36 +++
 src/include/storage/procarray.h       |  2 +-
 9 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index d0f7618..6261e68 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, true);
+		OldestXmin = GetOldestXmin(NULL, true, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			 * a buffer lock. And this shouldn't happen often, so it's
 			 * worth being careful so as to avoid false positives.
 			 */
-			RecomputedOldestXmin = GetOldestXmin(NULL, true);
+			RecomputedOldestXmin = GetOldestXmin(NULL, true, false);
 
 			if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
 				record_corrupt_item(items, &tuple.t_self);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 8db1e20..743cbee 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
 	TransactionId OldestXmin;
 	uint64		misc_count = 0;
 
-	OldestXmin = GetOldestXmin(rel, true);
+	OldestXmin = GetOldestXmin(rel, true, false);
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9480377..c2b4f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8895,7 +8895,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(NULL, false));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -9258,7 +9258,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(NULL, false));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git a/src/backend/catalog/index.c b/src/backen
Re: [HACKERS] Logical decoding on standby
Hi all

Updated timeline following patch attached.

There's a change in read_local_xlog_page to ensure we maintain ThisTimeLineID properly, otherwise it's just comment changes.

-- 
Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From d42ceaec47793f67c55523d1aeb72be61c4f2dea Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Thu, 1 Sep 2016 10:16:55 +0800
Subject: [PATCH] Teach xlogreader to follow timeline switches

The XLogReader was timeline-agnostic and assumed that all WAL segments
requested would be on ThisTimeLineID.

When decoding from a logical slot, it's necessary for xlog reading to be
able to read xlog from historical (i.e. not current) timelines.
Otherwise decoding fails after failover to a physical replica because
the oldest still-needed archives are in the historical timeline.

Supporting logical decoding timeline following is a pre-requisite for
logical decoding on physical standby servers. It also makes it possible
to promote a replica with logical slots to a master and replay from
those slots, allowing logical decoding applications to follow physical
failover.

Logical slots cannot actually be created or advanced on a replica so
this is mostly foundation work for subsequent changes to enable logical
decoding on standbys.

Tests are included to exercise the functionality using a cold disk-level
copy of the master that's started up as a replica with slots intact, but
the intended use of the functionality is with logical decoding on a
standby.

Note that an earlier version of logical decoding timeline following was
committed to 9.6 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and
f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.6
feature freeze when issues were discovered too late to safely fix them
in the 9.6 release cycle.

The prior approach failed to consider that a record could be split
across pages that are on different segments, where the new segment
contains the start of a new timeline.
In that case the old segment might be missing or renamed with a
.partial suffix.

This patch reworks the logic to be page-based and in the process
simplifies how the last timeline for a segment is looked up.
---
 src/backend/access/transam/xlogutils.c           | 213 +++-
 src/backend/replication/logical/logicalfuncs.c   |   8 +-
 src/backend/replication/walsender.c              |  11 +-
 src/include/access/xlogreader.h                  |  16 ++
 src/include/access/xlogutils.h                   |   3 +
 src/test/recovery/Makefile                       |   2 +
 .../recovery/t/009_logical_decoding_timelines.pl | 130 +
 7 files changed, 364 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/009_logical_decoding_timelines.pl

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b2b9fcb..28c07d3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -19,6 +19,7 @@
 
 #include
 
+#include "access/timeline.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
@@ -662,6 +663,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	/* state maintained across calls */
 	static int	sendFile = -1;
 	static XLogSegNo sendSegNo = 0;
+	static TimeLineID sendTLI = 0;
 	static uint32 sendOff = 0;
 
 	p = buf;
@@ -677,7 +679,8 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 		startoff = recptr % XLogSegSize;
 
 		/* Do we need to switch to a different xlog segment? */
-		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo) ||
+			sendTLI != tli)
 		{
 			char		path[MAXPGPATH];
 
@@ -704,6 +707,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 						path)));
 			}
 			sendOff = 0;
+			sendTLI = tli;
 		}
 
 		/* Need to seek in the file? */
@@ -754,6 +758,133 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 }
 
 /*
+ * Determine which timeline to read an xlog page from and set the
+ * XLogReaderState's currTLI to that timeline ID.
+ *
+ * We care about timelines in xlogreader when we might be reading xlog
+ * generated prior to a promotion, either if we're currently a standby in
+ * recovery or if we're a promoted master reading xlogs generated by the old
+ * master before our promotion.
+ *
+ * wantPage must be set to the start address of the page to read and
+ * wantLength to the amount of the page that will be read, up to
+ * XLOG_BLCKSZ. If the amount to be read isn't known, pass XLOG_BLCKSZ.
+ *
+ * We switch to an xlog segment from the new timeline eagerly when on a
+ * historical timeline, as soon as we reach the start of the xlog segment
+ * containing the timeline switch. The server copied the segment to the new
+ * timeline so all the data up to the switch point is the same, but there's
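The eager-switch rule described in that comment can be sketched roughly as follows. This is a much-simplified, hypothetical model invented for illustration; the real XLogReadDetermineTimeline caches its result, reads the timeline history from disk, and handles records that span segment boundaries, and plain `<` comparisons on LSNs are only valid because LSNs don't wrap:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

#define FAKE_XLOG_SEG_SIZE (16 * 1024 * 1024ULL)    /* 16MB WAL segments */

/* One entry of a timeline history: 'tli' ends (exclusive) at 'end'. */
typedef struct
{
    TimeLineID tli;
    XLogRecPtr end;         /* switch point; UINT64_MAX for the current tli */
} TliHistoryEntry;

/*
 * Pick the timeline to read 'wantPage' from. We switch to the *next*
 * timeline eagerly, from the start of the segment that contains the switch
 * point, because the server copies that partial segment to the new timeline
 * and the old timeline's copy may be missing or renamed to .partial.
 */
static TimeLineID
choose_read_tli(const TliHistoryEntry *hist, int n, XLogRecPtr wantPage)
{
    for (int i = 0; i < n; i++)
    {
        /* LSN where the segment containing this tli's switch point begins */
        XLogRecPtr switchSegStart =
            hist[i].end - (hist[i].end % FAKE_XLOG_SEG_SIZE);

        if (wantPage < switchSegStart)
            return hist[i].tli; /* page lies wholly before the switch segment */
        /* else: read this segment (and later ones) on a newer timeline */
    }
    return hist[n - 1].tli;     /* current timeline */
}
```

So a page in the segment that contains a switch point is always read from the newer timeline's copy of that segment, matching the behaviour the patch comment describes.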
Re: [HACKERS] Logical decoding on standby
On 21 March 2017 at 09:05, Craig Ringer wrote:
> Thanks, that's a helpful point. The commit in question is 978b2f65. I
> didn't notice that it introduced XLogReader use in twophase.c, though
> I should've realised given the discussion about fetching recent 2pc
> info from xlog. I don't see any potential for issues at first glance,
> but I'll go over it in more detail. The main concern is validity of
> ThisTimeLineID, but since it doesn't run in recovery I don't see much
> of a problem there. That also means it can afford to use the current
> timeline-oblivious read_local_xlog_page safely.
>
> TAP tests for 2pc were added by 3082098. I'll check to make sure they
> have appropriate coverage for this.

The TAP tests pass fine, and I can't see any likely issues either. XLogReader use for 2PC doesn't happen on standby, and RecoveryInProgress() will update ThisTimeLineID on promotion.

>> Did you check whether ThisTimeLineID is actually always valid in the
>> processes logical decoding could run in? IIRC it's not consistently
>> updated during recovery in any process but the startup process.
>
> I share your concerns that it may not be well enough maintained.
> Thank you for the reminder; that's been on my TODO and got lost when I
> had to task-hop to other priorities.

The main place we maintain ThisTimeLineID (outside StartupXLOG, of course) is in walsender's GetStandbyFlushRecPtr, which calls GetWalRcvWriteRecPtr. That's not used in walsender's logical decoding or in the SQL interface.

I've changed the order of operations in read_local_xlog_page to ensure that RecoveryInProgress() updates ThisTimeLineID if we're promoted, and made it update ThisTimeLineID from GetXLogReplayRecPtr otherwise. pg_logical_slot_get_changes_guts was fine already.

Because xlog read callbacks must not attempt to read pages past the flush limit (master) or replay limit (standby), it doesn't matter if ThisTimeLineID is completely up to date, only that it's valid as of that LSN.
I did identify one problem. The startup process renames the last segment in a timeline to .partial when it processes a timeline switch. See xlog.c:7597. So if we have the following order of operations:

* Update ThisTimeLineID to 2 at latest redo ptr
* XLogReadDetermineTimeline chooses timeline 2 to read from
* startup process replays timeline switch to TL 3 and renames last segment in old timeline to .partial
* XLogRead() tries to open segment with TL 2

we'll fail.

I don't think it matters much though. We're not actually supporting streaming decoding from standby this release by the looks, and even if we did the effect would be limited to an ERROR and a reconnect. It doesn't look like there's really any sort of lock or other synchronisation we can rely on to prevent this, and we should probably just live with it. If we have already opened the segment we'll just keep reading from it without noticing the rename; if we haven't, and are switching to it just as it's renamed, we'll ERROR when we try to open it.

I had cascading and promotion tests in progress for decoding on standby, but doubt there's much point finishing them off now that it's not likely that decoding on standby can be added for this CF.

-- 
Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 21 March 2017 at 02:21, Craig Ringer wrote:
> On 20 March 2017 at 17:33, Andres Freund wrote:
>
>>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding
>>
>> FWIW, the title doesn't really seem accurate to me.
>
> Yeah, it's not really at the logical decoding layer at all.
>
> "Teach xlogreader to follow timeline switches" ?

Happy with that. I think Craig has addressed Andres' issues with this patch, so I will apply it later today as planned, using that name.

The larger Logical Decoding on Standby patch will not be applied yet, and not without further changes, per review.

-- 
Simon Riggs http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
On 20 March 2017 at 17:33, Andres Freund wrote:
>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding
>
> FWIW, the title doesn't really seem accurate to me.

Yeah, it's not really at the logical decoding layer at all.

"Teach xlogreader to follow timeline switches" ?

>> Logical slots cannot actually be created on a replica without use of
>> the low-level C slot management APIs so this is mostly foundation work
>> for subsequent changes to enable logical decoding on standbys.
>
> Every time I read references to anything like this my blood starts to
> boil. I kind of regret not having plastered RecoveryInProgress() errors
> all over this code.

In fairness, I've been trying for multiple releases to get a "right" way in. I have no intention of using such hacks, and only ever did so for testing xlogreader timeline following without full logical decoding on standby being available.

>> From 8854d44e2227b9d076b0a25a9c8b9df9270b2433 Mon Sep 17 00:00:00 2001
>> From: Craig Ringer
>> Date: Mon, 5 Sep 2016 15:30:53 +0800
>> Subject: [PATCH 3/3] Logical decoding on standby
>>
>> * Make walsender aware of ProcSignal and recovery conflicts, make walsender
>>   exit with recovery conflict on upstream drop database when it has an active
>>   logical slot on that database.
>> * Allow GetOldestXmin to omit catalog_xmin, be called already locked.
>
> "be called already locked"?

To be called with ProcArrayLock already held. But that's actually outdated due to changes Petr requested earlier; thanks for noticing.

>> * Send catalog_xmin separately in hot_standby_feedback messages.
>> * Store catalog_xmin separately on a physical slot if received in
>>   hot_standby_feedback
>
> What does separate mean?

Currently, hot standby feedback sends effectively the min(catalog_xmin, xmin) to the upstream, which in turn records that either in the PGPROC entry or, if there's a slot in use, in the xmin field on the slot.
So catalog_xmin on the standby gets promoted to xmin on the master's physical slot. Lots of unnecessary bloat results. This splits it up, so we can send catalog_xmin separately on the wire, and store it on the physical replication slot as catalog_xmin, not xmin. >> * Separate the catalog_xmin used by vacuum from ProcArray's >> replication_slot_catalog_xmin, >> requiring that xlog be emitted before vacuum can remove no longer needed >> catalogs, store >> it in checkpoints, make vacuum and bgwriter advance it. > > I can't parse that sentence. We now write an xlog record before allowing the catalog_xmin in ProcArray replication_slot_catalog_xmin to advance and allow catalog tuples to be removed. This is achieved by making vacuum use a separate field in ShmemVariableCache, oldestCatalogXmin. When vacuum looks up the new xmin from GetOldestXmin, it copies ProcArray.replication_slot_catalog_xmin to ShmemVariableCache.oldestCatalogXmin, writing an xlog record to ensure we remember the new value and ensure standbys know about it. This provides a guarantee to standbys that all catalog tuples >= ShmemVariableCache.oldestCatalogXmin are protected from vacuum and lets them discover when that threshold advances. The reason we cannot use the xid field in existing vacuum xlog records is that the downstream has no way to know if the xact affected catalogs and therefore whether it should advance its idea of catalog_xmin or not. It can't get a Relation for the affected relfilenode because it can't use the relcache during redo. We'd have to add a flag to every vacuum record indicating whether it affected catalogs, which is not fun, and vacuum might not always even know. And the standby would still need a way to keep track of the oldest valid catalog_xmin across restart without the ability to write it to checkpoints. It's a lot simpler and cheaper to have the master do it. >> * Add a new recovery conflict type for conflict with catalog_xmin. 
Abort >> in-progress logical decoding sessions with conflict with recovery where >> needed >> catalog_xmin is too old > > Are we retaining WAL for slots broken in that way? Yes, until the slot is dropped. If I added a persistent flag on the slot to indicate that the slot is invalid, then we could ignore it for purposes of WAL retention. It seemed unnecessary at this stage. >> * Make extra efforts to reserve master's catalog_xmin during decoding startup >> on standby. > > What does that mean? WaitForMasterCatalogXminReservation(...) I don't like it. At all. I'd rather have hot standby feedback replies so we can know for sure when the master has locked in our feedback. It's my most disliked part of this patch. >> * Remove checks preventing starting logical decoding on standby > > To me that's too many different things in one commit. A bunch of them > seem like it'd be good if they'd get independent buildfarm cycles too. I agree with you. I had them all separate before and was told that there were too many patches. I also had fixes that spanned multiple patch
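The catalog_xmin split in hot_standby_feedback described earlier in this message can be modelled with a toy sketch. All names here are hypothetical stand-ins for illustration; real feedback messages carry xid epochs as well, slot updates are done under the slot's lock, and plain comparisons ignore xid wraparound, which the real code handles with TransactionIdPrecedes:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical physical-slot state on the master. */
typedef struct
{
    TransactionId xmin;         /* holds back removal of *all* tuples */
    TransactionId catalog_xmin; /* holds back removal of catalog tuples only */
} PhysicalSlot;

/* Old scheme: feedback carries one xmin = min(xmin, catalog_xmin). */
static void
apply_feedback_old(PhysicalSlot *slot, TransactionId standby_xmin,
                   TransactionId standby_catalog_xmin)
{
    TransactionId combined = standby_catalog_xmin < standby_xmin
        ? standby_catalog_xmin : standby_xmin;

    slot->xmin = combined;      /* catalog_xmin drags down the global horizon */
    slot->catalog_xmin = 0;     /* no separate catalog horizon existed */
}

/* New scheme: both fields are on the wire and stored separately. */
static void
apply_feedback_new(PhysicalSlot *slot, TransactionId standby_xmin,
                   TransactionId standby_catalog_xmin)
{
    slot->xmin = standby_xmin;
    slot->catalog_xmin = standby_catalog_xmin;
}
```

With the split, an old catalog_xmin only pins catalog tuples on the master; ordinary table bloat is governed by the (usually much newer) global xmin.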
Re: [HACKERS] Logical decoding on standby
On 20 March 2017 at 17:33, Andres Freund wrote:
> Hi,
>
> Have you checked how high the overhead of XLogReadDetermineTimeline is?
> A non-local function call, especially into a different translation-unit
> (no partial inlining), for every single page might end up being
> noticeable. That's fine in the cases it actually adds functionality,
> but for a master streaming out data, that's not actually adding
> anything.

I don't anticipate any significant effect given the large amount of indirection via decoding, reorder buffer, snapshot builder, output plugin, etc. that we already do, and how much memory allocation gets done ... but it's worth checking. I could always move the fast path into a macro or inline function if it does turn out to make a detectable difference.

One of the things I want to get to is refactoring all the xlog page reading stuff into a single place, shared between walsender and normal backends, to get rid of this confusing mess we currently have. The only necessary difference is how we wait for new WAL; the rest should be as common as possible, allowing for xlogreader's needs. I particularly want to get rid of the two identically named static XLogRead functions. But all that probably means making timeline.c FRONTEND friendly, and it's way too intrusive to contemplate at this stage.

> Did you check whether your changes to read_local_xlog_page could cause
> issues with twophase.c? Because that now also uses it.

Thanks, that's a helpful point. The commit in question is 978b2f65. I didn't notice that it introduced XLogReader use in twophase.c, though I should've realised given the discussion about fetching recent 2pc info from xlog. I don't see any potential for issues at first glance, but I'll go over it in more detail. The main concern is validity of ThisTimeLineID, but since it doesn't run in recovery I don't see much of a problem there. That also means it can afford to use the current timeline-oblivious read_local_xlog_page safely.
TAP tests for 2pc were added by 3082098. I'll check to make sure they have appropriate coverage for this.

> Did you check whether ThisTimeLineID is actually always valid in the
> processes logical decoding could run in? IIRC it's not consistently
> updated during recovery in any process but the startup process.

I share your concerns that it may not be well enough maintained. Thank you for the reminder; that's been on my TODO and got lost when I had to task-hop to other priorities. I have some TAP tests to validate promotion that need finishing off. My main concern is around live promotions, both promotion of standby to master, and promotion of the upstream master when streaming from a cascaded replica.

[Will cover review of 0003 separately, next]

> 0002 should be doable as a whole this release, I have severe doubts that
> 0003 as a whole has a chance for 10 - the code is in quite a raw shape,
> there's a significant number of open ends. I'd suggest breaking off bits
> that are independently useful, and work on getting those committed.

That would be my preference too. I do not actually feel strongly about the need for logical decoding on standby, and would in many ways prefer to defer it until we have two-way hot standby feedback and the ability to have the master confirm the actual catalog_xmin locked in, to eliminate the current race and the ugly workaround for it. I'd rather have solid timeline following in place now and bare-minimum failover capability. I'm confident that the ability for xlogreader to follow timeline switches will also be independently useful.

The parts I think are important for Pg10 are:

* Teach xlogreader to follow timeline switches
* Ability to create logical slots on replicas
* Ability to advance (via feedback or via SQL function) - no need to actually decode and call output plugins at all
* Ability to drop logical slots on replicas

That would be enough to provide minimal standby promotion without hideous hacks.
Unfortunately, slot creation on standby is probably the ugliest part of the patch. It can be considerably simplified by imposing the rule that the application must ensure catalog_xmin on the master is already held down (with a replication slot) before creating a slot on the standby, and it's the application's job to send feedback to the master before any standbys it's keeping slots on. If the app fails to do so, the slot on the downstream will become unusable and attempts to decode changes from it will fail with conflict with recovery. That'd get rid of a lot of the code including some of the ugliest bits, since we'd no longer make any special effort with catalog_xmin during slot creation. We're already pushing complexity onto apps for this, after concluding that the transparent failover slots approach wasn't the way forward, so I'm OK with that. Let the apps that want logical decoding to support physical replica promotion pay most of the cost. I'd then like to revisit full decoding on standby later, once we have 2-way hot
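The safety rule this pushes onto the standby is simple enough to state as a predicate: decoding from a standby slot is only allowed while the master still retains the catalog rows the slot needs. A hypothetical sketch (invented names; real code would compare xids with TransactionIdFollowsOrEquals to handle wraparound, rather than a plain `>=`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical check a standby could apply before decoding from a slot.
 *
 * The master guarantees that catalog tuples with xmin >=
 * master_oldest_catalog_xmin (as advertised via WAL) have not been
 * vacuumed away. If the slot needs older catalog rows than that, decoding
 * must fail with a conflict-with-recovery error instead of producing
 * wrong results.
 */
static bool
standby_slot_usable(TransactionId slot_catalog_xmin,
                    TransactionId master_oldest_catalog_xmin)
{
    return slot_catalog_xmin >= master_oldest_catalog_xmin;
}
```

If the application lets the master's slot advance past the standby slot's catalog_xmin, the predicate becomes false and the standby slot is permanently unusable, which is exactly the failure mode described above.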
Re: [HACKERS] Logical decoding on standby
On 19 March 2017 at 22:12, Petr Jelinek wrote: > I am slightly worried about impact of the readTimeLineHistory() call but > I think it should be called so little that it should not matter. Pretty much my thinking too. > That brings us to the big patch 0003. > > I still don't like the "New in 10.0" comments in documentation, for one > it's 10, not 10.0 and mainly we don't generally write stuff like this to > documentation, that's what release notes are for. OK. Personally I think it's worthwhile for protocol docs, which are more dev-focused. But I agree it's not consistent with the rest of the docs, so removed. (Frankly I wish we did this consistently throughout the Pg docs, too, and it'd be much more user-friendly if we did, but that's just not going to happen.) > There is large amounts of whitespace mess (empty lines with only > whitespace, spaces at the end of the lines), nothing horrible, but > should be cleaned up. Fixed. > One thing I don't understand much is the wal_level change and turning > off catalogXmin tracking. I don't really see anywhere that the > catalogXmin would be reset in control file for example. There is TODO in > UpdateOldestCatalogXmin() that seems related but tbh I don't follow > what's happening there - comment says about not doing anything, but the > code inside the if block are only Asserts. UpdateOldestCatalogXmin(...) with force=true forces a XactLogCatalogXminUpdate(...) call to write the new procArray->replication_slot_catalog_xmin . We call it with force=true from XLogReportParameters(...) when wal_level has been lowered; see XLogReportParameters. This will write out a xl_xact_catalog_xmin_advance with procArray->replication_slot_catalog_xmin's value then update ShmemVariableCache->oldestCatalogXmin in shmem. ShmemVariableCache->oldestCatalogXmin will get written out in the next checkpoint, which gets incorporated in the control file. 
There is a problem, though: StartupReplicationSlots() and RestoreSlotFromDisk() don't care if catalog_xmin is set on a slot while wal_level is < logical, and will happily restore a logical slot, or a physical slot with a catalog_xmin. So we can't actually assume that procArray->replication_slot_catalog_xmin will be 0 if we're not writing new logical WAL.

This isn't a big deal; it just means we can't short-circuit UpdateOldestCatalogXmin() calls if !XLogLogicalInfoActive(). It also means the XLogReportParameters() stuff can be removed, since we don't care about wal_level for tracking oldestCatalogXmin.

Fixed in updated patch. I'm now reading over Andres's review.

-- 
Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
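The log-before-advance ordering this thread keeps coming back to can be modelled with a toy sketch. Everything here is a hypothetical stand-in for the real xl_xact_catalog_xmin_advance record and ShmemVariableCache->oldestCatalogXmin; locking and xid wraparound are deliberately ignored:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical shared state and WAL stream for the sketch. */
static TransactionId shmem_oldestCatalogXmin = 0;      /* limit vacuum honours */
static TransactionId procarray_slot_catalog_xmin = 0;  /* slots' requirement */
static TransactionId wal_records[16];
static int wal_count = 0;

/* Stand-in for emitting an xl_xact_catalog_xmin_advance record. */
static void
xlog_catalog_xmin_advance(TransactionId xid)
{
    wal_records[wal_count++] = xid;
}

/*
 * Before vacuum may remove catalog tuples older than the slots'
 * catalog_xmin, the new threshold is logged, then made effective. A standby
 * replaying the stream therefore always knows a threshold that is at least
 * as conservative as anything the master has actually removed up to, and
 * can raise a recovery conflict for slots that fall behind it.
 */
static void
update_oldest_catalog_xmin(void)
{
    TransactionId new_xmin = procarray_slot_catalog_xmin;

    if (new_xmin > shmem_oldestCatalogXmin)
    {
        xlog_catalog_xmin_advance(new_xmin);    /* WAL first... */
        shmem_oldestCatalogXmin = new_xmin;     /* ...then advance the limit */
    }
}
```

The point of the ordering is crash safety: if the master dies between the two steps, the logged threshold is merely conservative, never ahead of what vacuum was allowed to remove.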
Re: [HACKERS] Logical decoding on standby
Hi,

Have you checked how high the overhead of XLogReadDetermineTimeline is? A non-local function call, especially into a different translation unit (no partial inlining), for every single page might end up being noticeable. That's fine in the cases it actually adds functionality, but for a master streaming out data, that's not actually adding anything.

Did you check whether your changes to read_local_xlog_page could cause issues with twophase.c? Because that now also uses it.

Did you check whether ThisTimeLineID is actually always valid in the processes logical decoding could run in? IIRC it's not consistently updated during recovery in any process but the startup process.

On 2017-03-19 21:12:23 +0800, Craig Ringer wrote:
> From 2fa891a555ea4fb200d75b8c906c6b932699b463 Mon Sep 17 00:00:00 2001
> From: Craig Ringer
> Date: Thu, 1 Sep 2016 10:16:55 +0800
> Subject: [PATCH 2/3] Follow timeline switches in logical decoding

FWIW, the title doesn't really seem accurate to me.

> Logical slots cannot actually be created on a replica without use of
> the low-level C slot management APIs so this is mostly foundation work
> for subsequent changes to enable logical decoding on standbys.

Every time I read references to anything like this my blood starts to boil. I kind of regret not having plastered RecoveryInProgress() errors all over this code.

> From 8854d44e2227b9d076b0a25a9c8b9df9270b2433 Mon Sep 17 00:00:00 2001
> From: Craig Ringer
> Date: Mon, 5 Sep 2016 15:30:53 +0800
> Subject: [PATCH 3/3] Logical decoding on standby
>
> * Make walsender aware of ProcSignal and recovery conflicts, make walsender
>   exit with recovery conflict on upstream drop database when it has an active
>   logical slot on that database.
> * Allow GetOldestXmin to omit catalog_xmin, be called already locked.

"be called already locked"?

> * Send catalog_xmin separately in hot_standby_feedback messages.
> * Store catalog_xmin separately on a physical slot if received in > hot_standby_feedback What does separate mean? > * Separate the catalog_xmin used by vacuum from ProcArray's > replication_slot_catalog_xmin, > requiring that xlog be emitted before vacuum can remove no longer needed > catalogs, store > it in checkpoints, make vacuum and bgwriter advance it. I can't parse that sentence. > * Add a new recovery conflict type for conflict with catalog_xmin. Abort > in-progress logical decoding sessions with conflict with recovery where > needed > catalog_xmin is too old Are we retaining WAL for slots broken in that way? > * Make extra efforts to reserve master's catalog_xmin during decoding startup > on standby. What does that mean? > * Remove checks preventing starting logical decoding on standby To me that's too many different things in one commit. A bunch of them seem like it'd be good if they'd get independent buildfarm cycles too. > diff --git a/src/backend/access/heap/rewriteheap.c > b/src/backend/access/heap/rewriteheap.c > index d7f65a5..36bbb98 100644 > --- a/src/backend/access/heap/rewriteheap.c > +++ b/src/backend/access/heap/rewriteheap.c > @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state) > if (!state->rs_logical_rewrite) > return; > > - ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin); > + /* Use the catalog_xmin being retained by vacuum */ > + ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL); What does that comment mean? Vacuum isn't the only thing that prunes old records. > +/* > + * Set the global oldest catalog_xmin used to determine when tuples > + * may be removed from catalogs and user-catalogs accessible from logical > + * decoding. > + * > + * Only to be called from the startup process or by > UpdateOldestCatalogXmin(), > + * which ensures the update is properly written to xlog first. 
> + */ > +void > +SetOldestCatalogXmin(TransactionId oldestCatalogXmin) > +{ > + Assert(InRecovery || !IsUnderPostmaster || AmStartupProcess() || > LWLockHeldByMe(ProcArrayLock)); Uh, that's long-ish. And doesn't agree with the comment above (s/startup process/process performing recovery/?). This is a long enough list that I'd consider just dropping the assert. > + else if (info == XLOG_XACT_CATALOG_XMIN_ADV) > + { > + xl_xact_catalog_xmin_advance *xlrec = > (xl_xact_catalog_xmin_advance *) XLogRecGetData(record); > + > + /* > + * Unless logical decoding is possible on this node, we don't > care about > + * this record. > + */ > + if (!XLogLogicalInfoActive() || max_replication_slots == 0) > + return; Too many negatives for my taste, but whatever. > + /* > + * Apply the new catalog_xmin limit immediately. New decoding > sessions > + * will refuse to start if their slot is past it, and old ones > will > + * notice when we signal them with a recovery conflict.
Re: [HACKERS] Logical decoding on standby
On 20 March 2017 at 14:57, Simon Riggs wrote: > 2.1 Why does call to ReplicationSlotAcquire() move earlier in > pg_logical_slot_get_changes_guts()? That appears to be an oversight from an earlier version where it looped over timelines in pg_logical_slot_get_changes_guts . Reverted. > 2.2 sendTimeLineIsHistoric looks incorrect, and at least isn't really > documented well. > The setting > sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID; > should be > sendTimeLineIsHistoric = (state->currTLI != ThisTimeLineID); Definitely wrong. Fixed. > but that doesn't cause failure because in read_local_xlog_page() we > say that we are reading from history when > state->currTLI != ThisTimeLineID explicitly rather than use > sendTimeLineIsHistoric XLogRead(...), as called by logical_read_xlog_page, does test it. It's part of the walsender-local log read callback. We don't hit read_local_xlog_page at all when we're doing walsender based logical decoding. We have two parallel code paths for reading xlogs, one for walsender, one for normal backends. The walsender one is glued together with a bunch of globals that pass state "around" the xlogreader - we set it up before calling into xlogreader, and then examine it when xlogreader calls back into walsender.c with logical_read_xlog_page. I really want to refactor that at some stage, getting rid of the use of walsender globals for timeline state tracking and sharing more of the xlog reading logic between walsender and normal backends. But -ENOTIME, especially to do it as carefully as it must be done. There are comments on read_local_xlog_page, logical_read_xlog_page that mention this. Also XLogRead in src/backend/access/transam/xlogutils.c (which has the same name as XLogRead in src/backend/replication/walsender.c). I have a draft for a timeline following readme that would address some of this but don't expect to be able to finish it off for this release cycle, and I'd really rather clean it up instead. 
-- 
Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 19 March 2017 at 21:12, Craig Ringer wrote:
> Rebased attached.

Patch 1 looks good to go. I'll correct a spelling mistake in the TAP test when I commit that later today.

Patch 2 has a couple of points:

2.1 Why does the call to ReplicationSlotAcquire() move earlier in pg_logical_slot_get_changes_guts()?

2.2 sendTimeLineIsHistoric looks incorrect, and at least isn't really documented well. The setting

sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;

should be

sendTimeLineIsHistoric = (state->currTLI != ThisTimeLineID);

but that doesn't cause failure, because in read_local_xlog_page() we say that we are reading from history when state->currTLI != ThisTimeLineID explicitly, rather than using sendTimeLineIsHistoric. So it looks like we could do with a few extra comments.

If you correct these I'll commit it tomorrow.

-- 
Simon Riggs http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
Hi,

I don't know how well I can review 0001 (the TAP infra patch) but it looks okay to me.

I don't really have any complaints about 0002 either. I like that it's more or less one self-contained function, and there are no weird ifdefs anymore like in the 9.6 version (btw, your commit message talks about 9.5 but it was 9.6). I also like the clever test :) I am slightly worried about the impact of the readTimeLineHistory() call, but I think it should be called so little that it should not matter.

That brings us to the big patch, 0003.

I still don't like the "New in 10.0" comments in documentation; for one, it's 10, not 10.0, and mainly we don't generally write stuff like this in documentation - that's what release notes are for.

There are large amounts of whitespace mess (empty lines with only whitespace, spaces at the end of lines); nothing horrible, but it should be cleaned up.

One thing I don't understand much is the wal_level change and turning off catalogXmin tracking. I don't really see anywhere that the catalogXmin would be reset in the control file, for example. There is a TODO in UpdateOldestCatalogXmin() that seems related, but tbh I don't follow what's happening there - the comment talks about not doing anything, but the code inside the if block is only Asserts.

-- 
Petr Jelinek http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 13 March 2017 at 10:56, Craig Ringer wrote: > On 7 March 2017 at 21:08, Simon Riggs wrote: > >> Patch 4 committed. Few others need rebase. > > Since this patch series Patch 1 fails to apply, since the feature has already been committed. If it fails for some other reason, let me know. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
On 13 March 2017 at 10:56, Craig Ringer wrote: > On 7 March 2017 at 21:08, Simon Riggs wrote: > >> Patch 4 committed. Few others need rebase. > > Since this patch series and initial data copy for logical replication > both add a facility for suppressing initial snapshot export on a > logical slot, I've dropped patch 0003 (make snapshot export on logical > slot creation) in favour of Petr's similar patch. > > I will duplicate it in this patch series for ease of application. (The > version here is slightly extended over Petr's so I'll re-post the > modified version on the logical replication initial data copy thread > too). > > The main thing I want to direct attention to for Simon, as committer, > is the xlog'ing of VACUUM's xid threshold before we advance it and > start removing tuples. This is necessary for the standby to know > whether a given replication slot is safe to use and fail with conflict > with recovery if it is not, or if it becomes unsafe due to master > vacuum activity. Note that we can _not_ use the various vacuum records > for this because we don't know which are catalogs and which aren't; > we'd have to add a separate "is catalog" field to each vacuum xlog > record, which is undesirable. The downstream can't look up whether > it's a catalog or not because it doesn't have relcache/syscache access > during decoding. > > This change might look a bit similar to the vac_truncate_clog change > in the txid_status patch, but it isn't really related. The txid_status > change is about knowing when we can safely look up xids in clog and > preventing a race with clog truncation. This change is about knowing > when we can know all catalog tuples for a given xid will still be in > the heap, not vacuumed away. Both are about making sure standbys know > more about the state of the system in a low-cost way, though. > > WaitForMasterCatalogXminReservation(...) in logical.c is also worth > looking more closely at. 
I should also note that because the TAP tests currently take a long time, I recommend skipping the tests for this patch by default and running them only when actually touching logical decoding. I'm looking at ways to make them faster, but they're inevitably going to take a while until we can get hot standby feedback replies in place, including cascading support. Which I have as WIP, but won't make this release. Changing the test import to use Test::More skip_all => "disabled by default, too slow"; will be sufficient. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 24 January 2017 at 06:37, Craig Ringer wrote: > Rebased series attached, on top of current master (which includes > logical replication). > > I'm inclined to think I should split out a few of the changes from > 0005, roughly along the lines of the bullet points in its commit > message. Anyone feel strongly about how granular this should be? > > This patch series is a prerequisite for supporting logical > replication using a physical standby as a source, but does not itself > enable logical replication from a physical standby. Patch 4 committed. Few others need rebase. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical decoding on standby
On Tue, Jan 24, 2017 at 7:37 AM, Craig Ringer wrote: > Rebased series attached, on top of current master (which includes > logical replication). > > I'm inclined to think I should split out a few of the changes from > 0005, roughly along the lines of the bullet points in its commit > message. Anyone feel strongly about how granular this should be? > > This patch series is a prerequisite for supporting logical > replication using a physical standby as a source, but does not itself > enable logical replication from a physical standby. There are patches but no reviews yet, so moved to CF 2017-03. -- Michael
Re: [HACKERS] Logical decoding on standby
On 5 January 2017 at 01:21, Craig Ringer wrote: > On 5 January 2017 at 09:19, Craig Ringer wrote: > >> so here's a rebased series on top of master. No other changes. > > Now with actual patches. Patch 5 no longer applies: patching file src/include/pgstat.h Hunk #1 FAILED at 745. 1 out of 1 hunk FAILED -- saving rejects to file src/include/pgstat.h.rej --- src/include/pgstat.h +++ src/include/pgstat.h @@ -745,7 +745,8 @@ typedef enum WAIT_EVENT_SYSLOGGER_MAIN, WAIT_EVENT_WAL_RECEIVER_MAIN, WAIT_EVENT_WAL_SENDER_MAIN, - WAIT_EVENT_WAL_WRITER_MAIN + WAIT_EVENT_WAL_WRITER_MAIN, + WAIT_EVENT_STANDBY_LOGICAL_SLOT_CREATE } WaitEventActivity; /* -- Could you rebase? Thanks Thom
Re: [HACKERS] Logical decoding on standby
On Fri, Jan 6, 2017 at 1:07 PM, Craig Ringer wrote: > On 5 January 2017 at 13:12, Michael Paquier wrote: >> On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer wrote: >>> On 5 January 2017 at 09:19, Craig Ringer wrote: >>> so here's a rebased series on top of master. No other changes. >>> >>> Now with actual patches. >> >> Looking at the PostgresNode code in 0001... >> +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos, >> timeout_secs, ...) >> + >> This format is incorrect. I think that you also need to fix the >> brackets for the do{} and the eval{} blocks. >> >> +push @cmd, '--endpos', $endpos if ($endpos); >> endpos should be made a mandatory argument. If it is not defined, >> the test calling this routine would get stuck. > > Acknowledged and agreed. I'll fix both in the next revision. I'm > currently working on hot standby replies, but will return to this > ASAP. By the way, be sure to also fix the =pod blocks for the new routines. perldoc needs to use both =pod and =item. -- Michael
Re: [HACKERS] Logical decoding on standby
On 5 January 2017 at 13:12, Michael Paquier wrote: > On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer wrote: >> On 5 January 2017 at 09:19, Craig Ringer wrote: >> >>> so here's a rebased series on top of master. No other changes. >> >> Now with actual patches. > > Looking at the PostgresNode code in 0001... > +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos, > timeout_secs, ...) > + > This format is incorrect. I think that you also need to fix the > brackets for the do{} and the eval{} blocks. > > +push @cmd, '--endpos', $endpos if ($endpos); > endpos should be made a mandatory argument. If it is not defined, > the test calling this routine would get stuck. > -- > Michael Acknowledged and agreed. I'll fix both in the next revision. I'm currently working on hot standby replies, but will return to this ASAP. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer wrote: > On 5 January 2017 at 09:19, Craig Ringer wrote: > >> so here's a rebased series on top of master. No other changes. > > Now with actual patches. Looking at the PostgresNode code in 0001... +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos, timeout_secs, ...) + This format is incorrect. I think that you also need to fix the brackets for the do{} and the eval{} blocks. +push @cmd, '--endpos', $endpos if ($endpos); endpos should be made a mandatory argument. If it is not defined, the test calling this routine would get stuck. -- Michael
Re: [HACKERS] Logical decoding on standby
On 4 January 2017 at 16:19, Craig Ringer wrote: > On 4 January 2017 at 12:15, Craig Ringer wrote: > >> That's particularly relevant to you, Simon, as you expressed a wish to >> commit the new streaming rep tests. Simon committed 1, 2, 3 and 5: * Extra PostgresNode methods * pg_recvlogical --endpos * Tests for pg_recvlogical * Expand streaming replication tests to cover hot standby so here's a rebased series on top of master. No other changes. The first patch, which adds a pg_recvlogical wrapper to PostgresNode, is really only needed to test the rest of the patches, so it can be committed together with them. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 4 January 2017 at 12:15, Craig Ringer wrote: > That's particularly relevant to you, Simon, as you expressed a wish to > commit the new streaming rep tests. Patches 0001 and 0005 in this series are also posted as https://www.postgresql.org/message-id/CAMsr+YHxTMrY1woH_m4bEF3f5+kxX_T=sduyxf4d2-+e-56...@mail.gmail.com , since they're really prerequisites, not part of decoding on standby as such. I'll post a new series with them removed once committed. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 4 January 2017 at 12:08, Craig Ringer wrote: > > 0001 incorporates the changes requested by Michael Paquier. Simon > expressed his intention to commit this after updates, in the separate > thread [...] ... > 0005 (new streaming rep tests) is updated for the changes in 0001, > otherwise unchanged. Simon said he wanted to commit this soon. > > 0006 (timeline following) is unchanged except for updates to be > compatible with 0001. > > 0007 is the optional snapshot export requested by Petr. > > 0008 is unchanged. > > 0009 is unchanged except for updates vs 0001 and use of the > WITHOUT_SNAPSHOT option added in 0007. Oh, note that it's possible to commit 0001 then 0005, skipping over 2..4. I should probably have ordered them that way. That's particularly relevant to you, Simon, as you expressed a wish to commit the new streaming rep tests. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 12/22/2016 01:21 AM, Craig Ringer wrote: + my @fields = ('plugin', 'slot_type', 'datoid', 'database', 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn'); + my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ', @fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'"); + $result = undef if $result eq ''; + # hash slice, see http://stackoverflow.com/a/16755894/398670 . Couldn't this portion be made more generic? Other queries could benefit from that by having a routine that accepts an array of column names as an argument, for example. Yeah, probably. I'm not sure where it should live though - TestLib.pm ? Not sure if there's an idiomatic way to pass a string (in this case a query) in Perl with a placeholder for interpolation of values (in this case columns). In Python you'd pass it with pre-defined %(placeholders)s for %. For direct interpolation of an expression, there is this slightly baroque gadget: my $str = "here it is @{[ arbitrary expression here ]}"; For indirect interpolation using placeholders, there is my $str = sprintf("format string",...); which works much like C except that the string is returned as a result instead of being the first argument. And as we always say, TIMTOWTDI. cheers andrew (japh)
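[Editor's note] A sketch of the more generic routine being discussed, in Python since the %(placeholder)s style mentioned above is Python's. Both helpers are hypothetical illustrations, not part of PostgresNode or TestLib:

```python
def build_select(table, columns, key_column):
    """Assemble a SELECT over an arbitrary column list, with a named
    placeholder (psycopg-style %(name)s syntax) for the key value."""
    col_list = ", ".join(columns)
    return "SELECT {} FROM {} WHERE {} = %(key)s".format(col_list, table, key_column)

def row_to_dict(columns, row):
    """The equivalent of the Perl 'hash slice': pair column names with
    the values of a fetched row."""
    return dict(zip(columns, row))

fields = ["plugin", "slot_type", "datoid", "database", "active"]
sql = build_select("pg_catalog.pg_replication_slots", fields, "slot_name")
print(sql)
# A fetched row then becomes a name -> value mapping:
print(row_to_dict(fields, ["test_decoding", "logical", 16384, "postgres", "f"]))
```

Any caller could then pass its own column list, which is the generality Michael was asking for.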
Re: [HACKERS] Logical decoding on standby
On 22 December 2016 at 14:21, Craig Ringer wrote: changes-in-0001-v2.diff shows the changes to PostgresNode.pm per Michael's comments, and applies on top of 0001. I also attach a patch to add a new CREATE_REPLICATION_SLOT option per Petr's suggestion, so you can request a slot be created WITHOUT_SNAPSHOT. This replaces the patch series's behaviour of silently suppressing snapshot export when a slot was created on a replica. It'll conflict (easily resolved) if applied on top of the current series. I have more to do before re-posting the full series, so waiting on author at this point. The PostgresNode changes likely break later tests, I'm just posting them so there's some progress here and so I don't forget over the next few days' distraction. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 28e9f0b..64a4633 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -93,7 +93,6 @@ use RecursiveCopy; use Socket; use Test::More; use TestLib (); -use pg_lsn qw(parse_lsn); use Scalar::Util qw(blessed); our @EXPORT = qw( @@ -1325,38 +1324,62 @@ sub run_log TestLib::run_log(@_); } -=pod $node->lsn +=pod $node->lsn(mode) -Return pg_current_xlog_insert_location() or, on a replica, -pg_last_xlog_replay_location(). +Look up xlog positions on the server: + +* insert position (master only, error on replica) +* write position (master only, error on replica) +* flush position +* receive position (always undef on master) +* replay position + +mode must be specified. 
=cut sub lsn { - my $self = shift; - return $self->safe_psql('postgres', 'select case when pg_is_in_recovery() then pg_last_xlog_replay_location() else pg_current_xlog_insert_location() end as lsn;'); + my ($self, $mode) = @_; + my %modes = ('insert' => 'pg_current_xlog_insert_location()', +'flush' => 'pg_current_xlog_flush_location()', +'write' => 'pg_current_xlog_location()', +'receive' => 'pg_last_xlog_receive_location()', +'replay' => 'pg_last_xlog_replay_location()'); + + $mode = '' if !defined($mode); + die "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', keys %modes) + if !defined($modes{$mode}); + + my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}"); + chomp($result); + if ($result eq '') + { + return undef; + } + else + { + return $result; + } } =pod $node->wait_for_catchup(standby_name, mode, target_lsn) Wait for the node with application_name standby_name (usually from node->name) -until its replication equals or passes the upstream's xlog insert point at the -time this function is called. By default the replay_location is waited for, -but 'mode' may be specified to wait for any of sent|write|flush|replay. +until its replication position in pg_stat_replication equals or passes the +upstream's xlog insert point at the time this function is called. By default +the replay_location is waited for, but 'mode' may be specified to wait for any +of sent|write|flush|replay. If there is no active replication connection from this peer, waits until poll_query_until timeout. Requires that the 'postgres' db exists and is accessible. -If pos is passed, use that xlog position instead of the server's current -xlog insert position. +target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert'). This is not a test. It die()s on failure. -Returns the LSN caught up to. - =cut sub wait_for_catchup @@ -1364,24 +1387,25 @@ sub wait_for_catchup my ($self, $standby_name, $mode, $target_lsn) = @_; $mode = defined($mode) ? 
$mode : 'replay'; my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 ); - die "valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode}); - if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") ) { + die "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode}); + # Allow passing of a PostgresNode instance as shorthand + if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") ) + { $standby_name = $standby_name->name; } - if (!defined($target_lsn)) { - $target_lsn = $self->lsn; - } - $self->poll_query_until('postgres', qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';]) - or die "timed out waiting for catchup"; - return $target_lsn; + die 'target_lsn must be speci
Re: [HACKERS] Logical decoding on standby
On 22 December 2016 at 13:43, Michael Paquier wrote: > So, for 0001: > --- a/src/test/perl/PostgresNode.pm > +++ b/src/test/perl/PostgresNode.pm > @@ -93,6 +93,7 @@ use RecursiveCopy; > use Socket; > use Test::More; > use TestLib (); > +use pg_lsn qw(parse_lsn); > use Scalar::Util qw(blessed); > This depends on 0002, so the order should be reversed. Will do. That was silly. I think I should probably also move the standby tests earlier, then add a patch to update them when the results change. > +sub lsn > +{ > + my $self = shift; > + return $self->safe_psql('postgres', 'select case when > pg_is_in_recovery() then pg_last_xlog_replay_location() else > pg_current_xlog_insert_location() end as lsn;'); > +} > The design of the test should be in charge of choosing which value it > wants to get, and the routine should just blindly do the work. More > flexibility is more useful to design tests. So it would be nice to > have one routine able to switch at will between 'flush', 'insert', > 'write', 'receive' and 'replay modes to get values from the > corresponding xlog functions. Fair enough. I can amend that. > - die "error running SQL: '$$stderr'\nwhile running '@psql_params'" > + die "error running SQL: '$$stderr'\nwhile running > '@psql_params' with sql '$sql'" > if $ret == 3; > That's separate from this patch, and definitely useful. Yeah. Slipped through. I don't think it really merits a separate patch though tbh. > + if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) { > + die "valid modes are restart, confirmed_flush"; > + } > + if (!defined($target_lsn)) { > + $target_lsn = $self->lsn; > + } > That's not really the style followed by the perl scripts present in > the code regarding the use of the brackets. Do we really need to care > about the object type checks by the way? Brackets: will look / fix. Type checks (not in quoted snippet above): that's a convenience to let you pass a PostgresNode instance or a string name. 
Maybe there's a more idiomatic Perl-y way to write it. My Perl is pretty dire. > Regarding wait_for_catchup, there are two ways to do things. Either > query the standby like in the way 004_timeline_switch.pl does it or > the way this routine does. The way of this routine looks more > straight-forward IMO, and other tests should be refactored to use it. > In short I would make the target LSN a mandatory argument, and have > the caller send a standby's application_name instead of a PostgresNode > object, the current way to enforce the value of $standby_name being > really confusing. Hm, ok. I'll take a look. Making LSN mandatory so you have to pass $self->lsn is ok with me. > + my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, > 'replay' => 1 ); > What's actually the point of 'sent'? Pretty useless, but we expose it in Pg, so we might as well in the tests. > + my @fields = ('plugin', 'slot_type', 'datoid', 'database', > 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn'); > + my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ', > @fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name = > '$slot_name'"); > + $result = undef if $result eq ''; > + # hash slice, see http://stackoverflow.com/a/16755894/398670 . > Couldn't this portion be made more generic? Other queries could > benefit from that by having a routine that accepts as argument an > array of column names for example. Yeah, probably. I'm not sure where it should live though - TestLib.pm ? Not sure if there's an idiomatic way to pass a string (in this case a query) in Perl with a placeholder for interpolation of values (in this case columns). In Python you'd pass it with pre-defined %(placeholders)s for %. > Now looking at 0002 > One whitespace: > $ git diff HEAD~1 --check > src/test/perl/pg_lsn.pm:139: trailing whitespace. > +=cut Will fix. > pg_lsn sounds like a fine name, now we are more used to camel case for > module names.
And routines are written as lower case separated by an > underscore. Unsure what the intent of this is. > +++ b/src/test/perl/t/002_pg_lsn.pl > @@ -0,0 +1,68 @@ > +use strict; > +use warnings; > +use Test::More tests => 42; > +use Scalar::Util qw(blessed); > Most of the tests added don't have a description. This makes things > harder to debug when tracking an issue. > > It may be good to begin using this module within the other tests in > this patch as well. Now do we actually need it? Most of the existing > tests I recall rely on the backend's operators for the pg_lsn data > type, so this is actually duplicating an existing facility. And all the > values are just passed as-is. I added it mainly for ordered tests of whether some expected lsn had passed/increased. But maybe it makes sense to just call into the server and let it evaluate such tests. > +++ b/src/test/perl/t/001_load.pl > @@ -0,0 +1,9 @@ > +use strict; > +use warnings; > +use Test::More tests => 5; > I can guess the mea
Re: [HACKERS] Logical decoding on standby
On Tue, Dec 20, 2016 at 4:03 PM, Petr Jelinek wrote: > That's about approach, but since there are prerequisite patches in the > patchset that don't really depend on the approach I will comment about > them as well. > > 0001 and 0002 add testing infrastructure and look fine to me, possibly > committable. > > But in 0003 I don't understand the following code: >> + if (endpos != InvalidXLogRecPtr && !do_start_slot) >> + { >> + fprintf(stderr, >> + _("%s: cannot use --create-slot or --drop-slot >> together with --endpos\n"), >> + progname); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), >> + progname); >> + exit(1); >> + } > > Why are --create-slot and --endpos not allowed together? > > 0004 again looks good but depends on 0003. > > 0005 is timeline following, which is IMHO ready for committer, as are 0006 > and 0008, and I still maintain the opinion that these should go in soon. > > 0007 is unfinished as you said in your mail (missing option to specify > behavior). And the last one, 0009, is the implementation discussed above, > which I think needs rework. IMHO 0007 and 0009 should ultimately be merged. > > I think parts of this could be committed separately and are ready for > committer IMHO, but there is no way in the CF application to mark only part > of a patch set for committer to attract attention. Craig has pinged me about looking at 0001, 0002, 0004 and 0006 as those involve the TAP infrastructure. So, for 0001: --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -93,6 +93,7 @@ use RecursiveCopy; use Socket; use Test::More; use TestLib (); +use pg_lsn qw(parse_lsn); use Scalar::Util qw(blessed); This depends on 0002, so the order should be reversed.
+sub lsn +{ + my $self = shift; + return $self->safe_psql('postgres', 'select case when pg_is_in_recovery() then pg_last_xlog_replay_location() else pg_current_xlog_insert_location() end as lsn;'); +} The design of the test should be in charge of choosing which value it wants to get, and the routine should just blindly do the work. More flexibility is more useful to design tests. So it would be nice to have one routine able to switch at will between 'flush', 'insert', 'write', 'receive' and 'replay modes to get values from the corresponding xlog functions. - die "error running SQL: '$$stderr'\nwhile running '@psql_params'" + die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'" if $ret == 3; That's separate from this patch, and definitely useful. + if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) { + die "valid modes are restart, confirmed_flush"; + } + if (!defined($target_lsn)) { + $target_lsn = $self->lsn; + } That's not really the style followed by the perl scripts present in the code regarding the use of the brackets. Do we really need to care about the object type checks by the way? Regarding wait_for_catchup, there are two ways to do things. Either query the standby like in the way 004_timeline_switch.pl does it or the way this routine does. The way of this routine looks more straight-forward IMO, and other tests should be refactored to use it. In short I would make the target LSN a mandatory argument, and have the caller send a standby's application_name instead of a PostgresNode object, the current way to enforce the value of $standby_name being really confusing. + my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 ); What's actually the point of 'sent'? + my @fields = ('plugin', 'slot_type', 'datoid', 'database', 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn'); + my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ', @fields) . 
" FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'"); + $result = undef if $result eq ''; + # hash slice, see http://stackoverflow.com/a/16755894/398670 . Couldn't this portion be made more generic? Other queries could benefit from that by having a routine that accepts as argument an array of column names for example. Now looking at 0002 One whitespace: $ git diff HEAD~1 --check src/test/perl/pg_lsn.pm:139: trailing whitespace. +=cut pg_lsn sounds like a fine name, now we are more used to camel case for module names. And routines are written as lower case separated by an underscore. +++ b/src/test/perl/t/002_pg_lsn.pl @@ -0,0 +1,68 @@ +use strict; +use warnings; +use Test::More tests => 42; +use Scalar::Util qw(blessed); Most of the tests added don't have a description. This makes things harder to debug when tracking an issue. It may be good to begin using this module within the other tests in this patch as well. Now do we actually need it? Most of the existing tests I recall rely on the backend's operators for the pg_lsn data type, so this is actually duplicating an exiting facilit
Re: [HACKERS] Logical decoding on standby
On Tue, Dec 20, 2016 at 10:06 PM, Craig Ringer wrote: > On 20 December 2016 at 15:03, Petr Jelinek > wrote: >>> The biggest change in this patch, and the main intrusive part, is that >>> procArray->replication_slot_catalog_xmin is no longer directly used by >>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is >>> added, with a corresponding CheckPoint field. >>> [snip] >> >> If this mechanism would not be needed most of the time, wouldn't it be >> better to not have it and just have a way to ask physical slot about >> what's the current reserved catalog_xmin (in most cases the standby >> should actually know what it is since it's sending the hs_feedback, but >> first slot created on such standby may need to check). > > Yes, and that was actually my originally preferred approach, though > the one above does offer the advantage that if something goes wrong we > can detect it and fail gracefully. Possibly not worth the complexity > though. > > Your approach requires us to make very sure that hot_standby_feedback > does not get turned off by user or become ineffective once we're > replicating, though, since we won't have any way to detect when needed > tuples are removed. We'd probably just bail out with relcache/syscache > lookup errors, but I can't guarantee we wouldn't crash if we tried > logical decoding on WAL where needed catalogs have been removed. I dunno, Craig, I think your approach sounds more robust. It's not very nice to introduce a bunch of random prohibitions on what works with what, and it doesn't sound like it's altogether watertight anyway. Incorporating an occasional, small record into the WAL stream to mark the advancement of the reserved catalog_xmin seems like a cleaner and safer solution. We certainly do NOT want to find out about corruption only because of random relcache/syscache lookup failures, let alone crashes. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
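[Editor's note] A toy sketch of the approach endorsed above: the master logs a small WAL record whenever the reserved catalog_xmin advances, and the standby uses that record to fail affected slots gracefully on replay. This is purely illustrative Python; the record name and data shapes are hypothetical, not actual PostgreSQL WAL record types:

```python
class MasterState:
    def __init__(self):
        self.oldest_catalog_xmin = 0  # oldest catalog tuples guaranteed still present
        self.wal = []                 # stand-in for the WAL stream

    def advance_catalog_xmin(self, new_xmin):
        """Before vacuum removes catalog tuples below new_xmin, log the
        advancement so standbys learn about it through WAL replay."""
        if new_xmin > self.oldest_catalog_xmin:
            self.wal.append(("CATALOG_XMIN_ADVANCE", new_xmin))
            self.oldest_catalog_xmin = new_xmin

def standby_replay(record, slots):
    """On the standby, return slots whose catalog_xmin is now too old,
    so they can be failed with a recovery conflict instead of producing
    random syscache lookup errors later."""
    kind, advanced_to = record
    if kind == "CATALOG_XMIN_ADVANCE":
        return [s for s in slots if s["catalog_xmin"] < advanced_to]
    return []

m = MasterState()
m.advance_catalog_xmin(1000)
conflicts = standby_replay(m.wal[-1], [{"name": "slot_a", "catalog_xmin": 900},
                                       {"name": "slot_b", "catalog_xmin": 1200}])
print([s["name"] for s in conflicts])  # ['slot_a']
```

The point of detecting the conflict at replay time, rather than relying on feedback alone, is exactly the "detect it and fail gracefully" property discussed upthread.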
Re: [HACKERS] Logical decoding on standby
On 21/12/16 04:06, Craig Ringer wrote: > On 20 December 2016 at 15:03, Petr Jelinek > wrote: > >>> The biggest change in this patch, and the main intrusive part, is that >>> procArray->replication_slot_catalog_xmin is no longer directly used by >>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is >>> added, with a corresponding CheckPoint field. >>> [snip] >> >> If this mechanism would not be needed most of the time, wouldn't it be >> better to not have it and just have a way to ask physical slot about >> what's the current reserved catalog_xmin (in most cases the standby >> should actually know what it is since it's sending the hs_feedback, but >> first slot created on such standby may need to check). > > Yes, and that was actually my originally preferred approach, though > the one above does offer the advantage that if something goes wrong we > can detect it and fail gracefully. Possibly not worth the complexity > though. > > Your approach requires us to make very sure that hot_standby_feedback > does not get turned off by user or become ineffective once we're > replicating, though, since we won't have any way to detect when needed > tuples are removed. We'd probably just bail out with relcache/syscache > lookup errors, but I can't guarantee we wouldn't crash if we tried > logical decoding on WAL where needed catalogs have been removed. > > I initially ran into trouble doing that, but now think I have a way forward. > >> WRT preventing >> hs_feedback going off, we can refuse to start with hs_feedback off when >> there are logical slots detected. > > Yes. There are some ordering issues there though. We load slots quite > late in startup and they don't get tracked in checkpoints. So we might > launch the walreceiver before we load slots and notice their needed > xmin/catalog_xmin. So we need to prevent sending of > hot_standby_feedback until slots are loaded, or load slots earlier in > startup. 
The former sounds less intrusive and safer - probably just > add an "initialized" flag to ReplicationSlotCtlData and suppress > hs_feedback until it becomes true. > > We'd also have to suppress the validation callback action on the > hot_standby_feedback GUC until we know replication slot state is > initialised, and perform the check during slot startup instead. The > hot_standby_feedback GUC validation check might get called before > shmem is even set up, so we have to guard against attempts to access a > shmem segment that may not even exist yet. > > The general idea is workable though. Refuse to start if logical slots > exist and hot_standby_feedback is off or walreceiver isn't using a > physical slot. Refuse to allow hot_standby_feedback to change > These are all problems associated with replication slots being in shmem, if I understand correctly. I wonder, could we put just a bool someplace which is available early that says whether there are any logical slots defined? We don't actually need all the slot info, just to know if there are some. > >> You may ask what if user drops the slot and recreates or somehow >> otherwise messes up catalog_xmin on master, well, considering that under >> my proposal we'd first (on connect) check the slot for catalog_xmin we'd >> know about it so we'd either mark the local slots broken/drop them or >> plainly refuse to connect to the master same way as if it didn't have >> required WAL anymore (not sure which behavior is better). Note that user >> could mess up catalog_xmin even in your design the same way, so it's not >> really a regression. > > Agreed. Checking catalog_xmin of the slot when we connect is > sufficient to guard against that, assuming we can trust that the > catalog_xmin is actually in effect on the master. Consider cascading > setups, where we set our catalog_xmin but it might not be "locked in" > until the middle cascaded server relays it to the master. 
> > I have a proposed solution to that which I'll outline in a separate > patch+post; it ties in to some work on addressing the race between hot > standby feedback taking effect and queries starting on the hot > standby. It boils down to "add a hot_standby_feedback reply protocol > message". > >> Plus >> it might even be okay to only allow creating logical slots on standbys >> connected directly to master in v1. > > True. I didn't consider that. > > We haven't had much luck in the past with such limitations, but > personally I'd consider it a perfectly reasonable one. > I think it's infinitely better with that limitation than the status quo. Especially for the failover scenario (you usually won't fail over to a replica down the cascade, as it's always further behind). Not to mention that with every level of cascading you automatically get more lag, which means more bloat, so it might not even be all that desirable to go that route immediately in v1 when we don't have a way to control that bloat/maximum slot lag. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 21/12/16 04:31, Craig Ringer wrote: > On 20 December 2016 at 15:03, Petr Jelinek > wrote: > >> But in 0003 I don't understand following code: >>> + if (endpos != InvalidXLogRecPtr && !do_start_slot) >>> + { >>> + fprintf(stderr, >>> + _("%s: cannot use --create-slot or >>> --drop-slot together with --endpos\n"), >>> + progname); >>> + fprintf(stderr, _("Try \"%s --help\" for more >>> information.\n"), >>> + progname); >>> + exit(1); >>> + } >> >> Why is --create-slot and --endpos not allowed together? > > Actually, the test is fine, the error is just misleading due to my > misunderstanding. > > The fix is simply to change the error message to > > _("%s: --endpos may only be specified > with --start\n"), > > so I won't post a separate followup patch. > Ah, okay, that makes sense. --create-slot + --endpos should definitely be an allowed combination, especially now that we can extend this to optionally use a temporary slot. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 20 December 2016 at 15:03, Petr Jelinek wrote: > But in 0003 I don't understand following code: >> + if (endpos != InvalidXLogRecPtr && !do_start_slot) >> + { >> + fprintf(stderr, >> + _("%s: cannot use --create-slot or --drop-slot >> together with --endpos\n"), >> + progname); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), >> + progname); >> + exit(1); >> + } > > Why is --create-slot and --endpos not allowed together? Actually, the test is fine, the error is just misleading due to my misunderstanding. The fix is simply to change the error message to _("%s: --endpos may only be specified with --start\n"), so I won't post a separate followup patch. Okano Naoki tried to bring this to my attention earlier, but I didn't understand, as I hadn't yet realised you could use --create-slot --start; they aren't mutually exclusive. (We test earlier to ensure --start --drop-slot isn't specified, so no test for do_drop_slot is required.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 20 December 2016 at 15:03, Petr Jelinek wrote: >> The biggest change in this patch, and the main intrusive part, is that >> procArray->replication_slot_catalog_xmin is no longer directly used by >> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is >> added, with a corresponding CheckPoint field. >> [snip] > > If this mechanism would not be needed most of the time, wouldn't it be > better to not have it and just have a way to ask physical slot about > what's the current reserved catalog_xmin (in most cases the standby > should actually know what it is since it's sending the hs_feedback, but > first slot created on such standby may need to check). Yes, and that was actually my originally preferred approach, though the one above does offer the advantage that if something goes wrong we can detect it and fail gracefully. Possibly not worth the complexity though. Your approach requires us to make very sure that hot_standby_feedback does not get turned off by user or become ineffective once we're replicating, though, since we won't have any way to detect when needed tuples are removed. We'd probably just bail out with relcache/syscache lookup errors, but I can't guarantee we wouldn't crash if we tried logical decoding on WAL where needed catalogs have been removed. I initially ran into trouble doing that, but now think I have a way forward. > WRT preventing > hs_feedback going off, we can refuse to start with hs_feedback off when > there are logical slots detected. Yes. There are some ordering issues there though. We load slots quite late in startup and they don't get tracked in checkpoints. So we might launch the walreceiver before we load slots and notice their needed xmin/catalog_xmin. So we need to prevent sending of hot_standby_feedback until slots are loaded, or load slots earlier in startup. 
The former sounds less intrusive and safer - probably just add an "initialized" flag to ReplicationSlotCtlData and suppress hs_feedback until it becomes true. We'd also have to suppress the validation callback action on the hot_standby_feedback GUC until we know replication slot state is initialised, and perform the check during slot startup instead. The hot_standby_feedback GUC validation check might get called before shmem is even set up so we have to guard against attempts to access a shmem segment that may not even exist yet. The general idea is workable though. Refuse to start if logical slots exist and hot_standby_feedback is off or walreceiver isn't using a physical slot. Refuse to allow hot_standby_feedback to change > We can also refuse to connect to the > master without physical slot if there are logical slots detected. I > don't see problem with either of those. Agreed. We must also be able to reliably enforce that the walreceiver is using a replication slot to connect to the master and refuse to start if it is not. The user could change recovery.conf and restart the walreceiver while we're running, after we perform that check, so walreceiver must also refuse to start if logical replication slots exist but it has no primary slot name configured. > You may ask what if user drops the slot and recreates or somehow > otherwise messes up catalog_xmin on master, well, considering that under > my proposal we'd first (on connect) check the slot for catalog_xmin we'd > know about it so we'd either mark the local slots broken/drop them or > plainly refuse to connect to the master same way as if it didn't have > required WAL anymore (not sure which behavior is better). Note that user > could mess up catalog_xmin even in your design the same way, so it's not > really a regression. Agreed. Checking catalog_xmin of the slot when we connect is sufficient to guard against that, assuming we can trust that the catalog_xmin is actually in effect on the master. 
Consider cascading setups, where we set our catalog_xmin but it might not be "locked in" until the middle cascaded server relays it to the master. I have a proposed solution to that which I'll outline in a separate patch+post; it ties in to some work on addressing the race between hot standby feedback taking effect and queries starting on the hot standby. It boils down to "add a hot_standby_feedback reply protocol message". > Plus > it might even be okay to only allow creating logical slots on standbys > connected directly to master in v1. True. I didn't consider that. We haven't had much luck in the past with such limitations, but personally I'd consider it a perfectly reasonable one. > But in 0003 I don't understand following code: >> + if (endpos != InvalidXLogRecPtr && !do_start_slot) >> + { >> + fprintf(stderr, >> + _("%s: cannot use --create-slot or --drop-slot >> together with --endpos\n"), >> + progname); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), >> + progname); >> +
Re: [HACKERS] Logical decoding on standby
On 07/12/16 07:05, Craig Ringer wrote: > On 21 November 2016 at 16:17, Craig Ringer wrote: >> Hi all >> >> I've prepared a working initial, somewhat raw implementation for >> logical decoding on physical standbys. > > Hi all > > I've attached a significantly revised patch, which now incorporates > safeguards to ensure that we prevent decoding if the master has not > retained needed catalogs and cancel decoding sessions that are holding > up apply because they need too-old catalogs > > The biggest change in this patch, and the main intrusive part, is that > procArray->replication_slot_catalog_xmin is no longer directly used by > vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is > added, with a corresponding CheckPoint field. Vacuum notices if > procArray->replication_slot_catalog_xmin has advanced past > ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr > record with the new value before it copies it to oldestCatalogXmin. > This means that a standby can now reliably tell when catalogs are > about to be removed or become candidates for removal, so it can pause > redo until logical decoding sessions on the standby advance far enough > that their catalog_xmin passes that point. It also means that if our > hot_standby_feedback somehow fails to lock in the catalogs our slots > need on a standby, we can cancel sessions with a conflict with > recovery. > > If wal_level is < logical this won't do anything, since > replication_slot_catalog_xmin and oldestCatalogXmin will both always > be 0. > > Because oldestCatalogXmin advances eagerly as soon as vacuum sees the > new replication_slot_catalog_xmin, this won't impact catalog bloat. > > Ideally this mechanism won't generally actually be needed, since > hot_standby_feedback stops the master from removing needed catalogs, > and we make an effort to ensure that the standby has > hot_standby_feedback enabled and is using a replication slot. 
We > cannot prevent the user from dropping and re-creating the physical > slot on the upstream, though, and it doesn't look simple to stop them > turning off hot_standby_feedback or turning off use of a physical slot > after creating logical slots, either. So we try to stop users shooting > themselves in the foot, but if they do it anyway we notice and cope > gracefully. Logging catalog_xmin also helps slots created on standbys > know where to start, and makes sure we can deal gracefully with a race > between hs_feedback and slot creation on a standby. > Hi, If this mechanism would not be needed most of the time, wouldn't it be better to not have it and just have a way to ask physical slot about what's the current reserved catalog_xmin (in most cases the standby should actually know what it is since it's sending the hs_feedback, but first slot created on such standby may need to check). WRT preventing hs_feedback going off, we can refuse to start with hs_feedback off when there are logical slots detected. We can also refuse to connect to the master without physical slot if there are logical slots detected. I don't see problem with either of those. You may ask what if user drops the slot and recreates or somehow otherwise messes up catalog_xmin on master, well, considering that under my proposal we'd first (on connect) check the slot for catalog_xmin we'd know about it so we'd either mark the local slots broken/drop them or plainly refuse to connect to the master same way as if it didn't have required WAL anymore (not sure which behavior is better). Note that user could mess up catalog_xmin even in your design the same way, so it's not really a regression. In general I don't think that it's necessary to WAL log anything for this. It will not work without a slot (and therefore not via archiving) anyway, so writing to WAL does not seem to buy us anything. 
There are some interesting side effects of cascading (ie having A->B->C replication and creating logical slot on C) but those should not be insurmountable. Plus it might even be okay to only allow creating logical slots on standbys connected directly to master in v1. That's about approach, but since there are prerequisite patches in the patchset that don't really depend on the approach I will comment about them as well. 0001 and 0002 add testing infrastructure and look fine to me, possibly committable. But in 0003 I don't understand following code: > + if (endpos != InvalidXLogRecPtr && !do_start_slot) > + { > + fprintf(stderr, > + _("%s: cannot use --create-slot or --drop-slot > together with --endpos\n"), > + progname); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), > + progname); > + exit(1); > + } Why is --create-slot and --endpos not allowed together? 0004 again looks good but depends on 0003. 0005 is timeline following which is IMHO ready for committer, as is 0006 and 0008 and I still
Re: [HACKERS] Logical decoding on standby
>> --- a/contrib/pgstattuple/pgstatapprox.c >> +++ b/contrib/pgstattuple/pgstatapprox.c >> @@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat) >> TransactionId OldestXmin; >> uint64 misc_count = 0; >> >> - OldestXmin = GetOldestXmin(rel, true); >> + OldestXmin = GetOldestXmin(rel, true, false); >> bstrategy = GetAccessStrategy(BAS_BULKREAD); >> >> nblocks = RelationGetNumberOfBlocks(rel); > > This does not seem correct, you are sending false as pointer parameter. Thanks. That's an oversight from the GetOldestXmin interface change per your prior feedback. C doesn't care since null is 0 and false is 0, and I missed it when transforming the patch. > 0012: > > I think there should be parameter saying if snapshot should be exported > or not and if user asks for it on standby it should fail. Sounds reasonable. That also means clients can suppress snapshot export on the master, which as we recently learned can be desirable sometimes. > 0014 makes 0011 even more pointless. Yeah, as I said, it's a bit WIP still and needs some rebasing and rearrangement. > This also replaces the previous timeline following and decoding > threads/CF entries so maybe those should be closed in CF? I wasn't sure what to do about that, since it's all a set of related functionality. I think it's going to get more traction as a single "logical decoding on standby" feature though, since the other parts are hard to test and use in isolation. So yeah, probably, I'll do so unless someone objects. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
Hi, I did look at the code a bit. The first 6 patches seem reasonable. I don't understand why some patches are separate tbh (like 7-10, or 11). About the 0009: > diff --git a/contrib/pg_visibility/pg_visibility.c > b/contrib/pg_visibility/pg_visibility.c > index 9985e3e..4fa3ad4 100644 > --- a/contrib/pg_visibility/pg_visibility.c > +++ b/contrib/pg_visibility/pg_visibility.c > @@ -538,7 +538,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool > all_frozen) > if (all_visible) > { > /* Don't pass rel; that will fail in recovery. */ > - OldestXmin = GetOldestXmin(NULL, true); > + OldestXmin = GetOldestXmin(NULL, true, false); > } > > rel = relation_open(relid, AccessShareLock); > @@ -660,7 +660,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool > all_frozen) >* a buffer lock. And this shouldn't happen > often, so it's >* worth being careful so as to avoid false > positives. >*/ > - RecomputedOldestXmin = GetOldestXmin(NULL, > true); > + RecomputedOldestXmin = GetOldestXmin(NULL, > true, false); > > if (!TransactionIdPrecedes(OldestXmin, > RecomputedOldestXmin)) > record_corrupt_item(items, > &tuple.t_self); > diff --git a/contrib/pgstattuple/pgstatapprox.c > b/contrib/pgstattuple/pgstatapprox.c > index f524fc4..5b33c97 100644 > --- a/contrib/pgstattuple/pgstatapprox.c > +++ b/contrib/pgstattuple/pgstatapprox.c > @@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat) > TransactionId OldestXmin; > uint64 misc_count = 0; > > - OldestXmin = GetOldestXmin(rel, true); > + OldestXmin = GetOldestXmin(rel, true, false); > bstrategy = GetAccessStrategy(BAS_BULKREAD); > > nblocks = RelationGetNumberOfBlocks(rel); This does not seem correct, you are sending false as pointer parameter. 0012: I think there should be parameter saying if snapshot should be exported or not and if user asks for it on standby it should fail. 0014 makes 0011 even more pointless. Not going into deeper detail as this is still very WIP. 
I do agree with the general design though. This also replaces the previous timeline following and decoding threads/CF entries so maybe those should be closed in CF? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 26 Nov. 2016 23:40, "Robert Haas" wrote: > > On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer wrote: > >>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, > >>> are already held down by ProcArray's catalog_xmin. But that doesn't > >>> mean we haven't removed newer tuples from specific relations and > >>> logged that in xl_heap_clean, etc, including catalogs or user > >>> catalogs, it only means the clog still exists for those XIDs. > >> > >> Really? > > > > (Note the double negative above). > > > > Yes, necessarily so. You can't look up xids older than the clog > > truncation threshold at oldestXid, per our discussion on txid_status() > > and traceable commit. But the tuples from that xact aren't guaranteed > > to exist in any given relation; vacuum uses vacuum_set_xid_limits(...) > > which calls GetOldestXmin(...); that in turn scans ProcArray to find > > the oldest xid any running xact cares about. It might bump it down > > further if there's a replication slot requirement or based on > > vacuum_defer_cleanup_age, but it doesn't care in the slightest about > > oldestXmin. > > > > oldestXmin cannot advance until vacuum has removed all tuples for that > > xid and advanced the database's datfrozenxid. But a given oldestXmin > > says nothing about which tuples, catalog or otherwise, still exist and > > are accessible. > > Right. Sorry, my mistake. Phew. Had me worried there. Thanks for looking over it. Prototype looks promising so far.
Re: [HACKERS] Logical decoding on standby
On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer wrote: >>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, >>> are already held down by ProcArray's catalog_xmin. But that doesn't >>> mean we haven't removed newer tuples from specific relations and >>> logged that in xl_heap_clean, etc, including catalogs or user >>> catalogs, it only means the clog still exists for those XIDs. >> >> Really? > > (Note the double negative above). > > Yes, necessarily so. You can't look up xids older than the clog > truncation threshold at oldestXid, per our discussion on txid_status() > and traceable commit. But the tuples from that xact aren't guaranteed > to exist in any given relation; vacuum uses vacuum_set_xid_limits(...) > which calls GetOldestXmin(...); that in turn scans ProcArray to find > the oldest xid any running xact cares about. It might bump it down > further if there's a replication slot requirement or based on > vacuum_defer_cleanup_age, but it doesn't care in the slightest about > oldestXmin. > > oldestXmin cannot advance until vacuum has removed all tuples for that > xid and advanced the database's datfrozenxid. But a given oldestXmin > says nothing about which tuples, catalog or otherwise, still exist and > are accessible. Right. Sorry, my mistake. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 23 November 2016 at 03:55, Robert Haas wrote: > On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer wrote: >> On 22 November 2016 at 10:20, Craig Ringer wrote: >>> I'm currently looking at making detection of replay conflict with a >>> slot work by separating the current catalog_xmin into two effective >>> parts - the catalog_xmin currently needed by any known slots >>> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest >>> actually valid catalog_xmin where we know we haven't removed anything >>> yet. >> >> OK, more detailed plan. >> >> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, >> are already held down by ProcArray's catalog_xmin. But that doesn't >> mean we haven't removed newer tuples from specific relations and >> logged that in xl_heap_clean, etc, including catalogs or user >> catalogs, it only means the clog still exists for those XIDs. > > Really? (Note the double negative above). Yes, necessarily so. You can't look up xids older than the clog truncation threshold at oldestXid, per our discussion on txid_status() and traceable commit. But the tuples from that xact aren't guaranteed to exist in any given relation; vacuum uses vacuum_set_xid_limits(...) which calls GetOldestXmin(...); that in turn scans ProcArray to find the oldest xid any running xact cares about. It might bump it down further if there's a replication slot requirement or based on vacuum_defer_cleanup_age, but it doesn't care in the slightest about oldestXmin. oldestXmin cannot advance until vacuum has removed all tuples for that xid and advanced the database's datfrozenxid. But a given oldestXmin says nothing about which tuples, catalog or otherwise, still exist and are accessible. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer wrote: > On 22 November 2016 at 10:20, Craig Ringer wrote: >> I'm currently looking at making detection of replay conflict with a >> slot work by separating the current catalog_xmin into two effective >> parts - the catalog_xmin currently needed by any known slots >> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest >> actually valid catalog_xmin where we know we haven't removed anything >> yet. > > OK, more detailed plan. > > The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, > are already held down by ProcArray's catalog_xmin. But that doesn't > mean we haven't removed newer tuples from specific relations and > logged that in xl_heap_clean, etc, including catalogs or user > catalogs, it only means the clog still exists for those XIDs. Really? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 22 November 2016 at 10:20, Craig Ringer wrote: > I'm currently looking at making detection of replay conflict with a > slot work by separating the current catalog_xmin into two effective > parts - the catalog_xmin currently needed by any known slots > (ProcArray->replication_slot_catalog_xmin, as now), and the oldest > actually valid catalog_xmin where we know we haven't removed anything > yet. OK, more detailed plan. The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, are already held down by ProcArray's catalog_xmin. But that doesn't mean we haven't removed newer tuples from specific relations and logged that in xl_heap_clean, etc, including catalogs or user catalogs, it only means the clog still exists for those XIDs. We don't emit a WAL record when we advance oldestXid in SetTransactionIdLimit(), and doing so is useless because vacuum will have already removed needed tuples from needed catalogs before calling SetTransactionIdLimit() from vac_truncate_clog(). We know that if oldestXid is n, the true valid catalog_xmin where no needed tuples have been removed must be >= n. But we need to know the lower bound of valid catalog_xmin, which oldestXid doesn't give us. So right now a standby has no way to reliably know if the catalog_xmin requirement for a given replication slot can be satisfied. A standby can't tell based on a xl_heap_cleanup_info record, xl_heap_clean record, etc whether the affected table is a catalog or not, and shouldn't generate conflicts for non-catalogs since otherwise it'll be constantly clobbering walsenders. A 2-phase advance of the global catalog_xmin would mean that GetOldestXmin() would return a value from ShmemVariableCache, not the oldest catalog xmin from ProcArray like it does now. 
(auto)vacuum would then be responsible for: * Reading the oldest catalog_xmin from procarray * If it has advanced vs what's present in ShmemVariableCache, writing a new xlog record type recording an advance of oldest catalog xmin * advancing ShmemVariableCache's oldest catalog xmin and would do so before it called GetOldestXmin via vacuum_set_xid_limits() to determine what it can remove. GetOldestXmin would return the ProcArray's copy of the oldest catalog_xmin when in recovery, so we report it via hot_standby_feedback to the upstream, it's recorded on our physical slot, and in turn causes vacuum to advance the master's effective oldest catalog_xmin for vacuum. On the standby we'd replay the catalog_xmin advance record, advance the standby's ShmemVariableCache's oldest catalog xmin, and check to see if any replication slots, active or not, have a catalog_xmin < the new threshold. If none do, there's no conflict and we're fine. If any do, we wait max_standby_streaming_delay/max_standby_archive_delay as appropriate, then generate recovery conflicts against all backends that have an active replication slot based on the replication slot state in shmem. Those backends - walsender or normal decoding backend - would promptly die. New decoding sessions will check the ShmemVariableCache and refuse to start if their catalog_xmin is < the threshold. Since we advance it before generating recovery conflicts there's no race with clients trying to reconnect after their backend is killed with a conflict. If we wanted to get fancy we could set the latches of walsender backends at risk of conflicting and they could check ShmemVariableCache's oldest valid catalog xmin, so they could send immediate keepalives with reply_requested set and hopefully get flush confirmation from the client and advance their catalog_xmin before we terminate them as conflicting with recovery. But that can IMO be done later/separately. Going to prototype this. 
Alternate approach: --- The oldest xid in heap_xlog_cleanup_info is relation-specific, but the standby has no way to know if it's a catalog relation or not during redo and know whether to kill slots and decoding sessions based on its latestRemovedXid. Same for xl_heap_clean and the other records that can cause snapshot conflicts (xl_heap_visible, xl_heap_freeze_page, xl_btree_delete, xl_btree_reuse_page, spgxlogVacuumRedirect). Instead of adding a 2-phase advance of the global catalog_xmin, we can instead add a rider to each of these records that identifies whether it's a catalog table or not. This would only be emitted when wal_level >= logical, but it *would* increase WAL sizes a bit when logical decoding is enabled even if it's not going to be used on a standby. The rider would be a simple: typedef struct xl_rel_catalog_info { bool rel_accessible_from_logical_decoding; } xl_rel_catalog_info; or similar. During redo we call a new ResolveRecoveryConflictWithLogicalSlot function from each of those records' redo routines that does what I outlined above. This way we add more info to more xlog records, and the upstream has to use RelationIsAccessibleInLogicalDecoding() to set up the records when writing the xlogs. In exchange, we don't h
Re: [HACKERS] Logical decoding on standby
Hi, On 2016-11-21 16:17:58 +0800, Craig Ringer wrote: > I've prepared a working initial, somewhat raw implementation for > logical decoding on physical standbys. Please attach. Otherwise in a year or two it'll be impossible to look this up. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Logical decoding on standby
Hi all I've prepared a working initial, somewhat raw implementation for logical decoding on physical standbys. Since it's a series of 20 smallish patches at the moment I won't attach it. You can find the current version at time of writing here: https://github.com/postgres/postgres/compare/c5f365f3ab...2ndQuadrant:dev/logical-decoding-on-standby-pg10-v1?expand=1 i.e. the tag dev/logical-decoding-on-standby-pg10-v1 in the github repo 2ndQuadrant/postgres, and whatever I'm working on (subject to rebase, breakage, etc) lives in the branch dev/logical-decoding-on-standby-pg10 . Quickstart === Compile and install as usual; make sure to install test_decoding too. To see the functionality in action, configure with --enable-tap-tests and: make -C src/test/recovery check To try manually, initdb a master, set pg_hba.conf to 'trust' on all replication connections, append to postgresql.conf: wal_level = 'logical' max_replication_slots = 4 max_wal_senders = 4 hot_standby_feedback = on then start the master. Now psql -d 'master_connstr' -c "SELECT pg_create_physical_replication_slot('standby1');" and pg_basebackup -d 'master_connstr' -X stream -R --slot=standby1 and start the replica. You can now use pg_recvlogical to create a slot on the replica and decode changes from it, e.g. pg_recvlogical -d 'replica_connstr' -S test -P test_decoding --create-slot pg_recvlogical -d 'replica_connstr' -S 'test' -f - --start and you'll (hopefully) see subsequent changes you make on the master. If not, tell me. Patch series contents === This patch series incorporates the following changes: * Timeline following for logical slots, so they can start decoding on the correct timeline and follow timeline switches (with tests); originally [3] * Add --endpos to pg_recvlogical, with tests; originally [3] * Splitting of xmin and catalog_xmin on hot standby feedback, so logical slots on a replica only hold down catalog_xmin on the master, not the xmin for user tables (with tests). 
Minimises upstream bloat; originally [4] * Suppress export of snapshot when starting logical decoding on replica. Since we cannot allocate an xid, we cannot export snapshots on standby. Decoding clients can still do their initial setup via a slot on the master then switch over, do it via physical copy, etc. * Require hot_standby_feedback to be enabled when starting logical decoding on a standby. * Drop any replication slots from a database when redo'ing database drop, so we don't leave dangling slots on the replica (with tests). * Make the walsender respect SIGUSR1 and exit via RecoveryConflictInterrupt() when it gets PROCSIG_RECOVERY_CONFLICT_DATABASE (with tests); see [6] * PostgresNode.pm enhancements for the tests * New test coverage for logical decoding on standby Remaining issues === * The method used to make the walsender respect conflict with recovery interrupts may not be entirely safe, see walsender procsignal_sigusr1_handler thread [5]; * We probably terminate walsenders running inside an output plugin with a virtual xact whose xmin is below the upstream's global xmin, even though its catalog xmin is fine, in ResolveRecoveryConflictWithSnapshot(...). Haven't been able to test this. Need to only terminate them when the conflict affects relations accessible in logical decoding, which likely needs the upstream to send more info in WAL; * logical decoding timeline following needs tests for cascading physical replication where an intermediate node is promoted per timeline following thread [3]; * walsender may need to maintain ThisTimeLineID in more places per decoding timeline following v10 thread [3]; * it may be desirable to refactor the walsender to deliver cleaner logical decoding timeline following per the decoding timeline following v10 thread[3] also: * Nothing stops the user from disabling hot_standby_feedback on the standby or dropping and re-creating the physical slot on the master, causing needed catalog tuples to get vacuumed away. 
  Since it's not going to be safe to check slot shmem state from the
  hot_standby_feedback verify hook, and we let hot_standby_feedback
  change at runtime, this is going to be hard to fix comprehensively, so
  we need to cope with what happens when feedback fails, but:

* We don't yet detect when the upstream's catalog_xmin increases past our
  needed catalog_xmin and needed catalog tuples are vacuumed away by the
  upstream. So we don't invalidate the slot or terminate any active
  decoding sessions using the slot. Active decoding sessions often won't
  have a vtxid to use with ResolveRecoveryConflictWithVirtualXIDs(),
  transaction cancel is not going to be sufficient, and anyway it'll
  cancel too aggressively since it doesn't know it's safe to apply
  changes that affect only (non-user-catalog) heap tables without
  conflicting with decoding sessions.

... so this is definitely NOT ready for commit. It does, however, make logical decoding work on standby.

Next steps
===

S
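The horizon bookkeeping described above can be sketched as a toy model. This is illustrative Python, not PostgreSQL source, and every name in it is hypothetical: splitting xmin from catalog_xmin lets the upstream vacuum dead user-table tuples while retaining catalog tuples for the standby's slot, and the missing safety check would compare the upstream's catalog_xmin against what the slot still needs. XID wraparound is ignored here.

```python
# Toy model (illustrative, not PostgreSQL code) of split feedback horizons
# and the not-yet-implemented slot-invalidation check described above.

def tuple_removable(dead_since_xid, is_catalog, xmin, catalog_xmin):
    """A dead tuple may be vacuumed once the horizon protecting it has
    passed: catalog tuples are pinned by catalog_xmin, user-table tuples
    only by the plain xmin. (Wraparound ignored.)"""
    horizon = catalog_xmin if is_catalog else xmin
    return dead_since_xid < horizon

def slot_still_safe(upstream_catalog_xmin, slot_needed_catalog_xmin):
    """If the upstream's catalog_xmin has advanced past what the standby's
    slot still needs, required catalog tuples may already be gone and the
    slot would have to be invalidated. The patch series does not yet
    perform this check."""
    return upstream_catalog_xmin <= slot_needed_catalog_xmin

# A standby reports e.g. xmin=1000 (hot standby queries) but
# catalog_xmin=400 (oldest transaction still being decoded).
xmin, catalog_xmin = 1000, 400

# User-table tuple deleted at xid 500: removable despite the old slot.
assert tuple_removable(500, False, xmin, catalog_xmin)
# Catalog tuple deleted at xid 500: must be kept for decoding.
assert not tuple_removable(500, True, xmin, catalog_xmin)

# While feedback works, the upstream honours the slot's horizon...
assert slot_still_safe(400, 400)
# ...but if feedback fails and the upstream advances to 600, the slot's
# needed catalog tuples may be vacuumed away.
assert not slot_still_safe(600, 400)
```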
Re: [HACKERS] Logical decoding on standby
On 2016-01-20 15:11:06 +0800, Craig Ringer wrote:
> Unfortunately it's not particularly simple and nobody seems to have
> time to implement it.

FWIW, I don't think it's *that* hard.

> As Álvaro pointed out, sometimes you have to do the work if you want
> the change to happen. Or find someone with the existing skills and
> convince them to want to do it, but most of those people are already
> very, very busy.
>
> As part of the failover slots work Simon noted that:
>
> "To prevent needed rows from being removed we would need to enhance
> hot_standby_feedback so it sends both xmin and catalog_xmin to the
> master."
>
> ... which means a protocol change in the walsender protocol. So you're
> looking at that plus the other comment given above, that

Not a huge problem though.

> "We need to be able to correctly and quickly identify the timeline a
> LSN belongs to"
>
> which is new internal infrastructure and new state in the replica that
> must be maintained. Probably persistently.

I think it just needs a more efficient lookup structure over the existing tliOfPointInHistory(); the data is all there.

> (On a side note, failover slots probably won't be usable from a standby
> either. They have to write to WAL and you can't write to WAL from a
> standby. So if slot support is ever added on a standby it'll probably
> be ephemeral slots only.)

Ephemeral slots are a different thing. Anyway, at this point failover slots aren't really proposed / don't have an agreed-upon design yet, so it's a bit hard to take them into account.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
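The "more efficient lookup structure" idea can be sketched like this. This is illustrative Python, not PostgreSQL's C code, and the history layout shown is a simplification: a timeline history is an ordered list of switch points, and a timeline owns every LSN from its begin point up to, but not including, its end (switch) point, so the linear walk can be replaced with a binary search over the switch points.

```python
# Sketch (illustrative, not PostgreSQL code) of answering "which timeline
# owns this LSN?" with a binary search over ordered switch points instead
# of a linear scan of the timeline history.
import bisect

# (end_lsn, tli) pairs sorted by end_lsn; the newest timeline never ends.
INF = float("inf")
HISTORY = [(0x1000, 1), (0x5000, 2), (INF, 3)]
_ENDS = [end for end, _tli in HISTORY]

def tli_of_point(lsn):
    """Return the timeline that owns lsn: the first history entry whose
    end (switch point) lies strictly beyond lsn."""
    return HISTORY[bisect.bisect_right(_ENDS, lsn)][1]

assert tli_of_point(0x0FFF) == 1   # before the first switch
assert tli_of_point(0x1000) == 2   # a switch point starts the new TLI
assert tli_of_point(0x9999) == 3   # on the current timeline
```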
Re: [HACKERS] Logical decoding on standby
On 19 January 2016 at 23:30, Дмитрий Сарафанников wrote:

> When do you plan to add logical decoding on standby?
>
> It is useful to have a separate standby server for logical replication
> that will not break the master if you make a mistake in a plugin.

Indeed. It might PANIC it and force a restart, if that's what you mean. I guess in an extreme case you could clobber shmem silently, which would be bad. (I wonder if we could make shared_buffers mmap'ed read-only for walsender backends?) So it certainly seems like it'd be nice to have.

There's also the advantage that you could shift load off the master by having physical master=>replica replication on a fast wide link, then do logical decoding from there to send over the wire to WAN sites etc.

Unfortunately it's not particularly simple and nobody seems to have time to implement it. As Álvaro pointed out, sometimes you have to do the work if you want the change to happen. Or find someone with the existing skills and convince them to want to do it, but most of those people are already very, very busy.

As part of the failover slots work Simon noted that:

"To prevent needed rows from being removed we would need to enhance hot_standby_feedback so it sends both xmin and catalog_xmin to the master."

... which means a protocol change in the walsender protocol. So you're looking at that plus the other comment given above, that

"We need to be able to correctly and quickly identify the timeline a LSN belongs to"

which is new internal infrastructure and new state in the replica that must be maintained. Probably persistently.

In other words ... this isn't especially simple to do. It requires at least two non-trivial core changes. Nobody who has the skills to do it reasonably quickly has the time to do it; they all have other, higher priority things to do first. So it's not likely to happen soon unless you or someone like you steps up and has a go at it.
If you do decide to attempt to implement it, and you're willing to read a fair bit of code and documentation to do so, you'll find people here pretty helpful with suggestions, ideas, and help if you get stuck. But only if you put in the work yourself.

(On a side note, failover slots probably won't be usable from a standby either. They have to write to WAL, and you can't write to WAL from a standby. So if slot support is ever added on a standby it'll probably be ephemeral slots only.)

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services