Hi,

On 1/24/23 12:21 AM, Melanie Plageman wrote:
I'm new to this thread and subject, but I had a few basic thoughts about
the first patch in the set.


Thanks for looking at it!

On Mon, Jan 23, 2023 at 12:03:35PM +0100, Drouvot, Bertrand wrote:

Please find V42 attached.

 From 3c206bd77831d507f4f95e1942eb26855524571a Mon Sep 17 00:00:00 2001
From: bdrouvotAWS <bdrou...@amazon.com>
Date: Mon, 23 Jan 2023 10:07:51 +0000
Subject: [PATCH v42 1/6] Add info in WAL records in preparation for logical
  slot conflict handling.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Overall design:

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing replication conflicts much as hot standby does.

It is a little confusing to mention replication conflicts in point 1. It
makes it sound like it already logs a recovery conflict. Without the
recovery conflict handling in this patchset, logical decoding of
statements using data that has been removed will fail with some error
like:
        ERROR:  could not map filenumber "xxx" to relation OID
Part of what this patchset does is introduce the concept of a new kind
of recovery conflict and a resolution process.

I think I understand what you mean, what about the following?

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby.

To prevent those errors, a new replication conflict scenario
needs to be addressed (much as hot standby does).

2. Our chosen strategy for dealing with this type of replication slot
conflict is to invalidate logical slots for which needed data has been
removed.

3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access.

It isn't clear from the above sentence why you would need both. I think
it has something to do with what is below (hot_standby_feedback being
off), but I'm not sure, so the order is confusing.

Right, it also has to deal with the xid horizons. So the idea is to check
1) whether there is a risk of conflict and 2) if there is a risk, whether
there actually is a conflict (based on the xid). I'll reword this part.
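
To illustrate the idea, here is a minimal sketch of the two-step check
during WAL replay on the standby. InvalidateConflictingLogicalSlots() is
a made-up name standing in for the resolution logic added by the later
patches in the set:

/*
 * Sketch only: two-step conflict check during WAL replay.
 */
static void
CheckForLogicalSlotConflict(TransactionId snapshotConflictHorizon,
							bool isCatalogRel)
{
	/* 1) Is there a risk of conflict?  Only catalog rows matter here. */
	if (!isCatalogRel)
		return;

	/*
	 * 2) Is there an actual conflict?  A logical slot whose catalog_xmin
	 * precedes or equals the horizon may still need the removed rows.
	 */
	if (TransactionIdIsValid(snapshotConflictHorizon))
		InvalidateConflictingLogicalSlots(snapshotConflictHorizon);
}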

4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.

5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.

Why do we need this for logical decoding on standby?

First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.

This is done thanks to the catalog_xmin associated with the logical
replication slot.

But, with logical decoding on standby, in the following cases:

- hot_standby_feedback is off
- hot_standby_feedback is on but there is no physical slot between
   the primary and the standby. Then, hot_standby_feedback will work,
   but only while the connection is alive (for example, a node restart
   would break it).

Then, the primary may delete system catalog rows that could be needed
by logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).

So, it’s mandatory to identify those rows and invalidate any slots
that may need them. Identifying those rows is the purpose of this
commit.

I would like a few more specifics about this commit (patch 1 in the set)
itself in the commit message.

I think it would be good to have the commit message mention what kinds
of operations require WAL to contain information about whether or not
they operate on a catalog table, and why.

For example, I think the explanation of the feature makes it clear why
vacuum and pruning operations would require isCatalogRel, however it
isn't immediately obvious why page reuse would.


What do you think about putting those extra explanations in the code instead?
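
For the page reuse case, I'm thinking of something along these lines
(the wording is just a first draft):

/*
 * Although page reuse does not remove any rows, the page being
 * recycled may still be visited by an in-progress index scan that
 * uses a historic snapshot for logical decoding.  So, when the index
 * is on a catalog relation, page reuse can conflict with logical
 * decoding on a standby just like row removal does, hence the
 * isCatalogRel flag in the page reuse WAL records.
 */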

Also, because the diff has so many function signatures changed, it is
hard to tell which xlog record types are actually being changed to
include isCatalogRel. It might be too detailed/repetitive for the final
commit message to have a list of the xlog types requiring this info
(gistxlogPageReuse, spgxlogVacuumRedirect, xl_hash_vacuum_one_page,
xl_btree_reuse_page, xl_btree_delete, xl_heap_visible, xl_heap_prune,
xl_heap_freeze_page) but perhaps you could enumerate all the general
operations (freeze page, vacuum, prune, etc).


Right, in the end only a few of them make "real" use of it: those can be
enumerated in the commit message. Will do.

Implementation:

When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field
isCatalogRel in such WAL records (and a new bit set in the
xl_heap_visible flags field), that is true for catalog tables, so as to
arrange for conflict handling.

You do mention it a bit here, but I think it could be more clear and
specific.

Ok, will try to be more clear.
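
As an example of what the "arranging" looks like on the primary side,
the new field is derived directly from the relation when the WAL record
is assembled, along these lines (a simplified sketch based on the heap
pruning case; RelationIsAccessibleInLogicalDecoding() is the existing
macro from utils/rel.h):

	xl_heap_prune xlrec;

	/* Sketch: fill in the pruning record, including the new field. */
	xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
	xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
	xlrec.nredirected = prstate.nredirected;
	xlrec.ndead = prstate.ndead;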


diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index ba394f08f6..235f1a1843 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -348,7 +348,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
                for (; ptr; ptr = ptr->next)
                {
                        /* Allocate new page */
-                       ptr->buffer = gistNewBuffer(rel);
+                       ptr->buffer = gistNewBuffer(heapRel, rel);
                        GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
                        ptr->page = BufferGetPage(ptr->buffer);
                        ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index d21a308d41..a87890b965 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -298,7 +298,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
                Page            page;

                /* initialize the root page */
-               buffer = gistNewBuffer(index);
+               buffer = gistNewBuffer(heap, index);
                Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
                page = BufferGetPage(buffer);
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 56451fede1..119e34ce0f 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -821,7 +821,7 @@ gistcheckpage(Relation rel, Buffer buf)
   * Caller is responsible for initializing the page by calling GISTInitBuffer
   */
  Buffer
-gistNewBuffer(Relation r)
+gistNewBuffer(Relation heaprel, Relation r)
  {

It is not very important, but I noticed you made "heaprel" the last
parameter to all of the btree-related functions but the first parameter
to the gist functions. I thought it might be nice to make the order
consistent.

Agree, will do.

I also was wondering why you made it the last argument to
all the btree functions to begin with (i.e. instead of directly after
the first rel argument).


No real reason; I'll put all of them after the first rel argument (that
seems a better place).

diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 8af33d7b40..9bdac12baf 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -440,7 +440,7 @@ extern XLogRecPtr gistXLogPageDelete(Buffer buffer,
                                                                         FullTransactionId xid, Buffer parentBuffer,
                                                                         OffsetNumber downlinkOffset);
-extern void gistXLogPageReuse(Relation rel, BlockNumber blkno,
+extern void gistXLogPageReuse(Relation heaprel, Relation rel, BlockNumber blkno,
                                                          FullTransactionId deleteXid);
extern XLogRecPtr gistXLogUpdate(Buffer buffer,
@@ -485,7 +485,7 @@ extern bool gistproperty(Oid index_oid, int attno,
  extern bool gistfitpage(IndexTuple *itvec, int len);
  extern bool gistnospace(Page page, IndexTuple *itvec, int len, OffsetNumber todelete, Size freespace);
  extern void gistcheckpage(Relation rel, Buffer buf);
-extern Buffer gistNewBuffer(Relation r);
+extern Buffer gistNewBuffer(Relation heaprel, Relation r);
  extern bool gistPageRecyclable(Page page);
  extern void gistfillbuffer(Page page, IndexTuple *itup, int len,
                                                   OffsetNumber off);
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 09f9b0f8c6..191f0e5808 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -51,13 +51,13 @@ typedef struct gistxlogDelete
  {
        TransactionId snapshotConflictHorizon;
        uint16          ntodelete;              /* number of deleted offsets */
+       bool        isCatalogRel;

In some of these struct definitions, I think it would help comprehension
to have a comment explaining the purpose of this member.
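For example, something like:

	bool		isCatalogRel;	/* to handle recovery conflict during logical
								 * decoding on standby */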


Yeah, agreed, but it could be done in another patch (outside of this
feature), don't you think?

Thanks for all your hard work on this feature!

Thanks for the review and the feedback!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
