Hi.

Trying to catch up.

On 2017/09/25 13:43, Michael Paquier wrote:
> On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Added and updated the comments for both btree and hash index patches.
> 
> I don't have real complaints about this patch, this looks fine to me.
> 
> +    * Currently, the advantage of setting pd_lower is in limited cases like
> +    * during wal_consistency_checking or while logging for unlogged relation
> +    * as for all other purposes, we initialize the metapage.  Note, it also
> +    * helps in page masking by allowing to mask unused space.
> I would have reworked this comment a bit, say like that:
> Setting pd_lower is useful for two cases which make use of WAL
> compressibility even if the meta page is initialized at replay:
> - Logging of init forks for unlogged relations.
> - wal_consistency_checking logs extra full-page writes, and this
> allows masking of the unused space of the page.
> 
> Now I often get complains that I suck at this exercise ;)

So, I think I understand the discussion so far and the arguments about
what we should write to explain why we're setting pd_lower to the correct
value.

Just to remind, I didn't actually start this thread [1] to address the
issue that the FPWs of meta pages written to WAL are not getting
compressed.  An external backup tool relies on pd_lower to give the
correct starting offset of the hole to compress, provided the page's other
fields suggest it has the standard page layout.  Since we discovered that
pd_lower is not set correctly in gin, brin, and spgist meta pages, I
created patches to set it correctly in each of them.  You (Michael)
pointed out that doing so also helps compress their FPW images in WAL, so
we began considering that benefit as well.  You also pointed out that the
WAL consistency checking code masks pages based on the respective masking
functions' assumptions about the page's layout properties, which the
original patches had failed to consider.  So, we updated the patches to
change the respective masking functions to mask meta pages as pages with
the standard page layout, too.

But as Tom pointed out [2], the WAL compressibility benefit cannot be
obtained unless we change how the meta page is passed to xlog.c to be
written to WAL.  So, we found all the places that write/register the meta
page to WAL and changed the respective XLogRegisterBuffer calls to include
the REGBUF_STANDARD flag.  Some of these calls already passed
REGBUF_WILL_INIT, which results in no FPW image being written to WAL, so
there would normally be no question of compressibility.  But passing
REGBUF_STANDARD still helps the case where WAL consistency checking is
enabled, because an FPW image *is* written in that case.

So, ISTM, the comments that the patches add should all say that setting
the meta pages' pd_lower to the correct value helps pass those pages to
xlog.c as compressible standard-layout pages, regardless of whether they
are actually passed that way (even though the patches do take care of the
latter as well).

Did I miss something?

Looking at Amit K's updated patches for btree and hash, it seems that he
updated the comment to say that setting pd_lower to the correct value is
*essential*, because those pages are passed as REGBUF_STANDARD pages and
hence will be compressed.  That seems to contradict what I wrote above,
but I don't think it's wrong, either.  So, I updated the gin, brin, and
spgist patches to say the same, viz. the following:

   /*
    * Set pd_lower just past the end of the metadata.  This is essential,
    * because without doing so, metadata will be lost if xlog.c compresses
    * the page.
    */

Attached updated patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/0d273805-0e9e-ec1a-cb84-d4da400b8f85%40lab.ntt.co.jp

[2] https://www.postgresql.org/message-id/22268.1504815869%40sss.pgh.pa.us
From 600caa054e98673428fc9595b13df54efe06a6a4 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/5] Set pd_lower correctly in the GIN metapage.

Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
 src/backend/access/gin/ginfast.c   | 20 ++++++++++++++++++--
 src/backend/access/gin/gininsert.c |  4 ++--
 src/backend/access/gin/ginutil.c   | 17 ++++++++++++++++-
 src/backend/access/gin/ginxlog.c   | 25 ++++++++++---------------
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 59e435465a..359de85dfc 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,6 +399,14 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
        /*
         * Write metabuffer, make xlog entry
         */
+
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) metapage)->pd_lower =
+                       ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
        MarkBufferDirty(metabuffer);
 
        if (needWal)
@@ -407,7 +415,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 
                memcpy(&data.metadata, metadata, sizeof(GinMetaPageData));
 
-               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
                XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
 
                recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
@@ -572,6 +580,13 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
                        metadata->nPendingHeapTuples = 0;
                }
 
+               /*
+                * Set pd_lower just past the end of the metadata.  This is essential,
+                * because without doing so, metadata will be lost if xlog.c compresses
+                * the page.
+                */
+               ((PageHeader) metapage)->pd_lower =
+                       ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
                MarkBufferDirty(metabuffer);
 
                for (i = 0; i < data.ndeleted; i++)
@@ -586,7 +601,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
                        XLogRecPtr      recptr;
 
                        XLogBeginInsert();
-                       XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+                       XLogRegisterBuffer(0, metabuffer,
+                                                          REGBUF_WILL_INIT | REGBUF_STANDARD);
                        for (i = 0; i < data.ndeleted; i++)
                                XLogRegisterBuffer(i + 1, buffers[i], REGBUF_WILL_INIT);
 
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
                Page            page;
 
                XLogBeginInsert();
-               XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+               XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
                XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
 
                recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
        START_CRIT_SECTION();
        GinInitMetabuffer(MetaBuffer);
        MarkBufferDirty(MetaBuffer);
-       log_newpage_buffer(MetaBuffer, false);
+       log_newpage_buffer(MetaBuffer, true);
        GinInitBuffer(RootBuffer, GIN_LEAF);
        MarkBufferDirty(RootBuffer);
        log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..f6e776dbe9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,14 @@ GinInitMetabuffer(Buffer b)
        metadata->nDataPages = 0;
        metadata->nEntries = 0;
        metadata->ginVersion = GIN_CURRENT_VERSION;
+
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) page)->pd_lower =
+                       ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
 /*
@@ -676,6 +684,13 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
        metadata->nDataPages = stats->nDataPages;
        metadata->nEntries = stats->nEntries;
 
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) metapage)->pd_lower =
+               ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
        MarkBufferDirty(metabuffer);
 
        if (RelationNeedsWAL(index))
@@ -690,7 +705,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
 
                XLogBeginInsert();
                XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
-               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
 
                recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
                PageSetLSN(metapage, recptr);
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 92cafe950b..2261447926 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
        Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
        metapage = BufferGetPage(metabuffer);
 
-       GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+       GinInitMetabuffer(metabuffer);
        memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
        PageSetLSN(metapage, lsn);
        MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
        Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
        metapage = BufferGetPage(metabuffer);
 
-       GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+       GinInitMetabuffer(metabuffer);
 
        memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
        PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
 gin_mask(char *pagedata, BlockNumber blkno)
 {
        Page            page = (Page) pagedata;
+       PageHeader      pagehdr = (PageHeader) page;
        GinPageOpaque opaque;
 
        mask_page_lsn_and_checksum(page);
@@ -776,18 +777,12 @@ gin_mask(char *pagedata, BlockNumber blkno)
        mask_page_hint_bits(page);
 
        /*
-        * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
-        * we need to apply masking for those pages.
+        * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+        * the page content.  For other pages, mask the hole if the pd_lower
+        * appears to have been set correctly.
         */
-       if (opaque->flags != GIN_META)
-       {
-               /*
-                * For GIN_DELETED page, the page is initialized to empty. Hence, mask
-                * the page content.
-                */
-               if (opaque->flags & GIN_DELETED)
-                       mask_page_content(page);
-               else
-                       mask_unused_space(page);
-       }
+       if (opaque->flags & GIN_DELETED)
+               mask_page_content(page);
+       else if (pagehdr->pd_lower > SizeOfPageHeaderData)
+               mask_unused_space(page);
 }
-- 
2.11.0

From c5c2b0ebffdb48da530b6b2786e4ca3f58354a83 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/5] Set pd_lower correctly in the BRIN index metapage

Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
 src/backend/access/brin/brin.c         |  4 ++--
 src/backend/access/brin/brin_pageops.c |  8 ++++++++
 src/backend/access/brin/brin_revmap.c  | 10 +++++++++-
 src/backend/access/brin/brin_xlog.c    | 18 ++++++++++++++++--
 4 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
                XLogBeginInsert();
                XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
-               XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+               XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
 
                recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
 
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
        brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
                                           BRIN_CURRENT_VERSION);
        MarkBufferDirty(metabuf);
-       log_newpage_buffer(metabuf, false);
+       log_newpage_buffer(metabuf, true);
        END_CRIT_SECTION();
 
        UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..30bc5051ff 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,14 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
         * revmap page to be created when the index is.
         */
        metadata->lastRevmapPage = 0;
+
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) page)->pd_lower =
+               ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..a276dc79ad 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -624,6 +624,14 @@ revmap_physical_extend(BrinRevmap *revmap)
        MarkBufferDirty(buf);
 
        metadata->lastRevmapPage = mapBlk;
+
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) metapage)->pd_lower =
+               ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapage;
        MarkBufferDirty(revmap->rm_metaBuf);
 
        if (RelationNeedsWAL(revmap->rm_irel))
@@ -635,7 +643,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 
                XLogBeginInsert();
                XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
-               XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+               XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
 
                XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
 
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54a95..55d4dea891 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -234,6 +234,14 @@ brin_xlog_revmap_extend(XLogReaderState *record)
                metadata->lastRevmapPage = xlrec->targetBlk;
 
                PageSetLSN(metapg, lsn);
+
+               /*
+                * Set pd_lower just past the end of the metadata.  This is essential,
+                * because without doing so, metadata will be lost if xlog.c
+                * compresses the page.
+                */
+               ((PageHeader) metapg)->pd_lower =
+                       ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapg;
                MarkBufferDirty(metabuf);
        }
 
@@ -331,14 +339,20 @@ void
 brin_mask(char *pagedata, BlockNumber blkno)
 {
        Page            page = (Page) pagedata;
+       PageHeader      pagehdr = (PageHeader) page;
 
        mask_page_lsn_and_checksum(page);
 
        mask_page_hint_bits(page);
 
-       if (BRIN_IS_REGULAR_PAGE(page))
+       /*
+        * Regular brin pages contain unused space which needs to be masked.
+        * Similarly for meta pages, but mask it only if pd_lower appears to have
+        * been set correctly.
+        */
+       if (BRIN_IS_REGULAR_PAGE(page) ||
+               (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower > SizeOfPageHeaderData))
        {
-               /* Regular brin pages contain unused space which needs to be masked. */
                mask_unused_space(page);
        }
 }
-- 
2.11.0

From b0c7b929fdc78db4252af5f5ef5e1814cb441f59 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/5] Set pd_lower correctly in the SP-GiST index metapage

Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
 src/backend/access/spgist/spginsert.c |  4 ++--
 src/backend/access/spgist/spgutils.c  | 17 +++++++++++++++++
 src/backend/access/spgist/spgxlog.c   |  7 ++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
                 * Replay will re-initialize the pages, so don't take full pages
                 * images.  No other data to log.
                 */
-               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+               XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
                XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
                XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
 
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
        smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
                          (char *) page, true);
        log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
-                               SPGIST_METAPAGE_BLKNO, page, false);
+                               SPGIST_METAPAGE_BLKNO, page, true);
 
        /* Likewise for the root page. */
        SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..9e47883ef6 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -256,15 +256,24 @@ SpGistUpdateMetaPage(Relation index)
        if (cache != NULL)
        {
                Buffer          metabuffer;
+               Page            metapage;
                SpGistMetaPageData *metadata;
 
                metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+               metapage = BufferGetPage(metabuffer);
 
                if (ConditionalLockBuffer(metabuffer))
                {
                        metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
                        metadata->lastUsedPages = cache->lastUsedPages;
 
+                       /*
+                        * Set pd_lower just past the end of the metadata.  This is
+                        * essential, because without doing so, metadata will be lost if
+                        * xlog.c compresses the page.
+                        */
+                       ((PageHeader) metapage)->pd_lower = ((char *) metadata +
+                                                       sizeof(SpGistMetaPageData)) - (char *) metapage;
                        MarkBufferDirty(metabuffer);
                        UnlockReleaseBuffer(metabuffer);
                }
@@ -534,6 +543,14 @@ SpGistInitMetapage(Page page)
        /* initialize last-used-page cache to empty */
        for (i = 0; i < SPGIST_CACHED_PAGES; i++)
                metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+       /*
+        * Set pd_lower just past the end of the metadata.  This is essential,
+        * because without doing so, metadata will be lost if xlog.c compresses
+        * the page.
+        */
+       ((PageHeader) page)->pd_lower =
+               ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 87def79ee5..539ce8eb8c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,16 @@ void
 spg_mask(char *pagedata, BlockNumber blkno)
 {
        Page            page = (Page) pagedata;
+       PageHeader      pagehdr = (PageHeader) page;
 
        mask_page_lsn_and_checksum(page);
 
        mask_page_hint_bits(page);
 
        /*
-        * Any SpGist page other than meta contains unused space which needs to be
-        * masked.
+        * Mask the unused space, only if the page's pd_lower appears to have been
+        * set correctly.
         */
-       if (!SpGistPageIsMeta(page))
+       if (pagehdr->pd_lower > SizeOfPageHeaderData)
                mask_unused_space(page);
 }
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
