Hi,

On 2015-12-04 21:57:54 +0900, Michael Paquier wrote:
> On Fri, Dec 4, 2015 at 5:10 PM, Andres Freund <and...@anarazel.de> wrote:
> >> Let's go for XLOG_FPI_FLUSH.
> >
> > I think the other way is a bit better, because we can add new flags
> > without changing the WAL format.
> 
> Hm. On the contrary, I think that it would make more sense to have a
> flag as well for FOR_HINT honestly, those are really the same
> operations, and FOR_HINT is just here for statistic purposes.

I don't think it's all that much the same operation. And WRT statistics
purpose: Being able to easily differentiate FOR_HINT is important for
pg_xlogdump --stats, but not at all for XLOG_FPI_FLUSH.

> >> --- a/src/backend/access/transam/xloginsert.c
> >> +++ b/src/backend/access/transam/xloginsert.c
> >> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
> >>   * If the page follows the standard page layout, with a PageHeader and 
> >> unused
> >>   * space between pd_lower and pd_upper, set 'page_std' to TRUE. That 
> >> allows
> >>   * the unused space to be left out from the WAL record, making it smaller.
> >> + *
> >> + * If 'is_flush' is set to TRUE, relation will be requested to flush
> >> + * immediately its data at replay after replaying this full page image.
> >>   */
> >
> > s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
> > OS immediately after replaying the record'?
> 
> s/OS/stable storage?

It's not flushed to stable storage here. Just to the OS.

> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> index 99337b0..fff48ab 100644
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>       brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>                                          BRIN_CURRENT_VERSION);
>       MarkBufferDirty(metabuf);
> -     log_newpage_buffer(metabuf, false);
> +
> +     /*
> +      * For unlogged relations, this page should be immediately flushed
> +      * to disk after being replayed. This is necessary to ensure that the
> +      * initial on-disk state of unlogged relations is preserved as the
> +      * on-disk files are copied directly at the end of recovery.
> +      */
> +     log_newpage_buffer(metabuf, false,
> +             index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>       END_CRIT_SECTION();
>  
>       UnlockReleaseBuffer(metabuf);
> diff --git a/src/backend/access/brin/brin_pageops.c 
> b/src/backend/access/brin/brin_pageops.c
> index f876f62..572fe20 100644
> --- a/src/backend/access/brin/brin_pageops.c
> +++ b/src/backend/access/brin/brin_pageops.c
> @@ -865,7 +865,7 @@ brin_initialize_empty_new_buffer(Relation idxrel, Buffer 
> buffer)
>       page = BufferGetPage(buffer);
>       brin_page_init(page, BRIN_PAGETYPE_REGULAR);
>       MarkBufferDirty(buffer);
> -     log_newpage_buffer(buffer, true);
> +     log_newpage_buffer(buffer, true, false);
>       END_CRIT_SECTION();
>  
>       /*
> diff --git a/src/backend/access/gin/gininsert.c 
> b/src/backend/access/gin/gininsert.c
> index 49e9185..17c168a 100644
> --- a/src/backend/access/gin/gininsert.c
> +++ b/src/backend/access/gin/gininsert.c
> @@ -450,14 +450,22 @@ ginbuildempty(PG_FUNCTION_ARGS)
>               ReadBufferExtended(index, INIT_FORKNUM, P_NEW, RBM_NORMAL, 
> NULL);
>       LockBuffer(RootBuffer, BUFFER_LOCK_EXCLUSIVE);
>  
> -     /* Initialize and xlog metabuffer and root buffer. */
> +     /*
> +      * Initialize and xlog metabuffer and root buffer. For unlogged
> +      * relations, those pages need to be immediately flushed to disk
> +      * after being replayed. This is necessary to ensure that the
> +      * initial on-disk state of unlogged relations is preserved when
> +      * they get reset at the end of recovery.
> +      */
>       START_CRIT_SECTION();
>       GinInitMetabuffer(MetaBuffer);
>       MarkBufferDirty(MetaBuffer);
> -     log_newpage_buffer(MetaBuffer, false);
> +     log_newpage_buffer(MetaBuffer, false,
> +             index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>       GinInitBuffer(RootBuffer, GIN_LEAF);
>       MarkBufferDirty(RootBuffer);
> -     log_newpage_buffer(RootBuffer, false);
> +     log_newpage_buffer(RootBuffer, false,
> +             index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>       END_CRIT_SECTION();
>  
>       /* Unlock and release the buffers. */
> diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
> index 53bccf6..6a20031 100644
> --- a/src/backend/access/gist/gist.c
> +++ b/src/backend/access/gist/gist.c
> @@ -84,7 +84,8 @@ gistbuildempty(PG_FUNCTION_ARGS)
>       START_CRIT_SECTION();
>       GISTInitBuffer(buffer, F_LEAF);
>       MarkBufferDirty(buffer);
> -     log_newpage_buffer(buffer, true);
> +     log_newpage_buffer(buffer, true,
> +             index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>       END_CRIT_SECTION();
>

I might be missing something here - but isn't it pretty much guaranteed
that all these are unlogged relations? Otherwise *buildempty() wouldn't
have been called, right?

>       else if (info == XLOG_BACKUP_END)
>       {
> @@ -178,9 +183,6 @@ xlog_identify(uint8 info)
>               case XLOG_FPI:
>                       id = "FPI";
>                       break;
> -             case XLOG_FPI_FOR_HINT:
> -                     id = "FPI_FOR_HINT";
> -                     break;
>       }

Really don't want to do that.

> @@ -9391,14 +9394,34 @@ xlog_redo(XLogReaderState *record)
>                * resource manager needs to generate conflicts, it has to 
> define a
>                * separate WAL record type and redo routine.
>                *
> -              * XLOG_FPI_FOR_HINT records are generated when a page needs to 
> be
> -              * WAL- logged because of a hint bit update. They are only 
> generated
> -              * when checksums are enabled. There is no difference in 
> handling
> -              * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different 
> info
> -              * code just to distinguish them for statistics purposes.
> +              * Records flagged with 'for_hint_bits' are generated when a 
> page needs
> +              * to be WAL- logged because of a hint bit update. They are only
> +              * generated when checksums are enabled. There is no difference 
> in
> +              * handling records when this flag is set, it is used for 
> statistics
> +              * purposes.
> +              *
> +              * Records flagged with 'is_flush' indicate that the page 
> immediately
> +              * needs to be written to disk, not just to shared buffers. 
> This is
> +              * important if the on-disk state is to be the authoritative, 
> not the
> +              * state in shared buffers. E.g. because on-disk files may 
> later be
> +              * copied directly.
>                */
>               if (XLogReadBufferForRedo(record, 0, &buffer) != BLK_RESTORED)
>                       elog(ERROR, "unexpected XLogReadBufferForRedo result 
> when restoring backup block");
> +
> +             if (xlrec.is_flush)
> +             {
> +                     RelFileNode     rnode;
> +                     ForkNumber      forknum;
> +                     BlockNumber     blkno;
> +                     SMgrRelation srel;
> +
> +                     (void) XLogRecGetBlockTag(record, 0, &rnode, &forknum, 
> &blkno);
> +                     srel = smgropen(rnode, InvalidBackendId);
> +                     smgrwrite(srel, forknum, blkno, BufferGetPage(buffer), 
> false);
> +                     smgrclose(srel);
> +             }

That'd leave the dirty bit set on the buffer...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to