On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
> Andres Freud wrote:
> >> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
> >>       START_CRIT_SECTION();
> >>       GinInitMetabuffer(MetaBuffer);
> >>       MarkBufferDirty(MetaBuffer);
> >> -     log_newpage_buffer(MetaBuffer, false);
> >> +     log_newpage_buffer(MetaBuffer, false, false);
> >>       GinInitBuffer(RootBuffer, GIN_LEAF);
> >>       MarkBufferDirty(RootBuffer);
> >> -     log_newpage_buffer(RootBuffer, false);
> >> +     log_newpage_buffer(RootBuffer, false, true);
> >>       END_CRIT_SECTION();
> >>
> > Why isn't the metapage flushed to disk?
> 
> I was not sure if we should only flush only at the last page, as pages
> just before would be already replayed. Switched.

Uh, that's not how it works! The earlier pages would just be in shared
buffers. Neither smgrwrite, nor smgrimmedsync does anything about that!

> >>  extern void InitXLogInsert(void);
> >> diff --git a/src/include/catalog/pg_control.h 
> >> b/src/include/catalog/pg_control.h
> >> index ad1eb4b..91445f1 100644
> >> --- a/src/include/catalog/pg_control.h
> >> +++ b/src/include/catalog/pg_control.h
> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
> >>  #define XLOG_END_OF_RECOVERY                 0x90
> >>  #define XLOG_FPI_FOR_HINT                            0xA0
> >>  #define XLOG_FPI                                             0xB0
> >> +#define XLOG_FPI_FOR_SYNC                            0xC0
> >
> >
> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
> > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or
> > instead adding actual record data and a 'flags' field in there? I
> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
> > different, XLOG_FPI_FOR_SYNC not so much.
> 
> Let's go for XLOG_FPI_FLUSH.

I think the other way is a bit better, because we can add new flags
without changing the WAL format.

> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> index 99337b0..b646101 100644
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>       brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>                                          BRIN_CURRENT_VERSION);
>       MarkBufferDirty(metabuf);
> -     log_newpage_buffer(metabuf, false);
> +
> +     /*
> +      * For unlogged relations, this page should be immediately flushed
> +      * to disk after being replayed. This is necessary to ensure that the
> +      * initial on-disk state of unlogged relations is preserved when
> +      * they get reset at the end of recovery.
> +      */
> +     log_newpage_buffer(metabuf, false,
> +             index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>       END_CRIT_SECTION();

Maybe write the last sentence as '... as the on disk files are copied
directly at the end of recovery.'?

> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>                                               MAIN_FORKNUM,
>                                               state->rs_blockno,
>                                               state->rs_buffer,
> -                                             true);
> +                                             true,
> +                                             false);
>               RelationOpenSmgr(state->rs_new_rel);
>  
>               PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>                                                       MAIN_FORKNUM,
>                                                       state->rs_blockno,
>                                                       page,
> -                                                     true);
> +                                                     true,
> +                                                     false);

Did you verify that that's ok when a unlogged table is clustered/vacuum
full'ed?

> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>               case XLOG_FPI_FOR_HINT:
>                       id = "FPI_FOR_HINT";
>                       break;
> +             case XLOG_FPI_FLUSH:
> +                     id = "FPI_FOR_SYNC";
> +                     break;
>       }

Old string.


> --- a/src/backend/access/transam/xloginsert.c
> +++ b/src/backend/access/transam/xloginsert.c
> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>   * If the page follows the standard page layout, with a PageHeader and unused
>   * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>   * the unused space to be left out from the WAL record, making it smaller.
> + *
> + * If 'is_flush' is set to TRUE, relation will be requested to flush
> + * immediately its data at replay after replaying this full page image.
>   */

s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
OS immediately after replaying the record'?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to