Hi, On 2015-12-04 21:57:54 +0900, Michael Paquier wrote: > On Fri, Dec 4, 2015 at 5:10 PM, Andres Freund <and...@anarazel.de> wrote: > >> Let's go for XLOG_FPI_FLUSH. > > > > I think the other way is a bit better, because we can add new flags > > without changing the WAL format. > > Hm. On the contrary, I think that it would make more sense to have a > flag as well for FOR_HINT honestly, those are really the same > operations, and FOR_HINT is just here for statistic purposes.
I don't think it's all that much the same operation. And WRT statistics purpose: Being able to easily differentiate FOR_HINT is important for pg_xlogdump --stats, but not at all for XLOG_FPI_FLUSH. > >> --- a/src/backend/access/transam/xloginsert.c > >> +++ b/src/backend/access/transam/xloginsert.c > >> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) > >> * If the page follows the standard page layout, with a PageHeader and > >> unused > >> * space between pd_lower and pd_upper, set 'page_std' to TRUE. That > >> allows > >> * the unused space to be left out from the WAL record, making it smaller. > >> + * > >> + * If 'is_flush' is set to TRUE, relation will be requested to flush > >> + * immediately its data at replay after replaying this full page image. > >> */ > > > > s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the > > OS immediately after replaying the record'? > > s/OS/stable storage? It's not flushed to stable storage here. Just to the OS. > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c > index 99337b0..fff48ab 100644 > --- a/src/backend/access/brin/brin.c > +++ b/src/backend/access/brin/brin.c > @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS) > brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index), > BRIN_CURRENT_VERSION); > MarkBufferDirty(metabuf); > - log_newpage_buffer(metabuf, false); > + > + /* > + * For unlogged relations, this page should be immediately flushed > + * to disk after being replayed. This is necessary to ensure that the > + * initial on-disk state of unlogged relations is preserved as the > + * on-disk files are copied directly at the end of recovery. > + */ > + log_newpage_buffer(metabuf, false, > + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED); > END_CRIT_SECTION(); > > UnlockReleaseBuffer(metabuf); > diff --git a/src/backend/access/brin/brin_pageops.c > b/src/backend/access/brin/brin_pageops.c > index f876f62..572fe20 100644 > --- a/src/backend/access/brin/brin_pageops.c > +++ b/src/backend/access/brin/brin_pageops.c > @@ -865,7 +865,7 @@ brin_initialize_empty_new_buffer(Relation idxrel, Buffer > buffer) > page = BufferGetPage(buffer); > brin_page_init(page, BRIN_PAGETYPE_REGULAR); > MarkBufferDirty(buffer); > - log_newpage_buffer(buffer, true); > + log_newpage_buffer(buffer, true, false); > END_CRIT_SECTION(); > > /* > diff --git a/src/backend/access/gin/gininsert.c > b/src/backend/access/gin/gininsert.c > index 49e9185..17c168a 100644 > --- a/src/backend/access/gin/gininsert.c > +++ b/src/backend/access/gin/gininsert.c > @@ -450,14 +450,22 @@ ginbuildempty(PG_FUNCTION_ARGS) > ReadBufferExtended(index, INIT_FORKNUM, P_NEW, RBM_NORMAL, > NULL); > LockBuffer(RootBuffer, BUFFER_LOCK_EXCLUSIVE); > > - /* Initialize and xlog metabuffer and root buffer. */ > + /* > + * Initialize and xlog metabuffer and root buffer. For unlogged > + * relations, those pages need to be immediately flushed to disk > + * after being replayed. This is necessary to ensure that the > + * initial on-disk state of unlogged relations is preserved when > + * they get reset at the end of recovery. > + */ > START_CRIT_SECTION(); > GinInitMetabuffer(MetaBuffer); > MarkBufferDirty(MetaBuffer); > - log_newpage_buffer(MetaBuffer, false); > + log_newpage_buffer(MetaBuffer, false, > + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED); > GinInitBuffer(RootBuffer, GIN_LEAF); > MarkBufferDirty(RootBuffer); > - log_newpage_buffer(RootBuffer, false); > + log_newpage_buffer(RootBuffer, false, > + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED); > END_CRIT_SECTION(); > > /* Unlock and release the buffers. */ > diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c > index 53bccf6..6a20031 100644 > --- a/src/backend/access/gist/gist.c > +++ b/src/backend/access/gist/gist.c > @@ -84,7 +84,8 @@ gistbuildempty(PG_FUNCTION_ARGS) > START_CRIT_SECTION(); > GISTInitBuffer(buffer, F_LEAF); > MarkBufferDirty(buffer); > - log_newpage_buffer(buffer, true); > + log_newpage_buffer(buffer, true, > + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED); > END_CRIT_SECTION(); > I might be missing something here - but isn't it pretty much guaranteed that all these are unlogged relations? Otherwise *buildempty() wouldn't have been called, right? > else if (info == XLOG_BACKUP_END) > { > @@ -178,9 +183,6 @@ xlog_identify(uint8 info) > case XLOG_FPI: > id = "FPI"; > break; > - case XLOG_FPI_FOR_HINT: > - id = "FPI_FOR_HINT"; > - break; > } Really don't want to do that. > @@ -9391,14 +9394,34 @@ xlog_redo(XLogReaderState *record) > * resource manager needs to generate conflicts, it has to > define a > * separate WAL record type and redo routine. > * > - * XLOG_FPI_FOR_HINT records are generated when a page needs to > be > - * WAL- logged because of a hint bit update. They are only > generated > - * when checksums are enabled. There is no difference in > handling > - * XLOG_FPI and XLOG_FPI_FOR_HINT records, they use a different > info > - * code just to distinguish them for statistics purposes. > + * Records flagged with 'for_hint_bits' are generated when a > page needs > + * to be WAL- logged because of a hint bit update. They are only > + * generated when checksums are enabled. There is no difference > in > + * handling records when this flag is set, it is used for > statistics > + * purposes. > + * > + * Records flagged with 'is_flush' indicate that the page > immediately > + * needs to be written to disk, not just to shared buffers. > This is > + * important if the on-disk state is to be the authoritative, > not the > + * state in shared buffers. E.g. because on-disk files may > later be > + * copied directly. > */ > if (XLogReadBufferForRedo(record, 0, &buffer) != BLK_RESTORED) > elog(ERROR, "unexpected XLogReadBufferForRedo result > when restoring backup block"); > + > + if (xlrec.is_flush) > + { > + RelFileNode rnode; > + ForkNumber forknum; > + BlockNumber blkno; > + SMgrRelation srel; > + > + (void) XLogRecGetBlockTag(record, 0, &rnode, &forknum, > &blkno); > + srel = smgropen(rnode, InvalidBackendId); > + smgrwrite(srel, forknum, blkno, BufferGetPage(buffer), > false); > + smgrclose(srel); > + } That'd leave the dirty bit set on the buffer... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers