On Thu, Jan 27, 2022 at 2:32 PM Heikki Linnakangas <hlinn...@iki.fi> wrote: > Unlogged relations are not WAL-logged, but creating the init-fork is. > There are a few things around that seem sloppy: > > 1. In index_build(), we do this: > > > */ > > if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED > > && > > !smgrexists(RelationGetSmgr(indexRelation), INIT_FORKNUM)) > > { > > smgrcreate(RelationGetSmgr(indexRelation), INIT_FORKNUM, > > false); > > indexRelation->rd_indam->ambuildempty(indexRelation); > > } > > Shouldn't we call log_smgrcreate() here? Creating the init fork is > otherwise not WAL-logged at all.
Yes, that's a bug. > 2. Some implementations of ambuildempty() use the buffer cache (hash, > gist, gin, brin), while others bypass it and call smgrimmedsync() > instead (btree, spgist, bloom). I don't see any particular reason for > those decisions, it seems to be based purely on which example the author > happened to copy-paste. I thought that this inconsistency was odd when I was developing the unlogged feature, but I tried to keep each routine's ambuildempty() consistent with whatever ambuild() was doing. I don't mind if you want to change it, though. > 3. Those ambuildempty implementations that bypass the buffer cache use > smgrwrite() to write the pages. That doesn't make any difference in > practice, but in principle it's wrong: You are supposed to use > smgrextend() when extending a relation. That's a mistake on my part. > 4. Also, the smgrwrite() calls are performed before WAL-logging the > pages, so the page that's written to disk has 0/0 as the LSN, not the > LSN of the WAL record. That's harmless too, but seems a bit sloppy. That is also a mistake on my part. > 5. In heapam_relation_set_new_filenode(), we do this: > > > > > /* > > * If required, set up an init fork for an unlogged table so that it > > can > > * be correctly reinitialized on restart. An immediate sync is > > required > > * even if the page has been logged, because the write did not go > > through > > * shared_buffers and therefore a concurrent checkpoint may have > > moved the > > * redo pointer past our xlog record. Recovery may as well remove it > > * while replaying, for example, XLOG_DBASE_CREATE or > > XLOG_TBLSPC_CREATE > > * record. Therefore, logging is necessary even if wal_level=minimal. > > */ > > if (persistence == RELPERSISTENCE_UNLOGGED) > > { > > Assert(rel->rd_rel->relkind == RELKIND_RELATION || > > rel->rd_rel->relkind == RELKIND_MATVIEW || > > rel->rd_rel->relkind == RELKIND_TOASTVALUE); > > smgrcreate(srel, INIT_FORKNUM, false); > > log_smgrcreate(newrnode, INIT_FORKNUM); > > smgrimmedsync(srel, INIT_FORKNUM); > > } > > The comment doesn't make much sense, we haven't written nor WAL-logged > any page here, with nor without the buffer cache. It made more sense > before commit fa0f466d53. Well, it seems to me (and perhaps I am just confused) that complaining that there's no page written here might be a technicality. The point is that there's no synchronization between the work we're doing here -- which is creating a fork, not writing a page -- and any concurrent checkpoint. So we both need to log it, and also sync it immediately. -- Robert Haas EDB: http://www.enterprisedb.com