Hi, On 2022-01-20 19:15:08 +0000, Bossart, Nathan wrote: > > * Why is it okay to ignore lstat failure? Seems like we might > > as well not even have the lstat. > > I added error checking for lstat().
It seems odd to change a bunch of things to not be errors anymore, but then add new sources of errors. If we try to deal with concurrent deletions or permission issues - otherwise what's the point of making unlink() not an error anymore - why do we expect to be able to lstat()? > I also updated CheckPointSnapBuild() and UpdateLogicalMappings() with > similar adjustments. FWIW, I still think the ERROR->LOG changes are bad idea. The whole thing of "oh, let's just ignore stuff that we don't expect and soldier on" has bitten us over and over again. It makes us less robust, not more robust. It's also just about impossible to monitor for problems that emit LOG. I'd be more on board accepting some selective errors. E.g. not erroring on ENOENT, but continuing to error on others (most likely ENOACCESS). I think we should *not* change the > + /* > + * We just log a message if a file doesn't fit the pattern, it's > + * probably some editor's lock/state file or similar... > + */ An editor's lock file that starts with map- would presumably be the whole filename plus an additional file-ending. But this check won't catch those. > + * Unlike failures to unlink() old logical rewrite > files, we must > + * ERROR if we're unable to sync transient state down > to disk. This > + * allows replay to assume that everything written out > before > + * checkpoint start is persisted. > */ Hm, not quite happy with the second bit. Checkpointed state being durable isn't about replay directly, it's about the basic property of a checkpoint being fulfilled? > > pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC); > if (pg_fsync(fd) != 0) > @@ -1282,4 +1305,7 @@ CheckPointLogicalRewriteHeap(void) > } > } > FreeDir(mappings_dir); > + > + /* persist directory entries to disk */ > + fsync_fname("pg_logical/mappings", true); > } This is probably worth backpatching, if you split it out I'll do so. Greetings, Andres Freund