Hi,

On 2022-01-20 19:15:08 +0000, Bossart, Nathan wrote:
> > * Why is it okay to ignore lstat failure?  Seems like we might
> > as well not even have the lstat.
> 
> I added error checking for lstat().

It seems odd to change a bunch of things to not be errors anymore, but then
add new sources of errors. If we try to deal with concurrent deletions or
permission issues - otherwise what's the point of making unlink() not an error
anymore - why do we expect to be able to lstat()?


> I also updated CheckPointSnapBuild() and UpdateLogicalMappings() with
> similar adjustments.

FWIW, I still think the ERROR->LOG changes are bad idea. The whole thing of
"oh, let's just ignore stuff that we don't expect and soldier on" has bitten
us over and over again. It makes us less robust, not more robust.

It's also just about impossible to monitor for problems that emit LOG.


I'd be more on board accepting some selective errors. E.g. not erroring on
ENOENT, but continuing to error on others (most likely ENOACCESS). I think we
should *not* change the


> +             /*
> +              * We just log a message if a file doesn't fit the pattern, it's
> +              * probably some editor's lock/state file or similar...
> +              */

An editor's lock file that starts with map- would presumably be the whole
filename plus an additional file-ending. But this check won't catch those.


> +                      * Unlike failures to unlink() old logical rewrite 
> files, we must
> +                      * ERROR if we're unable to sync transient state down 
> to disk.  This
> +                      * allows replay to assume that everything written out 
> before
> +                      * checkpoint start is persisted.
>                        */

Hm, not quite happy with the second bit. Checkpointed state being durable
isn't about replay directly, it's about the basic property of a checkpoint
being fulfilled?


>                       
> pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC);
>                       if (pg_fsync(fd) != 0)
> @@ -1282,4 +1305,7 @@ CheckPointLogicalRewriteHeap(void)
>               }
>       }
>       FreeDir(mappings_dir);
> +
> +     /* persist directory entries to disk */
> +     fsync_fname("pg_logical/mappings", true);
>  }

This is probably worth backpatching, if you split it out I'll do so.

Greetings,

Andres Freund


Reply via email to