Re: Missing pg_control crashes postmaster

Andres Freund Wed, 25 Jul 2018 08:10:41 -0700

Hi,

On 2018-07-25 10:52:08 -0400, David Steele wrote:
> On 7/25/18 10:37 AM, Andres Freund wrote:
> > On July 25, 2018 7:18:30 AM PDT, David Steele <da...@pgmasters.net> wrote:
> > > 
> > > It seems like an easy win if we can find a safe way to do it, though I
> > > admit that this is only a benefit in corner cases.
> > 
> > What would we win here? Which scenario that's not contrived would be less
> > bad due to the proposed change?  This seems like complexity for its own sake.
> 
> I think it's worth preserving pg_control even in the case where there is
> other damage to the cluster.  The alternative in this case (if no backup
> exists) is to run pg_resetwal which means data since the last checkpoint
> will not be written out causing even more data loss.  I have run clusters
> with checkpoint_timeout = 60m so data loss in this case is a real concern.


Wait, what? How is "data loss in this case is a real concern"? Not even
a remotely realistic scenario has been described so far where this
matters.


> I favor the contrived scenario that helps preserve the current cluster
> instead of a hypothetical newly init'd one.  I also don't think that users
> deleting files out of a cluster is all that contrived.

But trying to limp on in that case, and that being helpful, is.


> Adding O_CREAT to open() doesn't seem too complex to me.  I'm not really in
> favor of the renaming idea, but I'm not against it either if it gets me a
> copy of the pg_control file.

The problem is that that'll just hide the issue for a bit longer while
the server keeps running (due to the O_CREAT we'll not PANIC anymore).
That can lead to a lot of followup issues, like checkpoints removing old
WAL that'd have been useful for data recovery.
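
To make the difference concrete, here is a minimal sketch of the two
open() variants. It is plain POSIX, not PostgreSQL source; the helper
names open_existing() and open_or_create() and the 0600 mode are
assumptions made up for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): open a file for update
 * without O_CREAT.  Returns 0 on success, or the errno value from
 * open() on failure, so a deleted file is reported as ENOENT.
 */
int
open_existing(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return errno;       /* caller sees the missing file */
    close(fd);
    return 0;
}

/*
 * Hypothetical helper: the same open() with O_CREAT added.  A missing
 * file is silently recreated (empty), so the caller gets no error and
 * never learns that the original file was gone.
 */
int
open_or_create(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

For a path that does not exist, open_existing() returns ENOENT, while
open_or_create() returns 0 and leaves a new empty file behind.  The
first gives the caller something to act on; the second makes the
deletion invisible at the point where it could have been caught.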

Greetings,

Andres Freund
