Hi Christophe,

On 2/25/19 7:24 PM, Christophe Pettus wrote:

On Feb 25, 2019, at 08:55, Stephen Frost <sfr...@snowman.net> wrote:

I honestly do doubt that they have had the same experiences that I have had

Well, I guarantee you that no two people on this list have had identical experiences. :)  
I certainly have been bitten by the problems with the current system.  But the resistance 
to major version upgrades is *huge*, and I'm strongly biased against anything that will 
make that harder.  I'm not sure I'm communicating how big a problem telling many large 
installations, "If you move to v12/13/etc., you will have to change your backup 
system" is going to be.

I honestly think you are underestimating how bad this can be.

The prevailing wisdom is that it's unfortunate that these backup_label files get left around, but they can be removed with scripting, so it's no big deal. After that, the cluster will start.

But -- if you are too aggressive about removing the backup_label and accidentally do it before a real restore from backup, then you have a corrupt cluster. Totally silent, but definitely corrupt. You'll probably only see it when you start getting consistency errors from the indexes, if ever. Page checksums won't catch it either unless you are *lucky* enough to have a torn page.

Erroneous scripting of this kind can also affect backups that were made with the non-exclusive method since the backups look the same.
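
To make the failure mode concrete, here is a minimal sketch of the kind of cleanup script described above. It is an illustration under assumptions, not anyone's actual tooling: the names naive_cleanup and make_data_dir are hypothetical, the temporary directories only stand in for real data directories, and the label text is a placeholder. The point it shows is that the script sees the same file in both cases and has nothing to tell them apart.

```python
import os
import tempfile


def naive_cleanup(data_dir):
    """Hypothetical startup hook: delete backup_label if it is present.

    It cannot tell a crashed exclusive backup from a freshly restored
    backup, because both leave the same file in the data directory.
    """
    label = os.path.join(data_dir, "backup_label")
    if os.path.exists(label):
        os.remove(label)
        return True   # label was removed
    return False      # nothing to do


def make_data_dir(label_text):
    """Create a stand-in data directory that contains a backup_label."""
    data_dir = tempfile.mkdtemp()
    with open(os.path.join(data_dir, "backup_label"), "w") as f:
        f.write(label_text)
    return data_dir


# Placeholder contents; a real backup_label holds more than this.
LABEL = "START WAL LOCATION: ...\n"

# Case 1: server crashed during an exclusive backup.
# Removing the label here is what lets the cluster start again.
crashed = make_data_dir(LABEL)

# Case 2: data directory was just restored from a backup.
# Removing the label here silently corrupts the cluster.
restored = make_data_dir(LABEL)

removed_crashed = naive_cleanup(crashed)
removed_restored = naive_cleanup(restored)
```

Both calls return True: the script removes the label in the restored directory just as readily as in the crashed one.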

fsync() is the major corruption issue we are facing right now, but that doesn't mean there aren't other sources of corruption we should be thinking about. I've thought about this one a lot and it scares me.

I've worked on ways to make it better, but all of them break something and involve compromises that are nearly as severe as removing exclusive backups entirely.

Regards,
--
-David
da...@pgmasters.net
