On 11/17/23 00:18, Andres Freund wrote:
I've often had to analyze what caused corruption in PG instances, where the symptoms match not having had backup_label in place when bringing up the node. However, that's surprisingly hard - the only log messages that indicate use of backup_label are at DEBUG1. Given how crucial use of backup_label is and how frequently people do get it wrong, I think we should add a LOG message - it's not like use of backup_label is a frequent thing in the life of a postgres instance and is going to swamp the log. And I think we should backpatch that addition.
+1 for the message and I think a backpatch is fine as long as it is a new message. If monitoring systems can't handle an unrecognized message then that feels like a problem on their part.
Medium term I think we should go further, and leave evidence in pg_control about the last use of ControlFile->backupStartPoint, instead of resetting it.
Michael also thinks this is a good idea.
I realize that there's a discussion about removing backup_label - but I think that's fairly orthogonal. Should we go with the pg_control approach, we should still emit a useful message when starting in a state that's "equivalent" to having used the backup_label.
Agreed, this new message could easily be adapted to the recovery in pg_control patch.
Regards,
-David