Simon Riggs wrote:
Falling back to the secondary checkpoint implies we have a corrupted or absent WAL file, so making recovery startup work correctly won't avoid the need to re-run the base backup. We'll end up with an unrecoverable error in either case, so it doesn't seem worth attempting to improve this in the way you suggest.
That's true whenever you have to fall back to a secondary checkpoint, but we still try to get the database up. One could argue that we shouldn't, of course.
Anyway, the point is that the patch relies on a non-obvious assumption. Even if the secondary checkpoint issue is a non-issue, it's not obvious (to me at least) that there aren't other similar scenarios. And someone might inadvertently break the assumption in a future patch, because it's not an obvious one; calling ReadRecord looks very innocent. We shouldn't introduce an assumption like that when we don't have to.
I think we should completely prevent access to secondary checkpoints during archive recovery, because if the primary checkpoint isn't present or is corrupt we aren't ever going to get past it to get to the pg_stop_backup() point. Hence an archive recovery can never be valid in that case. I'll do a separate patch for that because they are unrelated issues.
Well, we already don't use the secondary checkpoint if a backup label file is present. And you can take a base backup without pg_start_backup()/pg_stop_backup() if you shut down the system first (a cold backup).
--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
