Re: First-draft release notes for back-branch releases

Andrew Gierth Tue, 06 Nov 2018 15:46:02 -0800

>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:


 Tom> You could be bit by any shutdown of the old code, no, whether it's
 Tom> part of a pg_upgrade or not?

Nothing to do with pg_upgrade, this is likely to bite people just doing
an update from the previous minor release.

 Tom> Also, it looks like the bug only affects standbys (or at least
 Tom> that's what the commit message seems to imply), which makes it
 Tom> less of a data-loss hazard than it might've been.

The commit message doesn't really show the severity of the problem at
all.

The problem is this: the updating of minRecoveryPoint in the control
file is almost completely broken in the last point releases. It's not an
"incorrect calculation" as the commit message says, it's that the
bgwriter and checkpointer _do not update the value at all_ except
immediately after a checkpoint. That means that it is common to have a
situation where the recovery restartpoint is at lsn X, the
minRecoveryPoint is at a slightly later lsn Y, but there are on-disk
data pages with a _much_ later lsn Z.
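
To make the X/Y/Z relationship concrete, here is a minimal sketch. It is not PostgreSQL code: the function names, page names and LSN values are all made up for illustration. It assumes only the usual textual LSN form of two hex numbers separated by a slash. It flags on-disk pages whose LSN is later than the recorded minRecoveryPoint, which is the condition described above.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in textual 'hi/lo' hex form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pages_ahead_of_min_recovery_point(min_recovery_point: str, page_lsns: dict) -> list:
    """Return the names of pages whose on-disk LSN is later than minRecoveryPoint."""
    limit = parse_lsn(min_recovery_point)
    return sorted(name for name, lsn in page_lsns.items() if parse_lsn(lsn) > limit)


# Hypothetical values, chosen only to mirror the X < Y << Z situation.
restartpoint = "0/3000000"        # X: where recovery restarts
min_recovery_point = "0/3000100"  # Y: slightly later than X, but stale
page_lsns = {
    "btree_leaf_17": "0/5A00000",  # Z: written long after Y was last updated
    "heap_page_3": "0/2FFFF00",    # older than Y, so not a problem
}

stale = pages_ahead_of_min_recovery_point(min_recovery_point, page_lsns)
print(stale)
```

Any page this reports is one that recovery could treat as consistent before the WAL that produced it has actually been replayed.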

If such a data page was the subject of a Btree/DELETE record, then any
attempt to do recovery can PANIC with a (false) "WAL contains references
to invalid pages" error. This happens if, and only if, at least one
client (e.g. a monitoring system) is connected when the record is
replayed, and a client can be connected that early only because of the
incorrect minRecoveryPoint.

The users whose case I was diagnosing on IRC were finding that their
monitoring system was sufficient to trigger the problem at least 80% of
the time. Consider that the broken minRecoveryPoint can be quite a long
way in the past relative to on-disk data pages, so the window of
vulnerability isn't necessarily small.

So while there _probably_ isn't any data corruption, the standby can get
into a state that isn't restartable unless you know to block client
connections to it until it has caught up. Rebuilding the standby from
the master will work but that may be a significant practical problem if
the data is large.

-- 
Andrew (irc:RhodiumToad)
