On Fri, Jan 28, 2011 at 3:08 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Any substantive comments, besides the obvious "this is not 9.1 material"?
>
> Now that I've absorbed a bit more caffeine, let's see if I can think
> straight this time.
>
> General principle you want to assert: any WAL entry that merely results
> in setting a deterministic field to a deterministic value shouldn't need
> a FPI, since it is easy to check whether the field has that value and
> re-apply the update if needed.  The way this would have to work is:
>
> 1. Page LSN < WAL location: apply field update, set page LSN = WAL location.
>
> 2. Page LSN = WAL location: check if field matches, apply update if not.
>
> 3. Page LSN > WAL location: do NOT apply field update or change LSN.
>
> Now the issue is what happens if a torn-page event causes the LSN to be
> out of sync with the page contents.  If the LSN is too small (ie, the
> actual page contents come from some later WAL entry) we may mistakenly
> apply action 1 or 2 to bytes that don't actually represent the field we
> think they do.  Now, that would be all right once we replay the later
> WAL entry and replace the page data from its FPI.  But there are two
> huge risks here: one being that in a PITR operation the user might tell
> us to stop short of applying the later WAL entry, and the other being
> that in any case we'll have an interval where the page is corrupt, which
> is a problem if any hot-standby queries try to look at it.
>
> I think we might be all right with that if we can guarantee that any such
> inconsistencies only exist before the system believes that it's reached
> a consistent database state, but I'm not quite sure if that's true or
> not.
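
For reference, the three-case rule quoted above can be written down as a
small sketch.  This is illustrative Python, not PostgreSQL code: the
one-field `Page` model and the name `redo_field_update` are hypothetical,
made up only to show the control flow, and the sketch assumes the page is
not torn.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int    # LSN stamped on the page
    field: int  # the single deterministic field the WAL record sets


def redo_field_update(page: Page, record_lsn: int, value: int) -> Page:
    """Replay one field-setting WAL record that carries no full-page image."""
    if page.lsn < record_lsn:
        # Case 1: page predates the record; apply and advance the page LSN.
        page.field = value
        page.lsn = record_lsn
    elif page.lsn == record_lsn:
        # Case 2: LSN matches; re-apply only if the field is not yet set.
        if page.field != value:
            page.field = value
    # Case 3: page.lsn > record_lsn; touch neither the field nor the LSN.
    return page
```

For example, `redo_field_update(Page(lsn=30, field=0), 20, 7)` falls under
case 3 and leaves the page untouched.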

I wasn't too sure either, at first, but I think it must be true.  If it
were possible to reach consistency while there were still torn pages on
disk, then we could enter normal running with those pages still on disk
by stopping recovery at that point.  And clearly that's not going to fly
unless we're talking about something like hint bits, where nothing's
actually busted if we only get half the update.

> Hmm, no, it doesn't work, because the above argument assumes there is a
> later FPI-containing WAL entry at all.  Suppose we have a sequence of
> several single-field-setting WAL entries, and there is no FPI-containing
> WAL entry for the page before end of WAL.  We could have a torn page
> such that the LSN comes from one of the later entries but the field that
> should be set from an earlier entry is old.  When we replay the earlier
> entry, case 3 will apply, so we do nothing ... incorrectly.  And there
> will be no FPI to fix it.

What happens if we (a) keep the current rule after reaching consistency
and (b) apply any such updates *unconditionally* - that is, without
reference to the LSN - prior to reaching consistency?  Under that rule,
if we encounter an FPI before reaching consistency, we're OK.  So let's
suppose we don't.  No matter how many times we replay any initial prefix
of any such updates between the redo pointer and the point at which we
reach consistency, the state of the page when we finally reach
consistency will be identical.

But we could get hosed if replay progressed *past* the minimum recovery
point and then started over at the previous redo pointer.  If we forced
an immediate restartpoint on reaching consistency, that seems like it
might prevent that scenario.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
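
A rough sketch of why unconditional replay before consistency should be
insensitive to restarts, in illustrative Python rather than PostgreSQL
code.  Everything here is hypothetical: the page is modelled as a dict of
named fields, `RECORDS` is made-up WAL, and the sketch covers only the
pre-consistency phase with no FPIs in the stream.

```python
# Made-up WAL between the redo pointer and the consistency point:
# (lsn, field name, value).  No record carries a full-page image.
RECORDS = [(100, "a", 1), (110, "b", 2), (120, "a", 3)]


def replay_unconditionally(fields, records):
    """Apply every field-setting record without looking at the page LSN."""
    for _lsn, name, value in records:
        fields[name] = value
    return fields


def crash_and_restart(start, records, k):
    """Replay the first k records, crash, then replay from the redo pointer."""
    fields = dict(start)
    replay_unconditionally(fields, records[:k])  # partial pass before the crash
    replay_unconditionally(fields, records)      # full pass after restart
    return fields


# A torn page: "b" already reflects LSN 110, but "a" is still the old value.
TORN = {"a": 0, "b": 2}
FINAL_STATES = [crash_and_restart(TORN, RECORDS, k) for k in range(len(RECORDS) + 1)]
```

Every entry in `FINAL_STATES` is the same, whatever prefix was replayed
before the restart.  The sketch does not model replay running past the
minimum recovery point and then starting over, which is the case the
forced restartpoint is meant to rule out.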