On Thu, Jun 23, 2011 at 6:40 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Thu, 2011-06-23 at 18:18 -0400, Robert Haas wrote:
>> Lazy VACUUM is the only thing that makes a page all visible.  I don't
>> understand the part about snapshots.
>
> Lazy VACUUM is the only thing that _marks_ a page with PD_ALL_VISIBLE.
>
> After an INSERT to a new page, and after all snapshots are released, the
> page becomes all-visible; and thus subject to being marked with
> PD_ALL_VISIBLE by lazy vacuum without bumping the LSN. Note that there
> is no cleanup action that takes place here, so nothing else will bump
> the LSN either.
>
> So, let's say that we hypothetically had persistent snapshots, then
> you'd have the following problem:
>
> 1. INSERT to a new page, marking it with LSN X
> 2. WAL flushed to LSN Y (Y > X)
> 3. Some persistent snapshot (that doesn't see the INSERT) is released,
> and generates WAL recording that fact with LSN Z (Z > Y)
> 4. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> 5. page is written out because LSN is still X
> 6. crash
>
> Now, the persistent snapshot is still present because LSN Z never made
> it to disk; but the page is marked with PD_ALL_VISIBLE.
>
> Sure, if these hypothetical persistent snapshots were transactional, and
> if synchronous_commit is on, then LSN Z would be flushed before step 4;
> but that's another set of assumptions. That's why I left it simple and
> said that the assumption was "snapshots are released if there's a
> crash".

I don't really think that's a separate set of assumptions - if we had
some system whereby snapshots could survive a crash, then they'd have
to be WAL-logged (because that's how we make things survive crashes).
And anything that is WAL-logged must obey the WAL-before-data rule.
We have a system already that ensures that when
synchronous_commit=off, CLOG pages can't be flushed before the
corresponding WAL record makes it to disk.  For a system like what
you're describing, you'd need something similar - these
crash-surviving snapshots would have to make sure that no action which
depended on their state hit the disk before the WAL record marking the
state change hit the disk.
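
To make the hazard in the quoted scenario concrete, here is a minimal toy
model of the WAL-before-data rule. It is not PostgreSQL code: ToyWal,
ToyPage, write_page and run are hypothetical names invented for this
sketch, and LSNs are plain integers. It assumes the only protection is
"flush WAL up to the page's LSN before writing the page", and shows that
a flag set without bumping the page LSN can reach disk ahead of the WAL
record it depends on.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyWal:
    """Toy write-ahead log: LSNs are integers, only a prefix is durable."""
    next_lsn: int = 1
    flushed_upto: int = 0  # highest LSN known to be on disk

    def append(self) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def flush(self, upto: int) -> None:
        # Never move backwards, never flush past what has been written.
        self.flushed_upto = max(self.flushed_upto, min(upto, self.next_lsn - 1))


@dataclass
class ToyPage:
    lsn: int = 0
    all_visible: bool = False  # stands in for PD_ALL_VISIBLE


def write_page(page: ToyPage, wal: ToyWal) -> dict:
    """WAL-before-data: flush WAL up to the page LSN, then write the page."""
    wal.flush(page.lsn)
    return {"lsn": page.lsn, "all_visible": page.all_visible}


def run(bump_lsn_on_mark: bool) -> Tuple[bool, bool]:
    """Replay the quoted scenario; return (flag on disk, release record durable)."""
    wal = ToyWal()
    page = ToyPage()
    page.lsn = wal.append()          # INSERT to a new page, page LSN = X
    y = wal.append()                 # some later record at LSN Y
    wal.flush(y)                     # WAL flushed to Y (Y > X)
    z = wal.append()                 # hypothetical snapshot release at LSN Z
    page.all_visible = True          # lazy VACUUM marks the page
    if bump_lsn_on_mark:
        page.lsn = z                 # make the page depend on record Z
    on_disk = write_page(page, wal)  # page is written out
    # Crash: only WAL up to flushed_upto survives.
    return on_disk["all_visible"], wal.flushed_upto >= z


if __name__ == "__main__":
    print(run(bump_lsn_on_mark=False))
    print(run(bump_lsn_on_mark=True))
```

With bump_lsn_on_mark=False the flag is on disk but the release record
is not, which is the inconsistency described above. With
bump_lsn_on_mark=True the page write forces the WAL flush through Z
first, so the two cannot disagree after a crash.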

I guess the point you are driving at here is that a page can only go
from being all-visible to not-all-visible by virtue of being modified.
There's no other piece of state (like a persistent snapshot) that can
be lost as part of a crash that would make us need to change our minds
and decide that an all-visible XID is really not all-visible after all.
(The reverse is not true: since snapshots are ephemeral, a crash will
render every row either all-visible or dead.)  I guess I never thought
about documenting that particular aspect of it because (to me) it
seems fairly self-evident.  Maybe I'm wrong...
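
As a rough illustration of the parenthetical above, here is a toy model
of what a crash does to visibility. Everything in it is an assumption
made for the sketch: a snapshot is reduced to a single integer horizon
that sees only XIDs below it, rows are insert-only, and classify and
crash are hypothetical helpers, not PostgreSQL functions.

```python
from typing import List, Tuple


def classify(xid: int, status: str, horizons: List[int]) -> str:
    """Classify a row inserted by transaction `xid`.

    status is "committed", "in_progress" or "aborted"; horizons holds one
    integer per live snapshot, and a snapshot sees only XIDs below it.
    """
    if status == "aborted":
        return "dead"
    if status == "in_progress":
        return "not all-visible"
    # Committed: all-visible only if every live snapshot can see the XID.
    if all(xid < h for h in horizons):
        return "all-visible"
    return "not all-visible"


def crash(status: str, horizons: List[int]) -> Tuple[str, List[int]]:
    """A crash aborts in-progress transactions and discards every snapshot."""
    new_status = "aborted" if status == "in_progress" else status
    return new_status, []


if __name__ == "__main__":
    for xid, status in [(10, "committed"), (11, "in_progress")]:
        before = classify(xid, status, [5])
        after = classify(xid, *crash(status, [5]))
        print(xid, before, "->", after)
```

In this model a row can be "not all-visible" before the crash, but once
the snapshot list is empty and in-progress transactions are aborted,
only "all-visible" and "dead" remain possible.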

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
