I've been able to reproduce the problem reported by Pius Chan in bug
#7819.  With some logging added that prints the OldestXmin values used
by vacuum and cluster operations, the reason is fairly clear:

2013-02-01 13:41:12 EST 8011 LOG:  vacuuming "test" with OldestXmin 1760160 
FreezeLimit 4246727456
2013-02-01 13:41:14 EST 8011 LOG:  automatic vacuum of table 
"test.public.test": index scans: 1
        pages: 0 removed, 635 remain
        tuples: 48895 removed, 91 remain
        system usage: CPU 0.00s/0.03u sec elapsed 1.64 sec
2013-02-01 13:41:14 EST 8011 LOG:  automatic analyze of table 
"test.public.test" system usage: CPU 0.00s/0.00u sec elapsed 0.09 sec
2013-02-01 13:41:14 EST 8011 LOG:  vacuuming "pg_toast_115435" with OldestXmin 
1761518 FreezeLimit 4246728814
2013-02-01 13:41:57 EST 8011 LOG:  automatic vacuum of table 
"test.pg_toast.pg_toast_115435": index scans: 1
        pages: 0 removed, 12904 remain
        tuples: 7624 removed, 43992 remain
        system usage: CPU 0.21s/0.13u sec elapsed 42.79 sec
2013-02-01 13:41:58 EST 7025 LOG:  clustering "test" with OldestXmin 1752719 
FreezeXid 1630979
2013-02-01 13:41:58 EST 7025 ERROR:  missing chunk number 0 for toast value 
215828 in pg_toast_115435

IOW, autovacuum worker 8011 vacuumed "test" with one xmin, and then a
bit later vacuumed its toast table with a slightly newer xmin.  This
means that certain tuples' toast tuples could get cleaned out of the
toast table while the parent tuples are still there in the heap.
Normally this is fine because the parent tuples are certainly dead and
beyond snapshot range, so no one will make any attempt to fetch their
toast values.  But then, here comes CLUSTER with an older xmin ---
so it thinks it needs to copy over some of those dead-but-possibly-
still-visible-to-somebody tuples, and kaboom.

Now, CLUSTER is very careful to make sure that it gets a lock on the
toast table before computing OldestXmin, with the express intent of
preventing this type of problem (cf comment in copy_heap_data).
However, if the result of GetOldestXmin() can ever go backwards,
that doesn't help.  And guess what ... it can.

The header comment for GetOldestXmin() contains a long rationalization
for why it's safe for the result to sometimes go backwards, but now
that I've seen this example I'm pretty sure that that's all horsepucky.

I think we need to either ensure that GetOldestXmin is strictly
nondecreasing, or find some more-direct way of interlocking vacuuming
of toast tables with their parents.

We could possibly do the former by tracking the last-known value in
shared memory, but it'd be complicated because the value can
(legitimately) vary depending on which database you ask about.

The only backpatchable way I can think of to do the latter is to undo
the decoupling of main and toast-table vacuuming, and then say that
we simply don't change the value of OldestXmin when going to vacuum
the toast table.  This is a bit unpleasant.

In any case, I no longer have much faith in the idea that letting
GetOldestXmin go backwards is really safe.

The one bright spot is that AFAICS this isn't a true data loss hazard;
no possibly-still-needed data can be lost.  Any failures would be
transient failures of CLUSTER or VACUUM FULL.

Thoughts?

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to