On 14.11.2013 02:26, Jeff Janes wrote:
On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan <eshkin...@gmail.com>wrote:

Jeff Janes <jeff.ja...@gmail.com> writes:

If I not mistaken, looks like lazy_scan_heap() called from
lazy_vacuum_rel()
(see [1]) skip pages, even if it run with scan_all == true,
lazy_scan_heap()
does not increment scanned_pages if lazy_check_needs_freeze() return
false, so
if this occurred at wraparound vacuum it cannot update pg_class, because
pg_class updated via this code:

     new_frozen_xid = FreezeLimit;
     if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
         new_frozen_xid = InvalidTransactionId;

     vac_update_relstats(onerel,
                         new_rel_pages,
                         new_rel_tuples,
                         new_rel_allvisible,
                         vacrelstats->hasindex,
                         new_frozen_xid);

so i think in our prevent wraparound vacuum vacrelstats->scanned_pages
always
less than vacrelstats->rel_pages and pg_class relfrozenxid never updated.

Yeah, I think that that is a bug.  If the clean-up lock is unavailable but
the page is inspected without it and found not to need freezing, then the
page needs to be counted as scanned, but is not so counted.

commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
Date:   Mon Nov 7 21:39:40 2011 -0500

But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2
until recently, I don't know why it just started happening.

It looks like a simple fix (to HEAD attached), but I don't know how to test
it.

Thanks, committed.

I was able to reproduce it by doing this:

-- Create and populate a test table
create table foo (i int4);
insert into foo select generate_series(1, 10000);

-- Freeze it, and observe relfrozenxid.
vacuum freeze foo;
select relfrozenxid from pg_class where oid='foo'::regclass;

-- Do some transactions to bump current transaction ID
insert into foo values (-1);
insert into foo values (-1);
insert into foo values (-1);
insert into foo values (-1);

-- Now, in a second psql session, open a cursor on the table. It keeps the current page pinned, which causes the ConditionalLockBuffer() in vacuumlazy.c to fail to grab the lock:

 begin; declare  foocur cursor for select * from foo;  fetch foocur;

-- Back to the original psql session, vacuum freeze again:
vacuum freeze foo;

-- Observe the new relfrozenxid. With the bug, it's the same as before. Vacuum freeze is not able to advance relfrozenxid because it skipped the page by the cursor. With the patch, it does advance.

select relfrozenxid from pg_class where oid='foo'::regclass;

Also, it seem like it might be worth issuing a warning if scan_all is true
but all was not scanned.

Hmm, the new definition of "scanned", with this patch, is that pages that were not vacuumed are still counted as scanned. I don't think a warning is appropriate, there isn't anything wrong with skipping pinned pages that don't need freezing, the amount of bloat left behind should be tiny. For diagnostic purposes, perhaps they should be counted separately in vacuum verbose output, though.

- Heikki


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to