Kevin and Robert are well aware of most of the below. I just want to put this out here so that other people who haven't followed the discussion too closely may chime in.

Some details on the problem:

First of all, there is a minimum of 1000 pages that the VACUUM scan must detect as possibly being all empty at the end of a relation. Without at least 8MB of possible free space at the end, the code never calls lazy_truncate_heap(). This means we don't have to worry about tiny relations at all. Any relation whose turnover between autovacuum VACUUM runs stays under 8MB can never get into this situation.
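
Just to spell out the arithmetic: with the default 8K block size, 1000 pages is exactly 8MB. Here is a toy sketch of that threshold, purely as an illustration of the math and not the actual vacuumlazy.c logic (the helper function is made up for the example; only the block size and the 1000-page minimum are the real numbers):

#include <stdio.h>
#include <stdbool.h>

#define BLCKSZ                8192    /* default PostgreSQL block size */
#define REL_TRUNCATE_MINIMUM  1000    /* pages: 1000 * 8KB = 8MB */

/* made-up helper: would this relation even be considered for truncation? */
static bool
would_consider_truncate(long rel_pages, long nonempty_pages)
{
    long    possibly_freeable = rel_pages - nonempty_pages;

    return possibly_freeable >= REL_TRUNCATE_MINIMUM;
}

int
main(void)
{
    /* 10MB table with 2MB of trailing empty pages: never considered */
    printf("%d\n", would_consider_truncate(1280, 1024));    /* prints 0 */

    /* 3GB table whose live rows fit in the first 10MB: considered */
    printf("%d\n", would_consider_truncate(393216, 1280));  /* prints 1 */

    return 0;
}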

Relations with higher turnover than that, but in random places or with a high percentage of rather static rows, don't fall into the problem category either. They may never accumulate that much "contiguous free space at the end", because the turnover keeps reusing free space all over the place. So again, lazy_truncate_heap() will never be called.

Relations that eventually build up more than 8MB of free space at the end aren't automatically a problem. The autovacuum VACUUM scan just scanned those pages at the end, which means that the truncate safety scan, done under exclusive lock, checks exactly those same pages and most likely finds them still in memory. The truncate safety scan will be fast due to a 99+% buffer cache hit rate.

The only actual problem case I have found so far is rolling window tables of significant size that can bloat to multiple times their normal size every now and then. This is indeed a rare corner case, and I have no idea how many users may (unknowingly) be suffering from it.

This rare corner case triggers lazy_truncate_heap() with a significant amount of free space to truncate. The table bloats, then all the bloat is deleted, and the periodic 100% turnover guarantees that all "live" tuples will shortly circulate in lower block numbers again, leaving gigabytes of empty space at the end.

This by itself still isn't a problem. The existing code may do the job just fine "unless" there is "frequent" access to that very table. Only with this special combination of circumstances do we actually have a problem.

Only now, with a significant amount of free space at the end and frequent access to the table, does the truncate safety scan take long enough, and actually have to read pages from disk, to interfere with client transactions.

At this point, the truncate safety scan may have to be interrupted to let the frequent other traffic through. This is what we accomplish with the autovacuum_truncate_lock_check interval, where we voluntarily release the lock whenever someone else needs it. I agree with Kevin that a 20ms check interval is reasonable, because the check itself is even less expensive than releasing the exclusive lock we're holding.
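
To make that concrete, the pattern inside the truncate safety scan would look roughly like the sketch below. This is a standalone illustration only; lock_has_waiters() and scan_one_page() are stand-ins for the real backend calls, and the 20ms figure is simply the check interval discussed above:

#include <stdio.h>
#include <stdbool.h>
#include <time.h>

#define CHECK_INTERVAL_MS 20    /* the proposed autovacuum_truncate_lock_check */

/* stand-in: is some other backend queued up behind our exclusive lock? */
static bool
lock_has_waiters(void)
{
    return false;               /* pretend nobody is waiting */
}

/* stand-in for checking one page of the relation for being empty */
static void
scan_one_page(long blkno)
{
    (void) blkno;
}

static double
elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000.0 +
           (now.tv_nsec - start->tv_nsec) / 1000000.0;
}

int
main(void)
{
    long            rel_pages = 100000;
    long            blkno;
    struct timespec last_check;

    clock_gettime(CLOCK_MONOTONIC, &last_check);

    /* the safety scan walks backwards from the end of the table */
    for (blkno = rel_pages - 1; blkno >= 0; blkno--)
    {
        if (elapsed_ms(&last_check) >= CHECK_INTERVAL_MS)
        {
            if (lock_has_waiters())
            {
                printf("waiter detected, releasing lock at block %ld\n", blkno);
                return 0;
            }
            clock_gettime(CLOCK_MONOTONIC, &last_check);
        }
        scan_one_page(blkno);
    }

    printf("no waiters, the empty tail can be truncated\n");
    return 0;
}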

At the same time, completely giving up and relying on the autovacuum launcher to start another worker isn't as free as it looks either. The next autovacuum worker will have to do the VACUUM scan first, before getting to the truncate phase; we cannot just skip blindly to the truncate code. If the truncate were repeatedly aborted, the table would deteriorate and accumulate dead tuples again. The removal of dead tuples and their index tuples has priority.

As said earlier in the discussion, the VACUUM scan will skip pages that are marked as all visible. So the scan won't physically read the majority of the empty pages at the end of the table over and over. But it will at least scan all pages that have been modified since the last VACUUM run.
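
For illustration, the skipping behaviour boils down to something like this sketch, with the visibility map reduced to a plain boolean array (the real code reads the map itself and is more careful about not breaking up sequential read-ahead for short skippable runs):

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

int
main(void)
{
    long    rel_pages = 16;
    bool   *all_visible = calloc(rel_pages, sizeof(bool));
    long    blkno;
    long    scanned = 0;

    /* pretend everything past block 3 is empty and already all visible */
    for (blkno = 4; blkno < rel_pages; blkno++)
        all_visible[blkno] = true;

    for (blkno = 0; blkno < rel_pages; blkno++)
    {
        if (all_visible[blkno])
            continue;           /* skip the physical read entirely */
        scanned++;              /* would actually read and vacuum this page */
    }

    printf("scanned %ld of %ld pages\n", scanned, rel_pages);
    free(all_visible);
    return 0;
}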

To me this means that we want to be more generous to the truncate code about acquiring the exclusive lock. In my tests, I've seen that a rolling window table with a "live" set of just 10MB or so, but 3GB of empty space, can still have a 2-minute VACUUM scan time. Throwing that work away because we can't acquire the exclusive lock within 2 seconds is a waste of effort.

Something between 2 and 60 seconds sounds more reasonable to me.
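
In code form, what I have in mind is roughly the following: keep retrying a conditional lock request until some deadline in that range, instead of giving up after 2 seconds. Again a standalone sketch, with try_exclusive_lock() as a stand-in for a conditional lock attempt inside the backend, and the interval and deadline values just picked for illustration:

#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

#define RETRY_INTERVAL_MS  50       /* nap between lock attempts */
#define LOCK_DEADLINE_MS   10000    /* somewhere in the 2-60 second range */

/* stand-in: conditional (non-blocking) exclusive lock attempt */
static bool
try_exclusive_lock(void)
{
    static int  attempts = 0;

    return ++attempts > 40;         /* pretend the lock frees up after ~2s */
}

int
main(void)
{
    long    waited_ms = 0;

    while (!try_exclusive_lock())
    {
        if (waited_ms >= LOCK_DEADLINE_MS)
        {
            printf("gave up after %ld ms, keeping the vacuum results but skipping the truncate\n",
                   waited_ms);
            return 0;
        }
        usleep(RETRY_INTERVAL_MS * 1000);
        waited_ms += RETRY_INTERVAL_MS;
    }

    printf("got the exclusive lock after roughly %ld ms of waiting\n", waited_ms);
    return 0;
}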


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

