On 10/25/2012 10:12 AM, Stephen Frost wrote:
> Jan,
>
> * Jan Wieck (janwi...@yahoo.com) wrote:
> > The problem case this patch is dealing with is rolling window tables
> > that experienced some bloat. The typical example is a log table,
> > that has new data constantly added and the oldest data constantly
> > purged out. This data normally rotates through some blocks like a
> > rolling window. If for some reason (purging turned off for example)
> > this table bloats by several GB and later shrinks back to its normal
> > content, soon all the used blocks are at the beginning of the heap
> > and we find tens of thousands of empty pages at the end. Only now
> > does the second scan take more than 1000ms and autovacuum is at risk
> > to get killed while at it.
>
> My concern is that this could certainly also happen to a heavily updated
> table in an OLTP type of environment where the requirement to take a
> heavy lock to clean it up might prevent it from ever happening..  I was
> simply hoping we could find a mechanism to lock just those pages we're
> getting ready to nuke rather than the entire relation.  Perhaps we can
> consider how to make those changes alongside of changes to eliminate or
> reduce the extent locking that has been painful (for me at least) when
> doing massive parallel loads into a table.

I've been testing this with a load of 20 writes/s against that bloated table. Preventing not just the cleanup, but the following ANALYZE as well, is precisely what happens. There may be multiple ways to get into this situation, but once you're there the symptoms are the same: vacuum fails to truncate the table and causes a one-second hiccup every minute, holding the exclusive lock until the deadlock detection code of another transaction kills it.

My patch doesn't change the logic by which we ensure that the truncate doesn't zap any data by accident, and Tom's comments suggest we should stick to that logic. It only makes autovacuum check frequently whether the AccessExclusiveLock is actually blocking anyone, and if so, get out of the way.

What I would rather like to discuss is how to do all this without three new GUCs.

In the original code, the maximum delay that autovacuum can cause by holding the exclusive lock is one deadlock_timeout (default 1s). It would appear reasonable to me to use max(deadlock_timeout / 10, 10ms) as the interval at which to check for a conflicting lock request. For another transaction that needs to access the table this is 10 times faster than it is now, and it still guarantees that autovacuum makes some progress with the truncate.
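
To make that concrete, here is a minimal sketch of what I have in mind for the truncation loop. This is not the patch itself; LockHasWaitersRelation() is an assumed helper (it would have to be added to lmgr), and the actual page truncation is elided:

    /*
     * A minimal sketch, not the patch.  LockHasWaitersRelation() is an
     * assumed helper that reports whether anyone is queued behind our
     * lock; the batched page truncation is elided.
     */
    #include "postgres.h"
    #include "portability/instr_time.h"
    #include "storage/lmgr.h"
    #include "storage/proc.h"       /* DeadlockTimeout */
    #include "utils/rel.h"

    static void
    truncate_check_sketch(Relation onerel)
    {
        int         check_interval;     /* milliseconds */
        instr_time  starttime;
        instr_time  elapsed;
        bool        finished = false;

        /* max(deadlock_timeout / 10, 10ms), as proposed above */
        check_interval = Max(DeadlockTimeout / 10, 10);

        INSTR_TIME_SET_CURRENT(starttime);

        while (!finished)
        {
            /* truncate a batch of verified-empty pages; set finished (elided) */

            INSTR_TIME_SET_CURRENT(elapsed);
            INSTR_TIME_SUBTRACT(elapsed, starttime);

            if (INSTR_TIME_GET_MILLISEC(elapsed) >= check_interval)
            {
                /* Someone is blocked on our AccessExclusiveLock: back off. */
                if (LockHasWaitersRelation(onerel, AccessExclusiveLock))
                {
                    UnlockRelation(onerel, AccessExclusiveLock);
                    return;
                }
                INSTR_TIME_SET_CURRENT(starttime);
            }
        }
    }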

The other two GUCs control how often and for how long autovacuum tries to acquire the exclusive lock in the first place. Since we actively release the lock *because someone needs it*, it is pretty much guaranteed that the immediately following lock attempt fails. We use ConditionalLockRelation() on purpose because an unconditional lock request could deadlock. The current code makes only one attempt and gives up immediately. I don't know what to derive a good retry duration from, but the nap time between attempts could be a hardcoded 20ms, or we could reuse the cost-based vacuum nap time (which also defaults to 20ms).
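
In rough code, the retry logic could look like the sketch below. Both constants stand in for the GUCs under discussion, and their values are placeholders only:

    /*
     * A sketch of the retry loop only; both constants stand in for the
     * GUCs under discussion and their values are placeholders.
     */
    #include "postgres.h"
    #include "miscadmin.h"          /* CHECK_FOR_INTERRUPTS(), pg_usleep() */
    #include "storage/lmgr.h"
    #include "utils/rel.h"

    #define TRUNCATE_LOCK_NAP_MS    20      /* or the vacuum cost-based nap time */
    #define TRUNCATE_LOCK_MAX_TRIES 50      /* open question: how long to retry */

    static bool
    acquire_truncate_lock_sketch(Relation onerel)
    {
        int     tries;

        for (tries = 0; tries < TRUNCATE_LOCK_MAX_TRIES; tries++)
        {
            /* conditional attempt, so we cannot deadlock against a waiter */
            if (ConditionalLockRelation(onerel, AccessExclusiveLock))
                return true;

            CHECK_FOR_INTERRUPTS();
            pg_usleep(TRUNCATE_LOCK_NAP_MS * 1000L);    /* nap between attempts */
        }

        return false;       /* give up; leave the truncate for the next vacuum */
    }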

Any other ideas are welcome.


Thanks,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

