This problem has been discussed before. Those familiar with the subject please skip the next paragraph.

When autovacuum finds a substantial amount of empty pages at the end of a relation, it attempts to truncate it in lazy_truncate_heap(). Because all the scanning had been done in parallel to normal DB activity, it needs to verify that all those blocks are still empty. To do that autovacuum grabs an AccessExclusiveLock on the relation, then scans backwards to the last non-empty page. If any other backend needs to access that table during this time, it will kill the autovacuum from the deadlock detection code, which by default is done after a 1000ms timeout. The autovacuum launcher will start another vacuum after (default) 60 seconds, which most likely is getting killed again, and again, and again. The net result of this is that the table is never truncated and every 60 seconds there is a 1 second hiccup before the autovacuum is killed.

Proposal:

Add functions to lmgr that are derived from the lock release code, but instead of releasing the lock and waking up waiters, just return a boolean telling if there are any waiters that would be woken up if this lock was released.

Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c to periodically check, if there is a conflicting lock request waiting. If not, keep going. If there is a waiter, truncate the relation to the point checked thus far, release the AccessExclusiveLock, then loop back to where we acquire this lock in the first place and continue checking/truncating.

I have a working patch here:

https://github.com/wieck/postgres/tree/autovacuum-truncate-lock

This patch does introduce three new postgresql.conf parameters, which I would be happy to get rid of if we could derive them from something else. Something based on the deadlock timeout may be possible.

    autovacuum_truncate_lock_check = 100ms # how frequent to check
                                           # for conflicting locks
    autovacuum_truncate_lock_retry = 50    # how often to try acquiring
                                           # the exclusive lock
    autovacuum_truncate_lock_wait = 20ms   # nap in between attempts

With these settings, I see the truncate of a bloated table progressing at a rate of 3 minutes per GB, while that table is accessed 20 times per second.

The original "kill autovacuum" mechanism in the deadlock code is still there. All this code really does is 10 lmgr lookups per second and releasing the AccessExclusiveLock if there are any waiters. I don't think it can get any cheaper than this.

I am attaching a script that uses pgbench to demonstrate the actual problem of a bloated table with significant empty pages at the end.


Comments?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

Attachment: t1.autovac-lock-issue.tgz
Description: application/compressed

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to