On 10/19/14, 11:41 AM, Andres Freund wrote:
On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
On 10/9/14, 4:19 PM, Andres Freund wrote:
On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
Bruce Momjian wrote:

I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

        https://wiki.postgresql.org/wiki/Todo

        Improve setting of visibility map bits for read-only and insert-only
        workloads
        
        http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.

IIRC it doesn't do most of that if there's no need, and if it's an insert-only
table without rollbacks. I *do* think there are some optimizations we could
make in general.

No, it always attempts dead tuple removal.

I said some steps, not all steps. Check it out:

     /* If any tuples need to be deleted, perform final vacuum cycle */
     /* XXX put a threshold on min number of tuples here? */
     if (vacrelstats->num_dead_tuples > 0)
     {
         /* Log cleanup info before we touch indexes */
         vacuum_log_cleanup_info(onerel, vacrelstats);

         /* Remove index entries */
         for (i = 0; i < nindexes; i++)
             lazy_vacuum_index(Irel[i],
                               &indstats[i],
                               vacrelstats);
         /* Remove tuples from heap */
         lazy_vacuum_heap(onerel, vacrelstats);
         vacrelstats->num_index_scans++;
     }

There's a rub here, though. We unconditionally do:
        /* Do post-vacuum cleanup and statistics update for each index */
        for (i = 0; i < nindexes; i++)
                lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

and that's not particularly cheap. Maybe we should make that conditional
when there have been no lazy_vacuum_index/heap calls at all?

We could possibly pass in to lazy_cleanup_index whether we actually removed any 
tuples.
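
Something like this, as an untested sketch against the excerpt above
(num_index_scans is only bumped when a removal cycle actually ran, so it can
double as the flag; alternatively that flag could be passed down into
lazy_cleanup_index as an extra argument):

     /* Sketch only: skip post-vacuum index cleanup if nothing was removed */
     if (vacrelstats->num_index_scans > 0)
     {
         /* Do post-vacuum cleanup and statistics update for each index */
         for (i = 0; i < nindexes; i++)
             lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
     }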

The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.

I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page, leading to much, much more bloat.

I have to believe that could seriously screw up autovacuum scheduling.

Why?

I'm worried there could be some pathological cases where we'd skip a large 
number of pages, perhaps if a vacuum scan and a seqscan ended up running 
alongside each other.

Perhaps this is just paranoia, but we have no idea how bad things might be, 
because we don't have any logging for how many pages we skipped because we 
couldn't lock them.

Also, if this really is that big a deal for heap pages, how come we don't get 
screwed by it on Btree index pages, where we mandate that we acquire a cleanup 
lock?

Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.

It would. By definition, all tuples that are 'all visible' need to be
fully hint-bitted.
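
For illustration, a hypothetical background hinter could consult the VM and
skip all-visible pages outright, roughly like this (visibilitymap_test() is
the existing VM probe; heap_page_set_hints() is a made-up placeholder for
"read the page and set whatever hint bits are missing"):

     /*
      * Hypothetical background hinter.  Pages marked all-visible can be
      * skipped without reading them, since their tuples must already be
      * fully hinted.  heap_page_set_hints() is only a placeholder.
      */
     static void
     background_hint_relation(Relation onerel)
     {
         BlockNumber nblocks = RelationGetNumberOfBlocks(onerel);
         BlockNumber blkno;
         Buffer      vmbuffer = InvalidBuffer;

         for (blkno = 0; blkno < nblocks; blkno++)
         {
             if (visibilitymap_test(onerel, blkno, &vmbuffer))
                 continue;           /* all-visible => already hinted */

             heap_page_set_hints(onerel, blkno);    /* placeholder */
         }

         if (BufferIsValid(vmbuffer))
             ReleaseBuffer(vmbuffer);
     }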

I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...

Not sure what you're getting at here?

That ultimately, our current method for determining when and what to vacuum is 
rather crude, and likely results in wasted effort during scans as well as not 
firing autovac often enough. Keep in mind that autovac started as a user-space 
utility and the best it could possibly do was to keep a table of stats counters.

The visibility map obviously helps cut down on extra work during a scan, but it 
only goes so far in that regard.

Instead of relying on those crude methods, if we reliably tracked certain txids 
on a per-block basis in a fork, we could cheaply scan the fork and make an 
extremely informed decision about how much a vacuum would gain us, and exactly 
which blocks it should hit.

Let me use freezing as an example. If we had a reliable list of the lowest txid 
in each block of a relation, we could do a freeze scan that hits only the blocks 
whose minimum txid falls within our freeze range. The same could be done for 
multixacts.
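
Sketched out (freezemap_get_min_xid() and freeze_heap_page() are invented
names, just to show the shape of such a scan):

     static void
     selective_freeze_scan(Relation onerel, TransactionId freezeLimit)
     {
         BlockNumber nblocks = RelationGetNumberOfBlocks(onerel);
         BlockNumber blkno;

         for (blkno = 0; blkno < nblocks; blkno++)
         {
             /* hypothetical per-block "lowest xid" fork lookup */
             TransactionId min_xid = freezemap_get_min_xid(onerel, blkno);

             /* nothing on this page is older than the cutoff: skip it */
             if (!TransactionIdPrecedes(min_xid, freezeLimit))
                 continue;

             freeze_heap_page(onerel, blkno, freezeLimit);  /* hypothetical */
         }
     }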

That's just one example. We could do something similar for background hinting 
(oldest xmin/xmax of all non-hinted tuples), and whether it's worth trying to 
vacuum (I think that'd be a combination of oldest non-locking xmax and seeing 
how much room the FSM has listed for the page).

If we stored 3 txids for each block in a fork, we could fit information for 
~680 heap blocks in each fork block. So in a database with 680G of heap data, 
we could fully determine every *block* (not table) we needed to vacuum by 
scanning just 1GB of data. That would allow for far better autovacuum 
scheduling than what we do today.
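
(For anyone checking the math, assuming 8K blocks and 4-byte xids and ignoring
any per-page overhead in the fork, a trivial standalone program:)

     #include <stdio.h>

     int
     main(void)
     {
         const long long blcksz = 8192;          /* block size */
         const long long entry  = 3 * 4;         /* three 4-byte xids per heap block */
         long long per_fork_page = blcksz / entry;               /* ~682 */

         long long heap_bytes  = 680LL * 1024 * 1024 * 1024;     /* 680GB of heap */
         long long heap_blocks = heap_bytes / blcksz;
         long long fork_bytes  = heap_blocks / per_fork_page * blcksz;

         printf("heap blocks covered per fork page: %lld\n", per_fork_page);
         printf("fork size for 680GB of heap: ~%lld MB\n",
                fork_bytes / (1024 * 1024));
         return 0;
     }

That prints roughly 682 heap blocks per fork page and a fork of about 1GB.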

I think the big missing piece, unless something like Heikki's xid lsn
ranges work gets finished, is a freeze map.

The problem with a simple freeze map is deciding when to actually set the bit. If 
you set it while the transaction that created all the tuples is still running, 
then any attempt to use the map before those tuples become all visible is 
pointless. Presumably this is why pd_prune_xid stores a txid and isn't just a 
boolean.
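
A per-page xid at least lets a later reader make that call itself, something
like this (freezemap_get_xid() being a hypothetical accessor, and oldestXmin
the horizon vacuum already computes):

     TransactionId map_xid = freezemap_get_xid(onerel, blkno);   /* hypothetical */

     if (TransactionIdIsValid(map_xid) &&
         TransactionIdPrecedes(map_xid, oldestXmin))
     {
         /* all creating transactions are gone; the entry is usable */
     }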
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

