On Fri, Dec 2, 2016 at 3:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Dec 1, 2016 at 1:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> I think that the indexes only need to be scanned if the VACUUM finds
>>> dead tuples.  But even 1 dead tuple will cause a complete scan of
>>> every index.  I've complained about this before and I think there's
>>> room for improvement here, but nobody's been motivated enough to
>>> pursue this yet.
>>
>> The thing that's been speculated about in the past is having some
>> threshold larger than 1 on the minimum number of dead tuples needed
>> to cause a cleanup pass.
>
> Agreed.
>
>> It wouldn't be hard to implement, if you
>> could get consensus on what the threshold should be.
>
> Also agreed.
>
>> I'd think some algorithm similar to the autovacuum thresholds might
>> be appropriate.  It's not quite clear how this would interact with
>> HOT pruning, though.
>
> What's the relevance of HOT pruning here?
>
> I was thinking that the relevant metric might be how many pages
> contain dead tuples, because what we really want to do to reduce the
> cost of future vacuuming and future index-only scans is get pages
> marked all-visible.  Say, if less than 2% of the pages in the table
> contain dead tuples and the space required to store the TIDs is less
> than 50% of maintenance_work_mem, skip the index scans.  The first of
> those thresholds, at least, would probably need to be configurable,
> but that kind of idea.

I think this idea is better.  If the number of pages containing dead
tuples is below a threshold (e.g. a new
vacuum_index_cleanup_scale_factor parameter), we can skip the index
cleanup scans.  I will write a patch and submit it to the next CF.
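To illustrate, here is a rough, untested sketch of the check I have in
mind (nothing below exists yet: the pages_with_dead_tuples counter would
have to be added to LVRelStats, and vacuum_index_cleanup_scale_factor is
a provisional name for the new GUC):

    /*
     * Sketch only: return true if this VACUUM may skip scanning the
     * indexes even though some dead tuples were found.
     */
    static bool
    lazy_can_skip_index_cleanup(LVRelStats *vacrelstats)
    {
        double      frac;

        /* No dead tuples collected: nothing for the indexes to clean up. */
        if (vacrelstats->num_dead_tuples == 0)
            return true;

        /*
         * Don't skip if the remembered TIDs already use more than half
         * of maintenance_work_mem (measured in kB); otherwise a later
         * VACUUM might not be able to absorb them together with new
         * ones in a single pass.
         */
        if ((Size) vacrelstats->num_dead_tuples * sizeof(ItemPointerData) >
            (Size) maintenance_work_mem * 1024L / 2)
            return false;

        /*
         * Measure pages rather than tuples: scattered dead tuples keep
         * many heap pages from becoming all-visible, so only skip when
         * few pages are affected.
         */
        frac = (double) vacrelstats->pages_with_dead_tuples /
            (double) vacrelstats->rel_pages;

        return frac < vacuum_index_cleanup_scale_factor;
    }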
> The alternative that's been proposed is to do something based on the
> number of dead tuples but, as somebody pointed out in a previous
> discussion of this topic, one dead tuple per page throughout the whole
> table is a LOT worse than the same number of dead tuples all on the
> same pages.  You don't want to keep scanning large chunks of the heap
> because you're too lazy to visit the indexes.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center