Tue, 2 Feb 2021 at 05:27, Peter Geoghegan <p...@bowt.ie>:
> And now here is the second thing I thought of, which is much better:
>
> Sometimes 1% of the dead tuples in a heap relation will be spread
> across 90%+ of the pages. With other workloads 1% of dead tuples might
> be highly concentrated, and appear in no more than 1% of all heap
> pages. Obviously the distinction between these two cases/workloads
> matters a lot. And so the triggering criteria must be quantitative
> *and* qualitative. It should not be based on counting dead tuples,
> since that alone won't differentiate these two extreme cases - both of
> which are probably quite common (in the real world extremes are
> actually the normal and common case IME).
>
> I like the idea of basing it on counting *heap blocks*, not dead
> tuples. We can count heap blocks that have *at least* one dead tuple
> (of course it doesn't matter how they're dead, whether it was this
> VACUUM operation or some earlier opportunistic pruning). Note in
> particular that it should not matter if it's a heap block that has
> only one LP_DEAD line pointer or a heap page that is near the
> MaxHeapTuplesPerPage limit for the page -- we count either type of
> page towards the heap-page based limit used to decide if index
> vacuuming goes ahead for all indexes during VACUUM.
>
I really like this idea! It resembles the approach used in bottom-up index deletion: block-based accounting gives a better estimate of how useful the operation will be.

I suppose the 1% threshold should be configurable, both as a cluster-wide GUC and as a table storage parameter? To check that I've read the proposal correctly, I put a rough sketch of the check below.
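Purely illustrative, of course: should_vacuum_indexes and vacuum_skip_index_threshold are names I invented for this sketch, nothing that exists in core.

/*
 * Hypothetical sketch of the proposed check (invented names, not
 * actual PostgreSQL code): skip index vacuuming when the fraction
 * of heap pages containing at least one dead tuple stays below a
 * threshold.
 */
#include <stdbool.h>

typedef unsigned int BlockNumber;   /* as in PostgreSQL */

/* Imagined GUC, with a matching table storage parameter; 0.01 = 1% */
static double vacuum_skip_index_threshold = 0.01;

static bool
should_vacuum_indexes(BlockNumber rel_pages,
                      BlockNumber pages_with_dead_tuples)
{
    if (rel_pages == 0)
        return false;       /* empty table: nothing for indexes */

    /*
     * Count pages, not tuples: a page with a single LP_DEAD line
     * pointer counts the same as one near MaxHeapTuplesPerPage,
     * which is what makes the measure qualitative and not just
     * quantitative.
     */
    return (double) pages_with_dead_tuples / rel_pages >=
           vacuum_skip_index_threshold;
}

The heap scan would increment pages_with_dead_tuples at most once per page, no matter how many dead tuples the page holds or how they died (this VACUUM or earlier opportunistic pruning), and lazy vacuum would consult the function once before starting the index-vacuuming phase.

--
Victor Yegorov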