On Tue, 1 Sep 2020 at 08:01, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Mon, Aug 31, 2020 at 1:56 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > I wonder if Roaring bitmaps would work well for this:
> >
> > https://arxiv.org/abs/1709.07821
>
> Alternatively, perhaps we could use a negative cache of heap blocks
> that have no tuples to kill at all. Maybe just a simple array whose
> elements are BlockNumber pairs. Each pair represents a range of blocks
> known to contain heap pages that *don't* have any TIDs for VACUUM to
> kill. The array could be size limited to 8KB, allowing it to be
> accessed in L1 cache throughout vacuuming. When the representation
> cannot fit in 8KB, maybe we disable the negative cache for the rest of
> the VACUUM operation.
>
> This is a bit like Masahiko's min_blkno + max_blkno idea, except:
> 1) it's a negative cache, and 2) there are perhaps as many as 1024
> min/max pairs -- not just 1.

Interesting idea. Maybe we would need one more bsearch(), on the
min/max pairs array, every time we check whether a TID should be
killed?
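
To make sure I understand it correctly, here is a self-contained
sketch of how I imagine the lookup against the negative cache would
work (not a patch: BlockNumber stands in for the real typedef, and all
struct and function names here are mine, purely for illustration):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's typedef */

/*
 * One entry of the negative cache: an inclusive range of heap blocks
 * known to contain no TIDs for VACUUM to kill.
 */
typedef struct BlockRange
{
    BlockNumber start;
    BlockNumber end;
} BlockRange;

/* 8KB / 8 bytes per pair = at most 1024 pairs */
typedef struct NegativeCache
{
    BlockRange  ranges[1024];   /* sorted by start, non-overlapping */
    int         nranges;        /* set to 0 once the cache overflows */
} NegativeCache;

/* bsearch() comparator: does the probed block fall inside this range? */
static int
range_cmp(const void *key, const void *elem)
{
    BlockNumber blk = *(const BlockNumber *) key;
    const BlockRange *range = (const BlockRange *) elem;

    if (blk < range->start)
        return -1;
    if (blk > range->end)
        return 1;
    return 0;                   /* inside [start, end] */
}

/*
 * Returns true iff the block is known to have nothing to kill, in
 * which case the caller can skip the dead-TIDs bsearch() entirely.
 */
static bool
block_has_nothing_to_kill(const NegativeCache *cache, BlockNumber blk)
{
    return bsearch(&blk, cache->ranges, cache->nranges,
                   sizeof(BlockRange), range_cmp) != NULL;
}

So when the cache misses -- that is, exactly when the TID might have
to be killed -- we pay for this extra bsearch() on top of the existing
one over the dead TIDs array.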
> > It's pretty clear that more than 90% of all calls to
> > lazy_tid_reaped() return false unless vacuum is running behind
> > (false means "don't kill this TID, it's alive"). But it could be
> > much more than 90%. This may be because autovacuum is configured to
> > run more aggressively than the default settings allow. But it may
> > also happen when LP_DEAD killing in indexes works well enough to do
> > most of the cleanup needed in practice. I bet pgbench finds that
> > ~99% of all calls to lazy_tid_reaped() that take place during index
> > vacuuming find that the TID doesn't need to be killed. So negative
> > caching could really help.

I agree that lazy_tid_reaped() returns false for most checks when
autovacuum is not running behind. But I'm a bit concerned that the
min/max filter instead penalizes the case where vacuum needs to kill
many TIDs, because each check then has to call bsearch() twice. That
is precisely the case where autovacuum is running behind and the user
wants the vacuum to complete as soon as possible. Since I expect the
lookup in the min/max pairs array to take very little time, it might
not be a problem, but I think it's worth benchmarking the overhead in
that worst case. Or I guess we could use a Bloom filter for this
purpose instead: a filter over the heap blocks that *do* contain dead
TIDs would answer the same question with a single hash probe, and its
false positives would merely fall through to the main bsearch(), so
correctness wouldn't be affected.

Also, I'm concerned that 1024 min/max pairs might not be enough in
practice, especially for very large tables.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services