I wonder why we're counting the number of dead tuples (or LP_DEAD stub items) in the relation as a whole in ANALYZE's acquire_sample_rows() function. Wouldn't it make more sense to focus on the "live vs dead tuple properties" of heap pages that are not known to be all-visible when we generate statistics for our pgstat_report_analyze() report? These statistic collector stats are only for the benefit of autovacuum scheduling -- and so they're *consumed* in a way that is totally different to the nearby pg_statistic stats.
There is no good reason for the pgstat_report_analyze() stats to be based on the same pg_class.relpages "denominator" as the pg_statistic stats (it's just slightly easier to do it that way in acquire_sample_rows(), I suppose). On the other hand, an alternative behavior involving counting totaldeadrows against sampled not-all-visible pages (but not otherwise) has a big benefit: doing so would remove any risk that older/earlier PageIsAllVisible() pages will bias ANALYZE in the direction of underestimating the count. This isn't a theoretical benefit -- I have tied it to an issue with the BenchmarkSQL TPC-C implementation [1]. This approach just seems natural to me. VACUUM intrinsically only expects dead tuples/line pointers in not-all-visible pages. So PageIsAllVisible() pages should not be counted here -- they are simply irrelevant, because these stats are for autovacuum, and autovacuum thinks they're irrelevant. What's more, VACUUM currently uses vac_estimate_reltuples() to compensate for the fact that it skips some pages using the visibility map -- pgstat_report_vacuum() expects a whole-relation estimate. But if pgstat_report_vacuum()/pgstat_report_analyze() expected statistics about the general properties of live vs dead tuples (or LP_DEAD items) on not-all-visible pages in the first place, then we wouldn't need to compensate like this. This new approach also buys us the ability to extrapolate a new estimated number of dead tuples using old, stale stats. The stats can be combined with the authoritative/known number of not-all-visible pages right this second, since it's cheap enough to *accurately* determine the total number of not-all-visible pages for a heap relation by calling visibilitymap_count(). My guess is that this would be much more accurate in practice: provided the original average number of dead/live tuples (tuples per not-all-visible block) was still reasonably accurate, the extrapolated "total dead tuples right now" values would also be accurate. I'm glossing over some obvious wrinkles here, such as: what happens to totaldeadrows when 100% of all the pages ANALYZE samples are PageIsAllVisible() pages? I think that it shouldn't be too hard to come up with solutions to those problems (the extrapolation idea already hints at a solution), but for now I'd like to keep the discussion high level. [1] https://postgr.es/m/CAH2-Wz=9r83wcwzcpuh4fvpedm4znzbzmvp3rt21+xhqwmu...@mail.gmail.com -- Peter Geoghegan