Hi,
Robert, Melanie and I spent an evening discussing this topic around
pgconf.nyc. Here are, mildly revised, notes from that:

First, a few random points that didn't fit with the sketch of an
approach below:

- Are unlogged tables a problem for using LSN-based heuristics for
  freezing?

  We concluded no, not a problem, because aggressively freezing does not
  increase overhead meaningfully, as we would already dirty both the
  heap and VM page to set the all-visible flag.

- "Unfreezing" pages that were frozen hours / days ago isn't too bad and
  can be desirable. The main thing we are worried about is repeated
  freezing / unfreezing of pages within a relatively short time period.

- Computing an average "modification distance" for each page, as I
  (Andres) proposed, is complicated / "fuzzy".

  The main problem is that it's not clear how to come up with a good
  number for workloads that have many more inserts into new pages than
  modifications of existing pages.

  It's also hard to use an average for this kind of thing. E.g. in cases
  where new pages are frequently updated, but some old data is also
  updated, it's easy for the updates to the old data to completely skew
  the average, even though that shouldn't prevent us from freezing.

- We also discussed an idea by Robert to track the number of times we
  need to dirty a page when unfreezing, and to compare that to the
  number of pages dirtied overall (IIRC). But I don't think we really
  came to a conclusion around that - and I didn't write anything down,
  so this is purely from memory.

A rough sketch of a freezing heuristic:

- We concluded that to intelligently control opportunistic freezing we
  need statistics about the number of freezes and unfreezes.

- We should track page freezes / unfreezes in shared memory stats on a
  per-relation basis.

- To use such statistics to control heuristics, we need to turn them
  into rates. For that we need to keep snapshots of the absolute values
  at certain times (when vacuuming), allowing us to compute a rate.
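To make the last point concrete, here is a minimal sketch (in Python,
purely illustrative - all names are invented, this is not PostgreSQL
code) of turning snapshotted absolute counters into rates:

```python
# Hypothetical sketch: per-relation freeze/unfreeze counters are
# cumulative, so a rate requires two snapshots taken at different times
# (e.g. at the start of each vacuum).
from dataclasses import dataclass

@dataclass
class StatsSnapshot:
    taken_at: float      # wall clock time of the snapshot, in seconds
    page_freezes: int    # cumulative page freezes for the relation
    page_unfreezes: int  # cumulative page unfreezes for the relation

def freeze_rates(older, newer):
    """Return (freezes/sec, unfreezes/sec) between two snapshots,
    or None if the snapshots are too close together to form a rate."""
    elapsed = newer.taken_at - older.taken_at
    if elapsed <= 0:
        return None
    return ((newer.page_freezes - older.page_freezes) / elapsed,
            (newer.page_unfreezes - older.page_unfreezes) / elapsed)

# Example: two snapshots taken 100 seconds apart
s0 = StatsSnapshot(taken_at=0.0, page_freezes=1000, page_unfreezes=50)
s1 = StatsSnapshot(taken_at=100.0, page_freezes=1500, page_unfreezes=70)
print(freeze_rates(s0, s1))  # (5.0, 0.2)
```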
- If we snapshot some stats, we need to limit the amount of data that
  occupies:

  - evict based on wall clock time (we don't care about unfreezing pages
    frozen a month ago)

  - "thin out" data when exceeding the limited amount of stats per
    relation, using random sampling or such

  - we need a smarter approach than just keeping the N last vacuums, as
    there are situations where a table is (auto-)vacuumed at a high
    frequency

  - only looking at recent-ish table stats is fine, because we
    a) don't want to look at too-old data, as we need to deal with
       changing workloads
    b) know that if there aren't recent vacuums, falsely freezing is of
       bounded cost

  - shared memory stats being lost on crash-restart/failover might be a
    problem

    - we certainly don't want to immediately store these stats in a
      table, due to the xid consumption that'd imply

- Attributing "unfreezes" to specific vacuums would be powerful:

  - "Number of pages frozen during vacuum" and "Number of pages unfrozen
    that were frozen during the same vacuum" provide the numerator /
    denominator for an "error rate".

  - We can perform this attribution by comparing the page LSN with the
    recorded start/end LSNs of recent vacuums.

  - If the freezing error rate of recent vacuums is low, freeze more
    aggressively. This is important to deal with insert-mostly
    workloads.

  - If old data is "unfrozen", that's fine; we can ignore such unfreezes
    when controlling "freezing aggressiveness".

    - Ignoring unfreezing of old pages is important, e.g. to deal with
      workloads that delete old data.

  - This approach could provide "goals" for opportunistic freezing in a
    somewhat understandable way. E.g. aiming to rarely unfreeze data
    that has been frozen within 1h/1d/...

Around this point my laptop unfortunately ran out of battery. Possibly
the attendees of this mini summit also ran out of steam (and tea).

We had a few "disagreements" or "unresolved issues":

- How aggressive should we be when we have no stats?
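As a sketch of the attribution idea (again purely illustrative Python
with invented names, not a proposed implementation): an unfreeze is
attributed to a vacuum when the LSN the page carried while frozen falls
inside that vacuum's recorded start/end LSN window; unfreezes matching
no recent vacuum are old data and get ignored, per the above.

```python
# Hypothetical sketch: derive a per-vacuum freezing "error rate" by
# attributing unfreezes to the vacuum whose LSN window froze the page.
from dataclasses import dataclass

@dataclass
class VacuumRecord:
    start_lsn: int           # LSN recorded when this vacuum started
    end_lsn: int             # LSN recorded when this vacuum ended
    pages_frozen: int        # pages frozen by this vacuum
    pages_unfrozen: int = 0  # of those, pages later unfrozen

def record_unfreeze(recent_vacuums, frozen_page_lsn):
    """Attribute an unfreeze to the vacuum whose LSN window covers the
    page's LSN; unfreezes of old pages match no window and are ignored."""
    for v in recent_vacuums:
        if v.start_lsn <= frozen_page_lsn < v.end_lsn:
            v.pages_unfrozen += 1
            return

def error_rate(v):
    """Fraction of this vacuum's frozen pages that were later unfrozen."""
    return v.pages_unfrozen / v.pages_frozen if v.pages_frozen else 0.0

# Example: one recent vacuum froze 200 pages across LSNs [1000, 2000)
recent = [VacuumRecord(start_lsn=1000, end_lsn=2000, pages_frozen=200)]
record_unfreeze(recent, frozen_page_lsn=1500)  # attributed to the vacuum
record_unfreeze(recent, frozen_page_lsn=500)   # old page: ignored
print(error_rate(recent[0]))  # 0.005
```

A low error rate here would justify freezing more aggressively on the
next vacuum; a high one would justify backing off.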
- Should the freezing heuristic take into account whether freezing would
  require an FPI? Or whether the page was not in s_b, or ...

I likely mangled this substantially, both when taking notes during the
lively discussion and when revising them to make them a bit more
readable.

Greetings,

Andres Freund