On Tue, Sep 13, 2022 at 6:02 AM Peter Geoghegan <p...@bowt.ie> wrote:
>
> My ongoing project to make VACUUM more predictable over time by
> proactive freezing [1] will increase the overall number of tuples
> frozen by VACUUM significantly (at least in larger tables). It's
> important that we avoid any new user-visible impact from extra
> freezing, though. I recently spent a lot of time on adding high-level
> techniques that aim to avoid extra freezing (e.g. by being lazy about
> freezing) when that makes sense. Low level techniques aimed at making
> the mechanical process of freezing cheaper might also help. (In any
> case it's well worth optimizing.)
>
> I'd like to talk about one such technique on this thread. The attached
> WIP patch reduces the size of xl_heap_freeze_page records by applying
> a simple deduplication process. This can be treated as independent
> work (I think it can, at least).

+1

> The patch doesn't change anything
> about the conceptual model used by VACUUM's lazy_scan_prune function
> to build xl_heap_freeze_page records for a page, and yet still manages
> to make the WAL records for freeze records over 5x smaller in many
> important cases. They'll be ~4x-5x smaller with *most* workloads,
> even.

After a quick benchmark, I've confirmed that the amount of WAL generated
when freezing 1 million tuples dropped to about one-fifth (1.2GB vs 250MB).
Great.

>
> Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12
> bytes to the WAL record right now -- no matter what. So the existing
> approach is rather space inefficient by any standard (perhaps because
> it was developed under time pressure while fixing bugs in Postgres
> 9.3). More importantly, there is a lot of natural redundancy among
> each xl_heap_freeze_tuple entry -- each tuple's xl_heap_freeze_tuple
> details tend to match. We can usually get away with storing each
> unique combination of values from xl_heap_freeze_tuple once per
> xl_heap_freeze_page record, while storing associated page offset
> numbers in a separate area, grouped by their canonical freeze plan
> (which is a normalized version of the information currently stored in
> xl_heap_freeze_tuple).

True.

I've not looked at the patch in depth yet, but I think we need
regression tests for this.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
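
For illustration only, the grouping described in the quoted text (one entry per unique freeze plan, with page offsets stored separately) can be sketched roughly as below. This is not the patch's code. The names FreezeTuple, deduplicate and payload_sizes are hypothetical, the field list is only a guess at what a plan might contain, and the 12-byte plan size and 2-byte offset size are assumptions. Only the 12 bytes per tuple figure comes from the message above.

```python
from collections import defaultdict
from typing import NamedTuple


class FreezeTuple(NamedTuple):
    """Hypothetical stand-in for one per-tuple freeze entry."""
    offset: int       # page offset number of the tuple
    xmax: int
    t_infomask2: int
    t_infomask: int
    frzflags: int


PER_TUPLE_BYTES = 12  # stated in the quoted message
PLAN_BYTES = 12       # assumption: size of one deduplicated plan entry
OFFSET_BYTES = 2      # assumption: one 16-bit page offset per tuple


def deduplicate(tuples):
    """Group page offsets by their freeze plan (everything except the offset)."""
    groups = defaultdict(list)
    for t in tuples:
        plan = (t.xmax, t.t_infomask2, t.t_infomask, t.frzflags)
        groups[plan].append(t.offset)
    # Keep each plan's offsets in page order.
    return {plan: sorted(offsets) for plan, offsets in groups.items()}


def payload_sizes(tuples):
    """Return (bytes before, bytes after) for the freeze payload of one page."""
    plans = deduplicate(tuples)
    before = PER_TUPLE_BYTES * len(tuples)
    after = PLAN_BYTES * len(plans) + OFFSET_BYTES * len(tuples)
    return before, after


# 100 tuples on one page that all share the same freeze plan.
page = [FreezeTuple(i, 0, 0, 0x0B00, 0) for i in range(1, 101)]
print(payload_sizes(page))  # (1200, 212) under the assumed sizes
```

Under these assumed sizes, a page whose tuples all share one plan shrinks from 1200 to 212 bytes of freeze payload, which is in the same range as the "over 5x" figure quoted above. A page with many distinct plans would see a smaller reduction.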