Hi,

> I assume that manually specifying dictionary entries is a consequence of
> the prototype state? I don't think this is something humans are very
> good at, just analyzing the data to see what's useful to dictionarize
> seems more promising.
No, humans are not good at it. The idea was to automate the process and
build the dictionaries automatically, e.g. during VACUUM.

> I don't think we'd want much of the infrastructure introduced in the
> patch for type agnostic cross-row compression. A dedicated "dictionary"
> type as a wrapper around other types IMO is the wrong direction. This
> should be a relation-level optimization option, possibly automatic, not
> something visible to every user of the table.

So to clarify, are we talking about tuple-level compression? Or perhaps
page-level compression?

Implementing page-level compression should be *relatively*
straightforward. As an example, this was previously done in InnoDB.
Basically, InnoDB compresses the entire page, rounds the result up to
1K, 2K, 4K, 8K, etc., and stores it in a corresponding fork ("fork" in
PG terminology), similarly to how a SLAB allocator works. Additionally,
a page_id -> fork_id map should be maintained, probably in yet another
fork, similarly to the visibility map. A compressed page can move to a
different fork after being modified, since a modification may change its
compressed size. The buffer manager is unaffected and deals only with
uncompressed pages. (I'm not an expert in InnoDB and this is my very
rough understanding of how its compression works.)

I believe this can be implemented as a TAM. Whether this would be
"dictionary" compression is debatable, but it gives the users similar
benefits, give or take. The advantage is that users wouldn't have to
define any dictionaries manually, nor would the DBMS during VACUUM or
otherwise.

> I also suspect that we'd have to spend a lot of effort to make
> compression/decompression fast if we want to handle dictionaries
> ourselves, rather than using the dictionary support in libraries like
> lz4/zstd.

That's a reasonable concern, can't argue with that.
> I don't think a prototype-y patch not needing a rebase two months is a
> good measure of complexity :)

It's worth noting that I also invested quite some time into reviewing
type-aware TOASTers :) I just chose to keep my personal opinion about
the complexity of that patch to myself this time, since obviously I'm a
bit biased. However, if you are curious, it's all in the corresponding
thread.

-- 
Best regards,
Aleksander Alekseev