Hi,

On 2023-02-05 13:41:17 +0300, Aleksander Alekseev wrote:
> > I don't think the approaches in either of these threads is
> > promising. They add a lot of complexity, require implementation effort
> > for each type, manual work by the administrator for column, etc.
>
> I would like to point out that compression dictionaries don't require
> per-type work.
>
> Current implementation is artificially limited to JSONB because it's a
> PoC. I was hoping to get more feedback from the community before
> proceeding further. Internally it uses type-agnostic compression and
> doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or
> arrays. This choice was explicitly done in order to support types
> other than JSONB.

I don't think we'd want much of the infrastructure introduced in the
patch for type-agnostic cross-row compression. IMO a dedicated
"dictionary" type as a wrapper around other types is the wrong
direction. This should be a relation-level optimization option,
possibly automatic, not something visible to every user of the table.

I assume that manually specifying dictionary entries is a consequence
of the prototype state? I don't think this is something humans are very
good at; just analyzing the data to see what's useful to dictionarize
seems more promising.

I also suspect that we'd have to spend a lot of effort to make
compression/decompression fast if we want to handle dictionaries
ourselves, rather than using the dictionary support in libraries like
lz4/zstd.

> > One of the major justifications for work in this area is the cross-row
> > redundancy for types like jsonb. I think there's ways to improve that
> > across types, instead of requiring per-type work.
>
> To be fair, there are advantages in using type-aware compression. The
> compression algorithm can be more efficient than a general one and in
> theory one can implement lazy decompression, e.g. the one that
> decompresses only the accessed fields of a JSONB document.
>
> I agree though that particularly for PostgreSQL this is not
> necessarily the right path, especially considering the accompanying
> complexity.

I agree with both those paragraphs.

> above. However having a built-in type-agnostic dictionary compression
> IMO is a too attractive idea to completely ignore it. Especially
> considering the fact that the implementation was proven to be fairly
> simple and there was even no need to rebase the patch since November
> :)

I don't think a prototype-y patch not needing a rebase for two months
is a good measure of complexity :)

Greetings,

Andres Freund