On Mon, May 11, 2020 at 4:10 PM Bruce Momjian <br...@momjian.us> wrote:
> > think that you should point out that deduplication works by storing
> > the duplicates in the obvious way: Only storing the key once per
> > distinct value (or once per distinct combination of values in the case
> > of multi-column indexes), followed by an array of TIDs (i.e. a posting
> > list). Each TID points to a separate row in the table.
>
> These are not details that should be in the release notes since the
> internal representation is not important for its use.
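The quoted description of posting lists can be sketched as follows. This is a hypothetical illustration of the idea only, not PostgreSQL's actual nbtree code; the function and variable names are invented for the sketch:

```python
# Sketch of B-Tree deduplication as described above: each distinct
# key is stored once, followed by a sorted array of TIDs (a posting
# list), each TID pointing to a separate heap (table) row.
from collections import OrderedDict

def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tid, ...]) posting lists."""
    posting = OrderedDict()
    for key, tid in index_tuples:
        posting.setdefault(key, []).append(tid)
    # Each distinct key now appears once, with an array of TIDs.
    return [(key, sorted(tids)) for key, tids in posting.items()]

# TIDs are (block number, offset) pairs, as in PostgreSQL.
tuples = [("apple", (1, 3)), ("apple", (1, 1)),
          ("pear", (2, 4)), ("apple", (2, 2))]
print(deduplicate(tuples))
# [('apple', [(1, 1), (1, 3), (2, 2)]), ('pear', [(2, 4)])]
```

With low cardinality data the key is amortized across many TIDs, which is where the space savings come from.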
I am not concerned about describing the specifics of the on-disk representation, and I don't feel too strongly about the storage parameter (leave it out). I only ask that the wording convey the fact that the deduplication feature is not just a quantitative improvement -- it's a qualitative behavioral change that will help data warehousing in particular.

This wasn't the case with the v12 work on B-Tree duplicates (as I said last year, I thought of the v12 stuff as fixing a problem more than as an enhancement). With the deduplication feature added to Postgres v13, the B-Tree code can now gracefully deal with low cardinality data by compressing the duplicates as needed. This is comparable to bitmap indexes in proprietary database systems, but without most of their disadvantages (in particular around heavyweight locking, deadlocks that abort transactions, etc). It's easy to imagine this making a big difference with analytics workloads.

The v12 work made indexes with lots of duplicates 15%-30% smaller (compared to v11), but the v13 work can make them 60%-80% smaller in many common cases (compared to v12). In extreme cases indexes might even be ~12x smaller (though that will be rare).

--
Peter Geoghegan