On Mon, May 11, 2020 at 4:10 PM Bruce Momjian <br...@momjian.us> wrote:
> > think that you should point out that deduplication works by storing
> > the duplicates in the obvious way: Only storing the key once per
> > distinct value (or once per distinct combination of values in the case
> > of multi-column indexes), followed by an array of TIDs (i.e. a posting
> > list). Each TID points to a separate row in the table.
>
> These are not details that should be in the release notes since the
> internal representation is not important for its use.
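The quoted description of posting lists can be sketched as follows. This is a hypothetical illustration of the idea only, not PostgreSQL's actual nbtree code; the function and variable names are invented for the sketch:

```python
# Sketch of B-Tree deduplication as described above: each distinct
# key is stored once, followed by a sorted array of TIDs (a posting
# list), each TID pointing to a separate heap (table) row.
from collections import OrderedDict

def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tid, ...]) posting lists."""
    posting = OrderedDict()
    for key, tid in index_tuples:
        posting.setdefault(key, []).append(tid)
    # Each distinct key now appears once, with an array of TIDs.
    return [(key, sorted(tids)) for key, tids in posting.items()]

# TIDs are (block number, offset) pairs, as in PostgreSQL.
tuples = [("apple", (1, 3)), ("apple", (1, 1)),
          ("pear", (2, 4)), ("apple", (2, 2))]
print(deduplicate(tuples))
# [('apple', [(1, 1), (1, 3), (2, 2)]), ('pear', [(2, 4)])]
```

With low cardinality data the key is amortized across many TIDs, which is where the space savings come from.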
I am not concerned about describing the specifics of the on-disk representation, and I don't feel too strongly about the storage parameter (leave it out). I only ask that the wording convey the fact that the deduplication feature is not just a quantitative improvement -- it's a qualitative behavioral change that will help data warehousing in particular.

This wasn't the case with the v12 work on B-Tree duplicates (as I said last year, I thought of the v12 stuff as fixing a problem more than as an enhancement). With the deduplication feature added to Postgres v13, the B-Tree code can now gracefully deal with low cardinality data by compressing the duplicates as needed. This is comparable to bitmap indexes in proprietary database systems, but without most of their disadvantages (in particular around heavyweight locking, deadlocks that abort transactions, etc). It's easy to imagine this making a big difference with analytics workloads.

The v12 work made indexes with lots of duplicates 15%-30% smaller (compared to v11), but the v13 work can make them 60%-80% smaller in many common cases (compared to v12). In extreme cases indexes might even be ~12x smaller (though that will be rare).

--
Peter Geoghegan