Re: Patch: Global Unique Index

Simon Riggs Mon, 21 Nov 2022 04:34:04 -0800

On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.hu...@highgo.ca> wrote:
>
> Patch: Global Unique Index


Let me start by expressing severe doubt on the usefulness of such a
feature, but also salute your efforts to contribute.

> In other words, a global unique index and a regular partitioned index are 
> essentially the same in terms of their storage structure except that one can 
> do cross-partition uniqueness check, the other cannot.

This is the only workable architecture, since it allows DETACH to be
feasible, which is essential.

You don't seem to mention that this would require a uniqueness check
on each partition. Is that correct? This would result in O(N) cost of
uniqueness checks, severely limiting load speed. I notice you don't
offer any benchmarks on load speed or the overhead associated with
this, which is not good if you want to be taken seriously, but at
least it is recoverable.

(It might be necessary to specify some partitions as READ ONLY, to
allow us to record their min/max values for the indexed cols, allowing
us to do this more quickly.)

> - Supported Features -
> 1. Global unique index is supported only on btree index type

Why? Surely any index type that supports uniqueness is good.

> - Not-supported Features -
> 1. Global uniqueness check with Sub partition tables is not yet supported as 
> we do not have immediate use case and it may involve majoy change in current 
> implementation

Hmm, sounds like a problem. Arranging the calls recursively should work.

> - Create a global unique index -
> To create a regular unique index on a partitioned table, Postgres has to 
> perform heap scan and sorting on every child partition. Uniqueness check 
> happens during the sorting phase and will raise an error if multiple tuples 
> with the same index key are sorted together. To achieve global uniqueness 
> check, we make Postgres perform the sorting after all of the child partitions 
> have been scanned instead of on the "sort per partition" fashion. In 
> otherwords, the sorting only happens once at the very end and it sorts the 
> tuples coming from all the partitions and therefore can ensure global 
> uniqueness.

My feeling is that performance on this will suck so badly that we must
warn people away from it, and tell people if they want this, create
the index at the start and let it load.

Hopefully CREATE INDEX CONCURRENTLY still works.

Let's see some benchmarks on this also please.

You'll need to think about progress reporting early because correctly
reporting the progress and expected run times are likely critical for
usability.

> Example:
>
> > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
> > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
> > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
> > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;
> > INSERT INTO gidx_part values(5, 5, 'test');
> > INSERT INTO gidx_part values(15, 5, 'test');
> ERROR:  duplicate key value violates unique constraint "gidx_part1_b_idx"
> DETAIL:  Key (b)=(5) already exists.

Well done.

> - DETACH -
> Since we retain the same partitioned structure, detaching a partition with 
> global unique index is straightforward. Upon DETACH, Postgres will change its 
> relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their 
> inheritance relationship as usual.

It's the only way that works

> - Optimizer, query planning and vacuum -
> Since no major modification is done on global unique index's structure and 
> storage, it works in the same way as a regular partitioned index. No major 
> change is required to be done on optimizer, planner and vacuum process as 
> they should work in the same way as regular index.

Agreed


Making a prototype is a great first step.

The next step is to understand the good and the bad aspects of it, so
you can see what else needs to be done. You need to be honest and real
about the fact that this may not actually be desirable in practice, or
in a restricted use case.

That means performance analysis of create, load, attach, detach,
INSERT, SELECT, UPD/DEL and anything else that might be affected,
together with algorithmic analysis of what happens for larger N and
larger tables.

Expect many versions; take provisions for many days.

Best of luck

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Patch: Global Unique Index

Reply via email to