Patch: Global Unique Index

Cary Huang Thu, 17 Nov 2022 14:01:40 -0800

Patch: Global Unique Index



“Global unique index” in our definition is a unique index on a partitioned 
table that can ensure cross-partition uniqueness using a non-partition key. 
This work is inspired by this email thread, “Proposal: Global Index” started 
back in 2019 (Link below). My colleague David and I took a different approach 
to implement the feature that ensures uniqueness constraint spanning multiple 
partitions. We achieved this mainly by using application logics without heavy 
modification to current Postgres’s partitioned table/index structure. In other 
words, a global unique index and a regular partitioned index are essentially 
the same in terms of their storage structure except that one can do 
cross-partition uniqueness check, the other cannot.



https://www.postgresql.org/message-id/CALtqXTcurqy1PKXzP9XO%3DofLLA5wBSo77BnUnYVEZpmcA3V0ag%40mail.gmail.com



- Patch -

The attached patches were generated based on commit 
`85d8b30724c0fd117a683cc72706f71b28463a05` on master branch.



- Benefits of global unique index -

1. Ensure uniqueness spanning all partitions using a non-partition key column

2. Allow user to create a unique index on a non-partition key column without 
the need to include partition key (current Postgres enforces this)

3. Query performance increase when using a single unique non-partition key 
column





- Supported Features -

1. Global unique index is supported only on btree index type

2. Global unique index is useful only when created on a partitioned table.

3. Cross-partition uniqueness check with CREATE UNIQUE INDEX in serial and 
parallel mode

4. Cross-partition uniqueness check with ATTACH in serial and parallel mode

5. Cross-partition uniqueness check when INSERT and UPDATE





- Not-supported Features -

1. Global uniqueness check with Sub partition tables is not yet supported as we 
do not have immediate use case and it may involve majoy change in current 
implementation





- Global unique index syntax -

A global unique index can be created with "GLOBAL" and "UNIQUE" clauses in a 
"CREATE INDEX" statement run against a partitioned table. For example,



CREATE UNIQUE INDEX global_index ON idxpart(bid) GLOBAL;





- New Relkind: RELKIND_GLOBAL_INDEX -

When a global unique index is created on a partitioned table, its relkind is 
RELKIND_PARTITIONED_INDEX (I). This is the same as creating a regular index. 
Then Postgres will recursively create index on each child partition, except now 
the relkind will be set as RELKIND_GLOBAL_INDEX (g) instead of RELKIND_INDEX 
(i). This new relkind, along with uniqueness flag are needed for 
cross-partition uniqueness check later.





- Create a global unique index -

To create a regular unique index on a partitioned table, Postgres has to 
perform heap scan and sorting on every child partition. Uniqueness check 
happens during the sorting phase and will raise an error if multiple tuples 
with the same index key are sorted together. To achieve global uniqueness 
check, we make Postgres perform the sorting after all of the child partitions 
have been scanned instead of on the "sort per partition" fashion. In 
otherwords, the sorting only happens once at the very end and it sorts the 
tuples coming from all the partitions and therefore can ensure global 
uniqueness.



In parallel index build case, the idea is the same, except that the tuples will 
be put into shared file set (temp files) on disk instead of in memory to ensure 
other workers can share the sort results. At the end of the very last partition 
build, we make Postgres take over all the temp files and perform a final merge 
sort to ensure global uniqueness.



Example:



> CREATE TABLE gidx_part(a int, b int, c text) PARTITION BY RANGE (a);

> CREATE TABLE gidx_part1 PARTITION OF gidx_part FOR VALUES FROM (0) to (10);

> CREATE TABLE gidx_part2 PARTITION OF gidx_part FOR VALUES FROM (10) to (20);

> INSERT INTO gidx_part values(5, 5, 'test');

> INSERT INTO gidx_part values(15, 5, 'test');

> CREATE UNIQUE INDEX global_unique_idx ON gidx_part(b) GLOBAL;

ERROR:  could not create unique index "gidx_part1_b_idx"

DETAIL:  Key (b)=(5) is duplicated.





- INSERT and UPDATE -

For every new tuple inserted or updated, Postgres attempts to fetch the same 
tuple from current partition to determine if a duplicate already exists. In the 
global unique index case, we make Postgres attempt to fetch the same tuple from 
other partitions as well as the current partition. If a duplicate is found, 
global uniqueness is violated and an error is raised.



Example:



> CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);

> CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);

> CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);

> CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;

> INSERT INTO gidx_part values(5, 5, 'test');

> INSERT INTO gidx_part values(15, 5, 'test');

ERROR:  duplicate key value violates unique constraint "gidx_part1_b_idx"

DETAIL:  Key (b)=(5) already exists.





- ATTACH -

The new partition-to-be may already contain a regular unique index or contain 
no index at all. If it has no index, Postgres will create a similar index for 
it upon ATTACH. If the partitioned table has a global unique index, a new 
global unique index is automatically created on the partition-to-be upon 
ATTACH, and it will run a global uniqueness check between all current 
partitions and the partition-to-be.



If the partition-to-be already contains a regular unique index, Postgres will 
change its relkind from RELKIND_INDEX to RELKIND_GLOBAL_INDEX and run a global 
uniqueness check between all current partitions and the partition-to-be. No new 
index is created in this case



If a duplicate record is found, global uniqueness is violated and an error is 
raised.





- DETACH -

Since we retain the same partitioned structure, detaching a partition with 
global unique index is straightforward. Upon DETACH, Postgres will change its 
relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their inheritance 
relationship as usual.





- Optimizer, query planning and vacuum -

Since no major modification is done on global unique index's structure and 
storage, it works in the same way as a regular partitioned index. No major 
change is required to be done on optimizer, planner and vacuum process as they 
should work in the same way as regular index.





- REINDX -

A global unique index can be reindexed normally just like a regular index. No 
cross-partition uniqueness check is performed while a global unique index is 
being rebuilt. This is okay as long as it acquired a exclusive lock on the 
index relation.





- Benchmark Result -

Using pgbench with 200 partitions running SELECT and READ-WRITE tests with a 
unique non-partition key, we observe orders of magnitude higher TPS compared to 
a regular unique index built with partition key restriction (multicolumn index).





- TODOs -

Since this is a POC patch, there is several TODOs related to user experience 
such as:



    1. Raise error when user uses CREATE UNIQUE INDEX with ON ONLY clause

    2. Raise error when user tries to create a global unique index directly on 
a child partition

    3. ... maybe more



We will work on these at a later time.



thank you



Please let us know your thoughts or questions about the feature.



All comments are welcome and greatly appreciated!



David and Cary



============================

HighGo Software Canada

http://www.highgo.ca

0001-support-global-unique-index-with-non-partition-key.patch
Description: Binary data

0002-support-global-unique-index-create.patch
Description: Binary data

0003-support-global-unique-index-attach-and-detach.patch
Description: Binary data

0004-support-global-unique-index-insert-and-update.patch
Description: Binary data

Patch: Global Unique Index

Reply via email to