Hello hackers,

I'd like to float an idea and gauge appetite for exploring it. This is a 
request for comments, not a patch.

The problem, stated narrowly:

A UNIQUE or PRIMARY KEY constraint on a partitioned table must today include 
every partition-key column, because each per-partition index can only police 
uniqueness inside its own partition. The most-wanted feature this blocks is a 
cross-partition unique constraint on a non-partition-key column (e.g. partition 
by created_at, but keep email globally unique).

Prior art (and what each pays for it)
 * The 2019 "Proposal: Global Index" [1] - a single parent-level index over 
storage-less parents.
 * The 2022 HighGo "Global Unique Index" POC [2][3][4] - RELKIND_GLOBAL_INDEX, 
a merge-sort across all partitions at build, and on every write a probe into 
other partitions' indexes.
 * Dilip Kumar's ongoing "Global Index" patch set [5] - a partition identifier 
embedded in each index tuple, a pg_index_partitions catalog, planner paths for 
the partitioned relation, and (currently) locking the partition hierarchy 
during DML.
 * Postgres Pro's shipped gbtree AM [6][7] - a non-MVCC global structure 
carrying the PK as INCLUDE-style columns, with hash-bucket locking for 
uniqueness; reportedly ~1.6–1.7× slower writes with 100 partitions.

These are all full global indexes: they have to serve index scans, carry 
per-row entries with visibility semantics, expose planner paths, and be 
vacuumed. That is a large surface, and I think it's why the feature has been 
"almost there" for the better part of a decade.

The observation:

In the recent thread [8][9], the discussion already split the problem in two: 
for SELECTs we'd like to avoid locking every partition, and for uniqueness the 
conflict is found by probing one structure, after which only the single owning 
partition needs touching. In other words, enforcing a cross-partition 
constraint is a strictly smaller problem than maintaining a cross-partition 
index for reads. A unique constraint needs only: "does this key already exist 
anywhere, and if so where, transactionally?" It does not need ordered range 
scans, planner paths, or visibility-aware read access for arbitrary queries.

The proposal: a Constraint Management layer (two-tier)

Formalize constraint enforcement as its own abstraction, separate from indexing 
- call it a constraint method (CM), analogous to but distinct from the index 
AM. The planner/executor consult the CM before the row reaches the table AM and 
index AMs, or in cases where we are using the index for constraint enforcement 
as we do now.

    Tier 1 - reuse an index when one already does the job. When the 
constraint's columns include the partition key (today's supported case) the CM 
simply delegates to existing local-index.

    Tier 2 - a purpose-built enforcer when full indexing is overkill. For 
cross-partition uniqueness on a non-key column, register a lightweight 
enforcement structure that maintains just enough logged state to answer the 
membership question and locate the one conflicting partition. It is keyed only 
by the constraint columns; it stores no per-row scan payload, exposes no 
planner path, and is checked/updated from an ExecInsert/Update/Delete hook 
ahead of the heap and index inserts. Conceptually this is a global membership 
map (key -> owning-partition + heap TID) rather than a global secondary index.

The win is that Tier 2 lets us defer the hard parts the full-index proposals 
struggle with. Because the enforcer is consulted at the executor level (not 
from inside index AM code), we know which single partition to lock at exactly 
the moment we have a candidate conflict — addressing the "we only discover the 
partition while inside the AM" objection raised in the current thread. And 
because it carries no MVCC scan data, it sidesteps much of the vacuum/bloat 
conversation.

What it might look like in code:

A pg_constraint_method catalog and a CM descriptor with a handful of callbacks, 
roughly:
        cm_check(values, snapshot) -> {ok | conflict(partition, tid)}
        cm_insert(values, partition, tid, xid) / cm_delete(...)
        cm_build(partitioned_rel) for initial population
        cm_vacuum(...) / WAL redo for the enforcer's own state.

A reference Tier-2 enforcer: a WAL-logged, MVCC-aware ordered/hash relation 
keyed on the constraint columns, holding (partition_id, heap_tid, xmin/xmax) — 
large enough to enforce and to resolve the visibility of a would-be conflict, 
small enough to avoid being a second copy of the data.

Executor wiring: ExecInsert/ExecUpdate call cm_check after forming the tuple 
but before/around the heap insert (mirroring how speculative insertion + ON 
CONFLICT already work), locking only the partition returned by a positive 
cm_check.

Recovery: the enforcer is just another WAL-logged relation/fork, so crash 
recovery and physical replication come along for free; logical replication and 
ON CONFLICT are explicit follow-ups.

Why I think this is worth exploring over "just finish the global index":

The full-global-index patches are valuable and I'm not proposing to abandon 
them - Tier 1 can adopt them when they land, but the motivation for global 
indexes should be a requirement for index scans across partitioned tables, not 
constraint management. Based on what I understand, the real demand is for the 
constraint enforcement over partitioned tables, not the index scan. A CM layer 
lets us ship cross-partition uniqueness (and cross-partition exclusion 
constraints, partition-spanning FKs) sooner, with a smaller and more reviewable 
surface providing a "global" constraints system which should enable a wider 
adoption of partitioned tables.

Open questions I'd like input on:

- Is a separate constraint-method abstraction warranted, or should this be 
modeled as a degenerate "constraint-only" mode of the proposed global index (no 
read paths)?

 - Concurrency: is a speculative-insertion-style protocol against the Tier-2 
structure sufficient, or do we need a dedicated predicate-locking scheme for 
SSI correctness?

 - Catalog and DDL: how should ALTER TABLE … ADD CONSTRAINT choose Tier 1 vs 
Tier 2, and should users be able to force it?

 - Does deferring partition locks to executor-level CM checks interact badly 
with prepared plans / AcquireExecutorLocks?

 - Is there appetite for a minimal prototype (uniqueness only, no 
FK/exclusion/etc.) as a discussion vehicle/proof-of-concept?

I'm happy to put together a WIP prototype of the uniqueness enforcer if there's 
interest in the direction. Feedback, especially on whether this should be 
folded into the existing global-index effort rather than standing alongside it, 
very welcome.

best,

-greg

[1] 
https://www.postgresql.org/message-id/CALtqXTcurqy1PKXzP9XO%3DofLLA5wBSo77BnUnYVEZpmcA3V0ag%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/184879c5306.12490ea581628934.7312528450011769010%40highgo.ca
[3] 
https://www.highgo.ca/2022/10/28/cross-partition-uniqueness-guarantee-with-global-unique-index/
[4] 
https://www.highgo.ca/2022/12/16/global-unique-index-attach-support-and-its-potential-deficiency/
[5] 
https://www.postgresql.org/message-id/CA%2BHiwqG%2BEizej9nCNcxSOfFyZ2i9Mhv%3Dzn%2Ba%2B4o-gwsvFz6EqQ%40mail.gmail.com
[6] https://postgrespro.com/docs/enterprise/current/pgpro-gbtree
[7] https://habr.com/en/companies/postgrespro/articles/948428/
[8] 
https://www.postgresql.org/message-id/CA%2BHiwqG%2BEizej9nCNcxSOfFyZ2i9Mhv%3Dzn%2Ba%2B4o-gwsvFz6EqQ%40mail.gmail.com
[9] 
https://www.postgresql.org/message-id/CAFiTN-tu1f0TL4C1CgRzBYkTrrhcYscc7Nz_LJ3xJDOZJGA6kA%40mail.gmail.com


Reply via email to