Re: Key joins

Arne Roland Thu, 25 Jun 2026 06:35:21 -0700

Hi Henson,

thank you for getting involved!


On 2026-06-25 1:15 PM, Henson Choi wrote:

> Hi hackers,
> Thank you for this interesting proposal. I have two questions that
> touch on what I see as a fundamental tension in the design.
> On SQL's evolution from How to What
>
> As Codd put it, "The user of a relational system should not need to
> know how the system stores and accesses data." Key Join, I would
> argue, pulls in the opposite direction.
>
> SQL has historically evolved away from procedural "how" toward
> declarative "what": users describe the result they want, and the
> optimizer figures out how to get there. Foreign key constraints are a
> good example: once declared in the schema, the optimizer can already
> exploit them for join elimination, cardinality estimation, and so on.
> Key Join asks users to explicitly annotate their queries with FK
> traversal semantics. Isn't that a step back toward "how"?

I am 100% with you on that Codd quote. I love the declarative nature of SQL and have encouraged developers coming from imperative paradigms to switch to declarative ones. I'd argue that with key joins we still describe what we want. Let me quote the intent we stated in the opening message of this thread:


> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema.

I'd argue that the intent is the enrichment, together with which side is the referenced one. In my experience over the years, developers already have that intent in mind; we are offering a way to express it. By its nature, this patch series adds something closer to an assertion. If anything, that makes a key join more declarative than an ordinary equijoin: we declare an intent, and the schema settles the rest. This is explained in more detail in section 8.3 of our key_joins.pdf [https://www.postgresql.org/message-id/00c30670-64e1-4c30-a349-784426d333df%40app.fastmail.com] or in the web demo at https://keyjoin.org/#sec8.4. (The numbering is slightly off because of versioning issues.) The proposed patch does not alter how the planner or optimizer works in the slightest. Please help me understand how we are taking a step toward "how"?

> Tomas Vondra raised a directly relevant point in his review:
>
>> I'm interested in this patch because there seems to be a possible
>> overlap with the starjoin planning (in that maybe we could try
>> reusing some of the derived information for that).

I do think we want to use the same underlying proving infrastructure. That comment is not about the semantics of the SQL language or the interface we expose, but about reusing and sharing logic that we can get into the source code of our beloved project.

> And later:
>
>> Plus, I don't want to make that patch dependent on people using
>> new syntax. If that can give us additional information, that would
>> be a different thing.
>
> His second remark seems to carry an implicit question: if the optimizer
> can already see the FK constraints, why does the user need to annotate
> the query at all? Wouldn't automatic starjoin-style optimization be more
> consistent with SQL's declarative philosophy?

If you want to do optimization in the planner, you totally can, and I think we need more optimizer improvements. The attempts in the prior thread about star join optimization did exactly that. Tomas merely noted that he is interested in using the underlying proof architecture for that.

The feature we are working on here is NOT a performance feature; it's *only* a correctness feature. We as authors have different backgrounds and different views, but I am very confident that we agree on this: it is solely a correctness feature.

I am very open to discussing the potential value of using the infrastructure for optimizer improvements, because I am interested in those too. However, I think that's currently beyond the scope of this thread, given how involved this patch already is.

Informally, this feature is meant for query writers to say "if you can't prove the referential correctness of this join, please don't go ahead; give me an error instead". This error is not an accidental artifact; it sits at the core of this feature. Doing this at compile time is very helpful for building correct systems. We need to convey the intent to rely on a referential constraint in order for that constraint to be proven.
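For readers who want the "prove it or error out" contract in concrete terms, here is a minimal toy model in Python. It is not the patch's implementation and not the proposed SQL syntax: `ForeignKey`, `prove_key_join` and `KeyJoinError` are hypothetical names, the check only looks at declared foreign keys between two base tables, and it ignores NULL handling, derived tables and everything else the real proof has to cover.

```python
from dataclasses import dataclass


class KeyJoinError(Exception):
    """Raised when the toy checker cannot prove a join is a key join (hypothetical)."""


@dataclass(frozen=True)
class ForeignKey:
    name: str
    referencing_table: str
    referencing_cols: tuple
    referenced_table: str
    referenced_cols: tuple

    def column_pairs(self):
        # Order-insensitive view of the FK: {(referencing_col, referenced_col), ...}
        return frozenset(zip(self.referencing_cols, self.referenced_cols))


def prove_key_join(foreign_keys, referencing_table, referenced_table, pairs):
    """Return the FK proving that referencing_table references referenced_table
    on exactly the equated column pairs; otherwise raise KeyJoinError."""
    wanted = frozenset(pairs)
    for fk in foreign_keys:
        if (fk.referencing_table == referencing_table
                and fk.referenced_table == referenced_table
                and fk.column_pairs() == wanted):
            return fk
    # No declared constraint matches the stated intent: refuse instead of guessing.
    raise KeyJoinError(
        f"cannot prove that {referencing_table} references {referenced_table} "
        f"on {sorted(wanted)}")


fks = [ForeignKey("order_items_order_id_fkey", "order_items", ("order_id",),
                  "orders", ("id",))]

proof = prove_key_join(fks, "order_items", "orders", [("order_id", "id")])
print(proof.name)  # order_items_order_id_fkey

try:
    # Wrong direction: orders does not reference order_items.
    prove_key_join(fks, "orders", "order_items", [("id", "order_id")])
except KeyJoinError as err:
    print("error:", err)
```

The design point the sketch tries to capture is that failure is an error rather than a silent fallback: if no declared constraint matches the stated intent, no result is produced at all.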

Our current syntax also makes it much easier to review queries. For an example, see section 8.9 of our document, available as the key_joins.pdf attached earlier in this thread or in our online web demo at https://keyjoin.org/#sec8.10. The numbering is slightly off because of versioning issues.

> This reading is also supported by his point 13, where he explicitly
> asks whether the planner, rather than parse-analyze, might be the more
> appropriate place to handle this, and notes that moving it there could
> substantially reduce the code footprint.

I have tried Tomas' suggestion and found no substantial reduction in complexity. One issue here is that we have different inputs, since we are at the parsing stage, not the planning stage. Another is that we have vastly different outputs: we need to know not only that we are unique, but exactly how, including which unique constraints/indexes are involved, so that we can record the pg_depend entries for stored objects with attached key joins.

Even though we logically do almost the same thing as the optimizer code in the three helper functions, for code-architecture reasons the benefit of trying to reuse those functions seems questionable to me. A small side note for completeness' sake: we skip a few cases, such as the uniqueness of aggregates without a GROUP BY clause, because in those we know there can't be a referential constraint. We still encode that uniqueness, just not in that function.
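The point that the proof must record *which* constraint it relied on can be illustrated with another toy model, again in Python and again hypothetical: `ToyCatalog`, `UniqueConstraint` and the view name are made up, and the dictionary merely stands in for pg_depend entries. It assumes the referenced side is unique whenever the equated columns cover one of its PK/UNIQUE constraints, and shows why the proving constraint must be remembered: dropping it later would otherwise silently invalidate the stored query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UniqueConstraint:
    name: str
    table: str
    columns: frozenset


@dataclass
class ToyCatalog:
    constraints: dict = field(default_factory=dict)  # name -> UniqueConstraint
    depends_on: dict = field(default_factory=dict)   # constraint name -> set of dependent objects

    def add_constraint(self, c):
        self.constraints[c.name] = c

    def proving_constraint(self, table, join_cols):
        # The referenced side has at most one match per join value if the
        # equated columns cover some PK/UNIQUE constraint of that table.
        cols = frozenset(join_cols)
        for c in self.constraints.values():
            if c.table == table and c.columns <= cols:
                return c
        return None

    def create_view(self, view_name, referenced_table, join_cols):
        c = self.proving_constraint(referenced_table, join_cols)
        if c is None:
            raise ValueError(f"{view_name}: cannot prove {referenced_table} "
                             f"is unique on {sorted(join_cols)}")
        # Record *which* constraint the proof used (stand-in for pg_depend).
        self.depends_on.setdefault(c.name, set()).add(view_name)
        return c

    def drop_constraint(self, name):
        dependents = self.depends_on.get(name)
        if dependents:
            raise RuntimeError(f"cannot drop {name}: required by {sorted(dependents)}")
        del self.constraints[name]


cat = ToyCatalog()
cat.add_constraint(UniqueConstraint("orders_pkey", "orders", frozenset({"id"})))
proof = cat.create_view("order_item_details", "orders", ["id"])
print(proof.name)  # orders_pkey
try:
    cat.drop_constraint("orders_pkey")
except RuntimeError as err:
    print(err)  # cannot drop orders_pkey: required by ['order_item_details']
```

A plain yes/no uniqueness answer, which is roughly what a planner needs, would not be enough here: without the constraint name there is nothing to attach the dependency to.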

> On hints vs. syntax
>
> Traditionally, when users have needed to pass extra information to the
> optimizer (information the engine couldn't derive on its own), the
> community has handled this through hints. Non-standard, yes, but they
> leave the core syntax untouched and stay firmly in the optimizer's
> domain. Key Join introduces this information as first-class syntax with
> compile-time enforcement. What are the concrete advantages and
> disadvantages of that choice compared to a hint-based approach?

This patch series doesn't touch the optimizer at all, nor should it. If something like this gets committed, we will be able to harness some of the underlying architecture for optimization purposes.

The reason our intended correctness guarantees can't be achieved through work in the optimizer is that they have to be properly enforced for catalog objects as well: the referential guarantees a stored query relies on must be tracked. Allow me to quote from my earlier answer:

> With DDL-issue I was alluding to a DDL command that attaches a query
> to some object. Views, sql functions and policies all do that.
> I don't see us doing that work planning time. Theoretically we could
> try to use a new planner hook for that, to call the planner with an
> extra struct element telling it about increased locking arrangements
> without the intent to ever execute a query and abort after the proof.

Structurally, I don't see a sane way to make this a planner-stage feature at all.

> Specifically:
>
> The main advertised advantage is compile-time correctness enforcement,
> catching fan-out bugs at parse time rather than silently producing
> wrong results. Is that benefit sufficient to justify introducing "how"
> semantics into SQL syntax proper? A hint could achieve the
> optimizer-side benefits (starjoin planning, cardinality guidance)
> without touching the grammar. What does the syntax approach give us
> that a hint cannot?
>
> I recognize that compile-time enforcement is genuinely valuable, and
> that hints cannot be standardized through WG3. But I'd like to
> understand how the authors weigh those tradeoffs, especially given
> Tomas's observation that the planner already has access to FK
> information and could potentially derive much of this automatically.
>
> Best regards,
> Henson

Our correctness guarantee is very different from a hint. A hint tells the optimizer "try to do this thing, if you can". Our correctness guarantee, as of now, gives nothing to the optimizer and tells the parser "give me an error if you can't prove my assumptions". I struggle to see the overlap between the two.

Since you specifically raised the fan-out problem and wanted an opinion from an author, I will share mine; here I can't claim to speak for all authors. My opinion is that the fan-out problem alone is more than enough to warrant a proper SQL syntax extension. It has been a serious bug in multiple codebases I have worked with. Some of our analysis suggested to me that preventing the fan-out bug might be possible with less complex logic. However, this feature prevents not only the fan-out problem but a myriad of other issues, many of which I have seen in production systems too.
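To make the fan-out problem concrete, here is a small illustration in plain Python rather than SQL. The `orders` and `order_items` tables, and the assumption that `order_items.order_id` references `orders.id`, are hypothetical. It only shows the bug class (the joined rows have the grain of the referencing side, so summing a per-order column over them double-counts); it says nothing about how the patch detects such cases.

```python
# Hypothetical rows; order_items.order_id is assumed to reference orders.id.
orders = [  # (id, customer, total)
    (1, "alice", 100),
    (2, "bob", 50),
]
order_items = [  # (order_id, sku)
    (1, "A"),
    (1, "B"),
    (2, "C"),
]


def inner_join(left, right, left_key, right_key):
    """Naive nested-loop equijoin on one column position per side."""
    return [l + r for l in left for r in right if l[left_key] == r[right_key]]


# Result rows: (order_id, sku, id, customer, total), one per item.
joined = inner_join(order_items, orders, 0, 0)

# The join only enriches the referencing side: one row per order item.
assert len(joined) == len(order_items)

# Fan-out bug: summing a per-order measure at item grain counts order 1 twice.
naive_revenue = sum(row[4] for row in joined)
true_revenue = sum(total for _, _, total in orders)
print(naive_revenue, true_revenue)  # 250 150
```

No query here is "wrong" in the sense of failing; it just returns a plausible-looking number, which is why this class of bug tends to survive into production.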

You can read more about our design rationale in chapter 8 of the attached key_joins.pdf or online at https://keyjoin.org/#sec8. To understand the usefulness of the feature, I suggest sections 7.2 to 7.6, again in key_joins.pdf or online at https://keyjoin.org/#sec7.2.


Best regards
Arne
