Hi Henson,

thank you for getting involved!

On 2026-06-25 1:15 PM, Henson Choi wrote:
Hi hackers,

Thank you for this interesting proposal. I have two questions that touch on what I see as a fundamental tension in the design.

On SQL's evolution from How to What

As Codd put it, "The user of a relational system should not need to
know how the system stores and accesses data." Key Join, I would
argue, pulls in the opposite direction.

SQL has historically evolved away from procedural "how" toward declarative "what" — users describe the result they want, and the optimizer figures out how to get there. Foreign key constraints are a good example: once declared in the schema, the optimizer can already exploit them for join elimination, cardinality estimation, and so on. Key Join asks users to explicitly annotate their queries with FK traversal semantics. Isn't that a step back toward "how"?

I am 100% with you on that Codd quote. I love the declarative nature of SQL and invited developers with imperative paradigms to change to declarative ones. I'd argue that we describe what we want. Let me quote our intent posted in the opening message of this thread:

> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema.

I'd argue the intent is the enrichment, and which side is the one being referenced. It has been my experience over the years that developers have it. We are offering a way to express it. By its nature this patch series adds something closer to an assertion. If anything that makes a key join more declarative than an ordinary equijoin: We declare an intent, and the schema settles the rest. This is expressed in more detail in section 8.3 in our key_joins.pdf [https://www.postgresql.org/message-id/00c30670-64e1-4c30-a349-784426d333df%40app.fastmail.com] or in the web demo under https://keyjoin.org/#sec8.4. (The numbering is slightly off because of versioning issues.) The proposed patch does not alter how the planner or optimizer work in the slightest. Please help me understand how we are taking a step toward "how"?

Tomas Vondra raised a directly relevant point in his review:

> I'm interested in this patch because there seems to be a possible
> overlap with the starjoin planning (in that maybe we could try
> reusing some of the derived information for that).
I do think that we want to use the same underlying proving infrastructure. That comment is not about the semantics of our SQL language or our interface with the code, but about reusing and sharing logic, that we can get into the source code of our beloved project.
And later:

> Plus, I don't want to make that patch dependent on people using
> new syntax. If that can give us additional information, that would
> be a different thing.

His second remark seems to carry an implicit question: if the optimizer
can already see the FK constraints, why does the user need to annotate
the query at all? Wouldn't automatic starjoin-style optimization be more
consistent with SQL's declarative philosophy?

If you want to do optimization in the planner, you totally can. I think we need more optimizer improvements. As the attempts in the prior thread about the optimizations of star joins did that. Tomas just noted that he is interested in using the underlying proof architecture for that.

This feature we are working on here is NOT a performance feature, it's *only* a correctness feature. We as authors have different backgrounds and different views. However I am very confident that we are in agreement, this is solely a correctness feature.

I am very open to discuss and reason about the potential value of using the infrastructure for improvements, because I have an interest in those too. I do think that's currently beyond the scope of this thread, because of the already fairly involved complexity of this patch.

Informally this feature is meant for query writers to say "if you can't prove the referential correctness of this feature, please don't go ahead and give me an error instead". This error is not an accidental artifact, it sits at the core of this feature. Doing this compile time is very helpful to build correct systems. We need to convey the intent of the existence of a referential constraint in order for it to be proven.

Our current syntax also makes it much easier to review queries. For an example, I'd refer you to our example in section 8.9 of our document, which can be read here in the thread at the attached key_joins.pdf or our online web demo at https://keyjoin.org/#sec8.10. The numbering is slightly off because of versioning issues.

This reading is also
supported by his point 13, where he explicitly asks whether the planner
— rather than parse-analyze — might be the more appropriate place to
handle this, and notes that moving it there could substantially reduce
the code footprint.
I have tried Tomas suggestion and found no substantial reduction in complexity. One issue here is that we have different inputs, since we are at parsing and not planning stage. Another that we have vastly different outputs, since we need not only to know, that we are unique, but exactly how, including the knowledge of the unique constraints/indexes to record the pg_depend for stored objects with attached key joins.

Despite us logically doing almost the same thing in the three helper functions as the optimizer code does, for code architectural reasons, the benefit of trying to use those functions seems questionable to me. Small side note for completeness sake: We skip a few cases like uniqueness of aggregates without group by clause, because in those we know, that there can't be a referential constraint. We still encode the uniqueness of that, but not in that function.

On hints vs. syntax

Traditionally, when users have needed to pass extra information to the
optimizer — information the engine couldn't derive on its own — the
community has handled this through hints. Non-standard, yes, but they
leave the core syntax untouched and stay firmly in the optimizer's
domain. Key Join introduces this information as first-class syntax with
compile-time enforcement. What are the concrete advantages and
disadvantages of that choice compared to a hint-based approach?

This patch series doesn't touch the optimizer at all nor should it. If something like this gets committed, we will be able to harness some of the underlying architecture for optimization purposes.

The reason why our intended correctness guarantees can't be achieved through works in the optimizer is the proper enforcement of tracking the correctness of the referential guarantees in catalog objects. Allow me to quote from my earlier answer:

> With DDL-issue I was alluding to a DDL command that attaches a query to some object. Views, sql functions and policies all do that. > I don't see us doing that work planning time. Theoretically we could try to use a new planner hook for that, to call the planner with an > extra struct element telling it about increased locking arrangements without the intent to ever execute a query and abort after the proof.

Structurally I don't really see a sane way to make this feature a planner stage feature at all.

Specifically:

The main advertised advantage is compile-time correctness enforcement
— catching fan-out bugs at parse time rather than silently producing
wrong results. Is that benefit sufficient to justify introducing "how"
semantics into SQL syntax proper? A hint could achieve the
optimizer-side benefits (starjoin planning, cardinality guidance)
without touching the grammar. What does the syntax approach give us
that a hint cannot?

I recognize that compile-time enforcement is genuinely valuable, and
that hints cannot be standardized through WG3. But I'd like to
understand how the authors weigh those tradeoffs, especially given
Tomas's observation that the planner already has access to FK
information and could potentially derive much of this automatically.

Best regards,
Henson

Our correctness guarantee is very different from a hint. A hint is telling the optimizer "try to do this thing, if you can". Our correctness guarantee is as of now not giving anything to the optimizer and telling the parser "give me an error, if you can't prove my assumptions". I struggle to see the overlap between the two.

Since you specifically raised the fan-out problem and wanted an opinion from an author, I will share mine. Here I can't claim to speak for all authors. My opinion is that the fan-out problem alone is more than enough to warrant a proper SQL syntax expansion. It has been a serious bug in multiple codebases I have worked with. Some of our analysis suggested to me, that preventing the fan out bug might be possible with less complex logic. However this does not only prevent the fan out problem, but a myriad of issues, a lot of which I have seen in production systems too.

You can read more about our design rationale in chapter 8 in the attached key_joins.pdf or online at https://keyjoin.org/#sec8. For an understanding of the usefulness of the feature I suggest sections 7.2 to 7.6 again in the key_joins.pdf or online at https://keyjoin.org/#sec7.2.

Best regards
Arne

Reply via email to