Hi Henson,
thank you for getting involved!
On 2026-06-25 1:15 PM, Henson Choi wrote:
Hi hackers,
Thank you for this interesting proposal. I have two questions that
touch on what I see as a fundamental tension in the design.
On SQL's evolution from How to What
As Codd put it, "The user of a relational system should not need to
know how the system stores and accesses data." Key Join, I would
argue, pulls in the opposite direction.
SQL has historically evolved away from procedural "how" toward
declarative "what" — users describe the result they want, and the
optimizer figures out how to get there. Foreign key constraints are a
good example: once declared in the schema, the optimizer can already
exploit them for join elimination, cardinality estimation, and so on.
Key Join asks users to explicitly annotate their queries with FK
traversal semantics. Isn't that a step back toward "how"?
I am 100% with you on that Codd quote. I love the declarative nature of
SQL and invited developers with imperative paradigms to change to
declarative ones. I'd argue that we describe what we want. Let me quote
our intent posted in the opening message of this thread:
> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema.
I'd argue the intent is the enrichment, and which side is the one being
referenced. It has been my experience over the years that developers
have it. We are offering a way to express it. By its nature this patch
series adds something closer to an assertion. If anything that makes a
key join more declarative than an ordinary equijoin: We declare an
intent, and the schema settles the rest. This is expressed in more
detail in section 8.3 in our key_joins.pdf
[https://www.postgresql.org/message-id/00c30670-64e1-4c30-a349-784426d333df%40app.fastmail.com]
or in the web demo under https://keyjoin.org/#sec8.4. (The numbering is
slightly off because of versioning issues.) The proposed patch does not
alter how the planner or optimizer work in the slightest. Please help me
understand how we are taking a step toward "how"?
Tomas Vondra raised a directly relevant point in his review:
> I'm interested in this patch because there seems to be a possible
> overlap with the starjoin planning (in that maybe we could try
> reusing some of the derived information for that).
I do think that we want to use the same underlying proving
infrastructure. That comment is not about the semantics of our SQL
language or our interface with the code, but about reusing and sharing
logic, that we can get into the source code of our beloved project.
And later:
> Plus, I don't want to make that patch dependent on people using
> new syntax. If that can give us additional information, that would
> be a different thing.
His second remark seems to carry an implicit question: if the optimizer
can already see the FK constraints, why does the user need to annotate
the query at all? Wouldn't automatic starjoin-style optimization be more
consistent with SQL's declarative philosophy?
If you want to do optimization in the planner, you totally can. I think
we need more optimizer improvements. As the attempts in the prior thread
about the optimizations of star joins did that. Tomas just noted that he
is interested in using the underlying proof architecture for that.
This feature we are working on here is NOT a performance feature, it's
*only* a correctness feature. We as authors have different backgrounds
and different views. However I am very confident that we are in
agreement, this is solely a correctness feature.
I am very open to discuss and reason about the potential value of using
the infrastructure for improvements, because I have an interest in those
too. I do think that's currently beyond the scope of this thread,
because of the already fairly involved complexity of this patch.
Informally this feature is meant for query writers to say "if you can't
prove the referential correctness of this feature, please don't go ahead
and give me an error instead". This error is not an accidental artifact,
it sits at the core of this feature. Doing this compile time is very
helpful to build correct systems. We need to convey the intent of the
existence of a referential constraint in order for it to be proven.
Our current syntax also makes it much easier to review queries. For an
example, I'd refer you to our example in section 8.9 of our document,
which can be read here in the thread at the attached key_joins.pdf or
our online web demo at https://keyjoin.org/#sec8.10. The numbering is
slightly off because of versioning issues.
This reading is also
supported by his point 13, where he explicitly asks whether the planner
— rather than parse-analyze — might be the more appropriate place to
handle this, and notes that moving it there could substantially reduce
the code footprint.
I have tried Tomas suggestion and found no substantial reduction in
complexity. One issue here is that we have different inputs, since we
are at parsing and not planning stage. Another that we have vastly
different outputs, since we need not only to know, that we are unique,
but exactly how, including the knowledge of the unique
constraints/indexes to record the pg_depend for stored objects with
attached key joins.
Despite us logically doing almost the same thing in the three helper
functions as the optimizer code does, for code architectural reasons,
the benefit of trying to use those functions seems questionable to me.
Small side note for completeness sake: We skip a few cases like
uniqueness of aggregates without group by clause, because in those we
know, that there can't be a referential constraint. We still encode the
uniqueness of that, but not in that function.
On hints vs. syntax
Traditionally, when users have needed to pass extra information to the
optimizer — information the engine couldn't derive on its own — the
community has handled this through hints. Non-standard, yes, but they
leave the core syntax untouched and stay firmly in the optimizer's
domain. Key Join introduces this information as first-class syntax with
compile-time enforcement. What are the concrete advantages and
disadvantages of that choice compared to a hint-based approach?
This patch series doesn't touch the optimizer at all nor should it. If
something like this gets committed, we will be able to harness some of
the underlying architecture for optimization purposes.
The reason why our intended correctness guarantees can't be achieved
through works in the optimizer is the proper enforcement of tracking the
correctness of the referential guarantees in catalog objects. Allow me
to quote from my earlier answer:
> With DDL-issue I was alluding to a DDL command that attaches a query
to some object. Views, sql functions and policies all do that.
> I don't see us doing that work planning time. Theoretically we could
try to use a new planner hook for that, to call the planner with an
> extra struct element telling it about increased locking arrangements
without the intent to ever execute a query and abort after the proof.
Structurally I don't really see a sane way to make this feature a
planner stage feature at all.
Specifically:
The main advertised advantage is compile-time correctness enforcement
— catching fan-out bugs at parse time rather than silently producing
wrong results. Is that benefit sufficient to justify introducing "how"
semantics into SQL syntax proper? A hint could achieve the
optimizer-side benefits (starjoin planning, cardinality guidance)
without touching the grammar. What does the syntax approach give us
that a hint cannot?
I recognize that compile-time enforcement is genuinely valuable, and
that hints cannot be standardized through WG3. But I'd like to
understand how the authors weigh those tradeoffs, especially given
Tomas's observation that the planner already has access to FK
information and could potentially derive much of this automatically.
Best regards,
Henson
Our correctness guarantee is very different from a hint. A hint is
telling the optimizer "try to do this thing, if you can". Our
correctness guarantee is as of now not giving anything to the optimizer
and telling the parser "give me an error, if you can't prove my
assumptions". I struggle to see the overlap between the two.
Since you specifically raised the fan-out problem and wanted an opinion
from an author, I will share mine. Here I can't claim to speak for all
authors.
My opinion is that the fan-out problem alone is more than enough to
warrant a proper SQL syntax expansion. It has been a serious bug in
multiple codebases I have worked with.
Some of our analysis suggested to me, that preventing the fan out bug
might be possible with less complex logic. However this does not only
prevent the fan out problem, but a myriad of issues, a lot of which I
have seen in production systems too.
You can read more about our design rationale in chapter 8 in the
attached key_joins.pdf or online at https://keyjoin.org/#sec8. For an
understanding of the usefulness of the feature I suggest sections 7.2 to
7.6 again in the key_joins.pdf or online at https://keyjoin.org/#sec7.2.
Best regards
Arne