Re: Key joins

Arne Roland Thu, 11 Jun 2026 17:21:18 -0700

Hi Tomas,

thank you for checking it out!


On 2026-06-11 11:02 PM, Tomas Vondra wrote:

Hi,

I took a quick look at the patch over the past couple days. I don't have
a perfect understanding of how it works, but let me share what I have so
far, before I get distracted by other stuff.

I'm interested in this patch because there seems to be a possible
overlap with the overlap with the starjoin planning (in that maybe we
could try reusing some of the derived information for that).

It's a mix of random thoughts, high/low level, important/superficial in
no particular order.


1) It does not compile, ATExecSetRowSecurity seems to be missing
prev_rls or something like that. I simply commented this out, to get it
to compile.


Sorry, I am confused. There is the self contained block of

@@ -18879,6 +18898,7 @@ ATExecSetRowSecurity(Relation rel, bool rls)
        Relation        pg_class;
        Oid                     relid;
        HeapTuple       tuple;
+       bool            prev_rls;

relid = RelationGetRelid(rel);@@ -18890,6 +18910,7 @@ ATExecSetRowSecurity(Relation rel, bool rls)

        if (!HeapTupleIsValid(tuple))
                elog(ERROR, "cache lookup failed for relation %u", relid);

+ prev_rls = ((Form_pg_class) GETSTRUCT(tuple))->relrowsecurity;

        ((Form_pg_class) GETSTRUCT(tuple))->relrowsecurity = rls;
        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

@@ -18898,6 +18919,21 @@ ATExecSetRowSecurity(Relation rel, bool rls)table_close(pg_class, RowExclusiveLock);

        heap_freetuple(tuple);
+
+       /*
+        * Revalidate stored key-join proofs that depend on this relation.  The
+        * key-join base-fact computation refuses to expose facts for a relation
+        * with row-level security enabled (parse_key_join.c), so flipping
+        * relrowsecurity can leave a stored proof unprovable.  Revalidate
+        * whenever the flag actually changes; for the off-to-on transition the
+        * revalidation raises an error and aborts this DDL, while the on-to-off
+        * transition is a no-op for proofs that were already valid.
+        */
+       if (rls != prev_rls)
+       {
+               CommandCounterIncrement();
+               RevalidateDependentKeyJoinObjectsOnRelation(relid);
+       }
  }

Apart from that there shouldn't be any references to that. Whatdefinition was missing?

2) fk_referenced_selected in find_key_join_match should be marked as
PG_USED_FOR_ASSERTS_ONLY, to fix compiler warning

Thank you for spotting that.

3) It'd be helpful if the commit message for 0001 explained why this
change is needed. More clearly than now, I mean. The initial message in
this thread points at another thread as the source of this, but that
thread is huge so how are reviewers expected to find the explanation?

Don't explain just what the commit does, but why it's needed. Say, give
an example of how the old code fails.

In short we need that to prevent objects our proof relies on to bedropped or altered below us. I'll try to work that into the commit message.

4) I think there needs to be a README explaining how the feature works,
i.e. the design and trade offs. Alternatively, it could be explained in
a comment at the beginning of some .c file, but a README seems easier to
find. Without this, reviewers have to piece it from the sgml "user"
docs, and random comments all over the place.


5) It might be helpful if the README defined a couple terms introduced
by the patch, but not really defined anywhere. I mean terms like: join
point, proof, proof graph, fact, surface, "relation visible from". I can
guess what each of these means, but maybe I got it wrong.

Although, I now see "join point" was used in the docs before, so maybe
it's a well-known term?

Thank you, I will try to come up with some better document explainingthe concept. For now I can mainly point you only tohttps://keyjoin.org/#sec7.1, which gives a birds eye overview over howthe implementation and it's artifacts work.

Seeing your confusion, I think we should definitely reduce the amount ofterminology used. For instance "proof graph" is shorthand for the sliceof the catalog dependency graph contributed by key-join proofs(depending object → proof object), i.e. the deptype='k' subgraph of thedependency graph. I don't think we need to introduce this word as a newconcept. Thank you, this input is very valuable to improve this.

6) Does it always have to be a "foreign key traversal"? Consider an
example like this:

     create table dim (id serial primary key);
     create table f (id serial primary key, did int);
     select * from f left join dim for key (id) <- f(did);

Currently this fails because of no matching foreign key constraint, but
isn't that pretty much the same thing as if the table actually had the
foreign key (at least considering the cardinality of the join - it won't
change it by adding/removing rows).

I realize that'd contradict the "FOR KEY" part of this patch, but it's
also one of the things that might be beneficial for the starjoin
planning (in that we could maybe extend it to more cases). Although,
maybe we could check those constraints during planning, just like we
check the fkey_list.

This is a very different feature avoiding far less bugs than theoriginal key join. To give just once, consider


SELECT *
FROM customers cu
LEFT JOIN FOR KEY orders (id) <- cu (id)
WHERE customers.id = 122354;

We decided, this (among other) error classes, was important enough, wepushed for stronger error guarantees, than just the uniqueness of thereferenced side. I still am convinced, this more save way of writingqueries is better for it's additional safety. I do think caring aboutthe foreign key gives the better syntax.


Do you think such constructions are very common for your customers?

Our proof framework with proof facts could potentially be leveraged toproof such constructions too. We opted to keep this patch as minimal aspossible, since it's already huge. In the end our conquest is to get tosomething, which is eventually commitable. This patch has a lot thingsto get wrong. But I think as a second step adding support of otherschemas and handing this information of to the planer, should be afairly straight forward thing to do. Although it would add a few lineson it's own, it's step, that I seems easier than landing a minimalversion of this patch.

7) The patch invents a new "FILTER" clause for joins:

    [ FILTER (WHERE join_filter) ]

I understand why it's done - there may be additional join conditions, on
top of the FK equality. But I think it'll be rather confusing. We
already have a "Join Filter", which is used for join clauses that happen
to not be used as "proper" join conditions (e.g. Hash/Merge Cond), and
has to be evaluated "after" the join itself. And now we'd have another
kind of "join filter" ...

Is there a different that'd make this work without the FILTER? For
example, we might encode the "key join" information in the existing ON
(...) clause:

     ON (KEY (oi.order_id) -> (o.id) AND ... other clauses ...)

but it seems not as readable. Or maybe just use something else than
"FILTER"? Not sure.

This word has the benefit of being already a keyword and it solves theproblem rather clear cut. Do you think, there is something we could doto make this better?

8) I don't understand this change in functioncmds:

     /* Routine kind cannot change for an existing pg_proc OID. */
     Assert(procForm->prokind != PROKIND_AGGREGATE);

The comment says we can't change OID, but the assert checks it's not an
aggregate functions. Isn't that misleading?

Maybe it would be cleaner to mention additionally, that "aggregates werealready rejected on the pre-lock tuple"? I actually think, that commentis indeed stating a helpful invariant to explain why we are concurrencysave here.

9) DefineView now adjusts the query ID. Why is that needed? Isn't that a
bit weird?

Sorry, does it? Could you point out where exactly? I might be too tiredto see it right now.

10) checkWellFormedRecursionWalker now recurses, but why is that needed?

Didn't in already recurse before? Didn't check the blame, but I do thinkit's recursing for a long time.

I just see an added line to check the new filter clause for subqueriesor something to recurse into.

11) It's not clear to me why we need p_creating_stored_object, and it's
not explained anywhere. Maybe it's obvious, but not to me.

Sorry, that's my oversight. We need to take stronger locks, if we storesomething materially, to prevent concurrent changes, while we store it.

Where do you recon a better comment would be helpful? A longer commentdirectly in the struct definition?

12) I'm very skeptical anyone can meaningfully review parse_key_join.c.
It's 150KB with ~5200 lines (very dense, with pretty minimal comments).
That's a massive file. The chance of me declaring this committable is
about 0%, simply because I wouldn't believe I understand it.

I think it needs to be broken up into smaller pieces, somehow. I don't
know how, but perhaps it's possible to extract a "minimal feature"
handling some limited subset of cases, and then gradually expand it?

I think there is serious complexity involved in just getting the basicthe architecture. That being said, I am very open to any idea of astepping stone to this.

It's something I thought a lot about during the last weeks. I am toyingwith the thought of ripping out all changes related to anything storedlike views and sql functions as a minimal, minimal version. It's so barebone, I'm not terribly exited about doing that, but maybe it's anecessary evil. Having thought about a lot of edge cases related to thatstoring, As I see it, we wouldn't save a lot of lines of code with this,but it makes reasoning about correctness easier. I can testament to thecomplexity in reasoning and correctness it adds. Do you have any otherideas for a stepping stone, that could reasonably land?

I think this one of the most fundamental questions to be answered,before we optimize one of the paths. What is the first consumer of thesurface fact framework and how can we stage a minimal commit around it.We need a committer to feel save to commit something with thisframework. If you have something notably smaller than this, I am all ears.

Furthermore, it seems a lot of that file is duplicate with code we
already have elsewhere. For example, couldn't it use a bunch of list
functions instead of writing a local version?

   list_contains_equal_node -> list_member
   append_dependencies_unique -> list_concat_unique
   append_dependency_unique -> list_append_unique
   ...

There may be more such cases, I haven't looked for them.

Thanks for bringing it up. I will look into that, once I've properlygone through the other suggestions. I suspect there won't be many lineto save, but I'll have a look.

13) I think the main question is whether parse-analyze is the right
place to handle this. I don't know. I assume one of the reasons for
doing that is to get an error when defining a view, or when altering an
object - e.g. like when dropping a function used by a view. Which seems
reasonable, people would not like views silently broken by DDL.

But it seems to be this also leads to a lot of code duplication, because
the parse analysis now has to "reimplement" a lot of the stuff already
done in later stages, after parse analyze. For example, we have the
root->fkey_list thing, but that's not available yet.

There's probably more such information - like innerrel_is_unique,
rel_is_distinct_for, relation_has_unique_index_for etc.

Maybe it's not worth it? What if we did this in the planner instead? How
much simpler would it get? My intuition is it'd get much smaller, but
maybe I'm wrong.

It'd probably mean it's not necessary to expand views during parse,
which was one of the problems mentioned at the beginning of this thread.

I suppose we'd need to keep additional information in the plan to know
which joins to check, etc.

If we want to use this for correctness and for proofs, we can't do itplan time. We have DDL-writes to store proofs. In it's nature this is acompile time feature.

Even if we don't include all of that in the first patch, this placesvery severe limitations like the DDL issue, which would be just tooolimiting to get to as an end place. I do think it's more reasonable howto cut this up into more digestible pieces.

Even though the planner is messy, I feel more at home there than here inthe parser. I could see a world were we clean up some of those out ofthe planner to make them available earlier some time in parsing. Wecould push those variables forward to the planner to avoid redoing thework. Each of these steps would need a careful performance check and Iam not sure how the real stepping stones would look like.

14) If we wanted to use some of this for the starjoin planning stuff,
we'd need to propagate more information too, I think. But I'm not sure
about this - we already have the information about foreign keys, so if
key joins are tied to foreign keys, there would not be no new useful
information I suppose.

Plus, I don't want to make that patch dependent on people using new
syntax. If that can give us *additional* information, that would be a
different thing.

As I said above, if we figure out what information we want to propagatethis will be helpful. We will be able to propagate this. And if we canget in a base patch with this architecture, adding this should bestraight forward.

One thing, that sounds fairly simple to derive from our proof, is theproof graph. This means in every node we denote, this node can betreated as cardinality preserving joined onto a different node. Forplanning we need something more global, because we need to considersomething WHERE/HAVING, but the existence of such a clause shouldn't becomplex to check for.

[...]

16) I was wondering what performance impact this has, roughly. So I
created a simple 4-way join with fact + 3 dimensions:

   create table dim1 (id serial primary key, val text);
   create table dim2 (id serial primary key, val text);
   create table dim3 (id serial primary key, val text);
   create table f (id serial primary key,
                   d1 int not null references dim1(id),
                   d2 int not null references dim2(id),
                   d3 int not null references dim3(id));

and then measured throughput with 2 queries

   select * from f
            join dim1 on (dim1.id = d1)
            join dim2 on (dim2.id = d2)
            join dim3 on (dim3.id = d3);

   select * from f
            join dim1 for key (id) <- f (d1)
            join dim2 for key (id) <- f (d2)
            join dim3 for key (id) <- f (d3);

which is the same query, except for the FOR KEY syntax. And I got this
(under explain, to only do the planning):

               patched    master
   -----------------------------
      join       25251     24906
   keyjoin       17159

So that's ~30% regression compared to master / regular joins. I also
tried with join_collapse_limit=1 to eliminate the join order planning:

               patched    master
   -----------------------------
      join       31476     31252
   keyjoin       19353

That's 40% regression. Of course, this is just explain - once there is
data in the table, and actual execution, the differences will be much
smaller. Still, it's not great.

I haven't looked very closely, but based on some quick profiling it
seems ensure_key_join_surface_facts / compute_key_join_relation_facts is
doing expensive stuff like findNotNullConstraintAttnum or
get_index_constraint, both of which do systable scans. That should
probably go through syscache or something like that.

I'm not too surprised about ensure_key_join_surface_facts holding mostof the regression, since it's doing all the work. Since I spent no timeor thought on optimizing runtime thus far, I think there a lot of longhanging fruits left. I'd prefer to sort out the architecture first, butI think, we should be able to improve the parsetime of this feature by asizeable amount. I fully agree: Before committing this, we should hitsome of the relevant functions for a quick win.

regards

I will probably work my way through this incrementally. I hope to have aversion incorporating your vast feedback next week.


Regards
Arne

Re: Key joins

Reply via email to