Re: should we have a fast-path planning for OLTP starjoins?

Tomas Vondra Mon, 01 Jun 2026 08:35:40 -0700

On 6/1/26 16:00, Robert Haas wrote:
> On Thu, May 28, 2026 at 6:11 PM Tomas Vondra <[email protected]> wrote:
>> Determine if the query includes a starjoin (or multiple), and remember
>> the relids included in the starjoin cluster. Pick a "canonical" join
>> order for each starjoin cluster we found (e.g. with dimensions in relid
>> order).
> 
> I like the idea of trying to avoid considering a bunch of join orders
> that are not meaningfully different from each other, but I think we're
> only really guaranteed that two join orders are equivalent if both of
> them are exactly one-to-one, with the row count neither increasing nor
> decreasing. (Even then, it's not 100% equivalent, but probably close
> enough.) I don't remember off-hand whether your definition of starjoin
> also includes joins that can reduce the row count. If it does, then I
> think we might need to be smarter than just putting the dimensions in
> relid order.


For the purpose of this optimization, the dimensions have to be 1:1
joins. There must not be any additional restrictions, it needs to be a
join on a FK, etc. So the relid order works fine.

> If it does not, then maybe we need to think about whether
> the optimization is going to apply to a sufficiently broad range of
> cases. Maybe it's fine: if a fact table is joined to lots of dimension
> tables, then some of them might have restriction clauses but probably
> many of them won't.
> 

In my experience it does apply to a significant number of queries, but
there's a selection bias (people don't talk to me about queries that
have no problems). Still, if the optimization costs virtually nothing, I
don't think that's a problem (we already do stuff like matching joins to
FK constraints anyway).

Also, I think there are ways to use this optimization to more queries.
For example it directly extends to snowflake joins - by adding "layers"
of dimensions, recursively. It's not such a massive benefit, because
snowflake joins have restrictions that substantially limit the search
space. But it would allow increasing join_collapse_limit.

The other idea is that Joel Jacobson's patch with KEY joins might allow
proving more joins are 1:1, not just joins on foreign keys. But I
haven't thought about that very much, and I'm not sure it's very useful
for such joins (it's probably not a starjoin-like query, and does not
have so many number of join orders).

> This is making me wonder whether we could improve on GEQO by
> transforming it from a "plan this query" thing to a "restrict the join
> order" thing. For instance, suppose we have two tables T1 and T2 and
> we wonder whether swapping the order in which they are joined matters.
> We pick a few random partial join orders of tables in the query
> excluding T1 and T2 and then check whether appending T1 and then T2
> produces significantly different results from appending T2 and then
> T1. So for example say we pick (), (T3), (T3-T4), and (T5-T3). We then
> test the cost/row count of T1-T2 vs. T2-T1, the cost/row count of
> T3-T1-T2 vs. T3-T2-T1, similarly for T3-T4-T1-T2 vs. T3-T4-T2-T1,
> similarly for T5-T3-T1-T2 vs. T5-T3-T2-T1. If we see that swapping the
> order of T1 and T2 routinely has minimal effect on the outcome, then
> it's probably safe to pick one or the other to be joined first and
> disregard all the join orders where the two are swapped. By repeating
> this kind of testing for various pairs of tables, we could possibly
> prune the join search space for large join problems considerably, and
> then still do an exhaustive test of the remaining join orders. Maybe
> that would produce better results than GEQO currently does (or maybe
> not).
> 

I'm not opposed to improving GEQO, but I don't think it's an alternative
to this optimization. It's still going to be a heuristics that skips
parts of the search space too early. 99.999% of deployments are not even
using GEQO for any queries.

The optimization helps a lot both in cases where GEQO is not applicable
(with fewer than geqo_threshold, and I doubt we want to lower that), and
in cases where GEQO is supposed to be helpful (i.e. it beats GEQO with
many joins).

So I think it's orthogonal, and it might even mean we don't need GEQO as
often. In any case, I see this as an orthogonal discussion. There's
already a thread discussing changes to GEQO, or replacing it with a new
heuristics.

> Although I kind of like this idea and think it's interesting, it's
> probably not going to be a good enough idea to compete with what
> you're doing, because your idea is working even when the number of
> dimensions is very small, and this seems like it would be a
> significantly more expensive test. So the question is probably whether
> there are cases where your cheaper test loses too much. I'm not sure
> of the answer. My intuition is that join planning is pretty hard and
> that shortcuts will tend to have cases where they behave really badly,
> but my intuition is also that we are wasting a whole lot of CPU cycles
> testing cases that are nearly equivalent to each other. I'm not sure
> which of those two intuitions is more correct in this case.
> 

Right, exactly. I agree this needs some testing how much overhead this
adds to large queries that don't benefit from it. I'll do that, but I
don't expect that to be an issue (because we already do the heavy work
of matching the FKeys already). It might require a smarter way to
represent the clusters or some other optimizations, of course.


regards

-- 
Tomas Vondra

Re: should we have a fast-path planning for OLTP starjoins?

Reply via email to