On 6/1/26 16:00, Robert Haas wrote: > On Thu, May 28, 2026 at 6:11 PM Tomas Vondra <[email protected]> wrote: >> Determine if the query includes a starjoin (or multiple), and remember >> the relids included in the starjoin cluster. Pick a "canonical" join >> order for each starjoin cluster we found (e.g. with dimensions in relid >> order). > > I like the idea of trying to avoid considering a bunch of join orders > that are not meaningfully different from each other, but I think we're > only really guaranteed that two join orders are equivalent if both of > them are exactly one-to-one, with the row count neither increasing nor > decreasing. (Even then, it's not 100% equivalent, but probably close > enough.) I don't remember off-hand whether your definition of starjoin > also includes joins that can reduce the row count. If it does, then I > think we might need to be smarter than just putting the dimensions in > relid order.
For the purpose of this optimization, the dimensions have to be 1:1 joins. There must not be any additional restrictions, it needs to be a join on a FK, etc. So the relid order works fine. > If it does not, then maybe we need to think about whether > the optimization is going to apply to a sufficiently broad range of > cases. Maybe it's fine: if a fact table is joined to lots of dimension > tables, then some of them might have restriction clauses but probably > many of them won't. > In my experience it does apply to a significant number of queries, but there's a selection bias (people don't talk to me about queries that have no problems). Still, if the optimization costs virtually nothing, I don't think that's a problem (we already do stuff like matching joins to FK constraints anyway). Also, I think there are ways to use this optimization to more queries. For example it directly extends to snowflake joins - by adding "layers" of dimensions, recursively. It's not such a massive benefit, because snowflake joins have restrictions that substantially limit the search space. But it would allow increasing join_collapse_limit. The other idea is that Joel Jacobson's patch with KEY joins might allow proving more joins are 1:1, not just joins on foreign keys. But I haven't thought about that very much, and I'm not sure it's very useful for such joins (it's probably not a starjoin-like query, and does not have so many number of join orders). > This is making me wonder whether we could improve on GEQO by > transforming it from a "plan this query" thing to a "restrict the join > order" thing. For instance, suppose we have two tables T1 and T2 and > we wonder whether swapping the order in which they are joined matters. > We pick a few random partial join orders of tables in the query > excluding T1 and T2 and then check whether appending T1 and then T2 > produces significantly different results from appending T2 and then > T1. So for example say we pick (), (T3), (T3-T4), and (T5-T3). We then > test the cost/row count of T1-T2 vs. T2-T1, the cost/row count of > T3-T1-T2 vs. T3-T2-T1, similarly for T3-T4-T1-T2 vs. T3-T4-T2-T1, > similarly for T5-T3-T1-T2 vs. T5-T3-T2-T1. If we see that swapping the > order of T1 and T2 routinely has minimal effect on the outcome, then > it's probably safe to pick one or the other to be joined first and > disregard all the join orders where the two are swapped. By repeating > this kind of testing for various pairs of tables, we could possibly > prune the join search space for large join problems considerably, and > then still do an exhaustive test of the remaining join orders. Maybe > that would produce better results than GEQO currently does (or maybe > not). > I'm not opposed to improving GEQO, but I don't think it's an alternative to this optimization. It's still going to be a heuristics that skips parts of the search space too early. 99.999% of deployments are not even using GEQO for any queries. The optimization helps a lot both in cases where GEQO is not applicable (with fewer than geqo_threshold, and I doubt we want to lower that), and in cases where GEQO is supposed to be helpful (i.e. it beats GEQO with many joins). So I think it's orthogonal, and it might even mean we don't need GEQO as often. In any case, I see this as an orthogonal discussion. There's already a thread discussing changes to GEQO, or replacing it with a new heuristics. > Although I kind of like this idea and think it's interesting, it's > probably not going to be a good enough idea to compete with what > you're doing, because your idea is working even when the number of > dimensions is very small, and this seems like it would be a > significantly more expensive test. So the question is probably whether > there are cases where your cheaper test loses too much. I'm not sure > of the answer. My intuition is that join planning is pretty hard and > that shortcuts will tend to have cases where they behave really badly, > but my intuition is also that we are wasting a whole lot of CPU cycles > testing cases that are nearly equivalent to each other. I'm not sure > which of those two intuitions is more correct in this case. > Right, exactly. I agree this needs some testing how much overhead this adds to large queries that don't benefit from it. I'll do that, but I don't expect that to be an issue (because we already do the heavy work of matching the FKeys already). It might require a smarter way to represent the clusters or some other optimizations, of course. regards -- Tomas Vondra
