On Sat, May 30, 2026 at 08:57:20PM +0200, Tomas Vondra wrote:
> >> The new join restriction is that if the join result includes a subset of
> >> the starjoin cluster, then it has to include the fact + prefix of the
> >> list of dimensions (which is the canonical join order).
> >
> > Sorry, I got lost here.  What is "prefix?"  I looked at the patch and
> > also could not understand it.
>
> Apologies, it may not be obvious from the code / comments (I'll try to
> improve that in the next version).
>
> Let's say we're joining "F" with dimensions D1, D2, D3. Then the
> starjoins_canonicalize() finds the cluster, and picks a canonical join
> order. Could be [F, D1, D2, D3] - in this order. Or whatever other
> permutation of the dimensions, it's all equal.

Uh, are D1, D2, D3 in relid order at this point?

> Then starjoin_order_invalid() ensures that whatever join relation we
> produce, it only ever contains a prefix of this list. So a join relation
> can contain [F], [F, D1], [F, D1, D2], [F, D1, D2, D3]. But it can't
> contain e.g. [F, D2], because that skips the D1 - it's not a prefix.

Okay, prefix like a multi-column index prefix of columns.

> The patch only applies this to relations from the cluster. There can be
> other relations in the join "in between" the dimensions - that does not
> make the join order "invalid".
>
> So for example there may be joins to non-dimensions A and B, and we will
> consider joins [F, A, D1, B, D2, D3] and so on as valid. The joins to A
> and B can increase/decrease cardinality, but thanks to this we
> should find the right place to join the dimensions.

Okay, so if D1, D2, and D3 are all "cluster" joins, then aren't they
1:1?  If so, why would you ever put something like B between them?  If B
reduces rows, it would go before the cluster, and if it expands them, it
would go after the cluster.  So if B is 1:1 too, in what cases might it
be better to join B between dimension joins?

> We could even make it a bit stricter, and require that all dimensions
> join "at once". I.e. after joining a dimension, only dimensions can be
> joined (until all dimensions are joined). So [F, D1, A, D2] would not be
> allowed. This would further reduce the number of join orders considered.

Right, I guess that is what I am asking above.

> > Impressive.
> >
>
> Indeed. I like how it fits into the existing approach. It's a bit like
> having yet another "join order restriction".

This would be a big feature improvement for OLAP workloads.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.
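
To check my reading of the prefix rule, here is a minimal sketch of it as
I understand it from the description above.  It is Python for brevity and
is not the patch's C code; the function name is_valid_join_rel() and the
set/list representation are made up for illustration, and the real
starjoin_order_invalid() presumably works on relid sets instead.

```python
def is_valid_join_rel(rel, canonical):
    """Return True if the join relation `rel` respects the prefix rule.

    rel       -- set of relation names contained in a candidate join relation
    canonical -- canonical join order of the cluster: the fact table first,
                 then the dimensions (hypothetical representation)
    """
    # Cluster members present in rel, listed in canonical order.
    members = [r for r in canonical if r in rel]

    # The restriction only applies if rel touches the cluster at all.
    if not members:
        return True

    # The members must be exactly the first k entries of the canonical
    # order, i.e. the fact table plus a prefix of the dimension list.
    return members == canonical[:len(members)]


canonical = ["F", "D1", "D2", "D3"]

print(is_valid_join_rel({"F", "D1"}, canonical))            # prefix [F, D1]
print(is_valid_join_rel({"F", "D2"}, canonical))            # skips D1
print(is_valid_join_rel({"F", "A", "D1", "B"}, canonical))  # A, B are ignored
```

Under this reading, non-cluster relations such as A and B never affect
the check, which is why [F, A, D1, B, D2, D3] is still considered.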
