In my example, I think we could check if (c,d) is unique on the right side, in which, case I believe, (a,d) would be unique after the join. So maybe the logic could be improved to:
Boolean leftUnique = mq.areColumnsUnique(left, leftColumns.union(rel.analyzeCondition().leftSet()), ignoreNulls); Boolean rightUnique = mq.areColumnsUnique(right, rightColumns.union(rel.analyzeCondition().rightSet()), ignoreNulls); I think this would still be correct, while being less conservative. The case where the right join column is a primary key, but the other column is not unique is probably pretty common. On Fri, Mar 7, 2025 at 1:13 PM Steven Phillips <[email protected]> wrote: > By join analysis, I assume you are referring to > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L336 > > The thing is, a join as the potential to make a unique column no longer > unique, as it could be an expanding join. e.g. joining these two tables on > b = c. Column a is unique on its own, but d is not. > > a b c d > 1 1 1 1 > 2 1 1 1 > > a b c d > 1 1 1 1 > 1 1 1 1 > 2 1 1 1 > 2 1 1 1 > > After the join, the key (a, d) is not unique. So the rule is not too > conservative. > > On Fri, Mar 7, 2025 at 12:57 PM Ian Bertolacci > <[email protected]> wrote: > >> Two comments: >> First: on our side, we’d like to know if *removing* the use of this >> predicate pullup information would have *negative* consequences for >> correctness. >> If it does not impact correctness: we can override the behavior to not >> consider this information, and move forwards. >> >> Second: a colleague pointed out that join’s notion of unique may also be >> too conservative, depending on what RelMdColumnUniqueness.areColumnsUnique >> is really supposed to be doing. >> If the intention is to check if the bitset of columns forms a unique key, >> then the join analysis is too conservative, since if there are unique >> columns from one side, and non-unique from the other, the whole key is >> considered not unique, even though the presence of *any* unique column >> makes the overall key unique. (Its worth pointing out that this is how >> Statisitcs.isKey works, which is how areColumnsUnique(TableScan) works if >> that RelOptTable does not provide a special uniqueness handler). >> If the intention is to check if *any* column in the bitset is itself >> unique, then the join analysis is also too conservative, for the same >> reasons. >> If the intention is to check if *all* column in the bitset is itself >> unique, then the join analysis is fine but the TableScan’s default analysis >> is incorrect (for reasons mentioned above). >> >> So there’s more than just this predicate pullup at play here. >> >
