In my example, I think we could check if (c,d) is unique on the right side,
in which, case I believe, (a,d) would be unique after the join. So maybe
the logic could be improved to:

Boolean leftUnique = mq.areColumnsUnique(left,
leftColumns.union(rel.analyzeCondition().leftSet()), ignoreNulls);
Boolean rightUnique = mq.areColumnsUnique(right,
rightColumns.union(rel.analyzeCondition().rightSet()), ignoreNulls);

I think this would still be correct, while being less conservative. The
case where the right join column is a primary key, but the other column is
not unique is probably pretty common.

On Fri, Mar 7, 2025 at 1:13 PM Steven Phillips <[email protected]> wrote:

> By join analysis, I assume you are referring to
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L336
>
> The thing is, a join as the potential to make a unique column no longer
> unique, as it could be an expanding join. e.g. joining these two tables on
> b = c. Column a is unique on its own, but d is not.
>
> a b      c   d
> 1 1      1   1
> 2 1      1   1
>
> a b c  d
> 1 1 1 1
> 1 1 1 1
> 2 1 1 1
> 2 1 1 1
>
> After the join, the key (a, d) is not unique. So the rule is not too
> conservative.
>
> On Fri, Mar 7, 2025 at 12:57 PM Ian Bertolacci
> <[email protected]> wrote:
>
>> Two comments:
>> First: on our side, we’d like to know if *removing* the use of this
>> predicate pullup information would have *negative* consequences for
>> correctness.
>> If it does not impact correctness: we can override the behavior to not
>> consider this information, and move forwards.
>>
>> Second: a colleague pointed out that join’s notion of unique may also be
>> too conservative, depending on what RelMdColumnUniqueness.areColumnsUnique
>> is really supposed to be doing.
>> If the intention is to check if the bitset of columns forms a unique key,
>> then the join analysis is too conservative, since if there are unique
>> columns from one side, and non-unique from the other, the whole key is
>> considered not unique, even though the presence of *any* unique column
>> makes the overall key unique. (Its worth pointing out that this is how
>> Statisitcs.isKey works, which is how areColumnsUnique(TableScan) works if
>> that RelOptTable does not provide a special uniqueness handler).
>> If the intention is to check if *any* column in the bitset is itself
>> unique, then the join analysis is also too conservative, for the same
>> reasons.
>> If the intention is to check if *all* column in the bitset is itself
>> unique, then the join analysis is fine but the TableScan’s default analysis
>> is incorrect (for reasons mentioned above).
>>
>> So there’s more than just this predicate pullup at play here.
>>
>

Reply via email to