Claude Brisson created CALCITE-7199:
---------------------------------------
Summary: RelMdColumnUniqueness joins handler is invalid for
columns in both sides
Key: CALCITE-7199
URL: https://issues.apache.org/jira/browse/CALCITE-7199
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.40.0
Reporter: Claude Brisson
{{RelMdColumnUniqueness.areColumnsUnique()}} handler for joins contains the
following code:
{code}
// If the original column mask contains columns from both the left and
// right hand side, then the columns are unique if and only if they're
// unique for their respective join inputs
Boolean leftUnique = mq.areColumnsUnique(left, leftColumns, ignoreNulls);
Boolean rightUnique = mq.areColumnsUnique(right, rightColumns, ignoreNulls);
if ((leftColumns.cardinality() > 0)
    && (rightColumns.cardinality() > 0)) {
  if ((leftUnique == null) || (rightUnique == null)) {
    return null;
  } else {
    return leftUnique && rightUnique;
  }
}
{code}
This is not correct. Uniqueness on both sides is a sufficient condition for the
columns to be unique in the join result, but not a necessary one.
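A small counterexample may make the "not necessary" part concrete. The following is a minimal, self-contained sketch that does not use Calcite at all: the class name, the helper methods and the sample rows are all hypothetical, chosen only for illustration. It joins a left input whose queried column is unique with a right input whose queried column is not unique, on a right-side key that is unique.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of Calcite.
public class JoinUniquenessDemo {

    // Inner equi-join on left[leftKey] == right[rightKey];
    // each output row is the left row followed by the right row.
    public static List<List<Integer>> innerJoin(
            List<List<Integer>> left, int leftKey,
            List<List<Integer>> right, int rightKey) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> l : left) {
            for (List<Integer> r : right) {
                if (l.get(leftKey).equals(r.get(rightKey))) {
                    List<Integer> row = new ArrayList<>(l);
                    row.addAll(r);
                    out.add(row);
                }
            }
        }
        return out;
    }

    // True if no two rows agree on all of the given column positions.
    public static boolean isUnique(List<List<Integer>> rows, int... columns) {
        Set<List<Integer>> seen = new HashSet<>();
        for (List<Integer> row : rows) {
            List<Integer> key = new ArrayList<>();
            for (int c : columns) {
                key.add(row.get(c));
            }
            if (!seen.add(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // left(id, fk): id is unique.
        List<List<Integer>> left =
            List.of(List.of(1, 10), List.of(2, 10), List.of(3, 20));
        // right(k, v): k is unique, v is not.
        List<List<Integer>> right =
            List.of(List.of(10, 7), List.of(20, 7));

        // Joined columns: 0 = id, 1 = fk, 2 = k, 3 = v.
        List<List<Integer>> joined = innerJoin(left, 1, right, 0);

        System.out.println(isUnique(left, 0));      // true:  id unique on the left
        System.out.println(isUnique(right, 1));     // false: v not unique on the right
        System.out.println(isUnique(right, 0));     // true:  join key k unique on the right
        System.out.println(isUnique(joined, 0, 3)); // true:  (id, v) unique in the join
    }
}
```

With these rows, the quoted handler logic would compute `leftUnique && rightUnique` as false for the column set (id, v), even though (id, v) is unique in the join output.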
The columns will also be unique in the join result if the following conditions
are all met:
* the queried columns coming from the left input are unique
* the right-side columns involved in the join equi-condition are unique
* the join does not generate nulls on the left
(and the same with left and right reversed).
Because the join is done on a unique key of the right side, each left row
matches at most one right row, so the uniqueness of the queried columns on the
left is preserved. Whatever columns we then add to the queried subset, it will
remain unique.
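The conditions listed above could be combined roughly as follows. This is only a sketch under stated assumptions, not the actual Calcite fix: the class, the method and all of its parameters are hypothetical, and the caller is assumed to have already computed each input (for instance via `mq.areColumnsUnique` on the relevant column sets and from the join type). Returning null rather than false when no sufficient condition holds is also an assumption, since the conditions above are described as sufficient, not necessary.

```java
// Hypothetical sketch, not Calcite code.
public class JoinUniquenessSketch {

    /**
     * Returns TRUE when one of the sufficient conditions holds, and
     * null (unknown) otherwise. All parameters are assumed to be
     * precomputed by the caller; Boolean inputs may be null (unknown).
     */
    public static Boolean columnsFromBothSidesUnique(
            Boolean leftColsUnique,       // queried left columns unique in left input
            Boolean rightColsUnique,      // queried right columns unique in right input
            Boolean leftJoinKeysUnique,   // left columns of the equi-condition unique
            Boolean rightJoinKeysUnique,  // right columns of the equi-condition unique
            boolean generatesNullsOnLeft,
            boolean generatesNullsOnRight) {
        // Condition already checked by the existing handler.
        if (Boolean.TRUE.equals(leftColsUnique)
                && Boolean.TRUE.equals(rightColsUnique)) {
            return true;
        }
        // Left columns unique, join on a unique right key, no nulls on the left.
        if (Boolean.TRUE.equals(leftColsUnique)
                && Boolean.TRUE.equals(rightJoinKeysUnique)
                && !generatesNullsOnLeft) {
            return true;
        }
        // Same with left and right reversed.
        if (Boolean.TRUE.equals(rightColsUnique)
                && Boolean.TRUE.equals(leftJoinKeysUnique)
                && !generatesNullsOnRight) {
            return true;
        }
        // Assumption: none of the sufficient conditions held, so report unknown.
        return null;
    }
}
```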
--
This message was sent by Atlassian Jira
(v8.20.10#820010)