[
https://issues.apache.org/jira/browse/CALCITE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yash updated CALCITE-7622:
--------------------------
Description:
JoinProjectTransposeRule must not match SEMI / ANTI join
The fundamental problem is that {{JoinProjectTransposeRule}} was designed
around *regular joins* (INNER/OUTER), where the output row type is always
{{{}left columns + right columns{}}}. SEMI and ANTI joins violate this
assumption in a structural way.
h4. 1. The rule computes {{joinChildrenRowType}} with hardcoded {{INNER}}
{code:java}
final RelDataType joinChildrenRowType = SqlValidatorUtil.deriveJoinRowType(
    leftJoinChild.getRowType(), rightJoinChild.getRowType(),
    JoinRelType.INNER, // <-- always INNER
    ...);{code}
This is intentional for INNER/OUTER: it's building a "scratch" merged type to
construct the RexProgram over both sides. But the assumption that {{left fields
+ right fields = join output fields}} is {*}only true for non-SEMI/ANTI
types{*}.
h4. 2. SEMI and ANTI drop the right side from their output row type
From {{{}SqlValidatorUtil.deriveJoinRowType{}}}:
{code:java}
case SEMI:
case ANTI:
  rightType = null; // right side is GONE from the output
  break;{code}
So when the rule later does:
{code:java}
final int nProjExprs = join.getRowType().getFieldCount();{code}
For a SEMI/ANTI join, {{join.getRowType()}} only has {*}left-side fields{*}.
But the rule calculated {{projects}} over {{left fields + right fields}} (the
INNER-typed scratch type). The counts don't match.
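The mismatch between the scratch type and the real join output can be sketched with a toy model. This is not Calcite code: {{JoinRowTypeSketch}}, {{JoinKind}} and {{deriveRowType}} are hypothetical names, and row types are reduced to lists of field names purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of join row-type derivation; an illustration, not Calcite's classes. */
final class JoinRowTypeSketch {
  /** Hypothetical stand-in for the join types discussed above. */
  enum JoinKind { INNER, LEFT, SEMI, ANTI }

  /** Returns the output field names; SEMI and ANTI keep only the left side. */
  static List<String> deriveRowType(
      List<String> left, List<String> right, JoinKind kind) {
    List<String> fields = new ArrayList<>(left);
    if (kind != JoinKind.SEMI && kind != JoinKind.ANTI) {
      fields.addAll(right);
    }
    return fields;
  }

  public static void main(String[] args) {
    List<String> left = List.of("a", "b", "c");
    List<String> right = List.of("x", "y");
    // The rule's scratch type is always derived as INNER...
    int scratch = deriveRowType(left, right, JoinKind.INNER).size();
    // ...while the join's real output uses its actual type.
    int actual = deriveRowType(left, right, JoinKind.SEMI).size();
    System.out.println("scratch (INNER) field count: " + scratch);
    System.out.println("actual (SEMI) field count: " + actual);
  }
}
```

With three left fields and two right fields, the scratch type has five fields while the SEMI join exposes only three, which is the gap the rule does not account for.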
h4. 3. The projection built over {{newJoin}} no longer matches its row type
The rule creates:
{code:java}
final Join newJoin = join.copy(join.getTraitSet(), newCondition,
    leftJoinChild, rightJoinChild,
    join.getJoinType(), // SEMI or ANTI preserved here
    join.isSemiJoinDone());{code}
{{newJoin.getRowType()}} is re-derived via {{deriveJoinRowType}} with the
actual SEMI/ANTI type — so it only has left-side fields. But the projection
list {{newProjExprs}} was built assuming left + right fields exist. The loop:
{code:java}
for (int i = 0; i < nProjExprs; i++) {
  RexNode newExpr = mergedProgram.expandLocalRef(projList.get(i));
  ...
  newProjExprs.add(newExpr);
}{code}
...leaves {{newProjExprs}} holding {{{}RexInputRef{}}}s that point to right-side
field indices absent from {{{}newJoin{}}}'s row type. This causes either:
* An {{IndexOutOfBoundsException}} at planning time, or
* Silent plan corruption, in which the wrong fields are referenced
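A minimal sketch of how such dangling references could be detected, assuming input references are modelled as plain integer indices. {{DanglingRefSketch}} and {{danglingRefs}} are hypothetical names for this illustration and are not part of Calcite.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy check for input references that fall outside a row type. */
final class DanglingRefSketch {
  /** Returns the indices in inputRefs that do not exist among fieldCount fields. */
  static List<Integer> danglingRefs(List<Integer> inputRefs, int fieldCount) {
    List<Integer> dangling = new ArrayList<>();
    for (int ref : inputRefs) {
      if (ref < 0 || ref >= fieldCount) {
        dangling.add(ref);
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    // References built against the scratch type: 3 left fields + 2 right fields.
    List<Integer> refs = List.of(0, 1, 3);
    // A SEMI join only exposes the 3 left fields, so index 3 has no target.
    System.out.println("dangling against SEMI output: " + danglingRefs(refs, 3));
    System.out.println("dangling against scratch type: " + danglingRefs(refs, 5));
  }
}
```

The same reference list is valid against the five-field scratch type but not against the three-field SEMI output. A reference that happens to fall below the left field count would pass this check while still naming the wrong column, which corresponds to the silent-corruption case.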
h4. 4. The {{isOuterJoin()}} adjustment path is also skipped
For OUTER joins, there's a correction step:
{code:java}
if (joinType.isOuterJoin()) {
  newExpr = newExpr.accept(new RelOptUtil.RexInputConverter(...));
}{code}
SEMI/ANTI return {{false}} for {{{}isOuterJoin(){}}}, so this adjustment never
runs. Even if the field counts somehow survived, the {{RexInputRef}} indices
would still be wrong relative to the new join's output.
h4. 5. The guards that exist don't help for SEMI/ANTI
The only type-based guards in {{onMatch}} are:
{code:java}
!joinType.generatesNullsOnLeft() // method returns false for SEMI and ANTI, so the guard passes
!joinType.generatesNullsOnRight() // method returns false for SEMI and ANTI, so the guard passes{code}
Since both methods return {{false}} for SEMI/ANTI, both guards evaluate to
{{true}} and {*}neither project gets suppressed{*}. The rule merrily picks up
both the left and right project children and proceeds into the broken code path.
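The behaviour of these guards, and one possible shape for an additional guard, can be sketched as follows. This is a toy model under stated assumptions: the enum mirrors the method names quoted above, while {{JoinGuardSketch}}, {{canPullLeftProject}}, {{canPullRightProject}}, {{projectsRight}} and {{mayTranspose}} are illustrative names for this sketch and not necessarily what the actual fix uses.

```java
/** Toy model of the type-based guards; an illustration, not Calcite's rule. */
final class JoinGuardSketch {
  enum JoinKind {
    INNER, LEFT, RIGHT, FULL, SEMI, ANTI;

    boolean generatesNullsOnLeft() {
      return this == RIGHT || this == FULL;
    }

    boolean generatesNullsOnRight() {
      return this == LEFT || this == FULL;
    }

    /** Assumed helper for this sketch: do right-side columns appear in the output? */
    boolean projectsRight() {
      return this != SEMI && this != ANTI;
    }
  }

  /** Existing-style guard for the left project child. */
  static boolean canPullLeftProject(JoinKind kind) {
    return !kind.generatesNullsOnLeft();
  }

  /** Existing-style guard for the right project child. */
  static boolean canPullRightProject(JoinKind kind) {
    return !kind.generatesNullsOnRight();
  }

  /** Possible extra guard: skip join types whose output omits the right side. */
  static boolean mayTranspose(JoinKind kind) {
    return kind.projectsRight();
  }

  public static void main(String[] args) {
    for (JoinKind kind : JoinKind.values()) {
      System.out.println(kind
          + " pullLeft=" + canPullLeftProject(kind)
          + " pullRight=" + canPullRightProject(kind)
          + " mayTranspose=" + mayTranspose(kind));
    }
  }
}
```

In this model the null-generation guards let both projects through for SEMI and ANTI, and only the extra check stops the rule from proceeding.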
----
h3. Why the right-side project is especially dangerous
For a plan like:
{code:java}
Project(a, b)
└─ Join[SEMI]
   ├─ Project(a, b, c) ← left
   └─ Project(x, y) ← right{code}
The right-side project is *semantically invisible* — SEMI join consumers never
see right-side columns. But the rule pulls it up anyway, constructing a merged
program that references right-side field indices. After the rule fires, the
plan references columns that the SEMI join doesn't output.
was: JoinProjectTransposeRule must not match SEMI / ANTI join
> Don't fire JoinProjectTransposeRule for ANTI/SEMI/LEFT_MARK JOIN
> ----------------------------------------------------------------
>
> Key: CALCITE-7622
> URL: https://issues.apache.org/jira/browse/CALCITE-7622
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.42.0
> Reporter: Yash
> Assignee: Yash
> Priority: Minor
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)