[ 
https://issues.apache.org/jira/browse/CALCITE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Chen resolved CALCITE-7622.
--------------------------------
    Fix Version/s: 1.43.0
       Resolution: Fixed

Fixed in [{{beee52a}}|https://github.com/apache/calcite/commit/beee52abac9ab1f195c7b9e63c2d80539f6a0062]


Thanks for the contribution [~limbadyash] 

Thanks for the review [~mbudiu] [~xuzifu666] 

> Don't fire JoinProjectTransposeRule for ANTI/SEMI/LEFT_MARK JOIN
> ----------------------------------------------------------------
>
>                 Key: CALCITE-7622
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7622
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Yash
>            Assignee: Yash
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.43.0
>
>
> JoinProjectTransposeRule must not match SEMI / ANTI join
>  
> The fundamental problem is that {{JoinProjectTransposeRule}} was designed 
> around *regular joins* (INNER/OUTER), where the output row type is always 
> {{{}left columns + right columns{}}}. SEMI and ANTI joins violate this 
> assumption in a structural way.
> h4. 1. The rule computes {{joinChildrenRowType}} with hardcoded {{INNER}}
> {code:java}
> final RelDataType joinChildrenRowType = SqlValidatorUtil.deriveJoinRowType(
>     leftJoinChild.getRowType(), rightJoinChild.getRowType(),
>     JoinRelType.INNER, // <-- always INNER
>     ...);{code}
> This is intentional for INNER/OUTER: it's building a "scratch" merged type to 
> construct the RexProgram over both sides. But the assumption that {{left 
> fields + right fields = join output fields}} is {*}only true for 
> non-SEMI/ANTI types{*}.
> h4. 2. SEMI and ANTI drop the right side from their output row type
> From {{{}SqlValidatorUtil.deriveJoinRowType{}}}:
> {code:java}
> case SEMI:
> case ANTI:
>   rightType = null; // right side is GONE from the output
>   break;{code}
> The rule later reads the projection count from the join itself:
> {code:java}
> final int nProjExprs = join.getRowType().getFieldCount();{code}
> For a SEMI/ANTI join, {{join.getRowType()}} only has {*}left-side fields{*}. 
> But the rule calculated {{projects}} over {{left fields + right fields}} (the 
> INNER-typed scratch type). The counts don't match.
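
To make the count mismatch in section 2 concrete, here is a minimal sketch in plain Java. It is a simplified model, not Calcite code: the `JoinKind` enum and `outputFieldCount` helper are hypothetical names introduced only for illustration, and the helper assumes the row-type behavior quoted above (SEMI and ANTI expose left-side fields only).

```java
/** Simplified stand-in for the row-type discussion above; not Calcite code. */
final class JoinRowTypeSketch {
  /** Hypothetical mirror of the join types discussed in this issue. */
  enum JoinKind { INNER, LEFT, RIGHT, FULL, SEMI, ANTI }

  /** Number of output fields a join of the given kind exposes in this model. */
  static int outputFieldCount(JoinKind kind, int leftFields, int rightFields) {
    switch (kind) {
    case SEMI:
    case ANTI:
      // The right side is dropped from the output row type.
      return leftFields;
    default:
      return leftFields + rightFields;
    }
  }

  public static void main(String[] args) {
    int left = 3;
    int right = 2;
    // The rule builds its scratch type as if the join were INNER ...
    int scratch = outputFieldCount(JoinKind.INNER, left, right);
    // ... but takes nProjExprs from the real SEMI join's row type.
    int actual = outputFieldCount(JoinKind.SEMI, left, right);
    System.out.println("scratch (INNER) fields: " + scratch);
    System.out.println("actual (SEMI) fields:   " + actual);
  }
}
```

With three left fields and two right fields, the scratch type and the SEMI join's own row type disagree on the field count, which is the mismatch the description points at.
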
> h4. 3. The resulting {{newJoin}} has a structurally wrong row type
> The rule creates:
> {code:java}
> final Join newJoin = join.copy(join.getTraitSet(), newCondition,
>     leftJoinChild, rightJoinChild,
>     join.getJoinType(), // SEMI or ANTI preserved here
>     join.isSemiJoinDone());{code}
> {{newJoin.getRowType()}} is re-derived via {{deriveJoinRowType}} with the 
> actual SEMI/ANTI type — so it only has left-side fields. But the projection 
> list {{newProjExprs}} was built assuming left + right fields exist. The loop:
>  
> {code:java}
> for (int i = 0; i < nProjExprs; i++) {
>   RexNode newExpr = mergedProgram.expandLocalRef(projList.get(i));
>   ...
>   newProjExprs.add(newExpr);
> }{code}
> ...will emit {{{}RexInputRef{}}}s pointing to right-side field indices 
> that no longer exist in {{{}newJoin{}}}'s row type. This causes either:
>  * An {{IndexOutOfBoundsException}} at planning time, or
>  * A silent plan corruption where the wrong fields get referenced
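
The dangling-reference problem from section 3 can be sketched without Calcite by treating each input reference as a plain integer index. The `DanglingRefSketch` class and its `danglingRefs` helper are hypothetical and exist only for this illustration; the field counts are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of input-ref validation; not Calcite code. */
final class DanglingRefSketch {
  /** Returns the input-ref indices that fall outside a row type with fieldCount fields. */
  static List<Integer> danglingRefs(List<Integer> inputRefs, int fieldCount) {
    List<Integer> dangling = new ArrayList<>();
    for (int ref : inputRefs) {
      if (ref < 0 || ref >= fieldCount) {
        dangling.add(ref);
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    // Assume 3 left fields (indices 0..2) and 2 right fields (indices 3..4).
    // A projection built over left + right may reference index 3.
    List<Integer> projectionRefs = Arrays.asList(0, 1, 3);
    System.out.println("against left + right (5 fields): "
        + danglingRefs(projectionRefs, 5));
    System.out.println("against SEMI output (3 fields):  "
        + danglingRefs(projectionRefs, 3));
  }
}
```

Checked against the five-field scratch type every reference is in range, but against the three-field SEMI output the reference to index 3 has nothing to point at.
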
> h4. 4. The {{isOuterJoin()}} adjustment path is also skipped
> For OUTER joins, there's a correction step:
> {code:java}
> if (joinType.isOuterJoin()) {
>   newExpr = newExpr.accept(new RelOptUtil.RexInputConverter(...));
> }{code}
> SEMI/ANTI return {{false}} for {{{}isOuterJoin(){}}}, so this adjustment 
> never runs. Even if the field counts somehow survived, the {{RexInputRef}} 
> indices would still be wrong relative to the new join's output.
> h4. 5. The guards that exist don't help for SEMI/ANTI
> The only type-based guards in {{onMatch}} are:
> {code:java}
> !joinType.generatesNullsOnLeft() // generatesNullsOnLeft() is false for SEMI and ANTI
> !joinType.generatesNullsOnRight() // generatesNullsOnRight() is false for SEMI and ANTI{code}
> Since both return {{false}} for SEMI/ANTI, {*}neither project gets 
> suppressed{*}. The rule merrily picks up both the left and right project 
> children and proceeds into the broken code path.
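
One way to picture the missing guard is an explicit join-type check made before the rule does any work. The sketch below is a plain-Java model under stated assumptions, not the change made in the fixing commit: `JoinKind` and `ruleMayFire` are hypothetical names, and LEFT_MARK is included only because the issue title lists it alongside SEMI and ANTI.

```java
/** Simplified model of a join-type guard; not the actual Calcite fix. */
final class TransposeGuardSketch {
  /** Hypothetical mirror of the join types named in this issue. */
  enum JoinKind { INNER, LEFT, RIGHT, FULL, SEMI, ANTI, LEFT_MARK }

  /** Returns false for the join types the issue title says the rule must skip. */
  static boolean ruleMayFire(JoinKind kind) {
    switch (kind) {
    case SEMI:
    case ANTI:
    case LEFT_MARK:
      return false;
    default:
      return true;
    }
  }

  public static void main(String[] args) {
    for (JoinKind kind : JoinKind.values()) {
      System.out.println(kind + " -> rule may fire: " + ruleMayFire(kind));
    }
  }
}
```

Unlike the nulls-based guards quoted above, a check of this shape rejects SEMI and ANTI outright instead of letting them fall through to the projection-merging code.
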
> ----
> h3. Why the right-side project is especially dangerous
> For a plan like:
> {code:java}
> Project(a, b) 
> └─ Join[SEMI] 
>     ├─ Project(a, b, c) ← left 
>     └─ Project(x, y) ← right{code}
> The right-side project is *semantically invisible* — SEMI join consumers 
> never see right-side columns. But the rule pulls it up anyway, constructing a 
> merged program that references right-side field indices. After the rule 
> fires, the plan references columns that the SEMI join doesn't output.



