[ 
https://issues.apache.org/jira/browse/CALCITE-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated CALCITE-7487:
------------------------------------------
    Component/s: core

> ProjectJoinTransposeRule throws ArrayIndexOutOfBoundsException in 
> PushProjector when a Join input has a zero-column row type
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7487
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7487
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Chris Rice
>            Priority: Minor
>
> h2. Description
> {{ProjectJoinTransposeRule}} (via {{PushProjector}}) throws 
> {{ArrayIndexOutOfBoundsException}} when invoked on a {{Project}} over a 
> {{Join}} whose left or right input has a row type with zero fields.
> The crash originates in a workaround introduced to avoid producing 
> zero-column projections below the join. In {{PushProjector.locateAllRefs()}} 
> (lines 456–471 at HEAD, {{95eecd87c}}):
> {code:java}
> if ((childRel instanceof Join)
>     || (childRel instanceof SetOp)) {
>   // if nothing is projected from the children, arbitrarily project
>   // the first columns; this is necessary since Fennel doesn't
>   // handle 0-column projections
>   if (nProject == 0 && childPreserveExprs.isEmpty()) {
>     projRefs.set(0);
>     nProject = 1;
>   }
>   if (childRel instanceof Join) {
>     if (nRightProject == 0 && rightPreserveExprs.isEmpty()) {
>       projRefs.set(nFields);
>       nRightProject = 1;
>     }
>   }
> }
> {code}
> The workaround forces {{nRightProject = 1}} and sets a bit at position 
> {{nFields}} (the first field of the right input). When {{nFieldsRight == 0}} 
> this bit points past the last valid field of the right input entirely. 
> {{createProjectRefsAndExprs}} later reads that bit and uses it to index into 
> the right input's field list, which is empty:
> {code:java}
> // PushProjector.createProjectRefsAndExprs, right side call
> int refIdx = offset - 1;                     // offset = nSysFields + nFields
> List<RelDataTypeField> destFields =
>     projChild.getRowType().getFieldList();   // empty
> for (int i = 0; i < nInputRefs; i++) {       // nInputRefs = nRightProject = 1
>   refIdx = projRefs.nextSetBit(refIdx + 1);  // refIdx = nFields
>   assert refIdx >= 0;
>   final RelDataTypeField destField =
>       destFields.get(refIdx - offset);       // destFields.get(0) on empty 
> list
>   ...
> }
> {code}
> The symmetric {{nProject == 0}} case for the left input has the same latent 
> issue when {{nSysFields + nFields == 0}}, though I've only encountered the 
> right-side variant in practice.
> Zero-column {{Join}} inputs are legitimate. The canonical example is *DEE* 
> ({{LogicalValues}} with empty row type and a single empty tuple — the 
> multiplicative identity for cross-join), which naturally appears when an 
> {{Aggregate}} with {{GROUP BY ()}} has its output mapping pruned to zero 
> columns by upstream column pruning.
> ----
> h2. Minimal reproduction
> {code:java}
> @Test void testProjectJoinTransposeWithZeroColumnJoinInput() {
>   final RelBuilder builder = RelBuilder.create(config().build());
>   final RexBuilder rexBuilder = builder.getRexBuilder();
>   final RelDataTypeFactory typeFactory = builder.getTypeFactory();
>   // Left: non-empty LogicalValues with one BIGINT column.
>   RelDataType intType = typeFactory.createSqlType(SqlTypeName.BIGINT);
>   RelNode left = builder
>       .values(
>           ImmutableList.of(ImmutableList.of((RexLiteral) 
> rexBuilder.makeZeroLiteral(intType))),
>           typeFactory.builder().add("col1", intType).build())
>       .build();
>   // Right: DEE — zero-column, one-row LogicalValues (cross-join identity).
>   RelNode dee = builder
>       .values(ImmutableList.of(ImmutableList.of()), 
> typeFactory.builder().build())
>       .build();
>   // Project(CAST($0 AS VARCHAR)) over Join(inner, true, left, dee).
>   // The project must be non-identity, otherwise RelBuilder collapses it away
>   // and the rule never fires.
>   RelDataType varcharType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
>   RelNode root = builder
>       .push(left)
>       .push(dee)
>       .join(JoinRelType.INNER, rexBuilder.makeLiteral(true))
>       .project(rexBuilder.makeCast(varcharType, builder.field(0)))
>       .build();
>   HepPlanner planner = new HepPlanner(
>       new HepProgramBuilder()
>           .addRuleInstance(CoreRules.PROJECT_JOIN_TRANSPOSE)
>           .build());
>   planner.setRoot(root);
>   planner.findBestExp();   // throws ArrayIndexOutOfBoundsException
> }
> {code}
> The plan that reaches the planner:
> {code}
> LogicalProject(col1=[CAST($0):VARCHAR NOT NULL])
>   LogicalJoin(condition=[true], joinType=[inner])
>     LogicalValues(tuples=[[{ 0 }]])
>     LogicalValues(tuples=[[{  }]])
> {code}
> h3. Current output
> {code}
> java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at 
> com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:81)
>     at 
> org.apache.calcite.rel.rules.PushProjector.createProjectRefsAndExprs(PushProjector.java:523)
>     at 
> org.apache.calcite.rel.rules.ProjectJoinTransposeRule.onMatch(ProjectJoinTransposeRule.java:117)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:527)
>     ...
> {code}
> h3. Expected output
> The rule either pushes the {{Project}} through the {{Join}} and leaves the 
> zero-column side unchanged, or declines to transform (no crash).
> ----
> h2. Proposed fix (or equivalent)
> Guard the Fennel zero-column workaround on the corresponding input having at 
> least one field to project. In {{PushProjector.locateAllRefs()}}:
> {code:java}
> if (childRel instanceof Join) {
>   if (nRightProject == 0 && rightPreserveExprs.isEmpty()
>       && nFieldsRight > 0) {                    // <-- added guard
>     projRefs.set(nFields);
>     nRightProject = 1;
>   }
> }
> // and analogously for the left-side workaround:
> if (nProject == 0 && childPreserveExprs.isEmpty()
>     && (nSysFields + nFields) > 0) {            // <-- added guard
>   projRefs.set(0);
>   nProject = 1;
> }
> {code}
> When the input genuinely has zero fields, there is no "first column" to fall 
> back to and the workaround should be skipped. An equivalent fix is to bail 
> out of the rule entirely in {{matches()}} when either input's field count is 
> zero. The Fennel execution engine referenced in the original comment has been 
> removed for some time, so another option is to remove the workaround 
> altogether — though that's a broader behavioral change and out of scope for 
> the immediate crash.
> ----
> h2. Notes
> * Reproduced on Calcite master at {{95eecd87c}} (a few commits past 
> {{1.41.0}}). The bug appears to be long-standing; it's likely present in 
> earlier released versions as well.
> * Also reproducible via {{substrait-java}}: an {{Aggregate(GROUP BY (), 
> measures)}} with {{Rel.Emit\{outputMapping=[]\}}} yields a DEE on one side of 
> a cross when cross-joined into a larger query, tripping this rule during 
> optimization in downstream engines.
> * {{ProjectCorrelateTransposeRule}} also constructs a {{PushProjector}} 
> ({{ProjectCorrelateTransposeRule.java:79-80}}) and may have the same latent 
> issue on a zero-column correlate side — I haven't confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to