[ https://issues.apache.org/jira/browse/CALCITE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385736#comment-17385736 ]
Jihoon Son commented on CALCITE-685: ------------------------------------ Hi, I dig this issue a bit further using a similar query shown below. I used Calcite 1.21 for my testing, but I believe {code:java} select dim1, sum("m1") as "Cnt", (select avg("users") from (select floor(__time to day), count(distinct "dim2") "users" from foo t where t.dim1 = foo.dim1 group by 1)) from foo where dim3 = 'a' group by 1{code} The row signature of the table "foo" was \{__time:LONG, before_count:STRING, cnt:LONG, dim1:STRING, dim2:STRING, dim3:STRING, m1:FLOAT, m2:DOUBLE, unique_dim1:COMPLEX} (I used Druid to test this query and copied the row signature from Druid logs, so please understand that the row signature has Druid types not sql types). And the calcite prints this log with trace logging enabled. {code:java} Breadth-first from root: { HepRelVertex#84 = rel#83:LogicalCorrelate.NONE.[](left=HepRelVertex#71,right=HepRelVertex#82,correlation=$cor0,joinType=left,requiredColumns={3}), rowcount=1.0, cumulative cost={inf} HepRelVertex#71 = rel#70:LogicalAggregate.NONE.[](input=HepRelVertex#69,group={0},Cnt=SUM($1)), rowcount=1.5, cumulative cost={131.70625007152557 rows, 231.0 cpu, 0.0 io} HepRelVertex#82 = rel#81:LogicalAggregate.NONE.[](input=HepRelVertex#80,group={},EXPR$0=AVG($0)), rowcount=1.0, cumulative cost={134.31221064770205 rows, 232.49986383421273 cpu, 0.0 io} HepRelVertex#69 = rel#68:LogicalProject.NONE.[](input=HepRelVertex#67,dim1=$3,m1=$6), rowcount=15.0, cumulative cost={130.0 rows, 231.0 cpu, 0.0 io} HepRelVertex#80 = rel#79:LogicalProject.NONE.[](input=HepRelVertex#78,users=$1), rowcount=1.4998638342127297, cumulative cost={133.18721064770205 rows, 232.49986383421273 cpu, 0.0 io} HepRelVertex#67 = rel#66:LogicalFilter.NONE.[](input=HepRelVertex#65,condition==($5, 'a')), rowcount=15.0, cumulative cost={115.0 rows, 201.0 cpu, 0.0 io} HepRelVertex#78 = rel#77:LogicalAggregate.NONE.[](input=HepRelVertex#76,group={0},users=COUNT(DISTINCT $1)), rowcount=1.4998638342127297, cumulative cost={131.6873468134893 rows, 231.0 cpu, 0.0 io} HepRelVertex#65 = rel#11:LogicalTableScan.NONE.[](table=[druid, foo]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} HepRelVertex#76 = rel#75:LogicalProject.NONE.[](input=HepRelVertex#74,EXPR$0=FLOOR($0, FLAG(DAY)),dim2=$4), rowcount=15.0, cumulative cost={130.0 rows, 231.0 cpu, 0.0 io} HepRelVertex#74 = rel#73:LogicalFilter.NONE.[](input=HepRelVertex#72,condition==($3, $cor0.dim1)), rowcount=15.0, cumulative cost={115.0 rows, 201.0 cpu, 0.0 io} HepRelVertex#72 = rel#15:LogicalTableScan.NONE.[](table=[druid, foo]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} } {code} Finally, it fails with this exception. This was thrown while making input references to requiredColumns of LogicalCorrelate. {code:java} java.lang.IndexOutOfBoundsException: index (3) must be less than size (3)java.lang.IndexOutOfBoundsException: index (3) must be less than size (3) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:313) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:295) at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:65) at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:859) at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$0(RelDecorrelator.java:2501) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.Iterator.forEachRemaining(Iterator.java:116) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch2(RelDecorrelator.java:2502) at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch(RelDecorrelator.java:2433) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) at org.apache.calcite.sql2rel.RelDecorrelator.decorrelate(RelDecorrelator.java:250) at org.apache.calcite.sql2rel.RelDecorrelator.decorrelateQuery(RelDecorrelator.java:215) ...{code} I think this is because the ordinal in requiredColumns of LogicalCorrelate was computed from the row type of TableScan, not from that of LogicalCorrelate. Note that the row type of LogicalCorrelate was (VARCHAR dim1, DOUBLE Cnt, BIGINT EXPR$0). Because there were only 3 columns in the row type, accessing 4th column with the index 3 failed with IndexOutOfBoundsException. As I'm unfamiliar with the internal of Calcite, I'm not sure what would be the best approach to fix this bug. Should we compute the row type of LogicalCorrelate first and feed it to [RelOptUtil.correlationColumns|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2442] to compute the requiredColumns of LogicalCorrelate? > Correlated scalar sub-query in SELECT clause throws > --------------------------------------------------- > > Key: CALCITE-685 > URL: https://issues.apache.org/jira/browse/CALCITE-685 > Project: Calcite > Issue Type: Bug > Reporter: Jinfeng Ni > Priority: Major > Labels: sub-query > > For the following query, where a correlated scalar subquery is put in select > list, Calcite will hit NPE in RelDecorrelator. > {code} > select e.department_id, sum(e.employee_id), > ( select sum(e2.employee_id) > from employee e2 > where e.department_id = e2.department_id > ) > from employee e > group by e.department_id; > {code} > {code} > Caused by: java.lang.NullPointerException > at > org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:733) > at > org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:842) > at > org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:902) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at > org.apache.calcite.util.ReflectUtil.invokeVisitorInternal(ReflectUtil.java:256) > at > org.apache.calcite.util.ReflectUtil.invokeVisitor(ReflectUtil.java:213) > at > org.apache.calcite.util.ReflectUtil$1.invokeVisitor(ReflectUtil.java:476) > at > org.apache.calcite.sql2rel.RelDecorrelator$DecorrelateRelVisitor.visit(RelDecorrelator.java:1420) > at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72) > ..... > {code} > Here is the unit test case I used to re-produce this problem in JdbcTest.java > {code} > @Test public void testCorreScalarSubQInSelect() > throws ClassNotFoundException, SQLException { > String query = "select e.department_id, sum(e.employee_id),\n" > + " ( select sum(e2.employee_id)\n" > + " from employee e2\n" > + " where e.department_id = e2.department_id\n" > + " )\n" > + " from employee e\n" > + " group by e.department_id\n"; > CalciteAssert.that() > .with(CalciteAssert.Config.FOODMART_CLONE) > .with(Lex.JAVA) > .query(query) > .returnsCount(0); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)