[jira] [Commented] (CALCITE-685) Correlated scalar sub-query in SELECT clause throws

Jihoon Son (Jira) Thu, 22 Jul 2021 12:47:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385736#comment-17385736
 ]


Jihoon Son commented on CALCITE-685:
------------------------------------

Hi, I dig this issue a bit further using a similar query shown below. I used 
Calcite 1.21 for my testing, but I believe 
{code:java}
select
    dim1,
    sum("m1") as "Cnt",
    (select avg("users") from (select floor(__time to day), count(distinct 
"dim2") "users" from foo t where t.dim1 = foo.dim1 group by 1)) 
from foo
where dim3 = 'a'
group by 1{code}
The row signature of the table "foo" was \{__time:LONG, before_count:STRING, 
cnt:LONG, dim1:STRING, dim2:STRING, dim3:STRING, m1:FLOAT, m2:DOUBLE, 
unique_dim1:COMPLEX} (I used Druid to test this query and copied the row 
signature from Druid logs, so please understand that the row signature has 
Druid types not sql types). And the calcite prints this log with trace logging 
enabled. 
{code:java}
Breadth-first from root:  {
    HepRelVertex#84 = 
rel#83:LogicalCorrelate.NONE.[](left=HepRelVertex#71,right=HepRelVertex#82,correlation=$cor0,joinType=left,requiredColumns={3}),
 rowcount=1.0, cumulative cost={inf}
    HepRelVertex#71 = 
rel#70:LogicalAggregate.NONE.[](input=HepRelVertex#69,group={0},Cnt=SUM($1)), 
rowcount=1.5, cumulative cost={131.70625007152557 rows, 231.0 cpu, 0.0 io}
    HepRelVertex#82 = 
rel#81:LogicalAggregate.NONE.[](input=HepRelVertex#80,group={},EXPR$0=AVG($0)), 
rowcount=1.0, cumulative cost={134.31221064770205 rows, 232.49986383421273 cpu, 
0.0 io}
    HepRelVertex#69 = 
rel#68:LogicalProject.NONE.[](input=HepRelVertex#67,dim1=$3,m1=$6), 
rowcount=15.0, cumulative cost={130.0 rows, 231.0 cpu, 0.0 io}
    HepRelVertex#80 = 
rel#79:LogicalProject.NONE.[](input=HepRelVertex#78,users=$1), 
rowcount=1.4998638342127297, cumulative cost={133.18721064770205 rows, 
232.49986383421273 cpu, 0.0 io}
    HepRelVertex#67 = 
rel#66:LogicalFilter.NONE.[](input=HepRelVertex#65,condition==($5, 'a')), 
rowcount=15.0, cumulative cost={115.0 rows, 201.0 cpu, 0.0 io}
    HepRelVertex#78 = 
rel#77:LogicalAggregate.NONE.[](input=HepRelVertex#76,group={0},users=COUNT(DISTINCT
 $1)), rowcount=1.4998638342127297, cumulative cost={131.6873468134893 rows, 
231.0 cpu, 0.0 io}
    HepRelVertex#65 = rel#11:LogicalTableScan.NONE.[](table=[druid, foo]), 
rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
    HepRelVertex#76 = 
rel#75:LogicalProject.NONE.[](input=HepRelVertex#74,EXPR$0=FLOOR($0, 
FLAG(DAY)),dim2=$4), rowcount=15.0, cumulative cost={130.0 rows, 231.0 cpu, 0.0 
io}
    HepRelVertex#74 = 
rel#73:LogicalFilter.NONE.[](input=HepRelVertex#72,condition==($3, 
$cor0.dim1)), rowcount=15.0, cumulative cost={115.0 rows, 201.0 cpu, 0.0 io}
    HepRelVertex#72 = rel#15:LogicalTableScan.NONE.[](table=[druid, foo]), 
rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
}
{code}
Finally, it fails with this exception. This was thrown while making input 
references to requiredColumns of LogicalCorrelate. 
{code:java}
java.lang.IndexOutOfBoundsException: index (3) must be less than size 
(3)java.lang.IndexOutOfBoundsException: index (3) must be less than size (3)

 at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:313) 
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:295) 
at 
com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:65)
 at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:859) at 
org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$0(RelDecorrelator.java:2501)
 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) 
at java.util.Iterator.forEachRemaining(Iterator.java:116) at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at 
org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch2(RelDecorrelator.java:2502)
 at 
org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch(RelDecorrelator.java:2433)
 at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
 at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) 
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
 at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) 
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelate(RelDecorrelator.java:250)
 at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateQuery(RelDecorrelator.java:215)
...{code}
I think this is because the ordinal in requiredColumns of LogicalCorrelate was 
computed from the row type of TableScan, not from that of LogicalCorrelate. 
Note that the row type of LogicalCorrelate was (VARCHAR dim1, DOUBLE Cnt, 
BIGINT EXPR$0). Because there were only 3 columns in the row type, accessing 
4th column with the index 3 failed with IndexOutOfBoundsException.

As I'm unfamiliar with the internal of Calcite, I'm not sure what would be the 
best approach to fix this bug. Should we compute the row type of 
LogicalCorrelate first and feed it to 
[RelOptUtil.correlationColumns|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2442]
 to compute the requiredColumns of LogicalCorrelate?

> Correlated scalar sub-query in SELECT clause throws
> ---------------------------------------------------
>
>                 Key: CALCITE-685
>                 URL: https://issues.apache.org/jira/browse/CALCITE-685
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Priority: Major
>              Labels: sub-query
>
> For the following query, where a correlated scalar subquery is put in select 
> list, Calcite will hit NPE in RelDecorrelator.
> {code}
> select e.department_id, sum(e.employee_id),
>        ( select sum(e2.employee_id)
>          from  employee e2
>          where e.department_id = e2.department_id
>        )
> from employee e
> group by e.department_id;
> {code}
> {code}
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:733)
>       at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:842)
>       at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:902)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at 
> org.apache.calcite.util.ReflectUtil.invokeVisitorInternal(ReflectUtil.java:256)
>       at 
> org.apache.calcite.util.ReflectUtil.invokeVisitor(ReflectUtil.java:213)
>       at 
> org.apache.calcite.util.ReflectUtil$1.invokeVisitor(ReflectUtil.java:476)
>       at 
> org.apache.calcite.sql2rel.RelDecorrelator$DecorrelateRelVisitor.visit(RelDecorrelator.java:1420)
>       at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72)
> .....
> {code}
> Here is the unit test case I used to re-produce this problem in JdbcTest.java
> {code}
>   @Test public void testCorreScalarSubQInSelect()
>       throws ClassNotFoundException, SQLException {
>     String query = "select e.department_id, sum(e.employee_id),\n"
>         + "       ( select sum(e2.employee_id)\n"
>         + "         from  employee e2\n"
>         + "         where e.department_id = e2.department_id\n"
>         + "       )\n"
>         + " from employee e\n"
>         + " group by e.department_id\n";
>     CalciteAssert.that()
>         .with(CalciteAssert.Config.FOODMART_CLONE)
>         .with(Lex.JAVA)
>         .query(query)
>         .returnsCount(0);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-685) Correlated scalar sub-query in SELECT clause throws

Reply via email to