[
https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315849#comment-14315849
]
Aman Sinha commented on DRILL-2107:
---
The changes to remove the processing of the first batch in HashAggBatch seem
OK at first glance, since that processing is subsumed by buildSchema() (fast
schema return). However, I recall that for certain functions on complex types
we don't do fast schema return; would those continue to work as expected after
this change? I don't know the details here, so let's discuss; maybe this is
not an issue.
> Hash Join throw IOBE for a query with exists subquery.
> ---
>
> Key: DRILL-2107
> URL: https://issues.apache.org/jira/browse/DRILL-2107
> Project: Apache Drill
> Issue Type: New Feature
> Components: Execution - Relational Operators
> Reporter: Jinfeng Ni
> Assignee: Aman Sinha
> Priority: Blocker
> Attachments: DRILL-2107.patch, q4_1_hj.json, q4_1_hj_phy.txt,
> q4_1_mj.json, q4_1_mj_phy.txt
>
>
> I hit an IndexOutOfBoundsException (IOBE) in TestTpchDistributed Q4 when I
> tried to enable an optimizer rule. I then simplified Q4 to the following
> query, which still reproduces the same IOBE.
> {code}
> select
>   o.o_orderpriority
> from
>   cp.`tpch/orders.parquet` o
> where
>   exists (
>     select
>       *
>     from
>       cp.`tpch/lineitem.parquet` l
>     where
>       l.l_orderkey = o.o_orderkey
>   )
> ;
> {code}
> Stack trace of the exception:
> {code}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45]
> at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45]
> at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:232) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:132) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at org.apache.drill.exec.test.generated.HashTableGen307.doSetup(HashTableTemplate.java:71) ~[na:na]
> at org.apache.drill.exec.test.generated.HashTableGen307.updateBatches(HashTableTemplate.java:473) ~[na:na]
> at org.apache.drill.exec.test.generated.HashJoinProbeGen313.executeProbePhase(HashJoinProbeTemplate.java:139) ~[na:na]
> at org.apache.drill.exec.test.generated.HashJoinProbeGen313.probeAndProject(HashJoinProbeTemplate.java:223) ~[na:na]
> at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:227) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>
> {code}
> The physical plan seems to be correct after enabling the new rule. In fact,
> if I disable HashJoin and use merge join for the query, it works fine. So
> the IOBE seems to expose a bug in HashJoin.
> To reproduce this issue, there are two options:
> 1) - Modify DrillRuleSets.java to uncomment SwapJoinRule
>    - alter session set `planner.slice_target` = 10;
>    - run the query
>
> 2) Use the attached physical plan in the JSON file, and use "submitplan" to
> submit it.
> For comparison, I also attached the physical plan produced when hash join is
> disabled (merge join is used instead), and the explain plan at the physical
> operator level.
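
For readers less familiar with this failure mode, the innermost frames of the quoted stack trace show ArrayList.get being called with index 0 on a list of size 0, reached through VectorContainer.getValueAccessorById. The sketch below only illustrates that kind of failure in plain Java. The class EmptyContainerSketch and its methods are hypothetical stand-ins, not Drill's VectorContainer, and the idea that the incoming batch had no vectors yet when the hash table was set up is an assumption, not something the trace confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a container of column vectors.
// This is NOT Drill's VectorContainer; it only mimics an unchecked
// index lookup backed by an ArrayList.
public class EmptyContainerSketch {
    private final List<String> columns = new ArrayList<>();

    public void addColumn(String name) {
        columns.add(name);
    }

    // Unchecked lookup: throws IndexOutOfBoundsException when the id
    // is not smaller than the number of columns added so far.
    public String getById(int id) {
        return columns.get(id);
    }

    // Returns true if looking up id 0 throws IndexOutOfBoundsException.
    public static boolean lookupThrows(EmptyContainerSketch c) {
        try {
            c.getById(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        EmptyContainerSketch c = new EmptyContainerSketch();
        // No columns yet: the lookup fails, as in the quoted trace.
        System.out.println("empty container throws: " + lookupThrows(c));
        c.addColumn("o_orderkey");
        // With one column present, the same lookup succeeds.
        System.out.println("after adding a column throws: " + lookupThrows(c));
    }
}
```

The exact exception message depends on the Java version, so the sketch checks only the exception type rather than the "Index: 0, Size: 0" text.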