[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

Jason Dere (JIRA) Tue, 16 Jun 2015 16:17:34 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588988#comment-14588988 ]


Jason Dere commented on HIVE-11028:
-----------------------------------

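This appears to be caused by TezCompiler invoking ConstantPropagate, which 
removes some columns, without a corresponding ColumnPruner pass to remove 
those columns from the join operator's outputColumnNames. Presumably the join's 
outputColumnNames then no longer line up with the columns it actually receives, 
which would explain the failure in CommonJoinOperator.getJoinOutputObjectInspector 
in the stack trace.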
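To illustrate the suspected mechanism, here is a toy Java model. The class JoinSchemaMismatch and the method buildOutputFields are hypothetical and are not Hive code. The model pairs each output column name with a field type by index, much as a struct object inspector pairs field names with field inspectors. If an optimization shrinks the input side without also pruning the list of names, the lookup by index fails with an IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the failure; not Hive's actual classes.
public class JoinSchemaMismatch {

    // Pairs each output column name with the field type at the same index,
    // similar in spirit to building a struct inspector from names + inspectors.
    public static List<String> buildOutputFields(List<String> outputColumnNames,
                                                 List<String> fieldTypes) {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < outputColumnNames.size(); i++) {
            // Throws IndexOutOfBoundsException if fieldTypes is shorter than the names list.
            fields.add(outputColumnNames.get(i) + ":" + fieldTypes.get(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> outputColumnNames = List.of("_col1", "_col2");

        // Consistent plan: one field type per output column name.
        System.out.println(buildOutputFields(outputColumnNames, List.of("string", "string")));

        // Inconsistent plan: the input columns were removed upstream,
        // but outputColumnNames was never pruned to match.
        try {
            buildOutputFields(outputColumnNames, new ArrayList<>());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```

In this model, one fix is to prune outputColumnNames whenever input columns are removed, which corresponds to running ColumnPruner after ConstantPropagate. The other is to avoid removing the columns in the first place.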

According to [~jpullokkaran] and [~hagleitn], TezCompiler runs ConstantPropagate 
to remove the extra (and unnecessary) "AND true" predicates generated during 
dynamic partition pruning. One solution is to eliminate just those expressions 
(which ConstantPropagate refers to as short-cutting) rather than perform full 
constant folding. I'll try to add an option to ConstantPropagate that restricts 
it to expression short-cutting instead of full constant folding.
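The difference between the two modes can be sketched with a toy expression tree. Everything here (ShortcutVsFold, Expr, shortcut, fold) is hypothetical and only assumes the behavior described above; it is not ConstantPropagate's actual API. shortcut drops only "AND true" operands. fold also substitutes columns known to be constant, which is the kind of rewrite that can leave a column unreferenced. The sketch needs Java 16+ (records, pattern matching for instanceof).

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical toy expression model; not Hive's ExprNodeDesc / ConstantPropagate API.
public class ShortcutVsFold {
    public interface Expr {}
    public record Const(Object value) implements Expr {}
    public record Col(String name) implements Expr {}
    public record Eq(Expr left, Expr right) implements Expr {}
    public record And(Expr left, Expr right) implements Expr {}

    static boolean isTrue(Expr e) {
        return e instanceof Const c && Boolean.TRUE.equals(c.value());
    }

    // Short-cutting only: remove redundant "AND true" operands, keep everything else as-is.
    public static Expr shortcut(Expr e) {
        if (e instanceof And a) {
            Expr l = shortcut(a.left());
            Expr r = shortcut(a.right());
            if (isTrue(l)) return r;
            if (isTrue(r)) return l;
            return new And(l, r);
        }
        return e;
    }

    // Full folding: additionally replace columns known to be constant and
    // evaluate comparisons whose operands are both constants.
    public static Expr fold(Expr e, Map<String, Object> knownConstants) {
        if (e instanceof Col c && knownConstants.containsKey(c.name())) {
            return new Const(knownConstants.get(c.name()));
        }
        if (e instanceof Eq q) {
            Expr l = fold(q.left(), knownConstants);
            Expr r = fold(q.right(), knownConstants);
            if (l instanceof Const lc && r instanceof Const rc) {
                return new Const(Objects.equals(lc.value(), rc.value()));
            }
            return new Eq(l, r);
        }
        if (e instanceof And a) {
            return shortcut(new And(fold(a.left(), knownConstants), fold(a.right(), knownConstants)));
        }
        return e;
    }

    public static void main(String[] args) {
        // Roughly "id2 = 'ab' AND true", the shape of predicate short-cutting targets.
        Expr pred = new And(new Eq(new Col("id2"), new Const("ab")), new Const(true));
        // Short-cutting keeps the reference to column id2.
        System.out.println(shortcut(pred));
        // Folding with id2 known to be 'ab' removes every reference to id2.
        System.out.println(fold(pred, Map.of("id2", "ab")));
    }
}
```

Because shortcut never touches column references, the columns each operator consumes stay the same. That is why restricting TezCompiler to short-cutting would avoid the need for a follow-up ColumnPruner pass.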

> Tez: table self join and join with another table fails with IndexOutOfBoundsException
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-11028
>                 URL: https://issues.apache.org/jira/browse/HIVE-11028
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>
> {noformat}
> create table tez_self_join1(id1 int, id2 string, id3 string);
> insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba');
> create table tez_self_join2(id1 int);
> insert into table tez_self_join2 values(1),(2),(3);
> explain
> select s.id2, s.id3
> from
> (
>  select self1.id1, self1.id2, self1.id3
>  from tez_self_join1 self1 join tez_self_join1 self2
>  on self1.id2=self2.id3 ) s
> join tez_self_join2
> on s.id1=tez_self_join2.id1
> where s.id2='ab';
> {noformat}
> fails with error:
> {noformat}
> 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>         at java.util.ArrayList.get(ArrayList.java:411)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
>         at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>         ... 13 more
> {noformat}



