[ 
https://issues.apache.org/jira/browse/HIVE-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841809#comment-16841809
 ] 

Jason Dere commented on HIVE-21746:
-----------------------------------

I believe the dynamically partitioned hash join has issues when the join keys 
are constant-folded.
Looking at the ReduceSink output that feeds into the dynamically partitioned 
hash join:
{noformat}
                      Reduce Output Operator
                        key expressions: _col20 (type: string), 'HR3' (type: 
string)
                        null sort order: aa
                        sort order: ++
                        Map-reduce partition columns: _col20 (type: string), 
'HR3' (type: string)
                        Statistics: Num rows: 3800000 Data size: 1288485344 
Basic stats: COMPLETE Column stats: PARTIAL
                        tag: 0
                        value expressions: _col2 (type: timestamp), _col3 
(type: timestamp), _col51 (type: timestamp), _col124 (type: timestamp)
{noformat}

So the value expressions in the ReduceSink consist of 4 timestamp columns, and 
the data actually written out and sent to the join matches that.
However, the input schema to the MapJoin operator shows 5 columns rather than 4:
{noformat}
*** valCols[0] for JOIN JOIN_13: [Column[VALUE._col2], Column[VALUE._col3], 
Column[KEY.reducesinkkey1], Column[VALUE._col49], Column[VALUE._col122]]
{noformat}
With types (timestamp, timestamp, string, timestamp, timestamp)

Note that the third column in this list is KEY.reducesinkkey1. Key columns 
should have been filtered out of the value columns in 
MapJoinProcessor.getMapJoinDesc(), in the section that populates 
valueTableDescs.
But the keyExprMap generated by ExprNodeDescUtils.resolveJoinKeysAsRSColumns() 
(which is only done for the dynamically partitioned hash join) does not 
properly match the KEY.reducesinkkey1 column from the ReduceSinkOperator when 
filtering the key columns out of the value columns.
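The failure mode can be illustrated with a minimal, self-contained model. The class below only mimics the relevant fields of Hive's ExprNodeColumnDesc (it is a hypothetical stand-in, not Hive source): when the key descriptor built for the keyExprMap carries a different tabAlias than the descriptor the ReduceSink produced, the equality check fails and the key column survives into the value list, yielding the 5-column schema above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for ExprNodeColumnDesc, reduced to the two fields
// relevant to the mismatch: column name and table alias.
class ColDesc {
    final String column;
    final String tabAlias;

    ColDesc(String column, String tabAlias) {
        this.column = column;
        this.tabAlias = tabAlias;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ColDesc)) return false;
        ColDesc other = (ColDesc) o;
        // Equality considers the table alias as well as the column name,
        // which is what makes "" vs. "t2" fail to match.
        return column.equals(other.column) && tabAlias.equals(other.tabAlias);
    }

    @Override
    public int hashCode() {
        return Objects.hash(column, tabAlias);
    }
}

public class KeyFilterDemo {
    // Model of the key-filtering step: keep every candidate column that does
    // not match a key expression.
    static List<ColDesc> filterValues() {
        // Key as built for keyExprMap: tabAlias hardcoded to "".
        ColDesc generatedKey = new ColDesc("KEY.reducesinkkey1", "");
        // The same key as the ReduceSinkOperator describes it: tabAlias = "t2".
        ColDesc rsKey = new ColDesc("KEY.reducesinkkey1", "t2");

        List<ColDesc> candidates = new ArrayList<>();
        candidates.add(new ColDesc("VALUE._col2", "t2"));
        candidates.add(rsKey); // should be filtered out, but is not
        candidates.add(new ColDesc("VALUE._col3", "t2"));

        List<ColDesc> values = new ArrayList<>();
        for (ColDesc c : candidates) {
            if (!c.equals(generatedKey)) {
                values.add(c);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // All 3 candidates survive, including the key column.
        System.out.println("values retained: " + filterValues().size());
    }
}
```

With matching aliases the key column would be dropped and only the 2 genuine value columns would remain.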

The column reference generated from the constant-folded column, as it appears 
in keyExprMap:
{noformat}
   1 = {ExprNodeColumnDesc@9714} "Column[KEY.reducesinkkey1]"
    column = "KEY.reducesinkkey1"
    tabAlias = ""
    isPartitionColOrVirtualCol = false
    isSkewedCol = false
    typeInfo = {PrimitiveTypeInfo@9719} "string"
{noformat}

What should have been the corresponding key in the ReduceSinkOperator:
{noformat}
expr = {ExprNodeColumnDesc@8704} "Column[KEY.reducesinkkey1]"
 column = "KEY.reducesinkkey1"
 tabAlias = "t2"
 isPartitionColOrVirtualCol = true
 isSkewedCol = false
 typeInfo = {PrimitiveTypeInfo@9719} "string"
{noformat}

The difference is the ReduceSinkOperator key has tabAlias = "t2". The one 
generated by ExprNodeDescUtils.resolveJoinKeysAsRSColumns() currently has a 
tabAlias hardcoded to "".

One solution is for ExprNodeConstantDesc to keep a foldedFromTab for the table 
alias, in addition to the foldedFromCol it already has. 
ExprNodeDescUtils.resolveJoinKeysAsRSColumns() could then generate a column 
reference whose tabAlias matches the one in its parent ReduceSinkOperator.
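A rough sketch of that direction (hypothetical class and method names; only foldedFromCol exists in Hive today, foldedFromTab is the proposed addition): the constant descriptor records the table alias it was folded from, and the resolver builds the KEY.reducesinkkeyN reference from that alias instead of a hardcoded "".

```java
// Hypothetical model of the proposed change, not Hive source.
class ConstDesc {
    final Object value;
    final String foldedFromCol; // exists in ExprNodeConstantDesc today
    final String foldedFromTab; // proposed addition: alias of the folded column

    ConstDesc(Object value, String foldedFromCol, String foldedFromTab) {
        this.value = value;
        this.foldedFromCol = foldedFromCol;
        this.foldedFromTab = foldedFromTab;
    }
}

public class ResolveSketch {
    // Sketch of what resolveJoinKeysAsRSColumns() could do for a folded
    // constant: return {column, tabAlias} for the reconstructed key
    // reference, carrying the recorded alias so it matches the ReduceSink's.
    static String[] resolveConstant(ConstDesc c, int keyIndex) {
        return new String[] { "KEY.reducesinkkey" + keyIndex, c.foldedFromTab };
    }

    public static void main(String[] args) {
        // 'HR3' folded from a column of table alias "t2", at key position 1.
        ConstDesc folded = new ConstDesc("HR3", "hr_col", "t2");
        String[] ref = resolveConstant(folded, 1);
        // tabAlias is now "t2", matching the ReduceSinkOperator's descriptor,
        // so the key-vs-value filtering comparison succeeds.
        System.out.println(ref[0] + " tabAlias=" + ref[1]);
    }
}
```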

> ArrayIndexOutOfBoundsException during dynamically partitioned hash join, with 
> CBO disabled
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21746
>                 URL: https://issues.apache.org/jira/browse/HIVE-21746
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> ArrayIndexOutOfBounds exception during query execution with dynamically 
> partitioned hash join.
> Found on Hive 2.x. Seems to occur with CBO disabled/failed.
> Disabling constant propagation seems to allow the query to succeed.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 203
>         at 
> org.apache.hadoop.hive.serde2.io.TimestampWritable.getTotalLength(TimestampWritable.java:217)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:205)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getFieldsAsList(LazyBinaryStruct.java:281)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.unpack(MapJoinBytesTableContainer.java:744)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.next(MapJoinBytesTableContainer.java:730)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.next(MapJoinBytesTableContainer.java:605)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.next(UnwrapRowContainer.java:70)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.next(UnwrapRowContainer.java:34)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:819)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:924)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:456)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:359)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:290)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
>  ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) 
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:377)
>  ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>  ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>  ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>  ~[hadoop-common-2.7.3.2.6.4.119-3.jar:?]
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>  ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>  ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> ~[tez-common-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
>         at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>  ~[hive-llap-server-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)