[ 
https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737335#comment-14737335
 ] 

Vikram Dixit K commented on HIVE-11606:
---------------------------------------

For inner joins there is an optimization: if the hashtable is empty, we set 
the done flag on the operator. With container reuse, this causes bucket map 
joins to produce incorrect results, because the operators in the cached work 
stop processing records once the done flag has been set, even when a different 
bucket is being processed. For bucket map joins we prevent caching of the 
input but not of the work, which makes sense because the operator pipeline 
hasn't changed. Ideally, we would reset the done flag only for bucket map 
joins, but resetting it for broadcast joins as well is not a big issue: the 
optimization above runs again anyway and stops processing early, in the 
operator initialization (loadHashTable) phase itself.
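
To make the failure mode concrete, here is a minimal sketch of the behavior 
described above. It is not Hive code: SketchJoinOperator, resetDoneOnLoad and 
runTwoBuckets are hypothetical names invented for illustration, and the join 
is reduced to a lookup in a java.util.Map. The only thing it is meant to show 
is how a stale done flag on a reused operator can suppress output for a later, 
non-empty bucket, and how resetting the flag when the hashtable is loaded 
avoids that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DoneFlagSketch {

    /** Hypothetical stand-in for a map-join operator; not Hive's actual class. */
    static class SketchJoinOperator {
        private Map<Integer, String> hashTable = new HashMap<>();
        private boolean done = false;
        private final boolean resetDoneOnLoad;

        SketchJoinOperator(boolean resetDoneOnLoad) {
            this.resetDoneOnLoad = resetDoneOnLoad;
        }

        /** Loads the small-table side for one bucket. */
        void loadHashTable(Map<Integer, String> smallTableBucket) {
            if (resetDoneOnLoad) {
                done = false; // assumed fix: clear state left over from a previous bucket
            }
            hashTable = smallTableBucket;
            if (hashTable.isEmpty()) {
                done = true; // inner join with an empty hashtable can never match
            }
        }

        /** Probes the hashtable with big-table keys; emits nothing once done is set. */
        List<String> process(int[] bigTableKeys) {
            List<String> out = new ArrayList<>();
            for (int key : bigTableKeys) {
                if (done) {
                    break;
                }
                String value = hashTable.get(key);
                if (value != null) {
                    out.add(key + ":" + value);
                }
            }
            return out;
        }
    }

    /** Reuses one operator for two buckets, the way cached work is reused in a container. */
    static List<String> runTwoBuckets(boolean resetDoneOnLoad) {
        SketchJoinOperator op = new SketchJoinOperator(resetDoneOnLoad);
        List<String> out = new ArrayList<>();

        // Bucket 0: the small-table side is empty, so the done flag gets set.
        op.loadHashTable(new HashMap<>());
        out.addAll(op.process(new int[] {1, 2}));

        // Bucket 1: the small-table side has a row that should join with key 3.
        Map<Integer, String> bucket1 = new HashMap<>();
        bucket1.put(3, "c");
        op.loadHashTable(bucket1);
        out.addAll(op.process(new int[] {3, 4}));

        return out;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + runTwoBuckets(false));
        System.out.println("with reset:    " + runTwoBuckets(true));
    }
}
```

Without the reset, the second bucket's matching row is silently dropped. With 
the reset, the empty-hashtable check simply runs again on each load, which is 
why clearing the flag unconditionally costs little in the broadcast case.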

> Bucket map joins fail at hash table construction time
> -----------------------------------------------------
>
>                 Key: HIVE-11606
>                 URL: https://issues.apache.org/jira/browse/HIVE-11606
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 1.0.1, 1.2.1
>            Reporter: Vikram Dixit K
>            Assignee: Vikram Dixit K
>         Attachments: HIVE-11606.1.patch, HIVE-11606.2.patch, 
> HIVE-11606.3.patch
>
>
> {code}
> info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a 
> power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity 
> must be a power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>  
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)