[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534359#comment-17534359
 ] 

Stamatis Zampetakis commented on HIVE-26179:
--------------------------------------------

Thanks for the follow-up [~zhengchenyu].

What is the problem that you see in more recent Hive versions?

I understand the problem at a high level, but I don't feel comfortable merging 
untested changes to master. 

It would help to have a minimal sequence of steps that someone can use to 
reproduce this problem.

> In tez reuse container mode, asyncInitOperations are not clear.
> ---------------------------------------------------------------
>
>                 Key: HIVE-26179
>                 URL: https://issues.apache.org/jira/browse/HIVE-26179
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Tez
>    Affects Versions: 1.2.1
>         Environment: engine: Tez (Note: tez.am.container.reuse.enabled is 
> true)
>  
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In our cluster, we found an error like this:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
>     ... 17 more
> {code}
> When Tez container reuse is enabled and a MapJoinOperator is used, an NPE is 
> thrown if different task attempts of the same task execute in the same 
> container.
> While debugging, I found that the second task attempt uses the first task 
> attempt's asyncInitOperations. asyncInitOperations is not cleared when the 
> operator is closed, so the second task attempt may use the first task 
> attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition is 
> already closed, which causes the NPE.
> We must clear asyncInitOperations when the operator is closed.
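
The failure mode described in the issue can be illustrated with a small, self-contained Java sketch. This is not Hive code: StaleAsyncInitSketch, SketchTable, SketchOperator and secondAttemptFails are hypothetical names, and the sketch assumes that each initialization adds a new future to asyncInitOperations and then reads the first entry in the set. A LinkedHashSet is used only so that the stale entry is deterministically the first one.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-ins; these are NOT Hive's real classes.
class StaleAsyncInitSketch {

  /** Stand-in for a hash table whose partition is released on close. */
  static class SketchTable {
    private int[] partition = {1, 2, 3};
    void close() { partition = null; }
    int size() { return partition.length; } // throws NPE once closed
  }

  /** Stand-in for an operator object that survives across task attempts. */
  static class SketchOperator {
    // Assumption: pending async loads are kept in a set owned by the operator.
    private final Set<Future<SketchTable>> asyncInitOperations = new LinkedHashSet<>();
    private final boolean clearOnClose;
    private SketchTable table;

    SketchOperator(boolean clearOnClose) { this.clearOnClose = clearOnClose; }

    void initialize() {
      // Each attempt schedules a fresh load, then reads the FIRST entry.
      asyncInitOperations.add(CompletableFuture.completedFuture(new SketchTable()));
      try {
        table = asyncInitOperations.iterator().next().get();
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    }

    int process() { return table.size(); }

    void close() {
      table.close();
      if (clearOnClose) {
        asyncInitOperations.clear(); // the kind of cleanup the issue asks for
      }
    }
  }

  /** Runs two attempts on one operator; true if the second one hits an NPE. */
  static boolean secondAttemptFails(boolean clearOnClose) {
    SketchOperator op = new SketchOperator(clearOnClose);
    op.initialize(); op.process(); op.close(); // first attempt
    op.initialize();                           // second attempt, same object
    try {
      op.process();
      op.close();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(secondAttemptFails(false)); // prints "true"
    System.out.println(secondAttemptFails(true));  // prints "false"
  }
}
```

Without the clear() call, the second attempt picks up the first attempt's completed future and therefore its already closed table. With it, the set is empty at the next initialization and a fresh table is loaded. Whether this matches the exact code path in a given Hive version is an assumption that would still need a real reproduction.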



