[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529334#comment-17529334 ]
Stamatis Zampetakis commented on HIVE-26179:
--------------------------------------------

In which version did you reproduce the problem? The stack trace in the summary does not seem to correspond to current master or 4.0.0-alpha-1 release. Were you able to reproduce the problem also with 4.0.0-alpha-1? Is there a minimal sequence of steps that can be used to reproduce the problem?

> In tez reuse container mode, asyncInitOperations are not clear.
> ---------------------------------------------------------------
>
> Key: HIVE-26179
> URL: https://issues.apache.org/jira/browse/HIVE-26179
> Project: Hive
> Issue Type: Bug
> Components: Hive, Tez
> Affects Versions: 1.2.1
> Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In our cluster, we found an error like this:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
>     ... 17 more
> {code}
>
> When Tez container reuse is enabled and the query uses MapJoinOperator, an NPE is thrown if different attempts of the same task run in the same container.
>
> While debugging, I found that the second task attempt uses the first attempt's asyncInitOperations.
> asyncInitOperations is not cleared when the operator is closed, so the second task attempt may pick up the first attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition has already been closed, and an NPE is thrown.
>
> asyncInitOperations must therefore be cleared when the operator is closed.
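The failure mode described in the report can be illustrated with a small standalone Java model. This is a minimal sketch under stated assumptions, not Hive's implementation. `SmallTable`, `ReusableJoinOperator`, and `ContainerReuseDemo` are hypothetical classes, and only the field name `asyncInitOperations` comes from the report. A closed `SmallTable` nulls its internal state so that closing it a second time throws `NullPointerException`, which loosely mirrors the trace. The model's `completeInitialization` keeps only the first async result. That is an assumption made so that a stale future deterministically shadows the fresh one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical stand-in for a loaded map-join table (not Hive's HybridHashTableContainer).
final class SmallTable {
  final String owner;
  // Stands in for hash partitions; set to null once the table is closed.
  private List<String> partitions = new ArrayList<>(Arrays.asList("p0", "p1"));

  SmallTable(String owner) {
    this.owner = owner;
  }

  void close() {
    // Closing an already-closed table dereferences null and throws
    // NullPointerException, loosely mirroring the NPE in the reported trace.
    partitions.clear();
    partitions = null;
  }
}

// Hypothetical operator whose instance survives across task attempts,
// as it would when a container is reused.
final class ReusableJoinOperator {
  private final boolean clearOnClose;
  // Pending async init work; LinkedHashSet iterates in insertion order.
  private final Set<Future<SmallTable>> asyncInitOperations = new LinkedHashSet<>();
  private SmallTable table;

  ReusableJoinOperator(boolean clearOnClose) {
    this.clearOnClose = clearOnClose;
  }

  void initialize(String attemptId) {
    // Load this attempt's table asynchronously and remember the future.
    asyncInitOperations.add(CompletableFuture.supplyAsync(() -> new SmallTable(attemptId)));
  }

  void completeInitialization() throws InterruptedException, ExecutionException {
    List<SmallTable> results = new ArrayList<>();
    for (Future<SmallTable> f : asyncInitOperations) {
      results.add(f.get());
    }
    // Assumption of this model: only the first result is used, so a future
    // left over from an earlier attempt shadows the current one.
    table = results.get(0);
  }

  String tableOwner() {
    return table.owner;
  }

  void close() {
    table.close();
    if (clearOnClose) {
      // The proposed fix: forget this attempt's futures when the operator closes.
      asyncInitOperations.clear();
    }
  }
}

final class ContainerReuseDemo {
  // Runs two attempts on one operator instance and returns the owner of the
  // table used by the second attempt.
  static String runTwoAttempts(boolean clearOnClose) throws Exception {
    ReusableJoinOperator op = new ReusableJoinOperator(clearOnClose);
    for (String attempt : new String[] {"attempt_0", "attempt_1"}) {
      op.initialize(attempt);
      op.completeInitialization();
      op.close();
    }
    return op.tableOwner();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runTwoAttempts(true)); // prints attempt_1
    try {
      runTwoAttempts(false);
    } catch (NullPointerException e) {
      System.out.println("NPE on close: asyncInitOperations was not cleared");
    }
  }
}
```

Running `ContainerReuseDemo.main` prints `attempt_1` for the path that clears the futures. For the path that does not, the second attempt closes the first attempt's already-closed table and hits the NPE. The sketch follows the remedy the report proposes: clearing the futures in `close()` so that a reused operator instance cannot observe initialization work started by a previous attempt.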