[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534675#comment-17534675 ] zhengchenyu commented on HIVE-26179:

[~zabetak] For now, only MapJoinOperator uses asyncInitOperations. After applying HIVE-13809, I think the problem is hard to reproduce.

> In tez reuse container mode, asyncInitOperations are not clear.
> ---------------------------------------------------------------
>
>                 Key: HIVE-26179
>                 URL: https://issues.apache.org/jira/browse/HIVE-26179
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Tez
>    Affects Versions: 1.2.1
>         Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In our cluster, we found an error like this.
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
> 	... 16 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
> 	... 17 more
> {code}
> When tez container reuse is enabled and a MapJoinOperator is used, an NPE is thrown if different task attempts of the same task execute in the same container.
> By debugging, I found that the second task attempt uses the first attempt's asyncInitOperations. asyncInitOperations is not cleared when the operator is closed, so the second task attempt may use the first attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition is already closed, which throws the NPE.
> We must clear asyncInitOperations when the operator is closed.
-- This message was sent by Atlassian Jira (v8.20.7#820007)
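The failure mode described in the issue can be sketched in a few lines. The following is an illustrative, self-contained mock, not Hive's actual code: `AsyncInitDemo`, `FakeOperator`, and `HashPartition` are hypothetical stand-ins for Hive's `Operator`, its `asyncInitOperations` list, and `HybridHashTableContainer.HashPartition`. It models how a stale async-init entry surviving `close()` hands a second task attempt (running in the same reused-container JVM) an already-released resource, and how clearing the list on close avoids that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Illustrative sketch (not Hive source) of the HIVE-26179 bug: an operator
 * collects async-init results; if the list is not cleared on close, a second
 * task attempt in the same reused container sees closed resources.
 */
public class AsyncInitDemo {

    /** Stand-in for a hash table partition whose memory is released on close. */
    static final class HashPartition {
        private int[] data = new int[]{42};
        void clear() { data = null; }               // resources released
        boolean isClosed() { return data == null; } // probing now would NPE
    }

    /** Stand-in for an operator holding async initialization state. */
    static final class FakeOperator {
        // Survives across "task attempts" when the container (JVM) is reused.
        final List<CompletableFuture<HashPartition>> asyncInitOperations = new ArrayList<>();

        void initialize() {
            // A leftover entry from a previous attempt short-circuits re-init.
            if (asyncInitOperations.isEmpty()) {
                asyncInitOperations.add(CompletableFuture.completedFuture(new HashPartition()));
            }
        }

        HashPartition completeInitialization() {
            return asyncInitOperations.get(0).join();
        }

        void close(boolean clearAsyncState) {
            completeInitialization().clear();       // release hash partitions
            if (clearAsyncState) {
                asyncInitOperations.clear();        // the proposed fix
            }
        }
    }

    /**
     * Runs two task attempts against one operator; returns true iff the
     * second attempt sees a usable (not yet closed) partition.
     */
    public static boolean secondAttemptSeesFreshState(boolean clearOnClose) {
        FakeOperator op = new FakeOperator();
        op.initialize();                            // attempt 1: init
        op.close(clearOnClose);                     // attempt 1: close
        op.initialize();                            // attempt 2: reused container
        return !op.completeInitialization().isClosed();
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + secondAttemptSeesFreshState(false)); // stale state
        System.out.println("with clear:    " + secondAttemptSeesFreshState(true));  // fresh state
    }
}
```

With `clearOnClose = false`, the second attempt's `initialize()` sees the leftover entry, skips re-initialization, and gets back the partition that attempt 1 already cleared; with `clearOnClose = true`, it builds a fresh one. This mirrors the described fix of clearing `asyncInitOperations` in the operator's close path.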
[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532146#comment-17532146 ] zhengchenyu commented on HIVE-26179:

I tested on hive-3.1.2 with my dataset, and the NPE does not occur. By debugging, I found the NPE was fixed by HIVE-13809 from [~wzheng] (note: it removed loadCalled, which fixed it). But during debugging I still found that we can get inconsistent results from Operator::completeInitialization in recent Hive versions, so this bug can surface in unpredictable ways.
[jira] [Comment Edited] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532146#comment-17532146 ] zhengchenyu edited comment on HIVE-26179 at 5/5/22 8:44 AM:

I tested on hive-3.1.2 with my dataset, and the NPE does not occur. By debugging, I found the NPE was fixed by HIVE-13809 from [~wzheng] (note: it removed loadCalled, which fixed it). But during debugging I still found that we can get inconsistent results from Operator::completeInitialization in recent Hive versions, so this bug can surface in unpredictable ways. [~zabetak]
[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529404#comment-17529404 ] zhengchenyu commented on HIVE-26179:

[~zabetak] I use our internal version based on hive-1.2.1, and I don't have a 4.0.0-alpha-1 environment. But reading the master source code, I found the same logical problem. asyncInitOperations needs to be cleared when the operator is closed, or it will lead to inconsistency in tez container reuse mode.

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-15194) Hive on Tez - Hive Runtime Error while closing operators
[ https://issues.apache.org/jira/browse/HIVE-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528573#comment-17528573 ] zhengchenyu commented on HIVE-15194:

[~wzheng] [~gopalv] In our cluster, I found HashPartition.clear() being called by the last task attempt when tez reuses containers. Because asyncInitOperations is not cleared, the next task attempt will use the last attempt's asyncInitOperations. For details, see https://issues.apache.org/jira/browse/HIVE-26179.

> Hive on Tez - Hive Runtime Error while closing operators
> --------------------------------------------------------
>
>                 Key: HIVE-15194
>                 URL: https://issues.apache.org/jira/browse/HIVE-15194
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>         Environment: Hive 2.1.0
>                      Tez 0.8.4
>                      4 Nodes x CentOS-6 x64 (32GB Memory, 8 CPUs)
>                      Hadoop 2.7.1
>            Reporter: Shankar M
>            Assignee: Wei Zheng
>            Priority: Major
>         Attachments: HIVE-15194.1.patch
>
> Please help me to solve the issue below.
> I am setting these commands in the Hive CLI:
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled = true;
> set hive.vectorized.execution.reduce.enabled = true;
> set hive.cbo.enable=true;
> set hive.compute.query.using.stats=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> SET hive.tez.container.size=4096;
> SET hive.tez.java.opts=-Xmx3072m;
> {code}
> hive> CREATE TABLE tmp_parquet_newtable STORED AS PARQUET AS
>     > select a.* from orc_very_large_table a where a.event = 1 and EXISTS (SELECT 1 FROM tmp_small_parquet_table b WHERE b.session_id = a.session_id ) ;
> Query ID = hadoop_20161114132930_65843cb3-557c-4b42-b662-2901caf5be2d
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1479059955967_0049)
> --------------------------------------------------------------------------------
>         VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 .........   container  FAILED     384 440 340 26 0
> Map 2 .........   container  SUCCEEDED  1 100 0 0
> --------------------------------------------------------------------------------
> VERTICES: 01/02 [===>>---] 11% ELAPSED TIME: 43.76 s
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1479059955967_0049_2_01, diagnostics=[Task failed, taskId=task_1479059955967_0049_2_01_48, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1479059955967_0049_2_01_48_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:422)
[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528571#comment-17528571 ] zhengchenyu commented on HIVE-26179:

I found HIVE-15194, which has similar exceptions, but there is not enough information in HIVE-15194. [~wzheng] [~gopalv] Can you help me review it?
[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528568#comment-17528568 ] zhengchenyu commented on HIVE-26179:

[~zabetak] Can you help me review it? I think it is a bug that must be fixed.
[jira] [Updated] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HIVE-26179: --- Description: In our cluster, we found error like this. {code:java} Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161) ... 16 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338) ... 17 more {code} When tez reuse container is enable, and use MapJoinOperator, if same tasks's different taskattemp execute in same container, will throw NPE. By my debug, I found the second task attempt use first task's asyncInitOperations. asyncInitOperations are not clear when close op, then second taskattemp may use first taskattepmt's mapJoinTables which HybridHashTableContainer.HashPartition is closed, so throw NPE. We must clear asyncInitOperations when op is closed. was: In our cluster, we found error like this. 
{code} Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at
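The fix described above (clearing asyncInitOperations when the operator closes) can be sketched in miniature. This is an illustrative stand-in, not Hive's actual Operator class: AsyncOp, startAsyncInit, and pendingAsyncInits are hypothetical names for the role that Operator.asyncInitOperations and its cached MapJoin hash tables play under container reuse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Simplified stand-in for Hive's Operator: it caches async init results
// (e.g. MapJoin hash tables). Without the fix, the list survives close(),
// so a second task attempt in a reused container sees stale, closed state.
class AsyncOp {
    private final List<Future<?>> asyncInitOperations = new ArrayList<>();

    void startAsyncInit() {
        // e.g. hash table loading kicked off in the background
        asyncInitOperations.add(CompletableFuture.completedFuture(new Object()));
    }

    void close() {
        // The fix: drop references to async init state on close so a
        // reused container cannot pick up already-closed hash tables.
        asyncInitOperations.clear();
    }

    int pendingAsyncInits() {
        return asyncInitOperations.size();
    }
}
```

After close(), a second attempt re-initializing in the same JVM starts from an empty list instead of reusing the first attempt's closed tables.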
[jira] [Assigned] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.
[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HIVE-26179: -- > In tez reuse container mode, asyncInitOperations are not clear. > --- > > Key: HIVE-26179 > URL: https://issues.apache.org/jira/browse/HIVE-26179 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Affects Versions: 1.2.1 > Environment: engine: Tez (Note: tez.am.container.reuse.enabled is > true) > >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Fix For: 4.0.0 > > > In our cluster, we found error like this. > {code} > Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, > diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: > java.lang.RuntimeException: Hive Runtime Error while closing operators > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: Hive Runtime Error while closing > operators > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161) > ... 16 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338) > ... 17 more > {code} > When tez reuse container is enable, and use MapJoinOperator, if same tasks's > different taskattemp execute in same container, will throw NPE. > By my debug, I found the second task attempt use first task's > asyncInitOperations. asyncInitOperations are not clear when close op, then > second taskattemp may use first taskattepmt's mapJoinTables which > HybridHashTableContainer.HashPartition is closed, so throw NPE. > > We must clear asyncInitOperations when op is closed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425939#comment-17425939 ] zhengchenyu commented on HIVE-25561: If Tez speculation is enabled, the probability of this problem increases, but it is still a low-probability event. If we do not fix this bug, we may produce two logically duplicate files (not byte-identical files), and when we read a table without a unique key, duplicated rows may be returned. In fact, removeTempOrDuplicateFiles aims to solve this problem, but under some conditions removeTempOrDuplicateFiles fails. For this reason, I think a file created by a killed task should not be committed. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. > In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
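The "killed task should not commit" rule can be sketched as follows. This is not Hive's actual FileSinkOperator; TaskSink, closeWriter, and the abort/committed flags are illustrative stand-ins for the behavior the comment describes (an uncaught failure during close, such as the HDFS client being shut down after SIGTERM, must flip abort before the commit decision).

```java
// Sketch under stated assumptions: a task only promotes its temporary
// output to a final file when no failure was observed during close.
class TaskSink {
    private boolean abort = false;
    private boolean committed = false;

    // Any failure while closing the writer (e.g. "filesystem already
    // closed" after SIGTERM) must set abort; the bug was that some
    // exception paths skipped this step.
    void closeWriter(boolean writerFails) {
        try {
            if (writerFails) {
                throw new RuntimeException("filesystem already closed");
            }
        } catch (RuntimeException e) {
            abort = true;
        }
    }

    void close() {
        if (!abort) {
            committed = true;  // rename the tmp file to its final name
        }
    }

    boolean isCommitted() { return committed; }
}
```

With abort set on the failure path, the killed speculative attempt never publishes its partial file, so the Driver never has two attempts' outputs to reconcile.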
[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:20 AM: -- [~zabetak] When the bug is reproduced, the partition contains duplicate files: 02_0 and 02_1. The two files are created by two different task attempts that belong to the same task (one is the normal task attempt, the other is the speculative task attempt), so we will query duplicated lines. One file is a subset of the other: because the speculative task is killed, the file created by the killed task is a subset of the completed file. I found that the file created by a killed task could be committed under some conditions: once some exception is not caught, abort may remain false. was (Author: zhengchenyu): [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. (One is normal task attempt, the other is speculative task attempt.) So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, abort may be false. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. 
> In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:18 AM: -- [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. (One is normal task attempt, the other is speculative task attempt.) So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, abort may be false. was (Author: zhengchenyu): [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, abort may be false. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. 
> In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:18 AM: -- [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, abort may be false. was (Author: zhengchenyu): [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, ** abort may be false. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. 
> In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:17 AM: -- [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. Once some exception was not caught, ** abort may be false. was (Author: zhengchenyu): [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. 
> In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:15 AM: -- [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. One file is the subset of the other file. Because the speculative task is killed, so this file created by killed task is the subset of the completed file. I found the file created by killed task could be committed. was (Author: zhengchenyu): [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. > In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. 
It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994 ] zhengchenyu commented on HIVE-25561: [~zabetak] When bug is reproduced, partition contains duplicate file: 02_0 and 02_1. The two file are created by two different task attempt which belong to same task. One is normal task attempt, the other is speculative task attempt. So we will query duplicated line. > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. > In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
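The deduplication step mentioned in the description can be sketched as below. This is an illustrative model in the spirit of removeTempOrDuplicateFiles, not the real implementation: among files named &lt;taskId&gt;_&lt;attemptId&gt;, keep one attempt per task. It also shows why the race matters: a file committed after the Driver's directory listing never enters this dedup pass, so the duplicate survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical dedup: group files by the task-id prefix before the last
// '_' and keep a single attempt per task (largest attempt name as a
// simple, illustrative tie-break).
class AttemptDedup {
    static List<String> keepOnePerTask(List<String> files) {
        Map<String, String> byTask = new TreeMap<>();
        for (String f : files) {
            String task = f.substring(0, f.lastIndexOf('_'));
            byTask.merge(task, f, (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        return new ArrayList<>(byTask.values());
    }
}
```

If the killed attempt's file is committed after this pass ran, both attempts' files remain in the partition directory, which matches the duplicate rows observed.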
[jira] [Assigned] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HIVE-25561: -- Assignee: zhengchenyu > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > > For tez engine in our cluster, I found some duplicate line, especially tez > speculation is enabled. In partition dir, I found both 02_0 and 02_1 > exist. > It's a very low probability event. HIVE-10429 has fix some bug about > interrupt, but some exception was not caught. > In our cluster, Task receive SIGTERM, then ClientFinalizer(Hadoop Class) was > called, hdfs client will close. Then will raise exception, but abort may not > set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, duplicate > file will retain. > (Notes: Driver first list dir, then Task commit file, then Driver remove > duplicate file. It is a inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383089#comment-17383089 ] zhengchenyu commented on HIVE-25335: [~zabetak] Okay, I will submit a PR. > Unreasonable setting reduce number, when join big size table(but small row > count) and small size table > -- > > Key: HIVE-25335 > URL: https://issues.apache.org/jira/browse/HIVE-25335 > Project: Hive > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Attachments: HIVE-25335.001.patch > > > I found an application which is slow in our cluster, because the proccess > bytes of one reduce is very huge, but only two reduce. > when I debug, I found the reason. Because in this sql, one big size table > (about 30G) with few row count(about 3.5M), another small size table (about > 100M) have more row count (about 3.6M). So JoinStatsRule.process only use > 100M to estimate reducer's number. But we need to process 30G byte in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
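The sizing problem in the issue can be illustrated with a toy estimator. This is not JoinStatsRule's actual code; ReducerEstimate and bytesPerReducer are assumptions used only to show why basing the count on the largest input's bytes matters: if only the 100 MB side drives the estimate, the 30 GB side gets squeezed into a couple of reducers.

```java
// Hypothetical reducer-count estimate: size the reduce stage by the
// largest input, ceil-divided by a bytes-per-reducer target.
class ReducerEstimate {
    static int reducers(long[] inputBytes, long bytesPerReducer) {
        long max = 0;
        for (long b : inputBytes) {
            max = Math.max(max, b);
        }
        // ceil division, with at least one reducer
        return (int) Math.max(1, (max + bytesPerReducer - 1) / bytesPerReducer);
    }
}
```

With a 256 MB-per-reducer target, a 30 GB input yields 120 reducers, while an estimate driven only by the 100 MB side would yield 1.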
[jira] [Updated] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HIVE-25335: --- Attachment: HIVE-25335.001.patch > Unreasonable setting reduce number, when join big size table(but small row > count) and small size table > -- > > Key: HIVE-25335 > URL: https://issues.apache.org/jira/browse/HIVE-25335 > Project: Hive > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Attachments: HIVE-25335.001.patch > > > I found an application which is slow in our cluster, because the proccess > bytes of one reduce is very huge, but only two reduce. > when I debug, I found the reason. Because in this sql, one big size table > (about 30G) with few row count(about 3.5M), another small size table (about > 100M) have more row count (about 3.6M). So JoinStatsRule.process only use > 100M to estimate reducer's number. But we need to process 30G byte in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HIVE-25335: -- Assignee: zhengchenyu > Unreasonable setting reduce number, when join big size table(but small row > count) and small size table > -- > > Key: HIVE-25335 > URL: https://issues.apache.org/jira/browse/HIVE-25335 > Project: Hive > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > > I found an application which is slow in our cluster, because the proccess > bytes of one reduce is very huge, but only two reduce. > when I debug, I found the reason. Because in this sql, one big size table > (about 30G) with few row count(about 3.5M), another small size table (about > 100M) have more row count (about 3.6M). So JoinStatsRule.process only use > 100M to estimate reducer's number. But we need to process 30G byte in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176752#comment-17176752 ] zhengchenyu commented on HIVE-22126: [~euigeun_chung] I found another problem. In the deriveRowType function, the change in this patch results in a dead loop and eventually an OOM: the variable 'name' is never changed, so the loop never exits. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
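The dead-loop pattern described above can be shown in miniature. This is hypothetical code, not the actual deriveRowType: FieldNames.unique stands in for a loop that uniquifies a field name. The bug described is that 'name' was never reassigned inside the loop, so the condition never became false; the sketch below shows the terminating form.

```java
import java.util.Set;

// Minimal uniquifier: try the base name, then base_0, base_1, ... .
// Reassigning 'name' on every iteration is what makes the loop exit;
// the reported bug left 'name' unchanged, spinning forever.
class FieldNames {
    static String unique(String base, Set<String> taken) {
        String name = base;
        int i = 0;
        while (taken.contains(name)) {
            name = base + "_" + i++;
        }
        return name;
    }
}
```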
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176751#comment-17176751 ] zhengchenyu commented on HIVE-22126: [~kgyrtkirk] Yeah, I decompiled the jar and found duplicated Calcite classes; I solved this problem. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175634#comment-17175634 ] zhengchenyu commented on HIVE-22126: [~euigeun_chung] I solved this problem by excluding the Calcite package from bin.tar.gz. Sorry for the huge gap; my version (3.2.1) is much older than master. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HIVE-22126: --- Comment: was deleted (was: [~euigeun_chung] I think it 's not a version problem. I think we wanna need to shade guava, we need shade all guava from all other component(for example: hadoop, spark), not but work well on a specific version. ) > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175462#comment-17175462 ] zhengchenyu commented on HIVE-22126: [~euigeun_chung] I think it's not a version problem. If we want to shade Guava, we need to shade Guava away from every other component (for example, Hadoop and Spark), not just make it work on one specific version. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
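A typical maven-shade-plugin relocation for Guava looks like the snippet below. This is a generic sketch, not the exact configuration from any HIVE-22126 patch. As the discussion notes, the relocation must be complete: any bundled class (for example in Calcite) whose API still exposes unshaded Guava types will break at runtime once its callers are relocated.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava packages and all references to them inside
               the shaded jar, so downstream users can bring their own
               Guava version. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```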
[jira] [Comment Edited] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175452#comment-17175452 ] zhengchenyu edited comment on HIVE-22126 at 8/11/20, 11:02 AM: --- When I ran the program, I hit an AbstractMethodError. I think HiveAggregateFactoryImpl's createAggregate no longer implements AggregateFactory's createAggregate: com.google.common.collect.ImmutableList is relocated to org.apache.hive.com.google.common.collect.ImmutableList, so an AbstractMethodError is thrown. Notes: to shade the whole guava jar, I removed the guava lib from the lib dir and used maven-shade-plugin in the main pom.xml to shade all guava classes. The error stack is below: {code:java} 2020-08-11T18:26:51,434 ERROR [0ae58217-4908-4697-a9e7-a57f279a22a0 main] parse.CalcitePlanner: CBO failed, skipping CBO. java.lang.RuntimeException: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode; at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1539) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1417) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) ~[hive-exec-3.1.2.jar:3.1.2] at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241] at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.1.jar:?] at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.1.jar:?] 
Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode; at org.apache.calcite.tools.RelBuilder.aggregate(RelBuilder.java:1267) ~[calcite-core-1.16.0.jar:1.16.0] at org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:886) ~[calcite-core-1.16.0.jar:1.16.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241] at
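The failure mode described in the comment above can be sketched in plain Java. This is a minimal illustration with hypothetical stand-in class names (OriginalList, RelocatedList, ShadeDemo): after relocation, the implementation's parameter type differs from the one the unshaded Calcite interface declares, so the method no longer overrides it. In the real shaded jar the class file still claims to implement the interface, which is why the mismatch surfaces only at runtime as AbstractMethodError; here we just show the descriptor mismatch via reflection.

```java
// Stand-ins for the two copies of the same class after shading:
class OriginalList {}   // plays com.google.common.collect.ImmutableList
class RelocatedList {}  // plays org.apache.hive.com.google.common.collect.ImmutableList

interface AggregateFactory {
    // The unshaded caller (Calcite) compiles against the original type.
    Object createAggregate(OriginalList groupSets);
}

public class ShadeDemo {
    // After relocation the parameter type changes, so this method no longer
    // overrides AggregateFactory#createAggregate. (We cannot even write
    // "implements AggregateFactory" here or javac would reject the class;
    // the shaded jar bypasses that check, deferring the failure to runtime.)
    static class HiveAggregateFactoryImpl {
        public Object createAggregate(RelocatedList groupSets) { return null; }
    }

    public static void main(String[] args) {
        Class<?> wanted = AggregateFactory.class
                .getDeclaredMethods()[0].getParameterTypes()[0];
        Class<?> provided = HiveAggregateFactoryImpl.class
                .getDeclaredMethods()[0].getParameterTypes()[0];
        System.out.println("interface expects: " + wanted.getName());
        System.out.println("impl provides:     " + provided.getName());
        // Descriptors differ -> the JVM finds no implementation of the
        // interface method -> AbstractMethodError at the call site.
        System.out.println("overrides? " + wanted.equals(provided)); // prints overrides? false
    }
}
```

This is why shading only works cleanly when every consumer of the relocated types lives inside the shaded jar; an unshaded calcite-core on the classpath still resolves the original guava names.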
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175452#comment-17175452 ] zhengchenyu commented on HIVE-22126: When I ran the program, I hit an AbstractMethodError. I think HiveAggregateFactoryImpl's createAggregate no longer implements AggregateFactory's createAggregate: com.google.common.collect.ImmutableList is relocated to org.apache.hive.com.google.common.collect.ImmutableList, so an AbstractMethodError is thrown. Notes: to shade the whole guava jar, I removed the guava lib from the lib dir and used maven-shade-plugin in the main pom.xml to shade all guava classes. The error stack is below: {code} 2020-08-11T18:26:51,434 ERROR [0ae58217-4908-4697-a9e7-a57f279a22a0 main] parse.CalcitePlanner: CBO failed, skipping CBO. java.lang.RuntimeException: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode; at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1539) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1417) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330) ~[hive-exec-3.1.2.jar:3.1.2] at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) ~[hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241] at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.1.jar:?] 
at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.1.jar:?]Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode; at org.apache.calcite.tools.RelBuilder.aggregate(RelBuilder.java:1267) ~[calcite-core-1.16.0.jar:1.16.0] at org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:886) ~[calcite-core-1.16.0.jar:1.16.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241] at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:524) ~[calcite-core-1.16.0.jar:1.16.0] at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:273)
[jira] [Assigned] (HIVE-14425) java.io.IOException: Could not find status of job:job_*
[ https://issues.apache.org/jira/browse/HIVE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HIVE-14425: -- Assignee: zhengchenyu (was: liuguanghua) > java.io.IOException: Could not find status of job:job_* > --- > > Key: HIVE-14425 > URL: https://issues.apache.org/jira/browse/HIVE-14425 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: hadoop2.7.2 + hive1.2.1 >Reporter: liuguanghua >Assignee: zhengchenyu >Priority: Minor > > java.io.IOException: Could not find status of job:job_1470047186803_13 > at > org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295) > at > org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549) > at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437) > at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Ended Job = job_1470047186803_13 with exception > 'java.io.IOException(Could not find status of job:job_1470047186803_13)' -- This message was sent by Atlassian JIRA (v6.3.4#6332)