[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-05-10 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534675#comment-17534675
 ] 

zhengchenyu commented on HIVE-26179:


[~zabetak] For now, only MapJoinOperator uses asyncInitOperations. After applying 
HIVE-13809, I think it is hard to reproduce the problem.

> In tez reuse container mode, asyncInitOperations are not clear.
> ---
>
> Key: HIVE-26179
> URL: https://issues.apache.org/jira/browse/HIVE-26179
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 1.2.1
> Environment: engine: Tez (Note: tez.am.container.reuse.enabled is 
> true)
>  
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In our cluster, we found an error like this.
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, 
> diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
>     ... 17 more
> {code}
> When Tez container reuse is enabled and MapJoinOperator is used, if different 
> task attempts of the same task execute in the same container, an NPE is thrown.
> While debugging, I found that the second task attempt uses the first task 
> attempt's asyncInitOperations. asyncInitOperations is not cleared when the 
> operator is closed, so the second task attempt may use the first task 
> attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition is 
> already closed, which throws the NPE.
> We must clear asyncInitOperations when the operator is closed.
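The failure mode described above can be sketched with a tiny self-contained model. This is an illustration only: `HashTable`, `Operator`, and `secondAttemptSeesLiveTable` are hypothetical stand-ins, not the real Hive classes.

```java
import java.util.HashSet;
import java.util.Set;

public class AsyncInitDemo {
    // Stands in for MapJoinOperator's hash-table container
    // (HybridHashTableContainer.HashPartition in the real code).
    static class HashTable {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    // Stands in for Operator: tracks state produced by async initialization.
    static class Operator {
        final Set<HashTable> asyncInitOperations = new HashSet<>();
        HashTable table;

        void initialize() {
            // Bug: if asyncInitOperations still holds the previous attempt's
            // entry, initialization is skipped and the stale table is reused.
            if (asyncInitOperations.isEmpty()) {
                table = new HashTable();
                asyncInitOperations.add(table);
            } else {
                table = asyncInitOperations.iterator().next();
            }
        }

        void close(boolean clearAsyncInit) {
            table.close();
            if (clearAsyncInit) {
                asyncInitOperations.clear(); // the proposed fix
            }
        }
    }

    // Runs two "task attempts" against the same operator state, as happens
    // when tez.am.container.reuse.enabled keeps the container alive.
    static boolean secondAttemptSeesLiveTable(boolean clearOnClose) {
        Operator op = new Operator();
        op.initialize();          // first task attempt
        op.close(clearOnClose);
        op.initialize();          // second task attempt in the reused container
        return !op.table.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("without clear, table live: " + secondAttemptSeesLiveTable(false));
        System.out.println("with clear, table live:    " + secondAttemptSeesLiveTable(true));
    }
}
```

Clearing the set in close() mirrors the proposed fix: it prevents the second attempt from short-circuiting initialization against state the first attempt has already torn down.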



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-05-05 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532146#comment-17532146
 ] 

zhengchenyu commented on HIVE-26179:


I tested on hive-3.1.2 with my dataset, and the NPE does not occur. While 
debugging, I found the NPE was fixed in HIVE-13809 by [~wzheng] (Note: 
loadCalled was removed, which fixed it).

But during debugging I also found that we can still get inconsistent results 
from Operator::completeInitialization in newer Hive versions; this bug can 
manifest in unpredictable ways.






[jira] [Comment Edited] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-05-05 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532146#comment-17532146
 ] 

zhengchenyu edited comment on HIVE-26179 at 5/5/22 8:44 AM:


I tested on hive-3.1.2 with my dataset, and the NPE does not occur. While 
debugging, I found the NPE was fixed in HIVE-13809 by [~wzheng] (Note: 
loadCalled was removed, which fixed it).

But during debugging I also found that we can still get inconsistent results 
from Operator::completeInitialization in newer Hive versions; this bug can 
manifest in unpredictable ways.

[~zabetak] 


was (Author: zhengchenyu):
I tested on hive-3.1.2 with my dataset, and the NPE does not occur. While 
debugging, I found the NPE was fixed in HIVE-13809 by [~wzheng] (Note: 
loadCalled was removed, which fixed it).

But during debugging I also found that we can still get inconsistent results 
from Operator::completeInitialization in newer Hive versions; this bug can 
manifest in unpredictable ways.





[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-04-28 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529404#comment-17529404
 ] 

zhengchenyu commented on HIVE-26179:


[~zabetak] I use our internal version, based on hive-1.2.1. I don't have a 
4.0.0-alpha-1 environment, but reading the master source code I found the same 
logical problem: asyncInitOperations needs to be cleared when the operator is 
closed, or it will cause inconsistency in Tez container-reuse mode.






[jira] [Commented] (HIVE-15194) Hive on Tez - Hive Runtime Error while closing operators

2022-04-27 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528573#comment-17528573
 ] 

zhengchenyu commented on HIVE-15194:


[~wzheng] [~gopalv] 

In our cluster, I found HashPartition.clear() called in the last task attempt 
when Tez reuses containers. Because asyncInitOperations is not cleared, the 
next task attempt will use the last task attempt's asyncInitOperations.

For details, see https://issues.apache.org/jira/browse/HIVE-26179.

> Hive on Tez - Hive Runtime Error while closing operators
> 
>
> Key: HIVE-15194
> URL: https://issues.apache.org/jira/browse/HIVE-15194
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
> Environment: Hive 2.1.0 
> Tez 0.8.4
> 4 Nodes x CentOS-6 x64 (32GB Memory, 8 CPUs)
> Hadoop 2.7.1
>Reporter: Shankar M
>Assignee: Wei Zheng
>Priority: Major
> Attachments: HIVE-15194.1.patch
>
>
> Please help me solve the issue below. 
> --
> I am setting below commands in hive CLI: 
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled = true;
> set hive.vectorized.execution.reduce.enabled = true;
> set hive.cbo.enable=true;
> set hive.compute.query.using.stats=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> SET hive.tez.container.size=4096;
> SET hive.tez.java.opts=-Xmx3072m;
> --
> {code}
> hive> CREATE TABLE tmp_parquet_newtable STORED AS PARQUET AS 
> > select a.* from orc_very_large_table a where a.event = 1 and EXISTS 
> (SELECT 1 FROM tmp_small_parquet_table b WHERE b.session_id = a.session_id ) ;
> Query ID = hadoop_20161114132930_65843cb3-557c-4b42-b662-2901caf5be2d
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1479059955967_0049)
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED  
> --
> Map 1 .  containerFAILED384 440  340  
> 26   0  
> Map 2 .. container SUCCEEDED  1  100  
>  0   0  
> --
> VERTICES: 01/02  [===>>---] 11%   ELAPSED TIME: 43.76 s   
>  
> --
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1479059955967_0049_2_01, 
> diagnostics=[Task failed, taskId=task_1479059955967_0049_2_01_48, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1479059955967_0049_2_01_48_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:422)
>   at 
> 

[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-04-27 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528571#comment-17528571
 ] 

zhengchenyu commented on HIVE-26179:


I found HIVE-15194, which has similar exceptions, but there is not enough 
information in HIVE-15194.

[~wzheng] [~gopalv] Can you help me review it?






[jira] [Commented] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-04-27 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528568#comment-17528568
 ] 

zhengchenyu commented on HIVE-26179:


[~zabetak] Can you help me review it? I think it is a bug that must be fixed.






[jira] [Updated] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-04-26 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HIVE-26179:
---
Description: 
In our cluster, we found an error like this.
{code:java}
Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, 
diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: 
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
operators
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
    ... 16 more
Caused by: java.lang.NullPointerException
    at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
    ... 17 more
{code}
When tez container reuse is enabled and a MapJoinOperator is used, an NPE is 
thrown if different attempts of the same task execute in the same container.

While debugging, I found that the second task attempt reuses the first 
attempt's asyncInitOperations. asyncInitOperations is not cleared when the 
operator is closed, so the second attempt may use the first attempt's 
mapJoinTables, whose HybridHashTableContainer.HashPartition is already closed, 
hence the NPE.

We must clear asyncInitOperations when the operator is closed.
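The failure mode and the proposed fix can be sketched outside Hive. The code below is a hypothetical illustration (not the actual Operator/MapJoinOperator implementation): a cached async-init result survives container reuse, so the second attempt skips re-initialization and probes an already-closed table, and clearing the cache on close restores correct behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the actual Hive Operator code) of the stale-state
// bug: async-init results cached in the operator survive container reuse, so
// a second task attempt probes an already-closed hash table unless the cache
// is cleared when the operator closes.
public class ReuseDemo {
    static class HashTable {
        private boolean closed = false;
        void close() { closed = true; }
        int probe() {
            if (closed) {
                throw new NullPointerException("hash partition already closed");
            }
            return 42;
        }
    }

    static class MapJoinLikeOperator {
        // Plays the role of asyncInitOperations: it outlives a single attempt
        // when the container (and thus the operator object) is reused.
        private final List<HashTable> asyncInitResults = new ArrayList<>();

        void initialize() {
            if (asyncInitResults.isEmpty()) {       // second attempt skips re-init
                asyncInitResults.add(new HashTable());
            }
        }

        int process() { return asyncInitResults.get(0).probe(); }

        void close(boolean clearAsyncState) {
            for (HashTable t : asyncInitResults) {
                t.close();
            }
            if (clearAsyncState) {
                asyncInitResults.clear();           // the proposed fix
            }
        }
    }

    public static void main(String[] args) {
        // Without clearing: the second "attempt" hits a closed table -> NPE.
        MapJoinLikeOperator op = new MapJoinLikeOperator();
        op.initialize();
        op.process();
        op.close(false);
        op.initialize();                            // cache non-empty, re-init skipped
        try {
            op.process();
        } catch (NullPointerException e) {
            System.out.println("stale state: " + e.getMessage());
        }

        // With clearing on close, the second attempt re-initializes cleanly.
        MapJoinLikeOperator fixed = new MapJoinLikeOperator();
        fixed.initialize();
        fixed.process();
        fixed.close(true);
        fixed.initialize();
        System.out.println("after fix: " + fixed.process());
    }
}
```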

  was:
In our cluster, we found error like this.
{code}
Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, 
diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: 
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at 

[jira] [Assigned] (HIVE-26179) In tez reuse container mode, asyncInitOperations are not clear.

2022-04-26 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HIVE-26179:
--


> In tez reuse container mode, asyncInitOperations are not clear.
> ---
>
> Key: HIVE-26179
> URL: https://issues.apache.org/jira/browse/HIVE-26179
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 1.2.1
> Environment: engine: Tez (Note: tez.am.container.reuse.enabled is 
> true)
>  
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 4.0.0
>
>
> In our cluster, we found error like this.
> {code}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, 
> diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
>     ... 17 more
> {code}
> When tez container reuse is enabled and a MapJoinOperator is used, an NPE is 
> thrown if different attempts of the same task execute in the same container.
> While debugging, I found that the second task attempt reuses the first 
> attempt's asyncInitOperations. asyncInitOperations is not cleared when the 
> operator is closed, so the second attempt may use the first attempt's 
> mapJoinTables, whose HybridHashTableContainer.HashPartition is already 
> closed, hence the NPE.
>  
> We must clear asyncInitOperations when the operator is closed.





[jira] [Commented] (HIVE-25561) Killed task should not commit file.

2021-10-07 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425939#comment-17425939
 ] 

zhengchenyu commented on HIVE-25561:


If tez speculation is enabled, the probability of this problem increases, but 
it is still a low-probability event.

If we do not fix this bug, two logically duplicate files may be produced 
(not necessarily identical files); when the table is read without 
deduplication, duplicated rows may be returned.

In fact, removeTempOrDuplicateFiles is meant to solve this problem, but under 
some conditions it fails.

Therefore, I think a file created by a killed task should not be committed.
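The abort-flag pattern behind this conclusion can be illustrated with a hypothetical sketch (not Hive's actual Task or FileSinkOperator code): output is committed only when abort is false, so an error path that fails to set abort lets a killed attempt commit its partial file.

```java
// Hypothetical sketch of the abort-flag pattern: commit happens only when
// 'abort' stays false, so a SIGTERM-triggered exception that is caught
// without setting abort still leads to a commit.
public class AbortDemo {
    static class TaskAttempt {
        boolean abort = false;
        boolean committed = false;

        // 'setAbortOnError' models the fix: every error path marks the
        // attempt aborted before the commit check runs.
        void closeAndCommit(boolean errorDuringClose, boolean setAbortOnError) {
            try {
                if (errorDuringClose) {
                    throw new RuntimeException("hdfs client closed during shutdown");
                }
            } catch (RuntimeException e) {
                if (setAbortOnError) {
                    abort = true;
                }
            }
            if (!abort) {
                committed = true;   // a killed task should never reach this line
            }
        }
    }

    public static void main(String[] args) {
        TaskAttempt buggy = new TaskAttempt();
        buggy.closeAndCommit(true, false);
        System.out.println("buggy attempt committed: " + buggy.committed);

        TaskAttempt fixed = new TaskAttempt();
        fixed.closeAndCommit(true, true);
        System.out.println("fixed attempt committed: " + fixed.committed);
    }
}
```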

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)
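The ordering problem in the final note can be made concrete with a hypothetical simulation (not Hive code): the Driver deduplicates against a directory listing taken before a late task attempt commits, so the late file survives the cleanup.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the Driver/Task race: the Driver lists the dir,
// a late (killed/speculative) attempt then commits 02_1, and the Driver's
// duplicate removal only considers the stale listing, so 02_1 survives.
public class CommitRaceDemo {
    static Set<String> dedupeAgainstSnapshot(Set<String> dir, List<String> snapshot) {
        // Keep only the first attempt per task id seen in the snapshot.
        Set<String> seenTasks = new LinkedHashSet<>();
        for (String f : snapshot) {
            String taskId = f.substring(0, f.lastIndexOf('_'));
            if (!seenTasks.add(taskId)) {
                dir.remove(f);   // duplicate within the snapshot: removed
            }
        }
        return dir;
    }

    public static void main(String[] args) {
        Set<String> dir = new LinkedHashSet<>();
        dir.add("02_0");                              // committed before listing
        List<String> snapshot = new ArrayList<>(dir); // Driver lists the dir
        dir.add("02_1");                              // late attempt commits afterwards
        dedupeAgainstSnapshot(dir, snapshot);
        System.out.println(dir);                      // both files remain
    }
}
```

Had the snapshot been taken after the late commit, the same deduplication would have removed the second file, which is why the issue only appears under this interleaving.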



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:20 AM:
--

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. (One is the normal task attempt, the other is the 
speculative task attempt.) So queries return duplicated rows.

One file is a subset of the other: because the speculative task is killed, the 
file created by the killed task is a subset of the completed file.

I found that the file created by a killed task can be committed under some 
conditions: once some exception is not caught, abort may remain false.


was (Author: zhengchenyu):
[~zabetak] When bug is reproduced, partition contains duplicate file:  02_0 
and 02_1. The two file are created by two different task attempt which 
belong to same task. (One is normal task attempt, the other is speculative task 
attempt.) So we will query duplicated line.

One file is the subset of the other file. Because the speculative task is 
killed, so this file created by killed task is the subset of the completed file.

I found the file created by killed task could be committed. Once some exception 
was not caught, abort may be false.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:18 AM:
--

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. (One is the normal task attempt, the other is the 
speculative task attempt.) So queries return duplicated rows.

One file is a subset of the other: because the speculative task is killed, the 
file created by the killed task is a subset of the completed file.

I found that the file created by a killed task can be committed. Once some 
exception is not caught, abort may be false.


was (Author: zhengchenyu):
[~zabetak] When bug is reproduced, partition contains duplicate file:  02_0 
and 02_1. The two file are created by two different task attempt which 
belong to same task. One is normal task attempt, the other is speculative task 
attempt. So we will query duplicated line.

One file is the subset of the other file. Because the speculative task is 
killed, so this file created by killed task is the subset of the completed file.

I found the file created by killed task could be committed. Once some exception 
was not caught, abort may be false.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:18 AM:
--

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. One is the normal task attempt, the other is the 
speculative task attempt. So queries return duplicated rows.

One file is a subset of the other: because the speculative task is killed, the 
file created by the killed task is a subset of the completed file.

I found that the file created by a killed task can be committed. Once some 
exception is not caught, abort may be false.


was (Author: zhengchenyu):
[~zabetak] When bug is reproduced, partition contains duplicate file:  02_0 
and 02_1. The two file are created by two different task attempt which 
belong to same task. One is normal task attempt, the other is speculative task 
attempt. So we will query duplicated line.

One file is the subset of the other file. Because the speculative task is 
killed, so this file created by killed task is the subset of the completed file.

I found the file created by killed task could be committed. Once some exception 
was not caught, ** abort may be false.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:17 AM:
--

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. One is the normal task attempt, the other is the 
speculative task attempt. So queries return duplicated rows.

One file is a subset of the other: because the speculative task is killed, the 
file created by the killed task is a subset of the completed file.

I found that the file created by a killed task can be committed. Once some 
exception is not caught, abort may be false.


was (Author: zhengchenyu):
[~zabetak] When bug is reproduced, partition contains duplicate file:  02_0 
and 02_1. The two file are created by two different task attempt which 
belong to same task. One is normal task attempt, the other is speculative task 
attempt. So we will query duplicated line.

One file is the subset of the other file. Because the speculative task is 
killed, so this file created by killed task is the subset of the completed file.

I found the file created by killed task could be committed.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:15 AM:
--

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. One is the normal task attempt, the other is the 
speculative task attempt. So queries return duplicated rows.

One file is a subset of the other: because the speculative task is killed, the 
file created by the killed task is a subset of the completed file.

I found that the file created by a killed task can be committed.


was (Author: zhengchenyu):
[~zabetak] When bug is reproduced, partition contains duplicate file:  02_0 
and 02_1. The two file are created by two different task attempt which 
belong to same task. One is normal task attempt, the other is speculative task 
attempt. So we will query duplicated line.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Commented] (HIVE-25561) Killed task should not commit file.

2021-09-29 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421994#comment-17421994
 ] 

zhengchenyu commented on HIVE-25561:


[~zabetak] When the bug is reproduced, the partition contains duplicate files: 
02_0 and 02_1. The two files are created by two different task attempts 
belonging to the same task. One is the normal task attempt, the other is the 
speculative task attempt. So queries return duplicated rows.

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Assigned] (HIVE-25561) Killed task should not commit file.

2021-09-26 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HIVE-25561:
--

Assignee: zhengchenyu

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> With the tez engine in our cluster, I found duplicated rows, especially when 
> tez speculation is enabled. In the partition dir, both 02_0 and 02_1 
> exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs around 
> interruption, but some exceptions are still not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client is closed. An exception is then raised, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of this inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This ordering is inconsistent.)





[jira] [Commented] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-19 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383089#comment-17383089
 ] 

zhengchenyu commented on HIVE-25335:


[~zabetak] Okay, I will submit a PR.

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HIVE-25335.001.patch
>
>
> I found a slow application in our cluster: the bytes processed by one 
> reducer were huge, but there were only two reducers.
> When I debugged, I found the reason: in this SQL, one big table (about 30G) 
> has few rows (about 3.5M), while a small table (about 100M) has more rows 
> (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the 
> reducer count, but in fact 30G must be processed.
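The arithmetic behind this can be sketched as follows. This is illustrative code, not JoinStatsRule; the 64MB bytes-per-reducer figure is an assumed setting (in Hive this is governed by hive.exec.reducers.bytes.per.reducer), chosen so that a 100M input yields exactly the two reducers the report describes.

```java
// Illustrative sketch of reducer-count estimation: sizing from the small
// join side (what the reporter observed) vs from the big side that actually
// has to be processed. Numbers are assumptions for illustration.
public class ReducerEstimateDemo {
    // ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers] --
    // the usual shape of reducer-count estimation.
    static int reducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        long n = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.min(Math.max(n, 1L), maxReducers);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long perReducer = 64 * mb;   // assumed bytes-per-reducer setting

        // Estimating from the small table's 100M (the observed behavior):
        System.out.println(reducers(100 * mb, perReducer, 1009));        // 2

        // Estimating from the 30G that must actually be processed:
        System.out.println(reducers(30L * 1024 * mb, perReducer, 1009)); // 480
    }
}
```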





[jira] [Updated] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-15 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HIVE-25335:
---
Attachment: HIVE-25335.001.patch

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HIVE-25335.001.patch
>
>
> I found a slow application in our cluster: the bytes processed by one 
> reducer were huge, but there were only two reducers.
> When I debugged, I found the reason: in this SQL, one big table (about 30G) 
> has few rows (about 3.5M), while a small table (about 100M) has more rows 
> (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the 
> reducer count, but in fact 30G must be processed.





[jira] [Assigned] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-15 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HIVE-25335:
--

Assignee: zhengchenyu

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> I found a slow application in our cluster: the bytes processed by one 
> reducer were huge, but there were only two reducers.
> When I debugged, I found the reason: in this SQL, one big table (about 30G) 
> has few rows (about 3.5M), while a small table (about 100M) has more rows 
> (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the 
> reducer count, but in fact 30G must be processed.





[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176752#comment-17176752
 ] 

zhengchenyu commented on HIVE-22126:


[~euigeun_chung] I found another problem. In the deriveRowType function, the 
change in this patch results in a dead loop and eventually an OOM: the 
variable 'name' is never changed, so the loop never exits.
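The loop pattern described (the condition tests a variable that the body never updates) can be reconstructed hypothetically; this is not the actual deriveRowType code, only a sketch of the bug shape and its fix.

```java
import java.util.HashSet;
import java.util.Set;

public class UniquifyDemo {
    // Correct version: the loop tests 'candidate', which the body updates.
    static String uniquify(String name, Set<String> taken) {
        String candidate = name;
        int i = 0;
        while (taken.contains(candidate)) {
            candidate = name + "_" + (++i);
        }
        return candidate;
    }

    // The buggy shape described in the comment would instead be:
    //   while (taken.contains(name)) { candidate = name + "_" + (++i); }
    // 'name' never changes, so if it is taken the loop never exits, and a
    // caller accumulating candidates eventually runs out of memory.

    public static void main(String[] args) {
        Set<String> taken = new HashSet<>();
        taken.add("col");
        taken.add("col_1");
        System.out.println(uniquify("col", taken));   // col_2
    }
}
```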

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of Hive that have hive-exec.jar on their 
> classpath, since they are pinned to the same guava version as Hive. 
> We should shade the guava classes so that other components which depend on 
> hive-exec can independently use a different guava version as needed.





[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176751#comment-17176751
 ] 

zhengchenyu commented on HIVE-22126:


[~kgyrtkirk] Yeah, I decompiled the jar and found the duplicated calcite 
classes; I have solved this problem.

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of Hive that have hive-exec.jar on their 
> classpath, since they are pinned to the same guava version as Hive. 
> We should shade the guava classes so that other components which depend on 
> hive-exec can independently use a different guava version as needed.





[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-11 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175634#comment-17175634
 ] 

zhengchenyu commented on HIVE-22126:


[~euigeun_chung] I solved this problem by excluding the calcite package from 
bin.tar.gz. Sorry for the long gap; my version (3.2.1) is much older than 
master.






[jira] [Issue Comment Deleted] (HIVE-22126) hive-exec packaging should shade guava

2020-08-11 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HIVE-22126:
---
Comment: was deleted

(was: [~euigeun_chung] I think it's not a version problem. If we want to shade 
guava, we need to shade it against every other component (for example: hadoop, 
spark), not just make it work with one specific version.)






[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-11 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175462#comment-17175462
 ] 

zhengchenyu commented on HIVE-22126:


[~euigeun_chung] I think it's not a version problem. If we want to shade guava, 
we need to shade it against every other component (for example: hadoop, spark), 
not just make it work with one specific version.






[jira] [Comment Edited] (HIVE-22126) hive-exec packaging should shade guava

2020-08-11 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175452#comment-17175452
 ] 

zhengchenyu edited comment on HIVE-22126 at 8/11/20, 11:02 AM:
---

When I ran the program, I hit an AbstractMethodError. I think 
HiveAggregateFactoryImpl's createAggregate no longer implements 
AggregateFactory: the com.google.common.collect.ImmutableList in its signature 
was relocated to org.apache.hive.com.google.common.collect.ImmutableList, so 
the JVM throws AbstractMethodError.

Note: to shade the whole guava jar, I removed the guava lib from the lib dir 
and used maven-shade-plugin in the main pom.xml to shade all guava classes.
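The binary-incompatibility mechanism described above can be sketched in plain Java: the caller (calcite's RelBuilder) looks up createAggregate by its original parameter type, while the shaded class only declares the relocated one. In this illustrative sketch, ArrayList and LinkedList stand in for the original and relocated ImmutableList types; all class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedList;

public class RelocationMismatchDemo {

    // Stands in for HiveAggregateFactoryImpl after shading: its method
    // descriptor now mentions the *relocated* list type (LinkedList here).
    public static class ShadedFactory {
        public Object createAggregate(LinkedList<?> groupSets) {
            return "aggregate";
        }
    }

    public static void main(String[] args) {
        try {
            // The caller was compiled against the *original* type
            // (ArrayList here), so no method with that descriptor exists.
            ShadedFactory.class.getMethod("createAggregate", ArrayList.class);
            System.out.println("method found");
        } catch (NoSuchMethodException e) {
            // Same root cause as the AbstractMethodError in the stack below:
            // the descriptor no longer matches what the caller expects.
            System.out.println("no matching descriptor: " + e.getMessage());
        }
    }
}
```

Reflection reports NoSuchMethodException here, while the JVM's invokeinterface path reports the analogous mismatch as AbstractMethodError; either way, the method the caller was compiled against simply does not exist in the shaded class.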

The error stack is below: 
{code:java}
2020-08-11T18:26:51,434 ERROR [0ae58217-4908-4697-a9e7-a57f279a22a0 main] 
parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.RuntimeException: java.lang.AbstractMethodError: 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode;
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1539)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1417)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
 ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) 
~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) 
~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) 
~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) 
~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
 ~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) 
~[hive-exec-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) 
~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) 
~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) 
~[hive-cli-3.1.2.jar:3.1.2]
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) 
~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) 
~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) 
~[hive-cli-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_241]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_241]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_241]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241]
at org.apache.hadoop.util.RunJar.run(RunJar.java:323) 
~[hadoop-common-3.2.1.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:236) 
~[hadoop-common-3.2.1.jar:?]
Caused by: java.lang.AbstractMethodError: 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode;
at org.apache.calcite.tools.RelBuilder.aggregate(RelBuilder.java:1267) 
~[calcite-core-1.16.0.jar:1.16.0]
at 
org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:886) 
~[calcite-core-1.16.0.jar:1.16.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_241]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_241]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_241]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241]
at 

[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-11 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175452#comment-17175452
 ] 

zhengchenyu commented on HIVE-22126:


When I ran the program, I hit an AbstractMethodError. I think 
HiveAggregateFactoryImpl's createAggregate no longer implements 
AggregateFactory: the com.google.common.collect.ImmutableList in its signature 
was relocated to org.apache.hive.com.google.common.collect.ImmutableList, so 
the JVM throws AbstractMethodError.

Note: to shade the whole guava jar, I removed the guava lib from the lib dir 
and used maven-shade-plugin in the main pom.xml to shade all guava classes.

The error stack is below: 

{code:java}
2020-08-11T18:26:51,434 ERROR [0ae58217-4908-4697-a9e7-a57f279a22a0 main] parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.RuntimeException: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode;
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1539) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1417) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.1.jar:?]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.1.jar:?]
Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveAggregateFactoryImpl.createAggregate(Lorg/apache/calcite/rel/RelNode;ZLorg/apache/calcite/util/ImmutableBitSet;Lcom/google/common/collect/ImmutableList;Ljava/util/List;)Lorg/apache/calcite/rel/RelNode;
    at org.apache.calcite.tools.RelBuilder.aggregate(RelBuilder.java:1267) ~[calcite-core-1.16.0.jar:1.16.0]
    at org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:886) ~[calcite-core-1.16.0.jar:1.16.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_241]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_241]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_241]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_241]
    at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:524) ~[calcite-core-1.16.0.jar:1.16.0]
    at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:273)

[jira] [Assigned] (HIVE-14425) java.io.IOException: Could not find status of job:job_*

2017-01-22 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HIVE-14425:
--

Assignee: zhengchenyu  (was: liuguanghua)

> java.io.IOException: Could not find status of job:job_*
> ---
>
> Key: HIVE-14425
> URL: https://issues.apache.org/jira/browse/HIVE-14425
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: hadoop2.7.2 + hive1.2.1
>Reporter: liuguanghua
>Assignee: zhengchenyu
>Priority: Minor
>
> java.io.IOException: Could not find status of job:job_1470047186803_13
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
> at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Ended Job = job_1470047186803_13 with exception 
> 'java.io.IOException(Could not find status of job:job_1470047186803_13)'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)