[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425939#comment-17425939
 ] 

zhengchenyu commented on HIVE-25561:
------------------------------------

If Tez speculation is enabled, the probability of hitting this problem 
increases, although it remains a low-probability event.


If we do not fix this bug, a partition may end up with two files that are 
logical duplicates (not necessarily identical files). Reading the table without 
deduplication may then return duplicated rows.

removeTempOrDuplicateFiles is in fact meant to solve this problem, but under 
some conditions it fails to do so.
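
The failure mode can be illustrated with a minimal, self-contained sketch. 
This is not Hive's code: the class StaleListingSketch, the helpers taskId and 
duplicatesIn, and the "keep the first name in sorted order" rule are all 
hypothetical and chosen only for illustration. The sketch assumes file names of 
the form <taskId>_<attempt>, as in 000002_0 and 000002_1, and shows that 
deduplication over a listing taken before a late commit cannot see the second 
file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class StaleListingSketch {
    // Hypothetical helper: "000002_1" -> "000002" (task id without the attempt suffix).
    static String taskId(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // Keeps one file per task id from the given listing and returns the rest
    // as candidates for deletion. The keep rule (first name in sorted order)
    // is an assumption for this sketch, not Hive's actual rule.
    static List<String> duplicatesIn(List<String> listing) {
        Map<String, String> keptByTask = new HashMap<>();
        List<String> toDelete = new ArrayList<>();
        for (String name : new TreeSet<>(listing)) {
            if (keptByTask.putIfAbsent(taskId(name), name) != null) {
                toDelete.add(name);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the partition directory.
        TreeSet<String> dir = new TreeSet<>();
        dir.add("000002_1");

        // 1. The driver lists the directory.
        List<String> snapshot = new ArrayList<>(dir);

        // 2. The killed attempt commits its file after the listing was taken.
        dir.add("000002_0");

        // 3. The driver removes duplicates, but only those visible in the stale snapshot.
        dir.removeAll(duplicatesIn(snapshot));

        System.out.println(dir); // prints [000002_0, 000002_1]
    }
}
```

Running the same deduplication on a listing taken after step 2 would flag one 
of the two files, which is why the ordering of the list and the commit matters.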

For this reason, I think a file created by a killed task should not be 
committed at all.
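
A rough sketch of that idea follows, under stated assumptions. It is not the 
actual patch: the class KilledTaskCommitSketch and its methods markKilled and 
commit are hypothetical, and the local filesystem (java.nio.file) stands in 
for HDFS. The only point it makes is that the commit step checks a kill/abort 
flag first and discards the temporary file instead of renaming it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

public class KilledTaskCommitSketch {
    // Hypothetical flag; set as soon as the task learns it was killed.
    private final AtomicBoolean abort = new AtomicBoolean(false);

    void markKilled() {
        abort.set(true);
    }

    // Renames tmp to fin only if the task was not killed.
    // Returns true if the file was committed, false if it was discarded.
    boolean commit(Path tmp, Path fin) throws IOException {
        if (abort.get()) {
            Files.deleteIfExists(tmp); // a killed task leaves nothing behind
            return false;
        }
        Files.move(tmp, fin);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("partition");
        Path tmp = dir.resolve("_tmp.000002_0");
        Path fin = dir.resolve("000002_0");
        Files.write(tmp, new byte[] {1, 2, 3});

        KilledTaskCommitSketch task = new KilledTaskCommitSketch();
        task.markKilled(); // e.g. triggered by SIGTERM handling

        boolean committed = task.commit(tmp, fin);
        // prints committed=false finalExists=false
        System.out.println("committed=" + committed + " finalExists=" + Files.exists(fin));
    }
}
```

With this kind of guard, the late commit in the scenario described in the 
issue would not happen, so the driver's stale listing would no longer matter 
for that file.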

> Killed task should not commit file.
> -----------------------------------
>
>                 Key: HIVE-25561
>                 URL: https://issues.apache.org/jira/browse/HIVE-25561
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 1.2.1, 2.3.8, 2.4.0
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> With the Tez engine in our cluster, I found some duplicate rows, especially 
> when Tez speculation is enabled. In the partition directory, both 000002_0 
> and 000002_1 exist.
> This is a very low-probability event. HIVE-10429 fixed some bugs related to 
> interrupts, but some exceptions are still not caught.
> In our cluster, the task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the HDFS client is closed. This raises an exception, 
> but abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of an inconsistency, and the 
> duplicate file is retained. 
> (Note: the driver first lists the directory, then the task commits its file, 
> then the driver removes duplicate files. This ordering is the inconsistent case.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
