zhengchenyu created HIVE-25561:
----------------------------------

             Summary: Killed task should not commit file.
                 Key: HIVE-25561
                 URL: https://issues.apache.org/jira/browse/HIVE-25561
             Project: Hive
          Issue Type: Bug
          Components: Tez
    Affects Versions: 2.4.0, 2.3.8, 1.2.1
            Reporter: zhengchenyu


On the Tez engine in our cluster, I found some duplicate lines, especially 
when Tez speculation is enabled. In the partition directory, I found that 
both 000002_0 and 000002_1 exist.
This is a very low-probability event. HIVE-10429 fixed some bugs related to 
interrupts, but some exceptions are still not caught.

In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
class) is called and the HDFS client is closed. An exception is then raised, 
but abort may not be set to true.
removeTempOrDuplicateFiles may then fail because of an inconsistency, and the 
duplicate file is retained. 
(Note: the Driver first lists the directory, then the Task commits its file, 
then the Driver removes duplicate files. This ordering is the inconsistent 
case.)
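
The ordering in the note can be illustrated with a small in-memory simulation. 
This is only a sketch, not Hive code: the class DuplicateFileRace, the methods 
taskId, removeDuplicates and simulate, and the use of a Set as a stand-in for 
the partition directory are all hypothetical, and the "keep the first file 
seen per task" rule is a simplifying assumption rather than the actual 
behaviour of removeTempOrDuplicateFiles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateFileRace {

    // Task ID is the file name without the attempt suffix,
    // e.g. "000002" for "000002_1".
    static String taskId(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // Hypothetical stand-in for duplicate removal: it only sees the files
    // in `listing` and deletes every file after the first one per task ID.
    static void removeDuplicates(List<String> listing, Set<String> dir) {
        Map<String, String> kept = new HashMap<>();
        for (String file : listing) {
            String previous = kept.putIfAbsent(taskId(file), file);
            if (previous != null) {
                dir.remove(file);
            }
        }
    }

    // Returns the directory contents after the driver has removed duplicates.
    // commitBeforeListing == false models the inconsistent ordering:
    // list dir -> task commits file -> remove duplicates.
    static Set<String> simulate(boolean commitBeforeListing) {
        Set<String> dir = new TreeSet<>();
        dir.add("000002_1");                  // one attempt already committed
        if (commitBeforeListing) {
            dir.add("000002_0");              // other attempt commits early
        }
        List<String> listing = new ArrayList<>(dir);  // driver lists the dir
        if (!commitBeforeListing) {
            dir.add("000002_0");              // other attempt commits late
        }
        removeDuplicates(listing, dir);       // driver works from its listing
        return dir;
    }

    public static void main(String[] args) {
        System.out.println("commit before listing: " + simulate(true));
        System.out.println("commit after listing:  " + simulate(false));
    }
}
```

In the second case the late commit never appears in the driver's listing, so 
both attempt files for task 000002 remain in the directory. This is why a 
killed task should not reach the commit step at all.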



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
