zhengchenyu opened a new pull request #2674:
URL: https://github.com/apache/hive/pull/2674


   ### What changes were proposed in this pull request?
   We should set abort to true whenever we catch any exception.
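
A minimal sketch of the proposed behavior, not the actual patch: any Throwable raised while processing rows sets the abort flag, and the sink only commits its output when it is closed without abort. The names `SketchFileSink` and `SketchRecordProcessor` are hypothetical stand-ins and do not correspond to Hive classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator that commits its output file on close,
// unless it is told to abort.
class SketchFileSink {
    final List<String> committed = new ArrayList<>();
    private final String attemptFile;

    SketchFileSink(String attemptFile) {
        this.attemptFile = attemptFile;
    }

    void process(String row) throws IOException {
        if (row == null) {
            // Simulates a write failing after the filesystem client was closed.
            throw new IOException("Filesystem closed");
        }
    }

    void close(boolean abort) {
        if (!abort) {
            committed.add(attemptFile); // commit happens only on a clean close
        }
    }
}

// Hypothetical processor: every Throwable marks the attempt as aborted.
class SketchRecordProcessor {
    private boolean abort = false;

    void run(SketchFileSink sink, List<String> rows) {
        try {
            for (String row : rows) {
                sink.process(row);
            }
        } catch (Throwable t) {
            // Catch everything, not only the expected exception types.
            // A real implementation would probably also rethrow or log.
            abort = true;
        } finally {
            sink.close(abort);
        }
    }

    boolean isAborted() {
        return abort;
    }
}
```

With this shape, a failed attempt never reaches the commit step, so it cannot leave a second file such as 000002_1 next to the output of the successful attempt.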
   
   ### Why are the changes needed?
   
   With the Tez engine in our cluster, I found some duplicated lines in the output, especially when Tez speculation is enabled. In the partition directory, both 000002_0 and 000002_1 exist.
   This is a very low-probability event. HIVE-10429 fixed some bugs related to interrupts, but some exceptions were still not caught.
   
   In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop class) is called and the HDFS client is closed. An exception is then raised, but abort may not be set to true.
   removeTempOrDuplicateFiles may then fail because of the inconsistency, and the duplicate file is retained.
   (Note: the Driver first lists the directory, then the Task commits its file, then the Driver removes duplicate files. This ordering is the inconsistent case.)
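
The ordering in the note above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `DuplicateRaceSketch` and its methods are hypothetical, the directory is modeled as an in-memory set, and the "keep the first attempt per task id" policy is a simplification that is not claimed to match what removeTempOrDuplicateFiles really does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DuplicateRaceSketch {
    // Task id is the part before the underscore: "000002_1" -> "000002".
    static String taskId(String fileName) {
        return fileName.substring(0, fileName.indexOf('_'));
    }

    // Simplified cleanup: keep the first file per task id found in the listing
    // and delete later ones from the directory. Files that are missing from the
    // listing are never examined.
    static void removeDuplicates(List<String> listing, Set<String> dir) {
        Set<String> seen = new HashSet<>();
        for (String file : listing) {
            if (!seen.add(taskId(file))) {
                dir.remove(file);
            }
        }
    }

    // Returns the directory contents after cleanup. The second attempt commits
    // either before or after the driver takes its listing.
    static Set<String> run(boolean commitAfterListing) {
        Set<String> dir = new TreeSet<>();
        dir.add("000002_0");
        if (!commitAfterListing) {
            dir.add("000002_1");
        }
        List<String> listing = new ArrayList<>(dir); // driver lists the directory
        if (commitAfterListing) {
            dir.add("000002_1"); // late commit by a task that should have aborted
        }
        removeDuplicates(listing, dir); // driver cleans up using its listing
        return dir;
    }
}
```

Calling `DuplicateRaceSketch.run(false)` leaves only 000002_0, while `run(true)` leaves both 000002_0 and 000002_1, because the late file never appears in the listing the cleanup works from.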
   
   
   ### How was this patch tested?
   
   Manual testing in our cluster.
   I also added some delay in our test code to increase the probability of hitting the problem.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


