zhengchenyu opened a new pull request, #6011:
URL: https://github.com/apache/hive/pull/6011

   ### What changes were proposed in this pull request?
   
   Two changes:
   
   * Do not allow `abort` to be reset from true to false.
   * Abort dummyOps as well.
   
   ### Why are the changes needed?
   
   I found that when a task is almost complete (more precisely, when the 
source has been fully processed but the task has not yet been closed), an 
exception thrown at that point may cause the file to be committed incorrectly.
   
   See the code linked below. `abort` can be reset from true to false, which 
is not reasonable. Once `abort` has been set to true anywhere, it should stay 
true and the commit should be skipped.
   
https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349
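   
   To illustrate the "once true, always true" rule described above, here is a 
minimal sketch. The class `StickyAbortSketch` and its methods are hypothetical 
names for illustration only; this is not the code in `ReduceRecordProcessor`.

```java
public class StickyAbortSketch {
    // Hypothetical stand-in for the processor's abort flag.
    private boolean abort = false;

    // Sticky update: OR-ing keeps an earlier `true` instead of overwriting it.
    public void setAbort(boolean requested) {
        abort = abort || requested;
    }

    public boolean isAbort() {
        return abort;
    }

    // Commit is allowed only if abort was never requested.
    public boolean shouldCommit() {
        return !abort;
    }

    public static void main(String[] args) {
        StickyAbortSketch p = new StickyAbortSketch();
        p.setAbort(true);   // an exception path requests abort
        p.setAbort(false);  // a later code path tries to reset it
        // prints "abort=true, commit=false"
        System.out.println("abort=" + p.isAbort() + ", commit=" + p.shouldCommit());
    }
}
```

   With a plain assignment (`abort = requested`), the second call would reset 
the flag and allow the commit.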
   
   When I tried to reproduce this bug, I found that only 
[dummyOp.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369)
 caused the problem. I initially expected the problem to occur at 
[reducer.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356),
 but it does not, because the reduce op's own abort flag has already been set 
to true in the abort. dummyOps, however, were not properly aborted; they 
should be aborted as well. The issue therefore only occurs when dummyOps are 
used, such as in mapjoin.
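   
   The difference between the reducer and dummyOps can be sketched with a 
simplified model. `Op`, `abortReducerOnly`, and `abortAll` are hypothetical 
names; the sketch assumes an operator skips its commit when either its own 
abort flag is set or `close` is called with `abort == true`, which is a 
simplification of the real operator behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class DummyOpAbortSketch {
    // Hypothetical minimal operator that records whether it committed on close.
    public static class Op {
        public boolean aborted = false;
        public boolean committed = false;

        public void abort() {
            aborted = true;
        }

        public void close(boolean abort) {
            // Commit only if neither the caller nor the operator flagged an abort.
            committed = !(abort || aborted);
        }
    }

    // Models the behavior described as problematic: dummyOps are left untouched.
    public static void abortReducerOnly(Op reducer, List<Op> dummyOps) {
        reducer.abort();
    }

    // Models the proposed behavior: dummyOps are aborted together with the reducer.
    public static void abortAll(Op reducer, List<Op> dummyOps) {
        reducer.abort();
        for (Op dummyOp : dummyOps) {
            dummyOp.abort();
        }
    }

    public static void main(String[] args) {
        Op reducer = new Op();
        List<Op> dummyOps = new ArrayList<>();
        dummyOps.add(new Op());

        abortAll(reducer, dummyOps);
        // Even if close is (wrongly) called with abort == false, nothing commits.
        reducer.close(false);
        for (Op dummyOp : dummyOps) {
            dummyOp.close(false);
        }
        System.out.println("reducer committed: " + reducer.committed);
        System.out.println("dummyOp committed: " + dummyOps.get(0).committed);
    }
}
```

   In this model, calling `abortReducerOnly` instead leaves the dummy 
operator free to commit when `close(false)` is reached, which matches the 
symptom described above.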
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Tested with real tasks.
   
   > Note: This is a low-probability issue, so I added sleep code at a 
specific location to make the bug more likely to occur.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

