[jira] [Updated] (HIVE-29124) Avoid committing files when a task is aborted even though some source has completed

Chenyu Zheng (Jira) Wed, 06 Aug 2025 19:05:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-29124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chenyu Zheng updated HIVE-29124:
--------------------------------
    Description: 
I found that when a task is almost complete (more precisely, when the source 
has been fully processed but the task has not yet been closed), an exception 
thrown at that point may cause the file to be committed incorrectly.

Look at [the 
code](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349).
 There, abort may be reset from true to false, which is not reasonable. The 
correct logic is that once abort has been set to true anywhere, it should stay 
true, and the task should not commit.

When I tried to reproduce this bug, I found that only 
[dummyOp.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369)
 caused the problem. I initially thought that the problem would occur at 
[reducer.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356),
 but it did not, because the reduce op's abort flag had already been set to 
true during the abort. The dummyOps, however, were not properly aborted, and 
they should be. Therefore, the issue only occurs when dummyOps are used, such 
as in a mapjoin.
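
The intended behaviour can be illustrated with a minimal, self-contained 
sketch. It is not the Hive code and not the actual patch: `FakeOp`, 
`Processor`, and all of their members are hypothetical stand-ins invented for 
this example. It assumes only what the description states, namely that abort 
should be one-way (never reset to false) and that the same abort value should 
reach the dummy operators as well as the reducer when the task is closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class StickyAbortSketch {

    /** Hypothetical stand-in for an operator; not the real Hive Operator API. */
    static class FakeOp {
        final String name;
        boolean committed = false;

        FakeOp(String name) {
            this.name = name;
        }

        void close(boolean abort) {
            // Commit output only when the task was not aborted.
            committed = !abort;
        }
    }

    /** Hypothetical processor whose abort flag can only move from false to true. */
    static class Processor {
        private final AtomicBoolean aborted = new AtomicBoolean(false);
        final FakeOp reducer = new FakeOp("reducer");
        final List<FakeOp> dummyOps = new ArrayList<>();

        void abort() {
            // One-way switch: there is deliberately no method that resets it.
            aborted.set(true);
        }

        boolean isAborted() {
            return aborted.get();
        }

        void close() {
            // Read the flag once and pass the same value to every operator,
            // including the dummy operators.
            boolean abort = isAborted();
            reducer.close(abort);
            for (FakeOp op : dummyOps) {
                op.close(abort);
            }
        }
    }

    public static void main(String[] args) {
        Processor p = new Processor();
        p.dummyOps.add(new FakeOp("dummy-0"));

        p.abort(); // an exception arrives after the source has been processed
        p.close();

        System.out.println("reducer committed: " + p.reducer.committed);
        System.out.println("dummy-0 committed: " + p.dummyOps.get(0).committed);
    }
}
```

In this sketch neither operator commits once `abort()` has been called, 
regardless of how much of the source was processed beforehand.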

  was:
I found that when the task is almost completed (more precisely, when the source 
has been processed), but not closed, if an exception is thrown at this time, it 
may cause the file to be committed incorrectly.

 

Look at the below code. abort may be set from true to false. It is not 
reasonable. The correct logic is that as long as abort is set to true at one 
place, abort should always be true, then do not commit.
[hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349]

Line 349 in 
[9b07c5c|https://github.com/apache/hive/commit/9b07c5c7136863ae1eb469e7a3c11357299d2ea1]
|setAborted(false); // Preserving the old logic. Hmm...|

 

When I tried to reproduce this bug, I found that only 
[dummyOp.close(abort)|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369]
 caused the problem. I initially thought that the problem would occur at 
[reducer.close(abort)|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356]
 because the reduce op's abort was set to true in the abort. However, dummyOps 
was not properly aborted. Here, dummyOps should also be aborted. Therefore, the 
issue only occurs when dummyOps is used, such as in mapjoin.


> Avoid committing files when a task is aborted even though some source has 
> completed
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-29124
>                 URL: https://issues.apache.org/jira/browse/HIVE-29124
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chenyu Zheng
>            Assignee: Chenyu Zheng
>            Priority: Major
>
> I found that when a task is almost complete (more precisely, when the 
> source has been fully processed but the task has not yet been closed), an 
> exception thrown at that point may cause the file to be committed incorrectly.
> Look at [the 
> code](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349).
>  There, abort may be reset from true to false, which is not reasonable. The 
> correct logic is that once abort has been set to true anywhere, it should 
> stay true, and the task should not commit.
> When I tried to reproduce this bug, I found that only 
> [dummyOp.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369)
>  caused the problem. I initially thought that the problem would occur at 
> [reducer.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356),
>  but it did not, because the reduce op's abort flag had already been set to 
> true during the abort. The dummyOps, however, were not properly aborted, and 
> they should be. Therefore, the issue only occurs when dummyOps are used, such 
> as in a mapjoin.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
