[
https://issues.apache.org/jira/browse/HIVE-29124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chenyu Zheng updated HIVE-29124:
--------------------------------
Description:
When a task is almost complete (more precisely, when the source has been fully
processed but the operators have not yet been closed) and an exception is
thrown, the output file may be committed incorrectly.
See [the
code](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349).
abort can be reset from true to false there, which is not reasonable. Once
abort has been set to true anywhere, it should stay true, and the task should
not commit.
When I tried to reproduce this bug, I found that only
[dummyOp.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369)
caused the problem. I initially expected the problem to occur at
[reducer.close(abort)](https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356),
but the reduce op's abort flag had already been set to true during the abort.
The dummyOps, however, were not properly aborted, and they should be.
Therefore, the issue only occurs when dummyOps are used, such as in a mapjoin.
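The intended behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: `StickyAbortSketch`, `Op`, `RecordingOp`, `markAborted`, and `closeAll` are hypothetical names that do not exist in Hive, and `Op` is a simplified stand-in for the real operator class. The sketch shows two points: the abort flag is a one-way latch that is never reset to false, and the dummy ops are closed with the same flag as the reducer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are part of Hive.
final class StickyAbortSketch {

    /** Simplified stand-in for an operator that can be closed with an abort flag. */
    interface Op {
        void close(boolean abort) throws Exception;
    }

    /** Test helper that records the flag it was closed with (null until closed). */
    static final class RecordingOp implements Op {
        Boolean closedWithAbort;

        @Override
        public void close(boolean abort) {
            closedWithAbort = abort;
        }
    }

    private boolean aborted;

    /** One-way latch: the flag can only go from false to true. */
    void markAborted() {
        aborted = true;
    }

    boolean isAborted() {
        return aborted;
    }

    /**
     * Closes the reducer and every dummy op with the current abort flag.
     * Any failure while closing latches the flag as well.
     *
     * @return true only if nothing was aborted, i.e. a commit would be allowed
     */
    boolean closeAll(Op reducer, List<? extends Op> dummyOps) {
        try {
            reducer.close(aborted);
        } catch (Exception e) {
            markAborted();
        }
        for (Op dummyOp : dummyOps) {
            try {
                // Dummy ops see the same flag as the reducer.
                dummyOp.close(aborted);
            } catch (Exception e) {
                markAborted();
            }
        }
        return !aborted;
    }

    public static void main(String[] args) {
        StickyAbortSketch processor = new StickyAbortSketch();
        processor.markAborted(); // an exception occurred after the source was processed
        RecordingOp dummyOp = new RecordingOp();
        boolean commit = processor.closeAll(new RecordingOp(), Arrays.asList(dummyOp));
        System.out.println("commit allowed: " + commit);
        System.out.println("dummy op closed with abort: " + dummyOp.closedWithAbort);
    }
}
```

Because there is no code path that sets the flag back to false, a late exception cannot be masked by later bookkeeping, and the caller can decide whether to commit from the return value alone.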
was:
When a task is almost complete (more precisely, when the source has been fully
processed but the operators have not yet been closed) and an exception is
thrown, the output file may be committed incorrectly.
See the code below. abort can be reset from true to false there, which is not
reasonable. Once abort has been set to true anywhere, it should stay true, and
the task should not commit.
[hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L349]
Line 349 in
[9b07c5c|https://github.com/apache/hive/commit/9b07c5c7136863ae1eb469e7a3c11357299d2ea1]
|setAborted(false); // Preserving the old logic. Hmm...|
When I tried to reproduce this bug, I found that only
[dummyOp.close(abort)|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L369]
caused the problem. I initially expected the problem to occur at
[reducer.close(abort)|https://github.com/apache/hive/blob/9b07c5c7136863ae1eb469e7a3c11357299d2ea1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java#L356],
but the reduce op's abort flag had already been set to true during the abort.
The dummyOps, however, were not properly aborted, and they should be.
Therefore, the issue only occurs when dummyOps are used, such as in a mapjoin.
> Avoid committing files when a task is aborted even though some source has
> completed
> -----------------------------------------------------------------------------------
>
> Key: HIVE-29124
> URL: https://issues.apache.org/jira/browse/HIVE-29124
> Project: Hive
> Issue Type: Bug
> Reporter: Chenyu Zheng
> Assignee: Chenyu Zheng
> Priority: Major