Mohamed Ali created HIVE-29348:
----------------------------------
Summary: MoveTask fails during ACID insert with dynamic partition
when partition value is NULL
Key: HIVE-29348
URL: https://issues.apache.org/jira/browse/HIVE-29348
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 3.1.3
Reporter: Mohamed Ali
*Description:*
We encountered a failure while running an {{INSERT INTO … PARTITION}} query in
Hive (running on Tez).
The query completes most stages successfully, but fails near the end during a
{{MoveTask}} with the following error:
{{FAILED: Execution Error, return code 40000 from
org.apache.hadoop.hive.ql.exec.MoveTask.
java.io.FileNotFoundException:
File hdfs://<cluster>/warehouse/.../<table>/_tmp.delta_0064171_0064171_0001 does
not exist.
(state=08S01, code=40000)}}
Despite the failure, Hive prints:
{{INFO: OK}}
which makes it unclear whether the query succeeded or failed.
The final result is that *no data is written to the target table.*
*Query:*
FROM (
  SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
  FROM source_table
  WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
) base
INSERT INTO stats_table PARTITION (year='YYYY', month='MM', stream='STREAM')
  SELECT job_exec_time, observation_date, COUNT(*)
  GROUP BY observation_date
INSERT INTO target_table PARTITION (observation_date)
  SELECT col1, col2, col3, observation_date
  WHERE some_condition;
As soon as the second INSERT executes, Hive produces a MoveTask failure.
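Since the summary points at a NULL partition value, one way to check what the second insert would actually write is to list the dynamic partition values it produces. The query below is only a diagnostic sketch: it reuses the subquery and the placeholder {{some_condition}} from the statement above and performs no insert. It relies on the assumption that a NULL or empty dynamic partition value would be routed to the default partition ({{__HIVE_DEFAULT_PARTITION__}} unless {{hive.exec.default.partition.name}} has been changed).

```sql
-- Diagnostic sketch only: lists the partition values the second INSERT would produce.
-- source_table, end_time_str and some_condition are the placeholders from the report.
SELECT observation_date,
       COUNT(*) AS row_count
FROM (
  SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
  FROM source_table
  WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
) base
WHERE some_condition
GROUP BY observation_date
ORDER BY observation_date;
```

A row with a NULL {{observation_date}} would support the summary. Note that the {{LENGTH(...) = 8}} filter would normally be expected to exclude NULL values, so an empty result (no rows for the second insert at all) would also be worth recording here.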
*Observed Behavior:*
- Earlier stages (DEPENDENCY_COLLECTION, MOVE, etc.) succeed.
- Hive loads the first target table successfully.
- The second insert's {{MoveTask}} attempts to read from a temporary delta directory (example: {{_tmp.delta_0064171_0064171_0001}}).
- That temporary directory does not exist.
- {{MoveTask}} throws {{FileNotFoundException}}.
- Hive prints {{INFO: OK}}, which is misleading.
- No rows are written to the final table.
*Expected Behavior:*
- Hive should either create the required temporary directories before {{MoveTask}} runs, or fail earlier with a clear explanation.
- Logs should not print {{INFO: OK}} if the query fails.
*Request:*
We request investigation of:
- Why the temporary delta folder {{/_tmp.delta_*}} is missing during {{MoveTask}}
- Why Hive reports {{INFO: OK}} although the statement fails
- Whether this is a bug in {{MoveTask}} handling of partitioned inserts under Tez
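To help narrow down the last point, the second insert could be run on its own, outside the multi-insert statement. This is an isolation test under stated assumptions, not a confirmed workaround: it assumes the same placeholder tables and {{some_condition}} as the original query, and that dynamic partitioning needs to be enabled in non-strict mode for the session.

```sql
-- Isolation sketch: run only the second INSERT as a standalone statement.
-- If this succeeds, the failure is more likely tied to the multi-insert form;
-- if it fails the same way, the multi-insert form is probably not the trigger.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO target_table PARTITION (observation_date)
SELECT col1, col2, col3, observation_date
FROM (
  SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
  FROM source_table
  WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
) base
WHERE some_condition;
```

Whichever way it turns out, the outcome would be useful to attach to this issue together with the corresponding {{MoveTask}} log lines.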