Mohamed Ali created HIVE-29348:
----------------------------------

             Summary: MoveTask fails during ACID insert with dynamic partition 
when partition value is NULL
                 Key: HIVE-29348
                 URL: https://issues.apache.org/jira/browse/HIVE-29348
             Project: Hive
          Issue Type: Bug
          Components: Hive, Tez
    Affects Versions: 3.1.3
            Reporter: Mohamed Ali


*Description:*
We encountered a failure while running an {{INSERT INTO … PARTITION}} query in 
Hive (running on Tez).
The query completes most stages successfully, but fails near the end during a 
{{MoveTask}} with the following error:
 
 
{{FAILED: Execution Error, return code 40000 from 
org.apache.hadoop.hive.ql.exec.MoveTask.
java.io.FileNotFoundException: 
File hdfs://<cluster>/warehouse/.../<table>/_tmp.delta_0064171_0064171_0001 does 
not exist.
(state=08S01, code=40000)}}
Despite the failure, Hive prints:
 
 
{{INFO: OK}}
which makes it unclear whether the query succeeded or failed.
The final result is that *no data is written to the target table.*

Query

FROM (
  SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
  FROM source_table
  WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
) base

INSERT INTO stats_table PARTITION (year='YYYY', month='MM', stream='STREAM')
SELECT job_exec_time, observation_date, COUNT(*)
GROUP BY observation_date

INSERT INTO target_table PARTITION (observation_date)
SELECT col1, col2, col3, observation_date
WHERE some_condition;


As soon as the second INSERT executes, Hive produces a MoveTask failure.
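
The summary attributes the failure to a NULL dynamic partition value, so it may help to check whether any rows would actually produce one. The following is a diagnostic sketch only, not a fix. It reuses the placeholder names from the query above (source_table, end_time_str, some_condition) and deliberately drops the LENGTH(...) = 8 filter so that NULL and malformed values stay visible.

```sql
-- Diagnostic sketch (assumption: observation_date is derived exactly as in
-- the reported query). some_condition is the placeholder from the report and
-- must be replaced with the real predicate before running.
SELECT partition_value_state, COUNT(*) AS row_count
FROM (
  SELECT
    CASE
      WHEN SUBSTRING(end_time_str,1,8) IS NULL THEN 'NULL'
      WHEN LENGTH(SUBSTRING(end_time_str,1,8)) <> 8 THEN 'WRONG_LENGTH'
      ELSE 'OK'
    END AS partition_value_state
  FROM source_table
  WHERE some_condition
) checked
GROUP BY partition_value_state;
```

Rows reported as NULL or WRONG_LENGTH should already be excluded by the original filter, because a comparison against NULL does not evaluate to true in SQL. If the failure still reproduces when only OK rows are present, the NULL partition value is probably not coming from this expression.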

Observed Behavior
Earlier stages (DEPENDENCY_COLLECTION, MOVE, etc.) succeed

Hive loads the first target table successfully

The second insert’s MoveTask attempts to read from a temporary delta directory
(example: _tmp.delta_0064171_0064171_0001)

That temporary directory does not exist

MoveTask throws FileNotFoundException

Hive prints INFO: OK which is misleading

No rows are written to the final table
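
The last observation can be checked against the metastore and the table itself. A minimal check, assuming the table and partition column names from the reported query, run once before and once after the failing statement:

```sql
-- List the partitions Hive has registered for the target table.
SHOW PARTITIONS target_table;

-- Count rows per partition value; unchanged counts after the failing run
-- indicate that nothing was loaded.
SELECT observation_date, COUNT(*) AS row_count
FROM target_table
GROUP BY observation_date;
```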

Expected Behavior
Hive should either create the required temporary directories before MoveTask 
runs, or fail earlier with a clear explanation

Logs should not print INFO: OK if the query fails

Request
We request investigation of:

Why the temporary delta directory (_tmp.delta_*) is missing when MoveTask runs

Why Hive reports INFO: OK although the statement fails

Whether this is a bug in MoveTask's handling of partitioned inserts under Tez



