yuquan wang created HIVE-25295:
----------------------------------

             Summary: "File already exist exception" during mapper/reducer 
retry with old hive(0.13)
                 Key: HIVE-25295
                 URL: https://issues.apache.org/jira/browse/HIVE-25295
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 0.13.0
            Reporter: yuquan wang


We are still using a very old Hive version (0.13) for historical reasons, and 
we often run into the following issue:
{code:java}
Caused by: java.io.IOException: File already 
exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXXXXXX
{code}
We have investigated this issue for quite a long time but have not found a good 
fix, so we would like to ask the Hive community whether there are any known 
solutions.
 
The error occurs during the map/reduce stage: once a task instance fails for 
some unexpected reason (for example, an unstable spot instance gets killed), 
the subsequent retry attempt throws the above exception instead of overwriting 
the existing output file.
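To illustrate why a retry cannot simply replace the leftover file, the sketch below shows the Hadoop FileSystem call pattern we suspect is involved: a writer that opens the target path with overwrite=false fails as soon as a previous failed attempt has already created it. The bucket, path, and class name here are made up for illustration; this is not the actual Hive write path.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryCollisionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative partition file name; the real path comes from the table layout.
        Path out = new Path("s3://example-bucket/warehouse/ad_dmp_pixel/dt=2021-06-21/000000_0");
        FileSystem fs = out.getFileSystem(conf);

        // If a failed first attempt already wrote this file, opening it with
        // overwrite=false throws "File already exists" instead of replacing it.
        FSDataOutputStream stream = fs.create(out, /* overwrite = */ false);
        stream.close();
    }
}
{code}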
 
We have several guesses:
1. Is it caused by the ORC file type? We found a similar issue, 
https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments, and 
our table is stored as ORC.
2. Is the problem fixed in a newer Hive version? We also run Hive 2.3.6 and 
have not hit this issue there, so we would like to know whether a version 
upgrade solves it.
3. Is there a configuration option that always cleans up existing output during 
a mapper/reducer retry? We have searched the MapReduce configuration but could 
not find one (see the sketch after this list for the kind of cleanup we mean).
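Regarding point 3, below is a minimal sketch of the kind of defensive cleanup we are asking about: deleting any leftover output of a failed attempt before the retry re-creates the file, using the plain Hadoop FileSystem API. This is our own illustration, not an existing Hive config or committer; the helper name and path are hypothetical.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanupBeforeRetrySketch {
    // Hypothetical helper: remove stale output from a previous failed attempt,
    // then open the file fresh so the retry does not collide with it.
    static FSDataOutputStream openFresh(Configuration conf, Path out) throws Exception {
        FileSystem fs = out.getFileSystem(conf);
        if (fs.exists(out)) {
            fs.delete(out, /* recursive = */ false);
        }
        return fs.create(out, /* overwrite = */ false);
    }
}
{code}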



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
