[ https://issues.apache.org/jira/browse/HADOOP-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515217 ]
Arun C Murthy commented on HADOOP-1612:
---------------------------------------
Christian, I've spent a fair amount of time trying to reproduce the _lost
files_ case without much headway. The issue with the _${taskid} subdirs
turning up, however, is fairly easy to reproduce; as we discussed, it is an
unfortunate side effect of speculative tasks being killed *after* job
completion. I'll keep trying to get a toehold on the lost-files case. Meanwhile:
a) Could you just ignore the _${taskid} subdirs while you are moving stuff
from your job output dir?
b) Try to incorporate HADOOP-1576, which, at the very least, helps with debugging.
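For illustration, suggestion (a) could look roughly like the sketch below. It
is written in Python over plain path strings rather than against libhdfs, and
it assumes that every leftover entry has a base name beginning with an
underscore, as the _${taskid} naming suggests. The function names and sample
paths are hypothetical.

```python
import posixpath
from typing import Iterable, List


def is_task_temp_entry(path: str) -> bool:
    """Return True if the base name of `path` starts with an underscore.

    Assumption: leftover _${taskid} subdirs are the only entries in the
    job output dir whose names begin with '_'.
    """
    base_name = posixpath.basename(path.rstrip("/"))
    return base_name.startswith("_")


def entries_to_move(listing: Iterable[str]) -> List[str]:
    """Filter a directory listing down to the entries that should be moved."""
    return [path for path in listing if not is_task_temp_entry(path)]


# Hypothetical listing of a job output dir; the task ID is made up.
sample_listing = [
    "/user/christian/out/part-00000",
    "/user/christian/out/part-00001",
    "/user/christian/out/_task_0001_r_000001_1/",
]
movable = entries_to_move(sample_listing)
```

The application would then rename only the paths in `movable` and leave the
underscore-prefixed entries alone.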
Thanks!
> listing of an output directory shortly after job completion fails
> -----------------------------------------------------------------
>
> Key: HADOOP-1612
> URL: https://issues.apache.org/jira/browse/HADOOP-1612
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.14.0
> Reporter: Christian Kunz
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.14.0
>
>
> Sometimes, after a job finishes and another application tries to rename the
> dfs files created by that job, listing the output directory that contains
> the newly created files fails. File creation and directory listing are done
> via libhdfs, but it is unlikely that this makes any difference, so I am
> filing this under the mapred component.
> It might be a race condition: does the job complete before the files in the
> output directory are promoted?