[ https://issues.apache.org/jira/browse/MAPREDUCE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670574#comment-13670574 ]

Jason Lowe commented on MAPREDUCE-5211:
---------------------------------------

Is this actually a problem in 2.0.4?  I believe that after MAPREDUCE-2264, which
is in 2.0.3, MapOutput objects are no longer used to generate the pathnames, so
the problem does not occur.  The pathnames can get very long, since two absolute
paths are concatenated to form the new path, but I didn't think it was possible
for two paths to collide within the same reducer.  Also, the reduce attempt ID
should appear at least once in the pathname, so it should be impossible for
paths to collide between reducers as well.
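
To make the two collision cases concrete, here is a minimal Java sketch.  It is
not the Hadoop code: the class, the method names, the directory, and the attempt
IDs are all hypothetical and only illustrate the naming problem.  It assumes a
basename taken from a toString-style label (as in the issue description) versus
a path that embeds the reduce attempt ID plus a per-attempt merge counter.

```java
import java.util.HashSet;
import java.util.List;

public class MergePathSketch {
    // Hypothetical naming scheme: the basename comes only from a toString-style
    // label, so nothing in the path identifies the reduce attempt or the pass.
    static String labelOnlyPath(String dir, String label) {
        return dir + "/" + label + ".merged";
    }

    // Hypothetical alternative: embed the reduce attempt ID and a per-attempt
    // merge sequence number, so each attempt and each pass gets its own path.
    static String attemptScopedPath(String dir, String attemptId, int mergeSeq) {
        return dir + "/" + attemptId + "/merge_" + mergeSeq + ".merged";
    }

    // Number of distinct paths in the list; fewer than list.size() means a collision.
    static int distinctCount(List<String> paths) {
        return new HashSet<>(paths).size();
    }

    public static void main(String[] args) {
        String dir = "/tmp/local/output";        // illustrative directory
        String label = "MapOutput(null, DISK)";  // basename quoted in the issue
        String r0 = "attempt_r_000000_0";        // illustrative attempt IDs
        String r1 = "attempt_r_000001_0";

        // Two reducers on the same node, each doing a multi-pass merge.
        List<String> labelOnly = List.of(
            labelOnlyPath(dir, label),
            labelOnlyPath(dir, label));
        List<String> scoped = List.of(
            attemptScopedPath(dir, r0, 0),
            attemptScopedPath(dir, r0, 1),
            attemptScopedPath(dir, r1, 0));

        System.out.println("label-only distinct paths: " + distinctCount(labelOnly));
        System.out.println("attempt-scoped distinct paths: " + distinctCount(scoped));
    }
}
```

With the label-only scheme both reducers produce the same path, while the
attempt-scoped scheme keeps every path distinct, which is the property the
comment above expects the attempt ID in the pathname to provide.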
                
> Reducer intermediate files can collide during merge
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-5211
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5211
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>              Labels: 2.0.4.2
>             Fix For: 0.23.8
>
>         Attachments: MAPREDUCE-5211.branch-0.23.patch
>
>
> The OnDiskMerger.merge method constructs an output path that is not unique to 
> a reduce attempt, so the output file can collide with files from other 
> reducers of the same app that are running on the same node.  In addition, 
> the name of the output file is based on MapOutput.toString, which may not be 
> unique during multi-pass merges on disk, since the mapId will be null and 
> the basename ends up as "MapOutput(null, DISK)".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
