dfs -getMerge does not do what it says it does
----------------------------------------------
Key: HADOOP-2120
URL: https://issues.apache.org/jira/browse/HADOOP-2120
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.14.3
Environment: All
Reporter: Milind Bhandarkar
Fix For: 0.16.0
dfs -getMerge, which calls FileUtil.copyMerge, has this javadoc:
{code}
Get all the files in the directories that match the source file pattern
* and merge and sort them to only one file on local fs
* srcf is kept.
{code}
However, it only concatenates the set of input files, rather than merging them
in sorted order.
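
To make the difference concrete, here is a minimal sketch (not the Hadoop code)
that contrasts plain concatenation with a sorted merge. It assumes that each
input is already sorted and that records are lines compared as strings. The
class and method names (MergeSketch, concatenate, sortedMerge) are hypothetical
and are not part of FileUtil.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {
    // Plain concatenation: append each input after the previous one.
    // This is the behavior the report describes for copyMerge.
    static List<String> concatenate(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> input : inputs) {
            out.addAll(input);
        }
        return out;
    }

    // K-way merge: assumes every input list is already sorted, and
    // repeatedly emits the smallest remaining head record.
    static List<String> sortedMerge(List<List<String>> inputs) {
        // Each heap entry is {input index, position within that input},
        // ordered by the record it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> inputs.get(a[0]).get(a[1])
                            .compareTo(inputs.get(b[0]).get(b[1])));
        for (int i = 0; i < inputs.size(); i++) {
            if (!inputs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> src = inputs.get(head[0]);
            out.add(src.get(head[1]));
            // Advance within the same input if it has more records.
            if (head[1] + 1 < src.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("a", "d"),
            Arrays.asList("b", "c"));
        System.out.println(concatenate(parts)); // prints [a, d, b, c]
        System.out.println(sortedMerge(parts)); // prints [a, b, c, d]
    }
}
```

With two individually sorted inputs, concatenation leaves the output unsorted,
while the k-way merge produces a single sorted sequence. If the inputs were not
sorted to begin with, a full sort would be needed instead of a merge.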
Ideally, copyMerge should be equivalent to a map-reduce job with
IdentityMapper and IdentityReducer and numReducers = 1. However, not having to
run this as a map-reduce job has an advantage: a single-reducer job leaves most
of the cluster idle during the reduce phase, so avoiding it improves cluster
utilization.