[
https://issues.apache.org/jira/browse/HADOOP-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492249
]
Hadoop QA commented on HADOOP-1216:
-----------------------------------
Integrated in Hadoop-Nightly #71 (See
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/71/)
> Hadoop should support reduce none option
> ----------------------------------------
>
> Key: HADOOP-1216
> URL: https://issues.apache.org/jira/browse/HADOOP-1216
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Runping Qi
> Assigned To: Runping Qi
> Fix For: 0.13.0
>
> Attachments: patch_1216.txt
>
>
> This has been a highly desired feature in the streaming world and has been
> requested occasionally on the non-streaming side.
> Streaming implemented a working (if hacky) solution, but it also creates a
> discrepancy between the Hadoop streaming and non-streaming models. It would
> be nice if Hadoop offered such a feature that works for both streaming and
> non-streaming jobs. Owen and I discussed this a bit, and here is the
> general idea for further discussion and suggestions:
> 1. Allow the user to specify reducer=none in the jobconf.
> 2. The user can still specify the output format and output directory.
> 3. Each mapper will generate an output file in the specified directory. The
> naming convention can still be like part-xxxxxxxx
> where xxxxxxxx is the map task number.
> 4. The mapoutput collector of a mapper task will be a record writer on the
> 5. The mapper will call output.collect() to write the output, so the same
> mapper class can be used regardless of whether reducer=none is set.
> When the reducer is set to none for a job, no map output files will be
> written to the local file system at all, and no data will be shuffled
> between mappers and reducers. As a matter of fact, the framework may choose
> not to create reducers at all.
>
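The proposal above can be illustrated with a small, self-contained sketch. It is not Hadoop code and does not use any Hadoop API. The names DirectRecordWriter, BufferingCollector, word_count_mapper and run_map_task are hypothetical, and so are the dict-based job configuration, the tab-separated record format and the zero-padded width of the task number. The sketch only shows the idea in points 3 and 5: the mapper always calls output.collect(), and the framework decides whether that call writes straight to a part-xxxxxxxx file or buffers records for a later shuffle.

```python
import os


class DirectRecordWriter:
    """Hypothetical collector for reducer=none: writes records straight to the output file."""

    def __init__(self, output_dir, map_task_number):
        # part-xxxxxxxx naming from the proposal; the 8-digit zero padding is an assumption.
        self.path = os.path.join(output_dir, "part-%08d" % map_task_number)
        self._file = open(self.path, "w")

    def collect(self, key, value):
        # Tab-separated text is an assumed record format, chosen only for illustration.
        self._file.write("%s\t%s\n" % (key, value))

    def close(self):
        self._file.close()


class BufferingCollector:
    """Hypothetical stand-in for the normal path: keeps map output for a later shuffle."""

    def __init__(self):
        self.records = []

    def collect(self, key, value):
        self.records.append((key, value))

    def close(self):
        pass


def word_count_mapper(line, output):
    """The mapper only knows about output.collect(); it is unaware of the reducer setting."""
    for word in line.split():
        output.collect(word, 1)


def run_map_task(task_number, lines, job_conf, output_dir):
    """Pick the collector from the job configuration, then run the unchanged mapper."""
    if job_conf.get("reducer") == "none":
        output = DirectRecordWriter(output_dir, task_number)
    else:
        output = BufferingCollector()
    try:
        for line in lines:
            word_count_mapper(line, output)
    finally:
        output.close()
    return output
```

Calling run_map_task(3, ["a b", "a"], {"reducer": "none"}, some_dir) in this sketch leaves one file named part-00000003 in some_dir and nothing for a shuffle. With any other reducer setting, the same mapper fills an in-memory list and no file is created.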