[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Attachment: MAPREDUCE-2410.r1.diff

Dieter, I've attached a patch that adds a documentation entry to the streaming FAQ page. Let me know if the following is sufficient (it's what the patch contains as well):

{code}
+<section>
+<title>How does the use of streaming differ from the Java MapReduce API?</title>
+<p>
+    The Java MapReduce API provides a higher level API that lets the developer focus on writing map and reduce functions that act upon a pair of key and associated value(s). The Java API takes care of the iteration over the data source behind the scenes.
+    In streaming, the framework pours in the input data over the stdin to the mapper/reduce program, and thus these programs ought to be written from the reading (via stdin) iteration level.
+</p>
+</section>
{code}

> document multiple keys per reducer oddity in hadoop streaming FAQ
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-2410
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2410
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming, documentation
>            Reporter: Dieter Plaetinck
>            Priority: Minor
>              Labels: newbie
>         Attachments: MAPREDUCE-2410.r1.diff
>
>   Original Estimate: 40m
>  Remaining Estimate: 40m
>
> Hi,
> for a newcomer to hadoop streaming, it comes as a surprise that the reducer
> receives arbitrary keys, unlike the "real" hadoop where a reducer works on a
> single key.
> An explanation for this is @
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/browser
> I suggest to add this to the FAQ of hadoop streaming
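
To illustrate the point for the FAQ reader (this is not part of the attached patch), here is a minimal sketch of a streaming reducer in Python. It assumes the reducer's stdin carries one tab-separated key/value pair per line, sorted so that lines with the same key are adjacent, and that the values are integers to be summed. The names parse_pairs, reduce_stream and main are hypothetical. Because one reducer process may see many keys, the script itself has to detect where one key ends and the next begins:

```python
import sys
from itertools import groupby


def parse_pairs(lines, sep="\t"):
    """Yield (key, value) tuples from 'key<sep>value' lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition(sep)
        yield key, value


def reduce_stream(lines, sep="\t"):
    """Sum the integer values of each run of adjacent lines sharing a key.

    Relies on the input being sorted by key (an assumption of this sketch),
    so that groupby sees every key as a single contiguous run.
    """
    pairs = parse_pairs(lines, sep)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}{sep}{total}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read the whole reducer input from stdin and write one line per key."""
    for out_line in reduce_stream(stdin):
        stdout.write(out_line + "\n")


# A reducer script built on this sketch would end with a call to main().
```

With the Java API the framework would hand each key and its values to a separate reduce() call; here the groupby over stdin plays that role inside the script.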