[ 
https://issues.apache.org/jira/browse/HADOOP-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673926#action_12673926
 ] 

Klaas Bosteels commented on HADOOP-4842:
----------------------------------------

Actually, shell-only programmers can already combine by adding something like 
"| sort | sh combiner.sh" to their mapper script. More generally, I think it 
makes more sense to combine locally in the streaming application process 
itself, instead of running an additional application process and requiring 
another round trip to the Java process and back. Both 
[Pipes|http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html]
and [Dumbo|http://wiki.github.com/klbostee/dumbo] use this approach for 
combining.
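
To illustrate what combining locally in the streaming process can look like, here is a minimal sketch of a word-count mapper that sums counts in memory and emits one record per distinct key. It is not taken from Pipes or Dumbo; the function name map_with_local_combine is made up for this example, and it assumes that the distinct keys seen by one mapper fit in memory and that records are tab-separated key/value lines.

```python
import sys
from collections import defaultdict


def map_with_local_combine(lines):
    """Word-count mapper that combines in-process before emitting.

    Hypothetical helper for illustration only; assumes all distinct
    keys seen by this mapper fit in memory.
    """
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            # Combine here instead of emitting a (word, 1) pair per token.
            counts[word] += 1
    # Emit one tab-separated "key<TAB>value" record per distinct key.
    for word in sorted(counts):
        yield "%s\t%d" % (word, counts[word])


def main():
    # Read input records from stdin and write combined records to stdout.
    for record in map_with_local_combine(sys.stdin):
        print(record)


if __name__ == "__main__":
    main()
```

A script like this would be passed as the -mapper command, so the partial sums never leave the mapper process and no separate combiner process is needed. A mapper with a large key space would probably need to flush its partial sums periodically rather than hold them all until the end of the input.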

> Streaming combiner should allow command, not just JavaClass
> -----------------------------------------------------------
>
>                 Key: HADOOP-4842
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4842
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Marco Nicosia
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>
> Streaming jobs are way slower than Java jobs for many reasons, but certainly 
> stopping the shell-only programmer from using the combiner feature won't 
> help. Right now, the streaming usage says:
> {quote}
>   -mapper   <cmd|JavaClassName>      The streaming command to run
>   -combiner <JavaClassName>          Combiner has to be a Java class
>   -reducer  <cmd|JavaClassName>      The streaming command to run
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
