[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717591#action_12717591 ]
Klaas Bosteels commented on HADOOP-5979:
----------------------------------------
Yeah, I was actually suggesting such a special Java implementation that writes
to and reads from a command, but instead of letting the command generate the
partition number directly, I thought it might make sense to let it output a key
or even a key/value pair (which are completely separate from the other
MapReduce keys and values) and determine the partition from that. So instead of
generating the same number for pairs that need to go to the same reducer, the
partitioner command would just have to generate the same key for those pairs.
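To make the key-based idea concrete, here is a minimal Java sketch under stated assumptions. The external command is simulated by a {{java.util.function.Function}} that receives a tab-separated key/value line and returns a partition key. {{CommandKeyPartitioner}}, {{CommandKeyPartitionerDemo}} and the month-prefix "command" are hypothetical names for illustration, not part of Streaming. The partition number is derived on the Java side by masking the sign bit of the key's hash and taking the remainder, the scheme Hadoop's HashPartitioner applies to {{key.hashCode()}}, so the command never sees the number of partitions.

```java
import java.util.function.Function;

// Hypothetical sketch: the partitioner "command" (simulated by a Function)
// maps each key/value line to a partition key; Java turns that key into a
// partition number, so the command never needs to know numPartitions.
class CommandKeyPartitioner {
    private final Function<String, String> command; // stand-in for an external process

    CommandKeyPartitioner(Function<String, String> command) {
        this.command = command;
    }

    int getPartition(String key, String value, int numPartitions) {
        // Streaming-style input line: key and value separated by a tab.
        String partitionKey = command.apply(key + "\t" + value);
        // Mask the sign bit, then take the remainder, as HashPartitioner does.
        return (partitionKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

class CommandKeyPartitionerDemo {
    public static void main(String[] args) {
        // Hypothetical command: emit the year-month prefix of the key (like `cut -c1-7`).
        CommandKeyPartitioner p = new CommandKeyPartitioner(line -> line.substring(0, 7));
        int numPartitions = 4;
        String[][] records = {
            {"2009-06-01", "a"}, {"2009-06-15", "b"}, {"2009-07-02", "c"}
        };
        for (String[] r : records) {
            System.out.println(r[0] + " -> " + p.getPartition(r[0], r[1], numPartitions));
        }
    }
}
```

Both June records land in the same partition because the command emits the same "2009-06" key for them; the command only has to produce matching keys, not matching numbers.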
The benefits of such an approach would be that
# it's simpler (the partitioner command doesn't need to know how many
partitions there are),
# it might be easier to define a suitable partitioner command (when using shell
tools, for example, it might be easier to output a string than a specific
number),
# we could reuse more code that's already there (if we let the partitioner
command output both a key and a value and pass that on to a wrapped
partitioner, like in the code sample I gave above, we wouldn't even need any
additional reading/writing logic).
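The key/value variant from the third point can be sketched the same way: the command's output line is split at the first tab into a new key and value, which are handed unchanged to an existing partitioner. This is again a hypothetical sketch: {{Partitioner}} here is a local interface that only mirrors the shape of the old-API {{getPartition(key, value, numPartitions)}} method so the example compiles without Hadoop, the command is simulated by a {{Function}}, and {{HashLikePartitioner}}, {{CommandWrappingPartitioner}} and the user-id command are illustrative names.

```java
import java.util.function.Function;

// Local interface mirroring the shape of Hadoop's Partitioner#getPartition
// (hypothetical; declared here so the sketch compiles without Hadoop).
interface Partitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Hash-based partitioner in the style of Hadoop's HashPartitioner.
class HashLikePartitioner implements Partitioner<String, String> {
    public int getPartition(String key, String value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Runs the partitioner command on each pair, splits its output line into a
// new key/value pair, and delegates to an existing (wrapped) partitioner.
class CommandWrappingPartitioner implements Partitioner<String, String> {
    private final Function<String, String> command; // stand-in for the external process
    private final Partitioner<String, String> wrapped;

    CommandWrappingPartitioner(Function<String, String> command,
                               Partitioner<String, String> wrapped) {
        this.command = command;
        this.wrapped = wrapped;
    }

    public int getPartition(String key, String value, int numPartitions) {
        String out = command.apply(key + "\t" + value);
        int tab = out.indexOf('\t');
        // No tab: treat the whole output line as the key, with an empty value.
        String newKey = tab < 0 ? out : out.substring(0, tab);
        String newValue = tab < 0 ? "" : out.substring(tab + 1);
        return wrapped.getPartition(newKey, newValue, numPartitions);
    }
}

class CommandWrappingPartitionerDemo {
    public static void main(String[] args) {
        // Hypothetical command: key = user id before ':', value = the rest of the original key.
        Function<String, String> command = line -> {
            String key = line.substring(0, line.indexOf('\t'));
            int colon = key.indexOf(':');
            return key.substring(0, colon) + "\t" + key.substring(colon + 1);
        };
        Partitioner<String, String> p =
            new CommandWrappingPartitioner(command, new HashLikePartitioner());
        System.out.println(p.getPartition("userA:click", "1", 8));
        System.out.println(p.getPartition("userA:view", "1", 8));
    }
}
```

The only parsing the wrapper adds is the split at the tab; all partitioning logic stays in the wrapped class, which is the kind of reuse the comment argues for.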
> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
> Key: HADOOP-5979
> URL: https://issues.apache.org/jira/browse/HADOOP-5979
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java
> classes to be specified as mapper, reducer, and combiner, but the
> {{-partitioner}} option is still limited to Java classes only. Allowing
> commands to be specified as partitioner as well would greatly improve the
> flexibility of Streaming programs.
--
This message is automatically generated by JIRA.