[
https://issues.apache.org/jira/browse/HADOOP-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Runping Qi updated HADOOP-1290:
-------------------------------
Attachment: patch-1284.txt
This patch implements the proposed protocol.
With this patch, the streaming user can specify a field separator for the
mapper's output and/or a field separator for the reducer's output.
The default is the tab character.
The user can also specify how many fields in the output constitute the key.
The default is 1.
The rest of the line is the value.
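
To illustrate the key/value split described above, here is a minimal sketch
in plain Java. It is not the code in the patch: the class and method names
are hypothetical, and treating a line with too few separators as "whole line
is the key, empty value" is an assumption.

```java
public class KeyValueSplitSketch {
    /**
     * Splits a line into {key, value}. The key is the first numKeyFields
     * fields (still joined by the separator); the value is the remainder.
     */
    static String[] split(String line, String separator, int numKeyFields) {
        if (numKeyFields < 1) {
            throw new IllegalArgumentException("numKeyFields must be >= 1");
        }
        int from = 0;  // where the next separator search starts
        int pos = -1;  // index of the separator that ends the key
        for (int i = 0; i < numKeyFields; i++) {
            pos = line.indexOf(separator, from);
            if (pos < 0) {
                // Assumption: too few separators, so the whole line is the key.
                return new String[] { line, "" };
            }
            from = pos + separator.length();
        }
        return new String[] { line.substring(0, pos), line.substring(from) };
    }

    public static void main(String[] args) {
        // Default-like settings: tab separator, one key field.
        String[] kv = split("k1\tv1\tv2", "\t", 1);
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
    }
}
```

With two key fields, the same input would yield the first two fields as the
key and only the last field as the value.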
A partitioner class, KeyFieldBasedPartitioner in mapred.lib, is also
implemented.
The user can specify how many fields of the map output key are used for
partitioning.
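
The idea of partitioning on only the leading key fields can be sketched as
follows. This is an illustration, not the KeyFieldBasedPartitioner source:
the class name, the method signature, and the use of String.hashCode() are
all assumptions made for the example.

```java
public class KeyFieldPartitionSketch {
    /**
     * Returns a partition number in [0, numReduceTasks) computed from the
     * first numPartitionFields fields of the key only.
     */
    static int partition(String key, String separator,
                         int numPartitionFields, int numReduceTasks) {
        int from = 0;
        int end = key.length();  // if the key has too few fields, use all of it
        for (int i = 0; i < numPartitionFields; i++) {
            int pos = key.indexOf(separator, from);
            if (pos < 0) {
                end = key.length();
                break;
            }
            end = pos;
            from = pos + separator.length();
        }
        String prefix = key.substring(0, end);
        // Mask the sign bit so the modulo result is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Both keys share the first field, so they land in the same partition.
        System.out.println(partition("user1.2007.04", ".", 1, 4));
        System.out.println(partition("user1.2007.05", ".", 1, 4));
    }
}
```

Keys that agree on the partitioning fields go to the same reducer, while the
remaining key fields stay available for sorting within that reducer.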
Also, a utility class, FieldSelectionMapReduce in mapred.lib, is added. This
class allows the user to create map/reduce jobs that manipulate text data
like the Unix cut utility.
The user can specify the field separator (the delimiter for cut), which
fields to select, and by which fields to partition/sort.
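
As a rough picture of cut-style field selection, the sketch below picks
fields out of a delimited line. It does not use FieldSelectionMapReduce or
its configuration format, which this message does not describe; the class
name, the zero-based indices, and the choice to skip out-of-range indices
are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FieldSelectSketch {
    /**
     * Returns the requested fields of the line, in the requested order,
     * joined by the same separator. Indices are zero-based (an assumption;
     * Unix cut counts from 1). Out-of-range indices are skipped.
     */
    static String select(String line, String separator, int[] fields) {
        // Pattern.quote treats the separator literally; -1 keeps trailing
        // empty fields.
        String[] parts = line.split(Pattern.quote(separator), -1);
        List<String> selected = new ArrayList<>();
        for (int f : fields) {
            if (f >= 0 && f < parts.length) {
                selected.add(parts[f]);
            }
        }
        return String.join(separator, selected);
    }

    public static void main(String[] args) {
        System.out.println(select("alice,30,paris,fr", ",", new int[] { 2, 0 }));
    }
}
```

In a map/reduce job, one such selection could build the output key and
another the output value.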
Two unit tests are introduced, and all the unit tests pass.
> Move Hadoop Abacus to hadoop.mapred.lib
> ---------------------------------------
>
> Key: HADOOP-1290
> URL: https://issues.apache.org/jira/browse/HADOOP-1290
> Project: Hadoop
> Issue Type: Improvement
> Reporter: Runping Qi
>
> Owen and I discussed this issue and we both felt that it is appropriate to
> move Hadoop Abacus to the hadoop main framework.
> Any comments/thoughts/concerns/objections?