[jira] Commented: (HADOOP-113) Allow multiple Output Dirs to be specified for a job

paul sutter (JIRA) Sat, 08 Apr 2006 15:23:49 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-113?page=comments#action_12373749 ]


paul sutter commented on HADOOP-113:
------------------------------------


Is the intention to have one mapper fork into multiple reducers, saving the 
file io of doing independent map passes?

mapper1 -> output a -> reducer 1
        -> output b -> reducer 2

instead of

mapper1 -> output a -> reducer 1
mapper2 -> output b -> reducer 2

where, in the second example, the map input file is read twice instead of once?

That could be useful. I'm not sure how much it would really speed things up.
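
To make the question concrete, here is a rough sketch of the two layouts in plain Python. It does not use the Hadoop API, and it is not the code from the attached patch. The names map_a, map_b, read_input, separate_passes and forked_pass are hypothetical, and a read counter stands in for the real file I/O. It only shows that forking one map pass into two outputs touches each input record once instead of twice.

```python
def read_input(records, counter):
    """Yield each record, counting reads as a stand-in for file I/O."""
    for rec in records:
        counter["reads"] += 1
        yield rec


def map_a(rec):
    # Hypothetical map function feeding "output a".
    return (rec.lower(), 1)


def map_b(rec):
    # Hypothetical map function feeding "output b".
    return (len(rec), rec)


def separate_passes(records):
    """Two independent map passes: the input is read once per pass."""
    counter = {"reads": 0}
    out_a = [map_a(r) for r in read_input(records, counter)]
    out_b = [map_b(r) for r in read_input(records, counter)]
    return out_a, out_b, counter["reads"]


def forked_pass(records):
    """One map pass that forks each record into both outputs."""
    counter = {"reads": 0}
    out_a, out_b = [], []
    for r in read_input(records, counter):
        out_a.append(map_a(r))
        out_b.append(map_b(r))
    return out_a, out_b, counter["reads"]


records = ["Apple", "pear", "fig"]
a_two, b_two, reads_two = separate_passes(records)
a_one, b_one, reads_one = forked_pass(records)
```

Both layouts produce the same two outputs; only the number of input reads differs. Whether that saving matters in practice depends on how much of the job's time goes to reading map input, which this sketch does not measure.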

> Allow multiple Output Dirs to be specified for a job
> ----------------------------------------------------
>
>          Key: HADOOP-113
>          URL: http://issues.apache.org/jira/browse/HADOOP-113
>      Project: Hadoop
>         Type: New Feature
>   Components: mapred
>     Versions: 0.1.0
>     Reporter: Rod Taylor
>  Attachments: hadoop_multisegment.patch
>
> Allow a single job to create multiple outputs. This requires only 2 additional
> simple functions.
> This allows for more complex branching of the process to occur either with 
> multiple steps of the same type or allow different actions to take place on 
> each output directory depending on the required actions.
> For my specific use, it allows me to run multiple Generate Outputs instead of 
> a single Generate Output as submitted in 
> NUTCH-171 (http://issues.apache.org/jira/browse/NUTCH-171)

