[jira] Commented: (MAPREDUCE-1750) Make #rows avail. to reducers as environment variable

Owen O'Malley (JIRA) Sun, 02 May 2010 22:12:22 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863234#action_12863234 ]


Owen O'Malley commented on MAPREDUCE-1750:
------------------------------------------

Is this for streaming? I don't remember how lazily streaming waits for the 
input before starting the process. If the process starts too early, this will 
be a difficult change. In any case, it will be easier to make it available to 
Java first.

I assume you mean the number of values for this reduce? The number of keys 
isn't known until the reduce is almost done. You also can't know the number of 
keys or values for other reduces without a lot of extra traffic from the 
JobTracker.
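To illustrate why the number of values is easier to provide than the number of keys, here is a minimal, hypothetical sketch (not Hadoop code). It assumes the reduce input has already been merged into a single sorted list of key/value records. The value count is simply the number of records, which a merge could tally as it emits them, whereas the number of distinct keys only becomes known after walking every group, much as the reduce phase itself does.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ReduceInputCounts {
    // Total number of values (records) for this reduce.
    // A merge step could tally this while emitting records.
    static long countValues(List<Map.Entry<String, String>> sorted) {
        return sorted.size();
    }

    // Number of distinct keys: each key boundary is found by comparing a
    // record's key with the previous one, so the total is only known after
    // the whole sorted input has been scanned.
    static long countKeys(List<Map.Entry<String, String>> sorted) {
        long keys = 0;
        String previous = null;
        for (Map.Entry<String, String> record : sorted) {
            if (keys == 0 || !Objects.equals(record.getKey(), previous)) {
                keys++;
                previous = record.getKey();
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> sorted = List.of(
            Map.entry("apple", "1"),
            Map.entry("apple", "2"),
            Map.entry("banana", "3"),
            Map.entry("cherry", "4"),
            Map.entry("cherry", "5"));
        // Prints "values=5 keys=3".
        System.out.println("values=" + countValues(sorted) + " keys=" + countKeys(sorted));
    }
}
```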

> Make #rows avail. to reducers as environment variable
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-1750
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1750
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>            Reporter: Adam Kramer
>            Priority: Minor
>
> Given that there is a sort phase between the copy phase and the reduce phase, 
> it seems like there is an opportunity to count rows during the sort.
> It would be nice if my reducers could have access to an environment variable, 
> say, mapred.reduce.rows, that contained the number of rows present for this 
> reducer (as counted during the sort step).
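
A minimal sketch of how a reducer might consume such a variable, under stated assumptions: mapred.reduce.rows is only the property proposed in this issue, not an existing Hadoop setting, and the sketch assumes it would be exported the way Hadoop Streaming exports job configuration properties, with non-alphanumeric characters replaced by underscores (so the hypothetical name becomes mapred_reduce_rows). The reducer falls back gracefully when the value is missing or malformed.

```java
import java.util.Map;
import java.util.OptionalLong;

public class ReduceRowsEnv {
    // Hypothetical variable name: mapred.reduce.rows was only proposed in
    // this issue; Streaming-style export would turn the dots into underscores.
    static final String ROWS_VAR = "mapred_reduce_rows";

    // Returns the advertised row count, or empty if the variable is absent
    // or is not a valid non-negative number.
    static OptionalLong reduceRows(Map<String, String> env) {
        String raw = env.get(ROWS_VAR);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long rows = Long.parseLong(raw.trim());
            return rows >= 0 ? OptionalLong.of(rows) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong rows = reduceRows(System.getenv());
        if (rows.isPresent()) {
            System.out.println("rows for this reducer: " + rows.getAsLong());
        } else {
            System.out.println("row count not provided; count while reading input");
        }
    }
}
```

Taking the environment as a Map parameter rather than calling System.getenv() directly keeps the parsing logic testable without launching a task.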

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
