Re: How do I sum by Key in the Reduce Phase AND keep the initial value

Amogh Vasekar Tue, 12 Jan 2010 12:01:14 -0800

Hi,
I ran into a very similar situation some time back and came across this 
issue: http://issues.apache.org/jira/browse/HADOOP-475
The Hadoop folks I spoke to said that complete cloning was not a 
straightforward option, for optimization reasons.
I tried a few things to get this done in a single MR job: emitting each <k,v> 
from the mapper a second time with some tagging info (this increased the 
shuffle and sort phase by quite a lot), running a map-only successor job, etc. 
What worked well for me was keeping the records in memory and writing them to 
disk once they passed a certain threshold (all of this on Hadoop 0.17.2).
Anyway, they seem to have resolved it in the next Hadoop release.
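
The keep-in-memory-then-write-to-disk idea could look roughly like the sketch 
below. It is plain Java with no Hadoop types, so it is an illustration under 
assumptions rather than the code that actually ran on 0.17.2: the class 
SpillingBuffer, the method sumAndAnnotate, the temp-file name and the threshold 
parameter are all hypothetical, and an Iterator<Integer> stands in for the 
reducer's values. In a real reducer the IntWritable handed out by the iterator 
is reused between calls, so the int would have to be copied out with get() 
before buffering.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper: buffers ints in memory, spilling to a temp file at a threshold. */
class SpillingBuffer implements AutoCloseable {
    private final int threshold;                       // max values held in memory
    private final List<Integer> memory = new ArrayList<>();
    private File spillFile;                            // created lazily on first spill
    private DataOutputStream spillOut;
    private long spilled = 0;                          // number of values written to disk

    SpillingBuffer(int threshold) { this.threshold = threshold; }

    void add(int value) throws IOException {
        memory.add(value);
        if (memory.size() >= threshold) {
            spill();
        }
    }

    /** Appends the in-memory values to the spill file and clears the buffer. */
    private void spill() throws IOException {
        if (spillOut == null) {
            spillFile = File.createTempFile("reduce-spill", ".bin");
            spillFile.deleteOnExit();
            spillOut = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(spillFile)));
        }
        for (int v : memory) spillOut.writeInt(v);
        spilled += memory.size();
        memory.clear();
    }

    /** Replays values in insertion order: spilled values first, then those still in memory. */
    void replay(IntConsumer sink) throws IOException {
        if (spillOut != null) {
            spillOut.flush();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)))) {
                for (long i = 0; i < spilled; i++) sink.accept(in.readInt());
            }
        }
        for (int v : memory) sink.accept(v);
    }

    @Override
    public void close() throws IOException {
        if (spillOut != null) {
            spillOut.close();
            spillFile.delete();
        }
    }

    /** First pass sums and buffers; second pass replays each value next to the sum. */
    static List<String> sumAndAnnotate(String key, Iterator<Integer> values, int threshold) {
        List<String> out = new ArrayList<>();          // stand-in for the output collector
        try (SpillingBuffer buffer = new SpillingBuffer(threshold)) {
            long sum = 0;
            while (values.hasNext()) {
                int v = values.next();
                sum += v;
                buffer.add(v);
            }
            final long total = sum;
            buffer.replay(v -> out.add(key + "\t" + v + " " + total));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // A threshold of 1 forces every value through the spill file.
        for (String line : sumAndAnnotate("A", Arrays.asList(11, 9).iterator(), 1)) {
            System.out.println(line);
        }
    }
}
```

The values iterator is consumed exactly once; the second pass reads from the 
buffer instead, so memory use per key stays bounded by the threshold.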


Amogh



On 1/12/10 10:29 PM, "Stephen Watt" <sw...@us.ibm.com> wrote:

The key/value pairs coming into my Reducer are as follows:

KEY(Text)        VALUE(IntWritable)
A                11
A                9
B                2
B                3

I want my reducer to sum the Values for each input key and then output the key 
with a Text Value containing the original value and the sum.

KEY(Text)        VALUE(Text)
A                11        20
A                9         20
B                2         5
B                3         5

Here is the issue: in the reducer, I iterate through the values for each key 
using values.iterator() and store the total in a variable. Then I TRY to 
iterate through the values again, this time writing the new value 
(A, new Text("11 20")) to the output collector to create the value structure 
shown in the example above. This fails because it appears I can only iterate 
through the values for each key ONCE. I know this because additional attempts 
to get new iterators from the context or from the Iterable that is passed into 
the reducer always return false on the initial hasNext().

I have to iterate through them twice because the first time I have to sum the 
values and the second time I need to write the initial value (11) and the 
sum (20), as I need both values as part of a calculation in the next job. Any 
ideas on how to do this?
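
A minimal sketch of one way to get this output with a single pass over the 
values, assuming all values for a key fit in memory: copy each value into a 
list while summing, then write the output from the list instead of asking for 
a second iterator. This is plain Java with no Hadoop types; the class 
SumKeepOriginal and the method reduce are hypothetical names, an 
Iterator<Integer> stands in for the reducer's values, and the returned list 
stands in for the output collector. In a real reducer the IntWritable is 
reused between calls to next(), so the int has to be copied out with get() 
rather than keeping a reference to the object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the reduce step: sum per key, keep each original value. */
class SumKeepOriginal {

    static List<String> reduce(String key, Iterator<Integer> values) {
        List<Integer> seen = new ArrayList<>();   // copies of the values, in arrival order
        int sum = 0;

        // Single pass over the one-shot iterator: remember each value and add it up.
        while (values.hasNext()) {
            int v = values.next();
            seen.add(v);
            sum += v;
        }

        // Second pass runs over the in-memory copy, not the original iterator.
        List<String> out = new ArrayList<>();
        for (int v : seen) {
            out.add(key + "\t" + v + " " + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : reduce("A", Arrays.asList(11, 9).iterator())) {
            System.out.println(line);
        }
        for (String line : reduce("B", Arrays.asList(2, 3).iterator())) {
            System.out.println(line);
        }
    }
}
```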

Kind regards
Steve Watt
