You probably need to implement a custom comparator and use it as the grouping 
comparator. It should compare the primary key and then, if those are the same, 
compare the int part of the key.
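
To make the sort-versus-grouping distinction concrete, here is a minimal 
plain-Java sketch, not Hadoop code. It assumes the join wants every record 
with the same int key in one reduce call. JoinKey, SORT, GROUP and group() are 
hypothetical names for illustration; group() only imitates how the reduce side 
walks sorted keys and starts a new reduce call whenever the grouping comparator 
reports a difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    // Hypothetical stand-in for the custom join key: a primary-key flag plus an int key.
    static final class JoinKey {
        final boolean primary;
        final int key;
        JoinKey(boolean primary, int key) { this.primary = primary; this.key = key; }
    }

    // Sort order: by int key, then the primary-key record first within each key.
    static final Comparator<JoinKey> SORT = (a, b) -> {
        int c = Integer.compare(a.key, b.key);
        if (c != 0) return c;
        return Boolean.compare(b.primary, a.primary); // true sorts before false
    };

    // Grouping order (assumption): int key only, so both flag values share one group.
    static final Comparator<JoinKey> GROUP = (a, b) -> Integer.compare(a.key, b.key);

    // Imitates the reduce-side walk: adjacent sorted keys that compare equal
    // under `grouping` end up in the same group (one simulated reduce call).
    static List<List<JoinKey>> group(List<JoinKey> keys, Comparator<JoinKey> grouping) {
        List<JoinKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<JoinKey>> groups = new ArrayList<>();
        JoinKey prev = null;
        for (JoinKey k : sorted) {
            if (prev == null || grouping.compare(prev, k) != 0) {
                groups.add(new ArrayList<>()); // comparator saw a new key: new group
            }
            groups.get(groups.size() - 1).add(k);
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = Arrays.asList(
            new JoinKey(false, 1), new JoinKey(true, 1), new JoinKey(false, 1),
            new JoinKey(true, 2), new JoinKey(false, 2));
        // Number of groups when grouping on the int key only.
        System.out.println(group(keys, GROUP).size());
        // Number of groups when the flag also takes part in grouping.
        System.out.println(group(keys, SORT).size());
    }
}
```

In an actual job the grouping comparator would be a RawComparator registered 
with Job.setGroupingComparatorClass (or JobConf.setOutputValueGroupingComparator 
in the old API), alongside the sort comparator. Note that a comparator which 
never returns 0 for equal keys would split every record into its own group.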

--Aaron

-----------------------------------------------------------------------------
From: Trevor Adams [mailto:trevorad...@gmail.com]
Sent: Wednesday, June 29, 2011 10:00 AM
To: mapreduce-user@hadoop.apache.org
Subject: Reduce method called same key twice

So I have a custom key which is used for a join. It contains two fields, a 
boolean (is primary key) and an int (key). hashCode only looks at the key 
field, so that records with the same key get sent to the same reducer. The 
compare method places the pkey at the top of the list (if sorted using 
compare). This works nicely, except that the reduce method is called with 
Key: 1 -> a single value, Key: 1 -> another value, etc., once for each value. 
So instead of bucketing the values under a key (and some of the keys are 
identical in every way), it sends one key and one value to the reducer at a 
time. How do I get it to bucket, or why isn't it bucketing?
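
For reference, the routing part of this setup can be sketched in plain Java. 
JoinKey and partition() are hypothetical names; partition() assumes the job 
uses Hadoop's default HashPartitioner and reproduces its arithmetic, so two 
keys with the same int field land on the same reducer whatever the flag is. 
This only covers routing, not how values are bucketed into reduce calls.

```java
public class PartitionSketch {
    // Hypothetical key: hashCode depends on the int field only, as described above.
    static final class JoinKey {
        final boolean primary;
        final int key;
        JoinKey(boolean primary, int key) { this.primary = primary; this.key = key; }

        @Override public int hashCode() { return key; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof JoinKey)) return false;
            JoinKey other = (JoinKey) o;
            return primary == other.primary && key == other.key;
        }
    }

    // Same arithmetic as the default HashPartitioner: mask the sign bit, then mod.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int withFlag = partition(new JoinKey(true, 7), 4);
        int withoutFlag = partition(new JoinKey(false, 7), 4);
        System.out.println(withFlag == withoutFlag); // same int key, same partition
    }
}
```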

-Trevor
