On Fri, Mar 27, 2009 at 4:39 PM, Sid123 <itis...@gmail.com> wrote:
> But I was thinking of grouping the values and generating a key using a
> random number generator in the collector of the mapper. The values will now
> be uniformly distributed over a few keys. Say the number of keys will be
> 0.1% of the # of values or atleast 1, which ever is higher. So if there
> 20000 values 2000 odd values should be under a single key.. and 10 reducers
> should spawn to do the sum in parallel... Now I can atleast run 10 sum in
> parallel rather than just 1 reducer doing the whole work... How does that
> theory seem?
>
What you want to do is write a combiner, which is essentially a reducer that runs on the map output of a single node before that output is sent to the main reducer. Because addition is associative and commutative, the partial sums can be combined in any order, so the real reducer would then get one value per node.
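To make the idea concrete, here is a rough sketch in plain Python. It is not Hadoop code, only a simulation of the data flow: each inner list stands in for the map output on one node, and the function names (combine, reduce_sum, run_job) are made up for illustration. In a real Hadoop job you would normally just register your summing reducer class as the combiner with setCombinerClass.

```python
from typing import List, Tuple


def combine(values: List[int]) -> int:
    """Combiner: collapse the map output of one node into a single partial sum."""
    return sum(values)


def reduce_sum(partials: List[int]) -> int:
    """Reducer: add up the partial sums, one per node."""
    return sum(partials)


def run_job(splits: List[List[int]]) -> Tuple[List[int], int]:
    # Each split stands in for the map output produced on one node (an assumption
    # of this sketch; Hadoop may run the combiner more than once per map task).
    partials = [combine(split) for split in splits]
    return partials, reduce_sum(partials)


# Hypothetical map output spread over three nodes.
splits = [[1, 2, 3], [4, 5], [6]]
partials, total = run_job(splits)
print(partials, total)
```

The single reducer here only has to add three numbers instead of six, and the same reduction happens at any scale: the reducer's input grows with the number of nodes, not with the number of values.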