Oops - I mistakenly assumed the test Reducer was just some kind of wordcount-esque summer.
In fact, it has an O(n^2) operation, essentially:

  sValue += values.next().toString() + '\t';

Appending to a String like this is very slow, and explains why the reducers
that get a lot of the same key are way slower. It's allocating lots of
excess objects, and doing a ton of memory copies. Switching to a
StringBuffer fixed the whole thing:

  - String sValue = "";
  + StringBuffer sb = new StringBuffer();
    while (values.hasNext()) {
      iCount++;
  -   sValue += values.next().toString() + '\t';
  +   sb.append(values.next().toString()).append('\t');
    }

Hope that helps, Pankil.

-Todd

On Mon, Nov 23, 2009 at 9:32 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Interesting. I don't have the 17 minutes issue, but the reducer with the
> identical keys is taking about twice as long as the others.
>
> Looking at counters, most of the tasks have "Reduce shuffle bytes" 0,
> whereas the slow one has reduce shuffle bytes 1,100,006 as expected.
>
> Logs on the slow one:
>
> 2009-11-23 21:20:48,524 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 1 segments left of total size: 1100002 bytes
> 2009-11-23 21:22:53,763 INFO org.apache.hadoop.mapred.TaskRunner:
> Task:attempt_200911232110_0007_r_000009_0 is done. And is in the process
> of commiting
>
> On the fast one:
>
> 2009-11-23 21:20:45,370 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 1 segments left of total size: 1100002 bytes
> 2009-11-23 21:21:27,535 INFO org.apache.hadoop.mapred.TaskRunner:
> Task:attempt_200911232110_0007_r_000008_0 is done. And is in the process
> of commiting
>
> So something about the merge is slow when all keys are identical, perhaps.
> Which is strange, because this isn't even much of a "merge" - there was
> only one mapper.
>
> I have a plane flight tonight, will try to take a deeper look.
>
> -Todd
>
> On Wed, Nov 18, 2009 at 2:53 PM, Pankil Doshi <forpan...@gmail.com> wrote:
>
>> Ya thats true.
>> Though it depends on my cluster configuration, still the other reducers
>> (0 to 8) also have 100000 keys to handle, in which the keys are
>> different, and they get done in 1 min 30 sec on avg. But the 9th
>> reducer, getting all 100000 keys the same, takes 17 mins.
>>
>> Pankil
>>
>> On Wed, Nov 18, 2009 at 3:34 PM, Runping Qi <runping...@gmail.com> wrote:
>>
>> > Is it true that most of the 17 minutes for the reducer with the 100000
>> > same keys was taken by the sort phase? If so, that means that the
>> > sorting algorithm does not handle the special case well.
>> >
>> > On Wed, Nov 18, 2009 at 11:16 AM, Pankil Doshi <forpan...@gmail.com>
>> > wrote:
>> >
>> > > Hey Todd,
>> > >
>> > > I will attach the dataset and java source used by me. Make sure you
>> > > use 10 reducers and also use the partitioner class as I have
>> > > provided.
>> > >
>> > > Dataset-1 has smaller key length
>> > > Dataset-2 has larger key length
>> > >
>> > > When I experiment with both datasets, according to my partitioner
>> > > class, Reducer 9 (i.e. 10 if you start with 1) gets all 100000 keys
>> > > the same, and so it takes the maximum amount of time of all the
>> > > reducers (like 17 mins), whereas the remaining reducers also get
>> > > 100000 keys, but they are not all the same, and these reducers get
>> > > over in 1 min 30 sec on avg.
>> > >
>> > > Pankil
>> > >
>> > > On Tue, Nov 17, 2009 at 5:07 PM, Todd Lipcon <t...@cloudera.com>
>> > > wrote:
>> > >
>> > >> On Tue, Nov 17, 2009 at 1:54 PM, Pankil Doshi <forpan...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > With respect to imbalanced data, can anyone guide me how sorting
>> > >> > takes place in Hadoop after the Map phase.
>> > >> >
>> > >> > I did some experiments and found that if there are two reducers
>> > >> > which have the same number of keys to sort, and one reducer has
>> > >> > all its keys the same and the other has different keys, then the
>> > >> > time taken by the reducer having all keys the same is terribly
>> > >> > large compared to the other one.
>> > >>
>> > >> Hi Pankil,
>> > >>
>> > >> This is an interesting experiment you've done with results that I
>> > >> wouldn't quite expect. Do you have the java source available that
>> > >> you used to run this experiment?
>> > >>
>> > >> > Also I found that the length of my key doesn't matter in the time
>> > >> > taken to sort it.
>> > >>
>> > >> With small keys on a CPU-bound workload this is probably the case,
>> > >> since the sort would be dominated by comparison. If you were to
>> > >> benchmark keys that are 10 bytes vs keys that are 1000 bytes, I'm
>> > >> sure you'd see a difference.
>> > >>
>> > >> > I wanted some hints how sorting is done ..
>> > >>
>> > >> MapTask.java, ReduceTask.java, and Merger.java are the key places to
>> > >> look. The actual sort is a relatively basic quicksort, but there is
>> > >> plenty of complexity in the spill/shuffle/merge logic.
>> > >>
>> > >> -Todd
>> > >>
>> > >> > Pankil
>> > >> >
>> > >> > On Sun, Nov 15, 2009 at 7:25 PM, Jeff Hammerbacher
>> > >> > <ham...@cloudera.com> wrote:
>> > >> >
>> > >> > > Hey Jeff,
>> > >> > >
>> > >> > > You may be interested in the Skewed Design specification from
>> > >> > > the Pig team: http://wiki.apache.org/pig/PigSkewedJoinSpec.
>> > >> > >
>> > >> > > Regards,
>> > >> > > Jeff
>> > >> > >
>> > >> > > On Sun, Nov 15, 2009 at 2:00 PM, brien colwell
>> > >> > > <xcolw...@gmail.com> wrote:
>> > >> > >
>> > >> > > > My first thought is that it depends on the reduce logic.
>> > >> > > > If you could do the reduction in two passes, then you could
>> > >> > > > do an initial arbitrary partition for the majority key and
>> > >> > > > bring the partitions together in a second reduction (or a
>> > >> > > > map-side join). I would use a round robin strategy to assign
>> > >> > > > the arbitrary partitions.
>> > >> > > >
>> > >> > > > On Sat, Nov 14, 2009 at 11:03 PM, Jeff Zhang
>> > >> > > > <zjf...@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > Hi all,
>> > >> > > > >
>> > >> > > > > Today a problem about imbalanced data came to mind.
>> > >> > > > >
>> > >> > > > > I'd like to know how hadoop handles this kind of data,
>> > >> > > > > e.g. one key dominates the map output, say 99%. Then 99% of
>> > >> > > > > the data set will go to one reducer, and this reducer will
>> > >> > > > > become the bottleneck.
>> > >> > > > >
>> > >> > > > > Does hadoop have any better ways to handle such an
>> > >> > > > > imbalanced data set?
>> > >> > > > >
>> > >> > > > > Jeff Zhang
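[Editor's note] The quadratic behavior Todd describes at the top of the
thread is easy to reproduce outside Hadoop. Below is a minimal standalone
Java sketch (class and method names are made up for illustration; the real
code is in Pankil's reducer) contrasting String += with the StringBuffer
approach from the patch:

```java
// Standalone sketch of the quadratic-append problem: concatenating n
// values with String += re-copies the whole accumulated string on every
// iteration (O(n^2) total work), while appending into one StringBuffer
// grows the buffer with amortized-linear cost.
public class AppendDemo {

    // Quadratic: each += allocates a new String and copies everything so far.
    static String concatSlow(String[] values) {
        String sValue = "";
        for (String v : values) {
            sValue += v + '\t';
        }
        return sValue;
    }

    // Linear: append into one growable buffer, as in the patched reducer.
    static String concatFast(String[] values) {
        StringBuffer sb = new StringBuffer();
        for (String v : values) {
            sb.append(v).append('\t');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 20000 identical values stands in for a reducer that receives
        // many copies of the same key (smaller than the thread's 100000
        // so the slow path finishes quickly).
        String[] values = new String[20000];
        java.util.Arrays.fill(values, "key");

        long t0 = System.nanoTime();
        String fast = concatFast(values);
        long tFast = System.nanoTime() - t0;

        t0 = System.nanoTime();
        String slow = concatSlow(values);
        long tSlow = System.nanoTime() - t0;

        // Same output, very different cost.
        System.out.println(slow.equals(fast)); // true
        System.out.println("slow/fast ratio: " + (tSlow / Math.max(tFast, 1)));
    }
}
```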
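[Editor's note] brien colwell's two-pass, round-robin suggestion is
essentially key salting. A toy sketch of the idea follows; all names here
(SaltedKeys, saltKey, SALTS) are hypothetical, not from the thread or from
Hadoop. In a real job, pass 1 would salt the hot key on the map side so it
spreads over several reduce partitions, and pass 2 (a second reduction or a
map-side join) would strip the salt and merge the partial results:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of salting a known hot key: pass 1 rewrites the hot key
// round-robin across SALTS arbitrary partitions so no single reducer
// receives all of it; pass 2 strips the salt to regroup partial results.
public class SaltedKeys {
    static final int SALTS = 4; // number of arbitrary partitions (assumption)
    private int counter = 0;

    // Pass 1: spread the hot key round-robin; other keys pass through.
    String saltKey(String key, String hotKey) {
        if (key.equals(hotKey)) {
            return key + "#" + (counter++ % SALTS);
        }
        return key;
    }

    // Pass 2: strip the salt so partial results regroup under the real key.
    static String unsaltKey(String saltedKey) {
        int i = saltedKey.lastIndexOf('#');
        return i < 0 ? saltedKey : saltedKey.substring(0, i);
    }

    public static void main(String[] args) {
        SaltedKeys s = new SaltedKeys();
        Map<String, Integer> spread = new TreeMap<>();
        for (int i = 0; i < 8; i++) {
            spread.merge(s.saltKey("hot", "hot"), 1, Integer::sum);
        }
        // The hot key now lands on SALTS distinct reduce partitions.
        System.out.println(spread); // {hot#0=2, hot#1=2, hot#2=2, hot#3=2}
        System.out.println(unsaltKey("hot#3")); // hot
    }
}
```

The trade-off is a second pass over the hot key's partial results, which is
usually far cheaper than one reducer processing 99% of the data alone.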