How many reduce tasks do you have? Look into increasing mapred.reduce.parallel.copies, the number of parallel threads each reduce task uses to fetch map outputs, from the default of 5 to something more like 20 or 30. With 300 map outputs to pull per reducer, only 5 copies in flight at a time could easily be the bottleneck.
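A minimal sketch of what that could look like in hadoop-site.xml (the value of 20 is just a starting point, not a recommendation; tune it for your cluster size and network):

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
  <!-- default is 5; a higher value lets each reduce task fetch more
       map outputs concurrently during the shuffle -->
</property>

You can also override it per job on the command line, e.g.

  hadoop jar yourjob.jar YourMain -Dmapred.reduce.parallel.copies=20 <args>

assuming your job is run through ToolRunner/GenericOptionsParser so -D properties are picked up (yourjob.jar and YourMain are placeholders here).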
- Aaron

On Fri, Dec 5, 2008 at 10:00 PM, Songting Chen <[EMAIL PROTECTED]> wrote:

> A little more information:
>
> We optimized our Map process quite a bit, so that now the shuffle has
> become the bottleneck.
>
> 1. There are 300 map tasks (128 MB block size), each taking about 13 sec.
> 2. The reducer starts running at a very late stage (when 80% of the maps
> are done).
> 3. Copying the 300 map outputs (the shuffle) takes as long as the entire
> map phase, although each map output is only about 50 KB.
>
> --- On Fri, 12/5/08, Alex Loddengaard <[EMAIL PROTECTED]> wrote:
>
> > From: Alex Loddengaard <[EMAIL PROTECTED]>
> > Subject: Re: slow shuffle
> > To: core-user@hadoop.apache.org
> > Date: Friday, December 5, 2008, 11:43 AM
> >
> > These configuration options will be useful:
> >
> > <property>
> >   <name>mapred.job.shuffle.merge.percent</name>
> >   <value>0.66</value>
> >   <description>The usage threshold at which an in-memory merge will
> >   be initiated, expressed as a percentage of the total memory
> >   allocated to storing in-memory map outputs, as defined by
> >   mapred.job.shuffle.input.buffer.percent.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.job.shuffle.input.buffer.percent</name>
> >   <value>0.70</value>
> >   <description>The percentage of memory to be allocated from the
> >   maximum heap size to storing map outputs during the shuffle.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.job.reduce.input.buffer.percent</name>
> >   <value>0.0</value>
> >   <description>The percentage of memory, relative to the maximum heap
> >   size, to retain map outputs during the reduce. When the shuffle is
> >   concluded, any remaining map outputs in memory must consume less
> >   than this threshold before the reduce can begin.
> >   </description>
> > </property>
> >
> > How long did the shuffle take relative to the rest of the job?
> >
> > Alex
> >
> > On Fri, Dec 5, 2008 at 11:17 AM, Songting Chen <[EMAIL PROTECTED]> wrote:
> >
> > > We encountered a bottleneck during the shuffle phase. However, there
> > > is not much data to be shuffled across the network at all: less than
> > > 10 MB in total (the combiner aggregated most of the data).
> > >
> > > Are there any parameters or anything we can tune to improve the
> > > shuffle performance?
> > >
> > > Thanks,
> > > -Songting