it takes 50% of the total time.
--- On Fri, 12/5/08, Alex Loddengaard <[EMAIL PROTECTED]> wrote: > From: Alex Loddengaard <[EMAIL PROTECTED]> > Subject: Re: slow shuffle > To: core-user@hadoop.apache.org > Date: Friday, December 5, 2008, 11:43 AM > These configuration options will be useful: > > <property> > > > <name>mapred.job.shuffle.merge.percent</name> > > <value>0.66</value> > > <description>The usage threshold at which an > in-memory merge will be > > initiated, expressed as a percentage of the total > memory allocated to > > storing in-memory map outputs, as defined by > > mapred.job.shuffle.input.buffer.percent. > > </description> > > </property> > > > > <property> > > > <name>mapred.job.shuffle.input.buffer.percent</name> > > <value>0.70</value> > > <description>The percentage of memory to be > allocated from the maximum > > heap > > size to storing map outputs during the shuffle. > > </description> > > </property> > > > > <property> > > > <name>mapred.job.reduce.input.buffer.percent</name> > > <value>0.0</value> > > <description>The percentage of memory- > relative to the maximum heap size- > > to > > retain map outputs during the reduce. When the > shuffle is concluded, any > > remaining map outputs in memory must consume less > than this threshold > > before > > the reduce can begin. > > </description> > > </property> > > > > How long did the shuffle take relative to the rest of the > job? > > Alex > > On Fri, Dec 5, 2008 at 11:17 AM, Songting Chen > <[EMAIL PROTECTED]>wrote: > > > We encountered a bottleneck during the shuffle phase. > However, there is not > > much data to be shuffled across the network at all - > total less than > > 10MBytes (the combiner aggregated most of the data). > > > > Are there any parameters or anything we can tune to > improve the shuffle > > performance? > > > > Thanks, > > -Songting > >