Re: slow shuffle

2008-12-06 Thread Devaraj Das



On 12/6/08 11:11 PM, "Songting Chen" <[EMAIL PROTECTED]> wrote:

> That's cool.
> 
> Update on Issue 2:
> 
> I accidentally changed the number of reducers from 3 to 1. The problem is gone!
> That one reducer overlaps well with the map phase and copies the 300 small map
> outputs quite fast.
> 
> So when there is a large number of small map outputs, I'd use only 1 reducer.
> (That doesn't really make sense - there may be some internal code issue.)
> 
By the way, the approach of minimizing the number of reducers does make
sense. If your map outputs are small, you should consider minimizing the
number of reducers: you cut certain per-task costs in the framework and
thereby improve the runtime of your job.

> Thanks,
> -Songting 
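
A minimal sketch of that single-reducer setup with the old "mapred" JobConf
API (the driver class, job name, paths, and the identity map/reduce defaults
are illustrative assumptions, not from this thread):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SmallOutputsJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SmallOutputsJob.class);
        conf.setJobName("small-outputs");
        // With many small map outputs, a single reducer keeps the number of
        // shuffle fetches (and the per-fetch framework overhead) minimal.
        conf.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // identity mapper/reducer by default
      }
    }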

Re: slow shuffle

2008-12-06 Thread Songting Chen
That's cool.

Update on Issue 2:

I accidentally changed the number of reducers from 3 to 1. The problem is gone!
That one reducer overlaps well with the map phase and copies the 300 small map
outputs quite fast.

So when there is a large number of small map outputs, I'd use only 1 reducer.
(That doesn't really make sense - there may be some internal code issue.)

Thanks,
-Songting 


--- On Sat, 12/6/08, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> http://issues.apache.org/jira/browse/HADOOP-3136 should help you there;
> it's pretty close to getting into 0.20.

Re: slow shuffle

2008-12-06 Thread Arun C Murthy


On Dec 5, 2008, at 2:43 PM, Songting Chen wrote:

> To summarize the slow shuffle issue:
>
> 1. I think one problem is that the Reducer starts very late in the process,
> slowing the entire job significantly.
>
>    Is there a way to let the reducer start earlier?

http://issues.apache.org/jira/browse/HADOOP-3136 should help you there; it's
pretty close to getting into 0.20.

Arun

> 2. Copying 300 files of 30 KB each took 3 minutes in total (after all maps
> finished). What happens behind the scenes really puzzles me (note that
> sorting takes < 1 sec).
>
> Thanks,
> -Songting




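
For the "start the reducers earlier" question above, later Hadoop releases
also expose the scheduling point directly as a job parameter; a sketch in
hadoop-site.xml form, assuming a version that supports
mapred.reduce.slowstart.completed.maps (check your build's default config
before relying on it):

    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <!-- Fraction of the job's maps that must complete before its reducers
           are scheduled; lower values start the shuffle earlier. -->
      <value>0.05</value>
    </property>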

Re: slow shuffle

2008-12-05 Thread Songting Chen
To summarize the slow shuffle issue:

1. I think one problem is that the Reducer starts very late in the process,
slowing the entire job significantly.

   Is there a way to let the reducer start earlier?

2. Copying 300 files of 30 KB each took 3 minutes in total (after all maps
finished). What happens behind the scenes really puzzles me (note that sorting
takes < 1 sec).

Thanks,
-Songting

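
A rough back-of-envelope on point 2, using only the numbers reported in this
thread (an estimate, not a measurement):

    300 map outputs x 3 reducers  =  900 fetches in ~180 s  =>  ~0.2 s/fetch
    < 10 MB total shuffle data in ~180 s                    =>  < 60 KB/s

That effective throughput is far below even 100 Mbit/s Ethernet, which
suggests the time goes to per-fetch overhead (scheduling of copy events and
HTTP connection setup for tiny transfers), not to moving data.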


Re: slow shuffle

2008-12-05 Thread Songting Chen
I think one of the issues is that the Reducer starts very late in the process,
slowing the entire job significantly.

Is there a way to let the reducer start earlier?




Re: slow shuffle

2008-12-05 Thread Songting Chen
We have 4 testing data nodes with 3 reduce tasks. The parallel.copies parameter
has been increased to 20, 30, even 50, but it doesn't really help...




Re: slow shuffle

2008-12-05 Thread Aaron Kimball
How many reduce tasks do you have? Look into increasing
mapred.reduce.parallel.copies from the default of 5 to something more like
20 or 30.

- Aaron

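
A sketch of the same setting applied per job through the JobConf API (the
driver class name is a placeholder; only the property name and its default of
5 come from this thread):

    import org.apache.hadoop.mapred.JobConf;

    // In the job driver, before submission:
    JobConf conf = new JobConf(MyDriver.class);
    // Number of parallel threads each reducer uses to fetch map outputs.
    conf.setInt("mapred.reduce.parallel.copies", 20);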


Re: slow shuffle

2008-12-05 Thread Songting Chen
A little more information:

We optimized our Map process quite a bit, so the Shuffle has now become the
bottleneck.

1. There are 300 map tasks (128 MB block size); each takes about 13 sec.
2. The Reducer starts running at a very late stage (when 80% of the maps are done).
3. Copying the 300 map outputs (the shuffle) takes as long as the entire map
phase, although each map output is only about 50 KB.







Re: slow shuffle

2008-12-05 Thread Songting Chen
It takes 50% of the total time.


--- On Fri, 12/5/08, Alex Loddengaard <[EMAIL PROTECTED]> wrote:

> How long did the shuffle take relative to the rest of the job?


Re: slow shuffle

2008-12-05 Thread Alex Loddengaard
These configuration options will be useful:


> <property>
>   <name>mapred.job.shuffle.merge.percent</name>
>   <value>0.66</value>
>   <description>The usage threshold at which an in-memory merge will be
>   initiated, expressed as a percentage of the total memory allocated to
>   storing in-memory map outputs, as defined by
>   mapred.job.shuffle.input.buffer.percent.</description>
> </property>
>
> <property>
>   <name>mapred.job.shuffle.input.buffer.percent</name>
>   <value>0.70</value>
>   <description>The percentage of memory to be allocated from the maximum heap
>   size to storing map outputs during the shuffle.</description>
> </property>
>
> <property>
>   <name>mapred.job.reduce.input.buffer.percent</name>
>   <value>0.0</value>
>   <description>The percentage of memory- relative to the maximum heap size-
>   to retain map outputs during the reduce. When the shuffle is concluded, any
>   remaining map outputs in memory must consume less than this threshold before
>   the reduce can begin.</description>
> </property>

How long did the shuffle take relative to the rest of the job?

Alex

On Fri, Dec 5, 2008 at 11:17 AM, Songting Chen <[EMAIL PROTECTED]>wrote:

> We encountered a bottleneck during the shuffle phase. However, there is not
> much data to be shuffled across the network at all - less than 10 MB in total
> (the combiner aggregated most of the data).
>
> Are there any parameters or anything we can tune to improve the shuffle
> performance?
>
> Thanks,
> -Songting
>
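
These cluster-wide defaults can also be overridden per job; a minimal sketch
against the JobConf API (the 1.0 reduce-side buffer value is an illustrative
choice for a shuffle this small, not a recommendation from the thread):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
    conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
    // Retain map outputs in reducer memory through the reduce phase; with
    // well under 10 MB of shuffle data this avoids spilling to disk at all.
    conf.setFloat("mapred.job.reduce.input.buffer.percent", 1.0f);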