Each reduce task goes through the following phases:

   1. Shuffle: copy intermediate results from all mappers and store them in
   a memory buffer. Once the memory buffer is full, the data copied from the
   mappers is merged, combined and *spilled to disk*.
   2. Merge: merge the on-disk spills.
   3. Reduce: call the reduce method for each key group.
   4. Write the output to HDFS.
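
Note that the IO of phases 1 and 2 (the "reduce -> copy" / "reduce -> sort"
status you see) happens inside the framework before your reduce method is
ever invoked; user code only runs in phases 3 and 4. A minimal, purely
illustrative reducer (class name and key/value types are hypothetical, not
from your job) just to show where those phases sit:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> values,
                            Context context)
          throws IOException, InterruptedException {
        // Phase 3: values for this key arrive already copied, merged and
        // sorted by the framework, so no shuffle/merge IO is incurred here.
        long sum = 0;
        for (LongWritable v : values) {
          sum += v.get();
        }
        // Phase 4: the framework writes this output to HDFS.
        context.write(key, new LongWritable(sum));
      }
    }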

So, to reduce IO, you can:

   - increase the size of the memory buffer used in the shuffle phase by
   setting the configuration parameter *mapred.job.shuffle.input.buffer.percent*
   to a higher value. That will reduce the number of disk spills and hence the
   IO; see the driver sketch below.
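
For example, you can set it per job in the driver (a minimal sketch; the
class name and the value 0.85 are only illustrative, not tuned
recommendations, and you can equally set the property in mapred-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TransactionsLoaderDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default is 0.70; a higher value lets the shuffle hold more map
        // output in the reducer heap before spilling to disk.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.85f);

        Job job = new Job(conf, "transactions loader");
        job.setJarByClass(TransactionsLoaderDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // mapper/reducer/output classes omitted for brevity
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }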


On Tue, Feb 7, 2012 at 2:52 AM, Marek Miglinski <mmiglin...@seven.com> wrote:

> Thanks for the reply,
>
> As it turns out that didn't help, IO is used even more as each reducer is
> copying and sorting. What are the options? Is there an option to limit
> reduce -> copy and reduce -> sort somehow?
>
>
> Thanks,
> Marek M.
>
> ________________________________
> From: Mostafa Gaber [moustafa.ga...@gmail.com]
> Sent: Monday, February 06, 2012 6:50 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Reducer IO
>
> Hello Marek,
>
> I think you can increase the number of reducers for your MR job so as to
> reduce the amount of intermediate key-value pairs assigned to each reducer.
> Note also that the number of reducers is dependent on your job and how the
> output should be produced.
>
> On Mon, Feb 6, 2012 at 11:37 AM, Marek Miglinski <mmiglin...@seven.com>
> wrote:
> Hey,
>
> I have a mapreduce job (transactions loader) and the main problem with it is
> the "reduce->copy" and "reduce->sort" phases, which take all the IO and use
> all the disk resources. What are the possible ways to reduce this load? My
> cloud settings are:
>
> ioSortFactor=80
> ioSortMb=800
> (mapredChildJavaOpts=Xmx1152m)
>
> I can lower those settings, what else can I tweak?
>
>
> Thanks,
> Marek M.
>
>
>
> --
> Best Regards,
> Mostafa Ead
>
>


-- 
Best Regards,
Mostafa Ead
