Hi Ajit,

You could experiment with a higher value of "io.sort.mb" so that the combiner
is more effective. However, if your combiner does not actually 'reduce' the
number of records, it will not help. You will also have to increase the Java
heap size (mapred.child.java.opts) so that your tasks don't run out of memory.
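
A minimal sketch of how those two settings might be applied in a job driver,
assuming the old mapred.* parameter names in use at the time (Hadoop 0.20/1.x
era) and the new-API Job class; the values are only illustrative, not
recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SortBufferTuningExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Larger in-memory sort buffer, so more map output is combined
            // before being spilled to disk (default is 100 MB).
            conf.set("io.sort.mb", "256");

            // Give each task JVM enough heap to hold the larger sort buffer
            // plus the task's own working memory.
            conf.set("mapred.child.java.opts", "-Xmx1024m");

            Job job = Job.getInstance(conf, "wide-map-output-job");
            // ... set mapper, combiner, reducer, input/output paths as usual ...
        }
    }

The same properties can of course go into mapred-site.xml or be passed on the
command line with -D if you prefer not to hard-code them.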
Regards,
Anand

On 21-Feb-2012, at 3:09 PM, Ajit Ratnaparkhi wrote:

> Hi,
>
> This is about a typical pattern of map-reduce jobs.
>
> There are some map-reduce jobs in which the map phase generates more records
> than it receives as input; at the reduce phase this data shrinks a lot, and
> the final output of the reduce is very small.
> E.g. for each input record, map generates approx. 100 output records (one
> output record is approx. the same size as one input record). A combiner is
> applied, the map output is shuffled and reaches the reducer, where it is
> reduced to a very small output (say less than 0.1% of the input data size).
>
> The execution time of this kind of job (where the map output is larger than
> its input) is considerably higher than that of jobs with the same or fewer
> map output records for the same input data.
>
> Has anybody worked on optimizing such jobs? Any configuration tuning which
> might help here?
>
> -Ajit