problem when using combiner and MultipleOutputFormat

Xin Jing Thu, 27 Oct 2011 21:36:17 -0700

Hi all,

I am currently facing a tough problem. My job uses MultipleOutputFormat to 
write results into different folders, and I have to use a combiner to improve 
performance. In this setup the reduce phase does not work: the reducer does not 
receive any data. I searched for this issue and found a related thread:
http://lucene.472066.n3.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-td1640503.html
but it is not clear to me what the solution actually is. Is this a limitation 
of the Hadoop framework?


I found an interesting phenomenon: when I limit the map input to a small 
number of records (such as 10000), the reduce is fine; it receives data and the 
result is correct. But when the input is over a million records, the reducer 
receives nothing. My guess is that the combiner is called only once when the 
data is small, but multiple times when the data is huge.
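
To make the guess above concrete, here is a minimal sketch in plain Java. It 
does not use the Hadoop API, and the class and method names (CombinerSketch, 
sumByKey, run) are hypothetical and not taken from my job. Hadoop may run a 
combiner zero, one, or several times, so the combine step should emit the same 
key/value types it reads and leave the final reduce result unchanged however 
often it runs. The sketch only simulates that property for a simple 
sum-per-key combiner:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a combiner/reducer pair; not Hadoop code.
public class CombinerSketch {

    // Sums values per key. Input and output use the same key/value types,
    // so the output of one pass can be fed into another pass.
    static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> records) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> r : records) {
            out.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return out;
    }

    // Applies the combine step `passes` times (0 = combiner never runs),
    // then applies the same function once more as the reduce step.
    static Map<String, Long> run(List<Map.Entry<String, Long>> mapOutput, int passes) {
        List<Map.Entry<String, Long>> records = new ArrayList<>(mapOutput);
        for (int i = 0; i < passes; i++) {
            records = new ArrayList<>(sumByKey(records).entrySet());
        }
        return sumByKey(records);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput = List.of(
            Map.entry("a", 1L), Map.entry("b", 1L), Map.entry("a", 1L));
        System.out.println(run(mapOutput, 0)); // combiner skipped
        System.out.println(run(mapOutput, 1)); // combiner run once
        System.out.println(run(mapOutput, 3)); // combiner run several times
    }
}
```

All three calls print {a=2, b=1}. If a combiner changes the key or value 
format, or writes its own side output, a second pass would see data it does 
not expect, which would fit the difference I see between small and large 
inputs. This is only an assumption about the cause.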

To summarize, how can I make a combiner work together with MultipleOutputFormat? 
Any solution or suggestion is welcome.


Thanks
