Xin,

You probably just need to write a dedicated Combiner class instead of reusing your Reducer class as the combiner. In an MR job, you must guarantee that the combiner emits K-V pairs of the same types as the reducer's input. Do not write to files directly from your combiner; that is why you need a separate class implementation to perform the optimization.
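To make the contract concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: `CombinerContractSketch`, `Pair`, `combine` and `reduce` are hypothetical names, and summing counts is an assumed aggregation, not your job's logic. The point it illustrates is that the combiner takes map-output pairs and returns pairs of the same type, with no file output, so the final result is the same whether the framework runs it zero, one, or several times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerContractSketch {

    /** Hypothetical (key, value) pair, standing in for the map output type. */
    static final class Pair {
        final String key;
        final long value;

        Pair(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Combiner: consumes map-output pairs and emits pairs of the SAME type.
     * It only pre-aggregates; it never writes to files or side outputs.
     */
    static List<Pair> combine(List<Pair> mapOutput) {
        Map<String, Long> partial = new TreeMap<>();
        for (Pair p : mapOutput) {
            partial.merge(p.key, p.value, Long::sum);
        }
        List<Pair> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            out.add(new Pair(e.getKey(), e.getValue()));
        }
        return out;
    }

    /**
     * Reducer: final aggregation. In a real job this is the only stage
     * that would route results to the different output folders.
     */
    static Map<String, Long> reduce(List<Pair> reducerInput) {
        Map<String, Long> totals = new TreeMap<>();
        for (Pair p : reducerInput) {
            totals.merge(p.key, p.value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = Arrays.asList(
                new Pair("a", 1), new Pair("b", 2), new Pair("a", 3));

        // The combiner may be applied zero, one, or several times.
        Map<String, Long> none = reduce(mapOutput);
        Map<String, Long> once = reduce(combine(mapOutput));
        Map<String, Long> twice = reduce(combine(combine(mapOutput)));

        System.out.println(none.equals(once) && once.equals(twice));
    }
}
```

In an actual job, the equivalent of `combine` would live in its own class registered as the combiner, while the Reducer class alone keeps the multiple-output logic.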
On Fri, Oct 28, 2011 at 10:04 AM, Xin Jing <xinj...@beyondfun.net> wrote:
>
> Hi all,
> I am currently encountering a tough problem, my job use MultipleOutputFormat
> to output result into different folder, and I have to use a combiner to
> enhance performance. In this situation, reduce does not work, reduce cannot
> receive any data. I searched this issue and found a related
> topic, http://lucene.472066.n3.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-td1640503.html ,
> but not get clear what the solution is really. Seems it is the constraint of
> hadoop framework?
> I found a interesting phenomenon, when I limit the map input record to a
> small number (such as 10000), the reduce is ok, it can receive data and the
> result is correct. But when the input is over a million record, the reduce
> receive nothing. I guess the reason is the combiner only be called once when
> data is small while combiner be called multiple time when data is huge.
> To summary, how can I make combiner feasible while using
> MultipleOutputFormat? Any solution or suggestion is welcome.
>
> Thanks
>

--
Harsh J