Thanks for your answer. I am using a different combiner and reducer. As I said in my previous mail, when the data set is small the job works fine and the result is correct. So I can conclude that the functionality of my job is OK, right?
I do not understand what you mean by "Do not output to files directly from your combiner". Could you give me more hints? In my combiner code, I am using output.collect() to output my result. Am I misusing it?

________________________________________
From: Harsh J [ha...@cloudera.com]
Sent: Friday, October 28, 2011 2:11 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: problem when using combiner and MultipleOutputFormat

Xin,

You probably just need to write a special Combiner class instead of reusing your Reducer class for combiner purposes. In an MR job, you need to specifically guarantee that the combiner outputs the same type of K-V pairs as the reducer's input. Do not output to files directly from your combiner, and that is why you'd need a different class impl. performing the optimization.

On Fri, Oct 28, 2011 at 10:04 AM, Xin Jing <xinj...@beyondfun.net> wrote:
>
> Hi all,
> I am currently encountering a tough problem. My job uses MultipleOutputFormat
> to output results into different folders, and I have to use a combiner to
> improve performance. In this situation, reduce does not work: reduce cannot
> receive any data. I searched for this issue and found a related
> topic,
> http://lucene.472066.n3.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-td1640503.html
> ,
> but it is not clear to me what the solution really is. It seems to be a
> constraint of the hadoop framework?
> I found an interesting phenomenon: when I limit the map input records to a
> small number (such as 10000), the reduce is OK; it receives data and the
> result is correct. But when the input is over a million records, the reduce
> receives nothing. I guess the reason is that the combiner is only called once
> when the data is small, while the combiner is called multiple times when the
> data is huge.
> To summarize, how can I make the combiner feasible while using
> MultipleOutputFormat? Any solution or suggestion is welcome.
>
> Thanks
>

--
Harsh J
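
The advice quoted above (a combiner must emit the same key/value types it consumes, and the final output should be produced only in the reducer) can be illustrated with a small sketch. This is plain Java and does not use the Hadoop API: `PairCollector`, `CombinerSketch`, and the sample data are hypothetical names made up for illustration. It assumes a simple per-key sum and shows that such a combiner gives the same final result whether the framework applies it zero, one, or several times, which is the property a job needs because the number of combiner runs is not guaranteed.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an output collector; this is NOT Hadoop's interface.
interface PairCollector<K, V> {
    void collect(K key, V value);
}

final class CombinerSketch {

    // Combiner: consumes (String, Long) and emits (String, Long), the same
    // types the map phase produces, so it can safely be applied repeatedly.
    static void combine(String key, List<Long> values, PairCollector<String, Long> out) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        out.collect(key, sum);
    }

    // Reducer: the only step that produces the final (String, String) form.
    static void reduce(String key, List<Long> values, PairCollector<String, String> out) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        out.collect(key, "total=" + sum);
    }

    // Groups pairs by key, standing in for the framework's sort/shuffle step.
    static Map<String, List<Long>> group(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Made-up map output used only for this illustration.
    static List<Map.Entry<String, Long>> sample() {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("a", 1L));
        pairs.add(new AbstractMap.SimpleEntry<>("b", 2L));
        pairs.add(new AbstractMap.SimpleEntry<>("a", 3L));
        pairs.add(new AbstractMap.SimpleEntry<>("b", 4L));
        pairs.add(new AbstractMap.SimpleEntry<>("a", 5L));
        return pairs;
    }

    // Applies the combiner `combinerPasses` times, then the reducer once.
    static Map<String, String> run(List<Map.Entry<String, Long>> mapOutput, int combinerPasses) {
        List<Map.Entry<String, Long>> current = mapOutput;
        for (int i = 0; i < combinerPasses; i++) {
            List<Map.Entry<String, Long>> next = new ArrayList<>();
            for (Map.Entry<String, List<Long>> e : group(current).entrySet()) {
                combine(e.getKey(), e.getValue(),
                        (k, v) -> next.add(new AbstractMap.SimpleEntry<>(k, v)));
            }
            current = next;
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : group(current).entrySet()) {
            reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        // All three lines print the same map: the combiner is only an optimization.
        System.out.println(run(sample(), 0));
        System.out.println(run(sample(), 1));
        System.out.println(run(sample(), 3));
    }
}
```

If the combiner instead emitted the reducer's output type, or wrote its records somewhere other than its collector, a second combiner pass or the reducer would not receive the (String, Long) pairs it expects. Whether that is the cause of the empty reduce input described in this thread is not confirmed here.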