Re: problem when using combiner and MultipleOutputFormat

Harsh J Thu, 27 Oct 2011 23:11:55 -0700

Xin,

You probably just need to write a dedicated Combiner class instead of
reusing your Reducer class as the combiner. In an MR job, you must
guarantee that the combiner emits the same type of K-V pairs that the
reducer takes as input. Do not write to files directly from your
combiner; that is why you need a separate class implementation to
perform the optimization.
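
To illustrate the contract described above, here is a rough sketch in
plain Java. It uses no Hadoop classes at all; CombinerSketch, pair,
combine and reduce are hypothetical names made up for this example, and
the per-key ".txt" naming only stands in for whatever your
MultipleOutputFormat setup does. The point is that the combiner takes
and returns the same (String, Long) pairs, so the framework can run it
zero, one, or several times, and only the reducer decides where output
lands.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the combiner contract; no Hadoop classes are used.
// All names here are hypothetical and exist only for illustration.
class CombinerSketch {

    // Helper that builds one immutable (key, value) pair.
    public static Map.Entry<String, Long> pair(String key, long value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, Long.valueOf(value));
    }

    // Combiner: (String, Long) pairs in, (String, Long) pairs out.
    // It only pre-aggregates and never writes to any file.
    public static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.add(pair(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Reducer: the only stage that picks an output destination.
    // The returned map models "output name -> line written" (assumed naming).
    public static Map<String, String> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, String> files = new TreeMap<>();
        // Reuses the same per-key summation, then routes each key to its own output.
        for (Map.Entry<String, Long> e : combine(pairs)) {
            files.put(e.getKey() + ".txt", e.getKey() + "\t" + e.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput =
                Arrays.asList(pair("a", 1), pair("b", 2), pair("a", 3));
        // Combiner applied zero, one, and two times before the reducer.
        System.out.println(reduce(mapOutput));
        System.out.println(reduce(combine(mapOutput)));
        System.out.println(reduce(combine(combine(mapOutput))));
    }
}
```

Because combine() preserves the pair type, the three calls in main feed
the reducer equivalent data regardless of how often the combiner ran. A
Reducer class that writes files itself cannot be dropped into that slot
safely.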


On Fri, Oct 28, 2011 at 10:04 AM, Xin Jing <xinj...@beyondfun.net> wrote:
>
> Hi all,
> I am currently facing a tough problem. My job uses MultipleOutputFormat
> to write results into different folders, and I have to use a combiner to
> improve performance. In this situation, reduce does not work: it does not
> receive any data. I searched for this issue and found a related
> topic, http://lucene.472066.n3.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-td1640503.html ,
> but it is not clear to me what the solution actually is. It seems to be a
> constraint of the Hadoop framework?
> I found an interesting phenomenon: when I limit the map input to a
> small number of records (such as 10000), the reduce is fine; it receives
> data and the result is correct. But when the input is over a million
> records, the reduce receives nothing. I guess the reason is that the
> combiner is called only once when the data is small, while it is called
> multiple times when the data is huge.
> To summarize, how can I make a combiner work while using
> MultipleOutputFormat? Any solution or suggestion is welcome.
>
> Thanks
>



-- 
Harsh J
