Unless you need the partitioning and sorting that the shuffle/reduce phase
provides, I'd recommend placing your logic in your mapper and, when setting
up your job, calling JobConf#setNumReduceTasks(0) so that the reduce phase
isn't run at all. In that case, any records emitted by your mapper are
written directly to the job output.

http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumReduceTasks(int)
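As a rough illustration, here's a minimal sketch of a map-only job using the old
mapred API from that link. The class names (ExtractTextJob, ExtractMapper) and the
extractWithTika() helper are hypothetical placeholders -- swap in your actual Tika
extraction and adjust the key/value types to your input format.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ExtractTextJob {

  // Mapper that emits (record offset, extracted text). The extractWithTika()
  // helper below is just a placeholder for your real Tika call.
  public static class ExtractMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      String extracted = extractWithTika(value);   // hypothetical helper
      output.collect(new Text(key.toString()), new Text(extracted));
    }

    private String extractWithTika(Text raw) {
      // Placeholder: run Tika's parser over the raw content and return plain text.
      return raw.toString();
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(ExtractTextJob.class);
    conf.setJobName("extract-text");

    conf.setMapperClass(ExtractMapper.class);
    conf.setNumReduceTasks(0);   // map-only: mapper output goes straight to HDFS

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

With zero reduce tasks there is no shuffle or sort, so each mapper's output is
written directly to one part file per map task in the output directory.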


On Mon, Apr 20, 2009 at 10:25 PM, Mark Kerzner <markkerz...@gmail.com> wrote:
> Hi,
>
> in an MR step, I need to extract text from various files (using Tika). I
> have put the text extraction into reduce(), because I am writing the extracted
> text to the output on HDFS. But now it occurs to me that I might as well
> have put it into map() and used the default reduce(), which would write every
> map() result out. Is that true?
>
> Thank you,
> Mark
>
