Unless you need the hashing/sorting provided by the reduce phase, I'd recommend placing your logic in your mapper and, when setting up your job, calling JobConf#setNumReduceTasks(0), so that the reduce phase won't be executed. In that case, any records emitted by your mapper will be written to the output.
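To make that concrete, here is a minimal sketch of a map-only job against the old org.apache.hadoop.mapred API (the one in the 0.19 docs linked below). The class names, the pass-through mapper body, and the input/output paths are illustrative; the real mapper would run Tika and emit the extracted text. With setNumReduceTasks(0), each mapper's output is written directly to HDFS, one part file per map task.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MapOnlyJob {

    // Hypothetical mapper: this is where the extraction logic would live.
    // Here it just passes each input line through unchanged.
    public static class ExtractMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // A real implementation would call Tika here and
            // collect the extracted text instead of the raw value.
            output.collect(new Text(key.toString()), value);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only-extract");

        conf.setMapperClass(ExtractMapper.class);
        conf.setNumReduceTasks(0); // skip the reduce phase entirely

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Paths taken from the command line; purely illustrative.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

Note that with zero reduce tasks the shuffle and sort never happen, so the mapper's output types do not need to match any reducer's input types, and records appear in the output in the order the mapper emitted them.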
http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumReduceTasks(int)

On Mon, Apr 20, 2009 at 10:25 PM, Mark Kerzner <markkerz...@gmail.com> wrote:
> Hi,
>
> in an MR step, I need to extract text from various files (using Tika). I
> have put text extraction into reduce(), because I am writing the extracted
> text to the output on HDFS. But now it occurs to me that I might as well
> have put it into map() and have default reduce() which will write every
> map() result out, is that true?
>
> Thank you,
> Mark