Does anyone have a pointer to code that allows the map to save data in
intermediate files, for use in a later map/reduce job?  I have been looking
for an example and cannot find one.

I have investigated MultipleOutputFormat and MultipleOutputs.  Because I am
using version 0.18.3, I don't have MultipleOutputs.  The problem with
MultipleOutputFormat is that the data I want to save is in a different
format from the data I want to pass to the Reducer.  I have also tried
opening a sequence file directly from the mapper, but I am concerned that
this is not fault tolerant.

The process currently is:

Job1:  Mapper:  reads complicated data, saves out data structure.
Job2:  Mapper:  reads saved data, processes and sends data to Reducer 2.
Job3:  Mapper:  reads saved data, processes and sends data to Reducer 3.

I would like to combine the first two steps, so the process is:

Job1:  Mapper:  reads complicated data, saves out data structure, and passes
processed data to Reducer 2.
Job2:  Mapper:  reads saved data, processes and sends to Reducer 3.
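
One common way to address the fault-tolerance concern about a side file
written from the mapper is to have each task attempt write into its own
temporary location and promote the file only once the attempt succeeds.
The sketch below illustrates that idea in plain Java against the local
filesystem.  It is not Hadoop code: the names SideFileSketch, writeAttempt,
and commit, and the _temporary/<attemptId> layout, are assumptions chosen
for illustration.  In Hadoop itself, FileOutputFormat.getWorkOutputPath(JobConf)
appears to be the intended hook for such side files, but that should be
checked against the 0.18.3 API before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Illustration only: attempt-scoped side files promoted on success.
class SideFileSketch {

    // Write the records into a directory private to this task attempt.
    // Nothing becomes visible in outputDir itself at this point.
    static Path writeAttempt(Path outputDir, String attemptId, String partName,
                             List<String> records) throws IOException {
        Path workDir = outputDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(workDir);
        Path tmp = workDir.resolve(partName);
        Files.write(tmp, records, StandardCharsets.UTF_8);
        return tmp;
    }

    // Promote a finished attempt file into outputDir. If a speculative
    // duplicate already committed the same part, it is replaced.
    static Path commit(Path outputDir, Path attemptFile) throws IOException {
        Path target = outputDir.resolve(attemptFile.getFileName());
        return Files.move(attemptFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job1-side");
        // attempt_0 "fails": its file is written but never committed.
        writeAttempt(out, "attempt_0", "part-00000", Arrays.asList("partial"));
        // attempt_1 succeeds and is committed.
        Path tmp = writeAttempt(out, "attempt_1", "part-00000",
                                Arrays.asList("k1\tv1", "k2\tv2"));
        Path done = commit(out, tmp);
        System.out.println(Files.readAllLines(done, StandardCharsets.UTF_8));
    }
}
```

Because a failed or killed attempt never reaches commit, its partial file
stays under _temporary and a later job reading the output directory only
sees files from attempts that finished.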

--gordon



On Sun, Nov 22, 2009 at 9:27 PM, Jason Venner <jason.had...@gmail.com> wrote:

> You can manually write the map output to a new file, there are a number of
> examples of opening a sequence file and writing to it on the web or in the
> example code for various hadoop books.
>
> You can also disable the removal of intermediate data, which will result in
> potentially large amounts of data being left in the mapred.local.dir.
>
>
>
> On Sun, Nov 22, 2009 at 3:56 PM, Gordon Linoff <glin...@gmail.com> wrote:
>
>> I am starting to learn Hadoop, using the Yahoo virtual machine with
>> version 0.18.
>>
>> My question is rather simple.  I would like to execute a map/reduce job.
>> In addition to getting the results from the reduce, I would also like to
>> save the intermediate results from the map in another HDFS file.  Is this
>> possible?
>>
>> --gordon
>>
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
