Does anyone have a pointer to code that allows the map to save data in intermediate files, for use in a later map/reduce job? I have been looking for an example and cannot find one.
I have investigated MultipleOutputFormat and MultipleOutputs. Because I am using version 0.18.3, I don't have MultipleOutputs. The problem with MultipleOutputFormat is that the data I want to save is a different format from the data I want to pass to the Reducer. I have also tried opening a sequence file directly from the mapper, but I am concerned that this is not fault tolerant.

The process currently is:

Job1: Mapper: reads complicated data, saves out data structure.
Job2: Mapper: reads saved data, processes and sends data to Reducer 2.
Job3: Mapper: reads saved data, processes and sends data to Reducer 3.

I would like to combine the first two steps, so the process is:

Job1: Mapper: reads complicated data, saves out data structure, and passes processed data to Reducer 2.
Job2: Mapper: reads saved data, processes and sends to Reducer 3.

--gordon

On Sun, Nov 22, 2009 at 9:27 PM, Jason Venner <jason.had...@gmail.com> wrote:

> You can manually write the map output to a new file, there are a number of
> examples of opening a sequence file and writing to it on the web or in the
> example code for various hadoop books.
>
> You can also disable the removal of intermediate data, which will result in
> potentially large amounts of data being left in the mapred.local.dir.
>
> On Sun, Nov 22, 2009 at 3:56 PM, Gordon Linoff <glin...@gmail.com> wrote:
>
>> I am starting to learn Hadoop, using the Yahoo virtual machine with
>> version 0.18.
>>
>> My question is rather simple. I would like to execute a map/reduce job.
>> In addition to getting the results from the reduce, I would also like to
>> save the intermediate results from the map in another HDFS file. Is this
>> possible?
>>
>> --gordon
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals