Hi,

In the past I've been in the same situation, where I needed the data for debugging. Back then I chose to create a second job with simply SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally TextOutputFormat.
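A minimal sketch of such a conversion job, using the old `org.apache.hadoop.mapred` API. The class name `SequenceFileToText`, the helper `buildConf`, and the `<Text, IntWritable>` key/value types are assumptions for illustration; use whatever types the earlier job actually wrote.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical debugging job: reads a binary SequenceFile and rewrites it as text.
public class SequenceFileToText {

    // Builds the job configuration; nothing is submitted here.
    static JobConf buildConf(Path in, Path out) {
        JobConf conf = new JobConf(SequenceFileToText.class);
        conf.setJobName("sequencefile-to-text");

        // Read the binary <key, value> pairs written by the earlier job.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // Pass every pair through unchanged.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // Write each pair as "key<TAB>value" using the Writables' toString().
        conf.setOutputFormat(TextOutputFormat.class);

        // Assumption: the earlier job wrote <Text, IntWritable> pairs.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, in);
        FileOutputFormat.setOutputPath(conf, out);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: directory with SequenceFile output, args[1]: new text output directory
        JobClient.runJob(buildConf(new Path(args[0]), new Path(args[1])));
    }
}
```

Because the job only copies records, the text output is one line per `<key, value>` pair, sorted by key as usual after the shuffle.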
In my situation that worked great for my purpose.

--
Kind regards,
Niels Basjes

On 6 Sep 2011 01:54, "ilyal levin" <nipponil...@gmail.com> wrote:
>
> OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat and it works fine, but the output of the reducer is
> now a binary file (not text), so I can't read the data. How can I solve this? I need the data (in text form) of the intermediate stages in the chain.
>
> Thanks
>
>
> On Tue, Sep 6, 2011 at 1:33 AM, ilyal levin <nipponil...@gmail.com> wrote:
>>
>> Thanks for the help.
>>
>>
>> On Mon, Sep 5, 2011 at 10:50 PM, Roger Chen <rogc...@ucdavis.edu> wrote:
>>>
>>> The binary file will allow you to pass the output from the first reducer to the second mapper. For example, if you output <Text, IntWritable> pairs from the first job using SequenceFileOutputFormat, then you can read <Text, IntWritable> input directly in the second mapper. The idea of chaining is that you already know what kind of output the first reducer is going to give, and that you want to perform some secondary operation on it.
>>>
>>> One last thing on chaining jobs: it's often worth checking whether you can consolidate all of your separate map and reduce tasks into a single map/reduce operation. There are many situations where it is more intuitive to write a number of map/reduce operations and chain them together, but more efficient to have just a single operation.
>>>
>>>
>>> On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponil...@gmail.com> wrote:
>>>>
>>>> Thanks for the reply.
>>>> I tried it, but it creates a binary file which I cannot read (I need the result of the first job).
>>>> The other thing is: how can I use this file in the next chained mapper? I.e. how can I retrieve the keys and the values in the map function?
>>>>
>>>>
>>>> Ilyal
>>>>
>>>>
>>>> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>>>
>>>>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
>>>>>
>>>>> -Joey
>>>>>
>>>>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponil...@gmail.com> wrote:
>>>>> > Hi
>>>>> > I'm trying to write a chained MapReduce program. I'm doing so with a simple
>>>>> > loop where in each iteration I
>>>>> > create a job, execute it, and every time the current job's output is the next
>>>>> > job's input.
>>>>> > How can I configure the OutputFormat of the current job and the InputFormat
>>>>> > of the next job so that
>>>>> > I will not use TextInputFormat (TextOutputFormat)? Because if I do use
>>>>> > it, I need to parse the input file in the map function.
>>>>> > I.e. if possible I want the next job to "consider" the input file as
>>>>> > <key,value> and not plain Text.
>>>>> > Thanks a lot.
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> Joseph Echeverria
>>>>> Cloudera, Inc.
>>>>> 443.305.9434
>>>>
>>>
>>>
>>> --
>>> Roger Chen
>>> UC Davis Genome Center
>>
>
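
For reference, the loop described in the original question could be sketched roughly as follows, again with the old `org.apache.hadoop.mapred` API. The names `ChainedJobs`, `StepMapper` and `buildStep`, the `-step<i>` output naming and the `<Text, IntWritable>` types are all assumptions for illustration. The sketch also assumes the very first input is already a SequenceFile with those types.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical driver: each iteration's output directory is the next iteration's input.
public class ChainedJobs {

    // The mapper receives typed <Text, IntWritable> pairs; no line parsing is needed.
    public static class StepMapper extends MapReduceBase
            implements Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        public void map(Text key, IntWritable value,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            out.collect(key, value); // placeholder for the real per-step logic
        }
    }

    // Configures one step that both reads and writes SequenceFiles.
    static JobConf buildStep(Path in, Path out) {
        JobConf conf = new JobConf(ChainedJobs.class);
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setMapperClass(StepMapper.class);
        conf.setReducerClass(IdentityReducer.class); // placeholder reducer
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(conf, in);
        FileOutputFormat.setOutputPath(conf, out);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: initial SequenceFile input, args[1]: number of iterations
        Path in = new Path(args[0]);
        int steps = Integer.parseInt(args[1]);
        for (int i = 0; i < steps; i++) {
            Path out = new Path(args[0] + "-step" + i);
            JobClient.runJob(buildStep(in, out)); // blocks until the job finishes
            in = out; // this step's output feeds the next step
        }
    }
}
```

Each intermediate `-step<i>` directory stays on disk as a SequenceFile, so any of them can be converted to text afterwards for inspection.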