OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat and it works fine, but the output of the reducer is now a binary file (not text), so I can't read the data. How can I solve this? I need the data of the intermediate stages in the chain in text form.
Thanks

On Tue, Sep 6, 2011 at 1:33 AM, ilyal levin <nipponil...@gmail.com> wrote:

> Thanks for the help.
>
> On Mon, Sep 5, 2011 at 10:50 PM, Roger Chen <rogc...@ucdavis.edu> wrote:
>
>> The binary file will allow you to pass the output from the first reducer
>> to the second mapper. For example, if you output Text, IntWritable from
>> the first one in SequenceFileOutputFormat, then you are able to retrieve
>> Text, IntWritable input at the head of the second mapper. The idea of
>> chaining is that you know what kind of output the first reducer is going to
>> give already, and that you want to perform some secondary operation on it.
>>
>> One last thing on chaining jobs: it's often worth looking to see if you
>> can consolidate all of your separate map and reduce tasks into a single
>> map/reduce operation. There are many situations where it is more intuitive
>> to write a number of map/reduce operations and chain them together, but more
>> efficient to have just a single operation.
>>
>> On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponil...@gmail.com> wrote:
>>
>>> Thanks for the reply.
>>> I tried it, but it creates a binary file which I cannot understand (I
>>> need the result of the first job).
>>> The other thing is: how can I use this file in the next chained mapper?
>>> I.e., how can I retrieve the keys and the values in the map function?
>>>
>>> Ilyal
>>>
>>> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>
>>>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
>>>>
>>>> -Joey
>>>>
>>>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponil...@gmail.com> wrote:
>>>> > Hi
>>>> > I'm trying to write a chained mapreduce program. I'm doing so with a
>>>> > simple loop where in each iteration I create a job and execute it, and
>>>> > every time the current job's output is the next job's input.
>>>> > How can I configure the outputFormat of the current job and the
>>>> > inputFormat of the next job so that I will not use the TextInputFormat
>>>> > (TextOutputFormat)? If I do use it, I need to parse the input file in
>>>> > the Map function.
>>>> > I.e., if possible I want the next job to "consider" the input file as
>>>> > <key,value> and not plain Text.
>>>> > Thanks a lot.
>>>>
>>>> --
>>>> Joseph Echeverria
>>>> Cloudera, Inc.
>>>> 443.305.9434
>>
>> --
>> Roger Chen
>> UC Davis Genome Center
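
For reference, the setup discussed in this thread (a loop of jobs where each job's output is the next job's input, with SequenceFile formats in between) could look roughly like the sketch below. It is only an illustration under assumptions, not anyone's actual code from the thread: the class names ChainDriver, StageMapper and StageReducer, the stagePath helper and the "stage-N" directory layout are all made up, the Text/IntWritable types are taken from Roger's example, and Job.getInstance assumes Hadoop 2 or later (older releases used new Job(conf, name)). It also assumes the stage-0 directory already holds a SequenceFile of Text/IntWritable pairs.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainDriver {

    // Hypothetical mapper: with SequenceFileInputFormat the key and value
    // arrive already typed (Text, IntWritable), so nothing has to be parsed.
    public static class StageMapper
            extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Hypothetical reducer: sums the values per key as a placeholder operation.
    public static class StageReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Assumed directory layout: <baseDir>/stage-0, <baseDir>/stage-1, ...
    public static Path stagePath(String baseDir, int stage) {
        return new Path(baseDir, "stage-" + stage);
    }

    public static void main(String[] args) throws Exception {
        String baseDir = args[0];
        int stages = Integer.parseInt(args[1]);
        Configuration conf = new Configuration();

        for (int i = 1; i <= stages; i++) {
            Job job = Job.getInstance(conf, "chain-stage-" + i);
            job.setJarByClass(ChainDriver.class);
            job.setMapperClass(StageMapper.class);
            job.setReducerClass(StageReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Binary, typed key/value files between the stages.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);

            // The previous stage's output directory is this stage's input.
            FileInputFormat.addInputPath(job, stagePath(baseDir, i - 1));
            FileOutputFormat.setOutputPath(job, stagePath(baseDir, i));

            if (!job.waitForCompletion(true)) {
                System.exit(1);
            }
        }
    }
}
```

To look at an intermediate stage as text without changing the job, `hadoop fs -text` on one of the part files in a stage directory should print the SequenceFile records as readable key/value lines, provided the key and value classes are on the classpath.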