I need it because the intermediate data is also part of the solution to the problem my algorithm solves. I somehow need to log this information. The key is Text and the value is an ArrayWritable (TextArrayWritable).
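A minimal sketch of one way to make such values readable, assuming the identity-job approach quoted below: TextOutputFormat writes each key and value by calling toString() on it, and ArrayWritable in the Hadoop releases of that time did not, as far as I know, override toString(), so the values would print as object references. The class body here is hypothetical (the thread does not show the actual TextArrayWritable), the comma separator is an arbitrary choice, and it assumes hadoop-common is on the classpath.

```java
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Hypothetical version of the TextArrayWritable mentioned above;
// the real class from the thread is not shown, so this is an assumption.
public class TextArrayWritable extends ArrayWritable {

    // A no-argument constructor is needed so Hadoop can instantiate
    // the class when it deserializes values from a sequence file.
    public TextArrayWritable() {
        super(Text.class);
    }

    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }

    // TextOutputFormat calls toString() on non-Text values, so this
    // decides what a value looks like in the text output.
    @Override
    public String toString() {
        if (get() == null) {
            return "";
        }
        // toStrings() is inherited from ArrayWritable; "," is an assumed separator.
        return String.join(",", toStrings());
    }
}
```

With something like this as the value class, a second job that reads the sequence file and writes through TextOutputFormat would emit one line per record: the key, a tab (the default separator), then the comma-joined elements.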
On Tue, Sep 6, 2011 at 8:57 AM, Niels Basjes <ni...@basj.es> wrote:
> Hi,
>
> In the past I've had the same situation where I needed the data for
> debugging. Back then I chose to create a second job with simply
> SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally
> TextOutputFormat.
>
> In my situation that worked great for my purpose.
>
> --
> Kind regards,
> Niels Basjes
>
> On 6 Sep 2011 01:54, "ilyal levin" <nipponil...@gmail.com> wrote:
> >
> > OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat
> > and it works fine, but the output of the reducer is now a binary file
> > (not text), so I can't understand the data. How can I solve this? I need
> > the data (in text form) of the intermediate stages in the chain.
> >
> > Thanks
> >
> > On Tue, Sep 6, 2011 at 1:33 AM, ilyal levin <nipponil...@gmail.com> wrote:
> >>
> >> Thanks for the help.
> >>
> >> On Mon, Sep 5, 2011 at 10:50 PM, Roger Chen <rogc...@ucdavis.edu> wrote:
> >>>
> >>> The binary file will allow you to pass the output from the first
> >>> reducer to the second mapper. For example, if you output Text,
> >>> IntWritable from the first one in SequenceFileOutputFormat, then you
> >>> are able to retrieve Text, IntWritable input at the head of the second
> >>> mapper. The idea of chaining is that you already know what kind of
> >>> output the first reducer is going to give, and that you want to
> >>> perform some secondary operation on it.
> >>>
> >>> One last thing on chaining jobs: it's often worth looking to see if
> >>> you can consolidate all of your separate map and reduce tasks into a
> >>> single map/reduce operation. There are many situations where it is
> >>> more intuitive to write a number of map/reduce operations and chain
> >>> them together, but more efficient to have just a single operation.
> >>>
> >>> On Mon, Sep 5, 2011 at 12:21 PM, ilyal levin <nipponil...@gmail.com> wrote:
> >>>>
> >>>> Thanks for the reply.
> >>>> I tried it, but it creates a binary file which I cannot understand
> >>>> (I need the result of the first job).
> >>>> The other thing is: how can I use this file in the next chained
> >>>> mapper? I.e., how can I retrieve the keys and the values in the map
> >>>> function?
> >>>>
> >>>> Ilyal
> >>>>
> >>>> On Mon, Sep 5, 2011 at 7:41 PM, Joey Echeverria <j...@cloudera.com> wrote:
> >>>>>
> >>>>> Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
> >>>>>
> >>>>> -Joey
> >>>>>
> >>>>> On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin <nipponil...@gmail.com> wrote:
> >>>>> > Hi
> >>>>> > I'm trying to write a chained MapReduce program. I'm doing so with
> >>>>> > a simple loop where in each iteration I create a job and execute
> >>>>> > it, and every time the current job's output is the next job's input.
> >>>>> > How can I configure the outputFormat of the current job and the
> >>>>> > inputFormat of the next job so that I will not use the
> >>>>> > TextInputFormat (TextOutputFormat)? If I do use it, I need to parse
> >>>>> > the input file in the Map function.
> >>>>> > I.e., if possible I want the next job to "consider" the input file
> >>>>> > as <key,value> and not plain Text.
> >>>>> > Thanks a lot.
> >>>>>
> >>>>> --
> >>>>> Joseph Echeverria
> >>>>> Cloudera, Inc.
> >>>>> 443.305.9434
> >>>
> >>> --
> >>> Roger Chen
> >>> UC Davis Genome Center