Hi,

I want one output file, not multiple, but I think your reply has steered me in the right direction!

Thanks,
John

2009/5/20 Tom White <t...@cloudera.com>

> Hi John,
>
> You could do this with a map-only job (using NLineInputFormat, and
> setting the number of reducers to 0), and write the output key as
> docnameN,stat1,stat2,stat3,....stat12 and a null value. This assumes
> that you calculate all 12 statistics in one map. Each output file
> would have a single line in it.
>
> Cheers,
> Tom
>
> On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
> > Hi,
> >
> > I'm having some trouble implementing what I want to achieve. Essentially,
> > I have a large input list of documents that I want to get statistics on.
> > For each document I have 12 different stats to work out.
> >
> > So my input file is a text file with one document filepath on each line.
> > The documents are stored on a remote server. I want to fetch each
> > document and calculate certain stats from it.
> >
> > My problem is with the output.
> >
> > I want my output to be similar to this:
> >
> > docname1,stat1,stat2,stat3,....stat12
> > docname2,stat1,stat2,stat3,....stat12
> > docname3,stat1,stat2,stat3,....stat12
> > .
> > .
> > .
> > docnameN,stat1,stat2,stat3,....stat12
> >
> > I can fetch the document in my map code and perform my stats calculation
> > on it, but I don't know how to return more than one value for a key, the
> > key in this case being the document name.
> >
> > Cheers,
> > John
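
For illustration, the "more than one value for a key" part of Tom's suggestion comes down to joining the document name and its statistics into a single comma-separated string, which the map would then write as the output key with a null value. Below is a minimal sketch in plain Java, with the Hadoop classes left out so that it runs on its own. The class name StatLine, the format helper, and the use of long values for the stats are assumptions made for this example and do not come from the thread.

```java
import java.util.StringJoiner;

class StatLine {
    // Hypothetical helper: joins a document name and its statistics
    // into one line of the form docname,stat1,stat2,...
    public static String format(String docName, long[] stats) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(docName);
        for (long stat : stats) {
            joiner.add(Long.toString(stat));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the 12 computed statistics.
        long[] stats = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        System.out.println(format("docname1", stats));
    }
}
```

In an actual mapper, the returned string would presumably be wrapped in a Text key and written with a null value (for example NullWritable), as Tom describes; that wiring is not shown here because it needs the Hadoop libraries.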