Hi John,

You could do this with a map-only job (using NLineInputFormat, and setting the number of reducers to 0), and write the output key as docnameN,stat1,stat2,stat3,....stat12 with a null value. This assumes that you calculate all 12 statistics in one map. With NLineInputFormat's default of one input line per map, each output file would have a single line in it.
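
As a rough sketch of the key-building step only (plain Java, no Hadoop classes, so it runs on its own): the class name StatsLine, the format helper, and the sample stat values are made up for illustration, and I am assuming your stats are doubles.

```java
import java.util.StringJoiner;

public class StatsLine {
    // Hypothetical helper: joins the document name and all of its
    // statistics into a single comma-separated line.
    static String format(String docName, double[] stats) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(docName);
        for (double stat : stats) {
            joiner.add(Double.toString(stat));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up values standing in for the stats computed from one document.
        double[] stats = {3.0, 0.5, 12.0};
        String line = format("docname1", stats);
        System.out.println(line);
        // In the real mapper you would emit this line as the key with a
        // null value, e.g. output.collect(new Text(line), NullWritable.get());
    }
}
```

In the job setup you would then call setNumReduceTasks(0) on the JobConf so the map output is written straight to the output files.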

Cheers,
Tom

On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
> Hi,
>
> I'm having some trouble implementing what I want to achieve... essentially I
> have a large input list of documents that I want to get statistics on. For
> each document I have 12 different stats to work out.
>
> So my input file is a text file with one document filepath on each line. The
> documents are stored on a remote server. I want to fetch each document and
> calculate certain stats from it.
>
> My problem is with the output.
>
> I want my output to be similar to this:
>
> docname1,stat1,stat2,stat3,....stat12
> docname2,stat1,stat2,stat3,....stat12
> docname3,stat1,stat2,stat3,....stat12
> .
> .
> .
> docnameN,stat1,stat2,stat3,....stat12
>
> I can fetch the document in my map code and perform my stats calculation on
> it but don't know how to return more than one value for a key, the key in
> this case being the document name.
>
> Cheers,
> John