You could combine them into one file by adding a reduce stage with a single reducer, or by running hadoop fs -getmerge on the output directory.
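For example (untested; this assumes the 0.20 mapred API, and
/user/john/output is a made-up path):

  // In the driver: funnel all map output through a single reducer,
  // so the job produces one part file.
  conf.setNumReduceTasks(1);

or, once the job has finished:

  # copy and concatenate every part file into a single local file
  hadoop fs -getmerge /user/john/output /tmp/stats.txt

(A fuller sketch of the map-only job itself is at the bottom of this
message, after the quoted thread.)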
Cheers,
Tom

On Thu, May 21, 2009 at 3:14 PM, John Clarke <clarke...@gmail.com> wrote:
> Hi,
>
> I want one output file, not multiple, but I think your reply has steered
> me in the right direction!
> Thanks
> John
>
> 2009/5/20 Tom White <t...@cloudera.com>
>
>> Hi John,
>>
>> You could do this with a map-only job (using NLineInputFormat, and
>> setting the number of reducers to 0), and write the output key as
>> docnameN,stat1,stat2,stat3,...,stat12 and a null value. This assumes
>> that you calculate all 12 statistics in one map. Each output file
>> would have a single line in it.
>>
>> Cheers,
>> Tom
>>
>> On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm having some trouble implementing what I want to achieve...
>> > Essentially I have a large input list of documents that I want to get
>> > statistics on. For each document I have 12 different stats to work
>> > out.
>> >
>> > So my input file is a text file with one document filepath on each
>> > line. The documents are stored on a remote server. I want to fetch
>> > each document and calculate certain stats from it.
>> >
>> > My problem is with the output.
>> >
>> > I want my output to be similar to this:
>> >
>> > docname1,stat1,stat2,stat3,...,stat12
>> > docname2,stat1,stat2,stat3,...,stat12
>> > docname3,stat1,stat2,stat3,...,stat12
>> > .
>> > .
>> > .
>> > docnameN,stat1,stat2,stat3,...,stat12
>> >
>> > I can fetch the document in my map code and perform my stats
>> > calculation on it, but I don't know how to return more than one value
>> > for a key, the key in this case being the document name.
>> >
>> > Cheers,
>> > John
>> >
>>
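For reference, here is a rough sketch of the map-only job described in the
quoted message. It is untested and uses the old org.apache.hadoop.mapred
API (where NLineInputFormat lived in 2009, under
org.apache.hadoop.mapred.lib); DocStatsJob and fetchAndComputeStats are
made-up names, and the actual fetching and stats calculation is left as a
stub:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class DocStatsJob {

  // NLineInputFormat hands each map task one line of the input file,
  // i.e. one document path per mapper.
  public static class StatsMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {

    public void map(LongWritable offset, Text docPath,
                    OutputCollector<Text, NullWritable> output,
                    Reporter reporter) throws IOException {
      // Placeholder: fetch the remote document and compute stat1..stat12.
      String stats = fetchAndComputeStats(docPath.toString());
      // Pack everything into the key, with a null value, so each output
      // line reads: docname,stat1,stat2,...,stat12
      output.collect(new Text(docPath + "," + stats), NullWritable.get());
    }

    private String fetchAndComputeStats(String path) {
      // Hypothetical stub standing in for the real fetch/stats code.
      return "stat1,stat2,stat3";
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(DocStatsJob.class);
    conf.setJobName("doc-stats");

    conf.setInputFormat(NLineInputFormat.class); // one line per map task
    conf.setNumReduceTasks(0);                   // map-only job
    conf.setMapperClass(StatsMapper.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}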