Hi,

I want one output file, not multiple, but I think your reply has steered me in
the right direction!
Thanks
John

2009/5/20 Tom White <t...@cloudera.com>

> Hi John,
>
> You could do this with a map-only job (using NLineInputFormat, and
> setting the number of reducers to 0), and write the output key as
> docnameN,stat1,stat2,stat3,....stat12 and a null value. This assumes
> that you calculate all 12 statistics in one map. Each output file
> would have a single line in it.
>
> Cheers,
> Tom
>
> On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
> > Hi,
> >
> > I'm having some trouble implementing what I want to achieve... essentially I
> > have a large input list of documents that I want to get statistics on. For
> > each document I have 12 different stats to work out.
> >
> > So my input file is a text file with one document filepath on each line. The
> > documents are stored on a remote server. I want to fetch each document and
> > calculate certain stats from it.
> >
> > My problem is with the output.
> >
> > I want my output to be similar to this:
> >
> > docname1,stat1,stat2,stat3,....stat12
> > docname2,stat1,stat2,stat3,....stat12
> > docname3,stat1,stat2,stat3,....stat12
> > .
> > .
> > .
> > docnameN,stat1,stat2,stat3,....stat12
> >
> > I can fetch the document in my map code and perform my stats calculation on
> > it, but I don't know how to return more than one value for a key, the key in
> > this case being the document name.
> >
> > Cheers,
> > John
> >
>
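A minimal sketch of the record formatting Tom describes, assuming the 12 statistics have already been computed as doubles inside the map. The class `StatsRecord` and its `format` method are hypothetical names for illustration, not part of Hadoop. In the mapper, the returned string would presumably be wrapped in a `Text` key and emitted with `NullWritable.get()` as the value; Hadoop's `TextOutputFormat` should then write only the key, so each output line is just `docname,stat1,...,stat12`.

```java
import java.util.StringJoiner;

/**
 * Hypothetical helper (not a Hadoop class): packs a document name and its
 * statistics into one comma-separated line, "docName,stat1,...,stat12".
 * In a map-only job this line would be the output key, with a null value.
 */
final class StatsRecord {
    // The thread describes 12 statistics per document.
    static final int NUM_STATS = 12;

    private StatsRecord() {}

    /** Builds "docName,s1,s2,...,s12"; rejects an unexpected number of stats. */
    static String format(String docName, double[] stats) {
        if (stats.length != NUM_STATS) {
            throw new IllegalArgumentException(
                "expected " + NUM_STATS + " stats, got " + stats.length);
        }
        StringJoiner line = new StringJoiner(",");
        line.add(docName);                 // first field: the document name
        for (double s : stats) {
            line.add(String.valueOf(s));   // then each statistic, in order
        }
        return line.toString();
    }
}
```

Joining everything into a single key sidesteps the "more than one value per key" problem: the document name and all of its stats travel together as one record. This assumes document names contain no commas; otherwise the fields would need quoting or a different delimiter.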
