Hi John,

You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,stat3,....stat12 with a null value. This assumes
that you calculate all 12 statistics in one map. With
NLineInputFormat's default of one input line per map, each output
file would have a single line in it.
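
To make the "everything in the key" idea concrete, here is a rough
sketch in plain Java with no Hadoop classes. StatsLine and format are
names I made up for illustration, and the stat values are placeholders;
the point is only that the document name and all of its stats are
joined into one string before being written out.

```java
import java.util.StringJoiner;

// Illustrative only: StatsLine and format are hypothetical names,
// not part of the Hadoop API.
final class StatsLine {
    private StatsLine() {}

    // Joins the document name and all of its statistics into one
    // comma-separated line of the form docname,stat1,stat2,...
    static String format(String docName, double[] stats) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(docName);
        for (double stat : stats) {
            joiner.add(String.valueOf(stat));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; a real map would fetch the document
        // and compute all 12 stats here.
        double[] stats = {3.0, 1.5, 42.0};
        System.out.println(format("docname1", stats));
        // In the map, this string would be the output key, written
        // with a null value (in Hadoop, typically a Text key paired
        // with NullWritable).
    }
}
```

Because the whole line is the key, there is nothing left to carry in
the value, which is why a null value is enough.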

Cheers,
Tom

On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
> Hi,
>
> I'm having some trouble implementing what I want to achieve... essentially I
> have a large input list of documents that I want to get statistics on. For
> each document I have 12 different stats to work out.
>
> So my input file is a text file with one document filepath on each line. The
> documents are stored on a remote server. I want to fetch each document and
> calculate certain stats from it.
>
> My problem is with the output.
>
> I want my output to be similar to this:
>
> docname1,stat1,stat2,stat3,....stat12
> docname2,stat1,stat2,stat3,....stat12
> docname3,stat1,stat2,stat3,....stat12
> .
> .
> .
> docnameN,stat1,stat2,stat3,....stat12
>
> I can fetch the document in my map code and perform my stats calculation on
> it, but I don't know how to return more than one value for a key, the key in
> this case being the document name.
>
> Cheers,
> John
>
