You could combine them into one file by adding a reduce stage with a single reducer, or by running hadoop fs -getmerge on the output directory.
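For example (untested; this assumes the 0.20 mapred API, and
/user/john/output is a made-up path):

  // In the driver: funnel all map output through a single reducer,
  // so the job produces one part file.
  conf.setNumReduceTasks(1);

or, once the job has finished:

  # copy and concatenate every part file into a single local file
  hadoop fs -getmerge /user/john/output /tmp/stats.txt

(A fuller sketch of the map-only job itself is at the bottom of this
message, after the quoted thread.)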
Cheers,
Tom

On Thu, May 21, 2009 at 3:14 PM, John Clarke <clarke...@gmail.com> wrote:
> Hi,
>
> I want one output file, not multiple, but I think your reply has steered
> me in the right direction!
> Thanks
> John
>
> 2009/5/20 Tom White <t...@cloudera.com>
>
>> Hi John,
>>
>> You could do this with a map-only job (using NLineInputFormat, and
>> setting the number of reducers to 0), and write the output key as
>> docnameN,stat1,stat2,stat3,...,stat12 and a null value. This assumes
>> that you calculate all 12 statistics in one map. Each output file
>> would have a single line in it.
>>
>> Cheers,
>> Tom
>>
>> On Wed, May 20, 2009 at 10:21 AM, John Clarke <clarke...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm having some trouble implementing what I want to achieve...
>> > Essentially I have a large input list of documents that I want to get
>> > statistics on. For each document I have 12 different stats to work
>> > out.
>> >
>> > So my input file is a text file with one document filepath on each
>> > line. The documents are stored on a remote server. I want to fetch
>> > each document and calculate certain stats from it.
>> >
>> > My problem is with the output.
>> >
>> > I want my output to be similar to this:
>> >
>> > docname1,stat1,stat2,stat3,...,stat12
>> > docname2,stat1,stat2,stat3,...,stat12
>> > docname3,stat1,stat2,stat3,...,stat12
>> > .
>> > .
>> > .
>> > docnameN,stat1,stat2,stat3,...,stat12
>> >
>> > I can fetch the document in my map code and perform my stats
>> > calculation on it, but I don't know how to return more than one value
>> > for a key, the key in this case being the document name.
>> >
>> > Cheers,
>> > John
>> >
>>
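For reference, here is a rough sketch of the map-only job described in the
quoted message. It is untested and uses the old org.apache.hadoop.mapred
API (where NLineInputFormat lived in 2009, under
org.apache.hadoop.mapred.lib); DocStatsJob and fetchAndComputeStats are
made-up names, and the actual fetching and stats calculation is left as a
stub:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class DocStatsJob {

  // NLineInputFormat hands each map task one line of the input file,
  // i.e. one document path per mapper.
  public static class StatsMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {

    public void map(LongWritable offset, Text docPath,
                    OutputCollector<Text, NullWritable> output,
                    Reporter reporter) throws IOException {
      // Placeholder: fetch the remote document and compute stat1..stat12.
      String stats = fetchAndComputeStats(docPath.toString());
      // Pack everything into the key, with a null value, so each output
      // line reads: docname,stat1,stat2,...,stat12
      output.collect(new Text(docPath + "," + stats), NullWritable.get());
    }

    private String fetchAndComputeStats(String path) {
      // Hypothetical stub standing in for the real fetch/stats code.
      return "stat1,stat2,stat3";
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(DocStatsJob.class);
    conf.setJobName("doc-stats");

    conf.setInputFormat(NLineInputFormat.class); // one line per map task
    conf.setNumReduceTasks(0);                   // map-only job
    conf.setMapperClass(StatsMapper.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}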