Hi Alan,

Unless you run your job with a single reducer, you will not be able to get
exactly one file named YYYY-MM-DD.txt per day.  Think scalable: always add
'-r-NNNNN' to the end of the file name to allow for multiple reducers, and
use a custom partitioner to make sure each host goes to a single reducer.
MultipleOutputs can do the rest, meaning the 'YYYY-MM-DD' prefix.  Your
question 2 looks like a simple aggregation job: the key should be the host
name, and you just need to aggregate the values for each (host, YYYY-MM-DD)
pair and write them into separate 'YYYY-MM-DD-r-NNNNN' files.  You can also
do a secondary sort to make sure the YYYY-MM-DD values arrive in order; this
way you do not need to aggregate them in memory.  See the Reducer javadoc
<http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Reducer.html>
for details.
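
To make the partitioning and aggregation idea concrete, here is a rough
sketch in plain Java with no Hadoop dependencies.  The class and method names
(HostDayAggregationSketch, partitionFor, averageLine) are made up for
illustration, and it assumes your keys look like 'host_YYYY-MM-DD' and your
values are numeric 'name=value' pairs.  In a real job the first method's logic
would go into getPartition() of your Partitioner subclass, and the second into
your reducer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HostDayAggregationSketch {

    // Partition on the host part of a "host_YYYY-MM-DD" key only, so that
    // every date for a given host lands on the same reducer.
    static int partitionFor(String key, int numReducers) {
        int sep = key.lastIndexOf('_');
        String host = sep < 0 ? key : key.substring(0, sep);
        // Mask the sign bit so the result is never negative.
        return (host.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Average records such as "varA=1,varB=10" (values assumed numeric) and
    // format one tab-delimited line: host, then one average per variable.
    static String averageLine(String host, List<String> records) {
        // Variable name -> {running sum, count}, kept in first-seen order.
        Map<String, double[]> sums = new LinkedHashMap<>();
        for (String record : records) {
            for (String pair : record.split(",")) {
                String[] kv = pair.split("=", 2);
                double[] acc = sums.computeIfAbsent(kv[0].trim(), k -> new double[2]);
                acc[0] += Double.parseDouble(kv[1].trim());
                acc[1] += 1;
            }
        }
        StringBuilder line = new StringBuilder(host);
        for (double[] acc : sums.values()) {
            line.append('\t').append(acc[0] / acc[1]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Both dates for hostX map to the same reducer index.
        System.out.println(partitionFor("hostX_2010-05-01", 4));
        System.out.println(partitionFor("hostX_2010-05-02", 4));
        System.out.println(averageLine("hostX",
                Arrays.asList("varA=1,varB=10", "varA=3,varB=20")));
    }
}
```

The averaging here keeps only a running sum and count per variable, so it
does not need to hold all the values for a host/day pair in memory.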

Alex K

On Wed, May 12, 2010 at 3:04 PM, Alan Miller <[email protected]> wrote:

>  Hi Alex,
>
> The tab isn't the issue (yet). I guess it's really 2 questions I have.
> Using the reducer inputs already mentioned.
>
> 1. How do I generate multiple output files named YYYY-MM-DD.txt
> 2. Each file should contain
>      a. one line per host
>      b. each line with host avg1 avg2 avg3 ....
>
> Alan
>
>
> On 05/12/2010 11:50 PM, Alex Kozlov wrote:
>
> Hi Alan,
>
> Is the problem that you want your 'value' vals to be tab separated?   This
> is entirely under control of your reducer.
>
> Alex K
>
> On Wed, May 12, 2010 at 2:07 PM, Alan Miller <[email protected]> wrote:
>
>> Hi all,
>>
>> How can I write tab-delimited output files from my reducer?
>>
>> My reducer gets Text/Text key/vals like:
>>
>> hostX_2010-05-01 varA=valA1,varB=valB1,varC=valC1
>> hostX_2010-05-01 varA=valA2,varB=valB2,varC=valC2
>> hostX_2010-05-01 varA=valA3,varB=valB3,varC=valC3
>> ...
>> hostY_2010-05-01 varA=valA1,varB=valB1,varC=valC1
>> hostY_2010-05-01 varA=valA2,varB=valB2,varC=valC2
>> hostY_2010-05-01 varA=valA3,varB=valB3,varC=valC3
>> ...
>>
>> After my reducer calcs the daily averages of varA,B,C
>> I  want to write a tab-delimited file with lines like:
>>
>> hostX    varA-Avg    varB-Avg    varC-Avg    ....
>> hostY    varA-Avg    varB-Avg    varC-Avg    ....
>>
>>
>> Thanks,
>> Alan
>>
>
>
>
