MultipleOutputs - Create multiple files during output

2011-09-01 Thread modemide
Hi all,
I was wondering if anyone was familiar with this class.  I want to
create multiple output files during my reduce.

My input files will consist of
name1action1date1
name1action2date2
name1action3date3

name2action1date1
name2action2date2
name2action3date3


My goal is to create files with the following format
Filename:
name_Date:CCYYMM

File Contents:
action1
action2
action3


I.e., this will store all the actions of one person for any given month
in one file.

I just don't know how I will decide the file name at run time.  Can anyone help?

Thanks,
Tim


Re: MultipleOutputs - Create multiple files during output

2011-09-01 Thread Stan Rosenberg
Hi Tim,

You could create a custom HashPartitioner so that all key/value pairs
denoting the actions of the same user end up in the same reducer; then you
need only one output file per reducer.  By the way, how large are the output
files?  Make sure you don't end up creating a lot of small files, i.e.,
files smaller than 64MB.
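
To make the partitioning idea concrete, here is a minimal sketch in plain
Java with no Hadoop dependency. It assumes the user name is available as a
String; the class and method names (UserPartitionSketch, partitionFor) are
made up for illustration. The arithmetic is the same one Hadoop's default
HashPartitioner applies to the key's hash code. In a real job it would
presumably sit inside a Partitioner subclass's getPartition(key, value,
numPartitions), after pulling the user name out of the key.

```java
// Sketch only: plain Java, no Hadoop classes involved.
// UserPartitionSketch and partitionFor are hypothetical names.
final class UserPartitionSketch {

    // Mask off the sign bit so the result is never negative, then take the
    // remainder. Equal user names always map to the same reducer index.
    static int partitionFor(String userName, int numReduceTasks) {
        return (userName.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed reducer count, for illustration
        for (String user : new String[] {"name1", "name2"}) {
            System.out.println(user + " -> reducer " + partitionFor(user, reducers));
        }
    }
}
```

Note that several users can still land on the same reducer, so this keeps a
user's records together but does not by itself give one file per user.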

Best,

stan

On Thu, Sep 1, 2011 at 3:47 PM, modemide modem...@gmail.com wrote:

> Hi all,
> I was wondering if anyone was familiar with this class.  I want to
> create multiple output files during my reduce.
>
> My input files will consist of
> name1action1date1
> name1action2date2
> name1action3date3
>
> name2action1date1
> name2action2date2
> name2action3date3
>
>
> My goal is to create files with the following format
> Filename:
> name_Date:CCYYMM
>
> File Contents:
> action1
> action2
> action3
>
>
> I.e. This will store all the actions of one person for any given month
> in one file.
>
> I just don't know how I will decide the file name at run time.  Can anyone
> help?
>
> Thanks,
> Tim
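
On deciding the file name at run time: one possible approach is to derive
the name from each record inside the reducer. The sketch below is plain Java
and rests on assumptions the thread does not state: that the name and date
fields have already been split out of the record, that the date arrives in
ISO form (e.g. 2011-09-01), and that "name_Date:CCYYMM" means the name, an
underscore, and the four-digit year plus two-digit month. OutputNameSketch
and baseNameFor are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch only: builds a "name_CCYYMM" base name from already-parsed fields.
final class OutputNameSketch {

    // CCYYMM is read here as four-digit year followed by two-digit month.
    private static final DateTimeFormatter CCYYMM =
            DateTimeFormatter.ofPattern("yyyyMM");

    // isoDate is assumed to look like "2011-09-01"; LocalDate.parse throws
    // a DateTimeParseException for anything else.
    static String baseNameFor(String name, String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return name + "_" + date.format(CCYYMM);
    }

    public static void main(String[] args) {
        System.out.println(baseNameFor("name1", "2011-09-01")); // name1_201109
    }
}
```

If your Hadoop version ships the newer
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, a string like this
could be passed as the baseOutputPath argument of its write(key, value,
baseOutputPath) method; Hadoop appends its own part suffix to that base
name. With the older mapred API, overriding generateFileNameForKeyValue in
a MultipleTextOutputFormat subclass plays a similar role. Check which of
the two your release actually provides.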