Hi Alan,

On Mon, May 10, 2010 at 5:08 AM, Some Body <[email protected]> wrote:
> Hi,
>
> I'm trying to understand how to generate multiple outputs in my reducer
> (using 0.20.2+228).
> Do I need MultipleOutput or should I partition my output in the mapper?
>

The question is scalability. If you are OK with running only 2 (or N) reducers, one for "morning" and one for "afternoon", and the two groups are roughly the same size, you should implement a custom partitioner. However, this approach does not scale, since you will always be stuck with a predefined number of reducers. A better approach is to leave the number of reducers flexible and use 'hadoop fs -getmerge' or custom Java code afterwards to merge the multiple output files.

Alex K

> My reducer currently gets key/val input pairs like this which all end up in
> my part_r_0000 file.
>
> hostA_VarX_2010-05-01_morning <FLOATVAL>
> hostA_VarY_2010-05-01_morning <FLOATVAL>
> hostA_VarX_2010-05-01_afternoon <FLOATVAL>
> hostA_VarY_2010-05-01_afternoon <FLOATVAL>
> .....
> hostB_VarX_2010-05-01_morning <FLOATVAL>
> hostB_VarY_2010-05-01_morning <FLOATVAL>
> hostB_VarX_2010-05-01_afternoon <FLOATVAL>
> hostB_VarY_2010-05-01_afternoon <FLOATVAL>
> .....
> hostA_VarX_2010-05-02_morning <FLOATVAL>
> hostA_VarY_2010-05-02_morning <FLOATVAL>
> hostA_VarX_2010-05-02_afternoon <FLOATVAL>
> hostA_VarY_2010-05-02_afternoon <FLOATVAL>
> .....
> hostB_VarX_2010-05-02_morning <FLOATVAL>
> hostB_VarY_2010-05-02_morning <FLOATVAL>
> hostB_VarX_2010-05-02_afternoon <FLOATVAL>
> hostB_VarY_2010-05-02_afternoon <FLOATVAL>
> .....
>
> But instead of 1 output file I want one output file per day/group. e.g.
> 2010-05-01_morning.txt
> 2010-05-01_afternoon.txt
>
> Each <date>_<time>.txt file would contain all keys/vals for all hosts &
> VarNames
>
> Thanks,
> Alan
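
For what it's worth, here is a rough sketch of the routing logic a custom partitioner like the one suggested above might use. It is plain Java with no Hadoop dependency so it can be run on its own. The class name `SlotPartitionSketch`, the method `partitionFor`, and the fixed `SLOTS` list are made up for illustration. In a real job the same logic would presumably sit inside `getPartition(key, value, numPartitions)` of a subclass of `org.apache.hadoop.mapreduce.Partitioner`, and it assumes keys always end in `_<slot>` as in the sample data.

```java
import java.util.Arrays;
import java.util.List;

public class SlotPartitionSketch {
    // Hypothetical fixed list of time-of-day slots, one reducer per slot.
    static final List<String> SLOTS = Arrays.asList("morning", "afternoon");

    /**
     * Maps a key such as "hostA_VarX_2010-05-01_morning" to a reducer index.
     * Assumes the slot is whatever follows the last underscore in the key.
     */
    public static int partitionFor(String key, int numPartitions) {
        String slot = key.substring(key.lastIndexOf('_') + 1);
        int index = SLOTS.indexOf(slot);
        if (index < 0) {
            // Unknown slot: fall back to a non-negative hash of the slot name.
            index = slot.hashCode() & Integer.MAX_VALUE;
        }
        // Modulo keeps the result in range even with fewer reducers than slots.
        return index % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("hostA_VarX_2010-05-01_morning", 2));   // prints 0
        System.out.println(partitionFor("hostB_VarY_2010-05-02_afternoon", 2)); // prints 1
    }
}
```

Note that this sends every "morning" key to one reducer regardless of date, so each reducer's single output file would still mix days. It also shows the scalability limit mentioned above: the number of useful reducers is capped by the size of the slot list.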
