Re: Two output files?

John Clarke Tue, 18 Aug 2009 01:26:33 -0700

Cheers, that is pretty much what I did except I chose my file name based on
the first few chars of the keys. I have included it below for others looking
for the solution:


==========================================
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class MyMultipleTextOutputFormat extends
        MultipleTextOutputFormat<Text, Text> {

    /**
     * Override so we can specify custom file names based on the key
     * and/or value.
     */
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value,
            String name) {

        // here we can choose a name based on the key or value
        String fileName = "output1_" + name;

        if (key.toString().startsWith("sometext")) {
            fileName = "output2_" + name;
        }

        return fileName;
    }
}

// END


// Define the above as the output format when setting up the JobConf
jobConf.setOutputFormat(MyMultipleTextOutputFormat.class);


// in the Reduce class simply output as before
output.collect(key, val);

==========================================
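For anyone who wants to check the naming rule before running a job, here is a
minimal sketch of the same branching logic in plain Java, with no Hadoop
dependency. The class name FileNameChooser, the method chooseFileName and the
sample keys are made up for illustration, and it assumes the default leaf name
passed in looks like "part-00000".

```java
public class FileNameChooser {

    // Hypothetical helper mirroring the body of generateFileNameForKeyValue
    // above, but taking plain Strings so it runs without Hadoop on the classpath.
    public static String chooseFileName(String key, String name) {
        // Keys with the chosen prefix go to the second output file
        if (key.startsWith("sometext")) {
            return "output2_" + name;
        }
        // Everything else goes to the first output file
        return "output1_" + name;
    }

    public static void main(String[] args) {
        // "part-00000" is an assumed example of the leaf name
        System.out.println(chooseFileName("sometext-123", "part-00000"));
        System.out.println(chooseFileName("other-456", "part-00000"));
    }
}
```

Running main should print output2_part-00000 followed by output1_part-00000.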


John.




2009/8/17 Vibhooti Verma <[email protected]>

> Hi John,
>
>
> Here is the example, where you can change the filename specified in the
> conf.
>
> protected String generateFileNameForKeyValue(Object key, Object value,
> String name) {
>
>                return name.concat(key.toString() + "_" + name);
>        }
>
>
>
> --
> vibhooti
>
>
>
> On Mon, Aug 17, 2009 at 1:59 PM, John Clarke <[email protected]> wrote:
>
> > Fantastic, I will try that :) A little push in the right direction helps
> > hugely! I don't have that book yet but I'm planning on getting it.
> >
> > cheers
> > John,
> >
> >
> >
> > 2009/8/14 Kris Jirapinyo <[email protected]>
> >
> > > Hi John,
> > >     If you have the Hadoop O'Reilly book, look at pg 206 for an example.
> > > But basically, you just create a subclass of MultipleTextOutputFormat and
> > > then inside it you override generateFileNameForKeyValue (for example) to
> > > have the reducer emit the desired filenames.  For each key in the reducer,
> > > it will write the text values to that file.  Make sure in the JobConf you
> > > set OutputFormat to your class that extends MultipleTextOutputFormat.
> > >
> > > -- Kris.
> > >
> > > On Fri, Aug 14, 2009 at 7:11 AM, John Clarke <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I want to output two text files from my MapReduce job but I am having
> > > > trouble understanding how to use the MultipleTextOutputFormat class to do
> > > > so.
> > > >
> > > > I want to write to the two files depending on the key of each key/value
> > > > pair.
> > > >
> > > > In the Reducer how do I tell it to write the different files? Normally I
> > > > just do an output.collect(key, val);.
> > > >
> > > > Any help would be most appreciated.
> > > >
> > > > Thanks,
> > > > John
> > > >
> > >
> >
>
>
>
> --
> cheers,
> Vibhooti
>
