Re: Multiple Output Formats
Roger,

Or you can take a look at Hadoop's MultipleOutputs class.

Thanks.
Alejandro

On Tue, Jul 26, 2011 at 11:30 PM, Luca Pireddu pire...@crs4.it wrote:
> On July 26, 2011 06:11:33 PM Roger Chen wrote:
> > Hi all,
> >
> > I am attempting to implement MultipleOutputFormat to write data to
> > multiple files dependent on the output keys and values. Can somebody
> > provide a working example with how to implement this in Hadoop 0.20.2?
> >
> > Thanks!
>
> Hello,
>
> I have a working sample here:
> http://biodoop-seal.bzr.sourceforge.net/bzr/biodoop-seal/trunk/annotate/head%3A/src/it/crs4/seal/demux/DemuxOutputFormat.java
>
> It extends FileOutputFormat.
>
> --
> Luca Pireddu
> CRS4 - Distributed Computing Group
> Loc. Pixina Manna Edificio 1
> Pula 09010 (CA), Italy
> Tel: +39 0709250452
Multiple Output Formats
Hi all,

I am attempting to implement MultipleOutputFormat to write data to multiple files dependent on the output keys and values. Can somebody provide a working example with how to implement this in Hadoop 0.20.2?

Thanks!

--
Roger Chen
UC Davis Genome Center
Re: Multiple Output Formats
package com.shopkick.util;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Place each record under a subdirectory named after its key;
        // "name" is the leaf file name passed in by the framework.
        return key.toString() + "/" + name;
    }
}

-Ayon

From: Roger Chen rogc...@ucdavis.edu
To: common-user@hadoop.apache.org
Sent: Tuesday, July 26, 2011 9:11 AM
Subject: Multiple Output Formats

> Hi all,
>
> I am attempting to implement MultipleOutputFormat to write data to
> multiple files dependent on the output keys and values. Can somebody
> provide a working example with how to implement this in Hadoop 0.20.2?
>
> Thanks!
>
> --
> Roger Chen
> UC Davis Genome Center
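To make the effect of the generateFileNameForKeyValue override above concrete, here is a plain-Java model of its routing rule. This is only a sketch: it uses no Hadoop classes, KeyedPathDemo and its methods are hypothetical names, and part-00000 is an assumed example of the leaf name the framework would pass in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the routing rule; no Hadoop classes are involved.
final class KeyedPathDemo {

    // Same rule as the override above: <key>/<leaf name>.
    static String fileNameFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    // Groups record values by the relative path their key maps to.
    // Each record is a {key, value} pair.
    static Map<String, List<String>> route(String[][] records, String leafName) {
        Map<String, List<String>> byPath = new TreeMap<>();
        for (String[] record : records) {
            String path = fileNameFor(record[0], leafName);
            byPath.computeIfAbsent(path, p -> new ArrayList<>()).add(record[1]);
        }
        return byPath;
    }

    public static void main(String[] args) {
        // Hypothetical records; the leaf name is an assumed example.
        String[][] records = { {"chr1", "readA"}, {"chr2", "readB"}, {"chr1", "readC"} };
        route(records, "part-00000")
                .forEach((path, values) -> System.out.println(path + " -> " + values));
    }
}
```

Records that share a key end up under the same relative path, so with this rule the job output directory would contain one subdirectory per distinct key.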
Re: Multiple Output Formats
Roger,

Beyond Ayon's example, please note that the newer API will *not* carry a supported MultipleOutputFormat: it has been made obsolete in favor of MultipleOutputs, which is much easier to use, is thread-safe, and comes with an example to look at [1].

[1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

On Tue, Jul 26, 2011 at 9:41 PM, Roger Chen rogc...@ucdavis.edu wrote:
> Hi all,
>
> I am attempting to implement MultipleOutputFormat to write data to
> multiple files dependent on the output keys and values. Can somebody
> provide a working example with how to implement this in Hadoop 0.20.2?
>
> Thanks!
>
> --
> Roger Chen
> UC Davis Genome Center

--
Harsh J
Re: Multiple Output Formats
The problem I'm facing right now is with the configuration needed for MultipleOutputs: JobConf is deprecated now, and I am unable to do the equivalent with Configuration. I set up the job's configuration with:

    Job job = new Job(getConf());

but when I try to use this line in my configuration:

    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class, LongWritable.class, Text.class);

I get an error saying that no suitable method was found.

Roger

On Tue, Jul 26, 2011 at 12:00 PM, Harsh J ha...@cloudera.com wrote:
> Roger,
>
> Beyond Ayon's example, please note that the newer API will *not* carry
> a supported MultipleOutputFormat: it has been made obsolete in favor of
> MultipleOutputs, which is much easier to use, is thread-safe, and comes
> with an example to look at [1].
>
> [1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>
> --
> Harsh J

--
Roger Chen
UC Davis Genome Center
Re: Multiple Output Formats
Gotcha, my bad then. The Hadoop distribution I use provides a backported MultipleOutputs (MO), so I overlooked this particular issue while replying.

Still, the warning holds as versions move ahead. I believe the refactor would not be much of a pain, though, so perhaps it's nothing to worry about.

On Wed, Jul 27, 2011 at 2:00 AM, Roger Chen rogc...@ucdavis.edu wrote:
> The problem I'm facing right now is with the configuration needed for
> MultipleOutputs: JobConf is deprecated now, and I am unable to do the
> equivalent with Configuration. I set up the job's configuration with:
>
>     Job job = new Job(getConf());
>
> but when I try to use this line in my configuration:
>
>     MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class, LongWritable.class, Text.class);
>
> I get an error saying that no suitable method was found.
>
> Roger

--
Harsh J
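For readers who hit the same error on stock 0.20.2: the MultipleOutputs documented at the r0.20.2 link in this thread lives in org.apache.hadoop.mapred.lib, and its addNamedOutput takes a JobConf and an old-API (org.apache.hadoop.mapred) output format class. Passing a plain Configuration, or the new-API TextOutputFormat, would therefore plausibly produce the "no suitable method" error. Below is a minimal sketch of one way around it, assuming you are willing to stay on the old API and accept its deprecation warnings. NamedOutputSketch and NamedOutputReducer are hypothetical names; the "text" named output and the LongWritable/Text types are taken from Roger's call.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat; // old-API class, not mapreduce.lib.output
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Hypothetical class names; a sketch against the old (mapred) API only.
public class NamedOutputSketch {

    // Wraps an existing Configuration (e.g. the one returned by getConf())
    // in a JobConf, which is what the old-API addNamedOutput expects.
    public static JobConf configure(Configuration base) {
        JobConf conf = new JobConf(base, NamedOutputSketch.class);
        MultipleOutputs.addNamedOutput(conf, "text",
                TextOutputFormat.class, LongWritable.class, Text.class);
        return conf;
    }

    // Reducer that sends every record to the "text" named output.
    public static class NamedOutputReducer extends MapReduceBase
            implements Reducer<LongWritable, Text, LongWritable, Text> {

        private MultipleOutputs mos;

        @Override
        public void configure(JobConf conf) {
            mos = new MultipleOutputs(conf);
        }

        @Override
        @SuppressWarnings("unchecked") // getCollector returns a raw OutputCollector
        public void reduce(LongWritable key, Iterator<Text> values,
                OutputCollector<LongWritable, Text> output, Reporter reporter)
                throws IOException {
            OutputCollector<LongWritable, Text> named = mos.getCollector("text", reporter);
            while (values.hasNext()) {
                named.collect(key, values.next());
            }
        }

        @Override
        public void close() throws IOException {
            mos.close(); // flushes and closes the named outputs
        }
    }
}
```

A job configured this way would be submitted through the old API as well (JobClient.runJob with the JobConf) rather than through Job. The sketch leaves out input/output paths and the mapper and reducer registration.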