Re: Multiple Output Formats

2011-07-27 Thread Alejandro Abdelnur
Roger,

Or you can take a look at Hadoop's MultipleOutputs class.

Thanks.

Alejandro

On Tue, Jul 26, 2011 at 11:30 PM, Luca Pireddu pire...@crs4.it wrote:

 On July 26, 2011 06:11:33 PM Roger Chen wrote:
  Hi all,
 
  I am attempting to implement MultipleOutputFormat to write data to multiple
  files depending on the output keys and values. Can somebody provide a
  working example of how to implement this in Hadoop 0.20.2?
 
  Thanks!

 Hello,

 I have a working sample here:

 http://biodoop-seal.bzr.sourceforge.net/bzr/biodoop-seal/trunk/annotate/head%3A/src/it/crs4/seal/demux/DemuxOutputFormat.java

 It extends FileOutputFormat.

 --
 Luca Pireddu
 CRS4 - Distributed Computing Group
 Loc. Pixina Manna Edificio 1
 Pula 09010 (CA), Italy
 Tel:  +39 0709250452
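Luca's link is the concrete answer. The general idea behind a demultiplexing output format of that kind can also be sketched without Hadoop. The plain-Java class below is an illustration only. It is not the code of DemuxOutputFormat and not a Hadoop API, and `DemuxWriterSketch` and its `destinationFor()` routing hook are hypothetical names. It keeps one lazily opened writer per destination directory and closes them all at the end. A FileOutputFormat subclass's RecordWriter has to do roughly the same thing to support multi-file output.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (no Hadoop APIs): route each record to a per-destination
// file under baseDir, opening writers lazily and closing them all at the end.
// DemuxWriterSketch and destinationFor() are hypothetical names.
class DemuxWriterSketch implements AutoCloseable {
    private final Path baseDir;
    private final String leafName; // stand-in for a task's leaf name, e.g. "part-00000"
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    DemuxWriterSketch(Path baseDir, String leafName) {
        this.baseDir = baseDir;
        this.leafName = leafName;
    }

    // Hypothetical routing rule: one subdirectory per key.
    // Override to route on the value instead.
    protected String destinationFor(String key, String value) {
        return key;
    }

    void write(String key, String value) throws IOException {
        String dest = destinationFor(key, value);
        BufferedWriter out = writers.get(dest);
        if (out == null) {
            // First record for this destination: create its directory and open its file.
            Path file = baseDir.resolve(dest).resolve(leafName);
            Files.createDirectories(file.getParent());
            out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            writers.put(dest, out);
        }
        // Tab-separated key and value, one record per line.
        out.write(key);
        out.write('\t');
        out.write(value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Close every open writer; remember the first failure but keep closing the rest.
        IOException first = null;
        for (BufferedWriter w : writers.values()) {
            try {
                w.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        writers.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

In a real Hadoop output format the paths would be resolved against the task's output location (for example via `FileOutputFormat.getWorkOutputPath`) so the output committer can promote the files. That detail is left out of this sketch.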



Multiple Output Formats

2011-07-26 Thread Roger Chen
Hi all,

I am attempting to implement MultipleOutputFormat to write data to multiple
files depending on the output keys and values. Can somebody provide a
working example of how to implement this in Hadoop 0.20.2?

Thanks!

-- 
Roger Chen
UC Davis Genome Center


Re: Multiple Output Formats

2011-07-26 Thread Ayon Sinha
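package com.shopkick.util;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

/**
 * Old-API (org.apache.hadoop.mapred) output format that writes each
 * record to a file named after its key.
 */
public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value,
            String name) {
        // "name" is the task's leaf file name (e.g. part-00000); prefixing it
        // with the key puts each key's records in its own subdirectory.
        return key.toString() + "/" + name;
    }
}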
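To see what Ayon's class produces: because the returned name contains a "/", each distinct key gets its own subdirectory under the job's output directory, with the task's usual leaf name (such as part-00000) inside it. The snippet below mirrors only that naming rule in plain Java. KeyPathDemo is a hypothetical name and does not depend on Hadoop. MultiFileOutput itself is an old-API (org.apache.hadoop.mapred) output format, so it is enabled on a JobConf with conf.setOutputFormat(MultiFileOutput.class) rather than on a new-API Job.

```java
// Plain-Java illustration (hypothetical class, no Hadoop dependency) of the
// file names MultiFileOutput.generateFileNameForKeyValue returns.
final class KeyPathDemo {
    // Same rule as MultiFileOutput: <key>/<leaf name>, relative to the job's output directory.
    static String fileNameFor(String key, String name) {
        return key + "/" + name;
    }

    public static void main(String[] args) {
        // "part-00000" stands in for the leaf name a reduce task passes in.
        for (String key : new String[] {"chr1", "chr2"}) {
            System.out.println(fileNameFor(key, "part-00000"));
        }
        // prints:
        // chr1/part-00000
        // chr2/part-00000
    }
}
```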


 
-Ayon




From: Roger Chen rogc...@ucdavis.edu
To: common-user@hadoop.apache.org
Sent: Tuesday, July 26, 2011 9:11 AM
Subject: Multiple Output Formats

Hi all,

I am attempting to implement MultipleOutputFormat to write data to multiple
files depending on the output keys and values. Can somebody provide a
working example of how to implement this in Hadoop 0.20.2?

Thanks!

-- 
Roger Chen
UC Davis Genome Center

Re: Multiple Output Formats

2011-07-26 Thread Harsh J
Roger,

Beyond Ayon's example, note that the newer API will *not* carry a
supported MultipleOutputFormat: it has been dropped in favor of
MultipleOutputs, which is much easier to use, is thread-safe, and comes
with an example you can look at [1].

[1] - 
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

On Tue, Jul 26, 2011 at 9:41 PM, Roger Chen rogc...@ucdavis.edu wrote:
 Hi all,

 I am attempting to implement MultipleOutputFormat to write data to multiple
 files depending on the output keys and values. Can somebody provide a
 working example of how to implement this in Hadoop 0.20.2?

 Thanks!

 --
 Roger Chen
 UC Davis Genome Center




-- 
Harsh J


Re: Multiple Output Formats

2011-07-26 Thread Roger Chen
The problem I'm facing right now is with the configuration needed for
MultipleOutputs: JobConf is now deprecated, and I can't find a way to do
the equivalent with Configuration. I set up the job's configuration with:

 Job job = new Job(getConf());

but when I try to use this line to configure it:

 MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);

I get an error saying no suitable method was found.

Roger

On Tue, Jul 26, 2011 at 12:00 PM, Harsh J ha...@cloudera.com wrote:

 Roger,

 Beyond Ayon's example answer, I'd like you to note that the newer API
 will *not* carry a supported MultipleOutputFormat as it has been
 obsoleted away in favor of MultipleOutputs, whose use is much easier,
 is threadsafe, and also carries an example to look at, at [1].

 [1] -
 http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

 --
 Harsh J




-- 
Roger Chen
UC Davis Genome Center


Re: Multiple Output Formats

2011-07-26 Thread Harsh J
Gotcha, my bad then. The Hadoop distribution I use provides a
backported MultipleOutputs (MO) for the new API, so I overlooked this
particular issue while replying.

Still, the warning holds as newer versions roll out. But I believe the
refactor would not be much of a pain, so perhaps it's nothing to worry
about.

On Wed, Jul 27, 2011 at 2:00 AM, Roger Chen rogc...@ucdavis.edu wrote:
 The problem I'm facing right now is with the configuration needed for
 MultipleOutputs: JobConf is now deprecated, and I can't find a way to do
 the equivalent with Configuration. I set up the job's configuration with:

  Job job = new Job(getConf());

 but when I try to use this line to configure it:

  MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
      LongWritable.class, Text.class);

 I get an error saying no suitable method was found.

 Roger

 --
 Roger Chen
 UC Davis Genome Center




-- 
Harsh J