Different input streams

2012-05-01 Thread Ajay Srivastava
Hi,

If a Hadoop job has two inputs, one text and the other binary (a
SequenceFile), is there a way to set a different InputFormatClass for each of
these streams?
job.setInputFormatClass sets only one input type. Does that mean a Hadoop
job cannot take input in two different formats?



Thanks.
Ajay Srivastava

Re: Different input streams

2012-05-01 Thread Harsh J
Ajay,

Take a look at MultipleInputs: see page 214 of Chapter 7, "MapReduce
Types and Formats", in Hadoop: The Definitive Guide (2nd edition) by
Tom White (O'Reilly), and also
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html

This class should cover your need: register each input path with its own
InputFormat, and use a common mapper across them (or one mapper per path
if the record types differ).
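
For illustration, here is a minimal sketch of how MultipleInputs might be wired up for one text input and one SequenceFile input. The class names (MixedInputJob, TextSideMapper, SeqSideMapper), the argument order, and the assumption that the SequenceFile holds Text keys and BytesWritable values are all hypothetical; adjust them to your data. It uses one mapper per path because the two formats produce different key/value types. On older releases the job may need to be created with new Job(conf, name), or the old-API org.apache.hadoop.mapred.lib.MultipleInputs may be the one available.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: args = <text input> <sequence file input> <output dir>
public class MixedInputJob {

    // TextInputFormat hands the mapper (byte offset, line) pairs.
    public static class TextSideMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Tag each record with its source so the reducer can tell them apart.
            context.write(new Text("text"), line);
        }
    }

    // Assumption: the SequenceFile stores Text keys and BytesWritable values.
    public static class SeqSideMapper
            extends Mapper<Text, BytesWritable, Text, Text> {
        @Override
        protected void map(Text key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            // Emit only the record key here; real code would decode the bytes.
            context.write(new Text("seq"), key);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mixed-input");
        job.setJarByClass(MixedInputJob.class);

        // Each path gets its own InputFormat and mapper; no call to
        // job.setInputFormatClass or job.setMapperClass is needed.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, TextSideMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                SequenceFileInputFormat.class, SeqSideMapper.class);

        // Both mappers must agree on the map output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If both inputs can be read as the same key/value types, the three-argument MultipleInputs.addInputPath(job, path, inputFormatClass) can be used instead, together with a single job.setMapperClass call.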

-- 
Harsh J


Re: Different input streams

2012-05-01 Thread 黄 山
I ran into the same problem while using Hadoop Streaming with sequence files.
My solution was to use 'org.apache.hadoop.streaming.AutoInputFormat' as the
input format and add '-D stream.map.input=rawbytes'.

huangs,
thuhuang...@gmail.com


