Different input streams
Hi, If a Hadoop job has two inputs, one text and the other binary (a sequence file), is there a way to set a different InputFormatClass for each of them? job.setInputFormatClass sets only one input format for the whole job. Does that mean a Hadoop job cannot take input in two different formats? Thanks. Ajay Srivastava
Re: Different input streams
Ajay, Take a look at MultipleInputs. See page 214, Chapter 7: MapReduce Types and Formats, of Hadoop: The Definitive Guide (2nd edition) by Tom White (O'Reilly), and also http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html This class lets you register a separate InputFormat for each input path, which covers your case. You can use one common mapper for both paths if their key/value types are compatible, or assign a different mapper to each path.
On Tue, May 1, 2012 at 1:32 PM, Ajay Srivastava ajay.srivast...@guavus.com wrote:
> Hi, If there are two inputs to a hadoop job one is text and another is binary (Sequence file), is there a way to set InputFormatClass to these two different streams ? job.setInputFormatClass will set to one type of input. Does that mean a hadoop job can not take input in two different formats? Thanks. Ajay Srivastava
-- Harsh J
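For readers following the thread, the MultipleInputs suggestion might look roughly like the driver below. This is only a sketch under stated assumptions, not code from the thread: the class names MixedInputJob, TextSideMapper and SeqSideMapper are hypothetical, the three command-line paths are placeholders, the sequence file is assumed to hold Text keys and Text values, and Job.getInstance assumes a Hadoop 2.x or later client (older releases would use new Job(conf, name)).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: one job reading a text input and a sequence-file input.
public class MixedInputJob {

    // TextInputFormat delivers (byte offset, line) pairs.
    public static class TextSideMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Tag each line so the reducer can tell the two sources apart.
            context.write(new Text("text"), line);
        }
    }

    // Assumption: the sequence file was written with Text keys and Text values.
    public static class SeqSideMapper
            extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = text input, args[1] = sequence-file input, args[2] = output
        Job job = Job.getInstance(new Configuration(), "mixed-input");
        job.setJarByClass(MixedInputJob.class);

        // Each path gets its own InputFormat and mapper; MultipleInputs
        // configures the job's input format and mapper itself, so
        // setInputFormatClass and setMapperClass are not called here.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, TextSideMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                SequenceFileInputFormat.class, SeqSideMapper.class);

        // Both mappers must emit the same output key/value types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Using one mapper per path avoids forcing a single mapper to accept two different input key/value types; a common mapper only works when both input formats produce compatible types.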
Re: Different input streams
I ran into the same problem when using Hadoop Streaming with a sequence file. My solution was to use 'org.apache.hadoop.streaming.AutoInputFormat' as the input format and add '-D stream.map.input=rawbytes'.
huangs, thuhuang...@gmail.com
On 2012-5-1, at 4:02 PM, Ajay Srivastava wrote:
> Hi, If there are two inputs to a hadoop job one is text and another is binary (Sequence file), is there a way to set InputFormatClass to these two different streams ? job.setInputFormatClass will set to one type of input. Does that mean a hadoop job can not take input in two different formats? Thanks. Ajay Srivastava