But I have downloaded "hadoop-streaming-0.20.205.0.jar", and it contains the StreamXmlRecordReader.class file, so it should support StreamInputFormat.
Regards,
Mohammad Tariq

On Tue, Jun 19, 2012 at 5:54 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Thanks Madhu. I'll do that.
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jun 19, 2012 at 5:43 PM, madhu phatak <phatak....@gmail.com> wrote:
>> It seems StreamInputFormat has not yet been ported to the new API. That's
>> why you are not able to set it as the InputFormatClass. You can file a
>> JIRA for this issue.
>>
>>
>> On Tue, Jun 19, 2012 at 4:49 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>> My driver function looks like this -
>>>
>>> public static void main(String[] args) throws IOException,
>>>         InterruptedException, ClassNotFoundException {
>>>     Configuration conf = new Configuration();
>>>     conf.set("stream.recordreader.class",
>>>             "org.apache.hadoop.streaming.StreamXmlRecordReader");
>>>     conf.set("stream.recordreader.begin", "<info>");
>>>     conf.set("stream.recordreader.end", "</info>");
>>>     Job job = new Job(conf);
>>>     job.setInputFormatClass(StreamInputFormat.class);
>>>     job.setOutputKeyClass(Text.class);
>>>     job.setOutputValueClass(IntWritable.class);
>>>     FileInputFormat.addInputPath(job, new Path("/mapin/demo.xml"));
>>>     FileOutputFormat.setOutputPath(job, new Path("/mapout/demo"));
>>>     job.waitForCompletion(true);
>>> }
>>>
>>> Could you please point out my mistake?
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>> On Tue, Jun 19, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com>
>>> wrote:
>>> > Hello Madhu,
>>> >
>>> > Thanks for the response. Actually I was trying to use the
>>> > new API (Job). Have you tried that? I was not able to set the
>>> > InputFormat using the Job API.
>>> >
>>> > Regards,
>>> > Mohammad Tariq
>>> >
>>> >
>>> > On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <phatak....@gmail.com>
>>> > wrote:
>>> >> Hi,
>>> >> Set the following properties in the driver class:
>>> >>
>>> >> jobConf.set("stream.recordreader.class",
>>> >>         "org.apache.hadoop.streaming.StreamXmlRecordReader");
>>> >> jobConf.set("stream.recordreader.begin", "start-tag");
>>> >> jobConf.set("stream.recordreader.end", "end-tag");
>>> >>
>>> >> jobConf.setInputFormat(StreamInputFormat.class);
>>> >>
>>> >> In the Mapper, the xml record will come in as a key of type Text, so
>>> >> your mapper will look like
>>> >>
>>> >> public class MyMapper<K,V> implements Mapper<Text,Text,K,V>
>>> >>
>>> >>
>>> >> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <donta...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello list,
>>> >>>
>>> >>> Could anyone who has written MapReduce jobs to process xml
>>> >>> documents stored in their cluster using "StreamXmlRecordReader"
>>> >>> share his/her experience? Or could you provide me some pointers
>>> >>> addressing that? Many thanks.
>>> >>>
>>> >>> Regards,
>>> >>> Mohammad Tariq
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> https://github.com/zinnia-phatak-dev/Nectar
>>
>>
>>
>>
>>
>> --
>> https://github.com/zinnia-phatak-dev/Nectar
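[Editor's note] Putting madhu's advice together for future readers: in Hadoop 0.20.x, StreamInputFormat lives in the old `org.apache.hadoop.mapred` API, which is why `job.setInputFormatClass(...)` on the new `Job` API rejects it. A minimal old-API driver and mapper might look like the sketch below. This is an untested sketch, not a verified job: the class names `XmlDriver`/`XmlMapper`, the `<info>` tags, and the HDFS paths are taken from or modeled on the thread, and it assumes hadoop-streaming-0.20.205.0.jar is on the classpath.

```java
// Sketch: old-API (org.apache.hadoop.mapred) driver + mapper for
// StreamInputFormat / StreamXmlRecordReader, per madhu's suggestion above.
// XmlDriver/XmlMapper, tag names, and paths are illustrative placeholders.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.streaming.StreamInputFormat;

public class XmlDriver {

    // As madhu notes, StreamXmlRecordReader delivers the whole
    // <info>...</info> record as the KEY (Text); the value is empty.
    public static class XmlMapper extends MapReduceBase
            implements Mapper<Text, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        public void map(Text key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Parse the xml record held in `key` here; this sketch just
            // emits each record once.
            output.collect(key, ONE);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(XmlDriver.class);
        conf.set("stream.recordreader.class",
                "org.apache.hadoop.streaming.StreamXmlRecordReader");
        conf.set("stream.recordreader.begin", "<info>");
        conf.set("stream.recordreader.end", "</info>");
        conf.setInputFormat(StreamInputFormat.class); // old-API setter
        conf.setMapperClass(XmlMapper.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(conf, new Path("/mapin/demo.xml"));
        FileOutputFormat.setOutputPath(conf, new Path("/mapout/demo"));
        JobClient.runJob(conf);
    }
}
```

Note the configuration is set on the `JobConf` before the job is submitted, so the record-reader properties are actually visible to the tasks, unlike a `Configuration` created separately from `new Job()`.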