Seems like StreamInputFormat has not yet been ported to the new API. That's why you are not able to set it as the InputFormatClass. You can file a JIRA for this issue.
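Until that is fixed, one option is to fall back to the old `org.apache.hadoop.mapred` API, where `StreamInputFormat` can still be set on a `JobConf`. A rough, untested sketch (tag names and paths are copied from your driver below; the class name `XmlDriver` and job name are just placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

public class XmlDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(XmlDriver.class);
        conf.setJobName("xml-demo");

        // Tell StreamXmlRecordReader which tags delimit one record.
        conf.set("stream.recordreader.class",
                "org.apache.hadoop.streaming.StreamXmlRecordReader");
        conf.set("stream.recordreader.begin", "<info>");
        conf.set("stream.recordreader.end", "</info>");

        // Old API: the input format is set on the JobConf itself.
        conf.setInputFormat(StreamInputFormat.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(conf, new Path("/mapin/demo.xml"));
        FileOutputFormat.setOutputPath(conf, new Path("/mapout/demo"));

        JobClient.runJob(conf);
    }
}
```

Note this needs the hadoop-streaming jar on the classpath, since that is where `StreamInputFormat` and `StreamXmlRecordReader` live.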
On Tue, Jun 19, 2012 at 4:49 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> My driver function looks like this -
>
> public static void main(String[] args) throws IOException,
>         InterruptedException, ClassNotFoundException {
>     // TODO Auto-generated method stub
>
>     Configuration conf = new Configuration();
>     Job job = new Job();
>     conf.set("stream.recordreader.class",
>             "org.apache.hadoop.streaming.StreamXmlRecordReader");
>     conf.set("stream.recordreader.begin", "<info>");
>     conf.set("stream.recordreader.end", "</info>");
>     job.setInputFormatClass(StreamInputFormat.class);
>     job.setOutputKeyClass(Text.class);
>     job.setOutputValueClass(IntWritable.class);
>     FileInputFormat.addInputPath(job, new Path("/mapin/demo.xml"));
>     FileOutputFormat.setOutputPath(job, new Path("/mapout/demo"));
>     job.waitForCompletion(true);
> }
>
> Could you please point out my mistake?
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jun 19, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> > Hello Madhu,
> >
> > Thanks for the response. Actually I was trying to use the
> > new API (Job). Have you tried that? I was not able to set the
> > InputFormat using the Job API.
> >
> > Regards,
> > Mohammad Tariq
> >
> >
> > On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <phatak....@gmail.com> wrote:
> >> Hi,
> >> Set the following properties in the driver class:
> >>
> >> jobConf.set("stream.recordreader.class",
> >>         "org.apache.hadoop.streaming.StreamXmlRecordReader");
> >> jobConf.set("stream.recordreader.begin", "start-tag");
> >> jobConf.set("stream.recordreader.end", "end-tag");
> >> jobConf.setInputFormat(StreamInputFormat.class);
> >>
> >> In the Mapper, the xml record will come as the key, of type Text, so your
> >> mapper will look like
> >>
> >> public class MyMapper<K,V> implements Mapper<Text,Text,K,V>
> >>
> >>
> >> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <donta...@gmail.com> wrote:
> >>>
> >>> Hello list,
> >>>
> >>> Could anyone who has written MapReduce jobs to process xml
> >>> documents stored in their cluster using "StreamXmlRecordReader" share
> >>> his/her experience? Or if you can provide me some pointers
> >>> addressing that. Many thanks.
> >>>
> >>> Regards,
> >>> Mohammad Tariq
> >>
> >>
> >>
> >> --
> >> https://github.com/zinnia-phatak-dev/Nectar
>
--
https://github.com/zinnia-phatak-dev/Nectar
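For anyone reading the archives: filling out Madhu's one-line mapper signature, an old-API mapper that consumes `StreamXmlRecordReader` output might look like the sketch below. Each whole XML record arrives as the key (a `Text`); the value is empty. The class name `XmlRecordMapper` and the record-counting logic are illustrative only, not from the thread:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class XmlRecordMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(Text key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // 'key' holds one complete <info>...</info> block as raw text.
        // Here we just emit one count per record; real code would parse
        // the XML (e.g. with a SAX or DOM parser) before collecting.
        output.collect(new Text("records"), ONE);
    }
}
```

Pair this with `conf.setMapperClass(XmlRecordMapper.class)` in the old-API driver, and set the output key/value classes to match (`Text`/`IntWritable`).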