Well, you define what your job does, but I expect that nearly all MR jobs do their parsing in the mapper, not in the reducer. A rough sketch of what that can look like is below.
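To make that concrete, here is a minimal, hypothetical mapper sketch. It assumes each input line carries one complete XML document read through the default TextInputFormat, and it pulls out a made-up <title> element purely for illustration; multi-line XML documents would need a custom InputFormat, since Hadoop doesn't ship one for XML.

// Sketch only: assumes one complete XML document per input line.
// The <title> element and class name are made up for illustration.
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlParsingMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text outKey = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String title;
    try {
      // Parse the record with a plain JAXP DOM parser -- any Java XML
      // library works here; Hadoop itself imposes nothing.
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc =
          builder.parse(new InputSource(new StringReader(line.toString())));
      // Pull the text of the first <title> element (hypothetical field).
      title = doc.getElementsByTagName("title").item(0).getTextContent();
    } catch (Exception e) {
      return; // skip records that are not well-formed XML
    }
    outKey.set(title);
    context.write(outKey, ONE);
  }
}

The reducer then just aggregates whatever the mapper emits, which is why the parsing work normally lives in the map phase.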
You may find these two videos useful:

<http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
<http://www.cloudera.com/hadoop-training-programming-with-hadoop>

Hope this helps!

Alex

On Sat, Jun 13, 2009 at 1:42 AM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:

> Thanks Alex,
>
> Is parsing the documents a task done within the reducer? Do we collect the
> data (document input) within a mapper and then parse it?
>
> Thanks in advance
>
> Alexandre Jaquet
>
> 2009/6/13 Alex Loddengaard <a...@cloudera.com>
>
> > When you refer to "filesystem," do you mean HDFS?
> >
> > It's very common to store lots of text files in HDFS and run multiple
> > jobs to process / learn about those text files. As for XML support, you
> > can use Java libraries (or Python libraries if you're using Hadoop
> > streaming) to parse the XML; Hadoop itself doesn't have much XML
> > support. I hope this answers your question.
> >
> > Alex
> >
> > On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Will Hadoop and MapReduce allow me to parse a large quantity of Open
> > > XML files distributed inside the same filesystem, but using multiple
> > > jobs?
> > >
> > > Thx
> > >
> > > Alexandre Jaquet