Thanks Alex,

Is parsing the documents done within the reducer? That is, do we collect the
data (the input documents) in a mapper and then parse it in the reducer?

Thanks in advance

Alexandre Jaquet

2009/6/13 Alex Loddengaard <a...@cloudera.com>

> When you refer to "filesystem," do you mean HDFS?
>
> It's very common to store lots of text files in HDFS and run multiple jobs
> to process / learn about those text files.  As for XML support, you can use
> Java libraries (or Python libraries if you're using Hadoop streaming) to
> parse the XML; Hadoop itself doesn't have much XML support.  I hope this
> answers your question.
>
> Alex
>
> On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
>
> > Hi,
> >
> > Will Hadoop and MapReduce allow me to parse a large quantity of open
> > XML files stored in the same filesystem, using multiple jobs?
> >
> > Thx
> >
> > Alexandre Jaquet
> >
>
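
Alex's suggestion to parse the XML with ordinary Java libraries could look roughly like the sketch below. It assumes that parsing happens in the map step and that each input record holds one complete XML document; neither point is settled in this thread. It uses only the JDK's `javax.xml.parsers` DOM API and no Hadoop classes. The class name `XmlRecordParser` and the `extract` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: the parsing a map() call might do on one record,
// assuming the record holds one complete XML document.
public class XmlRecordParser {

    // Returns the text content of every element named tagName.
    public static List<String> extract(String xml, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String record = "<doc><title>A</title><title>B</title></doc>";
        // In a real job each extracted value would be emitted as a
        // key/value pair for the reducer to aggregate.
        for (String title : extract(record, "title")) {
            System.out.println(title);
        }
    }
}
```

Hadoop's default text input hands the mapper one line at a time. A multi-line XML file would therefore not arrive as a single record, and a real job would need an input format that delivers whole documents.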
