Thanks Alex. Is parsing the documents a task done within the reducer? Or do we collect the data (the input documents) within a mapper and then parse it?
Thanks in advance,

Alexandre Jaquet

2009/6/13 Alex Loddengaard <a...@cloudera.com>

> When you refer to "filesystem," do you mean HDFS?
>
> It's very common to store lots of text files in HDFS and run multiple jobs
> to process / learn about those text files. As for XML support, you can use
> Java libraries (or Python libraries if you're using Hadoop streaming) to
> parse the XML; Hadoop itself doesn't have much XML support. I hope this
> answers your question.
>
> Alex
>
> On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
>
> > Hi,
> >
> > Will Hadoop and map / reduce allow me to parse a large quantity of open
> > XML files distributed inside the same filesystem, but using multiple jobs?
> >
> > Thx
> >
> > Alexandre Jaquet
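
As an illustration of the Python-with-Hadoop-streaming route mentioned in the quoted reply, here is a minimal sketch of a mapper that does the XML parsing itself, so the reducer only sees extracted key/value pairs. It is not taken from the thread. It assumes each input line holds one complete, small XML document, which will not be true of every data set, and the function name `map_xml_records` and the choice of the root tag as the key are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def map_xml_records(lines):
    """Yield 'key<TAB>value' strings, one per well-formed XML record.

    Assumption: every line is a complete XML document.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            root = ET.fromstring(line)
        except ET.ParseError:
            # Skip malformed records instead of failing the whole task.
            continue
        # Hypothetical key choice: the root element's tag, with a count of 1.
        yield f"{root.tag}\t1"


def main():
    # Hadoop streaming feeds input records on stdin and reads
    # tab-separated key/value pairs from stdout.
    for record in map_xml_records(sys.stdin):
        print(record)
```

To use this as a streaming mapper, call `main()` at the bottom of the script. A reducer could then sum the counts per key without touching any XML.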