Well, you define what your job does, but I expect that nearly all MR jobs do their parsing in the mapper, not in the reducer. A rough sketch of what that can look like is below.
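To make that concrete, here is a minimal, hypothetical mapper sketch. It assumes each input line carries one complete XML document read through the default TextInputFormat, and it pulls out a made-up <title> element purely for illustration; multi-line XML documents would need a custom InputFormat, since Hadoop doesn't ship one for XML.

// Sketch only: assumes one complete XML document per input line.
// The <title> element and class name are made up for illustration.
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlParsingMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text outKey = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String title;
    try {
      // Parse the record with a plain JAXP DOM parser -- any Java XML
      // library works here; Hadoop itself imposes nothing.
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc =
          builder.parse(new InputSource(new StringReader(line.toString())));
      // Pull the text of the first <title> element (hypothetical field).
      title = doc.getElementsByTagName("title").item(0).getTextContent();
    } catch (Exception e) {
      return; // skip records that are not well-formed XML
    }
    outKey.set(title);
    context.write(outKey, ONE);
  }
}

The reducer then just aggregates whatever the mapper emits, which is why the parsing work normally lives in the map phase.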
You may find these two videos useful:

<http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
<http://www.cloudera.com/hadoop-training-programming-with-hadoop>

Hope this helps!

Alex

On Sat, Jun 13, 2009 at 1:42 AM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:

> Thanks Alex,
>
> Is parsing the documents a task done within the reducer? Do we collect the
> data (document input) within a mapper and then parse it?
>
> Thanks in advance
>
> Alexandre Jaquet
>
> 2009/6/13 Alex Loddengaard <a...@cloudera.com>
>
> > When you refer to "filesystem," do you mean HDFS?
> >
> > It's very common to store lots of text files in HDFS and run multiple
> > jobs to process / learn about those text files. As for XML support, you
> > can use Java libraries (or Python libraries if you're using Hadoop
> > streaming) to parse the XML; Hadoop itself doesn't have much XML
> > support. I hope this answers your question.
> >
> > Alex
> >
> > On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Will Hadoop and MapReduce allow me to parse a large quantity of Open
> > > XML files distributed inside the same filesystem, but using multiple
> > > jobs?
> > >
> > > Thx
> > >
> > > Alexandre Jaquet