Hi Alex,

First, thanks again for responding. I saw that Katta already allows full-text search within its search engine, with PDFBox used to index and search PDF files ;) I will study your training videos tonight to learn how to implement the job for XML :))
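For reference, here is a minimal sketch of what parsing in the mapper could look like with Hadoop streaming and Python's standard xml.etree.ElementTree module. It assumes each input line holds one complete XML document, which is an assumption for illustration and not something stated in this thread; the names map_record and run_mapper are hypothetical as well. It simply emits one (tag, 1) pair per XML element, so a reducer could sum the counts.

```python
#!/usr/bin/env python3
"""Hadoop streaming mapper sketch: parse one XML document per input line."""
import sys
import xml.etree.ElementTree as ET


def map_record(line):
    """Parse one XML document and yield (element_tag, 1) pairs.

    Assumption: the whole document fits on a single input line.
    A malformed record yields one ("__parse_error__", 1) pair so the
    job can count bad inputs instead of failing.
    """
    line = line.strip()
    if not line:
        return
    try:
        root = ET.fromstring(line)
    except ET.ParseError:
        yield ("__parse_error__", 1)
        return
    # root.iter() walks the root and all descendants in document order.
    for element in root.iter():
        yield (element.tag, 1)


def run_mapper(stream_in, stream_out):
    """Write tab-separated key/value lines, the format streaming expects."""
    for line in stream_in:
        for key, value in map_record(line):
            stream_out.write(f"{key}\t{value}\n")


# To use this as a streaming mapper script, call:
#     run_mapper(sys.stdin, sys.stdout)
```

All the parsing happens in the mapper here; the reducer would only aggregate the emitted pairs.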
2009/6/15 Alex Loddengaard <a...@cloudera.com>

> Well, you define what your job does, but I expect that nearly all MR jobs do
> their parsing in the mapper, not in the reducer. You may find these two
> videos useful:
>
> <http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
> <http://www.cloudera.com/hadoop-training-programming-with-hadoop>
>
> Hope this helps!
>
> Alex
>
> On Sat, Jun 13, 2009 at 1:42 AM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
>
> > Thanks Alex,
> >
> > Is parsing the documents a task done within the reducer? Do we collect the
> > data (document input) within a mapper and then parse it?
> >
> > Thanks in advance
> >
> > Alexandre Jaquet
> >
> > 2009/6/13 Alex Loddengaard <a...@cloudera.com>
> >
> > > When you refer to "filesystem," do you mean HDFS?
> > >
> > > It's very common to store lots of text files in HDFS and run multiple jobs
> > > to process / learn about those text files. As for XML support, you can use
> > > Java libraries (or Python libraries if you're using Hadoop streaming) to
> > > parse the XML; Hadoop itself doesn't have much XML support. I hope this
> > > answers your question.
> > >
> > > Alex
> > >
> > > On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet <alexjaq...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Will Hadoop and map/reduce allow me to parse a large quantity of open
> > > > XML files distributed inside the same filesystem, but using multiple jobs?
> > > >
> > > > Thx
> > > >
> > > > Alexandre Jaquet