Tim,

I looked there, but it is a setup manual. I read the MapReduce, Sawzall, and
the MS paper on these, but I need "best practices."

Thank you,
Mark

On Fri, Jan 23, 2009 at 3:22 PM, tim robertson <timrobertson...@gmail.com> wrote:

> Hi,
>
> Sounds like you might want to look at the Nutch project architecture
> and then see the Nutch on Hadoop tutorial -
> http://wiki.apache.org/nutch/NutchHadoopTutorial  It does web
> crawling and indexing using Lucene.  It would be a good place to
> start anyway for ideas, even if it doesn't end up meeting your exact
> needs.
>
> Cheers,
>
> Tim
>
>
> On Fri, Jan 23, 2009 at 10:11 PM, Mark Kerzner <markkerz...@gmail.com>
> wrote:
> > Hi, esteemed group,
> > how would I form Maps in MapReduce to recursively look at every file in a
> > directory, and do something to this file, such as produce a PDF or compute
> > its hash?
> >
> > For that matter, Google builds its index using MapReduce, or so the
> > papers say. First the crawlers store all the files. Are there papers on
> > the architecture of this? How and where do the crawlers store the
> > downloaded files?
> >
> > Thank you,
> > Mark
> >
>
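For the question quoted above about mapping over every file in a directory:
one common pattern is to feed the job a plain-text input file listing one
HDFS path per line, and have each map call open the listed file and do the
per-file work there. The sketch below is only an illustration of that
pattern, assuming Hadoop's org.apache.hadoop.mapreduce Java API; the class
and job names (FileHashJob, "file-hash") are made up for the example, and it
computes an MD5 hash per file as the "do something" step.

// Sketch only: assumes the job input is a text file listing one HDFS path
// per line. The mapper opens each listed file and emits (path, MD5 hash).
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FileHashJob {

  public static class HashMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String pathString = value.toString().trim();
      if (pathString.isEmpty()) {
        return;
      }
      Path path = new Path(pathString);
      FileSystem fs = path.getFileSystem(context.getConfiguration());
      try {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        InputStream in = fs.open(path);
        try {
          byte[] buffer = new byte[64 * 1024];
          int read;
          while ((read = in.read(buffer)) > 0) {
            md5.update(buffer, 0, read);
          }
        } finally {
          in.close();
        }
        // Hex-encode the digest and emit (path, hash).
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
          hex.append(String.format("%02x", b));
        }
        context.write(new Text(pathString), new Text(hex.toString()));
      } catch (NoSuchAlgorithmException e) {
        throw new IOException(e);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0]: text file(s) listing HDFS paths; args[1]: output directory.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "file-hash");
    job.setJarByClass(FileHashJob.class);
    job.setMapperClass(HashMapper.class);
    job.setNumReduceTasks(0);  // map-only job, output goes straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

One way to produce the input list is a recursive directory listing (for
example, hadoop fs -lsr on the directory, keeping only the path column) so
the job itself never has to walk the tree; how well the paths spread across
map tasks then depends on the input format and split settings you choose.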
