Re: Nutch and Distributed Lucene

Ning Li Tue, 01 Apr 2008 09:06:37 -0700

Hi,

Nutch builds Lucene indexes, but it is much more than that. It is a
web search application that crawls the web, inverts links, and
builds indexes. Each step is one or more Map/Reduce jobs. You can find
more information at http://lucene.apache.org/nutch/
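
To give a feel for what "inverts links" means as a Map/Reduce step, here
is a minimal sketch in plain Python. It does not use Nutch or Hadoop code;
the function names and the example.* pages are made up for illustration.
The map step emits (target, source) for every outlink, and the reduce
step collects all sources that point at a given target.

```python
from collections import defaultdict

def map_outlinks(page_url, outlinks):
    # Map step (hypothetical): for each outlink, emit (target, source).
    for target in outlinks:
        yield target, page_url

def reduce_inlinks(target, sources):
    # Reduce step (hypothetical): gather every page linking to the target.
    return target, sorted(set(sources))

def invert_links(pages):
    # Stand-in for the shuffle: group map output by key before reducing.
    grouped = defaultdict(list)
    for url, outlinks in pages.items():
        for target, source in map_outlinks(url, outlinks):
            grouped[target].append(source)
    return dict(reduce_inlinks(t, s) for t, s in grouped.items())

# Toy crawl result: page -> pages it links to (made-up data).
pages = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
}
inlinks = invert_links(pages)
```

Here inlinks maps each page to the pages that link to it; a page with no
incoming links, such as a.example, simply has no entry.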

The Map/Reduce job that builds Lucene indexes in Nutch is customized to
the data schema and structures used in Nutch. The index contrib package in
Hadoop provides a general, configurable process for building Lucene
indexes in parallel with a Map/Reduce job. That's the main difference. A
second difference is that the index build job in Nutch builds indexes in
reduce tasks, while the index contrib package builds indexes in both map
and reduce tasks, and there are advantages in doing that...
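
The difference in where the index gets built can be sketched roughly as
follows. This is a toy model in plain Python, not the Nutch job or the
contrib package: a dict of term -> document ids stands in for a Lucene
index, and every function name here is an assumption made for
illustration. In the first variant the map side only passes documents
through and all indexing happens on the reduce side. In the second, each
map task builds a small partial index and the reduce side merges them.

```python
from collections import defaultdict

def build_index(docs):
    # Toy inverted index: term -> set of document ids.
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge_indexes(partials):
    # Union the posting sets of several partial indexes.
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged

def reduce_only_build(splits):
    # Map tasks forward raw documents; the reduce side does all indexing.
    shuffled = [doc for split in splits for doc in split]
    return build_index(shuffled)

def map_and_reduce_build(splits):
    # Each map task indexes its own split; the reduce side merges them.
    partials = [build_index(split) for split in splits]
    return merge_indexes(partials)

# Two input splits of (doc_id, text) pairs (made-up data).
splits = [
    [(1, "nutch crawls the web"), (2, "lucene builds indexes")],
    [(3, "nutch builds lucene indexes")],
]
reduce_only = reduce_only_build(splits)
map_and_reduce = map_and_reduce_build(splits)
```

Both variants end up with the same index for the same input; what changes
is how the indexing work is spread across the map and reduce tasks.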

Regards,
Ning

On 4/1/08, Naama Kraus <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'd like to know if Nutch is running on top of Lucene, or is it non related
> to Lucene. I.e. indexing, parsing, crawling, internal data structures ... -
> all written from scratch using MapReduce (my impression) ?
>
> What is the relation between Nutch and the distributed Lucene patch that was
> inserted lately into Hadoop ?
>
> Thanks for any enlightening,
> Naama
>
> --
> oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
> 00 oo 00 oo
> "If you want your children to be intelligent, read them fairy tales. If you
> want them to be more intelligent, read them more fairy tales." (Albert
> Einstein)
>
