Thank you for your response. I was imagining the two concepts, i) using hadoop.contrib.index to index documents and ii) providing search in a distributed fashion, to be all in one box.
So basically, hadoop.contrib.index is used to create Lucene indexes in a distributed fashion (by creating shards, each shard being a Lucene instance). And then I can use Katta or any other distributed Lucene application to serve Lucene indexes distributed over many servers.

Deepika

-----Original Message-----
From: Ning Li [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 7:08 AM
To: core-user@hadoop.apache.org
Subject: Re: Distributed Lucene - from hadoop contrib

> 1) Katta and Distributed Lucene are different projects though, right? Both
> being based on kind of the same paradigm (Distributed Index)?

The design of Katta and that of Distributed Lucene were quite different last time I checked. I pointed out the Katta project because you can find the code for Distributed Lucene there.

> 2) So, I should be able to use the hadoop.contrib.index with HDFS.
> Though, it would be much better if it were integrated with "Distributed
> Lucene" or the "Katta project", as these are designed keeping the
> structure and behavior of indexes in mind. Right?

As described in the README file, hadoop.contrib.index uses map/reduce to build Lucene instances. It does not contain a component that serves queries. If that's not sufficient for you, you can check out the designs of Katta and Distributed Lucene and see which one suits your use better.

Ning
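For intuition, the sharding idea discussed above can be sketched roughly as follows. This is purely illustrative Python, not the actual hadoop.contrib.index code (which is Java map/reduce building real Lucene indexes); the names NUM_SHARDS, route_to_shard, and build_shard_indexes are hypothetical, and a hash-modulo routing scheme is only one possible way documents could be assigned to shards.

```python
# Illustrative sketch: partition documents into shards, where in the real
# system each shard would become one Lucene index built by a map/reduce job.
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def route_to_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map-side idea: deterministically pick a shard by hashing the doc id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def build_shard_indexes(docs):
    """Reduce-side idea: group documents by shard; each group corresponds
    to one index (one shard) that a serving layer like Katta could host."""
    shards = defaultdict(list)
    for doc_id, text in docs:
        shards[route_to_shard(doc_id)].append((doc_id, text))
    return dict(shards)


docs = [("doc-1", "hello"), ("doc-2", "world"), ("doc-3", "hadoop")]
shards = build_shard_indexes(docs)
# Every document lands in exactly one shard; routing is deterministic,
# so re-indexing the same doc id always hits the same shard.
```

Note that this only covers index building; as Ning says below, serving queries over those shards is a separate component (which is where Katta or Distributed Lucene come in).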