Re: Distributed Lucene - from hadoop contrib

Ning Li Mon, 18 Aug 2008 08:06:15 -0700

On 8/12/08, Deepika Khera <[EMAIL PROTECTED]> wrote:
> I was imagining the 2 concepts of i) using hadoop.contrib.index to index
> documents ii) providing search in a distributed fashion, to be all in
> one box.

Ideally, yes. However, while map/reduce is a good fit for
batch-building an index, there is no consensus on whether it is a good
idea to serve that index directly from HDFS, because random reads
perform poorly in HDFS.

On 8/14/08, Anoop Bhatti <[EMAIL PROTECTED]> wrote:
> I'd like to know if I'm heading down the right path, so my questions are:
> * Has anyone tried searching a distributed Lucene index using a method
> like this before?  It seems too easy.  Are there any "gotchas" that I
> should look out for as I scale up to more nodes and a larger index?
> * Do you think that going ahead with this approach, which consists of
> 1) creating a Lucene index using the  hadoop.contrib.index code
> (thanks, Ning!) and 2) leaving that index "in-place" on hdfs and
> searching over it using the client code below, is a good approach?

Yes, the code works on a single index shard. The performance concern
described above still applies. More importantly, as your index scales
out there will be multiple shards, and with them the challenges of
load balancing, fault tolerance, etc.
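
To make the multi-shard point concrete, here is a rough sketch in
plain Java of what a search client would have to do once there is
more than one shard. Nothing here comes from hadoop.contrib.index or
Lucene: ShardReplica, Hit, searchShard and searchAll are hypothetical
names made up for illustration, and a real client would wrap a Lucene
searcher behind the interface. It rotates the starting replica as a
crude form of load balancing, falls back to the next replica on
failure, and merges the per-shard top hits by score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardSearchSketch {
    /** One scored hit; docId is only meaningful within its shard. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one copy of one shard. */
    interface ShardReplica {
        List<Hit> search(String query, int topK) throws Exception;
    }

    // Shared counter used to rotate which replica is tried first.
    private static final AtomicInteger NEXT = new AtomicInteger();

    /** Try the replicas of one shard in turn until one of them answers. */
    static List<Hit> searchShard(List<ShardReplica> replicas, String query, int topK) {
        int n = replicas.size();
        if (n == 0) {
            return Collections.emptyList();
        }
        int start = Math.floorMod(NEXT.getAndIncrement(), n);
        for (int i = 0; i < n; i++) {
            ShardReplica replica = replicas.get((start + i) % n);
            try {
                return replica.search(query, topK);
            } catch (Exception e) {
                // This replica failed; fall through to the next one.
            }
        }
        // Every replica failed: this shard contributes no hits.
        return Collections.emptyList();
    }

    /** Query every shard, then keep the topK hits overall by score. */
    static List<Hit> searchAll(List<List<ShardReplica>> shards, String query, int topK) {
        List<Hit> merged = new ArrayList<>();
        for (List<ShardReplica> replicas : shards) {
            merged.addAll(searchShard(replicas, query, topK));
        }
        // Highest score first.
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
    }
}
```

Even this toy version has to decide what to do when a whole shard is
unreachable (here it silently returns partial results), and it
queries the shards one after another rather than in parallel.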

> * What is the status of the bailey project?  It seems to be working on
> the same type of problem. Should I wait until that project comes out
> with code?

There is no timeline for Bailey right now.

Ning
