Hi All,

I am trying to put together a distributed system to build and serve Lucene indexes.
I came across the Distributed Lucene project-
http://wiki.apache.org/hadoop/DistributedLucene
https://issues.apache.org/jira/browse/HADOOP-3394

and have a couple of questions. It would be really helpful if someone could
provide some insights.

1) Is this code production ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same time.
How well will the system perform if there are frequent updates? Will it
handle the combined search and update load, or would it be better to rebuild
or update the indexes on separate machines and then deploy the indexes back
to the machines that are serving them?

Basically, I am trying to choose between two approaches:

1) Use Hadoop to build and/or update Lucene indexes and then deploy them on a
separate cluster that takes care of load balancing, fault tolerance, etc.
There is a package in Hadoop contrib that does this, so I can reuse that code.

2) Use and/or modify the Distributed Lucene code.

I am expecting daily updates to our index, so I am not sure whether the
Distributed Lucene code (which allows searches and updates on the same
indexes) will handle the combined search and update load efficiently.
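For the deploy step in approach 1, one common pattern is to build each index
generation offline and then atomically repoint the serving nodes at the new
generation, so searchers never see a half-copied index. Here is a minimal,
hypothetical sketch of that swap using only the JDK (no Lucene API involved);
the directory names and the `swap` helper are assumptions for illustration,
not part of the Hadoop contrib package:

```java
import java.nio.file.*;

public class IndexSwap {
    // Repoint a "current" symlink at a freshly built index directory.
    // Creating the link under a temporary name and then renaming it makes
    // the switch a single atomic step on POSIX filesystems, so a searcher
    // either sees the old generation or the new one, never a mix.
    public static void swap(Path indexRoot, Path newGeneration) throws Exception {
        Path tmpLink = indexRoot.resolve("current.tmp");
        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, newGeneration);
        Files.move(tmpLink, indexRoot.resolve("current"),
                   StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("indexes");
        Path gen1 = Files.createDirectory(root.resolve("gen-001"));
        Path gen2 = Files.createDirectory(root.resolve("gen-002"));
        swap(root, gen1);   // first deployment
        swap(root, gen2);   // daily rebuild replaces it atomically
        System.out.println(Files.readSymbolicLink(root.resolve("current")).getFileName());
    }
}
```

A searcher process would simply (re)open the index through the "current"
path after each swap; the old generation directory can be deleted once no
reader holds it open.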

I think you should take a look at Katta, as this is exactly the type of use case it's designed to handle.

http://katta.sourceforge.net/

Other people use the distributed search support inside of Nutch.

And Solr has distributed search support, though it's still pretty new.

-- Ken
--
Ken Krugler
+1 530-210-6378

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
