Hi All,

I am trying to put together a distributed system to build and serve Lucene indexes.
I came across the Distributed Lucene project-
http://wiki.apache.org/hadoop/DistributedLucene
https://issues.apache.org/jira/browse/HADOOP-3394

and have a couple of questions. It would be really helpful if someone could
provide some insights.

1) Is this code production ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same time.
How well will the system perform if there are frequent updates? Will it
handle the combined search and update load, or would it be better to rebuild
or update the indexes on separate machines and then deploy the indexes back
to the machines that are serving them?

Basically, I am trying to choose between two approaches:

1) Use Hadoop to build and/or update Lucene indexes and then deploy them on a
separate cluster that takes care of load balancing, fault tolerance, etc.
There is a package in Hadoop contrib that does this, so I can reuse that code.

2) Use and/or modify the Distributed Lucene code.

I am expecting daily updates to our index, so I am not sure whether the
Distributed Lucene code (which allows searches and updates on the same
indexes) will handle the combined search and update load efficiently.
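For the deploy step in approach 1, one common pattern is to build each index
generation offline and then atomically repoint the serving nodes at the new
generation, so searchers never see a half-copied index. Here is a minimal,
hypothetical sketch of that swap using only the JDK (no Lucene API involved);
the directory names and the `swap` helper are assumptions for illustration,
not part of the Hadoop contrib package:

```java
import java.nio.file.*;

public class IndexSwap {
    // Repoint a "current" symlink at a freshly built index directory.
    // Creating the link under a temporary name and then renaming it makes
    // the switch a single atomic step on POSIX filesystems, so a searcher
    // either sees the old generation or the new one, never a mix.
    public static void swap(Path indexRoot, Path newGeneration) throws Exception {
        Path tmpLink = indexRoot.resolve("current.tmp");
        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, newGeneration);
        Files.move(tmpLink, indexRoot.resolve("current"),
                   StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("indexes");
        Path gen1 = Files.createDirectory(root.resolve("gen-001"));
        Path gen2 = Files.createDirectory(root.resolve("gen-002"));
        swap(root, gen1);   // first deployment
        swap(root, gen2);   // daily rebuild replaces it atomically
        System.out.println(Files.readSymbolicLink(root.resolve("current")).getFileName());
    }
}
```

A searcher process would simply (re)open the index through the "current"
path after each swap; the old generation directory can be deleted once no
reader holds it open.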

I think you should take a look at Katta, as this is exactly the type of use case it's designed to handle.

http://katta.sourceforge.net/

Other people use the distributed search support inside of Nutch.

And Solr has distributed search support, though it's still pretty new.

-- Ken
--
Ken Krugler
+1 530-210-6378

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
