Re: Serving remote lucene client - RMI vs HTTP

Grant Ingersoll Sun, 15 Jul 2007 20:00:01 -0700

Hi Kumar,

I am curious about where the bulk of time was spent in Lucene. I am not doubting your numbers or analysis, just want to know if there was anything that could be improved in Lucene and make sure you are solidly in need of going to multiple machines, as it adds an extra level of complexity. Can you provide info about number of documents, # of updates, # of queries, etc.? What kind of analysis, etc. are you doing? That being said, putting the web app on one machine and the search application on the other makes sense much in the same way it makes sense in most cases to do this for a database.

As for POST/GET vs. RMI, have a look at Solr, esp. its replication capabilities for load balancing search servers. I think it answers the question in favor of the POST/GET approach. In fact, you may be able to drop in Solr to your situation w/o too much work.


Cheers,
Grant


On Jul 15, 2007, at 10:10 PM, kumarlimbu wrote:

Hi Everyone,

We are using Lucene, Nutch and the Spring framework to create a specialized
search engine. Due to growing traffic we are trying to scale. By doing some
tests we found out that the bottleneck was Lucene search. We used some heavy
traffic simulation and logged the time taken by each portion of the
server response, and found out that the bulk of the time was spent in
searching the Lucene index.

In order to accommodate higher traffic we are planning on splitting our
application into 2 portions:
1. Web application (on 1 machine)
2. Search application (on more than 1 machine)
Each application will reside on a (possibly) separate machine. We are
looking forward to scaling by adding more than 1 machine dedicated to
searching, as Lucene search seems to be the bottleneck.
The web application will provide the front end to the user. All static
pages, images and the style information will reside on this machine. It
will also serve dynamic pages, but all the searching will take place on the
search application. The web application will send search parameters to the
search application and, after searching, it will send back the results to
the web application, which will format and display them to the user.
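
To make the request/response flow above concrete, here is a minimal sketch of the HTTP variant in Java. Everything specific in it is an assumption for illustration: the SearchProtocol class, the /search path, the q and n parameter names, and the <results>/<hit> XML layout are hypothetical and do not come from Lucene, Nutch or Solr.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the URL layout and XML format are assumptions, not an existing API.
final class SearchProtocol {

    // Builds the GET URL the web application would send to a search application.
    static String buildSearchUrl(String baseUrl, String query, int maxHits) {
        return baseUrl + "/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&n=" + maxHits;
    }

    // Extracts the text of every <hit> element from an XML response body.
    static List<String> parseHits(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("hit");
            List<String> hits = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hits.add(nodes.item(i).getTextContent());
            }
            return hits;
        } catch (Exception e) {
            // Parser setup, I/O and malformed XML are all reported the same way here.
            throw new IllegalArgumentException("Malformed search response", e);
        }
    }
}
```

The web application would fetch the built URL with whatever HTTP client it already uses, pass the response body to parseHits, and then format the hits for display.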

What we are unable to decide is whether to use RMI (cluster4spring,
http://www.soft-amis.com/index.html?return=http://www.soft-amis.com/cluster4spring/index.html)
or simple HTTP POST/GET requests with responses in XML
format. We did some research and found out that RMI (with clustering
support) might be more suitable for our needs. Unfortunately our team is
not familiar with RMI, so we don't know if there will be any issues with it
during implementation.
The advantage of using simple GET/POST is that we have more control over
which searcher app to use and when. An important criterion for us is to
disable searching from a searcher app whose index is being updated. This is
very important for us. Is there any way in RMI to inform the web (client)
application that a particular server is unavailable (due to its index being
updated)?
We would also like to know if anybody has implemented Lucene searching in a
similar fashion. Thanks for your help.
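
For the requirement of skipping a searcher whose index is being updated, one possible client-side approach with plain HTTP is a small registry in the web application that hands out servers round-robin and leaves out any that are flagged as updating. The sketch below is only an illustration under that assumption: SearcherPool and its method names are hypothetical, and how the flag gets set (for example, by the update job notifying the web application) is left open.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side registry of search servers; not part of Lucene, Solr or RMI.
final class SearcherPool {
    private final List<String> servers;
    private final Set<String> updating = ConcurrentHashMap.newKeySet();
    private final AtomicInteger cursor = new AtomicInteger();

    SearcherPool(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Call before a server starts updating its index.
    void markUpdating(String server) {
        updating.add(server);
    }

    // Call once the update has finished and the server can serve searches again.
    void markAvailable(String server) {
        updating.remove(server);
    }

    // Returns the next server in round-robin order that is not updating,
    // or empty if every server is currently unavailable.
    Optional<String> next() {
        for (int i = 0; i < servers.size(); i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), servers.size());
            String candidate = servers.get(index);
            if (!updating.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The web application would call next() before each search request and treat an empty result as "no searcher available right now".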

Kumar Limbu



--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ


