Hello,

I don't have any of the scalability requirements mentioned in this thread, but the problem is an interesting one. IMHO, Lucene needs a connection-pool equivalent, or at least a best-practices approach to load balancing.

Opening, locking, reading, and writing remote indexes over RMI looks good on paper but is likely to melt under anything approaching the web traffic a popular site sees. That's why you see people running (so many) JVMs locally. Solr helps, but passing long XML or JSON URLs across your own machines for thousands or millions of requests just to maintain a Lucene index looks redundant to me.

Adding messaging layers to propagate changes or updates introduces more points of failure.

I wonder if a system would work where just a few machines capture, say, 100k updates at a time in memory and then write them out as .gz files to locally attached external drives. These batch files would be exposed through a web service, and load-balanced remote boxes would access them using servlets.
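Something like the following servlet could do the serving side. This is only a rough sketch of the idea: the batch directory, the batch_<seq>.gz naming, and the ?seq= request parameter are all made up for illustration.

import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class BatchServlet extends HttpServlet {

    // Directory on the locally attached drive holding batch_<seq>.gz files
    // (assumed layout).
    private static final File BATCH_DIR = new File("/mnt/external/index-batches");

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // A remote box asks for a specific batch, e.g. GET /batches?seq=42
        String seq = req.getParameter("seq");
        File batch = (seq == null) ? null : new File(BATCH_DIR, "batch_" + seq + ".gz");
        if (batch == null || !batch.isFile()) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setContentType("application/x-gzip");
        resp.setContentLength((int) batch.length());

        // Stream the already-gzipped batch straight to the remote box.
        InputStream in = new BufferedInputStream(new FileInputStream(batch));
        try {
            OutputStream out = resp.getOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
        }
    }
}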

The remote boxes would connect in rotation, downloading batched index updates. Heck, start splitting up the big files with Hadoop's HDFS and make it a party!
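On the pulling side, each remote box could loop over the capture machines in rotation, decompress the next batch, and feed it into its local index. Again, just a sketch under assumptions: the host list, URL pattern, one tab-separated document per line batch format, and field names are invented, and the IndexWriter calls follow the Lucene 2.x API.

import java.io.*;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchPuller {

    // The few "capture" machines, contacted round-robin (hypothetical hosts).
    private static final String[] HOSTS = { "capture1:8080", "capture2:8080" };

    public static void main(String[] args) throws Exception {
        // Append to this box's existing local index.
        IndexWriter writer = new IndexWriter("/local/index", new StandardAnalyzer(), false);
        long seq = Long.parseLong(args[0]); // next batch number to fetch

        for (int i = 0; ; i++, seq++) {
            // Rotate across the capture boxes, one batch per request.
            String host = HOSTS[i % HOSTS.length];
            URL url = new URL("http://" + host + "/batches?seq=" + seq);

            InputStream raw;
            try {
                raw = url.openStream();
            } catch (FileNotFoundException e) {
                break; // 404: no more batches published yet
            }

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(raw), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                // Assumed batch format: one "<id>\t<body>" document per line.
                int tab = line.indexOf('\t');
                Document doc = new Document();
                doc.add(new Field("id", line.substring(0, tab),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("body", line.substring(tab + 1),
                        Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            in.close();
        }
        writer.close();
    }
}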

Regards,

Peter W.


