Hello,

I don't have any of the scalability requirements mentioned in this thread, but the problem is an interesting one. IMHO, Lucene needs a connection-pool equivalent, or at least a best-practices approach to load balancing.

Opening, locking, reading, and writing remote indexes over RMI looks good on paper but is likely to melt under anything approaching the web traffic a popular site sees. That's why you see people running (so many) JVMs locally. Solr helps, but passing long XML or JSON URLs across your own machines for thousands or millions of requests just to maintain a Lucene index looks redundant to me.

Adding messaging layers to propagate changes or updates introduces more points of failure.

I wonder if a system would work where just a few machines capture, say, 100k updates at a time in memory and then write them out as .gz files to locally attached external drives. These batch files would be exposed through a web service, and load-balanced remote boxes would access them using servlets.
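Something like the following servlet could do the serving side. This is only a rough sketch of the idea: the batch directory, the batch_<seq>.gz naming, and the ?seq= request parameter are all made up for illustration.

import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class BatchServlet extends HttpServlet {

    // Directory on the locally attached drive holding batch_<seq>.gz files
    // (assumed layout).
    private static final File BATCH_DIR = new File("/mnt/external/index-batches");

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // A remote box asks for a specific batch, e.g. GET /batches?seq=42
        String seq = req.getParameter("seq");
        File batch = (seq == null) ? null : new File(BATCH_DIR, "batch_" + seq + ".gz");
        if (batch == null || !batch.isFile()) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setContentType("application/x-gzip");
        resp.setContentLength((int) batch.length());

        // Stream the already-gzipped batch straight to the remote box.
        InputStream in = new BufferedInputStream(new FileInputStream(batch));
        try {
            OutputStream out = resp.getOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
        }
    }
}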

The remote boxes would connect in rotation, downloading batched index updates. Heck, start splitting up the big files with Hadoop's HDFS and make it a party!
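On the pulling side, each remote box could loop over the capture machines in rotation, decompress the next batch, and feed it into its local index. Again, just a sketch under assumptions: the host list, URL pattern, one tab-separated document per line batch format, and field names are invented, and the IndexWriter calls follow the Lucene 2.x API.

import java.io.*;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchPuller {

    // The few "capture" machines, contacted round-robin (hypothetical hosts).
    private static final String[] HOSTS = { "capture1:8080", "capture2:8080" };

    public static void main(String[] args) throws Exception {
        // Append to this box's existing local index.
        IndexWriter writer = new IndexWriter("/local/index", new StandardAnalyzer(), false);
        long seq = Long.parseLong(args[0]); // next batch number to fetch

        for (int i = 0; ; i++, seq++) {
            // Rotate across the capture boxes, one batch per request.
            String host = HOSTS[i % HOSTS.length];
            URL url = new URL("http://" + host + "/batches?seq=" + seq);

            InputStream raw;
            try {
                raw = url.openStream();
            } catch (FileNotFoundException e) {
                break; // 404: no more batches published yet
            }

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(raw), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                // Assumed batch format: one "<id>\t<body>" document per line.
                int tab = line.indexOf('\t');
                Document doc = new Document();
                doc.add(new Field("id", line.substring(0, tab),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("body", line.substring(tab + 1),
                        Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            in.close();
        }
        writer.close();
    }
}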

Regards,

Peter W.


