Re: Lucene in clustered environment (Tomcat)

Nader Henein Fri, 10 Jun 2005 11:32:42 -0700

Considering you have all your servers on one machine, a simple memory failure and the whole thing goes south. But you're right: we have an independent Lucene index sitting next to each of our webservers on each machine, and they are all updated from a central location. That central application reads our persistent store on an Oracle database and creates XML files, which are then copied to each of the Lucene servers and indexed. If the central utility fails, the backup kicks in; at worst the indices are out of date for as long as it takes to point the webservers to the Oracle standby.

I wrote a preliminary paper about Lucene strategies in a clustered environment (I'll send it to you separately because the mailing list doesn't allow attachments). It is about 6 months old; I've come a long way since, and I'm finalizing a newer version which I hope to publish so as to offer a solid case study to anyone out there taking that step. Once again, this paper is old, but it should get you going.


Nader Henein




Ben wrote:

Wouldn't it defeat the purpose of clustering if you have a single
server to manage a single index? What would happen if this server
failed?

Cheers,
Ben

On 6/8/05, Ben <[EMAIL PROTECTED]> wrote:

How about using JavaGroups to notify other nodes in the cluster about
the changes?

Essentially, each node has the same index stored in a different
location. When one node updates/deletes a record, other nodes will get
a notification about the changes and update their index accordingly?
By using this method, I don't have to modify my Lucene code, I just
need to add additional code to notify other nodes. I believe this
method also scales better.

Cheers,
Ben
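
The notification idea above can be sketched roughly as follows. This is only an illustration using the Java standard library: the `Bus` class is a hypothetical in-memory stand-in for a group channel such as JavaGroups, and each node's `localIndex` map stands in for its own Lucene index. A real deployment would also have to handle message ordering, lost messages, and nodes that rejoin after a failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: every node applies every broadcast change to its own index copy. */
public class IndexChangeBroadcast {

    /** One change to one document; a null text means "delete this document". */
    static final class Change {
        final String docId;
        final String text;
        Change(String docId, String text) { this.docId = docId; this.text = text; }
    }

    /** Hypothetical cluster node; the map stands in for its local Lucene index. */
    static final class Node {
        final String name;
        final Map<String, String> localIndex = new HashMap<>();
        Node(String name) { this.name = name; }

        void apply(Change c) {
            if (c.text == null) {
                localIndex.remove(c.docId);       // delete
            } else {
                localIndex.put(c.docId, c.text);  // add or update
            }
        }
    }

    /** Hypothetical stand-in for the group channel: delivers to all members, sender included. */
    static final class Bus {
        private final List<Node> members = new ArrayList<>();
        void join(Node n) { members.add(n); }
        void broadcast(Change c) {
            for (Node n : members) {
                n.apply(c);
            }
        }
    }

    public static void main(String[] args) {
        Bus bus = new Bus();
        Node a = new Node("a");
        Node b = new Node("b");
        bus.join(a);
        bus.join(b);
        bus.broadcast(new Change("42", "hello cluster"));
        System.out.println("node b sees: " + b.localIndex.get("42"));
    }
}
```

Each node still has exactly one writer for its own index, which is why the existing Lucene code would not need to change under this approach.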


On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote:

I realize I've already asked you this question, but do you need 100%
real time? If not, you could batch the updates and run them every 2
minutes. Concerning parallel search, unless you really need it, it's
overkill in this case; a communal index will serve you well and will be
much easier to maintain. You have to weigh requirements vs.
complexity/debug time.

Nader Henein

Ben wrote:

When you say your cluster is on a single machine, do you mean that you have 
multiple webservers on the same machine all of which search a single Lucene 
index?

Yes, this is my case.

Do you use Lucene as your persistent store or do you have a DB back there?

I use Lucene to search for data stored in a PostgreSQL server.

what is your current update/delete strategy because real time inserts from the 
webservers directly to the index will not work because you can't have multiple 
writers.

I have to do this in real time, what are the available solutions? My
application has the ability to do batch update/delete to a Lucene
index but I would like to do this in real time.

One solution I am thinking of is to have each node in the cluster keep
its own index and use parallel search. This makes my application even
more complex.

I strongly recommend Quartz, it's rock solid and really versatile.

I am using Quartz, it is really great and supports clustering.

Thanks,
Ben


On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote:

When you say your cluster is on a single machine, do you mean that you
have multiple webservers on the same machine all of which search a
single Lucene index? Because if that's the case, your solution is
simple, as long as you persist to a single DB and then designate one of
your servers (or even another server) to update/delete the index. Do you
use Lucene as your persistent store or do you have a DB back there? And
what is your current update/delete strategy? Real time inserts from the
webservers directly to the index will not work because you can't have
multiple writers. Updating a dirty flag on rows that need to
be indexed/deleted, or using a table for this task and then batching
your updates would be ideal, and if you're using server specific
scheduling, I strongly recommend Quartz, it's rock solid and really
versatile.

My two cents.

Nader Henein
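
A minimal sketch of the dirty-flag batching described above, using only the Java standard library. The names here are hypothetical: the `dirtyRows` set stands in for a dirty-flag column or a pending-changes table in the database, and the `indexed` list stands in for the one process that is allowed to write to the Lucene index. The scheduling is done with a plain `ScheduledExecutorService`; a Quartz job could play the same role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch only: many webservers flag rows, one scheduled job indexes them in batches. */
public class DirtyFlagBatcher {
    // Stand-in for the dirty flags kept in the database.
    private final Set<String> dirtyRows = ConcurrentHashMap.newKeySet();
    // Stand-in for the single index writer: row ids in the order they were indexed.
    private final List<String> indexed = new ArrayList<>();

    /** Called by any webserver after it has persisted a change to the database. */
    public void markDirty(String rowId) {
        dirtyRows.add(rowId);
    }

    /** Run by exactly one job. Returns the number of rows handled in this batch. */
    public synchronized int runBatch() {
        List<String> batch = new ArrayList<>(dirtyRows);
        for (String rowId : batch) {
            // Clear the flag before indexing, so a change that arrives while the row
            // is being indexed sets the flag again and is picked up by the next batch.
            dirtyRows.remove(rowId);
            indexed.add(rowId); // real code would delete and re-add the document here
        }
        return batch.size();
    }

    /** Copy of the ids indexed so far, for inspection. */
    public synchronized List<String> indexedSoFar() {
        return new ArrayList<>(indexed);
    }

    /** Starts the periodic job on a single thread; the caller shuts the executor down. */
    public ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> runBatch(), period, period, unit);
        return timer;
    }

    public static void main(String[] args) {
        DirtyFlagBatcher batcher = new DirtyFlagBatcher();
        batcher.markDirty("row-1");
        batcher.markDirty("row-2");
        batcher.markDirty("row-1"); // flagged twice, indexed once
        System.out.println("rows in this batch: " + batcher.runBatch());
        System.out.println("rows in next batch: " + batcher.runBatch());
    }
}
```

Because only the scheduled job touches the index, the webservers never compete for the write lock; they only write to the database.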


Ben wrote:

My cluster is on a single machine and I am using FS index.

I have already integrated Lucene into my web application for use in a
non-clustered environment. I don't know what I need to do to make it
work in a clustered environment.

Thanks,
Ben

On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote:

IMHO, issues that you need to consider:

 * Atomicity of updates and deletes if you are using multiple indexes
   on multiple machines (the case if your cluster is over a wide network)
 * Scheduled comparison of the indices against the core data, plus
   sanitization (intensive)
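
The scheduled comparison in the second point can be sketched as a set difference between the ids held in the core data and the ids held in an index. This is an assumption about how such a check might look, not a description of the actual process; the class and method names are hypothetical, and in practice the two id sets would be read from the database and from the Lucene index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: find documents that the index is missing or should no longer hold. */
public class IndexAudit {

    /** Ids present in the core data but absent from the index: these need indexing. */
    static Set<String> missingFromIndex(Set<String> coreIds, Set<String> indexIds) {
        Set<String> missing = new TreeSet<>(coreIds);
        missing.removeAll(indexIds);
        return missing;
    }

    /** Ids present in the index but gone from the core data: these need deleting. */
    static Set<String> orphanedInIndex(Set<String> coreIds, Set<String> indexIds) {
        Set<String> orphaned = new TreeSet<>(indexIds);
        orphaned.removeAll(coreIds);
        return orphaned;
    }

    public static void main(String[] args) {
        Set<String> coreIds = new HashSet<>(Arrays.asList("1", "2", "3"));
        Set<String> indexIds = new HashSet<>(Arrays.asList("2", "3", "4"));
        System.out.println("to index:  " + missingFromIndex(coreIds, indexIds));
        System.out.println("to delete: " + orphanedInIndex(coreIds, indexIds));
    }
}
```

Comparing ids only catches missing and orphaned documents. Detecting stale content would need something extra, such as a last-modified timestamp stored with each document.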

This all depends on what the volume of change is on your index and
whether you'll be using a Memory resident index or an FS index.

This should start the ball rolling. We've been using Lucene successfully
on a distributed cluster for a while now, and as long as you're aware of
some basic NDS limitations/constraints you should be fine.

Hope this helps

Nader Henein

Ben wrote:

Hi

I would like to use Lucene in a clustered environment. What are the
things that I should consider and do?

I would like to use the same ordinary index storage for all the nodes
in the cluster. Is that possible?

Thanks,
Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--

Nader S. Henein
Senior Applications Architect

Bayt.com




