Hi Andy,

Your problem is now a design issue. Here is my suggestion, but keep in mind that I am not as familiar with your requirements or environment as you are, so I might be off. In general, though, this is a good framework.

If you set up your system as follows, it should give you excellent performance both for indexing and for searching (at the cost of extra disk space, but disk space is very cheap these days):

1) On the machine where you have the index stored, add a second hard drive.

2) Clone your existing index 'A' onto the second hard drive as index 'B'.

3) On drive 1, where index 'A' is, create another index 'a'. Do the same on drive 2: where index 'B' is, create index 'b'. 'a' and 'b' are initially empty Lucene indexes.

4) You are currently using a Searcher; change it to a MultiSearcher and point it at index 'A' and 'a' (see the first sketch below this list).

5) Run your indexer against 'a'. All updates, additions and optimizations happen on 'a'. Given that 'a' is initially empty and will grow slowly, any change to index 'a' will be super fast compared to the big master index 'A'.

6) Once a day, at midnight for example, merge the 'a' index into the 'B' index (the index on the second hard drive); the second sketch further down shows this merge. Next, create an empty 'b' index, point your MultiSearcher at 'B' and 'b', then delete 'a' and create an empty 'a'. Now that 'B' is the active search index, point your indexer at 'b' and have all updates, additions and optimizations take place on 'b'.

7) The next night, repeat the above, but this time from 'B' into 'A', and so on and so forth.
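To make steps 4 and 5 a bit more concrete, here is a rough sketch of what the searcher and indexer side might look like. This is only a sketch: I'm assuming Lucene.Net 1.9/2.0-style APIs, a StandardAnalyzer, a "contents" field and made-up local paths, so adjust all of that to your environment.

// Sketch only: the paths, analyzer and field name below are assumptions, not your setup.
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public class LiveIndexExample
{
    public static void Main(string[] args)
    {
        string masterPath = @"D:\indexes\A";   // big master index 'A'
        string livePath = @"D:\indexes\a";     // small live index 'a'

        // Step 5: all adds/updates/optimizations go against the small 'a' index only.
        IndexWriter writer = new IndexWriter(livePath, new StandardAnalyzer(), false);
        // ... writer.AddDocument(doc) for new or changed documents ...
        writer.Optimize();   // cheap, because 'a' stays small
        writer.Close();

        // Step 4: search both 'A' and 'a' through one MultiSearcher.
        MultiSearcher searcher = new MultiSearcher(new Searchable[]
        {
            new IndexSearcher(masterPath),
            new IndexSearcher(livePath)
        });

        Query query = new QueryParser("contents", new StandardAnalyzer()).Parse("some terms");
        Hits hits = searcher.Search(query);
        System.Console.WriteLine("Hits across A + a: " + hits.Length());
        searcher.Close();
    }
}

In your case the MultiSearcher part would live in your web-service machines and the IndexWriter part in your NT-service machines, but the calls are the same.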
Do you see the idea behind this design? The goal is to have a small live index (for fast updates and optimization) combined with the master index. You want a second hard drive because optimization and merging are disk-IO bound, and you don't want your nightly merge to slow down your searcher.
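And here is, roughly, what the nightly job in step 6 might look like -- again only a sketch with made-up paths, assuming the IndexWriter.AddIndexes(Directory[]) call that ships with Lucene.Net 1.9/2.0:

// Sketch of the nightly merge (step 6): fold the small 'a' index into the copy 'B'
// on the second drive, then start a fresh, empty live index for the next day.
// Paths are assumptions; error handling and the searcher swap are up to your application.
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public class NightlyMerge
{
    public static void Main(string[] args)
    {
        string targetMaster = @"E:\indexes\B";  // master copy on the second drive
        string liveIndex = @"D:\indexes\a";     // today's small live index
        string newLiveIndex = @"E:\indexes\b";  // tomorrow's empty live index

        // Merge 'a' into 'B'. This is the only expensive, disk-IO-heavy step, and it
        // runs against the drive your searchers are not currently reading.
        IndexWriter master = new IndexWriter(targetMaster, new StandardAnalyzer(), false);
        master.AddIndexes(new Directory[] { FSDirectory.GetDirectory(liveIndex, false) });
        master.Optimize();
        master.Close();

        // Create the new, empty live index 'b' for tomorrow's updates.
        IndexWriter newLive = new IndexWriter(newLiveIndex, new StandardAnalyzer(), true);
        newLive.Close();

        // At this point your application re-points the MultiSearcher at 'B' + 'b',
        // points the indexer at 'b', and deletes/recreates 'a' for the next swap.
    }
}

Whether you recreate 'a' with an IndexWriter or just wipe the folder is up to you; the important part is that the expensive merge happens on the drive your searchers are not using.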
Let me know if this makes sense and whether it will work in your environment.

Regards,

-- George Aroush

-----Original Message-----
From: Andy Berryman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 01, 2006 11:24 PM
To: [email protected]
Subject: Re: Setting "disableLuceneLocks" to "true" for Read-Only Mode

Yes ... you've got it. However ... if I can't read the index while I'm updating it, then it doesn't help me much. This is a requirement.

My architecture guarantees that the two machines maintaining the index won't work on the same index at the same time. But I have nothing that prevents the machines providing the search from reading the index at the same time as it is being updated by the other machines.

So it sounds to me like I still need to use the Lucene locking. And it also sounds like I need to set the "Lock Directory" to be in the shared location where I'm keeping the index directory, because currently each machine is using its own "Lock Directory" ... which I guess means that the search isn't currently locked against the writes now.

So if my indexes are located here:

\\server\indexes\index1\
\\server\indexes\index2\

I should set the "Lock Directory" for ALL the machines to be something like:

\\server\indexes\lockdir\

Make sense?

Andy

On 11/1/06, George Aroush <[EMAIL PROTECTED]> wrote:
>
> Hi Andy,
>
> So the two machines do nothing but update the index, and they are
> synchronized, right? Then you have two other machines that provide
> search, right? And your 4th machine simply hosts the index. Did I
> get this right?
>
> Sure, the search machines can be set as read-only, but they can't read
> an index while it is being updated. If you are preventing this, then
> you are all set.
>
> Regards,
>
> -- George Aroush
>
> -----Original Message-----
> From: Andy Berryman [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 01, 2006 9:34 AM
> To: [email protected]
> Subject: Re: Setting "disableLuceneLocks" to "true" for Read-Only Mode
>
> As for the architecture of my project currently ...
>
> Think of it as involving 5 machines. 1 machine hosts the UNC share
> folder that contains the index directory. 2 machines run an NT
> service that looks for changes in the database and then uses the
> "Reader" to delete documents and the "Writer" to add documents.
> These machines synchronize their work such that they aren't
> working on the same index at the same time. 2 machines run a web
> service that provides methods to search the index and return results.
>
> As such, the machines that run the web service have NO path that
> involves manipulating the index at all. Therefore, I was thinking
> that disabling the locking on those machines would simply remove
> extra overhead that doesn't really seem necessary for me.
>
> Thoughts?
>
> Thanks
> Andy
>
>
> On 11/1/06, George Aroush <[EMAIL PROTECTED]> wrote:
> >
> > Hi Andy,
> >
> > If you have your own solution to guarantee reader/writer locking, and
> > it's faster than what Lucene.Net has to offer, you can use it.
> >
> > "disableLuceneLocks" is provided by Lucene.Net so that a Lucene
> > application can be run off a CD/DVD (read-only device); thus, no lock
> > file will be created.
> >
> > BTW, what is your solution?
> >
> > Regards,
> >
> > -- George Aroush
> >
> > -----Original Message-----
> > From: Andy Berryman [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, October 31, 2006 4:23 PM
> > To: [email protected]
> > Subject: Setting "disableLuceneLocks" to "true" for Read-Only Mode
> >
> > What are the benefits of doing this versus just letting Lucene do
> > its normal locking when set to "false"? I have a scenario where I
> > can guarantee that the process using the Reader object is ONLY
> > going to read the index and NOT modify it in any way. It seems to
> > me that disabling the locking would reduce some overhead that I don't
> > really need to care about.
> >
> > Thoughts?
> >
> > Thanks
> > Andy
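For reference, a minimal sketch of the read-only setup discussed in the original question above. It assumes the FSDirectory.SetDisableLocks switch from the Lucene.Net 1.9/2.0 era and a made-up UNC path; as George notes above, skipping the locks is only safe when nothing is writing to the index while these machines read it.

// Sketch only: a read-only search machine that skips Lucene's lock files entirely.
// The path is made up; SetDisableLocks is the programmatic equivalent of setting
// "disableLuceneLocks" to "true" -- no write.lock / commit.lock files are created or checked.
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public class ReadOnlySearch
{
    public static void Main(string[] args)
    {
        // Only do this if the index is guaranteed not to be written to while we read it.
        FSDirectory.SetDisableLocks(true);

        IndexSearcher searcher = new IndexSearcher(@"\\server\indexes\index1");
        Query query = new QueryParser("contents", new StandardAnalyzer()).Parse("some terms");
        Hits hits = searcher.Search(query);
        System.Console.WriteLine("Hits: " + hits.Length());
        searcher.Close();
    }
}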
