Re: Lucene : avoiding locking (incremental indexing)

sergiu gordea Mon, 15 Nov 2004 23:28:44 -0800

Luke Shannon wrote:

I like the sound of the Queue approach. I also don't like that I have to focefully unlock the index.

Personally I don't like the Queue aproach... because I already implemented multithreading in out application to improve its performance. In our application indexing is not a high priority, but it's happening quite often. Search is a priority.

Lucene allows to have more searches at on time. When you have a big index and a many users then ... the Queue aproach can slow down your application to much. I think it will be a bottleneck.

I know that the lock problem is annoying, but I also think that the right way is to identify the source of locking. Our application is a webbased application based on turbine, and when we want to restart tomcat, we just kill the process (otherwise we need to restart 2 times because of some log4j initialization problem), so ... the index is locked after the tomcat restart. In my case it makes sense to check if index is locked one time at startup. I'm also logging all errors that I get in the systems, this is helping me to find their sourcce easier.

All the best,

Sergiu

I'm not the most experience programmer and am on a tight deadline. The
approach I ended up with was the best I could do with the experience I've
got and the time I had.
My indexer works so far and doesn't have to forcefully release the lock on
the Index too often (the case is most likely to occur when someone removes a
content file(s) and the reader needs to delete from the existing index for
the first time). We will see what happens as more people use the system with
large content directories.
As I learn more I plan to expand the functionality of my class.
Luke S
----- Original Message ----- From: <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, November 15, 2004 5:50 PM Subject: Re: Lucene : avoiding locking (incremental indexing)
It really seems like I am not the only person having this issue.
So far I am seeing 2 solutions and honestly I don't love either totally.
I am thinking that without changes to Lucene itself, the best "general" way to implement this might be to have a queue of changes and have Lucene work off this queue in a single thread using a time-settable batch method. This is similar to what you are using below, but I don't like that you forcibly unlock Lucene if it shows itself locked. Using the Queue approach, only that one thread could be accessing Lucene for writes/deletes anyway so there should be no "unknown" locking.

I can imagine this being a very good addition to Lucene - creating a high

level interface to Lucene that manages incremental updates in such a manner. If anybody has such a general piece of code, please post it!!! I would use it tonight rather then create my own.

I am not sure if there is anything that can be done to Lucene itself to

help with this need people seem to be having. I realize the likely reasons why Lucene might need to only have one Index writer and the additional load that might be caused by locking off pieces of the database rather then the whole database. I think I need to look in the developer archives.
JohnE
----- Original Message -----
From: Luke Shannon <[EMAIL PROTECTED]>
Date: Monday, November 15, 2004 5:14 pm
Subject: Re: Lucene : avoiding locking (incremental indexing)
Hi Luke;
I have a similar system (except people don't need to see results
immediatly). The approach I took is a little different.
I made my Indexer a thread with the indexing operations occuring
the in run
method. When the IndexWriter is to be created or the IndexReader
needs to
execute a delete I called the following method:
private void manageIndexLock() {
try {
 //check if the index is locked and deal with it if it is
 if (index.exists() && IndexReader.isLocked(indexFileLocation)) {
  System.out.println("INDEXING INFO: There is more than one
process trying
to write to the index folder. Will wait for index to become
available.");    //perform this loop until the lock if released or
3 mins
  // has expired
  int indexChecks = 0;
  while (IndexReader.isLocked(indexFileLocation)
    && indexChecks < 6) {
   //increment the number of times we check the index
   // files
   indexChecks++;
   try {
    //sleep for 30 seconds
    Thread.sleep(30000L);
   } catch (InterruptedException e2) {
    System.out.println("INDEX ERROR: There was a problem waiting
for the
lock to release. "
        + e2.getMessage());
   }
  }//closes the while loop for checking on the index
  // directory
  //if we are still locked we need to do something about it
  if (IndexReader.isLocked(indexFileLocation)) {
   System.out.println("INDEXING INFO: Index Locked After 3
minute of
waiting. Forcefully releasing lock.");
   IndexReader.unlock(FSDirectory.getDirectory(index, false));
   System.out.println("INDEXING INFO: Index lock released");
  }//close the if that actually releases the lock
 }//close the if ensure the file exists
}//closes the try for all the above operations
catch (IOException e1) {
 System.out.println("INDEX ERROR: There was a problem waiting
for the lock
to release. "
     + e1.getMessage());
}
}//close the manageIndexLock method
Do you think this is a bad approach?
Luke
----- Original Message ----- From: "Luke Francl" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, November 15, 2004 5:01 PM Subject: Re: Lucene : avoiding locking (incremental indexing)

This is how I implemented incremental indexing. If anyone sees

anything> wrong, please let me know.

Our motivation is similar to John Eichel's. We have a digital asset management system and when users update, delete or create a new

asset,> they need to see their results immediately.
The most important thing to know about incremental indexing that
multiple threads cannot share the same IndexWriter, and only one
IndexWriter can be open on an index at a time.
Therefore, what I did was control access to the IndexWriter
through a

singleton wrapper class that synchronizes access to the

IndexWriter and
IndexReader (for deletes). After finishing writing to the index, you
must close the IndexWriter to flush the changes to the index.
If you do this you will be fine.
However, opening and closing the index takes time so we had to
look for
some ways to speed up the indexing.
The most obvious thing is that you should do as much work as
possible> outside of the synchronized block. For example, in my application, the

creation of Lucene Document objects is not synchronized. Only

the part
of the code that is between your IndexWriter.open() and
IndexWriter.close() needs to be synchronized.
The other easy thing I did to improve performance was batch
changes in a
transaction together for indexing. If a user changes 50 assets, that
will all be indexed using one Lucene IndexWriter.
So far, we haven't had to explore further performance
enhancements, but

if we do the next thing I will do is create a thread that

gathers assets

that need to be indexed and performs a batch job every five

minutes or
so.
Hope this is helpful,
Luke
-----------------------------------------------------------------
----
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-------------------------------------------------------------------
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene : avoiding locking (incremental indexing)

Reply via email to