Note that some Solr users have reported a similar issue. https://issues.apache.org/jira/browse/SOLR-240
-Yonik

On 6/29/07, Patrick Kimber <[EMAIL PROTECTED]> wrote:
Hi

As requested, I have been trying to improve the logging in the application so I can give you more details of the update pattern.

I am using the Lucene Index Accessor contribution to co-ordinate the readers and writers:
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

If the close method in the IndexAccessProvider fails, the exception is logged but not re-thrown:

    public void close(IndexReader reader) {
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException e) {
          // the failure is swallowed here - callers never see that the close failed
          log.error("", e);
        }
      }
    }

I have been checking the application log. Just before the time when the lock file errors occur I found this log entry:

    [11:28:59] [ERROR] IndexAccessProvider
    java.io.FileNotFoundException: /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)

- I guess the missing segments file could result in the lock file not being removed?
- Is it safe to ignore this exception (probably not)?
- Why would the segments file be missing? Could this be connected to the NFS issues in some way?

Thanks for your help

Patrick

On 29/06/07, Patrick Kimber <[EMAIL PROTECTED]> wrote:
> Hi Doron
>
> Thanks for your reply.
>
> I am working on the details of the update pattern. It will take me
> some time as I cannot reproduce the issue on demand.
>
> To answer your other questions: yes, we do have multiple writers - one
> writer per node in the cluster.
>
> I will post the results of my investigations as soon as possible.
>
> Thanks for your help
>
> Patrick
>
> On 29/06/07, Doron Cohen <[EMAIL PROTECTED]> wrote:
> > Hi Patrick,
> >
> > Mike is the expert in this, but until he gets in, can you add details on
> > the update pattern? Note that the DeletionPolicy you describe below is not
> > (afaik) related to the write lock time-out issues you are facing. The
> > DeletionPolicy improves the interaction between an IndexWriter that
> > deletes old files and an IndexReader that might still be using them. The
> > write lock, on the other hand, just synchronizes between multiple IndexWriter
> > objects attempting to open the same index for write. So, do you have
> > multiple writers? Can you print/describe the writers' timing scenario when
> > this time-out problem occurs, e.g. something like this:
> >   w1.open
> >   w1.modify
> >   w1.close
> >   w2.open
> >   w2.modify
> >   w2.close
> >   w3.open
> >   w3.modify
> >   w3.close
> >   w2.open ..... time-out... but w3 closed the index.... so the
> >   lock file was supposed to be removed - why wasn't it?
> > Can write attempts come from different nodes in the cluster?
> > Can you make sure that when "the" writer gets the lock time-out there is
> > indeed no other active writer?
> >
> > Doron
> >
> > "Patrick Kimber" <[EMAIL PROTECTED]> wrote on 29/06/2007 02:01:08:
> >
> > > Hi,
> > >
> > > We are sharing a Lucene index in a Linux cluster over an NFS share.
> > > We have multiple servers reading and writing to the index.
> > >
> > > I am getting regular lock exceptions, e.g.
> > > Lock obtain timed out:
> > > NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock
> > >
> > > - We are using Lucene 2.2.0.
> > > - We are using kernel NFS and lockd is running.
> > > - We are using a modified version of the ExpirationTimeDeletionPolicy
> > >   found in the Lucene test suite:
> > >   http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
> > >   I have set the expiration time to 600 seconds (10 minutes).
> > > - We are using the NativeFSLockFactory with the lock folder being within
> > >   the index folder: /mnt/nfstest/repository/lucene/lock/
> > > - I have implemented a handler which will pause and retry an update or
> > >   delete operation if a LockObtainFailedException or StaleReaderException
> > >   is caught. The handler will retry the update or delete once every second
> > >   for 1 minute before re-throwing the exception and aborting.
> > >
> > > The issue appears to be caused by a lock file which is not deleted. The
> > > handlers keep retrying... the process holding the lock eventually aborts...
> > > this deletes the lock file - any applications still running then continue
> > > normally.
> > >
> > > The application does not throw these exceptions when it is run on a
> > > standard Linux file system or a Windows workstation.
> > >
> > > I would really appreciate some help with this issue. The chances are I am
> > > doing something stupid... but I cannot think what to try next.
> > >
> > > Thanks for your help
> > >
> > > Patrick
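Doron's description of the write lock, quoted above, can be illustrated with a small self-contained sketch. This is not code from the thread: the index path is a throwaway placeholder and StandardAnalyzer is an arbitrary choice. With Lucene 2.2, a second IndexWriter opened on a directory whose write lock is still held waits for the lock timeout and then fails with the same "Lock obtain timed out" LockObtainFailedException reported above; closing the first writer releases the lock so a new writer can open normally.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.LockObtainFailedException;

    public class WriteLockDemo {
        public static void main(String[] args) throws Exception {
            String indexPath = "/tmp/lucene-write-lock-demo";   // placeholder, not the NFS path

            // First writer acquires the write lock for the index directory.
            IndexWriter w1 = new IndexWriter(indexPath, new StandardAnalyzer(), true);

            try {
                // Second writer on the same index: waits for the write-lock timeout,
                // then fails with "Lock obtain timed out".
                IndexWriter w2 = new IndexWriter(indexPath, new StandardAnalyzer(), false);
                w2.close();
            } catch (LockObtainFailedException e) {
                System.out.println("Expected: " + e.getMessage());
            }

            // Closing the first writer releases the lock, so a new writer opens cleanly.
            w1.close();
            IndexWriter w3 = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            w3.close();
        }
    }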
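For reference, a minimal sketch of the kind of setup Patrick describes - a NativeFSLockFactory whose lock directory is kept separate from the index files. Again, this is not the poster's code: the paths are simply the ones mentioned in the thread, the analyzer is arbitrary, and it assumes the Lucene 2.x FSDirectory.getDirectory(File, LockFactory) overload.

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NativeFSLockFactory;

    public class NativeLockSetup {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            File lockDir  = new File("/mnt/nfstest/repository/lucene/lock");

            // Native (OS-level) locks are created in the dedicated lock folder
            // rather than alongside the index files.
            Directory dir = FSDirectory.getDirectory(indexDir, new NativeFSLockFactory(lockDir));

            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
            // ... add or delete documents ...
            writer.close();   // releases the write lock
        }
    }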
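The retry handler Patrick mentions is not shown in the thread; a minimal sketch of the behaviour he describes (retry once per second for up to a minute on LockObtainFailedException or StaleReaderException, then re-throw and abort) might look like the following. The RetryingUpdateHandler and IndexOperation names are invented here for illustration.

    import java.io.IOException;

    import org.apache.lucene.index.StaleReaderException;
    import org.apache.lucene.store.LockObtainFailedException;

    public class RetryingUpdateHandler {

        /** The indexing work to retry, e.g. an update or a delete. */
        public interface IndexOperation {
            void run() throws IOException;
        }

        private static final int MAX_ATTEMPTS = 60;       // once per second for 1 minute
        private static final long RETRY_DELAY_MS = 1000L;

        public void execute(IndexOperation op) throws IOException {
            for (int attempt = 1; ; attempt++) {
                try {
                    op.run();
                    return;
                } catch (IOException e) {
                    boolean retryable = e instanceof LockObtainFailedException
                            || e instanceof StaleReaderException;
                    if (!retryable || attempt >= MAX_ATTEMPTS) {
                        throw e;                          // give up and let the caller abort
                    }
                    try {
                        Thread.sleep(RETRY_DELAY_MS);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
    }

A caller would wrap each index update or delete in an IndexOperation and pass it to execute().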