Note that some Solr users have reported a similar issue. https://issues.apache.org/jira/browse/SOLR-240
-Yonik

On 6/29/07, Patrick Kimber <[EMAIL PROTECTED]> wrote:
Hi

As requested, I have been trying to improve the logging in the application so I can give you more details of the update pattern.

I am using the Lucene Index Accessor contribution to co-ordinate the readers and writers:
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

If the close method in the IndexAccessProvider fails, the exception is logged but not re-thrown:

    public void close(IndexReader reader) {
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException e) {
          // the failure is swallowed here - callers never see that the close failed
          log.error("", e);
        }
      }
    }

I have been checking the application log. Just before the time when the lock file errors occur I found this log entry:

    [11:28:59] [ERROR] IndexAccessProvider
    java.io.FileNotFoundException: /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)

- I guess the missing segments file could result in the lock file not being removed?
- Is it safe to ignore this exception (probably not)?
- Why would the segments file be missing? Could this be connected to the NFS issues in some way?

Thanks for your help

Patrick

On 29/06/07, Patrick Kimber <[EMAIL PROTECTED]> wrote:
> Hi Doron
>
> Thanks for your reply.
>
> I am working on the details of the update pattern. It will take me
> some time as I cannot reproduce the issue on demand.
>
> To answer your other questions: yes, we do have multiple writers - one
> writer per node in the cluster.
>
> I will post the results of my investigations as soon as possible.
>
> Thanks for your help
>
> Patrick
>
> On 29/06/07, Doron Cohen <[EMAIL PROTECTED]> wrote:
> > Hi Patrick,
> >
> > Mike is the expert in this, but until he gets in, can you add details on
> > the update pattern? Note that the DeletionPolicy you describe below is not
> > (afaik) related to the write lock time-out issues you are facing. The
> > DeletionPolicy improves the interaction between an IndexWriter that
> > deletes old files and an IndexReader that might still be using them. The
> > write lock, on the other hand, just synchronizes between multiple IndexWriter
> > objects attempting to open the same index for write. So, do you have
> > multiple writers? Can you print/describe the writers' timing scenario when
> > this time-out problem occurs, e.g. something like this:
> >   w1.open
> >   w1.modify
> >   w1.close
> >   w2.open
> >   w2.modify
> >   w2.close
> >   w3.open
> >   w3.modify
> >   w3.close
> >   w2.open ..... time-out... but w3 closed the index.... so the
> >   lock file was supposed to be removed - why wasn't it?
> > Can write attempts come from different nodes in the cluster?
> > Can you make sure that when "the" writer gets the lock time-out there is
> > indeed no other active writer?
> >
> > Doron
> >
> > "Patrick Kimber" <[EMAIL PROTECTED]> wrote on 29/06/2007 02:01:08:
> >
> > > Hi,
> > >
> > > We are sharing a Lucene index in a Linux cluster over an NFS share.
> > > We have multiple servers reading and writing to the index.
> > >
> > > I am getting regular lock exceptions, e.g.
> > > Lock obtain timed out:
> > > NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock
> > >
> > > - We are using Lucene 2.2.0.
> > > - We are using kernel NFS and lockd is running.
> > > - We are using a modified version of the ExpirationTimeDeletionPolicy
> > >   found in the Lucene test suite:
> > >   http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
> > >   I have set the expiration time to 600 seconds (10 minutes).
> > > - We are using the NativeFSLockFactory with the lock folder being within
> > >   the index folder: /mnt/nfstest/repository/lucene/lock/
> > > - I have implemented a handler which will pause and retry an update or
> > >   delete operation if a LockObtainFailedException or StaleReaderException
> > >   is caught. The handler will retry the update or delete once every second
> > >   for 1 minute before re-throwing the exception and aborting.
> > >
> > > The issue appears to be caused by a lock file which is not deleted. The
> > > handlers keep retrying... the process holding the lock eventually aborts...
> > > this deletes the lock file - any applications still running then continue
> > > normally.
> > >
> > > The application does not throw these exceptions when it is run on a
> > > standard Linux file system or a Windows workstation.
> > >
> > > I would really appreciate some help with this issue. The chances are I am
> > > doing something stupid... but I cannot think what to try next.
> > >
> > > Thanks for your help
> > >
> > > Patrick
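Doron's description of the write lock, quoted above, can be illustrated with a small self-contained sketch. This is not code from the thread: the index path is a throwaway placeholder and StandardAnalyzer is an arbitrary choice. With Lucene 2.2, a second IndexWriter opened on a directory whose write lock is still held waits for the lock timeout and then fails with the same "Lock obtain timed out" LockObtainFailedException reported above; closing the first writer releases the lock so a new writer can open normally.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.LockObtainFailedException;

    public class WriteLockDemo {
        public static void main(String[] args) throws Exception {
            String indexPath = "/tmp/lucene-write-lock-demo";   // placeholder, not the NFS path

            // First writer acquires the write lock for the index directory.
            IndexWriter w1 = new IndexWriter(indexPath, new StandardAnalyzer(), true);

            try {
                // Second writer on the same index: waits for the write-lock timeout,
                // then fails with "Lock obtain timed out".
                IndexWriter w2 = new IndexWriter(indexPath, new StandardAnalyzer(), false);
                w2.close();
            } catch (LockObtainFailedException e) {
                System.out.println("Expected: " + e.getMessage());
            }

            // Closing the first writer releases the lock, so a new writer opens cleanly.
            w1.close();
            IndexWriter w3 = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            w3.close();
        }
    }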
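For reference, a minimal sketch of the kind of setup Patrick describes - a NativeFSLockFactory whose lock directory is kept separate from the index files. Again, this is not the poster's code: the paths are simply the ones mentioned in the thread, the analyzer is arbitrary, and it assumes the Lucene 2.x FSDirectory.getDirectory(File, LockFactory) overload.

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NativeFSLockFactory;

    public class NativeLockSetup {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            File lockDir  = new File("/mnt/nfstest/repository/lucene/lock");

            // Native (OS-level) locks are created in the dedicated lock folder
            // rather than alongside the index files.
            Directory dir = FSDirectory.getDirectory(indexDir, new NativeFSLockFactory(lockDir));

            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
            // ... add or delete documents ...
            writer.close();   // releases the write lock
        }
    }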
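The retry handler Patrick mentions is not shown in the thread; a minimal sketch of the behaviour he describes (retry once per second for up to a minute on LockObtainFailedException or StaleReaderException, then re-throw and abort) might look like the following. The RetryingUpdateHandler and IndexOperation names are invented here for illustration.

    import java.io.IOException;

    import org.apache.lucene.index.StaleReaderException;
    import org.apache.lucene.store.LockObtainFailedException;

    public class RetryingUpdateHandler {

        /** The indexing work to retry, e.g. an update or a delete. */
        public interface IndexOperation {
            void run() throws IOException;
        }

        private static final int MAX_ATTEMPTS = 60;       // once per second for 1 minute
        private static final long RETRY_DELAY_MS = 1000L;

        public void execute(IndexOperation op) throws IOException {
            for (int attempt = 1; ; attempt++) {
                try {
                    op.run();
                    return;
                } catch (IOException e) {
                    boolean retryable = e instanceof LockObtainFailedException
                            || e instanceof StaleReaderException;
                    if (!retryable || attempt >= MAX_ATTEMPTS) {
                        throw e;                          // give up and let the caller abort
                    }
                    try {
                        Thread.sleep(RETRY_DELAY_MS);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
    }

A caller would wrap each index update or delete in an IndexOperation and pass it to execute().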