Hi

As requested, I have been trying to improve the logging in the
application so I can give you more details of the update pattern.

I am using the Lucene Index Accessor contribution to co-ordinate the
readers and writers:
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

If the close method in the IndexAccessProvider fails, the exception is
logged but not re-thrown:
public void close(IndexReader reader) {
  if (reader != null) {
    try {
      reader.close();
    } catch (IOException e) {
      // the IOException is swallowed here; callers never see the failure
      log.error("", e);
    }
  }
}
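
For comparison, a variant that still logs but surfaces the failure to
the caller might look like this (a sketch only; whether re-throwing is
safe depends on the accessor's contract):

public void close(IndexReader reader) {
  if (reader != null) {
    try {
      reader.close();
    } catch (IOException e) {
      log.error("failed to close IndexReader", e);
      // re-throw so the caller knows the reader did not close cleanly
      throw new RuntimeException("failed to close IndexReader", e);
    }
  }
}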

I have been checking the application log.  Just before the lock file
errors occur, I found this log entry:
[11:28:59] [ERROR] IndexAccessProvider
java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No
such file or directory)
        at java.io.RandomAccessFile.open(Native Method)

- I guess the missing segments file could result in the lock file not
being removed?
- Is it safe to ignore this exception? (Probably not.)
- Why would the segments file be missing?  Could this be connected to
the NFS issues in some way?
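
If it helps with diagnosis, I could add a check like this before retrying
(a sketch only: it assumes the Directory is opened with the same
NativeFSLockFactory configuration we use, and unlock() would only be safe
if I am certain no writer is alive):

LockFactory lockFactory =
    new NativeFSLockFactory(new File("/mnt/nfstest/repository/lucene/lock"));
Directory dir = FSDirectory.getDirectory(
    new File("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0"), lockFactory);
if (IndexReader.isLocked(dir)) {
  // the write lock is still held (or stale, if no writer is active)
  log.warn("index write lock present");
  // IndexReader.unlock(dir);  // forcibly removes the lock; dangerous
  //                           // unless no writer can be running
}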

Thanks for your help

Patrick


On 29/06/07, Patrick Kimber <[EMAIL PROTECTED]> wrote:
Hi Doron

Thanks for your reply.

I am working on the details of the update pattern.  It will take me
some time as I cannot reproduce the issue on demand.

To answer your other questions: yes, we do have multiple writers, one
writer per node in the cluster.

I will post the results of my investigations as soon as possible.

Thanks for your help

Patrick



On 29/06/07, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Hi Patrick,
>
> Mike is the expert in this, but until he gets in, can you add details on
> the update pattern?  Note that the DeletionPolicy you describe below is
> not (afaik) related to the write lock time-out issues you are facing.
> The DeletionPolicy better manages the interaction between an IndexWriter
> that deletes old files and an IndexReader that might still be using those
> files.  The write lock, on the other hand, just synchronizes between
> multiple IndexWriter objects attempting to open the same index for
> writing.  So, do you have multiple writers?  Can you print/describe the
> writers' timing scenario when this time-out problem occurs, e.g.
> something like this:
>      w1.open
>      w1.modify
>      w1.close
>      w2.open
>      w2.modify
>      w2.close
>      w3.open
>      w3.modify
>      w3.close
>      w2.open ..... time-out... but w3 closed the index, so the lock file
> should have been removed; why wasn't it?
> Can write attempts come from different nodes in the cluster?
> Can you make sure that when "the" writer gets the lock time-out there is
> indeed no other active writer?
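>
> To illustrate the write lock behavior described above, a minimal sketch
> (the index path and analyzer are placeholders, not from your setup):
>
>      Directory dir = FSDirectory.getDirectory("/some/index");
>      IndexWriter w1 = new IndexWriter(dir, new StandardAnalyzer(), true);
>      // a second writer on the same index blocks on the write lock and
>      // fails with "Lock obtain timed out" while w1 is still open:
>      IndexWriter w2 = new IndexWriter(dir, new StandardAnalyzer(), false);
>      w1.close();  // releases the write lock; only now can a new writer open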
>
> Doron
>
> "Patrick Kimber" <[EMAIL PROTECTED]> wrote on 29/06/2007
> 02:01:08:
>
> > Hi,
> >
> > We are sharing a Lucene index in a Linux cluster over an NFS
> > share.  We have
> > multiple servers reading and writing to the index.
> >
> > I am getting regular lock exceptions, e.g.:
> > Lock obtain timed out:
> > NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock
> >
> > - We are using Lucene 2.2.0
> > - We are using kernel NFS and lockd is running.
> > - We are using a modified version of the ExpirationTimeDeletionPolicy
> >   found in the Lucene test suite:
> >   http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
> >   I have set the expiration time to 600 seconds (10 minutes).
> > - We are using the NativeFSLockFactory with the lock folder being
> > within the index
> >   folder:
> >   /mnt/nfstest/repository/lucene/lock/
> > - I have implemented a handler which will pause and retry an update or
> >   delete operation if a LockObtainFailedException or StaleReaderException
> >   is caught.  The handler will retry the update or delete once every
> >   second for 1 minute before re-throwing the exception and aborting.
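> >
> >   In outline, the handler does something like this (simplified; the
> >   real code also covers StaleReaderException and the delete case, and
> >   doUpdate() here is just a placeholder for the Lucene update call):
> >
> >   // enclosing method declares throws IOException, InterruptedException
> >   for (int attempt = 0; ; attempt++) {
> >     try {
> >       doUpdate();   // placeholder for the actual index update
> >       break;        // success, stop retrying
> >     } catch (LockObtainFailedException e) {
> >       if (attempt >= 60) throw e;  // give up after about 1 minute
> >       Thread.sleep(1000);          // pause for a second, then retry
> >     }
> >   }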
> >
> > The issue appears to be caused by a lock file which is not deleted.  The
> > handlers keep retrying... the process holding the lock eventually
> > aborts... this deletes the lock file, and any applications still running
> > then continue normally.
> >
> > The application does not throw these exceptions when it is run on a
> > standard Linux
> > file system or Windows workstation.
> >
> > I would really appreciate some help with this issue.  The
> > chances are I am doing
> > something stupid... but I cannot think what to try next.
> >
> > Thanks for your help
> >
> > Patrick