Re: Lucene 2.2, NFS, Lock obtain timed out

Patrick Kimber Fri, 29 Jun 2007 04:04:13 -0700

Hi Doron

Thanks for your reply.


I am working on the details of the update pattern.  It will take me
some time as I cannot reproduce the issue on demand.

To answer your other questions, yes, we do have multiple writers.  One
writer per node in the cluster.

I will post the results of my investigations as soon as possible.

Thanks for your help

Patrick



On 29/06/07, Doron Cohen <[EMAIL PROTECTED]> wrote:

hi Patrick,

Mike is the expert in this, but until he gets in, can you add details on
the update pattern? Note that the DeletionPolicy you describe below is not
(afaik) related to the write lock time-out issues you are facing. The
DeletionPolicy better manages the interaction between an IndexWriter that
deletes old files and an IndexReader that might still be using those files.
The write lock, on the other hand, just synchronizes between multiple
IndexWriter objects attempting to open the same index for write. So, do you
have multiple writers? Can you print/describe the writers' timing scenario
when this time-out problem occurs, e.g. something like this:
     w1.open
     w1.modify
     w1.close
     w2.open
     w2.modify
     w2.close
     w3.open
     w3.modify
     w3.close
     w2.open .....  time-out... but w3 closed the index.... so the
lock-file was supposed to be removed, why wasn't it?
Can write attempts come from different nodes in the cluster?
Can you make sure that when "the" writer gets the lock time-out there is
indeed no other active writer?
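
One way to capture a timing scenario like the one above is to log every
open/modify/close together with a node name and a timestamp. The following
is only a minimal sketch in plain Java with no Lucene calls; WriterTrace and
its method names are hypothetical, and timestamps from different nodes are
only comparable if the cluster clocks are synchronized.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects "timestamp node writer.action" lines. */
final class WriterTrace {
    private final List<String> events = new ArrayList<>();

    /** Record one event, e.g. record("node1", "w1", "open"). */
    synchronized void record(String node, String writer, String action) {
        events.add(String.format("%d %s %s.%s",
                System.currentTimeMillis(), node, writer, action));
    }

    /** Return a copy of the events recorded so far, in call order. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}
```

Calling record(...) just before and after each writer open, modify and close
on every node, then merging the per-node output by timestamp, should show
which writer still held the lock when the time-out occurred.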

Doron

"Patrick Kimber" <[EMAIL PROTECTED]> wrote on 29/06/2007 02:01:08:

> Hi,
>
> We are sharing a Lucene index in a Linux cluster over an NFS share.  We
> have multiple servers reading and writing to the index.
>
> I am getting regular lock exceptions, e.g.
> Lock obtain timed out:
> NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock
>
> - We are using Lucene 2.2.0
> - We are using kernel NFS and lockd is running.
> - We are using a modified version of the ExpirationTimeDeletionPolicy
>   found in the Lucene test suite:
>   http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
>   I have set the expiration time to 600 seconds (10 minutes).
> - We are using the NativeFSLockFactory with the lock folder being within
>   the index folder:
>   /mnt/nfstest/repository/lucene/lock/
> - I have implemented a handler which will pause and retry an update or
>   delete operation if a LockObtainFailedException or StaleReaderException
>   is caught.  The handler will retry the update or delete once every
>   second for 1 minute before re-throwing the exception and aborting.
>
> The issue appears to be caused by a lock file which is not deleted.  The
> handlers keep retrying... the process holding the lock eventually
> aborts... this deletes the lock file - any applications still running
> then continue normally.
>
> The application does not throw these exceptions when it is run on a
> standard Linux file system or Windows workstation.
>
> I would really appreciate some help with this issue.  The chances are I
> am doing something stupid... but I cannot think what to try next.
>
> Thanks for your help
>
> Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
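
The pause-and-retry handler described in the original message (retry once
every second for 1 minute, then re-throw) could look roughly like the sketch
below. This is not Patrick's actual code: RetryingUpdater is a hypothetical
name, and LockTimeoutException is a stand-in for Lucene's
LockObtainFailedException so that the example compiles without Lucene on the
classpath.

```java
import java.util.concurrent.Callable;

/** Stand-in for Lucene's LockObtainFailedException (assumption, see above). */
class LockTimeoutException extends Exception {
    LockTimeoutException(String message) {
        super(message);
    }
}

/** Hypothetical handler: retries an operation while the lock is unavailable. */
final class RetryingUpdater {
    private final int maxAttempts;
    private final long pauseMillis;

    RetryingUpdater(int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    /** Runs the operation; on a lock time-out, pauses and tries again. */
    <T> T run(Callable<T> operation) throws Exception {
        LockTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (LockTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // pause before the next attempt
                }
            }
        }
        throw last; // every attempt failed: re-throw the last time-out
    }
}
```

With the settings from the message this would be new RetryingUpdater(60,
1000). Any exception other than the lock time-out is passed straight through
to the caller. Retrying only helps if the lock is eventually released; it
does not explain why the lock file is left behind in the first place.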

