[ http://issues.apache.org/jira/browse/LUCENE-665?page=all ]
Doron Cohen updated LUCENE-665:
-------------------------------
Attachment: FSDirs_Retry_Logic_3.patch
I am attaching an updated patch - FSDirs_Retry_Logic_3.patch.
In this update:
- merged with the code changes from issue 635 ("decouple locking from directory")
- modified according to the recommendations in the comments above:
  - do not rely on specific exception message text.
  - override lock.obtain(timeout) and handle unexpected exceptions there.
  - do not modify the logic of obtain() (no changes to this method).
  - UNEXPECTED_ERROR_RETRY_DELAY set to 100ms.
- debug prints commented out.
"ant test" tests all pass.
My stress IO test passes as well.
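
To make the items above concrete, here is a minimal sketch of the idea: a timed obtain(timeout) that keeps calling an unchanged obtain() and treats an unexpected IOException as a reason to wait and retry rather than fail at once. It is not the code from the patch. The class RetryingLockSketch, the LOCK_POLL_INTERVAL constant, and the behaviour on timeout (rethrow the last error, or return false if the lock was merely busy) are assumptions made for illustration; only the 100ms value of UNEXPECTED_ERROR_RETRY_DELAY comes from the notes above.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for a file-based lock; not Lucene's actual Lock class.
class RetryingLockSketch {
    // Pause after an unexpected error: 100 ms, as stated in the update notes.
    static final long UNEXPECTED_ERROR_RETRY_DELAY = 100;
    // Pause while the lock is simply held by someone else (assumed value).
    static final long LOCK_POLL_INTERVAL = 50;

    private final File lockFile;

    public RetryingLockSketch(File lockFile) {
        this.lockFile = lockFile;
    }

    // Single attempt, left untouched: true if the lock file was created.
    public boolean obtain() throws IOException {
        return lockFile.createNewFile();
    }

    // Repeats obtain() until it succeeds or the timeout elapses.
    public boolean obtain(long timeoutMillis) throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            IOException unexpected = null;
            try {
                if (obtain()) {
                    return true;
                }
            } catch (IOException e) {
                // No inspection of the message text: any IOException is retried.
                unexpected = e;
            }
            long delay = (unexpected != null)
                    ? UNEXPECTED_ERROR_RETRY_DELAY
                    : LOCK_POLL_INTERVAL;
            if (System.currentTimeMillis() + delay > deadline) {
                if (unexpected != null) {
                    throw unexpected; // out of time, report the last error
                }
                return false;         // out of time, lock is still held
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for lock");
            }
        }
    }

    // Releases the lock by deleting the lock file.
    public void release() {
        lockFile.delete();
    }
}
```

A caller would use obtain(timeout) where it previously used obtain(); the single-attempt method keeps its behaviour, so code that depends on it is unaffected.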
> temporary file access denied on Windows
> ---------------------------------------
>
> Key: LUCENE-665
> URL: http://issues.apache.org/jira/browse/LUCENE-665
> Project: Lucene - Java
> Issue Type: Bug
> Components: Store
> Affects Versions: 2.0.0
> Environment: Windows
> Reporter: Doron Cohen
> Attachments: FSDirectory_Retry_Logic.patch,
> FSDirs_Retry_Logic_3.patch, Test_Output.txt, TestInterleavedAddAndRemoves.java
>
>
> When interleaving adds and removes there is frequent opening/closing of
> readers and writers.
> I tried to measure performance in such a scenario (for issue 565), but the
> performance test failed - the indexing process crashed consistently with
> file "access denied" errors - "cannot create a lock file" in
> "lockFile.createNewFile()" and "cannot rename file".
> This is related to:
> - issue 516 (a closed issue: "TestFSDirectory fails on Windows") -
> http://issues.apache.org/jira/browse/LUCENE-516
> - user list questions due to file errors:
>   - http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html
>   - http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - discussion on lock-less commits:
>   http://www.nabble.com/Lock-less-commits-tf2126935.html
> My test setup is: XP (SP1), JAVA 1.5 - both SUN and IBM SDKs.
> I noticed that the problem is more frequent when locks are created on one
> disk and the index on another. Both are NTFS with Windows indexing service
> enabled. I suspect this indexing service might be related - keeping files
> busy for a while, but don't know for sure.
> After experimenting with it I conclude that these problems - at least in my
> scenario - are due to a temporary situation - the FS, or the OS, is
> *temporarily* holding references to files or folders, which prevents
> renaming them, deleting them, or creating new files in certain directories.
> So I added retry logic to FSDirectory for cases where the error was related
> to "Access Denied". This is the same approach as the one proposed in
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - there, in addition to the retry, gc() is invoked (I did not call gc()).
> This is based on the *hope* that an access-denied situation would vanish
> after a small delay, and the retry would succeed.
> I modified FSDirectory this way for "Access Denied" errors when creating a
> new file or renaming a file.
> This worked fine for me. The performance test that failed before now runs to
> completion. There should be no performance implications due to this
> modification, because only the cases that would otherwise wrongly fail now
> wait a few extra milliseconds and retry.
> I am attaching here a patch - FSDirectory_Retry_Logic.patch - that has these
> changes to FSDirectory.
> All "ant test" tests pass with this patch.
> Also attaching a test case that demonstrates the problem - at least on my
> machine. There are two test cases in that test file - one that works in system
> temp (like most Lucene tests) and one that creates the index on a different
> disk. The latter case can only run if the path ("D:" , "tmp") is valid.
> It would be great if people that experienced these problems could try out
> this patch and comment whether it made any difference for them.
> If it turns out useful for others as well, including this patch in the code
> might help to relieve some of those "frustration" user cases.
> A comment on the state of the proposed patch:
> - It is not "ready to deploy" code - it has some debug printing, showing
> the cases where the "retry logic" actually took place.
> - I am not sure if the current 30ms is the right delay... why not 50ms? 10ms?
> This is currently defined by a constant.
> - Should a call to gc() be added? (I think not.)
> - Should the retry also be attempted on "non access-denied" exceptions? (I
> think not.)
> - I feel it is somewhat "voodoo programming", but though I don't like it, it
> seems to work...
> Attached files:
> 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP without
> the patch and passes with the patch.
> 2. FSDirectory_Retry_Logic.patch
> 3. Test_Output.txt - output of the test with the patch, on my XP. Only the
> createNewFile() case had to be bypassed in this test, but for another program
> I also saw the renameFile() being bypassed.
> - Doron
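
The retry approach described in the report above (wait briefly, then try the file operation again) can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of FSDirectory_Retry_Logic.patch: the class FileRetrySketch, its method names, and the MAX_ATTEMPTS limit are hypothetical, and the sketch retries on any failure instead of checking for "Access Denied". The 30ms delay is the value the report mentions as its current constant.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; the names and the attempt limit are assumptions.
class FileRetrySketch {
    // Delay between attempts: 30 ms, the value mentioned in the report.
    static final long RETRY_DELAY_MS = 30;
    // Upper bound on attempts so a permanent failure still surfaces (assumed).
    static final int MAX_ATTEMPTS = 5;

    // Creates the file, retrying when createNewFile() throws.
    // Returns false without retrying if the file already exists.
    static boolean createNewFileWithRetry(File file) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return file.createNewFile();
            } catch (IOException e) {
                last = e;
                if (attempt < MAX_ATTEMPTS) {
                    pause();
                }
            }
        }
        throw last; // every attempt failed, report the last error
    }

    // Renames the file, retrying while renameTo() reports failure.
    static boolean renameWithRetry(File from, File to) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (from.renameTo(to)) {
                return true;
            }
            if (attempt < MAX_ATTEMPTS) {
                pause();
            }
        }
        return false;
    }

    // Sleeps for the retry delay, preserving the interrupt flag.
    private static void pause() {
        try {
            Thread.sleep(RETRY_DELAY_MS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the normal case the first attempt succeeds and no delay is added, which matches the report's point that only operations that would otherwise fail pay the extra wait.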
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira