[ http://issues.apache.org/jira/browse/LUCENE-665?page=comments#action_12431801 ]

Doron Cohen commented on LUCENE-665:
------------------------------------

I think I know which software is causing or exposing this behavior in my environment: the SVN client I am using, TortoiseSVN. I tried the following sequence:

1) Ran the test with TortoiseSVN installed - the test generates these "access denied" errors (and bypasses them).
2) Uninstalled TortoiseSVN (+reboot) and ran the test - it passes with no "access denied" errors.
3) Installed TortoiseSVN again (+reboot) and ran the test - the same "access denied" errors again.

I am using the most recent stable TortoiseSVN version - 1.3.5 build 6804 - 32 bit, for svn-1.3.2, downloaded from http://tortoisesvn.tigris.org/.

There is an interesting discussion thread about this type of error on Windows platforms in the svn forums - http://svn.haxx.se/dev/archive-2003-10/0136.shtml. In that case it was svn itself that suffered from these errors. It says:

"...Windows allows applications to "tag-along" to see when a file has been written - they will wait for it to close and then do whatever they do, usually opening a file descriptor or handle. This would prevent that file from being renamed for a brief period..."

TortoiseSVN is a shell extension integrated into Windows Explorer. As such, it probably exhibits the "tag-along" behavior described above. (BTW, it is a great svn client in my opinion.)

Here is another excerpt from that discussion thread -

>> >> sleep(1) would work, I suppose. ;~)
>>
> Most of the time, but not all the time. The only way I've made it work
> well on all the machines I've tried it on is to put it into a sleep(1)
> and retry loop of at *least* 20 or so attempts. Anything less and it
> still fails on some machines. That implies it is very dependent on
> machine speed or something, which means sleep times/retry times are just
> guessing games at best.
>
> If I could just get it recreated outside of Subversion and prove it's a
> Microsoft problem...although it probably still wouldn't get fixed for
> months at least.

We don't know that this is a bug in TortoiseSVN.
We also cannot rule out other such tag-along applications on users' machines. One cannot seriously expect this Win32 behavior to be fixed. I guess the question is - is it worthwhile for Lucene to at least try to reduce the chances of failure in this case? (I say yes :-)

> temporary file access denied on Windows
> ---------------------------------------
>
>                 Key: LUCENE-665
>                 URL: http://issues.apache.org/jira/browse/LUCENE-665
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Windows
>            Reporter: Doron Cohen
>         Attachments: FSDirectory_Retry_Logic.patch,
> FSDirs_Retry_Logic_3.patch, Test_Output.txt, TestInterleavedAddAndRemoves.java
>
>
> When interleaving adds and removes there is frequent opening/closing of
> readers and writers.
> I tried to measure performance in such a scenario (for issue 565), but the
> performance test failed - the indexing process crashed consistently with
> file "access denied" errors - "cannot create a lock file" in
> "lockFile.createNewFile()" and "cannot rename file".
> This is related to:
> - issue 516 (a closed issue: "TestFSDirectory fails on Windows") -
> http://issues.apache.org/jira/browse/LUCENE-516
> - user list questions due to file errors:
>   -
> http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html
>   -
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - discussion on lock-less commits
> http://www.nabble.com/Lock-less-commits-tf2126935.html
> My test setup is: XP (SP1), JAVA 1.5 - both SUN and IBM SDKs.
> I noticed that the problem is more frequent when locks are created on one
> disk and the index on another. Both are NTFS with the Windows indexing service
> enabled. I suspect this indexing service might be related - keeping files
> busy for a while - but I don't know for sure.
> After experimenting with it I conclude that these problems - at least in my
> scenario - are due to a temporary situation - the FS, or the OS, is
> *temporarily* holding references to files or folders, which prevents
> renaming them, deleting them, or creating new files in certain directories.
> So I added retry logic to FSDirectory for cases where the error was related to
> "Access Denied". This is the same approach suggested in
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - there, in addition to the retry, gc() is invoked (I did not gc()). This is
> based on the *hope* that an access-denied situation would vanish after a small
> delay, and the retry would succeed.
> I modified FSDirectory this way for "Access Denied" errors when creating a
> new file or renaming a file.
> This worked fine for me. The performance test that failed before now managed
> to complete. There should be no performance implications due to this
> modification, because only the cases that would otherwise wrongly fail
> now wait a few extra milliseconds and retry.
> I am attaching here a patch - FSDirectory_Retry_Logic.patch - that has these
> changes to FSDirectory.
> All "ant test" tests pass with this patch.
> Also attaching a test case that demonstrates the problem - at least on my
> machine. There are two test cases in that test file - one that works in system
> temp (like most Lucene tests) and one that creates the index on a different
> disk. The latter case can only run if the path ("D:" , "tmp") is valid.
> It would be great if people who experienced these problems could try out
> this patch and comment on whether it made any difference for them.
> If it turns out to be useful for others as well, including this patch in the code
> might help to relieve some of those "frustration" user cases.
> A comment on the state of the proposed patch:
> - It is not "ready to deploy" code - it has some debug printing, showing
> the cases in which the "retry logic" actually took place.
> - I am not sure if the current 30ms is the right delay... why not 50ms? 10ms?
> This is currently defined by a constant.
> - Should a call to gc() be added? (I think not.)
> - Should the retry be attempted also on "non access-denied" exceptions? (I
> think not).
> - I feel it is somewhat "voodoo programming", but though I don't like it, it
> seems to work...
> Attached files:
> 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP without
> the patch and passes with the patch.
> 2. FSDirectory_Retry_Logic.patch
> 3. Test_Output.txt - output of the test with the patch, on my XP. Only the
> createNewFile() case had to be bypassed in this test, but for another program
> I also saw the renameFile() being bypassed.
> - Doron
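
For readers without the attachment at hand, the retry idea described in the issue can be sketched roughly as below. This is not the code from FSDirectory_Retry_Logic.patch. The class name AccessDeniedRetry, the Attempt interface, and the helper methods are hypothetical names chosen for illustration. The 30 ms delay is the value mentioned in the issue. The limit of 20 attempts is an assumption borrowed from the svn thread quoted above, and detecting the condition by looking for "access" and "denied" in the exception message is likewise an assumed heuristic. Lambdas (Java 8 or later) are used only for brevity.

```java
import java.io.File;
import java.io.IOException;
import java.util.Locale;

/** Hypothetical helper illustrating retry on "access denied"; not the attached patch. */
final class AccessDeniedRetry {

    /** One attempt at a file operation that may fail transiently. */
    interface Attempt<T> {
        T run() throws IOException;
    }

    // 30 ms is the delay mentioned in the issue; 20 attempts is an assumption.
    static final long RETRY_DELAY_MS = 30;
    static final int MAX_ATTEMPTS = 20;

    /** Assumed heuristic: the failure is identified by the exception message. */
    static boolean isAccessDenied(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        String lower = msg.toLowerCase(Locale.ROOT);
        return lower.contains("access") && lower.contains("denied");
    }

    /** Runs the attempt, retrying after a short delay only on "access denied". */
    static <T> T retry(Attempt<T> attempt, int maxAttempts, long delayMs) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                // Other errors are rethrown at once; so is the last failed attempt.
                if (!isAccessDenied(e) || i >= maxAttempts) {
                    throw e;
                }
            }
            pause(delayMs);
        }
    }

    /** Sleeps between attempts, preserving the thread's interrupt status. */
    static void pause(long ms) throws IOException {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting to retry");
        }
    }

    /** Example use: creating a lock file with the retry wrapped around it. */
    static boolean createNewFile(File f) throws IOException {
        return retry(f::createNewFile, MAX_ATTEMPTS, RETRY_DELAY_MS);
    }
}
```

Only failures that look like "access denied" cost any extra time, which matches the argument in the issue that the successful path is unaffected. Note that File.renameTo() reports failure by returning false rather than throwing, so a rename would need a loop that retries on a false return instead of on an exception.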