[ http://issues.apache.org/jira/browse/LUCENE-665?page=comments#action_12435414 ]

Doron Cohen commented on LUCENE-665:
------------------------------------

Here is my summary of the discussion so far, and a proposal for what's next
(based on the comments for LUCENE-665 and on the thread
http://www.nabble.com/-jira--Created%3A-%28LUCENE-665%29-temporary-file-access-denied-on-Windows-tf2167540.html):

[1] The reported problem can be reproduced on Windows in the presence of
programs that monitor files.

[2] The proposed fix adds a retry after a 100ms delay in the rare cases where
the problem occurs.

[3] That fix greatly reduces the chances of the problem but does not really
solve it.

[4] The proposed fix for FSDirectory was not accepted because:
   [4.1] A 100ms delay may be too long for highly interactive programs.
   [4.2] 100ms can be insufficient in some cases.
   [4.3] Non-Windows environments might be affected without justification.
   [4.4] The work-in-progress "lock-less" commits may reduce the chances of
this problem.

[5] A Windows-specific implementation of FSDirectory, not the default but
available for applications to select, was proposed as a better place to host
this retry logic, at least until "lock-less" commits are available and prove
to solve the same problem.
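The fix described in [2] amounts to "try, wait, try once more". The Java sketch below is a hypothetical helper (the names RetryOnce and IOAction are illustrative, not Lucene API) that also makes [3] concrete: if the file is still held after the delay, the second failure propagates to the caller.

```java
import java.io.IOException;

/** Hypothetical helper illustrating the single-retry fix in [2]; not Lucene code. */
final class RetryOnce {

    /** A file-system action that may fail with an IOException. */
    interface IOAction<T> {
        T run() throws IOException;
    }

    private RetryOnce() {}

    /**
     * Runs the action; if it throws, waits delayMillis and tries exactly once more.
     * A second failure propagates, which is why this only reduces the chance of
     * the error rather than eliminating it.
     */
    static <T> T run(IOAction<T> action, long delayMillis) throws IOException {
        try {
            return action.run();
        } catch (IOException first) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException ie) {
                // Preserve the interrupt and report the original failure.
                Thread.currentThread().interrupt();
                throw first;
            }
            return action.run();
        }
    }
}
```

For example, a lock file could be created with `RetryOnce.run(() -> lockFile.createNewFile(), 100)`, where `lockFile` is an assumed `java.io.File`. The lambda needs Java 8; on the Java 1.5 JDKs discussed here, an anonymous `IOAction<Boolean>` class does the same job.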

So I intend to implement the solution outlined in [5] above. It would be
optional, definitely not the default, and applications could choose it for
Windows environments. The retry behavior would be configurable. It would also
be configurable whether to apply the retry logic to lock deletion; the default
would be 'no', because on NFS a delete may return 'failed' due to a time-out
although it actually succeeded. A retry in that case could delete a lock that
another process has acquired in the meantime, "killing" voluntary file-locking
schemes like the default one used by Lucene (though I assume this is not the
case with the NFS native locks proposed by Michael).
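A minimal sketch of the configurable behavior proposed above, assuming a hypothetical RetryingFileOps class (not Lucene API). The delay, retry count, and lock-delete flag are constructor parameters; the defaults of 100 ms and one retry are assumptions borrowed from [2]. Note that `File.renameTo` and `File.delete` report failure by returning false rather than throwing, so the retry loops test the return value.

```java
import java.io.File;
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical sketch of the optional retry behavior; names and defaults are assumptions. */
final class RetryingFileOps {
    private final long delayMillis;        // pause between attempts
    private final int maxRetries;          // extra attempts after the first failure
    private final boolean retryLockDelete; // off by default: on NFS a "failed" delete may have succeeded

    RetryingFileOps(long delayMillis, int maxRetries, boolean retryLockDelete) {
        if (delayMillis < 0 || maxRetries < 0) {
            throw new IllegalArgumentException("retry settings must be non-negative");
        }
        this.delayMillis = delayMillis;
        this.maxRetries = maxRetries;
        this.retryLockDelete = retryLockDelete;
    }

    /** Assumed defaults: 100 ms delay, one retry, no retry on lock deletion. */
    RetryingFileOps() {
        this(100, 1, false);
    }

    /** Renames a file, retrying while File.renameTo reports failure. */
    void rename(File from, File to) throws IOException {
        for (int attempt = 0; ; attempt++) {
            if (from.renameTo(to)) {
                return;
            }
            if (attempt >= maxRetries) {
                throw new IOException("Cannot rename " + from + " to " + to);
            }
            pause();
        }
    }

    /**
     * Deletes a lock file. Retries only when explicitly enabled, because a retry
     * after a delete that really succeeded could remove a lock that another
     * process acquired in the meantime.
     */
    boolean deleteLock(File lockFile) throws IOException {
        int retries = retryLockDelete ? maxRetries : 0;
        for (int attempt = 0; ; attempt++) {
            if (lockFile.delete()) {
                return true;
            }
            if (attempt >= retries) {
                return false;
            }
            pause();
        }
    }

    private void pause() throws IOException {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting to retry");
        }
    }
}
```

Keeping the lock-delete flag separate from the general retry settings lets an application enable retries for renames on a local Windows disk without changing lock semantics on a shared NFS volume.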

Hope this reflects the discussion so far...

> temporary file access denied on Windows
> ---------------------------------------
>
>                 Key: LUCENE-665
>                 URL: http://issues.apache.org/jira/browse/LUCENE-665
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Windows
>            Reporter: Doron Cohen
>         Attachments: FSDirectory_Retry_Logic.patch, FSDirs_Retry_Logic_3.patch, Test_Output.txt, TestInterleavedAddAndRemoves.java
>
>
> When interleaving adds and removes, readers and writers are opened and
> closed frequently.
> I tried to measure performance in such a scenario (for issue 565), but the
> performance test failed: the indexing process crashed consistently with
> file "access denied" errors, "cannot create a lock file" in
> "lockFile.createNewFile()" and "cannot rename file".
> This is related to:
> - issue 516 (a closed issue: "TestFSDirectory fails on Windows") - 
> http://issues.apache.org/jira/browse/LUCENE-516 
> - user list questions due to file errors:
>   - http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html
>   - http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - discussion on lock-less commits 
> http://www.nabble.com/Lock-less-commits-tf2126935.html
> My test setup is XP (SP1) with Java 1.5, both the Sun and IBM SDKs.
> I noticed that the problem is more frequent when locks are created on one
> disk and the index on another. Both are NTFS with the Windows indexing
> service enabled. I suspect this indexing service might be related, keeping
> files busy for a while, but I don't know for sure.
> After experimenting with it, I conclude that these problems, at least in my
> scenario, are due to a temporary situation: the FS, or the OS, is
> *temporarily* holding references to files or folders, preventing them from
> being renamed or deleted, or preventing new files from being created in
> certain directories.
> So I added retry logic to FSDirectory for cases where the error was related
> to "Access Denied". This is the same approach taken in
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> (there, in addition to the retry, gc() is invoked; I did not call gc()).
> This is based on the *hope* that an access-denied situation would vanish
> after a small delay, and the retry would succeed.
> I modified FSDirectory this way for "Access Denied" errors when creating a
> new file and when renaming a file.
> This worked fine for me. The performance test that failed before now
> managed to complete. There should be no performance implications from this
> modification, because only operations that would otherwise wrongly fail now
> wait a few extra milliseconds and retry.
> I am attaching a patch, FSDirectory_Retry_Logic.patch, with these changes
> to FSDirectory.
> All "ant test" tests pass with this patch.
> I am also attaching a test case that demonstrates the problem, at least on
> my machine. There are two test cases in that test file: one that works in
> the system temp directory (like most Lucene tests) and one that creates the
> index on a different disk. The latter can only run if the path ("D:",
> "tmp") is valid.
> It would be great if people that experienced these problems could try out 
> this patch and comment whether it made any difference for them. 
> If it turns out useful for others as well, including this patch in the code 
> might help to relieve some of those "frustration" user cases.
> A comment on the state of the proposed patch:
> - It is not "ready to deploy" code: it has some debug printing, showing the
> cases where the "retry logic" actually took place.
> - I am not sure the current 30ms is the right delay... why not 50ms? 10ms?
> It is currently defined by a constant.
> - Should a call to gc() be added? (I think not.)
> - Should the retry also be attempted on non-access-denied exceptions? (I
> think not.)
> - I feel it is somewhat "voodoo programming", but though I don't like it,
> it seems to work...
> Attached files:
> 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP
> without the patch and passes with the patch.
> 2. FSDirectory_Retry_Logic.patch
> 3. Test_Output.txt - output of the test with the patch, on my XP machine.
> Only the createNewFile() case needed the retry in this test, but in another
> program I also saw the retry take place for renameFile().
> - Doron
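The retry in the original report applies only to "Access Denied" errors and uses a constant delay (30ms in that version). The sketch below shows that rule for `File.createNewFile()`. It is not the attached patch: the class name AccessDeniedRetry is hypothetical, recognizing the error by the "Access is denied" message text is an assumed, platform- and locale-dependent heuristic, and the bound of three retries is an assumption the report does not specify.

```java
import java.io.File;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Locale;

/** Hypothetical sketch of access-denied-only retries; not the attached patch. */
final class AccessDeniedRetry {
    /** Constant delay between attempts, matching the 30ms mentioned in the report. */
    static final long RETRY_DELAY_MILLIS = 30;
    /** Assumed bound on retries so a persistent error still surfaces. */
    static final int MAX_RETRIES = 3;

    private AccessDeniedRetry() {}

    /** Heuristic: treats an IOException as "access denied" if its message says so. */
    static boolean isAccessDenied(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.toLowerCase(Locale.ROOT).contains("access is denied");
    }

    /** Like File.createNewFile(), but retries after a short delay on access-denied errors only. */
    static boolean createNewFile(File file) throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                return file.createNewFile();
            } catch (IOException e) {
                // Other errors, and access-denied errors after the last retry, propagate unchanged.
                if (!isAccessDenied(e) || attempt >= MAX_RETRIES) {
                    throw e;
                }
                try {
                    Thread.sleep(RETRY_DELAY_MILLIS);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while waiting to retry");
                }
            }
        }
    }
}
```

Restricting retries to access-denied errors matches the report's "I think not" on retrying other exceptions: a missing directory or a full disk fails immediately instead of after a series of pointless delays.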

-- 
This message is automatically generated by JIRA.