[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436443 ] Paul Smith commented on LUCENE-675: --- From a strict performance point of view, a standard set of important, but don't forget other languages. From a tokenization point of view (separate from this issue), perhaps the Gutenberg project would be useful to test correctness of the analysis phase. > Lucene benchmark: objective performance test for Lucene > --- > > Key: LUCENE-675 > URL: http://issues.apache.org/jira/browse/LUCENE-675 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Andrzej Bialecki > Attachments: LuceneBenchmark.java > > > We need an objective way to measure the performance of Lucene, both indexing > and querying, on a known corpus. This issue is intended to collect comments > and patches implementing a suite of such benchmarking tests. > Regarding the corpus: one of the widely used and freely available corpora is > the original Reuters collection, available from > http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz > or > http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. > I propose to use this corpus as a base for benchmarks. The benchmarking > suite could automatically retrieve it from known locations, and cache it > locally. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436442 ] Andrzej Bialecki commented on LUCENE-675: -- Yes, that could be a good additional source. However, IMHO the primary corpus should be widely known and standardized, hence my proposal of the Reuters. (I mistakenly copy&paste-d the urls in the comment above - of course the corpus they're pointing at is the "20 Newsgroups", not the Reuters one. Correct url for the Reuters corpus is http://www.daviddlewis.com/resources/testcollections/reuters21578/ ).
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436437 ] Paul Smith commented on LUCENE-675: --- If you're looking for freely available text in bulk, what about: http://www.gutenberg.org/wiki/Main_Page
[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-675?page=all ] Andrzej Bialecki updated LUCENE-675: - Attachment: LuceneBenchmark.java This is just a starting point for discussion - it's a pretty old file I found lying around, so it may not even compile with modern Lucene. Requires commons-compress.
[jira] Created: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Lucene benchmark: objective performance test for Lucene --- Key: LUCENE-675 URL: http://issues.apache.org/jira/browse/LUCENE-675 Project: Lucene - Java Issue Type: Improvement Reporter: Andrzej Bialecki We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests. Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.
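The "retrieve it from known locations, and cache it locally" step proposed in this issue could look roughly like the sketch below. It is not taken from the attached LuceneBenchmark.java; the class name CorpusCache, the Source interface and the ".part" temp-file convention are all assumptions made for illustration. The download is abstracted behind Source so the caching logic can be exercised without a network connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CorpusCache {

    /** Hypothetical hook: opens a stream onto the corpus archive (a URL, a file, a test fixture). */
    public interface Source {
        InputStream open() throws IOException;
    }

    /**
     * Returns the cached copy of fileName under cacheDir, fetching it from
     * source only when no cached copy exists yet.
     */
    public static Path fetch(Path cacheDir, String fileName, Source source) throws IOException {
        Files.createDirectories(cacheDir);
        Path cached = cacheDir.resolve(fileName);
        if (Files.isRegularFile(cached)) {
            return cached; // cache hit: the source is never opened
        }
        // Download into a temporary file first, so an interrupted transfer
        // never leaves a half-written archive under the final name.
        Path partial = Files.createTempFile(cacheDir, fileName, ".part");
        try {
            try (InputStream in = source.open()) {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(partial, cached, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(partial); // no-op after a successful move
        }
        return cached;
    }
}
```

A benchmark driver could pass something like () -> new java.net.URL(corpusUrl).openStream() as the source, where corpusUrl is one of the archive locations named in the issue. Unpacking the archive (the attachment is said to need commons-compress for that) is left out of this sketch.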
[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up
[ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12436414 ] Grant Ingersoll commented on LUCENE-443: Yonik, Paul, do either of you know the status on this one? From the looks of it, it hasn't been implemented. It also has the highest number of votes in JIRA, so I thought I would take a look at it. One downside is it is not in patch form, but it also doesn't look too hard to extract the changes, either. One issue I have with these performance issues is that we don't have a reliable benchmarking suite. I am not a lawyer, but might we be able to use something like http://trec.nist.gov/data/reuters/reuters.html to build a sample benchmark suite? This corpus, plus 100 or so queries could work nicely. Of course, we would have to figure out some way for those interested to get their hands on the data. What do others do for benchmarking? > ConjunctionScorer tune-up > - > > Key: LUCENE-443 > URL: http://issues.apache.org/jira/browse/LUCENE-443 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 1.9 > Environment: Linux, Java 1.5, Large Index with 4 million items and > some heavily nested boolean queries >Reporter: Abdul Chaudhry > Attachments: ConjunctionScorer.java, ConjunctionScorer.java > > > I just recently ran a load test on the latest code from Lucene, which is > using a new BooleanScore and noticed the ConjunctionScorer was crunching > through objects, especially while sorting as part of the skipTo call. It > turns a linked list into an array, sorts the array, then converts the array > back to a linked list for further processing by the scoring engines below. > I'm not sure if anyone else is experiencing this as I have a very large index > (> 4 million items) and I am issuing some heavily nested queries. > Anyway, I decided to change the linked list into an array and use a first and > last marker to "simulate" a linked list. > This scaled much better during my load test as the Java garbage collector > was less - umm - virulent
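To make the "array with a first and last marker" idea concrete, here is a minimal sketch of a conjunction (AND) over sorted document-id lists. It is not the attached ConjunctionScorer.java and uses no Lucene classes: Cursor is a hypothetical stand-in for a scorer, and plain int arrays stand in for postings. The cursors are sorted once, and afterwards only the two markers move, so no list or array is rebuilt per skipTo call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ArrayConjunction {

    /** Cursor over one sorted list of doc ids (hypothetical stand-in for a scorer). */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        /** Moves to the next doc; returns false when the list is exhausted. */
        boolean next() { pos++; return pos < docs.length; }
        /** Moves to the first doc >= target; returns false when the list is exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids present in every list; each list must be sorted ascending. */
    static List<Integer> intersect(int[]... lists) {
        List<Integer> hits = new ArrayList<>();
        int n = lists.length;
        if (n == 0) return hits;
        Cursor[] ring = new Cursor[n];
        for (int i = 0; i < n; i++) {
            if (lists[i].length == 0) return hits; // an empty list matches nothing
            ring[i] = new Cursor(lists[i]);
        }
        // Sort once. From here on the array is read as a ring that is
        // ascending from 'first' round to 'last'; only the markers move.
        Arrays.sort(ring, Comparator.comparingInt(Cursor::doc));
        int first = 0;     // cursor with the smallest current doc
        int last = n - 1;  // cursor with the largest current doc
        while (true) {
            while (ring[first].doc() < ring[last].doc()) {
                // The lagging cursor jumps to (or past) the largest doc and
                // thereby becomes the new 'last'; its neighbour becomes 'first'.
                if (!ring[first].skipTo(ring[last].doc())) return hits;
                last = first;
                first = (first + 1) % n;
            }
            hits.add(ring[first].doc()); // smallest == largest: all cursors agree
            if (!ring[last].next()) return hits; // step past the match; 'last' stays the largest
        }
    }
}
```

The design point is the one made in the issue: the per-call sort and the list-to-array-to-list conversions disappear, so the inner loop allocates nothing.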
Clustering IndexWriter?
Warning, I'm a vendor dude but this isn't really a vendor message. My IT guy had mentioned to me that a bunch of the open source products we use (JIRA, JForum etc) have Lucene inside and in the name of eating our own dog food I tried to cluster IndexWriter (with a RAMDirectory) using our (terracotta) clustering technology. Took me about a half hour to get the basics working from download time. I was wondering, do people in the real world want to be able to cluster this stuff? Is clustering the IndexWriter really all I need to do? If it is interesting, how do I feed a small code change back into the project? We don't yet support subclasses of collections, and SegmentInfos subclasses Vector. I just turned it into aggregation (that took 10 of the 30 minutes). We will support this in a future release so it isn't a huge deal but I could get something out sooner if the change was made. Cheers, Steve
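For readers unfamiliar with the "turned it into aggregation" change, the general shape is sketched below. The class names are hypothetical and this is not the actual SegmentInfos source or the change Steve made; it only contrasts a class that extends Vector with one that holds a Vector as a private field and forwards the calls it needs.

```java
import java.util.Vector;

public class AggregationSketch {

    /** Before: the class IS a Vector, so every Vector method is part of its API. */
    static class InheritedInfos extends Vector<String> {
        String info(int i) { return elementAt(i); }
    }

    /** After: the class HAS a Vector and forwards only the calls it needs. */
    static class AggregatedInfos {
        private final Vector<String> items = new Vector<>();

        void add(String name) { items.addElement(name); }
        String info(int i) { return items.elementAt(i); }
        int size() { return items.size(); }
    }
}
```

The cost of such a change is that every caller relying on an inherited Vector method needs a forwarding method on the new class, which is presumably where the ten minutes went.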
Re: [jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified
: I'm not sure if "the user specified the wrong directory" is necessarily : the correct situation here. Unless a user specifically sets the : org.apache.lucene.lockDir property, they aren't really choosing the lock : directory location - Lucene uses the java.io.tmpdir property as a : default, without any input from the user. A user who runs into this It depends on your definition of "user" ... someone is setting the java.io.tmpdir ... if the lockDir hasn't been explicitly set, and tmpdir points at a bogus directory, that should be an error. (just like it should be an error if lockDir is explicitly set, but points at a bogus directory) : problem will see only something like "Cannot create directory: /temp" in : their logs, and then has to go through the source code to figure out why : anything is trying to create that directory. ... : The code already defaults to using the index directory for lock files : (which the user DID specify) if the org.apache.lucene.lockDir property : and the java.io.tmpdir properties are not set - it doesn't seem like : much of a stretch to just modify the code to also use the index : directory if at least the java.io.tmpdir property is invalid. There is a big difference between choosing a default in the absence of input, and making assumptions when input is "bad" ... as i said: applications that want to make these assumptions can do so, but the Lucene *library* should not ... the system properties are input just like values passed to method calls -- we have to respect that input. -Hoss
[jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified
[ http://issues.apache.org/jira/browse/LUCENE-674?page=comments#action_12436384 ] Ryan Holliday commented on LUCENE-674: -- I'm not sure if "the user specified the wrong directory" is necessarily the correct situation here. Unless a user specifically sets the org.apache.lucene.lockDir property, they aren't really choosing the lock directory location - Lucene uses the java.io.tmpdir property as a default, without any input from the user. A user who runs into this problem will see only something like "Cannot create directory: /temp" in their logs, and then has to go through the source code to figure out why anything is trying to create that directory. The code already defaults to using the index directory for lock files (which the user DID specify) if the org.apache.lucene.lockDir property and the java.io.tmpdir properties are not set - it doesn't seem like much of a stretch to just modify the code to also use the index directory if at least the java.io.tmpdir property is invalid. > Error in FSDirectory if java.io.tmpdir incorrectly specified > > > Key: LUCENE-674 > URL: http://issues.apache.org/jira/browse/LUCENE-674 > Project: Lucene - Java > Issue Type: Bug > Components: Store >Affects Versions: 2.0.0 > Environment: Reported on a Linux system under Tomcat >Reporter: Ryan Holliday > > A user of the JAMWiki project (http://jamwiki.org/) reported an error with > the following stack trace: > SEVERE: Unable to create search instance > /usr/share/tomcat5/webapps/jamwiki-0.3.4-beta7/test/base/search/indexen > java.io.IOException: Cannot create directory: /temp > at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:171) > at > org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141) > at > org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117) > at > org.jamwiki.search.LuceneSearchEngine.getSearchIndexPath(LuceneSearchEngine.java:318) > The culprit is that the java.io.tmpdir property was incorrectly specified on > the user's system. 
Lucene could easily handle this issue by modifying the > FSDirectory.init() method. Currently the code uses the index directory if > java.io.tmpdir and org.apache.lucene.lockDir are unspecified, but it could > use that directory if those values are unspecified OR if they are invalid. > Doing so would make Lucene a bit more robust without breaking any existing > installations.
[jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified
[ http://issues.apache.org/jira/browse/LUCENE-674?page=comments#action_12436351 ] Hoss Man commented on LUCENE-674: - This sounds like a very similar issue to some past discussion about the path specified when opening a directory, and what to do if it doesn't exist (ie: create it, or throw an error) ... in general i think it would be inadvisable to assume that if java.io.tmpdir refers to a bogus directory that we should use the index directory, because that could lead to situations where typos result in errors silently being ignored, to the possible extent of index corruption. (consider for a moment: two lucene based apps running in two separate JVM instances on the same machine, attempting to use the same index directory; one with a properly set java.io.tmpdir and one without -- they will most likely crash hard because they would silently use completely different directories for managing locks). As with the discussion about index directories that don't exist, applications that *want* to silently assume that a bogus java.io.tmpdir property should result in using the index directory for lock files can get that behavior if they want (by testing java.io.tmpdir themselves, and explicitly constructing a SimpleFSLockFactory() on the directory they want to use) but Lucene should not make any assumptions about what the client application wants in the case of garbage input.
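The two positions in this thread can be put side by side in a small sketch. It is not Lucene code: LockDirResolver and both method names are hypothetical, and the check is simplified to "does the directory exist" (the stack trace in the issue suggests the real code attempts to create it). The strict method mirrors the behaviour argued for above, where bad input is an error; the lenient method is the fallback an application could opt into on its own side.

```java
import java.io.File;
import java.io.IOException;

public class LockDirResolver {

    /**
     * Library-style policy: absent input falls back to the index directory,
     * but input that names an unusable directory is reported as an error.
     */
    public static File strict(String lockDirProp, String tmpDirProp, File indexDir)
            throws IOException {
        String configured = (lockDirProp != null) ? lockDirProp : tmpDirProp;
        if (configured == null) {
            return indexDir; // nothing was specified: default to the index directory
        }
        File dir = new File(configured);
        if (!dir.isDirectory()) {
            throw new IOException("Lock directory does not exist: " + dir);
        }
        return dir;
    }

    /**
     * Application-style policy: the caller has decided that a bad setting
     * should quietly fall back to the index directory.
     */
    public static File lenient(String lockDirProp, String tmpDirProp, File indexDir) {
        try {
            return strict(lockDirProp, tmpDirProp, indexDir);
        } catch (IOException e) {
            return indexDir;
        }
    }
}
```

An application would call one of these with System.getProperty("org.apache.lucene.lockDir") and System.getProperty("java.io.tmpdir"), then hand the result to whatever lock factory it constructs explicitly, as suggested in the comment above. If two processes share an index, both must use the same policy, which is exactly the hazard described there.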
[jira] Created: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified
Error in FSDirectory if java.io.tmpdir incorrectly specified

Key: LUCENE-674
URL: http://issues.apache.org/jira/browse/LUCENE-674
Project: Lucene - Java
Issue Type: Bug
Components: Store
Affects Versions: 2.0.0
Environment: Reported on a Linux system under Tomcat
Reporter: Ryan Holliday

A user of the JAMWiki project (http://jamwiki.org/) reported an error with the following stack trace:

SEVERE: Unable to create search instance /usr/share/tomcat5/webapps/jamwiki-0.3.4-beta7/test/base/search/indexen
java.io.IOException: Cannot create directory: /temp
at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:171)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
at org.jamwiki.search.LuceneSearchEngine.getSearchIndexPath(LuceneSearchEngine.java:318)

The culprit is that the java.io.tmpdir property was incorrectly specified on the user's system. Lucene could easily handle this issue by modifying the FSDirectory.init() method. Currently the code uses the index directory if java.io.tmpdir and org.apache.lucene.lockDir are unspecified, but it could use that directory if those values are unspecified OR if they are invalid. Doing so would make Lucene a bit more robust without breaking any existing installations.
Re: [jira] Commented: (LUCENE-665) temporary file access denied on Windows
"Michael McCandless (JIRA)" <[EMAIL PROTECTED]> wrote on 20/09/2006 04:41:26:

> Doron, which version of TortoiseSVN did you have installed when you
> got the exceptions?

TortoiseSVN 1.3.5, Build 6804 - 32 Bit
Subversion 1.3.2
apr 0.9.7
apr-iconv 0.9.7
apr-utils 0.9.7
berkeley db 4.3.28
neon 0.25.4
OpenSSL 0.9.8a 11 Oct 2005
zlib 1.2.3

My env:
Microsoft Windows XP Professional Version 5.1.2600 Service Pack 2 Build 2600
Processor x86 Family 6 Model 13 Stepping 6 GenuineIntel ~1998 Mhz
It is a laptop, btw.

> I've installed version 1.4.0 on my Windows XP SP2 box, and then ran
> your stress test just fine, ie, I can't reproduce the errors (to
> verify that lock-less commits fixes this).
[jira] Commented: (LUCENE-665) temporary file access denied on Windows
[ http://issues.apache.org/jira/browse/LUCENE-665?page=comments#action_12436220 ] Michael McCandless commented on LUCENE-665: --- Doron, which version of TortoiseSVN did you have installed when you got the exceptions? I've installed version 1.4.0 on my Windows XP SP2 box, and then ran your stress test just fine, ie, I can't reproduce the errors (to verify that lock-less commits fixes this). > temporary file access denied on Windows > --- > > Key: LUCENE-665 > URL: http://issues.apache.org/jira/browse/LUCENE-665 > Project: Lucene - Java > Issue Type: Bug > Components: Store >Affects Versions: 2.0.0 > Environment: Windows >Reporter: Doron Cohen > Attachments: FSDirectory_Retry_Logic.patch, > FSDirs_Retry_Logic_3.patch, FSWinDirectory.patch, Test_Output.txt, > TestInterleavedAddAndRemoves.java > > > When interleaving adds and removes there is frequent opening/closing of > readers and writers. > I tried to measure performance in such a scenario (for issue 565), but the > performance test failed - the indexing process crashed consistently with > file "access denied" errors - "cannot create a lock file" in > "lockFile.createNewFile()" and "cannot rename file". > This is related to: > - issue 516 (a closed issue: "TestFSDirectory fails on Windows") - > http://issues.apache.org/jira/browse/LUCENE-516 > - user list questions due to file errors: > - > http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html > - > http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html > - discussion on lock-less commits > http://www.nabble.com/Lock-less-commits-tf2126935.html > My test setup is: XP (SP1), JAVA 1.5 - both SUN and IBM SDKs. > I noticed that the problem is more frequent when locks are created on one > disk and the index on another. Both are NTFS with Windows indexing service > enabled. I suspect this indexing service might be related - keeping files > busy for a while, but don't know for sure. 
> After experimenting with it I conclude that these problems - at least in my > scenario - are due to a temporary situation - the FS, or the OS, is > *temporarily* holding references to files or folders, preventing them from > being renamed or deleted, or preventing new files from being created in certain directories. > So I added retry logic to FSDirectory for cases where the error was related to > "Access Denied". This is the same approach brought in > http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html > - there, in addition to the retry, gc() is invoked (I did not gc()). This is > based on the *hope* that an access-denied situation would vanish after a small > delay, and the retry would succeed. > I modified FSDirectory this way for "Access Denied" errors while creating a > new file or renaming a file. > This worked fine for me. The performance test that failed before now managed > to complete. There should be no performance implications due to this > modification, because only the cases that would otherwise wrongly fail are > now delayed by some extra millis and retried. > I am attaching here a patch - FSDirectory_Retry_Logic.patch - that has these > changes to FSDirectory. > All "ant test" tests pass with this patch. > Also attaching a test case that demonstrates the problem - at least on my > machine. There are two test cases in that test file - one that works in system > temp (like most Lucene tests) and one that creates the index on a different > disk. The latter case can only run if the path ("D:" , "tmp") is valid. > It would be great if people that experienced these problems could try out > this patch and comment whether it made any difference for them. > If it turns out useful for others as well, including this patch in the code > might help to relieve some of those "frustration" user cases. 
> A comment on the state of the proposed patch: > - It is not "ready to deploy" code - it has some debug printing, showing > the cases where the "retry logic" actually took place. > - I am not sure if the current 30ms is the right delay... why not 50ms? 10ms? > This is currently defined by a constant. > - Should a call to gc() be added? (I think not.) > - Should the retry be attempted also on "non access-denied" exceptions? (I > think not). > - I feel it is somewhat "voodoo programming", but though I don't like it, it > seems to work... > Attached files: > 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP without > the patch and passes with the patch. > 2. FSDirectory_Retry_Logic.patch > 3. Test_Output.txt - output of the test with the patch, on my XP. Only the > createNewFile() case had to be bypassed in this test, but for another program > I also saw the renameFile() being bypassed
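The retry approach described in this issue can be sketched independently of Lucene as follows. This is not the attached FSDirectory_Retry_Logic.patch: the class, the FileOp interface and the message check are assumptions for illustration. Only the general shape comes from the description above: retry after a short fixed delay, only for errors that look like "access denied", and without calling gc().

```java
import java.io.IOException;
import java.util.Locale;

public class RetryOnAccessDenied {

    /** A file-system operation that may fail transiently (hypothetical helper type). */
    public interface FileOp<T> {
        T run() throws IOException;
    }

    /**
     * Runs op up to maxAttempts times, sleeping delayMillis between attempts.
     * Errors that do not look like "access denied" are rethrown immediately.
     */
    public static <T> T withRetry(FileOp<T> op, int maxAttempts, long delayMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();
            } catch (IOException e) {
                if (!looksLikeAccessDenied(e)) {
                    throw e; // unrelated failure: do not mask it with retries
                }
                lastError = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw lastError; // every attempt failed with an access-denied error
    }

    /** Assumption: the transient condition is recognisable from the exception message. */
    static boolean looksLikeAccessDenied(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.toLowerCase(Locale.ROOT).contains("denied");
    }
}
```

A call site might look like withRetry(() -> lockFile.createNewFile(), 5, 30), where 30 matches the delay mentioned in the patch notes and 5 is an arbitrary attempt limit. Matching on the message text is fragile and locale-dependent, which fits the reporter's own "voodoo" caveat.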
[jira] Updated: (LUCENE-665) temporary file access denied on Windows
[ http://issues.apache.org/jira/browse/LUCENE-665?page=all ] Doron Cohen updated LUCENE-665: --- Attachment: FSWinDirectory.patch Attached patch - FSWinDirectory - implements retry logic for FS operations in a separate, non-default directory class as discussed above. By default this new class is not used. Applications can start using it by replacing the IMPL class in FSDirectory with the new class FSWinDirectory. There are two ways to do this - by setting a system property (this is the original mechanism), or by calling the (new) static FSDirectory method setFSDirImplClass(name). There are 3 new classes in this patch: - FSWinDirectory (extends FSDirectory) - SimpleFSWinLockFactory (extends SimpleFSLockFactory) - TestWinLockFactory (extends TestLockFactory). A few simple modifications were required in FSDirectory, SimpleFSLockFactory and TestLockFactory in order to allow inheritance. Tests: - "ant test" passes with the new code. - For testing, I modified my copy of build-common.xml to set a system property so that the new WinFS class was always in effect and ran the tests - all passed. - my stress test TestInterleavedAddAndRemoves fails in my env by default and passes when FSWinDirectory is in effect.