[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Paul Smith (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436443 ] 

Paul Smith commented on LUCENE-675:
---

From a strict performance point of view, a standard set of important, but 
don't forget other languages.

From a tokenization point of view (separate from this issue), perhaps the 
Gutenberg project would be useful to test correctness of the analysis phase.

> Lucene benchmark: objective performance test for Lucene
> ---
>
> Key: LUCENE-675
> URL: http://issues.apache.org/jira/browse/LUCENE-675
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
> Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Andrzej Bialecki (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436442 ] 

Andrzej Bialecki  commented on LUCENE-675:
--

Yes, that could be a good additional source. However, IMHO the primary corpus 
should be widely known and standardized, hence my proposal of the Reuters corpus.

(I mistakenly copy-and-pasted the URLs in the comment above - of course the corpus 
they're pointing at is the "20 Newsgroups", not the Reuters one. The correct URL 
for the Reuters corpus is 
http://www.daviddlewis.com/resources/testcollections/reuters21578/ ).




[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Paul Smith (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436437 ] 

Paul Smith commented on LUCENE-675:
---

If you're looking for freely available text in bulk, what about:

http://www.gutenberg.org/wiki/Main_Page




[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Andrzej Bialecki (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-675?page=all ]

Andrzej Bialecki  updated LUCENE-675:
-

Attachment: LuceneBenchmark.java

This is just a starting point for discussion - it's a pretty old file I found 
lying around, so it may not even compile with modern Lucene. Requires 
commons-compress.




[jira] Created: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Andrzej Bialecki (JIRA)
Lucene benchmark: objective performance test for Lucene
---

 Key: LUCENE-675
 URL: http://issues.apache.org/jira/browse/LUCENE-675
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Andrzej Bialecki 


We need an objective way to measure the performance of Lucene, both indexing 
and querying, on a known corpus. This issue is intended to collect comments and 
patches implementing a suite of such benchmarking tests.

Regarding the corpus: one of the widely used and freely available corpora is 
the original Reuters collection, available from 
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
or 
http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
 I propose to use this corpus as a base for benchmarks. The benchmarking suite 
could automatically retrieve it from known locations, and cache it locally.
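The proposal above has the benchmarking suite fetch the corpus from a known location and cache it locally. Below is a minimal sketch of that fetch-once behaviour. It assumes modern Java (java.nio.file, which the Java 1.5 of the time did not have). `CorpusCache` and `fetch` are hypothetical names, and the cache is keyed only by the file name at the end of the URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (assumption): downloads a corpus archive once and reuses the local copy.
final class CorpusCache {
    private CorpusCache() {}

    /** Returns the cached copy of {@code source}, downloading into {@code cacheDir} only on a miss. */
    static Path fetch(URL source, Path cacheDir) throws IOException {
        String path = source.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1); // cache key: file name only
        Path target = cacheDir.resolve(name);
        if (Files.exists(target)) {
            return target; // cache hit: no network access at all
        }
        Files.createDirectories(cacheDir);
        // Write to a temporary ".part" file first so an interrupted download
        // never leaves a truncated archive that looks like a valid cache entry.
        Path tmp = Files.createTempFile(cacheDir, name, ".part");
        try (InputStream in = source.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Downloading into a `.part` file and moving it into place only afterwards has a useful effect: a failed download is simply retried on the next run. It is never mistaken for a complete archive.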




[jira] Commented: (LUCENE-443) ConjunctionScorer tune-up

2006-09-20 Thread Grant Ingersoll (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12436414 ] 

Grant Ingersoll commented on LUCENE-443:


Yonik, Paul, do either of you know the status of this one?  From the looks of 
it, it hasn't been implemented.  It also has the highest number of votes in 
JIRA, so I thought I would take a look at it.  One downside is that it is not in 
patch form, but it also doesn't look too hard to extract the changes.

One issue I have with these performance issues is that we don't have a reliable 
benchmarking suite.  I am not a lawyer, but might we be able to use something 
like http://trec.nist.gov/data/reuters/reuters.html to build a sample benchmark 
suite?  This corpus, plus 100 or so queries could work nicely.  Of course, we 
would have to figure out some way for those interested to get their hands on 
the data.  What do others do for benchmarking?
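One way to picture the "corpus plus 100 or so queries" idea is a harness that replays a fixed query list and reports a robust statistic. The sketch below is an assumption, not an existing Lucene tool. `QueryBenchmark` is hypothetical. `search` is a stand-in for whatever actually runs one query against the index and returns its hit count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness; "search" stands in for code that runs one query and returns its hit count.
final class QueryBenchmark {
    private QueryBenchmark() {}

    static final class Result {
        final long medianPassNanos; // median (upper median for an even count) time per pass
        final long totalHits;       // summed over measured passes; also keeps results "used"

        Result(long medianPassNanos, long totalHits) {
            this.medianPassNanos = medianPassNanos;
            this.totalHits = totalHits;
        }
    }

    static Result run(List<String> queries, Function<String, Integer> search,
                      int warmupPasses, int measuredPasses) {
        if (measuredPasses < 1) {
            throw new IllegalArgumentException("need at least one measured pass");
        }
        for (int i = 0; i < warmupPasses; i++) {
            for (String q : queries) {
                search.apply(q); // warm-up: let the JIT compile hot paths; not timed
            }
        }
        long[] samples = new long[measuredPasses];
        long hits = 0;
        for (int i = 0; i < measuredPasses; i++) {
            long start = System.nanoTime();
            for (String q : queries) {
                hits += search.apply(q);
            }
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return new Result(samples[measuredPasses / 2], hits);
    }
}
```

Warm-up passes let the JIT compile the hot paths before timing starts. The median is used because it is less sensitive than the mean to a single pass disturbed by garbage collection.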

> ConjunctionScorer tune-up
> -
>
> Key: LUCENE-443
> URL: http://issues.apache.org/jira/browse/LUCENE-443
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 1.9
> Environment: Linux, Java 1.5, Large Index with 4 million items and 
> some heavily nested boolean queries
>Reporter: Abdul Chaudhry
> Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
>
> I just recently ran a load test on the latest code from Lucene, which is 
> using a new BooleanScore and noticed the ConjunctionScorer was crunching 
> through objects, especially while sorting as part of the skipTo call. It 
> turns a linked list into an array, sorts the array, then converts the array 
> back to a linked list for further processing by the scoring engines below.
> I'm not sure if anyone else is experiencing this as I have a very large index 
> (> 4 million items) and I am issuing some heavily nested queries.
> Anyway, I decided to change the linked list into an array and use a first and 
> last marker to "simulate" a linked list.
> This scaled much better during my load test as the Java garbage collector 
> was less - umm - virulent.
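The array-with-markers idea described above can be sketched as follows. This illustrates the technique under stated assumptions; it is not the attached ConjunctionScorer.java. `DocIterator` is a hypothetical stand-in for a sub-scorer over sorted document ids. The array is sorted once up front. After that, the `first`/`last` markers rotate around it instead of relinking list nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of the technique only; DocIterator is a hypothetical stand-in for a sub-scorer.
final class ArrayConjunction {
    private ArrayConjunction() {}

    /** Iterates a sorted array of document ids. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = -1;

        DocIterator(int... docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        boolean next() { return ++pos < docs.length; }

        /** Moves to the first doc >= target (linear scan for clarity). */
        boolean skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids present in every iterator. */
    static List<Integer> intersect(DocIterator... its) {
        List<Integer> out = new ArrayList<>();
        int n = its.length;
        if (n == 0) return out;
        for (DocIterator it : its) {
            if (!it.next()) return out; // an empty clause means no matches
        }
        // Sort once by current doc; from here on the order is kept by rotating markers.
        DocIterator[] ring = its.clone();
        Arrays.sort(ring, (a, b) -> Integer.compare(a.doc(), b.doc()));
        int first = 0;     // slot holding the smallest current doc
        int last = n - 1;  // slot holding the largest current doc
        while (true) {
            // Advance the smallest iterator to the largest doc until all agree.
            while (ring[first].doc() < ring[last].doc()) {
                if (!ring[first].skipTo(ring[last].doc())) return out;
                last = first;             // it now holds the largest doc
                first = (first + 1) % n;  // the next slot holds the smallest
            }
            out.add(ring[last].doc());    // smallest == largest, so every iterator agrees
            if (!ring[last].next()) return out;
        }
    }
}
```

The smallest iterator is always advanced to the largest current document and then becomes the new `last`. As a result, the array stays sorted in circular order, so no per-step allocation or re-sort is needed. For example, intersecting {1,3,5,7,9}, {3,4,5,9} and {0,3,5,8,9} yields [3, 5, 9].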




Clustering IndexWriter?

2006-09-20 Thread Steve Harris

Warning, I'm a vendor dude but this isn't really a vendor message.

My IT guy had mentioned to me that a bunch of the open source products
we use (JIRA, JForum, etc.) have Lucene inside, and in the name of eating
our own dog food I tried to cluster IndexWriter (with a RAMDirectory) using
our (Terracotta) clustering technology.

Took me about a half hour to get the basics working from download
time. I was wondering, do people in the real world want to be able to
cluster this stuff? Is clustering the IndexWriter really all I need to do?

If it is interesting, how do I feed back a small code change into the
project? We don't yet support subclasses of collections, and
SegmentInfos subclasses Vector. I just turned it into aggregation
(that took 10 of the 30 minutes). We will support this in a future
release so it isn't a huge deal but I could get something out sooner
if the change was made.
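The change described above replaces inheritance from `Vector` with aggregation, and looks roughly like this. `SegmentList` is a hypothetical stand-in, not Lucene's `SegmentInfos`, and `String` stands in for the real per-segment entry type. The `Vector` is kept as a private field so the synchronized behaviour of the original base class is preserved.

```java
import java.util.Vector;

// Hypothetical stand-in for a class that used to be declared as "extends Vector".
final class SegmentList {
    // Aggregation: the Vector is a private field instead of a superclass, so it
    // keeps Vector's synchronized methods but none of its API leaks to callers.
    private final Vector<String> segments = new Vector<>();

    void add(String segmentName) { segments.add(segmentName); }

    String get(int i) { return segments.get(i); }

    int size() { return segments.size(); }

    void clear() { segments.clear(); }
}
```

Only the operations callers need are exposed. A clustering tool that handles plain collections but not their subclasses would presumably be able to manage the field directly.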

Cheers,
Steve




Re: [jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified

2006-09-20 Thread Chris Hostetter
: I'm not sure if "the user specified the wrong directory" is necessarily
: the correct situation here.  Unless a user specifically sets the
: org.apache.lucene.lockDir property, they aren't really choosing the lock
: directory location - Lucene uses the java.io.tmpdir property as a
: default, without any input from the user.  A user who runs into this

It depends on your definition of "user" ... someone is setting the
java.io.tmpdir ... if the lockDir hasn't been explicitly set, and tmpdir
points at a bogus directory, that should be an error.  (just like it
should be an error if lockDir is explicitly set, but points at a bogus
directory)

: problem will see only something like "Cannot create directory: /temp" in
: their logs, and then has to go through the source code to figure out why
: anything is trying to create that directory.
...
: The code already defaults to using the index directory for lock files
: (which the user DID specify) if the org.apache.lucene.lockDir property
: and the java.io.tmpdir properties are not set - it doesn't seem like
: much of a stretch to just modify the code to also use the index
: directory if at least the java.io.tmpdir property is invalid.

There is a big difference between choosing a default in the absence of
input, and making assumptions when input is "bad" ... as i said:
applications that want to make these assumptions can do so, but the Lucene
*library* should not ... the system properties are input just like values
passed to method calls -- we have to respect that input.
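The sketch below shows the strict behaviour argued for here:

* Honour whichever property supplied the lock directory.
* Fall back to the index directory only when neither property is set.
* Fail on bad input, but name the property in the error. A user who sees only "Cannot create directory: /temp" then knows where the path came from.

`LockDirResolver` is hypothetical and is a simplification, not Lucene 2.0's FSDirectory.init(). The two property names are the ones quoted in this thread.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical, simplified resolution of the lock directory (not Lucene's actual code).
final class LockDirResolver {
    private LockDirResolver() {}

    static File resolve(File indexDir) throws IOException {
        String source = "org.apache.lucene.lockDir";
        String path = System.getProperty(source);
        if (path == null) {
            source = "java.io.tmpdir";
            path = System.getProperty(source);
        }
        if (path == null) {
            return indexDir; // no input at all: using the index directory is a default, not a guess
        }
        File dir = new File(path);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            // Respect the (bad) input and fail, but say which property it came from.
            throw new IOException("Cannot create lock directory " + dir
                    + " (from system property " + source + ")");
        }
        return dir;
    }
}
```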




-Hoss





[jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified

2006-09-20 Thread Ryan Holliday (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-674?page=comments#action_12436384 ] 

Ryan Holliday commented on LUCENE-674:
--

I'm not sure if "the user specified the wrong directory" is necessarily the 
correct situation here.  Unless a user specifically sets the 
org.apache.lucene.lockDir property, they aren't really choosing the lock 
directory location - Lucene uses the java.io.tmpdir property as a default, 
without any input from the user.  A user who runs into this problem will see 
only something like "Cannot create directory: /temp" in their logs, and then 
has to go through the source code to figure out why anything is trying to 
create that directory.

The code already defaults to using the index directory for lock files (which 
the user DID specify) if the org.apache.lucene.lockDir property and the 
java.io.tmpdir properties are not set - it doesn't seem like much of a stretch 
to just modify the code to also use the index directory if at least the 
java.io.tmpdir property is invalid.

> Error in FSDirectory if java.io.tmpdir incorrectly specified
> 
>
> Key: LUCENE-674
> URL: http://issues.apache.org/jira/browse/LUCENE-674
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 2.0.0
> Environment: Reported on a Linux system under Tomcat
>Reporter: Ryan Holliday
>
> A user of the JAMWiki project (http://jamwiki.org/) reported an error with 
> the following stack trace:
> SEVERE: Unable to create search instance 
> /usr/share/tomcat5/webapps/jamwiki-0.3.4-beta7/test/base/search/indexen
> java.io.IOException: Cannot create directory: /temp
> at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:171)
> at 
> org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
> at 
> org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
> at 
> org.jamwiki.search.LuceneSearchEngine.getSearchIndexPath(LuceneSearchEngine.java:318)
> The culprit is that the java.io.tmpdir property was incorrectly specified on 
> the user's system.  Lucene could easily handle this issue by modifying the 
> FSDirectory.init() method.  Currently the code uses the index directory if 
> java.io.tmpdir and org.apache.lucene.lockDir are unspecified, but it could 
> use that directory if those values are unspecified OR if they are invalid.  
> Doing so would make Lucene a bit more robust without breaking any existing 
> installations.




[jira] Commented: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified

2006-09-20 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-674?page=comments#action_12436351 ] 

Hoss Man commented on LUCENE-674:
-

This sounds like a very similar issue to some past discussion about the path 
specified when opening a directory, and what to do if it doesn't exist (i.e.: 
create it, or throw an error) ... in general i think it would be inadvisable to 
assume that if java.io.tmpdir refers to a bogus directory that we should use 
the index directory, because that could lead to situations where typos result in 
errors silently being ignored, to the possible extent of index corruption.

(consider for a moment: two Lucene-based apps running in two separate JVM 
instances on the same machine, attempting to use the same index directory; one 
with a properly set java.io.tmpdir and one without -- they will most likely 
crash hard because they would silently use completely different directories for 
managing locks).

As with the discussion about index directories that don't exist, applications 
that *want* to silently assume that a bogus java.io.tmpdir property should 
result in using the index directory for lock files can get that behavior if 
they want (by testing java.io.tmpdir themselves, and explicitly constructing a 
SimpleFSLockFactory() on the directory they want to use) but Lucene should not 
make any assumptions about what the client application wants in the case of 
garbage input.
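The application-side alternative described above might look like the minimal sketch below. `AppLockDirPolicy` is hypothetical. Treating "is a writable directory" as the validity test is an assumption. The returned directory would then be handed to a SimpleFSLockFactory, whose exact constructor depends on the Lucene version in use.

```java
import java.io.File;

// Hypothetical application-side policy (not part of Lucene): if java.io.tmpdir
// is not a usable directory, keep lock files in the index directory instead.
final class AppLockDirPolicy {
    private AppLockDirPolicy() {}

    static File chooseLockDir(File indexDir) {
        String tmp = System.getProperty("java.io.tmpdir");
        if (tmp != null) {
            File dir = new File(tmp);
            if (dir.isDirectory() && dir.canWrite()) {
                return dir; // the property is usable: respect it
            }
        }
        // The application, not the library, decides that bad input means "use the index dir".
        return indexDir;
    }
}
```

As the two-JVM example above points out, every process that opens the same index must apply the same policy. Otherwise they can end up using different lock directories.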




[jira] Created: (LUCENE-674) Error in FSDirectory if java.io.tmpdir incorrectly specified

2006-09-20 Thread Ryan Holliday (JIRA)
Error in FSDirectory if java.io.tmpdir incorrectly specified


 Key: LUCENE-674
 URL: http://issues.apache.org/jira/browse/LUCENE-674
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Affects Versions: 2.0.0
 Environment: Reported on a Linux system under Tomcat
Reporter: Ryan Holliday


A user of the JAMWiki project (http://jamwiki.org/) reported an error with the 
following stack trace:

SEVERE: Unable to create search instance 
/usr/share/tomcat5/webapps/jamwiki-0.3.4-beta7/test/base/search/indexen
java.io.IOException: Cannot create directory: /temp
at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:171)
at 
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
at 
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
at 
org.jamwiki.search.LuceneSearchEngine.getSearchIndexPath(LuceneSearchEngine.java:318)

The culprit is that the java.io.tmpdir property was incorrectly specified on 
the user's system.  Lucene could easily handle this issue by modifying the 
FSDirectory.init() method.  Currently the code uses the index directory if 
java.io.tmpdir and org.apache.lucene.lockDir are unspecified, but it could use 
that directory if those values are unspecified OR if they are invalid.  Doing 
so would make Lucene a bit more robust without breaking any existing 
installations.





Re: [jira] Commented: (LUCENE-665) temporary file access denied on Windows

2006-09-20 Thread Doron Cohen
"Michael McCandless (JIRA)" <[EMAIL PROTECTED]> wrote on 20/09/2006 04:41:26:
> Doron, which version of TortoiseSVN did you have installed when you
> got the exceptions?

TortoiseSVN 1.3.5, Build 6804 - 32 Bit
  Subversion 1.3.2,
  apr 0.9.7
  apr-iconv 0.9.7
  apr-utils 0.9.7
  berkeley db 4.3.28
  neon 0.25.4
  OpenSSL 0.9.8a 11 Oct 2005
  zlib 1.2.3

My env:
  Microsoft Windows XP Professional
  Version 5.1.2600 Service Pack 2 Build 2600
  Processor x86 Family 6 Model 13 Stepping 6 GenuineIntel ~1998 Mhz

It is a laptop, btw.

> I've installed version 1.4.0 on my Windows XP SP2 box, and then ran
> your stress test just fine, ie, I can't reproduce the errors (to
> verify that lock-less commits fixes this).





[jira] Commented: (LUCENE-665) temporary file access denied on Windows

2006-09-20 Thread Michael McCandless (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-665?page=comments#action_12436220 ] 

Michael McCandless commented on LUCENE-665:
---

Doron, which version of TortoiseSVN did you have installed when you got the 
exceptions?

I've installed version 1.4.0 on my Windows XP SP2 box, and then ran your stress 
test just fine, i.e., I can't reproduce the errors (to verify that lock-less 
commits fix this).

> temporary file access denied on Windows
> ---
>
> Key: LUCENE-665
> URL: http://issues.apache.org/jira/browse/LUCENE-665
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 2.0.0
> Environment: Windows
>Reporter: Doron Cohen
> Attachments: FSDirectory_Retry_Logic.patch, 
> FSDirs_Retry_Logic_3.patch, FSWinDirectory.patch, Test_Output.txt, 
> TestInterleavedAddAndRemoves.java
>
>
> When interleaving adds and removes there is frequent opening/closing of 
> readers and writers. 
> I tried to measure performance in such a scenario (for issue 565), but the 
> performance test failed  - the indexing process crashed consistently with 
> file "access denied" errors - "cannot create a lock file" in 
> "lockFile.createNewFile()" and "cannot rename file".
> This is related to:
> - issue 516 (a closed issue: "TestFSDirectory fails on Windows") - 
> http://issues.apache.org/jira/browse/LUCENE-516 
> - user list questions due to file errors:
>   - 
> http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html
>   - 
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - discussion on lock-less commits 
> http://www.nabble.com/Lock-less-commits-tf2126935.html
> My test setup is: XP (SP1), JAVA 1.5 - both SUN and IBM SDKs. 
> I noticed that the problem is more frequent when locks are created on one 
> disk and the index on another. Both are NTFS with Windows indexing service 
> enabled. I suspect this indexing service might be related - keeping files 
> busy for a while, but don't know for sure.
> After experimenting with it I conclude that these problems - at least in my 
> scenario - are due to a temporary situation - the FS, or the OS, is 
> *temporarily* holding references to files or folders, preventing them from 
> being renamed or deleted, or preventing new files from being created in certain directories. 
> So I added retry logic to FSDirectory for cases where the error was related to 
> "Access Denied". This is the same approach taken in 
> http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
>  - there, in addition to the retry, gc() is invoked (I did not call gc()). This is 
> based on the *hope* that an access-denied situation would vanish after a small 
> delay, and the retry would succeed.
> I modified FSDirectory this way for "Access Denied" errors when creating 
> new files and renaming files.
> This worked fine for me. The performance test that failed before, now managed 
> to complete. There should be no performance implications due to this 
> modification, because only the cases that would otherwise wrongly fail are 
> now delaying some extra millis and retry.
> I am attaching here a patch - FSDirectory_Retry_Logic.patch - that has these 
> changes to FSDirectory. 
> All "ant test" tests pass with this patch.
> Also attaching a test case that demonstrates the problem - at least on my 
> machine. There are two test cases in that test file - one that works in system 
> temp (like most Lucene tests) and one that creates the index in a different 
> disk. The latter case can only run if the path ("D:" , "tmp") is valid.
> It would be great if people that experienced these problems could try out 
> this patch and comment whether it made any difference for them. 
> If it turns out useful for others as well, including this patch in the code 
> might help to relieve some of those "frustration" user cases.
> A comment on state of proposed patch: 
> - It is not a "ready to deploy" code - it has some debug printing, showing 
> the cases that the "retry logic" actually took place. 
> - I am not sure if current 30ms is the right delay... why not 50ms? 10ms? 
> This is currently defined by a constant.
> - Should a call to gc() be added? (I think not.)
> - Should the retry be attempted also on "non access-denied" exceptions? (I 
> think not).
> - I feel it is somewhat "voodoo programming", but though I don't like it, it 
> seems to work... 
> Attached files:
> 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP without 
> the patch and passes with the patch.
> 2. FSDirectory_Retry_Logic.patch
> 3. Test_Output.txt- output of the test with the patch, on my XP. Only the 
> createNewFile() case had to be bypassed in this test, but for another program 
> I also saw the renameFile() being bypassed
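The retry-on-access-denied idea described above can be sketched as below, assuming modern Java. `AccessDeniedRetry` and `IOAction` are hypothetical names; this is not the code in the attached patch. Two assumptions apply:

* The error is recognised by the "Access is denied" text in the exception message. This is an assumption about how the Windows JVM reports it.
* java.io.File.renameTo() signals failure by returning false rather than throwing. A rename would therefore need a wrapper that throws an IOException before it could use this helper.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper; not the FSDirectory_Retry_Logic.patch code.
final class AccessDeniedRetry {
    private AccessDeniedRetry() {}

    /** A file-system operation that may throw IOException. */
    interface IOAction<T> {
        T run() throws IOException;
    }

    static <T> T withRetry(IOAction<T> action, int maxAttempts, long delayMillis) throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                String msg = e.getMessage();
                // Assumption: the transient Windows failure is reported as "Access is denied".
                boolean accessDenied = msg != null
                        && msg.toLowerCase(Locale.ROOT).contains("access is denied");
                if (!accessDenied || attempt >= maxAttempts) {
                    throw e; // other errors, or out of attempts: fail as before
                }
                try {
                    Thread.sleep(delayMillis); // hope the OS releases the file meanwhile
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

Exceptions that don't match are rethrown immediately, in line with the suggestion above not to retry on other errors. The delay is a parameter rather than a fixed constant, so values other than the patch's 30 ms can be tried.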

[jira] Updated: (LUCENE-665) temporary file access denied on Windows

2006-09-20 Thread Doron Cohen (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-665?page=all ]

Doron Cohen updated LUCENE-665:
---

Attachment: FSWinDirectory.patch

Attached patch - FSWinDirectory - implements retry logic of FS operations in a 
separate non default directory class as discussed above. 

By default this new class is not used. Applications can start using it by 
replacing the IMPL class in FSDirectory to be the new class FSWinDirectory. 

There are two ways to do this - by setting a system property (this is the 
original mechanism), or by calling FSDirectory static (new) method - 
setFSDirImplClass(name). 

There are 3 new classes in this patch: 
- FSWinDirectory (extends FSDirectory)
- SimpleFSWinLockFactory (extends SimpleFSLockFactory)
- TestWinLockFactory (extends TestLockFactory). 

A few simple modifications were required in FSDirectory, SimpleFSLockFactory and 
TestLockFactory in order to allow inheritance.

Tests:
- "ant test" passes with new code.
- For testing, I modified my copy of build-common.xml to set a system property so 
that the new WinFS class was always in effect and ran the tests - all passed. 
- my stress test TestInterleavedAddAndRemoves fails in my env by default and 
passes when FSWinDirectory is in effect.
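The mechanism can be sketched as below: the implementation class is selected by name, either from a system property or through a static setter. All names here are hypothetical placeholders, not the classes, method or property used in the patch: `BaseDir`, `WinDir`, `DirFactory` and the `example.dir.impl` property.

```java
// Hypothetical names throughout; not the classes, method or property from the patch.
class BaseDir {
    String describe() { return "default"; }
}

class WinDir extends BaseDir {
    @Override
    String describe() { return "retrying"; } // e.g. a variant with retry logic
}

final class DirFactory {
    static final String PROPERTY = "example.dir.impl"; // hypothetical property name
    private static Class<? extends BaseDir> impl =
            load(System.getProperty(PROPERTY, BaseDir.class.getName()));

    private DirFactory() {}

    private static Class<? extends BaseDir> load(String className) {
        try {
            // asSubclass rejects classes that do not extend BaseDir
            return Class.forName(className).asSubclass(BaseDir.class);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown directory class: " + className, e);
        }
    }

    /** Programmatic alternative to setting the system property. */
    static synchronized void setImplClass(String className) {
        impl = load(className);
    }

    static synchronized BaseDir newInstance() {
        try {
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
        }
    }
}
```

Because the default stays the base class, existing installations are unaffected unless they opt in.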
