Re: fileformats.html not in sync with fileformats.xml
Doron Cohen wrote: http://issues.apache.org/jira/browse/LUCENE-738 updated fileformats.xml. This shows correctly in http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml?view=markup but is not reflected (2nd day now) in the main site version http://lucene.apache.org/java/docs/fileformats.html OK, it looks like the docs just needed to be regenerated and pushed to the site. I've done this now (it was a great chance to test the instructions at http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite -- thanks Hoss!). So the changes should refresh to the public site in 30 minutes or so. Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
[ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12458158 ] Ning Li commented on LUCENE-565: > Can the same thing happen with your patch (with a smaller window), or are deletes applied between writing the new segment and writing the new segments file that references it? (hard to tell from the current diff in isolation) No, it does not happen with the patch, no matter what the window size is. This is because the results of flushing ram - both inserts and deletes - are committed in the same transaction. Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided) - Key: LUCENE-565 URL: http://issues.apache.org/jira/browse/LUCENE-565 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Ning Li Attachments: IndexWriter.java, IndexWriter.July09.patch, IndexWriter.patch, KeepDocCount0Segment.Sept15.patch, NewIndexModifier.July09.patch, NewIndexModifier.Sept21.patch, NewIndexWriter.Aug23.patch, NewIndexWriter.July18.patch, newMergePolicy.Sept08.patch, perf-test-res.JPG, perf-test-res2.JPG, perfres.log, TestBufferedDeletesPerf.java, TestWriterDelete.java Today, applications have to open/close an IndexWriter and open/close an IndexReader directly or indirectly (via IndexModifier) in order to handle a mix of inserts and deletes. This performs well when inserts and deletes come in fairly large batches. However, performance can degrade dramatically when inserts and deletes are interleaved in small batches. This is because the ramDirectory is flushed to disk whenever an IndexWriter is closed, causing a lot of small segments to be created on disk, which eventually need to be merged. We would like to propose a small API change to eliminate this problem. We are aware that this kind of change has come up in discussions before. See http://www.gossamer-threads.com/lists/lucene/java-dev/23049?search_string=indexwriter%20delete;#23049 .
The difference this time is that we have implemented the change and tested its performance, as described below. API Changes --- We propose adding a deleteDocuments(Term term) method to IndexWriter. Using this method, inserts and deletes can be interleaved using the same IndexWriter. Note that, with this change, it would be very easy to add another method to IndexWriter for updating documents, allowing applications to avoid a separate delete and insert when updating a document. Also note that this change can co-exist with the existing APIs for deleting documents using an IndexReader. But if our proposal is accepted, we think those APIs should probably be deprecated. Coding Changes -- Coding changes are localized to IndexWriter. Internally, the new deleteDocuments() method works by buffering the terms to be deleted. Deletes are deferred until the ramDirectory is flushed to disk, either because it becomes full or because the IndexWriter is closed. Using Java synchronization, care is taken to ensure that an interleaved sequence of inserts and deletes for the same document is properly serialized. We have attached a modified version of IndexWriter from Release 1.9.1 with these changes. Only a few hundred lines of coding changes are needed. All changes are commented with CHANGE. We have also attached a modified version of an example from Chapter 2.2 of Lucene in Action. Performance Results --- To test the performance of our proposed changes, we ran some experiments using the TREC WT 10G dataset. The experiments were run on a dual 2.4 GHz Intel Xeon server running Linux. The disk storage was configured as a RAID0 array with 5 drives. Before the indexes were built, the input documents were parsed to remove the HTML from them (i.e., only the text was indexed). This was done to minimize the impact of parsing on performance. A simple WhitespaceAnalyzer was used during index build. We experimented with three workloads: - Insert only.
1.6M documents were inserted and the final index size was 2.3GB. - Insert/delete (big batches). The same documents were inserted, but 25% were deleted; 1000 documents were deleted for every 4000 inserted. - Insert/delete (small batches). In this case, 5 documents were deleted for every 20 inserted.

Workload                        current IndexWriter  current IndexModifier  new IndexWriter
-------------------------------------------------------------------------------------------
Insert only                     116 min              119 min                116 min
Insert/delete (big batches)     --                   135 min                125 min
Insert/delete (small batches)   --                   338 min                134 min

As the experiments show, with the proposed changes, the performance improved by 60% when inserts and deletes were interleaved in small batches. Regards, Ning -- Ning Li, Search Technologies, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
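The buffering scheme described in the proposal - defer the delete terms until the ramDirectory is flushed, then commit the buffered inserts and deletes together - can be modeled with plain collections. This is a hypothetical sketch of the idea only, not Lucene's actual IndexWriter internals; all names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the proposed IndexWriter.deleteDocuments(Term) behavior:
// deletes are buffered and only applied at flush time, so a flush commits
// buffered inserts and deletes in one step. Hypothetical sketch, not Lucene code.
public class BufferedDeletesDemo {
    private final List<String> committed = new ArrayList<>();    // stands in for on-disk segments
    private final List<String> ramBuffer = new ArrayList<>();    // buffered inserts
    private final Set<String> bufferedDeletes = new HashSet<>(); // buffered delete "terms"

    public synchronized void addDocument(String doc) { ramBuffer.add(doc); }

    // Nothing touches the index here; the term is merely remembered.
    public synchronized void deleteDocuments(String term) { bufferedDeletes.add(term); }

    // Flush applies buffered inserts and buffered deletes together.
    public synchronized void flush() {
        committed.addAll(ramBuffer);
        ramBuffer.clear();
        committed.removeIf(doc -> bufferedDeletes.stream().anyMatch(doc::contains));
        bufferedDeletes.clear();
    }

    public synchronized int numCommitted() { return committed.size(); }

    public static void main(String[] args) {
        BufferedDeletesDemo w = new BufferedDeletesDemo();
        w.addDocument("apache lucene");
        w.addDocument("apache harmony");
        w.deleteDocuments("harmony");     // deferred: nothing visible yet
        assert w.numCommitted() == 0;
        w.flush();                        // insert and delete commit together
        assert w.numCommitted() == 1;
        System.out.println("committed docs: " + w.numCommitted());
    }
}
```

Because the delete is only a buffered term until flush, interleaving many small insert/delete batches never forces small segments onto disk, which mirrors the performance argument above.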
[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
[ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12458170 ] Yonik Seeley commented on LUCENE-565: > results of flushing ram - both inserts and deletes - are committed in the same transaction. OK, cool. I agree that's the ideal default behavior. Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided) - Key: LUCENE-565 URL: http://issues.apache.org/jira/browse/LUCENE-565 -- This message is automatically generated by JIRA.
[jira] Commented: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception
[ http://issues.apache.org/jira/browse/LUCENE-740?page=comments#action_12458201 ] Steven Parkes commented on LUCENE-740: I'm kind of wondering about the snowball licensing, so I'm intrigued by Yonik's comment. Is cleanup necessary? Did the original snowball authors agree to license the software under the AL 2.0? That's what LICENSE.txt says now. The source site cites the BSD license and says you can't claim it's licensed under another license. Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception -- Key: LUCENE-740 URL: http://issues.apache.org/jira/browse/LUCENE-740 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 1.9 Environment: linux amd64 Reporter: Andreas Kohn Priority: Minor Attachments: lucene-1.9.1-SnowballProgram.java, snowball.patch.txt (copied from mail to java-user) While playing with the various stemmers of Lucene(-1.9.1), I got an index out of bounds exception:

lucene-1.9.1$ java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:615)
        at net.sf.snowball.TestApp.main(TestApp.java:56)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 11
        at java.lang.StringBuffer.charAt(StringBuffer.java:303)
        at net.sf.snowball.SnowballProgram.find_among_b(SnowballProgram.java:270)
        at net.sf.snowball.ext.KpStemmer.r_Step_4(KpStemmer.java:1122)
        at net.sf.snowball.ext.KpStemmer.stem(KpStemmer.java:1997)

This happens when executing the command above; bla.txt contains just this word: 'spijsvertering'.
After some debugging, and some tests with the original snowball distribution from snowball.tartarus.org, it seems that the attached change is needed to avoid the exception. (The change comes from tartarus' SnowballProgram.java) -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
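The stack trace above points at StringBuffer.charAt being called with an index at or past the buffer's length, which always throws. Below is a self-contained sketch of that failure mode and of a bounds guard in the spirit of the tartarus fix; safeCharEquals is a hypothetical helper, not the actual find_among_b code:

```java
// Demonstrates the failure mode from the stack trace: StringBuffer.charAt
// throws an out-of-bounds exception when the index is >= length(). The
// guarded variant treats an out-of-range index as a failed match instead.
public class CharAtGuardDemo {
    public static boolean safeCharEquals(StringBuffer sb, int index, char expected) {
        if (index < 0 || index >= sb.length()) {
            return false; // out of range: fail the match rather than throw
        }
        return sb.charAt(index) == expected;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("spijsvertering"); // 14 chars
        boolean threw = false;
        try {
            sb.charAt(14); // one past the end, like index 11 in the report
        } catch (IndexOutOfBoundsException e) {
            threw = true;
        }
        assert threw;
        assert !safeCharEquals(sb, 14, 'g'); // guarded access just fails
        assert safeCharEquals(sb, 13, 'g');  // last character is 'g'
    }
}
```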
[jira] Commented: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception
[ http://issues.apache.org/jira/browse/LUCENE-740?page=comments#action_12458209 ] Doug Cutting commented on LUCENE-740: This is a good question. We redistribute stuff generated from Snowball sources, not the original files. Does this constitute a redistribution in binary form? I think the LICENSE.txt here refers to the code that's included in this sub-tree, which is Apache-licensed. So that's okay. If anything, we might need to add something to NOTICE.txt and/or include a copy of Snowball's BSD license too, as something like SNOWBALL-LICENSE.txt. Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception -- Key: LUCENE-740 URL: http://issues.apache.org/jira/browse/LUCENE-740 -- This message is automatically generated by JIRA.
IBM OmniFind Yahoo! Edition
I just saw the following new Lucene application announced: http://omnifind.ibm.yahoo.net/productinfo.php While I work for Yahoo!, I know nothing about Yahoo!'s involvement in this except for what I've just read in the press. Were any of the IBM folks on this list involved? If so, congratulations! Can you tell us any more about how Lucene is used here? (I see that Steven Parkes already updated the Powered By page in the wiki...) Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: IBM OmniFind Yahoo! Edition
The primary folks on the Lucene side are Michael Busch and Andreas Neumann. Certainly other folks at IBM have contributed significant pieces (though notably NOT me), but Michael and Andreas did most of the heavy lifting. I'll leave them to take credit for their work. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-746) Incorrect error message in AnalyzingQueryParser.getPrefixQuery
Incorrect error message in AnalyzingQueryParser.getPrefixQuery -- Key: LUCENE-746 URL: http://issues.apache.org/jira/browse/LUCENE-746 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Ronnie Kolehmainen Priority: Minor The error message of getPrefixQuery is incorrect when tokens were added, for example by a stemmer. The message is 'token was consumed' even if tokens were added. Attached is a patch which, when applied, gives a better description of what actually happened. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-746) Incorrect error message in AnalyzingQueryParser.getPrefixQuery
[ http://issues.apache.org/jira/browse/LUCENE-746?page=all ] Ronnie Kolehmainen updated LUCENE-746: -- Attachment: AnalyzingQueryParser.getPrefixQuery.patch Patch for current trunk. Incorrect error message in AnalyzingQueryParser.getPrefixQuery -- Key: LUCENE-746 URL: http://issues.apache.org/jira/browse/LUCENE-746 -- This message is automatically generated by JIRA.
GData DB4o reloaded
Hello all, two weeks ago I met the DB4O CEO at the DB4O roadshow in Berlin. We talked about gdata, lucene and the license nightmare. Two days later I got an email that db4o will release a third license to allow projects like lucene to closely distribute the db4o binaries and source. I received the license today. So here goes the question: do I have to talk to some ASF officials / lawyers about that stuff, or should I just add the license text and jar to the svn? Currently I just have a PDF license document; should I send it to the list at all?! best regards Simon - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: GData DB4o reloaded
What are the licensing terms? -Brian On Dec 13, 2006, at 11:31 AM, Simon Willnauer wrote: > Hello all, two weeks ago I met the DB4O CEO at the DB4O roadshow in Berlin. [...] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: GData DB4o reloaded
There you go: http://www.db4o.com/about/company/legalpolicies/docl.aspx thanks simon On 12/13/06, Brian McCallister [EMAIL PROTECTED] wrote: > What are the licensing terms? -Brian [...] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12458264 ] Otis Gospodnetic commented on LUCENE-436: 4 months later, I think I see the same problem here. I'm using JDK 1.6 (I saw the same problem under 1.5.0_0(8,9,10)) and Lucene from HEAD (2.1-dev). I'm running out of a 2GB heap in under 1 day on a production system that searches tens of thousands of indexes, where a few hundred of them have IndexSearchers open to them at any one time, with unused IndexSearchers getting closed after some period of inactivity. I'm periodically dumping the heap with jconsole and noticing a continuously increasing number of:

org.apache.lucene.index.TermInfo
org.apache.lucene.index.CompoundFileReader$CSIndexInput
org.apache.lucene.index.Term
org.apache.lucene.index.SegmentTermEnum
...

There was a LOT of back and forth here. What is the final solution? I see a complete new copy of TermInfosReader, but there are a lot of formatting changes in there; it's hard to tell what was actually changed, even with diff -bB --expand-tabs. I also see FixedThreadLocal, but I see no references to it from TermInfosReader...? [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception Key: LUCENE-436 URL: http://issues.apache.org/jira/browse/LUCENE-436 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 1.4 Environment: Solaris JVM 1.4.1 Linux JVM 1.4.2/1.5.0 Windows not tested Reporter: kieran Attachments: FixedThreadLocal.java, lucene-1.9.1.patch, Lucene-436-TestCase.tar.gz, TermInfosReader.java, ThreadLocalTest.java We've been experiencing terrible memory problems on our production search server, running lucene (1.4.3). Our live app regularly opens new indexes and, in doing so, releases old IndexReaders for garbage collection. But...there appears to be a memory leak in org.apache.lucene.index.TermInfosReader.java.
Under certain conditions (possibly related to JVM version, although I've personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS JVM 1.4.1) the ThreadLocal member variable, enumerators, doesn't get garbage-collected when the TermInfosReader object is gc-ed. Looking at the code in TermInfosReader.java, there's no reason why it _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested elsewhere) that there could be a bug in the garbage collector of some JVMs. I've seen this problem briefly discussed; in particular at the following URL: http://java2.5341.com/msg/85821.html The patch that Doug recommended, which is included in lucene-1.4.3, doesn't work in our particular circumstances. Doug's patch only clears the ThreadLocal variable for the thread running the finalizer (my knowledge of java breaks down here - I'm not sure which thread actually runs the finalizer). In our situation, the TermInfosReader is (potentially) used by more than one thread, meaning that Doug's patch _doesn't_ allow the affected JVMs to correctly collect garbage. So...I've devised a simple patch which, from my observations on linux JVMs 1.4.2_06, and 1.5.0_03, fixes this problem. Kieran PS Thanks to daniel naber for pointing me to jira/lucene

@@ -19,6 +19,7 @@
 import java.io.IOException;
 import org.apache.lucene.store.Directory;
+import java.util.Hashtable;

 /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
  * Directory. Pairs are accessed either by Term or by ordinal position the
@@ -29,7 +30,7 @@
   private String segment;
   private FieldInfos fieldInfos;

-  private ThreadLocal enumerators = new ThreadLocal();
+  private final Hashtable enumeratorsByThread = new Hashtable();
   private SegmentTermEnum origEnum;
   private long size;
@@ -60,10 +61,10 @@
   }

   private SegmentTermEnum getEnum() {
-    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
+    SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
     if (termEnum == null) {
       termEnum = terms();
-      enumerators.set(termEnum);
+      enumeratorsByThread.put(Thread.currentThread(), termEnum);
     }
     return termEnum;
   }
@@ -195,5 +196,15 @@
   public SegmentTermEnum terms(Term term) throws IOException {
     get(term);
     return (SegmentTermEnum)getEnum().clone();
+  }
+
+  /* some jvms might have trouble gc-ing enumeratorsByThread */
+  protected void finalize() throws Throwable {
+    try {
+      // make sure gc can clear up.
+      enumeratorsByThread.clear();
+    } finally {
+      super.finalize();
+    }
   }
 }

TermInfosReader.java, full source: == package org.apache.lucene.index; /** * Copyright 2004 The Apache
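The substance of the patch - swapping the ThreadLocal for a Hashtable keyed on Thread.currentThread(), so cached entries can be released explicitly - can be illustrated with a self-contained sketch. All names here are hypothetical; the real code caches SegmentTermEnum instances created by terms():

```java
import java.util.Hashtable;

// Simplified model of the patch: a per-thread cache backed by a Hashtable
// keyed on Thread.currentThread(). Unlike a ThreadLocal, every entry can be
// released in one explicit clear() call, which is what the patched
// finalize() relies on. Hypothetical names, not the actual Lucene code.
public class PerThreadCacheDemo {
    private final Hashtable<Thread, StringBuilder> byThread = new Hashtable<>();
    private int created = 0;

    synchronized StringBuilder getEnum() {
        StringBuilder e = byThread.get(Thread.currentThread());
        if (e == null) {
            e = new StringBuilder(); // stands in for terms() in the patch
            created++;
            byThread.put(Thread.currentThread(), e);
        }
        return e;
    }

    // Analogue of the patched finalize(): drop all cached entries at once.
    synchronized void close() { byThread.clear(); }

    synchronized int createdCount() { return created; }
    synchronized int cachedCount()  { return byThread.size(); }

    public static void main(String[] args) throws InterruptedException {
        PerThreadCacheDemo cache = new PerThreadCacheDemo();
        cache.getEnum();
        cache.getEnum();                       // same thread: entry is reused
        Thread t = new Thread(cache::getEnum); // new thread: gets its own entry
        t.start();
        t.join();
        assert cache.createdCount() == 2;
        assert cache.cachedCount() == 2;
        cache.close();                         // everything becomes collectable
        assert cache.cachedCount() == 0;
    }
}
```

The trade-off the thread discusses follows directly: the Hashtable pins one entry per thread until clear() runs, but it never depends on a JVM's ThreadLocal cleanup behaving correctly.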
TermInfosReader and clone of SegmentTermEnum
Hi, I'm looking at Robert Engels' patches in http://issues.apache.org/jira/browse/LUCENE-436 and looking at TermInfosReader. I think I understand why there is a ThreadLocal there in the first place - to act as a per-thread cache for the expensive-to-compute SegmentTermEnum, yes? But why is there a need to clone() the (original) SegmentTermEnum? Thanks, Otis - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12458292 ] robert engels commented on LUCENE-436: -- I would doubt the ThreadLocal issue that was in 1.4, changed in 1.5, would be reintroduced in 1.6. I do not use Lucene 2.1 so I can't say for certain that a new memory bug hasn't been introduced. I suggest attaching a good profiler (like JProfiler) and figuring out the cause of the memory leak (the root references). I use 1.9-based Lucene and can say unequivocally there are no inherent memory issues (especially when running under 1.5+). There may also be new issues introduced in JDK 6 - we have not tested with it, only 1.4 and 1.5. [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception Key: LUCENE-436 URL: http://issues.apache.org/jira/browse/LUCENE-436 -- This message is automatically generated by JIRA.
Re: TermInfosReader and clone of SegmentTermEnum
Aaaah, I think I get it. TermInfosReader can be shared by multiple threads. Each thread will need access to the SegmentTermEnum inside the TIR, but since each of them will search, scan, and seek to a different location, each thread needs its own copy/clone of the original SegmentTermEnum. The ThreadLocal is then used as a simple cache for the clone of the original SegmentTermEnum, so a single thread can get to it without repeating the scan/seek stuff, and so that each thread works with its own clone of SegmentTermEnum. Otis - Original Message From: Otis Gospodnetic [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Wednesday, December 13, 2006 4:53:45 PM Subject: TermInfosReader and clone of SegmentTermEnum [...] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
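The pattern described here - one shared, never-repositioned original enumerator, with each thread lazily caching its own clone in a ThreadLocal - can be sketched standalone. TermEnum is a hypothetical stand-in, not the actual TermInfosReader/SegmentTermEnum code:

```java
// Sketch of a per-thread clone cache: the shared original is cloned once
// per thread via a ThreadLocal, so each thread's seek/scan position is
// independent and the original is never repositioned. Hypothetical classes.
public class CloneCacheDemo {
    static class TermEnum implements Cloneable {
        int position; // each thread seeks this independently

        @Override public TermEnum clone() {
            try {
                return (TermEnum) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we are Cloneable
            }
        }
    }

    final TermEnum origEnum = new TermEnum(); // shared, never repositioned

    private final ThreadLocal<TermEnum> enumerators =
        ThreadLocal.withInitial(() -> origEnum.clone());

    TermEnum getEnum() { return enumerators.get(); }

    public static void main(String[] args) throws InterruptedException {
        CloneCacheDemo reader = new CloneCacheDemo();
        reader.getEnum().position = 42;          // main thread seeks its clone
        final int[] other = new int[1];
        Thread t = new Thread(() -> other[0] = reader.getEnum().position);
        t.start();
        t.join();
        assert reader.getEnum().position == 42;  // main thread's position kept
        assert other[0] == 0;                    // other thread saw a fresh clone
        assert reader.origEnum.position == 0;    // original untouched
    }
}
```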
[jira] Resolved: (LUCENE-681) org.apache.lucene.document.Field is Serializable but doesn't have default constructor
[ http://issues.apache.org/jira/browse/LUCENE-681?page=all ] Otis Gospodnetic resolved LUCENE-681. - Resolution: Won't Fix I think Jed's right. Plus, calling new Field(), which would now be possible, would give us a Field without the actual information about the field - name, value, tokenized, stored, indexed, etc. org.apache.lucene.document.Field is Serializable but doesn't have default constructor - Key: LUCENE-681 URL: http://issues.apache.org/jira/browse/LUCENE-681 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 1.9, 2.0.0, 2.1, 2.0.1 Environment: doesn't depend on environment Reporter: Elijah Epifanov Priority: Critical When I try to pass a Document via the network or do anything involving serialization/deserialization I will get an exception. The following patch should help (Field.java):

public Field() {
}

private void writeObject(java.io.ObjectOutputStream out) throws IOException {
  out.defaultWriteObject();
}

private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException {
  in.defaultReadObject();
  if (name == null) {
    throw new NullPointerException("name cannot be null");
  }
  this.name = name.intern();  // field names are interned
}

Maybe other classes do not conform to serialization requirements too... -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
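The Won't Fix resolution is consistent with how Java serialization actually works: deserialization never invokes a constructor declared on the Serializable class itself - only the nearest non-serializable superclass needs an accessible no-arg constructor, and Object supplies one. A minimal round-trip with an illustrative class (not Lucene's Field) shows that a class with only a parameterized constructor serializes fine:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A Serializable class with *no* default constructor round-trips fine,
// because deserialization bypasses constructors of Serializable classes.
public class NoDefaultCtorDemo {
    static class FieldLike implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String value;

        FieldLike(String name, String value) {  // the only constructor
            this.name = name;
            this.value = value;
        }
    }

    static FieldLike roundTrip(FieldLike f) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);  // serialize to an in-memory buffer
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (FieldLike) in.readObject();  // no constructor of FieldLike runs here
        }
    }

    public static void main(String[] args) throws Exception {
        FieldLike copy = roundTrip(new FieldLike("title", "Lucene in Action"));
        System.out.println(copy.name + "=" + copy.value);  // prints title=Lucene in Action
    }
}
```

So if serialization of Field threw an exception, the missing no-arg constructor was not the cause, which supports resolving the issue as reported.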
Re: TermInfosReader and clone of SegmentTermEnum
That is correct. On Dec 13, 2006, at 4:48 PM, Otis Gospodnetic wrote: Aaaah, I think I get it. TermInfosReader can be shared by multiple threads. Each thread will need access to the SegmentTermEnum inside the TIR, but since each of them will search, scan, and seek to a different location, each thread needs its own copy/clone of the original SegmentTermEnum. ThreadLocal is then used as a simple cache for the clone of the original SegmentTermEnum, so a single thread can get to it without repeating the scan/seek work, and so that each thread works with its own clone of SegmentTermEnum. Otis - Original Message From: Otis Gospodnetic [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Wednesday, December 13, 2006 4:53:45 PM Subject: TermInfosReader and clone of SegmentTermEnum Hi, I'm looking at Robert Engels' patches in http://issues.apache.org/jira/browse/LUCENE-436 and looking at TermInfosReader. I think I understand why there is a ThreadLocal there in the first place - to act as a per-thread cache for the expensive-to-compute SegmentTermEnum, yes? But why is there a need to clone() the (original) SegmentTermEnum? Thanks, Otis - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Lucene nightly build failure
javacc-uptodate-check:

javacc-notice:
     [echo] One or more of the JavaCC .jj files is newer than its corresponding
     [echo] .java file. Run the javacc target to regenerate the artifacts.

init:

clover.setup:

clover.info:
     [echo] Clover not found. Code coverage reports disabled.

clover:

common.compile-core:
    [mkdir] Created dir: /tmp/lucene-nightly/build/classes/java
    [javac] Compiling 204 source files to /tmp/lucene-nightly/build/classes/java
    [javac] Note: /tmp/lucene-nightly/src/java/org/apache/lucene/queryParser/QueryParser.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -deprecation for details.

compile-core:
     [rmic] RMI Compiling 1 class to /tmp/lucene-nightly/build/classes/java

compile-demo:
    [mkdir] Created dir: /tmp/lucene-nightly/build/classes/demo
    [javac] Compiling 17 source files to /tmp/lucene-nightly/build/classes/demo

common.compile-test:
    [mkdir] Created dir: /tmp/lucene-nightly/build/classes/test
    [javac] Compiling 124 source files to /tmp/lucene-nightly/build/classes/test
    [javac] Note: /tmp/lucene-nightly/src/test/org/apache/lucene/queryParser/TestQueryParser.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
     [copy] Copying 2 files to /tmp/lucene-nightly/build/classes/test
     [copy] Copied 1 empty directory to 1 empty directory under /tmp/lucene-nightly/build/classes/test

compile-test:

test:
    [mkdir] Created dir: /tmp/lucene-nightly/build/test
    [junit] Testsuite: org.apache.lucene.TestDemo
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.399 sec
    [junit] Testsuite: org.apache.lucene.TestHitIterator
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.343 sec
    [junit] Testsuite: org.apache.lucene.TestSearch
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.426 sec
    [junit] Testsuite: org.apache.lucene.TestSearchForDuplicates
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.916 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestAnalyzers
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.272 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestISOLatin1AccentFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.271 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestKeywordAnalyzer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.378 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestLengthFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.258 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestPerFieldAnalzyerWrapper
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.262 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestStandardAnalyzer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.315 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestStopAnalyzer
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.268 sec
    [junit] Testsuite: org.apache.lucene.analysis.TestStopFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.261 sec
    [junit] Testsuite: org.apache.lucene.document.TestBinaryDocument
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.334 sec
    [junit] Testsuite: org.apache.lucene.document.TestDateTools
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.337 sec
    [junit] Testsuite: org.apache.lucene.document.TestDocument
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.377 sec
    [junit] Testsuite: org.apache.lucene.document.TestNumberTools
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.633 sec
    [junit] Testsuite: org.apache.lucene.index.TestAddIndexesNoOptimize
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 2.726 sec
    [junit] Testsuite: org.apache.lucene.index.TestBackwardsCompatibility
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 1.041 sec
    [junit] Testsuite: org.apache.lucene.index.TestCompoundFile
    [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 3.316 sec
    [junit] Testsuite: org.apache.lucene.index.TestDoc
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
    [junit] Testsuite: org.apache.lucene.index.TestDocumentWriter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.504 sec
    [junit] Testsuite: org.apache.lucene.index.TestFieldInfos
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.333 sec
    [junit] Testsuite: org.apache.lucene.index.TestFieldsReader
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.781 sec
    [junit] - Standard Output ---
    [junit] Average Non-lazy time (should be very close to zero): 0 ms for 50 reads
    [junit] Average
[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed
To whom it may engage... This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED] Project lucene-java has an issue affecting its community integration. This issue affects 4 projects. The current state of this project is 'Failed', with reason 'Build Failed'. For reference only, the following projects are affected by this: - eyebrowse : Web-based mail archive browsing - jakarta-lucene : Java Based Search Engine - jakarta-slide : Content Management System based on WebDAV technology - lucene-java : Java Based Search Engine Full details are available at: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html That said, some information snippets are provided here. The following annotations (debug/informational/warning/error messages) were provided: -DEBUG- Sole output [lucene-core-13122006.jar] identifier set to project name -DEBUG- Dependency on javacc exists, no need to add for property javacc.home. 
-INFO- Failed with reason build failed
-DEBUG- Extracted fallback artifacts from Gump Repository

The following work was performed: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of: Failed
Elapsed: 9 secs
Command Line: java -Djava.awt.headless=true -Xbootclasspath/p:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/usr/local/gump/public/workspace/xml-xerces2/build/xercesImpl.jar org.apache.tools.ant.Main -Dgump.merge=/x1/gump/public/gump/work/merge.xml -Dbuild.sysclasspath=only -Dversion=13122006 -Djavacc.home=/usr/local/gump/packages/javacc-3.1 package
[Working Directory: /usr/local/gump/public/workspace/lucene-java]
CLASSPATH: /opt/jdk1.5/lib/tools.jar:/usr/local/gump/public/workspace/lucene-java/build/classes/java:/usr/local/gump/public/workspace/lucene-java/build/classes/demo:/usr/local/gump/public/workspace/lucene-java/build/classes/test:/usr/local/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-swing.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-trax.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-junit.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant.jar:/usr/local/gump/packages/junit3.8.1/junit.jar:/usr/local/gump/public/workspace/xml-commons/java/build/resolver.jar:/usr/local/gump/packages/je-1.7.1/lib/je.jar:/usr/local/gump/packages/javacc-3.1/bin/lib/javacc.jar:/usr/local/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/usr/local/gump/public/workspace/dist/junit/junit.jar
-
Buildfile: build.xml

javacc-uptodate-check:

javacc-notice:
     [echo] One or more of the JavaCC .jj files is newer than its corresponding
     [echo] .java file. Run the javacc target to regenerate the artifacts.

init:

clover.setup:

clover.info:
     [echo] Clover not found. Code coverage reports disabled.

clover:

common.compile-core:
    [mkdir] Created dir: /x1/gump/public/workspace/lucene-java/build/classes/java
    [javac] Compiling 204 source files to /x1/gump/public/workspace/lucene-java/build/classes/java
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

compile-core:
     [rmic] RMI Compiling 1 class to /x1/gump/public/workspace/lucene-java/build/classes/java

jar-core:
      [jar] Building jar: /x1/gump/public/workspace/lucene-java/build/lucene-core-13122006.jar

javadocs:
    [mkdir] Created dir: /x1/gump/public/workspace/lucene-java/build/docs/api

BUILD FAILED
/x1/gump/public/workspace/lucene-java/build.xml:126: The following error occurred while executing this line:
/x1/gump/public/workspace/lucene-java/build.xml:368: /x1/gump/public/workspace/lucene-java/contrib/gdata-server/src/java not found.

Total time: 9 seconds
-
To subscribe to this information via syndicated feeds:
- RSS: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/rss.xml
- Atom: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/atom.xml

== Gump Tracking Only ===
Produced by Gump version 2.2.
Gump Run 14001613122006, vmgump.apache.org:vmgump-public:14001613122006
Gump E-mail Identifier (unique within run) #1.
--
Apache Gump http://gump.apache.org/ [Instance: vmgump]
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Locale string compare: Java vs. C#
Hi folks, Over at Lucene.Net, I have run into a NUnit test which is failing with Lucene.Net (C#) but is passing with Lucene (Java). The two tests that fail are: TestInternationalMultiSearcherSort and TestInternationalSort. After several hours of investigation, I narrowed the problem down to what I believe is a difference in the way Java and .NET implement compare. The code in question is this method (found in FieldSortedHitQueue.java):

public final int compare (final ScoreDoc i, final ScoreDoc j) {
  return collator.compare (index[i.doc], index[j.doc]);
}

To demonstrate the compare problem (Java vs. .NET) I created this simple code both in Java and C#:

// Java code: you get back 1 for 'diff'
String s1 = "H\u00D8T";
String s2 = "HUT";
Collator collator = Collator.getInstance (Locale.US);
int diff = collator.compare(s1, s2);

// C# code: you get back -1 for 'res'
string s1 = "H\u00D8T";
string s2 = "HUT";
System.Globalization.CultureInfo locale = new System.Globalization.CultureInfo("en-US");
System.Globalization.CompareInfo collator = locale.CompareInfo;
int res = collator.Compare(s1, s2);

Java will give me back a 1 while .NET gives me back -1. So, what I am trying to figure out is who is doing the right thing? Or am I missing additional calls before I can compare? My goal is to understand why the difference exists so that, based on that understanding, I can judge how serious this issue is and either find a fix for it or just document it as a language difference between Java and .NET. Btw, this is based on Lucene 2.0 for both Java and C# Lucene. Regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Locale string compare: Java vs. C#
Surprising, but it looks to me like a bug in Java's collation rules for en-US. According to http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8 (which is Latin Capital Letter O With Stroke) should sort before U, implying -1 is the correct result. Java is returning 1 for all strengths of the collator. Maybe there is some other subtlety with this character... Chuck

George Aroush wrote on 12/13/2006 04:20 PM: Hi folks, Over at Lucene.Net, I have run into a NUnit test which is failing with Lucene.Net (C#) but is passing with Lucene (Java). The two tests that fail are: TestInternationalMultiSearcherSort and TestInternationalSort. After several hours of investigation, I narrowed the problem down to what I believe is a difference in the way Java and .NET implement compare. The code in question is this method (found in FieldSortedHitQueue.java):

public final int compare (final ScoreDoc i, final ScoreDoc j) {
  return collator.compare (index[i.doc], index[j.doc]);
}

To demonstrate the compare problem (Java vs. .NET) I created this simple code both in Java and C#:

// Java code: you get back 1 for 'diff'
String s1 = "H\u00D8T";
String s2 = "HUT";
Collator collator = Collator.getInstance (Locale.US);
int diff = collator.compare(s1, s2);

// C# code: you get back -1 for 'res'
string s1 = "H\u00D8T";
string s2 = "HUT";
System.Globalization.CultureInfo locale = new System.Globalization.CultureInfo("en-US");
System.Globalization.CompareInfo collator = locale.CompareInfo;
int res = collator.Compare(s1, s2);

Java will give me back a 1 while .NET gives me back -1. So, what I am trying to figure out is who is doing the right thing? Or am I missing additional calls before I can compare? My goal is to understand why the difference exists so that, based on that understanding, I can judge how serious this issue is and either find a fix for it or just document it as a language difference between Java and .NET. Btw, this is based on Lucene 2.0 for both Java and C# Lucene.
Regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
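Chuck's observation ("Java is returning 1 for all strengths") can be probed directly. The sketch below checks the sign of the comparison at each collation strength; the sign observed for \u00D8 vs. U depends on the JDK's collation tables, so it is printed rather than assumed. Only properties that must hold for any correct collator (reflexivity, antisymmetry) are asserted:

```java
import java.text.Collator;
import java.util.Locale;

// Probes how this JVM's en-US Collator orders O-with-stroke against U
// at each strength. The observed sign is JDK/table dependent.
public class CollatorProbe {
    public static void main(String[] args) {
        String s1 = "H\u00D8T";  // H, Latin Capital Letter O With Stroke, T
        String s2 = "HUT";
        Collator collator = Collator.getInstance(Locale.US);
        int[] strengths = { Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY };
        for (int strength : strengths) {
            collator.setStrength(strength);
            // Integer.signum normalizes to -1/0/1 regardless of implementation.
            System.out.println("strength " + strength + ": "
                + Integer.signum(collator.compare(s1, s2)));
        }
        // Invariants that hold for any correct collator:
        System.out.println(collator.compare(s1, s1));  // prints 0
        System.out.println(Integer.signum(collator.compare(s1, s2))
            == -Integer.signum(collator.compare(s2, s1)));  // prints true
    }
}
```

If the three strength lines all print the same sign, that matches Chuck's report that the ordering is a primary (base-letter) difference in Java's tables rather than an accent-level one.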