Re: Documentation Brainstorming

2007-05-31 Thread Doron Cohen
Paul Elschot <[EMAIL PROTECTED]> wrote on 30/05/2007 23:57:47:

> On Thursday 31 May 2007 05:52, Erik Hatcher wrote:
> >
> > On May 30, 2007, at 9:33 PM, Grant Ingersoll wrote:
> > >> I'd rather see each jar get its own javadoc,
> > >> or at the very least, indicate which jar each
> > >> class is defined in for the ones that aren't
> > >> part of the core.
> > >>
> > >
> > > Yeah, I don't like that all the contribs are built in together.
> > > What do others think?  I would vote for separating them out.
> >
> > I concur with having the contrib docs separated.  I may have been the
> > one (or at least assisted with it) who got the documentation build to
> > fold it all together as that was the goal at the time.  It'd be much
> > easier, build-wise, if all artifacts were kept entirely separate for
> > all the various contrib libraries and the core, as well as the demo.
>
>
> Currently it is not clear in the javadocs whether a class belongs
> to core or contrib. Having separate javadocs would probably
> improve that.
> I have no experience in linking between javadoc "packages",
> so I have no suggestion on how to make such a separation.

I am all for separation.
Though it is sometimes useful to have it all together; perhaps two
versions: all, and by module (core, contrib/x, contrib/y, etc.)?
Or is this too cluttered? We already have it by release...


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-887) Interruptible segment merges

2007-05-31 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-887:
-

Attachment: ExtendedIndexWriter.java

Here is the code I originally wrote to add a shutdown function to IndexWriter.

This patch contains a class called ExtendedIndexWriter that (as you might 
guess ;) ) extends IndexWriter and adds a shutdown() method. This method
may be called by any thread at any time, even while other threads are 
currently adding documents.

Three scenarios might happen:
1) shutdown() is called while there is no ongoing merge or addDocument:
   In this case the buffered documents are flushed to disk without 
   triggering cascading merges. (I will commit a protected method
   flushRamSegments(boolean triggerMerge) to IndexWriter to support this.)
  
2) shutdown() is called while there is an ongoing merge:
   In this case an IOException is thrown by the extended FSOutputStream,
   which makes the IndexWriter roll back the transaction. Thereafter
   flushRamSegments(false) is called to flush buffered docs if there are
   any.
   
3) shutdown() is called while other threads are in addDocument:
   This is the tricky one. We don't want to throw the IOException before
   the addDocument has finished analyzing and indexing the document,
   because otherwise this document would be lost. Since buildDocument()
   is not synchronized we cannot rely on IndexWriter's mutex to wait for
   those threads to finish addDocument. Therefore I add a variable that
   counts how many threads are in addDocument(). A different mutex is
   used to increment, decrement and check this variable. shutdown() waits
   until indexing of those docs is done and continues like in case 1) or 2).
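
For readers who do not want to open the attachment, scenario 3 can be
illustrated roughly as follows. This is only a minimal sketch under stated
assumptions and not the code in ExtendedIndexWriter.java: the class
InFlightGate and its methods enter(), exit() and awaitQuiescence() are
hypothetical names, and refusing new adds once shutdown has begun is an
assumption that the description above does not confirm.

```java
/** Hypothetical sketch; NOT the attached ExtendedIndexWriter. */
final class InFlightGate {
    // A dedicated lock, deliberately separate from the writer's own monitor.
    private final Object lock = new Object();
    private int inFlight = 0;            // threads currently inside addDocument
    private boolean shuttingDown = false;

    /** Call at the start of addDocument; returns false once shutdown has begun (assumption). */
    boolean enter() {
        synchronized (lock) {
            if (shuttingDown) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Call in a finally block at the end of addDocument. */
    void exit() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll();        // wake a waiting shutdown thread
            }
        }
    }

    /** Blocks until no thread is inside addDocument; returns false if interrupted. */
    boolean awaitQuiescence() {
        synchronized (lock) {
            shuttingDown = true;
            try {
                while (inFlight > 0) {
                    lock.wait();
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Current number of threads inside addDocument. */
    int inFlight() {
        synchronized (lock) {
            return inFlight;
        }
    }
}
```

A shutdown method built on such a gate would call awaitQuiescence() first
and only then abort the merge or flush the buffered documents, as in cases
1) and 2).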
   

I suggest whoever is interested should just look at the code. I'm sure
there will be a lot of questions. There's still a lot of work that has
to be done here, like writing testcases and examining how this works in
the new autoCommit=false mode (I wrote this code before that new feature
was committed). And we still have to decide whether this shutdown
functionality should go into the Lucene core.

> Interruptible segment merges
> 
>
> Key: LUCENE-887
> URL: https://issues.apache.org/jira/browse/LUCENE-887
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: ExtendedIndexWriter.java
>
>
> Adds the ability to IndexWriter to interrupt an ongoing merge. This might be 
> necessary when Lucene is e. g. running as a service and has to stop indexing 
> within a certain period of time due to a shutdown request.
> A solution would be to add a new method shutdown() to IndexWriter which 
> satisfies the following two requirements:
> - if a merge is happening, abort it
> - flush the buffered docs but do not trigger a merge 
> See also discussions about this feature on java-dev:
> http://www.gossamer-threads.com/lists/lucene/java-dev/49008

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Resolved: (LUCENE-866) Multi-level skipping on posting lists

2007-05-31 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-866.
--

Resolution: Fixed

Committed.

> Multi-level skipping on posting lists
> -
>
> Key: LUCENE-866
> URL: https://issues.apache.org/jira/browse/LUCENE-866
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: fileformats.patch, lucene-866.patch, lucene-866.patch
>
>
> To accelerate posting list skips (TermDocs.skipTo(int)) Lucene uses skip lists.
> The default skip interval is set to 16. If we want to skip e.g. 100 documents,
> then it is not necessary to read 100 entries from the posting list, but only
> 100/16 = 6 skip list entries plus 100%16 = 4 entries from the posting list. This
> speeds up conjunction (AND) and phrase queries significantly.
> However, the skip interval is always a compromise. If you have a very big index
> with huge posting lists and you want to skip over, let's say, 100k documents, then
> it is still necessary to read 100k/16 = 6250 entries from the skip list. For big
> indexes the skip interval could be set to a higher value, but then after a big
> skip a long scan to the target doc might be necessary.
> A solution for this compromise is to have multi-level skip lists that guarantee a
> logarithmic number of skips to any target in the posting list. This patch
> implements such an approach in the following way:
>   Example for skipInterval = 3:
>                                                       c            (skip level 2)
>                   c                 c                 c            (skip level 1)
>       x     x     x     x     x     x     x     x     x     x      (skip level 0)
>   d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d  (posting list)
>       3     6     9     12    15    18    21    24    27    30     (df)
>  
>   d - document
>   x - skip data
>   c - skip data with child pointer
>  
> Skip level i contains every skipInterval-th entry from skip level i-1. Therefore the
> number of entries on level i is: floor(df / (skipInterval ^ (i + 1))).
>  
> Each skip entry on a level i>0 contains a pointer to the corresponding skip entry in
> list i-1. This guarantees a logarithmic number of skips to find the target document.
> Implementation details:
>    * I factored the skipping code out of SegmentMerger and SegmentTermDocs to
>      simplify those classes. The two new classes AbstractSkipListReader and
>      AbstractSkipListWriter implement the skipping functionality.
>    * While AbstractSkipListReader and Writer take care of writing and reading the
>      multiple skip levels, they do not implement an actual skip data format. The two
>      new subclasses DefaultSkipListReader and Writer implement the skip data format
>      that is currently used in Lucene (with two file pointers for the freq and prox
>      file and with payload length information). I added this extra layer to be
>      prepared for flexible indexing and different posting list formats.
>   
>
> File format changes:
>    * I added the new parameter 'maxSkipLevels' to the term dictionary and increased the
>      version of this file. If maxSkipLevels is set to one, then the format of the freq
>      file does not change at all, because we only have one skip level as before. For
>      backwards compatibility maxSkipLevels is set to one automatically if an index
>      without the new parameter is read.
>    * In case maxSkipLevels > 1, then the frq file changes as follows:
>      FreqFile (.frq) --> ^TermCount
>      SkipData        --> <^(Min(maxSkipLevels, floor(log(DocFreq/log(skipInterval))) - 1)>, SkipLevel>
>      SkipLevel       --> ^DocFreq/(SkipInterval^(Level + 1))
>      Remark: The length of the SkipLevel is not stored for level 0, because 1) it is not
>      needed, and 2) the format of this file does not change for maxSkipLevels=1 then.
>
>
> All unit tests pass with this patch.
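
The arithmetic in the quoted description can be checked with a small
sketch. It only reproduces the two formulas stated there (single-level
cost and entries per level) and does not touch any Lucene class;
SkipLevelMath and its method names are hypothetical, and the maxSkipLevels
cap is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene or of the patch. */
final class SkipLevelMath {

    /**
     * Entries on level i = floor(df / skipInterval^(i + 1)),
     * listed from level 0 upwards for as long as the count is positive.
     */
    static List<Long> entriesPerLevel(long df, int skipInterval) {
        if (skipInterval < 2) {
            throw new IllegalArgumentException("skipInterval must be >= 2");
        }
        List<Long> counts = new ArrayList<>();
        long divisor = skipInterval;
        while (df / divisor > 0) {
            counts.add(df / divisor);   // integer division is the floor here
            divisor *= skipInterval;
        }
        return counts;
    }

    /** Single-level cost from the description: skip entries read plus postings scanned. */
    static long singleLevelReads(long docsToSkip, int skipInterval) {
        return docsToSkip / skipInterval + docsToSkip % skipInterval;
    }
}
```

For the example in the diagram (32 documents, skipInterval = 3) this gives
10, 3 and 1 entries on levels 0, 1 and 2, and for the single-level case
with interval 16 it gives 6 + 4 = 10 reads to skip 100 documents.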




[jira] Updated: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Christian Mallwitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Mallwitz updated LUCENE-763:
--

Attachment: (was: LuceneDictionary.java)

> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
> Attachments: TestLuceneDictionary.java
>
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.
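
The off-by-one described above can be reproduced without Lucene. The
following is only a sketch under stated assumptions: PositionedEnum is a
hypothetical stand-in for an enumeration that is already positioned on its
first term when handed out, and WordIterator is not the attached
LuceneDictionary fix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: an enumeration already positioned on its first term. */
final class PositionedEnum {
    private final String[] terms;
    private int pos = 0;

    PositionedEnum(String... terms) {
        this.terms = terms;
    }

    /** Current term, or null when the enumeration is exhausted. */
    String term() {
        return pos < terms.length ? terms[pos] : null;
    }

    /** Advances to the next term; returns false when there is none. */
    boolean next() {
        pos++;
        return pos < terms.length;
    }
}

/** Iterator that reads the current term before advancing, so the first word is kept. */
final class WordIterator implements Iterator<String> {
    private final PositionedEnum termEnum;
    private String pending;  // term the enum is positioned on, not yet returned

    WordIterator(PositionedEnum termEnum) {
        this.termEnum = termEnum;
        this.pending = termEnum.term();  // read BEFORE any call to next()
    }

    public boolean hasNext() {
        return pending != null;
    }

    public String next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String result = pending;
        pending = termEnum.next() ? termEnum.term() : null;
        return result;
    }

    /** The reported pattern: calling next() first drops the first term. */
    static List<String> collectSkippingFirst(PositionedEnum e) {
        List<String> out = new ArrayList<>();
        while (e.next()) {
            out.add(e.term());
        }
        return out;
    }
}
```

With the terms "eight", "five", "four", collectSkippingFirst never sees
"eight", while WordIterator returns all three.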




[jira] Updated: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Christian Mallwitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Mallwitz updated LUCENE-763:
--

Attachment: (was: TestLuceneDictionary.java)

> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.




[jira] Updated: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Christian Mallwitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Mallwitz updated LUCENE-763:
--

Attachment: TestLuceneDictionary.java

New extended unit test case for class LuceneDictionary

> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
> Attachments: TestLuceneDictionary.java
>
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.




[jira] Updated: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Christian Mallwitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Mallwitz updated LUCENE-763:
--

Attachment: LuceneDictionary.java

Fixed class LuceneDictionary

> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
> Attachments: LuceneDictionary.java, TestLuceneDictionary.java
>
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.




Re: [jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Christian Mallwitz
I knew the boolean flag which was in the class in the first place was
used for something ... :-)

Anyway, I have uploaded updated class and unit test files. 

Thanks
Christian







[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-05-31 Thread Jason van Zyl
To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 4 projects,
 and has been outstanding for 15 runs.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- jakarta-slide :  Content Management System based on WebDAV technology
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-31052007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -DEBUG- Extracted fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 26 secs
Command Line: /opt/jdk1.5/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/usr/local/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/x1/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=31052007 
-Djavacc.home=/usr/local/gump/packages/javacc-3.1 package 
[Working Directory: /usr/local/gump/public/workspace/lucene-java]
CLASSPATH: 
/opt/jdk1.5/lib/tools.jar:/usr/local/gump/public/workspace/lucene-java/build/classes/java:/usr/local/gump/public/workspace/lucene-java/build/classes/demo:/usr/local/gump/public/workspace/lucene-java/build/classes/test:/usr/local/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-swing.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-trax.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-junit.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant.jar:/usr/local/gump/packages/junit3.8.1/junit.jar:/usr/local/gump/public/workspace/xml-commons/java/build/resolver.jar:/usr/local/gump/packages/je-1.7.1/lib/je.jar:/usr/local/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/usr/local/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-31052007.jar:/usr/local/gump/packages/javacc-3.1/bin/lib/javacc.jar:/usr/local/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/usr/local/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/usr/local/gump/public/workspace/junit/dist/junit-31052007.jar:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar
-
[javac] location: class org.apache.lucene.store.db.DbDirectory
[javac] DatabaseEntry key = new DatabaseEntry(new byte[0]);
[javac] ^
[javac] 
/x1/gump/public/workspace/lucene-java/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java:171:
 cannot find symbol
[javac] symbol  : class DatabaseEntry
[javac] location: class org.apache.lucene.store.db.DbDirectory
[javac] DatabaseEntry data = new DatabaseEntry((byte[]) 
null);
[javac] ^
[javac] 
/x1/gump/public/workspace/lucene-java/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java:171:
 cannot find symbol
[javac] symbol  : class DatabaseEntry
[javac] location: class org.apache.lucene.store.db.DbDirectory
[javac] DatabaseEntry data = new DatabaseEntry((byte[]) 
null);
[javac]  ^
[javac] 
/x1/gump/public/workspace/lucene-java/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java:178:
 cannot find symbol
[javac] symbol  : variable DbConstants
[javac] location: class org.apache.lucene.store.db.DbDirectory
[javac]DbConstants.DB_SET_RANGE | flags) != 
DbConstants.DB_NOTFOUND)
[javac]^
[javac] 
/x1/gump/public/workspace/lucene-java/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java:178:
 cannot find symbol
[javac] symbol  : variable DbConstants
[javac] location: class org.apache.lucene.store.db.DbDirectory
[javac] 

Re: svn commit: r543076 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/index/ src/site/src/documentation/content/xdocs/ src/test/org/apache/lucene/index/

2007-05-31 Thread Erik Hatcher


On May 31, 2007, at 3:48 AM, [EMAIL PROTECTED] wrote:
+ 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
+    up most queries that use skipTo(), especially on big indexes with large posting
+    lists. For average AND queries the speedup is about 20%, for queries that
+    contain very frequence and very unique terms the speedup can be over 80%.
+    (Michael Busch)


Minor typo frequence => frequent.

Erik






[jira] Created: (LUCENE-897) Change how Core and Contrib javadocs are hosted

2007-05-31 Thread Grant Ingersoll (JIRA)
Change how Core and Contrib javadocs are hosted
---

 Key: LUCENE-897
 URL: https://issues.apache.org/jira/browse/LUCENE-897
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
Reporter: Grant Ingersoll
Priority: Minor


Change the site javadocs to:
1. separate contrib javadocs from core javadocs
2. Optionally, include a unified view as well.




Re: addIndexes()

2007-05-31 Thread Doug Cutting

Steven Parkes wrote:

> Is there any particular reason that the version that takes a Directory[]
> optimizes first?


There was, but unfortunately I can't recall it now.  Index merging has 
changed substantially since then, so, whatever it was, it may no longer 
apply.  If no one can think of a good reason to optimize any longer, 
then probably we should remove it, no?


Doug




Re: addIndexes()

2007-05-31 Thread Andi Vajda


On Thu, 31 May 2007, Doug Cutting wrote:


> Steven Parkes wrote:
>
> > Is there any particular reason that the version that takes a Directory[]
> > optimizes first?
>
> There was, but unfortunately I can't recall it now.  Index merging has 
> changed substantially since then, so, whatever it was, it may no longer 
> apply.  If no one can think of a good reason to optimize any longer, then 
> probably we should remove it, no?


No longer optimizing on this call would impact performance in what I'm doing.
My usage pattern involves indexing in a MemoryIndex and adding that index to 
an index backed by a DbDirectory. If the index is not optimized first, the 
operation becomes very noisy in the database.


In other words, if that change is made, please let us know so that I can adapt 
my code to explicitly optimize the MemoryIndex first.


Thanks !

Andi..





Re: Documentation Brainstorming

2007-05-31 Thread Doug Cutting

Grant Ingersoll wrote:

> > I'd rather see each jar get its own javadoc,
> > or at the very least, indicate which jar each
> > class is defined in for the ones that aren't
> > part of the core.
>
> Yeah, I don't like that all the contribs are built in together.  What do 
> others think?  I would vote for separating them out.


I like the single javadoc build.  The linking is nice, e.g., all 
Analyzer implementations are linked from Analyzer.  It also makes it 
easier for folks to see everything that's included in the release in one 
place.


Perhaps the names of the sections should be the name of the jar file, 
and/or the summary sentence in the package.html for contrib packages 
should name the jar file.  Would that suffice?


However if most folks really wish to split things, then some new 
navigational pages are required to provide a home for the various 
javadocs.  Ideally this would provide the level of integration that, 
e.g., Ant's optional tasks have with Ant's core tasks: when browsing 
core tasks there's always a link to optional tasks, and vice-versa, so 
the optional stuff is always just a single click away.  Putting contrib 
and core javadoc together achieves this.  Achieving it with separate 
javadocs will be harder.


Doug





[jira] Commented: (LUCENE-698) FilteredQuery ignores boost

2007-05-31 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500427
 ] 

Doug Cutting commented on LUCENE-698:
-

> If boost is zero, then
> sumOfSquaredWeights() returns zero as well, resulting in a
> queryNorm of Infinity (due to a div by zero if DefaultSimilarity is
> used). Then it multiplies boost and queryNorm and 0*Infinity=NaN.

To me the bug here seems to be that queryNorm is Infinity.  A boost of zero has a 
reasonable interpretation (don't influence scoring), but I don't see how a 
queryNorm of Infinity is ever useful.  So perhaps we can remove the NaN by 
modifying the default implementation of queryNorm to return 1.0 instead of 
Infinity when passed zero.  Would that cause any harm?
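
To make the arithmetic concrete, here is a small sketch of the failure and
of the proposed guard. It assumes, as the quoted "div by zero" remark
implies, that the default queryNorm is computed as
1 / sqrt(sumOfSquaredWeights); QueryNormSketch and its method names are
hypothetical and this is not the committed fix.

```java
/** Hypothetical sketch of the arithmetic discussed above; not Lucene code. */
final class QueryNormSketch {

    /** Assumed current behaviour: 1 / sqrt(0) evaluates to Infinity. */
    static float unguardedQueryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Proposed variant: return 1.0 instead of Infinity when passed zero. */
    static float guardedQueryNorm(float sumOfSquaredWeights) {
        if (sumOfSquaredWeights == 0.0f) {
            return 1.0f;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Weight as described in the issue: boost multiplied by queryNorm. */
    static float weight(float boost, float queryNorm) {
        return boost * queryNorm;
    }
}
```

With a boost of zero, the unguarded norm is Infinity and the weight
becomes 0 * Infinity = NaN; with the guard the weight is simply 0.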

> FilteredQuery ignores boost
> ---
>
> Key: LUCENE-698
> URL: https://issues.apache.org/jira/browse/LUCENE-698
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.0.0
>Reporter: Yonik Seeley
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: lucene-698.patch
>
>
> Filtered query ignores its own boost.




[jira] Reopened: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-05-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened LUCENE-885:
-


Officially reopening this bug as i have discovered that it causes the build to 
fail on java 1.4

the problem is that the contrib-crawl logic used by build-contrib and 
test-contrib is ignorant of the "skip 1.5 contribs" logic used in the javadocs 
(it is a javadoc specific property) and the individual 1.5 contribs (ie: 
gdata) assume that if you are trying to build them, you must have 1.5.

patch is already ready to make the property more global, and to make the 
targets in the gdata build.xml act as NOOPs (echoing a message) based on the 
value ... just doing some more testing now before committing.

> clean up build files so contrib tests are run more easily
> -
>
> Key: LUCENE-885
> URL: https://issues.apache.org/jira/browse/LUCENE-885
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 2.2
>
> Attachments: LUCENE-885.patch, LUCENE-885.patch
>
>
> Per mailing list discussion...
> http://www.nabble.com/Tests%2C-Contribs%2C-and-Releases-tf3768924.html#a10655448
> Tests for contribs should be run when "ant test" is used,  existing "test" 
> target renamed to "test-core"




[jira] Created: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Hoss Man (JIRA)
contrib/javascript is not packaged into releases


 Key: LUCENE-898
 URL: https://issues.apache.org/jira/browse/LUCENE-898
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Hoss Man
Priority: Trivial


the contrib/javascript directory is (apparently) a collection of javascript 
utilities for lucene .. but it has no build files or any mechanism to package 
it, so it is excluded from releases.






[jira] Commented: (LUCENE-698) FilteredQuery ignores boost

2007-05-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500437
 ] 

Yonik Seeley commented on LUCENE-698:
-

> the default implementation of queryNorm to return 1.0 instead of Infinity 
> when passed zero.

That seems like it should be fine, esp since Similarity.queryNorm is only 
called at the top level when creating a weight.

> FilteredQuery ignores boost
> ---
>
> Key: LUCENE-698
> URL: https://issues.apache.org/jira/browse/LUCENE-698
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.0.0
>Reporter: Yonik Seeley
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: lucene-698.patch
>
>
> Filtered query ignores its own boost.




[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500453
 ] 

Erik Hatcher commented on LUCENE-898:
-

My vote is to remove the javascript contrib area entirely.  It doesn't really 
do all that much useful.  I'd be surprised if anyone really uses it.

> contrib/javascript is not packaged into releases
> 
>
> Key: LUCENE-898
> URL: https://issues.apache.org/jira/browse/LUCENE-898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Reporter: Hoss Man
>Priority: Trivial
>
> the contrib/javascript directory is (apparently) a collection of javascript 
> utilities for lucene .. but it has no build files or any mechanism to 
> package it, so it is excluded from releases.




[jira] Closed: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Daniel Naber (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Naber closed LUCENE-763.
---

   Resolution: Fixed
Fix Version/s: 2.2

Thanks, patch applied.


> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
> Fix For: 2.2
>
> Attachments: LuceneDictionary.java, TestLuceneDictionary.java
>
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.
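
A minimal sketch of the skip described above. PositionedEnum is a hypothetical
stand-in, not Lucene's TermEnum; it only assumes an enumeration that is already
positioned on its first term when it is handed out. Calling next() before
reading the current term then loses that first term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an enumeration that starts positioned ON its first term. */
class PositionedEnum {
    private final List<String> terms;
    private int pos = 0; // already on the first term

    PositionedEnum(List<String> terms) {
        this.terms = terms;
    }

    /** Current term, or null when the enumeration is exhausted. */
    String term() {
        return pos < terms.size() ? terms.get(pos) : null;
    }

    /** Advances to the next term; returns false when there is none. */
    boolean next() {
        pos++;
        return pos < terms.size();
    }
}

class SkipFirstTermDemo {
    /** Buggy loop: advances before reading, so the first term is never seen. */
    static List<String> readBuggy(PositionedEnum e) {
        List<String> out = new ArrayList<String>();
        while (e.next()) {
            out.add(e.term());
        }
        return out;
    }

    /** Fixed loop: reads the current term first, then advances. */
    static List<String> readFixed(PositionedEnum e) {
        List<String> out = new ArrayList<String>();
        if (e.term() != null) {
            do {
                out.add(e.term());
            } while (e.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("eight", "five", "four");
        System.out.println("buggy: " + readBuggy(new PositionedEnum(words)));
        System.out.println("fixed: " + readFixed(new PositionedEnum(words)));
    }
}
```

With "eight" as the first word, the buggy loop never returns it, which matches
the failing suggestSimilar test in the description.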




[jira] Commented: (LUCENE-887) Interruptible segment merges

2007-05-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500457
 ] 

Michael McCandless commented on LUCENE-887:
---


This looks great to me!

I think we should keep it out of core (ie, as subclasses as you've done
here) for now?

So, if a shutdown request comes in then currently running addDocument
calls are allowed to complete but if a new addDocument call tries to
run it will hit an "IndexWriter already closed" IOException.  Once the
in-flight addDocument calls finish you then flush the ram segments
without allowing cascading merge.

This actually means you can potentially have too many "level 0" (just
flushed) segments in the index but that should not be a big deal since
the next merge would clean it up.  And it should be rare.
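
To make the behaviour summarized above concrete, here is a rough sketch. It is
not the attached ExtendedIndexWriter; ShutdownSketchWriter and all of its
members are hypothetical names, documents are plain strings, and a "segment" is
just a list. It only shows the ordering: reject new adds, wait for in-flight
adds, then flush the buffer as one new segment without merging.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; this is not Lucene's IndexWriter. */
class ShutdownSketchWriter {
    private final List<String> buffered = new ArrayList<String>();
    private final List<List<String>> segments = new ArrayList<List<String>>();
    private int inFlight = 0;       // addDocument calls currently running
    private boolean closed = false; // set once shutdown has been requested

    void addDocument(String doc) throws IOException {
        synchronized (this) {
            if (closed) {
                throw new IOException("IndexWriter already closed");
            }
            inFlight++;
        }
        try {
            synchronized (this) {
                buffered.add(doc);
            }
        } finally {
            synchronized (this) {
                inFlight--;
                notifyAll(); // wake a shutdown() waiting for in-flight adds
            }
        }
    }

    synchronized void shutdown() throws InterruptedException {
        closed = true;             // new addDocument calls now fail
        while (inFlight > 0) {
            wait();                // let running adds complete
        }
        if (!buffered.isEmpty()) { // flush as one new segment, no merge
            segments.add(new ArrayList<String>(buffered));
            buffered.clear();
        }
    }

    synchronized int segmentCount() {
        return segments.size();
    }

    /** Adds two docs, shuts down, then tries a third add that must be rejected. */
    static String demo() {
        ShutdownSketchWriter w = new ShutdownSketchWriter();
        boolean rejected = false;
        try {
            w.addDocument("doc1");
            w.addDocument("doc2");
            w.shutdown();
            w.addDocument("doc3");
        } catch (IOException e) {
            rejected = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "segments=" + w.segmentCount() + " rejected=" + rejected;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch leaves out the interrupt of a running merge entirely; it covers only
the add/flush ordering discussed in this comment.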

In shutdown(), after you call waitForAddDocument(), why not call
clearInterrupt before calling flushRamSegments?  Isn't the
flushRamSegments() call guaranteed to hit the
IndexWriterInterruptException if it's using an ExtendedFSDirectory and
there are > 0 buffered docs?

Also I think it's possible that the addDocument() call from another
thread will hit the IndexWriterInterruptException, right?  So those
other threads should catch this and ignore it (since their doc was in
fact successfully added and only the follow-on merge was interrupted)?


> Interruptible segment merges
> 
>
> Key: LUCENE-887
> URL: https://issues.apache.org/jira/browse/LUCENE-887
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: ExtendedIndexWriter.java
>
>
> Adds the ability to IndexWriter to interrupt an ongoing merge. This might be 
> necessary when Lucene is e. g. running as a service and has to stop indexing 
> within a certain period of time due to a shutdown request.
> A solution would be to add a new method shutdown() to IndexWriter which 
> satisfies the following two requirements:
> - if a merge is happening, abort it
> - flush the buffered docs but do not trigger a merge 
> See also discussions about this feature on java-dev:
> http://www.gossamer-threads.com/lists/lucene/java-dev/49008




[jira] Created: (LUCENE-899) several gdata build targets don't work from contrib/gdata

2007-05-31 Thread Hoss Man (JIRA)
several gdata build targets don't work from contrib/gdata
-

 Key: LUCENE-899
 URL: https://issues.apache.org/jira/browse/LUCENE-899
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Hoss Man


the contrib/gdata/build.xml file is a little ... odd, and many of the targets 
don't work at all when called from that directory (only when using build-contrib 
from the top level)

this problem predates LUCENE-885 ...

[EMAIL PROTECTED]:~/svn/lucene-bugs/contrib/gdata-server$ svnversion
542768
[EMAIL PROTECTED]:~/svn/lucene-bugs/contrib/gdata-server$ ant test
Buildfile: build.xml

test:
 [echo] Building gdata-core...

javacc-uptodate-check:

javacc-notice:

common.init:

build-lucene:

init:

compile-core:
 [echo] Use gdata - compile-core task
[javac] Compiling 5 source files to 
/home/chrish/svn/lucene-bugs/build/contrib/gdata-server/core/classes/java
Warning: Reference build.path has not been set at runtime, but was found during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.
Warning: Reference common.build.path has not been set at runtime, but was found 
during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.

BUILD FAILED
/home/chrish/svn/lucene-bugs/contrib/gdata-server/build.xml:87: The following 
error occurred while executing this line:
/home/chrish/svn/lucene-bugs/contrib/gdata-server/src/core/build.xml:49: The 
following error occurred while executing this line:
/home/chrish/svn/lucene-bugs/common-build.xml:298: 
/home/chrish/svn/lucene-bugs/contrib/gdata-server/src/core/ext-libs not found.

Total time: 1 second





Re: Documentation Brainstorming

2007-05-31 Thread Grant Ingersoll
I like the suggestion of having two views: a unified view and then  
also a separate view.  Slightly more work to setup, but should  
satisfy both camps.


On May 31, 2007, at 1:16 PM, Doug Cutting wrote:



> I like the single javadoc build.  The linking is nice, e.g., all
> Analyzer implementations are linked from Analyzer.  It also makes
> it easier for folks to see everything that's included in the
> release in one place.


True



> Perhaps the names of the sections should be the name of the jar
> file, and/or the summary sentence in the package.html for contrib
> packages should name the jar file.  Would that suffice?




I find the lower left frame to be the main pain for me, since it  
isn't clear there what is in core and what is in contrib.


> However if most folks really wish to split things, then some new
> navigational pages are required to provide a home for the various
> javadocs.  Ideally this would provide the level of integration
> that, e.g., Ant's optional tasks have with Ant's core tasks: when
> browsing core tasks there's always a link to optional tasks, and
> vice-versa, so the optional stuff is always just a single click
> away.  Putting contrib and core javadoc together achieves this.
> Achieving it with separate javadocs will be harder.




Makes sense.




[jira] Commented: (LUCENE-897) Change how Core and Contrib javadocs are hosted

2007-05-31 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500477
 ] 

Grant Ingersoll commented on LUCENE-897:


See http://www.gossamer-threads.com/lists/lucene/java-dev/49348 for reference

> Change how Core and Contrib javadocs are hosted
> ---
>
> Key: LUCENE-897
> URL: https://issues.apache.org/jira/browse/LUCENE-897
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Website
>Reporter: Grant Ingersoll
>Priority: Minor
>
> Change the site javadocs to:
> 1. separate contrib javadocs from core javadocs
> 2. Optionally, include a unified view as well.




[jira] Resolved: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-05-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved LUCENE-885.
-

Resolution: Fixed

Committed revision 543257.

Compilation and test of the entire tree should work fine now under 1.4 ... note 
that gdata doesn't actually run its tests (even under 1.5) because of 
LUCENE-899 ... but this problem predates any work done for this issue, so i'm 
not going to look into it at this time as it relates to bugs in a specific 
contrib, and not in changes made to facilitate the building/testing of contribs.

> clean up build files so contrib tests are run more easily
> -
>
> Key: LUCENE-885
> URL: https://issues.apache.org/jira/browse/LUCENE-885
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 2.2
>
> Attachments: LUCENE-885.patch, LUCENE-885.patch
>
>
> Per mailing list discussion...
> http://www.nabble.com/Tests%2C-Contribs%2C-and-Releases-tf3768924.html#a10655448
> Tests for contribs should be run when "ant test" is used,  existing "test" 
> target renamed to "test-core"




RE: addIndexes()

2007-05-31 Thread Steven Parkes
Hmmm ... something's not meshing for me here.

If I understood what you've said, you have a DbD index to which you are
addIndexes'ing a memory index? I must have missed something, because
addIndexes pre- and post-optimizes the target (Dbd) index, not the
operand (mem) index.

-Original Message-
From: Andi Vajda [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 31, 2007 10:10 AM
To: java-dev@lucene.apache.org
Subject: Re: addIndexes()


On Thu, 31 May 2007, Doug Cutting wrote:

> Steven Parkes wrote:
>> Is there any particular reason that the version that takes a Directory[]
>> optimizes first?
>
> There was, but unfortunately I can't recall it now.  Index merging has
> changed substantially since then, so, whatever it was, it may no longer
> apply.  If no one can think of a good reason to optimize any longer, then
> probably we should remove it, no?

No longer optimizing on this call would impact performance in what I'm doing.
My usage pattern involves indexing in a MemoryIndex and adding that index to
an index backed by a DbDirectory. If the index is not optimized first, the
operation becomes very noisy in the database.

In other words, if that change is made, please let us know so that I can adapt
my code to explicitly optimize the MemoryIndex first.

Thanks !

Andi..





enabling java assertions in the tests

2007-05-31 Thread Doron Cohen

While testing LUCENE-866 I realized that Java assertions
are disabled when *I* run 'ant test'.

Others did have the assertion executed and causing that
NPE. So I am not sure if this is general problem or only
a Windows one.

Compile wise we are ok, having "-source 1.4".
At runtime, assertions can be enabled by running "java -ea".
Using ant, setting "ANT_ARGS=-ea" is supposed to have the
same effect, but it doesn't, at least not for me.
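
One quick way to see whether assertions are really on in a given run is the
check below. It is only a sketch; the class name AssertionStatus is made up for
illustration. It relies on the fact that the expression inside an assert is
evaluated only when assertions are enabled for that class.

```java
class AssertionStatus {
    /** Returns true only when the JVM runs this class with assertions enabled. */
    static boolean enabled() {
        boolean on = false;
        // Intentional side effect: the assignment runs only if assertions are enabled.
        assert on = true;
        return on;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + enabled());
    }
}
```

Running it once with "java -ea AssertionStatus" and once without shows the
difference, and calling enabled() from a test shows what the test JVM got.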

Adding:
 
 
 
to the junit task would enable assertions during tests
regardless of ANT_OPTS variable (and hopefully on all OSs).

Anyone sees a problem with adding this?

Btw, I think we can/should use Java asserts more (there are
currently only 4 active asserts under trunk/java).

Doron





Re: enabling java assertions in the tests

2007-05-31 Thread Michael McCandless
"Doron Cohen" <[EMAIL PROTECTED]> wrote:
> 
> While testing LUCENE-866 I realized that Java assertions
> are disabled when *I* run 'ant test'.

I noticed this too; in my patch on LUCENE-843 I've turned
on assertions for all unit tests (I'm using a lot of asserts
in that patch) as well.

> Others did have the assertion executed and causing that
> NPE. So I am not sure if this is general problem or only
> a Windows one.
> 
> Compile wise we are ok, having "-source 1.4".
> At runtime, assertions can be enabled by running "java -ea".
> Using ant, setting "ANT_ARGS=-ea" is supposed to have the
> same effect, but it doesn't, at least not for me.
> 
> Adding:
>  
>  
>  
> to the  task would enable assertions during tests
> regardless of ANT_OPTS variable (and hopefully on all OSs).

I had added  under the  task and
it also seems to work, but I like your solution better (it's
clearer).

> Anyone sees a problem with adding this?
> 
> Btw, I think we can/should use Java asserts more (there are
> currently only 4 active asserts under trunk/java).

I agree!  The asserts have been very helpful in my debugging
in LUCENE-843.

Mike




[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Steven Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500521
 ] 

Steven Parkes commented on LUCENE-763:
--

Can we also update the javadocs to reflect the different semantics between 
terms() and terms(term)? Here's some possible verbiage. (Also tweaks the "after 
the given term" which I think isn't correct?) 

{noformat} 
Index: src/java/org/apache/lucene/index/IndexReader.java
===
--- src/java/org/apache/lucene/index/IndexReader.java   (revision 543284)
+++ src/java/org/apache/lucene/index/IndexReader.java   (working copy)
@@ -539,16 +539,21 @@
 setNorm(doc, field, Similarity.encodeNorm(value));
   }
 
-  /** Returns an enumeration of all the terms in the index.
-   * The enumeration is ordered by Term.compareTo().  Each term
-   * is greater than all that precede it in the enumeration.
+  /** Returns an enumeration of all the terms in the index.  The
+   * enumeration is ordered by Term.compareTo().  Each term is greater
+   * than all that precede it in the enumeration.  Note that after
+   * calling {@link #terms()}, {@link TermEnum#next()} must be called
+   * on the resulting enumeration before calling other methods such as
+   * {@link TermEnum#term()}.
* @throws IOException if there is a low-level IO error
*/
   public abstract TermEnum terms() throws IOException;
 
-  /** Returns an enumeration of all terms after a given term.
-   * The enumeration is ordered by Term.compareTo().  Each term
-   * is greater than all that precede it in the enumeration.
+  /** Returns an enumeration of all terms starting at a given term. If
+   * the given term does not exist, the enumeration is positioned at the
+   * first term greater than the supplied term.  The enumeration is
+   * ordered by Term.compareTo().  Each term is greater than all that
+   * precede it in the enumeration.
* @throws IOException if there is a low-level IO error
*/
   public abstract TermEnum terms(Term t) throws IOException;
{noformat} 


> LuceneDictionary skips first word in enumeration
> 
>
> Key: LUCENE-763
> URL: https://issues.apache.org/jira/browse/LUCENE-763
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Affects Versions: 2.0.0
> Environment: Windows Sun JRE 1.4.2_10_b03
>Reporter: Dan Ertman
> Fix For: 2.2
>
> Attachments: LuceneDictionary.java, TestLuceneDictionary.java
>
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - 
> its first call is to TermEnum.next, which moves it past the first term (line 
> 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
> similar = spellChecker.suggestSimilar("eihgt",2);
>   assertEquals(1, similar.length);
>   assertEquals(similar[0], "eight");
> Because "eight" is the first word in the index, it will fail.




Re: enabling java assertions in the tests

2007-05-31 Thread Paul Elschot
On Friday 01 June 2007 00:30, Doron Cohen wrote:
> 
> While testing LUCENE-866 I realized that Java assertions
> are disabled when *I* run 'ant test'.
> Others did have the assertion executed and causing that
> NPE. So I am not sure if this is general problem or only
> a Windows one.

Indeed, see below. I'm running Linux and java 1.6.0.
 
> Compile wise we are ok, having "-source 1.4".
> At runtime, assertions can be enabled by running "java -ea".
> Using ant, setting "ANT_ARGS=-ea" is supposed to have the
> same effect, but it doesn't, at least not for me.
> 
> Adding:
>  
>  
>  
>
> to the  task would enable assertions during tests
> regardless of ANT_OPTS variable (and hopefully on all OSs).
> 
> Anyone sees a problem with adding this?

My common-build.xml has this added in the junit task:

  

  

Regards,
Paul Elschot




Re: enabling java assertions in the tests

2007-05-31 Thread DM Smith
I think that having assertions is of no value if they are never  
turned on :)


I suggest going carefully in adding assertions. There are a lot of  
places where assertions are inappropriate (e.g. checking parameters  
on a public method).


I think Sun's document gives good guidelines:

http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html

-- DM Smith

On May 31, 2007, at 6:30 PM, Doron Cohen wrote:



> While testing LUCENE-866 I realized that Java assertions
> are disabled when *I* run 'ant test'.
>
> Others did have the assertion executed and causing that
> NPE. So I am not sure if this is general problem or only
> a Windows one.
>
> Compile wise we are ok, having "-source 1.4".
> At runtime, assertions can be enabled by running "java -ea".
> Using ant, setting "ANT_ARGS=-ea" is supposed to have the
> same effect, but it doesn't, at least not for me.
>
> Adding:
>
>
>
> to the  task would enable assertions during tests
> regardless of ANT_OPTS variable (and hopefully on all OSs).
>
> Anyone sees a problem with adding this?
>
> Btw, I think we can/should use Java asserts more (there are
> currently only 4 active asserts under trunk/java).
>
> Doron





Re: enabling java assertions in the tests

2007-05-31 Thread Doron Cohen
DM Smith wrote on 31/05/2007 15:59:05:

> I think that having assertions is of no value if they are never
> turned on :)
>
> I suggest going carefully in adding assertions. There are a lot of
> places where assertions are inappropriate (e.g. checking parameters
> on a public method).
>
> I think Sun's document gives good guidelines:
>
> http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html

Perhaps the most important guideline regarding assertions
is that they should *never* have side effects, otherwise
correctness is broken when assertions are disabled.
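
A small sketch of that guideline, using made-up names (AssertSideEffects and
its two methods are not from Lucene). The first method does its real work
inside the assert, so the work silently disappears when assertions are off; the
second does the work unconditionally and asserts only on the result.

```java
import java.util.ArrayList;
import java.util.List;

class AssertSideEffects {
    /** Wrong: the removal happens only when assertions are enabled. */
    static void removeInsideAssert(List<String> list, String item) {
        assert list.remove(item) : "item was not present";
    }

    /** Right: always remove, and assert only on the outcome. */
    static void removeThenAssert(List<String> list, String item) {
        boolean removed = list.remove(item);
        assert removed : "item was not present";
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<String>();
        a.add("x");
        removeInsideAssert(a, "x");
        System.out.println("inside assert, size afterwards: " + a.size());

        List<String> b = new ArrayList<String>();
        b.add("x");
        removeThenAssert(b, "x");
        System.out.println("remove then assert, size afterwards: " + b.size());
    }
}
```

The first size printed depends on whether the JVM was started with -ea, which
is exactly the kind of behaviour change the guideline is meant to rule out.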

Doron





[jira] Created: (LUCENE-900) Enable Java asserts in the Junit tests

2007-05-31 Thread Doron Cohen (JIRA)
Enable Java asserts in the Junit tests
--

 Key: LUCENE-900
 URL: https://issues.apache.org/jira/browse/LUCENE-900
 Project: Lucene - Java
  Issue Type: Test
  Components: Build
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


For background see 
http://www.mail-archive.com/java-dev@lucene.apache.org/msg10307.html




Re: enabling java assertions in the tests

2007-05-31 Thread Doron Cohen
Paul Elschot <[EMAIL PROTECTED]> wrote on 31/05/2007 16:21:09:

> > Adding:
> >  
> >  
> >  
> >
> > to the  task would enable assertions during tests
> > regardless of ANT_OPTS variable (and hopefully on all OSs).

> My common-build.xml has this added in the junit task:
>
>   
> 
>   
>

This enables the asserts for all lucene-java packages, but not
for any external jars being used. I think this is cleaner
because the no-args form would enable asserts also in external
jars and might add noise to our tests.

I'll open an issue and patch it like this!





RE: addIndexes()

2007-05-31 Thread Andi Vajda


On Thu, 31 May 2007, Steven Parkes wrote:


> Hmmm ... something's not meshing for me here.
>
> If I understood what you've said, you have a DbD index to which you are
> addIndexes'ing a memory index? I must have missed something, because
> addIndexes pre- and post-optimizes the target (Dbd) index, not the
> operand (mem) index.


I stand corrected. I'm using an IndexWriter opened on a RAMDirectory to do the 
indexing for a given transaction. Then I call addIndexes([writer]) on the 
IndexWriter backed by the DbDirectory to persist this. This approach has 
turned out to be considerably faster and less noisy in the database (the 
amount of random access changes) than indexing into the DbDirectory backed 
index directly and then optimizing it.


The docs for addIndexes() say "After this completes, the index is optimized." 
I mistakenly thought that there was discussion here about making this no 
longer be the case.


Andi..




[jira] Commented: (LUCENE-887) Interruptible segment merges

2007-05-31 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500550
 ] 

Michael Busch commented on LUCENE-887:
--

> This looks great to me!

Thanks for reviewing!

> So, if a shutdown request comes in then currently running addDocument
> calls are allowed to complete but if a new addDocument call tries to
> run it will hit an "IndexWriter already closed" IOException.  Once the
> in-flight addDocument calls finish you then flush the ram segments
> without allowing cascading merge.

Exactly.

> This actually means you can potentially have too many "level 0" (just
> flushed) segments in the index but that should not be a big deal since
> the next merge would clean it up.  And it should be rare.

Yes, unless another shutdown request comes while the first merge after
restarting the system is happening (which should be very unlikely), this
will be cleaned up. Also, once the system is up again the IndexWriter 
will delete left over file fragments from an aborted merge.

> In shutdown(), after you call waitForAddDocument(), why not call
> clearInterrupt before calling flushRamSegments?  Isn't the
> flushRamSegments() call guaranteed to hit the
> IndexWriterInterruptException if it's using an ExtendedFSDirectory and
> there are > 0 buffered docs?

Hmm I think I did it this way in case we aren't using an 
ExtendedFSDirectory, because then the flush would just succeed without 
an IndexWriterInterruptException and we save an instanceof check here. 
But you are right, we can just call clearInterrupt, but only if 
(d instanceof ExtendedFSDirectory) == true. That's probably simpler. 
Thereafter it is safe to call close() because the buffer is empty, so 
the call of flushRamSegments in close() won't do anything.

> Also I think it's possible that the addDocument() call from another
> thread will hit the IndexWriterInterruptException, right?  So those
> other threads should catch this and ignore it (since their doc was in
> fact successfully added and only the follow-on merge was interrupted)?

Hmm I'm not sure if I understand this. I catch the 
IndexWriterInterruptException in addDocument() and in the catch block
flushAfterInterrupt() is called which clears the interrupt flag. So
IndexWriterInterruptException shouldn't be thrown again and addDocument()
should just return normally? Or am I missing something. Could you give 
an example?

> Interruptible segment merges
> 
>
> Key: LUCENE-887
> URL: https://issues.apache.org/jira/browse/LUCENE-887
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: ExtendedIndexWriter.java
>
>
> Adds the ability to IndexWriter to interrupt an ongoing merge. This might be 
> necessary when Lucene is e. g. running as a service and has to stop indexing 
> within a certain period of time due to a shutdown request.
> A solution would be to add a new method shutdown() to IndexWriter which 
> satisfies the following two requirements:
> - if a merge is happening, abort it
> - flush the buffered docs but do not trigger a merge 
> See also discussions about this feature on java-dev:
> http://www.gossamer-threads.com/lists/lucene/java-dev/49008




[jira] Commented: (LUCENE-698) FilteredQuery ignores boost

2007-05-31 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500552
 ] 

Michael Busch commented on LUCENE-698:
--

> So perhaps we can remove the NaN by modifying the default implementation of 
> queryNorm to return 1.0 instead of Infinity when passed zero. Would that 
> cause any harm?

Yes I believe this should work, too. This would prevent the NaN score when
DefaultSimilarity is used. It will be the responsibility of people
who implement their own Similarity then to take care of this in a similar way.

I'll open a new issue for fixing the DefaultSimilarity.

> FilteredQuery ignores boost
> ---
>
> Key: LUCENE-698
> URL: https://issues.apache.org/jira/browse/LUCENE-698
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.0.0
>Reporter: Yonik Seeley
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.2
>
> Attachments: lucene-698.patch
>
>
> Filtered query ignores its own boost.




[jira] Created: (LUCENE-901) DefaultSimilarity.queryNorm() should never return Infinity

2007-05-31 Thread Michael Busch (JIRA)
DefaultSimilarity.queryNorm() should never return Infinity
--

 Key: LUCENE-901
 URL: https://issues.apache.org/jira/browse/LUCENE-901
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Michael Busch
Priority: Trivial


Currently DefaultSimilarity.queryNorm() returns Infinity if 
sumOfSquaredWeights=0.
This can result in a score of NaN (e. g. in TermScorer) if boost=0.0f.

A simple fix would be to return 1.0f in case zero is passed in.
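
The arithmetic behind this can be checked in isolation. The sketch below is not
the Lucene source; QueryNormSketch is a made-up class, and it assumes the norm
is computed as 1/sqrt(sumOfSquaredWeights). It shows the Infinity, the NaN that
follows from multiplying it by a zero boost, and the proposed guard.

```java
class QueryNormSketch {
    /** Unguarded form: 1/sqrt(0) evaluates to Infinity. */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Guarded form as proposed in this issue: return 1.0f when zero is passed in. */
    static float guardedQueryNorm(float sumOfSquaredWeights) {
        if (sumOfSquaredWeights == 0.0f) {
            return 1.0f;
        }
        return queryNorm(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        float boost = 0.0f;
        System.out.println("unguarded norm:  " + queryNorm(0.0f));
        System.out.println("unguarded score: " + (boost * queryNorm(0.0f)));
        System.out.println("guarded norm:    " + guardedQueryNorm(0.0f));
        System.out.println("guarded score:   " + (boost * guardedQueryNorm(0.0f)));
    }
}
```

In IEEE 754 arithmetic zero times Infinity is NaN, so the unguarded score is
NaN while the guarded one is simply 0.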

See LUCENE-698 for discussions about this.




[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500569
 ] 

Michael Busch commented on LUCENE-898:
--

> My vote is to remove the javascript contrib area entirely. 

+1. It also seems that this package is unmaintained. No files have
been changed since February 2005, when it was moved from the 
sandbox to contrib.

> contrib/javascript is not packaged into releases
> 
>
> Key: LUCENE-898
> URL: https://issues.apache.org/jira/browse/LUCENE-898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Reporter: Hoss Man
>Priority: Trivial
>
> the contrib/javascript directory is (apparently) a collection of javascript 
> utilities for lucene .. but it has no build files or any mechanism to 
> package it, so it is excluded from releases.




[jira] Commented: (LUCENE-901) DefaultSimilarity.queryNorm() should never return Infinity

2007-05-31 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500576
 ] 

Hoss Man commented on LUCENE-901:
-

I'm not sure if I agree with this concept. Do we really want the curve of 
values from queryNorm to have a step drop down from really *huge* values when 
sumOfSquaredWeights is "near" zero to "1" when sumOfSquaredWeights becomes so 
close to zero it can only be represented as 0.0f ?

Float.MAX_VALUE seems like a better choice than 1, but I haven't really thought 
through whether or not that will still trigger NaN scores.
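
A rough sketch of both points, again outside Lucene. The class name and the
two-argument queryNorm() helper are hypothetical, and the 1/sqrt formula is an
assumption about the current implementation. It shows the step in the curve
when the zero case returns 1, and what plain float arithmetic gives when a
zero boost is multiplied by Float.MAX_VALUE.

```java
public class QueryNormStepSketch {

    // Hypothetical helper: assumed 1/sqrt formula, with a configurable
    // return value for the case where zero is passed in.
    static float queryNorm(float sumOfSquaredWeights, float valueAtZero) {
        if (sumOfSquaredWeights == 0.0f) {
            return valueAtZero;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Smallest positive float: the norm here is huge but still finite.
        float nearZero = queryNorm(Float.MIN_VALUE, 1.0f);
        // Exactly zero: the norm drops straight to 1.
        float atZero = queryNorm(0.0f, 1.0f);
        System.out.println(nearZero + " -> " + atZero);

        // Returning Float.MAX_VALUE instead avoids the drop, and a zero
        // boost times a finite value is 0.0, not NaN.
        float product = 0.0f * queryNorm(0.0f, Float.MAX_VALUE);
        System.out.println(product);  // prints "0.0"
    }
}
```

This only covers the single multiplication boost * queryNorm; whether later
multiplications in a scorer could overflow Float.MAX_VALUE back to Infinity is
not checked here.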

> DefaultSimilarity.queryNorm() should never return Infinity
> --
>
> Key: LUCENE-901
> URL: https://issues.apache.org/jira/browse/LUCENE-901
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Michael Busch
>Priority: Trivial
>
> Currently DefaultSimilarity.queryNorm() returns Infinity if 
> sumOfSquaredWeights=0.
> This can result in a score of NaN (e.g. in TermScorer) if boost=0.0f.
> A simple fix would be to return 1.0f in case zero is passed in.
> See LUCENE-698 for discussions about this.




[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500575
 ] 

Otis Gospodnetic commented on LUCENE-898:
-

I think the files have not changed in a while because they work.  I believe 
Kelvin Tan (the author) used/uses this stuff somewhere.  I'm typically for 
cleaning things up, but somehow I feel that this javascript stuff should be 
left alone (it ain't broken, is it?).

> contrib/javascript is not packaged into releases
> 
>
> Key: LUCENE-898
> URL: https://issues.apache.org/jira/browse/LUCENE-898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Reporter: Hoss Man
>Priority: Trivial
>
> the contrib/javascript directory is (apparently) a collection of javascript 
> utilities for lucene .. but it has no build files or any mechanism to 
> package it, so it is excluded from releases.
