[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector

2009-08-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747805#action_12747805
 ] 

Shai Erera commented on LUCENE-1849:


bq. I think somewhere in LUCENE-1483 is the answer to this question

I tracked it down to LUCENE-1575, which was the huge HitCollector refactoring 
issue: 
https://issues.apache.org/jira/browse/LUCENE-1575?focusedCommentId=12695784&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12695784
(perhaps we should index my memory cells? :) ).

 Add OutOfOrderCollector and InOrderCollector subclasses of Collector
 

 Key: LUCENE-1849
 URL: https://issues.apache.org/jira/browse/LUCENE-1849
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor
 Fix For: 2.9


 I find myself always having to implement these methods, and I always return a 
 constant (depending on whether the collector can handle out-of-order hits).
 It would be nice for these two convenience abstract classes to exist, 
 implementing acceptsDocsOutOfOrder() as final and returning the appropriate 
 value.
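
For illustration, a minimal sketch of what the two requested convenience classes 
might look like (hypothetical shapes, not committed Lucene API; subclasses would 
still implement collect, setNextReader and setScorer):

{code}
// Hypothetical sketch only - not part of Lucene; names follow the request above.
public abstract class OutOfOrderCollector extends Collector {
  /** Subclasses can tolerate docIDs arriving out of order (e.g. from BooleanScorer). */
  public final boolean acceptsDocsOutOfOrder() {
    return true;
  }
}

public abstract class InOrderCollector extends Collector {
  /** Subclasses require docIDs in increasing order within each segment. */
  public final boolean acceptsDocsOutOfOrder() {
    return false;
  }
}
{code}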

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-26 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747836#action_12747836
 ] 

Simon Willnauer commented on LUCENE-1845:
-

I added the short discussion I had on legal-discuss for the record.
One person confirmed that we could add the jar to SVN as long as we do not 
redistribute it. I'm not a license guy, but I guess we should first figure out 
what license this particular jar has. It is not a download from the Oracle page 
(you can only get the sources for this particular jar as a download, not the 
binary) but something from http://downloads.osafoundation.org/db/ without any 
license notice.

I would suggest trying to build the jar from source with the latest release on 
the Oracle page so we can be sure about the license. Once I have done this I 
will send another request to legal to confirm that we are not violating anything.

The discussion from legal-discuss
{noformat}
 Hey there,
 We (lucene) have a contrib project that provides a Index-Directory
 implementation based on BerkleyDB. This code downloads a jar file from
 http://downloads.osafoundation.org/... to build and test the code.
 This jar-file is not included in any distribution and we do not plan
 to do so. The problem is that the download site is down very
 frequently so we are looking for another way to obtain the jar. Here
 is the question do we violate the license if we add the jar-file to
 the svn repository but not distributing it at all? Another way would
 be to add the jar to a commiter page on people.apache.org and download
 it from there.
 The license is here:
 http://www.oracle.com/technology/software/products/berkeley-db/htdocs/oslicense.html

Complicated matter.
BDB seems viral in that anything that uses must be made available in
source form. So, ASF has no problem fulfilling that requirement, but
downstream users may. OTOH, you say that the BDB is only used to build
(do you really need it to build?) and test your implementation, BUT
you say that you have an implementation based on BDB, so I presume
that it requires it to run.

My interpretation is;
 * IFF your component is purely optional, having a dependency on BDB
is Ok, provided it is not shipped with the release and that the user
is provided with the information that the BDB needs to be downloaded
separately and advised to review their license.

For your second part; Can you stick the BDB jar(s) somewhere more
reliably available?
 * Yes, I think so. The license allows distribution in any form,
source or binary... So, I suggest that you upload it to a dependable
host, such as SF, ibiblio.org or similar. people.apache.org -- I
wouldn't recommend it. ASF SVN -- yes, that should be Ok, but there
is a strong recommendation of not putting JARs in there... Also there
is a risk that the encumbrance around BDB is forgotten and used beyond
what is acceptable if it is 'laying around'.


Cheer
{noformat}

 if the build fails to download JARs for contrib/db, just skip its tests
 ---

 Key: LUCENE-1845
 URL: https://issues.apache.org/jira/browse/LUCENE-1845
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1845.patch, LUCENE-1845.txt, LUCENE-1845.txt, 
 LUCENE-1845.txt, LUCENE-1845.txt


 Every so often our nightly build fails because contrib/db is unable to 
 download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
 think in such cases we should simply skip contrib/db's tests, if it's the 
 nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector

2009-08-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747851#action_12747851
 ] 

Michael McCandless commented on LUCENE-1849:


We do need to index your memory cells!

Except: that entry is showing the [sizable] perf gains of disabling scoring 
when sorting by field (I think?). We were instead looking for the comparison of 
BooleanScorer vs. BooleanScorer2.

 Add OutOfOrderCollector and InOrderCollector subclasses of Collector
 

 Key: LUCENE-1849
 URL: https://issues.apache.org/jira/browse/LUCENE-1849
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor
 Fix For: 2.9


 I find myself always having to implement these methods, and I always return a 
 constant (depending on whether the collector can handle out-of-order hits).
 It would be nice for these two convenience abstract classes to exist, 
 implementing acceptsDocsOutOfOrder() as final and returning the appropriate 
 value.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Hudson build is back to normal: Lucene-trunk #929

2009-08-26 Thread Michael McCandless
Yay!

Mike

On Wed, Aug 26, 2009 at 1:24 AM, Apache Hudson
Server <hud...@hudson.zones.apache.org> wrote:
 See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/929/changes



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Hudson build is back to normal: Lucene-trunk #929

2009-08-26 Thread Simon Willnauer
:)

there we go!

On Wed, Aug 26, 2009 at 11:24 AM, Michael
McCandless <luc...@mikemccandless.com> wrote:
 Yay!

 Mike

 On Wed, Aug 26, 2009 at 1:24 AM, Apache Hudson
 Server <hud...@hudson.zones.apache.org> wrote:
 See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/929/changes



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: javadoc update help

2009-08-26 Thread Uwe Schindler
Even the old RangeQuery does it. Only the new TermRangeQuery class uses
constant score (as does the now-deprecated ConstantScoreRangeQuery).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, August 26, 2009 11:33 AM
 To: java-dev@lucene.apache.org
 Subject: Re: javadoc update help
 
 Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring
 BooleanQuery by default, for now.  In 3.0 this will change to constant
 score auto mode.  At least, that's the plan now... however,
 QueryParser will produce queries in constant score auto mode, so we
 could consider changing the default for these queries in 2.9?  If we
 don't want to change that default, how about something like this?:
 
  /** Set the maximum number of clauses permitted per
    * BooleanQuery.  Default value is 1024.  Note that queries that
    * derive from MultiTermQuery, such as WildcardQuery,
    * PrefixQuery and FuzzyQuery, may rewrite themselves to a
    * BooleanQuery before searching, and may therefore also hit this
    * limit.  See {@link MultiTermQuery} for details.
   */
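
To make the limit described in that javadoc concrete, a small illustrative fragment
(a sketch only; the field, term, and the reader variable are invented, with reader
assumed to be an open IndexReader):

{code}
// Sketch only: "body"/"luc" and reader are made up for illustration.
BooleanQuery.setMaxClauseCount(1024);        // the default mentioned above
PrefixQuery prefix = new PrefixQuery(new Term("body", "luc"));
try {
  Query rewritten = prefix.rewrite(reader);  // may expand into a BooleanQuery
} catch (BooleanQuery.TooManyClauses e) {
  // thrown when the expansion exceeds the configured maximum
} catch (IOException e) {
  // rewrite reads from the index and can fail with an IOException
}
{code}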
 
 Mike
 
  On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
   Having writer's block here:
 
   /** Set the maximum number of clauses permitted per BooleanQuery.
    * Default value is 1024.
    * <p>TermQuery clauses are generated from for example prefix queries and
    * fuzzy queries. Each TermQuery needs some buffer space during search,
    * so this parameter indirectly controls the maximum buffer requirements for
    * query search.
    * <p>When this parameter becomes a bottleneck for a Query one can use a
    * Filter. For example instead of a {@link TermRangeQuery} one can use a
    * {@link TermRangeFilter}.
    * <p>Normally the buffers are allocated by the JVM. When using for example
    * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
    * the operating system.
    */
 
  Okay, so prefix and fuzzy queries now will use a constant-score mode
  (indirectly, a filter) when it makes sense.
  So this comment is misleading. And the parameter doesn't control the max
  buffer - it possibly provides a top cutoff. But now it doesn't even
  necessarily do that, because if the Query uses constant-score mode
  (multi-term queries auto-pick by default), this setting doesn't even
  influence anything.

  I started to rewrite below - but then it feels like I almost need to
  start from scratch. I don't think we should claim this setting controls
  the maximum buffer requirements for query search either - that's a bit
  strong ;) And the buffer talk overall (including at the bottom) is a bit
  confusing.
 
 
   /** Set the maximum number of clauses permitted per BooleanQuery.
    * Default value is 1024.
    * <p>For example, TermQuery clauses can be generated from prefix queries and
    * fuzzy queries. Each TermQuery needs some buffer space during search,
    * so this parameter indirectly controls the maximum buffer requirements for
    * query search.
    * <p>When this parameter becomes a bottleneck for a Query one can use a
    * Filter. For example instead of a {@link TermRangeQuery} one can use a
    * {@link TermRangeFilter}.
    * <p>Normally the buffers are allocated by the JVM. When using for example
    * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
    * the operating system.
    */
 
  I'm tempted to make it:
 
  /** Set the maximum number of clauses permitted per BooleanQuery.
    * Default value is 1024.
   */
 
  :)
 
  Anyone have any suggestions though?
 
  --
  - Mark
 
  http://www.lucidimagination.com
 
 
 
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector

2009-08-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747874#action_12747874
 ] 

Shai Erera commented on LUCENE-1849:


Yes, I know that - I remember that you once ran w/ BS vs. BS2 and thought the 
results were reported in that issue. But I've scanned it and can't find it. 
Perhaps it was in an email, but I seem to remember you reported a ~20-30% 
improvement in favor of BS. I'll try to dig it up from the bottom of my memory 
pit.

 Add OutOfOrderCollector and InOrderCollector subclasses of Collector
 

 Key: LUCENE-1849
 URL: https://issues.apache.org/jira/browse/LUCENE-1849
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor
 Fix For: 2.9


 I find myself always having to implement these methods, and I always return a 
 constant (depending on whether the collector can handle out-of-order hits).
 It would be nice for these two convenience abstract classes to exist, 
 implementing acceptsDocsOutOfOrder() as final and returning the appropriate 
 value.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: javadoc update help

2009-08-26 Thread Mark Miller
hmm...I guess this javadoc from MultiTermQuery confused me:

 * Note that {@link QueryParser} produces
 * MultiTermQueries using {@link
 * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default.


Uwe Schindler wrote:
 Even the old RangeQuery does it. Only the new class TermRangeQuery uses
 constant score (and the also deprecated ConstantScoreRangeQuery).

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

   
 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, August 26, 2009 11:33 AM
 To: java-dev@lucene.apache.org
 Subject: Re: javadoc update help

 Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring
 BooleanQuery by default, for now.  In 3.0 this will change to constant
 score auto mode.  At least, that's the plan now... however,
 QueryParser will produce queries in constant score auto mode, so we
 could consider changing the default for these queries in 2.9?  If we
 don't want to change that default, how about something like this?:

 /** Set the maximum number of clauses permitted per
   * BooleanQuery.  Default value is 1024.  Note that queries that
   * derive from MultiTermQuery, such as WildcardQuery,
   * PrefixQuery and FuzzyQuery, may rewrite themselves to a
   * BooleanQuery before searching, and may therefore also hit this
   * limit.  See {@link MultiTermQuery} for details.
  */

 Mike

 On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
 
 Having a writers block here:

  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>TermQuery clauses are generated from for example prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 Okay, so prefix and fuzzy queries now will use a constantscore mode
 (indirectly, a filter) when it makes sense.
 So this comment is misleading. And the parameter doesn't control the max
 buffer - it possibly provides a top cutoff. But now it doesn't even
 necessarily do that, because if the Query uses constantscore mode
 (multi-term queries auto pick by default), this setting doesn't even
 influence anything.

 I started to rewrite below - but then it feels like I almost need to
 start from scratch. I don't think we should claim this setting controls
 the maximum buffer requirements for query search either - thats a bit
 strong ;) And the buffer talk overall (including at the bottom) is a bit
 confusing.


  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>For example, TermQuery clauses can be generated from prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 I'm tempted to make it:

 /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
  */

 :)

 Anyone have any suggestions though?

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org


   
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org
 



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org

   


-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: javadoc update help

2009-08-26 Thread Michael McCandless
Right, it is confusing!

QueryParser has already cut over to auto constant score, by default.

But for direct instantiation of one of the MultiTermQueries, we still
default to scoring BooleanQuery, but have declared that in 3.0 this
will also switch to auto constant score.  I'm tempted to simply switch
the default today, for 2.9, instead.  Then your original proposed
javadoc is great.
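
As an aside for readers of the archive, a minimal sketch of opting a directly
instantiated multi-term query into that mode (the field and term are invented
for illustration):

{code}
// Sketch only - field/term are invented.
PrefixQuery q = new PrefixQuery(new Term("body", "luc"));
// Direct instantiation still rewrites to a scoring BooleanQuery by default in 2.9;
// this opts in to the mode QueryParser already uses by default:
q.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT);
{code}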

Mike

On Wed, Aug 26, 2009 at 6:06 AM, Mark Miller <markrmil...@gmail.com> wrote:
 hmm...I guess this javadoc from MultiTermQuery confused me:

  * Note that {@link QueryParser} produces
  * MultiTermQueries using {@link
  * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default.


 Uwe Schindler wrote:
 Even the old RangeQuery does it. Only the new class TermRangeQuery uses
 constant score (and the also deprecated ConstantScoreRangeQuery).

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, August 26, 2009 11:33 AM
 To: java-dev@lucene.apache.org
 Subject: Re: javadoc update help

 Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring
 BooleanQuery by default, for now.  In 3.0 this will change to constant
 score auto mode.  At least, that's the plan now... however,
 QueryParser will produce queries in constant score auto mode, so we
 could consider changing the default for these queries in 2.9?  If we
 don't want to change that default, how about something like this?:

 /** Set the maximum number of clauses permitted per
   * BooleanQuery.  Default value is 1024.  Note that queries that
   * derive from MultiTermQuery, such as WildcardQuery,
   * PrefixQuery and FuzzyQuery, may rewrite themselves to a
   * BooleanQuery before searching, and may therefore also hit this
   * limit.  See {@link MultiTermQuery} for details.
  */

 Mike

 On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller <markrmil...@gmail.com> wrote:

 Having a writers block here:

  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>TermQuery clauses are generated from for example prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 Okay, so prefix and fuzzy queries now will use a constantscore mode
 (indirectly, a filter) when it makes sense.
 So this comment is misleading. And the parameter doesn't control the max
 buffer - it possibly provides a top cutoff. But now it doesn't even
 necessarily do that, because if the Query uses constantscore mode
 (multi-term queries auto pick by default), this setting doesn't even
 influence anything.

 I started to rewrite below - but then it feels like I almost need to
 start from scratch. I don't think we should claim this setting controls
 the maximum buffer requirements for query search either - thats a bit
 strong ;) And the buffer talk overall (including at the bottom) is a bit
 confusing.


  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>For example, TermQuery clauses can be generated from prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 I'm tempted to make it:

 /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
  */

 :)

 Anyone have any suggestions though?

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For 

[jira] Commented: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files

2009-08-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747884#action_12747884
 ] 

Paul Elschot commented on LUCENE-1851:
--

After svn update I still have the output below, so I think the commit missed 
some files affected by the patch:

svn diff `find contrib/surround -name '*.jj'`

Index: 
contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj
===
--- 
contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj
  (revision 807956)
+++ 
contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj
  (working copy)
@@ -184,7 +184,7 @@
 }
 
 DEFAULT SKIP : {
-  _WHITESPACE
+   _WHITESPACE
 }
 
 /* Operator tokens (in increasing order of precedence): */


 'ant javacc' in root project should also properly create contrib/surround 
 Java files
 

 Key: LUCENE-1851
 URL: https://issues.apache.org/jira/browse/LUCENE-1851
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: javacc20090825.patch, LUCENE-1851.patch


 For consistency after LUCENE-1829 which did the same for contrib/queryparser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: svn commit: r807763 - /lucene/java/trunk/build.xml

2009-08-26 Thread Grant Ingersoll

Here's what is currently run on Hudson as shell:

set -x

export FORREST_HOME=/export/home/nigel/tools/forrest/latest

ANT_HOME=/export/home/hudson/tools/ant/latest

ARTIFACTS=$WORKSPACE/artifacts
MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
TRUNK=$WORKSPACE/trunk
mkdir -p $ARTIFACTS
mkdir -p $MAVEN_ARTIFACTS
cd $TRUNK

echo Workspace: $WORKSPACE

# run build
#$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
#  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
#cp dist/*.tar.gz $ARTIFACTS

#Package the Source
$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
   -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
   -Dsvn.exe=/opt/subversion-current/bin/svn \
   clean package-tgz-src
# release it
cp dist/*-src.tar.gz $ARTIFACTS

#Generate the Maven snapshot
#Update the Version # when doing a release
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=2.9-SNAPSHOT generate-maven-artifacts
#copy the artifacts to the side so the cron job can publish them
echo Copying Maven artifacts to $MAVEN_ARTIFACTS
cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
echo Done Copying Maven Artifacts

# run build
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
cp dist/*.tar.gz $ARTIFACTS

$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  javadocs

#Rerun nightly with clover on
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true clean nightly

#generate the clover reports
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports


On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:



: Grant does the cutover to hudson.zones still invoke the nightly.sh?  I
: thought it did?  (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run, there's a complicated set of shell commands
configured in hudson that gets run instead. (why it's not just exec'ing a
shellscript in svn isn't clear to me ... but it starts with set -x so
the build log should make it clear exactly what's running.

you can see from that log: the nightly ant target is still used.



-Hoss




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene website - benchmarks page

2009-08-26 Thread Michael McCandless
+1

Mike

 On Wed, Aug 26, 2009 at 8:20 AM, Grant Ingersoll <gsing...@apache.org> wrote:
 +1

 On Aug 25, 2009, at 10:11 PM, Mark Miller wrote:

 These are very old and not very useful anymore. Should we pull this page?
 It's kind of an embarrassment if we don't actually maintain it to be
 remotely current. These are all with Lucene 1.2, 1.3 ...

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: svn commit: r807763 - /lucene/java/trunk/build.xml

2009-08-26 Thread Uwe Schindler
So by editing this script it is possible to pass additional options with -D
to some of the Ant commands.

Thanks for the insight - that also helps me very much with the Clover update.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Wednesday, August 26, 2009 2:20 PM
 To: java-dev@lucene.apache.org
 Subject: Re: svn commit: r807763 - /lucene/java/trunk/build.xml
 
 Here's what is currently run on Hudson as shell:
 
 set -x
 
 export FORREST_HOME=/export/home/nigel/tools/forrest/latest
 
 ANT_HOME=/export/home/hudson/tools/ant/latest
 
 ARTIFACTS=$WORKSPACE/artifacts
 MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
 TRUNK=$WORKSPACE/trunk
 mkdir -p $ARTIFACTS
 mkdir -p $MAVEN_ARTIFACTS
 cd $TRUNK
 
 echo Workspace: $WORKSPACE
 
 # run build
 #$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
 #  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
 # release it
 #cp dist/*.tar.gz $ARTIFACTS
 
 #Package the Source
 $ANT_HOME/bin/ant -Dversion=$BUILD_ID \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 clean package-tgz-src
 # release it
 cp dist/*-src.tar.gz $ARTIFACTS
 
 #Generate the Maven snapshot
 #Update the Version # when doing a release
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=2.9-SNAPSHOT generate-maven-artifacts
 #copy the artifacts to the side so the cron job can publish them
 echo Copying Maven artifacts to $MAVEN_ARTIFACTS
 cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
 echo Done Copying Maven Artifacts
 
 # run build
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
 # release it
 cp dist/*.tar.gz $ARTIFACTS
 
 $ANT_HOME/bin/ant -Dversion=$BUILD_ID \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
javadocs
 
 #Rerun nightly with clover on
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-lib /export/home/hudson/tools/clover/latest/lib \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Drun.clover=true clean nightly
 
 #generate the clover reports
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-lib /export/home/hudson/tools/clover/latest/lib \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports
 
 
 On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:
 
 
 : Grant does the cutover to hudson.zones still invoke the nightly.sh?  I
 : thought it did?  (But then looking at the console output from the
 : build, I can't correlate it..).

 nightly.sh is not run, there's a complicated set of shell commands
 configured in hudson that gets run instead. (why it's not just exec'ing a
 shellscript in svn isn't clear to me ... but it starts with set -x so
 the build log should make it clear exactly what's running.

 you can see from that log: the nightly ant target is still used.
 
 
 
  -Hoss
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene 2.9 release

2009-08-26 Thread Mark Miller
I'm tempted to say let's start the freeze tomorrow instead - I could do
another full day of doc/packaging no problem I think (a bunch left to do
on the website stuff alone) - and technically the releaseToDo wants
everything to go through a patch in JIRA first while in freeze (not a
bad idea at all) - which slows things down. Also, I don't have much time to
do the RC if I'm on doc all day.

Anyone object to starting tomorrow rather than today?

-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: svn commit: r807763 - /lucene/java/trunk/build.xml

2009-08-26 Thread Michael McCandless
Thanks Grant.

Should we remove https://svn.apache.org/repos/asf/lucene/java/nightly
entirely?  Ie we are not using any of these files anymore?:

README.txt  nightly.properties  nightly.sh.bak  publish-maven.sh
nightly.cron  nightly.sh

Mike

On Wed, Aug 26, 2009 at 8:20 AM, Grant Ingersoll <gsing...@apache.org> wrote:
 Here's what is currently run on Hudson as shell:

 set -x

 export FORREST_HOME=/export/home/nigel/tools/forrest/latest

 ANT_HOME=/export/home/hudson/tools/ant/latest

 ARTIFACTS=$WORKSPACE/artifacts
 MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
 TRUNK=$WORKSPACE/trunk
 mkdir -p $ARTIFACTS
 mkdir -p $MAVEN_ARTIFACTS
 cd $TRUNK

 echo Workspace: $WORKSPACE

 # run build
 #$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
 #  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
 # release it
 #cp dist/*.tar.gz $ARTIFACTS

 #Package the Source
 $ANT_HOME/bin/ant -Dversion=$BUILD_ID \
   -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
   -Dsvn.exe=/opt/subversion-current/bin/svn \
   clean package-tgz-src
 # release it
 cp dist/*-src.tar.gz $ARTIFACTS

 #Generate the Maven snapshot
 #Update the Version # when doing a release
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=2.9-SNAPSHOT generate-maven-artifacts
 #copy the artifacts to the side so the cron job can publish them
 echo Copying Maven artifacts to $MAVEN_ARTIFACTS
 cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
 echo Done Copying Maven Artifacts

 # run build
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
 # release it
 cp dist/*.tar.gz $ARTIFACTS

 $ANT_HOME/bin/ant -Dversion=$BUILD_ID \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  javadocs

 #Rerun nightly with clover on
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true clean nightly

 #generate the clover reports
 $ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports


 On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:


 : Grant does the cutover to hudson.zones still invoke the nightly.sh?  I
 : thought it did?  (But then looking at the console output from the
 : build, I can't correlate it..).

 nightly.sh is not run, there's a complicated set of shell commands
 configured in hudson that gets run instead. (why it's not just exec'ing a
 shellscript in svn isn't clear to me ... but it starts with set -x so
 the build log should make it clear exactly what's running.

 you can see from that log: the nightly ant target is still used.



 -Hoss



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: svn commit: r807763 - /lucene/java/trunk/build.xml

2009-08-26 Thread Grant Ingersoll

I think pub-maven is still used, but let me check.


On Aug 26, 2009, at 8:47 AM, Michael McCandless wrote:


Thanks Grant.

Should we remove https://svn.apache.org/repos/asf/lucene/java/nightly
entirely?  Ie we are not using any of these files anymore?:

README.txt  nightly.properties  nightly.sh.bak  publish-maven.sh
nightly.cron  nightly.sh

Mike

On Wed, Aug 26, 2009 at 8:20 AM, Grant
Ingersoll <gsing...@apache.org> wrote:

Here's what is currently run on Hudson as shell:

set -x

export FORREST_HOME=/export/home/nigel/tools/forrest/latest

ANT_HOME=/export/home/hudson/tools/ant/latest

ARTIFACTS=$WORKSPACE/artifacts
MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
TRUNK=$WORKSPACE/trunk
mkdir -p $ARTIFACTS
mkdir -p $MAVEN_ARTIFACTS
cd $TRUNK

echo Workspace: $WORKSPACE

# run build
#$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
#  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
#cp dist/*.tar.gz $ARTIFACTS

#Package the Source
$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  clean package-tgz-src
# release it
cp dist/*-src.tar.gz $ARTIFACTS

#Generate the Maven snapshot
#Update the Version # when doing a release
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 -Dversion=2.9-SNAPSHOT generate-maven-artifacts
#copy the artifacts to the side so the cron job can publish them
echo Copying Maven artifacts to $MAVEN_ARTIFACTS
cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
echo Done Copying Maven Artifacts

# run build
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
cp dist/*.tar.gz $ARTIFACTS

$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 javadocs

#Rerun nightly with clover on
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
 -lib /export/home/hudson/tools/clover/latest/lib \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 -Dversion=$BUILD_ID -Drun.clover=true clean nightly

#generate the clover reports
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
 -lib /export/home/hudson/tools/clover/latest/lib \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 -Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports


On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:



: Grant does the cutover to hudson.zones still invoke the nightly.sh?  I
: thought it did?  (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run, there's a complicated set of shell commands
configured in hudson that gets run instead. (why it's not just exec'ing a
shellscript in svn isn't clear to me ... but it starts with set -x so
the build log should make it clear exactly what's running.

you can see from that log: the nightly ant target is still used.



-Hoss




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files

2009-08-26 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot reopened LUCENE-1851:
--


Reopening only to make sure my last comment is not missed before the impending 
2.9 release.

 'ant javacc' in root project should also properly create contrib/surround 
 Java files
 

 Key: LUCENE-1851
 URL: https://issues.apache.org/jira/browse/LUCENE-1851
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: javacc20090825.patch, LUCENE-1851.patch


 For consistency after LUCENE-1829 which did the same for contrib/queryparser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: svn commit: r807763 - /lucene/java/trunk/build.xml

2009-08-26 Thread Grant Ingersoll
publish-maven is run from my cron script on the Hudson zone, where it
copies the Maven artifacts off of the Lucene Zone and onto the Hudson
Zone.


FWIW, committers can get Hudson accounts.  See
http://wiki.apache.org/general/Hudson.  Committers can also get Lucene Zone
access too, if it is needed.


I will update the docs.

On Aug 26, 2009, at 8:54 AM, Grant Ingersoll wrote:


I think pub-maven is still used, but let me check.


On Aug 26, 2009, at 8:47 AM, Michael McCandless wrote:


Thanks Grant.

Should we remove https://svn.apache.org/repos/asf/lucene/java/nightly
entirely?  Ie we are not using any of these files anymore?:

README.txt  nightly.properties  nightly.sh.bak  publish-maven.sh
nightly.cron  nightly.sh

Mike

On Wed, Aug 26, 2009 at 8:20 AM, Grant
Ingersoll <gsing...@apache.org> wrote:

Here's what is currently run on Hudson as shell:

set -x

export FORREST_HOME=/export/home/nigel/tools/forrest/latest

ANT_HOME=/export/home/hudson/tools/ant/latest

ARTIFACTS=$WORKSPACE/artifacts
MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
TRUNK=$WORKSPACE/trunk
mkdir -p $ARTIFACTS
mkdir -p $MAVEN_ARTIFACTS
cd $TRUNK

echo Workspace: $WORKSPACE

# run build
#$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
#  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
#cp dist/*.tar.gz $ARTIFACTS

#Package the Source
$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
 -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
 -Dsvn.exe=/opt/subversion-current/bin/svn \
 clean package-tgz-src
# release it
cp dist/*-src.tar.gz $ARTIFACTS

#Generate the Maven snapshot
#Update the Version # when doing a release
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=2.9-SNAPSHOT generate-maven-artifacts
#copy the artifacts to the side so the cron job can publish them
echo Copying Maven artifacts to $MAVEN_ARTIFACTS
cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
echo Done Copying Maven Artifacts

# run build
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
cp dist/*.tar.gz $ARTIFACTS

$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
javadocs

#Rerun nightly with clover on
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-lib /export/home/hudson/tools/clover/latest/lib \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Drun.clover=true clean nightly

#generate the clover reports
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
-lib /export/home/hudson/tools/clover/latest/lib \
-Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
-Dsvn.exe=/opt/subversion-current/bin/svn \
-Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports


On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:



: Grant does the cutover to hudson.zones still invoke the nightly.sh?  I
: thought it did?  (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run, there's a complicated set of shell commands
configured in hudson that gets run instead. (why it's not just exec'ing a
shellscript in svn isn't clear to me ... but it starts with set -x so
the build log should make it clear exactly what's running.

you can see from that log: the nightly ant target is still used.



-Hoss




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747942#action_12747942
 ] 

Uwe Schindler commented on LUCENE-1859:
---

This also applies to Token, so if we fix it here we should also fix it in Token.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

 This was also an issue with Token previously.
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory.
 Obviously, it can be argued that Tokenizers should never emit large 
 tokens; however, it seems that TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set.
 I don't think I have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario).
 Perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE.
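
A rough sketch of the shrinking idea described above (MAX_BUFFER_SIZE, the field, 
and the method shape are illustrative only, not the actual TermAttributeImpl 
internals):

{code}
// Hypothetical sketch - not the real implementation.
private static final int MAX_BUFFER_SIZE = 16 * 1024;
private char[] termBuffer = new char[16];

private void resizeTermBuffer(int newSize) {
  if (termBuffer.length > MAX_BUFFER_SIZE && newSize <= MAX_BUFFER_SIZE) {
    // drop an oversized buffer once a normal-sized token arrives
    termBuffer = new char[MAX_BUFFER_SIZE];
  } else if (termBuffer.length < newSize) {
    // grow as today
    termBuffer = new char[newSize];
  }
}
{code}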

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1815) Geohash encode/decode floating point problems

2009-08-26 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1815:


Priority: Minor  (was: Major)

I don't think this should be major.

 Geohash encode/decode floating point problems
 -

 Key: LUCENE-1815
 URL: https://issues.apache.org/jira/browse/LUCENE-1815
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: Wouter Heijke
Priority: Minor

 I'm finding the Geohash support in the spatial package to be rather 
 unreliable.
 Here is the outcome of a test that encodes/decodes the same lat/lon and 
 geohash a few times.
 the format:
 action geohash=(latitude, longitude)
 the result:
 encode u173zq37x014=(52.3738007,4.8909347)
 decode u173zq37x014=(52.3737996,4.890934)
 encode u173zq37rpbw=(52.3737996,4.890934)
 decode u173zq37rpbw=(52.3737996,4.89093295)
 encode u173zq37qzzy=(52.3737996,4.89093295)
 if I now change to the google code implementation:
 encode u173zq37x014=(52.3738007,4.8909347)
 decode u173zq37x014=(52.37380061298609,4.890934377908707)
 encode u173zq37x014=(52.37380061298609,4.890934377908707)
 decode u173zq37x014=(52.37380061298609,4.890934377908707)
 encode u173zq37x014=(52.37380061298609,4.890934377908707)
 Note the differences between the geohashes in both situations and the 
 lat/lon's!
 Now things get worse if you work on low-precision geohashes:
 decode u173=(52.0,4.0)
 encode u14zg429yy84=(52.0,4.0)
 decode u14zg429yy84=(52.0,3.99)
 encode u14zg429ywx6=(52.0,3.99)
 and google:
 decode u173=(52.20703125,4.5703125)
 encode u173=(52.20703125,4.5703125)
 decode u173=(52.20703125,4.5703125)
 encode u173=(52.20703125,4.5703125)
 We are using geohashes extensively and will now use the google code version 
 unfortunately.
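
 For reference, the round trip above boils down to something like the following 
 (the GeoHashUtils encode/decode names are assumed from contrib/spatial and may 
 not match exactly; treat this as pseudocode):

{code}
// Sketch of the round-trip check described above; a stable implementation
// should converge rather than drift with each iteration.
double lat = 52.3738007, lon = 4.8909347;
for (int i = 0; i < 3; i++) {
  String hash = GeoHashUtils.encode(lat, lon);  // assumed API
  double[] latLon = GeoHashUtils.decode(hash);  // assumed API
  System.out.println("encode " + hash + "=(" + lat + "," + lon + ")");
  lat = latLon[0];
  lon = latLon[1];
}
{code}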

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files

2009-08-26 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747961#action_12747961
 ] 

Luis Alves commented on LUCENE-1851:


{code}
javacc:
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type javacc with no arguments for help)
   [javacc] Reading from file 
/home/lafa/kisor2/workspace_eclipse33/lucene_trunk2/contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj
 . . .
   [javacc] org.javacc.parser.ParseException: Encountered at line 
187, column 3.
   [javacc] Was expecting one of:
   [javacc] STRING_LITERAL ...
   [javacc]  ...
   [javacc] 
   [javacc] Detected 1 errors and 0 warnings.
{code}
I just re-synced and see the same problem; I think Michael forgot to commit the 
QueryParser.jj changes I made.


 'ant javacc' in root project should also properly create contrib/surround 
 Java files
 

 Key: LUCENE-1851
 URL: https://issues.apache.org/jira/browse/LUCENE-1851
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: javacc20090825.patch, LUCENE-1851.patch


 For consistency after LUCENE-1829 which did the same for contrib/queryparser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Back-Compat on Contribs

2009-08-26 Thread Simon Willnauer
I just talked to Robert about refactoring of smartcn for the
releases after 2.9. Robert raised the question of whether we should mark smartcn as
experimental so that we can change interfaces and public methods etc.
during the refactoring. Would that make sense for 2.9, or is there no
such thing as a back-compat policy for modules like that?

simon

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Back-Compat on Contribs

2009-08-26 Thread Mark Miller
Simon Willnauer wrote:
 I just talked to Robert about refactoring of smartcn for the next
 releases  2.9. Robert raised a question if we should mark smartcn as
 experimental so that we can change interfaces and public methods etc.
 during the refactoring. Would that make sense for 2.9 or is there no
 such thing as a back compat policy for modules like that.

 simon

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org

   
Contrib modules are not required to support back compat in any way
currently - but each can also have a more restrictive policy if
we want. I consider Highlighter to be 1.4 right now (even though that's
not explicit anywhere).

Warning users that you don't plan on promising back compat with
experimental warnings seems like a good idea to me.

-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747963#action_12747963
 ] 

Robert Muir commented on LUCENE-1817:
-

I am looking at this today. One thing about this code that should also be 
corrected ASAP is that if you have a custom dictionary directory in .DCT 
format, the load() method will actually call save().

This will create a corresponding .MEM file in the same directory after loading 
the dictionary in DCT format.

I really do not think load() methods should be creating or writing to files.


 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor

 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
   if (singleInstance == null) {
     singleInstance = new WordDictionary(); // load from jar file
     try {
       singleInstance.load();
     } catch (IOException e) {
       // loading from jar file must fail before it checks the
       // AnalyzerProfile (where this can be configured)
       String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
       singleInstance.load(wordDictRoot);
     } catch (ClassNotFoundException e) {
       throw new RuntimeException(e);
     }
   }
   return singleInstance;
 }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...
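
 For illustration, a rough sketch of what the "correct this" option might look 
 like (not an actual patch; it simply prefers the configured data directory when 
 one is set, falling back to the dictionary bundled in the jar):

{code}
// Sketch only - illustrates the first option above, not a committed fix.
public synchronized static WordDictionary getInstance() {
  if (singleInstance == null) {
    singleInstance = new WordDictionary();
    String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
    try {
      if (wordDictRoot != null && wordDictRoot.length() > 0) {
        singleInstance.load(wordDictRoot);  // custom dictionary wins if configured
      } else {
        singleInstance.load();              // otherwise use the one in the jar
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
  return singleInstance;
}
{code}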

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1817:


Attachment: LUCENE-1817-mark-cn-experimental.patch

We should mark the smartcn module as experimental, as we plan to do heavy 
refactoring after 2.9 is out. This patch adds a notice to package.html and 
the JavaDoc.
Quoting Mark Miller from the list:
bq. Warning users that you don't plan on promising back compat with 
experimental warnings seems like a good idea to me.


 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1817-mark-cn-experimental.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Back-Compat on Contribs

2009-08-26 Thread Simon Willnauer
On Wed, Aug 26, 2009 at 4:49 PM, Mark Miller <markrmil...@gmail.com> wrote:
 Simon Willnauer wrote:
 I just talked to Robert about refactoring of smartcn for the next
 releases  2.9. Robert raised a question if we should mark smartcn as
 experimental so that we can change interfaces and public methods etc.
 during the refactoring. Would that make sense for 2.9 or is there no
 such thing as a back compat policy for modules like that.

 simon

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org


 Contrib modules are not required to support back compat in any way
 currently - but they can also each have any more restrictive policy that
 we want. I consider Highlighter to be 1.4 right now (even though thats
 not explicit anywhere).

 Warning users that you don't plan on promising back compat with
 experimental warnings seems like a good idea to me.
I think so too - done!

simon

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-08-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747982#action_12747982
 ] 

Grant Ingersoll commented on LUCENE-1486:
-

I'm not sure why the ComplexPhraseQuery itself is buried in the Parser.  Can't 
the query stand on its own?  Seems like it could be a useful class outside of 
the specific context of a QueryParser, no?

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.0, 3.1

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the Junit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
   
   checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
 Code plus Junit test to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747967#action_12747967
 ] 

Robert Muir commented on LUCENE-1817:
-

Uwe, i agree. currently it does do the autodetect (first checks for .MEM, then 
falls back on DCT).
but if it has to fall back on DCT, it will create a .MEM file.

 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor

 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747966#action_12747966
 ] 

Uwe Schindler commented on LUCENE-1817:
---

In my opinion, the loader should be able to load either .mem files (which 
should really be named *.ser, because they are serialized java objects) or DCT 
format files, either with autodetection or via two separate methods. If you want 
to load the files more quickly later, you could also save the DCT as a serialized 
object after that, but this should be left to the user and not done automatically.
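
A sketch of that split, for illustration only (the method names are made up, not the actual smartcn API):

{code}
// Illustrative only: autodetect by extension, keep serialization as an explicit user step.
public void load(File dictFile) throws IOException, ClassNotFoundException {
  String name = dictFile.getName().toLowerCase();
  if (name.endsWith(".mem") || name.endsWith(".ser")) {
    // read the serialized java object form
  } else {
    // parse the raw DCT format
  }
}

// never called implicitly from load(); caching a serialized copy is the user's decision
public void saveSerialized(File target) throws IOException {
  // write the serialized form of the currently loaded dictionary
}
{code}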

 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor

 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748025#action_12748025
 ] 

Robert Muir commented on LUCENE-1817:
-

to make matters more complex, trying to load a bigram dictionary from a DCT 
file gave me:

{noformat}
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, 
tid=5912
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64)
# Problematic frame:
# V  [jvm.dll+0x3a78d0]
{noformat}

I will try to see if i can resolve this.

 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1817-mark-cn-experimental.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1817:


Attachment: LUCENE-1817.patch

patch adds:
* load custom dictionaries when the analyzer has been configured to do so
* test that custom DCT dictionaries load 
* do not serialize/write files when loading DCT
* change saveToObj() to package protected so someone can serialize their own 
dictionaries instead.

the patch requires some binary dct data files which I will try to upload as a 
zip
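
For illustration, roughly how a custom DCT dictionary would then be picked up (this assumes ANALYSIS_DATA_DIR remains a writable static field; treat it as a sketch rather than the final API):

{code}
// Sketch only -- the exact configuration mechanism may differ from the committed patch.
AnalyzerProfile.ANALYSIS_DATA_DIR = "/path/to/customDictionaryDCT";
Analyzer analyzer = new SmartChineseAnalyzer();
TokenStream ts = analyzer.tokenStream("body", new StringReader("..."));
{code}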

 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748025#action_12748025
 ] 

Robert Muir edited comment on LUCENE-1817 at 8/26/09 10:21 AM:
---

to make matters more complex, trying to load a bigram dictionary from a DCT 
file gave me:

{noformat}
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, 
tid=5912
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64)
# Problematic frame:
# V  [jvm.dll+0x3a78d0]
{noformat}

apparently this is some clover issue in my eclipse and i turned it off, so it 
is an unrelated problem.

  was (Author: rcmuir):
to make matters more complex, trying to load a bigram dictionary from a DCT 
file gave me:

{noformat}
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, 
tid=5912
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64)
# Problematic frame:
# V  [jvm.dll+0x3a78d0]
{noformat}

I will try to see if i can resolve this.
  
 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, 
 LUCENE-1817.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1817:


Attachment: dataFiles.zip

the two files in this directory need to be placed in smartcn/test under 
o/a/l/analysis/cn/smart/hmm/customDictionaryDCT


 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, 
 LUCENE-1817.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
 if (singleInstance == null) {
   singleInstance = new WordDictionary(); // load from jar file
   try {
 singleInstance.load();
   } catch (IOException e) { // loading from jar file must fail before it 
 checks the AnalyzerProfile (where this can be configured)
 String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
 singleInstance.load(wordDictRoot);
   } catch (ClassNotFoundException e) {
 throw new RuntimeException(e);
   }
 }
 return singleInstance;
   }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-08-26 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748046#action_12748046
 ] 

Mark Harwood commented on LUCENE-1486:
--

It does not stand on its own, as it is merely a temporary object arising from a 
peculiarity in the way the parsing works. The SpanQuery family would be the 
legitimate standalone equivalents of this class.

ComplexPhraseQuery objects are constructed during the first pass of parsing 
to capture everything between quotes as an opaque string.
The ComplexPhraseQueryParser then calls parsePhraseElements(...) on these 
objects to complete the process of parsing in a second pass, where in this 
context any brackets etc. take on a different meaning.
There is no merit in making this externally visible.
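
For readers skimming the archive, a very rough sketch of that two-pass arrangement (the class and method names below are illustrative, not the actual Lucene sources):

{code}
// First pass: capture each quoted section as an opaque placeholder query.
class PhrasePlaceholder {                    // stands in for ComplexPhraseQuery
  final String field;
  final String phraseContent;                // the raw text between the quotes
  PhrasePlaceholder(String field, String phraseContent) {
    this.field = field;
    this.phraseContent = phraseContent;
  }
  void parsePhraseElements(SketchParser parser) {
    // second pass: wildcards, brackets, etc. are re-interpreted in phrase context
  }
}

class SketchParser {                         // stands in for ComplexPhraseQueryParser
  private final java.util.List<PhrasePlaceholder> phrases =
      new java.util.ArrayList<PhrasePlaceholder>();

  PhrasePlaceholder onQuotedText(String field, String quotedText) {
    PhrasePlaceholder p = new PhrasePlaceholder(field, quotedText);
    phrases.add(p);
    return p;
  }

  void completeParsing() {
    for (PhrasePlaceholder p : phrases) {
      p.parsePhraseElements(this);           // finish each captured phrase
    }
  }
}
{code}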





 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.0, 3.1

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the Junit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
   
   checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
 Code plus Junit test to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748064#action_12748064
 ] 

Marvin Humphrey commented on LUCENE-1859:
-

The worst-case scenario seems kind of theoretical, since there are so many
reasons that huge tokens are impractical. (Is a priority of major
justified?) If there's a significant benefit to shrinking the allocation, it's
minimizing average memory usage over time.  But even that assumes a nearly
pathological distribution in field size -- it would have to be large for early
documents, then consistently small for subsequent documents.  If it's
scattered, you have to plan for worst case RAM usage as an app developer,
anyway.  Which generally means limiting token size.

I assume that, based on this report, TermAttributeImpl never gets reset or
discarded/recreated over the course of an indexing session?

-0 if the reallocation happens no more often than once per document.

-1 if the reallocation has to be performed in an inner loop.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748071#action_12748071
 ] 

Tim Smith commented on LUCENE-1859:
---

bq. The worst-case scenario seems kind of theoretical
100% agree, but even if one extremely large token gets added to the stream (and 
possibly dropped prior to indexing), the char[] grows without ever shrinking 
back (so it can result in memory usage growing if bad content is thrown in 
(and people have no shortage of bad content)

bq. Is a priority of major justified?

major is just the default priority (feel free to change)

bq. I assume that, based on this report, TermAttributeImpl never gets reset or 
discarded/recreated over the course of an indexing session?
using reusable TokenStream will never cause the buffer to be nulled (as far as 
i can tell) for the lifetime of the thread (please correct me if i'm wrong on 
this)


i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing 
this to be statically updated), just as a means to bound the max memory used 
here
currently, the memory use is bounded by Integer.MAX_VALUE (which is really big)
If someone feeds a large text document with no spaces or other delimiting 
characters, a non-intelligent tokenizer would treat this a 1 big token (and 
grow the char[] accordingly)
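
As a concrete sketch of that kind of bound (the constant value and the exact shrink point are assumptions, not an agreed design):

{code}
// Sketch only: grow as today, but fall back to MAX_BUFFER_SIZE once a smaller token arrives.
private static int MAX_BUFFER_SIZE = 16 * 1024;   // illustrative default, could be statically updated

private char[] growTermBuffer(char[] termBuffer, int newSize) {
  if (termBuffer == null) {
    return new char[Math.max(newSize, 16)];
  }
  if (termBuffer.length > MAX_BUFFER_SIZE && newSize <= MAX_BUFFER_SIZE) {
    return new char[MAX_BUFFER_SIZE];              // shrink back to the bound
  }
  if (termBuffer.length < newSize) {
    return new char[newSize];                      // normal growth
  }
  return termBuffer;                               // existing buffer is big enough
}
{code}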

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748071#action_12748071
 ] 

Tim Smith edited comment on LUCENE-1859 at 8/26/09 11:31 AM:
-

bq. The worst-case scenario seems kind of theoretical
100% agree, but even if one extremely large token gets added to the stream (and 
possibly dropped prior to indexing), the char[] grows without ever shrinking 
back (so it can result in memory usage growing if bad content is thrown in 
(and people have no shortage of bad content)

bq. Is a priority of major justified?

major is just the default priority (feel free to change)

bq. I assume that, based on this report, TermAttributeImpl never gets reset or 
discarded/recreated over the course of an indexing session?
using reusable TokenStream will never cause the buffer to be nulled (as far as 
i can tell) for the lifetime of the thread (please correct me if i'm wrong on 
this)


i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing 
this to be statically updated), just as a means to bound the max memory used 
here
currently, the memory use is bounded by Integer.MAX_VALUE (which is really big)
If someone feeds a large text document with no spaces or other delimiting 
characters, a non-intelligent tokenizer would treat this a 1 big token (and 
grow the char[] accordingly)

  was (Author: tsmith):
b1. The worst-case scenario seems kind of theoretical
100% agree, but even if one extremely large token gets added to the stream (and 
possibly dropped prior to indexing), the char[] grows without ever shrinking 
back (so it can result in memory usage growing if bad content is thrown in 
(and people have no shortage of bad content)

bq. Is a priority of major justified?

major is just the default priority (feel free to change)

bq. I assume that, based on this report, TermAttributeImpl never gets reset or 
discarded/recreated over the course of an indexing session?
using reusable TokenStream will never cause the buffer to be nulled (as far as 
i can tell) for the lifetime of the thread (please correct me if i'm wrong on 
this)


i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing 
this to be statically updated), just as a means to bound the max memory used 
here
currently, the memory use is bounded by Integer.MAX_VALUE (which is really big)
If someone feeds a large text document with no spaces or other delimiting 
characters, a non-intelligent tokenizer would treat this a 1 big token (and 
grow the char[] accordingly)
  
 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748072#action_12748072
 ] 

Uwe Schindler commented on LUCENE-1859:
---

The problem is that it would only be possible to shrink the buffer once per document, 
when TokenStream's reset() is called (which is done before each new document). 
To achieve this, all TokenStreams would have to notify the term attribute in reset() to 
shrink its size, which is impractical.

On the other hand, the reallocation would always happen for each token (what you call 
the inner loop).

I agree that normally the tokens will not grow very large (if they do, you are doing 
something wrong during tokenization). Even something like KeywordTokenizer, which 
only creates one token, has an upper limit on the term size (as far as I know).

I would set this to minor and would not take care of it before 2.9. The problem of 
possibly large buffers existed even in older versions, with Token as the attribute 
implementation. It is the same problem as keeping an ArrayList around for a very 
long time: it also only grows but never automatically shrinks.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1859:
--

Priority: Minor  (was: Major)

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748082#action_12748082
 ] 

Tim Smith commented on LUCENE-1859:
---

bq. which non-intelligent tokenizers are you referring to? nearly all the 
lucene tokenizers have 255 as a limit.

perhaps this is a non-issue with regards to lucene tokenizers
however, Tokenizers can be implemented by anyone (not sure if there are 
adequate warnings about keeping tokens short)
it also may not be possible to keep tokens short; i may need to index a rather 
long id string in a TokenStream fashion, which will grow the buffer without 
ever reclaiming that memory

perhaps it should be the responsibility of the Tokenizer to shrink the 
TermBuffer if it adds long tokens (but this will probably require some helper 
methods)

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: javadoc update help

2009-08-26 Thread Michael McCandless
Agreed.  I'll open an issue...

Mike

On Wed, Aug 26, 2009 at 10:26 AM, Mark Miller <markrmil...@gmail.com> wrote:
 Why wouldn't we? Isn't it just as much of a break to have the QP start
 spitting them off? Except now its confusing because you get something
 different by default from the QP and by default from the direct object -
 sometimes this could make sense, I don't think the QP has to be locked
 into Query object defaults, but here it seems a bit odd to me.

 Also, now if you parse the output of the Query object toString, you will
 get different behavior - this isn't really a contract, but I think less
 surprises is better.

 - Mark

 Michael McCandless wrote:
 Right, it is confusing!

 QueryParser has already cutover to auto constant score, by default.

 But for direct instantiation of one of the MultiTermQueries, we still
 default to scoring BooleanQuery, but have declared that in 3.0 this
 will also switch to auto constant score.  I'm tempted to simply switch
 the default today, for 2.9, instead.  Then your original proposed
 javadoc is great.

 Mike

 On Wed, Aug 26, 2009 at 6:06 AM, Mark Miller <markrmil...@gmail.com> wrote:

 hmm...I guess this javadoc from MultiTermQuery confused me:

  * Note that {@link QueryParser} produces
  * MultiTermQueries using {@link
  * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default.


 Uwe Schindler wrote:

 Even the old RangeQuery does it. Only the new class TermRangeQuery uses
 constant score (and the also deprecated ConstantScoreRangeQuery).

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de



 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, August 26, 2009 11:33 AM
 To: java-dev@lucene.apache.org
 Subject: Re: javadoc update help

 Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring
 BooleanQuery by default, for now.  In 3.0 this will change to constant
 score auto mode.  At least, that's the plan now... however,
 QueryParser will produce queries in constant score auto mode, so we
 could consider changing the default for these queries in 2.9?  If we
 don't want to change that default, how about something like this?:

 /** Set the maximum number of clauses permitted per
   * BooleanQuery.  Default value is 1024.  Note that queries that
   * derive from MultiTermQuery, such as WildcardQuery,
   * PrefixQuery and FuzzyQuery, may rewrite themselves to a
   * BooleanQuery before searching, and may therefore also hit this
   * limit.  See {@link MultiTermQuery} for details.
  */

 Mike

  On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller <markrmil...@gmail.com> wrote:


 Having a writers block here:

  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>TermQuery clauses are generated from for example prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 Okay, so prefix and fuzzy queries now will use a constantscore mode
 (indirectly, a filter) when it makes sense.
 So this comment is misleading. And the parameter doesn't control the max
 buffer - it possibly provides a top cutoff. But now it doesn't even
 necessarily do that, because if the Query uses constantscore mode
 (multi-term queries auto pick by default), this setting doesn't even
 influence anything.

 I started to rewrite below - but then it feels like I almost need to
 start from scratch. I don't think we should claim this setting controls
 the maximum buffer requirements for query search either - thats a bit
 strong ;) And the buffer talk overall (including at the bottom) is a bit
 confusing.


  /** Set the maximum number of clauses permitted per BooleanQuery.
   * Default value is 1024.
   * <p>For example, TermQuery clauses can be generated from prefix queries and
   * fuzzy queries. Each TermQuery needs some buffer space during search,
   * so this parameter indirectly controls the maximum buffer requirements for
   * query search.
   * <p>When this parameter becomes a bottleneck for a Query one can use a
   * Filter. For example instead of a {@link TermRangeQuery} one can use a
   * {@link TermRangeFilter}.
   * <p>Normally the buffers are allocated by the JVM. When using for example
   * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
   * the operating system.
   */

 I'm tempted to make it:

 /** Set the maximum number of clauses 

[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748083#action_12748083
 ] 

Robert Muir commented on LUCENE-1859:
-

bq. perhaps it should be the responsibility of the Tokenizer to shrink the 
TermBuffer if it adds long tokens (but this will probably require some helper 
methods)

I like this idea better than having any resizing behavior that I might not be 
able to control.


 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748077#action_12748077
 ] 

Tim Smith commented on LUCENE-1859:
---

bq. I would set this to minor and would not take care of it before 2.9.

I would agree with this.

I just reported the issue as it has the potential to cause memory issues (and 
I would think something should be done about it, in the long term at least).
Also, the AttributeSource stuff does result in TermAttributeImpl being held 
onto pretty much forever if using a reusableTokenStream (correct?)
Wasn't a new Token() created by the indexer for each doc/field in 2.4, so the 
unbounded buffer would only last at most for the duration of indexing that one 
document?
With Attribute caching in the TokenStream, the unbounded buffer now lasts for the 
duration of the TokenStream (or its underlying AttributeSource), which could remain 
until shutdown.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748079#action_12748079
 ] 

Robert Muir commented on LUCENE-1859:
-

bq. If someone feeds a large text document with no spaces or other delimiting 
characters, a non-intelligent tokenizer would treat this a 1 big token (and 
grow the char[] accordingly)

which non-intelligent tokenizers are you referring to? nearly all the lucene 
tokenizers have 255 as a limit. 


 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748089#action_12748089
 ] 

Marvin Humphrey commented on LUCENE-1859:
-

IMO, the benefit of adding these theoretical helper methods to lower average -- 
but not peak -- memory usage by non-core Tokenizers which are probably doing 
something impractical anyway... does not justify the complexity cost.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748102#action_12748102
 ] 

Marvin Humphrey commented on LUCENE-1859:
-

 i fail to see the complexity of adding one method to TermAttribute:

Death by a thousand cuts.  This is one cut.

I wouldn't even add the note to the documentation.  If you emit large tokens,
you have to plan for obscene peak memory usage anyway, and if you're not
prepared for that, you deserve what you get.  Keeping the average down 
doesn't help that.

The only reason to do this is to keep average memory usage down for
the hell of it, and if it goes in, it should be an implementation detail.

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748091#action_12748091
 ] 

Tim Smith commented on LUCENE-1859:
---

i fail to see the complexity of adding one method to TermAttribute:
{code}
public void shrinkBuffer(int maxSize) {
  if ((maxSize > termLength) && (termBuffer.length > maxSize)) {
    termBuffer = new char[maxSize];
  } 
}
{code}

Not having this is fine as long as it's well documented that emitting large 
tokens can and will result in memory growing uncontrolled (especially if using 
many indexing threads).
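
If something along those lines were added, a tokenizer that occasionally emits huge tokens could opt in explicitly, e.g. (fragment; shrinkBuffer() and readNextTokenInto() are assumed here, not existing API):

{code}
// Illustrative fragment of a custom Tokenizer's incrementToken().
public boolean incrementToken() throws IOException {
  clearAttributes();
  termAtt.shrinkBuffer(4096);            // cap carry-over from a previous oversized token
  return readNextTokenInto(termAtt);     // whatever the tokenizer does to fill the attribute
}
{code}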

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748103#action_12748103
 ] 

Tim Smith commented on LUCENE-1859:
---

bq. Death by a thousand cuts. This is one cut.

by this logic, nothing new can ever be added. 
The thing that brought this to my attention was the new TokenStream API (one 
cut (rather big, but i like the new API so i'm happy with the blood loss (makes 
me dizzy and happy)))
The new TokenStream API holds onto these char[] much longer (if not forever), 
so this results in memory growing unbounded unless there is some facility to 
truncate/null out the char[]

bq. I wouldn't even add the note to the documentation.

I don't believe there is ever any valid argument against adding documentation.
If someone can shoot themselves in the foot with the gun you gave them, at 
least tell them not to point the gun at their foot with the safety off.

bq. The only reason to do this is to keep average memory usage down for the 
hell of it.
keeping average memory usage down prevents those wonderful OutOfMemory 
Exceptions (which are difficult at best to recover from)

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default

2009-08-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1860:
---

Component/s: Search
   Priority: Minor  (was: Major)

 switch MultiTermQuery to constant score auto rewrite by default
 -

 Key: LUCENE-1860
 URL: https://issues.apache.org/jira/browse/LUCENE-1860
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ 
 QueryParser which does constant score auto.
 The new multi-term queries already set this default, so the only core queries 
 this will impact are PrefixQuery and WildcardQuery.  FuzzyQuery, which has 
 its own rewrite to BooleanQuery, will keep doing so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default

2009-08-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1860:
---

Attachment: LUCENE-1860.patch

 switch MultiTermQuery to constant score auto rewrite by default
 -

 Key: LUCENE-1860
 URL: https://issues.apache.org/jira/browse/LUCENE-1860
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1860.patch


 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ 
 QueryParser which does constant score auto.
 The new multi-term queries already set this default, so the only core queries 
 this will impact are PrefixQuery and WildcardQuery.  FuzzyQuery, which has 
 its own rewrite to BooleanQuery, will keep doing so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default

2009-08-26 Thread Michael McCandless (JIRA)
switch MultiTermQuery to constant score auto rewrite by default
-

 Key: LUCENE-1860
 URL: https://issues.apache.org/jira/browse/LUCENE-1860
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9


Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ 
QueryParser which does constant score auto.

The new multi-term queries already set this default, so the only core queries 
this will impact are PrefixQuery and WildcardQuery.  FuzzyQuery, which has its 
own rewrite to BooleanQuery, will keep doing so.
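
For reference, callers can already pick the rewrite mode per query via the MultiTermQuery rewrite-method constants; the sketch below only illustrates that (the query and field names are made up for the example):

{code}
// illustrative only: pinning the rewrite method explicitly on a core multi-term query
PrefixQuery q = new PrefixQuery(new Term("body", "luc"));

// current core default, which this issue proposes to change:
q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

// proposed default, matching QueryParser and the new multi-term queries:
q.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT);
{code}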

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748109#action_12748109
 ] 

Marvin Humphrey commented on LUCENE-1859:
-

 I don't believe there is ever any valid argument against adding
 documentation.

The more that documentation grows, the harder it is to absorb.  The more
bells and whistles on an API, the harder it is to grok and to use effectively.
The more a code base bloats, the harder it is to maintain or to evolve.

 keeping average memory usage down prevents those wonderful OutOfMemory
 Exceptions

No, it won't.  If someone is emitting large tokens regularly, it is likely
that several threads will require large RAM footprints simultaneously, and an
OOM will occur.  That would be the common case.

If someone is emitting large tokens periodically, well, this doesn't prevent
the OOM, it just makes it less likely.  That's not worthless, but it's not
something anybody should count on when assessing required RAM usage.

Keeping average memory usage down is good for the system at large.  If this is
implemented, that should be the justification.


 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1861) Add contrib libs to classpath for javadoc

2009-08-26 Thread Mark Miller (JIRA)
Add contrib libs to classpath for javadoc
-

 Key: LUCENE-1861
 URL: https://issues.apache.org/jira/browse/LUCENE-1861
 Project: Lucene - Java
  Issue Type: Wish
Reporter: Mark Miller
Priority: Minor


I don't know Ant well enough to just do this easily, so I've labeled this a wish - it would be nice to get rid of all the errors/warnings that not finding these classes generates when building javadoc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1861) Add contrib libs to classpath for javadoc

2009-08-26 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1861:


Component/s: Build

 Add contrib libs to classpath for javadoc
 -

 Key: LUCENE-1861
 URL: https://issues.apache.org/jira/browse/LUCENE-1861
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Reporter: Mark Miller
Priority: Minor

 I don't know Ant well enough to just do this easily, so I've labeled this a wish - it would be nice to get rid of all the errors/warnings that not finding these classes generates when building javadoc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big

2009-08-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748122#action_12748122
 ] 

Tim Smith commented on LUCENE-1859:
---

On documentation:
any warnings/precautions should always be called out (with a pointer to the external link (wiki/etc.) for in-depth details).
In-depth descriptions of the details can be pushed off to wiki pages or external references, as long as a link is provided for the curious, but i would still argue that they should exist.

bq. this doesn't prevent the OOM, it just makes it less likely

all you can ever do for OOM issues is make them less likely (short of just 
fixing a bug that holds onto memory like mad). 
If accepting arbitrary content, there will always be a possibility of the 
content forcing OOM issues. In general, everything possible should be done to 
reduce the likelihood of such OOM issues (IMO).

 TermAttributeImpl's buffer will never shrink if it grows too big
 --

 Key: LUCENE-1859
 URL: https://issues.apache.org/jira/browse/LUCENE-1859
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor

 This was also an issue with Token previously as well
 If a TermAttributeImpl is populated with a very long buffer, it will never be 
 able to reclaim this memory
 Obviously, it can be argued that Tokenizer's should never emit large 
 tokens, however it seems that the TermAttributeImpl should have a reasonable 
 static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, 
 it will shrink back down to this size once the next token smaller than 
 MAX_BUFFER_SIZE is set
 I don't think i have actually encountered issues with this yet, however it 
 seems like if you have multiple indexing threads, you could end up with a 
 char[Integer.MAX_VALUE] per thread (in the very worst case scenario)
 perhaps growTermBuffer should have the logic to shrink if the buffer is 
 currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer

2009-08-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748126#action_12748126
 ] 

Robert Muir commented on LUCENE-1817:
-

i looked at this file format and I am going to create smaller custom 
dictionaries for testing.

this way we do not have huge files in svn

 it is impossible to use a custom dictionary for SmartChineseAnalyzer
 

 Key: LUCENE-1817
 URL: https://issues.apache.org/jira/browse/LUCENE-1817
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, 
 LUCENE-1817.patch


 it is not possible to use a custom dictionary, even though there is a lot of 
 code and javadocs to allow this.
 This is because the custom dictionary is only loaded if it cannot load the 
 built-in one (which is of course, in the jar file and should load)
 {code}
 public synchronized static WordDictionary getInstance() {
   if (singleInstance == null) {
     singleInstance = new WordDictionary(); // load from jar file
     try {
       singleInstance.load();
     } catch (IOException e) {
       // loading from the jar file must fail before the AnalyzerProfile
       // (where the custom dictionary can be configured) is even checked
       String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
       singleInstance.load(wordDictRoot);
     } catch (ClassNotFoundException e) {
       throw new RuntimeException(e);
     }
   }
   return singleInstance;
 }
 {code}
 I think we should either correct this, document this, or disable custom 
 dictionary support...
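
 One way the loading order could be corrected - a sketch only, not a committed fix, and assuming AnalyzerProfile.ANALYSIS_DATA_DIR is empty when no custom directory is configured - would be to consult the configured directory first and fall back to the bundled dictionary:
 {code}
 // sketch of a possible correction (not a committed fix): prefer an explicitly
 // configured dictionary directory, and only fall back to the dictionary in the jar
 public synchronized static WordDictionary getInstance() {
   if (singleInstance == null) {
     singleInstance = new WordDictionary();
     String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
     if (wordDictRoot != null && wordDictRoot.length() > 0) {
       singleInstance.load(wordDictRoot); // custom dictionary wins when configured
     } else {
       try {
         singleInstance.load(); // built-in dictionary from the jar
       } catch (IOException e) {
         throw new RuntimeException(e);
       } catch (ClassNotFoundException e) {
         throw new RuntimeException(e);
       }
     }
   }
   return singleInstance;
 }
 {code}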

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-950) IllegalArgumentException parsing foo~1

2009-08-26 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-950:


Attachment: lucene_950_08_26_2009.patch

This patch fixes the bug: the parser no longer throws IllegalArgumentException when the user enters a fuzzy query with similarity greater than or equal to 1; instead, it converts the FuzzyQuery into a simple TermQuery, ignoring the fuzzy value.
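
The actual change is in the attached patch; the snippet below is only a sketch of the described behaviour against the 2.9 QueryParser API (it is not the patch itself):

{code}
// illustrative sketch of the described fallback: a similarity >= 1 degenerates
// to an exact TermQuery instead of throwing IllegalArgumentException
protected Query getFuzzyQuery(String field, String termStr, float minSimilarity)
    throws ParseException {
  Term t = new Term(field, termStr);
  if (minSimilarity >= 1.0f) {
    return new TermQuery(t); // ignore the fuzzy value, as described above
  }
  return new FuzzyQuery(t, minSimilarity, getFuzzyPrefixLength());
}
{code}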

 IllegalArgumentException parsing foo~1
 

 Key: LUCENE-950
 URL: https://issues.apache.org/jira/browse/LUCENE-950
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.1, 2.2
 Environment: Java 1.5
Reporter: Eleanor Joslin
Priority: Minor
 Attachments: lucene_950_08_26_2009.patch


 If I run this:
 QueryParser parser = new QueryParser("myField", new SimpleAnalyzer());
 try {
   parser.parse("foo~1");
 }
 catch (ParseException e) {
   // OK
 }
 I get this:
 Exception in thread "main" java.lang.IllegalArgumentException: minimumSimilarity >= 1
   at org.apache.lucene.search.FuzzyQuery.<init>(FuzzyQuery.java:58)
   at org.apache.lucene.queryParser.QueryParser.getFuzzyQuery(QueryParser.java:711)
   at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1090)
   at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
   at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
   at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default

2009-08-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748157#action_12748157
 ] 

Uwe Schindler commented on LUCENE-1860:
---

If we change this, should we keep the good old RangeQuery as it is (boolean 
rewrite)? Because there is also the deprecated ConstantScoreRangeQuery.

 switch MultiTermQuery to constant score auto rewrite by default
 -

 Key: LUCENE-1860
 URL: https://issues.apache.org/jira/browse/LUCENE-1860
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1860.patch


 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ 
 QueryParser which does constant score auto.
 The new multi-term queries already set this default, so the only core queries 
 this will impact are PrefixQuery and WildcardQuery.  FuzzyQuery, which has 
 its own rewrite to BooleanQuery, will keep doing so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files

2009-08-26 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-1851.
---

Resolution: Fixed

Fixed it. Sorry about that.

Committed revision 808224.

 'ant javacc' in root project should also properly create contrib/surround 
 Java files
 

 Key: LUCENE-1851
 URL: https://issues.apache.org/jira/browse/LUCENE-1851
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: javacc20090825.patch, LUCENE-1851.patch


 For consistency after LUCENE-1829 which did the same for contrib/queryparser

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default

2009-08-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748213#action_12748213
 ] 

Michael McCandless commented on LUCENE-1860:


bq. should we keep the good old RangeQuery as it is (boolean rewrite)? Because 
there is also the deprecated ConstantScoreRangeQuery.

I think we should?  That's what it is right now (and the patch leaves it).

 switch MultiTermQuery to constant score auto rewrite by default
 -

 Key: LUCENE-1860
 URL: https://issues.apache.org/jira/browse/LUCENE-1860
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1860.patch


 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ 
 QueryParser which does constant score auto.
 The new multi-term queries already set this default, so the only core queries 
 this will impact are PrefixQuery and WildcardQuery.  FuzzyQuery, which has 
 its own rewrite to BooleanQuery, will keep doing so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene 2.9 release

2009-08-26 Thread Mark Miller
Mark Miller wrote:
 I'm tempted to say lets start the freeze tomorrow instead - I could do
 another full day of doc/packaging no problem I think (a bunch left to do
 on the website stuff alone) - and technically the releaseToDo wants
 everything to go through a patch in JIRA first while in freeze (not a
 bad idea at all) - which slows things down. Also don't have much time to
 do the RC if I'm on doc all day.

 Anyone object to starting tomorrow rather than today?

   
I think I'm ready for freeze tomorrow if everyone else is.

I won't branch - but I'll jump the right numbers and whatnot on trunk.
Then after the branch (in a week), I'll advance trunk's numbers again (to
3.0-dev).

-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Adding Field twice w/ Payload - bug or works as designed?

2009-08-26 Thread Shai Erera
Hi

I don't know if it's supported or not, but I wrote the following simple
example code to describe what I want.

Directory dir = new RAMDirectory();
Analyzer a = new SimpleAnalyzer();
IndexWriter writer = new IndexWriter(dir, a, MaxFieldLength.UNLIMITED);
Document doc = new Document();
doc.add(new Field("a", "abc", Store.NO, Index.NOT_ANALYZED));
final Term t = new Term("a", "abc");
doc.add(new Field(t.field(), new TokenStream() {
  boolean done = false;
  @Override
  public Token next(Token reusableToken) throws IOException {
if (done) return null;
done = true;
reusableToken.setTermBuffer(t.text());
reusableToken.setPayload(new Payload(new byte[] { 1 }));
return reusableToken;
  }
}));
writer.addDocument(doc);
writer.commit();
writer.close();

IndexReader reader = IndexReader.open(dir, true);
TermPositions tp = reader.termPositions(t);
tp.next();
tp.nextPosition();
System.out.println(tp.getPayloadLength());
reader.close();

Basically, I add the same Field twice (a:abc), the second time I just set a
Payload. The program prints 0 as the payload length (1 line above the last).
If I change either the field name or field text, it prints 1.

Bug or works as designed?

Shai


Re: Adding Field twice w/ Payload - bug or works as designed?

2009-08-26 Thread Michael Busch
The first occurrence of your term does not have a payload, the second 
one does. So getPayloadLength() correctly returns 0, because the 
TermPositions is at the first occurrence. If you call nextPosition() 
again and then dump the payload length it should be 1.
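
In other words, adapting the tail of your example (only the extra nextPosition() call at the end is new):

TermPositions tp = reader.termPositions(t);
tp.next();
tp.nextPosition();                          // first occurrence: no payload
System.out.println(tp.getPayloadLength());  // prints 0
tp.nextPosition();                          // second occurrence: carries the payload
System.out.println(tp.getPayloadLength());  // prints 1
reader.close();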


 Michael

On 8/26/09 8:51 PM, Shai Erera wrote:

Hi

I don't know if it's supported or not, but I wrote the following 
simple example code to describe what I want.


Directory dir = new RAMDirectory();
Analyzer a = new SimpleAnalyzer();
IndexWriter writer = new IndexWriter(dir, a, 
MaxFieldLength.UNLIMITED);

Document doc = new Document();
doc.add(new Field("a", "abc", Store.NO, Index.NOT_ANALYZED));
final Term t = new Term("a", "abc");
doc.add(new Field(t.field(), new TokenStream() {
  boolean done = false;
  @Override
  public Token next(Token reusableToken) throws IOException {
if (done) return null;
done = true;
reusableToken.setTermBuffer(t.text());
reusableToken.setPayload(new Payload(new byte[] { 1 }));
return reusableToken;
  }
}));
writer.addDocument(doc);
writer.commit();
writer.close();

IndexReader reader = IndexReader.open(dir, true);
TermPositions tp = reader.termPositions(t);
tp.next();
tp.nextPosition();
System.out.println(tp.getPayloadLength());
reader.close();

Basically, I add the same Field twice (a:abc), the second time I just 
set a Payload. The program prints 0 as the payload length (1 line 
above the last). If I change either the field name or field text, it 
prints 1.


Bug or works as designed?

Shai



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Adding Field twice w/ Payload - bug or works as designed?

2009-08-26 Thread Shai Erera
Ohh, right. I missed that. Indeed after I call nextPosition again, it prints
1. Thanks !

Shai

On Thu, Aug 27, 2009 at 7:09 AM, Michael Busch busch...@gmail.com wrote:

 The first occurrence of your term does not have a payload, the second one
 does. So getPayloadLength() correctly returns 0, because the TermPositions
 is at the first occurrence. If you call nextPosition() again and then dump
 the payload length it should be 1.

  Michael


 On 8/26/09 8:51 PM, Shai Erera wrote:

 Hi

 I don't know if it's supported or not, but I wrote the following simple
 example code to describe what I want.

Directory dir = new RAMDirectory();
Analyzer a = new SimpleAnalyzer();
IndexWriter writer = new IndexWriter(dir, a, MaxFieldLength.UNLIMITED);
Document doc = new Document();
doc.add(new Field("a", "abc", Store.NO, Index.NOT_ANALYZED));
final Term t = new Term("a", "abc");
doc.add(new Field(t.field(), new TokenStream() {
  boolean done = false;
  @Override
  public Token next(Token reusableToken) throws IOException {
if (done) return null;
done = true;
reusableToken.setTermBuffer(t.text());
reusableToken.setPayload(new Payload(new byte[] { 1 }));
return reusableToken;
  }
}));
writer.addDocument(doc);
writer.commit();
writer.close();

IndexReader reader = IndexReader.open(dir, true);
TermPositions tp = reader.termPositions(t);
tp.next();
tp.nextPosition();
System.out.println(tp.getPayloadLength());
reader.close();

 Basically, I add the same Field twice (a:abc), the second time I just set
 a Payload. The program prints 0 as the payload length (1 line above the
 last). If I change either the field name or field text, it prints 1.

 Bug or works as designed?

 Shai



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org