[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector
[ https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747805#action_12747805 ] Shai Erera commented on LUCENE-1849:

bq. I think somewhere in LUCENE-1483 is the answer to this question

I tracked it down to LUCENE-1575, which was the huge HitCollector refactoring issue: https://issues.apache.org/jira/browse/LUCENE-1575?focusedCommentId=12695784&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12695784 (perhaps we should index my memory cells? :) ).

Add OutOfOrderCollector and InOrderCollector subclasses of Collector
Key: LUCENE-1849
URL: https://issues.apache.org/jira/browse/LUCENE-1849
Project: Lucene - Java
Issue Type: Wish
Components: Search
Affects Versions: 2.9
Reporter: Tim Smith
Priority: Minor
Fix For: 2.9

I find myself always having to implement these methods, and I always return a constant (depending on whether the collector can handle out-of-order hits). It would be nice for these two convenience abstract classes to exist, implementing acceptsDocsOutOfOrder() as final and returning the appropriate value.
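A minimal sketch of what the wished-for classes could look like (illustrative only; the names follow the issue summary, and nothing here is from an actual patch):

{code}
import org.apache.lucene.search.Collector;

/**
 * Sketch of the proposed convenience base class: subclasses implement
 * only the collection logic, while acceptsDocsOutOfOrder() is fixed to
 * true and cannot be overridden.
 */
abstract class OutOfOrderCollector extends Collector {
  public final boolean acceptsDocsOutOfOrder() {
    return true; // this collector tolerates out-of-order doc IDs
  }
}

/** Counterpart for collectors that require in-order hits. */
abstract class InOrderCollector extends Collector {
  public final boolean acceptsDocsOutOfOrder() {
    return false; // hits must arrive in increasing doc ID order
  }
}
{code}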
[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747836#action_12747836 ] Simon Willnauer commented on LUCENE-1845:

I added the short discussion I had on legal-discuss for the record. One person confirmed that we could add the jar to SVN if we do not redistribute it. I'm not a license guy, but I guess we should first figure out what license this particular jar has. It is not a download from the Oracle page (you can only get the sources for this particular jar as a download, not the binary) but something from http://downloads.osafoundation.org/db/ without any license notice. I would suggest trying to build the jar from source with the latest release on the Oracle page so we can be sure about the license. Once I have done this I will send another request to legal to confirm that we are not violating anything.

The discussion from legal-discuss:

{noformat}
Hey there,
We (Lucene) have a contrib project that provides an Index-Directory implementation based on Berkeley DB. This code downloads a jar file from http://downloads.osafoundation.org/... to build and test the code. This jar file is not included in any distribution and we do not plan to do so. The problem is that the download site is down very frequently, so we are looking for another way to obtain the jar. Here is the question: do we violate the license if we add the jar file to the svn repository but do not distribute it at all? Another way would be to add the jar to a committer page on people.apache.org and download it from there. The license is here:
http://www.oracle.com/technology/software/products/berkeley-db/htdocs/oslicense.html

Complicated matter. BDB seems viral in that anything that uses it must be made available in source form. So, ASF has no problem fulfilling that requirement, but downstream users may. OTOH, you say that the BDB is only used to build (do you really need it to build?) and test your implementation, BUT you say that you have an implementation based on BDB, so I presume that it requires it to run. My interpretation is:

* IFF your component is purely optional, having a dependency on BDB is Ok, provided it is not shipped with the release and that the user is provided with the information that the BDB needs to be downloaded separately and advised to review their license.

For your second part; Can you stick the BDB jar(s) somewhere more reliably available?

* Yes, I think so. The license allows distribution in any form, source or binary... So, I suggest that you upload it to a dependable host, such as SF, ibiblio.org or similar. people.apache.org -- I wouldn't recommend it. ASF SVN -- yes, that should be Ok, but there is a strong recommendation of not putting JARs in there... Also there is a risk that the encumbrance around BDB is forgotten and it is used beyond what is acceptable if it is 'laying around'.

Cheers
{noformat}

if the build fails to download JARs for contrib/db, just skip its tests
Key: LUCENE-1845
URL: https://issues.apache.org/jira/browse/LUCENE-1845
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1845.patch, LUCENE-1845.txt, LUCENE-1845.txt, LUCENE-1845.txt, LUCENE-1845.txt

Every so often our nightly build fails because contrib/db is unable to download the necessary BDB JARs from http://downloads.osafoundation.org.
I think in such cases we should simply skip contrib/db's tests, if it's the nightly build that's running, since it's a false positive failure.
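For illustration, the skip can also be pictured at the test level; this sketch is hypothetical (the class and the classpath check are invented, and the issue itself targets the Ant build rather than the test code):

{code}
// Hypothetical sketch: skip BDB-dependent tests when the jar is absent.
// The real fix lives in the Ant build, not in the tests themselves.
public abstract class DbTestCase extends junit.framework.TestCase {

  private static boolean bdbAvailable() {
    try {
      // The Directory implementation needs Berkeley DB JE classes;
      // if they are not on the classpath, the jar download failed.
      Class.forName("com.sleepycat.je.Environment");
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public void runBare() throws Throwable {
    if (!bdbAvailable()) {
      System.err.println("Berkeley DB jar not available - skipping " + getName());
      return; // pass silently instead of failing the nightly build
    }
    super.runBare();
  }
}
{code}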
[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector
[ https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747851#action_12747851 ] Michael McCandless commented on LUCENE-1849:

We do need to index your memory cells! Except: that entry is showing the [sizable] perf gains of disabling scoring when sorting by field (I think?). We were instead looking for the comparison of BooleanScorer vs BooleanScorer2.

Add OutOfOrderCollector and InOrderCollector subclasses of Collector (LUCENE-1849)
Re: Hudson build is back to normal: Lucene-trunk #929
Yay!

Mike

On Wed, Aug 26, 2009 at 1:24 AM, Apache Hudson Server <hud...@hudson.zones.apache.org> wrote:
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/929/changes
Re: Hudson build is back to normal: Lucene-trunk #929
:) there we go!

On Wed, Aug 26, 2009 at 11:24 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
Yay! Mike [...]
RE: javadoc update help
Even the old RangeQuery does it. Only the new class TermRangeQuery uses constant score (and also the deprecated ConstantScoreRangeQuery).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, August 26, 2009 11:33 AM
To: java-dev@lucene.apache.org
Subject: Re: javadoc update help

Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring BooleanQuery by default, for now. In 3.0 this will change to constant score auto mode. At least, that's the plan now... however, QueryParser will produce queries in constant score auto mode, so we could consider changing the default for these queries in 2.9?

If we don't want to change that default, how about something like this?:

/** Set the maximum number of clauses permitted per
 * BooleanQuery. Default value is 1024. Note that queries that
 * derive from MultiTermQuery, such as WildcardQuery,
 * PrefixQuery and FuzzyQuery, may rewrite themselves to a
 * BooleanQuery before searching, and may therefore also hit this
 * limit. See {@link MultiTermQuery} for details. */

Mike

On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller <markrmil...@gmail.com> wrote:

Having a writer's block here:

/** Set the maximum number of clauses permitted per BooleanQuery.
 * Default value is 1024.
 * <p>TermQuery clauses are generated from for example prefix queries and
 * fuzzy queries. Each TermQuery needs some buffer space during search,
 * so this parameter indirectly controls the maximum buffer requirements for
 * query search.
 * <p>When this parameter becomes a bottleneck for a Query one can use a
 * Filter. For example instead of a {@link TermRangeQuery} one can use a
 * {@link TermRangeFilter}.
 * <p>Normally the buffers are allocated by the JVM. When using for example
 * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
 * the operating system. */

Okay, so prefix and fuzzy queries now will use a constant score mode (indirectly, a filter) when it makes sense. So this comment is misleading. And the parameter doesn't control the max buffer - it possibly provides a top cutoff. But now it doesn't even necessarily do that, because if the Query uses constant score mode (multi-term queries auto pick by default), this setting doesn't even influence anything.

I started to rewrite below - but then it feels like I almost need to start from scratch. I don't think we should claim this setting controls the maximum buffer requirements for query search either - that's a bit strong ;) And the buffer talk overall (including at the bottom) is a bit confusing.

/** Set the maximum number of clauses permitted per BooleanQuery.
 * Default value is 1024.
 * <p>For example, TermQuery clauses can be generated from prefix queries and
 * fuzzy queries. Each TermQuery needs some buffer space during search,
 * so this parameter indirectly controls the maximum buffer requirements for
 * query search.
 * <p>When this parameter becomes a bottleneck for a Query one can use a
 * Filter. For example instead of a {@link TermRangeQuery} one can use a
 * {@link TermRangeFilter}.
 * <p>Normally the buffers are allocated by the JVM. When using for example
 * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to
 * the operating system. */

I'm tempted to make it:

/** Set the maximum number of clauses permitted per BooleanQuery.
 * Default value is 1024. */

:) Anyone have any suggestions though?
--
- Mark
http://www.lucidimagination.com
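For reference, the setting under discussion and the rewrite modes interact roughly like this. This is a small sketch against the 2.9 API as I understand it; the field and term values are made up:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;

public class MaxClauseDemo {
  public static void main(String[] args) {
    // The static limit the javadoc describes: rewritten BooleanQueries
    // with more than this many clauses throw TooManyClauses.
    BooleanQuery.setMaxClauseCount(1024);

    PrefixQuery q = new PrefixQuery(new Term("body", "lucen"));

    // Current 2.9 default for directly constructed queries: rewrite to
    // a scoring BooleanQuery, which is subject to maxClauseCount.
    q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

    // What QueryParser already uses: auto constant-score mode, which
    // may rewrite to a filter instead, sidestepping the clause limit.
    q.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT);
  }
}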
[jira] Commented: (LUCENE-1849) Add OutOfOrderCollector and InOrderCollector subclasses of Collector
[ https://issues.apache.org/jira/browse/LUCENE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747874#action_12747874 ] Shai Erera commented on LUCENE-1849:

Yes, I know that - I remember that you once ran w/ BS vs. BS2 and thought the results were reported in that issue. But I've scanned it and I don't find it. Perhaps it was in an email, but I seem to remember you reported ~20-30% improvement in favor of BS. I'll try to dig it up from the bottom of my memory pit.

Add OutOfOrderCollector and InOrderCollector subclasses of Collector (LUCENE-1849)
Re: javadoc update help
hmm...I guess this javadoc from MultiTermQuery confused me:

 * Note that {@link QueryParser} produces
 * MultiTermQueries using {@link
 * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default.

Uwe Schindler wrote:
Even the old RangeQuery does it. Only the new class TermRangeQuery uses constant score (and also the deprecated ConstantScoreRangeQuery). [...]

--
- Mark
http://www.lucidimagination.com
Re: javadoc update help
Right, it is confusing!

QueryParser has already cut over to auto constant score, by default. But for direct instantiation of one of the MultiTermQueries, we still default to scoring BooleanQuery, but have declared that in 3.0 this will also switch to auto constant score.

I'm tempted to simply switch the default today, for 2.9, instead. Then your original proposed javadoc is great.

Mike

On Wed, Aug 26, 2009 at 6:06 AM, Mark Miller <markrmil...@gmail.com> wrote:
hmm...I guess this javadoc from MultiTermQuery confused me:
 * Note that {@link QueryParser} produces
 * MultiTermQueries using {@link
 * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default.
[...]
[jira] Commented: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files
[ https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747884#action_12747884 ] Paul Elschot commented on LUCENE-1851:

After svn update I still have the output below, so I think the commit missed some files affected by the patch:

svn diff `find contrib/surround -name '*.jj'`
Index: contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj
===
--- contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj (revision 807956)
+++ contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj (working copy)
@@ -184,7 +184,7 @@
 }
 DEFAULT SKIP : {
- _WHITESPACE
+ _WHITESPACE
 }
 /* Operator tokens (in increasing order of precedence): */

'ant javacc' in root project should also properly create contrib/surround Java files
Key: LUCENE-1851
URL: https://issues.apache.org/jira/browse/LUCENE-1851
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/*
Affects Versions: 2.9
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
Fix For: 2.9
Attachments: javacc20090825.patch, LUCENE-1851.patch

For consistency after LUCENE-1829 which did the same for contrib/queryparser
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
Here's what is currently run on Hudson as shell:

set -x
export FORREST_HOME=/export/home/nigel/tools/forrest/latest
ANT_HOME=/export/home/hudson/tools/ant/latest
ARTIFACTS=$WORKSPACE/artifacts
MAVEN_ARTIFACTS=$WORKSPACE/maven_artifacts
TRUNK=$WORKSPACE/trunk
mkdir -p $ARTIFACTS
mkdir -p $MAVEN_ARTIFACTS
cd $TRUNK
echo Workspace: $WORKSPACE

# run build
#$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
#  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
#cp dist/*.tar.gz $ARTIFACTS

#Package the Source
$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  clean package-tgz-src
# release it
cp dist/*-src.tar.gz $ARTIFACTS

#Generate the Maven snapshot
#Update the Version # when doing a release
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/maven \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=2.9-SNAPSHOT generate-maven-artifacts

#copy the artifacts to the side so the cron job can publish them
echo Copying Maven artifacts to $MAVEN_ARTIFACTS
cp -R dist/maven/org/apache/lucene $MAVEN_ARTIFACTS
echo Done Copying Maven Artifacts

# run build
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
# release it
cp dist/*.tar.gz $ARTIFACTS

$ANT_HOME/bin/ant -Dversion=$BUILD_ID \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  javadocs

#Rerun nightly with clover on
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true clean nightly

#generate the clover reports
$ANT_HOME/bin/ant -lib /export/home/nigel/hudsonSupport/junit \
  -lib /export/home/hudson/tools/clover/latest/lib \
  -Dsvnversion.exe=/opt/subversion-current/bin/svnversion \
  -Dsvn.exe=/opt/subversion-current/bin/svn \
  -Dversion=$BUILD_ID -Drun.clover=true generate-clover-reports

On Aug 25, 2009, at 7:39 PM, Chris Hostetter wrote:

: Grant does the cutover to hudson.zones still invoke the nightly.sh? I
: thought it did? (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run, there's a complicated set of shell commands configured in hudson that gets run instead. (why it's not just exec'ing a shellscript in svn isn't clear to me ... but it starts with set -x so the build log should make it clear exactly what's running.) you can see from that log: the nightly ant target is still used.

-Hoss
Re: Lucene website - benchmarks page
+1

Mike

On Wed, Aug 26, 2009 at 8:20 AM, Grant Ingersoll <gsing...@apache.org> wrote:
+1

On Aug 25, 2009, at 10:11 PM, Mark Miller wrote:
These are very old and not very useful anymore. Should we pull this page? It's kind of an embarrassment if we don't actually maintain it to be remotely current. These are all with Lucene 1.2, 1.3 ...

--
- Mark
http://www.lucidimagination.com
RE: svn commit: r807763 - /lucene/java/trunk/build.xml
So by editing this script it is possible to pass additional -D options to some of the Ant commands. Thanks for the insight, that also helps me very much with the clover update.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Wednesday, August 26, 2009 2:20 PM
To: java-dev@lucene.apache.org
Subject: Re: svn commit: r807763 - /lucene/java/trunk/build.xml

Here's what is currently run on Hudson as shell: [...]
Re: Lucene 2.9 release
I'm tempted to say let's start the freeze tomorrow instead - I could do another full day of doc/packaging no problem, I think (a bunch left to do on the website stuff alone) - and technically the releaseToDo wants everything to go through a patch in JIRA first while in freeze (not a bad idea at all), which slows things down. Also, I don't have much time to do the RC if I'm on doc all day.

Anyone object to starting tomorrow rather than today?

--
- Mark
http://www.lucidimagination.com
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
Thanks Grant.

Should we remove https://svn.apache.org/repos/asf/lucene/java/nightly entirely? Ie we are not using any of these files anymore?:

README.txt
nightly.properties
nightly.sh.bak
publish-maven.sh
nightly.cron
nightly.sh

Mike

On Wed, Aug 26, 2009 at 8:20 AM, Grant Ingersoll <gsing...@apache.org> wrote:
Here's what is currently run on Hudson as shell: [...]
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
I think pub-maven is still used, but let me check.

On Aug 26, 2009, at 8:47 AM, Michael McCandless wrote:
Thanks Grant. Should we remove https://svn.apache.org/repos/asf/lucene/java/nightly entirely? Ie we are not using any of these files anymore?: README.txt nightly.properties nightly.sh.bak publish-maven.sh nightly.cron nightly.sh [...]

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
[jira] Reopened: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files
[ https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot reopened LUCENE-1851:

Reopening only to make sure my last comment is not missed before the impending 2.9 release.

'ant javacc' in root project should also properly create contrib/surround Java files (LUCENE-1851)
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
publish-maven is run by my cron script on the Hudson zone, where it copies the Maven artifacts off of the Lucene zone and onto the Hudson zone.

FWIW, committers can get Hudson accounts. See http://wiki.apache.org/general/Hudson . Committers can also get Lucene zone access too, if it is needed.

I will update the docs.

On Aug 26, 2009, at 8:54 AM, Grant Ingersoll wrote:
I think pub-maven is still used, but let me check. [...]
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747942#action_12747942 ] Uwe Schindler commented on LUCENE-1859:

This also applies to Token. If we fix that, we should also fix it in Token.

TermAttributeImpl's buffer will never shrink if it grows too big
Key: LUCENE-1859
URL: https://issues.apache.org/jira/browse/LUCENE-1859
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.9
Reporter: Tim Smith

This was previously an issue with Token as well. If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory. Obviously, it can be argued that Tokenizers should never emit large tokens; however, it seems that TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set. I don't think I have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario). Perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE.
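A rough sketch of the proposed shrink-on-next-small-token logic; MAX_BUFFER_SIZE, its value, and the standalone class are hypothetical here, with only the idea itself coming from the issue:

{code}
// Hypothetical illustration of the shrink idea from the issue; this is
// not the actual TermAttributeImpl code.
public class ShrinkingTermBuffer {
  private static final int MAX_BUFFER_SIZE = 16 * 1024; // chars, assumed cap

  private char[] termBuffer = new char[32];
  private int termLength;

  public void setTermBuffer(char[] buffer, int offset, int length) {
    if (length > termBuffer.length) {
      // grow as today: never allocate less than what is needed
      termBuffer = new char[length];
    } else if (termBuffer.length > MAX_BUFFER_SIZE && length <= MAX_BUFFER_SIZE) {
      // a previous token blew the buffer up; reclaim the memory now
      // that a normal-sized token has arrived
      termBuffer = new char[Math.max(length, 32)];
    }
    System.arraycopy(buffer, offset, termBuffer, 0, length);
    termLength = length;
  }
}
{code}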
[jira] Updated: (LUCENE-1815) Geohash encode/decode floating point problems
[ https://issues.apache.org/jira/browse/LUCENE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1815:

Priority: Minor (was: Major)

I don't think this should be major.

Geohash encode/decode floating point problems
Key: LUCENE-1815
URL: https://issues.apache.org/jira/browse/LUCENE-1815
Project: Lucene - Java
Issue Type: Bug
Components: contrib/spatial
Affects Versions: 2.9
Reporter: Wouter Heijke
Priority: Minor

I'm finding the Geohash support in the spatial package to be rather unreliable. Here is the outcome of a test that encodes/decodes the same lat/lon and geohash a few times. The format is: action geohash=(latitude, longitude). The result:

encode u173zq37x014=(52.3738007,4.8909347)
decode u173zq37x014=(52.3737996,4.890934)
encode u173zq37rpbw=(52.3737996,4.890934)
decode u173zq37rpbw=(52.3737996,4.89093295)
encode u173zq37qzzy=(52.3737996,4.89093295)

If I now change to the google code implementation:

encode u173zq37x014=(52.3738007,4.8909347)
decode u173zq37x014=(52.37380061298609,4.890934377908707)
encode u173zq37x014=(52.37380061298609,4.890934377908707)
decode u173zq37x014=(52.37380061298609,4.890934377908707)
encode u173zq37x014=(52.37380061298609,4.890934377908707)

Note the differences between the geohashes in both situations, and the lat/lons! Now things get worse if you work on low-precision geohashes:

decode u173=(52.0,4.0)
encode u14zg429yy84=(52.0,4.0)
decode u14zg429yy84=(52.0,3.99)
encode u14zg429ywx6=(52.0,3.99)

and google:

decode u173=(52.20703125,4.5703125)
encode u173=(52.20703125,4.5703125)
decode u173=(52.20703125,4.5703125)
encode u173=(52.20703125,4.5703125)

We are using geohashes extensively and will now use the google code version, unfortunately.
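The property the report exercises is that decode followed by encode should be a fixed point. A sketch of such a round-trip check; GeoHashUtils and its method signatures are assumed from contrib/spatial and should be treated as illustrative:

{code}
import org.apache.lucene.spatial.geohash.GeoHashUtils;

public class GeoHashRoundTrip {
  public static void main(String[] args) {
    // Assumed API: encode(lat, lng) -> hash, decode(hash) -> {lat, lng}.
    // A stable implementation reaches a fixed point immediately:
    // re-encoding the decoded point must return the same hash.
    String hash = GeoHashUtils.encode(52.3738007, 4.8909347);
    for (int i = 0; i < 5; i++) {
      double[] latLng = GeoHashUtils.decode(hash);
      String next = GeoHashUtils.encode(latLng[0], latLng[1]);
      System.out.println(hash + " -> (" + latLng[0] + "," + latLng[1] + ") -> " + next);
      hash = next; // with the bug above, this drifts to a different hash
    }
  }
}
{code}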
[jira] Commented: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files
[ https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747961#action_12747961 ] Luis Alves commented on LUCENE-1851:

{code}
javacc:
 [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
 [javacc] (type javacc with no arguments for help)
 [javacc] Reading from file /home/lafa/kisor2/workspace_eclipse33/lucene_trunk2/contrib/surround/src/java/org/apache/lucene/queryParser/surround/parser/QueryParser.jj . . .
 [javacc] org.javacc.parser.ParseException: Encountered at line 187, column 3.
 [javacc] Was expecting one of:
 [javacc] STRING_LITERAL ...
 [javacc] ...
 [javacc]
 [javacc] Detected 1 errors and 0 warnings.
{code}

I just re-synced and see the same problem; I think Michael forgot to commit the QueryParser.jj changes I made.

'ant javacc' in root project should also properly create contrib/surround Java files (LUCENE-1851)
Back-Compat on Contribs
I just talked to Robert about refactoring smartcn for the releases after 2.9. Robert raised the question of whether we should mark smartcn as experimental so that we can change interfaces, public methods, etc. during the refactoring. Would that make sense for 2.9, or is there no such thing as a back-compat policy for modules like that?

simon
Re: Back-Compat on Contribs
Simon Willnauer wrote:
I just talked to Robert about refactoring smartcn for the releases after 2.9. [...]

Contrib modules are not required to support back compat in any way currently - but they can also each have any more restrictive policy that we want. I consider Highlighter to be 1.4 right now (even though that's not explicit anywhere). Warning users that you don't plan on promising back compat with experimental warnings seems like a good idea to me.

--
- Mark
http://www.lucidimagination.com
[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747963#action_12747963 ] Robert Muir commented on LUCENE-1817:

I am looking at this today. One thing about this code that should also be corrected ASAP is that if you have a custom dictionary directory in .DCT format, the load() method will actually call save(). This will create a corresponding .MEM file in the same directory after loading the dictionary in DCT format. I really do not think load() methods should be creating or writing to files.

it is impossible to use a custom dictionary for SmartChineseAnalyzer
Key: LUCENE-1817
URL: https://issues.apache.org/jira/browse/LUCENE-1817
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor

It is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if the built-in one cannot be loaded (which is of course in the jar file, and should load):

{code}
public synchronized static WordDictionary getInstance() {
  if (singleInstance == null) {
    singleInstance = new WordDictionary();
    // load from jar file
    try {
      singleInstance.load();
    } catch (IOException e) {
      // loading from the jar file must fail before it checks the AnalyzerProfile (where this can be configured)
      String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
      singleInstance.load(wordDictRoot);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
  return singleInstance;
}
{code}

I think we should either correct this, document this, or disable custom dictionary support...
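One possible shape of a fix - check the configured data directory first and only fall back to the bundled dictionary - might look like this. This is a sketch, not the patch attached to the issue:

{code}
// Sketch of one possible fix: prefer the user-configured dictionary
// directory, falling back to the dictionary bundled in the jar.
// Illustration only; not the actual patch on this issue.
public synchronized static WordDictionary getInstance() {
  if (singleInstance == null) {
    singleInstance = new WordDictionary();
    String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
    try {
      if (wordDictRoot != null && wordDictRoot.length() > 0) {
        // a custom dictionary is explicitly configured: use it
        singleInstance.load(wordDictRoot);
      } else {
        // no custom configuration: load the built-in dictionary
        singleInstance.load();
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
  return singleInstance;
}
{code}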
[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1817:

Attachment: LUCENE-1817-mark-cn-experimental.patch

We should mark the smartcn module experimental, as we plan to do heavy refactoring after 2.9 is out. This patch adds a notice to package.html and the JavaDoc. Quoting Mark Miller from the list:

bq. Warning users that you don't plan on promising back compat with experimental warnings seems like a good idea to me.

it is impossible to use a custom dictionary for SmartChineseAnalyzer (LUCENE-1817)
Re: Back-Compat on Contribs
On Wed, Aug 26, 2009 at 4:49 PM, Mark Miller <markrmil...@gmail.com> wrote:
Contrib modules are not required to support back compat in any way currently - but they can also each have any more restrictive policy that we want. [...] Warning users that you don't plan on promising back compat with experimental warnings seems like a good idea to me.

I think so too - done!

simon
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747982#action_12747982 ] Grant Ingersoll commented on LUCENE-1486: - I'm not sure why the ComplexPhraseQuery itself is buried in the Parser. Can't the query stand on its own? Seems like it could be a useful class outside of the specific context of a QueryParser, no? Wildcards, ORs etc inside Phrase queries Key: LUCENE-1486 URL: https://issues.apache.org/jira/browse/LUCENE-1486 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.4 Reporter: Mark Harwood Assignee: Mark Miller Priority: Minor Fix For: 3.0, 3.1 Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include: checkMatches("\"j* smyth~\"", 1,2); //wildcards and fuzzies are OK in phrases checkMatches("\"(jo* -john) smith\"", 2); // boolean logic works checkMatches("\"jo* smith\"~2", 1,2,3); // position logic works. checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported Code plus Junit test to follow... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747967#action_12747967 ] Robert Muir commented on LUCENE-1817: - Uwe, i agree. currently it does do the autodetect (first checks for .MEM, then falls back on DCT). but if it has to fall back on DCT, it will create a .MEM file. it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747966#action_12747966 ] Uwe Schindler commented on LUCENE-1817: --- In my opinion, the loader should be able to load either .mem files (which should really be named *.ser, because they are serialized java objects) or DCT format files, either with autodetection or via two separate methods. If you want to load the files more quickly later, you could also save the DCT as a serialized object after loading, but this should be left to the user and not done automatically. it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
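A loader along the lines Uwe suggests might look roughly like this (a minimal sketch; the helper names are assumptions, and nothing here is the actual smartcn API):
{code}
// Sketch: autodetect by extension, with no write side effects.
public void load(File dictFile) throws IOException, ClassNotFoundException {
    if (dictFile.getName().endsWith(".mem")) {
        loadFromSerializedObject(dictFile);  // serialized java object (hypothetical helper)
    } else {
        loadFromDCT(dictFile);               // raw DCT format (hypothetical helper)
    }
    // Saving a serialized copy for faster future loads is left to the user.
}
{code}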
[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748025#action_12748025 ] Robert Muir commented on LUCENE-1817: - to make matters more complex, trying to load a bigram dictionary from a DCT file gave me: {noformat} # An unexpected error has been detected by Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, tid=5912 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64) # Problematic frame: # V [jvm.dll+0x3a78d0] {noformat} I will try to see if i can resolve this. it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Attachments: LUCENE-1817-mark-cn-experimental.patch it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1817: Attachment: LUCENE-1817.patch patch adds: * load custom dictionaries when the analyzer has been configured to do so * test that custom DCT dictionaries load * do not serialize/write files when loading DCT * change saveToObj() to package protected so someone can serialize their own dictionaries instead. the patch requires some binary dct data files which I will try to upload as a zip it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Attachments: LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748025#action_12748025 ] Robert Muir edited comment on LUCENE-1817 at 8/26/09 10:21 AM: --- to make matters more complex, trying to load a bigram dictionary from a DCT file gave me: {noformat} # An unexpected error has been detected by Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, tid=5912 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64) # Problematic frame: # V [jvm.dll+0x3a78d0] {noformat} apparently this is some clover issue in my eclipse and i turned it off, so it is an unrelated problem. was (Author: rcmuir): to make matters more complex, trying to load a bigram dictionary from a DCT file gave me: {noformat} # An unexpected error has been detected by Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6dc378d0, pid=3140, tid=5912 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode windows-amd64) # Problematic frame: # V [jvm.dll+0x3a78d0] {noformat} I will try to see if i can resolve this. it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1817: Attachment: dataFiles.zip the two files in this directory need to be placed in smartcn/test under o/a/l/analysis/cn/smart/hmm/customDictionaryDCT it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch it is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if it cannot load the built-in one (which is of course, in the jar file and should load) {code} public synchronized static WordDictionary getInstance() { if (singleInstance == null) { singleInstance = new WordDictionary(); // load from jar file try { singleInstance.load(); } catch (IOException e) { // loading from jar file must fail before it checks the AnalyzerProfile (where this can be configured) String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR; singleInstance.load(wordDictRoot); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } } return singleInstance; } {code} I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748046#action_12748046 ] Mark Harwood commented on LUCENE-1486: -- It does not stand on its own, as it is merely a temporary object used as a peculiarity of the way the parsing works. The SpanQuery family would be the legitimate standalone equivalents of this class. ComplexPhraseQuery objects are constructed during the first pass of parsing to capture everything between quotes as an opaque string. The ComplexPhraseQueryParser then calls parsePhraseElements(...) on these objects to complete the process of parsing in a second pass, where in this context any brackets etc take on a different meaning. There is no merit in making this externally visible. Wildcards, ORs etc inside Phrase queries Key: LUCENE-1486 URL: https://issues.apache.org/jira/browse/LUCENE-1486 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.4 Reporter: Mark Harwood Assignee: Mark Miller Priority: Minor Fix For: 3.0, 3.1 Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include: checkMatches("\"j* smyth~\"", 1,2); //wildcards and fuzzies are OK in phrases checkMatches("\"(jo* -john) smith\"", 2); // boolean logic works checkMatches("\"jo* smith\"~2", 1,2,3); // position logic works. checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported Code plus Junit test to follow... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
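For context, using the parser from the attached patch would look roughly like this (a sketch; the constructor is assumed to mirror QueryParser's field/analyzer signature, and 'searcher' is a preexisting IndexSearcher):
{code}
QueryParser parser = new ComplexPhraseQueryParser("name", new StandardAnalyzer());
// The first pass captures the quoted span as an opaque ComplexPhraseQuery;
// a second pass then parses the wildcards/booleans inside it:
Query q = parser.parse("\"(jo* -john) smith\"");
TopDocs hits = searcher.search(q, 10);
{code}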
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748064#action_12748064 ] Marvin Humphrey commented on LUCENE-1859: - The worst-case scenario seems kind of theoretical, since there are so many reasons that huge tokens are impractical. (Is a priority of major justified?) If there's a significant benefit to shrinking the allocation, it's minimizing average memory usage over time. But even that assumes a nearly pathological distribution in field size -- it would have to be large for early documents, then consistently small for subsequent documents. If it's scattered, you have to plan for worst case RAM usage as an app developer, anyway. Which generally means limiting token size. I assume that, based on this report, TermAttributeImpl never gets reset or discarded/recreated over the course of an indexing session? -0 if the reallocation happens no more often than once per document. -1 if the reallocation has to be performed in an inner loop. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748071#action_12748071 ] Tim Smith commented on LUCENE-1859: --- b1. The worst-case scenario seems kind of theoretical 100% agree, but even if one extremely large token gets added to the stream (and possibly dropped prior to indexing), the char[] grows without ever shrinking back (so it can result in memory usage growing if bad content is thrown in (and people have no shortage of bad content)) bq. Is a priority of major justified? major is just the default priority (feel free to change) bq. I assume that, based on this report, TermAttributeImpl never gets reset or discarded/recreated over the course of an indexing session? using reusable TokenStream will never cause the buffer to be nulled (as far as i can tell) for the lifetime of the thread (please correct me if i'm wrong on this) i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing this to be statically updated), just as a means to bound the max memory used here. currently, the memory use is bounded by Integer.MAX_VALUE (which is really big) If someone feeds a large text document with no spaces or other delimiting characters, a non-intelligent tokenizer would treat this as 1 big token (and grow the char[] accordingly) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748071#action_12748071 ] Tim Smith edited comment on LUCENE-1859 at 8/26/09 11:31 AM: - bq. The worst-case scenario seems kind of theoretical 100% agree, but even if one extremely large token gets added to the stream (and possibly dropped prior to indexing), the char[] grows without ever shrinking back (so it can result in memory usage growing if bad content is thrown in (and people have no shortage of bad content)) bq. Is a priority of major justified? major is just the default priority (feel free to change) bq. I assume that, based on this report, TermAttributeImpl never gets reset or discarded/recreated over the course of an indexing session? using reusable TokenStream will never cause the buffer to be nulled (as far as i can tell) for the lifetime of the thread (please correct me if i'm wrong on this) i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing this to be statically updated), just as a means to bound the max memory used here. currently, the memory use is bounded by Integer.MAX_VALUE (which is really big) If someone feeds a large text document with no spaces or other delimiting characters, a non-intelligent tokenizer would treat this as 1 big token (and grow the char[] accordingly) was (Author: tsmith): b1. The worst-case scenario seems kind of theoretical 100% agree, but even if one extremely large token gets added to the stream (and possibly dropped prior to indexing), the char[] grows without ever shrinking back (so it can result in memory usage growing if bad content is thrown in (and people have no shortage of bad content)) bq. Is a priority of major justified? major is just the default priority (feel free to change) bq. I assume that, based on this report, TermAttributeImpl never gets reset or discarded/recreated over the course of an indexing session?
using reusable TokenStream will never cause the buffer to be nulled (as far as i can tell) for the lifetime of the thread (please correct me if i'm wrong on this) i would argue for a semi-large value for MAX_BUFFER_SIZE (potentially allowing this to be statically updated), just as a means to bound the max memory used here. currently, the memory use is bounded by Integer.MAX_VALUE (which is really big) If someone feeds a large text document with no spaces or other delimiting characters, a non-intelligent tokenizer would treat this as 1 big token (and grow the char[] accordingly) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748072#action_12748072 ] Uwe Schindler commented on LUCENE-1859: --- The problem is that it would only be possible to shrink the buffer once per document, when TokenStream's reset() is called (which is done before each new document). To achieve this, all TokenStreams must notify the termattribute in reset() to shrink its size, which is impractical. On the other hand, the reallocation would otherwise happen for each token (what you called the inner loop). I agree that, normally, tokens will not grow very large (if they do, you are doing something wrong during tokenization). Even things like KeywordTokenizer, which only creates one token, have an upper limit on the term size (as far as I know). I would set this to minor and would not address it before 2.9. The problem of possibly large buffers existed even in older versions, with Token as the attribute implementation. It is the same problem as keeping an ArrayList around for a very long time: it also only grows but never automatically shrinks. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1859: -- Priority: Minor (was: Major) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748082#action_12748082 ] Tim Smith commented on LUCENE-1859: --- bq. which non-intelligent tokenizers are you referring to? nearly all the lucene tokenizers have 255 as a limit. perhaps this is a non-issue with regard to lucene tokenizers; however, Tokenizers can be implemented by anyone (not sure if there are adequate warnings about keeping tokens short). it also may not be possible to keep tokens short: i may need to index a rather long id string in a TokenStream fashion, which will grow the buffer without ever reclaiming the memory. perhaps it should be the responsibility of the Tokenizer to shrink the TermBuffer if it adds long tokens (but this will probably require some helper methods) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: javadoc update help
Agreed. I'll open an issue... Mike On Wed, Aug 26, 2009 at 10:26 AM, Mark Miller markrmil...@gmail.com wrote: Why wouldn't we? Isn't it just as much of a break to have the QP start spitting them off? Except now it's confusing because you get something different by default from the QP and by default from the direct object - sometimes this could make sense, I don't think the QP has to be locked into Query object defaults, but here it seems a bit odd to me. Also, now if you parse the output of the Query object toString, you will get different behavior - this isn't really a contract, but I think fewer surprises are better. - Mark Michael McCandless wrote: Right, it is confusing! QueryParser has already cutover to auto constant score, by default. But for direct instantiation of one of the MultiTermQueries, we still default to scoring BooleanQuery, but have declared that in 3.0 this will also switch to auto constant score. I'm tempted to simply switch the default today, for 2.9, instead. Then your original proposed javadoc is great. Mike On Wed, Aug 26, 2009 at 6:06 AM, Mark Miller markrmil...@gmail.com wrote: hmm...I guess this javadoc from MultiTermQuery confused me: * Note that {@link QueryParser} produces * MultiTermQueries using {@link * #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default. Uwe Schindler wrote: Even the old RangeQuery does it. Only the new class TermRangeQuery uses constant score (and the also deprecated ConstantScoreRangeQuery). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, August 26, 2009 11:33 AM To: java-dev@lucene.apache.org Subject: Re: javadoc update help Unfortunately, Prefix/Wildcard/FuzzyQuery, etc., still rewrite to scoring BooleanQuery by default, for now. In 3.0 this will change to constant score auto mode. At least, that's the plan now... however, QueryParser will produce queries in constant score auto mode, so we could consider changing the default for these queries in 2.9? If we don't want to change that default, how about something like this?: /** Set the maximum number of clauses permitted per * BooleanQuery. Default value is 1024. Note that queries that * derive from MultiTermQuery, such as WildcardQuery, * PrefixQuery and FuzzyQuery, may rewrite themselves to a * BooleanQuery before searching, and may therefore also hit this * limit. See {@link MultiTermQuery} for details. */ Mike On Tue, Aug 25, 2009 at 8:14 PM, Mark Miller markrmil...@gmail.com wrote: Having writer's block here: /** Set the maximum number of clauses permitted per BooleanQuery. * Default value is 1024. * <p>TermQuery clauses are generated from for example prefix queries and * fuzzy queries. Each TermQuery needs some buffer space during search, * so this parameter indirectly controls the maximum buffer requirements for * query search. * <p>When this parameter becomes a bottleneck for a Query one can use a * Filter. For example instead of a {@link TermRangeQuery} one can use a * {@link TermRangeFilter}. * <p>Normally the buffers are allocated by the JVM. When using for example * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to * the operating system. */ Okay, so prefix and fuzzy queries now will use a constantscore mode (indirectly, a filter) when it makes sense. So this comment is misleading. And the parameter doesn't control the max buffer - it possibly provides a top cutoff.
But now it doesn't even necessarily do that, because if the Query uses constantscore mode (multi-term queries auto pick by default), this setting doesn't even influence anything. I started to rewrite below - but then it feels like I almost need to start from scratch. I don't think we should claim this setting controls the maximum buffer requirements for query search either - that's a bit strong ;) And the buffer talk overall (including at the bottom) is a bit confusing. /** Set the maximum number of clauses permitted per BooleanQuery. * Default value is 1024. * <p>For example, TermQuery clauses can be generated from prefix queries and * fuzzy queries. Each TermQuery needs some buffer space during search, * so this parameter indirectly controls the maximum buffer requirements for * query search. * <p>When this parameter becomes a bottleneck for a Query one can use a * Filter. For example instead of a {@link TermRangeQuery} one can use a * {@link TermRangeFilter}. * <p>Normally the buffers are allocated by the JVM. When using for example * {@link org.apache.lucene.store.MMapDirectory} the buffering is left to * the operating system. */ I'm tempted to make it: /** Set the maximum number of clauses
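To make the distinction concrete, the rewrite behavior under discussion is switchable per query in the 2.9 MultiTermQuery API (a sketch from memory, not the final javadoc wording):
{code}
PrefixQuery pq = new PrefixQuery(new Term("body", "jo"));
// Scoring BooleanQuery rewrite expands to one clause per matching term,
// so it can hit BooleanQuery.getMaxClauseCount() (default 1024):
pq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
// Constant score auto picks a filter or a BooleanQuery based on term/doc
// counts, so the clause limit no longer necessarily applies:
pq.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT);
{code}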
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748083#action_12748083 ] Robert Muir commented on LUCENE-1859: - bq. perhaps it should be the responsibility of the Tokenizer to shrink the TermBuffer if it adds long tokens (but this will probably require some helper methods) I like this idea better than having any resizing behavior that I might not be able to control. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748077#action_12748077 ] Tim Smith commented on LUCENE-1859: --- bq. I would set this to minor and would not address it before 2.9. i would agree with this; i just reported the issue as it has the potential to cause memory issues (and i would think something should be done about it (in the long term at least)) also, the AttributeSource stuff does result in TermAttributeImpl being held onto pretty much forever if using a reusableTokenStream (correct?) wasn't a new Token() created by the indexer for each doc/field in 2.4, so the unbounded growth would only last at most for the duration of indexing that one document? with Attribute caching in the TokenStream, the buffer now lasts the duration of the TokenStream (or its underlying AttributeSource), which could remain until shutdown TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
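The lifetime Tim is asking about falls out of the reuse pattern itself; a sketch against the 2.9 API (the analyzer, field name, and readers are placeholders):
{code}
TokenStream ts = analyzer.reusableTokenStream("body", readerForDoc1);
TermAttribute termAtt = (TermAttribute) ts.addAttribute(TermAttribute.class);
// Later documents on the same thread get the same cached stream (and the
// same TermAttributeImpl) back, so a char[] grown by one oversized token
// stays allocated for every subsequent document:
ts = analyzer.reusableTokenStream("body", readerForDoc2);
{code}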
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748079#action_12748079 ] Robert Muir commented on LUCENE-1859: - bq. If someone feeds a large text document with no spaces or other delimiting characters, a non-intelligent tokenizer would treat this a 1 big token (and grow the char[] accordingly) which non-intelligent tokenizers are you referring to? nearly all the lucene tokenizers have 255 as a limit. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748089#action_12748089 ] Marvin Humphrey commented on LUCENE-1859: - IMO, the benefit of adding these theoretical helper methods to lower average -- but not peak -- memory usage by non-core Tokenizers which are probably doing something impractical anyway... does not justify the complexity cost. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748102#action_12748102 ] Marvin Humphrey commented on LUCENE-1859: - i fail to see the complexity of adding one method to TermAttribute: Death by a thousand cuts. This is one cut. I wouldn't even add the note to the documentation. If you emit large tokens, you have to plan for obscene peak memory usage anyway, and if you're not prepared for that, you deserve what you get. Keeping the average down doesn't help that. The only reason to do this is to keep average memory usage down for the hell of it, and if it goes in, it should be an implementation detail. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748091#action_12748091 ] Tim Smith commented on LUCENE-1859: --- i fail to see the complexity of adding one method to TermAttribute: {code} public void shrinkBuffer(int maxSize) { if ((maxSize > termLength) && (termBuffer.length > maxSize)) { termBuffer = new char[maxSize]; } } {code} Not having this is fine as long as it's well documented that emitting large tokens can and will result in memory growing uncontrollably (especially if using many indexing threads) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
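For illustration, a custom Tokenizer could then reclaim the memory once per document, assuming a helper like the shrinkBuffer() sketched above were added to TermAttribute (it is not part of the current API; a fuller version would also copy the current term into the new buffer):
{code}
// Hypothetical usage inside a custom Tokenizer, invoked once per document:
public void reset(Reader input) throws IOException {
    super.reset(input);
    termAtt.shrinkBuffer(MAX_BUFFER_SIZE);  // reclaim an oversized buffer (assumed helper)
}
{code}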
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748103#action_12748103 ] Tim Smith commented on LUCENE-1859: --- bq. Death by a thousand cuts. This is one cut. By this logic, nothing new can ever be added. The thing that brought this to my attention was the new TokenStream API (one cut (rather big, but i like the new API so i'm happy with the blood loss (makes me dizzy and happy))) The new TokenStream API holds onto these char[] much longer (if not forever), so this results in memory growing unbounded unless there is some facility to truncate/null out the char[] bq. I wouldn't even add the note to the documentation. I don't believe there is ever any valid argument against adding documentation. If someone can shoot themselves in the foot with the gun you gave them, at least tell them not to point the gun at their foot with the safety off. bq. The only reason to do this is to keep average memory usage down for the hell of it. keeping average memory usage down prevents those wonderful OutOfMemory Exceptions (which are difficult at best to recover from) TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default
[ https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1860: --- Component/s: Search Priority: Minor (was: Major) switch MultiTermQuery to constant score auto rewrite by default - Key: LUCENE-1860 URL: https://issues.apache.org/jira/browse/LUCENE-1860 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ QueryParser which does constant score auto. The new multi-term queries already set this default, so the only core queries this will impact are PrefixQuery and WildcardQuery. FuzzyQuery, which has its own rewrite to BooleanQuery, will keep doing so. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default
[ https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1860: --- Attachment: LUCENE-1860.patch switch MultiTermQuery to constant score auto rewrite by default - Key: LUCENE-1860 URL: https://issues.apache.org/jira/browse/LUCENE-1860 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1860.patch Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ QueryParser which does constant score auto. The new multi-term queries already set this default, so the only core queries this will impact are PrefixQuery and WildcardQuery. FuzzyQuery, which has its own rewrite to BooleanQuery, will keep doing so. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default
switch MultiTermQuery to constant score auto rewrite by default - Key: LUCENE-1860 URL: https://issues.apache.org/jira/browse/LUCENE-1860 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 2.9 Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ QueryParser which does constant score auto. The new multi-term queries already set this default, so the only core queries this will impact are PrefixQuery and WildcardQuery. FuzzyQuery, which has its own rewrite to BooleanQuery, will keep doing so. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
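If the default does flip, an application that depends on the old scoring expansion could presumably pin it back explicitly per query (a sketch against the 2.9 API):
{code}
WildcardQuery wq = new WildcardQuery(new Term("body", "smy*"));
// Restore the scoring BooleanQuery expansion for this query only:
wq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
{code}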
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748109#action_12748109 ] Marvin Humphrey commented on LUCENE-1859: - bq. I don't believe there is ever any valid argument against adding documentation. The more that documentation grows, the harder it is to absorb. The more bells and whistles on an API, the harder it is to grok and to use effectively. The more a code base bloats, the harder it is to maintain or to evolve. bq. keeping average memory usage down prevents those wonderful OutOfMemory Exceptions No, it won't. If someone is emitting large tokens regularly, it is likely that several threads will require large RAM footprints simultaneously, and an OOM will occur. That would be the common case. If someone is emitting large tokens periodically, well, this doesn't prevent the OOM, it just makes it less likely. That's not worthless, but it's not something anybody should count on when assessing required RAM usage. Keeping average memory usage down is good for the system at large. If this is implemented, that should be the justification. TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was also an issue with Token previously as well If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory Obviously, it can be argued that Tokenizer's should never emit large tokens, however it seems that the TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set I don't think i have actually encountered issues with this yet, however it seems like if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario) perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1861) Add contrib libs to classpath for javadoc
Add contrib libs to classpath for javadoc - Key: LUCENE-1861 URL: https://issues.apache.org/jira/browse/LUCENE-1861 Project: Lucene - Java Issue Type: Wish Reporter: Mark Miller Priority: Minor I don't know Ant well enough to just do this easily, so I've labeled this a wish - it would be nice to get rid of all the errors/warnings that not finding these classes generates when building javadoc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1861) Add contrib libs to classpath for javadoc
[ https://issues.apache.org/jira/browse/LUCENE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1861: Component/s: Build Add contrib libs to classpath for javadoc - Key: LUCENE-1861 URL: https://issues.apache.org/jira/browse/LUCENE-1861 Project: Lucene - Java Issue Type: Wish Components: Build Reporter: Mark Miller Priority: Minor I don't know Ant well enough to just do this easily, so I've labeled this a wish - it would be nice to get rid of all the errors/warnings that not finding these classes generates when building javadoc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never shrink if it grows too big
[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748122#action_12748122 ] Tim Smith commented on LUCENE-1859: --- On documentation: any warnings/precautions should always be called out (calling out the external link (wiki/etc.) for in-depth details). In-depth descriptions of the details can be pushed off to wiki pages or external references, as long as a link is provided for the curious, but I would still argue that they should exist. bq. this doesn't prevent the OOM, it just makes it less likely All you can ever do for OOM issues is make them less likely (short of just fixing a bug that holds onto memory like mad). If accepting arbitrary content, there will always be a possibility of the content forcing OOM issues. In general, everything possible should be done to reduce the likelihood of such OOM issues where possible (IMO). TermAttributeImpl's buffer will never shrink if it grows too big -- Key: LUCENE-1859 URL: https://issues.apache.org/jira/browse/LUCENE-1859 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9 Reporter: Tim Smith Priority: Minor This was previously an issue with Token as well. If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory. Obviously, it can be argued that Tokenizers should never emit large tokens; however, it seems that TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set. I don't think I have actually encountered issues with this yet; however, it seems that if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case scenario). Perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748126#action_12748126 ] Robert Muir commented on LUCENE-1817: - I looked at this file format and I am going to create smaller custom dictionaries for testing. This way we do not have huge files in svn. it is impossible to use a custom dictionary for SmartChineseAnalyzer Key: LUCENE-1817 URL: https://issues.apache.org/jira/browse/LUCENE-1817 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch It is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this. This is because the custom dictionary is only loaded if the built-in one (which is, of course, in the jar file and should always load) cannot be loaded:
{code}
public synchronized static WordDictionary getInstance() {
  if (singleInstance == null) {
    singleInstance = new WordDictionary();
    // load from jar file
    try {
      singleInstance.load();
    } catch (IOException e) {
      // loading from the jar file must fail before the AnalyzerProfile
      // (where the custom dictionary can be configured) is ever checked
      String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
      singleInstance.load(wordDictRoot);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
  return singleInstance;
}
{code}
I think we should either correct this, document this, or disable custom dictionary support... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
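A sketch of one possible fix, for illustration only (this is not a committed patch): prefer the configured AnalyzerProfile directory when one is set, and only fall back to the built-in dictionary bundled in the jar.
{code}
// Sketch of one possible fix (not committed code): a custom dictionary
// directory, if configured, wins over the built-in jar dictionary.
public synchronized static WordDictionary getInstance() {
  if (singleInstance == null) {
    singleInstance = new WordDictionary();
    String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
    try {
      if (wordDictRoot != null && wordDictRoot.length() > 0) {
        singleInstance.load(wordDictRoot); // custom dictionary
      } else {
        singleInstance.load(); // built-in dictionary from the jar
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
  return singleInstance;
}
{code}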
[jira] Updated: (LUCENE-950) IllegalArgumentException parsing foo~1
[ https://issues.apache.org/jira/browse/LUCENE-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-950: Attachment: lucene_950_08_26_2009.patch This patch fixes the bug: it no longer throws IllegalArgumentException when the user enters a fuzzy query with similarity greater than or equal to 1; instead, it converts the FuzzyQuery into a simple TermQuery, ignoring the fuzzy value. IllegalArgumentException parsing foo~1 Key: LUCENE-950 URL: https://issues.apache.org/jira/browse/LUCENE-950 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.1, 2.2 Environment: Java 1.5 Reporter: Eleanor Joslin Priority: Minor Attachments: lucene_950_08_26_2009.patch If I run this:
{code}
QueryParser parser = new QueryParser("myField", new SimpleAnalyzer());
try {
  parser.parse("foo~1");
} catch (ParseException e) {
  // OK
}
{code}
I get this:
{noformat}
Exception in thread "main" java.lang.IllegalArgumentException: minimumSimilarity >= 1
  at org.apache.lucene.search.FuzzyQuery.<init>(FuzzyQuery.java:58)
  at org.apache.lucene.queryParser.QueryParser.getFuzzyQuery(QueryParser.java:711)
  at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1090)
  at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
  at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
  at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
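The attached patch is not reproduced here, but the behavior it describes amounts to a guard along these lines (an illustrative sketch of QueryParser.getFuzzyQuery, not the actual patch contents):
{code}
// Illustrative sketch only (not the attached patch): guard against
// minSimilarity >= 1 before constructing the FuzzyQuery, degrading to
// an exact TermQuery as the patch description says.
protected Query getFuzzyQuery(String field, String termStr, float minSimilarity)
    throws ParseException {
  Term t = new Term(field, termStr);
  if (minSimilarity >= 1.0f) {
    return new TermQuery(t); // "foo~1" behaves like plain "foo"
  }
  return new FuzzyQuery(t, minSimilarity);
}
{code}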
[jira] Commented: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default
[ https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748157#action_12748157 ] Uwe Schindler commented on LUCENE-1860: --- If we change this, should we keep the good old RangeQuery as it is (boolean rewrite)? Because there is also the deprecated ConstantScoreRangeQuery. switch MultiTermQuery to constant score auto rewrite by default - Key: LUCENE-1860 URL: https://issues.apache.org/jira/browse/LUCENE-1860 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1860.patch Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ QueryParser which does constant score auto. The new multi-term queries already set this default, so the only core queries this will impact are PrefixQuery and WildcardQuery. FuzzyQuery, which has its own rewrite to BooleanQuery, will keep doing so. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-1851) 'ant javacc' in root project should also properly create contrib/surround Java files
[ https://issues.apache.org/jira/browse/LUCENE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1851. --- Resolution: Fixed Fixed it. Sorry about that. Committed revision 808224. 'ant javacc' in root project should also properly create contrib/surround Java files Key: LUCENE-1851 URL: https://issues.apache.org/jira/browse/LUCENE-1851 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.9 Reporter: Paul Elschot Assignee: Michael Busch Priority: Minor Fix For: 2.9 Attachments: javacc20090825.patch, LUCENE-1851.patch For consistency after LUCENE-1829 which did the same for contrib/queryparser -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1860) switch MultiTermQuery to constant score auto rewrite by default
[ https://issues.apache.org/jira/browse/LUCENE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748213#action_12748213 ] Michael McCandless commented on LUCENE-1860: bq. should we keep the good old RangeQuery as it is (boolean rewrite)? Because there is also the deprecated ConstantScoreRangeQuery. I think we should? That's what it is right now (and the patch leaves it). switch MultiTermQuery to constant score auto rewrite by default - Key: LUCENE-1860 URL: https://issues.apache.org/jira/browse/LUCENE-1860 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1860.patch Right now it defaults to scoring BooleanQuery, and that's inconsistent w/ QueryParser which does constant score auto. The new multi-term queries already set this default, so the only core queries this will impact are PrefixQuery and WildcardQuery. FuzzyQuery, which has its own rewrite to BooleanQuery, will keep doing so. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene 2.9 release
Mark Miller wrote: I'm tempted to say let's start the freeze tomorrow instead - I could do another full day of doc/packaging no problem, I think (a bunch left to do on the website stuff alone) - and technically the releaseToDo wants everything to go through a patch in JIRA first while in freeze (not a bad idea at all) - which slows things down. Also, I don't have much time to do the RC if I'm on doc all day. Anyone object to starting tomorrow rather than today? I think I'm ready for freeze tomorrow if everyone else is. I won't branch - but I'll bump the right numbers and whatnot on trunk. Then after branching (in a week), I'll advance trunk's numbers again (to 3.0-dev) -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Adding Field twice w/ Payload - bug or works as designed?
Hi, I don't know if it's supported or not, but I wrote the following simple example code to describe what I want.
{code}
Directory dir = new RAMDirectory();
Analyzer a = new SimpleAnalyzer();
IndexWriter writer = new IndexWriter(dir, a, MaxFieldLength.UNLIMITED);
Document doc = new Document();
doc.add(new Field("a", "abc", Store.NO, Index.NOT_ANALYZED));
final Term t = new Term("a", "abc");
doc.add(new Field(t.field(), new TokenStream() {
  boolean done = false;
  @Override
  public Token next(Token reusableToken) throws IOException {
    if (done) return null;
    done = true;
    reusableToken.setTermBuffer(t.text());
    reusableToken.setPayload(new Payload(new byte[] { 1 }));
    return reusableToken;
  }
}));
writer.addDocument(doc);
writer.commit();
writer.close();
IndexReader reader = IndexReader.open(dir, true);
TermPositions tp = reader.termPositions(t);
tp.next();
tp.nextPosition();
System.out.println(tp.getPayloadLength());
reader.close();
{code}
Basically, I add the same Field twice (a:abc); the second time I just set a Payload. The program prints 0 as the payload length (one line above the last). If I change either the field name or the field text, it prints 1. Bug or works as designed? Shai
Re: Adding Field twice w/ Payload - bug or works as designed?
The first occurrence of your term does not have a payload, the second one does. So getPayloadLength() correctly returns 0, because the TermPositions is at the first occurrence. If you call nextPosition() again and then dump the payload length, it should be 1. Michael - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
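For clarity, a minimal sketch of the corrected read loop, continuing Shai's example above (the isPayloadAvailable() check and close() call are additions for safety, not part of the original exchange):
{code}
// Each nextPosition() call advances within the current document, so the
// payload only shows up at the second occurrence of the term.
TermPositions tp = reader.termPositions(t);
if (tp.next()) {
  tp.nextPosition();
  System.out.println(tp.getPayloadLength()); // 0: first occurrence, no payload
  tp.nextPosition();
  if (tp.isPayloadAvailable()) {
    byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
    System.out.println(payload.length);      // 1: second occurrence's payload
  }
}
tp.close();
{code}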
Re: Adding Field twice w/ Payload - bug or works as designed?
Ohh, right. I missed that. Indeed, after I call nextPosition again, it prints 1. Thanks! Shai - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org