[jira] [Updated] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2540:
--


Tried reproducing on my Mac, with Java 1.6.0_26 and ant 1.8.2, but with no success.
Any clue as to what environment I need to test on, or what the problem in the test
case is?

> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl "http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1" \
>     -H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.
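
For reference, a rough SolrJ equivalent of the same update (a sketch only: the URL and field values are illustrative, and it assumes setCommitWithin is available on UpdateRequest in the SolrJ version in use):

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

// Sketch: send one document and ask Solr to commit it within 10 seconds.
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "123");

UpdateRequest req = new UpdateRequest();
req.add(doc);
req.setCommitWithin(10000);   // milliseconds; assumed available in this SolrJ version
req.process(server);          // throws SolrServerException/IOException in real code
{code}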




Re: Trunk test failure: ExtractingRequestHandlerTest.testCommitWithin() [was: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync]

2011-09-14 Thread Jan Høydahl
Hi,

It succeeds here. Will try to reproduce.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 14. sep. 2011, at 21:06, Chris Hostetter wrote:

> 
> : This is 100% reproducible on my local machine (run from 
> solr/contrib/extraction/):
> : 
> : ant test -Dtestcase=ExtractingRequestHandlerTest 
> -Dtestmethod=testCommitWithin 
> -Dtests.seed=-2b35f16e02bddd0d:5c36eb67e44fc16d:-54d0d485d6a45315
> 
> I reopened SOLR-2540, where this test was added.
> 
> Jan? Are you looking at this?
> 
> : 
> : Steve
> : 
> : > -Original Message-
> : > From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> : > Sent: Tuesday, September 13, 2011 12:09 PM
> : > To: dev@lucene.apache.org
> : > Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync
> : > 
> : > Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/239/
> : > 
> : > 1 tests failed.
> : > FAILED:
> : > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
> : > tWithin
> : > 
> : > Error Message:
> : > Exception during query
> : > 
> : > Stack Trace:
> : > java.lang.RuntimeException: Exception during query
> : >   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:396)
> : >   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:363)
> : >   at
> : > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
> : > tWithin(ExtractingRequestHandlerTest.java:306)
> : >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> : >   at
> : > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
> : > :57)
> : >   at
> : > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
> : > mpl.java:43)
> : >   at java.lang.reflect.Method.invoke(Method.java:616)
> : >   at
> : > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMeth
> : > od.java:44)
> : >   at
> : > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallabl
> : > e.java:15)
> : >   at
> : > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod
> : > .java:41)
> : >   at
> : > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.
> : > java:20)
> : >   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
> : >   at
> : > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
> : > :28)
> : >   at
> : > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
> : > 1)
> : >   at
> : > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.
> : > java:76)
> : >   at
> : > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
> : > .java:148)
> : >   at
> : > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
> : > .java:50)
> : >   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
> : >   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
> : >   at
> : > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
> : >   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
> : >   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
> : >   at
> : > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
> : > :28)
> : >   at
> : > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
> : > 1)
> : >   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> : >   at
> : > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java
> : > :35)
> : >   at
> : > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Prov
> : > ider.java:146)
> : >   at
> : > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.jav
> : > a:97)
> : >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> : >   at
> : > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
> : > :57)
> : >   at
> : > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
> : > mpl.java:43)
> : >   at java.lang.reflect.Method.invoke(Method.java:616)
> : >   at
> : > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(
> : > ProviderFactory.java:103)
> : >   at $Proxy0.invoke(Unknown Source)
> : >   at
> : > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireS
> : > tarter.java:145)
> : >   at
> : > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(Suref
> : > ireStarter.java:87)
> : >   at
> : > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
> : > Caused by: java.lang.RuntimeException: REQUEST FAILED:
> : > xpath=//*[@numFound='1']
> : >   xml response was: 
> : > <response>
> : >   <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
> : >   <result name="response" numFound="0" start="0"/>
> : > </response>
> : >   request was:start=0&q=id:one&qt=standard&rows=20&version=2.2
> : >   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:389)
> : >   ... 36 more

Re: [jira] [Created] (SOLR-2760) Cannot "ant dist or ant example"

2011-09-14 Thread Bill Bell
I did a clean, but there were remnants...

I just did a "svn co" into a new dir, and all is fine. Thanks.

From:  Grant Ingersoll 
Reply-To:  
Date:  Wed, 14 Sep 2011 15:18:19 -0400
To:  
Subject:  Re: [jira] [Created] (SOLR-2760) Cannot "ant dist or ant example"

Did you clean first?

On Sep 14, 2011, at 1:49 AM, Bill Bell wrote:

> Thoughts on this?
> 
> I did an "svn up"
> 
> 
> On 9/13/11 11:00 PM, "Bill Bell (JIRA)"  wrote:
> 
>> Cannot "ant dist or ant example"
>> 
>> 
>> Key: SOLR-2760
>> URL: https://issues.apache.org/jira/browse/SOLR-2760
>> Project: Solr
>>  Issue Type: Bug
>>Reporter: Bill Bell
>> 
>> 
>> Path: .
>> URL: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr
>> Repository Root: http://svn.apache.org/repos/asf
>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> Revision: 1170435
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: chrism
>> Last Changed Rev: 1170425
>> Last Changed Date: 2011-09-13 21:36:56 -0600 (Tue, 13 Sep 2011)
>> 
>> 
>> Then
>> 
>>> ant dist or ant example
>> 
>> compile-core:
>>[javac] Compiling 23 source files to
>> /Users/bill/solr/trunk/modules/queries/build/classes/java
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:48: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>[javac] context.put("searcher",searcher);
>>[javac]^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:61: cannot find symbol
>>[javac] symbol  : class ConstDoubleDocValues
>>[javac] location: class
>> org.apache.lucene.queries.function.valuesource.NormValueSource
>>[javac]   return new ConstDoubleDocValues(0.0, this);
>>[javac]  ^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NumDocsValueSource.java:40: cannot find symbol
>>[javac] symbol  : class ConstIntDocValues
>>[javac] location: class
>> org.apache.lucene.queries.function.valuesource.NumDocsValueSource
>>[javac] return new
>> ConstIntDocValues(ReaderUtil.getTopLevelContext(readerContext).reader.numD
>> ocs(), this);
>>[javac]^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/QueryValueSource.java:73: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>[javac] context.put(this, w);
>>[javac]^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/ScaleFloatFunction.java:96: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>[javac] context.put(this.source, scaleInfo);
>>[javac]^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/SumTotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>[javac] context.put(this, new LongDocValues(this) {
>>[javac]^
>>[javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/TotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>[javac] context.put(this, new LongDocValues(this) {
>>[javac]^
>>[javac] 2 errors
>>[javac] 5 warnings
>> 
>> 


Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com





[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 446 - Failure

2011-09-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/446/

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:401)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 11892 lines...]






[JENKINS] Lucene-trunk - Build # 1677 - Still Failing

2011-09-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1677/

2 tests failed.
FAILED:  org.apache.lucene.index.TestTermsEnum.testIntersectRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
at 
org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
at 
org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
at 
org.apache.lucene.index.TestTermsEnum.testIntersectRandom(TestTermsEnum.java:266)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)


FAILED:  org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
at 
org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
at 
org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
at 
org.apache.lucene.util.automaton.TestCompiledAutomaton.build(TestCompiledAutomaton.java:39)
at 
org.apache.lucene.util.automaton.TestCompiledAutomaton.testTerms(TestCompiledAutomaton.java:55)
at 
org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom(TestCompiledAutomaton.java:101)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 12824 lines...]






[jira] [Resolved] (LUCENE-3434) Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable

2011-09-14 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3434.


   Resolution: Fixed
Fix Version/s: 4.0
   3.5
 Assignee: Chris Male

Trunk: Committed revision 1170942. (with Robert's change)
3x: Committed revision 1170939.

> Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable
> -
>
> Key: LUCENE-3434
> URL: https://issues.apache.org/jira/browse/LUCENE-3434
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3434-3x.patch, LUCENE-3434-trunk.patch
>
>
> Both ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper have setters that 
> change state which impacts their analysis stack.  If these Analyzers are going 
> to become reusable, then that state must be immutable, as changing it will 
> have no effect.
> The process will be similar to QueryAutoStopWordAnalyzer: I will remove the 
> setters in trunk and deprecate them in 3x.
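
A hedged sketch of what construction looks like once the per-field configuration is supplied up front instead of via setters (class and constructor availability for the post-change API is assumed):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.util.Version;

// All per-field analyzers are handed over at construction time, so nothing
// about the wrapper's analysis stack can change afterwards.
Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
perField.put("id", new KeywordAnalyzer());

Analyzer analyzer = new PerFieldAnalyzerWrapper(
    new WhitespaceAnalyzer(Version.LUCENE_34), perField);
{code}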




[jira] [Resolved] (LUCENE-3019) FVH: uncontrollable color tags

2011-09-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-3019.


Resolution: Fixed

trunk: Committed revision 1170908.
3x: Committed revision 1170913.

> FVH: uncontrollable color tags
> --
>
> Key: LUCENE-3019
> URL: https://issues.apache.org/jira/browse/LUCENE-3019
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3019.patch
>
>
> Multi-colored tags are a feature of FVH, but which color is used for each 
> term is uncontrollable (or, more precisely, unexpected by users).




[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-14 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105030#comment-13105030
 ] 

Shawn Heisey commented on SOLR-2739:


I've now tried it on a completely unrelated Debian system (Squeeze, ext4) with 
the same TestSqlEntityProcessorDelta failure.  All packages on this system are 
from the standard Debian repositories, and include ant 1.8.1, which I remember 
reading isn't supported by the lucene/solr build system.  I also tried to do it 
on a Debian Lenny system, but it's running ant 1.7.0 and won't run Solr's 
build.xml at all.


> TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some 
> systems
> ---
>
> Key: SOLR-2739
> URL: https://issues.apache.org/jira/browse/SOLR-2739
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.3
>Reporter: Shawn Heisey
>Assignee: Hoss Man
> Fix For: 3.4, 4.0
>
>
> Shawn Heisey noted on the mailing list that he was getting consistent 
> failures from TestSqlEntityProcessorDelta.testNonWritablePersistFile on his 
> machine.
> I can't reproduce his exact failures, but the test is hinky enough that I 
> want to try to clean it up.




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-14 Thread Aaron McCurry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105017#comment-13105017
 ] 

Aaron McCurry commented on LUCENE-2205:
---

I will update my patch taking your comments into account and re-submit.

I have tried the termInfosIndexDivisor and it does help with memory consumption, 
but it typically costs access time (if I remember correctly from when I last 
tried changing the values to tune the index).  Since I started running with the 
patch above I haven't had any memory issues relating to index size, so I can't 
really comment on its effect.


> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
> the index pointer long[] and create a more memory efficient data structure.
> ---
>
> Key: LUCENE-2205
> URL: https://issues.apache.org/jira/browse/LUCENE-2205
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
> Environment: Java5
>Reporter: Aaron McCurry
> Attachments: RandomAccessTest.java, TermInfosReader.java, 
> TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
> TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as 
> an index offset.  
> The performance benefits are staggering on my test index (of size 6.2 GB, with 
> ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the 
> terminfos into memory was reduced to 17% of its original size, from 291.5 
> MB to 49.7 MB.  Random access speed improved by 1-2%, segment load time is 
> ~40% faster as well, and full GCs on my JVM were made 7 times faster.
> I have already performed the work and am offering this code as a patch.  
> Currently all tests in trunk pass with this new code enabled.  I did write 
> a system property switch to allow the original implementation to be used 
> as well:
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog post about this patch; here is the link:
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
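
To make the idea concrete, here is a toy, self-contained illustration of packing parallel arrays into a single byte[] plus an int[] of offsets. The entry layout is invented for the example; the actual patch packs term bytes, TermInfo fields and the index pointer.

{code}
import java.io.*;

public class PackedEntries {
  public static void main(String[] args) throws IOException {
    String[] terms = {"apple", "banana", "cherry"};
    long[] pointers = {0L, 1234L, 98765L};

    ByteArrayOutputStream packed = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(packed);
    int[] offsets = new int[terms.length];          // index offset per entry
    for (int i = 0; i < terms.length; i++) {
      offsets[i] = packed.size();
      out.writeUTF(terms[i]);                       // term text
      out.writeLong(pointers[i]);                   // index pointer
    }
    byte[] data = packed.toByteArray();             // one array instead of three

    // random access: jump to entry 1 via its offset and decode it
    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(data, offsets[1], data.length - offsets[1]));
    System.out.println(in.readUTF() + " @ " + in.readLong());
  }
}
{code}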




[jira] [Commented] (LUCENE-3019) FVH: uncontrollable color tags

2011-09-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105015#comment-13105015
 ] 

Koji Sekiguchi commented on LUCENE-3019:


I'll commit soon.

> FVH: uncontrollable color tags
> --
>
> Key: LUCENE-3019
> URL: https://issues.apache.org/jira/browse/LUCENE-3019
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3019.patch
>
>
> Multi-colored tags are a feature of FVH, but which color is used for each 
> term is uncontrollable (or, more precisely, unexpected by users).




[jira] [Resolved] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2756.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.5
 Assignee: Steven Rowe

Committed to branch_3x and (partially) to trunk.

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed that needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104982#comment-13104982
 ] 

Doug Cutting commented on LUCENE-2205:
--

Also have you tried specifying termInfosIndexDivisor?  I added that feature 
many years ago to address the memory footprint of the terms index.

http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#open(org.apache.lucene.store.Directory,
 org.apache.lucene.index.IndexDeletionPolicy, boolean, int)

If this is 2 then the memory use is halved, but the compute cost of looking up 
each search term is doubled.  It would be interesting to compare the 
performance of the two approaches, since the approach of this patch probably 
increases lookup cost somewhat too.
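
For example, using the overload linked above (the path is illustrative; the four-argument open is the one documented at that link):

{code}
import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// termInfosIndexDivisor = 2: load every 2nd indexed term, roughly halving the
// RAM used by the terms index at the cost of a slower lookup per query term.
Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexReader reader = IndexReader.open(dir, null, true, 2); // (dir, deletionPolicy, readOnly, divisor)
{code}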

> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
> the index pointer long[] and create a more memory efficient data structure.
> ---
>
> Key: LUCENE-2205
> URL: https://issues.apache.org/jira/browse/LUCENE-2205
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
> Environment: Java5
>Reporter: Aaron McCurry
> Attachments: RandomAccessTest.java, TermInfosReader.java, 
> TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
> TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as 
> an index offset.  
> The performance benefits are staggering on my test index (of size 6.2 GB, with 
> ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the 
> terminfos into memory was reduced to 17% of its original size, from 291.5 
> MB to 49.7 MB.  Random access speed improved by 1-2%, segment load time is 
> ~40% faster as well, and full GCs on my JVM were made 7 times faster.
> I have already performed the work and am offering this code as a patch.  
> Currently all tests in trunk pass with this new code enabled.  I did write 
> a system property switch to allow the original implementation to be used 
> as well:
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog post about this patch; here is the link:
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104976#comment-13104976
 ] 

Doug Cutting commented on LUCENE-2205:
--

A few comments on the patch:
 - It'd probably be better not to make TermInfosReaderIndex and its subclasses 
public, to reduce the APIs that must be supported long-term.
 - Could you use BufferedIndexInput directly instead of re-implementing 
readVInt, readVLong, etc. (see the sketch below)?
 - The code uses tabs for indentation.  Lucene's standard is 2-spaces per 
level, no tabs. http://wiki.apache.org/lucene-java/HowToContribute
 - It would be good to add some tests, perhaps running some existing set of 
test searches with a reader  configured to use the new TermInfosReaderIndex 
implementation.

Probably the "Fix-version" of this patch should be 3.5, since it's not fixing a 
regression.
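
A minimal sketch of what the second point suggests, reading values through the existing IndexInput API rather than re-implementing the decoding. The file name and layout here are hypothetical.

{code}
import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;

// Hypothetical terms-index file: a vInt count followed by vLong deltas.
Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexInput in = dir.openInput("terms.idx");
int count = in.readVInt();
long pointer = 0;
for (int i = 0; i < count; i++) {
  pointer += in.readVLong();   // delta-decoded index pointer
}
in.close();
{code}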

> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
> the index pointer long[] and create a more memory efficient data structure.
> ---
>
> Key: LUCENE-2205
> URL: https://issues.apache.org/jira/browse/LUCENE-2205
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
> Environment: Java5
>Reporter: Aaron McCurry
> Attachments: RandomAccessTest.java, TermInfosReader.java, 
> TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
> TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as 
> an index offset.  
> The performance benefits are staggering on my test index (of size 6.2 GB, with 
> ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the 
> terminfos into memory was reduced to 17% of its original size, from 291.5 
> MB to 49.7 MB.  Random access speed improved by 1-2%, segment load time is 
> ~40% faster as well, and full GCs on my JVM were made 7 times faster.
> I have already performed the work and am offering this code as a patch.  
> Currently all tests in trunk pass with this new code enabled.  I did write 
> a system property switch to allow the original implementation to be used 
> as well:
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog post about this patch; here is the link:
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html




Jira 3.4 -> 3.5 spam

2011-09-14 Thread Michael McCandless
Sorry for all the spam!

I went to Jira to mark 3.4 as "released" and it had this nice
innocent-looking checkbox to move all still-open issues to 3.5.

Next time I won't check it!  I'll use the bulk edit option instead,
which has a checkbox to not send spam...

Mike McCandless

http://blog.mikemccandless.com




[jira] [Updated] (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1812:
---

Fix Version/s: (was: 3.4)
   3.5

> Static index pruning by in-document term frequency (Carmel pruning)
> ---
>
> Key: LUCENE-1812
> URL: https://issues.apache.org/jira/browse/LUCENE-1812
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Andrzej Bialecki 
>Assignee: Doron Cohen
> Fix For: 3.5, 4.0
>
> Attachments: pruning.patch, pruning.patch, pruning.patch, 
> pruning.patch
>
>
> This module provides tools to produce a subset of input indexes by removing 
> postings data for those terms where their in-document frequency is below a 
> specified threshold. The net effect of this processing is a much smaller 
> index that for common types of queries returns nearly identical top-N results 
> as compared with the original index, but with increased performance. 
> Optionally, stored values and term vectors can also be removed. This 
> functionality is largely independent, so it can be used without term pruning 
> (when term freq. threshold is set to 1).
> As the threshold value increases, the total size of the index decreases, 
> search performance increases, and recall decreases (i.e. search quality 
> deteriorates). NOTE: especially phrase recall deteriorates significantly at 
> higher threshold values. 
> Primary purpose of this class is to produce small first-tier indexes that fit 
> completely in RAM, and store these indexes using 
> IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class 
> will not be sufficient to use the resulting index view for on-the-fly pruning 
> and searching. 
> NOTE: If the input index is optimized (i.e. doesn't contain deletions) then 
> the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve 
> internal document id-s so that they are in sync with the original index. This 
> means that all other auxiliary information not necessary for first-tier 
> processing, such as some stored fields, can also be removed, to be quickly 
> retrieved on-demand from the original index using the same internal document 
> id. 
> Threshold values can be specified globally (for terms in all fields) using the 
> defaultThreshold parameter, and can be overridden using per-field or per-term 
> values supplied in a thresholds map. Keys in this map are either field names, 
> or terms in field:text format. The precedence of these values is the 
> following: first a per-term threshold is used if present, then per-field 
> threshold if present, and finally the default threshold.
> A command-line tool (PruningTool) is provided for convenience. At this moment 
> it doesn't support all functionality available through API.
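
As a small illustration of the thresholds described above (the key formats and precedence come from the description; how the map is passed to the pruning classes is not shown here and is assumed):

{code}
import java.util.HashMap;
import java.util.Map;

// Per-field and per-term thresholds; precedence is per-term, then per-field,
// then the default threshold.
int defaultThreshold = 1;                    // 1 = no term pruning
Map<String, Integer> thresholds = new HashMap<String, Integer>();
thresholds.put("body", 2);                   // field name: prune "body" postings with freq < 2
thresholds.put("title:lucene", 5);           // field:text: overrides the per-field value
{code}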




[jira] [Updated] (LUCENE-2457) QueryNode implementors should override equals method

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2457:
---

Fix Version/s: (was: 3.4)
   3.5

> QueryNode implementors should override equals method
> 
>
> Key: LUCENE-2457
> URL: https://issues.apache.org/jira/browse/LUCENE-2457
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Adriano Crestani
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> Discussed on thread: http://markmail.org/thread/gjqk35t7e3y4fo5j
> "QueryNode(s) are data objects, and it makes sense to override
> their equals method. But before, we need to define what is a QueryNode
> equality. Should two nodes be considered equal if they represent
> syntactically or semantically the same query? e.g. an ORQueryNode created
> from the query  will not have the same children ordering as the
> query , so they are syntactically not equal, but they are
> semantically equal, because the order of the OR operands (usually) does not
> matter when the query is executed. I say it usually does not matter, because
> it's up to the Query object implementation built from that ORQueryNode
> object, for this reason, I vote for defining that two query nodes should be
> equals if they are syntactically equal.
> I also vote for excluding query node tags from the equality check, because
> they are not meant to represent the query structure, but to attach extra
> info to the node, which is usually used for communication between
> processors."




[jira] [Updated] (LUCENE-1866) better RAT reporting

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1866:
---

Fix Version/s: (was: 3.4)
   3.5

> better RAT reporting
> 
>
> Key: LUCENE-1866
> URL: https://issues.apache.org/jira/browse/LUCENE-1866
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: general/build
>Reporter: Hoss Man
>Assignee: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-1866.patch, LUCENE-1866.patch
>
>
> the "ant rat-sources" target currently only analyzes src/java ... we can do 
> better then this.




[jira] [Updated] (LUCENE-2482) Index sorter

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2482:
---

Fix Version/s: (was: 3.4)
   3.5

> Index sorter
> 
>
> Key: LUCENE-2482
> URL: https://issues.apache.org/jira/browse/LUCENE-2482
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/other
>Affects Versions: 3.1, 4.0
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
>
>
> A tool to sort an index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> evaluated first. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector), such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).




[jira] [Updated] (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2585:
---

Fix Version/s: (was: 3.4)
   3.5

> DirectoryReader.isCurrent might fail to see the segments file during 
> concurrent index changes
> -
>
> Key: LUCENE-2585
> URL: https://issues.apache.org/jira/browse/LUCENE-2585
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Sanne Grinovero
> Fix For: 3.5, 4.0
>
>
> I could reproduce the issue several times, but only by running long and 
> stressful benchmarks; the high number of files is likely part of the 
> scenario.
> All tests run on local disk, using ext3.
> Sample stacktrace:
> {noformat}java.io.FileNotFoundException: no segments* file found in 
> org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
>  files:
> _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii _ux.fnm 
> _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt _3ge.nrm 
> _2l6.prx 
> _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx _2l3.nrm _2l8.fnm 
> _4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq _4bb.tis _3gb.tii 
> _1pz.tis 
> _2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm _v2.prx _4ll.tii _4bd.nrm 
> _2l7.fnm _2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx _1pz.nrm _ux.fdx _ux.tii 
> _1q6.nrm 
> _3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx _1q2.nrm _4bh.prx _1q0.frq 
> _ux.fdt _1q7.fdt _4bb.fnm _4bf.nrm _4bc.nrm _3gb.fdt _4bh.fnm _2l5.tis 
> _1pz.fnm _1py.fnm _3gc.fnm _2l2.prx _2l4.frq _3gc.fdt _ux.tis _1q3.prx 
> _2l7.fdx _4bj.nrm _4bj.fdx _4bi.tis _3g9.prx _1q4.prx _v3.fdt _1q3.fdx 
> _2l9.fdt 
> _4bh.tis _3gb.nrm _v2.nrm _3gd.tii _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx 
> _1pz.fdt _3g7.fnm _2l3.fnm _4lk.fnm _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt 
> _3g7.tis 
> _4bi.frq _4bj.frq _2l7.prx _ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm 
> _1py.nrm _3gf.nrm _4be.fdt _1q3.tii _1q1.prx _2l3.fdt _4lk.frq _2l4.fdx 
> _4bd.fnm 
> _uw.frq _3g8.fdx _2l6.tii _1q5.frq _1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt 
> _2l7.fdt _v0.tis _uy.tii _3ge.tii _v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq 
> _2l6.fnm _3g6.tii _3ge.prx _uu.frq _1pz.fdx _1q2.fnm _4bi.prx _3gc.frq 
> _2l9.tis _3ge.fdt _uy.fdt _4ll.fnm _3gc.prx _1q7.tii _2l5.nrm _uy.nrm _uv.frq 
> _1q6.frq _4ba.tis _3g9.tis _4be.nrm _4bi.fnm _ux.frq _1q1.fnm _v0.fnm 
> _2l4.fnm _4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii _2l6.nrm _1pz.prx 
> _2l7.tis 
> _1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx _1q5.tii _1q5.prx 
> _v2.frq _4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm _2l2.fnm _4bd.tii 
> _1q7.tis 
> _4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm _1pz.frq _1q1.fdx 
> _3ge.fdx _2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis _3gb.fnm 
> _2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm _uu.tis 
> _4bh.tii _2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt _4bj.fnm 
> _uu.tii _v3.frq 
> _3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt _1q6.prx _uz.nrm 
> _4bi.fdx _3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii _4bf.tii _uw.fdx 
> _2l5.frq 
> _3g9.nrm _v1.fdt _uw.fdt _4bd.frq _4bg.prx _3gd.tis _1q4.tis _2l9.nrm 
> _2la.nrm _v3.tii _4bf.prx _1q1.nrm _4ba.tii _3gd.fdx _1q4.tii _4lm.tii 
> _3ga.tis 
> _4bf.fnm write.lock _2l8.prx _2l8.fdt segments.gen _2lb.fnm _2l4.fdt _1q2.prx 
> _4be.fnm _3gf.prx _2l6.fdx _3g6.fnm _4bb.fdt _4bd.tis _4lk.nrm _2l5.fdx 
> _2la.tii _4bd.prx _4ln.fnm _3gf.tis _4ba.nrm _v3.prx _uv.prx _1q3.fnm 
> _3ga.tii _uz.tii _3g9.frq _v0.frq _3ge.tis _3g6.tis _4ln.prx _3g7.tii 
> _3g8.fdt 
> _3g7.nrm _3ga.prx _2l2.fdx _2l8.fdx _4ba.prx _1py.frq _uz.fdx _2l3.tii 
> _3g6.prx _v3.fdx _1q6.fdt _v1.nrm _2l2.tii _1q0.tis _4ba.fdx _4be.tii 
> _4ba.frq 
> _4ll.fdt _4bh.nrm _4lm.fdt _1q7.frq _4lk.tis _4bc.frq _1q6.fnm _3g7.frq 
> _uw.tis _3g8.tis _2l9.fdx _2l4.tii _1q4.fdx _4be.prx _1q3.nrm _1q0.tii 
> _1q0.fnm 
> _v3.nrm _1py.tis _3g9.fdt _4bh.fdt _4ll.nrm _4lk.prx _3gd.prx _1q3.tis 
> _1q2.tii _2l2.nrm _3gd.fdt _2l3.fdx _3g6.fdt _3gd.frq _1q1.tis _4bb.fdx 
> _1q2.frq 
> _1q3.fdt _v1.tis _2l8.frq _3gc.fdx _1q1.frq _4bg.frq _4bb.frq _2la.fdx 
> _2l9.frq _uy.tis _uy.prx _4bg.fdx _3gb.prx _uy.frq _1q2.fdx _4lm.prx _2la.prx 
> _2l4.prx _4bg.fdt _4be.frq _1q7.nrm _2l5.prx _4bf.frq _v1.prx _4bd.fdt 
> _2l9.prx _1q6.tis _3g8.fnm _4ln.tis _2l3.tis _4bc.fdx _2lb.prx _3gb.frq 
> _3gf.frq 
> _2la.fnm _3ga.fdt _uz.tis _4bg.nrm _uv.tii _4bg.tii _3g8.tii _4ll.frq _uv.fnm 
> _2l8.tis _2l8.nrm _2l2.fdt _4bj.tis _4lk.fdx _uw.prx _4bc.prx _4bj.fdt 
> _4be.fdx 
> _1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq _3ge.frq _1py.prx 
> _1q5.nrm _v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis _4ll.prx 
> _v3.tis _4bf.fdx 
> _1q5.fdx _1q0.prx _

[jira] [Updated] (LUCENE-2686) DisjunctionSumScorer should not call .score on sub scorers until consumer calls .score

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2686:
---

Fix Version/s: (was: 3.4)
   3.5

> DisjunctionSumScorer should not call .score on sub scorers until consumer 
> calls .score
> --
>
> Key: LUCENE-2686
> URL: https://issues.apache.org/jira/browse/LUCENE-2686
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2686.patch, LUCENE-2686.patch, 
> Test2LUCENE2590.java
>
>
> Spinoff from java-user thread "question about Scorer.freq()" from Koji...
> BooleanScorer2 uses DisjunctionSumScorer to score only-SHOULD-clause boolean 
> queries.
> But, this scorer does too much work for collectors that never call .score, 
> because it scores while it's matching.  It should only call .score on the 
> subs when the caller calls its .score.
> This also has the side effect of messing up advanced collectors that gather 
> the freq() of the subs (using LUCENE-2590).
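
A simplified, hypothetical sketch of the change (not the actual Scorer classes): the disjunction remembers which subs matched the current doc and only sums their scores when the consumer actually asks for a score.

{code}
import java.util.ArrayList;
import java.util.List;

interface SubScore { float score(); }

class LazyDisjunctionScore {
  private final List<SubScore> matchingSubs = new ArrayList<SubScore>();

  // called while matching: record the sub, but do NOT score it yet
  void addMatch(SubScore sub) { matchingSubs.add(sub); }

  // only here do the sub scorers get asked for their scores
  float score() {
    float sum = 0f;
    for (SubScore sub : matchingSubs) sum += sub.score();
    return sum;
  }
}
{code}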




[jira] [Updated] (LUCENE-2564) wordlistloader is inefficient

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2564:
---

Fix Version/s: (was: 3.4)
   3.5

> wordlistloader is inefficient
> -
>
> Key: LUCENE-2564
> URL: https://issues.apache.org/jira/browse/LUCENE-2564
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.5, 4.0
>
>
> WordListLoader is basically used for loading up stopwords lists, stem 
> dictionaries, etc.
> Unfortunately the api returns Set and sometimes even HashSet 
> or HashMap
> I think we should break it and return CharArraySets and CharArrayMaps (but 
> leave the return value as generic Set,Map).
> If someone objects to breaking it in 3.1, then we can do this only in 4.0, 
> but I think it would be good to fix it in both places.
> The reason is that if someone does new FooAnalyzer() a lot (probably not 
> uncommon), I think it's doing a bunch of useless copying.
> I think we should slap @lucene.internal on this API too, since that's mostly 
> how it's being used.




[jira] [Updated] (LUCENE-2921) Now that we track the code version at the segment level, we can stop tracking it also in each file level

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2921:
---

Fix Version/s: (was: 3.4)
   3.5

> Now that we track the code version at the segment level, we can stop tracking 
> it also in each file level
> 
>
> Key: LUCENE-2921
> URL: https://issues.apache.org/jira/browse/LUCENE-2921
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
> Fix For: 3.5, 4.0
>
>
> Now that we track the code version that created the segment at the segment 
> level, we can stop tracking versions in each file. This has several major 
> benefits:
> # Today the constant names used to track versions are confusing - they do 
> not state which version they apply to, and so it's harder to determine 
> which formats we can stop supporting when working on the next major release.
> # Those format numbers are usually negative, but in some cases positive 
> (inconsistency) -- we need to remember to increase it "one down" for the 
> negative ones, which I always find confusing.
> # It will remove the format tracking from all the *Writers, and the *Reader 
> will receive the code format (String) and work w/ the appropriate constant 
> (e.g. Constants.LUCENE_30). Centralizing version tracking to SegmentInfo is 
> an advantage IMO.
> It's not urgent that we do it for 3.1 (though it requires an index format 
> change), because starting from 3.1 all segments track their version number 
> anyway (or were migrated to track it), so we can safely release it in a 
> follow-on 3.x release.




[jira] [Updated] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2949:
---

Fix Version/s: (was: 3.4)
   3.5

> FastVectorHighlighter FieldTermStack could likely benefit from using 
> TermVectorMapper
> -
>
> Key: LUCENE-2949
> URL: https://issues.apache.org/jira/browse/LUCENE-2949
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 4.0
>Reporter: Grant Ingersoll
>Assignee: Koji Sekiguchi
>Priority: Minor
>  Labels: FastVectorHighlighter, Highlighter
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2949.patch
>
>
> Based on my reading of the FieldTermStack constructor that loads the vector 
> from disk, we could probably save a bunch of time and memory by using the 
> TermVectorMapper callback mechanism instead of materializing the full array 
> of terms into memory and then throwing most of them out.




[jira] [Updated] (LUCENE-2974) the hudson nightly for lucene should check out lucene by itself

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2974:
---

Fix Version/s: (was: 3.4)
   3.5

> the hudson nightly for lucene should check out lucene by itself
> ---
>
> Key: LUCENE-2974
> URL: https://issues.apache.org/jira/browse/LUCENE-2974
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: general/build
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
>
> Currently it's too easy to break the lucene-only packaging and build.
> The hudson job for lucene should check out lucene by itself; this will
> prevent it from being broken.




[jira] [Updated] (LUCENE-2971) Auto Generate our LICENSE.txt and NOTICE.txt files

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2971:
---

Fix Version/s: (was: 3.4)
   3.5

> Auto Generate our LICENSE.txt and NOTICE.txt files
> --
>
> Key: LUCENE-2971
> URL: https://issues.apache.org/jira/browse/LUCENE-2971
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> Once LUCENE-2952 is in place, we should be able to automatically generate 
> Lucene's and Solr's LICENSE.txt and NOTICE.txt files (without massive 
> duplication).




[jira] [Updated] (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2906:
---

Fix Version/s: (was: 3.4)
   3.5

> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> 
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Tom Burton-West
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK, as in the CJK 
> Analyzer.  This filter would take the output of the ICUTokenizer, read the 
> ScriptAttribute and, for selected scripts (Han, Kana), produce 
> overlapping bigrams.
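
For illustration, the overlapping-bigram behaviour the filter would apply to Han/Kana runs, shown on a plain string (this is a sketch of the idea, not the TokenFilter itself):

{code}
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
  // "ABCD" -> "AB", "BC", "CD": a sliding window of width 2 over the CJK run.
  static List<String> overlappingBigrams(String cjkRun) {
    List<String> bigrams = new ArrayList<String>();
    for (int i = 0; i + 1 < cjkRun.length(); i++) {
      bigrams.add(cjkRun.substring(i, i + 2));
    }
    return bigrams;
  }
}
{code}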




[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3022:
---

Fix Version/s: (was: 3.4)
   3.5

> DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
> -
>
> Key: LUCENE-3022
> URL: https://issues.apache.org/jira/browse/LUCENE-3022
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 2.9.4, 3.1
>Reporter: Johann Höchtl
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3022.patch, LUCENE-3022.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When using the DictionaryCompoundWordTokenFilter with a German dictionary, I 
> got some strange behaviour:
> The German word "streifenbluse" (blouse with stripes) was decompounded to 
> "streifen" (stripe) and "reifen" (tire), which makes no sense at all.
> I thought the flag onlyLongestMatch would fix this, because "streifen" is 
> longer than "reifen", but it had no effect.
> So I reviewed the source code and found the problem:
> {code}
> protected void decomposeInternal(final Token token) {
>   // Only words longer than minWordSize get processed
>   if (token.length() < this.minWordSize) {
>     return;
>   }
> 
>   char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.buffer());
> 
>   for (int i=0;i<token.length()-this.minSubwordSize;++i) {
>     Token longestMatchToken=null;
>     for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
>       if(i+j>token.length()) {
>         break;
>       }
>       if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
>         if (this.onlyLongestMatch) {
>           if (longestMatchToken!=null) {
>             if (longestMatchToken.length()<j) {
>               longestMatchToken=createToken(i,j,token);
>             }
>           } else {
>             longestMatchToken=createToken(i,j,token);
>           }
>         } else {
>           tokens.add(createToken(i,j,token));
>         }
>       }
>     }
>     if (this.onlyLongestMatch && longestMatchToken!=null) {
>       tokens.add(longestMatchToken);
>     }
>   }
> }
> {code}
> should be changed to 
> {code}
> protected void decomposeInternal(final Token token) {
>   // Only words longer than minWordSize get processed
>   if (token.termLength() < this.minWordSize) {
>     return;
>   }
>   char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.termBuffer());
>   Token longestMatchToken=null;
>   for (int i=0;i<token.termLength()-this.minSubwordSize;++i) {
>     for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
>       if(i+j>token.termLength()) {
>         break;
>       }
>       if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
>         if (this.onlyLongestMatch) {
>           if (longestMatchToken!=null) {
>             if (longestMatchToken.termLength()<j) {
>               longestMatchToken=createToken(i,j,token);
>             }
>           } else {
>             longestMatchToken=createToken(i,j,token);
>           }
>         } else {
>           tokens.add(createToken(i,j,token));
>         }
>       }
>     }
>   }
>   if (this.onlyLongestMatch && longestMatchToken!=null) {
>     tokens.add(longestMatchToken);
>   }
> }
> {code}
> That way only the longest token is really indexed and the onlyLongestMatch 
> flag makes sense.




[jira] [Updated] (LUCENE-3019) FVH: uncontrollable color tags

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3019:
---

Fix Version/s: (was: 3.4)
   3.5

> FVH: uncontrollable color tags
> --
>
> Key: LUCENE-3019
> URL: https://issues.apache.org/jira/browse/LUCENE-3019
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3019.patch
>
>
> Multi-colored tags are a feature of FVH, but which color is used for each 
> term is uncontrollable (or, more precisely, unexpected by users).




[jira] [Updated] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3003:
---

Fix Version/s: (was: 3.4)
   3.5

> Move UnInvertedField into Lucene core
> -
>
> Key: LUCENE-3003
> URL: https://issues.apache.org/jira/browse/LUCENE-3003
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3003.patch, LUCENE-3003.patch, 
> byte_size_32-bit-openjdk6.txt
>
>
> Solr's UnInvertedField lets you quickly lookup all terms ords for a
> given doc/field.
> Like, FieldCache, it inverts the index to produce this, and creates a
> RAM-resident data structure holding the bits; but, unlike FieldCache,
> it can handle multiple values per doc, and, it does not hold the term
> bytes in RAM.  Rather, it holds only term ords, and then uses
> TermsEnum to resolve ord -> term.
> This is great eg for faceting, where you want to use int ords for all
> of your counting, and then only at the end you need to resolve the
> "top N" ords to their text.
> I think this is a useful core functionality, and we should move most
> of it into Lucene's core.  It's a good complement to FieldCache.  For
> this first baby step, I just move it into core and refactor Solr's
> usage of it.
> After this, as separate issues, I think there are some things we could
> explore/improve:
>   * The first-pass that allocates lots of tiny byte[] looks like it
> could be inefficient.  Maybe we could use the byte slices from the
> indexer for this...
>   * We can improve the RAM efficiency of the TermIndex: if the codec
> supports ords, and we are operating on one segment, we should just
> use it.  If not, we can use a more RAM-efficient data structure,
> eg an FST mapping to the ord.
>   * We may be able to improve on the main byte[] representation by
> using packed ints instead of delta-vInt?
>   * Eventually we should fold this ability into docvalues, ie we'd
> write the byte[] image at indexing time, and then loading would be
> fast, instead of uninverting
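
A rough, self-contained sketch of the ord-based counting idea described above (plain Java, no Lucene types; the two arrays stand in for what the uninverted structure and TermsEnum would provide): count into an int[] indexed by term ord, then resolve only the winning ords to text at the end.

{code}
// Sketch of ord-based facet counting: all counting uses small ints,
// and only the top ordinals are resolved back to term text.
public class OrdFacetCountingDemo {
  public static void main(String[] args) {
    // Stand-ins for what the uninverted structure would give us:
    String[] ordToTerm = {"red", "green", "blue"};      // ord -> term text (sorted terms)
    int[][] docToOrds = {{0, 2}, {2}, {1, 2}, {0}};     // per-doc term ords (multi-valued)

    int[] counts = new int[ordToTerm.length];
    for (int[] ords : docToOrds) {                      // count with ints only
      for (int ord : ords) {
        counts[ord]++;
      }
    }

    // Resolve just the top-N ords to text (here N=2, by a simple scan).
    for (int n = 0; n < 2; n++) {
      int best = -1;
      for (int ord = 0; ord < counts.length; ord++) {
        if (counts[ord] >= 0 && (best == -1 || counts[ord] > counts[best])) {
          best = ord;
        }
      }
      System.out.println(ordToTerm[best] + "=" + counts[best]);
      counts[best] = -1;                                // exclude from further passes
    }
  }
}
{code}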

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3097) Post grouping faceting

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3097:
---

Fix Version/s: (was: 3.4)

> Post grouping faceting
> --
>
> Key: LUCENE-3097
> URL: https://issues.apache.org/jira/browse/LUCENE-3097
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/grouping
>Reporter: Martijn van Groningen
>Assignee: Martijn van Groningen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
> LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch
>
>
> This issues focuses on implementing post grouping faceting.
> * How to handle multivalued fields. What field value to show with the facet.
> * Where the facet counts should be based on
> ** Facet counts can be based on the normal documents. Ungrouped counts. 
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet 
> value. Matrix counts.   
> And probably more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first 
> option it calculates a DocSet based on the individual documents from the 
> query result. For the second option it calculates a DocSet for all the most 
> relevant documents of a group. Once the DocSet is computed, the FacetComponent 
> and StatsComponent use the DocSet to create facets and statistics.  
> This last one is a bit more complex. I think it is best explained with an 
> example. Let's say we search on travel offers:
> ||hotel||departure_airport||duration||
> |Hotel a|AMS|5|
> |Hotel a|DUS|10|
> |Hotel b|AMS|5|
> |Hotel b|AMS|10|
> If we group by hotel and have a facet for airport, most end users expect 
> (in my experience, of course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get 
> counts AMS:3 and DUS:1 or 1 for both airports.
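
A small sketch of how the "count each facet value at most once per group" variant could be computed for the table above (plain Java; the data is hard-coded just to reproduce the AMS:2 / DUS:1 expectation):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// For each facet value, count distinct group values rather than raw documents.
public class GroupedFacetCountDemo {
  public static void main(String[] args) {
    String[][] offers = {            // {hotel, departure_airport}
        {"Hotel a", "AMS"},
        {"Hotel a", "DUS"},
        {"Hotel b", "AMS"},
        {"Hotel b", "AMS"},
    };

    Map<String, Set<String>> airportToHotels = new HashMap<String, Set<String>>();
    for (String[] offer : offers) {
      String hotel = offer[0], airport = offer[1];
      Set<String> hotels = airportToHotels.get(airport);
      if (hotels == null) {
        hotels = new HashSet<String>();
        airportToHotels.put(airport, hotels);
      }
      hotels.add(hotel);             // a hotel counts only once per airport
    }

    for (Map.Entry<String, Set<String>> e : airportToHotels.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue().size());  // AMS: 2, DUS: 1
    }
  }
}
{code}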

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3116) pendingCommit in IndexWriter is not thoroughly tested

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3116:
---

Fix Version/s: (was: 3.4)
   3.5

> pendingCommit in IndexWriter is not thoroughly tested
> -
>
> Key: LUCENE-3116
> URL: https://issues.apache.org/jira/browse/LUCENE-3116
> Project: Lucene - Java
>  Issue Type: Test
>  Components: core/index
>Affects Versions: 3.2, 4.0
>Reporter: Uwe Schindler
> Fix For: 3.5, 4.0
>
>
> When working on LUCENE-3084, I had a copy-paste error in my patch (see 
> revision 1124307 and corrected in 1124316), I replaced pendingCommit by 
> segmentInfos in IndexWriter, corrected by the following patch:
> {noformat}
> --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
> (original)
> +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
> Wed May 18 16:16:29 2011
> @@ -2552,7 +2552,7 @@ public class IndexWriter implements Clos
>  lastCommitChangeCount = pendingCommitChangeCount;
>  segmentInfos.updateGeneration(pendingCommit);
>  segmentInfos.setUserData(pendingCommit.getUserData());
> -rollbackSegments = segmentInfos.createBackupSegmentInfos(true);
> +rollbackSegments = pendingCommit.createBackupSegmentInfos(true);
>  deleter.checkpoint(pendingCommit, true);
>} finally {
>  // Matches the incRef done in startCommit:
> {noformat}
> This did not cause any test failure.
> On IRC, Mike said:
> {quote}
> [19:21]   mikemccand: ThetaPh1: hmm
> [19:21]   mikemccand: well
> [19:22]   mikemccand: pendingCommit and sis only differ while commit() is 
> running
> [19:22]   mikemccand: ie if a thread starts commit
> [19:22]   mikemccand: but fsync is taking a long time
> [19:22]   mikemccand: and another thread makes a change to sis
> [19:22]   ThetaPh1: ok so hard to find that bug
> [19:22]   mikemccand: we need our mock dir wrapper to sometimes take a 
> long time syncing
> {quote}
> Maybe we need such a test, I feel bad when such stupid changes don't make any 
> test fail.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3138) IW.addIndexes should fail fast if an index is too old/new

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3138:
---

Fix Version/s: (was: 3.4)
   3.5

> IW.addIndexes should fail fast if an index is too old/new
> -
>
> Key: LUCENE-3138
> URL: https://issues.apache.org/jira/browse/LUCENE-3138
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> Today IW.addIndexes (both Dir and IR versions) do not check the format of the 
> incoming indexes. Therefore it could add a too old/new index and the app will 
> discover that only later, maybe during commit() or segment merges. We should 
> check that up front and fail fast.
> This issue is relevant only to 4.0 at the moment, which will not support 2.x 
> indexes anymore.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3071:
---

Fix Version/s: (was: 3.4)
   3.5

> PathHierarchyTokenizer adaptation for urls: splits reversed
> ---
>
> Key: LUCENE-3071
> URL: https://issues.apache.org/jira/browse/LUCENE-3071
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Olivier Favre
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, 
> LUCENE-3071.patch, ant.log.tar.bz2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> {{PathHierarchyTokenizer}} should be usable to split URLs in a "reversed" 
> way (useful for faceted search against URLs):
> {{www.site.com}} -> {{www.site.com, site.com, com}}
> Moreover, it should be able to skip a given number of first (or last, if 
> reversed) tokens:
> {{/usr/share/doc/somesoftware/INTERESTING/PART}}
> Should give with 4 tokens skipped:
> {{INTERESTING}}
> {{INTERESTING/PART}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2822) TimeLimitingCollector starts thread in static {} with no way to stop them

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2822:
---

Fix Version/s: (was: 3.4)
   3.5

> TimeLimitingCollector starts thread in static {} with no way to stop them
> -
>
> Key: LUCENE-2822
> URL: https://issues.apache.org/jira/browse/LUCENE-2822
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
>
> See the comment in LuceneTestCase.
> If you even do Class.forName("TimeLimitingCollector") it starts up a thread 
> in a static method, and there isn't a way to kill it.
> This is broken.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3120:
---

Fix Version/s: (was: 3.4)
   3.5

> span query matches too many docs when two query terms are the same unless 
> inOrder=true
> --
>
> Key: LUCENE-3120
> URL: https://issues.apache.org/jira/browse/LUCENE-3120
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3120.patch, LUCENE-3120.patch
>
>
> spinoff of user list discussion - [SpanNearQuery - inOrder 
> parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
> With 3 documents:
> *  "a b x c d"
> *  "a b b d"
> *  "a b x b y d"
> Here are a few queries (the number in parentheses indicates the expected #hits):
> These ones work *as expected*:
> * (1)  in-order, slop=0, "b", "x", "b"
> * (1)  in-order, slop=0, "b", "b"
> * (2)  in-order, slop=1, "b", "b"
> These ones match *too many* hits:
> * (1)  any-order, slop=0, "b", "x", "b"
> * (1)  any-order, slop=1, "b", "x", "b"
> * (1)  any-order, slop=2, "b", "x", "b"
> * (1)  any-order, slop=3, "b", "x", "b"
> These ones match *too many* hits as well:
> * (1)  any-order, slop=0, "b", "b"
> * (2)  any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no 
> in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span 
> queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
> so we might decide to close this bug after all, but I would like to at least 
> have the junit that exposes the behavior in JIRA.
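
For anyone wanting to try this outside the test suite, here is a rough standalone repro (assuming Lucene 3.x core on the classpath; the field name "f" and Version.LUCENE_34 are arbitrary choices) that indexes the three documents and runs one of the "too many hits" cases:

{code}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SpanSameTermRepro {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
        Version.LUCENE_34, new WhitespaceAnalyzer(Version.LUCENE_34)));
    for (String text : new String[] {"a b x c d", "a b b d", "a b x b y d"}) {
      Document doc = new Document();
      doc.add(new Field("f", text, Field.Store.NO, Field.Index.ANALYZED));
      w.addDocument(doc);
    }
    w.close();

    // any-order (inOrder=false), slop=0, terms "b" "b": only "a b b d" should
    // match (1 expected hit), but the issue reports additional matches.
    SpanQuery q = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("f", "b")),
        new SpanTermQuery(new Term("f", "b"))}, 0, false);

    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    System.out.println("hits = " + searcher.search(q, 10).totalHits);
    searcher.close();
    reader.close();
  }
}
{code}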

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3004) Define Test Plan for 3.2

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3004:
---

Fix Version/s: (was: 3.4)
   3.5

> Define Test Plan for 3.2
> 
>
> Key: LUCENE-3004
> URL: https://issues.apache.org/jira/browse/LUCENE-3004
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Grant Ingersoll
>Priority: Blocker
> Fix For: 3.5, 4.0
>
>
> Before we can release, we need a test plan that defines what a successful 
> release candidate must do to be accepted.
> Test plan should be written at http://wiki.apache.org/lucene-java/TestPlans
> See 
> http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3122) Cascaded grouping

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3122:
---

Fix Version/s: (was: 3.4)
   3.5

> Cascaded grouping
> -
>
> Key: LUCENE-3122
> URL: https://issues.apache.org/jira/browse/LUCENE-3122
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> Similar to SOLR-2526, in that you are grouping on 2 separate fields, but 
> instead of treating those fields as a single grouping by a compound key, this 
> change would let you first group on key1 for the primary groups and then 
> secondarily on key2 within the primary groups.
> Ie, the result you get back would have groups A, B, C (grouped by key1) but 
> then the documents within group A would be grouped by key 2.
> I think this will be important for apps whose documents are the product of 
> denormalizing, ie where the Lucene document is really a sub-document of a 
> different identifier field.  Borrowing an example from LUCENE-3097, you have 
> doctors but each doctor may have multiple offices (addresses) where they 
> practice and so you index doctor X address as your lucene documents.  In this 
> case, your "identifier" field (that which "counts" for facets, and should be 
> "grouped" for presentation) is doctorid.  When you offer users search over 
> this index, you'd likely want to 1) group by distance (ie, < 0.1 miles, < 0.2 
> miles, etc., as a function query), but 2) also group by doctorid, ie cascaded 
> grouping.
> I suspect this would be easier to implement than it sounds: the per-group 
> collector used by the 2nd pass grouping collector for key1's grouping just 
> needs to be another grouping collector.  Spookily, though, that collection 
> would also have to be 2-pass, so it could get tricky since grouping is sort 
> of recursing on itself. Once we have LUCENE-3112, though, that should 
> enable efficient single-pass grouping by the identifier (doctorid).
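
To make the intent concrete, a toy sketch of "cascaded" grouping over plain objects (no Lucene collectors; the distance-bucket/doctorid names are just the example from above): group by the primary key, then group each primary group's members by the secondary key.

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Two-level grouping: primary key -> (secondary key -> documents).
public class CascadedGroupingDemo {
  public static void main(String[] args) {
    // {distanceBucket, doctorid} pairs, standing in for denormalized doctor x address docs.
    String[][] docs = {
        {"< 0.1 miles", "doctor7"}, {"< 0.1 miles", "doctor9"},
        {"< 0.2 miles", "doctor7"}, {"< 0.2 miles", "doctor3"},
    };

    Map<String, Map<String, List<String[]>>> groups =
        new LinkedHashMap<String, Map<String, List<String[]>>>();
    for (String[] doc : docs) {
      String primary = doc[0], secondary = doc[1];
      Map<String, List<String[]>> inner = groups.get(primary);
      if (inner == null) {
        inner = new LinkedHashMap<String, List<String[]>>();
        groups.put(primary, inner);
      }
      List<String[]> members = inner.get(secondary);
      if (members == null) {
        members = new ArrayList<String[]>();
        inner.put(secondary, members);
      }
      members.add(doc);
    }

    // Prints the nested structure: distance bucket, then the doctorid groups inside it.
    for (Map.Entry<String, Map<String, List<String[]>>> e : groups.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().keySet());
    }
  }
}
{code}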

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3133) Fix QueryParser to handle nested fields

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3133:
---

Fix Version/s: (was: 3.4)
   3.5

> Fix QueryParser to handle nested fields
> ---
>
> Key: LUCENE-3133
> URL: https://issues.apache.org/jira/browse/LUCENE-3133
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> Once we commit LUCENE-2454, we need to make it easy for apps to enable this 
> with QueryParser.
> It seems like it's a "schema" like behavior, ie we need to be able to express 
> the join structure of the related fields.
> And then whenever QP produces a query that spans fields requiring a join, the 
> NestedDocumentQuery is used to wrap the child fields?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3145) FST APIs should support CharsRef too

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3145:
---

Fix Version/s: (was: 3.4)
   3.5

> FST APIs should support CharsRef too
> 
>
> Key: LUCENE-3145
> URL: https://issues.apache.org/jira/browse/LUCENE-3145
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> The Builder API at heart is IntsRef, but we have sugar to pass in BytesRef, 
> CharSequence, etc.  We should add CharsRef too.
> Likewise we have IntsRefFSTEnum, BytesRefFSTEnum; we should add CharsRef 
> there.
> Finally the static Util methods should accept CharsRef.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3161) consider warnings from the source compilation

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3161:
---

Fix Version/s: (was: 3.4)
   3.5

> consider warnings from the source compilation
> -
>
> Key: LUCENE-3161
> URL: https://issues.apache.org/jira/browse/LUCENE-3161
> Project: Lucene - Java
>  Issue Type: Task
>  Components: general/build
>Reporter: Robert Muir
>  Labels: maybe32blocker
> Fix For: 3.5, 4.0
>
>
> as Doron mentioned in his review: when compiling, various warnings are 
> printed; I think it would be more reassuring for downloaders if the build ran 
> without warnings. These warnings are not a stopper.
> we could conditionalize these warnings so that they don't "display" when 
> compiling from actual releases, but I have to wonder if we should hide 
> these... being open source I think we should display all our warts, maybe 
> some contributor sees these warnings and decides they want to submit a patch 
> to fix some of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3150) Wherever we catch & suppress Throwable we should not suppress ThreadInterruptedException

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3150:
---

Fix Version/s: (was: 3.4)
   3.5

> Wherever we catch & suppress Throwable we should not suppress 
> ThreadInterruptedException
> 
>
> Key: LUCENE-3150
> URL: https://issues.apache.org/jira/browse/LUCENE-3150
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> In various places we catch Throwable and suppress it, usually in exception 
> handlers where we want to just throw the first exc we had hit.
> But this is dangerous for a thread interrupt since it means we can swallow & 
> ignore the interrupt.
> We should at least catch the interrupt & restore the interrupt bit, if we 
> can't rethrow it.
> One example is in SegmentInfos where we write the segments.gen file... there 
> are many other examples in SegmentInfos too.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3201:
---

Fix Version/s: (was: 3.4)
   3.5

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
>Assignee: Simon Willnauer
>Priority: Blocker
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were implementing this thing from scratch, you would 
> just wrap the IndexInput directly (not extend BufferedIndexInput,
> and add the compound file offset X to seek() calls, and override length()). But of 
> course, then you couldn't always throw "read past EOF" when you should,
> as a user could read into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file.
> Its underlying ByteBuffer naturally does bounds checks already, so it 
> wouldn't need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3175) speed up core tests

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3175:
---

Fix Version/s: (was: 3.4)
   3.5

> speed up core tests
> ---
>
> Key: LUCENE-3175
> URL: https://issues.apache.org/jira/browse/LUCENE-3175
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3175.patch, LUCENE-3175.patch, LUCENE-3175.patch, 
> LUCENE-3175_2.patch, test-core_core_2_duo_2-53GHZ.rtf, 
> test-core_core_2_duo_2-53GHZ.rtf
>
>
> Our core tests have gotten slower and slower, if you don't have a really fast 
> computer its probably frustrating.
> I think we should:
> 1. still have random parameters, but make the 'obscene' settings like 
> SimpleText rarer... we can always make them happen more on NIGHTLY
> 2. tests that make a lot of documents can conditionalize on NIGHTLY so that 
> they are still doing a reasonable test on ordinary runs e.g. numdocs = 
> (NIGHTLY ? 1 : 1000) * multiplier
> 3. refactor some of the slow huge classes with lots of tests like 
> TestIW/TestIR, at least pull out really slow methods like TestIR.testDiskFull 
> into its own class. this gives better parallelization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3205:
---

Fix Version/s: (was: 3.4)
   3.5

> remove MultiTermQuery get/inc/clear totalNumberOfTerms
> --
>
> Key: LUCENE-3205
> URL: https://issues.apache.org/jira/browse/LUCENE-3205
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Uwe Schindler
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3205.patch
>
>
> This method is not correct if the index has more than one segment.
> It's also not thread safe, and it means calling query.rewrite() modifies
> the original query. 
> All of these things add up to confusion, I think we should remove this 
> from multitermquery, the only thing that "uses" it is the NRQ tests, which 
> conditionalizes all the asserts anyway.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3339) TestNRTThreads hangs in nightly 3.x builds

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3339:
---

Fix Version/s: (was: 3.4)
   3.5

> TestNRTThreads hangs in nightly 3.x builds
> --
>
> Key: LUCENE-3339
> URL: https://issues.apache.org/jira/browse/LUCENE-3339
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.5
>
> Attachments: LUCENE-3339.patch
>
>
> Maybe we have a problem, maybe it's a bug in the test.
> But its strange that lately the 3.x nightlies have been hanging here.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3206) FST package API refactoring

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3206:
---

Fix Version/s: (was: 3.4)
   3.5

> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3269) Speed up Top-K sampling tests

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3269:
---

Fix Version/s: (was: 3.4)
   3.5

> Speed up Top-K sampling tests
> -
>
> Key: LUCENE-3269
> URL: https://issues.apache.org/jira/browse/LUCENE-3269
> Project: Lucene - Java
>  Issue Type: Test
>  Components: modules/facet
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch
>
>
> speed up the top-k sampling tests (but make sure they are thorough on nightly 
> etc still)
> usually we would do this with use of atLeast(), but these tests are somewhat 
> tricky,
> so maybe a different approach is needed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3237) FSDirectory.fsync() may not work properly

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3237:
---

Fix Version/s: (was: 3.4)
   3.5

> FSDirectory.fsync() may not work properly
> -
>
> Key: LUCENE-3237
> URL: https://issues.apache.org/jira/browse/LUCENE-3237
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/store
>Reporter: Shai Erera
> Fix For: 3.5, 4.0
>
>
> Spinoff from LUCENE-3230. FSDirectory.fsync() opens a new RAF, sync() its 
> FileDescriptor and closes RAF. It is not clear that this syncs whatever was 
> written to the file by other FileDescriptors. It would be better if we do 
> this operation on the actual RAF/FileOS which wrote the data. We can add 
> sync() to IndexOutput and FSIndexOutput will do that.
> Directory-wise, we should stop syncing on file names, and instead sync on the 
> IOs that performed the write operations.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3184) add LuceneTestCase.rarely()/LuceneTestCase.atLeast()

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3184:
---

Fix Version/s: (was: 3.4)
   3.5

> add LuceneTestCase.rarely()/LuceneTestCase.atLeast()
> 
>
> Key: LUCENE-3184
> URL: https://issues.apache.org/jira/browse/LUCENE-3184
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3184.patch
>
>
> in LUCENE-3175, the tests were sped up a lot by using reasonable number of 
> iterations normally, but cranking up for NIGHTLY.
> we also do crazy things more 'rarely' for normal builds (e.g. simpletext, 
> payloads, crazy merge params, etc)
> also, we found some bugs by doing this, because in general our parameters are 
> too fixed.
> however, it made the code look messy... I propose some new methods:
> instead of some crazy code in your test like:
> {code}
> int numdocs = (TEST_NIGHTLY ? 1000 : 100) * RANDOM_MULTIPLIER;
> {code}
> you use:
> {code}
> int numdocs = atLeast(100);
> {code}
> this will apply the multiplier, also factor in nightly, and finally add some 
> random fudge... so e.g. in local runs its sometimes 127 docs, sometimes 113 
> docs, etc.
> additionally instead of code like:
> {code}
> if ((TEST_NIGHTLY && random.nextBoolean()) || (random.nextInt(20) == 17)) {
> {code}
> you do
> {code}
> if (rarely()) {
> {code}
> which applies NIGHTLY and also the multiplier (logarithmic growth).
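
A guess at what the two helpers could look like (this is an illustration only, not the actual LuceneTestCase patch; the system property names and exact fudge factors are assumptions):

{code}
import java.util.Random;

// Hypothetical implementations of atLeast()/rarely() as described above:
// apply the multiplier, factor in NIGHTLY, then add some random fudge.
public class TestScalingHelpers {
  static final boolean TEST_NIGHTLY = Boolean.getBoolean("tests.nightly");
  static final int RANDOM_MULTIPLIER = Integer.getInteger("tests.multiplier", 1);
  static final Random random = new Random();

  static int atLeast(int i) {
    int min = (TEST_NIGHTLY ? 2 * i : i) * RANDOM_MULTIPLIER;
    int max = min + min / 2;                       // up to 50% random fudge on top
    return min + random.nextInt(max - min + 1);
  }

  static boolean rarely() {
    int p = TEST_NIGHTLY ? 10 : 1;                 // baseline probability in percent
    p += (int) (p * Math.log(RANDOM_MULTIPLIER));  // grow logarithmically with the multiplier
    return random.nextInt(100) < Math.min(50, p);
  }

  public static void main(String[] args) {
    System.out.println("numdocs=" + atLeast(100) + " rarely=" + rarely());
  }
}
{code}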

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3218:
---

Fix Version/s: (was: 3.4)
   3.5

> Make CFS appendable  
> -
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Simon Willnauer
>Priority: Blocker
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
> LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
> LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch
>
>
> Currently CFS is created once all files are written during a flush / merge. 
> Once on disk the files are copied into the CFS format which is basically a 
> unnecessary for some of the files. We can at any time write at least one file 
> directly into the CFS which can save a reasonable amount of IO. For instance 
> stored fields could be written directly during indexing and during a Codec 
> Flush one of the written files can be appended directly. This optimization is 
> a nice sideeffect for lucene indexing itself but more important for DocValues 
> and LUCENE-3216 we could transparently pack per field files into a single 
> file only for docvalues without changing any code once LUCENE-3216 is 
> resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3229) SpanNearQuery: ordered spans should not overlap

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3229:
---

Fix Version/s: (was: 3.4)
   3.5

> SpanNearQuery: ordered spans should not overlap
> ---
>
> Key: LUCENE-3229
> URL: https://issues.apache.org/jira/browse/LUCENE-3229
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1
> Environment: Windows XP, Java 1.6
>Reporter: ludovic Boutros
> Fix For: 3.5
>
> Attachments: LUCENE-3229.patch, LUCENE-3229.patch, SpanOverlap.diff, 
> SpanOverlap2.diff, SpanOverlapTestUnit.diff
>
>
> While using Span queries I think I've found a little bug.
> With a document like this (from the TestNearSpansOrdered unit test) :
> "w1 w2 w3 w4 w5"
> If I try to search for this span query :
> spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
> the above document is returned and I think it should not because 'w4' is not 
> after 'w5'.
> The 2 spans are not ordered, because there is an overlap.
> I will add a test patch in the TestNearSpansOrdered unit test.
> I will add a patch to solve this issue too.
> Basically it modifies the two docSpansOrdered functions to make sure that the 
> spans do not overlap.
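
The crux of the proposed fix, sketched as a plain predicate (the method name is mine; Lucene's actual code works on Spans objects): two spans only count as ordered if the first one ends at or before the start of the second, i.e. they must not overlap.

{code}
// Sketch of the strengthened ordering check: start order alone is not enough,
// the first span must be finished before the second one starts.
public class SpanOrderCheck {
  static boolean docSpansOrderedNonOverlapping(int start1, int end1, int start2, int end2) {
    return end1 <= start2;  // forbids overlap (and implies start order for non-empty spans)
  }

  public static void main(String[] args) {
    // "w1 w2 w3 w4 w5": spanNear([w3, w5], 1, true) covers positions [2, 5),
    // and w4 is the span [3, 4) -- they overlap, so the pair must not count as ordered.
    System.out.println(docSpansOrderedNonOverlapping(2, 5, 3, 4)); // false
  }
}
{code}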

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3392) Combining analyzers output

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3392:
---

Fix Version/s: (was: 3.4)
   3.5

> Combining analyzers output
> --
>
> Key: LUCENE-3392
> URL: https://issues.apache.org/jira/browse/LUCENE-3392
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Olivier Favre
>Priority: Minor
>  Labels: analysis
> Fix For: 3.5, 4.0
>
> Attachments: ComboAnalyzer-lucene-trunk.patch, 
> ComboAnalyzer-lucene3x.patch, ComboAnalyzer-lucene3x.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It should be easy to combine the output of multiple Analyzers, or 
> TokenStreams.
> A ComboAnalyzer and a ComboTokenStream class would take multiple instances, 
> and multiplex their output, keeping a rough order of tokens like increasing 
> position then increasing start offset then increasing end offset.
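
A minimal sketch of that multiplexing order (plain Java; the Tok class is a stand-in for Lucene's token attributes): tokens collected from the combined streams are emitted by increasing position, then start offset, then end offset.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Order tokens by (position, startOffset, endOffset), as a ComboTokenStream might.
public class ComboOrderDemo {
  static class Tok {
    final String term; final int position, startOffset, endOffset;
    Tok(String term, int position, int startOffset, int endOffset) {
      this.term = term; this.position = position;
      this.startOffset = startOffset; this.endOffset = endOffset;
    }
    public String toString() { return term + "@" + position; }
  }

  public static void main(String[] args) {
    List<Tok> merged = new ArrayList<Tok>();
    merged.add(new Tok("quick", 1, 4, 9));    // from analyzer A
    merged.add(new Tok("the", 0, 0, 3));      // from analyzer B
    merged.add(new Tok("schnell", 1, 4, 9));  // from analyzer B, same position as "quick"

    Collections.sort(merged, new Comparator<Tok>() {
      public int compare(Tok a, Tok b) {
        if (a.position != b.position) return a.position - b.position;
        if (a.startOffset != b.startOffset) return a.startOffset - b.startOffset;
        return a.endOffset - b.endOffset;
      }
    });
    System.out.println(merged); // [the@0, quick@1, schnell@1]
  }
}
{code}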

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3415) Snowball filter to include original word too

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3415:
---

Fix Version/s: (was: 3.4)
   3.5

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.5, 4.0
>
>
> 1. Currently, the snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if I want to search with / without stemming, I have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we had a configurable option to preserve the 
> original, it would solve more business problems. 
> 3. Using a single field, I could search with / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters. 
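
One way this could work in a single field (a sketch only, not part of any patch; it assumes the stemmer placed after it honors KeywordAttribute, which the Snowball-based stemmers do in recent 3.x releases): emit each token twice at the same position, once protected from stemming and once not.

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

/** Sketch: emits every token twice, once marked as keyword (so a following
 *  stemmer leaves it alone) and once unmarked (so it gets stemmed),
 *  both at the same position. */
public final class PreserveOriginalFilter extends TokenFilter {
  private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private AttributeSource.State pending;

  public PreserveOriginalFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      restoreState(pending);               // replay the buffered copy
      pending = null;
      keywordAtt.setKeyword(false);        // this copy may be stemmed downstream
      posIncAtt.setPositionIncrement(0);   // same position as the original
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    keywordAtt.setKeyword(true);           // protect the original from the stemmer
    pending = captureState();              // remember it so we can emit a stemmable copy
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
  }
}
{code}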

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3343:
---

Fix Version/s: (was: 3.4)
   3.5

> Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
> QueryParser
> 
>
> Key: LUCENE-3343
> URL: https://issues.apache.org/jira/browse/LUCENE-3343
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/queryparser
>Reporter: Olivier Favre
>Assignee: Adriano Crestani
>Priority: Minor
>  Labels: parser, query
> Fix For: 3.5, 4.0
>
> Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> To offer better interoperability with other search engines and to provide an 
> easier and more straightforward syntax,
> the operators >, >=, <, <= and = should be available to express an open range 
> query.
> They should at least work for numeric queries.
> '=' can be made a synonym for ':'.
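
For numeric fields, each proposed operator has a direct open-ended NumericRangeQuery equivalent, so the parser change is mostly syntax. A rough sketch of that mapping (the helper class and field/value names are placeholders, and the field is assumed to be trie-encoded with NumericField):

{code}
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

// Possible query-parser output for the proposed comparison operators.
public final class NumericComparisonQueries {
  // price >= 10  ->  [10, *)
  static Query greaterOrEqual(String field, int value) {
    return NumericRangeQuery.newIntRange(field, value, null, true, false);
  }
  // price > 10   ->  (10, *)
  static Query greaterThan(String field, int value) {
    return NumericRangeQuery.newIntRange(field, value, null, false, false);
  }
  // price <= 10  ->  (*, 10]
  static Query lessOrEqual(String field, int value) {
    return NumericRangeQuery.newIntRange(field, null, value, false, true);
  }
  // price < 10   ->  (*, 10)
  static Query lessThan(String field, int value) {
    return NumericRangeQuery.newIntRange(field, null, value, false, false);
  }
}
{code}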

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3380) enable FileSwitchDirectory randomly in tests and fix compound-file/NoSuchDirectoryException bugs

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3380:
---

Fix Version/s: (was: 3.4)
   3.5

> enable FileSwitchDirectory randomly in tests and fix 
> compound-file/NoSuchDirectoryException bugs
> 
>
> Key: LUCENE-3380
> URL: https://issues.apache.org/jira/browse/LUCENE-3380
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3380.patch
>
>
> Looks like FileSwitchDirectory has the same bugs in it as LUCENE-3374.
> We should randomly enable this guy in tests and flush them all out the same 
> way.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3363:
---

Fix Version/s: (was: 3.4)
   3.5

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> --
>
> Key: LUCENE-3363
> URL: https://issues.apache.org/jira/browse/LUCENE-3363
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch 
> (LUCENE-3030) and comment out the @Ignore in 
> TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {{ant 
> test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite 
> -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[ANNOUNCE] Apache Solr 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Solr™ 3.4.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0.

Apache Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
distributed search and index replication, and it powers the search and
navigation features of many of the world's largest internet sites.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:

   http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below).

If you are already using Apache Solr 3.1, 3.2 or 3.3, we strongly
recommend you upgrade to 3.4.0 because of the index corruption bug on OS
or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.4.0 Release Highlights:

  * Bug fixes and improvements from Apache Lucene 3.4.0, including a
major bug (LUCENE-3418) whereby a Lucene index could
easily become corrupted if the OS or computer crashed or lost
power.

  * SolrJ client can now parse grouped and range facets results
(SOLR-2523).

  * A new XsltUpdateRequestHandler allows posting XML that's
transformed by a provided XSLT into a valid Solr document
(SOLR-2630).

  * Post-group faceting option (group.truncate) can now compute
facet counts for only the highest ranking documents per-group.
(SOLR-2665).

  * Add commitWithin update request parameter to all update handlers
that were previously missing it.  This tells Solr to commit the
change within the specified amount of time (SOLR-2540).

  * You can now specify NIOFSDirectory (SOLR-2670).

  * New parameter hl.phraseLimit speeds up FastVectorHighlighter
(LUCENE-3234).

  * The query cache and filter cache can now be disabled per request
See http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
(SOLR-2429).

  * Improved memory usage, build time, and performance of
SynonymFilterFactory (LUCENE-3233).

  * Added omitPositions to the schema, so you can omit position
information while still indexing term frequencies (LUCENE-2048).

  * Various fixes for multi-threaded DataImportHandler.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[ANNOUNCE] Apache Lucene 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Lucene™ 3.4.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 3.4.0.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:

   http://www.apache.org/dyn/closer.cgi/lucene/java (see note below).

If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly
recommend you upgrade to 3.4.0 because of the index corruption bug on
OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 3.4.0 Release Highlights:

  * Fixed a major bug (LUCENE-3418) whereby a Lucene index could
easily become corrupted if the OS or computer crashed or lost
power.

  * Added a new faceting module (contrib/facet) for computing facet
counts (both hierarchical and non-hierarchical) at search
time (LUCENE-3079).

  * Added a new join module (contrib/join), enabling indexing and
searching of nested (parent/child) documents using
BlockJoinQuery/Collector (LUCENE-3171).

  * It is now possible to index documents with term frequencies
included but without positions (LUCENE-2048); previously
omitTermFreqAndPositions always omitted both.

  * The modular QueryParser (contrib/queryparser) can now create
NumericRangeQuery.

  * Added SynonymFilter, in contrib/analyzers, to apply multi-word
synonyms during indexing or querying, including parsers to read
the wordnet and solr synonym formats (LUCENE-3233).

  * You can now control how documents that don't have a value on the
sort field should sort (LUCENE-3390), using SortField.setMissingValue.

  * Fixed a case where term vectors could be silently deleted from the
index after addIndexes (LUCENE-3402).

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104917#comment-13104917
 ] 

Yonik Seeley commented on SOLR-2066:


I took a quick peek at this, and I see some changes to how distrib search works 
(ShardRequestFactory).
Could you give a brief explanation about the need for that and how it works?  
Maybe changes like this should be in their own issue so it's easy to tell other 
refactoring vs what's needed just for grouping.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2767) ClassCastException when using FieldAnalysisResponse and the analyzer list contains the CharMappingFilter (or any CharFilter)

2011-09-14 Thread Spyros Kapnissis (JIRA)
ClassCastException when using FieldAnalysisResponse and the analyzer list 
contains the CharMappingFilter (or any CharFilter)


 Key: SOLR-2767
 URL: https://issues.apache.org/jira/browse/SOLR-2767
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.3, 4.0
Reporter: Spyros Kapnissis


I use the FieldAnalysisResponse class in order to gather some information about 
the analysis process. However, I get a ClassCastException (cannot convert 
String to NamedList) thrown at AnalysisResponseBase.buildPhases method 
if I have included the CharMappingFilter in my configuration.

It seems that a CharFilter does not create a NamedList, but a String 
entry in the analysis response.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104864#comment-13104864
 ] 

David Smiley commented on SOLR-2761:


I recommend that we follow through with the alternative suggested in the source 
code comment: sort by weight and divide evenly.  That will handle the actual 
distribution in the data no matter what the curve looks like.

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 
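
A sketch of the "sort by weight and divide evenly" alternative suggested in the comment above (plain Java; the names are mine): bucket assignment follows rank quantiles, so a long-tail distribution still spreads across all buckets.

{code}
import java.util.Arrays;

// Assign each weight to a bucket by its rank rather than by linear interpolation
// between min and max, so a long-tail distribution fills every bucket evenly.
public class RankBucketDemo {
  static int[] bucketByRank(long[] sortedWeightsAscending, int numBuckets) {
    int n = sortedWeightsAscending.length;
    int[] buckets = new int[n];
    for (int rank = 0; rank < n; rank++) {
      buckets[rank] = (int) ((long) rank * numBuckets / n);   // 0 .. numBuckets-1
    }
    return buckets;
  }

  public static void main(String[] args) {
    long[] weights = {1, 1, 1, 2, 2, 3, 5, 8, 40, 1000};      // long-tail-ish frequencies
    Arrays.sort(weights);
    System.out.println(Arrays.toString(bucketByRank(weights, 5)));
    // [0, 0, 1, 1, 2, 2, 3, 3, 4, 4] -- contrast with linear min/max bucketing,
    // which would put everything except the 1000 into the lowest bucket.
  }
}
{code}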

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations

2011-09-14 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2585:
-

Attachment: SOLR-2585.patch

As previously discussed, the Lucene portion of this issue has been spun off to 
LUCENE-3436.  

This is a new patch with just the Solr piece.  Also, the new "Suggest Mode" 
enum is used both for the Original Lucene Spell Checker and DirectSpellChecker.

> Context-Sensitive Spelling Suggestions & Collations
> ---
>
> Key: SOLR-2585
> URL: https://issues.apache.org/jira/browse/SOLR-2585
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch
>
>
> Solr currently cannot offer what I'm calling here a "context-sensitive" 
> spelling suggestion.  That is, if a user enters one or more words that have 
> docFrequency > 0, but nevertheless are misspelled, then no suggestions are 
> offered.  Currently, Solr will always consider a word "correctly spelled" if 
> it is in the index and/or dictionary, regardless of context.  This issue & 
> patch add support for context-sensitive spelling suggestions. 
> See SpellCheckCollatorTest.testContextSensitiveCollate() for a the typical 
> use case for this functionality.  This tests both using 
> IndexBasedSepllChecker and DirectSolrSpellChecker. 
> Two new Spelling Parameters were added:
>   - spellcheck.alternativeTermCount - The count of suggestions to return for 
> each query term existing in the index and/or dictionary.  Presumably, users 
> will want fewer suggestions for words with docFrequency>0.  Also setting this 
> value turns "on" context-sensitive spell suggestions. 
>   - spellcheck.maxResultsForSuggest - The maximum number of hits the request 
> can return in order to both generate spelling suggestions and set the 
> "correctlySpelled" element to "false".  For example, if this is set to 5 and 
> the user's query returns 5 or fewer results, the spellchecker will report 
> "correctlySpelled=false" and also offer suggestions (and collations if 
> requested).  Setting this greater than zero is useful for creating 
> "did-you-mean" suggestions for queries that return a low number of hits.
> I have also included a test using shards.  See additions to 
> DistributedSpellCheckComponentTest. 
> In Lucene, SpellChecker.java can already support this functionality (by 
> passing a null IndexReader and field-name).  The DirectSpellChecker, however, 
> needs a minor enhancement.  This gives the option to allow DirectSpellChecker 
> to return suggestions for all query terms regardless of frequency.
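
A rough SolrJ example of how the two new parameters would be used (the "/spell" handler name, the query text, and the threshold values are made up; the parameter names come from the description above):

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ContextSensitiveSpellDemo {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("title:(brwn fox)");  // hypothetical misspelled query
    q.set("qt", "/spell");                            // assumes a handler with the spellcheck component
    q.set("spellcheck", true);
    q.set("spellcheck.collate", true);
    // the two new parameters from this patch:
    q.set("spellcheck.alternativeTermCount", 5);  // also suggest for terms that exist in the index
    q.set("spellcheck.maxResultsForSuggest", 5);  // treat <=5 hits as "probably misspelled"

    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getSpellCheckResponse().getCollatedResult());
  }
}
{code}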

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104857#comment-13104857
 ] 

Steven Rowe commented on SOLR-2756:
---

With the patch applied, the output under {{solr/solrj/}} from {{mvn 
dependency:tree}} under Java 1.6 is now:

{noformat}
[INFO] org.apache.solr:solr-solrj:jar:3.5-SNAPSHOT
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile (version managed 
from 1.0.4)
[INFO] |  \- commons-codec:commons-codec:jar:1.4:compile (version managed from 
1.2)
[INFO] +- commons-io:commons-io:jar:1.4:compile
[INFO] +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
[INFO] \- org.slf4j:slf4j-api:jar:1.6.1:compile
{noformat}

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed that needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3436) Spellchecker "Suggest Mode" Support

2011-09-14 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated LUCENE-3436:
---

Attachment: LUCENE-3436.patch

- Creates a new Enum, "SuggestMode".  
- SpellChecker methods that used to take a boolean "morePopular" have been 
converted to use the new Enum.
- Old SpellChecker methods have been marked as @Deprecated with comments (can 
be removed entirely for Trunk.  Included here for possible 3.x inclusion)
- A new Unit Test method for o.a.l.s.s.SpellChecker tests 
"SUGGEST_MORE_POPULAR" and "SUGGEST_ALWAYS" (prior, "morePopular" had no test 
coverage).
- A new Unit Test scenario added to TestDirectSpellChecker for testing 
"SUGGEST_ALWAYS".

> Spellchecker "Suggest Mode" Support
> ---
>
> Key: LUCENE-3436
> URL: https://issues.apache.org/jira/browse/LUCENE-3436
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Affects Versions: 3.3, 4.0
>Reporter: James Dyer
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3436.patch
>
>
> This is a spin-off from SOLR-2585.
> Currently o.a.l.s.s.SpellChecker and o.a.l.s.s.DirectSpellChecker support two 
> "Suggest Modes":
> 1. Suggest for terms that are not in the index.
> 2. Suggest "more popular" terms.
> This issue is to add a third Suggest Mode:
> 3. Suggest always.
> This will assist users in developing context-sensitive spell suggestions 
> and/or did-you-mean suggestions.  See SOLR-2585 for a full discussion.
> Note that o.a.l.s.s.SpellChecker already can support this functionality, if 
> the user passes in a NULL term & IndexReader.  This, however, is not 
> intuitive.  o.a.l.s.s.DirectSpellChecker currently has no support for this 
> third Suggest Mode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104845#comment-13104845
 ] 

Dawid Weiss commented on SOLR-2761:
---

I guess a lot depends on the use case. In my case quantization was not a 
problem (the scores were "rough" and query independent anyway, so they did fall 
into corresponding buckets). "poor" performance would then have to be backed by 
what the requirement really is -- if one needs sorting by exact scores then the 
method used to speed up FSTLookup simply isn't a good fit. Still, compared to 
fetching everything and resorting this is a hell of a lot faster, so many folks 
(including me) may find it helpful.

It all depends, in other words.

As for using more buckets -- sure, you can do this. In fact, you can combine 
both approaches: use quantization to prefetch a buffer of matches, then collect 
the outputs and sort them by exact score; if this already fills your desired 
number of results, there is no need to search any further, because all 
remaining buckets will have lower (exact) scores.
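
A plain-Java sketch of that combined strategy (the Candidate type and the 
bucket iteration are made up for illustration; no Lucene APIs are used):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BucketedRescoreSketch {
  static class Candidate {
    final String term;
    final float exactScore;   // exact weight kept in a side structure
    Candidate(String term, float exactScore) {
      this.term = term;
      this.exactScore = exactScore;
    }
  }

  /**
   * Buckets arrive highest-weighted first. Collect whole buckets until at
   * least n candidates are buffered; every remaining bucket holds lower
   * weights, so the true top n are already in the buffer. Then re-sort the
   * buffer by exact score.
   */
  static List<Candidate> topN(List<List<Candidate>> bucketsHighToLow, int n) {
    List<Candidate> buffer = new ArrayList<Candidate>();
    for (List<Candidate> bucket : bucketsHighToLow) {
      buffer.addAll(bucket);
      if (buffer.size() >= n) {
        break; // no need to search further: later buckets only score lower
      }
    }
    Collections.sort(buffer, new Comparator<Candidate>() {
      public int compare(Candidate a, Candidate b) {
        return Float.compare(b.exactScore, a.exactScore); // descending
      }
    });
    return buffer.subList(0, Math.min(n, buffer.size()));
  }
}
{code}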

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104843#comment-13104843
 ] 

Ryan McKinley commented on SOLR-2756:
-

bq. However, because ConcurrentLRUCache is the only class in Solrj that 
requires the lucene-core dependency, and solr-core's FastLRUCache is the only 
Lucene/Solr use of ConcurrentLRUCache, I think ConcurrentLRUCache should be 
moved from Solrj to solr-core.

+1

solrj is the *client*; it should not have any dependency on the server.

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed that needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3436) Spellchecker "Suggest Mode" Support

2011-09-14 Thread James Dyer (JIRA)
Spellchecker "Suggest Mode" Support
---

 Key: LUCENE-3436
 URL: https://issues.apache.org/jira/browse/LUCENE-3436
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 3.3, 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 3.5, 4.0


This is a spin-off from SOLR-2585.

Currently o.a.l.s.s.SpellChecker and o.a.l.s.s.DirectSpellChecker support two 
"Suggest Modes":
1. Suggest for terms that are not in the index.
2. Suggest "more popular" terms.

This issue is to add a third Suggest Mode:
3. Suggest always.

This will assist users in developing context-sensitive spell suggestions and/or 
did-you-mean suggestions.  See SOLR-2585 for a full discussion.

Note that o.a.l.s.s.SpellChecker already can support this functionality, if the 
user passes in a NULL term & IndexReader.  This, however, is not intuitive.  
o.a.l.s.s.DirectSpellChecker currently has no support for this third Suggest 
Mode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-14 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104841#comment-13104841
 ] 

Shawn Heisey commented on SOLR-2739:


I've also been seeing intermittent failures in TestCSVLoader, in both 3.4 and 
3x.  The nature of the failure is the same as on TestSqlEntityProcessorDelta, 
numFound shows a different value than what the test expects.  If I run the 
following command over and over, sometimes it will fail, but mostly it will 
pass:

ant test -Dtestcase=TestCSVLoader -Dtestmethod=testCommitWithin

Here's one failure on lucene_solr_3_4:

[junit] Testsuite: org.apache.solr.handler.TestCSVLoader
[junit] Testcase: testCommitWithin(org.apache.solr.handler.TestCSVLoader):  
Caused an ERROR
[junit] Exception during query
[junit] java.lang.RuntimeException: Exception during query
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:385)
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:352)
[junit] at 
org.apache.solr.handler.TestCSVLoader.testCommitWithin(TestCSVLoader.java:121)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
[junit] Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:378)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 29.793 sec
[junit]
[junit] - Standard Error -
[junit] 2011-09-15 12:46:04.PD org.apache.solr.SolrTestCaseJ4 assertQ
[junit] SEVERE: REQUEST FAILED: xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] 2011-09-15 12:46:04.PD org.apache.solr.common.SolrException log
[junit] SEVERE: REQUEST FAILED: 
start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0:java.lang.RuntimeException:
 REQUEST FAILED: xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:378)
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:352)
[junit] at 
org.apache.solr.handler.TestCSVLoader.testCommitWithin(TestCSVLoader.java:121)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit] at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at 
junit.framework.JUnit

[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-14 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104824#comment-13104824
 ] 

Shawn Heisey commented on SOLR-2739:


This is still a problem.  As it seems to be specific to my environment, I am 
very interested in tracking it down, but I have no idea where to begin.  My 
current test setup is CentOS 6, ext4, Oracle Java 1.6.0_27-b07.  Can you give 
me pointers on how to figure out what the problem is?  Do you need me to 
provide any more information than I have already?

Steps to reproduce current failures on my system from the commandline with 3.4 
or branch_3x:

svn co https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_4 
lucene_solr_3_4
cd lucene_solr_3_4/solr
ant test

svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x branch_3x
cd branch_3x/solr
ant test

Some additional info:

[root@bigindy5 src]# rpm -qa | egrep "ant|junit|java"
java-1.6.0-openjdk-1.6.0.0-1.36.b17.el6_0.x86_64
java-1.6.0-sun-1.6.0.27-1.0.cf.x86_64
ant-nodeps-1.7.1-13.el6.x86_64
wpa_supplicant-0.6.8-10.el6.x86_64
libvirt-java-devel-0.4.5-2.el6.noarch
java-1.6.0-sun-devel-1.6.0.27-1.0.cf.x86_64
enchant-1.5.0-4.el6.x86_64
tzdata-java-2011g-1.el6.noarch
java_cup-0.10k-5.el6.x86_64
junit-3.8.2-6.5.el6.x86_64
java-1.6.0-sun-plugin-1.6.0.27-1.0.cf.x86_64
anthy-9100h-10.1.el6.x86_64
libvirt-java-0.4.5-2.el6.noarch
java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
ant-1.7.1-13.el6.x86_64
ant-junit-1.7.1-13.el6.x86_64
ibus-anthy-1.2.1-1.el6.x86_64
java-1.6.0-sun-jdbc-1.6.0.27-1.0.cf.x86_64
junit4-4.5-5.3.el6.noarch

[root@bigindy5 src]# java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)

[root@bigindy5 src]# uname -a
Linux bigindy5 2.6.32-71.29.1.el6.centos.plus.x86_64 #1 SMP Sun Jun 26 16:27:27 
BST 2011 x86_64 x86_64 x86_64 GNU/Linux


> TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some 
> systems
> ---
>
> Key: SOLR-2739
> URL: https://issues.apache.org/jira/browse/SOLR-2739
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.3
>Reporter: Shawn Heisey
>Assignee: Hoss Man
> Fix For: 3.4, 4.0
>
>
> Shawn Heisey noted on the mailing list that he was getting consistent 
> failures from TestSqlEntityProcessorDelta.testNonWritablePersistFile on his 
> machine.
> I can't reproduce his exact failures, but the test is hinky enough that i 
> want to try and clean it up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2756:
--

Attachment: SOLR-2756-zookeeper-and-stax-api.patch

bq. there appears to be a dependency on stax:stax-api:jar:1.0.1 that is 
questionable if we already depend on geronimo's stax API - which I assume is 
the same Stax API.

This version of the patch excludes the stax:stax-api transitive dependency.

I also added a {{CHANGES.txt}} entry.

I plan on committing later today to branch_3x, then applying the same 
stax:stax-api transitive dependency exclusion to trunk (the other changes to 
branch_3x are not applicable to trunk).

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed that needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Created] (SOLR-2760) Cannot "ant dist or ant example"

2011-09-14 Thread Grant Ingersoll
Did you clean first?

On Sep 14, 2011, at 1:49 AM, Bill Bell wrote:

> Thoughts on this?
> 
> I did an "svn up"
> 
> 
> On 9/13/11 11:00 PM, "Bill Bell (JIRA)"  wrote:
> 
>> Cannot "ant dist or ant example"
>> 
>> 
>>Key: SOLR-2760
>>URL: https://issues.apache.org/jira/browse/SOLR-2760
>>Project: Solr
>> Issue Type: Bug
>>   Reporter: Bill Bell
>> 
>> 
>> Path: .
>> URL: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr
>> Repository Root: http://svn.apache.org/repos/asf
>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> Revision: 1170435
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: chrism
>> Last Changed Rev: 1170425
>> Last Changed Date: 2011-09-13 21:36:56 -0600 (Tue, 13 Sep 2011)
>> 
>> 
>> Then
>> 
>>> ant dist or ant example
>> 
>> compile-core:
>>   [javac] Compiling 23 source files to
>> /Users/bill/solr/trunk/modules/queries/build/classes/java
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:48: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put("searcher",searcher);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:61: cannot find symbol
>>   [javac] symbol  : class ConstDoubleDocValues
>>   [javac] location: class
>> org.apache.lucene.queries.function.valuesource.NormValueSource
>>   [javac]   return new ConstDoubleDocValues(0.0, this);
>>   [javac]  ^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NumDocsValueSource.java:40: cannot find symbol
>>   [javac] symbol  : class ConstIntDocValues
>>   [javac] location: class
>> org.apache.lucene.queries.function.valuesource.NumDocsValueSource
>>   [javac] return new
>> ConstIntDocValues(ReaderUtil.getTopLevelContext(readerContext).reader.numD
>> ocs(), this);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/QueryValueSource.java:73: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put(this, w);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/ScaleFloatFunction.java:96: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put(this.source, scaleInfo);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/SumTotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>   [javac] context.put(this, new LongDocValues(this) {
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/TotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>   [javac] context.put(this, new LongDocValues(this) {
>>   [javac]^
>>   [javac] 2 errors
>>   [javac] 5 warnings
>> 
>> 
>> --
>> This message is automatically generated by JIRA.
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com



[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104797#comment-13104797
 ] 

Steven Rowe commented on SOLR-2756:
---

bq. I think ConcurrentLRUCache should be moved from Solrj to solr-core.

Done: SOLR-2758

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed that needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2758.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.5

Committed to trunk and branch_3x.

Thanks David!

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there any way.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Trunk test failure: ExtractingRequestHandlerTest.testCommitWithin() [was: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync]

2011-09-14 Thread Chris Hostetter

: This is 100% reproducible on my local machine (run from 
solr/contrib/extraction/):
: 
: ant test -Dtestcase=ExtractingRequestHandlerTest 
-Dtestmethod=testCommitWithin 
-Dtests.seed=-2b35f16e02bddd0d:5c36eb67e44fc16d:-54d0d485d6a45315

I reopend SOLR-2540, where this test was added.

Jan?  are you looking at this?

: 
: Steve
: 
: > -Original Message-
: > From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
: > Sent: Tuesday, September 13, 2011 12:09 PM
: > To: dev@lucene.apache.org
: > Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync
: > 
: > Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/239/
: > 
: > 1 tests failed.
: > FAILED:
: > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
: > tWithin
: > 
: > Error Message:
: > Exception during query
: > 
: > Stack Trace:
: > java.lang.RuntimeException: Exception during query
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:396)
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:363)
: > at
: > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
: > tWithin(ExtractingRequestHandlerTest.java:306)
: > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
: > at
: > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
: > :57)
: > at
: > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
: > mpl.java:43)
: > at java.lang.reflect.Method.invoke(Method.java:616)
: > at
: > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMeth
: > od.java:44)
: > at
: > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallabl
: > e.java:15)
: > at
: > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod
: > .java:41)
: > at
: > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.
: > java:20)
: > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
: > at
: > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
: > :28)
: > at
: > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
: > 1)
: > at
: > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.
: > java:76)
: > at
: > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
: > .java:148)
: > at
: > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
: > .java:50)
: > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
: > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
: > at
: > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
: > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
: > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
: > at
: > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
: > :28)
: > at
: > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
: > 1)
: > at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
: > at
: > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java
: > :35)
: > at
: > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Prov
: > ider.java:146)
: > at
: > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.jav
: > a:97)
: > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
: > at
: > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
: > :57)
: > at
: > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
: > mpl.java:43)
: > at java.lang.reflect.Method.invoke(Method.java:616)
: > at
: > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(
: > ProviderFactory.java:103)
: > at $Proxy0.invoke(Unknown Source)
: > at
: > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireS
: > tarter.java:145)
: > at
: > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(Suref
: > ireStarter.java:87)
: > at
: > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
: > Caused by: java.lang.RuntimeException: REQUEST FAILED:
: > xpath=//*[@numFound='1']
: > xml response was: 
: > 
: > 0 name="QTime">0 start="0">
: > 
: > 
: > request was:start=0&q=id:one&qt=standard&rows=20&version=2.2
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:389)
: > ... 36 more
: > 
: > 
: > 
: > 
: > Build Log (for compile errors):
: > [...truncated 24297 lines...]
: > 
: > 
: > 
: > -
: > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
: > For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 

-Hoss

-

[jira] [Reopened] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened SOLR-2540:



the new ExtractingRequestHandlerTest.testCommitWithin method fails fairly 
reliably on multiple machines.

Noted by sarowe on the dev list...

{noformat}
Subject: Trunk test failure: ExtractingRequestHandlerTest.testCommitWithin() 
[was: [JENKINS-MAVEN]
Lucene-Solr-Maven-trunk #239: POMs out of sync]

This is 100% reproducible on my local machine (run from 
solr/contrib/extraction/):

ant test -Dtestcase=ExtractingRequestHandlerTest -Dtestmethod=testCommitWithin
-Dtests.seed=-2b35f16e02bddd0d:5c36eb67e44fc16d:-54d0d485d6a45315
{noformat}

...i can reproduce this failure every time i try (regardless of seed)


> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl 
> http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1
>-H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2766) ant prepare-release fails to package the javadocs for solr-test-framework and solr-solrj

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104773#comment-13104773
 ] 

Steven Rowe commented on SOLR-2766:
---

bq. I think this is just a packaging problem (maybe from the recent renaming?); 
I see their javadocs under solr/build/solr-solrj/docs/api and 
solr/build/solr-test-framework/docs/api.

Yes, this is undoubtedly my fault (SOLR-2452).  I'll investigate.

> ant prepare-release fails to package the javadocs for solr-test-framework and 
> solr-solrj
> 
>
> Key: SOLR-2766
> URL: https://issues.apache.org/jira/browse/SOLR-2766
> Project: Solr
>  Issue Type: Bug
>Reporter: Michael McCandless
> Fix For: 3.5
>
>
> I was updating Solr's web site with the 3.4.0 release, but suddenly
> discovered that the javadocs for the test-framework and solrj (linked
> under the Documentation tab on the left) are missing from
> apache-solr-3.4.0.tgz.
> Ie, when I "tar xzf" that, then:
> {noformat}
> find . -name index.html
> ./docs/index.html
> ./docs/api/index.html
> {noformat}
> (3.3.0's tgz does include them)
> I think this is just a packaging problem (maybe from the recent
> renaming?); I see their javadocs under solr/build/solr-solrj/docs/api
> and solr/build/solr-test-framework/docs/api.
> I also see separate javadocs for all solr contribs... are these 
> supposed to be published on the web site?
> For now I've just copied up the solrj and test-framework jdocs, built
> from the 3.4 branch, to the site.  But we should fix this for
> 3.5.0

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104766#comment-13104766
 ] 

David Smiley commented on SOLR-2761:


bq. David, when you use > 100 buckets did you see bad performance for 
low-weight lookups?

I didn't try in any serious way. I was simply writing about this feature when I 
observed the suggestions were poor compared to other Lookup impls and other 
ways of doing term completion. Then I started digging into why and what could 
be done about it.

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104761#comment-13104761
 ] 

Michael McCandless commented on SOLR-2761:
--

Ooooh, the javadocs and comments are awesome! -- thanks Dawid and
David.

I was just wondering what specifically is the limitation on our FST
impl and whether it's something we could improve.  It sounds like the
limitation is just how we quantize the incoming weights...

David, when you use > 100 buckets did you see bad performance for
low-weight lookups?

Maybe, in addition to the up-front quantization, we could also store
a more exact weight for each term (eg as the output).  Then on
retrieve we could re-sort the candidates by that exact weight.  But
this will make the FST larger...


> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2766) ant prepare-release fails to package the javadocs for solr-test-framework and solr-solrj

2011-09-14 Thread Michael McCandless (JIRA)
ant prepare-release fails to package the javadocs for solr-test-framework and 
solr-solrj


 Key: SOLR-2766
 URL: https://issues.apache.org/jira/browse/SOLR-2766
 Project: Solr
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.5



I was updating Solr's web site with the 3.4.0 release, but suddenly
discovered that the javadocs for the test-framework and solrj (linked
under the Documentation tab on the left) are missing from
apache-solr-3.4.0.tgz.

Ie, when I "tar xzf" that, then:

{noformat}
find . -name index.html
./docs/index.html
./docs/api/index.html
{noformat}

(3.3.0's tgz does include them)

I think this is just a packaging problem (maybe from the recent
renaming?); I see their javadocs under solr/build/solr-solrj/docs/api
and solr/build/solr-test-framework/docs/api.

I also see separate javadocs for all solr contribs... are these
supposed to be published on the web site?

For now I've just copied up the solrj and test-framework jdocs, built
from the 3.4 branch, to the site.  But we should fix this for
3.5.0


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104710#comment-13104710
 ] 

Dawid Weiss commented on SOLR-2761:
---

Let me know if anything is not clear, Mike.

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104703#comment-13104703
 ] 

Dawid Weiss commented on LUCENE-3429:
-

bq. if a waiting thread doesn't react to interrupt() it won't react to stop() 
either. But without the waiting, the statement is wrong

Yes, this is probably true. If it's waiting (on a monitor or i/o) and doesn't 
react to interrupt, then it's in a deep hole somewhere where nothing's going to 
help it :)

bq. Yes, its "main" on Sun/IBM/... and "Main Thread" on Jrockit

Ok, so can I change it the way I suggested (i.e. have a "test" thread variable 
on the test superclass and compare to it instead)? You didn't explain the 
reason this code needs this comparison at all.
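
A minimal sketch of that alternative (the class and field names are made up, 
and it assumes the class-setup hook runs on the same thread as the tests):

{code}
import org.junit.BeforeClass;

public abstract class TestThreadCaptureSketch {
  // Captured once per test class instead of comparing Thread.getName() strings.
  private static volatile Thread testThread;

  @BeforeClass
  public static void captureTestThread() {
    testThread = Thread.currentThread();
  }

  /** Identity comparison replaces the "main" vs "Main Thread" name check. */
  protected static boolean isTestThread(Thread t) {
    return t == testThread;
  }
}
{code}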

> improve build system when tests hang
> 
>
> Key: LUCENE-3429
> URL: https://issues.apache.org/jira/browse/LUCENE-3429
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in hudson it can go hung for days until we manually 
> kill it.
> The problem is that when a hang happens its probably serious, what we want to 
> do (I think), is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> separately, I think we should have an "ant-level" timeout for the whole 
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
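
A rough sketch of the per-test watchdog described above (the class name, the 
failure callback, and how the timeout would be wired to a -D property are all 
assumptions):

{code}
import java.util.Map;

public final class TestTimeoutWatchdogSketch {
  /** Starts a daemon thread that dumps every stack trace and signals failure after timeoutMillis. */
  public static Thread start(final long timeoutMillis, final Runnable failTest) {
    Thread watchdog = new Thread("test-timeout-watchdog") {
      @Override
      public void run() {
        try {
          Thread.sleep(timeoutMillis);
        } catch (InterruptedException ie) {
          return; // the test finished in time and cancelled the watchdog
        }
        // Timeout exceeded: dump all threads to help diagnose the hang.
        for (Map.Entry<Thread, StackTraceElement[]> entry
            : Thread.getAllStackTraces().entrySet()) {
          System.err.println("THREAD: " + entry.getKey());
          for (StackTraceElement frame : entry.getValue()) {
            System.err.println("    at " + frame);
          }
        }
        failTest.run();   // e.g. record the random seed and fail the test
      }
    };
    watchdog.setDaemon(true);
    watchdog.start();
    return watchdog;
  }
}
{code}

A test that completes in time would simply interrupt() the returned thread to 
cancel it.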

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104708#comment-13104708
 ] 

Dawid Weiss commented on SOLR-2762:
---

Ok, I'll try to reproduce on my own, thanks.

> FSTLookup returns one less suggestion than it should when onlyMorePopular=true
> --
>
> Key: SOLR-2762
> URL: https://issues.apache.org/jira/browse/SOLR-2762
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
>
> I'm using the Suggester.  When I switched from TSTLookup to FSTLookup, I 
> noticed that it returned one fewer suggestion than what I asked for. I have 
> spellcheck.onlyMorePopular=true; when I set it to false, I see the correct 
> count. Another aspect of the bug is that this off-by-one bug only seems to 
> occur when my suggestion has an exact match.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104653#comment-13104653
 ] 

David Smiley commented on SOLR-2761:


It should be noted there are code comments Dawid left on doing another approach:
{code}
// Distribute weights into at most N buckets. This is a form of 
discretization to
// limit the number of possible weights so that they can be efficiently 
encoded in the
// automaton.
//
// It is assumed the distribution of weights is _linear_ so proportional 
division 
// of [min, max] range will be enough here. Other approaches could be to 
sort 
// weights and divide into proportional ranges.
{code}
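
For contrast, a toy sketch of that proportional mapping next to a log-scale 
alternative which spends more of the bucket range on the (numerous) 
low-frequency terms of the long tail (method names are made up):

{code}
public class WeightBucketingSketch {
  /** Proportional (linear) mapping of [min, max] into one of N buckets, as in the comment above. */
  static int linearBucket(long weight, long min, long max, int buckets) {
    if (max <= min) return 0;
    return (int) ((double) (weight - min) / (max - min) * (buckets - 1));
  }

  /** Log-scale mapping: most of the bucket resolution goes to the low weights. */
  static int logBucket(long weight, long min, long max, int buckets) {
    double range = Math.log1p(max - min);
    if (range == 0) return 0;
    return (int) (Math.log1p(weight - min) / range * (buckets - 1));
  }
}
{code}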

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3434) Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104646#comment-13104646
 ] 

Robert Muir commented on LUCENE-3434:
-

I think you can remove the @SuppressWarnings and use Collections.emptyMap() 
instead of Collections.EMPTY_MAP?

> Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable
> -
>
> Key: LUCENE-3434
> URL: https://issues.apache.org/jira/browse/LUCENE-3434
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3434-3x.patch, LUCENE-3434-trunk.patch
>
>
> Both ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper have setters which 
> change some state which impacts their analysis stack.  If these are going to 
> become reusable, then the state must be immutable as changing it will have no 
> effect.
> Process will be similar to QueryAutoStopWordAnalyzer, I will remove in trunk 
> and deprecate in 3x.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104641#comment-13104641
 ] 

David Smiley commented on SOLR-2761:


FSTLookup is well documented, thanks to Dawid.  Here is a link to the Javadocs 
for your convenience, Mike: 
https://builds.apache.org/job/Lucene-3.x/javadoc/all/org/apache/lucene/search/suggest/fst/FSTLookup.html?is-external=true

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-14 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104622#comment-13104622
 ] 

Grant Ingersoll commented on LUCENE-3435:
-

A good deal of it Mike and I worked out yesterday on IRC (well, mostly Mike 
explained and I took copious notes).  The disk storage stuff is based on LIA2.  
It is a theoretical model and not an empirical one, except that the bytes/term 
calculation was based on indexing Wikipedia.  

I would deem it a gross approximation of the state of trunk at this point in 
time.  My gut says the Lucene estimation is a little low, while Solr is fairly 
close (since I suspect Solr's memory usage is dominated by caching).  I imagine 
there are things still unaccounted for. For instance, I haven't reverse 
engineered the fieldValueCache memSize() method yet and I don't have a good 
sense of how much memory would be consumed in a highly concurrent system by the 
sheer number of Query objects instantiated or when one has really large Queries 
(say 5K terms).  It also is not meant to be one size fits all.  Lucene/Solr 
have a ton of tuning options that could change things significantly.

I did a few sanity checks against things I've seen in the past, and thought it 
was reasonable.  There is, of course, no substitute for good testing.  In other 
words, caveat emptor.

> Create a Size Estimator model for Lucene and Solr
> -
>
> Key: LUCENE-3435
> URL: https://issues.apache.org/jira/browse/LUCENE-3435
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Affects Versions: 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet and like the IDE stuff, we'll see how well 
> it gets maintained.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104608#comment-13104608
 ] 

Michael McCandless commented on SOLR-2761:
--

What limitation of FSTs is causing us to discretize the term frequencies?

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping between the minimum and maximum 
> values. I don't think this makes sense at all given the well-known 
> long-tail-like distribution of term frequencies. As a result, I've found it 
> necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 
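
For context on why a linear mapping wastes buckets on the long tail, here is a 
small illustrative sketch comparing proportional and log-scale bucketing; this 
is not the FSTLookup code, and the bucket count and frequencies are arbitrary:

{code}
// Illustrative comparison of linear vs. log-scale bucketing of term
// frequencies into N buckets. Not the FSTLookup implementation; the
// frequencies and bucket count are arbitrary.
public class BucketSketch {
  static int linearBucket(long freq, long min, long max, int buckets) {
    return (int) ((freq - min) * (buckets - 1) / Math.max(1, max - min));
  }

  static int logBucket(long freq, long min, long max, int buckets) {
    double lo = Math.log1p(min), hi = Math.log1p(max);
    return (int) ((Math.log1p(freq) - lo) * (buckets - 1) / Math.max(1e-9, hi - lo));
  }

  public static void main(String[] args) {
    long min = 1, max = 1000000;
    for (long f : new long[] {1, 10, 100, 10000, 1000000}) {
      System.out.printf("freq=%d  linear bucket=%d  log bucket=%d%n",
          f, linearBucket(f, min, max, 10), logBucket(f, min, max, 10));
    }
  }
}
{code}

With these numbers, the linear mapping collapses nearly every term below the 
maximum into bucket 0, while the log mapping spreads the tail across the range.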

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104586#comment-13104586
 ] 

Steven Rowe edited comment on SOLR-2758 at 9/14/11 3:54 PM:


When I applied the patch (using the 'patch' utility), the file movement didn't 
happen, so I modified the patch to depend on this svn script having already 
been run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.

  was (Author: steve_rowe):
When I applied the patch (using the 'patch' utility), the file movement 
didn't happen, so I modified the patch to depend on the this svn script having 
been already run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.
  
> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has on lucene-core, 
> and that is indirectly via ConcurrentLRUCache, which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2758:
--

Attachment: SOLR-2758_move_ConcurrentLRUCache.patch

When I applied the patch (using the 'patch' utility), the file movement didn't 
happen, so I modified the patch to depend on this svn script having already 
been run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has on lucene-core, 
> and that is indirectly via ConcurrentLRUCache, which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104582#comment-13104582
 ] 

Michael McCandless commented on SOLR-2762:
--

bq. I thought it best to at least report the problem instead of do nothing.

+1


> FSTLookup returns one less suggestion than it should when onlyMorePopular=true
> --
>
> Key: SOLR-2762
> URL: https://issues.apache.org/jira/browse/SOLR-2762
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
>
> I'm using the Suggester.  When I switched from TSTLookup to FSTLookup, I 
> noticed that it returned one fewer suggestion than I asked for. I have 
> spellcheck.onlyMorePopular=true; when I set it to false, I see the correct 
> count. Another aspect is that this off-by-one bug only seems to 
> occur when my suggestion has an exact match.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104577#comment-13104577
 ] 

Robert Muir commented on SOLR-2764:
---

I would leave the irregularities out (e.g. just like our English one, which 
basically 'strips the s').
Someone can always deal with exceptions with their own list: 
stemmerOverrideFilter etc.

I don't know anything about Norwegian, but you can take the other languages as 
examples here, and create the ruleset for the most common nominal inflections... 
e.g. strip { -a, -ene, -en, -er, -et } or whatever.
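
To make that concrete, here is a rough sketch of such a suffix-stripping rule; 
the suffix list and length guards are illustrative guesses, not a vetted 
Norwegian rule set:

{code}
// Rough sketch of a light suffix-stripping stemmer along the lines
// described above. The suffix list and length guards are illustrative,
// not a vetted Norwegian rule set.
public class LightNorwegianStemSketch {
  private static final String[] SUFFIXES = {"ene", "ane", "en", "et", "er", "ar", "a"};

  static String stem(String word) {
    if (word.length() <= 4) {
      return word; // leave very short tokens alone
    }
    for (String suffix : SUFFIXES) {
      if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
        return word.substring(0, word.length() - suffix.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    for (String w : new String[] {"bilene", "bilen", "biler", "bil"}) {
      System.out.println(w + " -> " + stem(w)); // all four map to "bil"
    }
  }
}
{code}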


> Create a Norwegian plural/singular stemmer
> --
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>
> We need a lightweight stemmer for plural/singular only in Norwegian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104574#comment-13104574
 ] 

Otis Gospodnetic commented on LUCENE-3435:
--

Grant - what is your experience with this estimator (the one you just 
committed)?  That is, how often is it right or close (how close?) to what you 
see in reality, assuming you give it correct input?


> Create a Size Estimator model for Lucene and Solr
> -
>
> Key: LUCENE-3435
> URL: https://issues.apache.org/jira/browse/LUCENE-3435
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Affects Versions: 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet and like the IDE stuff, we'll see how well 
> it gets maintained.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-14 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104568#comment-13104568
 ] 

Chris Male commented on LUCENE-3414:


Nope, it's on my mental TODO, but go for it.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3414.patch, LUCENE-3414.patch
>
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.
> It seems to still be in use but has fallen out of date.  I think it would 
> benefit from being inside the analysis module, where additional features such 
> as decompounding support could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104565#comment-13104565
 ] 

Robert Muir commented on LUCENE-3429:
-

{quote}
what was it for (thread name checking)?
{quote}

Yes, it's "main" on Sun/IBM/... and "Main Thread" on JRockit
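
For illustration, here is the kind of vendor-tolerant check this implies; a 
sketch only, not the actual test framework code:

{code}
// Sketch of a vendor-tolerant check for the JVM's main thread by name:
// "main" on Sun/IBM JVMs, "Main Thread" on JRockit.
public class MainThreadNameCheck {
  static boolean looksLikeMainThread(Thread t) {
    String name = t.getName();
    return "main".equals(name) || "Main Thread".equals(name);
  }

  public static void main(String[] args) {
    System.out.println(looksLikeMainThread(Thread.currentThread())); // true on common JVMs
  }
}
{code}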

> improve build system when tests hang
> 
>
> Key: LUCENE-3429
> URL: https://issues.apache.org/jira/browse/LUCENE-3429
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in Hudson the build can stay hung for days until we 
> manually kill it.
> The problem is that when a hang happens it's probably serious, so what we want 
> to do (I think) is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> separately, I think we should have an "ant-level" timeout for the whole 
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
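
As a rough illustration of steps 2 and 3 in the proposal above (a timer thread 
that dumps all stacks once a timeout is exceeded), here is a sketch; the class 
name, method names, and wiring into LuceneTestCase are hypothetical:

{code}
import java.util.Map;

// Rough sketch of a hang watchdog: after a timeout, dump every thread's
// stack so the hang can be diagnosed. The names and the wiring into
// LuceneTestCase are hypothetical, not a committed implementation.
public class HangWatchdogSketch {
  public static Thread start(final long timeoutMillis) {
    Thread watchdog = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(timeoutMillis);
        } catch (InterruptedException e) {
          return; // the test finished in time; the watchdog was cancelled
        }
        System.err.println("Test exceeded " + timeoutMillis + " ms, dumping threads:");
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
          System.err.println("Thread: " + e.getKey().getName());
          for (StackTraceElement frame : e.getValue()) {
            System.err.println("    at " + frame);
          }
        }
      }
    }, "test-hang-watchdog");
    watchdog.setDaemon(true);
    watchdog.start();
    return watchdog;
  }

  public static void main(String[] args) throws Exception {
    Thread w = start(100); // tiny timeout so the demo dumps quickly
    Thread.sleep(300);     // simulate a "hung" test
    w.interrupt();         // a test that finishes in time would cancel the watchdog like this
  }
}
{code}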

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104564#comment-13104564
 ] 

Jan Høydahl commented on LUCENE-3414:
-

Is there a JIRA for adding HunspellStemFilterFactory to Solr?

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3414.patch, LUCENE-3414.patch
>
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.
> It seems to still be in use but has fallen out of date.  I think it would 
> benefit from being inside the analysis module, where additional features such 
> as decompounding support could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2765) Shard/Node states

2011-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104562#comment-13104562
 ] 

Yonik Seeley commented on SOLR-2765:


- we probably want states on a per shard basis (in case we go with 
micro-sharding, a node may have multiple shards in different states).
- we might want a state on the node also... a way to mark it as "disabled" in 
general (note to rest of cluster - consider the node to be down)
- an active/enabled shard should be preferred as a leader

Perhaps at the same time think of adding "roles" to nodes: a comma-separated 
list with some pre-defined values, but where the user may also choose to define 
their own.  One example use case would be to have a bank of indexers for rich 
text (PDF, Word, etc.) that do all the work of text extraction or other 
expensive processing and forward the results to the right leader.  This could 
also be used to remove all search traffic from a node (by removing the standard 
"searcher" role) but allow it to stay up to date by remaining in the indexing 
loop.
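
Purely as an illustration of the combination being discussed (per-shard states 
plus node-level roles), here is a hypothetical sketch; none of these names or 
structures are a committed SolrCloud format:

{code}
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-shard states plus node-level roles, purely
// to illustrate the combination discussed above; not a committed format.
public class ClusterStateSketch {
  enum ShardState { RECOVERING, ACTIVE, DISABLED }
  enum NodeRole { SEARCHER, INDEXER }

  public static void main(String[] args) {
    // A node acting as an indexer only (no "searcher" role), hosting two
    // micro-shards that happen to be in different states.
    EnumSet<NodeRole> roles = EnumSet.of(NodeRole.INDEXER);
    Map<String, ShardState> shards = new HashMap<String, ShardState>();
    shards.put("collection1_shard1", ShardState.ACTIVE);
    shards.put("collection1_shard2", ShardState.RECOVERING);
    System.out.println("node=host1:8983 roles=" + roles + " shards=" + shards);
  }
}
{code}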



> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> Need states for shards that indicate whether they are recovering, 
> active/enabled, or disabled.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2765) Shard/Node states

2011-09-14 Thread Yonik Seeley (JIRA)
Shard/Node states
-

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley


Need states for shards that indicate whether they are recovering, 
active/enabled, or disabled.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2758:
-

Assignee: Steven Rowe

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has on lucene-core, 
> and that is indirectly via ConcurrentLRUCache, which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104541#comment-13104541
 ] 

David Smiley commented on SOLR-2758:


Please apply this patch to both the 3x and trunk branches. Someone might argue 
that _technically_ this is a breaking change because a class moved from point A 
to point B, but it is internal, and config files reference the caches via the 
"solr." shorthand.

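For example, a typical cache declaration of the kind meant here; the "solr." 
prefix resolves to the Solr class wherever it lives, so the package move is 
transparent to existing configs (the snippet is illustrative):

{code:xml}
<!-- Typical solrconfig.xml cache declaration. The "solr." shorthand
     resolves the class name, so moving ConcurrentLRUCache's package
     does not require config changes. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
{code}
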
> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has on lucene-core, 
> and that is indirectly via ConcurrentLRUCache, which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104539#comment-13104539
 ] 

Jan Høydahl commented on SOLR-2764:
---

Unfortunately the rules for noun inflection are much more complex in Norwegian 
than in English, and there are many irregularities.

> Create a Norwegian plural/singular stemmer
> --
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>
> We need a lightweight stemmer for plural/singular only in Norwegian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


