[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817133#comment-13817133
 ] 

Markus Jelsma commented on SOLR-5379:
-

Oh, I interpreted your comment as meaning you had tested it against the other 
patches linked to this one.

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query on whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the synonym 
> filter; the synonym filter therefore can't recognize the multi-word term to 
> expand it.
> - Second, if the synonym filter expands into multiple terms that contain a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms, but MultiPhraseQuery doesn't work when the terms have 
> different numbers of words.
> For the first problem, we can quote all multi-word synonyms in the user query 
> so that the Lucene query parser doesn't split them. There is a related JIRA 
> issue: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
> clauses containing multiple PhraseQuery objects when the token stream has a 
> multi-word synonym.
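The second proposed change can be sketched in plain Java (an illustrative model, not Solr's actual parser code; the synonym map and the query are hypothetical): each expansion alternative becomes its own phrase, and the alternatives are combined with OR (SHOULD) semantics, which, unlike MultiPhraseQuery, tolerates alternatives with different word counts.

```java
import java.util.*;

// Illustrative sketch (not Solr's parser): expand a query into one
// phrase per synonym alternative; each alternative would become a
// SHOULD clause of a BooleanQuery instead of one MultiPhraseQuery.
public class SynonymExpansionSketch {
    // hypothetical synonym map: multi-word source phrase -> alternatives
    static final Map<String, List<String>> SYNONYMS =
            Map.of("big apple", List.of("new york"));

    static List<String> expand(String query) {
        List<String> alternatives = new ArrayList<>();
        alternatives.add(quote(query));               // original phrasing
        for (Map.Entry<String, List<String>> e : SYNONYMS.entrySet()) {
            if (query.contains(e.getKey())) {
                for (String syn : e.getValue()) {
                    // a two-word phrase may map to a one-word synonym
                    alternatives.add(quote(query.replace(e.getKey(), syn)));
                }
            }
        }
        return alternatives;                          // joined with OR (SHOULD)
    }

    static String quote(String s) { return "\"" + s + "\""; }

    public static void main(String[] args) {
        // "big apple pie" -> "big apple pie" OR "new york pie"
        System.out.println(String.join(" OR ", expand("big apple pie")));
    }
}
```

The point of the OR-of-phrases shape is that each alternative stands alone as a complete phrase, so differing word counts never have to line up position by position.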



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5325) Move ValueSource and FunctionValues under core/

2013-11-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817138#comment-13817138
 ] 

Shai Erera commented on LUCENE-5325:


bq. Shai Erera, have you already started on this? If not I'd be happy to take 
it on.

No, I haven't started any work yet and won't be able to work on it in the next 
few weeks, so feel free to take it!

> Move ValueSource and FunctionValues under core/
> ---
>
> Key: LUCENE-5325
> URL: https://issues.apache.org/jira/browse/LUCENE-5325
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Shai Erera
>
> Spinoff from LUCENE-5298: ValueSource and FunctionValues are abstract APIs 
> which exist under the queries/ module. That causes any module which wants to 
> depend on these APIs (but not necessarily on any of their actual 
> implementations!), to depend on the queries/ module. If we move these APIs 
> under core/, we can eliminate these dependencies and add some mock impls for 
> testing purposes.
> Quoting Robert from LUCENE-5298:
> {quote}
> we should eliminate the suggest/ dependencies on expressions and queries, the 
> expressions/ on queries, the grouping/ dependency on queries, the spatial/ 
> dependency on queries, its a mess.
> {quote}
> To add to that list, facet/ should not depend on queries too.






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817174#comment-13817174
 ] 

Otis Gospodnetic commented on SOLR-5379:


[~bsteele] - maybe you had your colleagues test other patches, like SOLR-4381?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query on whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the synonym 
> filter; the synonym filter therefore can't recognize the multi-word term to 
> expand it.
> - Second, if the synonym filter expands into multiple terms that contain a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms, but MultiPhraseQuery doesn't work when the terms have 
> different numbers of words.
> For the first problem, we can quote all multi-word synonyms in the user query 
> so that the Lucene query parser doesn't split them. There is a related JIRA 
> issue: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
> clauses containing multiple PhraseQuery objects when the token stream has a 
> multi-word synonym.






[jira] [Commented] (LUCENE-5283) Fail the build if ant test didn't execute any tests (everything filtered out).

2013-11-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817175#comment-13817175
 ] 

ASF subversion and git services commented on LUCENE-5283:
-

Commit 1539975 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1539975 ]

LUCENE-5283: Fail the build if ant test didn't execute any tests (everything 
filtered out).

> Fail the build if ant test didn't execute any tests (everything filtered out).
> --
>
> Key: LUCENE-5283
> URL: https://issues.apache.org/jira/browse/LUCENE-5283
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5283-permgen.patch, LUCENE-5283.patch, 
> LUCENE-5283.patch, LUCENE-5283.patch
>
>
> This should be an optional setting that defaults to 'false' (the build 
> proceeds).






[jira] [Resolved] (LUCENE-5283) Fail the build if ant test didn't execute any tests (everything filtered out).

2013-11-08 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-5283.
-

Resolution: Fixed

> Fail the build if ant test didn't execute any tests (everything filtered out).
> --
>
> Key: LUCENE-5283
> URL: https://issues.apache.org/jira/browse/LUCENE-5283
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5283-permgen.patch, LUCENE-5283.patch, 
> LUCENE-5283.patch, LUCENE-5283.patch
>
>
> This should be an optional setting that defaults to 'false' (the build 
> proceeds).






[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices

2013-11-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817196#comment-13817196
 ] 

Michael McCandless commented on LUCENE-5333:


Hmm, if we wrap another FacetsAccumulator, is it the user's job to first create 
that accumulator (with no facet requests)? But then how do we create another 
one, with all the facet requests we derived from ROOT? I guess we could just 
switch on the N types we have? But then maybe we should just add static methods 
to each to make this "All" accumulator for each?

> Support sparse faceting for heterogeneous indices
> -
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Michael McCandless
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But I think this is very easy for the facet module: since ords are
> stored "row stride" (each doc lists all the facet labels it has), we
> could simply count all facets that the hits actually saw, and then at
> the end see which ones "got traction" and return facet results for
> those top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
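A toy model of this idea in plain Java (illustrative only, not the Lucene facet API; the "dim/value" label format and the field names are invented): count every label the hits expose, then keep only the dimensions that got traction.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy model of sparse faceting: each hit carries its facet labels
// ("row stride"), so we count everything we see and only afterwards
// decide which dimensions got enough traction to return.
public class SparseFacetSketch {

    // labels use an invented "dim/value" format, e.g. "memoryType/SO-DIMM"
    public static Map<String, Long> topDims(List<List<String>> hitLabels, int topN) {
        Map<String, Long> dimCounts = new HashMap<>();
        for (List<String> labels : hitLabels) {
            for (String label : labels) {
                String dim = label.substring(0, label.indexOf('/'));
                dimCounts.merge(dim, 1L, Long::sum);
            }
        }
        // keep only the topN dimensions by hit count
        return dimCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(topN)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<List<String>> hits = List.of(
                List.of("memoryType/SO-DIMM", "capacity/8GB"),
                List.of("memoryType/SO-DIMM", "capacity/4GB"),
                List.of("memoryType/DIMM"));
        System.out.println(topDims(hits, 1)); // memoryType dominates with 3 hits
    }
}
```

Nothing has to be declared up front: the set of dimensions worth returning falls out of the hits themselves, which is exactly the property the comment above is after.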






[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-11-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817257#comment-13817257
 ] 

Joel Bernstein commented on SOLR-5027:
--

David,

I was reading your comments while I was away on vacation, but my mobile device 
wasn't playing nicely with the JIRA site, so I held off on replying until I got 
back.

I see the issue that you've reported and I'll be working on it through the jira 
that you created. I'll be posting to that jira with my thoughts soon.

Joel



> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin*.
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high-performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example, in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups: 17 seconds.
> CollapsingQParserPlugin: 300 milliseconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore: removes docs with a null value in the collapse field (default).
> expand: treats each doc with a null value in the collapse field as a 
> separate group.
> collapse: collapses all docs with a null value into a single group using 
> either the highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
> *Note:* The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.
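The collapse semantics described above can be modeled in a few lines of plain Java (an illustrative sketch, not the plugin's implementation; the Doc class and its fields are invented): one surviving document per group, chosen by the min of a numeric field, with null group values dropped as under the default "ignore" null policy.

```java
import java.util.*;

// Illustrative model of field collapsing with min= and nullPolicy=ignore
// (not the CollapsingQParserPlugin itself; Doc and its fields are invented).
public class CollapseSketch {
    static final class Doc {
        final String id; final String group; final int price;
        Doc(String id, String group, int price) {
            this.id = id; this.group = group; this.price = price;
        }
    }

    // keep the doc with the lowest price per group; drop null-group docs
    static List<Doc> collapseByMin(List<Doc> docs) {
        Map<String, Doc> best = new LinkedHashMap<>();
        for (Doc d : docs) {
            if (d.group == null) continue;            // nullPolicy=ignore
            best.merge(d.group, d, (a, b) -> a.price <= b.price ? a : b);
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("d1", "groupA", 30),
                new Doc("d2", "groupA", 10),
                new Doc("d3", null, 5));
        for (Doc d : collapseByMin(docs)) {
            System.out.println(d.id);                 // d2 survives for groupA
        }
    }
}
```

Because a PostFilter sees each hit once, a single pass keeping one "best" doc per group, as above, is the shape that makes it so much cheaper than full grouping with ngroups.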






[jira] [Comment Edited] (SOLR-5027) Field Collapsing PostFilter

2013-11-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817257#comment-13817257
 ] 

Joel Bernstein edited comment on SOLR-5027 at 11/8/13 1:09 PM:
---

David,

I was reading your comments while I was away on vacation but my mobile device 
wasn't playing nicely with the jira site, so I held off on replying until I got 
back.

I see the issue that you've reported and I'll be working on it through the jira 
that you created. I'll be posting to that jira with my thoughts soon.

Joel




was (Author: joel.bernstein):
David,

I've was reading your comments while I was away on vacation but my mobile 
device wasn't playing nicely with the jira site, so I held off on replying 
until I got back.

I see the issue that you've reported and I'll be working on it through the jira 
that you created. I'll be posting to that jira with my thoughts soon.

Joel



> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin*.
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high-performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example, in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups: 17 seconds.
> CollapsingQParserPlugin: 300 milliseconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore: removes docs with a null value in the collapse field (default).
> expand: treats each doc with a null value in the collapse field as a 
> separate group.
> collapse: collapses all docs with a null value into a single group using 
> either the highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
> *Note:* The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.






[jira] [Commented] (SOLR-5149) Query facet to respect mincount

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817259#comment-13817259
 ] 

Markus Jelsma commented on SOLR-5149:
-

Any more comments on this? Anything to change? We've been using it in production 
for two months now and are happy with the results.

> Query facet to respect mincount
> ---
>
> Key: SOLR-5149
> URL: https://issues.apache.org/jira/browse/SOLR-5149
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.4
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 4.6
>
> Attachments: SOLR-5149-trunk.patch, SOLR-5149-trunk.patch, 
> SOLR-5149-trunk.patch, SOLR-5149-trunk.patch
>
>







[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-11-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817260#comment-13817260
 ] 

Joel Bernstein commented on SOLR-5027:
--

Greg,

Are you asking for the ability to use the full sort spec as the collapse 
criteria? I believe you are, but I just want to clarify.

You can currently use the full sort spec to sort the collapsed result set, but 
only the min/max of a numeric field as the collapse criteria.

Joel



> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin*.
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high-performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example, in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups: 17 seconds.
> CollapsingQParserPlugin: 300 milliseconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore: removes docs with a null value in the collapse field (default).
> expand: treats each doc with a null value in the collapse field as a 
> separate group.
> collapse: collapses all docs with a null value into a single group using 
> either the highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
> *Note:* The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.






[jira] [Commented] (SOLR-5042) MoreLikeThis doesn't return a score when mlt.count is set to 10

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817271#comment-13817271
 ] 

Markus Jelsma commented on SOLR-5042:
-

Guys, I see some commits in trunk and 4x; is this issue resolved?
Thanks

> MoreLikeThis doesn't return a score when mlt.count is set to 10
> ---
>
> Key: SOLR-5042
> URL: https://issues.apache.org/jira/browse/SOLR-5042
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis
>Affects Versions: 4.3
>Reporter: Josh Curran
>Assignee: Shawn Heisey
>Priority: Minor
> Attachments: SOLR-5042.patch
>
>
> The problem appears to be around the mlt.count within the solrconfig.xml. 
> When this value is set to 10, the 10 values that have been identified as 
> 'most like this' are returned with the original query, but the 'score' 
> field is missing.
> Changing the mlt.count to, say, 11 and issuing the same query returns the 
> 'score' field with the same query. This appears to be the workaround; 11 
> was just an arbitrary value, and 12 or 15 also work.
> The same problem was raised on Stack Overflow: 
> http://stackoverflow.com/questions/16513719/solr-more-like-this-dont-return-score-while-specify-mlt-count






[jira] [Commented] (LUCENE-5335) Change raw Map type to Map for ValueSource context

2013-11-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817290#comment-13817290
 ] 

Yonik Seeley commented on LUCENE-5335:
--

bq. I dont think things like SumTotalTermFreqValueSource should be a blocker to 
fixing this map API

The whole point of the current context is so you can do the kind of thing the 
current code does, so it's really unclear how you would "fix" it (or why it 
needs fixing) without first figuring out a different way to implement things 
like SumTotalTermFreqValueSource. In that sense, it certainly does seem like a 
blocker.


> Change raw Map type to Map for ValueSource context
> -
>
> Key: LUCENE-5335
> URL: https://issues.apache.org/jira/browse/LUCENE-5335
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Ryan Ernst
> Attachments: LUCENE-5335.patch
>
>
> Just as the title says.  Simple refactoring.






[jira] [Commented] (SOLR-5042) MoreLikeThis doesn't return a score when mlt.count is set to 10

2013-11-08 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817291#comment-13817291
 ] 

Anshum Gupta commented on SOLR-5042:


[~markus17] yes, this issue was resolved, but I just haven't had time to add 
unit tests for it yet.
I have, however, manually tested it.

> MoreLikeThis doesn't return a score when mlt.count is set to 10
> ---
>
> Key: SOLR-5042
> URL: https://issues.apache.org/jira/browse/SOLR-5042
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis
>Affects Versions: 4.3
>Reporter: Josh Curran
>Assignee: Shawn Heisey
>Priority: Minor
> Attachments: SOLR-5042.patch
>
>
> The problem appears to be around the mlt.count within the solrconfig.xml. 
> When this value is set to 10, the 10 values that have been identified as 
> 'most like this' are returned with the original query, but the 'score' 
> field is missing.
> Changing the mlt.count to, say, 11 and issuing the same query returns the 
> 'score' field with the same query. This appears to be the workaround; 11 
> was just an arbitrary value, and 12 or 15 also work.
> The same problem was raised on Stack Overflow: 
> http://stackoverflow.com/questions/16513719/solr-more-like-this-dont-return-score-while-specify-mlt-count






[jira] [Commented] (SOLR-5042) MoreLikeThis doesn't return a score when mlt.count is set to 10

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817294#comment-13817294
 ] 

Markus Jelsma commented on SOLR-5042:
-

Great work! Thanks!

> MoreLikeThis doesn't return a score when mlt.count is set to 10
> ---
>
> Key: SOLR-5042
> URL: https://issues.apache.org/jira/browse/SOLR-5042
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis
>Affects Versions: 4.3
>Reporter: Josh Curran
>Assignee: Shawn Heisey
>Priority: Minor
> Attachments: SOLR-5042.patch
>
>
> The problem appears to be around the mlt.count within the solrconfig.xml. 
> When this value is set to 10, the 10 values that have been identified as 
> 'most like this' are returned with the original query, but the 'score' 
> field is missing.
> Changing the mlt.count to, say, 11 and issuing the same query returns the 
> 'score' field with the same query. This appears to be the workaround; 11 
> was just an arbitrary value, and 12 or 15 also work.
> The same problem was raised on Stack Overflow: 
> http://stackoverflow.com/questions/16513719/solr-more-like-this-dont-return-score-while-specify-mlt-count






[jira] [Created] (SOLR-5432) Allow simple editing for solrconfig.xml and schema.xml from the admin interface

2013-11-08 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-5432:


 Summary: Allow simple editing for solrconfig.xml and schema.xml 
from the admin interface
 Key: SOLR-5432
 URL: https://issues.apache.org/jira/browse/SOLR-5432
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.6, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson


[~steffkes] OK, let's see if we can make the simple case work as per the 
discussion in SOLR-5287 and reserve the rest of the enhancements for later?

I'll try to create an end-point Real Soon Now and we can try it out. This will 
be the simple case of just writing basically anything in the conf directory. 
Specifically _excluding_ anything in the sub-directories for the time being.






[jira] [Created] (SOLR-5433) Use the Schema REST api for editing the schema file

2013-11-08 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-5433:


 Summary: Use the Schema REST api for editing the schema file
 Key: SOLR-5433
 URL: https://issues.apache.org/jira/browse/SOLR-5433
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.6, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor


[~sarowe] [~steffkes] Marker for going forward with some kind of wizard for 
editing the schema.xml file.






[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices

2013-11-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817327#comment-13817327
 ] 

Shai Erera commented on LUCENE-5333:


Duh, good point! :).

I think an AllFacetsAccumulator.create() with two variants, one that takes the 
members required to create TaxonomyFA and another to create SortedSetFA, which 
then returns a FacetsAccumulator that *extends* either of the two, would work 
better ... but limited to CountingFacetRequest. Since it's so simple, I think 
we may not even need to bother with other aggregation functions, as this will 
be an example of how to achieve this functionality at all, and an app could 
then copy the code to create other FacetRequests (e.g. SumScoreFacetRequest)?

> Support sparse faceting for heterogeneous indices
> -
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Michael McCandless
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But I think this is very easy for the facet module: since ords are
> stored "row stride" (each doc lists all the facet labels it has), we
> could simply count all facets that the hits actually saw, and then at
> the end see which ones "got traction" and return facet results for
> those top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...






[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #501: POMs out of sync

2013-11-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/501/

No tests ran.

Build Log:
[...truncated 20123 lines...]




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817332#comment-13817332
 ] 

Markus Jelsma commented on LUCENE-2899:
---

Hi - any chance this is going to get committed some day?

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.6
>
> Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, 
> OpenNLPFilter.java, OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp






[jira] [Commented] (SOLR-5432) Allow simple editing for solrconfig.xml and schema.xml from the admin interface

2013-11-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817444#comment-13817444
 ] 

Erick Erickson commented on SOLR-5432:
--

Well, this might be a short-lived JIRA. I looked at the managed schema code and 
chatted with Steve Rowe. It _looks_ like it would be relatively straightforward 
to leverage the managed schema infrastructure to allow CRUD on arbitrary files, 
at least in the conf directory. On a quick glance it doesn't appear to be much, 
if any, more work than the "simple" way of doing things. And it would get us 
SolrCloud support "for free", or at least build on prior art.

We'll be able to look at this a bit more next week.

Any red flags here? It seems like schema.xml lends itself to a whole series of 
specific calls (the addfield, updatefield, copyfield sort of thing), but the 
other files don't really; they're more blobs.

Which does leave us with the question of how the managed index schema for 
schema.xml should play with generic file CRUD operations. Does it make sense 
to prevent the managed-file manipulation of schema.xml if they have configured 
the managed schema option? My personal take is that, barring a hard time 
making this happen in the code, the two options are not mutually exclusive and 
we shouldn't worry about it.

> Allow simple editing for solrconfig.xml and schema.xml from the admin 
> interface
> ---
>
> Key: SOLR-5432
> URL: https://issues.apache.org/jira/browse/SOLR-5432
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> [~steffkes] OK, let's see if we can make the simple case work as per the 
> discussion in SOLR-5287 and reserve the rest of the enhancements for later?
> I'll try to create an end-point Real Soon Now and we can try it out. This 
> will be the simple case of just writing basically anything in the conf 
> directory. Specifically _excluding_ anything in the sub-directories for the 
> time being.






[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-11-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817459#comment-13817459
 ] 

Erick Erickson commented on SOLR-5287:
--

I'm coming back around to this. It _looks_ (and we'll have more info on this 
next week when [~sarowe] has had a chance to straighten me out) like it'll be 
relatively easy to piggy-back on the REST API work and allow it to handle 
arbitrary files in the conf directory. I'm envisioning a new option in the 
managed schema config in solrconfig. Currently, the tag takes a value like:

managed-schema

So what if we allowed something like
managed-all-conf
or
managed-schema managed-all-conf
or just assumed that managed-all-conf enables both the "push the whole file" 
option and the managed schema? I think the managed schema will allow for a 
really nice UI that'll be valuable, and the people "who don't need no 
stinking wizard" can just freely edit the raw files.

So the proposal is the "managed-all-conf" option plus CRUD operations on any 
file in the conf directory (maybe more later). The infrastructure is in place 
to push this to SolrCloud, so it seems like about the same amount of work to 
do it all.

That takes care of the ability to restrict this capability by configuration, 
getting things to the cloud etc. From a UI perspective, it's just a POST to the 
right URL with the body of the file.

Anyway, that's the current thinking...
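For reference, the existing managed-schema wiring in solrconfig.xml looks 
roughly like the sketch below; the second comment shows where this proposal's 
hypothetical option would hang, which is not shipped syntax:

```xml
<!-- Existing 4.x managed-schema wiring in solrconfig.xml (sketch): -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Hypothetical, per this comment, NOT shipped syntax: a "managed-all-conf"
     flag could hang off the same factory config. -->
```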

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Assigned] (LUCENE-5334) Add Namespaces to Expressions Javascript Compiler

2013-11-08 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst reassigned LUCENE-5334:
--

Assignee: Ryan Ernst

> Add Namespaces to Expressions Javascript Compiler
> -
>
> Key: LUCENE-5334
> URL: https://issues.apache.org/jira/browse/LUCENE-5334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Assignee: Ryan Ernst
>Priority: Minor
> Attachments: LUCENE-5334.patch
>
>
> I would like to add the concept of namespaces to the expressions javascript 
> compiler using '.' as the operator.
> Example of namespace usage in functions:
> AccurateMath.sqrt(field)
> FastMath.sqrt(field)
> Example of namespace usage in variables:
> location.latitude
> location.longitude






Estimating peak memory use for UnInvertedField faceting

2013-11-08 Thread Tom Burton-West
We are considering indexing our 11 million books at the page level, which
comes to about 3 billion Solr documents.

Our subject field is by necessity multi-valued, so the UnInvertedField is
used for faceting.

When testing an index of about 200 million documents, the first faceting
query on one field (query appended below) raises memory use from about
2.5 GB to 13 GB.  If I run GC after the query, memory use goes down to
about 3 GB, and subsequent queries don't significantly increase it.

After the query runs, various statistics from UnInvertedField are sent to
the log (see below), but they seem to represent the final data structure
rather than the peak.  For example, memSize is listed as 1.8 GB, while the
temporary data structures were probably closer to 10 GB (13 GB total).

Is there a formula for estimating the peak memory size?
Can the statistics spit out to INFO be used to somehow estimate the peak
memory size?

Tom
-

Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField 
INFO: UnInverted multi-valued field {field=topicStr,
memSize=1,768,101,824,
tindexSize=86,028,
time=45,854,
phase1=41,039,
nTerms=271,987,
bigTerms=0,
termInstances=569,429,716,
uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute

INFO: [core] webapp=/dev-3 path=/select
params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml}
hits=138,605,690 status=0 QTime=49,797
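One small observation you can pull from the stats themselves (arithmetic only;
it says nothing about the temporary peak during uninversion): memSize divided
by termInstances gives the per-term-instance cost of the finished structure.
The class name below is mine:

```java
public class UnInvertedStats {
    public static void main(String[] args) {
        long memSize = 1_768_101_824L;      // memSize from the log line above
        long termInstances = 569_429_716L;  // termInstances from the same line
        // The finished structure costs about 3.1 bytes per term instance.
        System.out.printf("%.2f bytes per term instance%n",
                (double) memSize / termInstances);
    }
}
```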


[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-11-08 Thread David (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817573#comment-13817573
 ] 

David commented on SOLR-5027:
-

Joel,

I submitted a fix in https://issues.apache.org/jira/browse/SOLR-5416 

Let me know if you think this is problematic.

> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin* 
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups : 17 seconds.
> CollapsingQParserPlugin: 300 milli-seconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore : removes docs with a null value in the collapse field (default).
> expand : treats each doc with a null value in the collapse field as a 
> separate group.
> collapse : collapses all docs with a null value into a single group using 
> either highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent
> *Note:*  The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.






Re: Estimating peak memory use for UnInvertedField faceting

2013-11-08 Thread Yonik Seeley
On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West  wrote:
> When testing an index of about 200 million documents, when we do a first
> faceting on one field (query appended below), the memory use rises from
> about 2.5 GB to 13GB.  If I run GC after the query the memory use goes down
> to about 3GB and subsequent queries don't significantly increase the memory
> use.

Is there a way to tell what the real max memory usage is?  I assume
13GB is just the peak heap usage, but that could include a lot of
garbage.

-Yonik
http://heliosearch.com -- making solr shine
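For what it's worth, the JVM can report the real high-water mark directly:
each MemoryPoolMXBean tracks a peak "used" value that survives GC, so summing
the heap pools' peaks avoids the garbage-inflated snapshot problem. A stdlib
sketch (class and method names are mine, not anything from Solr):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class PeakHeap {
    // Sums the peak "used" bytes across all heap pools. Peaks are tracked by
    // the JVM since startup (or the last resetPeakUsage()), so they survive
    // the garbage collections that make a plain heap snapshot misleading.
    public static long peakHeapBytes() {
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage usage = pool.getPeakUsage();
                if (usage != null) {
                    peak += usage.getUsed();
                }
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("peak heap ~" + (peakHeapBytes() >> 20) + " MB");
    }
}
```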




Re: Estimating peak memory use for UnInvertedField faceting

2013-11-08 Thread Tom Burton-West
Hi Yonik,

I don't know enough about JVM tuning and monitoring to do this in a clean
way, so I just tried setting the max heap at 8GB and then 6GB to force
garbage collection.  With it set to 6GB it goes into a long GC loop and
then runs out of heap (see below).  The stack trace says the issue is with
DocTermOrds.uninvert:
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)

 I'm guessing the actual peak is somewhere between 6 and 8 GB.

BTW: is there some documentation somewhere that explains what the stats
output to INFO mean?

Tom


java.lang.OutOfMemoryError: GC overhead limit exceededjava.lang.RuntimeException: java.lang.OutOfMemoryError: GC
overhead limit exceeded
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
at org.apache.solr.request.UnInvertedField.(UnInvertedField.java:179)
at
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
... 16 more


---
Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField 
INFO: UnInverted multi-valued field {field=topicStr,
memSize=1,768,101,824,
tindexSize=86,028,
time=45,854,
phase1=41,039,
nTerms=271,987,
bigTerms=0,
termInstances=569,429,716,
uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute

INFO: [core] webapp=/dev-3 path=/select
params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml}
hits=138,605,690 status=0 QTime=49,797



On Fri, Nov 8, 2013 at 2:01 PM, Yonik Seeley  wrote:

> On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West 
> wrote:
> > When testing an index of about 200 million documents, when we do a first
> > faceting on one field (query appended below), the memory use rises from
> > about 2.5 GB to 13GB.  If I run GC after the query the memory use goes
> down
> > to about 3GB and subsequent queries don't significantly increase the
> memory
> > use.
>
> Is there a way to tell what the real max memory usage is?  I assume
> 13GB is just the peak heap usage, but that could include a lot of
> garbage.
>
> -Yonik
> http://heliosearch.com -- making solr shine
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (LUCENE-5332) SpanNearQuery with multiple terms does not find match

2013-11-08 Thread Jerry Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817640#comment-13817640
 ] 

Jerry Zhou commented on LUCENE-5332:


The attached test case in this ticket does not have overlapping terms in the 
query. We just use a simple SpanNearQuery (b d g), and it fails.

The other issue, LUCENE-5331, is about repeats in nested SpanNearQuery.

> SpanNearQuery with multiple terms does not find match
> -
>
> Key: LUCENE-5332
> URL: https://issues.apache.org/jira/browse/LUCENE-5332
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jerry Zhou
> Attachments: MultiTermFlatSpanNearTest.java
>
>
> A flat structure (non-nested) for a SpanNearQuery containing multiple terms 
> does not always find the correct match.
> Test case is attached ...






[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-11-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817683#comment-13817683
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1540187 from [~rjernst] in branch 'dev/trunk'
[ https://svn.apache.org/r1540187 ]

LUCENE-4335: Add Namespaces to Expressions Javascript Compiler

> Builds should regenerate all generated sources
> --
>
> Key: LUCENE-4335
> URL: https://issues.apache.org/jira/browse/LUCENE-4335
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-4335.patch, LUCENE-4335.patch, LUCENE-4335.patch
>
>
> We have more and more sources that are generated programmatically (query 
> parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
> etc.), and it's dangerous because developers may directly edit the generated 
> sources and forget to edit the meta-source.  It's happened to me several 
> times ... most recently just after landing the BlockPostingsFormat branch.
> I think we should re-gen all of these in our builds and fail the build if 
> this creates a difference.  I know some generators (eg JavaCC) embed 
> timestamps and so always create mods ... we can leave them out of this for 
> starters (or maybe post-process the sources to remove the timestamps) ...






[jira] [Commented] (LUCENE-5334) Add Namespaces to Expressions Javascript Compiler

2013-11-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817713#comment-13817713
 ] 

ASF subversion and git services commented on LUCENE-5334:
-

Commit 1540195 from [~rjernst] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1540195 ]

LUCENE-5334: Add Namespaces to Expressions Javascript Compiler

> Add Namespaces to Expressions Javascript Compiler
> -
>
> Key: LUCENE-5334
> URL: https://issues.apache.org/jira/browse/LUCENE-5334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Assignee: Ryan Ernst
>Priority: Minor
> Attachments: LUCENE-5334.patch
>
>
> I would like to add the concept of namespaces to the expressions javascript 
> compiler using '.' as the operator.
> Example of namespace usage in functions:
> AccurateMath.sqrt(field)
> FastMath.sqrt(field)
> Example of namespace usage in variables:
> location.latitude
> location.longitude






[jira] [Resolved] (LUCENE-5334) Add Namespaces to Expressions Javascript Compiler

2013-11-08 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst resolved LUCENE-5334.


Resolution: Fixed

Thanks Jack!

> Add Namespaces to Expressions Javascript Compiler
> -
>
> Key: LUCENE-5334
> URL: https://issues.apache.org/jira/browse/LUCENE-5334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Assignee: Ryan Ernst
>Priority: Minor
> Attachments: LUCENE-5334.patch
>
>
> I would like to add the concept of namespaces to the expressions javascript 
> compiler using '.' as the operator.
> Example of namespace usage in functions:
> AccurateMath.sqrt(field)
> FastMath.sqrt(field)
> Example of namespace usage in variables:
> location.latitude
> location.longitude






ConjunctionScorer floating point precision for score()

2013-11-08 Thread sikburn
Hello,
I have been investigating an issue with document scoring and found that the
ConjunctionScorer implements the score method in a way that can cause
floating point precision rounding issues.  I noticed in some of my test
cases that documents that have not been merged/optimized (I'm not sure of
the correct terminology, they have a docNum of 0) have scorers added in a
different order than optimized documents.  Using a float to maintain the sum
of scores introduces the potential for floating point precision errors.  In
turn this causes the score that is returned from the ConjunctionScorer to be
different for some merged/unmerged documents that should have identical
scores.

Example:

float sum1 = 0.0061859353f + 0.0061859353f + 0.0030929677f + 0.0030929677f +
0.0030929677f + 0.5010608f + 0.0061859353f;

float sum2 =  0.0061859353f + 0.0061859353f + 0.0061859353f + 0.0030929677f
+ 0.0030929677f + 0.0030929677f + 0.5010608f;

sum1 == 0.5288975; // Incorrect
sum2 == 0.52889746; // Correct

I am currently running Solr/Lucene 3.6.2 from source and have two potential
solutions, but I am not an expert on floating point precision, rounding, or
Lucene performance implications.

I also noticed that there is a comment in the 4.5.1 version of Lucene to the
effect of:
// TODO: sum into a double and cast to float if we ever send required
clauses to BS1

My Questions are as follows:
Is this currently expected behavior that should not be patched?
If not, would either of these potential solutions be maintained by the
Lucene development community?

Current:

public float score() throws IOException {
  float sum = 0.0f;
  for (int i = 0; i < scorers.length; i++) {
    sum += scorers[i].score();
  }
  return sum;
}

Option 1:

public float score() throws IOException {
  double sum = 0.0d;
  for (int i = 0; i < scorers.length; i++) {
    sum += scorers[i].score();
  }
  return (float) sum;
}

Option 2:

public float score() throws IOException {
  BigDecimal sum = new BigDecimal(0.0f);
  for (int i = 0; i < scorers.length; i++) {
    sum = sum.add(new BigDecimal(scorers[i].score()));
  }
  return sum.floatValue();
}
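To make the order sensitivity concrete, here is a self-contained demo using
the exact seven values from the example above (class and method names are
mine; the double accumulator corresponds to Option 1):

```java
public class FloatSumDemo {
    // The two summation orders reported above: same seven scores, shuffled.
    static final float[] ORDER1 = {0.0061859353f, 0.0061859353f, 0.0030929677f,
            0.0030929677f, 0.0030929677f, 0.5010608f, 0.0061859353f};
    static final float[] ORDER2 = {0.0061859353f, 0.0061859353f, 0.0061859353f,
            0.0030929677f, 0.0030929677f, 0.0030929677f, 0.5010608f};

    // Accumulate in float: each += can round, so the order matters.
    static float floatSum(float[] scores) {
        float sum = 0.0f;
        for (float s : scores) sum += s;
        return sum;
    }

    // Accumulate in double: the intermediate sums of these values fit exactly
    // in 53 bits, so both orders agree after the final cast back to float.
    static float doubleSum(float[] scores) {
        double sum = 0.0d;
        for (float s : scores) sum += s;
        return (float) sum;
    }

    public static void main(String[] args) {
        System.out.println("float:  " + floatSum(ORDER1) + " vs " + floatSum(ORDER2));
        System.out.println("double: " + doubleSum(ORDER1) + " vs " + doubleSum(ORDER2));
    }
}
```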



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConjunctionScorer-floating-point-precision-for-score-tp4100051.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] [Commented] (LUCENE-5334) Add Namespaces to Expressions Javascript Compiler

2013-11-08 Thread Jack Conradson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817809#comment-13817809
 ] 

Jack Conradson commented on LUCENE-5334:


Thanks for committing, Ryan.

> Add Namespaces to Expressions Javascript Compiler
> -
>
> Key: LUCENE-5334
> URL: https://issues.apache.org/jira/browse/LUCENE-5334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Assignee: Ryan Ernst
>Priority: Minor
> Attachments: LUCENE-5334.patch
>
>
> I would like to add the concept of namespaces to the expressions javascript 
> compiler using '.' as the operator.
> Example of namespace usage in functions:
> AccurateMath.sqrt(field)
> FastMath.sqrt(field)
> Example of namespace usage in variables:
> location.latitude
> location.longitude






[jira] [Created] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.

2013-11-08 Thread Jack Conradson (JIRA)
Jack Conradson created LUCENE-5336:
--

 Summary: Add a simple QueryParser to parse human-entered queries.
 Key: LUCENE-5336
 URL: https://issues.apache.org/jira/browse/LUCENE-5336
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Jack Conradson


I would like to add a new simple QueryParser to Lucene that is designed to 
parse human-entered queries.  This parser will operate on an entire entered 
query using a specified single field or a set of weighted fields (using term 
boost).

All features/operations in this parser can be enabled or disabled depending on 
what is necessary for the user.  A default operator may be specified as either 
'MUST' representing 'and' or 'SHOULD' representing 'or.'  The 
features/operations that this parser will include are the following:

* AND specified as '+'
* OR specified as '|'
* NOT specified as '-'
* PHRASE surrounded by double quotes
* PREFIX specified as '*'
* PRECEDENCE surrounded by '(' and ')'
* WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default 
operator to be used
* ESCAPE specified as '\' will allow operators to be used in terms

The key differences between this parser and other existing parsers will be the 
following:

* No exceptions will be thrown, and errors in syntax will be ignored.  The 
parser will do a best-effort interpretation of any query entered.
* It uses minimal syntax to express queries.  All available operators are 
single characters or pairs of single characters.
* The parser is hand-written and lives in a single Java file, making it easy to modify.
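The attached patch will have its own implementation; purely to illustrate the
"no exceptions, best-effort" behavior described above, a stdlib-only sketch
could look like the following (the class name and the policy of simply
dropping stray operators are mine, not the patch's):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch, not the attached patch: a best-effort term splitter in the
// spirit of the proposal. Operator characters end the current term where that
// makes sense and are otherwise ignored, so no input can raise an exception.
public class BestEffortTokens {
    public static List<String> terms(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        boolean escaped = false;
        for (char c : query.toCharArray()) {
            if (escaped) {              // ESCAPE: previous char was '\'
                term.append(c);
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if ("+|-\"*() \n\r\t".indexOf(c) >= 0) {
                // Operator or whitespace: close out the pending term, if any.
                if (term.length() > 0) {
                    out.add(term.toString());
                    term.setLength(0);
                }
            } else {
                term.append(c);
            }
        }
        if (term.length() > 0) out.add(term.toString());
        return out;
    }

    public static void main(String[] args) {
        // An unterminated phrase is not an error -- we still get the terms.
        System.out.println(terms("foo +bar \"unterminated phrase"));
    }
}
```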






[jira] [Updated] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.

2013-11-08 Thread Jack Conradson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Conradson updated LUCENE-5336:
---

Attachment: LUCENE-5336.patch

I have attached a patch for this JIRA.

> Add a simple QueryParser to parse human-entered queries.
> 
>
> Key: LUCENE-5336
> URL: https://issues.apache.org/jira/browse/LUCENE-5336
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
> Attachments: LUCENE-5336.patch
>
>
> I would like to add a new simple QueryParser to Lucene that is designed to 
> parse human-entered queries.  This parser will operate on an entire entered 
> query using a specified single field or a set of weighted fields (using term 
> boost).
> All features/operations in this parser can be enabled or disabled depending 
> on what is necessary for the user.  A default operator may be specified as 
> either 'MUST' representing 'and' or 'SHOULD' representing 'or.'  The 
> features/operations that this parser will include are the following:
> * AND specified as '+'
> * OR specified as '|'
> * NOT specified as '-'
> * PHRASE surrounded by double quotes
> * PREFIX specified as '*'
> * PRECEDENCE surrounded by '(' and ')'
> * WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default 
> operator to be used
> * ESCAPE specified as '\' will allow operators to be used in terms
> The key differences between this parser and other existing parsers will be 
> the following:
> * No exceptions will be thrown, and errors in syntax will be ignored.  The 
> parser will do a best-effort interpretation of any query entered.
> * It uses minimal syntax to express queries.  All available operators are 
> single characters or pairs of single characters.
> * The parser is hand-written and in a single Java file making it easy to 
> modify.






[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.6.0_45) - Build # 3367 - Failure!

2013-11-08 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3367/
Java: 64bit/jdk1.6.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
REGRESSION:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([A219E0541BF21F8C:179F81D3A433AD78]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLea

[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1021: POMs out of sync

2013-11-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1021/

2 tests failed.
REGRESSION:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
shard3 is not consistent.  Got 34 from 
http://127.0.0.1:29017/uil/co/collection1lastClient and got 23 from 
http://127.0.0.1:27870/uil/co/collection1

Stack Trace:
java.lang.AssertionError: shard3 is not consistent.  Got 34 from 
http://127.0.0.1:29017/uil/co/collection1lastClient and got 23 from 
http://127.0.0.1:27870/uil/co/collection1
at __randomizedtesting.SeedInfo.seed([2B5402E5F393C677:AAB28CFD84CCA64B]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1151)
at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:135)


REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
Test Setup Failure: shard1 should have just been set up to be inconsistent - 
but it's still consistent. Leader:http://127.0.0.1:16328/collection1 Dead 
Guy:http://127.0.0.1:34134/collection1skip list:[CloudJettyRunner 
[url=http://127.0.0.1:18849/collection1], CloudJettyRunner 
[url=http://127.0.0.1:18849/collection1]]

Stack Trace:
java.lang.AssertionError: Test Setup Failure: shard1 should have just been set 
up to be inconsistent - but it's still consistent. 
Leader:http://127.0.0.1:16328/collection1 Dead 
Guy:http://127.0.0.1:34134/collection1skip list:[CloudJettyRunner 
[url=http://127.0.0.1:18849/collection1], CloudJettyRunner 
[url=http://127.0.0.1:18849/collection1]]
at __randomizedtesting.SeedInfo.seed([81EEC327BDDABBEA:84D3FCA85DBD6]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:216)




Build Log:
[...truncated 41966 lines...]




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 999 - Failure!

2013-11-08 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/999/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 10462 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20131108_235715_812.syserr
   [junit4] >>> JVM J0: stderr (verbatim) 
   [junit4] java(186,0x14ebd5000) malloc: *** error for object 0x14ebc3f90: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=2CD7DE196210E842 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.13.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.ja
r:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/concurrentlinkedhashmap-lru-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/dom4j-1.6.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-14.0.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-Mac

Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 434 - Still Failing

2013-11-08 Thread Robert Muir
Test bug: I committed a fix.

On Thu, Nov 7, 2013 at 9:09 PM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/434/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.codecs.pulsing.TestPulsingPostingsFormat.testInvertedWrite
>
> Error Message:
> Captured an uncaught exception in thread: Thread[id=12, name=Lucene Merge 
> Thread #0, state=RUNNABLE, group=TGRP-TestPulsingPostingsFormat]
>
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=12, name=Lucene Merge Thread #0, 
> state=RUNNABLE, group=TGRP-TestPulsingPostingsFormat]
> Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.util.ConcurrentModificationException
> at __randomizedtesting.SeedInfo.seed([FB3E7DB6675B0F39]:0)
> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
> Caused by: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
> at java.util.HashMap$KeyIterator.next(HashMap.java:928)
> at org.apache.lucene.index.BasePostingsFormatTestCase$1$1$1.write(BasePostingsFormatTestCase.java:1478)
> at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:178)
> at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:381)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:103)
> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4001)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3598)
> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
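[The `Caused by: java.util.ConcurrentModificationException` with `HashMap$HashIterator.nextEntry` in the trace above is the classic failure mode of mutating a `java.util.HashMap` while one of its fail-fast iterators is live. As an illustration only (the class `CmeDemo` is hypothetical and unrelated to the actual Lucene test code), a minimal sketch of that failure mode:]

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
    /**
     * Returns true when structurally modifying a HashMap while iterating
     * over it triggers the fail-fast ConcurrentModificationException --
     * the same exception class seen in the HashIterator.nextEntry frame.
     */
    static boolean triggersCme() {
        Map<String, Integer> terms = new HashMap<>();
        terms.put("a", 1);
        terms.put("b", 2);
        try {
            for (String key : terms.keySet()) {
                // Adding a new key is a structural modification; the live
                // iterator detects the changed modCount on its next step.
                terms.put("extra-" + key, 0);
            }
        } catch (ConcurrentModificationException expected) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + triggersCme());
    }
}
```

[The usual fix is to mutate through `Iterator.remove()`, collect the changes and apply them after the loop, or switch to a concurrent map.]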
>
>
>
>
> Build Log:
> [...truncated 6901 lines...]
>[junit4] Suite: org.apache.lucene.codecs.pulsing.TestPulsingPostingsFormat
>[junit4]   2> Lap 08, 2013 10:02:26 AM 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge 
> Thread #0,6,TGRP-TestPulsingPostingsFormat]
>[junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.util.ConcurrentModificationException
>[junit4]   2>at __randomizedtesting.SeedInfo.seed([FB3E7DB6675B0F39]:0)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>[junit4]   2> Caused by: java.util.ConcurrentModificationException
>[junit4]   2>at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>[junit4]   2>at java.util.HashMap$KeyIterator.next(HashMap.java:928)
>[junit4]   2>at org.apache.lucene.index.BasePostingsFormatTestCase$1$1$1.write(BasePostingsFormatTestCase.java:1478)
>[junit4]   2>at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:178)
>[junit4]   2>at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:381)
>[junit4]   2>at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:103)
>[junit4]   2>at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4001)
>[junit4]   2>at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3598)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>[junit4]   2>
>[junit4]   2> Lap 08, 2013 10:02:26 AM 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge 
> Thread #1,6,TGRP-TestPulsingPostingsFormat]
>[junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.util.ConcurrentModificationException
>[junit4]   2>at __randomizedtesting.SeedInfo.seed([FB3E7DB6675B0F39]:0)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>[junit4]   2>at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>[junit4]   2> Caused by: java.util.ConcurrentModificationException
>[junit4]   2