[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.8.0_20-ea-b05) - Build # 3924 - Still Failing!

2014-05-02 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3924/
Java: 64bit/jdk1.8.0_20-ea-b05 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.BasicHttpSolrServerTest.testUpdate

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at 
__randomizedtesting.SeedInfo.seed([2DC8564DCD010891:9BDE1347ABC5E047]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.solr.client.solrj.impl.BasicHttpSolrServerTest.testUpdate(BasicHttpSolrServerTest.java:365)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$Statemen

[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-6039:
---

Summary: debug=track causes debug=query info to be suppressed when no 
results found  (was: debug=track causes debug=query info to be suprsedd when no 
results found)

> debug=track causes debug=query info to be suppressed when no results found
> --
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch, 
> SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug 
> tracking" and want the same debug information as pre-4.7, is to enumerate the 
> debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of 
> relying on the shorthand: {{debugQuery=true}}
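
For readers who want to apply that workaround from SolrJ rather than raw URLs, here is a minimal sketch (my own illustration, not part of the issue; the base URL and query text are placeholders) that enumerates the individual debug options instead of relying on debugQuery=true:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DebugWorkaround {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("Solr");
    // Instead of debugQuery=true, request each debug section explicitly so the
    // "query" debugging info is not suppressed by the distributed "track" info.
    q.set("debug", "query", "timing", "results");
    QueryResponse rsp = server.query(q);
    // "parsedquery" should now be present even when no documents are returned.
    System.out.println(rsp.getDebugMap().get("parsedquery"));
    server.shutdown();
  }
}
{code}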






[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1546 - Failure!

2014-05-02 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1546/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 11192 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140503_001903_822.syserr
   [junit4] >>> JVM J0: stderr (verbatim) 
   [junit4] java(215,0x146169000) malloc: *** error for object 0x146157b00: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=1E5E2F4D9463327A -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false 
-Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 
-DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.leaveTemporary=false -Dtests.filterstacks=true -Dtests.disableHdfs=true 
-classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.1.3.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/expressions/lucene-expressions-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/antlr-runtime-3.5.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-commons-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.9.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenk

[jira] [Commented] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988445#comment-13988445
 ] 

ASF subversion and git services commented on SOLR-6022:
---

Commit 1592127 from [~rjernst] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1592127 ]

SOLR-6022: Deprecate getAnalyzer() in IndexField and FieldType, and add 
getIndexAnalyzer()

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan Ernst
>Assignee: Ryan Ernst
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, 
> SOLR-6022.patch, SOLR-6022.patch
>
>
> We have separate index/query analyzer chains, but the access methods for the 
> analyzers do not match up with the names.  This can lead to unknowingly using 
> the wrong analyzer chain (as it did in SOLR-6017).  We should do this 
> renaming in trunk, and deprecate the old getAnalyzer function in 4x.
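
For context, the rename/deprecate pattern being discussed looks roughly like the sketch below (an illustration on a made-up FieldType-like class, not the actual SOLR-6022 patch): the new accessor gets the unambiguous name, and the old one survives on 4.x as a deprecated delegate.

{code:java}
import org.apache.lucene.analysis.Analyzer;

public abstract class ExampleFieldType {
  protected Analyzer indexAnalyzer;
  protected Analyzer queryAnalyzer;

  /** New, unambiguous accessor added in trunk. */
  public Analyzer getIndexAnalyzer() {
    return indexAnalyzer;
  }

  /** Old accessor kept on 4.x for back-compat; callers should migrate. */
  @Deprecated
  public Analyzer getAnalyzer() {
    return getIndexAnalyzer();
  }

  /** Query-time accessor, already unambiguous by name. */
  public Analyzer getQueryAnalyzer() {
    return queryAnalyzer;
  }
}
{code}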






[jira] [Resolved] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer

2014-05-02 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst resolved SOLR-6022.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.9

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan Ernst
>Assignee: Ryan Ernst
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, 
> SOLR-6022.patch, SOLR-6022.patch
>
>
> We have separate index/query analyzer chains, but the access methods for the 
> analyzers do not match up with the names.  This can lead to unknowingly using 
> the wrong analyzer chain (as it did in SOLR-6017).  We should do this 
> renaming in trunk, and deprecate the old getAnalyzer function in 4x.






[jira] [Commented] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988391#comment-13988391
 ] 

Tomás Fernández Löbbe commented on SOLR-6039:
-

bq. I didn't fully understand the changes you made when skimming your patch
Besides adding the query section when there are no "GET_DEBUG_PURPOSE" requests 
(which happens in the GET_FIELDS phase), one thing that changes with the patch 
is that the shard requests for all phases will carry "debug=timing" whenever 
timing is needed. After the final phase those times are added up.
Before this change, the timing section didn't come back on queries with no docs 
(I think this predates SOLR-5399); now it does.
Another implication of this change is that all shard requests are considered, 
not only the last phase (so it will now show higher times). As I said before, 
the times for all shard responses are added up, and because many of those 
requests are sent in parallel, the timing displayed may be higher than the 
wall-clock time of the request. I still think this is useful information; it 
should be read as a metric of how much time each component spends across the 
whole request.
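
A minimal sketch of the merge behavior described above (illustrative names, not Solr's actual classes): per-component times from every shard response, across all phases, are summed, which is why the reported totals can exceed the wall-clock time of the request.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TimingMerge {
  /** Each map is one shard response's timing section: component name -> millis. */
  static Map<String, Long> mergeTimings(List<Map<String, Long>> shardTimings) {
    Map<String, Long> merged = new LinkedHashMap<String, Long>();
    for (Map<String, Long> shard : shardTimings) {
      for (Map.Entry<String, Long> e : shard.entrySet()) {
        Long prev = merged.get(e.getKey());
        merged.put(e.getKey(), (prev == null ? 0L : prev) + e.getValue()); // sum, not max
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Long> shard1 = new LinkedHashMap<String, Long>();
    shard1.put("query", 40L);
    Map<String, Long> shard2 = new LinkedHashMap<String, Long>();
    shard2.put("query", 40L);
    List<Map<String, Long>> all = new ArrayList<Map<String, Long>>();
    all.add(shard1);
    all.add(shard2);
    // Two shards queried in parallel, each spending 40ms in "query": the merged
    // section reports 80ms even though the wall-clock cost was roughly 40ms.
    System.out.println(mergeTimings(all)); // {query=80}
  }
}
{code}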

> debug=track causes debug=query info to be suppressed when no results found
> 
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch, 
> SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug 
> tracking" and want the same debug information as pre-4.7, is to enumerate the 
> debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of 
> relying on the shorthand: {{debugQuery=true}}






[jira] [Commented] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988378#comment-13988378
 ] 

Uwe Schindler commented on LUCENE-5638:
---

I also created another subtask to clean up the Token class and remove stupid 
copy-ctors and all those reinit() methods. Unmaintainable! LUCENE-5640

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.
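
The kind of factory the description proposes could look roughly like this sketch (class name and structure are my assumptions, not the actual patch): serve the common attributes from one combined AttributeImpl and fall back to the reflection-based factory for anything else.

{code:java}
import org.apache.lucene.analysis.Token;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;

public final class CoreAttributeFactory extends AttributeSource.AttributeFactory {
  // Reflection-based factory, used only for attributes Token does not cover.
  private static final AttributeSource.AttributeFactory FALLBACK =
      AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

  @Override
  public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
    // Token already implements the term/offset/type/posIncr/flags/payload
    // attributes, so one instance can back all of them without reflection.
    if (attClass.isAssignableFrom(Token.class)) {
      return new Token();
    }
    return FALLBACK.createAttributeInstance(attClass);
  }
}
{code}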






[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-6039:


Attachment: SOLR-6039.patch

> debug=track causes debug=query info to be suppressed when no results found
> 
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch, 
> SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug 
> tracking" and want the same debug information as pre-4.7, is to enumerate the 
> debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of 
> relying on the shorthand: {{debugQuery=true}}






[jira] [Commented] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988377#comment-13988377
 ] 

Uwe Schindler commented on LUCENE-5638:
---

In the analysis module, TestWikipediaTokenizer also fails; we have to dig into 
it. I don't understand the failure.

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988374#comment-13988374
 ] 

ASF subversion and git services commented on LUCENE-5639:
-

Commit 1592086 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_7'
[ https://svn.apache.org/r1592086 ]

Merged revision(s) 1592080 from lucene/dev/branches/branch_4x:
LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Issue Comment Deleted] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5639:
--

Comment: was deleted

(was: We should also clean up the Token class and remove the various horrible 
ctors calling each other. Also all of the stupid reInit methods. All those are 
buggy as hell if you add new attributes.)

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Resolved] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5639.
---

Resolution: Fixed

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988366#comment-13988366
 ] 

ASF subversion and git services commented on LUCENE-5639:
-

Commit 1592083 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1592083 ]

Merged revision(s) 1592080 from lucene/dev/branches/branch_4x:
LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988363#comment-13988363
 ] 

ASF subversion and git services commented on LUCENE-5639:
-

Commit 1592080 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1592080 ]

Merged revision(s) 1592075, 1592078 from lucene/dev/trunk:
LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988361#comment-13988361
 ] 

ASF subversion and git services commented on LUCENE-5639:
-

Commit 1592078 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1592078 ]

LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Assigned] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer

2014-05-02 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst reassigned SOLR-6022:


Assignee: Ryan Ernst

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan Ernst
>Assignee: Ryan Ernst
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, 
> SOLR-6022.patch, SOLR-6022.patch
>
>
> We have separate index/query analyzer chains, but the access methods for the 
> analyzers do not match up with the names.  This can lead to unknowingly using 
> the wrong analyzer chain (as it did in SOLR-6017).  We should do this 
> renaming in trunk, and deprecate the old getAnalyzer function in 4x.






[jira] [Commented] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988358#comment-13988358
 ] 

ASF subversion and git services commented on SOLR-6022:
---

Commit 1592076 from [~rjernst] in branch 'dev/trunk'
[ https://svn.apache.org/r1592076 ]

SOLR-6022: Rename getAnalyzer() to getIndexAnalyzer()

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan Ernst
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, 
> SOLR-6022.patch, SOLR-6022.patch
>
>
> We have separate index/query analyzer chains, but the access methods for the 
> analyzers do not match up with the names.  This can lead to unknowingly using 
> the wrong analyzer chain (as it did in SOLR-6017).  We should do this 
> renaming in trunk, and deprecate the old getAnalyzer function in 4x.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988357#comment-13988357
 ] 

ASF subversion and git services commented on LUCENE-5639:
-

Commit 1592075 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1592075 ]

LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Created] (LUCENE-5640) Cleanup Token class

2014-05-02 Thread Uwe Schindler (JIRA)
Uwe Schindler created LUCENE-5640:
-

 Summary: Cleanup Token class
 Key: LUCENE-5640
 URL: https://issues.apache.org/jira/browse/LUCENE-5640
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Uwe Schindler
 Fix For: 4.9, 5.0


We should remove code duplication in the Token class:
- copy constructors
- reinit() shit
- non-default clone()

This is too buggy. Most of the methods can simply be removed. In fact, Token 
should just look like a combination of all the AttributeImpls it implements.






[jira] [Updated] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5639:
--

Attachment: LUCENE-5639.patch

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988343#comment-13988343
 ] 

Uwe Schindler commented on LUCENE-5639:
---

We should also clean up the Token class and remove the various horrible ctors 
calling each other. Also all of the stupid reInit methods. All those are buggy 
as hell if you add new attributes.

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
>
> The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
> for PositionLengthAttribute.






[jira] [Created] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java

2014-05-02 Thread Uwe Schindler (JIRA)
Uwe Schindler created LUCENE-5639:
-

 Summary: Fix implementation of PositionLengthAttribute in 
Token.java
 Key: LUCENE-5639
 URL: https://issues.apache.org/jira/browse/LUCENE-5639
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.7.3, 4.8.1, 4.9, 5.0


The Token class fails to correctly implement all of the clone/copy/equals/... stuff 
for PositionLengthAttribute.






[jira] [Commented] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988337#comment-13988337
 ] 

Uwe Schindler commented on LUCENE-5638:
---

I created a subtask: LUCENE-5639

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.






[jira] [Commented] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988330#comment-13988330
 ] 

Uwe Schindler commented on LUCENE-5638:
---

I found the bug: Token implemented PositionLengthAttribute but missed 
implementing all the clone/copyTo/equals/... shit. I will heavy-commit that, 
because it's a bug.
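
As a quick illustration of the symptom (my own minimal check, not the project's actual test): copying a Token should carry the position length along with the other attributes, and before the fix it did not.

{code:java}
import org.apache.lucene.analysis.Token;

public class PositionLengthCopyCheck {
  public static void main(String[] args) {
    Token source = new Token("foo", 0, 3);
    source.setPositionLength(2);   // Token implements PositionLengthAttribute

    Token copy = new Token();
    source.copyTo(copy);           // before the fix, positionLength was not copied

    if (copy.getPositionLength() != 2) {
      throw new AssertionError("positionLength lost during copyTo: "
          + copy.getPositionLength());
    }
    System.out.println("positionLength copied correctly: " + copy.getPositionLength());
  }
}
{code}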

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.






[jira] [Commented] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988329#comment-13988329
 ] 

Tomás Fernández Löbbe commented on SOLR-6039:
-

bq. i think for now it makes sense to just "fix" the bug relating to whether 
the info comes back
I agree now. When I started thinking about how to use max vs. sum in some 
situations, I saw the changes were not trivial; better to leave that for a 
different JIRA.

I was about to upload a new patch with some more changes and tests; please give 
me some time to merge with your changes before committing.

> debug=track causes debug=query info to be suppressed when no results found
> 
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug 
> tracking" and want the same debug information as pre-4.7, is to enumerate the 
> debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of 
> relying on the shorthand: {{debugQuery=true}}






[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6039:
---

Attachment: SOLR-6039.patch

bq. This patch adds the timing info in all phases. The times returned by 
shards are still being added

Yeah ... i think for now it makes sense to just "fix" the bug relating to 
whether the info comes back - but leave the definition the same as it's been, 
and leave the question of whether the timing info should be "merged" differently 
for another issue (i can see different advantages to both sum vs max)

I didn't fully understand the changes you made when skimming your patch -- but 
i did understand your test, and it looks good & fairly comprehensive and fills 
me with confidence that the fix is correct.  One thing i noticed was still 
missing though is some testing of picking multiple options (ie: 
"debug=query&debug=timing"), so i've added a randomized test method that 
accounts for that case (among other things)




> debug=track causes debug=query info to be suppressed when no results found
> 
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug 
> tracking" and want the same debug information as pre-4.7, is to enumerate the 
> debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of 
> relying on the shorthand: {{debugQuery=true}}






[jira] [Updated] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5638:
--

Attachment: LUCENE-5638.patch

Easy and simple one-line patch.

This uses the Token class as attributes impl, which supports:
{code:java}
public class Token extends CharTermAttributeImpl
    implements TypeAttribute, PositionIncrementAttribute,
               FlagsAttribute, OffsetAttribute,
               PayloadAttribute, PositionLengthAttribute {
{code}
Strangely, this test fails:

{noformat}
[junit4] Tests with failures:
[junit4]   - 
org.apache.lucene.analysis.TestGraphTokenizers.testMockGraphTokenFilterOnGraphInput
[junit4]
{noformat}

So this one seems to catch some bug in Token.java or the test does not work 
with this attribute impl (maybe it copies/clones in a wrong way).
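
If the idea sticks, using a Token-backed factory from application code would presumably look something like this (a usage sketch under the assumption that Token.TOKEN_ATTRIBUTE_FACTORY is the factory involved; the tokenizer and input text are arbitrary):

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenFactoryUsage {
  public static void main(String[] args) throws Exception {
    // The factory serves the core attributes from a single Token instance
    // instead of one reflection-created AttributeImpl per attribute.
    WhitespaceTokenizer tok = new WhitespaceTokenizer(
        Version.LUCENE_48, Token.TOKEN_ATTRIBUTE_FACTORY, new StringReader("hello world"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    tok.reset();
    while (tok.incrementToken()) {
      System.out.println(term);
    }
    tok.end();
    tok.close();
  }
}
{code}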

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.






[jira] [Commented] (LUCENE-3538) fix java7 warnings in the source code

2014-05-02 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988240#comment-13988240
 ] 

Ahmet Arslan commented on LUCENE-3538:
--

Can anybody tell me what the "warning free" signatures of the following two 
methods would be?
* org.apache.lucene.queries.function.ValueSource#getValues
* org.apache.lucene.queries.function.ValueSource#createWeight
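
Not an answer decided on this issue, but one commonly suggested direction (my sketch, shown on a stand-in class rather than ValueSource itself) is to type the context map as Map<Object, Object>, which keeps both put() and get() usable without raw-type or unchecked warnings:

{code:java}
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.search.IndexSearcher;

public abstract class TypedValueSource {
  // The raw "Map" in the current signatures is what triggers the warnings.
  public abstract FunctionValues getValues(Map<Object, Object> context,
                                           AtomicReaderContext readerContext) throws IOException;

  public void createWeight(Map<Object, Object> context, IndexSearcher searcher)
      throws IOException {
    // default: nothing to cache in the context
  }
}
{code}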

> fix java7 warnings in the source code
> -
>
> Key: LUCENE-3538
> URL: https://issues.apache.org/jira/browse/LUCENE-3538
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>  Labels: Java7, newdev
>
> Now that Oracle has fixed the Java 7 bugs, I imagine some users will want to 
> use it.
> Currently, if you compile Lucene's code with Java 7 you get a ton of 
> warnings... let's clean this up






[jira] [Updated] (SOLR-5090) NPE in DirectSpellChecker with alternativeTermCount and mm.

2014-05-02 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-5090:
-

Attachment: SOLR-5090.patch

Here is a fix with a unit test scenario.  This ignores 
"spellcheck.alternativeTermCount" when it is set to zero, as it's absurd to ask 
spellcheckers to return zero suggestions for a word (both DirectSpellChecker 
and the legacy IndexBasedSpellChecker choke on this scenario).

I plan to commit this in a few days.
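
The guard itself is conceptually tiny; a sketch in the spirit of the description above (illustrative only, not the attached patch) is:

{code:java}
public class AlternativeTermCountGuard {
  /**
   * Treat an explicit spellcheck.alternativeTermCount=0 the same as "not set":
   * asking a spellchecker for zero suggestions per term makes no sense and is
   * what trips the NPE described in this issue.
   */
  static Integer effectiveAlternativeTermCount(Integer requested) {
    if (requested == null || requested.intValue() <= 0) {
      return null; // behave as if the option was absent
    }
    return requested;
  }

  public static void main(String[] args) {
    System.out.println(effectiveAlternativeTermCount(0));  // null -> option ignored
    System.out.println(effectiveAlternativeTermCount(5));  // 5    -> used as-is
  }
}
{code}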

> NPE in DirectSpellChecker with alternativeTermCount and mm.
> ---
>
> Key: SOLR-5090
> URL: https://issues.apache.org/jira/browse/SOLR-5090
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.4
> Environment: 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35
>Reporter: Markus Jelsma
>Assignee: James Dyer
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5090.patch
>
>
> Query with three terms of which one is misspelled and 
> spellcheck.alternativeTermCount=0&mm=3 yields the following NPE:
> {code}
> ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
> null:java.lang.NullPointerException
> at 
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:422)
> at 
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:355)
> at 
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:189)
> at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:188)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158
> {code}






[jira] [Assigned] (SOLR-5090) NPE in DirectSpellChecker with alternativeTermCount and mm.

2014-05-02 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer reassigned SOLR-5090:


Assignee: James Dyer

> NPE in DirectSpellChecker with alternativeTermCount and mm.
> ---
>
> Key: SOLR-5090
> URL: https://issues.apache.org/jira/browse/SOLR-5090
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.4
> Environment: 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35
>Reporter: Markus Jelsma
>Assignee: James Dyer
> Fix For: 4.9, 5.0
>
>
> Query with three terms of which one is misspelled and 
> spellcheck.alternativeTermCount=0&mm=3 yields the following NPE:
> {code}
> ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
> null:java.lang.NullPointerException
> at 
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:422)
> at 
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:355)
> at 
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:189)
> at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:188)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158
> {code}






[jira] [Updated] (SOLR-6017) SimpleQParser uses index analyzer instead of query analyzer

2014-05-02 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst updated SOLR-6017:
-

Fix Version/s: 4.8.1

> SimpleQParser uses index analyzer instead of query analyzer
> ---
>
> Key: SOLR-6017
> URL: https://issues.apache.org/jira/browse/SOLR-6017
> Project: Solr
>  Issue Type: Bug
>Reporter: Ryan Ernst
>Assignee: Ryan Ernst
> Fix For: 4.8.1, 4.9, 5.0
>
> Attachments: SOLR-6017.patch
>
>
> The SimpleQParser uses getAnalyzer(), but it should be getQueryAnalyzer().






[jira] [Resolved] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5634.
-

Resolution: Fixed

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988080#comment-13988080
 ] 

ASF subversion and git services commented on LUCENE-5634:
-

Commit 1592005 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1592005 ]

LUCENE-5634: Reuse TokenStream instances for string and numeric Fields

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5609) Should we revisit the default numeric precision step?

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988063#comment-13988063
 ] 

Michael McCandless commented on LUCENE-5609:


I think we should do something here for 4.9; poor defaults just hurt our users.

I'd like to do 8/16, but Uwe, are you completely against this?

> Should we revisit the default numeric precision step?
> -
>
> Key: LUCENE-5609
> URL: https://issues.apache.org/jira/browse/LUCENE-5609
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5609.patch
>
>
> Right now it's 4, for both 8 (long/double) and 4 byte (int/float)
> numeric fields, but this is a pretty big hit on indexing speed and
> disk usage, especially for tiny documents, because it creates many (8
> or 16) terms for each value.
> Since we originally set these defaults, a lot has changed... e.g. we
> now rewrite MTQs per-segment, we have a faster (BlockTree) terms dict,
> a faster postings format, etc.
> Index size is important because it limits how much of the index will
> be hot (fit in the OS's IO cache).  And more apps are using Lucene for
> tiny docs where the overhead of individual fields is sizable.
> I used the Geonames corpus to run a simple benchmark (all sources are
> committed to luceneutil). It has 8.6 M tiny docs, each with 23 fields,
> with these numeric fields:
>   * lat/lng (double)
>   * modified time, elevation, population (long)
>   * dem (int)
> I tested 4, 8 and 16 precision steps:
> {noformat}
> indexing:
> PrecStep       Size      IndexTime
>        4   1812.7 MB      651.4 sec
>        8   1203.0 MB      443.2 sec
>       16    894.3 MB      361.6 sec
> searching:
>  Field  PrecStep   QueryTime   TermCount
>  geoNameID 4   2872.5 ms   20306
>  geoNameID 8   2903.3 ms  104856
>  geoNameID16   3371.9 ms 5871427
>   latitude 4   2160.1 ms   36805
>   latitude 8   2249.0 ms  240655
>   latitude16   2725.9 ms 4649273
>   modified 4   2038.3 ms   13311
>   modified 8   2029.6 ms   58344
>   modified16   2060.5 ms   77763
>  longitude 4   3468.5 ms   33818
>  longitude 8   3629.9 ms  214863
>  longitude16   4060.9 ms 4532032
> {noformat}
> Index time is with 1 thread (for identical index structure).
> The query time is time to run 100 random ranges for that field,
> averaged over 20 iterations.  TermCount is the total number of terms
> the MTQ rewrote to across all 100 queries / segments, and it gets
> higher as expected as precStep gets higher, but the search time is not
> that heavily impacted ... negligible going from 4 to 8, and then some
> impact from 8 to 16.
> Maybe we should increase the int/float default precision step to 8 and
> long/double to 16?  Or both to 16?
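
Whatever default is chosen, applications can already pin their own precision
step explicitly. A small sketch (standard Lucene 4.x API; the field name and
values are illustrative):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.LongField;
import org.apache.lucene.search.NumericRangeQuery;

public class PrecisionStepExample {
  // Index-time: a larger precisionStep means fewer terms per value, hence a
  // smaller index and faster indexing, at some cost in range-query term counts.
  static final FieldType MODIFIED_TYPE = new FieldType(LongField.TYPE_NOT_STORED);
  static {
    MODIFIED_TYPE.setNumericPrecisionStep(16);
    MODIFIED_TYPE.freeze();
  }

  static void addModified(Document doc, long timestampMillis) {
    doc.add(new LongField("modified", timestampMillis, MODIFIED_TYPE));
  }

  // Query-time: the same precisionStep must be passed to NumericRangeQuery.
  static NumericRangeQuery<Long> modifiedRange(long from, long to) {
    return NumericRangeQuery.newLongRange("modified", 16, from, to, true, true);
  }
}
{code}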



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988060#comment-13988060
 ] 

Michael McCandless commented on LUCENE-5634:


bq. Or are you comparing the speedup by this patch in combination with the 
precision step change?

Baseline was the patch w/ precStep=8 and comp was the patch w/ precStep=4.  I 
just re-ran to be sure; this is IndexGeoNames.java in luceneutil if you want to 
try ... it's easy to run, you just need to download/unzip geonames corpus 
first.  Net/net, precStep=4 is very costly and doesn't seem to buy much 
query-time speedup in my tests on LUCENE-5609.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988035#comment-13988035
 ] 

ASF subversion and git services commented on LUCENE-5634:
-

Commit 1591992 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1591992 ]

LUCENE-5634: Reuse TokenStream instances for string and numeric Fields

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6017) SimpleQParser uses index analyzer instead of query analyzer

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988027#comment-13988027
 ] 

ASF subversion and git services commented on SOLR-6017:
---

Commit 1591990 from [~rjernst] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1591990 ]

SOLR-6017: Fix SimpleQParser to use query analyzer instead of index analyzer

> SimpleQParser uses index analyzer instead of query analyzer
> ---
>
> Key: SOLR-6017
> URL: https://issues.apache.org/jira/browse/SOLR-6017
> Project: Solr
>  Issue Type: Bug
>Reporter: Ryan Ernst
>Assignee: Ryan Ernst
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-6017.patch
>
>
> The SimpleQParser uses getAnalyzer(), but it should be getQueryAnalyzer().



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988024#comment-13988024
 ] 

Uwe Schindler commented on LUCENE-5634:
---

+1 I am fine with that patch!

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files

2014-05-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988005#comment-13988005
 ] 

Shai Erera commented on LUCENE-5636:


I chatted with Robert about this. The current situation is that the old .fnm 
files continue to be referenced even when not needed; however, when the segment 
is merged, they go away (as do all gen'd files). Given that there's no way to 
solve it without breaking back-compat, unless we introduce hacks such as 
checking for a ".fnm" suffix, we discussed how to solve this "going forward".

By "going forward" I mean to not change existing segments, but if they contain 
future updates, write the new information in a better way. Perhaps old .fnm 
files will still be referenced by those segments, until they're merged away, 
but new segments will fix that bug.

I think that this might be doable together with LUCENE-5618, by writing 
per-field gen'd DV file, so I'll try to solve it there and if it works I'll 
resolve that issue as appropriate.

> SegmentCommitInfo continues to list unneeded gen'd files
> 
>
> Key: LUCENE-5636
> URL: https://issues.apache.org/jira/browse/LUCENE-5636
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5636.patch
>
>
> I thought I handled it in LUCENE-5246, but turns out I didn't handle it 
> fully. I'll upload a patch which improves the test to expose the bug. I know 
> where it is, but I'm not sure how to fix it without breaking index 
> back-compat. Can we do that on experimental features?
> The problem is that if you update different fields in different gens, the 
> FieldInfos files of older gens remain referenced (still!!). I open a new 
> issue since LUCENE-5246 is already resolved and released, so don't want to 
> mess up our JIRA...
> The severity of the bug is that unneeded files are still referenced in the 
> index. Everything still works correctly, it's just that .fnm files are still 
> there. But as I wrote, I'm still not sure how to solve it without requiring 
> apps that use dv updates to reindex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5376) Add a demo search server

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988003#comment-13988003
 ] 

ASF subversion and git services commented on LUCENE-5376:
-

Commit 1591986 from jd...@apache.org in branch 'dev/branches/lucene5376_2'
[ https://svn.apache.org/r1591986 ]

LUCENE-5376: convert GET parameters to JSON

> Add a demo search server
> 
>
> Key: LUCENE-5376
> URL: https://issues.apache.org/jira/browse/LUCENE-5376
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: lucene-demo-server.tgz
>
>
> I think it'd be useful to have a "demo" search server for Lucene.
> Rather than being fully featured, like Solr, it would be minimal, just 
> wrapping the existing Lucene modules to show how you can make use of these 
> features in a server setting.
> The purpose is to demonstrate how one can build a minimal search server on 
> top of APIs like SearchManager, SearcherLifetimeManager, etc.
> This is also useful for finding rough edges / issues in Lucene's APIs that 
> make building a server unnecessarily hard.
> I don't think it should have back compatibility promises (except Lucene's 
> index back compatibility), so it's free to improve as Lucene's APIs change.
> As a starting point, I'll post what I built for the "eating your own dog 
> food" search app for Lucene's & Solr's jira issues 
> http://jirasearch.mikemccandless.com (blog: 
> http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
> uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
> rough (lots nocommits).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files

2014-05-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5636:
---

Priority: Major  (was: Critical)

> SegmentCommitInfo continues to list unneeded gen'd files
> 
>
> Key: LUCENE-5636
> URL: https://issues.apache.org/jira/browse/LUCENE-5636
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5636.patch
>
>
> I thought I handled it in LUCENE-5246, but turns out I didn't handle it 
> fully. I'll upload a patch which improves the test to expose the bug. I know 
> where it is, but I'm not sure how to fix it without breaking index 
> back-compat. Can we do that on experimental features?
> The problem is that if you update different fields in different gens, the 
> FieldInfos files of older gens remain referenced (still!!). I open a new 
> issue since LUCENE-5246 is already resolved and released, so don't want to 
> mess up our JIRA...
> The severity of the bug is that unneeded files are still referenced in the 
> index. Everything still works correctly, it's just that .fnm files are still 
> there. But as I wrote, I'm still not sure how to solve it without requiring 
> apps that use dv updates to reindex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5376) Add a demo search server

2014-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987989#comment-13987989
 ] 

ASF subversion and git services commented on LUCENE-5376:
-

Commit 1591984 from jd...@apache.org in branch 'dev/branches/lucene5376_2'
[ https://svn.apache.org/r1591984 ]

LUCENE-5376: HelpHandler fix for incoming parameter

> Add a demo search server
> 
>
> Key: LUCENE-5376
> URL: https://issues.apache.org/jira/browse/LUCENE-5376
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: lucene-demo-server.tgz
>
>
> I think it'd be useful to have a "demo" search server for Lucene.
> Rather than being fully featured, like Solr, it would be minimal, just 
> wrapping the existing Lucene modules to show how you can make use of these 
> features in a server setting.
> The purpose is to demonstrate how one can build a minimal search server on 
> top of APIs like SearchManager, SearcherLifetimeManager, etc.
> This is also useful for finding rough edges / issues in Lucene's APIs that 
> make building a server unnecessarily hard.
> I don't think it should have back compatibility promises (except Lucene's 
> index back compatibility), so it's free to improve as Lucene's APIs change.
> As a starting point, I'll post what I built for the "eating your own dog 
> food" search app for Lucene's & Solr's jira issues 
> http://jirasearch.mikemccandless.com (blog: 
> http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
> uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
> rough (lots nocommits).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5634:


Attachment: LUCENE-5634.patch

Updated patch with tests.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, 
> LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987965#comment-13987965
 ] 

Uwe Schindler commented on LUCENE-5638:
---

bq. Changes like LUCENE-5634 make it clear that the default AttributeFactory 
stuff has a very high cost: weakmaps/reflection/etc.

The problem is not the weak maps and reflection. The reason why it is 
expensive is the fact that all attribute instances have to be put into the 2 
LinkedHashMaps on creating the TokenStream. I just repeat: it is not the 
reflection! We had this discussion already, 5 years ago, with Michael Busch!

In addition, the AttributeFactory itself has little impact (this was already 
tested while developing it in 2.9). This is why the weak maps are there - so 
lookups are fast. The *only* reflection that ever happens is 
Class#newInstance(), which is cheap in recent Java versions; the speed 
difference in micro benchmarks is small, about as fast as a native {{new}}.

So I disagree with removing the default AttributeFactory; we still need it for 
non-default attributes. The simple workaround would be to use 
TOKEN_ATTRIBUTE_FACTORY instead, which falls back to the default one for 
unknown attributes.

I agree about clearAttributes(), but this should be solved with 
TOKEN_ATTRIBUTE_FACTORY, too.
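
To make that workaround concrete, a minimal sketch of a custom TokenStream
opting into Token.TOKEN_ATTRIBUTE_FACTORY (the stream itself is a made-up
example, not code from this issue):

{code}
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class MyTokenStream extends TokenStream {
  private final CharTermAttribute termAtt;

  MyTokenStream() {
    // Core attributes are backed by a single Token impl instead of one
    // reflection-created AttributeImpl per attribute interface.
    super(Token.TOKEN_ATTRIBUTE_FACTORY);
    termAtt = addAttribute(CharTermAttribute.class);
  }

  @Override
  public boolean incrementToken() {
    return false; // a real stream would clearAttributes() and fill termAtt here
  }
}
{code}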

> Default Attributes are expensive
> 
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory 
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: 
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost, too.
> Maybe we can have a better Default? In other words, rename 
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
> faster default factory that just has one AttributeImpl with the "core ones" 
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything 
> outside of that falls back to reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found

2014-05-02 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-6039:


Attachment: SOLR-6039.patch

This patch adds the timing info in all phases. The times reported by the 
shards are still being added.

> debug=track causes debug=query info to be suppressed when no results found
> 
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.7
>Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch
>
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't 
> returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" 
> showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand 
> option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed 
> "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info 
> is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from 
> being returned, the parsedquery debug info again no longer works - just the 
> track debug info
> 
> The workaround, for people who don't care about the newer "debug tracking" 
> and want the same debug information as pre-4.7, is to enumerate the debug 
> options (ie: {{debug=query&debug=timing&debug=results}}) instead of relying 
> on the shorthand: {{debugQuery=true}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5831) Scale score PostFilter

2014-05-02 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated SOLR-5831:
---

Attachment: SOLR-5831.patch

Hi Joel,

The bug I discovered with secondary sort only occurs when the index has 
multiple segments. The dummy scorer docId should have been relative to the doc 
base. Also, the collector 'finish' method wasn't calling the delegate's finish 
method. Both of these bugs were fixed in the previous patch.

I don't have a unit test for multiple segments, but I did add a new unit test 
for the 'maxscalehits' parameter.

I'm still not sure that I'm determining the result window size for the 
QueryResultCache correctly. See this part:
  // Determine the results window size.
  // TODO: this should be sized larger for the query result cache
  int winSize = request.getSearcher().getCore().getSolrConfig().queryResultWindowSize;

Could you verify if this is ok?

Thanks,
Peter


> Scale score PostFilter
> --
>
> Key: SOLR-5831
> URL: https://issues.apache.org/jira/browse/SOLR-5831
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.7
>Reporter: Peter Keegan
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.9
>
> Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, 
> SOLR-5831.patch, TestScaleScoreQParserPlugin.patch
>
>
> The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
> This is an alternative to using a function query wrapping a scale() wrapping 
> a query(). For example:
> select?qq={!edismax v='news' qf='title^2 
> body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
>  v=$qq}
> The problem with this query is that it has to scale every hit. Usually, only 
> the returned hits need to be scaled,
> but there may be use cases where the number of hits to be scaled is greater 
> than the returned hit count,
> but less than or equal to the total hit count.
> Sample syntax:
> fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 
> func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
> l=0.0 u=1.0   //Scale scores to values between 0-1, inclusive 
> maxscalehits=1//The maximum number of result scores to scale (-1 = 
> all hits, 0 = results 'page' size)
> func=...  //Apply the composite function to each hit. The 
> scaled score value is accessed by the 'score()' value source
> All parameters are optional. The defaults are:
> l=0.0 u=1.0
> maxscalehits=0 (result window size)
> func=(null)
>  
> Note: this patch is not complete, as it contains no test cases and may not 
> conform 
> to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
>  
> I would appreciate any feedback on the usability and implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5638) Default Attributes are expensive

2014-05-02 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5638:
---

 Summary: Default Attributes are expensive
 Key: LUCENE-5638
 URL: https://issues.apache.org/jira/browse/LUCENE-5638
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir


Changes like LUCENE-5634 make it clear that the default AttributeFactory stuff 
has a very high cost: weakmaps/reflection/etc.

Additionally I think clearAttributes() is more expensive than it should be: it 
has to traverse a linked-list, calling clear() per token.

Operations like cloning (save/restoreState) have a high cost, too.

Maybe we can have a better Default? In other words, rename 
DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a 
faster default factory that just has one AttributeImpl with the "core ones" 
that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything outside 
of that falls back to reflection.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987855#comment-13987855
 ] 

Uwe Schindler commented on LUCENE-5634:
---

bq. I would prefer to simply break the interface rather than do anything 
sophisticated here. Its a very expert low-level one. The patch had very minimal 
impact to the codebase.

+1. Nevertheless as we change a public interface, it should be mentioned in 
"Backwards Breaks".

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5285) Solr response format should support child Docs

2014-05-02 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5285:


Attachment: SOLR-5285.patch

Correct patch. Please ignore the previous patch.

> Solr response format should support child Docs
> --
>
> Key: SOLR-5285
> URL: https://issues.apache.org/jira/browse/SOLR-5285
> Project: Solr
>  Issue Type: New Feature
>Reporter: Varun Thacker
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
> SOLR-5285.patch, SOLR-5285.patch
>
>
> Solr has added support for taking childDocs as input ( only XML till now ). 
> It's currently used for BlockJoinQuery. 
> I feel that if a user indexes a document with child docs, even if he isn't 
> using the BJQ features and is just searching, which results in a hit on the 
> parentDoc, its childDocs should be returned in the response format.
> [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would 
> be the place to add childDocs to the response.
> Now given a docId one needs to find out all the childDoc id's. A couple of 
> approaches which I could think of are 
> 1. Maintain the relation between a parentDoc and it's childDocs during 
> indexing time in maybe a separate index?
> 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a 
> parentDoc it finds out all the childDocs but this requires a childScorer.
> Am I missing something obvious on how to find the relation between a 
> parentDoc and its childDocs, because none of the above solutions for this 
> look right.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5285) Solr response format should support child Docs

2014-05-02 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5285:


Attachment: SOLR-5285.patch

Updated patch with trunk.

bq. it's not clear to me from the API what to expect will happen if i have more 
then one level of parent-child relationships in my index – will children & 
grandchildren be returned? whatever is expected needs to be documented/tested

Tested with grandchildren. In Lucene, all grandchildren and all siblings are 
treated simply as children of the parent document. A parent document and all 
its child documents are indexed in a block. Hence we should document that only 
one level of nesting is supported.
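
For reference, a hedged SolrJ sketch of the block structure described above
(field names and the server variable are illustrative; addChildDocument is
existing SolrJ API):

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ChildDocExample {
  // A parent and its child documents are sent (and indexed) as one block;
  // deeper nesting is flattened into the same block of children.
  static void indexBlock(SolrServer server) throws Exception {
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "parent-1");
    parent.addField("type_s", "parent");

    SolrInputDocument child = new SolrInputDocument();
    child.addField("id", "child-1");
    child.addField("type_s", "child");

    parent.addChildDocument(child);
    server.add(parent);
    server.commit();
  }
}
{code}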

> Solr response format should support child Docs
> --
>
> Key: SOLR-5285
> URL: https://issues.apache.org/jira/browse/SOLR-5285
> Project: Solr
>  Issue Type: New Feature
>Reporter: Varun Thacker
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
> SOLR-5285.patch
>
>
> Solr has added support for taking childDocs as input ( only XML till now ). 
> It's currently used for BlockJoinQuery. 
> I feel that if a user indexes a document with child docs, even if he isn't 
> using the BJQ features and is just searching, which results in a hit on the 
> parentDoc, its childDocs should be returned in the response format.
> [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would 
> be the place to add childDocs to the response.
> Now given a docId one needs to find out all the childDoc id's. A couple of 
> approaches which I could think of are 
> 1. Maintain the relation between a parentDoc and it's childDocs during 
> indexing time in maybe a separate index?
> 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a 
> parentDoc it finds out all the childDocs but this requires a childScorer.
> Am I missing something obvious on how to find the relation between a 
> parentDoc and its childDocs, because none of the above solutions for this 
> look right.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987844#comment-13987844
 ] 

Robert Muir commented on LUCENE-5634:
-

I would prefer to simply break the interface rather than do anything 
sophisticated here. It's a very expert, low-level one. The patch had very 
minimal impact on the codebase.

I think it's good to defer the Analyzer work and not do that here; that has a 
lot of consumers like QueryParsers, MoreLikeThis, Suggesters, ... That's a more 
complex issue. I am unsure that adding things like equals is a good idea; it 
might make things very complex. For now, if you implement your own subclass, 
you can just ignore the parameter, and it's the same performance and so on.

I will upload a new patch with tests (including doing stupid things). 

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987843#comment-13987843
 ] 

Uwe Schindler commented on LUCENE-5634:
---

bq. BTW, that test was with precStep=8. If I use precStep=4 (still the default, 
we really have to fix LUCENE-5609!) then indexing time for Geonames with the 
patch is 164.8 sec (63% slower!).

Huh? How come? That makes no sense to me. Are you sure you are doing the right 
thing? Or are you comparing the speedup by this patch in combination with the 
precision step change?

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987841#comment-13987841
 ] 

Uwe Schindler commented on LUCENE-5634:
---

Patch looks fine. I was afraid of complexity, but that looks quite good. I am 
not sure about backwards compatibility issues, but implementing your own 
IndexableField instance is still very expert. With Java 8 we could handle that 
with default interface methods (LOL).

The current patch is fine for the 2 special cases, although it's a bit risky if 
we add new "settings" to NTS or change its API (we should have equals...). 
Maybe in LUCENE-5605 we can improve the check. If we pass FieldType directly to 
NTS and NRQ, we can handle the whole thing by comparing the field type and not 
rely on crazy internals like precStep.

It would be great if, in the future, we could remove the ThreadLocal from 
Analyzer, too - by using the same trick. Unfortunately, with the current 
contract on TokenStream it's hard to compare, unless we have a well-defined 
TokenStream#equals(). Ideally TokenStream#equals() should compare the 
"settings" of the stream and its inputs (for Filters), but that is too advanced 
for the simple 2 cases.

Another solution for this would be to have some "holder" around the TokenStream 
that is cached and provides hashCode/equals. By that, a Field could determine 
better whether it is its own TokenStream (e.g. by putting a reference to its 
field type into the holder).

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987827#comment-13987827
 ] 

Michael McCandless commented on LUCENE-5634:


BTW, that test was with precStep=8.  If I use precStep=4 (still the default, we 
really have to fix LUCENE-5609!) then indexing time for Geonames with the patch 
is 164.8 sec (63% slower!).

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987808#comment-13987808
 ] 

Michael McCandless commented on LUCENE-5634:


OK with NumericField, full Geonames index takes 129.7 sec on trunk and 101.0 
sec with last patch... nice speedup.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987806#comment-13987806
 ] 

Michael McCandless commented on LUCENE-5634:


+1, patch looks good.

I ran IndexGeoNames again, it took 37.6 seconds, which is a big speedup over 
trunk (55.6 seconds).  However, it's only doing StringField right now ... I'll 
re-test w/ NumericField too.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5634:


Attachment: LUCENE-5634.patch

Here is a patch. Tests seem happy, but I didn't benchmark or write an explicit 
test yet.

Personally I think it's bogus: I don't like that these fields (StringField, 
NumericField) "backdoor" the analyzer, and to me that's the real bug. But I am 
ok with the change as a step, because it only makes the low-level interface API 
more bogus.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987789#comment-13987789
 ] 

Michael McCandless commented on LUCENE-5634:


bq. Maybe add a parameter to Field#tokenStream(), passing the previously cached 
instance! 

This sounds like a good idea!

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5956) SnapShooter is using getRawInstanceDir, which is sometimes not a valid directory

2014-05-02 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter reassigned SOLR-5956:


Assignee: Timothy Potter

> SnapShooter is using getRawInstanceDir, which is sometimes not a valid 
> directory
> 
>
> Key: SOLR-5956
> URL: https://issues.apache.org/jira/browse/SOLR-5956
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
> Environment: SolrCloud
>Reporter: Timothy Potter
>Assignee: Timothy Potter
> Attachments: SOLR-5956.patch
>
>
> Ran into an issue where the getRawInstanceDir method on CoreDescriptor 
> returns an invalid directory on the server. Need to track down where this bad 
> value comes from and fix it. I suspect this has to do with me using symlinks
> e.g.
> I have server with solr.solr.home set to
> /home/ec2-user/lucene-solr/solr/cloud87/solr, which in reality is:
> /vol0/cloud87/solr as /home/ec2-user/lucene-solr/solr/cloud87 is a symlink to 
> /vol0/cloud87
> getRawInstanceDir was returning /vol0/cloud87/demo_shard1_replica1, which is 
> missing the /solr part of the directory path; it should be: 
> /vol0/cloud87/solr/demo_shard1_replica1



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987729#comment-13987729
 ] 

Robert Muir commented on LUCENE-5634:
-

{quote}
Another idea: Maybe add a parameter to Field#tokenStream(), passing the 
previously cached instance! By this the field could obviously reuse the 
TokenStream, if the type (instanceof check) is correct. If not, throw it away 
and create a new one. The indexer then manages the cache (its just a field in 
DefaultIndexingChain or DocumentsWriter).
{quote}

I like this idea better. Let's try it and see how bad it looks.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987648#comment-13987648
 ] 

Uwe Schindler commented on LUCENE-5634:
---

Another idea: Maybe add a parameter to Field#tokenStream(), passing the 
previously cached instance! By this the field could obviously reuse the 
TokenStream, if the type (instanceof check) is correct. If not, throw it away 
and create a new one. The indexer then manages the cache (its just a field in 
DefaultIndexingChain or DocumentsWriter).
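
A rough sketch of that idea, using a made-up field class and a hypothetical
reuse parameter (an illustration of the proposal, not a patch):

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Emits its value as a single token and can be refilled for reuse.
final class SingleTokenStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private String value;
  private boolean done;

  void setValue(String value) { this.value = value; this.done = false; }

  @Override
  public boolean incrementToken() {
    if (done) return false;
    clearAttributes();
    termAtt.append(value);
    done = true;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    done = false;
  }
}

class ReusableStringField {
  private final String value;
  ReusableStringField(String value) { this.value = value; }

  // Hypothetical signature: the indexer hands back the stream it used last
  // time; reuse it if the type matches, otherwise create a fresh one.
  TokenStream tokenStream(TokenStream reuse) {
    SingleTokenStream stream = (reuse instanceof SingleTokenStream)
        ? (SingleTokenStream) reuse : new SingleTokenStream();
    stream.setValue(value);
    return stream;
  }
}
{code}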

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645
 ] 

Uwe Schindler commented on LUCENE-5634:
---

bq. but it's trickier since the precStep is final (maybe we can un-final it and 
add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: Have a 
pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can 
use a static thread local since we are fully in control of the threading model. 
With Lucene itself, where it can be used in many different environment, then 
this can cause some unexpected behavior. For example, this might cause Tomcat 
to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) 
ThreadLocals in Lucene, especially for those heavily used components.

Maybe we can do a similar thing as with StringField in Mike's patch. It's a 
bit crazy to move the TokenStreams out of the field, but we can do this for 
performance here. Just have a lazily initialized pool of NumericTokenStreams 
for each precisionStep in each per-thread DocumentsWriter (DefaultIndexingChain).

-1 to add thread locals in Lucene here!

Another idea for how to manage the pools: Maybe add a protected method to Field 
that can get the DocumentsWriter instance and add some caching functionality 
for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method 
on the per-thread DocumentsWriter to set a TokenStream for reuse per field. The 
field (also custom ones) could then use 
setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor 
from inside the Field.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5634) Reuse TokenStream instances in Field

2014-05-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645
 ] 

Uwe Schindler edited comment on LUCENE-5634 at 5/2/14 12:45 PM:


bq. but it's trickier since the precStep is final (maybe we can un-final it and 
add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: Have a 
pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can 
use a static thread local since we are fully in control of the threading model. 
With Lucene itself, where it can be used in many different environment, then 
this can cause some unexpected behavior. For example, this might cause Tomcat 
to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) 
ThreadLocals in Lucene, especially for those heavily used components.

Maybe we can do a similar thing as with StringField in Mike's patch. It's a 
bit crazy to move the TokenStreams out of the field, but we can do this for 
performance here. Just have a lazily initialized pool of NumericTokenStreams 
for each precisionStep in each per-thread DocumentsWriter (DefaultIndexingChain).

-1 to add thread locals in Lucene here!

Another idea for how to manage the pools: Maybe add a protected method to Field 
that can get the DocumentsWriter instance and add some caching functionality 
for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method 
on the per-thread DocumentsWriter to set a TokenStream for reuse per field. The 
field (also custom ones) could then use 
setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor 
from inside the Field.


was (Author: thetaphi):
bq. but it's trickier since the precStep is final (maybe we can un-final it and 
add a setter?)

Please donÄt do this. It is maybe better to do it like in Elasticsearch: Have a 
pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can 
use a static thread local since we are fully in control of the threading model. 
With Lucene itself, where it can be used in many different environment, then 
this can cause some unexpected behavior. For example, this might cause Tomcat 
to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) 
ThreadLocals in Lucene, especially for those heavy used components.

Maybe we can do a similar thing like with StringField in Mike's patch. Its a 
bit crazy to move out the TokenStreams from the field, but we can do this for 
performance here. Just have a lazy init pool of NumericTokenStreams for each 
precisionStep in each per thread DocumentsWriter (DefaultIndexingChain).

-1 to add thread locals in Lucene here!

Another idea how to manage the pools: Maybe add a protected method to Field 
that can get the DocumentsWriter instance and add some caching functionality 
for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method 
on the per thread DocumentsWriter to set aTokenStream for reuse per field. The 
field (also custom ones) then could use 
setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor 
from inside the Field.

> Reuse TokenStream instances in Field
> 
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CJKBigramFilter - position bug with outputUnigrams?

2014-05-02 Thread Robert Muir
>
> Would it be possible to implement an option with a name similar to
> "lastUnigramAtPreviousPosition" so that I can optionally get the
> behavior I'm after when the input is two or more characters, without
> changing current behavior for anyone else?  This would completely solve
> my current problem.
>

This is really not feasible. It sounds like multi-level n-grams in the
same field are a bad match for what you are doing (phrase queries
etc). This just doesn't work, and won't work, based on the mathematics.

Try another approach, like removing this filter completely; maybe the
word segmentation by ICU is good enough.
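
If it helps, a minimal sketch of that alternative: an analyzer that relies on
ICU word segmentation alone, with no CJKBigramFilter in the chain (Lucene 4.x
analysis API and the lucene-analyzers-icu module assumed):

{code}
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;

public final class IcuWordAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Dictionary/rule-based CJK word segmentation from ICU, no bigram filter.
    Tokenizer source = new ICUTokenizer(reader);
    return new TokenStreamComponents(source);
  }
}
{code}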

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org