[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.8.0_20-ea-b05) - Build # 3924 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3924/
Java: 64bit/jdk1.8.0_20-ea-b05 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED: org.apache.solr.client.solrj.impl.BasicHttpSolrServerTest.testUpdate

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
	at __randomizedtesting.SeedInfo.seed([2DC8564DCD010891:9BDE1347ABC5E047]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at org.apache.solr.client.solrj.impl.BasicHttpSolrServerTest.testUpdate(BasicHttpSolrServerTest.java:365)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$Statemen
[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey updated SOLR-6039:
---
Summary: debug=track causes debug=query info to be suppressed when no results found (was: debug=track causes debug=query info to be suprsedd when no results found)

> debug=track causes debug=query info to be suppressed when no results found
> --
>
> Key: SOLR-6039
> URL: https://issues.apache.org/jira/browse/SOLR-6039
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.7
> Reporter: Hoss Man
> Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch
>
> Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't returning info about how the query was being parsed in SolrCloud.
> Steps to reproduce...
> * startup a simple 2 shard solr cluster using the example configs
> * Load this URL:
> ** http://localhost:8983/solr/select?q=Foo&debug=query
> ** note that the debug=query causes a debug block including "parsedquery" showing "title:foo"
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track
> ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy shorthand option for enabling all debug options)
> ** Note that the debug block exists, but *only* includes the distributed "track" options - the query parsing debugging info is not available
> * index the sample data (java -jar post.jar *.xml)
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id
> ** Note that now we have at least one matching doc, and the parsedquery info is included in the debug block along with the tracking info
> * Load either of these URLs:
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0
> ** Note: even though we have a matching doc, since rows=0 prevents it from being returned, the parsedquery debug info again no longer works - just the track debug info
>
> The workaround, for people who don't care about the newer "debug tracking" and want the same debug information as pre-4.7, is to enumerate the debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of relying on the shorthand: {{debugQuery=true}}

--
This message was sent by Atlassian JIRA (v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
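The workaround described above (enumerating debug=query&debug=timing&debug=results instead of the debugQuery=true shorthand) can be sketched as plain URL construction; the base URL, query value, and helper name below are placeholders for illustration, not part of Solr's API:

```java
// Illustrative only: builds the explicit-debug URL from the workaround.
// The helper name, base URL, and query value are placeholders.
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DebugParams {
    // Enumerate debug=... options explicitly (query/timing/results) instead
    // of relying on debugQuery=true, which per the issue causes the
    // parsedquery block to be suppressed when no documents are returned.
    static String withExplicitDebug(String baseUrl, String q, List<String> debugOptions) {
        String debug = debugOptions.stream()
                .map(opt -> "debug=" + opt)
                .collect(Collectors.joining("&"));
        return baseUrl + "/select?q=" + q + "&" + debug;
    }

    public static void main(String[] args) {
        System.out.println(withExplicitDebug("http://localhost:8983/solr", "Solr",
                Arrays.asList("query", "timing", "results")));
        // -> http://localhost:8983/solr/select?q=Solr&debug=query&debug=timing&debug=results
    }
}
```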
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1546 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1546/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 11192 lines...]
[junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140503_001903_822.syserr
[junit4] >>> JVM J0: stderr (verbatim)
[junit4] java(215,0x146169000) malloc: *** error for object 0x146157b00: pointer being freed was not allocated
[junit4] *** set a breakpoint in malloc_error_break to debug
[junit4] <<< JVM J0: EOF
[...truncated 1 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=1E5E2F4D9463327A -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.leaveTemporary=false -Dtests.filterstacks=true -Dtests.disableHdfs=true -classpath /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.1.3.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-mem
ory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/expressions/lucene-expressions-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/antlr-runtime-3.5.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-commons-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.9.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenk
[jira] [Commented] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer
[ https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988445#comment-13988445 ]

ASF subversion and git services commented on SOLR-6022:
---
Commit 1592127 from [~rjernst] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1592127 ]
SOLR-6022: Deprecate getAnalyzer() in IndexField and FieldType, and add getIndexAnalyzer()

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
> Issue Type: Improvement
> Reporter: Ryan Ernst
> Assignee: Ryan Ernst
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, SOLR-6022.patch, SOLR-6022.patch
>
> We have separate index/query analyzer chains, but the access methods for the analyzers do not match up with the names. This can lead to unknowingly using the wrong analyzer chain (as it did in SOLR-6017). We should do this renaming in trunk, and deprecate the old getAnalyzer function in 4x.
[jira] [Resolved] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer
[ https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Ernst resolved SOLR-6022.
--
Resolution: Fixed
Fix Version/s: 5.0
               4.9
[jira] [Commented] (SOLR-6039) debug=track causes debug=query info to be suprsedd when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988391#comment-13988391 ]

Tomás Fernández Löbbe commented on SOLR-6039:
-
bq. I didnt' fully understand the changes you made when skimming your patch

Besides adding the query section when there are no "GET_DEBUG_PURPOSE" requests (which happens in the GET_FIELDS phase), one thing that changes with the patch is that the shard requests for all phases will include "debug=timing" if timing is needed. After the final phase, those times are added up. Before this change, the timing section didn't come back on queries with no docs (I think this predates SOLR-5399); now it does. Another implication of this change is that all shard requests will be considered, not only the last phase (so it will now show higher times). As I said before, the times for all shard responses are added up, and because many of those requests are sent in parallel, the timing displayed may be higher than the wall-clock time of the request. I think this is useful information anyway, and should be read as a metric of how much time each component is taking across the whole request.
[jira] [Commented] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988378#comment-13988378 ]

Uwe Schindler commented on LUCENE-5638:
---
I also created another subtask to clean up the Token class and remove stupid copy-ctors and all those reinit() methods. Unmaintainable! LUCENE-5640

> Default Attributes are expensive
> --
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Attachments: LUCENE-5638.patch
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be: it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost, too.
> Maybe we can have a better Default? In other words, rename DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a faster default factory that just has one AttributeImpl with the "core ones" that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything outside of that falls back to reflection.
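The cost model described in the issue — a clearAttributes() that must walk a linked list of per-attribute impls, versus one combined impl backing the "core" attributes — can be illustrated with a plain-Java sketch. All class names below are hypothetical stand-ins, not Lucene's actual AttributeSource/AttributeImpl API:

```java
// Sketch of the cost model: per-attribute impls need one clear() call each
// on every clearAttributes(), while a combined "core" impl needs one total.
// Hypothetical classes, not Lucene's real attribute machinery.
import java.util.ArrayList;
import java.util.List;

public class AttributeSketch {
    interface Attr { void clear(); }

    // One impl per attribute: clearAttributes() is O(number of attributes).
    static class PerAttributeState {
        final List<Attr> impls = new ArrayList<>();
        int clearCalls = 0;
        void add(Attr a) { impls.add(a); }
        void clearAttributes() {
            for (Attr a : impls) { a.clear(); clearCalls++; }
        }
    }

    // Combined impl: the common attributes share one object, one clear().
    static class PackedCoreState implements Attr {
        int startOffset, endOffset, positionIncrement = 1;
        int clearCalls = 0;
        public void clear() { startOffset = endOffset = 0; positionIncrement = 1; clearCalls++; }
    }

    public static void main(String[] args) {
        PerAttributeState separate = new PerAttributeState();
        separate.add(() -> {});  // stand-in for a term attribute
        separate.add(() -> {});  // stand-in for an offset attribute
        separate.add(() -> {});  // stand-in for a position-increment attribute
        separate.clearAttributes();

        PackedCoreState packed = new PackedCoreState();
        packed.clear();

        // Three clear() calls versus one per token, on a hot path.
        System.out.println(separate.clearCalls + " vs " + packed.clearCalls);
    }
}
```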
[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suprsedd when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomás Fernández Löbbe updated SOLR-6039:
Attachment: SOLR-6039.patch
[jira] [Commented] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988377#comment-13988377 ]

Uwe Schindler commented on LUCENE-5638:
---
In the analysis module, TestWikipediaTokenizer also fails; we have to dig. I don't understand the failure.
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988374#comment-13988374 ]

ASF subversion and git services commented on LUCENE-5639:
-
Commit 1592086 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1592086 ]
Merged revision(s) 1592080 from lucene/dev/branches/branch_4x: LUCENE-5639: Fix token class to correctly implement PoistionLengthAttribute

> Fix implementation of PositionLengthAttribute in Token.java
> ---
>
> Key: LUCENE-5639
> URL: https://issues.apache.org/jira/browse/LUCENE-5639
> Project: Lucene - Core
> Issue Type: Sub-task
> Components: modules/analysis
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.7.3, 4.8.1, 4.9, 5.0
>
> Attachments: LUCENE-5639.patch
>
> The Token class misses to correctly implement all clone/copy/equals/... stuff for PositionLengthAttribute.
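The class of bug this issue fixes — a field such as positionLength being left out of clone/copy/equals/hashCode — can be shown with a minimal, hypothetical Token-like class (not Lucene's real Token): every one of those methods must account for each new field, or copies silently lose state:

```java
// Hypothetical stand-in for the Token bug class LUCENE-5639 fixes: when a
// field like positionLength is added, clone()/equals()/hashCode() must all
// be wired to include it, or copies silently drop the value.
import java.util.Objects;

public class MiniToken implements Cloneable {
    int positionIncrement = 1;
    int positionLength = 1;  // the field the real fix wired into clone/equals/copy

    @Override
    public MiniToken clone() {
        try {
            // Object.clone() copies all fields, including positionLength.
            return (MiniToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);  // cannot happen: we implement Cloneable
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MiniToken)) return false;
        MiniToken t = (MiniToken) o;
        return positionIncrement == t.positionIncrement
            && positionLength == t.positionLength;  // must not be forgotten
    }

    @Override
    public int hashCode() {
        return Objects.hash(positionIncrement, positionLength);
    }

    public static void main(String[] args) {
        MiniToken t = new MiniToken();
        t.positionLength = 3;
        MiniToken copy = t.clone();
        // A correct implementation preserves the field and keeps equals symmetric.
        System.out.println(copy.positionLength + " " + t.equals(copy));
        // -> 3 true
    }
}
```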
[jira] [Issue Comment Deleted] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5639:
--
Comment: was deleted (was: We should also cleanup the Token class and reomve the various horrible ctors calling each other. Alos all of the stupid reInit methods. All those are buggy like hellp if you add new attributes.)
[jira] [Resolved] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-5639.
---
Resolution: Fixed
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988366#comment-13988366 ]

ASF subversion and git services commented on LUCENE-5639:
-
Commit 1592083 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1592083 ]
Merged revision(s) 1592080 from lucene/dev/branches/branch_4x: LUCENE-5639: Fix token class to correctly implement PoistionLengthAttribute
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988363#comment-13988363 ]

ASF subversion and git services commented on LUCENE-5639:
-
Commit 1592080 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1592080 ]
Merged revision(s) 1592075, 1592078 from lucene/dev/trunk: LUCENE-5639: Fix token class to correctly implement PoistionLengthAttribute
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988361#comment-13988361 ]

ASF subversion and git services commented on LUCENE-5639:
-
Commit 1592078 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1592078 ]
LUCENE-5639: Fix token class to correctly implement PoistionLengthAttribute
[jira] [Assigned] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer
[ https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Ernst reassigned SOLR-6022:
Assignee: Ryan Ernst
[jira] [Commented] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer
[ https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988358#comment-13988358 ]

ASF subversion and git services commented on SOLR-6022:
---
Commit 1592076 from [~rjernst] in branch 'dev/trunk' [ https://svn.apache.org/r1592076 ]
SOLR-6022: Rename getAnalyzer() to getIndexAnalyzer()
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988357#comment-13988357 ] ASF subversion and git services commented on LUCENE-5639: - Commit 1592075 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1592075 ] LUCENE-5639: Fix Token class to correctly implement PositionLengthAttribute > Fix implementation of PositionLengthAttribute in Token.java > --- > > Key: LUCENE-5639 > URL: https://issues.apache.org/jira/browse/LUCENE-5639 > Project: Lucene - Core > Issue Type: Sub-task > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 4.7.3, 4.8.1, 4.9, 5.0 > > Attachments: LUCENE-5639.patch > > > The Token class misses to correctly implement all clone/copy/equals/... stuff > for PositionLengthAttribute. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
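The class of bug fixed here (a new attribute wired into a clone/copy-able type without updating every hand-written method) can be illustrated with a minimal plain-Java token. MiniToken and its fields are hypothetical stand-ins, not the real Lucene Token:

```java
import java.util.Objects;

// Minimal illustration of the bug class: every field of a clone/copy-able
// type must appear in clone(), copyTo(), equals() and hashCode().
class MiniToken implements Cloneable {
    private int positionIncrement = 1;
    private int positionLength = 1;   // the "forgotten" field in the original bug

    public void setPositionLength(int len) { positionLength = len; }
    public int getPositionLength() { return positionLength; }

    @Override
    public MiniToken clone() {
        try {
            // Object.clone() copies all fields bitwise, so new primitive
            // fields are picked up automatically here...
            return (MiniToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    // ...but hand-written copyTo/equals/hashCode must be updated by hand,
    // which is exactly where a newly added field is easy to miss.
    public void copyTo(MiniToken target) {
        target.positionIncrement = positionIncrement;
        target.positionLength = positionLength;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof MiniToken)) return false;
        MiniToken t = (MiniToken) other;
        return positionIncrement == t.positionIncrement
            && positionLength == t.positionLength;
    }

    @Override
    public int hashCode() {
        return Objects.hash(positionIncrement, positionLength);
    }
}
```

If `positionLength` were omitted from `copyTo` or `equals`, copied tokens would silently drop the attribute value, which is the symptom the commit addresses.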
[jira] [Created] (LUCENE-5640) Cleanup Token class
Uwe Schindler created LUCENE-5640: - Summary: Cleanup Token class Key: LUCENE-5640 URL: https://issues.apache.org/jira/browse/LUCENE-5640 Project: Lucene - Core Issue Type: Sub-task Reporter: Uwe Schindler Fix For: 4.9, 5.0 We should remove code duplication in the Token class: - copy constructors - reinit() methods - non-default clone() This is too buggy. Most of the methods can simply be removed. In fact, Token should just look like a combination of all the AttributeImpls it implements. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5639: -- Attachment: LUCENE-5639.patch > Fix implementation of PositionLengthAttribute in Token.java > --- > > Key: LUCENE-5639 > URL: https://issues.apache.org/jira/browse/LUCENE-5639 > Project: Lucene - Core > Issue Type: Sub-task > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 4.7.3, 4.8.1, 4.9, 5.0 > > Attachments: LUCENE-5639.patch > > > The Token class misses to correctly implement all clone/copy/equals/... stuff > for PositionLengthAttribute. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
[ https://issues.apache.org/jira/browse/LUCENE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988343#comment-13988343 ] Uwe Schindler commented on LUCENE-5639: --- We should also clean up the Token class and remove the various horrible ctors calling each other. Also all of the reInit methods. All of those are buggy as hell if you add new attributes. > Fix implementation of PositionLengthAttribute in Token.java > --- > > Key: LUCENE-5639 > URL: https://issues.apache.org/jira/browse/LUCENE-5639 > Project: Lucene - Core > Issue Type: Sub-task > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 4.7.3, 4.8.1, 4.9, 5.0 > > > The Token class misses to correctly implement all clone/copy/equals/... stuff > for PositionLengthAttribute. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5639) Fix implementation of PositionLengthAttribute in Token.java
Uwe Schindler created LUCENE-5639: - Summary: Fix implementation of PositionLengthAttribute in Token.java Key: LUCENE-5639 URL: https://issues.apache.org/jira/browse/LUCENE-5639 Project: Lucene - Core Issue Type: Sub-task Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.7.3, 4.8.1, 4.9, 5.0 The Token class fails to correctly implement all clone/copy/equals/... methods for PositionLengthAttribute. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988337#comment-13988337 ] Uwe Schindler commented on LUCENE-5638: --- I created a subtask: LUCENE-5639 > Default Attributes are expensive > > > Key: LUCENE-5638 > URL: https://issues.apache.org/jira/browse/LUCENE-5638 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Robert Muir > Attachments: LUCENE-5638.patch > > > Changes like LUCENE-5634 make it clear that the default AttributeFactory > stuff has a very high cost: weakmaps/reflection/etc. > Additionally I think clearAttributes() is more expensive than it should be: > it has to traverse a linked-list, calling clear() per token. > Operations like cloning (save/restoreState) have a high cost tll. > Maybe we can have a better Default? In other words, rename > DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a > faster default factory that just has one AttributeImpl with the "core ones" > that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything > outside of that falls back to reflection. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988330#comment-13988330 ] Uwe Schindler commented on LUCENE-5638: --- I found the bug: Token implemented PositionLengthAttribute but failed to implement all the clone/copyTo/equals/... methods. I will commit that, because it's a bug. > Default Attributes are expensive > > > Key: LUCENE-5638 > URL: https://issues.apache.org/jira/browse/LUCENE-5638 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Robert Muir > Attachments: LUCENE-5638.patch > > > Changes like LUCENE-5634 make it clear that the default AttributeFactory > stuff has a very high cost: weakmaps/reflection/etc. > Additionally I think clearAttributes() is more expensive than it should be: > it has to traverse a linked-list, calling clear() per token. > Operations like cloning (save/restoreState) have a high cost tll. > Maybe we can have a better Default? In other words, rename > DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a > faster default factory that just has one AttributeImpl with the "core ones" > that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything > outside of that falls back to reflection. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988329#comment-13988329 ] Tomás Fernández Löbbe commented on SOLR-6039: - bq. I think for now it makes sense to just "fix" the bug relating to whether the info comes back I agree now. When I started to think about how to use max vs sum in some situations, I saw the changes were not trivial; better to leave that for a different Jira. I was about to upload a new patch with some more changes and tests; please give me some time to merge with your changes before committing. > debug=track causes debug=query info to be suppressed when no results found > > > Key: SOLR-6039 > URL: https://issues.apache.org/jira/browse/SOLR-6039 > Project: Solr > Issue Type: Bug >Affects Versions: 4.7 >Reporter: Hoss Man > Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch > > > Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't > returning info about how the query was being parsed in SolrCloud. > Steps to reproduce... 
> * startup a simple 2 shard solr cluster using the example configs > * Load this URL: > ** http://localhost:8983/solr/select?q=Foo&debug=query > ** note that the debug=query causes a debug block including "parsedquery" > showing "title:foo" > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track > ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand > option for enabling all debug options) > ** Note that the debug block exists, but *only* includes the distributed > "track" options - the query parsing debugging info is not available > * index the sample data (java -jar post.jar *.xml) > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id > ** Note that now we have at least one matching doc, and the parsedquery info > is included in the debug block along with the tracking info > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** Note: even though we have a matching doc, since rows=0 prevents it from > being returned, the parsedquery debug info again no longer works - just the > track debug info > > The workaround, for people who don't care about the newer "debug > tracking" and want the same debug information as pre-4.7, is to enumerate the > debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of > relying on the shorthand: {{debugQuery=true}} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6039: --- Attachment: SOLR-6039.patch bq. This patch adds the timing info in all phases. The times responded by shards are still being added Yeah ... I think for now it makes sense to just "fix" the bug relating to whether the info comes back - but leave the definition the same as it's been, and leave the question of whether the timing info should be "merged" differently for another issue (I can see different advantages to both sum vs max). I didn't fully understand the changes you made when skimming your patch -- but I did understand your test, and it looks good & fairly comprehensive and fills me with confidence that the fix is correct. One thing I noticed was still missing though is some testing of picking multiple options (ie: "debug=query&debug=timing"), so I've added a randomized testing method that accounts for that case (among other things) > debug=track causes debug=query info to be suppressed when no results found > > > Key: SOLR-6039 > URL: https://issues.apache.org/jira/browse/SOLR-6039 > Project: Solr > Issue Type: Bug >Affects Versions: 4.7 >Reporter: Hoss Man > Attachments: SOLR-6039.patch, SOLR-6039.patch, SOLR-6039.patch > > > Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't > returning info about how the query was being parsed in SolrCloud. > Steps to reproduce... 
> * startup a simple 2 shard solr cluster using the example configs > * Load this URL: > ** http://localhost:8983/solr/select?q=Foo&debug=query > ** note that the debug=query causes a debug block including "parsedquery" > showing "title:foo" > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track > ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand > option for enabling all debug options) > ** Note that the debug block exists, but *only* includes the distributed > "track" options - the query parsing debugging info is not available > * index the sample data (java -jar post.jar *.xml) > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id > ** Note that now we have at least one matching doc, and the parsedquery info > is included in the debug block along with the tracking info > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** Note: even though we have a matching doc, since rows=0 prevents it from > being returned, the parsedquery debug info again no longer works - just the > track debug info > > The workaround, for people who don't care about the newer "debug > tracking" and want the same debug information as pre-4.7, is to enumerate the > debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of > relying on the shorthand: {{debugQuery=true}} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5638: -- Attachment: LUCENE-5638.patch Easy and simple one-line patch. This uses the Token class as attributes impl, which supports: {code:java} public class Token extends CharTermAttributeImpl implements TypeAttribute, PositionIncrementAttribute, FlagsAttribute, OffsetAttribute, PayloadAttribute, PositionLengthAttribute { {code} Strangely, this test fails: {noformat} [junit4] Tests with failures: [junit4] - org.apache.lucene.analysis.TestGraphTokenizers.testMockGraphTokenFilterOnGraphInput [junit4] {noformat} So this one seems to catch some bug in Token.java or the test does not work with this attribute impl (maybe it copies/clones in a wrong way). > Default Attributes are expensive > > > Key: LUCENE-5638 > URL: https://issues.apache.org/jira/browse/LUCENE-5638 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Robert Muir > Attachments: LUCENE-5638.patch > > > Changes like LUCENE-5634 make it clear that the default AttributeFactory > stuff has a very high cost: weakmaps/reflection/etc. > Additionally I think clearAttributes() is more expensive than it should be: > it has to traverse a linked-list, calling clear() per token. > Operations like cloning (save/restoreState) have a high cost tll. > Maybe we can have a better Default? In other words, rename > DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a > faster default factory that just has one AttributeImpl with the "core ones" > that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything > outside of that falls back to reflection. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3538) fix java7 warnings in the source code
[ https://issues.apache.org/jira/browse/LUCENE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988240#comment-13988240 ] Ahmet Arslan commented on LUCENE-3538: -- Can anybody tell me what would be "warning free" signatures of following to methods * org.apache.lucene.queries.function.ValueSource#getValues * org.apache.lucene.queries.function.ValueSource#createWeight > fix java7 warnings in the source code > - > > Key: LUCENE-3538 > URL: https://issues.apache.org/jira/browse/LUCENE-3538 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Labels: Java7, newdev > > Now that oracle has fixed java7 bugs, I imagine some users will want to use > it. > Currently if you compile lucene's code with java7 you get a ton of > warnings... lets clean this up -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
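For overrides of older APIs that declare raw types, as these ValueSource methods do, one common warning-free approach is a narrow-scope @SuppressWarnings on the overriding method. The classes below are illustrative stand-ins, not the actual Lucene signatures:

```java
import java.util.Map;

// Illustrative only: a base class mirroring an older API that declares a
// raw Map parameter, so the parameter type cannot be changed in overrides.
abstract class LegacySource {
    @SuppressWarnings("rawtypes")
    abstract Object getValues(Map context);
}

class MySource extends LegacySource {
    // Suppress at the narrowest scope (the overriding method), not the
    // class, so unrelated warnings elsewhere still surface.
    @Override
    @SuppressWarnings({"rawtypes", "unchecked"})
    Object getValues(Map context) {
        context.put("weight", 42);   // unchecked call on a raw Map
        return context.get("weight");
    }
}
```

Changing the parameter to `Map<?, ?>` or `Map<Object, Object>` would not override the raw-typed method, so suppression is the usual route until the base API itself is generified.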
[jira] [Updated] (SOLR-5090) NPE in DirectSpellChecker with alternativeTermCount and mm.
[ https://issues.apache.org/jira/browse/SOLR-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-5090: - Attachment: SOLR-5090.patch Here is a fix with a unit test scenario. This ignores "spellcheck.alternativeTermCount" when set to zero, as it's absurd to ask spellcheckers to return zero suggestions for a word. (Both DirectSpellChecker and the legacy IndexBasedSpellChecker choke on this scenario.) I plan to commit this in a few days. > NPE in DirectSpellChecker with alternativeTermCount and mm. > --- > > Key: SOLR-5090 > URL: https://issues.apache.org/jira/browse/SOLR-5090 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.4 > Environment: 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35 >Reporter: Markus Jelsma >Assignee: James Dyer > Fix For: 4.9, 5.0 > > Attachments: SOLR-5090.patch > > > Query with three terms of which one is misspelled and > spellcheck.alternativeTermCount=0&mm=3 yields the following NPE: > {code} > ERROR org.apache.solr.servlet.SolrDispatchFilter – > null:java.lang.NullPointerException > at > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:422) > at > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:355) > at > org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:189) > at > org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:188) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) > at > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
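The guard described in the fix, treating alternativeTermCount=0 as "disabled" rather than forwarding it to the spellchecker, can be sketched as follows; the names here are hypothetical, not the actual Solr code:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the guard: a request for zero alternative suggestions
// short-circuits to an empty result instead of invoking the underlying
// spellchecker (which is the path where the NPE occurred).
class SuggestGuard {
    interface Checker {                      // hypothetical stand-in for the real spellchecker
        List<String> suggest(String term, int count);
    }

    static List<String> suggestSafely(Checker checker, String term, int count) {
        if (count <= 0) {
            return Collections.emptyList();  // zero alternatives: skip the lookup entirely
        }
        return checker.suggest(term, count);
    }
}
```

The point of the guard is that the underlying checker is never even called for a zero count, so any count-dependent internal state it would build stays untouched.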
[jira] [Assigned] (SOLR-5090) NPE in DirectSpellChecker with alternativeTermCount and mm.
[ https://issues.apache.org/jira/browse/SOLR-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer reassigned SOLR-5090: Assignee: James Dyer > NPE in DirectSpellChecker with alternativeTermCount and mm. > --- > > Key: SOLR-5090 > URL: https://issues.apache.org/jira/browse/SOLR-5090 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.4 > Environment: 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35 >Reporter: Markus Jelsma >Assignee: James Dyer > Fix For: 4.9, 5.0 > > > Query with three terms of which one is misspelled and > spellcheck.alternativeTermCount=0&mm=3 yields the following NPE: > {code} > ERROR org.apache.solr.servlet.SolrDispatchFilter – > null:java.lang.NullPointerException > at > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:422) > at > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:355) > at > org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:189) > at > org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:188) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6017) SimpleQParser uses index analyzer instead of query analyzer
[ https://issues.apache.org/jira/browse/SOLR-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated SOLR-6017: - Fix Version/s: 4.8.1 > SimpleQParser uses index analyzer instead of query analyzer > --- > > Key: SOLR-6017 > URL: https://issues.apache.org/jira/browse/SOLR-6017 > Project: Solr > Issue Type: Bug >Reporter: Ryan Ernst >Assignee: Ryan Ernst > Fix For: 4.8.1, 4.9, 5.0 > > Attachments: SOLR-6017.patch > > > The SimpleQParser uses getAnalyzer(), but it should be getQueryAnalyzer(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5634. - Resolution: Fixed > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
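The static-ThreadLocal reuse idea from the issue description can be sketched in plain Java; FakeStream below is a hypothetical stand-in for the StringTokenStream/NumericTokenStream instances the real change reuses:

```java
// Sketch of per-thread instance reuse: each thread keeps one long-lived
// object and only refreshes its state per value, instead of allocating a
// fresh instance (and its attributes) for every document/field.
class ReusableStreams {
    static final class FakeStream {          // hypothetical stand-in for a TokenStream
        String value;
        FakeStream setValue(String v) { this.value = v; return this; }
    }

    private static final ThreadLocal<FakeStream> STREAMS =
        ThreadLocal.withInitial(FakeStream::new);

    static FakeStream streamFor(String value) {
        return STREAMS.get().setValue(value); // same instance per thread, new state
    }
}
```

Because the instance is per-thread, no synchronization is needed, and the garbage otherwise created per indexed field drops to zero after the first use on each thread.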
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988080#comment-13988080 ] ASF subversion and git services commented on LUCENE-5634: - Commit 1592005 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1592005 ] LUCENE-5634: Reuse TokenStream instances for string and numeric Fields > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5609) Should we revisit the default numeric precision step?
[ https://issues.apache.org/jira/browse/LUCENE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988063#comment-13988063 ] Michael McCandless commented on LUCENE-5609: I think we should do something here for 4.9; poor defaults just hurt our users. I'd like to do 8/16, but Uwe are you completely against this? > Should we revisit the default numeric precision step? > - > > Key: LUCENE-5609 > URL: https://issues.apache.org/jira/browse/LUCENE-5609 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5609.patch > > > Right now it's 4, for both 8 (long/double) and 4 byte (int/float) > numeric fields, but this is a pretty big hit on indexing speed and > disk usage, especially for tiny documents, because it creates many (8 > or 16) terms for each value. > Since we originally set these defaults, a lot has changed... e.g. we > now rewrite MTQs per-segment, we have a faster (BlockTree) terms dict, > a faster postings format, etc. > Index size is important because it limits how much of the index will > be hot (fit in the OS's IO cache). And more apps are using Lucene for > tiny docs where the overhead of individual fields is sizable. > I used the Geonames corpus to run a simple benchmark (all sources are > committed to luceneutil). 
It has 8.6 M tiny docs, each with 23 fields, > with these numeric fields: > * lat/lng (double) > * modified time, elevation, population (long) > * dem (int) > I tested 4, 8 and 16 precision steps: > {noformat}
> indexing:
> PrecStep  Size       IndexTime
>        4  1812.7 MB  651.4 sec
>        8  1203.0 MB  443.2 sec
>       16   894.3 MB  361.6 sec
>
> searching:
> Field      PrecStep  QueryTime  TermCount
> geoNameID         4  2872.5 ms      20306
> geoNameID         8  2903.3 ms     104856
> geoNameID        16  3371.9 ms    5871427
> latitude          4  2160.1 ms      36805
> latitude          8  2249.0 ms     240655
> latitude         16  2725.9 ms    4649273
> modified          4  2038.3 ms      13311
> modified          8  2029.6 ms      58344
> modified         16  2060.5 ms      77763
> longitude         4  3468.5 ms      33818
> longitude         8  3629.9 ms     214863
> longitude        16  4060.9 ms    4532032
> {noformat}
> Index time is with 1 thread (for identical index structure). > The query time is time to run 100 random ranges for that field, > averaged over 20 iterations. TermCount is the total number of terms > the MTQ rewrote to across all 100 queries / segments, and it gets > higher as expected as precStep gets higher, but the search time is not > that heavily impacted ... negligible going from 4 to 8, and then some > impact from 8 to 16. > Maybe we should increase the int/float default precision step to 8 and > long/double to 16? Or both to 16? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
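The index-size trend in the benchmark follows directly from the standard trie encoding: a value of b bits indexed with precision step p produces ceil(b/p) terms per value. A quick arithmetic check:

```java
// Why smaller precision steps blow up the index: trie fields emit
// ceil(valueBits / precisionStep) terms for every indexed value.
class TrieTerms {
    static int termsPerValue(int valueBits, int precisionStep) {
        return (valueBits + precisionStep - 1) / precisionStep;
    }
}
```

At the default precStep=4 a long/double field emits 16 terms per value and an int/float field 8, which matches the "(8 or 16) terms for each value" in the description; raising the step to 8 or 16 halves or quarters that per-value term count.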
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988060#comment-13988060 ] Michael McCandless commented on LUCENE-5634: bq. Or are you comparing the speedup by this patch in combination with the precision step change? Baseline was the patch w/ precStep=8 and comp was the patch w/ precStep=4. I just re-ran to be sure; this is IndexGeoNames.java in luceneutil if you want to try ... it's easy to run, you just need to download/unzip geonames corpus first. Net/net precStep=4 is very costly and doesn't seem to buy much query time speedups from my tests on LUCENE-5609. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988035#comment-13988035 ] ASF subversion and git services commented on LUCENE-5634: - Commit 1591992 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1591992 ] LUCENE-5634: Reuse TokenStream instances for string and numeric Fields > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6017) SimpleQParser uses index analyzer instead of query analyzer
[ https://issues.apache.org/jira/browse/SOLR-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988027#comment-13988027 ] ASF subversion and git services commented on SOLR-6017: --- Commit 1591990 from [~rjernst] in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1591990 ] SOLR-6017: Fix SimpleQParser to use query analyzer instead of index analyzer > SimpleQParser uses index analyzer instead of query analyzer > --- > > Key: SOLR-6017 > URL: https://issues.apache.org/jira/browse/SOLR-6017 > Project: Solr > Issue Type: Bug >Reporter: Ryan Ernst >Assignee: Ryan Ernst > Fix For: 4.9, 5.0 > > Attachments: SOLR-6017.patch > > > The SimpleQParser uses getAnalyzer(), but it should be getQueryAnalyzer(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988024#comment-13988024 ] Uwe Schindler commented on LUCENE-5634: --- +1 I am fine with that patch! > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files
[ https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988005#comment-13988005 ] Shai Erera commented on LUCENE-5636: I chatted with Robert about this. The current situation is that the old .fnm files continue to be referenced even when not needed, however when the segment is merged, they go away (as all gen'd files). Given that there's no way to solve it without breaking back-compat, unless we introduce hacks such as checking for a ".fnm" suffix, we discussed how to solve this "going forward". By "going forward" I mean to not change existing segments, but if they contain future updates, write the new information in a better way. Perhaps old .fnm files will still be referenced by those segments, until they're merged away, but new segments will fix that bug. I think that this might be doable together with LUCENE-5618, by writing per-field gen'd DV file, so I'll try to solve it there and if it works I'll resolve that issue as appropriate. > SegmentCommitInfo continues to list unneeded gen'd files > > > Key: LUCENE-5636 > URL: https://issues.apache.org/jira/browse/LUCENE-5636 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5636.patch > > > I thought I handled it in LUCENE-5246, but turns out I didn't handle it > fully. I'll upload a patch which improves the test to expose the bug. I know > where it is, but I'm not sure how to fix it without breaking index > back-compat. Can we do that on experimental features? > The problem is that if you update different fields in different gens, the > FieldInfos files of older gens remain referenced (still!!). I open a new > issue since LUCENE-5246 is already resolved and released, so don't want to > mess up our JIRA... > The severity of the bug is that unneeded files are still referenced in the > index. 
Everything still works correctly, it's just that .fnm files are still > there. But as I wrote, I'm still not sure how to solve it without requiring > apps that use dv updates to reindex. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988003#comment-13988003 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1591986 from jd...@apache.org in branch 'dev/branches/lucene5376_2' [ https://svn.apache.org/r1591986 ] LUCENE-5376: convert GET parameters to JSON > Add a demo search server > > > Key: LUCENE-5376 > URL: https://issues.apache.org/jira/browse/LUCENE-5376 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: lucene-demo-server.tgz > > > I think it'd be useful to have a "demo" search server for Lucene. > Rather than being fully featured, like Solr, it would be minimal, just > wrapping the existing Lucene modules to show how you can make use of these > features in a server setting. > The purpose is to demonstrate how one can build a minimal search server on > top of APIs like SearchManager, SearcherLifetimeManager, etc. > This is also useful for finding rough edges / issues in Lucene's APIs that > make building a server unnecessarily hard. > I don't think it should have back compatibility promises (except Lucene's > index back compatibility), so it's free to improve as Lucene's APIs change. > As a starting point, I'll post what I built for the "eating your own dog > food" search app for Lucene's & Solr's jira issues > http://jirasearch.mikemccandless.com (blog: > http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It > uses Netty to expose basic indexing & searching APIs via JSON, but it's very > rough (lots nocommits). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files
[ https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5636: --- Priority: Major (was: Critical) > SegmentCommitInfo continues to list unneeded gen'd files > > > Key: LUCENE-5636 > URL: https://issues.apache.org/jira/browse/LUCENE-5636 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5636.patch > > > I thought I handled it in LUCENE-5246, but turns out I didn't handle it > fully. I'll upload a patch which improves the test to expose the bug. I know > where it is, but I'm not sure how to fix it without breaking index > back-compat. Can we do that on experimental features? > The problem is that if you update different fields in different gens, the > FieldInfos files of older gens remain referenced (still!!). I open a new > issue since LUCENE-5246 is already resolved and released, so don't want to > mess up our JIRA... > The severity of the bug is that unneeded files are still referenced in the > index. Everything still works correctly, it's just that .fnm files are still > there. But as I wrote, I'm still not sure how to solve it without requiring > apps that use dv updates to reindex. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987989#comment-13987989 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1591984 from jd...@apache.org in branch 'dev/branches/lucene5376_2' [ https://svn.apache.org/r1591984 ] LUCENE-5376: HelpHandler fix for incoming parameter > Add a demo search server > > > Key: LUCENE-5376 > URL: https://issues.apache.org/jira/browse/LUCENE-5376 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: lucene-demo-server.tgz > > > I think it'd be useful to have a "demo" search server for Lucene. > Rather than being fully featured, like Solr, it would be minimal, just > wrapping the existing Lucene modules to show how you can make use of these > features in a server setting. > The purpose is to demonstrate how one can build a minimal search server on > top of APIs like SearchManager, SearcherLifetimeManager, etc. > This is also useful for finding rough edges / issues in Lucene's APIs that > make building a server unnecessarily hard. > I don't think it should have back compatibility promises (except Lucene's > index back compatibility), so it's free to improve as Lucene's APIs change. > As a starting point, I'll post what I built for the "eating your own dog > food" search app for Lucene's & Solr's jira issues > http://jirasearch.mikemccandless.com (blog: > http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It > uses Netty to expose basic indexing & searching APIs via JSON, but it's very > rough (lots nocommits). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5634: Attachment: LUCENE-5634.patch Updated patch with tests. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch, > LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5638) Default Attributes are expensive
[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987965#comment-13987965 ] Uwe Schindler commented on LUCENE-5638: --- bq. Changes like LUCENE-5634 make it clear that the default AttributeFactory stuff has a very high cost: weakmaps/reflection/etc. The problem is not the weak maps and reflection. The reason why it is expensive is the fact that all attribute instances have to be put into the 2 LinkedHashMaps on creating the TokenStream. I just repeat: It is not the reflection! We had this discussion already back 5 years ago with Michael Busch! In addition, the AttributeFactory itself has less impact (this was already tested while developing it in 2.9). This is why the weak maps are there - so it is fast; the *only* reflection that ever happens is Class#newInstance(), which is cheap in recent Java versions - the speed difference in micro benchmarks is small, almost as fast as a native {{new}}. So I disagree with removing the default AttributeFactory, we still need it for non-default attributes. The simple workaround would be to use TOKEN_ATTRIBUTE_FACTORY instead, which falls back to the default one for unknown attributes. I agree with clearAttributes(), but this should be solved with TOKEN_ATTRIBUTE_FACTORY, too. > Default Attributes are expensive > > > Key: LUCENE-5638 > URL: https://issues.apache.org/jira/browse/LUCENE-5638 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Robert Muir > > Changes like LUCENE-5634 make it clear that the default AttributeFactory > stuff has a very high cost: weakmaps/reflection/etc. > Additionally I think clearAttributes() is more expensive than it should be: > it has to traverse a linked-list, calling clear() per token. > Operations like cloning (save/restoreState) have a high cost, too. > Maybe we can have a better Default? 
In other words, rename > DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a > faster default factory that just has one AttributeImpl with the "core ones" > that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything > outside of that falls back to reflection. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
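The costs being debated (per-stream registration in linked maps, per-token traversal in clearAttributes()) can be modeled with a toy version of the machinery (hypothetical classes, not the real AttributeSource):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the cost discussed above: each attribute is registered in a
// LinkedHashMap at TokenStream creation, and clearAttributes() must walk
// every registered attribute for every single token.
public class AttributeCostDemo {
    interface Attr { void clear(); }

    static final class CharTermAttr implements Attr {
        StringBuilder term = new StringBuilder();
        public void clear() { term.setLength(0); }
    }

    static final class OffsetAttr implements Attr {
        int start, end;
        public void clear() { start = end = 0; }
    }

    // Insertion cost is paid once per TokenStream; traversal cost per token.
    private final Map<Class<? extends Attr>, Attr> attrs = new LinkedHashMap<>();

    <A extends Attr> A addAttribute(Class<A> cls, A impl) {
        attrs.put(cls, impl);
        return impl;
    }

    int clearAttributes() {
        for (Attr a : attrs.values()) a.clear(); // linear walk, once per token
        return attrs.size();
    }

    // Register two "core" attributes, clear them once, report the walk length.
    static int demoClearCount() {
        AttributeCostDemo src = new AttributeCostDemo();
        CharTermAttr term = src.addAttribute(CharTermAttr.class, new CharTermAttr());
        src.addAttribute(OffsetAttr.class, new OffsetAttr());
        term.term.append("token");
        return src.clearAttributes();
    }

    public static void main(String[] args) {
        System.out.println(demoClearCount()); // 2 attributes walked per token
    }
}
```

A "faster default" in the sense Robert proposes would fold the core attributes into one AttributeImpl whose clear() resets plain fields, so the per-token cost stops scaling with the number of registered attributes.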
[jira] [Updated] (SOLR-6039) debug=track causes debug=query info to be suppressed when no results found
[ https://issues.apache.org/jira/browse/SOLR-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-6039: Attachment: SOLR-6039.patch This patch adds the timing info in all phases. The times reported by shards are still being added > debug=track causes debug=query info to be suppressed when no results found > > > Key: SOLR-6039 > URL: https://issues.apache.org/jira/browse/SOLR-6039 > Project: Solr > Issue Type: Bug >Affects Versions: 4.7 >Reporter: Hoss Man > Attachments: SOLR-6039.patch, SOLR-6039.patch > > > Shamik Bandopadhyay noted on the mailing list that debugQuery=true wasn't > returning info about how the query was being parsed in SolrCloud. > Steps to reproduce... > * startup a simple 2 shard solr cluster using the example configs > * Load this URL: > ** http://localhost:8983/solr/select?q=Foo&debug=query > ** note that the debug=query causes a debug block including "parsedquery" > showing "title:foo" > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Foo&debug=query&debug=track > ** http://localhost:8983/solr/select?q=Foo&debugQuery=true (legacy short hand > option for enabling all debug options) > ** Note that the debug block exists, but *only* includes the distributed > "track" options - the query parsing debugging info is not available > * index the sample data (java -jar post.jar *.xml) > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debugQuery=true&fl=id > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&fl=id > ** Note that now we have at least one matching doc, and the parsedquery info > is included in the debug block along with the tracking info > * Load either of these URLs: > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** http://localhost:8983/solr/select?q=Solr&debug=query&debug=track&rows=0 > ** Note: even though we have a matching doc, since rows=0 prevents it from > being returned, the 
parsedquery debug info again no longer works - just the > track debug info > > The workaround, for people who don't care about the newer "debug > tracking" and want the same debug information as pre-4.7, is to enumerate the > debug options (ie: {{debug=query&debug=timing&debug=results}}) instead of > relying on the shorthand: {{debugQuery=true}} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated SOLR-5831: --- Attachment: SOLR-5831.patch Hi Joel, The bug I discovered with secondary sort only occurs when the index has multiple segments. The dummy scorer docId should have been relative to the doc base. Also, the collector 'finish' method wasn't calling the delegate's finish method. Both of these bugs were fixed in the previous patch. I don't have a unit test for multiple segments, but I did add a new unit test for the 'maxscalehits' parameter. I'm still not sure that I'm determining the result window size for the QueryResultCache, correctly. See this part: // Determine the results window size. // TODO: this should be sized larger for the query result cache int winSize = request.getSearcher().getCore().getSolrConfig().queryResultWindowSize; Could you verify if this is ok? Thanks, Peter > Scale score PostFilter > -- > > Key: SOLR-5831 > URL: https://issues.apache.org/jira/browse/SOLR-5831 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.7 >Reporter: Peter Keegan >Assignee: Joel Bernstein >Priority: Minor > Fix For: 4.9 > > Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, > SOLR-5831.patch, TestScaleScoreQParserPlugin.patch > > > The ScaleScoreQParserPlugin is a PostFilter that performs score scaling. > This is an alternative to using a function query wrapping a scale() wrapping > a query(). For example: > select?qq={!edismax v='news' qf='title^2 > body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query > v=$qq} > The problem with this query is that it has to scale every hit. Usually, only > the returned hits need to be scaled, > but there may be use cases where the number of hits to be scaled is greater > than the returned hit count, > but less than or equal to the total hit count. 
> Sample syntax: > fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 > func=sum(product(sscore(),0.75),product(field(myfield),0.25))} > l=0.0 u=1.0 //Scale scores to values between 0-1, inclusive > maxscalehits=1//The maximum number of result scores to scale (-1 = > all hits, 0 = results 'page' size) > func=... //Apply the composite function to each hit. The > scaled score value is accessed by the 'score()' value source > All parameters are optional. The defaults are: > l=0.0 u=1.0 > maxscalehits=0 (result window size) > func=(null) > > Note: this patch is not complete, as it contains no test cases and may not > conform > to all the guidelines in http://wiki.apache.org/solr/HowToContribute. > > I would appreciate any feedback on the usability and implementation. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5638) Default Attributes are expensive
Robert Muir created LUCENE-5638: --- Summary: Default Attributes are expensive Key: LUCENE-5638 URL: https://issues.apache.org/jira/browse/LUCENE-5638 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Changes like LUCENE-5634 make it clear that the default AttributeFactory stuff has a very high cost: weakmaps/reflection/etc. Additionally I think clearAttributes() is more expensive than it should be: it has to traverse a linked-list, calling clear() per token. Operations like cloning (save/restoreState) have a high cost, too. Maybe we can have a better Default? In other words, rename DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a faster default factory that just has one AttributeImpl with the "core ones" that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything outside of that falls back to reflection. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987855#comment-13987855 ] Uwe Schindler commented on LUCENE-5634: --- bq. I would prefer to simply break the interface rather than do anything sophisticated here. Its a very expert low-level one. The patch had very minimal impact to the codebase. +1. Nevertheless as we change a public interface, it should be mentioned in "Backwards Breaks". > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-5285: Attachment: SOLR-5285.patch Correct patch. Please ignore the previous patch. > Solr response format should support child Docs > -- > > Key: SOLR-5285 > URL: https://issues.apache.org/jira/browse/SOLR-5285 > Project: Solr > Issue Type: New Feature >Reporter: Varun Thacker > Fix For: 4.9, 5.0 > > Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, > SOLR-5285.patch, SOLR-5285.patch > > > Solr has added support for taking childDocs as input ( only XML till now ). > It's currently used for BlockJoinQuery. > I feel that if a user indexes a document with child docs, even if he isn't > using the BJQ features and is just searching which results in a hit on the > parentDoc, it's childDocs should be returned in the response format. > [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would > be the place to add childDocs to the response. > Now given a docId one needs to find out all the childDoc id's. A couple of > approaches which I could think of are > 1. Maintain the relation between a parentDoc and it's childDocs during > indexing time in maybe a separate index? > 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a > parentDoc it finds out all the childDocs but this requires a childScorer. > Am I missing something obvious on how to find the relation between a > parentDoc and it's childDocs because none of the above solutions for this > look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-5285: Attachment: SOLR-5285.patch Updated patch with trunk. bq. it's not clear to me from the API what to expect will happen if i have more then one level of parent-child relationships in my index – will children & grandchildren be returned? whatever is expected needs to be documented/tested Tested with grandchildren. In Lucene all grandchildren and all siblings are treated as simply children to the parent document. A parent document and all its child documents are indexed in a block. Hence we should document that we only support one level of nesting. > Solr response format should support child Docs > -- > > Key: SOLR-5285 > URL: https://issues.apache.org/jira/browse/SOLR-5285 > Project: Solr > Issue Type: New Feature >Reporter: Varun Thacker > Fix For: 4.9, 5.0 > > Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, > SOLR-5285.patch > > > Solr has added support for taking childDocs as input ( only XML till now ). > It's currently used for BlockJoinQuery. > I feel that if a user indexes a document with child docs, even if he isn't > using the BJQ features and is just searching which results in a hit on the > parentDoc, it's childDocs should be returned in the response format. > [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would > be the place to add childDocs to the response. > Now given a docId one needs to find out all the childDoc id's. A couple of > approaches which I could think of are > 1. Maintain the relation between a parentDoc and it's childDocs during > indexing time in maybe a separate index? > 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a > parentDoc it finds out all the childDocs but this requires a childScorer. 
> Am I missing something obvious on how to find the relation between a > parentDoc and it's childDocs because none of the above solutions for this > look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
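The block-indexing contract Varun describes (a parent and its children indexed as one contiguous block, children first, parent last) is what makes child lookup possible without a separate index, and also why grandchildren flatten into plain children. A toy model of that docId arithmetic (illustrative only, not Lucene's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the block-join layout: for each block, the parent is the
// last docId and its children are exactly the docIds between the previous
// parent and this one. There is no deeper structure in the block, which is
// why grandchildren are indistinguishable from children of the block's parent.
public class BlockJoinDemo {
    static List<Integer> childDocIds(int parentDocId, int prevParentDocId) {
        List<Integer> children = new ArrayList<>();
        for (int d = prevParentDocId + 1; d < parentDocId; d++) {
            children.add(d);
        }
        return children;
    }

    public static void main(String[] args) {
        // Block 1: docs 0,1,2 = children; doc 3 = parent.
        // Block 2: docs 4,5   = children; doc 6 = parent.
        System.out.println(childDocIds(3, -1)); // [0, 1, 2]
        System.out.println(childDocIds(6, 3));  // [4, 5]
    }
}
```

This is essentially the relation a DocTransformer would need to recover at response-writing time, given the hit's parent docId.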
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987844#comment-13987844 ] Robert Muir commented on LUCENE-5634: - I would prefer to simply break the interface rather than do anything sophisticated here. Its a very expert low-level one. The patch had very minimal impact to the codebase. I think its good to defer stuff with Analyzer and not do that here, that has a lot of consumers like QueryParsers, MoreLikeThis, Suggesters, ... Thats a more complex issue. I am unsure that adding things like equals is a good idea, it might make things very complex. For now, if you implement your own subclass, you can just ignore the parameter, and its the same performance and so on. I will upload a new patch with tests (including doing stupid things). > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987843#comment-13987843 ] Uwe Schindler commented on LUCENE-5634: --- bq. BTW, that test was with precStep=8. If I use precStep=4 (still the default, we really have to fix LUCENE-5609!) then indexing time for Geonames with the patch is 164.8 sec (63% slower!). HÄ? How comes, makes no sense to me. Are you sure you are doing the right thing? Or are you comparing the speedup by this patch in combination with the precision step change? > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987841#comment-13987841 ] Uwe Schindler commented on LUCENE-5634: --- Patch looks fine. I was afraid of complexity, but that looks quite good. I am not sure about backwards compatibility issues, but implementing your own IndexableField instance is still very expert. With Java 8 we could handle that with default interface methods (LOOL). The current patch is fine for the 2 special cases, although it's a bit risky if we add new "settings" to NTS or change its API (we should have equals...). Maybe in LUCENE-5605 we can improve the check. If we pass FieldType directly to NTS and NRQ, we can handle the whole thing by comparing the field type and not rely on crazy internals like precStep. It would be great if we could in the future remove the ThreadLocal from Analyzer, too - by using the same trick. Unfortunately with the current contract on TokenStream it's hard to compare, unless we have a well-defined TokenStream#equals(). Ideally TokenStream#equals() should compare the "settings" of the stream and its inputs (for Filters), but that is too advanced for the simple 2 cases. Another solution for this would be to have some "holder" around the TokenStream that's cached and provides hashcode/equals. By that a Field could determine better if it is its own tokenstream (e.g. by putting a reference to its field type into the holder). 
> Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987827#comment-13987827 ] Michael McCandless commented on LUCENE-5634: BTW, that test was with precStep=8. If I use precStep=4 (still the default, we really have to fix LUCENE-5609!) then indexing time for Geonames with the patch is 164.8 sec (63% slower!). > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
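The 63% slowdown Michael measures tracks directly with term count: a numeric field produces one indexed term per precisionStep-sized slice of its 64-bit value, so halving the step doubles the tokens per value. A sketch of that arithmetic (not Lucene's actual code):

```java
// Why precStep=4 indexes so much slower than precStep=8: the number of
// trie terms generated per indexed long value is ceil(64 / precisionStep).
public class PrecisionStepDemo {
    static int termsPerValue(int valueBits, int precisionStep) {
        // ceiling division: one term per precisionStep-sized slice
        return (valueBits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        System.out.println(termsPerValue(64, 8)); // 8 terms per long
        System.out.println(termsPerValue(64, 4)); // 16 terms per long - twice the tokens to index
    }
}
```

Smaller steps buy faster range queries (fewer terms to visit per range) at the price of a bigger index and slower indexing, which is the trade-off behind the LUCENE-5609 default-change discussion.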
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987808#comment-13987808 ] Michael McCandless commented on LUCENE-5634: OK with NumericField, full Geonames index takes 129.7 sec on trunk and 101.0 sec with last patch... nice speedup. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987806#comment-13987806 ] Michael McCandless commented on LUCENE-5634: +1, patch looks good. I ran IndexGeoNames again, it took 37.6 seconds, which is a big speedup over trunk (55.6 seconds). However, it's only doing StringField right now ... I'll re-test w/ NumericField too. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5634: Attachment: LUCENE-5634.patch

Here is a patch. Tests seem happy, but I didn't benchmark or write an explicit test yet. Personally I think it's bogus: I don't like that these fields (StringField, NumericField) "backdoor" the analyzer, and to me that's the real bug. But I am OK with the change as a step, because it only makes the low-level interface API more bogus.
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987789#comment-13987789 ] Michael McCandless commented on LUCENE-5634:

bq. Maybe add a parameter to Field#tokenStream(), passing the previously cached instance!

This sounds like a good idea!
[jira] [Assigned] (SOLR-5956) SnapShooter is using getRawInstanceDir, which is sometimes not a valid directory
[ https://issues.apache.org/jira/browse/SOLR-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-5956: Assignee: Timothy Potter

> SnapShooter is using getRawInstanceDir, which is sometimes not a valid
> directory
> Key: SOLR-5956
> URL: https://issues.apache.org/jira/browse/SOLR-5956
> Project: Solr
> Issue Type: Bug
> Components: replication (java), SolrCloud
> Environment: SolrCloud
> Reporter: Timothy Potter
> Assignee: Timothy Potter
> Attachments: SOLR-5956.patch
>
> Ran into an issue where the getRawInstanceDir method on CoreDescriptor
> returns an invalid directory on the server. Need to track down where this
> bad value comes from and fix it. I suspect this has to do with me using
> symlinks, e.g. I have a server with solr.solr.home set to
> /home/ec2-user/lucene-solr/solr/cloud87/solr, which in reality is
> /vol0/cloud87/solr, as /home/ec2-user/lucene-solr/solr/cloud87 is a
> symlink to /vol0/cloud87. getRawInstanceDir was returning
> /vol0/cloud87/demo_shard1_replica1, which is missing the /solr part of
> the directory path; it should be /vol0/cloud87/solr/demo_shard1_replica1.
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987729#comment-13987729 ] Robert Muir commented on LUCENE-5634:

{quote}
Another idea: Maybe add a parameter to Field#tokenStream(), passing the previously cached instance! By this the field could obviously reuse the TokenStream, if the type (instanceof check) is correct. If not, throw it away and create a new one. The indexer then manages the cache (it's just a field in DefaultIndexingChain or DocumentsWriter).
{quote}

I like this idea better. Let's try it and see how bad it looks.
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987648#comment-13987648 ] Uwe Schindler commented on LUCENE-5634:

Another idea: Maybe add a parameter to Field#tokenStream(), passing the previously cached instance! That way the field could reuse the TokenStream if the type (instanceof check) is correct; if not, throw it away and create a new one. The indexer then manages the cache (it's just a field in DefaultIndexingChain or DocumentsWriter).
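The tokenStream(previous) idea above might look roughly like this. All names are invented stand-ins for illustration (the real method signature in the eventual patch may differ): the indexer hands the field whatever stream it produced last time, and the field reuses it only when the runtime type matches.

```java
// Invented base class standing in for Lucene's TokenStream.
class TokenStreamSketch {}

// Invented stand-in for NumericTokenStream, reduced to the value it carries.
final class NumericTokenStreamSketch extends TokenStreamSketch {
    long value;
}

final class NumericFieldSketch {
    private final long value;

    NumericFieldSketch(long value) { this.value = value; }

    // previous == null (or the wrong type) forces a fresh allocation;
    // otherwise the cached instance is refilled and returned.
    TokenStreamSketch tokenStream(TokenStreamSketch previous) {
        NumericTokenStreamSketch nts = (previous instanceof NumericTokenStreamSketch)
            ? (NumericTokenStreamSketch) previous
            : new NumericTokenStreamSketch();
        nts.value = value;
        return nts;
    }
}
```

The attraction of this shape is that the cache lives in the indexer (a plain field in DefaultIndexingChain or DocumentsWriter), so no static ThreadLocal is needed.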
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645 ] Uwe Schindler commented on LUCENE-5634:

bq. but it's trickier since the precStep is final (maybe we can un-final it and add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: have a pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can use a static thread local since we are fully in control of the threading model. With Lucene itself, where it can be used in many different environments, this can cause some unexpected behavior. For example, this might cause Tomcat to warn on leaking resources when unloading a war.

Thanks Shay: this is really the reason why we always refused to use static (!) ThreadLocals in Lucene, especially for those heavily used components. Maybe we can do a similar thing as with StringField in Mike's patch. It's a bit crazy to move the TokenStreams out of the field, but we can do this for performance here. Just have a lazily initialized pool of NumericTokenStreams for each precisionStep in each per-thread DocumentsWriter (DefaultIndexingChain). -1 to adding thread locals in Lucene here!

Another idea how to manage the pools: Maybe add a protected method to Field that can get the DocumentsWriter instance and add some caching functionality for arbitrary TokenStreams (not just NumericTS or StringTS): maybe some method on the per-thread DocumentsWriter to set a TokenStream for reuse per field. The field (including custom ones) could then use setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor from inside the Field.
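The per-precisionStep pool described above could be sketched along these lines. This is a hypothetical illustration with invented types, not Lucene code; the point is that the pool is an ordinary instance field owned by the per-thread indexing state, so no static ThreadLocal is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-in for a NumericTokenStream configured for one precisionStep.
final class PooledNumericStream {
    final int precisionStep;
    long value;

    PooledNumericStream(int precisionStep) { this.precisionStep = precisionStep; }
}

final class NumericStreamPool {
    // Plain instance field, one pool per indexing-thread state:
    // the owning thread is the only user, so no synchronization needed.
    private final Map<Integer, PooledNumericStream> byStep = new HashMap<>();

    // Lazily create one stream per precisionStep, then refill and reuse it.
    PooledNumericStream get(int precisionStep, long value) {
        PooledNumericStream ts =
            byStep.computeIfAbsent(precisionStep, PooledNumericStream::new);
        ts.value = value;
        return ts;
    }
}
```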
[jira] [Comment Edited] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645 ] Uwe Schindler edited comment on LUCENE-5634 at 5/2/14 12:45 PM:

bq. but it's trickier since the precStep is final (maybe we can un-final it and add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: have a pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can use a static thread local since we are fully in control of the threading model. With Lucene itself, where it can be used in many different environments, this can cause some unexpected behavior. For example, this might cause Tomcat to warn on leaking resources when unloading a war.

Thanks Shay: this is really the reason why we always refused to use static (!) ThreadLocals in Lucene, especially for those heavily used components. Maybe we can do a similar thing as with StringField in Mike's patch. It's a bit crazy to move the TokenStreams out of the field, but we can do this for performance here. Just have a lazily initialized pool of NumericTokenStreams for each precisionStep in each per-thread DocumentsWriter (DefaultIndexingChain). -1 to adding thread locals in Lucene here!

Another idea how to manage the pools: Maybe add a protected method to Field that can get the DocumentsWriter instance and add some caching functionality for arbitrary TokenStreams (not just NumericTS or StringTS): maybe some method on the per-thread DocumentsWriter to set a TokenStream for reuse per field. The field (including custom ones) could then use setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor from inside the Field.
Re: CJKBigramFilter - position bug with outputUnigrams?
> Would it be possible to implement an option with a name similar to
> "lastUnigramAtPreviousPosition" so that I can optionally get the
> behavior I'm after when the input is two or more characters, without
> changing current behavior for anyone else? This would completely solve
> my current problem.

This is really not feasible. It sounds like multi-level n-grams in the same field are a bad match for what you are doing (phrase queries etc). This just doesn't work, and won't work, based on the mathematics. Try another approach, like removing this filter completely; maybe the word segmentation by ICU is good enough.