[jira] [Created] (LUCENE-6045) Refactor classifier APIs to work better with multi-threading
Tommaso Teofili created LUCENE-6045: --- Summary: Refactor classifier APIs to work better with multi-threading Key: LUCENE-6045 URL: https://issues.apache.org/jira/browse/LUCENE-6045 Project: Lucene - Core Issue Type: Improvement Components: modules/classification Reporter: Tommaso Teofili Assignee: Tommaso Teofili In https://issues.apache.org/jira/browse/LUCENE-4345?focusedCommentId=13454729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454729 [~simonw] pointed out that the current Classifier API doesn't work well in multi-threaded environments: bq. The interface you defined has some problems with respect to Multi-Threading IMO. The interface itself suggests that this class is stateful and you have to call methods in a certain order, and at the same time you need to make sure that it is not published for read access before training is done. I think it would be wise to pass in all needed objects as constructor arguments and make the references final so it can be shared across threads, and add an interface that represents the trained model computed offline. In this case it doesn't really matter, but in the future it might make sense. We can also skip the model interface entirely and remove the training method until we have some impls that really need to be trained. I missed that at that point, but I think for 5.0 it would be wise to rearrange the API to address that properly.
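A minimal sketch of the arrangement suggested above, with every dependency passed as a final constructor argument so a fully constructed instance can be shared across threads; the class and method names here are illustrative, not the actual module API:

{code:java}
import java.io.IOException;

// Hypothetical shape: every dependency is a final constructor argument, so a
// fully constructed instance can be published to multiple threads without
// extra synchronization, and there is no train() to call in a magic order.
public final class ImmutableClassifier {
  private final String textFieldName;
  private final String classFieldName;

  public ImmutableClassifier(String textFieldName, String classFieldName) {
    this.textFieldName = textFieldName;
    this.classFieldName = classFieldName;
  }

  // Classification reads only final state, so concurrent calls are safe.
  public String assignClass(String text) throws IOException {
    // actual index-backed scoring elided; the point is that this method
    // reads only final fields and allocates locally
    throw new UnsupportedOperationException("scoring elided in this sketch");
  }
}
{code}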
[jira] [Resolved] (LUCENE-5736) Separate the classifiers to online and caching where possible
[ https://issues.apache.org/jira/browse/LUCENE-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved LUCENE-5736. - Resolution: Fixed Fix Version/s: 5.0 Separate the classifiers to online and caching where possible - Key: LUCENE-5736 URL: https://issues.apache.org/jira/browse/LUCENE-5736 Project: Lucene - Core Issue Type: Sub-task Components: modules/classification Reporter: Gergő Törcsvári Assignee: Tommaso Teofili Labels: gsoc2014 Fix For: 5.0 Attachments: 0803-caching.patch, 0810-caching.patch, CachingNaiveBayesClassifier.java The Lucene classifier implementations are now near-online if they get a near-real-time reader. That is good for users who have a continuously changing dataset, but slow for unchanging datasets. The idea is: what if we implement a cache and speed up the results where possible?
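The attached CachingNaiveBayesClassifier is the real patch; as a rough illustration, the caching layer amounts to memoizing per-term statistics and dropping them when the reader changes (all names below are hypothetical):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical cache: term/class counts are fetched from the index once and
// reused for unchanging datasets; invalidate() when a new reader is opened.
public final class TermCountCache {
  private final Map<String, Long> counts = new ConcurrentHashMap<>();

  public long count(String termAndClass, ToLongFunction<String> indexLookup) {
    return counts.computeIfAbsent(termAndClass, k -> indexLookup.applyAsLong(k));
  }

  public void invalidate() { // call when the underlying index has changed
    counts.clear();
  }
}
{code}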
[jira] [Assigned] (LUCENE-5548) Improve flexibility and testability of the classification module
[ https://issues.apache.org/jira/browse/LUCENE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili reassigned LUCENE-5548: --- Assignee: Tommaso Teofili Improve flexibility and testability of the classification module Key: LUCENE-5548 URL: https://issues.apache.org/jira/browse/LUCENE-5548 Project: Lucene - Core Issue Type: Improvement Components: modules/classification Reporter: Tommaso Teofili Assignee: Tommaso Teofili Labels: gsoc2014, mentor The Lucene classification module's flexibility and capabilities may be improved with the following: - make it possible to use the classifiers online (or provide an online version of them) so that if the underlying index (reader) is updated, the classifier doesn't need to be trained again to take newly added docs into account - optionally pass a different Analyzer together with the text to be classified (or directly a TokenStream) to specify custom tokenization/filtering (see the sketch below) - normalize the score calculations of the existing classifiers - provide publicly available, dataset-based accuracy and speed tests - more Lucene-based classification algorithms Specific subtasks for each of the above topics should be created to discuss each of them in depth.
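For the custom-tokenization bullet, the change reduces to overloads like these (a sketch of the intent, not the committed signatures):

{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical interface fragment: callers can supply their own analyzer or a
// ready-made TokenStream instead of the one the classifier was built with.
interface FlexibleClassifier<T> {
  T assignClass(String text) throws IOException;               // current style
  T assignClass(String text, Analyzer analyzer) throws IOException;
  T assignClass(TokenStream tokenStream) throws IOException;   // full control
}
{code}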
[jira] [Resolved] (LUCENE-5699) Lucene classification score calculation normalize and return lists
[ https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved LUCENE-5699. - Resolution: Fixed Lucene classification score calculation normalize and return lists -- Key: LUCENE-5699 URL: https://issues.apache.org/jira/browse/LUCENE-5699 Project: Lucene - Core Issue Type: Sub-task Components: modules/classification Reporter: Gergő Törcsvári Assignee: Tommaso Teofili Labels: gsoc2014 Fix For: 5.0, Trunk Attachments: 06-06-5699.patch, 0730.patch, 0803-base.patch, 0810-base.patch Currently the classifiers can return only the best-matching class. If somebody wants to use them for more complex tasks, they need to modify these classes to get the second and third results too. If returning a list is possible and does not cost many resources, why don't we do that? (We iterate over a list anyway.) The Bayes classifier returned values that were too small, and there was a bug with zero floats; it was fixed by switching to logarithms. It would be nice to scale the sum of the class scores to one, so we could compare the returned score and relevance of two documents. (If we don't do this, the word count in the test documents affects the result score.) With bullet points: * In the Bayes classifier, normalize the score values and return result lists. * In the KNN classifier, add the possibility to return a result list. * Make ClassificationResult Comparable for list sorting.
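Scaling log-space class scores so they sum to one is the standard max-shifted exponentiation; a self-contained sketch of the calculation being described:

{code:java}
// Turn per-class log scores into probabilities summing to 1. Subtracting the
// max before exp() avoids the underflow-to-zero floats mentioned above, and
// the normalized scores are comparable across documents of different lengths.
public final class ScoreNormalizer {
  public static double[] normalize(double[] logScores) {
    double max = Double.NEGATIVE_INFINITY;
    for (double s : logScores) {
      max = Math.max(max, s);
    }
    double sum = 0;
    double[] out = new double[logScores.length];
    for (int i = 0; i < logScores.length; i++) {
      out[i] = Math.exp(logScores[i] - max);
      sum += out[i];
    }
    for (int i = 0; i < out.length; i++) {
      out[i] /= sum;
    }
    return out;
  }
}
{code}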
[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.8.0_40-ea-b09) - Build # 4307 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4307/ Java: 32bit/jdk1.8.0_40-ea-b09 -server -XX:+UseSerialGC (asserts: true) 4 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: Stack Trace: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([EBA28C3A09DF34FA:6A4402227E8054C6]:0) at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.allTests(CloudSolrServerTest.java:182) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.doTest(CloudSolrServerTest.java:124) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Updated] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6637: Attachment: SOLR-6637.patch Updated patch. Improved the test case and fixed the bugs that it uncovered. Still fighting one issue - I keep getting this error when the index directory tries to close all the resources. Trying to figure out what the underlying problem is. {noformat} MockDirectoryWrapper: cannot close: there are still open files: {_0.cfs=1, _1.cfs=1} {noformat} Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshot created by the replication handler's backup command. While working on this patch I realized that during backup we only back up the index. Should we back up the conf files also? Any thoughts? I could open a separate Jira for this.
[jira] [Commented] (LUCENE-6005) Explore alternative to Document/Field/FieldType API
[ https://issues.apache.org/jira/browse/LUCENE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194378#comment-14194378 ] ASF subversion and git services commented on LUCENE-6005: - Commit 1636293 from [~mikemccand] in branch 'dev/branches/lucene6005' [ https://svn.apache.org/r1636293 ] LUCENE-6005: add Date, InetAddress types; add min/maxTokenLength; add maxTokenCount; use ValueType.NONE not null; each FieldType now stores the Lucene version it was created by Explore alternative to Document/Field/FieldType API --- Key: LUCENE-6005 URL: https://issues.apache.org/jira/browse/LUCENE-6005 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: Trunk Auto-prefix terms (LUCENE-5879) is blocked because it's impossible in Lucene today to add a simple API to use it, and I don't think we should commit features that only super-experts can figure out how to use: that's evil. The only realistic workaround for such new features is to instead add them directly to the various servers on top of Lucene, since they all already have nice schema APIs. I opened LUCENE-5989 to try to do at least a baby step towards making it easier to use auto-prefix terms, so you can easily add singleton binary tokens, but even that has proven controversial. Net/net I think we have to solve the root cause of this by fixing the Document/Field/FieldType API so that new index-level features can have a usable API, properly defaulted for the right types of fields. Towards that, I'm exploring a replacement for Document/Field/FieldType. The idea is to expose simple methods on the document class (no more separate Field and FieldType classes): {noformat} doc.addLargeText("body", "some text"); doc.addShortText("title", "a title"); doc.addAtom("id", "29jafnn"); doc.addBinary("bytes", new byte[7]); doc.addNumber("number", 17); {noformat} And then expose a separate FieldTypes class, that you pass to the ctor of the new document class, which lets you set all the various per-field settings (stored, doc values, etc.). E.g.: {noformat} types.enableStored("id"); {noformat} FieldTypes is a write-once schema, and it throws exceptions if you try to make invalid changes once a given setting is already written (e.g. enabling norms after having disabled them). It will (I haven't implemented this yet) save its state into IndexWriter's commitData, so it's available when you open a new IndexWriter for append and when you open a reader. It has methods to set all the per-field settings (analyzer, stored, term vectors, norms, index options, doc values type), and chooses reasonable defaults based on the value's type when it suddenly sees a new field. For example, when you add a number, it's indexed for range querying and sorting (numeric doc values) by default. FieldTypes provides the analyzer and codec (a little messy) that you pass to IndexWriterConfig. Since it's effectively a persistent schema, it knows all about the available fields at search time, so we could use it to create queries (checking if they are valid given that field's type). Query parsers and highlighters could consult it. Default UIs (above Lucene) could use it, etc. This is all future work ... I think for this issue the goal should be to just provide a better index-time API but not yet make use of it at search time. So with this change, for auto-prefix terms, we could add an "enable range queries/filters" option, but then validate that the selected postings format supports such an option.
I know this exploration will be horribly controversial, but realistically I don't think Lucene can move on much further if we can't finally address this schema problem head on. This is long overdue.
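The write-once contract described above (first value wins, later conflicting changes throw) can be pictured as a small guard; this illustrates the described behavior and is not the actual FieldTypes implementation:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-once setting store: enabling norms after they were
// disabled (or vice versa) for a field fails loudly instead of silently
// rewriting what has already been committed.
final class WriteOnceNorms {
  private final Map<String, Boolean> normsByField = new HashMap<>();

  synchronized void setNorms(String field, boolean enabled) {
    Boolean prev = normsByField.putIfAbsent(field, enabled);
    if (prev != null && prev != enabled) {
      throw new IllegalStateException(
          "field \"" + field + "\": norms already " + (prev ? "enabled" : "disabled"));
    }
  }
}
{code}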
[jira] [Created] (LUCENE-6046) RegExp.toAutomaton high memory use
Lee Hinman created LUCENE-6046: -- Summary: RegExp.toAutomaton high memory use Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Priority: Minor When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory that it exceeds the maximum array size for Java. The following caused an OutOfMemoryError with a 32gb heap: {noformat} new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton(); {noformat} When increased to a 60gb heap, the following exception is thrown: {noformat} 1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623) 1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0) 1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168) 1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295) 1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639) 1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426) {noformat}
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194395#comment-14194395 ] Michael McCandless commented on LUCENE-6046: Hmm, two bugs here. First off, RegExp.toAutomaton is an inherently costly method: wasteful of RAM and CPU, doing minimize after each recursive operation, in order to build a DFA in the end. It's unfortunately quite easy to concoct regular expressions that make it consume ridiculous resources. I'll look at this example and see if we can improve it, but in the end it will always have its adversarial cases unless we give up on making the resulting automaton deterministic, which would be a very big change. Maybe we should add adversary defenses to it, e.g. you set a limit on the number of states it's allowed to create, and it throws a RegExpTooHardException if it would exceed that? Second off, ArrayUtil.oversize has the wrong (too large) value for MAX_ARRAY_LENGTH, which is a bug from LUCENE-5844. Which JVM did you run this test on?
[jira] [Assigned] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-6046: -- Assignee: Michael McCandless
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194397#comment-14194397 ] Dawid Weiss commented on LUCENE-6046: - Just a note -- Russ Cox wrote a series of excellent articles about different approaches to implementing regexp scanners. http://swtch.com/~rsc/regexp/regexp1.html (There is no clear winner -- both DFAs and NFAs have advantages.)
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194400#comment-14194400 ] Lee Hinman commented on LUCENE-6046: [~mikemccand] I ran it with the following JVM: {noformat} java version "1.8.0_20" Java(TM) SE Runtime Environment (build 1.8.0_20-b26) Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode) {noformat}
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194406#comment-14194406 ] Michael McCandless commented on LUCENE-6046: bq. Russ Cox wrote a series of excellent articles about different approaches to implementing regexp scanners. Thanks Dawid, these are great. Switching to NFA-based matching would be a very large change ... I don't think we should pursue it here. The Terms.intersect implementation for block tree is already very complex ... though I suppose if we could hide the on-the-fly subset construction (and convert the regexp to a Thompson NFA) under an API, then the Terms.intersect implementation wouldn't have to change much. Still, there will always be adversarial cases no matter which approach we choose. I think for this issue we should allow passing a "how much work are you willing to do" limit to RegExp.toAutomaton, and have it throw an exc when it would exceed that.
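The proposed limit could be as simple as a state budget consulted whenever determinization allocates a state; this is a sketch of the idea only, since no such Lucene API existed at the time (the class and exception are hypothetical):

{code:java}
// Hypothetical guard for determinize(): count states as they are created and
// bail out with an exception rather than growing toward a multi-gigabyte array.
final class DeterminizeBudget {
  private final int maxStates;
  private int created;

  DeterminizeBudget(int maxStates) {
    this.maxStates = maxStates;
  }

  int newState() {
    if (++created > maxStates) {
      throw new IllegalStateException(
          "determinization exceeded the " + maxStates + "-state budget");
    }
    return created - 1;
  }
}
{code}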
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194413#comment-14194413 ] Dawid Weiss commented on LUCENE-6046: - I didn't mean to imply we should change the regexp implementation! :) This was just a pointer in case somebody wished to understand why regexps can explode. I actually wish there was an NFA-based regexp implementation for Java (with a low memory footprint) because this would make concatenating thousands of regexps (e.g., for pattern detection) much easier.
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194412#comment-14194412 ] Michael McCandless commented on LUCENE-6046: bq. [~mikemccand] I ran it with the following JVM: Thanks [~dakrone]. I was wrong about the first bug: there is no bug in ArrayUtil.oversize. That exception just means RegExp is trying to create a too-big array ... so just the one bug :)
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194436#comment-14194436 ] Lee Hinman commented on LUCENE-6046: bq. I think for this issue we should allow passing a "how much work are you willing to do" limit to RegExp.toAutomaton, and have it throw an exc when it would exceed that. For what it's worth, I think this would be a good solution for us, much better than silently (from the user's perspective) freezing and then hitting an OOME.
[jira] [Updated] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6637: Attachment: SOLR-6637.patch bq. MockDirectoryWrapper: cannot close: there are still open files: {_0.cfs=1, _1.cfs=1} Patch which fixes this. Looks like we can't use a try-with-resources block to get indexDir from the directoryFactory, as we need to call release() on it instead of closing it.
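The constraint behind that fix: Directory instances handed out by Solr's DirectoryFactory are ref-counted and must be returned with release(), not close(), which is exactly what try-with-resources would call. A simplified sketch of the resulting shape:

{code:java}
import org.apache.lucene.store.Directory;
import org.apache.solr.core.DirectoryFactory;
import org.apache.solr.core.DirectoryFactory.DirContext;

final class RestoreSketch {
  // Acquire from the factory, always release back to it; dir.close() would
  // bypass the factory's reference counting.
  static void withIndexDir(DirectoryFactory factory, String path, String lockType)
      throws Exception {
    Directory dir = factory.get(path, DirContext.DEFAULT, lockType);
    try {
      // ... copy snapshot files into dir ...
    } finally {
      factory.release(dir); // not dir.close()
    }
  }
}
{code}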
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194476#comment-14194476 ] Noble Paul commented on SOLR-6517: -- Yes, there is a problem; it will not work. Ideally you should trigger the re-election process by invoking joinElection() with joinAtHead=true. That is what the OVERSEERROLE command does CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to "make it so", Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or async? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role.
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194477#comment-14194477 ] ASF subversion and git services commented on SOLR-6365: --- Commit 1636330 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1636330 ] SOLR-6365 specify appends, defaults, invariants outside of the component --- Key: SOLR-6365 URL: https://issues.apache.org/jira/browse/SOLR-6365 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, SOLR-6365.patch, SOLR-6365.patch The components are configured in solrconfig.xml mostly for specifying these extra parameters. If we separate these out, we can avoid specifying the components altogether and make solrconfig much simpler. Eventually we want users to see all functions as paths instead of components, and control these params from outside, through an API, persisted in ZK. Objectives: * define standard components implicitly and let users override some params only * reuse standard params across components * define multiple param sets and mix and match these params at request time example: {code:xml}
<!-- use json for all paths and _txt as the default search field -->
<initParams name="global" path="/**">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="df">_txt</str>
  </lst>
</initParams>
{code} other examples: {code:xml}
<initParams name="a" path="/dump3,/root/*,/root1/**">
  <lst name="defaults">
    <str name="a">A</str>
  </lst>
  <lst name="invariants">
    <str name="b">B</str>
  </lst>
  <lst name="appends">
    <str name="c">C</str>
  </lst>
</initParams>
<requestHandler name="/dump3" class="DumpRequestHandler"/>
<requestHandler name="/dump4" class="DumpRequestHandler"/>
<requestHandler name="/root/dump5" class="DumpRequestHandler"/>
<requestHandler name="/root1/anotherlevel/dump6" class="DumpRequestHandler"/>
<requestHandler name="/dump1" class="DumpRequestHandler" initParams="a"/>
<requestHandler name="/dump2" class="DumpRequestHandler" initParams="a">
  <lst name="defaults">
    <str name="a">A1</str>
  </lst>
  <lst name="invariants">
    <str name="b">B1</str>
  </lst>
  <lst name="appends">
    <str name="c">C1</str>
  </lst>
</requestHandler>
{code}
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194479#comment-14194479 ] ASF subversion and git services commented on SOLR-6365: --- Commit 1636331 from [~noble.paul] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636331 ] SOLR-6365 specify appends, defaults, invariants outside of the component
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194483#comment-14194483 ] Noble Paul commented on SOLR-6365: -- done. The new behavior is very simple. Whatever is put inside the {{requestHandler}} takes precedence over {{initParams}}.
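That precedence rule is essentially a map overlay: shared initParams first, the handler's own params on top. A minimal sketch of the described behavior, not Solr's actual merging code:

{code:java}
import java.util.HashMap;
import java.util.Map;

final class ParamPrecedence {
  // Start from the initParams set referenced by initParams="a", then overlay
  // the handler-local values so whatever is inside <requestHandler> wins.
  static Map<String, String> effectiveDefaults(
      Map<String, String> fromInitParams, Map<String, String> fromHandler) {
    Map<String, String> merged = new HashMap<>(fromInitParams);
    merged.putAll(fromHandler);
    return merged;
  }
}
{code}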
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194484#comment-14194484 ] Nik Everett commented on LUCENE-6046: - I'm working on a first cut of something that does that. A better regex implementation would be great, but the biggest thing to me is being able to limit the amount of work the determinize operation performs. It's such a costly operation that I don't think it should ever really be abstracted from the user. Something like having determinize throw a checked exception when it performs too much work would make you keenly aware whenever you might be straying into exponential territory.
[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194476#comment-14194476 ] Noble Paul edited comment on SOLR-6517 at 11/3/14 12:01 PM: Yes, there is a problem; it will not work. Ideally you should trigger the re-election process by invoking joinElection() with joinAtHead=true. That is what the ADDROLE and role=overseer command does was (Author: noble.paul): Yes, there is a problem; it will not work. Ideally you should trigger the re-election process by invoking joinElection() with joinAtHead=true. That is what the OVERSEERROLE command does
[jira] [Updated] (SOLR-6533) Support editing common solrconfig.xml values
[ https://issues.apache.org/jira/browse/SOLR-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6533: - Attachment: SOLR-6533.patch All tests pass; added a command-line option to disable config editing. Support editing common solrconfig.xml values Key: SOLR-6533 URL: https://issues.apache.org/jira/browse/SOLR-6533 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Attachments: SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch There are a bunch of properties in solrconfig.xml which users want to edit. We will attack those first. These properties will be persisted to a separate file called config.json (or whatever file). Instead of saving them in the same format, we will have well-known properties which users can directly edit: {code} updateHandler.autoCommit.maxDocs query.filterCache.initialSize {code} The API will be modeled on the bulk schema API: {code:javascript} curl http://localhost:8983/solr/collection1/config -H 'Content-type:application/json' -d '{ "set-property" : {"updateHandler.autoCommit.maxDocs":5}, "unset-property": "updateHandler.autoCommit.maxDocs" }' {code} {code:javascript} // or use this to set ${mypropname} values curl http://localhost:8983/solr/collection1/config -H 'Content-type:application/json' -d '{ "set-user-property" : {"mypropname":"my_prop_val"}, "unset-user-property": "mypropname" }' {code} The values stored in the config.json will always take precedence and will be applied after loading solrconfig.xml. An HTTP GET on the /config path will give the real config that is applied. An HTTP GET of /config/overlay gives the content of the configOverlay.json. /config/component-name gives only the child of the same name from /config.
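The stated precedence (overlay values applied after, and shadowing, solrconfig.xml) reduces to a two-level lookup; a sketch with hypothetical types:

{code:java}
import java.util.Map;

final class ConfigOverlayLookup {
  // A property edited via the config API lives in the persisted overlay and
  // always wins over the value parsed from solrconfig.xml.
  static Object resolve(String dottedPath, Map<String, Object> overlay,
                        Map<String, Object> solrconfig) {
    Object v = overlay.get(dottedPath); // e.g. "updateHandler.autoCommit.maxDocs"
    return v != null ? v : solrconfig.get(dottedPath);
  }
}
{code}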
[jira] [Commented] (SOLR-6528) hdfs cluster with replication min set to 2 / Solr does not honor dfs.replication in hdfs-site.xml
[ https://issues.apache.org/jira/browse/SOLR-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194486#comment-14194486 ] davidchiu commented on SOLR-6528: - Hi Michael, can you tell me what the plan is to fix this issue? hdfs cluster with replication min set to 2 / Solr does not honor dfs.replication in hdfs-site.xml -- Key: SOLR-6528 URL: https://issues.apache.org/jira/browse/SOLR-6528 Project: Solr Issue Type: Bug Affects Versions: 4.9 Environment: RedHat JDK 1.7 hadoop 2.4.1 Reporter: davidchiu Fix For: 4.10.3, Trunk org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /user/solr/test1/core_node1/data/tlog/tlog.000 on client 192.161.1.91.\nRequested replication 1 is less than the required minimum 2\n\t
CFP: FOSDEM 2015 - Open Source Search Dev Room
***Please forward this CFP to anyone who may be interested in participating.*** Hi, Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. Talks should be 30-60 minutes in length, including time for Q&A. You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)! Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org Cheers, LH on behalf of the Open Source Search Dev Room Program Committee* * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
Re: CFP: FOSDEM 2015 - Open Source Search Dev Room
May I suggest the next time they do it, they mention the event date and location :-) It's 31st of January/1st of February, Brussels, if I found the right web page. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
RE: CFP: FOSDEM 2015 - Open Source Search Dev Room
Hi, Sorry, my fault. I just copied the official CFP, which was sent to the FOSDEM list... Of course, those know the dates :-) And you are right, the conference takes place in Brussels on 31 January and 1 February 2015. I have no idea on which day the search devroom takes place. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Monday, November 03, 2014 1:36 PM To: dev@lucene.apache.org Subject: Re: CFP: FOSDEM 2015 - Open Source Search Dev Room [quoted message and CFP trimmed] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
lucene expressions - asm dependency
Hi all, I'm using lucene-expressions in a project, and it has a dependency on asm 4.1. I also use some components that depend on cglib 2.2.2, which depends on asm 3.1. Besides, asm 5.0.2 has been out since April 2014. Not an ideal situation. Would it be possible to shade / repackage asm for lucene-expressions? That way there would be no class conflicts when using Lucene in a project where another asm version is also used. -Rob
RE: FOSDEM 2015 - Open Source Search Dev Room
Hi, forgot to mention: FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See also: https://fosdem.org/2015/ I hope to see you there! Uwe -Original Message- From: Uwe Schindler [mailto:uschind...@apache.org] Sent: Monday, November 03, 2014 1:29 PM To: dev@lucene.apache.org; java-u...@lucene.apache.org; solr-u...@lucene.apache.org; gene...@lucene.apache.org Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room [quoted CFP trimmed] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: lucene expressions - asm dependency
Hi, Why not shade it yourself, depending on your project's needs? You can do this in your own project easily, as a separate build step (e.g. in Ant, or in Maven using a separate sub-project which your main project depends on). The ASM issue is well known; the forbidden-apis checker shades asm 5.0.2 itself. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] Sent: Monday, November 03, 2014 10:11 AM To: dev@lucene.apache.org Subject: lucene expressions - asm dependency [quoted message trimmed]
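For readers who want to follow Uwe's suggestion in Maven, the shade plugin's relocation feature does exactly this kind of repackaging. A minimal sketch, assuming a Maven build; the shaded package name my.project.shaded.asm is a placeholder, not anything from this thread:

{code:xml}
<!-- Relocate the bundled asm classes into a private package so they cannot
     clash with the asm version pulled in by cglib or other dependencies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.objectweb.asm</pattern>
            <shadedPattern>my.project.shaded.asm</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

The relocated classes are rewritten at package time, so the shaded artifact no longer references org.objectweb.asm at all and can coexist with any other asm version on the classpath.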
[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194516#comment-14194516 ] David Smiley commented on SOLR-6637: FYI I already created an issue in JIRA for this: https://issues.apache.org/jira/browse/SOLR-4545 Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshots created by the replication handlers backup command. While working on this patch right now I realized that during backup we only backup the index. Should we backup the conf files also? Any thoughts? I could separate Jira for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194522#comment-14194522 ] Varun Thacker commented on SOLR-6637: - [~dsmiley] I had not seen that issue previously. Should we move the work there ? bq. A proposed restore command to the replication handler should allow specifying a directory, or an as-of date; otherwise you'd get the most recent snapshot. My approach here has been to allow restoring named snapshots ( SOLR-5340 ) only. We can add functionality that says that if the name is not provided then we restore the most recent snapshot. Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshots created by the replication handlers backup command. While working on this patch right now I realized that during backup we only backup the index. Should we backup the conf files also? Any thoughts? I could separate Jira for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
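For context, named snapshots are created through the replication handler's backup command, and the restore command under discussion would be its mirror image. A hedged illustration (localhost, core1, and mysnapshot are placeholders; the backup form exists today via SOLR-5340, while the restore form only reflects this patch's proposal):

{noformat}
http://localhost:8983/solr/core1/replication?command=backup&name=mysnapshot
http://localhost:8983/solr/core1/replication?command=restore&name=mysnapshot
{noformat}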
[jira] [Commented] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4
[ https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194553#comment-14194553 ] Robert Muir commented on LUCENE-6044: - +1 Add backcompat for TokenFilters with posInc=false before 4.4 Key: LUCENE-6044 URL: https://issues.apache.org/jira/browse/LUCENE-6044 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst Attachments: LUCENE-6044.patch In Lucene 4.4, a number of token filters supporting the enablePositionIncrements=false setting were changed to default to true. However, with Lucene 5.0, the setting was removed altogether. We should have backcompat for this setting, as well as work when used with a TokenFilterFactory and match version 4.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
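For reference, the removed setting was typically driven from analyzer configuration. A hypothetical schema.xml fragment showing what the backcompat path would have to keep accepting, assuming the per-factory luceneMatchVersion mechanism the description alludes to:

{code:xml}
<!-- enablePositionIncrements=false was only ever valid up to match version
     4.3; from 4.4 it defaulted to true, and 5.0 removed the option. -->
<filter class="solr.StopFilterFactory" words="stopwords.txt"
        enablePositionIncrements="false" luceneMatchVersion="4.3"/>
{code}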
[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194555#comment-14194555 ] Noble Paul commented on SOLR-6637: -- [~varunthacker] Can I not restore the data to another core? bq.If the location is not provided in the query string then we default it to core.getDataDir() What is the use case for restoring from the dataDir itself? bq.Remove any files in the current directory which does not belong to the segment . DON'T DO THIS. There is a mechanism for loading the index from another directory in the same core. Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshots created by the replication handlers backup command. While working on this patch right now I realized that during backup we only backup the index. Should we backup the conf files also? Any thoughts? I could separate Jira for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194563#comment-14194563 ] Varun Thacker commented on SOLR-6637: - bq. what is the usecase for restoring from the dataDir itself? What I meant here was - if location param is not provided it would see if the backup index is present under dataDir/backupName . bq. Remove any files in the current directory which does not belong to the segment . This is what I do here - Once all the files from the backup location have been successfully copied over the current index, there might be extra segment files from the current index lying around. It gets cleaned up in cleanupOldIndexFiles() where we take the name of the segment file from the backup index and see which files are extra. We then remove these extra segment files. Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshots created by the replication handlers backup command. While working on this patch right now I realized that during backup we only backup the index. Should we backup the conf files also? Any thoughts? I could separate Jira for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
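A minimal sketch of the cleanup idea described above, simplified to a plain file-set comparison (the actual patch works from the segments file name taken from the backup; the class and method names here are illustrative, not from the patch):

{code:java}
import java.io.File;

public class RestoreCleanupSketch {
  /**
   * After the snapshot files have been copied over the live index,
   * any file the snapshot does not contain must be a leftover segment
   * file from the replaced index, so it can be deleted.
   */
  static void cleanupOldIndexFiles(File indexDir, File backupDir) {
    for (File f : indexDir.listFiles()) {
      if (!new File(backupDir, f.getName()).exists()) {
        f.delete();
      }
    }
  }
}
{code}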
[jira] [Commented] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194578#comment-14194578 ] ASF subversion and git services commented on SOLR-6670: --- Commit 1636363 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1636363 ] SOLR-6670: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE. corrected typo change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE --- Key: SOLR-6670 URL: https://issues.apache.org/jira/browse/SOLR-6670 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, Trunk Attachments: SOLR-6670.patch JIRA for Jan's comments on SOLR-6513: I thought we agreed to prefer the term shard over slice, so I think we should do this for this API as well. The only place in our refguide we use the word slice is in How SolrCloud Works [1] and that description is disputed. The refguide explanation of what a shard is can be found in Shards and Indexing Data in SolrCloud [2], quoting: When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index. So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of [1]. [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works [2] https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Note Mark's comment on that JIRA, but I think it would be best to continue to talk about shards with user-facing operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194581#comment-14194581 ] ASF subversion and git services commented on SOLR-6670: --- Commit 1636364 from [~erickoerickson] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636364 ] SOLR-6670: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE. corrected typo change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE --- Key: SOLR-6670 URL: https://issues.apache.org/jira/browse/SOLR-6670 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, Trunk Attachments: SOLR-6670.patch JIRA for Jan's comments on SOLR-6513: I thought we agreed to prefer the term shard over slice, so I think we should do this for this API as well. The only place in our refguide we use the word slice is in How SolrCloud Works [1] and that description is disputed. The refguide explanation of what a shard is can be found in Shards and Indexing Data in SolrCloud [2], quoting: When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index. So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of [1]. [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works [2] https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Note Mark's comment on that JIRA, but I think it would be best to continue to talk about shards with user-facing operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194582#comment-14194582 ] Erick Erickson commented on SOLR-6365: -- Great! specify appends, defaults, invariants outside of the component --- Key: SOLR-6365 URL: https://issues.apache.org/jira/browse/SOLR-6365 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, SOLR-6365.patch, SOLR-6365.patch The components are configured in solrconfig.xml mostly for specifying these extra parameters. If we separate these out, we can avoid specifying the components altogether and make solrconfig much simpler. Eventually we want users to see all functions as paths instead of components and control these params from outside , through an API and persisted in ZK objectives : * define standard components implicitly and let users override some params only * reuse standard params across components * define multiple param sets and mix and match these params at request time example {code:xml} !-- use json for all paths and _txt as the default search field-- initParams name=global path=/** lst name=defaults str name=wtjson/str str name=df_txt/str /lst /initParams {code} other examples {code:xml} initParams name=a path=/dump3,/root/*,/root1/** lst name=defaults str name=aA/str /lst lst name=invariants str name=bB/str /lst lst name=appends str name=cC/str /lst /initParams requestHandler name=/dump3 class=DumpRequestHandler/ requestHandler name=/dump4 class=DumpRequestHandler/ requestHandler name=/root/dump5 class=DumpRequestHandler/ requestHandler name=/root1/anotherlevel/dump6 class=DumpRequestHandler/ requestHandler name=/dump1 class=DumpRequestHandler initParams=a/ requestHandler name=/dump2 class=DumpRequestHandler initParams=a lst name=defaults str name=aA1/str /lst lst name=invariants str name=bB1/str /lst lst name=appends str name=cC1/str /lst /requestHandler {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194585#comment-14194585 ] Michael McCandless commented on LUCENE-6046: OK I boiled down the adversarial regexp to this simpler still-adversarial version: \[ac]*a\[ac]\{50,200} I suspect this is a legitimate adversary and not a bug in our RegExp/automaton impl, i.e. the number of states in the DFA for this is exponential as a function of the 50/200. RegExp.toAutomaton high memory use -- Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Assignee: Michael McCandless Priority: Minor When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java. The following caused an OutOfMemoryError with a 32gb heap: {noformat} new RegExp(\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}).toAutomaton(); {noformat} When increased to a 60gb heap, the following exception is thrown: {noformat} 1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623) 1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0) 1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168) 1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295) 1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639) 1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
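The exponential growth has a classical explanation: the fixed-width variant [ac]*a[ac]{n,n} matches exactly the strings whose (n+1)-th symbol from the end is an 'a', and a DFA for that language has to remember the last n+1 symbols it has read, which takes on the order of 2^n states. A small demo of the growth, assuming the org.apache.lucene.util.automaton API shown in the stack trace (keep n small; the state count roughly doubles with each increment):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;

public class AdversaryDemo {
  public static void main(String[] args) {
    // toAutomaton() determinizes and minimizes, so the printed counts are
    // minimal DFA sizes and grow exponentially in n.
    for (int n = 2; n <= 12; n += 2) {
      Automaton a = new RegExp("[ac]*a[ac]{" + n + "," + n + "}").toAutomaton();
      System.out.println("n=" + n + " -> " + a.getNumStates() + " states");
    }
  }
}
{code}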
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194592#comment-14194592 ] Nik Everett commented on LUCENE-6046: - Oh yeah, it's totally running into 2^n territory legitimately here. This is totally something that'd be rejected by a framework to prevent explosive growth during determinization. RegExp.toAutomaton high memory use -- Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Assignee: Michael McCandless Priority: Minor When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java. The following caused an OutOfMemoryError with a 32gb heap: {noformat} new RegExp(\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}).toAutomaton(); {noformat} When increased to a 60gb heap, the following exception is thrown: {noformat} 1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623) 1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0) 1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168) 1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295) 1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639) 1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4
[ https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194597#comment-14194597 ] ASF subversion and git services commented on LUCENE-6044: - Commit 1636368 from [~rjernst] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636368 ] LUCENE-6044: Fixed backcompat support for token filters with enablePositionIncrements=false Add backcompat for TokenFilters with posInc=false before 4.4 Key: LUCENE-6044 URL: https://issues.apache.org/jira/browse/LUCENE-6044 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst Attachments: LUCENE-6044.patch In Lucene 4.4, a number of token filters supporting the enablePositionIncrements=false setting were changed to default to true. However, with Lucene 5.0, the setting was removed altogether. We should have backcompat for this setting, as well as work when used with a TokenFilterFactory and match version 4.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4
[ https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst resolved LUCENE-6044. Resolution: Fixed Add backcompat for TokenFilters with posInc=false before 4.4 Key: LUCENE-6044 URL: https://issues.apache.org/jira/browse/LUCENE-6044 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst Attachments: LUCENE-6044.patch In Lucene 4.4, a number of token filters supporting the enablePositionIncrements=false setting were changed to default to true. However, with Lucene 5.0, the setting was removed altogether. We should have backcompat for this setting, as well as work when used with a TokenFilterFactory and match version 4.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4
[ https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated LUCENE-6044: --- Fix Version/s: 5.0 Assignee: Ryan Ernst Add backcompat for TokenFilters with posInc=false before 4.4 Key: LUCENE-6044 URL: https://issues.apache.org/jira/browse/LUCENE-6044 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst Assignee: Ryan Ernst Fix For: 5.0 Attachments: LUCENE-6044.patch In Lucene 4.4, a number of token filters supporting the enablePositionIncrements=false setting were changed to default to true. However, with Lucene 5.0, the setting was removed altogether. We should have backcompat for this setting, as well as work when used with a TokenFilterFactory and match version 4.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core
[ https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194608#comment-14194608 ] David Smiley commented on SOLR-6637: bq. David Smiley I had not seen that issue previously. Should we move the work there ? No, it's too late now. Next time please search for an existing issue. SOLR-4545 can be closed as a duplicate so long as you can restore a snapshot without being required to specify its name. A timestamp would be nice. Solr should have a way to restore a core Key: SOLR-6637 URL: https://issues.apache.org/jira/browse/SOLR-6637 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch We have a core backup command which backs up the index. We should have a restore command too. This would restore any named snapshots created by the replication handlers backup command. While working on this patch right now I realized that during backup we only backup the index. Should we backup the conf files also? Any thoughts? I could separate Jira for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194635#comment-14194635 ] Erick Erickson commented on SOLR-6517: -- Well, it's worked in every test, both manual and automated that I've run so far. Do you have a failure that demonstrates this? Maybe a mismatch in expectations? REBALANCELEADERS does _not_, and was not designed to, force the rebalancing immediately for nodes that do not have the preferredLeader property already set. It simply makes leaders out of those nodes that _already_ have the preferredLeader property set and are not currently the leader. So to rebalance the leaders across the cluster, you first need to BALANCESHARDUNIQUE with the preferredLeader property and _then_ issue the REBALANCELEADERS command. That way it's not required that the entire cluster be balanced, you can selectively assign _some_ preferredLeaders if you want. Or am I missing the boat completely? How do you see it not working? CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command make it so Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asnych? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node I'm thinking an optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
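Spelled out as Collections API calls, the two-step flow Erick describes looks roughly like this (host and collection name are placeholders):

{noformat}
http://localhost:8983/solr/admin/collections?action=BALANCESHARDUNIQUE&collection=collection1&property=preferredLeader
http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=collection1
{noformat}

The first call distributes the preferredLeader property evenly across replicas; only then does the second call have anything to act on.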
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194644#comment-14194644 ] Noble Paul commented on SOLR-6517: -- From the code , what I see is , a message is sent to overseer to change the leader. But there is not action performed to change the actual election queue. The role in the clusterstate is just a reflection of what should be there in the election queue and not the other way around. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command make it so Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asnych? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node I'm thinking an optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194644#comment-14194644 ] Noble Paul edited comment on SOLR-6517 at 11/3/14 3:50 PM: --- From the code , what I see is , a message is sent to overseer to change the leader. But there is no action performed to change the actual election queue. The role in the clusterstate is just a reflection of what should be there in the election queue and not the other way around. was (Author: noble.paul): From the code , what I see is , a message is sent to overseer to change the leader. But there is not action performed to change the actual election queue. The role in the clusterstate is just a reflection of what should be there in the election queue and not the other way around. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command make it so Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asnych? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node I'm thinking an optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners
[ https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687 ] Fabio Piro commented on SOLR-6645: -- _Friendly Reminder #2_ Hello, has anyone by any chance taken a look at the patch, between some trick or treats? Refactored DocumentObjectBinder and added AnnotationListeners - Key: SOLR-6645 URL: https://issues.apache.org/jira/browse/SOLR-6645 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 4.10.2 Reporter: Fabio Piro Labels: annotations, binder, listener, solrj Fix For: 5.0, Trunk Attachments: SOLR-6645.patch Hello good people. It is understandable that the priority of SolrJ is to provide a stable API for Java and not a rich-feature client, I'm well aware of that. On the other hand, more features nowadays usually means Spring Data Solr. Although I appreciate the enrichment work of that lib, sometimes depending on its monolithic dependencies and magic is not a valid option. So, I was thinking that the official DocumentObjectBinder could benefit from some love, and I have implemented a listener pattern for the annotations. *Note: No new logic or new annotations were introduced; the patch is only a refactor to make the current DocumentObjectBinder and @Field DocField more extensible (for the user).* You can register your annotations and their related listeners in the binder, and it will invoke the corresponding method in the listener on getBean and on toSolrInputDocument, giving you the chance to do something during the ongoing process. Changes are: * [MOD] */beans/DocumentObjectBinder*: The new logic and a new constructor for registering the annotations * [ADD] */impl/AccessorAnnotationListener*: Abstract utility class with the former get(), set(), isArray, isList, isContainedInMap etc... * [ADD] */impl/FieldAnnotationListener*: all the rest of DocField for dealing with @Field * [ADD] */AnnotationListener*: the base listener class * [MOD] */SolrServer*: added setBinder (this is the only tricky change, I hope it's not a problem). It's all well documented and the code is very easy to read. Tests are all green, it should be 100% backward compatible, and the performance impact is nil (the logic flow is exactly the same as now; I only changed the bare essentials and nothing more, anyway). 
Some Examples (they are not part of the pull-request): The long awaited @FieldObject in 4 lines of code: https://issues.apache.org/jira/browse/SOLR-1945 {code:java} public class FieldObjectAnnotationListener extends AccessorAnnotationListenerFieldObject { public FieldObjectAnnotationListener(AnnotatedElement element, FieldObject annotation) { super(element, annotation); } @Override public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) { Object nested = binder.getBean(target.clazz, doc); setTo(obj, nested); } @Override public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) { SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj)); for (Map.EntryString, SolrInputField entry : nested.entrySet()) { doc.addField(entry.getKey(), entry.getValue()); } } } {code} Or something entirely new like an annotation for ChildDocuments: {code:java} public class ChildDocumentsAnnotationListener extends AccessorAnnotationListenerChildDocuments { public ChildDocumentsAnnotationListener(AnnotatedElement element, ChildDocuments annotation) { super(element, annotation); if (!target.isInList || target.clazz.isPrimitive()) { throw new BindingException(@NestedDocuments is applicable only on ListObject.); } } @Override public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) { ListObject nested = new ArrayList(); for (SolrDocument child : doc.getChildDocuments()) { nested.add(binder.getBean(target.clazz, child));// this should be recursive, but it's only an example } setTo(obj, nested); } @Override public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) { SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj)); doc.addChildDocuments(nested.getChildDocuments()); } } {code} In addition, all the logic is encapsulated in the listener, so you can make a custom FieldAnnotationListener too, and override the default one {code:java} public class CustomFieldAnnotationListener extends
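To round out the examples, a hypothetical registration sketch; the constructor shape and the setBinder() call are my reading of the patch description ("a new constructor for registering the annotations", "*/SolrServer*: added setBinder"), not verified signatures from the attached patch:

{code:java}
// Assumed wiring: map the custom annotation to its listener class, hand the
// registrations to the binder, then install the binder on the server so that
// getBean()/toSolrInputDocument() invoke the listeners.
DocumentObjectBinder binder = new DocumentObjectBinder(
    java.util.Collections.singletonMap(FieldObject.class, FieldObjectAnnotationListener.class));
solrServer.setBinder(binder);
{code}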
[jira] [Comment Edited] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners
[ https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687 ] Fabio Piro edited comment on SOLR-6645 at 11/3/14 4:24 PM: --- __Friendly Reminder #2__ Hello, has anyone by any chance taken a look at the patch, between some trick or treats? was (Author: dewos): _Friendly Reminder #2_ Hello, has anyone by any chance taken a look at the patch, between some trick or treats? [quoted issue description trimmed; see the original comment above]
[jira] [Comment Edited] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners
[ https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687 ] Fabio Piro edited comment on SOLR-6645 at 11/3/14 4:24 PM: --- *Friendly Reminder #2* Hello, has anyone by any chance taken a look at the patch, between some trick or treats? was (Author: dewos): __Friendly Reminder #2__ Hello, has anyone by any chance taken a look at the patch, between some trick or treats? [quoted issue description trimmed; see the original comment above]
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194701#comment-14194701 ] Noble Paul commented on SOLR-6517: -- Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put extra care while making changes to cloud . I've spent weeks debugging three overseer roles feature because it only failed in our 120 node cluster (never in the junit tests) CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command make it so Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asnych? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node I'm thinking an optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194701#comment-14194701 ] Noble Paul edited comment on SOLR-6517 at 11/3/14 4:35 PM: --- Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put extra care while making changes to cloud . I've spent weeks debugging the overseer roles feature because it only failed in our 120 node cluster (never in the junit tests) was (Author: noble.paul): Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put extra care while making changes to cloud . I've spent weeks debugging three overseer roles feature because it only failed in our 120 node cluster (never in the junit tests) CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command make it so Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asnych? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node I'm thinking an optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-6046: Attachment: LUCENE-6046.patch First cut at a patch. Adds maxDeterminizedStates to Operations.determinize and pipes it through to tons of places. I think it's important never to hide when determinize is called because of how potentially heavy it is. Forcing callers of MinimizationOperations.minimize, Operations.reverse, Operations.minus etc. to specify maxDeterminizedStates makes it pretty clear that the automaton might be determinized during those processes. I added an unchecked exception for when the Automaton can't be determinized within the specified number of states, but I'm really tempted to change it to a checked exception to make it super duper obvious when determinization might occur. RegExp.toAutomaton high memory use -- Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-6046.patch When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java. The following caused an OutOfMemoryError with a 32gb heap: {noformat} new RegExp(\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}).toAutomaton(); {noformat} When increased to a 60gb heap, the following exception is thrown: {noformat} 1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623) 1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0) 1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168) 1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295) 1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639) 1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
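A sketch of what calling code would look like under this patch; the int overload and the unchecked exception are taken from the description above and are not a released API at this point:

{code:java}
// Cap determinization instead of letting an adversarial pattern exhaust the
// heap; callers can turn the exception into a query rejection.
try {
  new RegExp("[ac]*a[ac]{50,200}").toAutomaton(10000); // assumed overload piping maxDeterminizedStates through
} catch (RuntimeException tooManyStates) {
  // reject the regexp as too complex to determinize
}
{code}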
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194716#comment-14194716 ] Nik Everett commented on LUCENE-6046: - Oh - I'm still running the solr tests against this. I imagine they'll pass as they've been running fine for 30 minutes now but I should throw that out there in case someone gets them to fail with this before I do. RegExp.toAutomaton high memory use -- Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-6046.patch When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java. The following caused an OutOfMemoryError with a 32gb heap: {noformat} new RegExp(\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}).toAutomaton(); {noformat} When increased to a 60gb heap, the following exception is thrown: {noformat} 1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623) 1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0) 1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168) 1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295) 1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639) 1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62) 1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477) 1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
[ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6591: Attachment: SOLR-6591-ignore-no-collection-path.patch {quote} A rapid create+delete loop for collections with state format 1 causes the above exception to happen. This is because the updateZkState method assumes that the collection exists and it tries to write to /collections/collection_name/state.json directly without verifying whether the /collections/collection_name zk node exists {quote} This patch ignores state messages which are trying to create new collections when the parent zk path doesn't exist. I've added the following comment in the code to explain the situation: {quote} // if the /collections/collection_name path doesn't exist then it means that // 1) the user invoked a DELETE collection API and the OverseerCollectionProcessor has deleted // this zk path. // 2) these are most likely old state messages which are only being processed now because // if they were new state messages then in legacy mode, a new collection would have been // created with stateFormat = 1 (which is the default state format) // 3) these can't be new state messages created for a new collection because // otherwise the OverseerCollectionProcessor would have already created this path // as part of the create collection API call -- which is the only way in which a collection // with stateFormat 1 can possibly be created {quote} Cluster state updates can be lost on exception in main queue loop - Key: SOLR-6591 URL: https://issues.apache.org/jira/browse/SOLR-6591 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: Trunk Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: Trunk Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch I found this bug while going through the failure on jenkins: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/ {code} 2 tests failed. 
REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
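To make the guard described above concrete, here is a rough sketch of the existence check; the class and method names are hypothetical stand-ins, not the patch's actual code, though SolrZkClient.exists and ZkStateReader.COLLECTIONS_ZKNODE are real Solr APIs:
{code:java}
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkStateReader;

class CollectionPathGuard {
  private final SolrZkClient zkClient;

  CollectionPathGuard(SolrZkClient zkClient) {
    this.zkClient = zkClient;
  }

  /**
   * A state update may be applied only while /collections/<name> still exists.
   * If that znode is gone, the collection was deleted (or the message predates
   * its creation), so the Overseer should drop the message instead of writing
   * /collections/<name>/state.json for a dead collection.
   */
  boolean shouldApply(String collection) throws Exception {
    return zkClient.exists(ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection, true);
  }
}
{code}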
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 1877 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/1877/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC (asserts: true) 1 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at __randomizedtesting.SeedInfo.seed([BB86E03433744719:3A606E2C442B2725]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:569) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
[ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6591: Attachment: (was: SOLR-6591-ignore-no-collection-path.patch) Cluster state updates can be lost on exception in main queue loop - Key: SOLR-6591 URL: https://issues.apache.org/jira/browse/SOLR-6591 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: Trunk Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: Trunk Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch I found this bug while going through the failure on jenkins: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/ {code} 2 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
[ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6591: Attachment: SOLR-6591-ignore-no-collection-path.patch With the right patch (SOLR-6591-ignore-no-collection-path.patch) this time. Cluster state updates can be lost on exception in main queue loop - Key: SOLR-6591 URL: https://issues.apache.org/jira/browse/SOLR-6591 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: Trunk Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: Trunk Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch I found this bug while going through the failure on jenkins: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/ {code} 2 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6690) Highlight expanded results
Simon Endele created SOLR-6690: -- Summary: Highlight expanded results Key: SOLR-6690 URL: https://issues.apache.org/jira/browse/SOLR-6690 Project: Solr Issue Type: Wish Reporter: Simon Endele Priority: Minor Is it possible to apply the highlighting to documents in the expand section in the Solr response? I'm aware that https://cwiki.apache.org/confluence/x/jiBqAg states: All downstream components (faceting, highlighting, etc...) will work with the collapsed result set. So I tried to put the highlight component after the expand component like this: {code:xml}<arr name="components"> <str>query</str> <str>facet</str> <str>stats</str> <str>debug</str> <str>expand</str> <str>highlight</str> </arr>{code} But it had no effect. Is there another switch that needs to be flipped or could this be implemented easily? IMHO this is quite a common use case... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
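For reference, the request being described looks roughly like this in SolrJ; the core URL, collapse field, and highlighted field are invented for illustration:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExpandHighlightSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("text:solr");
    q.setFilterQueries("{!collapse field=group_s}"); // one representative doc per group
    q.set("expand", true);                           // also return the expanded section
    q.setHighlight(true);
    q.set("hl.fl", "text");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getHighlighting());     // snippets for the collapsed hits only
    System.out.println(rsp.getExpandedResults());  // plain documents, no snippets
  }
}
{code}
This matches the wiki's statement that downstream components see only the collapsed result set.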
[jira] [Commented] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
[ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194756#comment-14194756 ] ASF subversion and git services commented on SOLR-6591: --- Commit 1636400 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1636400 ] SOLR-6591: Ignore overseer operations for collections with stateFormat 1 if the parent ZK path doesn't exist Cluster state updates can be lost on exception in main queue loop - Key: SOLR-6591 URL: https://issues.apache.org/jira/browse/SOLR-6591 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: Trunk Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: Trunk Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch I found this bug while going through the failure on jenkins: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/ {code} 2 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.
Erick Erickson created SOLR-6691: Summary: REBALANCELEADERS needs to change the leader election queue. Key: SOLR-6691 URL: https://issues.apache.org/jira/browse/SOLR-6691 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson The original code (SOLR-6517) assumed that changes in the clusterstate after issuing a command to the overseer to change the leader indicated that the leader was successfully changed. Fortunately, Noble clued me in that this isn't the case and that the potential leader needs to insert itself in the leader election queue before triggering the change leader command. Inserting themselves in the front of the queue should probably happen in BALANCESHARDUNIQUE when the preferredLeader property is assigned as well. [~noble.paul] Do evil things happen if a node joins at the head but it's _already_ in the queue? These ephemeral nodes in the queue are watching each other. So if node1 is the leader you have node1 <- node2 <- node3 <- node4 where <- means watches. Now, if node3 puts itself at the head of the list, you have node1 <- node2 <- node3 <- node4 I _think_ when I was looking at this it all just worked. 1) node 1 goes down. Nodes 2 and 3 duke it out but there's code to ensure that node3 becomes the leader and node2 inserts itself at the end so it's watching node 4. 2) node 2 goes down, nobody gets notified and it doesn't matter. 3) node 3 goes down, node 4 gets notified and starts watching node 2 by inserting itself at the end of the list. 4) node 4 goes down, nobody gets notified and it doesn't matter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
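Solr's real implementation lives in LeaderElector/ElectionContext; the following generic ZooKeeper sketch (paths and names are made up) is only meant to make the watch chain in the description concrete: each candidate creates an ephemeral sequential znode and watches the single znode just ahead of it.
{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class ElectionQueueSketch {
  private final ZooKeeper zk;
  private final String electionPath; // e.g. .../leader_elect/shard1/election

  ElectionQueueSketch(ZooKeeper zk, String electionPath) {
    this.zk = zk;
    this.electionPath = electionPath;
  }

  /** Join the queue; whoever sorts first leads, everyone else watches its predecessor. */
  void join(String myId) throws Exception {
    String created = zk.create(electionPath + "/" + myId + "-n_", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String me = created.substring(created.lastIndexOf('/') + 1);
    List<String> queue = zk.getChildren(electionPath, false);
    Collections.sort(queue); // the sequence suffix defines queue order
    int pos = queue.indexOf(me);
    if (pos == 0) {
      // head of the queue: this node is (or becomes) the leader
    } else {
      // watch only the znode directly ahead; when it dies, re-run the join logic
      zk.exists(electionPath + "/" + queue.get(pos - 1), new Watcher() {
        @Override public void process(WatchedEvent event) { /* re-check the queue */ }
      });
    }
  }
}
{code}
Note that if join were called a second time without the first znode being deleted, the same node would appear twice in the sorted children, which is exactly the double-entry hazard the question above is probing.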
[jira] [Commented] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
[ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194760#comment-14194760 ] ASF subversion and git services commented on SOLR-6591: --- Commit 1636401 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636401 ] SOLR-6591: Ignore overseer operations for collections with stateFormat 1 if the parent ZK path doesn't exist Cluster state updates can be lost on exception in main queue loop - Key: SOLR-6591 URL: https://issues.apache.org/jira/browse/SOLR-6591 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: Trunk Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: Trunk Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch I found this bug while going through the failure on jenkins: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/ {code} 2 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194761#comment-14194761 ] Erick Erickson commented on SOLR-6517: -- Gah. OK, the fact that the state information isn't reflective of the actual state is what was throwing me. Let's move any further discussion over to SOLR-6691 (which I just created). I've tried to synopsize the discussion in that JIRA. Thanks for your patience in explaining; mucking around in the cloud state kinda scares me... apparently for good reason. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to make it so, Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or async? Should it make the best attempt but not worry about perfection? Should it??? a collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard in the case where you wanted to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
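Assuming no typed SolrJ request exists yet for this command, it can be exercised with a generic request against the Collections API endpoint; the action and parameter names below follow the description above and should be treated as tentative:
{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class RebalanceLeadersSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "REBALANCELEADERS");
    params.set("collection", "collection1"); // required, per the discussion
    // params.set("shard", "shard1");        // the optional narrowing mooted above
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");       // Collections API endpoint
    System.out.println(server.request(req));
  }
}
{code}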
Re: An experience and some thoughts about solr/example - solr/server
On 11/2/2014 5:57 PM, Erick Erickson wrote: I'm a little discomfited by having to learn new stuff, but that's a personal problem ;). I do think we have to be mindful of people who want something like what Shawn was doing, I do this all the time as well. And of new people who haven't a clue. Hmmm, actually new folks might have an easier time of it since they don't have any expectations ;). bq: ...'run example' target that could also fire off a create for collection1. Exactly, with a note (perhaps in the help for this command) about where the config files are located that are used. Perhaps with a 'clean' option that blows away the current data directory and (if Zookeeper becomes the one source of truth) does an upconfig first. Thanks for all the input on this thread, and for the hard work trying to make everything easier for a beginner. I actually do really like the fact that we now start with no cores, it was just a bit of a shock. It sounds like it's a relatively straightforward thing to fire off a CoreAdmin 'curl' command after startup that will populate an example core, and the conf directory is probably easy to locate in the download too. I just ask that this information be added to the immediately available docs (README.txt and similar). I did not check the tutorial ... if it's not already there, it probably should be. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
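A SolrJ equivalent of the post-startup CoreAdmin 'curl' create being discussed might look like the following; the core name and instanceDir are assumptions about a stock download layout, not documented values:
{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateExampleCore {
  public static void main(String[] args) throws Exception {
    // CoreAdmin requests go to the server root, not to a core URL.
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // The instanceDir must already contain a conf/ directory with schema and solrconfig.
    CoreAdminRequest.createCore("collection1", "collection1", server);
  }
}
{code}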
[JENKINS] Lucene-Solr-4.10-Linux (64bit/jdk1.7.0_67) - Build # 47 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.10-Linux/47/ Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC (asserts: false) 1 tests failed. REGRESSION: org.apache.solr.schema.TestCloudSchemaless.testDistribSearch Error Message: Timeout occured while waiting response from server at: https://127.0.0.1:53035/cyt/ab/collection1 Stack Trace: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: https://127.0.0.1:53035/cyt/ab/collection1 at __randomizedtesting.SeedInfo.seed([C3126427574CD7E2:42F4EA3F2013B7DE]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:562) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) at org.apache.solr.schema.TestCloudSchemaless.doTest(TestCloudSchemaless.java:140) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:871) at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-6680) DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
[ https://issues.apache.org/jira/browse/SOLR-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194858#comment-14194858 ] David Smiley commented on SOLR-6680: I should point out that the benefit of LUCENE-6033 won't be realized for a multi-valued field because of the way the offset adjusting works (TermOffsetsTokenStream). I'm not concerned with optimizing for this case but should someone else want to take this further then consider this approach: Don't wrap the TokenStream from the TermVectors. Instead, grab all the values of this field and wrap them in a CharSequence implementation that reads from each value in sequence. But Highlighter expects a String for the value; it could be modified to deal with a CharSequence instead. DefaultSolrHighlighter can sometimes avoid CachingTokenFilter - Key: SOLR-6680 URL: https://issues.apache.org/jira/browse/SOLR-6680 Project: Solr Issue Type: Improvement Components: highlighter Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: SOLR-6680.patch The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to wrap the token stream in a CachingTokenFilter when hl.usePhraseHighlighter=true. This wastes memory, and it interferes with other optimizations -- LUCENE-6034. Furthermore, the internal TermOffsetsTokenStream (used when TermVectors are used with this) wasn't properly delegating reset(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
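A bare-bones sketch of the wrapper being proposed (the class is hypothetical, not part of any patch here); a real implementation would also need to account for the gap the analyzer injects between values, which this ignores:
{code:java}
import java.util.List;

/** Read-only view of several field values as one logical sequence,
 *  so a highlighter could walk multi-valued content without copying. */
class ConcatCharSequence implements CharSequence {
  private final List<String> values;
  private final int totalLength;

  ConcatCharSequence(List<String> values) {
    this.values = values;
    int len = 0;
    for (String v : values) len += v.length();
    this.totalLength = len;
  }

  @Override public int length() { return totalLength; }

  @Override public char charAt(int index) {
    for (String v : values) {          // linear scan; an offset table would be faster
      if (index < v.length()) return v.charAt(index);
      index -= v.length();
    }
    throw new IndexOutOfBoundsException();
  }

  @Override public CharSequence subSequence(int start, int end) {
    StringBuilder sb = new StringBuilder(end - start);
    for (int i = start; i < end; i++) sb.append(charAt(i));
    return sb;
  }
}
{code}
An offset table instead of the linear scan in charAt would matter for fields with many values, but it keeps the sketch short.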
[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194864#comment-14194864 ] Michael Dodsworth commented on SOLR-2927: - [~shalinmangar] any feedback on this? SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-2927.patch, mbean-leak-jira.png # SolrIndexSearcher's register method puts the searcher's name into infoRegistry, but SolrCore's closeSearcher method only removes the name of currentSearcher. # SolrIndexSearcher's register method also puts the cache names in, but SolrIndexSearcher's close does not remove them. So some entries are never removed, which may cause a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting
[ https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194869#comment-14194869 ] David Smiley commented on SOLR-4656: I saw the results of the modifications here during my work on SOLR-6680. It's not clear to me there needed to be new parameters. Shouldn't the field value lengths be accumulated until they approach maxAnalyzedChars, and then exit at that point? And furthermore, shouldn't this field value loop exit early once it sees {{fragTexts.size() >= numFragments}} (i.e. hl.snippets is reached)? Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting - Key: SOLR-4656 URL: https://issues.apache.org/jira/browse/SOLR-4656 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.3, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 4.3, Trunk Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, SOLR-4656-trunk.patch, SOLR-4656.patch I'm looking at an admittedly pathological case of many, many entries in a multiValued field, and trying to implement a way to limit the number examined, analogous to maxAnalyzedChars, see the patch. Along the way, I noticed that we do what looks like unnecessary copying of the fields to be examined. We call Document.getFields, which copies all of the fields and values to the returned array. Then we copy all of those to another array, converting them to Strings. Then we actually examine them. a) this doesn't seem very efficient and b) reduces the benefit from limiting the number of mv values examined. So the attached does two things: 1) attempts to fix this 2) implements hl.maxMultiValuedToExamine I'd _really_ love it if someone who knows the highlighting code takes a peek at the fix to see if I've messed things up, the changes are actually pretty minimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
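In code, the accumulate-and-exit-early loop being suggested would look roughly like this; highlightOneValue is a hypothetical stand-in for the per-value work, not a DefaultSolrHighlighter method:
{code:java}
import java.util.ArrayList;
import java.util.List;

class MultiValuedBudgetSketch {
  // Hypothetical stand-in for highlighting a single field value.
  static String highlightOneValue(String value, int charBudget) {
    return value.length() <= charBudget ? value : null;
  }

  static List<String> highlight(List<String> fieldValues, int maxAnalyzedChars, int numFragments) {
    List<String> fragTexts = new ArrayList<String>();
    int charsSoFar = 0;
    for (String value : fieldValues) {
      if (charsSoFar >= maxAnalyzedChars) break;   // character budget spent across values
      if (fragTexts.size() >= numFragments) break; // hl.snippets satisfied: exit early
      String frag = highlightOneValue(value, maxAnalyzedChars - charsSoFar);
      if (frag != null) fragTexts.add(frag);
      charsSoFar += value.length();
    }
    return fragTexts;
  }
}
{code}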
JDK 9 Early Access with Project Jigsaw build b36 is available on java.net
Hi Uwe Dawid, JDK 9 Early Access with Project Jigsaw build b36 is available on java.net [1] The goal of Project Jigsaw [2] is to design and implement a standard module system for the Java SE Platform, and to apply that system to the Platform itself and to the JDK. As described in JEP 220 [3], this build provides a new runtime image structure. For example, this new runtime image does not install an rt.jar file or a tools.jar file. Please refer to Project Jigsaw's updated project pages [2] [4] and Mark Reinhold's announcement email [5] for further details. We are very interested in your experiences testing this build. Comments, questions, and suggestions are welcome on the jigsaw-dev mailing list or else submit bug reports via bugs.java.com. Note: If you haven’t already subscribed to that mailing list then please do so first, otherwise your message will be discarded as spam. [1] https://jdk9.java.net/jigsaw/ [2] http://openjdk.java.net/projects/jigsaw/ [3] http://openjdk.java.net/jeps/220 [4] http://openjdk.java.net/projects/jigsaw/ea [5] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2014-November/003878.html -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: JDK 9 Early Access with Project Jigsaw build b36 is available on java.net
I imagine this will break everything that relies on scanning rt.jar out there (proguard, shading plugins, api-checkers) -- fun, fun, fun ;) Dawid On Mon, Nov 3, 2014 at 7:41 PM, Rory O'Donnell Oracle, Dublin Ireland rory.odonn...@oracle.com wrote: Hi Uwe Dawid, JDK 9 Early Access with Project Jigsaw build b36 is available on java.net [1] The goal of Project Jigsaw [2] is to design and implement a standard module system for the Java SE Platform, and to apply that system to the Platform itself and to the JDK. As described in JEP 220 [3], this build provides a new runtime image structure. For example, this new runtime image does not install an rt.jar file or a tools.jar file. Please refer to Project Jigsaw's updated project pages [2] [4] and Mark Reinhold's announcement email [5] for further details. We are very interested in your experiences testing this build. Comments, questions, and suggestions are welcome on the jigsaw-dev mailing list or else submit bug reports via bugs.java.com. Note: If you haven’t already subscribed to that mailing list then please do so first, otherwise your message will be discarded as spam. [1] https://jdk9.java.net/jigsaw/ [2] http://openjdk.java.net/projects/jigsaw/ [3] http://openjdk.java.net/jeps/220 [4] http://openjdk.java.net/projects/jigsaw/ea [5] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2014-November/003878.html -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting
[ https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194962#comment-14194962 ] Erick Erickson commented on SOLR-4656: -- David: bq: Shouldn't the field value lengths be accumulated, I see where you're going, and I have to admit I didn't originate this code, so all things are possible. It's a little different sense than maxAnalyzedChars in that the unit of measurement is the number of MV entries rather than the number of characters analyzed, but I could argue either way. bq: shouldn't this field value loop exit early once ... I have no objection. Although it seems kind of late to take away this parameter, should we deprecate it instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting
[ https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195001#comment-14195001 ] David Smiley commented on SOLR-4656: bq. It's a little different sense than maxAnalyzedChars in that the unit of measurement is the number of MV entries rather than the number of characters analyzed, but I could argue either way. Sure... but was there per-value overhead involved that was a bit heavy for the particular client you did this for (i.e. a massive number of values), or was it just a matter of not accumulating value lengths? bq. Although it seems kind of late to take away this parameter, should we deprecate it instead? If there are a large number of values, I guess it has some value. In my last comment on SOLR-6680 I stated I think multi-value handling should be done a bit differently, in which each value should be virtually concatenated/iterated via a CharSequence wrapper and handed to the highlighter. Likewise the TokenStreams of each value could be wrapped into a concatenating wrapper. If that were done, then I think these parameters would be completely obsolete, as it would handle the case of a massive number of values. I'll create a separate issue to accumulate maxAnalyzedChars per value and exit early. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
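A rough sketch of the virtual concatenation David proposes, assuming hypothetical names throughout; this is not an actual Lucene/Solr class, just an illustration of a CharSequence view over all values of a multi-valued field.

{code:java}
// One logical text over many field values, without copying them into a
// single String up front.
public class ConcatCharSequence implements CharSequence {
  private final String[] values;
  private final int[] starts; // global start offset of each value
  private final int length;

  public ConcatCharSequence(String... values) {
    this.values = values;
    this.starts = new int[values.length];
    int off = 0;
    for (int i = 0; i < values.length; i++) {
      starts[i] = off;
      off += values[i].length();
    }
    this.length = off;
  }

  @Override public int length() { return length; }

  @Override public char charAt(int index) {
    // Binary search for the last value whose start offset is <= index.
    int lo = 0, hi = values.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (starts[mid] <= index) lo = mid; else hi = mid - 1;
    }
    return values[lo].charAt(index - starts[lo]);
  }

  @Override public CharSequence subSequence(int start, int end) {
    StringBuilder sb = new StringBuilder(end - start);
    for (int i = start; i < end; i++) sb.append(charAt(i));
    return sb;
  }
}
{code}

A concatenating TokenStream wrapper would work the same way, shifting each value's token offsets by the accumulated length of the preceding values.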
[jira] [Created] (SOLR-6692) hl.maxAnalyzedChars should apply cumulatively on a multi-valued field
David Smiley created SOLR-6692: -- Summary: hl.maxAnalyzedChars should apply cumulatively on a multi-valued field Key: SOLR-6692 URL: https://issues.apache.org/jira/browse/SOLR-6692 Project: Solr Issue Type: Improvement Components: highlighter Reporter: David Smiley Fix For: 5.0 I think hl.maxAnalyzedChars should apply cumulatively across the values of a multi-valued field. DefaultSolrHighlighter doesn't; I'm not sure yet about the other two. Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit early from its field value loop if it reaches hl.snippets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
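The "cumulative" semantics can be sketched as follows; again a hypothetical illustration with made-up names, not DefaultSolrHighlighter's real code.

{code:java}
import java.util.List;

class CumulativeBudgetSketch {
  // Hypothetical stand-in for the per-value analysis/fragmenting step.
  static void analyze(String text) { /* no-op in this sketch */ }

  // Each value draws down one shared budget instead of getting a fresh
  // maxAnalyzedChars of its own.
  static void highlightCumulatively(List<String> values, int maxAnalyzedChars) {
    int remaining = maxAnalyzedChars;
    for (String value : values) {
      if (remaining <= 0) break; // budget already spent by earlier values
      int n = Math.min(remaining, value.length());
      analyze(value.substring(0, n));
      remaining -= n;
    }
  }
}
{code}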
[jira] [Updated] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-6046: --- Attachment: LUCENE-6046.patch Patch, tests pass. I added a required int maxStates to RegExp.toAutomaton, and it threads this down to determinize, and throws RegExpTooHardExc if determinize would need to exceed that limit. I didn't make it a checked exc; I had started that way but it percolates up high, e.g. into query parsers, and I think that's too much. The exception message itself should make it quite clear what went wrong at query time. I also added this as an optional param to the RegexpQuery default ctor, defaulted to 10,000 states, and to QueryParserBase, with the same default. RegExp.toAutomaton high memory use -- Key: LUCENE-6046 URL: https://issues.apache.org/jira/browse/LUCENE-6046 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.10.1 Reporter: Lee Hinman Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-6046.patch, LUCENE-6046.patch When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java. The following caused an OutOfMemoryError with a 32gb heap:
{noformat}
new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
{noformat}
When increased to a 60gb heap, the following exception is thrown:
{noformat}
1 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
1 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
1 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
1 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
1 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
1 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
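Caller-side usage of the capped API would look roughly like this sketch. The int parameter follows the comment above; the exception type is an assumption (the comment calls it RegExpTooHardExc, and says it is unchecked), so a generic RuntimeException is caught here.

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;

public class CappedDeterminizeDemo {
  public static void main(String[] args) {
    int maxStates = 10000; // determinization budget
    try {
      Automaton a = new RegExp("(a|b)*abb").toAutomaton(maxStates);
      System.out.println("determinized into " + a.getNumStates() + " states");
    } catch (RuntimeException tooHard) {
      // Per the patch description, the unchecked exception's message should
      // make clear which regexp blew the limit.
      System.err.println("regexp too hard to determinize: " + tooHard.getMessage());
    }
  }
}
{code}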
[jira] [Created] (SOLR-6693) Start script for windows fails with 32bit JRE
Jan Høydahl created SOLR-6693: - Summary: Start script for windows fails with 32bit JRE Key: SOLR-6693 URL: https://issues.apache.org/jira/browse/SOLR-6693 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.2 Environment: WINDOWS 8.1 Reporter: Jan Høydahl Fix For: 5.0, Trunk *Reproduce:* # Install JRE8 from www.java.com (typically {{C:\Program Files (x86)\Java\jre1.8.0_25}}) # Run the command {{bin\solr start -V}} The result is: {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}} *Reason* This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of the parentheses that it freaks out. *Solution* Quote the lines where %JAVA% is printed, e.g. instead of
{noformat}
@echo Using Java: %JAVA%
{noformat}
then use
{noformat}
@echo Using Java: "%JAVA%"
{noformat}
This is needed in several places. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6693) Start script for windows fails with 32bit JRE
[ https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-6693: -- Description: *Reproduce:* # Install JRE8 from www.java.com (typically {{C:\Program Files (x86)\Java\jre1.8.0_25}}) # Run the command {{bin\solr start -V}} The result is: {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}} *Reason* This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of the parentheses that it freaks out. I think the same would apply for a 32-bit JDK because of the (x86) in the path, but I have not tested. *Solution* Quote the lines where %JAVA% is printed, e.g. instead of
{noformat}
@echo Using Java: %JAVA%
{noformat}
then use
{noformat}
@echo Using Java: "%JAVA%"
{noformat}
This is needed in several places. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195033#comment-14195033 ] Nik Everett commented on LUCENE-6046: - Oh no! I wrote a very similar patch! Sorry to duplicate effort there. I found that 10,000 states wasn't quite enough to handle some of the tests so I went with 1,000,000 as the default. It's pretty darn huge but it does get the job done. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: An experience and some thoughts about solr/example - solr/server
On Nov 3, 2014, at 12:50 PM, Shawn Heisey apa...@elyograg.org wrote: On 11/2/2014 5:57 PM, Erick Erickson wrote: I'm a little discomfited by having to learn new stuff, but that's a personal problem ;). I do think we have to be mindful of people who want something like what Shawn was doing, I do this all the time as well. And of new people who haven't a clue. Hmmm, actually new folks might have an easier time of it since they don't have any expectations ;). bq: ...'run example' target that could also fire off a create for collection1. Exactly, with a note (perhaps in the help for this command) about where the config files are located that are used. Perhaps with a 'clean' option that blows away the current data directory and (if Zookeeper becomes the one source of truth) does an upconfig first. Thanks for all the input on this thread, and for the hard work trying to make everything easier for a beginner. I actually do really like the fact that we now start with no cores, it was just a bit of a shock. It sounds like it's a relatively straightforward thing to fire off a CoreAdmin 'curl' command after startup that will populate an example core, and the conf directory is probably easy to locate in the download too. I just ask that this information be added to the immediately available docs (README.txt and similar). I did not check the tutorial ... if it's not already there, it probably should be. Or on trunk (and hopefully back ported if we do another 4.10.x release):
$ bin/solr create_core -help

Usage: solr create_core [-n name] [-c configset]

  -n name        Name of core to create

  -c configset   Name of configuration directory to use, valid options are:
                   basic_configs: Minimal Solr configuration
                   data_driven_schema_configs: Managed schema with field-guessing support enabled
                   sample_techproducts_configs: Example configuration with many optional features enabled to demonstrate the full power of Solr
                 If not specified, default is: data_driven_schema_configs

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
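The post-startup core creation mentioned above can be done with a single CoreAdmin call, roughly as below; the host, port, and core name are assumptions for a default local install, and exact parameters may vary by version.

{noformat}
$ curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=collection1&instanceDir=collection1"
{noformat}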
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195047#comment-14195047 ] Michael McCandless commented on LUCENE-6046: Woops, sorry, I didn't see you had a patch here! Thank you. I like your patch: it's good to make all hidden usages of determinize visible. Let's start from your patch and merge anything from mine in? E.g. I think we can collapse minimizeHopcroft into just minimize... bq. I found that 10,000 states wasn't quite enough to handle some of the tests so I went with 1,000,000 as the default. It's pretty darn huge but it does get the job done. Whoa, which tests needed 1M max states? I worry about passing a 1M state automaton to minimize... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6693) Start script for windows fails with 32bit JRE
[ https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-6693: -- Description: *Reproduce:* # Install JRE8 from www.java.com (typically {{C:\Program Files (x86)\Java\jre1.8.0_25}}) # Run the command {{bin\solr start -V}} The result is: {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}} *Reason* This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of the parentheses that it freaks out. I think the same would apply for a 32-bit JDK because of the (x86) in the path, but I have not tested. Tip: You can remove the line {{@ECHO OFF}} at the top to see exactly which is the offending line. *Solution* Quote the lines where %JAVA% is printed, e.g. instead of
{noformat}
@echo Using Java: %JAVA%
{noformat}
then use
{noformat}
@echo Using Java: "%JAVA%"
{noformat}
This is needed in several places. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195053#comment-14195053 ] Michael McCandless commented on LUCENE-6046: I like the test simplifications, and removing dead code from Operations.determinize. Can we fix the exc thrown from RegExp to include the offending regular expression, and fix the test to confirm the message contains it? Maybe also add RegExp.toStringTree? I found it very useful while debugging the original regexp :) I think QueryParserBase should also have a set/get for this option? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195056#comment-14195056 ] Nik Everett commented on LUCENE-6046: - TestDeterminizeLexicon wants to make an automaton that accepts 5000 random strings. So 10,000 isn't enough states for it. I'll drop the default limit to 10,000 again and just feed a million to that test case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195065#comment-14195065 ] Nik Everett commented on LUCENE-6046: - I'll certainly add the regexp string to the exception message. And I'll merge the toStringTree from your patch into mine if you'd like. Yeah - QueryParserBase should have this option too. The thing I found most useful for debugging this was to call toDot on the automaton before and after normalization. I just looked at it and went, oh, of course you have to do it that way. No wonder the states explode. And then I read https://en.wikipedia.org/wiki/Powerset_construction and remembered it from my rusty CS degree. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
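The toDot technique is available directly on Automaton; a minimal sketch (the regexp here is just a placeholder example):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;

public class DotDump {
  public static void main(String[] args) {
    Automaton a = new RegExp("(a|b)*abb").toAutomaton();
    // Paste the output into Graphviz to visualize, e.g.:
    //   dot -Tpng out.dot -o out.png
    System.out.println(a.toDot());
  }
}
{code}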
[jira] [Commented] (SOLR-6693) Start script for windows fails with 32bit JRE
[ https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195096#comment-14195096 ] Jan Høydahl commented on SOLR-6693: --- After fixing the echo problems, the next hurdle occurs: {{Java 1.7 or later is required to run Solr.}} Even if I have Java8 (32bit). After some debugging, I found that the syntax {{-version:x.y}} does not work on 32-bit Java for Windows; it prints the error even if you have the right version. So the question then is: should the script enforce 64-bit Java and print a more useful message if not found? Or is there a way to fix the version testing under 32-bit Java on Windows? It would perhaps be good to print a warning for 32-bit Java anyway, since you should use 64-bit if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
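One possible alternative to the {{-version:x.y}} launcher syntax, sketched below purely as an assumption (this is not the actual solr.cmd fix): run {{java -version}} and parse its first line, which behaves the same on 32- and 64-bit JREs.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class JavaVersionCheck {
  public static void main(String[] args) throws Exception {
    // java -version prints to stderr, so merge it into stdout.
    Process p = new ProcessBuilder("java", "-version").redirectErrorStream(true).start();
    try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String first = r.readLine(); // e.g. java version "1.8.0_25"
      // Simplistic match for 1.7/1.8/1.9-style version strings only.
      boolean ok = first != null && first.matches(".*\"1\\.([789]|\\d\\d).*");
      System.out.println(ok ? "Java 1.7+ detected"
                            : "Java 1.7 or later is required to run Solr.");
    }
  }
}
{code}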
[jira] [Created] (SOLR-6694) Auto detect JAVA_HOME in bin\start.cmd
Jan Høydahl created SOLR-6694: - Summary: Auto detect JAVA_HOME in bin\start.cmd Key: SOLR-6694 URL: https://issues.apache.org/jira/browse/SOLR-6694 Project: Solr Issue Type: Improvement Components: scripts and tools Affects Versions: 4.10.2 Environment: Windows Reporter: Jan Høydahl The start script requires JAVA_HOME to be set. The Java installer on Windows does not set JAVA_HOME, so it is an obstacle for new users who want to test. What the installer does is set some registry values, and we can detect those to find a JAVA_HOME to use. It will give a better user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses
[ https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195112#comment-14195112 ] Robert Parker commented on SOLR-4586: - Under Solr 4.10.2 in SolrCloud configuration, if I upload a change to solrconfig.xml to ZooKeeper that raises maxBooleanClauses from 1024 to 2048 and then reload the collection, the cores do not recognize the new value for maxBooleanClauses, unlike other changes to schema.xml and solrconfig.xml. I have to bounce Solr on each node before queries will honor the new value for maxBooleanClauses. This seems like unintentional behavior. I should be able to make any change to schema.xml and solrconfig.xml, then upload those to ZooKeeper and have each node in the cluster instantly honor all new values after a core/collection reload. Increase default maxBooleanClauses -- Key: SOLR-4586 URL: https://issues.apache.org/jira/browse/SOLR-4586 Project: Solr Issue Type: Improvement Affects Versions: 4.2 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50 Reporter: Shawn Heisey Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to someone asking a question about queries. Mark Miller told me that maxBooleanClauses no longer applies, that the limitation was removed from Lucene sometime in the 3.x series. The config still shows up in the example even in the just-released 4.2. Checking through the source code, I found that the config option is parsed and the value stored in objects, but does not actually seem to be used by anything. I removed every trace of it that I could find, and all tests still pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6694) Auto detect JAVA_HOME in bin\start.cmd
[ https://issues.apache.org/jira/browse/SOLR-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195122#comment-14195122 ] Jan Høydahl commented on SOLR-6694: --- Here's some code from a start script I created long ago:
{noformat}
echo Detecting JAVA_HOME
if "%JAVA_HOME%"=="" call:FIND_JAVA_HOME
echo Java home: %JAVA_HOME%
goto:DETECTED

:FIND_JAVA_HOME
FOR /F "skip=2 tokens=2*" %%A IN ('REG QUERY "HKLM\Software\JavaSoft\Java Runtime Environment" /v CurrentVersion') DO set CurVer=%%B
FOR /F "skip=2 tokens=2*" %%A IN ('REG QUERY "HKLM\Software\JavaSoft\Java Runtime Environment\%CurVer%" /v JavaHome') DO set JAVA_HOME=%%B
goto:EOF

:DETECTED
echo Do whatever
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6037) PendingTerm cannot be cast to PendingBlock
[ https://issues.apache.org/jira/browse/LUCENE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195127#comment-14195127 ] Michael McCandless commented on LUCENE-6037: Hmm, are you sure this was just a multi-threaded issue? I don't see how [illegally] sharing a single Document across threads would lead to this exception. PendingTerm cannot be cast to PendingBlock -- Key: LUCENE-6037 URL: https://issues.apache.org/jira/browse/LUCENE-6037 Project: Lucene - Core Issue Type: Bug Components: core/codecs Affects Versions: 4.3.1 Environment: ubuntu 64bit Reporter: zhanlijun Priority: Critical Fix For: 4.3.1 The error is as follows:
java.lang.ClassCastException: org.apache.lucene.codecs.BlockTreeTermsWriter$PendingTerm cannot be cast to org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock
  at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finish(BlockTreeTermsWriter.java:1014)
  at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:553)
  at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
  at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
  at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
  at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
  at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:493)
  at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480)
  at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378)
  at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:413)
  at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1283)
  at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1243)
  at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1228)
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
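For readers unfamiliar with the misuse Michael is asking about, the safe pattern is sketched below: IndexWriter itself may be shared across threads, but a Document instance being mutated concurrently may not. This is a generic illustration, not the reporter's code.

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

class ThreadSafeIndexing {
  // Right: build a fresh Document per record, per thread.
  // Wrong: one shared Document whose fields are swapped by many threads.
  static void indexOne(IndexWriter writer, String text) throws Exception {
    Document doc = new Document();                     // fresh instance per call
    doc.add(new TextField("body", text, Field.Store.NO));
    writer.addDocument(doc);                           // safe to call concurrently
  }
}
{code}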
[jira] [Created] (SOLR-6695) Change in solrconfig.xml for maxBooleanClauses in SolrCloud is not recognized
Robert Parker created SOLR-6695: --- Summary: Change in solrconfig.xml for maxBooleanClauses in SolrCloud is not recognized Key: SOLR-6695 URL: https://issues.apache.org/jira/browse/SOLR-6695 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.2, 4.10, 4.9 Reporter: Robert Parker Priority: Minor Under Solr 4.10.2 in SolrCloud configuration, if I upload a change to solrconfig.xml to ZooKeeper that raises maxBooleanClauses from 1024 to 2048 and then reload the collection, the cores do not recognize the new value for maxBooleanClauses, unlike other changes to schema.xml and solrconfig.xml. I have to bounce Solr on each node before queries will honor the new value for maxBooleanClauses. This seems like unintentional behavior. I should be able to make any change to schema.xml and solrconfig.xml, then upload those to ZooKeeper and have each node in the cluster instantly honor all new values after a core/collection reload. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195169#comment-14195169 ] Shalin Shekhar Mangar commented on SOLR-2927: - Thanks for pinging me, Michael. This issue had been forgotten. I now understand the bug and I am able to reproduce it locally. I started with Cyrille's patch, which introduced an exception in the SolrCore constructor, and I added logging of all items which are added to JMX and all the items that are removed on close after the exception. With a little bit of awk and sort, I have this list of mbeans which are leaked:
{code}
documentCache
fieldValueCache
filterCache
mlt
perSegFilter
query
queryResultCache
searcher
Searcher@778e65f2[techproducts]
{code}
SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-2927.patch, mbean-leak-jira.png # SolrIndexSearcher's register method puts in the name of the searcher, but SolrCore's closeSearcher method only removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts in the names of the caches, but SolrIndexSearcher's close does not remove them. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6058) Solr needs a new website
[ https://issues.apache.org/jira/browse/SOLR-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195194#comment-14195194 ] Steve Rowe commented on SOLR-6058: -- I asked Infra (INFRA-8576) to enable the Attribute Lists markdown extension so that we can explicitly set ids, classes, and arbitrary attribute/value pairs on output elements when the ASF CMS uses Python Markdown to generate HTML - see https://pythonhosted.org/Markdown/extensions/attr_list.html Solr needs a new website Key: SOLR-6058 URL: https://issues.apache.org/jira/browse/SOLR-6058 Project: Solr Issue Type: Task Reporter: Grant Ingersoll Assignee: Grant Ingersoll Attachments: HTML.rar, SOLR-6058, SOLR-6058.location-fix.patchfile, Solr_Icons.pdf, Solr_Logo_on_black.pdf, Solr_Logo_on_black.png, Solr_Logo_on_orange.pdf, Solr_Logo_on_orange.png, Solr_Logo_on_white.pdf, Solr_Logo_on_white.png, Solr_Styleguide.pdf Solr needs a new website: better organization of content, less verbose, more pleasing graphics, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
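For reference, attr_list syntax attaches attributes to the generated element like this; a generic example based on the linked docs, not Solr's actual site source:

{noformat}
### Downloads {: #downloads .btn-group }

A lead paragraph with a class and a custom attribute.
{: .lead data-section="intro" }
{noformat}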
RE: JDK 9 Early Access with Project Jigsaw build b36 is available on java.net
Hi, A few days ago I already opened an issue about Jigsaw in forbidden-apis (https://code.google.com/p/forbidden-apis/issues/detail?id=39). Currently it just says "unsupported JDK" and stops checking forbidden APIs (because it cannot find out if a class is something like sun.misc.Unsafe, non-public). But adding support is quite easy. Once I have installed this version locally, I can start implementing support for Java 9... Basically, forbidden-apis works as it is, but the special cases like detecting private rt.jar APIs or extracting the deprecated signature files need some changes. This is why it says "unsupported". Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of Dawid Weiss Sent: Monday, November 03, 2014 7:44 PM To: Uwe Schindler Cc: dev@lucene.apache.org Subject: Re: JDK 9 Early Access with Project Jigsaw build b36 is available on java.net [...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses
[ https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195223#comment-14195223 ] Shawn Heisey commented on SOLR-4586: [~reparker], maxBooleanClauses is a global Lucene setting across the entire application, and the last thing to set that value will win every time. If you have any configs with the default of 1024 and you reload any of those cores after reloading the one that sets it to 2048, then it will be changed back -- for the entire application. The best option is to set the higher limit in *every* solrconfig.xml file, or remove the setting from all of them except one. The javadocs for the Lucene setter method do not indicate this global nature, but I assure you that I have looked at the code, and it is indeed global. http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount%28int%29 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
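The global, last-writer-wins behavior Shawn describes comes from the static setter on org.apache.lucene.search.BooleanQuery (a real Lucene 4.x API); the demo below is illustrative:

{code:java}
import org.apache.lucene.search.BooleanQuery;

public class MaxClausesDemo {
  public static void main(String[] args) {
    System.out.println(BooleanQuery.getMaxClauseCount()); // default: 1024
    BooleanQuery.setMaxClauseCount(2048); // core A's solrconfig.xml wins...
    BooleanQuery.setMaxClauseCount(1024); // ...until core B reloads with the default
    System.out.println(BooleanQuery.getMaxClauseCount()); // 1024 again, JVM-wide
  }
}
{code}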
[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses
[ https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195233#comment-14195233 ] Robert Parker commented on SOLR-4586: - I've only got one collection and one config in ZooKeeper, and that's the one that is being changed. Each core had its solrconfig.xml updated on disk, but since it's a SolrCloud config, only the ZooKeeper version should matter, correct? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6042) CustomScoreQuery Explain differs from the actual score when topLevelBoost is used.
[ https://issues.apache.org/jira/browse/LUCENE-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Lantsman updated LUCENE-6042: --- Attachment: CustomScoreQuery.patch CustomScoreQuery Explain differs from the actual score when topLevelBoost is used. -- Key: LUCENE-6042 URL: https://issues.apache.org/jira/browse/LUCENE-6042 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring Affects Versions: 4.8 Reporter: Denis Lantsman Priority: Minor Attachments: CustomScoreQuery.patch Original Estimate: 1h Remaining Estimate: 1h In CustomScoreQuery.java, doExplain has the following line:
{code}
res.addDetail(new Explanation(getBoost(), "queryBoost"));
{code}
This multiplies the custom score query by just the boost of the current query, and not by
{code}
queryWeight = topLevelBoost * getBoost();
{code}
which is the value that's actually used during scoring. This leads to drastically different scores in the debug info, relative to the actual score, when the query is a subquery of another one, like a BooleanQuery clause, with a non-1 boost. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
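The fix the description implies would make the explain detail use the same weight the scorer applies. A hedged sketch of the changed line inside doExplain, assuming topLevelBoost is in scope there; this is not necessarily what the attached patch does:

{code:java}
// Use the effective weight (inherited boost times this query's boost),
// matching what scoring actually applies, instead of getBoost() alone.
float queryWeight = topLevelBoost * getBoost();
res.addDetail(new Explanation(queryWeight, "queryWeight(topLevelBoost*boost)"));
{code}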