[jira] [Created] (LUCENE-6045) Refactor classifier APIs to work better with multi-threading

2014-11-03 Thread Tommaso Teofili (JIRA)
Tommaso Teofili created LUCENE-6045:
---

 Summary: Refactor classifier APIs to work better with multi-threading
 Key: LUCENE-6045
 URL: https://issues.apache.org/jira/browse/LUCENE-6045
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/classification
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili


In 
https://issues.apache.org/jira/browse/LUCENE-4345?focusedCommentId=13454729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454729
 [~simonw] pointed out that the current Classifier API doesn't work well in 
multi-threaded environments: 

bq. The interface you defined has some problems with respect to Multi-Threading 
IMO. The interface itself suggests that this class is stateful and you have to 
call methods in a certain order, and at the same time you need to make sure that 
it is not published for read access before training is done. I think it would be 
wise to pass in all needed objects as constructor arguments and make the 
references final so it can be shared across threads, and add an interface that 
represents the trained model computed offline. In this case it doesn't really 
matter, but in the future it might make sense. We can also skip the model 
interface entirely and remove the training method until we have some impls that 
really need to be trained.

I missed that at the time, but I think for 5.0 it would be wise to rearrange 
the API to address it properly.
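
A rough illustration of the constructor-injection style suggested above (a sketch 
only; the class shape below is an assumption, not the API that will eventually be 
committed):

{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.LeafReader;

// All collaborators are passed to the constructor and stored in final fields,
// so a fully constructed instance can safely be published to and shared across
// threads; no train() has to be called in a particular order.
public final class ImmutableClassifier {
  private final LeafReader reader;
  private final Analyzer analyzer;
  private final String textFieldName;
  private final String classFieldName;

  public ImmutableClassifier(LeafReader reader, Analyzer analyzer,
                             String textFieldName, String classFieldName) {
    this.reader = reader;
    this.analyzer = analyzer;
    this.textFieldName = textFieldName;
    this.classFieldName = classFieldName;
  }

  // assignClass(String text) would only read the final fields above, so no
  // external synchronization is needed after construction.
}
{code}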






[jira] [Resolved] (LUCENE-5736) Separate the classifiers to online and caching where possible

2014-11-03 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili resolved LUCENE-5736.
-
   Resolution: Fixed
Fix Version/s: 5.0

 Separate the classifiers to online and caching where possible
 -

 Key: LUCENE-5736
 URL: https://issues.apache.org/jira/browse/LUCENE-5736
 Project: Lucene - Core
  Issue Type: Sub-task
  Components: modules/classification
Reporter: Gergő Törcsvári
Assignee: Tommaso Teofili
  Labels: gsoc2014
 Fix For: 5.0

 Attachments: 0803-caching.patch, 0810-caching.patch, 
 CachingNaiveBayesClassifier.java


 The Lucene classifier implementations are now nearly online if they get a 
 near-real-time reader. That is good for users who have a continuously changing 
 dataset, but slow for datasets that do not change.
 The idea is: what if we implement a cache and speed up the results where 
 possible?
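 A minimal sketch of the caching idea (the attached CachingNaiveBayesClassifier.java 
 is the real patch; the memoizing wrapper below is only an illustration, reusing the 
 module's Classifier interface):
 {code:java}
 import java.io.IOException;
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 import org.apache.lucene.classification.ClassificationResult;
 import org.apache.lucene.classification.Classifier;

 // Memoize per-text results: a win when the underlying index does not change,
 // wasteful when it is continuously updated (cached entries would go stale).
 public final class MemoizingClassifier<T> {
   private final Classifier<T> delegate; // assumed already trained
   private final Map<String, ClassificationResult<T>> cache = new ConcurrentHashMap<>();

   public MemoizingClassifier(Classifier<T> delegate) {
     this.delegate = delegate;
   }

   public ClassificationResult<T> assignClass(String text) throws IOException {
     ClassificationResult<T> cached = cache.get(text);
     if (cached != null) {
       return cached;
     }
     ClassificationResult<T> result = delegate.assignClass(text);
     cache.put(text, result); // benign race: at worst the same text is classified twice
     return result;
   }
 }
 {code}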






[jira] [Assigned] (LUCENE-5548) Improve flexibility and testability of the classification module

2014-11-03 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili reassigned LUCENE-5548:
---

Assignee: Tommaso Teofili

 Improve flexibility and testability of the classification module
 

 Key: LUCENE-5548
 URL: https://issues.apache.org/jira/browse/LUCENE-5548
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/classification
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
  Labels: gsoc2014, mentor

 Lucene classification module's flexibility and capabilities may be improved 
 with the following:
 - make it possible to use them online (or provide an online version of 
 them) so that if the underlying index(reader) is updated the classifier 
 doesn't need to be trained again to take into account newly added docs
 - optionally pass a different Analyzer together with the text to be 
 classified (or directly a TokenStream) to specify custom 
 tokenization/filtering.
 - normalize score calculations of existing classifiers
 - provide publicly available dataset based accuracy and speed tests
 - more Lucene based classification algorithms
 Specific subtasks for each of the above topics should be created to discuss 
 each of them in depth.






[jira] [Resolved] (LUCENE-5699) Lucene classification score calculation normalize and return lists

2014-11-03 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili resolved LUCENE-5699.
-
Resolution: Fixed

 Lucene classification score calculation normalize and return lists
 --

 Key: LUCENE-5699
 URL: https://issues.apache.org/jira/browse/LUCENE-5699
 Project: Lucene - Core
  Issue Type: Sub-task
  Components: modules/classification
Reporter: Gergő Törcsvári
Assignee: Tommaso Teofili
  Labels: gsoc2014
 Fix For: 5.0, Trunk

 Attachments: 06-06-5699.patch, 0730.patch, 0803-base.patch, 
 0810-base.patch


 Currently the classifiers can return only the best-matching class. If somebody 
 wants to use them for more complex tasks, they need to modify these classes to 
 get the second and third results too. If it is possible to return a list and it 
 does not cost a lot of resources, why don't we do that? (We iterate over a list 
 anyway.)
 The Bayes classifier returned values that were too small, and there was a bug 
 with the zero floats; it was fixed by using logarithms. It would be nice to 
 scale the class scores so they sum to one, so we could compare the returned 
 scores and relevance of two documents. (If we don't do this, the word count in 
 the test documents affects the result score.)
 In bullet points:
 * In the Bayes classifier, normalize score values and return result lists.
 * In the KNN classifier, add the possibility to return a result list.
 * Make ClassificationResult Comparable for list sorting (see the sketch below).
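 A sketch of the Comparable change from the last bullet (illustrative, not the 
 committed code):
 {code:java}
 // Sorting a List<ClassificationResult<T>> then yields the highest-scoring
 // class first.
 public class ClassificationResult<T> implements Comparable<ClassificationResult<T>> {
   private final T assignedClass;
   private final double score;

   public ClassificationResult(T assignedClass, double score) {
     this.assignedClass = assignedClass;
     this.score = score;
   }

   public T getAssignedClass() {
     return assignedClass;
   }

   public double getScore() {
     return score;
   }

   @Override
   public int compareTo(ClassificationResult<T> other) {
     return Double.compare(other.score, this.score); // descending by score
   }
 }
 {code}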






[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.8.0_40-ea-b09) - Build # 4307 - Still Failing!

2014-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4307/
Java: 32bit/jdk1.8.0_40-ea-b09 -server -XX:+UseSerialGC (asserts: true)

4 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:


Stack Trace:
java.lang.AssertionError
at 
__randomizedtesting.SeedInfo.seed([EBA28C3A09DF34FA:6A4402227E8054C6]:0)
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.allTests(CloudSolrServerTest.java:182)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.doTest(CloudSolrServerTest.java:124)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at ...

[jira] [Updated] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-6637:

Attachment: SOLR-6637.patch

Updated patch. Improved the test case and fixed the bugs that it uncovered.

Still fighting one issue: I keep getting this error when the index directory 
tries to close all the resources. Trying to figure out the underlying 
problem.

{noformat}
MockDirectoryWrapper: cannot close: there are still open files: {_0.cfs=1, 
_1.cfs=1}
{noformat}

 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch I realized that during backup we only back up the 
 index. Should we back up the conf files also? Any thoughts? I could open a 
 separate Jira for this.






[jira] [Commented] (LUCENE-6005) Explore alternative to Document/Field/FieldType API

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194378#comment-14194378
 ] 

ASF subversion and git services commented on LUCENE-6005:
-

Commit 1636293 from [~mikemccand] in branch 'dev/branches/lucene6005'
[ https://svn.apache.org/r1636293 ]

LUCENE-6005: add Date, InetAddress types; add min/maxTokenLength; add 
maxTokenCount; use ValueType.NONE not null; each FieldType now stores the 
Lucene version it was created by

 Explore alternative to Document/Field/FieldType API
 ---

 Key: LUCENE-6005
 URL: https://issues.apache.org/jira/browse/LUCENE-6005
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: Trunk


 Auto-prefix terms (LUCENE-5879) is blocked because it's impossible in
 Lucene today to add a simple API to use it, and I don't think we
 should commit features that only super-experts can figure out how to
 use: that's evil.
 The only realistic workaround for such new features is to instead
 add them directly to the various servers on top of Lucene, since they
 all already have nice schema APIs.
 I opened LUCENE-5989 to try to do at least a baby step towards making it
 easier to use auto-prefix terms, so you can easily add singleton
 binary tokens, but even that has proven controversial.
 Net/net I think we have to solve the root cause of this by fixing the
 Document/Field/FieldType API so that new index-level features can have
 a usable API, properly defaulted for the right types of fields.
 Towards that, I'm exploring a replacement for
 Document/Field/FieldType.  The idea is to expose simple methods on the
 document class (no more separate Field and FieldType classes):
 {noformat}
 doc.addLargeText("body", "some text");
 doc.addShortText("title", "a title");
 doc.addAtom("id", "29jafnn");
 doc.addBinary("bytes", new byte[7]);
 doc.addNumber("number", 17);
 {noformat}
 And then expose a separate FieldTypes class, that you pass to ctor of
 the new document class, which lets you set all the various per-field
 settings (stored, doc values, etc.).  E.g.:
 {noformat}
 types.enableStored("id");
 {noformat}
 FieldTypes is a write-once schema, and it throws exceptions if you try
 to make invalid changes once a given setting is already written
 (e.g. enabling norms after having disabled them).  It will (I haven't
 implemented this yet) save its state into IndexWriter's commitData, so
 it's available when you open a new IndexWriter for append and when you
 open a reader.
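 To make the write-once behavior concrete, a hypothetical example 
 (enableNorms/disableNorms are assumed method names in the spirit of the 
 snippets above, not a released API):
 {noformat}
 types.disableNorms("title");  // first write of the "title" norms setting: OK
 types.enableNorms("title");   // changing an already-written setting: throws
 {noformat}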
 It has methods to set all the per-field settings (analyzer, stored,
 term vectors, norms, index options, doc values type), and chooses
 reasonable defaults based on the value's type when it suddenly sees
 a new field.  For example, when you add a number, it's indexed for
 range querying and sorting (numeric doc values) by default.
 FieldTypes provides the analyzer and codec (a little messy) that you
 pass to IndexWriterConfig.  Since it's effectively a persistent
 schema, it knows all about the available fields at search time, so we
 could use it to create queries (checking if they are valid given that
 field's type).  Query parsers and highlighters could consult it.
 Default UIs (above Lucene) could use it, etc.  This is all future .. I
 think for this issue the goal should be to just provide a better
 index-time API but not yet make use of it at search time.
 So with this change, for auto-prefix terms, we could add an "enable 
 range queries/filters" option, but then validate that the selected 
 postings format supports such an option.
 I know this exploration will be horribly controversial, but
 realistically I don't think Lucene can move on much further if we
 can't finally address this schema problem head on.
 This is long overdue.






[jira] [Created] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Lee Hinman (JIRA)
Lee Hinman created LUCENE-6046:
--

 Summary: RegExp.toAutomaton high memory use
 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Priority: Minor


When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
it's possible for the automaton to use so much memory that it exceeds the 
maximum array size for Java.

The following caused an OutOfMemoryError with a 32gb heap:

{noformat}
new 
RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
{noformat}

When increased to a 60gb heap, the following exception is thrown:

{noformat}
  1 java.lang.IllegalArgumentException: requested array size 2147483624 
exceeds maximum array in java (2147483623)
  1 
__randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
  1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
  1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
  1 
org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
  1 
org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
  1 
org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
  1 
org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
  1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
  1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
{noformat}






[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194395#comment-14194395
 ] 

Michael McCandless commented on LUCENE-6046:


Hmm, two bugs here.

First off, RegExp.toAutomaton is an inherently costly method: wasteful of RAM 
and CPU, doing minimize after each recursive operation, in order to build a DFA 
in the end. It's unfortunately quite easy to concoct regular expressions that 
make it consume ridiculous resources.  I'll look at this example and see if we 
can improve it, but in the end it will always have its adversarial cases 
unless we give up on making the resulting automaton deterministic, which would 
be a very big change.

Maybe we should add adversary defenses to it, e.g. you set a limit on the 
number of states it's allowed to create, and it throws a RegExpTooHardException 
if it would exceed that?
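
A sketch of what such a defense could look like (RegExpTooHardException and the 
state budget come from the suggestion above, not from an existing Lucene API; the 
check here is post-hoc, whereas a real fix would enforce the budget inside 
determinize itself):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;

public final class BoundedRegExp {

  /** Hypothetical: thrown when the automaton grows past the caller's budget. */
  public static final class RegExpTooHardException extends RuntimeException {
    RegExpTooHardException(int maxStates) {
      super("automaton would exceed " + maxStates + " states");
    }
  }

  /**
   * Builds the automaton, failing if it ends up larger than maxStates.
   * Note this only detects the blow-up after the fact; failing fast would
   * require threading the limit through Operations.determinize.
   */
  public static Automaton toAutomaton(String regexp, int maxStates) {
    Automaton a = new RegExp(regexp).toAutomaton();
    if (a.getNumStates() > maxStates) {
      throw new RegExpTooHardException(maxStates);
    }
    return a;
  }
}
{code}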

Second off, ArrayUtil.oversize has the wrong (too large) value for 
MAX_ARRAY_LENGTH, which is a bug from LUCENE-5844.  Which JVM did you run this 
test on?




[jira] [Assigned] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-6046:
--

Assignee: Michael McCandless




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194397#comment-14194397
 ] 

Dawid Weiss commented on LUCENE-6046:
-

Just a note -- Russ Cox wrote a series of excellent articles about different 
approaches to implementing regexp scanners: 
http://swtch.com/~rsc/regexp/regexp1.html

(There is no clear winner -- both DFAs and NFAs have advantages.)




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Lee Hinman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194400#comment-14194400
 ] 

Lee Hinman commented on LUCENE-6046:


[~mikemccand] I ran it with the following JVM:

{noformat}
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
{noformat}




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194406#comment-14194406
 ] 

Michael McCandless commented on LUCENE-6046:


bq. Russ Cox wrote a series of excellent articles about different approaches 
to implementing regexp scanners.

Thanks Dawid, these are great.

Switching to NFA-based matching would be a very large change ... I don't think 
we should pursue it here.  The Terms.intersect implementation for block tree is 
already very complex ... though I suppose if we could hide the on-the-fly 
subset construction (and convert the regexp to a Thompson NFA) under an API, 
then the Terms.intersect implementation wouldn't have to change much.

Still, there will always be adversarial cases no matter which approach we 
choose.  I think for this issue we should allow passing in a "how much work are 
you willing to do" limit to RegExp.toAutomaton, and have it throw an exception 
when it would exceed that.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194413#comment-14194413
 ] 

Dawid Weiss commented on LUCENE-6046:
-

I didn't mean to imply we should change the regexp implementation! :) This was 
just a pointer in case somebody wishes to understand why regexps can explode. I 
actually wish there were an NFA-based regexp implementation for Java (with a 
low memory footprint), because it would make concatenating thousands of 
regexps (e.g., for pattern detection) much easier.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194412#comment-14194412
 ] 

Michael McCandless commented on LUCENE-6046:


bq. Michael McCandless I ran it with the following JVM:

Thanks [~dakrone].

I was wrong about the first bug: there is no bug in ArrayUtil.oversize.  That 
exception just means RegExp is trying to create a too-big array ... so just the 
one bug :)




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Lee Hinman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194436#comment-14194436
 ] 

Lee Hinman commented on LUCENE-6046:


bq. I think for this issue we should allow passing in a "how much work are you 
willing to do" limit to RegExp.toAutomaton, and have it throw an exception when 
it would exceed that.

For what it's worth, I think this would be a good solution for us, much better 
than silently (from the user's perspective) freezing and then hitting an OOME.




[jira] [Updated] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-6637:

Attachment: SOLR-6637.patch

bq. MockDirectoryWrapper: cannot close: there are still open files: {_0.cfs=1, 
_1.cfs=1}

Attached a patch which fixes this. It looks like we can't use a 
try-with-resources block to get indexDir from the directoryFactory, as we need 
to call release() instead of closing it.
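
For context, a sketch of the pattern involved (method shapes follow Solr's 
DirectoryFactory as I understand it -- get() and release() -- treat the exact 
signatures as assumptions):

{code:java}
// try-with-resources would call close() on the Directory, but directories
// obtained from the factory are ref-counted and must be returned with
// release(); closing them directly breaks the factory's accounting.
Directory indexDir = directoryFactory.get(indexDirPath,
    DirectoryFactory.DirContext.DEFAULT, lockType);
try {
  // ... copy the snapshot files into indexDir ...
} finally {
  directoryFactory.release(indexDir); // not indexDir.close()
}
{code}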




[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194476#comment-14194476
 ] 

Noble Paul commented on SOLR-6517:
--

Yes, there is a problem; it will not work.

Ideally you should trigger the re-election process by invoking joinElection() 
with joinAtHead=true. That is what the OVERSEERROLE command does.
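
Roughly (a sketch; the joinAtHead flag on LeaderElector.joinElection() was added 
in SOLR-5476, but check the exact signature on your branch):

{code:java}
// Re-enter the election and ask to be queued at the head, the same path
// the overseer-role assignment takes.
leaderElector.joinElection(context, true /* replacement */, true /* joinAtHead */);
{code}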

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command to "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 A collection=name parameter would be required, and it would re-elect all the 
 leaders that were on the 'wrong' node.
 I'm thinking of optionally allowing one to specify a shard in cases where 
 you want to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.






[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194477#comment-14194477
 ] 

ASF subversion and git services commented on SOLR-6365:
---

Commit 1636330 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1636330 ]

SOLR-6365

 specify  appends, defaults, invariants outside of the component
 ---

 Key: SOLR-6365
 URL: https://issues.apache.org/jira/browse/SOLR-6365
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0, Trunk

 Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, 
 SOLR-6365.patch, SOLR-6365.patch


 The components are configured in solrconfig.xml mostly for specifying these 
 extra parameters. If we separate these out, we can avoid specifying the 
 components altogether and make solrconfig much simpler. Eventually we want 
 users to see all functions as paths instead of components, and to control these 
 params from outside, through an API, persisted in ZK.
 Objectives:
 * define standard components implicitly and let users override some params 
 only
 * reuse standard params across components
 * define multiple param sets and mix and match these params at request time
 Example:
 {code:xml}
 <!-- use json for all paths and _txt as the default search field -->
 <initParams name="global" path="/**">
   <lst name="defaults">
     <str name="wt">json</str>
     <str name="df">_txt</str>
   </lst>
 </initParams>
 {code}
 Other examples:
 {code:xml}
 <initParams name="a" path="/dump3,/root/*,/root1/**">
   <lst name="defaults">
     <str name="a">A</str>
   </lst>
   <lst name="invariants">
     <str name="b">B</str>
   </lst>
   <lst name="appends">
     <str name="c">C</str>
   </lst>
 </initParams>
 <requestHandler name="/dump3" class="DumpRequestHandler"/>
 <requestHandler name="/dump4" class="DumpRequestHandler"/>
 <requestHandler name="/root/dump5" class="DumpRequestHandler"/>
 <requestHandler name="/root1/anotherlevel/dump6" class="DumpRequestHandler"/>
 <requestHandler name="/dump1" class="DumpRequestHandler" initParams="a"/>
 <requestHandler name="/dump2" class="DumpRequestHandler" initParams="a">
   <lst name="defaults">
     <str name="a">A1</str>
   </lst>
   <lst name="invariants">
     <str name="b">B1</str>
   </lst>
   <lst name="appends">
     <str name="c">C1</str>
   </lst>
 </requestHandler>
 {code}
  






[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194479#comment-14194479
 ] 

ASF subversion and git services commented on SOLR-6365:
---

Commit 1636331 from [~noble.paul] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636331 ]

SOLR-6365




[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194483#comment-14194483
 ] 

Noble Paul commented on SOLR-6365:
--

Done.
The new behavior is very simple: whatever is put inside the {{requestHandler}} 
takes precedence over {{initParams}}.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194484#comment-14194484
 ] 

Nik Everett commented on LUCENE-6046:
-

I'm working on a first cut of something that does that.  A better regex 
implementation would be great, but the biggest thing to me is being able to 
limit the amount of work the determinize operation performs.  It's such a costly 
operation that I don't think it should ever really be abstracted from the user. 
Something like having determinize throw a checked exception when it performs 
too much work would make you keenly aware whenever you might be straying into 
exponential territory.




[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194476#comment-14194476
 ] 

Noble Paul edited comment on SOLR-6517 at 11/3/14 12:01 PM:


Yes, there is a problem; it will not work.

Ideally you should trigger the re-election process by invoking joinElection() 
with joinAtHead=true. That is what the ADDROLE command with role=overseer does.


was (Author: noble.paul):
Yes, there is a problem; it will not work.

Ideally you should trigger the re-election process by invoking joinElection() 
with joinAtHead=true. That is what the OVERSEERROLE command does.




[jira] [Updated] (SOLR-6533) Support editing common solrconfig.xml values

2014-11-03 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6533:
-
Attachment: SOLR-6533.patch

All tests pass.
Added a command-line option to disable config editing.

 Support editing common solrconfig.xml values
 

 Key: SOLR-6533
 URL: https://issues.apache.org/jira/browse/SOLR-6533
 Project: Solr
  Issue Type: Sub-task
Reporter: Noble Paul
 Attachments: SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, 
 SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, 
 SOLR-6533.patch


 There are a bunch of properties in solrconfig.xml which users want to edit. 
 We will attack them first.
 These properties will be persisted to a separate file called config.json (or 
 whatever file). Instead of saving in the same format, we will have well-known 
 properties which users can directly edit:
 {code}
 updateHandler.autoCommit.maxDocs
 query.filterCache.initialSize
 {code}   
 The API will be modeled around the bulk schema API:
 {code:javascript}
 curl http://localhost:8983/solr/collection1/config -H 
 'Content-type:application/json' -d '{
   "set-property" : {"updateHandler.autoCommit.maxDocs":5},
   "unset-property" : "updateHandler.autoCommit.maxDocs"
 }'
 {code}
 {code:javascript}
 // or use this to set ${mypropname} values
 curl http://localhost:8983/solr/collection1/config -H 
 'Content-type:application/json' -d '{
   "set-user-property" : {"mypropname":"my_prop_val"},
   "unset-user-property" : "mypropname"
 }'
 {code}
 The values stored in config.json will always take precedence and will be 
 applied after loading solrconfig.xml. 
 An HTTP GET on the /config path will give the real config that is applied. 
 An HTTP GET of /config/overlay gives the content of configOverlay.json.
 /config/component-name gives only the child of the same name from /config.






[jira] [Commented] (SOLR-6528) hdfs cluster with replication min set to 2 / Solr does not honor dfs.replication in hdfs-site.xml

2014-11-03 Thread davidchiu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194486#comment-14194486
 ] 

davidchiu commented on SOLR-6528:
-

Hi Michael, can you tell me what the plan is to fix this issue?

 hdfs cluster with replication min set to 2 / Solr does not honor 
 dfs.replication in hdfs-site.xml 
 --

 Key: SOLR-6528
 URL: https://issues.apache.org/jira/browse/SOLR-6528
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.9
 Environment: RedHat JDK 1.7 hadoop 2.4.1
Reporter: davidchiu
 Fix For: 4.10.3, Trunk


 org.apache.hadoop.ipc.RemoteException(java.io.IOException): file 
 /user/solr/test1/core_node1/data/tlog/tlog.000 on client 
 192.161.1.91.\nRequested replication 1 is less than the required minimum 2\n\t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
***Please forward this CFP to anyone who may be interested in participating.***

Hi,

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Alexandre Rafalovitch
May I suggest the next time they do it, they mention event date and location :-)

It's 31st of January/1st Feb, Brussels if I found the right web page.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 3 November 2014 07:28, Uwe Schindler uschind...@apache.org wrote:
 ***Please forward this CFP to anyone who may be interested in 
 participating.***

 Hi,

 Search has evolved to be much more than simply full-text search. We now rely 
 on “search engines” for a wide variety of functionality:
 search as navigation, search as analytics and backend for data visualization 
 and sometimes, dare we say it, as a data store. The purpose of this dev room 
 is to explore the new world of open source search engines: their enhanced 
 functionality, new use cases, feature and architectural deep dives, and the 
 position of search in relation to the wider set of software tools.

 We welcome proposals from folks working with or on open source search engines 
 (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
 technologies that heavily depend upon search (e.g.
 NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
 presentations on search algorithms, machine learning, real-world 
 implementation/deployment stories and explorations of the future of search.

 Talks should be 30-60 minutes in length, including time for Q&A.

 You can submit your talks to us here:
 https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

 Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
 cannot guarantee we will have the opportunity to review submissions made 
 after the deadline, so please submit early (and often)!

 Should you have any questions, you can contact the Dev Room
 organizers: opensourcesearch-devr...@lists.fosdem.org

 Cheers,
 LH on behalf of the Open Source Search Dev Room Program Committee*

 * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten 
 Curdt, Uwe Schindler

 -
 Uwe Schindler
 uschind...@apache.org
 Apache Lucene PMC Member / Committer
 Bremen, Germany
 http://lucene.apache.org/



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
Hi,

Sorry, my fault. I just copied the official CFP, which was sent to the FOSDEM 
list... Of course, those know the dates :-)
And you are right, the conference is on the following dates: Brussels, 31 
January & 1 February 2015.

I have no idea on which day the search devroom takes place.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Monday, November 03, 2014 1:36 PM
 To: dev@lucene.apache.org
 Subject: Re: CFP: FOSDEM 2015 - Open Source Search Dev Room
 
 May I suggest the next time they do it, they mention event date and location
 :-)
 
 It's 31st of January/1st Feb, Brussels if I found the right web page.
 
 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
 newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers
 community: https://www.linkedin.com/groups?gid=6713853
 
 
 On 3 November 2014 07:28, Uwe Schindler uschind...@apache.org wrote:
  ***Please forward this CFP to anyone who may be interested in
  participating.***
 
  Hi,
 
  Search has evolved to be much more than simply full-text search. We now
 rely on “search engines” for a wide variety of functionality:
  search as navigation, search as analytics and backend for data visualization
 and sometimes, dare we say it, as a data store. The purpose of this dev room
 is to explore the new world of open source search engines: their enhanced
 functionality, new use cases, feature and architectural deep dives, and the
 position of search in relation to the wider set of software tools.
 
  We welcome proposals from folks working with or on open source search
 engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
 or technologies that heavily depend upon search (e.g.
  NoSQL databases, Nutch, Apache Hadoop). We are particularly interested
 in presentations on search algorithms, machine learning, real-world
 implementation/deployment stories and explorations of the future of
 search.
 
  Talks should be 30-60 minutes in length, including time for Q&A.
 
  You can submit your talks to us here:
 
 https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
 8G0Ox
  Sfp84A/viewform
 
  Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014.
 We cannot guarantee we will have the opportunity to review submissions
 made after the deadline, so please submit early (and often)!
 
  Should you have any questions, you can contact the Dev Room
  organizers: opensourcesearch-devr...@lists.fosdem.org
 
  Cheers,
  LH on behalf of the Open Source Search Dev Room Program Committee*
 
  * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning,
  Torsten Curdt, Uwe Schindler
 
  -
  Uwe Schindler
  uschind...@apache.org
  Apache Lucene PMC Member / Committer
  Bremen, Germany
  http://lucene.apache.org/
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene expressions - asm dependency

2014-11-03 Thread Rob Audenaerde
Hi all,

I'm using lucene-expressions in a project, and it has a dependency on asm
4.1. I also use some components that depend on cglib 2.2.2, which depends
on asm 3.1. Besides, asm 5.0.2 has been out since April 2014.

Not an ideal situation.

Would it be possible to shade / repackage asm for lucene-expressions? In
that case there will be no class conflicts when using lucene in a project
where another asm is also used?

-Rob


RE: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
Hi,

forgot to mention:
FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See 
also: https://fosdem.org/2015/

I hope to see you there!
Uwe

 -Original Message-
 From: Uwe Schindler [mailto:uschind...@apache.org]
 Sent: Monday, November 03, 2014 1:29 PM
 To: dev@lucene.apache.org; java-u...@lucene.apache.org; solr-
 u...@lucene.apache.org; gene...@lucene.apache.org
 Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room
 
 ***Please forward this CFP to anyone who may be interested in
 participating.***
 
 Hi,
 
 Search has evolved to be much more than simply full-text search. We now
 rely on “search engines” for a wide variety of functionality:
 search as navigation, search as analytics and backend for data visualization
 and sometimes, dare we say it, as a data store. The purpose of this dev room
 is to explore the new world of open source search engines: their enhanced
 functionality, new use cases, feature and architectural deep dives, and the
 position of search in relation to the wider set of software tools.
 
 We welcome proposals from folks working with or on open source search
 engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
 or technologies that heavily depend upon search (e.g.
 NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
 presentations on search algorithms, machine learning, real-world
 implementation/deployment stories and explorations of the future of
 search.
 
 Talks should be 30-60 minutes in length, including time for Q&A.
 
 You can submit your talks to us here:
 https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
 8G0OxSfp84A/viewform
 
 Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
 cannot guarantee we will have the opportunity to review submissions made
 after the deadline, so please submit early (and often)!
 
 Should you have any questions, you can contact the Dev Room
 organizers: opensourcesearch-devr...@lists.fosdem.org
 
 Cheers,
 LH on behalf of the Open Source Search Dev Room Program Committee*
 
 * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
 Curdt, Uwe Schindler
 
 -
 Uwe Schindler
 uschind...@apache.org
 Apache Lucene PMC Member / Committer
 Bremen, Germany
 http://lucene.apache.org/
 
 
 
 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: lucene expressions - asm dependency

2014-11-03 Thread Uwe Schindler
Hi,

 

Why not shade it yourself, depending on your project's needs? You can do this 
in your own project easily, as a separate build step (e.g. with Ant, or in 
Maven using a separate sub-project which your main project depends on).

The ASM issue is well-known; the forbidden-apis checker shades asm 5.0.2.
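
A minimal sketch of that approach with the maven-shade-plugin, assuming a 
separate shading sub-project; the relocated package name is illustrative, not 
anything from the Lucene build:

{code:xml}
<!-- In the shading sub-project's pom.xml: relocate ASM so it cannot clash
     with the asm 3.1 pulled in via cglib. The shadedPattern is illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.objectweb.asm</pattern>
            <shadedPattern>com.example.shaded.org.objectweb.asm</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}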

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] 
Sent: Monday, November 03, 2014 10:11 AM
To: dev@lucene.apache.org
Subject: lucene expressions - asm dependency

 

Hi all,

I'm using lucene-expressions in a project, and it has a dependency on asm 4.1. 
I also use some components that depend on cglib 2.2.2, which depends on asm 
3.1. Besides, asm 5.0.2 has been out since April 2014.

Not an ideal situation.

Would it be possible to shade / repackage asm for lucene-expressions? In that 
case there will be no class conflicts when using lucene in a project where 
another asm is also used?

-Rob



[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194516#comment-14194516
 ] 

David Smiley commented on SOLR-6637:


FYI I already created an issue in JIRA for this: 
https://issues.apache.org/jira/browse/SOLR-4545

 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, 
 SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch right now, I realized that during backup we only 
 back up the index. Should we back up the conf files also? Any thoughts? I 
 could create a separate Jira for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194522#comment-14194522
 ] 

Varun Thacker commented on SOLR-6637:
-

[~dsmiley] I had not seen that issue previously. Should we move the work 
there?

bq. A proposed restore command to the replication handler should allow 
specifying a directory, or an as-of date; otherwise you'd get the most recent 
snapshot.

My approach here has been to allow restoring named snapshots ( SOLR-5340 ) 
only. We can add functionality that says that if the name is not provided 
then we restore the most recent snapshot. 

 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, 
 SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch right now, I realized that during backup we only 
 back up the index. Should we back up the conf files also? Any thoughts? I 
 could create a separate Jira for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4

2014-11-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194553#comment-14194553
 ] 

Robert Muir commented on LUCENE-6044:
-

+1

 Add backcompat for TokenFilters with posInc=false before 4.4
 

 Key: LUCENE-6044
 URL: https://issues.apache.org/jira/browse/LUCENE-6044
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst
 Attachments: LUCENE-6044.patch


 In Lucene 4.4, a number of token filters supporting the 
 enablePositionIncrements=false setting were changed to default to true.  
 However, with Lucene 5.0, the setting was removed altogether.  We should have 
 backcompat for this setting, as well as work when used with a 
 TokenFilterFactory and match version < 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194555#comment-14194555
 ] 

Noble Paul commented on SOLR-6637:
--

[~varunthacker] Can I not restore the data to another core? 

bq. If the location is not provided in the query string then we default it to 
core.getDataDir()

What is the use case for restoring from the dataDir itself?

bq. Remove any files in the current directory which do not belong to the 
segment.

DON'T DO THIS

There is a mechanism for loading the index from another directory in the 
same core.

 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, 
 SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch right now, I realized that during backup we only 
 back up the index. Should we back up the conf files also? Any thoughts? I 
 could create a separate Jira for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194563#comment-14194563
 ] 

Varun Thacker commented on SOLR-6637:
-

bq. What is the use case for restoring from the dataDir itself?

What I meant here was: if the location param is not provided, it would look for 
the backup index under dataDir/backupName.

bq. Remove any files in the current directory which do not belong to the 
segment.

This is what I do here: once all the files from the backup location have been 
successfully copied over the current index, there might be extra segment files 
from the current index lying around. They get cleaned up in 
cleanupOldIndexFiles(), where we take the name of the segments file from the 
backup index and see which files are extra. We then remove these extra segment 
files.
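
As a rough sketch of that cleanup step, assuming it boils down to "delete every 
file in the index directory that is not part of the restored backup"; indexDir 
and backupDir are illustrative names, and this is plain java.nio, not the 
patch's actual code:

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class RestoreCleanupSketch {
    // Delete files in indexDir whose names do not occur in backupDir.
    // Assumes a flat directory of index files, as Lucene index dirs are.
    static void cleanupOldIndexFiles(Path indexDir, Path backupDir) throws IOException {
        Set<String> restored = new HashSet<>();
        try (DirectoryStream<Path> backup = Files.newDirectoryStream(backupDir)) {
            for (Path p : backup) {
                restored.add(p.getFileName().toString());
            }
        }
        try (DirectoryStream<Path> current = Files.newDirectoryStream(indexDir)) {
            for (Path p : current) {
                if (!restored.contains(p.getFileName().toString())) {
                    Files.delete(p); // leftover segment file from the old index
                }
            }
        }
    }
}
{code}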


 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, 
 SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch right now, I realized that during backup we only 
 back up the index. Should we back up the conf files also? Any thoughts? I 
 could create a separate Jira for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194578#comment-14194578
 ] 

ASF subversion and git services commented on SOLR-6670:
---

Commit 1636363 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1636363 ]

SOLR-6670: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE. corrected typo

 change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
 ---

 Key: SOLR-6670
 URL: https://issues.apache.org/jira/browse/SOLR-6670
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6670.patch


 JIRA for Jan's comments on SOLR-6513:
 I thought we agreed to prefer the term shard over slice, so I think we 
 should do this for this API as well.
 The only place in our refguide we use the word slice is in How SolrCloud 
 Works [1] and that description is disputed.
 The refguide explanation of what a shard is can be found in Shards and 
 Indexing Data in SolrCloud [2], quoting:
 When your data is too large for one node, you can break it up and store it in 
 sections by creating one or more shards. Each is a portion of the logical 
 index, or core, and it's the set of all nodes containing that section of the 
 index.
 So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of 
 [1].
 [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
 [2] 
 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
 Note Mark's comment on that JIRA, but I think it would be best to continue to 
 talk about shards with user-facing operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194581#comment-14194581
 ] 

ASF subversion and git services commented on SOLR-6670:
---

Commit 1636364 from [~erickoerickson] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636364 ]

SOLR-6670: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE. corrected typo

 change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
 ---

 Key: SOLR-6670
 URL: https://issues.apache.org/jira/browse/SOLR-6670
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6670.patch


 JIRA for Jan's comments on SOLR-6513:
 I thought we agreed to prefer the term shard over slice, so I think we 
 should do this for this API as well.
 The only place in our refguide we use the word slice is in How SolrCloud 
 Works [1] and that description is disputed.
 The refguide explanation of what a shard is can be found in Shards and 
 Indexing Data in SolrCloud [2], quoting:
 When your data is too large for one node, you can break it up and store it in 
 sections by creating one or more shards. Each is a portion of the logical 
 index, or core, and it's the set of all nodes containing that section of the 
 index.
 So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of 
 [1].
 [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
 [2] 
 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
 Note Mark's comment on that JIRA, but I think it would be best to continue to 
 talk about shards with user-facing operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2014-11-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194582#comment-14194582
 ] 

Erick Erickson commented on SOLR-6365:
--

Great!

 specify  appends, defaults, invariants outside of the component
 ---

 Key: SOLR-6365
 URL: https://issues.apache.org/jira/browse/SOLR-6365
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0, Trunk

 Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, 
 SOLR-6365.patch, SOLR-6365.patch


 The components are configured in solrconfig.xml mostly for specifying these 
 extra parameters. If we separate these out, we can avoid specifying the 
 components altogether and make solrconfig much simpler. Eventually we want 
 users to see all functions as paths instead of components and control these 
 params from outside, through an API, persisted in ZK.
 Objectives:
 * define standard components implicitly and let users override some params 
 only
 * reuse standard params across components
 * define multiple param sets and mix and match these params at request time
 Example:
 {code:xml}
 <!-- use json for all paths and _txt as the default search field -->
 <initParams name="global" path="/**">
   <lst name="defaults">
     <str name="wt">json</str>
     <str name="df">_txt</str>
   </lst>
 </initParams>
 {code}
 Other examples:
 {code:xml}
 <initParams name="a" path="/dump3,/root/*,/root1/**">
   <lst name="defaults">
     <str name="a">A</str>
   </lst>
   <lst name="invariants">
     <str name="b">B</str>
   </lst>
   <lst name="appends">
     <str name="c">C</str>
   </lst>
 </initParams>
 <requestHandler name="/dump3" class="DumpRequestHandler"/>
 <requestHandler name="/dump4" class="DumpRequestHandler"/>
 <requestHandler name="/root/dump5" class="DumpRequestHandler"/>
 <requestHandler name="/root1/anotherlevel/dump6" class="DumpRequestHandler"/>
 <requestHandler name="/dump1" class="DumpRequestHandler" initParams="a"/>
 <requestHandler name="/dump2" class="DumpRequestHandler" initParams="a">
   <lst name="defaults">
     <str name="a">A1</str>
   </lst>
   <lst name="invariants">
     <str name="b">B1</str>
   </lst>
   <lst name="appends">
     <str name="c">C1</str>
   </lst>
 </requestHandler>
 {code}
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194585#comment-14194585
 ] 

Michael McCandless commented on LUCENE-6046:


OK, I boiled down the adversarial regexp to this simpler still-adversarial 
version: [ac]*a[ac]{50,200}

I suspect this is a legitimate adversary and not a bug in our RegExp/automaton 
impl, i.e. the number of states in the DFA for this is exponential as a 
function of the 50/200.
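
As a small self-contained illustration (not Lucene code), naive subset 
construction for the NFA of [ac]*a[ac]{n} over the alphabet {a,c} shows the 
DFA state count roughly doubling with each increment of n:

{code:java}
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class BlowupDemo {
    public static void main(String[] args) {
        for (int n = 1; n <= 16; n++) {
            System.out.println("n=" + n + " dfaStates=" + dfaStates(n));
        }
    }

    // NFA for [ac]*a[ac]{n}: state 0 loops on {a,c} and also moves to state 1
    // on 'a'; states 1..n advance by one on {a,c}; state n+1 accepts.
    static int dfaStates(int n) {
        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = Collections.singleton(0);
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> cur = work.poll();
            for (char ch : new char[] {'a', 'c'}) {
                Set<Integer> next = new HashSet<>();
                for (int s : cur) {
                    if (s == 0) {
                        next.add(0);                // [ac]* self-loop
                        if (ch == 'a') next.add(1); // begin the mandatory tail
                    } else if (s <= n) {
                        next.add(s + 1);            // consume one of [ac]{n}
                    }
                }
                if (!next.isEmpty() && seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return seen.size();
    }
}
{code}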

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor

 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194592#comment-14194592
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh yeah, it's totally running into 2^n territory legitimately here. This is 
totally something that'd be rejected by a framework to prevent explosive growth 
during determinization.

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor

 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194597#comment-14194597
 ] 

ASF subversion and git services commented on LUCENE-6044:
-

Commit 1636368 from [~rjernst] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636368 ]

LUCENE-6044: Fixed backcompat support for token filters with 
enablePositionIncrements=false

 Add backcompat for TokenFilters with posInc=false before 4.4
 

 Key: LUCENE-6044
 URL: https://issues.apache.org/jira/browse/LUCENE-6044
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst
 Attachments: LUCENE-6044.patch


 In Lucene 4.4, a number of token filters supporting the 
 enablePositionIncrements=false setting were changed to default to true.  
 However, with Lucene 5.0, the setting was removed altogether.  We should have 
 backcompat for this setting, as well as work when used with a 
 TokenFilterFactory and match version < 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4

2014-11-03 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst resolved LUCENE-6044.

Resolution: Fixed

 Add backcompat for TokenFilters with posInc=false before 4.4
 

 Key: LUCENE-6044
 URL: https://issues.apache.org/jira/browse/LUCENE-6044
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst
 Attachments: LUCENE-6044.patch


 In Lucene 4.4, a number of token filters supporting the 
 enablePositionIncrements=false setting were changed to default to true.  
 However, with Lucene 5.0, the setting was removed altogether.  We should have 
 backcompat for this setting, as well as work when used with a 
 TokenFilterFactory and match version < 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6044) Add backcompat for TokenFilters with posInc=false before 4.4

2014-11-03 Thread Ryan Ernst (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ernst updated LUCENE-6044:
---
Fix Version/s: 5.0
 Assignee: Ryan Ernst

 Add backcompat for TokenFilters with posInc=false before 4.4
 

 Key: LUCENE-6044
 URL: https://issues.apache.org/jira/browse/LUCENE-6044
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst
Assignee: Ryan Ernst
 Fix For: 5.0

 Attachments: LUCENE-6044.patch


 In Lucene 4.4, a number of token filters supporting the 
 enablePositionIncrements=false setting were changed to default to true.  
 However, with Lucene 5.0, the setting was removed altogether.  We should have 
 backcompat for this setting, as well as work when used with a 
 TokenFilterFactory and match version < 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6637) Solr should have a way to restore a core

2014-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194608#comment-14194608
 ] 

David Smiley commented on SOLR-6637:


bq. David Smiley I had not seen that issue previously. Should we move the work 
there?

No, it's too late now.  Next time please search for an existing issue.  
SOLR-4545 can be closed as a duplicate so long as you can restore a snapshot 
without being required to specify its name.  A timestamp would be nice.

 Solr should have a way to restore a core
 

 Key: SOLR-6637
 URL: https://issues.apache.org/jira/browse/SOLR-6637
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6637.patch, SOLR-6637.patch, SOLR-6637.patch, 
 SOLR-6637.patch


 We have a core backup command which backs up the index. We should have a 
 restore command too. 
 This would restore any named snapshot created by the replication handler's 
 backup command.
 While working on this patch right now, I realized that during backup we only 
 back up the index. Should we back up the conf files also? Any thoughts? I 
 could create a separate Jira for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194635#comment-14194635
 ] 

Erick Erickson commented on SOLR-6517:
--

Well, it's worked in every test, both manual and automated, that I've run so 
far. Do you have a failure that demonstrates this?

Maybe a mismatch in expectations? REBALANCELEADERS does _not_, and was not 
designed to, force the rebalancing immediately for nodes that do not have the 
preferredLeader property already set. It simply makes leaders out of those 
nodes that _already_ have the preferredLeader property set and are not 
currently the leader.

So to rebalance the leaders across the cluster, you first need to 
BALANCESHARDUNIQUE with the preferredLeader property and _then_ issue the 
REBALANCELEADERS command. That way it's not required that the entire cluster be 
balanced; you can selectively assign _some_ preferredLeaders if you want.

Or am I missing the boat completely? How do you see it not working?
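
To make the two-step workflow concrete, a hypothetical client-side sketch; the 
host, port, and collection name are illustrative, and the calls simply hit the 
Collections API actions named above over HTTP:

{code:java}
import java.io.InputStream;
import java.net.URL;

public class RebalanceSketch {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr/admin/collections";
        // Step 1: spread the preferredLeader property, one replica per shard.
        call(base + "?action=BALANCESHARDUNIQUE&collection=collection1&property=preferredLeader");
        // Step 2: make the replicas holding preferredLeader the actual leaders.
        call(base + "?action=REBALANCELEADERS&collection=collection1");
    }

    static void call(String url) throws Exception {
        try (InputStream in = new URL(url).openStream()) {
            System.out.println(new String(in.readAllBytes()));
        }
    }
}
{code}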

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command to "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194644#comment-14194644
 ] 

Noble Paul commented on SOLR-6517:
--

From the code, what I see is, a message is sent to overseer to change the 
leader. But there is not action performed to change the actual election queue. 
The role in the clusterstate is just a reflection of what should be there in 
the election queue and not the other way around. 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command to "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194644#comment-14194644
 ] 

Noble Paul edited comment on SOLR-6517 at 11/3/14 3:50 PM:
---

From the code, what I see is, a message is sent to overseer to change the 
leader. But there is no action performed to change the actual election queue. 
The role in the clusterstate is just a reflection of what should be there in 
the election queue and not the other way around. 


was (Author: noble.paul):
From the code, what I see is, a message is sent to overseer to change the 
leader. But there is not action performed to change the actual election queue. 
The role in the clusterstate is just a reflection of what should be there in 
the election queue and not the other way around. 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command to "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners

2014-11-03 Thread Fabio Piro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687
 ] 

Fabio Piro commented on SOLR-6645:
--

_Friendly Reminder #2_

Hello, has anyone by any chance taken a look at the patch, between some trick 
or treats?

 Refactored DocumentObjectBinder and added AnnotationListeners
 -

 Key: SOLR-6645
 URL: https://issues.apache.org/jira/browse/SOLR-6645
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.10.2
Reporter: Fabio Piro
  Labels: annotations, binder, listener, solrj
 Fix For: 5.0, Trunk

 Attachments: SOLR-6645.patch


 Hello good people.
 It is understandable that the priority of SolrJ is to provide a stable API 
 for Java and not a rich-feature client; I'm well aware of that. On the other 
 hand, more features nowadays mean, most of the time, Spring Solr Data. Although 
 I appreciate the enrichment work of that lib, sometimes depending on its 
 monolithic dependencies and magic is not a valid option.
 So, I was thinking that the official DocumentObjectBinder could benefit from 
 some love, and I have implemented a listener pattern for the annotations. 
 *Note: No new logic or new annotations were introduced; the patch is only a 
 refactor to make the current DocumentObjectBinder and @Field/DocField more 
 extensible for the user.*
 You can register your annotations and their related listeners in the binder, 
 and it will invoke the corresponding method in the listener on getBean and on 
 toSolrInputDocument, thus granting the chance to do something during the 
 ongoing process.
 Changes are:
 * [MOD] */beans/DocumentObjectBinder*:  The new logic and a new constructor 
 for registering the annotations
 * [ADD] */impl/AccessorAnnotationListener*: Abstract utility class with the 
 former get(), set(), isArray, isList, isContainedInMap etc...
 * [ADD] */impl/FieldAnnotationListener*: all the rest of DocField for dealing 
 with @Field
 * [ADD] */AnnotationListener*: the base listener class
 * [MOD] */SolrServer*: added setBinder (this is the only tricky change, I 
 hope it's not a problem).
 It's all well documented and the code is very easy to read. Tests are all 
 green, it should be 100% backward compatible, and the performance impact is 
 nil (the logic flow is exactly the same as now, and I only changed the bare 
 essentials and nothing more, anyway).
 Some Examples (they are not part of the pull-request):
 The long awaited @FieldObject in 4 lines of code:
 https://issues.apache.org/jira/browse/SOLR-1945
 {code:java}
 public class FieldObjectAnnotationListener extends AccessorAnnotationListener<FieldObject> {
     public FieldObjectAnnotationListener(AnnotatedElement element, FieldObject annotation) {
         super(element, annotation);
     }

     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         Object nested = binder.getBean(target.clazz, doc);
         setTo(obj, nested);
     }

     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         for (Map.Entry<String, SolrInputField> entry : nested.entrySet()) {
             doc.addField(entry.getKey(), entry.getValue());
         }
     }
 }
 {code}
 Or something entirely new like an annotation for ChildDocuments:
 {code:java}
 public class ChildDocumentsAnnotationListener extends AccessorAnnotationListener<ChildDocuments> {
     public ChildDocumentsAnnotationListener(AnnotatedElement element, ChildDocuments annotation) {
         super(element, annotation);
         if (!target.isInList || target.clazz.isPrimitive()) {
             throw new BindingException("@NestedDocuments is applicable only on List<Object>.");
         }
     }

     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         List<Object> nested = new ArrayList<>();
         for (SolrDocument child : doc.getChildDocuments()) {
             nested.add(binder.getBean(target.clazz, child)); // this should be recursive, but it's only an example
         }
         setTo(obj, nested);
     }

     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         doc.addChildDocuments(nested.getChildDocuments());
     }
 }
 {code}
 In addition, all the logic is encapsulated in the listener, so you can make a 
 custom FieldAnnotationListener too, and override the default one
 {code:java}
 public class CustomFieldAnnotationListener extends 

[jira] [Comment Edited] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners

2014-11-03 Thread Fabio Piro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687
 ] 

Fabio Piro edited comment on SOLR-6645 at 11/3/14 4:24 PM:
---

__Friendly Reminder #2__

Hello, has anyone by any chance taken a look at the patch, between some trick 
or treats?


was (Author: dewos):
_Friendly Reminder #2_

Hello, has anyone by any chance taken a look at the patch, between some trick 
or treats?

 Refactored DocumentObjectBinder and added AnnotationListeners
 -

 Key: SOLR-6645
 URL: https://issues.apache.org/jira/browse/SOLR-6645
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.10.2
Reporter: Fabio Piro
  Labels: annotations, binder, listener, solrj
 Fix For: 5.0, Trunk

 Attachments: SOLR-6645.patch


 Hello good people.
 It is understandable that the priority of SolrJ is to provide a stable API 
 for Java and not a rich-feature client; I'm well aware of that. On the other 
 hand, more features nowadays mean, most of the time, Spring Solr Data. Although 
 I appreciate the enrichment work of that lib, sometimes depending on its 
 monolithic dependencies and magic is not a valid option.
 So, I was thinking that the official DocumentObjectBinder could benefit from 
 some love, and I have implemented a listener pattern for the annotations. 
 *Note: No new logic or new annotations were introduced; the patch is only a 
 refactor to make the current DocumentObjectBinder and @Field/DocField more 
 extensible for the user.*
 You can register your annotations and their related listeners in the binder, 
 and it will invoke the corresponding method in the listener on getBean and on 
 toSolrInputDocument, thus granting the chance to do something during the 
 ongoing process.
 Changes are:
 * [MOD] */beans/DocumentObjectBinder*:  The new logic and a new constructor 
 for registering the annotations
 * [ADD] */impl/AccessorAnnotationListener*: Abstract utility class with the 
 former get(), set(), isArray, isList, isContainedInMap etc...
 * [ADD] */impl/FieldAnnotationListener*: all the rest of DocField for dealing 
 with @Field
 * [ADD] */AnnotationListener*: the base listener class
 * [MOD] */SolrServer*: added setBinder (this is the only tricky change, I 
 hope it's not a problem).
 It's all well documented and the code is very easy to read. Tests are all 
 green, it should be 100% backward compatible, and the performance impact is 
 nil (the logic flow is exactly the same as now, and I only changed the bare 
 essentials and nothing more, anyway).
 Some Examples (they are not part of the pull-request):
 The long awaited @FieldObject in 4 lines of code:
 https://issues.apache.org/jira/browse/SOLR-1945
 {code:java}
 public class FieldObjectAnnotationListener extends AccessorAnnotationListener<FieldObject> {
     public FieldObjectAnnotationListener(AnnotatedElement element, FieldObject annotation) {
         super(element, annotation);
     }

     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         Object nested = binder.getBean(target.clazz, doc);
         setTo(obj, nested);
     }

     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         for (Map.Entry<String, SolrInputField> entry : nested.entrySet()) {
             doc.addField(entry.getKey(), entry.getValue());
         }
     }
 }
 {code}
 Or something entirely new like an annotation for ChildDocuments:
 {code:java}
 public class ChildDocumentsAnnotationListener extends AccessorAnnotationListener<ChildDocuments> {
     public ChildDocumentsAnnotationListener(AnnotatedElement element, ChildDocuments annotation) {
         super(element, annotation);
         if (!target.isInList || target.clazz.isPrimitive()) {
             throw new BindingException("@NestedDocuments is applicable only on List<Object>.");
         }
     }

     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         List<Object> nested = new ArrayList<>();
         for (SolrDocument child : doc.getChildDocuments()) {
             nested.add(binder.getBean(target.clazz, child)); // this should be recursive, but it's only an example
         }
         setTo(obj, nested);
     }

     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         doc.addChildDocuments(nested.getChildDocuments());
     }
 }
 {code}
 In addition, all the logic is 

[jira] [Comment Edited] (SOLR-6645) Refactored DocumentObjectBinder and added AnnotationListeners

2014-11-03 Thread Fabio Piro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194687#comment-14194687
 ] 

Fabio Piro edited comment on SOLR-6645 at 11/3/14 4:24 PM:
---

*Friendly Reminder #2*

Hello, has anyone by any chance taken a look at the patch, between some trick 
or treats?


was (Author: dewos):
__Friendly Reminder #2__

Hello, has anyone by any chance taken a look at the patch, between some trick 
or treats?

 Refactored DocumentObjectBinder and added AnnotationListeners
 -

 Key: SOLR-6645
 URL: https://issues.apache.org/jira/browse/SOLR-6645
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.10.2
Reporter: Fabio Piro
  Labels: annotations, binder, listener, solrj
 Fix For: 5.0, Trunk

 Attachments: SOLR-6645.patch


 Hello good people.
 It is understandable that the priority of SolrJ is to provide a stable API 
 for Java and not a rich-feature client; I'm well aware of that. On the other 
 hand, more features nowadays mean, most of the time, Spring Solr Data. Although 
 I appreciate the enrichment work of that lib, sometimes depending on its 
 monolithic dependencies and magic is not a valid option.
 So, I was thinking that the official DocumentObjectBinder could benefit from 
 some love, and I have implemented a listener pattern for the annotations. 
 *Note: No new logic or new annotations were introduced; the patch is only a 
 refactor to make the current DocumentObjectBinder and @Field/DocField more 
 extensible for the user.*
 You can register your annotations and their related listeners in the binder, 
 and it will invoke the corresponding method in the listener on getBean and on 
 toSolrInputDocument, thus granting the chance to do something during the 
 ongoing process.
 Changes are:
 * [MOD] */beans/DocumentObjectBinder*:  The new logic and a new constructor 
 for registering the annotations
 * [ADD] */impl/AccessorAnnotationListener*: Abstract utility class with the 
 former get(), set(), isArray, isList, isContainedInMap etc...
 * [ADD] */impl/FieldAnnotationListener*: all the rest of DocField for dealing 
 with @Field
 * [ADD] */AnnotationListener*: the base listener class
 * [MOD] */SolrServer*: added setBinder (this is the only tricky change, I 
 hope it's not a problem).
 It's all well documented and the code is very easy to read. Tests are all 
 green, it should be 100% backward compatible, and the performance impact is 
 nil (the logic flow is exactly the same as now, and I only changed the bare 
 essentials and nothing more, anyway).
 Some Examples (they are not part of the pull-request):
 The long awaited @FieldObject in 4 lines of code:
 https://issues.apache.org/jira/browse/SOLR-1945
 {code:java}
 public class FieldObjectAnnotationListener extends AccessorAnnotationListener<FieldObject> {
     public FieldObjectAnnotationListener(AnnotatedElement element, FieldObject annotation) {
         super(element, annotation);
     }

     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         Object nested = binder.getBean(target.clazz, doc);
         setTo(obj, nested);
     }

     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         for (Map.Entry<String, SolrInputField> entry : nested.entrySet()) {
             doc.addField(entry.getKey(), entry.getValue());
         }
     }
 }
 {code}
 Or something entirely new like an annotation for ChildDocuments:
 {code:java}
 public class ChildDocumentsAnnotationListener extends AccessorAnnotationListener<ChildDocuments> {
 
     public ChildDocumentsAnnotationListener(AnnotatedElement element, ChildDocuments annotation) {
         super(element, annotation);
         if (!target.isInList || target.clazz.isPrimitive()) {
             throw new BindingException("@ChildDocuments is applicable only on List<Object>.");
         }
     }
 
     @Override
     public void onGetBean(Object obj, SolrDocument doc, DocumentObjectBinder binder) {
         List<Object> nested = new ArrayList<>();
         for (SolrDocument child : doc.getChildDocuments()) {
             nested.add(binder.getBean(target.clazz, child)); // this should be recursive, but it's only an example
         }
         setTo(obj, nested);
     }
 
     @Override
     public void onToSolrInputDocument(Object obj, SolrInputDocument doc, DocumentObjectBinder binder) {
         SolrInputDocument nested = binder.toSolrInputDocument(getFrom(obj));
         doc.addChildDocuments(nested.getChildDocuments());
     }
 }
 {code}
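 And a purely hypothetical wiring sketch (not part of the patch; the exact 
 registration signature is an assumption based on the "new constructor for 
 registering the annotations" noted above, and server is any SolrServer):
 {code:java}
 // Hypothetical usage only: assumes the new DocumentObjectBinder constructor
 // accepts a map from annotation type to its AnnotationListener class.
 Map<Class<? extends Annotation>, Class<? extends AnnotationListener>> listeners =
         new HashMap<>();
 listeners.put(FieldObject.class, FieldObjectAnnotationListener.class);
 listeners.put(ChildDocuments.class, ChildDocumentsAnnotationListener.class);
 
 DocumentObjectBinder binder = new DocumentObjectBinder(listeners);
 server.setBinder(binder); // setBinder is the SolrServer addition listed above
 {code}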
 In addition, all the logic is 

[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194701#comment-14194701
 ] 

Noble Paul commented on SOLR-6517:
--

Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put 
extra care while making changes to "cloud". I've spent weeks debugging three 
overseer roles feature because it only failed in our 120 node cluster (never in 
the junit tests) 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194701#comment-14194701
 ] 

Noble Paul edited comment on SOLR-6517 at 11/3/14 4:35 PM:
---

Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put 
extra care while making changes to "cloud". I've spent weeks debugging the 
overseer roles feature because it only failed in our 120 node cluster (never in 
the junit tests) 


was (Author: noble.paul):
Unfortunately, solrcloud failures are hard to reproduce and fix. We need to put 
extra care while making changes to "cloud". I've spent weeks debugging three 
overseer roles feature because it only failed in our 120 node cluster (never in 
the junit tests) 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-6046:

Attachment: LUCENE-6046.patch

First cut at a patch.  Adds maxDeterminizedStates to Operations.determinize and 
pipes it through to tons of places. I think it's important never to hide when 
determinize is called because of how potentially heavy it is. Forcing callers 
of MinimizationOperations.minimize, Operations.reverse, Operations.minus, etc. 
to specify maxDeterminizedStates makes it pretty clear that the automaton might 
be determinized during those processes.

I added an unchecked exception for when the Automaton can't be determinized 
within the specified number of states, but I'm really tempted to change it to a 
checked exception to make it super duper obvious when determinization might 
occur.
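
To make the call pattern concrete, here's a minimal sketch of how a caller 
might use the proposed parameter (the exact exception type is an assumption; 
the patch only says an unchecked exception is thrown when the state budget is 
exceeded):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

public class DeterminizeBudgetExample {
  public static void main(String[] args) {
    Automaton a = new RegExp("a(b|c)*d").toAutomaton();
    int maxDeterminizedStates = 10000; // explicit, caller-visible state budget
    try {
      // with the patch, the budget must be passed explicitly so callers can
      // never accidentally trigger an unbounded determinization
      Automaton det = Operations.determinize(a, maxDeterminizedStates);
      System.out.println("determinized into " + det.getNumStates() + " states");
    } catch (RuntimeException e) {
      // thrown (unchecked, per the patch) when the budget is exceeded
      System.err.println("determinization too heavy: " + e.getMessage());
    }
  }
}
{code}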

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194716#comment-14194716
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh - I'm still running the solr tests against this.  I imagine they'll pass, as 
they've been running fine for 30 minutes now, but I should throw that out there 
in case someone gets them to fail with this before I do.

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop

2014-11-03 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6591:

Attachment: SOLR-6591-ignore-no-collection-path.patch

{quote}
A rapid create+delete loop for collections with state format > 1 causes the 
above exception to happen. This is because the updateZkState method assumes 
that the collection exists and it tries to write to 
/collections/collection_name/state.json directly without verifying whether the 
/collections/collection_name zk node exists
{quote}

This patch ignores state messages which are trying to create new collections 
when the parent zk path doesn't exist. I've added the following comment in the 
code to explain the situation:
{quote}
// if the /collections/collection_name path doesn't exist then it means that
// 1) the user invoked a DELETE collection API and the OverseerCollectionProcessor has deleted
//    this zk path.
// 2) these are most likely old state messages which are only being processed now because
//    if they were new state messages then in legacy mode, a new collection would have been
//    created with stateFormat = 1 (which is the default state format)
// 3) these can't be new state messages created for a new collection because
//    otherwise the OverseerCollectionProcessor would have already created this path
//    as part of the create collection API call -- which is the only way in which a collection
//    with stateFormat > 1 can possibly be created
{quote}
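
For illustration, a minimal sketch of the guard that comment describes, 
assuming a SolrZkClient handle and with collectionName, log and clusterState 
in scope (the actual Overseer code differs in its surroundings):

{code:java}
// Sketch only: skip state updates whose parent collection znode is gone.
String path = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collectionName;
if (!zkClient.exists(path, true)) {
  // the parent path was deleted (or never created), so this must be a stale
  // or orphaned state message: ignore it and leave the cluster state as-is
  log.warn("Ignoring state update for non-existent collection path {}", path);
  return clusterState;
}
{code}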



 Cluster state updates can be lost on exception in main queue loop
 -

 Key: SOLR-6591
 URL: https://issues.apache.org/jira/browse/SOLR-6591
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: Trunk
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: Trunk

 Attachments: SOLR-6591-constructStateFix.patch, 
 SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, 
 SOLR-6591.patch


 I found this bug while going through the failure on jenkins:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
 {code}
 2 tests failed.
 REGRESSION:  
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
 Error Message:
 Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
 core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
 core: halfcollection_shard1_replica1
 Stack Trace:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
 CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
 [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
 halfcollection_shard1_replica1
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 1877 - Failure!

2014-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/1877/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC (asserts: true)

1 tests failed.
REGRESSION:  
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
core: halfcollection_shard1_replica1

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
[halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
halfcollection_shard1_replica1
at 
__randomizedtesting.SeedInfo.seed([BB86E03433744719:3A606E2C442B2725]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:569)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop

2014-11-03 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6591:

Attachment: (was: SOLR-6591-ignore-no-collection-path.patch)

 Cluster state updates can be lost on exception in main queue loop
 -

 Key: SOLR-6591
 URL: https://issues.apache.org/jira/browse/SOLR-6591
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: Trunk
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: Trunk

 Attachments: SOLR-6591-constructStateFix.patch, 
 SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch


 I found this bug while going through the failure on jenkins:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
 {code}
 2 tests failed.
 REGRESSION:  
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
 Error Message:
 Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
 core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
 core: halfcollection_shard1_replica1
 Stack Trace:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
 CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
 [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
 halfcollection_shard1_replica1
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop

2014-11-03 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6591:

Attachment: SOLR-6591-ignore-no-collection-path.patch

With the right patch (SOLR-6591-ignore-no-collection-path.patch) this time.

 Cluster state updates can be lost on exception in main queue loop
 -

 Key: SOLR-6591
 URL: https://issues.apache.org/jira/browse/SOLR-6591
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: Trunk
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: Trunk

 Attachments: SOLR-6591-constructStateFix.patch, 
 SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, 
 SOLR-6591.patch


 I found this bug while going through the failure on jenkins:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
 {code}
 2 tests failed.
 REGRESSION:  
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
 Error Message:
 Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
 core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
 core: halfcollection_shard1_replica1
 Stack Trace:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
 CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
 [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
 halfcollection_shard1_replica1
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6690) Highlight expanded results

2014-11-03 Thread Simon Endele (JIRA)
Simon Endele created SOLR-6690:
--

 Summary: Highlight expanded results
 Key: SOLR-6690
 URL: https://issues.apache.org/jira/browse/SOLR-6690
 Project: Solr
  Issue Type: Wish
Reporter: Simon Endele
Priority: Minor


Is it possible to apply the highlighting to documents in the "expand" section 
in the Solr response?

I'm aware that https://cwiki.apache.org/confluence/x/jiBqAg states:
"All downstream components (faceting, highlighting, etc...) will work with the 
collapsed result set."

So I tried to put the highlight component after the expand component like this:
{code:xml}
<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>stats</str>
  <str>debug</str>
  <str>expand</str>
  <str>highlight</str>
</arr>
{code}
But with no effect.

Is there another switch that needs to be flipped or could this be implemented 
easily?
IMHO this is quite a common use case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194756#comment-14194756
 ] 

ASF subversion and git services commented on SOLR-6591:
---

Commit 1636400 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1636400 ]

SOLR-6591: Ignore overseer operations for collections with stateFormat  1 if 
the parent ZK path doesn't exist

 Cluster state updates can be lost on exception in main queue loop
 -

 Key: SOLR-6591
 URL: https://issues.apache.org/jira/browse/SOLR-6591
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: Trunk
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: Trunk

 Attachments: SOLR-6591-constructStateFix.patch, 
 SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, 
 SOLR-6591.patch


 I found this bug while going through the failure on jenkins:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
 {code}
 2 tests failed.
 REGRESSION:  
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
 Error Message:
 Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
 core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
 core: halfcollection_shard1_replica1
 Stack Trace:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
 CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
 [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
 halfcollection_shard1_replica1
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

2014-11-03 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-6691:


 Summary: REBALANCELEADERS needs to change the leader election 
queue.
 Key: SOLR-6691
 URL: https://issues.apache.org/jira/browse/SOLR-6691
 Project: Solr
  Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson


The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
node1 <- node2
      <- node3 <- node4

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.
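
For reference, the watch chain above follows the usual ZooKeeper 
leader-election recipe: each candidate watches only the ephemeral node 
immediately ahead of it. A minimal sketch with raw ZooKeeper APIs, assuming 
zk, electionPath, myNodeName and watcher are in scope (Solr's LeaderElector 
differs in detail, and the path layout here is illustrative):

{code:java}
// Sketch of the predecessor-watch pattern described above.
List<String> nodes = zk.getChildren(electionPath, false);
Collections.sort(nodes); // sequence suffixes order the election queue
int me = nodes.indexOf(myNodeName);
if (me > 0) {
  String predecessor = nodes.get(me - 1);
  // when the predecessor's ephemeral node vanishes we get notified and
  // re-check whether we are now at the head of the queue (i.e. the leader)
  zk.exists(electionPath + "/" + predecessor, watcher);
}
{code}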



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop

2014-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194760#comment-14194760
 ] 

ASF subversion and git services commented on SOLR-6591:
---

Commit 1636401 from sha...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636401 ]

SOLR-6591: Ignore overseer operations for collections with stateFormat  1 if 
the parent ZK path doesn't exist

 Cluster state updates can be lost on exception in main queue loop
 -

 Key: SOLR-6591
 URL: https://issues.apache.org/jira/browse/SOLR-6591
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: Trunk
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: Trunk

 Attachments: SOLR-6591-constructStateFix.patch, 
 SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, 
 SOLR-6591.patch


 I found this bug while going through the failure on jenkins:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
 {code}
 2 tests failed.
 REGRESSION:  
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
 Error Message:
 Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create 
 core [halfcollection_shard1_replica1] Caused by: Could not get shard id for 
 core: halfcollection_shard1_replica1
 Stack Trace:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
 CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core 
 [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: 
 halfcollection_shard1_replica1
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
 at 
 org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-11-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194761#comment-14194761
 ] 

Erick Erickson commented on SOLR-6517:
--

Gah. OK, the fact that the state information isn't reflective of the actual 
state is what was throwing me.

Let's move any further discussion over to SOLR-6691 (which I just created). 
I've tried to synopsize the discussion in that JIRA.

Thanks for your patience in explaining; mucking around in the cloud state kinda 
scares me... apparently for good reason.

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

2014-11-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6691:
-
Description: 
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{{code}
node1 <- node2
      <- node3 <- node4
{{code}}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.

  was:
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
node1 <- node2
      <- node3 <- node4

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.


 REBALANCELEADERS needs to change the leader election queue.
 ---

 Key: SOLR-6691
 URL: https://issues.apache.org/jira/browse/SOLR-6691
 Project: Solr
  Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson

 The original code (SOLR-6517) assumed that changes in the clusterstate after 
 issuing a command to the overseer to change the leader indicated that the 
 leader was successfully changed. Fortunately, Noble clued me in that this 
 isn't the case and that the potential leader needs to insert itself in the 
 leader election queue before triggering the change leader command.
 Inserting themselves at the front of the queue should probably happen in 
 BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
 [~noble.paul] Do evil things happen if a node joins at the head but it's 
 _already_ in the queue? These ephemeral nodes in the queue are watching each 
 other. So if node1 is the leader you have
 node1 <- node2 <- node3 <- node4
 where <- means watches.
 Now, if node3 puts itself at the head of the list, you have
 {{code}
 node1 <- node2
       <- node3 <- node4
 {{code}}
 I _think_ when I was looking at this it all just worked. 
 1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure 
 that node3 becomes the leader and node2 inserts itself at the end so it's 
 watching node 4.
 2> node 2 goes down, nobody gets notified and it doesn't matter.
 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
 inserting itself at the end of the list.
 4> node 4 goes down, nobody gets notified and it doesn't matter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

2014-11-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6691:
-
Description: 
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{{code}}
node1 <- node2
      <- node3 <- node4
{{code}}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.

  was:
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{{code}
node1 <- node2
      <- node3 <- node4
{{code}}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.


 REBALANCELEADERS needs to change the leader election queue.
 ---

 Key: SOLR-6691
 URL: https://issues.apache.org/jira/browse/SOLR-6691
 Project: Solr
  Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson

 The original code (SOLR-6517) assumed that changes in the clusterstate after 
 issuing a command to the overseer to change the leader indicated that the 
 leader was successfully changed. Fortunately, Noble clued me in that this 
 isn't the case and that the potential leader needs to insert itself in the 
 leader election queue before triggering the change leader command.
 Inserting themselves at the front of the queue should probably happen in 
 BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
 [~noble.paul] Do evil things happen if a node joins at the head but it's 
 _already_ in the queue? These ephemeral nodes in the queue are watching each 
 other. So if node1 is the leader you have
 node1 <- node2 <- node3 <- node4
 where <- means watches.
 Now, if node3 puts itself at the head of the list, you have
 {{code}}
 node1 <- node2
       <- node3 <- node4
 {{code}}
 I _think_ when I was looking at this it all just worked. 
 1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure 
 that node3 becomes the leader and node2 inserts itself at the end so it's 
 watching node 4.
 2> node 2 goes down, nobody gets notified and it doesn't matter.
 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
 inserting itself at the end of the list.
 4> node 4 goes down, nobody gets notified and it doesn't matter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

2014-11-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6691:
-
Description: 
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{code}
node1 <- node2
      <- node3 <- node4
{code}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.

  was:
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{{code}}
node1 <- node2
      <- node3 <- node4
{{code}}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.


 REBALANCELEADERS needs to change the leader election queue.
 ---

 Key: SOLR-6691
 URL: https://issues.apache.org/jira/browse/SOLR-6691
 Project: Solr
  Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson

 The original code (SOLR-6517) assumed that changes in the clusterstate after 
 issuing a command to the overseer to change the leader indicated that the 
 leader was successfully changed. Fortunately, Noble clued me in that this 
 isn't the case and that the potential leader needs to insert itself in the 
 leader election queue before triggering the change leader command.
 Inserting themselves at the front of the queue should probably happen in 
 BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
 [~noble.paul] Do evil things happen if a node joins at the head but it's 
 _already_ in the queue? These ephemeral nodes in the queue are watching each 
 other. So if node1 is the leader you have
 node1 <- node2 <- node3 <- node4
 where <- means watches.
 Now, if node3 puts itself at the head of the list, you have
 {code}
 node1 <- node2
       <- node3 <- node4
 {code}
 I _think_ when I was looking at this it all just worked. 
 1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure 
 that node3 becomes the leader and node2 inserts itself at the end so it's 
 watching node 4.
 2> node 2 goes down, nobody gets notified and it doesn't matter.
 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
 inserting itself at the end of the list.
 4> node 4 goes down, nobody gets notified and it doesn't matter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

2014-11-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6691:
-
Description: 
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{code}
node1 <- node2
      <- node3 <- node4
{code}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.

  was:
The original code (SOLR-6517) assumed that changes in the clusterstate after 
issuing a command to the overseer to change the leader indicated that the 
leader was successfully changed. Fortunately, Noble clued me in that this isn't 
the case and that the potential leader needs to insert itself in the leader 
election queue before triggering the change leader command.

Inserting themselves at the front of the queue should probably happen in 
BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.

[~noble.paul] Do evil things happen if a node joins at the head but it's 
_already_ in the queue? These ephemeral nodes in the queue are watching each 
other. So if node1 is the leader you have
node1 <- node2 <- node3 <- node4
where <- means watches.

Now, if node3 puts itself at the head of the list, you have
{code}
node1 <- node2
      <- node3 <- node4
{code}

I _think_ when I was looking at this it all just worked. 
1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that 
node3 becomes the leader and node2 inserts itself at the end so it's watching 
node 4.

2> node 2 goes down, nobody gets notified and it doesn't matter.

3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
inserting itself at the end of the list.

4> node 4 goes down, nobody gets notified and it doesn't matter.


 REBALANCELEADERS needs to change the leader election queue.
 ---

 Key: SOLR-6691
 URL: https://issues.apache.org/jira/browse/SOLR-6691
 Project: Solr
  Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson

 The original code (SOLR-6517) assumed that changes in the clusterstate after 
 issuing a command to the overseer to change the leader indicated that the 
 leader was successfully changed. Fortunately, Noble clued me in that this 
 isn't the case and that the potential leader needs to insert itself in the 
 leader election queue before triggering the change leader command.
 Inserting themselves at the front of the queue should probably happen in 
 BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
 [~noble.paul] Do evil things happen if a node joins at the head but it's 
 _already_ in the queue? These ephemeral nodes in the queue are watching each 
 other. So if node1 is the leader you have
 node1 <- node2 <- node3 <- node4
 where <- means watches.
 Now, if node3 puts itself at the head of the list, you have
 {code}
 node1 <- node2
       <- node3 <- node4
 {code}
 I _think_ when I was looking at this it all just worked. 
 1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure 
 that node3 becomes the leader and node2 inserts itself at the end so it's 
 watching node 4.
 2> node 2 goes down, nobody gets notified and it doesn't matter.
 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
 inserting itself at the end of the list.
 4> node 4 goes down, nobody gets notified and it doesn't matter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: An experience and some thoughts about solr/example - solr/server

2014-11-03 Thread Shawn Heisey
On 11/2/2014 5:57 PM, Erick Erickson wrote:
 I'm a little discomfited by having to learn new stuff, but that's a personal
 problem ;).

 I do think we have to be mindful of people who want something like what Shawn
 was doing, I do this all the time as well. And of new people who haven't a 
 clue.
 Hmmm, actually new folks might have an easier time of it since they don't
 have any expectations ;).

 bq: ...'run example' target that could also fire off a create for 
 collection1.

 Exactly, with a note (perhaps in the help for this command) about where the
 config files are located that are used. Perhaps with a 'clean' option that
 blows away the current data directory and (if Zookeeper becomes the one
 source of truth) does an upconfig first.

Thanks for all the input on this thread, and for the hard work trying to
make everything easier for a beginner.

I actually do really like the fact that we now start with no cores; it
was just a bit of a shock.  It sounds like it's a relatively
straightforward thing to fire off a CoreAdmin 'curl' command after
startup that will populate an example core, and the conf directory is
probably easy to locate in the download too. I just ask that this
information be added to the immediately available docs (README.txt and
similar).  I did not check the tutorial ... if it's not already there,
it probably should be.
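
(For anyone following along, a minimal SolrJ sketch of that CoreAdmin call --
assuming a core directory named collection1 whose conf is already in place:)

{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateExampleCore {
  public static void main(String[] args) throws Exception {
    SolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    // equivalent to the CoreAdmin 'curl' command mentioned above:
    // /solr/admin/cores?action=CREATE&name=collection1&instanceDir=collection1
    CoreAdminRequest.createCore("collection1", "collection1", admin);
    admin.shutdown();
  }
}
{code}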

Thanks,
Shawn


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.10-Linux (64bit/jdk1.7.0_67) - Build # 47 - Failure!

2014-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.10-Linux/47/
Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 
(asserts: false)

1 tests failed.
REGRESSION:  org.apache.solr.schema.TestCloudSchemaless.testDistribSearch

Error Message:
Timeout occured while waiting response from server at: 
https://127.0.0.1:53035/cyt/ab/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting 
response from server at: https://127.0.0.1:53035/cyt/ab/collection1
at 
__randomizedtesting.SeedInfo.seed([C3126427574CD7E2:42F4EA3F2013B7DE]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:562)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at 
org.apache.solr.schema.TestCloudSchemaless.doTest(TestCloudSchemaless.java:140)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:871)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-6680) DefaultSolrHighlighter can sometimes avoid CachingTokenFilter

2014-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194858#comment-14194858
 ] 

David Smiley commented on SOLR-6680:


I should point out that the benefit of LUCENE-6033 won't be realized for a 
multi-valued field because of the way the offset adjusting works 
(TermOffsetsTokenStream).  I'm not concerned with optimizing for this case, but 
should someone else want to take this further, consider this approach:  
Don't wrap the TokenStream from the TermVectors.  Instead, grab all the values 
of this field and wrap them in a CharSequence implementation that reads from 
each value in sequence.  But Highlighter expects a String for the value; it 
could be modified to deal with a CharSequence instead.
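
To make that concrete, here is a rough sketch of such a wrapper (illustrative 
only, not from the patch; the class name and layout are my own):
{code}
// A CharSequence that virtually concatenates several field values without
// copying them into one big String.
public class ConcatCharSequence implements CharSequence {
  private final String[] values;
  private final int[] starts; // starts[i] = offset where values[i] begins
  private final int length;

  public ConcatCharSequence(String... values) {
    this.values = values;
    this.starts = new int[values.length];
    int len = 0;
    for (int i = 0; i < values.length; i++) {
      starts[i] = len;
      len += values[i].length();
    }
    this.length = len;
  }

  @Override
  public int length() { return length; }

  @Override
  public char charAt(int index) {
    // Binary search for the value containing this offset.
    int lo = 0, hi = values.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (starts[mid] <= index) lo = mid; else hi = mid - 1;
    }
    return values[lo].charAt(index - starts[lo]);
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    StringBuilder sb = new StringBuilder(end - start);
    for (int i = start; i < end; i++) sb.append(charAt(i));
    return sb;
  }
}
{code}
A CharSequence-aware Highlighter could then read straight through all the 
values without materializing one concatenated String per document.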

 DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
 -

 Key: SOLR-6680
 URL: https://issues.apache.org/jira/browse/SOLR-6680
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0

 Attachments: SOLR-6680.patch


 The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to 
 wrap the token stream in a CachingTokenFilter when 
 hl.usePhraseHighlighter=true.  This wastes memory, and it interferes with 
 other optimizations -- LUCENE-6034.  Furthermore, the internal 
 TermOffsetsTokenStream (used when TermVectors are used with this) wasn't 
 properly delegating reset().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-11-03 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194864#comment-14194864
 ] 

Michael Dodsworth commented on SOLR-2927:
-

[~shalinmangar] any feedback on this?

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: SOLR-2927.patch, mbean-leak-jira.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting

2014-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194869#comment-14194869
 ] 

David Smiley commented on SOLR-4656:


I saw the results of the modifications here during my work on SOLR-6680.  It's 
not clear to me there needed to be new parameters.  Shouldn't the field value 
lengths be accumulated toward maxAnalyzedChars, exiting at that point?  And 
furthermore, shouldn't this field value loop exit early once it sees 
{{fragTexts.size() >= numFragments}} (i.e. hl.snippets is reached)?

 Add hl.maxMultiValuedToExamine to limit the number of multiValued entries 
 examined while highlighting
 -

 Key: SOLR-4656
 URL: https://issues.apache.org/jira/browse/SOLR-4656
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 4.3, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.3, Trunk

 Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, 
 SOLR-4656-trunk.patch, SOLR-4656.patch


 I'm looking at an admittedly pathological case of many, many entries in a 
 multiValued field, and trying to implement a way to limit the number 
 examined, analogous to maxAnalyzedChars, see the patch.
 Along the way, I noticed that we do what looks like unnecessary copying of 
 the fields to be examined. We call Document.getFields, which copies all of 
 the fields and values to the returned array. Then we copy all of those to 
 another array, converting them to Strings. Then we actually examine them. (a) 
 this doesn't seem very efficient and (b) it reduces the benefit from limiting 
 the number of mv values examined.
 So the attached does two things:
 1) attempts to fix this
 2) implements hl.maxMultiValuedToExamine
 I'd _really_ love it if someone who knows the highlighting code takes a peek 
 at the fix to see if I've messed things up, the changes are actually pretty 
 minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



JDK 9 Early Access with Project Jigsaw build b36 is available on java.net

2014-11-03 Thread Rory O'Donnell Oracle, Dublin Ireland


Hi Uwe  Dawid,

JDK 9 Early Access with Project Jigsaw build b36 is available on 
java.net [1]


The goal of Project Jigsaw [2] is to design and implement a standard 
module system for the Java SE Platform,

and to apply that system to the Platform itself and to the JDK.

As described in JEP 220 [3], this build provides a new runtime image 
structure. For example, this new runtime

image does not install an rt.jar file or a tools.jar file.

Please refer to Project Jigsaw's updated project pages [2]  [4] and 
Mark Reinhold's announcement email [5]

for further details.

We are very interested in your experiences testing this build. Comments, 
questions, and suggestions are
welcome on the jigsaw-dev mailing list or else submit bug reports via 
bugs.java.com.


Note: If you haven’t already subscribed to that mailing list then please 
do so first, otherwise

your message will be discarded as spam.


[1] https://jdk9.java.net/jigsaw/
[2] http://openjdk.java.net/projects/jigsaw/
[3] http://openjdk.java.net/jeps/220
[4] http://openjdk.java.net/projects/jigsaw/ea
[5] 
http://mail.openjdk.java.net/pipermail/jigsaw-dev/2014-November/003878.html


--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: JDK 9 Early Access with Project Jigsaw build b36 is available on java.net

2014-11-03 Thread Dawid Weiss
I imagine this will break everything that relies on scanning rt.jar out
there (proguard, shading plugins, api-checkers) -- fun, fun, fun ;)

Dawid

On Mon, Nov 3, 2014 at 7:41 PM, Rory O'Donnell Oracle, Dublin Ireland
rory.odonn...@oracle.com wrote:

 Hi Uwe  Dawid,

 JDK 9 Early Access with Project Jigsaw build b36 is available on java.net
 [1]

 The goal of Project Jigsaw [2] is to design and implement a standard module
 system for the Java SE Platform,
 and to apply that system to the Platform itself and to the JDK.

 As described in JEP 220 [3], this build provides a new runtime image
 structure. For example, this new runtime
 image does not install an rt.jar file or a tools.jar file.

 Please refer to Project Jigsaw's updated project pages [2]  [4] and Mark
 Reinhold's announcement email [5]
 for further details.

 We are very interested in your experiences testing this build. Comments,
 questions, and suggestions are
 welcome on the jigsaw-dev mailing list or else submit bug reports via
 bugs.java.com.

 Note: If you haven’t already subscribed to that mailing list then please do
 so first, otherwise
 your message will be discarded as spam.


 [1] https://jdk9.java.net/jigsaw/
 [2] http://openjdk.java.net/projects/jigsaw/
 [3] http://openjdk.java.net/jeps/220
 [4] http://openjdk.java.net/projects/jigsaw/ea
 [5]
 http://mail.openjdk.java.net/pipermail/jigsaw-dev/2014-November/003878.html

 --
 Rgds,Rory O'Donnell
 Quality Engineering Manager
 Oracle EMEA , Dublin, Ireland


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting

2014-11-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194962#comment-14194962
 ] 

Erick Erickson commented on SOLR-4656:
--

David:

bq:  Shouldn't the field value lengths be accumulated,
I see where you're going, and I admit I didn't originate this code, so all 
things are possible.
It's a little different sense than maxAnalyzedChars in that the unit of 
measurement is the number of MV entries rather than the number of characters 
analyzed, but I could argue either way.

bq:  shouldn't this field value loop exit early once ...
I have no objection.

Although it seems kind of late to take away this parameter, should we 
deprecate it instead?





 Add hl.maxMultiValuedToExamine to limit the number of multiValued entries 
 examined while highlighting
 -

 Key: SOLR-4656
 URL: https://issues.apache.org/jira/browse/SOLR-4656
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 4.3, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.3, Trunk

 Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, 
 SOLR-4656-trunk.patch, SOLR-4656.patch


 I'm looking at an admittedly pathological case of many, many entries in a 
 multiValued field, and trying to implement a way to limit the number 
 examined, analogous to maxAnalyzedChars, see the patch.
 Along the way, I noticed that we do what looks like unnecessary copying of 
 the fields to be examined. We call Document.getFields, which copies all of 
 the fields and values to the returned array. Then we copy all of those to 
 another array, converting them to Strings. Then we actually examine them. (a) 
 this doesn't seem very efficient and (b) it reduces the benefit from limiting 
 the number of mv values examined.
 So the attached does two things:
 1) attempts to fix this
 2) implements hl.maxMultiValuedToExamine
 I'd _really_ love it if someone who knows the highlighting code takes a peek 
 at the fix to see if I've messed things up, the changes are actually pretty 
 minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4656) Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting

2014-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195001#comment-14195001
 ] 

David Smiley commented on SOLR-4656:


bq. It's a little different sense than maxAnalyzedChars in that the unit of 
measurement is the number of MV entries rather than the number of characters 
analyzed, but I could argue either way.

Sure... but was there per-value overhead involved that was a bit heavy for the 
particular client you did this for (i.e. massive number of values) or was it 
just a matter of not accumulating value lengths?

bq. Although it seems kind of late to take away this parameter, should we 
deprecate it instead?

If there are a large number of values, I guess it has some value.

In my last comment to SOLR-6680 I stated that I think multi-value handling 
should be done a bit differently: each value should be virtually 
concatenated/iterated via a CharSequence wrapper and handed to the highlighter. 
 Likewise the TokenStreams of each value could be wrapped into a concatenating 
wrapper.  If that were done, then I think these parameters would be completely 
obsolete, as the approach would handle the case of a massive number of values.

I'll create a separate issue to accumulate maxAnalyzedChars per value and exit 
early.

 Add hl.maxMultiValuedToExamine to limit the number of multiValued entries 
 examined while highlighting
 -

 Key: SOLR-4656
 URL: https://issues.apache.org/jira/browse/SOLR-4656
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 4.3, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.3, Trunk

 Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, 
 SOLR-4656-trunk.patch, SOLR-4656.patch


 I'm looking at an admittedly pathological case of many, many entries in a 
 multiValued field, and trying to implement a way to limit the number 
 examined, analogous to maxAnalyzedChars, see the patch.
 Along the way, I noticed that we do what looks like unnecessary copying of 
 the fields to be examined. We call Document.getFields, which copies all of 
 the fields and values to the returned array. Then we copy all of those to 
 another array, converting them to Strings. Then we actually examine them. (a) 
 this doesn't seem very efficient and (b) it reduces the benefit from limiting 
 the number of mv values examined.
 So the attached does two things:
 1) attempts to fix this
 2) implements hl.maxMultiValuedToExamine
 I'd _really_ love it if someone who knows the highlighting code takes a peek 
 at the fix to see if I've messed things up, the changes are actually pretty 
 minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6692) hl.maxAnalyzedChars should apply cumulatively on a multi-valued field

2014-11-03 Thread David Smiley (JIRA)
David Smiley created SOLR-6692:
--

 Summary: hl.maxAnalyzedChars should apply cumulatively on a 
multi-valued field
 Key: SOLR-6692
 URL: https://issues.apache.org/jira/browse/SOLR-6692
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: David Smiley
 Fix For: 5.0


I think hl.maxAnalyzedChars should apply cumulatively across the values of a 
multi-valued field.  DefaultSolrHighlighter doesn't; I'm not sure yet about the 
other two.

Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit 
early from its field value loop if it reaches hl.snippets.
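
For illustration, a sketch of what that cumulative accounting could look like 
(all names below are assumptions, not the actual DefaultSolrHighlighter code):
{code}
import java.util.ArrayList;
import java.util.List;

class CumulativeHighlightSketch {
  // Walks the values of a multi-valued field, charging every analyzed char
  // against one shared hl.maxAnalyzedChars budget, and stopping as soon as
  // hl.snippets fragments have been collected.
  List<String> highlight(List<String> fieldValues, int maxAnalyzedChars, int numSnippets) {
    List<String> fragTexts = new ArrayList<>();
    int analyzedSoFar = 0;
    for (String value : fieldValues) {
      if (analyzedSoFar >= maxAnalyzedChars || fragTexts.size() >= numSnippets) {
        break; // budget spent or enough snippets: exit the field value loop early
      }
      int budget = maxAnalyzedChars - analyzedSoFar;
      String analyzed = value.length() > budget ? value.substring(0, budget) : value;
      fragTexts.add(analyzed); // stand-in for the real fragmenting/scoring
      analyzedSoFar += analyzed.length();
    }
    return fragTexts;
  }
}
{code}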



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-6046:
---
Attachment: LUCENE-6046.patch

Patch, tests pass.  I added a required int maxStates to
RegExp.toAutomaton, and it threads this down to determinize, and
throws RegExpTooHardExc if determinize would need to exceed that
limit.

I didn't make it a checked exc; I had started that way but it
percolates up high, e.g. into query parsers, and I think that's too
much.  The exception message itself should make it quite clear what
went wrong at query time.

I also added this as an optional param to the RegexpQuery default ctor,
and defaulted it to 10,000 states, and to QueryParserBase, with the same
default.
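
Assuming the API described above, a caller would look something like this (the 
default of 10,000 and the exception name follow this comment and may differ in 
the committed patch):
{code}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;

public class MaxStatesSketch {
  public static void main(String[] args) {
    RegExp re = new RegExp("(a|b)*c{1,10}");
    try {
      // maxStates caps how many states determinize may create.
      Automaton a = re.toAutomaton(10000);
      System.out.println("determinized into " + a.getNumStates() + " states");
    } catch (RuntimeException tooHard) {
      // e.g. the RegExpTooHardExc mentioned above: reject the query
      // instead of exhausting the heap.
    }
  }
}
{code}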


 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6693) Start script for windows fails with 32bit JRE

2014-11-03 Thread JIRA
Jan Høydahl created SOLR-6693:
-

 Summary: Start script for windows fails with 32bit JRE
 Key: SOLR-6693
 URL: https://issues.apache.org/jira/browse/SOLR-6693
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: WINDOWS 8.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


*Reproduce:*
# Install JRE8 from www.java.com (typically {{C:\Program Files 
(x86)\Java\jre1.8.0_25}})
# Run the command {{bin\solr start -V}}

The result is:
{{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}

*Reason*
This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of 
the parenthesis that it freaks out.

*Solution*
Quoting the lines where %JAVA% is printed, e.g. instead of
{noformat}
  @echo Using Java: %JAVA%
{noformat}
then use
{noformat}
  @echo Using Java: "%JAVA%"
{noformat}

This is needed several places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6693) Start script for windows fails with 32bit JRE

2014-11-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-6693:
--
Description: 
*Reproduce:*
# Install JRE8 from www.java.com (typically {{C:\Program Files 
(x86)\Java\jre1.8.0_25}})
# Run the command {{bin\solr start -V}}

The result is:
{{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}

*Reason*
This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of 
the parenthesis that it freaks out. I think the same would apply for a 32-bit 
JDK because of the (x86) in the path, but I have not tested.

*Solution*
Quoting the lines where %JAVA% is printed, e.g. instead of
{noformat}
  @echo Using Java: %JAVA%
{noformat}
then use
{noformat}
  @echo Using Java: "%JAVA%"
{noformat}

This is needed several places.

  was:
*Reproduce:*
# Install JRE8 from www.java.com (typically {{C:\Program Files 
(x86)\Java\jre1.8.0_25}})
# Run the command {{bin\solr start -V}}

The result is:
{{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}

*Reason*
This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of 
the parenthesis that it freaks out.

*Solution*
Quoting the lines where %JAVA% is printed, e.g. instead of
{noformat}
  @echo Using Java: %JAVA%
{noformat}
then use
{noformat}
  @echo Using Java: "%JAVA%"
{noformat}

This is needed several places.


 Start script for windows fails with 32bit JRE
 -

 Key: SOLR-6693
 URL: https://issues.apache.org/jira/browse/SOLR-6693
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: WINDOWS 8.1
Reporter: Jan Høydahl
  Labels: bin\solr.cmd
 Fix For: 5.0, Trunk


 *Reproduce:*
 # Install JRE8 from www.java.com (typically {{C:\Program Files 
 (x86)\Java\jre1.8.0_25}})
 # Run the command {{bin\solr start -V}}
 The result is:
 {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}
 *Reason*
 This comes from bad quoting of the {{%SOLR%}} variable. I think it's because 
 of the parenthesis that it freaks out. I think the same would apply for a 
 32-bit JDK because of the (x86) in the path, but I have not tested.
 *Solution*
 Quoting the lines where %JAVA% is printed, e.g. instead of
 {noformat}
   @echo Using Java: %JAVA%
 {noformat}
 then use
 {noformat}
   @echo Using Java: "%JAVA%"
 {noformat}
 This is needed several places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195033#comment-14195033
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh no!  I wrote a very similar patch!  Sorry to duplicate effort there.  

I found that 10,000 states wasn't quite enough to handle some of the tests so I 
went with 1,000,000 as the default.  It's pretty darn huge but it does get the 
job done.

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: An experience and some thoughts about solr/example - solr/server

2014-11-03 Thread Erik Hatcher

On Nov 3, 2014, at 12:50 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 11/2/2014 5:57 PM, Erick Erickson wrote:
 I'm a little discomfited by having to learn new stuff, but that's a personal
 problem ;).
 
 I do think we have to be mindful of people who want something like what Shawn
 was doing, I do this all the time as well. And of new people who haven't a 
 clue.
 Hmmm, actually new folks might have an easier time of it since they don't
 have any expectations ;).
 
 bq: ...'run example' target that could also fire off a create for 
 collection1.
 
 Exactly, with a note (perhaps in the help for this command) about where the
 config files are located that are used. Perhaps with a 'clean' option that
 blows away the current data directory and (if Zookeeper becomes the one
 source of truth) does an upconfig first.
 
 Thanks for all the input on this thread, and for the hard work trying to
 make everything easier for a beginner.
 
 I actually do really like the fact that we now start with no cores; it
 was just a bit of a shock.  It sounds like it's a relatively
 straightforward thing to fire off a CoreAdmin 'curl' command after
 startup that will populate an example core, and the conf directory is
 probably easy to locate in the download too. I just ask that this
 information be added to the immediately available docs (README.txt and
 similar).  I did not check the tutorial ... if it's not already there,
 it probably should be.

Or on trunk (and hopefully back ported if we do another 4.10.x release):

$ bin/solr create_core -help

Usage: solr create_core [-n <name>] [-c <configset>]

  -n <name>   Name of core to create

  -c <configset>  Name of configuration directory to use, valid options are:
  basic_configs: Minimal Solr configuration
  data_driven_schema_configs: Managed schema with field-guessing support 
enabled
  sample_techproducts_configs: Example configuration with many optional 
features enabled to
 demonstrate the full power of Solr
  If not specified, default is: data_driven_schema_configs


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195047#comment-14195047
 ] 

Michael McCandless commented on LUCENE-6046:


Woops, sorry, I didn't see you had a patch here!  Thank you.

I like your patch: it's good to make all hidden usages of determinize visible.  
Let's start from your patch and merge anything from mine in?  E.g. I think we 
can collapse minimizeHopcroft into just minimize...

bq. I found that 10,000 states wasn't quite enough to handle some of the tests 
so I went with 1,000,000 as the default. It's pretty darn huge but it does get 
the job done.

Whoa, which tests needed 1M max states?  I worry about passing a 1M state 
automaton to minimize...

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6693) Start script for windows fails with 32bit JRE

2014-11-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-6693:
--
Description: 
*Reproduce:*
# Install JRE8 from www.java.com (typically {{C:\Program Files 
(x86)\Java\jre1.8.0_25}})
# Run the command {{bin\solr start -V}}

The result is:
{{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}

*Reason*
This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of 
the parenthesis that it freaks out. I think the same would apply for a 32-bit 
JDK because of the (x86) in the path, but I have not tested.

Tip: You can remove the line {{@ECHO OFF}} at the top to see exactly which line 
is the offending one.

*Solution*
Quoting the lines where %JAVA% is printed, e.g. instead of
{noformat}
  @echo Using Java: %JAVA%
{noformat}
then use
{noformat}
  @echo Using Java: "%JAVA%"
{noformat}

This is needed several places.

  was:
*Reproduce:*
# Install JRE8 from www.java.com (typically {{C:\Program Files 
(x86)\Java\jre1.8.0_25}})
# Run the command {{bin\solr start -V}}

The result is:
{{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}

*Reason*
This comes from bad quoting of the {{%SOLR%}} variable. I think it's because of 
the parenthesis that it freaks out. I think the same would apply for a 32-bit 
JDK because of the (x86) in the path, but I have not tested.

*Solution*
Quoting the lines where %JAVA% is printed, e.g. instead of
{noformat}
  @echo Using Java: %JAVA%
{noformat}
then use
{noformat}
  @echo Using Java: "%JAVA%"
{noformat}

This is needed several places.


 Start script for windows fails with 32bit JRE
 -

 Key: SOLR-6693
 URL: https://issues.apache.org/jira/browse/SOLR-6693
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: WINDOWS 8.1
Reporter: Jan Høydahl
  Labels: bin\solr.cmd
 Fix For: 5.0, Trunk


 *Reproduce:*
 # Install JRE8 from www.java.com (typically {{C:\Program Files 
 (x86)\Java\jre1.8.0_25}})
 # Run the command {{bin\solr start -V}}
 The result is:
 {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}
 *Reason*
 This comes from bad quoting of the {{%SOLR%}} variable. I think it's because 
 of the parenthesis that it freaks out. I think the same would apply for a 
 32-bit JDK because of the (x86) in the path, but I have not tested.
 Tip: You can remove the line {{@ECHO OFF}} at the top to see exactly which 
 line is the offending one.
 *Solution*
 Quoting the lines where %JAVA% is printed, e.g. instead of
 {noformat}
   @echo Using Java: %JAVA%
 {noformat}
 then use
 {noformat}
   @echo Using Java: "%JAVA%"
 {noformat}
 This is needed several places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195053#comment-14195053
 ] 

Michael McCandless commented on LUCENE-6046:


I like the test simplifications, and removing dead code from 
Operations.determinize.

Can we fix the exc thrown from Regexp to include the offending regular 
expression, and fix the test to confirm the message contains it?  Maybe also 
add RegExp.toStringTree?  I found it very useful while debugging the original 
regexp :)

I think QueryParserBase should also have a set/get for this option?

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195056#comment-14195056
 ] 

Nik Everett commented on LUCENE-6046:
-

TestDeterminizeLexicon wants to make an automaton that accepts 5000 random 
strings, so 10,000 states isn't enough for it.  I'll drop the default limit to 
10,000 again and just feed a million to that test case. 

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195065#comment-14195065
 ] 

Nik Everett commented on LUCENE-6046:
-

I'll certainly add the regexp string to the exception message.  And I'll merge 
the toStringTree from your patch into mine if you'd like.

Yeah - QueryParserBase should have this option too.

The thing I found most useful for debugging this was to call toDot on the 
automata before and after normalization.  I just looked at it and went, oh, of 
course you have to do it that way.  No wonder the states explode.  And then I 
read https://en.wikipedia.org/wiki/Powerset_construction and remembered it from 
my rusty CS degree.

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory it exceeds the maximum 
 array size for java.
 The following caused an OutOfMemoryError with a 32gb heap:
 {noformat}
 new 
 RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60gb heap, the following exception is thrown:
 {noformat}
   1 java.lang.IllegalArgumentException: requested array size 2147483624 
 exceeds maximum array in java (2147483623)
   1 
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
   1 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
   1 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
   1 
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
   1 
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
   1 
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
   1 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6693) Start script for windows fails with 32bit JRE

2014-11-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195096#comment-14195096
 ] 

Jan Høydahl commented on SOLR-6693:
---

After fixing the echo problems, the next hurdle occurs:
{{Java 1.7 or later is required to run Solr.}}

Even if I have Java8 (32bit). After some debugging, I found that the syntax 
{{-version:x.y}} does not work on 32-bit Java for Windows; it prints the error 
even if you have the right version.

So the question then is: should the script enforce 64-bit Java and print a more 
useful message if not found? Or is there a way to fix the version testing under 
32-bit Java on Windows? It would perhaps be good to print a warning for 32-bit 
Java, since you should use 64-bit if possible.

 Start script for windows fails with 32bit JRE
 -

 Key: SOLR-6693
 URL: https://issues.apache.org/jira/browse/SOLR-6693
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: WINDOWS 8.1
Reporter: Jan Høydahl
  Labels: bin\solr.cmd
 Fix For: 5.0, Trunk


 *Reproduce:*
 # Install JRE8 from www.java.com (typically {{C:\Program Files 
 (x86)\Java\jre1.8.0_25}})
 # Run the command {{bin\solr start -V}}
 The result is:
 {{\Java\jre1.8.0_25\bin\java was unexpected at this time.}}
 *Reason*
 This comes from bad quoting of the {{%SOLR%}} variable. I think it's because 
 of the parenthesis that it freaks out. I think the same would apply for a 
 32-bit JDK because of the (x86) in the path, but I have not tested.
 Tip: You can remove the line {{@ECHO OFF}} at the top to see exactly which 
 line is the offending one.
 *Solution*
 Quoting the lines where %JAVA% is printed, e.g. instead of
 {noformat}
   @echo Using Java: %JAVA%
 {noformat}
 then use
 {noformat}
   @echo Using Java: "%JAVA%"
 {noformat}
 This is needed several places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6694) Auto detect JAVA_HOME in bin\start.cmd

2014-11-03 Thread JIRA
Jan Høydahl created SOLR-6694:
-

 Summary: Auto detect JAVA_HOME in bin\start.cmd
 Key: SOLR-6694
 URL: https://issues.apache.org/jira/browse/SOLR-6694
 Project: Solr
  Issue Type: Improvement
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: Windows
Reporter: Jan Høydahl


The start script requires JAVA_HOME to be set.

The Java installer on Windows does not set JAVA_HOME, so it is an obstacle for 
new users who want to test. What the installer does is set some registry 
values, and we can detect those to find a JAVA_HOME to use. It will give a 
better user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses

2014-11-03 Thread Robert Parker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195112#comment-14195112
 ] 

Robert Parker commented on SOLR-4586:
-

Under Solr 4.10.2 in solrcloud configuration, if I upload a change to 
solrconfig.xml to zookeeper that raises maxBooleanClauses from 1024 to 2048 and 
then reload the collection, the cores do not recognize a new value for 
maxBooleanClauses unlike other changes to schema.xml and solrconfig.xml.  I 
have to bounce Solr on each node before queries will honor the new value for 
maxBooleanClauses.  This seems like unintentional behavior.  I should be able 
to make any change to schema.xml and solrconfig.xml, then upload those to 
zookeeper and have each node in the cluster instantly honor all new values 
after a core/collection reload.

 Increase default maxBooleanClauses
 --

 Key: SOLR-4586
 URL: https://issues.apache.org/jira/browse/SOLR-4586
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2
 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50
Reporter: Shawn Heisey
 Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, 
 SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch


 In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to 
 someone asking a question about queries.  Mark Miller told me that 
 maxBooleanClauses no longer applies, that the limitation was removed from 
 Lucene sometime in the 3.x series.  The config still shows up in the example 
 even in the just-released 4.2.
 Checking through the source code, I found that the config option is parsed 
 and the value stored in objects, but does not actually seem to be used by 
 anything.  I removed every trace of it that I could find, and all tests still 
 pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6694) Auto detect JAVA_HOME in bin\start.cmd

2014-11-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195122#comment-14195122
 ] 

Jan Høydahl commented on SOLR-6694:
---

Here's some code from a start script I created long ago:

{noformat}
echo Detecting JAVA_HOME
if "%JAVA_HOME%"=="" call:FIND_JAVA_HOME
echo Java home: %JAVA_HOME%
goto:DETECTED

:FIND_JAVA_HOME
FOR /F "skip=2 tokens=2*" %%A IN ('REG QUERY "HKLM\Software\JavaSoft\Java Runtime Environment" /v CurrentVersion') DO set CurVer=%%B
FOR /F "skip=2 tokens=2*" %%A IN ('REG QUERY "HKLM\Software\JavaSoft\Java Runtime Environment\%CurVer%" /v JavaHome') DO set JAVA_HOME=%%B
goto:EOF

:DETECTED
echo Do whatever
{noformat}


 Auto detect JAVA_HOME in bin\start.cmd
 --

 Key: SOLR-6694
 URL: https://issues.apache.org/jira/browse/SOLR-6694
 Project: Solr
  Issue Type: Improvement
  Components: scripts and tools
Affects Versions: 4.10.2
 Environment: Windows
Reporter: Jan Høydahl

 The start script requires JAVA_HOME to be set.
 The Java installer on Windows does not set JAVA_HOME, so it is an obstacle 
 for new users who want to test. What the installer does is set some 
 registry values, and we can detect those to find a JAVA_HOME to use. It will 
 give a better user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6037) PendingTerm cannot be cast to PendingBlock

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195127#comment-14195127
 ] 

Michael McCandless commented on LUCENE-6037:


Hmm are you sure this was just a multi-threaded issue?  I don't see how 
[illegally] sharing a single Document across threads would lead to this 
exception.

 PendingTerm cannot be cast to PendingBlock
 --

 Key: LUCENE-6037
 URL: https://issues.apache.org/jira/browse/LUCENE-6037
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/codecs
Affects Versions: 4.3.1
 Environment: ubuntu 64bit
Reporter: zhanlijun
Priority: Critical
 Fix For: 4.3.1


 the error as follows:
 java.lang.ClassCastException: 
 org.apache.lucene.codecs.BlockTreeTermsWriter$PendingTerm cannot be cast to 
 org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock
 at 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finish(BlockTreeTermsWriter.java:1014)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:553)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:493)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480)
 at 
 org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:413)
 at 
 org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1283)
 at 
 org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1243)
 at 
 org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1228)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6695) Change in solrconfig.xml for maxBooleanClauses in SolrCloud is not recognized

2014-11-03 Thread Robert Parker (JIRA)
Robert Parker created SOLR-6695:
---

 Summary: Change in solrconfig.xml for maxBooleanClauses in 
SolrCloud is not recognized
 Key: SOLR-6695
 URL: https://issues.apache.org/jira/browse/SOLR-6695
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2, 4.10, 4.9
Reporter: Robert Parker
Priority: Minor


Under Solr 4.10.2 in solrcloud configuration, if I upload a change to 
solrconfig.xml to zookeeper that raises maxBooleanClauses from 1024 to 2048 and 
then reload the collection, the cores do not recognize a new value for 
maxBooleanClauses unlike other changes to schema.xml and solrconfig.xml. I have 
to bounce Solr on each node before queries will honor the new value for 
maxBooleanClauses. This seems like unintentional behavior. I should be able to 
make any change to schema.xml and solrconfig.xml, then upload those to 
zookeeper and have each node in the cluster instantly honor all new values 
after a core/collection reload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-11-03 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195169#comment-14195169
 ] 

Shalin Shekhar Mangar commented on SOLR-2927:
-

Thanks for pinging me, Michael. This issue had been forgotten.

I now understand the bug and I am able to reproduce it locally. I started with 
Cyrille's patch, which introduced an exception in the SolrCore constructor, and 
I added logging of all items which are added to JMX and all the items that are 
removed on close after the exception. With a little bit of awk and sort, I have 
this list of mbeans which are leaked:
{code}
documentCache
fieldValueCache
filterCache
mlt
perSegFilter
query
queryResultCache
searcher
Searcher@778e65f2[techproducts]
{code}

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: SOLR-2927.patch, mbean-leak-jira.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6058) Solr needs a new website

2014-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195194#comment-14195194
 ] 

Steve Rowe commented on SOLR-6058:
--

I asked Infra (INFRA-8576) to enable the Attribute Lists markdown extension so 
that we can explicitly set ids, classes, and arbitrary attribute/value pairs on 
output elements when the ASF CMS uses Python Markdown to generate HTML - see 
https://pythonhosted.org/Markdown/extensions/attr_list.html

 Solr needs a new website
 

 Key: SOLR-6058
 URL: https://issues.apache.org/jira/browse/SOLR-6058
 Project: Solr
  Issue Type: Task
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Attachments: HTML.rar, SOLR-6058, SOLR-6058.location-fix.patchfile, 
 Solr_Icons.pdf, Solr_Logo_on_black.pdf, Solr_Logo_on_black.png, 
 Solr_Logo_on_orange.pdf, Solr_Logo_on_orange.png, Solr_Logo_on_white.pdf, 
 Solr_Logo_on_white.png, Solr_Styleguide.pdf


 Solr needs a new website:  better organization of content, less verbose, more 
 pleasing graphics, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: JDK 9 Early Access with Project Jigsaw build b36 is available on java.net

2014-11-03 Thread Uwe Schindler
Hi,

A few days ago, I already opened an issue about Jigsaw in forbidden-apis 
(https://code.google.com/p/forbidden-apis/issues/detail?id=39). Currently it 
just says "unsupported JDK" and stops checking forbidden APIs (because it cannot 
find out if a class is something non-public like sun.misc.Unsafe). But adding 
support is quite easy. Once I have installed this version locally, I can start 
implementing support for Java 9... Basically, forbidden-apis works as it is, but 
the special cases, like detecting private rt.jar APIs or extracting the 
deprecated signature files, need some changes. This is why it says "unsupported".
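One possible direction -- a sketch, not what forbidden-apis does yet -- is to 
read class bytes through the class loader instead of opening rt.jar directly, 
assuming Java 9 keeps exposing .class resources there:

  // Sketch: fetch JDK class bytes without assuming an rt.jar on disk.
  import java.io.InputStream;

  public class ClassBytes {
    public static void main(String[] args) throws Exception {
      try (InputStream in =
          ClassLoader.getSystemResourceAsStream("java/lang/String.class")) {
        System.out.println(in == null ? "not visible" : in.available() + " bytes");
      }
    }
  }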

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf
 Of Dawid Weiss
 Sent: Monday, November 03, 2014 7:44 PM
 To: Uwe Schindler
 Cc: dev@lucene.apache.org
 Subject: Re: JDK 9 Early Access with Project Jigsaw build b36 is available on
 java.net
 
 I imagine this will break everything that relies on scanning rt.jar out there
 (proguard, shading plugins, api-checkers) -- fun, fun, fun ;)
 
 Dawid
 
 On Mon, Nov 3, 2014 at 7:41 PM, Rory O'Donnell Oracle, Dublin Ireland
 rory.odonn...@oracle.com wrote:
 
  Hi Uwe & Dawid,
 
  JDK 9 Early Access with Project Jigsaw build b36 is available on
  java.net [1]
 
  The goal of Project Jigsaw [2] is to design and implement a standard
  module system for the Java SE Platform, and to apply that system to
  the Platform itself and to the JDK.
 
  As described in JEP 220 [3], this build provides a new runtime image
  structure. For example, this new runtime image does not install an
  rt.jar file or a tools.jar file.
 
  Please refer to Project Jigsaw's updated project pages [2] & [4] and
  Mark Reinhold's announcement email [5] for further details.
 
  We are very interested in your experiences testing this build.
  Comments, questions, and suggestions are welcome on the jigsaw-dev
  mailing list or else submit bug reports via bugs.java.com.
 
  Note: If you haven’t already subscribed to that mailing list then
  please do so first, otherwise your message will be discarded as spam.
 
 
  [1] https://jdk9.java.net/jigsaw/
  [2] http://openjdk.java.net/projects/jigsaw/
  [3] http://openjdk.java.net/jeps/220
  [4] http://openjdk.java.net/projects/jigsaw/ea
  [5] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2014-November/003878.html
 
  --
  Rgds,Rory O'Donnell
  Quality Engineering Manager
  Oracle EMEA , Dublin, Ireland
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses

2014-11-03 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195223#comment-14195223
 ] 

Shawn Heisey commented on SOLR-4586:


[~reparker], maxBooleanClauses is a global Lucene setting across the entire 
application, and the last thing to set that value will win every time.

If you have any configs with the default of 1024 and you reload any of those 
cores after reloading the one that sets it to 2048, then it will be changed 
back -- for the entire application.  The best option is to set the higher limit 
in *every* solrconfig.xml file, or remove the setting from all of them except 
one.

The javadocs for the Lucene setter method do not indicate this global nature, 
but I assure you that I have looked at the code, and it is indeed global.

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount%28int%29
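A standalone sketch demonstrates the global behavior (not Solr code, just the 
Lucene static setter):

{code}
import org.apache.lucene.search.BooleanQuery;

public class GlobalLimitDemo {
  public static void main(String[] args) {
    BooleanQuery.setMaxClauseCount(2048);  // "core A" loads its config
    BooleanQuery.setMaxClauseCount(1024);  // "core B" reloads with the default
    // the whole JVM is now back at 1024 -- core A's higher limit is gone
    System.out.println(BooleanQuery.getMaxClauseCount());
  }
}
{code}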

 Increase default maxBooleanClauses
 --

 Key: SOLR-4586
 URL: https://issues.apache.org/jira/browse/SOLR-4586
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2
 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50
Reporter: Shawn Heisey
 Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, 
 SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch


 In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to 
 someone asking a question about queries.  Mark Miller told me that 
 maxBooleanClauses no longer applies, that the limitation was removed from 
 Lucene sometime in the 3.x series.  The config still shows up in the example 
 even in the just-released 4.2.
 Checking through the source code, I found that the config option is parsed 
 and the value stored in objects, but does not actually seem to be used by 
 anything.  I removed every trace of it that I could find, and all tests still 
 pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses

2014-11-03 Thread Robert Parker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195233#comment-14195233
 ] 

Robert Parker commented on SOLR-4586:
-

I've only got one collection and one config in ZooKeeper, and that's the one 
that is being changed.  Each core had its solrconfig.xml updated on disk, but 
since it's a SolrCloud config, only the ZooKeeper version should matter, correct?

 Increase default maxBooleanClauses
 --

 Key: SOLR-4586
 URL: https://issues.apache.org/jira/browse/SOLR-4586
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2
 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50
Reporter: Shawn Heisey
 Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, 
 SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch


 In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to 
 someone asking a question about queries.  Mark Miller told me that 
 maxBooleanClauses no longer applies, that the limitation was removed from 
 Lucene sometime in the 3.x series.  The config still shows up in the example 
 even in the just-released 4.2.
 Checking through the source code, I found that the config option is parsed 
 and the value stored in objects, but does not actually seem to be used by 
 anything.  I removed every trace of it that I could find, and all tests still 
 pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6042) CustomScoreQuery Explain differs from the actual score when topLevelBoost is used.

2014-11-03 Thread Denis Lantsman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Lantsman updated LUCENE-6042:
---
Attachment: CustomScoreQuery.patch

 CustomScoreQuery Explain differs from the actual score when topLevelBoost is 
 used.
 --

 Key: LUCENE-6042
 URL: https://issues.apache.org/jira/browse/LUCENE-6042
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring
Affects Versions: 4.8
Reporter: Denis Lantsman
Priority: Minor
 Attachments: CustomScoreQuery.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 CustomScoreQuery.java, doExplain has the following line:
 {code}
 res.addDetail(new Explanation(getBoost(), "queryBoost"));
 {code}
 This multiplies the custom score query by just the boost of the current 
 query, and not by
 {code}
 queryWeight=topLevelBoost*getBoost();
 {code}
 which is the value that's actually used during scoring. This leads to 
 drastically different scores in the debug info, relative to the actual score, 
 when the query is a subquery of another one, like a BooleanQuery clause, with 
 a non-1 boost.
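A one-line sketch of the implied fix (the attached patch is authoritative; this 
assumes the top-level boost has been captured when normalize() runs):

{code}
// explain with the same weight the scorer uses, not getBoost() alone
res.addDetail(new Explanation(topLevelBoost * getBoost(), "queryBoost"));
{code}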



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


