[jira] [Commented] (SOLR-6531) better error message when lockType doesn't work with directoryFactory

2014-10-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188072#comment-14188072
 ] 

Hoss Man commented on SOLR-6531:


DirectoryFactory defines an abstract createLockFactory method.

Look at any implementation of createLockFactory and you'll see how each 
implementation limits which lockTypes can be used.

In each case, the error message thrown if you specify an invalid lockType 
should say what the valid lockTypes are.

Steps to reproduce: edit the sample solrconfig.xml and replace 
{{<lockType>$\{solr.lock.type:native\}</lockType>}} with 
{{<lockType>bogus</lockType>}} - you'll get an error message that doesn't tell 
you what non-bogus values are supported for that DirectoryFactory.
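
For illustration, here is a minimal sketch (not the actual Solr code; the 
SUPPORTED list, class name, and exception type are assumptions) of the kind of 
check inside a DirectoryFactory's createLockFactory that would make the error 
self-explanatory:

{code:java}
import java.util.Arrays;
import java.util.List;

class LockTypeCheckSketch {
  // Illustrative only: each DirectoryFactory would advertise its own supported set.
  static final List<String> SUPPORTED = Arrays.asList("native", "simple", "single", "none");

  static void checkLockType(String lockType) {
    if (!SUPPORTED.contains(lockType)) {
      // The message names the valid values, so a "bogus" lockType is easy to fix.
      throw new IllegalArgumentException("Unrecognized lockType '" + lockType
          + "'. Supported values for this DirectoryFactory are: " + SUPPORTED);
    }
  }
}
{code}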

 better error message when lockType doesn't work with directoryFactory
 -

 Key: SOLR-6531
 URL: https://issues.apache.org/jira/browse/SOLR-6531
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
  Labels: difficulty-easy, impact-low

 SOLR-6519 improved the logic about which lockTypes could be configured with 
 which directoryFactory implementations, but the result is a somewhat 
 confusing error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6030) Add norms patched compression which uses table for most common values

2014-10-29 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188073#comment-14188073
 ] 

Adrien Grand commented on LUCENE-6030:
--

+1

 Add norms patched compression which uses table for most common values
 -

 Key: LUCENE-6030
 URL: https://issues.apache.org/jira/browse/LUCENE-6030
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ryan Ernst
 Attachments: LUCENE-6030.patch


 We have added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset 
 to mark documents that have the most common value (when 97% of the documents 
 have that value).  This works well for fields that have a single predominant 
 value length, and then a small number of docs with some other random values.  
 But another common case is having a handful of very common value lengths, like 
 with a title field.
 We can use a table (see TABLE_COMPRESSION) to store the most common values, 
 and save an ordinal for the other cases, at which point we can look it up in 
 the secondary patch table.
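
A minimal sketch of the decoding idea (not Lucene's actual norms producer; the 
class and field names are illustrative): common values live in a small table 
indexed by a per-document ordinal, and one reserved ordinal redirects to a 
secondary "patch" structure for the rare, uncommon values.

{code:java}
import java.util.Map;

class PatchedTableNormsSketch {
  private final long[] table;              // the most common values (small, e.g. <= 256 entries)
  private final byte[] ordinals;           // per-document ordinal into the table
  private final int patchOrdinal;          // reserved ordinal meaning "value is in the patch"
  private final Map<Integer, Long> patch;  // docID -> uncommon value (sparse)

  PatchedTableNormsSketch(long[] table, byte[] ordinals, int patchOrdinal,
                          Map<Integer, Long> patch) {
    this.table = table;
    this.ordinals = ordinals;
    this.patchOrdinal = patchOrdinal;
    this.patch = patch;
  }

  long get(int docID) {
    int ord = ordinals[docID] & 0xFF;
    if (ord == patchOrdinal) {
      return patch.get(docID);  // rare case: look up in the secondary patch table
    }
    return table[ord];          // common case: O(1) table lookup
  }
}
{code}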



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-SmokeRelease-5.x - Build # 206 - Failure

2014-10-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-5.x/206/

No tests ran.

Build Log:
[...truncated 51317 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist
 [copy] Copying 446 files to 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/lucene
 [copy] Copying 254 files to 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/solr
   [smoker] Java 1.7 JAVA_HOME=/home/jenkins/tools/java/latest1.7
   [smoker] NOTE: output encoding is US-ASCII
   [smoker] 
   [smoker] Load release URL 
file:/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/...
   [smoker] 
   [smoker] Test Lucene...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.1 MB in 0.01 sec (13.7 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download lucene-5.0.0-src.tgz...
   [smoker] 27.8 MB in 0.04 sec (700.9 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-5.0.0.tgz...
   [smoker] 63.7 MB in 0.09 sec (712.5 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-5.0.0.zip...
   [smoker] 73.1 MB in 0.08 sec (955.9 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack lucene-5.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.7...
   [smoker]   got 5539 hits for query lucene
   [smoker] checkindex with 1.7...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-5.0.0.zip...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.7...
   [smoker]   got 5539 hits for query lucene
   [smoker] checkindex with 1.7...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-5.0.0-src.tgz...
   [smoker] make sure no JARs/WARs in src dist...
   [smoker] run ant validate
   [smoker] run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket 
-Dtests.disableHdfs=true -Dtests.multiplier=1 -Dtests.slow=false'...
   [smoker] test demo with 1.7...
   [smoker]   got 205 hits for query lucene
   [smoker] checkindex with 1.7...
   [smoker] generate javadocs w/ Java 7...
   [smoker] 
   [smoker] Crawl/parse...
   [smoker] 
   [smoker] Verify...
   [smoker]   confirm all releases have coverage in TestBackwardsCompatibility
   [smoker] find all past Lucene releases...
   [smoker] run TestBackwardsCompatibility..
   [smoker] success!
   [smoker] 
   [smoker] Test Solr...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.1 MB in 0.02 sec (5.0 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download solr-5.0.0-src.tgz...
   [smoker] 34.0 MB in 0.05 sec (645.2 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-5.0.0.tgz...
   [smoker] 146.2 MB in 0.39 sec (372.9 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-5.0.0.zip...
   [smoker] 152.4 MB in 0.24 sec (644.5 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack solr-5.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] unpack lucene-5.0.0.tgz...
   [smoker]   **WARNING**: skipping check of 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
 it has javax.* classes
   [smoker]   **WARNING**: skipping check of 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar:
 it has javax.* classes
   [smoker] verify WAR metadata/contained JAR identity/no javax.* or java.* 
classes...
   [smoker] unpack lucene-5.0.0.tgz...
   [smoker] copying unpacked distribution for Java 7 ...
   [smoker] test solr example w/ Java 7...
   [smoker]   start Solr instance 
(log=/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/solr-example.log)...
   [smoker]   startup done
   [smoker]   test utf8...
   [smoker]   index example docs...
   [smoker]   run query...
   [smoker]   stop server (SIGINT)...
   [smoker]   unpack solr-5.0.0.zip...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] unpack lucene-5.0.0.tgz...
   [smoker]   **WARNING**: skipping check of 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
 it has 

[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20) - Build # 4397 - Still Failing!

2014-10-29 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4397/
Java: 32bit/jdk1.8.0_20 -client -XX:+UseSerialGC

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.SolrSchemalessExampleTest.testAddDelete

Error Message:
IOException occured when talking to server at: 
https://127.0.0.1:50400/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: https://127.0.0.1:50400/solr/collection1
at 
__randomizedtesting.SeedInfo.seed([7F8996E1391E8D2C:B769EBEBC8B65EFA]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:584)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at 
org.apache.solr.client.solrj.SolrExampleTestsBase.testAddDelete(SolrExampleTestsBase.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Created] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-6666:
-

 Summary: Dynamic copy fields are considering all dynamic fields, 
causing a significant performance impact on indexing documents
 Key: SOLR-6666
 URL: https://issues.apache.org/jira/browse/SOLR-6666
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
specific CopyFields for dynamic fields, but without wildcards (the fields are 
dynamic, the copy directive is not)
Reporter: Liram Vardi


Result:
After applying a fix for this issue, tests which we conducted show a more than 
40 percent improvement in our insertion performance.

Explanation:

Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. 
This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the 
following method, getCopyFieldsList():

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), 
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find, for an input source field, all its copyFields (all 
the destinations to which Solr needs to copy this field). 
As you can probably note, the first part of the procedure is its most 
“expensive” step (it takes O(n) time, where n is the size of the 
dynamicCopyFields group).
The next part is just a simple hash lookup, which takes O(1) time. 

Our schema contains over 500 copyFields, but only 70 of them are indexed 
fields. 
We also have one dynamic field with a wildcard (*), which catches the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField, but all of them, except one, are fixed (i.e. they do not contain 
any wildcard).

For some reason, the copyFields registration procedure defines those 400 
fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
array.
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: all of those 400 copyFields are not globs and therefore do not 
need any complex pattern matching against the input field. They could all be 
stored in fixedCopyFields.
Only copyFields with asterisks need this special treatment, and they are 
(at least in our case) pretty rare.

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
Tests which we conducted show that there is no change in the indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).
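
A minimal sketch of the optimization described above (an illustration under 
simplified class and field names, not the attached patch): at registration 
time, a copy directive whose source has no wildcard goes into the exact-match 
map, so the linear scan in getCopyFieldsList() only covers genuine glob 
patterns.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopyFieldRegistrySketch {
  static class CopyRule {
    final String source, dest;
    CopyRule(String source, String dest) { this.source = source; this.dest = dest; }
  }

  // Exact source names: O(1) lookup per incoming document field.
  private final Map<String, List<CopyRule>> fixedCopyFields = new HashMap<>();
  // Only genuine glob patterns remain here and need per-field matching.
  private final List<CopyRule> dynamicCopyFields = new ArrayList<>();

  void registerCopyField(String source, String dest) {
    if (source.contains("*")) {
      dynamicCopyFields.add(new CopyRule(source, dest));      // real wildcard: keep the linear path
    } else {
      fixedCopyFields.computeIfAbsent(source, k -> new ArrayList<>())
          .add(new CopyRule(source, dest));                   // concrete name, even if it matches a dynamicField
    }
  }

  List<CopyRule> getCopyFieldsList(String sourceField) {
    List<CopyRule> result = new ArrayList<>();
    for (CopyRule rule : dynamicCopyFields) {                 // now iterates only over true globs
      if (globMatches(rule.source, sourceField)) {
        result.add(rule);
      }
    }
    List<CopyRule> fixed = fixedCopyFields.get(sourceField);
    if (fixed != null) {
      result.addAll(fixed);
    }
    return result;
  }

  // Simplified prefix/suffix glob, mirroring Solr's dynamicField-style patterns.
  private static boolean globMatches(String pattern, String name) {
    if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
    if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
    return pattern.equals(name);
  }
}
{code}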

   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)
Liram Vardi created SOLR-6667:
-

 Summary: Dynamic copy fields are considering all dynamic fields, 
causing a significant performance impact on indexing documents
 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
specific CopyFields for dynamic fields, but without wildcards (the fields are 
dynamic, the copy directive is not)
Reporter: Liram Vardi


Result:
After applying a fix for this issue, tests which we conducted show a more than 
40 percent improvement in our insertion performance.

Explanation:

Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. 
This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the 
following method, getCopyFieldsList():

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), 
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find, for an input source field, all its copyFields (all 
the destinations to which Solr needs to copy this field). 
As you can probably note, the first part of the procedure is its most 
“expensive” step (it takes O(n) time, where n is the size of the 
dynamicCopyFields group).
The next part is just a simple hash lookup, which takes O(1) time. 

Our schema contains over 500 copyFields, but only 70 of them are indexed 
fields. 
We also have one dynamic field with a wildcard (*), which catches the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField, but all of them, except one, are fixed (i.e. they do not contain 
any wildcard).

For some reason, the copyFields registration procedure defines those 400 
fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
array.
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: all of those 400 copyFields are not globs and therefore do not 
need any complex pattern matching against the input field. They could all be 
stored in fixedCopyFields.
Only copyFields with asterisks need this special treatment, and they are 
(at least in our case) pretty rare.

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
Tests which we conducted show that there is no change in the indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Affects Version/s: 4.8

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
Affects Versions: 4.8
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi

 Result:
 After applying a fix for this issue, tests which we conducted show a more 
 than 40 percent improvement in our insertion performance.
 Explanation:
 Using a JVM profiler, we found a CPU bottleneck during the Solr indexing 
 process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, 
 in the following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
     result.add(new CopyField(getField(sourceField), 
         dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all its copyFields 
 (all the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains over 500 copyFields, but only 70 of them are indexed 
 fields. 
 We also have one dynamic field with a wildcard (*), which catches the rest 
 of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all of them, except one, are fixed (i.e. they do not 
 contain any wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array.
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They could all be 
 stored in fixedCopyFields.
 Only copyFields with asterisks need this special treatment, and they are 
 (at least in our case) pretty rare.
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing 
 results. Moreover, the fix still successfully passes the class unit tests 
 (i.e. IndexSchemaTest.java).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Issue Type: Improvement  (was: Bug)

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
Affects Versions: 4.8
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi

 Result:
 After applying a fix for this issue, tests which we conducted show a more 
 than 40 percent improvement in our insertion performance.
 Explanation:
 Using a JVM profiler, we found a CPU bottleneck during the Solr indexing 
 process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, 
 in the following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
     result.add(new CopyField(getField(sourceField), 
         dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all its copyFields 
 (all the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains over 500 copyFields, but only 70 of them are indexed 
 fields. 
 We also have one dynamic field with a wildcard (*), which catches the rest 
 of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all of them, except one, are fixed (i.e. they do not 
 contain any wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array.
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They could all be 
 stored in fixedCopyFields.
 Only copyFields with asterisks need this special treatment, and they are 
 (at least in our case) pretty rare.
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing 
 results. Moreover, the fix still successfully passes the class unit tests 
 (i.e. IndexSchemaTest.java).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Issue Type: Bug  (was: Improvement)

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, update
Affects Versions: 4.8
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi

 Result:
 After applying a fix for this issue, tests which we conducted show a more 
 than 40 percent improvement in our insertion performance.
 Explanation:
 Using a JVM profiler, we found a CPU bottleneck during the Solr indexing 
 process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, 
 in the following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
     result.add(new CopyField(getField(sourceField), 
         dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all its copyFields 
 (all the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains over 500 copyFields, but only 70 of them are indexed 
 fields. 
 We also have one dynamic field with a wildcard (*), which catches the rest 
 of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all of them, except one, are fixed (i.e. they do not 
 contain any wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array.
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They could all be 
 stored in fixedCopyFields.
 Only copyFields with asterisks need this special treatment, and they are 
 (at least in our case) pretty rare.
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing 
 results. Moreover, the fix still successfully passes the class unit tests 
 (i.e. IndexSchemaTest.java).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-6667:
--
Attachment: SOLR-6667.patch

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
Affects Versions: 4.8
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
 Attachments: SOLR-6667.patch


 Result:
 After applying a fix for this issue, tests which we conducted show a more 
 than 40 percent improvement in our insertion performance.
 Explanation:
 Using a JVM profiler, we found a CPU bottleneck during the Solr indexing 
 process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, 
 in the following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
     result.add(new CopyField(getField(sourceField), 
         dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all its copyFields 
 (all the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains over 500 copyFields, but only 70 of them are indexed 
 fields. 
 We also have one dynamic field with a wildcard (*), which catches the rest 
 of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all of them, except one, are fixed (i.e. they do not 
 contain any wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array.
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They could all be 
 stored in fixedCopyFields.
 Only copyFields with asterisks need this special treatment, and they are 
 (at least in our case) pretty rare.
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing 
 results. Moreover, the fix still successfully passes the class unit tests 
 (i.e. IndexSchemaTest.java).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 670 - Still Failing

2014-10-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/670/

1 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
Captured an uncaught exception in thread: Thread[id=3850, name=Thread-1223, 
state=RUNNABLE, group=TGRP-CollectionsAPIDistributedZkTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=3850, name=Thread-1223, state=RUNNABLE, 
group=TGRP-CollectionsAPIDistributedZkTest]
Caused by: java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([BAC342B198BF0FEB]:0)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest$1CollectionThread.run(CollectionsAPIDistributedZkTest.java:1044)




Build Log:
[...truncated 11726 lines...]
   [junit4] Suite: org.apache.solr.cloud.CollectionsAPIDistributedZkTest
   [junit4]   2 Creating dataDir: 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/build/solr-core/test/J3/temp/solr.cloud.CollectionsAPIDistributedZkTest-BAC342B198BF0FEB-001/init-core-data-001
   [junit4]   2 671916 T3019 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl 
(true) and clientAuth (false)
   [junit4]   2 671916 T3019 oas.BaseDistributedSearchTestCase.initHostContext 
Setting hostContext system property: /
   [junit4]   2 671922 T3019 oas.SolrTestCaseJ4.setUp ###Starting 
testDistribSearch
   [junit4]   2 671923 T3019 oasc.ZkTestServer.run STARTING ZK TEST SERVER
   [junit4]   1 client port:0.0.0.0/0.0.0.0:0
   [junit4]   2 671924 T3020 oasc.ZkTestServer$ZKServerMain.runFromConfig 
Starting server
   [junit4]   2 672024 T3019 oasc.ZkTestServer.run start zk server on 
port:57521
   [junit4]   2 672025 T3019 
oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default 
ZkCredentialsProvider
   [junit4]   2 672026 T3019 oascc.ConnectionManager.waitForConnected Waiting 
for client to connect to ZooKeeper
   [junit4]   2 672031 T3026 oascc.ConnectionManager.process Watcher 
org.apache.solr.common.cloud.ConnectionManager@654fdfb9 
name:ZooKeeperConnection Watcher:127.0.0.1:57521 got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2 672031 T3019 oascc.ConnectionManager.waitForConnected Client 
is connected to ZooKeeper
   [junit4]   2 672031 T3019 oascc.SolrZkClient.createZkACLProvider Using 
default ZkACLProvider
   [junit4]   2 672032 T3019 oascc.SolrZkClient.makePath makePath: /solr
   [junit4]   2 672034 T3019 
oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default 
ZkCredentialsProvider
   [junit4]   2 672036 T3019 oascc.ConnectionManager.waitForConnected Waiting 
for client to connect to ZooKeeper
   [junit4]   2 672037 T3028 oascc.ConnectionManager.process Watcher 
org.apache.solr.common.cloud.ConnectionManager@4c192e0b 
name:ZooKeeperConnection Watcher:127.0.0.1:57521/solr got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2 672037 T3019 oascc.ConnectionManager.waitForConnected Client 
is connected to ZooKeeper
   [junit4]   2 672037 T3019 oascc.SolrZkClient.createZkACLProvider Using 
default ZkACLProvider
   [junit4]   2 672038 T3019 oascc.SolrZkClient.makePath makePath: 
/collections/collection1
   [junit4]   2 672040 T3019 oascc.SolrZkClient.makePath makePath: 
/collections/collection1/shards
   [junit4]   2 672042 T3019 oascc.SolrZkClient.makePath makePath: 
/collections/control_collection
   [junit4]   2 672044 T3019 oascc.SolrZkClient.makePath makePath: 
/collections/control_collection/shards
   [junit4]   2 672046 T3019 oasc.AbstractZkTestCase.putConfig put 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml
 to /configs/conf1/solrconfig.xml
   [junit4]   2 672047 T3019 oascc.SolrZkClient.makePath makePath: 
/configs/conf1/solrconfig.xml
   [junit4]   2 672050 T3019 oasc.AbstractZkTestCase.putConfig put 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/schema.xml
 to /configs/conf1/schema.xml
   [junit4]   2 672051 T3019 oascc.SolrZkClient.makePath makePath: 
/configs/conf1/schema.xml
   [junit4]   2 672053 T3019 oasc.AbstractZkTestCase.putConfig put 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml
 to /configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2 672054 T3019 oascc.SolrZkClient.makePath makePath: 
/configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2 672056 T3019 oasc.AbstractZkTestCase.putConfig put 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/stopwords.txt
 to /configs/conf1/stopwords.txt
   [junit4]   2 672057 T3019 oascc.SolrZkClient.makePath makePath: 

[jira] [Commented] (SOLR-5377) the Core Selector in the Admin UI should pre-select a core

2014-10-29 Thread Konstantin Gribov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188231#comment-14188231
 ] 

Konstantin Gribov commented on SOLR-5377:
-

Maybe the {{/solr/cores\[@defaultCoreName]}} parameter (in XPath notation) from 
the old {{solr.xml}} should somehow be added to the new {{solr.xml}} parameters.

 the Core Selector in the Admin UI should pre-select a core
 

 Key: SOLR-5377
 URL: https://issues.apache.org/jira/browse/SOLR-5377
 Project: Solr
  Issue Type: Improvement
Reporter: Michael McCandless
Priority: Minor

 I was trying to use the admin UI, to understand how text was analyzed, but it 
 was confusing (I couldn't find the Analysis page) until I realized I had to 
 use the Core Selector to select my core.
 I had only one core ... it seems like the Core Selector could easily just 
 pre-select a core (the one in my case...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188249#comment-14188249
 ] 

Magnus Lövgren commented on SOLR-5532:
--

I've upgraded from 4.4 to 4.10.1 and have been struggling somewhat with my code 
that was affected by this change. Some observations that might be useful for 
others too:

The patch relies on the org.apache.http.entity.ContentType.parse method. It 
fails when parsing an empty string. That's fine (an empty string should probably 
not be seen as a valid type anyway). The caveat is that an empty string is 
actually used as the fallback contentType if the response has no Content-Type 
header! This would be the typical case if the response is a 401 (which typically 
has no Content-Type).

- In prior versions a 401 response threw a SolrException with code() 401
- Now a SolrServerException is thrown (caused by an 
org.apache.http.ParseException), which makes it hard to determine whether it 
was due to bad credentials (401).

To restore the previous behaviour, you'd presumably add an 
HttpStatus.SC_UNAUTHORIZED case to the switch and then throw a 
RemoteSolrException (with code 401). In other words - fail early for a 401 
response (there's no content to parse anyway).
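
A minimal sketch of that suggestion (an assumption about where the check would 
sit, not the committed SolrJ change; the helper name below is made up): test 
the HTTP status before attempting to parse the usually empty Content-Type of a 
401 response, and surface the code directly.

{code:java}
import org.apache.http.HttpStatus;
import org.apache.solr.common.SolrException;

class UnauthorizedCheckSketch {
  // Would be called from HttpSolrServer's response handling before Content-Type parsing.
  static void failFastOnUnauthorized(int httpStatus, String baseUrl) {
    if (httpStatus == HttpStatus.SC_UNAUTHORIZED) {
      // Fail early with the HTTP code preserved, instead of a SolrServerException
      // caused by ContentType.parse("") throwing a ParseException.
      throw new SolrException(SolrException.ErrorCode.UNAUTHORIZED,
          "Unauthorized (401) response from server at: " + baseUrl);
    }
  }
}
{code}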

 SolrJ Content-Type validation is too strict for some webcontainers / proxies, 
 breaks on equivalent content types
 

 Key: SOLR-5532
 URL: https://issues.apache.org/jira/browse/SOLR-5532
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
 Environment: Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar
Reporter: Jakob Furrer
Assignee: Mark Miller
 Fix For: 4.6.1, 4.7, Trunk

 Attachments: SOLR-5532-elyograg-eclipse-screenshot.png, 
 SOLR-5532.patch


 Due to SOLR-3530, HttpSolrServer now does a string equivalence check between 
 the Content-Type returned by the server and the getContentType() method 
 declared by the ResponseParser ... but string equivalence is too strict, and 
 can result in errors like this one reported by a user:
 
 I just upgraded my Solr instance and with it I also upgraded the solrj 
 library in our custom application which sends diverse requests and queries to 
 Solr.
 I use the ping method to determine whether Solr started correctly under the 
 configured address. Since the upgrade the ping response results in an error:
 {code:xml}
 Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
 Expected content type application/xml; charset=UTF-8 but got 
 application/xml;charset=UTF-8.
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int 
 name="QTime">0</int><lst name="params"><str name="df">searchtext</str><str 
 name="echoParams">all</str><str name="rows">10</str><str 
 name="echoParams">all</str><str name="wt">xml</str><str 
 name="version">2.2</str><str name="q">solrpingquery</str><str 
 name="distrib">false</str></lst></lst><str name="status">OK</str>
 </response>
 {code}
 The Solr application itself works fine.
 Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. 
 solr-solrj-4.5.1.jar) in the custom application does not produce this error.
 The exception is produced in a code block (_HttpSolrServer.java_, method 
 _request(...)_, around line 140) which was introduced with version 4.6.0.
 Code to reproduce the error:
 {code}
 try {
   HttpSolrServer solrServer =
       new HttpSolrServer("http://localhost:8080/Solr/collection");
   solrServer.setParser(new XMLResponseParser()); // this line is making all the difference
   solrServer.ping();
 } catch (Exception e) {
   e.printStackTrace();
 }
 {code}
 A global search for "charset=UTF-8" in the source code of solrj indicates 
 that other functions besides ping might be affected as well, because there 
 are several places where "application/xml; charset=UTF-8" is spelled without 
 a space after the semicolon.
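
A minimal sketch of the looser comparison the report argues for (an 
illustration, not the fix that was committed): parse both values with 
HttpClient's ContentType so that "application/xml; charset=UTF-8" and 
"application/xml;charset=UTF-8" compare equal.

{code:java}
import java.util.Locale;
import java.util.Objects;
import org.apache.http.entity.ContentType;

class ContentTypeCompareSketch {
  static boolean sameContentType(String expected, String actual) {
    ContentType e = ContentType.parse(expected);
    ContentType a = ContentType.parse(actual);
    // Compare the MIME type and charset, ignoring whitespace and case differences.
    boolean mimeEqual = e.getMimeType().equalsIgnoreCase(a.getMimeType());
    String ec = e.getCharset() == null ? null : e.getCharset().name().toLowerCase(Locale.ROOT);
    String ac = a.getCharset() == null ? null : a.getCharset().name().toLowerCase(Locale.ROOT);
    return mimeEqual && Objects.equals(ec, ac);
  }
}
{code}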



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5377) the Core Selector in the Admin UI should pre-select a core

2014-10-29 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-5377:

Component/s: web gui

 the Core Selector in the Admin UI should pre-select a core
 

 Key: SOLR-5377
 URL: https://issues.apache.org/jira/browse/SOLR-5377
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Michael McCandless
Priority: Minor

 I was trying to use the admin UI, to understand how text was analyzed, but it 
 was confusing (I couldn't find the Analysis page) until I realized I had to 
 use the Core Selector to select my core.
 I had only one core ... it seems like the Core Selector could easily just 
 pre-select a core (the one in my case...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types

2014-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188262#comment-14188262
 ] 

Mark Miller commented on SOLR-5532:
---

I ran into this same issue in a review for Cloudera Search before I went on 
vacation a couple weeks ago. Technically, it was a back compat break. Please 
file a JIRA issue and we can address it. 

 SolrJ Content-Type validation is too strict for some webcontainers / proxies, 
 breaks on equivalent content types
 

 Key: SOLR-5532
 URL: https://issues.apache.org/jira/browse/SOLR-5532
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
 Environment: Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar
Reporter: Jakob Furrer
Assignee: Mark Miller
 Fix For: 4.6.1, 4.7, Trunk

 Attachments: SOLR-5532-elyograg-eclipse-screenshot.png, 
 SOLR-5532.patch


 Due to SOLR-3530, HttpSolrServer now does a string equivalence check between 
 the Content-Type returned by the server and the getContentType() method 
 declared by the ResponseParser ... but string equivalence is too strict, and 
 can result in errors like this one reported by a user:
 
 I just upgraded my Solr instance and with it I also upgraded the solrj 
 library in our custom application which sends diverse requests and queries to 
 Solr.
 I use the ping method to determine whether Solr started correctly under the 
 configured address. Since the upgrade the ping response results in an error:
 {code:xml}
 Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
 Expected content type application/xml; charset=UTF-8 but got 
 application/xml;charset=UTF-8.
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int 
 name="QTime">0</int><lst name="params"><str name="df">searchtext</str><str 
 name="echoParams">all</str><str name="rows">10</str><str 
 name="echoParams">all</str><str name="wt">xml</str><str 
 name="version">2.2</str><str name="q">solrpingquery</str><str 
 name="distrib">false</str></lst></lst><str name="status">OK</str>
 </response>
 {code}
 The Solr application itself works fine.
 Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. 
 solr-solrj-4.5.1.jar) in the custom application does not produce this error.
 The exception is produced in a code block (_HttpSolrServer.java_, method 
 _request(...)_, around line 140) which was introduced with version 4.6.0.
 Code to reproduce the error:
 {code}
 try {
   HttpSolrServer solrServer =
       new HttpSolrServer("http://localhost:8080/Solr/collection");
   solrServer.setParser(new XMLResponseParser()); // this line is making all the difference
   solrServer.ping();
 } catch (Exception e) {
   e.printStackTrace();
 }
 {code}
 A global search for "charset=UTF-8" in the source code of solrj indicates 
 that other functions besides ping might be affected as well, because there 
 are several places where "application/xml; charset=UTF-8" is spelled without 
 a space after the semicolon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti commented on SOLR-6618:


Thanks for the reply. I cannot say exactly when it happens; most likely when I 
restart one of the Solr nodes.

I have a 3-node ZooKeeper ensemble, 3 Solr nodes, and 3 client nodes.
All the collections are created with a default config which is uploaded to 
ZooKeeper.

But when DIH is run, different config folders with the corresponding 
collection names are created in ZooKeeper, with 
update_dih_store.properties in them and no configs.

After that, Solr thinks overnighttest should have its own config set in 
ZooKeeper, even though I have not run DIH for overnighttest.


example:

  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when Solr is restarted, unable to 
 initialize a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded one config (default) and do specify 
 collection.configName=default when I create the collection, 
 and when Solr is restarted I get this error: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 11:52 AM:
---

Thanks for the reply. I cannot say exactly when it happens; most likely when I 
restart one of the Solr nodes.

I have a 3-node ZooKeeper ensemble, 3 Solr nodes, and 3 client nodes.
All the collections are created with a default config which is uploaded to 
ZooKeeper.

But when DIH is run, different config folders with the corresponding 
collection names are created in ZooKeeper, with 
update_dih_store.properties in them and no configs.

After that, Solr thinks overnighttest should have its own config set in 
ZooKeeper, even though I have not run DIH for overnighttest.



  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details


was (Author: pattapuvijaya):
Thanks for the reply. I cannot say exactly when it happens. most like when I 
restart one of the solr node.

I have 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

after which it thinks overnighttest should have its own config set in zookeeper.
where I have not run dih for overnighttest.


example:

  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when Solr is restarted, unable to 
 initialize a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded one config (default) and do specify 
 collection.configName=default when I create the collection, 
 and when Solr is restarted I get this error: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 11:56 AM:
---

Thanks for the reply. I cannot say exactly when it happens; most likely when one 
of the Solr nodes is restarted.

The configuration is a 3-node ZooKeeper ensemble, 3 Solr nodes, and 3 client nodes.
All the collections are created with a default config which is uploaded to 
ZooKeeper.

But when DIH is run, different config folders with the corresponding 
collection names are created in ZooKeeper, with 
update_dih_store.properties in them and no configs.

After that, Solr thinks overnighttest should have its own config set in 
ZooKeeper; DIH is not run for overnighttest, so that folder is not created.



  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details


was (Author: pattapuvijaya):
Thanks for the reply. I cannot say exactly when it happens. most like when I 
restart one of the solr node.

I have 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

after which it thinks overnighttest should have its own config set in zookeeper.
where I have not run dih for overnighttest.



  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when Solr is restarted, unable to 
 initialize a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded one config (default) and do specify 
 collection.configName=default when I create the collection, 
 and when Solr is restarted I get this error: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 11:57 AM:
---

Thanks for the reply. Cannot say exactly when it happens, most likely when one 
of the Solr nodes is restarted.

The configuration is a 3-node ZooKeeper ensemble, 3 Solr nodes, and 3 client nodes.
All the collections are created with a default config which is uploaded to 
ZooKeeper.

But when DIH is run, different config folders with the corresponding 
collection names are created in ZooKeeper, with 
update_dih_store.properties in them and no configs.

After which Solr thinks overnighttest should have its own config set in 
ZooKeeper; DIH is not run for the overnighttest collection, so that folder is 
not created.


  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details


was (Author: pattapuvijaya):
Thanks for the reply. I cannot say exactly when it happens. most like when one 
of the solr node is restarted.

COnfiguration is 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

after which it thinks overnighttest should have its own config set in zookeeper.
 dih is not run for overnighttest do that folder is not created.



  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when the solr is restarted, unable to 
 Initialization a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded  one config:default and  do specify 
 collection.configName=default when I create the collection
 and when solr is restart I get this error 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when the solr is restarted, unable to Initialization a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 11:59 AM:
---

Thanks for the reply. Cannot say exactly when it happens, most likely when one 
of the solr node is restarted.

Configuration is 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

After which it thinks overnighttest should have its own config set in 
zookeeper, dih is not run for overnighttest collection so that folder is not 
created.

We do give configName as default for the collection creation api through 
solrj.

  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details


was (Author: pattapuvijaya):
Thanks for the reply. Cannot say exactly when it happens, most likely when one 
of the solr node is restarted.

Configuration is 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

After which it thinks overnighttest should have its own config set in 
zookeeper, dih is not run for overnighttest collection so that folder is not 
created.


  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when the solr is restarted, unable to 
 Initialization a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded  one config:default and  do specify 
 collection.configName=default when I create the collection
 and when solr is restart I get this error 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when the solr is restarted, unable to Initialization a collection

2014-10-29 Thread Vijaya Jonnakuti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188263#comment-14188263
 ] 

Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 12:11 PM:
---

Thanks for the reply. Cannot say exactly when it happens, most likely when one 
of the solr nodes is restarted.

Configuration is a 3-zookeeper ensemble / 3 solr nodes and 3 client nodes.
All the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run, config folders with the corresponding collection names are 
created in zookeeper, with update_dih_store.properties in them and no configs.

After which the system thinks overnighttest should have its own config set in 
zookeeper, dih is not run for overnighttest collection so that folder is not 
created.

We do give configName as default for the collection creation api through 
solrj.

  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details


was (Author: pattapuvijaya):
Thanks for the reply. Cannot say exactly when it happens, most likely when one 
of the solr node is restarted.

Configuration is 3 zookeeper ensemble/3 solr node  3 client nodes
all the collections are created with a default config which is uploaded to 
zookeeper.

But when dih is run the different config folders with the corresponding 
collection name are created in the zookeeper with.
update_dih_store.properties in them and no configs in it.

After which it thinks overnighttest should have its own config set in 
zookeeper, dih is not run for overnighttest collection so that folder is not 
created.

We do give configName as default for the collection creation api through 
solrj.

  configs
  /default
   |
   --has the configs in here 
  /collection1
   |
   ---update_dih_generic.properties but no configs
 /collection2
   |
   ---update_dih_generic.properties but no configs


Let me know if you need more details

 SolrCore Initialization Failures when the solr is restarted, unable to 
 Initialization a collection
 --

 Key: SOLR-6618
 URL: https://issues.apache.org/jira/browse/SOLR-6618
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Vijaya Jonnakuti

 I have uploaded  one config:default and  do specify 
 collection.configName=default when I create the collection
 and when solr is restart I get this error 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Could not find configName for collection overnighttest found:[default, 
 collection1, collection2 and so on]
 These collection1 and collection2 empty configs are created when I run 
 DataImportHandler using ZKPropertiesWriter 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.7.0_67) - Build # 4296 - Failure!

2014-10-29 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4296/
Java: 32bit/jdk1.7.0_67 -server -XX:+UseG1GC

No tests ran.

Build Log:
[...truncated 12196 lines...]
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy73.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:978)
at hudson.Launcher$ProcStarter.join(Launcher.java:387)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:770)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:533)
at hudson.model.Run.execute(Run.java:1759)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:89)
at hudson.model.Executor.run(Executor.java:240)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: 
Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:805)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.init(ObjectInputStream.java:299)
at 
hudson.remoting.ObjectInputStreamEx.init(ObjectInputStreamEx.java:40)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188305#comment-14188305
 ] 

Timothy Potter commented on SOLR-6631:
--

Thanks for the feedback Hoss. I was actually wondering if it would suffice to 
handle NodeChildrenChanged EventTypes in the LatchChildWatcher process method, 
i.e. change the code to:

if (eventType == Event.EventType.NodeChildrenChanged) {
 ...
}

[~markrmil...@gmail.com] or [~andyetitmoves] do either of you have any insight 
you can share on this? Specifically, I'd like to change the 
LatchChildWatcher.process to set the event member and notifyAll only if the 
EventType is NodeChildrenChanged, i.e.

{code}
@Override
public void process(WatchedEvent event) {
  Event.EventType eventType = event.getType();
  LOG.info(LatchChildWatcher fired on path:  + event.getPath() +  state: 

  + event.getState() +  type  + eventType);
  if (eventType == Event.EventType.NodeChildrenChanged) {
synchronized (lock) {
  this.event = event;
  lock.notifyAll();
}
  }
}
{code}

Or do we need to handle the other event types and just not affect the event if 
the type is None as originally suggested by [~mewmewball]?

Need to get this one committed soon ;-)

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() request to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5317) [PATCH] Concordance capability

2014-10-29 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188309#comment-14188309
 ] 

Tim Allison commented on LUCENE-5317:
-

Thank you, Steve.  I created a lucene5317 branch on my github 
[fork|https://github.com/tballison/lucene-solr].  I applied your patch and will 
start adding my local updates...there have been quite a few since I posted the 
initial patch. 

When I'm happy enough with that, I'll put the patch on rb.

Thank you, again.

 [PATCH] Concordance capability
 --

 Key: LUCENE-5317
 URL: https://issues.apache.org/jira/browse/LUCENE-5317
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5317.patch, concordance_v1.patch.gz


 This patch enables a Lucene-powered concordance search capability.
 Concordances are extremely useful for linguists, lawyers and other analysts 
 performing analytic search vs. traditional snippeting/document retrieval 
 tasks.  By analytic search, I mean that the user wants to browse every time 
 a term appears (or at least the topn)  in a subset of documents and see the 
 words before and after.  
 Concordance technology is far simpler and less interesting than IR relevance 
 models/methods, but it can be extremely useful for some use cases.
 Traditional concordance sort orders are available (sort on words before the 
 target, words after, target then words before and target then words after).
 Under the hood, this is running SpanQuery's getSpans() and reanalyzing to 
 obtain character offsets.  There is plenty of room for optimizations and 
 refactoring.
 Many thanks to my colleague, Jason Robinson, for input on the design of this 
 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release 4.10.2 RC1

2014-10-29 Thread Michael McCandless
Vote passes, I'll push.
Mike McCandless

http://blog.mikemccandless.com


On Mon, Oct 27, 2014 at 6:00 PM, Steve Rowe sar...@gmail.com wrote:
 +1

 SUCCESS! [0:52:16.190427]

 Steve

 On Oct 27, 2014, at 7:54 AM, Adrien Grand jpou...@gmail.com wrote:

 +1
 SUCCESS! [0:56:11.020611]

 On Sun, Oct 26, 2014 at 4:45 PM, Simon Willnauer
 simon.willna...@gmail.com wrote:
 Tests now pass for me too!! thanks mike

 +1

 On Sun, Oct 26, 2014 at 12:22 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Artifacts: 
 http://people.apache.org/~mikemccand/staging_area/lucene-solr-4.10.2-RC1-rev1634293

 Smoke tester: python3 -u dev-tools/scripts/smokeTestRelease.py
 http://people.apache.org/~mikemccand/staging_area/lucene-solr-4.10.2-RC1-rev1634293
 1634293 4.10.2 /tmp/smoke4102 True

 I ran smoke tester:

  SUCCESS! [0:30:16.520543]

 And also confirmed Elasticsearch tests pass with this RC.

 Here's my +1

 Mike McCandless

 http://blog.mikemccandless.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6668) FieldStatsInfo should expose sumOfSquares.

2014-10-29 Thread Christoph Strobl (JIRA)
Christoph Strobl created SOLR-6668:
--

 Summary: FieldStatsInfo should expose sumOfSquares.
 Key: SOLR-6668
 URL: https://issues.apache.org/jira/browse/SOLR-6668
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: 4.10.1
Reporter: Christoph Strobl
Priority: Minor


The stats component returns {{sumOfSquares}} as part of its result. The value 
is picked up by {{FieldStatsInfo}} but cannot be directly accessed as there's 
no getter present. 

The value is also missing in {{toString()}}.

_SideNote:_ while in the class anyway, it would be nice if {{min}}, {{max}} and 
{{sum}} were not exposed/kept as {{Object}} but as {{Double}} instead, and if a 
{{SolrServerException}} instead of a plain {{RuntimeException}} could be thrown 
for unknown keys.
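
To make the request concrete, a minimal sketch of the accessor being asked for 
(hypothetical until committed; it assumes the parsed value is kept in a field 
named {{sumOfSquares}}):

{code}
// Hypothetical addition to org.apache.solr.client.solrj.response.FieldStatsInfo
public Object getSumOfSquares() {
  return sumOfSquares;
}

// Caller side, once the getter exists:
// FieldStatsInfo stats = queryResponse.getFieldStatsInfo().get("price");
// Object sumOfSquares = stats.getSumOfSquares();
{code}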



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188326#comment-14188326
 ] 

Mark Miller commented on SOLR-6631:
---

 bq. if (eventType == Event.EventType.NodeChildrenChanged) {

+1 - we are only interested in waiting around to see a child added - this 
watcher should not need to consider other events.

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() request to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188346#comment-14188346
 ] 

ASF subversion and git services commented on SOLR-6631:
---

Commit 1635131 from [~thelabdude] in branch 'dev/trunk'
[ https://svn.apache.org/r1635131 ]

SOLR-6631: DistributedQueue spinning on calling zookeeper getChildren()

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() request to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4920) DIH JdbcDataSource exception handling

2014-10-29 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188359#comment-14188359
 ] 

Mikhail Khludnev commented on SOLR-4920:


I face a usability issue with  

{code:title=JdbcDataSource.java}
try {
  c = DriverManager.getConnection(url, initProps);
} catch (SQLException e) {
  // DriverManager does not allow you to use a driver which is not loaded through
  // the class loader of the class which is trying to make the connection.
  // This is a workaround for cases where the user puts the driver jar in the
  // solr.home/lib or solr.home/core/lib directories.
  Driver d = (Driver) DocBuilder.loadClass(driver, context.getSolrCore()).newInstance();
  c = d.connect(url, initProps);
}
{code}

If I supply a weird url, I get an SQLException; it is caught, and then the code 
calls c = d.connect(url, initProps), which returns null (which is pretty valid 
given the javadoc). Then I get an NPE where the connection is used. There is 
nothing about the SQLException reason in the log. Isn't it worth raising an issue? 
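
For illustration, a rough sketch of the kind of guard this suggests (hypothetical, 
not a committed fix; it assumes an SLF4J {{LOG}} logger and 
{{DataImportHandlerException}} are available in this class):

{code}
try {
  c = DriverManager.getConnection(url, initProps);
} catch (SQLException e) {
  // Keep the original failure visible instead of swallowing it silently.
  LOG.warn("DriverManager.getConnection() failed, retrying with the core's class loader", e);
  Driver d = (Driver) DocBuilder.loadClass(driver, context.getSolrCore()).newInstance();
  c = d.connect(url, initProps);
  if (c == null) {
    // Driver.connect() returns null for a URL it does not recognize (per its javadoc),
    // so fail fast with both problems instead of hitting an NPE later.
    throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
        "Could not obtain a JDBC connection for " + url, e);
  }
}
{code}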


 DIH JdbcDataSource exception handling
 -

 Key: SOLR-4920
 URL: https://issues.apache.org/jira/browse/SOLR-4920
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.3, Trunk
Reporter: Chris Eldredge
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.4

 Attachments: patch.diff


 JdbcDataSource will incorrectly suppress exceptions when retrieving a 
 connection from a JNDI context and fall back to trying to use DriverManager 
 to obtain a connection. This makes it impossible to troubleshoot 
 misconfigured JNDI DataSource.
 Additionally, when a SQLException is thrown while initializing a connection, 
 such as in setAutoCommit(), the connection will not be closed. This can cause 
 a resource leak.
 A patch will be attached with unit tests that addresses both issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer

2014-10-29 Thread JIRA
Magnus Lövgren created SOLR-6669:


 Summary: 401 is not explicitly handled when querying HttpSolrServer
 Key: SOLR-6669
 URL: https://issues.apache.org/jira/browse/SOLR-6669
 Project: Solr
  Issue Type: Bug
  Components: SolrJ
Affects Versions: 4.7
Reporter: Magnus Lövgren
Priority: Minor


This is a regression, likely caused by SOLR-5532 (see comments at the end in 
that JIRA).

I use solrj and HttpSolrServer in my web application (deployed in Tomcat 7). 
Recently I updated Solr from 4.4 to 4.10.1 and it seems 401 is not handled 
properly anymore when using a custom HttpClient.

The essentials of my code (that was working in 4.4):
{code}
String theSolrBaseURL = ...
HttpClient theHttpClient = ...
SolrQuery theSolrQuery = ...

try {
   SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
   QueryResponse response = solrServer.query(theSolrQuery);
   ...
} catch (SolrException se) {
   if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
  // Client is using bad credentials, handle appropriately
  ...
   }
   ...
} catch (SolrServerException sse) {
   ...
}
{code}

The code should speak for itself, but the basic idea is to try to recover if 
the client is using bad credentials. In order to do that I catch the 
SolrException and check if the code is 401. This approach worked well in Solr 
4.4.

However, this doesn't work when using Solr 4.10.1. The query method throws a 
SolrServerException if the HttpClient is using bad credentials. The original 
cause is a {{org.apache.http.ParseException}}.

The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, 
ResponseParser)}} method:

# The HttpClient executes the method and gets the response
#* The response is a 401/Unauthorized
#* The 401 response has no Content-Type header
# Since there is no content type, it is set to an empty string as a fallback
# Later on, the mime type is extracted using 
{{org.apache.http.entity.ContentType.parse(String)}} in order to handle charset 
issues (see SOLR-5532)
#* This method fails to parse the empty string and throws a 
{{org.apache.http.ParseException}} 
# The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the 
exception and throw a {{SolrServerException}}

A potential fix would be to add a 401 case to the existing switch
{code}
case HttpStatus.SC_UNAUTHORIZED:
   throw new RemoteSolrException(httpStatus, Server at 
  + getBaseURL() +  returned non ok status: + httpStatus
  + , message: + response.getStatusLine().getReasonPhrase(),
   null);
{code}

...and it would perhaps be appropriate to handle the content type fallback in 
some other way than setting it to an empty string?
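
As a rough sketch of that second suggestion (hypothetical, not a committed 
change; the names follow the Apache HttpClient API), the mime-type extraction 
could simply skip parsing when the server sent no Content-Type header:

{code}
Header ctHeader = response.getEntity().getContentType();
String contentType = (ctHeader != null) ? ctHeader.getValue() : null;
String mimeType = null;
if (contentType != null && !contentType.trim().isEmpty()) {
  // Only parse when there is actually something to parse; a missing header
  // no longer turns into a ParseException on the empty string.
  mimeType = ContentType.parse(contentType).getMimeType().trim().toLowerCase(Locale.ROOT);
}
{code}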



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer

2014-10-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Lövgren updated SOLR-6669:
-
Attachment: SOLR-6669_code_screenshots.zip

Attaching screenshots from IDEA debug session

 401 is not explicitly handled when querying HttpSolrServer
 --

 Key: SOLR-6669
 URL: https://issues.apache.org/jira/browse/SOLR-6669
 Project: Solr
  Issue Type: Bug
  Components: SolrJ
Affects Versions: 4.7
Reporter: Magnus Lövgren
Priority: Minor
 Attachments: SOLR-6669_code_screenshots.zip


 This is a regression, likely caused by SOLR-5532 (see comments at the end in 
 that JIRA).
 I use solrj and HttpSolrServer in my web application (deployed in Tomcat 7). 
 Recently I updated Solr from 4.4. to 4.10.1 and it seems 401 is not handled 
 properly anymore when using a custom HttpClient.
 The essentials of my code (that was working in 4.4):
 {code}
 String theSolrBaseURL = ...
 HttpClient theHttpClient = ...
 SolrQuery theSolrQuery = ...
 try {
SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
QueryResponse response = solrServer.query(theSolrQuery);
...
 } catch (SolrException se) {
if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
   // Client is using bad credentials, handle appropriately
 ...
}
...
 } catch (SolrServerException sse) {
...
 }
 {code}
 The code should speak for itself, but the basic idea is to try to recover if 
 the client is using bad credentials. In order to do that I catch the 
 SolrException and check if the code is 401. This approach worked well in Solr 
 4.4.
 However, this doesn't work when using Solr 4.10.1. The query method throws a 
 SolrServerException if the HttpClient is using bad credentials. The original 
 cause is a {{org.apache.http.ParseException}}.
 The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, 
 ResponseParser)}} metod:
 # The HttpClient executes the method and gets the response
 #* The response is a 401/Unauthorized
 #* 401 response has no Content-Type header
 # Since there are no content type, it will be set to empty string as fallback
 # Later on the mime type is extracted using 
 {{org.apache.http.entity.ContentType.parse(String)}} in order to handle 
 charset issues (see SOLR-5532)
 #* This metod fails to parse empty string and throws a 
 {{org.apache.http.ParseException}} 
 # The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the 
 exception and throw a {{SolrServerException}}
 A potential fix would be to add a 401 case to the existing switch
 {code}
 case HttpStatus.SC_UNAUTHORIZED:
throw new RemoteSolrException(httpStatus, Server at 
   + getBaseURL() +  returned non ok status: + httpStatus
   + , message: + response.getStatusLine().getReasonPhrase(),
null);
 {code}
 ...and it would perhaps be appropriate to handle the content type fallback 
 in some other way than setting it to an empty string?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivilent content types

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188365#comment-14188365
 ] 

Magnus Lövgren commented on SOLR-5532:
--

The 401 issue is now added as SOLR-6669

 SolrJ Content-Type validation is too strict for some webcontainers / proxies, 
 breaks on equivilent content types
 

 Key: SOLR-5532
 URL: https://issues.apache.org/jira/browse/SOLR-5532
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
 Environment: Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar
Reporter: Jakob Furrer
Assignee: Mark Miller
 Fix For: 4.6.1, 4.7, Trunk

 Attachments: SOLR-5532-elyograg-eclipse-screenshot.png, 
 SOLR-5532.patch


 due to SOLR-3530, HttpSolrServer now does a string equivilence check between 
 the Content-Type returned by the server, and a getContentTYpe() method 
 declared by the ResponseParser .. but string equivilence is too strict, and 
 can result in errors like this one reported by a user
 
 I just upgraded my Solr instance and with it I also upgraded the solrj 
 library in our custom application which sends diverse requests and queries to 
 Solr.
 I use the ping method to determine whether Solr started correctly under the 
 configured address. Since the upgrade the ping response results in an error:
 {code:xml}
 Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
 Expected content type application/xml; charset=UTF-8 but got 
 application/xml;charset=UTF-8.
 ?xml version=1.0 encoding=UTF-8?
 response
 lst name=responseHeaderint name=status0/intint 
 name=QTime0/intlst name=paramsstr name=dfsearchtext/strstr 
 name=echoParamsall/strstr name=rows10/strstr 
 name=echoParamsall/strstr name=wtxml/strstr 
 name=version2.2/strstr name=qsolrpingquery/strstr 
 name=distribfalse/str/lst/lststr name=statusOK/str
 /response
 {code}
 The Solr application itself works fine.
 Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. 
 solr-solrj-4.5.1.jar) in the custom application does not produce this error.
 The Exception is produced in a Code block (_HttpSolrServer.java_, method 
 _request(...)_, around. line 140) which has been introduced with version 
 4.6.0.
 Code to reproduce the error:
 {code}
 try {
   HttpSolrServer solrServer = new 
 HttpSolrServer(http://localhost:8080/Solr/collection;);
   solrServer.setParser(new XMLResponseParser()); // this line is making 
 all the difference
   solrServer.ping();
 } catch (Exception e) {
   e.printStackTrace();
 }
 {code}
 A global search for charset=UTF-8 on the source code of solrj indicates 
 that other functions besides ping might be affected as well, because there 
 are several places where application/xml; charset=UTF-8 is spelled without 
 a space after the semicolon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer

2014-10-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Lövgren updated SOLR-6669:
-
Environment: 
solr-solrj-4.10.1.jar
   tested with:
Windows 7, 6.1, amd64
Java HotSpot(TM) 64-Bit Server VM, 1.7.0_67, Oracle Corporation
and
Linux, 3.11.6-4-default, amd64
Java HotSpot(TM) 64-Bit Server VM, 1.7.0_72, Oracle Corporation

 401 is not explicitly handled when querying HttpSolrServer
 --

 Key: SOLR-6669
 URL: https://issues.apache.org/jira/browse/SOLR-6669
 Project: Solr
  Issue Type: Bug
  Components: SolrJ
Affects Versions: 4.7
 Environment: solr-solrj-4.10.1.jar
tested with:
 Windows 7, 6.1, amd64
 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_67, Oracle Corporation
 and
 Linux, 3.11.6-4-default, amd64
 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_72, Oracle Corporation
Reporter: Magnus Lövgren
Priority: Minor
 Attachments: SOLR-6669_code_screenshots.zip


 This is a regression, likely caused by SOLR-5532 (see comments at the end in 
 that JIRA).
 I use solrj and HttpSolrServer in my web application (deployed in Tomcat 7). 
 Recently I updated Solr from 4.4. to 4.10.1 and it seems 401 is not handled 
 properly anymore when using a custom HttpClient.
 The essentials of my code (that was working in 4.4):
 {code}
 String theSolrBaseURL = ...
 HttpClient theHttpClient = ...
 SolrQuery theSolrQuery = ...
 try {
SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
QueryResponse response = solrServer.query(theSolrQuery);
...
 } catch (SolrException se) {
if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
   // Client is using bad credentials, handle appropriately
 ...
}
...
 } catch (SolrServerException sse) {
...
 }
 {code}
 The code should speak for itself, but the basic idea is to try to recover if 
 the client is using bad credentials. In order to do that I catch the 
 SolrException and check if the code is 401. This approach worked well in Solr 
 4.4.
 However, this doesn't work when using Solr 4.10.1. The query method throws a 
 SolrServerException if the HttpClient is using bad credentials. The original 
 cause is a {{org.apache.http.ParseException}}.
 The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, 
 ResponseParser)}} metod:
 # The HttpClient executes the method and gets the response
 #* The response is a 401/Unauthorized
 #* 401 response has no Content-Type header
 # Since there are no content type, it will be set to empty string as fallback
 # Later on the mime type is extracted using 
 {{org.apache.http.entity.ContentType.parse(String)}} in order to handle 
 charset issues (see SOLR-5532)
 #* This metod fails to parse empty string and throws a 
 {{org.apache.http.ParseException}} 
 # The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the 
 exception and throw a {{SolrServerException}}
 A potential fix would be to add a 401 case to the existing switch
 {code}
 case HttpStatus.SC_UNAUTHORIZED:
throw new RemoteSolrException(httpStatus, Server at 
   + getBaseURL() +  returned non ok status: + httpStatus
   + , message: + response.getStatusLine().getReasonPhrase(),
null);
 {code}
 ...and it would perhaps be appropriate to handle the content type fallback 
 in some other way than setting it to an empty string?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188373#comment-14188373
 ] 

ASF subversion and git services commented on SOLR-6631:
---

Commit 1635142 from [~thelabdude] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1635142 ]

SOLR-6631: DistributedQueue spinning on calling zookeeper getChildren()

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() request to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6533) Support editing common solrconfig.xml values

2014-10-29 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6533:
-
Attachment: SOLR-6533.patch

 Support editing common solrconfig.xml values
 

 Key: SOLR-6533
 URL: https://issues.apache.org/jira/browse/SOLR-6533
 Project: Solr
  Issue Type: Sub-task
Reporter: Noble Paul
 Attachments: SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, 
 SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch


 There are a bunch of properties in solrconfig.xml which users want to edit. 
 We will attack them first
 These properties will be persisted to a separate file called config.json (or 
 whatever file). Instead of saving in the same format we will have well known 
 properties which users can directly edit
 {code}
 updateHandler.autoCommit.maxDocs
 query.filterCache.initialSize
 {code}   
 The api will be modeled around the bulk schema API
 {code:javascript}
 curl http://localhost:8983/solr/collection1/config -H 
 'Content-type:application/json'  -d '{
 set-property : {updateHandler.autoCommit.maxDocs:5},
 unset-property: updateHandler.autoCommit.maxDocs
 }'
 {code}
 {code:javascript}
 //or use this to set ${mypropname} values
 curl http://localhost:8983/solr/collection1/config -H 
 'Content-type:application/json'  -d '{
 set-user-property : {mypropname:my_prop_val},
 unset-user-property:{mypropname}
 }'
 {code}
 The values stored in the config.json will always take precedence and will be 
 applied after loading solrconfig.xml. 
 An http GET on /config path will give the real config that is applied . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188403#comment-14188403
 ] 

Erick Erickson commented on SOLR-:
--

Liram:

Good stuff! 
Could you attach the patch to this JIRA? That'll make it easiest for someone to 
pick up.
Here's a hint on how: 
http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

Thanks!


 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi

 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final ListCopyField result = new ArrayList();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 ListCopyField fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find for an input source field all its copyFields (All 
 its destinations which Solr need to move this field). 
 As you can probably note, the first part of the procedure is the procedure 
 most “expensive” step (takes O(n) time while N is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash extraction, which takes O(1) time. 
 Our schema contains over then 500 copyFields but only 70 of then are 
 indexed fields. 
 We also have one dynamic field with  a wildcard (*), which catches the rest 
 of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField but all, except one, are fixed (i.e. does not contain any 
 wildcard).
 From some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField  and then store them in the “dynamicCopyFields” 
 array, 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: All of those 400 copyFields are not glob and therefore do not 
 need any complex pattern matching to the input field. They all can be store 
 at the fixedCopyFields.
 Only copyFields with asterisks need this special treatment and they are 
 (especially on our case) pretty rare.  
 Therefore, we created a patch which fix this problem by changing the 
 registerCopyField() procedure.
 Test which we conducted show that there is no change in the Indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).
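
To make the proposed change concrete, here is a small self-contained sketch (not 
the attached patch, and not IndexSchema's actual data structures) of the idea 
described above: register copyFields whose source and destination contain no 
wildcard in an exact-match map, so that only true glob patterns pay the linear 
scan in getCopyFieldsList():

{code}
import java.util.*;

// Illustrative only: exact-name copy rules go into a hash map, glob rules into a list.
class CopyFieldRegistry {
  private final Map<String, List<String>> fixedCopyFields = new HashMap<>(); // exact source -> targets
  private final List<String[]> globCopyFields = new ArrayList<>();           // {sourcePattern, target}

  void register(String source, String dest) {
    if (source.contains("*") || dest.contains("*")) {
      globCopyFields.add(new String[] {source, dest}); // needs pattern matching later
    } else {
      // Exact names, even ones that happen to match a dynamicField pattern, avoid the scan.
      fixedCopyFields.computeIfAbsent(source, k -> new ArrayList<>()).add(dest);
    }
  }

  List<String> targetsFor(String sourceField) {
    List<String> result =
        new ArrayList<>(fixedCopyFields.getOrDefault(sourceField, Collections.emptyList())); // O(1) lookup
    for (String[] glob : globCopyFields) { // O(number of true globs), not O(all copyFields)
      if (simpleMatch(glob[0], sourceField)) {
        result.add(glob[1]);
      }
    }
    return result;
  }

  // Minimal leading/trailing-star matcher, standing in for Solr's real pattern logic.
  private static boolean simpleMatch(String pattern, String name) {
    if (pattern.equals("*")) return true;
    if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
    if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
    return pattern.equals(name);
  }
}
{code}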




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-:
--
Description: 
Result:
After applying a fix for this issue, tests which we conducted show more than 40 
percent improvement on our insertion performance.

Explanation:

Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. 
This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the 
following method, getCopyFieldsList():

{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField),
        dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find, for an input source field, all its copyFields (all 
the destinations to which Solr needs to copy this field). 
As you can probably note, the first part of the procedure is its most 
“expensive” step (it takes O(n) time, where n is the size of the 
dynamicCopyFields group).
The next part is just a simple hash lookup, which takes O(1) time. 

Our schema contains more than 500 copyFields, but only 70 of them are indexed 
fields. 
We also have one dynamic field with a wildcard ( * ), which catches the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField, but all except one are fixed (i.e. they do not contain any 
wildcard).

For some reason, the copyField registration procedure defines those 400 
fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
array.
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: all of those 400 copyFields are not globs and therefore do not 
need any complex pattern matching against the input field. They can all be stored 
in the fixedCopyFields map.
Only copyFields with asterisks need this special treatment, and they are 
(especially in our case) pretty rare.  

Therefore, we created a patch which fixes this problem by changing the 
registerCopyField() procedure.
Tests which we conducted show that there is no change in the indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).

   

  was:
Result:
After applying a fix for this issue, tests which we conducted show more than 40 
percent improvement on our insertion performance.

Explanation:

Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
following method, getCopyFieldsList():

{code:title=getCopyFieldsList() |borderStyle=solid}
final ListCopyField result = new ArrayList();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
result.add(new CopyField(getField(sourceField), 
dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
ListCopyField fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function tries to find for an input source field all its copyFields (All 
its destinations which Solr need to move this field). 
As you can probably note, the first part of the procedure is the procedure most 
“expensive” step (takes O(n) time while N is the size of the 
dynamicCopyFields group).
The next part is just a simple hash extraction, which takes O(1) time. 

Our schema contains over then 500 copyFields but only 70 of then are indexed 
fields. 
We also have one dynamic field with  a wildcard (*), which catches the rest 
of the document fields. 
As you can conclude, we have more than 400 copyFields that are based on this 
dynamicField but all, except one, are fixed (i.e. does not contain any 
wildcard).

From some reason, the copyFields registration procedure defines those 400 
fields as DynamicCopyField  and then store them in the “dynamicCopyFields” 
array, 
This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
justification: All of those 400 copyFields are not glob and therefore do not 
need any complex pattern matching to the input field. They all can be store at 
the fixedCopyFields.
Only copyFields with asterisks need this special treatment and they are 
(especially on our case) pretty rare.  

Therefore, we created a patch which fix this problem by changing the 
registerCopyField() procedure.
Test which we conducted show that there is no change in the Indexing results. 
Moreover, the fix still successfully passes the class unit tests (i.e. 
IndexSchemaTest.java).

   


 Dynamic copy 

[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188407#comment-14188407
 ] 

ASF subversion and git services commented on SOLR-6631:
---

Commit 1635155 from [~thelabdude] in branch 'dev/trunk'
[ https://svn.apache.org/r1635155 ]

Backout fix for SOLR-6631 as things like create collection are hanging now

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() request to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188408#comment-14188408
 ] 

ASF subversion and git services commented on SOLR-6631:
---

Commit 1635157 from [~thelabdude] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1635157 ]

Backout fix for SOLR-6631 as things like create collection are hanging now

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() requests to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi updated SOLR-:
--
Attachment: SOLR-.patch

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
 Attachments: SOLR-.patch


 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all of its copyFields (all 
 the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains more than 500 copyFields, but only 70 of them are 
 indexed fields. 
 We also have one dynamic field with a wildcard ( * ), which catches the 
 rest of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all except one are fixed (i.e. do not contain any 
 wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array. 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They can all be stored 
 in the fixedCopyFields map.
 Only copyFields with asterisks need this special treatment, and they are 
 (especially in our case) pretty rare.  
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).
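
As an illustration of the direction the attached patch takes, the sketch below routes copy directives whose source and destination contain no wildcard into an exact-match map, so that lookups for such fields become an O(1) hash access instead of a scan over dynamicCopyFields; the class and field names here are hypothetical and simplified, not the actual IndexSchema code or the attached SOLR-6666.patch.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: only genuine glob rules go to the dynamic list; everything
// else can be resolved later with a plain hash lookup.
class CopyFieldRegistrySketch {
  private final Map<String, List<String>> fixedCopyFields = new HashMap<String, List<String>>();
  private final List<String[]> dynamicCopyFields = new ArrayList<String[]>();

  void registerCopyField(String source, String dest) {
    if (source.contains("*") || dest.contains("*")) {
      // Wildcard on either side: pattern matching is really needed at lookup time.
      dynamicCopyFields.add(new String[] {source, dest});
    } else {
      // Exact names on both sides: store in the exact-match map.
      List<String> targets = fixedCopyFields.get(source);
      if (targets == null) {
        targets = new ArrayList<String>();
        fixedCopyFields.put(source, targets);
      }
      targets.add(dest);
    }
  }

  List<String> fixedTargets(String sourceField) {
    List<String> targets = fixedCopyFields.get(sourceField);
    return targets == null ? Collections.<String>emptyList() : targets;
  }
}
{code}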




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liram Vardi closed SOLR-6667.
-
Resolution: Duplicate

Duplicate of SOLR-

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-6667
 URL: https://issues.apache.org/jira/browse/SOLR-6667
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
Affects Versions: 4.8
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
 Attachments: SOLR-6667.patch


 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all of its copyFields (all 
 the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains more than 500 copyFields, but only 70 of them are 
 indexed fields. 
 We also have one dynamic field with a wildcard ( * ), which catches the 
 rest of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all except one are fixed (i.e. do not contain any 
 wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array. 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They can all be stored 
 in the fixedCopyFields map.
 Only copyFields with asterisks need this special treatment, and they are 
 (especially in our case) pretty rare.  
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Liram Vardi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188411#comment-14188411
 ] 

Liram Vardi commented on SOLR-:
---

Thanks :-)
The patch is attached.

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
 Attachments: SOLR-.patch


 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all of its copyFields (all 
 the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains more than 500 copyFields, but only 70 of them are 
 indexed fields. 
 We also have one dynamic field with a wildcard ( * ), which catches the 
 rest of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all except one are fixed (i.e. do not contain any 
 wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array. 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They can all be stored 
 in the fixedCopyFields map.
 Only copyFields with asterisks need this special treatment, and they are 
 (especially in our case) pretty rare.  
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-:


Assignee: Erick Erickson

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
Assignee: Erick Erickson
 Attachments: SOLR-.patch


 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all of its copyFields (all 
 the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains more than 500 copyFields, but only 70 of them are 
 indexed fields. 
 We also have one dynamic field with a wildcard ( * ), which catches the 
 rest of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all except one are fixed (i.e. do not contain any 
 wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array. 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They can all be stored 
 in the fixedCopyFields map.
 Only copyFields with asterisks need this special treatment, and they are 
 (especially in our case) pretty rare.  
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188412#comment-14188412
 ] 

Erick Erickson commented on SOLR-:
--

I'll take a look, but I won't have any serious time until this weekend, so if 
anyone wants to pick this up feel free.

 Dynamic copy fields are considering all dynamic fields, causing a significant 
 performance impact on indexing documents
 --

 Key: SOLR-
 URL: https://issues.apache.org/jira/browse/SOLR-
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, update
 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 
 specific CopyFields for dynamic fields, but without wildcards (the fields are 
 dynamic, the copy directive is not)
Reporter: Liram Vardi
Assignee: Erick Erickson
 Attachments: SOLR-.patch


 Result:
 After applying a fix for this issue, tests which we conducted show more than 
 40 percent improvement on our insertion performance.
 Explanation:
 Using JVM profiler, we found a CPU bottleneck during Solr indexing process. 
 This bottleneck can be found at org.apache.solr.schema.IndexSchema, in the 
 following method, getCopyFieldsList():
 {code:title=getCopyFieldsList() |borderStyle=solid}
 final List<CopyField> result = new ArrayList<>();
 for (DynamicCopy dynamicCopy : dynamicCopyFields) {
   if (dynamicCopy.matches(sourceField)) {
 result.add(new CopyField(getField(sourceField), 
 dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
   }
 }
 List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
 if (null != fixedCopyFields) {
   result.addAll(fixedCopyFields);
 }
 {code}
 This function tries to find, for an input source field, all of its copyFields (all 
 the destinations to which Solr needs to copy this field). 
 As you can probably note, the first part of the procedure is its most 
 “expensive” step (it takes O(n) time, where n is the size of the 
 dynamicCopyFields group).
 The next part is just a simple hash lookup, which takes O(1) time. 
 Our schema contains more than 500 copyFields, but only 70 of them are 
 indexed fields. 
 We also have one dynamic field with a wildcard ( * ), which catches the 
 rest of the document fields. 
 As you can conclude, we have more than 400 copyFields that are based on this 
 dynamicField, but all except one are fixed (i.e. do not contain any 
 wildcard).
 For some reason, the copyFields registration procedure defines those 400 
 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” 
 array. 
 This step makes getCopyFieldsList() very expensive (in CPU terms) without any 
 justification: all of those 400 copyFields are not globs and therefore do not 
 need any complex pattern matching against the input field. They can all be stored 
 in the fixedCopyFields map.
 Only copyFields with asterisks need this special treatment, and they are 
 (especially in our case) pretty rare.  
 Therefore, we created a patch which fixes this problem by changing the 
 registerCopyField() procedure.
 Tests which we conducted show that there is no change in the indexing results. 
 Moreover, the fix still successfully passes the class unit tests (i.e. 
 IndexSchemaTest.java).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188415#comment-14188415
 ] 

ASF subversion and git services commented on SOLR-6610:
---

Commit 1635163 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1635163 ]

SOLR-6610

 In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
 ---

 Key: SOLR-6610
 URL: https://issues.apache.org/jira/browse/SOLR-6610
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Noble Paul
  Labels: solrcloud
 Attachments: SOLR-6610.patch


 Using stateFormat=2, our solr always takes a while to start up and spits out 
 this warning line:
 {quote}
 WARN  - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed 
 out waiting to see all nodes published as DOWN in our cluster state.
 {quote}
 Looking at the code, this is probably because 
 ZkController.publishAndWaitForDownStates is called in ZkController.init, 
 which gets called via ZkContainer.initZookeeper in CoreContainer.load before 
 any of the stateFormat=2 collection watches are set in the 
 CoreContainer.preRegisterInZk call a few lines later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188428#comment-14188428
 ] 

ASF subversion and git services commented on SOLR-6610:
---

Commit 1635168 from [~noble.paul] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1635168 ]

SOLR-6610

 In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
 ---

 Key: SOLR-6610
 URL: https://issues.apache.org/jira/browse/SOLR-6610
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Noble Paul
  Labels: solrcloud
 Attachments: SOLR-6610.patch


 Using stateFormat=2, our solr always takes a while to start up and spits out 
 this warning line:
 {quote}
 WARN  - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed 
 out waiting to see all nodes published as DOWN in our cluster state.
 {quote}
 Looking at the code, this is probably because 
 ZkController.publishAndWaitForDownStates is called in ZkController.init, 
 which gets called via ZkContainer.initZookeeper in CoreContainer.load before 
 any of the stateFormat=2 collection watches are set in the 
 CoreContainer.preRegisterInZk call a few lines later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6030) Add norms patched compression which uses table for most common values

2014-10-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188442#comment-14188442
 ] 

Robert Muir commented on LUCENE-6030:
-

+1, very nice

 Add norms patched compression which uses table for most common values
 -

 Key: LUCENE-6030
 URL: https://issues.apache.org/jira/browse/LUCENE-6030
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ryan Ernst
 Attachments: LUCENE-6030.patch


 We have added the PATCHED norms sub format in lucene 50, which uses a bitset 
 to mark documents that have the most common value (when 97% of the documents 
 have that value).  This works well for fields that have a predominant value 
 length, and then a small number of docs with some other random values.  But 
 another common case is having a handful of very common value lengths, like 
 with a title field.
 We can use a table (see TABLE_COMPRESSION) to store the most common values, 
 and save an ordinal for the other cases, at which point we can look up the value in 
 the secondary patch table.
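
As a rough illustration of the idea (not the actual norms producer code), the sketch below keeps the handful of most common values in a small table indexed by a per-document ordinal and reserves one ordinal to mean "look the value up in a secondary patch structure"; all names and types here are assumptions made for the example.

{code:java}
import java.util.Map;

// Illustrative decode path: common values come from a tiny table, uncommon
// ("patched") values from a secondary map keyed by document id.
class PatchedTableNormsSketch {
  private final long[] commonValues;       // table of the most common values
  private final int exceptionOrd;          // reserved ordinal meaning "see the patch"
  private final byte[] ordinals;           // per-document ordinal (packed ints in practice)
  private final Map<Integer, Long> patch;  // docID -> uncommon value

  PatchedTableNormsSketch(long[] commonValues, byte[] ordinals, Map<Integer, Long> patch) {
    this.commonValues = commonValues;
    this.exceptionOrd = commonValues.length; // first ordinal past the table
    this.ordinals = ordinals;
    this.patch = patch;
  }

  long get(int docID) {
    int ord = ordinals[docID];
    if (ord == exceptionOrd) {
      return patch.get(docID);   // rare case: value was not in the table
    }
    return commonValues[ord];    // common case: direct table lookup
  }
}
{code}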



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188475#comment-14188475
 ] 

Noble Paul commented on SOLR-6517:
--

Please mention the JIRA ticket number in the commit message. It is extremely 
difficult otherwise when I look at the file history.

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so" Mr. Solr. This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6517:
-
Comment: was deleted

(was: Please mention the JIRA ticket number in the commit message. It is 
extremely difficult otherwise when I look at the file history.)

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so" Mr. Solr. This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188484#comment-14188484
 ] 

Cyrille Roy edited comment on SOLR-2927 at 10/29/14 3:53 PM:
-

I have been able to reproduce this issue patching the code to throw an 
exception in SolrCore in branch 4_2 in constructor line 875
...
  resourceLoader.inform(resourceLoader);   
  //DO NOT COMMIT THIS:
  if (!metadata.equals(name)) throw new RuntimeException("test exception");
 ...

you can then curl any core
$curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*"

open a jconsole and you will see the leaking mbean named solr/CORE_NAME


was (Author: croy):
I have able to reproduce this issue patching the code to throw an exception in 
SolrCore in branch 4_2 in constructor line 875
...
  resourceLoader.inform(resourceLoader);   
  //DO NOT COMMIT THIS:
  if (!metadata.equals(name)) throw new RuntimeException("test exception");
 ...

you can then curl any core
$curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*"

open a jconsole and you will see the leaking mbean named solr/CORE_NAME

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188484#comment-14188484
 ] 

Cyrille Roy commented on SOLR-2927:
---

I have been able to reproduce this issue patching the code to throw an exception in 
SolrCore in branch 4_2 in constructor line 875
...
  resourceLoader.inform(resourceLoader);   
  //DO NOT COMMIT THIS:
  if (!metadata.equals(name)) throw new RuntimeException("test exception");
 ...

you can then curl any core
$curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*"

open a jconsole and you will see the leaking mbean named solr/CORE_NAME

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrille Roy updated SOLR-2927:
--
Attachment: mbean-leak.png

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: mbean-leak.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2361) OutOfMemoryException while Indexing

2014-10-29 Thread Surendhar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188486#comment-14188486
 ] 

Surendhar commented on LUCENE-2361:
---

Hi Thomas,
I am facing a similar problem to the one mentioned above. I could see from your comments that 
changes were made. In what version was this problem resolved? I appreciate your help.

 OutOfMemoryException while Indexing
 ---

 Key: LUCENE-2361
 URL: https://issues.apache.org/jira/browse/LUCENE-2361
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Windows
Reporter: Shivender Devarakonda

 Hi,
 We use Lucene version 2.9.1.
 We see the following OutOfMemory error in our environment; I think this is 
 happening at significantly high load. Have you observed this anytime? Please 
 let me know your thoughts on this.
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: PermGen space
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
 Caused by: java.lang.OutOfMemoryError: PermGen space
   at java.lang.String.$$YJP$$intern(Native Method)
   at java.lang.String.intern(Unknown Source)
   at 
 org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
   at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
   at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
   at 
 org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
   at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrille Roy updated SOLR-2927:
--
Attachment: (was: mbean-leak.png)

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrille Roy updated SOLR-2927:
--
Attachment: mbean-leak-jira.png

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: mbean-leak-jira.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2361) OutOfMemoryException while Indexing

2014-10-29 Thread Surendhar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188493#comment-14188493
 ] 

Surendhar commented on LUCENE-2361:
---

I can see it is still open; please let me know in what version the problem was 
resolved.

 OutOfMemoryException while Indexing
 ---

 Key: LUCENE-2361
 URL: https://issues.apache.org/jira/browse/LUCENE-2361
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Windows
Reporter: Shivender Devarakonda

 Hi,
 We use Lucene version 2.9.1.
 We see the following OutOfMemory error in our environment; I think this is 
 happening at significantly high load. Have you observed this anytime? Please 
 let me know your thoughts on this.
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: PermGen space
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
 Caused by: java.lang.OutOfMemoryError: PermGen space
   at java.lang.String.$$YJP$$intern(Native Method)
   at java.lang.String.intern(Unknown Source)
   at 
 org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
   at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
   at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
   at 
 org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
   at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrille Roy updated SOLR-2927:
--
Attachment: SOLR-2927.patch

proposed patch: in SolrCore.close() start with waiting for searcherExecutor and 
then empty the infoRegistry which will unregister the mbean
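
A minimal sketch of the ordering this comment describes, with hypothetical stand-ins for the SolrCore internals (the real change is the attached SOLR-2927.patch): let the searcher executor drain first, so no late register() call can repopulate the registry, and only then clear the info registry that backs the JMX mbeans.

{code:java}
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Ordering sketch only: drain the executor, then clear the registry.
class CoreCloseSketch {
  private final ExecutorService searcherExecutor;
  private final Map<String, Object> infoRegistry;

  CoreCloseSketch(ExecutorService searcherExecutor, Map<String, Object> infoRegistry) {
    this.searcherExecutor = searcherExecutor;
    this.infoRegistry = infoRegistry;
  }

  void close() throws InterruptedException {
    searcherExecutor.shutdown();
    // Wait for in-flight searcher registrations to finish before touching the registry.
    searcherExecutor.awaitTermination(30, TimeUnit.SECONDS);
    // Nothing can add new entries now; clearing removes (and unregisters) what remains.
    infoRegistry.clear();
  }
}
{code}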

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: SOLR-2927.patch, mbean-leak-jira.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher

2014-10-29 Thread Cyrille Roy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188495#comment-14188495
 ] 

Cyrille Roy edited comment on SOLR-2927 at 10/29/14 4:16 PM:
-

proposed patch: in SolrCore.close() start with waiting for searcherExecutor and 
then empty the infoRegistry which will unregister the mbean.
Patch is built against trunk
Please let me know if it is not the right version to build a patch.


was (Author: croy):
proposed patch: in SolrCore.close() start with waiting for searcherExecutor and 
then empty the infoRegistry which will unregister the mbean

 SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
 

 Key: SOLR-2927
 URL: https://issues.apache.org/jira/browse/SOLR-2927
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-ALPHA
 Environment: JDK1.6/CentOS
Reporter: tom liu
Assignee: Shalin Shekhar Mangar
 Fix For: 4.9, Trunk

 Attachments: SOLR-2927.patch, mbean-leak-jira.png


 # SolrIndexSearcher's register method puts the name of the searcher, but 
 SolrCore's closeSearcher method removes the name of currentSearcher from the 
 infoRegistry.
 # SolrIndexSearcher's register method puts the name of the cache, but 
 SolrIndexSearcher's close does not remove the name of the cache.
 so there may be some memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread Jessica Cheng Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188554#comment-14188554
 ] 

Jessica Cheng Mallet commented on SOLR-6631:


I originally thought NodeChildrenChanged would be enough too, but it made the 
tests hang forever. That's when I realized that the zk.exist() call in offer() 
also uses this watcher, so it's not enough to just watch for 
NodeChildrenChanged.

We can either make the watcher set all non-None events (None events don't 
remove watches, so they need to be excluded), or use a different kind of watch 
in the zk.exist() call.
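
For the second option, a small sketch of what a dedicated watch for the exists() check could look like (the names are illustrative, not the DistributedQueue source); the point is simply that the child-latch watcher then only ever receives children-changed events.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch: give the exists() check its own watcher instead of reusing the child latch.
class QueueExistsWatchSketch {
  private final ZooKeeper zk;

  QueueExistsWatchSketch(ZooKeeper zk) {
    this.zk = zk;
  }

  Stat existsWithOwnWatch(String path) throws KeeperException, InterruptedException {
    return zk.exists(path, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // React to node creation/deletion here without disturbing the child latch.
      }
    });
  }
}
{code}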

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() requests to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188559#comment-14188559
 ] 

Noble Paul commented on SOLR-6517:
--

I'm trying to understand how the REBALANCELEADERS command works. Is there a 
small writeup on the sequence of operations performed when this command is 
invoked?

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so" Mr. Solr. This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()

2014-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188575#comment-14188575
 ] 

Mark Miller commented on SOLR-6631:
---

Whoops - glossed over that in my IDE call hierarchy.

bq. a different kind of watch in the zk.exist() call

I lean towards that - subclass or something - I think it fits better with the 
current name of the watcher, and I think it's better if the watch only processes the 
events it is actually interested in.

Not a real big deal either way, but in the other case, let's change the name of 
the Watcher.

 DistributedQueue spinning on calling zookeeper getChildren()
 

 Key: SOLR-6631
 URL: https://issues.apache.org/jira/browse/SOLR-6631
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Jessica Cheng Mallet
Assignee: Timothy Potter
  Labels: solrcloud
 Attachments: SOLR-6631.patch


 The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
 making getChildren() requests to zookeeper with this thread dump:
 {quote}
 Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
 java.lang.Object.wait()
 org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
 ZooKeeper$WatchRegistration)
 org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
 org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
 org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
 boolean)
 org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
 org.apache.solr.cloud.DistributedQueue.getChildren(long)
 org.apache.solr.cloud.DistributedQueue.peek(long)
 org.apache.solr.cloud.DistributedQueue.peek(boolean)
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
 java.lang.Thread.run()
 {quote}
 Looking at the code, I think the issue is that LatchChildWatcher#process 
 always sets the event to its member variable event, regardless of its type, 
 but the problem is that once the member event is set, the await no longer 
 waits. In this state, the while loop in getChildren(long), when called with 
 wait being Integer.MAX_VALUE will loop back, NOT wait at await because event 
 != null, but then it still will not get any children.
 {quote}
 while (true) \{
   if (!children.isEmpty()) break;
   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
   if (watcher.getWatchedEvent() != null)
 \{ children = orderedChildren(null); \}
   if (wait != Long.MAX_VALUE) break;
 \}
 {quote}
 I think the fix would be to only set the event in the watcher if the type is 
 not None.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188690#comment-14188690
 ] 

Erick Erickson edited comment on SOLR-6517 at 10/29/14 5:59 PM:


There's some information here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. 
There's a little bit of overloading here. 
CollectionsHandler.handleBalanceLeaders does the throttling of how many 
outstanding requests there are, and 
OverseerCollectionsProcessor.processAssignLeaders just queues up an 
Overseer.OverseerAction.LEADER call for the Overseer to execute.

Yeah, as the notes above indicate I'm perfectly aware that I should mention the 
JIRA in the message, just managed to forget once.


was (Author: erickerickson):
There's some information here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. 
There's a little bit of overloading here. 
CollectionsHandler.handleBalanceLeaders does the throttling of how many 
outstanding requests there are, and 
OverseerCollectionsProcessor.processAssignLeaders just queues up an 
Overseer.OverseerAction.LEADER.toLower() call for the Overseer to execute.

Yeah, as the notes above indicate I'm perfectly aware that I should mention the 
JIRA in the message, just managed to forget once.

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so" Mr. Solr. This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188690#comment-14188690
 ] 

Erick Erickson commented on SOLR-6517:
--

There's some information here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. 
There's a little bit of overloading here. 
CollectionsHandler.handleBalanceLeaders does the throttling of how many 
outstanding requests there are, and 
OverseerCollectionsProcessor.processAssignLeaders just queues up an 
Overseer.OverseerAction.LEADER.toLower() call for the Overseer to execute.

Yeah, as the notes above indicate I'm perfectly aware that I should mention the 
JIRA in the message, just managed to forget once.

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so" Mr. Solr. This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188703#comment-14188703
 ] 

Noble Paul commented on SOLR-6517:
--

How is the leader election changed? How does it ensure that the 
preferredLeader gets elected?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188733#comment-14188733
 ] 

Erick Erickson commented on SOLR-6517:
--

First, there aren't any guarantees here; it tries its best. For instance, the 
node may be down, etc...

Barring that though, it's pretty straightforward. The meat of the processing is 
in CollectionsHandler.handleBalanceLeaders.

get the DocCollection from the cluster state.
for each slice {
  for each replica {
 if (the replica is active and NOT the leader and has the preferredLeader 
property set) queue it up to become the leader
  }
}

There's some bookkeeping here to respect the various parameters about how many 
to reassign at once and how long to wait (maxAtOnce and maxWaitSeconds), as well 
as to construct a pretty response giving all the info it can. The latter is one 
benefit of having the heavy lifting in CollectionsHandler rather than over in 
the Overseer.
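
In Java terms, the loop is roughly the following (a sketch only, not the patch 
code: the cluster-state classes are the real SolrCloud types, but the 
"property.preferredleader" key and queueLeaderChange() are stand-ins):

{code:java}
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

// Hedged sketch of the pseudocode above, NOT the patch itself.
class RebalanceSketch {
  void rebalancePreferredLeaders(ClusterState clusterState, String collection) {
    DocCollection coll = clusterState.getCollection(collection);
    for (Slice slice : coll.getSlices()) {
      Replica leader = slice.getLeader();
      for (Replica replica : slice.getReplicas()) {
        boolean active = "active".equals(replica.getStr("state"));
        boolean isLeader = leader != null && leader.getName().equals(replica.getName());
        boolean preferred = "true".equals(replica.getStr("property.preferredleader"));
        if (active && !isLeader && preferred) {
          // enqueue a leadership change, subject to maxAtOnce/maxWaitSeconds
          queueLeaderChange(slice, replica);
        }
      }
    }
  }

  void queueLeaderChange(Slice slice, Replica replica) {
    // placeholder: the real code submits an Overseer.OverseerAction.LEADER op
  }
}
{code}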





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188746#comment-14188746
 ] 

Noble Paul commented on SOLR-6517:
--

bq. (the replica is active and NOT the leader and has the preferredLeader 
property set) queue it up to become the leader 

What happens to the node that is already the leader? Is it evicted?

What happens to the other nodes in the queue?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-2361) OutOfMemoryException while Indexing

2014-10-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-2361.

Resolution: Cannot Reproduce

IMO this shouldn't be a JIRA issue... it should have been an email thread on 
the Lucene java-user list.  Once a reproducible problem is found, then create an 
issue.  OOMs are quite possible simply by allocating too little heap to Java.

 OutOfMemoryException while Indexing
 ---

 Key: LUCENE-2361
 URL: https://issues.apache.org/jira/browse/LUCENE-2361
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Windows
Reporter: Shivender Devarakonda

 Hi,
 We use Lucene version 2.9.1.
 We see the following OutOfMemory error in our environment; I think this is 
 happening at a significantly high load. Have you observed this at any time? Please 
 let me know your thoughts on this.
 org.apache.lucene.index.MergePolicy$MergeException: 
 java.lang.OutOfMemoryError: PermGen space
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
 Caused by: java.lang.OutOfMemoryError: PermGen space
   at java.lang.String.$$YJP$$intern(Native Method)
   at java.lang.String.intern(Unknown Source)
   at 
 org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
   at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
   at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
   at 
 org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
   at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #745: POMs out of sync

2014-10-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/745/

2 tests failed.
FAILED:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([6184597A251FBD36]:0)


FAILED:  
org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.org.apache.solr.cloud.ChaosMonkeySafeLeaderTest

Error Message:
Suite timeout exceeded (= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (= 720 msec).
at __randomizedtesting.SeedInfo.seed([6184597A251FBD36]:0)




Build Log:
[...truncated 53702 lines...]
BUILD FAILED
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:548: 
The following error occurred while executing this line:
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:200: 
The following error occurred while executing this line:
: Java returned: 1

Total time: 289 minutes 12 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-6031) TokenSources

2014-10-29 Thread David Smiley (JIRA)
David Smiley created LUCENE-6031:


 Summary: TokenSources 
 Key: LUCENE-6031
 URL: https://issues.apache.org/jira/browse/LUCENE-6031
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/termvectors
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0


TokenSources.java, in the highlight module, is a facade that returns a 
TokenStream for a field by either un-inverting & converting the TermVector 
Terms, or by text re-analysis if TermVectors are unavailable or don't have the 
right options.  TokenSources is used by the default highlighter, which is the 
most accurate highlighter we've got.  When documents are large (say hundreds of 
kilobytes on up), I found that most of the highlighter's activity was up-front 
spent un-inverting & converting the term vector to a TokenStream, not on the 
actual/real highlighting that follows.  Much of that time was on a huge sort of 
hundreds of thousands of Tokens.  Time was also spent doing lots of String 
conversion and char copying, and it used a lot of memory, too.

In this patch, I overhauled TokenStreamFromTermPositionVector.java, and I 
removed similar logic in TokenSources that was used in circumstances when 
positions weren't available but offsets were.  This class can un-invert term 
vectors that have positions *and/or* offsets (at least one).  It doesn't sort.  
It places Tokens _directly_ into an array of tokens directly indexed by 
position.  When positions aren't available, the startOffset/8 is a substitute.  
I've got a more light-weight Token inner class used in place of the former and 
deprecated Token that ultimately forms a linked-list when the process is done.  
There is no string conversion; character copying is minimized.  The Token array 
is GC'ed after initialization, it's only needed during construction.

Misc:
* It implements reset() efficiently so it need not be wrapped in 
CachingTokenFilter (I'll supply a patch later on this).
* It only fetches payloads if you ask for them by adding the attribute (the 
default highlighter won't add the attribute).  
* It exposes the underlying TermVector terms via a getter too, which is needed 
by another patch to follow later.

A key assumption is that the position increment gap or first position isn't 
gigantic, as that will create wasted space and the linked-list formation 
ultimately has to visit all the slots.  We also assume that there aren't a ton 
of tokens at the same position, since inserting new tokens in sorted order is 
O(N^2) where 'N' is the average co-occurring token length.

My performance testing using Lucene's benchmark module on a megabyte document 
showed 5x speedup, in conjunction with some other patches to be posted 
separately. This patch made the most difference.

As an aside, our JIRA Components ought to be updated to reflect our Lucene 
modules.  There should be a component for highlighting, and not for term 
vectors.
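
To make the array-by-position idea concrete, here is a stripped-down sketch 
(invented types, not the patch itself):

{code:java}
// Illustration only: place tokens directly into a slot-per-position array and
// chain them, instead of sorting a huge list. These types are made up.
class PositionedToken {
  final CharSequence term;
  final int startOffset, endOffset;
  PositionedToken next; // chain formed as tokens land in a slot
  PositionedToken(CharSequence term, int startOffset, int endOffset) {
    this.term = term; this.startOffset = startOffset; this.endOffset = endOffset;
  }
}

class TokenSlots {
  private PositionedToken[] slots = new PositionedToken[64];

  /** position < 0 means the term vector has no positions; fall back to startOffset/8. */
  void add(int position, PositionedToken token) {
    int slot = position >= 0 ? position : token.startOffset / 8;
    if (slot >= slots.length) {
      slots = java.util.Arrays.copyOf(slots, Math.max(slot + 1, slots.length * 2));
    }
    token.next = slots[slot]; // same-slot tokens simply chain together
    slots[slot] = token;
  }

  /** One pass over the slots yields the tokens in position order -- no sort. */
  java.util.List<PositionedToken> inOrder() {
    java.util.List<PositionedToken> out = new java.util.ArrayList<>();
    for (PositionedToken head : slots) {
      for (PositionedToken t = head; t != null; t = t.next) out.add(t);
    }
    return out;
  }
}
{code}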



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6031) TokenSources optimization, avoid sort

2014-10-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6031:
-
Summary: TokenSources optimization, avoid sort  (was: TokenSources )




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6031) TokenSources optimization, avoid sort

2014-10-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6031:
-
Attachment: LUCENE-6031.patch

Here's the patch.

There are a couple of no-commits:
# I want to rename TokenStreamFromTermPositionVector to 
TokenStreamFromTermVector
# I think TokenSources.getTokenStreamWithOffsets should relax its insistence 
that the term vector have positions.  If you have control of your index options 
(and you do!), then you can choose not to put in positions and then highlight 
with the consequences of that decision, which is that highlighting will ignore 
stop-words: a query "Sugar and Spice" would not match "sugar spice", and a 
query of "sugar spice" would match "sugar and spice" indexed.  If you don't 
even have stop-words, then why put positions in the term vector?
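
For what it's worth, an offsets-only term vector is easy to produce at index 
time; a minimal sketch with Lucene's FieldType (field name and text are made 
up):

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

// Sketch: a field whose term vector stores offsets but not positions --
// the case getTokenStreamWithOffsets would then need to accept.
public class OffsetsOnlyTermVectorExample {
  public static void main(String[] args) {
    FieldType tvOffsetsOnly = new FieldType(TextField.TYPE_STORED);
    tvOffsetsOnly.setStoreTermVectors(true);
    tvOffsetsOnly.setStoreTermVectorOffsets(true);
    tvOffsetsOnly.setStoreTermVectorPositions(false); // deliberately no positions
    tvOffsetsOnly.freeze();

    Document doc = new Document();
    doc.add(new Field("body", "sugar and spice", tvOffsetsOnly));
    // ... add to an IndexWriter as usual ...
  }
}
{code}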




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-6032) Dealing with slow iterators

2014-10-29 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-6032:


 Summary: Dealing with slow iterators
 Key: LUCENE-6032
 URL: https://issues.apache.org/jira/browse/LUCENE-6032
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Adrien Grand
Priority: Minor


This is a recurring issue (for instance already discussed in LUCENE-5418) but 
queries can sometimes be super slow if they wrap a filter that provides 
linear-time nextDoc/advance.

LUCENE-5418 has the following comment:

bq. New patch, throwing UOE from DocIdSet.iterator() for the Filter returned by 
Range.getFilter(). I like this approach: it's safer for the user so they don't 
accidentally apply a super slow filter.

I like this approach because doc id sets not providing efficient iteration 
should really be an exception rather than a common case. In addition, using an 
exception has the benefit of propagating the information through the call 
stack, which would not be the case if we used null or a sentinel value to say 
that the iterator is super slow. So if you write a filter that can wrap other 
filters and doesn't know how to deal with filters that don't support efficient 
iteration, you do not need to modify your code: it will work just fine with 
filters that support fast iteration and will fail on filters that don't.

Something I would like to explore is whether things like FilteredQuery could 
catch this exception in order to fall back automatically to a random-access 
strategy?

The general idea I have is that it is ok to apply a random filter as long as 
you have a fast iterator to drive iteration? So eg. a filtered query based on a 
slow iterator would make sense, but not a ConstantScoreQuery that would wrap a 
filter since it would need to evaluate the filter on all non-deleted documents 
(it would propagate the exception of the filter).
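
A sketch of that kind of fallback (illustrative only, not from a patch):

{code:java}
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

// Sketch only: prefer the iterator, but if the DocIdSet throws
// UnsupportedOperationException because iteration would be linear-time,
// fall back to random access via bits().
public class SlowFilterFallback {
  public static void apply(DocIdSet docIdSet) throws IOException {
    DocIdSetIterator it = null;
    try {
      it = docIdSet.iterator();
    } catch (UnsupportedOperationException e) {
      // the set refuses to iterate; use random access instead
    }
    if (it != null) {
      // drive matching from the filter's iterator (leap-frog with the query scorer)
    } else {
      Bits bits = docIdSet.bits(); // a set that refuses iteration is expected to provide bits
      // drive matching from the query's scorer and check bits.get(doc) per candidate
    }
  }
}
{code}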



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList

2014-10-29 Thread David Smiley (JIRA)
David Smiley created LUCENE-6033:


 Summary: Add CachingTokenFilter.isCached and switch LinkedList to 
ArrayList
 Key: LUCENE-6033
 URL: https://issues.apache.org/jira/browse/LUCENE-6033
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0


CachingTokenFilter could use a simple boolean isCached() method implemented 
as-such:
{code:java}
  /** If the underlying token stream was consumed and cached */
  public boolean isCached() {
return cache != null;
  }
{code}
It's useful for the highlighting code to remove its wrapping CachingTokenFilter 
if, after handing off to parts of its framework, it turns out that it wasn't 
used.

Furthermore, use an ArrayList, not a LinkedList.  ArrayList is leaner when the 
token count is high, and this class doesn't manipulate the list in a way that 
might favor a LinkedList.

A separate patch will come that actually uses this method.
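
A sketch of the intended call site (handOff() stands in for the highlighter 
framework; this is not the follow-up patch):

{code:java}
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Wrap defensively, hand off, then keep the cache only if it was actually filled.
public class IsCachedExample {
  static TokenStream maybeCached(TokenStream input) {
    CachingTokenFilter cache = new CachingTokenFilter(input);
    handOff(cache); // may or may not consume the stream
    return cache.isCached() ? cache : input;
  }

  static void handOff(TokenStream ts) {
    // placeholder for the framework hand-off
  }
}
{code}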



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6032) Dealing with slow iterators

2014-10-29 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-6032:
-
Attachment: LUCENE-6032.patch

Here is a patch just to show the idea (it doesn't pass tests anyway since we 
have a couple of tests that wrap slow filters into a CSQ to test that they 
match the right docs).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList

2014-10-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6033:
-
Attachment: LUCENE-6033.patch




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188976#comment-14188976
 ] 

Erick Erickson commented on SOLR-6517:
--

bq: What happens to the node that is leader already?
Nothing; it's already the leader. What purpose would be served by changing 
anything?

bq: What happens to other nodes in the queue?
Not sure what you're asking here. The trick is that if a replica has the 
preferredLeader property set, LeaderElector.joinElection is called with 
joinAtHead set to true. So it's next up in the list when the leadership is 
changed. The rest of the nodes are still in the queue though, ready to take 
over if the preferredLeader goes away.
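
Roughly, as a paraphrase (the three-argument joinElection overload and the 
property key shown here are assumptions, not a quote from the patch):

{code:java}
import org.apache.solr.cloud.ElectionContext;
import org.apache.solr.cloud.LeaderElector;
import org.apache.solr.common.cloud.Replica;

// Paraphrase only: a replica flagged as preferredLeader rejoins the leader
// election at the head of the queue instead of the tail.
class JoinAtHeadSketch {
  void rejoin(LeaderElector elector, ElectionContext context, Replica replica) throws Exception {
    boolean joinAtHead = "true".equals(replica.getStr("property.preferredleader"));
    elector.joinElection(context, true /* rejoining */, joinAtHead);
  }
}
{code}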






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4942 - Failure

2014-10-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4942/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request
at 
__randomizedtesting.SeedInfo.seed([F327769CC692E696:32EFABDA67F4373F]:0)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:539)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189118#comment-14189118
 ] 

Jan Høydahl commented on SOLR-6513:
---

I thought we agreed to prefer the term "shard" over "slice", so I think we 
should do this for this API as well. 

The *only* place in our refguide we use the word "slice" is in [How SolrCloud 
Works|https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works] 
\[1\] and that description is disputed.

The refguide explanation of what a shard is can be found in [Shards and 
Indexing Data in 
SolrCloud|https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud]
 \[2\], quoting: 
{quote}
When your data is too large for one node, you can break it up and store it in 
sections by creating one or more shards. Each is a portion of the logical 
index, or core, and it's the set of all nodes containing that section of the 
index.
{quote}

So I'm proposing a rename of this API to {{BALANCESHARDUNIQUE}} and a rewrite 
of \[1\].

\[1\] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
\[2\] 
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

 Add a collectionsAPI call BALANCESLICEUNIQUE
 

 Key: SOLR-6513
 URL: https://issues.apache.org/jira/browse/SOLR-6513
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, 
 SOLR-6513.patch


 Another sub-task for SOLR-6491. The ability to assign a property on a 
 node-by-node basis is nice, but tedious to get right for a sysadmin, 
 especially if there are, say, 100s of nodes hosting a system. This JIRA would 
 essentially provide an automatic mechanism for assigning a property. This 
 particular command simply changes the cluster state, it doesn't do anything 
 like re-assign functions.
 My idea for this version is fairly limited. You'd have to specify a 
 collection and there would be no attempt to, say, evenly distribute the 
 preferred leader role/property for this collection by looking at _other_ 
 collections. Or by looking at underlying hardware capabilities. Or
 It would be a pretty simple round-robin assignment. About the only 
 intelligence built in would be to change as few roles/properties as possible. 
 Let's say that the correct number of nodes for this role turned out to be 3. 
 Any node currently having 3 properties for this collection would NOT be 
 changed. Any node having 2 properties would have one added that would be 
 taken from some node with > 3 properties like this.
 This probably needs an optional parameter, something like 
 includeInactiveNodes=true|false
 Since this is an arbitrary property, one must specify sliceUnique=true. So 
 for the preferredLeader functionality, one would specify something like:
 action=BALANCESLICEUNIQUE&property=preferredLeader&property.value=true.
 There are checks in this code that require the preferredLeader to have a t/f 
 value and require that sliceUnique be true. That said, this can be called on 
 an arbitrary property that has only one such property per slice.
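
Spelled out as a full request (host and collection are placeholders):

{code}
http://localhost:8983/solr/admin/collections?action=BALANCESLICEUNIQUE&collection=collection1&property=preferredLeader&property.value=true
{code}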



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE

2014-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189130#comment-14189130
 ] 

Mark Miller commented on SOLR-6513:
---

The general way of things has been to use "slice" in code and "shard" + context in 
user-facing things. There has never been real agreement on any of these issues IMO 
though. Not even when just two of us worked on it.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE

2014-10-29 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-6670:


 Summary: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
 Key: SOLR-6670
 URL: https://issues.apache.org/jira/browse/SOLR-6670
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Priority: Minor


JIRA for Jan's comments on SOLR-6513:

I thought we agreed to prefer the term "shard" over "slice", so I think we 
should do this for this API as well.
The only place in our refguide we use the word "slice" is in "How SolrCloud 
Works" [1] and that description is disputed.
The refguide explanation of what a shard is can be found in Shards and Indexing 
Data in SolrCloud [2], quoting:
When your data is too large for one node, you can break it up and store it in 
sections by creating one or more shards. Each is a portion of the logical 
index, or core, and it's the set of all nodes containing that section of the 
index.
So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of 
[1].
[1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
[2] 
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Note Mark's comment on that JIRA, but I think it would be best to continue to 
talk about shards with user-facing operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE

2014-10-29 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-6670:


Assignee: Erick Erickson




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE

2014-10-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189181#comment-14189181
 ] 

Erick Erickson commented on SOLR-6513:
--

Still, I'm all for keeping things consistent. See SOLR-6670 and we'll go from 
there.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1672) RFE: facet reverse sort count

2014-10-29 Thread Charles Draper (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189210#comment-14189210
 ] 

Charles Draper commented on SOLR-1672:
--

I would make heavy use of sort by index desc if it was available.

 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSet<Long> in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
 + final BoundedTreeSet<Long> queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSet<Long>(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
 return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? result < 0 ? -1 : 1 : 0); //lowest number first sort
 }}) : new BoundedTreeSet<Long>(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
 retrieve the new parameter, if present. 'asc' is used as the default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.
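
For clarity, a request exercising the new parameter would look something like 
this (core path and field name are just examples):

{code}
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&f.category.facet.sortorder=dsc&facet.mincount=1
{code}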



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189236#comment-14189236
 ] 

ASF subversion and git services commented on SOLR-6248:
---

Commit 1635329 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1635329 ]

SOLR-6248: Changing the format of mlt query parser

 MoreLikeThis Query Parser
 -

 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Fix For: 5.0

 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, 
 SOLR-6248.patch, SOLR-6248.patch


 MLT Component doesn't let people highlight/paginate and the handler comes 
 with a cost of maintaining another piece in the config. Also, any changes to 
 the default (number of results to be fetched etc.) /select handler need to be 
 copied/synced with this handler too.
 Having an MLT QParser would let users get back docs based on a query for them 
 to paginate, highlight etc. It would also give them the flexibility to use 
 this anywhere i.e. q,fq,bq etc.
 A bit of history about MLT (thanks to Hoss)
 MLT Handler pre-dates the existence of QParsers and was meant to take an 
 arbitrary query as input, find docs that match that 
 query, club them together to find interesting terms, and then use those 
 terms as if they were my main query to generate a main result set.
 This result would then be used as the set to facet, highlight etc.
 The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)
 The MLT component on the other hand solved a very different purpose of 
 augmenting the main result set. It is used to get similar docs for each of 
 the doc in the main result set.
 DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
 The new approach:
 All of this can be done better and cleaner (and makes more sense too) using 
 an MLT QParser.
 An important thing to handle here is the case where the user doesn't have 
 TermVectors, in which case, it does what happens right now i.e. parsing 
 stored fields.
 Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
 field would need to be a TextField with an index analyzer defined. This 
 analyzer will then be used to extract terms for MLT.
 In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
 the schema (if TermVectors are enabled for the field). If not, a /get call 
 can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-10-29 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189235#comment-14189235
 ] 

Anshum Gupta commented on SOLR-6248:


After a discussion with Hoss, I'm changing the format of the query parser. It 
wouldn't have an 'id' key in the request, i.e. the new request would look like:
{quote}
{!mlt qf=fieldname}docId
{quote}

This would eliminate the need to document/maintain and track a new parameter 
name.
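
So a full request would look something like this (core name, field, and 
document id are placeholders):

{code}
http://localhost:8983/solr/collection1/select?q={!mlt qf=name}1234&fl=id,name,score
{code}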

 MoreLikeThis Query Parser
 -

 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Fix For: 5.0

 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, 
 SOLR-6248.patch, SOLR-6248.patch


 MLT Component doesn't let people highlight/paginate and the handler comes 
 with an cost of maintaining another piece in the config. Also, any changes to 
 the default (number of results to be fetched etc.) /select handler need to be 
 copied/synced with this handler too.
 Having an MLT QParser would let users get back docs based on a query for them 
 to paginate, highlight etc. It would also give them the flexibility to use 
 this anywhere i.e. q,fq,bq etc.
 A bit of history about MLT (thanks to Hoss)
 MLT Handler pre-dates the existence of QParsers and was meant to take an 
 arbitrary query as input, find docs that match that 
 query, club them together to find interesting terms, and then use those 
 terms as if they were my main query to generate a main result set.
 This result would then be used as the set to facet, highlight etc.
 The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)
 The MLT component on the other hand solved a very different purpose of 
 augmenting the main result set. It is used to get similar docs for each of 
 the doc in the main result set.
 DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
 The new approach:
 All of this can be done better and cleaner (and makes more sense too) using 
 an MLT QParser.
 An important thing to handle here is the case where the user doesn't have 
 TermVectors, in which case, it does what happens right now i.e. parsing 
 stored fields.
 Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
 field would need to be a TextField with an index analyzer defined. This 
 analyzer will then be used to extract terms for MLT.
 In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
 the schema (if TermVectors are enabled for the field). If not, a /get call 
 can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5579) Leader stops processing collection-work-queue after failed collection reload

2014-10-29 Thread Ryan Cooke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189269#comment-14189269
 ] 

Ryan Cooke commented on SOLR-5579:
--

Pretty sure we are also encountering this issue: the collection reload HTTP 
requests issued through the core admin are timing out, and a corresponding 
message is sitting in the collection-work-queue. Reloading cores using the 
reload button in the admin GUI will successfully reload the local collection, 
however. Issuing the reload HTTP request with the parameter async=true seems to 
behave in the same way (the request times out).
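
For reference, the kind of request described above would be something like this 
(host and collection name are placeholders):
{code}
http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection&async=true
{code}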

 Leader stops processing collection-work-queue after failed collection reload
 

 Key: SOLR-5579
 URL: https://issues.apache.org/jira/browse/SOLR-5579
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1
 Environment: Debian Linux 6.0 running on VMWare
 Using embedded SOLR Jetty.
Reporter: Eric Bus
Assignee: Mark Miller
  Labels: collections, queue

 I've been experiencing the same problem a few times now. My leader in 
 /overseer_elect/leader stops processing the collection queue at 
 /overseer/collection-queue-work. The queue will build up and it will trigger 
 an alert in my monitoring tool.
 I haven't been able to pinpoint the reason that the leader stops, but usually 
 I kill the leader node to trigger a leader election. The new node will pick 
 up the queue. And this is where the problems start.
 When the new leader is processing the queue and picks up a reload for a shard 
 without an active leader, the queue stops. It keeps repeating the message 
 that there is no active leader for the shard. But a new leader is never 
 elected:
 {quote}
 ERROR - 2013-12-24 14:43:40.390; org.apache.solr.common.SolrException; Error 
 while trying to recover. 
 core=magento_349_shard1_replica1:org.apache.solr.common.SolrException: No 
 registered leader was found, collection:magento_349 slice:shard1
 at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:482)
 at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:465)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:317)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)
 ERROR - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; 
 Recovery failed - trying again... (7) core=magento_349_shard1_replica1
 INFO  - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; Wait 
 256.0 seconds before trying to recover again (8)
 {quote}
 Is the leader election in some way connected to the collection queue? If so, 
 can this be a deadlock, because it won't elect until the reload is complete?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-10-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189268#comment-14189268
 ] 

ASF subversion and git services commented on SOLR-6248:
---

Commit 1635336 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1635336 ]

SOLR-6248: Changing request format for mlt queryparser (merge from trunk)

 MoreLikeThis Query Parser
 -

 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Fix For: 5.0

 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, 
 SOLR-6248.patch, SOLR-6248.patch


 MLT Component doesn't let people highlight/paginate and the handler comes 
 with a cost of maintaining another piece in the config. Also, any changes to 
 the default (number of results to be fetched etc.) /select handler need to be 
 copied/synced with this handler too.
 Having an MLT QParser would let users get back docs based on a query for them 
 to paginate, highlight etc. It would also give them the flexibility to use 
 this anywhere i.e. q,fq,bq etc.
 A bit of history about MLT (thanks to Hoss)
 MLT Handler pre-dates the existence of QParsers and was meant to take an 
 arbitrary query as input, find docs that match that 
 query, club them together to find interesting terms, and then use those 
 terms as if they were my main query to generate a main result set.
 This result would then be used as the set to facet, highlight etc.
 The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)
 The MLT component on the other hand solved a very different purpose of 
 augmenting the main result set. It is used to get similar docs for each of 
 the doc in the main result set.
 DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
 The new approach:
 All of this can be done better and cleaner (and makes more sense too) using 
 an MLT QParser.
 An important thing to handle here is the case where the user doesn't have 
 TermVectors, in which case, it does what happens right now i.e. parsing 
 stored fields.
 Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
 field would need to be a TextField with an index analyzer defined. This 
 analyzer will then be used to extract terms for MLT.
 In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
 the schema (if TermVectors are enabled for the field). If not, a /get call 
 can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread JIRA
Jan Høydahl created SOLR-6671:
-

 Summary: Introduce a solr.data.root as root dir for all data
 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


Many users prefer to deploy code, config and data on separate disk locations, 
so the default of placing the indexes under 
{{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.

In a multi-core/collection system, there is not much help in the 
{{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
for all collections. One workaround, if you don't want to hardcode paths in 
your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
{{solr.properties}} file.

A more elegant solution would be to introduce a new Java option 
{{solr.data.root}} which would be to data what {{solr.solr.home}} is for 
config. If set, all collections would default their {{dataDir}} to 
{{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189285#comment-14189285
 ] 

Hoss Man commented on SOLR-6671:


isn't this already trivial for users to do by specifying 
{{<dataDir>$\{solr.data.root\}/$\{solr.core.name\}/data</dataDir>}} in their 
solrconfig.xml file(s)?
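
i.e. something like this sketch (assuming {{solr.data.root}} is passed as a 
system property):
{code:xml}
<!-- workaround: every core keeps its data under ${solr.data.root} -->
<dataDir>${solr.data.root}/${solr.core.name}/data</dataDir>
{code}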

 Introduce a solr.data.root as root dir for all data
 ---

 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


 Many users prefer to deploy code, config and data on separate disk locations, 
 so the default of placing the indexes under 
 {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.
 In a multi-core/collection system, there is not much help in the 
 {{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
 for all collections. One workaround, if you don't want to hardcode paths in 
 your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
 {{solr.properties}} file.
 A more elegant solution would be to introduce a new Java option 
 {{solr.data.root}} which would be to data what {{solr.solr.home}} is 
 for config. If set, all collections would default their {{dataDir}} to 
 {{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189306#comment-14189306
 ] 

Jan Høydahl commented on SOLR-6671:
---

Not sure how to wire it in so it will also work as it does today if the new 
option is not specified.

What we have now in {{solrconfig.xml}} is:
{code:xml}<dataDir>${solr.data.dir:}</dataDir>{code}
One way is to add a new property in {{solr.xml}}:
{code:xml}<dataRootDir>${solr.data.root:}</dataRootDir>{code}
Then modify the logic in SolrCore and the other places that resolve the default 
data dir so that, if it is empty, solr.data.root is considered as well.
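
Something like this rough sketch of the fallback (names here are illustrative, 
not the actual SolrCore code):
{code:java}
// Illustrative sketch only: resolve a core's data dir, preferring an explicit
// dataDir, then the proposed solr.data.root, then the current default.
static String resolveDataDir(String explicitDataDir, String coreName, String instanceDir) {
  if (explicitDataDir != null && !explicitDataDir.isEmpty()) {
    return explicitDataDir;           // dataDir set in solrconfig.xml / core properties
  }
  String dataRoot = System.getProperty("solr.data.root");
  if (dataRoot != null && !dataRoot.isEmpty()) {
    return dataRoot + "/" + coreName + "/data";  // new behaviour proposed here
  }
  return instanceDir + "/data";       // behaviour today
}
{code}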


 Introduce a solr.data.root as root dir for all data
 ---

 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


 Many users prefer to deploy code, config and data on separate disk locations, 
 so the default of placing the indexes under 
 {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.
 In a multi-core/collection system, there is not much help in the 
 {{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
 for all collections. One workaround, if you don't want to hardcode paths in 
 your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
 {{solr.properties}} file.
 A more elegant solution would be to introduce a new Java option 
 {{solr.data.root}} which would be to data what {{solr.solr.home}} is 
 for config. If set, all collections would default their {{dataDir}} to 
 {{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189315#comment-14189315
 ] 

Mark Miller commented on SOLR-6671:
---

This is similar to the solr.hdfs.home that the HdfsDirectoryFactory exposes to 
root SolrCloud instance dirs in one location. Def makes sense to have the same 
option for local filesystem given that you really don't want to manage data 
directories manually when using SolrCloud if you can help it. That was also a 
driving reason behind solr.hdfs.home.

 Introduce a solr.data.root as root dir for all data
 ---

 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


 Many users prefer to deploy code, config and data on separate disk locations, 
 so the default of placing the indexes under 
 {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.
 In a multi-core/collection system, there is not much help in the 
 {{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
 for all collections. One workaround, if you don't want to hardcode paths in 
 your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
 {{solr.properties}} file.
 A more elegant solution would be to introduce a new Java option 
 {{solr.data.root}} which would be to data what {{solr.solr.home}} is 
 for config. If set, all collections would default their {{dataDir}} to 
 {{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1672) RFE: facet reverse sort count

2014-10-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189341#comment-14189341
 ] 

Yonik Seeley commented on SOLR-1672:


bq. {code} And I favor the index [asc|desc] / count [asc|desc] format{code} 
+1, this is exactly the syntax that Heliosearch uses (well, it actually accepts 
either "index desc" or "index:desc") since the API is JSON:
http://heliosearch.org/json-facet-api/#TermsFacet
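
A small example of what that looks like in the JSON Facet API (facet and field 
names are illustrative):
{code}
json.facet={
  categories : {
    terms : {
      field : cat,
      sort : "index desc"
    }
  }
}
{code}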

 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSet<Long> in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
 + final BoundedTreeSet<Long> queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSet<Long>(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
 return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? result < 0 ? -1 : 1 : 0); //lowest number first sort
 }}) : new BoundedTreeSet<Long>(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
 retrieve the new parameter, if present. 'asc' used as a default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.
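
For completeness, a request exercising the proposed parameter would look 
something like this (field name illustrative):
{code}
q=*:*&facet=true&facet.field=category&f.category.facet.sortorder=dsc&facet.mincount=1
{code}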



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189352#comment-14189352
 ] 

Jan Høydahl commented on SOLR-6513:
---

This API is not in a released version, so it should be safe to commit the 
rename as part of this JIRA, no?

 Add a collectionsAPI call BALANCESLICEUNIQUE
 

 Key: SOLR-6513
 URL: https://issues.apache.org/jira/browse/SOLR-6513
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, 
 SOLR-6513.patch


 Another sub-task for SOLR-6491. The ability to assign a property on a 
 node-by-node basis is nice, but tedious to get right for a sysadmin, 
 especially if there are, say, 100s of nodes hosting a system. This JIRA would 
 essentially provide an automatic mechanism for assigning a property. This 
 particular command simply changes the cluster state, it doesn't do anything 
 like re-assign functions.
 My idea for this version is fairly limited. You'd have to specify a 
 collection and there would be no attempt to, say, evenly distribute the 
 preferred leader role/property for this collection by looking at _other_ 
 collections. Or by looking at underlying hardware capabilities. Or
 It would be a pretty simple round-robin assignment. About the only 
 intelligence built in would be to change as few roles/properties as possible. 
 Let's say that the correct number of nodes for this role turned out to be 3. 
 Any node currently having 3 properties for this collection would NOT be 
 changed. Any node having 2 properties would have one added that would be 
 taken from some node with > 3 properties like this.
 This probably needs an optional parameter, something like 
 includeInactiveNodes=true|false
 Since this is an arbitrary property, one must specify sliceUnique=true. So 
 for the preferredLeader functionality, one would specify something like:
 action=BALANCESLICEUNIQUE&property=preferredLeader&property.value=true.
 There are checks in this code that require the preferredLeader to have a t/f 
 value and require that sliceUnique be true. That said, this can be called on 
 an arbitrary property that has only one such property per slice.
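
Spelled out as a full request, the preferredLeader case above would be 
something like this (collection name is a placeholder):
{code}
http://localhost:8983/solr/admin/collections?action=BALANCESLICEUNIQUE&collection=collection1&property=preferredLeader&property.value=true
{code}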



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189375#comment-14189375
 ] 

Jan Høydahl commented on SOLR-6671:
---

Hoss, yes, you can compose your own variables everywhere in general, but this 
issue proposes to ship Solr with such convenience out of the box. We could then 
also add an {{-r dir}} option to {{bin/solr}} for specifying where data should 
live. Thus people who already have tons of collections will be able to upgrade 
to Solr 5 and start using the option without further editing of XMLs.
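
For example (the -r option is only the proposal here, it does not exist yet):
{code}
bin/solr start -c -r /var/solr/data
{code}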

 Introduce a solr.data.root as root dir for all data
 ---

 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


 Many users prefer to deploy code, config and data on separate disk locations, 
 so the default of placing the indexes under 
 {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.
 In a multi-core/collection system, there is not much help in the 
 {{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
 for all collections. One workaround, if you don't want to hardcode paths in 
 your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
 {{solr.properties}} file.
 A more elegant solution would be to introduce a new Java option 
 {{solr.data.root}} which would be to data what {{solr.solr.home}} is 
 for config. If set, all collections would default their {{dataDir}} to 
 {{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data

2014-10-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189405#comment-14189405
 ] 

Jan Høydahl commented on SOLR-6671:
---

[~markrmil...@gmail.com], if using {{solr.hdfs.home}}, shouldn't data from 
e.g. BlendedInfixSuggester also be co-located there? But 
BlendedInfixLookupFactory currently hardcodes FSDirectory. Should probably 
create another JIRA for that and possibly other hardcodings.

 Introduce a solr.data.root as root dir for all data
 ---

 Key: SOLR-6671
 URL: https://issues.apache.org/jira/browse/SOLR-6671
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
 Fix For: 5.0, Trunk


 Many users prefer to deploy code, config and data on separate disk locations, 
 so the default of placing the indexes under 
 {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted.
 In a multi-core/collection system, there is not much help in the 
 {{solr.data.dir}} option, as it would set the {{dataDir}} to the same folder 
 for all collections. One workaround, if you don't want to hardcode paths in 
 your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each 
 {{solr.properties}} file.
 A more elegant solution would be to introduce a new Java option 
 {{solr.data.root}} which would be to data what {{solr.solr.home}} is 
 for config. If set, all collections would default their {{dataDir}} to 
 {{$\{solr.data.root\}/$\{solr.core.name\}/data}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 662 - Still Failing

2014-10-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/662/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
Resource in scope SUITE failed to close. Resource was registered from thread 
Thread[id=4131, name=coreLoadExecutor-1670-thread-1, state=RUNNABLE, 
group=TGRP-TestReplicationHandler], registration stack trace below.

Stack Trace:
com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope 
SUITE failed to close. Resource was registered from thread Thread[id=4131, 
name=coreLoadExecutor-1670-thread-1, state=RUNNABLE, 
group=TGRP-TestReplicationHandler], registration stack trace below.
at __randomizedtesting.SeedInfo.seed([800168CC91F75BB2]:0)
at java.lang.Thread.getStackTrace(Thread.java:1589)
at 
com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:166)
at 
org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:728)
at 
org.apache.lucene.util.LuceneTestCase.wrapDirectory(LuceneTestCase.java:1314)
at 
org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1205)
at 
org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1197)
at 
org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:47)
at 
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:350)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:276)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:488)
at org.apache.solr.core.SolrCore.init(SolrCore.java:796)
at org.apache.solr.core.SolrCore.init(SolrCore.java:652)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:509)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:273)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AssertionError: Directory not closed: 
MockDirectoryWrapper(SimpleFSDirectory@/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/index-SimpleFSDirectory-116
 
lockFactory=NativeFSLockFactory@/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/index-SimpleFSDirectory-116)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.util.CloseableDirectory.close(CloseableDirectory.java:47)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:699)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:696)
at 
com.carrotsearch.randomizedtesting.RandomizedContext.closeResources(RandomizedContext.java:183)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.afterAlways(RandomizedRunner.java:712)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
... 1 more




Build Log:
[...truncated 12631 lines...]
   [junit4] Suite: org.apache.solr.handler.TestReplicationHandler
   [junit4]   2 Creating dataDir: 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/init-core-data-001
   [junit4]   2 1421040 T3509 oas.SolrTestCaseJ4.setUp ###Starting 
doTestReplicateAfterCoreReload
   [junit4]   2 1421056 T3509 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2 1421060 T3509 oejs.AbstractConnector.doStart Started 
SocketConnector@127.0.0.1:54137
   [junit4]   2 1421060 T3509 oass.SolrDispatchFilter.init 
SolrDispatchFilter.init()
   [junit4]   2 1421061 T3509 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured for solr (NoInitialContextEx)
   [junit4]   2 1421061 T3509 oasc.SolrResourceLoader.locateSolrHome using 
system property solr.solr.home: 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/solr-instance-001
   [junit4]   2 1421061 T3509 oasc.SolrResourceLoader.init new 
SolrResourceLoader for directory: 
'/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/solr-instance-001/'
   [junit4]   2 1421106 T3509 

[jira] [Updated] (SOLR-6351) Let Stats Hang off of Pivots (via 'tag')

2014-10-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6351:
---
Attachment: SOLR-6351.patch


My main focus the last day or so has been reviewing PivotFacetHelper & 
PivotFacetValue with an eye towards simplifying the amount of redundant code 
between them and StatsComponent.  Some details posted below, but one key thing I 
wanted to point out...

Even as (relatively) familiar as I am with the existing Pivot code, it took me a 
long time to understand how PivotFacetHelper.getStats + PivotListEntry.STATS 
were working in the case of leaf-level pivot values -- short answer: 
PivotFacetHelper.getStats totally ignores the Enum value of 
PivotListEntry.STATS and uses 0 (something PivotFacetHelper.getPivots also 
does that I've never noticed before).  

Given that we plan to add more data to pivots in issues like SOLR-4212 & 
SOLR-6353, I really wanted to come up with a pattern for dealing with this that 
was less likely to trip people up when looking at the code.



{panel:title=Changes in this patch}


* StatsComponent
** refactored out tiny little reusable unwrapStats utility
** refactored out reusable convertToResponse utility
*** I was hoping this would help encapsulate & simplify the way the count==0 
rules are applied, to make top level consistent with pivots, but that led me 
down a rabbit hole of pain as far as testing and backcompat and solrj - so I 
just captured it in a 'force' method param.
*** But at least now, the method is consistently called everywhere that outputs 
stats, so if/when we change the rules for how empty stats are returned (see 
comments in SOLR-6349) we won't need to audit/change multiple pieces of code, 
we can just focus on callers of this method
** Added a StatsInfo.getStatsField(key) method for use by 
PivotFacetHelper.mergeStats so it wouldn't need to constantly loop over every 
possible stats.field

* PivotFacetValue
** removed an unnecessary level of wrapping around the Map<String,StatsValues>
** switched to using StatsComponent.convertToResponse directly instead of 
PivotFacetHelper.convertStatsValuesToNamedList

* PivotListEntry
** renamed index to minIndex
** added an extract method that knows how to correctly deal with the diff 
between optional entries that may exist starting at the minIndex, and 
mandatory entries (field,value,count) that *must* exist at the expected index.

* PivotFacetHelper
** changed the various getFoo methods to use PivotListEntry.FOO.extract
*** these methods now exist mainly just for convenience with the Object casting
*** this also meant the retrieve method could be removed
** simplified mergeStats via:
*** StatsComponent.unwrapStats
*** StatsInfo.getStatsField
** mergeStats javadocs
** removed convertStatsValuesToNamedList

* PivotFacetProcessor
** switched to using StatsComponent.convertToResponse

* TestCloudPivots
** updated nocommit comment regarding 'null' actualStats based on pain 
encountered working on StatsComponent.convertToResponse 
*** added some more sanity check assertions in this case as well

* DistributedFacetPivotSmallTest
** added doTestPivotStatsFromOneShard to account for an edge case in merging 
that occurred to me while reviewing PivotFacetHelper.mergeStats 
*** this fails because of how +/-Infinity are treated as the min/max - I'll 
work on fixing this next
*** currently commented out + has some nocommits to beef up this test w/other 
types

* merged my working changes with Vitaliy's additions (but have not yet actually 
reviewed the new tests)...
** FacetPivotSmallTest
** DistributedFacetPivotSmallAdvancedTest
** PivotFacetValue.getStatsValues ... although it's not clear to me yet what 
purpose/value this adds?


{panel}


 Let Stats Hang off of Pivots (via 'tag')
 

 Key: SOLR-6351
 URL: https://issues.apache.org/jira/browse/SOLR-6351
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, 
 SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, 
 SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, 
 SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch


 The goal here is basically to flip the notion of stats.facet on its head, so 
 that instead of asking the stats component to also do some faceting 
 (something that's never worked well with the variety of field types and has 
 never worked in distributed mode) we instead ask the PivotFacet code to 
 compute some stats X for each leaf in a pivot.  We'll do this with the 
 existing {{stats.field}} params, but we'll leverage the {{tag}} local param 
 of the {{stats.field}} instances to be able to associate which stats we want 
 hanging off of which {{facet.pivot}}
 Example...
 {noformat}
 facet.pivot={!stats=s1}category,manufacturer
 

[jira] [Created] (SOLR-6672) function results' names should not include trailing whitespace

2014-10-29 Thread Mike Sokolov (JIRA)
Mike Sokolov created SOLR-6672:
--

 Summary: function results' names should not include trailing 
whitespace
 Key: SOLR-6672
 URL: https://issues.apache.org/jira/browse/SOLR-6672
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Mike Sokolov
Priority: Minor


If you include a function as a result field in a list of multiple fields 
separated by white space, the corresponding key in the result markup includes 
trailing whitespace; Example:

{code}
fl=id field(units_used) archive_id
{code}

ends up returning results like this:

{code}
  {
    "id": "nest.epubarchive.1",
    "archive_id": "urn:isbn:97849D42C5A01",
    "field(units_used) ": 123
                      ^ note the trailing space in the key
  }
{code}

A workaround is to use comma separators instead of whitespace

{code} 
fl=id,field(units_used),archive_id
{code}
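
With the comma-separated form the key comes back without the trailing space, 
e.g.:
{code}
  {
    "id": "nest.epubarchive.1",
    "archive_id": "urn:isbn:97849D42C5A01",
    "field(units_used)": 123
  }
{code}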




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189635#comment-14189635
 ] 

Noble Paul commented on SOLR-6517:
--

Sorry for being a pain in the butt

I'm exactly looking LeaderElector.joinElection for that call and I don't see 
where it is done. 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.
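
As a concrete sketch, the basic call described above would be something like 
this (collection name is a placeholder):
{code}
http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=collection1
{code}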



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS

2014-10-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189635#comment-14189635
 ] 

Noble Paul edited comment on SOLR-6517 at 10/30/14 4:39 AM:


Sorry for being a pain in the butt

I'm exactly looking for the LeaderElector.joinElection call and I don't see 
where it is done. 


was (Author: noble.paul):
Sorry for being a pain in the butt

I'm exactly looking LeaderElector.joinElection for that call and I don't see 
where it is done. 

 CollectionsAPI call REBALANCELEADERS
 

 Key: SOLR-6517
 URL: https://issues.apache.org/jira/browse/SOLR-6517
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 5.0, Trunk

 Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch


 Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
 assigned, there has to be a command "make it so, Mr. Solr". This is something 
 of a placeholder to collect ideas. One wouldn't want to flood the system with 
 hundreds of re-assignments at once. Should this be synchronous or async? 
 Should it make the best attempt but not worry about perfection? Should it???
 a collection=name parameter would be required and it would re-elect all the 
 leaders that were on the 'wrong' node
 I'm thinking of optionally allowing one to specify a shard in the case where 
 you wanted to make a very specific change. Note that there's no need to 
 specify a particular replica, since there should be only a single 
 preferredLeader per slice.
 This command would do nothing to any slice that did not have a replica with a 
 preferredLeader role. Likewise it would do nothing if the slice in question 
 already had the leader role assigned to the node with the preferredLeader 
 role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


