[jira] [Commented] (SOLR-6531) better error message when lockType doesn't work with directoryFactory
[ https://issues.apache.org/jira/browse/SOLR-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188072#comment-14188072 ] Hoss Man commented on SOLR-6531:

DirectoryFactory defines an abstract createLockFactory method. Look at any implementation of createLockFactory and you'll see how each implementation limits which lockTypes can be used. In each case, the error message thrown if you specify an invalid lockType should say what the valid lockTypes are.

Steps to reproduce: edit the sample solrconfig.xml and replace {{<lockType>${solr.lock.type:native}</lockType>}} with {{<lockType>bogus</lockType>}} -- you'll get an error message that doesn't tell you what non-bogus values are supported for that DirectoryFactory.

better error message when lockType doesn't work with directoryFactory - Key: SOLR-6531 URL: https://issues.apache.org/jira/browse/SOLR-6531 Project: Solr Issue Type: Improvement Reporter: Hoss Man Labels: difficulty-easy, impact-low

SOLR-6519 improved the logic about which lockTypes could be configured with which directoryFactory implementations, but the result is a somewhat confusing error message.
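A minimal sketch of the kind of check being asked for, assuming the Lucene 5.x singleton lock factories; the class name, the exact set of supported values, and the exception type are illustrative, not the actual DirectoryFactory code:

{code}
// Hedged sketch (not the actual Solr code): the shape of a createLockFactory
// implementation whose error message enumerates the lockTypes it accepts.
import java.util.Locale;

import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.NativeFSLockFactory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.store.SimpleFSLockFactory;

class LockFactorySketch {
  static LockFactory createLockFactory(String rawLockType) {
    String lockType = rawLockType == null ? "native" : rawLockType.toLowerCase(Locale.ROOT).trim();
    switch (lockType) {
      case "native": return NativeFSLockFactory.INSTANCE;
      case "simple": return SimpleFSLockFactory.INSTANCE;
      case "none":   return NoLockFactory.INSTANCE;
      default:
        // The point of the issue: name the valid values instead of failing opaquely.
        throw new IllegalArgumentException("Unrecognized lockType: " + rawLockType
            + " -- valid values for this DirectoryFactory are: native, simple, none");
    }
  }
}
{code}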
[jira] [Commented] (LUCENE-6030) Add norms patched compression which uses table for most common values
[ https://issues.apache.org/jira/browse/LUCENE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188073#comment-14188073 ] Adrien Grand commented on LUCENE-6030:

+1

Add norms patched compression which uses table for most common values - Key: LUCENE-6030 URL: https://issues.apache.org/jira/browse/LUCENE-6030 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan Ernst Attachments: LUCENE-6030.patch

We have added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset to mark documents that have the most common value (when 97% of the documents have that value). This works well for fields that have a predominant value length, and then a small number of docs with some other random values. But another common case is having a handful of very common value lengths, like with a title field. We can use a table (see TABLE_COMPRESSION) to store the most common values, and reserve an ordinal for the "other" case, at which point we can look the value up in the secondary patch table.
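A rough sketch of the decode-side idea being proposed, with hypothetical names and data structures (this is not the Lucene50NormsFormat API): the common values live in a small table indexed by a few bits per document, and one reserved ordinal redirects to a sparse patch lookup for the uncommon values.

{code}
// Hedged sketch: table lookup for common norm values, with one reserved
// ordinal meaning "uncommon value, consult the patch table instead".
import java.util.Map;

class PatchedTableNormsSketch {
  private final long[] commonValues;      // the handful of most common values
  private final byte[] ordinals;          // per-doc table ordinal (illustrative encoding)
  private final Map<Integer, Long> patch; // docID -> uncommon value, stored sparsely

  PatchedTableNormsSketch(long[] commonValues, byte[] ordinals, Map<Integer, Long> patch) {
    this.commonValues = commonValues;
    this.ordinals = ordinals;
    this.patch = patch;
  }

  long get(int docID) {
    int ord = ordinals[docID] & 0xFF;
    if (ord < commonValues.length) {
      return commonValues[ord]; // table hit: one of the common values
    }
    // Reserved ordinal: by construction every such doc has a patch entry.
    return patch.get(docID);
  }
}
{code}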
[JENKINS] Lucene-Solr-SmokeRelease-5.x - Build # 206 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-5.x/206/ No tests ran. Build Log: [...truncated 51317 lines...] prepare-release-no-sign: [mkdir] Created dir: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist [copy] Copying 446 files to /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/lucene [copy] Copying 254 files to /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/solr [smoker] Java 1.7 JAVA_HOME=/home/jenkins/tools/java/latest1.7 [smoker] NOTE: output encoding is US-ASCII [smoker] [smoker] Load release URL file:/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/dist/... [smoker] [smoker] Test Lucene... [smoker] test basics... [smoker] get KEYS [smoker] 0.1 MB in 0.01 sec (13.7 MB/sec) [smoker] check changes HTML... [smoker] download lucene-5.0.0-src.tgz... [smoker] 27.8 MB in 0.04 sec (700.9 MB/sec) [smoker] verify md5/sha1 digests [smoker] download lucene-5.0.0.tgz... [smoker] 63.7 MB in 0.09 sec (712.5 MB/sec) [smoker] verify md5/sha1 digests [smoker] download lucene-5.0.0.zip... [smoker] 73.1 MB in 0.08 sec (955.9 MB/sec) [smoker] verify md5/sha1 digests [smoker] unpack lucene-5.0.0.tgz... [smoker] verify JAR metadata/identity/no javax.* or java.* classes... [smoker] test demo with 1.7... [smoker] got 5539 hits for query lucene [smoker] checkindex with 1.7... [smoker] check Lucene's javadoc JAR [smoker] unpack lucene-5.0.0.zip... [smoker] verify JAR metadata/identity/no javax.* or java.* classes... [smoker] test demo with 1.7... [smoker] got 5539 hits for query lucene [smoker] checkindex with 1.7... [smoker] check Lucene's javadoc JAR [smoker] unpack lucene-5.0.0-src.tgz... [smoker] make sure no JARs/WARs in src dist... [smoker] run ant validate [smoker] run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket -Dtests.disableHdfs=true -Dtests.multiplier=1 -Dtests.slow=false'... [smoker] test demo with 1.7... [smoker] got 205 hits for query lucene [smoker] checkindex with 1.7... [smoker] generate javadocs w/ Java 7... [smoker] [smoker] Crawl/parse... [smoker] [smoker] Verify... [smoker] confirm all releases have coverage in TestBackwardsCompatibility [smoker] find all past Lucene releases... [smoker] run TestBackwardsCompatibility.. [smoker] success! [smoker] [smoker] Test Solr... [smoker] test basics... [smoker] get KEYS [smoker] 0.1 MB in 0.02 sec (5.0 MB/sec) [smoker] check changes HTML... [smoker] download solr-5.0.0-src.tgz... [smoker] 34.0 MB in 0.05 sec (645.2 MB/sec) [smoker] verify md5/sha1 digests [smoker] download solr-5.0.0.tgz... [smoker] 146.2 MB in 0.39 sec (372.9 MB/sec) [smoker] verify md5/sha1 digests [smoker] download solr-5.0.0.zip... [smoker] 152.4 MB in 0.24 sec (644.5 MB/sec) [smoker] verify md5/sha1 digests [smoker] unpack solr-5.0.0.tgz... [smoker] verify JAR metadata/identity/no javax.* or java.* classes... [smoker] unpack lucene-5.0.0.tgz... 
[smoker] **WARNING**: skipping check of /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar: it has javax.* classes [smoker] **WARNING**: skipping check of /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar: it has javax.* classes [smoker] verify WAR metadata/contained JAR identity/no javax.* or java.* classes... [smoker] unpack lucene-5.0.0.tgz... [smoker] copying unpacked distribution for Java 7 ... [smoker] test solr example w/ Java 7... [smoker] start Solr instance (log=/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/solr-example.log)... [smoker] startup done [smoker] test utf8... [smoker] index example docs... [smoker] run query... [smoker] stop server (SIGINT)... [smoker] unpack solr-5.0.0.zip... [smoker] verify JAR metadata/identity/no javax.* or java.* classes... [smoker] unpack lucene-5.0.0.tgz... [smoker] **WARNING**: skipping check of /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar: it has
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20) - Build # 4397 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4397/
Java: 32bit/jdk1.8.0_20 -client -XX:+UseSerialGC

1 tests failed.
REGRESSION: org.apache.solr.client.solrj.SolrSchemalessExampleTest.testAddDelete

Error Message:
IOException occured when talking to server at: https://127.0.0.1:50400/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:50400/solr/collection1
 at __randomizedtesting.SeedInfo.seed([7F8996E1391E8D2C:B769EBEBC8B65EFA]:0)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:584)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
 at org.apache.solr.client.solrj.SolrExampleTestsBase.testAddDelete(SolrExampleTestsBase.java:180)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
 at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
 at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at
[jira] [Created] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
Liram Vardi created SOLR-6666: - Summary: Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi
[jira] [Created] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
Liram Vardi created SOLR-6667: - Summary: Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents Key: SOLR-6667 URL: https://issues.apache.org/jira/browse/SOLR-6667 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi

Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance.

Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. The bottleneck is in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():

{code:title=getCopyFieldsList()|borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}

This function finds, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can see, the first part of the procedure is its most "expensive" step: it takes O(n) time, where n is the size of the dynamicCopyFields group. The next part is just a simple hash lookup, which takes O(1) time.

Our schema contains over 500 copyFields, but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField, but all except one are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the "dynamicCopyFields" array. This makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare.

Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java).
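A sketch of the shape of the change described above, with names modeled on the IndexSchema snippet quoted in the report but otherwise hypothetical (the reporter's actual patch may differ): a copy directive whose source and destination are both concrete can be resolved once at registration time and stored in the hash map, even when the source happens to match a dynamicField pattern; only true globs need the linear dynamicCopyFields scan.

{code}
// Hypothetical registerCopyField() inside IndexSchema: route concrete
// source/dest pairs to the O(1) copyFieldsMap; reserve dynamicCopyFields
// for directives that actually contain a wildcard.
private void registerCopyField(String source, String dest, int maxChars) {
  boolean sourceIsGlob = source.indexOf('*') >= 0;
  boolean destIsGlob = dest.indexOf('*') >= 0;
  if (!sourceIsGlob && !destIsGlob) {
    List<CopyField> entries = copyFieldsMap.get(source);
    if (entries == null) {
      entries = new ArrayList<>();
      copyFieldsMap.put(source, entries);
    }
    // getField() resolves dynamic fields too, so a concrete source that is
    // merely covered by a dynamicField pattern still resolves correctly here.
    entries.add(new CopyField(getField(source), getField(dest), maxChars));
  } else {
    registerDynamicCopyField(source, dest, maxChars); // hypothetical helper for the glob path
  }
}
{code}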
[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6667: -- Affects Version/s: 4.8
[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6667: -- Issue Type: Improvement (was: Bug)
[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6667: -- Issue Type: Bug (was: Improvement)
[jira] [Updated] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6667: -- Attachment: SOLR-6667.patch
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 670 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/670/ 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: Captured an uncaught exception in thread: Thread[id=3850, name=Thread-1223, state=RUNNABLE, group=TGRP-CollectionsAPIDistributedZkTest] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=3850, name=Thread-1223, state=RUNNABLE, group=TGRP-CollectionsAPIDistributedZkTest] Caused by: java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([BAC342B198BF0FEB]:0) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest$1CollectionThread.run(CollectionsAPIDistributedZkTest.java:1044) Build Log: [...truncated 11726 lines...] [junit4] Suite: org.apache.solr.cloud.CollectionsAPIDistributedZkTest [junit4] 2 Creating dataDir: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/build/solr-core/test/J3/temp/solr.cloud.CollectionsAPIDistributedZkTest-BAC342B198BF0FEB-001/init-core-data-001 [junit4] 2 671916 T3019 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (true) and clientAuth (false) [junit4] 2 671916 T3019 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: / [junit4] 2 671922 T3019 oas.SolrTestCaseJ4.setUp ###Starting testDistribSearch [junit4] 2 671923 T3019 oasc.ZkTestServer.run STARTING ZK TEST SERVER [junit4] 1 client port:0.0.0.0/0.0.0.0:0 [junit4] 2 671924 T3020 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server [junit4] 2 672024 T3019 oasc.ZkTestServer.run start zk server on port:57521 [junit4] 2 672025 T3019 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider [junit4] 2 672026 T3019 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper [junit4] 2 672031 T3026 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@654fdfb9 name:ZooKeeperConnection Watcher:127.0.0.1:57521 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None [junit4] 2 672031 T3019 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper [junit4] 2 672031 T3019 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider [junit4] 2 672032 T3019 oascc.SolrZkClient.makePath makePath: /solr [junit4] 2 672034 T3019 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider [junit4] 2 672036 T3019 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper [junit4] 2 672037 T3028 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@4c192e0b name:ZooKeeperConnection Watcher:127.0.0.1:57521/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None [junit4] 2 672037 T3019 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper [junit4] 2 672037 T3019 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider [junit4] 2 672038 T3019 oascc.SolrZkClient.makePath makePath: /collections/collection1 [junit4] 2 672040 T3019 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards [junit4] 2 672042 T3019 oascc.SolrZkClient.makePath makePath: /collections/control_collection [junit4] 2 672044 T3019 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards [junit4] 2 672046 T3019 oasc.AbstractZkTestCase.putConfig put 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml [junit4] 2 672047 T3019 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml [junit4] 2 672050 T3019 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/schema.xml to /configs/conf1/schema.xml [junit4] 2 672051 T3019 oascc.SolrZkClient.makePath makePath: /configs/conf1/schema.xml [junit4] 2 672053 T3019 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml to /configs/conf1/solrconfig.snippet.randomindexconfig.xml [junit4] 2 672054 T3019 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.snippet.randomindexconfig.xml [junit4] 2 672056 T3019 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk@2/solr/core/src/test-files/solr/collection1/conf/stopwords.txt to /configs/conf1/stopwords.txt [junit4] 2 672057 T3019 oascc.SolrZkClient.makePath makePath:
[jira] [Commented] (SOLR-5377) the Core Selector in the Admin UI should pre-select a core
[ https://issues.apache.org/jira/browse/SOLR-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188231#comment-14188231 ] Konstantin Gribov commented on SOLR-5377:

Maybe the {{/solr/cores\[@defaultCoreName]}} parameter (in XPath notation) from the old {{solr.xml}} should somehow be added to the new {{solr.xml}} parameters.

the Core Selector in the Admin UI should pre-select a core Key: SOLR-5377 URL: https://issues.apache.org/jira/browse/SOLR-5377 Project: Solr Issue Type: Improvement Reporter: Michael McCandless Priority: Minor

I was trying to use the admin UI to understand how text was analyzed, but it was confusing (I couldn't find the Analysis page) until I realized I had to use the Core Selector to select my core. I had only one core ... it seems like the Core Selector could easily just pre-select a core (the one, in my case ...).
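For context, the attribute the XPath above points at looked roughly like this in legacy (pre-core-discovery) solr.xml; the snippet is illustrative, not a complete configuration:

{code:xml}
<solr persistent="true">
  <!-- defaultCoreName let requests that omitted a core name fall through to
       this core; core-discovery solr.xml dropped the attribute, which is the
       gap the comment above is pointing at. -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
{code}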
[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types
[ https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188249#comment-14188249 ] Magnus Lövgren commented on SOLR-5532:

I've upgraded from 4.4 to 4.10.1 and have been struggling somewhat with my code that was affected by this change. Some observations that might be useful for others too: The patch relies on the org.apache.http.entity.ContentType.parse method. It fails when parsing an empty string. That's fine (an empty string should probably not be seen as a valid type anyway). The caveat is that an empty string is actually used as the fallback contentType if the response has no Content-Type header! This would be the typical case if the response is a 401 (which typically has no Content-Type).
- In prior versions a 401 response threw a SolrException with code() 401.
- Now a SolrServerException is thrown (caused by an org.apache.http.ParseException), making it hard to determine if it was due to bad credentials (401).
To restore the previous behaviour, you'd presumably add the HttpStatus.SC_UNAUTHORIZED case to the switch and then throw a RemoteSolrException (with code 401). In other words, fail early for a 401 response (there's no content to parse anyway).

SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types Key: SOLR-5532 URL: https://issues.apache.org/jira/browse/SOLR-5532 Project: Solr Issue Type: Bug Affects Versions: 4.6 Environment: Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar Reporter: Jakob Furrer Assignee: Mark Miller Fix For: 4.6.1, 4.7, Trunk Attachments: SOLR-5532-elyograg-eclipse-screenshot.png, SOLR-5532.patch

Due to SOLR-3530, HttpSolrServer now does a string equivalence check between the Content-Type returned by the server and a getContentType() method declared by the ResponseParser .. but string equivalence is too strict, and can result in errors like this one reported by a user:

I just upgraded my Solr instance and with it I also upgraded the solrj library in our custom application which sends diverse requests and queries to Solr. I use the ping method to determine whether Solr started correctly under the configured address. Since the upgrade the ping response results in an error:

{code:xml}
Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected content type application/xml; charset=UTF-8 but got application/xml;charset=UTF-8.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="df">searchtext</str><str name="echoParams">all</str><str name="rows">10</str><str name="echoParams">all</str><str name="wt">xml</str><str name="version">2.2</str><str name="q">solrpingquery</str><str name="distrib">false</str></lst></lst><str name="status">OK</str>
</response>
{code}

The Solr application itself works fine. Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. solr-solrj-4.5.1.jar) in the custom application does not produce this error. The Exception is produced in a code block (_HttpSolrServer.java_, method _request(...)_, around line 140) which was introduced with version 4.6.0.

Code to reproduce the error:
{code}
try {
  HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8080/Solr/collection");
  solrServer.setParser(new XMLResponseParser()); // this line is making all the difference
  solrServer.ping();
} catch (Exception e) {
  e.printStackTrace();
}
{code}

A global search for charset=UTF-8 on the source code of solrj indicates that other functions besides ping might be affected as well, because there are several places where "application/xml;charset=UTF-8" is spelled without a space after the semicolon.
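A sketch of the fail-early behaviour Magnus Lövgren suggests above, placed where HttpSolrServer.executeMethod() inspects the response status; the surrounding variable names are assumptions, and RemoteSolrException's exact constructor may vary across 4.x releases:

{code}
// Hypothetical: short-circuit 401 before the Content-Type header is parsed,
// since an auth challenge usually carries no Content-Type and no Solr body.
int httpStatus = response.getStatusLine().getStatusCode();
switch (httpStatus) {
  case HttpStatus.SC_UNAUTHORIZED:
    throw new HttpSolrServer.RemoteSolrException(httpStatus,
        "Authentication required for url: " + getBaseURL(), null);
  // ... existing cases (redirects etc.) unchanged ...
}
{code}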
[jira] [Updated] (SOLR-5377) the Core Selector in the Admin UI should pre-select a core
[ https://issues.apache.org/jira/browse/SOLR-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-5377: -- Component/s: web gui
[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types
[ https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188262#comment-14188262 ] Mark Miller commented on SOLR-5532:

I ran into this same issue in a review for Cloudera Search before I went on vacation a couple of weeks ago. Technically, it was a back-compat break. Please file a JIRA issue and we can address it.
[jira] [Commented] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection
[ https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188263#comment-14188263 ] Vijaya Jonnakuti commented on SOLR-6618:

Thanks for the reply. I cannot say exactly when it happens; most likely when I restart one of the Solr nodes. I have a 3-node ZooKeeper ensemble, 3 Solr nodes and 3 client nodes. All the collections are created with a default config which is uploaded to ZooKeeper. But when DIH is run, config folders with the corresponding collection names are created in ZooKeeper, each containing an update_dih_store.properties but no configs. After that, Solr thinks overnighttest should have its own config set in ZooKeeper, even though I have not run DIH for overnighttest. Example:

configs
  /default      -- has the configs in here
  /collection1  -- update_dih_generic.properties but no configs
  /collection2  -- update_dih_generic.properties but no configs

Let me know if you need more details.

SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection -- Key: SOLR-6618 URL: https://issues.apache.org/jira/browse/SOLR-6618 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: Vijaya Jonnakuti

I have uploaded one config, "default", and I do specify collection.configName=default when I create the collection, and when Solr is restarted I get this error: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection overnighttest found:[default, collection1, collection2 and so on]. These collection1 and collection2 empty configs are created when I run DataImportHandler using ZKPropertiesWriter.
[jira] [Comment Edited] (SOLR-6618) SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection
[ https://issues.apache.org/jira/browse/SOLR-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188263#comment-14188263 ] Vijaya Jonnakuti edited comment on SOLR-6618 at 10/29/14 12:11 PM: --- Thanks for the reply. I cannot say exactly when it happens, most likely when one of the Solr nodes is restarted. The configuration is a 3-node ZooKeeper ensemble / 3 Solr nodes / 3 client nodes; all the collections are created with a default config which is uploaded to ZooKeeper. But when DIH is run, config folders named after the corresponding collections are created in ZooKeeper, with update_dih_store.properties in them and no configs. After that, the system thinks overnighttest should have its own config set in ZooKeeper; DIH is not run for the overnighttest collection, so that folder is not created. We do give configName as default for the collection creation API through SolrJ. configs /default | --has the configs in here /collection1 | ---update_dih_generic.properties but no configs /collection2 | ---update_dih_generic.properties but no configs Let me know if you need more details was (Author: pattapuvijaya): Thanks for the reply. I cannot say exactly when it happens, most likely when one of the Solr nodes is restarted. The configuration is a 3-node ZooKeeper ensemble / 3 Solr nodes / 3 client nodes; all the collections are created with a default config which is uploaded to ZooKeeper. But when DIH is run, config folders named after the corresponding collections are created in ZooKeeper, with update_dih_store.properties in them and no configs. After that, it thinks overnighttest should have its own config set in ZooKeeper; DIH is not run for the overnighttest collection, so that folder is not created. We do give configName as default for the collection creation API through SolrJ. configs /default | --has the configs in here /collection1 | ---update_dih_generic.properties but no configs /collection2 | ---update_dih_generic.properties but no configs Let me know if you need more details SolrCore Initialization Failures when Solr is restarted, unable to initialize a collection -- Key: SOLR-6618 URL: https://issues.apache.org/jira/browse/SOLR-6618 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: Vijaya Jonnakuti I have uploaded one config:default and do specify collection.configName=default when I create the collection, and when Solr is restarted I get this error: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection overnighttest found:[default, collection1, collection2 and so on] These collection1 and collection2 empty configs are created when I run DataImportHandler using ZKPropertiesWriter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
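For reference, the link between a collection and a shared config set is made at creation time via the Collections API's collection.configName parameter; a minimal sketch (host, port and shard counts are placeholders):
{code}
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=overnighttest&numShards=1&replicationFactor=1&collection.configName=default"
{code}
When no configName link exists for a collection, Solr falls back to looking for a config set named after the collection itself, which is consistent with the "Could not find configName for collection overnighttest" error reported above.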
[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.7.0_67) - Build # 4296 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4296/ Java: 32bit/jdk1.7.0_67 -server -XX:+UseG1GC No tests ran. Build Log: [...truncated 12196 lines...] FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:742) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168) at com.sun.proxy.$Proxy73.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:978) at hudson.Launcher$ProcStarter.join(Launcher.java:387) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:770) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:533) at hudson.model.Run.execute(Run.java:1759) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:89) at hudson.model.Executor.run(Executor.java:240) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:805) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188305#comment-14188305 ] Timothy Potter commented on SOLR-6631: -- Thanks for the feedback Hoss. I was actually wondering if it would suffice to handle NodeChildrenChanged EventTypes in the LatchChildWatcher process method, i.e. change the code to: if (eventType == Event.EventType.NodeChildrenChanged) { ... } [~markrmil...@gmail.com] or [~andyetitmoves] do either of you have any insight you can share on this? Specifically, I'd like to change the LatchChildWatcher.process to set the event member and notifyAll only if the EventType is NodeChildrenChanged, i.e.
{code}
@Override
public void process(WatchedEvent event) {
  Event.EventType eventType = event.getType();
  LOG.info("LatchChildWatcher fired on path: " + event.getPath() + " state: " + event.getState() + " type " + eventType);
  if (eventType == Event.EventType.NodeChildrenChanged) {
    synchronized (lock) {
      this.event = event;
      lock.notifyAll();
    }
  }
}
{code}
Or do we need to handle the other event types and just not affect the event if the type is None as originally suggested by [~mewmewball]? Need to get this one committed soon ;-) DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5317) [PATCH] Concordance capability
[ https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188309#comment-14188309 ] Tim Allison commented on LUCENE-5317: - Thank you, Steve. I created a lucene5317 branch on my github [fork|https://github.com/tballison/lucene-solr]. I applied your patch and will start adding my local updates...there have been quite a few since I posted the initial patch. When I'm happy enough with that, I'll put the patch on rb. Thank you, again. [PATCH] Concordance capability -- Key: LUCENE-5317 URL: https://issues.apache.org/jira/browse/LUCENE-5317 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 4.5 Reporter: Tim Allison Labels: patch Fix For: 4.9 Attachments: LUCENE-5317.patch, concordance_v1.patch.gz This patch enables a Lucene-powered concordance search capability. Concordances are extremely useful for linguists, lawyers and other analysts performing analytic search vs. traditional snippeting/document retrieval tasks. By analytic search, I mean that the user wants to browse every time a term appears (or at least the topn) in a subset of documents and see the words before and after. Concordance technology is far simpler and less interesting than IR relevance models/methods, but it can be extremely useful for some use cases. Traditional concordance sort orders are available (sort on words before the target, words after, target then words before and target then words after). Under the hood, this is running SpanQuery's getSpans() and reanalyzing to obtain character offsets. There is plenty of room for optimizations and refactoring. Many thanks to my colleague, Jason Robinson, for input on the design of this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release 4.10.2 RC1
Vote passes, I'll push. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 27, 2014 at 6:00 PM, Steve Rowe sar...@gmail.com wrote: +1 SUCCESS! [0:52:16.190427] Steve On Oct 27, 2014, at 7:54 AM, Adrien Grand jpou...@gmail.com wrote: +1 SUCCESS! [0:56:11.020611] On Sun, Oct 26, 2014 at 4:45 PM, Simon Willnauer simon.willna...@gmail.com wrote: Tests now pass for me too!! thanks mike +1 On Sun, Oct 26, 2014 at 12:22 PM, Michael McCandless luc...@mikemccandless.com wrote: Artifacts: http://people.apache.org/~mikemccand/staging_area/lucene-solr-4.10.2-RC1-rev1634293 Smoke tester: python3 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~mikemccand/staging_area/lucene-solr-4.10.2-RC1-rev1634293 1634293 4.10.2 /tmp/smoke4102 True I ran smoke tester: SUCCESS! [0:30:16.520543] And also confirmed Elasticsearch tests pass with this RC. Here's my +1 Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6668) FieldStatsInfo should expose sumOfSquares.
Christoph Strobl created SOLR-6668: -- Summary: FieldStatsInfo should expose sumOfSquares. Key: SOLR-6668 URL: https://issues.apache.org/jira/browse/SOLR-6668 Project: Solr Issue Type: Improvement Components: SolrJ Affects Versions: 4.10.1 Reporter: Christoph Strobl Priority: Minor The stats component returns {{sumOfSquares}} as part of its result. The value is picked up by {{FieldStatsInfo}} but cannot be accessed directly, as there's no getter present. The value is also missing from {{toString()}}. _SideNote:_ while in the class anyway, it would be nice if {{min}}, {{max}} and {{sum}} were exposed/kept as {{Double}} rather than {{Object}}, and if a {{SolrServerException}} rather than a plain {{RuntimeException}} were thrown for unknown keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
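A minimal sketch of the requested accessor on {{org.apache.solr.client.solrj.response.FieldStatsInfo}}, assuming the parsed value is kept in a field as the report describes (the field name and type here are illustrative, not the actual class internals):
{code}
// Inside FieldStatsInfo: expose the value that response parsing already captures.
private Double sumOfSquares;

public Double getSumOfSquares() {
  return sumOfSquares;
}
{code}
The same field would also be appended in {{toString()}} alongside min, max and sum.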
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188326#comment-14188326 ] Mark Miller commented on SOLR-6631: --- bq. if (eventType == Event.EventType.NodeChildrenChanged) { +1 - we are only interested in waiting around to see a child added - this watcher should not need to consider other events. DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
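For context, a minimal, self-contained sketch of the latch-style watcher variant being +1'd here (field names and logging are illustrative; this is not the committed Solr code):
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class LatchChildWatcher implements Watcher {
  private final Object lock = new Object();
  private WatchedEvent event; // set only on an actual child change

  @Override
  public void process(WatchedEvent event) {
    // Connection-state events arrive with type None; ignoring them means a
    // session reconnect cannot release the latch without a real child change.
    if (event.getType() == Event.EventType.NodeChildrenChanged) {
      synchronized (lock) {
        this.event = event;
        lock.notifyAll();
      }
    }
  }

  public void await(long timeoutMs) throws InterruptedException {
    synchronized (lock) {
      if (event == null) {
        lock.wait(timeoutMs);
      }
    }
  }

  public WatchedEvent getWatchedEvent() {
    synchronized (lock) {
      return event;
    }
  }
}
{code}
The getChildren(long) loop quoted in the issue description then only re-reads the children when getWatchedEvent() reports a real NodeChildrenChanged.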
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188346#comment-14188346 ] ASF subversion and git services commented on SOLR-6631: --- Commit 1635131 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1635131 ] SOLR-6631: DistributedQueue spinning on calling zookeeper getChildren() DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4920) DIH JdbcDataSource exception handling
[ https://issues.apache.org/jira/browse/SOLR-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188359#comment-14188359 ] Mikhail Khludnev commented on SOLR-4920: I face a usability issue with
{code:title=JdbcDataSource.java}
try {
  c = DriverManager.getConnection(url, initProps);
} catch (SQLException e) {
  // DriverManager does not allow you to use a driver which is not loaded through
  // the class loader of the class which is trying to make the connection.
  // This is a workaround for cases where the user puts the driver jar in the
  // solr.home/lib or solr.home/core/lib directories.
  Driver d = (Driver) DocBuilder.loadClass(driver, context.getSolrCore()).newInstance();
  c = d.connect(url, initProps);
}
{code}
If I supply a weird URL, I get an SQLException; it's caught, and then c = d.connect(url, initProps); is called, which returns null (which is pretty valid given the javadoc). Then I get an NPE where the connection is used. There is nothing about the SQLException's reason in the log. Isn't it worth raising an issue? DIH JdbcDataSource exception handling - Key: SOLR-4920 URL: https://issues.apache.org/jira/browse/SOLR-4920 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.3, Trunk Reporter: Chris Eldredge Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.4 Attachments: patch.diff JdbcDataSource will incorrectly suppress exceptions when retrieving a connection from a JNDI context and fall back to trying to use DriverManager to obtain a connection. This makes it impossible to troubleshoot a misconfigured JNDI DataSource. Additionally, when a SQLException is thrown while initializing a connection, such as in setAutoCommit(), the connection will not be closed. This can cause a resource leak. A patch will be attached with unit tests that addresses both issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
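Sketching the point above as code: a defensive variant of the quoted block that surfaces the original SQLException and fails fast when the driver declines the URL (the logger name is illustrative; wrapAndThrow is the usual DIH error path):
{code}
Connection c = null;
try {
  c = DriverManager.getConnection(url, initProps);
} catch (SQLException e) {
  // Keep the original failure visible instead of silently swallowing it.
  LOG.warn("DriverManager.getConnection failed for " + url, e);
  Driver d = (Driver) DocBuilder.loadClass(driver, context.getSolrCore()).newInstance();
  c = d.connect(url, initProps);
  if (c == null) {
    // Driver.connect() returns null when the driver does not accept the URL,
    // so raise the original SQLException here rather than an NPE later.
    DataImportHandlerException.wrapAndThrow(DataImportHandlerException.SEVERE, e,
        "Could not obtain a connection for " + url);
  }
}
{code}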
[jira] [Created] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer
Magnus Lövgren created SOLR-6669: Summary: 401 is not explicitly handled when querying HttpSolrServer Key: SOLR-6669 URL: https://issues.apache.org/jira/browse/SOLR-6669 Project: Solr Issue Type: Bug Components: SolrJ Affects Versions: 4.7 Reporter: Magnus Lövgren Priority: Minor This is a regression, likely caused by SOLR-5532 (see comments at the end of that JIRA). I use SolrJ and HttpSolrServer in my web application (deployed in Tomcat 7). Recently I updated Solr from 4.4 to 4.10.1 and it seems 401 is no longer handled properly when using a custom HttpClient. The essentials of my code (which was working in 4.4):
{code}
String theSolrBaseURL = ...
HttpClient theHttpClient = ...
SolrQuery theSolrQuery = ...

try {
  SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
  QueryResponse response = solrServer.query(theSolrQuery);
  ...
} catch (SolrException se) {
  if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
    // Client is using bad credentials, handle appropriately
    ...
  }
  ...
} catch (SolrServerException sse) {
  ...
}
{code}
The code should speak for itself, but the basic idea is to try to recover if the client is using bad credentials. In order to do that I catch the SolrException and check whether the code is 401. This approach worked well in Solr 4.4. However, it doesn't work when using Solr 4.10.1. The query method throws a SolrServerException if the HttpClient is using bad credentials. The original cause is an {{org.apache.http.ParseException}}. The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, ResponseParser)}} method: # The HttpClient executes the method and gets the response #* The response is a 401/Unauthorized #* A 401 response has no Content-Type header # Since there is no content type, it is set to the empty string as a fallback # Later on, the mime type is extracted using {{org.apache.http.entity.ContentType.parse(String)}} in order to handle charset issues (see SOLR-5532) #* This method fails to parse the empty string and throws an {{org.apache.http.ParseException}} # The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the exception and throw a {{SolrServerException}} A potential fix would be to add a 401 case to the existing switch:
{code}
case HttpStatus.SC_UNAUTHORIZED:
  throw new RemoteSolrException(httpStatus, "Server at " + getBaseURL()
      + " returned non ok status:" + httpStatus + ", message:"
      + response.getStatusLine().getReasonPhrase(), null);
{code}
...and it would perhaps be appropriate to handle the content type fallback in some other way than setting it to an empty string? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
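One way to make the mime-type extraction described above tolerate responses with no Content-Type header is to guard the parse; sketched here with illustrative names rather than the actual HttpSolrServer internals:
{code}
import org.apache.http.HttpResponse;
import org.apache.http.entity.ContentType;

static String mimeTypeOf(HttpResponse response) {
  // A 401 (and similar) response may carry no entity or no Content-Type header.
  if (response.getEntity() == null || response.getEntity().getContentType() == null) {
    return null; // callers can skip charset handling instead of parsing ""
  }
  String header = response.getEntity().getContentType().getValue();
  if (header == null || header.trim().isEmpty()) {
    return null;
  }
  return ContentType.parse(header).getMimeType();
}
{code}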
[jira] [Updated] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magnus Lövgren updated SOLR-6669: - Attachment: SOLR-6669_code_screenshots.zip Attaching screenshots from an IDEA debug session 401 is not explicitly handled when querying HttpSolrServer -- Key: SOLR-6669 URL: https://issues.apache.org/jira/browse/SOLR-6669 Project: Solr Issue Type: Bug Components: SolrJ Affects Versions: 4.7 Reporter: Magnus Lövgren Priority: Minor Attachments: SOLR-6669_code_screenshots.zip This is a regression, likely caused by SOLR-5532 (see comments at the end of that JIRA). I use SolrJ and HttpSolrServer in my web application (deployed in Tomcat 7). Recently I updated Solr from 4.4 to 4.10.1 and it seems 401 is no longer handled properly when using a custom HttpClient. The essentials of my code (which was working in 4.4):
{code}
String theSolrBaseURL = ...
HttpClient theHttpClient = ...
SolrQuery theSolrQuery = ...

try {
  SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
  QueryResponse response = solrServer.query(theSolrQuery);
  ...
} catch (SolrException se) {
  if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
    // Client is using bad credentials, handle appropriately
    ...
  }
  ...
} catch (SolrServerException sse) {
  ...
}
{code}
The code should speak for itself, but the basic idea is to try to recover if the client is using bad credentials. In order to do that I catch the SolrException and check whether the code is 401. This approach worked well in Solr 4.4. However, it doesn't work when using Solr 4.10.1. The query method throws a SolrServerException if the HttpClient is using bad credentials. The original cause is an {{org.apache.http.ParseException}}. The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, ResponseParser)}} method: # The HttpClient executes the method and gets the response #* The response is a 401/Unauthorized #* A 401 response has no Content-Type header # Since there is no content type, it is set to the empty string as a fallback # Later on, the mime type is extracted using {{org.apache.http.entity.ContentType.parse(String)}} in order to handle charset issues (see SOLR-5532) #* This method fails to parse the empty string and throws an {{org.apache.http.ParseException}} # The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the exception and throw a {{SolrServerException}} A potential fix would be to add a 401 case to the existing switch:
{code}
case HttpStatus.SC_UNAUTHORIZED:
  throw new RemoteSolrException(httpStatus, "Server at " + getBaseURL()
      + " returned non ok status:" + httpStatus + ", message:"
      + response.getStatusLine().getReasonPhrase(), null);
{code}
...and it would perhaps be appropriate to handle the content type fallback in some other way than setting it to an empty string? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5532) SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types
[ https://issues.apache.org/jira/browse/SOLR-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188365#comment-14188365 ] Magnus Lövgren commented on SOLR-5532: -- The 401 issue is now added as SOLR-6669 SolrJ Content-Type validation is too strict for some webcontainers / proxies, breaks on equivalent content types Key: SOLR-5532 URL: https://issues.apache.org/jira/browse/SOLR-5532 Project: Solr Issue Type: Bug Affects Versions: 4.6 Environment: Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar Reporter: Jakob Furrer Assignee: Mark Miller Fix For: 4.6.1, 4.7, Trunk Attachments: SOLR-5532-elyograg-eclipse-screenshot.png, SOLR-5532.patch Due to SOLR-3530, HttpSolrServer now does a string equivalence check between the Content-Type returned by the server and the getContentType() method declared by the ResponseParser... but string equivalence is too strict, and can result in errors like this one reported by a user: I just upgraded my Solr instance and with it I also upgraded the solrj library in our custom application which sends diverse requests and queries to Solr. I use the ping method to determine whether Solr started correctly under the configured address. Since the upgrade the ping response results in an error:
{code:xml}
Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected content type application/xml; charset=UTF-8 but got application/xml;charset=UTF-8.
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="df">searchtext</str><str name="echoParams">all</str><str name="rows">10</str><str name="echoParams">all</str><str name="wt">xml</str><str name="version">2.2</str><str name="q">solrpingquery</str><str name="distrib">false</str></lst></lst><str name="status">OK</str>
</response>
{code}
The Solr application itself works fine. Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. solr-solrj-4.5.1.jar) in the custom application does not produce this error. The exception is produced in a code block (_HttpSolrServer.java_, method _request(...)_, around line 140) which was introduced with version 4.6.0. Code to reproduce the error:
{code}
try {
  HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8080/Solr/collection");
  solrServer.setParser(new XMLResponseParser()); // this line is making all the difference
  solrServer.ping();
} catch (Exception e) {
  e.printStackTrace();
}
{code}
A global search for "charset=UTF-8" on the source code of solrj indicates that other functions besides ping might be affected as well, because there are several places where "application/xml; charset=UTF-8" is spelled without a space after the semicolon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
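A lenient comparison of the two headers shown in the error above can be expressed with HttpClient's ContentType, which normalizes the spacing and case differences; a minimal sketch under that assumption (the method name is illustrative, and this is not the actual SOLR-5532 patch):
{code}
import org.apache.http.entity.ContentType;

static boolean equivalentContentTypes(String expected, String actual) {
  ContentType a = ContentType.parse(expected); // e.g. "application/xml; charset=UTF-8"
  ContentType b = ContentType.parse(actual);   // e.g. "application/xml;charset=UTF-8"
  return a.getMimeType().equalsIgnoreCase(b.getMimeType())
      && (a.getCharset() == null || b.getCharset() == null
          || a.getCharset().equals(b.getCharset()));
}
{code}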
[jira] [Updated] (SOLR-6669) 401 is not explicitly handled when querying HttpSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magnus Lövgren updated SOLR-6669: - Environment: solr-solrj-4.10.1.jar tested with: Windows 7, 6.1, amd64 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_67, Oracle Corporation and Linux, 3.11.6-4-default, amd64 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_72, Oracle Corporation 401 is not explicitly handled when querying HttpSolrServer -- Key: SOLR-6669 URL: https://issues.apache.org/jira/browse/SOLR-6669 Project: Solr Issue Type: Bug Components: SolrJ Affects Versions: 4.7 Environment: solr-solrj-4.10.1.jar tested with: Windows 7, 6.1, amd64 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_67, Oracle Corporation and Linux, 3.11.6-4-default, amd64 Java HotSpot(TM) 64-Bit Server VM, 1.7.0_72, Oracle Corporation Reporter: Magnus Lövgren Priority: Minor Attachments: SOLR-6669_code_screenshots.zip This is a regression, likely caused by SOLR-5532 (see comments at the end of that JIRA). I use SolrJ and HttpSolrServer in my web application (deployed in Tomcat 7). Recently I updated Solr from 4.4 to 4.10.1 and it seems 401 is no longer handled properly when using a custom HttpClient. The essentials of my code (which was working in 4.4):
{code}
String theSolrBaseURL = ...
HttpClient theHttpClient = ...
SolrQuery theSolrQuery = ...

try {
  SolrServer solrServer = new HttpSolrServer(theSolrBaseURL, theHttpClient);
  QueryResponse response = solrServer.query(theSolrQuery);
  ...
} catch (SolrException se) {
  if (se.code() == HttpStatus.SC_UNAUTHORIZED) {
    // Client is using bad credentials, handle appropriately
    ...
  }
  ...
} catch (SolrServerException sse) {
  ...
}
{code}
The code should speak for itself, but the basic idea is to try to recover if the client is using bad credentials. In order to do that I catch the SolrException and check whether the code is 401. This approach worked well in Solr 4.4. However, it doesn't work when using Solr 4.10.1. The query method throws a SolrServerException if the HttpClient is using bad credentials. The original cause is an {{org.apache.http.ParseException}}. The problem arises in the {{HttpSolrServer.executeMethod(HttpRequestBase, ResponseParser)}} method: # The HttpClient executes the method and gets the response #* The response is a 401/Unauthorized #* A 401 response has no Content-Type header # Since there is no content type, it is set to the empty string as a fallback # Later on, the mime type is extracted using {{org.apache.http.entity.ContentType.parse(String)}} in order to handle charset issues (see SOLR-5532) #* This method fails to parse the empty string and throws an {{org.apache.http.ParseException}} # The intermediate caller {{QueryRequest.process(SolrServer)}} will catch the exception and throw a {{SolrServerException}} A potential fix would be to add a 401 case to the existing switch:
{code}
case HttpStatus.SC_UNAUTHORIZED:
  throw new RemoteSolrException(httpStatus, "Server at " + getBaseURL()
      + " returned non ok status:" + httpStatus + ", message:"
      + response.getStatusLine().getReasonPhrase(), null);
{code}
...and it would perhaps be appropriate to handle the content type fallback in some other way than setting it to an empty string? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188373#comment-14188373 ] ASF subversion and git services commented on SOLR-6631: --- Commit 1635142 from [~thelabdude] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1635142 ] SOLR-6631: DistributedQueue spinning on calling zookeeper getChildren() DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6533) Support editing common solrconfig.xml values
[ https://issues.apache.org/jira/browse/SOLR-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6533: - Attachment: SOLR-6533.patch Support editing common solrconfig.xml values Key: SOLR-6533 URL: https://issues.apache.org/jira/browse/SOLR-6533 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Attachments: SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch, SOLR-6533.patch There are a bunch of properties in solrconfig.xml which users want to edit. We will attack them first. These properties will be persisted to a separate file called config.json (or whatever file). Instead of saving in the same format, we will have well-known properties which users can directly edit:
{code}
updateHandler.autoCommit.maxDocs
query.filterCache.initialSize
{code}
The API will be modeled around the bulk schema API:
{code:javascript}
curl http://localhost:8983/solr/collection1/config -H 'Content-type:application/json' -d '{
  "set-property" : {"updateHandler.autoCommit.maxDocs": 5},
  "unset-property" : "updateHandler.autoCommit.maxDocs"
}'
{code}
{code:javascript}
// or use this to set ${mypropname} values
curl http://localhost:8983/solr/collection1/config -H 'Content-type:application/json' -d '{
  "set-user-property" : {"mypropname": "my_prop_val"},
  "unset-user-property" : "mypropname"
}'
{code}
The values stored in config.json will always take precedence and will be applied after loading solrconfig.xml. An HTTP GET on the /config path will give the real config that is applied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
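As a usage sketch of the read side mentioned above (host and core name are placeholders), the effective configuration, with the overlay values applied on top of solrconfig.xml, would be fetched with:
{code}
curl http://localhost:8983/solr/collection1/config
{code}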
[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188403#comment-14188403 ] Erick Erickson commented on SOLR-6666: -- Liram: Good stuff! Could you attach the patch to this JIRA? That'll make it easiest for someone to pick up. Here's a hint on how: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches Thanks! Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():
{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}
This function tries to find, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash extraction, which takes O(1) time. Our schema contains over 500 copyFields but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField but all, except one, are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
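A sketch of the registration-side change the reporter describes; the helper names (isGlob, registerFixedCopyField, registerDynamicCopyField) are illustrative, not the exact IndexSchema API:
{code}
private void registerCopyField(String source, String dest, int maxChars) {
  boolean sourceIsGlob = isGlob(source); // true only if the name contains a wildcard
  boolean destIsGlob = isGlob(dest);
  if (!sourceIsGlob && !destIsGlob) {
    // Both names are concrete, even if they happen to match a dynamicField
    // pattern, so store the copy under its source name: getCopyFieldsList()
    // then finds it with an O(1) map lookup instead of scanning every
    // DynamicCopy entry.
    registerFixedCopyField(source, dest, maxChars);
  } else {
    // Only directives that really contain a wildcard pay for pattern matching.
    registerDynamicCopyField(source, dest, maxChars);
  }
}
{code}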
[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6666: -- Description: Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():
{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}
This function tries to find, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash extraction, which takes O(1) time. Our schema contains over 500 copyFields but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField but all, except one, are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). was: Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():
{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}
This function tries to find, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash extraction, which takes O(1) time. Our schema contains over 500 copyFields but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField but all, except one, are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). Dynamic copy
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188407#comment-14188407 ] ASF subversion and git services commented on SOLR-6631: --- Commit 1635155 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1635155 ] Backout fix for SOLR-6631 as things like create collection are hanging now DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188408#comment-14188408 ] ASF subversion and git services commented on SOLR-6631: --- Commit 1635157 from [~thelabdude] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1635157 ] Backout fix for SOLR-6631 as things like create collection are hanging now DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi updated SOLR-6666: -- Attachment: SOLR-6666.patch Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Attachments: SOLR-6666.patch Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():
{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}
This function tries to find, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash extraction, which takes O(1) time. Our schema contains over 500 copyFields but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField but all, except one, are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-6667) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liram Vardi closed SOLR-6667. - Resolution: Duplicate Duplicate of SOLR-6666 Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6667 URL: https://issues.apache.org/jira/browse/SOLR-6667 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Affects Versions: 4.8 Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Attachments: SOLR-6667.patch Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList():
{code:title=getCopyFieldsList() |borderStyle=solid}
final List<CopyField> result = new ArrayList<>();
for (DynamicCopy dynamicCopy : dynamicCopyFields) {
  if (dynamicCopy.matches(sourceField)) {
    result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars));
  }
}
List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField);
if (null != fixedCopyFields) {
  result.addAll(fixedCopyFields);
}
{code}
This function tries to find, for an input source field, all its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash extraction, which takes O(1) time. Our schema contains over 500 copyFields but only 70 of them are indexed fields. We also have one dynamic field with a wildcard (*), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField but all, except one, are fixed (i.e. do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and then stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in the fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188411#comment-14188411 ] Liram Vardi commented on SOLR-6666: --- Thanks :-) The patch is attached. Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Attachments: SOLR-6666.patch Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList(): {code:title=getCopyFieldsList() |borderStyle=solid} final List<CopyField> result = new ArrayList<>(); for (DynamicCopy dynamicCopy : dynamicCopyFields) { if (dynamicCopy.matches(sourceField)) { result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars)); } } List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField); if (null != fixedCopyFields) { result.addAll(fixedCopyFields); } {code} This function tries to find, for an input source field, all of its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash lookup, which takes O(1) time. Our schema contains more than 500 copyFields, but only 70 of them are indexed fields. We also have one dynamic field with a wildcard ( * ), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField, but all except one are fixed (i.e. they do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-6666: Assignee: Erick Erickson Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Assignee: Erick Erickson Attachments: SOLR-6666.patch Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList(): {code:title=getCopyFieldsList() |borderStyle=solid} final List<CopyField> result = new ArrayList<>(); for (DynamicCopy dynamicCopy : dynamicCopyFields) { if (dynamicCopy.matches(sourceField)) { result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars)); } } List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField); if (null != fixedCopyFields) { result.addAll(fixedCopyFields); } {code} This function tries to find, for an input source field, all of its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash lookup, which takes O(1) time. Our schema contains more than 500 copyFields, but only 70 of them are indexed fields. We also have one dynamic field with a wildcard ( * ), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField, but all except one are fixed (i.e. they do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6666) Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents
[ https://issues.apache.org/jira/browse/SOLR-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188412#comment-14188412 ] Erick Erickson commented on SOLR-6666: -- I'll take a look, but I won't have any serious time until this weekend, so if anyone wants to pick this up, feel free. Dynamic copy fields are considering all dynamic fields, causing a significant performance impact on indexing documents -- Key: SOLR-6666 URL: https://issues.apache.org/jira/browse/SOLR-6666 Project: Solr Issue Type: Improvement Components: Schema and Analysis, update Environment: Linux, Solr 4.8, Schema with 70 fields and more than 500 specific CopyFields for dynamic fields, but without wildcards (the fields are dynamic, the copy directive is not) Reporter: Liram Vardi Assignee: Erick Erickson Attachments: SOLR-6666.patch Result: After applying a fix for this issue, tests which we conducted show more than 40 percent improvement in our insertion performance. Explanation: Using a JVM profiler, we found a CPU bottleneck during the Solr indexing process. This bottleneck can be found in org.apache.solr.schema.IndexSchema, in the following method, getCopyFieldsList(): {code:title=getCopyFieldsList() |borderStyle=solid} final List<CopyField> result = new ArrayList<>(); for (DynamicCopy dynamicCopy : dynamicCopyFields) { if (dynamicCopy.matches(sourceField)) { result.add(new CopyField(getField(sourceField), dynamicCopy.getTargetField(sourceField), dynamicCopy.maxChars)); } } List<CopyField> fixedCopyFields = copyFieldsMap.get(sourceField); if (null != fixedCopyFields) { result.addAll(fixedCopyFields); } {code} This function tries to find, for an input source field, all of its copyFields (all the destinations to which Solr needs to copy this field). As you can probably note, the first part of the procedure is its most “expensive” step (it takes O(n) time, where n is the size of the dynamicCopyFields group). The next part is just a simple hash lookup, which takes O(1) time. Our schema contains more than 500 copyFields, but only 70 of them are indexed fields. We also have one dynamic field with a wildcard ( * ), which catches the rest of the document fields. As you can conclude, we have more than 400 copyFields that are based on this dynamicField, but all except one are fixed (i.e. they do not contain any wildcard). For some reason, the copyFields registration procedure defines those 400 fields as DynamicCopyField and stores them in the “dynamicCopyFields” array. This step makes getCopyFieldsList() very expensive (in CPU terms) without any justification: all of those 400 copyFields are not globs and therefore do not need any complex pattern matching against the input field. They can all be stored in fixedCopyFields. Only copyFields with asterisks need this special treatment, and they are (especially in our case) pretty rare. Therefore, we created a patch which fixes this problem by changing the registerCopyField() procedure. Tests which we conducted show that there is no change in the indexing results. Moreover, the fix still successfully passes the class unit tests (i.e. IndexSchemaTest.java). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
[ https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188415#comment-14188415 ] ASF subversion and git services commented on SOLR-6610: --- Commit 1635163 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1635163 ] SOLR-6610 In stateFormat=2, ZkController.publishAndWaitForDownStates always times out --- Key: SOLR-6610 URL: https://issues.apache.org/jira/browse/SOLR-6610 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: solrcloud Attachments: SOLR-6610.patch Using stateFormat=2, our Solr always takes a while to start up and spits out this warning line: {quote} WARN - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state. {quote} Looking at the code, this is probably because ZkController.publishAndWaitForDownStates is called in ZkController.init, which gets called via ZkContainer.initZookeeper in CoreContainer.load, before any of the stateFormat=2 collection watches are set in the CoreContainer.preRegisterInZk call a few lines later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
[ https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188428#comment-14188428 ] ASF subversion and git services commented on SOLR-6610: --- Commit 1635168 from [~noble.paul] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1635168 ] SOLR-6610 In stateFormat=2, ZkController.publishAndWaitForDownStates always times out --- Key: SOLR-6610 URL: https://issues.apache.org/jira/browse/SOLR-6610 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: solrcloud Attachments: SOLR-6610.patch Using stateFormat=2, our Solr always takes a while to start up and spits out this warning line: {quote} WARN - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state. {quote} Looking at the code, this is probably because ZkController.publishAndWaitForDownStates is called in ZkController.init, which gets called via ZkContainer.initZookeeper in CoreContainer.load, before any of the stateFormat=2 collection watches are set in the CoreContainer.preRegisterInZk call a few lines later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6030) Add norms patched compression which uses table for most common values
[ https://issues.apache.org/jira/browse/LUCENE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188442#comment-14188442 ] Robert Muir commented on LUCENE-6030: - +1, very nice Add norms patched compression which uses table for most common values - Key: LUCENE-6030 URL: https://issues.apache.org/jira/browse/LUCENE-6030 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan Ernst Attachments: LUCENE-6030.patch We have added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset to mark documents that have the most common value (when 97% of the documents have that value). This works well for fields that have a predominant value length and then a small number of docs with some other random values. But another common case is having a handful of very common value lengths, as with a title field. We can use a table (see TABLE_COMPRESSION) to store the most common values, and reserve an ordinal for the other case, at which point we can look the value up in the secondary patch table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
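As a toy illustration of the table-plus-patch idea sketched in the description (this is not the Lucene 5.0 codec API; all names here are invented): the handful of most common values are stored once in a small table, each document stores only a one-byte ordinal, and a single reserved ordinal marks "other", redirecting the read to a secondary patch table.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class PatchedTableNorms {
  private static final byte OTHER = (byte) 0xFF;  // reserved ordinal for rare values
  private final long[] commonValues;              // table of the most common values
  private final byte[] ordinals;                  // per-document ordinal into the table
  private final Map<Integer, Long> patch = new HashMap<>(); // docID -> rare value

  // Assumes a small table: fewer than 128 common values.
  public PatchedTableNorms(long[] commonValues, long[] perDocValues) {
    this.commonValues = commonValues;
    this.ordinals = new byte[perDocValues.length];
    Map<Long, Byte> ordOf = new HashMap<>();
    for (int i = 0; i < commonValues.length; i++) ordOf.put(commonValues[i], (byte) i);
    for (int doc = 0; doc < perDocValues.length; doc++) {
      Byte ord = ordOf.get(perDocValues[doc]);
      if (ord != null) {
        ordinals[doc] = ord;           // common value: store only a tiny ordinal
      } else {
        ordinals[doc] = OTHER;         // rare value: route to the patch table
        patch.put(doc, perDocValues[doc]);
      }
    }
  }

  public long get(int doc) {
    byte ord = ordinals[doc];
    return ord == OTHER ? patch.get(doc) : commonValues[ord & 0xFF];
  }
}
{code}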
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188475#comment-14188475 ] Noble Paul commented on SOLR-6517: -- Please mention the JIRA ticket number in the commit message. Otherwise it is extremely difficult when I look at the file history. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6517: - Comment: was deleted (was: Please mention the JIRA ticket number in the commit message. Otherwise it is extremely difficult when I look at the file history.) CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188484#comment-14188484 ] Cyrille Roy edited comment on SOLR-2927 at 10/29/14 3:53 PM: - I have been able to reproduce this issue by patching the code to throw an exception in the SolrCore constructor, in branch 4_2 around line 875: ... resourceLoader.inform(resourceLoader); //DO NOT COMMIT THIS: if (!metadata.equals(name)) throw new RuntimeException("test exception"); ... You can then curl any core: $ curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*" Open a jconsole and you will see the leaking mbean named solr/CORE_NAME. was (Author: croy): I have able to reproduce this issue patching the code to throw an exception in the SolrCore constructor, in branch 4_2 around line 875: ... resourceLoader.inform(resourceLoader); //DO NOT COMMIT THIS: if (!metadata.equals(name)) throw new RuntimeException("test exception"); ... You can then curl any core: $ curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*" Open a jconsole and you will see the leaking mbean named solr/CORE_NAME SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188484#comment-14188484 ] Cyrille Roy commented on SOLR-2927: --- I have been able to reproduce this issue by patching the code to throw an exception in the SolrCore constructor, in branch 4_2 around line 875: ... resourceLoader.inform(resourceLoader); //DO NOT COMMIT THIS: if (!metadata.equals(name)) throw new RuntimeException("test exception"); ... You can then curl any core: $ curl "http://localhost:xxx/solr/CORE_NAME/select?q=*:*" Open a jconsole and you will see the leaking mbean named solr/CORE_NAME. SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrille Roy updated SOLR-2927: -- Attachment: mbean-leak.png SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: mbean-leak.png # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2361) OutOfMemoryException while Indexing
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188486#comment-14188486 ] Surendhar commented on LUCENE-2361: --- Hi Thomas, I am facing a similar problem to the one mentioned above. I can see from your comments that changes were made. In what version was this problem resolved? I appreciate your help. OutOfMemoryException while Indexing --- Key: LUCENE-2361 URL: https://issues.apache.org/jira/browse/LUCENE-2361 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda Hi, We use Lucene version 2.9.1. We see the following OutOfMemory error in our environment; I think this is happening at significantly high load. Have you observed this at any time? Please let me know your thoughts on this. org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: PermGen space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: PermGen space at java.lang.String.$$YJP$$intern(Native Method) at java.lang.String.intern(Unknown Source) at org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74) at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrille Roy updated SOLR-2927: -- Attachment: (was: mbean-leak.png) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrille Roy updated SOLR-2927: -- Attachment: mbean-leak-jira.png SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: mbean-leak-jira.png # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2361) OutOfMemoryException while Indexing
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188493#comment-14188493 ] Surendhar commented on LUCENE-2361: --- I can see it is still open; please let me know in what version this problem was resolved. OutOfMemoryException while Indexing --- Key: LUCENE-2361 URL: https://issues.apache.org/jira/browse/LUCENE-2361 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda Hi, We use Lucene version 2.9.1. We see the following OutOfMemory error in our environment; I think this is happening at significantly high load. Have you observed this at any time? Please let me know your thoughts on this. org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: PermGen space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: PermGen space at java.lang.String.$$YJP$$intern(Native Method) at java.lang.String.intern(Unknown Source) at org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74) at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrille Roy updated SOLR-2927: -- Attachment: SOLR-2927.patch Proposed patch: in SolrCore.close(), start by waiting for the searcherExecutor and then empty the infoRegistry, which will unregister the mbean. SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-2927.patch, mbean-leak-jira.png # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
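Cyrille's ordering can be sketched as a self-contained class (the field names mirror SolrCore, but this is not the actual SolrCore.close() body, nor necessarily the attached patch): drain the searcher executor first so nothing can register a new searcher behind the close, then clear the info registry; with a JMX-backed registry map such as Solr's JmxMonitoredMap, clearing it is what unregisters the MBeans.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoreCloseSketch {
  private final ExecutorService searcherExecutor = Executors.newSingleThreadExecutor();
  private final Map<String, Object> infoRegistry = new ConcurrentHashMap<>();

  public void close() throws InterruptedException {
    // 1) Drain the searcher executor so no in-flight task can register
    //    another searcher (and its MBean) behind our back.
    searcherExecutor.shutdown();
    searcherExecutor.awaitTermination(60, TimeUnit.SECONDS);
    // 2) Empty the registry; a JMX-backed map unregisters the
    //    corresponding MBeans as entries are removed.
    infoRegistry.clear();
    // 3) ... remaining shutdown work (searcher, update handler, etc.)
  }
}
{code}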
[jira] [Comment Edited] (SOLR-2927) SolrIndexSearcher's register do not match close and SolrCore's closeSearcher
[ https://issues.apache.org/jira/browse/SOLR-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188495#comment-14188495 ] Cyrille Roy edited comment on SOLR-2927 at 10/29/14 4:16 PM: - Proposed patch: in SolrCore.close(), start by waiting for the searcherExecutor and then empty the infoRegistry, which will unregister the mbean. The patch is built against trunk. Please let me know if that is not the right version against which to build a patch. was (Author: croy): proposed patch: in SolrCore.close() start with waiting for searcherExecutor and then empty the infoRegistry which will unregister the mbean SolrIndexSearcher's register do not match close and SolrCore's closeSearcher Key: SOLR-2927 URL: https://issues.apache.org/jira/browse/SOLR-2927 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-ALPHA Environment: JDK1.6/CentOS Reporter: tom liu Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-2927.patch, mbean-leak-jira.png # SolrIndexSearcher's register method puts the name of the searcher, but SolrCore's closeSearcher method removes the name of currentSearcher from the infoRegistry. # SolrIndexSearcher's register method puts the name of the cache, but SolrIndexSearcher's close does not remove the name of the cache. So there may be a memory leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188554#comment-14188554 ] Jessica Cheng Mallet commented on SOLR-6631: I originally thought NodeChildrenChanged would be enough too, but it made the tests hang forever. That's when I realized that the zk.exist() call in offer() also uses this watcher, so it's not enough to just watch for NodeChildrenChanged. We can either make the watcher set all non-None events (None events don't remove watches, so they need to be excluded), or use a different kind of watch in the zk.exist() call. DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() requests to zookeeper, with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute() 2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE, will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
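The fix direction discussed here can be sketched as follows (simplified; this is not the actual DistributedQueue.LatchChildWatcher or the attached patch): EventType.None notifications report connection-state changes and do not consume the watch, so the watcher should ignore them rather than record them as "something changed".
{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class LatchChildWatcherSketch implements Watcher {
  private final Object lock = new Object();
  private WatchedEvent event;

  @Override
  public void process(WatchedEvent e) {
    // Session/connection notifications arrive with EventType.None and do
    // NOT consume the watch; recording them would wake the loop with no
    // children to fetch, producing the spin seen in the thread dump.
    if (e.getType() == Event.EventType.None) {
      return;
    }
    synchronized (lock) {
      event = e;
      lock.notifyAll();
    }
  }

  public void await(long timeoutMs) throws InterruptedException {
    synchronized (lock) {
      if (event == null) {
        lock.wait(timeoutMs);
      }
    }
  }

  public WatchedEvent getWatchedEvent() {
    synchronized (lock) {
      return event;
    }
  }
}
{code}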
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188559#comment-14188559 ] Noble Paul commented on SOLR-6517: -- I'm trying to understand how the REBALANCELEADERS command works. Is there a small writeup on the sequence of operations performed when this command is invoked? CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188575#comment-14188575 ] Mark Miller commented on SOLR-6631: --- Whoops - glossed over that in my IDE call hierarchy. bq. a different kind of watch in the zk.exist() call I lean towards that - a subclass or something. I think it's better for the current name of the watcher, and I think it's better that the watch only processes the events it is actually interested in. It's not a real big deal either way, but in the other case, let's change the name of the Watcher. DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() requests to zookeeper, with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute() 2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE, will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188690#comment-14188690 ] Erick Erickson edited comment on SOLR-6517 at 10/29/14 5:59 PM: There's some information here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. There's a little bit of overloading here. CollectionsHandler.handleBalanceLeaders does the throttling of how many outstanding requests there are, and OverseerCollectionsProcessor.processAssignLeaders just queues up an Overseer.OverseerAction.LEADER call for the Overseer to execute. Yeah, as the notes above indicate, I'm perfectly aware that I should mention the JIRA in the message; I just managed to forget once. was (Author: erickerickson): There's some information here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. There's a little bit of overloading here. CollectionsHandler.handleBalanceLeaders does the throttling of how many outstanding requests there are, and OverseerCollectionsProcessor.processAssignLeaders just queues up a Overseer.OverseerAction.LEADER.toLower() call for the Overseer to execute. Yeah, as the notes above indicate I'm perfectly aware that I should mention the JIRA in the message, just managed to forget once. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188690#comment-14188690 ] Erick Erickson commented on SOLR-6517: -- There's some information here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Basically, though, it just queues up Overseer.OverseerAction.LEADER commands. There's a little bit of overloading here. CollectionsHandler.handleBalanceLeaders does the throttling of how many outstanding requests there are, and OverseerCollectionsProcessor.processAssignLeaders just queues up an Overseer.OverseerAction.LEADER.toLower() call for the Overseer to execute. Yeah, as the notes above indicate, I'm perfectly aware that I should mention the JIRA in the message; I just managed to forget once. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188703#comment-14188703 ] Noble Paul commented on SOLR-6517: -- How is the leader election changed? How does it ensure that the preferredLeader gets elected? CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188733#comment-14188733 ] Erick Erickson commented on SOLR-6517: -- First, there aren't any guarantees here; it tries its best. For instance, the node may be down, etc... Barring that, though, it's pretty straightforward. The meat of the processing is in CollectionsHandler.handleBalanceLeaders: get the DocCollection from the cluster state, then for each slice { for each replica { if (the replica is active and NOT the leader and has the preferredLeader property set) queue it up to become the leader } } There's some bookkeeping here to respect the various parameters about how many to reassign at once and how long to wait (maxAtOnce and maxWaitSeconds), as well as to construct a pretty response giving all the info it can. This latter is one benefit of the heavy lifting being in CollectionsHandler rather than over in the Overseer. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
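Rendered as code, the loop Erick outlines looks roughly like the following self-contained sketch (illustrative types, not Solr's real ClusterState/Replica API); the maxAtOnce/maxWaitSeconds throttling he mentions would wrap the queueing call.
{code:java}
import java.util.List;

class RebalanceLeadersSketch {
  static class Replica {
    final String name;
    final boolean active, leader, preferredLeader;
    Replica(String name, boolean active, boolean leader, boolean preferredLeader) {
      this.name = name; this.active = active;
      this.leader = leader; this.preferredLeader = preferredLeader;
    }
  }

  static class Slice {
    final String name;
    final List<Replica> replicas;
    Slice(String name, List<Replica> replicas) { this.name = name; this.replicas = replicas; }
  }

  /** Receives the queued Overseer LEADER operations. */
  interface LeaderQueue {
    void queueLeaderOp(String slice, String replica);
  }

  // The loop from the comment above: queue a LEADER op for every active,
  // non-leader replica carrying the preferredLeader property. Throttling
  // (maxAtOnce/maxWaitSeconds) would wrap this call site.
  static void rebalance(List<Slice> slices, LeaderQueue queue) {
    for (Slice slice : slices) {
      for (Replica r : slice.replicas) {
        if (r.active && !r.leader && r.preferredLeader) {
          queue.queueLeaderOp(slice.name, r.name);
        }
      }
    }
  }
}
{code}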
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188746#comment-14188746 ] Noble Paul commented on SOLR-6517: -- bq. (the replica is active and NOT the leader and has the preferredLeader property set) queue it up to become the leader What happens to the node that is already the leader? Is it evicted? What happens to the other nodes in the queue? CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to “make it so, Mr. Solr”. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or asynch? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (LUCENE-2361) OutOfMemoryException while Indexing
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed LUCENE-2361. Resolution: Cannot Reproduce IMO this shouldn't be a JIRA issue... it should have been an email thread on the Lucene java-user list. Once a reproducible problem is found, then create an issue. OOMs are quite possible simply by allocating too little heap to Java. OutOfMemoryException while Indexing --- Key: LUCENE-2361 URL: https://issues.apache.org/jira/browse/LUCENE-2361 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda Hi, We use Lucene version 2.9.1. We see the following OutOfMemory error in our environment; I think this is happening at significantly high load. Have you observed this at any time? Please let me know your thoughts on this. org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: PermGen space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: PermGen space at java.lang.String.$$YJP$$intern(Native Method) at java.lang.String.intern(Unknown Source) at org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74) at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #745: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/745/ 2 tests failed. FAILED: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([6184597A251FBD36]:0) FAILED: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.org.apache.solr.cloud.ChaosMonkeySafeLeaderTest Error Message: Suite timeout exceeded (= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (= 720 msec). at __randomizedtesting.SeedInfo.seed([6184597A251FBD36]:0) Build Log: [...truncated 53702 lines...] BUILD FAILED /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:548: The following error occurred while executing this line: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:200: The following error occurred while executing this line: : Java returned: 1 Total time: 289 minutes 12 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6031) TokenSources
David Smiley created LUCENE-6031: Summary: TokenSources Key: LUCENE-6031 URL: https://issues.apache.org/jira/browse/LUCENE-6031 Project: Lucene - Core Issue Type: Improvement Components: core/termvectors Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 TokenSources.java, in the highlight module, is a facade that returns a TokenStream for a field by either un-inverting (converting) the TermVector Terms, or by text re-analysis if TermVectors are unavailable or don't have the right options. TokenSources is used by the default highlighter, which is the most accurate highlighter we've got. When documents are large (say hundreds of kilobytes on up), I found that most of the highlighter's activity was spent up-front un-inverting (converting) the term vector to a TokenStream, not on the actual/real highlighting that follows. Much of that time was on a huge sort of hundreds of thousands of Tokens. Time was also spent doing lots of String conversion and char copying, and it used a lot of memory, too. In this patch, I overhauled TokenStreamFromTermPositionVector.java, and I removed similar logic in TokenSources that was used in circumstances when positions weren't available but offsets were. This class can un-invert term vectors that have positions *and/or* offsets (at least one). It doesn't sort. It places Tokens _directly_ into an array of tokens indexed by position. When positions aren't available, startOffset/8 is a substitute. I've got a more lightweight Token inner class, used in place of the old, deprecated Token, that ultimately forms a linked list when the process is done. There is no string conversion; character copying is minimized. The Token array is GC'ed after initialization; it's only needed during construction. Misc: * It implements reset() efficiently, so it need not be wrapped in CachingTokenFilter (I'll supply a patch later on this). * It only fetches payloads if you ask for them by adding the attribute (the default highlighter won't add the attribute). * It exposes the underlying TermVector terms via a getter too, which is needed by another patch to follow later. A key assumption is that the position increment gap or first position isn't gigantic, as that would create wasted space, and the linked-list formation ultimately has to visit all the slots. We also assume that there aren't a ton of tokens at the same position, since inserting new tokens in sorted order is O(N^2), where 'N' is the average number of co-occurring tokens. My performance testing using Lucene's benchmark module on a megabyte document showed a 5x speedup, in conjunction with some other patches to be posted separately. This patch made the most difference. As an aside, our JIRA Components ought to be updated to reflect our Lucene modules. There should be a component for highlighting, and not for term vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
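A toy version of the no-sort construction described above (not the patch itself; names here are invented): each token is dropped straight into an array slot indexed by its position, and one forward pass then recovers index order without any sort. A position-less term vector could substitute startOffset/8, roughly the average token length, as the slot index.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class PositionBucketedTokens {
  /**
   * Orders term-vector terms by position without sorting: each token is
   * written straight into an array slot indexed by its position, then one
   * forward pass reads them back in index order.
   */
  public static List<String> orderByPosition(String[] terms, int[][] positionsPerTerm) {
    int maxPos = 0;
    for (int[] positions : positionsPerTerm) {
      for (int p : positions) maxPos = Math.max(maxPos, p);
    }

    // One slot per position; the real patch chains co-occurring tokens here.
    String[] byPosition = new String[maxPos + 1];
    for (int t = 0; t < terms.length; t++) {
      for (int p : positionsPerTerm[t]) byPosition[p] = terms[t];
    }

    List<String> ordered = new ArrayList<>();
    for (String token : byPosition) {
      if (token != null) ordered.add(token); // gaps, e.g. removed stop words
    }
    return ordered;
  }
}
{code}
This is why the assumption about a modest first position matters: the array has one slot per position, so a gigantic position increment gap would waste space and force the final pass to walk many empty slots.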
[jira] [Updated] (LUCENE-6031) TokenSources optimization, avoid sort
[ https://issues.apache.org/jira/browse/LUCENE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6031: - Summary: TokenSources optimization, avoid sort (was: TokenSources) TokenSources optimization, avoid sort - Key: LUCENE-6031 URL: https://issues.apache.org/jira/browse/LUCENE-6031 Project: Lucene - Core Issue Type: Improvement Components: core/termvectors Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6031) TokenSources optimization, avoid sort
[ https://issues.apache.org/jira/browse/LUCENE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6031: - Attachment: LUCENE-6031.patch Here's the patch. There are a couple of no-commits: # I want to rename TokenStreamFromTermPositionVector to TokenStreamFromTermVector # I think TokenSources.getTokenStreamWithOffsets should relax its insistence that the term vector have positions. If you have control of your index options (and you do!), then you can choose not to put in positions and then highlight with the consequences of that decision, which is that highlighting will ignore stop-words: thus a query of "Sugar and Spice" would not match "sugar spice", and a query of "sugar spice" would match "sugar and spice" as indexed. If you don't even have stop-words, then why put positions in the term vector? TokenSources optimization, avoid sort - Key: LUCENE-6031 URL: https://issues.apache.org/jira/browse/LUCENE-6031 Project: Lucene - Core Issue Type: Improvement Components: core/termvectors Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: LUCENE-6031.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6032) Dealing with slow iterators
Adrien Grand created LUCENE-6032: Summary: Dealing with slow iterators Key: LUCENE-6032 URL: https://issues.apache.org/jira/browse/LUCENE-6032 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand Priority: Minor This is a recurring issue (for instance, already discussed in LUCENE-5418): queries can sometimes be super slow if they wrap a filter that provides linear-time nextDoc/advance. LUCENE-5418 has the following comment: bq. New patch, throwing UOE from DocIdSet.iterator() for the Filter returned by Range.getFilter(). I like this approach: it's safer for the user so they don't accidentally apply a super slow filter. I like this approach because doc id sets not providing efficient iteration should really be an exception rather than a common case. In addition, using an exception has the benefit of propagating the information through the call stack, which would not be the case if we used null or a sentinel value to say that the iterator is super slow. So if you write a filter that can wrap other filters and doesn't know how to deal with filters that don't support efficient iteration, you do not need to modify your code: it will work just fine with filters that support fast iteration and will fail on filters that don't. Something I would like to explore is whether things like FilteredQuery could catch this exception in order to fall back automatically to a random-access strategy. The general idea I have is that it is ok to apply a random-access filter as long as you have a fast iterator to drive iteration. So e.g. a filtered query based on a slow iterator would make sense, but not a ConstantScoreQuery that would wrap a filter, since it would need to evaluate the filter on all non-deleted documents (it would propagate the exception of the filter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
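A consumer-side sketch of that fallback idea, under the assumption that a "slow" DocIdSet throws UnsupportedOperationException from iterator() but still exposes random access via bits(); the class and method names here are hypothetical, not part of any patch:
{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

class FallbackSketch {
  static void consume(DocIdSet set, int maxDoc) throws IOException {
    try {
      DocIdSetIterator it = set.iterator();   // fast path: let the filter drive
      if (it == null) return;                 // empty set
      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        collect(doc);
      }
    } catch (UnsupportedOperationException slow) {
      Bits bits = set.bits();                 // slow to iterate, cheap random access
      if (bits == null) throw slow;           // no random access either: give up
      for (int doc = 0; doc < maxDoc; doc++) {
        if (bits.get(doc)) collect(doc);      // linear scan driven by the caller
      }
    }
  }

  private static void collect(int doc) {
    // score/collect the matching doc here
  }
}
{code}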
[jira] [Created] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
David Smiley created LUCENE-6033: Summary: Add CachingTokenFilter.isCached and switch LinkedList to ArrayList Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 CachingTokenFilter could use a simple boolean isCached() method implemented as such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if, after handing off to parts of its framework, it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor a LinkedList. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
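A hypothetical caller of the proposed method, sketching the unwrap that the highlighting code would do (maybeUnwrap is an illustration, not part of the patch):
{code:java}
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;

class UnwrapSketch {
  /** Keep the caching wrapper only if something actually consumed (cached) the stream. */
  static TokenStream maybeUnwrap(CachingTokenFilter cached, TokenStream original) {
    return cached.isCached() ? cached : original; // skip the wrapper if the cache was never built
  }
}
{code}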
[jira] [Updated] (LUCENE-6032) Dealing with slow iterators
[ https://issues.apache.org/jira/browse/LUCENE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6032: - Attachment: LUCENE-6032.patch Here is a patch just to show the idea (it doesn't pass tests anyway, since we have a couple of tests that wrap slow filters into a CSQ to test that they match the right docs). Dealing with slow iterators --- Key: LUCENE-6032 URL: https://issues.apache.org/jira/browse/LUCENE-6032 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand Priority: Minor Attachments: LUCENE-6032.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6033: - Attachment: LUCENE-6033.patch Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: LUCENE-6033.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188976#comment-14188976 ] Erick Erickson commented on SOLR-6517: -- bq: What happens to the node that is leader already? Nothing; it's already the leader, so what purpose would be served by changing anything? bq: What happens to other nodes in the queue? Not sure what you're asking here. The trick is that if a replica has the preferredLeader property set, LeaderElector.joinElection is called with joinAtHead set to true, so it's next up in the list when the leadership is changed. The rest of the nodes are still in the queue, though, ready to take over if the preferredLeader goes away. CollectionsAPI call REBALANCELEADERS Key: SOLR-6517 URL: https://issues.apache.org/jira/browse/SOLR-6517 Project: Solr Issue Type: New Feature Affects Versions: 5.0, Trunk Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to make it so, Mr. Solr. This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or async? Should it make the best attempt but not worry about perfection? Should it??? A collection=name parameter would be required and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
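A toy model of the queue behavior Erick describes, to make the joinAtHead effect concrete; this is an illustration with made-up names, not Solr's LeaderElector API:
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

class ElectionQueueSketch {
  private final Deque<String> queue = new ArrayDeque<String>();

  void joinElection(String replica, boolean joinAtHead) {
    if (joinAtHead) {
      queue.addFirst(replica); // preferredLeader replicas jump to the head of the line
    } else {
      queue.addLast(replica);  // everyone else stays queued as a fallback
    }
  }

  /** The head of the queue becomes leader at the next leadership change. */
  String nextLeader() {
    return queue.peekFirst();
  }
}
{code}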
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4942 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4942/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at __randomizedtesting.SeedInfo.seed([F327769CC692E696:32EFABDA67F4373F]:0) at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:539) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:223) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189118#comment-14189118 ] Jan Høydahl commented on SOLR-6513: --- I thought we agreed to prefer the term shard over slice, so I think we should do this for this API as well. The *only* place in our refguide we use the word slice is in [How SolrCloud Works|https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works] \[1\] and that description is disputed. The refguide explanation of what a shard is can be found in [Shards and Indexing Data in SolrCloud|https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud] \[2\], quoting: {quote} When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index. {quote} So I'm proposing a rename of this API to {{BALANCESHARDUNIQUE}} and a rewrite of \[1\]. \[1\] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works \[2\] https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Add a collectionsAPI call BALANCESLICEUNIQUE Key: SOLR-6513 URL: https://issues.apache.org/jira/browse/SOLR-6513 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch Another sub-task for SOLR-6491. The ability to assign a property on a node-by-node basis is nice, but tedious to get right for a sysadmin, especially if there are, say, 100s of nodes hosting a system. This JIRA would essentially provide an automatic mechanism for assigning a property. This particular command simply changes the cluster state; it doesn't do anything like re-assign functions. My idea for this version is fairly limited. You'd have to specify a collection, and there would be no attempt to, say, evenly distribute the preferred leader role/property for this collection by looking at _other_ collections. Or by looking at underlying hardware capabilities. Or... It would be a pretty simple round-robin assignment. About the only intelligence built in would be to change as few roles/properties as possible. Let's say that the correct number of nodes for this role turned out to be 3. Any node currently having 3 properties for this collection would NOT be changed. Any node having 2 properties would have one added, taken from some node with >3 properties. This probably needs an optional parameter, something like includeInactiveNodes=true|false. Since this is an arbitrary property, one must specify sliceUnique=true. So for the preferredLeader functionality, one would specify something like: action=BALANCESLICEUNIQUE&property=preferredLeader&property.value=true. There are checks in this code that require the preferredLeader to have a t/f value and require that sliceUnique be true. That said, this can be called on an arbitrary property that has only one such property per slice. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
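The balancing goal in the description (one property holder per slice, spread round-robin across nodes, changing as few assignments as possible) can be illustrated with a toy sketch; this is not Solr's implementation, and all names are made up:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BalanceSketch {
  /** current: slice -> node currently holding the property (null if none). */
  static Map<String, String> balance(Map<String, String> current, List<String> nodes) {
    Map<String, Integer> kept = new HashMap<String, Integer>();
    for (String n : nodes) kept.put(n, 0);
    // With S slices over N nodes, no node should hold more than ceil(S/N) properties.
    int ceiling = (current.size() + nodes.size() - 1) / nodes.size();
    Map<String, String> result = new HashMap<String, String>();
    // Pass 1: keep existing holders wherever the per-node ceiling allows (fewest changes).
    for (Map.Entry<String, String> e : current.entrySet()) {
      String node = e.getValue();
      if (node != null && kept.containsKey(node) && kept.get(node) < ceiling) {
        result.put(e.getKey(), node);
        kept.put(node, kept.get(node) + 1);
      }
    }
    // Pass 2: round-robin the remaining slices onto the least-loaded nodes.
    for (String slice : current.keySet()) {
      if (!result.containsKey(slice)) {
        String least = nodes.get(0);
        for (String n : nodes) {
          if (kept.get(n) < kept.get(least)) least = n;
        }
        result.put(slice, least);
        kept.put(least, kept.get(least) + 1);
      }
    }
    return result;
  }
}
{code}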
[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189130#comment-14189130 ] Mark Miller commented on SOLR-6513: --- The general way of things has been to use slice in code and shard (plus context) in user-facing things. There has never been real agreement on any of these issues IMO though. Not even when just two of us worked on it. Add a collectionsAPI call BALANCESLICEUNIQUE Key: SOLR-6513 URL: https://issues.apache.org/jira/browse/SOLR-6513 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
Erick Erickson created SOLR-6670: Summary: change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE Key: SOLR-6670 URL: https://issues.apache.org/jira/browse/SOLR-6670 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Priority: Minor JIRA for Jan's comments on SOLR-6513: I thought we agreed to prefer the term shard over slice, so I think we should do this for this API as well. The only place in our refguide we use the word slice is in How SolrCloud Works [1] and that description is disputed. The refguide explanation of what a shard is can be found in Shards and Indexing Data in SolrCloud [2], quoting: When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index. So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of [1]. [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works [2] https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Note Mark's comment on that JIRA, but I think it would be best to continue to talk about shards with user-facing operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6670) change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-6670: Assignee: Erick Erickson change BALANCESLICEUNIQUE to BALANCESHARDUNIQUE --- Key: SOLR-6670 URL: https://issues.apache.org/jira/browse/SOLR-6670 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189181#comment-14189181 ] Erick Erickson commented on SOLR-6513: -- Still, I'm all for keeping things consistent. See SOLR-6670 and we'll go from there. Add a collectionsAPI call BALANCESLICEUNIQUE Key: SOLR-6513 URL: https://issues.apache.org/jira/browse/SOLR-6513 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 5.0, Trunk Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189210#comment-14189210 ] Charles Draper commented on SOLR-1672: -- I would make heavy use of sort by "index desc" if it was available. RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet<Long> in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.<facetname>.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list. DIFF UnInvertedField.java: - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException { + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException { [line 556] The getCounts() method is modified to create an overridden BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'. DIFF UnInvertedField.java: - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize); + final BoundedTreeSet<Long> queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new BoundedTreeSet<Long>(maxsize, new Comparator() { @Override public int compare(Object o1, Object o2) { if (o1 == null || o2 == null) return 0; int result = ((Long) o1).compareTo((Long) o2); return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort }}) : new BoundedTreeSet<Long>(maxsize)) : null; SimpleFacets.java [line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present. 'asc' is used as a default value. DIFF SimpleFacets.java: + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc"); [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string. DIFF SimpleFacets.java: - counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix); + counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder); Implementation Notes: I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. I could be wrong about this, and zero counts may appear under some other, as yet untested, circumstances. Perhaps an expert familiar with this part of the code can clarify.
In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189236#comment-14189236 ] ASF subversion and git services commented on SOLR-6248: --- Commit 1635329 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1635329 ] SOLR-6248: Changing the format of mlt query parser MoreLikeThis Query Parser - Key: SOLR-6248 URL: https://issues.apache.org/jira/browse/SOLR-6248 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch The MLT Component doesn't let people highlight/paginate, and the handler comes with the cost of maintaining another piece in the config. Also, any changes to the default /select handler (number of results to be fetched etc.) need to be copied/synced with this handler too. Having an MLT QParser would let users get back docs based on a query for them to paginate, highlight etc. It would also give them the flexibility to use this anywhere, i.e. q, fq, bq etc. A bit of history about MLT (thanks to Hoss): the MLT Handler pre-dates the existence of QParsers and was meant to take an arbitrary query as input, find docs that match that query, club them together to find interesting terms, and then use those terms as if they were my main query to generate a main result set. This result would then be used as the set to facet, highlight etc. The flow: Query -> DocList(m) -> Bag(terms) -> Query -> DocList(y) The MLT component on the other hand solved a very different purpose of augmenting the main result set. It is used to get similar docs for each of the docs in the main result set. DocSet(n) -> n * Bag(terms) -> n * Query -> n * DocList(m) The new approach: all of this can be done better and cleaner (and makes more sense too) using an MLT QParser. An important thing to handle here is the case where the user doesn't have TermVectors, in which case it does what happens right now, i.e. parsing stored fields. Also, in case the user doesn't have a field (to be used for MLT) indexed, the field would need to be a TextField with an index analyzer defined. This analyzer will then be used to extract terms for MLT. In SolrCloud mode, '/get-termvectors' can be used after looking at the schema (if TermVectors are enabled for the field). If not, a /get call can be used to fetch the field and parse it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189235#comment-14189235 ] Anshum Gupta commented on SOLR-6248: After a discussion with Hoss, I'm changing the format of the query parser. It wouldn't have an 'id' key in the request, i.e. the new request would look like: {quote} \{!mlt qf=fieldname\}docId {quote} This would eliminate the need to document/maintain and track a new parameter name. MoreLikeThis Query Parser - Key: SOLR-6248 URL: https://issues.apache.org/jira/browse/SOLR-6248 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
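For illustration, a hedged SolrJ sketch of issuing a query in the new format; the field name "name" and the helper class are assumptions, not part of the patch:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

class MltParserSketch {
  /** Find docs similar to the given document id, using the {!mlt} parser. */
  static QueryResponse similarTo(SolrServer server, String docId) throws SolrServerException {
    // The doc id rides as the query body; there is no separate 'id' local param.
    SolrQuery q = new SolrQuery("{!mlt qf=name}" + docId);
    return server.query(q);
  }
}
{code}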
[jira] [Commented] (SOLR-5579) Leader stops processing collection-work-queue after failed collection reload
[ https://issues.apache.org/jira/browse/SOLR-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189269#comment-14189269 ] Ryan Cooke commented on SOLR-5579: -- Pretty sure we are also encountering this issue: the collection reload HTTP requests issued through the core admin are timing out, and a corresponding message is sitting in the collection-work-queue. Reloading cores using the reload button in the admin GUI will successfully reload the local collection, however. Issuing the reload HTTP request with the parameter async=true seems to behave in the same way (the request times out). Leader stops processing collection-work-queue after failed collection reload Key: SOLR-5579 URL: https://issues.apache.org/jira/browse/SOLR-5579 Project: Solr Issue Type: Bug Affects Versions: 4.5.1 Environment: Debian Linux 6.0 running on VMWare. Using embedded Solr Jetty. Reporter: Eric Bus Assignee: Mark Miller Labels: collections, queue I've been experiencing the same problem a few times now. My leader in /overseer_elect/leader stops processing the collection queue at /overseer/collection-queue-work. The queue will build up and it will trigger an alert in my monitoring tool. I haven't been able to pinpoint the reason that the leader stops, but usually I kill the leader node to trigger a leader election. The new node will pick up the queue. And this is where the problems start. When the new leader is processing the queue and picks up a reload for a shard without an active leader, the queue stops. It keeps repeating the message that there is no active leader for the shard. But a new leader is never elected: {quote} ERROR - 2013-12-24 14:43:40.390; org.apache.solr.common.SolrException; Error while trying to recover. core=magento_349_shard1_replica1:org.apache.solr.common.SolrException: No registered leader was found, collection:magento_349 slice:shard1 at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:482) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:465) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:317) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) ERROR - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (7) core=magento_349_shard1_replica1 INFO - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; Wait 256.0 seconds before trying to recover again (8) {quote} Is the leader election in some way connected to the collection queue? If so, can this be a deadlock, because it won't elect until the reload is complete? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189268#comment-14189268 ] ASF subversion and git services commented on SOLR-6248: --- Commit 1635336 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1635336 ] SOLR-6248: Changing request format for mlt queryparser (merge from trunk) MoreLikeThis Query Parser - Key: SOLR-6248 URL: https://issues.apache.org/jira/browse/SOLR-6248 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6671) Introduce a solr.data.root as root dir for all data
Jan Høydahl created SOLR-6671: - Summary: Introduce a solr.data.root as root dir for all data Key: SOLR-6671 URL: https://issues.apache.org/jira/browse/SOLR-6671 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.1 Reporter: Jan Høydahl Fix For: 5.0, Trunk Many users prefer to deploy code, config and data on separate disk locations, so the default of placing the indexes under {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted. In a multi-core/collection system, the {{solr.data.dir}} option is not much help, as it would set the {{dataDir}} to the same folder for all collections. One workaround, if you don't want to hardcode paths in your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each {{solr.properties}} file. A more elegant solution would be to introduce a new Java option, {{solr.data.root}}, which would be to data what {{solr.solr.home}} is to config. If set, all collections would default their {{dataDir}} to {{$\{solr.data.root\}/$\{solr.core.name\}/data}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data
[ https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189285#comment-14189285 ] Hoss Man commented on SOLR-6671: Isn't this already trivial for users to do by specifying {{<dataDir>$\{solr.data.root\}/$\{solr.core.name\}/data</dataDir>}} in their solrconfig.xml file(s)? Introduce a solr.data.root as root dir for all data --- Key: SOLR-6671 URL: https://issues.apache.org/jira/browse/SOLR-6671 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.1 Reporter: Jan Høydahl Fix For: 5.0, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data
[ https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189306#comment-14189306 ] Jan Høydahl commented on SOLR-6671: --- Not sure how to wire it in so it will also work as today if the new option is not specified. What we have now in {{solrconfig.xml}} is: {code:xml}<dataDir>${solr.data.dir:}</dataDir>{code} One way is to add a new property in {{solr.xml}}: {code:xml}<dataRootDir>${solr.data.root:}</dataRootDir>{code} Then modify the logic in SolrCore and other places resolving the default data dir so that, if it is empty, they consider solr.data.root as well. Introduce a solr.data.root as root dir for all data --- Key: SOLR-6671 URL: https://issues.apache.org/jira/browse/SOLR-6671 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.1 Reporter: Jan Høydahl Fix For: 5.0, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
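A sketch of that fallback resolution, as a hypothetical helper rather than SolrCore's actual code: an explicit dataDir wins, then solr.data.root, then today's instance-dir default:
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

class DataDirSketch {
  static Path resolveDataDir(String explicitDataDir, String dataRoot,
                             Path instanceDir, String coreName) {
    if (explicitDataDir != null && !explicitDataDir.isEmpty()) {
      return Paths.get(explicitDataDir);            // <dataDir> from solrconfig.xml wins
    }
    if (dataRoot != null && !dataRoot.isEmpty()) {
      return Paths.get(dataRoot, coreName, "data"); // ${solr.data.root}/<core>/data
    }
    return instanceDir.resolve("data");             // today's default under the instance dir
  }
}
{code}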
[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data
[ https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189315#comment-14189315 ] Mark Miller commented on SOLR-6671: --- This is similar to the solr.hdfs.home that the HdfsDirectoryFactory exposes to root SolrCloud instance dirs in one location. Def makes sense to have the same option for the local filesystem, given that you really don't want to manage data directories manually when using SolrCloud if you can help it. That was also a driving reason behind solr.hdfs.home. Introduce a solr.data.root as root dir for all data --- Key: SOLR-6671 URL: https://issues.apache.org/jira/browse/SOLR-6671 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.1 Reporter: Jan Høydahl Fix For: 5.0, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189341#comment-14189341 ] Yonik Seeley commented on SOLR-1672: bq. {code} And I favor the index [asc|desc] / count [asc|desc] format{code} +1, this is exactly the syntax that Heliosearch uses (well it actually accepts either index desc or index:desc) since the API is JSON: http://heliosearch.org/json-facet-api/#TermsFacet RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h As suggested by Chris Hosstetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list. DIFF UnInvertedField.java: - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException { + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException { [line 556] The getCounts() method is modified to create an overridden BoundedTreeSetLong(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'. DIFF UnInvertedField.java: - final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize); + final BoundedTreeSetLong queue = (sort.equals(count) || sort.equals(true)) ? (facetSortOrder.equals(dsc) ? new BoundedTreeSetLong(maxsize, new Comparator() { @Override public int compare(Object o1, Object o2) { if (o1 == null || o2 == null) return 0; int result = ((Long) o1).compareTo((Long) o2); return (result != 0 ? result 0 ? -1 : 1 : 0); //lowest number first sort }}) : new BoundedTreeSetLong(maxsize)) : null; SimpleFacets.java [line 221] A getFieldParam(field, facet.sortorder, asc); is added to retrieve the new parameter, if present. 'asc' used as a default value. DIFF SimpleFacets.java: + String facetSortOrder = params.getFieldParam(field, facet.sortorder, asc); [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string. DIFF SimpleFacets.java: - counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix); + counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder); Implementation Notes: I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. 
I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
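[Editor's note] As context for the syntax Yonik mentions above, ascending count order in the Heliosearch-style JSON Facet API is expressed per facet roughly like this (a sketch; the {{categories}} label and {{cat}} field are illustrative, and the lenient JSON parser also accepts unquoted keys):
{code}
json.facet={
  categories : {
    type  : terms,
    field : cat,
    sort  : "count asc"
  }
}
{code}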
[jira] [Commented] (SOLR-6513) Add a collectionsAPI call BALANCESLICEUNIQUE
[ https://issues.apache.org/jira/browse/SOLR-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189352#comment-14189352 ] Jan Høydahl commented on SOLR-6513:
---
This API is not in a released version, so it should be safe to commit the rename as part of this JIRA, no?

Add a collectionsAPI call BALANCESLICEUNIQUE
---
Key: SOLR-6513
URL: https://issues.apache.org/jira/browse/SOLR-6513
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 5.0, Trunk
Attachments: SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch, SOLR-6513.patch

Another sub-task for SOLR-6491. The ability to assign a property on a node-by-node basis is nice, but tedious for a sysadmin to get right, especially if there are, say, hundreds of nodes hosting a system. This JIRA would essentially provide an automatic mechanism for assigning a property. This particular command simply changes the cluster state; it doesn't do anything like re-assign functions. My idea for this version is fairly limited. You'd have to specify a collection, and there would be no attempt to, say, evenly distribute the preferred-leader role/property for this collection by looking at _other_ collections. Or by looking at underlying hardware capabilities. Or... It would be a pretty simple round-robin assignment. About the only intelligence built in would be to change as few roles/properties as possible. Let's say that the correct number of properties per node for this role turned out to be 3: any node currently having 3 properties for this collection would NOT be changed, and any node having 2 properties would have one added, taken from some node with more than 3 properties. This probably needs an optional parameter, something like includeInactiveNodes=true|false. Since this is an arbitrary property, one must specify sliceUnique=true. So for the preferredLeader functionality, one would specify something like action=BALANCESLICEUNIQUE&property=preferredLeader&property.value=true (an example call is sketched below). There are checks in this code that require the preferredLeader to have a t/f value and require that sliceUnique be true. That said, this can be called on any arbitrary property that has only one such property per slice.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
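[Editor's note] A sketch of what the proposed call might look like, assuming a collection named {{collection1}} and the parameter names used in the description (exact names and defaults could change before commit):
{code}
http://localhost:8983/solr/admin/collections?action=BALANCESLICEUNIQUE&collection=collection1&property=preferredLeader&property.value=true&sliceUnique=true
{code}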
[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data
[ https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189375#comment-14189375 ] Jan Høydahl commented on SOLR-6671:
---
Hoss, yes, you can compose your own variables everywhere in general, but this issue proposes to ship Solr with such convenience out of the box. We could then also add an {{-r dir}} option to {{bin/solr}} for specifying where data should live (see the sketch below). Thus people who already have tons of collections will be able to upgrade to Solr 5 and start using the option without further editing of XML files.

Introduce a solr.data.root as root dir for all data
---
Key: SOLR-6671
URL: https://issues.apache.org/jira/browse/SOLR-6671
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
Fix For: 5.0, Trunk

Many users prefer to deploy code, config and data on separate disk locations, so the default of placing the indexes under {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted. In a multi-core/collection system the {{solr.data.dir}} option is of little help, as it would set the {{dataDir}} to the same folder for all collections. One workaround, if you don't want to hardcode paths in your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each {{solr.properties}} file. A more elegant solution would be to introduce a new Java option, {{solr.data.root}}, which would be to data what {{solr.solr.home}} is for config. If set, all collections would default their {{dataDir}} to {{$\{solr.data.root\}/$\{solr.core.name\}/data}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
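[Editor's note] A sketch of the proposed usage; note that neither the {{-r}} flag nor the {{solr.data.root}} property exists yet — both are the hypothetical additions this issue proposes:
{code}
# hypothetical, once this issue is implemented:
bin/solr start -r /mnt/solrdata
# or, equivalently, via the proposed system property:
bin/solr start -Dsolr.data.root=/mnt/solrdata
# each core/collection would then default its dataDir to
#   /mnt/solrdata/<core name>/data
{code}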
[jira] [Commented] (SOLR-6671) Introduce a solr.data.root as root dir for all data
[ https://issues.apache.org/jira/browse/SOLR-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189405#comment-14189405 ] Jan Høydahl commented on SOLR-6671:
---
[~markrmil...@gmail.com], if using {{solr.hdfs.home}}, shouldn't data from e.g. BlendedInfixSuggester also be co-located there? But BlendedInfixLookupFactory currently hardcodes FSDirectory. We should probably create another JIRA for that, and possibly for other hardcodings.

Introduce a solr.data.root as root dir for all data
---
Key: SOLR-6671
URL: https://issues.apache.org/jira/browse/SOLR-6671
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.10.1
Reporter: Jan Høydahl
Fix For: 5.0, Trunk

Many users prefer to deploy code, config and data on separate disk locations, so the default of placing the indexes under {{$\{solr.solr.home\}/$\{solr.core.name\}/data}} is not always wanted. In a multi-core/collection system the {{solr.data.dir}} option is of little help, as it would set the {{dataDir}} to the same folder for all collections. One workaround, if you don't want to hardcode paths in your {{solrconfig.xml}}, is to specify the {{dataDir}} property in each {{solr.properties}} file. A more elegant solution would be to introduce a new Java option, {{solr.data.root}}, which would be to data what {{solr.solr.home}} is for config. If set, all collections would default their {{dataDir}} to {{$\{solr.data.root\}/$\{solr.core.name\}/data}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
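[Editor's note] For context, a typical BlendedInfix suggester configuration looks roughly like the following (a sketch; the suggester name, field, and field-type names are illustrative). The index under {{indexPath}} is created via a hardcoded FSDirectory, which is the reason it cannot follow {{solr.hdfs.home}} today:
{code}
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">blended</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- written with a hardcoded FSDirectory, even when cores use HdfsDirectoryFactory -->
    <str name="indexPath">blended_suggest_idx</str>
  </lst>
</searchComponent>
{code}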
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 662 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/662/

1 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=4131, name=coreLoadExecutor-1670-thread-1, state=RUNNABLE, group=TGRP-TestReplicationHandler], registration stack trace below.

Stack Trace:
com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=4131, name=coreLoadExecutor-1670-thread-1, state=RUNNABLE, group=TGRP-TestReplicationHandler], registration stack trace below.
	at __randomizedtesting.SeedInfo.seed([800168CC91F75BB2]:0)
	at java.lang.Thread.getStackTrace(Thread.java:1589)
	at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:166)
	at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:728)
	at org.apache.lucene.util.LuceneTestCase.wrapDirectory(LuceneTestCase.java:1314)
	at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1205)
	at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1197)
	at org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:47)
	at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:350)
	at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:276)
	at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:488)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:796)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:652)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:509)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:273)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:267)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AssertionError: Directory not closed: MockDirectoryWrapper(SimpleFSDirectory@/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/index-SimpleFSDirectory-116 lockFactory=NativeFSLockFactory@/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/index-SimpleFSDirectory-116)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.lucene.util.CloseableDirectory.close(CloseableDirectory.java:47)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:699)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:696)
	at com.carrotsearch.randomizedtesting.RandomizedContext.closeResources(RandomizedContext.java:183)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.afterAlways(RandomizedRunner.java:712)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
	... 1 more

Build Log:
[...truncated 12631 lines...]
[junit4] Suite: org.apache.solr.handler.TestReplicationHandler
[junit4]   2> Creating dataDir: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/init-core-data-001
[junit4]   2> 1421040 T3509 oas.SolrTestCaseJ4.setUp ###Starting doTestReplicateAfterCoreReload
[junit4]   2> 1421056 T3509 oejs.Server.doStart jetty-8.1.10.v20130312
[junit4]   2> 1421060 T3509 oejs.AbstractConnector.doStart Started SocketConnector@127.0.0.1:54137
[junit4]   2> 1421060 T3509 oass.SolrDispatchFilter.init SolrDispatchFilter.init()
[junit4]   2> 1421061 T3509 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx)
[junit4]   2> 1421061 T3509 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/solr-instance-001
[junit4]   2> 1421061 T3509 oasc.SolrResourceLoader.<init> new SolrResourceLoader for directory: '/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J2/temp/solr.handler.TestReplicationHandler-800168CC91F75BB2-001/solr-instance-001/'
[junit4]   2> 1421106 T3509
[jira] [Updated] (SOLR-6351) Let Stats Hang off of Pivots (via 'tag')
[ https://issues.apache.org/jira/browse/SOLR-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6351:
---
Attachment: SOLR-6351.patch

My main focus the last day or so has been reviewing PivotFacetHelper and PivotFacetValue with an eye towards simplifying the amount of redundant code between them and StatsComponent. Some details are posted below, but one key thing I wanted to point out... Even as (relatively) familiar as I am with the existing Pivot code, it took me a long time to understand how PivotFacetHelper.getStats + PivotListEntry.STATS were working in the case of leaf-level pivot values -- short answer: PivotFacetHelper.getStats totally ignores the Enum value of PivotListEntry.STATS and uses 0 (something PivotFacetHelper.getPivots also does that I'd never noticed before). Given that we plan to add more data to pivots in issues like SOLR-4212 and SOLR-6353, I really wanted to come up with a pattern for dealing with this that was less likely to trip people up when looking at the code.

{panel:title=Changes in this patch}
* StatsComponent
** refactored out a tiny little reusable unwrapStats utility
** refactored out a reusable convertToResponse utility
*** I was hoping this would help encapsulate and simplify the way the count==0 rules are applied, to make the top level consistent with pivots, but that led me down a rabbit hole of pain as far as testing, backcompat, and SolrJ -- so I just captured it in a 'force' method param.
*** But at least now the method is consistently called everywhere that outputs stats, so if/when we change the rules for how empty stats are returned (see comments in SOLR-6349) we won't need to audit/change multiple pieces of code; we can just focus on callers of this method
** added a StatsInfo.getStatsField(key) method for use by PivotFacetHelper.mergeStats so it wouldn't need to constantly loop over every possible stats.field
* PivotFacetValue
** removed an unnecessary level of wrapping around the Map<String,StatsValues>
** switched to using StatsComponent.convertToResponse directly instead of PivotFacetHelper.convertStatsValuesToNamedList
* PivotListEntry
** renamed index to minIndex
** added an extract method that knows how to correctly deal with the difference between optional entries that may exist starting at the minIndex, and mandatory entries (field, value, count) that *must* exist at the expected index (a sketch of the pattern appears after this message)
* PivotFacetHelper
** changed the various getFoo methods to use PivotListEntry.FOO.extract
*** these methods now exist mainly just for convenience with the Object casting
*** this also meant the retrieve method could be removed
** simplified mergeStats via:
*** StatsComponent.unwrapStats
*** StatsInfo.getStatsField
** mergeStats javadocs
** removed convertStatsValuesToNamedList
* PivotFacetProcessor
** switched to using StatsComponent.convertToResponse
* TestCloudPivots
** updated the nocommit comment regarding 'null' actualStats based on pain encountered working on StatsComponent.convertToResponse
*** added some more sanity-check assertions in this case as well
* DistributedFacetPivotSmallTest
** added doTestPivotStatsFromOneShard to account for an edge case in merging that occurred to me while reviewing PivotFacetHelper.mergeStats
*** this fails because of how +/-Infinity are treated as the min/max -- I'll work on fixing this next
*** currently commented out + has some nocommits to beef up this test w/ other types
* merged my working changes with Vitaliy's additions (but have not yet actually reviewed the new tests)...
** FacetPivotSmallTest
** DistributedFacetPivotSmallAdvancedTest
** PivotFacetValue.getStatsValues ... although it's not clear to me yet what purpose/value this adds?
{panel}

Let Stats Hang off of Pivots (via 'tag')
---
Key: SOLR-6351
URL: https://issues.apache.org/jira/browse/SOLR-6351
Project: Solr
Issue Type: Sub-task
Reporter: Hoss Man
Attachments: SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch

The goal here is basically to flip the notion of stats.facet on its head, so that instead of asking the stats component to also do some faceting (something that's never worked well with the variety of field types and has never worked in distributed mode) we instead ask the PivotFacet code to compute some stats X for each leaf in a pivot. We'll do this with the existing {{stats.field}} params, but we'll leverage the {{tag}} local param of the {{stats.field}} instances to be able to associate which stats we want hanging off of which {{facet.pivot}}. Example...
{noformat} facet.pivot={!stats=s1}category,manufacturer
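[Editor's note] The minIndex/extract pattern Hoss describes above can be sketched as follows. This is an illustration only, not the actual patch: the constants, index values, and the decision to look optional entries up by name are all assumptions.
{code}
// Sketch: mandatory entries (FIELD, VALUE, COUNT) live at fixed indexes in a
// pivot NamedList; optional entries (e.g. STATS) may appear at or after their
// minIndex, so they are located by name rather than by position.
import org.apache.solr.common.util.NamedList;

public enum PivotListEntry {
  FIELD(0), VALUE(1), COUNT(2), PIVOT(3), STATS(3);

  private final int minIndex;

  PivotListEntry(int minIndex) {
    this.minIndex = minIndex;
  }

  /** Extract this entry from a pivot NamedList, or return null if absent. */
  public Object extract(NamedList<Object> pivotList) {
    if (minIndex < 3) {
      // mandatory entry: must exist at exactly this index
      return pivotList.getVal(minIndex);
    }
    // optional entry: scan from minIndex onward, matching by (case-insensitive) name
    for (int i = minIndex; i < pivotList.size(); i++) {
      if (name().equalsIgnoreCase(pivotList.getName(i))) {
        return pivotList.getVal(i);
      }
    }
    return null;
  }
}
{code}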
[jira] [Created] (SOLR-6672) function results' names should not include trailing whitespace
Mike Sokolov created SOLR-6672:
---
Summary: function results' names should not include trailing whitespace
Key: SOLR-6672
URL: https://issues.apache.org/jira/browse/SOLR-6672
Project: Solr
Issue Type: Bug
Components: search
Reporter: Mike Sokolov
Priority: Minor

If you include a function as a result field in a list of multiple fields separated by whitespace, the corresponding key in the result markup includes trailing whitespace. Example:
{code}
fl=id field(units_used) archive_id
{code}
ends up returning results like this (the ^ marks the trailing space in the function's key):
{code}
{
  id: nest.epubarchive.1,
  archive_id: urn:isbn:97849D42C5A01,
  field(units_used) : 123
                   ^
}
{code}
A workaround is to use comma separators instead of whitespace:
{code}
fl=id,field(units_used),archive_id
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189635#comment-14189635 ] Noble Paul commented on SOLR-6517:
---
Sorry for being a pain in the butt I'm exactly looking LeaderElector.joinElection for that call and I don't see where it is done.

CollectionsAPI call REBALANCELEADERS
---
Key: SOLR-6517
URL: https://issues.apache.org/jira/browse/SOLR-6517
Project: Solr
Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 5.0, Trunk
Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch

Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to "make it so, Mr. Solr". This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or async? Should it make the best attempt but not worry about perfection? Should it...? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
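[Editor's note] For reference, the command under review would be invoked via the Collections API roughly as follows (a sketch assuming a collection named {{collection1}}; optional parameters may change before commit):
{code}
http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=collection1
{code}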
[jira] [Comment Edited] (SOLR-6517) CollectionsAPI call REBALANCELEADERS
[ https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189635#comment-14189635 ] Noble Paul edited comment on SOLR-6517 at 10/30/14 4:39 AM:
---
Sorry for being a pain in the butt. I'm looking exactly for the LeaderElector.joinElection call, and I don't see where it is done.

was (Author: noble.paul): Sorry for being a pain in the butt I'm exactly looking LeaderElector.joinElection for that call and I don't see where it is done.

CollectionsAPI call REBALANCELEADERS
---
Key: SOLR-6517
URL: https://issues.apache.org/jira/browse/SOLR-6517
Project: Solr
Issue Type: New Feature
Affects Versions: 5.0, Trunk
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 5.0, Trunk
Attachments: SOLR-6517.patch, SOLR-6517.patch, SOLR-6517.patch

Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are assigned, there has to be a command to "make it so, Mr. Solr". This is something of a placeholder to collect ideas. One wouldn't want to flood the system with hundreds of re-assignments at once. Should this be synchronous or async? Should it make the best attempt but not worry about perfection? Should it...? A collection=name parameter would be required, and it would re-elect all the leaders that were on the 'wrong' node. I'm thinking of optionally allowing one to specify a shard for the case where you want to make a very specific change. Note that there's no need to specify a particular replica, since there should be only a single preferredLeader per slice. This command would do nothing to any slice that did not have a replica with a preferredLeader role. Likewise, it would do nothing if the slice in question already had the leader role assigned to the node with the preferredLeader role.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org