[jira] [Resolved] (SOLR-11202) Implement a set-property command for AutoScaling API
[ https://issues.apache.org/jira/browse/SOLR-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11202. -- Resolution: Fixed > Implement a set-property command for AutoScaling API > > > Key: SOLR-11202 > URL: https://issues.apache.org/jira/browse/SOLR-11202 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11202.patch, SOLR-11202.patch, SOLR-11202.patch > > > The Autoscaling API should support a {{set-property}} command so that > properties specific to autoscaling can be set/unset. Examples of such > properties are: > # The scheduled delay between trigger invocations (currently defaults to 1s) > # Min time between actions (currently defaults to 5s) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-11201) Implement trigger for arbitrary metrics
[ https://issues.apache.org/jira/browse/SOLR-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-11201: Assignee: Shalin Shekhar Mangar > Implement trigger for arbitrary metrics > --- > > Key: SOLR-11201 > URL: https://issues.apache.org/jira/browse/SOLR-11201 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2 > > > It should be possible to set a trigger on any metric exposed by the Metrics > API using a threshold value. Supporting {{waitFor}} may not be possible or > useful for all metrics. For those we will implement proper trigger support > (such as searchRate). However, a naive implementation might be to just poll > the value of the metric frequently and, if it is consistently above the > threshold, fire the trigger.
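The naive approach described in the issue (poll often, fire only when the value stays above the threshold) can be sketched as follows. This is a minimal illustration, not the Solr implementation: the class name `NaiveMetricTrigger`, the `poll` method, and the millisecond `waitForMs` parameter are all hypothetical names chosen for this example.

```java
import java.util.OptionalLong;

/** Hypothetical polling trigger: fires once a metric stays above a threshold for waitForMs. */
class NaiveMetricTrigger {
    private final double threshold;
    private final long waitForMs;
    // Timestamp of the first sample in the current above-threshold streak, if any.
    private OptionalLong aboveSinceMs = OptionalLong.empty();

    NaiveMetricTrigger(double threshold, long waitForMs) {
        this.threshold = threshold;
        this.waitForMs = waitForMs;
    }

    /** Records one polled sample; returns true when the trigger should fire. */
    boolean poll(double value, long nowMs) {
        if (value <= threshold) {
            aboveSinceMs = OptionalLong.empty(); // streak broken, start over
            return false;
        }
        if (!aboveSinceMs.isPresent()) {
            aboveSinceMs = OptionalLong.of(nowMs); // first sample of a new streak
        }
        return nowMs - aboveSinceMs.getAsLong() >= waitForMs;
    }
}
```

A single dip below the threshold resets the streak, which is one way to read "consistently above"; a real trigger might tolerate occasional dips or average over a window instead.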
[jira] [Updated] (SOLR-11202) Implement a set-property command for AutoScaling API
[ https://issues.apache.org/jira/browse/SOLR-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11202: - Attachment: SOLR-11202.patch This patch: # Adds support for unsetting properties by specifying null as the value. # Updates the ref guide documentation with details about this API. > Implement a set-property command for AutoScaling API > > > Key: SOLR-11202 > URL: https://issues.apache.org/jira/browse/SOLR-11202 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11202.patch, SOLR-11202.patch, SOLR-11202.patch > > > The Autoscaling API should support a {{set-property}} command so that > properties specific to autoscaling can be set/unset. Examples of such > properties are: > # The scheduled delay between trigger invocations (currently defaults to 1s) > # Min time between actions (currently defaults to 5s)
[jira] [Updated] (SOLR-11202) Implement a set-property command for AutoScaling API
[ https://issues.apache.org/jira/browse/SOLR-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11202: - Attachment: SOLR-11202.patch Thanks Andrzej. This patch incorporates all your comments. I also changed the time related properties to all use seconds for consistency. > Implement a set-property command for AutoScaling API > > > Key: SOLR-11202 > URL: https://issues.apache.org/jira/browse/SOLR-11202 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11202.patch, SOLR-11202.patch > > > The Autoscaling API should support a {{set-property}} command so that > properties specific to autoscaling can be set/unset. Examples of such > properties are: > # The scheduled delay between trigger invocations (currently defaults to 1s) > # Min time between actions (currently defaults to 5s)
[jira] [Commented] (SOLR-9743) An UTILIZENODE command
[ https://issues.apache.org/jira/browse/SOLR-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266418#comment-16266418 ] Shalin Shekhar Mangar commented on SOLR-9743: - In UtilizeNodeCmd, the following if statement will always be true: {code} if (Objects.equals(r.getName(), r.getName())) { {code} > An UTILIZENODE command > -- > > Key: SOLR-9743 > URL: https://issues.apache.org/jira/browse/SOLR-9743 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul > > The command would accept one or more nodes and create appropriate replicas > based on some strategy. > The params are > *node: (required && multi-valued) : The nodes to be deployed > * collection: (optional) The collection to which the node should be added > to. if this parameter is not passed, try to assign to all collections > example: > {code} > action=UTILIZENODE=gettingstarted > {code}
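To make the comment concrete, here is a small self-contained sketch of why the quoted condition cannot be false. The `Replica` stand-in and the `wanted` parameter are hypothetical; the thread does not say which value the second argument was meant to be, so `intendedMatch` only shows the general shape of a comparison that can fail.

```java
import java.util.Objects;

class SelfComparisonDemo {
    /** Minimal stand-in for a replica; only the name matters here. */
    static final class Replica {
        private final String name;
        Replica(String name) { this.name = name; }
        String getName() { return name; }
    }

    /** The quoted pattern: both arguments are the same expression, so the result is always true. */
    static boolean buggyMatch(Replica r, String wanted) {
        return Objects.equals(r.getName(), r.getName());
    }

    /** Comparing against a second, independent value (an assumption) can actually be false. */
    static boolean intendedMatch(Replica r, String wanted) {
        return Objects.equals(r.getName(), wanted);
    }
}
```

With a replica named `core_node1`, `buggyMatch(r, "core_node2")` still returns true, while `intendedMatch(r, "core_node2")` returns false.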
[jira] [Updated] (SOLR-11202) Implement a set-property command for AutoScaling API
[ https://issues.apache.org/jira/browse/SOLR-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11202: - Attachment: SOLR-11202.patch Patch which adds set-properties command to autoscaling API. Support for 4 properties is added: # trigger.schedule.delay.seconds # trigger.cooldown.period.ms # trigger.core.pool.size # action.throttle.period.ms > Implement a set-property command for AutoScaling API > > > Key: SOLR-11202 > URL: https://issues.apache.org/jira/browse/SOLR-11202 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11202.patch > > > The Autoscaling API should support a {{set-property}} command so that > properties specific to autoscaling can be set/unset. Examples of such > properties are: > # The scheduled delay between trigger invocations (currently defaults to 1s) > # Min time between actions (currently defaults to 5s)
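For orientation, a request body for this command might be assembled as in the sketch below. Treat it as an assumption-laden illustration: the top-level `set-properties` key mirrors the command name used in this message, the property names are the ones listed for this first patch (other messages in the thread say the time-related properties were later switched to seconds, so the committed names may differ), and the class `SetPropertiesPayload` is hypothetical. A null value stands for unsetting a property, as described elsewhere in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SetPropertiesPayload {
    /** Renders {"set-properties": {...}}; a null value is written as JSON null (meaning "unset"). */
    static String render(Map<String, Object> props) {
        StringJoiner body = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Object> e : props.entrySet()) {
            Object v = e.getValue();
            // Numbers and null are written bare; anything else is quoted (no escaping in this sketch).
            String json = (v == null || v instanceof Number) ? String.valueOf(v) : "\"" + v + "\"";
            body.add("\"" + e.getKey() + "\": " + json);
        }
        return "{\"set-properties\": " + body + "}";
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("trigger.schedule.delay.seconds", 5); // set a value
        props.put("trigger.core.pool.size", null);      // unset a value
        System.out.println(render(props));
    }
}
```

Running `main` prints `{"set-properties": {"trigger.schedule.delay.seconds": 5, "trigger.core.pool.size": null}}`. A real client would use a JSON library instead of string concatenation.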
[jira] [Commented] (SOLR-11661) Race condition between core creation thread and recovery request from leader causes inconsistent view of documents
[ https://issues.apache.org/jira/browse/SOLR-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259155#comment-16259155 ] Shalin Shekhar Mangar commented on SOLR-11661: -- Actually, Dat pointed out to me privately that the recovery thread is created because of core_node8 losing leader election. The request to recover made by core_node7 to core_node8 becomes a no-op because a recovery is already underway. > Race condition between core creation thread and recovery request from leader > causes inconsistent view of documents > -- > > Key: SOLR-11661 > URL: https://issues.apache.org/jira/browse/SOLR-11661 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: 11458-2-MoveReplicaHDFSTest-log.txt > > > While testing SOLR-11458, [~ab] ran into an interesting failure which > resulted in different document counts between leader and replica. The test is > MoveReplicaHDFSTest on jira/solr-11458-2 branch. 
> The failure is rare but reproducible on beasting: > {code} > reproduce with: ant test -Dtestcase=MoveReplicaHDFSTest > -Dtests.method=testNormalFailedMove -Dtests.seed=161856CB543CD71C > -Dtests.slow=true -Dtests.locale=ar-SA -Dtests.timezone=US/Michigan > -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 >[junit4] FAILURE 14.2s | MoveReplicaHDFSTest.testNormalFailedMove <<< >[junit4]> Throwable #1: java.lang.AssertionError: expected:<100> but > was:<56> >[junit4]> at > __randomizedtesting.SeedInfo.seed([161856CB543CD71C:31134983787E4905]:0) >[junit4]> at > org.apache.solr.cloud.MoveReplicaTest.testFailedMove(MoveReplicaTest.java:305) >[junit4]> at > org.apache.solr.cloud.MoveReplicaHDFSTest.testNormalFailedMove(MoveReplicaHDFSTest.java:69) > {code}
[jira] [Updated] (SOLR-11661) Race condition between core creation thread and recovery request from leader causes inconsistent view of documents
[ https://issues.apache.org/jira/browse/SOLR-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11661: - Attachment: 11458-2-MoveReplicaHDFSTest-log.txt Full logs attached. Dat and I analyzed the logs and we found this problem:
{code}
# New collection called MoveReplicaHDFSTest_failed_coll is being created. New replicas core_node7 and core_node8 for shard are in the process of being created.
# New core MoveReplicaHDFSTest_failed_coll_shard2_replica_n4 core_node7 tries to become leader, asks MoveReplicaHDFSTest_failed_coll_shard2_replica_n6 core_node8 to sync
# Sync fails because core_node8 has no versions
# core_node7 becomes leader and asks core_node8 to recover
# core_node8 gets a request to recover and starts recovery thread recoveryExecutor-53-thread-1-processing-n:127.0.0.1:61049_solr
# core_node8 enters buffering state
# core_node8 sends prep recovery command to core_node7 and publishes itself in recovery state
# core_node7 has a thread in WaitForState and sees core_node8 as down currently
# At t=70388, some DataStreamer Exception is reported from DFSClient and leader core_node7 logs that it could not close the HDFS transaction log due to no more good datanodes being available -- these look like they aren't relevant to the problem
# core_node7 (leader) publishes itself as active
# core_node7 create core is complete
# core_node8 create thread (qtp1713789948-2124) sees that there is a leader and publishes itself as active, skipping recovery
# core_node8 create core command is successful
# collection create is finished
# core_node7 remains tied up in WaitForState because from now on it only sees core_node8 as active but not in recovery
# the recovery thread in core_node8 remains waiting in prep recovery
# New documents are added to the collection but they aren't visible to searchers because core_node8 is buffering and therefore ignores commit requests
{code}
So there is a race between the core create thread publishing the local core as active and the leader asking that same core to recover: the create thread can publish active after the recovery request has already arrived. This is a side effect of SOLR-9566 which skips recovery for replicas which are being created as part of a new collection. > Race condition between core creation thread and recovery request from leader > causes inconsistent view of documents > -- > > Key: SOLR-11661 > URL: https://issues.apache.org/jira/browse/SOLR-11661 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: 11458-2-MoveReplicaHDFSTest-log.txt > > > While testing SOLR-11458, [~ab] ran into an interesting failure which > resulted in different document counts between leader and replica. The test is > MoveReplicaHDFSTest on jira/solr-11458-2 branch. > The failure is rare but reproducible on beasting: > {code} > reproduce with: ant test -Dtestcase=MoveReplicaHDFSTest > -Dtests.method=testNormalFailedMove -Dtests.seed=161856CB543CD71C > -Dtests.slow=true -Dtests.locale=ar-SA -Dtests.timezone=US/Michigan > -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 >[junit4] FAILURE 14.2s | MoveReplicaHDFSTest.testNormalFailedMove <<< >[junit4]> Throwable #1: java.lang.AssertionError: expected:<100> but > was:<56> >[junit4]> at > __randomizedtesting.SeedInfo.seed([161856CB543CD71C:31134983787E4905]:0) >[junit4]> at > org.apache.solr.cloud.MoveReplicaTest.testFailedMove(MoveReplicaTest.java:305) >[junit4]> at > org.apache.solr.cloud.MoveReplicaHDFSTest.testNormalFailedMove(MoveReplicaHDFSTest.java:69) > {code}
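The shape of this race can be shown with a toy state holder. This is only a sketch under stated assumptions and is neither Solr code nor the fix that was committed: the `ReplicaStateGuard` class, its three states, and both publish methods are hypothetical. It contrasts an unconditional "publish active" with a compare-and-set that refuses to overwrite a recovery request.

```java
import java.util.concurrent.atomic.AtomicReference;

class ReplicaStateGuard {
    enum State { CREATING, RECOVERING, ACTIVE }

    private final AtomicReference<State> state = new AtomicReference<>(State.CREATING);

    /** Models the leader asking this replica to recover. */
    void requestRecovery() {
        state.set(State.RECOVERING);
    }

    /** Unguarded publish: overwrites whatever is there, losing a pending recovery. */
    void publishActiveUnguarded() {
        state.set(State.ACTIVE);
    }

    /** Guarded publish: succeeds only if no recovery was requested in the meantime. */
    boolean publishActiveIfStillCreating() {
        return state.compareAndSet(State.CREATING, State.ACTIVE);
    }

    State current() {
        return state.get();
    }
}
```

If `requestRecovery()` runs before the create thread publishes, the unguarded variant ends in `ACTIVE` (the outcome described in the analysis), whereas the guarded variant returns false and leaves the replica in `RECOVERING`.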
[jira] [Created] (SOLR-11661) Race condition between core creation thread and recovery request from leader causes inconsistent view of documents
Shalin Shekhar Mangar created SOLR-11661: Summary: Race condition between core creation thread and recovery request from leader causes inconsistent view of documents Key: SOLR-11661 URL: https://issues.apache.org/jira/browse/SOLR-11661 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Shalin Shekhar Mangar Fix For: 7.2, master (8.0) While testing SOLR-11458, [~ab] ran into an interesting failure which resulted in different document counts between leader and replica. The test is MoveReplicaHDFSTest on jira/solr-11458-2 branch. The failure is rare but reproducible on beasting: {code} reproduce with: ant test -Dtestcase=MoveReplicaHDFSTest -Dtests.method=testNormalFailedMove -Dtests.seed=161856CB543CD71C -Dtests.slow=true -Dtests.locale=ar-SA -Dtests.timezone=US/Michigan -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 14.2s | MoveReplicaHDFSTest.testNormalFailedMove <<< [junit4]> Throwable #1: java.lang.AssertionError: expected:<100> but was:<56> [junit4]>at __randomizedtesting.SeedInfo.seed([161856CB543CD71C:31134983787E4905]:0) [junit4]>at org.apache.solr.cloud.MoveReplicaTest.testFailedMove(MoveReplicaTest.java:305) [junit4]>at org.apache.solr.cloud.MoveReplicaHDFSTest.testNormalFailedMove(MoveReplicaHDFSTest.java:69) {code}
Re: Welcome Noble Paul to the PMC
Congratulations Noble! On Mon, Nov 20, 2017 at 1:32 AM, Adrien Grand <jpou...@gmail.com> wrote: > I am pleased to announce that Noble Paul has accepted the PMC's invitation > to join. > > Welcome Noble! -- Regards, Shalin Shekhar Mangar.
[jira] [Resolved] (SOLR-11638) CloudSolrClientTest periodic failures
[ https://issues.apache.org/jira/browse/SOLR-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11638. -- Resolution: Fixed Fix Version/s: master (8.0) 7.2 Thanks Jason! > CloudSolrClientTest periodic failures > - > > Key: SOLR-11638 > URL: https://issues.apache.org/jira/browse/SOLR-11638 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ, Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11638.patch > > > The test-randomization recently-added as a part of SOLR-11507 has caused > {{CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale}} to fail > semi-regularly on master. The test only succeeds for me on 3 out of 10 test > runs. The test fails with the message: > {code} >[junit4] 2> 14848 ERROR > (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) > [] o.a.s.c.s.i.CloudSolrClient Request to collection > [stale_state_test_col] failed due to (404) > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected > mime type application/octet-stream but got text/html. >[junit4] 2> >[junit4] 2> content="text/html;charset=ISO-8859-1"/> >[junit4] 2> Error 404 >[junit4] 2> >[junit4] 2> >[junit4] 2> HTTP ERROR: 404 >[junit4] 2> Problem accessing > /solr/stale_state_test_col_shard1_replica_n1/update. Reason: >[junit4] 2> Can not find: > /solr/stale_state_test_col_shard1_replica_n1/update >[junit4] 2> http://eclipse.org/jetty;>Powered by Jetty:// > 9.3.20.v20170531 >[junit4] 2> >[junit4] 2> >[junit4] 2> , retry? 
0 >[junit4] 2> 14851 INFO > (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) > [] o.a.s.SolrTestCaseJ4 ###Ending testRetryUpdatesWhenClusterStateIsStale >[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=CloudSolrClientTest > -Dtests.method=testRetryUpdatesWhenClusterStateIsStale > -Dtests.seed=64E89FBB977E15AA -Dtests.slow=true -Dtests.locale=es-VE > -Dtests.timezone=Indian/Chagos -Dtests.asserts=true > -Dtests.file.encoding=US-ASCII >[junit4] ERROR 5.86s | > CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale <<< >[junit4]> Throwable #1: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected > mime type application/octet-stream but got text/html. >[junit4]> >[junit4]> content="text/html;charset=ISO-8859-1"/> >[junit4]> Error 404 >[junit4]> >[junit4]> >[junit4]> HTTP ERROR: 404 >[junit4]> Problem accessing > /solr/stale_state_test_col_shard1_replica_n1/update. 
Reason: >[junit4]> Can not find: > /solr/stale_state_test_col_shard1_replica_n1/update >[junit4]> http://eclipse.org/jetty;>Powered by Jetty:// > 9.3.20.v20170531 >[junit4]> >[junit4]> >[junit4]> at > __randomizedtesting.SeedInfo.seed([64E89FBB977E15AA:D0D9075374976386]:0) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) >[junit4]> at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) >[junit4]> at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) >[junit4]> at > org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:559) >[junit4]> at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1016) >[junit4]> at > org.apache.solr.
[jira] [Comment Edited] (SOLR-11638) CloudSolrClientTest periodic failures
[ https://issues.apache.org/jira/browse/SOLR-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249391#comment-16249391 ] Shalin Shekhar Mangar edited comment on SOLR-11638 at 11/13/17 11:05 AM: - The method you have used in the patch called {{.sendDirectUpdatesToAnyShardReplica()}} sets the internal flag {{directUpdatesToLeadersOnly=false}}. This is why I thought I should check with you again. Although reading the test again, I see that it creates just 1 replica in total so it probably doesn't matter whether we set directUpdatesToLeadersOnly to true or false. was (Author: shalinmangar): The method you have used in the patch called {{.sendDirectUpdatesToAnyShardReplica()}} sets the internal flag {{directUpdatesToLeadersOnly=false}}. This is why I thought I should check with you again. > CloudSolrClientTest periodic failures > - > > Key: SOLR-11638 > URL: https://issues.apache.org/jira/browse/SOLR-11638 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ, Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski > Assignee: Shalin Shekhar Mangar > Attachments: SOLR-11638.patch > > > The test-randomization recently-added as a part of SOLR-11507 has caused > {{CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale}} to fail > semi-regularly on master. The test only succeeds for me on 3 out of 10 test > runs.
[jira] [Commented] (SOLR-11638) CloudSolrClientTest periodic failures
[ https://issues.apache.org/jira/browse/SOLR-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249391#comment-16249391 ] Shalin Shekhar Mangar commented on SOLR-11638: -- The method you have used in the patch called {{.sendDirectUpdatesToAnyShardReplica()}} sets the internal flag {{directUpdatesToLeadersOnly=false}}. This is why I thought I should check with you again. > CloudSolrClientTest periodic failures > - > > Key: SOLR-11638 > URL: https://issues.apache.org/jira/browse/SOLR-11638 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ, Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski > Assignee: Shalin Shekhar Mangar > Attachments: SOLR-11638.patch > > > The test-randomization recently-added as a part of SOLR-11507 has caused > {{CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale}} to fail > semi-regularly on master. The test only succeeds for me on 3 out of 10 test > runs.
[jira] [Commented] (SOLR-11638) CloudSolrClientTest periodic failures
[ https://issues.apache.org/jira/browse/SOLR-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249350#comment-16249350 ] Shalin Shekhar Mangar commented on SOLR-11638: -- Hi [~gerlowskija], in the description you said that the test depends on {{directUpdatesToLeadersOnly}} but you used {{sendDirectUpdatesToAnyShardReplica}} in your patch. Did you mean to use {{sendDirectUpdatesToShardLeadersOnly}} instead? > CloudSolrClientTest periodic failures > - > > Key: SOLR-11638 > URL: https://issues.apache.org/jira/browse/SOLR-11638 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ, Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski > Assignee: Shalin Shekhar Mangar > Attachments: SOLR-11638.patch > > > The test-randomization recently-added as a part of SOLR-11507 has caused > {{CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale}} to fail > semi-regularly on master. The test only succeeds for me on 3 out of 10 test > runs.
[jira] [Assigned] (SOLR-11638) CloudSolrClientTest periodic failures
[ https://issues.apache.org/jira/browse/SOLR-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-11638: Assignee: Shalin Shekhar Mangar > CloudSolrClientTest periodic failures > - > > Key: SOLR-11638 > URL: https://issues.apache.org/jira/browse/SOLR-11638 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ, Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski > Assignee: Shalin Shekhar Mangar > Attachments: SOLR-11638.patch > > > The test-randomization recently-added as a part of SOLR-11507 has caused > {{CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale}} to fail > semi-regularly on master. The test only succeeds for me on 3 out of 10 test > runs. The test fails with the message: > {code} >[junit4] 2> 14848 ERROR > (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) > [] o.a.s.c.s.i.CloudSolrClient Request to collection > [stale_state_test_col] failed due to (404) > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected > mime type application/octet-stream but got text/html. >[junit4] 2> >[junit4] 2> content="text/html;charset=ISO-8859-1"/> >[junit4] 2> Error 404 >[junit4] 2> >[junit4] 2> >[junit4] 2> HTTP ERROR: 404 >[junit4] 2> Problem accessing > /solr/stale_state_test_col_shard1_replica_n1/update. Reason: >[junit4] 2> Can not find: > /solr/stale_state_test_col_shard1_replica_n1/update >[junit4] 2> http://eclipse.org/jetty;>Powered by Jetty:// > 9.3.20.v20170531 >[junit4] 2> >[junit4] 2> >[junit4] 2> , retry? 
0 >[junit4] 2> 14851 INFO > (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) > [] o.a.s.SolrTestCaseJ4 ###Ending testRetryUpdatesWhenClusterStateIsStale >[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=CloudSolrClientTest > -Dtests.method=testRetryUpdatesWhenClusterStateIsStale > -Dtests.seed=64E89FBB977E15AA -Dtests.slow=true -Dtests.locale=es-VE > -Dtests.timezone=Indian/Chagos -Dtests.asserts=true > -Dtests.file.encoding=US-ASCII >[junit4] ERROR 5.86s | > CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale <<< >[junit4]> Throwable #1: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected > mime type application/octet-stream but got text/html. >[junit4]> >[junit4]> content="text/html;charset=ISO-8859-1"/> >[junit4]> Error 404 >[junit4]> >[junit4]> >[junit4]> HTTP ERROR: 404 >[junit4]> Problem accessing > /solr/stale_state_test_col_shard1_replica_n1/update. 
Reason: >[junit4]> Can not find: > /solr/stale_state_test_col_shard1_replica_n1/update >[junit4]> http://eclipse.org/jetty;>Powered by Jetty:// > 9.3.20.v20170531 >[junit4]> >[junit4]> >[junit4]> at > __randomizedtesting.SeedInfo.seed([64E89FBB977E15AA:D0D9075374976386]:0) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) >[junit4]> at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) >[junit4]> at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) >[junit4]> at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) >[junit4]> at > org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:559) >[junit4]> at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1016) >[junit4]> at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883) >[junit4]> at > org.apache.solr.client.solrj.imp
[jira] [Assigned] (SOLR-11202) Implement a set-property command for AutoScaling API
[ https://issues.apache.org/jira/browse/SOLR-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-11202: Assignee: Shalin Shekhar Mangar > Implement a set-property command for AutoScaling API > > > Key: SOLR-11202 > URL: https://issues.apache.org/jira/browse/SOLR-11202 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > > The Autoscaling API should support a {{set-property}} command so that > properties specific to autoscaling can be set/unset. Examples of such > properties are: > # The scheduled delay between trigger invocations (currently defaults to 1s) > # Min time between actions (currently defaults to 5s) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-11165) Write documentation for autoscaling APIs, triggers, actions, listeners for Solr 7.1
[ https://issues.apache.org/jira/browse/SOLR-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11165. -- Resolution: Fixed Assignee: Shalin Shekhar Mangar Fix Version/s: (was: 7.2) 7.1 > Write documentation for autoscaling APIs, triggers, actions, listeners for > Solr 7.1 > --- > > Key: SOLR-11165 > URL: https://issues.apache.org/jira/browse/SOLR-11165 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, documentation > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: master (8.0), 7.1 > > > This issue is to document all the new features and changes in autoscaling for > Solr 7.1 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-11472) Leader election bug
[ https://issues.apache.org/jira/browse/SOLR-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11472. -- Resolution: Duplicate The root cause has been fixed in SOLR-11448. I found 1 similar test failure on Oct 23 which was after SOLR-11448 was committed but the logs no longer exist and I haven't seen anything since. So I'll close this and re-open if necessary. > Leader election bug > --- > > Key: SOLR-11472 > URL: https://issues.apache.org/jira/browse/SOLR-11472 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.1, master (8.0) >Reporter: Andrzej Bialecki > Assignee: Shalin Shekhar Mangar > Attachments: > Console_output_of_AutoscalingHistoryHandlerTest_failure.txt > > > SOLR-11407 uncovered a bug in leader election, where the same failing node is > retried indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-11628) Add documentation of maxRamMB for filter cache and query result cache
[ https://issues.apache.org/jira/browse/SOLR-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11628. -- Resolution: Fixed > Add documentation of maxRamMB for filter cache and query result cache > - > > Key: SOLR-11628 > URL: https://issues.apache.org/jira/browse/SOLR-11628 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 7.2, master (8.0) > > > The query settings in solrconfig page uses LRUCache in the filter cache > example. But by default the FastLRUCache is used. Also, SOLR-9633 added > support for maxRamMB for filter cache which is not documented at all. > https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.1/javadoc/query-settings-in-solrconfig.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11628) Add documentation of maxRamMB for filter cache and query result cache
Shalin Shekhar Mangar created SOLR-11628: Summary: Add documentation of maxRamMB for filter cache and query result cache Key: SOLR-11628 URL: https://issues.apache.org/jira/browse/SOLR-11628 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Components: documentation Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 7.2, master (8.0) The query settings in solrconfig page uses LRUCache in the filter cache example. But by default the FastLRUCache is used. Also, SOLR-9633 added support for maxRamMB for filter cache which is not documented at all. https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.1/javadoc/query-settings-in-solrconfig.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11627) Weird AddReplicaTest failure on jenkins
[ https://issues.apache.org/jira/browse/SOLR-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11627: - Attachment: addreplicatest-logs.txt Full test log is attached. > Weird AddReplicaTest failure on jenkins > --- > > Key: SOLR-11627 > URL: https://issues.apache.org/jira/browse/SOLR-11627 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: addreplicatest-logs.txt > > > I was going through some recent Jenkins failures in autoscaling tests > and I found this one: > https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20871 > {code} > FAILED: org.apache.solr.cloud.AddReplicaTest.test > Error Message: > Stack Trace: > java.lang.AssertionError > at > __randomizedtesting.SeedInfo.seed([77B3A9DC50455D4A:FFE79606FEB930B2]:0) > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertTrue(Assert.java:54) > at org.apache.solr.cloud.AddReplicaTest.test(AddReplicaTest.java:103) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > {code} > I'm attaching the logs. It looks like a new replica published itself as > recovering but the overseer never processed it. The leader keeps waiting in > prep recovery until the test times out.
[jira] [Created] (SOLR-11627) Weird AddReplicaTest failure on jenkins
Shalin Shekhar Mangar created SOLR-11627: Summary: Weird AddReplicaTest failure on jenkins Key: SOLR-11627 URL: https://issues.apache.org/jira/browse/SOLR-11627 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Shalin Shekhar Mangar Fix For: 7.2, master (8.0) I was going through some recent Jenkins failures in autoscaling tests and I found this one: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20871 {code} FAILED: org.apache.solr.cloud.AddReplicaTest.test Error Message: Stack Trace: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([77B3A9DC50455D4A:FFE79606FEB930B2]:0) at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.solr.cloud.AddReplicaTest.test(AddReplicaTest.java:103) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code} I'm attaching the logs. It looks like a new replica published itself as recovering but the overseer never processed it. The leader keeps waiting in prep recovery until the test times out.
[jira] [Commented] (SOLR-11472) Leader election bug
[ https://issues.apache.org/jira/browse/SOLR-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243764#comment-16243764 ] Shalin Shekhar Mangar commented on SOLR-11472: --
Here's the sequence of events:
{code}
core_node3 is leader for .system collection
Test starts a new node at port 50071
Node Added Trigger fires and a plan is computed.
action=MOVEREPLICA=.system=127.0.0.1:50071_solr=core_node3 is processed first and core_node8 is added on port 50071
but before it recovers fully, the leader node core_node3 is unloaded
core_node6 becomes the leader and asks core_node8 to recover
action=MOVEREPLICA=.system=127.0.0.1:50071_solr=core_node6
now core_node6 is to be moved and core_node10 is added on port 50071
but before it can recover, core_node6 is also unloaded
system_shard1_replica_n2 on port 49937 becomes the leader and asks core_node8 and core_node10 to sync with it
but before they can recover the test stops node 49937.
The NodeLostTrigger fires and tries to create a new replica
But leader election cannot happen because no nodes have any data and/or none of them were active before.
{code}
The crux of the issue is that move replica unloaded the leader before the newly added replica became active. Actually, Andrzej has already fixed this problem in SOLR-11448. The leader election issue seen in these logs is a known problem in SolrCloud. Mark Miller created SOLR-7065 to address the gridlock of leader election in such cases. I'll audit jenkins again to see if this test has failed since SOLR-11448 was committed. If not, then I'll close this issue.
> Leader election bug > --- > > Key: SOLR-11472 > URL: https://issues.apache.org/jira/browse/SOLR-11472 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) >Affects Versions: 7.1, master (8.0) >Reporter: Andrzej Bialecki > Assignee: Shalin Shekhar Mangar > Attachments: > Console_output_of_AutoscalingHistoryHandlerTest_failure.txt > > > SOLR-11407 uncovered a bug in leader election, where the same failing node is > retried indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
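The ordering problem described in the SOLR-11472 comment above (the source replica is unloaded before the newly added replica is active) can be illustrated with a toy model. This is only a sketch under stated assumptions: ShardModel, finishMove and the two-state enum are hypothetical names, no Solr API is used, and it is not meant to represent how SOLR-11448 actually fixed the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy shard model with hypothetical names; it does not use any Solr API. */
final class ShardModel {
    enum State { RECOVERING, ACTIVE }

    // Replica name -> current state, in insertion order.
    private final Map<String, State> replicas = new LinkedHashMap<>();

    void add(String name, State state) { replicas.put(name, state); }

    // Only changes the state if the replica is already known.
    void markActive(String name) { replicas.replace(name, State.ACTIVE); }

    void unload(String name) { replicas.remove(name); }

    boolean hasActiveReplica() { return replicas.containsValue(State.ACTIVE); }

    boolean isActive(String name) { return replicas.get(name) == State.ACTIVE; }

    /**
     * Completes a move by unloading the source only once the target is ACTIVE.
     * Returns false and leaves the source in place if the target has not
     * recovered yet, so the shard never loses its last active replica here.
     */
    boolean finishMove(String source, String target) {
        if (!isActive(target)) {
            return false; // unloading now could leave no replica able to lead
        }
        unload(source);
        return true;
    }
}
```

In this model the move is simply refused while the target is still recovering; a real system would more likely wait or retry rather than give up.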
[jira] [Resolved] (SOLR-11621) TriggerIntegrationTest failures on jenkins
[ https://issues.apache.org/jira/browse/SOLR-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11621. -- Resolution: Fixed Assignee: Shalin Shekhar Mangar > TriggerIntegrationTest failures on jenkins > -- > > Key: SOLR-11621 > URL: https://issues.apache.org/jira/browse/SOLR-11621 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, Tests >Affects Versions: 7.1 > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11621.patch > > > I have seen a few TriggerIntegrationTest failures due to timing issues of > triggers firing before waitFor period is over. We added a small delta to fix > this problem with other trigger tests but TestTriggerIntegration slipped > through the cracks. > An example of such failure is at > https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/768 > {code} > [junit4] 2> 3025388 DEBUG > (AutoscalingActionExecutor-5242-thread-1-processing-n:127.0.0.1:34271_solr) > [n:127.0.0.1:34271_solr] o.a.s.c.a.TriggerIntegrationTest --throwable >[junit4] 2> java.lang.AssertionError: node_added_restore_trigger was > fired before the configured waitFor period >[junit4] 2> at org.junit.Assert.fail(Assert.java:93) >[junit4] 2> at > org.apache.solr.cloud.autoscaling.TriggerIntegrationTest$TestTriggerAction.process(TriggerIntegrationTest.java:591) >[junit4] 2> at > org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$add$2(ScheduledTriggers.java:248) >[junit4] 2> at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11621) TriggerIntegrationTest failures on jenkins
[ https://issues.apache.org/jira/browse/SOLR-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11621: - Attachment: SOLR-11621.patch Patch which adds a small 5 nanosecond delta in time elapsed calculations to avoid spurious failures. > TriggerIntegrationTest failures on jenkins > -- > > Key: SOLR-11621 > URL: https://issues.apache.org/jira/browse/SOLR-11621 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, Tests >Affects Versions: 7.1 > Reporter: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-11621.patch > > > I have seen a few TriggerIntegrationTest failures due to timing issues of > triggers firing before waitFor period is over. We added a small delta to fix > this problem with other trigger tests but TestTriggerIntegration slipped > through the cracks. > An example of such failure is at > https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/768 > {code} > [junit4] 2> 3025388 DEBUG > (AutoscalingActionExecutor-5242-thread-1-processing-n:127.0.0.1:34271_solr) > [n:127.0.0.1:34271_solr] o.a.s.c.a.TriggerIntegrationTest --throwable >[junit4] 2> java.lang.AssertionError: node_added_restore_trigger was > fired before the configured waitFor period >[junit4] 2> at org.junit.Assert.fail(Assert.java:93) >[junit4] 2> at > org.apache.solr.cloud.autoscaling.TriggerIntegrationTest$TestTriggerAction.process(TriggerIntegrationTest.java:591) >[junit4] 2> at > org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$add$2(ScheduledTriggers.java:248) >[junit4] 2> at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
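The small delta mentioned in the SOLR-11621 patch comment above can be illustrated with a short sketch. It assumes the test compares the elapsed nanoseconds between the event and the trigger firing against the configured waitFor period, and tolerates a shortfall of up to 5 ns. The names WaitForCheck, DELTA_NANOS and firedAfterWaitFor are hypothetical, and the actual patch may apply the delta differently.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not Solr code: checks a waitFor period with a small tolerance. */
final class WaitForCheck {
    /** Tolerance in nanoseconds; 5 ns mirrors the delta mentioned in the patch comment. */
    static final long DELTA_NANOS = 5L;

    /**
     * Returns true if at least waitForSeconds, minus the tolerance, elapsed
     * between the event time and the time the trigger fired. Both timestamps
     * are expected to come from the same monotonic clock, e.g. System.nanoTime().
     */
    static boolean firedAfterWaitFor(long eventNanos, long firedNanos, long waitForSeconds) {
        long elapsed = firedNanos - eventNanos;
        long required = TimeUnit.SECONDS.toNanos(waitForSeconds);
        return elapsed + DELTA_NANOS >= required;
    }
}
```

A test action could then fail only when firedAfterWaitFor returns false, which avoids flagging a trigger that fired a few nanoseconds short of the period.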
[jira] [Created] (SOLR-11621) TriggerIntegrationTest failures on jenkins
Shalin Shekhar Mangar created SOLR-11621: Summary: TriggerIntegrationTest failures on jenkins Key: SOLR-11621 URL: https://issues.apache.org/jira/browse/SOLR-11621 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling, Tests Affects Versions: 7.1 Reporter: Shalin Shekhar Mangar Fix For: 7.2, master (8.0) I have seen a few TriggerIntegrationTest failures due to timing issues of triggers firing before waitFor period is over. We added a small delta to fix this problem with other trigger tests but TestTriggerIntegration slipped through the cracks. An example of such failure is at https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/768 {code} [junit4] 2> 3025388 DEBUG (AutoscalingActionExecutor-5242-thread-1-processing-n:127.0.0.1:34271_solr) [n:127.0.0.1:34271_solr] o.a.s.c.a.TriggerIntegrationTest --throwable [junit4] 2> java.lang.AssertionError: node_added_restore_trigger was fired before the configured waitFor period [junit4] 2>at org.junit.Assert.fail(Assert.java:93) [junit4] 2>at org.apache.solr.cloud.autoscaling.TriggerIntegrationTest$TestTriggerAction.process(TriggerIntegrationTest.java:591) [junit4] 2>at org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$add$2(ScheduledTriggers.java:248) [junit4] 2>at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11619) V2 API doesn't route to other nodes when the local node can't satisfy the request
[ https://issues.apache.org/jira/browse/SOLR-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243470#comment-16243470 ] Shalin Shekhar Mangar commented on SOLR-11619: -- This sounds very similar to SOLR-11130 which was supposed to have been fixed for 7.0. > V2 API doesn't route to other nodes when the local node can't satisfy the > request > - > > Key: SOLR-11619 > URL: https://issues.apache.org/jira/browse/SOLR-11619 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley > Fix For: 7.2 > > Attachments: SOLR_11619_V2_action_REMOTEQUERY_bug.patch > > > There is some code in V2HttpCall to handle the case when the local node > doesn't have a core to satisfy the request. But AFAICT it's broken, which I > discovered while writing tests in SOLR-11299. The code will trigger an NPE > since it seems to get to an action state of PROCESS yet there is no core or > config. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-9440. - Resolution: Fixed > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-9440.patch, SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-9440: Fix Version/s: (was: 6.7) (was: 7.0) 7.2 > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.2, master (8.0) > > Attachments: SOLR-9440.patch, SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11472) Leader election bug
[ https://issues.apache.org/jira/browse/SOLR-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241677#comment-16241677 ] Shalin Shekhar Mangar commented on SOLR-11472: -- I'm looking at this, Varun. I'll post a summary once I have it pinned down. > Leader election bug > --- > > Key: SOLR-11472 > URL: https://issues.apache.org/jira/browse/SOLR-11472 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.1, master (8.0) >Reporter: Andrzej Bialecki > Assignee: Shalin Shekhar Mangar > Attachments: > Console_output_of_AutoscalingHistoryHandlerTest_failure.txt > > > SOLR-11407 uncovered a bug in leader election, where the same failing node is > retried indefinitely.
[jira] [Updated] (SOLR-11591) AutoAddReplicasIntegrationTest failures on jenkins
[ https://issues.apache.org/jira/browse/SOLR-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11591: - Attachment: 13530-logs.txt 13525-logs.txt Attaching log output from the test runs. > AutoAddReplicasIntegrationTest failures on jenkins > -- > > Key: SOLR-11591 > URL: https://issues.apache.org/jira/browse/SOLR-11591 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, Tests > Reporter: Shalin Shekhar Mangar >Priority: Minor > Fix For: 7.2, master (8.0) > > Attachments: 13525-logs.txt, 13530-logs.txt > > > Jenkins shows rare test failures for this one: > # http://jenkins.sarowe.net/view/Enabled/job/Lucene-Solr-tests-master/13525/ > # http://jenkins.sarowe.net/view/Enabled/job/Lucene-Solr-tests-master/13530/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11591) AutoAddReplicasIntegrationTest failures on jenkins
Shalin Shekhar Mangar created SOLR-11591: Summary: AutoAddReplicasIntegrationTest failures on jenkins Key: SOLR-11591 URL: https://issues.apache.org/jira/browse/SOLR-11591 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling, Tests Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 7.2, master (8.0) Jenkins shows rare test failures for this one: # http://jenkins.sarowe.net/view/Enabled/job/Lucene-Solr-tests-master/13525/ # http://jenkins.sarowe.net/view/Enabled/job/Lucene-Solr-tests-master/13530/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233639#comment-16233639 ] Shalin Shekhar Mangar commented on SOLR-9440: - {quote} I'm wondering if we need some synchronization here, with registerCollectionStateWatcher and also to make sure the watchedCollectionStates.remove and lazyCollectionStates.put is done atomically {quote} Hmm good point. I don't think we need synchronization here but we need to ensure that the result, once visible, is consistent. So this is a trick that the ZkStateReader uses -- It adds all collections to the lazyCollectionStates map and never removes them unless the collection is deleted. But it gives priority to watchedCollectionStates over the lazy ones. This ensures that during constructState, the collection is always available in the cluster state even if it is removed from the watchedCollectionStates. Actually the lazyCollectionStates.put is not necessary but it is there just for safety. bq. Is this only for testing purposes? Oops, yes, thanks for catching. I'll revert it. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 6.7, 7.0, master (8.0) > > Attachments: SOLR-9440.patch, SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. 
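The lookup order described in the SOLR-9440 comment above (every collection keeps a lazy entry until it is deleted, and a watched state takes priority over the lazy one) can be modelled roughly as follows. This is a toy sketch, not the real ZkStateReader: CollectionStateView and its methods are hypothetical names, and collection states are reduced to plain strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Toy model of the lookup order described above; not the real ZkStateReader. */
final class CollectionStateView {
    // Collections that are actively watched; values are kept up to date by watchers.
    private final Map<String, String> watched = new ConcurrentHashMap<>();
    // Every known collection; the supplier fetches the state on demand.
    private final Map<String, Supplier<String>> lazy = new ConcurrentHashMap<>();

    void addLazy(String name, Supplier<String> fetcher) { lazy.put(name, fetcher); }

    void watch(String name, String state) { watched.put(name, state); }

    // Dropping a watch keeps the lazy entry, so the collection stays visible.
    void unwatch(String name) { watched.remove(name); }

    // Only deleting the collection removes it from both maps.
    void delete(String name) {
        watched.remove(name);
        lazy.remove(name);
    }

    /** Watched state wins; otherwise fall back to the lazy entry, or null if unknown. */
    String lookup(String name) {
        String state = watched.get(name);
        if (state != null) {
            return state;
        }
        Supplier<String> fetcher = lazy.get(name);
        return fetcher != null ? fetcher.get() : null;
    }
}
```

Because the lazy entry is never removed while the collection exists, a reader that drops its watch still sees the collection in this model, just through the on-demand path.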
[jira] [Assigned] (SOLR-11472) Leader election bug
[ https://issues.apache.org/jira/browse/SOLR-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-11472: Assignee: Shalin Shekhar Mangar > Leader election bug > --- > > Key: SOLR-11472 > URL: https://issues.apache.org/jira/browse/SOLR-11472 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.1, master (8.0) >Reporter: Andrzej Bialecki > Assignee: Shalin Shekhar Mangar > Attachments: > Console_output_of_AutoscalingHistoryHandlerTest_failure.txt > > > SOLR-11407 uncovered a bug in leader election, where the same failing node is > retried indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226696#comment-16226696 ] Shalin Shekhar Mangar commented on SOLR-9440: - I'll observe the impact of this change on jenkins for a couple of days before porting it to branch_7x. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > Attachments: SOLR-9440.patch, SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-9440: Attachment: SOLR-9440.patch I added a test in ZkStateReaderTest which fails 100% of the time without the fix. This is ready. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > Attachments: SOLR-9440.patch, SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-9440: Attachment: SOLR-9440.patch This patch changes removeCollectionStateWatcher to handle evictions the same way as the unregisterCore method. It also removes all the workarounds introduced because of this bug in various places. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > Attachments: SOLR-9440.patch > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226416#comment-16226416 ] Shalin Shekhar Mangar commented on SOLR-9440: - Also, the reason we saw this manifest in tests is that the SolrCloudTestCase.waitForState method eventually calls the ZkStateReader.waitForState method, which registers and then removes a collection state watcher. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226414#comment-16226414 ] Shalin Shekhar Mangar commented on SOLR-9440: - I found the root cause. The bug is in {{ZkStateReader.removeCollectionStateWatcher}}, which only removes the collection from the watch list, i.e. the collectionWatches map. Since ZK does not have a way to remove a watch, the watch object is fired again when the collection changes. Now, there is code in StateWatcher's refreshAndWatch method which is supposed to evict the cached DocCollection object from watchedCollectionStates if the collection is no longer in the collectionWatches map. However, that code never gets executed because the StateWatcher's process method returns early if the collection is not in the collectionWatches map. So a cached DocCollection reference that is neither lazy nor watched is left behind. > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
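The root cause described in the SOLR-9440 comment above can be illustrated with a toy model. The sketch below is not Solr's code: `ToyStateReader`, its methods, and the `fixed` flag are hypothetical stand-ins that only borrow the map names from the comment. It assumes the fix evicts the cached state at the moment the last watcher is removed, which is how the patch comment describes it in broad terms.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale-cache bug; NOT the real ZkStateReader. */
final class ToyStateReader {
  // Number of registered watchers per collection (stand-in for collectionWatches).
  private final Map<String, Integer> collectionWatches = new HashMap<>();
  // Cached state per collection (stand-in for watchedCollectionStates).
  private final Map<String, String> watchedCollectionStates = new HashMap<>();
  // Hypothetical switch: true evicts the cache entry when the last watcher goes away.
  private final boolean fixed;

  ToyStateReader(boolean fixed) {
    this.fixed = fixed;
  }

  void registerWatcher(String collection, String currentState) {
    collectionWatches.merge(collection, 1, Integer::sum);
    watchedCollectionStates.put(collection, currentState);
  }

  void removeWatcher(String collection) {
    // Decrement the watcher count; drop the entry when it reaches zero.
    Integer remaining =
        collectionWatches.computeIfPresent(collection, (k, n) -> n > 1 ? n - 1 : null);
    if (remaining == null && fixed) {
      watchedCollectionStates.remove(collection);
    }
  }

  /** Simulates the ZooKeeper watch firing after the collection changed. */
  void process(String collection, String newState) {
    if (!collectionWatches.containsKey(collection)) {
      return; // early return: nothing past this point can evict the stale entry
    }
    watchedCollectionStates.put(collection, newState);
  }

  String cachedState(String collection) {
    return watchedCollectionStates.get(collection);
  }
}
```

With `fixed` set to false, registering a watcher, removing it, and then firing `process` leaves the old state in the cache, which mirrors the "neither lazy nor watched" reference from the comment. With `fixed` set to true the entry is gone after removal.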
[jira] [Assigned] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-9440: --- Assignee: Shalin Shekhar Mangar > ZkStateReader on a client can cache collection state and never refresh it > - > > Key: SOLR-9440 > URL: https://issues.apache.org/jira/browse/SOLR-9440 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 6.7, 7.0, master (8.0) > > > I saw this while writing a test case for SOLR-9438. The collection1 > collection which was in stateFormat=2 was somehow caching the > CloudSolrClient's ZkStateReader such that the returned cluster state > contained the collection state. However this collection was neither watched > nor lazy so any call to waitForRecoveriesToFinish would see stale state and > loop until timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9735) Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220368#comment-16220368 ] Shalin Shekhar Mangar commented on SOLR-9735: - A note for people following this issue -- we intend to do all further development for the sub-tasks in this issue on the master branch itself. > Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud > -- > > Key: SOLR-9735 > URL: https://issues.apache.org/jira/browse/SOLR-9735 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Anshum Gupta > Assignee: Shalin Shekhar Mangar > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > As SolrCloud is now used at fairly large scale, most users end up writing > their own cluster management tools. We should have a framework for cluster > management in Solr. > In a discussion with [~noble.paul], we outlined the following steps w.r.t. > the approach to having this implemented: > * *Basic API* calls for cluster management e.g. utilize added nodes, remove a > node etc. These calls would need explicit invocation by the users to begin > with. It would also specify the {{strategy}} to use. For instance I can have > a strategy called {{optimizeCoreCount}} which would aim for an even > number of cores on each node. The strategy could optionally take parameters as > well. > * *Metrics* and stats tracking e.g. qps, etc. These would be required for any > advanced cluster management tasks e.g. *maintain a qps of 'x'* by > *auto-adding a replica* (using a recipe) etc. We would need > collection/shard/node level views of metrics for this. > * *Recipes*: combination of multiple sequential/parallel API calls based on > rules. This would be complicated, especially as most of these would be > long-running series of tasks which would either have to be rolled back or resumed > in case of a failure. 
> * *Event based triggers* that would not require explicit cluster management > calls for end users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-11524) Create a autoscaling/suggestions API end-point
[ https://issues.apache.org/jira/browse/SOLR-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reopened SOLR-11524: -- [~noblepaul] - I don't see any commits on branch_7x. Please mark this issue as resolved again after you have backported all the commits to branch_7x. > Create a autoscaling/suggestions API end-point > -- > > Key: SOLR-11524 > URL: https://issues.apache.org/jira/browse/SOLR-11524 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Noble Paul >Assignee: Noble Paul > Fix For: 7.2 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-9) - Build # 20739 - Still Unstable!
I committed a fix. On Thu, Oct 26, 2017 at 5:18 PM, Policeman Jenkins Server wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20739/ > Java: 64bit/jdk-9 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC > --illegal-access=deny > > 1 tests failed. > FAILED: org.apache.solr.cloud.autoscaling.AutoScalingHandlerTest.testReadApi > > Error Message: > No collection param specified on request and no default collection has been > set: [] > > Stack Trace: > org.apache.solr.common.SolrException: No collection param specified on > request and no default collection has been set: [] > at > __randomizedtesting.SeedInfo.seed([F0012AA5BECCA882:A728D110653E4A99]:0) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1032) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:867) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:800) > at > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) > at > org.apache.solr.cloud.autoscaling.AutoScalingHandlerTest.testReadApi(AutoScalingHandlerTest.java:708) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) > at > 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at >
CVE-2016-6809: Java code execution for serialized objects embedded in MATLAB files parsed by Apache Solr using Apache Tika
CVE-2016-6809: Java code execution for serialized objects embedded in MATLAB files parsed by Apache Solr using Tika Severity: Important Vendor: The Apache Software Foundation Versions Affected: Solr 5.0.0 to 5.5.4 Solr 6.0.0 to 6.6.1 Solr 7.0.0 to 7.0.1 Description: Apache Solr uses Apache Tika for parsing binary file types such as doc, xls, pdf etc. Apache Tika wraps the jmatio parser (https://github.com/gradusnikov/jmatio) to handle MATLAB files. The parser uses native deserialization on serialized Java objects embedded in MATLAB files. A malicious user could inject arbitrary code into a MATLAB file that would be executed when the object is deserialized. This vulnerability was originally described at http://mail-archives.apache.org/mod_mbox/tika-user/201611.mbox/%3C2125912914.1308916.1478787314903%40mail.yahoo.com%3E Mitigation: Users are advised to upgrade to either Solr 5.5.5 or Solr 6.6.2 or Solr 7.1.0 releases which have fixed this vulnerability. Solr 5.5.5 upgrades the jmatio parser to v1.2 and disables the Java deserialisation support to protect against this vulnerability. Solr 6.6.2 and Solr 7.1.0 have upgraded the bundled Tika to v1.16. Once upgrade is complete, no other steps are required. References: https://issues.apache.org/jira/browse/SOLR-11486 https://issues.apache.org/jira/browse/SOLR-10335 https://wiki.apache.org/solr/SolrSecurity -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
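The advisory above attributes the problem to native deserialization of objects embedded in untrusted input. One common general-purpose defence, sketched below, is to check every class named in the stream against an allow-list before it is resolved. This is only an illustration of that technique using the JDK's `ObjectInputStream.resolveClass` hook; it is not the change made in jmatio, Tika, or Solr, and the class name `AllowListObjectInputStream` and its helper methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;
import java.util.Set;

/** Hypothetical helper: refuses to resolve any class that is not on the allow-list. */
final class AllowListObjectInputStream extends ObjectInputStream {
  private final Set<String> allowed;

  AllowListObjectInputStream(InputStream in, Set<String> allowed) throws IOException {
    super(in);
    this.allowed = allowed;
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    // Called for each class descriptor in the stream, before any object is built.
    if (!allowed.contains(desc.getName())) {
      throw new InvalidClassException(desc.getName(), "class not on allow-list");
    }
    return super.resolveClass(desc);
  }

  /** Serializes a value with plain Java serialization (demo input only). */
  static byte[] serialize(Object value) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns true if the stream only names allow-listed classes, false if it was rejected. */
  static boolean isAccepted(byte[] data, Set<String> allowed) {
    try (ObjectInputStream in =
        new AllowListObjectInputStream(new ByteArrayInputStream(data), allowed)) {
      in.readObject();
      return true;
    } catch (InvalidClassException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Note that strings are written without a class descriptor, so the hook is not consulted for them; the check matters for every other object type in the stream.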
Re: Congratulations to the new Lucene/Solr PMC Chair, Adrien Grand
Congratulations Adrien! On Thu, Oct 19, 2017 at 12:49 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote: > Once a year the Lucene PMC rotates the PMC chair and Apache Vice President > position. > This year we have nominated and elected Adrien Grand as the chair and today > the board just approved it, so now it's official. > > Congratulations Adrien! > Regards, > Tommaso -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[ANNOUNCE] [SECURITY] CVE-2017-12629: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)
CVE-2017-12629: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE) Severity: Critical Vendor: The Apache Software Foundation Versions Affected: Solr 5.5.0 to 5.5.4 Solr 6.0.0 to 6.6.1 Solr 7.0.0 to 7.0.1 Description: The details of this vulnerability were reported on public mailing lists. See https://s.apache.org/FJDl The first vulnerability relates to XML external entity expansion in the XML Query Parser which is available, by default, for any query request with parameters deftype=xmlparser. This can be exploited to upload malicious data to the /upload request handler. It can also be used as Blind XXE using ftp wrapper in order to read arbitrary local files from the solr server. The second vulnerability relates to remote code execution using the RunExecutableListener available on all affected versions of Solr. At the time of the above report, this was a 0-day vulnerability with a working exploit affecting the versions of Solr mentioned in the previous section. However, mitigation steps were announced to protect Solr users the same day. See https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list Mitigation: Users are advised to upgrade to either Solr 6.6.2 or Solr 7.1.0 releases both of which address the two vulnerabilities. Once upgrade is complete, no other steps are required. If users are unable to upgrade to Solr 6.6.2 or Solr 7.1.0 then they are advised to restart their Solr instances with the system parameter `-Ddisable.configEdit=true`. This will disallow any changes to be made to your configurations via the Config API. This is a key factor in this vulnerability, since it allows GET requests to add the RunExecutableListener to your config. Users are also advised to re-map the XML Query Parser to another parser to mitigate the XXE vulnerability. 
For example, adding the following to the solrconfig.xml file re-maps the xmlparser to the edismax parser: `<queryParser name="xmlparser" class="solr.ExtendedDismaxQParserPlugin" />` Credit: Michael Stepankin (JPMorgan Chase) Olga Barinova (Gotham Digital Science) References: https://issues.apache.org/jira/browse/SOLR-11482 https://issues.apache.org/jira/browse/SOLR-11477 https://wiki.apache.org/solr/SolrSecurity -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
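The XXE part of the advisory above comes down to an XML parser that honours DOCTYPE declarations and external entities in untrusted input. As a rough illustration of the general defence in plain JAXP, the sketch below configures a `DocumentBuilderFactory` to reject any document that carries a DOCTYPE. It is an assumption-laden example, not the patch applied in Solr 6.6.2 or 7.1.0; `SafeXml` and its methods are hypothetical names, and the feature URI is the Xerces `disallow-doctype-decl` feature supported by the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

/** Hypothetical helper showing a DOCTYPE-rejecting JAXP configuration. */
final class SafeXml {
  static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Reject any document containing a DOCTYPE, which rules out entity declarations.
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    factory.setXIncludeAware(false);
    factory.setExpandEntityReferences(false);
    return factory;
  }

  /** Returns true if the hardened parser accepts the document, false if it rejects it. */
  static boolean parses(String xml) {
    try {
      DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
      builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException e) {
      return false; // malformed input or a forbidden DOCTYPE
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A plain document such as `<a/>` is accepted, while a document that declares an external entity in a DOCTYPE is rejected before the entity can be resolved.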
Re: Release 5.5.5
+1 On Tue, Oct 17, 2017 at 11:08 PM, Steve Rowe <sar...@gmail.com> wrote: > I think we should release a security-focused 5.5.5. I volunteer to be the > release manager. > > I’ll make an RC later today after I backport issues. > > -- > Steve > www.lucidworks.com > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4960) Require minimum ivy version
[ https://issues.apache.org/jira/browse/LUCENE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated LUCENE-4960: -- Fix Version/s: (was: 7.1) 7.2 > Require minimum ivy version > --- > > Key: LUCENE-4960 > URL: https://issues.apache.org/jira/browse/LUCENE-4960 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Affects Versions: 4.2.1 >Reporter: Shawn Heisey >Assignee: Steve Rowe >Priority: Minor > Fix For: master (8.0), 7.2 > > Attachments: LUCENE-4960.patch > > > Someone on solr-user ran into a problem while trying to run 'ant idea' so > they could work on Solr in their IDE. [~steve_rowe] indicated that this is > probably due to IVY-1194, requiring an ivy jar upgrade. > The build system should check for a minimum ivy version, just like it does > with ant. The absolute minimum we require appears to be 2.2.0, but do we > want to make it 2.3.0 due to IVY-1388? > I'm not sure how to go about checking the ivy version. Checking the ant > version is easy because it's ant itself that does the checking. > There might be other component versions that should be checked too. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7935) Release .sha512 hash files with our artifacts
[ https://issues.apache.org/jira/browse/LUCENE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated LUCENE-7935: -- Fix Version/s: (was: 7.1) 7.2 > Release .sha512 hash files with our artifacts > - > > Key: LUCENE-7935 > URL: https://issues.apache.org/jira/browse/LUCENE-7935 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Jan Høydahl > Fix For: 7.2 > > Attachments: LUCENE-7935.patch > > > Currently we are only required to release {{.md5}} hashes with our artifacts, > and we also include {{.sha1}} files. It is expected that {{.sha512}} will be > required in the future (see > https://www.apache.org/dev/release-signing.html#sha1), so why not start > generating them right away? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
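For readers unfamiliar with what a `.sha512` file contains, it is essentially the SHA-512 digest of the artifact rendered as hexadecimal. The sketch below computes such a digest with the JDK's `MessageDigest`. It is only an illustration: `Sha512Hex` is a hypothetical name, the input is a byte array rather than a release artifact, and it says nothing about how the Lucene build generates its checksum files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hex-encoded SHA-512 of a byte array. */
final class Sha512Hex {
  static String sha512Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-512");
      byte[] digest = md.digest(data);
      // 64 digest bytes become 128 lowercase hex characters.
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-512 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

For a real artifact the bytes would come from the file, for example via `java.nio.file.Files.readAllBytes`, or be streamed through `MessageDigest.update` for large files.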
[jira] [Updated] (LUCENE-7964) Remove Solr fieldType XML example from Lucene AnalysisFactories JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated LUCENE-7964: -- Fix Version/s: (was: 7.1) 7.2 > Remove Solr fieldType XML example from Lucene AnalysisFactories JavaDoc > --- > > Key: LUCENE-7964 > URL: https://issues.apache.org/jira/browse/LUCENE-7964 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Reporter: Jan Høydahl >Priority: Trivial > Fix For: 7.2 > > > As proposed and discussed in this dev-list thread: > https://lists.apache.org/thread.html/9add7e4a3ad28b307dc51532a609b423982922d734064f26f8104744@%3Cdev.lucene.apache.org%3E > [~rcmuir] [~dsmiley] [~thetaphi] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-10821) Write documentation for the autoscaling APIs and policy/preferences syntax for Solr 7.0
[ https://issues.apache.org/jira/browse/SOLR-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar closed SOLR-10821. > Write documentation for the autoscaling APIs and policy/preferences syntax > for Solr 7.0 > --- > > Key: SOLR-10821 > URL: https://issues.apache.org/jira/browse/SOLR-10821 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: 7.0, master (8.0) > > > We need to document the following: > # set-policy > # set-cluster-preferences > # set-cluster-policy > # Autoscaling configuration read API > # Autoscaling diagnostics API > # policy and preference rule syntax -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-10397) Port 'autoAddReplicas' feature to the autoscaling framework and make it work with non-shared filesystems
[ https://issues.apache.org/jira/browse/SOLR-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-10397. -- Resolution: Fixed Fix Version/s: (was: 7.0) master (8.0) 7.1 > Port 'autoAddReplicas' feature to the autoscaling framework and make it work > with non-shared filesystems > > > Key: SOLR-10397 > URL: https://issues.apache.org/jira/browse/SOLR-10397 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar >Assignee: Cao Manh Dat > Labels: autoscaling > Fix For: 7.1, master (8.0) > > Attachments: SOLR-10397.1.patch, SOLR-10397.2.patch, > SOLR-10397.2.patch, SOLR-10397.2.patch, SOLR-10397.patch, > SOLR-10397_remove_nocommit.patch > > > Currently 'autoAddReplicas=true' can be specified in the Collection Create > API to automatically add replicas when a replica becomes unavailable. I > propose to move this feature to the autoscaling cluster policy rules design. > This will include the following: > * Trigger support for ‘nodeLost’ event type > * Modification of existing implementation of ‘autoAddReplicas’ to > automatically create the appropriate ‘nodeLost’ trigger. > * Any such auto-created trigger must be marked internally such that setting > ‘autoAddReplicas=false’ via the Modify Collection API should delete or > disable corresponding trigger. > * Support for non-HDFS filesystems while retaining the optimization afforded > by HDFS i.e. the replaced replica can point to the existing data dir of the > old replica. > * Deprecate/remove the feature of enabling/disabling ‘autoAddReplicas’ across > the entire cluster using cluster properties in favor of using the > suspend-trigger/resume-trigger APIs. > This will retain backward compatibility for the most part and keep a common > use-case easy to enable as well as make it available to more people (i.e. > people who don't use HDFS). 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-10821) Write documentation for the autoscaling APIs and policy/preferences syntax for Solr 7.0
[ https://issues.apache.org/jira/browse/SOLR-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-10821. -- Resolution: Fixed Fix Version/s: master (8.0) > Write documentation for the autoscaling APIs and policy/preferences syntax > for Solr 7.0 > --- > > Key: SOLR-10821 > URL: https://issues.apache.org/jira/browse/SOLR-10821 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.0 > > > We need to document the following: > # set-policy > # set-cluster-preferences > # set-cluster-policy > # Autoscaling configuration read API > # Autoscaling diagnostics API > # policy and preference rule syntax -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-10822) Concurrent execution of Policy computations should yield correct result
[ https://issues.apache.org/jira/browse/SOLR-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-10822. -- Resolution: Fixed Fix Version/s: master (8.0) > Concurrent execution of Policy computations should yield correct result > > > Key: SOLR-10822 > URL: https://issues.apache.org/jira/browse/SOLR-10822 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar >Assignee: Noble Paul > Labels: autoscaling > Fix For: 7.1, master (8.0) > > Attachments: SOLR-10822.patch > > > The policy framework is now used to find replica placements by all collection > APIs, but since these APIs can be executed concurrently, we can get wrong > placements because of concurrently running calculations. We should > synchronize just the calculation part so that they happen serially. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
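The idea in SOLR-10822 above, serializing only the placement calculation, can be sketched with a single lock around a "pick the least-loaded node and record it" step. This is a toy model under stated assumptions, not the attached patch: `SerializedPlacement`, its node counts, and the least-loaded rule are all hypothetical and stand in for the real policy computation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: placement decisions are computed and recorded under one lock. */
final class SerializedPlacement {
  private final Object lock = new Object();
  // Replica count per node; iteration order follows constructor order.
  private final Map<String, Integer> replicasPerNode = new LinkedHashMap<>();

  SerializedPlacement(String... nodes) {
    if (nodes.length == 0) {
      throw new IllegalArgumentException("at least one node is required");
    }
    for (String node : nodes) {
      replicasPerNode.put(node, 0);
    }
  }

  /** Picks the least-loaded node and records the new replica in one critical section. */
  String placeReplica() {
    synchronized (lock) {
      String best = null;
      for (Map.Entry<String, Integer> e : replicasPerNode.entrySet()) {
        if (best == null || e.getValue() < replicasPerNode.get(best)) {
          best = e.getKey();
        }
      }
      // Without the lock, two callers could both read the same counts and pick the same node.
      replicasPerNode.merge(best, 1, Integer::sum);
      return best;
    }
  }

  int count(String node) {
    synchronized (lock) {
      return replicasPerNode.get(node);
    }
  }
}
```

Because the read of the counts and the update happen inside the same critical section, concurrent callers always see each other's earlier decisions, so the counts on two nodes never drift apart by more than one.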
[ANNOUNCE] Apache Solr 7.1.0 released
17 October 2017, Apache Solr™ 7.1.0 available The Lucene PMC is pleased to announce the release of Apache Solr 7.1.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 7.1.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See http://lucene.apache.org/solr/7_1_0/changes/Changes.html for a full list of details. Solr 7.1.0 Release Highlights: * Critical Security Update: Fix for CVE-2017-12629 which is a working 0-day exploit reported on the public mailing list. See https://s.apache.org/FJDl * Auto-scaling: Solr can now move replicas automatically when a new node is added or an existing node is removed using the auto scaling policy framework introduced in 7.0 * Auto-scaling: The 'autoAddReplicas' feature which was limited to shared file systems is now available for all file systems. It has been ported to use the new autoscaling framework internally. * Auto-scaling: New set-trigger, remove-trigger, set-listener, remove-listener, suspend-trigger, resume-trigger APIs * Auto-scaling: New /autoscaling/history API to show past autoscaling actions and cluster events * New JSON based Query DSL for Solr that extends JSON Request API to also support all query parsers and their nested parameters * JSON Facet API: min/max aggregations are now supported on single-valued date fields * Lucene's Geo3D (surface of sphere & ellipsoid) is now supported on spatial RPT fields by setting spatialContextFactory="Geo3D". 
Furthermore, this is the first time Solr has out of the box support for polygons * Expanded support for statistical stream evaluators such as various distributions, rank correlations, distances and more. * Multiple other optimizations and bug fixes You are encouraged to thoroughly read the "Upgrade Notes" at http://lucene.apache.org/solr/7_1_0/changes/Changes.html or in the CHANGES.txt file accompanying the release. Solr 7.1 also includes many other new features as well as numerous optimizations and bugfixes of the corresponding Apache Lucene release. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[ANNOUNCE] Apache Lucene 7.1.0 released
17 October 2017, Apache Lucene™ 7.1.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 7.1.0.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at:
http://lucene.apache.org/core/mirrors-core-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:
https://lucene.apache.org/core/7_1_0/changes/Changes.html

Lucene 7.1.0 Release Highlights:

* New Geo3D shapes for non-spherical planet models
* Serialization and deserialization support for Geo3D
* A new CoveringQuery, whose required number of matching clauses can be defined per document
* New BengaliAnalyzer for Bengali language
* A point based range field called LatLonBoundingBox
* FloatPointNearestNeighbor, an N-dimensional FloatPoint K-nearest-neighbor search implementation
* Faster default taxonomy cache
* Support for computing facet counts for individual numeric values via LongValueFacetCounts
* Faster geo-distance queries in case of dense single-valued fields when most documents match
* Better heuristics in IndexOrDocValuesQuery
* Optimized builds for OrdinalMap (used by SortedSetDocValuesFacetCounts and others)

Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access.

--
Regards,
Shalin Shekhar Mangar.
Re: Release notes for Lucene/Solr 7.1
Thanks Alexandre. How about "New JSON based Query DSL for Solr that builds on top of the existing JSON Request API and allows queries and filters to be specified in a nested JSON structure"? Please feel free to improve.

On Tue, Oct 10, 2017 at 9:10 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> I was quite surprised to see "New JSON based Query DSL for Solr". I
> thought we already had one (unfinished?).
>
> This seems to be the reference to SOLR-11244 which does say it is an
> extension of what we have. But also, in its turn, unfinished?
>
> It would be nice for the release notes to clarify that it is not
> something completely new, but is an extension and how far it goes this
> time. Unless that's in the Ref docs already and then we could mention
> it.
>
> Regards,
>    Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 10 October 2017 at 11:21, Shalin Shekhar Mangar <sha...@apache.org> wrote:
>> Hello,
>>
>> I've created drafts for the 7.1 release notes:
>>
>> Lucene: https://wiki.apache.org/lucene-java/ReleaseNote71
>> Solr: https://wiki.apache.org/solr/ReleaseNote71
>>
>> Please review and edit as you see fit.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.
Re: [VOTE] Release Lucene/Solr 7.1.0 RC2
Owing to the serious nature of the exploit being fixed with these artifacts and that there are +1s from three PMC members and no -1s, I'm going to close the voting now. This vote has passed. Thanks to everyone who voted. On Sat, Oct 14, 2017 at 12:55 AM, Shalin Shekhar Mangar <sha...@apache.org> wrote: > Please vote for release candidate 2 for Lucene/Solr 7.1.0 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 > > You can run the smoke tester directly with this command: > > python3 -u dev-tools/scripts/smokeTestRelease.py \ > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 > > Smoke tester passed for me. > SUCCESS! [0:40:53.908967] > > Here's my +1 to release. > > -- > Regards, > Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-10335. -- Resolution: Fixed Backported to 7x, 7_1 and 6_6 branches. Thanks Tim! > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Critical > Fix For: 7.1, 6.6.2 > > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 6.6.2 Release
Hi Ishan, I've backported SOLR-10335 to branch_6_6 so this is ready to go. Thanks for volunteering for the release. I would have volunteered after releasing 7.1 but you beat me to it. On Fri, Oct 13, 2017 at 8:23 PM, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote: > Hi, > In light of [0], we need a 6.6.2 release as soon as possible. > > I'd like to volunteer to RM for this release, unless someone else wants to > do so or has an objection. > > Regards, > Ishan > > > [0] - > https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 7.1.0 RC2
Answering myself, I think we should go ahead with this RC. I've added this entry to CHANGES.txt in all branches and it will be picked up in case there needs to be a re-spin due to other reasons. On Fri, Oct 13, 2017 at 8:16 PM, Shalin Shekhar Mangar <sha...@apache.org> wrote: > I just noticed that in the hurry to create this RC, I forgot to add > SOLR-10335 to Solr's CHANGES.txt. Is that worth a re-spin? > > On Fri, Oct 13, 2017 at 7:25 PM, Shalin Shekhar Mangar > <sha...@apache.org> wrote: >> Please vote for release candidate 2 for Lucene/Solr 7.1.0 >> >> The artifacts can be downloaded from: >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 >> >> You can run the smoke tester directly with this command: >> >> python3 -u dev-tools/scripts/smokeTestRelease.py \ >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 >> >> Smoke tester passed for me. >> SUCCESS! [0:40:53.908967] >> >> Here's my +1 to release. >> >> -- >> Regards, >> Shalin Shekhar Mangar. > > > > -- > Regards, > Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 7.1.0 RC2
I just noticed that in the hurry to create this RC, I forgot to add SOLR-10335 to Solr's CHANGES.txt. Is that worth a re-spin? On Fri, Oct 13, 2017 at 7:25 PM, Shalin Shekhar Mangar <sha...@apache.org> wrote: > Please vote for release candidate 2 for Lucene/Solr 7.1.0 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 > > You can run the smoke tester directly with this command: > > python3 -u dev-tools/scripts/smokeTestRelease.py \ > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659 > > Smoke tester passed for me. > SUCCESS! [0:40:53.908967] > > Here's my +1 to release. > > -- > Regards, > Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204119#comment-16204119 ] Shalin Shekhar Mangar commented on SOLR-10335: -- No, I'll take care of the backports. Thanks! > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Critical > Fix For: 7.1, 6.6.2 > > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10335: - Fix Version/s: 6.6.2 7.1 > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Critical > Fix For: 7.1, 6.6.2 > > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[VOTE] Release Lucene/Solr 7.1.0 RC2
Please vote for release candidate 2 for Lucene/Solr 7.1.0

The artifacts can be downloaded from:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659

You can run the smoke tester directly with this command:

python3 -u dev-tools/scripts/smokeTestRelease.py \
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC2-rev84c90ad2c0218156c840e19a64d72b8a38550659

Smoke tester passed for me.
SUCCESS! [0:40:53.908967]

Here's my +1 to release.

--
Regards,
Shalin Shekhar Mangar.
Re: [VOTE] Release Lucene/Solr 7.1.0 RC1
This vote has been cancelled due to the recently reported vulnerabilities. I'll be putting up another RC soon for voting. On Thu, Oct 12, 2017 at 11:20 AM, Shalin Shekhar Mangar <sha...@apache.org> wrote: > Please vote for release candidate 1 for Lucene/Solr 7.1.0 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC1-reva2c54447f118a5dc70ab0e0ae14bd87b3545254b > > You can run the smoke tester directly with this command: > > python3 -u dev-tools/scripts/smokeTestRelease.py \ > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC1-reva2c54447f118a5dc70ab0e0ae14bd87b3545254b > > Smoke tester passed for me (but on the 2nd attempt due to a flaky test > that's already being tracked in a Jira). > SUCCESS! [0:55:14.107386] > > Here's my +1 to release. > > > -- > Regards, > Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203609#comment-16203609 ] Shalin Shekhar Mangar commented on SOLR-10335: -- [~talli...@mitre.org] - It is likely that there will be a 6.6.2 due to the other vulnerabilities so yes, we should back port this to branch_6x and branch_6_6 too. > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Minor > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203511#comment-16203511 ] Shalin Shekhar Mangar commented on SOLR-10335: -- I'm going to merge this after running tests so that it can make it to the next RC for 7.1 > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Minor > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-10335) Upgrade to Tika 1.16 when available
[ https://issues.apache.org/jira/browse/SOLR-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-10335: Assignee: Shalin Shekhar Mangar > Upgrade to Tika 1.16 when available > --- > > Key: SOLR-10335 > URL: https://issues.apache.org/jira/browse/SOLR-10335 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tim Allison > Assignee: Shalin Shekhar Mangar >Priority: Minor > > Once POI 3.16-beta3 is out (early/mid April?), we'll push for a release of > Tika 1.15. > Please let us know if you have any requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11392) StreamExpressionTest.testParallelExecutorStream fails too frequently
[ https://issues.apache.org/jira/browse/SOLR-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203010#comment-16203010 ]

Shalin Shekhar Mangar commented on SOLR-11392:
---

bq. Notice this is looking for the "_n3" replica. What's odd about this is that only two replicas were created for this collection

Joel, sorry for the late reply. Solr does not create replica numbers sequentially anymore. This was done in SOLR-11011

> StreamExpressionTest.testParallelExecutorStream fails too frequently
> ---
>
> Key: SOLR-11392
> URL: https://issues.apache.org/jira/browse/SOLR-11392
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
>
> I've never been able to reproduce the failure but jenkins fails frequently
> with the following error:
> {code}
> Stack Trace:
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://127.0.0.1:38180/solr/workQueue_shard2_replica_n3: Expected mime type application/octet-stream but got text/html.
> Error 404
> HTTP ERROR: 404
> Problem accessing /solr/workQueue_shard2_replica_n3/update. Reason:
>     Can not find: /solr/workQueue_shard2_replica_n3/update
> Powered by Jetty:// 9.3.20.v20170531 (http://eclipse.org/jetty)
> {code}
> What appears to be happening is that the test framework is having trouble
> setting up the collection.
> Here is the test code:
> {code}
> @Test
> public void testParallelExecutorStream() throws Exception {
>   CollectionAdminRequest.createCollection("workQueue", "conf", 2, 1).process(cluster.getSolrClient());
>   AbstractDistribZkTestBase.waitForRecoveriesToFinish("workQueue", cluster.getSolrClient().getZkStateReader(),
>       false, true, TIMEOUT);
>   CollectionAdminRequest.createCollection("mainCorpus", "conf", 2, 1).process(cluster.getSolrClient());
>   AbstractDistribZkTestBase.waitForRecoveriesToFinish("mainCorpus", cluster.getSolrClient().getZkStateReader(),
>       false, true, TIMEOUT);
>   CollectionAdminRequest.createCollection("destination", "conf", 2, 1).process(cluster.getSolrClient());
>   AbstractDistribZkTestBase.waitForRecoveriesToFinish("destination", cluster.getSolrClient().getZkStateReader(),
>       false, true, TIMEOUT);
>   UpdateRequest workRequest = new UpdateRequest();
>   UpdateRequest dataRequest = new UpdateRequest();
>   for (int i = 0; i < 500; i++) {
>     workRequest.add(id, String.valueOf(i), "expr_s", "update(destination, batchSize=50, search(mainCorpus, q=id:"+i+", rows=1, sort=\"id asc\", fl=\"id, body_t, field_i\"))");
>     dataRequest.add(id, String.valueOf(i), "body_t", "hello world "+i, "field_i", Integer.toString(i));
>   }
>   workRequest.commit(cluster.getSolrClient(), "workQueue");
>   dataRequest.commit(cluster.getSolrClient(), "mainCorpus");
> {code}
[VOTE] Release Lucene/Solr 7.1.0 RC1
Please vote for release candidate 1 for Lucene/Solr 7.1.0

The artifacts can be downloaded from:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC1-reva2c54447f118a5dc70ab0e0ae14bd87b3545254b

You can run the smoke tester directly with this command:

python3 -u dev-tools/scripts/smokeTestRelease.py \
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.1.0-RC1-reva2c54447f118a5dc70ab0e0ae14bd87b3545254b

Smoke tester passed for me (but on the 2nd attempt due to a flaky test that's already being tracked in a Jira).
SUCCESS! [0:55:14.107386]

Here's my +1 to release.

--
Regards,
Shalin Shekhar Mangar.
[jira] [Commented] (SOLR-11445) Overseer.processQueueItem().... zkStateWriter.enqueueUpdate might ideally have a try{}catch{} around it
[ https://issues.apache.org/jira/browse/SOLR-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201513#comment-16201513 ]

Shalin Shekhar Mangar commented on SOLR-11445:
---

bq. We process state update queue in batch, so we don't know which message is the bad message. So we must fall-back on using workqueue + reread cluster state from Zk.

Okay, sounds good to me.

> Overseer.processQueueItem() zkStateWriter.enqueueUpdate might ideally have a try{}catch{} around it
> ---
>
> Key: SOLR-11445
> URL: https://issues.apache.org/jira/browse/SOLR-11445
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 6.6.1, 7.0, master (8.0)
> Reporter: Greg Harris
> Attachments: SOLR-11445.patch
>
> So we had the following stack trace with a customer:
> 2017-10-04 11:25:30.339 ERROR () [ ] o.a.s.c.Overseer Exception in Overseer main queue loop
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections//state.json
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
> at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:235)
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:152)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:271)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:199)
> at java.lang.Thread.run(Thread.java:748)
> I want to highlight:
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:152)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:271)
> This ends up coming from Overseer:
> while (data != null) {
>   final ZkNodeProps message = ZkNodeProps.load(data);
>   log.debug("processMessage: workQueueSize: {}, message = {}", workQueue.getStats().getQueueLength(), message);
>   // force flush to ZK after each message because there is no fallback if workQueue items
>   // are removed from workQueue but fail to be written to ZK
>   *clusterState = processQueueItem(message, clusterState, zkStateWriter, false, null);
>   workQueue.poll(); // poll-ing removes the element we got by peek-ing*
>   data = workQueue.peek();
> }
> Note: The processQueueItem comes before the poll, therefore upon a thrown
> exception the same node/message that won't process becomes stuck. This made a
> large cluster unable to come up on its own without deleting the problem
> node.
[jira] [Commented] (SOLR-11445) Overseer.processQueueItem().... zkStateWriter.enqueueUpdate might ideally have a try{}catch{} around it
[ https://issues.apache.org/jira/browse/SOLR-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200061#comment-16200061 ]

Shalin Shekhar Mangar commented on SOLR-11445:
---

I think it is better that we explicitly check for NoNode or NodeExists exceptions in the isBadMessageOrInvalidState() method. Most other KeeperExceptions shouldn't cause us to poll items off the queue.

Also, the same kind of handling should be done for exceptions thrown when processing messages from the state update queue.

> Overseer.processQueueItem() zkStateWriter.enqueueUpdate might ideally have a try{}catch{} around it
> ---
>
> Key: SOLR-11445
> URL: https://issues.apache.org/jira/browse/SOLR-11445
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 6.6.1, 7.0, master (8.0)
> Reporter: Greg Harris
> Attachments: SOLR-11445.patch
>
> So we had the following stack trace with a customer:
> 2017-10-04 11:25:30.339 ERROR () [ ] o.a.s.c.Overseer Exception in Overseer main queue loop
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections//state.json
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
> at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:235)
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:152)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:271)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:199)
> at java.lang.Thread.run(Thread.java:748)
> I want to highlight:
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:152)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:271)
> This ends up coming from Overseer:
> while (data != null) {
>   final ZkNodeProps message = ZkNodeProps.load(data);
>   log.debug("processMessage: workQueueSize: {}, message = {}", workQueue.getStats().getQueueLength(), message);
>   // force flush to ZK after each message because there is no fallback if workQueue items
>   // are removed from workQueue but fail to be written to ZK
>   *clusterState = processQueueItem(message, clusterState, zkStateWriter, false, null);
>   workQueue.poll(); // poll-ing removes the element we got by peek-ing*
>   data = workQueue.peek();
> }
> Note: The processQueueItem comes before the poll, therefore upon a thrown
> exception the same node/message that won't process becomes stuck. This made a
> large cluster unable to come up on its own without deleting the problem
> node.
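The handling suggested in the comment above can be illustrated with a small, self-contained sketch. It is not the Overseer code and not the SOLR-11445 patch: the queue is a plain java.util.Deque, and BadMessageException, process() and drain() are hypothetical names standing in for the "NoNode or NodeExists" check. The point it shows is only the control flow: a message judged unrecoverable is recorded and polled off so it cannot wedge the loop, while any other exception propagates and leaves the message at the head of the queue for a later retry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class WorkQueueSketch {

  /** Hypothetical marker for failures that retrying the same message cannot fix. */
  static final class BadMessageException extends RuntimeException {
    BadMessageException(String text) {
      super(text);
    }
  }

  /** Hypothetical processor: any message starting with "bad" always fails. */
  static void process(String message, List<String> applied) {
    if (message.startsWith("bad")) {
      throw new BadMessageException("cannot apply " + message);
    }
    applied.add(message);
  }

  /**
   * Peek, process, then poll. A message is removed only after it was applied
   * or judged unrecoverable; other exceptions escape before poll() is reached.
   */
  static List<String> drain(Deque<String> workQueue, List<String> skipped) {
    List<String> applied = new ArrayList<>();
    String message = workQueue.peek();
    while (message != null) {
      try {
        process(message, applied);
      } catch (BadMessageException e) {
        // Unrecoverable for this message: remember it and fall through to poll().
        skipped.add(message);
      }
      workQueue.poll(); // removes the element we got by peeking
      message = workQueue.peek();
    }
    return applied;
  }

  public static void main(String[] args) {
    Deque<String> queue = new ArrayDeque<>(Arrays.asList("m1", "bad-m2", "m3"));
    List<String> skipped = new ArrayList<>();
    List<String> applied = drain(queue, skipped);
    System.out.println("applied=" + applied + " skipped=" + skipped);
  }
}
```

With the three sample messages, m1 and m3 are applied and bad-m2 is skipped instead of blocking the queue. Which exceptions count as "bad" is the real design decision; the sketch deliberately keeps that set narrow, in line with the remark that most other KeeperExceptions should not cause items to be polled off.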
Release notes for Lucene/Solr 7.1
Hello,

I've created drafts for the 7.1 release notes:

Lucene: https://wiki.apache.org/lucene-java/ReleaseNote71
Solr: https://wiki.apache.org/solr/ReleaseNote71

Please review and edit as you see fit.

--
Regards,
Shalin Shekhar Mangar.
[jira] [Resolved] (SOLR-11085) Improve resiliency of autoscaling actions
[ https://issues.apache.org/jira/browse/SOLR-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-11085. -- Resolution: Fixed > Improve resiliency of autoscaling actions > - > > Key: SOLR-11085 > URL: https://issues.apache.org/jira/browse/SOLR-11085 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 7.1, master (8.0) > > Attachments: SOLR-11085.patch > > > We need to improve resiliency of actions against: > # Overseer restarts > # Failed operations e.g. a move replica fails if target node is no longer live -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Release 7.1
Hello,

The 7.0 release took much longer than anticipated so I would like to push a 7.1 release on a more aggressive schedule than usual. I've already pushed branch_7_1 created from branch_7x a few minutes back.

If no one has any objections, I'd like to cut the first RC after 24 hours from this email.

No new features should be pushed to branch_7_1. Critical bug fixes or test fixes only at this point, please.

--
Regards,
Shalin Shekhar Mangar.
[jira] [Updated] (SOLR-11157) remove-policy must fail if a policy to be deleted is used by a collection
[ https://issues.apache.org/jira/browse/SOLR-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11157: - Fix Version/s: master (8.0) 7.1 Component/s: AutoScaling > remove-policy must fail if a policy to be deleted is used by a collection > - > > Key: SOLR-11157 > URL: https://issues.apache.org/jira/browse/SOLR-11157 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Fix For: 7.1, master (8.0) > > > If a policy that is referenced by a collection is deleted , all future > collection admin operations will fail. We must prevent that from happening -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9735) Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196859#comment-16196859 ] Shalin Shekhar Mangar commented on SOLR-9735: - Noble merged the autoscaling branch to master last week. The tests look stable and we haven't seen any new failures. I am going to merge these changes to branch_7x now. > Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud > -- > > Key: SOLR-9735 > URL: https://issues.apache.org/jira/browse/SOLR-9735 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Anshum Gupta > Assignee: Shalin Shekhar Mangar > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > As SolrCloud is now used at fairly large scale, most users end up writing > their own cluster management tools. We should have a framework for cluster > management in Solr. > In a discussion with [~noble.paul], we outlined the following steps w.r.t. > the approach to having this implemented: > * *Basic API* calls for cluster management e.g. utilize added nodes, remove a > node etc. These calls would need explicit invocation by the users to begin > with. It would also specify the {{strategy}} to use. For instance I can have > a strategy called {{optimizeCoreCount}} which would target to have an even > no:of cores in each node . The strategy could optionally take parameters as > well > * *Metrics* and stats tracking e.g. qps, etc. These would be required for any > advanced cluster management tasks e.g. *maintain a qps of 'x'* by > *auto-adding a replica* (using a recipe) etc. We would need > collection/shard/node level views of metrics for this. > * *Recipes*: combination of multiple sequential/parallel API calls based on > rules. This would be complicated specially as most of these would be long > running series of tasks which would either have to be rolled back or resumed > in case of a failure. 
> * *Event based triggers* that would not require explicit cluster management > calls for end users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Hrishikesh Gadre as Lucene/Solr committer
Congratulations and welcome Hrishikesh! On Fri, Sep 29, 2017 at 7:23 PM, Yonik Seeley <yo...@apache.org> wrote: > Hi All, > > Please join me in welcoming Hrishikesh Gadre as the latest Lucene/Solr > committer. > Hrishikesh, it's tradition for you to introduce yourself with a brief bio. > > Congrats and Welcome! > -Yonik > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- Regards, Shalin Shekhar Mangar. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10181) CREATEALIAS and DELETEALIAS commands consistency problems under concurrency
[ https://issues.apache.org/jira/browse/SOLR-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174030#comment-16174030 ]

Shalin Shekhar Mangar commented on SOLR-10181:
--

Thanks Samuel. The right way to fix this is to use ZooKeeper's compare-and-set methods to update /aliases.json. The SolrZkClient's getData method accepts a Stat object, which is populated with the version of the znode. The setData method then accepts a version which is used for compare-and-set. If the znode version in ZK does not match the version you provide to the client then the client throws a BadVersionException. You should catch it and retry the changes until you succeed.

> CREATEALIAS and DELETEALIAS commands consistency problems under concurrency
> ---
>
> Key: SOLR-10181
> URL: https://issues.apache.org/jira/browse/SOLR-10181
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 5.3, 5.4, 5.5, 6.4.1
> Reporter: Samuel García Martínez
> Assignee: Erick Erickson
> Attachments: SOLR-10181_testcase.patch
>
> When several CREATEALIAS commands are run at the same time by the OCP, it can happen that, even though the API response is OK, the changes made by some of those CREATEALIAS requests are lost.
> h3. The problem
> The problem happens because the CREATEALIAS cmd implementation relies on _zkStateReader.getAliases()_ to create the map that will be stored in ZK. If several threads reach that line at the same time, only one map will be stored correctly and the others will be overwritten.
> The code I'm referencing is [this piece|https://github.com/apache/lucene-solr/blob/8c1e67e30e071ceed636083532d4598bf6a8791f/solr/core/src/java/org/apache/solr/cloud/CreateAliasCmd.java#L65].
> As an example, let's say that the current aliases map has {a:colA, b:colB}.
> If two CREATEALIAS requests (one adding c:colC and the other creating d:colD) are submitted to the _tpe_ and reach that line at the same time, the resulting maps will look like {a:colA, b:colB, c:colC} and {a:colA, b:colB, d:colD}, and only one of them will be stored correctly in ZK. The result is "data loss": the API returns OK even though the command did not work as expected.
> On top of this, another concurrency problem could happen when the command checks whether the alias has been set using the _checkForAlias_ method. If these two CREATEALIAS zk writes had run at the same time, the alias check for one of the threads can time out, since only one of the writes has "survived" and has been "committed" to the _zkStateReader.getAliases()_ map.
> h3. How to fix it
> I can post a patch for this if someone gives me directions on how it should be fixed. As I see it, there are two places where the issue can be fixed: in the processor (OverseerCollectionMessageHandler) in a generic way, or inside the command itself.
> h5. The processor fix
> The locking mechanism (_OverseerCollectionMessageHandler#lockTask_) should be the place to fix this inside the processor. I thought that adding the operation name instead of only "collection" or "name" to the locking key would fix the issue, but I realized that the problem will happen anyway if the concurrency happens between different operations modifying the same resource (like CREATEALIAS and DELETEALIAS do). So, if this is the path to follow, I don't know what should be used as a locking key.
> h5. The command fix
> Fixing it at the command level (_CreateAliasCmd_ and _DeleteAliasCmd_) would be relatively easy: use optimistic locking, i.e., pass the aliases.json zk version to keeper.setData. To do that, the Aliases class should expose the aliases version so the commands can forward that version with the update and retry when it fails.
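The compare-and-set retry loop suggested in the comment on SOLR-10181 can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: `VersionedStore`, `Versioned`, `BadVersionException` and `updateWithRetry` are hypothetical stand-ins for the /aliases.json znode, the data-plus-Stat pair, ZooKeeper's bad-version error and the command logic. A real fix would presumably read and write the znode through SolrZkClient instead of an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class AliasCasSketch {

    /** Hypothetical stand-in for ZooKeeper's bad-version error. */
    static class BadVersionException extends Exception {}

    /** Snapshot of the data together with the version it was read at. */
    static final class Versioned {
        final Map<String, String> data;
        final int version;

        Versioned(Map<String, String> data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for a versioned znode such as /aliases.json. */
    static class VersionedStore {
        private Map<String, String> data = new HashMap<>();
        private int version = 0;

        synchronized Versioned read() {
            return new Versioned(new HashMap<>(data), version);
        }

        /** Writes only if the caller read the latest version (compare-and-set). */
        synchronized void write(Map<String, String> newData, int expectedVersion)
                throws BadVersionException {
            if (expectedVersion != version) {
                throw new BadVersionException();
            }
            data = new HashMap<>(newData);
            version++;
        }
    }

    /** Re-reads and re-applies the change until the versioned write succeeds. */
    static void updateWithRetry(VersionedStore store,
                                UnaryOperator<Map<String, String>> change) {
        while (true) {
            Versioned current = store.read();
            Map<String, String> updated = change.apply(new HashMap<>(current.data));
            try {
                store.write(updated, current.version);
                return;
            } catch (BadVersionException e) {
                // Another writer got in first: loop, re-read and try again.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        VersionedStore store = new VersionedStore();
        updateWithRetry(store, m -> { m.put("a", "colA"); m.put("b", "colB"); return m; });

        // Two concurrent alias writers, as in the issue description.
        Thread t1 = new Thread(() -> updateWithRetry(store, m -> { m.put("c", "colC"); return m; }));
        Thread t2 = new Thread(() -> updateWithRetry(store, m -> { m.put("d", "colD"); return m; }));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Both c and d are present regardless of how the threads interleave.
        System.out.println(store.read().data);
    }
}
```

The point of the design is that a writer never overwrites state it has not seen: a stale write is rejected and recomputed from fresh data, so neither alias is lost.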
[jira] [Updated] (SOLR-11085) Improve resiliency of autoscaling actions
[ https://issues.apache.org/jira/browse/SOLR-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11085: - Attachment: SOLR-11085.patch > Improve resiliency of autoscaling actions > - > > Key: SOLR-11085 > URL: https://issues.apache.org/jira/browse/SOLR-11085 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: master (8.0), 7.1 > > Attachments: SOLR-11085.patch > > > We need to improve resiliency of actions against: > # Overseer restarts > # Failed operations e.g. a move replica fails if target node is no longer live -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11085) Improve resiliency of autoscaling actions
[ https://issues.apache.org/jira/browse/SOLR-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174014#comment-16174014 ] Shalin Shekhar Mangar commented on SOLR-11085: -- Committed to feature/autoscaling branch. > Improve resiliency of autoscaling actions > - > > Key: SOLR-11085 > URL: https://issues.apache.org/jira/browse/SOLR-11085 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: master (8.0), 7.1 > > Attachments: SOLR-11085.patch > > > We need to improve resiliency of actions against: > # Overseer restarts > # Failed operations e.g. a move replica fails if target node is no longer live -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11085) Improve resiliency of autoscaling actions
[ https://issues.apache.org/jira/browse/SOLR-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-11085: - Attachment: (was: SOLR-11085.patch) > Improve resiliency of autoscaling actions > - > > Key: SOLR-11085 > URL: https://issues.apache.org/jira/browse/SOLR-11085 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: master (8.0), 7.1 > > > We need to improve resiliency of actions against: > # Overseer restarts > # Failed operations e.g. a move replica fails if target node is no longer live -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10643) Throttling strategy for triggers and policy executions
[ https://issues.apache.org/jira/browse/SOLR-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10643: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > Throttling strategy for triggers and policy executions > -- > > Key: SOLR-10643 > URL: https://issues.apache.org/jira/browse/SOLR-10643 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10643.patch, SOLR-10643.patch, SOLR-10643.patch > > > We must ensure that triggers and policy executions: > # Do not step on each other's toes by concurrent executions > # Do not fire/execute too frequently > # Do not stack up -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10738) TriggerAction is initialised even if the trigger is never scheduled
[ https://issues.apache.org/jira/browse/SOLR-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-10738:
-
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)

> TriggerAction is initialised even if the trigger is never scheduled
> ---
>
> Key: SOLR-10738
> URL: https://issues.apache.org/jira/browse/SOLR-10738
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10738-fix.patch, SOLR-10738.patch, SOLR-10738-tests.patch
>
> The zk watcher responsible for creating triggers creates them blindly without checking if the trigger is actually modified. This is by design, as ScheduledTriggers.add is a no-op if the trigger being added is unchanged. However, since the trigger's actions are initialised in the trigger's constructor, they are initialised needlessly by the zk watcher thread even though we may never schedule those trigger instances (because they are unchanged).
> So I propose to change the TriggerAction lifecycle such that TriggerAction.init is only called when the trigger is actually ready to be scheduled.
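The lifecycle change proposed in SOLR-10738 can be illustrated with a small sketch. Everything here is hypothetical and only mirrors the idea in the description: `CountingAction`, `Trigger` and `Scheduler` are illustrative names, not Solr's TriggerAction or ScheduledTriggers classes. The constructor records configuration only, and the action's `init` runs solely on the path where the trigger is really scheduled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class LazyActionInitSketch {

    /** Hypothetical action whose init() should not run needlessly. */
    static class CountingAction {
        static int initCalls = 0;

        void init() {
            initCalls++;
        }
    }

    /** Hypothetical trigger: the constructor only records configuration. */
    static class Trigger {
        final String name;
        final String waitFor;
        final CountingAction action = new CountingAction();

        Trigger(String name, String waitFor) {
            this.name = name;
            this.waitFor = waitFor;
        }

        /** Called only once the trigger is really going to be scheduled. */
        void init() {
            action.init();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Trigger)) {
                return false;
            }
            Trigger t = (Trigger) o;
            return name.equals(t.name) && waitFor.equals(t.waitFor);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, waitFor);
        }
    }

    /** Hypothetical scheduler: add() is a no-op when the trigger is unchanged. */
    static class Scheduler {
        private final Map<String, Trigger> scheduled = new HashMap<>();

        boolean add(Trigger candidate) {
            Trigger existing = scheduled.get(candidate.name);
            if (candidate.equals(existing)) {
                return false; // unchanged: never initialised, never scheduled
            }
            candidate.init(); // actions are initialised only on this path
            scheduled.put(candidate.name, candidate);
            return true;
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.add(new Trigger("node_lost_trigger", "10m")); // new: initialised
        scheduler.add(new Trigger("node_lost_trigger", "10m")); // unchanged: skipped
        scheduler.add(new Trigger("node_lost_trigger", "5m"));  // modified: initialised
        System.out.println("init calls: " + CountingAction.initCalls);
    }
}
```

In this sketch the watcher can keep creating trigger instances blindly; the cost of doing so stays low because only the first and third instances ever have their action initialised.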
[jira] [Updated] (SOLR-10714) OverseerTriggerThread does not start triggers on overseer start until autoscaling config watcher is fired
[ https://issues.apache.org/jira/browse/SOLR-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10714: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > OverseerTriggerThread does not start triggers on overseer start until > autoscaling config watcher is fired > - > > Key: SOLR-10714 > URL: https://issues.apache.org/jira/browse/SOLR-10714 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10714.patch > > > Thanks to [~ab] for catching this. The OverseerTriggerThread only sets a > watch but doesn't read trigger information by itself. Therefore no triggers > are started on overseer restart until the autoscaling zk node changes and the > watcher is fired. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10602) Triggers should be able to restore state from old instances when taking over
[ https://issues.apache.org/jira/browse/SOLR-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10602: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > Triggers should be able to restore state from old instances when taking over > > > Key: SOLR-10602 > URL: https://issues.apache.org/jira/browse/SOLR-10602 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10602.patch, SOLR-10602.patch > > > Currently if a user modifies a trigger then the old trigger is closed and > unscheduled and replaced with a new trigger instance with updated properties. > However, this loses the intermediate state that the trigger may have been > tracking. For example, say there is a trigger for NodeAdded event with > waitFor=5s and a new node is added to the cluster. While the trigger is > waiting for 5s before firing, the user modifies the trigger to change the > waitFor=2s. Doing this today will erase the state of the old trigger and the > new trigger will never fire for the newly added node. > We need to be able to restore state from old trigger instance before > replacing it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
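The waitFor scenario described in SOLR-10602 can be sketched as below. This is an assumption-laden illustration rather than the committed implementation: `NodeAddedTrigger`, `nodeObserved`, `restoreState` and `run` are hypothetical names chosen for this example, and time is passed in explicitly as milliseconds so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TriggerStateRestoreSketch {

    /** Hypothetical nodeAdded-style trigger that fires once a node has been seen for waitForMs. */
    static class NodeAddedTrigger {
        final long waitForMs;
        // node name -> time (ms) at which the node was first observed
        private final Map<String, Long> firstSeen = new HashMap<>();

        NodeAddedTrigger(long waitForMs) {
            this.waitForMs = waitForMs;
        }

        void nodeObserved(String node, long nowMs) {
            firstSeen.putIfAbsent(node, nowMs);
        }

        /** Copies the intermediate tracking state from the instance being replaced. */
        void restoreState(NodeAddedTrigger old) {
            firstSeen.putAll(old.firstSeen);
        }

        /** Returns the nodes whose wait period has elapsed and stops tracking them. */
        List<String> run(long nowMs) {
            List<String> fired = new ArrayList<>();
            firstSeen.entrySet().removeIf(e -> {
                if (nowMs - e.getValue() >= waitForMs) {
                    fired.add(e.getKey());
                    return true;
                }
                return false;
            });
            return fired;
        }
    }

    public static void main(String[] args) {
        // Old trigger with waitFor=5s sees a new node at t=0.
        NodeAddedTrigger old = new NodeAddedTrigger(5000);
        old.nodeObserved("node1", 0);

        // The user changes waitFor to 2s; the replacement inherits the old state.
        NodeAddedTrigger replacement = new NodeAddedTrigger(2000);
        replacement.restoreState(old);
        System.out.println("with restore:    " + replacement.run(3000));

        // Without restoring state the new instance knows nothing about node1.
        NodeAddedTrigger fresh = new NodeAddedTrigger(2000);
        System.out.println("without restore: " + fresh.run(3000));
    }
}
```

With the state carried over, the replacement fires for the node that was added before the trigger was modified; the fresh instance never does, which is the behaviour the issue sets out to fix.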
[jira] [Updated] (SOLR-10396) Implement trigger support for nodeLost event type
[ https://issues.apache.org/jira/browse/SOLR-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-10396:
-
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)

> Implement trigger support for nodeLost event type
> -
>
> Key: SOLR-10396
> URL: https://issues.apache.org/jira/browse/SOLR-10396
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Cao Manh Dat
> Labels: autoscaling
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10396.patch
>
> Implement support for the 'nodeLost' event type in triggers. This kind of trigger is fired when a node goes away (i.e. is no longer live) and does not come back within a configured amount of time.
[jira] [Updated] (SOLR-10965) Implement ExecutePlanAction for autoscaling
[ https://issues.apache.org/jira/browse/SOLR-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10965: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > Implement ExecutePlanAction for autoscaling > --- > > Key: SOLR-10965 > URL: https://issues.apache.org/jira/browse/SOLR-10965 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10965.patch, SOLR-10965.patch > > > The ExecutePlanAction will use cluster operations computed by > ComputePlanAction and execute them against the cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10496) Implement ComputePlanAction for autoscaling
[ https://issues.apache.org/jira/browse/SOLR-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10496: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > Implement ComputePlanAction for autoscaling > --- > > Key: SOLR-10496 > URL: https://issues.apache.org/jira/browse/SOLR-10496 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10496.patch, SOLR-10496.patch, SOLR-10496.patch, > SOLR-10496.patch, SOLR-10496.patch, SOLR-10496.patch, SOLR-10496.patch, > SOLR-10496.patch, SOLR-10496-test2.patch, SOLR-10496-test.patch > > > The ComputePlanAction will use the cluster/collection policy to calculate the > cluster operations to be performed. This issue is about integrating the work > done in SOLR-10278 with the trigger framework. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10358) Implement suspend-trigger and resume-trigger APIs
[ https://issues.apache.org/jira/browse/SOLR-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-10358:
-
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)

> Implement suspend-trigger and resume-trigger APIs
> -
>
> Key: SOLR-10358
> URL: https://issues.apache.org/jira/browse/SOLR-10358
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: autoscaling
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10358.patch
>
> There are times when the user wants to pause execution of the autoscaling policies because he/she is performing some maintenance tasks. A cluster-wide command can be used to suspend the triggers indefinitely or for a specific amount of time.
> h3. Examples:
> Suspend the 'node_lost_trigger' until an explicit resume-trigger API is called:
> {code}
> curl -H 'Content-type:application/json' -d '{
>   "suspend-trigger" : {
>     "name" : "node_lost_trigger"
>   }
> }' http://localhost:8983/solr/admin/autoscaling
> {code}
> Suspend all triggers until resumed by an explicit resume-trigger API call:
> {code}
> curl -H 'Content-type:application/json' -d '{
>   "suspend-trigger" : {
>     "name" : "#EACH"
>   }
> }' http://localhost:8983/solr/admin/autoscaling
> {code}
> Suspend all triggers for 1 hour:
> {code}
> curl -H 'Content-type:application/json' -d '{
>   "suspend-trigger" : {
>     "name" : "#EACH",
>     "timeout" : "1h"
>   }
> }' http://localhost:8983/solr/admin/autoscaling
> {code}
[jira] [Updated] (SOLR-10376) Implement trigger for nodeAdded event
[ https://issues.apache.org/jira/browse/SOLR-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10376: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > Implement trigger for nodeAdded event > - > > Key: SOLR-10376 > URL: https://issues.apache.org/jira/browse/SOLR-10376 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR_10376_OverseerTest_fix.patch, SOLR-10376.patch, > SOLR-10376.patch, SOLR-10376.patch > > > This issue is about implementing support for the nodeAdded event type. > Whenever a node is added, all triggers for this event type should be fired. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10340) Implement set-listener and remove-listener API
[ https://issues.apache.org/jira/browse/SOLR-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-10340:
-
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)

> Implement set-listener and remove-listener API
> --
>
> Key: SOLR-10340
> URL: https://issues.apache.org/jira/browse/SOLR-10340
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: autoscaling
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10340.patch, SOLR-10340.patch
>
> Implement set-listener and remove-listener API to listen to various lifecycle stages of a trigger.
> The set-listener API can be invoked to add a listener to any trigger at any stage of its execution. The parameters are:
> * ‘name’ - a unique string identifying the listener so that it can be read, updated and removed
> * ‘trigger’ - the name of the trigger to listen to
> * ‘stage’ - the stage of the trigger (multiple values can be specified as an array of strings), possible values are:
> ** STARTED,
> ** ABORTED,
> ** FAILED,
> ** SUCCEEDED
> * ‘beforeAction’ - the action name before which the listener should be notified. Multiple values can be specified as an array of strings.
> * ‘afterAction’ - the action name after which the listener should be notified. Multiple values can be specified as an array of strings.
> * ‘class’ - an implementation of the ‘TriggerListener’ class
> * Other parameters depend on the listener class
> An example invocation of this API is:
> {code}
> curl -H 'Content-type:application/json' -d '{
>   "set-listener" : {
>     "name" : "xyz",
>     "trigger" : "node_lost_trigger",
>     "stage" : ["STARTED","ABORTED","SUCCEEDED"],
>     "beforeAction" : "execute_plan",
>     "class" : "solr.HttpCallback",
>     "url" : "http://xyz.com/on_node_lost?node={$LOST_NODE_NAME}"
>   }
> }' http://localhost:8983/solr/admin/cluster
> {code}
[jira] [Updated] (SOLR-10339) Implement set-trigger and remove-trigger APIs
[ https://issues.apache.org/jira/browse/SOLR-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-10339:
-
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)

> Implement set-trigger and remove-trigger APIs
> -
>
> Key: SOLR-10339
> URL: https://issues.apache.org/jira/browse/SOLR-10339
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: autoscaling
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10339.patch, SOLR-10339.patch
>
> Implement set-trigger and remove-trigger API to add, update and remove triggers for autoscaling.
> The following events are supported:
> # nodeAdded
> # nodeLost
> # replicaLost
> # schedule
> # searchRate
> # indexRate
> Each trigger has the following properties:
> # ‘name’ - a unique string to identify the trigger so that it can be read, updated or removed later
> # ‘state’ - the state of the event (ENABLED or DISABLED), default is ENABLED. This allows one to add a trigger which is disabled until a RESUME_TRIGGER API is called.
> # ‘actions’ - a list of actions to be performed in the order specified. The default list of actions for every trigger is to compute the plan, execute the plan and save the plan. If an empty list of actions is explicitly specified, or null is specified when creating/updating the trigger, then no actions are performed at all.
> Here's an example of an API invocation:
> {code}
> {
>   "set-trigger" : {
>     "name" : "node_lost_trigger",
>     "event" : "nodeLost",
>     "waitFor" : "10m",
>     "state" : "ENABLED",
>     "actions" : [
>       {
>         "name" : "compute_plan",
>         "class" : "solr.ComputePlanAction"
>       },
>       {
>         "name" : "execute_plan",
>         "class" : "solr.ExecutePlanAction"
>       },
>       {
>         "name" : "log_plan",
>         "class" : "solr.LogPlanAction",
>         "collection" : ".system"
>       }
>     ]
>   }
> }
> {code}
> Note this issue is only about implementation of the user-facing APIs and not the actual trigger mechanism itself.
[jira] [Updated] (SOLR-10764) AutoScalingHandler should validate policy and preferences before updating zookeeper
[ https://issues.apache.org/jira/browse/SOLR-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-10764: - Fix Version/s: (was: 7.0) 7.1 master (8.0) > AutoScalingHandler should validate policy and preferences before updating > zookeeper > --- > > Key: SOLR-10764 > URL: https://issues.apache.org/jira/browse/SOLR-10764 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Labels: autoscaling > Fix For: master (8.0), 7.1 > > Attachments: SOLR-10764.patch > > > AutoScalingHandler should validate policy and preferences before updating > zookeeper so that problems are caught early rather than during diagnostics or > actual executions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11085) Improve resiliency of autoscaling actions
[ https://issues.apache.org/jira/browse/SOLR-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-11085:
-
Attachment: SOLR-11085.patch

This patch:
# Passes ZkController to the TriggerFactory to avoid running into SOLR-11370. Similarly, triggers use the ZkController directly instead of accessing it via container.getZkController.
# ExecutePlanAction always calls collection APIs in async mode and uses the request id to poll for completion of tasks.
# Fixes a bug in Preference where the setApproxVal method was trying to access metrics for non-live nodes, leading to an NPE.
# ScheduledTriggers:
## adds more protection against overseer shutdown
## fires listeners for event stage STARTED after the event is enqueued successfully to ZK
## waits for pending tasks requested by ExecutePlanAction before trying to compute/execute the plan again
# SharedFSAutoReplicaFailoverTest now restarts the overseer sometimes and removes the nocommit added by SOLR-10397

> Improve resiliency of autoscaling actions
> -
>
> Key: SOLR-11085
> URL: https://issues.apache.org/jira/browse/SOLR-11085
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: AutoScaling, SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-11085.patch
>
> We need to improve resiliency of actions against:
> # Overseer restarts
> # Failed operations e.g. a move replica fails if target node is no longer live
[jira] [Created] (SOLR-11381) HdfsDirectoryFactory throws NPE on cleanup because file system has been closed
Shalin Shekhar Mangar created SOLR-11381:

Summary: HdfsDirectoryFactory throws NPE on cleanup because file system has been closed
Key: SOLR-11381
URL: https://issues.apache.org/jira/browse/SOLR-11381
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: hdfs
Reporter: Shalin Shekhar Mangar
Priority: Trivial
Fix For: master (8.0), 7.1

I saw this happening on tests related to autoscaling. The old directory cleanup is triggered on core close in a separate thread. This can cause a race condition where the filesystem is closed before the cleanup starts running. Then an NPE is thrown and cleanup fails.

Fixing the NPE is simple but I think this is a real bug where old directories can be left around on HDFS. I don't know enough about HDFS to investigate further. Leaving it here for interested people to pitch in.

{code}
105029 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.HdfsDirectoryFactory Error checking for old index directories to clean-up.
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2083)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:791)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
	at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:540)
	at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
	at java.lang.Thread.run(Thread.java:745)
105030 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.SolrCore Failed to cleanup old index directories for core control_collection_shard1_replica_n1
java.lang.NullPointerException
	at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:558)
	at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Commented] (SOLR-8389) Convert CDCR peer cluster and other configurations into collection properties modifiable via APIs
[ https://issues.apache.org/jira/browse/SOLR-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173546#comment-16173546 ] Shalin Shekhar Mangar commented on SOLR-8389: - Sounds great! Looking forward to it. > Convert CDCR peer cluster and other configurations into collection properties > modifiable via APIs > - > > Key: SOLR-8389 > URL: https://issues.apache.org/jira/browse/SOLR-8389 > Project: Solr > Issue Type: Improvement > Components: CDCR, SolrCloud > Reporter: Shalin Shekhar Mangar > > CDCR configuration is kept inside solrconfig.xml which makes it difficult to > add or change peer cluster configuration. > I propose to move all CDCR config to collection level properties in cluster > state so that they can be modified using the existing modify collection API. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11378) Solr can leak transaction logs if a commit happens during unload
Shalin Shekhar Mangar created SOLR-11378:

Summary: Solr can leak transaction logs if a commit happens during unload
Key: SOLR-11378
URL: https://issues.apache.org/jira/browse/SOLR-11378
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Shalin Shekhar Mangar
Fix For: master (8.0), 7.1

I have a test called AutoscalingHistoryHandlerTest in the feature/autoscaling branch that fails fairly frequently due to the ObjectReleaseTracker complaining about TransactionLog not being closed.

This happens because the solr core in question is being moved and is therefore unloaded but an auto-commit was in progress during this time. The commit does not succeed because a new searcher could not be opened and I think that causes the tlog reference to be leaked.

It should be possible to isolate this problem into a smaller test by continuously indexing documents with a commitWithin and unloading the core in the middle of indexing.

Here are the relevant stack traces:

{code}
21106 ERROR (commitScheduler-48-thread-1) [n:127.0.0.1:54191_solr c:.system s:shard1 r:core_node5 x:.system_shard1_replica_n2] o.a.s.u.CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2076)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2196)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1933)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:710)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:222)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: openNewSearcher called on closed core
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2063)
    ... 11 more
{code}

{code}
java.lang.AssertionError: ObjectTracker found 1 object(s) that were not released!!! [TransactionLog]
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException: org.apache.solr.update.TransactionLog
    at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
    at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:190)
    at org.apache.solr.update.UpdateLog.newTransactionLog(UpdateLog.java:446)
    at org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:1259)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:532)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:517)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:347)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:266)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:216)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:991)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1207)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:753)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130
[jira] [Created] (SOLR-11376) ComputePlanAction should accept configuration to compute plans only for specific collections
Shalin Shekhar Mangar created SOLR-11376:

Summary: ComputePlanAction should accept configuration to compute plans only for specific collections
Key: SOLR-11376
URL: https://issues.apache.org/jira/browse/SOLR-11376
Project: Solr
Issue Type: Sub-task
Security Level: Public (Default Security Level. Issues are Public)
Components: AutoScaling
Reporter: Shalin Shekhar Mangar
Fix For: master (8.0), 7.1

Today when we enable triggers, we compute a plan for all collections. It is not possible to configure the action to only apply for specific collections. I propose to add a parameter to ComputePlanAction called "collections" which accepts a list of collection names for which to compute actions.
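To make the proposed behaviour concrete, here is a minimal, self-contained sketch of how a "collections" value might narrow the set of collections a plan is computed for. It is an illustration only and not Solr code: the class and method names (CollectionFilterSketch, parseCollections, selectCollections) are hypothetical, and two details are assumptions rather than part of the proposal: that the value is a comma-separated string, and that an empty value means "all collections", matching today's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Solr code base. */
public class CollectionFilterSketch {

    /** Parses a comma-separated "collections" value (assumed format) into a set of names. */
    static Set<String> parseCollections(String value) {
        Set<String> names = new LinkedHashSet<>();
        if (value == null) {
            return names;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    /**
     * Returns the collections a plan would be computed for.
     * Assumption: an empty configuration means no restriction.
     */
    static List<String> selectCollections(List<String> all, Set<String> configured) {
        if (configured.isEmpty()) {
            return new ArrayList<>(all);
        }
        List<String> selected = new ArrayList<>();
        for (String collection : all) {
            if (configured.contains(collection)) {
                selected.add(collection);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("orders", "logs", ".system");
        // Restricted to two collections: prints [orders, logs]
        System.out.println(selectCollections(all, parseCollections("orders, logs")));
        // No restriction configured: prints [orders, logs, .system]
        System.out.println(selectCollections(all, parseCollections("")));
    }
}
```

Treating an empty value as "no restriction" would keep existing trigger configurations working unchanged, but whether the actual parameter should behave that way is for the patch to decide.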