[jira] [Created] (SOLR-12588) Solr Autoscaling History doesn't log node added events
Jerry Bao created SOLR-12588: Summary: Solr Autoscaling History doesn't log node added events Key: SOLR-12588 URL: https://issues.apache.org/jira/browse/SOLR-12588 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 7.3.1 Reporter: Jerry Bao Autoscaling node-added triggers don't log node-added events to the history in the .system collection. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12563) Unable to delete failed/completed async request statuses
Jerry Bao created SOLR-12563: Summary: Unable to delete failed/completed async request statuses Key: SOLR-12563 URL: https://issues.apache.org/jira/browse/SOLR-12563 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.3.1 Reporter: Jerry Bao /admin/collections?action=DELETESTATUS&flush=true {code} { "responseHeader": { "status": 500, "QTime": 5 }, "error": { "msg": "KeeperErrorCode = Directory not empty for /overseer/collection-map-completed/mn-node_lost_trigger", "trace": "org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /overseer/collection-map-completed/mn-node_lost_trigger\n\tat org.apache.zookeeper.KeeperException.create(KeeperException.java:128)\n\tat org.apache.zookeeper.KeeperException.create(KeeperException.java:54)\n\tat org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:876)\n\tat org.apache.solr.common.cloud.SolrZkClient.lambda$delete$1(SolrZkClient.java:244)\n\tat org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)\n\tat org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:243)\n\tat org.apache.solr.cloud.DistributedMap.remove(DistributedMap.java:98)\n\tat org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation$1.execute(CollectionsHandler.java:753)\n\tat org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1114)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:242)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat
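The failure mode is easy to reproduce in miniature: ZooKeeper's delete is non-recursive, so removing the completed-requests map entry fails while child znodes still exist. A minimal Python sketch (simulation only, not Solr or ZooKeeper code; the child name "status" is hypothetical), modeling znodes as a dict of path to child names:

```python
# Simulation only (not Solr/ZooKeeper code): znodes modeled as a dict
# mapping each path to the list of its child names.
tree = {
    "/overseer/collection-map-completed": ["mn-node_lost_trigger"],
    "/overseer/collection-map-completed/mn-node_lost_trigger": ["status"],
    "/overseer/collection-map-completed/mn-node_lost_trigger/status": [],
}

def delete(path):
    # Mirrors ZooKeeper's non-recursive delete: it refuses to remove a
    # node that still has children (KeeperException$NotEmptyException).
    if tree[path]:
        raise RuntimeError("Directory not empty for " + path)
    del tree[path]

def delete_recursive(path):
    # The workaround: remove children depth-first before the parent.
    for child in tree[path]:
        delete_recursive(path + "/" + child)
    del tree[path]

try:
    delete("/overseer/collection-map-completed/mn-node_lost_trigger")
except RuntimeError as e:
    print(e)  # Directory not empty for /overseer/...

delete_recursive("/overseer/collection-map-completed/mn-node_lost_trigger")
```

The trace shows DistributedMap.remove issuing exactly such a plain delete, which is why the request fails whenever the status znode still has children.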
[jira] [Commented] (SOLR-12495) Enhance the Autoscaling policy syntax to evenly distribute replicas
[ https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521323#comment-16521323 ] Jerry Bao commented on SOLR-12495: -- {quote} Is there anything that's not already addressed by that? I understand that it won't show any violations if you are already in an imbalanced state. {quote} That's the main issue: no violations if you're already in an imbalanced state. If the autoscaling suggestions also suggested moving replicas to a more balanced state (based on the preferences) without any violations, that would solve this issue. We have machines with 0 load on them because the collections are distributed amongst the machines but the replicas aren't. We also see machines with too much load because they have one of every collection's replicas on them. {quote} This can always lead to violations which are impossible to satisfy {quote} I think this can lead to violations that are impossible to satisfy because fixing a violation often takes multiple steps, something like a 3-way triangle movement. I understand that the more movements you allow, the more combinations you have to check (exponentially so), but I think we can be smarter here about deciding which machines are definitely possible to move to and which don't make sense to move to. If we could incorporate the preferences into suggestions (so that the trigger can move things to be more balanced based on our preferences), that should help us a lot here. > Enhance the Autoscaling policy syntax to evenly distribute replicas > --- > > Key: SOLR-12495 > URL: https://issues.apache.org/jira/browse/SOLR-12495 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Components: AutoScaling > Reporter: Noble Paul > Priority: Major > > Support a new function value for {{replica= "#MINIMUM"}} > {{#MINIMUM}} means the minimum computed value for the given configuration > the value of replica will be calculated as {{<= Math.ceil(number_of_replicas/number_of_valid_nodes) }} > *example 1:* > {code:java} > {"replica" : "#MINIMUM" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *case 1* : nodes=3, replicationFactor=4 > the value of replica will be calculated as {{Math.ceil(4/3) = 2}} > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<3" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *case 2* : nodes=3, replicationFactor=2 > the value of replica will be calculated as {{Math.ceil(2/3) = 1}} > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<2" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *example 2:* > {code} > {"replica" : "#MINIMUM" , "node" : "#ANY"}{code} > case 1: numShards = 2, replicationFactor=3, nodes = 5 > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<3" , "node" : "#ANY"} > {code} > *example 3:* > {code} > {"replica" : "#MINIMUM" , "shard" : "#EACH" , "port" : "8983"}{code} > case 1: {{replicationFactor=3, nodes with port 8983 = 2}} > this is equivalent to the hard coded rule > {code} > {"replica" : "<3" , "shard" : "#EACH" , "port" : "8983"}{code}
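The computed value described in the issue reduces to a one-line calculation. A sketch of the arithmetic (assuming the formula exactly as stated, ceil of replicas over valid nodes; the helper name is illustrative):

```python
import math

def minimum_replica_limit(num_replicas, num_valid_nodes):
    # replica <= ceil(replicas / nodes); the policy engine would
    # express this as the hard-coded rule "<limit+1".
    return math.ceil(num_replicas / num_valid_nodes)

# example 1, case 1: nodes=3, replicationFactor=4 -> 2, i.e. "<3"
print(minimum_replica_limit(4, 3))

# example 2: numShards=2, replicationFactor=3, nodes=5 -> 6 cores, 2 per node, "<3"
print(minimum_replica_limit(2 * 3, 5))
```

Because the bound is recomputed as replicas are added, the same `#MINIMUM` rule yields "<2" early in a rollout and "<3" only once the replica count exceeds the node count.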
[jira] [Commented] (SOLR-11985) Allow percentage in replica attribute in policy
[ https://issues.apache.org/jira/browse/SOLR-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521322#comment-16521322 ] Jerry Bao commented on SOLR-11985: -- SOLR-12511 should definitely solve the issue I was speaking of :). {quote} But the problem is that once you are already in a badly distributed cluster, it won't show any violations. {quote} Yep, that's the problem I was hoping we could avoid. Balancing needs an in-between (such as 2-3 replicas per machine) to be distributed, not a maximum/minimum. > Allow percentage in replica attribute in policy > --- > > Key: SOLR-11985 > URL: https://issues.apache.org/jira/browse/SOLR-11985 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud > Reporter: Shalin Shekhar Mangar > Assignee: Noble Paul > Priority: Major > Fix For: master (8.0), 7.5 > > Attachments: SOLR-11985.patch, SOLR-11985.patch > > > Today we can only specify an absolute number in the 'replica' attribute in > the policy rules. It'd be useful to write a percentage value to make certain > use-cases easier. For example: > {code:java} > // Keep a third of the replicas of each shard in east region > {"replica" : "<34%", "shard" : "#EACH", "sysprop:region": "east"} > // Keep two thirds of the replicas of each shard in west region > {"replica" : "<67%", "shard" : "#EACH", "sysprop:region": "west"} > {code} > Today the above must be represented by different rules for each collection if > they have different replication factors. Also if the replication factor > changes later, the absolute value has to be changed in tandem. So expressing > a percentage removes both of these restrictions. > This feature means that the value of the attribute {{"replica"}} is only > available just in time. We call such values {{"computed values"}}. The > computed value for this attribute depends on other attributes as well. > Take the following 2 rules > {code:java} > //example 1 > {"replica" : "<34%", "shard" : "#EACH", "sysprop:region": "east"} > //example 2 > {"replica" : "<34%", "sysprop:region": "east"} > {code} > assume we have collection {{"A"}} with 2 shards and {{replicationFactor=3}} > *example 1* would mean that the value of replica is computed as > {{3 * 34 / 100 = 1.02}} > which means *_for each shard_* keep less than 1.02 replicas in the east > availability zone > > *example 2* would mean that the value of replica is computed as > {{3 * 2 * 34 / 100 = 2.04}} > > which means _*for each collection*_ keep less than 2.04 replicas in the east > availability zone
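The percentage computation in the description is simple enough to sketch (assuming the stated formula; the helper name is illustrative, not Solr API):

```python
def computed_replica_limit(percent, replication_factor, num_shards=1):
    # The bound is computed just in time from live replica counts:
    # with {"shard": "#EACH"} the base is one shard's replicas,
    # without it the base is all of the collection's replicas.
    return replication_factor * num_shards * percent / 100.0

# example 1: {"replica": "<34%", "shard": "#EACH", ...} with RF=3
print(computed_replica_limit(34, 3))                # -> at most 1 replica per shard

# example 2: {"replica": "<34%", ...} with RF=3 and 2 shards
print(computed_replica_limit(34, 3, num_shards=2))  # -> at most 2 per collection
```

Because the bound scales with the replication factor, the same percentage rule covers collections with different replication factors, which is the point of the feature.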
[jira] [Commented] (SOLR-11985) Allow percentage in replica attribute in policy
[ https://issues.apache.org/jira/browse/SOLR-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520669#comment-16520669 ] Jerry Bao commented on SOLR-11985: -- Given the way it was written, the concern I had was the following: one collection has shards with 3 replicas and another collection has shards with 4 replicas. If I had the following set of rules... {code} {"replica" : "<33%", "shard" : "#EACH", "sysprop:region": "us-east-1a"} {"replica" : "<33%", "shard" : "#EACH", "sysprop:region": "us-east-1b"} {"replica" : "<33%", "shard" : "#EACH", "sysprop:region": "us-east-1c"} {code} My concern was that it would turn into {code} {"replica" : "<2", "shard" : "#EACH", "sysprop:region": "us-east-1a"} {"replica" : "<2", "shard" : "#EACH", "sysprop:region": "us-east-1b"} {"replica" : "<2", "shard" : "#EACH", "sysprop:region": "us-east-1c"} {code} for the collection with 3 replicas, and {code} {"replica" : "<3", "shard" : "#EACH", "sysprop:region": "us-east-1a"} {"replica" : "<3", "shard" : "#EACH", "sysprop:region": "us-east-1b"} {"replica" : "<3", "shard" : "#EACH", "sysprop:region": "us-east-1c"} {code} for the collection with 4 replicas. In the collection with 4 replicas, you could have 2 replicas on us-east-1a and 2 replicas on us-east-1b. What we really want is 1 in each zone before placing the 4th replica in any zone. Because of the way the rules are set up, they are treated individually when they should be treated together, evenly balancing the replicas based on the number of zones available. We could make it work by writing different zone rules per collection, but that shouldn't be necessary. Rack awareness (which is what we're trying to achieve here) should be collection-agnostic and apply to each collection. https://issues.apache.org/jira/browse/SOLR-12511 would help here.
[jira] [Commented] (SOLR-12495) Enhance the Autoscaling policy syntax to evenly distribute replicas
[ https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520654#comment-16520654 ] Jerry Bao commented on SOLR-12495: -- {quote} Actually, the terms replica, shard are always associated with a collection. If the attribute shard is present, the replica counts are computed on a per-shard basis; if it is absent, it is computed on a per-collection basis. The equivalent term for a replica globally is a core, which is not associated with a collection or shard {quote} I see; could {"core": "#MINIMUM", "node": "#ANY"} be included with this issue? Along with per-collection balancing, we'll also need cluster-wide balancing. {quote} That means the no. of replicas will have to be between 1 and 2 (inclusive). Which means both 1 and 2 are valid, but 0, 3 or >3 are invalid, and the list of violations will show that {quote} Awesome! No qualms here then :) Thanks for all your help on this issue! Cluster balancing is a critical issue for us @ Reddit.
[jira] [Comment Edited] (SOLR-12495) Make it possible to evenly distribute replicas
[ https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519668#comment-16519668 ] Jerry Bao edited comment on SOLR-12495 at 6/21/18 6:40 PM: --- {quote}well {code:java} {"replica": "#MINIMUM", "node": "#ANY"} {code} means it is applied on a per collection basis {quote} That seems confusing to me; the way I read it is: keep a minimum number of replicas on every node. Just to clarify, when you say per-collection basis, do you mean each collection is balanced? If so, will there be a way to keep the entire cluster balanced irrespective of collection? Is that covered by the core preference? My concern here is that without a way to keep the entire cluster balanced irrespective of collection, you'll end up with some nodes holding one replica of every collection and other nodes holding 0 replicas. For example, if you had three collections with 30 replicas each and 45 nodes, you could end up with 30 nodes each holding one of each collection's replicas, and 15 nodes with 0 replicas, which is unbalanced. {quote}In reality, it works slightly differently. The value "<3" is not a constant. It keeps varying as every replica is created. For instance, when replica #40 is being created, the value is (40/40 = 1), which is like saying {{replica:"<2"}}; whereas when replica #41 is created, it suddenly becomes {{"replica" : "<3"}}. So allocations actually happen evenly {quote} I understand that it's not constant, but what I'm saying is that the rule can be satisfied while the cluster is still unbalanced. If I have 42 replicas and 40 nodes, I would want 1 replica on every node before getting 2 on any node. ceil(42/40) yields a "<3" rule, which allows 2 replicas on 21 nodes; that satisfies the rule but is not balanced.
[jira] [Commented] (SOLR-12495) Make it possible to evenly distribute replicas
[ https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519668#comment-16519668 ] Jerry Bao commented on SOLR-12495: -- {quote}well {code:java} {"replica": "#MINIMUM", "node": "#ANY"} {code} means it is applied on a per collection basis {quote} That seems confusing to me; the way I read it is: keep a minimum number of replicas on every node. Just to clarify, when you say per-collection basis, do you mean each collection is balanced? If so, will there be a way to keep the entire cluster balanced irrespective of collection? Is that covered by the core preference? My concern here is that without a way to keep the entire cluster balanced irrespective of collection, you'll end up with some nodes holding one replica of every collection and other nodes holding 0 replicas. For example, if you had three collections with 30 replicas each and 45 nodes, you could end up with 30 nodes each holding one collection's replica, and 15 nodes with 0 replicas, which is unbalanced. {quote}In reality, it works slightly differently. The value "<3" is not a constant. It keeps varying as every replica is created. For instance, when replica #40 is being created, the value is (40/40 = 1), which is like saying {{replica:"<2"}}; whereas when replica #41 is created, it suddenly becomes {{"replica" : "<3"}}. So allocations actually happen evenly {quote} I understand that it's not constant, but what I'm saying is that the rule can be satisfied while the cluster is still unbalanced. If I have 42 replicas and 40 nodes, I would want 1 replica on every node before getting 2 on any node. ceil(42/40) yields a "<3" rule, which allows 2 replicas on 21 nodes; that satisfies the rule but is not balanced.
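The 42-replicas-on-40-nodes example is easy to check mechanically. A sketch (assumption: modeling the derived "<3" rule as a per-node upper bound, nothing more) showing that the rule admits both the balanced and the lopsided layout:

```python
import math

replicas, nodes = 42, 40
limit = math.ceil(replicas / nodes)  # 2, so the derived rule is "<3"

def satisfies_rule(per_node_counts):
    # Every node must carry at most `limit` replicas (i.e. fewer than
    # limit+1, the "<3" rule), and the counts must place all replicas.
    return sum(per_node_counts) == replicas and all(c <= limit for c in per_node_counts)

balanced = [2] * 2 + [1] * 38    # the desired layout: extras on only 2 nodes
lopsided = [2] * 21 + [0] * 19   # also legal under "<3", but 19 idle nodes

print(satisfies_rule(balanced), satisfies_rule(lopsided))  # True True
```

Both layouts pass, which is exactly the complaint: the upper bound alone cannot distinguish an even spread from a skewed one, so balance has to come from the node-ordering preferences.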
[jira] [Commented] (SOLR-11985) Allow percentage in replica attribute in policy
[ https://issues.apache.org/jira/browse/SOLR-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518984#comment-16518984 ] Jerry Bao commented on SOLR-11985: -- [~noble.paul] that would mean each shard would have to have the same number of replicas, which might not be the case within a collection or among all collections. It would be nice if there were a set of policies that evenly distributed a shard's replicas amongst a property without having to specify different rules per collection based on how many replicas each shard has. I agree that if all of the shards had the same number of replicas we could change the numbers, but that isn't always the case. Does that make sense?
[jira] [Commented] (SOLR-11985) Allow percentage in replica attribute in policy
[ https://issues.apache.org/jira/browse/SOLR-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518480#comment-16518480 ] Jerry Bao commented on SOLR-11985: -- [~noble.paul] What would happen if I had 5 replicas and 3 zones for a shard? Is it possible to make a rule that balances the replicas on a shard as 2 on us-east-1a, 2 on us-east-1b, and 1 on us-east-1c? > Allow percentage in replica attribute in policy > --- > > Key: SOLR-11985 > URL: https://issues.apache.org/jira/browse/SOLR-11985 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Noble Paul >Priority: Major > Fix For: master (8.0), 7.5 > > Attachments: SOLR-11985.patch, SOLR-11985.patch > > > Today we can only specify an absolute number in the 'replica' attribute in > the policy rules. It'd be useful to write a percentage value to make certain > use-cases easier. For example: > {code:java} > // Keep a third of the the replicas of each shard in east region > {"replica" : "<34%", "shard" : "#EACH", "sysprop:region": "east"} > // Keep two thirds of the the replicas of each shard in west region > {"replica" : "<67%", "shard" : "#EACH", "sysprop:region": "west"} > {code} > Today the above must be represented by different rules for each collection if > they have different replication factors. Also if the replication factor > changes later, the absolute value has to be changed in tandem. So expressing > a percentage removes both of these restrictions. > This feature means that the value of the attribute {{"replica"}} is only > available just in time. We call such values {{"computed values"}} . The > computed value for this attribute depends on other attributes as well. 
> Take the following 2 rules > {code:java} > //example 1 > {"replica" : "<34%", "shard" : "#EACH", "sysprop:region": "east"} > //example 2 > {"replica" : "<34%", "sysprop:region": "east"} > {code} > assume we have collection {{"A"}} with 2 shards and {{replicationFactor=3}} > *example 1* would mean that the value of replica is computed as > {{3 * 34 / 100 = 1.02}} > Which means *_for each shard_* keep less than 1.02 replica in east > availability zone > > *example 2* would mean that the value of replica is computed as > {{3 * 2 * 34 / 100 = 2.04}} > > which means _*for each collection*_ keep less than 2.04 replicas on east > availability zone -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
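The computed-value arithmetic in the two examples above can be sketched as a small Python helper. This is a hypothetical illustration of the math only, not Solr's actual implementation:

```python
def computed_replica_bound(replication_factor, num_shards, pct, per_shard):
    # per_shard=True models example 1 ({"shard": "#EACH"}): the percentage
    # applies to one shard's replicas. per_shard=False models example 2:
    # the percentage applies to all replicas of the collection.
    total = replication_factor if per_shard else replication_factor * num_shards
    return total * pct / 100

# Collection "A": 2 shards, replicationFactor=3, rule "<34%"
print(computed_replica_bound(3, 2, 34, per_shard=True))   # example 1: 1.02
print(computed_replica_bound(3, 2, 34, per_shard=False))  # example 2: 2.04
```

With a fractional bound like 1.02, "less than 1.02 replicas" effectively means at most 1 replica per shard in that zone.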
[jira] [Commented] (SOLR-12495) Make it possible to evenly distribute replicas
[ https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518476#comment-16518476 ] Jerry Bao commented on SOLR-12495: -- Wanted to add a couple of comments: It would be great if this occurred per-collection. For example, a collection with 42 replicas and 40 nodes should expect to have one replica from that collection on each node, with 2 nodes having 2 replicas. {"replica": "#MINIMUM", "collection": "#EACH", "node": "#ANY"} Cluster-wide would also go along with this, making sure each node has a similar number of replicas. {"replica": "#MINIMUM", "node": "#ANY"} A warning: "<3" (since ceil(42/40) = 2) works, but only after each node has one replica. That rule also allows 2 replicas on 21 nodes, which is not as good as 1 replica on every node with 2 replicas on 2 of them. I think this should be fixed by ordering the nodes by preference, but only if the list is updated after each movement. [~noble.paul] FYI > Make it possible to evenly distribute replicas > -- > > Key: SOLR-12495 > URL: https://issues.apache.org/jira/browse/SOLR-12495 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. 
Issues are Public) > Components: AutoScaling >Reporter: Noble Paul >Priority: Major > > Support a new function value for {{replica= "#MINIMUM"}} > {{#MINIMUM}} means the minimum computed value for the given configuration > the value of replica will be calculated as {{<= Math.ceil(number_of_replicas/number_of_valid_nodes)}} > *example 1:* > {code:java} > {"replica" : "#MINIMUM" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *case 1* : nodes=3, replicationFactor=4 > the value of replica will be calculated as {{Math.ceil(4/3) = 2}} > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<3" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *case 2* : nodes=3, replicationFactor=2 > the value of replica will be calculated as {{Math.ceil(2/3) = 1}} > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<2" , "shard" : "#EACH" , "node" : "#ANY"} > {code} > *example 2:* > {code} > {"replica" : "#MINIMUM" , "node" : "#ANY"}{code} > case 1: numShards = 2, replicationFactor=3, nodes = 5 > this is equivalent to the hard coded rule > {code:java} > {"replica" : "<3" , "node" : "#ANY"} > {code} > *example 3:* > {code} > {"replica" : "<2" , "shard" : "#EACH" , "port" : "8983"}{code} > case 1: {{replicationFactor=3, nodes with port 8983 = 2}} > this is equivalent to the hard coded rule > {code} > {"replica" : "<3" , "shard" : "#EACH" , "port" : "8983"}{code}
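The {{#MINIMUM}} computation described above is a ceiling division. A quick Python sketch (illustrative only, not the Solr code) covering the cases discussed:

```python
import math

def minimum_bound(num_replicas, num_valid_nodes):
    # #MINIMUM resolves to: replica <= ceil(number_of_replicas / number_of_valid_nodes)
    return math.ceil(num_replicas / num_valid_nodes)

# example 1, case 1: nodes=3, replicationFactor=4 -> bound 2, i.e. the rule "<3"
print(minimum_bound(4, 3))    # 2
# the 42-replica / 40-node collection from the comment above -> bound 2,
# so a couple of nodes may hold 2 replicas while the rest hold 1
print(minimum_bound(42, 40))  # 2
```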
[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495708#comment-16495708 ] Jerry Bao commented on SOLR-12088: -- [~caomanhdat] I can't confirm or deny whether or not this has been fixed, but I'm happy with closing this out and reopening if we see it again. > Shards with dead replicas cause increased write latency > --- > > Key: SOLR-12088 > URL: https://issues.apache.org/jira/browse/SOLR-12088 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Major > > If a collection's shard contains dead replicas, write latency to the > collection is increased. For example, if a collection has 10 shards with a > replication factor of 3, and one of those shards contains 3 replicas and 3 > downed replicas, write latency is increased in comparison to a shard that > contains only 3 replicas. > My feeling here is that downed replicas should be completely ignored and not > cause issues to other alive replicas in terms of write latency.
[jira] [Updated] (SOLR-12358) Autoscaling suggestions fail randomly and for certain policies
[ https://issues.apache.org/jira/browse/SOLR-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12358: - Attachment: diagnostics nodes > Autoscaling suggestions fail randomly and for certain policies > -- > > Key: SOLR-12358 > URL: https://issues.apache.org/jira/browse/SOLR-12358 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 7.3.1 >Reporter: Jerry Bao >Priority: Critical > Attachments: diagnostics, nodes > > > For the following policy > {code:java} > {"cores": "<4","node": "#ANY"}{code} > the suggestions endpoint fails > {code:java} > "error": {"msg": "Comparison method violates its general contract!","trace": > "java.lang.IllegalArgumentException: Comparison method violates its general > contract!\n\tat java.util.TimSort.mergeHi(TimSort.java:899)\n\tat > java.util.TimSort.mergeAt(TimSort.java:516)\n\tat > java.util.TimSort.mergeCollapse(TimSort.java:441)\n\tat > java.util.TimSort.sort(TimSort.java:245)\n\tat > java.util.Arrays.sort(Arrays.java:1512)\n\tat > java.util.ArrayList.sort(ArrayList.java:1462)\n\tat > java.util.Collections.sort(Collections.java:175)\n\tat > org.apache.solr.client.solrj.cloud.autoscaling.Policy.setApproxValuesAndSortNodes(Policy.java:363)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.applyRules(Policy.java:310)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:272)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:376)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSuggestions(PolicyHelper.java:214)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleSuggestions(AutoScalingHandler.java:158)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleRequestBody(AutoScalingHandler.java:133)\n\tat > > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat > org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:242)\n\tat > org.apache.solr.api.V2HttpCall.handleAdmin(V2HttpCall.java:311)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat > > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat >
[jira] [Commented] (SOLR-12358) Autoscaling suggestions fail randomly and for certain policies
[ https://issues.apache.org/jira/browse/SOLR-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481000#comment-16481000 ] Jerry Bao commented on SOLR-12358: -- [~noble.paul] Updated! > Autoscaling suggestions fail randomly and for certain policies > -- > > Key: SOLR-12358 > URL: https://issues.apache.org/jira/browse/SOLR-12358 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 7.3.1 >Reporter: Jerry Bao >Priority: Critical > Attachments: diagnostics, nodes > > > For the following policy > {code:java} > {"cores": "<4","node": "#ANY"}{code} > the suggestions endpoint fails > {code:java} > "error": {"msg": "Comparison method violates its general contract!","trace": > "java.lang.IllegalArgumentException: Comparison method violates its general > contract!\n\tat java.util.TimSort.mergeHi(TimSort.java:899)\n\tat > java.util.TimSort.mergeAt(TimSort.java:516)\n\tat > java.util.TimSort.mergeCollapse(TimSort.java:441)\n\tat > java.util.TimSort.sort(TimSort.java:245)\n\tat > java.util.Arrays.sort(Arrays.java:1512)\n\tat > java.util.ArrayList.sort(ArrayList.java:1462)\n\tat > java.util.Collections.sort(Collections.java:175)\n\tat > org.apache.solr.client.solrj.cloud.autoscaling.Policy.setApproxValuesAndSortNodes(Policy.java:363)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.applyRules(Policy.java:310)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:272)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:376)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSuggestions(PolicyHelper.java:214)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleSuggestions(AutoScalingHandler.java:158)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleRequestBody(AutoScalingHandler.java:133)\n\tat > > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat > org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:242)\n\tat > org.apache.solr.api.V2HttpCall.handleAdmin(V2HttpCall.java:311)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat > > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat >
[jira] [Updated] (SOLR-12358) Autoscaling suggestions fail randomly and for certain policies
[ https://issues.apache.org/jira/browse/SOLR-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12358: - Description: For the following policy {code:java} {"cores": "<4","node": "#ANY"}{code} the suggestions endpoint fails {code:java} "error": {"msg": "Comparison method violates its general contract!","trace": "java.lang.IllegalArgumentException: Comparison method violates its general contract!\n\tat java.util.TimSort.mergeHi(TimSort.java:899)\n\tat java.util.TimSort.mergeAt(TimSort.java:516)\n\tat java.util.TimSort.mergeCollapse(TimSort.java:441)\n\tat java.util.TimSort.sort(TimSort.java:245)\n\tat java.util.Arrays.sort(Arrays.java:1512)\n\tat java.util.ArrayList.sort(ArrayList.java:1462)\n\tat java.util.Collections.sort(Collections.java:175)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy.setApproxValuesAndSortNodes(Policy.java:363)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.applyRules(Policy.java:310)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:272)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:376)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSuggestions(PolicyHelper.java:214)\n\tat org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleSuggestions(AutoScalingHandler.java:158)\n\tat org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleRequestBody(AutoScalingHandler.java:133)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:242)\n\tat org.apache.solr.api.V2HttpCall.handleAdmin(V2HttpCall.java:311)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)\n\tat
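The {{IllegalArgumentException}} in the trace is TimSort's runtime check: Java's sort throws "Comparison method violates its general contract!" when a comparator is not anti-symmetric and transitive across calls. A minimal Python sketch of one plausible way this can happen (a comparator reading values that change while the sort is in progress; hypothetical, not the actual Policy code):

```python
def make_comparator(metrics):
    # Compares two nodes by a live metric; if the metric changes between
    # comparisons, the comparator is no longer self-consistent.
    def cmp(a, b):
        return metrics[a] - metrics[b]
    return cmp

metrics = {"node1": 5, "node2": 3}
cmp = make_comparator(metrics)

before = cmp("node1", "node2")  # node1 sorts after node2 (positive)
metrics["node2"] = 9            # the underlying value moves mid-sort
after = cmp("node2", "node1")   # now node2 also sorts after node1 (positive)

# Anti-symmetry requires sgn(cmp(x, y)) == -sgn(cmp(y, x)); here both calls
# return a positive value, which is exactly the kind of inconsistency
# TimSort detects and reports.
print(before > 0 and after > 0)  # True
```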
[jira] [Updated] (SOLR-12358) Autoscaling suggestions fail randomly and for certain policies
[ https://issues.apache.org/jira/browse/SOLR-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12358: - Priority: Critical (was: Major) > Autoscaling suggestions fail randomly and for certain policies > -- > > Key: SOLR-12358 > URL: https://issues.apache.org/jira/browse/SOLR-12358 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.3.1 >Reporter: Jerry Bao >Priority: Critical > > For the following policy > {code:java} > {"replica": "<3", "node": "#ANY", "collection": "collection"}{code} > the suggestions endpoint fails > {code:java} > "error": {"msg": "Comparison method violates its general contract!","trace": > "java.lang.IllegalArgumentException: Comparison method violates its general > contract!\n\tat java.util.TimSort.mergeHi(TimSort.java:899)\n\tat > java.util.TimSort.mergeAt(TimSort.java:516)\n\tat > java.util.TimSort.mergeCollapse(TimSort.java:441)\n\tat > java.util.TimSort.sort(TimSort.java:245)\n\tat > java.util.Arrays.sort(Arrays.java:1512)\n\tat > java.util.ArrayList.sort(ArrayList.java:1462)\n\tat > java.util.Collections.sort(Collections.java:175)\n\tat > org.apache.solr.client.solrj.cloud.autoscaling.Policy.setApproxValuesAndSortNodes(Policy.java:363)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.applyRules(Policy.java:310)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:272)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:376)\n\tat > > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSuggestions(PolicyHelper.java:214)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleSuggestions(AutoScalingHandler.java:158)\n\tat > > org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleRequestBody(AutoScalingHandler.java:133)\n\tat > > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat > org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:242)\n\tat > org.apache.solr.api.V2HttpCall.handleAdmin(V2HttpCall.java:311)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat > > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat
[jira] [Created] (SOLR-12358) Autoscaling suggestions fail randomly and for certain policies
Jerry Bao created SOLR-12358: Summary: Autoscaling suggestions fail randomly and for certain policies Key: SOLR-12358 URL: https://issues.apache.org/jira/browse/SOLR-12358 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 7.3.1 Reporter: Jerry Bao For the following policy {code:java} {"replica": "<3", "node": "#ANY", "collection": "collection"}{code} the suggestions endpoint fails {code:java} "error": {"msg": "Comparison method violates its general contract!","trace": "java.lang.IllegalArgumentException: Comparison method violates its general contract!\n\tat java.util.TimSort.mergeHi(TimSort.java:899)\n\tat java.util.TimSort.mergeAt(TimSort.java:516)\n\tat java.util.TimSort.mergeCollapse(TimSort.java:441)\n\tat java.util.TimSort.sort(TimSort.java:245)\n\tat java.util.Arrays.sort(Arrays.java:1512)\n\tat java.util.ArrayList.sort(ArrayList.java:1462)\n\tat java.util.Collections.sort(Collections.java:175)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy.setApproxValuesAndSortNodes(Policy.java:363)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.applyRules(Policy.java:310)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:272)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:376)\n\tat org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSuggestions(PolicyHelper.java:214)\n\tat org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleSuggestions(AutoScalingHandler.java:158)\n\tat org.apache.solr.cloud.autoscaling.AutoScalingHandler.handleRequestBody(AutoScalingHandler.java:133)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)\n\tat org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:242)\n\tat org.apache.solr.api.V2HttpCall.handleAdmin(V2HttpCall.java:311)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:530)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat
[jira] [Commented] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419464#comment-16419464 ] Jerry Bao commented on SOLR-12087: -- Can we get this fix backported to 7.3 and have a 7.3.1? > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Assignee: Cao Manh Dat >Priority: Critical > Fix For: 7.4 > > Attachments: SOLR-12087.patch, SOLR-12087.patch, SOLR-12087.patch, > SOLR-12087.test.patch, Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. > Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The deleted replica node spams that it cannot locate the core because it's > been deleted. 
> During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (spamming DELETEREPLICA on the > shard until it's removed from the state) > My guess is there are two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes.
[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412212#comment-16412212 ] Jerry Bao commented on SOLR-12088: -- [~caomanhdat] It seems to last forever, though I cannot confirm that 100%; it definitely lasts past an hour. Why does the number of LIR threads started decrease as time goes on? > Shards with dead replicas cause increased write latency > --- > > Key: SOLR-12088 > URL: https://issues.apache.org/jira/browse/SOLR-12088 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Major > > If a collection's shard contains dead replicas, write latency to the > collection is increased. For example, if a collection has 10 shards with a > replication factor of 3, and one of those shards contains 3 replicas and 3 > downed replicas, write latency is increased in comparison to a shard that > contains only 3 replicas. > My feeling here is that downed replicas should be completely ignored and not > cause issues to other alive replicas in terms of write latency. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
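The mixed shard described in that report (3 active replicas plus 3 downed ones) is easy to spot by summarizing replica states per shard from CLUSTERSTATUS-style JSON. The layout below is invented sample data mirroring the scenario, not output from the issue.

```python
import json

# Hypothetical CLUSTERSTATUS-style snippet: one shard with 3 active and
# 3 down replicas, matching the scenario described in SOLR-12088.
STATE = json.loads("""
{
  "shards": {
    "shard1": {
      "replicas": {
        "core_node1": {"state": "active"},
        "core_node2": {"state": "active"},
        "core_node3": {"state": "active"},
        "core_node4": {"state": "down"},
        "core_node5": {"state": "down"},
        "core_node6": {"state": "down"}
      }
    }
  }
}
""")

def shard_health(state):
    """Map each shard to (active_count, down_count) so shards carrying
    dead replicas -- the ones seeing the extra write latency -- stand out."""
    out = {}
    for shard, sdata in state["shards"].items():
        states = [r["state"] for r in sdata["replicas"].values()]
        out[shard] = (states.count("active"), states.count("down"))
    return out
```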
[jira] [Commented] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408177#comment-16408177 ] Jerry Bao commented on SOLR-12087: -- [~caomanhdat] That sounds exactly like the case I'm running into. I can't confirm that I saw every log you said I should see, but I definitely saw the leader logs you mentioned. {quote}You wrote that Attempting to delete the downed replicas causes failures because the core does not exist anymore. {quote} Sorry, I should have been clearer here: it causes failures, but not failures that block the deletion of the replica; the replica does eventually get deleted. {quote}Make sure that on the 2nd call of DeleteReplica (for removing zombie replica), parameters are correct because the name of the replica may get changed, ie: from core_node3 to core_node4. {quote} I wrote a small script to find all downed replicas and issue a delete command against them, which does take into account the name change. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: SOLR-12087.test.patch, Screen Shot 2018-03-16 at > 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. 
> Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The deleted replica node spams that it cannot locate the core because it's > been deleted. > During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (spamming DELETEREPLICA on the > shard until its removed from the state) > My guess is there's two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
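The "small script to find all downed replicas and issue a delete command against them" mentioned in the comment above could look roughly like this. It is a sketch, not the reporter's actual script: the collection, node, and replica names are made-up sample data, and building the request from the *current* replica name is what accounts for the core_node3 -> core_node4 rename caveat.

```python
import json

# Hypothetical CLUSTERSTATUS-style data; names are invented for illustration.
SAMPLE_STATE = json.loads("""
{
  "r_posts": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node3": {"state": "down", "node_name": "solr-1:8983_solr"},
          "core_node4": {"state": "active", "node_name": "solr-2:8983_solr"}
        }
      }
    }
  }
}
""")

def downed_replicas(state):
    """Yield (collection, shard, replica_name) for every replica marked down.

    Re-reading the state for the current replica name matters because the
    name can change between delete attempts (e.g. core_node3 -> core_node4).
    """
    for collection, cdata in state.items():
        for shard, sdata in cdata["shards"].items():
            for replica, rdata in sdata["replicas"].items():
                if rdata["state"] == "down":
                    yield collection, shard, replica

def delete_replica_url(base, collection, shard, replica):
    # Solr Collections API: DELETEREPLICA takes collection, shard, replica.
    return (f"{base}/admin/collections?action=DELETEREPLICA"
            f"&collection={collection}&shard={shard}&replica={replica}")

urls = [delete_replica_url("http://localhost:8983/solr", *t)
        for t in downed_replicas(SAMPLE_STATE)]
```

Issuing the resulting URLs (e.g. with an HTTP client of your choice) is left out, since per the comment the deletes can fail transiently and need retrying anyway.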
[jira] [Comment Edited] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402548#comment-16402548 ] Jerry Bao edited comment on SOLR-12087 at 3/16/18 10:28 PM: Adding some more potentially relevant information: We're constantly updating Solr collections via live streaming updates. I noticed that moving shards that don't have live indexing is much easier than those that do. Also heavy indexing seems to be a factor in whether or not zombie shards exist. EDIT: It seems that collections with indexing consistently have zombie shards vs those that dont. was (Author: jerry.bao): Adding some more potentially relevant information: We're constantly updating Solr collections via live streaming updates. I noticed that moving shards that don't have live indexing is much easier than those that do. Also heavy indexing seems to be a factor in whether or not zombie shards exist. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. 
> Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The deleted replica node spams that it cannot locate the core because it's > been deleted. > During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (spamming DELETEREPLICA on the > shard until its removed from the state) > My guess is there's two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402548#comment-16402548 ] Jerry Bao edited comment on SOLR-12087 at 3/16/18 10:28 PM: Adding some more potentially relevant information: We're constantly updating Solr collections via live streaming updates. I noticed that moving shards that don't have live indexing is much easier than those that do. Also heavy indexing seems to be a factor in whether or not zombie shards exist. EDIT: It seems that collections with indexing/querying consistently have zombie shards vs those that dont. was (Author: jerry.bao): Adding some more potentially relevant information: We're constantly updating Solr collections via live streaming updates. I noticed that moving shards that don't have live indexing is much easier than those that do. Also heavy indexing seems to be a factor in whether or not zombie shards exist. EDIT: It seems that collections with indexing consistently have zombie shards vs those that dont. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. 
> Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The deleted replica node spams that it cannot locate the core because it's > been deleted. > During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (spamming DELETEREPLICA on the > shard until its removed from the state) > My guess is there's two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402548#comment-16402548 ] Jerry Bao edited comment on SOLR-12087 at 3/16/18 10:14 PM: Adding some more potentially relevant information: We're constantly updating Solr collections via live streaming updates. I noticed that moving shards that don't have live indexing is much easier than those that do. Also heavy indexing seems to be a factor in whether or not zombie shards exist. was (Author: jerry.bao): I've updated the description with more information. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. > Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. 
> # The deleted replica node spams that it cannot locate the core because it's > been deleted. > During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (spamming DELETEREPLICA on the > shard until its removed from the state) > My guess is there's two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12117) Autoscaling suggestions are too few or non existent for clear violations
Jerry Bao created SOLR-12117: Summary: Autoscaling suggestions are too few or non existent for clear violations Key: SOLR-12117 URL: https://issues.apache.org/jira/browse/SOLR-12117 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Reporter: Jerry Bao Attachments: autoscaling.json, diagnostics.json, solr_instances, suggestions.json Attaching suggestions, diagnostics, autoscaling settings, and the solr_instances AZ's. One of the operations suggested is impossible: {code:java} {"type": "violation","violation": {"node": "solr-0a7207d791bd08d4e:8983_solr","tagKey": "null","violation": {"node": "4","delta": 1},"clause": {"cores": "<4","node": "#ANY"}},"operation": {"method": "POST","path": "/c/r_posts","command": {"move-replica": {"targetNode": "solr-0f0e86f34298f7e79:8983_solr","inPlaceMove": "true","replica": "2151000"}}}{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12117) Autoscaling suggestions are too few or non existent for clear violations
[ https://issues.apache.org/jira/browse/SOLR-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12117: - Description: Attaching suggestions, diagnostics, autoscaling settings, and the solr_instances AZ's. Some violations yield one suggestion too many, while other clear and easily fixable policy violations yield no suggestions at all. (was: Attaching suggestions, diagnostics, autoscaling settings, and the solr_instances AZ's. One of the operations suggested is impossible: {code:java} {"type": "violation","violation": {"node": "solr-0a7207d791bd08d4e:8983_solr","tagKey": "null","violation": {"node": "4","delta": 1},"clause": {"cores": "<4","node": "#ANY"}},"operation": {"method": "POST","path": "/c/r_posts","command": {"move-replica": {"targetNode": "solr-0f0e86f34298f7e79:8983_solr","inPlaceMove": "true","replica": "2151000"}}}{code}) > Autoscaling suggestions are too few or non existent for clear violations > > > Key: SOLR-12117 > URL: https://issues.apache.org/jira/browse/SOLR-12117 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Jerry Bao >Priority: Critical > Attachments: autoscaling.json, diagnostics.json, solr_instances, > suggestions.json > > > Attaching suggestions, diagnostics, autoscaling settings, and the > solr_instances AZ's. Some violations yield one suggestion too many, while > other clear and easily fixable policy violations yield no suggestions at > all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12116) Autoscaling suggests to move a replica that does not exist (all numbers)
[ https://issues.apache.org/jira/browse/SOLR-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12116: - Attachment: solr_instances autoscaling.json diagnostics.json suggestions.json > Autoscaling suggests to move a replica that does not exist (all numbers) > > > Key: SOLR-12116 > URL: https://issues.apache.org/jira/browse/SOLR-12116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Jerry Bao >Priority: Critical > Attachments: autoscaling.json, diagnostics.json, solr_instances, > suggestions.json > > > Attaching suggestions, diagnostics, autoscaling settings, and the > solr_instances AZ's. One of the operations suggested is impossible: > {code:java} > {"type": "violation","violation": {"node": > "solr-0a7207d791bd08d4e:8983_solr","tagKey": "null","violation": {"node": > "4","delta": 1},"clause": {"cores": "<4","node": "#ANY"}},"operation": > {"method": "POST","path": "/c/r_posts","command": {"move-replica": > {"targetNode": "solr-0f0e86f34298f7e79:8983_solr","inPlaceMove": > "true","replica": "2151000"}}}{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12116) Autoscaling suggests to move a replica that does not exist (all numbers)
[ https://issues.apache.org/jira/browse/SOLR-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12116: - Priority: Critical (was: Major) > Autoscaling suggests to move a replica that does not exist (all numbers) > > > Key: SOLR-12116 > URL: https://issues.apache.org/jira/browse/SOLR-12116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Jerry Bao >Priority: Critical > > Attaching suggestions, diagnostics, autoscaling settings, and the > solr_instances AZ's. One of the operations suggested is impossible: > {code:java} > {"type": "violation","violation": {"node": > "solr-0a7207d791bd08d4e:8983_solr","tagKey": "null","violation": {"node": > "4","delta": 1},"clause": {"cores": "<4","node": "#ANY"}},"operation": > {"method": "POST","path": "/c/r_posts","command": {"move-replica": > {"targetNode": "solr-0f0e86f34298f7e79:8983_solr","inPlaceMove": > "true","replica": "2151000"}}}{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12116) Autoscaling suggests to move a replica that does not exist (all numbers)
Jerry Bao created SOLR-12116: Summary: Autoscaling suggests to move a replica that does not exist (all numbers) Key: SOLR-12116 URL: https://issues.apache.org/jira/browse/SOLR-12116 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Reporter: Jerry Bao Attaching suggestions, diagnostics, autoscaling settings, and the solr_instances AZ's. One of the operations suggested is impossible:
{code:java}
{
  "type": "violation",
  "violation": {
    "node": "solr-0a7207d791bd08d4e:8983_solr",
    "tagKey": "null",
    "violation": {"node": "4", "delta": 1},
    "clause": {"cores": "<4", "node": "#ANY"}
  },
  "operation": {
    "method": "POST",
    "path": "/c/r_posts",
    "command": {
      "move-replica": {
        "targetNode": "solr-0f0e86f34298f7e79:8983_solr",
        "inPlaceMove": "true",
        "replica": "2151000"
      }
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
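To make the clause in that violation concrete: {"cores": "<4", "node": "#ANY"} says every node should host fewer than 4 cores, and the violation's "delta" is how far a node is over the bound. A minimal, hypothetical evaluator of such a clause (node names and counts invented, not from the attached diagnostics):

```python
def cores_violations(cores_per_node, limit=4):
    """Evaluate a {"cores": "<limit", "node": "#ANY"} style clause.

    Returns [(node, delta)] where delta is how many cores over the "<limit"
    bound the node is -- analogous to the "delta" field in the violation
    above, where a node with 4 cores against "<4" has delta 1.
    """
    violations = []
    for node, cores in cores_per_node.items():
        if cores >= limit:  # "<4" is violated at 4 or more cores
            violations.append((node, cores - limit + 1))
    return violations
```

The real policy engine of course handles many more tag types; this only illustrates why the quoted node shows "node": "4" with "delta": 1.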
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Description: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en mass; the resulting cause is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since that move is an add and delete. Some more information regarding this issue; when the MOVEREPLICA command is issued, the new replica is created successfully but the replica to be deleted fails to be removed from state.json (the core is deleted though) and we see two logs spammed. # The node containing the leader replica continually (read every second) attempts to initiate recovery on the replica and fails to do so because the core does not exist. As a result it continually publishes a down state for the replica to zookeeper. # The deleted replica node spams that it cannot locate the core because it's been deleted. During this period of time, we see an increase in ZK network connectivity overall, until the replica is finally deleted (spamming DELETEREPLICA on the shard until its removed from the state) My guess is there's two issues at hand here: # The leader continually attempts to recover a downed replica that is unrecoverable because the core does not exist. # The replica to be deleted is having trouble being deleted from state.json in ZK. This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. was: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en mass; the resulting cause is that the data is deleted but the replicas aren't removed from the cluster state. 
Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since that move is an add and delete. Some more information regarding this issue; when the MOVEREPLICA command is issued, the new replica is created successfully but the replica to be deleted fails to be removed from state.json (the core is deleted though) and we see two logs spammed. # The node containing the leader replica continually (read every second) attempts to initiate recovery on the replica and fails to do so because the core does not exist. As a result it continually publishes a down state for the replica to zookeeper. # The replica node spams that it cannot locate the core because it's been deleted. During this period of time, we see an increase in ZK network connectivity overall, until the replica is finally deleted (spamming DELETEREPLICA on the shard until its removed from the state) My guess is there's two issues at hand here: # The leader continually attempts to recover a downed replica that is unrecoverable because the core does not exist. # The replica to be deleted is having trouble being deleted from state.json in ZK. This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. 
Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. > Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The deleted replica node spams that it cannot locate the core because it's > been
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Description: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en mass; the resulting cause is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since that move is an add and delete. Some more information regarding this issue; when the MOVEREPLICA command is issued, the new replica is created successfully but the replica to be deleted fails to be removed from state.json (the core is deleted though) and we see two logs spammed. # The node containing the leader replica continually (read every second) attempts to initiate recovery on the replica and fails to do so because the core does not exist. As a result it continually publishes a down state for the replica to zookeeper. # The replica node spams that it cannot locate the core because it's been deleted. During this period of time, we see an increase in ZK network connectivity overall, until the replica is finally deleted (spamming DELETEREPLICA on the shard until its removed from the state) My guess is there's two issues at hand here: # The leader continually attempts to recover a downed replica that is unrecoverable because the core does not exist. # The replica to be deleted is having trouble being deleted from state.json in ZK. This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. was: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en mass; the resulting cause is that the data is deleted but the replicas aren't removed from the cluster state. 
Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since that move is an add and delete. Some more information regarding this issue; when the MOVEREPLICA command is issued, the new replica is created successfully but the replica to be deleted fails to be removed from state.json (the core is deleted though) and we see two logs spammed. # The node containing the leader replica continually attempts to initiate recovery on the replica and fails to do so because the core does not exist. As a result it continually publishes a down state for the replica to zookeeper. # The replica node spams that it cannot locate the core because it's been deleted. During this period of time, we see an increase in ZK network connectivity overall, until the replica is finally deleted (spamming DELETEREPLICA on the shard until its removed from the state) My guess is there's two issues at hand here: # The leader continually attempts to recover a downed replica that is unrecoverable because the core does not exist. # The replica to be deleted is having trouble being deleted from state.json in ZK. This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.2 >Reporter: Jerry Bao >Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en mass; the > resulting cause is that the data is deleted but the replicas aren't removed > from the cluster state. 
Attempting to delete the downed replicas causes > failures because the core does not exist anymore. > This also occurs when trying to move replicas, since that move is an add and > delete. > Some more information regarding this issue; when the MOVEREPLICA command is > issued, the new replica is created successfully but the replica to be deleted > fails to be removed from state.json (the core is deleted though) and we see > two logs spammed. > # The node containing the leader replica continually (read every second) > attempts to initiate recovery on the replica and fails to do so because the > core does not exist. As a result it continually publishes a down state for > the replica to zookeeper. > # The replica node spams that it cannot locate the core because it's been > deleted. > During this period of time, we
[jira] [Commented] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402548#comment-16402548 ] Jerry Bao commented on SOLR-12087: -- I've updated the description with more information. > Deleting replicas sometimes fails and causes the replicas to exist in the > down state > > > Key: SOLR-12087 > URL: https://issues.apache.org/jira/browse/SOLR-12087 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: SolrCloud > Affects Versions: 7.2 > Reporter: Jerry Bao > Priority: Critical > Attachments: Screen Shot 2018-03-16 at 11.50.32 AM.png > > > Sometimes when deleting replicas, the replica fails to be removed from the > cluster state. This occurs especially when deleting replicas en masse; the > result is that the data is deleted but the replicas aren't removed from the > cluster state. Attempting to delete the downed replicas causes failures > because the core does not exist anymore. > This also occurs when trying to move replicas, since a move is an add plus a > delete. > Some more information regarding this issue: when the MOVEREPLICA command is > issued, the new replica is created successfully, but the replica to be > deleted fails to be removed from state.json (the core is deleted, though), > and we see two logs spammed. > # The node containing the leader replica continually attempts to initiate > recovery on the replica and fails to do so because the core does not exist. > As a result it continually publishes a down state for the replica to > ZooKeeper. > # The replica node spams that it cannot locate the core because it has been > deleted. > During this period of time, we see an increase in ZK network connectivity > overall, until the replica is finally deleted (by re-issuing DELETEREPLICA on > the shard until it is removed from the state). > My guess is there are two issues at hand here: > # The leader continually attempts to recover a downed replica that is > unrecoverable because the core does not exist. > # The replica to be deleted is having trouble being deleted from state.json > in ZK. > This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
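The workaround mentioned above (re-issuing DELETEREPLICA until the replica disappears from the state) can be sketched as a small retry loop against the Collections API. This is only an illustration: the base URL, collection, shard, and replica names are placeholders, and the HTTP call is injected as a function so nothing here depends on a live cluster.

```python
from urllib.parse import urlencode

def delete_replica_url(base_url, collection, shard, replica):
    """Build a Collections API DELETEREPLICA URL (all names are placeholders)."""
    params = urlencode({
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "wt": "json",
    })
    return f"{base_url}/admin/collections?{params}"

def retry_delete(base_url, collection, shard, replica, send, attempts=5):
    """Re-issue DELETEREPLICA until it reports success or attempts run out.

    `send` performs the HTTP call and returns the parsed JSON response;
    success is taken as responseHeader.status == 0.
    """
    url = delete_replica_url(base_url, collection, shard, replica)
    for _ in range(attempts):
        resp = send(url)
        if resp.get("responseHeader", {}).get("status") == 0:
            return True
    return False
```

The loop mirrors what the reporter did by hand; a real fix would of course address why the state.json update fails in the first place.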
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Priority: Critical (was: Major) > Deleting replicas sometimes fails and causes the replicas to exist in the > down state
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Attachment: Screen Shot 2018-03-16 at 11.50.32 AM.png > Deleting replicas sometimes fails and causes the replicas to exist in the > down state
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Description: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en masse; the result is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since a move is an add plus a delete. Some more information regarding this issue: when the MOVEREPLICA command is issued, the new replica is created successfully, but the replica to be deleted fails to be removed from state.json (the core is deleted, though), and we see two logs spammed. # The node containing the leader replica continually attempts to initiate recovery on the replica and fails because the core does not exist. As a result it continually publishes a down state for the replica to ZooKeeper. # The replica node spams that it cannot locate the core because it has been deleted. During this period of time, we see an increase in ZK network connectivity overall, until the replica is finally deleted (by re-issuing DELETEREPLICA on the shard until it is removed from the state). My guess is there are two issues at hand here: # The leader continually attempts to recover a downed replica that is unrecoverable because the core does not exist. # The replica to be deleted is having trouble being deleted from state.json in ZK. This is mostly consistent for my use case. I'm running 7.2.1 with 66 nodes. was: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en masse; the result is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since a move is an add plus a delete.
[jira] [Updated] (SOLR-12087) Deleting replicas sometimes fails and causes the replicas to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Summary: Deleting replicas sometimes fails and causes the replicas to exist in the down state (was: Deleting shards sometimes fails and causes the shard to exist in the down state)
[jira] [Updated] (SOLR-12087) Deleting shards sometimes fails and causes the shard to exist in the down state
[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12087: - Description: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en masse; the result is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. This also occurs when trying to move replicas, since a move is an add plus a delete. was: Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en masse; the result is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. It seems like when deleting replicas, ZK writes are timing out, preventing the cluster state from being properly updated.
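As the description notes, a MOVEREPLICA is effectively an ADDREPLICA followed by a DELETEREPLICA, which is why a failed delete half leaves a stale entry in state.json while the core itself is gone. A minimal sketch of that two-step decomposition, with a hypothetical base URL and hypothetical collection, shard, node, and replica names:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL

def collections_api(action, **params):
    """Build a Collections API URL for the given action."""
    return f"{SOLR}/admin/collections?{urlencode({'action': action, **params})}"

# MOVEREPLICA is conceptually an add followed by a delete: the new replica
# is created on the target node, then the old one is removed. If the delete
# half fails, the old replica lingers in state.json as a zombie even though
# its core has already been deleted.
add_url = collections_api("ADDREPLICA", collection="c", shard="shard1", node="target_node")
del_url = collections_api("DELETEREPLICA", collection="c", shard="shard1", replica="core_node3")
```

This is only an illustration of the decomposition being discussed, not how MOVEREPLICA is implemented internally.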
[jira] [Comment Edited] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399608#comment-16399608 ] Jerry Bao edited comment on SOLR-12088 at 3/14/18 11:35 PM: We've been running on Solr 7.2.1, so it's all been state.json and not clusterstate.json. In regards to re-issuing the DELETEREPLICA command, sometimes that fails; I filed a Jira for that here: SOLR-12087. That was what was causing this second issue. For example purposes, our indexing latency went from 2s to 1.7s after successfully deleting the dead replicas. One thing I did notice is that the dead replicas spam the logs with "unable to unload non-existent core" on the machine that hosts them. Could this be a side effect? was (Author: jerry.bao): We've been running on Solr 7.2.1, so it's all been state.json and not clusterstate.json. In regards to re-issuing the DELETEREPLICA command, sometimes that fails; I filed a Jira for that here: SOLR-12087. That was what was causing this second issue. For example purposes, our indexing latency went from 2s to 1.7s after deleting the dead replicas. One thing I did notice is that the dead replicas spam the logs with "unable to unload non-existent core" on the machine that hosts them. Could this be a side effect? > Shards with dead replicas cause increased write latency > > > Key: SOLR-12088 > URL: https://issues.apache.org/jira/browse/SOLR-12088 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: SolrCloud > Affects Versions: 7.2 > Reporter: Jerry Bao > Priority: Major > > If a collection's shard contains dead replicas, write latency to the > collection is increased. For example, if a collection has 10 shards with a > replication factor of 3, and one of those shards contains 3 replicas and 3 > downed replicas, write latency is increased in comparison to a shard that > contains only 3 replicas. > My feeling here is that downed replicas should be completely ignored and not > cause issues to other alive replicas in terms of write latency.
[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399608#comment-16399608 ] Jerry Bao commented on SOLR-12088: -- We've been running on Solr 7.2.1, so it's all been state.json and not clusterstate.json. In regards to re-issuing the DELETEREPLICA command, sometimes that fails; I filed a Jira for that here: SOLR-12087. That was what was causing this second issue. For example purposes, our indexing latency went from 2s to 1.7s after deleting the dead replicas. One thing I did notice is that the dead replicas spam the logs with "unable to unload non-existent core" on the machine that hosts them. Could this be a side effect? > Shards with dead replicas cause increased write latency
[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399184#comment-16399184 ] Jerry Bao commented on SOLR-12088: -- [~erickerickson] I don't have an answer to your question; this issue occurred from movement of replicas where the move did not completely clean up the replicas' state, leaving them as zombie replicas (data gone, but state still present after the move). Your thinking definitely could explain the higher indexing latency. That makes the most sense to me. How long is this timeout? > Shards with dead replicas cause increased write latency
[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency
[ https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399167#comment-16399167 ] Jerry Bao commented on SOLR-12088: -- Your scenario is what I experienced, so yes :) 1. 30 nodes in the cluster. 2. Every node in the cluster hosts at least one replica. 3. Indexing via Lucidworks Fusion (which I assume is using a SolrJ-based client). 4. Latency is measured through our own service's instrumentation of round-trip indexing time. > Shards with dead replicas cause increased write latency
[jira] [Created] (SOLR-12088) Shards with dead replicas cause increased write latency
Jerry Bao created SOLR-12088: Summary: Shards with dead replicas cause increased write latency Key: SOLR-12088 URL: https://issues.apache.org/jira/browse/SOLR-12088 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.2 Reporter: Jerry Bao If a collection's shard contains dead replicas, write latency to the collection is increased. For example, if a collection has 10 shards with a replication factor of 3, and one of those shards contains 3 replicas and 3 downed replicas, write latency is increased in comparison to a shard that contains only 3 replicas. My feeling here is that downed replicas should be completely ignored and not cause issues to other alive replicas in terms of write latency.
[jira] [Created] (SOLR-12087) Deleting shards sometimes fails and causes the shard to exist in the down state
Jerry Bao created SOLR-12087: Summary: Deleting shards sometimes fails and causes the shard to exist in the down state Key: SOLR-12087 URL: https://issues.apache.org/jira/browse/SOLR-12087 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.2 Reporter: Jerry Bao Sometimes when deleting replicas, the replica fails to be removed from the cluster state. This occurs especially when deleting replicas en masse; the result is that the data is deleted but the replicas aren't removed from the cluster state. Attempting to delete the downed replicas causes failures because the core does not exist anymore. It seems like when deleting replicas, ZK writes are timing out, preventing the cluster state from being properly updated.
[jira] [Updated] (SOLR-12014) Cryptic error message when creating a collection with sharding that violates autoscaling policies
[ https://issues.apache.org/jira/browse/SOLR-12014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Bao updated SOLR-12014: - Description: When creating a collection with sharding and replication factors that are impossible because they would violate autoscaling policies, Solr raises a cryptic exception that is unrelated to the issue. {code:java} { "responseHeader":{ "status":500, "QTime":629}, "Operation create caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error closing CloudSolrClient", "exception":{ "msg":"Error closing CloudSolrClient", "rspCode":500}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"Error closing CloudSolrClient", "trace":"org.apache.solr.common.SolrException: Error closing CloudSolrClient\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:309)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:246)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:224)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:735)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:716)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:497)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat java.lang.Thread.run(Thread.java:748)\n", "code":500}}{code} was: When creating a collection with sharding and replication factors that are impossible because they would violate autoscaling policies, Solr raises a cryptic exception that is unrelated to the issue. > Cryptic error message when creating a collection with sharding that violates > autoscaling policies > - > > Key: SOLR-12014 > URL: https://issues.apache.org/jira/browse/SOLR-12014 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: AutoScaling > Affects Versions: 7.2 > Reporter: Jerry Bao > Priority: Major > > When creating a collection with sharding and replication factors that are > impossible because they would violate autoscaling policies, Solr raises a > cryptic exception that is unrelated to the issue.
[jira] [Created] (SOLR-12014) Cryptic error message when creating a collection with sharding that violates autoscaling policies
Jerry Bao created SOLR-12014: Summary: Cryptic error message when creating a collection with sharding that violates autoscaling policies Key: SOLR-12014 URL: https://issues.apache.org/jira/browse/SOLR-12014 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 7.2 Reporter: Jerry Bao When creating a collection with sharding and replication factors that are impossible because they would violate autoscaling policies, Solr raises a cryptic exception that is unrelated to the issue.
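The failing request in this report is an ordinary Collections API CREATE whose replica placement can't satisfy the cluster's autoscaling policy. A hypothetical reproduction sketch follows; the base URL, collection name, shard/replica counts, and the policy mentioned in the comment are all illustrative, not taken from the report:

```python
from urllib.parse import urlencode

# Hypothetical setup: suppose a cluster policy like
#   {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
# is in place on a small cluster. A CREATE asking for more replicas per
# shard than the policy can place should fail with a policy-violation
# message, but per this report it instead returns the unrelated
# "Error closing CloudSolrClient" exception.
params = urlencode({
    "action": "CREATE",
    "name": "test_collection",
    "numShards": 10,
    "replicationFactor": 3,
})
create_url = f"http://localhost:8983/solr/admin/collections?{params}"
```

Issuing this URL against such a cluster is what surfaces the cryptic 500 response shown in the updated description.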