[jira] [Comment Edited] (SOLR-14347) Autoscaling placement wrong when concurrent replica placements are calculated
[ https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119116#comment-17119116 ] Ilan Ginzburg edited comment on SOLR-14347 at 5/29/20, 6:28 AM:

[~ab] I've created PR https://github.com/apache/lucene-solr/pull/1542 that I believe solves the issue identified above. The fix has two parts:

1. The obvious one: in {{PolicyHelper.getReplicaLocations()}}, the new (post placement computation) Session is returned with the {{SessionWrapper}}, so that the next use sees the assignments of that computation rather than the initial state.

2. A less obvious one: in the same method, the way the current (orig) session is copied to create the new Session is changed so that it no longer validates collections against ZooKeeper. That validation removed from the new session everything that hadn't yet made it to ZooKeeper, thereby hiding assignments in progress. {{Policy.Session.cloneToNewSession()}} contains the copying code (kept as close to the original behavior as possible).

A multithreaded collection creation test (JMeter with 40 threads looping through creating single-shard, single-replica collections) led to a balanced 3-node cluster. Before the fix there were severe imbalances (up to a single node taking all replicas of the run).

After this fix is merged (along with https://github.com/apache/lucene-solr/pull/1504 in SOLR-14462, which deals with Session creation and caching), I believe Session management could benefit from some refactoring and simplification.
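Part 1 of the fix can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical stand-ins, not the real {{PolicyHelper}}/{{Session}} classes; the point is only that the wrapper must hand back the post-computation Session so the next placement computation sees in-flight assignments:

```java
import java.util.*;

// Sketch (hypothetical names): returning the post-computation session so
// consecutive placements see each other's assignments.
public class SessionReturnDemo {
    static class Session {
        final Map<String, Integer> replicasPerNode = new HashMap<>();
        Session(Map<String, Integer> counts) { replicasPerNode.putAll(counts); }

        // Compute one placement and return a NEW session reflecting it.
        Session assign(List<String> placements) {
            String leastLoaded = Collections.min(
                    replicasPerNode.entrySet(), Map.Entry.comparingByValue()).getKey();
            placements.add(leastLoaded);
            Session next = new Session(replicasPerNode);
            next.replicasPerNode.merge(leastLoaded, 1, Integer::sum);
            return next;
        }
    }

    static List<String> placeThreeReplicas() {
        Session session = new Session(Map.of("n1", 0, "n2", 0, "n3", 0));
        List<String> placements = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // The fix: carry the returned (post-computation) session forward.
            // Reusing the original `session` here instead would put all three
            // replicas on the same node - the imbalance described above.
            session = session.assign(placements);
        }
        return placements;
    }

    public static void main(String[] args) {
        System.out.println(placeThreeReplicas()); // three distinct nodes
    }
}
```

With the stale-session behavior (not carrying the returned session forward), every iteration sees the same all-zero counts and picks the same node, which matches the "single node taking all replicas" observation above.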
> Autoscaling placement wrong when concurrent replica placements are calculated
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-14347
>                 URL: https://issues.apache.org/jira/browse/SOLR-14347
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 8.5
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 8.6
>
>         Attachments: SOLR-14347.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  * create a cluster of a few nodes (tested with 7 nodes)
>  * define per-collection policies that distribute replicas exclusively on different nodes per policy
>  * concurrently create a few collections, each using a different policy
>  * resulting replica placement will be seriously wrong, causing many policy violations
>
> Running the same scenario but instead creating collections sequentially results in no violations.
>
> I suspect this is caused by an incorrect locking level for all collection operations (as defined in {{CollectionParams.CollectionAction}}) that create new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations use the policy engine to create new replica placements, and as a result they change the cluster state. However, currently these operations are locked (in {{OverseerCollectionMessageHandler.lockTask}}) using {{LockLevel.COLLECTION}}. In practice this means that the lock is held only for the particular collection that is being modified.
>
> A straightforward fix for this issue is to change the locking level to CLUSTER (and I confirm this fixes the scenario described above). However, this effectively serializes all collection operations listed above, which will result in a general slow-down of all collection operations.
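The locking distinction the description draws can be sketched with a small, self-contained model. This is not the real {{OverseerCollectionMessageHandler.lockTask}} code; the names are illustrative, and it only shows why COLLECTION-level locks let two operations on different collections compute placements concurrently while a CLUSTER-level lock serializes them:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the two lock levels (illustrative names only).
public class LockLevelDemo {
    enum LockLevel { COLLECTION, CLUSTER }

    private static final ReentrantLock clusterLock = new ReentrantLock();
    private static final Map<String, ReentrantLock> collectionLocks = new ConcurrentHashMap<>();

    static ReentrantLock lockFor(LockLevel level, String collection) {
        // COLLECTION level: operations on *different* collections get different
        // locks, so their placement computations can interleave (the bug).
        // CLUSTER level: every operation shares one lock, serializing them all.
        return level == LockLevel.CLUSTER
                ? clusterLock
                : collectionLocks.computeIfAbsent(collection, c -> new ReentrantLock());
    }

    public static void main(String[] args) {
        boolean sharedAtCollectionLevel =
                lockFor(LockLevel.COLLECTION, "coll1") == lockFor(LockLevel.COLLECTION, "coll2");
        boolean sharedAtClusterLevel =
                lockFor(LockLevel.CLUSTER, "coll1") == lockFor(LockLevel.CLUSTER, "coll2");
        System.out.println(sharedAtCollectionLevel + " " + sharedAtClusterLevel); // false true
    }
}
```

The slow-down concern follows directly: under the CLUSTER level, even operations that could never conflict contend on the single shared lock.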
[jira] [Comment Edited] (SOLR-14347) Autoscaling placement wrong when concurrent replica placements are calculated
[ https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064940#comment-17064940 ] Andrzej Bialecki edited comment on SOLR-14347 at 3/23/20, 4:51 PM:

It turns out that the bug was caused by the fact that per-collection policies are applied during calculations to the cached {{Session}} instance and cause side effects that later affect calculations for other collections. Setting {{LockLevel.CLUSTER}} fixed this because all computations became sequential, but at the relatively high cost of blocking all other CLUSTER-level operations.

It appears that re-creating a {{Policy.Session}} in {{PolicyHelper.getReplicaLocations(...)}} fixes this behavior too, because the new Session doesn't carry over the side effects from previous per-collection policies. There is a slight performance impact to this approach, because re-creating a Session is costly for large clusters, but it's less intrusive than locking out all other CLUSTER-level ops. We may revisit this issue at some point to reduce this cost, but I think this fix at least protects us from the current, completely wrong behavior.
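The side-effect problem described here can be shown with a self-contained sketch. The classes below are hypothetical stand-ins (not Solr's real {{Session}}): a per-collection policy narrows constraints on the shared cached session, and re-creating the session per computation discards those leaked constraints:

```java
import java.util.*;

// Sketch (hypothetical names): why a cached, mutated session corrupts
// later computations, and why a fresh session per computation does not.
public class SessionReuseDemo {
    static class Session {
        final Set<String> allowedNodes;
        Session(Set<String> nodes) { allowedNodes = new HashSet<>(nodes); }
        // Applying a per-collection policy mutates the session: a side effect.
        void applyPolicy(Set<String> policyNodes) { allowedNodes.retainAll(policyNodes); }
    }

    static Set<String> computePlacement(Session session, Set<String> policyNodes) {
        session.applyPolicy(policyNodes);
        return session.allowedNodes;
    }

    public static void main(String[] args) {
        Set<String> clusterNodes = Set.of("n1", "n2", "n3", "n4");

        // Buggy: one cached session reused across collections with different policies.
        Session cached = new Session(clusterNodes);
        computePlacement(cached, Set.of("n1", "n2"));                      // collection A
        Set<String> reused = computePlacement(cached, Set.of("n3", "n4")); // collection B
        System.out.println(reused.isEmpty()); // true - A's policy leaked into B's computation

        // Fixed (the approach in this comment): re-create the session each time.
        Set<String> fresh = computePlacement(new Session(clusterNodes), Set.of("n3", "n4"));
        System.out.println(fresh.equals(Set.of("n3", "n4"))); // true
    }
}
```

The cost trade-off is also visible in the sketch: the fix constructs a new Session per computation, which in the real engine means rebuilding per-node state for the whole cluster.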
--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org