[jira] [Comment Edited] (SOLR-14347) Autoscaling placement wrong when concurrent replica placements are calculated

2020-05-29 Thread Ilan Ginzburg (Jira)


[ https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119116#comment-17119116 ]

Ilan Ginzburg edited comment on SOLR-14347 at 5/29/20, 6:28 AM:


[~ab] I’ve created PR 
[https://github.com/apache/lucene-solr/pull/1542] that I believe solves the 
issue identified above.

The fix has two parts:
 # The obvious one: in {{PolicyHelper.getReplicaLocations()}}, the new Session 
(reflecting the placement computation) is returned with the {{SessionWrapper}}, 
so that the next use sees the assignments from that computation rather than 
the initial state.
 # A less obvious one: in the same method, the copy of the current (orig) 
session made to create the new Session no longer validates collections against 
ZooKeeper. That validation stripped from the new session everything that had 
not yet made it to ZooKeeper, hiding assignments still in progress. 
{{Policy.Session.cloneToNewSession()}} contains the copy code (kept as close 
to the original behavior as possible). A sketch of both parts follows below.
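
In rough terms, and with illustrative stand-in classes rather than Solr's 
actual internals, the two parts amount to something like this:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins for PolicyHelper's session handling, showing the
// intent of the two-part fix; these are not Solr's actual classes.
class Session {
  final Map<String, List<String>> assignments; // collection -> replica nodes

  Session(Map<String, List<String>> assignments) {
    this.assignments = assignments;
  }

  // Part 2: copy the session as-is, WITHOUT re-validating collections against
  // ZooKeeper. A ZK-validating copy would drop any collection not yet
  // persisted, hiding in-progress assignments from the next computation.
  Session cloneToNewSession() {
    return new Session(new HashMap<>(this.assignments));
  }
}

class SessionWrapper {
  private Session session;

  // Part 1: after computing placements, hand back the NEW session (which
  // includes the just-computed assignments), so the next placement
  // computation starts from it rather than from the stale initial state.
  void returnSession(Session postComputationSession) {
    this.session = postComputationSession;
  }

  Session get() {
    return session;
  }
}
{code}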

A multithreaded collection-creation test (JMeter with 40 threads looping over 
the creation of single-shard, single-replica collections) resulted in a 
balanced 3-node cluster. Before the fix there were severe imbalances (up to a 
single node taking all replicas of the run).
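
For anyone wanting to reproduce this without JMeter, a roughly equivalent 
SolrJ loop might look like the following (the ZooKeeper address, thread and 
iteration counts, and collection names are assumptions for illustration):

{code:java}
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ConcurrentCreateTest {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        List.of("localhost:2181"), Optional.empty()).build()) {
      ExecutorService pool = Executors.newFixedThreadPool(40);
      AtomicInteger counter = new AtomicInteger();
      for (int t = 0; t < 40; t++) {
        pool.submit(() -> {
          // Each thread loops, creating single-shard single-replica
          // collections; concurrent creations exercise the Session race.
          for (int i = 0; i < 25; i++) {
            String name = "test_coll_" + counter.incrementAndGet();
            try {
              CollectionAdminRequest
                  .createCollection(name, "_default", 1, 1)
                  .process(client);
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(30, TimeUnit.MINUTES);
      // With the fix, replicas should end up spread evenly across nodes.
    }
  }
}
{code}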

After this fix is merged (along with 
[https://github.com/apache/lucene-solr/pull/1504] from SOLR-14462, which deals 
with Session creation and caching), I believe Session management could benefit 
from some refactoring and simplification.


> Autoscaling placement wrong when concurrent replica placements are calculated
> ------------------------------------------------------------------------------
>
> Key: SOLR-14347
> URL: https://issues.apache.org/jira/browse/SOLR-14347
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 8.5
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 8.6
>
> Attachments: SOLR-14347.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  * create a cluster of a few nodes (tested with 7 nodes)
>  * define per-collection policies that distribute replicas exclusively on 
> different nodes per policy
>  * concurrently create a few collections, each using a different policy
>  * resulting replica placement will be seriously wrong, causing many policy 
> violations
> Running the same scenario but instead creating collections sequentially 
> results in no violations.
> I suspect this is caused by incorrect locking level for all collection 
> operations (as defined in {{CollectionParams.CollectionAction}}) that create 
> new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, 
> REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations 
> use the policy engine to create new replica placements, and as a result they 
> change the cluster state. However, currently these operations are locked (in 
> {{OverseerCollectionMessageHandler.lockTask}}) using 
> {{LockLevel.COLLECTION}}. In practice this means that the lock is held only 
> for the particular collection that is being modified.
> A straightforward fix for this issue is to change the locking level to 
> CLUSTER (and I confirm this fixes the scenario described above). However, 
> this effectively serializes all collection operations listed above, which 
> will result in general slow-down of all collection operations.
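
To make the locking levels concrete, here is a self-contained toy sketch 
(illustrative names only, not Solr's actual implementation) of why 
COLLECTION-level locking lets two placement computations race while 
CLUSTER-level locking serializes them:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model: a COLLECTION-level lock serializes operations only within one
// collection, so CREATEs on different collections may compute placements
// concurrently (the race behind this issue). A CLUSTER-level lock
// serializes everything, fixing the race at the cost of throughput.
public class LockLevelDemo {
  enum LockLevel { COLLECTION, CLUSTER }

  private final ReentrantLock clusterLock = new ReentrantLock();
  private final Map<String, ReentrantLock> collectionLocks =
      new ConcurrentHashMap<>();

  void runPlacement(LockLevel level, String collection, Runnable computation) {
    ReentrantLock lock = (level == LockLevel.CLUSTER)
        ? clusterLock
        : collectionLocks.computeIfAbsent(collection, c -> new ReentrantLock());
    lock.lock();
    try {
      computation.run(); // reads and updates shared cluster state
    } finally {
      lock.unlock();
    }
  }
}
{code}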

[jira] [Comment Edited] (SOLR-14347) Autoscaling placement wrong when concurrent replica placements are calculated

2020-03-23 Thread Andrzej Bialecki (Jira)


[ https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064940#comment-17064940 ]

Andrzej Bialecki edited comment on SOLR-14347 at 3/23/20, 4:51 PM:
-------------------------------------------------------------------

It turns out that the bug was caused by per-collection policies being applied, 
during placement calculations, to the cached {{Session}} instance, causing 
side-effects that later affect calculations for other collections.

Setting {{LockLevel.CLUSTER}} fixed this because all computations became 
sequential, but at the relatively high cost of blocking all other CLUSTER-level 
operations. It appears that re-creating a {{Policy.Session}} in 
{{PolicyHelper.getReplicaLocations(...)}} fixes the behavior too, because the 
new Session doesn't carry over side-effects from previous per-collection 
policies. This approach has a slight performance cost, because re-creating a 
Session is expensive for large clusters, but it's less intrusive than locking 
out all other CLUSTER-level ops; the sketch below illustrates the difference.
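
As a self-contained toy illustration of the side-effect problem (the class 
here is an illustrative stand-in, not Solr's actual {{Policy.Session}}): 
computing against a shared cached session leaks one collection's per-collection 
policy into the next computation, while a freshly re-created session does not.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a cached session whose policy state is mutated during
// placement computation.
class ToySession {
  // Per-collection policy rules merged into the session during computation.
  final Map<String, String> activePolicies = new HashMap<>();

  // Computing placements for a collection merges its policy into the
  // session's state as a side-effect.
  void computePlacements(String collection, String policy) {
    activePolicies.put(collection, policy);
  }
}

public class SessionSideEffectDemo {
  public static void main(String[] args) {
    ToySession cached = new ToySession();

    // Buggy pattern: both computations run against the SAME cached session,
    // so collection B is computed with A's policy still in effect.
    cached.computePlacements("collA", "replicas on node1 only");
    System.out.println("B computed against: " + cached.activePolicies);
    cached.computePlacements("collB", "replicas on node2 only");

    // Fixed pattern: re-create the session per computation, so B's
    // computation never sees A's per-collection policy side-effects.
    ToySession fresh = new ToySession();
    fresh.computePlacements("collB", "replicas on node2 only");
    System.out.println("B with fresh session: " + fresh.activePolicies);
  }
}
{code}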

We may revisit this issue at some point to reduce this cost, but I think this 
fix at least protects us from the current, completely wrong behavior.


was (Author: ab):
It turns out that the bug was caused by per-collection policies being applied 
during calculations, causing side-effects that later affect calculations for 
other collections.

Setting {{LockLevel.CLUSTER}} fixed this because all computations became 
sequential, but at the relatively high cost of blocking all other CLUSTER-level 
operations. It appears that re-creating a {{Policy.Session}} in 
{{PolicyHelper.getReplicaLocations(...)}} fixes the behavior too, because the 
new Session doesn't carry over side-effects from previous per-collection 
policies. This approach has a slight performance cost, because re-creating a 
Session is expensive for large clusters, but it's less intrusive than locking 
out all other CLUSTER-level ops.

We may revisit this issue at some point to reduce this cost, but I think this 
fix at least protects us from the current, completely wrong behavior.

> Autoscaling placement wrong when concurrent replica placements are calculated
> ------------------------------------------------------------------------------
>
> Key: SOLR-14347
> URL: https://issues.apache.org/jira/browse/SOLR-14347
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.5
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14347.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  * create a cluster of a few nodes (tested with 7 nodes)
>  * define per-collection policies that distribute replicas exclusively on 
> different nodes per policy
>  * concurrently create a few collections, each using a different policy
>  * resulting replica placement will be seriously wrong, causing many policy 
> violations
> Running the same scenario but instead creating collections sequentially 
> results in no violations.
> I suspect this is caused by incorrect locking level for all collection 
> operations (as defined in {{CollectionParams.CollectionAction}}) that create 
> new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, 
> REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations 
> use the policy engine to create new replica placements, and as a result they 
> change the cluster state. However, currently these operations are locked (in 
> {{OverseerCollectionMessageHandler.lockTask}}) using 
> {{LockLevel.COLLECTION}}. In practice this means that the lock is held only 
> for the particular collection that is being modified.
> A straightforward fix for this issue is to change the locking level to 
> CLUSTER (and I confirm this fixes the scenario described above). However, 
> this effectively serializes all collection operations listed above, which 
> will result in general slow-down of all collection operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org