Andrzej Bialecki  created SOLR-11320:
----------------------------------------

             Summary: Lock autoscaling triggers when changes they requested are 
being made
                 Key: SOLR-11320
                 URL: https://issues.apache.org/jira/browse/SOLR-11320
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: AutoScaling
            Reporter: Andrzej Bialecki 
            Assignee: Andrzej Bialecki 


Autoscaling triggers generate events that are then processed by actions such as 
ComputePlanAction and ExecutePlanAction. This process is far from instantaneous: 
it may sometimes take several seconds or even minutes to, e.g., move or add 
replicas.

The original condition that caused the first event will usually persist during 
this time, and once the {{waitFor}} time has elapsed it will lead to a new event 
being generated. That event is queued for execution until the previous actions 
complete, but by then the original condition may already have been alleviated 
by those actions, so the conditions reported in the new event no longer reflect 
the latest cluster state.
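The mechanism above can be illustrated with a minimal sketch, assuming a simplified trigger that fires once its condition has held continuously for {{waitFor}} seconds and then re-arms. The class `WaitForTrigger` and its `check` method are hypothetical and do not reflect Solr's actual trigger classes; the sketch only shows why a persisting condition keeps producing new events while the earlier ones are still being processed.

```python
from typing import Optional


class WaitForTrigger:
    """Hypothetical sketch: fire once the condition has held for `wait_for_s` seconds."""

    def __init__(self, wait_for_s: float) -> None:
        self.wait_for_s = wait_for_s
        self.since: Optional[float] = None  # when the condition was first observed

    def check(self, now: float, condition_holds: bool) -> bool:
        """Return True if an event should be generated at time `now`."""
        if not condition_holds:
            self.since = None  # condition cleared: start the waitFor clock over
            return False
        if self.since is None:
            self.since = now
        if now - self.since >= self.wait_for_s:
            self.since = now  # re-arm: a persisting condition fires again later
            return True
        return False


# The condition persists while replicas are being moved, so events pile up.
trigger = WaitForTrigger(wait_for_s=10.0)
queue = []
for t in range(0, 31, 5):
    if trigger.check(float(t), condition_holds=True):
        queue.append(t)
print(queue)  # [10, 20, 30]
```

The events at t=20 and t=30 describe the condition as it was before the first event's actions finished, which is exactly the staleness problem described above.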

For this reason, some autoscaling frameworks introduce a "cooldown" period 
during which triggers are temporarily disabled for a fixed amount of time, to 
avoid piling up new events while cluster changes are being made. This method 
introduces a fixed delay that is specific to each trigger.
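For comparison, a fixed cooldown can be sketched as follows, assuming a trigger that suppresses events for `cooldown_s` seconds after it last fired. `CooldownTrigger` and `maybe_fire` are hypothetical names; the clock is injectable only so the behaviour is easy to demonstrate.

```python
import time
from typing import Callable, Optional


class CooldownTrigger:
    """Hypothetical sketch: after firing, stay silent for a fixed cooldown period."""

    def __init__(self, cooldown_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.last_fired: Optional[float] = None

    def maybe_fire(self, condition_holds: bool) -> bool:
        """Return True if an event should be generated now."""
        if not condition_holds:
            return False
        now = self.clock()
        if self.last_fired is not None and now - self.last_fired < self.cooldown_s:
            # Still cooling down: suppress the event even though the condition holds.
            return False
        self.last_fired = now
        return True


fake_now = [0.0]
trigger = CooldownTrigger(cooldown_s=60.0, clock=lambda: fake_now[0])
print(trigger.maybe_fire(True))  # True
fake_now[0] = 30.0
print(trigger.maybe_fire(True))  # False (inside the cooldown)
fake_now[0] = 61.0
print(trigger.maybe_fire(True))  # True
```

The 60-second value is arbitrary: it does not depend on how long the cluster changes actually take, which is the weakness discussed next.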

From the point of view of control theory, the feedback loop should be designed 
to minimize inherent delays, because they are very hard to compensate for 
properly. Such delays either lead to instability (when the controller tries to 
compensate for an out-of-step state) or to increased system lag (the system 
reacts sluggishly to changes because it has to wait for things to settle down). 
From this point of view, a fixed delay, which is also hard to estimate properly 
and may be inadequate under varying conditions, is not ideal.

A better alternative would be to lock the trigger just for the actual duration 
of time while changes are being made. Initially this could be implemented as a 
global lock for all triggers for the duration of modifications performed by 
ExecutePlanAction.
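A minimal sketch of such a global lock, assuming all triggers share one lock object that the plan executor holds for the whole duration of its modifications, and that a trigger simply skips event generation while the lock is held. `GlobalTriggerLock`, `held`, `is_locked` and `Trigger.run` are hypothetical names, not Solr's API; only `threading.Lock` is the real Python standard library.

```python
import threading
from contextlib import contextmanager


class GlobalTriggerLock:
    """Hypothetical lock shared by all triggers while cluster changes are made."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextmanager
    def held(self):
        # Held by the (hypothetical) plan executor while it modifies the cluster.
        with self._lock:
            yield

    def is_locked(self) -> bool:
        return self._lock.locked()


class Trigger:
    """Hypothetical trigger that consults the shared lock before emitting events."""

    def __init__(self, name: str, gate: GlobalTriggerLock) -> None:
        self.name = name
        self.gate = gate
        self.events = []

    def run(self, condition_holds: bool) -> None:
        # Skip event generation entirely while changes are in progress.
        if self.gate.is_locked():
            return
        if condition_holds:
            self.events.append(f"{self.name}: condition holds")


gate = GlobalTriggerLock()
trig = Trigger("exampleTrigger", gate)
with gate.held():
    trig.run(True)   # suppressed: changes are being made
trig.run(True)       # lock released: the condition is re-evaluated and fires
print(len(trig.events))  # 1
```

Unlike a cooldown, the suppression window here ends exactly when the executor releases the lock, so any event generated afterwards is based on the cluster state after the changes.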

Currently, cluster modifications executed by ExecutePlanAction are made 
asynchronously, so it's hard to determine when the changes actually take 
effect (e.g. when a new or moved replica becomes active); this would have to 
be changed as well.
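One way to make the executor wait for its changes to take effect is to poll a completion condition with a timeout after submitting each asynchronous request. The helper below is a hypothetical sketch, assuming the caller can supply a predicate such as "the new replica reports state active"; how that state is actually read from the cluster is left open.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float,
               poll_interval_s: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


# Simulated replica that goes through "recovering" before becoming "active".
states = iter(["recovering", "recovering", "active"])
current = {"state": "down"}


def replica_is_active() -> bool:
    current["state"] = next(states, current["state"])
    return current["state"] == "active"


print(wait_until(replica_is_active, timeout_s=5.0, poll_interval_s=0.01))  # True
```

With such a helper, the trigger lock would be released only after every requested change has been confirmed (or has timed out), so the lock duration tracks the real duration of the modifications.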



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
