Re: Autoscale proposal - transition compensated continuous scaling

Lahiru Sandaruwan Thu, 26 Feb 2015 02:14:40 -0800

Hi Michael,

Interesting analysis.


We have had some discussions on the dev list about improving the
prediction/regression.

It was based around introducing curve fitting instead of combining the
separate "ave + ave grad + ave 2nd grad" terms, which is what we do now.

The discussion was under the subject "[Autoscaling] [Improvement]
Introducing "curve fitting" for stat prediction algorithm of Autoscaler".
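To make the curve-fitting idea concrete, here is a minimal Python sketch. It is not the approach from the referenced discussion or STRATOS-1211, whose details are not given here. It fits a quadratic y = a + b*t + c*t^2 to recent samples by ordinary least squares and extrapolates the curve forward, instead of combining separately averaged gradient terms. The function names, the choice of a quadratic and the horizon are illustrative assumptions.

```python
def fit_quadratic(ts, ys):
    """Return (a, b, c) minimising sum((a + b*t + c*t^2 - y)^2)."""
    # Normal equations: entry (i, j) is sum(t^(i+j)); right side is sum(y*t^i).
    s = [sum(t ** k for t in ts) for k in range(5)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    v = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n):
                m[row][k] -= f * m[col][k]
            v[row] -= f * v[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        acc = v[row] - sum(m[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = acc / m[row][row]
    return tuple(coeffs)


def predict(ts, ys, horizon):
    """Extrapolate the fitted curve `horizon` time units past the last sample."""
    a, b, c = fit_quadratic(ts, ys)
    t = ts[-1] + horizon
    return a + b * t + c * t * t
```

For samples lying exactly on 2 + 3t + 0.5t^2 at t = 0..5, `predict(ts, ys, 2)` returns approximately 47.5, the value of that curve at t = 7. A fit needs at least three distinct sample times.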

Could you have a look at that discussion as well? I have also filed the
same suggestion as a Jira issue [1].

Thanks.

[1] https://issues.apache.org/jira/browse/STRATOS-1211

On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <micha...@cisco.com> wrote:

>  Hi Imesh, Lakmal, Devs,
>
>  Following the attached email thread, this email is intended as a
> starting point for formalising the proposal of the following autoscale
> enhancement.
>
>  Kind regards,
>
>  Mike…
>
>  Proposal for a ‘*Transition Compensated Continuous Scaling*’ enhancement
> to the Apache Stratos Autoscale feature, to:
>
>    1. Greatly improve (~x100) the maximum rate of cluster size increase
>    (maximum rate of ascent) when subjected to a sudden increase in load.
>    Scaling decisions can be made continuously, because a decision is no
>    longer delayed by the cluster monitor interval while waiting for the
>    system to settle toward a steady state.
>    2. Eliminate redundant cartridges being spawned/terminated when the
>    cartridge startup/stop time is longer than the scaling decision
>    interval (cluster monitor interval).
>
> *Implementation Overview:*
>
>  Current:
>
>  Measured health statistic -> sent to CEP -> 1 minute average -> forward
> prediction (using ave + ave grad + ave 2nd grad) -> use autoscale policy to
> calculate the number of required cartridges -> compare the required
> cartridge count to the current cartridge count and scale accordingly
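The current pipeline above can be sketched in Python as follows. Two parts are assumptions rather than confirmed Stratos behaviour: the prediction is taken to be the second-order extrapolation average + gradient*t + 0.5*second_gradient*t^2, and the autoscale policy is modelled by a hypothetical per-member threshold. Note that the final comparison uses only the active cartridge count.

```python
import math


def predict_next(average, gradient, second_gradient, t=1.0):
    """Assumed second-order extrapolation from the 1-minute statistics."""
    return average + gradient * t + 0.5 * second_gradient * t * t


def required_cartridges(current, predicted_avg, threshold):
    """Hypothetical policy: size the cluster so the per-member average
    would sit at or below `threshold`; never fewer than one cartridge."""
    return max(1, math.ceil(current * predicted_avg / threshold))


def current_scaling_delta(required, active):
    """Current behaviour: compare against ACTIVE cartridges only."""
    return required - active
```

With 4 active cartridges, an average of 60, a gradient of 5 per minute and no second gradient, `predict_next(60, 5, 0)` gives 65.0, `required_cartridges(4, 65.0, 50)` gives 6, and the delta is 2.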
>
>  Proposed:
>
>  Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average
> (updated every second) -> forward prediction (using ave + ave grad + ave
> 2nd grad) -> use autoscale policy to calculate the number of required
> cartridges -> compare the required cartridge count to (the active (current)
> cartridge count + the spawning cartridge count - the terminating cartridge
> count) and scale accordingly
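The two changes in the proposed pipeline, a moving average recomputed as each sample arrives and a comparison against the transition-compensated count, can be sketched as below. The 60-sample window (one sample per second) and the function names are assumptions for illustration. `active` is assumed to still include terminating cartridges, matching the event rules that follow, where the active count only drops on TERMINATED.

```python
from collections import deque


class MovingAverage:
    """Mean of the most recent `window` samples (assumed: 60 one-second samples)."""

    def __init__(self, window=60):
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def add(self, value):
        """Add a sample and return the updated moving average."""
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]  # this sample is about to be evicted
        self.samples.append(value)
        self.total += value
        return self.total / len(self.samples)


def scaling_delta(required, active, spawning, terminating):
    """Positive: spawn that many; negative: terminate that many; 0: no action.

    `active` is assumed to still include cartridges that are terminating.
    """
    effective = active + spawning - terminating
    return required - effective
```

With 6 cartridges required, 4 active and 2 still spawning, `scaling_delta(6, 4, 2, 0)` is 0, so no duplicate spawn is issued; an uncompensated comparison against the 4 active cartridges would have asked for 2 more.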
>
>  *Implementation of ‘spawning’/‘terminating’ cartridge count:*
>
>  Currently the autoscale feature is not aware of the number of cartridges
> in the cluster that are transitioning to and from the ACTIVE state. The
> proposed enhancement relies on being able to know this count at any given
> moment in time.
>
>  This can be implemented by using asynchronous events, where:
>
>  ‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
>  ‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned and
>    increments cluster-cartridge-count-active
>  ‘MEMBER TERMINATING EVENT’ -> increments cluster-cartridge-count-terminating
>  ‘MEMBER TERMINATED EVENT’ -> decrements cluster-cartridge-count-terminating
>    and decrements cluster-cartridge-count-active
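The event-driven bookkeeping above maps naturally onto a small counter object. This Python sketch is hypothetical: the enum and class are not Stratos types. A lock is included because the proposal has the events arriving asynchronously, possibly on different threads.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum, auto


class MemberEvent(Enum):
    """Hypothetical enum for the four member events named in the proposal."""
    SPAWNED = auto()
    ACTIVE = auto()
    TERMINATING = auto()
    TERMINATED = auto()


@dataclass
class ClusterCounts:
    spawned: int = 0      # cluster-cartridge-count-spawned
    active: int = 0       # cluster-cartridge-count-active
    terminating: int = 0  # cluster-cartridge-count-terminating
    _lock: threading.Lock = field(default_factory=threading.Lock,
                                  init=False, repr=False, compare=False)

    def on_event(self, event: MemberEvent) -> None:
        """Apply the counter updates listed for each event."""
        with self._lock:
            if event is MemberEvent.SPAWNED:
                self.spawned += 1
            elif event is MemberEvent.ACTIVE:
                self.spawned -= 1
                self.active += 1
            elif event is MemberEvent.TERMINATING:
                self.terminating += 1
            elif event is MemberEvent.TERMINATED:
                self.terminating -= 1
                self.active -= 1

    def effective_size(self) -> int:
        """active + spawning - terminating; terminating members are still
        counted in `active` until their TERMINATED event arrives."""
        with self._lock:
            return self.active + self.spawned - self.terminating
```

Starting from 2 active members, the sequence SPAWNED, SPAWNED, ACTIVE, TERMINATING leaves spawned=1, active=3, terminating=1 and an effective size of 3.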
>
>  *Summary*
>
>  By compensating the ‘current’ cartridge count/cluster size for the
> cartridges that are transitioning, we remove the issue of duplicated
> scaling decisions whilst also allowing scaling decisions to occur
> continuously, greatly improving our ‘maximum rate of ascent’ when scaling
> up our cluster in reaction to a sudden increase in load.
>
>
>
>   From: Michael Hall <micha...@cisco.com>
> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
> Date: Friday, 13 February 2015 11:36
> To: Lakmal Warusawithana <lak...@wso2.com>, "dev@stratos.apache.org"
> <dev@stratos.apache.org>, Imesh Gunaratne <im...@wso2.com>
> Subject: Re: autoscale architecture
>
>   That’s a good plan,
>
>  My work number is +442088242650
>
>  I’m around now, but will break for a while for lunch in an hour or so.
>
>  Cheers
>
>   From: Lakmal Warusawithana <lak...@wso2.com>
> Date: Friday, 13 February 2015 11:24
> To: "dev@stratos.apache.org" <dev@stratos.apache.org>, Imesh Gunaratne
> <im...@wso2.com>, Michael Hall <micha...@cisco.com>
> Subject: Re: autoscale architecture
>
>   Shall we have a call? It will be more productive.
>
> On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <lak...@wso2.com>
> wrote:
>
>> Hi Michael
>>
>> On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <
>> micha...@cisco.com> wrote:
>>
>>>  Hi Imesh,
>>>
>>>  So ‘transition compensated’ refers to cartridges which are
>>> ‘transitioning’ between SPAWNED and ACTIVE, or between TERMINATING and
>>> TERMINATED.
>>>
>>>  What it really means is that the ‘aggregated average’ (referred to
>>> as <metric>PredictedValue in scaling.drl) is compensated:
>>>
>>>    1. As if the ‘spawning’ cartridges are providing resource (although
>>>    they aren’t yet)
>>>    2. As if the ‘terminating’ cartridges have removed resource
>>>    (although they haven't yet)
>>>
>>> Such that the ‘transition compensated aggregated average’ will be
>>> approximately what the actual aggregated average would be if those
>>> cartridges had become fully ‘active’ or ‘terminated’. This means the
>>> ‘transition compensated aggregated average’ is always in a sensible state
>>> to make a scaling decision.
>>>
>>>  This then allows us to make scaling decisions as often as we’d like
>>> (at an interval much shorter than 90 seconds, even every second),
>>> because, taking the example where we’ve scaled up by one cartridge from
>>> N to N+1, the ‘transition compensated aggregated average’ will instantly
>>> adjust to N/(N+1) of its raw value (formula copied from the previous
>>> email for reference below), so another scaling decision will only occur
>>> if the underlying load (aggregated average) increases even further.
>>>
>>>  *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
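The compensation formula can be written as a small Python function; the parameter names mirror the formula and are otherwise illustrative. The check for a non-positive effective size is an added assumption, since the formula is undefined there.

```python
def transition_compensated_agg_ave(agg_ave, cluster_size,
                                   spawned_size, terminating_size):
    """agg-ave * cluster-size / (cluster-size + spawned - terminating)."""
    effective = cluster_size + spawned_size - terminating_size
    if effective <= 0:
        raise ValueError("effective cluster size must be positive")
    return agg_ave * cluster_size / effective
```

Taking N = 4 with one cartridge spawning, a raw average of 90 becomes 72.0, i.e. N/(N+1) = 4/5 of the raw value. If the scale-up threshold were 80 (an assumed value), a further scale-up would only trigger once the raw average exceeded 100.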
>>>
>>>
>>   I think this is a good proposal; it will definitely help to calculate
>> more accurate agg-ave values. Since CEP has the topology information, we
>> can easily calculate this.
>>
>>  AFAIK, the autoscaler takes cartridge states into account when
>> calculating the required instance count for a predicted load.
>>
>>
>>
>>>   I’d be more than happy to set up a WebEx meeting to try and explain
>>> this better, or to use another avenue of communication if you prefer.
>>>
>>>  Kind regards,
>>>
>>>  Mike
>>>
>>>   From: Imesh Gunaratne <im...@apache.org>
>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>> Date: Friday, 13 February 2015 01:09
>>>
>>> To: dev <dev@stratos.apache.org>
>>> Subject: Re: autoscale architecture
>>>
>>>   Hi Mike,
>>>
>>>  Thanks for the detailed explanation of your question. Currently we do
>>> not have the capability to do this at runtime for a specific cartridge.
>>> However, we could reduce the global scaling decision interval. This
>>> needs to be configured in three places:
>>>
>>>  1. Cartridge agent statistics publishing interval (default: 15 seconds)
>>> 2. CEP execution plan/faulty member detection interval (default: 1 min)
>>> 3. Autoscaler cluster monitor interval (default: 90 seconds)
>>>
>>>  I did not quite understand what you mean by 'transition compensated'.
>>> Could you explain it further?
>>>
>>>  Thanks
>>>
>>>
>>> On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <
>>> micha...@cisco.com> wrote:
>>>
>>>>  Hi Dev,
>>>>
>>>>  Thanks for your response, Imesh. If it's OK, I’d like to skip
>>>> straight to my (rather lengthy) question:
>>>>
>>>>  Does the autoscaler currently have, or are there plans to introduce,
>>>> a means of receiving an asynchronous event signalling that a cartridge
>>>> has gone from ‘SPAWNED’ to ‘ACTIVE’ after it is launched by a ‘scale-up’
>>>> decision? This would allow the scaling decision interval to decrease to
>>>> approximately the metric update interval, and would prevent multiple
>>>> cartridges being spawned when only one is needed.
>>>>
>>>>  In more depth:
>>>>
>>>>  The reason for my question is that, by knowing a cartridge is in
>>>> the ‘SPAWNED’ or ‘TERMINATING’ state, the aggregated metric averages can
>>>> be ‘transition compensated’, i.e.
>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>> This allows scaling decisions to occur continuously (throttled only by
>>>> the metric update frequency).
>>>>
>>>>  It appears that scaling decisions currently occur on the order of
>>>> minutes. If this became seconds, it would vastly improve the maximum
>>>> rate of ascent at which a cluster can scale against a sudden increase
>>>> in load.
>>>>
>>>>  It appears that there is no spawning state awareness, which also
>>>> means several ‘redundant’ instances get spawned when instance startup
>>>> time is greater than the scale decision interval.
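The redundant-spawn effect can be illustrated with a toy simulation. It is a simplified model, not a description of the current Stratos autoscaler: load is constant, every cartridge takes exactly `startup_s` seconds to become ACTIVE, a decision runs every `interval_s` seconds, and scale-down is ignored. The 300-second startup time in the example is an assumed figure.

```python
def simulate(required, initial_active, startup_s, interval_s, duration_s,
             compensate):
    """Return how many cartridges are spawned to meet a constant demand.

    With `compensate=True`, spawning cartridges count toward the cluster
    size; otherwise only ACTIVE cartridges are counted.
    """
    active = initial_active
    pending = []  # times at which spawning cartridges become ACTIVE
    spawned_total = 0
    for t in range(0, duration_s + 1, interval_s):
        # Promote cartridges whose startup has finished by time t.
        ready = [r for r in pending if r <= t]
        pending = [r for r in pending if r > t]
        active += len(ready)
        current = active + len(pending) if compensate else active
        shortfall = required - current
        for _ in range(max(0, shortfall)):
            pending.append(t + startup_s)
            spawned_total += 1
    return spawned_total


naive = simulate(3, 2, startup_s=300, interval_s=90, duration_s=600,
                 compensate=False)
compensated = simulate(3, 2, startup_s=300, interval_s=90, duration_s=600,
                       compensate=True)
print(naive, compensated)  # 4 1
```

With 2 active cartridges and 3 required, the uncompensated run spawns one cartridge at each of the four decisions made before the first new cartridge becomes ACTIVE, so 3 of the 4 are redundant; the compensated run spawns exactly 1.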
>>>>
>>>>  Finally:
>>>>
>>>>  Are there difficulties in tracking the ‘SPAWNED’ to ‘ACTIVE’ state on a
>>>> per-cartridge basis? And how does this (if it's a valid enhancement)
>>>> align with other potential improvements that could be made to the
>>>> autoscaler?
>>>>
>>>>  Regards,
>>>>
>>>>  Mike
>>>>
>>>>   From: Imesh Gunaratne <im...@apache.org>
>>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>>> Date: Thursday, 12 February 2015 18:16
>>>> To: dev <dev@stratos.apache.org>
>>>> Subject: Re: autoscale architecture
>>>>
>>>>   Hi Michael,
>>>>
>>>>  Yes, you can ask any questions you have on Autoscaling here.
>>>>
>>>>  I don't think we have documented the Autoscaling feature in 4.1.0
>>>> yet. However, you can find some information here [1]. Autoscaling has
>>>> changed slightly with the Composite Application Model.
>>>>
>>>>  [1]
>>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler
>>>>
>>>>  Thanks
>>>>
>>>> On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <
>>>> micha...@cisco.com> wrote:
>>>>
>>>>>  Hi Devs,
>>>>>
>>>>>  Is there a resource or contact that can help me understand the
>>>>> current and planned architecture of the autoscaling feature within
>>>>> Stratos?
>>>>>
>>>>>  Best Regards,
>>>>>
>>>>>  Mike
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>>  Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>
>>
>>
>>  --
>>  Lakmal Warusawithana
>> Vice President, Apache Stratos
>> Director - Cloud Architecture; WSO2 Inc.
>> Mobile : +94714289692
>> Blog : http://lakmalsview.blogspot.com/
>>
>>
>
>


-- 
--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: lahi...@wso2.com blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
