Re: Autoscale proposal - transition compensated continuous scaling

Lakmal Warusawithana Thu, 26 Feb 2015 02:07:41 -0800

Also, please create a JIRA and mark it for a future release.

On Thu, Feb 26, 2015 at 3:32 PM, Imesh Gunaratne <[email protected]> wrote:


> Hi Michael,
>
> Thanks for preparing a specification for the autoscaling improvement you
> suggested; it looks good. We could also use the Wiki for documenting this.
>
> Maybe we can work on this in a separate branch and include it in a later
> release. If you need any assistance to get started, please let us know;
> we could guide you to set up the development environment.
>
> Thanks
>
> On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <
> [email protected]> wrote:
>
>>  Hi Imesh, Lakmal, Devs,
>>
>>  Following the attached email thread, this email is intended as a
>> starting point in formalising the proposal for the following autoscale
>> enhancement.
>>
>>  Kind regards,
>>
>>  Mike…
>>
>>  Proposal for a ‘*Transition Compensated Continuous Scaling*’ enhancement
>> to the Apache Stratos Autoscale feature, to:
>>
>>    1. Greatly improve (~x100) the maximum rate of cluster size increase
>>    (maximum rate of ascent) when subjected to a sudden increase in load.
>>    Scaling decisions can be made continuously, because each decision is
>>    no longer delayed by the cluster monitor interval to wait for the
>>    system to tend toward a steady state.
>>    2. Eliminate redundant cartridges being spawned/terminated because
>>    cartridge startup/stop time is longer than the scaling decision
>>    interval (cluster monitor interval).
>>
>> *Implementation Overview:*
>>
>>  Current:
>>
>>  Measured health statistic -> sent to CEP -> 1 minute average -> forward
>> prediction (using ave + ave grad + ave 2nd grad) -> use autoscale policy to
>> calculate the number of required cartridges -> compare the required
>> cartridge count to the current cartridge count and scale appropriately
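A minimal sketch of the last steps of the current pipeline, for reference. It assumes the forward prediction is a second-order extrapolation (ave + grad*t + 0.5*grad2*t^2) and uses a hypothetical proportional policy for the required count; `predict_next`, `required_cartridges`, `scaling_delta` and the `threshold` parameter are illustrative names, not Stratos APIs.

```python
import math


def predict_next(average: float, gradient: float, second_gradient: float, t: float = 1.0) -> float:
    # Assumed second-order extrapolation: ave + grad*t + 0.5*grad2*t^2
    return average + gradient * t + 0.5 * second_gradient * t * t


def required_cartridges(predicted: float, threshold: float, current_count: int) -> int:
    # Hypothetical policy: grow/shrink the cluster in proportion to the
    # predicted load relative to the policy threshold (never below 1).
    return max(1, math.ceil(current_count * predicted / threshold))


def scaling_delta(required: int, current_count: int) -> int:
    # Current behaviour: compare against the cartridges found right now.
    # Positive -> spawn that many, negative -> terminate that many.
    return required - current_count


predicted = predict_next(average=70.0, gradient=5.0, second_gradient=2.0)
required = required_cartridges(predicted, threshold=60.0, current_count=3)
print(predicted, required, scaling_delta(required, 3))  # 76.0 4 1
```

Note that `scaling_delta` only sees cartridges that are already counted as current, which is the gap the proposal addresses.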
>>
>>  Proposed:
>>
>>  Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average
>> (updated per second) -> forward prediction (using ave + ave grad + ave 2nd
>> grad) -> use autoscale policy to calculate the number of required
>> cartridges -> compare the required cartridge count to (the active (current)
>> cartridge count + the spawning cartridge count - the terminating cartridge
>> count) and scale appropriately
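A minimal sketch of the two parts that differ from the current pipeline, assuming one health sample arrives per second: a 1-minute moving average refreshed on every sample, and the comparison against active + spawning - terminating. `MovingAverage` and `compensated_delta` are hypothetical names; in the proposal the averaging would happen in CEP rather than in the autoscaler.

```python
from collections import deque


class MovingAverage:
    """Sliding-window average. With one sample per second, window=60 gives a
    1-minute moving average that is refreshed every second."""

    def __init__(self, window: int = 60) -> None:
        self.samples = deque(maxlen=window)

    def add(self, value: float) -> float:
        # Appending to a full deque silently drops the oldest sample.
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def compensated_delta(required: int, active: int, spawning: int, terminating: int) -> int:
    # Proposed comparison: cartridges still starting up count as already
    # present, and cartridges shutting down count as already gone.
    return required - (active + spawning - terminating)


avg = MovingAverage(window=3)  # short window just for the example
print([avg.add(v) for v in (10.0, 20.0, 30.0, 40.0)])  # [10.0, 15.0, 20.0, 30.0]
print(compensated_delta(required=5, active=3, spawning=2, terminating=0))  # 0
```

The second call returns 0: the two cartridges already spawning cover the requirement, so no further cartridges are requested.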
>>
>>  *Implementation of ‘spawning’/‘terminating’ cartridge count:*
>>
>>  Currently the autoscale feature is not aware of the number of
>> cartridges in the cluster that are transitioning to and from the ACTIVE
>> state. The proposed enhancement relies on being able to know this count at
>> any given moment in time.
>>
>>  This can be implemented by using asynchronous events, where:
>>
>>  ‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
>> ‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned and
>> increments cluster-cartridge-count-active
>> ‘MEMBER TERMINATING EVENT’ -> increments cluster-cartridge-count-terminating
>> ‘MEMBER TERMINATED EVENT’ -> decrements cluster-cartridge-count-terminating
>> and decrements cluster-cartridge-count-active
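A minimal sketch of these counters, assuming events are delivered as plain strings named after the proposal (`MEMBER_SPAWNED` and the others are placeholders, not the actual Stratos topology event classes) and that only ACTIVE members are ever terminated. Because the events are asynchronous, updates are serialised with a lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ClusterCounts:
    active: int = 0
    spawned: int = 0
    terminating: int = 0
    # Events arrive asynchronously, so serialise all counter updates.
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def on_event(self, event: str) -> None:
        with self._lock:
            if event == "MEMBER_SPAWNED":
                self.spawned += 1
            elif event == "MEMBER_ACTIVE":
                self.spawned -= 1
                self.active += 1
            elif event == "MEMBER_TERMINATING":
                self.terminating += 1
            elif event == "MEMBER_TERMINATED":
                self.terminating -= 1
                self.active -= 1
            else:
                raise ValueError(f"unknown event: {event}")

    def effective_size(self) -> int:
        # active + spawning - terminating: the count a scaling decision compares against
        with self._lock:
            return self.active + self.spawned - self.terminating


counts = ClusterCounts(active=2)
for e in ("MEMBER_SPAWNED", "MEMBER_SPAWNED", "MEMBER_ACTIVE"):
    counts.on_event(e)
print(counts.active, counts.spawned, counts.effective_size())  # 3 1 4
```

A member terminated while still spawning would make these counters drift under the stated assumption; a real implementation would need to track per-member state to handle that case.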
>>
>>  *Summary*
>>
>>  By compensating the ‘current’ cartridge count (cluster size) with the
>> cartridges that are transitioning, we remove the issue of duplicate
>> scaling decisions while also allowing the scaling decision to occur
>> continuously, greatly improving our ‘maximum rate of ascent’ when scaling
>> up our cluster in reaction to a sudden increase in load.
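A toy simulation of the redundant-spawn problem and how compensation removes it. All parameters are hypothetical: the load jumps at t=0 so that 5 cartridges are required, 2 are active, each new cartridge takes 200 s to become ACTIVE, and there is no scale-down. It only counts how many cartridges get spawned; the figures are illustrative, not measured.

```python
def simulate(required: int, initial: int, startup_s: int, interval_s: int,
             horizon_s: int, compensate: bool) -> int:
    """Return how many cartridges are spawned over the horizon (scale-up only)."""
    active = initial
    pending: list[int] = []  # times at which spawning cartridges become ACTIVE
    spawned_total = 0
    for t in range(0, horizon_s, interval_s):
        # Cartridges whose startup time has elapsed become ACTIVE.
        active += sum(1 for ready in pending if ready <= t)
        pending = [ready for ready in pending if ready > t]
        # Current behaviour counts only ACTIVE members; the proposal also
        # counts the ones that are still spawning.
        current = active + len(pending) if compensate else active
        to_spawn = max(0, required - current)
        pending.extend([t + startup_s] * to_spawn)
        spawned_total += to_spawn
    return spawned_total


for interval in (90, 1):
    print(interval,
          simulate(5, 2, 200, interval, 600, compensate=False),
          simulate(5, 2, 200, interval, 600, compensate=True))
# 90 9 3
# 1 600 3
```

With a 90 s interval the uncompensated loop spawns 9 cartridges where 3 were needed; shrinking the interval to 1 s without compensation makes it far worse (600), while the compensated loop spawns exactly 3 at either interval. This is why a shorter decision interval only helps when paired with transition compensation.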
>>
>>
>>
>>   From: Michael Hall <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, 13 February 2015 11:36
>> To: Lakmal Warusawithana <[email protected]>, "[email protected]" <
>> [email protected]>, Imesh Gunaratne <[email protected]>
>> Subject: Re: autoscale architecture
>>
>>   That’s a good plan,
>>
>>  My work number is +442088242650
>>
>>  I’m around now, but will break for a while for lunch in an hour or so.
>>
>>  Cheers
>>
>>   From: Lakmal Warusawithana <[email protected]>
>> Date: Friday, 13 February 2015 11:24
>> To: "[email protected]" <[email protected]>, Imesh Gunaratne <
>> [email protected]>, Michael Hall <[email protected]>
>> Subject: Re: autoscale architecture
>>
>>   Shall we have a call? It will be more productive.
>>
>> On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <[email protected]>
>> wrote:
>>
>>> Hi Michael
>>>
>>> On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <
>>> [email protected]> wrote:
>>>
>>>>  Hi Imesh,
>>>>
>>>>  So ‘transition compensated’ refers to cartridges which are
>>>> ‘transitioning’ between SPAWNED-ACTIVE and TERMINATING-TERMINATED.
>>>>
>>>>  What it really means is that the ‘aggregated average’ (referred to
>>>> as <metric>PredictedValue in scaling.drl) is compensated:
>>>>
>>>>    1. As if the ‘spawning’ cartridges are providing resource (although
>>>>    they aren’t yet)
>>>>    2. As if the ‘terminating’ cartridges have removed resource
>>>>    (although they haven't yet)
>>>>
>>>> such that the ‘transition compensated aggregated average’ will be
>>>> approximately what the actual aggregated average would be if those
>>>> cartridges had become fully ‘active’ or ‘terminated’. This means the
>>>> ‘transition compensated aggregated average’ is always in a sensible state
>>>> to make a scaling decision.
>>>>
>>>>  This then allows us to make a scaling decision as often as we’d like
>>>> (much more often than every 90 seconds, even every 1 second), because if,
>>>> for example, we’ve just scaled up from N cartridges, the ‘transition
>>>> compensated aggregated average’ will instantly adjust to N/(N+1) of its
>>>> raw value (formula copied from the previous email for reference below), so
>>>> another scaling decision will only occur if the underlying load
>>>> (aggregated average) increases even further.
>>>>
>>>>  *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
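A worked version of the formula, assuming agg-ave is a per-cartridge average (such as CPU utilisation) and that load spreads evenly across cartridges; the function name is hypothetical. With 4 active cartridges and 1 spawning, an 80% average is treated as 64%, i.e. N/(N+1) of the raw value; with 1 of 4 terminating it is scaled up by 4/3 instead.

```python
def transition_compensated_agg_ave(agg_ave: float, cluster_size: int,
                                   spawned: int, terminating: int) -> float:
    # agg_ave is assumed to be a per-cartridge average, so it scales
    # inversely with the number of cartridges sharing the load.
    effective = cluster_size + spawned - terminating
    if effective <= 0:
        raise ValueError("no cartridges would remain to carry the load")
    return agg_ave * cluster_size / effective


print(transition_compensated_agg_ave(80.0, 4, 1, 0))  # 64.0
print(transition_compensated_agg_ave(80.0, 4, 0, 1))  # 4/3 of 80
```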
>>>>
>>>>
>>>   I think this is a good proposal; it will definitely help to calculate
>>> more accurate agg-ave values. Since CEP has the topology information, we
>>> can easily calculate this.
>>>
>>>  AFAIK, the autoscaler takes care of cartridge states when calculating
>>> the required instance count for a predicted load.
>>>
>>>
>>>
>>>>   I’d be more than happy to set up a WebEx meeting to try and explain
>>>> this better, or to use another avenue of communication if you prefer.
>>>>
>>>>  Kind regards,
>>>>
>>>>  Mike
>>>>
>>>>   From: Imesh Gunaratne <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Friday, 13 February 2015 01:09
>>>> To: dev <[email protected]>
>>>> Subject: Re: autoscale architecture
>>>>
>>>>   Hi Mike,
>>>>
>>>>  Thanks for the detailed explanation of your question. Currently we do
>>>> not have the capability to do this at runtime for a specific cartridge.
>>>> However, we could reduce the global scaling decision interval. This needs
>>>> to be configured in three places:
>>>>
>>>>  1. Cartridge agent statistics publishing interval (default: 15
>>>> seconds)
>>>> 2. CEP execution plan/faulty member detection interval (default: 1 min)
>>>> 3. Autoscaler cluster monitor interval (default: 90 seconds)
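A rough way to see why all three settings matter: if a load change has to wait for the next statistics publish, then for the CEP interval, then for the next cluster monitor run, the worst-case delay before a scaling decision is roughly the sum of the three. This is a back-of-the-envelope sketch under that assumption (it treats the CEP interval as an averaging window that must fill); the class and field names are hypothetical, not Stratos configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingIntervals:
    # Hypothetical names; defaults are the values quoted above.
    publish_s: float = 15.0     # cartridge agent statistics publishing interval
    cep_window_s: float = 60.0  # CEP execution plan interval
    monitor_s: float = 90.0     # autoscaler cluster monitor interval

    def worst_case_reaction_s(self) -> float:
        # Assumes the three stages run back to back with no overlap.
        return self.publish_s + self.cep_window_s + self.monitor_s


print(ScalingIntervals().worst_case_reaction_s())                # 165.0
print(ScalingIntervals(5.0, 10.0, 5.0).worst_case_reaction_s())  # 20.0
```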
>>>>
>>>>  I did not clearly understand what you mean by 'transition compensated'.
>>>> Could you explain it further?
>>>>
>>>>  Thanks
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <
>>>> [email protected]> wrote:
>>>>
>>>>>  Hi Dev,
>>>>>
>>>>>  Thanks for your response, Imesh. If it's OK, I’d like to skip straight
>>>>> to my (rather lengthy) question:
>>>>>
>>>>>  Does the autoscaler currently have, or plan to introduce, a means
>>>>> to receive an asynchronous event signalling that a cartridge has gone
>>>>> from ‘SPAWNED’ to ‘ACTIVE’ after it is launched by a ‘scale-up’ decision,
>>>>> so that the scaling decision interval can decrease to approximately the
>>>>> metric update interval, and multiple cartridges are not spawned when only
>>>>> one is needed?
>>>>>
>>>>>  In more depth:
>>>>>
>>>>>  The reason for my question is that, by knowing a cartridge is in
>>>>> the ‘SPAWNED’ or ‘TERMINATING’ state, the aggregated metric averages can
>>>>> be ‘transition compensated’, i.e.
>>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>>> This would allow scaling decisions to occur continuously (throttled only
>>>>> by the metric update frequency).
>>>>>
>>>>>  Currently, scaling decisions appear to occur every few minutes. If this
>>>>> became a matter of seconds, it would vastly improve the maximum rate of
>>>>> ascent at which a cluster can scale against a sudden increase in load.
>>>>>
>>>>>  There also appears to be no awareness of the spawning state, which
>>>>> means several ‘redundant’ instances get spawned when instance startup
>>>>> time is greater than the scaling decision interval.
>>>>>
>>>>>  Finally:
>>>>>
>>>>>  Are there difficulties in tracking the ‘SPAWNED’ to ‘ACTIVE’ transition
>>>>> on a per-cartridge basis? And how does this align (if it's a valid
>>>>> enhancement) with other potential improvements that could be made to the
>>>>> autoscaler?
>>>>>
>>>>>  Regards,
>>>>>
>>>>>  Mike
>>>>>
>>>>>   From: Imesh Gunaratne <[email protected]>
>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>> Date: Thursday, 12 February 2015 18:16
>>>>> To: dev <[email protected]>
>>>>> Subject: Re: autoscale architecture
>>>>>
>>>>>   Hi Michael,
>>>>>
>>>>>  Yes, you can ask any questions you have on Autoscaling here.
>>>>>
>>>>>  I don't think we have documented the Autoscaling feature in 4.1.0 at
>>>>> the moment. However, you can find some information here [1]. Autoscaling
>>>>> has changed slightly with the Composite Application Model.
>>>>>
>>>>>  [1]
>>>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler
>>>>>
>>>>>  Thanks
>>>>>
>>>>> On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>  Hi Devs,
>>>>>>
>>>>>>  Is there a resource or contact that can help me understand the
>>>>>> current and planned architecture of the autoscaling feature within
>>>>>> Stratos?
>>>>>>
>>>>>>  Best Regards,
>>>>>>
>>>>>>  Mike
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>>  Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>  --
>>>  Lakmal Warusawithana
>>> Vice President, Apache Stratos
>>> Director - Cloud Architecture; WSO2 Inc.
>>> Mobile : +94714289692
>>> Blog : http://lakmalsview.blogspot.com/
>>>
>>>
>>
>>
>>
>>
>
>
>



