Hi Michael, Interesting analysis.
We have had some discussions on the dev list about improving the
prediction/regression logic. They centred on introducing curve fitting in
place of the separate "ave + ave grad + ave 2nd grad" terms we use now. The
discussion was under the subject "[Autoscaling] [Improvement] Introducing
"curve fitting" for stat prediction algorithm of Autoscaler". Could you
have a look at that discussion as well? I have also filed the same
suggestion in a Jira [1].

Thanks.

[1] https://issues.apache.org/jira/browse/STRATOS-1211

On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <micha...@cisco.com>
wrote:

> Hi Imesh, Lakmal, Devs,
>
> Following the attached email thread, this email is intended as a
> starting point in formalising the proposal for the autoscale
> enhancement below.
>
> Kind regards,
>
> Mike…
>
> Proposal for ‘*Transition Compensated Continuous Scaling*’ enhancement
> to be added to Apache Stratos Autoscale feature to:
>
> 1. Greatly improve (~ x100) the maximum rate of cluster size increase
> (maximum rate of ascent) when subjected to a sudden increase in load.
> (Continuous scaling decisions become possible because the decision is no
> longer delayed by the cluster monitor interval to wait for the system to
> tend toward a steady state.)
> 2. Eliminate redundant cartridges being spawned/terminated because
> cartridge startup/stop takes longer than a scaling decision interval
> (cluster monitor interval).
>
> *Implementation Overview:*
>
> Current:
>
> Measured health statistic -> sent to CEP -> 1 minute average -> forward
> prediction (use ave + ave grad + ave 2nd grad) -> use autoscale policy to
> calc number of required cartridges -> compare required cartridge count to
> current cartridge count and scale appropriately
>
> Proposed:
>
> Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average (per
> second) -> forward prediction (use ave + ave grad + ave 2nd grad) -> use
> autoscale policy to calc number of required cartridges -> compare required
> cartridge count to ( the active (current) cartridge count + the spawning
> cartridge count - the terminating cartridge count ) and scale
> appropriately
>
> *Implementation of ‘spawning’/‘terminating’ cartridge count:*
>
> Currently the autoscale feature is not aware of the number of cartridges
> in the cluster that are transitioning to and from the ACTIVE state. The
> proposed enhancement relies on being able to know this count at any given
> moment in time.
>
> This can be implemented by using asynchronous events, where:
>
> ‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
> ‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned, and
> increments cluster-cartridge-count-active
> ‘MEMBER TERMINATING EVENT’ -> increments
> cluster-cartridge-count-terminating
> ‘MEMBER TERMINATED EVENT’ -> decrements
> cluster-cartridge-count-terminating, and decrements
> cluster-cartridge-count-active
>
> *Summary*
>
> By compensating the ‘current’ cartridge count/cluster size with the
> cartridges that are transitioning, we remove the issue of duplicating
> scaling decisions whilst also allowing the scaling decision to occur
> continuously, greatly improving our ‘maximum rate of ascent’ when scaling
> up our cluster in reaction to a sudden increase in load.
>
>
>
> From: Michael Hall <micha...@cisco.com>
> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
> Date: Friday, 13 February 2015 11:36
> To: Lakmal Warusawithana <lak...@wso2.com>, "dev@stratos.apache.org" <
> dev@stratos.apache.org>, Imesh Gunaratne <im...@wso2.com>
> Subject: Re: autoscale architecture
>
> That’s a good plan,
>
> My work number is +442088242650
>
> I’m around now, but will break for a while for lunch in an hour or so.
>
> Cheers
>
> From: Lakmal Warusawithana <lak...@wso2.com>
> Date: Friday, 13 February 2015 11:24
> To: "dev@stratos.apache.org" <dev@stratos.apache.org>, Imesh Gunaratne <
> im...@wso2.com>, Michael Hall <micha...@cisco.com>
> Subject: Re: autoscale architecture
>
> Shall we go for a call? It will be more productive.
>
> On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <lak...@wso2.com>
> wrote:
>
>> Hi Michael
>>
>> On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <
>> micha...@cisco.com> wrote:
>>
>>> Hi Imesh,
>>>
>>> So ‘transition compensated’ refers to cartridges which are
>>> ‘transitioning’ between SPAWNED-ACTIVE, and TERMINATING-TERMINATED.
>>>
>>> What it really means is that the ‘aggregated average’ (referred to
>>> as <metric>PredictedValue in scaling.drl) is compensated:
>>>
>>> 1. As if the ‘spawning’ cartridges are providing resource (although
>>> they aren’t yet)
>>> 2. As if the ‘terminating’ cartridges have removed resource
>>> (although they haven’t yet)
>>>
>>> such that the ‘transition compensated aggregated average’ will be
>>> approximately what the actual aggregated average would be if those
>>> cartridges had become fully ‘active’ or ‘terminated’. This means the
>>> ‘transition compensated aggregated average’ is always in a sensible state
>>> to make a scaling decision.
>>>
>>> This then allows us to make a scaling decision as often as we’d like
>>> (much more often than every 90 seconds, could even be every 1 second),
>>> because if you take the example that we’ve scaled up, the ‘transition
>>> compensated aggregated average’ will instantly adjust to N/(N+1) of its
>>> raw value (formula copied from previous email for reference below), so
>>> another scaling decision will only occur if the underlying load
>>> (aggregated average) increases even further.
>>>
>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>
>>>
>> I think this is a good proposal; it will definitely help to calculate
>> more accurate agg-ave values. Since CEP has the topology information we
>> can easily calculate this.
>>
>> AFAIK, the autoscaler takes care of cartridge states when calculating
>> the required instance count for a predicted load.
>>
>>
>>
>>> I’d be more than happy to set up a WebEx meeting to try and explain
>>> this better? Or another avenue of communication at your preference?
>>>
>>> Kind regards,
>>>
>>> Mike
>>>
>>> From: Imesh Gunaratne <im...@apache.org>
>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>> Date: Friday, 13 February 2015 01:09
>>>
>>> To: dev <dev@stratos.apache.org>
>>> Subject: Re: autoscale architecture
>>>
>>> Hi Mike,
>>>
>>> Thanks for the detailed explanation of your question. Currently we do
>>> not have the capability to do this at runtime for a specific cartridge.
>>> However we could reduce the global scaling decision interval. This needs
>>> to be configured at three locations:
>>>
>>> 1. Cartridge agent statistics publishing interval (default: 15 seconds)
>>> 2. CEP execution plan/faulty member detection interval (default: 1 min)
>>> 3. Autoscaler cluster monitor interval (default: 90 seconds)
>>>
>>> I did not clearly get what you mean by 'transition compensated'. Is
>>> there a way to explain it further?
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <
>>> micha...@cisco.com> wrote:
>>>
>>>> Hi Dev,
>>>>
>>>> Thanks for your response, Imesh. If it’s OK, I’d like to skip straight
>>>> to my (rather lengthy) question:
>>>>
>>>> Does the autoscaler currently have, or are there plans to introduce, a
>>>> means to receive an asynchronous event signalling that a cartridge has
>>>> gone from ‘SPAWNED’ to ‘ACTIVE’ after it is launched from a ‘scale-up’
>>>> decision, so that the scaling decision interval can decrease to
>>>> approximately the metric update interval, and multiple cartridges are
>>>> not spawned when only one is needed?
>>>>
>>>> In more depth:
>>>>
>>>> The reason for my question is that, by knowing a cartridge is in
>>>> the ‘SPAWNED’ or ‘TERMINATING’ state, the aggregated metric averages
>>>> can be ‘transition compensated’, i.e.
>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>> This allows the scaling decisions to occur on a continuous basis
>>>> (throttled only by the metric update frequency).
>>>>
>>>> It appears that scaling decisions currently occur on the order of
>>>> minutes. If this became on the order of seconds, it would vastly
>>>> improve the maximum rate of ascent at which a cluster can scale against
>>>> a sudden increase in load.
>>>>
>>>> It appears that there is no spawning state awareness, which also
>>>> means several ‘redundant’ instances get spawned when instance startup
>>>> time is greater than the scale decision interval.
>>>>
>>>> Finally:
>>>>
>>>> Are there difficulties in tracking the ‘SPAWNED’ to ‘ACTIVE’ state on
>>>> a per-cartridge basis? How does this align (if it’s a valid
>>>> enhancement) with other potential improvements that could be made to
>>>> the autoscaler?
>>>>
>>>> Regards,
>>>>
>>>> Mike
>>>>
>>>> From: Imesh Gunaratne <im...@apache.org>
>>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>>> Date: Thursday, 12 February 2015 18:16
>>>> To: dev <dev@stratos.apache.org>
>>>> Subject: Re: autoscale architecture
>>>>
>>>> Hi Michael,
>>>>
>>>> Yes, you can ask any questions you have on Autoscaling here.
>>>>
>>>> I don't think we have documented the Autoscaling feature in 4.1.0 at
>>>> the moment. However you could find some information here [1].
>>>> Autoscaling has slightly changed with the Composite Application Model.
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <
>>>> micha...@cisco.com> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> Is there a resource or contact that can help me understand the
>>>>> current and planned architecture of the autoscaling feature within
>>>>> Stratos?
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Mike
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> Imesh Gunaratne
>>>
>>> Technical Lead, WSO2
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Lakmal Warusawithana
>> Vice President, Apache Stratos
>> Director - Cloud Architecture; WSO2 Inc.
>> Mobile : +94714289692
>> Blog : http://lakmalsview.blogspot.com/
>>
>>
>
>
> --
> Lakmal Warusawithana
> Vice President, Apache Stratos
> Director - Cloud Architecture; WSO2 Inc.
> Mobile : +94714289692
> Blog : http://lakmalsview.blogspot.com/
>
>

--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: lahi...@wso2.com
blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
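
For reference, the event-driven counters and the transition-compensated average from the quoted proposal could be sketched roughly as below. This is only an illustrative Python sketch, not Stratos code: the class and function names (ClusterCounts, compensated_average, scaling_delta) are hypothetical, cluster-size in the quoted formula is assumed to mean the active member count, and the handling of a non-positive denominator is an assumption the thread does not address.

```python
from dataclasses import dataclass


@dataclass
class ClusterCounts:
    """Hypothetical per-cluster counters driven by member lifecycle events."""
    active: int = 0
    spawned: int = 0      # launched, not yet ACTIVE
    terminating: int = 0  # stopping, not yet TERMINATED

    def on_member_spawned(self) -> None:
        self.spawned += 1

    def on_member_active(self) -> None:
        self.spawned -= 1
        self.active += 1

    def on_member_terminating(self) -> None:
        self.terminating += 1

    def on_member_terminated(self) -> None:
        self.terminating -= 1
        self.active -= 1

    def effective_size(self) -> int:
        # active + spawning - terminating, as in the proposed comparison
        return self.active + self.spawned - self.terminating


def compensated_average(agg_ave: float, counts: ClusterCounts) -> float:
    """agg-ave * cluster-size / (cluster-size + spawned - terminating)."""
    effective = counts.effective_size()
    if effective <= 0:
        # Assumption: with nothing left to share the load, return the raw value.
        return agg_ave
    return agg_ave * counts.active / effective


def scaling_delta(required: int, counts: ClusterCounts) -> int:
    """Positive: members to spawn; negative: members to terminate."""
    return required - counts.effective_size()


# Four active members, one scale-up already in flight.
counts = ClusterCounts(active=4)
counts.on_member_spawned()
print(compensated_average(80.0, counts))  # 64.0, i.e. N/(N+1) of the raw value
print(scaling_delta(5, counts))           # 0, so no duplicate spawn
```

With the spawning member counted, a second decision taken before that member becomes ACTIVE sees an effective size of 5 and does not launch another instance, which is the duplicate-spawn case the proposal aims to remove.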