Also, please create a JIRA and mark it for a future release.

On Thu, Feb 26, 2015 at 3:32 PM, Imesh Gunaratne <im...@apache.org> wrote:
> Hi Michael,
>
> Thanks for preparing a specification for the autoscaling improvement you
> suggested; it looks good. We could also use the Wiki for documenting this.
>
> Maybe we can work on this in a separate branch and include it in a later
> release. If you need any assistance getting started, please let us know;
> we can guide you through setting up the development environment.
>
> Thanks
>
> On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <
> micha...@cisco.com> wrote:
>
>> Hi Imesh, Lakmal, Devs,
>>
>> Following the attached email thread, this email is intended as a
>> starting point for formalising the proposal for the following autoscale
>> enhancement.
>>
>> Kind regards,
>>
>> Mike…
>>
>> Proposal for a ‘*Transition Compensated Continuous Scaling*’ enhancement
>> to be added to the Apache Stratos Autoscale feature, to:
>>
>>    1. Greatly improve (~ x100) the maximum rate of cluster size increase
>>    (maximum rate of ascent) when subjected to a sudden increase in load.
>>    (Scaling decisions can occur continuously, because a decision is no
>>    longer delayed by the cluster monitor interval while waiting for the
>>    system to tend toward a steady state.)
>>    2. Eliminate redundant cartridges being spawned/terminated because
>>    cartridge startup/stop takes longer than a scaling decision interval
>>    (cluster monitor interval).
>>
>> *Implementation Overview:*
>>
>> Current:
>>
>> Measured health statistic -> sent to CEP -> 1 minute average -> forward
>> prediction (use ave + ave grad + ave 2nd grad) -> use autoscale policy to
>> calc number of required cartridges -> compare required cartridge count to
>> current cartridge count and scale appropriately
>>
>> Proposed:
>>
>> Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average (per
>> second) -> forward prediction (use ave + ave grad + ave 2nd grad) -> use
>> autoscale policy to calc number of required cartridges -> compare required
>> cartridge count to ( the active (current) cartridge count + the
>> spawning cartridge count - the terminating cartridge count ) and scale
>> appropriately
>>
>> *Implementation of ‘spawning’/‘terminating’ cartridge count:*
>>
>> Currently the autoscale feature is not aware of the number of
>> cartridges in the cluster that are transitioning to and from the ACTIVE
>> state. The proposed enhancement relies on being able to know this count at
>> any given moment in time.
>>
>> This can be implemented by using asynchronous events, where:
>>
>> ‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
>> ‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned, and
>> increments cluster-cartridge-count-active
>> ‘MEMBER TERMINATING EVENT’ -> increments
>> cluster-cartridge-count-terminating
>> ‘MEMBER TERMINATED EVENT’ -> decrements cluster-cartridge-count-terminating,
>> and decrements cluster-cartridge-count-active
>>
>> *Summary*
>>
>> By compensating the ‘current’ cartridge count / cluster size with the
>> cartridges that are transitioning, we remove the issue of duplicate
>> scaling decisions whilst also allowing scaling decisions to occur
>> continuously, greatly improving our ‘maximum rate of ascent’ when scaling
>> up our cluster in reaction to a sudden increase in load.
>>
>>
>>
>> From: Michael Hall <micha...@cisco.com>
>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>> Date: Friday, 13 February 2015 11:36
>> To: Lakmal Warusawithana <lak...@wso2.com>, "dev@stratos.apache.org" <
>> dev@stratos.apache.org>, Imesh Gunaratne <im...@wso2.com>
>> Subject: Re: autoscale architecture
>>
>> That’s a good plan,
>>
>> My work number is +442088242650
>>
>> I’m around now, but will break for a while for lunch in an hour or so.
>>
>> Cheers
>>
>> From: Lakmal Warusawithana <lak...@wso2.com>
>> Date: Friday, 13 February 2015 11:24
>> To: "dev@stratos.apache.org" <dev@stratos.apache.org>, Imesh Gunaratne <
>> im...@wso2.com>, Michael Hall <micha...@cisco.com>
>> Subject: Re: autoscale architecture
>>
>> Shall we go for a call? It will be more productive.
>>
>> On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <lak...@wso2.com>
>> wrote:
>>
>>> Hi Michael
>>>
>>> On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <
>>> micha...@cisco.com> wrote:
>>>
>>>> Hi Imesh,
>>>>
>>>> So ‘transition compensated’ refers to cartridges which are
>>>> ‘transitioning’ between SPAWNED-ACTIVE and TERMINATING-TERMINATED.
>>>>
>>>> What it really means is that the ‘aggregated average’ (referred
>>>> to as <metric>PredictedValue in scaling.drl) is compensated:
>>>>
>>>>    1. As if the ‘spawning’ cartridges are providing resource (although
>>>>    they aren’t yet)
>>>>    2. As if the ‘terminating’ cartridges have removed resource
>>>>    (although they haven't yet)
>>>>
>>>> Such that the ‘transition compensated aggregated average’ will be
>>>> approximately what the actual aggregated average would be if those
>>>> cartridges had become fully ‘active’ or ‘terminated’. This means the
>>>> ‘transition compensated aggregated average’ is always in a sensible state
>>>> to make a scaling decision.
>>>>
>>>> This then allows us to make a scaling decision as often as we’d like
>>>> (at intervals much smaller than 90 seconds, could even be every 1
>>>> second), because if you take the example that we’ve scaled up, the
>>>> ‘transition compensated aggregated average’ will instantly adjust to
>>>> N/(N+1) of its raw value (formula copied from previous email for
>>>> reference below), so another scaling decision will only occur if the
>>>> underlying load (aggregated average) increases even further.
>>>>
>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>>
>>>>
>>> I think this is a good proposal; it will definitely help to calculate
>>> more accurate agg-ave values. Since CEP has the topology information, we
>>> can easily calculate this.
>>>
>>> AFAIK, the autoscaler takes care of cartridge states when calculating
>>> the required instance count for a predicted load.
>>>
>>>
>>>
>>>> I’d be more than happy to set up a WebEx meeting to try and explain
>>>> this better. Or another avenue of communication, at your preference?
>>>>
>>>> Kind regards,
>>>>
>>>> Mike
>>>>
>>>> From: Imesh Gunaratne <im...@apache.org>
>>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>>> Date: Friday, 13 February 2015 01:09
>>>> To: dev <dev@stratos.apache.org>
>>>> Subject: Re: autoscale architecture
>>>>
>>>> Hi Mike,
>>>>
>>>> Thanks for the detailed explanation of your question. Currently we do
>>>> not have the capability to do this at runtime for a specific cartridge.
>>>> However, we could reduce the global scaling decision interval. This needs
>>>> to be configured at three locations:
>>>>
>>>>    1. Cartridge agent statistics publishing interval (default: 15
>>>>    seconds)
>>>>    2. CEP execution plan/faulty member detection interval (default: 1 min)
>>>>    3. Autoscaler cluster monitor interval (default: 90 seconds)
>>>>
>>>> I did not clearly get what you mean by 'transition compensated'. Is
>>>> there a way to explain it further?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <
>>>> micha...@cisco.com> wrote:
>>>>
>>>>> Hi Dev,
>>>>>
>>>>> Thanks for your response Imesh. If it's ok, I’d like to skip straight
>>>>> to my (rather lengthy) question:
>>>>>
>>>>> Does the autoscaler currently have, or are there plans to introduce, a
>>>>> means to receive an asynchronous event signalling that a cartridge has
>>>>> gone from ‘SPAWNED’ to ‘ACTIVE’ after it is launched by a ‘scale-up’
>>>>> decision, so that the scaling decision interval can decrease to
>>>>> approximately the metric update interval, and multiple cartridges are
>>>>> not spawned when only one is needed?
>>>>>
>>>>> In more depth:
>>>>>
>>>>> The reason for my question is that by knowing a cartridge is in
>>>>> the ‘SPAWNED’ or ‘TERMINATING’ state, the aggregated metric averages can
>>>>> be ‘transition compensated’, i.e.
>>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>>> to allow the scaling decisions to occur on a continuous basis (only
>>>>> throttled by the metric update frequency).
>>>>>
>>>>> It appears that scaling decisions currently occur on the order of
>>>>> minutes. If this became on the order of seconds, it would vastly improve
>>>>> the maximum rate of ascent at which a cluster can scale against a sudden
>>>>> increase in load.
>>>>>
>>>>> It appears that there is no spawning state awareness, which also
>>>>> means several ‘redundant’ instances get spawned when instance startup
>>>>> time is greater than the scale decision interval.
>>>>>
>>>>> Finally:
>>>>>
>>>>> Are there difficulties in tracking ‘SPAWNED’ to ‘ACTIVE’ state on a
>>>>> per cartridge basis? How does this align (if it's a valid enhancement)
>>>>> with other potential improvements that could be made to the autoscaler?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Mike
>>>>>
>>>>> From: Imesh Gunaratne <im...@apache.org>
>>>>> Reply-To: "dev@stratos.apache.org" <dev@stratos.apache.org>
>>>>> Date: Thursday, 12 February 2015 18:16
>>>>> To: dev <dev@stratos.apache.org>
>>>>> Subject: Re: autoscale architecture
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Yes, you can ask any questions you have on Autoscaling here.
>>>>>
>>>>> I don't think we have documented the Autoscaling feature in 4.1.0 at the
>>>>> moment. However, you can find some information here [1]. Autoscaling has
>>>>> changed slightly with the Composite Application Model.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <
>>>>> micha...@cisco.com> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> Is there a resource or contact that can help me understand the
>>>>>> current and planned architecture of the autoscaling feature within
>>>>>> Stratos?
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> Lakmal Warusawithana
>>> Vice President, Apache Stratos
>>> Director - Cloud Architecture; WSO2 Inc.
>>> Mobile : +94714289692
>>> Blog : http://lakmalsview.blogspot.com/
>>>
>>>
>>
>>
>> --
>> Lakmal Warusawithana
>> Vice President, Apache Stratos
>> Director - Cloud Architecture; WSO2 Inc.
>> Mobile : +94714289692
>> Blog : http://lakmalsview.blogspot.com/
>>
>>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>

--
Lakmal Warusawithana
Vice President, Apache Stratos
Director - Cloud Architecture; WSO2 Inc.
Mobile : +94714289692
Blog : http://lakmalsview.blogspot.com/
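
For reference, the event-driven counters and the compensation formula proposed earlier in this thread could be sketched roughly as below. This is only a minimal Python illustration, not Stratos code: the class `ClusterCounts` and all of its method names are hypothetical, and it assumes (as the proposed event list implies) that a member stays in the active count until its MEMBER TERMINATED event arrives.

```python
from dataclasses import dataclass


@dataclass
class ClusterCounts:
    """Hypothetical per-cluster counters; names are illustrative, not Stratos APIs."""

    active: int = 0       # ACTIVE members (still includes members that are terminating)
    spawned: int = 0      # members spawned but not yet ACTIVE
    terminating: int = 0  # members asked to terminate but not yet TERMINATED

    # One handler per event named in the proposal.
    def on_member_spawned(self) -> None:
        self.spawned += 1

    def on_member_active(self) -> None:
        self.spawned -= 1
        self.active += 1

    def on_member_terminating(self) -> None:
        self.terminating += 1

    def on_member_terminated(self) -> None:
        self.terminating -= 1
        self.active -= 1

    def effective_size(self) -> int:
        # active + spawning - terminating, as in the proposed comparison.
        return self.active + self.spawned - self.terminating

    def compensated_average(self, agg_ave: float) -> float:
        # agg-ave * ( cluster-size / ( cluster-size + spawned - terminating ) )
        effective = self.effective_size()
        if effective <= 0:
            raise ValueError("effective cluster size must be positive")
        return agg_ave * self.active / effective

    def scaling_delta(self, required: int) -> int:
        # Positive: spawn this many more; negative: terminate this many.
        return required - self.effective_size()


counts = ClusterCounts(active=4)
counts.on_member_spawned()                # a scale-up decision was just made
print(counts.compensated_average(90.0))   # 90 * 4/5 = 72.0
print(counts.scaling_delta(required=5))   # 0: the spawning member is already counted
```

With four active members and one spawning, the raw average is scaled by 4/5, which is the N/(N+1) adjustment described in the thread, so a second decision made before the new member becomes ACTIVE does not spawn a redundant cartridge.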