On Wed, Apr 30, 2014 at 9:12 AM, Nirmal Fernando <[email protected]> wrote:
>
> On Wed, Apr 30, 2014 at 9:02 AM, Lahiru Sandaruwan <[email protected]> wrote:
>>
>> On Wed, Apr 30, 2014 at 8:24 AM, Nirmal Fernando <[email protected]> wrote:
>>>
>>> Hi Lahiru,
>>>
>>> I still don't understand what the difference is here. This is the same
>>> concept we had from the pre-Apache era. In the requests-in-flight case, the
>>> user gives the number of requests that an instance can bear, and based on
>>> the current load we would scale.
>>
>> Please note the difference in this number, which I mentioned in the thread I
>> pointed to. This number is a bit different now than it was then.
>
> Well.. can you explain the difference? To me it's just a measure of a
> server's capability to handle load, which is a threshold.

Okay. Let's say it is a threshold.

>>> And as far as I see, what we need to improve is the prediction logic.
>>
>> No. We do not stop after the prediction. We calculate the number of
>> instances, which we did not do before.
>
> We did it in the earlier auto-scaler. We calculated the number of instances
> required and spawned 'n' instances. It is not there right now in 4.0 after
> the architecture change.

Great. Let's find a way to do it in 4.0 as well. Amazon does a good job with
that, and they have an article about it.

>> Then we do not have to worry about the upper limit and lower limit.
>
> Well.. if you look at Asiri's equation, it still uses a threshold value, and
> he's talking about the scaling-up scenario, hence it's the upper limit.

This will be changed. The limits do not apply, as my reply explained.

> None of you is talking about the scaling-down scenario, as far as I see.

The greatness of the new approach is that we do not need to worry about
scaling down or up, or about the limits at which we take the decision. We
consider both scenarios in one formula. Killing two birds with one stone ;)

>>>
>>> On Wed, Apr 30, 2014 at 5:58 AM, Lahiru Sandaruwan <[email protected]> wrote:
>>>>
>>>> Hi Nirmal,
>>>>
>>>> I thought about the scenario a bit and explained it in thread [1]. There,
>>>> Isuru Perera has sent a use case that I used to explain how things happen.
>>>> With the new approach, we need a value from the user, but it is not a
>>>> threshold. It is "the number of concurrent requests that one instance can
>>>> handle".
>>>>
>>>> Anyway, we need more people to think this through :)
>>>>
>>>> Everyone's ideas are highly appreciated, since this is like the "brain" of
>>>> Stratos (it can live without it, but then there is no use ;)).
>>>>
>>>> Thanks.
>>>>
>>>> [1] Load Balancer Statistics Publishing Sliding Window
>>>>
>>>> On Tue, Apr 29, 2014 at 11:39 PM, Nirmal Fernando <[email protected]> wrote:
>>>>>
>>>>> Guys,
>>>>>
>>>>> What's the plan for finding the value of T (the threshold)? To me, we
>>>>> need to get it from the user via the auto-scaling policy.
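To make the single-formula idea above concrete, here is a minimal Java sketch of deriving the instance count from the predicted load and a per-instance capacity; the class name, method name and numbers are hypothetical, not the actual Stratos rule code:

// Hypothetical sketch: derive the required instance count from the predicted
// load and a per-instance capacity, then compare it with the active count.
public class InstanceCountSketch {

    /**
     * @param predictedLoad       predicted load for the next minute (e.g. requests in flight)
     * @param perInstanceCapacity load one instance can handle (the value taken from the user)
     * @param activeInstances     number of instances currently running
     * @return positive = instances to spawn, negative = instances to terminate, 0 = no change
     */
    public static int instanceDelta(double predictedLoad,
                                    double perInstanceCapacity,
                                    int activeInstances) {
        int required = (int) Math.ceil(predictedLoad / perInstanceCapacity);
        required = Math.max(required, 1); // never scale below one instance
        return required - activeInstances;
    }

    public static void main(String[] args) {
        // 450 predicted requests in flight, 100 per instance, 3 running -> +2 (spawn two)
        System.out.println(instanceDelta(450, 100, 3));
        // 120 predicted requests in flight, 100 per instance, 4 running -> -2 (terminate two)
        System.out.println(instanceDelta(120, 100, 4));
    }
}

The same delta covers both directions, so there is no separate upper or lower threshold to maintain.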
>>>>>
>>>>> On Mon, Mar 31, 2014 at 11:40 PM, Lahiru Sandaruwan <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Sat, Mar 29, 2014 at 5:29 AM, Asiri Liyana Arachchi <[email protected]> wrote:
>>>>>>>
>>>>>>> *Predicting the Number of Instances*
>>>>>>>
>>>>>>> Let's take
>>>>>>>
>>>>>>> n - predicted number of instances
>>>>>>> m - active instances
>>>>>>> T - threshold
>>>>>>> L - predicted next-minute load / memory consumption (the return value of the
>>>>>>> *org.apache.stratos.autoscaler.rule.RuleTasksDelegator#getPredictedValueForNextMinute()* method)
>>>>>>> 0.8 - scale-up factor
>>>>>>> 0.2 - scale-down factor
>>>>>>>
>>>>>>> *Requests in flight* is per cluster.
>>>>>>>
>>>>>>> Therefore, as I understood it, the threshold value for requests in flight
>>>>>>> pretty much means how many in-flight requests one instance will handle.
>>>>>>>
>>>>>>> n = L / (T * 0.8)
>>>>>>>
>>>>>>> Scale-down is done only when the predicted value is lower than T * 0.2.
>>>>>>>
>>>>>>> *Memory consumption (mc) and load average (la)* are per member.
>>>>>>
>>>>>> We get these stats cluster-wise as well. Currently the cluster-wise stat is
>>>>>> used for taking the decision. Member-wise stats are used when choosing
>>>>>> nodes to terminate: the least loaded node at the moment will be selected
>>>>>> for termination.
>>>>>>
>>>>>>> m * L <= n * (T * 0.8)
>>>>>>>
>>>>>>> Hence n can be calculated as the integer ceiling of (m * L) / (T * 0.8).
>>>>>>> Scale-down is done only when the predicted value is lower than T * 0.2.
>>>>>>>
>>>>>>> *getPredictedValueForNextMinute()* predicts the next-minute values. So
>>>>>>> rather than writing an instance-prediction algorithm from scratch, the
>>>>>>> needed instances can be calculated easily from the provided next-minute
>>>>>>> values (IMO).
>>>>>>> Currently the Stratos autoscaler is only capable of scaling up or down by
>>>>>>> one instance based on predicted values. But using this method it is
>>>>>>> capable of predicting exactly how many instances should be spawned to
>>>>>>> handle the next minute's load, and even when scaling down it will predict
>>>>>>> how many instances should be terminated.
>>>>>>> Code: [1]
>>>>>>>
>>>>>>> I would like to know your comments on this approach.
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://github.com/asiriwork/autoscaler-stratos/blob/a770787dca78ecfa3649624613fbb505280a2fb9/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/rule/RuleTasksDelegator.java
>>>>>>>
>>>>>>> Regards,
>>>>>>> Asiri
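A purely illustrative worked example of these formulas, with made-up numbers: with a per-instance threshold T = 100 requests in flight and a predicted cluster load L = 450, n = ceil(450 / (100 * 0.8)) = ceil(5.63) = 6 instances. For a per-member metric such as load average, with m = 4 active instances, a predicted per-member value L = 12 and T = 80, n = ceil((4 * 12) / (80 * 0.8)) = ceil(0.75) = 1, and since 12 is below the scale-down mark of T * 0.2 = 16, three of the four instances could be terminated.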
>>>>>>>
>>>>>>> On Sun, Mar 23, 2014 at 11:53 AM, Lahiru Sandaruwan <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Great to hear that.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Sat, Mar 22, 2014 at 1:53 AM, Asiri Liyana Arachchi <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I've submitted the proposal for the "Improvements to Autoscaling for
>>>>>>>>> Apache Stratos" project on google-melange.
>>>>>>>>>
>>>>>>>>> Here is the link:
>>>>>>>>>
>>>>>>>>> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/asiria/5629499534213120
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Asiri
>>>>>>>>>
>>>>>>>>> On Tue, Mar 18, 2014 at 4:29 AM, Asiri Liyana Arachchi <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for the detailed reply.
>>>>>>>>>>
>>>>>>>>>> It helped a lot in getting familiar with Drools by running the samples
>>>>>>>>>> you pointed to. And I've built the code base.
>>>>>>>>>>
>>>>>>>>>> After going through scaling.drl
>>>>>>>>>> (products/autoscaler/modules/distribution/src/main/conf/scaling.drl)
>>>>>>>>>> it was clear that Stratos currently uses the
>>>>>>>>>> RuleTasksDelegator.getPredictedValueForNextMinute() method to compare
>>>>>>>>>> stat values against the thresholds.
>>>>>>>>>>
>>>>>>>>>> *Approach to deciding the number of instances that might be needed to
>>>>>>>>>> handle the load:*
>>>>>>>>>>
>>>>>>>>>> Using the existing method for predicting next-minute requests in
>>>>>>>>>> flight, load average and memory consumption:
>>>>>>>>>>
>>>>>>>>>>    - Assumption: the current thresholds of those metrics are the
>>>>>>>>>>    optimal values for an instance.
>>>>>>>>>>    - Based on that, implement a simple algorithm to decide how many
>>>>>>>>>>    instances might be needed for the next minute, using the predicted
>>>>>>>>>>    values for those metrics.
>>>>>>>>>>    - The algorithm will be implemented in such a way that it always
>>>>>>>>>>    keeps the instances under (or near) the thresholds of one or more
>>>>>>>>>>    metrics, without exceeding them.
>>>>>>>>>>    - Assumption: metrics behave inversely or directly proportionally
>>>>>>>>>>    when instances are spawned (e.g. load is equally distributed among
>>>>>>>>>>    all the instances plus the newly spawned instances).
>>>>>>>>>>
>>>>>>>>>> *Predict the load according to a schedule defined by the end user*
>>>>>>>>>>
>>>>>>>>>> *Does this mean providing functionality in the web UI to define a
>>>>>>>>>> schedule and make it active?* It's not clear to me.
>>>>>>>>>> *Can this be achieved by generating an auto-scale policy XML with
>>>>>>>>>> user-defined thresholds, similar to how it's done currently, and making
>>>>>>>>>> it possible to override the auto-scaling algorithm in use when needed
>>>>>>>>>> (e.g. at a specific, already-defined time)?*
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Asiri
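For illustration, a minimal Java sketch of the combined decision described in that approach, taking the most demanding of the three metrics so that none of them exceeds its threshold; all names and numbers here are hypothetical, not the actual autoscaler code:

// Hypothetical sketch: compute required instances per metric, then take the
// maximum so that no metric exceeds its threshold on any instance.
public class CombinedMetricSketch {

    /** required = ceil(total predicted load / per-instance threshold) */
    static int required(double totalPredictedLoad, double threshold) {
        return (int) Math.ceil(totalPredictedLoad / threshold);
    }

    public static void main(String[] args) {
        int active = 4;

        // Requests in flight are per cluster; load average and memory
        // consumption are per member, so multiply the per-member prediction
        // by the current active count to get a cluster-wide total.
        int byRif    = required(450.0, 100.0);        // predicted cluster requests in flight
        int byLoad   = required(active * 55.0, 70.0); // predicted per-member load average
        int byMemory = required(active * 60.0, 85.0); // predicted per-member memory use (%)

        int needed = Math.max(byRif, Math.max(byLoad, byMemory));
        System.out.println("instances needed next minute: " + needed);
    }
}

Taking the maximum keeps every metric under its threshold, which matches the "keep the instances under the thresholds of one or more metrics" point above.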
>>>>>>>>>>>
>>>>>>>>>>> Hi Asiri,
>>>>>>>>>>>
>>>>>>>>>>> It is a pleasure to see your interest. Sorry for the late reply; I
>>>>>>>>>>> missed the mail.
>>>>>>>>>>>
>>>>>>>>>>> Get the code base and build it, as a starting point for Stratos.
>>>>>>>>>>>
>>>>>>>>>>> You will not find Drools hard after running some samples; [1] looks
>>>>>>>>>>> like a good sample. You can just run those in WSO2 BRS. You can use
>>>>>>>>>>> your Java knowledge, as we can write Java code in the "then" section.
>>>>>>>>>>>
>>>>>>>>>>> AMQP knowledge means you have to understand the pub/sub model with
>>>>>>>>>>> topics. Conceptually, that's it. In addition, handling subs/pubs from
>>>>>>>>>>> Java code.
>>>>>>>>>>>
>>>>>>>>>>> Great research. Find my comments inline.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 11, 2014 at 11:23 AM, Asiri Liyana Arachchi <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Improve auto-scaling to predict the number of instances required
>>>>>>>>>>>> in the next time interval.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I understood, this project aims at introducing to Stratos a
>>>>>>>>>>>> new auto-scaling strategy, apart from the threshold-based auto-scaling
>>>>>>>>>>>> currently in use, making it more proactive about auto-scaling.
>>>>>>>>>>>
>>>>>>>>>>> Correct. So the system should scale by understanding the load, and
>>>>>>>>>>> hence the number of instances that would be required to handle that
>>>>>>>>>>> load.
>>>>>>>>>>>
>>>>>>>>>>> We have 3 types of information about load, and should consider all 3
>>>>>>>>>>> for our decision:
>>>>>>>>>>>
>>>>>>>>>>>    - Requests in flight (information about how many requests are
>>>>>>>>>>>    waiting to get a response)
>>>>>>>>>>>    - Load average of the running cartridge instances
>>>>>>>>>>>    - Memory consumption of the running cartridge instances
>>>>>>>>>>>
>>>>>>>>>>>> To do that, there are several strategies suggested:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Kalman filter
>>>>>>>>>>>> 2. Control theory
>>>>>>>>>>>> 3. Time series analysis
>>>>>>>>>>>> 4. FFT
>>>>>>>>>>>>
>>>>>>>>>>>> Having gone through these techniques, for now I feel that the Kalman
>>>>>>>>>>>> filter would be the most viable candidate and that it can be used to
>>>>>>>>>>>> address this issue effectively. There is an Apache API for the Kalman
>>>>>>>>>>>> filter [1].
>>>>>>>>>>>
>>>>>>>>>>> We should find an efficient, yet simple, way to get the job done. We
>>>>>>>>>>> currently use the S = u*t + 0.5*a*t*t prediction (motion) equation.
>>>>>>>>>>> This is one of the equations the Kalman filter uses to do prediction.
>>>>>>>>>>> But with this, we have to compare against a threshold to take the
>>>>>>>>>>> decision.
>>>>>>>>>>>
>>>>>>>>>>> We receive the second derivative, gradient and average values at a
>>>>>>>>>>> given time. Let's say the time interval we consider is a minute. So we
>>>>>>>>>>> can predict the load in the next minute using them. We also know the
>>>>>>>>>>> number of instances that are running at the moment. The algorithm does
>>>>>>>>>>> not need to be complex; it should just be intelligent enough to find
>>>>>>>>>>> the matching number of instances that should be there in the next
>>>>>>>>>>> minute.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://docs.wso2.org/display/BRS200/Sample+Rule+Definition
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>> But I think selecting an auto-scaling algorithm will involve more
>>>>>>>>>>>> research and testing. Even selecting the metrics to predict on will be
>>>>>>>>>>>> challenging, because some of the metrics, for example *load average*,
>>>>>>>>>>>> depend on auto-scaling itself, causing predictions to deviate from the
>>>>>>>>>>>> actual values.
>>>>>>>>>>>
>>>>>>>>>>> I would appreciate it if you could comment on this.
>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/filter/KalmanFilter.html
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Asiri
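A minimal Java sketch of that second-order prediction, assuming the published statistics carry the metric's current average, its gradient (playing the role of u) and its second derivative (playing the role of a) in S = u*t + 0.5*a*t*t; the names here are illustrative, not the actual RuleTasksDelegator code:

// Hypothetical sketch of applying S = u*t + 0.5*a*t*t to load prediction:
// start from the current average, add the change due to the gradient, and
// add the change due to the second derivative over the interval t.
public class NextMinutePredictionSketch {

    static double predictNextMinute(double average, double gradient,
                                    double secondDerivative, double tMinutes) {
        return average + gradient * tMinutes
                + 0.5 * secondDerivative * tMinutes * tMinutes;
    }

    public static void main(String[] args) {
        // e.g. 200 requests in flight on average, rising by 40/min and accelerating by 10/min^2
        double predicted = predictNextMinute(200, 40, 10, 1.0);
        System.out.println("predicted load for next minute: " + predicted); // 245.0
    }
}

The predicted value could then feed an instance-count calculation like the ones sketched earlier, instead of being compared against a single threshold.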
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 6, 2014 at 7:38 AM, Udara Liyanage <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Asiri,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Glad to hear of your interest in Stratos. I don't think it will take
>>>>>>>>>>>>> more than a few days to learn Drools and AMQP. You will be able to do
>>>>>>>>>>>>> it within the given time period.
>>>>>>>>>>>>> Happy to see your project proposal soon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Touched, not typed. Erroneous words are a feature, not a typo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 6, 2014 7:13 AM, "Asiri Liyana Arachchi" <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm Asiri Liyana Arachchi, a third-year student studying Computer
>>>>>>>>>>>>>> Science and Engineering at the University of Moratuwa, Sri Lanka.
>>>>>>>>>>>>>> I would like to start contributing to the $subject project. I've gone
>>>>>>>>>>>>>> through the resources about this project, including the Stratos
>>>>>>>>>>>>>> documentation and the code base.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As expected, I'm familiar with Java, JSON and SOA. I would like to
>>>>>>>>>>>>>> know how much, and in what cases, Drools and AMQP skills are
>>>>>>>>>>>>>> required. Also, would it be feasible to complete the project within
>>>>>>>>>>>>>> its limited time, considering that Drools and AMQP have to be learnt
>>>>>>>>>>>>>> alongside the rest of the project work?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Asiri
>
> --
> Best Regards,
> Nirmal
>
> Nirmal Fernando.
> PPMC Member & Committer of Apache Stratos,
> Senior Software Engineer, WSO2 Inc.
>
> Blog: http://nirmalfdo.blogspot.com/


--
--
Lahiru Sandaruwan
Committer and PPMC member, Apache Stratos(incubating),
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: [email protected] cell: (+94) 773 325 954
blog: http://lahiruwrites.blogspot.com/
twitter: http://twitter.com/lahirus
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
