Hi all,

*Calculating the number of requests that an instance can handle*

We need the number of requests served within a minute for a cluster.
This value will be taken from the LB. As it's not currently available, that
logic will be written in the LB.

*Let's consider the following scenario:*

number of requests served within a minute = 70
active instances for that particular minute = 4

average number of requests that an instance can handle = 70 / 4 = 17.5

This will be calculated for a number of consecutive minutes (at least 10) and
then the final average will be taken.
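The per-minute figure described above is simply the requests served divided by the active instance count. A minimal Python sketch (the function name is hypothetical, not actual Stratos code):

```python
def requests_per_instance(requests_served: int, active_instances: int) -> float:
    """Average number of requests one instance handled within a given minute,
    computed from the per-minute totals reported by the LB."""
    if active_instances <= 0:
        raise ValueError("at least one active instance is required")
    return requests_served / active_instances

# Scenario above: 70 requests served, 4 active instances
print(requests_per_instance(70, 4))  # 17.5
```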


*For example:*

t = 13 min to t = 23 min

n1 = *average number of requests that an instance can handle* at t = 13 min,
calculated using the above method
n2 = *average number of requests that an instance can handle* at t = 14 min,
calculated using the above method
...
n10 = *average number of requests that an instance can handle* at t = 22 min,
calculated using the above method

*final average number of requests that an instance can handle* = (n1 + n2 + ... + n10) / 10

This value will be used to calculate the *number of instances* needed for
autoscaling throughout the next 10 minutes (t = 23 min to t = 33 min).
The required instance count will be rounded up to the ceiling value.
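The windowed average can be sketched as follows (names and sample values are illustrative only):

```python
def final_average(per_minute_averages):
    """Final average over one window of per-minute values (n1..n10 above)."""
    return sum(per_minute_averages) / len(per_minute_averages)

# Ten hypothetical per-minute averages collected from t = 13 min to t = 22 min
samples = [17.5, 18.0, 16.5, 17.0, 18.5, 17.5, 16.0, 18.0, 17.5, 17.0]
capacity = final_average(samples)  # used for autoscaling from t = 23 min to t = 33 min
```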

During t = 23 min to t = 33 min, a new *final average number of requests that
an instance can handle* will be calculated, and that value will be used for
the instance calculation throughout t = 33 min to t = 43 min.



The number of times that the *average number of requests that an instance can
handle* is calculated before taking the final average (10 in the above
example) will be made user configurable.


*Issues:*
There might be a problem if a pending instance becomes an active instance
within the minute considered for the *average number of requests that an
instance can handle* calculation. Since we take the average over several
consecutive minutes, that error will be minimized.

As suggested in the thread, at initiation (t = 0 min) the *average number of
requests that an instance can handle* value is not set. The system will wait
at least 2 minutes (to gather the stats needed for the above calculation)
without autoscaling until that value is set. Once it's set, the above
procedure will continue. Normally a minimum number of instances will be
spawned at initiation, and the above calculation will be based on that
instance count.

Depending on the user's requirements, the initial value of *average number of
requests that an instance can handle* will either be taken from the user or
be calculated automatically as explained above.
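One way to picture the initiation behaviour (class and attribute names are hypothetical, not the actual autoscaler code):

```python
class CapacityEstimator:
    """Tracks per-minute averages; autoscaling waits until a value is set,
    either supplied by the user or computed from at least 2 minutes of stats."""

    def __init__(self, window=10, min_samples=2, initial=None):
        self.window = window            # user-configurable sample count
        self.min_samples = min_samples  # wait at least 2 minutes at t = 0
        self.samples = []               # per-minute averages
        self.value = initial            # optional user-supplied starting value

    def record(self, requests_served, active_instances):
        self.samples.append(requests_served / active_instances)
        if len(self.samples) >= self.min_samples:
            recent = self.samples[-self.window:]
            self.value = sum(recent) / len(recent)

    def ready(self):
        return self.value is not None
```

If the user supplies an initial value, `ready()` is true immediately; otherwise autoscaling holds off until enough per-minute stats have been recorded.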

Welcome your thoughts on this!



*Predicting the number of instances needed*

Quoting myself:

On Thu, Jun 5, 2014 at 9:51 AM, Asiri Liyana Arachchi <asiriw...@gmail.com>
 wrote:

>
> For an e.g.
> Number of rif that an instance could handle - 50
> Predicted rif =170
> Required instances = 170 /50
>                               = 4 (taking the ceiling value )
>
> If the current number of instances is 2 another 4-2 have to be spawned.
> If the current number of instances is 6 , the number of instances that
> should be terminated is 4-6
>
> When rounding of values ( number of instances ) we can either follow the
> way amazon did it for percentage based auto scaling [1] or we can let user
> decide (in autoscaling policy) whether to use ceiling or floor value to
> round off depending on his server availability requirements. Welcome your
> thoughts on this.
>
>
I'd appreciate your comments on this rounding-off method for instance
calculation.
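The quoted arithmetic can be sketched as a small helper (name is hypothetical), with the caller choosing ceiling or floor rounding as discussed:

```python
import math

def scaling_delta(predicted_rif, rif_per_instance, current_instances,
                  use_ceiling=True):
    """Positive result: instances to spawn; negative: instances to terminate."""
    rounding = math.ceil if use_ceiling else math.floor
    required = rounding(predicted_rif / rif_per_instance)
    return required - current_instances

# Quoted example: predicted rif 170, capacity 50 per instance -> 4 required
print(scaling_delta(170, 50, 2))  # 2: spawn two more instances
print(scaling_delta(170, 50, 6))  # -2: terminate two instances
```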




Regards,
Asiri



On Mon, Jun 16, 2014 at 6:34 PM, Lahiru Sandaruwan <lahi...@wso2.com> wrote:

>
>
>
> On Sat, Jun 7, 2014 at 3:09 AM, Asiri Liyana Arachchi <asiriw...@gmail.com
> > wrote:
>
>> Hi,
>>
>> On Thu, Jun 5, 2014 at 5:09 PM, Dale Chalfant <dchalf...@greaterbrain.com
>> > wrote:
>>
>> Hello Asiri,
>>>
>>>
>>>
>>> Are you using acceleration and gradient to determine the predicted rif
>>> (over the amount of time that it is expected to launch those new
>>> instances)?  If we are rapidly increasing (high acceleration), the
>>> prediction will be higher than if we have been relatively stable (low
>>> acceleration) over time.  I recall these factors being added some months
>>> back to allow for this capability.
>>>
>>
>>  Yes. The current prediction equation is used to predict the rif .
>>
>>
>> Currently I'm working on how to predict the number of concurrent requests
>> that an instance can handle.
>>
>> On Fri, Jun 6, 2014 at 9:31 AM, Lahiru Sandaruwan <lahi...@wso2.com>
>>  wrote:
>>
>>>
>>> It will be easy to find measures from LB side. We already calculate the
>>> requests delivered and response we got back. But we calculate them per
>>> cluster, not per instance. May be we can improve that too.
>>>
>>
>> Calculating request stats (response time and rif) per instance seems
>> essential for deciding the number of concurrent requests that an instance
>> can handle.
>>
>> But still I think for the initial state of instance calculation using
>> user defined value (provided considering server capabilities etc. ) would
>> make much more sense since the concurrent requests that can be handled by
>> an instance might change over time. It would be just a good value to start
>> with.Similarly in the prevailing configuration user is expected to provide
>> threshold values for rif in AS policy.
>> I don't understand how to decide on what value should be used as the
>> number of concurrent requests that can be handled by an instance at the
>> initial state without having any stat for predicting if it's not given.
>>
>
> Yes, but we can wait until we find it if we want. May be 10 minutes limit.
>
>>
>>
>> On Fri, Jun 6, 2014 at 9:31 AM, Lahiru Sandaruwan <lahi...@wso2.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Jun 5, 2014 at 10:59 PM, Nirmal Fernando <nirmal070...@gmail.com
>>> > wrote:
>>>
>>>> If we are to go in that path, we need to collect stats from LB such as
>>>> response time. In that case, we might not even need to consider RIF count.
>>>> I think we should make this another dimension :-)
>>>>
>>>>
>>> hmm.. We might need to research a bit to find the which attributes to
>>> catch, like response time.
>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 10:50 PM, Akila Ravihansa Perera <
>>>> raviha...@wso2.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sounds good to me :)
>>>>>
>>>>> I wouldn't rely on any user input on number of concurrent requests
>>>>> that can be handled by a cartridge instance. A user has no way of
>>>>> knowing how the instance would perform in a production environment.
>>>>> Things could get messy, some service calls may take too long to
>>>>> respond than initially anticipated by the user.
>>>>>
>>>>> So my suggestion was that AS should predict this value. And LB should
>>>>> be the entity that will monitor how requests are being handled by
>>>>> cartridge instances and help AS to decide what would be the ideal
>>>>> concurrent requests count per each instance.
>>>>>
>>>>>
>>> It will be easy to find measures from LB side. We already calculate the
>>> requests delivered and response we got back. But we calculate them per
>>> cluster, not per instance. May be we can improve that too.
>>>
>>> AS will get the stats summarized at CEP currently. Hope we can use the
>>> same model.
>>>
>>>
>>>  On Thu, Jun 5, 2014 at 10:24 PM, Lahiru Sandaruwan <lahi...@wso2.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Good suggestion on considering the feedback.
>>>>> >
>>>>> > The number of requests successfully served in particular period will
>>>>> be a
>>>>> > good value for feedback(What else?). We can take this value from LB
>>>>> itself.
>>>>> >
>>>>> > Also how about giving the freedom to user, whether he want to give
>>>>> the
>>>>> > initial value("number of concurrent requests that can be handled by
>>>>> an
>>>>> > cartridge Instance") or not?
>>>>> >
>>>>> > On Thu, Jun 5, 2014 at 5:27 PM, Akila Ravihansa Perera <
>>>>> raviha...@wso2.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hi Nirmal,
>>>>> >>
>>>>> >> I only gave a simple example for calculating the number of instances
>>>>> >> required. Taking the avg for per instance ideal RIF is a naive
>>>>> method
>>>>> >> of calculating it. However, the main idea is that rather than taking
>>>>> >> user input for per instance RIF, AS will predict this by looking at
>>>>> >> current active instances, avg RIF defined in policy and predicted
>>>>> RIF.
>>>>> >>
>>>>> >>
>>>>> >> This is a matter of separating concerns for LB and AS. IMHO, it is
>>>>> up
>>>>> >> to the LB to decide whether an instance can handle a certain level
>>>>> of
>>>>> >> concurrent requests.
>>>>> >>
>>>>> >> I'm still unclear about this user input parameter, "Number of rif
>>>>> than
>>>>> >> an instance could handle", that Asiri has mentioned. Could you
>>>>> please
>>>>> >> elaborate more on that?
>>>>> >
>>>>> >
>>>>> > In Asiri's proposal, user input is "number of concurrent requests
>>>>> that can
>>>>> > be handled by an cartridge Instance".
>>>>> >>
>>>>> >>
>>>>> >> Thanks.
>>>>> >>
>>>>> >> On Thu, Jun 5, 2014 at 12:07 PM, Nirmal Fernando <
>>>>> nirmal070...@gmail.com>
>>>>> >> wrote:
>>>>> >> > Hi Akila,
>>>>> >> >
>>>>> >> > How did you come up with the value of 75 (150/2 ?)? What's the
>>>>> basis for
>>>>> >> > assuming that all 150 requests are served correctly? (Server
>>>>> might be
>>>>> >> > capable of handling only 20 concurrent requests at a moment.
>>>>> >> >
>>>>> >> >
>>>>> >> > On Thu, Jun 5, 2014 at 12:00 PM, Akila Ravihansa Perera
>>>>> >> > <raviha...@wso2.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> Hi Asiri,
>>>>> >> >>
>>>>> >> >> Great work on the proposal. I have some few concerns/suggestions.
>>>>> >> >>
>>>>> >> >> RIF metric is calculated by taking the number of requests
>>>>> currently in
>>>>> >> >> the LB's queue, AFAIK. Therefore, rather than taking input for
>>>>> rif
>>>>> >> >> count that an instance could handle, it would make sense to
>>>>> calculate
>>>>> >> >> the number of instances required to maintain the average RIF.
>>>>> >> >>
>>>>> >> >> For eg. let's say we have 2 instances, and RIF avg is 150, and
>>>>> >> >> predicted RIF goes to 170. It means using 2 instances, one
>>>>> instance
>>>>> >> >> may have to take 85 RIF. But avg RIF for one instance should
>>>>> ideally
>>>>> >> >> be 75. Then we can calculate how many instances we need to
>>>>> maintain 75
>>>>> >> >> RIF per instance.
>>>>> >> >>
>>>>> >> >> This is merely a suggestion. Reason is I don't think taking user
>>>>> input
>>>>> >> >> for RIF per instance would make much sense, IMHO.
>>>>> >> >>
>>>>> >> >> Thanks.
>>>>> >> >>
>>>>> >> >> On Thu, Jun 5, 2014 at 9:51 AM, Asiri Liyana Arachchi
>>>>> >> >> <asiriw...@gmail.com> wrote:
>>>>> >> >> > 1. Improve the auto-scaling to predict the number of instances
>>>>> >> >> > needed.
>>>>> >> >> >
>>>>> >> >> > Starting a new thread with suggestions to predict the number of
>>>>> >> >> > instances.
>>>>> >> >> >
>>>>> >> >> > There are three factors that are being considered when auto
>>>>> scaling.
>>>>> >> >> > Requests in flight (rif)
>>>>> >> >> > Memory Consumption
>>>>> >> >> > Load average.
>>>>> >> >> >
>>>>> >> >> > For requests in flight.
>>>>> >> >> >
>>>>> >> >> > User input - Number of rif than an instance could handle.
>>>>> >> >> >
>>>>> >> >> > Once it's given we can simply calculate the required number of
>>>>> >> >> > instances
>>>>> >> >> > to
>>>>> >> >> > spawn or terminate.
>>>>> >> >> >
>>>>> >> >> > For an e.g.
>>>>> >> >> > Number of rif that an instance could handle - 50
>>>>> >> >> > Predicted rif =170
>>>>> >> >> > Required instances = 170 /50
>>>>> >> >> >                               = 4 (taking the ceiling value )
>>>>> >> >> >
>>>>> >> >> > If the current number of instances is 2 another 4-2 have to be
>>>>> >> >> > spawned.
>>>>> >> >> > If the current number of instances is 6 , the number of
>>>>> instances
>>>>> >> >> > that
>>>>> >> >> > should be terminated is 4-6
>>>>> >> >> >
>>>>> >> >> > When rounding of values ( number of instances ) we can either
>>>>> follow
>>>>> >> >> > the
>>>>> >> >> > way
>>>>> >> >> > amazon did it for percentage based auto scaling [1] or we can
>>>>> let
>>>>> >> >> > user
>>>>> >> >> > decide (in autoscaling policy) whether to use ceiling or floor
>>>>> value
>>>>> >> >> > to
>>>>> >> >> > round off depending on his server availability requirements.
>>>>> Welcome
>>>>> >> >> > your
>>>>> >> >> > thoughts on this.
>>>>> >> >> >
>>>>> >> >> >  Here is the project's work that i'm supposed to complete.
>>>>> >> >> >
>>>>> >> >> > 1) setting up apache stratos on openstack.
>>>>> >> >> > 2) research on how to use load average / memory consumption for
>>>>> >> >> > instance
>>>>> >> >> > calculation.
>>>>> >> >> > 3) Getting community feed back and implementation.
>>>>> >> >> > 4) Research on improving prediction algorithm.
>>>>> >> >> > 5) Schedule based autoscaling.
>>>>> >> >> >
>>>>> >> >> > Currently working on setting up apache stratos.(for testing)
>>>>> >> >> >
>>>>> >> >> > [1]
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Akila Ravihansa Perera
>>>>> >> >> Software Engineer
>>>>> >> >> WSO2 Inc.
>>>>> >> >> http://wso2.com
>>>>> >> >>
>>>>> >> >> Phone: +94 77 64 154 38
>>>>> >> >> Blog: http://ravihansa3000.blogspot.com
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> > Best Regards,
>>>>> >> > Nirmal
>>>>> >> >
>>>>> >> > Nirmal Fernando.
>>>>> >> > PPMC Member & Committer of Apache Stratos,
>>>>> >> > Senior Software Engineer, WSO2 Inc.
>>>>> >> >
>>>>> >> > Blog: http://nirmalfdo.blogspot.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Akila Ravihansa Perera
>>>>> >> Software Engineer
>>>>> >> WSO2 Inc.
>>>>> >> http://wso2.com
>>>>> >>
>>>>> >> Phone: +94 77 64 154 38
>>>>> >> Blog: http://ravihansa3000.blogspot.com
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > --
>>>>> > Lahiru Sandaruwan
>>>>> > Committer and PMC member, Apache Stratos,
>>>>> > Senior Software Engineer,
>>>>> > WSO2 Inc., http://wso2.com
>>>>> > lean.enterprise.middleware
>>>>> >
>>>>> > email: lahi...@wso2.com cell: (+94) 773 325 954
>>>>> > blog: http://lahiruwrites.blogspot.com/
>>>>> > twitter: http://twitter.com/lahirus
>>>>> > linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Akila Ravihansa Perera
>>>>> Software Engineer
>>>>> WSO2 Inc.
>>>>> http://wso2.com
>>>>>
>>>>> Phone: +94 77 64 154 38
>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Nirmal
>>>>
>>>> Nirmal Fernando.
>>>> PPMC Member & Committer of Apache Stratos,
>>>> Senior Software Engineer, WSO2 Inc.
>>>>
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Lahiru Sandaruwan
>>> Committer and PMC member, Apache Stratos,
>>> Senior Software Engineer,
>>> WSO2 Inc., http://wso2.com
>>> lean.enterprise.middleware
>>>
>>> email: lahi...@wso2.com cell: (+94) 773 325 954
>>> blog: http://lahiruwrites.blogspot.com/
>>> twitter: http://twitter.com/lahirus
>>> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>>>
>>>
>>
>
>
> --
> --
> Lahiru Sandaruwan
> Committer and PMC member, Apache Stratos,
> Senior Software Engineer,
> WSO2 Inc., http://wso2.com
> lean.enterprise.middleware
>
> email: lahi...@wso2.com cell: (+94) 773 325 954
> blog: http://lahiruwrites.blogspot.com/
> twitter: http://twitter.com/lahirus
> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>
>
