Re: Autoscaling in an IaaS environment

Sam Bessalah Fri, 21 Aug 2015 08:46:41 -0700

Looks like Netflix has open sourced Fenzo, their Scheduler for EC2, with
autoscaling. Might be of interest here:
http://techblog.netflix.com/2015/08/fenzo-oss-scheduler-for-apache-mesos.html


On Tue, Aug 4, 2015 at 3:09 PM, VELTEN, MATHIEU <mathieu.vel...@atos.net>
wrote:

> Hi,
>
> First, thanks for the various answers, and sorry for the duplicate first
> email. I thought it hadn't gone through to the mailing list because of the
> attachment, and I failed to notice the pagination when checking the
> archives...
>
> I didn't know about the requestResources API; indeed it looks a
> lot like my proposed "wishes". With that API I think I have everything in
> place to write a custom Allocator that should fit our needs.
> I also think it could be useful to provide it inside Mesos as an
> alternative (default?) to the current Allocator, if there is enough
> interest/usage to support it.
>
> Here is the intended design I came up with while reading your various
> contributions; feel free to comment:
>
> I understand the various concerns regarding the support burden of introducing
> specific IaaS drivers, and I agree with you on that: it should be left to a
> plugin system with really simple methods like scaleUp(Requests) and
> scaleDown(emptyVolatileSlaves, nonFullVolatileSlaves).
>
> You can basically put all the important and user-specific logic (ramp-up
> granularity, cooldown period, bin packing) in the plugin; allocator
> modifications would then be kept to a minimum:
> - after an unsuccessful round of offers, gather wishes if any and call
> scaleUp (requests can be empty here, which is equivalent to a blind scaleUp)
> - after a round of offers, list empty and non-full volatile slaves and call
> scaleDown. Bin packing can be handled inside the plugin by migrating tasks
> between the volatile slaves before killing the remaining ones
>
> All of this can even be put under an if (plugin != null), so there is
> basically no impact at all when no auto-scale plugin is provided.
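
The plugin contract described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `AutoScalePlugin`, `RecordingPlugin` and `after_offer_round`, and the shapes of the arguments, are hypothetical and are not Mesos APIs. Only the two method names (snake-cased here) and the null check come from the proposal.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class AutoScalePlugin(ABC):
    """Hypothetical plugin contract; not part of Mesos."""

    @abstractmethod
    def scale_up(self, requests: List[Dict[str, float]]) -> None:
        """Called after an unsuccessful offer round.

        An empty list is equivalent to a blind scale-up.
        """

    @abstractmethod
    def scale_down(self, empty_volatile: List[str],
                   non_full_volatile: List[str]) -> None:
        """Called after an offer round with the volatile slaves to consider."""


class RecordingPlugin(AutoScalePlugin):
    """Toy plugin that only records calls; a real one would call an IaaS API."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def scale_up(self, requests):
        self.calls.append(("scale_up", list(requests)))

    def scale_down(self, empty_volatile, non_full_volatile):
        self.calls.append(
            ("scale_down", list(empty_volatile), list(non_full_volatile)))


def after_offer_round(plugin: Optional[AutoScalePlugin],
                      round_succeeded: bool,
                      pending_requests: List[Dict[str, float]],
                      volatile_usage: Dict[str, float]) -> None:
    """Allocator-side hook (assumed name).

    volatile_usage maps a volatile slave id to the fraction of its
    resources currently in use, from 0.0 (empty) to 1.0 (full).
    """
    if plugin is None:  # no auto-scale plugin provided: no impact at all
        return
    if not round_succeeded:
        plugin.scale_up(pending_requests)
    empty = sorted(s for s, used in volatile_usage.items() if used == 0.0)
    non_full = sorted(s for s, used in volatile_usage.items()
                      if 0.0 < used < 1.0)
    plugin.scale_down(empty, non_full)
```

Ramp-up granularity, cooldown and bin packing would all live inside a concrete plugin, so the allocator side stays as small as the hook shown here.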
>
> For my information, is there any WIP regarding task migration? We don't
> really need it for bin packing since most of our workflow is stateless, and I
> think we will not allow stateful services on volatile slaves anyway.
>
> Regards,
>
> Mathieu
>
>
> -----Original Message-----
> From: Yong Feng [mailto:fengyong...@gmail.com]
> Sent: Tuesday, August 04, 2015 5:59 AM
> To: dev@mesos.apache.org
> Subject: Re: Autoscaling in an IaaS environment
>
> Hi Benjamin
>
> Right, Mesos has to orchestrate the shrink, for example by notifying
> frameworks to gracefully terminate workloads, or even by making the
> scheduling decision about which host will be closed and reclaimed. However,
> that does not mean Mesos has to be built with the policy that triggers the
> auto-scale.
>
> The policy for auto-scaling the Mesos cluster itself tries to meet the
> overall SLA of the Mesos cluster, which may not be the SLA of a specific
> framework, even though they may be related.
>
> It is probably better to have Mesos focus on resource sharing among
> frameworks to meet each framework's SLA, while an outside auto-scaler
> monitors Mesos and works with it to meet the SLA of Mesos (all the
> frameworks).
>
> Thanks,
>
> Yong
>
> On Mon, Aug 3, 2015 at 2:34 PM, Benjamin Mahler <benjamin.mah...@gmail.com
> >
> wrote:
>
> > With auto-scaling, shrinking is not as easy as growing. For example,
> > we may need to "defragment" the cluster in order to shrink the number
> > of slaves, and Mesos seems to be in the best position to orchestrate
> > such a process if you want to do this based on frameworks' SLA
> > constraints (it would re-use inverse offers).
> >
> > On Sun, Aug 2, 2015 at 5:35 PM, Yong Feng <fengyong...@gmail.com> wrote:
> >
> > > I prefer an auto-scaler outside Mesos as well. As long as Mesos
> > > exports enough statistics, an outside auto-scaler should be able to
> > > make auto-scale decisions as smart as Mesos itself would. It will also
> > > help to decouple resource scheduling from resource infrastructure
> > > management. Mesos just needs to focus on how to support adding and
> > > removing nodes dynamically and gracefully, without impacting running
> > > workloads, with features such as host maintenance/removal.
> > >
> > > Besides, exporting statistics also helps with Mesos
> > > diagnosis/troubleshooting, simulation, profiling and so on.
> > >
> > > The only case an outside auto-scaler may not support is when the
> > > auto-scale decision has an impact on the scheduling decision. For
> > > example, a resource manager (like Mesos) doesn't have to reclaim
> > > resources from a framework if new nodes with the required resources
> > > will be added. However, we could even argue about whether it is a valid
> > > use case for the scheduling decision to depend on the auto-scale
> > > decision.
> > >
> > > Thanks
> > >
> > >
> > > On Sun, Aug 2, 2015 at 12:56 PM, tommy xiao <xia...@gmail.com> wrote:
> > >
> > > > What I want: write a daemon that queries the Mesos framework API, gets
> > > > the statistics from the Mesos API, then invokes the IaaS's API to scale
> > > > the cluster size.
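
A daemon of this kind could be reduced to a small decision function plus a loop, as in the minimal sketch below. The metric keys `cpus_used` and `cpus_total`, the thresholds, and the two injected callables are assumptions made for illustration; they are not a documented Mesos schema or an IaaS client.

```python
from typing import Callable, Dict


def scaling_decision(metrics: Dict[str, float],
                     high: float = 0.8,
                     low: float = 0.3) -> str:
    """Return "up", "down" or "hold" from a snapshot of cluster statistics.

    The keys "cpus_used" and "cpus_total" and both thresholds are
    assumptions for this sketch.
    """
    total = metrics.get("cpus_total", 0.0)
    if total <= 0:
        return "hold"  # no usable capacity data: do nothing
    utilization = metrics.get("cpus_used", 0.0) / total
    if utilization >= high:
        return "up"
    if utilization <= low:
        return "down"
    return "hold"


def run_once(fetch_metrics: Callable[[], Dict[str, float]],
             scale_cluster: Callable[[str], None]) -> str:
    """One daemon iteration.

    fetch_metrics stands in for the call to the Mesos API and
    scale_cluster for the call to the IaaS API; both are placeholders.
    """
    decision = scaling_decision(fetch_metrics())
    if decision != "hold":
        scale_cluster(decision)
    return decision
```

Keeping the decision separate from the two API calls makes the policy easy to test without a running cluster.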
> > > >
> > > > 2015-08-02 22:32 GMT+08:00 Alex Rukletsov <a...@mesosphere.com>:
> > > >
> > > > > I agree with Vinod that the Master accumulates a lot of
> > > > > statistics that can be used for smarter decisions about cluster
> > > > > scaling. However, I'm not sure this feature should reside in
> > > > > Mesos. I would rather expose statistics and/or recommendations
> > > > > and let external tooling or an operator do the job.
> > > > > On 31 Jul 2015 7:15 pm, "Vinod Kone" <vinodk...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for pinging again Mathieu!
> > > > > >
> > > > > > I think auto-scaling of a Mesos cluster is a nifty feature to
> > > > > > have. The only question in my mind (and likely others') is whether
> > > > > > this functionality should reside in Mesos, a framework, or an
> > > > > > operator. As you mentioned, Netflix took the framework way, but it
> > > > > > doesn't necessarily work in a multi-framework environment. If the
> > > > > > functionality lies with an operator, it has to be a library (likely
> > > > > > a service) so that more people can take advantage of it.
> > > > > >
> > > > > > In my mind, it is not hard to imagine having this functionality
> > > > > > in Mesos. Since Mesos is in the best position to know the (current
> > > > > > and perhaps projected) state of the cluster, it could make smart
> > > > > > decisions about the shape and size of the new nodes that can be
> > > > > > added. This also becomes interesting in the face of the quota
> > > > > > <https://issues.apache.org/jira/browse/MESOS-1791> work that we are
> > > > > > currently doing.
> > > > > >
> > > > > > Having said that, I think you can do this today by writing an
> > > > > > allocator module. Note that Mesos already provides a
> > > > > > requestResources() API call (similar to Wish in your ppt) that is
> > > > > > passed to the allocator. You should be able to write an allocator
> > > > > > module that takes this signal and talks to your favorite IaaS API
> > > > > > to spin up new node(s) if necessary.
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 31, 2015 at 8:29 AM, Roger Ignazio <rigna...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > With the number of IaaS providers out there, and the fact that
> > > > > > > Mesos doesn't really concern itself with where it's running (IaaS,
> > > > > > > bare-metal, on-prem, in the cloud), this sounds more like an
> > > > > > > operations problem than a feature that should be in Mesos core.
> > > > > > >
> > > > > > > Have you had a chance to look at
> > > > > > > https://github.com/thefactory/autoscale-python? I'd venture to
> > > > > > > guess that project (or a homegrown solution talking to your IaaS'
> > > > > > > API), combined with some custom AWS AMIs (or vSphere templates or
> > > > > > > OpenStack images or ...), would satisfy your use-case.
> > > > > > >
> > > > > > > -- Roger
> > > > > > >
> > > > > > > On Fri, Jul 31, 2015 at 5:37 AM, VELTEN, MATHIEU
> > > > > > > <mathieu.vel...@atos.net> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I am currently working on some projects using Mesos at Atos
> > > > > > > > Toulouse, and we are using it on top of a classical IaaS.
> > > > > > > >
> > > > > > > > After playing with Mesos and looking at some code, it appears to
> > > > > > > > me that there is no elasticity mechanism in place. I opened an
> > > > > > > > issue in Jira some months ago, which contains most of the content
> > > > > > > > of this email:
> > > > > > > > https://issues.apache.org/jira/browse/MESOS-2453
> > > > > > > >
> > > > > > > > Here is what I have in mind (ppt in the following link for the
> > > > > > > > detailed and visual version ☺):
> > > > > > > > - Add the possibility for a framework to signal that it has some
> > > > > > > >   work pending (with or without further semantics regarding which
> > > > > > > >   resources are wished for?)
> > > > > > > > - Modify the Mesos algorithm to call a pluggable driver when no
> > > > > > > >   resource is available and at least one framework has some work
> > > > > > > >   to do. In this case the driver should scale up the Mesos cluster
> > > > > > > >   by launching VMs. How many and of which size is a little tricky
> > > > > > > >   here without adding semantics to the framework signal.
> > > > > > > > - We should also add a flag somewhere to mark the slave as
> > > > > > > >   "volatile" so we can prefer the use of static resources, and
> > > > > > > >   shut down the volatile slaves after some time left unused.
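
The "volatile" flag in the last point could drive both the offer preference and the shutdown rule, roughly as in the sketch below. The `Slave` record, its field names and the idle-timeout parameter are hypothetical and exist only for illustration; Mesos has no such flag in this form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slave:
    """Hypothetical view of a slave; field names are assumptions."""
    slave_id: str
    volatile: bool        # True for slaves launched by the auto-scaler
    idle_seconds: float   # time since the slave last ran a task


def offer_order(slaves: List[Slave]) -> List[Slave]:
    """Put static slaves first so volatile ones drain and can be reclaimed.

    False sorts before True, and sorted() is stable, so the relative
    order inside each group is preserved.
    """
    return sorted(slaves, key=lambda s: s.volatile)


def to_shut_down(slaves: List[Slave], idle_timeout: float) -> List[str]:
    """Ids of volatile slaves left unused for at least idle_timeout seconds."""
    return [s.slave_id for s in slaves
            if s.volatile and s.idle_seconds >= idle_timeout]
```

Static slaves are never returned by `to_shut_down`, however long they have been idle.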
> > > > > > > >
> > > > > > > > https://docs.google.com/presentation/d/1eNQSvDQ64gPNbmf0YVPq9tIWLMCbAHExos5WXrm0uqI/edit?usp=sharing
> > > > > > > >
> > > > > > > > Does it look doable to you? What do you think about the
> > > > > > > > principle?
> > > > > > > > Do you think we can add some semantics to the "I have work to do"
> > > > > > > > framework signal without breaking the two-level scheduling
> > > > > > > > principle?
> > > > > > > > I don't think it violates it, since both mechanisms (signaling a
> > > > > > > > need and effectively taking a resource from an offer) are fully
> > > > > > > > independent in my proposal, but I feel a little out of my league
> > > > > > > > to be sure.
> > > > > > > >
> > > > > > > > This proposal currently doesn't specifically address bin packing;
> > > > > > > > however, with the aforementioned modifications in place it should
> > > > > > > > be easy to add, since we know which resources are volatile.
> > > > > > > >
> > > > > > > > I have seen some other work (by Netflix, for example) address this
> > > > > > > > problem; however, it always seems to be at the framework level and
> > > > > > > > not inside the core Mesos architecture. Is there a reason for that
> > > > > > > > other than lack of time for specification/contribution?
> > > > > > > >
> > > > > > > > http://fr.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Mathieu Velten
> > > > > > > >
> > > > > > > > This message and any attachments (the "message") are intended
> > > > > > > > solely for the addressee(s). It contains confidential information,
> > > > > > > > that may be privileged. If you receive this message in error,
> > > > > > > > please notify the sender immediately and delete the message. Any
> > > > > > > > use of the message in violation of its purpose, any dissemination
> > > > > > > > or disclosure, either wholly or partially is strictly prohibited,
> > > > > > > > unless it has been explicitly authorized by the sender. As its
> > > > > > > > integrity cannot be secured on the internet, Atos and its
> > > > > > > > subsidiaries decline any liability for the content of this
> > > > > > > > message. Although the sender endeavors to maintain a computer
> > > > > > > > virus-free network, the sender does not warrant that this
> > > > > > > > transmission is virus-free and will not be liable for any damages
> > > > > > > > resulting from any virus transmitted.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Deshi Xiao
> > > > Twitter: xds2000
> > > > E-mail: xiaods(AT)gmail.com
> > > >
> > >
> >
