Re: [DISCUSS] Extract core autoscaling algorithm as new SubModule in flink-kubernetes-operator

Gyula Fóra Thu, 16 Feb 2023 07:03:46 -0800

@Shammon, Samrat:

I appreciate the enthusiasm, and I wish this were only a matter of intention,
but making the autoscaler work without the operator may be a pretty big
task.
You must not forget two core requirements here:


1. The autoscaler logic itself has to run somewhere (in this case on k8s
within the operator).
2. Something has to safely execute the stateful job upgrades based on the
scaling decisions (in this case the operator does that).

Requirement 1 can be solved almost anywhere fairly easily, although you need
resiliency etc. for this to be a production application. Requirement 2 is the
really tricky part. The operator was actually built to execute job upgrades;
if you look at the code you will appreciate the complexity of the task.

As I said in the earlier thread, it is easier to make the operator work
with jobs running in different types of clusters than to take the
autoscaler module itself and plug that in somewhere else.
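
To make the split between the two requirements concrete, here is a rough
sketch only. Every name in it (AutoscalerSketch, targetParallelism,
JobUpgradeExecutor, evaluate) is hypothetical and not taken from the operator
code base, and the utilization formula is a simplification chosen for
illustration, not the FLIP-271 algorithm. The decision logic is a pure
function, while everything hard hides behind the executor interface:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; all names are hypothetical, not operator APIs.
class AutoscalerSketch {

    /**
     * Requirement 1: the decision logic. It has no cluster-manager
     * dependency, so it can run almost anywhere. The formula is an
     * assumption made for this example only.
     */
    static int targetParallelism(int current, double busyRatio, double targetUtilization) {
        if (current < 1 || targetUtilization <= 0.0) {
            throw new IllegalArgumentException("invalid parallelism or target utilization");
        }
        // Scale so that the observed load would sit at the target utilization.
        return Math.max(1, (int) Math.ceil(current * busyRatio / targetUtilization));
    }

    /**
     * Requirement 2: something must apply the decision through a safe
     * stateful upgrade. On k8s the operator plays this role; any other
     * environment would have to supply its own implementation.
     */
    interface JobUpgradeExecutor {
        void rescale(String jobId, int newParallelism);
    }

    /** Glue: compute the decision and hand it over only if it changes anything. */
    static boolean evaluate(String jobId, int current, double busyRatio,
                            double targetUtilization, JobUpgradeExecutor executor) {
        int desired = targetParallelism(current, busyRatio, targetUtilization);
        if (desired == current) {
            return false; // nothing to do, avoid a costly rescale
        }
        executor.rescale(jobId, desired);
        return true;
    }

    public static void main(String[] args) {
        // A recording executor stands in for the real upgrade machinery.
        List<String> calls = new ArrayList<>();
        JobUpgradeExecutor recording = (jobId, p) -> calls.add(jobId + " -> " + p);
        evaluate("job-1", 4, 0.75, 0.5, recording);
        System.out.println(calls);
    }
}
```

The few lines of targetParallelism are the portable part. A production
implementation of JobUpgradeExecutor (savepoints, rollbacks, failure
handling) is where the real effort goes.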

Gyula


On Thu, Feb 16, 2023 at 3:12 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Shammon,
>
> Thank you for your input, completely aligned with you.
>
> We are fine with either of the options,
> but IMO, to start with, it will be easier to have it in the
> flink-kubernetes-operator as a module instead of a separate repo, which
> requires additional effort.
>
> We would be working incrementally on making the autoscaling
> recommendation framework generic enough. Once it reaches a point where the
> community feels it needs to be moved to a separate repo, we can take a call.
>
> Bests,
>
> Samrat
>
>
> On Thu, Feb 16, 2023 at 7:37 PM Samrat Deb <decordea...@gmail.com> wrote:
>
> > Hi Max,
> > If you are fine and aligned with the same thought, since this is going to
> > be very useful to us, we are ready to help / contribute the additional work
> > required.
> >
> > Bests,
> > Samrat
> >
> >
> > On Thu, 16 Feb 2023 at 5:28 PM, Shammon FY <zjur...@gmail.com> wrote:
> >
> >> Hi Samrat
> >>
> >> Do you mean to create an independent module for flink scaling in
> >> flink-k8s-operator? How about creating a project such as
> >> `flink-auto-scaling` which is completely independent? Besides resource
> >> managers such as k8s and yarn, we can do more things in the project, for
> >> example, updating config in the user's `job submission system` after
> >> scaling flink jobs. WDYT?
> >>
> >> Best,
> >> Shammon
> >>
> >>
> >> On Thu, Feb 16, 2023 at 7:38 PM Maximilian Michels <m...@apache.org>
> >> wrote:
> >>
> >> > Hi Samrat,
> >> >
> >> > The autoscaling module is now pluggable but it is still tightly
> >> > coupled with Kubernetes. It will take additional work for the logic to
> >> > work independently of the cluster manager.
> >> >
> >> > -Max
> >> >
> >> > On Thu, Feb 16, 2023 at 11:14 AM Samrat Deb <decordea...@gmail.com> wrote:
> >> > >
> >> > > Oh! It got merged yesterday.
> >> > > Apologies, I missed the recent commit @Gyula.
> >> > >
> >> > > Thanks for the update
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Feb 16, 2023 at 3:17 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >> > >
> >> > > > Max recently moved the autoscaler logic into a separate submodule, did you
> >> > > > see that?
> >> > > >
> >> > > > https://github.com/apache/flink-kubernetes-operator/commit/5bb8e9dc4dd29e10f3ba7c8ce7cefcdffbf92da4
> >> > > >
> >> > > > Gyula
> >> > > >
> >> > > > On Thu, Feb 16, 2023 at 10:27 AM Samrat Deb <decordea...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi ,
> >> > > > >
> >> > > > > *Context:*
> >> > > > > Auto Scaling was introduced in Flink as part of FLIP-271[1].
> >> > > > > It discusses one of the important aspects to provide a robust default
> >> > > > > scaling algorithm.
> >> > > > >       a. Ensure scaling yields effective usage of assigned task slots.
> >> > > > >       b. Ramp up in case of any backlog to ensure it gets processed in a
> >> > > > >          timely manner
> >> > > > >       c. Minimize the number of scaling decisions to prevent costly rescale
> >> > > > >          operation
> >> > > > > The flip intends to add an auto scaling framework based on 6 major metrics
> >> > > > > and contains different types of threshold to trigger the scaling.
> >> > > > >
> >> > > > > Thread[2] discusses a different problem: why autoscaler is part of the
> >> > > > > operator instead of jobmanager at runtime.
> >> > > > > The Community decided to keep the autoscaling logic in the
> >> > > > > flink-kubernetes-operator.
> >> > > > >
> >> > > > > *Proposal: *
> >> > > > > In this discussion, I want to put forward a thought of extracting out the
> >> > > > > auto scaling logic into a new submodule in flink-kubernetes-operator
> >> > > > > repository[3],
> >> > > > > which will be independent of any resource manager/Operator.
> >> > > > > Currently the Autoscaling algorithm is very tightly coupled with the
> >> > > > > kubernetes API.
> >> > > > > This makes the autoscaling core algorithm not so easily extensible for
> >> > > > > different available resource managers like YARN, Mesos etc.
> >> > > > > A Separate autoscaling module inside the flink kubernetes operator will
> >> > > > > help other resource managers to leverage the autoscaling logic.
> >> > > > >
> >> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
> >> > > > > [2] https://lists.apache.org/thread/pvfb3fw99mj8r1x8zzyxgvk4dcppwssz
> >> > > > > [3] https://github.com/apache/flink-kubernetes-operator
> >> > > > >
> >> > > > >
> >> > > > > Bests,
> >> > > > > Samrat
> >> > > > >
> >> > > >
> >> >
> >>
> >
>
