Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

Rui Fan Wed, 06 Sep 2023 23:08:51 -0700

Hi Max,

Thanks for your feedback!


> We need to go through these phases for the FLIP to be meaningful:
> 1. Decouple autoscaler from current autoscaler (generalization)
> 2. Ensure 100% functionality and test coverage of Kubernetes
implementation
> 3. Interface with another backend (e.g. YARN or standalone)

These phases make sense to me.

I have updated the FLIP[1] based on our offline discussion.
I added the `5. Standalone AutoScaler` to explain the generic
autoscaler.

Please correct me if anything is wrong or missed, thanks again!

[1] https://cwiki.apache.org/confluence/x/x4qzDw

Best,
Rui

On Wed, Sep 6, 2023 at 7:00 PM Maximilian Michels <m...@apache.org> wrote:

> Hey Rui, hey Samrat,
>
> I want to ensure this is not just an exercise but has actual benefits
> for the community. In the past, I've seen that the effort stops half
> way through, the refactoring gets done with some regressions, but
> actual alternative implementations based on the new design never
> follow.
>
> We need to go through these phases for the FLIP to be meaningful:
>
> 1. Decouple autoscaler from current autoscaler (generalization)
> 2. Ensure 100% functionality and test coverage of Kubernetes implementation
> 3. Interface with another backend (e.g. YARN or standalone)
>
> If we don't follow through with this plan, I'm not sure we are better
> off than with the current implementation. Apologies if I'm being a bit
> strict here but the autoscaling code has become a critical
> infrastructure component. We need to carefully weigh the pros and cons
> here to avoid risks for our users, some of them using this code in
> production and relying on it on a day to day basis.
>
> That said, we are open to following through with the FLIP and we can
> definitely help review code changes and build on the new design.
>
> -Max
>
>
> On Wed, Sep 6, 2023 at 11:26 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Hi Max,
> >
> > As the FLIP mentioned, we have the plan to add the
> > alternative implementation.
> >
> > First of all, we will develop a generic autoscaler. This generic
> > autoscaler will not have knowledge of specific jobs, and users
> > will have the flexibility to pass the JobAutoScalerContext
> > when utilizing the generic autoscaler. Communication with
> > Flink jobs can be achieved through the RestClusterClient.
> >
> >    - The generic ScalingRealizer is based on the rescale API (FLIP-291).
> >    - The generic EventHandler is based on the logger.
> >    - The generic StateStore is based on the Heap. This means that the
> state
> >    information is stored in memory and can be lost if the autoscaler
> restarts.
> >
> >
> > Secondly, for yarn implementation, as Samrat mentioned,
> > There is currently no flink-yarn-operator, and we cannot
> > easily obtain the job list. We are not yet sure how to manage
> > yarn's flink jobs. In order to prevent the FLIP from being too huge,
> > after confirming with Gyula and Samrat before, it is decided
> > that the current FLIP will not implement the automated
> > yarn-autoscaler. And it will be a separate FLIP in the future.
> >
> >
> > After this part is finished, flink users or other flink platforms can
> easy
> > to use the autoscaler, they just pass the Context, and the autoscaler
> > can find the flink job using the RestClient.
> >
> > The first part will be done in this FLIP. And we can discuss
> > whether the second part should be done in this FLIP as well.
> >
> > Best,
> > Rui
> >
> > On Wed, Sep 6, 2023 at 4:34 AM Samrat Deb <decordea...@gmail.com> wrote:
> >
> > > Hi Max,
> > >
> > > > are we planning to add an alternative implementation
> > > against the new interfaces?
> > >
> > > Yes, we are simultaneously working on the YARN implementation using the
> > > interface. During the initial interface design, we encountered some
> > > anomalies while implementing it in YARN.
> > >
> > > Once the interfaces are finalized, we will proceed to raise a pull
> request
> > > (PR) for YARN as well.
> > >
> > > Our initial approach was to create a decoupled interface as part of
> > > FLIP-334 and then implement it for YARN in the subsequent phase.
> > > However, if you recommend combining both phases, we can certainly
> consider
> > > that option.
> > >
> > > We look forward to hearing your thoughts on whether to have YARN
> > > implementation as part of FLIP-334 or seperate one ?
> > >
> > > Bests
> > > Samrat
> > >
> > >
> > >
> > > On Tue, Sep 5, 2023 at 8:41 PM Maximilian Michels <m...@apache.org>
> wrote:
> > >
> > > > Thanks Rui for the update!
> > > >
> > > > Alongside with the refactoring to decouple autoscaler logic from the
> > > > deployment logic, are we planning to add an alternative
> implementation
> > > > against the new interfaces? I think the best way to get the
> interfaces
> > > > right, is to have an alternative implementation in addition to
> > > > Kubernetes. YARN or a standalone mode implementation were already
> > > > mentioned. Ultimately, this is the reason we are doing the
> > > > refactoring. Without a new implementation, it becomes harder to
> > > > justify the refactoring work.
> > > >
> > > > Cheers,
> > > > Max
> > > >
> > > > On Tue, Sep 5, 2023 at 9:48 AM Rui Fan <fan...@apache.org> wrote:
> > > > >
> > > > > After discussing this FLIP-334[1] offline with Gyula and Max,
> > > > > I updated the FLIP based on the latest conclusion.
> > > > >
> > > > > Big thanks to Gyula and Max for their professional advice!
> > > > >
> > > > > > Does the interface function of handlerRecommendedParallelism
> > > > > > in AutoScalerEventHandler conflict with
> > > > > > handlerScalingFailure/handlerScalingReport (one of the
> > > > > > handles the event of scale failure, and the other handles
> > > > > > the event of scale success).
> > > > > Hi Matt,
> > > > >
> > > > > You can take a look at the FLIP, I think the issue has been fixed.
> > > > > Currently, we introduced the ScalingRealizer and
> > > > > AutoScalerEventHandler interface.
> > > > >
> > > > > The ScalingRealizer handles scaling action.
> > > > >
> > > > > The AutoScalerEventHandler  interface handles loggable events.
> > > > >
> > > > >
> > > > > Looking forward to your feedback, thanks!
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/x/x4qzDw
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Thu, Aug 24, 2023 at 10:55 AM Matt Wang <wang...@163.com>
> wrote:
> > > > >>
> > > > >> Sorry for the late reply, I still have a small question here:
> > > > >> Does the interface function of handlerRecommendedParallelism
> > > > >> in AutoScalerEventHandler conflict with
> > > > >> handlerScalingFailure/handlerScalingReport (one of the
> > > > >> handles the event of scale failure, and the other handles
> > > > >> the event of scale success).
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Best,
> > > > >> Matt Wang
> > > > >>
> > > > >>
> > > > >> ---- Replied Message ----
> > > > >> | From | Rui Fan<1996fan...@gmail.com> |
> > > > >> | Date | 08/21/2023 17:41 |
> > > > >> | To | <dev@flink.apache.org> |
> > > > >> | Cc | Maximilian Michels<m...@apache.org> ,
> > > > >> Gyula Fóra<gyula.f...@gmail.com> ,
> > > > >> Matt Wang<wang...@163.com> |
> > > > >> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and
> > > > kubernetes |
> > > > >> Hi Max, Gyula and Matt,
> > > > >>
> > > > >> Do you have any other comments?
> > > > >>
> > > > >> The flink-kubernetes-operator 1.6 has been released recently,
> > > > >> it's a good time to kick off this FLIP.
> > > > >>
> > > > >> Please let me know if you have any questions or concerns,
> > > > >> looking forward to your feedback, thanks!
> > > > >>
> > > > >> Best,
> > > > >> Rui
> > > > >>
> > > > >> On Wed, Aug 9, 2023 at 11:55 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >>
> > > > >> Hi Matt Wang,
> > > > >>
> > > > >> Thanks for your discussion here.
> > > > >>
> > > > >> it is recommended to unify the descriptions of AutoScalerHandler
> > > > >> and AutoScalerEventHandler in the FLIP
> > > > >>
> > > > >> Good catch, I have updated all AutoScalerHandler to
> > > > >> AutoScalerEventHandler.
> > > > >>
> > > > >> Can it support the use of zookeeper (zookeeper is a relatively
> > > > >> common use of flink HA)?
> > > > >>
> > > > >> In my opinion, it's a good suggestion. However, I prefer we
> > > > >> implement other state stores in the other FLINK JIRA, and
> > > > >> this FLIP focus on the decoupling and implementing the
> > > > >> necessary state store. Does that make sense?
> > > > >>
> > > > >> Regarding each scaling information, can it be persisted in
> > > > >> the shared file system through the filesystem? I think it will
> > > > >> be a more valuable requirement to support viewing
> > > > >> Autoscaling info on the UI in the future, which can provide
> > > > >> some foundations in advance;
> > > > >>
> > > > >> This is a good suggestion as well. It's useful for users to check
> > > > >> the scaling information. I propose to add a CompositeEventHandler,
> > > > >> it can include multiple EventHandlers.
> > > > >>
> > > > >> However, as the last question, I prefer we implement other
> > > > >> event handler in the other FLINK JIRA. What do you think?
> > > > >>
> > > > >> A solution mentioned in FLIP is to initialize the
> > > > >> AutoScalerEventHandler object every time an event is
> > > > >> processed.
> > > > >>
> > > > >> No, the FLIP mentioned `The AutoScalerEventHandler  object is
> shared
> > > for
> > > > >> all flink jobs`,
> > > > >> So the AutoScalerEventHandler is only initialized once.
> > > > >>
> > > > >> And we call the AutoScalerEventHandler#handlerXXX
> > > > >> every time an event is processed.
> > > > >>
> > > > >> Best,
> > > > >> Rui
> > > > >>
> > > > >> On Tue, Aug 8, 2023 at 9:40 PM Matt Wang <wang...@163.com> wrote:
> > > > >>
> > > > >> Hi Rui
> > > > >>
> > > > >> Thanks for driving the FLIP.
> > > > >>
> > > > >> I agree with the point fo this FLIP. This FLIP first provides a
> > > > >> general function of Autoscaler in Flink repo, and there is no
> > > > >> need to move kubernetes-autoscaler from kubernetes-operator
> > > > >> to Flink repo in this FLIP(it is recommended to unify the
> > > > >> descriptions of AutoScalerHandler and AutoScalerEventHandler
> > > > >> in the FLIP). Here I still have a few questions:
> > > > >>
> > > > >> 1. AutoScalerStateStore mainly records the state information
> > > > >> during Scaling. In addition to supporting the use of configmap,
> > > > >> can it support the use of zookeeper (zookeeper is a relatively
> > > > >> common use of flink HA)?
> > > > >> 2. Regarding each scaling information, can it be persisted in
> > > > >> the shared file system through the filesystem? I think it will
> > > > >> be a more valuable requirement to support viewing
> > > > >> Autoscaling info on the UI in the future, which can provide
> > > > >> some foundations in advance;
> > > > >> 3. A solution mentioned in FLIP is to initialize the
> > > > >> AutoScalerEventHandler object every time an event is
> > > > >> processed. What is the main purpose of this solution?
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Best,
> > > > >> Matt Wang
> > > > >>
> > > > >>
> > > > >> ---- Replied Message ----
> > > > >> | From | Rui Fan<1996fan...@gmail.com> |
> > > > >> | Date | 08/7/2023 11:34 |
> > > > >> | To | <dev@flink.apache.org> |
> > > > >> | Cc | m...@apache.org<m...@apache.org> ,
> > > > >> Gyula Fóra<gyula.f...@gmail.com> |
> > > > >> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and
> > > > kubernetes
> > > > >> |
> > > > >> Hi Ron:
> > > > >>
> > > > >> Thanks for the feedback! The goal is indeed to turn the autoscaler
> > > into
> > > > >> a general tool that can support other resource management.
> > > > >>
> > > > >>
> > > > >> Hi Max, Gyula:
> > > > >>
> > > > >> My proposed `AutoScalerStateStore` is similar to Map, it can
> really be
> > > > >> improved.
> > > > >>
> > > > >> public interface AutoScalerStateStore {
> > > > >> Map<String, String> getState(KEY jobKey)
> > > > >> void updateState(KEY jobKey, Map<String, String> state);
> > > > >> }
> > > > >>
> > > > >> From the method parameter, the StateStore is shared by all jobs,
> > > right?
> > > > >> If yes, the `KEY jobKey` isn't enough because the CR is needed
> during
> > > > >> creating the ConfigMap. The `jobKey` is ResourceID, the
> extraJobInfo
> > > is
> > > > >> CR.
> > > > >>
> > > > >> So, this parameter may need to be changed from `KEY jobKey` to
> > > > >> `JobAutoScalerContext<KEY, INFO>`. Does that make sense?
> > > > >> If yes, I can update the interface in the FLIP doc.
> > > > >>
> > > > >> Best,
> > > > >> Rui
> > > > >>
> > > > >> On Mon, Aug 7, 2023 at 10:18 AM liu ron <ron9....@gmail.com>
> wrote:
> > > > >>
> > > > >> Hi, Rui
> > > > >>
> > > > >> Thanks for driving the FLIP.
> > > > >>
> > > > >> The tuning of streaming jobs by autoscaler is very important.
> Although
> > > > the
> > > > >> mainstream trend now is cloud-native, many companies still run
> their
> > > > Flink
> > > > >> jobs on Yarn for historical reasons. If we can decouple autoscaler
> > > from
> > > > >> K8S
> > > > >> and turn it into a common tool that can support other resource
> > > > management
> > > > >> frameworks such as Yarn, I think it will be very meaningful.
> > > > >> +1 for this proposal.
> > > > >>
> > > > >> Best,
> > > > >> Ron
> > > > >>
> > > > >>
> > > > >> Gyula Fóra <gyula.f...@gmail.com> 于2023年8月5日周六 15:03写道：
> > > > >>
> > > > >> Hi Rui!
> > > > >>
> > > > >> Thanks for the proposal.
> > > > >>
> > > > >> I agree with Max on that the state store abstractions could be
> > > improved
> > > > >> and
> > > > >> be more specific as we know what goes into the state. It could
> simply
> > > be
> > > > >>
> > > > >> public interface AutoScalerStateStore {
> > > > >> Map<String, String> getState(KEY jobKey)
> > > > >> void updateState(KEY jobKey, Map<String, String> state);
> > > > >> }
> > > > >>
> > > > >>
> > > > >> We could also remove the entire recommended parallelism logic
> from the
> > > > >> interface and make it internal to the implementation somehow
> because
> > > > it's
> > > > >> not very nice in the current form.
> > > > >>
> > > > >> Cheers,
> > > > >> Gyula
> > > > >>
> > > > >> On Fri, Aug 4, 2023 at 7:05 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >>
> > > > >> Hi Max,
> > > > >>
> > > > >> After careful consideration, I prefer to keep the
> AutoScalerStateStore
> > > > >> instead of AutoScalerEventHandler taking over the work of
> > > > >> AutoScalerStateStore. And the following are some reasons:
> > > > >>
> > > > >> 1. Keeping the AutoScalerStateStore to make StateStore easy to
> plug
> > > in.
> > > > >>
> > > > >> Currently, the kubernetes-operator-autoscaler uses the ConfigMap
> as
> > > the
> > > > >> state store. However, users may use a different state store for
> > > > >> yarn-autoscaler or generic autoscaler. Such as: MySQL StateStore,
> > > > >> Heaped StateStore or PostgreSQL StateStore, etc.
> > > > >>
> > > > >> Of course, kubernetes autoscaler can also use the MySQL
> StateStore.
> > > > >> If the AutoScalerEventHandler is responsible for recording events,
> > > > >> scaling job and accessing state, whenever users or community want
> to
> > > > >> create a new state store, they must also implement the new
> > > > >> AutoScalerEventHandler, it includes recording events and scaling
> job.
> > > > >>
> > > > >> If we decouple AutoScalerEventHandler and AutoScalerStateStore,
> > > > >> it's easy to implement a new state store.
> > > > >>
> > > > >> 2. AutoScalerEventHandler isn't suitable for access state.
> > > > >>
> > > > >> Sometimes the generic autoscaler needs to query state,
> > > > >> AutoScalerEventHandler can update the state during handling
> events.
> > > > >> However, it's wired to provide a series of interfaces to query
> state.
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > >> And looking forward to more thoughts from the community, thanks!
> > > > >>
> > > > >> Best,
> > > > >> Rui Fan
> > > > >>
> > > > >> On Tue, Aug 1, 2023 at 11:47 PM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >>
> > > > >> Hi Max,
> > > > >>
> > > > >> Thanks for your quick response!
> > > > >>
> > > > >> 1. Handle state in the AutoScalerEventHandler which will receive
> > > > >> all related scaling and metric collection events, and can keep
> > > > >> track of any state.
> > > > >>
> > > > >> If I understand correctly, you mean that updating state is just
> part
> > > > >> of
> > > > >> handling events, right?
> > > > >>
> > > > >> If yes, sounds make sense. However, I have some concerns:
> > > > >>
> > > > >> - Currently, we have 3 key-values that need to be stored. And the
> > > > >> autoscaler needs to get them first, then update them, and
> sometimes
> > > > >> remove them. If we use AutoScalerEventHandler, we need to provided
> > > > >> 9 methods, right? Every key has 3 methods.
> > > > >> - Do we add the persistState interface for AutoScalerEventHandler
> to
> > > > >> persist in-memory state to kubernetes?
> > > > >>
> > > > >> 2. In the long run, the autoscaling logic can move to a
> > > > >> separate repository, although this will complicate the release
> > > > >> process, so I would defer this unless there is strong interest.
> > > > >>
> > > > >> I also agree to leave it in flink-k8s-operator for now. Unless
> moving
> > > > >> it
> > > > >> out of flink-k8s-operator is necessary in the future.
> > > > >>
> > > > >> 3. The proposal mentions some removal of tests.
> > > > >>
> > > > >> Sorry, I didn't express clearly in FLIP. POC just check whether
> these
> > > > >> interfaces work well. It will take more time if I develop all the
> > > > >> tests
> > > > >> during POC. So I removed these tests in my POC.
> > > > >>
> > > > >> These tests will be completed in the final PR, and the test is
> very
> > > > >> useful
> > > > >> for less bugs.
> > > > >>
> > > > >> Best,
> > > > >> Rui Fan
> > > > >>
> > > > >> On Tue, Aug 1, 2023 at 10:10 PM Maximilian Michels <
> m...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> Hi Rui,
> > > > >>
> > > > >> Thanks for the proposal. I think it makes a lot of sense to
> decouple
> > > > >> the autoscaler from Kubernetes-related dependencies. A couple of
> > > > >> notes
> > > > >> when I read the proposal:
> > > > >>
> > > > >> 1. You propose AutoScalerEventHandler, AutoScalerStateStore,
> > > > >> AutoScalerStateStoreFactory, and AutoScalerEventHandler.
> > > > >> AutoscalerStateStore is a generic key/value database (methods:
> > > > >> "get"/"put"/"delete"). I would propose to refine this interface
> and
> > > > >> make it less general purpose, e.g. add a method for persisting
> > > > >> scaling
> > > > >> decisions as well as any metrics gathered for the current metric
> > > > >> window. For simplicity, I'd even go so far to remove the state
> store
> > > > >> entirely, but rather handle state in the AutoScalerEventHandler
> > > > >> which
> > > > >> will receive all related scaling and metric collection events, and
> > > > >> can
> > > > >> keep track of any state.
> > > > >>
> > > > >> 2. You propose to make the current autoscaler module
> > > > >> Kubernetes-agnostic by moving the Kubernetes parts into the main
> > > > >> operator module. I think that makes sense since the Kubernetes
> > > > >> implementation will continue to be tightly coupled with
> Kubernetes.
> > > > >> The goal of the separate module was to make the autoscaler logic
> > > > >> pluggable, but this will continue to be possible with the new
> > > > >> "flink-autoscaler" module which contains the autoscaling logic and
> > > > >> interfaces. In the long run, the autoscaling logic can move to a
> > > > >> separate repository, although this will complicate the release
> > > > >> process, so I would defer this unless there is strong interest.
> > > > >>
> > > > >> 3. The proposal mentions some removal of tests. It is critical for
> > > > >> us
> > > > >> that all test coverage of the current implementation remains
> active.
> > > > >> It is ok if some of the test coverage only covers the Kubernetes
> > > > >> implementation. We can eventually move more tests without
> Kubernetes
> > > > >> significance into the implementation-agnostic autoscaler tests.
> > > > >>
> > > > >> -Max
> > > > >>
> > > > >> On Tue, Aug 1, 2023 at 9:46 AM Rui Fan <fan...@apache.org> wrote:
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >> I and Samrat(cc'ed) created the FLIP-334[1] to decoupling the
> > > > >> autoscaler
> > > > >> and kubernetes.
> > > > >>
> > > > >> Currently, the flink-autoscaler is tightly integrated with
> > > > >> Kubernetes.
> > > > >> There are compelling reasons to extend the use of flink-autoscaler
> > > > >> to
> > > > >> more types of Flink jobs:
> > > > >> 1. With the recent merge of the Externalized Declarative Resource
> > > > >> Management (FLIP-291[2]), in-place scaling is now supported
> > > > >> across all types of Flink jobs. This development has made scaling
> > > > >> Flink
> > > > >> on
> > > > >> YARN a straightforward process.
> > > > >> 2. Several discussions[3] within the Flink user community, as
> > > > >> observed
> > > > >> in
> > > > >> the mail list , have emphasized the necessity of flink-autoscaler
> > > > >> supporting
> > > > >> Flink on YARN.
> > > > >>
> > > > >> Please refer to the FLIP[1] document for more details about the
> > > > >> proposed
> > > > >> design and implementation. We welcome any feedback and opinions on
> > > > >> this proposal.
> > > > >>
> > > > >> [1] https://cwiki.apache.org/confluence/x/x4qzDw
> > > > >> [2]
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > > > >> [3]
> > > > >> https://lists.apache.org/thread/pr0r8hq8kqpzk3q1zrzkl3rp1lz24v7v
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
>

Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

Reply via email to