Re: [DISCUSS] FLIP-543: Support Customized Autoscale Algorithm

Gyula Fóra Wed, 03 Sep 2025 01:39:34 -0700

Hi Peter!
Sounds like a good plan. It would be great if you could help review the
PR and finalize the pluggable evaluator logic to make sure it fits your needs.


Cheers,
Gyula

On Fri, Aug 29, 2025 at 12:17 AM Peter Huang <[email protected]>
wrote:

> Hi Folks,
>
> Thanks for these suggestions. I think we are aligned that these two features
> are common and should be implemented upstream.
> I'll try to summarize the action items below. Please feel free to add more
> if I missed anything.
>
> 1) Finish the planned work in FLIP-514 to support a pluggable
> MetricsEvaluator, including the scheduled-scaling plugin as planned in
> FLIP-514.
> 2) Support predictive autoscaling as a configurable feature on top of a
> customized MetricsEvaluator in FLIP-543.
> 3) Support data-size-aware autoscaling as a configurable feature on top
> of a customized MetricsEvaluator in FLIP-543.
>
> I will revise FLIP-543 to focus mainly on how predictive autoscaling and
> data-size-aware autoscaling could be implemented on top of the pluggable
> MetricsEvaluator.
>
> Best Regards
> Peter Huang
>
> On Thu, Aug 28, 2025 at 2:18 AM Rui Fan <[email protected]> wrote:
>
> > Hi everyone,
> >
> > Thanks for the productive conversation on FLIP-543.
> >
> > I agree that we need more extensibility in the autoscaler. The predictive
> > scaling use case is a perfect example of a powerful feature that would
> > help many of us improve job availability by scaling before backlogs
> > build up.
> >
> > To echo Gyula and Max's points, I also believe the best path forward is
> > to build this capability as an extension to the existing framework, not
> > as a replacement. This would offer a robust, community-driven solution
> > for a common problem, which feels more sustainable than asking users to
> > implement and maintain custom forks of the logic.
> >
> > Best,
> > Rui
> >
> > On Thu, Aug 28, 2025 at 7:14 AM Pradeepta Choudhury
> > <[email protected]> wrote:
> >
> > > Hello Peter,
> > >
> > > To start with, great initiative! But I echo the concern already raised
> > > that creating too many extension points can compromise the autoscaler
> > > functionality.
> > > When we proposed FLIP-514 [1] and a custom evaluator, the aim was
> > > twofold: provide the required extension point and ship practical
> > > strategies as pluggables. At the same time, we wanted to preserve
> > > flexibility for advanced, highly specific scenarios (like predictive
> > > scaling) that differ by ecosystem, platform, and company. The thought
> > > process was that the custom evaluator strikes that balance: it lets
> > > users adjust the evaluated metrics (especially TARGET_DATA_RATE) that
> > > drive the scale-factor calculation, enabling useful out-of-the-box
> > > behavior without constraining bespoke implementations.
> > > One of the desired outcomes we had set for FLIP-514 was to ship a
> > > scheduled-scaling strategy as a pluggable, leveraging a baseline period
> > > and explicit scheduled windows to drive planned capacity changes. I’ve
> > > been away since last month due to personal commitments. I plan to resume
> > > after the first week of September and will complete the scheduled-scaling
> > > plugin to wrap up the custom evaluator.
> > > Having the ScalingRealizer pluggable (
> > > https://github.com/apache/flink-kubernetes-operator/pull/1020/files)
> > > definitely sounds helpful for certain scenarios.
> > > But I totally agree with the general approach suggested by Gyula of
> > > solving specific issues independently in the "best possible way" and
> > > then coming to a good solution regarding pluggability that could be a
> > > foundation for future use cases.
> > >
> > >
> > > Thanks and Regards
> > > Pradeepta
> > >
> > >
> > > > On 26 Aug 2025, at 6:05 PM, [email protected] <
> > > [email protected]> wrote:
> > > >
> > > > For the ScalingRealizer, I think having before/after hooks for
> > > > `realizeParallelismOverrides` and `realizeConfigOverrides` would be
> > > > good. We can support these hooks from plugins. Thoughts?
> > > >
> > > >
> > > > Best,
> > > > Diljeet(DJ) Singh
> > > >
> > > > On 2025/08/26 08:24:33 Maximilian Michels wrote:
> > > >> Hi Peter,
> > > >>
> > > >> First of all, this is a great initiative. Flink Autoscaling
> > > >> definitely needs more points of extension. We recently added support
> > > >> for hooking into the metric evaluation (FLIP-514), but clearly that
> > > >> is just one extension point.
> > > >>
> > > >> That said, I think we will need to revise the approach a bit. I'm
> > > >> not sure we should be replacing core components. As Gyula mentioned,
> > > >> replacing those will easily break the entire autoscaler. Instead, we
> > > >> should be adding extension points which allow for meaningful
> > > >> additions without breaking the scaling logic. There is already the
> > > >> option to replace the entire autoscaling module, if users really want
> > > >> to roll out a completely custom version.
> > > >>
> > > >> What usually works best is to formulate the use case first, then
> > > >> figure out what autoscaler customization would be necessary to
> > > >> implement the use case.
> > > >>
> > > >> As for making the ScalingRealizer pluggable
> > > >> (https://github.com/apache/flink-kubernetes-operator/pull/1020/files),
> > > >> I do think that makes sense for some scenarios.
> > > >>
> > > >> Cheers,
> > > >> Max
> > > >>
> > > >> On Tue, Aug 26, 2025 at 8:59 AM Gyula Fóra <[email protected]> wrote:
> > > >>>
> > > >>> Hi Peter & Diljeet!
> > > >>>
> > > > >>> My general feedback is that we should try to introduce extension
> > > > >>> plugins instead of plugins that completely replace key parts of the
> > > > >>> autoscaler code.
> > > > >>>
> > > > >>> Let me give you a concrete example through FLIP-514 and FLIP-543
> > > > >>> using the MetricsEvaluator pluggability.
> > > > >>> The MetricsEvaluator in the autoscaler is responsible for
> > > > >>> evaluating/deriving/calculating metrics from the collected metrics.
> > > > >>> It has to calculate everything in a more or less specific way,
> > > > >>> otherwise other parts of the autoscaler that depend on these metrics
> > > > >>> may not work. It doesn't seem very practical or reasonable to
> > > > >>> completely reimplement this just because someone wants to extend the
> > > > >>> logic. This is extremely error-prone and fragile, especially if the
> > > > >>> autoscaler logic later evolves.
> > > > >>>
> > > > >>> FLIP-514 takes the approach of extending the metric evaluator with a
> > > > >>> new method that allows users to modify the evaluated metrics at the
> > > > >>> end and define custom ones. This is the right approach here, as it
> > > > >>> makes a new extension very simple to build and maintain without
> > > > >>> interfering with existing logic.
> > > > >>>
> > > > >>> The approach in FLIP-543 and in Diljeet's example PR is to replace
> > > > >>> entire parts of the implementation (the entire evaluator, scaling
> > > > >>> realizer, etc.). I think this is not very good for either the
> > > > >>> community or the actual user. From a community perspective, it makes
> > > > >>> it harder to extend the logic with nice small additions, and from a
> > > > >>> user's perspective it is very error-prone if the operator autoscaler
> > > > >>> logic changes, as it basically exposes a lot of internal logic on a
> > > > >>> user interface.
> > > > >>>
> > > > >>> So at this point, -1 for the approach in FLIP-543 from my side, but
> > > > >>> I would love to hear the opinion of others as well.
> > > >>>
> > > >>> Cheers
> > > >>> Gyula
> > > >>>
> > > >>> On Mon, Aug 25, 2025 at 11:44 PM Peter Huang <[email protected]>
> > wrote:
> > > >>>>
> > > >>>> Hi Diljeet,
> > > >>>>
> > > > >>>> Yes, I think we have similar requirements to make the autoscaler
> > > > >>>> even more powerful so it can handle some customized requirements.
> > > > >>>> The quick PoC makes sense to me. Let's get some more feedback from
> > > > >>>> the community.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Best Regards
> > > >>>> Peter Huang
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Aug 25, 2025 at 2:37 PM Peter Huang <[email protected]>
> > > >>>> wrote:
> > > >>>>
> > > > >>>>> Just trying to combine the discussions into one thread.
> > > > >>>>>
> > > > >>>>> @Diljeet Singh posted a quick PoC for the proposal:
> > > > >>>>> https://github.com/apache/flink-kubernetes-operator/pull/1020.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Mon, Aug 25, 2025 at 7:52 AM Peter Huang <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Community,
> > > >>>>>>
> > > > >>>>>> Our org has been heavily using the Flink autoscaling algorithm.
> > > > >>>>>> It has greatly reduced our operational overhead and improved cost
> > > > >>>>>> efficiency, as users always over-provision resources when
> > > > >>>>>> onboarding. Recently, we have had some requirements to customize
> > > > >>>>>> the autoscaling algorithm for different scenarios, for example,
> > > > >>>>>> handling large but predictable traffic spikes during the holiday
> > > > >>>>>> season, or increasing the checkpoint interval together with
> > > > >>>>>> scaling up for streaming ingestion use cases.
> > > > >>>>>>
> > > > >>>>>> We searched through the discussions on this topic on the mailing
> > > > >>>>>> list, including the existing FLIP-514
> > > > >>>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler>.
> > > > >>>>>> It looks like the discussion is not finalized yet.
> > > > >>>>>> To accelerate the process, we adopted and combined the existing
> > > > >>>>>> opinions from the community and created a proposal in FLIP-543
> > > > >>>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm>.
> > > > >>>>>> The basic idea is to make some core components of the autoscaler
> > > > >>>>>> pluggable, for example, MetricsCollector, MetricsEvaluator, and
> > > > >>>>>> ScalingRealizer, while keeping the core logic skeleton of the
> > > > >>>>>> autoscaler (which is already well proven with a large number of
> > > > >>>>>> users) untouched.
> > > >>>>>>
> > > >>>>>> Looking forward to any feedback and opinions on FLIP-543.
> > > >>>>>>
> > > >>>>>> [1]
> > > >>>>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm
> > > >>>>>> [2]
> > > >>>>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler
> > > >>>>>> [3] Other related discussion thread
> > > >>>>>>
> > > >>>>>>
> https://lists.apache.org/thread/749l74z1h5jylkxrw3rtjmxcj2t9p7ws
> > > >>>>>>
> > > >>>>>>
> https://lists.apache.org/thread/mcd7jcn4kz6oqtyqq5hfycjf9mqh6c53
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Best Regards
> > > >>>>>> Peter Huang
> > > >>>>>>
> > > >>>>>
> > >
> > >
> >
>
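
For readers following the thread, the "extension instead of replacement"
idea discussed above can be illustrated with a small sketch. Everything
in it is hypothetical: the class, the interface, and the method names are
made up for illustration and are not the FLIP-514 or operator API. Only the
metric name TARGET_DATA_RATE comes from the thread. The built-in evaluation
runs unchanged, and a user hook may then override selected evaluated
metrics at the end.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class EvaluatorHookSketch {

    /** Hypothetical subset of evaluated metrics; only TARGET_DATA_RATE is named in the thread. */
    public enum Metric { TARGET_DATA_RATE, TRUE_PROCESSING_RATE }

    /** Hypothetical extension point: sees the evaluated metrics and returns only its overrides. */
    public interface CustomEvaluator {
        Map<Metric, Double> adjust(Map<Metric, Double> evaluated);
    }

    /** Stand-in for the built-in evaluation, which the hook cannot replace. */
    static Map<Metric, Double> coreEvaluate(double observedRate, double processingRate) {
        Map<Metric, Double> metrics = new EnumMap<>(Metric.class);
        metrics.put(Metric.TARGET_DATA_RATE, observedRate);
        metrics.put(Metric.TRUE_PROCESSING_RATE, processingRate);
        return metrics;
    }

    /** Runs the core evaluation first, then merges the overrides returned by the hook. */
    public static Map<Metric, Double> evaluate(
            double observedRate, double processingRate, CustomEvaluator hook) {
        Map<Metric, Double> evaluated = coreEvaluate(observedRate, processingRate);
        // The hook gets a read-only view, so it cannot corrupt the core results directly.
        Map<Metric, Double> overrides = hook.adjust(Collections.unmodifiableMap(evaluated));
        evaluated.putAll(overrides);
        return evaluated;
    }

    public static void main(String[] args) {
        // Assumed predictive strategy: expect 50% more traffic than currently observed.
        CustomEvaluator predictive = evaluated ->
                Map.of(Metric.TARGET_DATA_RATE, evaluated.get(Metric.TARGET_DATA_RATE) * 1.5);
        System.out.println(evaluate(100.0, 80.0, predictive));
    }
}
```

In this sketch a predictive or scheduled strategy only raises the target
data rate, and every other metric still comes from the core logic, which is
the property the reviewers in the thread wanted to preserve.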
