Hi Peter! Sounds like a good plan. It would be great if you could help review the PR and finalize the pluggable evaluator logic to make sure it fits your needs.

Cheers,
Gyula

On Fri, Aug 29, 2025 at 12:17 AM Peter Huang <[email protected]> wrote:

> Hi Folks,
>
> Thanks for these suggestions. I think we are aligned that these two
> features are common and should be implemented upstream.
> I'll try to summarize the action items (AIs) below. Please feel free to
> add more if I missed anything.
>
> 1) Finish the planned work in FLIP-514 to support a pluggable
> MetricsEvaluator. Support the scheduled-scaling plugin as planned in
> FLIP-514.
> 2) Support Predictive Autoscaling as a configurable feature on top of a
> customized MetricsEvaluator in FLIP-543.
> 3) Support Data size aware autoscaling as a configurable feature on top
> of a customized MetricsEvaluator in FLIP-543.
>
> I will revise FLIP-543 to focus mainly on how Predictive Autoscaling and
> Data size aware autoscaling could be implemented on top of the pluggable
> MetricsEvaluator.
>
> Best Regards
> Peter Huang
>
> On Thu, Aug 28, 2025 at 2:18 AM Rui Fan <[email protected]> wrote:
>
> > Hi everyone,
> >
> > Thanks for the productive conversation on FLIP-543.
> >
> > I agree that we need more extensibility in the autoscaler. The
> > predictive scaling use case is a perfect example of a powerful feature
> > that would help many of us improve job availability by scaling before
> > backlogs build up.
> >
> > To echo Gyula and Max's points, I also believe the best path forward is
> > to build this capability as an extension to the existing framework, not
> > as a replacement. This would offer a robust, community-driven solution
> > for a common problem, which feels more sustainable than asking users to
> > implement and maintain custom forks of the logic.
> >
> > Best,
> > Rui
> >
> > On Thu, Aug 28, 2025 at 7:14 AM Pradeepta Choudhury
> > <[email protected]> wrote:
> >
> > > Hello Peter,
> > >
> > > To start with, great initiative! But I echo the concern already raised
> > > that creating too many extension points can compromise the autoscaler
> > > functionality.
> > > When we proposed FLIP-514 [1] and a custom evaluator, the aim was
> > > twofold: provide the required extension point and ship practical
> > > strategies as pluggables. At the same time, we wanted to preserve
> > > flexibility for advanced, highly specific scenarios, such as
> > > predictive scaling, that differ by ecosystem, platform, and company.
> > > The thinking was that the custom evaluator strikes that balance: it
> > > lets users adjust the evaluated metrics, especially TARGET_DATA_RATE,
> > > that drive the scale-factor calculation, enabling useful
> > > out-of-the-box behavior without constraining bespoke implementations.
> > > One of the desired outcomes we had set for FLIP-514 was to ship a
> > > scheduled-scaling strategy as a pluggable, leveraging a baseline
> > > period and explicit scheduled windows to drive planned capacity
> > > changes. I've been away since last month due to personal commitments.
> > > I plan to resume after the first week of September and will complete
> > > the scheduled-scaling plugin to wrap up the custom evaluator.
> > > Having the ScalingRealizer pluggable
> > > (https://github.com/apache/flink-kubernetes-operator/pull/1020/files)
> > > definitely sounds helpful for certain scenarios.
> > > But I totally agree with the general approach suggested by Gyula:
> > > solve specific issues independently in the "best possible way" and
> > > then arrive at a good solution for pluggability that could be a
> > > foundation for future use cases.
> > >
> > >
> > > Thanks and Regards
> > > Pradeepta
> > >
> > >
> > > > On 26 Aug 2025, at 6:05 PM, [email protected] <[email protected]> wrote:
> > > >
> > > > For the ScalingRealizer, I think having before/after hooks for
> > > > `realizeParallelismOverrides` and `realizeConfigOverrides` would be
> > > > good. We can support these hooks from plugins. Thoughts?
> > > >
> > > > Best,
> > > > Diljeet(DJ) Singh
> > > >
> > > > On 2025/08/26 08:24:33 Maximilian Michels wrote:
> > > >> Hi Peter,
> > > >>
> > > >> First of all, this is a great initiative. Flink Autoscaling
> > > >> definitely needs more points of extension. We recently added
> > > >> support for hooking into the metric evaluation (FLIP-514), but
> > > >> clearly that is just one extension point.
> > > >>
> > > >> That said, I think we will need to revise the approach a bit. I'm
> > > >> not sure we should be replacing core components. As Gyula
> > > >> mentioned, replacing those will easily break the entire
> > > >> autoscaler. Instead, we should be adding extension points which
> > > >> allow for meaningful additions without breaking the scaling logic.
> > > >> There is already the option to replace the entire autoscaling
> > > >> module, if users really want to roll out a completely custom
> > > >> version.
> > > >>
> > > >> What usually works best is to formulate the use case first, then
> > > >> figure out what autoscaler customization would be necessary to
> > > >> implement the use case.
> > > >>
> > > >> As for making the ScalingRealizer pluggable
> > > >> (https://github.com/apache/flink-kubernetes-operator/pull/1020/files),
> > > >> I do think that makes sense for some scenarios.
> > > >>
> > > >> Cheers,
> > > >> Max
> > > >>
> > > >> On Tue, Aug 26, 2025 at 8:59 AM Gyula Fóra <[email protected]> wrote:
> > > >>>
> > > >>> Hi Peter & Diljeet!
> > > >>>
> > > >>> My general feedback is that we should try to introduce extension
> > > >>> plugins instead of plugins that completely replace key parts of
> > > >>> the autoscaler code.
> > > >>>
> > > >>> Let me give you a concrete example through FLIP-514 and FLIP-543
> > > >>> using the MetricsEvaluator pluggability.
> > > >>> The MetricsEvaluator in the autoscaler is responsible for
> > > >>> evaluating/deriving/calculating metrics from the collected metrics.
> > > >>> It has to calculate everything in a more or less specific way,
> > > >>> otherwise other parts of the autoscaler that depend on these
> > > >>> metrics may not work. It doesn't seem very practical/reasonable
> > > >>> to completely reimplement this just because someone wants to
> > > >>> extend the logic. This is extremely error-prone and fragile,
> > > >>> especially if the autoscaler logic later evolves.
> > > >>>
> > > >>> FLIP-514 takes the approach of extending the metric evaluator
> > > >>> with a new method that allows users to modify the evaluated
> > > >>> metrics at the end and define custom ones. This is the right
> > > >>> approach here, as it makes a new extension very simple to build
> > > >>> and maintain without interfering with existing logic.
> > > >>>
> > > >>> The approach in FLIP-543 and in Diljeet's example PR takes the
> > > >>> replacement approach, completely substituting entire parts of the
> > > >>> implementation (the entire evaluator, scaling realizer, etc.). I
> > > >>> think this is not very good for either the community or the
> > > >>> actual user. From a community perspective it makes it harder to
> > > >>> extend the logic with nice small additions, and from a user's
> > > >>> perspective it is very error-prone if the operator autoscaler
> > > >>> logic changes, as it basically exposes a lot of internal logic on
> > > >>> a user interface.
> > > >>>
> > > >>> So at this point, -1 for the approach in FLIP-543 from my side,
> > > >>> but I would love to hear the opinion of others as well.
> > > >>>
> > > >>> Cheers
> > > >>> Gyula
> > > >>>
> > > >>> On Mon, Aug 25, 2025 at 11:44 PM Peter Huang <[email protected]> wrote:
> > > >>>>
> > > >>>> Hi Diljeet,
> > > >>>>
> > > >>>> Yes, I think we have similar requirements to make the autoscaler
> > > >>>> even more powerful so it can handle some customized
> > > >>>> requirements.
> > > >>>> The quick PoC makes sense to me. Let's get some more feedback
> > > >>>> from the community.
> > > >>>>
> > > >>>> Best Regards
> > > >>>> Peter Huang
> > > >>>>
> > > >>>> On Mon, Aug 25, 2025 at 2:37 PM Peter Huang <[email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Just trying to combine the discussion into one thread.
> > > >>>>>
> > > >>>>> @Diljeet Singh
> > > >>>>> posted a quick PoC for the proposal:
> > > >>>>> https://github.com/apache/flink-kubernetes-operator/pull/1020.
> > > >>>>>
> > > >>>>> On Mon, Aug 25, 2025 at 7:52 AM Peter Huang <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Community,
> > > >>>>>>
> > > >>>>>> Our org has been heavily using the Flink autoscaling
> > > >>>>>> algorithm. It has greatly reduced our operational overhead and
> > > >>>>>> improved cost efficiency, as users always over-provision
> > > >>>>>> resources when onboarding. Recently, we have had some
> > > >>>>>> requirements to customize the autoscaling algorithm for
> > > >>>>>> different scenarios: for example, handling the large but
> > > >>>>>> predictable traffic spike during the holiday season, or
> > > >>>>>> increasing the checkpoint interval together with scaling up
> > > >>>>>> for streaming ingestion use cases.
> > > >>>>>>
> > > >>>>>> We searched through the discussions on this topic on the
> > > >>>>>> mailing list, including the existing FLIP-514
> > > >>>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler>.
> > > >>>>>> It looks like the discussion is not finalized yet.
> > > >>>>>> To accelerate the process, we adopted and combined the
> > > >>>>>> existing opinions from the community and created a proposal in
> > > >>>>>> FLIP-543
> > > >>>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm>.
> > > >>>>>> The basic idea is to make some core components of the
> > > >>>>>> autoscaler pluggable, for example MetricsCollector,
> > > >>>>>> MetricsEvaluator, and ScalingRealizer, while at the same time
> > > >>>>>> keeping the core logic skeleton of the autoscaler (which is
> > > >>>>>> already well justified by a large number of users) untouched.
> > > >>>>>>
> > > >>>>>> Looking forward to any feedback and opinions on FLIP-543.
> > > >>>>>>
> > > >>>>>> [1]
> > > >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm
> > > >>>>>> [2]
> > > >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler
> > > >>>>>> [3] Other related discussion threads
> > > >>>>>> https://lists.apache.org/thread/749l74z1h5jylkxrw3rtjmxcj2t9p7ws
> > > >>>>>> https://lists.apache.org/thread/mcd7jcn4kz6oqtyqq5hfycjf9mqh6c53
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Best Regards
> > > >>>>>> Peter Huang
> > > >>>>>>
