Hey Guowei,

Thanks for the proposal, and I think this is very valuable. I have a few
questions about it:

1. What are the expected throughput and latency targets? Do we have any
preliminary benchmarks for them?

2. AI workloads involve a very large number of operators. Besides allowing
users to implement them through UDFs, will we also provide built-in
versions of the commonly used ones?

3. Each of the 11 sub-FLIPs is a major project involving a significant
amount of change. What is our plan for this?

4. GPU scheduling is extremely complex. What is our current roadmap for
this?

This is a very high-quality and exciting proposal. Making Flink an
AI-native data processing engine will make it far more valuable in the AI
era. Looking forward to seeing it come to fruition soon.

Robert Metzger <[email protected]> wrote on Tue, Apr 28, 2026, at 14:38:

> Hey Guowei,
>
> Thanks for the proposal. I just took a brief look, here are some high level
> questions:
>
> Regarding the RPC Operator: how does it differ from the async I/O operator
> we already have?
>
> "Connector API for Multimodal Data Source/Sink": Why do we need to touch
> the connector API for supporting multimodal data? Isn't this more of a
> formats concern?
>
> "Non-Disruptive Scaling for CPU Operators": How do you want to guarantee
> exactly-once with that kind of scaling? E.g., you need to somehow make a
> handover between the old and the new pipeline.
>
> Overall, I find the proposal contains some things that seem related to
> making Flink more AI-native, but other changes seem orthogonal to that. For
> example, the checkpoint and scaling changes are actually unrelated to AI
> and are just engine improvements.
>
>
> On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a discussion on an umbrella FLIP[1] that lays out a
> > direction for evolving Flink into a data engine that natively supports AI
> > workloads.
> >
> > The short version: user workloads are shifting from BI analytics to
> > multimodal data processing centered on model inference, and this triggers
> > cascading changes across the stack — multimodal data flowing through
> > pipelines, heterogeneous CPU/GPU resources, vectorized execution, and
> > inference tasks that run for seconds to minutes on Spot instances. The
> > proposal sketches an evolution along five directions (development
> > paradigm,
> > data model, heterogeneous resources, execution engine, fault tolerance),
> > decomposed into 11 sub-FLIPs organized into three layers: core runtime
> > primitives, AI workload expression and execution, and production-grade
> > operational guarantees. Most sub-FLIPs have no hard dependencies on each
> > other and can be advanced in parallel.
> >
> > A note on scope, since it's an umbrella:
> >
> > - In scope here: whether the evolution directions are reasonable, whether
> > each sub-FLIP's motivation and proposed approach are well-founded, and
> > whether the boundaries and dependencies between sub-FLIPs are clear.
> > - Out of scope here: detailed designs, API specifics, and implementation
> > plans of individual sub-FLIPs — those will go through their own FLIPs.
> > - Consensus criteria: agreement on the overall direction is sufficient
> > for
> > the umbrella to pass; passing it does not lock in any sub-FLIP's design —
> > sub-FLIPs may still be adjusted, deferred, or withdrawn as they progress.
> >
> > All proposed changes are incremental — no existing API or behavior is
> > removed or altered. Compatibility details are covered at the end of the
> > document.
> >
> > Looking forward to your feedback on the overall direction and the
> > layering.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> >
> > Thanks,
> > Guowei
> >
>