Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella Proposal for Multimodal Data Processing

Dian Fu Tue, 28 Apr 2026 19:08:12 -0700

Hi Guowei,

Thanks for driving this FLIP. I believe it will be an important step
in moving Flink from the BI stack towards the AI stack.


Here are my thoughts on the concerns raised by Gustavo:

> - SQL/Table: What is the plan here? How will the new multimodal types
> (Tensor, Image, Embedding) work in the type system, codegen, and
> plan/savepoint compatibility?

I think the new multimodal types will also be introduced in the
DataStream API and Table API, just like the existing built-in data
types in Flink.

> Is there a plan for SQL-level model inference
> beyond the current ML_PREDICT shape, for example, vector similarity or
> multimodal predicates? Today this is still very vendor-specific across the
> industry, so it would be nice to know if Flink wants to take a clear
> position here or how this FLIP will fit with the SQL/Table vision.

We intend to introduce more built-in, domain-specific model inference
functions beyond ML_PREDICT. This is reflected in the sub-FLIP
`FLIP-XXX: Built-in Multimodal Operators and AI Functions`. We plan to
introduce quite a few built-in capabilities for multimodal data
processing and domain-specific model inference to make life easier for
users. They will be available to both Python users and SQL users. The
detailed list will be discussed in that sub-FLIP.

> - DataStream (v1 and v2): Will RpcOperator and the Arrow-batch primitives
> be exposed as first-class building blocks for Java users, or only as
> internal pieces behind the Python DataFrame? Many streaming inference use
> cases (real-time enrichment, CDC + model scoring) fit very well with
> DataStream and would benefit from clear guidance.

Good question. Regarding the Arrow-batch primitives, the preliminary
thinking is to focus on Python jobs first, since it's very clear
that Python users need them. If we see clear use cases where Java users
would directly benefit from them, we can absolutely expose them as
first-class building blocks. It would be great if you could share more
thoughts on how Java users could use them in the above or other
scenarios.

Regards,
Dian


On Wed, Apr 29, 2026 at 12:04 AM David Radley <[email protected]> wrote:
>
> Hi Guowei,
> This is an interesting proposal. I second Robert's questions. Some thoughts:
>
> Layer 3 does not depend on layers 1 and 2 I think. At the high level I 
> wonder, is the idea that Flink could become like an R ML pipeline or SPSS? It 
> would be good to compare existing technology solutions and what benefits 
> Flink will bring to these scenarios.
> FLIP-XXX: Supporting RpcOperator — Independently Deployed and Scaled RPC 
> Service Operators - see Robert's comment. I assume this is a specialization 
> of the async io operator for RPC. When you say deploying RPC services that 
> are fully managed by the Flink runtime, where would these be deployed? If it 
> is remote how would this work? It would be interesting to see some use cases 
> where Flink would be deploying RPC services that it has created.
> FLIP-XXX: Multimodal Data Type System and Object Reference Mechanism
>
> I like the idea of adding these types - the interesting part will be the 
> deser.
>
> FLIP-XXX: A More Pythonic DataFrame API for Python Users - this makes sense
> FLIP-XXX: Connector API for Multimodal Data Source/Sink - I assume this will 
> be renamed new multimodal formats. Are there existing registries that these 
> could be looked up in - similar to a schema registry - so we can bring in 
> artifacts via metadata?
> FLIP-XXX: Built-in Multimodal Operators and AI Functions - I wonder if we 
> could bring in existing implementation libraries and the new work would allow 
> us to call them from Flink. i.e. not having to do them one call by call but 
> library by library.
> FLIP-XXX: Columnar Data Transport and Processing Optimization - this seems a 
> big change, events as columns rather than events as rows or CDC sequences. I 
> assume this would not be exposed in SQL?
>
>  kind regards, David.
>
> From: Robert Metzger <[email protected]>
> Date: Tuesday, 28 April 2026 at 07:38
> To: [email protected] <[email protected]>
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella 
> Proposal for Multimodal Data Processing
>
> Hey Guowei,
>
> Thanks for the proposal. I just took a brief look, here are some high level
> questions:
>
> Regarding the RPC Operator: What is the difference to the async io operator
> we have already?
>
> "Connector API for Multimodal Data Source/Sink": Why do we need to touch
> the connector API for supporting multimodal data? Isn't this more of a
> formats concern?
>
> "Non-Disruptive Scaling for CPU Operators": How do you want to guarantee
> exactly-once on that kind of scaling? E.g. you need to somehow make a
> handover between the old and the new pipeline.
>
> Overall, I find the proposal has some things which seem related to making
> Flink more AI native, but other changes seem orthogonal to that. For
> example the checkpoint or scaling changes are actually unrelated to AI, and
> just engine improvements.
>
>
> On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a discussion on an umbrella FLIP[1] that lays out a
> > direction for evolving Flink into a data engine that natively supports AI
> > workloads.
> >
> > The short version: user workloads are shifting from BI analytics to
> > multimodal data processing centered on model inference, and this triggers
> > cascading changes across the stack — multimodal data flowing through
> > pipelines, heterogeneous CPU/GPU resources, vectorized execution, and
> > inference tasks that run for seconds to minutes on Spot instances. The
> > proposal sketches an evolution along five directions (development paradigm,
> > data model, heterogeneous resources, execution engine, fault tolerance),
> > decomposed into 11 sub-FLIPs organized into three layers: core runtime
> > primitives, AI workload expression and execution, and production-grade
> > operational guarantees. Most sub-FLIPs have no hard dependencies on each
> > other and can be advanced in parallel.
> >
> > A note on scope, since it's an umbrella:
> >
> > - In scope here: whether the evolution directions are reasonable, whether
> > each sub-FLIP's motivation and proposed approach are well-founded, and
> > whether the boundaries and dependencies between sub-FLIPs are clear.
> > - Out of scope here: detailed designs, API specifics, and implementation
> > plans of individual sub-FLIPs — those will go through their own FLIPs.
> > - Consensus criteria: agreement on the overall direction is sufficient for
> > the umbrella to pass; passing it does not lock in any sub-FLIP's design —
> > sub-FLIPs may still be adjusted, deferred, or withdrawn as they progress.
> >
> > All proposed changes are incremental — no existing API or behavior is
> > removed or altered. Compatibility details are covered at the end of the
> > document.
> >
> > Looking forward to your feedback on the overall direction and the layering.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> >
> > Thanks,
> > Guowei
> >
>
