Hi Guowei and Dian,

Thanks for the detailed answers - this clears up what I was wondering about
specifically regarding "FLIP-XXX: Multimodal Data Type System and Object
Reference Mechanism".

Glad to hear that multimodal types and AI functions will be first-class for
both SQL/Table and DataStream. On point 2, I'd lean toward splitting the
SQL/Table side into its own sub-FLIP due to scope - but up to your
judgment. I understand the Umbrella FLIP is meant to be a high-level
overview of the plan, and that the details will be fleshed out in the
sub-FLIPs.

Thanks for driving the initiative and the discussion.

Best,
Gustavo

On Thu, 30 Apr 2026 at 09:58, Guowei Ma <[email protected]> wrote:

> Hi David,
>
> Thanks for the careful review — these questions help sharpen the umbrella.
> Let me respond point by point.
>
> 1. Layer 3 dependencies on Layer 1/2
>
> I broadly agree. The pure CPU-side checkpoint enhancements in Layer 3
> (Pipeline Region independent checkpoint, Unaligned Checkpoint improvements)
> can indeed be advanced independently. The parts of Layer 3 related to GPU
> elastic scaling do depend on RpcOperator from Layer 1, but this is a local
> dependency and doesn't affect the parallelism of Layer 3 as a whole. I'll
> make the dependency relationships more explicit in the next revision of the
> umbrella.
>
> 2. "Could Flink become like R or SPSS?" — and how this compares to existing
> solutions
>
> This is a question worth answering directly, because it goes to Flink's
> positioning in AI data processing.
>
> Direct answer: no. R and SPSS are single-machine interactive statistical
> analysis tools, targeted at statisticians and researchers. Flink is
> distributed data infrastructure, targeted at engineers building production
> data pipelines. These aren't in the same lane.
>
> The more relevant comparisons are Daft and Ray Data — these are the most
> active distributed systems in AI data processing today. Flink's
> differentiation shows up on two levels.
>
> On the technical side: Flink's core strength is its streaming + checkpoint
> machinery, which matters especially in inference scenarios. A single
> inference can take seconds to minutes, so the cost of failover is far
> higher than in traditional batch data processing, and fine-grained fault
> tolerance directly determines production viability. We've seen users put
> significant additional engineering effort into fault tolerance for
> inference workloads on other systems, and even so the result is less
> systematic and less complete than what Flink already provides — this is the
> accumulation of more than a decade of streaming engine work, and not
> something easily caught up with in the short term.
>
> On ecosystem position: Flink is already data infrastructure inside many
> enterprises, widely used for ETL, CDC, and real-time analytics. Integrating
> AI inference directly into existing Flink pipelines is far cheaper than
> spinning up a separate stack on Ray or Daft — especially when multimodal
> data is already flowing through Flink. The AI-Native evolution makes this
> integration a natural extension, rather than asking users to switch to a
> different technology stack.
>
> 3. RpcOperator: where it's deployed, how remote works
>
> Let me first clarify one point (this also addresses the same confusion in
> Robert's email): RpcOperator is not a specialization of async io, it's a
> deployment primitive — it splits GPU compute out of the data-plane topology
> so it can be independently scheduled, scaled, and recovered as a service.
> CPU operators can still use async semantics when calling it.
>
> In terms of deployment shape, the mechanics for bringing up an RpcOperator,
> managing its lifecycle, and scheduling its resources are similar to how
> Flink launches a TaskManager today — reusing existing Flink runtime
> capabilities rather than introducing a new infrastructure layer. The
> concrete protocols and implementation details will be developed in the
> sub-FLIP.
>
> Typical use cases are GPU inference services such as LLM inference and
> multimodal model inference — compute units that have their own resource
> profile and scaling curve, distinct from those of the main data flow, and
> therefore well-suited to being deployed as independent services.
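>
> To make that calling shape concrete, here is a purely illustrative
> Python sketch (every name in it is invented for this email, not taken
> from the FLIP): a CPU-side step awaits an independently deployed
> inference service, while deployment, scaling, and recovery of the
> service itself would belong to the runtime-managed RpcOperator.

```python
# Illustrative only -- invented names, not the FLIP's API. Shows the
# calling shape: a CPU operator awaits a GPU service that lives
# outside the data-plane topology.
import asyncio

class RemoteInferenceClient:
    """Stand-in for the handle an RpcOperator-backed service might expose."""
    async def infer(self, record: bytes) -> str:
        await asyncio.sleep(0)  # placeholder for the network round-trip
        return f"label-for-{len(record)}-bytes"

async def process_batch(records, client):
    # Issue all calls concurrently; retries, ordering, and fault
    # tolerance are what the service side would take over.
    return await asyncio.gather(*(client.infer(r) for r in records))

results = asyncio.run(
    process_batch([b"img0", b"frame1"], RemoteInferenceClient()))
print(results)  # ['label-for-4-bytes', 'label-for-6-bytes']
```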
>
> 4. Multimodal types — deser is the core
>
> You said "the interesting part will be the deser" — that's exactly right,
> and deser is indeed the core of this sub-FLIP's design. The Object
> Reference mechanism is a partial answer here — large objects are serialized
> only once and passed through the pipeline by reference, which significantly
> reduces the SerDes burden in multimodal scenarios. The detailed SerDes
> design will be developed in the sub-FLIP.
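>
> For intuition, a toy Python sketch of the serialize-once /
> pass-by-reference idea (ObjectRef, put, get, and the in-process store
> are names I invented here; the sub-FLIP's actual design may differ):

```python
# Toy sketch, not the FLIP's design: large payloads are stored once
# and only a small handle travels through the pipeline.
import uuid

_STORE: dict[str, bytes] = {}  # stand-in for a shared object store

class ObjectRef:
    """Small handle forwarded between operators instead of the payload."""
    def __init__(self, key: str):
        self.key = key

def put(payload: bytes) -> ObjectRef:
    key = uuid.uuid4().hex
    _STORE[key] = payload      # the large object is serialized/stored once
    return ObjectRef(key)

def get(ref: ObjectRef) -> bytes:
    return _STORE[ref.key]     # downstream operators dereference on demand

video = b"\x00" * 10_000_000   # a large multimodal payload
ref = put(video)               # operators forward `ref` (a few bytes)
assert get(ref) == video       # payload recovered only where needed
```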
>
> 5. Built-in operators: library-by-library integration
>
> This is a very pragmatic suggestion, and it's exactly the direction we're
> going in. Our goal is not to reinvent the wheel, but to use the new
> mechanisms to bring existing AI libraries smoothly into Flink, rather than
> forcing users to write call-by-call wrappers. The layering is:
>
>    - Built-in operators cover the most common operations that can benefit
>    from framework-level optimization (model sharing, batching, GPU resource
>    pooling, etc.); these will be built on top of existing libraries rather
>    than reimplemented from scratch.
>    - The UDF system lets users import any Python library directly, without
>    needing a dedicated operator per call.
>
> The two are complementary: common operations that benefit from framework
> optimization go through built-ins; long-tail and custom needs go through
> UDFs. The concrete set of built-in operators, the library integration
> approach, and the specific framework-level optimization points will be
> addressed in "FLIP-XXX: Built-in Multimodal Operators and AI Functions".
>
> 6. Columnar: whether it's exposed in SQL
>
> Your reading is correct — in the first phase we don't plan to expose the
> columnar format at the SQL layer. Columnar execution is an internal engine
> optimization, and SQL remains row-based at the logical level.
>
> Whether UDFs should directly produce/consume columnar data (Arrow / NumPy)
> is something we're keeping open and can discuss further down the line — it
> depends on how strong the vectorization need is in concrete scenarios, and
> on the impact on user programming model complexity.
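>
> As a toy illustration of "columnar inside the engine, row-based at the
> logical level" (plain Python lists stand in for Arrow batches to keep
> the sketch dependency-free):

```python
# Same logical records, two physical layouts. The engine-internal
# transport would be Arrow-based; lists keep this sketch self-contained.

rows = [{"id": 1, "score": 2}, {"id": 2, "score": 9}]

# Row-at-a-time: what SQL semantics describe.
row_result = [r["score"] * 10 for r in rows]

# Columnar: one tight pass over a single column -- the access pattern
# that vectorized kernels exploit.
batch = {"id": [1, 2], "score": [2, 9]}
col_result = [s * 10 for s in batch["score"]]

assert row_result == col_result == [20, 90]  # same logical answer
```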
>
> Thanks again — your feedback meaningfully improves the clarity of the
> umbrella in several places.
>
>
> Best, Guowei
>
>
> On Wed, Apr 29, 2026 at 12:02 AM David Radley <[email protected]>
> wrote:
>
> > Hi Guowei,
> > This is an interesting proposal. I second Robert's questions. Some
> > thoughts.
> >
> > Layer 3 does not depend on layers 1 and 2, I think. At the high level I
> > wonder: is the idea that Flink could become like an R ML pipeline or
> > SPSS? It would be good to compare existing technology solutions and what
> > benefits Flink will bring to these scenarios.
> > FLIP-XXX: Supporting RpcOperator — Independently Deployed and Scaled RPC
> > Service Operators - see Robert's comment. I assume this is a
> > specialization of the async io operator for RPC. When you say deploying
> > RPC services that are fully managed by the Flink runtime, where would
> > these be deployed? If it is remote, how would this work? It would be
> > interesting to see some use cases where Flink would be deploying RPC
> > services that it has created.
> > FLIP-XXX: Multimodal Data Type System and Object Reference Mechanism
> >
> > I like the idea of adding these types - the interesting part will be the
> > deser.
> >
> > FLIP-XXX: A More Pythonic DataFrame API for Python Users - this makes
> > sense.
> > FLIP-XXX: Connector API for Multimodal Data Source/Sink - I assume this
> > will be renamed to new multimodal formats. Are there existing registries
> > that these could be looked up in - similar to schema registry - so we
> > can bring in artifacts via metadata?
> > FLIP-XXX: Built-in Multimodal Operators and AI Functions - I wonder if
> > we could bring in existing implementation libraries, and the new work
> > would allow us to call them from Flink, i.e. integrating them not call
> > by call but library by library.
> > FLIP-XXX: Columnar Data Transport and Processing Optimization - this
> > seems a big change: events as columns rather than events as rows or CDC
> > sequences. I assume this would not be exposed in SQL?
> >
> > Kind regards, David.
> >
> > From: Robert Metzger <[email protected]>
> > Date: Tuesday, 28 April 2026 at 07:38
> > To: [email protected] <[email protected]>
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella
> > Proposal for Multimodal Data Processing
> >
> > Hey Guowei,
> >
> > Thanks for the proposal. I just took a brief look; here are some
> > high-level questions:
> >
> > Regarding the RPC Operator: what is the difference from the async io
> > operator we have already?
> >
> > "Connector API for Multimodal Data Source/Sink": Why do we need to touch
> > the connector API for supporting multimodal data? Isn't this more of a
> > formats concern?
> >
> > "Non-Disruptive Scaling for CPU Operators": How do you want to guarantee
> > exactly-once with that kind of scaling? E.g. you need to somehow make a
> > handover between the old and the new pipeline.
> >
> > Overall, I find the proposal has some things which seem related to making
> > Flink more AI-native, but other changes seem orthogonal to that. For
> > example, the checkpoint and scaling changes are actually unrelated to AI,
> > and are just engine improvements.
> >
> >
> > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to start a discussion on an umbrella FLIP[1] that lays out a
> > > direction for evolving Flink into a data engine that natively supports
> > > AI workloads.
> > >
> > > The short version: user workloads are shifting from BI analytics to
> > > multimodal data processing centered on model inference, and this
> > > triggers cascading changes across the stack — multimodal data flowing
> > > through pipelines, heterogeneous CPU/GPU resources, vectorized
> > > execution, and inference tasks that run for seconds to minutes on Spot
> > > instances. The proposal sketches an evolution along five directions
> > > (development paradigm, data model, heterogeneous resources, execution
> > > engine, fault tolerance), decomposed into 11 sub-FLIPs organized into
> > > three layers: core runtime primitives, AI workload expression and
> > > execution, and production-grade operational guarantees. Most sub-FLIPs
> > > have no hard dependencies on each other and can be advanced in
> > > parallel.
> > >
> > > A note on scope, since it's an umbrella:
> > >
> > > - In scope here: whether the evolution directions are reasonable,
> > > whether each sub-FLIP's motivation and proposed approach are
> > > well-founded, and whether the boundaries and dependencies between
> > > sub-FLIPs are clear.
> > > - Out of scope here: detailed designs, API specifics, and
> > > implementation plans of individual sub-FLIPs — those will go through
> > > their own FLIPs.
> > > - Consensus criteria: agreement on the overall direction is sufficient
> > > for the umbrella to pass; passing it does not lock in any sub-FLIP's
> > > design — sub-FLIPs may still be adjusted, deferred, or withdrawn as
> > > they progress.
> > >
> > > All proposed changes are incremental — no existing API or behavior is
> > > removed or altered. Compatibility details are covered at the end of
> > > the document.
> > >
> > > Looking forward to your feedback on the overall direction and the
> > > layering.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > >
> > > Thanks,
> > > Guowei
> > >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: Building C, IBM Hursley Office, Hursley Park Road,
> > Winchester, Hampshire SO21 2JN
> >
>
