Hi Guowei and Dian,

Thanks for the detailed answers - this clears up what I was wondering about, specifically regarding "FLIP-XXX: Multimodal Data Type System and Object Reference Mechanism".
Glad to hear that multimodal types and AI functions will be first-class for both SQL/Table and DataStream. On point 2, I'd lean toward splitting the SQL/Table side into its own sub-FLIP due to scope - but up to your judgment. I understand the Umbrella FLIP is meant to be a high-level overview of the plan, and that the details will be fleshed out in the sub-FLIPs.

Thanks for driving the initiative and the discussion.

Best,
Gustavo

On Thu, 30 Apr 2026 at 09:58, Guowei Ma <[email protected]> wrote:

> Hi David,
>
> Thanks for the careful review — these questions help sharpen the umbrella. Let me respond point by point.
>
> 1. Layer 3 dependencies on Layer 1/2
>
> I broadly agree. The pure CPU-side checkpoint enhancements in Layer 3 (Pipeline Region independent checkpoint, Unaligned Checkpoint improvements) can indeed be advanced independently. The parts of Layer 3 related to GPU elastic scaling do depend on RpcOperator from Layer 1, but this is a local dependency and doesn't affect the parallelism of Layer 3 as a whole. I'll make the dependency relationships more explicit in the next revision of the umbrella.
>
> 2. "Could Flink become like R or SPSS?" — and how this compares to existing solutions
>
> This is a question worth answering directly, because it goes to Flink's positioning in AI data processing.
>
> Direct answer: no. R and SPSS are single-machine interactive statistical analysis tools, targeted at statisticians and researchers. Flink is distributed data infrastructure, targeted at engineers building production data pipelines. These aren't in the same lane.
>
> The more relevant comparisons are Daft and Ray Data — these are the most active distributed systems in AI data processing today. Flink's differentiation shows up on two levels.
>
> On the fault-tolerance side: Flink's core strength is its streaming + checkpoint machinery, which matters especially in inference scenarios.
> A single inference can take seconds to minutes, so the cost of failover is far higher than in traditional batch data processing, and fine-grained fault tolerance directly determines production viability. We've seen users put significant additional engineering effort into fault tolerance for inference workloads on other systems, and even so the result is less systematic and less complete than what Flink already provides — this is the accumulation of more than a decade of streaming-engine work, and not something easily caught up with in the short term.
>
> On ecosystem position: Flink is already data infrastructure inside many enterprises, widely used for ETL, CDC, and real-time analytics. Integrating AI inference directly into existing Flink pipelines is far cheaper than spinning up a separate stack on Ray or Daft — especially when multimodal data is already flowing through Flink. The AI-Native evolution makes this integration a natural extension, rather than asking users to switch to a different technology stack.
>
> 3. RpcOperator: where it's deployed, how remote works
>
> Let me first clarify one point (this also addresses the same confusion in Robert's email): RpcOperator is not a specialization of async I/O; it is a deployment primitive — it splits GPU compute out of the data-plane topology so it can be independently scheduled, scaled, and recovered as a service. CPU operators can still use async semantics when calling it.
>
> In terms of deployment shape, the mechanics for bringing up an RpcOperator, managing its lifecycle, and scheduling its resources are similar to how Flink launches a TaskManager today — reusing existing Flink runtime capabilities rather than introducing a new infrastructure layer. The concrete protocols and implementation details will be developed in the sub-FLIP.
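[Editor's note: to make the calling pattern above concrete, here is a minimal, purely illustrative sketch. RpcOperator does not exist yet and every name below (`remote_inference`, `AsyncInferenceCaller`, etc.) is invented; the real protocols and APIs are deferred to the sub-FLIP. The point is only the shape: the CPU-side operator stays in the data plane and keeps several non-blocking calls in flight against inference capacity that lives behind a separately scheduled service — here mimicked with a local stub and a thread pool.]

```python
# Hypothetical sketch only — none of this is proposed Flink API.
from concurrent.futures import ThreadPoolExecutor


def remote_inference(record: str) -> dict:
    """Stand-in for a slow remote GPU inference call (seconds to minutes)."""
    label = "cat" if "whiskers" in record else "other"
    return {"input": record, "label": label}


class AsyncInferenceCaller:
    """CPU-side operator that keeps several inference calls in flight."""

    def __init__(self, max_in_flight: int = 4):
        # In the proposed design, scaling the service side up or down would
        # not require rescheduling the CPU-side data-plane topology.
        self.pool = ThreadPoolExecutor(max_workers=max_in_flight)

    def process(self, records):
        # Fan out all calls without blocking the pipeline, then collect
        # results in input order.
        futures = [self.pool.submit(remote_inference, r) for r in records]
        return [f.result() for f in futures]


caller = AsyncInferenceCaller()
results = caller.process(["whiskers and fur", "wheels and doors"])
```

The separation is the design point: because the inference side is addressed as a service rather than embedded in the operator, it can be scaled and recovered independently of the data-plane topology.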
> Typical use cases are GPU inference services such as LLM inference and multimodal model inference — compute units that have their own resource profile and scaling curve, distinct from those of the main data flow, and are therefore well suited to being deployed as independent services.
>
> 4. Multimodal types — deser is the core
>
> You said "the interesting part will be the deser" — that's exactly right, and deser is indeed the core of this sub-FLIP's design. The Object Reference mechanism is a partial answer here — large objects are serialized only once and passed through the pipeline by reference, which significantly reduces the SerDes burden in multimodal scenarios. The detailed SerDes design will be developed in the sub-FLIP.
>
> 5. Built-in operators: library-by-library integration
>
> This is a very pragmatic suggestion, and it's exactly the direction we're going in. Our goal is not to reinvent the wheel, but to use the new mechanisms to bring existing AI libraries smoothly into Flink, rather than forcing users to write call-by-call wrappers. The layering is:
>
> - Built-in operators cover the most common operations that can benefit from framework-level optimization (model sharing, batching, GPU resource pooling, etc.); these will be built on top of existing libraries rather than reimplemented from scratch.
> - The UDF system lets users import any Python library directly, without needing a dedicated operator per call.
>
> The two are complementary: common operations that benefit from framework optimization go through built-ins; long-tail and custom needs go through UDFs. The concrete set of built-in operators, the library integration approach, and the specific framework-level optimization points will be addressed in "FLIP-XXX: Built-in Multimodal Operators and AI Functions".
>
> 6. Columnar: whether it's exposed in SQL
>
> Your reading is correct — in the first phase we don't plan to expose the columnar format at the SQL layer. Columnar execution is an internal engine optimization, and SQL remains row-based at the logical level.
>
> Whether UDFs should directly produce/consume columnar data (Arrow / NumPy) is something we're keeping open and can discuss further down the line — it depends on how strong the vectorization need is in concrete scenarios, and on the impact on user programming model complexity.
>
> Thanks again — your feedback meaningfully improves the clarity of the umbrella in several places.
>
> Best,
> Guowei
>
> On Wed, Apr 29, 2026 at 12:02 AM David Radley <[email protected]> wrote:
>
> > Hi Guowei,
> >
> > This is an interesting proposal. I second Robert's questions. Some thoughts.
> >
> > Layer 3 does not depend on layers 1 and 2, I think. At the high level I wonder: is the idea that Flink could become like an R ML pipeline or SPSS? It would be good to compare existing technology solutions and what benefits Flink will bring to these scenarios.
> >
> > FLIP-XXX: Supporting RpcOperator — Independently Deployed and Scaled RPC Service Operators - see Robert's comment. I assume this is a specialization of the async io operator for RPC. When you say deploying RPC services that are fully managed by the Flink runtime, where would these be deployed? If it is remote, how would this work? It would be interesting to see some use cases where Flink would be deploying RPC services that it has created.
> >
> > FLIP-XXX: Multimodal Data Type System and Object Reference Mechanism
> >
> > I like the idea of adding these types - the interesting part will be the deser.
> >
> > FLIP-XXX: A More Pythonic DataFrame API for Python Users - this makes sense.
> >
> > FLIP-XXX: Connector API for Multimodal Data Source/Sink - I assume this will be renamed new multimodal formats.
> > Are there existing registries that these could be looked up in - similar to a schema registry - so we can bring in artifacts via metadata?
> >
> > FLIP-XXX: Built-in Multimodal Operators and AI Functions - I wonder if we could bring in existing implementation libraries, and the new work would allow us to call them from Flink, i.e. not having to do them call by call but library by library.
> >
> > FLIP-XXX: Columnar Data Transport and Processing Optimization - this seems a big change: events as columns rather than events as rows or CDC sequences. I assume this would not be exposed in SQL?
> >
> > kind regards, David.
> >
> > From: Robert Metzger <[email protected]>
> > Date: Tuesday, 28 April 2026 at 07:38
> > To: [email protected] <[email protected]>
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella Proposal for Multimodal Data Processing
> >
> > Hey Guowei,
> >
> > Thanks for the proposal. I just took a brief look; here are some high-level questions:
> >
> > Regarding the RPC Operator: what is the difference to the async io operator we have already?
> >
> > "Connector API for Multimodal Data Source/Sink": Why do we need to touch the connector API for supporting multimodal data? Isn't this more of a formats concern?
> >
> > "Non-Disruptive Scaling for CPU Operators": How do you want to guarantee exactly-once on that kind of scaling? E.g. you need to somehow make a handover between the old and new pipeline.
> >
> > Overall, I find the proposal has some things which seem related to making Flink more AI-native, but other changes seem orthogonal to that. For example, the checkpoint and scaling changes are actually unrelated to AI, and just engine improvements.
> > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to start a discussion on an umbrella FLIP [1] that lays out a direction for evolving Flink into a data engine that natively supports AI workloads.
> > >
> > > The short version: user workloads are shifting from BI analytics to multimodal data processing centered on model inference, and this triggers cascading changes across the stack — multimodal data flowing through pipelines, heterogeneous CPU/GPU resources, vectorized execution, and inference tasks that run for seconds to minutes on Spot instances. The proposal sketches an evolution along five directions (development paradigm, data model, heterogeneous resources, execution engine, fault tolerance), decomposed into 11 sub-FLIPs organized into three layers: core runtime primitives, AI workload expression and execution, and production-grade operational guarantees. Most sub-FLIPs have no hard dependencies on each other and can be advanced in parallel.
> > >
> > > A note on scope, since it's an umbrella:
> > >
> > > - In scope here: whether the evolution directions are reasonable, whether each sub-FLIP's motivation and proposed approach are well-founded, and whether the boundaries and dependencies between sub-FLIPs are clear.
> > > - Out of scope here: detailed designs, API specifics, and implementation plans of individual sub-FLIPs — those will go through their own FLIPs.
> > > - Consensus criteria: agreement on the overall direction is sufficient for the umbrella to pass; passing it does not lock in any sub-FLIP's design — sub-FLIPs may still be adjusted, deferred, or withdrawn as they progress.
> > >
> > > All proposed changes are incremental — no existing API or behavior is removed or altered. Compatibility details are covered at the end of the document.
> > > Looking forward to your feedback on the overall direction and the layering.
> > >
> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > >
> > > Thanks,
> > > Guowei
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: Building C, IBM Hursley Office, Hursley Park Road, Winchester, Hampshire SO21 2JN
