Hi Gustavo,

Thanks for the feedback, and I'm glad you find the runtime direction
reasonable. You're right that the API side of the umbrella deserves more
detail — let me respond point by point.

1. Multimodal type system

Support for the multimodal types (Tensor, Image, Embedding) will be
first-class for both SQL/Table and DataStream — this is an engine-level
type system capability and shouldn't be tied to any single API surface.
I'll make this explicit in "FLIP-XXX: Multimodal Data Type System and
Object Reference Mechanism", spelling out the visibility across each API so
it isn't read as Python-only.
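For intuition, here is a rough pure-Python sketch of the object-reference idea — illustrative only, none of these names or thresholds come from the FLIP: small, serializable metadata travels through the pipeline while a large multimodal payload stays in external storage.

```python
from dataclasses import dataclass

# Hypothetical sketch: an engine-level reference to a large multimodal
# payload. Only this small record flows between operators; the payload
# itself lives in external storage and is fetched lazily when needed.
@dataclass(frozen=True)
class ObjectRef:
    uri: str          # where the payload lives, e.g. "s3://bucket/img/42.png"
    media_type: str   # logical multimodal type, e.g. "image" or "tensor"
    size_bytes: int   # payload size, useful for batching/memory planning

def inline_or_reference(payload: bytes, uri: str, media_type: str,
                        inline_threshold: int = 1 << 16):
    """Keep small payloads inline; spill large ones behind a reference."""
    if len(payload) <= inline_threshold:
        return payload  # small enough to travel through the pipeline directly
    return ObjectRef(uri=uri, media_type=media_type, size_bytes=len(payload))

small = inline_or_reference(b"tiny", "s3://bucket/a", "image")
large = inline_or_reference(b"x" * 100_000, "s3://bucket/b", "image")
print(type(small).__name__)  # bytes
print(type(large).__name__)  # ObjectRef
```

The point of an engine-level type is exactly that this decision is invisible to the API surface: SQL/Table, DataStream, and Python all see the same logical type regardless of whether the value is inline or referenced.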

2. Built-in multimodal operators and AI functions

Likewise, multimodal operators and AI functions will be supported for both
SQL and Python. I'll clarify this in "FLIP-XXX: Built-in Multimodal
Operators and AI Functions", or alternatively consider splitting the
SQL/Table side into its own sub-FLIP so it gets the design depth it
deserves.
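One reason built-in operators matter more than UDFs here can be shown with a toy sketch (illustrative only, not a proposed API): a built-in inference-style operator can transparently micro-batch rows before each model call, which a per-row UDF cannot easily do.

```python
# Illustrative only — not the proposed API. A built-in inference operator
# can buffer rows and invoke the model once per batch, amortizing the
# per-call overhead that dominates GPU inference.
def predict_batched(rows, model_fn, batch_size=3):
    """Buffer rows and invoke the model once per batch, preserving order."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from model_fn(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield from model_fn(batch)

calls = []
def fake_model(batch):
    calls.append(len(batch))        # record batch sizes, not one-by-one calls
    return [x * 10 for x in batch]  # stand-in for real inference

out = list(predict_batched(range(7), fake_model))
print(out)    # [0, 10, 20, 30, 40, 50, 60]
print(calls)  # [3, 3, 1]
```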

3. DataStream

On whether to add built-in multimodal operators or expose Arrow directly on
the DataStream side, I'm open — this needs to be evaluated case by case.
One consideration is that DataStream can in many cases be converted to
SQL/Table for execution, which would let it reuse the multimodal
capabilities provided there without duplicating them on the DataStream
side. We'll need to look concretely at which use cases are best served by
that conversion path and which call for native DataStream support.

That said, I want to be explicit on one point: RpcOperator, as an
engine-level first-class operator abstraction, will be available to Java
DataStream users. The use cases you mentioned — real-time enrichment, CDC +
model scoring — can absolutely use RpcOperator on DataStream to integrate
GPU inference services. I'll make this explicit in the RpcOperator sub-FLIP.
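To make the enrichment pattern concrete, here is a hypothetical sketch of what an RpcOperator-style stage does conceptually — bounded-concurrency async calls to an external inference service while preserving input order. The service is simulated locally; nothing here is the actual RpcOperator API.

```python
import asyncio

# Hypothetical sketch of an RpcOperator-style stage on a stream. The
# inference service is simulated; a real stage would call a remote
# GPU-backed endpoint.
async def fake_inference(record):
    await asyncio.sleep(0.01)          # stand-in for network + GPU latency
    return {"input": record, "score": record * 0.5}

async def rpc_stage(records, max_in_flight=4):
    sem = asyncio.Semaphore(max_in_flight)  # cap concurrent in-flight RPCs
    async def call(r):
        async with sem:
            return await fake_inference(r)
    # gather preserves input order even though calls overlap in time
    return await asyncio.gather(*(call(r) for r in records))

results = asyncio.run(rpc_stage([1, 2, 3, 4, 5]))
print([r["score"] for r in results])  # [0.5, 1.0, 1.5, 2.0, 2.5]
```

This is also roughly where it differs from plain async I/O: the operator owns concerns like batching, backpressure toward the service, and retry policy, rather than leaving them to each user function.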

4. SQL-level vector capabilities

On vector similarity, Flink already introduced the VECTOR_SEARCH interface
in 2.2 [1]. We'll build on that interface and extend it rather than
starting something separate.
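For readers unfamiliar with the interface, the semantics a VECTOR_SEARCH-style lookup provides can be illustrated by a brute-force scan — this is only the semantics, not Flink's implementation or syntax:

```python
import math

# Semantics illustration only — not Flink's implementation. A vector
# search returns the k stored vectors most similar to a query vector,
# here by cosine similarity over a brute-force scan.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(table, query, k=2):
    """table: list of (id, vector) pairs. Returns top-k ids by similarity."""
    scored = sorted(table, key=lambda row: cosine(row[1], query), reverse=True)
    return [row_id for row_id, _ in scored[:k]]

docs = [("a", (1.0, 0.0)), ("b", (0.0, 1.0)), ("c", (1.0, 1.0))]
print(vector_search(docs, (1.0, 0.1)))  # ['a', 'c']
```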

Thanks again — this feedback meaningfully improves the coverage of the
umbrella.


[1]
https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/sql/queries/vector-search/


Best,
Guowei


On Tue, Apr 28, 2026 at 7:43 PM Gustavo de Morais <[email protected]>
wrote:

> Hello Guowei,
>
> Thanks for this proposal. The runtime direction looks like a good
> foundation, and I think many of these pieces will also help non-AI
> workloads.
>
> One part I would like to see in more detail is the API side, especially for
> Java DataStream and SQL/Table users. The FLIP is Python-DataFrame focused,
> which makes sense for the AI ecosystem, but some questions are still open
> to align this with the overall project:
>
> - SQL/Table: What is the plan here? How will the new multimodal types
> (Tensor, Image, Embedding) work in the type system, codegen, and
> plan/savepoint compatibility? Is there a plan for SQL-level model inference
> beyond the current ML_PREDICT shape, for example, vector similarity or
> multimodal predicates? Today this is still very vendor-specific across the
> industry, so it would be nice to know whether Flink wants to take a clear
> position here, and how this FLIP will fit with the SQL/Table vision.
>
> - DataStream (v1 and v2): Will RpcOperator and the Arrow-batch primitives
> be exposed as first-class building blocks for Java users, or only as
> internal pieces behind the Python DataFrame? Many streaming inference use
> cases (real-time enrichment, CDC + model scoring) fit very well with
> DataStream and would benefit from clear guidance.
>
> This is not a blocker for the overall direction. I just think the API
> roadmap deserves the same level of detail as the runtime one, so that the
> current user base has a clear picture of what "AI-native" means for them.
>
> Kind regards,
> Gustavo
>
> On Tue, 28 Apr 2026 at 10:33, zl z <[email protected]> wrote:
>
> > Hey Guowei,
> >
> > Thanks for the proposal, and I think this is very valuable. I have some
> > questions about it:
> >
> > 1. What are our expected throughput and latency targets? Do we have any
> > forward-looking tests for this?
> >
> > 2. AI involves a very large number of operators. Besides allowing users
> > to use them through UDFs, will we also provide commonly used built-in
> > operators?
> >
> > 3. Each of the 11 sub-FLIPs is a major project involving significant
> > changes. What is our plan for this?
> >
> > 4. GPU scheduling is extremely complex. What is our current roadmap for
> > this?
> >
> > This is a very high-quality and exciting proposal. Making Flink an
> > AI-native data processing engine will make it far more valuable in the AI
> > era. Looking forward to seeing it land and come to fruition soon.
> >
> > On Tue, Apr 28, 2026 at 14:38, Robert Metzger <[email protected]> wrote:
> >
> > > Hey Guowei,
> > >
> > > Thanks for the proposal. I just took a brief look; here are some
> > > high-level questions:
> > >
> > > Regarding the RPC Operator: What is the difference from the async I/O
> > > operator we already have?
> > >
> > > "Connector API for Multimodal Data Source/Sink": Why do we need to
> touch
> > > the connector API for supporting multimodal data? Isn't this more of a
> > > formats concern?
> > >
> > > "Non-Disruptive Scaling for CPU Operators": How do you want to
> guarantee
> > > exactly-once on that kind of scaling? E.g. you need to somehow make a
> > > handover between the old and new new pipeline
> > >
> > > Overall, I find the proposal has some things which seem related to
> > > making Flink more AI-native, but other changes seem orthogonal to that.
> > > For example, the checkpoint or scaling changes are actually unrelated
> > > to AI, and are just engine improvements.
> > >
> > >
> > > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to start a discussion on an umbrella FLIP[1] that lays out a
> > > > direction for evolving Flink into a data engine that natively supports
> > > > AI workloads.
> > > >
> > > > The short version: user workloads are shifting from BI analytics to
> > > > multimodal data processing centered on model inference, and this
> > > > triggers cascading changes across the stack — multimodal data flowing
> > > > through pipelines, heterogeneous CPU/GPU resources, vectorized
> > > > execution, and inference tasks that run for seconds to minutes on
> > > > Spot instances. The proposal sketches an evolution along five
> > > > directions (development paradigm, data model, heterogeneous
> > > > resources, execution engine, fault tolerance), decomposed into 11
> > > > sub-FLIPs organized into three layers: core runtime primitives, AI
> > > > workload expression and execution, and production-grade operational
> > > > guarantees. Most sub-FLIPs have no hard dependencies on each other
> > > > and can be advanced in parallel.
> > > >
> > > > A note on scope, since it's an umbrella:
> > > >
> > > > - In scope here: whether the evolution directions are reasonable,
> > > > whether each sub-FLIP's motivation and proposed approach are
> > > > well-founded, and whether the boundaries and dependencies between
> > > > sub-FLIPs are clear.
> > > > - Out of scope here: detailed designs, API specifics, and
> > > > implementation plans of individual sub-FLIPs — those will go through
> > > > their own FLIPs.
> > > > - Consensus criteria: agreement on the overall direction is
> > > > sufficient for the umbrella to pass; passing it does not lock in any
> > > > sub-FLIP's design — sub-FLIPs may still be adjusted, deferred, or
> > > > withdrawn as they progress.
> > > >
> > > > All proposed changes are incremental — no existing API or behavior is
> > > > removed or altered. Compatibility details are covered at the end of
> > > > the document.
> > > >
> > > > Looking forward to your feedback on the overall direction and the
> > > > layering.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > >
> > > > Thanks,
> > > > Guowei
> > > >
> > >
> >
>
