>
> Hi Guowei,
>
> Thanks for putting together this comprehensive umbrella FLIP. As one of
> the Connector/Flink CDC project maintainers, I'm very excited about this
> direction and would like to share my perspective.
>
> Over the past two years, I've been closely following and studying AI
> technologies that are tightly coupled with data, such as multimodal data
> lakes (Paimon/Iceberg/Lance/Vertox), Ray Data, Daft, and similar frameworks,
> while also thinking about where the Flink community should be heading. I
> have to be honest: over the last six months, I've come to a clear
> realization that data processing engines and data pipelines centered purely
> on structured data and CPUs are gradually losing their competitive edge —
> or at the very least, losing mindshare. The industry's attention has
> decisively shifted toward multimodal, GPU-accelerated, AI-native data
> processing. This umbrella FLIP addresses exactly this gap, and I believe it
> is both timely and necessary for Flink's long-term relevance.
>
> *+1 on the overall direction.* The five evolution directions are
> well-motivated and logically layered. The fact that most sub-FLIPs can
> proceed in parallel is a good engineering choice — it allows the community
> to make progress on multiple fronts without blocking each other.
>
> I'd also like to address a comment raised by @Robert in the thread:
>
> >>> "Connector API for Multimodal Data Source/Sink": Why do we need to
> >>> touch the connector API for supporting multimodal data? Isn't this more of
> >>> a formats concern?
>
> This is a fair question, and I think the distinction is worth clarifying.
> *Formats and connectors solve different problems, and multimodal data
> requires changes at both layers, but especially at the connector layer.*
> Here's why:
>
> 1. *Formats deal with encoding/decoding within a known data structure* —
> e.g., how to serialize a Row into JSON or Avro. But multimodal sources
> (object storage with raw image files, video streams via HLS, etc.) often
> don't have a "format" in the traditional Flink sense. The raw data itself
> (a .jpg file in S3, a video segment) needs to be ingested and standardized
> into Flink's internal type system. This is fundamentally a *Source API
> concern*, not a format concern.
> 2. *The Source side needs a unified ingestion abstraction* for
> heterogeneous multimodal origins. Today, if you want to read images from
> OSS/S3 and feed them into a Flink pipeline as typed Image objects
> (leveraging the new multimodal type system from the companion sub-FLIP),
> there is no standard connector API to do so. Users would have to write
> custom sources from scratch, with no framework-level support for
> OBJECT_REF, large object handling, or type-safe multimodal semantics.
> 3. *The Sink side does intersect with formats*, but goes beyond them.
> It's not just about encoding; it's about the *type mapping contract*
> between Flink's new native multimodal types (Tensor, Image, Embedding) and
> the evolving multimodal types in lake formats (BLOB, VECTOR in
> Paimon/Iceberg/Lance).
> This requires connector-level coordination to ensure that raw data and
> metadata land correctly in lake tables, which is beyond what a format alone
> can handle.
>
> In short: formats are about "how to encode/decode data," while this
> sub-FLIP is about "how to ingest unstructured multimodal data into the
> pipeline and land it correctly in external systems." These are
> connector-layer responsibilities.
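
To make the "type mapping contract" point concrete, here is a minimal sketch of what a connector-level mapping check might look like. Everything in it is hypothetical: the class, the enums, and the chosen pairs (Image to BLOB, Embedding to VECTOR) are illustrative assumptions, not Flink, Paimon, Iceberg, or Lance APIs, and the actual contract would be defined in the dedicated sub-FLIP.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: none of these names exist in Flink or in any lake format.
public class MultimodalTypeMapping {

    /** Flink-side multimodal types mentioned in the thread. */
    enum FlinkType { TENSOR, IMAGE, EMBEDDING }

    /** Lake-side column types mentioned in the thread. */
    enum LakeType { BLOB, VECTOR }

    /** Assumed mapping for illustration; the real pairs are an open design question. */
    private static final Map<FlinkType, LakeType> MAPPING = new EnumMap<>(FlinkType.class);

    static {
        MAPPING.put(FlinkType.IMAGE, LakeType.BLOB);
        MAPPING.put(FlinkType.EMBEDDING, LakeType.VECTOR);
        // TENSOR is deliberately left unmapped to show the unsupported case.
    }

    /** Returns the lake type for a Flink type, or empty if the sink cannot store it. */
    static Optional<LakeType> lakeTypeFor(FlinkType type) {
        return Optional.ofNullable(MAPPING.get(type));
    }

    /** Fails fast, as a sink might do when validating a table schema before writing. */
    static LakeType requireLakeType(FlinkType type) {
        return lakeTypeFor(type).orElseThrow(
                () -> new IllegalArgumentException("No lake type mapping for " + type));
    }

    public static void main(String[] args) {
        for (FlinkType type : FlinkType.values()) {
            String target = lakeTypeFor(type).map(Enum::name).orElse("UNSUPPORTED");
            System.out.println(type + " -> " + target);
        }
    }
}
```

The point of the sketch is that the check depends on what the target table can store, which a format (a pure encoder/decoder) has no way of knowing; that is why it would sit in the connector.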
>
> On this particular sub-FLIP, I've actually been doing some research and
> local POC work on multimodal ingestion and sinking with OSS/S3/HLS on the
> source side and Iceberg/Paimon on the sink side. Based on what I've
> learned, I'd be happy to contribute a more detailed design proposal to help
> drive the discussion forward. That said, I agree with the umbrella's
> scoping principle — the detailed design should be discussed in the
> dedicated sub-FLIP rather than here. Happy to follow up there once it's
> opened.
>
> Thanks again, Guowei, for driving this forward.
>
> Lastly, as a long-time Flink enthusiast, I sincerely hope Flink can
> remain center stage in the AI era — and this FLIP is a solid step in that
> direction.
>
> Best,
> Leonard
>