Hi Yi, and Flink Community,

Thanks for bringing up this excellent proposal. I am fully in favor of
FLIP-582.

In our production workloads, especially large-scale AI inference, the tight
coupling between the data plane and the compute units has long been a major
pain point. If an inference subtask fails due to a GPU OOM or a driver
issue, triggering a global rollback is incredibly expensive, since model
reloading takes minutes.

The introduction of the RpcOperator Service as a first-class primitive
masterfully decouples the heavy inference tasks from the mainstream
topology. The fault isolation, independent scaling, and stateless design
perfectly match the requirements of modern AI-oriented data processing.

This is a clean and robust architecture. Looking forward to seeing this
merged!


Best wishes,
Charles Zhang
from Apache InLong


On Wed, May 27, 2026 at 14:12, Yi Zhang <[email protected]> wrote:

> Hi everyone,
>
> I would like to start a discussion on FLIP-582: Support RpcOperator
> Service [1].
>
> AI-oriented workloads such as multimodal data processing and model
> inference have grown rapidly in recent years. These workloads are
> characterized by expensive resources (GPUs) and high initialization costs
> (seconds to minutes for model loading). In today's Flink, embedding them
> in the data plane couples their parallelism and failover with those of the
> surrounding operators, while deploying them as external services
> disconnects their lifecycle from the job and doubles the operational
> overhead.
>
> This FLIP introduces RpcOperator Service, a framework-level primitive that
> runs user-defined compute as RPC services in an independent Pipelined
> Region within the Flink job. Because the service is isolated at the
> scheduling level, it can achieve fault isolation, independent scaling, and
> dedicated resource allocation. As a native Flink primitive, it also lays
> the foundation for automatic flow control, flexible load balancing, and
> coordinated auto-scaling, all without introducing external infrastructure
> or additional operational burden.
>
> Looking forward to your feedback and suggestions!
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-582%3A+Support+RpcOperator+Service
>
> Best Regards,
> Yi Zhang
