Hi Yi and Flink Community,

Thanks for bringing up this excellent proposal. I am fully in favor of FLIP-582.
In our production workloads, especially large-scale AI inference, the tight coupling between the data plane and the compute units has always been a major pain point. If an inference subtask fails because of a GPU OOM or a driver issue, the resulting global rollback is incredibly expensive, since reloading the model takes minutes. Introducing the RpcOperator Service as a first-class primitive elegantly decouples the heavy inference tasks from the main job topology. Its fault isolation, independent scaling, and stateless design perfectly match the requirements of modern AI-oriented data processing. This is a clean and robust architecture.

Looking forward to seeing this merged!

Best wishes,
Charles Zhang from Apache InLong

On Wed, May 27, 2026 at 14:12, Yi Zhang wrote:
> Hi everyone,
>
> I would like to start a discussion on FLIP-582: Support RpcOperator Service [1].
>
> AI-oriented workloads like multimodal data processing and model inference have
> grown rapidly in recent years. These workloads are characterized by expensive
> resources (GPUs) and high initialization costs (seconds to minutes for model
> loading). In today's Flink, embedding them in the data plane couples their
> parallelism and failover with surrounding operators; deploying them as external
> services disconnects their lifecycle from the job and doubles operational
> overhead.
>
> This FLIP introduces RpcOperator Service, a framework-level primitive that runs
> user-defined compute as RPC services in an independent Pipelined Region within
> the Flink job. Because the service is isolated at the scheduling level, it can
> achieve fault isolation, independent scaling, and dedicated resource allocation.
> As a native Flink primitive, it also lays the foundation for automatic flow
> control, flexible load balancing, and coordinated auto-scaling, all without
> introducing external infrastructure or additional operational burden.
>
> Looking forward to your feedback and suggestions!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-582%3A+Support+RpcOperator+Service
>
> Best Regards,
> Yi Zhang
