Re: [DISCUSS] FLIP-582: Support RpcOperator Service

Guowei Ma Wed, 27 May 2026 23:55:46 -0700

Hi Yi,

Thanks for putting together this FLIP — it's a well-thought-out proposal,
and the RpcOperator Service is a really valuable addition for AI/inference
workloads on Flink. I have a few questions and suggestions:

1. Autoscaling
How is scale-up/scale-down expressed in this design? Could you consider
allowing concurrency to be configured as a (min, max) range? Currently it
appears only min = max (i.e. fixed parallelism) is supported.

2. Async invocation & end-to-end examples
The example shows a synchronous invocation style. I'd suggest also
providing an asynchronous invocation example — or even just a scaffold that
exposes only the RpcOperator's handle and method. In addition, it would be
best to include a complete end-to-end example of invoking a model (CPU
inference would be fine).

3. GPU as a resource
GPU is surely a common, first-class resource. Wouldn't exposing it here
through an extension mechanism mislead users? It would seem more natural
to declare GPU as a standard resource rather than going through an
extension.

4. Serial method invocation
Each method of an RpcOperator is invoked serially. I'd suggest stating this
explicitly in both the documentation and the code comments, so that users
(or an AI) are clearly aware of it.

5. Why synchronous methods?
Why is every method in RpcOperator a synchronous call? Does this imply that
no model inference supports concurrent invocation?

6. SQL UDF usage
How would SQL UDF users invoke this capability? I'd suggest providing a
complete example for this as well.
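
To make points 2, 4 and 5 concrete, here is a rough sketch in plain Java of
the semantics I am asking about. It does not use any FLIP-582 or Flink API:
SerialServiceHandle, invokeAsync and the toy infer function are hypothetical
names made up for illustration. The assumption is that a call can return a
CompletableFuture immediately (asynchronous for the caller) while a
single-threaded executor still runs the method bodies one at a time (serial
on the service side).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch only; not part of FLIP-582 or any Flink API. */
final class SerialServiceHandle<I, O> implements AutoCloseable {
    // A single worker thread means method bodies never overlap (serial invocation).
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<I, O> method;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    SerialServiceHandle(Function<I, O> method) {
        this.method = method;
    }

    /** Returns immediately; the call is queued and executed serially. */
    CompletableFuture<O> invokeAsync(I input) {
        return CompletableFuture.supplyAsync(() -> {
            // Track how many bodies run at once, to check the serial guarantee.
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            try {
                return method.apply(input);
            } finally {
                inFlight.decrementAndGet();
            }
        }, worker);
    }

    /** Highest number of method bodies observed running at the same time. */
    int maxConcurrentCalls() {
        return maxObserved.get();
    }

    @Override
    public void close() {
        worker.shutdown();
    }

    /** Toy stand-in for CPU inference: the "score" is just the text length. */
    static int infer(String text) {
        return text.length();
    }

    /** Submits three calls without blocking, then collects the results. */
    static List<Integer> demo() {
        try (SerialServiceHandle<String, Integer> handle =
                new SerialServiceHandle<>(SerialServiceHandle::infer)) {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (String s : List.of("a", "bb", "ccc")) {
                pending.add(handle.invokeAsync(s));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> f : pending) {
                results.add(f.join());
            }
            return results;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3]
    }
}
```

In this sketch the caller never blocks on submission, yet
maxConcurrentCalls() stays at 1. Whether the FLIP intends something like
this, or allows a method to opt in to concurrent execution, is what I would
like the documentation to spell out.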

Thanks again for the great work!

Best,
Guowei

On Wed, May 27, 2026 at 8:36 PM Charles Zhang <[email protected]>
wrote:

> Hi Yi, and Flink Community,
>
> Thanks for bringing up this excellent proposal. I am fully in favor of
> FLIP-582.
>
> In our production workloads, especially regarding large-scale AI inference,
> the tight coupling of the data plane and compute units has always been a
> major pain point. If an inference subtask fails due to a GPU OOM or driver
> issue, triggering a global rollback is incredibly expensive since model
> reloading takes minutes.
>
> The introduction of the RpcOperator Service as a first-class primitive
> masterfully decouples the heavy inference tasks from the main streaming
> topology. The fault isolation, independent scaling, and stateless design
> perfectly match the requirements of modern AI-oriented data processing.
>
> This is a clean and robust architecture. Looking forward to seeing this
> merged!
>
>
> Best wishes,
> Charles Zhang
> from Apache InLong
>
>
> On Wed, May 27, 2026 at 14:12, Yi Zhang <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion on FLIP-582: Support RpcOperator
> > Service [1].
> >
> > AI-oriented workloads like multimodal data processing and model inference
> > have been growing rapidly in recent years. These workloads are
> > characterized by expensive resources (GPUs) and high initialization costs
> > (seconds to minutes for model loading). In today's Flink, embedding them
> > in the data plane couples their parallelism and failover with surrounding
> > operators; deploying them as external services disconnects their
> > lifecycle from the job and doubles operational overhead.
> >
> > This FLIP introduces RpcOperator Service, a framework-level primitive
> > that runs user-defined compute as RPC services in an independent
> > Pipelined Region within the Flink job. Because the service is isolated at
> > the scheduling level, it can achieve fault isolation, independent
> > scaling, and dedicated resource allocation. As a native Flink primitive,
> > it also lays the foundation for automatic flow control, flexible load
> > balancing, and coordinated auto-scaling, all without introducing external
> > infrastructure or additional operational burden.
> >
> > Looking forward to your feedback and suggestions!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-582%3A+Support+RpcOperator+Service
> >
> > Best Regards,
> > Yi Zhang
>
