Hi,
I’m trying to understand: which of the features under discussion cannot be
implemented in the flink-agents subproject?

Shekhar 

On Saturday, April 18, 2026, 21:39, FeatZhang <[email protected]> wrote:

Hi Flink devs,

I would like to start a discussion on a missing piece in Flink’s current
AI/ML inference capabilities and propose a FLIP for a *streaming-native AI
inference runtime layer*.

Motivation

Apache Flink currently provides basic AI inference capabilities through
SQL-level constructs such as ML_PREDICT and related functions. These are
useful for integrating external models into batch and streaming pipelines.

However, in production AI workloads (especially real-time inference and LLM
serving), we observe several gaps:

  - No unified runtime abstraction for inference execution
  - No streaming-native batching or latency-aware scheduling
  - Limited support for backpressure-aware inference control
  - No built-in retry, fallback, or circuit breaker mechanisms
  - Fragmented integration with external inference systems (e.g., HTTP
  services, Triton, LLM endpoints)

As a result, users often re-implement these capabilities in user-defined
functions, leading to inconsistent behavior and duplicated complexity.
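
To illustrate the kind of logic that tends to get hand-rolled inside UDFs,
here is a minimal sketch of a size-or-age micro-batcher in plain Java. The
class name MicroBatcher and its methods are hypothetical and not part of any
Flink API; the caller passes timestamps in explicitly so the behavior stays
deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: buffers records and emits a batch on size or age. */
final class MicroBatcher<T> {
    private final int maxBatchSize;
    private final long maxWaitMillis;
    private final List<T> buffer = new ArrayList<>();
    private long firstArrivalMillis = -1L;

    MicroBatcher(int maxBatchSize, long maxWaitMillis) {
        if (maxBatchSize <= 0 || maxWaitMillis < 0) {
            throw new IllegalArgumentException("invalid batching limits");
        }
        this.maxBatchSize = maxBatchSize;
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Adds a record; returns a full batch once the size limit is hit, else an empty list. */
    List<T> add(T record, long nowMillis) {
        if (buffer.isEmpty()) {
            firstArrivalMillis = nowMillis; // remember when the oldest record arrived
        }
        buffer.add(record);
        return buffer.size() >= maxBatchSize ? drain() : Collections.<T>emptyList();
    }

    /** Intended to be called from a timer; flushes if the oldest record has waited too long. */
    List<T> onTimer(long nowMillis) {
        boolean expired = !buffer.isEmpty() && nowMillis - firstArrivalMillis >= maxWaitMillis;
        return expired ? drain() : Collections.<T>emptyList();
    }

    private List<T> drain() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        firstArrivalMillis = -1L;
        return batch;
    }
}
```

In a real job the buffer would also have to be checkpointed and the timer
wired to the operator, which is exactly the sort of detail that each team
currently solves on its own.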
------------------------------
Proposal (High-level)

This FLIP proposes introducing a *Streaming-native AI Inference Runtime
Layer* in Flink, providing:

  - A unified inference operator abstraction
  - Adaptive batching and concurrency control
  - Backpressure-aware request scheduling
  - Pluggable inference backends (HTTP / Triton / custom services)
  - Built-in reliability mechanisms (retry, timeout, circuit breaker)
  - Standard metrics and observability hooks
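
As a rough illustration of the reliability mechanisms listed above, the
following plain-Java sketch combines bounded retries with a count-based
circuit breaker and a fallback value. SimpleCircuitBreaker and ReliableCaller
are hypothetical names, not existing Flink classes; timeouts and retry
backoff are left out to keep the example short.

```java
import java.util.function.Supplier;

/** Hypothetical count-based circuit breaker; not an existing Flink class. */
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1L; // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Calls are allowed while closed, or as a trial call once the cool-down has elapsed. */
    boolean allowRequest(long nowMillis) {
        return openedAtMillis < 0 || nowMillis - openedAtMillis >= openMillis;
    }

    void recordSuccess() {
        consecutiveFailures = 0;
        openedAtMillis = -1L;
    }

    void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = nowMillis; // open, or re-open after a failed trial call
        }
    }
}

final class ReliableCaller {
    /** Tries the call up to maxAttempts times; returns fallback if the breaker is open or all attempts fail. */
    static <R> R callWithRetry(Supplier<R> call, int maxAttempts,
                               SimpleCircuitBreaker breaker, long nowMillis, R fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!breaker.allowRequest(nowMillis)) {
                return fallback; // fail fast instead of hitting an unhealthy backend
            }
            try {
                R result = call.get();
                breaker.recordSuccess();
                return result;
            } catch (RuntimeException e) {
                breaker.recordFailure(nowMillis);
            }
        }
        return fallback;
    }
}
```

With a threshold of 2 and 5 allowed attempts, a permanently failing backend
is called only twice before the breaker opens and the fallback is returned.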

------------------------------
Design Overview

The high-level architecture would look like:

DataStream / Table API
        ↓
Inference Operator Layer
        ↓
Inference Execution Engine
        ↓
Pluggable Inference Backend

This layer would integrate with Flink’s existing streaming runtime and
remain fully compatible with current SQL/Table APIs.
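
To make the layering a bit more concrete, here is a minimal plain-Java sketch
of the lower two layers. Everything in it (InferenceBackend, InferenceEngine,
LengthBackend) is a hypothetical name chosen for illustration and is not a
proposed or existing Flink interface. The operator layer is intentionally not
modeled, since its shape depends on the integration approach discussed below.

```java
import java.util.ArrayList;
import java.util.List;

/** Bottom layer: hypothetical pluggable backend (HTTP service, Triton, custom, ...). */
interface InferenceBackend<I, O> {
    /** Must return exactly one output per input, in the same order. */
    List<O> predictBatch(List<I> inputs);
}

/** Middle layer: hypothetical execution engine; here it only splits requests into bounded batches. */
final class InferenceEngine<I, O> {
    private final InferenceBackend<I, O> backend;
    private final int maxBatchSize;

    InferenceEngine(InferenceBackend<I, O> backend, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.backend = backend;
        this.maxBatchSize = maxBatchSize;
    }

    List<O> run(List<I> inputs) {
        List<O> outputs = new ArrayList<>(inputs.size());
        for (int start = 0; start < inputs.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, inputs.size());
            List<O> part = backend.predictBatch(inputs.subList(start, end));
            if (part.size() != end - start) {
                throw new IllegalStateException("backend must return one output per input");
            }
            outputs.addAll(part);
        }
        return outputs;
    }
}

/** Stand-in backend for the example: "predicts" the length of each input string. */
final class LengthBackend implements InferenceBackend<String, Integer> {
    int calls = 0; // counts backend invocations so batching is observable

    @Override
    public List<Integer> predictBatch(List<String> inputs) {
        calls++;
        List<Integer> out = new ArrayList<>(inputs.size());
        for (String s : inputs) {
            out.add(s.length());
        }
        return out;
    }
}
```

Swapping LengthBackend for another implementation of the same interface
leaves the engine untouched, which is the property the pluggable backend
layer is meant to provide.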
------------------------------
Non-goals

  - This does NOT replace ML_PREDICT or existing SQL semantics
  - This does NOT introduce a new ML training framework
  - This is NOT tied to any specific inference engine

------------------------------
Why now

We see increasing adoption of Flink for real-time AI workloads, including:

  - streaming inference
  - LLM-based pipelines
  - hybrid AI + data processing workflows

However, the lack of a standardized runtime abstraction makes production
deployments complex and inconsistent.
------------------------------
Request for feedback

I would like feedback on:

  1. Whether a dedicated inference runtime layer fits within Flink’s
  architectural direction
  2. Preferred integration approach (Table API, DataStream, or both)
  3. Scope of built-in features vs user-defined extensibility
  4. Any existing efforts or ongoing work in this direction

If there is agreement on direction, I will follow up with a more detailed
FLIP design document.
------------------------------

Thanks,
featzhang


