Hi, I’m trying to understand: which of the features under discussion cannot be implemented in the flink-agents subproject?
Shekhar

On Saturday, April 18, 2026, 21:39, FeatZhang <[email protected]> wrote:

Hi Flink devs,

I would like to start a discussion on a missing piece in Flink’s current AI/ML inference capabilities and propose a FLIP for a *streaming-native AI inference runtime layer*.

Motivation

Apache Flink currently provides basic AI inference capabilities through SQL-level constructs such as ML_PREDICT and related functions. These are useful for integrating external models into batch and streaming pipelines.

However, in production AI workloads (especially real-time inference and LLM serving), we observe several gaps:

- No unified runtime abstraction for inference execution
- No streaming-native batching or latency-aware scheduling
- Limited support for backpressure-aware inference control
- No built-in retry, fallback, or circuit breaker mechanisms
- Fragmented integration with external inference systems (e.g., HTTP services, Triton, LLM endpoints)

As a result, users often re-implement these capabilities in user-defined functions, leading to inconsistent behavior and duplicated complexity.

------------------------------
Proposal (High-level)

This FLIP proposes introducing a *Streaming-native AI Inference Runtime Layer* in Flink, providing:

- A unified inference operator abstraction
- Adaptive batching and concurrency control
- Backpressure-aware request scheduling
- Pluggable inference backends (HTTP / Triton / custom services)
- Built-in reliability mechanisms (retry, timeout, circuit breaker)
- Standard metrics and observability hooks

------------------------------
Design Overview

The high-level architecture would look like:

    DataStream / Table API
              ↓
    Inference Operator Layer
              ↓
    Inference Execution Engine
              ↓
    Pluggable Inference Backend

This layer would integrate with Flink’s existing streaming runtime and remain fully compatible with current SQL/Table APIs.
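
To make the layering above more concrete, here is a minimal sketch in plain Java of how an execution engine might sit on top of a pluggable backend. Everything in it is an assumption for illustration: `InferenceBackend`, `InferenceEngine`, and all parameter names are hypothetical and are not part of any existing Flink API. It shows only fixed-size batching, bounded retry, a fallback, and a very simple circuit breaker that never closes again once it opens. Timeouts, latency-aware flushing, backpressure, and metrics are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical pluggable backend: maps a batch of inputs to a batch of outputs. */
interface InferenceBackend<I, O> {
    List<O> inferBatch(List<I> inputs) throws Exception;
}

/** Hypothetical engine: fixed-size batching, bounded retry, fallback, simple breaker. */
final class InferenceEngine<I, O> {
    private final InferenceBackend<I, O> backend;
    private final int maxBatchSize;      // flush once this many records are buffered
    private final int maxAttempts;       // backend calls per batch before giving up
    private final int failureThreshold;  // failed batches in a row that open the breaker
    private final Function<I, O> fallback;
    private final List<I> buffer = new ArrayList<>();
    private int consecutiveFailures = 0;

    InferenceEngine(InferenceBackend<I, O> backend, int maxBatchSize, int maxAttempts,
                    int failureThreshold, Function<I, O> fallback) {
        this.backend = backend;
        this.maxBatchSize = maxBatchSize;
        this.maxAttempts = maxAttempts;
        this.failureThreshold = failureThreshold;
        this.fallback = fallback;
    }

    /** Buffers one record; returns results only when the batch is full. */
    List<O> submit(I input) {
        buffer.add(input);
        return buffer.size() >= maxBatchSize ? flush() : List.of();
    }

    /** Sends the buffered records to the backend, or to the fallback on failure. */
    List<O> flush() {
        if (buffer.isEmpty()) {
            return List.of();
        }
        List<I> batch = new ArrayList<>(buffer);
        buffer.clear();
        if (!circuitOpen()) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    List<O> results = backend.inferBatch(batch);
                    consecutiveFailures = 0; // a success resets the failure count
                    return results;
                } catch (Exception e) {
                    // Swallow and retry until maxAttempts is reached.
                }
            }
            consecutiveFailures++; // every attempt for this batch failed
        }
        // Breaker open or retries exhausted: answer from the fallback instead.
        List<O> results = new ArrayList<>();
        for (I input : batch) {
            results.add(fallback.apply(input));
        }
        return results;
    }

    /** True once enough batches in a row have failed; the backend is then skipped. */
    boolean circuitOpen() {
        return consecutiveFailures >= failureThreshold;
    }
}
```

In this sketch an operator would call `submit` once per record and `flush` on a timer or at a checkpoint. A real design would presumably need a half-open state so the breaker can recover, plus asynchronous calls so the operator thread is not blocked.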

------------------------------
Non-goals

- This does NOT replace ML_PREDICT or existing SQL semantics
- This does NOT introduce a new ML training framework
- This is not tied to any specific inference engine

------------------------------
Why now

We see increasing adoption of Flink for real-time AI workloads, including:

- streaming inference
- LLM-based pipelines
- hybrid AI + data processing workflows

However, the lack of a standardized runtime abstraction makes production deployments complex and inconsistent.

------------------------------
Request for feedback

I would like feedback on:

1. Whether a dedicated inference runtime layer fits within Flink’s architectural direction
2. Preferred integration approach (Table API, DataStream, or both)
3. Scope of built-in features vs user-defined extensibility
4. Any existing efforts or ongoing work in this direction

If there is agreement on direction, I will follow up with a more detailed FLIP design document.

------------------------------
Thanks,
featzhang
