GitHub user weiqingy created a discussion: [Discussion] Supervisor + Sub-Agent Orchestration: Concrete Multi-Agent Use Cases for Event-Driven Agents
This thread proposes a concrete use case and primitive set for multi-agent orchestration — directly responding to @xintongsong's question in [#516](https://github.com/apache/flink-agents/discussions/516): > *"How important is it to support multi-agent system for event-driven agents? > What are the concrete use cases? And how do we build a multi-agent system on > Flink's streaming engine?"* It builds on the gap analysis @thefalc opened in [#84](https://github.com/apache/flink-agents/discussions/84) (which identified "agent self-reflection / output evaluation" and "orchestrator-worker / hierarchical patterns" as core multi-agent gaps but didn't propose a specific primitive set) and the foundations being laid by [#429](https://github.com/apache/flink-agents/discussions/429) (async execution) and [#598](https://github.com/apache/flink-agents/discussions/598) (durable execution reconcile). ### Motivation: capability orchestration, not LLM chaining Internal production discussions at my company landed on a framing worth surfacing: > **"Agentic" should not mean "calling LLMs in a loop." It should mean > autonomously orchestrating heterogeneous capabilities — LLMs, internal > services, REST APIs, ML models, *other agents* — based on reasoning. Any > text-in / text-out service should be a peer to an LLM in the orchestration > graph.** Today flink-agents leans LLM-centric in naming, examples, and resource model (`CHAT_MODEL` is first-class; everything else is "a tool"). This framing limits perceived applicability — users see "Flink + LLM calls" rather than "Flink for autonomous capability orchestration." That's a positioning weakness; capability-agnostic framing is what enterprise teams want. ### Concrete use case: supervisor + sub-agent with iterative refinement A pattern repeatedly requested by enterprise teams — and well-established in the Python ecosystem (LangGraph `create_supervisor`, CrewAI hierarchical mode, AutoGen GroupChat). **Example: real-time customer support ticket triage on a Kafka stream** ``` Kafka ticket stream │ ▼ ┌────────────────┐ │ Supervisor │ reads ticket, decides which sub-agent to delegate │ Agent │ └────────┬───────┘ │ ┌─────┴──────┬──────────────┬───────────────┐ ▼ ▼ ▼ ▼ Classifier Knowledge Ticket-API External Agent Sub-Agent Search (REST, (LangChain (LLM) Sub-Agent no LLM) wrapped as (LLM+RAG) REST endpoint) │ │ │ │ └────────────┴──────────────┴───────────────┘ │ ▼ Supervisor evaluates: "Is this answer complete & accurate?" │ ┌──────┴───────┐ │ NO │ YES ▼ ▼ Refine prompt Emit response with feedback to output topic → loop back (bounded by max rounds / quality threshold / cost budget) ``` Key properties: - **Heterogeneous sub-agents** — LLM, RAG, REST service, external agent — look the same to the supervisor. - **Iterative refinement** — supervisor judges sub-agent output and re-prompts with feedback context until satisfied. - **Streaming-native** — runs continuously over a Kafka stream, not per-request. - **Durable** — multi-round loop survives task manager crashes via Flink keyed state + checkpoints. ### Why this pattern fits flink-agents (vs LangGraph / CrewAI) LangGraph already does supervisor + refinement very well — *per request, in one Python process, with DB-backed checkpointing*. flink-agents' opportunity is the **same pattern over continuous streaming inputs with distributed exactly-once durability**. A multi-round refinement loop processing millions of events per day, surviving node failures, is something neither LangGraph nor CrewAI can offer. That answers @xintongsong's question on "how is it different on a streaming engine." ### Primitives to discuss I checked all open PRs/issues — nothing addresses these. ResourceType has `CHAT_MODEL`, `TOOL`, `MCP_SERVER`, `SKILLS` but no abstraction for "another agent" or "remote text service as peer to LLM." ReActAgent is single-agent only. No correlation/reply primitives for cross-job calls. The list below is a starting proposal — **the goal of this thread is to decide which of these are must-have to unblock the use case, which can wait, and which are out of scope.** **Tier 1** 1. **Unified callable resource type** — one abstraction subsuming Tool + REST service + sub-agent (sync HTTP, async Kafka, MCP). Supervisor shouldn't care about wire protocol. Directly addresses the capability-agnostic framing. 2. **Async cross-job RPC pattern** — correlation IDs, reply topics, timeouts, retries, in-flight state in Flink keyed state. This is the **streaming-durability differentiator** — Flink already gives us durable state + exactly-once; we should expose it as a clean "delegate to another flink-agent and await reply" primitive instead of forcing every team to reinvent the plumbing. **Tier 2** (likely starts as recipes, promoted to primitives after validation) 3. **Judge / critic step** — document the pattern first with example code; formalize only after multiple users converge on the shape. Avoids over-abstraction. 4. **Richer loop termination** — quality threshold, budget (tokens, wall-clock, rounds), not just `AGENT_MAX_ITERATIONS`. **Tier 0** (free wins, can do now) 5. **Reframe docs/examples** — lead with "agents orchestrate capabilities," not "agents call LLMs." Add a multi-agent reference example using *existing* primitives (Tool + ReActAgent + Kafka request/reply) to demonstrate the pattern works today, even if rough. ### Open questions for the community 1. Is this use case compelling enough to move multi-agent from **"Not likely"** to **"If possible"** in the [0.3 roadmap (#516)](https://github.com/apache/flink-agents/discussions/516) (or 0.4)? The supervisor pattern has broad industry validation; the streaming-durable variant is uniquely ours. 2. Should the supervisor pattern be built first as a **recipe/example** using existing primitives, validated, and only then formalized into framework primitives? I'd argue yes — avoids LangChain's "too many abstractions too fast" mistake. 3. Should we add new resource types (`SubAgent`, `RemoteCapability`), or extend existing `TOOL` / `MCP_SERVER`? Extending is less disruptive; new types are cleaner. 4. Cross-job RPC over Kafka — correlation ID + reply topic is the obvious shape, but timer-based timeouts, retries, and dead-lettering deserve a dedicated design proposal. Linking @xintongsong @thefalc @yanand0909 since you've engaged on adjacent topics in [#516](https://github.com/apache/flink-agents/discussions/516) / [#84](https://github.com/apache/flink-agents/discussions/84). GitHub link: https://github.com/apache/flink-agents/discussions/660 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
