GitHub user Xiao-zhen-Liu added a comment to the discussion: Task ideas for the dkNet-AI · Apache Texera Agent Hackathon
# Streaming Texera: Real-Time Workflows on Amber

## Problem

Texera's Amber engine is micro-batch and bounded-input today: every workflow assumes its sources finish. Live data (sockets, WebSockets, sensor feeds, chat streams, agent event logs) cannot be expressed as a Texera workflow without polling hacks.

## Idea

Add a first-class **streaming mode** to Texera workflows. Sources can produce data forever; downstream operators react to it in real time; the workflow runs indefinitely until the user stops it.

## What users get

- **Live source operators**: connect a workflow to a TCP socket or WebSocket feed and watch results update as events arrive.
- **Event-time windows**: tumbling, sliding, and session windows for aggregating streams (counts, sums, sessionization).
- **Watermarks**: principled handling of out-of-order events, so windows fire correctly under network jitter.
- **Continuous workflows**: a new "running indefinitely" workflow state with explicit Stop & Flush vs. Kill controls.
- **Python streaming UDFs**: users yield tuples (and watermarks) forever from Python, same SDK ergonomics as today's UDFs.

## Why it matters

- Unlocks live dashboards, alerting pipelines, and real-time AI-agent observability, directly aligned with the Agent Hackathon theme.
- Keeps Texera's visual-workflow model intact; no new UI paradigm to learn.
- Bounded workflows are unchanged; streaming is opt-in.

## Demo

A WebSocket feed of agent tool-call events → tumbling 1-minute window → live "tool usage by agent" chart in the Texera UI, updating as events arrive.

GitHub link: https://github.com/apache/texera/discussions/5059#discussioncomment-16924319
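To make the event-time window and watermark ideas in the proposal a bit more concrete, here is a minimal plain-Python sketch of the demo's aggregation step: a tumbling 1-minute count of tool calls per agent, fired by watermarks. None of this is Texera or Amber API. `Event`, `Watermark`, and `tumbling_counts` are hypothetical names made up for illustration, and the watermark convention (a watermark at time `t` promises no further events with timestamp below `t`) is an assumption. Flushing open windows when the input ends is only one possible reading of "Stop & Flush".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Event:
    """Hypothetical agent tool-call event; timestamp is event time in seconds."""
    timestamp: int
    agent: str
    tool: str


@dataclass(frozen=True)
class Watermark:
    """Assumed convention: no later event will have timestamp < this value."""
    timestamp: int


def tumbling_counts(
    stream: Iterable[Union[Event, Watermark]], window: int = 60
) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (window_start, agent, tool, count) for each closed window.

    A window [start, start + window) fires once the watermark reaches its end.
    Events whose window has already fired are treated as late and dropped.
    """
    open_windows: Dict[int, Dict[Tuple[str, str], int]] = defaultdict(
        lambda: defaultdict(int)
    )
    current_watermark = float("-inf")

    for item in stream:
        if isinstance(item, Event):
            start = item.timestamp - item.timestamp % window
            if start + window <= current_watermark:
                continue  # late event: its window already fired
            open_windows[start][(item.agent, item.tool)] += 1
        else:
            # Watermarks never move backwards.
            current_watermark = max(current_watermark, item.timestamp)
            closable = sorted(
                s for s in open_windows if s + window <= current_watermark
            )
            for start in closable:
                for (agent, tool), n in sorted(open_windows.pop(start).items()):
                    yield (start, agent, tool, n)

    # Input ended (modelled here as "Stop & Flush"): emit what is still open.
    for start in sorted(open_windows):
        for (agent, tool), n in sorted(open_windows[start].items()):
            yield (start, agent, tool, n)


if __name__ == "__main__":
    demo = [
        Event(5, "a1", "search"),
        Event(61, "a2", "fetch"),
        Event(58, "a1", "shell"),   # out of order, but before the watermark
        Watermark(60),              # closes window [0, 60)
        Event(30, "a2", "fetch"),   # late: window [0, 60) already fired
        Watermark(120),             # closes window [60, 120)
        Event(130, "a1", "search"),
    ]
    for row in tumbling_counts(demo):
        print(row)
```

In this sketch the out-of-order event at `t=58` is still counted because it arrives before `Watermark(60)`, while the event at `t=30` arrives after its window has fired and is dropped. Closing the generator early (for example with `gen.close()`) discards any open windows, which loosely mirrors a Kill, whereas letting the input end flushes them.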
