Hi Druid community,

I would like to start a discussion on a new ingestion mode for Apache Druid backed by Kafka 4.0 Share Groups (KIP-932 and subsequent iterations).

Tracking progress: https://github.com/apache/druid/issues/18439

Motivation & Vision:

A large class of Druid ingestion use cases is inherently task-queue-like, not stream-ordered:
- Distributed System Monitoring: Log lines from thousands of microservices. Whether a log from service A arrives before one from service B is irrelevant; the query is "total ERROR count in the last 5 minutes."
- IoT Fleet Analytics: Temperature readings from geographically dispersed sensors. Each reading is an independent unit of work. The relative arrival order of a reading from a sensor in Singapore vs. one in Oslo carries no semantic meaning.
- Security Threat Detection: Netflow records analyzed for volume patterns. Threats are identified by aggregate attributes, not by microsecond sequencing across network segments.
- API Observability: Billions of HTTP request records. The query is p99 latency per endpoint per minute, and the order of individual requests does not affect the answer.

For all of these, the correct primitive is a work queue: N workers pull items from a shared pool, process them independently, and signal completion. Kafka Share Groups implement exactly this at the broker level, removing the need for Druid to solve it through partition management.

I will share the draft design doc (and a draft PR) for further discussion. Looking forward to feedback.

Thanks,
Shekhar Rajak
Github: @Shekharrajak
