kfaraz commented on issue #19039:
URL: https://github.com/apache/druid/issues/19039#issuecomment-3983456838
I have always felt the need for __internal metrics ingestion__ too.
I ended up doing something rudimentary via `CliEventCollector` in embedded
tests, where Druid services send their metrics over HTTP to an "event collector"
(a new type of Druid node used only in embedded tests), which keeps the
metrics in memory but does not persist them.
One way to implement this for production could build on the event collector
idea:
- Add a new (realtime) task type, say `index_metrics_internal`.
- The task registers an HTTP endpoint and also registers a service, say
`metrics_collector`.
- Druid services use a wrapper over the existing `HttpPostEmitter`:
  - The wrapper first discovers the task via the `metrics_collector` service.
  - It then posts metrics to the task's HTTP endpoint.
- The task collects metrics in a queue and creates segments for, say, a
`__metrics` datasource from the incoming stream/queue of metrics.
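
The steps above could be sketched roughly as follows. This is a minimal illustration, not actual Druid code: `MetricsCollectorEmitter`, `ServiceLocator`, `HttpPoster`, and the `/druid/metrics/v1/push` path are all hypothetical names standing in for the real `HttpPostEmitter` and service-discovery machinery.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// A single metric event, reduced to the bare minimum for the sketch.
class MetricEvent {
  final String metric;
  final double value;
  MetricEvent(String metric, double value) {
    this.metric = metric;
    this.value = value;
  }
}

// Stand-in for Druid's service discovery; returns a base URL or null if the
// metrics_collector task has not been discovered yet.
interface ServiceLocator {
  String locate(String serviceName);
}

// Stand-in for the HTTP transport used by the underlying emitter.
interface HttpPoster {
  boolean post(String url, MetricEvent event); // true on success
}

// The wrapper: discover the metrics_collector task, post the event to its
// endpoint, and buffer events locally when the task is unavailable.
class MetricsCollectorEmitter {
  private final ServiceLocator locator;
  private final HttpPoster poster;
  private final Queue<MetricEvent> buffer = new ArrayDeque<>();

  MetricsCollectorEmitter(ServiceLocator locator, HttpPoster poster) {
    this.locator = locator;
    this.poster = poster;
  }

  void emit(MetricEvent event) {
    String url = locator.locate("metrics_collector");
    if (url == null || !poster.post(url + "/druid/metrics/v1/push", event)) {
      buffer.add(event); // keep for a later retry instead of dropping
    }
  }

  int buffered() {
    return buffer.size();
  }
}
```

Buffering on discovery failure matters here because the collector is itself a task that may be relocated or restarted, so emitters should degrade gracefully rather than drop metrics.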
The advantages of using a realtime task:
- No new sys tables are needed to query this data.
- All query types that Druid currently supports work out of the box (sys
tables do not currently support all query types).
- All the benefits of realtime tasks, such as automatic schema discovery.

Con: takes up a task slot.
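
To illustrate the first point: once the metrics land in a `__metrics` datasource, they can be queried through Druid's standard SQL endpoint (`POST /druid/v2/sql`) like any other datasource. The sketch below only builds the JSON request body; the datasource name and the column names (`service`, `metric`, `value`) are assumptions about how the task might shape the data, not a fixed schema.

```java
class MetricsSqlRequest {
  // Builds the JSON body for Druid's SQL API (POST /druid/v2/sql).
  // Column names here (service, metric, value) are illustrative only.
  static String body(String metricName) {
    String sql = "SELECT __time, service, value FROM \"__metrics\""
        + " WHERE metric = '" + metricName + "'"
        + " ORDER BY __time DESC LIMIT 100";
    // Escape embedded quotes so the SQL string is valid inside JSON.
    return "{\"query\": \"" + sql.replace("\"", "\\\"") + "\"}";
  }
}
```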
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]