kfaraz commented on code in PR #14443:
URL: https://github.com/apache/druid/pull/14443#discussion_r1237973409

##########
docs/operations/metrics.md:
##########
@@ -326,6 +326,12 @@ If `emitBalancingStats` is set to `true` in the Coordinator [dynamic configurati
 ## General Health
 
+### Service Health
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`druid/heartbeat`| Report service health. For Overlord/Coordinator, the dimension is leader count. `ServiceStatusMonitor` must be enabled. |`heartbeatType`|1|

Review Comment:
No, I don't have a concrete example in mind either. I was thinking mostly of any cluster-level information, inter-service communication, etc.

I agree with you that `server/` can be misleading when running on containers. I avoided using it in a PR for similar reasons.

I am working on a PR where I think I am going to use `cluster/` for server view syncs, e.g. `cluster/serverview/synced`, which denotes the sync status between the coordinator/broker inventory and the different historical/peon processes.

`druid/` is pretty much a catch-all, and any metric that goes under `cluster/` could potentially go under `druid/` as well. But I generally try to use prefixes that make the metrics a little more user-friendly and somewhat self-explanatory.
