abhishekrb19 commented on code in PR #18598:
URL: https://github.com/apache/druid/pull/18598#discussion_r2411358991
##########
extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java:
##########
@@ -256,4 +279,21 @@ public void setPushGateway(PushGateway pushGateway)
{
this.pushGateway = pushGateway;
}
+
+ private void cleanUpStaleMetrics()
+ {
+ if (config.getFlushPeriod() == null) {
+ return;
+ }
+
+ Map<String, DimensionsAndCollector> map = metrics.getRegisteredMetrics();
+ for (Map.Entry<String, DimensionsAndCollector> entry : map.entrySet()) {
+ if (entry.getValue().isExpired(config.getFlushPeriod())) {
+ log.debug("Metric [%s] has expired (last updated %d ms ago)",
+ entry.getKey(),
+ entry.getValue().getTimeSinceLastUpdate());
+ entry.getValue().getCollector().clear();
Review Comment:
>Hmm, I suppose removing the children for specific labels would be a more
complete solution.
But it could blow up the memory usage pretty fast since we would now be
tracking the last updated time for every unique set of label values for every
metric.
Yeah, good callout on the memory usage when tracking every unique
combination of label sets per metric. Prometheus isn’t designed for very [high
cardinality
labels](https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels),
and the Prometheus `SimpleCollector` in the java library already tracks the
unique label sets (`children`) internally. On the extension side, we’d only be
dealing with a subset of those `children`, since we’d remove the ephemeral
“stale” entries anyway (if there's a way to leverage the collector’s internal
state somehow, that would be even better). So I’m not too concerned about the
additional memory this would require, though it might still be worth validating
in a cluster with some labels with reasonable cardinality and load.
I’m also okay with the suggestion to start without labels for now and
revisit this (or another solution) once we have a better understanding of the
additional memory overhead it might introduce. The current implementation
should remain extensible enough to support label tracking in the future if
needed. We can keep [#14638](https://github.com/apache/druid/issues/14638)
open and call out in the documentation that the TTL expiration only applies
to certain metric types and not to individual timeseries'.
@aho135 @kfaraz what do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]