abhishekrb19 commented on code in PR #18689:
URL: https://github.com/apache/druid/pull/18689#discussion_r2462292099


##########
extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/DimensionsAndCollector.java:
##########
@@ -66,22 +70,41 @@ public double[] getHistogramBuckets()
     return histogramBuckets;
   }
 
-  public void resetLastUpdateTime()
+  /**
+   * For each unique set of labelValues, keeps track of the amount of time 
that has elapsed since its metric
+   * value has been updated.
+   */
+  public void resetLastUpdateTime(List<String> labelValues)
   {
-    updateTimer.restart();
+    labelValuesToStopwatch.compute(labelValues, (k, v) -> {
+      if (v != null) {
+        v.restart();
+        return v;
+      } else {
+        return Stopwatch.createStarted();
+      }
+    });
   }
 
-  public long getMillisSinceLastUpdate()
+  public ConcurrentMap<List<String>, Stopwatch> getLabelValuesToStopwatch()
   {
-    return updateTimer.millisElapsed();
+    return labelValuesToStopwatch;
   }
 
-  public boolean isExpired()
+  /**
+   * For the given labelValues, checks if the metric value has been updated 
within the configured flushPeriod.
+   * Returns true and removes the entry from the map if it has expired, 
otherwise returns false.
+   */
+  public boolean removeIfExpired(List<String> labelValues)

Review Comment:
   nit: to make this more boolean-y :) what do you think of calling this 
something like `shouldRemoveIfExpired` or `haveLabelsExpired`?
   
   
   



##########
extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/DimensionsAndCollector.java:
##########
@@ -33,7 +37,7 @@ public class DimensionsAndCollector
   private final SimpleCollector collector;
   private final double conversionFactor;
   private final double[] histogramBuckets;
-  private final Stopwatch updateTimer;
+  private final ConcurrentHashMap<List<String>, Stopwatch> 
labelValuesToStopwatch;

Review Comment:
   Did you observe any memory pressure with this change? Previously 
[there](https://github.com/apache/druid/pull/18598#discussion_r2409716522) were 
some concerns on the memory usage with this approach, so it would be nice if 
you could share any observations



##########
extensions-contrib/prometheus-emitter/src/test/java/org/apache/druid/emitter/prometheus/PrometheusEmitterTest.java:
##########
@@ -507,14 +510,81 @@ public void testMetricTtlUpdate()
 
     Assert.assertFalse(
         "Metric should not be expired",
-        testMetric.isExpired()
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical1"))
     );
-    emitter.emit(event);
+    emitter.close();
+  }
+
+  @Test
+  public void testMetricTtlUpdateWithDifferentLabels()
+  {
+    int flushPeriod = 3;
+    PrometheusEmitterConfig config = new 
PrometheusEmitterConfig(PrometheusEmitterConfig.Strategy.exporter, "test", 
null, 0, null, true, true, flushPeriod, null, false, null);
+    PrometheusEmitter emitter = new PrometheusEmitter(config);
+    emitter.start();
+
+    ServiceMetricEvent event1 = ServiceMetricEvent.builder()
+                                                 
.setMetric("segment/loadQueue/count", 10)
+                                                 .setDimension("server", 
"historical1")
+                                                 
.build(ImmutableMap.of("service", "historical", "host", "druid.test.cn"));
+    ServiceMetricEvent event2 = ServiceMetricEvent.builder()
+                                                  
.setMetric("segment/loadQueue/count", 10)
+                                                  .setDimension("server", 
"historical2")
+                                                  
.build(ImmutableMap.of("service", "historical", "host", "druid.test.cn"));
+    emitter.emit(event1);
+    emitter.emit(event2);
+
+    // Get the metrics and check that it's not expired initially
+    Map<String, DimensionsAndCollector> registeredMetrics = 
emitter.getMetrics().getRegisteredMetrics();
+    DimensionsAndCollector testMetric = 
registeredMetrics.get("segment/loadQueue/count");
 
-    long timeSinceLastUpdate = testMetric.getMillisSinceLastUpdate();
+    Assert.assertNotNull(
+        "Test metric should be registered",
+        testMetric
+    );
+    Assert.assertFalse(
+        "Metric should not be expired initially",
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical1"))
+    );
+    Assert.assertFalse(
+        "Metric should not be expired initially",
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical2"))
+    );
+
+    // Wait for a little, but not long enough for the metric to expire
+    long waitTime = TimeUnit.SECONDS.toMillis(flushPeriod) / 5;
+    try {
+      Thread.sleep(waitTime);
+    }
+    catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+    }
+
+    Assert.assertFalse(
+        "Metric should not be expired",
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical1"))
+    );
+    Assert.assertFalse(
+        "Metric should not be expired",
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical2"))
+    );
+    // Reset update time only for event2
+    emitter.emit(event2);
+
+    try {
+      // Wait for the remainder of the TTL to allow event1 to expire
+      Thread.sleep(waitTime * 4);
+    }
+    catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+    }
     Assert.assertTrue(
-        "Update time should have been refreshed",
-        timeSinceLastUpdate < waitTime
+        "Metric should be expired",
+        testMetric.removeIfExpired(Arrays.asList("historical", 
"druid.test.cn", "historical1"))
+    );
+    Assert.assertFalse(

Review Comment:
   Thanks for these tests!
   
   From a functional PoV, it would be better to verify the end state of the 
collector, that the metrics are indeed cleared, rather than calling the 
internal function `removeIfExpired()`; as it is, I think we don't have coverage 
for `cleanUpStaleMetrics()`.
   
   What do you think of something like:
   
   - `emitter.emit(metric1 - labelset1)`
   - `emitter.emit(metric1 - labelset2)`
   - `emitter.emit(metric1 - labelset1)`
   
   verify that `metric1 - labelset1` is not cleared from the collector, whereas 
`metric1 - labelset2` is cleared.
   
   
   I think we could leverage the `SimpleCollector`'s samples for verification. 
We could also pass `BlockingExecutorService` from the test to the prometheus 
emitter to control the executions (a bunch of tests do this to have better 
control of async executors). Please let me know what you think.
   
   
   



##########
extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java:
##########
@@ -286,13 +290,16 @@ private void cleanUpStaleMetrics()
     Map<String, DimensionsAndCollector> map = metrics.getRegisteredMetrics();
     for (Map.Entry<String, DimensionsAndCollector> entry : map.entrySet()) {
       DimensionsAndCollector metric = entry.getValue();
-      if (metric.isExpired()) {
-        log.debug(
-            "Metric [%s] has expired (last updated [%d] ms ago)",
-            entry.getKey(),
-            metric.getMillisSinceLastUpdate()
-        );
-        metric.getCollector().clear();
+      Set<List<String>> labelValuesCopy = new 
HashSet<>(metric.getLabelValuesToStopwatch().keySet());

Review Comment:
   Is a copy of the map required here if the underlying map is already 
concurrent safe from using `ConcurrentMap`? We're also doing 
`computeIfPresent()` on the map entries whihh should be safe



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to