Copilot commented on code in PR #13218:
URL: https://github.com/apache/cloudstack/pull/13218#discussion_r3287128296


##########
plugins/integrations/prometheus/src/main/java/org/apache/cloudstack/metrics/PrometheusExporterImpl.java:
##########
@@ -501,6 +511,35 @@ private void addVMsBySizeMetrics(final List<Item> 
metricsList, final long dcId,
         }
     }
 
+    private void addVirtualRouterHealthCheckMetrics(final List<Item> 
metricsList, final long dcId, final String zoneName, final String zoneUuid) {
+        List<DomainRouterVO> routers = domainRouterDao.listByDataCenter(dcId);
+        if (routers == null) {
+            return;
+        }
+        for (DomainRouterVO router : routers) {
+            if (router.getRole() != VirtualRouter.Role.VIRTUAL_ROUTER) {
+                continue;
+            }
+            List<RouterHealthCheckResultVO> checks = 
routerHealthCheckResultDao.getHealthCheckResults(router.getId());
+            if (checks == null || checks.isEmpty()) {

Review Comment:
   `addVirtualRouterHealthCheckMetrics` performs a separate 
`routerHealthCheckResultDao.getHealthCheckResults(routerId)` query per router. 
Since `updateMetrics()` is invoked on every `/metrics` scrape, this becomes an 
N+1 query pattern that can add significant DB load in zones with many routers. 
Consider adding a DAO method to fetch results for all router IDs in the zone in 
a single query (and then group in-memory), or otherwise batch the lookups to 
keep scrape latency predictable.



##########
plugins/integrations/prometheus/src/main/java/org/apache/cloudstack/metrics/PrometheusExporterImpl.java:
##########
@@ -501,6 +511,35 @@ private void addVMsBySizeMetrics(final List<Item> 
metricsList, final long dcId,
         }
     }
 
+    private void addVirtualRouterHealthCheckMetrics(final List<Item> 
metricsList, final long dcId, final String zoneName, final String zoneUuid) {
+        List<DomainRouterVO> routers = domainRouterDao.listByDataCenter(dcId);
+        if (routers == null) {
+            return;
+        }
+        for (DomainRouterVO router : routers) {
+            if (router.getRole() != VirtualRouter.Role.VIRTUAL_ROUTER) {
+                continue;
+            }
+            List<RouterHealthCheckResultVO> checks = 
routerHealthCheckResultDao.getHealthCheckResults(router.getId());

Review Comment:
   `domainRouterDao.listByDataCenter(dcId)` returns routers in all VM states 
(including Stopped/Destroyed), so this exporter can emit stale health-check 
metrics for routers that are no longer running and also do unnecessary DB work. 
Filtering to `State.Running` (and possibly skipping `State.Destroyed`) before 
querying health-check results would avoid exposing obsolete metrics and reduce 
scrape cost.
   



##########
plugins/integrations/prometheus/src/main/java/org/apache/cloudstack/metrics/PrometheusExporterImpl.java:
##########
@@ -501,6 +511,35 @@ private void addVMsBySizeMetrics(final List<Item> 
metricsList, final long dcId,
         }
     }
 
+    private void addVirtualRouterHealthCheckMetrics(final List<Item> 
metricsList, final long dcId, final String zoneName, final String zoneUuid) {
+        List<DomainRouterVO> routers = domainRouterDao.listByDataCenter(dcId);
+        if (routers == null) {
+            return;
+        }

Review Comment:
   Health-check metrics are exported unconditionally; when router health checks 
are disabled via `router.health.checks.enabled`, this still walks all routers 
and queries the health-check table. It would be better to short-circuit early 
(before listing routers) when health checks are disabled to avoid unnecessary 
work and to align `/metrics` with the feature being enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to