sodonnel commented on code in PR #3781:
URL: https://github.com/apache/ozone/pull/3781#discussion_r999254808
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeAdminMonitorImpl.java:
##########
@@ -168,6 +225,43 @@ public int getTrackedNodeCount() {
return trackedNodes.size();
}
+ synchronized void setMetricsToGauge() {
+ metrics.setTrackedContainersUnhealthyTotal(unhealthyContainers);
+ metrics.setTrackedRecommissionNodesTotal(trackedRecommission);
+ metrics.setTrackedDecommissioningMaintenanceNodesTotal(
+ trackedDecomMaintenance);
+ metrics.setTrackedContainersUnderReplicatedTotal(
+ underReplicatedContainers);
+ metrics.setTrackedContainersSufficientlyReplicatedTotal(
+ sufficientlyReplicatedContainers);
+ metrics.setTrackedPipelinesWaitingToCloseTotal(pipelinesWaitingToClose);
+ for (Map.Entry<String, Long> e :
+ pipelinesWaitingToCloseByHost.entrySet()) {
+ metrics.metricRecordPipelineWaitingToCloseByHost(e.getKey(),
+ e.getValue());
+ }
+ for (Map.Entry<String, ContainerStateInWorkflow> e :
Review Comment:
I might be wrong, but I think there is a bug here.
Let's say we put a host into maintenance. It will have some metrics tracked in
the ByHost maps.
After each pass we reset these maps to zero counts, but we never remove
entries from them (unless I have missed it). Then we update
the values accordingly.
Later the node goes back into service, and even though it is removed from the
monitor, it will still be tracked with zero counts forever.
Over time, on a long-running cluster, we will build up a lot of "by host"
metrics with zero values, when they really should be removed.
I think the reset needs to remove the entries from the maps rather than zeroing
them, and when setting the values on the metric gauge, you also need to
remove any hosts that are no longer present.
It might be easier to pass a `Map<String, ContainerStateInWorkflow>` to the
metrics class to facilitate removing the stale entries.
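
To illustrate the idea, here is a rough sketch of what I mean. It is not the
code in this PR: `HostGauges`, `MonitorPassSketch`, and all of their methods
are hypothetical names, and I have used `Long` values instead of
`ContainerStateInWorkflow` to keep it short. The reset clears the by-host map
instead of zeroing it, and the metrics side receives the whole map so it can
drop hosts that were not seen in the latest pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the per-host part of the metrics class. */
final class HostGauges {
  // Host name -> last published value for that host.
  private final Map<String, Long> published = new ConcurrentHashMap<>();

  /** Publishes the latest pass and drops hosts that were not in it. */
  void replaceAll(Map<String, Long> latestPass) {
    // Remove gauges for hosts the monitor no longer tracks.
    published.keySet().retainAll(latestPass.keySet());
    // Add or update gauges for hosts seen in this pass.
    published.putAll(latestPass);
  }

  /** Returns a copy of what is currently published. */
  Map<String, Long> snapshot() {
    return new HashMap<>(published);
  }
}

/** Hypothetical stand-in for the monitor's per-pass bookkeeping. */
final class MonitorPassSketch {
  private final Map<String, Long> pipelinesWaitingToCloseByHost =
      new HashMap<>();
  private final HostGauges gauges = new HostGauges();

  /** Start of a pass: forget every host rather than zeroing counts. */
  synchronized void resetCounts() {
    pipelinesWaitingToCloseByHost.clear();
  }

  /** During a pass: count one more pipeline waiting to close on a host. */
  synchronized void recordPipelineWaiting(String host) {
    pipelinesWaitingToCloseByHost.merge(host, 1L, Long::sum);
  }

  /** End of a pass: hand the whole map to the metrics side. */
  synchronized void publish() {
    gauges.replaceAll(pipelinesWaitingToCloseByHost);
  }

  HostGauges gauges() {
    return gauges;
  }
}
```

With something like this, a host that has returned to service simply does not
appear in the next pass, so its entry disappears from the published metrics
instead of lingering with a zero value.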
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]