neils-dev commented on code in PR #3781:
URL: https://github.com/apache/ozone/pull/3781#discussion_r1004946930
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeAdminMonitorImpl.java:
##########
@@ -168,6 +232,55 @@ public int getTrackedNodeCount() {
return trackedNodes.size();
}
+ synchronized void setMetricsToGauge() {
+ metrics.setTrackedContainersUnhealthyTotal(unhealthyContainers);
+ metrics.setTrackedRecommissionNodesTotal(trackedRecommission);
+ metrics.setTrackedDecommissioningMaintenanceNodesTotal(
+ trackedDecomMaintenance);
+ metrics.setTrackedContainersUnderReplicatedTotal(
+ underReplicatedContainers);
+ metrics.setTrackedContainersSufficientlyReplicatedTotal(
+ sufficientlyReplicatedContainers);
+ metrics.setTrackedPipelinesWaitingToCloseTotal(pipelinesWaitingToClose);
+ for (Map.Entry<String, ContainerStateInWorkflow> e :
+ containerStateByHost.entrySet()) {
+ metrics.metricRecordOfContainerStateByHost(e.getKey(),
+ e.getValue().sufficientlyReplicated,
+ e.getValue().underReplicatedContainers,
+ e.getValue().unhealthyContainers,
+ e.getValue().pipelinesWaitingToClose);
+ }
+ }
+
+ void resetContainerMetrics() {
+ pipelinesWaitingToClose = 0;
+ sufficientlyReplicatedContainers = 0;
+ unhealthyContainers = 0;
+ underReplicatedContainers = 0;
+
+ for (Map.Entry<String, ContainerStateInWorkflow> e :
Review Comment:
Yes, thanks, I was looking to couple the `ContainerStateInWorkflow` for use
both in the `DatanodeAdminMonitorImpl` and in the `NodeDecommissionMetrics`;
however, there are a few issues with that, which is why it is implemented this
way. Those issues are:
i.) There need to be separate stores: one for the numbers being collected by
the monitor, and one for the numbers held in the `NodeDecommissionMetrics`.
This is so that we do not report incomplete intermediate numbers to the
`NodeDecommissionMetrics` when the metrics are periodically pulled through
`getMetrics()`. On each run of the monitor thread, once all the numbers have
been collected, we flush them to the `NodeDecommissionMetrics` (the calls to
`metricRecordOfContainerStateByHost()`). For this reason the `Map<String,
ContainerStateInWorkflow>` cannot be used directly in the
`NodeDecommissionMetrics`.
ii.) With the two separate stores, we need to know which stored hosts are
currently in the workflow and which have left it and are now stale. Hence the
check in the monitor code that collects the stale hosts and reports them to
`NodeDecommissionMetrics.metricRemoveRecordOfContainerStateByHost()`. For the
same reason, it looks like clearing the map on each run of the monitor,
instead of iterating to reset the counts to 0 as suggested, is not possible:
we need to know which nodes (hosts) have become stale since the last run of
the monitor.
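To illustrate the two points above, here is a minimal, self-contained sketch of the double-store pattern being described (the class and method names here are hypothetical, not the actual Ozone code): the monitor accumulates counts in a working map, then flushes a complete snapshot to the published store at the end of each run, dropping any hosts that have left the workflow since the previous run.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names do not match the Ozone classes.
class WorkflowMetricsSketch {
    // Store 1: working counts, mutated while the monitor thread runs.
    private final Map<String, Integer> workingCounts = new HashMap<>();
    // Store 2: published counts, replaced only with complete snapshots,
    // so a concurrent metrics pull never sees half-updated numbers.
    private final Map<String, Integer> publishedCounts = new HashMap<>();

    // Called repeatedly during a single monitor run.
    void record(String host, int count) {
        workingCounts.merge(host, count, Integer::sum);
    }

    // Called once per run, after all counts for the run are collected.
    synchronized void flush() {
        // Hosts published last run but absent from this run are stale.
        Set<String> stale = new HashSet<>(publishedCounts.keySet());
        stale.removeAll(workingCounts.keySet());
        for (String host : stale) {
            publishedCounts.remove(host);
        }
        publishedCounts.putAll(workingCounts);
        workingCounts.clear(); // reset the working store for the next run
    }

    // What a periodic metrics pull would read: always a full snapshot.
    synchronized Map<String, Integer> getPublished() {
        return new HashMap<>(publishedCounts);
    }
}
```

Clearing only the working map (rather than the published one) is what preserves the knowledge of which hosts were reported last run, which is why the stale-host diff in `flush()` is possible at all.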
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]