neils-dev commented on PR #3781:
URL: https://github.com/apache/ozone/pull/3781#issuecomment-1295689285

   Pushed new changes that finish the cleanup of the metrics reset and 
collection in the monitor.  In addition, the way NodeDecommissionMetrics 
reports the container state metrics for each node in the workflow has 
changed. 
   
    Now a single metric name is used for the same metric collected for each 
datanode. Within the metric, an associated tag identifies the node for 
the reading, e.g. 
`node_decommission_metrics_tracked_sufficiently_replicated_dn{datanode="ozone-datanode-2.ozone_default",hostname="39160451dea0"}`.
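
To illustrate how a consumer could read such a tagged sample, here is a minimal sketch that splits one line of Prometheus text output into metric name, tags and value. The trailing value `0`, the regular expressions and the helper name `parse_sample` are assumptions made for illustration; they are not part of this PR.

```python
import re

# Sample in Prometheus text format. The metric name and tags come from the
# comment above; the trailing value 0 is assumed for illustration.
SAMPLE = (
    'node_decommission_metrics_tracked_sufficiently_replicated_dn'
    '{datanode="ozone-datanode-2.ozone_default",hostname="39160451dea0"} 0'
)

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, tags dict, float value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a metric sample: " + line)
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE)
print(name, labels["datanode"], value)
```

Because the metric name is the same for every datanode, a consumer only has to filter on the `datanode` tag to pick out one node's reading.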
   
   The decommissioning / maintenance workflow is tracked in JMX, which displays 
each aggregated metric and displays the node container state metrics only 
while the node is in the workflow.  Prometheus also displays each aggregated 
metric, but now exposes each container state metric under its own unique 
metric name, with the host associated with each reading shown as a tag.  This 
can be seen below in a workflow decommissioning `datanode-3`:
   
   during:
   ```
       "name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
       "modelerType" : "NodeDecommissionMetrics",
       "tag.Hostname" : "39160451dea0",
       "TrackedDecommissioningMaintenanceNodesTotal" : 1,
       "TrackedRecommissionNodesTotal" : 0,
       "TrackedPipelinesWaitingToCloseTotal" : 0,
       "TrackedContainersUnderReplicatedTotal" : 1,
       "TrackedContainersUnhealthyTotal" : 0,
       "TrackedContainersSufficientlyReplicatedTotal" : 0,
       "tag.datanode.1" : "ozone-datanode-3.ozone_default",
       "tag.Hostname.1" : "39160451dea0",
       "TrackedUnderReplicatedDN.1" : 1,
       "tag.datanode.2" : "ozone-datanode-3.ozone_default",
       "tag.Hostname.2" : "39160451dea0",
       "TrackedSufficientlyReplicatedDN.2" : 0,
       "tag.datanode.3" : "ozone-datanode-3.ozone_default",
       "tag.Hostname.3" : "39160451dea0",
       "TrackedPipelinesWaitingToCloseDN.3" : 0,
       "tag.datanode.4" : "ozone-datanode-3.ozone_default",
       "tag.Hostname.4" : "39160451dea0",
       "TrackedUnhealthyContainersDN.4" : 0
     }, {
   ```
   
   after:
   ```
    }, {
       "name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
       "modelerType" : "NodeDecommissionMetrics",
       "tag.Hostname" : "39160451dea0",
       "TrackedDecommissioningMaintenanceNodesTotal" : 0,
       "TrackedRecommissionNodesTotal" : 0,
       "TrackedPipelinesWaitingToCloseTotal" : 0,
       "TrackedContainersUnderReplicatedTotal" : 0,
       "TrackedContainersUnhealthyTotal" : 0,
       "TrackedContainersSufficientlyReplicatedTotal" : 0
     }, {
   ```
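
As a rough sketch of how the per-node entries in the JMX output during the workflow could be read, the snippet below groups the numbered keys by datanode. It assumes the numeric suffix (`.1`, `.2`, ...) marks one per-node record made of a `tag.datanode` entry plus one reading, which is how the excerpt appears to be laid out. The helper name `per_node_readings` and the trimmed `during` bean are illustrative only.

```python
import re
from collections import defaultdict

# Trimmed copy of the bean shown during decommissioning of datanode-3.
during = {
    "modelerType": "NodeDecommissionMetrics",
    "tag.Hostname": "39160451dea0",
    "TrackedContainersUnderReplicatedTotal": 1,
    "tag.datanode.1": "ozone-datanode-3.ozone_default",
    "tag.Hostname.1": "39160451dea0",
    "TrackedUnderReplicatedDN.1": 1,
    "tag.datanode.2": "ozone-datanode-3.ozone_default",
    "tag.Hostname.2": "39160451dea0",
    "TrackedSufficientlyReplicatedDN.2": 0,
}

# Keys ending in ".<number>" are assumed to belong to a per-node record.
SUFFIX_RE = re.compile(r"^(?P<key>.+)\.(?P<idx>\d+)$")


def per_node_readings(bean):
    """Return {datanode: {metric: value}} built from the suffixed keys."""
    records = defaultdict(dict)
    for key, value in bean.items():
        match = SUFFIX_RE.match(key)
        if match:
            records[int(match.group("idx"))][match.group("key")] = value

    readings = defaultdict(dict)
    for idx in sorted(records):
        record = records[idx]
        node = record.get("tag.datanode")
        if node is None:
            continue  # no datanode tag, so not a per-node record
        for key, value in record.items():
            if not key.startswith("tag."):
                readings[node][key] = value
    return dict(readings)


print(per_node_readings(during))
```

Applied to the bean shown after the workflow completes, which has no suffixed keys, the same helper returns an empty dict. This matches the per-node metrics being displayed only while the node is in the workflow.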
   In Prometheus, while decommissioning datanode-2 and datanode-3:
   
   
![decommission_prom_DNs](https://user-images.githubusercontent.com/81126310/198768113-f907b06a-6fc7-49c6-809d-8a9618c2c764.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

