Wei-Chiu Chuang created HDDS-15266:
--------------------------------------

             Summary: Ability to trace container state transitions across a cluster
                 Key: HDDS-15266
                 URL: https://issues.apache.org/jira/browse/HDDS-15266
             Project: Apache Ozone
          Issue Type: Wish
            Reporter: Wei-Chiu Chuang


Request for proposals to improve cluster-wide observability of container health 
state.

 

Current state:
   * Container Audit Logs: Accurate, but scoped to a single datanode. Container
state transitions are currently audited in HddsDispatcher.java on each
individual datanode. That scope is enough to monitor the state transitions of
the replicas on one datanode, but it makes cluster-wide health tracking
difficult, because container health state requires a holistic view across the
cluster. For example: when did a container become over-replicated, when did it
become unhealthy, at what timestamp, and at which corresponding Ratis
transaction ID?
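To make the gap concrete, here is a minimal Python sketch, assuming each datanode's audit trail can be parsed into time-ordered replica events. `ReplicaEvent`, its fields, `cluster_timeline`, `first_transition_to`, and `first_over_replicated` are hypothetical names, not Ozone's audit log format or API. The sketch merges the per-datanode streams into one cluster-wide timeline and then answers the example questions (when a container became unhealthy or over-replicated, with timestamp and Ratis transaction ID). The over-replication check counts a replica as healthy only while its latest state is CLOSED and assumes a replication factor of 3; both are deliberate simplifications of what the Replication Manager actually evaluates.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ReplicaEvent:
    # Hypothetical record parsed from one datanode's audit log;
    # field names are illustrative, not Ozone's actual audit schema.
    timestamp_ms: int
    datanode: str
    container_id: int
    new_state: str  # e.g. "CLOSED", "UNHEALTHY"
    ratis_txn_id: int


def cluster_timeline(per_datanode_logs: Iterable[List[ReplicaEvent]]) -> List[ReplicaEvent]:
    """Merge per-datanode event lists (each already sorted by time) into one timeline."""
    return list(merge(*per_datanode_logs, key=lambda e: e.timestamp_ms))


def first_transition_to(timeline: List[ReplicaEvent], container_id: int,
                        state: str) -> Optional[ReplicaEvent]:
    """First event, on any datanode, that moves a replica of the container into `state`."""
    return next((e for e in timeline
                 if e.container_id == container_id and e.new_state == state), None)


def first_over_replicated(timeline: List[ReplicaEvent], container_id: int,
                          expected_replicas: int = 3) -> Optional[ReplicaEvent]:
    """First event after which the container has more healthy replicas than expected.

    Simplification: a replica counts as healthy only while its latest state is CLOSED.
    """
    healthy = set()
    for e in timeline:
        if e.container_id != container_id:
            continue
        if e.new_state == "CLOSED":
            healthy.add(e.datanode)
        else:
            healthy.discard(e.datanode)
        if len(healthy) > expected_replicas:
            return e
    return None
```

Merging on wall-clock timestamps assumes reasonably synchronized datanode clocks; the Ratis transaction ID carried on each event may help order transitions that fall within a single pipeline.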

 

One possible approach is to implement a mechanism for datanodes to report
container state transition events to SCM (or Recon) via heartbeats, allowing
SCM to expose a holistic, cluster-wide metric for container state transitions.
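A minimal Python sketch of the heartbeat-based approach, under stated assumptions: `TransitionEvent`, `DatanodeEventBuffer`, and `ScmTransitionAggregator` are hypothetical and do not correspond to existing Ozone protobuf messages or SCM classes. Each datanode queues transitions locally and piggybacks a bounded batch on every heartbeat. The SCM side counts transitions cluster-wide (the kind of value a metric could expose) and keeps a per-container history ordered by timestamp across datanodes.

```python
from collections import Counter, defaultdict, deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TransitionEvent:
    # Hypothetical heartbeat payload; not an existing Ozone protobuf message.
    datanode: str
    container_id: int
    old_state: str
    new_state: str
    timestamp_ms: int
    ratis_txn_id: int


class DatanodeEventBuffer:
    """Datanode side: queue transitions locally, drain a bounded batch per heartbeat."""

    def __init__(self, max_per_heartbeat: int = 100):
        self._pending = deque()
        self._max = max_per_heartbeat

    def record(self, event: TransitionEvent) -> None:
        self._pending.append(event)

    def drain_for_heartbeat(self) -> List[TransitionEvent]:
        batch = []
        while self._pending and len(batch) < self._max:
            batch.append(self._pending.popleft())
        return batch


class ScmTransitionAggregator:
    """SCM side: cluster-wide transition counters plus per-container history."""

    def __init__(self):
        self.transition_counts = Counter()  # (old_state, new_state) -> count
        self._history = defaultdict(list)   # container_id -> [TransitionEvent]
        self._seen = set()                  # exact events already ingested

    def on_heartbeat(self, events: List[TransitionEvent]) -> None:
        for e in events:
            if e in self._seen:  # a resent batch must not double-count
                continue
            self._seen.add(e)
            self.transition_counts[(e.old_state, e.new_state)] += 1
            self._history[e.container_id].append(e)

    def container_history(self, container_id: int) -> List[TransitionEvent]:
        # Batches from different datanodes arrive in any order; sort on read.
        return sorted(self._history.get(container_id, []), key=lambda e: e.timestamp_ms)
```

Bounding the batch size keeps a burst of transitions from inflating a single heartbeat, and ignoring exact duplicates makes a resent batch harmless; a real implementation would also need to bound the `_seen` set. Whether SCM or Recon should own the aggregation is left open here, as in the proposal.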

 

Open to suggestions.

 

References:

[https://ozone.apache.org/docs/next/system-internals/replication/data/replication-manager#container-state-descriptions]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
