sodonnel opened a new pull request #433: Hdds 2860 Cluster disk space metrics 
should reflect decommission and maintenance states
URL: https://github.com/apache/hadoop-ozone/pull/433
 
 
   # This needs HDDS-2113 committed before this one.
   
   ## What changes were proposed in this pull request?
   
   Now that we have decommission states, we need to adjust the cluster 
capacity, space used and space available metrics which are exposed via JMX.
   
   For a decommissioning node, the space used on the node effectively needs to 
be transferred to other nodes via container replication before decommission can 
complete, but this is difficult to track from a space usage perspective. When a 
node completes decommission, we can assume it provides no capacity to the 
cluster and uses none. Therefore, for decommissioning + decommissioned nodes, 
the simplest calculation is to exclude the node completely, in a similar way to 
a dead node.
   
   For maintenance nodes, things are even less clear. A maintenance node is 
read-only, so it cannot provide capacity to the cluster, but it is expected to 
return to service, so excluding it completely probably does not make sense. 
However, perhaps the simplest solution is to do the following:
   
   1. For any node not IN_SERVICE, do not include its usage or space in the 
cluster capacity totals.
   2. Introduce some new metrics to account for the maintenance and perhaps 
decommission capacity, so it is not lost, e.g.:
   
   ```
   # Existing metrics
   "DiskCapacity" : 62725623808,
   "DiskUsed" : 4096,
   "DiskRemaining" : 50459619328,
   
   # Suggested additional new ones, with the above only considering IN_SERVICE nodes:
   "MaintenanceDiskCapacity": 0,
   "MaintenanceDiskUsed": 0,
   "MaintenanceDiskRemaining": 0,
   "DecommissionedDiskCapacity": 0,
   "DecommissionedDiskUsed": 0,
   "DecommissionedDiskRemaining": 0,
   ...
   ```
   That way, the cluster totals are only what is currently "online", but we 
have the other metrics to track what has been removed etc. The key advantage of 
this is that it is easy to understand.
   
   There could also be an argument that the new DecommissionedDisk* metrics 
are not needed, as that capacity is technically lost from the cluster forever.
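
As a rough illustration of the bucketing proposed above, here is a minimal Java sketch. It is not the code in this patch: `ClusterSpaceSketch`, `OpState`, `Bucket`, `NodeStat` and `Totals` are hypothetical names, the state list is simplified, and it assumes dead nodes have already been filtered out before aggregation.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical, not Ozone classes.
class ClusterSpaceSketch {

  /** Simplified operational states; the real state set may differ. */
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

  /** Metric groups: Disk*, MaintenanceDisk*, DecommissionedDisk*. */
  enum Bucket { IN_SERVICE, MAINTENANCE, DECOMMISSIONED }

  /** Space report for one healthy datanode, in bytes. */
  static final class NodeStat {
    final OpState state;
    final long capacity;
    final long used;
    final long remaining;

    NodeStat(OpState state, long capacity, long used, long remaining) {
      this.state = state;
      this.capacity = capacity;
      this.used = used;
      this.remaining = remaining;
    }
  }

  /** Running sums for one bucket. */
  static final class Totals {
    long capacity;
    long used;
    long remaining;

    void add(NodeStat n) {
      capacity += n.capacity;
      used += n.used;
      remaining += n.remaining;
    }
  }

  /** Decommissioning and decommissioned nodes share one bucket. */
  static Bucket bucketFor(OpState state) {
    switch (state) {
      case IN_SERVICE:
        return Bucket.IN_SERVICE;
      case IN_MAINTENANCE:
        return Bucket.MAINTENANCE;
      default:
        return Bucket.DECOMMISSIONED;
    }
  }

  /** Sums each node into exactly one bucket, so no space is double counted. */
  static Map<Bucket, Totals> aggregate(List<NodeStat> nodes) {
    Map<Bucket, Totals> out = new EnumMap<>(Bucket.class);
    for (Bucket b : Bucket.values()) {
      out.put(b, new Totals());
    }
    for (NodeStat n : nodes) {
      out.get(bucketFor(n.state)).add(n);
    }
    return out;
  }

  public static void main(String[] args) {
    List<NodeStat> nodes = Arrays.asList(
        new NodeStat(OpState.IN_SERVICE, 100L, 10L, 90L),
        new NodeStat(OpState.IN_MAINTENANCE, 100L, 40L, 60L),
        new NodeStat(OpState.DECOMMISSIONING, 100L, 70L, 30L));
    for (Map.Entry<Bucket, Totals> e : aggregate(nodes).entrySet()) {
      Totals t = e.getValue();
      System.out.println(e.getKey() + " capacity=" + t.capacity
          + " used=" + t.used + " remaining=" + t.remaining);
    }
  }
}
```

In this sketch the `IN_SERVICE` bucket would back the existing `DiskCapacity`, `DiskUsed` and `DiskRemaining` values, and the other two buckets would back the suggested new metrics. If the decommissioned metrics turn out not to be needed, that bucket could simply be dropped from the output.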
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2860
   
   ## How was this patch tested?
   
   An additional unit test was added, and the new metrics were inspected manually.
   

