[PR] Update in `force.ha` value configuration value obtention [cloudstack]

via GitHub Wed, 10 Jun 2026 07:40:13 -0700


erikbocks opened a new pull request, #13391:
URL: https://github.com/apache/cloudstack/pull/13391


   ### Description
   
   Currently, workflows that use the `force.ha` configuration only consider its global value, even though the setting has `Cluster` scope. This PR changes them to read the cluster-scoped value instead. If no value is configured at cluster level, the global value is used.
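The following is a minimal sketch of the lookup order described above. It assumes a simplified in-memory model: the `ScopedBooleanSetting` class and its methods are hypothetical and stand in for CloudStack's configuration framework, which this sketch does not use. When the cluster is known and has an override, the cluster-scoped value is used. Otherwise the global value applies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a boolean setting with Cluster scope (e.g. force.ha).
// It does not use CloudStack's configuration framework; it only shows the lookup order.
final class ScopedBooleanSetting {
    private final boolean globalValue;
    // clusterId -> value explicitly configured at cluster scope
    private final Map<Long, Boolean> clusterValues = new HashMap<>();

    ScopedBooleanSetting(boolean globalValue) {
        this.globalValue = globalValue;
    }

    void setForCluster(long clusterId, boolean value) {
        clusterValues.put(clusterId, value);
    }

    // Previous behaviour: the cluster-scoped value is never consulted.
    boolean globalOnly() {
        return globalValue;
    }

    // Behaviour described in this PR: prefer the cluster-scoped value and
    // fall back to the global value when the cluster is unknown or has no override.
    boolean resolveFor(Long clusterId) {
        if (clusterId == null) {
            return globalValue;
        }
        return clusterValues.getOrDefault(clusterId, globalValue);
    }
}
```

Passing `null` models a caller that cannot determine the VM's cluster. In that case only the global value can be used.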
   
   In some of these code paths, the cluster ID was not available at the point where the configuration value is read, and the VM's host could be `null`. Therefore, `findClusterAndHostIdForVM` is used. It looks up the VM's host (or, failing that, its last host) and that host's cluster. If both the host and the cluster are `null`, the method returns the cluster ID of the storage pool where the VM's volume is allocated.
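The fallback chain can be sketched as follows. This is a simplified model, not the actual `findClusterAndHostIdForVM` implementation. The `Vm` and `ClusterAndHost` records, the `VmPlacementResolver` class, and the maps standing in for database lookups are all hypothetical. Only the VM's root volume pool is considered.

```java
import java.util.Map;

// Hypothetical stand-ins for the entities involved; not CloudStack's VO classes.
record Vm(long id, Long hostId, Long lastHostId, long rootVolumePoolId) {}

record ClusterAndHost(Long clusterId, Long hostId) {}

// Simplified sketch of the fallback chain; the maps stand in for database lookups.
final class VmPlacementResolver {
    private final Map<Long, Long> hostToCluster; // hostId -> clusterId
    private final Map<Long, Long> poolToCluster; // storage pool id -> clusterId (absent for zone-wide pools)

    VmPlacementResolver(Map<Long, Long> hostToCluster, Map<Long, Long> poolToCluster) {
        this.hostToCluster = hostToCluster;
        this.poolToCluster = poolToCluster;
    }

    ClusterAndHost resolve(Vm vm) {
        // 1. Prefer the VM's current host, then the last host it ran on.
        Long hostId = vm.hostId() != null ? vm.hostId() : vm.lastHostId();
        // 2. Take the cluster of whichever host was found.
        Long clusterId = hostId != null ? hostToCluster.get(hostId) : null;
        // 3. With neither a host nor a cluster, use the cluster of the storage
        //    pool where the VM's volume is allocated.
        if (hostId == null && clusterId == null) {
            clusterId = poolToCluster.get(vm.rootVolumePoolId());
        }
        return new ClusterAndHost(clusterId, hostId);
    }
}
```

The resolved cluster ID is what a cluster-scoped lookup of `force.ha` would then use.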
   
   This PR also removes the cleanup of the host's cluster ID that was executed during host removal. It was observed that clearing the host's cluster ID had no impact. The processes that could be affected by keeping it already contain validations to prevent errors.
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [X] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] Build/CI
   - [ ] Test (unit or integration test code)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [X] Minor
   - [ ] Trivial
   
   ### Screenshots (if appropriate):
   
   ### How Has This Been Tested?
   
   First, I confirmed that, without the changes, VMs whose offerings do not have HA enabled were not restarted, even though the `force.ha` configuration was enabled at cluster scope. After installing the packages with the new changes, I performed the following tests. Each test was executed twice: once with the configuration enabled globally, and once with it enabled at cluster scope:
   
   | Test | Result |
   | ---- | ------ |
   | Kill the VM's process with `kill -9 <pid>` | The VM was identified as shut down and was restarted automatically. |
   | Hard power off the VM's host | The host was identified as `Disconnected`, and the VM was restarted on another node of the environment. |
   | Host's forced removal | ACS identified that there were VMs running on the host and automatically restarted them on another node. |
   
   ---
   
   Regarding the cluster ID cleanup removal, the following tests were conducted 
to check whether keeping it would cause inconsistencies:
   
   | Test | Result |
   | ---- | ------ |
   | Host removal | The host was removed, and its cluster ID was kept. |
   | Host removal with running VMs | The host was removed, its cluster ID was kept, and the VMs were restarted on another node. |
   | Host reintroduction | The host was reintroduced successfully. |
   | Cluster and host creation and removal | No issues were found when creating or removing either resource. |
   | Removing the host from Cluster 1, deleting Cluster 1, and adding the host to Cluster 2 | The host was added successfully. |
   
   Before the cluster ID cleanup was removed, a host's forced removal created HA restart jobs to be processed by the HA workers (`ha.workers` configuration). However, sometimes there were not enough workers available to process the jobs while `force.ha` was enabled only at cluster scope. In that case, the host's removal flow could finish before all HA jobs were processed. The VMs whose jobs were still pending were left in an inconsistent state.
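One way to picture this race is a deterministic, single-threaded simulation. It makes two hypothetical assumptions. First, each pending HA job resolves `force.ha` through the removed host's cluster ID. Second, the old removal flow cleared that ID as soon as it finished. `ForcedRemovalSimulation` and its parameters are illustrative names, not CloudStack code. With the cleanup in place, only jobs handled before the removal flow finishes see the cluster-scoped value. Without the cleanup, every job does.

```java
import java.util.Map;

// Hypothetical, single-threaded model of a forced host removal; not CloudStack code.
final class ForcedRemovalSimulation {
    static final long CLUSTER_ID = 1L;
    static final boolean GLOBAL_FORCE_HA = false;                                 // force.ha disabled globally
    static final Map<Long, Boolean> CLUSTER_FORCE_HA = Map.of(CLUSTER_ID, true); // enabled at cluster scope

    /**
     * @param vmCount                  VMs on the removed host; none has HA in its offering
     * @param jobsHandledBeforeRemoval HA jobs the workers finish before the removal flow completes
     * @param clearClusterIdOnRemoval  true = old behaviour (cluster ID cleared), false = this PR
     * @return number of VMs that get restarted on another host
     */
    static int restartedVms(int vmCount, int jobsHandledBeforeRemoval, boolean clearClusterIdOnRemoval) {
        Long hostClusterId = CLUSTER_ID; // cluster ID stored on the removed host's record
        int restarted = 0;
        for (int job = 0; job < vmCount; job++) {
            // Once the removal flow completes, the old behaviour clears the host's cluster ID.
            if (clearClusterIdOnRemoval && job == jobsHandledBeforeRemoval) {
                hostClusterId = null;
            }
            // Each pending job resolves force.ha through the host's cluster ID.
            boolean forceHa = hostClusterId == null
                    ? GLOBAL_FORCE_HA
                    : CLUSTER_FORCE_HA.getOrDefault(hostClusterId, GLOBAL_FORCE_HA);
            if (forceHa) {
                restarted++;
            }
        }
        return restarted;
    }
}
```

For example, take 35 VMs with the removal flow finishing after a single job. In that case `restartedVms(35, 1, true)` returns 1, while `restartedVms(35, 1, false)` returns 35.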
   
   To validate that removing the cleanup fixed this, I used an environment with 2 hosts. I provisioned 35 VMs on one host and set `ha.workers` to `1`. Then, I forcefully removed the host where the VMs were provisioned. I confirmed that all the VMs were stopped and restarted on the other host.

