erikbocks opened a new pull request, #13391: URL: https://github.com/apache/cloudstack/pull/13391
### Description

Currently, workflows that use the `force.ha` configuration only consider its global value, even though the configuration has `Cluster` scope. To fix this, the workflows now read the cluster-scoped value, falling back to the global value when no value is configured at cluster level.

In some code paths, the cluster ID was not available in the method that reads the configuration value, and the host could be `null`. In these cases, `findClusterAndHostIdForVM` is used, as it looks up the VM's host (or last host) and that host's cluster. If both the host and the cluster are `null`, the cluster ID of the storage pool where the VM's volume is allocated is returned.

This PR also removes the cleanup of the host's cluster ID that was executed during host removal. No impact was observed from keeping the host's cluster ID, and the processes that could be affected by it already contain validations to prevent errors.

### Types of changes

- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] Build/CI
- [ ] Test (unit or integration test code)

### Feature/Enhancement Scale or Bug Severity

#### Bug Severity

- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [X] Minor
- [ ] Trivial

### Screenshots (if appropriate):

### How Has This Been Tested?

First, I validated that, without the changes, VMs whose offerings did not have HA enabled were not restarted, even though the `force.ha` configuration was enabled at cluster scope. After installing the packages with the new changes, I ran the following tests.
All tests were executed twice: once with the configuration enabled globally, and once with it enabled at cluster scope.

| Test | Result |
| ---- | ------ |
| Kill the VM's process with `kill -9 <pid>` | The VM was identified as shut down and was restarted automatically. |
| Hard power-off of the VM's host | The host was identified as `Disconnected`, and the VM was restarted on another node of the environment. |
| Forced host removal | ACS identified that there were VMs running on the host and restarted them automatically on another node. |

---

Regarding the removal of the cluster ID cleanup, the following tests were conducted to check whether keeping the cluster ID would cause inconsistencies:

| Test | Result |
| ---- | ------ |
| Host removal | The host was removed, and its cluster ID was kept. |
| Host removal with running VMs | The host was removed, the cluster ID was kept, and the VMs were restarted on another node. |
| Host reintroduction | The host was reintroduced successfully. |
| Cluster and host creation and removal | No issue was found during the creation or the removal of both resources. |
| Removing the host from Cluster 1, deleting Cluster 1, and adding the host to Cluster 2 | The host was added successfully. |

Before the cluster ID cleanup was removed, HA restart jobs were created during a host's forced removal for the workers (`ha.workers` configuration) to process. However, if there were not enough workers available to process the jobs and the `force.ha` configuration was enabled only at cluster scope, the host removal flow could finish before all HA jobs were processed, leaving VMs in an inconsistent state. To validate that this was fixed by removing the cleanup, in an environment with 2 hosts, I provisioned 35 VMs on one host and set `ha.workers` to `1`. Then, I forcefully removed the host where the VMs were provisioned and validated that all the VMs were stopped and restarted on the other host.
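The lookup order from the Description can be illustrated with a small, self-contained Java model. Everything in it is hypothetical: `ForceHaLookupSketch`, `Vm`, and the in-memory maps are stand-ins, not CloudStack classes, and the storage-pool fallback (used only when neither a host nor a last host is found) is a simplified reading of how `findClusterAndHostIdForVM` is described above. The last case in `main` also assumes, for illustration only, that a host whose cluster ID was cleared leaves the lookup with no cluster, so the global value is used; this is one plausible way a cluster-scoped `force.ha` could be missed after the cleanup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the force.ha lookup; none of these types are CloudStack classes. */
public class ForceHaLookupSketch {

    /** Hypothetical VM view: current host, last host, and the pool holding its volume. */
    public record Vm(Long hostId, Long lastHostId, Long volumePoolId) {}

    private final Map<Long, Long> hostToCluster = new HashMap<>();
    private final Map<Long, Long> poolToCluster = new HashMap<>();
    private final Map<Long, Boolean> clusterForceHa = new HashMap<>();
    private final boolean globalForceHa;

    public ForceHaLookupSketch(boolean globalForceHa) {
        this.globalForceHa = globalForceHa;
    }

    // A null clusterId models a host whose cluster ID was cleared.
    public void mapHost(long hostId, Long clusterId) { hostToCluster.put(hostId, clusterId); }
    public void mapPool(long poolId, long clusterId) { poolToCluster.put(poolId, clusterId); }
    public void setClusterForceHa(long clusterId, boolean value) { clusterForceHa.put(clusterId, value); }

    /** Host first, then last host; with neither, fall back to the volume's storage pool. */
    public Long resolveClusterId(Vm vm) {
        Long hostId = vm.hostId() != null ? vm.hostId() : vm.lastHostId();
        Long clusterId = hostId != null ? hostToCluster.get(hostId) : null;
        if (hostId == null && vm.volumePoolId() != null) {
            // No host and therefore no host cluster: use the storage pool's cluster.
            clusterId = poolToCluster.get(vm.volumePoolId());
        }
        return clusterId;
    }

    /** Cluster-scoped value when one is set for the cluster, otherwise the global value. */
    public boolean isForceHaEnabled(Long clusterId) {
        if (clusterId != null) {
            Boolean clusterValue = clusterForceHa.get(clusterId);
            if (clusterValue != null) {
                return clusterValue;
            }
        }
        return globalForceHa;
    }

    public static void main(String[] args) {
        ForceHaLookupSketch lookup = new ForceHaLookupSketch(false); // global force.ha = false
        lookup.mapHost(1L, 10L);
        lookup.mapPool(100L, 10L);
        lookup.setClusterForceHa(10L, true); // force.ha enabled only for cluster 10

        Vm running = new Vm(1L, null, 100L);
        Vm withoutHost = new Vm(null, null, 100L);
        System.out.println(lookup.isForceHaEnabled(lookup.resolveClusterId(running)));     // true
        System.out.println(lookup.isForceHaEnabled(lookup.resolveClusterId(withoutHost))); // true

        // Illustrative only: host 1 loses its cluster ID, as the removed cleanup would do.
        lookup.mapHost(1L, null);
        System.out.println(lookup.isForceHaEnabled(lookup.resolveClusterId(running)));     // false
    }
}
```

With `force.ha` disabled globally and enabled only for cluster 10, the first two lookups print `true` (via the host's cluster and via the storage pool, respectively), and the last prints `false`, because the cleared host cluster leaves no cluster ID to read.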
-- This is an automated message from the Apache Git Service.
