sureshanaparti opened a new pull request, #10674: URL: https://github.com/apache/cloudstack/pull/10674
### Description This PR improves the current direct agents rebalancing activity. Sometimes hypervisor hosts (direct agents) stuck with Disconnect state during agent rebalancing activity across multiple management server nodes. This issue was noticed during frequent restart of the management server nodes in the cluster. When there are multiple management server nodes in a cluster, if one or more nodes are shutdown/start/restart, CloudStack will rebalance the hosts among the remaining nodes or move the nodes to the newly joined management server nodes. During the rebalancing period multiple operations could happen including: - DirectAgentScan at interval of configured direct.agent.scan.interval - AgentRebalanceScan to identify and schedule rebalance agents - TransferAgentScan to transfer the host from original owner to future owner **Current Rebalance behavior** 1. For hosts that have AgentAttache && not forForward but in Disconnect state, CloudStack simply ignore these hosts without trying to ping again or update the status of the host. 2. For hosts that have AgentAttache && forForward, CloudStack removes the agent but still try to loadDirectlyConnectedHost. **Improved Rebalance behavior** During DirectAgentScan: scanDirectAgentToLoad(), identify hosts that for self-managed hosts that are in Disconnect state (disconnected after pingtimeout). 1. For hosts that have AgentAttache and is forForward, CloudStack should remove the agent 2. For hosts that have AgentAttache and is not forForward but in Disconnect state, CloudStack should try to investigate and update the status to Up if host is pingable. 3. For hosts that don't have AgentAttache, CloudStack should try to loadDirectlyConnectedHost. <!--- Describe your changes in DETAIL - And how has behaviour functionally changed. --> <!-- For new features, provide link to FS, dev ML discussion etc. --> <!-- In case of bug fix, the expected and actual behaviours, steps to reproduce. --> <!-- When "Fixes: #<id>" is specified, the issue/PR will automatically be closed when this PR gets merged --> <!-- For addressing multiple issues/PRs, use multiple "Fixes: #<id>" --> <!-- Fixes: # --> <!--- ******************************************************************************* --> <!--- NOTE: AUTOMATION USES THE DESCRIPTIONS TO SET LABELS AND PRODUCE DOCUMENTATION. --> <!--- PLEASE PUT AN 'X' in only **ONE** box --> <!--- ******************************************************************************* --> ### Types of changes - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] New feature (non-breaking change which adds functionality) - [ ] Bug fix (non-breaking change which fixes an issue) - [x] Enhancement (improves an existing feature and functionality) - [ ] Cleanup (Code refactoring and cleanup, that may add test cases) - [ ] build/CI - [ ] test (unit or integration test code) ### Feature/Enhancement Scale or Bug Severity #### Feature/Enhancement Scale - [ ] Major - [x] Minor #### Bug Severity - [ ] BLOCKER - [ ] Critical - [ ] Major - [ ] Minor - [ ] Trivial ### Screenshots (if appropriate): ### How Has This Been Tested? <!-- Please describe in detail how you tested your changes. --> <!-- Include details of your testing environment, and the tests you ran to --> #### How did you try to break this feature and the system with this change? <!-- see how your change affects other areas of the code, etc. --> <!-- Please read the [CONTRIBUTING](https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md) document --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org