soreana opened a new pull request, #8328:
URL: https://github.com/apache/cloudstack/pull/8328
### Description
Occasionally the `hostStats` object of an agent becomes null in the management
server. This is rare and we haven't found the root cause yet, but it does
occur in our CloudStack deployments with many hosts.
The `hostStats` object is null even though the agent is UP and hosting
multiple VMs. It is still possible to access the VM consoles and execute tasks
on them.
This pull request doesn't address the root cause; rather, it exposes those
hosts in Prometheus so that we can restart the agent and get the necessary
information.
### Types of changes
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
### Feature/Enhancement Scale or Bug Severity
#### Feature/Enhancement Scale
- [x] Major
- [ ] Minor
#### Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [x] Major
- [ ] Minor
- [ ] Trivial
### How Has This Been Tested?
1. Set `prometheus.exporter.enable` to `true`.
2. Run `curl localhost:9595/metrics` on the management server to make sure
that the Prometheus exporter is working.
3. Stop any agent.
4. Run `curl localhost:9595/metrics | grep cloudstack_host_missing_info`. It
prints nothing, because the host stats object is still there. (If you wait a
couple of minutes, the management server may remove it.)
5. Restart the management server to remove the cached host stats objects from
memory.
6. Run `curl localhost:9595/metrics | grep cloudstack_host_missing_info`
again to get the following output:
```
curl localhost:9595/metrics | grep cloudstack_host_missing_info
cloudstack_host_missing_info{zone="testZone1",hostname="node01",filter="hostStats"} -1
```
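For repeated checks, the `grep` in step 6 could be wrapped in a small helper that prints one line per affected host. This is only a sketch and not part of the pull request: the function name `list_hosts_missing_info` is hypothetical, and it assumes the labels appear in the order `zone`, `hostname`, `filter`, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): reads Prometheus text-format
# metrics on stdin and prints "zone hostname filter" for every
# cloudstack_host_missing_info sample.
# Assumption: labels appear in the order zone, hostname, filter.
list_hosts_missing_info() {
    sed -n 's/^cloudstack_host_missing_info{zone="\([^"]*\)",hostname="\([^"]*\)",filter="\([^"]*\)"}.*/\1 \2 \3/p'
}

# Usage against a management server, using the port from the steps above:
#   curl -s localhost:9595/metrics | list_hosts_missing_info
```

An empty result means no host is currently reported with missing information.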
#### How did you try to break this feature and the system with this change?
This change should not affect other areas of the code, as the Prometheus
module is a largely independent part of CloudStack.