There's the consecutive-build-failure auto-disable [1], but I would have expected you to see error logs like [2] if that's what you're hitting.
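If that code path is the culprit, you can neutralize it while you chase the underlying build failures. A minimal nova.conf sketch for the compute nodes, assuming the option referenced in [1] (consecutive_build_service_disable_threshold, which defaults to 10 in Pike) is what's firing:

```ini
# nova.conf on the affected compute nodes -- a sketch, assuming the
# auto-disable-on-consecutive-build-failures option in [1] is the cause.
[compute]
# 0 turns the auto-disable behavior off entirely; any positive value is
# the number of consecutive failed builds tolerated before nova disables
# the service itself.
consecutive_build_service_disable_threshold = 0
```

It may also be worth running `openstack compute service list --long` next time it happens; if this path fired, the disabled-reason column should say so rather than being empty the way an operator-issued `--disable` without a reason would be.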
[1] https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L627-L645
[2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1714-L1716

efried

On 01/31/2018 03:16 PM, Chris Apsey wrote:
> All,
>
> Running into a strange issue I haven't seen before.
>
> Randomly, the nova-compute services on compute nodes are disabling
> themselves (as if someone ran openstack compute service set --disable
> hostX nova-compute). When this happens, the node continues to report
> itself as 'up' - the service is just disabled. As a result, if enough
> of these occur, we get scheduling errors due to lack of available
> resources (which makes sense). Re-enabling them works just fine and
> they continue on as if nothing happened. I looked through the logs and
> I can find the API calls where we re-enable the services (PUT
> /v2.1/os-services/enable), but I do not see any API calls where the
> services are getting disabled initially.
>
> Is anyone aware of any cases where compute nodes will automatically
> disable their nova-compute service on their own, or has anyone seen this
> before and might know a root cause? We have plenty of spare vcpus and
> RAM on each node - less than 25% utilization (both in absolute
> terms and in terms of applied ratios).
>
> We're seeing follow-on errors regarding rmq messages getting lost and
> vif-plug failures, but we think those are a symptom, not a cause.
>
> Currently running Pike on Xenial.
>
> ---
> v/r
>
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators