Hi Folks,

I'd like to get some eyes on a bug I just filed:  
https://bugs.launchpad.net/nova/+bug/1250049

A recent change (https://review.openstack.org/#/c/52189/9) introduced the 
automatic disable / re-enable of nova-compute when the connection to libvirt 
is lost and recovered. The problem is that it takes no account of the fact 
that a cloud administrator may have other reasons for disabling a service, 
and it always puts nova-compute back into an enabled state.

The impact of this is pretty big for us - at any point in time we have a number 
of servers disabled for various operational reasons, and there are times when 
we need to restart libvirt as part of a deployment.  With this change in place 
all of those hosts are returned to an enabled state, and the reason that they 
were disabled is lost.

While I like the concept that an error condition like this should disable the 
host from a scheduling perspective, I think it needs to be implemented as an 
additional form of disablement (i.e. a separate value kept in the ServiceGroup 
API), not an override of the current one.
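
As a rough illustration of the idea (every name below is hypothetical and 
made up for this sketch; none of it is existing nova or ServiceGroup code), 
the administrator's disable flag and the fault-driven one could be tracked 
separately, with a host being schedulable only when neither is set:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    """Hypothetical model of a compute service; not nova's real Service object."""

    # Set and cleared only by the cloud administrator.
    admin_disabled: bool = False
    admin_disabled_reason: Optional[str] = None
    # Set and cleared only by the automatic libvirt connection check.
    fault_disabled: bool = False

    def on_libvirt_lost(self) -> None:
        # Record the fault without touching the administrator's setting.
        self.fault_disabled = True

    def on_libvirt_recovered(self) -> None:
        # Clear only the fault flag; any admin disable and its reason survive.
        self.fault_disabled = False

    @property
    def schedulable(self) -> bool:
        # The scheduler would skip the host if either flag is set.
        return not (self.admin_disabled or self.fault_disabled)


# A host disabled for maintenance stays disabled across a libvirt restart.
svc = ServiceState(admin_disabled=True, admin_disabled_reason="maintenance")
svc.on_libvirt_lost()
svc.on_libvirt_recovered()
print(svc.schedulable, svc.admin_disabled_reason)
```

With this separation, restarting libvirt during a deployment would not 
re-enable hosts that were disabled for operational reasons.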

I'd like to propose that the current change is reverted as a priority, and a 
new approach then submitted as a second step that works alongside the current 
enable/disable reason.

Sorry for not catching this in the review stage - I didn't notice this one at 
all.

Phil
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev