On Dec 10, 2015, at 15:57, Devananda van der Veen <devananda....@gmail.com> wrote:
> So, at this point, I think we need to accept that the scheduling of
> virtualized and bare metal workloads are two different problem domains
> that are equally complex.
>
> Either, we:
> * build a separate scheduler process in Ironic, forking the Nova
>   scheduler as a starting point so as to be compatible with existing
>   plugins; or
> * begin building a direct integration between nova-scheduler and
>   ironic, and create a non-virtualization-centric resource tracker
>   within Nova; or
> * proceed with the plan we previously outlined, accept that this isn't
>   going to be backwards compatible with nova filter plugins, and
>   apologize to any operators who rely on using the same scheduler
>   plugins for baremetal and virtual resources; or
> * keep punting on this, bringing pain and suffering to all operators of
>   bare metal clouds, because nova-compute must be run as exactly one
>   process for all sizes of clouds.

Speaking only for myself, I find the current direction unfortunate, but at the same time understandable, given how long it's been discussed and the need to act now.

It becomes apparent to me when I think about the future picture and imagine what the Compute API should look like for all end users of vm/baremetal/container. They should be able to call one API to create an instance, and the cloud will do the right thing. I can see Nova being that API (entrypoint + scheduling, then handoff via driver to the vm/baremetal/container API). An alternative would be a separate, new frontend API that hands off to a separate scheduling API (scheduler break out), which in turn hands off to the various compute APIs (vm/baremetal/container).

I realized that if we were able to do a 1:1 ratio of nova-compute to Ironic node, everything would work fine as-is. But I understand the problems with that: nova-compute processes can't be run on the inventory nodes themselves, so you're left with a ton of processes that you have to find somewhere to run, which is wasteful. Ironic simply doesn't "fit in" to the model of 1:1 nova-compute to resource.

My concern with the current plan is the need to sync constructs like aggregates and availability zones from one system (Nova) to the other (Ironic) in perpetuity. Users will have to set them up in both systems and keep them in sync, and the code itself, along with the filters, effectively has to be duplicated and kept in sync as well. Eventually, I imagine, Nova and Ironic would each end up as separate standalone systems just to avoid the sync issues.

I'd rather we provided something like a more generic "Resource View API" in Nova that allows baremetal/container/clustered-hypervisor environments to report resources via a REST API, with scheduling occurring based on the resources table (instead of having resource trackers). Each environment reporting resources would provide corresponding in-tree Nova scheduler filters that know what to do with the resources related to them. Scheduling would then select a resource, look up the compute host responsible for that resource, and that nova-compute would delegate the chosen resource to, for example, Ironic. This same concept could exist in a separate scheduler service instead of Nova, but I don't see why it can't be in Nova.
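To make the "Resource View" idea a bit more concrete, here is a minimal sketch of the kind of in-tree filter I'm picturing. All of the names here (the filter class, the shape of the reported resource record, the request_spec fields) are hypothetical and don't exist in Nova today; the point is only to show a filter operating on generic records reported by an external service rather than on a compute node's resource tracker:

    # Hypothetical sketch only -- none of these names are existing Nova
    # interfaces. A "resource" is whatever record an external service
    # (e.g. Ironic) reported to the generic Resource View API, e.g.:
    #   {'type': 'baremetal', 'host': 'compute-host-1',
    #    'properties': {'cpus': 24, 'memory_mb': 131072, 'local_gb': 500}}

    class BaremetalResourceFilter(object):
        """Pass baremetal resources whose reported properties can
        satisfy the requested flavor."""

        def resource_passes(self, resource, request_spec):
            if resource.get('type') != 'baremetal':
                # Not a resource type this filter understands; leave the
                # decision to the filters provided for other environments.
                return True
            props = resource.get('properties', {})
            flavor = request_spec['flavor']
            return (props.get('cpus', 0) >= flavor['vcpus'] and
                    props.get('memory_mb', 0) >= flavor['memory_mb'] and
                    props.get('local_gb', 0) >= flavor['root_gb'])

A container or clustered-hypervisor environment could ship its own filter the same way, since the filter only depends on the generic record that environment reported, not on any virtualization-specific resource tracker.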
I figure we could either enhance Nova and eventually forklift the virtualization driver code out into a thin service that manages vms, or we could build a new frontend service and a scheduling service and forklift the scheduling bits out of Nova so that it ends up being a thin service. The end result seems really similar to me, though one could argue that there are other systems that want to share scheduling code but aren't provisioning compute, and thus scheduling would have to move out of Nova anyway.

With the current direction, I see things going toward separate standalone systems with duplicated constructs, eventually refactored to use common services down the road if and when they exist. I would personally prefer a direction toward something like a Resource View API in Nova that generalizes resources, so that compute services like Ironic don't have to duplicate scheduling, aggregates, availability zones, etc.

-melanie