Hi Vish,

Is this discussion a long-term goal, or is it for the Folsom release?
We still believe the bare-metal database is needed, because there is no automated way for bare-metal nodes to report their capabilities to their bare-metal nova-compute node.

Thanks,
David

> > I am interested in finding a solution that enables bare-metal and
> > virtualized requests to be serviced through the same scheduler,
> > where the compute_nodes table has a full view of schedulable
> > resources. This would seem to simplify the end-to-end flow while
> > opening up some additional use cases (e.g. dynamic allocation of a
> > node from bare-metal to hypervisor and back).
> >
> > One approach would be to have a proxy running a single nova-compute
> > daemon fronting the bare-metal nodes. That nova-compute daemon would
> > report up many HostState objects (one per bare-metal node) to become
> > entries in the compute_nodes table, accessible through the
> > scheduler's HostManager object.
> >
> > The HostState object would set the cpu_info, vcpus, memory_mb and
> > local_gb values to be used for scheduling, with the hypervisor_host
> > field holding the bare-metal machine address (e.g. for IPMI-based
> > commands) and hypervisor_type = NONE. The bare-metal flavors would be
> > created with an extra_spec of hypervisor_type = NONE, and the
> > corresponding compute_capabilities_filter would reduce the available
> > hosts to those bare-metal nodes. The scheduler would need to
> > understand that hypervisor_type = NONE means you need an exact-fit
> > (or best-fit) host rather than weighting them (perhaps through the
> > multi-scheduler). The scheduler would cast out the message to
> > <topic>.<service-hostname> (the code today uses the HostState
> > hostname), with the compute driver having to understand whether the
> > request must be serviced elsewhere (this does not break any existing
> > implementations, since today it is 1-to-1).
> >
> > Does this solution seem workable? Anything I missed?
>
> The bare metal driver is already proxying for the other nodes, so it
> sounds like we need a couple of things to make this happen:
>
> a) Modify driver.get_host_stats to be able to return a list of host
> stats instead of just one, and report the whole list back to the
> scheduler. We could modify the receiving end to accept a list as
> well, or just make multiple calls to
> self.update_service_capabilities(capabilities).
>
> b) Make a few minor changes to the scheduler to make sure filtering
> still works. Note the changes here may be very helpful:
>
> https://review.openstack.org/10327
>
> c) We have to make sure that instances launched on those nodes take
> up the entire host state somehow. We could probably do this by making
> sure that the instance_type ram, disk, etc. match what the node has,
> but we may want a new boolean field "used" if those aren't
> sufficient.
>
> This approach seems pretty good. We could potentially get rid of the
> shared bare_metal_node table. I guess the only other concern is how
> you populate the capabilities that the bare-metal nodes are
> reporting. I guess an API extension that RPCs to a bare-metal node to
> add the node. Maybe someday this could be autogenerated by the bare
> metal host looking in its arp table for dhcp requests!
> :)
>
> Vish
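To make the quoted proposal concrete, here is a minimal sketch (hypothetical names and values, not actual Nova code) of the per-node capability entries a single proxy nova-compute daemon might report, one per bare-metal node, together with the matching flavor extra_spec. All of the dict keys come from the proposal above; the addresses and sizes are made up:

    # Hypothetical per-node capability entries reported by one proxy
    # nova-compute daemon; each would become its own compute_nodes row.
    # hypervisor_host carries the machine address used for out-of-band
    # control (e.g. IPMI); hypervisor_type 'NONE' marks it bare metal.
    bare_metal_nodes = [
        {'hypervisor_host': '10.1.0.11', 'hypervisor_type': 'NONE',
         'cpu_info': 'x86_64', 'vcpus': 8, 'memory_mb': 16384,
         'local_gb': 500},
        {'hypervisor_host': '10.1.0.12', 'hypervisor_type': 'NONE',
         'cpu_info': 'x86_64', 'vcpus': 16, 'memory_mb': 32768,
         'local_gb': 1000},
    ]

    # A bare-metal flavor would carry the matching extra_spec, so the
    # capabilities filter reduces the candidates to bare-metal nodes.
    bare_metal_flavor_extra_specs = {'hypervisor_type': 'NONE'}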
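Point (a) could look roughly like the following sketch. The driver class and the publish helper are hypothetical; the only call taken from the discussion is update_service_capabilities(capabilities), passed in here as a plain function:

    class BareMetalDriverSketch(object):
        """Hypothetical driver fronting many bare-metal nodes."""

        def __init__(self, nodes):
            # Per-node stats dicts, as in the sketch above, gathered
            # out of band by the proxy.
            self._nodes = nodes

        def get_host_stats(self, refresh=False):
            # Return a list of host-stats dicts (one per node) instead
            # of the single dict virtualized drivers return today.
            return [dict(node) for node in self._nodes]


    def publish_capabilities(driver, update_service_capabilities):
        """Report every node's stats without breaking old drivers."""
        stats = driver.get_host_stats(refresh=True)
        if not isinstance(stats, list):
            stats = [stats]  # existing drivers still return one dict
        for entry in stats:
            # One update per node, as suggested in (a), rather than
            # teaching the receiving end to accept a list.
            update_service_capabilities(entry)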
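And point (c), the exact-fit requirement, might reduce to a check like this inside a scheduler filter. The "used" boolean is the new field suggested above; the function name and the pass-through for virtualized hosts are assumptions for illustration:

    def baremetal_host_passes(host_state, instance_type):
        """Hypothetical exact-fit check for bare-metal nodes."""
        if host_state.get('hypervisor_type') != 'NONE':
            return True  # virtualized hosts keep the normal weighting
        if host_state.get('used'):
            return False  # a bare-metal node is consumed whole
        # Exact fit: the flavor must match the node's resources so one
        # instance takes up the entire host state.
        return (host_state['vcpus'] == instance_type['vcpus'] and
                host_state['memory_mb'] == instance_type['memory_mb'] and
                host_state['local_gb'] == instance_type['local_gb'])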