On Aug 15, 2012, at 3:17 PM, Michael J Fork <mjf...@us.ibm.com> wrote:

> I am interested in finding a solution that enables bare-metal and virtualized 
> requests to be serviced through the same scheduler where the compute_nodes 
> table has a full view of schedulable resources.  This would seem to simplify 
> the end-to-end flow while opening up some additional use cases (e.g. dynamic 
> allocation of a node from bare-metal to hypervisor and back).  
> 
> One approach would be to have a proxy running a single nova-compute daemon 
> fronting the bare-metal nodes.  That nova-compute daemon would report up 
> many HostState objects (1 per bare-metal node) to become entries in the 
> compute_nodes table and accessible through the scheduler HostManager object.
> 
> The HostState object would set cpu_info, vcpus, memory_mb and local_gb values 
> to be used for scheduling with the hypervisor_host field holding the 
> bare-metal machine address (e.g. for IPMI based commands) and hypervisor_type 
> = NONE.  The bare-metal Flavors are created with an extra_spec of 
> hypervisor_type= NONE and the corresponding compute_capabilities_filter would 
> reduce the available hosts to those bare_metal nodes.  The scheduler would 
> need to understand that hypervisor_type = NONE means you need an exact fit 
> (or best-fit) host vs weighting them (perhaps through the multi-scheduler).  
> The scheduler would cast out the message to the <topic>.<service-hostname> 
> (code today uses the HostState hostname), with the compute driver having to 
> understand if it must be serviced elsewhere (but does not break any existing 
> implementations since it is 1 to 1).
> 
> 
> Does this solution seem workable? Anything I missed?
> 
The bare metal driver is already proxying for the other nodes, so it sounds like 
we need a couple of things to make this happen:

a) modify driver.get_host_stats to be able to return a list of host stats 
instead of just one. Report the whole list back to the scheduler. We could 
modify the receiving end to accept a list as well or just make multiple calls 
to 
        self.update_service_capabilities(capabilities)

b) make a few minor changes to the scheduler to make sure filtering still 
works. Note the changes here may be very helpful:

https://review.openstack.org/10327

c) we have to make sure that instances launched on those nodes take up the 
entire host state somehow. We could probably do this by making sure that the 
instance_type's RAM, disk, etc. match what the node has, but we may want a new 
boolean field "used" if those aren't sufficient.
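
A rough sketch of (a) and (c) in Python, for illustration only. The class,
function, and dictionary key names below are hypothetical stand-ins, not
nova's actual driver or scheduler code; the "used" flag is the possible new
boolean field mentioned in (c).

```python
# Illustrative sketch only: every name here is an assumption, not nova's API.


class FakeBareMetalDriver:
    """Hypothetical stand-in for a driver proxying several bare-metal nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def get_host_stats(self):
        # (a) Return one stats dict per proxied node instead of a single dict.
        return [
            {
                "hypervisor_host": node["address"],  # e.g. IPMI address
                "hypervisor_type": "NONE",
                "vcpus": node["vcpus"],
                "memory_mb": node["memory_mb"],
                "local_gb": node["local_gb"],
                "used": node.get("used", False),
            }
            for node in self.nodes
        ]


def report_capabilities(driver, update_service_capabilities):
    # Leave the receiving end unchanged by making one call per node.
    for capabilities in driver.get_host_stats():
        update_service_capabilities(capabilities)


def exact_fit(host_stats, instance_type):
    # (c) A bare-metal instance must take up the whole node, so keep only
    # unused nodes whose resources match the requested type exactly.
    return [
        host
        for host in host_stats
        if not host["used"]
        and host["vcpus"] == instance_type["vcpus"]
        and host["memory_mb"] == instance_type["memory_mb"]
        and host["local_gb"] == instance_type["local_gb"]
    ]
```

Here report_capabilities takes the update function as a parameter so the
receiving side can stay as it is; accepting a list in one call would be the
alternative.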

This approach seems pretty good. We could potentially get rid of the shared 
bare_metal_node table. The only other concern is how you populate the 
capabilities that the bare metal nodes are reporting. One option is an API 
extension that RPCs to a baremetal node to add the node. Maybe someday this 
could be autogenerated by the bare metal host looking in its arp table for 
dhcp requests! :)

Vish

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
