Hello all,

It seems the only requirement on the keys of HostManager.service_states is that they be unique; they do not have to be valid hostnames or message queues (existing code already casts messages to <topic>.<service-hostname>, doesn't it, Michael?). So I tried using '<host>/<bm_node_id>' as the 'host' of the capabilities. HostManager.service_states then becomes:

    { <host>/<bm_node_id> : { <service> : { cap k : v } } }

So far it works fine. What do you think of this approach?
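Just to make the intended structure concrete, here is a hypothetical service_states dict for two bare-metal nodes behind one bare-metal nova-compute host (the host name, node ids, and capability values are made up for illustration):

# Illustrative only: hypothetical contents of HostManager.service_states
# when the capability 'host' is reported as '<host>/<bm_node_id>'.
service_states = {
    'bm-proxy01/1': {
        'compute': {
            'vcpus': 8,
            'memory_mb': 16384,
            'local_gb': 500,
            'cpu_arch': 'x86_64',
        },
    },
    'bm-proxy01/2': {
        'compute': {
            'vcpus': 16,
            'memory_mb': 32768,
            'local_gb': 1000,
            'cpu_arch': 'x86_64',
        },
    },
}
# Each key is unique even though both nodes sit behind the same service
# host, so periodic capability updates no longer overwrite each other.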
I have pasted the relevant code at the bottom of this mail just to make sure we are talking about the same thing.

NOTE: I added a new column 'nodename' to compute_nodes to store bm_node_id, but storing it in 'hypervisor_hostname' may be the right solution instead. (The whole code is in our GitHub repository (NTTdocomo-openstack/nova, branch 'multinode'); multiple resource_trackers are also implemented there.)

Thanks,
Arata
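(For reference, a minimal sketch of the kind of sqlalchemy-migrate script that would add the 'nodename' column; the exact migration in our branch may differ, and the column length is just a guess:)

# Hypothetical migration sketch only; not copied from the branch.
from sqlalchemy import Column, MetaData, String, Table


def upgrade(migrate_engine):
    meta = MetaData()
    meta.bind = migrate_engine
    compute_nodes = Table('compute_nodes', meta, autoload=True)
    # Nullable so existing (non bare-metal) compute_nodes rows are unaffected.
    nodename = Column('nodename', String(255), nullable=True)
    compute_nodes.create_column(nodename)


def downgrade(migrate_engine):
    meta = MetaData()
    meta.bind = migrate_engine
    compute_nodes = Table('compute_nodes', meta, autoload=True)
    compute_nodes.drop_column('nodename')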
diff --git a/nova/scheduler/host_manager.py b/nova/scheduler/host_manager.py
index 33ba2c1..567729f 100644
--- a/nova/scheduler/host_manager.py
+++ b/nova/scheduler/host_manager.py
@@ -98,9 +98,10 @@ class HostState(object):
     previously used and lock down access.
     """

-    def __init__(self, host, topic, capabilities=None, service=None):
+    def __init__(self, host, topic, capabilities=None, service=None, nodename=None):
         self.host = host
         self.topic = topic
+        self.nodename = nodename

         # Read-only capability dicts
@@ -175,8 +176,8 @@ class HostState(object):
         return True

     def __repr__(self):
-        return ("host '%s': free_ram_mb:%s free_disk_mb:%s" %
-                (self.host, self.free_ram_mb, self.free_disk_mb))
+        return ("host '%s' / nodename '%s': free_ram_mb:%s free_disk_mb:%s" %
+                (self.host, self.nodename, self.free_ram_mb, self.free_disk_mb))


 class HostManager(object):
@@ -268,11 +269,16 @@ class HostManager(object):
                 LOG.warn(_("No service for compute ID %s") % compute['id'])
                 continue
             host = service['host']
-            capabilities = self.service_states.get(host, None)
+            if compute['nodename']:
+                host_node = '%s/%s' % (host, compute['nodename'])
+            else:
+                host_node = host
+            capabilities = self.service_states.get(host_node, None)
             host_state = self.host_state_cls(host, topic,
                     capabilities=capabilities,
-                    service=dict(service.iteritems()))
+                    service=dict(service.iteritems()),
+                    nodename=compute['nodename'])
             host_state.update_from_compute_node(compute)
-            host_state_map[host] = host_state
+            host_state_map[host_node] = host_state

         return host_state_map
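(A rough usage sketch of what the scheduler ends up with after this change; 'host_manager', the call signature, and the host names are assumptions for illustration, not code from the branch:)

# Illustrative only.  host_state_map is now keyed by '<host>/<bm_node_id>'
# for bare-metal nodes and by plain '<host>' for ordinary compute hosts.
host_state_map = host_manager.get_all_host_states(context, 'compute')
for host_node, host_state in host_state_map.iteritems():
    # e.g. 'bm-proxy01/1' -> HostState(host='bm-proxy01', nodename='1')
    #      'kvm-host03'   -> HostState(host='kvm-host03', nodename=None)
    print host_node, host_state.free_ram_mb

# When the chosen host is cast to, the service hostname (host_state.host,
# not host_node) should still be used so the message reaches the
# bare-metal nova-compute proxy queue.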
diff --git a/nova/virt/baremetal/driver.py b/nova/virt/baremetal/driver.py
index 087d1b6..dbcfbde 100644
--- a/nova/virt/baremetal/driver.py
+++ b/nova/virt/baremetal/driver.py
(skip...)
+    def _create_node_cap(self, node):
+        dic = self._node_resources(node)
+        dic['host'] = '%s/%s' % (FLAGS.host, node['id'])
+        dic['cpu_arch'] = self._extra_specs.get('cpu_arch')
+        dic['instance_type_extra_specs'] = self._extra_specs
+        dic['supported_instances'] = self._supported_instances
+        # TODO: put node's extra specs
+        return dic

     def get_host_stats(self, refresh=False):
-        return self._get_host_stats()
+        caps = []
+        context = nova_context.get_admin_context()
+        nodes = bmdb.bm_node_get_all(context,
+                                     service_host=FLAGS.host)
+        for node in nodes:
+            node_cap = self._create_node_cap(node)
+            caps.append(node_cap)
+        return caps

(2012/08/28 5:55), Michael J Fork wrote:
openstack-bounces+mjfork=us.ibm....@lists.launchpad.net wrote on 08/27/2012 02:58:56 PM: > From: David Kang <dk...@isi.edu> > To: Vishvananda Ishaya <vishvana...@gmail.com>, > Cc: OpenStack Development Mailing List <openstack- > d...@lists.openstack.org>, "openstack@lists.launchpad.net \ > (openstack@lists.launchpad.net\)" <openstack@lists.launchpad.net> > Date: 08/27/2012 03:06 PM > Subject: Re: [Openstack] [openstack-dev] Discussion about where to > put database for bare-metal provisioning (review 10726) > Sent by: openstack-bounces+mjfork=us.ibm....@lists.launchpad.net > > > Hi Vish, > > I think I understand your idea. > One service entry with multiple bare-metal compute_node entries are > registered at the start of bare-metal nova-compute. > 'hypervisor_hostname' must be different for each bare-metal machine, > such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.) > But their IP addresses must be the IP address of bare-metal nova- > compute, such that an instance is casted > not to bare-metal machine directly but to bare-metal nova-compute. I believe the change here is to cast out the message to the <topic>.<service-hostname>. Existing code sends it to the compute_node hostname (see line 202 of nova/scheduler/filter_scheduler.py, specifically host=weighted_host.host_state.host). Changing that to cast to the service hostname would send the message to the bare-metal proxy node and should not have an effect on current deployments since the service hostname and the host_state.host would always be equal. This model will also let you keep the bare-metal compute node IP in the compute node table. > One extension we need to do at the scheduler side is using (host, > hypervisor_hostname) instead of (host) only in host_manager.py. > 'HostManager.service_state' is { <host> : { <service > : { cap k : v }}}. > It needs to be changed to { <host> : { <service> : { > <hypervisor_name> : { cap k : v }}}}. > Most functions of HostState need to be changed to use (host, > hypervisor_name) pair to identify a compute node. Would an alternative here be to change the top level "host" to be the hypervisor_hostname and enforce uniqueness? > Are we on the same page, now? > > Thanks, > David > > ----- Original Message ----- > > Hi David, > > > > I just checked out the code more extensively and I don't see why you > > need to create a new service entry for each compute_node entry. The > > code in host_manager to get all host states explicitly gets all > > compute_node entries. I don't see any reason why multiple compute_node > > entries can't share the same service. I don't see any place in the > > scheduler that is grabbing records by "service" instead of by "compute > > node", but if there is one that I missed, it should be fairly easy to > > change it. > > > > The compute_node record is created in the compute/resource_tracker.py > > as of a recent commit, so I think the path forward would be to make > > sure that one of the records is created for each bare metal node by > > the bare metal compute, perhaps by having multiple resource_trackers. > > > > Vish > > > > On Aug 27, 2012, at 9:40 AM, David Kang <dk...@isi.edu> wrote: > > > > > > > > Vish, > > > > > > I think I don't understand your statement fully. > > > Unless we use different hostnames, (hostname, hypervisor_hostname) > > > must be the > > > same for all bare-metal nodes under a bare-metal nova-compute. > > > > > > Could you elaborate the following statement a little bit more? 
> > > > > >> You would just have to use a little more than hostname. Perhaps > > >> (hostname, hypervisor_hostname) could be used to update the entry? > > >> > > > > > > Thanks, > > > David > > > > > > > > > > > > ----- Original Message ----- > > >> I would investigate changing the capabilities to key off of > > >> something > > >> other than hostname. It looks from the table structure like > > >> compute_nodes could be have a many-to-one relationship with > > >> services. > > >> You would just have to use a little more than hostname. Perhaps > > >> (hostname, hypervisor_hostname) could be used to update the entry? > > >> > > >> Vish > > >> > > >> On Aug 24, 2012, at 11:23 AM, David Kang <dk...@isi.edu> wrote: > > >> > > >>> > > >>> Vish, > > >>> > > >>> I've tested your code and did more testing. > > >>> There are a couple of problems. > > >>> 1. host name should be unique. If not, any repetitive updates of > > >>> new > > >>> capabilities with the same host name are simply overwritten. > > >>> 2. We cannot generate arbitrary host names on the fly. > > >>> The scheduler (I tested filter scheduler) gets host names from > > >>> db. > > >>> So, if a host name is not in the 'services' table, it is not > > >>> considered by the scheduler at all. > > >>> > > >>> So, to make your suggestions possible, nova-compute should > > >>> register > > >>> N different host names in 'services' table, > > >>> and N corresponding entries in 'compute_nodes' table. > > >>> Here is an example: > > >>> > > >>> mysql> select id, host, binary, topic, report_count, disabled, > > >>> availability_zone from services; > > >>> +----+-------------+----------------+----------- > +--------------+----------+-------------------+ > > >>> | id | host | binary | topic | report_count | disabled | > > >>> | availability_zone | > > >>> +----+-------------+----------------+----------- > +--------------+----------+-------------------+ > > >>> | 1 | bespin101 | nova-scheduler | scheduler | 17145 | 0 | nova | > > >>> | 2 | bespin101 | nova-network | network | 16819 | 0 | nova | > > >>> | 3 | bespin101-0 | nova-compute | compute | 16405 | 0 | nova | > > >>> | 4 | bespin101-1 | nova-compute | compute | 1 | 0 | nova | > > >>> +----+-------------+----------------+----------- > +--------------+----------+-------------------+ > > >>> > > >>> mysql> select id, service_id, hypervisor_hostname from > > >>> compute_nodes; > > >>> +----+------------+------------------------+ > > >>> | id | service_id | hypervisor_hostname | > > >>> +----+------------+------------------------+ > > >>> | 1 | 3 | bespin101.east.isi.edu | > > >>> | 2 | 4 | bespin101.east.isi.edu | > > >>> +----+------------+------------------------+ > > >>> > > >>> Then, nova db (compute_nodes table) has entries of all bare-metal > > >>> nodes. > > >>> What do you think of this approach. > > >>> Do you have any better approach? > > >>> > > >>> Thanks, > > >>> David > > >>> > > >>> > > >>> > > >>> ----- Original Message ----- > > >>>> To elaborate, something the below. I'm not absolutely sure you > > >>>> need > > >>>> to > > >>>> be able to set service_name and host, but this gives you the > > >>>> option > > >>>> to > > >>>> do so if needed. 
> > >>>> > > >>>> iff --git a/nova/manager.py b/nova/manager.py > > >>>> index c6711aa..c0f4669 100644 > > >>>> --- a/nova/manager.py > > >>>> +++ b/nova/manager.py > > >>>> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager): > > >>>> > > >>>> def update_service_capabilities(self, capabilities): > > >>>> """Remember these capabilities to send on next periodic > > >>>> update.""" > > >>>> + if not isinstance(capabilities, list): > > >>>> + capabilities = [capabilities] > > >>>> self.last_capabilities = capabilities > > >>>> > > >>>> @periodic_task > > >>>> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager): > > >>>> """Pass data back to the scheduler at a periodic interval.""" > > >>>> if self.last_capabilities: > > >>>> LOG.debug(_('Notifying Schedulers of capabilities ...')) > > >>>> - self.scheduler_rpcapi.update_service_capabilities(context, > > >>>> - self.service_name, self.host, self.last_capabilities) > > >>>> + for capability_item in self.last_capabilities: > > >>>> + name = capability_item.get('service_name', self.service_name) > > >>>> + host = capability_item.get('host', self.host) > > >>>> + self.scheduler_rpcapi.update_service_capabilities(context, > > >>>> + name, host, capability_item) > > >>>> > > >>>> On Aug 21, 2012, at 1:28 PM, David Kang <dk...@isi.edu> wrote: > > >>>> > > >>>>> > > >>>>> Hi Vish, > > >>>>> > > >>>>> We are trying to change our code according to your comment. > > >>>>> I want to ask a question. > > >>>>> > > >>>>>>>> a) modify driver.get_host_stats to be able to return a list > > >>>>>>>> of > > >>>>>>>> host > > >>>>>>>> stats instead of just one. Report the whole list back to the > > >>>>>>>> scheduler. We could modify the receiving end to accept a list > > >>>>>>>> as > > >>>>>>>> well > > >>>>>>>> or just make multiple calls to > > >>>>>>>> self.update_service_capabilities(capabilities) > > >>>>> > > >>>>> Modifying driver.get_host_stats to return a list of host stats > > >>>>> is > > >>>>> easy. > > >>>>> Calling muliple calls to > > >>>>> self.update_service_capabilities(capabilities) doesn't seem to > > >>>>> work, > > >>>>> because 'capabilities' is overwritten each time. > > >>>>> > > >>>>> Modifying the receiving end to accept a list seems to be easy. > > >>>>> However, 'capabilities' is assumed to be dictionary by all other > > >>>>> scheduler routines, > > >>>>> it looks like that we have to change all of them to handle > > >>>>> 'capability' as a list of dictionary. > > >>>>> > > >>>>> If my understanding is correct, it would affect many parts of > > >>>>> the > > >>>>> scheduler. > > >>>>> Is it what you recommended? > > >>>>> > > >>>>> Thanks, > > >>>>> David > > >>>>> > > >>>>> > > >>>>> ----- Original Message ----- > > >>>>>> This was an immediate goal, the bare-metal nova-compute node > > >>>>>> could > > >>>>>> keep an internal database, but report capabilities through nova > > >>>>>> in > > >>>>>> the > > >>>>>> common way with the changes below. Then the scheduler wouldn't > > >>>>>> need > > >>>>>> access to the bare metal database at all. > > >>>>>> > > >>>>>> On Aug 15, 2012, at 4:23 PM, David Kang <dk...@isi.edu> wrote: > > >>>>>> > > >>>>>>> > > >>>>>>> Hi Vish, > > >>>>>>> > > >>>>>>> Is this discussion for long-term goal or for this Folsom > > >>>>>>> release? > > >>>>>>> > > >>>>>>> We still believe that bare-metal database is needed > > >>>>>>> because there is not an automated way how bare-metal nodes > > >>>>>>> report > > >>>>>>> their capabilities > > >>>>>>> to their bare-metal nova-compute node. 
> > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> David > > >>>>>>> > > >>>>>>>> > > >>>>>>>> I am interested in finding a solution that enables bare-metal > > >>>>>>>> and > > >>>>>>>> virtualized requests to be serviced through the same > > >>>>>>>> scheduler > > >>>>>>>> where > > >>>>>>>> the compute_nodes table has a full view of schedulable > > >>>>>>>> resources. > > >>>>>>>> This > > >>>>>>>> would seem to simplify the end-to-end flow while opening up > > >>>>>>>> some > > >>>>>>>> additional use cases (e.g. dynamic allocation of a node from > > >>>>>>>> bare-metal to hypervisor and back). > > >>>>>>>> > > >>>>>>>> One approach would be to have a proxy running a single > > >>>>>>>> nova-compute > > >>>>>>>> daemon fronting the bare-metal nodes . That nova-compute > > >>>>>>>> daemon > > >>>>>>>> would > > >>>>>>>> report up many HostState objects (1 per bare-metal node) to > > >>>>>>>> become > > >>>>>>>> entries in the compute_nodes table and accessible through the > > >>>>>>>> scheduler HostManager object. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> The HostState object would set cpu_info, vcpus, member_mb and > > >>>>>>>> local_gb > > >>>>>>>> values to be used for scheduling with the hypervisor_host > > >>>>>>>> field > > >>>>>>>> holding the bare-metal machine address (e.g. for IPMI based > > >>>>>>>> commands) > > >>>>>>>> and hypervisor_type = NONE. The bare-metal Flavors are > > >>>>>>>> created > > >>>>>>>> with > > >>>>>>>> an > > >>>>>>>> extra_spec of hypervisor_type= NONE and the corresponding > > >>>>>>>> compute_capabilities_filter would reduce the available hosts > > >>>>>>>> to > > >>>>>>>> those > > >>>>>>>> bare_metal nodes. The scheduler would need to understand that > > >>>>>>>> hypervisor_type = NONE means you need an exact fit (or > > >>>>>>>> best-fit) > > >>>>>>>> host > > >>>>>>>> vs weighting them (perhaps through the multi-scheduler). The > > >>>>>>>> scheduler > > >>>>>>>> would cast out the message to the <topic>.<service-hostname> > > >>>>>>>> (code > > >>>>>>>> today uses the HostState hostname), with the compute driver > > >>>>>>>> having > > >>>>>>>> to > > >>>>>>>> understand if it must be serviced elsewhere (but does not > > >>>>>>>> break > > >>>>>>>> any > > >>>>>>>> existing implementations since it is 1 to 1). > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Does this solution seem workable? Anything I missed? > > >>>>>>>> > > >>>>>>>> The bare metal driver already is proxying for the other nodes > > >>>>>>>> so > > >>>>>>>> it > > >>>>>>>> sounds like we need a couple of things to make this happen: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> a) modify driver.get_host_stats to be able to return a list > > >>>>>>>> of > > >>>>>>>> host > > >>>>>>>> stats instead of just one. Report the whole list back to the > > >>>>>>>> scheduler. We could modify the receiving end to accept a list > > >>>>>>>> as > > >>>>>>>> well > > >>>>>>>> or just make multiple calls to > > >>>>>>>> self.update_service_capabilities(capabilities) > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> b) make a few minor changes to the scheduler to make sure > > >>>>>>>> filtering > > >>>>>>>> still works. Note the changes here may be very helpful: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> https://review.openstack.org/10327 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> c) we have to make sure that instances launched on those > > >>>>>>>> nodes > > >>>>>>>> take > > >>>>>>>> up > > >>>>>>>> the entire host state somehow. 
We could probably do this by > > >>>>>>>> making > > >>>>>>>> sure that the instance_type ram, mb, gb etc. matches what the > > >>>>>>>> node > > >>>>>>>> has, but we may want a new boolean field "used" if those > > >>>>>>>> aren't > > >>>>>>>> sufficient. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> I This approach seems pretty good. We could potentially get > > >>>>>>>> rid > > >>>>>>>> of > > >>>>>>>> the > > >>>>>>>> shared bare_metal_node table. I guess the only other concern > > >>>>>>>> is > > >>>>>>>> how > > >>>>>>>> you populate the capabilities that the bare metal nodes are > > >>>>>>>> reporting. > > >>>>>>>> I guess an api extension that rpcs to a baremetal node to add > > >>>>>>>> the > > >>>>>>>> node. Maybe someday this could be autogenerated by the bare > > >>>>>>>> metal > > >>>>>>>> host > > >>>>>>>> looking in its arp table for dhcp requests! :) > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Vish > > >>>>>>>> > > >>>>>>>> _______________________________________________ > > >>>>>>>> OpenStack-dev mailing list > > >>>>>>>> openstack-...@lists.openstack.org > > >>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> OpenStack-dev mailing list > > >>>>>>> openstack-...@lists.openstack.org > > >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > >>>>>> > > >>>>>> > > >>>>>> _______________________________________________ > > >>>>>> OpenStack-dev mailing list > > >>>>>> openstack-...@lists.openstack.org > > >>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > >>>>> > > >>>>> _______________________________________________ > > >>>>> OpenStack-dev mailing list > > >>>>> openstack-...@lists.openstack.org > > >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> OpenStack-dev mailing list > > >>>> openstack-...@lists.openstack.org > > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > _______________________________________________ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > Michael ------------------------------------------------- Michael Fork Cloud Architect, Emerging Solutions IBM Systems & Technology Group _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp