Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread David Kang

 Vish,

 I think I don't understand your statement fully.
Unless we use different hostnames, (hostname, hypervisor_hostname) must be the 
same for all bare-metal nodes under a bare-metal nova-compute.

 Could you elaborate the following statement a little bit more?

 You would just have to use a little more than hostname. Perhaps
 (hostname, hypervisor_hostname) could be used to update the entry?
 

 Thanks,
 David



- Original Message -
 I would investigate changing the capabilities to key off of something
 other than hostname. It looks from the table structure like
 compute_nodes could have a many-to-one relationship with services.
 You would just have to use a little more than hostname. Perhaps
 (hostname, hypervisor_hostname) could be used to update the entry?
 
 Vish
 
 On Aug 24, 2012, at 11:23 AM, David Kang dk...@isi.edu wrote:
 
 
   Vish,
 
   I've tested your code and did more testing.
  There are a couple of problems.
  1. host name should be unique. If not, any repetitive updates of new
  capabilities with the same host name are simply overwritten.
  2. We cannot generate arbitrary host names on the fly.
    The scheduler (I tested filter scheduler) gets host names from db.
    So, if a host name is not in the 'services' table, it is not
    considered by the scheduler at all.
 
  So, to make your suggestions possible, nova-compute should register
  N different host names in 'services' table,
  and N corresponding entries in 'compute_nodes' table.
  Here is an example:
 
  mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
  +----+-------------+----------------+-----------+--------------+----------+-------------------+
  | id | host        | binary         | topic     | report_count | disabled | availability_zone |
  +----+-------------+----------------+-----------+--------------+----------+-------------------+
  |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
  |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
  |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
  |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
  +----+-------------+----------------+-----------+--------------+----------+-------------------+

  mysql> select id, service_id, hypervisor_hostname from compute_nodes;
  +----+------------+------------------------+
  | id | service_id | hypervisor_hostname    |
  +----+------------+------------------------+
  |  1 |          3 | bespin101.east.isi.edu |
  |  2 |          4 | bespin101.east.isi.edu |
  +----+------------+------------------------+
 
   Then, nova db (compute_nodes table) has entries of all bare-metal
   nodes.
   What do you think of this approach?
  Do you have any better approach?
 
   Thanks,
   David
 
 
 
  - Original Message -
  To elaborate, something like the below. I'm not absolutely sure you need
  to be able to set service_name and host, but this gives you the option
  to do so if needed.
 
 diff --git a/nova/manager.py b/nova/manager.py
 index c6711aa..c0f4669 100644
 --- a/nova/manager.py
 +++ b/nova/manager.py
 @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
      def update_service_capabilities(self, capabilities):
          """Remember these capabilities to send on next periodic update."""
 +        if not isinstance(capabilities, list):
 +            capabilities = [capabilities]
          self.last_capabilities = capabilities
 
      @periodic_task
 @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
          """Pass data back to the scheduler at a periodic interval."""
          if self.last_capabilities:
              LOG.debug(_('Notifying Schedulers of capabilities ...'))
 -            self.scheduler_rpcapi.update_service_capabilities(context,
 -                    self.service_name, self.host, self.last_capabilities)
 +            for capability_item in self.last_capabilities:
 +                name = capability_item.get('service_name', self.service_name)
 +                host = capability_item.get('host', self.host)
 +                self.scheduler_rpcapi.update_service_capabilities(context,
 +                        name, host, capability_item)
 
  On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:
 
 
   Hi Vish,
 
   We are trying to change our code according to your comment.
  I want to ask a question.
 
  a) modify driver.get_host_stats to be able to return a list of
  host
  stats instead of just one. Report the whole list back to the
  scheduler. We could modify the receiving end to accept a list
  as
  well
  or just make multiple calls to
  self.update_service_capabilities(capabilities)
 
   Modifying driver.get_host_stats to return a list of host stats is
   easy.
   Making multiple calls to
   self.update_service_capabilities(capabilities) doesn't seem to work,
   because 'capabilities' is overwritten each time.
 
   Modifying the receiving end to accept a list seems to be easy.
  However, 'capabilities' is assumed to be dictionary by all other
  scheduler routines,
  it looks like that we have to change all of them to handle
  'capability' as a list of dictionary.
 
   If my 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread Vishvananda Ishaya
Hi David,

I just checked out the code more extensively and I don't see why you need to 
create a new service entry for each compute_node entry. The code in 
host_manager to get all host states explicitly gets all compute_node entries. I 
don't see any reason why multiple compute_node entries can't share the same 
service. I don't see any place in the scheduler that is grabbing records by 
service instead of by compute node, but if there is one that I missed, it 
should be fairly easy to change it.

The compute_node record is created in the compute/resource_tracker.py as of a 
recent commit, so I think the path forward would be to make sure that one of 
the records is created for each bare metal node by the bare metal compute, 
perhaps by having multiple resource_trackers. 

Vish
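
For illustration, a minimal, hypothetical sketch of the "multiple resource_trackers" idea above; ResourceTrackerStub, the host name, and the node list are made-up names, not the actual nova code:

# Hypothetical sketch only: one tracker per bare-metal node, all sharing
# the proxy nova-compute's service host.
class ResourceTrackerStub(object):
    def __init__(self, host, nodename):
        self.host = host          # service host of the proxy nova-compute
        self.nodename = nodename  # the bare-metal machine this tracker accounts for

    def update_available_resource(self):
        # The real resource tracker would create or update one compute_nodes row here.
        print('updating compute_node record for %s (node %s)' % (self.host, self.nodename))

bare_metal_nodes = ['bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com']
trackers = dict((node, ResourceTrackerStub('bespin101', node))
                for node in bare_metal_nodes)
for tracker in trackers.values():
    tracker.update_available_resource()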

On Aug 27, 2012, at 9:40 AM, David Kang dk...@isi.edu wrote:

 
  Vish,
 
  I think I don't understand your statement fully.
 Unless we use different hostnames, (hostname, hypervisor_hostname) must be 
 the 
 same for all bare-metal nodes under a bare-metal nova-compute.
 
  Could you elaborate the following statement a little bit more?
 
 You would just have to use a little more than hostname. Perhaps
 (hostname, hypervisor_hostname) could be used to update the entry?
 
 
  Thanks,
  David
 
 
 
 - Original Message -
 I would investigate changing the capabilities to key off of something
 other than hostname. It looks from the table structure like
 compute_nodes could have a many-to-one relationship with services.
 You would just have to use a little more than hostname. Perhaps
 (hostname, hypervisor_hostname) could be used to update the entry?
 
 Vish
 
 On Aug 24, 2012, at 11:23 AM, David Kang dk...@isi.edu wrote:
 
 
  Vish,
 
  I've tested your code and did more testing.
 There are a couple of problems.
 1. host name should be unique. If not, any repetitive updates of new
 capabilities with the same host name are simply overwritten.
 2. We cannot generate arbitrary host names on the fly.
   The scheduler (I tested filter scheduler) gets host names from db.
   So, if a host name is not in the 'services' table, it is not
   considered by the scheduler at all.
 
 So, to make your suggestions possible, nova-compute should register
 N different host names in 'services' table,
 and N corresponding entries in 'compute_nodes' table.
 Here is an example:
 
 mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
 +----+-------------+----------------+-----------+--------------+----------+-------------------+
 | id | host        | binary         | topic     | report_count | disabled | availability_zone |
 +----+-------------+----------------+-----------+--------------+----------+-------------------+
 |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
 |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
 |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
 |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
 +----+-------------+----------------+-----------+--------------+----------+-------------------+

 mysql> select id, service_id, hypervisor_hostname from compute_nodes;
 +----+------------+------------------------+
 | id | service_id | hypervisor_hostname    |
 +----+------------+------------------------+
 |  1 |          3 | bespin101.east.isi.edu |
 |  2 |          4 | bespin101.east.isi.edu |
 +----+------------+------------------------+
 
  Then, nova db (compute_nodes table) has entries of all bare-metal
  nodes.
  What do you think of this approach?
 Do you have any better approach?
 
  Thanks,
  David
 
 
 
 - Original Message -
  To elaborate, something like the below. I'm not absolutely sure you need
  to be able to set service_name and host, but this gives you the option
  to do so if needed.
 
  diff --git a/nova/manager.py b/nova/manager.py
 index c6711aa..c0f4669 100644
 --- a/nova/manager.py
 +++ b/nova/manager.py
 @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
 def update_service_capabilities(self, capabilities):
 Remember these capabilities to send on next periodic update.
 + if not isinstance(capabilities, list):
 + capabilities = [capabilities]
 self.last_capabilities = capabilities
 
 @periodic_task
 @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
 Pass data back to the scheduler at a periodic interval.
 if self.last_capabilities:
 LOG.debug(_('Notifying Schedulers of capabilities ...'))
 - self.scheduler_rpcapi.update_service_capabilities(context,
 - self.service_name, self.host, self.last_capabilities)
 + for capability_item in self.last_capabilities:
 + name = capability_item.get('service_name', self.service_name)
 + host = capability_item.get('host', self.host)
 + self.scheduler_rpcapi.update_service_capabilities(context,
 + name, host, capability_item)
 
 On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:
 
 
  Hi Vish,
 
  We are trying to change our code according to your comment.
 I want to ask a 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread David Kang

 Hi Vish,

 I think I understand your idea.
One service entry with multiple bare-metal compute_node entries is registered 
at the start of the bare-metal nova-compute.
'hypervisor_hostname' must be different for each bare-metal machine, such as 
'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.
But their IP addresses must be the IP address of the bare-metal nova-compute, 
so that an instance is cast not to the bare-metal machine directly but to the 
bare-metal nova-compute.

 One extension we need at the scheduler side is to use (host, 
hypervisor_hostname) instead of (host) alone in host_manager.py.
'HostManager.service_state' is { host : { service : { cap k : v }}}.
It needs to be changed to { host : { service : { hypervisor_name : { cap k : v }}}}.

Most functions of HostState need to be changed to use the (host, hypervisor_name) 
pair to identify a compute node.

 Are we on the same page, now?

 Thanks,
 David
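
For illustration, a minimal, hypothetical sketch of the (host, hypervisor_name) keying described above; the dict shapes and values are made up, not the actual nova data:

# Hypothetical sketch only.
# Today: capabilities are keyed by the service host alone, so bare-metal nodes
# sharing one nova-compute overwrite each other's entry.
service_states = {
    'bespin101': {
        'compute': {'free_ram_mb': 24576, 'free_disk_gb': 400},
    },
}

# Proposed: one more level keyed by hypervisor_hostname, so each bare-metal
# node managed by the same nova-compute keeps its own entry.
service_states = {
    'bespin101': {
        'compute': {
            'bare-metal-0001.xxx.com': {'free_ram_mb': 24576, 'free_disk_gb': 400},
            'bare-metal-0002.xxx.com': {'free_ram_mb': 49152, 'free_disk_gb': 800},
        },
    },
}

# A HostState lookup then needs the (host, hypervisor_hostname) pair:
caps = service_states['bespin101']['compute']['bare-metal-0001.xxx.com']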

- Original Message -
 Hi David,
 
 I just checked out the code more extensively and I don't see why you
 need to create a new service entry for each compute_node entry. The
 code in host_manager to get all host states explicitly gets all
 compute_node entries. I don't see any reason why multiple compute_node
 entries can't share the same service. I don't see any place in the
 scheduler that is grabbing records by service instead of by compute
 node, but if there is one that I missed, it should be fairly easy to
 change it.
 
 The compute_node record is created in the compute/resource_tracker.py
 as of a recent commit, so I think the path forward would be to make
 sure that one of the records is created for each bare metal node by
 the bare metal compute, perhaps by having multiple resource_trackers.
 
 Vish
 
 On Aug 27, 2012, at 9:40 AM, David Kang dk...@isi.edu wrote:
 
 
   Vish,
 
   I think I don't understand your statement fully.
  Unless we use different hostnames, (hostname, hypervisor_hostname)
  must be the
  same for all bare-metal nodes under a bare-metal nova-compute.
 
   Could you elaborate the following statement a little bit more?
 
  You would just have to use a little more than hostname. Perhaps
  (hostname, hypervisor_hostname) could be used to update the entry?
 
 
   Thanks,
   David
 
 
 
  - Original Message -
  I would investigate changing the capabilities to key off of
  something
  other than hostname. It looks from the table structure like
   compute_nodes could have a many-to-one relationship with
  services.
  You would just have to use a little more than hostname. Perhaps
  (hostname, hypervisor_hostname) could be used to update the entry?
 
  Vish
 
  On Aug 24, 2012, at 11:23 AM, David Kang dk...@isi.edu wrote:
 
 
   Vish,
 
   I've tested your code and did more testing.
  There are a couple of problems.
  1. host name should be unique. If not, any repetitive updates of
  new
  capabilities with the same host name are simply overwritten.
  2. We cannot generate arbitrary host names on the fly.
The scheduler (I tested filter scheduler) gets host names from
db.
So, if a host name is not in the 'services' table, it is not
considered by the scheduler at all.
 
  So, to make your suggestions possible, nova-compute should
  register
  N different host names in 'services' table,
  and N corresponding entries in 'compute_nodes' table.
  Here is an example:
 
  mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
  +----+-------------+----------------+-----------+--------------+----------+-------------------+
  | id | host        | binary         | topic     | report_count | disabled | availability_zone |
  +----+-------------+----------------+-----------+--------------+----------+-------------------+
  |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
  |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
  |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
  |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
  +----+-------------+----------------+-----------+--------------+----------+-------------------+

  mysql> select id, service_id, hypervisor_hostname from compute_nodes;
  +----+------------+------------------------+
  | id | service_id | hypervisor_hostname    |
  +----+------------+------------------------+
  |  1 |          3 | bespin101.east.isi.edu |
  |  2 |          4 | bespin101.east.isi.edu |
  +----+------------+------------------------+
 
   Then, nova db (compute_nodes table) has entries of all bare-metal
   nodes.
   What do you think of this approach?
  Do you have any better approach?
 
   Thanks,
   David
 
 
 
  - Original Message -
   To elaborate, something like the below. I'm not absolutely sure you need
   to be able to set service_name and host, but this gives you the option
   to do so if needed.
 
   diff --git a/nova/manager.py b/nova/manager.py
  index c6711aa..c0f4669 100644
  --- a/nova/manager.py
  +++ b/nova/manager.py
  @@ -217,6 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread Michael J Fork

openstack-bounces+mjfork=us.ibm@lists.launchpad.net wrote on 08/27/2012
02:58:56 PM:

 From: David Kang dk...@isi.edu
 To: Vishvananda Ishaya vishvana...@gmail.com,
 Cc: OpenStack Development Mailing List openstack-
 d...@lists.openstack.org, openstack@lists.launchpad.net \
 (openstack@lists.launchpad.net\) openstack@lists.launchpad.net
 Date: 08/27/2012 03:06 PM
 Subject: Re: [Openstack] [openstack-dev] Discussion about where to
 put database for bare-metal provisioning (review 10726)
 Sent by: openstack-bounces+mjfork=us.ibm@lists.launchpad.net


  Hi Vish,

  I think I understand your idea.
 One service entry with multiple bare-metal compute_node entries are
 registered at the start of bare-metal nova-compute.
 'hypervisor_hostname' must be different for each bare-metal machine,
 such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.)
 But their IP addresses must be the IP address of bare-metal nova-
 compute, such that an instance is casted
 not to bare-metal machine directly but to bare-metal nova-compute.

I believe the change here is to cast out the message to the
topic.service-hostname. Existing code sends it to the compute_node
hostname (see line 202 of nova/scheduler/filter_scheduler.py, specifically
host=weighted_host.host_state.host).  Changing that to cast to the service
hostname would send the message to the bare-metal proxy node and should not
have an effect on current deployments since the service hostname and the
host_state.host would always be equal.  This model will also let you keep
the bare-metal compute node IP in the compute node table.
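
A hedged sketch of the kind of change described here, with a hypothetical helper rather than the actual scheduler code (later messages in this thread point out that host_state.host is in fact already taken from the services table, so this may turn out to be unnecessary):

# Hypothetical sketch only.
def pick_cast_target(host_state):
    # Cast to the service host (the proxy nova-compute) instead of the compute
    # node's own hostname.  For regular compute nodes the two are equal, so
    # current deployments would behave the same.
    service = getattr(host_state, 'service', None) or {}
    return service.get('host', host_state.host)

class FakeHostState(object):
    def __init__(self, host, service):
        self.host = host
        self.service = service

bm = FakeHostState('bare-metal-0001.xxx.com', {'host': 'bespin101'})
kvm = FakeHostState('kvmhost1', {'host': 'kvmhost1'})
assert pick_cast_target(bm) == 'bespin101'
assert pick_cast_target(kvm) == 'kvmhost1'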

  One extension we need to do at the scheduler side is using (host,
 hypervisor_hostname) instead of (host) only in host_manager.py.
 'HostManager.service_state' is { host : { service : { cap k : v }}}.
 It needs to be changed to { host : { service : { hypervisor_name : { cap k : v }}}}.
 Most functions of HostState need to be changed to use (host,
 hypervisor_name) pair to identify a compute node.

Would an alternative here be to change the top level host to be the
hypervisor_hostname and enforce uniqueness?

  Are we on the same page, now?

  Thanks,
  David

 - Original Message -
  Hi David,
 
  I just checked out the code more extensively and I don't see why you
  need to create a new service entry for each compute_node entry. The
  code in host_manager to get all host states explicitly gets all
  compute_node entries. I don't see any reason why multiple compute_node
  entries can't share the same service. I don't see any place in the
  scheduler that is grabbing records by service instead of by compute
  node, but if there is one that I missed, it should be fairly easy to
  change it.
 
  The compute_node record is created in the compute/resource_tracker.py
  as of a recent commit, so I think the path forward would be to make
  sure that one of the records is created for each bare metal node by
  the bare metal compute, perhaps by having multiple resource_trackers.
 
  Vish
 
  On Aug 27, 2012, at 9:40 AM, David Kang dk...@isi.edu wrote:
 
  
Vish,
  
I think I don't understand your statement fully.
   Unless we use different hostnames, (hostname, hypervisor_hostname)
   must be the
   same for all bare-metal nodes under a bare-metal nova-compute.
  
Could you elaborate the following statement a little bit more?
  
   You would just have to use a little more than hostname. Perhaps
   (hostname, hypervisor_hostname) could be used to update the entry?
  
  
Thanks,
David
  
  
  
   - Original Message -
   I would investigate changing the capabilities to key off of
   something
   other than hostname. It looks from the table structure like
    compute_nodes could have a many-to-one relationship with
   services.
   You would just have to use a little more than hostname. Perhaps
   (hostname, hypervisor_hostname) could be used to update the entry?
  
   Vish
  
   On Aug 24, 2012, at 11:23 AM, David Kang dk...@isi.edu wrote:
  
  
Vish,
  
I've tested your code and did more testing.
   There are a couple of problems.
   1. host name should be unique. If not, any repetitive updates of
   new
   capabilities with the same host name are simply overwritten.
   2. We cannot generate arbitrary host names on the fly.
 The scheduler (I tested filter scheduler) gets host names from
 db.
 So, if a host name is not in the 'services' table, it is not
 considered by the scheduler at all.
  
   So, to make your suggestions possible, nova-compute should
   register
   N different host names in 'services' table,
   and N corresponding entries in 'compute_nodes' table.
   Here is an example:
  
   mysql select id, host, binary, topic, report_count, disabled,
   availability_zone from services;
   ++-++---
 +--+--+---+
   | id | host | binary | topic | report_count | disabled |
   | availability_zone

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread VTJ NOTSU Arata

Hello all,

It seems that the only requirement for keys of HostManager.service_state is that 
they be unique; they do not have to be valid hostnames or queue names (the existing 
code already casts messages to topic.service-hostname, doesn't it, Michael?). So, I 
tried 'host/bm_node_id' as the 'host' of capabilities. Then, 
HostManager.service_state is:
 { host/bm_node_id : { service : { cap k : v }}}.
So far, it works fine. How about this approach?
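
A tiny, hypothetical illustration of that composite key (illustrative names only): the scheduler just needs the key to be unique, and the service host can be recovered by splitting on '/' whenever the RPC target is needed.

# Illustrative sketch only.
def make_key(service_host, bm_node_id):
    return '%s/%s' % (service_host, bm_node_id)

def service_host_of(key):
    return key.split('/', 1)[0]

key = make_key('bespin101', 42)              # -> 'bespin101/42'
assert service_host_of(key) == 'bespin101'   # RPC still targets the proxy host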

I paste the relevant code at the bottom of this mail just to make sure.
NOTE: I added a new column 'nodename' to compute_nodes to store bm_node_id,
but storing it in 'hypervisor_hostname' may be the right solution.

(The whole code is in our github (NTTdocomo-openstack/nova, branch 'multinode');
multiple resource_trackers are also implemented.)

Thanks,
Arata
 


diff --git a/nova/scheduler/host_manager.py b/nova/scheduler/host_manager.py
index 33ba2c1..567729f 100644
--- a/nova/scheduler/host_manager.py
+++ b/nova/scheduler/host_manager.py
@@ -98,9 +98,10 @@ class HostState(object):
     previously used and lock down access.
     """

-    def __init__(self, host, topic, capabilities=None, service=None):
+    def __init__(self, host, topic, capabilities=None, service=None,
+                 nodename=None):
         self.host = host
         self.topic = topic
+        self.nodename = nodename

         # Read-only capability dicts

@@ -175,8 +176,8 @@ class HostState(object):
         return True

     def __repr__(self):
-        return ("host '%s': free_ram_mb:%s free_disk_mb:%s" %
-                (self.host, self.free_ram_mb, self.free_disk_mb))
+        return ("host '%s' / nodename '%s': free_ram_mb:%s free_disk_mb:%s" %
+                (self.host, self.nodename, self.free_ram_mb,
+                 self.free_disk_mb))


 class HostManager(object):

@@ -268,11 +269,16 @@ class HostManager(object):
                 LOG.warn(_("No service for compute ID %s") % compute['id'])
                 continue
             host = service['host']
-            capabilities = self.service_states.get(host, None)
+            if compute['nodename']:
+                host_node = '%s/%s' % (host, compute['nodename'])
+            else:
+                host_node = host
+            capabilities = self.service_states.get(host_node, None)
             host_state = self.host_state_cls(host, topic,
                                              capabilities=capabilities,
-                                             service=dict(service.iteritems()))
+                                             service=dict(service.iteritems()),
+                                             nodename=compute['nodename'])
             host_state.update_from_compute_node(compute)
-            host_state_map[host] = host_state
+            host_state_map[host_node] = host_state

         return host_state_map


diff --git a/nova/virt/baremetal/driver.py b/nova/virt/baremetal/driver.py
index 087d1b6..dbcfbde 100644
--- a/nova/virt/baremetal/driver.py
+++ b/nova/virt/baremetal/driver.py
(skip...)
+    def _create_node_cap(self, node):
+        dic = self._node_resources(node)
+        dic['host'] = '%s/%s' % (FLAGS.host, node['id'])
+        dic['cpu_arch'] = self._extra_specs.get('cpu_arch')
+        dic['instance_type_extra_specs'] = self._extra_specs
+        dic['supported_instances'] = self._supported_instances
+        # TODO: put node's extra specs
+        return dic

     def get_host_stats(self, refresh=False):
-        return self._get_host_stats()
+        caps = []
+        context = nova_context.get_admin_context()
+        nodes = bmdb.bm_node_get_all(context,
+                                     service_host=FLAGS.host)
+        for node in nodes:
+            node_cap = self._create_node_cap(node)
+            caps.append(node_cap)
+        return caps
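
To make the flow concrete, a small, hypothetical walk-through (made-up values, not the actual nova code) of how a list like the one get_host_stats now returns would travel through the SchedulerDependentManager patch posted earlier in this thread:

# Hypothetical walk-through only.
def fake_get_host_stats():
    # One dict per managed bare-metal node, each with its own composite
    # 'host' key ('service-host/bm_node_id') as in the driver change above.
    return [
        {'host': 'bespin101/1', 'free_ram_mb': 24576, 'local_gb': 400},
        {'host': 'bespin101/2', 'free_ram_mb': 49152, 'local_gb': 800},
    ]

def publish(capabilities, default_service='compute', default_host='bespin101'):
    # Mirrors the patched periodic task: normalize to a list, then send one
    # update per item so the nodes' capabilities do not overwrite each other.
    if not isinstance(capabilities, list):
        capabilities = [capabilities]
    for item in capabilities:
        name = item.get('service_name', default_service)
        host = item.get('host', default_host)
        print('update_service_capabilities(%r, %r, %r)' % (name, host, item))

publish(fake_get_host_stats())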


(2012/08/28 5:55), Michael J Fork wrote:

openstack-bounces+mjfork=us.ibm@lists.launchpad.net wrote on 08/27/2012 
02:58:56 PM:

  From: David Kang dk...@isi.edu
  To: Vishvananda Ishaya vishvana...@gmail.com,
  Cc: OpenStack Development Mailing List openstack-
  d...@lists.openstack.org, openstack@lists.launchpad.net \
  (openstack@lists.launchpad.net\) openstack@lists.launchpad.net
  Date: 08/27/2012 03:06 PM
  Subject: Re: [Openstack] [openstack-dev] Discussion about where to
  put database for bare-metal provisioning (review 10726)
  Sent by: openstack-bounces+mjfork=us.ibm@lists.launchpad.net
 
 
   Hi Vish,
 
   I think I understand your idea.
  One service entry with multiple bare-metal compute_node entries are
  registered at the start of bare-metal nova-compute.
  'hypervisor_hostname' must be different for each bare-metal machine,
  such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.)
  But their IP addresses must be the IP address of bare-metal nova-
  compute, such that an instance is casted
  not to bare-metal machine directly but to bare-metal nova-compute.

I believe the change here is to cast out the message to the 
topic.service-hostname. Existing code sends it to the compute_node hostname 
(see line 202 of nova/scheduler/filter_scheduler.py

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread Michael J Fork

David Kang dk...@isi.edu wrote on 08/27/2012 05:22:37 PM:

 From: David Kang dk...@isi.edu
 To: Michael J Fork/Rochester/IBM@IBMUS,
 Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net)
 openstack@lists.launchpad.net, openstack-bounces+mjfork=us ibm com
 openstack-bounces+mjfork=us.ibm@lists.launchpad.net, OpenStack
 Development Mailing List openstack-...@lists.openstack.org,
 Vishvananda Ishaya vishvana...@gmail.com
 Date: 08/27/2012 05:22 PM
 Subject: Re: [Openstack] [openstack-dev] Discussion about where to
 put database for bare-metal provisioning (review 10726)


  Michael,

  I think you mean compute_node hostname as 'hypervisor_hostname'
 field in the 'compute_node' table.

Yes.  This value would be part of the payload of the message cast to the
proxy node so that it knows who the request was directed to.

 What do you mean by service hostname?
 I don't see such field in the 'service' table in the database.
 Is it in some other table?
 Or do you suggest adding 'service_hostname' field in the 'service' table?

The host field in the services table.  This value would be used as the
target of the rpc cast so that the proxy node would receive the message.
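
A small, hypothetical illustration of that split (made-up names, not the actual nova RPC code): the services.host value selects the RPC topic, while the compute node's hypervisor_hostname travels in the payload so the proxy knows which bare-metal machine was meant.

# Illustrative sketch only.
def build_cast(service_host, hypervisor_hostname, instance_uuid):
    topic = 'compute.%s' % service_host        # routed to the proxy nova-compute
    payload = {
        'method': 'run_instance',
        'args': {
            'instance_uuid': instance_uuid,
            'node': hypervisor_hostname,       # which bare-metal machine to use
        },
    }
    return topic, payload

topic, msg = build_cast('bespin101', 'bare-metal-0001.xxx.com', 'fake-uuid')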


  Thanks,
  David

 - Original Message -
  openstack-bounces+mjfork=us.ibm@lists.launchpad.net wrote on
  08/27/2012 02:58:56 PM:
 
   From: David Kang dk...@isi.edu
   To: Vishvananda Ishaya vishvana...@gmail.com,
   Cc: OpenStack Development Mailing List openstack-
   d...@lists.openstack.org, openstack@lists.launchpad.net \
   (openstack@lists.launchpad.net\) openstack@lists.launchpad.net
   Date: 08/27/2012 03:06 PM
   Subject: Re: [Openstack] [openstack-dev] Discussion about where to
   put database for bare-metal provisioning (review 10726)
   Sent by: openstack-bounces+mjfork=us.ibm@lists.launchpad.net
  
  
   Hi Vish,
  
   I think I understand your idea.
   One service entry with multiple bare-metal compute_node entries are
   registered at the start of bare-metal nova-compute.
   'hypervisor_hostname' must be different for each bare-metal machine,
   such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.)
   But their IP addresses must be the IP address of bare-metal nova-
   compute, such that an instance is casted
   not to bare-metal machine directly but to bare-metal nova-compute.
 
  I believe the change here is to cast out the message to the
  topic.service-hostname. Existing code sends it to the compute_node
  hostname (see line 202 of nova/scheduler/filter_scheduler.py,
  specifically host=weighted_host.host_state.host). Changing that to
  cast to the service hostname would send the message to the bare-metal
  proxy node and should not have an effect on current deployments since
  the service hostname and the host_state.host would always be equal.
  This model will also let you keep the bare-metal compute node IP in
  the compute node table.
 
   One extension we need to do at the scheduler side is using (host,
   hypervisor_hostname) instead of (host) only in host_manager.py.
   'HostManager.service_state' is { host : { service  : { cap k : v
   }}}.
   It needs to be changed to { host : { service : { hypervisor_name : { cap k : v }}}}.
   Most functions of HostState need to be changed to use (host,
   hypervisor_name) pair to identify a compute node.
 
  Would an alternative here be to change the top level host to be the
  hypervisor_hostname and enforce uniqueness?
 
   Are we on the same page, now?
  
   Thanks,
   David
  
   - Original Message -
Hi David,
   
I just checked out the code more extensively and I don't see why
you
need to create a new service entry for each compute_node entry.
The
code in host_manager to get all host states explicitly gets all
compute_node entries. I don't see any reason why multiple
compute_node
entries can't share the same service. I don't see any place in the
scheduler that is grabbing records by service instead of by
compute
node, but if there is one that I missed, it should be fairly easy
to
change it.
   
The compute_node record is created in the
compute/resource_tracker.py
as of a recent commit, so I think the path forward would be to
make
sure that one of the records is created for each bare metal node
by
the bare metal compute, perhaps by having multiple
resource_trackers.
   
Vish
   
On Aug 27, 2012, at 9:40 AM, David Kang dk...@isi.edu wrote:
   

 Vish,

 I think I don't understand your statement fully.
 Unless we use different hostnames, (hostname,
 hypervisor_hostname)
 must be the
 same for all bare-metal nodes under a bare-metal nova-compute.

 Could you elaborate the following statement a little bit more?

 You would just have to use a little more than hostname. Perhaps
 (hostname, hypervisor_hostname) could be used to update the
 entry?


 Thanks,
 David

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread David Kang

 Michael,

 It is a little confusing without knowing the assumptions behind your suggestions.
First of all, I want to make sure that you agree on the following:
1. one entry per bare-metal machine in the 'compute_nodes' table.
2. one entry in the 'services' table for the bare-metal nova-compute that manages N 
bare-metal machines.

In addition, I think you are suggesting augmenting the 'host' field in the 
'services' table
such that the 'host' field can be used for RPC.
(I don't think the current 'host' field can be used for that purpose now.)

 David

- Original Message -
 David Kang dk...@isi.edu wrote on 08/27/2012 05:22:37 PM:
 
  From: David Kang dk...@isi.edu
  To: Michael J Fork/Rochester/IBM@IBMUS,
  Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net)
  openstack@lists.launchpad.net, openstack-bounces+mjfork=us ibm com
  openstack-bounces+mjfork=us.ibm@lists.launchpad.net, OpenStack
  Development Mailing List openstack-...@lists.openstack.org,
  Vishvananda Ishaya vishvana...@gmail.com
  Date: 08/27/2012 05:22 PM
  Subject: Re: [Openstack] [openstack-dev] Discussion about where to
  put database for bare-metal provisioning (review 10726)
 
 
  Michael,
 
  I think you mean compute_node hostname as 'hypervisor_hostname'
  field in the 'compute_node' table.
 
 Yes. This value would be part of the payload of the message cast to
 the proxy node so that it knows who the request was directed to.
 
  What do you mean by service hostname?
  I don't see such field in the 'service' table in the database.
  Is it in some other table?
  Or do you suggest adding 'service_hostname' field in the 'service'
  table?
 
 The host field in the services table. This value would be used as
 the target of the rpc cast so that the proxy node would receive the
 message.
 
 
  Thanks,
  David
 
  - Original Message -
   openstack-bounces+mjfork=us.ibm@lists.launchpad.net wrote on
   08/27/2012 02:58:56 PM:
  
From: David Kang dk...@isi.edu
To: Vishvananda Ishaya vishvana...@gmail.com,
Cc: OpenStack Development Mailing List openstack-
d...@lists.openstack.org, openstack@lists.launchpad.net \
(openstack@lists.launchpad.net\)
openstack@lists.launchpad.net
Date: 08/27/2012 03:06 PM
Subject: Re: [Openstack] [openstack-dev] Discussion about where
to
put database for bare-metal provisioning (review 10726)
Sent by: openstack-bounces+mjfork=us.ibm@lists.launchpad.net
   
   
Hi Vish,
   
I think I understand your idea.
One service entry with multiple bare-metal compute_node entries
are
registered at the start of bare-metal nova-compute.
'hypervisor_hostname' must be different for each bare-metal
machine,
such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com',
etc.)
But their IP addresses must be the IP address of bare-metal
nova-
compute, such that an instance is casted
not to bare-metal machine directly but to bare-metal
nova-compute.
  
   I believe the change here is to cast out the message to the
   topic.service-hostname. Existing code sends it to the
   compute_node
   hostname (see line 202 of nova/scheduler/filter_scheduler.py,
   specifically host=weighted_host.host_state.host). Changing that to
   cast to the service hostname would send the message to the
   bare-metal
   proxy node and should not have an effect on current deployments
   since
   the service hostname and the host_state.host would always be
   equal.
   This model will also let you keep the bare-metal compute node IP
   in
   the compute node table.
  
One extension we need to do at the scheduler side is using
(host,
hypervisor_hostname) instead of (host) only in host_manager.py.
'HostManager.service_state' is { host : { service  : { cap k
: v
}}}.
    It needs to be changed to { host : { service : { hypervisor_name : { cap k : v }}}}.
Most functions of HostState need to be changed to use (host,
hypervisor_name) pair to identify a compute node.
  
   Would an alternative here be to change the top level host to be
   the
   hypervisor_hostname and enforce uniqueness?
  
Are we on the same page, now?
   
Thanks,
David
   
- Original Message -
 Hi David,

 I just checked out the code more extensively and I don't see
 why
 you
 need to create a new service entry for each compute_node
 entry.
 The
 code in host_manager to get all host states explicitly gets
 all
 compute_node entries. I don't see any reason why multiple
 compute_node
 entries can't share the same service. I don't see any place in
 the
 scheduler that is grabbing records by service instead of by
 compute
 node, but if there is one that I missed, it should be fairly
 easy
 to
 change it.

 The compute_node record is created in the
 compute/resource_tracker.py
 as of a recent commit, so I think the path forward

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-27 Thread Michael J Fork

VTJ NOTSU Arata no...@virtualtech.jp wrote on 08/27/2012 07:30:40 PM:

 From: VTJ NOTSU Arata no...@virtualtech.jp
 To: Michael J Fork/Rochester/IBM@IBMUS,
 Cc: David Kang dk...@isi.edu, openstack@lists.launchpad.net
 (openstack@lists.launchpad.net) openstack@lists.launchpad.net,
 openstack-bounces+mjfork=us.ibm@lists.launchpad.net, OpenStack
 Development Mailing List openstack-...@lists.openstack.org
 Date: 08/27/2012 07:30 PM
 Subject: Re: [Openstack] [openstack-dev] Discussion about where to
 put database for bare-metal provisioning (review 10726)

 Hi Michael,

  Looking at line 203 in nova/scheduler/filter_scheduler.py, the
 target host in the cast call is weighted_host.host_state.host
 and not a service host. (My guess is this will likely require a fair
 number of changes in the scheduler area to change cast calls to
 target a service host instead of a compute node)

 weighted_host.host_state.host still seems to be service['host']...
 Please look at it again with me.

 # First, HostManager.get_all_host_states:
 # host_manager.py:264
         compute_nodes = db.compute_node_get_all(context)
         for compute in compute_nodes:
             # service is from services table (joined-loaded with compute_nodes)
             service = compute['service']
             if not service:
                 LOG.warn(_("No service for compute ID %s") % compute['id'])
                 continue
             host = service['host']
             capabilities = self.service_states.get(host, None)
             # go to HostState constructor:
             # the 1st parameter 'host' is service['host']
             host_state = self.host_state_cls(host, topic,
                                              capabilities=capabilities,
                                              service=dict(service.iteritems()))

 # host_manager.py:101
     def __init__(self, host, topic, capabilities=None, service=None):
         self.host = host
         self.topic = topic
         # here, HostState.host is service['host']

 Then, update_from_compute_node(compute) is called but it leaves
 self.host unchanged.
 WeightedHost.host_state is this HostState. So, host at
 filter_scheduler.py:203 is service['host']. We can use existing code
 about RPC target. Do I miss something?

Agreed, you can use the existing RPC target.  Sorry for the confusion.
This actually answers the question in David's last e-mail asking if the
host field can be used from the services table - it already is.

 Thanks,
 Arata



BIG SNIP


Michael

-
Michael Fork
Cloud Architect, Emerging Solutions
IBM Systems & Technology Group


Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-24 Thread David Kang

 Vish,

 I've tested your code and did some more testing.
There are a couple of problems.
1. Host names must be unique. If not, repeated updates of new 
capabilities with the same host name simply overwrite each other.
2. We cannot generate arbitrary host names on the fly.
  The scheduler (I tested the filter scheduler) gets host names from the db,
  so if a host name is not in the 'services' table, it is not considered by 
the scheduler at all.

So, to make your suggestions possible, nova-compute should register N different 
host names in the 'services' table,
and N corresponding entries in the 'compute_nodes' table.
Here is an example:

mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
+----+-------------+----------------+-----------+--------------+----------+-------------------+
| id | host        | binary         | topic     | report_count | disabled | availability_zone |
+----+-------------+----------------+-----------+--------------+----------+-------------------+
|  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
|  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
|  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
|  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
+----+-------------+----------------+-----------+--------------+----------+-------------------+

mysql> select id, service_id, hypervisor_hostname from compute_nodes;
+----+------------+------------------------+
| id | service_id | hypervisor_hostname    |
+----+------------+------------------------+
|  1 |          3 | bespin101.east.isi.edu |
|  2 |          4 | bespin101.east.isi.edu |
+----+------------+------------------------+

 Then, the nova db (compute_nodes table) has entries for all bare-metal nodes.
What do you think of this approach?
Do you have any better approach?

 Thanks,
 David



- Original Message -
 To elaborate, something like the below. I'm not absolutely sure you need to
 be able to set service_name and host, but this gives you the option to
 do so if needed.
 
 diff --git a/nova/manager.py b/nova/manager.py
 index c6711aa..c0f4669 100644
 --- a/nova/manager.py
 +++ b/nova/manager.py
 @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
 def update_service_capabilities(self, capabilities):
 Remember these capabilities to send on next periodic update.
 + if not isinstance(capabilities, list):
 + capabilities = [capabilities]
 self.last_capabilities = capabilities
 
 @periodic_task
 @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
 Pass data back to the scheduler at a periodic interval.
 if self.last_capabilities:
 LOG.debug(_('Notifying Schedulers of capabilities ...'))
 - self.scheduler_rpcapi.update_service_capabilities(context,
 - self.service_name, self.host, self.last_capabilities)
 + for capability_item in self.last_capabilities:
 + name = capability_item.get('service_name', self.service_name)
 + host = capability_item.get('host', self.host)
 + self.scheduler_rpcapi.update_service_capabilities(context,
 + name, host, capability_item)
 
 On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:
 
 
   Hi Vish,
 
   We are trying to change our code according to your comment.
  I want to ask a question.
 
  a) modify driver.get_host_stats to be able to return a list of
  host
  stats instead of just one. Report the whole list back to the
  scheduler. We could modify the receiving end to accept a list as
  well
  or just make multiple calls to
  self.update_service_capabilities(capabilities)
 
   Modifying driver.get_host_stats to return a list of host stats is
   easy.
  Making multiple calls to
  self.update_service_capabilities(capabilities) doesn't seem to work,
  because 'capabilities' is overwritten each time.
 
   Modifying the receiving end to accept a list seems to be easy.
  However, 'capabilities' is assumed to be dictionary by all other
  scheduler routines,
  it looks like that we have to change all of them to handle
  'capability' as a list of dictionary.
 
   If my understanding is correct, it would affect many parts of the
   scheduler.
  Is it what you recommended?
 
   Thanks,
   David
 
 
  - Original Message -
  This was an immediate goal, the bare-metal nova-compute node could
  keep an internal database, but report capabilities through nova in
  the
  common way with the changes below. Then the scheduler wouldn't need
  access to the bare metal database at all.
 
  On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:
 
 
  Hi Vish,
 
  Is this discussion for long-term goal or for this Folsom release?
 
  We still believe that the bare-metal database is needed
  because there is no automated way for bare-metal nodes to report
  their capabilities
  to their bare-metal nova-compute node.
 
  Thanks,
  David
 
 
  I am interested in finding a solution 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-24 Thread Vishvananda Ishaya
I would investigate changing the capabilities to key off of something other 
than hostname. It looks from the table structure like compute_nodes could 
have a many-to-one relationship with services. You would just have to use a 
little more than hostname. Perhaps (hostname, hypervisor_hostname) could be 
used to update the entry?

Vish
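
A minimal, hypothetical sketch of keying capability updates off (hostname, hypervisor_hostname) as suggested above (illustrative names and values only):

# Illustrative sketch only.
capabilities_by_node = {}

def update_capability(hostname, hypervisor_hostname, caps):
    # The tuple key keeps one entry per bare-metal node even though the nodes
    # all share the same proxy nova-compute hostname.
    capabilities_by_node[(hostname, hypervisor_hostname)] = caps

update_capability('bespin101', 'bare-metal-0001.xxx.com', {'free_ram_mb': 24576})
update_capability('bespin101', 'bare-metal-0002.xxx.com', {'free_ram_mb': 49152})
assert len(capabilities_by_node) == 2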

On Aug 24, 2012, at 11:23 AM, David Kang dk...@isi.edu wrote:

 
  Vish,
 
  I've tested your code and did more testing.
 There are a couple of problems.
 1. host name should be unique. If not, any repetitive updates of new 
 capabilities with the same host name are simply overwritten.
 2. We cannot generate arbitrary host names on the fly.
   The scheduler (I tested filter scheduler) gets host names from db.
   So, if a host name is not in the 'services' table, it is not considered by 
 the scheduler at all.
 
 So, to make your suggestions possible, nova-compute should register N 
 different host names in 'services' table,
 and N corresponding entries in 'compute_nodes' table.
 Here is an example:
 
 mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
 +----+-------------+----------------+-----------+--------------+----------+-------------------+
 | id | host        | binary         | topic     | report_count | disabled | availability_zone |
 +----+-------------+----------------+-----------+--------------+----------+-------------------+
 |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
 |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
 |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
 |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
 +----+-------------+----------------+-----------+--------------+----------+-------------------+

 mysql> select id, service_id, hypervisor_hostname from compute_nodes;
 +----+------------+------------------------+
 | id | service_id | hypervisor_hostname    |
 +----+------------+------------------------+
 |  1 |          3 | bespin101.east.isi.edu |
 |  2 |          4 | bespin101.east.isi.edu |
 +----+------------+------------------------+
 
  Then, nova db (compute_nodes table) has entries of all bare-metal nodes.
 What do you think of this approach?
 Do you have any better approach?
 
  Thanks,
  David
 
 
 
 - Original Message -
 To elaborate, something like the below. I'm not absolutely sure you need to
 be able to set service_name and host, but this gives you the option to
 do so if needed.
 
 diff --git a/nova/manager.py b/nova/manager.py
 index c6711aa..c0f4669 100644
 --- a/nova/manager.py
 +++ b/nova/manager.py
 @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
 def update_service_capabilities(self, capabilities):
 Remember these capabilities to send on next periodic update.
 + if not isinstance(capabilities, list):
 + capabilities = [capabilities]
 self.last_capabilities = capabilities
 
 @periodic_task
 @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
 Pass data back to the scheduler at a periodic interval.
 if self.last_capabilities:
 LOG.debug(_('Notifying Schedulers of capabilities ...'))
 - self.scheduler_rpcapi.update_service_capabilities(context,
 - self.service_name, self.host, self.last_capabilities)
 + for capability_item in self.last_capabilities:
 + name = capability_item.get('service_name', self.service_name)
 + host = capability_item.get('host', self.host)
 + self.scheduler_rpcapi.update_service_capabilities(context,
 + name, host, capability_item)
 
 On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:
 
 
  Hi Vish,
 
  We are trying to change our code according to your comment.
 I want to ask a question.
 
 a) modify driver.get_host_stats to be able to return a list of
 host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as
 well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
  Modifying driver.get_host_stats to return a list of host stats is
  easy.
 Making multiple calls to
 self.update_service_capabilities(capabilities) doesn't seem to work,
 because 'capabilities' is overwritten each time.
 
  Modifying the receiving end to accept a list seems to be easy.
 However, 'capabilities' is assumed to be dictionary by all other
 scheduler routines,
 it looks like that we have to change all of them to handle
 'capability' as a list of dictionary.
 
  If my understanding is correct, it would affect many parts of the
  scheduler.
 Is it what you recommended?
 
  Thanks,
  David
 
 
 - Original Message -
 This was an immediate goal, the bare-metal nova-compute node could
 keep an internal database, but report capabilities through nova in
 the
 common way with the changes below. Then the scheduler wouldn't need
 access to the 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-21 Thread David Kang

 Hi Vish,

 We are trying to change our code according to your comment.
I want to ask a question.

  a) modify driver.get_host_stats to be able to return a list of host
  stats instead of just one. Report the whole list back to the
  scheduler. We could modify the receiving end to accept a list as
  well
  or just make multiple calls to
  self.update_service_capabilities(capabilities)

 Modifying driver.get_host_stats to return a list of host stats is easy.
Making multiple calls to self.update_service_capabilities(capabilities) doesn't 
seem to work,
because 'capabilities' is overwritten each time.

 Modifying the receiving end to accept a list seems easy.
However, since 'capabilities' is assumed to be a dictionary by all the other 
scheduler routines,
it looks like we would have to change all of them to handle 'capabilities' as a 
list of dictionaries.

 If my understanding is correct, it would affect many parts of the scheduler.
Is it what you recommended?

 Thanks,
 David
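
For illustration, a minimal, hypothetical sketch (made-up names, not the actual scheduler code) of what teaching the receiving end to accept a list could look like; the concern above about all the other scheduler routines that expect a plain dictionary still stands:

# Hypothetical sketch only.
def update_service_capabilities(service_states, service_name, host, capabilities):
    # Accept either a single capability dict (current behavior) or a list of
    # them (one per bare-metal node) without disturbing existing callers.
    if isinstance(capabilities, dict):
        capabilities = [capabilities]
    for cap in capabilities:
        # Key each item by the 'host' it reports, falling back to the service
        # host, so per-node entries do not overwrite each other.
        key = cap.get('host', host)
        service_states.setdefault(key, {})[service_name] = cap
    return service_states

states = {}
update_service_capabilities(states, 'compute', 'bespin101',
                            [{'host': 'bespin101/1', 'free_ram_mb': 24576},
                             {'host': 'bespin101/2', 'free_ram_mb': 49152}])
assert sorted(states) == ['bespin101/1', 'bespin101/2']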
 

- Original Message -
 This was an immediate goal, the bare-metal nova-compute node could
 keep an internal database, but report capabilities through nova in the
 common way with the changes below. Then the scheduler wouldn't need
 access to the bare metal database at all.
 
 On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:
 
 
  Hi Vish,
 
  Is this discussion for long-term goal or for this Folsom release?
 
   We still believe that the bare-metal database is needed
   because there is no automated way for bare-metal nodes to report
   their capabilities
   to their bare-metal nova-compute node.
 
  Thanks,
  David
 
 
  I am interested in finding a solution that enables bare-metal and
  virtualized requests to be serviced through the same scheduler
  where
  the compute_nodes table has a full view of schedulable resources.
  This
  would seem to simplify the end-to-end flow while opening up some
  additional use cases (e.g. dynamic allocation of a node from
  bare-metal to hypervisor and back).
 
  One approach would be to have a proxy running a single nova-compute
  daemon fronting the bare-metal nodes . That nova-compute daemon
  would
  report up many HostState objects (1 per bare-metal node) to become
  entries in the compute_nodes table and accessible through the
  scheduler HostManager object.
 
 
 
 
   The HostState object would set cpu_info, vcpus, memory_mb and
  local_gb
  values to be used for scheduling with the hypervisor_host field
  holding the bare-metal machine address (e.g. for IPMI based
  commands)
  and hypervisor_type = NONE. The bare-metal Flavors are created with
  an
  extra_spec of hypervisor_type= NONE and the corresponding
  compute_capabilities_filter would reduce the available hosts to
  those
  bare_metal nodes. The scheduler would need to understand that
  hypervisor_type = NONE means you need an exact fit (or best-fit)
  host
  vs weighting them (perhaps through the multi-scheduler). The
  scheduler
  would cast out the message to the topic.service-hostname (code
  today uses the HostState hostname), with the compute driver having
  to
  understand if it must be serviced elsewhere (but does not break any
  existing implementations since it is 1 to 1).
 
 
 
 
 
  Does this solution seem workable? Anything I missed?
 
  The bare metal driver already is proxying for the other nodes so it
  sounds like we need a couple of things to make this happen:
 
 
  a) modify driver.get_host_stats to be able to return a list of host
  stats instead of just one. Report the whole list back to the
  scheduler. We could modify the receiving end to accept a list as
  well
  or just make multiple calls to
  self.update_service_capabilities(capabilities)
 
 
  b) make a few minor changes to the scheduler to make sure filtering
  still works. Note the changes here may be very helpful:
 
 
  https://review.openstack.org/10327
 
 
  c) we have to make sure that instances launched on those nodes take
  up
  the entire host state somehow. We could probably do this by making
  sure that the instance_type ram, mb, gb etc. matches what the node
  has, but we may want a new boolean field used if those aren't
  sufficient.
 
 
   This approach seems pretty good. We could potentially get rid of
  the
  shared bare_metal_node table. I guess the only other concern is how
  you populate the capabilities that the bare metal nodes are
  reporting.
  I guess an api extension that rpcs to a baremetal node to add the
  node. Maybe someday this could be autogenerated by the bare metal
  host
  looking in its arp table for dhcp requests! :)
 
 
  Vish
 
 
 
 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-21 Thread Vishvananda Ishaya
I think you should be able to modify nova/manager.py to store a list of 
capabilities and report them all individually to the scheduler 
via the periodic task.

Vish

On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:

 
  Hi Vish,
 
  We are trying to change our code according to your comment.
 I want to ask a question.
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as
 well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
  Modifying driver.get_host_stats to return a list of host stats is easy.
  Making multiple calls to self.update_service_capabilities(capabilities) 
 doesn't seem to work,
 because 'capabilities' is overwritten each time.
 
  Modifying the receiving end to accept a list seems to be easy.
 However, 'capabilities' is assumed to be dictionary by all other scheduler 
 routines,
 it looks like that we have to change all of them to handle 'capability' as a 
 list of dictionary.
 
  If my understanding is correct, it would affect many parts of the scheduler.
 Is it what you recommended?
 
  Thanks,
  David
  
 
 - Original Message -
 This was an immediate goal, the bare-metal nova-compute node could
 keep an internal database, but report capabilities through nova in the
 common way with the changes below. Then the scheduler wouldn't need
 access to the bare metal database at all.
 
 On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:
 
 
 Hi Vish,
 
 Is this discussion for long-term goal or for this Folsom release?
 
  We still believe that the bare-metal database is needed
  because there is no automated way for bare-metal nodes to report
  their capabilities
  to their bare-metal nova-compute node.
 
 Thanks,
 David
 
 
 I am interested in finding a solution that enables bare-metal and
 virtualized requests to be serviced through the same scheduler
 where
 the compute_nodes table has a full view of schedulable resources.
 This
 would seem to simplify the end-to-end flow while opening up some
 additional use cases (e.g. dynamic allocation of a node from
 bare-metal to hypervisor and back).
 
 One approach would be to have a proxy running a single nova-compute
 daemon fronting the bare-metal nodes . That nova-compute daemon
 would
 report up many HostState objects (1 per bare-metal node) to become
 entries in the compute_nodes table and accessible through the
 scheduler HostManager object.
 
 
 
 
  The HostState object would set cpu_info, vcpus, memory_mb and
 local_gb
 values to be used for scheduling with the hypervisor_host field
 holding the bare-metal machine address (e.g. for IPMI based
 commands)
 and hypervisor_type = NONE. The bare-metal Flavors are created with
 an
 extra_spec of hypervisor_type= NONE and the corresponding
 compute_capabilities_filter would reduce the available hosts to
 those
 bare_metal nodes. The scheduler would need to understand that
 hypervisor_type = NONE means you need an exact fit (or best-fit)
 host
 vs weighting them (perhaps through the multi-scheduler). The
 scheduler
 would cast out the message to the topic.service-hostname (code
 today uses the HostState hostname), with the compute driver having
 to
 understand if it must be serviced elsewhere (but does not break any
 existing implementations since it is 1 to 1).
 
 
 
 
 
 Does this solution seem workable? Anything I missed?
 
 The bare metal driver already is proxying for the other nodes so it
 sounds like we need a couple of things to make this happen:
 
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as
 well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
 
 b) make a few minor changes to the scheduler to make sure filtering
 still works. Note the changes here may be very helpful:
 
 
 https://review.openstack.org/10327
 
 
 c) we have to make sure that instances launched on those nodes take
 up
 the entire host state somehow. We could probably do this by making
 sure that the instance_type ram, mb, gb etc. matches what the node
 has, but we may want a new boolean field used if those aren't
 sufficient.
 
 
  This approach seems pretty good. We could potentially get rid of
 the
 shared bare_metal_node table. I guess the only other concern is how
 you populate the capabilities that the bare metal nodes are
 reporting.
 I guess an api extension that rpcs to a baremetal node to add the
 node. Maybe someday this could be autogenerated by the bare metal
 host
 looking in its arp table for dhcp requests! :)
 
 
 Vish
 
 
 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-21 Thread Vishvananda Ishaya
To elaborate, something like the below. I'm not absolutely sure you need to be able 
to set service_name and host, but this gives you the option to do so if needed.

diff --git a/nova/manager.py b/nova/manager.py
index c6711aa..c0f4669 100644
--- a/nova/manager.py
+++ b/nova/manager.py
@@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
     def update_service_capabilities(self, capabilities):
         """Remember these capabilities to send on next periodic update."""
+        if not isinstance(capabilities, list):
+            capabilities = [capabilities]
         self.last_capabilities = capabilities
 
     @periodic_task
@@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
         """Pass data back to the scheduler at a periodic interval."""
         if self.last_capabilities:
             LOG.debug(_('Notifying Schedulers of capabilities ...'))
-            self.scheduler_rpcapi.update_service_capabilities(context,
-                self.service_name, self.host, self.last_capabilities)
+            for capability_item in self.last_capabilities:
+                name = capability_item.get('service_name', self.service_name)
+                host = capability_item.get('host', self.host)
+                self.scheduler_rpcapi.update_service_capabilities(context,
+                    name, host, capability_item)
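
For illustration, here is a rough sketch (not part of the patch above; the
class name, the node dict keys, and the capability keys are made up) of how
the bare-metal driver side could then return one capability dict per managed
node from get_host_stats, with a 'host' key for the loop above to consume:

class BareMetalDriverSketch(object):
    """Illustrative driver-side sketch, not the real bare-metal driver."""

    def __init__(self, nodes):
        # nodes: list of dicts describing the bare-metal machines this
        # proxy nova-compute manages (where that list comes from is out
        # of scope here).
        self._nodes = nodes

    def get_host_stats(self, refresh=False):
        """Return one capability dict per managed bare-metal node."""
        stats = []
        for node in self._nodes:
            stats.append({
                'host': node['service_host'],            # e.g. 'bespin101-0'
                'hypervisor_hostname': node['address'],  # e.g. IPMI address
                'hypervisor_type': 'NONE',
                'vcpus': node['cpus'],
                'memory_mb': node['memory_mb'],
                'local_gb': node['local_gb'],
            })
        return stats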

On Aug 21, 2012, at 1:28 PM, David Kang dk...@isi.edu wrote:

 
  Hi Vish,
 
  We are trying to change our code according to your comment.
 I want to ask a question.
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as
 well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
  Modifying driver.get_host_stats to return a list of host stats is easy.
 Making multiple calls to self.update_service_capabilities(capabilities)
 doesn't seem to work, because 'capabilities' is overwritten each time.
 
  Modifying the receiving end to accept a list seems to be easy.
 However, since 'capabilities' is assumed to be a dictionary by all the other
 scheduler routines, it looks like we would have to change all of them to
 handle 'capabilities' as a list of dictionaries.
 
 If my understanding is correct, this would affect many parts of the scheduler.
 Is that what you recommended?
 
  Thanks,
  David
  
 
 - Original Message -
 This was an immediate goal, the bare-metal nova-compute node could
 keep an internal database, but report capabilities through nova in the
 common way with the changes below. Then the scheduler wouldn't need
 access to the bare metal database at all.
 
 On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:
 
 
 Hi Vish,
 
 Is this discussion for long-term goal or for this Folsom release?
 
 We still believe that the bare-metal database is needed,
 because there is no automated way for bare-metal nodes to report
 their capabilities to their bare-metal nova-compute node.
 
 Thanks,
 David
 
 
 I am interested in finding a solution that enables bare-metal and
 virtualized requests to be serviced through the same scheduler
 where
 the compute_nodes table has a full view of schedulable resources.
 This
 would seem to simplify the end-to-end flow while opening up some
 additional use cases (e.g. dynamic allocation of a node from
 bare-metal to hypervisor and back).
 
 One approach would be to have a proxy running a single nova-compute
 daemon fronting the bare-metal nodes . That nova-compute daemon
 would
 report up many HostState objects (1 per bare-metal node) to become
 entries in the compute_nodes table and accessible through the
 scheduler HostManager object.
 
 
 
 
 The HostState object would set cpu_info, vcpus, memory_mb and local_gb
 values to be used for scheduling, with the hypervisor_host field
 holding the bare-metal machine address (e.g. for IPMI based commands)
 and hypervisor_type = NONE. The bare-metal Flavors are created with an
 extra_spec of hypervisor_type = NONE, and the corresponding
 compute_capabilities_filter would reduce the available hosts to those
 bare_metal nodes. The scheduler would need to understand that
 hypervisor_type = NONE means you need an exact fit (or best-fit) host
 vs weighting them (perhaps through the multi-scheduler). The scheduler
 would cast out the message to the topic.service-hostname (code today
 uses the HostState hostname), with the compute driver having to
 understand if it must be serviced elsewhere (but does not break any
 existing implementations since it is 1 to 1).
 
 
 
 
 
 Does this solution seem workable? Anything I missed?
 
 The bare metal driver already is proxying for the other nodes so it
 sounds like we need a couple of things to make this happen:
 
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a 

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-16 Thread Michael J Fork

vishvana...@gmail.com wrote on 08/15/2012 06:54:58 PM:

 From: Vishvananda Ishaya vishvana...@gmail.com
 To: OpenStack Development Mailing List
openstack-...@lists.openstack.org,
 Cc: openstack@lists.launchpad.net \(openstack@lists.launchpad.net
 \) openstack@lists.launchpad.net
 Date: 08/15/2012 06:58 PM
 Subject: Re: [Openstack] [openstack-dev] Discussion about where to
 put database for bare-metal provisioning (review 10726)
 Sent by: openstack-bounces+mjfork=us.ibm@lists.launchpad.net

 On Aug 15, 2012, at 3:17 PM, Michael J Fork mjf...@us.ibm.com wrote:

  I am interested in finding a solution that enables bare-metal and
  virtualized requests to be serviced through the same scheduler where
  the compute_nodes table has a full view of schedulable resources.
  This would seem to simplify the end-to-end flow while opening up
  some additional use cases (e.g. dynamic allocation of a node from
  bare-metal to hypervisor and back).
 
  One approach would be to have a proxy running a single nova-compute
  daemon fronting the bare-metal nodes .  That nova-compute daemon
  would report up many HostState objects (1 per bare-metal node) to
  become entries in the compute_nodes table and accessible through the
  scheduler HostManager object.
  The HostState object would set cpu_info, vcpus, memory_mb and
  local_gb values to be used for scheduling with the hypervisor_host
  field holding the bare-metal machine address (e.g. for IPMI based
  commands) and hypervisor_type = NONE.  The bare-metal Flavors are
  created with an extra_spec of hypervisor_type = NONE and the
  corresponding compute_capabilities_filter would reduce the available
  hosts to those bare_metal nodes.  The scheduler would need to
  understand that hypervisor_type = NONE means you need an exact fit
  (or best-fit) host vs weighting them (perhaps through the multi-
  scheduler).  The scheduler would cast out the message to the
  topic.service-hostname (code today uses the HostState hostname),
  with the compute driver having to understand if it must be serviced
  elsewhere (but does not break any existing implementations since it
  is 1 to 1).
 
  Does this solution seem workable? Anything I missed?
  The bare metal driver already is proxying for the other nodes so it
 sounds like we need a couple of things to make this happen:

 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as
 well or just make multiple calls to
 self.update_service_capabilities(capabilities)

 b) make a few minor changes to the scheduler to make sure filtering
 still works. Note the changes here may be very helpful:

 https://review.openstack.org/10327

 c) we have to make sure that instances launched on those nodes take
 up the entire host state somehow. We could probably do this by
 making sure that the instance_type ram, mb, gb etc. matches what the
 node has, but we may want a new boolean field used if those aren't
 sufficient.

My initial thought is that showing the actual resources the guest requested
as being consumed in HostState would enable use cases like migrating a
guest running on a too-big machine to a right-size one.  However, that
would require the bare-metal node to store the state of the requested
guest when that information could be obtained from the instance_type.

For now, the simplest is probably to have the bare-metal virt driver set
disk_available = 0 and host_memory_free = 0 so the scheduler removes them
from consideration, with vcpus, disk_total, and host_memory_total set to
the physical machine values.  If the requested guest size is easily
accessible, the _used values could be set to those values (although it is
not clear whether anything would break with _total != _free + _used; in
that case setting _used = _total would seem acceptable for now).
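
For example, something like this (a rough sketch only; the key names follow
the ones used here and may not match exactly what the driver reports):

occupied_node_capabilities = {
    'host': 'bespin101-0',               # hypothetical per-node host name
    'hypervisor_hostname': '10.0.0.42',  # bare-metal machine (e.g. IPMI) address
    'hypervisor_type': 'NONE',
    'vcpus': 16,
    'host_memory_total': 32768,          # physical machine value, in MB
    'host_memory_free': 0,               # nothing free, so no new guest fits
    'disk_total': 500,                   # physical machine value, in GB
    'disk_available': 0,
}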

Another option is to add num_instances to HostState and have the bare-metal
filter remove hypervisor_type = NONE hosts with num_instances > 0.  The
scheduler would never see them, so there would be no need to show them as
fully consumed.  The drawback is that the num_instances call is marked as
expensive and would incur some overhead.
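
A rough sketch of such a filter (assuming the Folsom-era filter interface,
nova.scheduler.filters.BaseHostFilter.host_passes, and the proposed
num_instances field on HostState; the class name is made up):

from nova.scheduler import filters


class BareMetalInUseFilter(filters.BaseHostFilter):
    """Hide bare-metal hosts that already run an instance (sketch)."""

    def host_passes(self, host_state, filter_properties):
        # capabilities as reported up by the bare-metal proxy (see above)
        caps = getattr(host_state, 'capabilities', None) or {}
        if caps.get('hypervisor_type') != 'NONE':
            # Not a bare-metal node; leave it to the other filters.
            return True
        # Only an empty bare-metal node can take a new instance.
        return getattr(host_state, 'num_instances', 0) == 0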

 This approach seems pretty good. We could potentially get rid of
 the shared bare_metal_node table. I guess the only other concern is
 how you populate the capabilities that the bare metal nodes are
 reporting. I guess an api extension that rpcs to a baremetal node to
 add the node. Maybe someday this could be autogenerated by the bare
 metal host looking in its arp table for dhcp requests! :)

 Vish
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

Michael

-
Michael Fork
Cloud

Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-15 Thread Michael J Fork

I am interested in finding a solution that enables bare-metal and
virtualized requests to be serviced through the same scheduler where the
compute_nodes table has a full view of schedulable resources.  This would
seem to simplify the end-to-end flow while opening up some additional use
cases (e.g. dynamic allocation of a node from bare-metal to hypervisor and
back).

One approach would be to have a proxy running a single nova-compute daemon
fronting the bare-metal nodes.  That nova-compute daemon would report up
many HostState objects (1 per bare-metal node) to become entries in the
compute_nodes table and accessible through the scheduler HostManager
object.  The HostState object would set cpu_info, vcpus, memory_mb and
local_gb values to be used for scheduling with the hypervisor_host field
holding the bare-metal machine address (e.g. for IPMI based commands) and
hypervisor_type = NONE.  The bare-metal Flavors are created with an
extra_spec of hypervisor_type = NONE and the corresponding
compute_capabilities_filter would reduce the available hosts to those
bare_metal nodes.  The scheduler would need to understand that
hypervisor_type = NONE means you need an exact fit (or best-fit) host vs
weighting them (perhaps through the multi-scheduler).  The scheduler would
cast out the message to the topic.service-hostname (code today uses the
HostState hostname), with the compute driver having to understand if it
must be serviced elsewhere (but does not break any existing implementations
since it is 1 to 1).
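
To make that concrete, here is a rough sketch of the values one such
compute_nodes / HostState entry might carry for a free bare-metal node, plus
the matching flavor extra_spec (illustrative values only; the exact
capability keys would depend on the driver):

# One entry per bare-metal node, reported up by the proxy nova-compute
baremetal_node_state = {
    'hypervisor_hostname': '10.1.2.3',   # bare-metal machine address (e.g. IPMI)
    'hypervisor_type': 'NONE',
    'cpu_info': '{"arch": "x86_64", "cores": 16}',
    'vcpus': 16,
    'memory_mb': 49152,
    'local_gb': 500,
}

# Bare-metal flavor extra_spec the compute_capabilities_filter matches against
baremetal_flavor_extra_specs = {'hypervisor_type': 'NONE'}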

Does this solution seem workable? Anything I missed?

Michael

-
Michael Fork
Cloud Architect, Emerging Solutions
IBM Systems & Technology Group

David Kang dk...@isi.edu wrote on 08/15/2012 11:28:34 AM:

 From: David Kang dk...@isi.edu
 To: OpenStack Development Mailing List openstack-
 d...@lists.openstack.org, openstack@lists.launchpad.net
 (openstack@lists.launchpad.net) openstack@lists.launchpad.net,
 Cc: mkk...@isi.edu, Ken Ash 50ba...@gmail.com, VTJ NOTSU Arata
 no...@virtualtech.jp
 Date: 08/15/2012 02:08 PM
 Subject: [openstack-dev] Discussion about where to put database for
 bare-metal provisioning (review 10726)



  Hi,

  This is a call for discussion about code review 10726.
 https://review.openstack.org/#/c/10726/
 Mark asked why we implemented a separate database for bare-metal
 provisioning. Here we describe our thinking. We are open to discussion
 and to the changes that the community recommends. Please give us your
 thoughts.

  NTT Docomo and USC/ISI have developed bare-metal provisioning.
 We created a separate database to describe bare-metal nodes; it currently
 consists of 5 tables.
 Our initial implementation assumes the database is not part of the
 nova database.
 In addition to the reasons described in the comments of the code review,
 here is another reason we decided on a separate database for bare-metal
 provisioning.

 The bare-metal database is mainly used by the bare-metal nova-compute.
 Since the bare-metal nova-compute manages multiple bare-metal machines,
 it needs to keep and update the information about those machines.
 If the bare-metal database is in the main nova db, remote access to the
 nova db by the bare-metal nova-compute is inevitable.
 Vish once told us that shared db access from nova-compute is not
 desirable.

 It is possible to make the scheduler do the job of the bare-metal
 nova-compute. However, it would need big changes in how the scheduler
 and a nova-compute communicate. For example, currently the scheduler
 casts an instance to a nova-compute, but for a bare-metal node the
 scheduler would have to cast an instance to a bare-metal machine through
 the bare-metal nova-compute. The bare-metal nova-compute has to boot the
 machine and transfer the kernel, fs, etc. So the bare-metal nova-compute
 needs to know the id of the bare-metal node and other information for
 booting (PXE ip address, ...) and more. That information would have to
 be sent to the bare-metal nova-compute by the scheduler.

 If frequent access to the bare-metal tables in the nova db from the
 bare-metal nova-compute is OK, we are OK with putting the bare-metal
 tables into the nova db.

  Please let us know your opinions.

  Thanks,
  David, Mikyung @ USC/ISI

 --
 Dr. Dong-In David Kang
 Computer Scientist
 USC/ISI


 ___
 OpenStack-dev mailing list
 openstack-...@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-15 Thread Vishvananda Ishaya

On Aug 15, 2012, at 3:17 PM, Michael J Fork mjf...@us.ibm.com wrote:

 I am interested in finding a solution that enables bare-metal and virtualized 
 requests to be serviced through the same scheduler where the compute_nodes 
 table has a full view of schedulable resources.  This would seem to simplify 
 the end-to-end flow while opening up some additional use cases (e.g. dynamic 
 allocation of a node from bare-metal to hypervisor and back).  
 
 One approach would be to have a proxy running a single nova-compute daemon 
 fronting the bare-metal nodes .  That nova-compute daemon would report up 
 many HostState objects (1 per bare-metal node) to become entries in the 
 compute_nodes table and accessible through the scheduler HostManager object.
 
 The HostState object would set cpu_info, vcpus, memory_mb and local_gb values
 to be used for scheduling with the hypervisor_host field holding the
 bare-metal machine address (e.g. for IPMI based commands) and hypervisor_type
 = NONE.  The bare-metal Flavors are created with an extra_spec of
 hypervisor_type = NONE and the corresponding compute_capabilities_filter would
 reduce the available hosts to those bare_metal nodes.  The scheduler would 
 need to understand that hypervisor_type = NONE means you need an exact fit 
 (or best-fit) host vs weighting them (perhaps through the multi-scheduler).  
 The scheduler would cast out the message to the topic.service-hostname 
 (code today uses the HostState hostname), with the compute driver having to 
 understand if it must be serviced elsewhere (but does not break any existing 
 implementations since it is 1 to 1).
 
 
 Does this solution seem workable? Anything I missed?
 
The bare metal driver already is proxying for the other nodes so it sounds like 
we need a couple of things to make this happen:

a) modify driver.get_host_stats to be able to return a list of host stats 
instead of just one. Report the whole list back to the scheduler. We could 
modify the receiving end to accept a list as well or just make multiple calls 
to 
self.update_service_capabilities(capabilities)

b) make a few minor changes to the scheduler to make sure filtering still 
works. Note the changes here may be very helpful:

https://review.openstack.org/10327

c) we have to make sure that instances launched on those nodes take up the 
entire host state somehow. We could probably do this by making sure that the 
instance_type ram, mb, gb etc. matches what the node has, but we may want a new 
boolean field used if those aren't sufficient.

This approach seems pretty good. We could potentially get rid of the shared 
bare_metal_node table. I guess the only other concern is how you populate the 
capabilities that the bare metal nodes are reporting. I guess an api extension 
that rpcs to a baremetal node to add the node. Maybe someday this could be 
autogenerated by the bare metal host looking in its arp table for dhcp 
requests! :)
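
Something along these lines, perhaps (a very rough sketch; the
'add_baremetal_node' RPC method and the node_info payload are made up, and
it assumes the Folsom-era rpc.cast API):

from nova.openstack.common import rpc


def register_baremetal_node(context, proxy_host, node_info):
    """Ask the bare-metal nova-compute proxy to start managing a node."""
    rpc.cast(context,
             'compute.%s' % proxy_host,        # topic of the proxy nova-compute
             {'method': 'add_baremetal_node',  # hypothetical manager method
              'args': {'node_info': node_info}})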

Vish

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-15 Thread David Kang

 Hi Vish,

 Is this discussion for long-term goal or for this Folsom release?

 We still believe that the bare-metal database is needed,
because there is no automated way for bare-metal nodes to report their
capabilities to their bare-metal nova-compute node.

 Thanks,
 David
 
 
 I am interested in finding a solution that enables bare-metal and
 virtualized requests to be serviced through the same scheduler where
 the compute_nodes table has a full view of schedulable resources. This
 would seem to simplify the end-to-end flow while opening up some
 additional use cases (e.g. dynamic allocation of a node from
 bare-metal to hypervisor and back).
 
 One approach would be to have a proxy running a single nova-compute
 daemon fronting the bare-metal nodes . That nova-compute daemon would
 report up many HostState objects (1 per bare-metal node) to become
 entries in the compute_nodes table and accessible through the
 scheduler HostManager object.
 
 
 
 
 The HostState object would set cpu_info, vcpus, memory_mb and local_gb
 values to be used for scheduling with the hypervisor_host field
 holding the bare-metal machine address (e.g. for IPMI based commands)
 and hypervisor_type = NONE. The bare-metal Flavors are created with an
 extra_spec of hypervisor_type = NONE and the corresponding
 compute_capabilities_filter would reduce the available hosts to those
 bare_metal nodes. The scheduler would need to understand that
 hypervisor_type = NONE means you need an exact fit (or best-fit) host
 vs weighting them (perhaps through the multi-scheduler). The scheduler
 would cast out the message to the topic.service-hostname (code
 today uses the HostState hostname), with the compute driver having to
 understand if it must be serviced elsewhere (but does not break any
 existing implementations since it is 1 to 1).
 
 
 
 
 
 Does this solution seem workable? Anything I missed?
 
 The bare metal driver already is proxying for the other nodes so it
 sounds like we need a couple of things to make this happen:
 
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
 
 b) make a few minor changes to the scheduler to make sure filtering
 still works. Note the changes here may be very helpful:
 
 
 https://review.openstack.org/10327
 
 
 c) we have to make sure that instances launched on those nodes take up
 the entire host state somehow. We could probably do this by making
 sure that the instance_type ram, mb, gb etc. matches what the node
 has, but we may want a new boolean field used if those aren't
 sufficient.
 
 
 This approach seems pretty good. We could potentially get rid of the
 shared bare_metal_node table. I guess the only other concern is how
 you populate the capabilities that the bare metal nodes are reporting.
 I guess an api extension that rpcs to a baremetal node to add the
 node. Maybe someday this could be autogenerated by the bare metal host
 looking in its arp table for dhcp requests! :)
 
 
 Vish
 
 ___
 OpenStack-dev mailing list
 openstack-...@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-15 Thread Vishvananda Ishaya
This was an immediate goal, the bare-metal nova-compute node could keep an 
internal database, but report capabilities through nova in the common way with 
the changes below. Then the scheduler wouldn't need access to the bare metal 
database at all.

On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:

 
 Hi Vish,
 
 Is this discussion for long-term goal or for this Folsom release?
 
 We still believe that the bare-metal database is needed,
 because there is no automated way for bare-metal nodes to report their
 capabilities to their bare-metal nova-compute node.
 
 Thanks,
 David
 
 
 I am interested in finding a solution that enables bare-metal and
 virtualized requests to be serviced through the same scheduler where
 the compute_nodes table has a full view of schedulable resources. This
 would seem to simplify the end-to-end flow while opening up some
 additional use cases (e.g. dynamic allocation of a node from
 bare-metal to hypervisor and back).
 
 One approach would be to have a proxy running a single nova-compute
 daemon fronting the bare-metal nodes . That nova-compute daemon would
 report up many HostState objects (1 per bare-metal node) to become
 entries in the compute_nodes table and accessible through the
 scheduler HostManager object.
 
 
 
 
 The HostState object would set cpu_info, vcpus, memory_mb and local_gb
 values to be used for scheduling with the hypervisor_host field
 holding the bare-metal machine address (e.g. for IPMI based commands)
 and hypervisor_type = NONE. The bare-metal Flavors are created with an
 extra_spec of hypervisor_type = NONE and the corresponding
 compute_capabilities_filter would reduce the available hosts to those
 bare_metal nodes. The scheduler would need to understand that
 hypervisor_type = NONE means you need an exact fit (or best-fit) host
 vs weighting them (perhaps through the multi-scheduler). The scheduler
 would cast out the message to the topic.service-hostname (code
 today uses the HostState hostname), with the compute driver having to
 understand if it must be serviced elsewhere (but does not break any
 existing implementations since it is 1 to 1).
 
 
 
 
 
 Does this solution seem workable? Anything I missed?
 
 The bare metal driver already is proxying for the other nodes so it
 sounds like we need a couple of things to make this happen:
 
 
 a) modify driver.get_host_stats to be able to return a list of host
 stats instead of just one. Report the whole list back to the
 scheduler. We could modify the receiving end to accept a list as well
 or just make multiple calls to
 self.update_service_capabilities(capabilities)
 
 
 b) make a few minor changes to the scheduler to make sure filtering
 still works. Note the changes here may be very helpful:
 
 
 https://review.openstack.org/10327
 
 
 c) we have to make sure that instances launched on those nodes take up
 the entire host state somehow. We could probably do this by making
 sure that the instance_type ram, mb, gb etc. matches what the node
 has, but we may want a new boolean field used if those aren't
 sufficient.
 
 
 This approach seems pretty good. We could potentially get rid of the
 shared bare_metal_node table. I guess the only other concern is how
 you populate the capabilities that the bare metal nodes are reporting.
 I guess an api extension that rpcs to a baremetal node to add the
 node. Maybe someday this could be autogenerated by the bare metal host
 looking in its arp table for dhcp requests! :)
 
 
 Vish
 
 ___
 OpenStack-dev mailing list
 openstack-...@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 ___
 OpenStack-dev mailing list
 openstack-...@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

2012-08-15 Thread David Kang

 I see.
We (NTT and USC/ISI) will discuss this further
and will implement it according to your input.
From now on, we assume we will use the current nova db.

 Here is my understanding of your comments. Please correct me if it is not
correct.

 a) is clear.
 b) NTT Docomo has implemented some of the features that you mentioned in a
new file, baremetal_host_manager.py.
  We will not use baremetal_host_manager.py but will modify host_manager.py
directly.
  I'm not sure whether we have to change the scheduler.
 c) I think the 'used' field may work. We'll look into it.

 Thanks,
 David

--
Dr. Dong-In David Kang
Computer Scientist
USC/ISI

- Original Message -
 This was an immediate goal, the bare-metal nova-compute node could
 keep an internal database, but report capabilities through nova in the
 common way with the changes below. Then the scheduler wouldn't need
 access to the bare metal database at all.
 
 On Aug 15, 2012, at 4:23 PM, David Kang dk...@isi.edu wrote:
 
 
  Hi Vish,
 
  Is this discussion for long-term goal or for this Folsom release?
 
  We still believe that the bare-metal database is needed,
  because there is no automated way for bare-metal nodes to report
  their capabilities to their bare-metal nova-compute node.
 
  Thanks,
  David
 
 
  I am interested in finding a solution that enables bare-metal and
  virtualized requests to be serviced through the same scheduler
  where
  the compute_nodes table has a full view of schedulable resources.
  This
  would seem to simplify the end-to-end flow while opening up some
  additional use cases (e.g. dynamic allocation of a node from
  bare-metal to hypervisor and back).
 
  One approach would be to have a proxy running a single nova-compute
  daemon fronting the bare-metal nodes . That nova-compute daemon
  would
  report up many HostState objects (1 per bare-metal node) to become
  entries in the compute_nodes table and accessible through the
  scheduler HostManager object.
 
 
 
 
  The HostState object would set cpu_info, vcpus, memory_mb and local_gb
  values to be used for scheduling, with the hypervisor_host field
  holding the bare-metal machine address (e.g. for IPMI based commands)
  and hypervisor_type = NONE. The bare-metal Flavors are created with an
  extra_spec of hypervisor_type = NONE, and the corresponding
  compute_capabilities_filter would reduce the available hosts to those
  bare_metal nodes. The scheduler would need to understand that
  hypervisor_type = NONE means you need an exact fit (or best-fit) host
  vs weighting them (perhaps through the multi-scheduler). The scheduler
  would cast out the message to the topic.service-hostname (code today
  uses the HostState hostname), with the compute driver having to
  understand if it must be serviced elsewhere (but does not break any
  existing implementations since it is 1 to 1).
 
 
 
 
 
  Does this solution seem workable? Anything I missed?
 
  The bare metal driver already is proxying for the other nodes so it
  sounds like we need a couple of things to make this happen:
 
 
  a) modify driver.get_host_stats to be able to return a list of host
  stats instead of just one. Report the whole list back to the
  scheduler. We could modify the receiving end to accept a list as
  well
  or just make multiple calls to
  self.update_service_capabilities(capabilities)
 
 
  b) make a few minor changes to the scheduler to make sure filtering
  still works. Note the changes here may be very helpful:
 
 
  https://review.openstack.org/10327
 
 
  c) we have to make sure that instances launched on those nodes take
  up
  the entire host state somehow. We could probably do this by making
  sure that the instance_type ram, mb, gb etc. matches what the node
  has, but we may want a new boolean field used if those aren't
  sufficient.
 
 
  This approach seems pretty good. We could potentially get rid of the
  shared bare_metal_node table. I guess the only other concern is how
  you populate the capabilities that the bare metal nodes are reporting.
  I guess an api extension that rpcs to a baremetal node to add the
  node. Maybe someday this could be autogenerated by the bare metal host
  looking in its arp table for dhcp requests! :)
 
 
  Vish
 
  ___
  OpenStack-dev mailing list
  openstack-...@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
  ___
  OpenStack-dev mailing list
  openstack-...@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 ___
 OpenStack-dev mailing list
 openstack-...@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe :