Re: [openstack-dev] [TripleO] RFC: profile matching

Dmitry Tantsur Wed, 02 Dec 2015 02:02:39 -0800

On 12/01/2015 06:55 PM, Ben Nemec wrote:

Sorry for not getting to this earlier.  Some thoughts inline.


On 11/09/2015 08:51 AM, Dmitry Tantsur wrote:

Hi folks!

I spent some time thinking about bringing profile matching back in, so
I'd like to get your comments on the following near-future plan.

First, the scope of the problem. What we do is essentially a kind of
capability discovery. We'll help the nova scheduler do the right
thing by assigning a capability like "suitable for compute", "suitable for
controller", etc. The most obvious path is to use inspector to assign
capabilities like "profile=1" and then filter nodes by it.

Special care, however, is needed when some of the nodes match 2 or
more profiles. E.g. if all 4 of our nodes match "compute" and
only 1 of them also matches "controller", nova can select this one node for
the "compute" flavor, and then complain that it does not have enough hosts
for "controller".

We also want to run some sanity checks before even calling
heat/nova, to avoid cryptic "no valid host found" errors.

(1) Inspector part

During the Liberty cycle we landed a whole bunch of APIs in
inspector that allow us to define rules on introspection data. The plan
is to have rules saying, for example:

   rule 1: if memory_mb >= 8192, add capability "compute_profile=1"
   rule 2: if local_gb >= 100, add capability "controller_profile=1"

Note that these rules are defined via inspector API using a JSON-based
DSL [1].

As you see, one node can receive 0, 1 or many such capabilities. So we
need the next step to make a final decision, based on how many nodes we
need of every profile.
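
For illustration, the two example rules above could look roughly like this. The rule bodies follow my understanding of the inspector DSL (see [1] for the authoritative format); apply_rules() is only a toy local evaluator I made up to show the effect, not inspector code.

```python
import operator

# Rules in the shape of the inspector JSON DSL. The key names reflect my
# understanding of the format; check [1] before relying on them.
RULES = [
    {
        "description": "rule 1: enough RAM for compute",
        "conditions": [{"op": "ge", "field": "memory_mb", "value": 8192}],
        "actions": [{"action": "set-capability",
                     "name": "compute_profile", "value": "1"}],
    },
    {
        "description": "rule 2: enough disk for controller",
        "conditions": [{"op": "ge", "field": "local_gb", "value": 100}],
        "actions": [{"action": "set-capability",
                     "name": "controller_profile", "value": "1"}],
    },
]

# Comparison operators supported by this toy evaluator.
OPS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le,
       "lt": operator.lt, "eq": operator.eq, "ne": operator.ne}


def apply_rules(rules, data):
    """Hypothetical helper: return the capabilities the rules would set."""
    caps = {}
    for rule in rules:
        # A rule fires only if all of its conditions hold.
        if all(OPS[c["op"]](data[c["field"]], c["value"])
               for c in rule["conditions"]):
            for action in rule["actions"]:
                if action["action"] == "set-capability":
                    caps[action["name"]] = action["value"]
    return caps


# One node can end up with 0, 1 or many capabilities.
print(apply_rules(RULES, {"memory_mb": 4096, "local_gb": 40}))
print(apply_rules(RULES, {"memory_mb": 16384, "local_gb": 40}))
print(apply_rules(RULES, {"memory_mb": 16384, "local_gb": 500}))
```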


Is the intent that this will replace the standalone ahc-match call that
currently assigns profiles to nodes?  In general I'm +1 on simplifying
the process (which is why I'm finally revisiting this) so I think I'm
onboard with that idea.

Yes


(2) Modifications of `overcloud deploy` command: assigning profiles

New argument --assign-profiles will be added. If it's provided,
tripleoclient will fetch all ironic nodes, and try to ensure that we
have enough nodes with all profiles.

Nodes with existing "profile:xxx" capability are left as they are. For
nodes without a profile it will look at "xxx_profile" capabilities
discovered on the previous step. One of the possible profiles will be
chosen and assigned to "profile" capability. The assignment stops as
soon as we have enough nodes of a flavor as requested by a user.
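
A minimal sketch of the assignment step described above, on plain dicts instead of real ironic nodes. assign_profiles() and its arguments are hypothetical names, and processing profiles in the order they are given is an assumption; the text above only says that one of the possible profiles is chosen.

```python
def assign_profiles(nodes, wanted):
    """Assign a "profile" capability to nodes that do not have one yet.

    nodes:  list of {"name": str, "capabilities": dict}
    wanted: profile name -> number of nodes requested, in processing order
    Returns node name -> newly assigned profile.
    """
    # Nodes with an existing profile are left alone but count towards
    # the requested totals.
    remaining = dict(wanted)
    for node in nodes:
        profile = node["capabilities"].get("profile")
        if profile in remaining:
            remaining[profile] -= 1

    assigned = {}
    for profile, needed in remaining.items():
        for node in nodes:
            if needed <= 0:
                break  # stop as soon as we have enough nodes
            caps = node["capabilities"]
            if "profile" in caps:
                continue
            # "xxx_profile" capabilities come from the inspector rules.
            if caps.get(profile + "_profile") == "1":
                caps["profile"] = profile
                assigned[node["name"]] = profile
                needed -= 1
    return assigned


nodes = [
    {"name": "node-0", "capabilities": {"compute_profile": "1",
                                        "controller_profile": "1"}},
    {"name": "node-1", "capabilities": {"compute_profile": "1"}},
    {"name": "node-2", "capabilities": {"compute_profile": "1"}},
    {"name": "node-3", "capabilities": {"compute_profile": "1",
                                        "profile": "compute"}},
]
result = assign_profiles(nodes, {"controller": 1, "compute": 3})
print(result)
```

Here node-3 keeps its existing profile, node-0 becomes the controller because "controller" is processed first, and the two remaining nodes fill up the compute count.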


And this assignment would follow the same rules as the existing AHC
version does?  So if I had a rules file that specified 3 controllers, 3
cephs, and an unlimited number of computes, it would first find and
assign 3 controllers, then 3 cephs, and finally assign all the other
matching nodes to compute.

There's no longer a spec file, though we could create something like that. The spec file had 2 problems:

1. it was used to maintain state in local file system

2. it was completely out of sync with what was later passed to the deploy command. So you could, for example, request 1 controller and the remaining to be computes in a spec file, and then request deploy with 2 controllers, which was doomed to fail.


I guess there's still a danger if ceph nodes also match the controller
profile definition but not the other way around, because a ceph node
might get chosen as a controller and then there won't be enough matching
ceph nodes when we get to that.  IIRC (it's been a while since I've done
automatic profile matching) that's how it would work today so it's an
existing problem, but it would be nice if we could fix that as part of
this work.  I'm not sure how complex the resolution code for such
conflicts would need to be.

My current patch does not deal with it. Spec file only had ordering, so you could process 'ceph' before 'controller'. We can do the same by accepting something like --profile-ordering=ceph,controller,compute. WDYT?
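
To make the ordering question concrete, here is a small sketch of how processing order changes the outcome when a ceph node also matches the controller profile. assign_in_order() is a hypothetical helper written for this example only; it is not the code from the patch.

```python
def assign_in_order(candidates, counts, ordering):
    """Greedy assignment, one profile at a time.

    candidates: node name -> set of profiles the node could take
    counts:     profile -> number of nodes requested
    ordering:   profiles in the order they are processed
    Returns (assignment, shortfall).
    """
    assignment = {}
    shortfall = {}
    for profile in ordering:
        needed = counts.get(profile, 0)
        for name, possible in candidates.items():
            if needed == 0:
                break
            if name not in assignment and profile in possible:
                assignment[name] = profile
                needed -= 1
        if needed:
            shortfall[profile] = needed
    return assignment, shortfall


candidates = {
    "node-0": {"ceph", "controller"},  # ceph node that also fits controller
    "node-1": {"controller"},
    "node-2": {"compute"},
}
counts = {"controller": 1, "ceph": 1, "compute": 1}

# Controller first: node-0 is taken as controller, nothing is left for ceph.
print(assign_in_order(candidates, counts, ["controller", "ceph", "compute"]))
# Ceph first: every profile gets its node.
print(assign_in_order(candidates, counts, ["ceph", "controller", "compute"]))
```

Ordering alone only helps when the more restrictive profile can be listed first; it is not a general solution to the matching problem.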


I can't think of something smarter for now, any ideas are welcome.


(3) Modifications of `overcloud deploy` command: validation

To avoid 'no valid host found' errors from nova, the deploy command will
fetch all flavors involved and look at the "profile" capabilities. If
they are set for any flavors, it will check if we have enough ironic
nodes with a given "profile:xxx" capability. This check will happen
after profile assignment, if --assign-profiles is used.

By the way, this is already implemented. I was not aware of it while writing my first email.
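
For readers who have not seen that check, the idea is roughly the following. This is an illustrative sketch on plain data, not the actual tripleoclient code; check_profiles() and the tuple layout for flavors are made up for the example, and the "key:value,key:value" string is the usual ironic capabilities format.

```python
from collections import Counter


def parse_capabilities(text):
    """Parse an ironic-style 'key1:value1,key2:value2' capabilities string."""
    caps = {}
    for item in text.split(","):
        if ":" in item:
            key, value = item.split(":", 1)
            caps[key.strip()] = value.strip()
    return caps


def check_profiles(flavors, node_capabilities):
    """Return a list of error messages; an empty list means the check passed.

    flavors:           list of (flavor name, profile or None, requested count)
    node_capabilities: list of capabilities strings, one per ironic node
    """
    available = Counter(
        parse_capabilities(text).get("profile") for text in node_capabilities)
    errors = []
    for name, profile, count in flavors:
        if profile is None:
            continue  # flavor does not ask for a profile, nothing to check
        if available[profile] < count:
            errors.append(
                f"flavor {name}: need {count} node(s) with "
                f"profile:{profile}, found {available[profile]}")
    return errors


flavors = [("control", "controller", 1), ("compute", "compute", 3),
           ("baremetal", None, 1)]
node_capabilities = [
    "profile:controller,boot_option:local",
    "profile:compute,boot_option:local",
    "profile:compute",
    "boot_option:local",
]
errors = check_profiles(flavors, node_capabilities)
print(errors)
```

Failing here, before heat/nova are called, gives a message that names the missing profile instead of a bare "no valid host found".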


Please let me know what you think.

[1] https://github.com/openstack/ironic-inspector#introspection-rules

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


