On 06/13/2018 03:17 PM, James Slagle wrote:
On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:
Slightly hijacking the thread to provide a status update on one of the items
:)

Thanks for jumping in.


The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
repositories, then start experimenting. I need to find a way to
1. make the creation of Nova instances a no-op
2. collect the required information from the created stack (I need networks,
ports, hostnames, initial SSH keys, capabilities, images)
3. update the config-download code to optionally include the role [2]
I'm not entirely sure where to start, so any hints are welcome.

Here are a couple of possibilities.

We could reuse the OS::TripleO::{{role.name}}Server mappings that we
already have in place for pre-provisioned nodes (deployed-server).
This could be mapped to a template that exposes some Ansible tasks as
outputs that drive metalsmith to do the deployment. When
config-download runs, it would execute these Ansible tasks to
provision the nodes with Ironic. This has the advantage of maintaining
compatibility with our existing Heat parameter interfaces. It removes
Nova from the deployment so that from the undercloud perspective you'd
roughly have:

Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)
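
As a very rough sketch (the template name, output name and task body below are made up for illustration), the mapping and the template it points to could look something like:

  resource_registry:
    OS::TripleO::ComputeServer: deployed-server/metalsmith-server.yaml

  # in the hypothetical metalsmith-server.yaml: no OS::Nova::Server at all,
  # only Ansible tasks exposed as an output for config-download to run
  outputs:
    provisioning_tasks:
      value:
        - name: provision this node via metalsmith
          include_role:
            name: metalsmith_deployment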

One thing that came to my mind while planning this work: I'd prefer all nodes to be processed in one step. This will help avoid some of the issues that we have now. For example, the following does not work reliably:

 compute-0: any node with the profile:compute capability
 compute-1: exactly the node abcd
 control-0: any node at all

This has two issues that will pop up randomly:
1. compute-0 can pick node abcd, which is designated for compute-1
2. control-0 can pick a compute node, causing either compute-0 or compute-1 to fail

This problem is hard to fix if each deployment request is processed separately, but it is quite trivial if the decision is made based on the whole deployment plan. I'm going to work on such a bulk scheduler in metalsmith.
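
To make that concrete, a bulk request could look roughly like this (the format is illustrative, not a final metalsmith interface): exact-node requests are reserved first, profile-constrained ones second and unconstrained ones last, so control-0 can no longer steal node abcd or the last compute-profile node.

  instances:
    - hostname: compute-1
      name: abcd                # exact node, reserved first
    - hostname: compute-0
      capabilities:
        profile: compute        # constrained, reserved second
    - hostname: control-0       # unconstrained, reserved last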


A further (or completely different) iteration might look like:

Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
Step 2: Heat -> config-download

Step 1 would still use the provided environment to figure out the node count for each role, their images, capabilities and (optionally) precise node placement, right? I'm a bit worried about the last bit: IIRC we currently rely on Heat's %index% variable for that. We can, of course, ask people to replace it with something more explicit on upgrade.
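
For reference, precise placement is currently expressed via per-role scheduler hints along these lines (the parameter name varies by role), and this is what an upgrade would have to translate into something explicit:

  parameter_defaults:
    ControllerSchedulerHints:
      'capabilities:node': 'controller-%index%'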


Step 2 would use the pre-provisioned node (deployed-server) feature
that already exists in TripleO and treat the nodes just provisioned by
Ironic as pre-provisioned from the Heat stack's perspective. Step 1 and
Step 2 would also probably be driven by a higher-level Mistral
workflow. This has the advantage of minimal impact to
tripleo-heat-templates, and also removes Heat from the baremetal
provisioning step. However, we'd likely need some Python compatibility
libraries that could translate Heat parameter values such as
HostnameMap to Ansible vars for some basic backwards compatibility.
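
For example (the Ansible-side variable name is purely illustrative), a Heat parameter like:

  parameter_defaults:
    HostnameMap:
      overcloud-controller-0: ctrl-0.example.com

could be translated by such a library into an equivalent Ansible variable:

  hostname_map:
    overcloud-controller-0: ctrl-0.example.com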

Overall, I like this option better. It will allow an operator to isolate the bare metal provisioning step from everything else.



[1] https://github.com/openstack/metalsmith
[2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html


Obviously we have things to consider here such as backwards compatibility and upgrades, but overall, I think this would be a great simplification to our overall deployment workflow.


Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe
by re-defining them to OS::Heat::None?
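
I.e. something along these lines in an environment file:

  resource_registry:
    OS::TripleO::ComputeServer: OS::Heat::None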

Not exactly, as Heat would delete the previous versions of the
resources. We'd need some special migrations, or could support the
existing method forever for upgrades, and only deprecate it for new
deployments.

Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be OS::Heat::None, Heat will delete the old {{role.name}}Server instances on the next update? This is sad.

I'd prefer not to keep Nova support forever: it is going to be hard to maintain and cover in CI. Should we extend Heat to support "forgetting" resources? I think it may have a use case outside of TripleO.


I'd like to help with this work. I'll start by taking a look at what
you've got so far. Feel free to reach out if you'd like some
additional dev assistance or testing.


Thanks!
