Re: [openstack-dev] [TripleO] config-download/ansible next steps
On Mon, Jun 18, 2018 at 1:51 PM, Dmitry Tantsur wrote: > On 06/13/2018 03:17 PM, James Slagle wrote: >> >> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur >> wrote: >>> >>> Slightly hijacking the thread to provide a status update on one of the >>> items >>> :) >> >> >> Thanks for jumping in. >> >> >>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the >>> repositories, then start experimenting. I need to find a way to >>> 1. make creating nova instances no-op >>> 2. collect the required information from the created stack (I need >>> networks, >>> ports, hostnames, initial SSH keys, capabilities, images) >>> 3. update the config-download code to optionally include the role [2] >>> I'm not entirely sure where to start, so any hints are welcome. >> >> >> Here are a couple of possibilities. >> >> We could reuse the OS::TripleO::{{role.name}}Server mappings that we >> already have in place for pre-provisioned nodes (deployed-server). >> This could be mapped to a template that exposes some Ansible tasks as >> outputs that drives metalsmith to do the deployment. When >> config-download runs, it would execute these ansible tasks to >> provision the nodes with Ironic. This has the advantage of maintaining >> compatibility with our existing Heat parameter interfaces. It removes >> Nova from the deployment so that from the undercloud perspective you'd >> roughly have: >> >> Mistral -> Heat -> config-download -> Ironic (driven via >> ansible/metalsmith) > > > One thing that came to my mind while planning this work is that I'd prefer > all nodes to be processed in one step. This will help avoiding some issues > that we have now. For example, the following does not work reliably: > > compute-0: just any profile:compute > compute-1: precise node=abcd > control-0: any node > > This has two issues that will pop up randomly: > 1. compute-0 can pick node abcd designated for compute-1 > 2. 
control-0 can pick a compute node, failing either compute-0 or compute-1 > > This problem is hard to fix if all deployment requests are processed > separately, but is quite trivial if the decision is done based on the whole > deployment plan. I'm going to work on a bulk scheduler like that in > metalsmith. > >> >> A further (or completely different) iteration might look like: >> >> Step 1: Mistral -> Ironic (driven via ansible/metalsmith) >> Step 2: Heat -> config-download > > > Step 1 will still use provided environment to figure out the count of nodes > for each role, their images, capabilities and (optionally) precise node > scheduling? > I'm a bit worried about the last bit: IIRC we rely on Heat's %index% > variable currently. We can, of course, ask people to replace it with > something more explicit on upgrade. > >> >> Step 2 would use the pre-provisioned node (deployed-server) feature >> already existing in TripleO and treat the just provisioned by Ironic >> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and >> Step 2 would also probably be driven by a higher level Mistral >> workflow. This has the advantage of minimal impact to >> tripleo-heat-templates, and also removes Heat from the baremetal >> provisioning step. However, we'd likely need some python compatibility >> libraries that could translate Heat parameter values such as >> HostnameMap to ansible vars for some basic backwards compatibility. > > > Overall, I like this option better. It will allow an operator to isolate the > bare metal provisioning step from everything else. > >> >>> >>> [1] https://github.com/openstack/metalsmith >>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html >>> Obviously we have things to consider here such as backwards compatibility and upgrades, but overall, I think this would be a great simplification to our overall deployment workflow. >>> >>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? 
>>> Maybe >>> by re-defining them to OS::Heat::None? >> >> >> Not exactly, as Heat would delete the previous versions of the >> resources. We'd need some special migrations, or could support the >> existing method forever for upgrades, and only deprecate it for new >> deployments. > > > Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be > OS::Heat::None, Heat will delete the old {{role.name}}Server instances on > the next update? This is sad.. > > I'd prefer not to keep Nova support forever, this is going to be hard to > maintain and cover by the CI. Should we extend Heat to support "forgetting" > resources? I think it may have a use case outside of TripleO. This is already supported; it's just not the default: https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#resources-section You can use, e.g., deletion_policy: retain to skip the deletion of the underlying Heat-managed resource. Steve __ OpenStack Development Mailing L
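As a rough illustration of the deletion_policy pointer above: in a HOT template the policy sits next to a resource's type and properties. The Python sketch below marks every resource of a given type in an already-parsed template dict with deletion_policy: retain. The helper name retain_resources and the sample template are hypothetical, and the thread does not establish that this alone gives a workable TripleO upgrade path.

```python
import copy


def retain_resources(template, resource_type):
    """Return a copy of a parsed HOT template in which every resource of
    ``resource_type`` carries ``deletion_policy: retain``."""
    updated = copy.deepcopy(template)
    for resource in updated.get("resources", {}).values():
        if resource.get("type") == resource_type:
            resource["deletion_policy"] = "retain"
    return updated


# Hypothetical, heavily simplified template fragment (not a real TripleO template).
template = {
    "resources": {
        "Controller": {"type": "OS::TripleO::ControllerServer", "properties": {}},
        "NetworkConfig": {"type": "OS::Heat::None"},
    }
}

marked = retain_resources(template, "OS::TripleO::ControllerServer")
print(marked["resources"]["Controller"]["deletion_policy"])
```

The input template is left untouched, so the marked copy can be compared against the original before any stack update is attempted.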
Re: [openstack-dev] [TripleO] config-download/ansible next steps
On 06/13/2018 03:17 PM, James Slagle wrote:
> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur wrote:
>> Slightly hijacking the thread to provide a status update on one of the items :)
>
> Thanks for jumping in.
>
>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the repositories, then start experimenting. I need to find a way to
>> 1. make creating nova instances no-op
>> 2. collect the required information from the created stack (I need networks, ports, hostnames, initial SSH keys, capabilities, images)
>> 3. update the config-download code to optionally include the role [2]
>> I'm not entirely sure where to start, so any hints are welcome.
>
> Here are a couple of possibilities.
>
> We could reuse the OS::TripleO::{{role.name}}Server mappings that we already have in place for pre-provisioned nodes (deployed-server). This could be mapped to a template that exposes some Ansible tasks as outputs that drives metalsmith to do the deployment. When config-download runs, it would execute these ansible tasks to provision the nodes with Ironic. This has the advantage of maintaining compatibility with our existing Heat parameter interfaces. It removes Nova from the deployment so that from the undercloud perspective you'd roughly have:
>
> Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)

One thing that came to my mind while planning this work is that I'd prefer all nodes to be processed in one step. This will help avoiding some issues that we have now. For example, the following does not work reliably:

compute-0: just any profile:compute
compute-1: precise node=abcd
control-0: any node

This has two issues that will pop up randomly:
1. compute-0 can pick node abcd designated for compute-1
2. control-0 can pick a compute node, failing either compute-0 or compute-1

This problem is hard to fix if all deployment requests are processed separately, but is quite trivial if the decision is done based on the whole deployment plan. I'm going to work on a bulk scheduler like that in metalsmith.

> A further (or completely different) iteration might look like:
>
> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
> Step 2: Heat -> config-download

Step 1 will still use provided environment to figure out the count of nodes for each role, their images, capabilities and (optionally) precise node scheduling? I'm a bit worried about the last bit: IIRC we rely on Heat's %index% variable currently. We can, of course, ask people to replace it with something more explicit on upgrade.

> Step 2 would use the pre-provisioned node (deployed-server) feature already existing in TripleO and treat the just provisioned by Ironic nodes, as pre-provisioned from the Heat stack perspective. Step 1 and Step 2 would also probably be driven by a higher level Mistral workflow. This has the advantage of minimal impact to tripleo-heat-templates, and also removes Heat from the baremetal provisioning step. However, we'd likely need some python compatibility libraries that could translate Heat parameter values such as HostnameMap to ansible vars for some basic backwards compatibility.

Overall, I like this option better. It will allow an operator to isolate the bare metal provisioning step from everything else.

>> [1] https://github.com/openstack/metalsmith
>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>
>>> Obviously we have things to consider here such as backwards compatibility and upgrades, but overall, I think this would be a great simplification to our overall deployment workflow.
>>
>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe by re-defining them to OS::Heat::None?
>
> Not exactly, as Heat would delete the previous versions of the resources. We'd need some special migrations, or could support the existing method forever for upgrades, and only deprecate it for new deployments.

Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be OS::Heat::None, Heat will delete the old {{role.name}}Server instances on the next update? This is sad..

I'd prefer not to keep Nova support forever, this is going to be hard to maintain and cover by the CI. Should we extend Heat to support "forgetting" resources? I think it may have a use case outside of TripleO.

> I'd like to help with this work. I'll start by taking a look at what you've got so far. Feel free to reach out if you'd like some additional dev assistance or testing.

Thanks!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
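The placement problem in this message (compute-0, compute-1, control-0) can be sketched as a scheduler that sees the whole plan at once. The Python below is only an illustration under stated assumptions: the Node and Request classes, the schedule function and the node names efgh and ijkl are hypothetical, and the heuristic of serving the most constrained requests first is a stand-in, not a description of how metalsmith schedules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    profile: Optional[str] = None  # e.g. "compute"; None means no profile set


@dataclass
class Request:
    hostname: str
    node: Optional[str] = None     # precise node name, if one was requested
    profile: Optional[str] = None  # required profile, if any


def specificity(req):
    """Lower value means more constrained, so the request is served earlier."""
    if req.node is not None:
        return 0
    if req.profile is not None:
        return 1
    return 2


def matches(req, node):
    """Check whether a node satisfies the constraints of a request."""
    if req.node is not None:
        return node.name == req.node
    if req.profile is not None:
        return node.profile == req.profile
    return True


def schedule(requests, nodes):
    """Assign every request a distinct node, looking at the whole plan."""
    free = list(nodes)
    assignment = {}
    for req in sorted(requests, key=specificity):
        candidates = [n for n in free if matches(req, n)]
        if not candidates:
            raise RuntimeError(f"no suitable node left for {req.hostname}")
        chosen = candidates[0]
        free.remove(chosen)
        assignment[req.hostname] = chosen.name
    return assignment


requests = [
    Request("compute-0", profile="compute"),
    Request("compute-1", node="abcd"),
    Request("control-0"),
]
nodes = [Node("abcd", "compute"), Node("efgh", "compute"), Node("ijkl")]
assignment = schedule(requests, nodes)
print(assignment)
```

Because compute-1 is placed first, node abcd is no longer available when compute-0 and control-0 are handled. A greedy ordering like this covers the example but is not a general matching algorithm; it also reports which hostname could not be placed instead of a generic failure.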
Re: [openstack-dev] [TripleO] config-download/ansible next steps
Hi, On Wed, Jun 13, 2018 at 3:17 PM, James Slagle wrote: > On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur wrote: >> Slightly hijacking the thread to provide a status update on one of the items >> :) > > Thanks for jumping in. > > >> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the >> repositories, then start experimenting. I need to find a way to >> 1. make creating nova instances no-op >> 2. collect the required information from the created stack (I need networks, >> ports, hostnames, initial SSH keys, capabilities, images) >> 3. update the config-download code to optionally include the role [2] >> I'm not entirely sure where to start, so any hints are welcome. > > Here are a couple of possibilities. > > We could reuse the OS::TripleO::{{role.name}}Server mappings that we > already have in place for pre-provisioned nodes (deployed-server). > This could be mapped to a template that exposes some Ansible tasks as > outputs that drives metalsmith to do the deployment. When > config-download runs, it would execute these ansible tasks to > provision the nodes with Ironic. This has the advantage of maintaining > compatibility with our existing Heat parameter interfaces. It removes > Nova from the deployment so that from the undercloud perspective you'd > roughly have: > > Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith) > > A further (or completely different) iteration might look like: > > Step 1: Mistral -> Ironic (driven via ansible/metalsmith) > Step 2: Heat -> config-download I really like this approach. It decouples provisioning level from deployment. As a result we may use better level of parallelism. For instance, when we have 3 provisioned servers that match controller roles we may start controller deployment without waiting other nodes provisioning. For Compute role the strategy may be different such as deploy Compute server when at least one node provisioned. 
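The per-role start conditions suggested here (three provisioned controllers, at least one provisioned compute node) could be expressed as a simple threshold check. This is a minimal Python sketch; the function name roles_ready_to_deploy and the role names and counts are illustrative assumptions, not part of any existing workflow.

```python
def roles_ready_to_deploy(provisioned, required):
    """Return the roles whose provisioned node count has reached its threshold."""
    return sorted(
        role
        for role, needed in required.items()
        if provisioned.get(role, 0) >= needed
    )


# Hypothetical thresholds: all three controllers, but only one compute node.
required = {"Controller": 3, "Compute": 1}

ready = roles_ready_to_deploy({"Controller": 3, "Compute": 0}, required)
print(ready)
```

A higher-level workflow could poll such a check after each node finishes provisioning and start the deployment of a role as soon as it appears in the result.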
> > Step 2 would use the pre-provisioned node (deployed-server) feature > already existing in TripleO and treat the just provisioned by Ironic > nodes, as pre-provisioned from the Heat stack perspective. Step 1 and > Step 2 would also probably be driven by a higher level Mistral > workflow. This has the advantage of minimal impact to > tripleo-heat-templates, and also removes Heat from the baremetal > provisioning step. However, we'd likely need some python compatibility > libraries that could translate Heat parameter values such as > HostnameMap to ansible vars for some basic backwards compatibility. > >> >> [1] https://github.com/openstack/metalsmith >> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html >> >>> >>> Obviously we have things to consider here such as backwards compatibility >>> and >>> upgrades, but overall, I think this would be a great simplification to our >>> overall deployment workflow. >>> >> >> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe >> by re-defining them to OS::Heat::None? > > Not exactly, as Heat would delete the previous versions of the > resources. We'd need some special migrations, or could support the > existing method forever for upgrades, and only deprecate it for new > deployments. > > I'd like to help with this work. I'll start by taking a look at what > you've got so far. Feel free to reach out if you'd like some > additional dev assistance or testing. > > -- > -- James Slagle > -- -- Best Regards, Sergii Golovatiuk
Re: [openstack-dev] [TripleO] config-download/ansible next steps
On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur wrote: > Slightly hijacking the thread to provide a status update on one of the items > :) Thanks for jumping in. > The immediate plan right now is to wait for metalsmith 0.4.0 to hit the > repositories, then start experimenting. I need to find a way to > 1. make creating nova instances no-op > 2. collect the required information from the created stack (I need networks, > ports, hostnames, initial SSH keys, capabilities, images) > 3. update the config-download code to optionally include the role [2] > I'm not entirely sure where to start, so any hints are welcome. Here are a couple of possibilities. We could reuse the OS::TripleO::{{role.name}}Server mappings that we already have in place for pre-provisioned nodes (deployed-server). This could be mapped to a template that exposes some Ansible tasks as outputs that drives metalsmith to do the deployment. When config-download runs, it would execute these ansible tasks to provision the nodes with Ironic. This has the advantage of maintaining compatibility with our existing Heat parameter interfaces. It removes Nova from the deployment so that from the undercloud perspective you'd roughly have: Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith) A further (or completely different) iteration might look like: Step 1: Mistral -> Ironic (driven via ansible/metalsmith) Step 2: Heat -> config-download Step 2 would use the pre-provisioned node (deployed-server) feature already existing in TripleO and treat the just provisioned by Ironic nodes, as pre-provisioned from the Heat stack perspective. Step 1 and Step 2 would also probably be driven by a higher level Mistral workflow. This has the advantage of minimal impact to tripleo-heat-templates, and also removes Heat from the baremetal provisioning step. 
However, we'd likely need some python compatibility libraries that could translate Heat parameter values such as HostnameMap to ansible vars for some basic backwards compatibility. > > [1] https://github.com/openstack/metalsmith > [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html > >> >> Obviously we have things to consider here such as backwards compatibility >> and >> upgrades, but overall, I think this would be a great simplification to our >> overall deployment workflow. >> > > Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe > by re-defining them to OS::Heat::None? Not exactly, as Heat would delete the previous versions of the resources. We'd need some special migrations, or could support the existing method forever for upgrades, and only deprecate it for new deployments. I'd like to help with this work. I'll start by taking a look at what you've got so far. Feel free to reach out if you'd like some additional dev assistance or testing. -- -- James Slagle --
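A compatibility helper of the kind mentioned above might look roughly like the following Python sketch. It expands a hostname format containing %stackname% and %index% and then applies a HostnameMap-style dict. The function names and the heat_default_hostname variable are hypothetical, and the example format string and map are assumed sample values, not taken from a real environment.

```python
def default_hostnames(hostname_format, stack_name, count):
    """Expand %stackname% and %index% for each node of a role."""
    return [
        hostname_format.replace("%stackname%", stack_name).replace("%index%", str(i))
        for i in range(count)
    ]


def ansible_host_vars(hostname_format, stack_name, count, hostname_map):
    """Map final hostnames to per-host vars, applying HostnameMap overrides."""
    hosts = {}
    for default in default_hostnames(hostname_format, stack_name, count):
        final = hostname_map.get(default, default)
        # "heat_default_hostname" is a made-up variable name for illustration.
        hosts[final] = {"heat_default_hostname": default}
    return hosts


# Assumed sample inputs.
hostname_map = {"overcloud-controller-0": "ctrl-a"}
hosts = ansible_host_vars("%stackname%-controller-%index%", "overcloud", 2, hostname_map)
print(hosts)
```

Keeping the default name alongside the final one makes the expansion explicit per node, which is one way to avoid relying on an implicit index when nodes are provisioned outside of Heat.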
Re: [openstack-dev] [TripleO] config-download/ansible next steps
Slightly hijacking the thread to provide a status update on one of the items :)

On 06/12/2018 07:04 PM, James Slagle wrote:
> I wanted to provide an update on some next steps around config-download/Ansible and TripleO. Now that we've completed transitioning to config-download by default in Rocky, some might be wondering where we're going next.
>
> 4. Ansible driven baremetal deployment
>
> Dmitry Tantsur has indicated he's going to be looking at driving TripleO baremetal provisioning with Ironic and ansible directly. This would remove Heat+Nova from the baremetal provisioning workflows we currently use.

I'm actually already looking; my efforts just have not become visible yet. I started by reviving my old metalsmith project [1] to host the code we need to make this happen. It now has a CLI tool and a very dumb (for now) ansible role [2] to drive it.

Why a new tool? First, I want it to be reusable outside of TripleO (and outside of ansible modules), thus I don't want to put the code directly into, say, tripleo-common. Second, the current OpenStack Ansible modules are not quite sufficient for the task:

1. Both the os_ironic_node module and the underlying openstacksdk library lack support for the critically important VIF attachment API. I'm working on addressing that, but it will take substantial time (e.g. we need to stabilize the microversion support in openstacksdk).
2. Missing support for building configdrive. Again, this can probably be added to openstacksdk, and I'll get to it one day.
3. No bulk operations. There is no way, to the best of my knowledge (please tell me I'm wrong), to provision several nodes in parallel via the current ansible modules. It is probably solvable via a new ansible module, but also see the next points.
4. No scheduling. That is, there is no way out of the box to pick a suitable node for deployment. It can be done in pure ansible in the simplest case, but our case is not the simplest. In particular, I don't want to end up parsing capabilities in ansible :) Also, one of the goals of this work is to provide better messages than "No valid hosts found".
5. On top of #3 and #4, it is not possible to operate at the deployment level, only at the node level. From the current Heat stack we're going to receive a list of overcloud instances with their roles and other parameters. Some code has to take this input and make a decision on whether to deploy/undeploy something. It's currently done by Heat+Nova together, but they're not doing a great job in some corner cases. In particular, replacing a node may be painful.

So, while I do plan to solve #1 and #2 eventually, #3 - #5 require some place to put the logic. Putting it into TripleO or into ansible itself would preclude reusing it outside of TripleO or ansible, respectively. So, metalsmith is this place for now. I think in the far future I will try proposing a module to ansible itself that will handle #3 - #5 and will be backed by metalsmith. It will probably have a similar interface to the current PoC role [2].

The immediate plan right now is to wait for metalsmith 0.4.0 to hit the repositories, then start experimenting. I need to find a way to
1. make creating nova instances no-op
2. collect the required information from the created stack (I need networks, ports, hostnames, initial SSH keys, capabilities, images)
3. update the config-download code to optionally include the role [2]

I'm not entirely sure where to start, so any hints are welcome.

[1] https://github.com/openstack/metalsmith
[2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html

> Obviously we have things to consider here such as backwards compatibility and upgrades, but overall, I think this would be a great simplification to our overall deployment workflow.

Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe by re-defining them to OS::Heat::None?
Dmitry
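The deployment-level decision described in point 5 of this message (take the list of desired overcloud instances and decide what to deploy or undeploy) can be sketched as a plain set comparison. This Python example is only an assumption about the shape of such logic; plan_changes and the sample hostnames are hypothetical and do not reflect metalsmith's actual interface.

```python
def plan_changes(desired, existing):
    """Compare desired instances with deployed ones.

    ``desired`` maps hostname -> role, ``existing`` maps hostname -> node name.
    """
    desired_names = set(desired)
    existing_names = set(existing)
    return {
        "deploy": sorted(desired_names - existing_names),
        "undeploy": sorted(existing_names - desired_names),
        "keep": sorted(desired_names & existing_names),
    }


# Hypothetical state: compute-1 is gone from the plan and compute-2 was added.
desired = {"control-0": "Controller", "compute-0": "Compute", "compute-2": "Compute"}
existing = {"control-0": "node-a", "compute-0": "node-b", "compute-1": "node-c"}

changes = plan_changes(desired, existing)
print(changes)
```

Replacing a node then shows up as one entry under "undeploy" and one under "deploy", which is the kind of whole-deployment view that per-node modules cannot provide on their own.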
[openstack-dev] [TripleO] config-download/ansible next steps
I wanted to provide an update on some next steps around config-download/Ansible and TripleO. Now that we've completed transitioning to config-download by default in Rocky, some might be wondering where we're going next.

1. Standalone roles. The idea here is to refactor the ansible tasks lists into standalone ansible roles. From the tripleo-heat-templates side, we then just update the service templates to apply those roles (possibly with a specific task file). Since not all of the interfaces in tripleo-heat-templates are pure ansible tasks lists (docker_config, puppet_config), there is some exploratory work here to determine how we can use those inputs in both a standalone ansible role and tripleo-heat-templates. David Peacock sent out a POC of some initial work [1].

2. Standalone playbooks. Similar to standalone roles, the idea here is to refactor some of the playbooks into their own proper ansible project directories. These would probably be new git repositories. Again, since some of our playbooks are rendered by jinja2, there is some exploratory work here to see how we can make these more re-usable and not as tightly coupled with tripleo-heat-templates.

3. Native ansible tasks for the per-server deployments in tripleo-heat-templates. Presently we are using generic ansible task(s) that act as a shim around the heat-config hooks for the per-server deployments. This is necessary for backwards compatibility. Going forward, we want to take a closer look at how we can use more native ansible tasks for these (e.g., os-net-config ansible module). This will improve our ansible playbook interfaces and make the playbooks more friendly for manual interactions.

4. Ansible driven baremetal deployment. Dmitry Tantsur has indicated he's going to be looking at driving TripleO baremetal provisioning with Ironic and ansible directly. This would remove Heat+Nova from the baremetal provisioning workflows we currently use. Obviously we have things to consider here such as backwards compatibility and upgrades, but overall, I think this would be a great simplification to our overall deployment workflow.

5. Other deployment architectures. There are various efforts continuing and spinning up related to:
- the all-in-one/standalone installer [2]
- the zero footprint installer [3]
- split-controlplane [4]
I think config-download with ansible is going to drive a lot of these use cases, particularly as it relates to edge deployments.

If any of this is an area of interest, please reach out. You can find contacts on the provided links. There may be some upstream squads forming around some of this work in the near future. If you have other ideas about improvements/direction, please chime in.

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-March/128887.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131135.html
[3] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131192.html
[4] https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/split-controlplane.html

--
-- James Slagle
--