Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-18 Thread Steven Hardy
On Mon, Jun 18, 2018 at 1:51 PM, Dmitry Tantsur  wrote:
> On 06/13/2018 03:17 PM, James Slagle wrote:
>>
>> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur 
>> wrote:
>>>
>>> Slightly hijacking the thread to provide a status update on one of the
>>> items
>>> :)
>>
>>
>> Thanks for jumping in.
>>
>>
>>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>>> repositories, then start experimenting. I need to find a way to
>>> 1. make creating nova instances no-op
>>> 2. collect the required information from the created stack (I need
>>> networks,
>>> ports, hostnames, initial SSH keys, capabilities, images)
>>> 3. update the config-download code to optionally include the role [2]
>>> I'm not entirely sure where to start, so any hints are welcome.
>>
>>
>> Here are a couple of possibilities.
>>
>> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
>> already have in place for pre-provisioned nodes (deployed-server).
>> This could be mapped to a template that exposes some Ansible tasks as
>> outputs that drives metalsmith to do the deployment. When
>> config-download runs, it would execute these ansible tasks to
>> provision the nodes with Ironic. This has the advantage of maintaining
>> compatibility with our existing Heat parameter interfaces. It removes
>> Nova from the deployment so that from the undercloud perspective you'd
>> roughly have:
>>
>> Mistral -> Heat -> config-download -> Ironic (driven via
>> ansible/metalsmith)
>
>
> One thing that came to my mind while planning this work is that I'd prefer
> all nodes to be processed in one step. This will help avoid some issues
> that we have now. For example, the following does not work reliably:
>
>  compute-0: just any profile:compute
>  compute-1: precise node=abcd
>  control-0: any node
>
> This has two issues that will pop up randomly:
> 1. compute-0 can pick node abcd designated for compute-1
> 2. control-0 can pick a compute node, failing either compute-0 or compute-1
>
> This problem is hard to fix if all deployment requests are processed
> separately, but is quite trivial if the decision is made based on the whole
> deployment plan. I'm going to work on a bulk scheduler like that in
> metalsmith.
>
>>
>> A further (or completely different) iteration might look like:
>>
>> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
>> Step 2: Heat -> config-download
>
>
> Step 1 will still use provided environment to figure out the count of nodes
> for each role, their images, capabilities and (optionally) precise node
> scheduling?
> I'm a bit worried about the last bit: IIRC we rely on Heat's %index%
> variable currently. We can, of course, ask people to replace it with
> something more explicit on upgrade.
>
>>
>> Step 2 would use the pre-provisioned node (deployed-server)  feature
>> already existing in TripleO and treat the just provisioned by Ironic
>> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and
>> Step 2 would also probably be driven by a higher level Mistral
>> workflow. This has the advantage of minimal impact to
>> tripleo-heat-templates, and also removes Heat from the baremetal
>> provisioning step. However, we'd likely need some python compatibility
>> libraries that could translate Heat parameter values such as
>> HostnameMap to ansible vars for some basic backwards compatibility.
>
>
> Overall, I like this option better. It will allow an operator to isolate the
> bare metal provisioning step from everything else.
>
>>
>>>
>>> [1] https://github.com/openstack/metalsmith
>>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>>

>>>> Obviously we have things to consider here such as backwards
>>>> compatibility and upgrades, but overall, I think this would be a great
>>>> simplification to our overall deployment workflow.

>>>
>>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances?
>>> Maybe
>>> by re-defining them to OS::Heat::None?
>>
>>
>> Not exactly, as Heat would delete the previous versions of the
>> resources. We'd need some special migrations, or could support the
>> existing method forever for upgrades, and only deprecate it for new
>> deployments.
>
>
> Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be
> OS::Heat::None, Heat will delete the old {{role.name}}Server instances on
> the next update? This is sad..
>
> I'd prefer not to keep Nova support forever: it is going to be hard to
> maintain and to cover in CI. Should we extend Heat to support "forgetting"
> resources? I think it may have a use case outside of TripleO.

This is already supported, it's just not the default:

https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#resources-section

You can use e.g. deletion_policy: retain to skip the deletion of the
underlying heat-managed resource.
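
As a rough illustration (not taken from tripleo-heat-templates), the snippet
below builds a resource definition carrying this attribute as a Python dict
that mirrors the structure of the resources section. The resource name
"server", the image and the flavor are placeholders, and the lowercase
"retain" spelling assumes heat_template_version 2016-10-14 or newer (older
versions expect "Retain").

```python
import json


def with_retain(resource: dict) -> dict:
    """Return a copy of a HOT resource definition with deletion_policy set."""
    updated = dict(resource)
    updated["deletion_policy"] = "retain"
    return updated


# Placeholder resource; the property values are illustrative only.
server = {
    "type": "OS::Nova::Server",
    "properties": {"image": "overcloud-full", "flavor": "baremetal"},
}

template = {
    "heat_template_version": "2016-10-14",
    "resources": {"server": with_retain(server)},
}

print(json.dumps(template, indent=2))
```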

Steve


Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-18 Thread Dmitry Tantsur

On 06/13/2018 03:17 PM, James Slagle wrote:

> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur  wrote:
>> Slightly hijacking the thread to provide a status update on one of the items
>> :)
>
> Thanks for jumping in.
>
>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>> repositories, then start experimenting. I need to find a way to
>> 1. make creating nova instances no-op
>> 2. collect the required information from the created stack (I need networks,
>> ports, hostnames, initial SSH keys, capabilities, images)
>> 3. update the config-download code to optionally include the role [2]
>> I'm not entirely sure where to start, so any hints are welcome.
>
> Here are a couple of possibilities.
>
> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
> already have in place for pre-provisioned nodes (deployed-server).
> This could be mapped to a template that exposes some Ansible tasks as
> outputs that drives metalsmith to do the deployment. When
> config-download runs, it would execute these ansible tasks to
> provision the nodes with Ironic. This has the advantage of maintaining
> compatibility with our existing Heat parameter interfaces. It removes
> Nova from the deployment so that from the undercloud perspective you'd
> roughly have:
>
> Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)


One thing that came to my mind while planning this work is that I'd prefer all
nodes to be processed in one step. This will help avoid some issues that we
have now. For example, the following does not work reliably:

 compute-0: just any profile:compute
 compute-1: precise node=abcd
 control-0: any node

This has two issues that will pop up randomly:
1. compute-0 can pick node abcd designated for compute-1
2. control-0 can pick a compute node, failing either compute-0 or compute-1

This problem is hard to fix if all deployment requests are processed separately,
but is quite trivial if the decision is made based on the whole deployment plan.
I'm going to work on a bulk scheduler like that in metalsmith.
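
To make the idea concrete, here is a minimal sketch of plan-wide scheduling.
It is not metalsmith code: the Node and Request classes, the schedule()
function and the node names other than abcd are all made up for illustration.
The only technique shown is placing the most constrained requests first
(exact node, then profile, then "any node"), which is enough to avoid both
issues in the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    profile: Optional[str] = None  # None means the node has no profile


@dataclass
class Request:
    hostname: str
    node: Optional[str] = None     # exact node name, if the request is pinned
    profile: Optional[str] = None  # required profile, if any


def specificity(request: Request) -> int:
    """Rank requests so that the most constrained ones are placed first."""
    if request.node is not None:
        return 0
    if request.profile is not None:
        return 1
    return 2


def schedule(requests: List[Request], nodes: List[Node]) -> Dict[str, str]:
    """Map every hostname to a node name, looking at the whole plan at once."""
    free = {node.name: node for node in nodes}
    placement: Dict[str, str] = {}
    for request in sorted(requests, key=specificity):
        candidates = [
            node for node in free.values()
            if request.node in (None, node.name)
            and request.profile in (None, node.profile)
        ]
        if not candidates:
            raise ValueError(
                f"no free node matches the request for {request.hostname}")
        chosen = candidates[0]
        placement[request.hostname] = chosen.name
        del free[chosen.name]
    return placement


nodes = [Node("abcd", "compute"), Node("efgh", "compute"), Node("ijkl")]
plan = [
    Request("compute-0", profile="compute"),
    Request("compute-1", node="abcd"),
    Request("control-0"),
]
print(schedule(plan, nodes))
```

With this input compute-1 gets abcd, compute-0 gets efgh and control-0 gets
ijkl, regardless of the order in which the requests are listed.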




> A further (or completely different) iteration might look like:
>
> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
> Step 2: Heat -> config-download


Step 1 will still use the provided environment to figure out the count of nodes
for each role, their images, capabilities and (optionally) precise node
scheduling?
I'm a bit worried about the last bit: IIRC we currently rely on Heat's %index%
variable. We can, of course, ask people to replace it with something more
explicit on upgrade.
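
For context, a sketch of what a non-Heat provisioning step would have to do
to keep accepting such values: expand the %index% placeholder itself, once
per node. The function name and the sample hint below are assumptions made
for illustration, not existing TripleO code.

```python
from typing import Dict, List


def expand_index(hints: Dict[str, str], count: int) -> List[Dict[str, str]]:
    """Return one hints dict per node, replacing %index% with 0..count-1."""
    return [
        {key: value.replace("%index%", str(index))
         for key, value in hints.items()}
        for index in range(count)
    ]


# Sample scheduler hint in the style used for precise node placement.
hints = {"capabilities:node": "controller-%index%"}
per_node = expand_index(hints, 3)
print(per_node)
```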




> Step 2 would use the pre-provisioned node (deployed-server)  feature
> already existing in TripleO and treat the just provisioned by Ironic
> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and
> Step 2 would also probably be driven by a higher level Mistral
> workflow. This has the advantage of minimal impact to
> tripleo-heat-templates, and also removes Heat from the baremetal
> provisioning step. However, we'd likely need some python compatibility
> libraries that could translate Heat parameter values such as
> HostnameMap to ansible vars for some basic backwards compatibility.


Overall, I like this option better. It will allow an operator to isolate the 
bare metal provisioning step from everything else.






>> [1] https://github.com/openstack/metalsmith
>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html



>>> Obviously we have things to consider here such as backwards compatibility
>>> and upgrades, but overall, I think this would be a great simplification to
>>> our overall deployment workflow.



>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe
>> by re-defining them to OS::Heat::None?


> Not exactly, as Heat would delete the previous versions of the
> resources. We'd need some special migrations, or could support the
> existing method forever for upgrades, and only deprecate it for new
> deployments.


Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be 
OS::Heat::None, Heat will delete the old {{role.name}}Server instances on the 
next update? This is sad..


I'd prefer not to keep Nova support forever: it is going to be hard to
maintain and to cover in CI. Should we extend Heat to support "forgetting"
resources? I think it may have a use case outside of TripleO.




> I'd like to help with this work. I'll start by taking a look at what
> you've got so far. Feel free to reach out if you'd like some
> additional dev assistance or testing.



Thanks!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-13 Thread Sergii Golovatiuk
Hi,

On Wed, Jun 13, 2018 at 3:17 PM, James Slagle  wrote:
> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur  wrote:
>> Slightly hijacking the thread to provide a status update on one of the items
>> :)
>
> Thanks for jumping in.
>
>
>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>> repositories, then start experimenting. I need to find a way to
>> 1. make creating nova instances no-op
>> 2. collect the required information from the created stack (I need networks,
>> ports, hostnames, initial SSH keys, capabilities, images)
>> 3. update the config-download code to optionally include the role [2]
>> I'm not entirely sure where to start, so any hints are welcome.
>
> Here are a couple of possibilities.
>
> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
> already have in place for pre-provisioned nodes (deployed-server).
> This could be mapped to a template that exposes some Ansible tasks as
> outputs that drives metalsmith to do the deployment. When
> config-download runs, it would execute these ansible tasks to
> provision the nodes with Ironic. This has the advantage of maintaining
> compatibility with our existing Heat parameter interfaces. It removes
> Nova from the deployment so that from the undercloud perspective you'd
> roughly have:
>
> Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)
>
> A further (or completely different) iteration might look like:
>
> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
> Step 2: Heat -> config-download

I really like this approach. It decouples provisioning from deployment,
which may allow a better level of parallelism. For instance, once we
have 3 provisioned servers that match the controller role, we may start
the controller deployment without waiting for the other nodes to be
provisioned. For the Compute role the strategy may be different, such as
deploying a Compute server as soon as at least one node is provisioned.
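
A minimal sketch of such a per-role readiness check, assuming a higher level
workflow that tracks how many nodes of each role have been provisioned so
far. The function name, role names and thresholds are hypothetical.

```python
from typing import Dict, List


def roles_ready(provisioned: Dict[str, int],
                required: Dict[str, int]) -> List[str]:
    """Return the roles whose minimum number of provisioned nodes is met."""
    return sorted(
        role for role, minimum in required.items()
        if provisioned.get(role, 0) >= minimum
    )


# Controllers wait for all three nodes, computes start with the first one.
required = {"Controller": 3, "Compute": 1}
print(roles_ready({"Controller": 3, "Compute": 0}, required))
```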

>
> Step 2 would use the pre-provisioned node (deployed-server)  feature
> already existing in TripleO and treat the just provisioned by Ironic
> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and
> Step 2 would also probably be driven by a higher level Mistral
> workflow. This has the advantage of minimal impact to
> tripleo-heat-templates, and also removes Heat from the baremetal
> provisioning step. However, we'd likely need some python compatibility
> libraries that could translate Heat parameter values such as
> HostnameMap to ansible vars for some basic backwards compatibility.
>
>>
>> [1] https://github.com/openstack/metalsmith
>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>
>>>
>>> Obviously we have things to consider here such as backwards compatibility
>>> and
>>> upgrades, but overall, I think this would be a great simplification to our
>>> overall deployment workflow.
>>>
>>
>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe
>> by re-defining them to OS::Heat::None?
>
> Not exactly, as Heat would delete the previous versions of the
> resources. We'd need some special migrations, or could support the
> existing method forever for upgrades, and only deprecate it for new
> deployments.
>
> I'd like to help with this work. I'll start by taking a look at what
> you've got so far. Feel free to reach out if you'd like some
> additional dev assistance or testing.
>
> --
> -- James Slagle
> --



-- 
Best Regards,
Sergii Golovatiuk



Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-13 Thread James Slagle
On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur  wrote:
> Slightly hijacking the thread to provide a status update on one of the items
> :)

Thanks for jumping in.


> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
> repositories, then start experimenting. I need to find a way to
> 1. make creating nova instances no-op
> 2. collect the required information from the created stack (I need networks,
> ports, hostnames, initial SSH keys, capabilities, images)
> 3. update the config-download code to optionally include the role [2]
> I'm not entirely sure where to start, so any hints are welcome.

Here are a couple of possibilities.

We could reuse the OS::TripleO::{{role.name}}Server mappings that we
already have in place for pre-provisioned nodes (deployed-server).
This could be mapped to a template that exposes some Ansible tasks as
outputs that drive metalsmith to do the deployment. When
config-download runs, it would execute these ansible tasks to
provision the nodes with Ironic. This has the advantage of maintaining
compatibility with our existing Heat parameter interfaces. It removes
Nova from the deployment so that from the undercloud perspective you'd
roughly have:

Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)

A further (or completely different) iteration might look like:

Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
Step 2: Heat -> config-download

Step 2 would use the pre-provisioned node (deployed-server) feature
already existing in TripleO and treat the nodes just provisioned by
Ironic as pre-provisioned from the Heat stack perspective. Step 1 and
Step 2 would also probably be driven by a higher level Mistral
workflow. This has the advantage of minimal impact to
tripleo-heat-templates, and also removes Heat from the baremetal
provisioning step. However, we'd likely need some python compatibility
libraries that could translate Heat parameter values such as
HostnameMap to ansible vars for some basic backwards compatibility.
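
As a very rough sketch of what such a compatibility helper might look like
(nothing like this exists yet; the function names, the assumed
"<stack>-<role>-<index>" default name format and the "default_hostname"
variable are placeholders), a HostnameMap could be applied to generated
names and turned into per-host ansible vars like this:

```python
from typing import Dict, List


def default_hostnames(stack: str, role: str, count: int) -> List[str]:
    """Generate names in an assumed '<stack>-<role>-<index>' format."""
    return [f"{stack}-{role}-{index}" for index in range(count)]


def to_host_vars(defaults: List[str],
                 hostname_map: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build per-host vars keyed by the final (possibly remapped) hostname."""
    host_vars: Dict[str, Dict[str, str]] = {}
    for default in defaults:
        final = hostname_map.get(default, default)
        host_vars[final] = {"default_hostname": default}
    return host_vars


# HostnameMap maps a generated hostname to the name the operator wants.
hostname_map = {"overcloud-controller-0": "ctrl-a"}
names = default_hostnames("overcloud", "controller", 2)
print(to_host_vars(names, hostname_map))
```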

>
> [1] https://github.com/openstack/metalsmith
> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>
>>
>> Obviously we have things to consider here such as backwards compatibility
>> and
>> upgrades, but overall, I think this would be a great simplification to our
>> overall deployment workflow.
>>
>
> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe
> by re-defining them to OS::Heat::None?

Not exactly, as Heat would delete the previous versions of the
resources. We'd need some special migrations, or could support the
existing method forever for upgrades, and only deprecate it for new
deployments.

I'd like to help with this work. I'll start by taking a look at what
you've got so far. Feel free to reach out if you'd like some
additional dev assistance or testing.

-- 
-- James Slagle
--



Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-13 Thread Dmitry Tantsur

Slightly hijacking the thread to provide a status update on one of the items :)

On 06/12/2018 07:04 PM, James Slagle wrote:

> I wanted to provide an update on some next steps around config-download/Ansible
> and TripleO. Now that we've completed transitioning to config-download by
> default in Rocky, some might be wondering where we're going next.





> 4. Ansible driven baremetal deployment
>
> Dmitry Tantsur has indicated he's going to be looking at driving TripleO
> baremetal provisioning with Ironic and ansible directly. This would remove
> Heat+Nova from the baremetal provisioning workflows we currently use.


I'm actually already looking, my efforts just have not become visible yet.

I started with reviving my old metalsmith project [1] to host the code we need
to make this happen. This now has a CLI tool and a very dumb (for now) ansible
role [2] to drive it.


Why a new tool? First, I want it to be reusable outside of TripleO (and outside 
of ansible modules), thus I don't want to put the code directly into, say, 
tripleo-common. Second, the current OpenStack Ansible modules are not quite 
sufficient for the task:


1. Both the os_ironic_node module and the underlying openstacksdk library lack 
support for the critically important VIF attachment API. I'm working on 
addressing that, but it will take substantial time (e.g. we need to stabilize 
the microversion support in openstacksdk).


2. Missing support for building configdrive. Again, can probably be added to 
openstacksdk, and I'll get to it one day.


3. No bulk operations. There is no way, to the best of my knowledge (please tell
me I'm wrong), to provision several nodes in parallel via the current ansible
modules. It is probably solvable via a new ansible module, but also see the next
points.


4. No scheduling. That is, there is no way out-of-box to pick a suitable node 
for deployment. It can be done in pure ansible in the simplest case, but our 
case is not the simplest. Particularly, I don't want to end up parsing 
capabilities in ansible :) Also one of the goals of this work is to provide 
better messages than "No valid hosts found".


5. On top of #3 and #4, it is not possible to operate at the deployment level
rather than the node level. From the current Heat stack we're going to receive a list
of overcloud instances with their roles and other parameters. Some code has to 
take this input and make a decision on whether to deploy/undeploy something. 
It's currently done by Heat+Nova together, but they're not doing a great job in 
some corner cases. Particularly, replacing a node may be painful.


So, while I do plan to solve #1 and #2 eventually, #3 - #5 require some place to
put the logic. Putting it into TripleO or into ansible itself would preclude
reusing it outside of TripleO or ansible, respectively. So, metalsmith is this
place for now. I think in the far future I will try proposing a module to
ansible itself that will handle #3 - #5 and will be backed by metalsmith. It
will probably have a similar interface to the current PoC role [2].


The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
repositories, then start experimenting. I need to find a way to
1. make creating nova instances no-op
2. collect the required information from the created stack (I need networks,
ports, hostnames, initial SSH keys, capabilities, images)
3. update the config-download code to optionally include the role [2]
I'm not entirely sure where to start, so any hints are welcome.

[1] https://github.com/openstack/metalsmith
[2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html



> Obviously we have things to consider here such as backwards compatibility and
> upgrades, but overall, I think this would be a great simplification to our
> overall deployment workflow.



Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe by 
re-defining them to OS::Heat::None?


Dmitry




[openstack-dev] [TripleO] config-download/ansible next steps

2018-06-12 Thread James Slagle
I wanted to provide an update on some next steps around config-download/Ansible
and TripleO. Now that we've completed transitioning to config-download by
default in Rocky, some might be wondering where we're going next.

1. Standalone roles.

The idea here is to refactor the ansible tasks lists into
standalone ansible roles. From the tripleo-heat-templates side, we then just
update the service templates to apply those roles (possibly with a specific
task file).

Since not all of the interfaces in tripleo-heat-templates are pure ansible
tasks lists (docker_config, puppet_config), there is some exploratory work here
to determine how we can use those inputs in both a standalone ansible role and
tripleo-heat-templates.

David Peacock sent out a POC of some initial work[1].

2. Standalone playbooks.

Similar to standalone roles, the idea here is to refactor some of the playbooks
into their own proper ansible project directories. These would probably be new
git repositories.

Again, since some of our playbooks are rendered by jinja2, there is some
exploratory work here to see how we can make these more re-usable and not as
tightly coupled with tripleo-heat-templates.

3. Native ansible tasks for the per-server deployments in
tripleo-heat-templates.

Presently we are using generic ansible task(s) that act as a shim around the
heat-config hooks for the per-server deployments. This is necessary for
backwards compatibility. Going forward, we want to take a closer look at how we
can use more native ansible tasks for these (e.g., os-net-config ansible
module).

This will improve our ansible playbook interfaces and make the playbooks more
friendly for manual interactions.

4. Ansible driven baremetal deployment

Dmitry Tantsur has indicated he's going to be looking at driving TripleO
baremetal provisioning with Ironic and ansible directly. This would remove
Heat+Nova from the baremetal provisioning workflows we currently use.

Obviously we have things to consider here such as backwards compatibility and
upgrades, but overall, I think this would be a great simplification to our
overall deployment workflow.

5. Other deployment architectures

There are various efforts, ongoing or spinning up, related to the following:

- all-in-one/standalone installer[2]
- the zero footprint installer[3]
- split-controlplane[4]

I think config-download with ansible is going to drive a lot of these use
cases, particularly as it relates to edge deployments.

If any of this is an area of interest, please reach out. You can find contacts
on the provided links. There may be some upstream squads forming around some of
this work in the near future.

If you have other ideas about improvements/direction, please chime in.

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-March/128887.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131135.html
[3] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131192.html
[4] https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/split-controlplane.html


-- 
-- James Slagle
--
