Re: [openstack-dev] [tripleo] let's talk (development) environment deployment tooling and workflows

John Trowbridge Wed, 21 Sep 2016 08:07:00 -0700


On 09/19/2016 01:21 PM, Steven Hardy wrote:
> Hi Alex,
> 
> Firstly, thanks for this detailed feedback - it's very helpful to have
> someone with a fresh perspective look at the day-1 experience for TripleO,
> and while some of what follows are "known issues", it's great to get some
> perspective on them, as well as ideas re how we might improve things.
> 
> On Thu, Sep 15, 2016 at 09:09:24AM -0600, Alex Schultz wrote:
>> Hi all,
>>
>> I've recently started looking at the various methods for deploying and
>> developing tripleo.  What I would like to bring up is the current
>> combination of the tooling for managing the VM instances and the
>> actual deployment method to launch the undercloud/overcloud
>> installation.  While running through the various methods and reading
>> up on the documentation, I'm concerned that they are not currently
>> flexible enough for a developer (or operator for that matter) to be
>> able to setup the various environment configurations for testing
>> deployments and doing development.  Additionally I ran into issues
>> just trying to get them working at all, so this probably doesn't help when
>> trying to attract new contributors either.  The focus of this email
>> and of my experience seems to relate to the workflow-simplification
>> spec[0].  I would like to share my experiences with the various
>> tooling available and raise some ideas.
>>
>> Example Situation:
>>
>> For example, I have a laptop with 16G of RAM and an SSD and I'd like
>> to get started with tripleo.  How can I deploy tripleo?
> 
> So, this is probably problem #1, because while I have managed to deploy a
> minimal TripleO environment on a laptop with 16G of RAM, I think it's
> pretty widely known that it's not really enough (certainly with our default
> configuration, which has unfortunately grown over time as more and more
> things got integrated).
> 
> I see two options here:
> 
> 1. Document the reality (which is really you need a physical machine with
> at least 32G RAM unless you're prepared to deal with swapping).
> 
> 2. Look at providing a "TripleO lite" install option, which disables some
> services (both on the undercloud and default overcloud install).
> 
> Either of these is definitely possible, but (2) seems like the best
> long-term solution (although it probably means another CI job).
> 
>> Tools:
>>
>> instack:
>>
>> I started with the tripleo docs[1] that reference using the instack
>> tools for virtual environment creation while deploying tripleo.   The
>> docs say you need at least 12G of RAM[2].  The docs lie (step 7[3]).
>> So after basically shutting everything down and letting it deploy with
>> all my RAM, the deployment fails because the undercloud runs out of
>> RAM and the OOM killer kills off heat.  This was not because I had reduced
>> the amount of RAM for the undercloud node or anything.  It was because
>> by default, 6GB of RAM with no swap is configured for the undercloud
>> (not sure if this is a bug?).  So I added a swap file to the
>> undercloud and continued. My next adventure was having the overcloud
>> deployment fail due to a lack of memory, as puppet tries to spawn
>> a process and is denied.  The instack method does not configure swap
>> for the VMs that are deployed, and the deployment did not work with 5GB
>> RAM for each node.  So with the full 16GB I was unable to follow the
>> documentation and use instack to successfully deploy.  At this point I
>> switched over to trying tripleo-quickstart.  Eventually I did find a
>> configuration that let instack deploy, once I figured out how to
>> enable swap for the overcloud deployment.
> 
> Yeah, so this definitely exposes that we need to update the docs, and also
> provide an easy install-time option to enable swap on all-the-things for
> memory-constrained environments.
> 
>> tripleo-quickstart:
>>
>> The next thing I attempted to use was the tripleo-quickstart[4].
>> Following the directions I attempted to deploy against my localhost.
>> It turns out that doesn't work as expected, since ansible likes to do
>> magic when dealing with localhost[5].  Ultimately I was unable to get
>> it working against my laptop locally because I ran into some libvirt
>> issues.  But I was able to get it to work when I pointed it at a
>> separate machine.  It should be noted that tripleo-quickstart creates
>> an undercloud with swap, which was nice because then it actually works,
>> but it makes for an inconsistent experience depending on which tool you
>> use for your deployment.
> 
> Yeah, so while a lot of folks have good luck with tripleo-quickstart, it
> has the disadvantage of not currently being the tool used in upstream
> TripleO CI (which folks have looked at fixing, but it's not yet happened).
> 
> The original plan was for tripleo-quickstart to completely replace the
> instack-virt-setup workflow:
> 
> https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart
> 
> But for a variety of reasons, we never quite got to that - we may need a
> summit discussion on the path forward here.
> 
> For me (as an upstream developer) it really boils down to the CI usage
> issue - at all times I want to use the tool which gets me closest to what
> runs in upstream CI (which, although it actually uses instack-virt-setup,
> otherwise follows the tripleo-docs procedure pretty closely, using a helper
> script called tripleo.sh that you can run locally):
> 
> http://paste.fedoraproject.org/431073/30480114
> 
This kind of feels like quickstart FUD to me.


CI does not run tripleo.sh directly, so the paste above does not
actually reproduce any tripleo-ci job. tripleo-ci runs a wrapper around
tripleo.sh[1], with specific ENV variables set in a different script[2]
and some additional steps required to actually make that work in yet
another script[3].

Further, if you replaced the instack-virt-setup line in the above script
with a run of tripleo-quickstart, the other steps could still be run on
top of a quickstart undercloud.

[1] https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/deploy.sh
[2] https://github.com/openstack-infra/tripleo-ci/blob/master/toci_gate_test.sh#L107-L200
[3] https://github.com/openstack-infra/tripleo-ci/blob/master/toci_instack.sh#L180-L182
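
To illustrate the ordering point, here is a minimal Python sketch. It is not tripleo-ci's wrapper and only prints a dry-run plan. The tripleo.sh option names are assumptions that should be checked against tripleo.sh --help, and the quickstart invocation is simplified (the tripleo.sh steps would really run on the undercloud). It only shows that swapping the virt-setup command leaves every later step untouched:

```python
from typing import List, Sequence

# Hypothetical list of tripleo.sh options, one per deployment phase.
# Check "./tripleo.sh --help" for the real option names before relying on them.
TRIPLEO_SH_STEPS: Sequence[str] = (
    "--undercloud",
    "--overcloud-images",
    "--register-nodes",
    "--introspect-nodes",
    "--overcloud-deploy",
)


def build_plan(virt_setup: List[str]) -> List[List[str]]:
    """Return the commands to run, in order, with a pluggable virt-setup step.

    Only the first command depends on the virt-setup tool; every later step
    is the same tripleo.sh call (run on the undercloud in practice).
    """
    plan = [list(virt_setup)]
    plan.extend(["./tripleo.sh", step] for step in TRIPLEO_SH_STEPS)
    return plan


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "virthost.example.com" is a placeholder host name.
    for cmd in build_plan(["./quickstart.sh", "virthost.example.com"]):
        print(" ".join(cmd))
```

Calling build_plan(["instack-virt-setup"]) yields the same tail as the quickstart variant; only element 0 differs.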

>> Thoughts:
>>
>> What these two methods showed me is that the deployment of tripleo is
>> not exactly a foolproof thing and that there are a lot of assumptions
>> being handled by both of these tools.  My initial goal in starting
>> this conversation around tooling and workflows was to raise the idea
>> of separating the (virtual) environment configuration from the actual
>> deployment of tripleo, as well as to identify places for improvement
>> as a way to speed up development and deployment testing.
>> I believe there are a few reasons why this can be beneficial.
> 
> Yep, I think this goal is uncontentious, and it's pretty much the original
> aim of tripleo-quickstart.
> 
>> The first reason is that as a developer, I would like to simplify the
>> development environment creation process and be able to draw the line
>> between environment and actual deployment tool.  By developing and
>> documenting a working development/deployment workflow, we can simplify
>> the onboarding experience as well as possibly accelerating the
>> existing development processes by reducing the time spent messing with
>> creating environments.  Does tripleo need to manage creation of VMs to
>> deploy on? The answer is probably no.  As the end user will want to
>> deploy tripleo on his or her gear, the focus for tripleo probably
>> should be on improving that process.  Now this doesn't mean that we
>> can't write stuff to do this, as it's important for development and
>> testing.  I'm not sure this is a core part of what should be
>> 'tripleo'.
> 
> Yeah, agreed - the automation around setting up the VMs is really just a
> convenience, and it's not really a core part of TripleO - any tool could be
> used provided the VMs end up configured in the way we require.
> 
>> Another reason why I think this is important is as we talk about
>> creating different scenarios for CI[6] to improve testing, it would
>> also be useful for a developer or QA engineer to be able to test
>> different environmental configurations that are more representative of
>> actual deployment scenarios, without having to hunt down multiple
>> machines or configure actual hardware networking.  For example,
>> creating environments using multiple networks, changing NICs,
>> providing different sized nodes based on roles, etc. can all be done
>> virtually.  While tripleo-quickstart has some of these options, they are
>> mixed in with the tripleo deployment process and do not seem to
>> support deploying tripleo in more real-world networking or
>> environmental scenarios.
> 
> Yeah, so I think this is one reason why the tripleo-quickstart discussion
> has sometimes proven tricky - the original spec was about replacing only
> the virt-setup pieces, but there was subsequently some scope-creep.  I
> think this is being addressed, but it'd be good to have folks working on
> that chime in here.
> 

Indeed, we have moved all of the ansible code for doing full deployments
outside of tripleo-quickstart. There are still a bunch of CI helper
scripts for RDO CI in tree, and a playbook that exercises the full
deployment code, but I would like to move all of that out of the
quickstart tree as well.

We have also added the ability to consume the images produced by
tripleo-ci rather than using images produced by RDO. This required
adding the ability to use an overcloud-full image as an undercloud
image, since tripleo-ci no longer produces one. I think this will
actually allow us to stop producing an undercloud image downstream of
TripleO as well, which means that those images could be produced using
only methods from tripleo-docs.

I also recently started to look at what it would be like to add a
"virt-setup" option to tripleo.sh that runs quickstart[4]. It is
actually pretty simple for doing just the basic instack-virt-setup part.

However, what developers actually want is to reproduce exactly what runs
in CI. With our current CI architecture it is a bit harder to do that
cleanly in an external tool, but I actually was able to get a minimal
POC of it working[5]. It requires quite a bit of hacky stuff to make
tripleo-ci deploy.sh work[6], but I think it shines a light on what we
need to improve to make tripleo-ci externally consumable. It does seem
like we could get there in under 5 patches though, which is fewer than I
expected when starting on the POC.

The one piece not addressed in either POC patch is building changes
under test using DLRN. There is a function in tripleo.sh that does this
for us in CI, and it can be leveraged for the developer use case as
well. There is also an ansible role in RDO[7] that could be used for
that purpose. The ansible role has quite a few more features than
tripleo.sh, including multi-gerrit support and the ability to build
packaging changes along with the code changes. It also has a hook in the
tripleo-quickstart code to inject the resulting DLRN repo into the
overcloud image before we ever boot anything, so from a quickstart
perspective it is quite a bit nicer.

[4] https://review.openstack.org/371587
[5] https://review.openstack.org/374116
[6] https://review.openstack.org/#/c/374116/1/roles/libvirt/setup/undercloud/templates/complete_deploy.sh.j2
[7] https://github.com/redhat-openstack/ansible-role-tripleo-gate

>> Since there are a bunch of assumptions baked into the existing
>> development scripts, I would say the current approach is more 'it
>> works in devstack' than 'it works for the end user'.  This is not to
>> say the current tools don't have their uses, as they currently work
>> for the existing CI setup and for many developers today.  I think we
>> can do better if we draw clearer lines between what is tripleo and
>> what is something that is environmental and a supporting tool.
> 
> I'm not sure I agree here - after you have your virt stuff setup, the
> TripleO pieces which do the deployment are identical to those used by the
> end user (unlike Devstack where it's likely the entire environment has been
> configured using a different tool to production deployments).
> 
>> Ideas:
>>
>> As part of bringing something to get the conversation started and to
>> better understand how things work, I spent about two days coming up
>> with a PoC[7] for a workflow that splits the environment creation,
>> configuration, and management out from the actual deployment of the
>> undercloud/overcloud.  Having previously used other tools for managing
>> environments for deploying openstack, I thought I'd try deploying
>> tripleo using something I was familiar with, fuel-devops[8].  The
>> whole point of fuel-devops is to be able to create and manage entire
>> virtual environments (and their networking configuration) on a given
>> host.  With this, I was able to create my environment setup in a yaml
>> file[9], which could then be reproduced.  So with this tool,
>> I'm able to create a number of nodes with a given memory, disk, and network
>> configuration as part of an 'environment'.  This environment is
>> completely separated from another environment which means given a
>> large enough virtual host, I could have multiple tripleo deployments
>> occurring simultaneously on their own networks.  This is a nice
>> feature, but just an added bonus to the tool (along with snapshotting
>> and a few other nifty things).  The bigger feature here is that this
>> is more representative of what someone using tripleo is going to
>> experience. They are going to have their environment already
>> configured and would like to deploy tripleo on it.  Once the
>> environment was created, I started to understand what it would be like
>> for an end user to take an undercloud image and deploy it.
>> Fortunately because we're still dealing with VMs, you can just point
>> the undercloud node at the undercloud image itself[10] for testing
>> purposes.

I think this may come down to knowing one tool better than another.
tripleo-quickstart can be run in such a way that it only does the
environment setup part, and it takes a yaml config file that describes
that environment. Further, it deploys the VMs using qemu:///session
as a non-root user, so they are totally segregated from VMs using
qemu:///system or belonging to another non-root user. It may take a
small amount of effort to actually use quickstart to deploy 2
environments on the same host. I have not tried that, as on a 32G
virthost getting a single realistic deployment (i.e. HA with ceph) is
challenging enough. However, I do run a couple of utility VMs (irc
bouncer etc.) on the same virthost that I do quickstart deploys on, and
quickstart never interferes with them.
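
As a rough illustration of treating the environment as data: once the VMs are described in one place, the instackenv.json used to register the overcloud nodes in Ironic can be derived from that description instead of being assembled by hand. This is a hypothetical sketch, not quickstart's config format or code. VirtNode and to_instackenv are made-up names, and the pxe_ssh-style keys (pm_type, pm_addr, pm_user, pm_password, mac, cpu, memory, disk, arch) are assumed from the tripleo-docs node registration examples and should be verified there.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtNode:
    """One overcloud VM from a hypothetical environment description."""
    mac: str        # MAC of the NIC on the provisioning network
    vcpus: int
    memory_mb: int
    disk_gb: int


def to_instackenv(nodes: List[VirtNode], virthost: str, ssh_user: str,
                  ssh_key: str) -> Dict:
    """Build an instackenv.json-style dict for pxe_ssh power management.

    Field names are assumed from the tripleo-docs examples (verify before
    use); values are strings because those examples write them that way.
    """
    return {
        "nodes": [
            {
                "pm_type": "pxe_ssh",
                "pm_addr": virthost,       # host running libvirt
                "pm_user": ssh_user,       # user that owns the VMs
                "pm_password": ssh_key,    # pxe_ssh takes the private key here
                "mac": [node.mac],
                "cpu": str(node.vcpus),
                "memory": str(node.memory_mb),
                "disk": str(node.disk_gb),
                "arch": "x86_64",
            }
            for node in nodes
        ]
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    env = [VirtNode("52:54:00:00:00:01", 2, 8192, 40),
           VirtNode("52:54:00:00:00:02", 2, 6144, 40)]
    print(json.dumps(to_instackenv(env, "192.168.122.1", "stack", "<private key>"),
                     indent=2))
```

The design choice is simply that the node list is the single source of truth, so adding or resizing a VM cannot drift out of sync with what gets registered.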

I actually thought it would be neat to add an opposite feature, where a
user could pair two smaller virthosts and deploy VMs across them. I
haven't spent any time looking at how difficult that would be to
implement though.

I do think snapshotting would be a really great feature for
tripleo-quickstart, and could be something worth looking at for Ocata.
> 
> This looks very interesting, thanks for sharing! :)
> 
> That said, my main concern if we go this way is we'd end up with three ways
> to do a virt setup vs the current two ;)
> 
> Definitely worthy of broader discussion though, particularly in the context
> of this vs the ansible based tripleo-quickstart.
> 
>> Once the environment exists, it starts exposing what exactly it means
>> to deploy a tripleo undercloud/overcloud.  The majority of the effort
>> I had to expend for this PoC was actually related to the construction
>> of the instackenv.json to load the overcloud nodes into ironic.  As
>> mentioned in the workflow-simplification spec[0], this is a known
>> limitation with possible solutions, and I think it is important for
>> the end user experience when trying to work with tripleo.
>> It should be noted that I managed to get the undercloud and
>> controller/compute deployed (but eating into VM swap space) in 12GB on
>> my laptop.  This was something I was unable to do with either instack
>> or tripleo-quickstart.

Again, I think this may come down to knowing one tool better than another.
tripleo-quickstart can tune CPU and memory for each VM[8], though it
looks like we could document that better. What is in the defaults and
config files is what we have found to work well on 32G virthosts.

I will say I would be pretty surprised if the "cloud in 12GB" could do
any post-deploy validation without falling over.

[8] https://github.com/openstack/tripleo-quickstart/blob/master/roles/common/defaults/main.yml#L15-L56
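
To put numbers on the 16G laptop case discussed above, here is a trivial host-side budget check. It is only a sketch: the 2048MB host reserve is an arbitrary assumption, the VM sizes are the ones reported earlier in the thread rather than quickstart's defaults, and it does not tell you whether each guest has enough RAM or swap internally.

```python
from typing import Dict


def memory_shortfall_mb(host_ram_mb: int, vm_ram_mb: Dict[str, int],
                        host_reserve_mb: int = 2048) -> int:
    """Return how many MB the VMs overcommit the usable host RAM by (0 if none).

    host_reserve_mb is an assumed allowance for the host OS plus libvirt/qemu
    overhead. A positive result has to come out of swap or smaller VMs.
    """
    usable = host_ram_mb - host_reserve_mb
    needed = sum(vm_ram_mb.values())
    return max(0, needed - usable)


if __name__ == "__main__":
    # Figures from the 16G laptop report: 6GB undercloud, 5GB per node.
    laptop_mb = 16 * 1024
    vms = {"undercloud": 6 * 1024, "controller": 5 * 1024, "compute": 5 * 1024}
    print(memory_shortfall_mb(laptop_mb, vms))  # → 2048
```

With those sizes the VMs alone add up to the laptop's entire 16GB, so even a modest host reserve pushes the environment into swap.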

> 
> So, I'm a little unclear here, presumably the actual RAM usage was the
> same, so is this just because you were able to easily configure swap on all
> the VMs?
> 
>> There are some shortcomings with this particular tool choice. My
>> understanding is that fuel-devops is still limited to managing a
>> single host.  So you can't use it against remote nodes, but it is good
>> if you have a decently sized physical machine or want to work locally.
>> I ran into issues with network configurations and pxe booting, but I
>> have a feeling that's more down to a bug in libvirt and my lack of time
>> to devote to undercloud setup.  So it's not perfect, but it does show off
>> the basics of the concept.  Overall I think clearly separating the
>> tripleo installation process from the environment configuration is an
>> important step for end user usability and even developer workflows.
> 
> I think the multi-node use-case is mostly handled via OVB[1] now which is
> where you basically use an OpenStack cloud to host the VMs used for a
> TripleO deployment (yes, that is OpenStack on OpenStack on OpenStack).
> 
> We're using that in CI and it works pretty well, so I think the main gap is
> a super-easy day-1 workflow that allows users/developers to get up and
> running easily on a single node (as mentioned above tho, quickstart was
> aimed at closing this gap and has been working well for a lot of folks).
> 
> Thanks for the feedback - definitely more here we can discuss and hopefully
> refine into actionable bugs/specs/patches! :)
> 
> Steve
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
