On 09/19/2016 01:21 PM, Steven Hardy wrote:
> Hi Alex,
>
> Firstly, thanks for this detailed feedback - it's very helpful to have
> someone with a fresh perspective look at the day-1 experience for TripleO,
> and while some of what follows are "known issues", it's great to get some
> perspective on them, as well as ideas re how we might improve things.
>
> On Thu, Sep 15, 2016 at 09:09:24AM -0600, Alex Schultz wrote:
>> Hi all,
>>
>> I've recently started looking at the various methods for deploying and
>> developing tripleo. What I would like to bring up is the current
>> combination of the tooling for managing the VM instances and the
>> actual deployment method to launch the undercloud/overcloud
>> installation. While running through the various methods and reading
>> up on the documentation, I'm concerned that they are not currently
>> flexible enough for a developer (or operator for that matter) to be
>> able to set up the various environment configurations for testing
>> deployments and doing development. Additionally I ran into issues
>> just trying to get them working at all so this probably doesn't help when
>> trying to attract new contributors as well. The focus of this email
>> and of my experience seems to relate to the workflow-simplification
>> spec[0]. I would like to share my experiences with the various
>> tooling available and raise some ideas.
>>
>> Example Situation:
>>
>> For example, I have a laptop with 16G of RAM and an SSD and I'd like
>> to get started with tripleo. How can I deploy tripleo?
>
> So, this is probably problem #1, because while I have managed to deploy a
> minimal TripleO environment on a laptop with 16G of RAM, I think it's
> pretty widely known that it's not really enough (certainly with our default
> configuration, which has unfortunately grown over time as more and more
> things got integrated).
>
> I see two options here:
>
> 1.
> Document the reality (which is really that you need a physical machine with
> at least 32G RAM unless you're prepared to deal with swapping).
>
> 2. Look at providing a "TripleO lite" install option, which disables some
> services (both on the undercloud and default overcloud install).
>
> Either of these is definitely possible, but (2) seems like the best
> long-term solution (although it probably means another CI job).
>
>> Tools:
>>
>> instack:
>>
>> I started with the tripleo docs[1] that reference using the instack
>> tools for virtual environment creation while deploying tripleo. The
>> docs say you need at least 12G of RAM[2]. The docs lie (step 7[3]).
>> So after basically shutting everything down and letting it deploy with
>> all my RAM, the deployment fails because the undercloud runs out of
>> RAM and OOM killer kills off heat. This was not because I had reduced
>> the amount of ram for the undercloud node or anything. It was because
>> by default, 6GB of RAM with no swap is configured for the undercloud
>> (not sure if this is a bug?). So I added a swap file to the
>> undercloud and continued. My next adventure was having the overcloud
>> deployment fail because of a lack of memory as puppet fails trying to spawn
>> a process and gets denied. The instack method does not configure swap
>> for the VMs that are deployed and the deployment did not work with 5GB
>> RAM for each node. So for a full 16GB I was unable to follow the
>> documentation and use instack to successfully deploy. At this point I
>> switched over to trying to use tripleo-quickstart. Eventually I was
>> able to figure out a configuration with instack to get it to deploy
>> when I figured out how to enable swap for the overcloud deployment.
>
> Yeah, so this definitely exposes that we need to update the docs, and also
> provide an easy install-time option to enable swap on all-the-things for
> memory constrained environments.
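
To put rough numbers on the instack case, here is a back-of-the-envelope
sketch in Python. The 16G host, the 6GB undercloud and the 5GB per
overcloud node all come from the thread above; the count of two overcloud
nodes (one controller, one compute) is my assumption, and the helper names
are purely illustrative.

```python
# Rough memory budget for the instack virtual setup discussed above.
# All figures are in GB. Per-VM sizes are the ones quoted in the thread;
# the overcloud node count is an assumption (one controller, one compute).

HOST_RAM_GB = 16            # laptop from the example situation
UNDERCLOUD_RAM_GB = 6       # undercloud VM size reported in the thread
OVERCLOUD_NODE_RAM_GB = 5   # per overcloud VM, as reported in the thread
OVERCLOUD_NODES = 2         # assumption: one controller + one compute


def vm_memory_total(undercloud_gb, node_gb, node_count):
    """Return the RAM committed to VMs, ignoring hypervisor overhead."""
    return undercloud_gb + node_gb * node_count


def host_headroom(host_gb, committed_gb):
    """Return the RAM left for the host OS; negative means overcommitted."""
    return host_gb - committed_gb


committed = vm_memory_total(
    UNDERCLOUD_RAM_GB, OVERCLOUD_NODE_RAM_GB, OVERCLOUD_NODES
)
headroom = host_headroom(HOST_RAM_GB, committed)

print(f"VMs: {committed} GB, host headroom: {headroom} GB")
```

Under those assumptions the VMs alone commit the full 16GB, with nothing
left over for the host OS or libvirt itself, and that is before counting
the lack of swap inside the VMs.
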
>
>> tripleo-quickstart:
>>
>> The next thing I attempted to use was the tripleo-quickstart[4].
>> Following the directions I attempted to deploy against my localhost.
>> It turns out that doesn't work as expected since ansible likes to do
>> magic when dealing with localhost[5]. Ultimately I was unable to get
>> it working against my laptop locally because I ran into some libvirt
>> issues. But I was able to get it to work when I pointed it at a
>> separate machine. It should be noted that tripleo-quickstart creates
>> an undercloud with swap which was nice because then it actually works,
>> but is an inconsistent experience depending on which tool you used for
>> your deployment.
>
> Yeah, so while a lot of folks have good luck with tripleo-quickstart, it
> has the disadvantage of not currently being the tool used in upstream
> TripleO CI (which folks have looked at fixing, but it's not yet happened).
>
> The original plan was for tripleo-quickstart to completely replace the
> instack-virt-setup workflow:
>
> https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart
>
> But for a variety of reasons, we never quite got to that - we may need a
> summit discussion on the path forward here.
>
> For me (as an upstream developer) it really boils down to the CI usage
> issue - at all times I want to use the tool which gets me closest to what
> runs in upstream CI (which although we actually use instack-virt-setup, we
> otherwise follow the tripleo-docs procedure pretty closely, using a helper
> script called tripleo.sh, which you can run locally):
>
> http://paste.fedoraproject.org/431073/30480114
>

This kind of feels like quickstart FUD to me.
CI does not run tripleo.sh directly, so the paste above is not actually
reproducing any tripleo-ci job. tripleo-ci runs a wrapper around
tripleo.sh[1] with specific ENV variables set in a different script[2],
and then some required steps to actually make that work in yet another
script[3]. Further, if you replaced the instack-virt-setup line in the
above script with a run of tripleo-quickstart, the other steps could
still be run on top of a quickstart undercloud.

[1] https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/deploy.sh
[2] https://github.com/openstack-infra/tripleo-ci/blob/master/toci_gate_test.sh#L107-L200
[3] https://github.com/openstack-infra/tripleo-ci/blob/master/toci_instack.sh#L180-L182

>> Thoughts:
>>
>> What these two methods showed me is that the deployment of tripleo is
>> not exactly a foolproof thing and that there are a lot of assumptions
>> that are being handled by both of these tools. My initial goal to
>> start this conversation around tooling and workflows was to bring the
>> idea of separation of the (virtual) environment configuration from the
>> actual deployment of tripleo as well as identifying places for
>> improvement as a way to speed up development and deployment testing.
>> I believe there are a few reasons why this can be beneficial.
>
> Yep, I think this goal is uncontentious, and it's pretty much the original
> aim of tripleo-quickstart.
>
>> The first reason is that as a developer, I would like to simplify the
>> development environment creation process and be able to draw the line
>> between environment and actual deployment tool. By developing and
>> documenting a working development/deployment workflow, we can simplify
>> the onboarding experience as well as possibly accelerating the
>> existing development processes by reducing the time spent messing with
>> creating environments. Does tripleo need to manage creation of VMs to
>> deploy on? The answer is probably no.
>> As the end user will want to
>> deploy tripleo on his or her gear, the focus for tripleo probably
>> should be on improving that process. Now this doesn't mean that we
>> can't write stuff to do this, as it's important for development and
>> testing. I'm not sure this is a core part of what should be
>> 'tripleo'.
>
> Yeah, agreed - the automation around setting up the VMs is really just a
> convenience, and it's not really a core part of TripleO - any tool could be
> used provided the VMs end up configured in the way we require.
>
>> Another reason why I think this is important is as we talk about
>> creating different scenarios for CI[6] to improve testing, it would
>> also be useful for a developer or qa engineer to be able to test
>> different environmental configurations that would be more realistic of
>> actual deployment scenarios without having to hunt down multiple
>> machines or configure actual hardware networking. For example,
>> creating environments using multiple networks, changing NICs,
>> providing different sized nodes based on roles, etc can all be done
>> virtually. While tripleo-quickstart has some of these options, it is
>> mixed in with the tripleo deployment process and does not seem to
>> align with being able to deploy tripleo in more real world networking
>> or environmental scenarios.
>
> Yeah, so I think this is one reason why the tripleo-quickstart discussion
> has sometimes proven tricky - the original spec was about replacing only
> the virt-setup pieces, but there was subsequently some scope-creep. I
> think this is being addressed, but it'd be good to have folks working on
> that chime in here.
>

Indeed, we have moved all of the ansible code for doing full
deployments outside of tripleo-quickstart. There are still a bunch of CI
helper scripts for RDO CI in tree, and a playbook that exercises the
full deployment code, but I would like to move all of that out of the
quickstart tree as well.

We have also added the ability to consume the images produced by
tripleo-ci rather than using images produced by RDO. This required
adding the ability to use an overcloud-full image as an undercloud
image, since tripleo-ci no longer produces a separate undercloud image.
I think this will actually allow us to stop producing an undercloud
image downstream of TripleO as well, which means that those images could
be produced using only methods from tripleo-docs.

I also recently started to look at what it would be like to add a
"virt-setup" option to tripleo.sh that runs quickstart[4]. It is
actually pretty simple for doing just the basic instack-virt-setup part.
However, what developers actually want is to reproduce exactly what runs
in CI. With our current CI architecture it is a bit harder to do that
cleanly in an external tool, but I actually was able to get a minimal
POC of it working[5]. It requires quite a bit of hacky stuff to make
tripleo-ci deploy.sh work[6], but I think it shines a light on what we
need to improve to make tripleo-ci externally consumable. It does seem
like we could get there in under 5 patches though, which is fewer than I
thought when starting on the POC.

The one piece not addressed in either POC patch is building changes
under test using DLRN. There is a function in tripleo.sh that does this
for us in CI, and it can be leveraged for the developer use case as
well. There is also an ansible role in RDO[7] that could be used for
that purpose. The ansible role has quite a few more features than
tripleo.sh, including multi-gerrit support and the ability to build
packaging changes along with the code changes. It also has a hook in the
tripleo-quickstart code to inject the DLRN repo created into the
overcloud image before we ever boot anything, so from a quickstart
perspective it is quite a bit nicer.

[4] https://review.openstack.org/371587
[5] https://review.openstack.org/374116
[6] https://review.openstack.org/#/c/374116/1/roles/libvirt/setup/undercloud/templates/complete_deploy.sh.j2
[7] https://github.com/redhat-openstack/ansible-role-tripleo-gate

>> Since there are a bunch of assumptions baked into the existing
>> development scripts, I would say the current approach is more 'it
>> works in devstack' than 'it works for the end user'. This is not to
>> say the current tools don't have their uses as they currently work
>> for the existing CI setup and for many developers today. I think we
>> can do better if we draw clearer lines between what is tripleo and
>> what is something that is environmental and a supporting tool.
>
> I'm not sure I agree here - after you have your virt stuff setup, the
> TripleO pieces which do the deployment are identical to those used by the
> end user (unlike Devstack where it's likely the entire environment has been
> configured using a different tool to production deployments).
>
>> Ideas:
>>
>> As part of bringing something to get the conversation started and to
>> better understand how things work, I spent about two days coming up
>> with a PoC[7] for a workflow that splits the environment creation,
>> configuration, and management out from the actual deployment of the
>> undercloud/overcloud. Having previously used other tools for managing
>> environments for deploying openstack, I thought I'd try deploying
>> tripleo using something I was familiar with, fuel-devops[8]. The
>> whole point of fuel-devops is to be able to create and manage entire
>> virtual environments (and their networking configuration) on a given
>> host. With this, I was able to create my environment setup in a yaml
>> file[9] which would then be able to be reproduced. So with this tool,
>> I'm able to create a number of nodes of a given memory, disk, network
>> configuration as part of an 'environment'.
>> This environment is
>> completely separated from another environment which means given a
>> large enough virtual host, I could have multiple tripleo deployments
>> occurring simultaneously on their own networks. This is a nice
>> feature, but just an added bonus to the tool (along with snapshotting
>> and a few other nifty things). The bigger feature here is that this
>> is more representative of what someone using tripleo is going to
>> experience. They are going to have their environment already
>> configured and would like to deploy tripleo on it. Once the
>> environment was created, I started to understand what it would be like
>> for an end user to take an undercloud image and deploy it.
>> Fortunately because we're still dealing with VMs, you can just point
>> the undercloud node at the undercloud image itself[10] for testing
>> purposes.

I think this could be knowing one tool better than another.
tripleo-quickstart can be run in such a way that it only does the
environment setup part. And it takes a yaml config file that describes
that environment. Further, it deploys the VMs using qemu:///session
inside a non-root user, so they would be totally segregated from VMs
using qemu:///system or another non-root user.

There may be some small amount of effort to actually use quickstart to
deploy 2 environments on the same host. I have not tried that, as on a
32G virthost getting a single realistic deployment (i.e. HA w/ ceph) is
challenging enough. However, I do run a couple of utility VMs (irc
bouncer etc.) on the same virthost that I do quickstart deploys on, and
quickstart does not ever interfere.

I actually thought it would be neat to add an opposite feature, where a
user could pair two smaller virthosts and deploy VMs across them. I
haven't spent any time looking at how difficult that would be to
implement though.

I do think snapshotting would be a really great feature for
tripleo-quickstart, and could be something worth looking at for Ocata.

>
> This looks very interesting, thanks for sharing! :)
>
> That said, my main concern if we go this way is we'd end up with three ways
> to do a virt setup vs the current two ;)
>
> Definitely worthy of broader discussion though, particularly in the context
> of this vs the ansible based tripleo-quickstart.
>
>> Once the environment exists, it starts exposing what exactly it means
>> to deploy a tripleo undercloud/overcloud. The majority of the effort
>> I had to expend for this PoC was actually related to the construction
>> of the instackenv.json to load the overcloud nodes into ironic. As
>> mentioned in the workflow-simplification spec[0], this is a known
>> limitation and there are possible solutions and I think this is
>> important for the end user experience when trying to work with tripleo.
>> It should be noted that I managed to get the undercloud and
>> controller/compute deployed (but eating into VM swap space) in 12GB on
>> my laptop. This was something I was unable to do with either instack
>> or tripleo-quickstart.

Again, I think this could be knowing one tool better than another.
tripleo-quickstart can tune CPU and memory for each VM[8], though it
looks like we could document that better. What is in the defaults and
config files is what we have found to work well on 32G virthosts. I will
say I would be pretty surprised if the "cloud in 12GB" could do any
post-deploy validation without falling over.

[8] https://github.com/openstack/tripleo-quickstart/blob/master/roles/common/defaults/main.yml#L15-L56

>
> So, I'm a little unclear here, presumably the actual RAM usage was the
> same, so is this just because you were able to easily configure swap on all
> the VMs?
>
>> There are some shortcomings with this particular tool choice. My
>> understanding is that fuel-devops is still limited to managing a
>> single host. So you don't use it against remote nodes, but it is good
>> if you have a decently sized physical machine or want to work locally.
>> I ran into issues with network configurations and pxe booting, but I
>> have a feeling that's more of a bug in libvirt and my lack of time to
>> devote to undercloud setup. So it's not perfect, but it does show off
>> the basics of the concept. Overall I think clearly separating the
>> tripleo installation process from the environment configuration is an
>> important step for end user usability and even developer workflows.
>
> I think the multi-node use-case is mostly handled via OVB[1] now which is
> where you basically use an OpenStack cloud to host the VMs used for a
> TripleO deployment (yes, that is OpenStack on OpenStack on OpenStack).
>
> We're using that in CI and it works pretty well, so I think the main gap is
> a super-easy day-1 workflow that allows users/developers to get up and
> running easily on a single node (as mentioned above tho, quickstart was
> aimed at closing this gap and has been working well for a lot of folks).
>
> Thanks for the feedback - definitely more here we can discuss and hopefully
> refine into actionable bugs/specs/patches! :)
>
> Steve
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>