On Mon, May 15, 2017 at 6:27 PM, Steven Hardy <sha...@redhat.com> wrote:
> On Mon, May 08, 2017 at 02:45:08PM +0300, Marios Andreou wrote:
>> Hi folks, after some discussion locally with colleagues about improving
>> the upgrades experience, one of the items that came up was pre-upgrade and
>> update validations. I took an AI to look at the current status of
>> tripleo-validations [0] and posted a simple WIP [1] intended to be run
>> before an undercloud update/upgrade and which just checks service status.
>> It was pointed out by shardy that for such checks it is better to instead
>> continue to use the per-service manifests where possible like [2] for
>> example where we check status before N..O major upgrade. There may still
>> be some undercloud specific validations that we can land into the
>> tripleo-validations repo (thinking about things like the neutron
>> networks/ports, validating the current nova nodes state etc?).
>> So do folks have any thoughts about this subject - for example the kinds
>> of things we should be checking - Steve said he had some reviews in
>> progress for collecting the overcloud ansible puppet/docker config into an
>> ansible playbook that the operator can invoke for upgrade of the 'manual'
>> nodes (for example compute in the N..O workflow) - the point being that we
>> can add more per-service ansible validation tasks into the service
>> manifests for execution when the play is run by the operator - but I'll
>> let Steve point at and talk about those.
>
> Thanks for starting this thread Marios, sorry for the slow reply due to
> Summit etc.
>
> As we discussed, I think adding validations is great, but I'd prefer we
> kept any overcloud validations specific to services in t-h-t instead of
> trying to manage service specific things over multiple repos.
>
> This would also help with the idea of per-step validations I think, where
> e.g. you could have a "is service active" test and run it after the step
> where we expect the service to start, a blueprint was raised a while back
> asking for exactly that:
>
> https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation
>
> One way we could achieve this is to add ansible tasks that perform some
> validation after each step, where we combine the tasks for all services,
> similar to how we already do upgrade_tasks and host_prep_tasks:
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/redis.yaml#L92
>
> With the benefit of hindsight using ansible tags for upgrade_tasks wasn't
> the best approach, because you can't change the tags via SoftwareDeployment
> (e.g. you need a SoftwareConfig per step), it's better if we either generate
> the list of tasks by merging maps e.g.
>
> validation_tasks:
>   step3:
>     - sometask
>
> Or via ansible conditionals where we pass a step value in to each run of
> the tasks:
>
> validation_tasks:
>   - sometask
>     when: step == 3
>
> The latter approach is probably my preference, because it'll require less
> complex merging in the heat layer.
>
> As you mentioned, I've been working on ways to make the deployment steps
> more ansible driven, so having these tasks integrated with the t-h-t model
> would be well aligned with that I think:
>
> https://review.openstack.org/#/c/454816/
>
> https://review.openstack.org/#/c/462211/
>
> Happy to discuss further when you're ready to start integrating some
> overcloud validations.
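
For illustration only, the second variant above (a single task list gated
on a step value) might look roughly like this for one service. This is a
sketch under assumptions: "validation_tasks" is the section name proposed
in this thread, not an existing t-h-t interface; "step" is assumed to be
passed in as a variable on each run; the redis unit name and the step
number are placeholders.

```yaml
# Hypothetical sketch, not taken from tripleo-heat-templates.
# "validation_tasks" is the section name proposed in this thread;
# "step" is assumed to be passed in as a variable on every run.
validation_tasks:
  - name: Check that redis is active after the step that starts it
    # systemctl is-active exits non-zero when the unit is not active,
    # which fails this task and so stops the run at this step.
    command: systemctl is-active redis
    changed_when: false
    when: step|int == 3
```

Because the gating lives in the task itself, the same list can be handed
to every step unchanged; only the value of "step" differs between runs.
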
Maybe these are two kinds of pre-upgrade validations that serve different
purposes.

The more general validations (like checking connectivity, making sure the
stack is in good shape, repos are available, etc.) should give operators a
fair amount of confidence that all basic prerequisites to start an update
are met *before* the upgrade is started. They could be run from the UI or
CLI and would fit well into the tripleo-validations repo. As with the
existing tripleo-validations, a failure wouldn't block the operator from
going ahead.

The service-specific validations, on the other hand, are closely tied to
the upgrade process and will stop further progress when failing. They are
fundamentally different to the tripleo-validations and could therefore
live in t-h-t.

I personally don't see why we shouldn't have pre-upgrade validations both
in tripleo-validations and in t-h-t, as long as we know which ones go
where.

If everything that's tied to a specific overcloud service or upgrade step
goes into t-h-t, I could see these two groups (using the validations
suggested earlier in this thread):

tripleo-validations:
- Undercloud service check
- Verify that the stack is in a *_COMPLETE state
- Verify undercloud disk space. For node replacement we recommended a
  minimum of 10 GB free.
- Network/repo availability check (undercloud and overcloud)
- Verify we're at the latest version of the current release
- ...

tripleo-heat-templates:
- Pacemaker cluster health
- Ceph health
- APIs healthcheck (per overcloud service)
- Check Galera and Rabbit clusters and verify all nodes are up.
- Disabling stonith.
- ...

In theory I could imagine another variety of pre-upgrade validations: ones
that are general in nature (not tied to an overcloud service), but are
specific to a particular version jump (so they would be run before a N..O
upgrade, but wouldn't make sense for an O..P jump). These could still live
in the tripleo-validations repo, but would only exist as backports to the
relevant "from"-version.
But lacking a good example, this is probably a bit academic for now. :-)

Any thoughts?

Thanks
Florian

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev