On 16.08.2017 3:33, Emilien Macchi wrote:
> So far, we're having 3 critical issues, that we all need to address as
> soon as we can.
>
> Problem #1: Upgrade jobs timeout from Newton to Ocata
> https://bugs.launchpad.net/tripleo/+bug/1702955
> Today I spent an hour to look at it and here's what I've found so far:
> depending on which public cloud we're running the TripleO CI jobs, it
> timeouts or not.
> Here's an example of Heat resources that run in our CI:
> https://www.diffchecker.com/VTXkNFuk
> On the left, resources on a job that failed (running on internap) and
> on the right (running on citycloud) it worked.
> I've been through all upgrade steps and I haven't seen specific tasks
> that take more time here or here, but some little changes that make
> the big change at the end (so hard to debug).
> Note: both jobs use AFS mirrors.
> Help on that front would be very welcome.
>
>
> Problem #2: from Ocata to Pike (containerized) missing container upload step
> https://bugs.launchpad.net/tripleo/+bug/1710938
> Wes has a patch (thanks!) that is currently in the gate:
> https://review.openstack.org/#/c/493972
> Thanks to that work, we managed to find the problem #3.
>
>
> Problem #3: from Ocata to Pike: all container images are
> uploaded/specified, even for services not deployed
> https://bugs.launchpad.net/tripleo/+bug/1710992
> The CI jobs are timeouting during the upgrade process because
> downloading + uploading _all_ containers in local cache takes more
> than 20 minutes.
> So this is where we are now, upgrade jobs timeout on that. Steve Baker
> is currently looking at it but we'll probably offer some help.
>
>
> Solutions:
> - for stable/ocata: make upgrade jobs non-voting
> - for pike: keep upgrade jobs non-voting and release without upgrade testing

This doesn't look like a viable option to me. I'd prefer to reduce the
scope of the upgrade testing (the set of deployed services under test),
but release only with it passing for that scope.

>
> Risks:
> - for stable/ocata: it's highly possible to inject regression if jobs
> aren't voting anymore.
> - for pike: the quality of the release won't be good enough in term of
> CI coverage comparing to Ocata.
>
> Mitigations:
> - for stable/ocata: make jobs non-voting and enforce our
> core-reviewers to pay double attention on what is landed. It should be
> temporary until we manage to fix the CI jobs.
> - for master: release RC1 without upgrade jobs and make progress
> - Run TripleO upgrade scenarios as third party CI in RDO Cloud or
> somewhere with resources and without timeout constraints.
>
> I would like some feedback on the proposal so we can move forward this week,
> Thanks.
>

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando