On Wed, 2017-03-29 at 22:07 -0400, Paul Belanger wrote:
> On Thu, Mar 30, 2017 at 09:56:59AM +1300, Steve Baker wrote:
> > On Thu, Mar 30, 2017 at 9:39 AM, Emilien Macchi <emil...@redhat.com> wrote:
> > > On Mon, Mar 27, 2017 at 8:00 AM, Flavio Percoco <fla...@redhat.com> wrote:
> > > > On 23/03/17 16:24 +0100, Martin André wrote:
> > > > > On Wed, Mar 22, 2017 at 2:20 PM, Dan Prince <dprince@redhat.com> wrote:
> > > > > > On Wed, 2017-03-22 at 13:35 +0100, Flavio Percoco wrote:
> > > > > > > On 22/03/17 13:32 +0100, Flavio Percoco wrote:
> > > > > > > > On 21/03/17 23:15 -0400, Emilien Macchi wrote:
> > > > > > > > > Hey,
> > > > > > > > > I've noticed that the container jobs look pretty unstable lately; to me, it sounds like a timeout:
> > > > > > > > > http://logs.openstack.org/19/447319/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/bca496a/console.html#_2017-03-22_00_08_55_358973
> > > > > > > > There are different hypotheses on what is going on here. Some patches have landed to improve the write performance of containers by using hostpath mounts, but we think the real slowness is coming from the image downloads.
> > > > > > > > This said, this is still under investigation and the containers squad will report back as soon as there are new findings.
> > > > > > > Also, to be more precise, Martin André is looking into this. He also fixed the gate in the last 2 weeks.
> > > > > > I spoke w/ Martin on IRC. He seems to think this is the cause of some of the failures:
> > > > > > http://logs.openstack.org/32/446432/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/543bc80/logs/oooq/overcloud-controller-0/var/log/extra/docker/containers/heat_engine/log/heat/heat-engine.log.txt.gz#_2017-03-21_20_26_29_697
> > > > > > Looks like Heat isn't able to create Nova instances in the overcloud due to "Host 'overcloud-novacompute-0' is not mapped to any cell". This means our cells initialization code for containers may not be quite right... or there is a race somewhere.
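For context on the cell mapping error above: since Ocata, a newly registered nova-compute host has to be mapped to a cell (e.g. with "nova-manage cell_v2 discover_hosts") before the scheduler can place instances on it, and running that step before the compute service has registered is a classic race. The following is only an illustrative sketch of a wait-then-discover step, not the actual TripleO initialization code; the host name, retry policy and plain CLI invocations are assumptions.

    #!/usr/bin/env python
    # Illustrative sketch: wait for the compute service to register, then ask
    # nova to map any unmapped hosts to a cell. Not the actual TripleO code.
    import subprocess
    import time

    COMPUTE_HOST = 'overcloud-novacompute-0'

    def compute_registered(host):
        """Return True once the nova-compute service on `host` is listed."""
        out = subprocess.check_output(
            ['openstack', 'compute', 'service', 'list',
             '--service', 'nova-compute', '-f', 'value', '-c', 'Host'])
        return host in out.decode().split()

    for _ in range(30):
        if compute_registered(COMPUTE_HOST):
            # Map newly registered compute hosts to the default cell.
            subprocess.check_call(
                ['nova-manage', 'cell_v2', 'discover_hosts', '--verbose'])
            break
        time.sleep(10)
    else:
        raise RuntimeError('%s never registered; cannot map it to a cell' % COMPUTE_HOST)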
> > > > > Here are some findings. I've looked at time measures from CI for https://review.openstack.org/#/c/448533/ which provided the most recent results:
> > > > > * gate-tripleo-ci-centos-7-ovb-ha [1]
> > > > >   undercloud install: 23
> > > > >   overcloud deploy: 72
> > > > >   total time: 125
> > > > > * gate-tripleo-ci-centos-7-ovb-nonha [2]
> > > > >   undercloud install: 25
> > > > >   overcloud deploy: 48
> > > > >   total time: 122
> > > > > * gate-tripleo-ci-centos-7-ovb-updates [3]
> > > > >   undercloud install: 24
> > > > >   overcloud deploy: 57
> > > > >   total time: 152
> > > > > * gate-tripleo-ci-centos-7-ovb-containers-oooq-nv [4]
> > > > >   undercloud install: 28
> > > > >   overcloud deploy: 48
> > > > >   total time: 165 (timeout)
> > > > > Looking at the undercloud and overcloud install times, the most time-consuming tasks, the containers job isn't doing that badly compared to the other OVB jobs. But looking closer I could see that:
> > > > > - the containers job pulls docker images from dockerhub, and this process takes roughly 18 min.
> > > > I think we can optimize this a bit by having the script that populates the local registry in the overcloud job run in parallel. The docker daemon can do multiple pulls w/o problems.
> > > > > - the overcloud validate task takes 10 min more than it should because of the bug Dan mentioned (a fix is in the queue at https://review.openstack.org/#/c/448575/)
> > > > +A
> > > > > - the postci takes a long time with quickstart, 13 min (4 min alone spent on docker log collection), whereas it takes only 3 min when using tripleo.sh
> > > > mmh, does this have anything to do with ansible being in between? Or is that time specifically for the part that gets the logs?
> > > > > Adding all these numbers, we're at about 40 min of additional time for the oooq containers job, which is enough to cross the CI job limit.
> > > > > There is certainly a lot of room for optimization here and there, and I'll explore how we can speed up the containers CI job over the next weeks.
> > > > Thanks a lot for the update. The time breakdown is fantastic.
> > > > Flavio
> > > TBH the problem is far from being solved:
> > > 1. Click on https://status-tripleoci.rhcloud.com/
> > > 2. Select gate-tripleo-ci-centos-7-ovb-containers-oooq-nv
> > > The container job has been failing more than 55% of the time.
> > > As a reference, gate-tripleo-ci-centos-7-ovb-nonha has a 90% success rate and gate-tripleo-ci-centos-7-ovb-ha a 64% success rate.
> > > It clearly means the ovb-containers job was and is not ready to be run in the check pipeline; it's not reliable enough.
> > > The current queue time in TripleO OVB is 11 hours. This is not acceptable for TripleO developers and we need a short-term solution, which is disabling this job in the check pipeline:
> > > https://review.openstack.org/#/c/451546/
> > Yes, given resource constraints I don't see an alternative in the short term.
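Regarding Flavio's point above that the docker daemon handles multiple pulls without problems: a rough sketch of what parallelizing the registry-population pulls could look like. This is not the actual CI script; the image names are placeholders and the worker count is an arbitrary assumption.

    #!/usr/bin/env python
    # Rough sketch of pulling images in parallel instead of one at a time.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    IMAGES = [
        # hypothetical examples, not the list the job actually uses
        'tripleoupstream/centos-binary-heat-engine:latest',
        'tripleoupstream/centos-binary-nova-compute:latest',
        'tripleoupstream/centos-binary-neutron-server:latest',
    ]

    def pull(image):
        # Concurrent "docker pull" invocations are safe: the daemon
        # serializes layer downloads internally and shares common layers.
        subprocess.check_call(['docker', 'pull', image])
        return image

    with ThreadPoolExecutor(max_workers=4) as pool:
        for image in pool.map(pull, IMAGES):
            print('pulled %s' % image)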
> > > In the long term, we need to:
> > > - Stabilize ovb-containers, which is AFAIK already WIP by Martin (kudos to him). My hope is that Martin gets enough help from the container squad to work on this topic.
> > > - Remove the ovb-nonha scenario from the check pipeline - and probably keep it periodic. Dan Prince started some work on it: https://review.openstack.org/#/c/449791/ and https://review.openstack.org/#/c/449785/ - but there hasn't been much progress on it in recent days.
> > > - Engage some work on getting multinode-scenario(001,002,003,004) jobs for containers, so we don't need as many OVB jobs (probably only one) for container scenarios.
> > Another work item in progress which should help with the stability of the ovb containers job is that Dan has set up a docker-distribution based registry on a node in rhcloud. Once jobs are pulling images from this there should be fewer timeouts due to image pull speed.
> Before we go and stand up private infrastructure for tripleo to depend on, can we please work on solving this for all openstack projects upstream? We do want to run regional mirrors for docker things, however we need to address how to integrate this with AFS.
> We are trying to break the cycle of tripleo standing up private infrastructure and have it consume more community-based infrastructure. So far we are making good progress, however I would see this effort as a step backwards, not forward.
I would propose that we do both. Let's set up resources in-rack that help us efficiently cache containers from dockerhub, and let's also do the same within infra so that jobs running there benefit as well.

IMO a local, in-rack proxy/mirror that requires little to no maintenance (which is all we are setting up here, really) is a very good pattern. Are there other ideas that would allow us to avoid the overhead of continually pulling images into our rack from dockerhub?

Dan

> > > I know everyone is busy working on container support in composable services, but we might assign more resources to CI work here, otherwise I'm not sure how we're going to stabilize the CI.
> > > Any feedback is very welcome.
> > > > > Martin
> > > > > [1] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/d2c1b16/
> > > > > [2] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/d6df760/
> > > > > [3] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/3b1f795/
> > > > > [4] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/b816f20/
> > > > --
> > > > @flaper87
> > > > Flavio Percoco
> > > --
> > > Emilien Macchi
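One way an in-rack proxy/mirror like the one discussed above could be consumed by job nodes, shown only as an assumption-laden sketch: docker's standard "registry-mirrors" daemon option makes Docker Hub pulls go through a pull-through cache first. The mirror URL below is a placeholder and restarting the docker daemon afterwards is left out; this is not a description of the registry Dan actually set up.

    #!/usr/bin/env python
    # Sketch: add a (hypothetical) in-rack pull-through mirror to the docker
    # daemon configuration so Docker Hub pulls are served from the cache.
    import json
    import os

    DAEMON_JSON = '/etc/docker/daemon.json'
    MIRROR = 'http://registry-mirror.example.com:5000'  # hypothetical mirror URL

    config = {}
    if os.path.exists(DAEMON_JSON):
        with open(DAEMON_JSON) as f:
            config = json.load(f)

    mirrors = config.setdefault('registry-mirrors', [])
    if MIRROR not in mirrors:
        mirrors.append(MIRROR)

    with open(DAEMON_JSON, 'w') as f:
        json.dump(config, f, indent=2)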
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev