On Thu, Jun 16, 2016 at 12:20:06PM +0000, Steven Dake (stdake) wrote:
> David,
>
> The gates are unreliable for a variety of reasons - some we can fix - some
> we can't directly.
>
> RDO rabbitmq introduced IPv6 support to erlang, which caused our gate
> reliability to drop dramatically.  Prior to this change, our gate was running
> 95% reliability or better - assuming the code wasn't busted.
> The gate gear is different - meaning different setup.  We have been
> working on debugging all these various gate provider issues with infra
> team and I think that is mostly concluded.
> The gate changed to something called bindeps which has been less reliable
> for us.

I would be curious to hear your issues with bindep. A quick look at kolla
shows you are not using other-requirements.txt yet, so you are using our
default fallback.txt file. I am unsure how that could be impacting you.

> We do not have mirrors of CentOS repos - although it is in the works.
> Mirrors will ensure that images always get built.  At the moment many of
> the gate failures are triggered by build failures (the mirrors are too
> busy).

This is no longer the case: openstack-infra is now mirroring both
centos-7[1] and epel-7[2]. And just this week we brought Ubuntu Cloud
Archive[3] online. It would be pretty trivial to update kolla to start
using them.

[1] http://mirror.dfw.rax.openstack.org/centos/7/
[2] http://mirror.dfw.rax.openstack.org/epel/7/
[3] http://mirror.dfw.rax.openstack.org/ubuntu-cloud-archive/

> We do not have mirrors of the other 5-10 repos and files we use.  This
> causes more build failures.

We do have the infrastructure in AFS to do this; it would require you to
write the patch and submit it to openstack-infra so we can bring it online.
In fact, the OpenStack Ansible team was responsible for the UCA mirror
above, I simply did the last 5% to bring it into production.

> Complicating matters, any of these 5 things above can crater one gate job
> of which we run about 15 jobs, which causes the entire gate to fail (if
> they were voting).  I really want a voting gate for kolla's jobs.  I super
> want it.  The reason we can't make the gates voting at this time is
> because of the sheer unreliability of the gate.
>
> If anyone is up for a thorough analysis of *why* the gates are failing,
> that would help us fix them.
>
> Regards
> -steve
>
> On 6/15/16, 3:27 AM, "Paul Bourke" <[email protected]> wrote:
>
> >Hi David,
> >
> >I agree with this completely. Gates continue to be a problem for Kolla,
> >reasons why have been discussed in the past but at least for me it's not
> >clear what the key issues are.
> >
> >I've added this item to the agenda for today's IRC meeting (16:00 UTC -
> >https://wiki.openstack.org/wiki/Meetings/Kolla). It may help if we can
> >brainstorm a list of the most common problems here beforehand.
> >
> >To kick things off, rabbitmq seems to cause a disproportionate amount of
> >issues, and the problems are difficult to diagnose, particularly when
> >the only way to debug is to submit "DO NOT MERGE" patch sets over and
> >over. Here's an example of a failed centos binary gate from a simple
> >patch set I was reviewing this morning:
> >http://logs.openstack.org/06/329506/1/check/gate-kolla-dsvm-deploy-centos-binary/3486d03/console.html#_2016-06-14_15_36_19_425413
> >
> >Cheers,
> >-Paul
> >
> >On 15/06/16 04:26, David Moreau Simard wrote:
> >> Hi Kolla o/
> >>
> >> I'm writing to you because I'm concerned.
> >>
> >> In case you didn't already know, the RDO community collaborates with
> >> upstream deployment and installation projects to test its packaging.
> >>
> >> This relationship is beneficial in a lot of ways for both parties, in
> >> summary:
> >> - RDO has improved test coverage (because it's otherwise hard to test
> >> different ways of installing, configuring and deploying OpenStack by
> >> ourselves)
> >> - The RDO community works with upstream projects (deployment or core
> >> projects) to fix issues that we find
> >> - In return, the collaborating deployment project can feel more
> >> confident that the RDO packages it consumes have already been tested
> >> using its platform and should work
> >>
> >> To make a long story short, we do this with a project called WeIRDO
> >> [1] which essentially runs gate jobs outside of the gate.
> >>
> >> I tried to get Kolla in our testing pipeline during the Mitaka cycle.
> >> I really did.
> >> I contributed the necessary features I needed in Kolla in order to
> >> make this work, like the configurable Yum repositories for example.
> >>
> >> However, in the end, I had to put off the initiative because the gate
> >> jobs were very flappy and unreliable.
> >> We cannot afford to have a job that is *expected* to flap in our
> >> testing pipeline, it leads to a lot of wasted time, effort and
> >> resources.
> >>
> >> I think there's been a lot of improvements since my last attempt but
> >> to get a sample of data, I looked at ~30 recently merged reviews.
> >> Of 260 total build/deploy jobs, 55 (or over 20%) failed -- and I
> >> didn't account for rechecks, just the last known status of the check
> >> jobs.
> >> I put up the results of those jobs here [2].
> >>
> >> In the case that interests me most, CentOS binary jobs, it's 5
> >> failures out of 50 jobs, so 10%. Not as bad but still a concern for
> >> me.
> >>
> >> Other deployment projects like Puppet-OpenStack, OpenStack Ansible,
> >> Packstack and TripleO have quite a bit of *voting* integration testing
> >> jobs.
> >> Why are Kolla's jobs non-voting and so unreliable ?
> >>
> >> Thanks,
> >>
> >> [1]: https://github.com/rdo-infra/weirdo
> >> [2]: https://docs.google.com/spreadsheets/d/1NYyMIDaUnlOD2wWuioAEOhjeVmZe7Q8_zdFfuLjquG4/edit#gid=0
> >>
> >> David Moreau Simard
> >> Senior Software Engineer | Openstack RDO
> >>
> >> dmsimard = [irc, github, twitter]

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
