On 7/6/16, 5:50 PM, "Paul Belanger" <pabelan...@redhat.com> wrote:
>On Thu, Jun 16, 2016 at 12:20:06PM +0000, Steven Dake (stdake) wrote:
>> David,
>>
>> The gates are unreliable for a variety of reasons - some we can fix -
>> some we can't directly.
>>
>> RDO rabbitmq introduced IPv6 support to erlang, which caused our gate
>> reliability to drop dramatically. Prior to this change, our gate was
>> running at 95% reliability or better - assuming the code wasn't busted.
>> The gate gear is different - meaning different setup. We have been
>> working on debugging all these various gate provider issues with the
>> infra team and I think that is mostly concluded.
>> The gate changed to something called bindep, which has been less
>> reliable for us.
>
>I would be curious to hear your issues with bindep. A quick look at kolla
>shows you are not using other-requirements.txt yet, so you are using our
>default fallback.txt file. I am unsure how that could be impacting you.
>
>> We do not have mirrors of CentOS repos - although it is in the works.
>> Mirrors will ensure that images always get built. At the moment many of
>> the gate failures are triggered by build failures (the mirrors are too
>> busy).
>
>This is no longer the case; openstack-infra is now mirroring both
>centos-7 [1] and epel-7 [2]. And just this week we brought Ubuntu Cloud
>Archive [3] online. It would be pretty trivial to update kolla to start
>using them.
>
>[1] http://mirror.dfw.rax.openstack.org/centos/7/
>[2] http://mirror.dfw.rax.openstack.org/epel/7/
>[3] http://mirror.dfw.rax.openstack.org/ubuntu-cloud-archive/

Thanks - I was aware that infra made mirrors available; I have not had a
chance to personally modify the gate to make use of them.

I am not sure whether there is an issue with bindep or not. A whole lot of
things changed at once, and our gate went from pretty stable to super
unstable. One of those things was bindep, but there were a bunch of other
changes; I wouldn't pin it all on bindep.

>
>> We do not have mirrors of the other 5-10 repos and files we use.
>> This causes more build failures.
>
>We do have the infrastructure in AFS to do this; it would require you to
>write the patch and submit it to openstack-infra so we can bring it
>online. In fact, the OpenStack Ansible team was responsible for the UCA
>mirror above, I simply did the last 5% to bring it into production.

Wow, that's huge! I was not aware of this. Do you have an example patch
which brings a mirror into service?

Thanks
-steve

>
>> Complicating matters, any of these 5 things above can crater one gate
>> job of which we run about 15 jobs, which causes the entire gate to fail
>> (if they were voting). I really want a voting gate for kolla's jobs. I
>> super want it. The reason we can't make the gates voting at this time
>> is because of the sheer unreliability of the gate.
>>
>> If anyone is up for a thorough analysis of *why* the gates are failing,
>> that would help us fix them.
>>
>> Regards
>> -steve
>>
>> On 6/15/16, 3:27 AM, "Paul Bourke" <paul.bou...@oracle.com> wrote:
>>
>> >Hi David,
>> >
>> >I agree with this completely. Gates continue to be a problem for
>> >Kolla; reasons why have been discussed in the past, but at least for
>> >me it's not clear what the key issues are.
>> >
>> >I've added this item to the agenda for today's IRC meeting (16:00 UTC
>> >- https://wiki.openstack.org/wiki/Meetings/Kolla). It may help if we
>> >can brainstorm a list of the most common problems here beforehand.
>> >
>> >To kick things off, rabbitmq seems to cause a disproportionate amount
>> >of issues, and the problems are difficult to diagnose, particularly
>> >when the only way to debug is to submit "DO NOT MERGE" patch sets
>> >over and over.
>> >Here's an example of a failed centos binary gate from a simple patch
>> >set I was reviewing this morning:
>> >http://logs.openstack.org/06/329506/1/check/gate-kolla-dsvm-deploy-centos-binary/3486d03/console.html#_2016-06-14_15_36_19_425413
>> >
>> >Cheers,
>> >-Paul
>> >
>> >On 15/06/16 04:26, David Moreau Simard wrote:
>> >> Hi Kolla o/
>> >>
>> >> I'm writing to you because I'm concerned.
>> >>
>> >> In case you didn't already know, the RDO community collaborates with
>> >> upstream deployment and installation projects to test its packaging.
>> >>
>> >> This relationship is beneficial in a lot of ways for both parties.
>> >> In summary:
>> >> - RDO has improved test coverage (because it's otherwise hard to
>> >> test different ways of installing, configuring and deploying
>> >> OpenStack by ourselves)
>> >> - The RDO community works with upstream projects (deployment or
>> >> core projects) to fix issues that we find
>> >> - In return, the collaborating deployment project can feel more
>> >> confident that the RDO packages it consumes have already been
>> >> tested using its platform and should work
>> >>
>> >> To make a long story short, we do this with a project called WeIRDO
>> >> [1], which essentially runs gate jobs outside of the gate.
>> >>
>> >> I tried to get Kolla into our testing pipeline during the Mitaka
>> >> cycle. I really did. I contributed the features I needed in Kolla
>> >> to make this work, like the configurable Yum repositories for
>> >> example.
>> >>
>> >> However, in the end, I had to put off the initiative because the
>> >> gate jobs were very flappy and unreliable. We cannot afford to have
>> >> a job that is *expected* to flap in our testing pipeline; it leads
>> >> to a lot of wasted time, effort and resources.
>> >>
>> >> I think there have been a lot of improvements since my last
>> >> attempt, but to get a sample of data, I looked at ~30 recently
>> >> merged reviews.
>> >> Of 260 total build/deploy jobs, 55 (or over 20%) failed -- and I
>> >> didn't account for rechecks, just the last known status of the
>> >> check jobs. I put up the results of those jobs here [2].
>> >>
>> >> In the case that interests me most, CentOS binary jobs, it's 5
>> >> failures out of 50 jobs, so 10%. Not as bad, but still a concern
>> >> for me.
>> >>
>> >> Other deployment projects like Puppet-OpenStack, OpenStack Ansible,
>> >> Packstack and TripleO have quite a few *voting* integration testing
>> >> jobs. Why are Kolla's jobs non-voting and so unreliable?
>> >>
>> >> Thanks,
>> >>
>> >> [1]: https://github.com/rdo-infra/weirdo
>> >> [2]: https://docs.google.com/spreadsheets/d/1NYyMIDaUnlOD2wWuioAEOhjeVmZe7Q8_zdFfuLjquG4/edit#gid=0
>> >>
>> >> David Moreau Simard
>> >> Senior Software Engineer | Openstack RDO
>> >>
>> >> dmsimard = [irc, github, twitter]
>> >>
>> >> __________________________________________________________________________
>> >> OpenStack Development Mailing List (not for usage questions)
>> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
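For anyone picking up Paul's suggestion that it "would be pretty trivial to
update kolla to start using" the infra mirrors, a minimal sketch of a yum
repo override is below. The repo IDs, filename, and subpaths under the
mirror root are illustrative assumptions, not taken from an actual kolla
change; only the mirror hostnames come from the links above.

```ini
# /etc/yum.repos.d/infra-mirror.repo -- illustrative override pointing
# the base CentOS and EPEL repos at the openstack-infra AFS mirrors.
# Repo IDs and exact subpaths here are assumptions, not a real patch.

[infra-centos-base]
name=CentOS-7 - Base (openstack-infra mirror)
baseurl=http://mirror.dfw.rax.openstack.org/centos/7/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

[infra-epel]
name=EPEL-7 (openstack-infra mirror)
baseurl=http://mirror.dfw.rax.openstack.org/epel/7/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
```

Dropped into the build containers (or templated into kolla's existing repo
setup, alongside disabling the stock mirrorlist repos), something like this
would let image builds pull from the in-cloud mirrors instead of the busy
public ones.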