Re: [openstack-dev] [kolla] Stability and reliability of gate jobs

Paul Bourke Wed, 15 Jun 2016 03:32:16 -0700

Hi David,

I agree with this completely. Gates continue to be a problem for Kolla; the reasons why have been discussed in the past, but at least for me it's not clear what the key issues are.

I've added this item to the agenda for today's IRC meeting (16:00 UTC - https://wiki.openstack.org/wiki/Meetings/Kolla). It may help if we can brainstorm a list of the most common problems here beforehand.

To kick things off, RabbitMQ seems to cause a disproportionate number of issues, and the problems are difficult to diagnose, particularly when the only way to debug is to submit "DO NOT MERGE" patch sets over and over. Here's an example of a failed CentOS binary gate from a simple patch set I was reviewing this morning: http://logs.openstack.org/06/329506/1/check/gate-kolla-dsvm-deploy-centos-binary/3486d03/console.html#_2016-06-14_15_36_19_425413


Cheers,
-Paul

On 15/06/16 04:26, David Moreau Simard wrote:

Hi Kolla o/

I'm writing to you because I'm concerned.

In case you didn't already know, the RDO community collaborates with
upstream deployment and installation projects to test its packaging.

This relationship is beneficial in a lot of ways for both parties, in summary:
- RDO has improved test coverage (because it's otherwise hard to test
different ways of installing, configuring and deploying OpenStack by
ourselves)
- The RDO community works with upstream projects (deployment or core
projects) to fix issues that we find
- In return, the collaborating deployment project can feel more
confident that the RDO packages it consumes have already been tested
using its platform and should work

To make a long story short, we do this with a project called WeIRDO
[1] which essentially runs gate jobs outside of the gate.

I tried to get Kolla in our testing pipeline during the Mitaka cycle.
I really did.
I contributed the necessary features I needed in Kolla in order to
make this work, like the configurable Yum repositories for example.

However, in the end, I had to put off the initiative because the gate
jobs were very flappy and unreliable.
We cannot afford to have a job that is *expected* to flap in our
testing pipeline; it leads to a lot of wasted time, effort and
resources.

I think there have been a lot of improvements since my last attempt but
to get a sample of data, I looked at ~30 recently merged reviews.
Of 260 total build/deploy jobs, 55 (or over 20%) failed -- and I
didn't account for rechecks, just the last known status of the check
jobs.
I put up the results of those jobs here [2].

In the case that interests me most, CentOS binary jobs, it's 5
failures out of 50 jobs, so 10%. Not as bad but still a concern for
me.
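
For reference, the percentages quoted above follow directly from the raw
counts. A minimal Python sketch of that arithmetic (the function name
failure_rate is illustrative only and is not part of any Kolla or WeIRDO
tooling):

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the percentage of jobs that failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Counts taken from the message above (last known status, rechecks ignored).
samples = {
    "all build/deploy jobs": (55, 260),
    "CentOS binary jobs": (5, 50),
}

for name, (failed, total) in samples.items():
    rate = failure_rate(failed, total)
    print(f"{name}: {failed}/{total} failed = {rate:.1f}%")
```

This gives roughly 21.2% for all build/deploy jobs and 10.0% for the
CentOS binary jobs, matching the "over 20%" and "10%" figures in the text.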

Other deployment projects like Puppet-OpenStack, OpenStack Ansible,
Packstack and TripleO have quite a bit of *voting* integration testing
jobs.
Why are Kolla's jobs non-voting and so unreliable?

Thanks,

[1]: https://github.com/rdo-infra/weirdo
[2]: https://docs.google.com/spreadsheets/d/1NYyMIDaUnlOD2wWuioAEOhjeVmZe7Q8_zdFfuLjquG4/edit#gid=0

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

