Thanks for starting the discussion. Will attend.
-----Original Message-----
From: Dmitry Tantsur [mailto:dtant...@redhat.com]
Sent: Thursday, April 13, 2017 10:33 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [ironic] 3rdparty CI status and how we can help to make it green

Hi all, especially maintainers of 3rdparty CI for Ironic :)

I've been watching our 3rdparty CI results recently. While things have improved compared to, e.g., a month ago, most jobs still finish with failures. I've written a simple script [1] to fetch CI run information from my local Gertty database; the results [2] show that some jobs still fail surprisingly often (> 50% of cases):

- job: tempest-dsvm-ironic-agent-irmc rate: 0.9857142857142858
- job: tempest-dsvm-ironic-iscsi-irmc rate: 0.9771428571428571
- job: dell-hw-tempest-dsvm-ironic-pxe_drac rate: 0.9682539682539683
- job: gate-tempest-ironic-ilo-driver-iscsi_ilo rate: 0.9582463465553236
- job: dell-hw-tempest-dsvm-ironic-pxe_ipmitool rate: 0.9111111111111111
- job: tempest-dsvm-ironic-pxe-irmc rate: 0.8171428571428572
- job: gate-tempest-ironic-ilo-driver-pxe_ilo rate: 0.791231732776618

I would like to start a discussion on how we (as a team) can help the people maintaining these CIs keep the failure rate closer to that of our virtual CI (< 30% of cases, judging by [2]). I'm thinking of the following potential problems:

1. Our devstack plugin changes too often. I've heard this complaint at least once. Should we maybe freeze our devstack plugin at some point to allow the vendor folks to catch up? Then we should start looking at the CI results more carefully when modifying it.

2. Our devstack plugin is inconvenient for hardware and requires hacks. This is something Miles (?) told me when trying to set up an environment for his hardware lab. If so, can we get a list of pain points, preferably in the form of reported bugs? I, and hopefully other folks, can certainly dedicate some time to making your life easier.

3. The number of jobs to run is too high. I've noticed that 3rdparty CI runs even on patches that clearly don't require it, e.g. docs-only changes. I suggest the maintainers adopt exclude rules similar to [3]. Also, most vendors run 3-4 jobs for different flavors of their drivers (and that number is going to increase with the driver composition work). I wonder if we should recommend switching from the baremetal_basic_ops test to what we call "standalone" tests [4]. This would allow a single job to test several drivers/combinations of interfaces within the same time frame.

Finally, I've proposed this topic for the virtual meetup [5] planned for the end of April. Please feel free to stop by and let us know how we can help.

Thanks,
Dmitry

P.S. I've seen expired or self-signed HTTPS certificates on the log sites of some 3rdparty CIs. Please try to fix such issues as soon as possible to allow the community to understand failures.
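For context, the per-job rates above are simply failed runs divided by total runs, aggregated from CI comments. A minimal sketch of that kind of calculation (this is not the actual ci-report script from [1]; the input format and names below are assumptions for illustration only):

    from collections import defaultdict

    def failure_rates(results):
        # results: iterable of (job_name, succeeded) pairs scraped from CI comments
        runs, failures = defaultdict(int), defaultdict(int)
        for job, succeeded in results:
            runs[job] += 1
            if not succeeded:
                failures[job] += 1
        # failure rate = failed runs / total runs for each job
        return {job: float(failures[job]) / runs[job] for job in runs}

    # hypothetical sample data; the real script would parse Gerrit/Gertty CI comments
    sample = [('tempest-dsvm-ironic-pxe-irmc', False),
              ('tempest-dsvm-ironic-pxe-irmc', True),
              ('tempest-dsvm-ironic-pxe-irmc', False)]
    for job, rate in sorted(failure_rates(sample).items(), key=lambda kv: -kv[1]):
        print('- job: %s rate: %s' % (job, rate))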
[1] https://github.com/dtantsur/ci-report
[2] http://paste.openstack.org/show/606467/
[3] https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L1375-L1385
[4] https://github.com/openstack/ironic/blob/master/ironic_tempest_plugin/tests/scenario/ironic_standalone/test_basic_ops.py
[5] https://etherpad.openstack.org/p/ironic-virtual-meetup

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev