Re: [openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

Devananda van der Veen Tue, 01 Dec 2015 17:14:54 -0800

On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <sha...@redhat.com> wrote:


> On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
> >    On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbit...@redhat.com> wrote:
> >
> >      On 30/11/15 12:51, Ruby Loo wrote:
> >
> >        On 30 November 2015 at 10:19, Derek Higgins <der...@redhat.com> wrote:
> >
> >            Hi All,
> >
> >                 A few months ago tripleo switched from its devtest based
> >            CI to one that was based on instack. Before doing this we
> >            anticipated disruption in the ci jobs and removed them from
> >            non tripleo projects.
> >
> >                 We'd like to investigate adding it back to heat and ironic
> >            as these are the two projects where we find our ci provides
> >            the most value. But we can only do this if the results from
> >            the job are treated as voting.
> >
> >        What does this mean? That the tripleo job could vote and do a -1
> >        and block ironic's gate?
> >
> >                 In the past most of the non tripleo projects tended to
> >            ignore the results from the tripleo job as it wasn't unusual
> >            for the job to be broken for days at a time. The thing is,
> >            ignoring the results of the job is the reason (the majority
> >            of the time) it was broken in the first place.
> >                 To decrease the number of breakages we are now no longer
> >            running master code for everything (for the non tripleo
> >            projects we bump the versions we use periodically if they
> >            are working). I believe with this model the CI jobs we run
> >            have become a lot more reliable; there are still breakages
> >            but far less frequently.
> >
> >            What I'm proposing is we add at least one of our tripleo jobs
> >            back to both heat and ironic (and other projects associated
> >            with them, e.g. clients, ironicinspector etc.), tripleo will
> >            switch to running latest master of those repositories and
> >            the cores approving on those projects should wait for a
> >            passing CI job before hitting approve.
> >            So how do people feel about doing this? Can we give it a go?
> >            A couple of people have already expressed an interest in
> >            doing this but I'd like to make sure we're all in agreement
> >            before switching it on.
> >
> >        This seems to indicate that the tripleo jobs are non-voting, or at
> >        least won't block the gate -- so I'm fine with adding tripleo jobs
> >        to ironic. But if you want cores to wait/make sure they pass, then
> >        shouldn't they be voting? (Guess I'm a bit confused.)
> >
> >      +1
> >
> >      I don't think it hurts to turn it on, but tbh I'm uncomfortable with
> >      the mental overhead of a non-voting job that I have to manually treat
> >      as a voting job. If it's stable enough to make it a voting job, I'd
> >      prefer we just make it voting. And if it's not then I'd like to see it
> >      be made stable enough to be a voting job and then make it voting.
> >
> >    This is roughly where I sit as well -- if it's non-voting, experience
> >    tells me that it will largely be ignored, and as such, isn't a good use
> >    of resources.
>
> I'm sure you can appreciate it's something of a chicken/egg problem though
> - if everyone always ignores non-voting jobs, they never become voting.
>
> That effect is magnified with TripleO though, because it consumes so many
> OpenStack projects, any one of which has the capability to break our CI, so
> in an ideal world we'd have voting feedback on all-the-things, but that's
> not where we are right now due in large part to the steady stream of
> regressions (from Heat, Ironic and other projects).
>
> >    I haven't looked at tripleo or tripleoci in a while, so I won't assume
> >    that my recollection of the CI jobs bears any resemblance to what exists
> >    today. Could you explain what areas of ironic (or its subprojects) will
> >    be covered by these tests? If they are already covered by existing
> >    tests, then I don't see the benefit of adding another job; conversely,
> >    if this is testing areas we don't cover today, then there's probably
> >    value in running tripleoci in a voting fashion for now and then moving
> >    that coverage into ironic's project testing.
>
> I like to think of TripleO as a trunk-chasing "power user", and as such
> it gives very valuable "user" feedback, including breaking things in exciting
> ways you hadn't anticipated in your project integration tests.
>
> This has, in the case of Heat at least, made TripleO an extremely effective
> "kitchen sink" stress test, and has uncovered numerous issues we failed to
> find with our internal tests (obviously we do add coverage when we find
> them).
>
> In the case of Ironic, I think the usage is somewhat less demanding, but no
> less "real world" - here's a good example for you:
>
> https://bugs.launchpad.net/ironic/+bug/1507738
>
> In this case, Ironic landed a change to master, which broke all existing
> deployments using CentOS/RHEL derived distributions, so master Ironic has
> been broken for folks using those distros for over 6 weeks.
>
> I know in that case, the problem was a really old ipxe image in the distro,
> and yes there were several possible workarounds, but as a developer who
> cares about users, I personally would rather get gate feedback than angry
> users on IRC/email when I unwittingly break the world for them ;)
>
> (note, I'm not assigning any blame above, it's one of *many* examples of
> unexpected breakage due to insufficient gate feedback of real usage across
> many projects).
>

Great example, Steve, and I agree that more and faster feedback from users
into patches is a good thing. I'm also sad that it was broken for that long
and no one raised the issue in our meeting until this week.

This particular bug highlights a gap in Ironic's test coverage which I
would be delighted if someone wants to close -- that we aren't testing
support for RH-based distros. Closing that gap doesn't require TripleoCI at
all; we should simply add a dsvm job for Ironic on Fedora, using a
Fedora-based ramdisk. That will help prevent similar regressions in the
future.

Anyway, I have big reservations about putting TripleoCI on a path to ever
gating Ironic patches. I started to bikeshed on that and then deleted it
... tl;dr: I believe it is important for this job to vote in a non-gating
way. As a reviewer, I'm unlikely to pay attention to it if it doesn't vote,
and there's a good reason for this:

Non-voting jobs are used for experimentation. A non-voting job is a job
that we want to vote, but which we don't trust enough yet. It has been
promoted from the experimental pipeline to the check pipeline so that it
gets a lot more runs and so that we can stabilize it enough to make it
voting.

I was going to suggest that tripleoci vote as a third party CI system (I
know, it's not actually a third-party CI system, but I'd like it to vote like
one). And then I noticed that it used to do just that. [0] If I'm
interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
separate account, left an informative -1, but did not block the gate.
That's exactly what I would like in this case.
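
To make the distinction between "voting" and "gating" concrete, here is a
minimal Python sketch. It is only an illustrative model, not how Zuul or
Gerrit are actually configured; the JobResult type, the two helper
functions, and both job names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    name: str
    passed: bool
    voting: bool  # the job leaves a +1/-1 on the review
    gating: bool  # a failure of this job blocks the merge


def review_votes(results):
    """Return {job name: +1 or -1} for every job that votes."""
    return {r.name: (1 if r.passed else -1) for r in results if r.voting}


def can_merge(results):
    """Only failures of gating jobs block the merge."""
    return all(r.passed for r in results if r.gating)


# Hypothetical job names, for illustration only.
results = [
    JobResult("example-ironic-dsvm", passed=True, voting=True, gating=True),
    JobResult("example-tripleo-ironic", passed=False, voting=True, gating=False),
]

print(review_votes(results))  # the tripleo-style job shows a visible -1
print(can_merge(results))     # but the merge is not blocked
```

In this model the failing tripleo-style job is visible to reviewers as a
-1, yet the patch can still merge, which is the behaviour described above.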


Cheers,
-Devananda

[0] https://review.openstack.org/#/c/184402/
