Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

Bogdan Dobrelya Tue, 15 May 2018 08:57:06 -0700

On 5/15/18 5:08 PM, Sagi Shnaidman wrote:

Bogdan,
I think before final decisions we need to know exactly what price we need to pay. Without exact numbers it will be difficult to discuss. If we need to wait 80 mins for the undercloud-containers job to finish before starting all other jobs, it will be about 4.5 hours to wait for a result (+ 4.5 hours in gate), which is too big a price imho and isn't worth the effort.
What are the exact numbers we are talking about?

I fully agree, but I don't have those numbers, sorry! As I noted above, those are definitely sitting in openstack-infra's Elasticsearch DB; they just need to be extracted with some assistance from folks who know more about that!


Thanks

On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote:


    Let me clarify the problem I want to solve with pipelines.

    It is getting *hard* to develop things and move patches to the Happy
    End (merged):
    - Patches wait too long for CI jobs to start. It should be minutes
    and not hours of waiting.
    - If a patch fails a job w/o a good reason, the subsequent recheck
    repeats the waiting all over again.

    How may pipelines help solve it?
    Pipelines only alleviate, not solve, the problem of waiting. We only
    want to build pipelines for the main zuul check process, omitting
    gating and RDO CI (for now).

    There are two cases to consider:
    - A patch succeeds all checks
    - A patch fails a check with dependencies

    The latter case benefits us the most when pipelines are designed as
    proposed here, so that any jobs expected to fail when a dependency
    fails are omitted from execution. This saves a lot of HW resources
    and zuul queue slots, making them available for other patches and
    allowing those to have CI jobs started faster (less waiting!). When
    we have "recheck storms", e.g. because of some known intermittent
    side issue, that saving is multiplied by the recheck storm um...
    level, and delivers even better and absolutely amazing results :)
    The Zuul queue will not grow insanely, overwhelmed by multiple
    clones of rechecked jobs that are highly likely doomed to fail and
    that block other patches which might have a chance to pass checks
    because they are not affected by that intermittent issue.
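
To make the saving described above concrete, here is a minimal sketch in Python. It is not Zuul code and does not use any Zuul API; the job names, durations and the recheck count are all hypothetical, and a failing job is charged its full duration for simplicity. It only models the rule that a job is skipped when one of its dependencies did not succeed.

```python
from typing import Dict

# Hypothetical job graph: names and durations are made up for illustration.
# Jobs are listed so that every dependency appears before its dependents.
JOBS: Dict[str, dict] = {
    "base-job": {"minutes": 80, "dependencies": []},
    "dependent-a": {"minutes": 120, "dependencies": ["base-job"]},
    "dependent-b": {"minutes": 150, "dependencies": ["base-job"]},
}


def run_check(jobs: Dict[str, dict], would_pass: Dict[str, bool]) -> Dict[str, str]:
    """Return SUCCESS, FAILURE or SKIPPED for every job in the graph."""
    status: Dict[str, str] = {}
    for name, spec in jobs.items():
        # A job is skipped as soon as any dependency did not succeed.
        if any(status[dep] != "SUCCESS" for dep in spec["dependencies"]):
            status[name] = "SKIPPED"
        else:
            status[name] = "SUCCESS" if would_pass[name] else "FAILURE"
    return status


def minutes_used(jobs: Dict[str, dict], status: Dict[str, str]) -> int:
    """Sum the durations of the jobs that actually ran."""
    return sum(spec["minutes"] for name, spec in jobs.items()
               if status[name] != "SKIPPED")


# An intermittent issue breaks the base job, so its dependents cannot pass.
would_pass = {"base-job": False, "dependent-a": False, "dependent-b": False}
status = run_check(JOBS, would_pass)

flat_cost = sum(spec["minutes"] for spec in JOBS.values())  # all jobs run
graph_cost = minutes_used(JOBS, status)                     # dependents skipped
rechecks = 5  # hypothetical size of a "recheck storm"
saved = (flat_cost - graph_cost) * rechecks
print(status, flat_cost, graph_cost, saved)
```

With these made-up numbers each failed run costs 80 job-minutes instead of 350, and the difference is multiplied by the number of rechecks. Real savings depend on the actual durations and failure rates, which this sketch does not know.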

    And for the first case, when a patch succeeds, it takes some
    extended time, and that is the price to pay. How much time it takes
    to finish in a pipeline fully depends on implementation.
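
The "price to pay" for a passing patch can be estimated as the longest dependency chain in the job graph. The sketch below is a minimal Python illustration under stated assumptions: the job names are hypothetical, the 80 minute first stage is the figure quoted in this thread, and the 190 minute job is chosen only so that the total lands on the "about 4.5 hours" mentioned here. Queue waiting time is ignored.

```python
from typing import Dict, List


def parallel_duration(job_minutes: Dict[str, int]) -> int:
    """All jobs start at once: the slowest job decides the wall-clock time."""
    return max(job_minutes.values())


def staged_duration(job_minutes: Dict[str, int],
                    dependencies: Dict[str, List[str]]) -> int:
    """Each job starts when its dependencies finish: longest chain wins."""
    finish: Dict[str, int] = {}

    def finish_time(name: str) -> int:
        if name not in finish:
            deps = dependencies.get(name, [])
            start = max((finish_time(dep) for dep in deps), default=0)
            finish[name] = start + job_minutes[name]
        return finish[name]

    return max(finish_time(name) for name in job_minutes)


# Hypothetical durations in minutes; only the 80 comes from the thread.
job_minutes = {"first-stage": 80, "long-job": 190, "short-job": 60}
dependencies = {"long-job": ["first-stage"], "short-job": ["first-stage"]}

flat = parallel_duration(job_minutes)
staged = staged_duration(job_minutes, dependencies)
print(flat, staged, staged - flat)
```

Under these assumptions the staged run takes 270 minutes against 190 minutes for the flat layout, so a passing patch pays the full duration of the first stage as extra time.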

    The effectiveness could only be measured with numbers extracted from
    Elasticsearch data, like the average time a job waits to start,
    success vs fail execution time percentiles for a job, the average
    number of rechecks, recheck storm history, et al. I don't have that
    data and don't know how to get it. Any help with that is much
    appreciated and could really help to move the proposed patches
    forward or decline them. And we could then compare "before" and
    "after" as well.
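
Once such data is extracted, the metrics listed above could be computed roughly as in the Python sketch below. The record layout (field names `change`, `enqueued`, `started`, `finished`, `result`, with times in minutes) is purely an assumption for illustration and is not the schema of the openstack-infra data; the sample records are made up. The median is used as a stand-in for the percentiles.

```python
from statistics import mean, median
from typing import Dict, List

# Made-up sample records; times are minutes since an arbitrary origin.
RECORDS: List[dict] = [
    {"change": "A", "enqueued": 0, "started": 30, "finished": 110, "result": "SUCCESS"},
    {"change": "A", "enqueued": 120, "started": 180, "finished": 220, "result": "FAILURE"},
    {"change": "B", "enqueued": 10, "started": 100, "finished": 200, "result": "SUCCESS"},
]


def summarize(records: List[dict]) -> dict:
    """Compute wait, duration and recheck statistics from job records."""
    waits = [r["started"] - r["enqueued"] for r in records]
    durations: Dict[str, List[int]] = {}
    runs_per_change: Dict[str, int] = {}
    for r in records:
        durations.setdefault(r["result"], []).append(r["finished"] - r["started"])
        runs_per_change[r["change"]] = runs_per_change.get(r["change"], 0) + 1
    return {
        # Average time a job waits in the queue before it starts.
        "mean_wait": mean(waits),
        # Median execution time, split by job result.
        "median_duration": {res: median(d) for res, d in durations.items()},
        # Every run after the first one for a change counts as a recheck.
        "mean_rechecks": mean([n - 1 for n in runs_per_change.values()]),
    }


summary = summarize(RECORDS)
print(summary)
```

Running the same function on data from before and after a pipeline change would give the "before" and "after" comparison.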

    I hope that explains the problem scope and the methodology to
    address that.


    On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:

        An update for your review please folks

            Bogdan Dobrelya <bdobreli at redhat.com> writes:

                Hello.
                As Zuul documentation [0] explains, the names "check",
                "gate", and "post" may be altered for more advanced
                pipelines. Is it doable to introduce, for particular
                openstack projects, multiple check stages/steps as
                check-1, check-2 and so on? And is it possible to make
                the subsequent steps reuse the environments that the
                previous steps finished with?

                Narrowing down to tripleo CI scope, the problem I'd want
                us to solve with this "virtual RFE", using such
                multi-staged check pipelines, is reducing (ideally,
                de-duplicating) some of the common steps for existing CI
                jobs.


            What you're describing sounds more like a job graph within a
            pipeline. See:

            https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies

            for how to configure a job to run only after another job has
            completed. There is also a facility to pass data between
            such jobs.

            ... (skipped) ...

            Creating a job graph to have one job use the results of the
            previous job can make sense in a lot of cases.  It doesn't
            always save *time* however.

            It's worth noting that in OpenStack's Zuul, we have made an
            explicit choice not to have long-running integration jobs
            depend on shorter pep8 or tox jobs, and that's because we
            value developer time more than CPU time.  We would rather
            run all of the tests and return all of the results so a
            developer can fix all of the errors as quickly as possible,
            rather than forcing an iterative workflow where they have to
            fix all the whitespace issues before the CI system will tell
            them which actual tests broke.

            -Jim


        I proposed a few zuul dependencies [0], [1] to tripleo CI
        pipelines for undercloud deployments vs upgrades testing (and
        some more). Given that those undercloud jobs do not have high
        fail rates, though, I think Emilien is right in his comments and
        those would buy us nothing.

        On the other hand, what do you folks think of making
        tripleo-ci-centos-7-3nodes-multinode depend on
        tripleo-ci-centos-7-containers-multinode [2]? The former seems
        quite faily and long running, and is non-voting. It deploys (see
        featuresets configs [3]*) 3 nodes in HA fashion. And it seems to
        almost never pass when containers-multinode fails - see the CI
        stats page [4]. I've found only 2 cases there of the opposite
        situation, when containers-multinode fails but 3nodes-multinode
        passes. So cutting off those future failures via the added
        dependency *would* buy us something and allow other jobs to wait
        less to commence, at the reasonable price of a somewhat extended
        run time for the main zuul pipeline. I think it makes sense, and
        that extended CI time will not add so much overhead to the RDO
        CI execution times as to become a problem. WDYT?

        [0] https://review.openstack.org/#/c/568275/
        [1] https://review.openstack.org/#/c/568278/
        [2] https://review.openstack.org/#/c/568326/
        [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
        [4] http://tripleo.org/cistatus.html

        * ignore column 1, it's obsolete; all CI jobs are now using
        configs download AFAICT...
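
Whether the proposed dependency pays off can be checked against the CI history by counting how often the dependent job would have been skipped and how often a skip would have hidden a pass. Below is a minimal Python sketch of that count. The history is hypothetical except for the 2 cases of "parent fails, child passes" mentioned above; the function name and data layout are assumptions, not part of any CI tool.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dependency_payoff(history: List[Tuple[bool, bool]]) -> Dict[str, int]:
    """history holds (parent_passed, child_passed) pairs, one per patch run."""
    counts = Counter(history)
    failures_avoided = counts[(False, False)]  # child would have failed anyway
    passes_lost = counts[(False, True)]        # child would have passed
    return {
        # With the dependency, the child is skipped whenever the parent fails.
        "runs_skipped": failures_avoided + passes_lost,
        "failures_avoided": failures_avoided,
        "passes_lost": passes_lost,
    }


# Hypothetical history; only the 2 (False, True) cases come from the thread.
history = (
    [(True, True)] * 10
    + [(True, False)] * 5
    + [(False, False)] * 8
    + [(False, True)] * 2
)

payoff = dependency_payoff(history)
print(payoff)
```

With these made-up counts, 10 runs of the long job would be skipped at the cost of hiding 2 passes. Since the job in question is non-voting, those hidden passes would not change a patch's verdict.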

--Best regards,

    Bogdan Dobrelya,
    Irc #bogdando

    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
    openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




--
Best regards
Sagi Shnaidman


--
Best regards,
Bogdan Dobrelya,
Irc #bogdando
