Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-06-05 Thread Bogdan Dobrelya
The proposed undercloud installation job dependency [0] worked; the job 
start times in [1] and [2] confirm that. The resulting delay for the full 
pipeline is ~80 minutes, as expected. So PTAL folks, I propose trying it 
out in real gating to see how the tripleo Zuul queue gets relieved.


The remaining patch [3], which adds a dependency on the tox/linting job, 
didn't work; I'll need some help, please, to figure out why.


Thank you, Tristan, James, and all you folks, for helping!

[0] https://review.openstack.org/#/c/568536/
[1] 
http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-undercloud-containers/cfebec0/ara-report/
[2] 
http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-containers-multinode/1a211bb/ara-report/

[3] https://review.openstack.org/#/c/568543/




Perhaps this has something to do with job evaluation order; it may be
worth trying to add the dependencies list in the project-templates, as
is done here, for example:
http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799

It is also easier to read dependencies from the pipeline definitions, IMO.
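For illustration, such a dependency declared directly in a project's check pipeline stanza could look roughly like the sketch below. The project name chosen here is an assumption for illustration; the job names are the tripleo jobs discussed in this thread, and the exact syntax is defined in the Zuul v3 configuration docs:

```yaml
# Hypothetical sketch: declaring the job dependency in the project's
# check pipeline instead of inside a shared project-template.
- project:
    name: openstack/tripleo-heat-templates
    check:
      jobs:
        - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-containers-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
```

With a stanza like this, the containers-multinode job would only start once the undercloud job succeeds, which is the queue-relief effect being proposed here.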

-Tristan



--
Best regards,
Bogdan Dobrelya,
IRC: #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-28 Thread Bogdan Dobrelya

On 5/28/18 11:43 AM, Bogdan Dobrelya wrote:

On 5/25/18 6:40 PM, Tristan Cacqueray wrote:

Hello Bogdan,

Perhaps this has something to do with job evaluation order; it may be
worth trying to add the dependencies list in the project-templates, as
is done here, for example:
http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799

It is also easier to read dependencies from the pipeline definitions, IMO.


Thank you!
It seems that in most places tripleo uses pre-defined templates, see 
[0]. And templates cannot import dependencies [1] :(


Here is a zuul story for that [2]

[2] https://storyboard.openstack.org/#!/story/2002113



[0] 
http://codesearch.openstack.org/?q=-%20project%3A=nope==tripleo-ci,tripleo-common,tripleo-common-tempest-plugin,tripleo-docs,tripleo-ha-utils,tripleo-heat-templates,tripleo-image-elements,tripleo-ipsec,tripleo-puppet-elements,tripleo-quickstart,tripleo-quickstart-extras,tripleo-repos,tripleo-specs,tripleo-ui,tripleo-upgrade,tripleo-validations 



[1] https://review.openstack.org/#/c/568536/4



-Tristan

On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote:
Job dependencies seem to be ignored by Zuul; see jobs [0], [1], [2] started 
simultaneously, while I expected them to run one after another. According 
to patch 568536 [3], [1] is a dependency for [2] and [3].


The same can be observed for the remaining patches in the topic [4].
Is that a bug, or have I misunderstood what Zuul job dependencies actually do?

[0] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ 

[1] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ 

[2] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ 


[3] https://review.openstack.org/#/c/568536/
[4] 
https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) 











Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-25 Thread Tristan Cacqueray

Hello Bogdan,

Perhaps this has something to do with job evaluation order; it may be
worth trying to add the dependencies list in the project-templates, as
is done here, for example:
http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799

It is also easier to read dependencies from the pipeline definitions, IMO.

-Tristan

On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote:
Job dependencies seem to be ignored by Zuul; see jobs [0], [1], [2] started 
simultaneously, while I expected them to run one after another. According 
to patch 568536 [3], [1] is a dependency for [2] and [3].


The same can be observed for the remaining patches in the topic [4].
Is that a bug, or have I misunderstood what Zuul job dependencies actually do?

[0] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/
[1] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/
[2] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/

[3] https://review.openstack.org/#/c/568536/
[4] 
https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged)


On 5/15/18 11:39 AM, Bogdan Dobrelya wrote:

Added a few more patches [0], [1] based on the discussion results. PTAL, 
folks. As for the remaining ones in the topic, I'd propose giving them a 
try and reverting if they prove to do more harm than good.

Thank you for feedback!

The next step could be reusing artifacts, like DLRN repos and containers 
built for patches and the hosted undercloud, in the subsequent pipelined 
jobs. But I'm not sure how to even approach that.


[0] https://review.openstack.org/#/c/568536/
[1] https://review.openstack.org/#/c/568543/

On 5/15/18 10:54 AM, Bogdan Dobrelya wrote:

On 5/14/18 10:06 PM, Alex Schultz wrote:
On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya wrote:

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As Zuul documentation [0] explains, the names "check", "gate", and
"post"  may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the subsequent steps reuse the environments that the previous steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps for
existing CI jobs.



What you're describing sounds more like a job graph within a pipeline. See:
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.

There is also a facility to pass data between such jobs.
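As a rough illustration of that facility: Zuul jobs can publish data with the `zuul_return` Ansible module, which dependent jobs then see among their inherited variables. The playbook below is only a sketch; the `registry_url` key and URL are made up for illustration:

```yaml
# Sketch of a parent job's post-run playbook returning data to Zuul so
# that jobs depending on it can consume the value.
- hosts: localhost
  tasks:
    - name: Pass the (hypothetical) registry location to dependent jobs
      zuul_return:
        data:
          registry_url: "https://registry.example.test:5000"
```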

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim



I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
undercloud deployments vs upgrades testing (and some more). Given that
those undercloud jobs don't have very high failure rates though, I think
Emilien is right in his comments and those would buy us nothing.

From the other side, what do you think folks of making the
tripleo-ci-centos-7-3nodes-multinode depend on
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
flaky and long-running, and is non-voting. It deploys (see the featureset
configs [3]*) 3 nodes in HA fashion. And it almost never passes when the
containers-multinode job fails - see the CI stats page [4]. I've found
only 2 cases there of the opposite situation, when containers-multinode
fails but 3nodes-multinode passes. So cutting off those future failures
via the added dependency *would* buy us something and allow other jobs
to wait less to commence, at a reasonable price of somewhat extended time
for the main zuul pipeline. I think it makes sense and that the extended
CI time will not push the RDO CI execution times so much as to become a
problem. WDYT?



I'm not sure it makes sense to add a dependency on other deployment
tests. It's going to add additional time to the CI run because the
upgrade won't start until well over an hour after the rest of the


Things are not so simple. There is also a significant 
time-to-wait-in-queue job start delay. And it 

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-25 Thread Bogdan Dobrelya
Job dependencies seem to be ignored by Zuul; see jobs [0], [1], [2] started 
simultaneously, while I expected them to run one after another. According 
to patch 568536 [3], [1] is a dependency for [2] and [3].


The same can be observed for the remaining patches in the topic [4].
Is that a bug, or have I misunderstood what Zuul job dependencies actually do?

[0] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/
[1] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/
[2] 
http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/

[3] https://review.openstack.org/#/c/568536/
[4] 
https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged)


On 5/15/18 11:39 AM, Bogdan Dobrelya wrote:

Added a few more patches [0], [1] based on the discussion results. PTAL, 
folks. As for the remaining ones in the topic, I'd propose giving them a 
try and reverting if they prove to do more harm than good.

Thank you for feedback!

The next step could be reusing artifacts, like DLRN repos and containers 
built for patches and the hosted undercloud, in the subsequent pipelined 
jobs. But I'm not sure how to even approach that.


[0] https://review.openstack.org/#/c/568536/
[1] https://review.openstack.org/#/c/568543/

On 5/15/18 10:54 AM, Bogdan Dobrelya wrote:

On 5/14/18 10:06 PM, Alex Schultz wrote:
On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya wrote:

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As Zuul documentation [0] explains, the names "check", "gate", and
"post"  may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the subsequent steps reuse the environments that the previous steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps for
existing CI jobs.



What you're describing sounds more like a job graph within a pipeline. See:
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.

There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim



I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
undercloud deployments vs upgrades testing (and some more). Given that
those undercloud jobs don't have very high failure rates though, I think
Emilien is right in his comments and those would buy us nothing.

From the other side, what do you think folks of making the
tripleo-ci-centos-7-3nodes-multinode depend on
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
flaky and long-running, and is non-voting. It deploys (see the featureset
configs [3]*) 3 nodes in HA fashion. And it almost never passes when the
containers-multinode job fails - see the CI stats page [4]. I've found
only 2 cases there of the opposite situation, when containers-multinode
fails but 3nodes-multinode passes. So cutting off those future failures
via the added dependency *would* buy us something and allow other jobs
to wait less to commence, at a reasonable price of somewhat extended time
for the main zuul pipeline. I think it makes sense and that the extended
CI time will not push the RDO CI execution times so much as to become a
problem. WDYT?



I'm not sure it makes sense to add a dependency on other deployment
tests. It's going to add additional time to the CI run because the
upgrade won't start until well over an hour after the rest of the


Things are not so simple. There is also a significant 
time-to-wait-in-queue job start delay, and it probably takes even longer 
than the time to execute the jobs. That delay is a function of available 
HW resources and Zuul queue length, and the proposed change affects those 
parameters as well, assuming jobs with failed dependencies won't run at 
all. So we could expect longer execution times compensated by shorter 
wait times! I'm not sure how to estimate that, though. You folks have all 
the numbers and 

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-16 Thread Jeremy Stanley
On 2018-05-16 15:17:52 +0200 (+0200), Bogdan Dobrelya wrote:
[...]
> My understanding, I may be totally wrong, is that unlike
> packages and repos (not counting OSTree [0]), containers use
> layers and can be exported into tarballs with built-in
> de-duplication. This makes the idea of tossing those tarballs
> around much more attractive than doing something similar
> with package repositories.
[...]

Projects which utilize service VMs (e.g. Trove) were asking to do
precisely the same things and had nothing to do with containers. The
idea that you might build a VM image up from proposed source in one
job and then fire several other jobs which used that proposed image
well-predates similar requests from container-oriented projects.
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-16 Thread Bogdan Dobrelya

On 5/16/18 2:17 PM, Jeremy Stanley wrote:

On 2018-05-16 11:31:30 +0200 (+0200), Bogdan Dobrelya wrote:
[...]

I'm pretty sure, though, that with broader container adoption, openstack
infra will catch up eventually, so our upstream CI jobs could all benefit
from affinity-based scheduling and co-located data available for
subsequent build steps.


I still don't see what it has to do with containers. We've known


My understanding, I may be totally wrong, is that unlike packages and 
repos (not counting OSTree [0]), containers use layers and can be 
exported into tarballs with built-in de-duplication. This makes the idea 
of tossing those tarballs around much more attractive than doing 
something similar with package repositories. Of course, container images 
can be pre-built into nodepool images, just like packages, so CI users 
can rebuild on top with fewer changes brought into new layers, which is 
another nice-to-have option, by the way.


[0] https://rpm-ostree.readthedocs.io/en/latest/


these were potentially useful features long before
container-oriented projects came into the picture. We simply focused
on implementing other, even more generally-applicable features
first.


Right, I think this only confirms that it *does* have something to do 
with containers, and priorities for containerized use cases will follow 
container adoption trends. For example, if everyone one day suddenly asks 
for nodepool images with the latest kolla containers injected.











Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-16 Thread Jeremy Stanley
On 2018-05-16 11:31:30 +0200 (+0200), Bogdan Dobrelya wrote:
[...]
> I'm pretty sure, though, that with broader container adoption, openstack
> infra will catch up eventually, so our upstream CI jobs could all benefit
> from affinity-based scheduling and co-located data available for
> subsequent build steps.

I still don't see what it has to do with containers. We've known
these were potentially useful features long before
container-oriented projects came into the picture. We simply focused
on implementing other, even more generally-applicable features
first.
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-16 Thread Bogdan Dobrelya

On 5/15/18 10:31 PM, Wesley Hayutin wrote:



On Tue, May 15, 2018 at 1:29 PM James E. Blair wrote:


Jeremy Stanley writes:

 > On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote:
 > [...]
 >> We're also talking about making a new kind of job which can continue to
 >> run after it's "finished" so that you could use it to do something like
 >> host a container registry that's used by other jobs running on the
 >> change.  We don't have that feature yet, but if we did, would you prefer
 >> to use that instead of the intermediate swift storage?
 >
 > If the subsequent jobs depending on that one get nodes allocated
 > from the same provider, that could solve a lot of the potential
 > network performance risks as well.

That's... tricky.  We're *also* looking at affinity for buildsets, and
I'm optimistic we'll end up with something there eventually, but that's
likely to be a more substantive change and probably won't happen as
soon.  I do agree it will be nice, especially for use cases like this.

-Jim



There is a lot here to unpack and discuss, but I really like the ideas 
I'm seeing.
Nice work Bogdan!  I've added it to the tripleo meeting agenda for next 
week so we can continue socializing the idea and get feedback.


Thanks!

https://etherpad.openstack.org/p/tripleo-meeting-items


Thank you for the feedback, folks. There are a lot of technical caveats, 
right. I'm pretty sure, though, that with broader container adoption, 
openstack infra will catch up eventually, so our upstream CI jobs could 
all benefit from affinity-based scheduling and co-located data available 
for subsequent build steps.










Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Jeremy Stanley
On 2018-05-15 14:52:26 -0600 (-0600), Wesley Hayutin wrote:
[...]
> The content would then sync to a swift file server on a central
> point for ALL the openstack providers or it would be sync'd to
> each cloud?
[...]

We haven't previously requested that all the Infra provider donors
support Swift, and even for the ones who do I don't think we can
count on it being available in every region where we run jobs. I
assumed that implementation would be a single (central) Swift tenant
provided by one of our donors who has it, thus the reason for my
performance concerns at "large" artifact sizes.
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Tue, May 15, 2018 at 11:42 AM Jeremy Stanley  wrote:

> On 2018-05-15 17:31:07 +0200 (+0200), Bogdan Dobrelya wrote:
> [...]
> > * upload into a swift container, with an automatic expiration set, the
> > de-duplicated and compressed tarball created with something like:
> >   # docker save $(docker images -q) | gzip -1 > all.tar.gz
> > (I expect it will be something like a 2G file)
> > * something similar for DLRN repos prolly, I'm not an expert for this
> part.
> >
> > Then those stored artifacts to be picked up by the next step in the
> graph,
> > deploying undercloud and overcloud in the single step, like:
> > * fetch the swift containers with repos and container images
> [...]
>
> I do worry a little about network fragility here, as well as
> extremely variable performance. Randomly-selected job nodes could be
> shuffling those files halfway across the globe so either upload or
> download (or both) will experience high round-trip latency as well
> as potentially constrained throughput, packet loss,
> disconnects/interruptions and so on... all the things we deal with
> when trying to rely on the Internet, except magnified by the
> quantity of data being transferred about.
>
> Ultimately still worth trying, I think, but just keep in mind it may
> introduce more issues than it solves.
> --
> Jeremy Stanley
>
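The tarball-plus-expiration idea quoted above could be sketched as follows. The Swift container name, expiry, and filenames are assumptions for illustration; Swift's `X-Delete-After` header provides the automatic expiration mentioned in the proposal:

```shell
# Export all local images as one de-duplicated, lightly compressed tarball
# (gzip -1 favors speed over compression ratio, as in the quoted proposal).
docker save $(docker images -q) | gzip -1 > all.tar.gz

# Upload to a Swift container with automatic expiration after 24 hours,
# so stale artifacts clean themselves up.
swift upload --header "X-Delete-After: 86400" ci-artifacts all.tar.gz
```

A dependent job would then download and `docker load` the tarball instead of rebuilding the images itself.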

Question...   If we were to build or update the containers that need an
update (and I'm assuming the overcloud images here as well) in a parent job,
would the content then sync to a swift file server on a central point for
ALL the openstack providers, or would it be sync'd to each cloud?

Not to throw too much cold water on the idea, but...
I wonder if the time to upload and download the containers and images would
significantly reduce any advantage this process has.

Although centralizing the container updates and images on a per-check-job
basis sounds attractive, I get the sense we need to be very careful and
fully vet the idea.  At the moment it's also an optimization (maybe), so
I don't see this as a very high priority atm.

Let's bring the discussion to the tripleo meeting next week.  Thanks all!





Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Mon, May 14, 2018 at 3:16 PM Sagi Shnaidman  wrote:

> Hi, Bogdan
>
> I like the idea with the undercloud job. Actually, if the undercloud
> fails, I'd stop all other jobs, because it doesn't make sense to run them.
> Seeing the same failure in 10 jobs doesn't add much. So maybe adding the
> undercloud job as a dependency for all multinode jobs would be a great
> idea. I think it's also worth checking how long it will delay jobs. Will
> all jobs wait until the undercloud job has run? Or will they be aborted
> when the undercloud job fails?
>
> However, I'm very skeptical about the multinode containers and scenario
> jobs; they can fail for very different reasons, like race conditions in
> the product or infra issues. Skipping some of them will lead to more
> rechecks from devs trying to discover all problems in a row, which will
> delay the development process significantly.
>
> Thanks
>

I agree on both counts w/ Sagi here.
Thanks Sagi

>
>
> On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya 
> wrote:
>
>> An update for your review please folks
>>
>> Bogdan Dobrelya  writes:
>>>
>>> Hello.
 As Zuul documentation [0] explains, the names "check", "gate", and
 "post"  may be altered for more advanced pipelines. Is it doable to
 introduce, for particular openstack projects, multiple check
 stages/steps as check-1, check-2 and so on? And is it possible to make
 the subsequent steps reuse the environments that the previous steps
 finished with?

 Narrowing down to tripleo CI scope, the problem I'd want us to solve
 with this "virtual RFE", and using such multi-staged check pipelines,
 is reducing (ideally, de-duplicating) some of the common steps for
 existing CI jobs.

>>>
>>> What you're describing sounds more like a job graph within a pipeline.
>>> See:
>>> https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>>> for how to configure a job to run only after another job has completed.
>>> There is also a facility to pass data between such jobs.
>>>
>>> ... (skipped) ...
>>>
>>> Creating a job graph to have one job use the results of the previous job
>>> can make sense in a lot of cases.  It doesn't always save *time*
>>> however.
>>>
>>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>>> choice not to have long-running integration jobs depend on shorter pep8
>>> or tox jobs, and that's because we value developer time more than CPU
>>> time.  We would rather run all of the tests and return all of the
>>> results so a developer can fix all of the errors as quickly as possible,
>>> rather than forcing an iterative workflow where they have to fix all the
>>> whitespace issues before the CI system will tell them which actual tests
>>> broke.
>>>
>>> -Jim
>>>
>>
>> I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
>> undercloud deployments vs upgrades testing (and some more). Given that
>> those undercloud jobs don't have very high failure rates though, I think Emilien
>> is right in his comments and those would buy us nothing.
>>
>> From the other side, what do you think folks of making the
>> tripleo-ci-centos-7-3nodes-multinode depend on
>> tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky
>> and long-running, and is non-voting. It deploys (see the featureset configs
>> [3]*) 3 nodes in HA fashion. And it almost never passes when the
>> containers-multinode job fails - see the CI stats page [4]. I've found only 2
>> cases there of the opposite situation, when containers-multinode fails
>> but 3nodes-multinode passes. So cutting off those future failures via the
>> added dependency, *would* buy us something and allow other jobs to wait
>> less to commence, at a reasonable price of somewhat extended time for the
>> main zuul pipeline. I think it makes sense and that the extended CI time
>> will not push the RDO CI execution times so much as to become a problem. WDYT?
>>
>> [0] https://review.openstack.org/#/c/568275/
>> [1] https://review.openstack.org/#/c/568278/
>> [2] https://review.openstack.org/#/c/568326/
>> [3]
>> https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
>> [4] http://tripleo.org/cistatus.html
>>
>> * ignore column 1, it's obsolete; all CI jobs now use the configs
>> download AFAICT...
>>
>> --
>> Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Tue, May 15, 2018 at 1:29 PM James E. Blair  wrote:

> Jeremy Stanley  writes:
>
> > On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote:
> > [...]
> >> We're also talking about making a new kind of job which can continue to
> >> run after it's "finished" so that you could use it to do something like
> >> host a container registry that's used by other jobs running on the
> >> change.  We don't have that feature yet, but if we did, would you prefer
> >> to use that instead of the intermediate swift storage?
> >
> > If the subsequent jobs depending on that one get nodes allocated
> > from the same provider, that could solve a lot of the potential
> > network performance risks as well.
>
> That's... tricky.  We're *also* looking at affinity for buildsets, and
> I'm optimistic we'll end up with something there eventually, but that's
> likely to be a more substantive change and probably won't happen as
> soon.  I do agree it will be nice, especially for use cases like this.
>
> -Jim
>


There is a lot here to unpack and discuss, but I really like the ideas I'm
seeing.
Nice work Bogdan!  I've added it to the tripleo meeting agenda for next week
so we can continue socializing the idea and get feedback.

Thanks!

https://etherpad.openstack.org/p/tripleo-meeting-items


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread James E. Blair
Jeremy Stanley  writes:

> On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote:
> [...]
>> We're also talking about making a new kind of job which can continue to
>> run after it's "finished" so that you could use it to do something like
>> host a container registry that's used by other jobs running on the
>> change.  We don't have that feature yet, but if we did, would you prefer
>> to use that instead of the intermediate swift storage?
>
> If the subsequent jobs depending on that one get nodes allocated
> from the same provider, that could solve a lot of the potential
> network performance risks as well.

That's... tricky.  We're *also* looking at affinity for buildsets, and
I'm optimistic we'll end up with something there eventually, but that's
likely to be a more substantive change and probably won't happen as
soon.  I do agree it will be nice, especially for use cases like this.

-Jim



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Jeremy Stanley
On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote:
[...]
> We're also talking about making a new kind of job which can continue to
> run after it's "finished" so that you could use it to do something like
> host a container registry that's used by other jobs running on the
> change.  We don't have that feature yet, but if we did, would you prefer
> to use that instead of the intermediate swift storage?

If the subsequent jobs depending on that one get nodes allocated
from the same provider, that could solve a lot of the potential
network performance risks as well.
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread James E. Blair
Bogdan Dobrelya  writes:

> * check out testing depends-on things,

(Zuul should have done this for you, but yes.)

> * build repos and all tripleo docker images from these repos,
> * upload into a swift container, with an automatic expiration set, the
> de-duplicated and compressed tarball created with something like:
>   # docker save $(docker images -q) | gzip -1 > all.tar.gz
> (I expect it will be something like a 2G file)
> * something similar for DLRN repos, probably; I'm not an expert on this part.
>
> Then those stored artifacts to be picked up by the next step in the
> graph, deploying undercloud and overcloud in the single step, like:
> * fetch the swift containers with repos and container images
> * docker load -i all.tar.xz
> * populate images into a local registry, as usual
> * something similar for the repos. Includes an offline yum update (we
> already have a compressed repo, right? profit!)
> * deploy UC
> * deploy OC, if a job wants it
>
> And if OC deployment brought into a separate step, we do not need
> local registries, just 'docker load -i all.tar.xz' issued for
> overcloud nodes should replace image prep workflows and registries,
> AFAICT. Not sure with the repos for that case.
>
> I wish to assist with the upstream infra swift setup for tripleo, and
> that plan, just need a blessing and more hands from tripleo CI squad
> ;)

That sounds about right (at least the Zuul parts :).

We're also talking about making a new kind of job which can continue to
run after it's "finished" so that you could use it to do something like
host a container registry that's used by other jobs running on the
change.  We don't have that feature yet, but if we did, would you prefer
to use that instead of the intermediate swift storage?

-Jim



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

On 5/15/18 5:08 PM, Sagi Shnaidman wrote:

Bogdan,

I think before final decisions we need to know exactly what price we 
need to pay. Without exact numbers it will be difficult to discuss.
If we need to wait 80 minutes for the undercloud-containers job to finish 
before starting all other jobs, it will be about 4.5 hours to wait for results 
(+ 4.5 hours in gate), which is too big a price imho and isn't worth the 
effort.


What are the exact numbers we are talking about?


I fully agree, but I don't have those numbers, sorry! As I noted above, 
they are definitely sitting in openstack-infra's Elasticsearch DB; they 
just need to be extracted with some assistance from folks who know more 
about that!




Thanks


On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya wrote:


Let me clarify the problem I want to solve with pipelines.

It is getting *hard* to develop things and move patches to the Happy
End (merged):
- Patches wait too long for CI jobs to start. It should be minutes
and not hours of waiting.
- If a patch fails a job w/o a good reason, the consequent recheck
operation repeats the waiting all over again.

How pipelines may help solve it?
Pipelines only alleviate, not solve the problem of waiting. We only
want to build pipelines for the main zuul check process, omitting
gating and RDO CI (for now).

There are two cases to consider:
- A patch succeeds all checks
- A patch fails a check with dependencies

The latter cases benefit us the most, when pipelines are designed
like it is proposed here. So that any jobs expected to fail, when a
dependency fails, will be omitted from execution. This saves HW
resources and zuul queue places a lot, making it available for other
patches and allowing those to have CI jobs started faster (less
waiting!). When we have "recheck storms", like because of some known
intermittent side issue, that outcome is multiplied by the recheck
storm um... level, and delivers even better and absolutely amazing
results :) Zuul queue will not be growing insanely getting
overwhelmed by multiple clones of the rechecked jobs highly likely
doomed to fail, and blocking other patches that might have chances
to pass checks as unaffected by that intermittent issue.

And for the first case, when a patch succeeds, it takes some
extended time, and that is the price to pay. How much time it takes
to finish in a pipeline fully depends on implementation.

The effectiveness could only be measured with numbers extracted from
elastic search data, like average time to wait for a job to start,
success vs fail execution time percentiles for a job, average amount
of rechecks, recheck storms history et al. I don't have that data
and don't know how to get it. Any help with that is very appreciated
and could really help to move the proposed patches forward or
decline it. And we could then compare "before" and "after" as well.

I hope that explains the problem scope and the methodology to
address that.


On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:

An update for your review please folks

Bogdan Dobrelya  writes:

Hello.
As Zuul documentation [0] explains, the names "check",
"gate", and
"post"  may be altered for more advanced pipelines. Is
it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it
possible to make
the consequent steps reusing environments from the
previous steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want
us to solve
with this "virtual RFE", and using such multi-staged
check pipelines,
is reducing (ideally, de-duplicating) some of the common
steps for
existing CI jobs.


What you're describing sounds more like a job graph within a
pipeline.
See:

https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies



for how to configure a job to run only after another job has
completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the
previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an
explicit
choice not to have 

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Jeremy Stanley
On 2018-05-15 17:31:07 +0200 (+0200), Bogdan Dobrelya wrote:
[...]
> * upload into a swift container, with an automatic expiration set, the
> de-duplicated and compressed tarball created with something like:
>   # docker save $(docker images -q) | gzip -1 > all.tar.gz
> (I expect it will be something like a 2G file)
> * something similar for DLRN repos prolly, I'm not an expert for this part.
> 
> Then those stored artifacts to be picked up by the next step in the graph,
> deploying undercloud and overcloud in the single step, like:
> * fetch the swift containers with repos and container images
[...]

I do worry a little about network fragility here, as well as
extremely variable performance. Randomly-selected job nodes could be
shuffling those files halfway across the globe so either upload or
download (or both) will experience high round-trip latency as well
as potentially constrained throughput, packet loss,
disconnects/interruptions and so on... all the things we deal with
when trying to rely on the Internet, except magnified by the
quantity of data being transferred about.

Ultimately still worth trying, I think, but just keep in mind it may
introduce more issues than it solves.
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

On 5/15/18 4:30 PM, James E. Blair wrote:

Bogdan Dobrelya  writes:


Added a few more patches [0], [1] by the discussion results. PTAL folks.
Wrt remaining in the topic, I'd propose to give it a try and revert
it, if it proved to be worse than better.
Thank you for feedback!

The next step could be reusing artifacts, like DLRN repos and
containers built for patches and hosted undercloud, in the consequent
pipelined jobs. But I'm not sure how to even approach that.

[0] https://review.openstack.org/#/c/568536/
[1] https://review.openstack.org/#/c/568543/


In order to use an artifact in a dependent job, you need to store it
somewhere and retrieve it.

In the parent job, I'd recommend storing the artifact on the log server
(in an "artifacts/" directory) next to the job's logs.  The log server
is essentially a time-limited artifact repository keyed on the zuul
build UUID.

Pass the URL to the child job using the zuul_return Ansible module.

Have the child job fetch it from the log server using the URL it gets.

However, don't do that if the artifacts are very large -- more than a
few MB -- we'll end up running out of space quickly.

In that case, please volunteer some time to help the infra team set up a
swift container to store these artifacts.  We don't need to *run*
swift -- we have clouds with swift already.  We just need some help
setting up accounts, secrets, and Ansible roles to use it from Zuul.


Thank you, that's a good proposal! So when we have done that upstream 
infra swift setup for tripleo, the 1st step in the job dependency graph 
may be using quickstart to do something like:


* check out testing depends-on things,
* build repos and all tripleo docker images from these repos,
* upload into a swift container, with an automatic expiration set, the 
de-duplicated and compressed tarball created with something like:

  # docker save $(docker images -q) | gzip -1 > all.tar.gz
(I expect it will be something like a 2G file)
* something similar for DLRN repos, probably; I'm not an expert on this part.

Then those stored artifacts to be picked up by the next step in the 
graph, deploying undercloud and overcloud in the single step, like:

* fetch the swift containers with repos and container images
* docker load -i all.tar.xz
* populate images into a local registry, as usual
* something similar for the repos. Includes an offline yum update (we 
already have a compressed repo, right? profit!)

* deploy UC
* deploy OC, if a job wants it

And if OC deployment brought into a separate step, we do not need local 
registries, just 'docker load -i all.tar.xz' issued for overcloud nodes 
should replace image prep workflows and registries, AFAICT. Not sure 
with the repos for that case.
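The consuming side of that plan could be sketched as pre-run playbook tasks; to be clear, everything here is illustrative (the artifact_base_url variable, file names and URL layout are assumptions for the sketch, not existing job variables), only the docker commands mirror the plan described above:

```yaml
# Hypothetical pre-run tasks for a consumer job in the dependency graph.
- hosts: undercloud
  tasks:
    - name: Fetch the image tarball published by the build stage
      get_url:
        url: "{{ artifact_base_url }}/artifacts/all.tar.gz"
        dest: /tmp/all.tar.gz

    - name: Load all container images into the local docker daemon
      command: docker load -i /tmp/all.tar.gz

    - name: Fetch the compressed DLRN repo the same way
      get_url:
        url: "{{ artifact_base_url }}/artifacts/delorean-repo.tar.gz"
        dest: /tmp/delorean-repo.tar.gz
```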


I wish to assist with the upstream infra swift setup for tripleo, and 
that plan, just need a blessing and more hands from tripleo CI squad ;)




-Jim





--
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Sagi Shnaidman
Bogdan,

I think before final decisions we need to know exactly what price we
need to pay. Without exact numbers it will be difficult to discuss.
If we need to wait 80 minutes for the undercloud-containers job to finish
before starting all other jobs, it will be about 4.5 hours to wait for results (+
4.5 hours in gate), which is too big a price imho and isn't worth the effort.

What are the exact numbers we are talking about?

Thanks


On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya 
wrote:

> Let me clarify the problem I want to solve with pipelines.
>
> It is getting *hard* to develop things and move patches to the Happy End
> (merged):
> - Patches wait too long for CI jobs to start. It should be minutes and not
> hours of waiting.
> - If a patch fails a job w/o a good reason, the consequent recheck
> operation repeats the waiting all over again.
>
> How pipelines may help solve it?
> Pipelines only alleviate, not solve the problem of waiting. We only want
> to build pipelines for the main zuul check process, omitting gating and RDO
> CI (for now).
>
> There are two cases to consider:
> - A patch succeeds all checks
> - A patch fails a check with dependencies
>
> The latter cases benefit us the most, when pipelines are designed like it
> is proposed here. So that any jobs expected to fail, when a dependency
> fails, will be omitted from execution. This saves HW resources and zuul
> queue places a lot, making it available for other patches and allowing
> those to have CI jobs started faster (less waiting!). When we have "recheck
> storms", like because of some known intermittent side issue, that outcome
> is multiplied by the recheck storm um... level, and delivers even better
> and absolutely amazing results :) Zuul queue will not be growing insanely
> getting overwhelmed by multiple clones of the rechecked jobs highly likely
> doomed to fail, and blocking other patches that might have chances to pass
> checks as unaffected by that intermittent issue.
>
> And for the first case, when a patch succeeds, it takes some extended
> time, and that is the price to pay. How much time it takes to finish in a
> pipeline fully depends on implementation.
>
> The effectiveness could only be measured with numbers extracted from
> elastic search data, like average time to wait for a job to start, success
> vs fail execution time percentiles for a job, average amount of rechecks,
> recheck storms history et al. I don't have that data and don't know how to
> get it. Any help with that is very appreciated and could really help to
> move the proposed patches forward or decline it. And we could then compare
> "before" and "after" as well.
>
> I hope that explains the problem scope and the methodology to address that.
>
>
> On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:
>
>> An update for your review please folks
>>
>> Bogdan Dobrelya  writes:
>>>
>>> Hello.
>>>> As Zuul documentation [0] explains, the names "check", "gate", and
>>>> "post"  may be altered for more advanced pipelines. Is it doable to
>>>> introduce, for particular openstack projects, multiple check
>>>> stages/steps as check-1, check-2 and so on? And is it possible to make
>>>> the consequent steps reusing environments from the previous steps
>>>> finished with?
>>>>
>>>> Narrowing down to tripleo CI scope, the problem I'd want us to solve
>>>> with this "virtual RFE", and using such multi-staged check pipelines,
>>>> is reducing (ideally, de-duplicating) some of the common steps for
>>>> existing CI jobs.

>>>
>>> What you're describing sounds more like a job graph within a pipeline.
>>> See: https://docs.openstack.org/infra/zuul/user/config.html#attr-
>>> job.dependencies
>>> for how to configure a job to run only after another job has completed.
>>> There is also a facility to pass data between such jobs.
>>>
>>> ... (skipped) ...
>>>
>>> Creating a job graph to have one job use the results of the previous job
>>> can make sense in a lot of cases.  It doesn't always save *time*
>>> however.
>>>
>>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>>> choice not to have long-running integration jobs depend on shorter pep8
>>> or tox jobs, and that's because we value developer time more than CPU
>>> time.  We would rather run all of the tests and return all of the
>>> results so a developer can fix all of the errors as quickly as possible,
>>> rather than forcing an iterative workflow where they have to fix all the
>>> whitespace issues before the CI system will tell them which actual tests
>>> broke.
>>>
>>> -Jim
>>>
>>
>> I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
>> undercloud deployments vs upgrades testing (and some more). Given that
>> those undercloud jobs have not so high fail rates though, I think Emilien
>> is right in his comments and those would buy us nothing.
>>
>>  From the other side, what do you think folks of making the
>> tripleo-ci-centos-7-3nodes-multinode depend on
>> 

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread James E. Blair
Bogdan Dobrelya  writes:

> Added a few more patches [0], [1] by the discussion results. PTAL folks.
> Wrt remaining in the topic, I'd propose to give it a try and revert
> it, if it proved to be worse than better.
> Thank you for feedback!
>
> The next step could be reusing artifacts, like DLRN repos and
> containers built for patches and hosted undercloud, in the consequent
> pipelined jobs. But I'm not sure how to even approach that.
>
> [0] https://review.openstack.org/#/c/568536/
> [1] https://review.openstack.org/#/c/568543/

In order to use an artifact in a dependent job, you need to store it
somewhere and retrieve it.

In the parent job, I'd recommend storing the artifact on the log server
(in an "artifacts/" directory) next to the job's logs.  The log server
is essentially a time-limited artifact repository keyed on the zuul
build UUID.

Pass the URL to the child job using the zuul_return Ansible module.

Have the child job fetch it from the log server using the URL it gets.

However, don't do that if the artifacts are very large -- more than a
few MB -- we'll end up running out of space quickly.

In that case, please volunteer some time to help the infra team set up a
swift container to store these artifacts.  We don't need to *run*
swift -- we have clouds with swift already.  We just need some help
setting up accounts, secrets, and Ansible roles to use it from Zuul.
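A minimal sketch of that hand-off could look like the following; the artifact_url variable name and URL layout are assumptions for illustration, only the zuul_return module itself is real Zuul API:

```yaml
# In the parent job's post-run playbook: publish the artifact location
# so dependent jobs can see it.
- hosts: localhost
  tasks:
    - name: Pass the artifact URL to dependent jobs
      zuul_return:
        data:
          artifact_url: "{{ log_url }}/artifacts/all.tar.gz"

# In the child job: the returned data is exposed as an Ansible variable,
# so the artifact can be fetched directly from the log server.
- hosts: all
  tasks:
    - name: Retrieve the artifact published by the parent job
      get_url:
        url: "{{ artifact_url }}"
        dest: /tmp/all.tar.gz
```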

-Jim



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Jeremy Stanley
On 2018-05-15 15:22:14 +0200 (+0200), Bogdan Dobrelya wrote:
[...]
> I mean pipelines as jobs executed in batches, ordered via defined
> dependencies, like gitlab pipelines [0]. And those batches can
> also be thought of as steps, or whatever we call that.
[...]

Got it. So Zuul refers to that relationship as a job dependency:

https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies

To be clearer, you might refer to this as dependent job ordering or
a job dependency graph.
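As a concrete (hedged) illustration of such a job dependency graph, a project-pipeline snippet could look like this; the job names are the TripleO check jobs discussed in this thread, but the dependency layout sketches the proposal under discussion, not current configuration:

```yaml
# Sketch of a check pipeline with a job dependency graph: the multinode
# jobs only start once the cheaper undercloud job has succeeded.
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-containers-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```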
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

On 5/15/18 2:30 PM, Jeremy Stanley wrote:

On 2018-05-15 14:07:56 +0200 (+0200), Bogdan Dobrelya wrote:
[...]

How pipelines may help solve it?
Pipelines only alleviate, not solve the problem of waiting. We only want to
build pipelines for the main zuul check process, omitting gating and RDO CI
(for now).

There are two cases to consider:
- A patch succeeds all checks
- A patch fails a check with dependencies

The latter cases benefit us the most, when pipelines are designed like it is
proposed here. So that any jobs expected to fail, when a dependency fails,
will be omitted from execution.

[...]

Your choice of terminology is making it hard to follow this
proposal. You seem to mean something other than
https://zuul-ci.org/docs/zuul/user/config.html#pipeline when you use
the term "pipeline" (which gets confusing very quickly for anyone
familiar with Zuul configuration concepts).


Indeed, sorry for that confusion. I mean pipelines as jobs executed in 
batches, ordered via defined dependencies, like gitlab pipelines [0]. 
And those batches can also be thought of as steps, or whatever we call that.


[0] https://docs.gitlab.com/ee/ci/pipelines.html








--
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Jeremy Stanley
On 2018-05-15 14:07:56 +0200 (+0200), Bogdan Dobrelya wrote:
[...]
> How pipelines may help solve it?
> Pipelines only alleviate, not solve the problem of waiting. We only want to
> build pipelines for the main zuul check process, omitting gating and RDO CI
> (for now).
> 
> There are two cases to consider:
> - A patch succeeds all checks
> - A patch fails a check with dependencies
> 
> The latter cases benefit us the most, when pipelines are designed like it is
> proposed here. So that any jobs expected to fail, when a dependency fails,
> will be omitted from execution.
[...]

Your choice of terminology is making it hard to follow this
proposal. You seem to mean something other than
https://zuul-ci.org/docs/zuul/user/config.html#pipeline when you use
the term "pipeline" (which gets confusing very quickly for anyone
familiar with Zuul configuration concepts).
-- 
Jeremy Stanley




Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

Let me clarify the problem I want to solve with pipelines.

It is getting *hard* to develop things and move patches to the Happy End 
(merged):
- Patches wait too long for CI jobs to start. It should be minutes and 
not hours of waiting.
- If a patch fails a job w/o a good reason, the consequent recheck 
operation repeats the waiting all over again.


How pipelines may help solve it?
Pipelines only alleviate, not solve the problem of waiting. We only want 
to build pipelines for the main zuul check process, omitting gating and 
RDO CI (for now).


There are two cases to consider:
- A patch succeeds all checks
- A patch fails a check with dependencies

The latter cases benefit us the most, when pipelines are designed like 
it is proposed here. So that any jobs expected to fail, when a 
dependency fails, will be omitted from execution. This saves HW 
resources and zuul queue places a lot, making it available for other 
patches and allowing those to have CI jobs started faster (less 
waiting!). When we have "recheck storms", like because of some known 
intermittent side issue, that outcome is multiplied by the recheck storm 
um... level, and delivers even better and absolutely amazing results :) 
Zuul queue will not be growing insanely getting overwhelmed by multiple 
clones of the rechecked jobs highly likely doomed to fail, and blocking 
other patches that might have chances to pass checks as unaffected by 
that intermittent issue.


And for the first case, when a patch succeeds, it takes some extended 
time, and that is the price to pay. How much time it takes to finish in 
a pipeline fully depends on implementation.


The effectiveness could only be measured with numbers extracted from 
elastic search data, like average time to wait for a job to start, 
success vs fail execution time percentiles for a job, average amount of 
rechecks, recheck storms history et al. I don't have that data and don't 
know how to get it. Any help with that is very appreciated and could 
really help to move the proposed patches forward or decline it. And we 
could then compare "before" and "after" as well.


I hope that explains the problem scope and the methodology to address that.

On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As Zuul documentation [0] explains, the names "check", "gate", and
"post"  may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the consequent steps reusing environments from the previous steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps for
existing CI jobs.


What you're describing sounds more like a job graph within a pipeline.
See: 
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies 


for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim


I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for 
undercloud deployments vs upgrades testing (and some more). Given that 
those undercloud jobs have not so high fail rates though, I think 
Emilien is right in his comments and those would buy us nothing.


 From the other side, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on 
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite 
failure-prone and long running, and is non-voting. It deploys 3 nodes in 
HA fashion (see the featureset configs [3]*). And it seems almost never 
passing when the containers-multinode fails - see the CI stats page 
[4]. I've found only 2 cases there of the opposite situation, when 
containers-multinode fails but 3nodes-multinode passes. So cutting off 
those future failures via the added dependency *would* buy us something 
and allow other jobs to wait less to commence, at the reasonable price of 
a somewhat extended time for the main zuul pipeline. I think it makes 
sense, and that extended CI time will not overrun the RDO CI execution 
times so much as to become a problem. WDYT?



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

Added a few more patches [0], [1] by the discussion results. PTAL folks.
Wrt remaining in the topic, I'd propose to give it a try and revert it, 
if it proved to be worse than better.

Thank you for feedback!

The next step could be reusing artifacts, like DLRN repos and containers 
built for patches and hosted undercloud, in the consequent pipelined 
jobs. But I'm not sure how to even approach that.


[0] https://review.openstack.org/#/c/568536/
[1] https://review.openstack.org/#/c/568543/

On 5/15/18 10:54 AM, Bogdan Dobrelya wrote:

On 5/14/18 10:06 PM, Alex Schultz wrote:
On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya 
 wrote:

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As Zuul documentation [0] explains, the names "check", "gate", and
"post"  may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the consequent steps reusing environments from the previous steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps for
existing CI jobs.



What you're describing sounds more like a job graph within a pipeline.
See:
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies 


for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the previous 
job

can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as 
possible,
rather than forcing an iterative workflow where they have to fix all 
the
whitespace issues before the CI system will tell them which actual 
tests

broke.

-Jim



I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
undercloud deployments vs upgrades testing (and some more). Given that
those undercloud jobs do not have such high fail rates though, I think
Emilien is right in his comments and those would buy us nothing.

On the other hand, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky
and long-running, and is non-voting. It deploys 3 nodes in an HA fashion
(see the featureset configs [3]*). And it almost never passes when
containers-multinode fails - see the CI stats page [4]. I've found only 2
cases there of the opposite situation, where containers-multinode fails
but 3nodes-multinode passes. So cutting off those future failures via the
added dependency *would* buy us something and allow other jobs to wait less
to commence, at the reasonable price of somewhat extending the run time of
the main zuul pipeline. I think it makes sense and that extended CI time
will not exceed the RDO CI execution times so much as to become a problem.
WDYT?



I'm not sure it makes sense to add a dependency on other deployment
tests. It's going to add additional time to the CI run because the
upgrade won't start until well over an hour after the rest of the


Things are not so simple. There is also a significant 
time-to-wait-in-queue delay before jobs start, and it probably takes even 
longer than executing the jobs themselves. That delay is a function of 
available HW resources and zuul queue length, and the proposed change 
affects those parameters as well, assuming jobs with failed dependencies 
won't run at all. So we could expect longer execution times compensated 
by shorter wait times! I'm not sure how to estimate that, though. You 
folks have all the numbers and knowledge, let's use them please.



jobs.  The only thing I could think of where this makes more sense is
to delay the deployment tests until the pep8/unit tests pass.  e.g.
let's not burn resources when the code is bad. There might be
arguments about lack of information from a deployment when developing
things but I would argue that the patch should be vetted properly
first in a local environment before taking CI resources.


I support this idea as well, though I'm sceptical about it getting 
blessed in the end :) I'll add a patch though.




Thanks,
-Alex


[0] https://review.openstack.org/#/c/568275/
[1] https://review.openstack.org/#/c/568278/
[2] https://review.openstack.org/#/c/568326/
[3]
https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html 


[4] http://tripleo.org/cistatus.html

* ignore the column 1, it's obsolete, all CI jobs now using configs 
download AFAICT...

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

On 5/14/18 10:06 PM, Alex Schultz wrote:

On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya  wrote:

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As the Zuul documentation [0] explains, the names "check", "gate", and
"post" may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the subsequent steps reuse environments that the previous steps
finished with?

Narrowing down to the tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps of the
existing CI jobs.



What you're describing sounds more like a job graph within a pipeline.
See:
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim



I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
undercloud deployments vs upgrades testing (and some more). Given that those
undercloud jobs do not have such high fail rates though, I think Emilien is
right in his comments and those would buy us nothing.

On the other hand, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky
and long-running, and is non-voting. It deploys 3 nodes in an HA fashion
(see the featureset configs [3]*). And it almost never passes when
containers-multinode fails - see the CI stats page [4]. I've found only 2
cases there of the opposite situation, where containers-multinode fails
but 3nodes-multinode passes. So cutting off those future failures via the
added dependency *would* buy us something and allow other jobs to wait less
to commence, at the reasonable price of somewhat extending the run time of
the main zuul pipeline. I think it makes sense and that extended CI time
will not exceed the RDO CI execution times so much as to become a problem.
WDYT?



I'm not sure it makes sense to add a dependency on other deployment
tests. It's going to add additional time to the CI run because the
upgrade won't start until well over an hour after the rest of the


Things are not so simple. There is also a significant 
time-to-wait-in-queue delay before jobs start, and it probably takes even 
longer than executing the jobs themselves. That delay is a function of 
available HW resources and zuul queue length, and the proposed change 
affects those parameters as well, assuming jobs with failed dependencies 
won't run at all. So we could expect longer execution times compensated 
by shorter wait times! I'm not sure how to estimate that, though. You 
folks have all the numbers and knowledge, let's use them please.



jobs.  The only thing I could think of where this makes more sense is
to delay the deployment tests until the pep8/unit tests pass.  e.g.
let's not burn resources when the code is bad. There might be
arguments about lack of information from a deployment when developing
things but I would argue that the patch should be vetted properly
first in a local environment before taking CI resources.


I support this idea as well, though I'm sceptical about it getting 
blessed in the end :) I'll add a patch though.




Thanks,
-Alex


[0] https://review.openstack.org/#/c/568275/
[1] https://review.openstack.org/#/c/568278/
[2] https://review.openstack.org/#/c/568326/
[3]
https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
[4] http://tripleo.org/cistatus.html

* ignore the column 1, it's obsolete, all CI jobs now using configs download
AFAICT...

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Bogdan Dobrelya

On 5/14/18 9:15 PM, Sagi Shnaidman wrote:

Hi, Bogdan

I like the idea with the undercloud job. Actually, if the undercloud 
fails, I'd stop all other jobs, because it doesn't make sense to run 
them. Seeing the same failure in 10 jobs doesn't add too much. So maybe 
adding the undercloud job as a dependency for all multinode jobs would 
be a great idea.


I like that idea, I'll add another patch in the topic then.

I think it's also worth checking how long it will delay jobs. Will all 
jobs wait until the undercloud job is running? Or will they be aborted 
when the undercloud job fails?


That is a good question for the openstack-infra folks developing zuul :)
But we could just try it and see how it works; happily, zuul v3 allows 
doing that just in the scope of the proposed patches! My expectation is 
that all jobs get delayed (and I mean the main zuul pipeline execution 
time here) by the average time of the undercloud deploy job, ~80 min, 
which hopefully should not be a big deal given that there is a separate 
RDO CI pipeline running in parallel, which normally *highly likely* 
extends that time anyway :) And given the high chances of the additional 
'recheck rdo' runs we can observe these days for patches on review. I 
wish we could introduce inter-pipeline dependencies (zuul CI <-> RDO CI) 
for those as well...




However I'm very sceptical about the multinode containers and scenarios 
jobs; they could fail for very different reasons, like race 
conditions in the product or infra issues. Skipping some of them will 
lead to more rechecks from devs trying to discover all problems in a 
row, which will delay the development process significantly.


right, I roughly estimated that the delay to the main zuul pipeline 
execution time might be ~2.5h, which is not good. We could live with 
that if it were only ~1h, like it takes for the undercloud containers 
job dependency example.




Thanks


On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya wrote:


An update for your review please folks

Bogdan Dobrelya writes:

Hello.
As Zuul documentation [0] explains, the names "check",
"gate", and
"post"  may be altered for more advanced pipelines. Is it
doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it
possible to make
the consequent steps reusing environments from the previous
steps
finished with?

Narrowing down to tripleo CI scope, the problem I'd want we
to solve
with this "virtual RFE", and using such multi-staged check
pipelines,
is reducing (ideally, de-duplicating) some of the common
steps for
existing CI jobs.


What you're describing sounds more like a job graph within a
pipeline.
See:

https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies


for how to configure a job to run only after another job has
completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the
previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on
shorter pep8
or tox jobs, and that's because we value developer time more
than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as
possible,
rather than forcing an iterative workflow where they have to fix
all the
whitespace issues before the CI system will tell them which
actual tests
broke.

-Jim


I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines
for undercloud deployments vs upgrades testing (and some more).
Given that those undercloud jobs do not have such high fail rates though,
I think Emilien is right in his comments and those would buy us nothing.

On the other hand, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
flaky and long-running, and is non-voting. It deploys 3 nodes in an HA
fashion (see the featureset configs [3]*). And it almost never passes
when containers-multinode fails - see the CI stats page [4]. I've found
only 2 cases there of the opposite situation, where containers-multinode
fails but 3nodes-multinode passes.

Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-14 Thread Alex Schultz
On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya  wrote:
> An update for your review please folks
>
>> Bogdan Dobrelya  writes:
>>
>>> Hello.
>>> As the Zuul documentation [0] explains, the names "check", "gate", and
>>> "post" may be altered for more advanced pipelines. Is it doable to
>>> introduce, for particular openstack projects, multiple check
>>> stages/steps as check-1, check-2 and so on? And is it possible to make
>>> the subsequent steps reuse environments that the previous steps
>>> finished with?
>>>
>>> Narrowing down to the tripleo CI scope, the problem I'd want us to solve
>>> with this "virtual RFE", and using such multi-staged check pipelines,
>>> is reducing (ideally, de-duplicating) some of the common steps of the
>>> existing CI jobs.
>>
>>
>> What you're describing sounds more like a job graph within a pipeline.
>> See:
>> https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>> for how to configure a job to run only after another job has completed.
>> There is also a facility to pass data between such jobs.
>>
>> ... (skipped) ...
>>
>> Creating a job graph to have one job use the results of the previous job
>> can make sense in a lot of cases.  It doesn't always save *time*
>> however.
>>
>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>> choice not to have long-running integration jobs depend on shorter pep8
>> or tox jobs, and that's because we value developer time more than CPU
>> time.  We would rather run all of the tests and return all of the
>> results so a developer can fix all of the errors as quickly as possible,
>> rather than forcing an iterative workflow where they have to fix all the
>> whitespace issues before the CI system will tell them which actual tests
>> broke.
>>
>> -Jim
>
>
> I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
> undercloud deployments vs upgrades testing (and some more). Given that those
> undercloud jobs do not have such high fail rates though, I think Emilien is
> right in his comments and those would buy us nothing.
>
> On the other hand, what do you think, folks, of making
> tripleo-ci-centos-7-3nodes-multinode depend on
> tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky
> and long-running, and is non-voting. It deploys 3 nodes in an HA fashion
> (see the featureset configs [3]*). And it almost never passes when
> containers-multinode fails - see the CI stats page [4]. I've found only 2
> cases there of the opposite situation, where containers-multinode fails,
> but 3nodes-multinode passes. So cutting off those future failures via the
> added dependency *would* buy us something and allow other jobs to wait less
> to commence, at the reasonable price of somewhat extending the run time of
> the main zuul pipeline. I think it makes sense and that extended CI time
> will not exceed the RDO CI execution times so much as to become a problem. WDYT?
>

I'm not sure it makes sense to add a dependency on other deployment
tests. It's going to add additional time to the CI run because the
upgrade won't start until well over an hour after the rest of the
jobs.  The only thing I could think of where this makes more sense is
to delay the deployment tests until the pep8/unit tests pass.  e.g.
let's not burn resources when the code is bad. There might be
arguments about lack of information from a deployment when developing
things but I would argue that the patch should be vetted properly
first in a local environment before taking CI resources.

Thanks,
-Alex

> [0] https://review.openstack.org/#/c/568275/
> [1] https://review.openstack.org/#/c/568278/
> [2] https://review.openstack.org/#/c/568326/
> [3]
> https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
> [4] http://tripleo.org/cistatus.html
>
> * ignore the column 1, it's obsolete, all CI jobs now using configs download
> AFAICT...
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-14 Thread Sagi Shnaidman
Hi, Bogdan

I like the idea with the undercloud job. Actually, if the undercloud fails,
I'd stop all other jobs, because it doesn't make sense to run them. Seeing
the same failure in 10 jobs doesn't add too much. So maybe adding the
undercloud job as a dependency for all multinode jobs would be a great
idea. I think it's also worth checking how long it will delay jobs. Will
all jobs wait until the undercloud job is running? Or will they be aborted
when the undercloud job fails?

However I'm very sceptical about the multinode containers and scenarios
jobs; they could fail for very different reasons, like race conditions in
the product or infra issues. Skipping some of them will lead to more
rechecks from devs trying to discover all problems in a row, which will
delay the development process significantly.

Thanks


On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya 
wrote:

> An update for your review please folks
>
> Bogdan Dobrelya  writes:
>>
>> Hello.
>>> As the Zuul documentation [0] explains, the names "check", "gate", and
>>> "post" may be altered for more advanced pipelines. Is it doable to
>>> introduce, for particular openstack projects, multiple check
>>> stages/steps as check-1, check-2 and so on? And is it possible to make
>>> the subsequent steps reuse environments that the previous steps
>>> finished with?
>>>
>>> Narrowing down to the tripleo CI scope, the problem I'd want us to solve
>>> with this "virtual RFE", and using such multi-staged check pipelines,
>>> is reducing (ideally, de-duplicating) some of the common steps of the
>>> existing CI jobs.
>>>
>>
>> What you're describing sounds more like a job graph within a pipeline.
>> See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>> for how to configure a job to run only after another job has completed.
>> There is also a facility to pass data between such jobs.
>>
>> ... (skipped) ...
>>
>> Creating a job graph to have one job use the results of the previous job
>> can make sense in a lot of cases.  It doesn't always save *time*
>> however.
>>
>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>> choice not to have long-running integration jobs depend on shorter pep8
>> or tox jobs, and that's because we value developer time more than CPU
>> time.  We would rather run all of the tests and return all of the
>> results so a developer can fix all of the errors as quickly as possible,
>> rather than forcing an iterative workflow where they have to fix all the
>> whitespace issues before the CI system will tell them which actual tests
>> broke.
>>
>> -Jim
>>
>
> I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
> undercloud deployments vs upgrades testing (and some more). Given that
> those undercloud jobs do not have such high fail rates though, I think
> Emilien is right in his comments and those would buy us nothing.
>
> On the other hand, what do you think, folks, of making
> tripleo-ci-centos-7-3nodes-multinode depend on
> tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
> flaky and long-running, and is non-voting. It deploys 3 nodes in an HA
> fashion (see the featureset configs [3]*). And it almost never passes
> when containers-multinode fails - see the CI stats page [4]. I've found
> only 2 cases there of the opposite situation, where containers-multinode
> fails but 3nodes-multinode passes. So cutting off those future failures
> via the added dependency *would* buy us something and allow other jobs to
> wait less to commence, at the reasonable price of somewhat extending the
> run time of the main zuul pipeline. I think it makes sense and that
> extended CI time will not exceed the RDO CI execution times so much as to
> become a problem. WDYT?
>
> [0] https://review.openstack.org/#/c/568275/
> [1] https://review.openstack.org/#/c/568278/
> [2] https://review.openstack.org/#/c/568326/
> [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
> [4] http://tripleo.org/cistatus.html
>
> * ignore the column 1, it's obsolete, all CI jobs now using configs
> download AFAICT...
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
>



-- 
Best regards
Sagi Shnaidman


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-14 Thread Bogdan Dobrelya

An update for your review please folks


Bogdan Dobrelya  writes:


Hello.
As the Zuul documentation [0] explains, the names "check", "gate", and
"post" may be altered for more advanced pipelines. Is it doable to
introduce, for particular openstack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the subsequent steps reuse environments that the previous steps
finished with?

Narrowing down to the tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", and using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps of the
existing CI jobs.


What you're describing sounds more like a job graph within a pipeline.
See: 
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim


I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for 
undercloud deployments vs upgrades testing (and some more). Given that 
those undercloud jobs do not have such high fail rates though, I think 
Emilien is right in his comments and those would buy us nothing.

On the other hand, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on 
tripleo-ci-centos-7-containers-multinode [2]? The former seems quite 
flaky and long-running, and is non-voting. It deploys 3 nodes in an HA 
fashion (see the featureset configs [3]*). And it almost never passes 
when containers-multinode fails - see the CI stats page [4]. I've found 
only 2 cases there of the opposite situation, where containers-multinode 
fails but 3nodes-multinode passes. So cutting off those future failures 
via the added dependency *would* buy us something and allow other jobs 
to wait less to commence, at the reasonable price of somewhat extending 
the run time of the main zuul pipeline. I think it makes sense and that 
extended CI time will not exceed the RDO CI execution times so much as 
to become a problem. WDYT?


[0] https://review.openstack.org/#/c/568275/
[1] https://review.openstack.org/#/c/568278/
[2] https://review.openstack.org/#/c/568326/
[3] 
https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html

[4] http://tripleo.org/cistatus.html

* ignore the column 1, it's obsolete, all CI jobs now using configs 
download AFAICT...


--
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-03-02 Thread James E. Blair
Bogdan Dobrelya  writes:

> Hello.
> As the Zuul documentation [0] explains, the names "check", "gate", and
> "post" may be altered for more advanced pipelines. Is it doable to
> introduce, for particular openstack projects, multiple check
> stages/steps as check-1, check-2 and so on? And is it possible to make
> the subsequent steps reuse environments that the previous steps
> finished with?
>
> Narrowing down to the tripleo CI scope, the problem I'd want us to solve
> with this "virtual RFE", and using such multi-staged check pipelines,
> is reducing (ideally, de-duplicating) some of the common steps of the
> existing CI jobs.

What you're describing sounds more like a job graph within a pipeline.
See: 
https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.
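As a rough sketch of that mechanism (the job names and the returned
variable here are made up for illustration, not taken from any real config):

```yaml
# Sketch: child-job runs only after parent-job succeeds; parent-job
# can hand data to dependent jobs via the zuul_return Ansible module.
- job:
    name: parent-job
    run: playbooks/parent.yaml

- job:
    name: child-job
    run: playbooks/child.yaml
    dependencies:
      - parent-job
```

In playbooks/parent.yaml, a task invoking `zuul_return` with a `data:`
dictionary would return data that dependent jobs such as child-job can
then consume.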

> For example, we may want to omit running all of the OVB or multinode
> (non-upgrade) jobs deploying overclouds, when the *undercloud* fails
> to install. This case makes even more sense, when undercloud is
> deployed from the same heat templates (aka Containerized Undercloud)
> and uses the same packages and containers images, as overcloud would
> do! Or, maybe, just stop the world, when tox failed at the step1 and
> do nothing more, as it makes very little sense to run anything else
> (IMO), if the patch can never be gated with a failed tox check
> anyway...
>
> What I propose here, is to think and discuss, and come up with an RFE,
> either for tripleo, or zuul, or both, of the following scenarios
> (examples are tripleo/RDO CI specific, though you can think of other
> use cases ofc!):
>
> case A. No deduplication, simple multi-staged check pipeline:
>
> * check-1: syntax only, lint/tox
> * check-2 : undercloud install with heat and containers
> * check-3 : undercloud install with heat and containers, build
> overcloud images (if not multinode job type), deploy
> overcloud... (repeats OVB jobs as is, basically)
>
> case B. Full de-duplication scenario (consequent steps re-use the
> previous steps results, building "on-top"):
>
> * check-1: syntax only, lint/tox
> * check-2 : undercloud install, reuses nothing from step 1 probably
> * check-3 : build overcloud images, if not multinode job type, extends
> stage 2
> * check-4:  deploy overcloud, extends stages 2/3
> * check-5: upgrade undercloud, extends stage 2
> * check-6: upgrade overcloud, extends stage 4
> (looking into future...)
> * check-7: deploy openshift/k8s on openstack and do e2e/conformance et
> al, extends either stage 4 or 6
>
> I believe even the simplest 'case A' would reduce the zuul queues for
> tripleo CI dramatically. What do you think folks? See also PTG tripleo
> CI notes [1].
>
> [0] https://docs.openstack.org/infra/zuul/user/concepts.html
> [1] https://etherpad.openstack.org/p/tripleo-ptg-ci

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim



[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-03-02 Thread Bogdan Dobrelya

Hello.
As the Zuul documentation [0] explains, the names "check", "gate", and 
"post" may be altered for more advanced pipelines. Is it doable to 
introduce, for particular openstack projects, multiple check 
stages/steps as check-1, check-2 and so on? And is it possible to make 
the subsequent steps reuse environments that the previous steps 
finished with?

Narrowing down to the tripleo CI scope, the problem I'd want us to solve 
with this "virtual RFE", and using such multi-staged check pipelines, is 
reducing (ideally, de-duplicating) some of the common steps of the 
existing CI jobs.


For example, we may want to omit running all of the OVB or multinode 
(non-upgrade) jobs deploying overclouds, when the *undercloud* fails to 
install. This case makes even more sense, when undercloud is deployed 
from the same heat templates (aka Containerized Undercloud) and uses the 
same packages and containers images, as overcloud would do! Or, maybe, 
just stop the world, when tox failed at the step1 and do nothing more, 
as it makes very little sense to run anything else (IMO), if the patch 
can never be gated with a failed tox check anyway...


What I propose here, is to think and discuss, and come up with an RFE, 
either for tripleo, or zuul, or both, of the following scenarios 
(examples are tripleo/RDO CI specific, though you can think of other use 
cases ofc!):


case A. No deduplication, simple multi-staged check pipeline:

* check-1: syntax only, lint/tox
* check-2 : undercloud install with heat and containers
* check-3 : undercloud install with heat and containers, build overcloud 
images (if not multinode job type), deploy overcloud... (repeats OVB 
jobs as is, basically)


case B. Full de-duplication scenario (consequent steps re-use the 
previous steps results, building "on-top"):


* check-1: syntax only, lint/tox
* check-2 : undercloud install, reuses nothing from step 1 probably
* check-3 : build overcloud images, if not multinode job type, extends 
stage 2

* check-4:  deploy overcloud, extends stages 2/3
* check-5: upgrade undercloud, extends stage 2
* check-6: upgrade overcloud, extends stage 4
(looking into future...)
* check-7: deploy openshift/k8s on openstack and do e2e/conformance et 
al, extends either stage 4 or 6
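Mapped onto Zuul job dependencies, even the simplest 'case A' could be
sketched roughly as follows (the job names are illustrative only, not an
actual tripleo-ci layout):

```yaml
# Sketch of 'case A' as a job graph: each stage commences only
# after the previous stage has succeeded.
- project:
    check:
      jobs:
        - openstack-tox-linters                       # check-1: lint/tox
        - tripleo-ci-centos-7-undercloud-containers:  # check-2
            dependencies:
              - openstack-tox-linters
        - tripleo-ci-centos-7-ovb-ha:                 # check-3
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
```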


I believe even the simplest 'case A' would reduce the zuul queues for 
tripleo CI dramatically. What do you think folks? See also PTG tripleo 
CI notes [1].


[0] https://docs.openstack.org/infra/zuul/user/concepts.html
[1] https://etherpad.openstack.org/p/tripleo-ptg-ci

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando
