Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-11-01 Thread Derek Higgins
On Wed, 31 Oct 2018 at 17:22, Alex Schultz  wrote:
>
> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of a
> hindrance than a help at this point.
>
> Thanks,
> -Alex

While on the topic of reducing the CI footprint

something worth considering when pushing up a string of patches would
be to remove a bunch of the check jobs at the start of the series.

e.g. if I'm working on t-h-t and have a series of 10 patches, then while
looking for feedback I could remove most of the jobs from
zuul.d/layout.yaml in patch 1 so that all 10 patches don't run the entire
suite of CI jobs. Once it becomes clear that the series is nearly ready
to merge, I update patch 1 to leave zuul.d/layout.yaml as it is.

I'm not suggesting everybody does this, but anybody who tends to push
up multiple patch sets together could consider it so as not to tie up
resources for hours.

>
> [0] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2] 
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>



Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Ben Nemec



On 10/31/18 4:59 PM, Harald Jensås wrote:

On Wed, 2018-10-31 at 11:39 -0600, Wesley Hayutin wrote:



On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz wrote:

Hey everyone,

Based on previous emails around this[0][1], I have proposed a possible
reduction in our usage by switching the scenario001--011 jobs to
non-voting and removing them from the gate[2]. This will reduce the
likelihood of causing gate resets and hopefully allow us to land
corrective patches sooner.  In terms of risks, there is a risk that we
might introduce breaking changes in the scenarios because they are
officially non-voting, and we will still be gating promotions on these
scenarios.  This means that if they are broken, they will need the
same attention and care to fix them so we should be vigilant when the
jobs are failing.

The hope is that we can switch these scenarios out with voting
standalone versions in the next few weeks, but until then I think we
should proceed by removing them from the gate.  I know this is less
than ideal but as most failures with these jobs in the gate are either
timeouts or unrelated to the changes (or gate queue), they are more of
a hindrance than a help at this point.

Thanks,
-Alex


I think I also have to agree.
Having to deploy with containers, update containers and run with two
nodes is no longer a very viable option upstream.  It's not
impossible but it should be the exception and not the rule for all
our jobs.


AFAICT, in my local environment the container prep stuff takes ages
when adding the playbooks to update the containers with yum. We will
still have to do this for every standalone job, right?



Also, I enabled profiling for ansible tasks on the undercloud and
noticed that the UndercloudPostDeploy was high on the list, actually
the longest running task when re-running the undercloud install ...

Moving from a shell script using the openstack CLI to python reduced the
time for this task dramatically in my environment, see:
https://review.openstack.org/614540. Six and a half minutes reduced to
40 seconds ...


Everything old is new again: 
https://github.com/openstack/instack-undercloud/commit/0eb1b59926c7dc46e321c56db29af95b3d755f34#diff-5602f1b710e86ca1eb7334cb0632f9ee


:-)




How much time would we save in the gates if we converted some of the
shell scripting to python? Or, if we want to stay with shell scripts, we
could use the interactive shell or the client-as-a-service approach[2].

Interactive shell:
time openstack <<-EOC
server list
workflow list
workflow execution list
EOC

real    0m2.852s
time (openstack server list; \
   openstack workflow list; \
   openstack workflow execution list)

real    0m7.119s

The difference is significant.

We could cache a token[1] and specify the endpoint on each command,
but doing so is still far less effective than using the interactive shell.


There is an old thread[2] on the mailing list which contains a
server/client solution. If we run this service on the CI nodes and drop
the replacement openstack command into /usr/local/bin/openstack, we would
use ~1/5 of the time for each command.

(undercloud) [stack@leafs ~]$ time (/usr/bin/openstack network list -f
value -c ID; /usr/bin/openstack network segment list -f value -c ID;
/usr/bin/openstack subnet list -f value -c ID)


real    0m6.443s
user    0m2.171s
sys     0m0.366s

(undercloud) [stack@leafs ~]$ time (/usr/local/bin/openstack network
list -f value -c ID; /usr/local/bin/openstack network segment list -f
value -c ID; /usr/local/bin/openstack subnet list -f value -c ID)

real    0m1.698s
user    0m0.042s
sys     0m0.018s



I realize this is a kind of hacky approach, but it does seem to work and
it should be fairly quick to get in there. (With the undercloud post
script I see 6 minutes returned; what can we get in CI, 10-15 minutes?)
Then we could look at moving these scripts to python, or at using the
ansible openstack modules, which hopefully don't share the same loading
issues as the python clients?


I'm personally a fan of using Python, as it is then unit-testable, but
I'm not sure how that works with the t-h-t-based code, so maybe it's not
a factor.
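
As a rough illustration of that point (hypothetical code, not taken from
the tripleo tree), a helper rewritten against openstacksdk can be
exercised with plain unittest and mock, with no cloud at all:

# Hypothetical helper plus unit test, just to illustrate why a python
# rewrite is easier to test; none of this comes from the tripleo tree.
import unittest
from unittest import mock

import openstack


def network_names(cloud='undercloud'):
    """Return the sorted names of all networks visible on the given cloud."""
    conn = openstack.connect(cloud=cloud)
    return sorted(net.name for net in conn.network.networks())


class TestNetworkNames(unittest.TestCase):
    @mock.patch('openstack.connect')
    def test_network_names(self, mock_connect):
        # No real cloud needed: stub out the connection entirely.
        fake_net = mock.Mock()
        fake_net.name = 'ctlplane'
        mock_connect.return_value.network.networks.return_value = [fake_net]
        self.assertEqual(['ctlplane'], network_names())


if __name__ == '__main__':
    unittest.main()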






[1] https://wiki.openstack.org/wiki/OpenStackClient/Authentication
[2]
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html



Thanks Alex

  

[0]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
[1]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
[2]
https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)




Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Harald Jensås
On Wed, 2018-10-31 at 11:39 -0600, Wesley Hayutin wrote:
> 
> 
> On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz wrote:
> > Hey everyone,
> > 
> > Based on previous emails around this[0][1], I have proposed a possible
> > reduction in our usage by switching the scenario001--011 jobs to
> > non-voting and removing them from the gate[2]. This will reduce the
> > likelihood of causing gate resets and hopefully allow us to land
> > corrective patches sooner.  In terms of risks, there is a risk that we
> > might introduce breaking changes in the scenarios because they are
> > officially non-voting, and we will still be gating promotions on these
> > scenarios.  This means that if they are broken, they will need the
> > same attention and care to fix them so we should be vigilant when the
> > jobs are failing.
> > 
> > The hope is that we can switch these scenarios out with voting
> > standalone versions in the next few weeks, but until then I think we
> > should proceed by removing them from the gate.  I know this is less
> > than ideal but as most failures with these jobs in the gate are either
> > timeouts or unrelated to the changes (or gate queue), they are more of
> > a hindrance than a help at this point.
> > 
> > Thanks,
> > -Alex
> 
> I think I also have to agree.
> Having to deploy with containers, update containers and run with two
> nodes is no longer a very viable option upstream.  It's not
> impossible but it should be the exception and not the rule for all
> our jobs.
> 
AFAICT, in my local environment the container prep stuff takes ages
when adding the playbooks to update the containers with yum. We will
still have to do this for every standalone job, right?



Also, I enabled profiling for ansible tasks on the undercloud and
noticed that the UndercloudPostDeploy was high on the list, actually
the longest running task when re-running the undercloud install ...

Moving from a shell script using the openstack CLI to python reduced the
time for this task dramatically in my environment, see:
https://review.openstack.org/614540. Six and a half minutes reduced to
40 seconds ...
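
For anyone curious, the general shape of that kind of conversion is
sketched below. This is only an illustration, assuming openstacksdk and a
clouds.yaml entry named "undercloud", not the actual patch in 614540; the
point is that we authenticate once and reuse the connection instead of
paying CLI start-up and auth per command:

# Minimal sketch of the idea only -- not the actual change in review 614540.
# Assumes openstacksdk is installed and clouds.yaml has an "undercloud" entry.
import openstack

# Authenticate once; every call below reuses the same session/token
# instead of paying interpreter start-up and re-authentication per command.
conn = openstack.connect(cloud='undercloud')

servers = list(conn.compute.servers())
networks = list(conn.network.networks())
subnets = list(conn.network.subnets())

print('servers:  %s' % [s.name for s in servers])
print('networks: %s' % [n.name for n in networks])
print('subnets:  %s' % [s.name for s in subnets])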


How much time would we save in the gates if we converted some of the
shell scripting to python? Or, if we want to stay with shell scripts, we
could use the interactive shell or the client-as-a-service approach[2].

Interactive shell:
time openstack <<-EOC
server list
workflow list
workflow execution list
EOC

real    0m2.852s
time (openstack server list; \
  openstack workflow list; \
  openstack workflow execution list)

real    0m7.119s

The difference is significant.

We could cache a token[1] and specify the endpoint on each command,
but doing so is still far less effective than using the interactive shell.


There is an old thread[2] on the mailing list which contains a
server/client solution. If we run this service on the CI nodes and drop
the replacement openstack command into /usr/local/bin/openstack, we would
use ~1/5 of the time for each command.

(undercloud) [stack@leafs ~]$ time (/usr/bin/openstack network list -f
value -c ID; /usr/bin/openstack network segment list -f value -c ID;
/usr/bin/openstack subnet list -f value -c ID)


real    0m6.443s
user    0m2.171s
sys     0m0.366s

(undercloud) [stack@leafs ~]$ time (/usr/local/bin/openstack network
list -f value -c ID; /usr/local/bin/openstack network segment list -f
value -c ID; /usr/local/bin/openstack subnet list -f value -c ID)

real    0m1.698s
user    0m0.042s
sys     0m0.018s
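
To make the idea concrete, a very rough sketch of such a daemon follows.
The names, socket path and output handling are made up for illustration
and are not taken from the old thread; the real saving comes from
importing the client libraries once instead of on every command:

# Very rough sketch of the client-as-a-service idea; the names, socket path
# and output handling are made up here, not taken from the 2016 patch.
# The daemon imports the heavy client libraries once; a tiny wrapper
# installed as /usr/local/bin/openstack then sends each command line to it,
# so we stop paying interpreter start-up and import cost per invocation.
import json
import socketserver

import openstackclient.shell  # imported once, when the daemon starts

SOCKET_PATH = '/run/os-client-service.sock'  # hypothetical path


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON-encoded argv list per connection, e.g. ["server", "list"].
        argv = json.loads(self.rfile.readline().decode())
        # Capturing stdout/stderr and shipping it back to the wrapper is
        # omitted for brevity.
        rc = openstackclient.shell.main(argv)
        self.wfile.write(json.dumps({'rc': rc}).encode())


if __name__ == '__main__':
    server = socketserver.UnixStreamServer(SOCKET_PATH, Handler)
    server.serve_forever()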



I realize this is a kind of hacky approach, but it does seem to work and
it should be fairly quick to get in there. (With the undercloud post
script I see 6 minutes returned; what can we get in CI, 10-15 minutes?)
Then we could look at moving these scripts to python, or at using the
ansible openstack modules, which hopefully don't share the same loading
issues as the python clients?



[1] https://wiki.openstack.org/wiki/OpenStackClient/Authentication
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html


> Thanks Alex
> 
>  
> > [0] 
> > http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> > [1] 
> > http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> > [2] 
> > https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
> > 

Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Doug Hellmann
Alex Schultz  writes:

> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of a
> hindrance than a help at this point.
>
> Thanks,
> -Alex
>
> [0] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2] 
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>

This makes a lot of sense as a temporary measure. Thanks for continuing
to drive these changes!

Doug




Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Wesley Hayutin
On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz  wrote:

> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of a
> hindrance than a help at this point.
>
> Thanks,
> -Alex
>

I think I also have to agree.
Having to deploy with containers, update containers and run with two nodes
is no longer a very viable option upstream.  It's not impossible but it
should be the exception and not the rule for all our jobs.

Thanks Alex



>
> [0]
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2]
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>
>
-- 

Wes Hayutin
Associate Manager
Red Hat

whayu...@redhat.com  T: +19194232509  IRC: weshay
