Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-30 Thread Jiří Stránský

On 29.8.2017 17:12, Emilien Macchi wrote:

On Tue, Aug 29, 2017 at 2:14 AM, Jiří Stránský  wrote:
[...]

the CI for containerized deployments with Pacemaker is close! In fact, it
works [1][2] (but there are pending changes to merge).


Really good news, thanks for the update!


The way it's proposed in gerrit currently is to switch the
centos-7-containers-multinode job (featureset010) to deploy with Pacemaker.
What do you think about making this switch as a first step? [...]


I'm ok with the idea


No -1s yet, so I removed the WIP status of [4].


as long as
gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv keeps working
fine.


That's a different featureset, so we can control it independently of 
the basic deployment job. It might be good to switch this one to 
Pacemaker too, if we can solve the current timeout issues and perhaps 
have some spare wall time.


Non-Pacemaker containers are still CI'd by the OVB job, so the upgrade job 
(currently still non-Pacemaker) shouldn't be more vulnerable even if we 
switch the multinode job to Pacemaker.



Deploying Pacemaker on a single-node environment is not optimal, but
it already covers a bunch of code, which is good.


Later it would be nice to get a proper clustering test with 3 controllers.
Should we try and switch the centos-7-ovb-ha-oooq job to deploy containers
on master and stable/pike? (Probably by adding a new job that only runs on
master + Pike, and making the old ovb-ha-oooq only run up to Ocata, to keep
the OVB capacity demands unchanged?) I'd be +1 on that since containers are
the intended way of deploying Pike and beyond. WDYT?


It's actually a good start to our discussion at the PTG:
https://etherpad.openstack.org/p/tripleo-ptg-queens-ci-related-topics
(we have a session on Wednesday morning about CI topics, please make
sure you can join!)

I think in Queens, we'll run container-only jobs, even for OVB.
That said, I think OVB coverage in Queens will be very useful to try
HA with 3 controllers (containerized), while the baremetal services
coverage will only run on Pike, Ocata and Newton.

That way, we would have:

Queens:
- multinode jobs covering the basic HA scenario; single-node, but still
useful to test a good part of the code
- OVB jobs covering a production-like environment, hopefully spotting
issues we wouldn't see with multinode jobs

Pike, Ocata, Newton:
no changes to the OVB job

(note it's a proposal, not a statement)


Yeah, focusing the containerization CI changes mainly on Queens+ 
could be fine too. The frequency of patches going into stable/pike will 
drop as it gains stability, so time spent on CI enhancements 
might indeed be better focused on Queens+. We can always adjust if that 
doesn't prove to be the case.




[...]

[3] https://review.openstack.org/498474


approved

[...]

Thanks,




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Emilien Macchi
On Tue, Aug 29, 2017 at 2:14 AM, Jiří Stránský  wrote:
[...]
> the CI for containerized deployments with Pacemaker is close! In fact, it
> works [1][2] (but there are pending changes to merge).

Really good news, thanks for the update!

> The way it's proposed in gerrit currently is to switch the
> centos-7-containers-multinode job (featureset010) to deploy with Pacemaker.
> What do you think about making this switch as a first step? [...]

I'm ok with the idea as long as
gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv keeps working
fine.
Deploying Pacemaker on a single-node environment is not optimal, but
it already covers a bunch of code, which is good.

> Later it would be nice to get a proper clustering test with 3 controllers.
> Should we try and switch the centos-7-ovb-ha-oooq job to deploy containers
> on master and stable/pike? (Probably by adding a new job that only runs on
> master + Pike, and making the old ovb-ha-oooq only run up to Ocata, to keep
> the OVB capacity demands unchanged?) I'd be +1 on that since containers are
> the intended way of deploying Pike and beyond. WDYT?

It's actually a good start to our discussion at the PTG:
https://etherpad.openstack.org/p/tripleo-ptg-queens-ci-related-topics
(we have a session on Wednesday morning about CI topics, please make
sure you can join!)

I think in Queens, we'll run container-only jobs, even for OVB.
That said, I think OVB coverage in Queens will be very useful to try
HA with 3 controllers (containerized), while the baremetal services
coverage will only run on Pike, Ocata and Newton.

That way, we would have:

Queens:
- multinode jobs covering the basic HA scenario; single-node, but still
useful to test a good part of the code
- OVB jobs covering a production-like environment, hopefully spotting
issues we wouldn't see with multinode jobs

Pike, Ocata, Newton:
no changes to the OVB job

(note it's a proposal, not a statement)

[...]
> [3] https://review.openstack.org/498474

approved

[...]

Thanks,
-- 
Emilien Macchi



Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Jiří Stránský

On 29.8.2017 14:42, Giulio Fidente wrote:

On 08/29/2017 02:33 PM, Jiří Stránský wrote:

A bit of context: currently our only upgrade check job is a non-OVB one,
containers-multinode-upgrades-nv. Lately we've started hitting
timeouts, and the job only does a mixed-version deploy + 1-node AIO
overcloud upgrade (just the main step). It doesn't do the undercloud
upgrade, nor the compute upgrade, nor converge, and it still times out...
It's a bit difficult to find things to cut off here. :D We could look
into speeding things up (e.g. try to reintroduce selective container
image upload etc.), but I think we might also be approaching the
"natural" deploy+upgrade limits. We might need to bump up the timeouts
if we want to test more things. Though it's not only about HW capacity;
it could also get unwieldy for devs if we keep increasing the
feedback time from CI, so we're kinda in a tough spot with upgrade CI...


Agreed, which goes back to the "nobody looks at the periodic jobs"
problem, but a periodic job still seems to be the answer?



Yeah, that might be the best solution :)

J.



Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Giulio Fidente

On 08/29/2017 02:33 PM, Jiří Stránský wrote:
A bit of context: currently our only upgrade check job is a non-OVB one, 
containers-multinode-upgrades-nv. Lately we've started hitting 
timeouts, and the job only does a mixed-version deploy + 1-node AIO 
overcloud upgrade (just the main step). It doesn't do the undercloud 
upgrade, nor the compute upgrade, nor converge, and it still times out... 
It's a bit difficult to find things to cut off here. :D We could look 
into speeding things up (e.g. try to reintroduce selective container 
image upload etc.), but I think we might also be approaching the 
"natural" deploy+upgrade limits. We might need to bump up the timeouts 
if we want to test more things. Though it's not only about HW capacity; 
it could also get unwieldy for devs if we keep increasing the 
feedback time from CI, so we're kinda in a tough spot with upgrade CI...


Agreed, which goes back to the "nobody looks at the periodic jobs" 
problem, but a periodic job still seems to be the answer?

--
Giulio Fidente
GPG KEY: 08D733BA



Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Jiří Stránský

On 29.8.2017 13:22, Giulio Fidente wrote:

On 08/29/2017 11:14 AM, Jiří Stránský wrote:

Hi owls,

the CI for containerized deployments with Pacemaker is close! In fact,
it works [1][2] (but there are pending changes to merge).


cool :D

I also spotted this, which we need for Ceph:
https://review.openstack.org/#/c/498356/

but I am not sure if we want to enable Ceph in this job, as we have it
already in a couple of scenarios; more below ...


+1 on keeping it in scenarios if that covers our needs.




The way it's proposed in gerrit currently is to switch the
centos-7-containers-multinode job (featureset010) to deploy with
Pacemaker. What do you think about making this switch as a first step?
(The OVB job is an option too, but that one is considerably closer to
timeouts already, so it may be better left as is.)


+1 on switching the existing job


Later it would be nice to get a proper clustering test with 3
controllers. Should we try and switch the centos-7-ovb-ha-oooq job to
deploy containers on master and stable/pike? (Probably by adding a new
job that only runs on master + Pike, and making the old ovb-ha-oooq only
run up to Ocata, to keep the OVB capacity demands unchanged?) I'd be +1
on that since containers are the intended way of deploying Pike and
beyond. WDYT?


switching OVB to containers from Pike seems fine because that's the
intended way as you pointed out, yet I would like to enable Ceph in the
upgrade job, and it requires multiple MON instances (multiple controllers)

Would it make any sense to deploy the Pacemaker / Ceph combination using
multiple controllers in the upgrade job, and drop the standard OVB job
(which doesn't do upgrades) or use it for other purposes?


It makes sense feature-wise to test the upgrade with Ceph; I'd say it's a 
pretty common and important use case.


However, I'm not sure how we can achieve it time-wise in CI. Is it 
possible to estimate how much time the Ceph upgrade might add?


A bit of context: currently our only upgrade check job is a non-OVB one, 
containers-multinode-upgrades-nv. Lately we've started hitting 
timeouts, and the job only does a mixed-version deploy + 1-node AIO 
overcloud upgrade (just the main step). It doesn't do the undercloud 
upgrade, nor the compute upgrade, nor converge, and it still times out... 
It's a bit difficult to find things to cut off here. :D We could look 
into speeding things up (e.g. try to reintroduce selective container 
image upload etc.), but I think we might also be approaching the 
"natural" deploy+upgrade limits. We might need to bump up the timeouts 
if we want to test more things. Though it's not only about HW capacity; 
it could also get unwieldy for devs if we keep increasing the 
feedback time from CI, so we're kinda in a tough spot with upgrade CI...
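
If we do end up bumping a timeout, that would be a Jenkins Job Builder
change in project-config; a rough sketch only, with an illustrative
value (not the job's actual current configuration):

```yaml
# Illustrative JJB fragment: raising the wall-clock limit of the
# upgrade job. The timeout value here is made up for the example.
- job-template:
    name: 'gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv'
    wrappers:
      - timeout:
          timeout: 180  # minutes
          fail: true
```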


Jirka



Re: [openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Giulio Fidente

On 08/29/2017 11:14 AM, Jiří Stránský wrote:

Hi owls,

the CI for containerized deployments with Pacemaker is close! In fact, 
it works [1][2] (but there are pending changes to merge).


cool :D

I also spotted this, which we need for Ceph:
https://review.openstack.org/#/c/498356/

but I am not sure if we want to enable Ceph in this job, as we have it 
already in a couple of scenarios; more below ...


The way it's proposed in gerrit currently is to switch the 
centos-7-containers-multinode job (featureset010) to deploy with 
Pacemaker. What do you think about making this switch as a first step? 
(The OVB job is an option too, but that one is considerably closer to 
timeouts already, so it may be better left as is.)


+1 on switching the existing job

Later it would be nice to get a proper clustering test with 3 
controllers. Should we try and switch the centos-7-ovb-ha-oooq job to 
deploy containers on master and stable/pike? (Probably by adding a new 
job that only runs on master + Pike, and making the old ovb-ha-oooq only 
run up to Ocata, to keep the OVB capacity demands unchanged?) I'd be +1 
on that since containers are the intended way of deploying Pike and 
beyond. WDYT?


switching OVB to containers from Pike seems fine because that's the 
intended way as you pointed out, yet I would like to enable Ceph in the 
upgrade job, and it requires multiple MON instances (multiple controllers)


Would it make any sense to deploy the Pacemaker / Ceph combination using 
multiple controllers in the upgrade job, and drop the standard OVB job 
(which doesn't do upgrades) or use it for other purposes?

--
Giulio Fidente
GPG KEY: 08D733BA



[openstack-dev] [tripleo] Pacemaker + containers CI

2017-08-29 Thread Jiří Stránský

Hi owls,

the CI for containerized deployments with Pacemaker is close! In fact, 
it works [1][2] (but there are pending changes to merge).


The way it's proposed in gerrit currently is to switch the 
centos-7-containers-multinode job (featureset010) to deploy with 
Pacemaker. What do you think about making this switch as a first step? 
(The OVB job is an option too, but that one is considerably closer to 
timeouts already, so it may be better left as is.)


Later it would be nice to get a proper clustering test with 3 
controllers. Should we try and switch the centos-7-ovb-ha-oooq job to 
deploy containers on master and stable/pike? (Probably by adding a new 
job that only runs on master + Pike, and making the old ovb-ha-oooq only 
run upto Ocata, to keep the OVB capacity demands unchanged?) I'd be +1 
on that since containers are the intended way of deploying Pike and 
beyond. WDYT?
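
The branch split described above could be expressed with Zuul branch
filters; a rough sketch in the v2 layout.yaml style (the new job name
and the exact regexes are illustrative only, not actual project-config
content):

```yaml
# Illustrative layout fragment: old job pinned to Newton/Ocata,
# a new containerized job (hypothetical name) on master + Pike only.
jobs:
  - name: ^gate-tripleo-ci-centos-7-ovb-ha-oooq.*$
    branch: ^stable/(newton|ocata)$
  - name: ^gate-tripleo-ci-centos-7-ovb-ha-containers-oooq.*$
    branch: ^(master|stable/pike)$
```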


Have a good day,

Jirka

P.S. You can deploy containerized with Pacemaker using OOOQ (TripleO 
Quickstart) by setting both `containerized_overcloud` and 
`enable_pacemaker` to true. Thanks to Wes for collaborating on this.
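
For illustration, those two settings could go into an extra config file
passed to quickstart; a minimal sketch, where the file path is
hypothetical and only the two variables above are taken from this
thread:

```yaml
# Hypothetical extra config file for OOOQ; only these two
# variables are known (from the P.S.) to be required.
containerized_overcloud: true
enable_pacemaker: true
```

(e.g. pointed at via quickstart.sh's config option, or merged into an
existing featureset file)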


P.P.S. The remaining patches are [3] and maybe [4] if we're ok with 
switching centos-7-containers-multinode.



[1] 
http://logs.openstack.org/24/471724/5/check/gate-tripleo-ci-centos-7-containers-multinode/6330e5e/logs/subnode-2/var/log/pacemaker/bundles/


[2] 
http://logs.openstack.org/24/471724/5/check/gate-tripleo-ci-centos-7-containers-multinode/6330e5e/logs/subnode-2/var/log/extra/docker/containers/


[3] https://review.openstack.org/498474
[4] https://review.openstack.org/471724
