Re: [openstack-dev] [TripleO] Proposing Ronelle Landy for Tripleo-Quickstart/Extras/CI core

2017-11-30 Thread Attila Darazs

On 11/29/2017 08:34 PM, John Trowbridge wrote:
I would like to propose that Ronelle be given +2 on the above repos. She has 
been a solid contributor to tripleo-quickstart and extras almost since 
the beginning. She has solid review numbers, but more importantly she has 
always done quality reviews. She has also been working in the very 
intense rover role on the CI squad during the past CI sprint, and has done 
very well in that role.


+1, yep!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-26 Thread Attila Darazs

On 10/26/2017 06:14 AM, Emilien Macchi wrote:

On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi wrote:

Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks
Paul for promoting it in gate).


Landed.


- Disabling voting on scenario001 and scenario004 container jobs:
https://review.openstack.org/#/c/515188/


Done, please be very careful while these jobs are not voting.
If in any doubt, please ping me, fultonj or gfidente on #tripleo.


- overcloudrc/keystone v2 workaround:
https://review.openstack.org/#/c/515161/ (d0ugal will work on proper
fix for https://bugs.launchpad.net/tripleo/+bug/1727454)


Merged - Dougal will work on the real fix this week, but it's not urgent anymore.


- Fixing zaqar/notification issues on
https://review.openstack.org/#/c/515123 - we hope that helps to reduce
some failures in gate


In gate right now and hopefully merged in less than 2 hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of timeouts.


- puppet-tripleo gate broken on stable branches (syntax jobs not
running properly) - jeblair is looking at it now


jeblair will hopefully provide a fix this week, but this is not
critical at this time.
Thanks, Jim, for your help.


Once again, we'll need to hold a retrospective to see why we reached that
terrible state, but for now let's focus on bringing our CI back into good
shape.
Thanks a ton to everyone who is involved,


I'm now restoring all patches that I killed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it with moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all our blockers; one is
missing but currently in gate:
https://review.openstack.org/515123 - it needs babysitting until merged.

Now let's see how RDO promotion works. We're close :-)


We also have to change the tenant rc file from overcloudrc to 
overcloudrc.v3 for the validate-simple role to unblock promotion on master.


I created a bug to track that problem and am going to post a fix soon:

https://bugs.launchpad.net/tripleo/+bug/1727698

Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 35) - better late than never

2017-09-06 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Here are the topics discussed from last Thursday:

The downstream (RHEL based) gates for upstream Quickstart are down, because we 
had to migrate from QEOS7 to the ci-rhos internal cloud, which currently 
cannot support our jobs. Ronelle is going to talk to the responsible 
people about solving the problems there.


Tempest is now running in more and more scenario jobs. See Emilien's 
email[1] for details.


There's ongoing work from Emilien to get the upgrades job working on 
stable/pike. Please help with reviews to get it going.


Most of the squad's work is currently focusing on getting the periodic 
promotion pipeline on rdocloud working and uploading containers and images.


That's the short version, join us on the Thursday meeting or read the 
etherpad for more. :)


Best regards,
Attila

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-September/121849.html


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 34)

2017-08-25 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Topics discussed:

We talked about the balance between using openstack-infra supported vs. 
self hosted solutions for graphite, logservers, proxies and mirrors. 
Paul Belanger joined us and the end result seemed to be that we're going 
to try to keep as many services under infra as we can, but sometimes the 
line is not so clear when we're dealing with 3rd party environments like 
rdocloud.


Ronelle talked about changing the overcommit ratio on rdocloud after the 
analysis of our usage. This can probably be done without any issue.


Wes added 
"gate-tripleo-ci-centos-7-scenario003-multinode-oooq-container" to the 
tripleo-quickstart-extras check and gate jobs to make sure we won't 
break containers, and to get some feedback on the status of the 
container jobs.


RDO packaging changes are now gating with Quickstart 
(multinode-featureset005), though it's non-voting. It might help us 
prevent breakages from the packaging side.


Promotion jobs are still not working fully on RDO Cloud, but we're 
working on it.


That's it for this week, have a nice weekend.

Best regards,
Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] How to report tripleo-quickstart results to DLRN API

2017-08-22 Thread Attila Darazs

Hi folks,

I'm trying to come up with a good design for $subject, and there are 
several different methods with pros and cons. I'd like to get your 
opinion about them.


For a bit of context, DLRN API[1] is a new extension of DLRN, our 
package and repo building solution for RDO. It's designed to be a 
central point of information about jobs that ran on certain hashes in 
various stages of testing, and to handle "promotions", which are really just 
symlinks to certain hashes.

We want to report back job results on multiple levels (upstream, RDO CI 
phase1 & phase2) and then use the information to promote new hashes at 
every stage.


If we were only interested in reporting successful runs, the 
solution would be fairly simple: add a reporting step to the 
quickstart-extras.yml[2] playbook at the end, run only if a "report" variable is set.
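
To make this concrete, here is a minimal sketch of what such a final 
play could look like. The dlrnapi CLI invocation and all variable names 
are assumptions for illustration, not the actual implementation:

---
# Hypothetical final play for quickstart-extras.yml: report a
# successful run to the DLRN API only when "report" is set.
- name: Report success to the DLRN API
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Send the job result
      command: >
        dlrnapi --url {{ dlrnapi_url }}
        report-result
        --job-id {{ job_id }}
        --commit-hash {{ commit_hash }}
        --distro-hash {{ distro_hash }}
        --success true
      when: report | default(false) | bool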


However, it would probably be useful in the long term to also report back 
failures (for statistics), and that's where things get complicated.


It would be great if we could report the failed status within the same 
quickstart.sh run instead of having a second run; that way we wouldn't 
have to touch the shell scripts in multiple places (upstream, phase1, 
phase2) and could get the reporting done with config file changes alone.


This is not simple, because the Ansible play can exit at any failed 
task. We would need to wrap the tasks in rescue blocks[3] so that a 
failure doesn't skip the reporting step.
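
For reference, this is roughly what the rescue pattern looks like in 
Ansible; the task bodies below are placeholders, only the error-handling 
structure matters:

---
# Sketch of block/rescue/always: a failure inside the block no longer
# aborts the play, so a final reporting task can always run.
- hosts: localhost
  gather_facts: false
  tasks:
    - block:
        - name: A deployment step that may fail
          command: /bin/false
      rescue:
        - name: Record the failure instead of exiting the play
          set_fact:
            job_success: false
      always:
        - name: This runs whether the block failed or not
          debug:
            msg: "job success: {{ job_success | default(true) }}"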


Idea #1: Create a "run successful" marker file at the reporting step, 
and report failure in case the file is not found (also making sure the 
file does not exist at the start of the run). This would still require 
multiple runs of ansible-playbook, but we could integrate the 
functionality into quickstart.sh by creating a --report option, making 
it available in every environment at the same time.
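
A rough sketch of how such a --report run could check the marker; the 
path and names are made up for illustration:

---
# Hypothetical reporting playbook for "quickstart.sh --report": the
# main run creates the marker file as its very last task, so its
# absence means the run failed somewhere.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check for the success marker
      stat:
        path: /tmp/quickstart_run_successful
      register: marker

    - name: Report the run result
      debug:
        msg: "reporting {{ 'SUCCESS' if marker.stat.exists else 'FAILURE' }}"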


Idea #2: Don't fail on *any* step, just register variables and check for 
success. An example where we already do this is the overcloud-deploy 
role. We don't fail on errors[4], but write out a file with the result 
and fail later[5]. We would need to do this for almost all shell parts to 
be reasonably certain we won't miss any failure. This requires a lot of 
alterations to the playbooks, and it feels a bit forced in Ansible without 
the rescue block, which we can't put around every single task.
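
Generalized, the pattern from the overcloud-deploy role looks roughly 
like this (a condensed sketch, not the actual role code):

---
# Register-and-check sketch: the step never fails the play directly;
# the result is recorded and the play fails only after reporting.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Run a step without aborting on error
      command: /bin/false
      register: step_result
      failed_when: false

    - name: Report the recorded result
      debug:
        msg: "step exited with rc={{ step_result.rc }}"

    - name: Fail the play only after reporting
      fail:
        msg: a step failed earlier
      when: step_result.rc != 0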


Idea #3: Use "post-build scripts" in the promotion jobs. We can pattern 
match for failed/passed jobs and report the result accordingly. The 
problem with this is that it's environment dependent. While we can 
certainly do this with post-build scripts in Jenkins Job Builder on 
CentOS CI, it's not clear how to solve this in Zuul queues. Probably we 
just need to make the shell scripts of the jobs more involved (not fail 
on quickstart.sh's nonzero exit). Besides these complications, it also 
means that we have to keep the reporting method in sync across multiple 
environments.


None of these solutions is ideal; let me know if you have a better 
design idea. I personally think #1 might be the easiest and cleanest to 
implement, especially since I'm planning to introduce multiple 
ansible-playbook runs in quickstart.sh during the redesign of devmode.


Best regards,
Attila

[1] https://github.com/javierpena/dlrnapi_client
[2] 
https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras.yml
[3] 
http://docs.ansible.com/ansible/latest/playbooks_blocks.html#error-handling
[4] 
https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-deploy/tasks/deploy-overcloud.yml#L6
[5] 
https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras-overcloud.yml#L32-L44


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 33)

2017-08-18 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Topics discussed:

We debated whether we should add an upgrades job to 
tripleo-quickstart-extras that would allow our IRC bot (hubbot) to report 
on the status of the upgrades as well, using gatestatus[1]. The upgrades 
jobs are not stable enough for that yet, though.


We had two major infra issues during the week: one was jobs not using the 
nodepool DNS (fixed by Sagi), the other was not using the DLRN & CentOS 
mirrors during DLRN package building in the gates. The latter has fixes, 
but they are not merged yet.


Emilien and Arx are working on adding tempest tests in place of pingtests 
in most of our gate jobs where it's useful. We also have quite a few 
jobs that don't have any validation yet.


We decided to use a whitelist for collecting log files from /etc 
on the upstream jobs. This will reduce the load on the logserver.
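
For those curious, the log collection role is driven by a list variable, 
so the whitelist is essentially a config change like the sketch below 
(the variable name and entries are illustrative; check the role's 
defaults for the real ones):

# Hypothetical whitelist override for the log collection role
artcl_collect_list:
  - /var/log/
  - /etc/nova/
  - /etc/neutron/
  - /etc/heat/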


The 3/4 node multinode jobs are almost ready; we're trying to merge the 
changes, along with the ones for multinic with libvirt.


We're also working hard to get the periodic/promotion jobs working on 
rdocloud to increase the cadence of the promotions. We have daily 
standups to coordinate the work between Ronelle, Wes, John and me.


That's it for this week, have a nice weekend.

Best regards,
Attila

[1] https://github.com/adarazs/gate-status

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 31)

2017-08-04 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

There are a lot of people on vacation, so this was a small meeting.

We started by discussing the hash promotions and the ways to track 
issues. Whether it's an upstream or RDO promotion issue, just create a 
Launchpad bug against tripleo and tag it with "ci" and "alert". It will 
automatically get escalated and get attention.


Gabriele gave a presentation about his current status with container 
building on RDO Cloud. It looks to be in good shape; however, there are 
still bugs to iron out.


Arx explained that the scenario001 jobs are now running a tempest test 
as well, which is a good way to introduce more testing upstream, while 
Emilien suggested that we should probably do more tempest testing on 
the container jobs as well.


Wes brought up an issue about collecting logs during the image building 
process which needs attention.


That's it for this week, have a nice weekend.

Best regards,
Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 30) - holidays

2017-07-28 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Discussion topics =

Wes suggested removing the verbose ansible logging from our CI runs and 
directing people to use ARA instead. This seems like a good solution once 
we get the upstream README file merged, where we can explain the changes.


There was also a discussion about having the OVB related repo hosted 
upstream instead of in Ben's personal github account. Ronelle will start a 
thread about this.


We will need an upstream periodic job that runs tempest on a containerized 
overcloud. I added a card to keep track of this[1].


The initramfs modification patches[2] from Sagi need some eyes, please 
review them if you have time.


= Who's working on what? =

This is not a comprehensive status, just highlights.

John is currently working on the 3 node multinode jobs; he already added 
a job to the CI check, but it's not passing yet, as a lot of 
changes still need to be merged for it.


Wes is testing the multinic libvirt runs, they are not passing yet.

Gabriele is fighting with the container promotion jobs on rdocloud, 
using the DLRN API. It's a complex goal to achieve.


Sagi held a SOVA design meeting, and tried to include the RDO jobs 
in its output. Here's the status page in case you don't know it[3].


Arx was working on Tempest related issues.

Ronelle did some BMC and devmode OVB fixes, thanks! Probably a lot more, 
but that's what I remember.


Attila is working on DLRN API reporting/promotion on RDO CI for now, 
later on other systems too.


= Announcements =

A heads up: next week a significant portion of the CI Squad will be on 
holiday. Sagi, John and Wes will be on PTO and Ronelle will be on 
meetings. Gabriele and I are still here as cores if you need reviews for 
tripleo-ci or quickstart.


That's it, have a nice weekend.

Best regards,
Attila

[1] 
https://trello.com/c/dvQOn9aK/297-create-an-upstream-tempest-job-that-runs-on-a-containerized-system

[2] https://review.openstack.org/#/q/topic:initramfs
[3] http://cistatus.tripleo.org/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 28) - some announcements

2017-07-17 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Announcements =

TripleO cores who would like to +workflow changes on tripleo-quickstart, 
tripleo-quickstart-extras and tripleo-ci should attend the Squad meeting 
to gain the necessary overview for deciding when to submit changes to 
these repos. This was discussed by the repo-specific cores during this 
meeting.


In other news, the https://thirdparty-logs.rdoproject.org/ logserver 
(hosted on OS1) migrated to https://thirdparty.logs.rdoproject.org/ (on 
RDO cloud).


= Discussion topics =

This week we had a more balanced agenda, with multiple small topics. 
Here they are:


* John started working on the much requested 3 node multinode feature 
for Quickstart. Here's his WIP change[1]. This is necessary to test HA + 
containers on multinode jobs.


* The OVB job transition is almost complete. Sagi was cleaning up the 
last few tasks: replacing the 
gate-tripleo-ci-centos-7-ovb-nonha-puppet-* jobs for ceph and cinder with 
featureset024, which deploys ceph (the former updates job), and the 
gate-tripleo-ci-centos-7-ovb-nonha-convergence job, which runs as 
experimental for the Heat repo.


* Gabriele came up with a nice solution to run periodic jobs on demand when 
necessary. The patch[2] is still not merged, but it looks promising.


* Ronelle and Gabriele continue to work on the RDO cloud migration 
(both OVB and multinode). Some new and some already existing jobs have 
been migrated there as a test.


That's it for last week.

Best regards,
Attila

[1] https://review.openstack.org/483078
[2] https://review.openstack.org/478516

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-06-30 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Renaming the CI jobs =

When we started the job transition to Quickstart, we introduced the 
concept of featuresets[1] that define a certain combination of features 
for each job.


This seemed to be a sensible solution, as it's not practical to mention 
all the individual features in the job name, and short names can be 
misleading (for example, the ovb-ha job does much more than test HA).


We decided to keep the original names for these jobs to simplify the 
transition, but the plan is to rename them to something that will help 
to reproduce the jobs locally with Quickstart.


The proposed naming scheme will be the same as the one we're now using 
for job type in project-config:


gate-tripleo-ci-centos-7-{node-config}-{featureset-config}

So, for example, the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job 
would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001".


The advantage of this is that it will be easy to reproduce a gate 
job on a local virthost by typing something like:

./quickstart.sh --release tripleo-ci/master \
    --nodes config/nodes/3ctlr_1comp.yml \
    --config config/general_config/featureset001.yml \
    $VIRTHOST

Please let us know if this method sounds like a step forward.

= PTG nomination discussion =

We discussed who to nominate to attend the PTG. We decided to nominate 
John, Arx and Sagi. Ronelle and I are going to apply without the 
nomination as well; maybe we'll have the budget to meet there and 
discuss CI related topics.


= Smaller items =

* We're close to finishing the transition of the "ovb-updates" promotion 
job; it might be the first one to get renamed using the method discussed above.


* The 
gate-tripleo-ci-centos-7-scenario00{1,2,3,4}-multinode-oooq-container 
jobs are now passing and voting (scenario001 is still not in the gate, 
waiting for [2] to merge).


* There were some issues found and fixed (by Sagi) on the stable branch 
promotion jobs.


* Gabriele keeps working on getting the promotion/periodic jobs hosted 
on the new RDO Cloud infrastructure, allowing us to run promotions more 
frequently.


* John keeps working on the libvirt multi-nic support, his patches are 
here[3].


Thank you for reading the summary. Have a great weekend!

Best regards,
Attila

[1] 
https://docs.openstack.org/developer/tripleo-quickstart/feature-configuration.html

[2] https://review.openstack.org/478979
[3] https://review.openstack.org/#/q/topic:libvirt-multi-nic

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 24) - devmode issues, promotion progress

2017-06-16 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting


= Devmode OVB issues =

Devmode OVB (the one you launch with "./devmode.sh --ovb") is not able to 
deploy reliably on RDO Cloud due to DNS issues. This change[1] might 
help, but we are still having problems.



= Promotion job changes =

Moving the promotion jobs over to Quickstart is an important but 
difficult goal. It would be great not to have to debug jobs from the 
old system again. Here's the first step towards that.

We retired the "periodic-tripleo-ci-centos-7-ovb-nonha" job and 
transitioned the "ha" one to run with Quickstart. The new job's name is 
"periodic-tripleo-ci-centos-7-ovb-ha-oooq" and it's already used to 
promote new DLRN hashes.


There's still an issue with it, which is fixed in this[2] change and it 
should start working properly soon (it already got through an overcloud 
deployment). Thanks Gabriele for leading this effort!


Migrating the remaining "periodic-tripleo-ci-centos-7-ovb-updates" job 
is not straightforward, as we don't have feature parity in Quickstart 
with the original job. The job name is misleading, as there are a lot 
of things tested within this job. What we're missing is predictable 
placement, hostname mapping and predictable IPs, apart from the actual 
update part, which we will leave to the Lifecycle team.



= Where to put tripleo-ci env files? =

Currently we're using Ben's repo[3] for OVB environment files, while THT 
also has env files[4] that we don't test upstream. That's not ideal, so 
we started to discuss where these configs should really live and how to 
handle them properly. Should they be in the tripleo-ci repo? Should we 
have up-to-date and tested versions in THT? Can we backport those to 
stable branches?


We didn't really figure out the solution to this during the meeting, so 
feel free to continue the discussion here or next time.


Thank you for reading the summary. Have a great weekend!

Best regards,
Attila

[1] https://review.openstack.org/474334
[2] https://review.openstack.org/474504
[3] 
https://github.com/cybertron/openstack-virtual-baremetal/tree/master/network-templates
[4] 
https://github.com/rdo-management/tripleo-heat-templates/tree/mgt-master/environments


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 23) - images, devmode and the RDO Cloud

2017-06-09 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had a packed agenda and intense discussion as always! Let's start 
with an announcement:


The smoothly named "TripleO deploy time optimization hackathlon" will be 
held on 21st and 22nd of June. It would be great to have the cooperation 
of multiple teams here. See the etherpad[1] for details.


= Extending our image building =

It seems that multiple teams would like to utilize the upstream/RDO 
image building process and produce images just like we do upstream. 
Unfortunately our current image storage systems don't have enough 
bandwidth (either upstream or at the RDO level) to increase the amount 
of images served.


Paul Belanger joined us and explained the longer term plans of OpenStack 
infra, which would provide a proper image/binary blob hosting solution 
in a six month time frame.


In the short term, we will recreate both the upstream and RDO image 
hosting instances on the new RDO Cloud and will test the throughput.


= Transitioning the promotion jobs =

This task still needs some further work. We're missing feature parity on 
the ovb-updates job. As the CI Squad is not able to take responsibility 
for the update functionality, we will probably migrate the job with 
everything else but the update part and make that the new promotion job.


We will also extend the number of jobs voting on a promotion, probably 
with the scenario jobs.


= Devmode =

Quickstart's devmode.sh seems to be picking up popularity among the 
TripleO developers. Meanwhile we're starting to realize the limitations 
of the interface it provides for Quickstart. We're going to have a 
design session next week on Tuesday (13th) at 1pm UTC where we will try 
to come up with some ideas to improve this.


Ian Main suggested defaulting devmode.sh to deploying a containerized 
system so that developers get more familiar with that. We agreed that 
this is a good idea and will follow it up with some changes.


= RDO Cloud =

The RDO cloud transition is continuing, however Paul requested that we 
don't add the new cloud to the tripleo queue upstream but rather use the 
rdoproject's own zuul and nodepool to be a bit more independent and run 
it like a third party CI system. This will require further cooperation 
with RDO Infra folks.


Meanwhile Sagi is setting up the infrastructure needed on the RDO Cloud 
instance to run CI jobs.


Thank you for reading the summary. Have a great weekend!

Best regards,
Attila

[1] https://etherpad.openstack.org/p/tripleo-deploy-time-hack

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 22) - Promotion Problems

2017-06-02 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= CI Promotion problems =

The last promoted DLRN hash is from the 21st of May, so it's now 12 days 
old. This is mostly due to not being able to thoroughly gate everything 
that makes up TripleO, and we're right in the middle of the cycle, where 
most work happens and a lot of code gets merged into every project.


However we should still try our best to improve the situation. If you're 
in any position to help solve our blocker problems (the bugs are 
announced on #tripleo regularly), please lend a hand!


= Smaller topics =

* We also had a couple of issues due to trying to bump Ansible from 2.2 
to version 2.3 in Quickstart. This uncovered a couple of gaps in our 
gating, and we decided to revert until we fix them.


* We're on track with transitioning some OVB jobs to RDO Cloud, now we 
need to create our infrastructure there and add the cloud definition to 
openstack-infra/project-config.


* We have RDO containers built on the CentOS CI system[1]. We should 
eventually integrate them into the promotion pipeline. Maybe even use 
them as the basis for upstream CI runs?


* Our periodic tempest jobs are getting good results on both Ocata and 
master; Arx keeps ironing out the remaining failures. See the current 
status here: [2].


* The featureset discussion is coming to an end; we have a good idea of 
what should go in which config files. Now the cores should document that 
to help contributors make the right calls when creating new config files 
or modifying existing ones.


Thank you for reading the summary. Have a great weekend!

Best regards,
Attila

[1] https://ci.centos.org/job/rdo-tripleo-containers-build/
[2] 
http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci?searchJob=


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 21) - Devmode OVB, RDO Cloud and config management

2017-05-26 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Periodic & Promotion OVB jobs Quickstart transition =

We had lively technical discussions this week. Gabriele's work on 
transitioning the periodic & promotion jobs is nearly complete and only 
needs reviews at this point. We won't set a transition date for these, as 
it doesn't really impact folks much if these jobs fail for a few days at 
this point. We'll transition when everything is ready.


= RDO Cloud & Devmode OVB =

We continued planning the introduction of RDO Cloud for the upstream OVB 
jobs. We're still at the point of account setup.


The new OVB based devmode seems to be working fine. If you have access 
to RDO Cloud, and haven't tried it already, give it a go. It can set up 
a full master branch based deployment within 2 hours, including any 
pending changes baked into the under & overcloud.


When you have your account info sourced, all it takes is

$ ./devmode.sh --ovb

from your tripleo-quickstart repo! See here[1] for more info.

= Container jobs on nodepool multinode =

Gabriele is stuck with these new Quickstart jobs. We would need a deep 
dive into debugging and using the container based TripleO deployments. 
Let us know if you can do one!


= How to handle Quickstart configuration =

This is a never-ending topic, and we managed to spend a good chunk of 
time on it this week as well. Where should we put various configs? Should we 
duplicate a bunch of variables or cut them into small files?

For now it seems we can agree on 3 levels of configuration (sketched below):

* nodes config (i.e. how many nodes we want for the deployment)
* environment + provisioner settings (e.g. running on rdocloud with ovb, 
or on a local machine with libvirt)
* featureset (a certain set of features enabled/disabled for the jobs, 
like pacemaker and ssl)
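
As a hedged illustration of the separation (the file names and variables 
here are made up, not the final layout):

# config/nodes/3ctlr_1comp.yml -- only node counts and types
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: control_1
    flavor: control
  - name: control_2
    flavor: control
  - name: compute_0
    flavor: compute

# config/general_config/featureset0XX.yml -- only feature switches
enable_pacemaker: true
run_tempest: false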


This seems rather straightforward until we encounter exceptions. We're 
going to figure out the edge cases and rework the current configs to 
stick to the rules.



That's it for this week. Thank you for reading the summary.

Best regards,
Attila

[1] http://docs.openstack.org/developer/tripleo-quickstart/devmode-ovb.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 20)

2017-05-19 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Using RDO Cloud for OVB jobs =

We spent some time discussing the steps needed to start running a few 
OVB TripleO jobs on the new RDO Cloud, which seems to be in good enough 
shape to start utilizing it. We need to create new users for it and add 
the cloud definition to project-config, among other things.


When all is set up, we will slowly ramp up the number of jobs running 
there to test the stability and find the bottlenecks.


= Old OVB jobs running without Quickstart =

There are a couple of not-yet-transitioned jobs still running on a 
few repos. We need to figure out whether those jobs are still needed and, 
if so, what's holding back their transition.


= CI jobs with containers =

We talked about possible ways to update all the containers with fresh 
and gating packages. It's not a trivial problem and we will probably 
involve more container folks in it. The current idea is to create a 
container that could locally serve the DLRN hash packages, avoiding 
downloading them for each container. This will still be an IO 
intensive solution, but there's probably no way around it.


= Gate instability, critical bug =

The pingtest failures are still plaguing the ovb-ha job; we really need a 
solution for this critical bug[1], as it fails around 30 percent of the 
time. Please take a look if you can!


Thank you for reading the summary.

Best regards,
Attila

[1] https://bugs.launchpad.net/tripleo/+bug/1680195

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 18 & 19)

2017-05-12 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

The previous week's meeting was short and focused on transition, so I 
didn't send a summary for it. We also had a couple of daily sync 
meetings to discuss the ongoing work. Here's what happened in the last 
two weeks.


= Quickstart Transition Phase 2 Status =

As previously planned, we transitioned the ovb-ha and ovb-nonha jobs to 
run with Quickstart. Please read the details of it from the announcement 
email[1].


The job has been performing really well over the previous days; check the 
statistics here[2]. The only problem is some pingtest failure, which 
seems to be not a Quickstart but a TripleO bug[3].


We're still working on transitioning periodic and promotion jobs and 
started planning "phase3" which will include updates and upgrades jobs 
and the containerized undercloud job.


= Review Process Improvements =

Ronelle initiated a conversation about improving the speed of landing 
bigger features and changes in Quickstart. A recent example is the OVB 
mode for devmode.sh which is taking a long time to get merged. Ideas 
about the new process can be seen at this etherpad[4].


= Image hosting issues =

We had a discussion about hosting the pre-built images for Quickstart, 
which has been problematic recently and results in a bad experience for 
first-time users.


We can't get the CentOS CDN to serve up-to-date consistent images, and 
we have capacity problems on images.rdoproject.org. The solution might 
be the new RDO Cloud, but for now we are considering having each job 
build the image by default. This could add some overhead but it might 
save time if the download is slow or headaches if the images are outdated.


Thank you for reading the summary. Have a good weekend!

Best regards,
Attila

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/116568.html
[2] http://status-tripleoci.rhcloud.com/ and then click on 
"gate-tripleo-ci-centos-7-ovb-ha-oooq"

[3] https://bugs.launchpad.net/tripleo/+bug/1690109
[4] https://review.rdoproject.org/etherpad/p/rdoci-review-process

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] images.rdoproject.org / thirdparty-logs.rdoproject.org is going down for a short maintenance

2017-05-03 Thread Attila Darazs
I need to reboot this machine for updates/fixes. It should be a short 
downtime, but a few jobs/downloads might be interrupted, so I'm announcing 
it here for reference.


This machine serves both TripleO and RDO jobs, so I'm CCing both mailing lists.

Best regards,
Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 17)

2017-04-28 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Our meeting was an hour later, as Gabriele's Quickstart Deep Dive session 
conflicted with it. The session was excellent, and if you didn't 
attend, I'd highly recommend watching it once the recording comes 
out. Meanwhile you can check out the summary here[1].


= Quickstart Transition Phase 2 Status =

We estimate that the transition of the OVB jobs will take place on *9th 
of May*. The following jobs are going to switch to be run by Quickstart:


* ovb-ha
* ovb-nonha
* ovb-updates

The -oooq equivalent jobs are already running close to the final 
configuration, which gives us good confidence for the transition.


= Smaller topics =

* Sagi brought up that openstack-infra's image building broke us twice 
in the last weeks, and it would be nice to find some solution for 
the problem. Maybe promoting those images too? Sagi will bring this 
topic up at the infra meeting.


* The OVB based devmode.sh is stuck because we can't use shade properly 
from the virtual environment; this needs further investigation.


* How we use featuresets: Wes brought up that we are not very 
consistent in using the new "featureset" style configuration 
everywhere. Indeed, we need to move to using it in RDO CI as well, but 
at least its use in tripleo-ci is consistent among the transitioned jobs.


* Wes suggested developing a rotation for watching the gating jobs, to 
free developers from constantly watching them. We need to figure out 
a good system for this.


Thank you for reading the summary.

Best regards,
Attila

[1] https://etherpad.openstack.org/p/quickstart-deep-dive

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 13)

2017-04-03 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had a meeting full of intense discussion last Thursday. Here's the 
summary.


= Promotion jobs and HTTP caching =

The first part of it centered around trying to improve and mostly 
speed up the promotion process for TripleO, which has been an ongoing 
discussion for the last few weeks.

Image building takes a long time (~30 minutes) for each promotion job, 
which we could be spared by having a separate job build the images. This 
would result in fewer job timeouts. Zuul v3 will be able to handle these 
kinds of job dependencies directly, but meanwhile we can probably work 
around it. Our contact for this work is pabelanger.


A lot of other outside queries could probably be sped up by having an 
infra-wide caching proxy. This might be an Apache server with mod_proxy 
in the short term, and an AFS mirror in the long term. It would speed up 
image downloads and docker registry downloads as well, making our jobs faster.


= Quickstart transition update =

The big OVB change from last week got merged; now we're checking the 
stability of those jobs before proceeding with the transition. We'll 
want more extensive testing before we move the voting jobs over, 
so we'll probably create parallel non-voting jobs this time 
(ha/non-ha/updates + gate job), not just test through pending 
tripleo-ci changes.




We will probably combine the former ha and nonha OVB jobs to save 
resources on rh1. Relevant change and discussion here[1].


We also briefly discussed how to involve more people in reviewing 
Quickstart changes and bring them up to speed. There will probably be a 
deep dive session on the subject given by one of the current cores.


Best regards,
Attila

[1] https://review.openstack.org/449785

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 12)

2017-03-27 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

== Gating & CDN issues ==

Last week was a rough one for the TripleO gate jobs. We fixed a couple 
of issues on the oooq gates handling the stable branches. This was 
mainly a workaround[1] from tripleo-ci that was missing from Quickstart 
for building the gated packages.


We also had quite a lot of issues with gate jobs not being able to 
download packages[2]. Figuring out how to deal with that issue is still 
under way. There were quite a few more small fixes to help with the gate 
instability[3].


== Timestamps ==

We also added timestamps to all the Quickstart deployment logs, so now 
it is easy to link directly to a timestamp in any of the logs; 
example[4]. It has per-second resolution and only depends on awk 
being present on the systems running the commands.


== Logs, postci.txt ==

Until now the postci.txt file was a bit hidden; we now copy it out to 
logs/postci.txt.gz in every oooq gate job. We're also working on a 
README style page for the logs that could help guide newcomers in 
debugging common errors and finding the relevant log files.


Let us know if you have further suggestions for improving the log 
browsing, or if you're missing some vital logs.


Some smaller discussion items:

* due to the critical patch for OVB[5] not merging last week, we're 
going to push out the transition of the next batch of jobs to at least 
next Monday (3rd of April).


* the periodic pipeline is still not running often enough. We will 
probably move 3 OVB jobs to run every 8 hours as a start, to increase the 
cadence.


* We're probably going to move to the "CI Squad" Trello board[6] from 
the current RDO board that we're sharing with other team(s).


Best regards,
Attila

[1] https://review.openstack.org/447530
[2] https://bugs.launchpad.net/tripleo/+bug/1674681
[3] https://review.openstack.org/#/q/topic:tripleo/outstanding
[4] 
http://logs.openstack.org/75/446075/8/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/cb1f563/logs/undercloud/home/jenkins/install_packages.sh.log.txt.gz#_2017-03-24_21_30_20

[5] https://review.openstack.org/431567
[6] https://trello.com/b/U1ITy0cu/tripleo-ci-squad

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 11)

2017-03-16 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 14:30-15:30 UTC (WARNING: time changed due to DST)
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

The last week was very significant in the CI Squad's life: we migrated 
the first set of TripleO gating jobs to Quickstart[1] and it went more 
or less smoothly. There were a few failed gate jobs, but we quickly 
patched up the problematic parts.


For the "phase2" of the transition we're going to concentrate on three 
areas:


1) usability improvements, to make the logs from the jobs easier to 
browse and understand


2) make sure the speed of the new jobs is roughly at the same level as 
the previous ones


3) get the OVB jobs ported as well

We use the "oooq-t-phase2"[2] gerrit topic for the changes around these 
areas. As the OVB related ones are kind of big, we will not migrate the 
jobs next week, most probably only on the beginning of the week after.


We're also trying to utilize the new RDO Cloud; hopefully we will be 
able to offload a couple of gate jobs onto it soon.


Best regards,
Attila

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-March/113996.html

[2] https://review.openstack.org/#/q/topic:oooq-t-phase2

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Gating jobs are now running with Quickstart

2017-03-15 Thread Attila Darazs
As discussed previously in the CI Squad meeting summaries[1] and on the 
TripleO weekly meeting, the multinode gate jobs are now running with 
tripleo-quickstart. To signify the change, we added the -oooq suffix to 
them.


The following jobs migrated yesterday evening, with more to come:

- gate-tripleo-ci-centos-7-undercloud-oooq
- gate-tripleo-ci-centos-7-nonha-multinode-oooq
- gate-tripleo-ci-centos-7-scenario001-multinode-oooq
- gate-tripleo-ci-centos-7-scenario002-multinode-oooq
- gate-tripleo-ci-centos-7-scenario003-multinode-oooq
- gate-tripleo-ci-centos-7-scenario004-multinode-oooq

For those who are already familiar with Quickstart, we introduced two 
new concepts:


- featureset config files, which are numbered collections of settings 
without node configuration[2]
- the '--nodes' option for quickstart.sh and the config/nodes files that 
deal with only the number and type of nodes the deployment will have[3]


If you would like to debug these jobs, it might be useful to read 
Quickstart's documentation[4]. We hope the transition will be smooth, 
but if you have problems ping members of the TripleO CI Squad on #tripleo.


Best regards,

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-March/113724.html
[2] 
https://docs.openstack.org/developer/tripleo-quickstart/feature-configuration.html
[3] 
https://docs.openstack.org/developer/tripleo-quickstart/node-configuration.html

[4] https://docs.openstack.org/developer/tripleo-quickstart/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 10)

2017-03-10 Thread Attila Darazs
If the topics below interest you and you want to contribute to the 
discussion, feel free to join the next meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

I skipped last week's summary as the CI Squad was very focused on making 
the Quickstart upstream job transition deadline of March 13th.


Things are in good shape! I want to emphasize here how well and how hard 
Gabriele, Sagi and Ben worked together on the transition in the last few weeks.


We had daily stand-ups in the last three days instead of just the 
regular Thursday meeting.


Our current status is: GREEN.

We have the "oooq-t-phase1"[1] gerrit topic tracking the outstanding 
changes for the transition. There's 3 of them left unmerged, all in very 
good state.


This WIP change[2] pulls together all the necessary changes, and we got 
good results on the undercloud-only, basic multinode and scenario 1-2 
jobs. We also reproduced the exact same failure the current 
scenario001 job is experiencing, which is exactly what we want to see.


We expect the 3rd and 4th scenarios to work similarly well, as we 
previously had Quickstart-only runs with them, just not through this WIP 
change.


After we merge [2], we can change the "job type" in project-config to 
"flip the switch" and have the transitioned jobs be driven by Quickstart.


We're in good shape for a potential Monday transition.

Best regards,
Attila

[1] https://review.openstack.org/#/q/topic:oooq-t-phase1
[2] https://review.openstack.org/431029

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 8) and Quickstart transition deadline

2017-02-28 Thread Attila Darazs
**IMPORTANT** We are planning to transition the first batch of jobs at 
the very beginning of the Pike cycle! This means that on, or 
very close to, the 10th of March we're going to switch over at least the 
multinode scenario jobs (1 to 5), but possibly more, to be driven by 
Quickstart.


As always, if these topics interest you and you want to contribute to 
the discussion, feel free to join the next meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Our meeting was focused on identifying the critical path of the 
Quickstart TripleO CI transition, and thanks to Ronelle, we have work 
items labelled for it here[1]. Please take a look at that board to see 
what we're up to.


(We're using the RDO Infra board during the transition period; later we 
will probably migrate to the CI Squad board completely.)


We also need to focus on Quickstart's ability to reproduce all the 
different scenarios on libvirt. Currently we're good, but we are adding 
a few features during the transition that need to work out of the box 
with virthosts too, like multinic deployments.


Best regards,
Attila

[1] 
https://trello.com/b/HhXlqdiu/rdo?menu=filter&filter=label:oooq%20phase%201%20transition


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 7)

2017-02-21 Thread Attila Darazs

On 02/17/2017 07:18 PM, Paul Belanger wrote:

On Fri, Feb 17, 2017 at 03:39:44PM +0100, Attila Darazs wrote:

As always, if these topics interest you and you want to contribute to the
discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting


Was this meeting recorded in some manner? I see you are using bluejeans, but
don't see any recordings of the discussion.

Additionally, I am a little sad IRC is not being used for these meetings. Some
of the things tripleo is doing are of interest to me, but I find it difficult to
join a video session for 1 hour just to listen.  With IRC, it is easier for me to
multitask on other things, then come back and review what has been discussed.


We are not recording it for now, sorry. We are trying to keep good 
minutes & this summary to bridge the gap for the lack of a recording or 
IRC meeting.


We voted on it a few weeks ago and the video meeting won. We did agree 
to revisit the IRC option after the transition is done, as the bulk 
of the meetings are mostly chats about possible technical solutions for 
the quickstart transition rather than classic meetings where we decide 
things. We bring those decisions to the weekly TripleO IRC meetings.



* We discussed about the state of the Quickstart based update/upgrade jobs
upstream. matbu is working on them and the changes for the jobs are under
review. Sagi will help with adding project definitions upstream when the
changes are merged.

* John started to draft out the details of the CI related PTG sessions[1].

* A couple of us brought up reviews that they wanted merged. We discussed
the reasons, and agreed that sometimes an encouraging email to the mailing
list has the best effect to move important or slow-to-merge changes moving
forward.

* We talked quite a lot about log collection upstream. Currently Quickstart
doesn't collect logs exactly as upstream, and that might be okay, as we
collect more, and hopefully in a more easy to digest format.

* However we might collect too much, and finding the way around the logs is
not that easy. So John suggested to create an entry page in html for the
jobs that point to different possible places to find debug output.


Yes, logging was something of an issue this week.  We are still purging data on
logs.o.o, but it does look like quickstart is too aggressive with log
collection. We currently only have 12TB of HDD space for logs.o.o and our
retention policy has dropped from 6 months to 6 weeks.

I believe we are going to have a discussion at the PTG about this for
openstack-infra and implement some changes (caps) for jobs in the coming future.
If you are planning on attending the PTG, I encourage you to attend the
discussions.


I won't be at the PTG this time, but maybe Emilien or John can join.

With regards to space, we're going to comb through the logging and make 
sure we're a bit more selective about what we gather.


Attila


* We also discussed adding back debug output to elastic search, as the
current console output doesn't contain everything, we log a lot of
deployment output in separate log files in undercloud/home/stack/*.log

* Migration to the new Quickstart jobs will happen at or close to 10th of
March, in the beginning of the Pike cycle when the gates are still stable.

That was all for this week.

Best regards,
Attila

[1] https://etherpad.openstack.org/p/tripleo-ci-roadmap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 7)

2017-02-17 Thread Attila Darazs
As always, if these topics interest you and you want to contribute to 
the discussion, feel free to join the next meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

* We discussed the state of the Quickstart based update/upgrade 
jobs upstream. matbu is working on them, and the changes for the jobs are 
under review. Sagi will help with adding project definitions upstream 
when the changes are merged.


* John started to draft out the details of the CI related PTG sessions[1].

* A couple of us brought up reviews that they wanted merged. We 
discussed the reasons, and agreed that sometimes an encouraging email to 
the mailing list is the best way to get important or slow-to-merge 
changes moving forward.


* We talked quite a lot about log collection upstream. Currently 
Quickstart doesn't collect logs exactly the same way as upstream, and that 
might be okay, as we collect more, and hopefully in an easier-to-digest format.


* However, we might collect too much, and finding one's way around the logs 
is not that easy. So John suggested creating an HTML entry page for 
the jobs that points to the different possible places to find debug output.


* We also discussed adding back debug output to Elasticsearch, as the 
current console output doesn't contain everything; we log a lot of 
deployment output in separate log files under undercloud/home/stack/*.log


* Migration to the new Quickstart jobs will happen on or close to the 10th 
of March, at the beginning of the Pike cycle when the gates are still 
stable.


That was all for this week.

Best regards,
Attila

[1] https://etherpad.openstack.org/p/tripleo-ci-roadmap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 6)

2017-02-13 Thread Attila Darazs
As always, if these topics interest you and you want to contribute to 
the discussion, feel free to join the next meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had only about half the usual attendance on our Thursday meeting as 
people had conflicts and other hindrances. I joined it from an airport 
lobby. We still managed to do some good work.


== Task prioritization ==

Our main focus was on prioritizing the remaining tasks for the Quickstart 
upstream transition. There are a few high-priority items, which we put in 
the Next column on the RDO Infra board. See all the outstanding "Q to U" 
(Quickstart to Upstream) cards here[1].


Some of these are simple and quick low-hanging fruit; a few are bigger 
chunks of work that need careful attention, like making sure that our 
multinode workflow can be reproduced over libvirt for easier debugging.


== Quickstart extra roles ==

We pulled all the useful roles into the quickstart-extras repo when we 
created it, and it seems it might be better if a few very specialized 
ones lived outside of it.


One example is Raul's validate-ha role, which we will split off to speed 
up development, as most cores are not involved in this and gates are not 
testing it.


== Update on transitioning to the new Quickstart jobs ==

We will use the job type field from the upstream jobs to figure out 
which quickstart job config we have to use for gate jobs (not the job name).
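
To make that concrete, a minimal sketch of such a mapping in a shell 
script could look like the following; TOCI_JOBTYPE is the job type 
variable the current tripleo-ci scripts already use, but the config file 
names and match patterns below are illustrative assumptions, not the 
final layout:

    # Hypothetical mapping from job type to quickstart config;
    # the paths are placeholders, not the agreed-on layout.
    case "$TOCI_JOBTYPE" in
        *ovb-ha*)
            CONFIG=config/general_config/ha.yml
            ;;
        *multinode*)
            CONFIG=config/general_config/minimal.yml
            ;;
        *)
            CONFIG=config/general_config/minimal.yml
            ;;
    esac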


In addition to this, Gabrielle will tackle the issue of mixing the old 
and new jobs, and run them in parallel, letting us transition them one 
by one. Details in the trello card[2].


== Gating improvement ==

I was part of a meeting last week where we tried to identify problem 
areas for our testing, and we came to the conclusion that the ungated 
openstack common repo[3] is sometimes the cause of gating breaks.


We should start gating it to improve upstream quickstart job stability.

Best regards,
Attila

[1] 
https://trello.com/b/HhXlqdiu/rdo?menu=filter&filter=label:%5BQ%20to%20U%5D

[2] https://trello.com/c/dNTpzD1n
[3] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-ocata/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 5 - for real now)

2017-02-03 Thread Attila Darazs
As always, if these topics interest you and you want to contribute to 
the discussion, feel free to join the next meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Note: last two weeks I used the incorrect week number. Getting back on 
track, this is the 5th week of 2017.


Yesterday's meeting focused almost entirely on figuring out the "feature 
delta" between the current TripleO CI and the functionality of our 
Quickstart based CI. In the spirit of aviation we call this the 
"preflight checklist"[1].


It contains:

* the various variables that turn functionality on and off in upstream tests

* a short description of the variables

* a "Quickstart" section for each, describing if it's supported or not 
currently. If yes, usually there's a link at the respective part, if 
not, we add a Trello card to track the work, or a bug if we plan to take 
care of it a bit later


* proposed new Quickstart jobs, where we combine the existing features 
into fewer jobs with the same coverage


* the existing upstream jobs with the features they currently cover

If you're somewhat familiar with the current CI system, please look over 
these and let us know about any mistakes in them.


Other than this, the new nodepool and OVB Quickstart jobs are working, 
apart from the OVB HA job -- Sagi is working on it.


I'm not sure the link for the checklist is accessible to everyone, so 
I'm going to paste its content here after the link. The formatting is 
probably not perfect, so check the Google doc if you can.


Best regards,
Attila

[1] 
https://docs.google.com/document/d/1Mb_t5Qe-Lnh0uaXy0ubX9y4k65Q4D_aw-49eZOqoviQ/edit?pli=1#


---

TripleO CI Quickstart Transition

Preflight Checklist - All the items must be ready by 10th March



This document describes:

* Existing features in the current TripleO CI

* Their support in Quickstart

* The current CI job feature sets (what features are tested in specific 
jobs)


* The new proposed job feature sets (to reduce the amount of jobs while 
keeping the same coverage)


Feature index





Each list item represents a "toci_gate_test.sh" variable that 
enables/disables features in the CI jobs.




* 1-CONTROLLER

   * Use 1 controller (and maybe other types of nodes)

   * Quickstart: supported

* 3-CONTROLLERS

   * Use 3 controllers (and maybe other types of nodes)

   * Quickstart: supported

* 1-COMPUTE

   * Use 1 compute node (and maybe other types of nodes)

   * Quickstart: supported

* 1-CEPH

   * Use 1 ceph node (and maybe other types of nodes)

   * Quickstart: supported

* CONTAINERS

   * Container based undercloud (feature under development)

   * Container based overcloud

   * Quickstart: supported (undercloud in progress)

* DISABLE_IRONIC

   * This is just a label that denotes the ability to skip ironic steps 
entirely during multinode based jobs


   * Quickstart: not needed, implemented elsewhere

* DELOREAN_HTTP_URLS

   * We can't use squid to cache https urls, so don't use them

   * Quickstart: make sure we use only http in build-test-package [card]

* IRONIC_DRIVERS_DEPLOY_HTTPPORT

   * sets http port unconditionally to 
ironic::drivers::deploy::http_port: 3816 in undercloud overrides


   * Quickstart: not supported [card]

* IMAGE_CACHE_SELECT

   * This feature enables selecting whether or not to use an image from 
the cache when gating a specific project (i.e. projects that alter the image 
creation process)


   * TODO(gcerami) propose a change to unify the list of projects for 
every image to build (specific project gated -> all images will be 
recreated from scratch)


   * Quickstart: work in progress (trown, image build role) [card]

* IMAGE_CACHE_UPLOAD

   * Ability to promote image, uploading it to the image cache server

   * We can leave the current implementation in bash, but work on which 
job type combination will activate the upload


   * Quickstart: not needed, can be handled by the current script

* INTROSPECT

   * perform overcloud nodes introspection

   * Quickstart: supported, but we are still performing bulk 
introspection, while we should use the new format as in 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/tripleo.sh#n608 
instead of 
https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2#L90 
(see the sketch at the end of this checklist)


   * [card]

* METRICS

   * TripleO CI is sprinkled with metric sections, surrounded by 
start_metric - stop_metric primitives that gather section duration 
information throughout various steps of the deployment (they really 
just set timers). These metrics are then sent to the graphite host for graph 
rendering at the end of the run


   * Quickstart: not needed

* MULTINODE_SETUP

* MULTINODE_NODES_BOOTSTRAP

   * multiple nodes are consumed from the OpenStack node pool

   * A setup to create a network between nodepool nodes is needed

   * All nodes must contain proper nodepool configurations in /etc/nodepool

   * N
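
(Sketch referenced from the INTROSPECT item above: a minimal example of 
the newer per-node introspection flow, assuming stackrc is available on 
the undercloud; this is an illustration, not the actual role code.)

    # Old, bulk-style introspection (what quickstart still does):
    #   openstack baremetal introspection bulk start
    # Newer format, as used by tripleo.sh:
    source "$HOME/stackrc"
    openstack overcloud node introspect --all-manageable
    openstack overcloud node provide --all-manageable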

Re: [openstack-dev] [TripleO] Proposing Sergey (Sagi) Shnaidman for core on tripleo-ci

2017-02-02 Thread Attila Darazs

On 02/01/2017 08:37 PM, John Trowbridge wrote:



On 01/30/2017 10:56 AM, Emilien Macchi wrote:

Sagi, you're now core on TripleO CI repo. Thanks for your hard work on
tripleo-quickstart transition, and also helping by keeping CI in good
shape, your work is amazing!

Congrats!

Note: I couldn't add you to tripleo-ci group, but only to tripleo-core
(Gerrit permissions), which means you can +2 everything but we trust
you to use it only on tripleo-ci. I'll figure out the Gerrit
permissions later.



I also told Sagi that he should feel free to +2 any
tripleo-quickstart/extras patches which are aimed at transitioning
tripleo-ci to use quickstart. I didn't really think about this as an
extra permission, as any tripleo core has +2 on
tripleo-quickstart/extras. However, I seem to have surprised the other
quickstart cores with this. None were opposed to the idea, but just
wanted to make sure that it was clearly communicated that this is allowed.

If there is some objection to this, we can consider it further. FWIW,
Sagi has been consistently providing high quality critical reviews for
tripleo-quickstart/extras for some time now, and was pivotal in the
setup of the quickstart based OVB job.


Thanks for the clarification.

And +1 on Sagi as a quickstart/extras core. I really appreciate his 
critical eye on the changes.


Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 5)

2017-01-30 Thread Attila Darazs
In the spirit of "better late than never", here's a summary of our CI 
Squad meeting.


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/


Configuration management in TripleO CI
==

There was a design meeting organized by Gabrielle (thanks!) to discuss 
how to solve the problem of configuring the new Quickstart jobs.


There are multiple approaches to this problem, and it's difficult to 
strike a balance between having a single dedicated config file per job (too 
much duplication) and parsing the job name and handling every option with 
coding logic in the testing system (hard to reproduce/know what's 
happening).


What emerged from the still ongoing discussion is the identification of 
three config sections:


* provisioner: e.g. libvirt, OVB or nodepool based jobs and the related 
configuration to allow quickstart to work on these systems
* release: one config file per release, under config/release (we already 
have these)

* config: one config file for general config, under config/general_config

It seems useful to give a neutral name to a certain set of 
functionalities tested in certain CI jobs instead of the misleading 
"ha/nonha" names (when in fact those jobs test a lot more). "featureset01", 
"featureset02", etc. look like good candidates for naming them.


So we could end up with jobs like "tripleo-ovb-featureset01-newton", 
with the featureset matrix documented somewhere like tripleo-ci.
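
As an illustration, invoking a featureset-based run could look something 
like the sketch below; the flags mirror quickstart.sh's current usage, 
but the featureset file name and location are assumptions at this point:

    # Hypothetical invocation; featureset01.yml does not exist yet.
    ./quickstart.sh \
        --release newton \
        --config config/general_config/featureset01.yml \
        $VIRTHOST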



Smaller topics
==

* Both the OVB and nodepool jobs are working apart from generic setbacks 
like the python-pandas issue breaking our jobs.


* Our blocker/bottleneck for the transition is now the configuration 
management discussed above.


* The "Quickstart transition checklist" is now hosted on google docs 
here[1].


* We are having trouble keeping track of the issues in upstream CI. 
Using an individual Trello board instead of the current etherpad was 
suggested. We're going to try this solution out this week and post updates.


* Emilien mentioned the new additional composable upgrades testing in 
TripleO CI[2].


* We had a bug triage/squashing event last Friday. We started moving bugs 
from the "tripleo-quickstart" project to "tripleo" and tagging them as 
ci/quickstart, to ease the scheduling of bugs.


* We also managed to make a big improvement on the tripleo-quickstart bug 
count, going from 65 open bugs to 42, and from 37 new bugs to 21.


Full meeting minutes can be found here:

https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Best regards,
Attila

[1] 
https://docs.google.com/document/d/1Mb_t5Qe-Lnh0uaXy0ubX9y4k65Q4D_aw-49eZOqoviQ/edit?pli=1

[2] https://review.openstack.org/425727

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 4)

2017-01-19 Thread Attila Darazs
Everybody interested in the TripleO CI and Quickstart is welcome to join 
the weekly meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Here's this week's summary:

* There aren't any blockers or bottlenecks slowing down the transition 
to the Quickstart based CI. We're right on track.


* The Quickstart OVB jobs are running stable. Yesterday they broke due 
to a tripleo-ci change, but Sagi fixed them today.


* The Quickstart multinode nodepool job is also working well. It's a 
good basis for extending our feature coverage.


* The ovb-ha-oooq-nv and nonha-multinode-oooq-nv jobs are moving into the 
check-tripleo queue to make sure we catch any change that breaks these 
new jobs[1].


* A few quickstart log collection usability improvements are on 
the way: soon all the text based logs are going to be renamed to end in 
txt.gz, making them browsable from the log servers[2] (see the sketch 
after this list). Also the log collection output will go to a log file 
instead of being dumped on the console.


* We are trying to reduce the number of unnecessary OVB jobs by limiting 
the files we trigger on, but openstack-infra doesn't like our current 
approach[3]. We brainstormed about alternative solutions (see the 
meeting minutes for details).


* Ben Kero proposed a PTG CI session about the CI moving to use 
Quickstart. Emilien suggested creating a second one regarding 
reusing the scenario jobs for container tests.


* There's a draft of the "pre-flight check list"[4] for the CI 
transition made by Gabrielle to make sure the quickstart based jobs will 
have the same or better coverage than the current CI system.


* We are going to have a design session about the handling of the config 
files for these new jobs on Wednesday the 25th, 15:00 UTC.
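
(A minimal sketch of the log rename mentioned above, assuming the 
collected logs end up under $WORKSPACE/logs; the path and suffixes are 
illustrative, not the exact change under review:)

    # Rename gzipped text logs so the log server serves them inline.
    find "$WORKSPACE/logs" -type f -name '*.log.gz' \
        -exec bash -c 'mv "$0" "${0%.log.gz}.txt.gz"' {} \;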


The full meeting minutes are here: 
https://etherpad.openstack.org/p/tripleo-ci-squad-meeting


Best regards,
Attila

[1] https://review.openstack.org/422646
[2] https://review.openstack.org/422638
[3] https://review.openstack.org/421525
[4] https://etherpad.openstack.org/p/oooq-tripleo-ci-check-list

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary

2017-01-12 Thread Attila Darazs
We had our first meeting as the CI Squad today. We re-purposed our 
"Quickstart to Upstream Transitioning" meeting into the Squad meeting, 
so the topics were and will be focused on the transition for the next 
month or so.


Everybody interested in the TripleO CI and Quickstart is welcome to join 
the meeting:


Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Meeting summary:

* The Quickstart transition does not seem to have any blockers or 
bottlenecks for the time being; we are on track to be ready at the end 
of the Ocata cycle and to switch at the beginning of Pike.


* The new Quickstart based OVB jobs are working consistently and 
reliably, though the tinyrpc.server bug affected them just like the 
regular jobs.


* We got our first overcloud deploy today on the Quickstart based nodepool 
multinode jobs; big kudos to bkero and trown for making it happen. The 
verification part still needs work though.


* The experimental queue is overloaded, we will move OVB workloads to 
the multinode jobs once we get them working reliably.


* We might want to change the experimental OVB jobs to run container and 
composable upgrade jobs instead for increased coverage, but we need more 
input on that; we will discuss it at the next TripleO meeting.


* Selecting the proper configuration to use for each gate/periodic job is 
not optimal; we should rethink the system during the transition to 
Quickstart. This needs a design session. We also need an up-to-date 
job-type/functionality matrix like this[1].


* We voted to keep the meetings on BlueJeans and at 15:30 UTC.

* We will keep tracking the transition work on the RDO Infra Trello 
board with the [Q to U] tags, and maybe move to the CI Squad board on Trello 
later. We should also look at Storyboard and see whether it would be 
worth switching to.


Full meeting minutes and notes: 
https://etherpad.openstack.org/p/tripleo-ci-squad-meeting


Best regards,
Attila

[1] https://review.openstack.org/#/c/399269/7/README.md

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] [ci] TripleO-Quickstart Transition to TripleO-CI Update and Invite:

2017-01-04 Thread Attila Darazs

On 01/04/2017 10:34 AM, Steven Hardy wrote:

Hi Harry,

On Tue, Jan 03, 2017 at 04:04:51PM -0500, Harry Rybacki wrote:

Greetings All,

Folks have been diligently working on the blueprint[1] to prepare
TripleO-Quickstart (OOOQ)[2] and TripleO-Quickstart-Extras[3] for
their transition into TripleO-CI. Presently, our aim is to begin the
actual transition to OOOQ on 4-Feb-2017. We are tracking our work on
the RDO-Infra Trello board[4] and holding public discussion of key
blockers on the team’s scrum etherpad[5].


Thanks for the update - can you please describe what "transition into
TripleO-CI" means?


Hello Steve,

This means we're trying to run all the gate jobs with Quickstart and 
make sure we have the same features enabled and the same results for each 
existing gate job.



I'm happy to see this work proceeding, but we have to be mindful that the
end of the development cycle (around the time you're proposing) is always
a crazy-busy time where folks are trying to land features and fixes.

So, we absolutely must avoid any CI outages around this time, thus I get
nervous talking about major CI transitions around the release-candidate
weeks ;)

https://releases.openstack.org/ocata/schedule.html

If we're talking about getting the jobs ready, then switching over to
primarily oooq jobs in early Pike, that's great, but please let's ensure we
don't make any disruptive changes before the end of this (very short and
really busy) cycle.


As I see it, early Pike is only two weeks away from our planned switch, so it 
might indeed be wiser to delay. The end-of-cycle stability might even be 
useful for us to run some new jobs in parallel for a while, if we have 
enough resources.



We are hosting weekly transition update meetings (1600-1700 UTC) and
would like to invite folks to participate. Specifically, we are
looking for at least one stakeholder in the existing TripleO-CI to
join us as we prepare to migrate OOOQ. Attend and map out job/feature
coverage to identify any holes so we can begin plugging them. Please
reply off-list or reach out to me (hrybacki) on IRC to be added to the
transition meeting calendar invite.


Why can't we discuss this in the weekly TripleO IRC meeting?

I think folks would be fine with having a standing item where we discuss
this transition (there is already a CI item, but I've rarely seen this
topic raised there).


I agree that we should have a standing item about this in the TripleO 
meeting; however, this transition meeting usually takes an hour a week in 
itself, so we cannot really fit it into the TripleO meeting.


Also, the reason we're asking somebody well versed in the TripleO CI to 
join us is that we might get answers to questions we didn't even know we 
had. There are probably shortcuts and known workarounds for what we're 
trying to achieve in the upstream system that we're not familiar with.


Also, the discussion is focused on Quickstart (for example, how to develop 
roles that unify different workloads like OVB and nodepool), so it 
wouldn't be entirely relevant for the TripleO meeting.


Thus the request still stands: I think somebody familiar with the CI 
system could be a big help to us. This should be a once-a-week 
meeting for only the following 3-6 weeks.


From now on, though, we will give a short status update about the current 
state of the transition in the TripleO meetings.


Thank you for your thoughts,
Attila


https://wiki.openstack.org/wiki/Meetings/TripleO

Thanks!

Steve





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Make "RedHat RDO CI" vote on tripleo-quickstart changes?

2016-09-06 Thread Attila Darazs
I think third-party gating is solid enough to make it vote, not just 
post results.


Currently we have a gate job breaking, and I almost submitted something 
without everything passing. It would be useful to make it blocking.


Is there any reason not to do this?

Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev