Re: [openstack-dev] [TripleO] Proposing Ronelle Landy for Tripleo-Quickstart/Extras/CI core
On 11/29/2017 08:34 PM, John Trowbridge wrote:
> I would like to propose Ronelle be given +2 for the above repos. She has been a solid contributor to tripleo-quickstart and extras almost since the beginning. She has solid review numbers, but more importantly has always done quality reviews. She has also been working in the very intense rover role on the CI squad during the past CI sprint, and has done very well in that role.

+1, yep!

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)
On 10/26/2017 06:14 AM, Emilien Macchi wrote:

On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi wrote: Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks Paul for promoting it in gate).
Landed.

- Disabling voting on scenario001 and scenario004 container jobs: https://review.openstack.org/#/c/515188/
Done, please be very careful while these jobs are not voting. If in any doubt, please ping me, fultonj or gfidente on #tripleo.

- overcloudrc/keystone v2 workaround: https://review.openstack.org/#/c/515161/ (d0ugal will work on a proper fix for https://bugs.launchpad.net/tripleo/+bug/1727454)
Merged - Dougal will work on the real fix this week, but it's not urgent anymore.

- Fixing zaqar/notification issues in https://review.openstack.org/#/c/515123 - we hope that helps reduce some failures in gate.
In gate right now and hopefully merged in less than 2 hours. Otherwise, please keep rechecking it. According to Thomas Hervé, it will reduce the chance of timeouts.

- puppet-tripleo gate broken on stable branches (syntax jobs not running properly) - jeblair is looking at it now.
jeblair will provide a fix, hopefully this week, but this is not critical at this time. Thanks Jim for your help.

Once again, we'll need to retrospect and see why we reached that terrible state, but let's focus on bringing our CI back into good shape. Thanks a ton to everyone who is involved. I'm now restoring all patches that I killed from the gate. You can now recheck / rebase / approve what you want, but please save our CI resources and do it in moderation. We are not done yet. I won't declare victory: we've merged almost all our blockers, one is missing but currently in gate: https://review.openstack.org/515123 - it needs babysitting until merged. Now let's see how the RDO promotion works. We're close :-)

We also have to change the tenant rc file from overcloudrc to overcloudrc.v3 for the validate-simple role to unblock promotion on master.
I created a bug to track that problem and I'm going to post a fix soon: https://bugs.launchpad.net/tripleo/+bug/1727698

Attila
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 35) - better late than never
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Here are the topics discussed last Thursday:

* The downstream (RHEL based) Quickstart gates are down, because we had to migrate from QEOS7 to the ci-rhos internal cloud, which currently cannot support our jobs. Ronelle is going to talk to the responsible people to get the problems solved.

* Tempest is now running in more and more scenario jobs. See Emilien's email[1] for details.

* There's ongoing work from Emilien to get the upgrades job working on stable/pike. Please help with reviews to get it going.

* Most of the squad's work currently focuses on getting the periodic promotion pipeline on rdocloud working and uploading containers and images.

That's the short version; join us at the Thursday meeting or read the etherpad for more. :)

Best regards, Attila

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/121849.html
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 34)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Topics discussed:

* We talked about the balance between using openstack-infra supported vs. self hosted solutions for graphite, logservers, proxies and mirrors. Paul Belanger joined us, and the end result seemed to be that we're going to try to keep as many services under infra as we can, but sometimes the line is not so clear when we're dealing with 3rd party environments like rdocloud.

* Ronelle talked about changing the overcommit ratio on rdocloud after the analysis of our usage. This can probably be done without any issue.

* Wes added "gate-tripleo-ci-centos-7-scenario003-multinode-oooq-container" to the tripleo-quickstart-extras check and gate jobs to make sure we won't break containers and that we get some feedback on the status of the container jobs.

* RDO packaging changes are now gating with Quickstart (multinode-featureset005), though it's non-voting. It might help us prevent breakage from the packaging side.

* Promotion jobs are still not fully working on RDO Cloud, but we're working on it.

That's it for this week, have a nice weekend.

Best regards, Attila
[openstack-dev] [tripleo] How to report tripleo-quickstart results to DLRN API
Hi folks,

I'm trying to come up with a good design for $subject, and there are several different methods with pros and cons. I'd like to get your opinion about them.

For a bit of context, DLRN API[1] is a new extension of DLRN, our package and repo building solution for RDO. It's designed to be a central point of information about jobs that ran on certain hashes in various stages of testing and to handle "promotions", which are really just symlinks to certain hashes. We want to report back job results on multiple levels (upstream, RDO CI phase1 & phase2) and then use the information to promote new hashes at every stage.

If we were only interested in reporting successful runs, the solution would be fairly simple: add a reporting step at the end of the quickstart-extras.yml[2] playbook if a "report" variable is set. However, it would probably be useful in the long term to also report back failures (for statistics), and that's where things get complicated. It would be great if we could report the failed status within the same quickstart.sh run instead of having a second run, because this way we don't have to touch the shell scripts in multiple places (upstream, phase1, phase2), just get the reporting done with config file changes. This is not simple, because the Ansible play can exit at any failed task. We would need to wrap each task in rescue blocks[3] to avoid skipping the failure.

Idea #1: Create a "run successful" marker file at the reporting step, and report failure in case the file is not found (also making sure the file does not exist at the start of the run). This would still require multiple runs of ansible-playbook, but we could integrate the functionality into quickstart.sh by creating a --report option, making it available in every environment at the same time.

Idea #2: Don't fail on *any* step, just register variables and check for success. An example where we already do this is the overcloud-deploy role.
We don't fail on errors[4], but write out a file with the result and fail later[5]. We would need to do this in almost all shell parts to be reasonably certain we won't miss any failure. This requires a lot of alterations to the playbooks, and it seems a bit forced on Ansible without the use of the rescue block, which we can't put in every single task.

Idea #3: Use "post-build scripts" in the promotion jobs. We can pattern match for failed/passed jobs and report the result accordingly. The problem with this is that it's environment dependent. While we can certainly do this with post-build scripts in Jenkins Job Builder on CentOS CI, it's not clear how to solve this in Zuul queues. Probably we just need to make the shell scripts of the jobs more involved (not failing on quickstart.sh's nonzero exit). Besides these complications, it also means that we have to keep the reporting method in sync across multiple environments.

None of these solutions is ideal; let me know if you have any better design ideas. I personally think #1 might be the easiest and cleanest to implement, especially as I'm planning to introduce multiple ansible-playbook runs in quickstart.sh during the redesign of the devmode.

Best regards, Attila

[1] https://github.com/javierpena/dlrnapi_client
[2] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras.yml
[3] http://docs.ansible.com/ansible/latest/playbooks_blocks.html#error-handling
[4] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-deploy/tasks/deploy-overcloud.yml#L6
[5] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras-overcloud.yml#L32-L44
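For illustration, the marker-file flow of Idea #1 could look roughly like the sketch below. All names here (MARKER, report_result, run_and_report) are made up for the example; this is not actual quickstart.sh code, and the real playbook's final reporting task, not the wrapper, would create the marker.

```shell
#!/bin/sh
# Rough sketch of Idea #1 (marker file). Hypothetical names throughout.

MARKER="$(mktemp -d)/run_successful"

report_result() {
    # Stand-in for a real DLRN API report; just prints the status here.
    echo "reporting job result: $1"
}

run_and_report() {
    rm -f "$MARKER"            # ensure no stale marker from a previous run
    if "$@"; then
        # In the real design the playbook's last task would create the
        # marker; the wrapper does it here for brevity.
        touch "$MARKER"
    fi
    if [ -f "$MARKER" ]; then
        report_result SUCCESS
    else
        report_result FAILURE
    fi
}

run_and_report true    # simulates a run where every task passed
run_and_report false   # simulates a play that exited at a failed task
```

The point is that the reporting step only has to check for the marker's existence, so a play that dies at any task still gets reported as a failure without wrapping every task in a rescue block.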
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 33)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Topics discussed:

* We debated whether we should add an upgrades job to tripleo-quickstart-extras that would allow our IRC bot (hubbot) to report on the status of the upgrades as well, using gatestatus[1]. The upgrades jobs are not stable enough for that though.

* We had two major infra issues during the week: one was jobs not using the nodepool DNS (fixed by Sagi), the other not using the DLRN & CentOS mirrors during DLRN package building in the gates. The latter has fixes, but they are not merged yet.

* Emilien and Arx are working on adding tempest tests in place of pingtests in most of our gate jobs where it's useful. We also have quite a few jobs that don't have any validation yet.

* We decided on using a whitelist for collecting log files from /etc on the upstream jobs. This will reduce the load on the logserver.

* 3/4 node multinode jobs are almost ready; we're trying to merge the changes, just like the ones for multinic with libvirt.

* We're also working hard to get the periodic/promotion jobs working on rdocloud to increase the cadence of the promotions. We have daily standups to coordinate the work with Ronelle, Wes, John and me.

That's it for this week, have a nice weekend.

Best regards, Attila

[1] https://github.com/adarazs/gate-status
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 31)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

There are a lot of people on vacation, so this was a small meeting.

We started by discussing the hash promotions and the ways to track issues. Whether it's an upstream or an RDO promotion issue, just create a Launchpad bug against tripleo and tag it with "ci" and "alert". It will automatically get escalated and receive attention.

Gabriele gave a presentation about his current status with container building on RDO Cloud. It looks to be in good shape, however there are still bugs to iron out.

Arx explained that the scenario001 jobs are now running a tempest test as well, a good way to introduce more testing upstream, while Emilien explained that we should probably do more tempest testing on container jobs as well.

Wes brought up an issue about collecting logs during the image building process which needs attention.

That's it for this week, have a nice weekend.

Best regards, Attila
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 30) - holidays
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Discussion topics =

Wes suggested removing the verbose ansible logging from our CI runs and directing people to use ARA instead. This seems like a good solution once we get the upstream README file merged, where we can explain the changes.

There was also a discussion about having the OVB related repo handled upstream instead of on Ben's personal github account. Ronelle will start a thread about this.

We will need an upstream periodic job that runs tempest on a containerized overcloud. I added a card to keep track of this[1].

The initramfs modification patches[2] from Sagi need some eyes, please review them if you have time.

= Who's working on what? =

This is not a comprehensive status, just highlights.

John is currently working on the 3 node multinode jobs; he already added a job to the CI check, but it's not passing yet. A lot of changes are pending merge for it.

Wes is testing the multinic libvirt runs, they are not passing yet.

Gabriele is fighting with the container promotion jobs on rdocloud, using the DLRN API. It's a complex goal to achieve.

Sagi ran a SOVA design meeting, and tried to include the RDO jobs in its output. Here's the status page if you don't know it[3].

Arx was working on Tempest related issues.

Ronelle did some BMC and devmode OVB fixes, thanks! Probably a lot more, but that's what I remember.

Attila is working on DLRN API reporting/promotion on RDO CI for now, later on other systems too.

= Announcements =

A heads up: next week a significant portion of the CI Squad will be on holiday. Sagi, John and Wes will be on PTO and Ronelle will be in meetings. Gabriele and I are still here as cores if you need reviews for tripleo-ci or quickstart.

That's it, have a nice weekend.
Best regards, Attila

[1] https://trello.com/c/dvQOn9aK/297-create-an-upstream-tempest-job-that-runs-on-a-containerized-system
[2] https://review.openstack.org/#/q/topic:initramfs
[3] http://cistatus.tripleo.org/
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 28) - some announcements
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Announcements =

TripleO cores who would like to +workflow changes on tripleo-quickstart, tripleo-quickstart-extras and tripleo-ci should attend the Squad meeting to gain the necessary overview for deciding when to submit changes to these repos. This was discussed by the repo specific cores during this meeting.

In other news, the https://thirdparty-logs.rdoproject.org/ logserver (hosted on OS1) migrated to https://thirdparty.logs.rdoproject.org/ (on RDO Cloud).

= Discussion topics =

This week we had a more balanced agenda, with multiple small topics. Here they are:

* John started working on the much requested 3 node multinode feature for Quickstart. Here's his WIP change[1]. This is necessary to test HA + containers on multinode jobs.

* The OVB job transition is almost complete. Sagi was cleaning up the last few tasks: replacing the gate-tripleo-ci-centos-7-ovb-nonha-puppet-* jobs of ceph and cinder with featureset024, which deploys ceph (the former updates job), and the gate-tripleo-ci-centos-7-ovb-nonha-convergence job, which runs in experimental for the Heat repo.

* Gabriele made a nice solution to run periodic jobs on demand if necessary. The patch[2] is still not merged, but it looks promising.

* Ronelle and Gabriele continue to work on the RDO Cloud migration (both OVB and multinode). Some new and some already existing jobs have been migrated there as a test.

That's it for last week.

Best regards, Attila

[1] https://review.openstack.org/483078
[2] https://review.openstack.org/478516
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Renaming the CI jobs =

When we started the job transition to Quickstart, we introduced the concept of featuresets[1] that define a certain combination of features for each job. This seemed to be a sensible solution, as it's not practical to mention all the individual features in the job name, and short names can be misleading (for example the ovb-ha job does much more than test HA). We decided to keep the original names for these jobs to simplify the transition, but the plan is to rename them to something that will help to reproduce the jobs locally with Quickstart. The proposed naming scheme is the same as the one we're now using for the job type in project-config:

gate-tripleo-ci-centos-7-{node-config}-{featureset-config}

So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001".

The advantage is that it will be easy to reproduce a gate job on a local virthost by typing something like:

./quickstart.sh --release tripleo-ci/master \
    --nodes config/nodes/3ctlr_1comp.yml \
    --config config/general_config/featureset001.yml \

Please let us know if this method sounds like a step forward.

= PTG nomination discussion =

We discussed who to nominate to attend the PTG. We decided to nominate John, Arx and Sagi to go to the PTG. Ronelle and I are going to apply for it without the nomination as well; maybe we'll have budget to meet there and discuss CI related topics.

= Smaller items =

* We're close to finishing the transition of the "ovb-updates" promotion job; it might be the first one to get renamed by the method discussed above.
* The gate-tripleo-ci-centos-7-scenario00{1,2,3,4}-multinode-oooq-container jobs are now passing and voting (scenario001 is still not in the gate, waiting for [2] to merge).

* There were some issues found and fixed (by Sagi) on the stable branch promotion jobs.

* Gabriele keeps working on getting the promotion/periodic jobs hosted on the new RDO Cloud infrastructure, allowing us to run promotions more frequently.

* John keeps working on the libvirt multi-nic support, his patches are here[3].

Thank you for reading the summary. Have a great weekend!

Best regards, Attila

[1] https://docs.openstack.org/developer/tripleo-quickstart/feature-configuration.html
[2] https://review.openstack.org/478979
[3] https://review.openstack.org/#/q/topic:libvirt-multi-nic
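As a rough illustration of why the proposed naming scheme helps local reproduction, a job name can be composed from, and split mechanically back into, the two config file paths. The helper names below (job_name, job_configs) are made up for this sketch, and it assumes the featureset part contains no dash; real names may also carry an environment part like "ovb" in front of the node config.

```shell
#!/bin/sh
# Hypothetical sketch: mapping between the proposed job names and the
# Quickstart config files. job_name and job_configs are made-up helpers.

PREFIX="gate-tripleo-ci-centos-7"

job_name() {
    # Compose {node-config}-{featureset-config} into a job name.
    echo "${PREFIX}-${1}-${2}"
}

job_configs() {
    # Split a job name back into the --nodes and --config file paths.
    suffix="${1#${PREFIX}-}"
    featureset="${suffix##*-}"          # last dash-separated field
    nodes="${suffix%-${featureset}}"    # everything before it
    echo "config/nodes/${nodes}.yml config/general_config/${featureset}.yml"
}

job_name 3ctlr_1comp featureset001
job_configs "${PREFIX}-3ctlr_1comp-featureset001"
```

With the old names (e.g. "ovb-ha-oooq") no such mechanical mapping exists, which is exactly the motivation given above.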
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 24) - devmode issues, promotion progress
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Devmode OVB issues =

Devmode OVB (the one you launch with "./devmode.sh --ovb") is not able to deploy reliably on RDO Cloud due to DNS issues. This change[1] might help, but there are still problems.

= Promotion job changes =

Moving the promotion jobs over to Quickstart is an important but difficult goal to achieve. It would be great to never have to debug jobs from the old system again. We took the first step towards that: we retired the "periodic-tripleo-ci-centos-7-ovb-nonha" job and transitioned the "ha" one to run with Quickstart. The new job's name is "periodic-tripleo-ci-centos-7-ovb-ha-oooq" and it's already used to promote new DLRN hashes. There's still an issue with it, which is fixed in this[2] change, and it should start working properly soon (it already got through an overcloud deployment). Thanks Gabriele for leading this effort!

Migrating the remaining "periodic-tripleo-ci-centos-7-ovb-updates" job is not straightforward, as Quickstart doesn't have feature parity with this original job. The job name is misleading, as a lot of things are tested within this job. What we're missing is predictable placement, hostname mapping and predictable IPs; the actual update part we will leave to the Lifecycle team.

= Where to put tripleo-ci env files? =

Currently we're using Ben's repo[3] for the OVB environment files, while THT also has env files[4] that we don't test upstream. That's not ideal, so we started to discuss where to really store these configs and how to handle them properly. Should they be in the tripleo-ci repo? Should we have up-to-date and tested versions in THT? Can we backport those to stable branches?
We didn't really figure out the solution to this during the meeting, so feel free to continue the discussion here or next time.

Thank you for reading the summary. Have a great weekend!

Best regards, Attila

[1] https://review.openstack.org/474334
[2] https://review.openstack.org/474504
[3] https://github.com/cybertron/openstack-virtual-baremetal/tree/master/network-templates
[4] https://github.com/rdo-management/tripleo-heat-templates/tree/mgt-master/environments
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 23) - images, devmode and the RDO Cloud
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had a packed agenda and intense discussion as always! Let's start with an announcement: the smoothly named "TripleO deploy time optimization hackathon" will be held on the 21st and 22nd of June. It would be great to have the cooperation of multiple teams here. See the etherpad[1] for details.

= Extending our image building =

It seems that multiple teams would like to utilize the upstream/RDO image building process and produce images just like we do upstream. Unfortunately our current image storage systems don't have enough bandwidth (either upstream or on the RDO level) to increase the amount of images served. Paul Belanger joined us and explained the longer term plans of OpenStack infra, which would provide a proper image/binary blob hosting solution within about 6 months. In the short term, we will recreate both the upstream and RDO image hosting instances on the new RDO Cloud and test the throughput.

= Transitioning the promotion jobs =

This task still needs some further work. We're missing feature parity on the ovb-updates job. As the CI Squad is not able to take responsibility for the update functionality, we will probably migrate the job with everything but the update part and make that the new promotion job. We will also extend the number of jobs voting on a promotion, probably with the scenario jobs.

= Devmode =

Quickstart's devmode.sh seems to be picking up popularity among the TripleO developers. Meanwhile we're starting to realize the limitations of the interface it provides for Quickstart. We're going to have a design session next week on Tuesday (13th) at 1pm UTC where we will try to come up with some ideas to improve this.
Ian Main suggested defaulting devmode.sh to deploying a containerized system so that developers get more familiar with that. We agreed that this is a good idea and will follow it up with some changes.

= RDO Cloud =

The RDO Cloud transition is continuing, however Paul requested that we don't add the new cloud to the tripleo queue upstream, but rather use rdoproject's own zuul and nodepool to be a bit more independent and run it like a third party CI system. This will require further cooperation with the RDO Infra folks. Meanwhile Sagi is setting up the infrastructure needed on the RDO Cloud instance to run CI jobs.

Thank you for reading the summary. Have a great weekend!

Best regards, Attila

[1] https://etherpad.openstack.org/p/tripleo-deploy-time-hack
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 22) - Promotion Problems
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= CI Promotion problems =

The last promoted DLRN hash is from the 21st of May, so it's now 12 days old. This is mostly due to not being able to thoroughly gate everything that makes up TripleO, and we're right in the middle of the cycle where most work happens and a lot of code gets merged into every project. However, we should still try our best to improve the situation. If you're in any position to help solve our blocker problems (the bugs are announced on #tripleo regularly), please lend a hand!

= Smaller topics =

* We had a couple of issues due to trying to bump Ansible from version 2.2 to 2.3 in Quickstart. This uncovered a couple of gaps in our gating, and we decided to revert until we fix them.

* We're on track with transitioning some OVB jobs to RDO Cloud; now we need to create our infrastructure there and add the cloud definition to openstack-infra/project-config.

* We have RDO containers built on the CentOS CI system[1]. We should eventually integrate them into the promotion pipeline. Maybe use them as the basis for upstream CI runs eventually?

* Our periodic tempest jobs are getting good results on both Ocata and master; Arx keeps ironing out the remaining failures. See the current status here: [2].

* The featureset discussion is coming to an end. We have a good idea of what should go into which config file; now the cores should document that to help contributors make the right calls when creating new config files or modifying existing ones.

Thank you for reading the summary. Have a great weekend!
Best regards, Attila

[1] https://ci.centos.org/job/rdo-tripleo-containers-build/
[2] http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci?searchJob=
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 21) - Devmode OVB, RDO Cloud and config management
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Periodic & Promotion OVB jobs Quickstart transition =

We had lively technical discussions this week. Gabriele's work on transitioning the periodic & promotion jobs is nearly complete and only needs reviews at this point. We won't set a transition date for these, as these jobs failing for a few days is not really impacting folks at this point. We'll transition when everything is ready.

= RDO Cloud & Devmode OVB =

We continued planning the introduction of RDO Cloud for the upstream OVB jobs. We're still at the point of account setup.

The new OVB based devmode seems to be working fine. If you have access to RDO Cloud and haven't tried it already, give it a go. It can set up a full master branch based deployment within 2 hours, including any pending changes baked into the under & overcloud. When you have your account info sourced, all it takes is

$ ./devmode.sh --ovb

from your tripleo-quickstart repo! See here[1] for more info.

= Container jobs on nodepool multinode =

Gabriele is stuck with these new Quickstart jobs. We would need a deep dive into debugging and using the container based TripleO deployments. Let us know if you can do one!

= How to handle Quickstart configuration =

This is a never-ending topic, on which we managed to spend a good chunk of time this week as well. Where should we put various configs? Should we duplicate a bunch of variables or cut them into small files? For now it seems we can agree on 3 levels of configuration:

* nodes config (i.e. how many nodes we want for the deployment)
* environment + provisioner settings (i.e.
you want to run on rdocloud with ovb, or on a local machine with libvirt)
* featureset (a certain set of features enabled/disabled for the jobs, like pacemaker and ssl)

This seems rather straightforward until we encounter exceptions. We're going to figure out the edge cases and rework the current configs to stick to the rules.

That's it for this week. Thank you for reading the summary.

Best regards, Attila

[1] http://docs.openstack.org/developer/tripleo-quickstart/devmode-ovb.html
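To make the three levels concrete, a deployment invocation would conceptually pick one file from each level and combine them, something like the sketch below. The build_cmd helper and the --environment flag are invented for illustration; only --nodes and --config appear in the real quickstart.sh examples in these summaries, and the exact file layout is an assumption.

```shell
#!/bin/sh
# Illustrative sketch of the 3 configuration levels combined into one
# quickstart.sh invocation. build_cmd and --environment are hypothetical.

build_cmd() {
    nodes="$1"          # level 1: how many nodes we want
    environment="$2"    # level 2: environment + provisioner (ovb, libvirt...)
    featureset="$3"     # level 3: features enabled/disabled (pacemaker, ssl...)
    echo "./quickstart.sh" \
         "--nodes config/nodes/${nodes}.yml" \
         "--environment config/environments/${environment}.yml" \
         "--config config/general_config/${featureset}.yml"
}

build_cmd 3ctlr_1comp rdocloud-ovb featureset001
```

The exceptions mentioned above are exactly the cases where a variable doesn't cleanly belong to one of the three files.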
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 20)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Using RDO Cloud for OVB jobs =

We spent some time discussing the steps needed to start running a few OVB TripleO jobs on the new RDO Cloud, which seems to be in good enough shape to start utilizing it. We need to create new users for it and add the cloud definition to project-config, among other things. When all is set up, we will slowly ramp up the number of jobs running there to test the stability and find the bottlenecks.

= Old OVB jobs running without Quickstart =

There are a couple of not yet transitioned jobs still running on a few repos. We need to figure out whether those jobs are still needed and if yes, what's holding them back from transition.

= CI jobs with containers =

We talked about possible ways to update all the containers with fresh and gating packages. It's not a trivial problem and we will probably involve more container folks in it. The current idea is to create a container that could locally serve the DLRN hash packages, avoiding downloading them for each container. This will still be an IO intensive solution, but probably there's no way around it.

= Gate instability, critical bug =

The pingtest failures are still plaguing the ovb-ha job. We really need a solution for this critical bug[1], as it fails around 30 percent of the time. Please take a look if you can!

Thank you for reading the summary.

Best regards, Attila

[1] https://bugs.launchpad.net/tripleo/+bug/1680195
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 18 & 19)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

The previous week's meeting was short and focused on the transition, so I didn't send a summary for it. We also had a couple of daily sync meetings to discuss the ongoing work. Here's what happened in the last two weeks.

= Quickstart Transition Phase 2 Status =

As previously planned, we transitioned the ovb-ha and ovb-nonha jobs to run with Quickstart. Please read the details in the announcement email[1]. The jobs have been performing really well over the past few days; check the statistics here[2]. The only problem is an occasional pingtest failure, which seems to be not a Quickstart bug but a TripleO one[3]. We're still working on transitioning the periodic and promotion jobs, and we started planning "phase3", which will include the updates and upgrades jobs and the containerized undercloud job.

= Review Process Improvements =

Ronelle initiated a conversation about improving the speed of landing bigger features and changes in Quickstart. A recent example is the OVB mode for devmode.sh, which is taking a long time to get merged. Ideas about the new process can be seen on this etherpad[4].

= Image hosting issues =

We had a discussion about hosting the pre-built images for Quickstart, which has been problematic recently and results in a bad experience for first-time users. We can't get the CentOS CDN to serve up-to-date, consistent images, and we have capacity problems on images.rdoproject.org. The solution might be the new RDO Cloud, but for now we are considering having each job build the image by default. This could add some overhead, but it might save time if the download is slow, and headaches if the images are outdated.

Thank you for reading the summary. Have a good weekend!
Best regards, Attila

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/116568.html
[2] http://status-tripleoci.rhcloud.com/ and then click on "gate-tripleo-ci-centos-7-ovb-ha-oooq"
[3] https://bugs.launchpad.net/tripleo/+bug/1690109
[4] https://review.rdoproject.org/etherpad/p/rdoci-review-process
[openstack-dev] [tripleo] images.rdoproject.org / thirdparty-logs.rdoproject.org is going down for a short maintenance
I need to reboot this machine for updates/fixes. It should be a short downtime, but a few jobs/downloads might be interrupted, so I'm announcing it here for reference. This machine serves both TripleO and RDO jobs, so I'm CCing both mailing lists.

Best regards, Attila
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 17)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Our meeting was an hour later than usual, as Gabriele's Quickstart Deep Dive session was conflicting with it. The session was excellent, and if you didn't attend, I'd highly recommend watching it once the recording comes out. Meanwhile you can check out the summary here[1].

= Quickstart Transition Phase 2 Status =

We estimate that the transition of the OVB jobs will take place on the *9th of May*. The following jobs are going to switch to being run by Quickstart:

* ovb-ha
* ovb-nonha
* ovb-updates

The -oooq equivalent jobs are already running close to the final configuration, which gives us good confidence for the transition.

= Smaller topics =

* Sagi brought up that openstack-infra's image building broke us twice in the last few weeks, and it would be nice to find a solution for the problem, maybe by promoting those images too. Sagi will bring this topic up at the infra meeting.
* The OVB based devmode.sh is stuck because we can't use shade properly from the virtual environment; this needs further investigation.
* How we use featuresets: Wes brought up that we are not very consistent in using the new "featureset" style configuration everywhere. Indeed, we need to move to using them in RDO CI as well, but at least their use in tripleo-ci is consistent among the transitioned jobs.
* Wes suggested developing a rotation for watching the gating jobs, to free up developers from constantly watching them. We need to figure out a good system for this.

Thank you for reading the summary.
Best regards, Attila

[1] https://etherpad.openstack.org/p/quickstart-deep-dive
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 13)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had a meeting full of intense discussion last Thursday. Here's the summary.

= Promotion jobs and HTTP caching =

The first part was centered around trying to improve and mostly speed up the promotion process for TripleO, which has been an ongoing discussion for the last few weeks. Image building takes a long time (~30 minutes) in each promotion job, which we can avoid by having a separate job build the images. This would result in fewer job timeouts. Zuul v3 will be able to handle these kinds of job dependencies directly, but meanwhile we can probably work around it. Our contact on this work is pabelanger.

A lot of other outside queries can probably be sped up by having an infra-wide caching proxy. This might be an Apache server with mod_proxy for the short term, and an AFS mirror in the long term. This would speed up image downloads and docker registry downloads as well, making our jobs faster.

= Quickstart transition update =

The big OVB change from last week got merged; now we're checking the stability of those jobs before proceeding with the transition. We want more extensive testing before we move the voting jobs over, so we'll probably create parallel non-voting jobs this time (ha/nonha/updates + gate job), not just test through pending tripleo-ci changes. We will probably combine the former ha and nonha OVB jobs to save resources on rh1. Relevant change and discussion here[1].

We also briefly discussed how to involve more people in reviewing Quickstart changes and bring them up to speed. A deep dive session on the subject will probably be given by one of the current cores.
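For illustration, the short-term "Apache server with mod_proxy" caching idea could look something like the sketch below. The modules are real Apache httpd modules, but the ports, paths, network, and size limits are made-up placeholders, not a reviewed infra config.

```apache
# Hypothetical forward-proxy-with-disk-cache sketch (values illustrative)
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule cache_module modules/mod_cache.so
LoadModule cache_disk_module modules/mod_cache_disk.so

Listen 3128
<VirtualHost *:3128>
    # Act as a forward proxy for the CI jobs on the internal network
    ProxyRequests On
    <Proxy "*">
        Require ip 192.0.2.0/24
    </Proxy>

    # Cache fetched objects on disk so repeated image/package downloads
    # are served locally instead of crossing the internet again
    CacheQuickHandler off
    CacheEnable disk /
    CacheRoot /var/cache/httpd/proxy
    CacheMaxFileSize 1000000000
</VirtualHost>
```

Jobs would then export http_proxy pointing at this host, so identical downloads across jobs hit the disk cache.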
Best regards, Attila

[1] https://review.openstack.org/449785
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 12)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

== Gating & CDN issues ==

Last week was a rough one for the TripleO gate jobs. We fixed a couple of issues on the oooq gates handling the stable branches. This was mainly a workaround[1] for building the gated packages that existed in tripleo-ci but was missing from quickstart. We also had quite a lot of issues with gate jobs not being able to download packages[2]. Figuring out how to deal with that issue is still under way. There were quite a few more small fixes to help with the gate instability[3].

== Timestamps ==

We also added timestamps to all the quickstart deployment logs, so it is now easy to link directly to a timestamp in any of the logs; see this example[4]. It has per-second resolution and only depends on awk being present on the systems running the commands.

== Logs, postci.txt ==

Until now the postci.txt file was a bit hidden; we now copy it out to logs/postci.txt.gz in every oooq gate job. We're also working on a README-style page for the logs that could help guide newcomers in debugging common errors and finding the relevant log files. Let us know if you have further suggestions for improving the log browsing, or if you're missing some vital logs.

Some smaller discussion items:

* Due to the critical patch for OVB[5] not merging last week, we're going to push out the transition of the next batch of jobs to at least next Monday (3rd of April).
* The periodic pipeline is still not running often enough. We will probably move 3 OVB jobs to run every 8 hours as a start to increase the cadence.
* We're probably going to move to the "CI Squad" Trello board[6] from the current RDO board that we're sharing with other team(s).
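The awk-based timestamping mentioned above can be sketched as a tiny shell filter. This is only a minimal illustration of the idea (the actual quickstart implementation may differ in format and in where it hooks in): each output line gets prefixed with a per-second UTC timestamp, and only awk and date are needed.

```shell
# ts: prefix every line of stdin with a UTC timestamp (sketch, not the
# real quickstart filter). Works with any POSIX awk.
ts() {
    awk '{
        cmd = "date -u \"+%Y-%m-%d %H:%M:%S\""
        cmd | getline stamp   # run date(1) for the current line
        close(cmd)
        print stamp " | " $0
        fflush()              # keep the log stream unbuffered
    }'
}

# Example: pipe a deployment command through the filter
echo "overcloud deploy started" | ts
```

Linking to a timestamp then just means anchoring on the prefix of the relevant line, as in the linked example[4].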
Best regards, Attila

[1] https://review.openstack.org/447530
[2] https://bugs.launchpad.net/tripleo/+bug/1674681
[3] https://review.openstack.org/#/q/topic:tripleo/outstanding
[4] http://logs.openstack.org/75/446075/8/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/cb1f563/logs/undercloud/home/jenkins/install_packages.sh.log.txt.gz#_2017-03-24_21_30_20
[5] https://review.openstack.org/431567
[6] https://trello.com/b/U1ITy0cu/tripleo-ci-squad
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 11)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC (WARNING: time changed due to DST)
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

The last week was very significant in the CI Squad's life: we migrated the first set of TripleO gating jobs to Quickstart[1] and it went more or less smoothly. There were a few failed gate jobs, but we quickly patched up the problematic parts.

For "phase2" of the transition we're going to concentrate on three areas:

1) usability improvements, to make the logs from the jobs easier to browse and understand
2) making sure the speed of the new jobs is roughly at the same level as the previous ones
3) getting the OVB jobs ported as well

We use the "oooq-t-phase2"[2] gerrit topic for the changes around these areas. As the OVB related ones are kind of big, we will not migrate those jobs next week, most probably only at the beginning of the week after.

We're also trying to utilize the new RDO Cloud; hopefully we will be able to offload a couple of gate jobs to it soon.

Best regards, Attila

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/113996.html
[2] https://review.openstack.org/#/q/topic:oooq-t-phase2
[openstack-dev] [tripleo] Gating jobs are now running with Quickstart
As discussed previously in the CI Squad meeting summaries[1] and at the TripleO weekly meeting, the multinode gate jobs are now running with tripleo-quickstart. To signify the change, we added the -oooq suffix to them. The following jobs migrated yesterday evening, with more to come:

- gate-tripleo-ci-centos-7-undercloud-oooq
- gate-tripleo-ci-centos-7-nonha-multinode-oooq
- gate-tripleo-ci-centos-7-scenario001-multinode-oooq
- gate-tripleo-ci-centos-7-scenario002-multinode-oooq
- gate-tripleo-ci-centos-7-scenario003-multinode-oooq
- gate-tripleo-ci-centos-7-scenario004-multinode-oooq

For those who are already familiar with Quickstart, we introduced two new concepts:

- featureset config files, which are numbered collections of settings without node configuration[2]
- the '--nodes' option for quickstart.sh and the config/nodes files, which deal only with the number and type of nodes the deployment will have[3]

If you would like to debug these jobs, it might be useful to read Quickstart's documentation[4]. We hope the transition will be smooth, but if you have problems, ping members of the TripleO CI Squad on #tripleo.

Best regards,

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/113724.html
[2] https://docs.openstack.org/developer/tripleo-quickstart/feature-configuration.html
[3] https://docs.openstack.org/developer/tripleo-quickstart/node-configuration.html
[4] https://docs.openstack.org/developer/tripleo-quickstart/
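To illustrate the featureset/nodes split described above, here is a hypothetical pair of config files (the keys, values, and file names are made up for illustration, not the actual repo contents; see the linked docs[2][3] for the real ones). A featureset file carries only feature toggles, while a nodes file describes only the machine layout:

```yaml
# config/general_config/featureset002.yml (hypothetical)
# Features only -- no node counts or flavors here.
deploy_ssl: true
enable_pacemaker: false
run_tempest: false

# config/nodes/1ctlr_1comp.yml (hypothetical), passed via --nodes
# Node layout only -- no feature toggles here.
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: compute_0
    flavor: compute
```

The same featureset can then be combined with different node files (and vice versa), so jobs don't need one monolithic config per combination.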
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 10)
If the topics below interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

I skipped last week's summary as the CI Squad was very focused on making the Quickstart upstream job transition deadline of March 13th. Things are in good shape! I want to emphasize here how well and how hard Gabriele, Sagi and Ben worked together on the transition in the last weeks. We had daily stand-ups in the last three days instead of just the regular Thursday meeting.

Our current status is: GREEN.

We have the "oooq-t-phase1"[1] gerrit topic tracking the outstanding changes for the transition. Three of them are left unmerged, all in a very good state. This WIP change[2] pulls together all the necessary changes, and we got good results on the undercloud-only, basic multinode and scenario 1-2 jobs. We also reproduced the exact same failure as the current scenario001 job is experiencing, which is exactly what we want to see. We expect the 3rd and 4th scenarios to work similarly well, as we previously had Quickstart-only runs with them, just not through this WIP change.

After we merge [2], we can change the "job type" in project-config to "flip the switch" and have the transitioned jobs be driven by Quickstart. We're in good shape for a potential Monday transition.

Best regards, Attila

[1] https://review.openstack.org/#/q/topic:oooq-t-phase1
[2] https://review.openstack.org/431029
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 8) and Quickstart transition deadline
**IMPORTANT** We are planning to transition the first batch of jobs at the very beginning of the Pike cycle! What this means is that on, or very close to, the 10th of March we're going to switch over at least the multinode scenario jobs (1 to 5) to be driven by Quickstart, but possibly more.

As always, if these topics interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Our meeting was focused on identifying the critical path of the Quickstart TripleO CI transition, and thanks to Ronelle, we have work items labelled for it here[1]. Please take a look at that board to see what we're up to. (We're using the RDO Infra board for the transition period; later we will probably migrate to the CI Squad board completely.)

We also need to focus on Quickstart's ability to reproduce all the different scenarios on libvirt. Currently we're good, but we are adding a few features during the transition that need to work with virthosts out of the box too, like multinic deployments.

Best regards, Attila

[1] https://trello.com/b/HhXlqdiu/rdo?menu=filter&filter=label:oooq%20phase%201%20transition
Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 7)
On 02/17/2017 07:18 PM, Paul Belanger wrote:
On Fri, Feb 17, 2017 at 03:39:44PM +0100, Attila Darazs wrote:
As always, if these topics interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Was this meeting recorded in some manner? I see you are using bluejeans, but don't see any recordings of the discussion. Additionally, I am a little sad IRC is not being used for these meetings. Some of the things tripleo is doing are of interest to me, but I find it difficult to join a video session for an hour just to listen. With IRC, it is easier for me to multitask on other things, then come back and review what has been discussed.

We are not recording it for now, sorry. We are trying to keep good minutes & this summary to bridge the gap for the lack of recording or IRC meeting. We voted about it a few weeks ago and the video meeting won. We did agree to revisit the IRC option after the transition is done, as the bulk of the meetings are mostly chats about possible technical solutions for the quickstart transition rather than a classic meeting where we decide things. We're bringing those to the weekly TripleO IRC meetings.

* We discussed the state of the Quickstart based update/upgrade jobs upstream. matbu is working on them and the changes for the jobs are under review. Sagi will help with adding project definitions upstream when the changes are merged.
* John started to draft out the details of the CI related PTG sessions[1].
* A couple of us brought up reviews that they wanted merged. We discussed the reasons, and agreed that sometimes an encouraging email to the mailing list is the best way to move important or slow-to-merge changes forward.
* We talked quite a lot about log collection upstream.
Currently Quickstart doesn't collect logs exactly as upstream does, and that might be okay, as we collect more, and hopefully in an easier-to-digest format.

* However, we might collect too much, and finding your way around the logs is not that easy. So John suggested creating an HTML entry page for the jobs that points to the different possible places to find debug output.

Yes, logging was something of an issue this week. We are still purging data on logs.o.o, but it does look like quickstart is too aggressive with log collection. We currently only have 12TB of HDD space for logs.o.o and our retention policy has dropped from 6 months to 6 weeks. I believe we are going to have a discussion at the PTG about this for openstack-infra and implement some changes (caps) for jobs in the near future. If you are planning on attending the PTG, I encourage you to attend the discussions.

I won't be at the PTG this time, but maybe Emilien or John can join. With regards to space, we're going to comb through the logging and make sure we're a bit more selective about what we gather.

Attila

* We also discussed adding back debug output to elastic search, as the current console output doesn't contain everything; we log a lot of deployment output in separate log files in undercloud/home/stack/*.log
* Migration to the new Quickstart jobs will happen at or close to the 10th of March, at the beginning of the Pike cycle when the gates are still stable.

That was all for this week.
Best regards, Attila

[1] https://etherpad.openstack.org/p/tripleo-ci-roadmap
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 7)
As always, if these topics interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

* We discussed the state of the Quickstart based update/upgrade jobs upstream. matbu is working on them and the changes for the jobs are under review. Sagi will help with adding project definitions upstream when the changes are merged.
* John started to draft out the details of the CI related PTG sessions[1].
* A couple of us brought up reviews that they wanted merged. We discussed the reasons, and agreed that sometimes an encouraging email to the mailing list is the best way to move important or slow-to-merge changes forward.
* We talked quite a lot about log collection upstream. Currently Quickstart doesn't collect logs exactly as upstream does, and that might be okay, as we collect more, and hopefully in an easier-to-digest format.
* However, we might collect too much, and finding your way around the logs is not that easy. So John suggested creating an HTML entry page for the jobs that points to the different possible places to find debug output.
* We also discussed adding back debug output to elastic search, as the current console output doesn't contain everything; we log a lot of deployment output in separate log files in undercloud/home/stack/*.log
* Migration to the new Quickstart jobs will happen at or close to the 10th of March, at the beginning of the Pike cycle when the gates are still stable.

That was all for this week.

Best regards, Attila

[1] https://etherpad.openstack.org/p/tripleo-ci-roadmap
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 6)
As always, if these topics interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/
Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

We had only about half the usual attendance at our Thursday meeting, as people had conflicts and other hindrances. I joined it from an airport lobby. We still managed to do some good work.

== Task prioritization ==

Our main focus was on prioritizing the remaining tasks for the Quickstart upstream transition. There are a few high priority items which we put in the Next column on the RDO Infra board. See all the outstanding "Q to U" (Quickstart to Upstream) cards here[1]. Some of these are simple and quick low hanging fruits; a few are bigger chunks of work that need close attention, like making sure that our multinode workflow can be reproduced over libvirt for easier debugging.

== Quickstart extra roles ==

We pulled all the useful roles into the quickstart-extras repo when we created it, and it seems it might be better if a few very specialized ones lived outside of it. One example is Raul's validate-ha role, which we will split off to speed up development, as most cores are not involved in it and the gates are not testing it.

== Update on transitioning to the new Quickstart jobs ==

We will use the job type field from the upstream jobs to figure out which quickstart job config we have to use for gate jobs (not the job name). In addition to this, Gabrielle will tackle the issue of mixing the old and new jobs, and run them in parallel, letting us transition them one by one. Details in the trello card[2].

== Gating improvement ==

I was part of a meeting last week where we tried to identify problem areas in our testing, and we came to the conclusion that the ungated openstack common repo[3] is sometimes the cause of gating breaks. We should start gating it to improve upstream quickstart job stability.
Best regards, Attila

[1] https://trello.com/b/HhXlqdiu/rdo?menu=filter&filter=label:%5BQ%20to%20U%5D
[2] https://trello.com/c/dNTpzD1n
[3] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-ocata/
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 5 - for real now)
As always, if these topics interest you and you want to contribute to the discussion, feel free to join the next meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Note: for the last two weeks I used the incorrect week number. Getting back on track, this is the 5th week of 2017.

Yesterday's meeting focused almost entirely on figuring out the "feature delta" between the current TripleO CI and the functionality of our Quickstart based CI. In the spirit of aviation, we call this the "preflight checklist"[1]. It contains:

* the various variables that turn functionality on and off in upstream tests
* a short description of the variables
* a "Quickstart" section for each, describing whether it's currently supported. If yes, there's usually a link at the respective part; if not, we add a Trello card to track the work, or a bug if we plan to take care of it a bit later
* proposed new Quickstart jobs, where we combine the existing features into fewer jobs with the same coverage
* the existing upstream jobs with the features they currently cover

If you're somewhat familiar with the current CI system, please look over these and let us know if there's any mistake.

Other than this, the new nodepool and OVB Quickstart jobs are working, apart from the OVB HA job -- Sagi is working on it.

I'm not sure the link for the checklist is accessible to everyone, so I'm going to paste it here after the link. The formatting is probably not perfect, so if you can, check the google doc.
Best regards, Attila

[1] https://docs.google.com/document/d/1Mb_t5Qe-Lnh0uaXy0ubX9y4k65Q4D_aw-49eZOqoviQ/edit?pli=1#

---
TripleO CI Quickstart Transition Preflight Checklist - All the items must be ready by 10th March

This document describes:
* Existing features in the current TripleO CI
* Their support in Quickstart
* The current CI job feature sets (what features are tested in specific jobs)
* The new proposed job feature sets (to reduce the amount of jobs while keeping the same coverage)

Feature index

Each list item represents a "toci_gate_test.sh" variable that enables/disables features in the CI jobs.

* 1-CONTROLLER
  * Use 1 controller (and maybe other types of nodes)
  * Quickstart: supported
* 3-CONTROLLERS
  * Use 3 controllers (and maybe other types of nodes)
  * Quickstart: supported
* 1-COMPUTE
  * Use 1 compute node (and maybe other types of nodes)
  * Quickstart: supported
* 1-CEPH
  * Use 1 ceph node (and maybe other types of nodes)
  * Quickstart: supported
* CONTAINERS
  * Container based undercloud (feature under development)
  * Container based overcloud
  * Quickstart: supported (undercloud in progress)[a]
* DISABLE_IRONIC
  * This is just a label that denotes the ability to skip ironic steps entirely during multinode based jobs
  * Quickstart: not needed, implemented elsewhere
* DELOREAN_HTTP_URLS
  * We can't use squid to cache https urls, so don't use them
  * Quickstart: make sure we use only http in build-test-package [card]
* IRONIC_DRIVERS_DEPLOY_HTTPPORT
  * Sets the http port unconditionally to ironic::drivers::deploy::http_port: 3816 in undercloud overrides
  * Quickstart: not supported [card]
* IMAGE_CACHE_SELECT
  * This feature enables selecting whether or not to use an image from cache when gating a specific project (i.e.
projects that alter the image creation process)
  * TODO(gcerami): propose a change to unify the list of projects for every image to build (specific project gated -> all images will be recreated from scratch)
  * Quickstart: work in progress (trown, image build role) [card]
* IMAGE_CACHE_UPLOAD
  * Ability to promote an image, uploading it to the image cache server
  * We can leave the current implementation in bash, but work on which job type combination will activate the upload
  * Quickstart: not needed, can be handled by the current script
* INTROSPECT
  * Perform overcloud nodes introspection
  * Quickstart: supported, but we are still performing bulk introspection while we should use the new format as in http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/tripleo.sh#n608 instead of https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2#L90
  * [card]
* METRICS
  * TripleO CI is sprinkled with metric sections, surrounded with start_metric - stop_metric primitives that gather section duration information throughout various steps of the deployment (they really just set timers). These metrics are then sent to a graphite host for graph rendering at the end of the run
  * Quickstart: not needed
* MULTINODE_SETUP
* MULTINODE_NODES_BOOTSTRAP
  * Multiple nodes are consumed from the openstack nodes pool
  * A setup to create a network between nodepool nodes is needed
  * All nodes must contain proper nodepool configurations in /etc/nodepool
  * N
Re: [openstack-dev] [TripleO] Proposing Sergey (Sagi) Shnaidman for core on tripleo-ci
On 02/01/2017 08:37 PM, John Trowbridge wrote:
On 01/30/2017 10:56 AM, Emilien Macchi wrote:
Sagi, you're now core on the TripleO CI repo. Thanks for your hard work on the tripleo-quickstart transition, and also for helping keep CI in good shape; your work is amazing! Congrats! Note: I couldn't add you to the tripleo-ci group, only to tripleo-core (Gerrit permissions), which means you can +2 everything, but we trust you to use it only on tripleo-ci. I'll figure out the Gerrit permissions later.

I also told Sagi that he should feel free to +2 any tripleo-quickstart/extras patches which are aimed at transitioning tripleo-ci to use quickstart. I didn't really think about this as an extra permission, as any tripleo core has +2 on tripleo-quickstart/extras. However, I seem to have surprised the other quickstart cores with this. None were opposed to the idea, but they just wanted to make sure that it was clearly communicated that this is allowed. If there is some objection to this, we can consider it further. FWIW, Sagi has been consistently providing high quality critical reviews for tripleo-quickstart/extras for some time now, and was pivotal in the setup of the quickstart based OVB job.

Thanks for the clarification. And +1 on Sagi as a quickstart/extras core. I really appreciate his critical eyes on the changes.

Attila
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 5)
In the spirit of "better late than never", here's a summary of our CI Squad meeting.

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

== Configuration management in TripleO CI ==

There was a design meeting organized by Gabrielle (thanks!) to discuss how to solve the problem of configuring the new Quickstart jobs. There are multiple approaches to this problem, and it's difficult to find a balance between having a single definite config file per job (too much duplication) and parsing the job name, handling every option with coding logic in the testing system (hard to reproduce/know what's happening).

What emerged from the still ongoing discussion is identifying three config sections:

* provisioner: e.g. libvirt, OVB or nodepool based jobs and the related configuration to allow quickstart to work on these systems
* release: one config file per release, config/release (we already have these)
* config: one config file for general config, config/general_config

It seems useful to give a neutral name to a certain set of functionalities tested in certain CI jobs, instead of the misleading names of "ha/nonha" (when in fact they test a lot more). "featureset01", "featureset02", etc. look like good candidates for naming them. So we could end up with jobs like "tripleo-ovb-featureset01-newton", with the featureset matrix documented somewhere like tripleo-ci.

== Smaller topics ==

* Both the OVB and nodepool jobs are working, apart from generic setbacks like the python-pandas issue breaking our jobs.
* Our blocker/bottleneck for the transition is now the above discussed configuration management.
* The "Quickstart transition checklist" is now hosted on google docs here[1].
* We are having trouble keeping track of the issues in upstream CI. Using an individual trello board instead of the current etherpad was suggested. We're going to try this solution out this week and post updates.
* Emilien mentioned the new additional composable upgrades testing in TripleO CI[2].
* We had a bug triage/squashing event last Friday. We started moving bugs from the "tripleo-quickstart" project to "tripleo" and tagging them as ci/quickstart, to ease the scheduling of bugs.
* We also managed to make a big improvement on the tripleo-quickstart bug count, going from 65 open bugs to 42, and from 37 new bugs to 21.

Full meeting minutes can be found here: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Best regards,
Attila

[1] https://docs.google.com/document/d/1Mb_t5Qe-Lnh0uaXy0ubX9y4k65Q4D_aw-49eZOqoviQ/edit?pli=1
[2] https://review.openstack.org/425727

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
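As an aside, the "parse the job name" approach discussed above (the one the config-file split tries to avoid overusing) could be sketched roughly like this. This is a hypothetical illustration only, not actual tripleo-ci code; the config/provisioner path and the exact job-name shape are assumptions, while config/release and config/general_config come from the discussion above:

```python
# Hypothetical sketch: split a CI job name such as
# "tripleo-ovb-featureset01-newton" into the three config sections
# discussed in the meeting. Not actual tripleo-ci code; the
# config/provisioner directory is an assumed example.

def parse_job_name(job_name):
    """Map a job name to provisioner/featureset/release config files."""
    # Assumed shape: tripleo-<provisioner>-<featureset>-<release>
    _, provisioner, featureset, release = job_name.split("-")
    return {
        "provisioner": "config/provisioner/%s.yml" % provisioner,
        "featureset": "config/general_config/%s.yml" % featureset,
        "release": "config/release/%s.yml" % release,
    }

print(parse_job_name("tripleo-ovb-featureset01-newton"))
```

The fragility of this scheme (every new option leaks into the job name) is exactly why the discussion leans toward a small set of named featureset files instead.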
[openstack-dev] [tripleo] CI Squad Meeting Summary (week 4)
Everybody interested in the TripleO CI and Quickstart is welcome to join the weekly meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Here's this week's summary:

* There aren't any blockers or bottlenecks slowing down the transition to the Quickstart based CI. We're right on track.
* The Quickstart OVB jobs are running stably. Yesterday they broke due to a tripleo-ci change, but Sagi fixed them today.
* The Quickstart multinode nodepool job is also working well. It's a good basis for extending our feature coverage.
* The ovb-ha-oooq-nv and nonha-multinode-oooq-nv jobs are moving into the check-tripleo queue to make sure we catch any change that breaks these new jobs[1].
* A few Quickstart log collection usability improvements are on the way: soon all the text based logs are going to be renamed to end in txt.gz, making them browsable from the log servers[2]. Also, the log collection output will go into a log file instead of being dumped on the console.
* We are trying to reduce the number of unnecessary OVB jobs by limiting the files we trigger on, but openstack-infra doesn't like our current approach[3]. We brainstormed about alternative solutions (see the meeting minutes for details).
* Ben Kero proposed a PTG CI session about the CI moving to use Quickstart. Emilien suggested creating a second one about reusing the scenario jobs for container tests.
* There's a draft of the "pre-flight check list"[4] for the CI transition made by Gabrielle to make sure the Quickstart based jobs will have the same coverage as the current CI system, or better.
* We are going to have a design session about the handling of the config files for these new jobs on Wednesday the 25th, 15:00 UTC.
The full meeting minutes are here: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Best regards,
Attila

[1] https://review.openstack.org/422646
[2] https://review.openstack.org/422638
[3] https://review.openstack.org/421525
[4] https://etherpad.openstack.org/p/oooq-tripleo-ci-check-list
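The txt.gz renaming mentioned in the week 4 summary could be sketched roughly like this. This is a hypothetical illustration of the idea (gzip a text log and give it a .txt.gz suffix so log servers serve it as browsable text), not the actual quickstart collect-logs implementation:

```python
# Hypothetical sketch of the log-renaming idea: gzip each text log
# and suffix it with .txt.gz so web servers serve it as browsable
# text instead of forcing a download. Not the actual quickstart
# collect-logs role.
import gzip
import os
import shutil

def compress_log(path):
    """Compress a text log to <name>.txt.gz and remove the original."""
    dest = path + ".txt.gz"
    with open(path, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    os.remove(path)
    return dest
```

The point of the double suffix is that log servers typically serve *.gz with Content-Encoding: gzip, so a .txt.gz file decompresses in the browser and renders as plain text.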
[openstack-dev] [tripleo] CI Squad Meeting Summary
We had our first meeting as the CI Squad today. We re-purposed our "Quickstart to Upstream Transitioning" meeting into the Squad meeting, so the topics were and will be focused on the transition for the next month or so.

Everybody interested in the TripleO CI and Quickstart is welcome to join the meeting:

Time: Thursdays, 15:30-16:30 UTC
Place: https://bluejeans.com/4113567798/

Meeting summary:

* The Quickstart transition does not seem to have any blockers or bottlenecks for the time being. We are on track to be ready at the end of the Ocata cycle and switch at the beginning of Pike.
* The new Quickstart based OVB jobs are working consistently and reliably, though the tinyrpc.server bug affected them just like the regular jobs.
* We got our first overcloud deploy today on the Quickstart based nodepool multinode jobs, big kudos to bkero and trown for making it happen. The verification part still needs work though.
* The experimental queue is overloaded; we will move OVB workloads to the multinode jobs once we get them working reliably.
* We might want to change the experimental OVB jobs to run container and composable upgrade jobs instead for increased coverage, but we need more input on that; we will discuss it at the next TripleO meeting.
* Selecting the proper configuration to use for each gate/periodic job is not optimal; we should rethink the system during the transition to Quickstart. This needs a design session. We also need an up-to-date job type / functionality matrix like this[1].
* We voted to keep the meetings on BlueJeans and at 15:30 UTC.
* We will keep tracking the transition work on the RDO Infra Trello board with the [Q to U] tags, and maybe move to the CI Squad board on Trello later. We should also look at StoryBoard and see if it would be useful to switch to.
Full meeting minutes and notes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

Best regards,
Attila

[1] https://review.openstack.org/#/c/399269/7/README.md
Re: [openstack-dev] [tripleo] [ci] TripleO-Quickstart Transition to TripleO-CI Update and Invite:
On 01/04/2017 10:34 AM, Steven Hardy wrote:
> Hi Harry,
>
> On Tue, Jan 03, 2017 at 04:04:51PM -0500, Harry Rybacki wrote:
>> Greetings All,
>>
>> Folks have been diligently working on the blueprint[1] to prepare TripleO-Quickstart (OOOQ)[2] and TripleO-Quickstart-Extras[3] for their transition into TripleO-CI. Presently, our aim is to begin the actual transition to OOOQ on 4-Feb-2017. We are tracking our work on the RDO-Infra Trello board[4] and holding public discussion of key blockers on the team's scrum etherpad[5].
>
> Thanks for the update - can you please describe what "transition into TripleO-CI" means?

Hello Steve,

This means we're trying to run all the gate jobs with Quickstart and make sure we have the same features enabled and the same results for each existing gate job.

> I'm happy to see this work proceeding, but we have to be mindful that the end of the development cycle (around the time you're proposing) is always a crazy-busy time where folks are trying to land features and fixes. So, we absolutely must avoid any CI outages around this time, thus I get nervous talking about major CI transitions around the release-candidate weeks ;)
>
> https://releases.openstack.org/ocata/schedule.html
>
> If we're talking about getting the jobs ready, then switching over to primarily oooq jobs in early Pike, that's great, but please let's ensure we don't make any disruptive changes before the end of this (very short and really busy) cycle.

As I see it, early Pike is only 2 weeks away from our planned switch, so it might indeed be wiser to delay it. The end-of-cycle stability might even be useful for us, letting us run some new jobs in parallel for a while if we have enough resources.

>> We are hosting weekly transition update meetings (1600-1700 UTC) and would like to invite folks to participate. Specifically, we are looking for at least one stakeholder in the existing TripleO-CI to join us as we prepare to migrate OOOQ. Attend and map out job/feature coverage to identify any holes so we can begin plugging them.
>> Please reply off-list or reach out to me (hrybacki) on IRC to be added to the transition meeting calendar invite.
>
> Why can't we discuss this in the weekly TripleO IRC meeting? I think folks would be fine with having a standing item where we discuss this transition (there is already a CI item, but I've rarely seen this topic raised there).

I agree that we should have a standing item about this in the TripleO meeting; however, this transition meeting usually takes an hour a week in itself, so we cannot really fit it into the TripleO meeting.

The reason we ask for somebody well versed in the TripleO CI to join us is that we might get answers to questions we didn't even know we had. There are probably shortcuts and known workarounds to what we're trying to achieve in the upstream system that we're not familiar with. Also, the discussion is focused on Quickstart (for example how to develop roles that unify different workloads like OVB and nodepool), so it wouldn't be entirely relevant for the TripleO meeting. Thus the request still stands; I think we could get a lot of help from somebody familiar with the CI system. This should be a once-a-week meeting for only the following 3-6 weeks. We will give a short status update on the current state of the transition at the TripleO meetings from now on, though.

Thank you for your thoughts,
Attila

> https://wiki.openstack.org/wiki/Meetings/TripleO
>
> Thanks!
>
> Steve
[openstack-dev] [TripleO] Make "RedHat RDO CI" vote on tripleo-quickstart changes?
I think third-party gating is solid enough to make it vote, not just post results. Currently we have a gate job breaking, and I almost submitted something without everything passing. It would be useful to make it blocking. Is there any reason not to do this?

Attila