Re: [openstack-dev] [tripleo] tripleo gate is blocked - please read

Bogdan Dobrelya Thu, 14 Jun 2018 02:48:12 -0700

On 6/14/18 3:50 AM, Emilien Macchi wrote:

TL;DR: gate queue was 25h+, we put all patches from gate on standby, do not restore/recheck until further announcement.
We recently enabled the containerized undercloud for multinode jobs, and we believe this was a bit premature: the container download process wasn't optimized yet, so it pulls the same containers from the mirrors multiple times. This increased the job runtime and probably slowed down the docker.io mirrors hosted by OpenStack Infra, which had to serve the same containers multiple times. The time taken to prepare containers on the undercloud and then for the overcloud caused jobs to time out randomly, and therefore the gate to fail very often. So we decided to remove all jobs from the gate by temporarily abandoning the patches (I have them in my browser and will restore them when things are stable again; please do not touch anything).
Steve Baker has been working on a series of patches that optimize the way we prepare the containers. Basically, the workflow will be:
- pull containers needed for the undercloud into a local registry, using infra mirror if available
- deploy the containerized undercloud
- pull containers needed for the overcloud minus the ones already pulled for the undercloud, using infra mirror if available
- update containers on the overcloud
- deploy the overcloud
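
As a rough illustration only (this is not taken from Steve's patches), the "minus the ones already pulled" step in the workflow above amounts to a set difference between the two image lists. The function name and the image names below are made up for the example.

```python
def plan_pulls(undercloud_images, overcloud_images):
    """Return (undercloud_pulls, overcloud_pulls) so that no image is pulled twice."""
    # Everything the undercloud needs goes into the local registry first.
    undercloud_pulls = sorted(set(undercloud_images))
    already_local = set(undercloud_pulls)
    # The overcloud only pulls what is not in the local registry yet.
    overcloud_pulls = sorted(set(overcloud_images) - already_local)
    return undercloud_pulls, overcloud_pulls


# Hypothetical image names, for illustration only.
undercloud = ["keystone", "mariadb", "heat-engine", "ironic-api"]
overcloud = ["keystone", "mariadb", "nova-compute", "neutron-server"]

uc, oc = plan_pulls(undercloud, overcloud)
print("undercloud pulls:", uc)
print("overcloud pulls:", oc)
```

In this toy case the overcloud step pulls two images instead of four, which is the kind of saving the optimization is after.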

Let me also note that it may be time to introduce job dependencies [0]. Dependencies might somewhat alleviate registry/mirror DoS issues, like the one we currently have, by running jobs in batches rather than firing them all off at once.

We still have options to think about. The undercloud deployment takes longer than standalone, but it provides better coverage and therefore better predicts (and cuts off) future overcloud failures for the dependent jobs. Standalone is still less stable, though. The containers update check may also be an option for step 1 or step 2, before the remaining multinode jobs execute.

Skipping those dependent jobs, in turn, reduces the DoS effects on registries and mirrors.
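
To make the idea concrete, here is a minimal sketch of dependency-gated jobs. It is plain Python, not CI configuration, and the job names and the run_job callback are hypothetical. A job only runs if everything it depends on succeeded; otherwise it is skipped and never touches the registries or mirrors.

```python
def run_pipeline(jobs, depends_on, run_job):
    """Run jobs in the given order; skip a job unless all its dependencies succeeded.

    jobs must be listed so that dependencies come before their dependents.
    """
    results = {}
    for job in jobs:
        deps = depends_on.get(job, [])
        if all(results.get(dep) == "SUCCESS" for dep in deps):
            results[job] = "SUCCESS" if run_job(job) else "FAILURE"
        else:
            # A dependency failed or was skipped, so this job never starts.
            results[job] = "SKIPPED"
    return results


# Hypothetical job names: one cheap first-step job gating two multinode jobs.
jobs = ["undercloud-check", "multinode-scenario-a", "multinode-scenario-b"]
depends_on = {
    "multinode-scenario-a": ["undercloud-check"],
    "multinode-scenario-b": ["undercloud-check"],
}

# Pretend the first-step job fails; the dependent jobs are then skipped.
results = run_pipeline(jobs, depends_on, lambda job: job != "undercloud-check")
print(results)
```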

[0] https://review.openstack.org/#/q/status:open+project:openstack-infra/tripleo-ci+topic:ci_pipelines

With that process, we hope to reduce the runtime of the deployment and therefore reduce the timeouts in the gate. To enable it, we need to land in that order: https://review.openstack.org/#/c/571613/, https://review.openstack.org/#/c/574485/, https://review.openstack.org/#/c/571631/ and https://review.openstack.org/#/c/568403.
In the meantime, we are disabling the recently enabled containerized undercloud on all scenarios (https://review.openstack.org/#/c/575264/) as a mitigation, hoping to stabilize things until Steve's patches land. Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the containerized undercloud on scenarios after checking that we don't have timeouts and that deployment runtimes are reasonable.
That's the plan we came up with; if you have any questions or feedback, please share them.
--
Emilien, Steve and Wes

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--
Best regards,
Bogdan Dobrelya,
Irc #bogdando
