Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

Jeremy Stanley Mon, 14 May 2018 09:38:09 -0700

On 2018-05-14 09:57:17 -0600 (-0600), Wesley Hayutin wrote:
> On Mon, May 14, 2018 at 10:36 AM Jeremy Stanley <[email protected]> wrote:
[...]
> > Couldn't a significant burst of new packages cause the same
> > symptoms even without it being tied to a minor version increase?
> 
> Yes, certainly this could happen outside of a minor update of the
> base OS.


Thanks for confirming. So this is not specifically a CentOS minor
version increase issue, it's just more likely to occur at minor
version boundaries.

> So the only thing out of our control is the package set on the
> base nodepool image. If that suddenly gets updated with too many
> packages, then we have to scramble to ensure the images and
> containers are also updated.

It's still unclear to me why the packages on the test instance image
(i.e. the "container host") are related to the packages in the
container guest images at all. That would seem to be the whole point
of having containers?

> If there is a breaking change in the nodepool image for example
> [a], we have to react to and fix that as well.

I would argue that one is a terrible workaround which happened to
show its warts. We should fix DIB's pip-and-virtualenv element
rather than continue to rely on side effects of pinning RPM versions.
I've commented to that effect on https://launchpad.net/bugs/1770298
just now.

> > It sounds like a problem with how the jobs are designed
> > and expectations around distros slowly trickling package updates
> > into the series without occasional larger bursts of package deltas.
> > I'd like to understand more about why you upgrade packages inside
> > your externally-produced container images at job runtime at all,
> > rather than relying on the package versions baked into them.
> 
> We do that to ensure the gerrit review itself and its
> dependencies are built via rpm and injected into the build. If we
> did not do this the job would not be testing the change at all.
> This is a result of being a package based deployment for better or
> worse.
[...]

Now I'll risk jumping to proposing solutions, but have you
considered building those particular packages in containers too?
That way they're built against the same package versions as will be
present in the other container images you're using, rather than
against the package versions on the host, right? Seems like it would
completely sidestep the problem.

> An enhancement could be to stage the new images for say one week
> or so. Do we need the CentOS updates immediately? Is there a
> possible path that does not create a lot of work for infra, but
> also provides some space for projects to prep for the consumption
> of the updates?
[...]

Nodepool rebuilds images continually, at least once a day. Part of
this is to keep the available packages/indices and other files
baked into those images from ever being more than a day or so stale.
The older the image, the more packages (on average) jobs will need
to download if they want to test with the latest package versions,
and the more strain that puts on our mirrors and on our bandwidth
quotas/donors' networks.

There's also a question of retention if we're building images at
least daily but keeping them around for 7 days: storage on the
builders and tenant quotas for Glance in our providers, as well as
the explosion of additional nodes we'd need, since we pre-boot nodes
with each of our images (and, as I understand it, the idea is that
you would want jobs to be able to select between any of them). One
option, I suppose, would be to switch to building images weekly
instead of daily, but that only solves the storage and node count
problem, not the additional bandwidth and mirror load. And of
course, nodepool would need to learn to boot nodes from older
versions of an image on record, which is not a feature it has right
now.
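A rough back-of-the-envelope sketch of the retention arithmetic
above, assuming every retained image version is uploaded to every
provider and keeps its own pool of pre-booted nodes so jobs can pick
any of them. The function and its parameters are hypothetical
illustrations, not nodepool configuration options:

```python
# Hypothetical model only: these parameters are NOT nodepool config
# options; they just illustrate how retention scales per image label.
def retention_cost(builds_per_week: int, retention_days: int,
                   providers: int, min_ready_per_image: int) -> dict:
    """Estimate image versions, provider uploads and pre-booted nodes
    for one label if jobs may select any retained image version."""
    # Every build inside the retention window is kept around.
    versions_kept = max(1, builds_per_week * retention_days // 7)
    # Each retained version counts against Glance quota in every provider.
    uploads = versions_kept * providers
    # A ready pool per selectable version multiplies pre-booted nodes.
    ready_nodes = versions_kept * min_ready_per_image
    return {"versions_kept": versions_kept,
            "uploads": uploads,
            "ready_nodes": ready_nodes}


if __name__ == "__main__":
    # Daily builds kept for a week vs. keeping only the latest build.
    print(retention_cost(7, 7, providers=3, min_ready_per_image=2))
    print(retention_cost(7, 1, providers=3, min_ready_per_image=2))
```

With daily builds kept for 7 days, 3 providers and 2 ready nodes per
version, this gives 7 versions, 21 uploads and 14 ready nodes per
label, versus 1, 3 and 2 when only the latest build is kept. It
deliberately models nothing about bandwidth or mirror load, which
switching to weekly builds would not reduce.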

> Understood, I suspect this will become a more widespread issue as
> more projects start to use containers (not sure).

I'm still confused as to what makes this a container problem in the
general sense, rather than just a problem (leaky abstraction) with
how you've designed the job framework in which you're using them.

> It's my understanding that there are some mechanisms in place to
> pin packages in the centos nodepool image so there has been some
> thoughts generally in the area of this issue.
[...]

If this is a reference back to bug 1770298, as mentioned already, I
think that's a mistake in diskimage-builder's stdlib which should be
corrected, not a pattern we should propagate.
-- 
Jeremy Stanley


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
