Re: [Mesa-dev] postmortem: arm64_test job timeouts today

2020-07-20 Michel Dänzer
On 2020-07-18 2:11 a.m., Eric Anholt wrote:
> 
> - https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5669 would
> give us twice the -j flags for our builds on fd.o's x86 runners (such
> as for the arm64_test job)

Landing this should be straightforward.


> However, I have no solution for the general problem of "users can
> merge code that causes failing container builds for others."  Could we
> make ci-templates not use registry-cached containers in marge-bot
> pipelines,

You mean something like
https://gitlab.freedesktop.org/freedesktop/ci-templates/-/issues/14#note_571088
?


> and then replicate the image up to mesa/mesa somehow?

This would require ci-templates to enforce that the image in the forked
registry is up to date when merging. That was my original idea for the
ci-templates issue above, but there was resistance. Maybe the benefit of
not also having to rebuild the image in the main project could change
things, though.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] postmortem: arm64_test job timeouts today

2020-07-17 Eric Anholt
With the landing of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5839 we
entered a state that caused future pipelines to fail.

This is due to an unfortunate interaction between GitLab MRs and
ci-templates' model of container image distribution: MRs are tested in
the submitter's repository, but ci-templates only replicates container
images from mesa/mesa to user repositories.  So, if someone has ever
uploaded a container image to their repo under a tag that can pass the
tests, they can land code that makes all future pipelines fail.  In
this case, arm64_test was running close to the pipeline timeout and was
failing for most people, including marge, so marge's queue ended up
quite backed up.
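To make the failure mode concrete, here is a hypothetical Python model of how a pipeline in a user's fork might pick its container image. The lookup order (fork registry first, then mesa/mesa, then a fresh build), the dict-based registries, and the tag name are all assumptions for illustration; this is not ci-templates' actual code.

```python
from typing import Callable, Dict

# Hypothetical model: a registry maps an image tag to a description of its contents.
Registry = Dict[str, str]


def resolve_image(tag: str, user_registry: Registry, upstream_registry: Registry,
                  build: Callable[[str], str]) -> str:
    """Return the image a pipeline in a user's fork would run against.

    Assumed lookup order (a simplification, not ci-templates' real logic):
    1. reuse an image already in the user's registry under this tag;
    2. otherwise copy it from the upstream (mesa/mesa) registry;
    3. otherwise build it from the current container scripts.
    """
    if tag in user_registry:
        # Cached image is reused as-is, even if the scripts have since changed.
        return user_registry[tag]
    if tag in upstream_registry:
        # Replication only flows downstream, from upstream into the fork.
        user_registry[tag] = upstream_registry[tag]
        return user_registry[tag]
    image = build(tag)
    user_registry[tag] = image
    return image


def build_from_current_scripts(tag: str) -> str:
    # Hypothetical stand-in for building with the merged container scripts.
    return "new image (arm64_test times out)"


# The submitter's fork still holds an image built earlier under the same tag.
fork = {"debian-arm64:hypothetical-tag": "old image (passes arm64_test)"}
upstream: Registry = {}

# The MR pipeline in the fork reuses the stale cached image, so it passes...
print(resolve_image("debian-arm64:hypothetical-tag", fork, upstream,
                    build_from_current_scripts))
# prints "old image (passes arm64_test)"

# ...but a registry with no cached copy builds from the merged scripts.
print(resolve_image("debian-arm64:hypothetical-tag", {}, upstream,
                    build_from_current_scripts))
# prints "new image (arm64_test times out)"
```

In this model the MR pipeline never exercises the merged container scripts, because the fork's cached image is found first; only a registry without a cached copy does.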

There are a few things we can do to mitigate this particular job's timeouts:

- https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5669 would
give us twice the -j flags for our builds on fd.o's x86 runners (such
as for the arm64_test job)
- https://gitlab.freedesktop.org/mesa/mesa/-/issues/3123 would let us
move back to Debian testing or unstable for the test images, and use
more Debian packages (like apitrace) instead of hand-building them in
our CI system
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962718 would let
us cut a large portion of the test container build time

However, I have no solution for the general problem of "users can
merge code that causes failing container builds for others."  Could we
make ci-templates not use registry-cached containers in marge-bot
pipelines, and then replicate the image up to mesa/mesa somehow?