Re: Improving CI throughput

Mathieu Othacehe Tue, 25 Aug 2020 06:33:08 -0700

Hey,

> Yeah, this is a ridiculous situation.  We should do a hackathon to get
> better monitoring of useful metrics (machine load,
> time-of-push-to-time-to-build-completion, etc.), to clearly identify the
> bottlenecks (crashes? inefficient protocol? scheduling issues? Cuirass
> or offload or guix-daemon issue?), and to address as many of them as we
> can.
>
> Any volunteers?  :-)

I'd really like to improve the situation! A hackathon seems like a
nice idea.

As a matter of fact, I have already spent some time improving the
stability of the Cuirass web interface[1].

I see several topics that could be tackled in parallel:

* Add metrics to Cuirass as you suggested. There's an open ticket about
  that here[2].

* Investigate offloading issues[3].

* Fix database contention[4].

* Fix guix-daemon deadlocking[5].

* Closely monitor what is happening on Berlin and decide whether it
  would be worthwhile to add a build scheduling mechanism somewhere. See
  what Hydra is doing[6] and what Chris is proposing[7].

As most of these issues are only observed on the Berlin machines, to
which access is restricted, we will also have to find a way to
reproduce them locally.

Anyway, if some people are motivated, we could try to plan a day or a
weekend to work on those topics :).

Thanks,

Mathieu

[1]: https://issues.guix.gnu.org/42548
[2]: https://issues.guix.gnu.org/32548
[3]: https://issues.guix.gnu.org/34033
[4]: https://issues.guix.gnu.org/42001
[5]: https://issues.guix.gnu.org/31785
[6]: https://github.com/NixOS/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc
[7]: https://lists.gnu.org/archive/html/guix-devel/2020-04/msg00323.html