Re: [RFC] Refactoring OpenWrt's build infra
On 16-10-22, Christian Marangi wrote:
> On Sun, Oct 16, 2022 at 02:07:05PM +0200, Baptiste Jonglez wrote:
> > - either buildbot can run latent workers with a different Docker image
> >   depending on the build
>
> IMHO, this would be the safest and best solution to the problem. But
> this means that we will have to support two things instead of having one
> centralized container.

I'm not even sure Buildbot is able to do that :)  But if it is, and the only
change between worker images is the version of the base image (e.g. Debian),
then that sounds manageable.

> It would be ideal to have one centralized dl/ dir where each runner can go
> and take the files. We already support that in openwrt (having a
> different dl dir), and there isn't any problem with having different
> release tarballs for the same package.

I had tried to share dl/ across several worker containers on the same
physical machine, but there are race conditions that make it not so easy to
do. I fixed one issue [1] but there was another that I couldn't track down.

We could use object storage, for instance from DigitalOcean [2]. It would
allow all workers to read from and write to the same shared storage for
download files, and get (hopefully) good download performance. With that in
place, we could also prune dl/ very aggressively to save disk space.

Baptiste

[1] https://git.openwrt.org/d4c957f24b2f76986378c6d9
[2] https://www.digitalocean.com/products/spaces
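For reference, Buildbot's DockerLatentWorker accepts a renderable for its
image argument, so in principle the worker image can follow the branch a
build request is for. Below is a minimal master.cfg sketch of that idea;
the image names, tags and branch mapping are made-up placeholders, not the
real OpenWrt worker images.

```python
# master.cfg fragment (sketch only): image names, tags and the branch
# mapping below are hypothetical, not the real OpenWrt worker images.
from buildbot.plugins import util, worker

@util.renderer
def worker_image(props):
    # Pick the worker image from the branch of the build being run.
    branch = props.getProperty('branch') or 'master'
    if branch.startswith('openwrt-19.07'):
        return 'openwrt/buildbot-worker:debian-9'
    if branch.startswith('openwrt-21.02'):
        return 'openwrt/buildbot-worker:debian-10'
    return 'openwrt/buildbot-worker:debian-11'

c['workers'] = [
    worker.DockerLatentWorker(
        'docker-worker-01',
        'worker-password',
        docker_host='tcp://docker-host.example.org:2375',
        image=worker_image,        # rendered when the worker substantiates
        followStartupLogs=True,
    ),
]
```

Whether latent workers get the branch property early enough for this to work
in our setup would still need to be verified on a test master.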
Re: [RFC] Refactoring OpenWrt's build infra
On Sun, Oct 16, 2022 at 02:07:05PM +0200, Baptiste Jonglez wrote:
> Hi,
>
> On 05-10-22, Thibaut wrote:
> > Hi,
> >
> > Following an earlier conversation on IRC with Petr, I'm willing to work on
> > refactoring our buildbot setup as follows:
> >
> > - single master for each stage (images and packages)
> > - latent workers attached to either master, thus able to build
> >   opportunistically from either master or release branches as needed / as
> >   work becomes available
>
> This is a good idea, but I see one main downside: we would probably have
> to use the same buildbot worker image for all releases.
>
> From what I remember, when the worker image was updated from Debian 9 to
> Debian 10, this seriously broke 19.07 builds. Maybe Petr or Jow will
> remember the details better.
>
> I see two ways to address this:
>
> - either buildbot can run latent workers with a different Docker image
>   depending on the build

IMHO, this would be the safest and best solution to the problem. But this
means that we will have to support two things instead of having one
centralized container.

> - otherwise, we have to think early about the update strategy. Maybe use
>   the shared buildbot instance for the master branch + most recent release
>   only, and move older releases back to a dedicated buildbot instance?
>
> > The main upside is that all buildslaves could be pooled, improving overall
> > throughput and reducing wasted « idle time », thus lowering build times and
> > operating costs.
> >
> > Petr also suggested that extra release workers could be spawned at will
> > (through e.g. cloud VMs) when a new release is to be tagged; tagged releases
> > could be scheduled only to release workers: this would still work within
> > this « single master » build scheme.
> >
> > NB: I'm aware of the potential performance penalty of having buildslaves
> > randomly switching between branches, so I would try to come up with a
> > reasonably smart solution to this issue if it doesn't conflict with the
> > main goals.
>
> One thing to watch out for is disk space usage. Full disks are a common
> cause of build failures. If a single worker goes through builds for
> different branches, I would expect disk usage to be higher (e.g. more
> different versions of software in dl/).

It would be ideal to have one centralized dl/ dir where each runner can go
and take the files. We already support that in openwrt (having a different
dl dir), and there isn't any problem with having different release tarballs
for the same package.

--
Ansuel
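The per-tree dl/ override mentioned above is, to my understanding, the
CONFIG_DOWNLOAD_FOLDER config symbol. A rough sketch of build steps that
would point every worker at one shared mount follows; the /shared/dl path
and the step layout are assumptions, not how the current phase1/phase2
masters are actually wired.

```python
# Sketch of first-stage build steps using a shared download directory.
# CONFIG_DOWNLOAD_FOLDER is OpenWrt's "different dl dir" knob; the
# /shared/dl mount point is a hypothetical example, not existing infra.
from buildbot.plugins import steps, util

factory = util.BuildFactory()

# Append the shared folder to .config and re-expand defaults so DL_DIR
# points at the shared mount instead of <buildroot>/dl.
factory.addStep(steps.ShellCommand(
    name='use shared dl dir',
    command='echo \'CONFIG_DOWNLOAD_FOLDER="/shared/dl"\' >> .config'
            ' && make defconfig',
    workdir='build/openwrt',
))

factory.addStep(steps.ShellCommand(
    name='download sources',
    command=['make', '-j4', 'download'],
    workdir='build/openwrt',
))
```

The race conditions Baptiste describes would of course still apply to any
such shared mount; this only shows where the knob sits.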
Re: [RFC] Refactoring OpenWrt's build infra
Hi,

On 05-10-22, Thibaut wrote:
> Hi,
>
> Following an earlier conversation on IRC with Petr, I'm willing to work on
> refactoring our buildbot setup as follows:
>
> - single master for each stage (images and packages)
> - latent workers attached to either master, thus able to build
>   opportunistically from either master or release branches as needed / as
>   work becomes available

This is a good idea, but I see one main downside: we would probably have
to use the same buildbot worker image for all releases.

From what I remember, when the worker image was updated from Debian 9 to
Debian 10, this seriously broke 19.07 builds. Maybe Petr or Jow will
remember the details better.

I see two ways to address this:

- either buildbot can run latent workers with a different Docker image
  depending on the build

- otherwise, we have to think early about the update strategy. Maybe use
  the shared buildbot instance for the master branch + most recent release
  only, and move older releases back to a dedicated buildbot instance?

> The main upside is that all buildslaves could be pooled, improving overall
> throughput and reducing wasted « idle time », thus lowering build times and
> operating costs.
>
> Petr also suggested that extra release workers could be spawned at will
> (through e.g. cloud VMs) when a new release is to be tagged; tagged releases
> could be scheduled only to release workers: this would still work within
> this « single master » build scheme.
>
> NB: I'm aware of the potential performance penalty of having buildslaves
> randomly switching between branches, so I would try to come up with a
> reasonably smart solution to this issue if it doesn't conflict with the
> main goals.

One thing to watch out for is disk space usage. Full disks are a common
cause of build failures. If a single worker goes through builds for
different branches, I would expect disk usage to be higher (e.g. more
different versions of software in dl/).

Thanks,
Baptiste
Re: [RFC] Refactoring OpenWrt's build infra
Thibaut [2022-10-05 17:56:17]:

[adding Jo and Paul to Cc: loop]

Hi,

> Before I set out to revamp the system accordingly, I want to ask whether
> this proposal seems like a Good Idea™ :)

those above-mentioned topics have been on my TODO list for a long time
already, so any help is more than appreciated, thanks!

Since we're currently using the buildbot repository as our main source for
the production containers, I would like to suggest using its issue
tracker [1] to track future plans and ongoing work transparently over
there, for obvious reasons. Another option might be mirroring that GitLab
buildbot repo to GitHub and using issues there instead, if that's preferred.

More food for thought:

 * We should replace the currently HW-EOL machine serving
   buildbot.openwrt.org - we're currently blocked on this by the still
   pending OpenWrt.org account on Hetzner - this refactoring might be a
   good opportunity for tackling it

 * Filter out GitPoller build events originating from noop sources like
   the CI tooling [2] - IIRC those build events get propagated down to the
   2nd stage/package builds as well (a filter sketch follows below)

 * Rate/resource limit handling during scripts/feeds invocations -
   git.openwrt.org might be overloaded in certain time periods, leading to
   wasted build resources and false positive build results

 * python3/host: the build/install race condition with
   uboot/scripts/dtc/pylibfdt [3] is another such resource-waste example

 * Use HSM-backed storage for release/package signing keys

 * IIRC Paul (and probably more folks) find our buildbot based system
   arcane and would like to try using something more recent, for example
   GitHub Actions instead - perhaps we should try to align with those
   ideas and consider factoring out the build steps into something more
   self-contained, build-layer agnostic and thus reusable? It just seems
   to me that we're reinventing the wheel [4]

 * We should consider making our buildbot infra completely open, so anyone
   can reuse it and/or make it better

1. https://gitlab.com/openwrt/buildbot/-/issues/new
2. https://github.com/openwrt/openwrt/pull/10094#issuecomment-1170760326
3. https://github.com/openwrt/openwrt/pull/10407
4. https://github.com/openwrt/packages/issues/19241

Cheers,

Petr
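On the GitPoller bullet above (the sketch referenced there): a scheduler's
change filter can drop changes whose touched files all live under CI-only
paths, so they never reach the image or package builders. This is a minimal
sketch; the path prefixes and builder name are assumptions for illustration.

```python
# Sketch: drop changes that only touch CI tooling so they never trigger
# builds (and thus never propagate to the 2nd stage).  The path prefixes
# and builder name below are assumptions, not a vetted "noop" list.
from buildbot.plugins import schedulers, util

NOOP_PREFIXES = ('.github/', '.gitlab-ci.yml')

def is_buildworthy(change):
    # Keep the change unless every touched file is a known noop path.
    files = change.files or []
    return not files or any(
        not f.startswith(NOOP_PREFIXES) for f in files)

c['schedulers'] = [
    schedulers.SingleBranchScheduler(
        name='master-images',
        change_filter=util.ChangeFilter(branch='master',
                                        filter_fn=is_buildworthy),
        treeStableTimer=60,
        builderNames=['images-sample-target'],   # hypothetical builder name
    ),
]
```

The same filter_fn could also key on commit messages or authors if path
prefixes turn out to be too coarse.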
Re: [RFC] Refactoring OpenWrt's build infra
On 10/5/22 17:56, Thibaut wrote:
> Hi,
>
> Following an earlier conversation on IRC with Petr, I'm willing to work on
> refactoring our buildbot setup as follows:
>
> - single master for each stage (images and packages)
> - latent workers attached to either master, thus able to build
>   opportunistically from either master or release branches as needed / as
>   work becomes available
>
> The main upside is that all buildslaves could be pooled, improving overall
> throughput and reducing wasted « idle time », thus lowering build times and
> operating costs.
>
> Petr also suggested that extra release workers could be spawned at will
> (through e.g. cloud VMs) when a new release is to be tagged; tagged releases
> could be scheduled only to release workers: this would still work within
> this « single master » build scheme.
>
> NB: I'm aware of the potential performance penalty of having buildslaves
> randomly switching between branches, so I would try to come up with a
> reasonably smart solution to this issue if it doesn't conflict with the
> main goals.
>
> Before I set out to revamp the system accordingly, I want to ask whether
> this proposal seems like a Good Idea™ :)
>
> Comments welcome,
> T.

Hi,

This sounds like a good idea, but I am not an expert on this topic. I would
approve such a change, but others are much more knowledgeable about how our
infrastructure works.

I do not know if we need a special container for each release branch; I
think we try to build with an old Debian to make it possible to use the
image builder binaries on older systems as well.

Hauke
[RFC] Refactoring OpenWrt's build infra
Hi,

Following an earlier conversation on IRC with Petr, I'm willing to work on
refactoring our buildbot setup as follows:

- single master for each stage (images and packages)
- latent workers attached to either master, thus able to build
  opportunistically from either master or release branches as needed / as
  work becomes available

The main upside is that all buildslaves could be pooled, improving overall
throughput and reducing wasted « idle time », thus lowering build times and
operating costs.

Petr also suggested that extra release workers could be spawned at will
(through e.g. cloud VMs) when a new release is to be tagged; tagged releases
could be scheduled only to release workers: this would still work within
this « single master » build scheme.

NB: I'm aware of the potential performance penalty of having buildslaves
randomly switching between branches, so I would try to come up with a
reasonably smart solution to this issue if it doesn't conflict with the
main goals.

Before I set out to revamp the system accordingly, I want to ask whether
this proposal seems like a Good Idea™ :)

Comments welcome,
T.