On Mon, Jan 2, 2023 at 4:48 AM m1027 <m1...@posteo.net> wrote:
>
> Hi and happy new year.
>
> When we create apps on Gentoo they become easily incompatible for
> older Gentoo systems in production where unattended remote world
> updates are risky. This is due to new glibc, openssl-3 etc.
I wrote a very long reply, but I've removed most of it: I basically
have a few questions, and then some comments.

I don't quite grasp your problem statement, so I will repeat what I
think it is and you can confirm / deny:

 - Your devs build using Gentoo synced against some recent tree; they
   have recent packages, and they build some software that you deploy
   to prod.
 - Your prod machines are running Gentoo synced against some recent
   tree, but not upgraded (maybe only glsa-check runs), so they are
   running 'old' packages because you are afraid to update them[0].
 - Your software builds OK in dev, but when you deploy it in prod it
   breaks, because prod is really old and your dev machines are using
   packages that are too new.

My main feedback here is:

 - Your "build" environment should be like prod. You said you didn't
   want to build "developer VMs", but I am unsure why. For example, I
   run Ubuntu and I do all my Gentoo development (admittedly very
   little these days) in a systemd-nspawn container, with a few shell
   scripts to mount everything and set it up (so it has a tree
   snapshot, some git repos, some writable space, etc.).
 - Your "prod" environment is too risky to upgrade, and you have
   difficulty crafting builds that run in every prod environment. I
   think this is fixable by making the build environment more like the
   prod environment. The challenge is that if you have not kept copies
   of the ebuilds, the distfiles, etc., it can be hard to recreate the
   existing older prod environments. But if you do the above thing
   (devs build in a container) and you can make that container like
   the prod environments, then you can enable devs to build for prod
   (in a container on their local machine) and get the outcome you
   want.
 - Understand that not upgrading prod is, to use a finance term,
   picking up pennies in front of a steamroller. It's a great
   strategy, but eventually you will actually *need* to upgrade
   something.
Maybe for a critical security issue, maybe for a feature. Having a
build environment that matches prod is good practice, and you should
do it, but you should also really schedule maintenance for these prod
nodes to get them upgraded. (For physical machines, I've often seen
businesses just eat the risk and assume the machine will physically
fail before the steamroller comes, but this is less true with
virtualized environments that have longer real lifetimes.)

> So, what we've thought of so far is:
>
> (1) Keeping outdated developer boxes around and compiling there. We
> would freeze portage against accidental emerge sync by creating a
> git branch in /var/db/repos/gentoo. This feels hacky and requires an
> increasing number of developer VMs. And sometimes we are hit by a
> silent incompatibility we were not aware of.

In general, when you build binaries for some target, you should build
on that target when possible. To me, this is the crux of your issue
(that you do not) and one of the main causes of your pain. You will
need to figure out a way to either:

 - Upgrade the older environments to new packages, or
 - Build in copies of the older environments.

I actually expect the second one to take 1-2 sprints (so about one
engineer-month?):

 - One sprint to write some scripts that make a new production
   'container'.
 - One sprint to integrate that container into your dev workflow, so
   devs build in the container instead of wherever they build now.

It might be more or less daunting depending on how many distinct
(unique?) prod environments you have (how many containers will you
actually need for good build coverage?), how experienced in Gentoo
your developers are, and how many artifacts from prod you have.

A few crazy ideas:

 - Snapshot an existing prod machine, strip it of machine-specific
   bits, and use that as your container.
 - Use quickpkg to generate a bunch of bin pkgs from a prod machine,
   and use those to bootstrap a container.
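To make the quickpkg idea a bit more concrete, here is a hedged
sketch. It only prints the commands so you can review them first; the
hostname (build-host) and container root path are placeholders I made
up, not anything from a real setup:

```shell
#!/bin/sh
# Sketch of the quickpkg bootstrap idea: quickpkg repacks the packages
# already installed on a prod box into binary packages, and
# emerge --usepkgonly can then populate a fresh container root from
# those packages alone. Hostnames and paths are illustrative.

# Print the commands to run on the prod machine: pack every installed
# package and ship the resulting binpkgs to the build host.
pack_prod() {
    printf 'quickpkg --include-config=y "*/*"\n'
    printf 'rsync -a /var/cache/binpkgs/ %s:/var/cache/binpkgs/\n' "$1"
}

# Print the command to run on the build host: fill an empty root
# directory from the binpkgs only (no compilation).
bootstrap_root() {
    printf 'emerge --root=%s --usepkgonly @world\n' "$1"
}

# Eyeball the output, then run the commands as root on the real hosts.
pack_prod build-host
bootstrap_root /var/lib/machines/prod-like
```

Whether --include-config is appropriate depends on how much
machine-specific configuration you want leaking into the container.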
(And probably some other exciting ideas from the list. ;))

> (2) Using Ubuntu LTS for production and Gentoo for development is
> hit by subtle libjpeg incompatibilities and such.

I would advise, if possible, making dev and prod as similar as
possible[1]. I'd be curious what blockers you think there are to this
pattern. Remember that "dev" is not "whatever your devs are using" but
is ideally some maintained environment, segmented from their daily
driver computer (somehow).

> (3) Distributing apps as VMs or docker: Even those tools advance and
> become incompatible, right? And not suitable for smaller Arm
> devices.

I think if your apps are small, self-contained, and easily rebuilt,
your (3) and (4) can be workable. If you need 1000 dependencies at
runtime, your containers are going to be expensive to build and
expensive to maintain, you are going to have to build them often (for
security issues), and it can be challenging to support incremental
builds and incremental updates... you generally want a clearer problem
statement to adopt this pain. Two problem statements that might be
worth it:

 - If you told me you had 100 different production environments, or
   needed to support 12 different OSes, I'd tell you to use containers
   (or similar).
 - If you told me you didn't control your production environment
   (because users installed the software wherever), I'd tell you to
   use containers (or similar).

> (4) Flatpak: No experience, does it work well?

Flatpak is conceptually similar to your (3). I know you are basically
asking "does it work" and the answer is "probably", but see my
questions for (3). I suspect it's less about "does it work" and more
about "is some container deployment thing really a great idea."

> (5) Inventing a full-fledged OTA Gentoo OS updater and distributing
> that together with the apps... Nah.

This sounds like a very expensive solution that is likely rife with
very exciting security problems, fwiw.

> Hm... Comments welcome.
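Coming back to the "make the build environment like prod" theme: the
wrapper scripts I mentioned for my own setup boil down to something
like the sketch below. Every path and the machine name are
placeholders I invented; the script only prints the invocation so you
can review it before running it as root:

```shell
#!/bin/sh
# Sketch of a wrapper that enters a prod-like Gentoo root as a build
# container via systemd-nspawn. $ROOT would be an unpacked prod
# snapshot (or a stage3 built up to match prod); the tree snapshot is
# bind-mounted read-only, so an accidental `emerge --sync` inside the
# container cannot move it. All paths are made-up placeholders.
set -eu

ROOT="${BUILD_ROOT:-/var/lib/machines/prod-build}"
SNAPSHOT="${TREE_SNAPSHOT:-/var/snapshots/gentoo-2023-01}"

# Compose (but don't run) the container invocation, so it can be
# reviewed first and then run as root.
nspawn_cmd() {
    printf 'systemd-nspawn -D %s --bind-ro=%s:/var/db/repos/gentoo --bind=%s:/src /bin/bash\n' \
        "$ROOT" "$SNAPSHOT" "${SRC_DIR:-${HOME:-/root}/src}"
}

nspawn_cmd
```

One container root per distinct prod environment, and devs get a
reproducible shell that sees the same packages prod does.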
Peter's comment about basically running your own fork of gentoo.git
and sort of 'importing the updates' is workable. Google did this for
Debian testing (a project called Rodete)[2]. I can't say it's a
particularly cheap solution (significant automation and testing
required), but as long as you are keeping up (I would advise never
falling more than 365 days behind time.now() in your fork) I think it
provides some benefits:

 - You control when you take updates.
 - You want to stay "close" to time.now() in the tree, since a rolling
   distro is how things are tested.
 - This buys you 365 days or so to fix any problem you find.
 - It nominally requires that you test against ::gentoo and
   ::your-gentoo-fork, so you find problems in ::gentoo before they
   are pulled into your fork, giving you a heads-up that you need to
   put work in.

[0] FWIW, this is basically what #gentoo-infra does on our boxes, and
it's terrible and I would not recommend it to most people in the
modern era. Upgrade your stuff regularly.

[1] When I was at Google we had a hilarious outage because someone
switched login managers (gdm vs. kdm) and kdm had a different default
umask somehow? Anyway, it resulted in a critical component having the
wrong permissions, and it caused a massive outage (luckily we had
sufficient redundancy that it was not user-visible), but it was one of
the scariest outages I had ever seen. I was in charge of investigating
(being on the dev OS team at the time), and it was definitely very
difficult to figure out "what changed" to produce the bad build. We
stopped building on developer workstations soon after, FWIW.

[2] https://cloud.google.com/blog/topics/developers-practitioners/how-google-got-to-rolling-linux-releases-for-desktops

> Thanks