Re: "guix pack -f docker" does too much work
Hi,

Michal Atlas skribis:

>>> Also seems that Nix's way only quickly imports the changed layers? And
>>> Guix's always imports the whole thing, at least I think?
>>
>> What do you mean by “imports the whole thing”?
>
> I'm not sure what exactly happens, so correct me if I'm wrong, but
> when I time the different approaches it looks like Guix creates a
> single-layered image, so if anything changes the entire image gets
> re-imported into Docker.

Oh, there’s the quite recent ‘--max-layers’ option:

  https://guix.gnu.org/manual/devel/en/html_node/Invoking-guix-pack.html

However, the default is to create a single layer.  Maybe worth changing
to 32 or so?  Oleg, WDYT?

(We should also document the default value of ‘--max-layers’ in the
manual: I had to check the code…)

> On that note, I know that guix pack goes through %compressors in
> order, but zstd is an insane improvement over gzip when working with
> containers.  Would it perhaps be possible to default to it, or would
> that break far too many workflows, or is there another reason?
> Perhaps changing how guix pack works would be a good time to make
> both breaking changes at once?

If Docker itself always understands zstd, then we could change the
default, indeed.  For other backends, such as plain tarballs, we could
make that change, but it’s going to be potentially more of a breaking
change.

Thoughts?

Ludo’.
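For concreteness, the two knobs discussed above can already be combined
on the command line.  The package name and layer count below are only
examples, and whether ‘docker load’ accepts a zstd-compressed archive
depends on the Docker version, which is precisely the open question:

  # Layered image, zstd-compressed; ‘guix pack’ prints the file name of
  # the resulting archive on standard output.
  guix pack -f docker --max-layers=32 -C zstd guile

  # Hand the result to Docker in one go.
  docker load < "$(guix pack -f docker --max-layers=32 -C zstd guile)"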
Re: "guix pack -f docker" does too much work
Hi,

>> Also seems that Nix's way only quickly imports the changed layers? And
>> Guix's always imports the whole thing, at least I think?
>
> What do you mean by “imports the whole thing”?

I'm not sure what exactly happens, so correct me if I'm wrong, but when
I time the different approaches it looks like Guix creates a
single-layered image, so if anything changes the entire image gets
re-imported into Docker.

With the layered approach, though, if only one or two paths change,
then only those get imported (even though there's still some baseline
cost that compression takes up), and Docker importing just the changed
paths is a very noticeable speedup.

On that note, I know that guix pack goes through %compressors in order,
but zstd is an insane improvement over gzip when working with
containers.  Would it perhaps be possible to default to it, or would
that break far too many workflows, or is there another reason?  Perhaps
changing how guix pack works would be a good time to make both breaking
changes at once?

Thanks,
Michal.
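A rough way to reproduce the comparison described above; the package
names are only examples, and the second invocation of each pair stands
in for “anything changes”:

  # Single layer (the current default): any change re-imports everything.
  time docker load < "$(guix pack -f docker guile)"
  time docker load < "$(guix pack -f docker guile guile-json)"

  # Layered image: only the layers that changed should need importing.
  time docker load < "$(guix pack -f docker --max-layers=32 guile)"
  time docker load < "$(guix pack -f docker --max-layers=32 guile guile-json)"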
Re: "guix pack -f docker" does too much work
Hi,

Michal Atlas skribis:

> I greatly agree, it would be an awesome QOL improvement.

If there’s consensus, let’s see how we can get that done.

The advantage of having (guix docker) & co. all in Scheme is that
moving it from a derivation to code running straight from ‘guix pack’
is definitely feasible (a bit of work though, because ‘guix pack’ has
quite a few backends).

> Just want to mention that it might be nice to take inspiration from
> the Nix dockerTools, since they already have quite a lot of effort put
> into this.
>
> Including for example an option called `streamLayeredImage` [1] which
> doesn't generate a tarball at all, but rather a script that outputs
> the layers without assembling them, in a format which Docker or Podman
> can import without the huge intermediary file.
>
> i.e. $(guix pack ...) | docker load
>
> [1]:
> https://ryantm.github.io/nixpkgs/builders/images/dockertools/#ssec-pkgs-dockerTools-streamLayeredImage

Nice!  Sounds very much in line with what Ricardo was proposing.

> Also seems that Nix's way only quickly imports the changed layers? And
> Guix's always imports the whole thing, at least I think?

What do you mean by “imports the whole thing”?

Thanks,
Ludo’.
Re: "guix pack -f docker" does too much work
Hi,

On Sat, 01 Jun 2024 at 15:58, Ludovic Courtès wrote:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.
>
> I think each solution (single derivation vs. one derivation per layer)
> makes a different tradeoff.  I don’t have a strong feeling about which
> one is better.

I share Ricardo’s wish.  From my perspective, I do not care much about
polluting my local Guix store when building Docker images, because all
of that will be removed at the next GC, once the work has been loaded
elsewhere.  However, it is frustrating to build complete, large images
again and again when the difference is sometimes just a couple of
packages.

I would be in favor of sharing more derivations between images. :-)

Cheers,
simon
Re: "guix pack -f docker" does too much work
On Sat 01 Jun 2024 15:58, Ludovic Courtès writes:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.

If the concern is CPU time, I would make sure you have switched to zstd
or some other faster codec, via `guix pack -f docker -C zstd`.  You
probably already knew, but if you haven't tried, it's quite
surprising :)

Andy
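A rough way to see the difference on your own machine (the package is
only an example, the numbers will vary, and a repeated identical run is
served from the store, so measure with a fresh pack):

  time guix pack -f docker -C gzip guile
  time guix pack -f docker -C zstd guile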
Re: "guix pack -f docker" does too much work
Hello Ricardo,

I greatly agree, it would be an awesome QOL improvement.

Just want to mention that it might be nice to take inspiration from
the Nix dockerTools, since they already have quite a lot of effort put
into this.

Including for example an option called `streamLayeredImage` [1] which
doesn't generate a tarball at all, but rather a script that outputs
the layers without assembling them, in a format which Docker or Podman
can import without the huge intermediary file.

i.e. $(guix pack ...) | docker load

[1]:
https://ryantm.github.io/nixpkgs/builders/images/dockertools/#ssec-pkgs-dockerTools-streamLayeredImage

So that'd allow Guix to skip generating the final tarball altogether,
which makes packing very swift.

Also seems that Nix's way only quickly imports the changed layers? And
Guix's always imports the whole thing, at least I think?

Reading through how they do it, it seems that they pass the raw store
paths to this Python script [2] and it does the rest?  Save for
figuring out some merging of paths, since there's a limit to the number
of layers, I don't think this would be too difficult to port (after we
find out what license the script is under, at least, or replicate the
behaviour in Guile).

[2]:
https://github.com/NixOS/nixpkgs/blob/90509d6d66eb1524e2798a2a8627f44ae413f174/pkgs/build-support/docker/stream_layered_image.py

What do you think?

---
Atlas
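For reference, the Nix workflow described above looks roughly like
this; the attribute name is only a placeholder and this is untested
here:

  # nix-build prints the path of the generated script; running it
  # writes the image to stdout, so no tarball ever hits the disk.
  $(nix-build -A myStreamedImage) | docker load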
Re: "guix pack -f docker" does too much work
Ludovic Courtès writes:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.

Arguably we don't actually care all that much about the Docker image
that ends up in the store.  It's really a temporary thing that we want
to load into Docker or upload somewhere else.

I've often wanted to stream the eventual output of "guix pack" to a
pipe, precisely because I don't want to store the same thing twice:
once in the store and once in the Docker storage backend.  It's
actually worse than that: I often end up having dozens of packs in the
store whose layers are almost all identical.

> I think each solution (single derivation vs. one derivation per layer)
> makes a different tradeoff.  I don’t have a strong feeling about which
> one is better.

Can we have both?  I realize that adding the option to stream build
output to a pipe is not a trivial change, but it would solve the
unnecessary storage requirement for packs.  "docker load" reads from
standard input, but other pack formats would also benefit from a
streaming output; an example is Docker-free deployment to a remote
server: just pipe "guix pack" to a remote tar process and you're all
set.

--
Ricardo
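To make the two cases concrete: with today's "guix pack" the remote
deployment still goes through a file in the store, whereas the
streaming variant discussed above would drop that intermediate step.
The host and package names are only examples, and "--stream" is a
purely hypothetical option name:

  # Today: the pack is first written to the store, then shipped.
  ssh example.org 'tar xzf -' < "$(guix pack guile)"

  # Hypothetical streaming variant:
  guix pack -f docker --stream guile | docker load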
"guix pack -f docker" does too much work
Hi Guix,

a few months ago "guix pack -f docker" was modified to produce layers.
This is great!  Unfortunately, "guix pack" itself still produces one
big tarball containing all these layers.  There is no sharing of
previously built layers, because they are all hidden inside the pack.

I think it would be great if "guix pack -f docker" could avoid building
all these identical layers again and again.  Perhaps it would be
possible to have a single derivation for each layer?  This way we
wouldn't have to recreate the same layer archives every time.

What do you think?

--
Ricardo
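To make the duplication concrete (package names and the layer count are
only an illustration): two packs that share nearly their whole closure
each produce their own full tarball in the store, and none of the
layers are shared or reused between the two runs:

  guix pack -f docker --max-layers=32 guile
  guix pack -f docker --max-layers=32 guile guile-json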