Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Hi Chris,

On Sun, 31 May 2020 at 23:04, Chris Marusich  wrote:

> Anyway, the point is: you begin with a previous image.  The previous
> image already has these store paths from the previous installation of
> Guix.  Therefore, they exist on the previous layer.  Because they exist
> on the previous layer, they cannot be removed from the Docker image, and
> they are carried forward in that previous layer, to all new images.
> Regardless of any changes to guix-daemon we might make, the way in which
> you build your images will cause them to grow by hundreds of megabytes
> every day.

I agree that the core of the issue is the Docker layered filesystem,
and it cannot be fixed on the Guix side.  Therefore, even though it is
*bad* and dangerous, what about

--8<---cut here---start->8---
root@guix /# guix gc
root@guix /# guix gc --list-dead | xargs rm -rf
root@guix /# exit
$ docker stop
$ docker commit
$ docker export $ID | docker import - guix-new
--8<---cut here---end--->8---

?

Well, it cannot be recommended because it is dangerous.  But it
somehow bypasses the guix-daemon and "hard-removes" the items, and
then, as Vincent suggested, 'docker export | docker import' flattens
the layers so the dead items are really gone from the new Docker
image.  As I have shown [1], "guix gc" followed by export/import alone
does not lead to an image where the dead items are gone, if I am not
mistaken.

[1] http://issues.guix.gnu.org/41607#6


WDYT?

Cheers,
simon



Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Dear Stephen,

On Sun, 31 May 2020 at 21:43, Stephen Scheck  wrote:
> On Sun, May 31, 2020 at 2:51 PM zimoun  wrote:
>>
>> Maybe the explosion of size would be slower.  If you do, please report
>> here the number after say 12 generations; I am really interested. ;-)
>
> Now I'm confused - in your reply to Vincent, it seemed that there were
> still problems with the GC removing dead store items even after you did
> an export/re-import with the entire image on a single Docker layer? Or
> did I misread it?

The export/import trick cut the size of "your guix-bootstrap" image by
half.  So even though I am not convinced that it will fix the issue, I
think that my proposal is the correct way to delete dead items in the
store.  Basically, after the pull, you need to delete all the other
generations of /root/.config/guix/current (except the current one) with
"guix pull -d", then delete all the other generations of
/root/.guix-profile with "guix package -d", and last, garbage collect.
For sure, it will not delete the items coming from the previous layer,
but it will delete all the dead items of the current session.  And
"docker export | docker import" could remove other items -- even if in
the case of "install/remove hello" it does not work cleanly, some items
are deleted.
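
The cleanup sequence described above could be scripted roughly as
follows.  This is only a sketch: the function name clean_and_commit and
the DOCKER override (handy for a dry run with DOCKER=echo) are
assumptions of this example, not something from the thread, and it
assumes guix-daemon is already running inside the container.

```shell
#!/bin/sh
# DOCKER can be overridden, e.g. DOCKER=echo for a dry run (assumption).
: "${DOCKER:=docker}"
GUIX=/root/.config/guix/current/bin/guix

# clean_and_commit CONTAINER TAG   (hypothetical helper)
clean_and_commit() {
    container=$1 tag=$2
    # Drop the old generations of the pulled Guix and of the user
    # profile, then collect whatever became dead in this session.
    $DOCKER exec "$container" "$GUIX" pull -d || return 1
    $DOCKER exec "$container" "$GUIX" package -d || return 1
    $DOCKER exec "$container" "$GUIX" gc || return 1
    # Stop the container (and thus guix-daemon) before committing.
    $DOCKER stop "$container" >/dev/null || return 1
    $DOCKER commit "$container" "$tag"
}
```

For example: clean_and_commit "$CONTAINER" guix:cleaned (the tag is made
up).  Items inherited from earlier layers still occupy space in the
image; only garbage created in the current session is really freed.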

Well, it is just to be complete with your approach.


>> The whole question seems to be:
>>  - what is the purpose of such Docker image?  Which usage?
>>  - what infrastructure do you have at hand to build the Docker images?
>
> Well, Guix is interesting, and there aren't ready-made containers for
> it out there like there are for Ubuntu, Fedora, etc. if you have a need
> to do some task in that kind of environment, or just to play around, or
> see how the system is evolving. Also, I have been playing around with
> Guile lately and I thought Guix might be a better fit for that kind of
> work than other environments where Guile is largely neglected (Guix is
> *written* in Guile, after all). And I happened to be learning GitLab
> CI/CD around the same time, and it seemed like a good opportunity to
> experiment with both at once, so I thought, why not? :-) Which
> infrastructure - well, GitLab CI/CD, with fixed compute limits :-)

Yeah, a "ready-made container" could be really cool!  AFAIK, no one has
taken the time to implement and document one.  There are various
attempts, but they are not always reported on help-guix or guix-devel.
Well, the answers to these 2 questions imply different strategies.

For example, I am running Guix on top of Debian, so I basically use
only the package manager.  And I use this infrastructure to produce
Docker images containing apps by running "guix pack -f docker -m
manifest.scm".  Because I am interested in Reproducible Science, I
also use "guix time-machine -C channel.scm -- pack -f docker -m
manifest.scm".  However, the Docker images contain only applications
(R or Python with a bunch of packages), and the "user" cannot extend
these images by running "guix install foo": I want to track
reproducibility, so the only way is to go through the 'manifest.scm'
and 'channel.scm' files.

Another example is the Dockerfile way.  Based on any image (Alpine or
Debian), I build an image containing the Guix package manager --
roughly speaking, as you are doing with your image 'guix-bootstrap'.
Then I use this image in 2 different ways: with a Dockerfile or
directly.  In both cases, it always starts with "guix pull".  And I
never chain the images -- I mean there are only 3 "layers" at maximum:
0-debian, 1-guix-fresh and 2-guix-latest.  Well, I have never run
"guix gc" inside a Docker image.


Last, I have never played with "guix system docker-image".  But in the
context of GitLab CI, what I would try is something like:

CONTAINER=`docker run --detach --privileged $OLD`
docker exec $CONTAINER guix pull
docker exec $CONTAINER guix system docker-image --root=/image.tar stuff.scm
docker cp --follow-link $CONTAINER:/image.tar $NEW

with maybe instead of "guix pull" this bazooka:

docker exec $CONTAINER guix install git
docker exec $CONTAINER git clone
docker exec $CONTAINER guix environment guix
docker exec $CONTAINER ./bootstrap && ./configure --localstatedir=/var/
docker exec $CONTAINER make -j
docker exec $CONTAINER ./pre-inst-env guix system docker-image


Well, the idea is to use "guix system docker-image" inside a Docker
image already containing Guix; this avoids the layer issue, doesn't it?


But as I said elsewhere, I am not really familiar with Docker so my
words are probably irrelevant.

All the best,
simon



Re: Guix Docker image inflation

2020-05-31 Thread Chris Marusich
Stephen Scheck  writes:

> IF any of the store files resulting from `guix pull` are ephemeral
> (i.e. intermediate build results not anchored to a profile) AND guix
> GC worked inside the container, my approach might still work - yes
> there would be image and layers growth but it might be small enough
> not to care between periodic image rebases. But I'm starting to doubt
> that, or at least it is difficult to quantify with the GC issues.

I think you're right about it being difficult to quantify the GC issues.

Basically, when you run "guix pull", the current Guix will "build"
(i.e., maybe download via substitutes, maybe build from source) the new
Guix, which puts it into the store, and updates the profile symlinks to
make it current.  In the process of doing this, some intermediate builds
might be performed if substitutes are not available.  Although the new
Guix will remain live in the store after the profile symlinks are
updated to make it current, (1) intermediate results might be left dead
after "guix pull" is finished, and (2) if the old Guix is sufficiently
different from the new Guix, it will also become dead after the symlinks
that were keeping it live are removed.

So, the amount of garbage that will be left over depends on a few
factors, like whether substitutes were available, and how different the
new Guix is from the old one.  It can also depend on how the guix-daemon
has been started (see "--gc-keep-outputs" and "--gc-keep-derivations" in
the "Invoking guix-daemon" section of the manual).

In the case of your Docker images, most (all?) of the garbage is coming
from case (2) above: as Guix changes, the old Guix will be made dead and
GC'd (hypothetically, let's suppose GC is working), but it will still
exist on prior layers, since it came from a prior layer.  As for case
(1), the intermediate results, I think they are not contributing to your
image size for two reasons: substitutes are probably available, and even
if they weren't available, the intermediates would probably appear
during "guix pull", which means they'd be on the top layer and would be
GC'd, so they wouldn't be included in any layer of the next image.  The
fact that the biggest dead paths in your latest image consist entirely
of store paths that look suspiciously like they came from prior Guix
installations is further evidence in support of this theory.

--8<---cut here---start->8---
root@guix /# du -Phc $(guix gc --list-dead) 2>/dev/null | sort -hk 1,1 | tail
finding garbage collector roots...
determining live/dead paths...
187M  /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules/lib/guile/3.0/site-ccache
187M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib
187M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile
187M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile/3.0
187M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile/3.0/site-ccache
194M  /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules
198M  /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
210M  /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
210M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules
3.0G  total
root@guix /# 
--8<---cut here---end--->8---

These "guix-HASH-modules" directories, for example, are used as part of
each Guix installation:

--8<---cut here---start->8---
root@guix /# realpath ~/.config/guix/current/share/guile
/gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules/share/guile
root@guix /# 
--8<---cut here---end--->8---

Each of them has a total closure size of almost 500 MB, although since
they might share some references, each one individually is adding "only"
about 200 MB.

--8<---cut here---start->8---
root@guix /# guix size /gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules
store item                                                             total    self
/gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules     485.9   206.9  42.6%
/gnu/store/hkmsljl2sf4nk96b35f0bmfkr2lqanfq-guix-packages-base         105.7   105.7  21.8%
/gnu/store/s7izb7j0s5rzcq297nd7ba9sfiqh5zmz-guix-system                 43.2    43.2   8.9%
/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31                  38.4    36.7   7.6%
/gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib               71.0    32.6   6.7%
/gnu/store/wcv5mscivggkygnz68nn2671fr3kapjc-guix-packages-base-source   19.4    19.4   4.0%
/gnu/store/6zygksmvzcq92xf65cna91dbf7a4zblh-guix-extra                  19.4    19.4   4.0%
/gnu/store/a7wiy24mmcilbqp39pl0jdlw10vbvavb-guix-cli                     8.0     7.3   1.5%
/gnu/store/f6k9b4grrfpip4h5lrmpnsnn2gqziihr-guix-system-tests 

Re: Guix Docker image inflation

2020-05-31 Thread Stephen Scheck
On Sun, May 31, 2020 at 2:51 PM zimoun  wrote:

> Maybe the explosion of size would be slower.  If you do, please report
> here the number after say 12 generations; I am really interested. ;-)
>

Now I'm confused - in your reply to Vincent, it seemed that there were
still problems with the GC removing dead store items even after you did
an export/re-import with the entire image on a single Docker layer? Or
did I misread it?


> The whole question seems to be:
>  - what is the purpose of such Docker image?  Which usage?
>  - what infrastructure do you have at hand to build the Docker images?
>

Well, Guix is interesting, and there aren't ready-made containers for it
out there like there are for Ubuntu, Fedora, etc. if you have a need to
do some task in that kind of environment, or just to play around, or see
how the system is evolving. Also, I have been playing around with Guile
lately and I thought Guix might be a better fit for that kind of work
than other environments where Guile is largely neglected (Guix is
*written* in Guile, after all). And I happened to be learning GitLab
CI/CD around the same time, and it seemed like a good opportunity to
experiment with both at once, so I thought, why not? :-) Which
infrastructure - well, GitLab CI/CD, with fixed compute limits :-)


Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Dear Stephen, again :-)

On Sun, 31 May 2020 at 20:30, Stephen Scheck  wrote:

>> No, it is how Docker is designed.  Maybe the terminology "layer" is
>> not the Docker one but when the images are chained, one cannot remove
>> the data of the previous layer of the total image.
>
> I'm not disagreeing with that, but IF any of the store files resulting
> from `guix pull` are ephemeral (i.e. intermediate build results not
> anchored to a profile) AND guix GC worked inside the container, my
> approach might still work - yes there would be image and layers growth
> but it might be small enough not to care between periodic image
> rebases. But I'm starting to doubt that, or at least it is difficult to
> quantify with the GC issues.

Currently, if I read correctly, your images are chained with something like,

--8<---cut here---start->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix gc --delete-generations
/root/.config/guix/current/bin/guix gc --collect-garbage
/root/.config/guix/current/bin/guix gc --optimize
docker commit
--8<---cut here---end--->8---

and instead you should do something like

--8<---cut here---start->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix pull -d
/root/.config/guix/current/bin/guix package -d
/root/.config/guix/current/bin/guix gc
docker commit
docker export | docker import
--8<---cut here---end--->8---

Maybe the explosion of size would be slower.  If you do, please report
here the number after say 12 generations; I am really interested. ;-)


>> Because if you run Guix outside a Docker container, you will not have
>> the issue.  The main issue is how the Docker "filesystem" is designed.
>
> Actually, there might be another way around this, still avoiding the
> need for a custom Runner, for example mounting /var/guix and /gnu/store
> into the container instead of belonging to it. If done that way, layer
> accumulation wouldn't be an issue, and maybe GC between layers neither.

Yes, it is one solution.
The whole question seems to be:
 - what is the purpose of such Docker image?  Which usage?
 - what infrastructure do you have at hand to build the Docker images?


Thank you for raising this whole question of Docker image production. :-)

All the best,
simon



Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Dear Stephen,

On Sun, 31 May 2020 at 19:51, Stephen Scheck  wrote:

> But I'm now starting to doubt my whole approach because it seems like
> there are some fundamental GC problems with running a live Guix system
> inside a container.

I do not think it is "some fundamental GC problems with running a live
Guix system inside a container"; rather, it is a fundamental aspect of
Docker's filesystem design that is incompatible with your approach.  As
I have tried to show, the issue is:

$ CONTAINER=`docker run --detach --tty --privileged image0`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # dd if=/dev/urandom of=/data1 bs=1234567 count=1024
$ HASH=`docker commit $CONTAINER` && docker tag $HASH image1

$ CONTAINER=`docker run --detach --tty --privileged image1`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /data1
/ # dd if=/dev/urandom of=/data2 bs=1234567 count=1024
$ HASH=`docker commit $CONTAINER` && docker tag $HASH image2

$ CONTAINER=`docker run --detach --tty --privileged image2`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /data2
/ # dd if=/dev/urandom of=/data3 bs=1234567 count=1024
$ HASH=`docker commit $CONTAINER` && docker tag $HASH image3

etc.

And the resulting images get bigger and bigger.  Am I misreading something?

Maybe "docker export | docker import" should help to keep the size
"reasonable" even if I am not convinced...


Well, thank you for raising the issue, because I have learnt
interesting stuff about Docker. :-)
And I do not yet have anything concrete to say about your initial issue, sorry.


All the best,
simon



Re: Guix Docker image inflation

2020-05-31 Thread Stephen Scheck
On Sun, May 31, 2020 at 5:37 AM zimoun  wrote:

> No, it is how Docker is designed.  Maybe the terminology "layer" is
> not the Docker one but when the images are chained, one cannot remove
> the data of the previous layer of the total image.
>

I'm not disagreeing with that, but IF any of the store files resulting
from `guix pull` are ephemeral (i.e. intermediate build results not
anchored to a profile) AND guix GC worked inside the container, my
approach might still work - yes there would be image and layers growth
but it might be small enough not to care between periodic image
rebases. But I'm starting to doubt that, or at least it is difficult to
quantify with the GC issues.


> Because if you run Guix outside a Docker container, you will not have
> the issue.  The main issue is how the Docker "filesystem" is designed.
>

Actually, there might be another way around this, still avoiding the
need for a custom Runner, for example mounting /var/guix and /gnu/store
into the container instead of belonging to it. If done that way, layer
accumulation wouldn't be an issue, and maybe GC between layers neither.
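
A minimal sketch of the volume idea, assuming Docker named volumes.  The
volume names guix-store and guix-var, the helper start_guix_container
and the DOCKER override are all hypothetical.  When a named volume is
empty, Docker copies the image's existing content at the mount point
into it on first use, so the store shipped in the image is not lost.

```shell
#!/bin/sh
# DOCKER can be overridden, e.g. DOCKER=echo for a dry run (assumption).
: "${DOCKER:=docker}"

# start_guix_container IMAGE   (hypothetical helper)
# Starts the image detached with the store and the daemon state kept in
# named volumes, so writes to them never land in an image layer.
# Prints the container ID, as "docker run --detach" does.
start_guix_container() {
    image=$1
    $DOCKER run --detach --privileged \
        --volume guix-store:/gnu/store \
        --volume guix-var:/var/guix \
        "$image"
}
```

Commands such as "guix pull" and "guix gc" would then be run with
"docker exec" as elsewhere in this thread.  Note that a later "docker
commit" does not capture the volume contents, so the store has to be
carried between CI jobs in some other way.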


Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Hi Vincent,

On Sun, 31 May 2020 at 12:50, Vincent Legoll  wrote:

> docker export  | docker import - img_name

I do not know if it really works here.  Maybe I am doing it incorrectly...

--8<---cut here---start->8---
$ docker images --format "{{.Size}}\t{{.Repository}}"
959MB   4reexport
960MB   3clean
960MB   2remove-hello
959MB   1install-hello
578MB   0new-fresh
1.06GB  fresh
1.06GB  singularsyntax/guix-bootstrap
--8<---cut here---end--->8---

Well, and the interesting part is:

--8<---cut here---start->8---
$ CONTAINER=`docker run --detach --tty --privileged 4reexport`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix gc --list-live | grep hello
/root/.config/guix/current/bin/guix gc --list-live | grep hello
finding garbage collector roots...
determining live/dead paths...
/ # /root/.config/guix/current/bin/guix gc --list-dead | grep hello
/root/.config/guix/current/bin/guix gc --list-dead | grep hello
finding garbage collector roots...
determining live/dead paths...
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/ # /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
guix gc: error: path `/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10' is not valid
/ # exit
--8<---cut here---end--->8---



Just for the record, the commands run:

--8<---cut here---start->8---
$ CONTAINER=`docker run --detach --tty --privileged fresh`
$ CMD='CMD "/root/.config/guix/current/bin/guix-daemon"
"--build-users-group=guixbuild"'
$ docker export $CONTAINER \
   | docker import --change $CMD - 0new-fresh

$ CONTAINER=`docker run --detach --tty --privileged 0new-fresh`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix install hello
/ # exit

$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 1install-hello

$ CONTAINER=`docker run --detach --tty --privileged 1install-hello`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix remove hello
/ # exit

$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 2remove-hello

$ CONTAINER=`docker run --detach --tty --privileged 2remove-hello`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix pull -d
/ # /root/.config/guix/current/bin/guix package -d
/ # /root/.config/guix/current/bin/guix gc
/ # exit
$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 3clean

$ CONTAINER=`docker run --detach --tty --privileged 3clean`
$ docker export $CONTAINER | docker import --change $CMD - 4reexport
--8<---cut here---end--->8---

where I cheated with $CMD, which does not work as is; the full 'CMD ...'
string has to be typed after '--change'.


All the best,
simon



Re: Guix Docker image inflation

2020-05-31 Thread Stephen Scheck
On Sun, May 31, 2020 at 12:31 AM Chris Marusich 
wrote:

> > Also, layers are helpful in the case of someone pulling down daily
> > Guix Docker images on a frequent basis, because then only the new,
> > ideally small layers need to be downloaded, whereas if you rebase for
> > every image build, you'd have to download the entire image every day.
>
> That is true, but suppose I have the following 3 images:
>
> - Image A: A base image created in January 2020.
> - Image B: Based on A, and I ran "guix pull" in February 2020.
> - Image C: Based on A, and I ran "guix pull" in June 2020.
>
> I would guess that the size difference between A and B is approximately
> the same as the difference between A and C.  It'll be different, of
> course, but generally the size difference between A and C should not
> grow linearly with time, since "guix pull" is only going to install at
> most the total closure of things necessary to build and run Guix, which
> doesn't increase much in size as time goes on.  However, when you
> daisy-chain the images every day, the image size will grow linearly with
> time because the contents of all the previous layers is carried forward.
>
> > My build script issues several `docker exec  `
> > sequences, followed by a `docker commit `. Intermediate
> > changes to the container file system prior to the commit do not
> > generate layers, only the net changes after the commit.
>
> There are two problems here.  One is that the image size grows without
> bound.  The other is that guix-daemon is failing to GC store items in
> the Docker container.  Although they are both concerning, the latter is
> not the cause of the former.
>
> If you install new store items (e.g., via "guix pull"), make them dead,
> and then GC them, all in the same container before running "docker
> commit", then I agree: those GC'd store items would not persist in a
> layer anywhere.  However, I don't think that's what's happening here.
> Sure, there might be a few store items like this, but in practice, there
> will be many store items from the previous image which began live but
> became dead when you ran "guix pull" and deleted your old profile
> generations.  It is those store items that are adding the most space to
> your image.
>

Yes, I get this. I never expected the container to stay constant in
size, but I was hoping daily pulls would result in relatively low image
growth. It's not clear to me if any of the items which should get GC'd
but don't are just ephemeral build results, in which case growth might
be tolerable with an occasional rebase (perhaps monthly or bi-monthly).

But I'm now starting to doubt my whole approach because it seems like
there are some fundamental GC problems with running a live Guix system
inside a container.


> Besides store items, I noticed two other things about your images:
>
> - The contents of /var is growing slowly without bound, but it isn't
>   nearly as bad as the contents of /gnu/store.  This is probably due to
>   log files; consider pruning them.
>

These are presumably OK to delete, without any special handling for Guix?


> - Your script runs "docker commit" while guix-daemon (and other
>   programs) are still running.  To ensure the guix-daemon's database (or
>   other things) does not become corrupt, consider terminating all
>   processes before committing the new image.
>

`docker commit` pauses the container (unless you tell it not to) ...
although I guess that could still cause problems if Guix store writes
aren't implemented in an atomic way.

Thanks,
-SS


Re: Guix Docker image inflation

2020-05-31 Thread Vincent Legoll

Hello,

maybe you can try:

docker export  | docker import - img_name

This should flatten the layers back to a single one.
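
As a rough sketch, the flattening step could be wrapped like this.  The
function name flatten_image, the tarball path and the DOCKER override
are assumptions of this example; it goes through an intermediate tarball
instead of a pipe, which should be equivalent.  "docker import" drops
image metadata such as CMD, hence the optional --change argument.

```shell
#!/bin/sh
# DOCKER can be overridden, e.g. DOCKER=echo for a dry run (assumption).
: "${DOCKER:=docker}"

# flatten_image SRC_IMAGE NEW_IMAGE TARBALL [CHANGE]   (hypothetical helper)
# CHANGE is an optional Dockerfile instruction, e.g. 'CMD ["/bin/sh"]'.
flatten_image() {
    src=$1 new=$2 tarball=$3 change=${4:-}
    # Create (but do not start) a container from the chained image.
    cid=$($DOCKER create "$src") || return 1
    # Dump the merged filesystem; deleted files are not part of it.
    $DOCKER export --output "$tarball" "$cid" || return 1
    # Import the dump as a new single-layer image.
    if [ -n "$change" ]; then
        $DOCKER import --change "$change" "$tarball" "$new" || return 1
    else
        $DOCKER import "$tarball" "$new" || return 1
    fi
    $DOCKER rm "$cid" >/dev/null
}
```

For example: flatten_image guix:chained guix:flat /tmp/guix.tar (image
names made up).  The flattened image shares no layer with its
predecessors, so whoever pulls it has to download it in full.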

--
Vincent Legoll




Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Dear Stephen,

On Sat, 30 May 2020 at 19:13, Stephen Scheck  wrote:

> No, it is not layers - they are a symptom, not the cause. See my reply
> to Chris. The problem is clearly that Guix isn't deleting garbage files
> ... which may have something to do with how Guix interacts with files
> in the file system and differences in Docker environments (no idea, I
> don't know how Guix works, but perhaps it needs some special privilege
> enabled when it runs inside Docker containers?), but layers themselves
> do not prevent file deletion inside a container.

No, it is how Docker is designed.  Maybe "layer" is not the exact
Docker terminology, but when the images are chained, one cannot remove
the data of a previous layer from the total image.


> It is possible to host your own external Runners, and have them utilized by
> CI/CD jobs running inside the GitLab cloud service. You could install Guix
> on them and configure your CI/CD pipeline to require execution of certain
> jobs on these custom runners. But I'm not sure I see why that would help?

Because if you run Guix outside a Docker container, you will not have
the issue.  The main issue is how the Docker "filesystem" is designed.


All the best,
simon



Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Hi Chris,

On Sun, 31 May 2020 at 06:32, Chris Marusich  wrote:

> I would guess that the size difference between A and B is approximately
> the same as the difference between A and C.  It'll be different, of
> course, but generally the size difference between A and C should not
> grow linearly with time, since "guix pull" is only going to install at
> most the total closure of things necessary to build and run Guix, which
> doesn't increase much in size as time goes on.  However, when you
> daisy-chain the images every day, the image size will grow linearly with
> time because the contents of all the previous layers is carried forward.

Exactly, and it is not specific to Guix: it is how Docker works, if
I understand correctly.


> - Your script runs "docker commit" while guix-daemon (and other
>   programs) are still running.  To ensure the guix-daemon's database (or
>   other things) does not become corrupt, consider terminating all
>   processes before committing the new image.

Do you think the GC issue comes from this?
Because running "docker stop" and then "docker commit" does not change
the issue.  The GC is still confused, trying to delete items that are
not in the store.  Roughly speaking, "guix gc" says it removes items
of size 0, but then "guix gc --references" says the path does not
exist.

--8<---cut here---start->8---
/ # /root/.config/guix/current/bin/guix gc
/root/.config/guix/current/bin/guix gc
[...]
/ # /root/.config/guix/current/bin/guix gc --list-dead | grep hello
/root/.config/guix/current/bin/guix gc --list-dead | grep hello
finding garbage collector roots...
determining live/dead paths...
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/ # /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
guix gc: error: path `/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10' is not valid
--8<---cut here---end--->8---



> I apologize for not reading your thread more closely to begin with.  I
> took a closer look, and I think I can explain what is going on now.
> Please check the bug report and reply there if anything is unclear.

Ah sorry, maybe you already addressed these questions in the bug report.


Cheers,
simon



Re: Guix Docker image inflation

2020-05-31 Thread zimoun
Dear Stephen,

Follow-up to
https://lists.gnu.org/archive/html/help-guix/2020-05/msg00249.html
and bug#41607 CC http://issues.guix.gnu.org/41607


On Sat, 30 May 2020 at 19:02, Stephen Scheck  wrote:

> You can convince yourself of this by doing something like the following:
>
> docker run 
> docker exec  dd if=/dev/urandom of=/RANDOM-DATA bs=1048576 count=1024
> docker commit 
> docker exec  rm /RANDOM-DATA
> docker commit 

It does not convince me; maybe I am doing something wrong, but it is not
what I observe in an example with more than 2 'commits'.  Here is my
session, based on your image, renamed "fresh" because this will happen
with any image.

--8<---cut here---start->8---
$ docker images
REPOSITORY                      TAG                 IMAGE ID       CREATED       SIZE
fresh                           latest              c36cef8306d5   3 weeks ago   1.06GB
singularsyntax/guix-bootstrap   1.1.0-alpine-3.11   c36cef8306d5   3 weeks ago   1.06GB

$ CONTAINER=`docker run --detach --tty --privileged fresh`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # dd if=/dev/urandom of=/DATA bs=1234567 count=1024
dd if=/dev/urandom of=/DATA bs=1234567 count=1024
1024+0 records in
1024+0 records out
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH add-data
$ docker stop $CONTAINER && docker rm $CONTAINER
cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7
cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7

$ CONTAINER=`docker run --detach --tty --privileged add-data`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /DATA
rm /DATA
/ # dd if=/dev/urandom of=/OTHER bs=1234567 count=1024
dd if=/dev/urandom of=/OTHER bs=1234567 count=1024
1024+0 records in
1024+0 records out
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-data-add-other
$ docker stop $CONTAINER && docker rm $CONTAINER
93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256
93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256

$ CONTAINER=`docker run --detach --tty --privileged rm-data-add-other`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /OTHER
rm /OTHER
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-other

$ docker stop $CONTAINER && docker rm $CONTAINER
469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4
469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4

$ docker images
REPOSITORY                      TAG                 IMAGE ID       CREATED              SIZE
rm-other                        latest              b80d300aa755   23 seconds ago       3.59GB
rm-data-add-other               latest              de551eac1d55   About a minute ago   3.59GB
add-data                        latest              6a563dad       3 minutes ago        2.32GB
fresh                           latest              c36cef8306d5   3 weeks ago          1.06GB
singularsyntax/guix-bootstrap   1.1.0-alpine-3.11   c36cef8306d5   3 weeks ago          1.06GB

$ CONTAINER=`docker run --detach --tty --privileged rm-other`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # ls /
ls /
bin    dev    etc    gnu    home   lib    media  mnt    opt    proc
root   run    sbin   srv    sys    tmp    usr    var
/ # exit
exit
--8<---cut here---end--->8---

> You'll end up with two new images - the first one should be about 1 GB
> larger than the base image,
> the second one the same size.

As you see, the image 'rm-other' does not contain either /DATA or
/OTHER and its size is not the same as that of the initial one, 'fresh'.
So I do not know if the correct Docker terminology is "layer", but the
issue is definitely on the Docker side and not on the Guix side.


Cheers,
simon



Re: Guix Docker image inflation

2020-05-30 Thread Chris Marusich
Hi Stephen,

Stephen Scheck  writes:

> Layers certainly add some image size overhead, but I don't think that
> is the culprit here.

> Also, layers are helpful in the case of someone pulling down daily
> Guix Docker images on a frequent basis, because then only the new,
> ideally small layers need to be downloaded, whereas if you rebase for
> every image build, you'd have to download the entire image every day.

That is true, but suppose I have the following 3 images:

- Image A: A base image created in January 2020.
- Image B: Based on A, and I ran "guix pull" in February 2020.
- Image C: Based on A, and I ran "guix pull" in June 2020.

I would guess that the size difference between A and B is approximately
the same as the difference between A and C.  It'll be different, of
course, but generally the size difference between A and C should not
grow linearly with time, since "guix pull" is only going to install at
most the total closure of things necessary to build and run Guix, which
doesn't increase much in size as time goes on.  However, when you
daisy-chain the images every day, the image size will grow linearly with
time because the contents of all the previous layers is carried forward.
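
To put rough numbers on the difference, here is a back-of-the-envelope
sketch in shell arithmetic.  The function names are hypothetical, and
the inputs (a 197 MB seed growing by about 137 MB per daily pull) are
only an illustration derived from the sizes reported at the start of
this thread, not measurements.

```shell
# Hypothetical helpers; all sizes are in MB.

# Daisy-chained builds: every daily layer is carried forward.
chained_size() {   # usage: chained_size BASE DELTA DAYS
    echo $(( $1 + $2 * $3 ))
}

# Builds that always start from the same base image: one delta only.
rebased_size() {   # usage: rebased_size BASE DELTA
    echo $(( $1 + $2 ))
}

chained_size 197 137 11   # 197 + 137 * 11
rebased_size 197 137      # 197 + 137
```

With these illustrative inputs the chained build comes out at 1704 MB
after eleven days, close to the reported 1.71 GB, while the build from
a fixed base stays at 334 MB.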

> My build script issues several `docker exec <container> <command>`
> sequences, followed by a `docker commit <container>`. Intermediate
> changes to the container file system prior to the commit do not
> generate layers, only the net changes after the commit.

There are two problems here.  One is that the image size grows without
bound.  The other is that guix-daemon is failing to GC store items in
the Docker container.  Although they are both concerning, the latter is
not the cause of the former.

If you install new store items (e.g., via "guix pull"), make them dead,
and then GC them, all in the same container before running "docker
commit", then I agree: those GC'd store items would not persist in a
layer anywhere.  However, I don't think that's what's happening here.
Sure, there might be a few store items like this, but in practice, there
will be many store items from the previous image which began live but
became dead when you ran "guix pull" and deleted your old profile
generations.  It is those store items that are adding the most space to
your image.

Besides store items, I noticed two other things about your images:

- The contents of /var is growing slowly without bound, but it isn't
  nearly as bad as the contents of /gnu/store.  This is probably due to
  log files; consider pruning them.

- Your script runs "docker commit" while guix-daemon (and other
  programs) are still running.  To ensure the guix-daemon's database (or
  other things) does not become corrupt, consider terminating all
  processes before committing the new image.
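
The second point could be sketched as a small shell function.  This is
only an illustration: the name `commit_stopped` and its arguments are
hypothetical, and it assumes that stopping the container is an
acceptable way to end guix-daemon and the other processes before the
snapshot is taken.  `docker commit` does accept a stopped container.

```shell
# Hypothetical helper: stop CONTAINER first so that nothing is still
# writing to the daemon's database, then commit it as IMAGE.
commit_stopped() {   # usage: commit_stopped CONTAINER IMAGE
    docker stop "$1" >/dev/null &&
        docker commit "$1" "$2"
}
```

If the stop fails, nothing is committed, so a half-written state does
not end up in a new image.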

>> FYI, Guix itself can build Docker images from scratch - no base image
>> required!  It can even build a Docker image of a full-blown Guix System
>> from scratch.  Sorry if you already knew that - I just wanted to point
>> it out in case you didn't!
>>
>
> Yes, thanks, I know - if you read through the thread you'll see that I make
> reference to  `guix system docker-image [...]`.

I apologize for not reading your thread more closely to begin with.  I
took a closer look, and I think I can explain what is going on now.
Please check the bug report and reply there if anything is unclear.

-- 
Chris




Re: Guix Docker image inflation

2020-05-30 Thread Stephen Scheck
On Fri, May 29, 2020 at 7:55 PM zimoun  wrote:


> Thank you for the explanation.  The issue is these layers.  When I
> wrote [1], it was not clear to me because I am not familiar enough
> with Docker, but with your explanations, it is clear now. :-)
>
> [1] http://issues.guix.gnu.org/41607#1
>

No, it is not layers - they are a symptom, not the cause. See my reply to
Chris. The problem is clearly that Guix isn't deleting garbage files ...
which may have something to do with how Guix interacts with files in the
file system and differences in Docker environments (no idea, I don't know
how Guix works, but perhaps it needs some special privilege enabled when it
runs inside Docker containers?), but layers themselves do not prevent file
deletion inside a container.


> > FYI, Guix itself can build Docker images from scratch - no base image
> > required!  It can even build a Docker image of a full-blown Guix System
> > from scratch.  Sorry if you already knew that - I just wanted to point
> > it out in case you didn't!
>
> I think the idea is to use GitlabCI to build the Docker images
> containing Guix materials.  And AFAIK, GitlabCI does not provide
> Guix-related tools, does it?  I mean there is no gitlab-runner able to
> run guix-daemon.  If I remember well, we discussed this topic at
> FOSDEM; it would be awesome. :-)
>

It is possible to host your own external Runners, and have them utilized by
CI/CD jobs running inside the GitLab cloud service. You could install Guix
on them and configure your CI/CD pipeline to require execution of certain
jobs on these custom runners. But I'm not sure I see why that would help?


Re: Guix Docker image inflation

2020-05-30 Thread Stephen Scheck
On Fri, May 29, 2020 at 7:31 PM Chris Marusich  wrote:

>
> Could it be that you are accumulating layers without bound?
>
>
> https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/
>
> Since Docker images are built up of immutable layers, if you build your
> image from an existing base image, I'm not sure that it's possible to
> produce a new image that is smaller than the base image.  Basically,
> even if you run "guix gc" to remove dead store items, they will still
> exist on a prior layer, so the size of the new image won't decrease.
> And since you're installing new things, the size will actually increase.
> If you repeat this process by using the new image as an input for yet
> another build, I think you will accumulate layers and storage space
> without bound.
>

Layers certainly add some image size overhead, but I don't think that is
the culprit here. And producing a smaller image isn't really the goal, it's
just to keep image growth reasonable between each incremental guix pull.
Dead store items would only exist on previous layers if they make it there
in the first place. As has been demonstrated in previous posts in the
thread, I believe the problem is some guix bug which prevents deletion of
garbage-collected store items.

What is reasonable growth? That is hard to answer, but I would expect it to
be roughly proportional to the growth of a guix installation over time in a
non-Docker environment, taking some constant amount of layer overhead as a
given.

I don't really know what `guix pull` does, but I think it's something along
these lines: 1) the global package index is brought up-to-date; 2) any
packages which are installed in the profile doing the pull are upgraded to
newer versions if they've been updated. So day-to-day, particularly in the
case where there have been no updates to packages installed in the profile,
size growth should be very small. Periodic "rebasing" of incremental Docker
images might still be helpful from time to time using one of the layer
squashing tools out there, but I don't think it should be necessary on a
daily basis.

Also, layers are helpful in the case of someone pulling down daily Guix
Docker images on a frequent basis, because then only the new, ideally small
layers need to be downloaded, whereas if you rebase for every image build,
you'd have to download the entire image every day.

The boundless layer accumulation you point out shouldn't be a problem with
the way that I'm building the images. When you do a `RUN <command>` inside
a Dockerfile, it is essentially doing `docker exec <container> <command>`
followed by `docker commit <container>`. It is the commit step which
produces a new layer. You can think of a RUN command inside a Dockerfile as
kind of a single-step transaction, which incorporates the net file system
changes into the image.

My build script issues several `docker exec <container> <command>`
sequences, followed by a `docker commit <container>`. Intermediate changes
to the container file system prior to the commit do not generate layers,
only the net changes after the commit.

You can convince yourself of this by doing something like the following:

docker run <image>
docker exec <container> dd if=/dev/urandom of=/RANDOM-DATA bs=1048576 count=1024
docker commit <container>
docker exec <container> rm /RANDOM-DATA
docker commit <container>

You'll end up with two new images - the first one should be about 1 GB
larger than the base image,
the second one the same size.
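
Whether the second image really ends up the same size can be checked
directly rather than assumed.  A minimal sketch, with a hypothetical
helper name; `docker history` lists the layers an image is made of
together with the size each one adds.

```shell
# Hypothetical helper: print one line per layer of IMAGE, showing the
# size the layer adds and the command that created it.
layer_sizes() {   # usage: layer_sizes IMAGE
    docker history --format '{{.Size}} {{.CreatedBy}}' "$1"
}
```

Running it on both committed images shows whether the layer holding
/RANDOM-DATA is still part of the second one.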

> FYI, Guix itself can build Docker images from scratch - no base image
> required!  It can even build a Docker image of a full-blown Guix System
> from scratch.  Sorry if you already knew that - I just wanted to point
> it out in case you didn't!
>

Yes, thanks, I know - if you read through the thread you'll see that I make
reference to  `guix system docker-image [...]`.

-SS


Re: Guix Docker image inflation

2020-05-29 Thread zimoun
Hi Chris,

On Sat, 30 May 2020 at 01:31, Chris Marusich  wrote:

> Could it be that you are accumulating layers without bound?
>
> https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/
>
> Since Docker images are built up of immutable layers, if you build your
> image from an existing base image, I'm not sure that it's possible to
> produce a new image that is smaller than the base image.  Basically,
> even if you run "guix gc" to remove dead store items, they will still
> exist on a prior layer, so the size of the new image won't decrease.
> And since you're installing new things, the size will actually increase.
> If you repeat this process by using the new image as an input for yet
> another build, I think you will accumulate layers and storage space
> without bound.

Thank you for the explanation.  The issue is these layers.  When I
wrote [1], it was not clear to me because I am not familiar enough
with Docker, but with your explanations, it is clear now. :-)

[1] http://issues.guix.gnu.org/41607#1


> FYI, Guix itself can build Docker images from scratch - no base image
> required!  It can even build a Docker image of a full-blown Guix System
> from scratch.  Sorry if you already knew that - I just wanted to point
> it out in case you didn't!

I think the idea is to use GitlabCI to build the Docker images
containing Guix materials.  And AFAIK, GitlabCI does not provide
Guix-related tools, does it?  I mean there is no gitlab-runner able to
run guix-daemon.  If I remember well, we discussed this topic at
FOSDEM; it would be awesome. :-)


Cheers,
simon



Re: Guix Docker image inflation

2020-05-29 Thread Chris Marusich
Stephen Scheck  writes:

> Hello,
>
> As an exercise, I set up daily Guix System Docker image builds using GitLab
> and Docker Hub, here:
> https://hub.docker.com/repository/registry-1.docker.io/singularsyntax/guix/tags?page=1
>
> The build process works as follows: if an existing `latest` image does not
> exist for a given branch (master, 1.1.0, etc.), then bootstrap an image by
> running `guix system docker-image` inside an Alpine Linux Docker container
> with a fresh Guix installation. Using this image as a seed, `guix pull` is
> run for the desired branch, and the resulting image is committed to the
> Docker repository. If a "latest" image does exist, it is used instead as
> the base from which to run `guix pull`. Daily images are thus built
> incrementally from the previous day's build. For anybody curious about the
> process, the build script can be browsed here:
> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml
>
> It works pretty well, except that I'm observing substantial image size
> inflation day-over-day, starting at ~197 MB from the seed image, now up to
> 1.71 GB eleven days later despite running `guix gc --delete-generations`,
> `guix gc --collect-garbage`, and `guix gc --optimize` after pulling prior
> to committing each new image.
>
> I'm wondering if there is some other Guix GC operation or option I'm
> missing, or any other suggestions which could stop this unsustainable image
> bloat from occurring. I really do doubt that the Guix System itself is
> growing this quickly.

Could it be that you are accumulating layers without bound?

https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/

Since Docker images are built up of immutable layers, if you build your
image from an existing base image, I'm not sure that it's possible to
produce a new image that is smaller than the base image.  Basically,
even if you run "guix gc" to remove dead store items, they will still
exist on a prior layer, so the size of the new image won't decrease.
And since you're installing new things, the size will actually increase.
If you repeat this process by using the new image as an input for yet
another build, I think you will accumulate layers and storage space
without bound.

If this is what's happening, you might consider always building starting
from the same base image every time.  You could then update the base
image (e.g., by changing the FROM line of a Dockerfile, if that's what
you're using) periodically as new versions of it are released.  This
would probably allow you to avoid accumulating layers without bound.
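
As a rough sketch of that approach in shell: the function name and its
arguments are hypothetical, the `docker run` flags and the path to the
login shell are the ones used elsewhere in this thread, and it assumes
that `guix` is on the PATH of a login shell inside the base image.

```shell
# Hypothetical helper: start from the same BASE image every time, run
# "guix pull" in a fresh container, and commit the result as TARGET.
build_from_base() (   # usage: build_from_base BASE TARGET
    container=$(docker run --detach --tty --privileged "$1") &&
    docker exec "$container" \
        /run/current-system/profile/bin/bash --login -c 'guix pull' &&
    docker stop "$container" >/dev/null &&
    docker commit "$container" "$2"
)
```

Each daily image then carries the base layers plus a single new layer,
instead of one extra layer per day.  The stopped container is left
behind and still needs to be removed with `docker rm`.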

FYI, Guix itself can build Docker images from scratch - no base image
required!  It can even build a Docker image of a full-blown Guix System
from scratch.  Sorry if you already knew that - I just wanted to point
it out in case you didn't!

See:

https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html
https://guix.gnu.org/manual/en/html_node/Invoking-guix-system.html

Hope that helps,

-- 
Chris




Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 5:54 PM zimoun  wrote:

> Do you have '/var/' in your Docker image?  Because it looks the same
> as:
>

Yes:

root@guix ~# ls -la /var/guix
total 44
drwxr-xr-x 1 root root 4096 May 16 19:36 ./
drwxr-xr-x 1 root root 4096 May 29 22:02 ../
drwxr-xr-x 1 root root 4096 May 29 22:02 daemon-socket/
drwxr-xr-x 1 root root 4096 May 27 00:34 db/
-rw------- 1 root root    0 May 16 19:35 gc.lock
drwxr-xr-x 1 root root 4096 May 16 19:57 gcroots/
drwxr-xr-x 1 root root 4096 Jan  1  1970 profiles/
drwxr-xr-x 1 root root 4096 May 16 19:35 substitute/
drwxr-xr-x 1 root root 4096 May 27 00:34 temproots/
drwxr-xr-x 1 root root 4096 May 16 19:36 userpool/

If you'd like, you can fetch the exact same image and look around yourself:

docker pull singularsyntax/guix:master-a5374cd # same as singularsyntax/guix:latest
CONTAINER=`docker run --detach --tty --privileged singularsyntax/guix:master-a5374cd`
docker exec --interactive --tty $CONTAINER /run/current-system/profile/bin/bash --login


Re: Guix Docker image inflation

2020-05-29 Thread zimoun
On Fri, 29 May 2020 at 23:04, Stephen Scheck  wrote:

> root@guix /# guix system list-generations
> guix system: error: open-file: No such file or directory: 
> "/var/guix/profiles/system-1-link/parameters"

Do you have '/var/' in your Docker image?  Because it looks the same as:

> root@localhost /gnu/store# guix package --list-profiles
> /root/.config/guix/current
> root@localhost /gnu/store# guix package -d
> guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' 
> does not exist


In addition, you have that:

> root@localhost /gnu/store# guix package --list-profiles
> /root/.config/guix/current

and it is really weird because you are doing:

 guix package --install --fallback jq
 /root/.config/guix/current/bin/guix describe --format=json |
/root/.guix-profile/bin/jq

therefore, somehow, the profile '/root/.guix-profile' should appear
with '--list-profiles' too.


I do not know if it is a bug -- as Leo suggests -- or if something is
not configured as expected.  Well, I asked you about the initial
Docker image because the problem probably comes from that one.  The
fact that "guix gc --list-dead" outputs a lot of items and the fact
that you cannot garbage collect them with "guix gc" lead me to think
that something is wrong with '/var/guix/'.  I do not know...

Well, does "guix gc --list-dead | grep guix-cli-modules.drv | wc -l"
return the same number as the number of times you have run "guix pull"?
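
If the list is long, a quick way to see which kinds of items dominate
it is to strip the hash prefix and count the remaining names.  A
minimal sketch; the function name is hypothetical, and it only assumes
that store items are printed one per line as /gnu/store/HASH-NAME, as
in the "guix gc --list-dead" output quoted in this thread.

```shell
# Hypothetical helper: read store paths on stdin and print
# "COUNT NAME" lines, most frequent name first.
summarize_dead() {
    grep '^/gnu/store/' |
        sed 's|^/gnu/store/[^-]*-||' |
        sort | uniq -c |
        awk '{ print $1, $2 }' |
        sort -k1,1nr -k2,2
}
```

It would be used as "guix gc --list-dead | summarize_dead".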


All the best,
simon



Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 2:44 PM zimoun  wrote:

> On Fri, 29 May 2020 at 20:37, Leo Famulari  wrote:
> >
> > On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> > > Leo Famulari  writes:
> > > > --branch and --commit would be passed to `guix pull`, and then you'd
> run
> > > > `guix system docker-image` based on that.
> > >
> > > There is also 'guix time-machine --commit=abc123 -- system
> docker-image'.
> >
> > Right, that's probably more efficient than creating lots of `guix pull`
> > generations.
>
> Yes, but it is hard to know the forward commit a priori.
>

Yes, and also, does a Docker image created by `guix pull` followed by
`guix system docker-image [...]` in fact really inherit the Guix snapshot
from the system that creates it?

Here's what I get on a freshly minted image made that way:

root@guix /# guix pull --list-generations
guix pull: error: profile '/var/guix/profiles/per-user/root/current-guix'
does not exist
root@guix /# guix describe
guix describe: error: failed to determine origin
hint: Perhaps this `guix' command was not obtained with `guix pull'? Its
version string is 1.1.0-4.bdc801e.

root@guix /# guix package --list-generations
guix package: error: profile
'/var/guix/profiles/per-user/root/guix-profile' does not exist

But here's `guix describe` output from the parent system:

root@localhost /# guix describe
Generation 13 May 29 2020 19:28:11 (current)
  guix 41a2d6a
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 41a2d6a8b9294a6eb8e97aaefd569e755f5f461e

Until a fresh `guix pull` is run on the new image, it isn't functional and
there's no apparent way to confirm its actual commit hash, so I don't
really see what advantage it offers over the incremental method I'm using
(and it's unfeasibly slow: about 10-15 minutes for an incremental pull
compared to over an hour to finish `guix system docker-image`).


Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 4:02 PM zimoun  wrote:

> Well, could you try
>
>guix system delete-generations
>guix gc
>

root@guix /# guix system list-generations
guix system: error: open-file: No such file or directory:
"/var/guix/profiles/system-1-link/parameters"
root@guix /# guix system delete-generations
Backtrace:
   1 (primitive-load "/root/.config/guix/current/bin/guix")
In guix/ui.scm:
  1936:12  0 (run-guix-command _ . _)

guix/ui.scm:1936:12: In procedure run-guix-command:
In procedure struct-vtable: Wrong type argument in position 1 (expecting
struct): #f


Re: Guix Docker image inflation

2020-05-29 Thread zimoun
On Fri, 29 May 2020 at 20:47, Stephen Scheck  wrote:

> Not the point, no, but how else do I obtain a seed Guix Docker image, which I 
> can use to birth clean, pristine
> "baby" images of Guix's own making? It would be really nice if the Guix 
> project itself provided such an image!

Help welcome! :-)


Well, this

> root@localhost /gnu/store# guix gc
> finding garbage collector roots...
> deleting garbage...
> [0 MiB] deleting 
> '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
> [0 MiB] deleting '/gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system'
> [0 MiB] deleting '/gnu/store/dzifisbdk1gwy2fw2hwzgvdnjak22awl-guix-extra'
> deleting `/gnu/store/trash'
> deleting unused links...
> note: currently hard linking saves 1181.82 MiB
> guix gc: freed 0.636 MiBs

and this

> root@localhost /gnu/store# guix gc --list-dead
> finding garbage collector roots...
> determining live/dead paths...
> /gnu/store/0bm8h4ns6bymc7q24vhfr0dnb7qab729-guix-cli
> /gnu/store/0hjjj9dppc5xvq3bfjwbsygrfyqn0rlv-guix-cli
> /gnu/store/0m0xx2958fgyz8kk093afik5cn4rhrc1-guix-cli-modules
> /gnu/store/0pi2jhn3a778gc3fm1l31sh07fik4zwa-guix-system-tests-modules
> /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
> # Lots more listed...

is weird.  Something wrong happens here.


Well, could you try

   guix system delete-generations
   guix gc

?



Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 2:08 PM zimoun  wrote:

> How is the initial Docker image
> singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 built?
> To understand, you use the Docker image
> singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 to build another
> Docker image, namely guix-docker-image.tar, using Guix, right?
> Well, that is neither the point nor the issue. :-)
>

You can look at the Dockerfile here:
https://gitlab.com/singularsyntax-docker-hub/guix-bootstrap

It's pretty close to exactly the manual instructions for installing Guix on
a "foreign" distro on top of Alpine Linux.

Not the point, no, but how else do I obtain a seed Guix Docker image, which
I can use to birth clean, pristine
"baby" images of Guix's own making? It would be really nice if the Guix
project itself provided such an image!


> could you try that
>
> --8<---cut here---start->8---
> GUIX_PATH=/root/.config/guix/current/bin
> $GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
> /root/.config/guix/current/bin/guix pull -d
> /root/.config/guix/current/bin/guix package -d
> /root/.config/guix/current/bin/guix gc
> docker commit
> /root/.config/guix/current/bin/guix package --install --fallback jq
> --8<---cut here---end--->8---
>


> Last, you could try to see what "guix package --list-profiles" says
> and then "guix gc --list-dead".

root@localhost /gnu/store# guix pull -d
root@localhost /gnu/store# guix package --list-profiles
/root/.config/guix/current
root@localhost /gnu/store# guix package -d
guix package: error: profile
'/var/guix/profiles/per-user/root/guix-profile' does not exist
root@localhost /gnu/store# guix package --list-profiles
/root/.config/guix/current
root@localhost /gnu/store# du -hs .
4.3G .
root@localhost /gnu/store# guix gc
finding garbage collector roots...
deleting garbage...
[0 MiB] deleting
'/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
[0 MiB] deleting '/gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system'
[0 MiB] deleting '/gnu/store/dzifisbdk1gwy2fw2hwzgvdnjak22awl-guix-extra'
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.82 MiB
guix gc: freed 0.636 MiBs
root@localhost /gnu/store# du -hs .
4.3G .

root@localhost /gnu/store# guix gc --list-dead
finding garbage collector roots...
determining live/dead paths...
/gnu/store/0bm8h4ns6bymc7q24vhfr0dnb7qab729-guix-cli
/gnu/store/0hjjj9dppc5xvq3bfjwbsygrfyqn0rlv-guix-cli
/gnu/store/0m0xx2958fgyz8kk093afik5cn4rhrc1-guix-cli-modules
/gnu/store/0pi2jhn3a778gc3fm1l31sh07fik4zwa-guix-system-tests-modules
/gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
# Lots more listed...


Re: Guix Docker image inflation

2020-05-29 Thread Marius Bakke
Leo Famulari  writes:

>> How else would you suggest that it be done? It would be nice if `guix
>> system docker-image`
>> took `--branch` and `--commit` options to build a container from a
>> well-defined Guix check-in
>> state, but that doesn't seem to be the case. And in any case - too slow.
>> The point here is to
>> leverage daily incremental pulls to keep data transfer and build times down.
>
> --branch and --commit would be passed to `guix pull`, and then you'd run
> `guix system docker-image` based on that.

There is also 'guix time-machine --commit=abc123 -- system docker-image'.
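
A sketch of how that could be wrapped in a script.  The function name
is hypothetical, and it assumes that `guix system docker-image` is
given an operating-system configuration file as its argument and
prints the path of the resulting image tarball.

```shell
# Hypothetical helper: build a Docker image of the system declared in
# CONFIG, using Guix as of COMMIT.
image_at_commit() {   # usage: image_at_commit COMMIT CONFIG
    guix time-machine --commit="$1" -- system docker-image "$2"
}
```

Under that assumption the result could be loaded with something like
`docker load < "$(image_at_commit abc123 config.scm)"`, where
`config.scm` stands for your own configuration file.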




Re: Guix Docker image inflation

2020-05-29 Thread zimoun
On Fri, 29 May 2020 at 20:37, Leo Famulari  wrote:
>
> On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> > Leo Famulari  writes:
> > > --branch and --commit would be passed to `guix pull`, and then you'd run
> > > `guix system docker-image` based on that.
> >
> > There is also 'guix time-machine --commit=abc123 -- system docker-image'.
>
> Right, that's probably more efficient than creating lots of `guix pull`
> generations.

Yes, but it is hard to know the forward commit a priori.



Re: Guix Docker image inflation

2020-05-29 Thread Leo Famulari
On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> Leo Famulari  writes:
> > --branch and --commit would be passed to `guix pull`, and then you'd run
> > `guix system docker-image` based on that.
> 
> There is also 'guix time-machine --commit=abc123 -- system docker-image'.

Right, that's probably more efficient than creating lots of `guix pull`
generations.



Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 2:02 PM Leo Famulari  wrote:

> Okay. For debugging, can you try garbage collecting those modules
> directories? And if the garbage collector refuses, you can investigate
> why with the 3 R's of Guix garbage collection, --referrers,
> --references, and --requisites.
>

# Hmm...
root@localhost /gnu/store# guix gc --references /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
guix gc: error: path `/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules' is not valid

# Hmm...
root@localhost /gnu/store# guix gc --requisites /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
guix gc: error: path `/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules' is not valid

# Hmm... this one is different - no output
root@localhost /gnu/store# guix gc --referrers /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules

# Now try to delete it...
root@localhost /gnu/store# guix gc --delete /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
finding garbage collector roots...
[0 MiB] deleting '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.36 MiB

# Still there...
root@localhost /gnu/store# du -hs /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules


Re: Guix Docker image inflation

2020-05-29 Thread zimoun
Dear,

On Wed, 27 May 2020 at 21:42, Stephen Scheck  wrote:


> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

How is the initial Docker image
singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 built?
To understand, you use the Docker image
singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 to build another
Docker image, namely guix-docker-image.tar, using Guix, right?
Well, that is neither the point nor the issue. :-)


Well, instead of that

--8<---cut here---start->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix gc --delete-generations
/root/.config/guix/current/bin/guix gc --collect-garbage
/root/.config/guix/current/bin/guix gc --optimize
docker commit
/root/.config/guix/current/bin/guix package --install --fallback jq
--8<---cut here---end--->8---

could you try that

--8<---cut here---start->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix pull -d
/root/.config/guix/current/bin/guix package -d
/root/.config/guix/current/bin/guix gc
docker commit
/root/.config/guix/current/bin/guix package --install --fallback jq
--8<---cut here---end--->8---

?


Last, you could try to see what "guix package --list-profiles" says
and then "guix gc --list-dead".


Hope that helps,
simon



Re: Guix Docker image inflation

2020-05-29 Thread Leo Famulari
On Fri, May 29, 2020 at 01:56:28PM -0400, Stephen Scheck wrote:
> > > "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
> > [...]
> > > 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> > > 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
> >
> > If I understand correctly, you should not need both of these directories
> > in a Guix VM image. The latter hashes are truncated guix.git commit
> > hashes and a VM image would only be based on a single one.
> >
> 
> Exactly, I agree (to the extent that I understand Guix).
> 
> > I recommend looking into why all these directories are being copied into
> > your images.
> >
> 
> Whatever is in /gnu/store (as managed by Guix) goes into the image, nothing
> more and nothing less.

Okay. For debugging, can you try garbage collecting those modules
directories? And if the garbage collector refuses, you can investigate
why with the 3 R's of Guix garbage collection, --referrers,
--references, and --requisites.
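
For convenience, the three queries can be run in one go.  A minimal
sketch with a hypothetical function name; it only chains the three
`guix gc` options named above.

```shell
# Hypothetical helper: print what a store item points to, what its
# full closure is, and what still points to it.
inspect_item() {   # usage: inspect_item /gnu/store/...
    for query in --references --requisites --referrers; do
        echo "== guix gc $query"
        guix gc "$query" "$1"
    done
}
```

An item that still has referrers is kept alive by them, so that output
is usually the first place to look.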

> How else would you suggest that it be done? It would be nice if `guix
> system docker-image`
> took `--branch` and `--commit` options to build a container from a
> well-defined Guix check-in
> state, but that doesn't seem to be the case. And in any case - too slow.
> The point here is to
> leverage daily incremental pulls to keep data transfer and build times down.

--branch and --commit would be passed to `guix pull`, and then you'd run
`guix system docker-image` based on that.



Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Fri, May 29, 2020 at 1:08 PM Leo Famulari  wrote:

> I'm still not quite sure what you are doing (or what Docker does) so
> please bear with me.
>
> > root@localhost /# du -h --max-depth=1 /gnu/store | egrep
> > "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
> [...]
> > 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> > 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
>
> If I understand correctly, you should not need both of these directories
> in a Guix VM image. The latter hashes are truncated guix.git commit
> hashes and a VM image would only be based on a single one.
>

Exactly, I agree (to the extent that I understand Guix).

> I recommend looking into why all these directories are being copied into
> your images.
>

Whatever is in /gnu/store (as managed by Guix) goes into the image, nothing
more and nothing less.


>
> I figure you'd want to create each image with *only* the things
> corresponding to the Git commit it's based on, but it sounds like they
> are being created by copying the entire host image, which doesn't seem
> right.
>
> If the Docker images are being created by simply snapshotting the file
> system of a non-ephemeral Guix system, that's probably not the right way
> to do it. Is that what's going on?
>

Yes, as I said, the image is created from a file system snapshot, after
Guix is brought up to date via `guix pull` and those various Guix garbage
collection operations are run. However, it's not quite "non-ephemeral", as
each Guix operation is run as an atomic command inside the Docker
container, with nothing else running (except for guix-daemon, which has to
always be running for Guix to operate to the best of my understanding, and
a couple of other Guix System daemons, which anyway would be equivalent to
the situation of any Guix installation running outside of a Docker
container).

How else would you suggest that it be done? It would be nice if `guix
system docker-image` took `--branch` and `--commit` options to build a
container from a well-defined Guix check-in state, but that doesn't seem to
be the case. And in any case - too slow. The point here is to leverage
daily incremental pulls to keep data transfer and build times down.


Re: Guix Docker image inflation

2020-05-29 Thread Leo Famulari
On Fri, May 29, 2020 at 12:19:46PM -0400, Stephen Scheck wrote:
> The previous day's Docker image is used as the base for the new one being
> built - the image is pulled from Docker Hub, `guix pull` is run inside it,
> and a new
> image is "committed" (Docker terminology for creating a new image from a
> file system snapshot).

I'm still not quite sure what you are doing (or what Docker does) so
please bear with me.

> root@localhost /# du -h --max-depth=1 /gnu/store | egrep
> "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
[...]
> 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules

If I understand correctly, you should not need both of these directories
in a Guix VM image. The hashes just before "-modules" are truncated
guix.git commit hashes, and a VM image would only be based on a single one.

I recommend looking into why all these directories are being copied into
your images.

I figure you'd want to create each image with *only* the things
corresponding to the Git commit it's based on, but it sounds like they
are being created by copying the entire host image, which doesn't seem
right.

If the Docker images are being created by simply snapshotting the file
system of a non-ephemeral Guix system, that's probably not the right way
to do it. Is that what's going on?



Re: Guix Docker image inflation

2020-05-29 Thread Stephen Scheck
On Thu, May 28, 2020 at 3:33 PM Leo Famulari  wrote:
> I'm not familiar with Docker so I'm not sure exactly what you are doing.
> Specifically, I can't tell if you are creating new Docker images from
> scratch each day, or if you are continuing to use the same one from day
> to day.

The previous day's Docker image is used as the base for the new one being
built - the image is pulled from Docker Hub, `guix pull` is run inside it,
and a new image is "committed" (Docker terminology for creating a new
image from a file system snapshot).

BTW, I posted an incorrect internal link - the actual Docker images are
available here if you'd like to try them out:

https://hub.docker.com/r/singularsyntax/guix/tags

> I'm also not sure which image is growing each day...

The daily Docker images described above.

> In general, the parameters --delete-generations and --collect-garbage
> are supposed to be passed values like a reference to a profile or an
> amount of data to delete, respectively. Are you doing that?

`guix gc --delete-generations` without a parameter causes all preceding
pull and package generations to be deleted.

> Are you removing / invalidating old generations before attempting to
> garbage collect them? The store items they refer to cannot be deleted
> until the generations themselves are no longer registered.

Yes, `guix gc --delete-generations`, `guix gc --collect-garbage`, and
`guix gc --optimize` are run in the order given. Note that passing a
specific amount parameter to `--collect-garbage` makes no difference.
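
A minimal sketch of that sequence, with each step run only if the previous
one succeeded. The three `guix gc` invocations are the ones named above;
`run`, `DRY_RUN`, and `cleanup_store` are hypothetical helpers added for
illustration and are not taken from the actual build script.

```shell
#!/bin/sh
# Sketch only. `run`, DRY_RUN and cleanup_store are hypothetical helpers.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # print the command instead of executing it
    else
        "$@"
    fi
}

cleanup_store() {
    # Same order as described: drop old generations, collect, then deduplicate.
    run guix gc --delete-generations &&
    run guix gc --collect-garbage &&
    run guix gc --optimize
}

# Usage (prints the three commands without running them):
#   DRY_RUN=1; cleanup_store
```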

> Usually, these old profiles are responsible for most of the disk usage
> in /gnu/store.

Indeed. It's clear what's taking up the space, but I don't understand why
it does not get garbage collected:

root@localhost /# guix pull --list-generations
Generation 12 May 28 2020 20:45:30 (current)
  guix a5374cd
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: a5374cde918cfeae5c16b43b9f2dd2b24bc3564d

root@localhost /# guix package --list-generations
guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' does not exist

root@localhost /# du -h --max-depth=1 /gnu/store | egrep "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
44M /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system
44M /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system
107M /gnu/store/plaay02w581vx9ilyiv93sl1lw54n7h5-guix-packages-base
44M /gnu/store/qhbk7g8z97m37iak1s1yn2my82gv0lj5-guix-system
103M /gnu/store/2qcfl7h10dynjlifyvqwh9iiic52q5x6-guix-packages-base
107M /gnu/store/m0fv2xmfif5pxnfb1bscfvgyfx0x6xdc-guix-packages-base
90M /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules
41M /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system
191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
44M /gnu/store/dzc16sv8jv831m0jkk5llc2ws1a3mk0z-guix-system
44M /gnu/store/9a2hr5lh15vxqa7bjih8w47wr6hr11nv-guix-system
103M /gnu/store/1lwdys51wi08r5an2rr6sqk9kbgr7qip-guix-packages-base
44M /gnu/store/c3spiv1c0fg83j7d99mjwk0s6fw77wl5-guix-system
44M /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system
6.7M /gnu/store/a5xsqxr04pwnyni5x2gqjnishzq80cbw-guix-packages-base
14M /gnu/store/mych9fchln22pbhpc5syxyymx4hz496y-guix-8bd0b533b-modules
35M /gnu/store/brbwlbnx56ms50kklyqk9fsf0xkwjjf9-guix-498e2e669-modules
3.2M /gnu/store/dirpwhdr7h4nyphy4ncxqi4f2njv3rsh-guix-packages-base
35M /gnu/store/d3h4b7nvnms8d03ddi9b481dlxpykl7l-guix-5e3d16994-modules
5.8M /gnu/store/n339sr8c63f0nzja6yl8zfwy1jklj19j-guix-packages-base
25M /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
41M /gnu/store/pwr8ab20xa1whxag689lsz82l2na08x0-guix-system
6.5M /gnu/store/6sggbpgg0zkbgxwf3wa2j15dis8z7cr1-guix-packages-base
57M /gnu/store/8z9qc2bvq8azc08p4miq77yf2agk07aq-guix-843e77205-modules
71M /gnu/store/ibgjq1ampj8bldrabbsnwik2sr0gg3as-guix-a43fe7acd-modules
37M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
18M /gnu/store/i72b4biraw6bhy1v7ly46kwyaacvfa28-guix-system
178M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules
15M /gnu/store/77sxajrwigsdnyr4l4jq4pk6v5kwbm59-guix-system
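
To put a number on the listing above, the sizes can be summed per
category; by the rounded figures shown, the eleven `-modules` directories
alone come to roughly 934M. The following is a sketch only: it assumes every
matching size carries an "M" suffix, as in this listing, and the function
name `sum_store_sizes` is hypothetical.

```shell
#!/bin/sh
# Sketch: sum the sizes of `du -h` lines whose path matches a pattern.
# Assumes sizes are reported in megabytes ("M" suffix); other units are skipped.

sum_store_sizes() {
    # $1: extended regular expression matched against the path column
    awk -v pat="$1" '
        $2 ~ pat && $1 ~ /M$/ { sub(/M$/, "", $1); total += $1 }
        END { printf "%.1fM\n", total }
    '
}

# Usage:
#   du -h --max-depth=1 /gnu/store | sum_store_sizes "-modules$"
```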


Re: Guix Docker image inflation

2020-05-28 Thread Leo Famulari
On Wed, May 27, 2020 at 03:41:49PM -0400, Stephen Scheck wrote:
> As an exercise, I set up daily Guix System Docker image builds using GitLab
> and Docker Hub, here:
> https://hub.docker.com/repository/registry-1.docker.io/singularsyntax/guix/tags?page=1

Cool!

> The build process works as follows: if an existing `latest` image does not
> exist for a given branch (master, 1.1.0, etc.), then bootstrap an image by
> running `guix system docker-image` inside an Alpine Linux Docker container
> with a fresh Guix installation. Using this image as a seed, `guix pull` is
> run for the desired branch, and the resulting image is committed to the
> Docker repository. If a "latest" image does exist, it is used instead as
> the base from which to run `guix pull`. Daily images are thus built
> incrementally from the previous day's build. For anybody curious about the
> process, the build script can be browsed here:
> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

I'm not familiar with Docker so I'm not sure exactly what you are doing.
Specifically, I can't tell if you are creating new Docker images from
scratch each day, or if you are continuing to use the same one from day
to day.

> It works pretty well, except that I'm observing substantial image size
> inflation day-over-day, starting at ~197 MB from the seed image, now up to
> 1.71 GB eleven days later despite running `guix gc --delete-generations`,
> `guix gc --collect-garbage`, and `guix gc --optimize` after pulling prior
> to committing each new image.

I'm also not sure which image is growing each day...

In general, the parameters --delete-generations and --collect-garbage
are supposed to be passed values like a reference to a profile or an
amount of data to delete, respectively. Are you doing that?

Are you removing / invalidating old generations before attempting to
garbage collect them? The store items they refer to cannot be deleted
until the generations themselves are no longer registered.

You can list existing generations with e.g. `guix package
--list-generations`. You can invalidate them with `guix package
--delete-generations=42` or with time-based patterns like `guix package
--delete-generations=1m`, which removes everything older than one month.
The same-named argument to `guix gc` should be shorthand for that.

Similarly for the profile used by `guix pull`, which is accessed like
this: `guix package --profile=$HOME/.config/guix/current
--list-generations`.
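
Put together, the commands above might be wrapped as follows. This is a
sketch only: `run`, `DRY_RUN`, and the two function names are hypothetical
helpers, the `1m` default is just the example pattern mentioned above, and
applying `--delete-generations` to the `guix pull` profile in the same way
is an assumption based on "similarly".

```shell
#!/bin/sh
# Sketch only. `run`, DRY_RUN and the function names are hypothetical helpers.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # print the command instead of executing it
    else
        "$@"
    fi
}

PULL_PROFILE="$HOME/.config/guix/current"   # profile used by `guix pull`

list_all_generations() {
    run guix package --list-generations
    run guix package --profile="$PULL_PROFILE" --list-generations
}

expire_old_generations() {
    # $1: generation pattern, e.g. 42 or 1m (defaults to 1m)
    run guix package --delete-generations="${1:-1m}"
    run guix package --profile="$PULL_PROFILE" --delete-generations="${1:-1m}"
}

# Usage (prints the commands without running them):
#   DRY_RUN=1; list_all_generations; expire_old_generations 1m
```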

Usually, these old profiles are responsible for most of the disk usage
in /gnu/store.