Re: Mid-December update on bordeaux.guix.gnu.org

2021-12-15 Thread Christopher Baines

zimoun  writes:

> Hi Chris,
>
> Thanks for the update.  And for all the work. :-)
>
>
> On Wed, 15 Dec 2021 at 16:48, Christopher Baines  wrote:
>
>> In summary, the space issue I mentioned in the previous update has
>> effectively been addressed. All the paused agents are now unpaused and
>> builds are happening again.
>
> The timing had almost been perfect. ;-)
>
>
> Well, as discussed in September, one concern I have is about “long-term
> storage” – where neither “long-term” nor “storage” is well defined.
>
> Do you think that Bordeaux could run
>
>

The Guix Build Coordinator just builds derivations. I haven't had it
build a manifest before, but I guess that's possible.

I think it's unnecessary though, since I believe derivations for all
origins of all packages are already being built. That happens just by
asking the coordinator to build derivations for all packages; you don't
need to specify "source" derivations separately.
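
As a quick sketch of what I mean (from memory, so treat the exact flag
combination as an assumption):

  # Derivation for the package itself; building it also involves the
  # origin's fixed-output "source" derivation:
  guix build -d hello

  # Derivation for just the source, combining -d with -S/--sources:
  guix build -d -S hello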

> ?  Having redundancy for all origins would avoid breakage.  For
> instance, because Berlin was down yesterday morning, “guix pull” was
> broken because the missing ’datefudge’ source had disappeared upstream.

I would hope that bordeaux.guix.gnu.org has a substitute for that.
Could you check the derivation against data.guix.gnu.org and see if
there's a build? Use a URL like:

  https://data.guix.gnu.org/gnu/store/vhj3gg00hzqfi8lazr3snb9msr4a3q6l-datefudge_1.23.tar.xz.drv
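
Alternatively, guix weather can query a substitute server directly (a
sketch, assuming datefudge is still in the distribution):

  # Report substitute availability from bordeaux.guix.gnu.org:
  guix weather --substitute-urls=https://bordeaux.guix.gnu.org datefudge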

There is one issue though: bordeaux.guix.gnu.org doesn't provide
content-addressed files in the same way guix publish does. I hope to add
that through the nar-herder, and once that's added, bordeaux.guix.gnu.org
can hopefully be added to the list of content-addressed mirrors:

  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/download.scm#n368

That would mean that the bytes of a tar archive, for example, would be
available by their sha256 hash, not just as a nar. I'm not sure to what
extent this would help, but it's probably useful.
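
As a sketch of how that might look, assuming the nar-herder ends up
using the same /file URL layout as guix publish (the hash below is just
a placeholder):

  # Content-addressed lookup of a tarball by file name and sha256 hash:
  curl -fI "https://bordeaux.guix.gnu.org/file/datefudge_1.23.tar.xz/sha256/<base32-hash>"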

>> In general this is an important step in being more flexible where the
>> nars are stored. There's still a reliance on storing pretty much all the
>> nars on a single machine, but which machine has this role is more
>> flexible now. I think this architecture also makes it easier to break
>> the "all nars on a single machine" restriction in the future as well.
>
> IIUC the design, if the proxy server is lost, then it is easy to replace
> it.  Right?

I guess so; the nar-herder at least helps with managing the data, which
makes setting up new or replacement servers easier.

> I remember discussions about a CDN [2,3,4,5,6].  I do not know if it
> solves the issue but, from my understanding, it would at least improve
> delivery performance.  Well, it appears to me worth giving a try.
>
>
> 2: 
> 3: 
> 4: 
> 5: 
> 6: 

Effectively this is moving towards building a CDN. With the nar-herder,
you could deploy reverse proxies (or edge nodes) in various
locations. Then the issue just becomes how to have users use the ones
that are best for them. This might require doing some fancy stuff with
GeoIP-based DNS, and somehow sharing TLS certificates between the
machines, but I think it's quite feasible.

>> Going forward, it would be good to have an additional full backup of the
>> nars that bayfront can serve things from, to provide more
>> redundancy. I'm hoping the nar-herder will also enable setting up
>> geographically distributed mirrors, which will hopefully improve
>> redundancy further, and maybe performance of fetching nars too.
>
> To me, a first general question about backup coordination is to define
> a time window:
>
>  - source: forever until the complete fallback to SWH is robust;
>  - all the substitutes to run “guix time-machine --commit=<> -- help ”
>for any commit reachable by inferior: forever;
>  - package substitutes: settle on some rule.

The idea I've been working with so far is simply to store everything
that's built, forever.

Currently, that amounts to 561,043 nars totaling ~2.5 TB (roughly
4.5 MB per nar on average).

How feasible this is depends on a number of factors, but I don't have
any reason to think it's not feasible yet.

> Thanks for taking care of the redundancy and reliability of CI.

There's no relationship to continuous integration yet, although I am
hoping that, if the building and serving of substitutes stabilises,
bordeaux.guix.gnu.org might be able to play a part in testing patches
and branches (as discussed in [1]).

1: https://lists.gnu.org/archive/html/guix-devel/2021-08/msg1.html

Thanks for all your comments.

Re: Mid-December update on bordeaux.guix.gnu.org

2021-12-15 Thread zimoun
Hi Chris,

Thanks for the update.  And for all the work. :-)


On Wed, 15 Dec 2021 at 16:48, Christopher Baines  wrote:

> In summary, the space issue I mentioned in the previous update has
> effectively been addressed. All the paused agents are now unpaused and
> builds are happening again.

The timing had almost been perfect. ;-)


Well, as discussed in September, one concern I have is about “long-term
storage” – where neither “long-term” nor “storage” is well defined.

Do you think that Bordeaux could run

   

?  Having redundancy for all origins would avoid breakage.  For
instance, because Berlin was down yesterday morning, “guix pull” was
broken because the missing ’datefudge’ source had disappeared upstream.

Today, Guix is well covered for packages using ’git-fetch’ but not for
all the other methods.  The situation is improving towards a complete
fallback using Software Heritage via Disarchive.  Not ready yet [1]. :-)
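
For what it is worth, the archival lint checker already reports whether
a package's source is archived in Software Heritage (a sketch):

  # Check whether hello's source is archived in SWH:
  guix lint -c archival hello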

This redundancy for all sources appears to me vitally important.
Because if Berlin is totally lost for whatever reason, it is game over
for preserving Guix – well, short of recovering from what is scattered
across people’s stores.

Said otherwise, in terms of capacity and priority, it appears to me
worse if 0.01% of sources (or even just one) are missing than if some
substitutes are missing.  Because I can always locally burn some CPU,
but I cannot recreate source code. :-)

1: 


> In addition to lakefront, I've also added a 6TB hard drive to hatysa,
> the HoneyComb LX2 machine that I host. Like lakefront, it's busy
> downloading the nars from bayfront. This will act as a backup in case
> lakefront is lost.

Cool!  Thanks.


> In general this is an important step in being more flexible where the
> nars are stored. There's still a reliance on storing pretty much all the
> nars on a single machine, but which machine has this role is more
> flexible now. I think this architecture also makes it easier to break
> the "all nars on a single machine" restriction in the future as well.

IIUC the design, if the proxy server is lost, then it is easy to replace
it.  Right?

I remember discussions about a CDN [2,3,4,5,6].  I do not know if it
solves the issue but, from my understanding, it would at least improve
delivery performance.  Well, it appears to me worth giving a try.


2: 
3: 
4: 
5: 
6: 


> Going forward, it would be good to have an additional full backup of the
> nars that bayfront can serve things from, to provide more
> redundancy. I'm hoping the nar-herder will also enable setting up
> geographically distributed mirrors, which will hopefully improve
> redundancy further, and maybe performance of fetching nars too.

To me, a first general question about backup coordination is to define
a time window:

 - source: forever until the complete fallback to SWH is robust;
 - all the substitutes to run “guix time-machine --commit=<> -- help ”
   for any commit reachable by inferior: forever;
 - package substitutes: settle on some rule.


Thanks for taking care of the redundancy and reliability of CI.


Cheers,
simon



Re: Guix Documentation Meetup

2021-12-15 Thread jgart
On Wed, 15 Dec 2021 20:12:48 +0100 zimoun  wrote:

Hi all,

Just wanted to share the patches here from the meetup:

https://issues.guix.gnu.org/52505

all best,

jgart



Re: core-updates-frozen branch merged

2021-12-15 Thread Thiago Jung Bauermann
Hello!

On Monday, 13 December 2021, at 22:34:34 -03, Maxim Cournoyer wrote:
> Hello Guix!
> 
> In case you hadn't taken notice, the core-updates-frozen branch was
> finally merged into master.

Hooray! Thank you very much to everyone who made this happen! I really 
appreciate the effort. The master branch is working great for me.

-- 
Thanks,
Thiago





Re: CI status

2021-12-15 Thread Ricardo Wurmus


Mathieu Othacehe  writes:

>> * The cuirass-remote-server Avahi service is no longer visible when
>>   running "avahi-browse -a". I strongly suspect that this is related to
>>   the static-networking update, even if I don't have a proof for
>>   now. This means that the remote-workers using Avahi for discovering
>>   (hydra-guix-*) machines can no longer connect. The
>>   ci.guix.gnu.org/workers list is thus quite empty.
>
> This is caused by: https://issues.guix.gnu.org/52520. I worked around it
> by enabling multicast manually on the berlin eno1 network interface.
>
> Cuirass is building again, yay!

Thank you, Mathieu!  I really appreciate your work on Cuirass and your
efforts in diagnosing and working around performance problems.

-- 
Ricardo



Re: CI status

2021-12-15 Thread Leo Famulari
On Wed, Dec 15, 2021 at 08:38:56PM +0100, Mathieu Othacehe wrote:
> 
> > * The cuirass-remote-server Avahi service is no longer visible when
> >   running "avahi-browse -a". I strongly suspect that this is related to
> >   the static-networking update, even if I don't have a proof for
> >   now. This means that the remote-workers using Avahi for discovering
> >   (hydra-guix-*) machines can no longer connect. The
> >   ci.guix.gnu.org/workers list is thus quite empty.
> 
> This is caused by: https://issues.guix.gnu.org/52520. I worked around it
> by enabling multicast manually on the berlin eno1 network interface.
> 
> Cuirass is building again, yay!

Great news! Thanks for your diligence.



Re: CI status

2021-12-15 Thread Mathieu Othacehe


> * The cuirass-remote-server Avahi service is no longer visible when
>   running "avahi-browse -a". I strongly suspect that this is related to
>   the static-networking update, even if I don't have a proof for
>   now. This means that the remote-workers using Avahi for discovering
>   (hydra-guix-*) machines can no longer connect. The
>   ci.guix.gnu.org/workers list is thus quite empty.

This is caused by: https://issues.guix.gnu.org/52520. I worked around it
by enabling multicast manually on the berlin eno1 network interface.
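
For reference, the manual workaround was along these lines (a sketch;
interface name per the above):

  # Re-enable multicast so Avahi announcements go through again:
  ip link set dev eno1 multicast on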

Cuirass is building again, yay!

Mathieu



Re: Guix Documentation Meetup

2021-12-15 Thread zimoun
Hi,

On Wed, 15 Dec 2021 at 18:37, Blake Shaw  wrote:
>
> Blake Shaw  writes:
>
> Simon, just peeped your monad tutorial, and gotta say it's one of the
> clearest presentations of the subject I've seen. Great work!

Thanks.  If it helps, I can share the Org source file.


Cheers,
simon



SSH service for Guix Home

2021-12-15 Thread Ludovic Courtès
Hi Andrew,

One service I miss for Guix Home is ‘home-ssh-service-type’, which is in
the “original” Guix Home.

Could you contribute a patch adding it?  (I could do it on your behalf,
but it sounds more logical to let you handle it.)

Also, could you (or Xinglu, or Oleg) write a blog post for guix.gnu.org,
targeting an audience who’s not familiar with this kind of tool, making
it clear what the rationale is and what it can bring to “normal users”?
It would be really helpful to have that published within a couple of
weeks or so, before the next release.

Last, it’d be great to see the three of you (and more people!) back in
action regarding Guix Home.  I understand that life sometimes gets in
the way, but it seems that there’s been some confusion as to how to go
forward—e.g., —which may partly
explain why things stalled.  If there are patches waiting for review,
also don’t hesitate to ping!

Thanks,
Ludo’.



Re: CI status

2021-12-15 Thread Leo Famulari
On Wed, Dec 15, 2021 at 05:15:08PM +0100, Mathieu Othacehe wrote:
> * The IO operations on Berlin are mysteriously slow. Removing files from
>   /gnu/store/trash is taking ages. This is reported here:
>   https://issues.guix.gnu.org/51787.

I believe this is because we are running `guix gc --verify=contents` to
check the status of the build artifacts after the shutdown. I'm not sure
whether or not we can get a progress report on this.

> * The PostgreSQL database behind ci.guix.gnu.org also became super slow
>   and I decided to drop it. I don't know if there's a connection with
>   the above point. I'm missing the appropriate tools/knowledge to
>   monitor the IO & file-system performances.

You might try `atop`, which at least highlights that the storage is
almost fully loaded with I/O operations. Beyond that is `sar` from the
sysstat package, although making use of it requires some learning.
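
For example (a sketch, assuming the sysstat tools are installed):

  # Snapshot overall resource usage every 5 seconds:
  atop 5

  # Extended per-device I/O statistics at a 5-second interval:
  iostat -x 5

  # The equivalent sar report for block devices:
  sar -d 5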



Mid-December update on bordeaux.guix.gnu.org

2021-12-15 Thread Christopher Baines
Hey!

I sent out the last update 3 weeks ago [1].

1: https://lists.gnu.org/archive/html/guix-devel/2021-11/msg00154.html

In summary, the space issue I mentioned in the previous update has
effectively been addressed. All the paused agents are now unpaused and
builds are happening again.

However, due to the time spent not building things, the backlog is
longer than usual, and the substitute availability (especially for
x86_64-linux and i686-linux) is lower than usual.

I've also noticed that bordeaux.guix.gnu.org doesn't work over IPv6, and
I want to fix that soon.

** Space issues and the nar-herder

bordeaux.guix.gnu.org wasn't building things for 2 weeks as the space on
bayfront was getting a little scarce. This week I started rolling out
the nar-herder [2], a utility I've been thinking about for a while. This
has enabled moving nars off of bayfront onto another machine, which
I've confusingly named lakefront.

The lakefront machine is hosted by Hetzner in Germany, and has 6TB of
storage across 2 hard drives. When a nar is requested from bayfront, it
will check its local storage and serve the nar from there if it exists;
otherwise it will forward the request to lakefront. There might be a
slight drop in the speed you can download nars, but apart from that this
change shouldn't be visible.
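
As a sketch (assuming we keep the guix-publish-style /nar/lzip/ layout;
the hash and name below are placeholders), the user-facing URL is the
same wherever the nar actually lives:

  # HEAD request for a nar; bayfront serves it locally or forwards it:
  curl -sI "https://bordeaux.guix.gnu.org/nar/lzip/<hash>-<package>" | head -n1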

The nar-herder is now busy deleting nars on bayfront which are available
on lakefront. Once it's got through the backlog, I'll enable nginx
caching for the nars on bayfront, which should help improve
performance. I've tested downloading the largest nars (~2GB) though, and
it seems to work fine.

In addition to lakefront, I've also added a 6TB hard drive to hatysa,
the HoneyComb LX2 machine that I host. Like lakefront, it's busy
downloading the nars from bayfront. This will act as a backup in case
lakefront is lost.

In general this is an important step in being more flexible where the
nars are stored. There's still a reliance on storing pretty much all the
nars on a single machine, but which machine has this role is more
flexible now. I think this architecture also makes it easier to break
the "all nars on a single machine" restriction in the future as well.

Going forward, it would be good to have an additional full backup of the
nars that bayfront can serve things from, to provide more
redundancy. I'm hoping the nar-herder will also enable setting up
geographically distributed mirrors, which will hopefully improve
redundancy further, and maybe performance of fetching nars too.

Once I've had a chance to neaten up the code a little, I'll get a Guix
package and service written, plus I'll draft a Guix blog post about the
nar-herder.

2: https://git.cbaines.net/guix/nar-herder/about/

** Build machines and backlog

Because of the 2 weeks of not building anything, there's a significant
backlog of builds to get through, and that's not counting the new
builds from the core-updates-frozen merge.

As for build machines, milano-guix-1 came back online today, which is
great. I believe harbourfront is still unusable though (broken hard
drive).

That means the following build agents are currently running (by
architecture):

 - x86_64-linux + i686-linux (3 machines):
   - 4 core Intel NUC (giedi)
   - Max 16 cores for 1 concurrent build on bayfront
   - 32 cores on milano-guix-1 (slow storage though)
 - aarch64-linux + armhf-linux (2 machines):
   - 16 core HoneyComb LX2 (hatysa)
   - 4 core Overdrive (monokuma)
 - powerpc64le-linux (1 machine):
   - 64 core machine (polaris)

Ironically, I think that the most under-resourced area is x86_64-linux +
i686-linux. bayfront is restricted in what it can do since it also runs
the coordinator, and things go badly if the machine gets
overloaded. bayfront and milano-guix-1 both have hard drive storage,
which also can slow them down when building things (especially
milano-guix-1).

If we (as a project) want bordeaux.guix.gnu.org to have the capacity to
keep up, it would be good to make a plan to add capacity. I think even a
single high-core-count x86_64-linux machine with performant storage
would make a big difference.

** IPv6 not supported (yet)

I was slow to notice, but bordeaux.guix.gnu.org isn't available over
IPv6 yet, since bayfront doesn't seem to have IPv6 connectivity. I want
to address this, but I haven't worked out quite how yet.
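
A quick way to confirm, since curl can be pinned to one address family:

  # Force IPv6; this currently fails:
  curl -6 -sI https://bordeaux.guix.gnu.org/ | head -n1

  # For comparison, over IPv4 it works:
  curl -4 -sI https://bordeaux.guix.gnu.org/ | head -n1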

Please let me know if you have any comments or questions!

Chris


signature.asc
Description: PGP signature


Re: Guix Documentation Meetup

2021-12-15 Thread Blake Shaw
Blake Shaw  writes:

Simon, just peeped your monad tutorial, and gotta say it's one of the
clearest presentations of the subject I've seen. Great work!

>> I agree.  For what it is worth, I tried once to quickly explain monad,
>> with the aim of “Store Monad“ in mind,
>>
>> https://guix.gnu.org/manual/devel/en/guix.html#The-Store-Monad
>>
>> After several discussions with strong Guix hackers, it appears to me
>> that they missed the general concept of monad, at least it was vague.
>> Therefore, I tried to write a simple explanation,
>>
>> https://simon.tournier.info/posts/2021-02-03-monad.html
>
-- 
“In girum imus nocte et consumimur igni” (“we go round in the night and
are consumed by fire”)



CI status

2021-12-15 Thread Mathieu Othacehe


Hello,

You must have noticed that the CI is currently struggling a bit. Here is
a small recap of the situation.

* The IO operations on Berlin are mysteriously slow. Removing files from
  /gnu/store/trash is taking ages. This is reported here:
  https://issues.guix.gnu.org/51787.

  We have to kill the garbage collection frequently to keep things
  going. The bad side is obviously that we can't do that forever, as we
  only have 9.3T free and decreasing, while we aim to keep 10T
  available (one possibly relevant knob is sketched after this list).

* The PostgreSQL database behind ci.guix.gnu.org also became super slow
  and I decided to drop it. I don't know if there's a connection with
  the above point. I'm missing the appropriate tools/knowledge to
  monitor the IO & file-system performances.

* The php package isn't building anymore, reported here:
  https://issues.guix.gnu.org/52513. This means that we cannot
  reconfigure zabbix. I removed it from the berlin configuration
  temporarily.
  
* The cuirass-remote-server Avahi service is no longer visible when
  running "avahi-browse -a". I strongly suspect that this is related to
  the static-networking update, even if I don't have a proof for
  now. This means that the remote-workers using Avahi for discovering
  (hydra-guix-*) machines can no longer connect. The
  ci.guix.gnu.org/workers list is thus quite empty.

* Facing those problems, I tried to roll back to a previous system
  generation, but this brings even more issues; for instance, the older
  Cuirass package struggles with the new database structure and other
  niceties. I think our best course of action is to stick to master and
  fix the above problems.
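
On the garbage collection point, one possibly relevant knob (a sketch;
I believe guix gc accepts a free-space target):

  # Collect only until 10T are free, rather than running a full sweep:
  guix gc --free-space=10T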

Thanks,

Mathieu



Guix Packaging Meetup This Saturday

2021-12-15 Thread jgart
Hi Guixers!

I'd like to invite you this Saturday to another Guix Packaging Meetup.

We'll meet at 10:30 AM ET (3:30 PM UTC)

Here's the room link:

https://meet.nixnet.services/b/jga-rtw-ahw-yky

If you have any packaging requests, or ideas for things you'd like to
work on together, feel free to reply.

all best,

jgart



Re: Tensorflow fixes on core-updates-frozen

2021-12-15 Thread Guillaume Le Vaillant
Ricardo Wurmus  writes:

> Ricardo Wurmus  writes:
>
>> Unfortunately, this is not enough to build tensorflow.  At the very end
>> we have this problem: […]
>
> This should now be fixed with commit e1c91aae23af12bccab615902a08ebc86defc1ac.

Thanks!


signature.asc
Description: PGP signature


Release process and schedule ?

2021-12-15 Thread zimoun
Hi,

Now that core-updates is merged, it is a good time to make a new release [0].

Schedule?  I propose a release date of January 31st.  Too early?  When?

The branch version-1.4.0 is already created, and instead of
cherry-picking, I propose to rebase it on master until the freeze.
WDYT?
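
Concretely, the workflow I have in mind would be something like this
(branch and remote names assume the usual setup):

  # Keep version-1.4.0 in sync with master until the freeze:
  git checkout version-1.4.0
  git rebase master
  git push --force-with-lease origin version-1.4.0  # rebase rewrites history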

For v1.3, this bug #47297 [1] collected all the blocking items and it
worked well, I guess.  Do we do the same?


In any case, to help the release process, please:

 a. test – especially the installer [2]
 b. fix bugs severity:important,serious [3,4] or report
 c. proofread the latest manual [5]
 d. translate [6]
 e. highlight items to ease compiling the list of important changes

The lesson of v1.0.1 [#,@] is: please help in testing the installer. :-)


How does it sound?

Last, Guix is a “rolling release”, so what does ‘release’ even mean? :-)
The main argument for releasing, IMHO, is communication, and thus
attracting potential new users. :-)

Cheers,
simon

[0]: 
[1]: 
[2]: 
[3]: 
[4]: 
[5]: 
[6]: 

[#] 
[@] 



Re: Tensorflow fixes on core-updates-frozen

2021-12-15 Thread Ricardo Wurmus


Ricardo Wurmus  writes:

> Unfortunately, this is not enough to build tensorflow.  At the very end
> we have this problem: […]

This should now be fixed with commit e1c91aae23af12bccab615902a08ebc86defc1ac.

-- 
Ricardo