bug#55848: [cuirass] workers stalled

2022-08-11 Thread Ludovic Courtès
Hi,

Ricardo Wurmus writes:

> Ludovic Courtès  writes:
>
>> Ludovic Courtès writes:
>>
>>> guix-daemon is configured to use the default substitute URLs,
>>> https://ci.guix.gnu.org and https://bordeaux.guix.gnu.org, which we know
>>> are unreachable.
>>>
>>> I’ve theoretically addressed this here:
>>>
>>>   https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=99bd9dc9001d6bea7480a7ce0e0e10ff78adb787
>>>   https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=b0661cc7d6dd74b0aeac3b052a80a8a2fef2af9c
>>>
>>> I tried to reconfigure those boxes with ‘guix deploy’, but this is
>>> currently on hold because ci.guix has run out of inodes…
>>
>> Time passed and I had kinda forgotten about it, but the problem remains.
>
> I wrote this earlier:
>
>> They should be using the local IP instead of routing through the
>> internet, so /etc/hosts should contain an entry for
>>
>> 141.80.167.131 ci.guix.gnu.org
>
> So running the daemon with “--substitute-urls=http://10.0.0.1” should
> not be necessary.

Oh my bad, sorry for overlooking your message.

Explicitly going through http://10.0.0.1 is still desirable I think
because we avoid HTTPS altogether.

‘guix deploy’ is still running on berlin.guix and building things;
unfortunately I’m going AFK for a bit.  I’ll pick it up later unless
someone takes care of it by then.

Thanks,
Ludo’.





bug#55848: [cuirass] workers stalled

2022-06-20 Thread Maxim Cournoyer
Hi Maxime,

Maxime Devos  writes:

> Maxim Cournoyer wrote on Sun 19-06-2022 at 22:39 [-0400]:
>> There was also an attempt to cross-compile a rust/cargo bootstrap seed
>> for other architectures (branch: wip-cross-built-rust) but due to
>> complications with building rust as a static archive (it relies on
>> dynamic linking for its macro expand crates), the effort stalled.
>
> FWIW, has it been considered to cross-compile rust non-statically
> (not as a seed, just as an input cross-compiled from another system)?
> Doesn't help for people that cannot offload to x86_64 and don't have
> substitutes from ci.guix.gnu.org or such enabled, but could still be an
> improvement.

This already works, on the branch.  One of the patches carried there
that made it possible has been merged upstream too.  The issue is that
to offer a useful cross-compiled rust on non-x86_64 systems, you need to
move it from system domains; the clean way to do this is to archive a
static binary that depends on nothing else somewhere, and extract it in
a package for the target architecture.

Currently it's not cleanly self-contained because it still references
GCC libraries.

Maxim





bug#55848: [cuirass] workers stalled

2022-06-20 Thread Maxime Devos
Maxim Cournoyer wrote on Sun 19-06-2022 at 22:39 [-0400]:
> There was also an attempt to cross-compile a rust/cargo bootstrap seed
> for other architectures (branch: wip-cross-built-rust) but due to
> complications with building rust as a static archive (it relies on
> dynamic linking for its macro expand crates), the effort stalled.

FWIW, has it been considered to cross-compile rust non-statically
(not as a seed, just as an input cross-compiled from another system)?
Doesn't help for people that cannot offload to x86_64 and don't have
substitutes from ci.guix.gnu.org or such enabled, but could still be an
improvement.

Greetings,
Maxime.




bug#55848: [cuirass] workers stalled

2022-06-19 Thread Tom Fitzhenry
On Mon, 20 Jun 2022, at 12:39 PM, Maxim Cournoyer wrote:
> That's a known issue with mrustc; it only succeeds with x86_64; the
> other architectures have problems.  That's a bug the mrustc author would
> like to fix, so perhaps in time it will improve (especially if
> interested parties can lend a hand).

mrustc was fixed on aarch64 in https://issues.guix.gnu.org/54580 on staging, 
which was recently merged to master.

I had tested mrustc and rust-1.39 to compile on aarch64 on staging, but now I 
observe rust-1.39 failing.

I'll take a closer look, maybe I'm missing something.





bug#55848: [cuirass] workers stalled

2022-06-19 Thread Maxim Cournoyer
Hi Mathieu!

[...]

> A few issues remain for aarch64:
>
> * grunewald and kreuzberg are not on .
>   Perhaps they were taken down while the substitute ratio was low to
>   avoid each worker independently recompiling expensive toolchains?
> * rust@1.39.0 (and thus all of Rust) is missing from ci and bordeaux. I
>   had expected this would have been working. I'll take a look and raise
>   a separate issue.

That's a known issue with mrustc; it only succeeds with x86_64; the
other architectures have problems.  That's a bug the mrustc author would
like to fix, so perhaps in time it will improve (especially if
interested parties can lend a hand).

There was also an attempt to cross-compile a rust/cargo bootstrap seed
for other architectures (branch: wip-cross-built-rust) but due to
complications with building rust as a static archive (it relies on
dynamic linking for its macro expand crates), the effort stalled.

Thanks,

Maxim





bug#55848: [cuirass] workers stalled

2022-06-18 Thread Tom Fitzhenry
Mathieu Othacehe  writes:

Substitutes for aarch64 are a lot healthier now. Thanks Ludovic!

* kreuzberg is now successfully building and has been for a while.
* ci.guix.gnu.org has 41% of substitutes (a low percentage, but likely a
  high percentage of toolchains). 0 jobs are queued, presumably because Cuirass
  believes it's up to date. This should increase over time, as packages
  are updated.
* bordeaux has 83.8% of substitutes.

A few issues remain for aarch64:

* grunewald and kreuzberg are not on .
  Perhaps they were taken down while the substitute ratio was low to
  avoid each worker independently recompiling expensive toolchains?
* rust@1.39.0 (and thus all of Rust) is missing from ci and bordeaux. I
  had expected this would have been working. I'll take a look and raise
  a separate issue.

--8<---cut here---start->8---
$ ./pre-inst-env guix weather -s aarch64-linux -c2000
computing 15514 package derivations for aarch64-linux...
looking for 16265 store items on https://ci.guix.gnu.org...
https://ci.guix.gnu.org
  41.0% substitutes available (6668 out of 16265)
  at least 34188.1 MiB of nars (compressed)
  45362.5 MiB on disk (uncompressed)
  0.015 seconds per request (144.9 seconds in total)
  66.2 requests per second

  0.0% (0 out of 9597) of the missing items are queued
  at least 1000 queued builds
  aarch64-linux: 110 (11.0%)
  powerpc64le-linux: 890 (89.0%)
  build rate: 36.81 builds per hour
  aarch64-linux: 17.23 builds per hour
  x86_64-linux: 14.25 builds per hour
  powerpc64le-linux: 1.01 builds per hour
  i686-linux: 4.83 builds per hour
1871 packages are missing from 'https://ci.guix.gnu.org' for 'aarch64-linux', 
among which:
  3479  rust@1.39.0 
/gnu/store/xxlgndidxvhdd391k35vcmviixq5d9b0-rust-1.39.0-cargo 
/gnu/store/cfy1p8q4bwwy1i01cjfssfry21kpljz3-rust-1.39.0 
  2111  cairomm@1.14.2  
/gnu/store/bxknxn3nbmmvavf537k0pggrynhrgsaf-cairomm-1.14.2-doc 
/gnu/store/3sn66mgr29v73zpp93c2v09a0rj87l3w-cairomm-1.14.2 
  2101  texlive-latex-pgf@59745 
/gnu/store/l6jr7v8ygn3ybj4gxcwskf8ifsjcj6x1-texlive-latex-pgf-59745 
looking for 16265 store items on https://bordeaux.guix.gnu.org...
https://bordeaux.guix.gnu.org
  83.8% substitutes available (13624 out of 16265)
  35138.6 MiB of nars (compressed)
  109501.6 MiB on disk (uncompressed)
  0.060 seconds per request (699.4 seconds in total)
  16.7 requests per second
  (continuous integration information unavailable)
579 packages are missing from 'https://bordeaux.guix.gnu.org' for 
'aarch64-linux', among which:
  3479  rust@1.39.0 
/gnu/store/xxlgndidxvhdd391k35vcmviixq5d9b0-rust-1.39.0-cargo 
/gnu/store/cfy1p8q4bwwy1i01cjfssfry21kpljz3-rust-1.39.0
--8<---cut here---end--->8---
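
As a quick sanity check on the figures above, the percentages can be
recomputed from the raw counts printed by 'guix weather'.  This is only an
illustrative Python sketch; the helper name `coverage` is made up here and
is not part of Guix.

```python
def coverage(available, total):
    """Return the share of store items with a substitute, as a percentage."""
    return 100.0 * available / total


TOTAL = 16265               # store items looked up for aarch64-linux
CI_AVAILABLE = 6668         # items available on ci.guix.gnu.org
BORDEAUX_AVAILABLE = 13624  # items available on bordeaux.guix.gnu.org

ci_pct = coverage(CI_AVAILABLE, TOTAL)
bordeaux_pct = coverage(BORDEAUX_AVAILABLE, TOTAL)
ci_missing = TOTAL - CI_AVAILABLE  # the "out of 9597" in the queue line

print(f"ci: {ci_pct:.1f}% available, {ci_missing} missing")
print(f"bordeaux: {bordeaux_pct:.1f}% available")
```

The missing count for ci (16265 - 6668) matches the 9597 items that the
report says are not queued.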



> Hello,
>
> The aarch64 workers were all idle whereas 70k builds were
> available. Once restarted, they started building again.
>
> The problem might be that when the server is unavailable for a while the
> worker connections expire and cannot be resumed once the server is
> available again.
>
> Thanks,
>
> Mathieu





bug#55848: [cuirass] workers stalled

2022-06-12 Thread Ludovic Courtès
Ricardo Wurmus writes:

> They should be using the local IP instead of routing through the
> internet, so /etc/hosts should contain an entry for
>
> 141.80.167.131 ci.guix.gnu.org

Good idea.

> “guix deploy” did not work on these nodes due to a serious problem: they
> were given *some* x86_64 binaries to execute, so deployed systems were
> unbootable.  Since we don’t have a serial interface through which you
> could debug this remotely, please make sure not to deploy a broken
> system.  I’d like to avoid trips to the data centre.

Ooooh right, thanks for the reminder!

Ludo’.





bug#55848: [cuirass] workers stalled

2022-06-12 Thread Ricardo Wurmus


Ludovic Courtès  writes:

> Hi,
>
> (+Cc: guix-sysadmin)
>
> Tom Fitzhenry writes:
>
>> From following the builds on http://ci.guix.gnu.org/workers , many
>> (all?) builds are failing on the following workers:
>>
>> * grunewald
>> * kreuzberg
>> * pankow
>>
>> The builds are failing with the same error:
>>
>> "substitute: updating substitutes from 'https://ci.guix.gnu.org'...
>> 0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
>> the pull function."
>
> On these machines, https://ci.guix.gnu.org (among others) is unavailable
> for some reason (firewall I guess):

They should be using the local IP instead of routing through the
internet, so /etc/hosts should contain an entry for

141.80.167.131 ci.guix.gnu.org

(We have the same entry on the other build nodes hosted at the MDC.)
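
To check that a node really carries such an entry, the hosts file can be
parsed and the name looked up.  The following is a minimal Python sketch
under stated assumptions: the function name `lookup_hosts` and the sample
text are illustrative only, and on a real node one would read the contents
of /etc/hosts instead of the inline sample.

```python
def lookup_hosts(hosts_text, name):
    """Return the first address mapped to NAME in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


# Sample hosts-file contents; only the second entry comes from this thread.
SAMPLE = """\
127.0.0.1 localhost
141.80.167.131 ci.guix.gnu.org
"""

print(lookup_hosts(SAMPLE, "ci.guix.gnu.org"))
```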

“guix deploy” did not work on these nodes due to a serious problem: they
were given *some* x86_64 binaries to execute, so deployed systems were
unbootable.  Since we don’t have a serial interface through which you
could debug this remotely, please make sure not to deploy a broken
system.  I’d like to avoid trips to the data centre.

-- 
Ricardo





bug#55848: [cuirass] workers stalled

2022-06-12 Thread Ludovic Courtès
Hi,

(+Cc: guix-sysadmin)

Tom Fitzhenry writes:

> From following the builds on http://ci.guix.gnu.org/workers , many
> (all?) builds are failing on the following workers:
>
> * grunewald
> * kreuzberg
> * pankow
>
> The builds are failing with the same error:
>
> "substitute: updating substitutes from 'https://ci.guix.gnu.org'...
> 0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
> the pull function."

On these machines, https://ci.guix.gnu.org (among others) is unavailable
for some reason (firewall I guess):

--8<---cut here---start->8---
ludo@grunewald ~$ wget --debug -O/dev/null https://ci.guix.gnu.org
Setting --output-document (outputdocument) to /dev/null
DEBUG output created by Wget 1.21.1 on linux-gnu.

Reading HSTS entries from /home/ludo/.wget-hsts
URI encoding = ‘UTF-8’
--2022-06-11 22:38:59--  https://ci.guix.gnu.org/
Certificates loaded: 444
Resolving ci.guix.gnu.org (ci.guix.gnu.org)... 141.80.181.40
Caching ci.guix.gnu.org => 141.80.181.40
Connecting to ci.guix.gnu.org (ci.guix.gnu.org)|141.80.181.40|:443... connected.
Created socket 4.
Releasing 0x1fd26b50 (new refcount 1).

[Sits there forever…]
--8<---cut here---end--->8---
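
The transcript shows the TCP connection succeeding and the exchange then
hanging, which is consistent with the 'handshake' error in the build logs.
A small probe with a timeout can tell the two stages apart.  This is a
sketch in Python using only the standard `socket` and `ssl` modules; the
function name `probe_tls` and its status strings are invented for
illustration.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Classify a connection attempt as 'ok', 'tcp-failed',
    'tls-stalled' (handshake timed out) or 'tls-failed'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"  # refused, unreachable, or connect timeout
    try:
        context = ssl.create_default_context()
        # wrap_socket performs the TLS handshake on the connected socket.
        with context.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:
        return "tls-stalled"  # TCP is up but the peer never answered
    except (ssl.SSLError, OSError):
        return "tls-failed"   # handshake was answered but rejected
    finally:
        sock.close()
```

Calling `probe_tls("ci.guix.gnu.org")` on an affected machine would be
expected to report the stalled case if the behaviour matches the wget
session above.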

These machines are configured using ‘honeycomb-system’ from (sysadmin
honeycomb) in maintenance.git.

guix-daemon is configured to use the default substitute URLs,
https://ci.guix.gnu.org and https://bordeaux.guix.gnu.org, which we know
are unreachable.

I’ve theoretically addressed this here:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=99bd9dc9001d6bea7480a7ce0e0e10ff78adb787
  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=b0661cc7d6dd74b0aeac3b052a80a8a2fef2af9c

I tried to reconfigure those boxes with ‘guix deploy’, but this is
currently on hold because ci.guix has run out of inodes…

To be continued!

Ludo’.





bug#55848: [cuirass] workers stalled

2022-06-11 Thread Tom Fitzhenry
Greg Hogan  writes:

> On Wed, Jun 8, 2022 at 11:32 AM Mathieu Othacehe  wrote:
>> The aarch64 workers were all idle whereas 70k builds were
>> available. Once restarted, they started building again.

From following the builds on http://ci.guix.gnu.org/workers , many
(all?) builds are failing on the following workers:

* grunewald
* kreuzberg
* pankow

The builds are failing with the same error:

"substitute: updating substitutes from 'https://ci.guix.gnu.org'...
0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
the pull function."

Here are some examples:
* http://ci.guix.gnu.org/build/998403/details
* http://ci.guix.gnu.org/build/978678/details
* http://ci.guix.gnu.org/build/978243/details


On worker overdrive1, in the raw log of
http://ci.guix.gnu.org/build/875908/details we can see this
rust-async-mutex build managing to pull substitutes, but it 
seems to be compiling rust-1.57 itself.





bug#55848: [cuirass] workers stalled

2022-06-08 Thread Greg Hogan
On Wed, Jun 8, 2022 at 11:32 AM Mathieu Othacehe  wrote:
>
>
> Hello,
>
> The aarch64 workers were all idle whereas 70k builds were
> available. Once restarted, they started building again.
>
> The problem might be that when the server is unavailable for a while the
> worker connections expire and cannot be resumed once the server is
> available again.
>
> Thanks,
>
> Mathieu

The recent aarch64 builds look to all be failing with the following message.

=  =
substitute:
substitute: updating substitutes from 'https://ci.guix.gnu.org'...
 0.0%guix substitute: error: TLS error in procedure 'handshake': Error
in the pull function.
=  =





bug#55848: [cuirass] workers stalled

2022-06-08 Thread Mathieu Othacehe


Hello,

The aarch64 workers were all idle whereas 70k builds were
available. Once restarted, they started building again.

The problem might be that when the server is unavailable for a while the
worker connections expire and cannot be resumed once the server is
available again.
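
If that hypothesis is right, one common remedy is for a worker to remember
when it last heard from the server and to reconnect after too long a
silence.  The sketch below illustrates the idea in Python; it is not
Cuirass code (Cuirass is written in Guile), and the class and method names
are hypothetical.

```python
class ConnectionWatchdog:
    """Track server activity and decide when a worker should reconnect."""

    def __init__(self, timeout, now):
        self.timeout = timeout  # seconds of silence tolerated
        self.last_seen = now    # time of the last message from the server

    def heard_from_server(self, now):
        """Record that a message (or heartbeat) arrived at time NOW."""
        self.last_seen = now

    def should_reconnect(self, now):
        """Return True once the server has been silent for too long."""
        return now - self.last_seen > self.timeout


# Times are plain numbers here so the behaviour is easy to follow;
# a real worker would pass time.monotonic() instead.
watchdog = ConnectionWatchdog(timeout=60, now=0)
watchdog.heard_from_server(now=30)
print(watchdog.should_reconnect(now=80))   # 50 s of silence
print(watchdog.should_reconnect(now=100))  # 70 s of silence
```

With a check like this in its main loop, a worker would drop and reopen
its connection rather than sit idle until someone restarts it.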

Thanks,

Mathieu