bug#54447: cuirass: missing derivation error

2024-07-14 Thread Ludovic Courtès
Hi!

Ludovic Courtès  skribis:

> News from the everlasting bug!
>
>   cannot build missing derivation 
> ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’

[...]

> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).

While the exact reason why ‘guix publish’ exhibits this behavior is
unclear, the good news is that this is “fixed” by having ‘cuirass
remote-worker’ retry when it fails to substitute a .drv (thanks Chris
for the obvious-in-hindsight tip!):

  
  https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598
  https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=71426663f6ea32152782645e4632168dd2b18602
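
As a rough illustration of the retry idea (a Python sketch, not the
Cuirass code, which is written in Guile Scheme): `substitute_with_retry`,
its `attempts` and `delay` parameters, the `fetch` callback and
`SubstituteError` are all hypothetical names.  The commits above remain
the reference for what the worker really does.

```python
import time


class SubstituteError(Exception):
    """Raised by the hypothetical fetch callback when a substitute is unavailable."""


def substitute_with_retry(fetch, drv, attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch(drv) up to `attempts` times, pausing `delay` seconds between tries.

    Returns whatever fetch returns on success and re-raises the last
    SubstituteError once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(drv)
        except SubstituteError:
            if attempt == attempts:
                raise
            sleep(delay)


# Usage: a fake fetch that fails on its first two calls and then succeeds.
calls = []


def flaky_fetch(drv):
    calls.append(drv)
    if len(calls) in (1, 2):
        raise SubstituteError("504 while fetching narinfo")
    return drv


result = substitute_with_retry(
    flaky_fetch,
    "/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv",
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result, "fetched after", len(calls), "attempts")
```

The `sleep` parameter is only there so the pause can be replaced in
tests; a real worker would presumably just wait.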

Furthermore, workers can now reject builds if they fail to substitute
the .drv, in which case ‘cuirass remote-server’ either reschedules or
cancels the build:

  
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=a909fa99340db5e5cd64612ea4e07e929dc643ad
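
The message does not say how ‘cuirass remote-server’ chooses between
rescheduling and canceling.  The Python sketch below assumes, purely for
illustration, a cap on the number of rejections per build; `Build`,
`handle_rejection` and `max_rejections` are made-up names, not Cuirass
identifiers.

```python
from dataclasses import dataclass


@dataclass
class Build:
    drv: str
    rejections: int = 0
    status: str = "scheduled"


def handle_rejection(build, max_rejections=3):
    """Record that a worker rejected `build` and pick its next status.

    Assumption: the build returns to the queue until it has been
    rejected `max_rejections` times, after which it is canceled.
    """
    build.rejections += 1
    if build.rejections >= max_rejections:
        build.status = "canceled"
    else:
        build.status = "scheduled"
    return build.status


build = Build("/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv")
statuses = [handle_rejection(build) for _ in range(3)]
print(statuses)
```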

This was deployed a few days ago on berlin and on its x86_64 build
machines.  Working well so far!

Ludo’.





bug#54447: cuirass: missing derivation error

2024-04-13 Thread John Kehayias via Bug reports for GNU Guix
Hi all,

On Thu, Apr 04, 2024 at 11:33 PM, Ludovic Courtès wrote:

> Hello!
>
> News from the everlasting bug!
>
>   cannot build missing derivation
> ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
>
> (From .)
>
> Why was it missing this time?  /var/log/nginx/error.log:
>
> 2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: 
> Connection timed out) while reading response header from upstream, client: 
> 141.80.167.169, server: ci.guix.gnu.org, request: "GET 
> /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: 
> "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo;, host: 
> "141.80.167.131"
>
>
> Oops!  (There are dozens of upstream timeouts logged on that minute.)
>
> /var/log/guix-publish.log:
>
> 2024-04-04 17:14:51 GET 
> /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
> 2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
> 2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
> 2024-04-04 17:14:51 GET 
> /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
> 2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
> 2024-04-04 17:15:33 GET 
> /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
> 2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
> 2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
> 2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
> 2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
> 2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
> 2024-04-04 17:15:33 GET 
> /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
> 2024-04-04 17:15:33 GET 
> /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
>
> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).
>
> Notice the 40s pause time between 17:14:51 and 17:15:33.  Stop-the-world
> GC?  Unlikely, because ‘guix publish’ had been running for ~3h, so even
> with a leak², it’s hard to believe GC could take this long.
>
> Ludo’.
>
> ¹
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
> ² https://issues.guix.gnu.org/69596

I don't have any insight, but if anyone wants to see this in action at a
large scale, take a look at pretty much any red dot on
https://ci.guix.gnu.org/eval/1238471/dashboard?system=i686-linux

From my quick look, all the CL and texlive failures were missing
derivations. I've tried restarting a bunch to get i686 coverage going, so
hopefully some will disappear. But I can't/won't manually restart the
thousands(?) of failed builds. I didn't see such issues on x86_64, while
other architectures take a really long time to build on Berlin so I
haven't looked.

I don't know if this is helpful, but thought I would chime in in case
anyone wants a potentially large amount of data. And if there are good
ideas to recover (just restart all builds?), that would be great, so that
mesa-updates can be built on i686, since otherwise it looks good.

Thanks!
John






bug#54447: cuirass: missing derivation error

2024-04-04 Thread Ludovic Courtès
Hello!

News from the everlasting bug!

  cannot build missing derivation 
‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’

(From .)

Why was it missing this time?  /var/log/nginx/error.log:

--8<---cut here---start->8---
2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: 
Connection timed out) while reading response header from upstream, client: 
141.80.167.169, server: ci.guix.gnu.org, request: "GET 
/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: 
"http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host: 
"141.80.167.131"
--8<---cut here---end--->8---

Oops!  (There are dozens of upstream timeouts logged on that minute.)

/var/log/guix-publish.log:

--8<---cut here---start->8---
2024-04-04 17:14:51 GET 
/nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
2024-04-04 17:14:51 GET 
/nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET 
/nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
2024-04-04 17:15:33 GET 
/nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
2024-04-04 17:15:33 GET 
/nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
--8<---cut here---end--->8---

‘guix publish’ replied, but 40s too late (nginx has
“proxy_connect_timeout 10s;” for .narinfo URLs¹).

Notice the 40s pause time between 17:14:51 and 17:15:33.  Stop-the-world
GC?  Unlikely, because ‘guix publish’ had been running for ~3h, so even
with a leak², it’s hard to believe GC could take this long.
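
For anyone who wants to look for similar pauses in guix-publish.log, here
is a small Python sketch.  `find_stalls` is a hypothetical helper; it
assumes each entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in
the excerpt above, and skips wrapped continuation lines.  The default
threshold mirrors the 10-second nginx timeout mentioned above.

```python
from datetime import datetime, timedelta


def find_stalls(lines, threshold=timedelta(seconds=10)):
    """Return (before, after, gap) for consecutive timestamps further apart than threshold."""
    stalls = []
    previous = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # no leading timestamp: a wrapped continuation line
        if previous is not None and stamp - previous > threshold:
            stalls.append((previous, stamp, stamp - previous))
        previous = stamp
    return stalls


LOG = """\
2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
"""

stalls = find_stalls(LOG.splitlines())
for before, after, gap in stalls:
    print(f"{before:%H:%M:%S} -> {after:%H:%M:%S}: "
          f"{gap.total_seconds():.0f}s without a log entry")
```

On the four lines above it reports a single gap of 42 seconds, which is
the "40s" pause, rounded.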

Ludo’.

¹ 
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
² https://issues.guix.gnu.org/69596





bug#54447: cuirass: missing derivation error

2023-11-20 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès  writes:

> Maxim Cournoyer  skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute: 
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation 
>> ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).

Here's a more recent occurrence:
https://ci.guix.gnu.org/build/2635272/details

I haven't restarted it to leave proof of its existence :-)

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2023-10-16 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.

I pushed a variant of this patch:

  053839d hydra: services: Leave “guix-binary.tar.xz” GC roots.
  e40d961 hydra: services: Preserve Cuirass .drv GC roots.
  b8fc66c hydra: cuirass: Fix build product regexps.

I didn’t dare remove “--gc-keep-derivations”.  I reconfigured berlin
just now from this commit and restarted mcron (I didn’t restart
guix-daemon to avoid downtime; we should do that when the queue is close
to empty).

We’ll have to monitor disk usage to make sure it’s not negatively
affected.

Ludo’.





bug#54447: cuirass: missing derivation error

2023-10-16 Thread Ludovic Courtès
Maxim Cournoyer  skribis:

>> Tip of the day: M-: (build-farm-build 1982454)
>
> I don't have such a function in scope, is this from the guix-emacs
> package?

It’s from the ‘emacs-build-farm’ package, which I recommend.  :-)

Ludo’.





bug#54447: cuirass: missing derivation error

2023-10-16 Thread Maxim Cournoyer
Hi,

Ludovic Courtès  writes:

> Maxim Cournoyer  skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute: 
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation 
>> ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).
>
> Tip of the day: M-: (build-farm-build 1982454)

I don't have such a function in scope, is this from the guix-emacs
package?

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2023-10-15 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice).  That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.

Fixed in Cuirass commit 55af0f70c0d4938b8eda777382bbc4d8f5698a37.
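
The rule quoted above can be written down as a small predicate.  This is
a Python sketch of the idea only, not the code from that commit (Cuirass
is written in Guile Scheme); `GCRoot`, its `built` flag and `may_delete`
are hypothetical names, and the 30-day default is the TTL mentioned in
the quote.

```python
from dataclasses import dataclass


@dataclass
class GCRoot:
    name: str        # file name of the root
    age_days: float  # time since the root was registered
    built: bool      # hypothetical flag: has the derivation been built?


def may_delete(root, ttl_days=30):
    """Decide whether `root` may be removed under the rule described above."""
    expired = root.age_days > ttl_days
    if not expired:
        return False       # still within the TTL
    if root.name.endswith(".drv"):
        return root.built  # expired .drv roots go only once built
    return True            # other expired roots are deleted as before


name = "4r1wij3bzj9zv75ds82a93jl7bcman2x-installed-extlinux-os.drv"
pending = GCRoot(name, age_days=45, built=False)
done = GCRoot(name, age_days=45, built=True)
print(may_delete(pending), may_delete(done))
```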

Ludo'.





bug#54447: cuirass: missing derivation error

2023-10-15 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> Looking at the nginx and ‘guix publish’ logs, I found that the missing
> substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
> itself) but rather that of a dependency of that .drv:
>
>   [14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo 
> HTTP/1.1" 404 58 "-" "GNU Guile"
>
> That item’s size is above the cache bypass threshold of 100 MiB as
> currently configured on berlin:
>
> $ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
> 124M/gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
>
> The immediate fix/workaround is to raise that threshold.

I raised the threshold to 150 MiB in maintenance.git commit
213384e43de63ce3a5a55599e8fb89891ffef7eb.
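
To make the effect concrete, here is a simplified Python model of the
documented ‘guix publish --cache-bypass-threshold’ behavior: items
smaller than the threshold are served right away even when not yet
cached, while larger ones get a 404 until they have been baked.
`first_request_status` is a hypothetical name, and the sketch uses the
‘du’ figure as a stand-in for whatever size ‘guix publish’ really
compares.

```python
MIB = 1024 * 1024


def first_request_status(item_size, cached, bypass_threshold):
    """Return the HTTP status of a narinfo request under this simplified model."""
    if cached or bypass_threshold > item_size:
        return 200
    return 404  # not cached and too large to bypass the cache


guix_source = 124 * MIB  # the 'du -hs' figure quoted above
before = first_request_status(guix_source, cached=False, bypass_threshold=100 * MIB)
after = first_request_status(guix_source, cached=False, bypass_threshold=150 * MIB)
print(before, after)
```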

I reconfigured berlin and restarted ‘guix publish’ seconds ago.
Hopefully next time installation tests won’t have that problem.

Ludo’.





bug#54447: cuirass: missing derivation error

2023-10-15 Thread Ludovic Courtès
Hi!

Ludovic Courtès  skribis:

> Mathieu Othacehe  skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>>
>> Those derivations are present on the CI head node. This means that the
>> errors occur during substitution. This is most likely caused by some
>> issue with the publish server, because:
>>
>> - The publish server serves a 404 error. We should get rid once and for
>>   all of this 404 thing, pushing something like:
>>   https://issues.guix.gnu.org/50040.
>>
>> or
>>
>> - The publish server is not fast enough and hits an Nginx timeout that
>>   closes the communication.
>
> Also being discussed at .

I got confirmation that the cache-bypass-threshold hypothesis holds, at
least for system tests.

Namely, looking at ,
which ends like this:

--8<---cut here---start->8---
@ substituter-succeeded /gnu/store/qh2876i5l1wvxgwhg9fbl9zmb3px3n2m-gc-roots.drv
fetching path 
`/gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder'...
@ substituter-started 
/gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder substitute
Downloading 
http://141.80.167.131/nar/lzip/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder...
. xdg-mime-database-builder3.6MiB/s 00:00 | 3KiB 
transferred. xdg-mime-database-builder1.9MiB/s 00:00 | 
3KiB transferred

@ substituter-succeeded 
/gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder
cannot build missing derivation 
‘/gnu/store/4r1wij3bzj9zv75ds82a93jl7bcman2x-installed-extlinux-os.drv’
--8<---cut here---end--->8---

Looking at the nginx and ‘guix publish’ logs, I found that the missing
substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
itself) but rather that of a dependency of that .drv:

  [14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo 
HTTP/1.1" 404 58 "-" "GNU Guile"

That item’s size is above the cache bypass threshold of 100 MiB as
currently configured on berlin:

--8<---cut here---start->8---
$ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
124M/gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
--8<---cut here---end--->8---

The immediate fix/workaround is to raise that threshold.

A better solution would be for system tests to depend on a fixed-output
derivation for the Guix source instead of the “source” above (I use
“source” as it is used in the context of ).

Thanks,
Ludo’.





bug#54447: cuirass: missing derivation error

2023-10-15 Thread Ludovic Courtès
Maxim Cournoyer  skribis:

> Another example: https://ci.guix.gnu.org/build/1982454/details
>
> substitute: 
> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
> cannot build missing derivation 
> ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?

This one is from Sep. 9, which is before I deployed the remote-worker
fixes, so I’ll dismiss it (happy to look at more recent ones though!).

Tip of the day: M-: (build-farm-build 1982454)

Ludo’.





bug#54447: cuirass: missing derivation error

2023-10-10 Thread Maxim Cournoyer
Hello,

宋文武  writes:

[...]

> Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw
>
>   cannot build missing derivation 
> ?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?
>
> Restart it on CI still got the same error.

Another example: https://ci.guix.gnu.org/build/1982454/details

--8<---cut here---start->8---
substitute: 
substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
cannot build missing derivation 
?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
--8<---cut here---end--->8---

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2023-10-10 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès  writes:

> Hello!
>
> Mathieu Othacehe  skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>
> I have a disappointingly simple hypothesis for this.  Remember that
> “missing derivation” errors happen primarily for system tests.
>
> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.
>
> Thoughts?

Ah, so that mcron job is kind of a hack to garbage collect *some* items
sooner than the default policy of 30 days?  And we'd now avoid deleting
the selected .drv files while still deleting their outputs, so if
something that needs them took more than 2 days to build, we could end
up having to rebuild the garbage-collected outputs?

I'm not sure if we need such a fancy hack with the 100 TiB of data we
now have, but your fix seems reasonable (LGTM!)

> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice).  That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.

Hm.  I wonder if this could explain the other cases we've seen.  It
could be that building a derivation was interrupted or canceled for some
reason, then 30 days elapsed and the .drv was garbage collected, and
after that it doesn't get recreated, so we get the missing .drv error?

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2023-10-10 Thread Ludovic Courtès
Hello!

Mathieu Othacehe  skribis:

> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.

I have a disappointingly simple hypothesis for this.  Remember that
“missing derivation” errors happen primarily for system tests.

Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
mcron job, explicitly removes GC roots for things like *-os-encrypted
once they’re more than two days old, as well as GC roots for the
corresponding .drv.

I think this was increasing the likelihood that a .drv would be GC’d by
the time we run the test: under high load¹, it’s plausible that a system
test wouldn’t be built within two days after it’s been queued.

I’m proposing the change below to address this; I don’t think we need
‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
things in ‘guix publish’ cache first and foremost.

Thoughts?

In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
days in practice).  That’s okay, except that it would be safer to delete
GC roots for a .drv if and only if it’s been built already.

Thanks,
Ludo’.

¹ The queue was often processed slowly, with many workers remaining idle
  due to the bug fixed by
  
.

diff --git a/hydra/modules/sysadmin/services.scm b/hydra/modules/sysadmin/services.scm
index fecfdde..e6f2b44 100644
--- a/hydra/modules/sysadmin/services.scm
+++ b/hydra/modules/sysadmin/services.scm
@@ -110,9 +110,7 @@
   ((guix config) => ,(make-config.scm)))
#~(begin
(use-modules (ice-9 ftw)
-(srfi srfi-1)
-(guix store)
-(guix derivations))
+(srfi srfi-1))
 
(define %roots-directory
  "/var/guix/profiles/per-user/cuirass/cuirass")
@@ -157,28 +155,6 @@
  deleted))
  deleted))
 
-   (define (root-target root)
- ;; Return the store item ROOT refers to.
- (string-append (%store-prefix) "/" (basename root)))
-
-   (define (derivation-referrers store item)
- ;; Return the referrers of the derivers of ITEM.
- (let* ((derivers  (valid-derivers store item))
-(referrers (append-map (lambda (drv)
- (referrers store drv))
-   derivers)))
-   (delete-duplicates referrers)))
-
-   (define (delete-gc-root-for-derivation drv)
- ;; Delete the GC root for DRV, if any.
- (catch 'system-error
-   (lambda ()
- (let ((item (derivation-path->output-path drv)))
-   (delete-file
-(string-append %roots-directory
-   "/" (basename drv)
-   (const #f)))
-
;; Note: 'scandir' would introduce too much overhead due
;; to the large number of entries that it would sort.
(define deleted
@@ -197,17 +173,7 @@
(for-each (lambda (file)
(display file port)
(newline port))
- deleted)))
-
-   ;; Since we run 'guix-daemon --gc-keep-outputs
-   ;; --gc-keep-derivations', also remove GC roots for the outputs of
-   ;; derivations that refer to the derivers of DELETED.
-   (for-each delete-gc-root-for-derivation
- (with-store store
-   (append-map (lambda (root)
- (derivation-referrers
-  store (root-target root)))
-   deleted
+ deleted
 
 (define (gc-jobs threshold)
   "Return the garbage collection mcron jobs.  The garbage collection
@@ -251,8 +217,7 @@ collection instead."
 
(build-accounts (* build-accounts-to-max-jobs-ratio max-jobs))
(extra-options (list "--max-jobs" (number->string max-jobs)
-"--cores" (number->string cores)
-"--gc-keep-outputs" "--gc-keep-derivations"
+"--cores" (number->string cores)
 
 
 ;;;


bug#54447: cuirass: missing derivation error

2023-08-30 Thread 宋文武 via Bug reports for GNU Guix
Maxim Cournoyer  writes:

> I wonder if these could be related to the DDoS protection discovered on
> the Berlin network.  I'll keep looking for other, potentially different
> occurrences.


Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw

  cannot build missing derivation 
?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?

Restarting it on CI still produced the same error.





bug#54447: cuirass: missing derivation error

2023-08-22 Thread Ludovic Courtès
Hi,

Maxim Cournoyer  skribis:

> Looking at multiple of recent 'cannot build missing derivation' build
> failures on Cuirass, I see for example:
>
> substitute: 
> substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
> substitute: [Kcould not fetch 
> http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
> substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
> cannot build missing derivation 
> ?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
>
>
> So it seems the error originated from guix-publish being too heavily
> under load to produce a timely reply, and the nginx proxy issued a 504
> (timeout) error response.
>
> Looking into /var/log/guix-publish.log for a corresponding entry, I
> found:
>
> 2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
> 2023-08-21 23:59:35 In web/server/http.scm:
> 2023-08-21 23:59:35 159:7  2 (http-write #< socket: 
> # …)
> 2023-08-21 23:59:35 In unknown file:
> 2023-08-21 23:59:351 (put-bytevector # 
> #vu8(83 # …) …)
> 2023-08-21 23:59:35 In ice-9/boot-9.scm:
> 2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
> 2023-08-21 23:59:35 In procedure fport_write: Broken pipe
>
>
> So the connection was apparently severed (?), resulting in the "broken
> pipe" error.

I think it’s just that, when ‘guix publish’ eventually replied, the
client had left, hence EPIPE.
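
The mechanism is easy to reproduce outside of ‘guix publish’.  In the
Python sketch below, a Unix socket pair stands in for the connection
between the server and its client (an assumption for illustration; the
real connection is TCP): the client side is closed first, and the late
write then fails with EPIPE.

```python
import socket

server, client = socket.socketpair()
client.close()  # the client gives up before the reply arrives

try:
    server.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
    outcome = "sent"
except BrokenPipeError as error:
    outcome = f"EPIPE: {error.strerror}"
finally:
    server.close()

print(outcome)
```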

The initial problem does look like ‘guix publish’ being too slow.  Do
the corresponding nginx logs confirm the “backend too slow => 504”
hypothesis?

Thanks for investigating!

Ludo’.





bug#54447: cuirass: missing derivation error

2023-08-21 Thread Maxim Cournoyer
Hello,

Mathieu Othacehe  writes:

> Hello,
>
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
>   all of this 404 thing, pushing something like:
>   https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
>   closes the communication.
>
> Any other cause I could be missing?

Looking at several recent 'cannot build missing derivation' build
failures on Cuirass, I see for example:

--8<---cut here---start->8---
substitute: 
substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
substitute: [Kcould not fetch 
http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation 
?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
--8<---cut here---end--->8---

So it seems the error originated from guix-publish being under too heavy
a load to reply in time, so the nginx proxy issued a 504 (timeout) error
response.
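
The link between the two messages is the store hash: the narinfo that
could not be fetched and the missing .drv share the same 32-character
prefix.  A Python sketch that pairs them up in a build log follows; the
regular expressions and `explain_missing_derivations` are my own
assumptions based on the log format shown above, with the wrapped lines
joined.

```python
import re

# Assumed patterns, derived from the log excerpt above.
NARINFO_FAILURE = re.compile(r"could not fetch \S+/([0-9a-z]{32})\.narinfo (\d{3})")
MISSING_DRV = re.compile(r"/gnu/store/([0-9a-z]{32})-\S+\.drv")


def explain_missing_derivations(log):
    """Pair each .drv path with the HTTP status its narinfo fetch got, if any."""
    failed = {m.group(1): int(m.group(2)) for m in NARINFO_FAILURE.finditer(log)}
    return [(m.group(0), failed[m.group(1)])
            for m in MISSING_DRV.finditer(log)
            if m.group(1) in failed]


LOG = (
    "substitute: [Kcould not fetch "
    "http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504\n"
    "cannot build missing derivation "
    "?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?\n"
)

pairs = explain_missing_derivations(LOG)
for drv, status in pairs:
    print(f"{drv}: narinfo fetch failed with HTTP {status}")
```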

Looking into /var/log/guix-publish.log for a corresponding entry, I
found:

--8<---cut here---start->8---
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7  2 (http-write #< socket: 
# …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector # 
#vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---cut here---end--->8---

So the connection was apparently severed (?), resulting in the "broken
pipe" error.

Here's a different one:

--8<---cut here---start->8---
substitute: 
substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
substitute: [Kcould not fetch 
http://141.80.167.131/p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation 
?/gnu/store/p2lfyvbxicjqsm4qp6368bx76gp0g948-python-astropy-healpix-0.7.drv?
--8<---cut here---end--->8---

It occurred around the same time, and the failure mode was the same, per
guix-publish.log:

--8<---cut here---start->8---
2023-08-21 23:59:35 GET /p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7  2 (http-write #< socket: 
# …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector # 
#vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---cut here---end--->8---

I wonder if these could be related to the DDoS protection discovered on
the Berlin network.  I'll keep looking for other, potentially different
occurrences.

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2022-12-10 Thread Ludovic Courtès
Mathieu Othacehe  skribis:

> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
>   all of this 404 thing, pushing something like:
>   https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
>   closes the communication.

Also being discussed at .

Ludo’.





bug#54447: cuirass: missing derivation error

2022-08-10 Thread Maxime Devos


On 10-08-2022 11:43, Maxime Devos wrote:

Here's another instance: https://ci.guix.gnu.org/eval/528710


More information:

 * non-ASCII does not seem to be set up (see: ?) (looks irrelevant)
 * there are connection failures

Log:


substitute:
substitute: updating substitutes from 'http://141.80.167.131'...   0.0%guix 
substitute: warning: 141.80.167.131: connection failed: Connection refused
substitute:
cannot build missing derivation 
?/gnu/store/4gqj2byvj9zz30wzvwkbijpya3vn1bjw-rust-dogged-0.2.0.drv?


Greetings,
Maxime.




bug#54447: cuirass: missing derivation error

2022-08-10 Thread Maxime Devos

Here's another instance: https://ci.guix.gnu.org/eval/528710





bug#54447: cuirass: missing derivation error

2022-03-18 Thread Mathieu Othacehe


Hello,

A lot of builds, among them ~20 system tests[1], are failing with:
"cannot build missing derivation
?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
errors.

Those derivations are present on the CI head node. This means that the
errors occur during substitution. This is most likely caused by some
issue with the publish server, because:

- The publish server serves a 404 error. We should get rid once and for
  all of this 404 thing, pushing something like:
  https://issues.guix.gnu.org/50040.

or

- The publish server is not fast enough and hits an Nginx timeout that
  closes the communication.

Any other cause I could be missing?

Thanks,

Mathieu

[1]: https://ci.guix.gnu.org/eval/159975?status=failed