Hi,

Ludovic Courtès <l...@gnu.org> writes:

> Hi,
>
> Marius Bakke <mar...@gnu.org> skribis:
>
>> Marius Bakke <mar...@gnu.org> writes:
>>
>>> 'guix offload test' passes without problems.
>>
>> Not so fast, running it in a loop reveals the crash.
>>
>> There is a trace file in /root/offloadtest.trace on Berlin with such an
>> occurrence.  It looks like a timeout is reached shortly before the EOF
>> error:
>>
>> 10139 poll([{fd=14, events=POLLIN|POLLOUT}], 1, 0) = 1 ([{fd=14,
>> revents=POLLOUT}])
>> 10139 poll([{fd=14, events=POLLIN}], 1, 15000) = 0 (Timeout)
>> 10139 write(2, "Backtrace:\n", 11) = 11
>>
>> This seems to be from a different node than the one reported previously,
>> as the preceding connect() was to this machine:
>>
>> 10139 connect(44, {sa_family=AF_INET, sin_port=htons(22),
>> sin_addr=inet_addr("141.80.167.186")}, 16) = -1 EINPROGRESS
>> (Operation now in progress)
>
> So it looks like ‘connect’ fails and eventually we get an EOF object.
> However, I don’t see where that EOF comes from because the return value
> of ‘connect!’ (the Guile-SSH procedure) is properly checked.
>
> Ludo’.

I got a slightly different backtrace, which suggests that establishing
the connection is not at fault; rather, the error occurs during the
read-repl-response call:

--8<---------------cut here---------------start------------->8---
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
Backtrace:
           8 (primitive-load "/home/maxim/.config/guix/current/bin/guix")
In guix/ui.scm:
  2165:12  7 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  6 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1747:15  5 (with-exception-handler #<procedure 7f2caf885780 at ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In guix/scripts/offload.scm:
   704:21  4 (check-machine-availability _ _)
In srfi/srfi-1.scm:
   586:17  3 (map1 (#<session ma...@overdrive1.guix.gnu.org:52522 (connected) 7f2cae396fc0>))
In guix/inferior.scm:
    258:2  2 (port->inferior _ _)
    240:2  1 (read-repl-response _ _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
--8<---------------cut here---------------end--------------->8---

I seem to get this more often than not with the overdrive1 offload
machine.

Maxim
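
PS: For reference, the 'match-error' itself is easy to reproduce in
isolation.  The snippet below is only a minimal sketch and not the actual
code in guix/inferior.scm: 'read-response' and 'try-read' are hypothetical
names, and the patterns are simplified.  It assumes the reading side does
a plain 'read' on the port and passes the result to 'match' without an
EOF clause, so a peer that closes the connection early produces exactly
this kind of error.

```scheme
(use-modules (ice-9 match))

;; Hypothetical, simplified stand-in for the reading side of a REPL
;; connection: read one datum from PORT and destructure it with 'match'.
;; Note that there is no clause for the EOF object.
(define (read-response port)
  (match (read port)
    (('values result) result)
    (('exception key args ...) (apply throw key args))))

;; Run 'read-response' on a string port built from STRING.  An empty
;; string behaves like a peer that closed the connection before sending
;; anything: 'read' returns the EOF object and no pattern matches.
(define (try-read string)
  (catch 'match-error
    (lambda ()
      (read-response (open-input-string string)))
    (lambda (key . args)
      ;; The last element of ARGS is the value that failed to match.
      (list 'match-error (eof-object? (car (last-pair args)))))))

(try-read "(values 42)")   ; well-formed response
(try-read "")              ; early EOF, caught as 'match-error'
```

If that assumption holds, an explicit clause for the EOF object in the
reader would at least turn the backtrace into a clearer error message,
although it would not explain why the remote end closes the port.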