bug#64653: ‘static-networking’ fails to start

2023-07-15 Thread Ludovic Courtès
Hi!

On the machine that exhibited , I’m
now seeing this, with the fix from commit
26602f4063a6e0c626e8deb3423166bcd0abeb90:

--8<---cut here---start->8---
[  121.017522] shepherd[1]: Starting service user-homes...
[  121.049038] tg3 :05:00.0 eth0: Tigon3 [partno(BCM95720) rev 572] (PCI Express) MAC address b8:cb:29:b5:1c:3a
[  121.049042] tg3 :05:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.049044] tg3 :05:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.049045] tg3 :05:00.0 eth0: dma_rwctrl[0001] dma_mask[64-bit]
[  121.084342] tg3 :05:00.1 eth1: Tigon3 [partno(BCM95720) rev 572] (PCI Express) MAC address b8:cb:29:b5:1c:3b
[  121.084355] tg3 :05:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.084363] tg3 :05:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.084370] tg3 :05:00.1 eth1: dma_rwctrl[0001] dma_mask[64-bit]
[  121.102367] iTCO_vendor_support: vendor-support=0
[  121.103831] Error: Driver 'pcspkr' is already registered, aborting...
[  121.108617] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
[  121.113037] tg3 :05:00.1 eno2: renamed from eth1

[...]

[  121.281600] shepherd[1]: Service user-homes has been started.
[  121.282538] shepherd[1]: Service user-homes started.
[  121.368316] ipmi_si IPI0001:00: Using irq 10
[  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[  121.419871] shepherd[1]: Exception caught while starting #< 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #) (#))
[  121.420074] shepherd[1]: Service user-homes running with value #t.
[  121.420218] shepherd[1]: Service networking failed to start.
--8<---cut here---end--->8---

The failure seems to happen after the whole static networking config has
been set up though (‘ip a’ shows that everything’s in place).

Problem is that at this point ‘networking’ cannot be started unless you
manually tear down everything with ‘ip’:

--8<---cut here---start->8---
$ sudo herd start networking
herd: error: exception caught while executing 'start' on service 'networking':
Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
--8<---cut here---end--->8---

(17 = EEXIST)

This makes me think we should make the setup phase idempotent or,
alternatively, add special actions to force a change.
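
One way to read the idempotency idea: a set-up step that fails only because the address or route already exists (EEXIST) could be counted as a success. The shell sketch below illustrates this with a hypothetical helper, `run_idempotent`. It is not Guix or Shepherd code, and matching on the "File exists" error text is an assumption that will not hold for every tool or message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix or the Shepherd): run a command and
# treat a failure whose message contains "File exists" (EEXIST) as success.
run_idempotent() {
    # Capture the output so the error text can be inspected; force the C
    # locale so that the message is not translated.
    if output=$(LC_ALL=C "$@" 2>&1); then
        return 0
    else
        status=$?
    fi
    case $output in
        *"File exists"*) return 0 ;;   # already in place: nothing to do
    esac
    # Any other failure is reported unchanged.
    printf '%s\n' "$output" >&2
    return "$status"
}
```

For example, `run_idempotent mkdir /tmp/demo` succeeds whether or not the directory already exists. With iproute2, `ip route replace` and `ip address replace` are another way to get add-or-update behaviour without parsing error text.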

Thoughts?

Ludo’.





bug#64653: ‘static-networking’ fails to start

2023-10-02 Thread Ludovic Courtès
Ludovic Courtès writes:

> [  121.281600] shepherd[1]: Service user-homes has been started.
> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC 
> (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #< 
> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in 
> position ~A: ~S" (1 #) (# 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.
>
>
> The failure seems to happen after the whole static networking config has
> been set up though (‘ip a’ shows that everything’s in place).
>
> Problem is that at this point ‘networking’ cannot be started unless you
> manually tear down everything with ‘ip’:
>
> $ sudo herd start networking
> herd: error: exception caught while executing 'start' on service
> 'networking':
> Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.

Quick workaround if you encounter this bug:

  1. Find the “tear-down” script of your system with:

   guix gc -R /run/current-system |grep tear-down-network

  2. In a ‘screen’ session, run this as root:

   while true; do herd enable networking; herd start networking; sleep 3; done

  3. Run:

   sudo guile --no-auto-compile TEAR_DOWN_SCRIPT_FROM_STEP_1

Beautiful, isn’t it?

(We’ll actually work on fixing the bug, too…)
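
For convenience, the three steps above could be collected into a small script. This is only a sketch: the function names are made up, the retry count and delay are arbitrary, and the retry loop is bounded instead of running forever as in step 2. It assumes it is run as root inside ‘screen’, since tearing down the network may cut a remote session.

```shell
#!/bin/sh
# Sketch of the workaround above.  All function names are hypothetical.

# Step 1: print the first tear-down script referenced by the current system.
find_tear_down_script() {
    guix gc -R /run/current-system | grep tear-down-network | head -n 1
}

# Step 2: re-enable and start 'networking' until it succeeds.
# $1 = maximum number of attempts, $2 = delay in seconds between attempts.
retry_networking() {
    attempts=$1
    delay=$2
    while [ "$attempts" -gt 0 ]; do
        herd enable networking || true
        if herd start networking; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Steps 1 to 3 together: retry in the background while the tear-down
# script from step 1 runs in the foreground.
apply_workaround() {
    script=$(find_tear_down_script)
    if [ -z "$script" ]; then
        echo "no tear-down-network script found" >&2
        return 1
    fi
    retry_networking 20 3 &
    retry_pid=$!
    guile --no-auto-compile "$script"
    wait "$retry_pid"
}
```

Calling `apply_workaround` then performs the whole sequence; the individual functions can also be run one at a time to follow the original steps more closely.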

Ludo’.





bug#64653: ‘static-networking’ fails to start

2023-11-11 Thread Leo Nikkilä via Bug reports for GNU Guix
I'm also seeing this issue on a headless RockPro64 system. Do you know anything 
I could change in the configuration to work around this during boot, e.g. patch 
a specific commit out?

Happy to provide further details or test things on my system.





bug#64653: ‘static-networking’ fails to start

2024-01-03 Thread Ludovic Courtès
Hello!

Ludovic Courtès writes:

> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC 
> (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #< 
> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in 
> position ~A: ~S" (1 #) (# 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.

I’m seeing a similar exception in a Hurd VM running shepherd 0.10.3rc1:

--8<---cut here---start->8---
Jan  3 23:13:22 localhost shepherd[1]: Exception caught while starting 
networking: (wrong-type-arg "port-filename" "Wrong type argument in position 
~A: ~S" (1 #) (#)) 
Jan  3 23:13:22 localhost shepherd[1]: Service networking failed to start. 
--8<---cut here---end--->8---

It’s interesting because it suggests that the offending ‘port-filename’
call comes from ‘load’, not from the network-setup code being loaded
(here, the /hurd/pfinet translator has been properly set up).

Looking at the code in ‘boot-9.scm’, I *think* we end up calling
‘primitive-load’; ‘shepherd’ replaces it with its own (@ (shepherd
support) primitive-load*).

I managed to grab this backtrace:

--8<---cut here---start->8---
Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
starting 
'/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet 
"--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" "10.0.2.15" 
"--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
In ice-9/boot-9.scm:
    142:2  7 (dynamic-wind # ?)
In shepherd/support.scm:
   486:15  6 (_ #)
In ice-9/read.scm:
   859:19  5 (read _)
In unknown file:
           4 (port-filename #)
In ice-9/boot-9.scm:
  1685:16  3 (raise-exception _ #:continuable? _)
  1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
In ice-9/eval.scm:
    159:9  1 (_ #(#(#) (# "port-fil?" ?)))
In unknown file:
           0 (make-stack #t)
#t
--8<---cut here---end--->8---

So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
upon a closed port.  It also happens when loading a file that simply
suspends the current fiber via ‘sleep’ or similar, but only on the Hurd
though.

To be continued…

Ludo’.





bug#64653: ‘static-networking’ fails to start

2024-01-05 Thread Ludovic Courtès
Hi!

Ludovic Courtès writes:

> Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
> starting 
> '/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet
>  "--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" 
> "10.0.2.15" "--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
> In ice-9/boot-9.scm:
>     142:2  7 (dynamic-wind # ?)
> In shepherd/support.scm:
>    486:15  6 (_ #)
> In ice-9/read.scm:
>    859:19  5 (read _)
> In unknown file:
>            4 (port-filename #)
> In ice-9/boot-9.scm:
>   1685:16  3 (raise-exception _ #:continuable? _)
>   1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
> In ice-9/eval.scm:
>     159:9  1 (_ #(#(#) (# "port-fil?" ?)))
> In unknown file:
>            0 (make-stack #t)
> #t
>
> So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
> upon a closed port.

Good news: this is fixed by 4e431fda5f2ec76b6d6a271be7c30b1324431329!
Silly me had introduced a ‘dynamic-wind’ there.

(The funny thing with extensible systems like the Shepherd is that the
problem can be anywhere.  :-))

Ludo’.





bug#64653: 'static-networking' fails to start

2024-03-25 Thread Fabio Natali via Bug reports for GNU Guix
Hi,

I've been trying to reconfigure a machine from static IPv4 to static
dual-stack or IPv6-only. I followed one⁰ of the examples in the manual,
so I'd think I got the syntax right.

Once the reconfiguration has taken place and when restarting the
networking service, I get this error:

,----
| herd: error: exception caught while executing 'start' on service 'networking':
| Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
`----

This seems to be related to this bug report, 64653?

Do you know what this might be related to and what I can do to solve it?

This happens on an up-to-date Guix system.

Thanks, best wishes, Fabio.

⁰ https://guix.gnu.org/manual/devel/en/html_node/Networking-Setup.html#index-static_002dnetworking


-- 
Fabio Natali
https://fabionatali.com





bug#64653: 'static-networking' fails to start

2024-03-25 Thread Fabio Natali via Bug reports for GNU Guix
On 2024-03-25, 11:52 +, Fabio Natali  wrote:
> Once the reconfiguration has taken place and when restarting the
> networking service, I get this error:
>
> ,----
> | herd: error: exception caught while executing 'start' on service
> | 'networking':
> | Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
> `----

Ok, good news, thanks to Felix's advice[0] I was able to get this
sorted!

Apparently, specifying a default IPv6 gateway (as a link local address)
is what was causing the issue for me. Once the following bit was
commented out, everything started working again.

,----
| (static-networking
|  (addresses (list (network-address
|                    (device "eth0")
|                    (value "10.0.0.2/24"))
|                   (network-address
|                    (device "eth0")
|                    (value "2001:db8::1/64"))))
|  (routes (list (network-route
|                 (destination "default")
|                 (gateway "10.0.0.1"))
|                ;;(network-route
|                ;; (destination "default")
|                ;; (gateway "fe80::"))
|                ))
|  (name-servers '("10.0.0.1" "2001:db8::")))
`----

("fe80::" and "2001:db8::" are just placeholders.)

I assume the router address gets retrieved automatically via Router
Advertisement (RA), so no need for that in my case.
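
Whether a router advertisement has already installed a default IPv6 route can be checked from the output of `ip -6 route show default`. The following shell sketch is one way to do it. The function name is made up, and the idea that an RA-learned route is what triggers EEXIST here is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read a route listing on standard input and succeed
# if it contains a default route that is tagged "proto ra", the tag
# iproute2 prints for routes learned from router advertisements.
has_ra_default_route() {
    grep -Eq '^default( .*)? proto ra( |$)'
}

# Intended use on a live system (not run here):
#   ip -6 route show default | has_ra_default_route && echo "RA route present"
```

If such a route is present, adding another default route by hand may collide with it, which would be consistent with the errno 17 seen above.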

Still, I'd expect it to be possible to specify the router's link-local
address... Do you see a possible bug here, or is there anything else that
I might be missing?

Thanks, cheers, Fabio.


[0] https://lists.gnu.org/archive/html/help-guix/2024-03/msg00132.html


-- 
Fabio Natali
https://fabionatali.com