Hi Ludovic,

Executive summary: it is (was) a "network architecture" mistake on my
side: I was mixing a device configured via Guix static-networking with
a bridge defined via libvirt... and this is not good.  The more I think
about it, the more I'm convinced that trying to add a route for device
"swws-bridge" (see below) in the "eno1" [1] static-networking
declaration is simply a... mistake.

Julien, I'm adding you in Cc: only because you develop guile-netlink and
maybe you could see whether it's possible to improve netlink-related
error messages.

Ludovic Courtès <l...@gnu.org> writes:

> Giovanni Biscuolo <g...@xelera.eu> skribis:
>
>> after a reboot on a running remote host (it was running since several
>> guix system generations ago... but with no reboots meanwhile) I get a
>> failing networking service and consequently the ssh service (et al)
>> refuses to start :-(
>>
>> Sorry I've no text to show you but a screenshot (see attachment below)
>> because I'm connecting with a remote KVM console appliance.

In a follow-up message I was then able to copy the actual error message:

--8<---------------cut here---------------start------------->8---

Jun 14 11:28:32 localhost vmunix: [    6.258520] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [    6.472949] shepherd[1]: Service networking failed to start.
Jun 14 11:28:32 localhost vmunix: [    6.474842] shepherd[1]: Exception caught while starting networking: (no-such-device "swws-bridge")
Jun 14 11:28:32 localhost vmunix: [    6.492344] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [    6.509652] shepherd[1]: Exception caught while starting networking: (%exception #<&netlink-response-error errno: 17>)
Jun 14 11:28:32 localhost vmunix: [    6.510034] shepherd[1]: Service networking failed to start.

--8<---------------cut here---------------end--------------->8---

Then, in the same message, I described how I solved the issue; this is
the "core" of my configuration _mistake_:

--8<---------------cut here---------------start------------->8---

            (service static-networking-service-type
                     (list (static-networking
                            (addresses (list (network-address
                                              (device ane-wan-device)
                                              (value (string-append ane-wan-ip4 "/24")))))
                            (routes (list (network-route
                                           (destination "default")
                                           (gateway ane-wan-gateway))))
                                          ;; ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
                                          ;; (network-route
                                          ;;  (destination "10.1.2.0/24")   ;; lxcbr0 net
                                          ;;  (device swws-bridge-name)
                                          ;;  (gateway "192.168.133.12")))) ;; on node002
                            (name-servers '("185.12.64.1"
                                            "185.12.64.1")))))

--8<---------------cut here---------------end--------------->8---

I commented out the second network-route definition, the one using
"swws-bridge" [1] as the device to route 10.1.2.0/24 via 192.168.133.12.

When I used that code, AFAIU, the first time shepherd tried to start the
networking service it failed because "swws-bridge" was missing and
(guile-)netlink raised "no-such-device"; then shepherd tried again, but
failed because the very same route was already defined (but not
functional).

A failing networking service (even though the interface is up and
running) means that ssh (et al.) fails to start, because networking is
a requirement of the ssh service.

> 17 = EEXIST, which is netlink’s way of saying that the device/route/link
> it’s trying to add already exists.

Ah, thanks!  I was not able to find that error code.
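For the record, Guile itself can decode such a number: as far as I
know, the errno constants (EEXIST, ENODEV, ...) and `strerror' are
bound in Guile's default environment, so a minimal check in a Guile REPL
on GNU/Linux (where EEXIST is 17) looks like this:

```scheme
;; The errno constants and `strerror' are available without importing
;; any module; the values below are those of GNU/Linux.
(display EEXIST)             ; 17
(newline)
(display (strerror EEXIST))  ; File exists
(newline)
```

"File exists" is exactly the text `ip' prints as "RTNETLINK answers:
File exists".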

When I run the same command on the command line, I get:

--8<---------------cut here---------------start------------->8---

g@ane ~$ sudo ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
RTNETLINK answers: File exists

--8<---------------cut here---------------end--------------->8---

Would it be possible to log the same error message, and/or a little bit
of context, to syslog when this happens in 'network-set-up/linux'?

Anyway, I think that "ip route" should just be idempotent... but maybe
I'm missing something (and this is obviously not a downstream issue).
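For what it's worth, iproute2 already offers an idempotent variant:
`ip route replace' adds the route, or updates it if it already exists.
On the Guile side, a similar effect could be obtained by treating EEXIST
as success.  Below is a minimal sketch of such a wrapper in plain Guile
3.0; `call-ignoring-eexist', `fake-add-route' and `fake-errno' are
hypothetical names, and a real version would have to extract the errno
from guile-netlink's &netlink-response-error condition (I'm not naming
its accessor here, since I haven't checked it).

```scheme
(use-modules (ice-9 exceptions))

;; Hypothetical helper: run THUNK, but treat an "already exists" (EEXIST)
;; failure as success.  ERROR->ERRNO is supplied by the caller: it must
;; return the errno carried by the raised object, or #f if there is none.
(define (call-ignoring-eexist thunk error->errno)
  (with-exception-handler
      (lambda (err)
        (if (eqv? (error->errno err) EEXIST)
            'already-present          ; the object exists: nothing to do
            (raise-exception err)))   ; any other failure is re-raised
    thunk
    #:unwind? #t))

;; Stand-ins for a netlink request and its error, for illustration only:
;; the "error" is a plain list of the form (netlink-error ERRNO).
(define (fake-errno err)
  (and (pair? err) (eq? (car err) 'netlink-error) (cadr err)))

(define (fake-add-route)
  (raise-exception (list 'netlink-error EEXIST)))

(display (call-ignoring-eexist fake-add-route fake-errno))  ; already-present
(newline)
```

The catch is that blindly ignoring EEXIST could hide a conflicting
object, e.g. a route to the same destination through a different
gateway, which is presumably why `ip route add' is strict and `replace'
is a separate command.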

> The problem here is that static networking adds devices, routes, and
> links (see ‘network-set-up/linux’ in the code).  If it fails in the
> middle, then it may have added devices without adding routes, so you end
> up with half-configured networking.  Ideally this would be
> transactional.
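To make the "transactional" idea concrete, here is a minimal sketch in
plain Guile 3.0 (not the actual 'network-set-up/linux' code): each step
is a pair of "do" and "undo" thunks, and when a step fails, the undo
thunks of the steps that already succeeded run in reverse order before
the error is re-raised.  `run-transactionally' and the fake steps are
hypothetical; real steps would call guile-netlink to add and remove
addresses, routes and links.

```scheme
(use-modules (ice-9 exceptions))

;; Each step is a pair (DO . UNDO) of thunks.  The DO thunks run in order;
;; if one of them raises, the UNDO thunks of the steps that already
;; succeeded run most-recent-first, then the original error is re-raised.
(define (run-transactionally steps)
  (let loop ((steps steps)
             (undos '()))               ; most recent undo first
    (unless (null? steps)
      (with-exception-handler
          (lambda (err)
            (for-each (lambda (undo) (undo)) undos)
            (raise-exception err))
        (car (car steps))
        #:unwind? #t)
      (loop (cdr steps) (cons (cdr (car steps)) undos)))))

;; Fake steps that only record what they do (hypothetical, for illustration).
(define journal '())
(define (note! what) (set! journal (cons what journal)))

(define steps
  (list (cons (lambda () (note! 'add-address))
              (lambda () (note! 'del-address)))
        (cons (lambda () (note! 'add-default-route))
              (lambda () (note! 'del-default-route)))
        (cons (lambda () (raise-exception 'no-such-device)) ; e.g. missing bridge
              (lambda () (note! 'never-called)))))

(with-exception-handler
    (lambda (err) (note! 'failed))
  (lambda () (run-transactionally steps))
  #:unwind? #t)

(display (reverse journal))
(newline)
;; prints: (add-address add-default-route del-default-route del-address failed)
```

With something like this, a restart of the service would start from a
clean state instead of hitting EEXIST on whatever the first, partial
attempt left behind.  Undo thunks that themselves fail are not handled
in this sketch.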

Well, actually, it would be a pity to fail a whole static-networking
service "just" because of a failing /secondary/ route, no?

But, as I said in the "executive summary", how could I /dare/
declaratively add (with Guix System) such a route for "swws-bridge"
when "swws-bridge" is managed by libvirt?

I should simply use libvirt to add that! :-)
https://libvirt.org/formatnetwork.html#static-routes

> When that happens, you need to check the logs and use the ‘ip’ command
> to figure out which part failed exactly.  In your case, the root problem
> seems to be that “swws-bridge” did not exist.

Yes, I can confirm this.
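On the command line, `ip link show swws-bridge' tells whether the device
exists.  From Guile, a minimal check relies on the fact that, on Linux,
every network interface (including a bridge created by libvirt) appears
as a directory under /sys/class/net; `network-device-exists?' is a
hypothetical helper name:

```scheme
;; Hypothetical helper: on Linux, each network interface shows up as
;; /sys/class/net/<name>, so its presence can be tested as a file.
(define (network-device-exists? name)
  (file-exists? (string-append "/sys/class/net/" name)))

;; #f as long as libvirt has not created the bridge yet.
(display (network-device-exists? "swws-bridge"))
(newline)
```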

> Then you can (1) manually fix it with ‘ip’, and (2) adjust your Guix
> System config to fix the problems you found.
>
> This is inconvenient at best.  I would be interested in hearing
> suggestions on how to improve on this.

Oh well, for my use case I don't think there is anything to improve:
I just have to keep the "eno1" device configuration _separate_ from the
"swws-bridge" one (even if "swws-bridge" were defined via
static-networking rather than libvirt).

The only suggestion I have is to log more "user friendly" error
messages to syslog for netlink-related errors: it would have helped me
much more to read "adding route, RTNETLINK answers: File exists" than
"netlink-response-error errno: 17".
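Just to illustrate the kind of message I mean, a minimal sketch in
plain Guile: `describe-netlink-failure' is a hypothetical helper that
combines a description of the operation being attempted with the text
`strerror' gives for the errno, mimicking the iproute2 wording.

```scheme
;; Hypothetical helper: turn a bare errno from a failed netlink request
;; into a message that says which operation failed, iproute2-style.
(define (describe-netlink-failure action errno)
  (format #f "~a: RTNETLINK answers: ~a" action (strerror errno)))

(display (describe-netlink-failure
          "adding route 10.1.2.0/24 via 192.168.133.12 dev swws-bridge"
          EEXIST))
(newline)
;; prints: adding route 10.1.2.0/24 via 192.168.133.12 dev swws-bridge: RTNETLINK answers: File exists
```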

Thank you and... happy hacking! Gio'


[1] swws-bridge-name is defined as "swws-bridge"
    ane-wan-device is defined as "eno1"    

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
