Hello,
I have a fascination for networking, as some might be aware. I think that
a proper network design is the solid foundation underneath a cloud,
allowing it to scale and provide the flexibility an organization requires.
I've worked a lot on the VXLAN+EVPN+BGP integration in CloudStack, and I
think it's a great solution that should be the default for anybody who
starts deploying CloudStack today.
VXLAN does have its drawbacks: it requires VXLAN offloading in the NIC,
switches and routers that can process it, and additional networking
skills.
In the end a VM needs connectivity, IPv4 and/or IPv6, which allows it
to connect to other servers and the rest of the internet.
In the current design, whether it is traditional VLAN or VXLAN, we still
assume that there is an L2 network: the VLAN, or the VNI in the case of
VXLAN. Technically this is not required, and we can use pure L3 routing
towards the VMs from the host. In my opinion this can simplify
networking while also adding scalability.
** cloudbr0 **
On a test machine with plain Libvirt+KVM I created cloudbr0:
113: cloudbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:73:63:49:1f:33 brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.1/32 scope global cloudbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::b009:e3ff:fe41:1394/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
You can see I've added two addresses to the bridge:
- 169.254.0.1/32
- fe80::1/64
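For completeness, creating such a bridge is nothing more than a few
iproute2 commands. Roughly something like this (a sketch, the exact
commands in your setup may differ):

ip link add cloudbr0 type bridge
ip link set cloudbr0 up
# These are the gateway addresses the VMs will point their default routes at
ip addr add 169.254.0.1/32 dev cloudbr0
ip addr add fe80::1/64 dev cloudbr0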
** Test VM **
I have deployed a test VM which I attached to cloudbr0 and manually
added the addresses using netplan:
network:
  ethernets:
    ens18:
      addresses:
        - 2a14:9b80:103::100/128
        - 2.57.57.29/32
      nameservers:
        addresses:
          - 2620:fe::fe
        search: []
      routes:
        - to: 0.0.0.0/0
          via: 169.254.0.1
          on-link: true
        - to: ::/0
          via: fe80::1
  version: 2
This results in:
root@routing-test:~# ip addr show dev ens18
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether bc:24:11:93:d7:94 brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    inet 2.57.57.29/32 scope global ens18
       valid_lft forever preferred_lft forever
    inet6 2a14:9b80:103::100/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::be24:11ff:fe93:d794/64 scope link
       valid_lft forever preferred_lft forever
root@routing-test:~#
In the VM you now see the IPv4 and IPv6 routes:
root@routing-test:~# ip -4 r
default via 169.254.0.1 dev ens18 proto static onlink
root@routing-test:~# ip -6 r
2a14:9b80:103::100 dev ens18 proto kernel metric 256 pref medium
fe80::/64 dev ens18 proto kernel metric 256 pref medium
default via fe80::1 dev ens18 proto static metric 1024 pref medium
root@routing-test:~#
** Static route and ARP/NDP entry **
On the HV I needed to add two routes and two ARP/NDP entries pointing to the VM:
ip -6 route add 2a14:9b80:103::100/128 dev cloudbr0
ip -6 neigh add 2a14:9b80:103::100 lladdr BC:24:11:93:D7:94 dev cloudbr0 nud permanent
ip -4 route add 2.57.57.29/32 dev cloudbr0
ip -4 neigh add 2.57.57.29 lladdr BC:24:11:93:D7:94 dev cloudbr0 nud permanent
BC:24:11:93:D7:94 is the MAC address of the VM in this case.
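Not shown above: the hypervisor obviously needs to forward the routed
traffic, so IPv4 and IPv6 forwarding have to be enabled. Something along
these lines:

# Enable routing on the hypervisor
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1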
** L3 Routing with BGP **
On the hypervisor I have the FRR BGP daemon running which advertises the
/32 and /128 routes:
- 2.57.57.29/32
- 2a14:9b80:103::100/128
ubuntu# sh ip route 2.57.57.29
Routing entry for 2.57.57.29/32
Known via "kernel", distance 0, metric 0, best
Last update 00:00:51 ago
* directly connected, cloudbr0, weight 1
hv-138-a12-26# show ipv6 route 2a14:9b80:103::100
Routing entry for 2a14:9b80:103::100/128
Known via "static", distance 1, metric 0
Last update 6d04h23m ago
directly connected, cloudbr0, weight 1
Routing entry for 2a14:9b80:103::100/128
Known via "kernel", distance 0, metric 1024, best
Last update 6d08h27m ago
* directly connected, cloudbr0, weight 1
ubuntu#
Both addresses are now advertised upstream towards the other BGP peers,
while the hypervisor only receives the default routes from upstream
(0.0.0.0/0 and ::/0).
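How this looks exactly depends on your network, but as a rough sketch
the FRR configuration on the hypervisor could be something like the
below (the ASN, router-id and peer addresses are made up for this
example; a route-map to limit what gets announced is probably wise):

! Example only, adjust ASN/router-id/peers to your environment
router bgp 65001
 bgp router-id 198.51.100.10
 neighbor 198.51.100.1 remote-as 65000
 neighbor 2001:db8:ffff::1 remote-as 65000
 address-family ipv4 unicast
  redistribute kernel
  redistribute static
 exit-address-family
 address-family ipv6 unicast
  neighbor 2001:db8:ffff::1 activate
  redistribute kernel
  redistribute static
 exit-address-family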
** CloudStack **
As we only route a /32 or /128 towards a VM, we gain a lot more
flexibility: these IPs can be routed anywhere in your network. No
stretching of VLANs, nor routing VXLAN between sites.
CloudStack orchestration will need to make sure we program the right
routes on the hypervisor, but this is something Libvirt hooks can take
care of.
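As an illustration (just a sketch, not how CloudStack would necessarily
implement it; the addresses and MAC are the ones from the test VM above
and would normally come from the orchestrator), a /etc/libvirt/hooks/qemu
hook could do something like:

#!/bin/sh
# /etc/libvirt/hooks/qemu - sketch only
# Libvirt invokes this as: qemu <guest_name> <operation> <sub_operation> <extra>
# and passes the domain XML on stdin.
GUEST="$1"   # guest name (unused in this sketch)
OP="$2"

# Hard-coded for the test VM; in reality looked up per VM
VM_IP4="2.57.57.29"
VM_IP6="2a14:9b80:103::100"
VM_MAC="bc:24:11:93:d7:94"

case "$OP" in
  started)
    ip -4 route replace "${VM_IP4}/32" dev cloudbr0
    ip -4 neigh replace "${VM_IP4}" lladdr "${VM_MAC}" dev cloudbr0 nud permanent
    ip -6 route replace "${VM_IP6}/128" dev cloudbr0
    ip -6 neigh replace "${VM_IP6}" lladdr "${VM_MAC}" dev cloudbr0 nud permanent
    ;;
  stopped)
    ip -4 route del "${VM_IP4}/32" dev cloudbr0 2>/dev/null
    ip -6 route del "${VM_IP6}/128" dev cloudbr0 2>/dev/null
    ;;
esac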
BGP is to be configured by the admin and that is to be documented.
This would be an additional type of network which will not support:
- DHCP
- User-Data from the VR
- A VR at all
User-Data will need to come from ConfigDrive, and the VM will need to
use the information from ConfigDrive to configure its IPs locally.
Security Grouping can and will still work as it does right now.
** IPv4 and IPv6 **
This idea is protocol-independent, and since DHCP is no longer needed it
can work in multiple modes:
- IPv4 only
- IPv6 only (Really single stack!)
- IPv4+IPv6 (Dual Stack)
ConfigDrive will take care of the network configuration.
** What's next? **
I am not proposing anything to be developed right now, but I hope to
spark some ideas with people and get a discussion going.
Will this lead to an implementation being written? Let's see!
Wido