Re: BGP EVPN with CloudStack
On 21/05/2024 at 04:26, Hanis Irfan wrote:
> Hi Wido,
>
> I'm currently running Rocky Linux 9 for the HV.

That should be fine.

>> Why are you setting anything on cloudbr0? There is no need to create
>> cloudbr0 with VXLAN.
>
> cloudbr0 is just a naming choice on my end. Is it okay for me to use
> something like NetworkManager to create the bridge?

You don't need this bridge at all with VXLAN.

> So, no need to create a VXLAN interface as the bridge slave? Then how
> would the bridge communicate through the VXLAN tunnels?

No, you just need to have a /32 (IPv4) and a /128 (IPv6) attached to 'lo' and announce them. The IPv4 address will be the VTEP source. Make sure all HVs can ping each other on their loopback addresses. Make sure this underlay works!

    root@hv-138-a13-37:~# ip addr show dev lo
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet 10.255.255.108/32 brd 10.255.255.108 scope global lo
           valid_lft forever preferred_lft forever
        inet6 2a05:xxx:xxx:2::108/128 scope global
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    root@hv-138-a13-37:~#

You should be able to ping with -s 8192; this verifies that jumbo frames work.

> Assuming we're using VNI 10027:
> lo (VTEP) < vxlan10027 (slave of bridge) < cloudbr0/cloudbr1

No, not exactly:

    vxlan10027 --> brvx-10027 -> vnet89

Both the VXLAN device and the bridge are created on the fly by the 'modifyvxlan.sh' script (use my modified version! [0]) when needed.

> Shouldn't it be the same for the guest VNIs later on? E.g.:
> lo < vxlan1000 < vxlan1000br < vnet1 < Guest VM
>                       ^
>                       |- vnet2 < Guest VM

Yes, so that's about how it works.
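The chain described above (a `vxlan<VNI>` device enslaved to a per-VNI bridge, with the guest's `vnet` tap attached to that bridge) can be sketched with iproute2. This is not the modifyvxlan.sh script from the gist; it is a minimal dry-run sketch that only prints commands, assuming the loopback VTEP address 10.255.255.108 from the output above, the standard VXLAN UDP port 4789, the `brvx-<VNI>` bridge naming seen on Wido's hypervisor, a 1500-byte guest MTU, and MAC learning disabled because EVPN distributes MAC/IP routes over BGP. The function name `vxlan_chain_cmds` is hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the iproute2 commands that would build the
# vxlan<VNI> -> brvx-<VNI> chain for one VNI. NOT the real modifyvxlan.sh.
# Assumed defaults (hypothetical): VTEP 10.255.255.108 on lo, port 4789, MTU 1500.
set -euo pipefail

vxlan_chain_cmds() {
    local vni="$1"
    local vtep="${2:-10.255.255.108}"   # loopback /32 announced via BGP
    local mtu="${3:-1500}"              # guest-facing MTU
    local vx="vxlan${vni}" br="brvx-${vni}"

    # VXLAN device sourced from the loopback; no data-plane learning,
    # since remote MACs are expected to arrive as EVPN routes.
    echo "ip link add ${vx} type vxlan id ${vni} local ${vtep} dstport 4789 nolearning"
    # Per-VNI bridge; the guest's vnetN tap is attached here later.
    echo "ip link add ${br} type bridge"
    echo "ip link set ${vx} master ${br}"
    echo "ip link set ${vx} mtu ${mtu} up"
    echo "ip link set ${br} mtu ${mtu} up"
}

vxlan_chain_cmds 10027
```

With `advertise-all-vni` configured, FRR should pick up a VNI once its VXLAN device is up and attached to a bridge, so no per-VNI FRR configuration should be needed for this chain; libvirt attaches each guest's `vnetN` tap to the bridge.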
Quick snippet from a hypervisor:

    root@hv-138-a13-37:~# bridge link show
    9: vxlan505: mtu 1500 master brvx-505 state forwarding priority 32 cost 100
    22: vxlan519: mtu 1500 master brvx-519 state forwarding priority 32 cost 100
    24: vnet9: mtu 1500 master brvx-519 state forwarding priority 32 cost 100
    root@hv-138-a13-37:~#

Hope this helps!

Wido

[0]: https://gist.github.com/wido/51cb9880d86f08f73766634d7f6df3f4

>> This /20 IPv4 is used for everything within CloudStack's communication.
>
> When talking about subnet size, the /20 is used for a couple of pods (row
> of racks) and not just a single pod, correct? How do you calculate the
> range to allocate for system VMs in a pod, considering a maximum of 16
> hosts per cluster?
>
> If, say, we run out of the /20 space in the future (more than 4k hosts in
> a zone), we can connect new /20 subnets via a router, right?
>
>> In *agent.properties* we have only set 'private.network.device=cloudbr1'
>
> I actually got an issue when trying to add a new KVM host before: I got
> the error "resource not found". I was able to add the host if we restarted
> the host OS while adding the host in the web UI. I don't really configure
> anything in the *agent.properties* file before adding the host via the web
> UI. Did I do it wrong? Is there any deployment procedure that you can
> share with me?
>
>> You first need to make sure that the HV can ping all the other loopback
>> addresses of the Leaf and Spine switches and all HVs can connect to each
>> other via their loopback addresses.
>
> I can assure you that the spine switches, leaf switches and HV can all
> ping each other via their loopback addresses. Just a note: all the L3
> routing between the switches and the HV is configured with BGP unnumbered
> and eBGP.
>
> Thank you and sorry if there are unrelated questions in the thread.
>
> Best Regards,
> Hanis Irfan
>
> -----Original Message-----
> From: Wido den Hollander
> Sent: Tuesday, 21 May, 2024 04:06
> To: Hanis Irfan ; users@cloudstack.apache.org
> Subject: Re: BGP EVPN with CloudStack
>
> Hi Hanis,
>
> See my reply inline.
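To confirm on a hypervisor that each VXLAN device sits on the expected bridge, `bridge link show` output like the snippet above can be reduced to a port/bridge summary. A small sketch, assuming the one-line-per-port format shown there (port name in the second field with a trailing colon, bridge name after the word `master`) and the `brvx-<VNI>` naming convention; `bridge_ports` is a hypothetical helper that reads from stdin, so it works on live output (`bridge link show | bridge_ports`) or on a saved copy.

```shell
#!/usr/bin/env bash
# Sketch: summarise `bridge link show` lines as "<port> <bridge> <status>".
# Assumes the brvx-<VNI> naming seen in this thread (hypothetical convention).
set -euo pipefail

bridge_ports() {
    awk '{
        port = $2
        sub(/:$/, "", port)          # "vxlan505:" -> "vxlan505"
        sub(/@.*/, "", port)         # drop an "@parent" suffix if present
        for (i = 3; i < NF; i++) {
            if ($i == "master") {
                if (port ~ /^vxlan[0-9]+$/) {
                    vni = substr(port, 6)   # digits after "vxlan"
                    status = ($(i + 1) == "brvx-" vni) ? "ok" : "unexpected-bridge"
                } else {
                    status = "guest-or-other"
                }
                print port, $(i + 1), status
                break
            }
        }
    }'
}

# Input copied from the hypervisor snippet in this thread.
sample='9: vxlan505: mtu 1500 master brvx-505 state forwarding priority 32 cost 100
22: vxlan519: mtu 1500 master brvx-519 state forwarding priority 32 cost 100
24: vnet9: mtu 1500 master brvx-519 state forwarding priority 32 cost 100'

bridge_ports <<<"$sample"
```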
RE: BGP EVPN with CloudStack
Hi Wido,

I'm currently running Rocky Linux 9 for the HV.

> Why are you setting anything on cloudbr0? There is no need to create cloudbr0
> with VXLAN.

cloudbr0 is just a naming choice on my end. Is it okay for me to use something like NetworkManager to create the bridge?

So, no need to create a VXLAN interface as the bridge slave? Then how would the bridge communicate through the VXLAN tunnels?

Assuming we're using VNI 10027:
lo (VTEP) < vxlan10027 (slave of bridge) < cloudbr0/cloudbr1

Shouldn't it be the same for the guest VNIs later on? E.g.:
lo < vxlan1000 < vxlan1000br < vnet1 < Guest VM
                      ^
                      |- vnet2 < Guest VM

> This /20 IPv4 is used for everything within CloudStack's communication.

When talking about subnet size, the /20 is used for a couple of pods (row of racks) and not just a single pod, correct? How do you calculate the range to allocate for system VMs in a pod, considering a maximum of 16 hosts per cluster?

If, say, we run out of the /20 space in the future (more than 4k hosts in a zone), we can connect new /20 subnets via a router, right?

> In *agent.properties* we have only set 'private.network.device=cloudbr1'

I actually got an issue when trying to add a new KVM host before: I got the error "resource not found". I was able to add the host if we restarted the host OS while adding the host in the web UI. I don't really configure anything in the *agent.properties* file before adding the host via the web UI. Did I do it wrong? Is there any deployment procedure that you can share with me?

> You first need to make sure that the HV can ping all the other loopback
> addresses of the Leaf and Spine switches and all HVs can connect to each other
> via their loopback addresses.

I can assure you that the spine switches, leaf switches and HV can all ping each other via their loopback addresses. Just a note: all the L3 routing between the switches and the HV is configured with BGP unnumbered and eBGP.

Thank you and sorry if there are unrelated questions in the thread.
Best Regards,
Hanis Irfan
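The /20 sizing questions above come down to simple address arithmetic. A sketch of that arithmetic, in which the one-/24-per-pod split, the four clusters of 16 hosts and the single gateway address per pod are hypothetical planning numbers chosen for illustration, not CloudStack requirements; the helper names are also hypothetical.

```shell
#!/usr/bin/env bash
# Arithmetic sketch for sizing the pod/management range.
# The per-pod split used below is a hypothetical planning choice.
set -euo pipefail

# Usable IPv4 host addresses in a prefix (network and broadcast excluded).
usable_hosts() {
    local len="$1"
    echo $(( (1 << (32 - len)) - 2 ))
}

# Number of subnets of length $2 that fit into one prefix of length $1.
subnets_in() {
    local outer="$1" inner="$2"
    echo $(( 1 << (inner - outer) ))
}

# Addresses left for a system VM range in one pod after hosts and gateway.
pod_sysvm_room() {
    local prefix="$1" clusters="$2" hosts_per_cluster="$3"
    local used=$(( clusters * hosts_per_cluster + 1 ))   # +1 for the pod gateway
    echo $(( $(usable_hosts "$prefix") - used ))
}

echo "/20 usable addresses: $(usable_hosts 20)"                            # 4094
echo "/24 subnets per /20:  $(subnets_in 20 24)"                           # 16
echo "/24 usable addresses: $(usable_hosts 24)"                            # 254
echo "Left in a /24 pod with 4 x 16 hosts: $(pod_sysvm_room 24 4 16)"      # 189
```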
Re: BGP EVPN with CloudStack
Hi Hanis,

See my reply inline.

On 17/05/2024 at 12:38, Hanis Irfan wrote:
> I think this is more about BGP EVPN than CloudStack, but I would
> appreciate help from anyone. Basically, I've tried Advanced Networking
> with VLAN isolation for my POC and now want to migrate to VXLAN.
>
> I would say that I have little to no knowledge about VXLAN in particular
> and BGP EVPN. The reason I don't use multicast instead is that our
> spine/leaf switches have already been configured with basic EVPN (though
> not tested yet).
>
> We have 2 spine switches (Mellanox SN2700) and 2 leaf switches (Mellanox
> SN2410) running BGP unnumbered for the underlay between them. The underlay
> between the hypervisor (running FRR) and the leaf switches is also
> configured with BGP unnumbered.
>
> All the switches and the hypervisor are each assigned an individual
> 4-byte private ASN.
>
> Here is the FRR config on the hypervisor, all basic config:
>
> ```
> ip forwarding
> ipv6 forwarding
> interface ens3f0np0
>  no ipv6 nd suppress-ra
> exit
> interface ens3f1np1
>  no ipv6 nd suppress-ra
> exit
> router bgp 420015
>  bgp router-id 10.XXX.118.1
>  no bgp ebgp-requires-policy
>  neighbor uplink peer-group
>  neighbor uplink remote-as external
>  neighbor ens3f0np0 interface peer-group uplink
>  neighbor ens3f1np1 interface peer-group uplink
>  address-family ipv4 unicast
>   network 10.XXX.118.1/32
>  exit-address-family
>  address-family ipv6 unicast
>   network 2407::0:1::1/128
>   neighbor uplink activate
>   neighbor uplink soft-reconfiguration inbound
>  exit-address-family
>  address-family l2vpn evpn
>   neighbor uplink activate
>   neighbor uplink attribute-unchanged next-hop
>   advertise-all-vni
>   advertise-svi-ip
>  exit-address-family
> ```
>
> Now I want to configure the cloudbr0 bridge as the management interface
> for ACS. I've done it like so:
>
> ```
> nmcli connection add type bridge con-name cloudbr0 ifname cloudbr0 \
>   ipv4.method manual ipv4.addresses 10.XXX.113.11/24 ipv4.gateway 10.XXX.113.1 \
>   ipv4.dns 1.1.1.1,8.8.8.8 \
>   ipv6.method manual ipv6.addresses 2407::200:c002::11/64 ipv6.gateway 2407::200:c002::1 \
>   ipv6.dns 2606:4700:4700::,2001:4860:4860:: \
>   bridge.stp no ethernet.mtu 9216
>
> nmcli connection add type vxlan slave-type bridge con-name vxlan10027 ifname vxlan10027 \
>   id 10027 destination-port 4789 local 2407::0:1::1 vxlan.learning no \
>   master cloudbr0 ethernet.mtu 9216 dev lo
>
> nmcli connection up cloudbr0
> nmcli connection up vxlan10027
> ```

Why are you setting anything on cloudbr0? There is no need to create cloudbr0 with VXLAN.

We have only created cloudbr1 (using systemd-networkd) for the pod communication, but that's all:

*cloudbr1.network*

    [Match]
    Name=cloudbr1

    [Network]
    LinkLocalAddressing=no

    [Address]
    Address=10.100.2.108/20

    [Route]
    Gateway=10.100.1.1

    [Link]
    MTUBytes=1500

*cloudbr1.netdev*

    [NetDev]
    Name=cloudbr1
    Kind=bridge

This /20 IPv4 is used for everything within CloudStack's communication.

In *agent.properties* we have only set 'private.network.device=cloudbr1'

> I can see that the EVPN route with the MAC address of cloudbr0 can be seen
> on both leaf switches. However, I can't ping from the hypervisor to its
> gateway (.1), which is a firewall running somewhere that's connected to a
> switchport tagged with VLAN 27.

You first need to make sure that the HV can ping all the other loopback addresses of the Leaf and Spine switches and that all HVs can connect to each other via their loopback addresses.

Can you check that? That's neither EVPN nor VXLAN, just /32 (IPv4) routing with BGP.

Wido
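A sketch of the underlay check asked for above: from the hypervisor's loopback, ping every other loopback with an 8192-byte payload. It assumes Linux iputils `ping` (`-M do` prohibits fragmentation, so a path that would otherwise fragment at 1500 bytes fails instead of passing; `-I` sets the source address), uses the loopback 10.255.255.108 from the `ip addr` output earlier in the thread as a placeholder source, and pings hypothetical peer addresses. It only prints the commands unless `DRY_RUN=0`. The helpers also show the header arithmetic: 20 bytes of IPv4 plus 8 bytes of ICMP on top of the payload, and the usual 50 bytes of VXLAN-over-IPv4 overhead (outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14).

```shell
#!/usr/bin/env bash
# Sketch of an underlay check: ping each peer loopback with a jumbo payload
# and fragmentation prohibited. Source and peer addresses are placeholders.
# DRY_RUN=1 (the default) only prints the ping commands.
set -euo pipefail

PAYLOAD=8192              # ICMP payload size suggested in the thread
SRC="10.255.255.108"      # this HV's loopback / VTEP source (example)
DRY_RUN="${DRY_RUN:-1}"

# Size of the IPv4 packet for a given ICMP payload: IPv4 (20) + ICMP (8).
ipv4_icmp_size() { echo $(( 20 + 8 + $1 )); }

# Minimum underlay MTU for guest frames of MTU $1 over VXLAN on IPv4:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet header (14) = 50.
vxlan_underlay_mtu() { echo $(( $1 + 50 )); }

check_loopbacks() {
    local peer rc=0
    for peer in "$@"; do
        local cmd=(ping -M do -s "$PAYLOAD" -c 3 -W 1 -I "$SRC" "$peer")
        if [ "$DRY_RUN" = "1" ]; then
            echo "${cmd[*]}"
        elif "${cmd[@]}" >/dev/null; then
            echo "OK   $peer"
        else
            echo "FAIL $peer"
            rc=1
        fi
    done
    return "$rc"
}

echo "IPv4 packet size for -s $PAYLOAD: $(ipv4_icmp_size "$PAYLOAD")"    # 8220
echo "Underlay MTU for 1500-byte guests: $(vxlan_underlay_mtu 1500)"   # 1550
check_loopbacks 10.255.255.109 10.255.255.110   # hypothetical peer loopbacks
```

An 8220-byte packet fits comfortably within the 9216-byte MTU used on the fabric, so a failure with `-M do` points at a link in the path that is not carrying jumbo frames.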