Re: BGP EVPN with CloudStack

2024-05-22 Thread Wido den Hollander




On 21/05/2024 at 04:26, Hanis Irfan wrote:

Hi Wido,

I'm currently running Rocky Linux 9 for the HV.


That should be fine.




Why are you setting anything on cloudbr0? There is no need to create cloudbr0 
with VXLAN.


cloudbr0 is just a naming choice on my end. Is it okay for me to use something 
like NetworkManager to create the bridge?


You don't need this bridge at all with VXLAN.



So, no need to create a VXLAN interface as the bridge slave? Then how would 
the bridge communicate through the VXLAN tunnels?


No, you just need to have a /32 (v4) and /128 (v6) attached to 'lo' and 
announce them. The v4 address will be the VTEP source.
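
A minimal sketch of what that amounts to (illustrative commands, not taken from 
this thread; the addresses are the ones shown in the output below, and you would 
normally make this persistent with systemd-networkd, NetworkManager or similar):

```
# Sketch: put the VTEP /32 and /128 on the loopback interface.
# Addresses are taken from the example output below; use your own per host.
ip addr add 10.255.255.108/32 dev lo
ip addr add 2a05:xxx:xxx:2::108/128 dev lo

# FRR then announces these prefixes via the "network" statements under
# "address-family ipv4 unicast" / "address-family ipv6 unicast"
# (see the FRR config quoted later in this thread).
```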


Make sure all HVs can ping each other on their loopback addresses. Make 
sure this underlay works!


root@hv-138-a13-37:~# ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.255.255.108/32 brd 10.255.255.108 scope global lo
       valid_lft forever preferred_lft forever
    inet6 2a05:xxx:xxx:2::108/128 scope global
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
root@hv-138-a13-37:~#

You should be able to ping with -s 8192; this verifies that jumbo frames 
work.
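
As a sketch of such a check (the peer loopback 10.255.255.109 is purely 
hypothetical; -M do forbids fragmentation, so oversized packets fail instead of 
being silently fragmented):

```
# Ping a neighbouring HV's loopback, sourced from our own loopback address.
# 10.255.255.109 is a hypothetical remote VTEP address.
ping -c 3 -I 10.255.255.108 10.255.255.109

# Jumbo frame check: -s 8192 with "don't fragment" only succeeds if the
# whole underlay path carries large frames.
ping -c 3 -M do -s 8192 -I 10.255.255.108 10.255.255.109
```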




Assuming we're using VNI 10027.

lo (VTEP) < vxlan10027 (slave of bridge) < cloudbr0/cloudbr1



No, not exactly.

vxlan10027 -> brvx-10027 -> vnet89

Both the VXLAN device and Bridge are created on the fly by the 
'modifyvxlan.sh' script (use my modified version! [0]) when needed.
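
The effect of that script is roughly the sketch below (simplified and not the 
actual script; the real modifyvxlan.sh adds existence checks, MTU handling and 
more, so use the gist rather than this):

```
# Rough sketch of what modifyvxlan.sh sets up per VNI (here 10027).
VNI=10027
VTEP_IP=10.255.255.108   # the /32 on 'lo' of this hypervisor

# VXLAN device for BGP EVPN: unicast VTEP, no multicast group, no learning.
ip link add "vxlan${VNI}" type vxlan id "${VNI}" local "${VTEP_IP}" \
    dstport 4789 nolearning

# Bridge that the VM tap devices (vnetX) get plugged into.
ip link add "brvx-${VNI}" type bridge
ip link set "vxlan${VNI}" master "brvx-${VNI}"

ip link set up dev "vxlan${VNI}"
ip link set up dev "brvx-${VNI}"
```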



Shouldn't it be the same for the guest VNIs later on? E.g.:

lo < vxlan1000 < vxlan1000br < vnet1 < Guest VM
                      ^
                      |- vnet2 < Guest VM



Yes, so that's about how it works. Quick snippet from a hypervisor:

root@hv-138-a13-37:~# bridge link show
9: vxlan505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-505 state forwarding priority 32 cost 100
22: vxlan519: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100
24: vnet9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100
root@hv-138-a13-37:~#
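
As an additional (illustrative) check, the FDB of such a VXLAN device should 
contain EVPN-programmed entries whose "dst" is the remote hypervisors' 
loopback/VTEP addresses:

```
# List forwarding entries on one of the VXLAN devices shown above.
bridge fdb show dev vxlan519
```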

Hope this helps!

Wido

[0]: https://gist.github.com/wido/51cb9880d86f08f73766634d7f6df3f4




This /20 IPv4 is used for everything within CloudStack's communication.

When talking about subnet size, the /20 is used for a couple of pods (rows of 
racks) and not just a single pod, correct?
How do you calculate the range to allocate for system VMs in a pod, considering 
a maximum of 16 hosts per cluster?
If, say, we run out of the /20 space in the future (more than 4k hosts in a 
zone), we can connect new /20 subnets via a router, right?


In *agent.properties* we have only set 'private.network.device=cloudbr1'

I actually ran into an issue when trying to add a new KVM host before: I got the 
error "resource not found". I was only able to add the host by restarting the 
host OS while adding it in the web UI.
I don't configure anything in the *agent.properties* file before adding the host 
via the web UI. Did I do it wrong? Is there any deployment procedure that you can 
share with me?


You first need to make sure that the HV can ping all the other loopback 
addresses of the Leaf and Spine switches and that all HVs can connect to each 
other via their loopback addresses.

I can assure you that the spine switches, leaf switches and HVs can all ping 
each other via their loopback addresses. Just a note: all the L3 routing 
between the switches and the HVs is configured with BGP unnumbered (eBGP).

Thank you and sorry if there are unrelated questions in the thread.

Best Regards,

Hanis Irfan

Re: BGP EVPN with CloudStack

2024-05-20 Thread Wido den Hollander

Hi Hanis,

See my reply inline.

On 17/05/2024 at 12:38, Hanis Irfan wrote:
I think this is more about BGP EVPN than CloudStack, but I would appreciate 
any help. So basically, I've tried Advanced Networking with VLAN isolation 
for my POC and now want to migrate to VXLAN.


I would say that I have little to no knowledge about VXLAN in particular, or 
about BGP EVPN. The reason I don't use multicast instead is that our 
spine-leaf switches have already been configured with basic EVPN (though not 
tested yet).


We have 2 spine switches (Mellanox SN2700) and 2 leaf switches (Mellanox 
SN2410) running BGP unnumbered for the underlay between them. The underlay 
between the hypervisor (which runs FRR) and the leaf switches is also 
configured with BGP unnumbered.


All the switches and the hypervisor are individually assigned a private 
4-byte ASN. Here is the FRR config on the hypervisor; it's all basic config:


```

ip forwarding
ipv6 forwarding

interface ens3f0np0
 no ipv6 nd suppress-ra
exit

interface ens3f1np1
 no ipv6 nd suppress-ra
exit

router bgp 420015
 bgp router-id 10.XXX.118.1
 no bgp ebgp-requires-policy
 neighbor uplink peer-group
 neighbor uplink remote-as external
 neighbor ens3f0np0 interface peer-group uplink
 neighbor ens3f1np1 interface peer-group uplink

 address-family ipv4 unicast
  network 10.XXX.118.1/32
 exit-address-family

 address-family ipv6 unicast
  network 2407::0:1::1/128
  neighbor uplink activate
  neighbor uplink soft-reconfiguration inbound
 exit-address-family

 address-family l2vpn evpn
  neighbor uplink activate
  neighbor uplink attribute-unchanged next-hop
  advertise-all-vni
  advertise-svi-ip
 exit-address-family

```

Now I want to configure cloudbr0 bridge as the management interface for 
ACS. I’ve done it like so:


```

nmcli connection add type bridge con-name cloudbr0 ifname cloudbr0 \
  ipv4.method manual ipv4.addresses 10.XXX.113.11/24 ipv4.gateway 10.XXX.113.1 \
  ipv4.dns 1.1.1.1,8.8.8.8 \
  ipv6.method manual ipv6.addresses 2407::200:c002::11/64 ipv6.gateway 2407::200:c002::1 \
  ipv6.dns 2606:4700:4700::,2001:4860:4860:: \
  bridge.stp no ethernet.mtu 9216

nmcli connection add type vxlan slave-type bridge con-name vxlan10027 ifname vxlan10027 \
  id 10027 destination-port 4789 local 2407::0:1::1 vxlan.learning no \
  master cloudbr0 ethernet.mtu 9216 dev lo

nmcli connection up cloudbr0
nmcli connection up vxlan10027

```



Why are you setting anything on cloudbr0? There is no need to create 
cloudbr0 with VXLAN.


We have only created cloudbr1 (using systemd-networkd) for the POD 
communication, but that's all:


*cloudbr1.network*
[Match]
Name=cloudbr1

[Network]
LinkLocalAddressing=no

[Address]
Address=10.100.2.108/20

[Route]
Gateway=10.100.1.1

[Link]
MTUBytes=1500

*cloudbr1.netdev*
[NetDev]
Name=cloudbr1
Kind=bridge


This /20 IPv4 is used for everything within CloudStack's communication.

In *agent.properties* we have only set 'private.network.device=cloudbr1'
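
For reference, a minimal sketch of that file (only the private.network.device 
line is confirmed by this thread; the commented keys are placeholders for values 
that cloudstack-setup-agent normally writes when the host is added):

```
# /etc/cloudstack/agent/agent.properties (illustrative sketch, not a full file)
private.network.device=cloudbr1

# Typically filled in automatically when the management server adds the host:
# host=<management-server-ip>
# zone=<zone-id>
# pod=<pod-id>
# cluster=<cluster-id>
# guid=<generated-uuid>
```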

I can see the EVPN route with the MAC address of cloudbr0 on both leaf 
switches. However, I can't ping from the hypervisor to its gateway (.1), 
which is a firewall running somewhere that's connected to a switchport 
tagged with VLAN 27.




You first need to make sure that the HV can ping all the other loopback 
addresses of the Leaf and Spine switches and that all HVs can connect to 
each other via their loopback addresses.


Can you check that? That's neither EVPN nor VXLAN, just /32 (IPv4) routing 
with BGP.
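
A few commands that can help verify this on a hypervisor running FRR (a sketch; 
exact output varies by FRR version):

```
# BGP sessions towards the leaf switches (unnumbered peers are listed per interface).
vtysh -c 'show bgp summary'

# The loopback /32s of the other HVs and of the switches should show up here.
vtysh -c 'show ip route bgp'

# Once the underlay works, the EVPN address-family can be checked as well.
vtysh -c 'show bgp l2vpn evpn summary'
```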


Wido