Wido, that's interesting.

Do you think that Cumulus-based switches running BGP on the switch itself have
an advantage over classic OSPF-based routing switches combined with separate
multihop MP-BGP route servers for VNI propagation?

I'm thinking about a pure L3, OSPF-based backend network for management and
storage, where CloudStack uses bridges on dummy interfaces with IPs assigned,
while the real NICs carry utility IP addresses in several OSPF networks, and
all those target IPs are distributed via OSPF.

Next, VNIs are created on top of those bridges and their information is
distributed over BGP.

This approach helps implement fault tolerance and multipath routing with the
standard L3 stack, without xSTP, VCS, etc., and it shrinks the broadcast
domains.
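
To make it concrete, here is roughly what I have in mind on each hypervisor;
this is only a sketch and all interface names, addresses and the OSPF area are
invented for illustration:

# bridge for the management/storage traffic label, backed by a dummy
# interface; the /32 on the bridge is what OSPF will advertise
ip link add dummy-mgmt type dummy
ip link add name cloudbr-mgmt type bridge
ip link set dummy-mgmt master cloudbr-mgmt
ip addr add 10.255.0.11/32 dev cloudbr-mgmt
ip link set dummy-mgmt up
ip link set cloudbr-mgmt up

# the real NICs keep small point-to-point utility subnets towards the fabric
ip addr add 192.168.10.2/31 dev eth0
ip addr add 192.168.20.2/31 dev eth1

# FRR then advertises the /32 and the utility subnets over OSPF
# (ECMP on the /32 gives the multipath):
vtysh -c 'conf t' \
      -c 'router ospf' \
      -c ' network 10.255.0.11/32 area 0' \
      -c ' network 192.168.10.2/31 area 0' \
      -c ' network 192.168.20.2/31 area 0'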

Any thoughts?


On Fri, 28 Dec 2018 at 05:34, Wido den Hollander <w...@widodh.nl> wrote:

>
>
> On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
> > Doesn't a solution like this work seamlessly for large VXLAN networks?
> >
> > https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
> >
>
> This is what we are looking into right now.
>
> As CloudStack executes *modifyvxlan.sh* prior to starting an Instance, it
> would just be a matter of replacing this script with a version which does
> the EVPN for us.
>
> Our routers will probably be 36x100G SuperMicro bare-metal switches
> running Cumulus.
>
> Using unnumbered BGP over IPv6 we'll provide network connectivity to the
> hypervisors.
>
> Using FRR and EVPN we'll be able to enable VXLAN on the hypervisors and
> route traffic.
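
Just to check that I follow: I assume the hypervisor side is the usual FRR
BGP-unnumbered configuration with the l2vpn evpn address family, roughly like
the sketch below? The ASN and interface names are invented; it is only meant
to illustrate the idea:

vtysh -c 'conf t' \
      -c 'router bgp 65101' \
      -c ' neighbor uplink1 interface remote-as external' \
      -c ' neighbor uplink2 interface remote-as external' \
      -c ' address-family l2vpn evpn' \
      -c '  neighbor uplink1 activate' \
      -c '  neighbor uplink2 activate' \
      -c '  advertise-all-vni'
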
>
> As these things seem to be very use-case specific I don't see how we can
> integrate this into CloudStack in a generic way.
>
> The *modifyvxlan.sh* script gets the VNI as an argument, so anybody can
> adapt it to their own needs for their specific environment.
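
For the archives, this is roughly the kind of replacement I would imagine. It
is only a sketch under my own assumptions (argument handling reduced to the
VNI, local VTEP address made up), not the script that ships with CloudStack:

#!/bin/bash
# Sketch: create a unicast VXLAN device for EVPN instead of a multicast group;
# FRR then advertises MACs/VTEPs via EVPN routes, so no PIM/IGMP is needed.
VNI=$1                     # the real script takes more options than the VNI
VTEP_IP=10.255.0.11        # example loopback/dummy address of this hypervisor
ip link add vxlan${VNI} type vxlan id ${VNI} dstport 4789 \
    local ${VTEP_IP} nolearning
ip link add name brvx-${VNI} type bridge
ip link set vxlan${VNI} master brvx-${VNI}
ip link set vxlan${VNI} up
ip link set brvx-${VNI} up
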
>
> Wido
>
> > On Tue, 23 Oct 2018 at 8:34, Simon Weller <swel...@ena.com.invalid> wrote:
> >
> >> Linux native VXLAN uses multicast and each host has to participate in
> >> multicast in order to see the VXLAN networks. We haven't tried using PIM
> >> across an L3 boundary with ACS, although it will probably work fine.
> >>
> >> Another option is to use an L3 VTEP, but right now there is no native
> >> support for that in CloudStack's VXLAN implementation, although we've
> >> thought about proposing it as a feature.
> >>
> >>
> >> ________________________________
> >> From: Wido den Hollander <w...@widodh.nl>
> >> Sent: Tuesday, October 23, 2018 7:17 AM
> >> To: dev@cloudstack.apache.org; Simon Weller
> >> Subject: Re: VXLAN and KVm experiences
> >>
> >>
> >>
> >> On 10/23/18 1:51 PM, Simon Weller wrote:
> >>> We've also been using VXLAN on KVM for all of our isolated VPC guest
> >>> networks for quite a long time now. As Andrija pointed out, make sure
> >>> you increase the max_igmp_memberships param and also put an IP address
> >>> on each host's VXLAN interface in the same subnet for all hosts that
> >>> will share networking, or multicast won't work.
> >>>
> >>
> >> Thanks! So you are saying that all hypervisors need to be in the same L2
> >> network or are you routing the multicast?
> >>
> >> My idea was that each POD would be an isolated Layer 3 domain and that a
> >> VNI would span over the different Layer 3 networks.
> >>
> >> I don't like STP and other Layer 2 loop-prevention systems.
> >>
> >> Wido
> >>
> >>>
> >>> - Si
> >>>
> >>>
> >>> ________________________________
> >>> From: Wido den Hollander <w...@widodh.nl>
> >>> Sent: Tuesday, October 23, 2018 5:21 AM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: Re: VXLAN and KVm experiences
> >>>
> >>>
> >>>
> >>> On 10/23/18 11:21 AM, Andrija Panic wrote:
> >>>> Hi Wido,
> >>>>
> >>>> I have "pioneered" this one in production for last 3 years (and
> >> suffered a
> >>>> nasty pain of silent drop of packages on kernel 3.X back in the days
> >>>> because of being unaware of max_igmp_memberships kernel parameters,
> so I
> >>>> have updated the manual long time ago).
> >>>>
> >>>> I never had any issues (beside above nasty one...) and it works very
> >> well.
> >>>
> >>> That's what I want to hear!
> >>>
> >>>> To avoid the issue I described above, you should increase
> >>>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships) -
> >>>> otherwise, with more than 20 vxlan interfaces, some of them will stay
> >>>> in the down state and drop traffic hard (with a proper message in
> >>>> agent.log) on kernel > 4.0 (or a silent, bitchy random packet drop on
> >>>> kernel 3.X...) - and also pay attention to the MTU size - anyway,
> >>>> everything is in the manual (I updated everything I thought was
> >>>> missing), so please check it.
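
A note for the archives: the knob Andrija mentions can be raised like this
(the value 200 is only an example; size it to the number of VXLAN interfaces
you expect per host):

# apply immediately and persist across reboots
sysctl -w net.ipv4.igmp_max_memberships=200
echo 'net.ipv4.igmp_max_memberships = 200' > /etc/sysctl.d/99-vxlan.conf
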
> >>>>
> >>>
> >>> Yes, the underlying network will all be 9000 bytes MTU.
> >>>
> >>>> Our example setup:
> >>>>
> >>>> We have e.g. bond0.950 as the main VLAN which carries all the vxlan
> >>>> "tunnels" - so this is defined as the KVM traffic label. In our case
> >>>> it didn't make sense to use a bridge on top of this bond0.950 (as the
> >>>> traffic label) - you can test it on your own - since this bridge is
> >>>> used only to extract the child bond0.950 interface name; then, based
> >>>> on the vxlan ID, ACS will provision vxlan...@bond0.xxx and join this
> >>>> new vxlan interface to the NEW bridge it creates (and then of course
> >>>> the vNIC goes to this new bridge), so the original bridge (to which
> >>>> bond0.xxx belonged) is not used for anything.
> >>>>
> >>>
> >>> Clear, I indeed thought something like that would happen.
> >>>
> >>>> Here is a sample of the above for vxlan 867, used for tenant isolation:
> >>>>
> >>>> root@hostname:~# brctl show brvx-867
> >>>>
> >>>> bridge name     bridge id               STP enabled     interfaces
> >>>> brvx-867        8000.2215cfce99ce       no              vnet6
> >>>>                                                         vxlan867
> >>>>
> >>>> root@hostname:~# ip -d link show vxlan867
> >>>>
> >>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc noqueue
> >>>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> >>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> >>>>     vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
> >>>>
> >>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
> >>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
> >>>>
> >>>> So note how the vxlan interface has a 50 bytes smaller MTU than the
> >>>> bond0.950 parent interface (which could affect traffic inside the VM) -
> >>>> so jumbo frames are needed on the parent interface anyway (bond0.950 in
> >>>> the example above, with a minimum MTU of 1550).
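
For the record, the 50 bytes are just the IPv4 VXLAN encapsulation overhead:
14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4) = 50, which matches
the 8192 - 50 = 8142 in the output above. With an IPv6 underlay the overhead
grows to 70 bytes, so size the parent MTU accordingly.
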
> >>>>
> >>>
> >>> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
> >>> networks underneath will be ~9k.
> >>>
> >>>> Ping me if more details needed, happy to help.
> >>>>
> >>>
> >>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
> >>> experiences later.
> >>>
> >>> Wido
> >>>
> >>>> Cheers
> >>>> Andrija
> >>>>
> >>>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <w...@widodh.nl>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I just wanted to know if there are people out there using KVM with
> >>>>> Advanced Networking and using VXLAN for different networks.
> >>>>>
> >>>>> Our main goal would be to spawn a VM and, based on the network the
> >>>>> NIC is in, attach it to a different VXLAN bridge on the KVM host.
> >>>>>
> >>>>> It seems to me that this should work, but I just wanted to check and
> >>>>> see if people have experience with it.
> >>>>>
> >>>>> Wido
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>


-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/ <http://bw-sw.com/>
