Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

Fredy Neeser Thu, 12 Mar 2015 05:39:47 -0700


On 11.03.2015 19:31, Ian Wells wrote:

On 11 March 2015 at 04:27, Fredy Neeser <fredy.nee...@solnet.ch> wrote:


    7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
           valid_lft forever preferred_lft forever

    8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
        link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
           valid_lft forever preferred_lft forever

I find it hard to believe that you want the same address configured on *both* of these interfaces - which one do you think will be sending packets?


Ian, thanks for your feedback!

I did choose the same address for the two interfaces, for three reasons:

1. Within my home single-LAN (underlay) environment, traffic is switched, and VXLAN traffic is confined to VLAN 12, so there is never a conflict between IP 192.168.1.14 on VLAN 1 and the same IP on VLAN 12. OTOH, for a more scalable VXLAN setup (with multiple underlays and L3 routing in between), I would like to use different IPs for br-ex.1 and br-ex.12 -- for example by using separate subnets:

  192.168.1.0/26  for VLAN 1
  192.168.12.0/26  for VLAN 12
However, I'm not quite there yet (see 3.).

2. I'm using policy routing on my hosts to steer VXLAN traffic (UDP dest. port 4789) to interface br-ex.12 -- all other traffic from 192.168.1.14 is source routed from br-ex.1, presumably because br-ex.1 is a lower-numbered interface than br-ex.12 (?) -- interesting question whether I'm relying here on the order in which I created these two interfaces.


  [root@langrain ~]# ip a
  ...
  7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
      link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
      inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
         valid_lft forever preferred_lft forever

  8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
      link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
      inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
         valid_lft forever preferred_lft forever

3. It's not clear to me how to set up multiple nodes with packstack if a node's tunnel IP does not equal its admin IP (or the OpenStack API IP in case of a controller node). With packstack, I can only specify the compute node IPs through CONFIG_COMPUTE_HOSTS. Presumably, these IPs are used for both packstack deployment (admin IP) and for configuring the VXLAN tunnel IPs (local_ip and remote_ip parameters). How would I specify different IPs for these purposes? (Recall that my hosts have a single NIC.)
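The policy routing mentioned in point 2 is not shown in this message. One common way to steer traffic by UDP destination port on Linux is to mark the packets with iptables and route marked packets through a separate table. The following is a minimal sketch under that assumption, not the actual rules in use on these hosts: the mark value 12 and table number 12 are arbitrary illustrative choices, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
# MARK and TABLE are arbitrary illustrative values, not taken from the thread.
VXLAN_PORT=4789        # VXLAN UDP destination port, as in the text
TUNNEL_IF=br-ex.12     # interface that should carry VXLAN traffic
SUBNET=192.168.1.0/24  # underlay subnet shared by both interfaces
SRC_IP=192.168.1.14    # address configured on both interfaces
MARK=12
TABLE=12

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

vxlan_policy_routing() {
    # 1. Mark locally generated UDP packets destined for the VXLAN port.
    run iptables -t mangle -A OUTPUT -p udp --dport "$VXLAN_PORT" \
        -j MARK --set-mark "$MARK"
    # 2. Look up marked packets in a dedicated routing table.
    run ip rule add fwmark "$MARK" table "$TABLE"
    # 3. In that table, reach the underlay subnet via the VXLAN interface.
    run ip route add "$SUBNET" dev "$TUNNEL_IF" src "$SRC_IP" table "$TABLE"
}

vxlan_policy_routing
```

With such rules in place, unmarked traffic would still follow the main routing table, so which of the two connected routes wins for non-VXLAN traffic remains a separate question.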

In any case, native traffic on bridge br-ex is sent via br-ex.1 (VLAN 1), which is also the reason the Neutron gateway port qg-XXX needs to be an access port for VLAN 1 (tag: 1). VXLAN traffic is sent from br-ex.12 on all compute nodes. See the 2 cases below:

Case 1: Max-size ping from compute node 'langrain' (192.168.1.14) to another host on same LAN

=> Native traffic sent from br-ex.1; no traffic sent from br-ex.12


[fn@langrain ~]$ ping -M do -s 1472 -c 1 192.168.1.54
PING 192.168.1.54 (192.168.1.54) 1472(1500) bytes of data.
1480 bytes from 192.168.1.54: icmp_seq=1 ttl=64 time=0.766 ms

[root@langrain ~]# tcpdump -n -i br-ex.1 dst 192.168.1.54
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.1, link-type EN10MB (Ethernet), capture size 65535 bytes

10:32:37.666572 IP 192.168.1.14 > 192.168.1.54: ICMP echo request, id 10432, seq 1, length 1480
10:32:42.673665 ARP, Request who-has 192.168.1.54 tell 192.168.1.14, length 28

Case 2: Max-size ping from a guest1 (10.0.0.1) on compute node 'langrain' (192.168.1.14) to a guest2 (10.0.0.3) on another compute node (192.168.1.21) via VXLAN tunnel. Guests are on the same virtual network 10.0.0.0/24.

=> Encapsulated traffic sent from br-ex.12; no traffic sent from br-ex.1


$ ping -M do -s 1472 -c 1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 1472(1500) bytes of data.
1480 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=2.22 ms

[root@langrain ~]# tcpdump -n -i br-ex.12
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.12, link-type EN10MB (Ethernet), capture size 65535 bytes

11:02:56.916265 IP 192.168.1.14.47872 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Request who-has 10.0.0.3 tell 10.0.0.1, length 28
11:02:56.916991 IP 192.168.1.21.51408 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Reply 10.0.0.3 is-at fa:16:3e:e6:e1:c8, length 28
11:02:56.917282 IP 192.168.1.14.57836 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
IP 10.0.0.1 > 10.0.0.3: ICMP echo request, id 25474, seq 1, length 1480
11:02:56.918110 IP 192.168.1.21.44153 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
IP 10.0.0.3 > 10.0.0.1: ICMP echo reply, id 25474, seq 1, length 1480
11:03:01.918885 IP 192.168.1.21.51408 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Request who-has 10.0.0.1 tell 10.0.0.3, length 28
11:03:01.919207 IP 192.168.1.14.57760 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Reply 10.0.0.1 is-at fa:16:3e:f4:1d:89, length 28
11:03:01.920502 ARP, Request who-has 192.168.1.14 tell 192.168.1.21, length 46
11:03:01.920519 ARP, Reply 192.168.1.14 is-at e0:3f:49:b4:7c:a7, length 28

You may find that configuring a VLAN interface for eth1.12 (not in a bridge, with a local address suitable for communication with compute nodes, for VXLAN traffic) and eth1.1 (in br-ex, for external traffic to use) does better for you.

Hmm, I only have one NIC (eth0). In order to attach eth0 to br-ex, I had to configure it as an OVSPort. Maybe I misunderstand your alternative, but are you suggesting that I configure eth0.1 as an OVSPort (connected to br-ex), and eth0.12 as a standalone interface? (Not sure a physical interface can be "brain split" in such a way.)

I'm also not clear what your Openstack API endpoint address or MTU is - maybe that's why the eth1.1 interface is addressed?

It's 192.168.1.14, and br-ex.1 is always used for native traffic, so the MTU is 1500.

Note that my physical switch uses a native VLAN of 1 and is configured with "Untag all ports" for VLAN 1. Moreover, OVSPort eth0 (attached to br-ex) is configured for VLAN trunking with a native VLAN of 1 (vlan_mode: native-untagged, trunks: [1,12], tag: 1), so within bridge br-ex, native packets are tagged 1.
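For reference, the trunk settings described above (vlan_mode, trunks, tag) are columns of the Open vSwitch Port table and can be set with ovs-vsctl. The sketch below assumes the port is named eth0 as in the text; it only prints the commands unless DRY_RUN=0 is set, and it does not claim to reproduce how the port was actually configured on these hosts.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
PORT=eth0   # physical NIC attached to br-ex, as described in the text

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ovs_trunk_config() {
    # Trunk VLANs 1 and 12; VLAN 1 is the native VLAN and leaves untagged.
    run ovs-vsctl set port "$PORT" vlan_mode=native-untagged tag=1 trunks=1,12
    # Show the resulting Port record for inspection.
    run ovs-vsctl list port "$PORT"
}

ovs_trunk_config
```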

I can tell you that if you want your API to be on the same address 192.168.1.14 as the VXLAN tunnel endpoints then it has to be one address on one interface and the two functions will share the same MTU - almost certainly not what you're looking for.

With my current setup (thanks to policy routing), I have the same IP on two interfaces br-ex.1 and br-ex.12, with MTUs 1500 and 1554, respectively.
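The relationship between the two MTUs can be checked with a little arithmetic. The sketch below assumes an IPv4 underlay without IP options and the standard header sizes (ICMP 8, IPv4 20, UDP 8, VXLAN 8, inner Ethernet 14 bytes); the function names are made up for illustration.

```shell
#!/bin/sh
# Header sizes in bytes (IPv4 underlay, no IP options).
ICMP_HDR=8
IPV4_HDR=20
UDP_HDR=8
VXLAN_HDR=8
INNER_ETH_HDR=14   # Ethernet header of the guest frame carried in the tunnel

# Size of the IP packet produced by "ping -s <payload>".
ping_ip_len() {
    echo $(( $1 + ICMP_HDR + IPV4_HDR ))
}

# Size of the outer IP packet after VXLAN-encapsulating an inner IP packet.
vxlan_outer_ip_len() {
    echo $(( $1 + INNER_ETH_HDR + VXLAN_HDR + UDP_HDR + IPV4_HDR ))
}

inner=$(ping_ip_len 1472)
outer=$(vxlan_outer_ip_len "$inner")
echo "inner IP packet: $inner bytes"
echo "outer IP packet: $outer bytes"
echo "headroom on br-ex.12 (MTU 1554): $(( 1554 - outer )) bytes"
```

A 1472-byte ping payload thus gives a 1500-byte inner packet and a 1550-byte outer packet, which fits within the 1554-byte MTU of br-ex.12 with 4 bytes to spare. Those 4 bytes would accommodate, for example, an 802.1Q tag on the inner frame, although this message does not say why 1554 was chosen.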

If you source VXLAN packets from a different IP address then you can put it on a different interface and give it a different MTU - which appears to fit what you want much better.

Selecting different compute host IPs for admin (CONFIG_COMPUTE_HOSTS) and tunnel IPs would eliminate the need for policy routing and is also more suitable for scaling a VXLAN deployment across multiple independent L2 BC domains, but for that I'll need to resolve point 3. above -- pointers in that direction are much appreciated.


Thanks,
- Fredy

