Hello,

This will be a long e-mail presenting my findings on the $subject. I have been debugging this problem for 6 days and have pinpointed where it lies, but I haven't been able to fix it. I would really appreciate it if you could read through it. I hope this e-mail will serve as a reference on the mailing list for other people experiencing the same or a similar problem.
Unfortunately, I am stuck at fixing the problem. Any help is appreciated.

=== TL;DR ===

ICMP packets from the VM are fragmented normally and are seen as-is on the tap interface, but they are reassembled on the interfaces above tap (qbr, qvb, qvo). They never make it out of the compute node through the GRE tunnel.

Running "echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables" on the compute node causes fragmented packets to pass through as-is, without being reassembled on the interfaces above tap (qbr, qvb, etc.). They then make their way out of the GRE tunnel and reach the router network namespace; the namespace attempts to reply, but the reply never reaches the GRE tunnel on the network node.

Lowering the MTU of every interface in the router network namespace to 1454 (same as the VM) fixes the ICMP problem: an ICMP packet of any length gets from the VM to the router and vice versa. However, regular TCP connections still do not work. I see a lot of retransmissions; even a simple nc connection from the VM to the router namespace is unreliable, and iperf shows 23 KBit/s.

Here is the detailed problem description.

=== Symptoms ===

These are the symptoms I experience:

1- Ping works, but I cannot SSH into the VM.
2- I cannot download anything inside the VM; the connection is too slow.
3- A lot of TCP retransmissions occur.
4- The VM cannot communicate with the metadata server (maybe related to 2/3?).

=== Install and Infrastructure Information ===

I have followed the official Juno document step by step (and double-checked that I didn't mis-configure anything). Neutron is configured with ML2, Open vSwitch, and GRE, just as suggested in the documentation.

I have 3 physical machines with 2 NICs each (controller, network, and compute). The em1 interface is my management and data network and sits on a separate switch (10.20.0.0/24); the network and compute nodes communicate over GRE on this network. em2 is connected to another switch, which acts as the outside network (192.168.88.0/24). So the external and internal networks are physically separate.

The hosts run Ubuntu Server 14.04 with OpenStack Juno.
The kernel version is 3.13.0-32-generic, the Open vSwitch version is 2.0.1+git20140120-0ubuntu2, and KVM is used as the hypervisor.

The VMs have MTU 1454 configured via dnsmasq.conf, as written in the official Juno documentation (I checked this inside the VM as well). The VM network is 10.0.0.0/24. I have 1 VM for testing; it has the IP address 10.0.0.8, its router is 10.0.0.1, and the router's gateway address is 192.168.88.1. All the bridges (created by Neutron, agents, etc.) and network interfaces on the physical hosts have an MTU of 1500. GRO and TSO are off on the network and compute nodes (ethtool -K em1 gro/tso off).

=== Findings ===

I will omit what I tried to get here -it was a long way :(- and present the issue directly.

I realized that "ping -s 1430 10.0.0.1" inside the VM works, but "ping -s 1431 10.0.0.1" does not. I checked whether the same holds the other way around, from inside the network namespace for this network on the network node: "ip netns exec <qrouterxxxx> ping -s 1430 10.0.0.8" works, but -s 1431 does not.

1- Looking at the problem from the VM side, I ran tcpdump nearly all over the place. The problem appears to lie in the qbr/qvb/qvo bridges, as explained in [0]. When sending ICMP packets, they are fragmented as expected on the tap interface. However, the fragments are reassembled just after the tap interface, on qbrxxx, and carried reassembled all the way to qvbxxx/qvoxxx. As a result, the packet never gets out on the GRE tunnel.

2- Looking from the network namespace on the network node, I checked the MTU values of the interfaces: qrxxx and qgxxx have an MTU of 1500. I then ran tcpdump on qrxxx, where the packets to/from the VM should be seen, and on em1 (the management interface, where the GRE packets should be seen) on both the network and compute nodes. With "ping -s 1431 10.0.0.8", I saw that the packets were fragmented inside the network namespace. However, only the second fragment was seen in GRE (em1) and made its way out; the first one never appeared.
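As an aside, the 1430/1431 threshold is consistent with the GRE encapsulation overhead visible in the capture at the end of this mail: the outer GRE frame of length 1494 carries an inner IP packet of 1452 bytes, i.e. 42 bytes of overhead (20 outer IP + 8 GRE header with key + 14 inner Ethernet). A quick back-of-the-envelope check (my own arithmetic, not taken from any documentation):

```shell
# 42 bytes of GRE encapsulation overhead, as observed in the tcpdump below
# (a 1494-byte outer GRE frame carrying a 1452-byte inner IP packet).
OVERHEAD=42
PHYS_MTU=1500                                # MTU of em1 on both nodes
MAX_INNER_IP=$((PHYS_MTU - OVERHEAD))        # largest inner IP packet that fits
MAX_ICMP_DATA=$((MAX_INNER_IP - 20 - 8))     # minus inner IP header and ICMP header
echo "$MAX_INNER_IP $MAX_ICMP_DATA"          # prints "1458 1430"
```

So a ping payload of 1430 produces a 1458-byte inner packet that still fits into a 1500-byte GRE frame, while 1431 does not, which matches the observed threshold exactly.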
Since only one fragment arrived, the ping from the network namespace to the VM failed.

=== Attempts to Solve the Problem ===

I searched for the bridge fragmentation problem, and disabling "bridge-nf-call-iptables" was suggested. I ran the following command on the compute node:

echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables

With this setting, the fragmented packets from the VM pass as-is through tap and the other qvb/qvo interfaces; they are not reassembled. They make their way out of the GRE tunnel and reach the router namespace. The router namespace attempts to reply, but the reply packet never goes out through the GRE tunnel. This tcpdump is attached as "ping-from-vm-to-router-with-1431-bridge-nf-call-iptables-off.txt".

I suspected the MTU settings inside the router namespace, so I lowered the MTU inside the namespace to 1454 (same as the VM). With this setting, I can now ping the VM from the router namespace with -s 1431. Lowering the MTU values in the router namespace seems to fix the ICMP problem: pinging the router from the VM and the VM from the router works in both directions, and whatever value I pass to -s (2000, 3000, etc.), I get replies. It seemed that the problem was fixed.

However, when I tried making a regular connection, it failed. I set up an iperf server inside the network namespace; connecting from the VM to the router with iperf, I got 23.3 KBit/s. Giving up on iperf, I tried a simple netcat connection, and it was unreliable and slow. In tcpdump, I saw a lot of TCP retransmissions. The tcpdump of a simple netcat connection from the VM to the router namespace is attached ("nc -l -p 9999" was run in the router namespace); this dump was gathered on the tap interface of the VM on the compute node.

=== Summary ===

ICMP packets are OK with "bridge-nf-call-iptables" off on the compute node and with the MTU lowered inside the router namespace. However, regular TCP connections do not work, and I see lots of TCP retransmissions.
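For reference, here are the two workarounds described above collected in one place (run as root; the namespace and interface names are placeholders for my actual qrouter/qr/qg names):

```shell
# On the compute node: stop iptables from hooking bridged traffic,
# so fragments pass through qbr/qvb/qvo without being reassembled.
echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables

# On the network node: lower the MTU of every interface inside the
# router namespace to match the VM MTU (1454).
ip netns exec qrouter-xxxx ip link set qr-xxxx mtu 1454
ip netns exec qrouter-xxxx ip link set qg-xxxx mtu 1454
```

Note that the sysctl change is runtime-only and does not survive a reboot.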
I think disabling "bridge-nf-call-iptables" effectively disables security groups as well, so it doesn't look like the right fix.

Those are the findings I've gathered, and I haven't fixed the problem yet. For reference, here are the links I found while attacking this problem; they look similar:

http://lists.openstack.org/pipermail/openstack-dev/2014-January/024995.html
https://bugs.launchpad.net/fuel/+bug/1256289
https://bugs.launchpad.net/openstack-manuals/+bug/1322799

Thank you for reading this far. I appreciate it.

Regards,
Eren

[0] http://openvswitch.org/pipermail/discuss/2014-May/013964.html

--
System Administrator
https://skyatlas.com/
[Attachment: tap-interface-on-compute-from-vm-to-router-using-netcat.pcap (application/vnd.tcpdump.pcap)]
compute node, tap interface
===========================
15:30:21.604640 IP (tos 0x0, ttl 64, id 49494, offset 0, flags [+], proto ICMP (1), length 1452)
    10.0.0.8 > 10.0.0.1: ICMP echo request, id 1997, seq 1, length 1432
15:30:21.604709 IP (tos 0x0, ttl 64, id 49494, offset 1432, flags [none], proto ICMP (1), length 27)
    10.0.0.8 > 10.0.0.1: icmp

compute node, em1 interface, GRE tunnel
=======================================
15:30:21.604939 IP (tos 0x0, ttl 64, id 34168, offset 0, flags [DF], proto GRE (47), length 1494)
    compute1 > network: GREv0, Flags [key present], key=0x1, length 1474
    IP (tos 0x0, ttl 64, id 49494, offset 0, flags [+], proto ICMP (1), length 1452)
    10.0.0.8 > 10.0.0.1: ICMP echo request, id 1997, seq 1, length 1432
15:30:21.604951 IP (tos 0x0, ttl 64, id 34169, offset 0, flags [DF], proto GRE (47), length 69)
    compute1 > network: GREv0, Flags [key present], key=0x1, length 49
    IP (tos 0x0, ttl 64, id 49494, offset 1432, flags [none], proto ICMP (1), length 27)
    10.0.0.8 > 10.0.0.1: icmp

network node, em1 interface, GRE tunnel
=======================================
15:30:21.579658 IP (tos 0x0, ttl 64, id 34168, offset 0, flags [DF], proto GRE (47), length 1494)
    compute1 > network: GREv0, Flags [key present], key=0x1, length 1474
    IP (tos 0x0, ttl 64, id 49494, offset 0, flags [+], proto ICMP (1), length 1452)
    10.0.0.8 > 10.0.0.1: ICMP echo request, id 1997, seq 1, length 1432
15:30:21.579692 IP (tos 0x0, ttl 64, id 34169, offset 0, flags [DF], proto GRE (47), length 69)
    compute1 > network: GREv0, Flags [key present], key=0x1, length 49
    IP (tos 0x0, ttl 64, id 49494, offset 1432, flags [none], proto ICMP (1), length 27)
    10.0.0.8 > 10.0.0.1: icmp

network namespace on network node, qrxxx interface
==================================================
15:30:21.579937 IP (tos 0x0, ttl 64, id 49494, offset 1432, flags [none], proto ICMP (1), length 27)
    10.0.0.8 > 10.0.0.1: ip-proto-1
15:30:21.579952 IP (tos 0x0, ttl 64, id 49494, offset 0, flags [+], proto ICMP (1), length 1452)
    10.0.0.8 > 10.0.0.1: ICMP echo request, id 1997, seq 1, length 1432
15:30:21.579990 IP (tos 0x0, ttl 64, id 41204, offset 0, flags [none], proto ICMP (1), length 1459)
    10.0.0.1 > 10.0.0.8: ICMP echo reply, id 1997, seq 1, length 1439

The last reply doesn't make its way to the GRE tunnel. It is never seen on the em1 interface.
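One consistency check on the capture above (my own arithmetic, assuming the 42-byte encapsulation overhead observed in the request packets): the 1459-byte echo reply is below the namespace's 1500 MTU, so it leaves the namespace unfragmented, but once encapsulated it would exceed the physical MTU of em1, and the outer GRE packets carry the DF flag:

```shell
REPLY_IP_LEN=1459    # echo reply seen on qrxxx (1439 ICMP + 20 IP header)
GRE_OVERHEAD=42      # outer IP (20) + GRE with key (8) + inner Ethernet (14)
PHYS_MTU=1500        # em1 MTU
echo $((REPLY_IP_LEN + GRE_OVERHEAD - PHYS_MTU))   # prints "1": one byte too big
```

If that reading is right, the reply misses fitting into a GRE frame by a single byte, which would explain why it is never seen on em1.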