Hi, I have been observing strange TCP connection aborts on a host running OVS lately. The TCP connections are all localhost-only, so no external components can be blamed. tcpdump shows that TCP ACKs were missing, and there might have been some data corruption as well (hard to tell without a proper decoder).
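One way the corruption question could be settled without a protocol decoder is to push a known byte pattern through a loopback TCP connection and compare digests on both ends. The following is only a minimal sketch of that idea, assuming Python 3 is available on the host; the function name, payload size, and timeout are illustrative and not part of my actual setup.

```python
import hashlib
import socket
import threading


def loopback_integrity_check(total_bytes=1 << 20, chunk=65536, timeout=10.0):
    """Send a known pattern over 127.0.0.1 and compare SHA-256 on both ends."""
    payload = bytes(range(256)) * (chunk // 256)  # repeating 0..255 pattern
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    server.settimeout(timeout)
    port = server.getsockname()[1]
    result = {"received": 0, "digest": None, "stalled": False}

    def receiver():
        digest = hashlib.sha256()
        try:
            conn, _ = server.accept()
            conn.settimeout(timeout)
            with conn:
                while True:
                    data = conn.recv(chunk)
                    if not data:  # sender closed the connection
                        break
                    digest.update(data)
                    result["received"] += len(data)
        except socket.timeout:
            result["stalled"] = True  # no data for `timeout` seconds
        result["digest"] = digest.hexdigest()

    thread = threading.Thread(target=receiver)
    thread.start()

    sent = 0
    sent_digest = hashlib.sha256()
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        while sent < total_bytes:
            client.sendall(payload)
            sent_digest.update(payload)
            sent += len(payload)
    finally:
        client.close()
    thread.join()
    server.close()

    intact = (sent == result["received"]
              and sent_digest.hexdigest() == result["digest"])
    return {"sent": sent, "received": result["received"],
            "stalled": result["stalled"], "intact": intact}


if __name__ == "__main__":
    print(loopback_integrity_check())
```

Running this in a loop while the OVS load is applied would distinguish a stall (`stalled` set, byte counts differ) from corrupted payload delivered to the application (byte counts match, `intact` is false).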
The OVS instance is not configured to touch lo. After lots of debugging I have been able to find a correlation with an OVS instance on that host.

To reproduce the issue I run netperf on lo like this:

    # netperf -l 600 -D 1,second -H localhost

This reports a steady throughput of 48527.77 10^6bits/s on lo.

Then I push load through OVS. My OF controller creates one flow rule per TCP connection going through the switch. With about 100 new connections per second, this loads the 8 cores to about 50% each.

At some random point (mostly within the first 10 seconds of the test), CPU load drops to zero and netperf stalls. The kernel begins to spill out messages like this:

    grep : 1433 callbacks suppressed

With systemtap, I have traced this message to ip_finish_output2 in net/ipv4/ip_output.c. The skbs at that point have a destination IP of 0.0.0.0.

Combinations tested:

    openvswitch-1.11 on Linux 3.8.13
    openvswitch-2.3.0 on Linux 3.14.19
    openvswitch-git (2654cc338bfb413a6295078e3a7a8e1d4f67cbcc) on Linux 3.14.19

It seems that under this type of load openvswitch kills traffic through lo.

Any ideas on what to try next?

Andreas

--
Dipl. Inform. Andreas Schultz