On 18/2/2022 6:31 pm, Jake Yip wrote:
On 16/2/2022 6:38 pm, Daniel Alvarez wrote:
On 16 Feb 2022, at 06:02, Jake Yip via discuss
<ovs-discuss@openvswitch.org> wrote:
Hi all,
We are running VMs on OpenStack with OVN. We have an issue with
performance, tested using iperf3 with TCP. We get like ~300Kbits in
the problematic scenario, when normal traffic is around 1Gbps. Some
observations are:
* Only TCP is affected, not UDP iperf3 tests
* It only happens between some nodes, not others
* It only happens in one direction in some cases
We've looked into this and found that the poor performance may be due
to retransmission / congestion. Looking deeper, there seems to be
some interesting behaviour with fragmentation? / reassembly
Our architecture is like this:
VM23 - [TAP23 - BOND23] -- (internet) -- [BOND21 - TAP21] - VM21
The VMs run on hypervisors; on each hypervisor, traffic from the tap
devices egresses out through the bonds.
We have done tcpdumps from VM23, VM21, BOND21 and TAP21.
What we have found is that a PSH,ACK packet from VM23 is re-written
into an ACK packet when it gets to the bond.
When it gets to the other side, these packets don't seem to be
reassembled properly before being passed on through the tap into VM21.
We would like to know whether this behaviour (rewriting a PSH,ACK into
a separate ACK packet) is normal for OVS/OVN. Is there any other
reason why there are so many retransmissions?
I'm not sure if this is an OVN or OVS issue, apologies if this is not
the right list. I'm also not sure if I'm debugging this issue
correctly. Any help will be welcome!
Which NICs are you using?
Is this Geneve traffic? over VLAN?
If so, may you re run your tests by disabling the VLAN tx offload in
both hypervisors (w/ ethtool)?
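For reference, the toggles Daniel suggests can be flipped with ethtool
roughly like this (a sketch only; "eth0" is a placeholder for the real
uplink, and the exact feature names vary by driver, so check the
`ethtool -k` output first):

```shell
# Show the current VLAN offload settings (names vary by NIC/driver)
ethtool -k eth0 | grep -i vlan

# Disable VLAN TX (and, if desired, RX) offload on the physical NIC.
# "txvlan"/"rxvlan" are ethtool's short aliases for
# tx-vlan-offload / rx-vlan-offload.
sudo ethtool -K eth0 txvlan off rxvlan off
```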
Cheers,
Daniel
Thanks!
We tried disabling VLAN tx offload on the hypervisors; no difference.
We also tried turning off a few other offloads, with no difference
either.
Interestingly, the same VM/hypervisor only has problems reaching some
VMs, not others. If it were an offload issue, wouldn't it affect
traffic to all VMs equally?
I'm not sure if it is related to HW or SW, it is all very peculiar.
Regards,
Jake
Just to answer my own question, and close off this discussion - we have
found that it was caused by Generic Receive Offload (GRO).
We have GRE tunnels across the internet to connect different sites
together. An (iperf3) packet travelling across the tunnel looks like
(GRE (GENEVE (TCP)))
We found that on some tunnel nodes, GRO, when de-encapsulating the GRE
packet, joins two UDP Geneve packets (and the inner TCP data) together
but does not appear to generate a valid UDP checksum for the resulting
packet. This packet appears to be dropped further down the line,
causing massive packet loss in iperf3.
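A minimal sketch of why a stale checksum on a coalesced packet is fatal
(not from the thread; the payload bytes are made up, and this is just
the RFC 1071 one's-complement sum, which UDP checksums are built on):

```python
# Sketch: RFC 1071 Internet checksum, to illustrate why a UDP datagram
# stitched together by GRO must have its checksum recomputed.

def inet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# Two hypothetical Geneve datagram payloads before coalescing
seg_a = b"geneve-payload-1"
seg_b = b"geneve-payload-2"

# The checksum of the first datagram on its own...
ck_a = inet_checksum(seg_a)
# ...is not the checksum of the merged payload. If GRO coalesces the
# two segments but leaves the old checksum in the header, any later hop
# that validates UDP checksums will see a mismatch and drop the packet.
ck_merged = inet_checksum(seg_a + seg_b)
assert ck_a != ck_merged
```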
Turning off GRO resolves this issue. Thanks to everyone who pointed me
in the direction of offloads!
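For anyone hitting the same symptom, GRO can be checked and disabled
per interface with ethtool (again a sketch; the interface name is a
placeholder, and turning GRO off trades CPU for correctness on
high-throughput links):

```shell
# Check whether GRO is currently enabled on the tunnel-facing interface
ethtool -k eth0 | grep generic-receive-offload

# Turn GRO off on that interface
sudo ethtool -K eth0 gro off
```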
Regards,
Jake
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss