On 18/2/2022 6:31 pm, Jake Yip wrote:
On 16/2/2022 6:38 pm, Daniel Alvarez wrote:


On 16 Feb 2022, at 06:02, Jake Yip via discuss <ovs-discuss@openvswitch.org> wrote:

Hi all,

We are running VMs on OpenStack with OVN. We have an issue with performance, tested using iperf3 with TCP. We get around ~300 Kbit/s in the problematic scenario, when normal traffic is around 1 Gbit/s. Some observations are:

* Only TCP is affected, not UDP iperf3 tests
* It only happens between some nodes, not others
* It only happens in one direction in some cases
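
For reference, the tests were roughly of this shape (the server address is a placeholder, and the 30 s duration is arbitrary):

  # on the receiving VM
  iperf3 -s

  # TCP test from the sending VM
  iperf3 -c <server-ip> -t 30

  # UDP test for comparison
  iperf3 -c <server-ip> -u -b 1G -t 30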

We've looked into this and found that the poor performance may be due to retransmissions / congestion. Looking deeper, there seems to be some interesting behaviour around fragmentation / reassembly.

Our architecture is like this:

VM23 - [TAP23 - BOND23] -- (internet) -- [BOND21 - TAP21] - VM21

VMs are on hypervisors; on each hypervisor the tap devices egress via the bonds.

We have done tcpdumps from VM23 , VM21, BOND21 and TAP21.
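
The captures were along these lines (interface names as in the diagram above; 5201 is the default iperf3 port, and Geneve uses UDP port 6081):

  # inner traffic on the tap of the receiving hypervisor
  tcpdump -i tap21 -nn -s 0 -w tap21.pcap 'tcp port 5201'

  # encapsulated traffic on the bond of the same hypervisor
  tcpdump -i bond21 -nn -s 0 -w bond21.pcap 'udp port 6081'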

What we have found is that a PSH,ACK packet from VM23 is re-written into an ACK packet when it gets to the bond.

When they get to the other side, these packets don't seem to be reassembled properly before being passed on via the tap into VM21.

We would like to know whether this behaviour (rewriting a PSH,ACK into separate ACK packets) is normal behaviour for OVS/OVN. Is there any other reason why there are so many retransmissions?

I'm not sure if this is an OVN or OVS issue, apologies if this is not the right list. I'm also not sure if I'm debugging this issue correctly. Any help will be welcome!

Which NICs are you using?
Is this Geneve traffic? over VLAN?
If so, could you re-run your tests after disabling VLAN tx offload on both hypervisors (with ethtool)?
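
Something along these lines on each hypervisor (interface name is just an example):

  ethtool -k bond0 | grep vlan    # show current VLAN offload settings
  ethtool -K bond0 txvlan off     # disable VLAN tx offload (needs root)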

Cheers,
Daniel

Thanks!

We tried disabling VLAN tx offload on the hypervisors; no difference. We also tried turning off a few other offloads, with no difference either.

Interestingly, the same VM/hypervisor only has problems to some VMs, not others. If it were an offload issue, wouldn't it affect traffic to all VMs equally?

I'm not sure whether it is related to HW or SW; it is all very peculiar.

Regards,
Jake

Just to answer my own question and close off this discussion: we have found that it was caused by Generic Receive Offload (GRO).

We have GRE tunnels across the internet to connect different sites together. An (iperf3) packet travelling across the tunnel looks like:

 (GRE (GENEVE (TCP)))

We found that, on some tunnel nodes, GRO joins two UDP GENEVE packets (and the inner TCP data) together when de-encapsulating the GRE packet, but does not seem to generate a valid UDP checksum for the resulting packet. This packet appears to be dropped further down the line, causing massive packet loss in iperf3.
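
One way to see this is a verbose tcpdump on the receiving tunnel node; with -vv tcpdump verifies checksums and flags invalid ones as "bad udp cksum" (interface name is just an example; Geneve is UDP port 6081):

  tcpdump -i eth0 -nn -vv udp port 6081

(On the sending side this check is misleading, since checksum offload means locally captured outgoing packets often show bad checksums anyway.)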

Turning off GRO resolves this issue. Thanks to everyone who pointed me in the direction of offloads!
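
For anyone hitting the same problem, checking and disabling GRO on the tunnel-node interface looks roughly like this (interface name is just an example; the setting does not survive a reboot, so it needs to go into your network config to be permanent):

  ethtool -k eth0 | grep generic-receive-offload   # check whether GRO is on
  ethtool -K eth0 gro off                          # disable GRO (needs root)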

Regards,
Jake