Hi all,

The following patch introduced a regression in the Chelsio cxgb4 driver, causing port failures when running heavy TSO traffic:
commit 10d3be569243def8d92ac3722395ef5a59c504e6
Author: Eric Dumazet <[email protected]>
Date:   Thu Apr 21 10:55:23 2016 -0700

    tcp-tso: do not split TSO packets at retransmit time

    Linux TCP stack painfully segments all TSO/GSO packets before
    retransmits.

    This was fine back in the days when TSO/GSO were emerging, with
    their bugs, but we believe the dark age is over.

    Keeping big packets in write queues, but also in stack traversal
    has a lot of benefits.
     - Less memory overhead, because write queues have less skbs
     - Less cpu overhead at ACK processing.
     - Better SACK processing, as lot of studies mentioned how
       awful linux was at this ;)
     - Less cpu overhead to send the rtx packets
       (IP stack traversal, netfilter traversal, drivers...)
     - Better latencies in presence of losses.
     - Smaller spikes in fq like packet schedulers, as retransmits
       are not constrained by TCP Small Queues.

    1 % packet losses are common today, and at 100Gbit speeds, this
    translates to ~80,000 losses per second.
    Losses are often correlated, and we see many retransmit events
    leading to 1-MSS train of packets, at the time hosts are already
    under stress.

    Signed-off-by: Eric Dumazet <[email protected]>
    Acked-by: Yuchung Cheng <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

When the number of TCP retransmissions is high, the packet length coming from the stack does not seem to be correct, which causes our TSO module to get stuck. If I change segs back to 1 in __tcp_retransmit_skb(), traffic runs fine.

Please let us know if we are missing something.

Thanks,
Arjun.
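A minimal user-space sketch of the kind of length sanity check that could help narrow this down. It assumes the driver sees each TSO request as a total length, a header length, a gso_size (MSS) and a gso_segs count, and it flags requests whose payload does not split into the announced number of segments, using the same ceiling division the TCP stack uses when it sets the segment count (worth confirming against the kernel in use). The struct and function names are hypothetical, not cxgb4 or sk_buff code, and whether this particular mismatch is what stalls the TSO engine is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a TSO send request as a driver might see it.
 * Field names are illustrative only; this is not the sk_buff or
 * cxgb4 layout. */
struct tso_req {
    uint32_t total_len; /* headers + TCP payload, in bytes */
    uint32_t hdr_len;   /* Ethernet + IP + TCP header bytes */
    uint16_t gso_size;  /* MSS used to cut the payload into segments */
    uint16_t gso_segs;  /* segment count announced by the stack */
};

/* Number of segments a payload splits into when each segment carries
 * at most mss bytes: ceiling division, so a short tail counts as one. */
static uint32_t segs_for_payload(uint32_t payload_len, uint32_t mss)
{
    return (payload_len + mss - 1) / mss;
}

/* Returns true only if the request carries a payload, has a non-zero
 * MSS, and the announced segment count matches ceil(payload / MSS). */
static bool tso_req_consistent(const struct tso_req *r)
{
    if (r->gso_size == 0 || r->total_len <= r->hdr_len)
        return false;

    uint32_t payload = r->total_len - r->hdr_len;
    return segs_for_payload(payload, r->gso_size) == r->gso_segs;
}
```

For example, with 54 bytes of headers and an MSS of 1448, a request carrying 2 * 1448 + 100 payload bytes must announce 3 segments; the same header and MSS with only 1448 payload bytes but gso_segs = 3 would be flagged. Logging the four fields for every request that fails this check, with and without the segs change in __tcp_retransmit_skb(), would show whether the stack is handing the driver inconsistent lengths or whether the hardware is rejecting consistent but larger retransmits.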
