Hi all, 

The following patch introduced a regression in the Chelsio cxgb4 driver, causing
port failures under heavy TSO traffic:

commit 10d3be569243def8d92ac3722395ef5a59c504e6
Author: Eric Dumazet <[email protected]>
Date:   Thu Apr 21 10:55:23 2016 -0700

    tcp-tso: do not split TSO packets at retransmit time

    Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

    This was fine back in the days when TSO/GSO were emerging, with their
    bugs, but we believe the dark age is over.

    Keeping big packets in write queues, but also in stack traversal
    has a lot of benefits.
     - Less memory overhead, because write queues have less skbs
     - Less cpu overhead at ACK processing.
     - Better SACK processing, as lot of studies mentioned how
       awful linux was at this ;)
     - Less cpu overhead to send the rtx packets
       (IP stack traversal, netfilter traversal, drivers...)
     - Better latencies in presence of losses.
     - Smaller spikes in fq like packet schedulers, as retransmits
       are not constrained by TCP Small Queues.

    1 % packet losses are common today, and at 100Gbit speeds, this
    translates to ~80,000 losses per second.
    Losses are often correlated, and we see many retransmit events
    leading to 1-MSS train of packets, at the time hosts are already
    under stress.

    Signed-off-by: Eric Dumazet <[email protected]>
    Acked-by: Yuchung Cheng <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

When the number of TCP retransmissions is high, the packet length handed down
by the stack does not seem to be correct, and our TSO module gets stuck as a
result. If I change segs back to 1 in __tcp_retransmit_skb(), traffic runs fine.
Please let us know if we are missing something.

Thanks,
Arjun.
