On Tue, 2018-02-20 at 07:39 -0800, Eric Dumazet wrote: > On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote: > > Hi. > > > > 19.02.2018 20:56, Eric Dumazet wrote: > > > Switching TCP to GSO mode, relying on core networking layers > > > to perform eventual adaptation for dumb devices was overdue. > > > > > > 1) Most TCP developments are done with TSO in mind. > > > 2) Less high-resolution timers needs to be armed for TCP-pacing > > > 3) GSO can benefit of xmit_more hint > > > 4) Receiver GRO is more effective (as if TSO was used for real on > > > sender) > > > -> less ACK packets and overhead. > > > 5) Write queues have less overhead (one skb holds about 64KB of > > > payload) > > > 6) SACK coalescing just works. (no payload in skb->head) > > > 7) rtx rb-tree contains less packets, SACK is cheaper. > > > 8) Removal of legacy code. Less maintenance hassles. > > > > > > Note that I have left the sendpage/zerocopy paths, but they probably > > > can > > > benefit from the same strategy. > > > > > > Thanks to Oleksandr Natalenko for reporting a performance issue for > > > BBR/fq_codel, > > > which was the main reason I worked on this patch series. > > > > Thanks for dealing with this that fast. > > > > Does this mean that the option to optimise internal TCP pacing is still > > an open question? > > It is not an optimization that is needed, but taking into account that > highres timers can have latencies of ~2 usec or more. > > When sending 64KB TSO packets, having extra 2 usec after every ~54 usec > (at 10Gbit) has no big impact, since TCP computes a slightly inflated > pacing rate anyway. > > But when sending one MSS/packet every usec, this definitely can > demonstrate a big slowdown. > > But the anser is yes, I will take a look at this timer drift.
Actually timer drifts are not horrible (at least on my lab hosts) But BBR has a pessimistic way to sense the burst size, as it is tied to TSO/GSO being there. Following patch helps a lot. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now, */ segs = max_t(u32, bytes / mss_now, min_tso_segs); - return min_t(u32, segs, sk->sk_gso_max_segs); + return segs; } EXPORT_SYMBOL(tcp_tso_autosize); @@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now) const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops; u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0; - return tso_segs ? : - tcp_tso_autosize(sk, mss_now, - sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs); + if (!tso_segs) + tso_segs = tcp_tso_autosize(sk, mss_now, + sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs); + return min_t(u32, tso_segs, sk->sk_gso_max_segs); } /* Returns the portion of skb which can be sent right away */