From: Eric Dumazet <eduma...@google.com>
Date: Sat,  3 Dec 2016 11:14:49 -0800

> Under very high TX stress, the CPU handling NIC TX completions can spend
> a considerable number of cycles in TSQ (TCP Small Queues) logic.
> 
> This patch series avoids some atomic operations, but the most notable
> patch is the 3rd one, which allows other CPUs that process ACK packets
> and call tcp_write_xmit() to grab TCP_TSQ_DEFERRED, so that
> tcp_tasklet_func() can skip already processed sockets.
> 
> This avoids lots of lock acquisitions and cache line accesses,
> particularly under load.
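The shortcut in the 3rd patch rests on an atomic test-and-clear of a per-socket flag: whichever context clears the deferred bit first owns the pending transmit work, and the tasklet can skip any socket whose bit is already clear. Below is a minimal user-space sketch of that idea using C11 atomics, assuming a single flag word per socket. The struct, the MODEL_TSQ_DEFERRED macro and every model_* function are hypothetical stand-ins for illustration, not the kernel's TCP_TSQ_DEFERRED handling in tcp_write_xmit() or tcp_tasklet_func().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for TCP_TSQ_DEFERRED. */
#define MODEL_TSQ_DEFERRED 1u

struct model_sock {
    atomic_uint tsq_flags;  /* hypothetical per-socket TSQ flag word */
    int xmit_runs;          /* how many times the transmit path ran */
};

/* TX completion: mark the socket as having deferred transmit work. */
static void model_tx_completion(struct model_sock *sk)
{
    atomic_fetch_or(&sk->tsq_flags, MODEL_TSQ_DEFERRED);
}

/* Atomic test-and-clear: only the caller that actually clears the bit
 * gets true, so the pending work has exactly one owner. */
static bool model_grab_deferred(struct model_sock *sk)
{
    unsigned int old = atomic_fetch_and(&sk->tsq_flags, ~MODEL_TSQ_DEFERRED);
    return (old & MODEL_TSQ_DEFERRED) != 0;
}

/* ACK processing on another CPU transmits anyway, so it claims the
 * deferred flag on the way and the tasklet has nothing left to do. */
static void model_ack_path_xmit(struct model_sock *sk)
{
    (void)model_grab_deferred(sk);
    sk->xmit_runs++;
}

/* Tasklet: transmit only if the flag is still set, otherwise skip. */
static void model_tasklet(struct model_sock *sk)
{
    if (model_grab_deferred(sk))
        sk->xmit_runs++;
}

/* Completion, then an ACK-driven xmit, then the tasklet.
 * Returns the number of transmit runs: 1, because the tasklet skips. */
static int model_scenario(void)
{
    struct model_sock sk;

    atomic_init(&sk.tsq_flags, 0u);
    sk.xmit_runs = 0;
    model_tx_completion(&sk);
    model_ack_path_xmit(&sk);
    assert(atomic_load(&sk.tsq_flags) == 0u); /* flag already claimed */
    model_tasklet(&sk);
    return sk.xmit_runs;
}
```

In this model, the tasklet's per-socket cost for an already serviced socket shrinks to one atomic read-modify-write, which is the kind of saving the paragraph above attributes to the patch; the real kernel code differs in its details.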
> 
> In v2, I added:
> 
> - A tcp_small_queue_check() change to allow the 1st and 2nd packets
>   in the write queue to be sent, even if TX completion of
>   already acknowledged packets has not happened yet.
>   This helps when TX completion coalescing parameters are set
>   to even insane values, and/or busy polling is used.
> 
> - A reorganization of struct sock fields to
>   lower false sharing and increase data locality.
> 
> - I also moved tsq_flags from tcp_sock to struct sock
>   to reduce cache line misses during TX completions.
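The tcp_small_queue_check() change described in the first item can be pictured as a byte-budget check with an exemption for the head of the write queue. The sketch below is a simplified user-space model, not the kernel function: struct model_tx, small_queue_throttle(), the byte counter and the idea of passing the packet's write-queue position as an index are all hypothetical simplifications made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified TSQ state for one socket. */
struct model_tx {
    size_t queued_bytes;  /* bytes handed to the NIC, TX completion pending */
    size_t limit;         /* assumed per-socket TSQ byte budget */
};

/* Returns true if the packet at position idx (0-based) in the write queue
 * should be held back until TX completions free some budget. */
static bool small_queue_throttle(const struct model_tx *tx, size_t idx)
{
    if (tx->queued_bytes <= tx->limit)
        return false;  /* under budget: send now */
    if (idx < 2)
        return false;  /* v2 behaviour: always let the 1st and 2nd packets go */
    return true;       /* over budget: wait for a TX completion */
}
```

The reasoning behind the exemption, as the item above puts it: if a packet is 1st or 2nd in the write queue, the data before it has already been acknowledged, so the bytes still counted against the budget belong to acknowledged packets whose TX completions are merely delayed (for example by aggressive interrupt coalescing or busy polling), and waiting for them would only stall the flow.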
> 
> I measured an overall throughput gain of 22% for heavy TCP use
> over a single TX queue.

Looks fantastic, series applied, thanks Eric.
