From: Eric Dumazet <eduma...@google.com>
Date: Sat, 3 Dec 2016 11:14:49 -0800

> Under very high TX stress, CPU handling NIC TX completions can spend
> considerable amount of cycles handling TSQ (TCP Small Queues) logic.
>
> This patch series avoids some atomic operations, but most notable
> patch is the 3rd one, allowing other cpus processing ACK packets and
> calling tcp_write_xmit() to grab TCP_TSQ_DEFERRED so that
> tcp_tasklet_func() can skip already processed sockets.
>
> This avoid lots of lock acquisitions and cache lines accesses,
> particularly under load.
>
> In v2, I added :
>
> - tcp_small_queue_check() change to allow 1st and 2nd packets
>   in write queue to be sent, even in the case TX completion of
>   already acknowledged packets did not happen yet.
>   This helps when TX completion coalescing parameters are set
>   even to insane values, and/or busy polling is used.
>
> - A reorganization of struct sock fields to
>   lower false sharing and increase data locality.
>
> - Then I moved tsq_flags from tcp_sock to struct sock also
>   to reduce cache line misses during TX completions.
>
> I measured an overall throughput gain of 22 % for heavy TCP use
> over a single TX queue.

Looks fantastic, series applied, thanks Eric.
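
For anyone reading along, the idea behind the 3rd patch (another CPU grabbing TCP_TSQ_DEFERRED so that tcp_tasklet_func() can skip the socket) can be sketched in user-space C11. This is only an illustration, not the kernel code: struct fake_sock, FAKE_TSQ_DEFERRED and every fake_* function are hypothetical names made up for the example, and C11 atomics stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TCP_TSQ_DEFERRED bit from the cover letter. */
#define FAKE_TSQ_DEFERRED (1UL << 0)

/* Hypothetical miniature socket; not the kernel's struct sock. */
struct fake_sock {
    atomic_ulong tsq_flags;  /* flag word shared between CPUs */
    int xmit_cycles;         /* counts simulated transmit cycles */
};

/* TX completion path: mark the socket as needing deferred transmit work. */
static void fake_tx_completion(struct fake_sock *sk)
{
    atomic_fetch_or(&sk->tsq_flags, FAKE_TSQ_DEFERRED);
}

/* Atomically clear the bit; true only for the caller that found it set. */
static bool fake_grab_deferred(struct fake_sock *sk)
{
    unsigned long old = atomic_fetch_and(&sk->tsq_flags, ~FAKE_TSQ_DEFERRED);
    return (old & FAKE_TSQ_DEFERRED) != 0;
}

/* ACK path: it transmits anyway, so it takes the deferred flag with it. */
static void fake_ack_path(struct fake_sock *sk)
{
    (void)fake_grab_deferred(sk);
    sk->xmit_cycles++;
}

/* Deferred worker: skips the socket if someone else already grabbed the flag. */
static bool fake_tasklet(struct fake_sock *sk)
{
    if (!fake_grab_deferred(sk))
        return false;        /* already processed elsewhere, nothing to do */
    sk->xmit_cycles++;
    return true;
}

/* Scenario: completion defers work, ACK path runs first, worker then skips. */
static int demo_xmit_cycles(void)
{
    struct fake_sock sk = { .xmit_cycles = 0 };
    atomic_init(&sk.tsq_flags, 0);

    fake_tx_completion(&sk);
    fake_ack_path(&sk);
    bool worked = fake_tasklet(&sk);
    assert(!worked);         /* the worker found the flag already cleared */
    return sk.xmit_cycles;
}
```

In this toy model the worker returns without doing anything when the ACK path has already cleared the flag, which is the skip the cover letter describes. The real code also deals with socket locking and ownership, which this sketch leaves out.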