On Thu, Nov 3, 2016 at 9:37 AM, De Schepper, Koen (Nokia - BE) <koen.de_schep...@nokia-bell-labs.com> wrote:
>
> Hi,
>
> We experience some limit on the maximum packets in flight which seems not to
> be related to the receive or write buffers. Does somebody know if there is
> an issue with a maximum of around 1 MByte (or sometimes 2 MByte) of data in
> flight per TCP flow?
does not ring a bell. I've definitely seen cubic reaching a >2MB cwnd
(inflight); a packet trace would help. btw, tcp_rmem is the maximum receive
buffer including all header and control overhead; the receive window
announced is (very roughly) half of your rcvbuf.

> It seems to be a strict and stable limit independent of the CC (tested with
> Cubic, Reno and DCTCP). On a link of 200 Mbps and 200 ms RTT our link is only
> 20% (sometimes 40%, see conditions below) utilized for a single TCP flow,
> with no drop experienced at all (no bottleneck in the AQM or RTT emulation,
> as it supports more throughput if multiple flows are active).
>
> Some configuration changes we already tried on both client and server (kernel
> 3.18.9):
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 87380 6291456
> net.ipv4.tcp_wmem = 4096 16384 4194304
>
> SERVER# ss -i
> tcp ESTAB 0 1049728 10.187.255.211:46642 10.187.16.194:ssh
>     dctcp wscale:7,7 rto:408 rtt:204.333/0.741 ato:40 mss:1448 cwnd:1466
>     send 83.1Mbps unacked:728 rcv_rtt:212 rcv_space:29200
> CLIENT# ss -i
> tcp ESTAB 0 288 10.187.16.194:ssh 10.187.255.211:46642
>     dctcp wscale:7,7 rto:404 rtt:203.389/0.213 ato:40 mss:1448 cwnd:78
>     send 4.4Mbps unacked:8 rcv_rtt:204 rcv_space:1074844
>
> When increasing the write and receive mem further (they were already way
> above 1 or 2 MB), it steps to double (40%; 2 MBytes in flight):
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 8000000 16291456
> net.ipv4.tcp_wmem = 4096 8000000 16291456
>
> SERVER# ss -i
> tcp ESTAB 0 2068976 10.187.255.212:54637 10.187.16.112:ssh
>     cubic wscale:8,8 rto:404 rtt:202.622/0.061 ato:40 mss:1448 cwnd:1849
>     ssthresh:1140 send 105.7Mbps unacked:1457 rcv_rtt:217.5 rcv_space:29200
> CLIENT# ss -i
> tcp ESTAB 0 648 10.187.16.112:ssh 10.187.255.212:54637
>     cubic wscale:8,8 rto:404 rtt:201.956/0.038 ato:40 mss:1448 cwnd:132
>     send 7.6Mbps unacked:18 rcv_rtt:204 rcv_space:2093044
>
> Further increasing (x10) does not help anymore...
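As a back-of-the-envelope check of the quoted figures (a sketch using only the 200 Mbps / 200 ms numbers reported above, not part of the thread's measurements): the bandwidth-delay product of that path is 5 MB, so a flow capped at 1 MB or 2 MB in flight would land at exactly the 20% / 40% utilization being reported.

```python
# Bandwidth-delay product for the path described above: how many bytes
# must be in flight to keep a 200 Mbps, 200 ms RTT link fully utilized.
link_bps = 200e6   # 200 Mbps link rate, bits per second
rtt_s = 0.200      # 200 ms round-trip time

bdp_bytes = link_bps * rtt_s / 8   # bits-in-flight converted to bytes
print(f"BDP: {bdp_bytes / 1e6:.1f} MB")  # -> BDP: 5.0 MB

# The two in-flight ceilings observed in the thread:
for inflight in (1e6, 2e6):
    print(f"{inflight / 1e6:.0f} MB in flight -> "
          f"{inflight / bdp_bytes:.0%} utilization")
# -> 1 MB in flight -> 20% utilization
# -> 2 MB in flight -> 40% utilization
```

The match with the reported 20%/40% figures is consistent with a hard 1 MB / 2 MB in-flight cap rather than loss-driven congestion control.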
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 80000000 162914560
> net.ipv4.tcp_wmem = 4096 80000000 162914560
>
> As all these parameters autotune, it is hard to find out which one is
> limiting... In the examples above, unacked does not want to go higher, while
> the congestion window on the server is big enough... rcv_space could be
> limiting, but it tunes up if I change the server to the higher buffers
> (switching to 2 MByte in flight).
>
> We also tried tcp_limit_output_bytes, setting it bigger (x10) and
> smaller (/10), without effect. We've put it in /etc/sysctl.conf and rebooted,
> to make sure that it is effective.
>
> Some more detailed tests that had an effect on the 1 or 2 MByte limit:
> - It seems that with TSO off, if we configure a bigger wmem buffer, an
>   ongoing flow is suddenly able to immediately double its bytes-in-flight
>   limit. We configured further, up to more than 10x the buffer, but no
>   further increase helps, and the only limits we saw are 1 MByte and 2 MByte
>   (no intermediate values, whatever the parameters). When setting tcp_wmem
>   smaller again, the 2 MByte limit stays on the ongoing flow. We have to
>   restart the flow to make the buffer reduction to 1 MByte effective.
> - With TSO on, only the 2 MByte limit is effective, independent of the wmem
>   buffer. We have to restart the flow to make a TSO change effective.
>
> Koen.
>
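To see which limit binds in the first `ss -i` sample quoted above, one can compare mss x unacked and mss x cwnd against the peer's rcv_space (an illustrative calculation from the reported fields, not an analysis given in the thread):

```python
# Fields taken from the first ss -i sample in the thread (DCTCP run).
mss = 1448                 # bytes per segment (server side)
unacked = 728              # segments currently in flight (server side)
cwnd = 1466                # congestion window in segments (server side)
client_rcv_space = 1074844 # receive-window estimate reported by the client

inflight_bytes = mss * unacked  # bytes actually in flight
cwnd_bytes = mss * cwnd         # what cwnd alone would allow

print(f"in flight: {inflight_bytes}")   # -> in flight: 1054144 (~1 MB)
print(f"cwnd:      {cwnd_bytes}")       # -> cwnd:      2122768 (~2 MB)
print(f"rcv_space: {client_rcv_space}")
# In-flight bytes sit well below what cwnd permits but very close to the
# client's rcv_space, which would point toward the receiver's advertised
# window (rcvbuf autotuning) rather than cwnd as the binding limit here.
```

This lines up with the remark earlier in the thread that the announced receive window is roughly half of the rcvbuf, so the receiver-side buffers remain a suspect despite being configured well above 1 MB.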