On Thu, Nov 3, 2016 at 9:37 AM, De Schepper, Koen (Nokia - BE) <koen.de_schep...@nokia-bell-labs.com> wrote:
>
> Hi,
>
> We experience some limit on the maximum packets in flight which seems not to
> be related to the receive or write buffers. Does somebody know if there is
> an issue with a maximum of around 1 MByte (or sometimes 2 MByte) of data in
> flight per TCP flow?
does not ring a bell. I've definitely seen cubic reaching a >2MB cwnd
(inflight); a packet trace would help. btw, tcp_rmem is the maximum receive
buffer including all header and control overhead; the receive window
announced is (very roughly) half of your rcvbuf.

> It seems to be a strict and stable limit independent of the CC (tested with
> Cubic, Reno and DCTCP). On a link of 200 Mbps and 200 ms RTT our link is only
> 20% (sometimes 40%, see conditions below) utilized for a single TCP flow,
> with no drop experienced at all (no bottleneck in the AQM or RTT emulation,
> as it supports more throughput if multiple flows are active).
>
> Some configuration changes we already tried on both client and server (kernel
> 3.18.9):
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 87380 6291456
> net.ipv4.tcp_wmem = 4096 16384 4194304
>
> SERVER# ss -i
> tcp ESTAB 0 1049728 10.187.255.211:46642 10.187.16.194:ssh
>     dctcp wscale:7,7 rto:408 rtt:204.333/0.741 ato:40 mss:1448 cwnd:1466
>     send 83.1Mbps unacked:728 rcv_rtt:212 rcv_space:29200
> CLIENT# ss -i
> tcp ESTAB 0 288 10.187.16.194:ssh 10.187.255.211:46642
>     dctcp wscale:7,7 rto:404 rtt:203.389/0.213 ato:40 mss:1448 cwnd:78
>     send 4.4Mbps unacked:8 rcv_rtt:204 rcv_space:1074844
>
> When increasing the write and receive mem further (they were already way
> above 1 or 2 MB), it steps to double (40%; 2 MBytes in flight):
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 8000000 16291456
> net.ipv4.tcp_wmem = 4096 8000000 16291456
>
> SERVER# ss -i
> tcp ESTAB 0 2068976 10.187.255.212:54637 10.187.16.112:ssh
>     cubic wscale:8,8 rto:404 rtt:202.622/0.061 ato:40 mss:1448 cwnd:1849
>     ssthresh:1140 send 105.7Mbps unacked:1457 rcv_rtt:217.5 rcv_space:29200
> CLIENT# ss -i
> tcp ESTAB 0 648 10.187.16.112:ssh 10.187.255.212:54637
>     cubic wscale:8,8 rto:404 rtt:201.956/0.038 ato:40 mss:1448 cwnd:132
>     send 7.6Mbps unacked:18 rcv_rtt:204 rcv_space:2093044
>
> Further increasing (x10) does not help anymore...
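As a back-of-the-envelope check of the quoted figures (a sketch using only the 200 Mbps / 200 ms numbers reported above, not part of the thread's measurements): the bandwidth-delay product of that path is 5 MB, so a flow capped at 1 MB or 2 MB in flight would land at exactly the 20% / 40% utilization being reported.

```python
# Bandwidth-delay product for the path described above: how many bytes
# must be in flight to keep a 200 Mbps, 200 ms RTT link fully utilized.
link_bps = 200e6   # 200 Mbps link rate, bits per second
rtt_s = 0.200      # 200 ms round-trip time

bdp_bytes = link_bps * rtt_s / 8   # bits-in-flight converted to bytes
print(f"BDP: {bdp_bytes / 1e6:.1f} MB")  # -> BDP: 5.0 MB

# The two in-flight ceilings observed in the thread:
for inflight in (1e6, 2e6):
    print(f"{inflight / 1e6:.0f} MB in flight -> "
          f"{inflight / bdp_bytes:.0%} utilization")
# -> 1 MB in flight -> 20% utilization
# -> 2 MB in flight -> 40% utilization
```

The match with the reported 20%/40% figures is consistent with a hard 1 MB / 2 MB in-flight cap rather than loss-driven congestion control.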
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 80000000 162914560
> net.ipv4.tcp_wmem = 4096 80000000 162914560
>
> As all these parameters autotune, it is hard to find out which one is
> limiting... In the examples above, unacked does not want to go higher, while
> the congestion window on the server is big enough... rcv_space could be
> limiting, but it tunes up if I change the server to the higher buffers
> (switching to 2 MByte in flight).
>
> We also tried tcp_limit_output_bytes, setting it bigger (x10) and
> smaller (/10), without effect. We've put it in /etc/sysctl.conf and rebooted,
> to make sure that it is effective.
>
> Some more detailed tests that had an effect on the 1 or 2 MByte limit:
> - It seems that with TSO off, if we configure a bigger wmem buffer, an
>   ongoing flow is suddenly able to immediately double its bytes-in-flight
>   limit. We configured further, up to more than 10x the buffer, but no
>   further increase helps, and the only limits we saw are 1 MByte and 2 MByte
>   (no intermediate values, whatever the parameters). When setting tcp_wmem
>   smaller again, the 2 MByte limit stays on the ongoing flow. We have to
>   restart the flow to make the buffer reduction to 1 MByte effective.
> - With TSO on, only the 2 MByte limit is effective, independent of the wmem
>   buffer. We have to restart the flow to make a TSO change effective.
>
> Koen.
>
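To see which limit binds in the first `ss -i` sample quoted above, one can compare mss x unacked and mss x cwnd against the peer's rcv_space (an illustrative calculation from the reported fields, not an analysis given in the thread):

```python
# Fields taken from the first ss -i sample in the thread (DCTCP run).
mss = 1448                 # bytes per segment (server side)
unacked = 728              # segments currently in flight (server side)
cwnd = 1466                # congestion window in segments (server side)
client_rcv_space = 1074844 # receive-window estimate reported by the client

inflight_bytes = mss * unacked  # bytes actually in flight
cwnd_bytes = mss * cwnd         # what cwnd alone would allow

print(f"in flight: {inflight_bytes}")   # -> in flight: 1054144 (~1 MB)
print(f"cwnd:      {cwnd_bytes}")       # -> cwnd:      2122768 (~2 MB)
print(f"rcv_space: {client_rcv_space}")
# In-flight bytes sit well below what cwnd permits but very close to the
# client's rcv_space, which would point toward the receiver's advertised
# window (rcvbuf autotuning) rather than cwnd as the binding limit here.
```

This lines up with the remark earlier in the thread that the announced receive window is roughly half of the rcvbuf, so the receiver-side buffers remain a suspect despite being configured well above 1 MB.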