On Sun, Jul 28, 2019 at 5:14 PM Josh Hunt <joh...@akamai.com> wrote: > > On 7/28/19 6:54 AM, Eric Dumazet wrote: > > On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt <joh...@akamai.com> wrote: > >> > >> On 7/27/19 12:05 AM, Eric Dumazet wrote: > >>> On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt <joh...@akamai.com> wrote: > >>>> > >>>> The current implementation of TCP MTU probing can considerably > >>>> underestimate the MTU on lossy connections allowing the MSS to get down > >>>> to > >>>> 48. We have found that in almost all of these cases on our networks these > >>>> paths can handle much larger MTUs meaning the connections are being > >>>> artificially limited. Even though TCP MTU probing can raise the MSS back > >>>> up > >>>> we have seen this not to be the case causing connections to be "stuck" > >>>> with > >>>> an MSS of 48 when heavy loss is present. > >>>> > >>>> Prior to pushing out this change we could not keep TCP MTU probing > >>>> enabled > >>>> b/c of the above reasons. Now with a reasonble floor set we've had it > >>>> enabled for the past 6 months. > >>> > >>> And what reasonable value have you used ??? > >> > >> Reasonable for some may not be reasonable for others hence the new > >> sysctl :) We're currently running with a fairly high value based off of > >> the v6 min MTU minus headers and options, etc. We went conservative with > >> our setting initially as it seemed a reasonable first step when > >> re-enabling TCP MTU probing since with no configurable floor we saw a # > >> of cases where connections were using severely reduced mss b/c of loss > >> and not b/c of actual path restriction. I plan to reevaluate the setting > >> at some point, but since the probing method is still the same it means > >> the same clients who got stuck with mss of 48 before will land at > >> whatever floor we set. Looking forward we are interested in trying to > >> improve TCP MTU probing so it does not penalize clients like this. > >> > >> A suggestion for a more reasonable floor default would be 512, which is > >> the same as the min_pmtu. Given both mechanisms are trying to achieve > >> the same goal it seems like they should have a similar min/floor. > >> > >>> > >>>> > >>>> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives > >>>> administrators the ability to control the floor of MSS probing. > >>>> > >>>> Signed-off-by: Josh Hunt <joh...@akamai.com> > >>>> --- > >>>> Documentation/networking/ip-sysctl.txt | 6 ++++++ > >>>> include/net/netns/ipv4.h | 1 + > >>>> net/ipv4/sysctl_net_ipv4.c | 9 +++++++++ > >>>> net/ipv4/tcp_ipv4.c | 1 + > >>>> net/ipv4/tcp_timer.c | 2 +- > >>>> 5 files changed, 18 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/Documentation/networking/ip-sysctl.txt > >>>> b/Documentation/networking/ip-sysctl.txt > >>>> index df33674799b5..49e95f438ed7 100644 > >>>> --- a/Documentation/networking/ip-sysctl.txt > >>>> +++ b/Documentation/networking/ip-sysctl.txt > >>>> @@ -256,6 +256,12 @@ tcp_base_mss - INTEGER > >>>> Path MTU discovery (MTU probing). If MTU probing is enabled, > >>>> this is the initial MSS used by the connection. > >>>> > >>>> +tcp_mtu_probe_floor - INTEGER > >>>> + If MTU probing is enabled this caps the minimum MSS used for > >>>> search_low > >>>> + for the connection. > >>>> + > >>>> + Default : 48 > >>>> + > >>>> tcp_min_snd_mss - INTEGER > >>>> TCP SYN and SYNACK messages usually advertise an ADVMSS option, > >>>> as described in RFC 1122 and RFC 6691. > >>>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > >>>> index bc24a8ec1ce5..c0c0791b1912 100644 > >>>> --- a/include/net/netns/ipv4.h > >>>> +++ b/include/net/netns/ipv4.h > >>>> @@ -116,6 +116,7 @@ struct netns_ipv4 { > >>>> int sysctl_tcp_l3mdev_accept; > >>>> #endif > >>>> int sysctl_tcp_mtu_probing; > >>>> + int sysctl_tcp_mtu_probe_floor; > >>>> int sysctl_tcp_base_mss; > >>>> int sysctl_tcp_min_snd_mss; > >>>> int sysctl_tcp_probe_threshold; > >>>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > >>>> index 0b980e841927..59ded25acd04 100644 > >>>> --- a/net/ipv4/sysctl_net_ipv4.c > >>>> +++ b/net/ipv4/sysctl_net_ipv4.c > >>>> @@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = { > >>>> .extra2 = &tcp_min_snd_mss_max, > >>>> }, > >>>> { > >>>> + .procname = "tcp_mtu_probe_floor", > >>>> + .data = > >>>> &init_net.ipv4.sysctl_tcp_mtu_probe_floor, > >>>> + .maxlen = sizeof(int), > >>>> + .mode = 0644, > >>>> + .proc_handler = proc_dointvec_minmax, > >>>> + .extra1 = &tcp_min_snd_mss_min, > >>>> + .extra2 = &tcp_min_snd_mss_max, > >>>> + }, > >>>> + { > >>>> .procname = "tcp_probe_threshold", > >>>> .data = > >>>> &init_net.ipv4.sysctl_tcp_probe_threshold, > >>>> .maxlen = sizeof(int), > >>>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > >>>> index d57641cb3477..e0a372676329 100644 > >>>> --- a/net/ipv4/tcp_ipv4.c > >>>> +++ b/net/ipv4/tcp_ipv4.c > >>>> @@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net) > >>>> net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS; > >>>> net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD; > >>>> net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL; > >>>> + net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS; > >>>> > >>>> net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME; > >>>> net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES; > >>>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > >>>> index c801cd37cc2a..dbd9d2d0ee63 100644 > >>>> --- a/net/ipv4/tcp_timer.c > >>>> +++ b/net/ipv4/tcp_timer.c > >>>> @@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct > >>>> inet_connection_sock *icsk, struct sock *sk) > >>>> } else { > >>>> mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) > >>>> >> 1; > >>>> mss = min(net->ipv4.sysctl_tcp_base_mss, mss); > >>>> - mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len); > >>>> + mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor); > >>>> mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss); > >>>> icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); > >>>> } > >>> > >>> > >>> Existing sysctl should be enough ? > >> > >> I don't think so. Changing tcp_min_snd_mss could impact clients that > >> really want/need a small mss. When you added the new sysctl I tried to > >> analyze the mss values we're seeing to understand what we could possibly > >> raise it to. While not a huge amount, we see more clients than I > >> expected announcing mss values in the 180-512 range. Given that I would > >> not feel comfortable setting tcp_min_snd_mss to say 512 as I suggested > >> above. > > > > If these clients need mss values in 180-512 ranges, how MTU probing > > would work for them, > > if you set a floor to 512 ? > > First, we already seem to be fine with ignoring these paths with ICMP > based PMTU discovery b/c of our min_pmtu default of 512 and that is > configurable. Second by adding this sysctl we're giving administrators > the choice to decide if they'd like to attempt to support these very > very small # of paths which may be below 512 (MSS <= 512 does not mean > MTU <= 512) or cover themselves by being able to raise the floor to not > penalize clients who may be on very lossy networks. > > > > > Are we sure the intent of tcp_base_mss was not to act as a floor ? > > My understanding is that tcp_base_mss is meant to be the initial value > of search_low (as per Docs). Then in RFC 4821 [1] Sections 7.2, shows > search_low should be configurable, and 7.7 we see that in response to > successive black hole detection search_low should be halved. So I don't > think it was meant to be a floor, but just the initial search_low param.
That matches my reading of the RFC and code as well. But in that case IMHO an additional commit should fix this comment to reflect the fact thatTCP_BASE_MSS is the initial value, rather than a floor: diff --git a/include/net/tcp.h b/include/net/tcp.h index 42728239cdbe..05575ac70333 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -75,7 +75,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); /* Minimal accepted MSS. It is (60+60+8) - (20+20). */ #define TCP_MIN_MSS 88U -/* The least MTU to use for probing */ +/* The initial MTU to use for probing */ #define TCP_BASE_MSS 1024 /* probing interval, default to 10 minutes as per RFC4821 */ neal