Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: The original patch that changes TCP's congestion control via eBPF only re-initializes the new congestion control, if the BPF op is set to an (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will What do you mean by “(invalid) value”? run the new congestion control from random states. This has always been possible with setsockopt, no? This patch fixes the issue by always re-init the congestion control like other means such as setsockopt and sysctl changes. The current code re-inits the congestion control when calling tcp_set_congestion_control except when it is called early on (i.e. op <= BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize since it will be initialized later by TCP when the connection is established. Otherwise, if we always call tcp_reinit_congestion_control we would call tcp_cleanup_congestion_control when the congestion control has not been initialized yet. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh --- include/net/tcp.h | 2 +- net/core/filter.c | 3 +-- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_cong.c | 11 ++- 4 files changed, 5 insertions(+), 13 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 6da880d2f022..f94a71b62ba5 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1006,7 +1006,7 @@ void tcp_get_default_congestion_control(struct net *net, char *name); void tcp_get_available_congestion_control(char *buf, size_t len); void tcp_get_allowed_congestion_control(char *buf, size_t len); int tcp_set_allowed_congestion_control(char *allowed); -int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit); +int tcp_set_congestion_control(struct sock *sk, const char *name, bool load); u32 tcp_slow_start(struct tcp_sock *tp, u32 acked); void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked); diff --git a/net/core/filter.c b/net/core/filter.c index 6a85e67fafce..757d52adccfc 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3233,12 +3233,11 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, sk->sk_prot->setsockopt == tcp_setsockopt) { if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; - bool reinit = bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN; strncpy(name, optval, min_t(long, optlen, TCP_CA_NAME_MAX-1)); name[TCP_CA_NAME_MAX-1] = 0; - ret = tcp_set_congestion_control(sk, name, false, reinit); + ret = tcp_set_congestion_control(sk, name, false); } else { struct tcp_sock *tp = tcp_sk(sk); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f08eebe60446..21e2a07e857e 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2550,7 +2550,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, name[val] = 0; lock_sock(sk); - err = tcp_set_congestion_control(sk, name, true, true); + err = tcp_set_congestion_control(sk, name, true); release_sock(sk); return err; } diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index bc6c02f16243..70895bee3026 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -332,7 +332,7 @@ int tcp_set_allowed_congestion_control(char *val) * tcp_reinit_congestion_control (if the current congestion control was * already initialized. */ -int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit) +int tcp_set_congestion_control(struct sock *sk, const char *name, bool load) { struct inet_connection_sock *icsk = inet_csk(sk); const struct tcp_congestion_ops *ca; @@ -356,15 +356,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, boo if (!ca) { err = -ENOENT; } else if (!load) { - const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops; - if (try_module_get(ca->owner)) { - if (reinit) { - tcp_reinit_congestion_control(sk, ca); - } else { - icsk->icsk_ca_ops = ca; - module_put(old_ca->owner); - } + tcp_reinit_congestion_control(sk, ca); } else { err
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > The original patch that changes TCP's congestion control via eBPF only > re-initializes the new congestion control, if the BPF op is set to an > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > What do you mean by “(invalid) value”? > > run the new congestion control from random states. > > This has always been possible with setsockopt, no? > >This patch fixes > the issue by always re-init the congestion control like other means > such as setsockopt and sysctl changes. > > The current code re-inits the congestion control when calling > tcp_set_congestion_control except when it is called early on (i.e. op <= > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > since it will be initialized later by TCP when the connection is established. > > Otherwise, if we always call tcp_reinit_congestion_control we would call > tcp_cleanup_congestion_control when the congestion control has not been > initialized yet. On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > The original patch that changes TCP's congestion control via eBPF only > re-initializes the new congestion control, if the BPF op is set to an > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > What do you mean by “(invalid) value”? > > run the new congestion control from random states. > > This has always been possible with setsockopt, no? > >This patch fixes > the issue by always re-init the congestion control like other means > such as setsockopt and sysctl changes. > > The current code re-inits the congestion control when calling > tcp_set_congestion_control except when it is called early on (i.e. op <= > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > since it will be initialized later by TCP when the connection is established. > > Otherwise, if we always call tcp_reinit_congestion_control we would call > tcp_cleanup_congestion_control when the congestion control has not been > initialized yet. Interesting. So I wonder if the symptoms we were seeing were due to kernels that did not yet have this fix: 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful connection): https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf Before that fix, there could be TFO passive connections that at SYN time called: tcp_init_congestion_control(child); and then: tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); So that if the CC was switched in the BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no init for the new module? neal
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > The original patch that changes TCP's congestion control via eBPF only > > re-initializes the new congestion control, if the BPF op is set to an > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > What do you mean by “(invalid) value”? > > > > run the new congestion control from random states. > > > > This has always been possible with setsockopt, no? > > > >This patch fixes > > the issue by always re-init the congestion control like other means > > such as setsockopt and sysctl changes. > > > > The current code re-inits the congestion control when calling > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > since it will be initialized later by TCP when the connection is > > established. > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > tcp_cleanup_congestion_control when the congestion control has not been > > initialized yet. > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > The original patch that changes TCP's congestion control via eBPF only > > re-initializes the new congestion control, if the BPF op is set to an > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > What do you mean by “(invalid) value”? > > > > run the new congestion control from random states. > > > > This has always been possible with setsockopt, no? > > > >This patch fixes > > the issue by always re-init the congestion control like other means > > such as setsockopt and sysctl changes. > > > > The current code re-inits the congestion control when calling > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > since it will be initialized later by TCP when the connection is > > established. > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > tcp_cleanup_congestion_control when the congestion control has not been > > initialized yet. > > Interesting. So I wonder if the symptoms we were seeing were due to > kernels that did not yet have this fix: > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful > connection): > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > Before that fix, there could be TFO passive connections that at SYN time > called: > tcp_init_congestion_control(child); > and then: > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > So that if the CC was switched in the > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > init for the new module? Note that bpf_sock->op can be written by a malicious BPF filter. So, a malicious filter can switch from Cubic to BBR without re-init, and bad things can happen. I do not believe we should trust BPF here.
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On 1/23/18, 11:50 AM, "Eric Dumazet" wrote: On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > The original patch that changes TCP's congestion control via eBPF only > > re-initializes the new congestion control, if the BPF op is set to an > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > What do you mean by “(invalid) value”? > > > > run the new congestion control from random states. > > > > This has always been possible with setsockopt, no? > > > >This patch fixes > > the issue by always re-init the congestion control like other means > > such as setsockopt and sysctl changes. > > > > The current code re-inits the congestion control when calling > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > since it will be initialized later by TCP when the connection is established. > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > tcp_cleanup_congestion_control when the congestion control has not been > > initialized yet. > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > The original patch that changes TCP's congestion control via eBPF only > > re-initializes the new congestion control, if the BPF op is set to an > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > What do you mean by “(invalid) value”? > > > > run the new congestion control from random states. > > > > This has always been possible with setsockopt, no? > > > >This patch fixes > > the issue by always re-init the congestion control like other means > > such as setsockopt and sysctl changes. > > > > The current code re-inits the congestion control when calling > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > since it will be initialized later by TCP when the connection is established. > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > tcp_cleanup_congestion_control when the congestion control has not been > > initialized yet. > > Interesting. So I wonder if the symptoms we were seeing were due to > kernels that did not yet have this fix: > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful > connection): > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > Before that fix, there could be TFO passive connections that at SYN time called: > tcp_init_congestion_control(child); > and then: > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > So that if the CC was switched in the > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > init for the new module? Note that bpf_sock->op can be written by a malicious BPF filter. So, a malicious filter can switch from Cubic to BBR without re-init, and bad things can happen. I do not believe we should trust BPF here. Very good point Eric. One solution would be to make bpf_sock->op not writeable by the BPF program. Neal, you are correct that would have been a problem. I leave it up to you guys whether making bpf_sock->op not writeable by BPF program is enough or if it is safer to always re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it has not been initialized.
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On Tue, Jan 23, 2018 at 12:19 PM, Lawrence Brakmo wrote: > > On 1/23/18, 11:50 AM, "Eric Dumazet" wrote: > > On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF > only > > > re-initializes the new congestion control, if the BPF op is set > to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP > will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other > means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op > <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > since it will be initialized later by TCP when the connection is > established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would > call > > > tcp_cleanup_congestion_control when the congestion control has not > been > > > initialized yet. > > > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF > only > > > re-initializes the new congestion control, if the BPF op is set > to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP > will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other > means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op > <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > since it will be initialized later by TCP when the connection is > established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would > call > > > tcp_cleanup_congestion_control when the congestion control has not > been > > > initialized yet. > > > > Interesting. So I wonder if the symptoms we were seeing were due to > > kernels that did not yet have this fix: > > > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful > > connection): > > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > > > Before that fix, there could be TFO passive connections that at SYN > time called: > > tcp_init_congestion_control(child); > > and then: > > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > > > So that if the CC was switched in the > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > > init for the new module? > > > Note that bpf_sock->op can be written by a malicious BPF filter. > > So, a malicious filter can switch from Cubic to BBR without re-init, > and bad things can happen. > > I do not believe we should trust BPF here. > > Very good point Eric. One solution would be to make bpf_sock->op not > writeable by > the BPF program. > > Neal, you are correct that would have been a problem. I leave it up to you > guys whether > making bpf_sock->op not writeable by BPF program is enough or if it is safer > to always > re-init (as long as there is no problem calling > tcp_cleanup_congestion_control when it > has not been initialized. Thank you Larry for the clarification. I prefer the latter approach and will respin. > > >
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote: > On 1/23/18, 11:50 AM, "Eric Dumazet" wrote: > > On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF > only > > > re-initializes the new congestion control, if the BPF op is set > to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP > will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other > means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op > <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > since it will be initialized later by TCP when the connection is > established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would > call > > > tcp_cleanup_congestion_control when the congestion control has not > been > > > initialized yet. > > > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF > only > > > re-initializes the new congestion control, if the BPF op is set > to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP > will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other > means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op > <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > since it will be initialized later by TCP when the connection is > established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would > call > > > tcp_cleanup_congestion_control when the congestion control has not > been > > > initialized yet. > > > > Interesting. So I wonder if the symptoms we were seeing were due to > > kernels that did not yet have this fix: > > > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful > > connection): > > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > > > Before that fix, there could be TFO passive connections that at SYN > time called: > > tcp_init_congestion_control(child); > > and then: > > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > > > So that if the CC was switched in the > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > > init for the new module? > > > Note that bpf_sock->op can be written by a malicious BPF filter. > > So, a malicious filter can switch from Cubic to BBR without re-init, > and bad things can happen. > > I do not believe we should trust BPF here. > > Very good point Eric. One solution would be to make bpf_sock->op not > writeable by > the BPF program. > > Neal, you are correct that would have been a problem. I leave it up to you > guys whether > making bpf_sock->op not writeable by BPF program is enough or if it is safer > to always > re-init (as long as there is no problem calling > tcp_cleanup_congestion_control when it > has not been initialized. I think allowing write into 'op' and 'replylong' was a mistake. Only 'reply' field is used by tcp_call_bpf(). No reason to let programs write into other fields. I think we have to fix it now before programs start to rely on this undefined behavior.
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On 1/23/18, 3:26 PM, "Alexei Starovoitov" wrote: On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote: > On 1/23/18, 11:50 AM, "Eric Dumazet" wrote: > > On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF only > > > re-initializes the new congestion control, if the BPF op is set to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > > since it will be initialized later by TCP when the connection is established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > > tcp_cleanup_congestion_control when the congestion control has not been > > > initialized yet. > > > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo wrote: > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > The original patch that changes TCP's congestion control via eBPF only > > > re-initializes the new congestion control, if the BPF op is set to an > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will > > > > > > What do you mean by “(invalid) value”? > > > > > > run the new congestion control from random states. > > > > > > This has always been possible with setsockopt, no? > > > > > >This patch fixes > > > the issue by always re-init the congestion control like other means > > > such as setsockopt and sysctl changes. > > > > > > The current code re-inits the congestion control when calling > > > tcp_set_congestion_control except when it is called early on (i.e. op <= > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize > > > since it will be initialized later by TCP when the connection is established. > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we would call > > > tcp_cleanup_congestion_control when the congestion control has not been > > > initialized yet. > > > > Interesting. So I wonder if the symptoms we were seeing were due to > > kernels that did not yet have this fix: > > > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful > > connection): > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > > > Before that fix, there could be TFO passive connections that at SYN time called: > > tcp_init_congestion_control(child); > > and then: > > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > > > So that if the CC was switched in the > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > > init for the new module? > > > Note that bpf_sock->op can be written by a malicious BPF filter. > > So, a malicious filter can switch from Cubic to BBR without re-init, > and bad things can happen. > > I do not believe we should trust BPF here. > > Very good point Eric. One solution would be to make bpf_sock->op not writeable by > the BPF program. > > Neal, you are correct that would have been a problem. I leave it up to you guys whether > making bpf_sock->op not writeable by BPF program is enough or if it is safer to always > re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it > has not been initialized. I think allowing write into 'op' and 'replylong' was a mistake. Only 'reply' field is used by tcp_call_bpf(). No reason to let programs write into other fields. I think we have to fix it now before programs start to rely on this undefined behavior. write into ‘op’ is a mistake. Writing to replylong is a mistake unt
Re: [PATCH net] bpf: always re-init the congestion control after switching to it
On Tue, Jan 23, 2018 at 3:30 PM, Lawrence Brakmo wrote: > > > > On 1/23/18, 3:26 PM, "Alexei Starovoitov" > wrote: > > On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote: > > On 1/23/18, 11:50 AM, "Eric Dumazet" wrote: > > > > On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote: > > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo > wrote: > > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > > > The original patch that changes TCP's congestion control > via eBPF only > > > > re-initializes the new congestion control, if the BPF op is > set to an > > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently > TCP will > > > > > > > > What do you mean by “(invalid) value”? > > > > > > > > run the new congestion control from random states. > > > > > > > > This has always been possible with setsockopt, no? > > > > > > > >This patch fixes > > > > the issue by always re-init the congestion control like > other means > > > > such as setsockopt and sysctl changes. > > > > > > > > The current code re-inits the congestion control when calling > > > > tcp_set_congestion_control except when it is called early on > (i.e. op <= > > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > > since it will be initialized later by TCP when the connection > is established. > > > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we > would call > > > > tcp_cleanup_congestion_control when the congestion control has > not been > > > > initialized yet. > > > > > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo > wrote: > > > > On 1/23/18, 9:30 AM, "Yuchung Cheng" wrote: > > > > > > > > The original patch that changes TCP's congestion control > via eBPF only > > > > re-initializes the new congestion control, if the BPF op is > set to an > > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently > TCP will > > > > > > > > What do you mean by “(invalid) value”? > > > > > > > > run the new congestion control from random states. > > > > > > > > This has always been possible with setsockopt, no? > > > > > > > >This patch fixes > > > > the issue by always re-init the congestion control like > other means > > > > such as setsockopt and sysctl changes. > > > > > > > > The current code re-inits the congestion control when calling > > > > tcp_set_congestion_control except when it is called early on > (i.e. op <= > > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to > re-initialize > > > > since it will be initialized later by TCP when the connection > is established. > > > > > > > > Otherwise, if we always call tcp_reinit_congestion_control we > would call > > > > tcp_cleanup_congestion_control when the congestion control has > not been > > > > initialized yet. > > > > > > Interesting. So I wonder if the symptoms we were seeing were due > to > > > kernels that did not yet have this fix: > > > > > > 27204aaa9dc6 ("tcp: uniform the set up of sockets after > successful > > > connection): > > > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf > > > > > > Before that fix, there could be TFO passive connections that at > SYN time called: > > > tcp_init_congestion_control(child); > > > and then: > > > tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); > > > > > > So that if the CC was switched in the > > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no > > > init for the new module? > > > > > > Note that bpf_sock->op can be written by a malicious BPF filter. > > > > So, a malicious filter can switch from Cubic to BBR without re-init, > > and bad things can happen. > > > > I do not believe we should trust BPF here. > > > > Very good point Eric. One solution would be to make bpf_sock->op not > writeable by > > the BPF program. > > > > Neal, you are correct that would have been a problem. I leave it up to > you guys whether > > making bpf_sock->op not writeable by BPF program is enough or if it is > safer to always > > re-init (as long as there is no problem calling > tcp_cleanup_congestion_control when it > > has not been initialized. > > I think allowing write into 'op' and 'replylong' was a mistake. > Only 'reply' fie