Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Lawrence Brakmo
On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:

The original patch that changes TCP's congestion control via eBPF only
re-initializes the new congestion control, if the BPF op is set to an
(invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will

What do you mean by “(invalid) value”?

run the new congestion control from random states. 

This has always been possible with setsockopt, no?

   This patch fixes
the issue by always re-init the congestion control like other means
such as setsockopt and sysctl changes.

The current code re-inits the congestion control when calling 
tcp_set_congestion_control except when it is called early on (i.e. op <= 
BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
since it will be initialized later by TCP when the connection is established.

Otherwise, if we always call tcp_reinit_congestion_control we would call 
tcp_cleanup_congestion_control when the congestion control has not been
initialized yet.


Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  3 +--
 net/ipv4/tcp.c  |  2 +-
 net/ipv4/tcp_cong.c | 11 ++-
 4 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6da880d2f022..f94a71b62ba5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1006,7 +1006,7 @@ void tcp_get_default_congestion_control(struct net 
*net, char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool 
load, bool reinit);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool 
load);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6a85e67fafce..757d52adccfc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3233,12 +3233,11 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern 
*, bpf_sock,
   sk->sk_prot->setsockopt == tcp_setsockopt) {
if (optname == TCP_CONGESTION) {
char name[TCP_CA_NAME_MAX];
-   bool reinit = bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN;
 
strncpy(name, optval, min_t(long, optlen,
TCP_CA_NAME_MAX-1));
name[TCP_CA_NAME_MAX-1] = 0;
-   ret = tcp_set_congestion_control(sk, name, false, 
reinit);
+   ret = tcp_set_congestion_control(sk, name, false);
} else {
struct tcp_sock *tp = tcp_sk(sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f08eebe60446..21e2a07e857e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2550,7 +2550,7 @@ static int do_tcp_setsockopt(struct sock *sk, int 
level,
name[val] = 0;
 
lock_sock(sk);
-   err = tcp_set_congestion_control(sk, name, true, true);
+   err = tcp_set_congestion_control(sk, name, true);
release_sock(sk);
return err;
}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index bc6c02f16243..70895bee3026 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -332,7 +332,7 @@ int tcp_set_allowed_congestion_control(char *val)
  * tcp_reinit_congestion_control (if the current congestion control was
  * already initialized.
  */
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool 
load, bool reinit)
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool 
load)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
const struct tcp_congestion_ops *ca;
@@ -356,15 +356,8 @@ int tcp_set_congestion_control(struct sock *sk, const 
char *name, bool load, boo
if (!ca) {
err = -ENOENT;
} else if (!load) {
-   const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
-
if (try_module_get(ca->owner)) {
-   if (reinit) {
-   tcp_reinit_congestion_control(sk, ca);
-   } else {
-   icsk->icsk_ca_ops = ca;
-   module_put(old_ca->owner);
-   }
+   tcp_reinit_congestion_control(sk, ca);
} else {
err

Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Neal Cardwell
On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
>
> The original patch that changes TCP's congestion control via eBPF only
> re-initializes the new congestion control, if the BPF op is set to an
> (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>
> What do you mean by “(invalid) value”?
>
> run the new congestion control from random states.
>
> This has always been possible with setsockopt, no?
>
>This patch fixes
> the issue by always re-init the congestion control like other means
> such as setsockopt and sysctl changes.
>
> The current code re-inits the congestion control when calling
> tcp_set_congestion_control except when it is called early on (i.e. op <=
> BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> since it will be initialized later by TCP when the connection is established.
>
> Otherwise, if we always call tcp_reinit_congestion_control we would call
> tcp_cleanup_congestion_control when the congestion control has not been
> initialized yet.

On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
>
> The original patch that changes TCP's congestion control via eBPF only
> re-initializes the new congestion control, if the BPF op is set to an
> (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>
> What do you mean by “(invalid) value”?
>
> run the new congestion control from random states.
>
> This has always been possible with setsockopt, no?
>
>This patch fixes
> the issue by always re-init the congestion control like other means
> such as setsockopt and sysctl changes.
>
> The current code re-inits the congestion control when calling
> tcp_set_congestion_control except when it is called early on (i.e. op <=
> BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> since it will be initialized later by TCP when the connection is established.
>
> Otherwise, if we always call tcp_reinit_congestion_control we would call
> tcp_cleanup_congestion_control when the congestion control has not been
> initialized yet.

Interesting. So I wonder if the symptoms we were seeing were due to
kernels that did not yet have this fix:

  27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
connection):
  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf

Before that fix, there could be TFO passive connections that at SYN time called:
  tcp_init_congestion_control(child);
and then:
  tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);

So that if the CC was switched in the
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
init for the new module?

neal


Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Eric Dumazet
On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > 
> > The original patch that changes TCP's congestion control via eBPF only
> > re-initializes the new congestion control, if the BPF op is set to an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> > run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is 
> > established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > 
> > The original patch that changes TCP's congestion control via eBPF only
> > re-initializes the new congestion control, if the BPF op is set to an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> > run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is 
> > established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> Interesting. So I wonder if the symptoms we were seeing were due to
> kernels that did not yet have this fix:
> 
>   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> connection):
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> 
> Before that fix, there could be TFO passive connections that at SYN time 
> called:
>   tcp_init_congestion_control(child);
> and then:
>   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> 
> So that if the CC was switched in the
> BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> init for the new module?


Note that bpf_sock->op can be written by a malicious BPF filter.

So, a malicious filter can switch from Cubic to BBR without re-init,
and bad things can happen.

I do not believe we should trust BPF here.



Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Lawrence Brakmo
On 1/23/18, 11:50 AM, "Eric Dumazet"  wrote:

On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > 
> > The original patch that changes TCP's congestion control via eBPF 
only
> > re-initializes the new congestion control, if the BPF op is set to 
an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> > run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is 
established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > 
> > The original patch that changes TCP's congestion control via eBPF 
only
> > re-initializes the new congestion control, if the BPF op is set to 
an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> > run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is 
established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> Interesting. So I wonder if the symptoms we were seeing were due to
> kernels that did not yet have this fix:
> 
>   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> connection):
>   
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> 
> Before that fix, there could be TFO passive connections that at SYN time 
called:
>   tcp_init_congestion_control(child);
> and then:
>   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> 
> So that if the CC was switched in the
> BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> init for the new module?


Note that bpf_sock->op can be written by a malicious BPF filter.

So, a malicious filter can switch from Cubic to BBR without re-init,
and bad things can happen.

I do not believe we should trust BPF here.

Very good point Eric. One solution would be to make bpf_sock->op not writeable 
by
the BPF program. 

Neal, you are correct that would have been a problem. I leave it up to you guys 
whether
making bpf_sock->op not writeable by BPF program is enough or if it is safer to 
always
re-init (as long as there is no problem calling tcp_cleanup_congestion_control 
when it
has not been initialized.





Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Yuchung Cheng
On Tue, Jan 23, 2018 at 12:19 PM, Lawrence Brakmo  wrote:
>
> On 1/23/18, 11:50 AM, "Eric Dumazet"  wrote:
>
> On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > >
> > > The original patch that changes TCP's congestion control via eBPF 
> only
> > > re-initializes the new congestion control, if the BPF op is set 
> to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP 
> will
> > >
> > > What do you mean by “(invalid) value”?
> > >
> > > run the new congestion control from random states.
> > >
> > > This has always been possible with setsockopt, no?
> > >
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
> means
> > > such as setsockopt and sysctl changes.
> > >
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on (i.e. op 
> <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > since it will be initialized later by TCP when the connection is 
> established.
> > >
> > > Otherwise, if we always call tcp_reinit_congestion_control we would 
> call
> > > tcp_cleanup_congestion_control when the congestion control has not 
> been
> > > initialized yet.
> >
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > >
> > > The original patch that changes TCP's congestion control via eBPF 
> only
> > > re-initializes the new congestion control, if the BPF op is set 
> to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP 
> will
> > >
> > > What do you mean by “(invalid) value”?
> > >
> > > run the new congestion control from random states.
> > >
> > > This has always been possible with setsockopt, no?
> > >
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
> means
> > > such as setsockopt and sysctl changes.
> > >
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on (i.e. op 
> <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > since it will be initialized later by TCP when the connection is 
> established.
> > >
> > > Otherwise, if we always call tcp_reinit_congestion_control we would 
> call
> > > tcp_cleanup_congestion_control when the congestion control has not 
> been
> > > initialized yet.
> >
> > Interesting. So I wonder if the symptoms we were seeing were due to
> > kernels that did not yet have this fix:
> >
> >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> > connection):
> >   
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> >
> > Before that fix, there could be TFO passive connections that at SYN 
> time called:
> >   tcp_init_congestion_control(child);
> > and then:
> >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> >
> > So that if the CC was switched in the
> > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> > init for the new module?
>
>
> Note that bpf_sock->op can be written by a malicious BPF filter.
>
> So, a malicious filter can switch from Cubic to BBR without re-init,
> and bad things can happen.
>
> I do not believe we should trust BPF here.
>
> Very good point Eric. One solution would be to make bpf_sock->op not 
> writeable by
> the BPF program.
>
> Neal, you are correct that would have been a problem. I leave it up to you 
> guys whether
> making bpf_sock->op not writeable by BPF program is enough or if it is safer 
> to always
> re-init (as long as there is no problem calling 
> tcp_cleanup_congestion_control when it
> has not been initialized.
Thank you Larry for the clarification. I prefer the latter approach
and will respin.

>
>
>


Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Alexei Starovoitov
On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote:
> On 1/23/18, 11:50 AM, "Eric Dumazet"  wrote:
> 
> On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > 
> > > The original patch that changes TCP's congestion control via eBPF 
> only
> > > re-initializes the new congestion control, if the BPF op is set 
> to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP 
> will
> > > 
> > > What do you mean by “(invalid) value”?
> > > 
> > > run the new congestion control from random states.
> > > 
> > > This has always been possible with setsockopt, no?
> > > 
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
> means
> > > such as setsockopt and sysctl changes.
> > > 
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on (i.e. op 
> <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > since it will be initialized later by TCP when the connection is 
> established.
> > > 
> > > Otherwise, if we always call tcp_reinit_congestion_control we would 
> call
> > > tcp_cleanup_congestion_control when the congestion control has not 
> been
> > > initialized yet.
> > 
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > 
> > > The original patch that changes TCP's congestion control via eBPF 
> only
> > > re-initializes the new congestion control, if the BPF op is set 
> to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP 
> will
> > > 
> > > What do you mean by “(invalid) value”?
> > > 
> > > run the new congestion control from random states.
> > > 
> > > This has always been possible with setsockopt, no?
> > > 
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
> means
> > > such as setsockopt and sysctl changes.
> > > 
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on (i.e. op 
> <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > since it will be initialized later by TCP when the connection is 
> established.
> > > 
> > > Otherwise, if we always call tcp_reinit_congestion_control we would 
> call
> > > tcp_cleanup_congestion_control when the congestion control has not 
> been
> > > initialized yet.
> > 
> > Interesting. So I wonder if the symptoms we were seeing were due to
> > kernels that did not yet have this fix:
> > 
> >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> > connection):
> >   
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> > 
> > Before that fix, there could be TFO passive connections that at SYN 
> time called:
> >   tcp_init_congestion_control(child);
> > and then:
> >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> > 
> > So that if the CC was switched in the
> > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> > init for the new module?
> 
> 
> Note that bpf_sock->op can be written by a malicious BPF filter.
> 
> So, a malicious filter can switch from Cubic to BBR without re-init,
> and bad things can happen.
> 
> I do not believe we should trust BPF here.
> 
> Very good point Eric. One solution would be to make bpf_sock->op not 
> writeable by
> the BPF program. 
> 
> Neal, you are correct that would have been a problem. I leave it up to you 
> guys whether
> making bpf_sock->op not writeable by BPF program is enough or if it is safer 
> to always
> re-init (as long as there is no problem calling 
> tcp_cleanup_congestion_control when it
> has not been initialized.

I think allowing write into 'op' and 'replylong' was a mistake.
Only 'reply' field is used by tcp_call_bpf().
No reason to let programs write into other fields.
I think we have to fix it now before programs start to rely
on this undefined behavior.



Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Lawrence Brakmo


On 1/23/18, 3:26 PM, "Alexei Starovoitov"  wrote:

On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote:
> On 1/23/18, 11:50 AM, "Eric Dumazet"  wrote:
> 
> On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  
wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > 
> > > The original patch that changes TCP's congestion control via 
eBPF only
> > > re-initializes the new congestion control, if the BPF op is 
set to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently 
TCP will
> > > 
> > > What do you mean by “(invalid) value”?
> > > 
> > > run the new congestion control from random states.
> > > 
> > > This has always been possible with setsockopt, no?
> > > 
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
means
> > > such as setsockopt and sysctl changes.
> > > 
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on 
(i.e. op <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
re-initialize
> > > since it will be initialized later by TCP when the connection is 
established.
> > > 
> > > Otherwise, if we always call tcp_reinit_congestion_control we 
would call
> > > tcp_cleanup_congestion_control when the congestion control has 
not been
> > > initialized yet.
> > 
> > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  
wrote:
> > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > 
> > > The original patch that changes TCP's congestion control via 
eBPF only
> > > re-initializes the new congestion control, if the BPF op is 
set to an
> > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently 
TCP will
> > > 
> > > What do you mean by “(invalid) value”?
> > > 
> > > run the new congestion control from random states.
> > > 
> > > This has always been possible with setsockopt, no?
> > > 
> > >This patch fixes
> > > the issue by always re-init the congestion control like other 
means
> > > such as setsockopt and sysctl changes.
> > > 
> > > The current code re-inits the congestion control when calling
> > > tcp_set_congestion_control except when it is called early on 
(i.e. op <=
> > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
re-initialize
> > > since it will be initialized later by TCP when the connection is 
established.
> > > 
> > > Otherwise, if we always call tcp_reinit_congestion_control we 
would call
> > > tcp_cleanup_congestion_control when the congestion control has 
not been
> > > initialized yet.
> > 
> > Interesting. So I wonder if the symptoms we were seeing were due to
> > kernels that did not yet have this fix:
> > 
> >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> > connection):
> >   
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> > 
> > Before that fix, there could be TFO passive connections that at SYN 
time called:
> >   tcp_init_congestion_control(child);
> > and then:
> >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> > 
> > So that if the CC was switched in the
> > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> > init for the new module?
> 
> 
> Note that bpf_sock->op can be written by a malicious BPF filter.
> 
> So, a malicious filter can switch from Cubic to BBR without re-init,
> and bad things can happen.
> 
> I do not believe we should trust BPF here.
> 
> Very good point Eric. One solution would be to make bpf_sock->op not 
writeable by
> the BPF program. 
> 
> Neal, you are correct that would have been a problem. I leave it up to 
you guys whether
> making bpf_sock->op not writeable by BPF program is enough or if it is 
safer to always
> re-init (as long as there is no problem calling 
tcp_cleanup_congestion_control when it
> has not been initialized.

I think allowing write into 'op' and 'replylong' was a mistake.
Only 'reply' field is used by tcp_call_bpf().
No reason to let programs write into other fields.
I think we have to fix it now before programs start to rely
on this undefined behavior.

write into ‘op’ is a mistake. Writing to replylong is a mistake unt

Re: [PATCH net] bpf: always re-init the congestion control after switching to it

2018-01-23 Thread Yuchung Cheng
On Tue, Jan 23, 2018 at 3:30 PM, Lawrence Brakmo  wrote:
>
>
>
> On 1/23/18, 3:26 PM, "Alexei Starovoitov"  
> wrote:
>
> On Tue, Jan 23, 2018 at 08:19:54PM +, Lawrence Brakmo wrote:
> > On 1/23/18, 11:50 AM, "Eric Dumazet"  wrote:
> >
> > On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  
> wrote:
> > > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > >
> > > > The original patch that changes TCP's congestion control 
> via eBPF only
> > > > re-initializes the new congestion control, if the BPF op is 
> set to an
> > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently 
> TCP will
> > > >
> > > > What do you mean by “(invalid) value”?
> > > >
> > > > run the new congestion control from random states.
> > > >
> > > > This has always been possible with setsockopt, no?
> > > >
> > > >This patch fixes
> > > > the issue by always re-init the congestion control like 
> other means
> > > > such as setsockopt and sysctl changes.
> > > >
> > > > The current code re-inits the congestion control when calling
> > > > tcp_set_congestion_control except when it is called early on 
> (i.e. op <=
> > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > > since it will be initialized later by TCP when the connection 
> is established.
> > > >
> > > > Otherwise, if we always call tcp_reinit_congestion_control we 
> would call
> > > > tcp_cleanup_congestion_control when the congestion control has 
> not been
> > > > initialized yet.
> > >
> > > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo  
> wrote:
> > > > On 1/23/18, 9:30 AM, "Yuchung Cheng"  wrote:
> > > >
> > > > The original patch that changes TCP's congestion control 
> via eBPF only
> > > > re-initializes the new congestion control, if the BPF op is 
> set to an
> > > > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently 
> TCP will
> > > >
> > > > What do you mean by “(invalid) value”?
> > > >
> > > > run the new congestion control from random states.
> > > >
> > > > This has always been possible with setsockopt, no?
> > > >
> > > >This patch fixes
> > > > the issue by always re-init the congestion control like 
> other means
> > > > such as setsockopt and sysctl changes.
> > > >
> > > > The current code re-inits the congestion control when calling
> > > > tcp_set_congestion_control except when it is called early on 
> (i.e. op <=
> > > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to 
> re-initialize
> > > > since it will be initialized later by TCP when the connection 
> is established.
> > > >
> > > > Otherwise, if we always call tcp_reinit_congestion_control we 
> would call
> > > > tcp_cleanup_congestion_control when the congestion control has 
> not been
> > > > initialized yet.
> > >
> > > Interesting. So I wonder if the symptoms we were seeing were due 
> to
> > > kernels that did not yet have this fix:
> > >
> > >   27204aaa9dc6 ("tcp: uniform the set up of sockets after 
> successful
> > > connection):
> > >   
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> > >
> > > Before that fix, there could be TFO passive connections that at 
> SYN time called:
> > >   tcp_init_congestion_control(child);
> > > and then:
> > >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> > >
> > > So that if the CC was switched in the
> > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> > > init for the new module?
> >
> >
> > Note that bpf_sock->op can be written by a malicious BPF filter.
> >
> > So, a malicious filter can switch from Cubic to BBR without re-init,
> > and bad things can happen.
> >
> > I do not believe we should trust BPF here.
> >
> > Very good point Eric. One solution would be to make bpf_sock->op not 
> writeable by
> > the BPF program.
> >
> > Neal, you are correct that would have been a problem. I leave it up to 
> you guys whether
> > making bpf_sock->op not writeable by BPF program is enough or if it is 
> safer to always
> > re-init (as long as there is no problem calling 
> tcp_cleanup_congestion_control when it
> > has not been initialized.
>
> I think allowing write into 'op' and 'replylong' was a mistake.
> Only 'reply' fie