[PATCH 5.11 129/210] mlxsw: spectrum: Fix ECN marking in tunnel decapsulation

2021-04-12 Thread Greg Kroah-Hartman
From: Ido Schimmel 

[ Upstream commit 66167c310deb4ac1725f81004fb4b504676ad0bf ]

Cited commit changed the behavior of the software data path with regards
to the ECN marking of decapsulated packets. However, the commit did not
change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
driver is using the function in order to ensure that the hardware and
software data paths act the same with regards to the ECN marking of
decapsulated packets.

The discrepancy was uncovered by commit 5aa3c334a449 ("selftests:
forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
aligned the selftest to the new behavior. Without this patch the
selftest passes when used with veth pairs, but fails when used with
mlxsw netdevs.

Fix this by instructing the device to propagate the ECT(1) mark from the
outer header to the inner header when the inner header is ECT(0), for
both NVE and IP-in-IP tunnels.

A helper is added in order not to duplicate the code between both tunnel
types.
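
For illustration only, a minimal userspace sketch of the decision the new
helper makes. The names model_ecn_decapsulate() and model_tunnel_ecn_decap()
are hypothetical stand-ins (assumptions, not kernel symbols); the first only
approximates what __INET_ECN_decapsulate() reports to its caller, and the
second mirrors the helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints: the two low-order bits of the TOS / traffic class byte. */
enum {
	ECN_NOT_ECT = 0,
	ECN_ECT_1   = 1,
	ECN_ECT_0   = 2,
	ECN_CE      = 3,
	ECN_MASK    = 3,
};

/*
 * Hypothetical stand-in for __INET_ECN_decapsulate(). Returns non-zero for
 * outer/inner combinations that are invalid when the inner header is
 * Not-ECT; otherwise reports through *set_ce whether CE must be copied to
 * the inner header.
 */
static int model_ecn_decapsulate(uint8_t outer, uint8_t inner, bool *set_ce)
{
	if ((inner & ECN_MASK) == ECN_NOT_ECT) {
		switch (outer & ECN_MASK) {
		case ECN_NOT_ECT:
			return 0;
		case ECN_ECT_0:
		case ECN_ECT_1:
			return 1;
		case ECN_CE:
			return 2;
		}
	}

	*set_ce = (outer & ECN_MASK) == ECN_CE;
	return 0;
}

/* Mirrors the helper in this patch: returns the new inner ECN value. */
static uint8_t model_tunnel_ecn_decap(uint8_t outer_ecn, uint8_t inner_ecn,
				      bool *trap_en)
{
	bool set_ce = false;

	assert(outer_ecn <= ECN_MASK && inner_ecn <= ECN_MASK);

	*trap_en = model_ecn_decapsulate(outer_ecn, inner_ecn, &set_ce) != 0;
	if (set_ce)
		return ECN_CE;
	if (outer_ecn == ECN_ECT_1 && inner_ecn == ECN_ECT_0)
		return ECN_ECT_1;	/* the case this patch adds */
	return inner_ecn;
}
```

Under these assumptions, an outer ECT(1) with an inner ECT(0) yields ECT(1),
which is the value the aligned selftest expects.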

Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
Acked-by: Toke Høiland-Jørgensen 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h| 15 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_ipip.c   |  7 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_nve.c|  7 +++
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a6956cfc9cb1..4399c9a4999d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include <net/inet_ecn.h>
 
 #include "port.h"
 #include "core.h"
@@ -346,6 +347,20 @@ struct mlxsw_sp_port_type_speed_ops {
u32 (*ptys_proto_cap_masked_get)(u32 eth_proto_cap);
 };
 
+static inline u8 mlxsw_sp_tunnel_ecn_decap(u8 outer_ecn, u8 inner_ecn,
+					   bool *trap_en)
+{
+	bool set_ce = false;
+
+	*trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
+	if (set_ce)
+		return INET_ECN_CE;
+	else if (outer_ecn == INET_ECN_ECT_1 && inner_ecn == INET_ECN_ECT_0)
+		return INET_ECN_ECT_1;
+	else
+		return inner_ecn;
+}
+
 static inline struct net_device *
 mlxsw_sp_bridge_vxlan_dev_find(struct net_device *br_dev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
index 6ccca39bae84..64a8f838eb53 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
@@ -335,12 +335,11 @@ static int mlxsw_sp_ipip_ecn_decap_init_one(struct mlxsw_sp *mlxsw_sp,
 					    u8 inner_ecn, u8 outer_ecn)
 {
 	char tidem_pl[MLXSW_REG_TIDEM_LEN];
-	bool trap_en, set_ce = false;
 	u8 new_inner_ecn;
+	bool trap_en;
 
-	trap_en = __INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
-	new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+	new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+						  &trap_en);
 	mlxsw_reg_tidem_pack(tidem_pl, outer_ecn, inner_ecn, new_inner_ecn,
 			     trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tidem), tidem_pl);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index e5ec595593f4..9eba8fa684ae 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -909,12 +909,11 @@ static int __mlxsw_sp_nve_ecn_decap_init(struct mlxsw_sp *mlxsw_sp,
 					 u8 inner_ecn, u8 outer_ecn)
 {
 	char tndem_pl[MLXSW_REG_TNDEM_LEN];
-	bool trap_en, set_ce = false;
 	u8 new_inner_ecn;
+	bool trap_en;
 
-	trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
-	new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+	new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+						  &trap_en);
 	mlxsw_reg_tndem_pack(tndem_pl, outer_ecn, inner_ecn, new_inner_ecn,
 			     trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tndem), tndem_pl);
-- 
2.30.2





[PATCH 5.10 118/188] mlxsw: spectrum: Fix ECN marking in tunnel decapsulation

2021-04-12 Thread Greg Kroah-Hartman
From: Ido Schimmel 

[ Upstream commit 66167c310deb4ac1725f81004fb4b504676ad0bf ]

Cited commit changed the behavior of the software data path with regards
to the ECN marking of decapsulated packets. However, the commit did not
change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
driver is using the function in order to ensure that the hardware and
software data paths act the same with regards to the ECN marking of
decapsulated packets.

The discrepancy was uncovered by commit 5aa3c334a449 ("selftests:
forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
aligned the selftest to the new behavior. Without this patch the
selftest passes when used with veth pairs, but fails when used with
mlxsw netdevs.

Fix this by instructing the device to propagate the ECT(1) mark from the
outer header to the inner header when the inner header is ECT(0), for
both NVE and IP-in-IP tunnels.

A helper is added in order not to duplicate the code between both tunnel
types.

Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
Acked-by: Toke Høiland-Jørgensen 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h| 15 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_ipip.c   |  7 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_nve.c|  7 +++
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 74b3959b36d4..3e7576e671df 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <net/inet_ecn.h>
 
 #include "port.h"
 #include "core.h"
@@ -345,6 +346,20 @@ struct mlxsw_sp_port_type_speed_ops {
u32 (*ptys_proto_cap_masked_get)(u32 eth_proto_cap);
 };
 
+static inline u8 mlxsw_sp_tunnel_ecn_decap(u8 outer_ecn, u8 inner_ecn,
+					   bool *trap_en)
+{
+	bool set_ce = false;
+
+	*trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
+	if (set_ce)
+		return INET_ECN_CE;
+	else if (outer_ecn == INET_ECN_ECT_1 && inner_ecn == INET_ECN_ECT_0)
+		return INET_ECN_ECT_1;
+	else
+		return inner_ecn;
+}
+
 static inline struct net_device *
 mlxsw_sp_bridge_vxlan_dev_find(struct net_device *br_dev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
index a8525992528f..3262a2c15ea7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
@@ -371,12 +371,11 @@ static int mlxsw_sp_ipip_ecn_decap_init_one(struct mlxsw_sp *mlxsw_sp,
 					    u8 inner_ecn, u8 outer_ecn)
 {
 	char tidem_pl[MLXSW_REG_TIDEM_LEN];
-	bool trap_en, set_ce = false;
 	u8 new_inner_ecn;
+	bool trap_en;
 
-	trap_en = __INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
-	new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+	new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+						  &trap_en);
 	mlxsw_reg_tidem_pack(tidem_pl, outer_ecn, inner_ecn, new_inner_ecn,
 			     trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tidem), tidem_pl);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index 54d3e7dcd303..a2d1b95d1f58 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -909,12 +909,11 @@ static int __mlxsw_sp_nve_ecn_decap_init(struct mlxsw_sp *mlxsw_sp,
 					 u8 inner_ecn, u8 outer_ecn)
 {
 	char tndem_pl[MLXSW_REG_TNDEM_LEN];
-	bool trap_en, set_ce = false;
 	u8 new_inner_ecn;
+	bool trap_en;
 
-	trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
-	new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+	new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+						  &trap_en);
 	mlxsw_reg_tndem_pack(tndem_pl, outer_ecn, inner_ecn, new_inner_ecn,
 			     trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tndem), tndem_pl);
-- 
2.30.2





[PATCH 5.11 185/254] selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value

2021-03-29 Thread Greg Kroah-Hartman
From: Hangbin Liu 

[ Upstream commit 5aa3c334a449bab24519c4967f5ac2b3304c8dcf ]

The ECN field defines ECT(1) = 1 and ECT(0) = 2, so inner 0x02 + outer
0x01 means inner ECT(0) + outer ECT(1). Based on the description of
__INET_ECN_decapsulate(), the final decapsulated value should be
ECT(1). So fix the test's expected value to 0x01.

Before the fix:
TEST: VXLAN: ECN decap: 01/02->0x02 [FAIL]
Expected to capture 10 packets, got 0.

After the fix:
TEST: VXLAN: ECN decap: 01/02->0x01 [ OK ]
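
Because the numeric order of the two ECT codepoints is easy to get
backwards, here is a small sketch that maps a TOS byte to its ECN codepoint
name. The function name ecn_name() is hypothetical and exists only for this
illustration; the codepoint values are the ones stated above.

```c
#include <assert.h>

/* Map the two low-order (ECN) bits of a TOS byte to the codepoint name. */
static const char *ecn_name(unsigned int tos)
{
	assert(tos <= 0xff);	/* a TOS byte is 8 bits wide */

	switch (tos & 0x03) {
	case 0x00:
		return "Not-ECT";
	case 0x01:
		return "ECT(1)";
	case 0x02:
		return "ECT(0)";
	default:
		return "CE";
	}
}
```

With this mapping, the corrected test vector "02 01 0x01" reads as inner
ECT(0), outer ECT(1), decapsulated ECT(1).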

Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Hangbin Liu 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
index ce6bea9675c0..0ccb1dda099a 100755
--- a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
+++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
@@ -658,7 +658,7 @@ test_ecn_decap()
 	# In accordance with INET_ECN_decapsulate()
 	__test_ecn_decap 00 00 0x00
 	__test_ecn_decap 01 01 0x01
-	__test_ecn_decap 02 01 0x02
+	__test_ecn_decap 02 01 0x01
 	__test_ecn_decap 01 03 0x03
 	__test_ecn_decap 02 03 0x03
 	test_ecn_decap_error
-- 
2.30.1





[PATCH 5.10 162/221] selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value

2021-03-29 Thread Greg Kroah-Hartman
From: Hangbin Liu 

[ Upstream commit 5aa3c334a449bab24519c4967f5ac2b3304c8dcf ]

The ECN field defines ECT(1) = 1 and ECT(0) = 2, so inner 0x02 + outer
0x01 means inner ECT(0) + outer ECT(1). Based on the description of
__INET_ECN_decapsulate(), the final decapsulated value should be
ECT(1). So fix the test's expected value to 0x01.

Before the fix:
TEST: VXLAN: ECN decap: 01/02->0x02 [FAIL]
Expected to capture 10 packets, got 0.

After the fix:
TEST: VXLAN: ECN decap: 01/02->0x01 [ OK ]

Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Hangbin Liu 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
index ce6bea9675c0..0ccb1dda099a 100755
--- a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
+++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
@@ -658,7 +658,7 @@ test_ecn_decap()
 	# In accordance with INET_ECN_decapsulate()
 	__test_ecn_decap 00 00 0x00
 	__test_ecn_decap 01 01 0x01
-	__test_ecn_decap 02 01 0x02
+	__test_ecn_decap 02 01 0x01
 	__test_ecn_decap 01 03 0x03
 	__test_ecn_decap 02 03 0x03
 	test_ecn_decap_error
-- 
2.30.1





[PATCH 5.4 085/111] selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value

2021-03-29 Thread Greg Kroah-Hartman
From: Hangbin Liu 

[ Upstream commit 5aa3c334a449bab24519c4967f5ac2b3304c8dcf ]

The ECN field defines ECT(1) = 1 and ECT(0) = 2, so inner 0x02 + outer
0x01 means inner ECT(0) + outer ECT(1). Based on the description of
__INET_ECN_decapsulate(), the final decapsulated value should be
ECT(1). So fix the test's expected value to 0x01.

Before the fix:
TEST: VXLAN: ECN decap: 01/02->0x02 [FAIL]
Expected to capture 10 packets, got 0.

After the fix:
TEST: VXLAN: ECN decap: 01/02->0x01 [ OK ]

Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Hangbin Liu 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
index ce6bea9675c0..0ccb1dda099a 100755
--- a/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
+++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
@@ -658,7 +658,7 @@ test_ecn_decap()
 	# In accordance with INET_ECN_decapsulate()
 	__test_ecn_decap 00 00 0x00
 	__test_ecn_decap 01 01 0x01
-	__test_ecn_decap 02 01 0x02
+	__test_ecn_decap 02 01 0x01
 	__test_ecn_decap 01 03 0x03
 	__test_ecn_decap 02 03 0x03
 	test_ecn_decap_error
-- 
2.30.1





[PATCH 5.10 150/199] netfilter: rpfilter: mask ecn bits before fib lookup

2021-01-26 Thread Greg Kroah-Hartman
From: Guillaume Nault 

commit 2e5a6266fbb11ae93c468dfecab169aca9c27b43 upstream.

RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.
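
To make the bit arithmetic concrete, a small sketch comparing the two
masks. The MODEL_* macros and both function names are hypothetical; the
mask values are assumed to match the kernel's IPTOS_TOS_MASK (0x1E) and
IPTOS_RT_MASK (IPTOS_TOS_MASK & ~3).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_IPTOS_TOS_MASK	0x1E				/* clears bit 0 only */
#define MODEL_IPTOS_RT_MASK	(MODEL_IPTOS_TOS_MASK & ~3)	/* 0x1C */

/* What RT_TOS() keeps: the ECT(0)/CE bit (0x02) survives the mask. */
static uint8_t tos_before_patch(uint8_t tos)
{
	return tos & MODEL_IPTOS_TOS_MASK;
}

/* What the patched code keeps: both ECN bits are cleared. */
static uint8_t tos_after_patch(uint8_t tos)
{
	uint8_t masked = tos & MODEL_IPTOS_RT_MASK;

	assert((masked & 0x03) == 0);
	return masked;
}
```

For the four values used with ping -Q in the reproducer (4, 5, 6, 7),
tos_before_patch() gives 4, 4, 6, 6, while tos_after_patch() gives 4 in all
four cases, so only the latter always matches a "tos 4" route.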

Reproducer:

  Create two netns, connected with a veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/32 dev veth10

  Add a route to ns1 in ns0:
  $ ip -netns ns0 route add 192.0.2.11/32 dev veth01

  In ns1, only packets with TOS 4 can be routed to ns0:
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10

  Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
  is 4:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 0% packet loss ...

  Now use iptables' rpfilter module in ns1:
  $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP

  Not-ECT and ECT(1) packets still pass:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...

  But ECT(0) and CE packets are dropped:
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 100% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 100% packet loss ...

After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.

Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault 
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 

---
 net/ipv4/netfilter/ipt_rpfilter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -76,7 +76,7 @@ static bool rpfilter_mt(const struct sk_
 	flow.daddr = iph->saddr;
 	flow.saddr = rpfilter_get_saddr(iph->daddr);
 	flow.flowi4_mark = info->flags & XT_RPFILTER_VALID_MARK ? skb->mark : 0;
-	flow.flowi4_tos = RT_TOS(iph->tos);
+	flow.flowi4_tos = iph->tos & IPTOS_RT_MASK;
 	flow.flowi4_scope = RT_SCOPE_UNIVERSE;
 	flow.flowi4_oif = l3mdev_master_ifindex_rcu(xt_in(par));
 




[PATCH 5.4 69/86] netfilter: rpfilter: mask ecn bits before fib lookup

2021-01-26 Thread Greg Kroah-Hartman
From: Guillaume Nault 

commit 2e5a6266fbb11ae93c468dfecab169aca9c27b43 upstream.

RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.

Reproducer:

  Create two netns, connected with a veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/32 dev veth10

  Add a route to ns1 in ns0:
  $ ip -netns ns0 route add 192.0.2.11/32 dev veth01

  In ns1, only packets with TOS 4 can be routed to ns0:
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10

  Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
  is 4:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 0% packet loss ...

  Now use iptables' rpfilter module in ns1:
  $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP

  Not-ECT and ECT(1) packets still pass:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...

  But ECT(0) and CE packets are dropped:
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 100% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 100% packet loss ...

After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.

Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault 
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 

---
 net/ipv4/netfilter/ipt_rpfilter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -76,7 +76,7 @@ static bool rpfilter_mt(const struct sk_
 	flow.daddr = iph->saddr;
 	flow.saddr = rpfilter_get_saddr(iph->daddr);
 	flow.flowi4_mark = info->flags & XT_RPFILTER_VALID_MARK ? skb->mark : 0;
-	flow.flowi4_tos = RT_TOS(iph->tos);
+	flow.flowi4_tos = iph->tos & IPTOS_RT_MASK;
 	flow.flowi4_scope = RT_SCOPE_UNIVERSE;
 	flow.flowi4_oif = l3mdev_master_ifindex_rcu(xt_in(par));
 




[PATCH 4.19 44/58] netfilter: rpfilter: mask ecn bits before fib lookup

2021-01-26 Thread Greg Kroah-Hartman
From: Guillaume Nault 

commit 2e5a6266fbb11ae93c468dfecab169aca9c27b43 upstream.

RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.

Reproducer:

  Create two netns, connected with a veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/32 dev veth10

  Add a route to ns1 in ns0:
  $ ip -netns ns0 route add 192.0.2.11/32 dev veth01

  In ns1, only packets with TOS 4 can be routed to ns0:
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10

  Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
  is 4:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 0% packet loss ...

  Now use iptables' rpfilter module in ns1:
  $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP

  Not-ECT and ECT(1) packets still pass:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
... 0% packet loss ...

  But ECT(0) and CE packets are dropped:
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
... 100% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
... 100% packet loss ...

After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.

Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault 
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 

---
 net/ipv4/netfilter/ipt_rpfilter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -94,7 +94,7 @@ static bool rpfilter_mt(const struct sk_
 	flow.daddr = iph->saddr;
 	flow.saddr = rpfilter_get_saddr(iph->daddr);
 	flow.flowi4_mark = info->flags & XT_RPFILTER_VALID_MARK ? skb->mark : 0;
-	flow.flowi4_tos = RT_TOS(iph->tos);
+	flow.flowi4_tos = iph->tos & IPTOS_RT_MASK;
 	flow.flowi4_scope = RT_SCOPE_UNIVERSE;
 	flow.flowi4_oif = l3mdev_master_ifindex_rcu(xt_in(par));
 




[PATCH 4.9 10/45] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms
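
The effect on source address selection can be sketched with a toy model of
the two routes that ns1 has towards 192.0.2.10 in the reproducer above.
This is not kernel code: struct toy_route, toy_lookup() and reply_src() are
hypothetical names, and the lookup is reduced to "first route whose TOS is
0 or equals the flow's TOS".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy route entry: tos 0 means "match any TOS". */
struct toy_route {
	uint8_t tos;
	const char *src;	/* preferred source address of the route */
};

/* The "tos 4" host route first, then the connected 192.0.2.0/24 route. */
static const struct toy_route toy_routes[] = {
	{ .tos = 4, .src = "192.0.2.21" },
	{ .tos = 0, .src = "192.0.2.11" },
};

/* Return the first route that accepts the (already masked) flow TOS. */
static const struct toy_route *toy_lookup(uint8_t flow_tos)
{
	size_t i;

	for (i = 0; i < sizeof(toy_routes) / sizeof(toy_routes[0]); i++) {
		if (toy_routes[i].tos == 0 || toy_routes[i].tos == flow_tos)
			return &toy_routes[i];
	}
	return NULL;
}

/* Source address chosen for the reply, given the packet TOS and a mask. */
static const char *reply_src(uint8_t tos, uint8_t mask)
{
	const struct toy_route *rt = toy_lookup(tos & mask);

	assert(rt != NULL);	/* the connected route matches any TOS */
	return rt->src;
}
```

In this model, reply_src(6, 0x1E) (the RT_TOS() mask) falls through to the
connected route and 192.0.2.11, while reply_src(6, 0x1C) (both ECN bits
cleared) selects the "tos 4" route and 192.0.2.21.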

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -292,7 +292,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 			.flowi4_iif = LOOPBACK_IFINDEX,
 			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+			.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 			.flowi4_scope = scope,
 			.flowi4_mark = vmark ? skb->mark : 0,
 		};




[PATCH 4.19 27/77] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -302,7 +302,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 			.flowi4_iif = LOOPBACK_IFINDEX,
 			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+			.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 			.flowi4_scope = scope,
 			.flowi4_mark = vmark ? skb->mark : 0,
 		};




[PATCH 5.4 33/92] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -302,7 +302,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 			.flowi4_iif = LOOPBACK_IFINDEX,
 			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+			.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 			.flowi4_scope = scope,
 			.flowi4_mark = vmark ? skb->mark : 0,
 		};




[PATCH 5.10 034/145] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -292,7 +292,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 			.flowi4_iif = LOOPBACK_IFINDEX,
 			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+			.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 			.flowi4_scope = scope,
 			.flowi4_mark = vmark ? skb->mark : 0,
 		};




[PATCH 4.14 17/57] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -292,7 +292,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 		.flowi4_iif = LOOPBACK_IFINDEX,
 		.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 		.daddr = ip_hdr(skb)->saddr,
-		.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+		.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 		.flowi4_scope = scope,
 		.flowi4_mark = vmark ? skb->mark : 0,
 	};




[PATCH 4.4 07/38] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()

2021-01-11 Thread Greg Kroah-Hartman
From: Guillaume Nault 

[ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/fib_frontend.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -299,7 +299,7 @@ __be32 fib_compute_spec_dst(struct sk_bu
 		.flowi4_iif = LOOPBACK_IFINDEX,
 		.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 		.daddr = ip_hdr(skb)->saddr,
-		.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+		.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 		.flowi4_scope = scope,
 		.flowi4_mark = vmark ? skb->mark : 0,
 	};




[PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
From: Eric Dumazet 

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()
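
As a reading aid, here is a small userspace model of the "pull before
use" guard. In the kernel, pskb_may_pull() returns true when the
requested number of bytes is available, so a guard that bails out on
short packets is normally written with a negation. Everything below
(struct fake_skb, fake_may_pull(), inner_header_ok()) is a hypothetical
stand-in and not the kernel API; it only models the return convention.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: tracks available bytes only. */
struct fake_skb {
	size_t len;
};

/* Models pskb_may_pull()'s convention: true when `need` bytes are there. */
static bool fake_may_pull(const struct fake_skb *skb, size_t need)
{
	return skb->len >= need;
}

/* Returns true if the inner header may be read, false for the error path. */
static bool inner_header_ok(const struct fake_skb *skb, size_t hdr_len)
{
	if (!fake_may_pull(skb, hdr_len))
		return false;	/* packet too short: take the error path */
	return true;
}
```

In this model a 20-byte packet passes a 20-byte header check, while a
10-byte packet is rejected.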

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
---
 drivers/net/geneve.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index ee38299f9c578..e0384609fb84a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -231,8 +231,20 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 
 	/* Ignore packet loops (and multicast echo) */
 	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
-		goto drop;
-
+		goto rx_error;
+
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
+	}
 	skb_reset_network_header(skb);
 
 	if (iph)
@@ -269,6 +281,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 
 	gro_cells_receive(&geneve->gro_cells, skb);
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);
-- 
2.27.0





[PATCH 4.9 21/45] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
From: Eric Dumazet 

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
---
 drivers/net/geneve.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index d89995f4bd433..e6f9fe7fa2a40 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -249,11 +249,21 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
 		skb_dst_set(skb, &tun_dst->dst);
 
 	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr)) {
-		geneve->dev->stats.rx_errors++;
-		goto drop;
+	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
+		goto rx_error;
+
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
 	}
-
 	oiph = skb_network_header(skb);
 	skb_reset_network_header(skb);
 
@@ -294,6 +304,8 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
 		u64_stats_update_end(&stats->syncp);
 	}
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);
-- 
2.27.0





[PATCH 4.14 04/31] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
From: Eric Dumazet 

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
---
 drivers/net/geneve.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index f48006c22a8a6..5eb7f409dc10b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -254,11 +254,21 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
 		skb_dst_set(skb, &tun_dst->dst);
 
 	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr)) {
-		geneve->dev->stats.rx_errors++;
-		goto drop;
+	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
+		goto rx_error;
+
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
 	}
-
 	oiph = skb_network_header(skb);
 	skb_reset_network_header(skb);
 
@@ -299,6 +309,8 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
 		u64_stats_update_end(&stats->syncp);
 	}
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);
-- 
2.27.0





Re: [PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
On Thu, Dec 10, 2020 at 03:32:12PM +0100, Eric Dumazet wrote:
> On Thu, Dec 10, 2020 at 3:26 PM Greg Kroah-Hartman
>  wrote:
> >
> > From: Eric Dumazet 
> >
> > IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
> > IP header is already pulled.
> >
> > geneve does not ensure this yet.
> >
> > Fixing this generically in IP_ECN_decapsulate() and
> > IP6_ECN_decapsulate() is not possible, since callers
> > pass a pointer that might be freed by pskb_may_pull()
> >
> > syzbot reported :
> >
> 
> Note that we had to revert this patch, so you can either scrap this
> backport, or make sure to backport the revert.

I'll drop it, thanks.  Odd I lost the upstream git id on this patch, let
me check what went wrong...

greg k-h


Re: [PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
On Thu, Dec 10, 2020 at 03:53:09PM +0100, Eric Dumazet wrote:
> On Thu, Dec 10, 2020 at 3:40 PM Greg Kroah-Hartman
>  wrote:
> >
> > On Thu, Dec 10, 2020 at 03:38:44PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Dec 10, 2020 at 03:32:12PM +0100, Eric Dumazet wrote:
> > > > On Thu, Dec 10, 2020 at 3:26 PM Greg Kroah-Hartman
> > > >  wrote:
> > > > >
> > > > > From: Eric Dumazet 
> > > > >
> > > > > IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
> > > > > IP header is already pulled.
> > > > >
> > > > > geneve does not ensure this yet.
> > > > >
> > > > > Fixing this generically in IP_ECN_decapsulate() and
> > > > > IP6_ECN_decapsulate() is not possible, since callers
> > > > > pass a pointer that might be freed by pskb_may_pull()
> > > > >
> > > > > syzbot reported :
> > > > >
> > > >
> > > > Note that we had to revert this patch, so you can either scrap this
> > > > backport, or make sure to backport the revert.
> > >
> > > I'll drop it, thanks.  Odd I lost the upstream git id on this patch, let
> > > me check what went wrong...
> >
> > What is the git id of the revert?  This ended up already in 4.19.y,
> > 5.4.y, and 5.9.y so needs to be reverted there.
> >
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c02bd115b1d25931159f89c7d9bf47a30f5d4b41

Thanks, I'll drop the patch from 4.4, 4.9, and 4.14, and queue up this
revert for the other trees now.

greg k-h


Re: [PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Greg Kroah-Hartman
On Thu, Dec 10, 2020 at 03:38:44PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 10, 2020 at 03:32:12PM +0100, Eric Dumazet wrote:
> > On Thu, Dec 10, 2020 at 3:26 PM Greg Kroah-Hartman
> >  wrote:
> > >
> > > From: Eric Dumazet 
> > >
> > > IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
> > > IP header is already pulled.
> > >
> > > geneve does not ensure this yet.
> > >
> > > Fixing this generically in IP_ECN_decapsulate() and
> > > IP6_ECN_decapsulate() is not possible, since callers
> > > pass a pointer that might be freed by pskb_may_pull()
> > >
> > > syzbot reported :
> > >
> > 
> > Note that we had to revert this patch, so you can either scrap this
> > backport, or make sure to backport the revert.
>
> I'll drop it, thanks.  Odd I lost the upstream git id on this patch, let
> me check what went wrong...

What is the git id of the revert?  This ended up already in 4.19.y,
5.4.y, and 5.9.y so needs to be reverted there.

thanks,

greg k-h


Re: [PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Eric Dumazet
On Thu, Dec 10, 2020 at 3:40 PM Greg Kroah-Hartman
 wrote:
>
> On Thu, Dec 10, 2020 at 03:38:44PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Dec 10, 2020 at 03:32:12PM +0100, Eric Dumazet wrote:
> > > On Thu, Dec 10, 2020 at 3:26 PM Greg Kroah-Hartman
> > >  wrote:
> > > >
> > > > From: Eric Dumazet 
> > > >
> > > > IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
> > > > IP header is already pulled.
> > > >
> > > > geneve does not ensure this yet.
> > > >
> > > > Fixing this generically in IP_ECN_decapsulate() and
> > > > IP6_ECN_decapsulate() is not possible, since callers
> > > > pass a pointer that might be freed by pskb_may_pull()
> > > >
> > > > syzbot reported :
> > > >
> > >
> > > Note that we had to revert this patch, so you can either scrap this
> > > backport, or make sure to backport the revert.
> >
> > I'll drop it, thanks.  Odd I lost the upstream git id on this patch, let
> > me check what went wrong...
>
> What is the git id of the revert?  This ended up already in 4.19.y,
> 5.4.y, and 5.9.y so needs to be reverted there.
>

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c02bd115b1d25931159f89c7d9bf47a30f5d4b41

Thanks !

> thanks,
>
> greg k-h


Re: [PATCH 4.4 15/39] geneve: pull IP header before ECN decapsulation

2020-12-10 Thread Eric Dumazet
On Thu, Dec 10, 2020 at 3:26 PM Greg Kroah-Hartman
 wrote:
>
> From: Eric Dumazet 
>
> IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
> IP header is already pulled.
>
> geneve does not ensure this yet.
>
> Fixing this generically in IP_ECN_decapsulate() and
> IP6_ECN_decapsulate() is not possible, since callers
> pass a pointer that might be freed by pskb_may_pull()
>
> syzbot reported :
>

Note that we had to revert this patch, so you can either scrap this
backport, or make sure to backport the revert.

Thanks !


[PATCH 5.9 24/46] geneve: pull IP header before ECN decapsulation

2020-12-06 Thread Greg Kroah-Hartman
From: Eric Dumazet 

[ Upstream commit 4179b00c04d18ea7013f68d578d80f3c9d13150a ]

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/geneve.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -258,11 +258,21 @@ static void geneve_rx(struct geneve_dev
 		skb_dst_set(skb, &tun_dst->dst);
 
 	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr)) {
-		geneve->dev->stats.rx_errors++;
-		goto drop;
-	}
+	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
+		goto rx_error;
 
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
+	}
 	oiph = skb_network_header(skb);
 	skb_reset_network_header(skb);
 
@@ -303,6 +313,8 @@ static void geneve_rx(struct geneve_dev
 		u64_stats_update_end(&stats->syncp);
 	}
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);




[PATCH 5.4 22/39] geneve: pull IP header before ECN decapsulation

2020-12-06 Thread Greg Kroah-Hartman
From: Eric Dumazet 

[ Upstream commit 4179b00c04d18ea7013f68d578d80f3c9d13150a ]

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/geneve.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -254,11 +254,21 @@ static void geneve_rx(struct geneve_dev
 		skb_dst_set(skb, &tun_dst->dst);
 
 	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr)) {
-		geneve->dev->stats.rx_errors++;
-		goto drop;
-	}
+	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
+		goto rx_error;
 
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
+	}
 	oiph = skb_network_header(skb);
 	skb_reset_network_header(skb);
 
@@ -299,6 +309,8 @@ static void geneve_rx(struct geneve_dev
 		u64_stats_update_end(&stats->syncp);
 	}
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);




[PATCH 4.19 18/32] geneve: pull IP header before ECN decapsulation

2020-12-06 Thread Greg Kroah-Hartman
From: Eric Dumazet 

[ Upstream commit 4179b00c04d18ea7013f68d578d80f3c9d13150a ]

IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume
IP header is already pulled.

geneve does not ensure this yet.

Fixing this generically in IP_ECN_decapsulate() and
IP6_ECN_decapsulate() is not possible, since callers
pass a pointer that might be freed by pskb_may_pull()

syzbot reported :

BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
 INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
 __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:449 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
 __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
 process_backlog+0x523/0xc10 net/core/dev.c:6319
 napi_poll+0x420/0x1010 net/core/dev.c:6763
 net_rx_action+0x35c/0xd40 net/core/dev.c:6833
 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
 asm_call_irq_on_stack+0xf/0x20
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:343 [inline]
 __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
 __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
 packet_snd net/packet/af_packet.c:2992 [inline]
 packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 __sys_sendto+0x9dc/0xc80 net/socket.c:1992
 __do_sys_sendto net/socket.c:2004 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:2000
 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.duma...@gmail.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/geneve.c |   20 
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -256,11 +256,21 @@ static void geneve_rx(struct geneve_dev
 		skb_dst_set(skb, &tun_dst->dst);
 
 	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr)) {
-		geneve->dev->stats.rx_errors++;
-		goto drop;
-	}
+	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
+		goto rx_error;
 
+	switch (skb_protocol(skb, true)) {
+	case htons(ETH_P_IP):
+		if (pskb_may_pull(skb, sizeof(struct iphdr)))
+			goto rx_error;
+		break;
+	case htons(ETH_P_IPV6):
+		if (pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			goto rx_error;
+		break;
+	default:
+		goto rx_error;
+	}
 	oiph = skb_network_header(skb);
 	skb_reset_network_header(skb);
 
@@ -301,6 +311,8 @@ static void geneve_rx(struct geneve_dev
 		u64_stats_update_end(&stats->syncp);
 	}
 	return;
+rx_error:
+	geneve->dev->stats.rx_errors++;
 drop:
 	/* Consume bad packet */
 	kfree_skb(skb);




[PATCH 5.7 022/265] tcp: don't ignore ECN CWR on pure ACK

2020-06-29 Thread Sasha Levin
From: Denis Kirjanov 

[ Upstream commit 2570284060b48f3f79d8f1a2698792f36c385e9a ]

When the CWR flag is set on an incoming pure ACK segment, it is ignored,
and as a result the ECE flag stays latched forever.

The following packetdrill script shows what happens:

// Stack receives incoming segments with CE set
+0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce]. 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535

// Stack responds with ECN ECHO
+0.0 >[noecn]  . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001

// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001

// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535

// Since CWR was sent, this packet should NOT have ECE set

+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0]  P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script

In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.
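
The latch described above can be sketched as a tiny state machine.
This is a simplified model, not the kernel code: struct toy_sock,
struct toy_seg, the TOY_* flags and the toy_* functions are all
hypothetical names introduced for illustration. It shows the two
points of the change: CWR is honoured on any incoming segment,
including a pure ACK, and an immediate ACK is requested only when the
segment carries data, so that an ACK is never sent in response to an
ACK.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DEMAND_CWR 0x1   /* receiver keeps echoing ECE while set */
#define TOY_ACK_NOW    0x2   /* an immediate ACK has been requested */

/* Hypothetical, heavily reduced receiver state. */
struct toy_sock {
	unsigned int ecn_flags;
	unsigned int ack_pending;
};

/* Hypothetical segment summary: sequence range and the CWR flag. */
struct toy_seg {
	uint32_t seq;
	uint32_t end_seq;    /* seq == end_seq means a pure ACK */
	bool cwr;
};

/* A CE-marked segment arrived: start echoing ECE. */
static void toy_ce_seen(struct toy_sock *sk)
{
	sk->ecn_flags |= TOY_DEMAND_CWR;
}

/* Called for every incoming segment, data or pure ACK. */
static void toy_accept_cwr(struct toy_sock *sk, const struct toy_seg *seg)
{
	if (!seg->cwr)
		return;

	/* The peer has reduced its window: stop echoing ECE. */
	sk->ecn_flags &= ~TOY_DEMAND_CWR;

	/* Only ACK immediately if the segment carried data. */
	if (seg->seq != seg->end_seq)
		sk->ack_pending |= TOY_ACK_NOW;
}

/* Would the next outgoing segment carry ECE? */
static bool toy_should_send_ece(const struct toy_sock *sk)
{
	return (sk->ecn_flags & TOY_DEMAND_CWR) != 0;
}
```

If toy_accept_cwr() were called only for data segments, a peer that
sets CWR on a pure ACK would never clear TOY_DEMAND_CWR in this model,
which corresponds to the latched ECE seen in the script.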

v3:
- Add a sequence check to avoid sending an ACK to an ACK

v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets

Signed-off-by: Denis Kirjanov 
Acked-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 29c6fc8c77168..ccab8bc29e2b1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -261,7 +261,8 @@ static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
 		 * cwnd may be very low (even just 1 packet), so we should ACK
 		 * immediately.
 		 */
-		inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
+		if (TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq)
+			inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
 	}
 }
 
@@ -3683,6 +3684,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		tcp_in_ack_event(sk, ack_ev_flags);
 	}
 
+	/* This is a deviation from RFC3168 since it states that:
+	 * "When the TCP data sender is ready to set the CWR bit after reducing
+	 * the congestion window, it SHOULD set the CWR bit only on the first
+	 * new data packet that it transmits."
+	 * We accept CWR on pure ACKs to be more robust
+	 * with widely-deployed TCP implementations that do this.
+	 */
+	tcp_ecn_accept_cwr(sk, skb);
+
 	/* We passed data and got it acked, remove any soft error
 	 * log. Something worked...
 	 */
@@ -4780,8 +4790,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
-	tcp_ecn_accept_cwr(sk, skb);
-
 	tp->rx_opt.dsack = 0;
 
 	/*  Queue data for delivery to the user.
-- 
2.25.1



[PATCH 4.19 028/131] tcp: don't ignore ECN CWR on pure ACK

2020-06-29 Thread Sasha Levin
From: Denis Kirjanov 

[ Upstream commit 2570284060b48f3f79d8f1a2698792f36c385e9a ]

When the CWR flag is set on an incoming pure ACK segment, it is ignored,
and as a result the ECE flag stays latched forever.

The following packetdrill script shows what happens:

// Stack receives incoming segments with CE set
+0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce]. 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535

// Stack responds with ECN ECHO
+0.0 >[noecn]  . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001

// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001

// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535

// Since CWR was sent, this packet should NOT have ECE set

+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0]  P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script

In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.

v3:
- Add a sequence check to avoid sending an ACK to an ACK

v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets

Signed-off-by: Denis Kirjanov 
Acked-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 12e1ea7344d96..ee1b4804b40de 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -254,7 +254,8 @@ static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
 		 * cwnd may be very low (even just 1 packet), so we should ACK
 		 * immediately.
 		 */
-		inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
+		if (TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq)
+			inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
 	}
 }
 
@@ -3665,6 +3666,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		tcp_in_ack_event(sk, ack_ev_flags);
 	}
 
+	/* This is a deviation from RFC3168 since it states that:
+	 * "When the TCP data sender is ready to set the CWR bit after reducing
+	 * the congestion window, it SHOULD set the CWR bit only on the first
+	 * new data packet that it transmits."
+	 * We accept CWR on pure ACKs to be more robust
+	 * with widely-deployed TCP implementations that do this.
+	 */
+	tcp_ecn_accept_cwr(sk, skb);
+
 	/* We passed data and got it acked, remove any soft error
 	 * log. Something worked...
 	 */
@@ -4703,8 +4713,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
-	tcp_ecn_accept_cwr(sk, skb);
-
 	tp->rx_opt.dsack = 0;
 
 	/*  Queue data for delivery to the user.
-- 
2.25.1



[PATCH 5.4 018/178] tcp: don't ignore ECN CWR on pure ACK

2020-06-29 Thread Sasha Levin
From: Denis Kirjanov 

[ Upstream commit 2570284060b48f3f79d8f1a2698792f36c385e9a ]

When the CWR flag is set on an incoming pure ACK segment, it is ignored,
and as a result the ECE flag stays latched forever.

The following packetdrill script shows what happens:

// Stack receives incoming segments with CE set
+0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce]. 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535

// Stack responds with ECN ECHO
+0.0 >[noecn]  . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001

// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001

// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535

// Since CWR was sent, this packet should NOT have ECE set

+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0]  P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script

In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.

v3:
- Add a sequence check to avoid sending an ACK to an ACK

v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets

Signed-off-by: Denis Kirjanov 
Acked-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 677facbeed26a..cc8411c98f28a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -260,7 +260,8 @@ static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
 		 * cwnd may be very low (even just 1 packet), so we should ACK
 		 * immediately.
 		 */
-		inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
+		if (TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq)
+			inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
 	}
 }
 
@@ -3682,6 +3683,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		tcp_in_ack_event(sk, ack_ev_flags);
 	}
 
+	/* This is a deviation from RFC3168 since it states that:
+	 * "When the TCP data sender is ready to set the CWR bit after reducing
+	 * the congestion window, it SHOULD set the CWR bit only on the first
+	 * new data packet that it transmits."
+	 * We accept CWR on pure ACKs to be more robust
+	 * with widely-deployed TCP implementations that do this.
+	 */
+	tcp_ecn_accept_cwr(sk, skb);
+
 	/* We passed data and got it acked, remove any soft error
 	 * log. Something worked...
 	 */
@@ -4771,8 +4781,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
-	tcp_ecn_accept_cwr(sk, skb);
-
 	tp->rx_opt.dsack = 0;
 
 	/*  Queue data for delivery to the user.
-- 
2.25.1



[PATCH 5.6 053/118] wireguard: receive: use tunnel helpers for decapsulating ECN markings

2020-05-13 Thread Greg Kroah-Hartman
From: "Toke Høiland-Jørgensen"

[ Upstream commit eebabcb26ea1e3295704477c6cd4e772c96a9559 ]

WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.
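
For readers unfamiliar with the decapsulation rules, the following is
a sketch of the RFC 6040 table with the drop case left out, in line
with the no-drop choice described above. It is an illustration only:
toy_ecn_decap_no_drop() and the ECN_* enum names are hypothetical, and
the kernel helper used by the patch may differ in detail depending on
the kernel version.

```c
#include <stdint.h>

/* ECN codepoints as carried in the low two bits of TOS / traffic class. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* Sketch of the RFC 6040 decapsulation table, without the drop case:
 * returns the ECN codepoint the inner header should carry afterwards. */
static uint8_t toy_ecn_decap_no_drop(uint8_t outer_tos, uint8_t inner_tos)
{
	uint8_t outer = outer_tos & 0x3;
	uint8_t inner = inner_tos & 0x3;

	if (inner == ECN_NOT_ECT)
		return ECN_NOT_ECT;      /* never mark a non-ECN-capable flow */
	if (outer == ECN_CE)
		return ECN_CE;           /* congestion was seen in the tunnel */
	if (outer == ECN_ECT_1 && inner == ECN_ECT_0)
		return ECN_ECT_1;        /* RFC 6040: propagate ECT(1) */
	return inner;                    /* otherwise keep the inner marking */
}
```

The older RFC 3168 style behaviour corresponds to the first two
branches only: CE is copied inward when the inner packet is
ECN-capable, and nothing else changes.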

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans 
Cc: Dave Taht 
Cc: Rodney W. Grimes 
Signed-off-by: Toke Høiland-Jørgensen 
Signed-off-by: Jason A. Donenfeld 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/wireguard/receive.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -393,13 +393,11 @@ static void wg_packet_consume_data_done(
 		len = ntohs(ip_hdr(skb)->tot_len);
 		if (unlikely(len < sizeof(struct iphdr)))
 			goto dishonest_packet_size;
-		if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-			IP_ECN_set_ce(ip_hdr(skb));
+		INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds, ip_hdr(skb)->tos);
 	} else if (skb->protocol == htons(ETH_P_IPV6)) {
 		len = ntohs(ipv6_hdr(skb)->payload_len) +
 		      sizeof(struct ipv6hdr);
-		if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-			IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+		INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds, ipv6_get_dsfield(ipv6_hdr(skb)));
 	} else {
 		goto dishonest_packet_type;
 	}




[PATCH 4.19 15/63] net/mlx5e: Set ECN for received packets using CQE indication

2019-09-29 Thread Greg Kroah-Hartman
From: Natali Shechtman 

[ Upstream commit f007c13d4ad62f494c83897eda96437005df4a91 ]

In a multi-host (MH) NIC scheme, a single HW port serves multiple hosts
or sockets on the same host.
The HW uses a mechanism in the PCIe buffer which monitors
the amount of consumed PCIe buffers per host.
In a certain configuration, under congestion,
the HW emulates a switch doing ECN marking on packets, using an ECN
indication on the completion descriptor (CQE).

The driver needs to set the ECN bits on the packet SKB
so that the network stack can react to the marking; this commit does that.
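
The receive-side step described above can be sketched in user space
as follows. This is an illustration under stated assumptions, not the
driver code: toy_set_ce(), toy_handle_cqe() and struct toy_rx_stats
are hypothetical, only the IPv4 TOS byte is modelled, and the IPv4
header checksum update that a real helper has to perform is left out.

```c
#include <stdint.h>

#define TOY_CE_BIT_MASK 0x80   /* MSB of the descriptor byte, as in the patch */

/* Hypothetical per-queue counter. */
struct toy_rx_stats {
	unsigned long ecn_mark;
};

/* Set CE in an IPv4 TOS byte if the packet is ECN-capable.
 * Returns 1 if the packet carries CE afterwards, 0 if it is Not-ECT
 * and was therefore left untouched.
 * Note: a real implementation must also fix up the header checksum. */
static int toy_set_ce(uint8_t *tos)
{
	if ((*tos & 0x3) == 0)
		return 0;
	*tos |= 0x3;
	return 1;
}

/* Apply the descriptor's congestion indication to the packet. */
static void toy_handle_cqe(uint8_t ml_path, uint8_t *tos,
			   struct toy_rx_stats *stats)
{
	if (ml_path & TOY_CE_BIT_MASK)
		stats->ecn_mark += toy_set_ce(tos);
}
```

Packets that are not ECN-capable are deliberately left unmarked in
this sketch, so the counter only reflects packets that actually carry
CE after processing.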

Needed by downstream patch which fixes a mlx5 checksum issue.

Fixes: bbceefce9adf ("net/mlx5e: Support RX CHECKSUM_COMPLETE")
Signed-off-by: Natali Shechtman 
Reviewed-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   35 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |2 +
 3 files changed, 35 insertions(+), 5 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
@@ -688,12 +689,29 @@ static inline void mlx5e_skb_set_hash(st
 	skb_set_hash(skb, be32_to_cpu(cqe->rss_hash_result), ht);
 }
 
-static inline bool is_last_ethertype_ip(struct sk_buff *skb, int *network_depth)
+static inline bool is_last_ethertype_ip(struct sk_buff *skb, int *network_depth,
+					__be16 *proto)
 {
-	__be16 ethertype = ((struct ethhdr *)skb->data)->h_proto;
+	*proto = ((struct ethhdr *)skb->data)->h_proto;
+	*proto = __vlan_get_protocol(skb, *proto, network_depth);
+	return (*proto == htons(ETH_P_IP) || *proto == htons(ETH_P_IPV6));
+}
+
+static inline void mlx5e_enable_ecn(struct mlx5e_rq *rq, struct sk_buff *skb)
+{
+	int network_depth = 0;
+	__be16 proto;
+	void *ip;
+	int rc;
 
-	ethertype = __vlan_get_protocol(skb, ethertype, network_depth);
-	return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
+	if (unlikely(!is_last_ethertype_ip(skb, &network_depth, &proto)))
+		return;
+
+	ip = skb->data + network_depth;
+	rc = ((proto == htons(ETH_P_IP)) ? IP_ECN_set_ce((struct iphdr *)ip) :
+					 IP6_ECN_set_ce(skb, (struct ipv6hdr *)ip));
+
+	rq->stats->ecn_mark += !!rc;
 }
 
 static u32 mlx5e_get_fcs(const struct sk_buff *skb)
@@ -717,6 +735,7 @@ static inline void mlx5e_handle_csum(str
 {
 	struct mlx5e_rq_stats *stats = rq->stats;
 	int network_depth = 0;
+	__be16 proto;
 
 	if (unlikely(!(netdev->features & NETIF_F_RXCSUM)))
 		goto csum_none;
@@ -738,7 +757,7 @@ static inline void mlx5e_handle_csum(str
 	if (short_frame(skb->len))
 		goto csum_unnecessary;
 
-	if (likely(is_last_ethertype_ip(skb, &network_depth))) {
+	if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
 		skb->ip_summed = CHECKSUM_COMPLETE;
 		skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
 		if (network_depth > ETH_HLEN)
@@ -775,6 +794,8 @@ csum_none:
 	stats->csum_none++;
 }
 
+#define MLX5E_CE_BIT_MASK 0x80
+
 static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 				      u32 cqe_bcnt,
 				      struct mlx5e_rq *rq,
@@ -819,6 +840,10 @@ static inline void mlx5e_build_rx_skb(st
 	skb->mark = be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK;
 
 	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
+	/* checking CE bit in cqe - MSB in ml_path field */
+	if (unlikely(cqe->ml_path & MLX5E_CE_BIT_MASK))
+		mlx5e_enable_ecn(rq, skb);
+
 	skb->protocol = eth_type_trans(skb, netdev);
 }
 
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -53,6 +53,7 @@ static const struct counter_desc sw_stat
 
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_ecn_mark) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_removed_vlan_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_none) },
@@ -144,6 +145,7 @@ void mlx5e_grp_sw_update_stats(struct ml
s->rx_bytes += rq_stats->bytes;
s->rx_lro_packets += rq_stats->lro_packets;
s->rx_lro_bytes += rq_

[PATCH 3.18 72/85] tcp: add one more quick ack after after ECN events

2018-08-07 Thread Greg Kroah-Hartman
3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ]

Larry Brakmo's proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.
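
The effect of changing the argument from 1 to 2 can be pictured with
a toy model of the quick-ack budget. This is a sketch under loud
assumptions: struct toy_ack_state and both toy_* functions are
hypothetical, and the model ignores everything else the real stack
takes into account (receive window, pingpong mode, timers).

```c
#include <stdbool.h>

/* Hypothetical receiver state: how many ACKs may still go out
 * immediately instead of being delayed. */
struct toy_ack_state {
	unsigned int quick;
};

/* Raise the immediate-ACK budget to at least 'max_quickacks'. */
static void toy_enter_quickack_mode(struct toy_ack_state *st,
				    unsigned int max_quickacks)
{
	if (max_quickacks > st->quick)
		st->quick = max_quickacks;
}

/* Decide how to acknowledge one received segment.
 * Returns true for an immediate ACK, false for a delayed ACK. */
static bool toy_ack_segment(struct toy_ack_state *st)
{
	if (st->quick == 0)
		return false;
	st->quick--;
	return true;
}
```

With a budget of 1, the segment that triggered the ECN event is
acknowledged at once and the following one falls back to a delayed
ACK. With a budget of 2, the following segment is acknowledged
immediately as well, which matters when the sender's cwnd has dropped
to 1.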

Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet 
Reported-by: Neal Cardwell 
Cc: Lawrence Brakmo 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -232,7 +232,7 @@ static void __tcp_ecn_check_ce(struct so
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn(sk))
@@ -240,7 +240,7 @@ static void __tcp_ecn_check_ce(struct so
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 3.18 70/85] tcp: do not aggressively quick ack after ECN events

2018-08-07 Thread Greg Kroah-Hartman
3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ]

ECN signals currently force TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -230,7 +230,7 @@ static void __tcp_ecn_check_ce(struct tc
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -238,7 +238,7 @@ static void __tcp_ecn_check_ce(struct tc
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.4 111/124] tcp: do not aggressively quick ack after ECN events

2018-08-04 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ]

ECN signals currently force TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -237,7 +237,7 @@ static void __tcp_ecn_check_ce(struct tc
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -245,7 +245,7 @@ static void __tcp_ecn_check_ce(struct tc
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.4 113/124] tcp: add one more quick ack after after ECN events

2018-08-04 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ]

Larry Brakmo's proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.

Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet 
Reported-by: Neal Cardwell 
Cc: Lawrence Brakmo 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -239,7 +239,7 @@ static void __tcp_ecn_check_ce(struct so
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn(sk))
@@ -247,7 +247,7 @@ static void __tcp_ecn_check_ce(struct so
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.9 15/32] tcp: add one more quick ack after after ECN events

2018-08-04 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ]

Larry Brakmo's proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.

Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet 
Reported-by: Neal Cardwell 
Cc: Lawrence Brakmo 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -261,7 +261,7 @@ static void __tcp_ecn_check_ce(struct so
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn(sk))
@@ -269,7 +269,7 @@ static void __tcp_ecn_check_ce(struct so
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.9 13/32] tcp: do not aggressively quick ack after ECN events

2018-08-04 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ]

ECN signals currently force TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -259,7 +259,7 @@ static void __tcp_ecn_check_ce(struct tc
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -267,7 +267,7 @@ static void __tcp_ecn_check_ce(struct tc
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;

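For readers who want to see when the two call sites in the hunks above actually fire, the flag logic can be modelled in userspace. This is a sketch, not the kernel function: struct ecn_model and ecn_check_ce() are hypothetical names, the constants are written out with the values the kernel headers use for INET_ECN_* and TCP_ECN_*, the default branch is an assumption about code outside the quoted hunks, and the congestion-control callback is left out.

```c
/* ECN codepoints in the two low bits of the IP DS field. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3, ECN_MASK = 3 };

/* Per-connection ECN flags relevant to this code path. */
enum { ECN_FLAG_DEMAND_CWR = 4, ECN_FLAG_SEEN = 8 };

/* Hypothetical model state: the flags plus a counter of how often
 * quick-ack mode would have been entered.
 */
struct ecn_model {
	unsigned int flags;
	unsigned int quickack_requests;
};

/* Mirrors the switch around the patched lines for one received packet. */
static void ecn_check_ce(struct ecn_model *m, unsigned char dsfield)
{
	switch (dsfield & ECN_MASK) {
	case ECN_NOT_ECT:
		/* Not-ECT after ECT was seen: probably a retransmit. */
		if (m->flags & ECN_FLAG_SEEN)
			m->quickack_requests++;
		break;
	case ECN_CE:
		/* Only the first CE mark of an episode requests quick ACKs. */
		if (!(m->flags & ECN_FLAG_DEMAND_CWR)) {
			m->quickack_requests++;
			m->flags |= ECN_FLAG_DEMAND_CWR;
		}
		m->flags |= ECN_FLAG_SEEN;
		break;
	default:
		/* ECT(0) or ECT(1): assumed to only record that ECT was seen. */
		m->flags |= ECN_FLAG_SEEN;
		break;
	}
}
```

The model makes one point explicit: quick-ack mode is requested once per congestion episode, so the second argument of tcp_enter_quickack_mode() decides how many immediate ACKs that single request is worth.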



[PATCH 4.17 331/336] tcp: do not aggressively quick ack after ECN events

2018-08-01 Thread Greg Kroah-Hartman
4.17-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ]

ECN signals currently force TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -245,7 +245,7 @@ static void __tcp_ecn_check_ce(struct tc
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -253,7 +253,7 @@ static void __tcp_ecn_check_ce(struct tc
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.17 333/336] tcp: add one more quick ack after after ECN events

2018-08-01 Thread Greg Kroah-Hartman
4.17-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ]

Larry Brakmo's proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.

Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet 
Reported-by: Neal Cardwell 
Cc: Lawrence Brakmo 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -247,7 +247,7 @@ static void __tcp_ecn_check_ce(struct so
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn(sk))
@@ -255,7 +255,7 @@ static void __tcp_ecn_check_ce(struct so
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.14 244/246] tcp: do not aggressively quick ack after ECN events

2018-08-01 Thread Greg Kroah-Hartman
4.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ]

ECN signals currently force TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -259,7 +259,7 @@ static void __tcp_ecn_check_ce(struct tc
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -267,7 +267,7 @@ static void __tcp_ecn_check_ce(struct tc
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS);
+			tcp_enter_quickack_mode((struct sock *)tp, 1);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.14 246/246] tcp: add one more quick ack after after ECN events

2018-08-01 Thread Greg Kroah-Hartman
4.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ]

Larry Brakmo's proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.

Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet 
Reported-by: Neal Cardwell 
Cc: Lawrence Brakmo 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -261,7 +261,7 @@ static void __tcp_ecn_check_ce(struct so
 		 * it is probably a retransmit.
 		 */
 		if (tp->ecn_flags & TCP_ECN_SEEN)
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 		break;
 	case INET_ECN_CE:
 		if (tcp_ca_needs_ecn(sk))
@@ -269,7 +269,7 @@ static void __tcp_ecn_check_ce(struct so
 
 		if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
 			/* Better not delay acks, sender can have a very low cwnd */
-			tcp_enter_quickack_mode(sk, 1);
+			tcp_enter_quickack_mode(sk, 2);
 			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 		}
 		tp->ecn_flags |= TCP_ECN_SEEN;




[PATCH 4.15 14/84] tcp: allow TLP in ECN CWR

2018-03-23 Thread Greg Kroah-Hartman
4.15-stable review patch.  If anyone has any objections, please let me know.

--

From: Neal Cardwell 


[ Upstream commit b4f70c3d4ec32a2ff4c62e1e2da0da5f55fe12bd ]

This patch enables tail loss probe in cwnd reduction (CWR) state
to detect potential losses. Prior to this patch, since the sender
uses PRR to determine the cwnd in CWR state, the combination of
CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
defers sending to wait for more ACKs. The ACKs may not come due to
packet losses.

Disallowing TLP when there is unused cwnd had the primary effect
of disallowing TLP when there is TSO deferral, Nagle deferral,
or we hit the rwin limit. Because basically every application
write() or incoming ACK will cause us to run tcp_write_xmit()
to see if we can send more, and then if we sent something we call
tcp_schedule_loss_probe() to see if we should schedule a TLP. At
that point, there are a few common reasons why some cwnd budget
could still be unused:

(a) rwin limit
(b) nagle check
(c) TSO deferral
(d) TSQ

For (d), after the next packet tx completion the TSQ mechanism
will allow us to send more packets, so we don't really need a
TLP (in practice it shouldn't matter whether we schedule one
or not). But for (a), (b), (c) the sender won't send any more
packets until it gets another ACK. But if the whole flight was
lost, or all the ACKs were lost, then we won't get any more ACKs,
and ideally we should schedule and send a TLP to get more feedback.
In particular for a long time we have wanted some kind of timer for
TSO deferral, and at least this would give us some kind of timer

Reported-by: Steve Ibanez 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Nandita Dukkipati 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_output.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2440,15 +2440,12 @@ bool tcp_schedule_loss_probe(struct sock
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
-		return false;
-
-	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
-	     !tcp_write_queue_empty(sk))
+	    (icsk->icsk_ca_state != TCP_CA_Open &&
+	     icsk->icsk_ca_state != TCP_CA_CWR))
 		return false;
 
 	/* Probe timeout is 2*rtt. Add minimum RTO to account

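The change in eligibility can be compared side by side with a small userspace sketch. Everything here is illustrative: struct tlp_inputs, tlp_precheck_old() and tlp_precheck_new() are hypothetical names that flatten the socket fields read by the hunk above into plain values, and only the condition touched by the patch is modelled, not the rest of tcp_schedule_loss_probe().

```c
#include <stdbool.h>

/* Congestion-avoidance states, in the order the kernel enumerates them. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical flattened view of the fields the check reads. */
struct tlp_inputs {
	int early_retrans;		/* sysctl_tcp_early_retrans */
	unsigned int packets_out;
	bool sack_ok;			/* tcp_is_sack(tp) */
	enum ca_state ca_state;
	unsigned int snd_cwnd;
	unsigned int packets_in_flight;
	bool write_queue_empty;
};

/* Condition before the patch: Open state only, and no unused cwnd
 * while data is still waiting in the write queue.
 */
static bool tlp_precheck_old(const struct tlp_inputs *in)
{
	if ((in->early_retrans != 3 && in->early_retrans != 4) ||
	    !in->packets_out || !in->sack_ok ||
	    in->ca_state != CA_OPEN)
		return false;

	if (in->snd_cwnd > in->packets_in_flight && !in->write_queue_empty)
		return false;

	return true;
}

/* Condition after the patch: Open or CWR, unused cwnd no longer matters. */
static bool tlp_precheck_new(const struct tlp_inputs *in)
{
	if ((in->early_retrans != 3 && in->early_retrans != 4) ||
	    !in->packets_out || !in->sack_ok ||
	    (in->ca_state != CA_OPEN && in->ca_state != CA_CWR))
		return false;

	return true;
}
```

A connection in CWR with spare cwnd and queued data, the TSO-deferral case from the commit message, fails the old check and passes the new one, while Recovery and Loss are rejected by both.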



[PATCH AUTOSEL for 4.15 15/78] tcp: allow TLP in ECN CWR

2018-03-07 Thread Sasha Levin
From: Neal Cardwell 

[ Upstream commit b4f70c3d4ec32a2ff4c62e1e2da0da5f55fe12bd ]

This patch enables tail loss probe in cwnd reduction (CWR) state
to detect potential losses. Prior to this patch, since the sender
uses PRR to determine the cwnd in CWR state, the combination of
CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
defers sending to wait for more ACKs. The ACKs may not come due to
packet losses.

Disallowing TLP when there is unused cwnd had the primary effect
of disallowing TLP when there is TSO deferral, Nagle deferral,
or we hit the rwin limit. Because basically every application
write() or incoming ACK will cause us to run tcp_write_xmit()
to see if we can send more, and then if we sent something we call
tcp_schedule_loss_probe() to see if we should schedule a TLP. At
that point, there are a few common reasons why some cwnd budget
could still be unused:

(a) rwin limit
(b) nagle check
(c) TSO deferral
(d) TSQ

For (d), after the next packet tx completion the TSQ mechanism
will allow us to send more packets, so we don't really need a
TLP (in practice it shouldn't matter whether we schedule one
or not). But for (a), (b), (c) the sender won't send any more
packets until it gets another ACK. But if the whole flight was
lost, or all the ACKs were lost, then we won't get any more ACKs,
and ideally we should schedule and send a TLP to get more feedback.
In particular for a long time we have wanted some kind of timer for
TSO deferral, and at least this would give us some kind of timer

Reported-by: Steve Ibanez 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Nandita Dukkipati 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 net/ipv4/tcp_output.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c7b506..04be9f833927 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2414,15 +2414,12 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
-		return false;
-
-	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
-	     !tcp_write_queue_empty(sk))
+	    (icsk->icsk_ca_state != TCP_CA_Open &&
+	     icsk->icsk_ca_state != TCP_CA_CWR))
 		return false;
 
 	/* Probe timeout is 2*rtt. Add minimum RTO to account
-- 
2.14.1


Process phantom ECN event in TCP without CWR response

2017-05-23 Thread Lars Erik Storbukås
I'm trying to generate phantom ECN events to (manually) decrease the
transmission rate/throughput.

The signals are meant to be generated and received on a single host. I
don't want the ECN event to generate a CWR (Congestion Window Reduced)
response to the sender. I'm trying to think of ways to keep the TCP
code from entering the part of the ECN event handling where the response
to the sender is generated.

I have thought of two (possible) solutions:

1. Before the phantom ECN signal is generated, a FLAG is set,
indicating that a phantom ECN event is coming. Before entering the
part where the CWR response is generated, perform a check on whether
the FLAG is set or not (if set - do not enter CWR part).

2. Instead of generating ECN signals (modify incoming packets), use a
flag to indicate that the next incoming ACK is processed as if it were
an ECN signal (except entering the CWR part).
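
As a rough illustration of option 2, here is a user-space sketch in C.
Nothing in it is kernel API: struct cc_state, its fields and both
functions are hypothetical, and the halving in ecn_reduce() is only an
assumed stand-in for whatever the CC algorithm does on an ECN signal.

```c
#include <stdbool.h>

/* Hypothetical per-connection state; not a kernel structure. */
struct cc_state {
	unsigned int cwnd;
	unsigned int ssthresh;
	bool phantom_ecn_pending; /* set locally, never taken from a packet */
	bool cwr_pending;         /* models "a CWR response must be sent" */
};

/* Assumed reaction to an ECN signal: halve cwnd, with a floor of 2. */
static void ecn_reduce(struct cc_state *s)
{
	unsigned int half = s->cwnd / 2;

	s->ssthresh = half < 2 ? 2 : half;
	s->cwnd = s->ssthresh;
}

/* Process one incoming ACK; ece is the ECN-Echo bit it really carried. */
static void on_ack(struct cc_state *s, bool ece)
{
	bool phantom = s->phantom_ecn_pending;

	s->phantom_ecn_pending = false; /* the flag covers one ACK only */

	if (phantom || ece)
		ecn_reduce(s);

	/* Only a real ECE schedules the CWR response. */
	if (ece)
		s->cwr_pending = true;
}
```

Setting phantom_ecn_pending before the next ACK arrives gives the rate
reduction without the CWR part; a genuine ECE still takes the normal path.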


Any input on how to implement this, or pointers to where to look for
similar solutions, would be greatly appreciated.

...

For those who are interested in why I'm trying to achieve this:

I'm working on the implementation of a Deadline Aware, Less than Best
Effort framework: a framework for adding both LBE behaviour and
awareness of “soft” delivery deadlines to any congestion control (CC)
algorithm, whether loss-based, delay-based or explicit
signaling-based. This effectively allows it to turn an arbitrary CC
protocol into a scavenger protocol that dynamically adapts its sending
rate to network conditions and remaining time before the deadline, to
balance timeliness and transmission aggressiveness.

/ Lars Erik Storbukås (storbukas@gmail.com)


Process phantom ECN event in TCP without CWR response

2017-05-22 Thread Lars Erik Storbukås
I'm trying to generate phantom ECN events to (manually) decrease the
transmission rate/throughput.

The signals are meant to be generated and received on a single host. I
don't want the ECN event to generate a CWR (Congestion Window Reduced)
response to the sender. I'm trying to think of ways to keep the TCP
code from entering the part of the ECN event handling where the response
to the sender is generated.

I have thought of two (possible) solutions:

1. Before the phantom ECN signal is generated, a FLAG is set,
indicating that a phantom ECN event is coming. Before entering the
part where the CWR response is generated, perform a check on whether
the FLAG is set or not (if set - do not enter CWR part).

2. Instead of generating ECN signals (modify incoming packets), use a
flag to indicate that the next incoming ACK is processed as if it were
an ECN signal (except entering the CWR part).


Any input on how to implement this, or pointers to where to look for
similar solutions, would be greatly appreciated.

...

For those who are interested in why I'm trying to achieve this:

I'm working on the implementation of a Deadline Aware, Less than Best
Effort framework: a framework for adding both LBE behaviour and
awareness of “soft” delivery deadlines to any congestion control (CC)
algorithm, whether loss-based, delay-based or explicit
signaling-based. This effectively allows it to turn an arbitrary CC
protocol into a scavenger protocol that dynamically adapts its sending
rate to network conditions and remaining time before the deadline, to
balance timeliness and transmission aggressiveness.

/ Lars Erik Storbukås (storbukas@gmail.com)


[PATCH 3.2 160/164] tcp: be more strict before accepting ECN negociation

2014-12-11 Thread Ben Hutchings
3.2.65-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24 upstream.

It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transfers.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negotiation if it happens we receive mangled CT bits in
the SYN packet.
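
The rule can be shown in isolation with a small user-space sketch in C.
This is not the kernel helper: syn_accepts_ecn() and the ECN_* macros are
illustrative names, and the only assumption carried over is that the ECN
field is the low two bits of the IP DS field, with 00 meaning Not-ECT.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECN_MASK    0x03 /* low two bits of the IP DS field */
#define ECN_NOT_ECT 0x00 /* codepoint 00: not ECN-capable transport */

static bool ecn_is_not_ect(uint8_t dsfield)
{
	return (dsfield & ECN_MASK) == ECN_NOT_ECT;
}

/* Decide whether a received SYN may negotiate ECN.
 * ece and cwr are the TCP header flags of the SYN, dsfield is the
 * DS field of the IP header that carried it.
 */
static bool syn_accepts_ecn(bool sysctl_ecn, bool ece, bool cwr,
			    uint8_t dsfield)
{
	return sysctl_ecn && ece && cwr && ecn_is_not_ect(dsfield);
}
```

A SYN with ECE and CWR set and a clean DS field (for example 0x00, or
0xB8 where only DSCP bits are set) is accepted; the same SYN arriving
with 0x02 or 0x03 in the DS field is treated as mangled and ECN stays off.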

Signed-off-by: Eric Dumazet 
Cc: Perry Lorier 
Cc: Matt Mathis 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
Cc: Wilmer van der Gaast 
Cc: Ankur Jain 
Cc: Tom Herbert 
Cc: Dave Täht 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
Cc: Florian Westphal 
---
 include/net/tcp.h   | 23 ---
 net/ipv4/tcp_ipv4.c |  2 +-
 net/ipv6/tcp_ipv6.c |  2 +-
 3 files changed, 18 insertions(+), 9 deletions(-)

--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -358,13 +358,6 @@ static inline void tcp_dec_quickack_mode
 #define TCP_ECN_DEMAND_CWR  4
 #define TCP_ECN_SEEN        8
 
-static __inline__ void
-TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th)
-{
-   if (sysctl_tcp_ecn && th->ece && th->cwr)
-   inet_rsk(req)->ecn_ok = 1;
-}
-
 enum tcp_tw_status {
TCP_TW_SUCCESS = 0,
TCP_TW_RST = 1,
@@ -652,6 +645,22 @@ struct tcp_skb_cb {
 
 #define TCP_SKB_CB(__skb)  ((struct tcp_skb_cb *)&((__skb)->cb[0]))
 
+/* RFC3168 : 6.1.1 SYN packets must not have ECT/ECN bits set
+ *
+ * If we receive a SYN packet with these bits set, it means a network is
+ * playing bad games with TOS bits. In order to avoid possible false congestion
+ * notifications, we disable TCP ECN negociation.
+ */
+static inline void
+TCP_ECN_create_request(struct request_sock *req, const struct sk_buff *skb)
+{
+   const struct tcphdr *th = tcp_hdr(skb);
+
+   if (sysctl_tcp_ecn && th->ece && th->cwr &&
+   INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield))
+   inet_rsk(req)->ecn_ok = 1;
+}
+
 /* Due to TSO, an SKB can be composed of multiple actual
  * packets.  To keep these tracked properly, we use this.
  */
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,7 +1352,7 @@ int tcp_v4_conn_request(struct sock *sk,
goto drop_and_free;
 
if (!want_cookie || tmp_opt.tstamp_ok)
-   TCP_ECN_create_request(req, tcp_hdr(skb));
+   TCP_ECN_create_request(req, skb);
 
if (want_cookie) {
isn = cookie_v4_init_sequence(sk, skb, &req->mss);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1254,7 +1254,7 @@ static int tcp_v6_conn_request(struct so
ipv6_addr_copy(&treq->rmt_addr, &ipv6_hdr(skb)->saddr);
ipv6_addr_copy(&treq->loc_addr, &ipv6_hdr(skb)->daddr);
if (!want_cookie || tmp_opt.tstamp_ok)
-   TCP_ECN_create_request(req, tcp_hdr(skb));
+   TCP_ECN_create_request(req, skb);
 
treq->iif = sk->sk_bound_dev_if;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3.16 338/357] ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding

2014-10-03 Thread Greg Kroah-Hartman
3.16-stable review patch.  If anyone has any objections, please let me know.

--

From: Alex Gartrell 

commit 76f084bc10004b3050b2cff9cfac29148f1f6088 upstream.

Previously, only the four high bits of the tclass were maintained in the
ipv6 case.  This matches the behavior of ipv4, though whether or not we
should reflect ECN bits may be up for debate.
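
For readers who want to see which bits were being dropped, here is a
user-space sketch in C that works on the first four bytes of an IPv6
header. The helper names are made up for illustration; the only facts
assumed are the header layout (Version:4, Traffic Class:8, Flow Label:20)
and that the ECN field is the low two bits of the Traffic Class.

```c
#include <stdint.h>

/* Traffic Class straddles the low nibble of byte 0 and the high nibble
 * of byte 1 of the IPv6 header.
 */
static uint8_t get_tclass(const uint8_t hdr[4])
{
	return (uint8_t)((hdr[0] << 4) | (hdr[1] >> 4));
}

static void set_tclass(uint8_t hdr[4], uint8_t tclass)
{
	hdr[0] = (uint8_t)((hdr[0] & 0xF0) | (tclass >> 4));
	hdr[1] = (uint8_t)((hdr[1] & 0x0F) | (tclass << 4));
}

/* Old behaviour: copy only the 4-bit "priority" nibble, zero the rest. */
static void copy_priority_only(uint8_t out[4], const uint8_t in[4])
{
	out[0] = (uint8_t)(0x60 | (in[0] & 0x0F));
	out[1] = out[2] = out[3] = 0;
}

/* New behaviour: zero the flow label, then copy the whole Traffic Class. */
static void copy_full_tclass(uint8_t out[4], const uint8_t in[4])
{
	out[0] = 0x60;
	out[1] = out[2] = out[3] = 0;
	set_tclass(out, get_tclass(in));
}
```

With an inner Traffic Class of 0xB9 (DSCP EF plus ECN codepoint 01), the
priority-only copy yields 0xB0 on the outer header, losing the low DSCP
bits and the ECN field; the full copy yields 0xB9.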

Signed-off-by: Alex Gartrell 
Acked-by: Julian Anastasov 
Signed-off-by: Simon Horman 
Signed-off-by: Greg Kroah-Hartman 

---
 net/netfilter/ipvs/ip_vs_xmit.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -967,8 +967,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb
iph->nexthdr        =   IPPROTO_IPV6;
iph->payload_len    =   old_iph->payload_len;
be16_add_cpu(&iph->payload_len, sizeof(*old_iph));
-   iph->priority   =   old_iph->priority;
memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl));
+   ipv6_change_dsfield(iph, 0, ipv6_get_dsfield(old_iph));
iph->daddr = cp->daddr.in6;
iph->saddr = saddr;
iph->hop_limit  =   old_iph->hop_limit;




[PATCH 3.10 129/143] ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding

2014-10-03 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Alex Gartrell 

commit 76f084bc10004b3050b2cff9cfac29148f1f6088 upstream.

Previously, only the four high bits of the tclass were maintained in the
ipv6 case.  This matches the behavior of ipv4, though whether or not we
should reflect ECN bits may be up for debate.

Signed-off-by: Alex Gartrell 
Acked-by: Julian Anastasov 
Signed-off-by: Simon Horman 
Signed-off-by: Greg Kroah-Hartman 

---
 net/netfilter/ipvs/ip_vs_xmit.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -967,8 +967,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb
iph->nexthdr        =   IPPROTO_IPV6;
iph->payload_len    =   old_iph->payload_len;
be16_add_cpu(&iph->payload_len, sizeof(*old_iph));
-   iph->priority   =   old_iph->priority;
memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl));
+   ipv6_change_dsfield(iph, 0, ipv6_get_dsfield(old_iph));
iph->daddr = cp->daddr.in6;
iph->saddr = saddr;
iph->hop_limit  =   old_iph->hop_limit;




[PATCH 3.14 224/238] ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding

2014-10-03 Thread Greg Kroah-Hartman
3.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Alex Gartrell 

commit 76f084bc10004b3050b2cff9cfac29148f1f6088 upstream.

Previously, only the four high bits of the tclass were maintained in the
ipv6 case.  This matches the behavior of ipv4, though whether or not we
should reflect ECN bits may be up for debate.

Signed-off-by: Alex Gartrell 
Acked-by: Julian Anastasov 
Signed-off-by: Simon Horman 
Signed-off-by: Greg Kroah-Hartman 

---
 net/netfilter/ipvs/ip_vs_xmit.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -967,8 +967,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb
iph->nexthdr        =   IPPROTO_IPV6;
iph->payload_len    =   old_iph->payload_len;
be16_add_cpu(&iph->payload_len, sizeof(*old_iph));
-   iph->priority   =   old_iph->priority;
memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl));
+   ipv6_change_dsfield(iph, 0, ipv6_get_dsfield(old_iph));
iph->daddr = cp->daddr.in6;
iph->saddr = saddr;
iph->hop_limit  =   old_iph->hop_limit;




[ 67/85] ipv6: GRO should be ECN friendly

2012-10-25 Thread Greg Kroah-Hartman
3.6-stable review patch.  If anyone has any objections, please let me know.

--


From: Eric Dumazet 

[ Upstream commit 51ec04038c113a811b177baa85d293feff9ce995 ]

IPv4 side of the problem was addressed in commit a9e050f4e7f9d
(net: tcp: GRO should be ECN friendly)

This patch does the same, but for IPv6 : A Traffic Class mismatch
doesn't mean flows are different, but instead should force a flush
of previous packets.

This patch removes an artificial packet reordering problem.
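
The comparison can be sketched in user-space C as follows. The enum, the
macro names and compare_first_word() are illustrative only, the values
are taken in host byte order for simplicity, and the masks are derived
from the first-word layout Version:4, Traffic Class:8, Flow Label:20.
The rest of the header comparison done by the real code is not modeled.

```c
#include <stdint.h>

/* First 32-bit word of an IPv6 header, host byte order:
 * Version (4 bits) | Traffic Class (8 bits) | Flow Label (20 bits)
 */
#define V6_VERSION_FLOW_MASK 0xF00FFFFFu
#define V6_TCLASS_MASK       0x0FF00000u

enum gro_verdict {
	GRO_DIFFERENT_FLOW,  /* must not be merged with this flow */
	GRO_SAME_FLOW_FLUSH, /* same flow, but flush held packets first */
	GRO_SAME_FLOW_MERGE  /* same flow, first word identical */
};

static enum gro_verdict compare_first_word(uint32_t a, uint32_t b)
{
	uint32_t diff = a ^ b; /* set bits mark fields that differ */

	if (diff & V6_VERSION_FLOW_MASK)
		return GRO_DIFFERENT_FLOW;
	if (diff & V6_TCLASS_MASK)
		return GRO_SAME_FLOW_FLUSH;
	return GRO_SAME_FLOW_MERGE;
}
```

Two packets whose first words differ only in the ECN bits, for example
0x60012345 and 0x60212345, stay in the same flow and only trigger a
flush, so packet order within the flow is kept.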

Signed-off-by: Eric Dumazet 
Cc: Herbert Xu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv6/af_inet6.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -880,22 +880,25 @@ static struct sk_buff **ipv6_gro_receive
nlen = skb_network_header_len(skb);
 
for (p = *head; p; p = p->next) {
-   struct ipv6hdr *iph2;
+   const struct ipv6hdr *iph2;
+   __be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
 
if (!NAPI_GRO_CB(p)->same_flow)
continue;
 
iph2 = ipv6_hdr(p);
+   first_word = *(__be32 *)iph ^ *(__be32 *)iph2 ;
 
-   /* All fields must match except length. */
+   /* All fields must match except length and Traffic Class. */
if (nlen != skb_network_header_len(p) ||
-   memcmp(iph, iph2, offsetof(struct ipv6hdr, payload_len)) ||
+   (first_word & htonl(0xF00FFFFF)) ||
memcmp(&iph->nexthdr, &iph2->nexthdr,
   nlen - offsetof(struct ipv6hdr, nexthdr))) {
NAPI_GRO_CB(p)->same_flow = 0;
continue;
}
-
+   /* flush if Traffic Class fields are different */
+   NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
NAPI_GRO_CB(p)->flush |= flush;
}
 




Re: FYI: ECN approved as Standard

2001-06-14 Thread Ralf Baechle

On Thu, Jun 14, 2001 at 01:33:53PM +0200, Anders Peter Fugmann wrote:

> Great to hear, but I cannot find anything that backs it up.
> I really want to see the final RFC.
> 
> Perhaps you could send me an URL pointing to it?

Usually takes a few days until the RFC editor will announce and
publish the new RFC.

  Ralf



Re: FYI: ECN approved as Standard

2001-06-14 Thread Anders Peter Fugmann

Hi Jamal

Great to hear, but I cannot find anything that backs it up.
I really want to see the final RFC.

Perhaps you could send me an URL pointing to it?

TIA
Anders Fugmann

jamal wrote:

> 
> The IESG approved ECN as a proposed standard on the 12th of June.
> That means as of now, anyone blocking ECN bits is considered to be
> blaspheming.
> 
> 
> cheers,
> jamal
> 





FYI: ECN approved as Standard

2001-06-13 Thread jamal



The IESG approved ECN as a proposed standard on the 12th of June.
That means as of now, anyone blocking ECN bits is considered to be
blaspheming.


cheers,
jamal




Re: ECN is on!

2001-05-25 Thread Steve Modica

Rogier Wolff wrote:
> The "we'll turn it on in February" warning is worth NOTHING in this
> situation: February comes and goes. March comes and goes. Everybody
> who read the warning will think: Ok, so I must be fine.
> 
> A warning of the form: "ECN will go on as soon as this message clears
> the queues" would've been useful, as thousands (hundreds?) suddenly get
> nothing anymore.
> 

I agree with this line of thinking.  The various academics studying
geology have been warning California about "The Big One" for years now,
and no one seems to care anymore.  

I don't think anyone's being lazy and I certainly don't have the
information to comment on the size of their butts.  So I'd rather just
assume they were working very hard on other things (like getting TPC-H
benchmarks to run!)

Steve

-- 
Steve Modica
Manager - Networking Drivers Group
"Give a man a fish, and he will eat for a day, hit him with a fish and
he leaves you alone" - me



Re: ECN is on!

2001-05-23 Thread Matti Aarnio

  Folks, I hereby declare this topic ("ECN is on") TABOO. If
  you want to continue discussing it, do that on linux-kernel
  WITH A NEW TOPIC.

  My original message had Reply-To pointing to linux-kernel,
  but all it takes is a single person to ignore that...

  Spare the other lists; my original intention was to "spread
  the word", as not everybody subscribes to linux-kernel ...

/Matti Aarnio



Re: ECN is on!

2001-05-22 Thread Matthias Andree

Richard Gooch <[EMAIL PROTECTED]> writes:

> Sure, Dave is being bloody-minded, but that's the only way we'll see
> people get off their fat, lazy asses and fix their broken systems.
> In fact, hopefully he's still in a dark mood, and he may take up the
> suggestion to bounce mails of the following type:
> - MIME encoded
> - HTML encoded
> - quoted printables (those stupid "=20" things are particularly hard to
>   read).

MIME is not an encoding, but a way to declare mail contents and encode
binary data. You need not use it on mail you send.

HTML is not an encoding either. (No doubt it's usually sent by people
without A Clue[tm] or being ruthless.)

quoted-printable is an encoding, and it has probably been around for ten
years now. I can send base64 if you like that better, but then, even more
people will cry, while others won't even notice.

Gnus 5.8 + Emacs, mutt, Netscape Communicator are three packages which
deal with MIME-"enhanced" mail.

Plus, people who use any characters beyond ASCII have no real choice
but to use MIME; if they have MTAs in between that don't talk
ESMTP/8BITMIME, then quoted-printable is what happens.

Use emil, metamail or such if you want to keep your mailer.
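
For anyone unsure what the "=20" noise actually is, here is a minimal
sketch using Python's standard quopri module. Python and the sample
strings are chosen purely for illustration and do not come from this
thread. It shows that "=20" is just an escaped space, and that bytes
beyond ASCII become "=XX" hex escapes:

```python
import quopri

# "=20" is the quoted-printable escape for byte 0x20 (a space);
# decoding turns it back into the plain character.
encoded = b"Hello=20world"
decoded = quopri.decodestring(encoded)

# Bytes outside 7-bit ASCII (here 0xE4, Latin-1 a-umlaut) are escaped
# the same way, which is what lets 8-bit text cross a 7-bit-only MTA.
raw = b"M\xe4rz"
wire = quopri.encodestring(raw)
roundtrip = quopri.decodestring(wire)

print(decoded)
print(wire)
print(roundtrip == raw)
```

A mailer that understands MIME does this decoding before display, so
the escapes only show up in mailers that don't.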

-- 
Matthias Andree



Re: Final Warning [ was: ECN is on! ]

2001-05-22 Thread Matti Aarnio

On Tue, May 22, 2001 at 11:55:59AM -0500, Joe Barr wrote:
> What is ECN?  Is it the reason SNORT has started this lately:

http://vger.kernel.org/

  Follow the links, and you will get an excellent answer.

> 
> Active System Attack Alerts
> =-=-=-=-=-=-=-=-=-=-=-=-=-=
> May 22 10:11:18 pooh snort: spp_portscan: PORTSCAN DETECTED from 199.183.24.194 (STEALTH)
> May 22 10:11:22 pooh snort: spp_portscan: portscan status from 199.183.24.194: 1 connections across 1 hosts: TCP(1), UDP(0) STEALTH
> 
> 
> See ya,
> Joe Barr

/Matti Aarnio



Re: ECN is on!

2001-05-22 Thread Matti Aarnio

  FOLKS, I HAVE ALL THE TIME USED 'Reply-To:' HEADER POINTING
  TO  linux-kernel -- INSTEAD OF ALL THE LISTS...

  If you want to continue this, do it there.
  (Before I decide to taboo "Re: ECN is on!" subject line..)


On Tue, May 22, 2001 at 12:23:29PM -0400, Richard Gooch wrote:
...
> Well, while that would be somewhat satisfying, there is a problem if
> the message gets corrupted by this. And since some people send to the
> list without being subscribed (or, like me, have duplicate filtering),
> they'll never see that their message was mangled as it passed through
> the list.
> 
> Nope, a bounce is better. If you're going to do these things, feedback
> is essential. The bounce isn't meant to offend the sender, it's
> designed to let them know what's happening.

The only GOOD time to bounce is at SMTP reception
into VGER, not later.  That stage doesn't have the facilities
to do everything the Majordomo taboo filters do now.
(Just because I have been lazy and haven't written
 any such content filters for vger.)

With ECN on, emailed bounce messages won't (necessarily)
make it to the sender at all.

Majordomo's filter bounces the message to be approved
by the list owner, who to my knowledge usually uses
the 'D' key.

>   Regards,
> 
>   Richard
> Permanent: [EMAIL PROTECTED]
> Current:   [EMAIL PROTECTED]

/Matti Aarnio



Re: Final Warning [ was: ECN is on! ]

2001-05-22 Thread Joe Barr



What is ECN?  Is it the reason SNORT has started this lately:

Active System Attack Alerts
=-=-=-=-=-=-=-=-=-=-=-=-=-=
May 22 10:11:18 pooh snort: spp_portscan: PORTSCAN DETECTED from 199.183.24.194 (STEALTH)
May 22 10:11:22 pooh snort: spp_portscan: portscan status from 199.183.24.194: 1 connections across 1 hosts: TCP(1), UDP(0) STEALTH

See ya,
Joe Barr

On Tue, 22 May 2001 10:57:34 -0400
David Relson <[EMAIL PROTECTED]> wrote:

> At 10:18 AM 5/22/01, Steve Modica wrote:
> 
> 
> >Perhaps it's none of my business, but it doesn't seem very sporting to
> >just turn something on that breaks stuff and say "you had fair
> >warning".  Why not shut it back off, issue a statement saying it works
> >now and will be re-enabled on June 10th or something, and everyone must
> >do thus and so or they will break on that day?
> >
> >Vague things like "it'll be turned on real soon now" or ASAP really mean
> >"never" since admins always have things with real deadlines at the top
> >of their list.
> 
> 
> I'd suggest something like:
> 
> Final Warning.  ECN is being turned on NOW.  If your firewall doesn't 
> support ECN, this will be the last message that gets through to you from us.
> 
> Such a message will have the interesting characteristic of being the last 
> message received.  This will make it obvious why no further messages are 
> arriving.
> 
> David
> 
> 
> David Relson   Osage Software Systems, Inc.
> [EMAIL PROTECTED]   Ann Arbor, MI 48103
> www.osagesoftware.com  tel:  734.821.8800
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 

#--#
| Joe Barr   [EMAIL PROTECTED] |
| Longears and Linux... nowhere but Texas! |
#--#



RE: ECN is on!

2001-05-22 Thread Christian, Chip

Not to mention, not everyone on the list runs their own mailservers.

-Original Message-
From: Steve Modica [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 22, 2001 12:28
To: Rogier Wolff
Cc: Richard Gooch; Brent D. Norris; David S. Miller;
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: ECN is on!


Rogier Wolff wrote:
> The "we'll turn it on in February" warning is worth NOTHING in this
> situation: February comes and goes. March comes and goes. Everybody
> who read the warning will think: Ok, so I must be fine.
> 
> A warning of the form: "ECN will go on as soon as this message clears
> the queues" would've been useful, as thousands (hundreds?) suddenly get
> nothing anymore.
> 

I agree with this line of thinking.  The various academics studying
geology have been warning California about "The Big One" for years now,
and no one seems to care anymore.  

I don't think anyone's being lazy and I certainly don't have the
information to comment on the size of their butts.  So I'd rather just
assume they were working very hard on other things (like getting TPC-H
benchmarks to run!)

Steve

-- 
Steve Modica
Manager - Networking Drivers Group
"Give a man a fish, and he will eat for a day, hit him with a fish and
he leaves you alone" - me



Re: ECN is on!

2001-05-22 Thread Richard Gooch

Tony Hoyle writes:
> Richard Gooch wrote:
> 
> > In fact, hopefully he's still in a dark mood, and he may take up the
> > suggestion to bounce mails of the following type:
> > - MIME encoded
> > - HTML encoded
> > - quoted printables (those stupid "=20" things are particularly hard to
> >   read).
> 
> Surely it'd be better to get the list to filter them through stripmime?
> 
> I'd be tempted to put a message at the top at the same time:
> "*WARNING* The message below was sent by someone too clueless to 
> configure their email client properly"

Well, while that would be somewhat satisfying, there is a problem if
the message gets corrupted by this. And since some people send to the
list without being subscribed (or, like me, have duplicate filtering),
they'll never see that their message was mangled as it passed through
the list.

Nope, a bounce is better. If you're going to do these things, feedback
is essential. The bounce isn't meant to offend the sender, it's
designed to let them know what's happening.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: ECN is on!

2001-05-22 Thread Richard Gooch

Matti Aarnio writes:
> On Tue, May 22, 2001 at 09:06:25AM -0400, Richard Gooch wrote:
> ...
> > Sure, Dave is being bloody-minded, but that's the only way we'll see
> > people get off their fat, lazy asses and fix their broken systems.
> > In fact, hopefully he's still in a dark mood, and he may take up the
> > suggestion to bounce mails of the following type:
> > - MIME encoded
> > - HTML encoded
> > - quoted printables (those stupid "=20" things are particularly hard to
> >   read).
> 
>   Bounces to where ?

To the sender, of course, who is the evil culprit.

>   And for that matter, people who you do want to punish do run MUAs,
>   which happily open up everything -- except these bounce reports
>   VGER generates.  But then, vger sends those reports to
>   linux-kernel-owner, who needs no additional punishment...

I don't understand what you mean here. Are you saying that these
MUAs which generate horrible messages drop bounces on the floor?!?

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: ECN is on!

2001-05-22 Thread Rogier Wolff

Richard Gooch wrote:
> Dave sent a message out a week or two ago saying he was going to do it
> soon. And back in January he said he'd be doing it in February. The
> kernel list FAQ has stated this right at the top, in big, bright red
> letters. Yesterday, after I saw Dave's announcement, I updated the FAQ
> to reflect that we're now running ECN.
> 
> People have had plenty of warning. Think of it as a bonus that it
> didn't happen back in February. They've had an extra 3 months to sort
> something out.

The "we'll turn it on in February" warning is worth NOTHING in this
situation: February comes and goes. March comes and goes. Everybody
who read the warning will think: Ok, so I must be fine.

A warning of the form: "ECN will go on as soon as this message clears
the queues" would've been useful, as thousands (hundreds?) suddenly get
nothing anymore.

Roger. 

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 



Re: ECN is on!

2001-05-22 Thread Matti Aarnio

On Tue, May 22, 2001 at 05:00:22PM +0100, Tony Hoyle wrote:
> > suggestion to bounce mails of the following type:
> > - MIME encoded
> > - HTML encoded
> > - quoted printables (those stupid "=20" things are particularly hard to
> >   read).
> 
> Surely it'd be better to get the list to filter them through stripmime?

Read page:

http://vger.kernel.org/majordomo-info.html

 
> :-)
> Tony

/Matti Aarnio



Re: ECN is on!

2001-05-22 Thread Matti Aarnio

On Tue, May 22, 2001 at 09:06:25AM -0400, Richard Gooch wrote:
...
> Sure, Dave is being bloody-minded, but that's the only way we'll see
> people get off their fat, lazy asses and fix their broken systems.
> In fact, hopefully he's still in a dark mood, and he may take up the
> suggestion to bounce mails of the following type:
> - MIME encoded
> - HTML encoded
> - quoted printables (those stupid "=20" things are particularly hard to
>   read).

  Bounces to where ?

  The bounces ARE in MIME, but frobbing them into HTML is ...
  That would involve my dead body, which I am not keen on supplying.

  And for that matter, people who you do want to punish do run MUAs, which
  happily open up everything -- except these bounce reports VGER generates.
  But then, vger sends those reports to linux-kernel-owner, who needs no
  additional punishment...

>   Regards,
>   Richard
> Permanent: [EMAIL PROTECTED]
> Current:   [EMAIL PROTECTED]

/Matti Aarnio



Re: ECN is on!

2001-05-22 Thread Tony Hoyle

Richard Gooch wrote:

> In fact, hopefully he's still in a dark mood, and he may take up the
> suggestion to bounce mails of the following type:
> - MIME encoded
> - HTML encoded
> - quoted printables (those stupid "=20" things are particularly hard to
>   read).

Surely it'd be better to get the list to filter them through stripmime?


I'd be tempted to put a message at the top at the same time:
"*WARNING* The message below was sent by someone too clueless to 
configure their email client properly"

:-)

Tony

-- 
"Two weeks before due date, the programmers work 22 hour days
  cobbling an application from... (apparently) one programmer
  bashing his face into the keyboard." -- Dilbert

[EMAIL PROTECTED] http://www.nothing-on.tv




Re: ECN is on!

2001-05-22 Thread Richard Gooch

Brent D. Norris writes:
> > I veto, the whole point of moving to ECN was to make a statement and
> > get people to fix their kit.
> >
> > We will remove these people, that's all.
> 
> Isn't this a problem though because the message saying that ECN was
> enabled was sent after ECN was enabled?  Thus these people have no
> idea what is going on and they probably won't know what to fix until
> they do.

Dave sent a message out a week or two ago saying he was going to do it
soon. And back in January he said he'd be doing it in February. The
kernel list FAQ has stated this right at the top, in big, bright red
letters. Yesterday, after I saw Dave's announcement, I updated the FAQ
to reflect that we're now running ECN.

People have had plenty of warning. Think of it as a bonus that it
didn't happen back in February. They've had an extra 3 months to sort
something out.

I note with disgust the number of places which should know better, but
still haven't fixed their kit. Most appalling was
missioncriticallinux.com. Shame!

Sure, Dave is being bloody-minded, but that's the only way we'll see
people get off their fat, lazy asses and fix their broken systems.
In fact, hopefully he's still in a dark mood, and he may take up the
suggestion to bounce mails of the following type:
- MIME encoded
- HTML encoded
- quoted printables (those stupid "=20" things are particularly hard to
  read).

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: ECN is on!

2001-05-22 Thread Richard Gooch

Alan Cox writes:
> > Matti Aarnio writes:
> >  > I am contemplating to periodically turn off the ECN bit to
> >  > let email out, but DaveM has veto there.
> > 
> > I veto, the whole point of moving to ECN was to make a statement and
> > get people to fix their kit.
> > 
> > We will remove these people, that's all.
> 
> Since HTML email also has a spec can we remove the people who moan
> about that too ;)

I'm sure M$ Exchange has a spec too. Doesn't mean we should support
it. As a community, we need to fight against the darkness.

>   "MIME, oh mime, how I hate thee.  Let me stick pins in you to 
>count the ways..." -- Ben LaHaise

Amen, brother!

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: Final Warning [ was: ECN is on! ]

2001-05-22 Thread Matti Aarnio

  Folks, don't speculate.  You are late anyway.

  We just had ECN off for two hours, and all sites which didn't
  commit harakiri at their firewalls ("bad TCP frame from that address,
  I will place that source into dead list") now either got their message,
  or are having some long-term troubles which might anyway get them
  kicked out after a few days time.
  (Long-term troubles meaning things where my solaris 2.6 machine can't
   reach those sites/servers.  D'uh...)

/Matti Aarnio


On Tue, May 22, 2001 at 10:57:34AM -0400, David Relson wrote:
> I'd suggest something like:
> 
> Final Warning.  ECN is being turned on NOW.  If your firewall doesn't 
> support ECN, this will be the last message that gets through to you from us.
> 
> Such a message will have the interesting characteristic of being the last 
> message received.  This will make it obvious why no further messages are 
> arriving.
> 
> David
> 
> 
> David Relson   Osage Software Systems, Inc.
> [EMAIL PROTECTED]   Ann Arbor, MI 48103
> www.osagesoftware.com  tel:  734.821.8800
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



Final Warning [ was: ECN is on! ]

2001-05-22 Thread David Relson

At 10:18 AM 5/22/01, Steve Modica wrote:


>Perhaps it's none of my business, but it doesn't seem very sporting to
>just turn something on that breaks stuff and say "you had fair
>warning".  Why not shut it back off, issue a statement saying it works
>now and will be re-enabled on June 10th or something, and everyone must
>do thus and so or they will break on that day?
>
>Vague things like "it'll be turned on real soon now" or ASAP really mean
>"never" since admins always have things with real deadlines at the top
>of their list.


I'd suggest something like:

Final Warning.  ECN is being turned on NOW.  If your firewall doesn't 
support ECN, this will be the last message that gets through to you from us.

Such a message will have the interesting characteristic of being the last 
message received.  This will make it obvious why no further messages are 
arriving.

David


David Relson   Osage Software Systems, Inc.
[EMAIL PROTECTED]   Ann Arbor, MI 48103
www.osagesoftware.com  tel:  734.821.8800




Re: ECN is on!

2001-05-22 Thread Steve Modica

"David S. Miller" wrote:
> 
> Matti Aarnio writes:
>  > I am contemplating to periodically turn off the ECN bit to
>  > let email out, but DaveM has veto there.
> 
> I veto, the whole point of moving to ECN was to make a statement and
> get people to fix their kit.
> 
> We will remove these people, that's all.
> 
> Later,
> David S. Miller
> [EMAIL PROTECTED]
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to [EMAIL PROTECTED]

Perhaps it's none of my business, but it doesn't seem very sporting to
just turn something on that breaks stuff and say "you had fair
warning".  Why not shut it back off, issue a statement saying it works
now and will be re-enabled on June 10th or something, and everyone must
do thus and so or they will break on that day?

Vague things like "it'll be turned on real soon now" or ASAP really mean
"never" since admins always have things with real deadlines at the top
of their list.

Steve
-- 
Steve Modica
Manager - Networking Drivers Group
"Give a man a fish, and he will eat for a day, hit him with a fish and
he leaves you alone" - me



Re: ECN is on!

2001-05-22 Thread Graham Murray

Matti Aarnio <[EMAIL PROTECTED]> writes:

> ... and immediately I have been able to verify a bunch of
> domains/servers which won't get thru when incoming connection
> has ECN.

As a matter of interest, are you also noting how many actually
negotiate ECN rather than simply responding with a "plain" SYN ACK? 
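
To make the distinction concrete: under RFC 3168 an ECN-capable client
sends a SYN with both ECE and CWR set, and a server that agrees answers
with a SYN ACK carrying ECE but not CWR; a "plain" SYN ACK has neither
bit. A minimal sketch of that check follows. Python is chosen only for
illustration and the function names are hypothetical; the flag bit
values are the standard TCP header ones.

```python
# TCP header flag bits (low byte of the flags field).
FIN = 0x01
SYN = 0x02
RST = 0x04
PSH = 0x08
ACK = 0x10
URG = 0x20
ECE = 0x40  # ECN-Echo
CWR = 0x80  # Congestion Window Reduced


def is_ecn_setup_syn(flags: int) -> bool:
    """True for an initial SYN (no ACK) that asks for ECN: ECE and CWR set."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ECE | CWR)


def classify_syn_ack(flags: int) -> str:
    """Classify a server's reply to an ECN-setup SYN."""
    if (flags & (SYN | ACK)) != (SYN | ACK):
        return "not-syn-ack"
    # An ECN-setup SYN ACK has ECE set and CWR clear.
    if (flags & ECE) and not (flags & CWR):
        return "ecn-negotiated"
    # Anything else, including the "plain" SYN ACK, means no ECN.
    return "no-ecn"


print(classify_syn_ack(SYN | ACK | ECE))
print(classify_syn_ack(SYN | ACK))
```

Counting the two SYN ACK outcomes per destination would answer the
question above; a destination that never replies at all is the
"connection timed out" case from the list.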




Re: ECN is on!

2001-05-22 Thread Brent D. Norris

> I veto, the whole point of moving to ECN was to make a statement and
> get people to fix their kit.
>
Isn't this a problem though because the message saying that ECN was enabled
was sent after ECN was enabled?  Thus these people have no idea what is
going on and they probably won't know what to fix until they do.

> We will remove these people, that's all.
>
> Later,
> David S. Miller
> [EMAIL PROTECTED]
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

Brent Norris

Executive Advisor -- WKU-Linux




Re: ECN is on!

2001-05-22 Thread Bohdan Vlasyuk

On Tue, May 22, 2001 at 01:10:31PM +0300, Matti Aarnio wrote:

> This list is NOT exhaustive of domains with problems, it
> primarily lists only those who are subscribers of linux-kernel,
> and thus accumulated (a lot) more than 1 email with "connection
> timed out" status into vger's queue.
> 
>   DEST. DOMAIN SERVER NAME
> 
> ic.sunysb.edu   -> bartman.ic.sunysb.edu
...
> geeksimplex.org   -> DNS A: 24.18.90.197 (home.com cable)

Please, next time you send such a list, sort it somehow. For example,
with vim you can do it by selecting the lines with V and then running :!sort

Thanks!



Re: ECN is on!

2001-05-22 Thread Alan Cox

> Matti Aarnio writes:
>  > I am contemplating to periodically turn off the ECN bit to
>  > let email out, but DaveM has veto there.
> 
> I veto, the whole point of moving to ECN was to make a statement and
> get people to fix their kit.
> 
> We will remove these people, that's all.

Since HTML email also has a spec can we remove the people who moan about that
too ;)

Alan
--
"MIME, oh mime, how I hate thee.  Let me stick pins in you to 
 count the ways..." -- Ben LaHaise




Re: ECN is on!

2001-05-22 Thread David S. Miller


Matti Aarnio writes:
 > I am contemplating to periodically turn off the ECN bit to
 > let email out, but DaveM has veto there.

I veto, the whole point of moving to ECN was to make a statement and
get people to fix their kit.

We will remove these people, that's all.

Later,
David S. Miller
[EMAIL PROTECTED]



ECN is on!

2001-05-22 Thread Matti Aarnio

... and immediately I have been able to verify a bunch of
domains/servers which won't get through when the incoming connection
has ECN.  I tested all of these with Linux running ECN, and
Solaris 2.6 without ECN.  When Solaris got a connection and
ECN-Linux didn't, the domain and its server got listed.

An amazing share of these troubled destination systems run some
firewall which thinks it is cool to fuck with the SMTP protocol.
(I mean obscuring responses given by the remote system,
 frobbing the incoming protocol if some particular detail doesn't
 please the rules it presumes to play with, etc.)

I am contemplating periodically turning off the ECN bit to
let email out, but DaveM has veto there.

This is NOT an exhaustive list of domains with problems; it
primarily lists only those which are subscribers of linux-kernel,
and thus accumulated (a lot) more than 1 email with "connection
timed out" status in vger's queue.


DEST. DOMAIN                   SERVER NAME

ic.sunysb.edu               -> bartman.ic.sunysb.edu
olympus.phys.wesleyan.edu   -> olympus.phys.wesleyan.edu
imap.reed.edu               -> imogen.reed.edu
aplcomm.jhuapl.edu          -> dallas.jhuapl.edu
mail.utexas.edu             -> mx2.mail.utexas.edu
cs.jhu.edu                  -> hops.cs.jhu.edu
judy.indstate.edu           - gets connected, but then freezes.
cc.usu.edu                  -> cc.usu.edu
aubi.de                     -> mail.aubi-online.de
opensource.se               -> mail.carambole.com
routemeister.net            -> mail.carambole.com
*.swipnet.se                -> smtp-ext.swip.net
swipnet.se                  -> smtp-ext.swip.net
get2net.dk                  -> smtp-ext.swip.net
dina.kvl.dk                 -> sheridan.dina.kvl.dk
enea.se                     -> ruff.enea.se
able.es                     -> jalon.able.es
vadoc.state.va.us           -> mail.vadoc.state.va.us
libero.it                   -> smtp-in.libero.it
ra.cit.alcatel.fr           -> mail.alcatel.fr
csse.monash.edu.au          -> ld-mx.it.monash.edu.au
galactica.it                -> mail.galactica.it
ds.catv.ne.jp               -> cs14.catv.ne.jp
mailbox.dsnet.it            -> mailin.dsnet.it
lee.k12.nc.us               -> shomer.lee.k12.nc.us
sh.bel.alcatel.be           -> mx001.alcatel.be
quantum.cicese.mx           -> quantum.cicese.mx
isuzu.pl                    -> isztye02.isuzu.pl
gruppocredit.it             -> mext.gruppocredit.it
debitel.net                 -> mail.dnsg.net
optical.lvl.pri.bms.com     -> chimera.bms.com
us.celoxica.com             -> mail.us.embeddedsol.com
ford.com                    -> mail0.allegro.net
vnnews.com                  -> mail.cinet.vnn.vn
echostar.com                -> rf-mail1.echostar.com
jetform.com                 -> mail.jetform.com
half.com                    -> mailhub.half.com
pa.dec.com                  -> ztxmail01.ztx.compaq.com
compaq.com                  -> ztxmail01.ztx.compaq.com
zk3.dec.com                 -> ztxmail01.ztx.compaq.com
allaire.com                 -> smtp.allaire.com
catalog-international.com   -> ciexchange.catalog-international.com
lcr-m.com                   -> mail1.lcr-m.net
logica.com                  -> mail4.messagelabs.com
missioncriticallinux.com    -> mail.missioncriticallinux.com
msdw.com                    -> mx1.ms.com
honeywell.com               -> tmpsmtp702.honeywell.com
austin.ibm.com              -> mg02.austin.ibm.com
btinternet.com              -> moongate.btinternet.com
geeksimplex.org             -> DNS A: 24.18.90.197 (home.com cable)


