date:20181106

[PATCH net-next] net: phy: realtek: load driver for all PHYs with a Realtek OUI

2018-11-06 Thread Heiner Kallweit

Instead of listing every single PHYID, load the driver for every PHYID
with a Realtek OUI, independent of model number and revision.

This patch also improves two further aspects:
- constify realtek_tbl[]
- the mask should have been 0x instead of 0x001f so far,
  by masking out some bits, a PHY from another vendor could have been
  matched

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/realtek.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 271e8adc3..7b1c89b38 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -305,15 +305,8 @@ static struct phy_driver realtek_drvs[] = {
 
 module_phy_driver(realtek_drvs);
 
-static struct mdio_device_id __maybe_unused realtek_tbl[] = {
-   { 0x001cc816, 0x001f },
-   { 0x001cc910, 0x001f },
-   { 0x001cc912, 0x001f },
-   { 0x001cc913, 0x001f },
-   { 0x001cc914, 0x001f },
-   { 0x001cc915, 0x001f },
-   { 0x001cc916, 0x001f },
-   { 0x001cc961, 0x001f },
+static const struct mdio_device_id __maybe_unused realtek_tbl[] = {
+   { 0x001cc800, GENMASK(31, 10) },
{ }
 };
 
-- 
2.19.1
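A single table entry is enough because only the bits that are set in the mask are compared between a PHY's ID and a table entry. In the 32-bit ID built from the Clause 22 identifier registers, bits 31..10 carry the OUI bits, bits 9..4 the model number and bits 3..0 the revision. GENMASK(31, 10) therefore keeps only the OUI. Below is a minimal userspace sketch of that masked comparison. GENMASK32() and phy_id_matches() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for the kernel's GENMASK(h, l):
 * sets bits l..h inclusive. */
#define GENMASK32(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

/* Masked match of a PHY ID against a table entry: only bits set in
 * `mask` take part in the comparison. */
static bool phy_id_matches(uint32_t phy_id, uint32_t entry_id, uint32_t mask)
{
    return (phy_id & mask) == (entry_id & mask);
}

/* The single entry from the patch: OUI bits 31..10 only, so the model
 * (bits 9..4) and revision (bits 3..0) are ignored. */
static const uint32_t realtek_entry = 0x001cc800u;
static const uint32_t realtek_mask = GENMASK32(31, 10); /* 0xfffffc00 */
```

Every ID removed from the old table (0x001cc816 through 0x001cc961) gives 0x001cc800 after masking, so all of them still match. A made-up ID with a different OUI, such as 0x002cc816, does not.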

Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Leon Romanovsky

On Tue, Nov 06, 2018 at 03:11:53PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 06, 2018 at 05:10:53PM -0500, Doug Ledford wrote:
> > On Tue, 2018-11-06 at 22:02 +, Jason Gunthorpe wrote:
> > > On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> > > > On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > > > > From: Yishai Hadas 
> > > > >
> > > > > Adapt XRC SRQ to the latest HW specification with fixed definition
> > > > > around umem valid bits. The previous definition relied on a bit which
> > > > > was taken for other purposes in legacy FW.
> > > > >
> > > > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > > > > Signed-off-by: Yishai Hadas 
> > > > > Reviewed-by: Artemy Kovalyov 
> > > > > Signed-off-by: Leon Romanovsky 
> > > > > Hi Doug, Jason
> > > > >
> > > > > This commit fixes code sent in this merge window, so I'm not marking it
> > > > > with any rdma-rc/rdma-next. It would be better to send it during this
> > > > > merge window if you have an extra pull request to issue, or as -rc
> > > > > material if not.
> > > > >
> > > > > BTW, we didn't combine reserved fields, because our convention is to
> > > > > align such fields to 32 bits for better readability.
> > > > >
> > > > > Thanks
> > > >
> > > > This looks fine.  Let me know when it's in the mlx5-next tree to pull.
> > >
> > > It needs to go to -rc...
> > >
> > > This needs an mlx5-rc branch, I guess?
> >
> > I don't think so.  As long as it's the first commit in mlx5-next, and
> > mlx5-next is 4.20-rc1 based, then pulling this commit into the -rc tree
> > will only pull the single commit.  Then when we pull into for-next for
> > the first time, we will get this in for-next too.  That seems best to
> > me.
>
> That works too, if Leon is fast :)

Thank you both for the suggestion.

I did it.
99b77fef3c6c net/mlx5: Fix XRC SRQ umem valid bits

It is the first commit, and it is based on -rc1.

Thanks

>
> Jason



[PATCH net-next] net: phy: make phy_trigger_machine static

2018-11-06 Thread Heiner Kallweit

phy_trigger_machine() is used in phy.c only, so we can make it static.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy.c | 33 -
 include/linux/phy.h   |  1 -
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 1d73ac330..476578746 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -467,6 +467,18 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd)
 }
 EXPORT_SYMBOL(phy_mii_ioctl);
 
+static void phy_queue_state_machine(struct phy_device *phydev,
+   unsigned int secs)
+{
+   mod_delayed_work(system_power_efficient_wq, &phydev->state_queue,
+                    secs * HZ);
+}
+
+static void phy_trigger_machine(struct phy_device *phydev)
+{
+   phy_queue_state_machine(phydev, 0);
+}
+
 static int phy_config_aneg(struct phy_device *phydev)
 {
if (phydev->drv->config_aneg)
@@ -620,13 +632,6 @@ int phy_speed_up(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(phy_speed_up);
 
-static void phy_queue_state_machine(struct phy_device *phydev,
-   unsigned int secs)
-{
-   mod_delayed_work(system_power_efficient_wq, &phydev->state_queue,
-                    secs * HZ);
-}
-
 /**
  * phy_start_machine - start PHY state machine tracking
  * @phydev: the phy_device struct
@@ -643,20 +648,6 @@ void phy_start_machine(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(phy_start_machine);
 
-/**
- * phy_trigger_machine - trigger the state machine to run
- *
- * @phydev: the phy_device struct
- *
- * Description: There has been a change in state which requires that the
- *   state machine runs.
- */
-
-void phy_trigger_machine(struct phy_device *phydev)
-{
-   phy_queue_state_machine(phydev, 0);
-}
-
 /**
  * phy_stop_machine - stop the PHY state machine tracking
  * @phydev: target phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 3ea87f774..9e4d49ef4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1054,7 +1054,6 @@ void phy_change_work(struct work_struct *work);
 void phy_mac_interrupt(struct phy_device *phydev);
 void phy_start_machine(struct phy_device *phydev);
 void phy_stop_machine(struct phy_device *phydev);
-void phy_trigger_machine(struct phy_device *phydev);
 int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
 void phy_ethtool_ksettings_get(struct phy_device *phydev,
   struct ethtool_link_ksettings *cmd);
-- 
2.19.1

Re: [PATCH net-next 03/11] vxlan: Allow configuration of DF behaviour

2018-11-06 Thread Stephen Hemminger

On Tue,  6 Nov 2018 22:38:59 +0100
Stefano Brivio  wrote:

>   df = htons(IP_DF);
>   }
>  
> + if (!df) {
> + if (vxlan->cfg.df == VXLAN_DF_SET) {
> + df = htons(IP_DF);

I am confused: this new flag looks like it duplicates the existing tunnel
DF flag (in info->key.tun.flags). Why is another flag needed?

[PATCH] staging: rtl8723bs: Fix incorrect sense of ether_addr_equal

2018-11-06 Thread Larry Finger

In commit b37f9e1c3801 ("staging: rtl8723bs: Fix lines too long in
update_recvframe_attrib()."), the refactoring replaced two memcmp()
calls with ether_addr_equal() calls. What the author missed is that
memcmp() returns 0 (i.e. false) when the two buffers are equal, whereas
ether_addr_equal() returns true when the two addresses are equal. One
side effect of this error is that the signal strength reported for an
unassociated AP was much stronger than that of the same AP after
association. This bug was reported at bko#201611.

Fixes: b37f9e1c3801 ("staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().")
Cc: Stable 
Cc: youling257 
Cc: u.srikant.patn...@gmail.com
Reported-and-tested-by: youling257 
Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c b/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
index 85077947b9b8..85aba8a503cd 100644
--- a/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
+++ b/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
@@ -109,12 +109,12 @@ static void update_recvframe_phyinfo(union recv_frame *precvframe,
rx_bssid = get_hdr_bssid(wlanhdr);
pkt_info.bssid_match = ((!IsFrameTypeCtrl(wlanhdr)) &&
!pattrib->icv_err && !pattrib->crc_err &&
-   !ether_addr_equal(rx_bssid, my_bssid));
+   ether_addr_equal(rx_bssid, my_bssid));
 
rx_ra = get_ra(wlanhdr);
my_hwaddr = myid(>eeprompriv);
pkt_info.to_self = pkt_info.bssid_match &&
-   !ether_addr_equal(rx_ra, my_hwaddr);
+   ether_addr_equal(rx_ra, my_hwaddr);
 
 
pkt_info.is_beacon = pkt_info.bssid_match &&
-- 
2.19.1
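The inverted sense can be reproduced in userspace. This sketch assumes that the pre-refactoring code tested `!memcmp(...)`, which is what the leftover `!` in the buggy lines suggests. addr_equal() is a hypothetical stand-in with the same contract as the kernel's ether_addr_equal() (it returns true when the addresses are equal). The bssid_match_*() helpers are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in with ether_addr_equal()'s contract:
 * true when the two 6-byte addresses are equal. */
static bool addr_equal(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, ETH_ALEN) == 0;
}

/* Assumed original form: memcmp() returns 0 on a match, hence the '!'. */
static bool bssid_match_memcmp(const uint8_t *rx, const uint8_t *mine)
{
    return !memcmp(rx, mine, ETH_ALEN);
}

/* The refactoring kept the '!', which inverts the test. */
static bool bssid_match_buggy(const uint8_t *rx, const uint8_t *mine)
{
    return !addr_equal(rx, mine);
}

/* The corrected form used by this patch. */
static bool bssid_match_fixed(const uint8_t *rx, const uint8_t *mine)
{
    return addr_equal(rx, mine);
}
```

With identical addresses, bssid_match_memcmp() and bssid_match_fixed() return true and bssid_match_buggy() returns false. With different addresses, the results are reversed.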

[PATCH bpf-next 3/3] selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr

2018-11-06 Thread Andrey Ignatov

Add more test cases for context bpf_sock_addr to test narrow loads with
offset > 0 for ctx->user_ip4 field (__u32):
* off=1, size=1;
* off=2, size=1;
* off=3, size=1;
* off=2, size=2.

Signed-off-by: Andrey Ignatov 
---
 tools/testing/selftests/bpf/test_sock_addr.c | 28 +---
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index aeeb76a54d63..73b7493d4120 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -574,24 +574,44 @@ static int bind4_prog_load(const struct sock_addr_test *test)
/* if (sk.family == AF_INET && */
BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
offsetof(struct bpf_sock_addr, family)),
-   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 16),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 24),
 
/* (sk.type == SOCK_DGRAM || sk.type == SOCK_STREAM) && */
BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
offsetof(struct bpf_sock_addr, type)),
BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 1),
BPF_JMP_A(1),
-   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_STREAM, 12),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_STREAM, 20),
 
/* 1st_byte_of_user_ip4 == expected && */
BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
offsetof(struct bpf_sock_addr, user_ip4)),
-   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[0], 10),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[0], 18),
+
+   /* 2nd_byte_of_user_ip4 == expected && */
+   BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
+   offsetof(struct bpf_sock_addr, user_ip4) + 1),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[1], 16),
+
+   /* 3rd_byte_of_user_ip4 == expected && */
+   BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
+   offsetof(struct bpf_sock_addr, user_ip4) + 2),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[2], 14),
+
+   /* 4th_byte_of_user_ip4 == expected && */
+   BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
+   offsetof(struct bpf_sock_addr, user_ip4) + 3),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[3], 12),
 
/* 1st_half_of_user_ip4 == expected && */
BPF_LDX_MEM(BPF_H, BPF_REG_7, BPF_REG_6,
offsetof(struct bpf_sock_addr, user_ip4)),
-   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr16[0], 8),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr16[0], 10),
+
+   /* 2nd_half_of_user_ip4 == expected && */
+   BPF_LDX_MEM(BPF_H, BPF_REG_7, BPF_REG_6,
+   offsetof(struct bpf_sock_addr, user_ip4) + 2),
+   BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr16[1], 8),
 
/* whole_user_ip4 == expected) { */
BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
-- 
2.17.1

[PATCH bpf-next 2/3] selftests/bpf: Test narrow loads with off > 0 in test_verifier

2018-11-06 Thread Andrey Ignatov

Test the following narrow loads in test_verifier for context __sk_buff:
* off=1, size=1 - ok;
* off=2, size=1 - ok;
* off=3, size=1 - ok;
* off=0, size=2 - ok;
* off=1, size=2 - fail;
* off=2, size=2 - ok;
* off=3, size=2 - fail.

Signed-off-by: Andrey Ignatov 
---
 tools/testing/selftests/bpf/test_verifier.c | 48 -
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 6f61df62f690..54d16fbdef8b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -2026,29 +2026,27 @@ static struct bpf_test tests[] = {
.result = ACCEPT,
},
{
-   "check skb->hash byte load not permitted 1",
+   "check skb->hash byte load permitted 1",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 1),
BPF_EXIT_INSN(),
},
-   .errstr = "invalid bpf_context access",
-   .result = REJECT,
+   .result = ACCEPT,
},
{
-   "check skb->hash byte load not permitted 2",
+   "check skb->hash byte load permitted 2",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 2),
BPF_EXIT_INSN(),
},
-   .errstr = "invalid bpf_context access",
-   .result = REJECT,
+   .result = ACCEPT,
},
{
-   "check skb->hash byte load not permitted 3",
+   "check skb->hash byte load permitted 3",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
 #if __BYTE_ORDER == __LITTLE_ENDIAN
@@ -2060,8 +2058,7 @@ static struct bpf_test tests[] = {
 #endif
BPF_EXIT_INSN(),
},
-   .errstr = "invalid bpf_context access",
-   .result = REJECT,
+   .result = ACCEPT,
},
{
"check cb access: byte, wrong type",
@@ -2173,7 +2170,7 @@ static struct bpf_test tests[] = {
.result = ACCEPT,
},
{
-   "check skb->hash half load not permitted",
+   "check skb->hash half load permitted 2",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
 #if __BYTE_ORDER == __LITTLE_ENDIAN
@@ -2182,6 +2179,37 @@ static struct bpf_test tests[] = {
 #else
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash)),
+#endif
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   },
+   {
+   "check skb->hash half load not permitted, unaligned 1",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+   BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+   offsetof(struct __sk_buff, hash) + 1),
+#else
+   BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+   offsetof(struct __sk_buff, hash) + 3),
+#endif
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "invalid bpf_context access",
+   .result = REJECT,
+   },
+   {
+   "check skb->hash half load not permitted, unaligned 3",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+   BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+   offsetof(struct __sk_buff, hash) + 3),
+#else
+   BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+   offsetof(struct __sk_buff, hash) + 1),
 #endif
BPF_EXIT_INSN(),
},
-- 
2.17.1

[PATCH bpf-next 1/3] bpf: Allow narrow loads with offset > 0

2018-11-06 Thread Andrey Ignatov

Currently the BPF verifier allows narrow loads for a context field only
with offset zero. E.g. if there is a __u32 field, then only the following
loads are permitted:
  * off=0, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=0, size=4 (full).

On the other hand, LLVM can generate a load with a non-zero offset that
makes sense from the program logic point of view, but the verifier
doesn't accept it.

E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:

  #define DST_IP4   0xC0A801FEU /* 192.168.1.254 */
  ...
if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&

where ctx is struct bpf_sock_addr.

Some versions of LLVM can produce the following byte code for it:

   8:   71 12 07 00 00 00 00 00 r2 = *(u8 *)(r1 + 7)
   9:   67 02 00 00 18 00 00 00 r2 <<= 24
  10:   18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll
  12:   5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 

where `*(u8 *)(r1 + 7)` means a narrow load of ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by the verifier.
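The constant in that byte code can be checked with a little arithmetic. user_ip4 holds the address in network byte order, so its bytes in memory are c0 a8 01 fe. On a little-endian host, the most significant byte of the 4-byte value is the byte at offset 3 (0xfe). 0xfe << 24 = 0xfe000000 = 4261412864, which is what instruction 10 loads into r3. The sketch below uses hypothetical helper names and htonl() in place of bpf_htonl(). It shows that a one-byte load at the right offset gives the same value as a full load followed by `>> 24`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define DST_IP4 0xC0A801FEu /* 192.168.1.254, as in sendmsg4_prog.c */

/* Offset of the byte that lands in bits 31..24 of a host-order 4-byte
 * load: 3 on little-endian hosts, 0 on big-endian hosts. */
static unsigned int msb_byte_offset(void)
{
    const uint32_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1 ? 3 : 0;
}

/* Left-hand side of the C comparison: full load, then >> 24. */
static uint32_t top_byte_via_shift(const uint8_t field[4])
{
    uint32_t v;

    memcpy(&v, field, sizeof(v));
    return v >> 24;
}

/* Right-hand side: bpf_htonl(DST_IP4) >> 24, modelled with htonl(). */
static uint32_t expected_top_byte(void)
{
    return htonl(DST_IP4) >> 24;
}
```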

The verifier code that rejects such loads is in bpf_ctx_narrow_access_ok(),
which means that every is_valid_access implementation that uses this
function behaves the same way, e.g. bpf_skb_is_valid_access() for __sk_buff
or sock_addr_is_valid_access() for bpf_sock_addr.

This patch adds support for such loads. The offset can be in
[0; size_default) but has to be a multiple of the load size. E.g. for a
__u32 field the following loads are supported now:
  * off=0, size=1 (narrow);
  * off=1, size=1 (narrow);
  * off=2, size=1 (narrow);
  * off=3, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=2, size=2 (narrow);
  * off=0, size=4 (full).
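The rule above can be written as a small predicate. This sketch combines two checks: the size check that bpf_ctx_narrow_access_ok() keeps after this patch, and the offset rule stated above (the offset lies inside the field and is a multiple of the load size). narrow_load_ok() is a hypothetical helper, not a kernel function. This excerpt does not show where the kernel enforces the alignment part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size check kept by bpf_ctx_narrow_access_ok() after this patch:
 * a power of two no larger than the field. */
static bool narrow_size_ok(uint32_t size, uint32_t size_default)
{
    return size <= size_default && (size & (size - 1)) == 0;
}

/* Hypothetical helper: offset in [0, size_default) and a multiple of
 * the load size. size == 0 is rejected to avoid dividing by zero. */
static bool narrow_load_ok(uint32_t off, uint32_t size, uint32_t size_default)
{
    return size != 0 && narrow_size_ok(size, size_default) &&
           off < size_default && off % size == 0;
}
```

For a __u32 field (size_default = 4), this accepts exactly the seven loads listed above. It rejects, for example, off=1/size=2 and off=3/size=2, which are the two failing cases in the test_verifier patch.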

Reported-by: Yonghong Song 
Signed-off-by: Andrey Ignatov 
---
 include/linux/filter.h | 16 +---
 kernel/bpf/verifier.c  | 19 +++
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index de629b706d1d..cc17f5f32fbb 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -668,24 +668,10 @@ static inline u32 bpf_ctx_off_adjust_machine(u32 size)
return size;
 }
 
-static inline bool bpf_ctx_narrow_align_ok(u32 off, u32 size_access,
-  u32 size_default)
-{
-   size_default = bpf_ctx_off_adjust_machine(size_default);
-   size_access  = bpf_ctx_off_adjust_machine(size_access);
-
-#ifdef __LITTLE_ENDIAN
-   return (off & (size_default - 1)) == 0;
-#else
-   return (off & (size_default - 1)) + size_access == size_default;
-#endif
-}
-
 static inline bool
 bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 {
-   return bpf_ctx_narrow_align_ok(off, size, size_default) &&
-  size <= size_default && (size & (size - 1)) == 0;
+   return size <= size_default && (size & (size - 1)) == 0;
 }
 
 #define bpf_classic_proglen(fprog) (fprog->len * sizeof(fprog->filter[0]))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1971ca325fb4..fa592502568e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5803,9 +5803,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 * we will apply proper mask to the result.
 */
is_narrower_load = size < ctx_field_size;
+   u32 size_default = bpf_ctx_off_adjust_machine(ctx_field_size);
+   u32 off = insn->off;
if (is_narrower_load) {
-   u32 size_default = bpf_ctx_off_adjust_machine(ctx_field_size);
-   u32 off = insn->off;
u8 size_code;
 
if (type == BPF_WRITE) {
@@ -5833,12 +5833,23 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
}
 
if (is_narrower_load && size < target_size) {
-   if (ctx_field_size <= 4)
+   u8 shift = (off & (size_default - 1)) * 8;
+
+   if (ctx_field_size <= 4) {
+   if (shift)
+   insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
+   insn->dst_reg,
+   shift);
insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
(1 << size * 8) - 1);
-   else
+   } else {
+   if (shift)
+   insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
+   insn->dst_reg,
+   shift);

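The instructions emitted above work in three steps: load the whole field, shift right by `(off & (size_default - 1)) * 8` bits, then mask to `size * 8` bits. Below is a userspace model of that rewrite for a __u32 field. emulate_narrow_load32() is a hypothetical name. The sketch assumes a little-endian host, where byte `off` of the field sits at bit `off * 8` of the loaded value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the narrow-load rewrite: `field` is what the full
 * 4-byte load returns; the result is what a load of `size` bytes at
 * byte offset `off` inside the field should yield. */
static uint32_t emulate_narrow_load32(uint32_t field, uint32_t off, uint32_t size)
{
    const uint32_t size_default = 4;           /* __u32 context field */
    uint32_t shift = (off & (size_default - 1)) * 8;
    uint32_t value = field >> shift;            /* like BPF_ALU32_IMM(BPF_RSH, ...) */

    if (size < size_default)
        value &= (1u << (size * 8)) - 1;        /* like BPF_ALU32_IMM(BPF_AND, ...) */
    return value;
}
```

For example, with field = 0xfe01a8c0 (192.168.1.254 read on a little-endian host), off=3/size=1 gives 0xfe and off=2/size=2 gives 0xfe01.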
[PATCH bpf-next 0/3] bpf: Allow narrow loads with offset > 0

2018-11-06 Thread Andrey Ignatov

This patch set adds support for narrow loads with offset > 0 to the BPF
verifier.

Patch 1 provides more details and is the main patch in the set.
Patches 2 and 3 add new test cases to test_verifier and test_sock_addr
selftests.


Andrey Ignatov (3):
  bpf: Allow narrow loads with offset > 0
  selftests/bpf: Test narrow loads with off > 0 in test_verifier
  selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr

 include/linux/filter.h   | 16 +--
 kernel/bpf/verifier.c| 19 ++--
 tools/testing/selftests/bpf/test_sock_addr.c | 28 ++--
 tools/testing/selftests/bpf/test_verifier.c  | 48 
 4 files changed, 78 insertions(+), 33 deletions(-)

-- 
2.17.1

[PATCH net-next 5/7] nfp: flower: make nfp_fl_lag_changels_event() void

2018-11-06 Thread Jakub Kicinski

nfp_fl_lag_changels_event() never fails, and therefore we would
never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE.  Make this
clearer by changing nfp_fl_lag_changels_event()'s return type
to void.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 .../net/ethernet/netronome/nfp/flower/lag_conf.c| 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
index dc060748b33b..22b75a519269 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
@@ -582,7 +582,7 @@ nfp_fl_lag_changeupper_event(struct nfp_fl_lag *lag,
return 0;
 }
 
-static int
+static void
 nfp_fl_lag_changels_event(struct nfp_fl_lag *lag, struct net_device *netdev,
  struct netdev_notifier_changelowerstate_info *info)
 {
@@ -593,18 +593,18 @@ nfp_fl_lag_changels_event(struct nfp_fl_lag *lag, struct net_device *netdev,
unsigned long *flags;
 
if (!netif_is_lag_port(netdev) || !nfp_netdev_is_nfp_repr(netdev))
-   return 0;
+   return;
 
lag_lower_info = info->lower_state_info;
if (!lag_lower_info)
-   return 0;
+   return;
 
priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
repr = netdev_priv(netdev);
 
/* Verify that the repr is associated with this app. */
if (repr->app != priv->app)
-   return 0;
+   return;
 
repr_priv = repr->app_priv;
flags = &repr_priv->lag_port_flags;
@@ -624,7 +624,6 @@ nfp_fl_lag_changels_event(struct nfp_fl_lag *lag, struct net_device *netdev,
mutex_unlock(&lag->lock);
 
schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
-   return 0;
 }
 
 static int
@@ -645,9 +644,7 @@ nfp_fl_lag_netdev_event(struct notifier_block *nb, unsigned long event,
return NOTIFY_BAD;
return NOTIFY_OK;
case NETDEV_CHANGELOWERSTATE:
-   err = nfp_fl_lag_changels_event(lag, netdev, ptr);
-   if (err)
-   return NOTIFY_BAD;
+   nfp_fl_lag_changels_event(lag, netdev, ptr);
return NOTIFY_OK;
case NETDEV_UNREGISTER:
nfp_fl_lag_schedule_group_delete(lag, netdev);
-- 
2.17.1

[PATCH net-next 2/7] nfp: flower: add ipv6 set flow label and hop limit offload

2018-11-06 Thread Jakub Kicinski

From: Pieter Jansen van Vuuren 

Add ipv6 set flow label and hop limit action offload. Since pedit sets
headers per 4 byte word, we need to ensure that setting either version,
priority, payload_len or nexthdr does not get offloaded.

Signed-off-by: Pieter Jansen van Vuuren 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c| 65 +--
 .../net/ethernet/netronome/nfp/flower/cmsg.h  | 14 
 2 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index b79b924ef56d..cfea8f790f95 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -476,12 +476,57 @@ nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
ip6->head.len_lw = sizeof(*ip6) >> NFP_FL_LW_SIZ;
 }
 
+struct ipv6_hop_limit_word {
+   __be16 payload_len;
+   u8 nexthdr;
+   u8 hop_limit;
+};
+
+static int
+nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
+   struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl)
+{
+   struct ipv6_hop_limit_word *fl_hl_mask;
+   struct ipv6_hop_limit_word *fl_hl;
+
+   switch (off) {
+   case offsetof(struct ipv6hdr, payload_len):
+   fl_hl_mask = (struct ipv6_hop_limit_word *)&mask;
+   fl_hl = (struct ipv6_hop_limit_word *)&exact;
+
+   if (fl_hl_mask->nexthdr || fl_hl_mask->payload_len)
+   return -EOPNOTSUPP;
+
+   ip_hl_fl->ipv6_hop_limit_mask |= fl_hl_mask->hop_limit;
+   ip_hl_fl->ipv6_hop_limit &= ~fl_hl_mask->hop_limit;
+   ip_hl_fl->ipv6_hop_limit |= fl_hl->hop_limit &
+   fl_hl_mask->hop_limit;
+   break;
+   case round_down(offsetof(struct ipv6hdr, flow_lbl), 4):
+   if (mask & ~IPV6_FLOW_LABEL_MASK ||
+   exact & ~IPV6_FLOW_LABEL_MASK)
+   return -EOPNOTSUPP;
+
+   ip_hl_fl->ipv6_label_mask |= mask;
+   ip_hl_fl->ipv6_label &= ~mask;
+   ip_hl_fl->ipv6_label |= exact & mask;
+   break;
+   }
+
+   ip_hl_fl->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV6_TC_HL_FL;
+   ip_hl_fl->head.len_lw = sizeof(*ip_hl_fl) >> NFP_FL_LW_SIZ;
+
+   return 0;
+}
+
 static int
 nfp_fl_set_ip6(const struct tc_action *action, int idx, u32 off,
   struct nfp_fl_set_ipv6_addr *ip_dst,
-  struct nfp_fl_set_ipv6_addr *ip_src)
+  struct nfp_fl_set_ipv6_addr *ip_src,
+  struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl)
 {
__be32 exact, mask;
+   int err = 0;
u8 word;
 
/* We are expecting tcf_pedit to return a big endian value */
@@ -492,7 +537,8 @@ nfp_fl_set_ip6(const struct tc_action *action, int idx, u32 off,
return -EOPNOTSUPP;
 
if (off < offsetof(struct ipv6hdr, saddr)) {
-   return -EOPNOTSUPP;
+   err = nfp_fl_set_ip6_hop_limit_flow_label(off, exact, mask,
+ ip_hl_fl);
} else if (off < offsetof(struct ipv6hdr, daddr)) {
word = (off - offsetof(struct ipv6hdr, saddr)) / sizeof(exact);
nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC, word,
@@ -506,7 +552,7 @@ nfp_fl_set_ip6(const struct tc_action *action, int idx, u32 off,
return -EOPNOTSUPP;
}
 
-   return 0;
+   return err;
 }
 
 static int
@@ -557,6 +603,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 char *nfp_action, int *a_len, u32 *csum_updated)
 {
struct nfp_fl_set_ipv6_addr set_ip6_dst, set_ip6_src;
+   struct nfp_fl_set_ipv6_tc_hl_fl set_ip6_tc_hl_fl;
struct nfp_fl_set_ip4_ttl_tos set_ip_ttl_tos;
struct nfp_fl_set_ip4_addrs set_ip_addr;
struct nfp_fl_set_tport set_tport;
@@ -567,6 +614,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
u32 offset, cmd;
u8 ip_proto = 0;
 
+   memset(&set_ip6_tc_hl_fl, 0, sizeof(set_ip6_tc_hl_fl));
memset(&set_ip_ttl_tos, 0, sizeof(set_ip_ttl_tos));
memset(&set_ip6_dst, 0, sizeof(set_ip6_dst));
memset(&set_ip6_src, 0, sizeof(set_ip6_src));
@@ -593,7 +641,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
break;
case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
err = nfp_fl_set_ip6(action, idx, offset, &set_ip6_dst,
-&set_ip6_src);
+&set_ip6_src, &set_ip6_tc_hl_fl);
break;
case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
err = nfp_fl_set_tport(action, idx, offset,

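The two checks in nfp_fl_set_ip6_hop_limit_flow_label() follow the layout of the first two 32-bit words of the IPv6 header. Word 0 holds the version (4 bits), traffic class (8 bits) and flow label (20 bits). Word 1 holds payload_len (16 bits), nexthdr (8 bits) and hop_limit (8 bits). A rewrite mask is offloadable only if it stays within the flow label (word 0) or the hop limit (word 1). The sketch below models both checks in userspace. The helper names are hypothetical. The patch's IPV6_FLOW_LABEL_MASK is assumed to be 0x000fffff in network byte order, since its definition is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Second 32-bit word of the IPv6 header in wire order; same layout as
 * the ipv6_hop_limit_word struct added by the patch. */
struct hop_limit_word {
    uint16_t payload_len;
    uint8_t nexthdr;
    uint8_t hop_limit;
};

/* Assumed equivalent of IPV6_FLOW_LABEL_MASK (network byte order). */
#define FLOW_LABEL_MASK_BE htonl(0x000FFFFFu)

/* mask_be: bits being rewritten in word 1, network byte order.
 * Offloadable only if nothing but hop_limit is touched. */
static bool hop_limit_only(uint32_t mask_be)
{
    struct hop_limit_word w;

    memcpy(&w, &mask_be, sizeof(w));
    return w.payload_len == 0 && w.nexthdr == 0;
}

/* mask_be: bits being rewritten in word 0. Offloadable only if the
 * version and traffic class bits are left alone. */
static bool flow_label_only(uint32_t mask_be)
{
    return (mask_be & ~FLOW_LABEL_MASK_BE) == 0;
}
```

Reading the mask through a struct overlay, as the patch does, avoids any byte swapping: the fields line up with the wire-order bytes on any host.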
[PATCH net-next 6/7] nfp: register a notifier handler in a central location for the device

2018-11-06 Thread Jakub Kicinski

Code interested in networking events registers its own notifier
handlers.  Create one device-wide notifier instance.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.c | 47 
 drivers/net/ethernet/netronome/nfp/nfp_app.h | 25 +--
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 68a0991aac22..4a1b8f79e731 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -136,6 +136,53 @@ nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type,
return old;
 }
 
+static int
+nfp_app_netdev_event(struct notifier_block *nb, unsigned long event, void *ptr)
+{
+   struct net_device *netdev;
+   struct nfp_app *app;
+
+   netdev = netdev_notifier_info_to_dev(ptr);
+   app = container_of(nb, struct nfp_app, netdev_nb);
+
+   if (app->type->netdev_event)
+   return app->type->netdev_event(app, netdev, event, ptr);
+   return NOTIFY_DONE;
+}
+
+int nfp_app_start(struct nfp_app *app, struct nfp_net *ctrl)
+{
+   int err;
+
+   app->ctrl = ctrl;
+
+   if (app->type->start) {
+   err = app->type->start(app);
+   if (err)
+   return err;
+   }
+
+   app->netdev_nb.notifier_call = nfp_app_netdev_event;
+   err = register_netdevice_notifier(&app->netdev_nb);
+   if (err)
+   goto err_app_stop;
+
+   return 0;
+
+err_app_stop:
+   if (app->type->stop)
+   app->type->stop(app);
+   return err;
+}
+
+void nfp_app_stop(struct nfp_app *app)
+{
+   unregister_netdevice_notifier(&app->netdev_nb);
+
+   if (app->type->stop)
+   app->type->stop(app);
+}
+
 struct nfp_app *nfp_app_alloc(struct nfp_pf *pf, enum nfp_app_id id)
 {
struct nfp_app *app;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index 4d6ecf99b1cc..d578d856a009 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -69,6 +69,7 @@ extern const struct nfp_app_type app_abm;
  * @port_get_stats_strings:get strings for extra statistics
  * @start: start application logic
  * @stop:  stop application logic
+ * @netdev_event:  Netdevice notifier event
  * @ctrl_msg_rx:control message handler
  * @ctrl_msg_rx_raw:   handler for control messages from data queues
  * @setup_tc:  setup TC ndo
@@ -122,6 +123,9 @@ struct nfp_app_type {
int (*start)(struct nfp_app *app);
void (*stop)(struct nfp_app *app);
 
+   int (*netdev_event)(struct nfp_app *app, struct net_device *netdev,
+   unsigned long event, void *ptr);
+
void (*ctrl_msg_rx)(struct nfp_app *app, struct sk_buff *skb);
void (*ctrl_msg_rx_raw)(struct nfp_app *app, const void *data,
unsigned int len);
@@ -151,6 +155,7 @@ struct nfp_app_type {
  * @reprs: array of pointers to representors
  * @type:  pointer to const application ops and info
  * @ctrl_mtu:  MTU to set on the control vNIC (set in .init())
+ * @netdev_nb: Netdevice notifier block
  * @priv:  app-specific priv data
  */
 struct nfp_app {
@@ -163,6 +168,9 @@ struct nfp_app {
 
const struct nfp_app_type *type;
unsigned int ctrl_mtu;
+
+   struct notifier_block netdev_nb;
+
void *priv;
 };
 
@@ -264,21 +272,6 @@ nfp_app_repr_change_mtu(struct nfp_app *app, struct net_device *netdev,
return app->type->repr_change_mtu(app, netdev, new_mtu);
 }
 
-static inline int nfp_app_start(struct nfp_app *app, struct nfp_net *ctrl)
-{
-   app->ctrl = ctrl;
-   if (!app->type->start)
-   return 0;
-   return app->type->start(app);
-}
-
-static inline void nfp_app_stop(struct nfp_app *app)
-{
-   if (!app->type->stop)
-   return;
-   app->type->stop(app);
-}
-
 static inline const char *nfp_app_name(struct nfp_app *app)
 {
if (!app)
@@ -430,6 +423,8 @@ nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size, gfp_t priority);
 
 struct nfp_app *nfp_app_alloc(struct nfp_pf *pf, enum nfp_app_id id);
 void nfp_app_free(struct nfp_app *app);
+int nfp_app_start(struct nfp_app *app, struct nfp_net *ctrl);
+void nfp_app_stop(struct nfp_app *app);
 
 /* Callbacks shared between apps */
 
-- 
2.17.1
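The central handler relies on the notifier_block being embedded in struct nfp_app. Because of that, container_of() can recover the app from the pointer that the notifier core passes in. The sketch below is a simplified userspace model of that dispatch. container_of() is re-implemented with offsetof(). The demo_* types are hypothetical stand-ins for notifier_block, nfp_app and nfp_app_type, and the netdev argument is dropped. The return codes reuse the kernel's values for NOTIFY_DONE (0) and NOTIFY_OK (1).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_NOTIFY_DONE 0x0000 /* same value as NOTIFY_DONE */
#define DEMO_NOTIFY_OK   0x0001 /* same value as NOTIFY_OK */

struct demo_notifier_block {
    int (*notifier_call)(struct demo_notifier_block *nb,
                         unsigned long event, void *ptr);
};

struct demo_app;

struct demo_app_type {
    /* Optional per-app hook, like nfp_app_type's netdev_event. */
    int (*netdev_event)(struct demo_app *app, unsigned long event, void *ptr);
};

struct demo_app {
    const struct demo_app_type *type;
    struct demo_notifier_block netdev_nb; /* embedded, as in struct nfp_app */
    int events_seen;
};

/* Central handler: recover the app from the embedded notifier block and
 * forward to the app type's hook if there is one. */
static int demo_app_netdev_event(struct demo_notifier_block *nb,
                                 unsigned long event, void *ptr)
{
    struct demo_app *app = container_of(nb, struct demo_app, netdev_nb);

    if (app->type->netdev_event)
        return app->type->netdev_event(app, event, ptr);
    return DEMO_NOTIFY_DONE;
}

/* Example hook that just counts the events it receives. */
static int counting_hook(struct demo_app *app, unsigned long event, void *ptr)
{
    (void)event;
    (void)ptr;
    app->events_seen++;
    return DEMO_NOTIFY_OK;
}
```

An app type without a netdev_event hook reports NOTIFY_DONE, which matches the fallback in nfp_app_netdev_event().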

[PATCH net-next 0/7] nfp: more set actions and notifier refactor

2018-11-06 Thread Jakub Kicinski

Hi!

This series brings updates to flower offload code.  First Pieter adds
support for setting TTL, ToS, Flow Label and Hop Limit fields in IPv4
and IPv6 headers.

The remaining 5 patches factor the netdev notifiers out of the flower
code.  We already have two instances, and more are coming, so it's time
to move to one central notifier which then feeds individual feature
handlers.

I start that part by cleaning up the existing notifiers.  Next a central
notifier is added, and used by flower offloads.

Jakub Kicinski (5):
  nfp: flower: remove unnecessary iteration over devices
  nfp: flower: don't try to nack device unregister events
  nfp: flower: make nfp_fl_lag_changels_event() void
  nfp: register a notifier handler in a central location for the device
  nfp: flower: use the common netdev notifier

Pieter Jansen van Vuuren (2):
  nfp: flower: add ipv4 set ttl and tos offload
  nfp: flower: add ipv6 set flow label and hop limit offload

 .../ethernet/netronome/nfp/flower/action.c| 143 --
 .../net/ethernet/netronome/nfp/flower/cmsg.h  |  24 +++
 .../ethernet/netronome/nfp/flower/lag_conf.c  |  48 +++---
 .../net/ethernet/netronome/nfp/flower/main.c  |  23 ++-
 .../net/ethernet/netronome/nfp/flower/main.h  |  10 +-
 .../netronome/nfp/flower/tunnel_conf.c|  45 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.c  |  47 ++
 drivers/net/ethernet/netronome/nfp/nfp_app.h  |  25 ++-
 8 files changed, 261 insertions(+), 104 deletions(-)

-- 
2.17.1

[PATCH net-next 4/7] nfp: flower: don't try to nack device unregister events

2018-11-06 Thread Jakub Kicinski

Returning an error from a notifier means we want to veto the change.
We shouldn't veto NETDEV_UNREGISTER just because we couldn't find
the tracking info for the given master.

I can't seem to find a way to trigger this unless we have some
other bug, so it's probably not fix-worthy.

While at it, move the check whether the netdev is really of interest
into the handling functions, like we do for other events.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 .../ethernet/netronome/nfp/flower/lag_conf.c  | 21 +++
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
index 81dcf5b318ba..dc060748b33b 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
@@ -472,17 +472,25 @@ nfp_fl_lag_schedule_group_remove(struct nfp_fl_lag *lag,
schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
 }
 
-static int
+static void
 nfp_fl_lag_schedule_group_delete(struct nfp_fl_lag *lag,
 struct net_device *master)
 {
struct nfp_fl_lag_group *group;
+   struct nfp_flower_priv *priv;
+
+   priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+
+   if (!netif_is_bond_master(master))
+   return;
 
mutex_lock(&lag->lock);
group = nfp_fl_lag_find_group_for_master_with_lag(lag, master);
if (!group) {
mutex_unlock(&lag->lock);
-   return -ENOENT;
+   nfp_warn(priv->app->cpp, "untracked bond got unregistered %s\n",
+netdev_name(master));
+   return;
}
 
group->to_remove = true;
@@ -490,7 +498,6 @@ nfp_fl_lag_schedule_group_delete(struct nfp_fl_lag *lag,
mutex_unlock(&lag->lock);
 
schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
-   return 0;
 }
 
 static int
@@ -643,12 +650,8 @@ nfp_fl_lag_netdev_event(struct notifier_block *nb, unsigned long event,
return NOTIFY_BAD;
return NOTIFY_OK;
case NETDEV_UNREGISTER:
-   if (netif_is_bond_master(netdev)) {
-   err = nfp_fl_lag_schedule_group_delete(lag, netdev);
-   if (err)
-   return NOTIFY_BAD;
-   return NOTIFY_OK;
-   }
+   nfp_fl_lag_schedule_group_delete(lag, netdev);
+   return NOTIFY_OK;
}
 
return NOTIFY_DONE;
-- 
2.17.1
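To make the veto semantics described in this patch concrete, here is a minimal userspace model of a notifier chain walk. The `NOTIFY_*` values mirror include/linux/notifier.h; `call_chain()`, the three handlers and `EV_UNREGISTER` are hypothetical stand-ins written for illustration, not kernel code. The model shows why returning `NOTIFY_BAD` for an untracked master is harmful: the walk stops and later subscribers never see the event.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000                      /* don't care */
#define NOTIFY_OK        0x0001                      /* handled, keep going */
#define NOTIFY_STOP_MASK 0x8000                      /* stop walking the chain */
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002) /* veto the change */

static_assert((NOTIFY_BAD & NOTIFY_STOP_MASK) != 0, "a veto stops the chain");

/* Placeholder event id for this model (not the kernel's NETDEV_UNREGISTER value). */
enum { EV_UNREGISTER = 1 };

typedef int (*notifier_fn)(unsigned long event, void *data);

/* Simplified chain walk: call handlers in order, stop as soon as one sets
 * NOTIFY_STOP_MASK, and report how many handlers actually ran. */
static int call_chain(notifier_fn *chain, size_t n, unsigned long event,
		      void *data, size_t *nr_called)
{
	int ret = NOTIFY_DONE;
	size_t called = 0;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](event, data);
		called++;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	*nr_called = called;
	return ret;
}

/* Old behaviour: no tracking info for this master, so veto the unregister. */
static int untracked_veto(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_BAD;
}

/* New behaviour: warn (omitted here) and let the event through. */
static int untracked_ok(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_OK;
}

/* Any other subscriber later in the chain. */
static int other_subscriber(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_DONE;
}
```

Swapping the first handler from `untracked_veto` to `untracked_ok` is, in miniature, the behaviour change made here: the second subscriber now runs.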

[PATCH net-next 1/7] nfp: flower: add ipv4 set ttl and tos offload

2018-11-06 Thread Jakub Kicinski

From: Pieter Jansen van Vuuren 

Add ipv4 set ttl and tos action offload. Since pedit sets headers per
4-byte word, we need to ensure that setting any of version, ihl, protocol,
total length or checksum does not get offloaded.
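Because pedit rewrites the header in aligned 32-bit words, a TTL rewrite arrives as a mask over the whole IPv4 header word at byte offset 8, which also holds the protocol byte (offset 9) and the header checksum (offsets 10-11). The userspace sketch below shows that word layout and the "refuse if any non-TTL byte is masked" rule. `struct ttl_action`, `set_ttl()` and `wire_word()` are hypothetical simplifications of `nfp_fl_set_ip4()`; here `mask` means "bits to rewrite", whereas the driver derives that by inverting the pedit mask.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Layout of the IPv4 header word that starts at the TTL byte (offset 8). */
struct ipv4_ttl_word {
	uint8_t  ttl;
	uint8_t  protocol;
	uint16_t check;
};

static_assert(sizeof(struct ipv4_ttl_word) == 4, "must be one 32-bit word");

/* Hypothetical, simplified version of the driver's TTL action state. */
struct ttl_action {
	uint8_t ttl;
	uint8_t ttl_mask;
};

/* exact/mask hold the 32-bit pedit word exactly as laid out in the packet
 * (network byte order); mask bits set to 1 are the bits to rewrite. */
static int set_ttl(uint32_t exact, uint32_t mask, struct ttl_action *act)
{
	struct ipv4_ttl_word word, word_mask;

	/* memcpy avoids the strict-aliasing issues of casting pointers. */
	memcpy(&word, &exact, sizeof(word));
	memcpy(&word_mask, &mask, sizeof(word_mask));

	/* Only the TTL byte may be rewritten; protocol/checksum edits are refused. */
	if (word_mask.protocol || word_mask.check)
		return -EOPNOTSUPP;

	act->ttl_mask |= word_mask.ttl;
	act->ttl &= ~word_mask.ttl;
	act->ttl |= word.ttl & word_mask.ttl;
	return 0;
}

/* Helper: build the word from its four bytes in on-the-wire order. */
static uint32_t wire_word(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
	uint8_t bytes[4] = { b0, b1, b2, b3 };
	uint32_t w;

	memcpy(&w, bytes, sizeof(w));
	return w;
}
```

Rejected requests return before touching `act`, so a partially offloadable word never leaves stale state behind in this model.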

Signed-off-by: Pieter Jansen van Vuuren 
Reviewed-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c    | 69 +--
 .../net/ethernet/netronome/nfp/flower/cmsg.h  | 10 +++
 2 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 244dc261006e..b79b924ef56d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -384,10 +384,21 @@ nfp_fl_set_eth(const struct tc_action *action, int idx, u32 off,
return 0;
 }
 
+struct ipv4_ttl_word {
+   __u8    ttl;
+   __u8    protocol;
+   __sum16 check;
+};
+
 static int
 nfp_fl_set_ip4(const struct tc_action *action, int idx, u32 off,
-  struct nfp_fl_set_ip4_addrs *set_ip_addr)
+  struct nfp_fl_set_ip4_addrs *set_ip_addr,
+  struct nfp_fl_set_ip4_ttl_tos *set_ip_ttl_tos)
 {
+   struct ipv4_ttl_word *ttl_word_mask;
+   struct ipv4_ttl_word *ttl_word;
+   struct iphdr *tos_word_mask;
+   struct iphdr *tos_word;
__be32 exact, mask;
 
/* We are expecting tcf_pedit to return a big endian value */
@@ -402,20 +413,53 @@ nfp_fl_set_ip4(const struct tc_action *action, int idx, u32 off,
set_ip_addr->ipv4_dst_mask |= mask;
set_ip_addr->ipv4_dst &= ~mask;
set_ip_addr->ipv4_dst |= exact & mask;
+   set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
+   set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
+  NFP_FL_LW_SIZ;
break;
case offsetof(struct iphdr, saddr):
set_ip_addr->ipv4_src_mask |= mask;
set_ip_addr->ipv4_src &= ~mask;
set_ip_addr->ipv4_src |= exact & mask;
+   set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
+   set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
+  NFP_FL_LW_SIZ;
+   break;
+   case offsetof(struct iphdr, ttl):
+   ttl_word_mask = (struct ipv4_ttl_word *)&mask;
+   ttl_word = (struct ipv4_ttl_word *)&exact;
+
+   if (ttl_word_mask->protocol || ttl_word_mask->check)
+   return -EOPNOTSUPP;
+
+   set_ip_ttl_tos->ipv4_ttl_mask |= ttl_word_mask->ttl;
+   set_ip_ttl_tos->ipv4_ttl &= ~ttl_word_mask->ttl;
+   set_ip_ttl_tos->ipv4_ttl |= ttl_word->ttl & ttl_word_mask->ttl;
+   set_ip_ttl_tos->head.jump_id =
+   NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
+   set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
+ NFP_FL_LW_SIZ;
+   break;
+   case round_down(offsetof(struct iphdr, tos), 4):
+   tos_word_mask = (struct iphdr *)&mask;
+   tos_word = (struct iphdr *)&exact;
+
+   if (tos_word_mask->version || tos_word_mask->ihl ||
+   tos_word_mask->tot_len)
+   return -EOPNOTSUPP;
+
+   set_ip_ttl_tos->ipv4_tos_mask |= tos_word_mask->tos;
+   set_ip_ttl_tos->ipv4_tos &= ~tos_word_mask->tos;
+   set_ip_ttl_tos->ipv4_tos |= tos_word->tos & tos_word_mask->tos;
+   set_ip_ttl_tos->head.jump_id =
+   NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
+   set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
+ NFP_FL_LW_SIZ;
break;
default:
return -EOPNOTSUPP;
}
 
-   set_ip_addr->reserved = cpu_to_be16(0);
-   set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
-   set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >> NFP_FL_LW_SIZ;
-
return 0;
 }
 
@@ -513,6 +557,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 char *nfp_action, int *a_len, u32 *csum_updated)
 {
struct nfp_fl_set_ipv6_addr set_ip6_dst, set_ip6_src;
+   struct nfp_fl_set_ip4_ttl_tos set_ip_ttl_tos;
struct nfp_fl_set_ip4_addrs set_ip_addr;
struct nfp_fl_set_tport set_tport;
struct nfp_fl_set_eth set_eth;
@@ -522,6 +567,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
u32 offset, cmd;
u8 ip_proto = 0;
 
+   memset(&set_ip_ttl_tos, 0, sizeof(set_ip_ttl_tos));
memset(&set_ip6_dst, 0, sizeof(set_ip6_dst));
memset(&set_ip6_src, 0, sizeof(set_ip6_src));
memset(&set_ip_addr, 0,

[PATCH net-next 7/7] nfp: flower: use the common netdev notifier

2018-11-06 Thread Jakub Kicinski

Use driver's common notifier for LAG and tunnel configuration.
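The dispatch order of `nfp_flower_netdev_event()` in this patch (the LAG handler first when the LAG feature is present, returning early if it sets `NOTIFY_STOP_MASK`, then the tunnel MAC handler) can be modelled in userspace as below. Only the `NOTIFY_*` values mirror include/linux/notifier.h; `struct flower_app`, `lag_event()`, `tunnel_event()` and `app_netdev_event()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

static_assert(NOTIFY_BAD & NOTIFY_STOP_MASK, "NOTIFY_BAD stops the chain");

/* Hypothetical per-app state. */
struct flower_app {
	bool lag_enabled;  /* stands in for flower_ext_feats & NFP_FL_FEATS_LAG */
	int  lag_ret;      /* what the fake LAG handler returns */
	int  tunnel_calls; /* how often the fake tunnel handler ran */
};

static int lag_event(struct flower_app *app, unsigned long event)
{
	(void)event;
	return app->lag_ret;
}

static int tunnel_event(struct flower_app *app, unsigned long event)
{
	(void)event;
	app->tunnel_calls++;
	return NOTIFY_OK;
}

/* One notifier for the whole app: the LAG code gets the first look, and the
 * tunnel code only runs if LAG did not ask to stop the chain. */
static int app_netdev_event(struct flower_app *app, unsigned long event)
{
	if (app->lag_enabled) {
		int ret = lag_event(app, event);

		if (ret & NOTIFY_STOP_MASK)
			return ret;
	}
	return tunnel_event(app, event);
}
```

A single registration point like this means the driver registers one notifier instead of one per feature, with the feature ordering made explicit in code.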

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 .../ethernet/netronome/nfp/flower/lag_conf.c  | 14 ++-
 .../net/ethernet/netronome/nfp/flower/main.c  | 23 +++
 .../net/ethernet/netronome/nfp/flower/main.h  | 10 +++--
 .../netronome/nfp/flower/tunnel_conf.c        | 38 ++-
 4 files changed, 30 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
index 22b75a519269..5db838f45694 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
@@ -626,17 +626,13 @@ nfp_fl_lag_changels_event(struct nfp_fl_lag *lag, struct net_device *netdev,
schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
 }
 
-static int
-nfp_fl_lag_netdev_event(struct notifier_block *nb, unsigned long event,
-   void *ptr)
+int nfp_flower_lag_netdev_event(struct nfp_flower_priv *priv,
+   struct net_device *netdev,
+   unsigned long event, void *ptr)
 {
-   struct net_device *netdev;
-   struct nfp_fl_lag *lag;
+   struct nfp_fl_lag *lag = &priv->nfp_lag;
int err;
 
-   netdev = netdev_notifier_info_to_dev(ptr);
-   lag = container_of(nb, struct nfp_fl_lag, lag_nb);
-
switch (event) {
case NETDEV_CHANGEUPPER:
err = nfp_fl_lag_changeupper_event(lag, ptr);
@@ -673,8 +669,6 @@ void nfp_flower_lag_init(struct nfp_fl_lag *lag)
 
/* 0 is a reserved batch version so increment to first valid value. */
nfp_fl_increment_version(lag);
-
-   lag->lag_nb.notifier_call = nfp_fl_lag_netdev_event;
 }
 
 void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 3a54728d2ea6..2ad00773750f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -661,23 +661,30 @@ static int nfp_flower_start(struct nfp_app *app)
err = nfp_flower_lag_reset(&app_priv->nfp_lag);
if (err)
return err;
-
-   err = register_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
-   if (err)
-   return err;
}
 
return nfp_tunnel_config_start(app);
 }
 
 static void nfp_flower_stop(struct nfp_app *app)
+{
+   nfp_tunnel_config_stop(app);
+}
+
+static int
+nfp_flower_netdev_event(struct nfp_app *app, struct net_device *netdev,
+   unsigned long event, void *ptr)
 {
struct nfp_flower_priv *app_priv = app->priv;
+   int ret;
 
-   if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
-   unregister_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
+   if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
+   ret = nfp_flower_lag_netdev_event(app_priv, netdev, event, ptr);
+   if (ret & NOTIFY_STOP_MASK)
+   return ret;
+   }
 
-   nfp_tunnel_config_stop(app);
+   return nfp_tunnel_mac_event_handler(app, netdev, event, ptr);
 }
 
 const struct nfp_app_type app_flower = {
@@ -708,6 +715,8 @@ const struct nfp_app_type app_flower = {
.start  = nfp_flower_start,
.stop   = nfp_flower_stop,
 
+   .netdev_event   = nfp_flower_netdev_event,
+
.ctrl_msg_rx= nfp_flower_cmsg_rx,
 
.sriov_enable   = nfp_flower_sriov_enable,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 90045bab95bf..0f6f1675f6f1 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -72,7 +72,6 @@ struct nfp_mtu_conf {
 
 /**
  * struct nfp_fl_lag - Flower APP priv data for link aggregation
- * @lag_nb:     Notifier to track master/slave events
  * @work:  Work queue for writing configs to the HW
  * @lock:  Lock to protect lag_group_list
  * @group_list: List of all master/slave groups offloaded
@@ -85,7 +84,6 @@ struct nfp_mtu_conf {
  * retransmission
  */
 struct nfp_fl_lag {
-   struct notifier_block lag_nb;
struct delayed_work work;
struct mutex lock;
struct list_head group_list;
@@ -126,7 +124,6 @@ struct nfp_fl_lag {
  * @nfp_neigh_off_lock: Lock for the neighbour address list
  * @nfp_mac_off_ids:   IDA to manage id assignment for offloaded macs
  * @nfp_mac_off_count: Number of MACs in address list
- * @nfp_tun_mac_nb:    Notifier to monitor link state
  * @nfp_tun_neigh_nb:  Notifier to monitor neighbour state
  * @reify_replies: atomically stores the number of replies received
  * from firmware for repr reify
@@ -160,7

[PATCH net-next 3/7] nfp: flower: remove unnecessary iteration over devices

2018-11-06 Thread Jakub Kicinski

For flower tunnel offloads, the FW has to be informed about the MAC
addresses of tunnel devices.  We use a netdev notifier to keep track of
these addresses.

Remove the unnecessary loop over netdevices after the notifier is
registered. The intention of the loop was to catch devices which already
existed on the system before the nfp driver got loaded, but the netdev
notifier will replay NETDEV_REGISTER events for those devices.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 8e5bec04d1f9..a3a44f1187d3 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -686,7 +686,6 @@ static int nfp_tun_mac_event_handler(struct notifier_block *nb,
 int nfp_tunnel_config_start(struct nfp_app *app)
 {
struct nfp_flower_priv *priv = app->priv;
-   struct net_device *netdev;
int err;
 
/* Initialise priv data for MAC offloading. */
@@ -715,12 +714,6 @@ int nfp_tunnel_config_start(struct nfp_app *app)
if (err)
goto err_unreg_mac_nb;
 
-   /* Parse netdevs already registered for MACs that need offloaded. */
-   rtnl_lock();
-   for_each_netdev(&init_net, netdev)
-   nfp_tun_add_to_mac_offload_list(netdev, app);
-   rtnl_unlock();
-
return 0;
 
 err_unreg_mac_nb:
-- 
2.17.1
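The removed loop is redundant because registering a netdev notifier replays `NETDEV_REGISTER` for devices that already exist, as the commit message says. A minimal userspace model of that replay: all names here (`struct dev_registry`, `registry_subscribe()`, `count_registers()` and the event ids) are hypothetical, and a single subscriber keeps the sketch short. In the kernel, the equivalent replay happens inside `register_netdevice_notifier()`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8

enum dev_event { DEV_REGISTER, DEV_UNREGISTER }; /* hypothetical event ids */

typedef void (*dev_listener)(enum dev_event ev, const char *name, void *ctx);

struct dev_registry {
	const char  *devs[MAX_DEVS];
	size_t       nr_devs;
	dev_listener listener; /* one subscriber keeps the model small */
	void        *ctx;
};

/* Adding a device notifies the current subscriber, if there is one. */
static void registry_add(struct dev_registry *r, const char *name)
{
	assert(r->nr_devs < MAX_DEVS);
	r->devs[r->nr_devs++] = name;
	if (r->listener)
		r->listener(DEV_REGISTER, name, r->ctx);
}

/* Subscribing replays DEV_REGISTER for devices that already exist, so the
 * subscriber never needs its own "walk existing devices" loop. */
static void registry_subscribe(struct dev_registry *r, dev_listener fn, void *ctx)
{
	r->listener = fn;
	r->ctx = ctx;
	for (size_t i = 0; i < r->nr_devs; i++)
		fn(DEV_REGISTER, r->devs[i], ctx);
}

/* Example subscriber: count devices it has been told about. */
static void count_registers(enum dev_event ev, const char *name, void *ctx)
{
	(void)name;
	if (ev == DEV_REGISTER)
		(*(int *)ctx)++;
}
```

With the replay in place, a manual `for_each_netdev()` walk after registration would simply process the pre-existing devices a second time.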

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-06 Thread Alexei Starovoitov

On Tue, Nov 06, 2018 at 10:58:42PM +, Edward Cree wrote:
>  share its type record with 'foo'.  And partly just because the
>  name of the function itself is no more part of its type than the
>  name of an integer variable is part of the integer's type.

Correct. The function name is not part of its type.
The function name is part of the BTF that provides debug info about the function.

Function name and function argument names are part of the same debug info.
Splitting them makes no sense.

> (Whereas names of parameters are like names of struct members:
>  while they are not part of the 'pure type' from a language
>  perspective, they are part of the type from the perspective of
>  debugging, which is why they belong in the BTF type record.)

struct name and struct field names live in the same BTF record.
Similarly function name and function argument names should be
in the same BTF record, so we can reuse most of the BTF validation
and BTF parsing logic by doing so.
The minor difference between KIND_STRUCT and KIND_FUNC is
the addition of a return type_id.
Everything else is common.
IMO that by itself shows it's the correct path forward.

> > There are C, bpftrace, p4 and python frontends. These languages
> > should be free to put into BTF KIND_FUNC name that makes sense
> > from the language point of view.
> I'm paying attention to BTF because I'm adding support for it into
>  my ebpf_asm.  Don't you think I *know* that frontends for BPF are
>  more than just C?

An assembler is not a high-level language.
I believe it's a proper trade-off to make C easier to use
at the expense of some ugliness in your ebpf_asm.

> > The global variables for given .c file will look like single KIND_STRUCT
> That's exactly the kind of superficially-clever but nasty hack
>  that results from the continued insistence on conflating types
>  and instances (objects).  In the long run it will make
>  maintenance harder, and frustrate new features owing to the need
>  to find new hacks to shoehorn them into the same model.

Let's keep 'nasty hack' claims out of this discussion.
I find the current BTF design and KIND_FUNC addition to be elegant
and appropriate.

> Instead there should be entries for the globals in something like
>  the variables table I mentioned,
> 2 "fred" type=1 where=global func=0 offset=8
>  in which 'func' is unused and 'offset' gives offset in .bss.
>  'where' might also include indication of whether it's static.

'static' as a boolean flag? That won't help introspection.
To properly describe 'static' functions more information is necessary.
I don't like to invent new formats. BTF is extensible description
of any debug info. I prefer to keep all debug info in one place
and in one common format.

> I'm saying that the *function* is entirely different to its
>  *type*.  It's a category error to conflate them:
>     f: x -> x + 1
>  is a function.

BTF does not describe function. BTF describes debug info about function.
BPF program is the function.
BTF is not *type* only format. It's debug info format.
Trying to make BTF into type only is not going to work.
It's already more than type only as I showed earlier.

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Andrew Lunn

> Another approach could be to maintain a list of modules that do not work
> with the generic PHY driver and therefore require a specialized driver,
> in that case we could even go as far as not letting sfp_sm_probe_phy()
> return success. Not sure how well things would scale, probably not too
> bad given there are only a handful of users of the SFP framework thus far...

Hi Florian

Blacklisting modules with known issues with the generic driver does
not sound too bad. This is just a warning, a helpful hint, and it is
not going to work anyway. And i don't see scaling problems, Copper
SFPs seems quite odd to start with...

Andrew

[net-next:master 8/13] drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon

2018-11-06 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   5882d526d887e42ead4014d79620e5a8aa741151
commit: ae7a5aff783c79d5ca87867df84b08c43447159b [8/13] net: dsa: bcm_sf2: Keep copy of inserted rules


coccinelle warnings: (new ones prefixed by >>)

>> drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

[PATCH] net: dsa: bcm_sf2: fix semicolon.cocci warnings

2018-11-06 Thread kbuild test robot

From: kbuild test robot 

drivers/net/dsa/bcm_sf2_cfp.c:1168:2-3: Unneeded semicolon
drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon

Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: ae7a5aff783c ("net: dsa: bcm_sf2: Keep copy of inserted rules")
CC: Florian Fainelli 
Signed-off-by: kbuild test robot 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   5882d526d887e42ead4014d79620e5a8aa741151
commit: ae7a5aff783c79d5ca87867df84b08c43447159b [8/13] net: dsa: bcm_sf2: Keep copy of inserted rules

 bcm_sf2_cfp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -529,7 +529,7 @@ static struct cfp_rule *bcm_sf2_cfp_rule
list_for_each_entry(rule, &priv->cfp.rules_list, next) {
if (rule->port == port && rule->fs.location == location)
break;
-   };
+   }
 
return rule;
 }
@@ -1165,7 +1165,7 @@ int bcm_sf2_cfp_resume(struct dsa_switch
dev_err(ds->dev, "failed to restore rule\n");
return ret;
}
-   };
+   }
 
return ret;
 }

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Florian Fainelli

On 11/6/18 4:34 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 06, 2018 at 04:09:35PM -0800, Florian Fainelli wrote:
>> On 11/6/18 4:03 PM, Russell King - ARM Linux wrote:
>>> On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
>>>> From: Florian Fainelli
>>>> Date: Tue,  6 Nov 2018 15:29:10 -0800
>>>>
>>>>> This patch series allows warning an user that the generic PHY driver(s)
>>>>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
>>>>> likely not going to work at all.
>>>>>
>>>>> Let me know if you would want to do that differently.
>>>>
>>>> Is there ever a possibility that the generic PHY driver could work
>>>> in an SFP situation?
>>>
>>> I don't yet see the reason for Florian's patch series - all the Marvell
>>> 88e based modules I have, or have come across in information from
>>> manufacturers self-configure themselves and don't really need the
>>> Marvell 1G PHY driver.
>>>
>>> For example, the Source Photonics were offering a range of 1GbaseT
>>> modules with the 88e programmed in different modes, but published
>>> instructions for the register accesses to configure them differently
>>> (eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
>>> the module part number determines which mode the PHY has been
>>> programmed to come up in.
>>>
>>> So in theory, you don't need any PHY driver for these modules - but
>>> it's useful to have a functional PHY driver to be able to read out
>>> the negotiated flow control results.
>>>
>>> I'd like more information from Florian about the reasoning behind
>>> this patch series before it's merged.
>>>
>>
>> The module that I am using [1] would not work, as in, no link up being
>> reported without turning on the Marvell PHY driver:
>>
>> https://www.amazon.com/dp/B01LW2P72V/ref=twister_B07F3WQJQX?_encoding=UTF8=1
>>
>> this module uses a 88E PHY as well (OUI: 0x01410cc2).
> 
> From the above URL:
> 
>  * This is 1000M SFP-T Transceiver, not 10/100/1000M Multi-Rate SFP-T. If
>you want to buy 10/100/1000M Multi-Rate SFP-T, pls contact us.10Gtek
>offer more compatible options, if your brands not listed above, pls
>contact us.
> 
> I wonder if this is like the Source Photonics situation, where the
> 1000base-T only variant of their module uses 1000base-X on the MAC
> side, whereas their 10/100/1000base-T variant uses SGMII.  The only
> difference between these are the part numbers and the programming
> of the 88E to tell it which mode to default to for the host
> side.  (There's no true way to know from the EEPROM whether a module
> wants SGMII or 1000base-X.)
> 
> What I also gather is that this is a 10Gtek-manufactured version of
> the Ubiquiti UF-RJ45-1G - the original Ubiquiti version supports
> 10/100/1G speeds which would require the 88e to configure for
> a SGMII host interface by default.
> 
> Now, the reason that modules with an 88E configured to default to
> 1000base-X will work when the marvell PHY driver is present, but not
> with the generic driver is that the marvell PHY driver will see that
> SFP/phylink is wanting to use SGMII mode, and the Marvell PHY driver
> reprograms the PHY to use SGMII.  This is only a problem for these
> modules.
> 
> So, in so far as your patch 3 goes to give a hint that the Marvell
> driver should be selected, that's correct.
> 
> However, where the 88e is configured for SGMII by default, the
> Marvell driver shouldn't be required, and I wonder whether we ought
> to be issuing a warning in that case.  The problem, however, is there
> is no way to know for certain.
> 
> We could have modules that do not use the Marvell PHY, and if we don't
> have a PHY driver for their particular PHY, do we want a warning to be
> issued?

Another approach could be to maintain a list of modules that do not work
with the generic PHY driver and therefore require a specialized driver,
in that case we could even go as far as not letting sfp_sm_probe_phy()
return success. Not sure how well things would scale, probably not too
bad given there are only a handful of users of the SFP framework thus far...
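Such a list could be as small as a table keyed on the vendor name and part number read from the module EEPROM. The sketch below is illustrative only: `struct sfp_quirk`, `sfp_needs_phy_driver()` and the single table entry are hypothetical placeholders, not real modules or existing SFP-framework APIs, and it assumes the EEPROM strings have already had their padding spaces trimmed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: a module whose PHY needs a specialized driver. */
struct sfp_quirk {
	const char *vendor; /* EEPROM vendor name, padding trimmed (assumed) */
	const char *part;   /* EEPROM part number, padding trimmed (assumed) */
};

/* Placeholder entry for illustration only, not a known-bad module. */
static const struct sfp_quirk needs_phy_driver[] = {
	{ "EXAMPLE-VENDOR", "EXAMPLE-PN" },
};

/* Returns true if the module is listed as not working with the generic
 * PHY driver, i.e. probing could warn or refuse without a specific driver. */
static bool sfp_needs_phy_driver(const char *vendor, const char *part)
{
	size_t n = sizeof(needs_phy_driver) / sizeof(needs_phy_driver[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(needs_phy_driver[i].vendor, vendor) == 0 &&
		    strcmp(needs_phy_driver[i].part, part) == 0)
			return true;
	}
	return false;
}
```

An exact-match table keeps the false-positive risk low: modules not listed keep today's behaviour.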

> 
> The whole 1000base-X vs SGMII with SFP modules is all very icky. :(
> 
-- 
Florian

Re: [PATCH v2 1/3] bpf: allow zero-initializing hash map seed

2018-11-06 Thread Song Liu

On Thu, Oct 25, 2018 at 8:12 AM Lorenz Bauer  wrote:
>
> On Tue, 9 Oct 2018 at 01:08, Song Liu  wrote:
> >
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -253,6 +253,8 @@ enum bpf_attach_type {
> > >  #define BPF_F_NO_COMMON_LRU     (1U << 1)
> > >  /* Specify numa node during map creation */
> > >  #define BPF_F_NUMA_NODE         (1U << 2)
> > > +/* Zero-initialize hash function seed. This should only be used for testing. */
> > > +#define BPF_F_ZERO_SEED         (1U << 6)
> >
> > Please add this line after
> > #define BPF_F_STACK_BUILD_ID    (1U << 5)
>
> I wanted to keep the flags for BPF_MAP_CREATE grouped together.
> Maybe the correct value is (1U << 3)? It seemed like the other flags
> were allocated to avoid
> overlap between different BPF commands, however, so I tried to follow suit.

I think it should be (1U << 6). We probably should move BPF_F_QUERY_EFFECTIVE
to after BPF_F_STACK_BUILD_ID (and BPF_F_ZERO_SEED).
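One way to see why (1U << 6) is preferable to (1U << 3): bit 3 is already taken by `BPF_F_RDONLY`, which is also accepted in `map_flags` at map creation. The sketch below lists the relevant bits as I understand them from the v4.19-era include/uapi/linux/bpf.h (please verify against your tree) and checks that `BPF_F_ZERO_SEED` does not collide with any of them. `flags_overlap()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Map-creation related flag bits (values as in include/uapi/linux/bpf.h
 * around v4.19; verify against your tree). */
#define BPF_F_NO_PREALLOC    (1U << 0)
#define BPF_F_NO_COMMON_LRU  (1U << 1)
#define BPF_F_NUMA_NODE      (1U << 2)
#define BPF_F_RDONLY         (1U << 3)
#define BPF_F_WRONLY         (1U << 4)
#define BPF_F_STACK_BUILD_ID (1U << 5)
#define BPF_F_ZERO_SEED      (1U << 6) /* the flag proposed in this series */

/* The tempting alternative, (1U << 3), is already BPF_F_RDONLY. */
static_assert((1U << 3) == BPF_F_RDONLY, "bit 3 is already taken");

/* Returns 1 if any two of the given flags share a bit, else 0. */
static int flags_overlap(const uint32_t *flags, unsigned int n)
{
	uint32_t seen = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (seen & flags[i])
			return 1;
		seen |= flags[i];
	}
	return 0;
}

static const uint32_t map_create_flags[] = {
	BPF_F_NO_PREALLOC, BPF_F_NO_COMMON_LRU, BPF_F_NUMA_NODE,
	BPF_F_RDONLY, BPF_F_WRONLY, BPF_F_STACK_BUILD_ID, BPF_F_ZERO_SEED,
};
```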

Also, please rebase against the latest bpf-next tree and resubmit the set.

Thanks,
Song

[PATCH net-next] net: phy: bcm7xxx: Add entry for BCM7255

2018-11-06 Thread Florian Fainelli

From: Justin Chen 

Add support for BCM7255 EPHY.

Signed-off-by: Justin Chen 
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm7xxx.c | 2 ++
 include/linux/brcmphy.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index b2b6307d64a4..712224cc442d 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -650,6 +650,7 @@ static int bcm7xxx_28nm_probe(struct phy_device *phydev)
 
 static struct phy_driver bcm7xxx_driver[] = {
BCM7XXX_28NM_GPHY(PHY_ID_BCM7250, "Broadcom BCM7250"),
+   BCM7XXX_28NM_EPHY(PHY_ID_BCM7255, "Broadcom BCM7255"),
BCM7XXX_28NM_EPHY(PHY_ID_BCM7260, "Broadcom BCM7260"),
BCM7XXX_28NM_EPHY(PHY_ID_BCM7268, "Broadcom BCM7268"),
BCM7XXX_28NM_EPHY(PHY_ID_BCM7271, "Broadcom BCM7271"),
@@ -670,6 +671,7 @@ static struct phy_driver bcm7xxx_driver[] = {
 
 static struct mdio_device_id __maybe_unused bcm7xxx_tbl[] = {
{ PHY_ID_BCM7250, 0xfff0, },
+   { PHY_ID_BCM7255, 0xfff0, },
{ PHY_ID_BCM7260, 0xfff0, },
{ PHY_ID_BCM7268, 0xfff0, },
{ PHY_ID_BCM7271, 0xfff0, },
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 949e9af8d9d6..9cd00a37b8d3 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -28,6 +28,7 @@
 #define PHY_ID_BCM89610 0x03625cd0
 
 #define PHY_ID_BCM7250 0xae025280
+#define PHY_ID_BCM7255 0xae025120
 #define PHY_ID_BCM7260 0xae025190
 #define PHY_ID_BCM7268 0xae025090
 #define PHY_ID_BCM7271 0xae0253b0
-- 
2.17.1
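PHY drivers are bound by comparing the ID read from the PHY against each table entry, with both sides masked by the entry's mask, so bits the mask clears (usually the revision nibble) are ignored. A minimal sketch of that comparison, assuming a mask of 0xfffffff0 (low four revision bits ignored); `phy_id_matches()` and `REV_MASK` are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHY_ID_BCM7250 0xae025280u
#define PHY_ID_BCM7255 0xae025120u
#define PHY_ID_BCM7260 0xae025190u

/* Assumed mask: compare everything except the low 4 revision bits. */
#define REV_MASK 0xfffffff0u

/* Both IDs must agree on every bit the mask keeps. */
static bool phy_id_matches(uint32_t phy_id, uint32_t table_id, uint32_t mask)
{
	return (phy_id & mask) == (table_id & mask);
}
```

So a BCM7255 reporting any revision in the low nibble still binds to the new entry, while neighbouring IDs such as BCM7250 and BCM7260 do not.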

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Russell King - ARM Linux

On Tue, Nov 06, 2018 at 04:09:35PM -0800, Florian Fainelli wrote:
> On 11/6/18 4:03 PM, Russell King - ARM Linux wrote:
> > On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
> >> From: Florian Fainelli 
> >> Date: Tue,  6 Nov 2018 15:29:10 -0800
> >>
> >>> This patch series allows warning an user that the generic PHY driver(s)
> >>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
> >>> likely not going to work at all.
> >>>
> >>> Let me know if you would want to do that differently.
> >>
> >> Is there ever a possibility that the generic PHY driver could work
> >> in an SFP situation?
> > 
> > I don't yet see the reason for Florian's patch series - all the Marvell
> > 88e based modules I have, or have come across in information from
> > manufacturers self-configure themselves and don't really need the
> > Marvell 1G PHY driver.
> > 
> > For example, the Source Photonics were offering a range of 1GbaseT
> > modules with the 88e programmed in different modes, but published
> > instructions for the register accesses to configure them differently
> > (eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
> > the module part number determines which mode the PHY has been
> > programmed to come up in.
> > 
> > So in theory, you don't need any PHY driver for these modules - but
> > it's useful to have a functional PHY driver to be able to read out
> > the negotiated flow control results.
> > 
> > I'd like more information from Florian about the reasoning behind
> > this patch series before it's merged.
> > 
> 
> The module that I am using [1] would not work, as in, no link up being
> reported without turning on the Marvell PHY driver:
> 
> https://www.amazon.com/dp/B01LW2P72V/ref=twister_B07F3WQJQX?_encoding=UTF8=1
> 
> this module uses a 88E PHY as well (OUI: 0x01410cc2).

From the above URL:

 * This is 1000M SFP-T Transceiver, not 10/100/1000M Multi-Rate SFP-T. If
   you want to buy 10/100/1000M Multi-Rate SFP-T, pls contact us.10Gtek
   offer more compatible options, if your brands not listed above, pls
   contact us.

I wonder if this is like the Source Photonics situation, where the
1000base-T only variant of their module uses 1000base-X on the MAC
side, whereas their 10/100/1000base-T variant uses SGMII.  The only
difference between these are the part numbers and the programming
of the 88E to tell it which mode to default to for the host
side.  (There's no true way to know from the EEPROM whether a module
wants SGMII or 1000base-X.)

What I also gather is that this is a 10Gtek-manufactured version of
the Ubiquiti UF-RJ45-1G - the original Ubiquiti version supports
10/100/1G speeds which would require the 88e to configure for
a SGMII host interface by default.

Now, the reason that modules with an 88E configured to default to
1000base-X will work when the marvell PHY driver is present, but not
with the generic driver is that the marvell PHY driver will see that
SFP/phylink is wanting to use SGMII mode, and the Marvell PHY driver
reprograms the PHY to use SGMII.  This is only a problem for these
modules.
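The argument above reduces to a toy decision function: a module links up only if its programmed host-side mode matches what phylink wants, or if the bound PHY driver can reprogram it. `enum host_mode`, `struct sfp_phy`, `link_can_come_up()` and the `driver_can_reprogram` flag are hypothetical and only encode the reasoning in this message; they are not kernel interfaces.

```c
#include <stdbool.h>

enum host_mode { HOST_1000BASEX, HOST_SGMII };

/* Hypothetical view of a copper SFP's PHY. */
struct sfp_phy {
	enum host_mode default_mode; /* set by the module vendor's programming */
};

/* A specific driver (e.g. the Marvell one) can rewrite the host mode to what
 * phylink asks for; the generic driver leaves the default alone. */
static bool link_can_come_up(const struct sfp_phy *phy, enum host_mode wanted,
			     bool driver_can_reprogram)
{
	if (phy->default_mode == wanted)
		return true;         /* self-configured module: any driver works */
	return driver_can_reprogram; /* mismatch: only a reprogramming driver helps */
}
```

This is also why a warning is hard to target: from the EEPROM alone the model's `default_mode` is unknown, so the function cannot be evaluated before the PHY is probed.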

So, in so far as your patch 3 goes to give a hint that the Marvell
driver should be selected, that's correct.

However, where the 88e is configured for SGMII by default, the
Marvell driver shouldn't be required, and I wonder whether we ought
to be issuing a warning in that case.  The problem, however, is there
is no way to know for certain.

We could have modules that do not use the Marvell PHY, and if we don't
have a PHY driver for their particular PHY, do we want a warning to be
issued?

The whole 1000base-X vs SGMII with SFP modules is all very icky. :(

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

Re: [PATCH bpf-next] bpf_load: add map name to load_maps error message

2018-11-06 Thread Song Liu

On Mon, Oct 29, 2018 at 3:12 PM John Fastabend  wrote:
>
> On 10/29/2018 02:14 PM, Shannon Nelson wrote:
> > To help when debugging bpf/xdp load issues, have the load_map()
> > error message include the number and name of the map that
> > failed.
> >
> > Signed-off-by: Shannon Nelson 
> > ---
> >  samples/bpf/bpf_load.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > index 89161c9..5de0357 100644
> > --- a/samples/bpf/bpf_load.c
> > +++ b/samples/bpf/bpf_load.c
> > @@ -282,8 +282,8 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
> >   numa_node);
> >   }
> >   if (map_fd[i] < 0) {
> > - printf("failed to create a map: %d %s\n",
> > -errno, strerror(errno));
> > + printf("failed to create map %d (%s): %d %s\n",
> > +i, maps[i].name, errno, strerror(errno));
> >   return 1;
> >   }
> >   maps[i].fd = map_fd[i];
> >
>
> LGTM
>
> Acked-by: John Fastabend 

Acked-by: Song Liu

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Florian Fainelli

On 11/6/18 4:03 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Tue,  6 Nov 2018 15:29:10 -0800
>>
>>> This patch series allows warning an user that the generic PHY driver(s)
>>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
>>> likely not going to work at all.
>>>
>>> Let me know if you would want to do that differently.
>>
>> Is there ever a possibility that the generic PHY driver could work
>> in an SFP situation?
> 
> I don't yet see the reason for Florian's patch series - all the Marvell
> 88e based modules I have, or have come across in information from
> manufacturers self-configure themselves and don't really need the
> Marvell 1G PHY driver.
> 
> For example, the Source Photonics were offering a range of 1GbaseT
> modules with the 88e programmed in different modes, but published
> instructions for the register accesses to configure them differently
> (eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
> the module part number determines which mode the PHY has been
> programmed to come up in.
> 
> So in theory, you don't need any PHY driver for these modules - but
> it's useful to have a functional PHY driver to be able to read out
> the negotiated flow control results.
> 
> I'd like more information from Florian about the reasoning behind
> this patch series before it's merged.
> 

The module that I am using [1] would not work, as in, no link up being
reported without turning on the Marvell PHY driver:

https://www.amazon.com/dp/B01LW2P72V/ref=twister_B07F3WQJQX?_encoding=UTF8=1

this module uses a 88E PHY as well (OUI: 0x01410cc2).
-- 
Florian

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Russell King - ARM Linux

On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
> From: Florian Fainelli 
> Date: Tue,  6 Nov 2018 15:29:10 -0800
> 
> > This patch series allows warning an user that the generic PHY driver(s)
> > are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
> > likely not going to work at all.
> > 
> > Let me know if you would want to do that differently.
> 
> Is there ever a possibility that the generic PHY driver could work
> in an SFP situation?

I don't yet see the reason for Florian's patch series - all the Marvell
88e based modules I have, or have come across in information from
manufacturers self-configure themselves and don't really need the
Marvell 1G PHY driver.

For example, the Source Photonics were offering a range of 1GbaseT
modules with the 88e programmed in different modes, but published
instructions for the register accesses to configure them differently
(eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
the module part number determines which mode the PHY has been
programmed to come up in.

So in theory, you don't need any PHY driver for these modules - but
it's useful to have a functional PHY driver to be able to read out
the negotiated flow control results.

I'd like more information from Florian about the reasoning behind
this patch series before it's merged.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
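Russell's point is that a PHY driver is still useful for reading back the negotiated flow control result. As a rough illustration of what that readout involves, here is a standalone sketch of full-duplex pause resolution (IEEE 802.3 Annex 28B), following the same logic as the kernel's mii_resolve_flowctrl_fdx() helper. The ADV_* values mirror the PAUSE (bit 10) and ASM_DIR (bit 11) bits of the MII advertisement register; the ADV_* and FC_* names and the function itself are local to this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Pause bits as they appear in the MII advertisement (reg 4) and
 * link partner ability (reg 5) registers. */
#define ADV_PAUSE_CAP  0x0400u /* symmetric PAUSE */
#define ADV_PAUSE_ASYM 0x0800u /* asymmetric PAUSE direction */

/* Resolved flow control, from the local MAC's point of view. */
#define FC_TX 0x01u /* we may send PAUSE frames */
#define FC_RX 0x02u /* we honour PAUSE frames we receive */

/* lcladv: our advertisement, rmtadv: the link partner's. */
static unsigned int resolve_pause_fdx(uint16_t lcladv, uint16_t rmtadv)
{
	unsigned int cap = 0;

	if (lcladv & rmtadv & ADV_PAUSE_CAP) {
		/* Both ends support symmetric pause. */
		cap = FC_TX | FC_RX;
	} else if (lcladv & rmtadv & ADV_PAUSE_ASYM) {
		/* Both ends support asymmetric pause: the direction depends
		 * on which side also advertised PAUSE. */
		if (lcladv & ADV_PAUSE_CAP)
			cap = FC_RX;
		else if (rmtadv & ADV_PAUSE_CAP)
			cap = FC_TX;
	}
	return cap;
}
```

With a generic driver, or no driver at all, nothing reads the link partner register, so this result is not available to the MAC.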

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Florian Fainelli

On 11/6/18 3:38 PM, David Miller wrote:
> From: Florian Fainelli 
> Date: Tue,  6 Nov 2018 15:29:10 -0800
> 
>> This patch series allows warning a user that the generic PHY driver(s)
>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
>> likely not going to work at all.
>>
>> Let me know if you would want to do that differently.
> 
> Is there ever a possibility that the generic PHY driver could work
> in an SFP situation?

Given the PHY has to operate in SGMII mode, I doubt it could work
without a specialized driver. Andrew, Russell, would you concur?

> 
> If not, yes emit the message but also fail the load and registry too
> perhaps?
> 

I was not sure this would be acceptable, but it is definitely an easy
change.
-- 
Florian

Re: [PATCH net-next 0/3] net: systemport: Unmap queues upon DSA unregister event

2018-11-06 Thread David Miller

From: Florian Fainelli 
Date: Tue,  6 Nov 2018 15:15:15 -0800

> This patch series fixes the unbinding/binding of the bcm_sf2 switch
> driver along with bcmsysport which monitors the switch port queues.
> Because the driver was not processing the DSA_PORT_UNREGISTER event, we
> would not be unmapping switch port/queues, which could cause incorrect
> decisions to be made by the HW (e.g: queue always back-pressured).

Series applied, thanks Florian.

Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread David Miller

From: Florian Fainelli 
Date: Tue,  6 Nov 2018 15:29:10 -0800

> This patch series allows warning a user that the generic PHY driver(s)
> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
> likely not going to work at all.
> 
> Let me know if you would want to do that differently.

Is there ever a possibility that the generic PHY driver could work
in an SFP situation?

If not, yes emit the message but also fail the load and registry too
perhaps?

[PATCH RFC net-next 2/3] net: phy: sfp: Issue warning when using Generic PHY driver(s)

2018-11-06 Thread Florian Fainelli

1000BaseT SFP modules typically include an Ethernet PHY device, and
while the Generic PHY driver will be able to bind to it, it usually will
not work at all without a specialized PHY driver. Issue a warning in
that case to help troubleshoot things.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/sfp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index fd8bb998ae52..228205d8ce84 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -1203,6 +1203,9 @@ static void sfp_sm_probe_phy(struct sfp *sfp)
}
 
sfp->mod_phy = phy;
+   if (phy_driver_is_genphy(phy) || phy_driver_is_genphy_10g(phy))
+   dev_warn(sfp->dev, "Using Generic PHY driver with a SFP!\n");
+
phy_start(phy);
 }
 
-- 
2.17.1

[PATCH RFC net-next 1/3] net: phy: Add helpers to determine if PHY driver is generic

2018-11-06 Thread Florian Fainelli

We are already checking in phy_detach() that the PHY driver is of
generic kind (1G or 10G) and we are going to make use of that in the SFP
layer as well for 1000BaseT SFP modules, so expose helper functions to
return that information.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy_device.c | 34 --
 include/linux/phy.h  |  3 +++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index ab33d1777132..15de7a3263bf 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1262,6 +1262,36 @@ struct phy_device *phy_attach(struct net_device *dev, const char *bus_id,
 }
 EXPORT_SYMBOL(phy_attach);
 
+static bool phy_driver_is_genphy_kind(struct phy_device *phydev,
+ struct device_driver *driver)
+{
+   struct device *d = &phydev->mdio.dev;
+   bool ret = false;
+
+   if (!phydev->drv)
+   return ret;
+
+   get_device(d);
+   ret = d->driver == driver;
+   put_device(d);
+
+   return ret;
+}
+
+bool phy_driver_is_genphy(struct phy_device *phydev)
+{
+   return phy_driver_is_genphy_kind(phydev,
+   &genphy_driver.mdiodrv.driver);
+}
+EXPORT_SYMBOL_GPL(phy_driver_is_genphy);
+
+bool phy_driver_is_genphy_10g(struct phy_device *phydev)
+{
+   return phy_driver_is_genphy_kind(phydev,
+   &genphy_10g_driver.mdiodrv.driver);
+}
+EXPORT_SYMBOL_GPL(phy_driver_is_genphy_10g);
+
 /**
  * phy_detach - detach a PHY device from its network device
  * @phydev: target phy_device struct
@@ -1293,8 +1323,8 @@ void phy_detach(struct phy_device *phydev)
 * from the generic driver so that there's a chance a
 * real driver could be loaded
 */
-   if (phydev->mdio.dev.driver == &genphy_10g_driver.mdiodrv.driver ||
-   phydev->mdio.dev.driver == &genphy_driver.mdiodrv.driver)
+   if (phy_driver_is_genphy(phydev) ||
+   phy_driver_is_genphy_10g(phydev))
 device_release_driver(&phydev->mdio.dev);
 
/*
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 3ea87f774a76..84a6c7efef60 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1192,4 +1192,7 @@ module_exit(phy_module_exit)
 #define module_phy_driver(__phy_drivers)   \
phy_module_driver(__phy_drivers, ARRAY_SIZE(__phy_drivers))
 
+bool phy_driver_is_genphy(struct phy_device *phydev);
+bool phy_driver_is_genphy_10g(struct phy_device *phydev);
+
 #endif /* __PHY_H */
-- 
2.17.1
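The helpers in patch 1/3 decide "generic or not" purely by pointer identity: a PHY is on the generic driver if the struct device_driver it is bound to is the very object embedded in genphy_driver or genphy_10g_driver. Here is a minimal userspace model of that idea. The structs are simplified stand-ins, not the kernel definitions, and phy_is_genphy_any() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct device_driver { const char *name; };
struct device { struct device_driver *driver; /* NULL if unbound */ };
struct phy_driver { struct device_driver driver; };
struct phy_device { struct device dev; };

/* Stand-ins for the two generic drivers. */
static struct phy_driver genphy_driver = { { "Generic PHY" } };
static struct phy_driver genphy_10g_driver = { { "Generic 10G PHY" } };

/* Bound to the exact driver object embedded in drv? Comparing names
 * would not be enough: identity is what matters. The NULL check mirrors
 * the early return for an unbound PHY. */
static bool is_bound_to(const struct phy_device *phydev,
			const struct phy_driver *drv)
{
	return phydev->dev.driver != NULL &&
	       phydev->dev.driver == &drv->driver;
}

static bool phy_is_genphy_any(const struct phy_device *phydev)
{
	return is_bound_to(phydev, &genphy_driver) ||
	       is_bound_to(phydev, &genphy_10g_driver);
}
```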

[PATCH RFC net-next 3/3] net: phy: Default MARVELL_PHY to the value of SFP

2018-11-06 Thread Florian Fainelli

Marvell PHYs are typically found in 1000BaseT SFP modules, so give
users a chance to get the correct PHY driver when using SFP modules.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 3d187cd50eb0..cf7d44ba20c5 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -350,6 +350,7 @@ config LXT_PHY
 
 config MARVELL_PHY
tristate "Marvell PHYs"
+   default SFP
---help---
  Currently has a driver for the 88E1011S
 
-- 
2.17.1

[PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver

2018-11-06 Thread Florian Fainelli

Hi all,

This patch series allows warning a user that the generic PHY driver(s)
are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
likely not going to work at all.

Let me know if you would want to do that differently.

Florian Fainelli (3):
  net: phy: Add helpers to determine if PHY driver is generic
  net: phy: sfp: Issue warning when using Generic PHY driver(s)
  net: phy: Default MARVELL_PHY to the value of SFP

 drivers/net/phy/Kconfig  |  1 +
 drivers/net/phy/phy_device.c | 34 --
 drivers/net/phy/sfp.c|  3 +++
 include/linux/phy.h  |  3 +++
 4 files changed, 39 insertions(+), 2 deletions(-)

-- 
2.17.1

Re: [PATCH net-next 09/11] udp: Support for error handlers of tunnels with arbitrary destination port

2018-11-06 Thread David Miller

From: Stefano Brivio 
Date: Tue,  6 Nov 2018 22:39:05 +0100

> diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
> index 236e40ba06bf..7855966b4a19 100644
> --- a/include/net/ip6_tunnel.h
> +++ b/include/net/ip6_tunnel.h
> @@ -69,6 +69,8 @@ struct ip6_tnl_encap_ops {
>   size_t (*encap_hlen)(struct ip_tunnel_encap *e);
>   int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
>   u8 *protocol, struct flowi6 *fl6);
> + int (*err_handler)(struct sk_buff *, struct inet6_skb_parm *opt,
> +u8 type, u8 code, int offset, __be32 info);
>  };

Please give names to all of the arguments in this new method.

...
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index b0d022ff6ea1..5980659312e5 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -311,6 +311,7 @@ struct ip_tunnel_encap_ops {
>   size_t (*encap_hlen)(struct ip_tunnel_encap *e);
>   int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
>   u8 *protocol, struct flowi4 *fl4);
> + int (*err_handler)(struct sk_buff *, u32);

Likewise.

Re: [PATCH net-next 01/11] udp: Handle ICMP errors for tunnels with same destination port on both endpoints

2018-11-06 Thread David Miller

From: Stefano Brivio 
Date: Tue,  6 Nov 2018 22:38:57 +0100

> + /* Network header needs to point to the outer IPv4 header inside ICMP */
> + skb_reset_network_header(skb);
> + iph = ip_hdr(skb);
> + /* Transport header needs to point to the UDP header */
> + skb_set_transport_header(skb, iph->ihl << 2);

Please put an empty line before the second comment.
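The `iph->ihl << 2` in the quoted hunk converts the IPv4 Internet Header Length field, which counts 32-bit words, into a byte offset. Here is a standalone sketch of the same computation on a raw header buffer. The helper name and its validity checks are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the byte offset of the transport header following an IPv4
 * header that starts at pkt[0], or 0 if the header looks malformed.
 * IHL is the low nibble of the first byte, in units of 32-bit words. */
static size_t ipv4_transport_offset(const uint8_t *pkt, size_t len)
{
	unsigned int version, ihl;

	if (len < 20)
		return 0;
	version = pkt[0] >> 4;
	ihl = pkt[0] & 0x0f;
	if (version != 4 || ihl < 5 || (size_t)ihl << 2 > len)
		return 0;
	return (size_t)ihl << 2; /* same as ihl * 4 */
}
```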

Re: [PATCH net-next 00/11] ICMP error handling for UDP tunnels

2018-11-06 Thread David Miller

From: Stefano Brivio 
Date: Tue,  6 Nov 2018 22:38:56 +0100

> This series introduces ICMP error handling for UDP tunnels and
> encapsulations and related selftests. We need to handle ICMP errors to
> support PMTU discovery and route redirection -- this support is entirely
> missing right now:
> 
> - patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
>   the same destination port on both endpoints -- i.e. VxLAN and GENEVE
> - patches 2/11 to 7/11 are specific to VxLAN and GENEVE
> - patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
>   where sent packets cannot be matched via receiving socket lookup, i.e.
>   FoU and GUE
> - patches 10/11 and 11/11 are specific to FoU and GUE

I like this series, especially the testcases.

But I have a minor coding style issue or two I'd like you
to fixup before I apply this.

I'll reply to individual patches as needed.
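Patch 1/11 relies on VxLAN and GENEVE using the same well-known UDP port on both endpoints (4789 and 6081). An ICMP error quotes the packet we sent. The regular UDP error lookup keys on the quoted *source* port, which for these tunnels is a per-flow hash and matches no socket. The quoted *destination* port, however, equals the port our own tunnel socket is bound to. The sketch below is a simplified userspace model of that fallback. The socket table and function names are hypothetical, and ports are kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of local UDP tunnel sockets, keyed by bound port. */
struct tunnel_sock {
	uint16_t local_port;
	const char *name;
};

static const struct tunnel_sock tunnel_socks[] = {
	{ 4789, "vxlan" },
	{ 6081, "geneve" },
};

/* UDP ports of our own packet, as quoted inside an ICMP error. */
struct quoted_udp {
	uint16_t source; /* per-flow hash chosen by the tunnel */
	uint16_t dest;   /* remote tunnel port == our local port, by design */
};

static const struct tunnel_sock *lookup_by_port(uint16_t port)
{
	for (size_t i = 0; i < sizeof(tunnel_socks) / sizeof(tunnel_socks[0]); i++)
		if (tunnel_socks[i].local_port == port)
			return &tunnel_socks[i];
	return NULL;
}

/* Try the regular lookup (quoted source port = the port we sent from)
 * first, then fall back to the quoted destination port. */
static const struct tunnel_sock *icmp_err_lookup(const struct quoted_udp *uh)
{
	const struct tunnel_sock *sk = lookup_by_port(uh->source);

	return sk ? sk : lookup_by_port(uh->dest);
}
```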

[PATCH net-next 2/3] net: systemport: Simplify queue mapping logic

2018-11-06 Thread Florian Fainelli

The use of a bitmap speeds up finding the first available queue from
which we could start establishing the mapping, but we still have to
loop over all slave network devices to set them up. Simplify the
logic to use a single loop, relying on the fact that a correctly
configured ring has inspect set to true. This will make things simpler
to unwind during device unregistration.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 17 +
 drivers/net/ethernet/broadcom/bcmsysport.h |  1 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 0e2d99c737e3..f620c647bb86 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2312,7 +2312,7 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
struct bcm_sysport_priv *priv;
struct net_device *slave_dev;
unsigned int num_tx_queues;
-   unsigned int q, start, port;
+   unsigned int q, qp, port;
struct net_device *dev;
 
priv = container_of(nb, struct bcm_sysport_priv, dsa_notifier);
@@ -2351,20 +2351,21 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
 
priv->per_port_num_tx_queues = num_tx_queues;
 
-   start = find_first_zero_bit(&priv->queue_bitmap, dev->num_tx_queues);
-   for (q = 0; q < num_tx_queues; q++) {
-   ring = &priv->tx_rings[q + start];
+   for (q = 0, qp = 0; q < dev->num_tx_queues && qp < num_tx_queues;
+q++) {
+   ring = &priv->tx_rings[q];
+
+   if (ring->inspect)
+   continue;
 
/* Just remember the mapping actual programming done
 * during bcm_sysport_init_tx_ring
 */
-   ring->switch_queue = q;
+   ring->switch_queue = qp;
ring->switch_port = port;
ring->inspect = true;
priv->ring_map[q + port * num_tx_queues] = ring;
-
-   /* Set all queues as being used now */
-   set_bit(q + start, &priv->queue_bitmap);
+   qp++;
}
 
return 0;
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index a7a230884a87..94d64b203098 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -795,7 +795,6 @@ struct bcm_sysport_priv {
/* map information between switch port queues and local queues */
struct notifier_block   dsa_notifier;
unsigned intper_port_num_tx_queues;
-   unsigned long   queue_bitmap;
struct bcm_sysport_tx_ring *ring_map[DSA_MAX_PORTS * 8];
 
 };
-- 
2.17.1
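The single-loop scheme from patch 2/3 can be modelled in userspace. The loop walks all local TX rings, skips those already claimed (inspect set), and hands out switch queues 0..n-1 to the first free rings. The structure is a simplified stand-in for struct bcm_sysport_tx_ring, ring_map is omitted, and the ring count is arbitrary. This is a sketch of the idea, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 8 /* local TX rings; arbitrary for this sketch */

struct tx_ring {
	bool inspect;              /* ring already mapped to a switch queue */
	unsigned int switch_port;
	unsigned int switch_queue;
};

/* Map per_port switch queues of `port` onto the first free rings.
 * Returns the number of queues actually mapped. */
static unsigned int map_port(struct tx_ring *rings, unsigned int port,
			     unsigned int per_port)
{
	unsigned int q, qp;

	for (q = 0, qp = 0; q < NUM_RINGS && qp < per_port; q++) {
		struct tx_ring *ring = &rings[q];

		if (ring->inspect)
			continue; /* claimed by another port */

		ring->switch_queue = qp;
		ring->switch_port = port;
		ring->inspect = true;
		qp++;
	}
	return qp;
}
```

Mapping port 0 and then port 1 with four queues each fills rings 0-3 and 4-7. A third port then gets no rings.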

[PATCH net-next 3/3] net: systemport: Unmap queues upon DSA unregister event

2018-11-06 Thread Florian Fainelli

Binding and unbinding the switch driver, which creates the DSA slave
network devices for which we set up inspection, would lead to
undesirable effects since we were not clearing the port/queue mapping
to the SYSTEMPORT TX queue.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 56 +++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index f620c647bb86..f8f0a027b3ae 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2371,17 +2371,61 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
return 0;
 }
 
+static int bcm_sysport_unmap_queues(struct notifier_block *nb,
+   struct dsa_notifier_register_info *info)
+{
+   struct bcm_sysport_tx_ring *ring;
+   struct bcm_sysport_priv *priv;
+   struct net_device *slave_dev;
+   unsigned int num_tx_queues;
+   struct net_device *dev;
+   unsigned int q, port;
+
+   priv = container_of(nb, struct bcm_sysport_priv, dsa_notifier);
+   if (priv->netdev != info->master)
+   return 0;
+
+   dev = info->master;
+
+   if (dev->netdev_ops != &bcm_sysport_netdev_ops)
+   return 0;
+
+   port = info->port_number;
+   slave_dev = info->info.dev;
+
+   num_tx_queues = slave_dev->real_num_tx_queues;
+
+   for (q = 0; q < dev->num_tx_queues; q++) {
+   ring = &priv->tx_rings[q];
+
+   if (ring->switch_port != port)
+   continue;
+
+   if (!ring->inspect)
+   continue;
+
+   ring->inspect = false;
+   priv->ring_map[q + port * num_tx_queues] = NULL;
+   }
+
+   return 0;
+}
+
 static int bcm_sysport_dsa_notifier(struct notifier_block *nb,
unsigned long event, void *ptr)
 {
-   struct dsa_notifier_register_info *info;
+   int ret = NOTIFY_DONE;
 
-   if (event != DSA_PORT_REGISTER)
-   return NOTIFY_DONE;
-
-   info = ptr;
+   switch (event) {
+   case DSA_PORT_REGISTER:
+   ret = bcm_sysport_map_queues(nb, ptr);
+   break;
+   case DSA_PORT_UNREGISTER:
+   ret = bcm_sysport_unmap_queues(nb, ptr);
+   break;
+   }
 
-   return notifier_from_errno(bcm_sysport_map_queues(nb, info));
+   return notifier_from_errno(ret);
 }
 
 #define REV_FMT"v%2x.%02x"
-- 
2.17.1
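The following model shows why the unregister path matters. Without it, rings keep inspect set after the switch driver unbinds, so a later re-bind finds no free rings. This self-contained sketch of the register/unregister round trip uses a simplified, hypothetical stand-in for the driver's ring structure.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 8 /* local TX rings; arbitrary for this sketch */

struct tx_ring {
	bool inspect;
	unsigned int switch_port;
	unsigned int switch_queue;
};

/* Models DSA_PORT_REGISTER: claim free rings for `port`. */
static unsigned int map_port(struct tx_ring *rings, unsigned int port,
			     unsigned int per_port)
{
	unsigned int q, qp = 0;

	for (q = 0; q < NUM_RINGS && qp < per_port; q++) {
		if (rings[q].inspect)
			continue;
		rings[q].switch_port = port;
		rings[q].switch_queue = qp++;
		rings[q].inspect = true;
	}
	return qp;
}

/* Models DSA_PORT_UNREGISTER: release every ring claimed by `port`. */
static void unmap_port(struct tx_ring *rings, unsigned int port)
{
	for (unsigned int q = 0; q < NUM_RINGS; q++)
		if (rings[q].inspect && rings[q].switch_port == port)
			rings[q].inspect = false;
}
```

Re-registering port 1 without calling unmap_port() first maps zero queues. After the unmap, the same call succeeds again.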

[PATCH net-next 1/3] net: dsa: bcm_sf2: Turn on PHY to allow successful registration

2018-11-06 Thread Florian Fainelli

We bind to the PHY using the SF2 slave MDIO bus that we create, and
binding involves reading the PHY's MII_PHYSID1/2 registers, which won't
be possible if the PHY is turned off. Temporarily turn it on (and back
off) for the bus probing to succeed. This fixes unbind/bind problems
where the port connected to that PHY would be in error since it could
not connect to it.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 2eb68769562c..2c664aac1e7b 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1090,12 +1090,16 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
return ret;
}
 
+   bcm_sf2_gphy_enable_set(priv->dev->ds, true);
+
ret = bcm_sf2_mdio_register(ds);
if (ret) {
pr_err("failed to register MDIO bus\n");
return ret;
}
 
+   bcm_sf2_gphy_enable_set(priv->dev->ds, false);
+
ret = bcm_sf2_cfp_rst(priv);
if (ret) {
pr_err("failed to reset CFP\n");
-- 
2.17.1
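Bus probing needs the PHY powered because it identifies each PHY by combining MII_PHYSID1 (register 2) and MII_PHYSID2 (register 3) into a 32-bit ID. A device that does not drive MDIO typically reads back as all ones, which PHY probing treats as "no device". This simplified model shows the idea. The fake PHY and its read function are hypothetical, and the real kernel check is not reproduced exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MII_PHYSID1 2 /* PHY identifier, high 16 bits */
#define MII_PHYSID2 3 /* PHY identifier, low 16 bits */

/* Hypothetical fake PHY: when powered down it does not drive MDIO,
 * so reads float high and return 0xffff. */
struct fake_phy {
	bool powered;
	uint32_t id;
};

static uint16_t fake_mdio_read(const struct fake_phy *phy, int reg)
{
	if (!phy->powered)
		return 0xffff;
	if (reg == MII_PHYSID1)
		return phy->id >> 16;
	if (reg == MII_PHYSID2)
		return phy->id & 0xffff;
	return 0;
}

/* Returns true and fills *id if a PHY answered. */
static bool probe_phy_id(const struct fake_phy *phy, uint32_t *id)
{
	uint32_t v = (uint32_t)fake_mdio_read(phy, MII_PHYSID1) << 16 |
		     fake_mdio_read(phy, MII_PHYSID2);

	if (v == 0xffffffff)
		return false; /* nobody drove the bus: treat as no PHY */
	*id = v;
	return true;
}
```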

[PATCH net-next 0/3] net: systemport: Unmap queues upon DSA unregister event

2018-11-06 Thread Florian Fainelli

Hi all,

This patch series fixes the unbinding/binding of the bcm_sf2 switch
driver along with bcmsysport which monitors the switch port queues.
Because the driver was not processing the DSA_PORT_UNREGISTER event, we
would not be unmapping switch port/queues, which could cause incorrect
decisions to be made by the HW (e.g: queue always back-pressured).

Florian Fainelli (3):
  net: dsa: bcm_sf2: Turn on PHY to allow successful registration
  net: systemport: Simplify queue mapping logic
  net: systemport: Unmap queues upon DSA unregister event

 drivers/net/dsa/bcm_sf2.c  |  4 ++
 drivers/net/ethernet/broadcom/bcmsysport.c | 71 ++
 drivers/net/ethernet/broadcom/bcmsysport.h |  1 -
 3 files changed, 62 insertions(+), 14 deletions(-)

-- 
2.17.1

Re: [PATCH net-next 0/5] net: dsa: bcm_sf2: Store rules in lists

2018-11-06 Thread Florian Fainelli

On 11/6/18 3:06 PM, David Miller wrote:
> From: Florian Fainelli 
> Date: Tue,  6 Nov 2018 12:58:36 -0800
> 
>> Hi all,
>>
>> This patch series changes the bcm-sf2 driver to keep a copy of the
>> inserted rules as opposed to using the HW as a storage area for a number
>> of reasons:
>>
>> - this helps us with doing duplicate rule detection in a faster way, it
>>   would have required a full rule read before
>>
>> - this helps with Pablo's on-going work to convert ethtool_rx_flow_spec
>>   to a more generic flow rule structure by having fewer code paths to
>>   convert to the new structure/helpers
>>
>> - we need to cache copies to restore them during driver resumption,
>>   because depending on the low power mode the system has entered, the
>>   switch may have lost all of its context
> 
> Looks good to me, series applied and build testing right now.
> 
> I will say that the ethtool flow spec comparison should probably
> eventually be broken out into a helper function placed somewhere
> common.  It's very likely this approach, and thus the helper, can
> be used by other drivers in a similar situation.

Sure, that could be done, I will check with Pablo how he wants to
approach that as well since he is reworking how flow rules are
represented. Thanks!
-- 
Florian

Re: [PATCH v4 1/3] net: emac: implement 802.1Q VLAN TX tagging support

2018-11-06 Thread David Miller

From: Christian Lamparter 
Date: Tue,  6 Nov 2018 23:27:49 +0100

> @@ -1435,6 +1436,22 @@ static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
>   return NETDEV_TX_OK;
>  }
>  
> +static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
> +{
> + /* Handle VLAN TPID and TCI insert if this is a VLAN skb */
> + if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX) &&
> + skb_vlan_tag_present(skb)) {
> + struct emac_regs __iomem *p = dev->emacp;
> +
> + /* update the VLAN TCI */
> + out_be32(&p->vtci, (u32)skb_vlan_tag_get(skb));

Hmmm, how does this vtci register work?

How can you have a global piece of register state controlling the VLAN
tag that will be used for the TX frame?

What happens if you queue up several TX SKBs, each one with a different
VLAN tci?

Normally the TCI state is implemented on a per-tx-descriptor basis.
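David's concern can be illustrated with a small userspace model. Suppose the TCI lives in a single global register that is only sampled when the hardware actually sends a frame. Then queuing several descriptors before the hardware catches up makes them all go out with the last TCI written. Keeping the TCI in each descriptor avoids that. The model is hypothetical and says nothing about when the real EMAC samples its VTCI register, which is exactly the open question in the review.

```c
#include <assert.h>
#include <stdint.h>

#define NDESC 4

/* Model A: one global TCI register, sampled at transmit time. */
static uint16_t global_vtci;

/* Model B: TCI stored per descriptor at queue time. */
struct tx_desc {
	uint16_t tci;
};

static struct tx_desc ring[NDESC];

static void queue_frame(int slot, uint16_t tci)
{
	assert(slot >= 0 && slot < NDESC);
	global_vtci = tci;    /* model A: overwrite the shared register */
	ring[slot].tci = tci; /* model B: remember it with the frame */
}

/* The "hardware" sends the queued frames only after all were queued. */
static void transmit_all(uint16_t sent_global[], uint16_t sent_desc[], int n)
{
	for (int i = 0; i < n; i++) {
		sent_global[i] = global_vtci; /* whatever was written last */
		sent_desc[i] = ring[i].tci;   /* what this frame asked for */
	}
}
```

After queuing frames with TCIs 10, 20 and 30, model A sends all three with TCI 30, while model B sends 10, 20 and 30.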

Re: [PATCH net-next 0/5] net: dsa: bcm_sf2: Store rules in lists

2018-11-06 Thread David Miller

From: Florian Fainelli 
Date: Tue,  6 Nov 2018 12:58:36 -0800

> Hi all,
> 
> This patch series changes the bcm-sf2 driver to keep a copy of the
> inserted rules as opposed to using the HW as a storage area for a number
> of reasons:
> 
> - this helps us with doing duplicate rule detection in a faster way, it
>   would have required a full rule read before
> 
> - this helps with Pablo's on-going work to convert ethtool_rx_flow_spec
>   to a more generic flow rule structure by having fewer code paths to
>   convert to the new structure/helpers
> 
> - we need to cache copies to restore them during driver resumption,
>   because depending on the low power mode the system has entered, the
>   switch may have lost all of its context

Looks good to me, series applied and build testing right now.

I will say that the ethtool flow spec comparison should probably
eventually be broken out into a helper function placed somewhere
common.  It's very likely this approach, and thus the helper, can
be used by other drivers in a similar situation.

Thanks.
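As a rough illustration of the kind of helper David suggests, here is a sketch that compares a new rule against a cached software list before anything is programmed into hardware. The flow_rule structure is a heavily simplified, hypothetical stand-in (the real struct ethtool_rx_flow_spec has many more fields and masks). The point is that comparing masked match fields from a software copy avoids reading every rule back from the switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rule: match on IPv4 destination + port. */
struct flow_rule {
	uint32_t dst_ip, dst_ip_mask;
	uint16_t dst_port, dst_port_mask;
	int action; /* e.g. destination queue */
	struct flow_rule *next;
};

/* Two rules are duplicates if they match the same traffic: same masks
 * and same values under those masks. The action is ignored. */
static bool flow_rule_equal(const struct flow_rule *a,
			    const struct flow_rule *b)
{
	return a->dst_ip_mask == b->dst_ip_mask &&
	       a->dst_port_mask == b->dst_port_mask &&
	       (a->dst_ip & a->dst_ip_mask) == (b->dst_ip & b->dst_ip_mask) &&
	       (a->dst_port & a->dst_port_mask) ==
		       (b->dst_port & b->dst_port_mask);
}

/* Insert at the head of the cached list unless a duplicate exists.
 * Returns false, inserting nothing, for a duplicate. */
static bool rule_list_add(struct flow_rule **head, struct flow_rule *r)
{
	for (const struct flow_rule *it = *head; it; it = it->next)
		if (flow_rule_equal(it, r))
			return false;
	r->next = *head;
	*head = r;
	return true;
}
```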

Re: [PATCH net-next 0/3] net: More extack messages

2018-11-06 Thread David Miller

From: David Ahern 
Date: Tue,  6 Nov 2018 12:51:13 -0800

> From: David Ahern 
> 
> Add more extack messages for several link create errors (e.g., invalid
> number of queues, unknown link kind) and invalid metrics argument.

Series applied, thanks David.

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-06 Thread Edward Cree

On 06/11/18 21:56, Alexei Starovoitov wrote:
> that looks very weird to me. Why split func name from argument names?
> The 'function name' as seen by the BTF may not be the symbol name
> as seen in elf file.
The symbol name will be in the symbol table, which is not the same
 thing as the functions table in BTF that I'm proposing.  (They do
 look a little similar as I included an insn_idx for functions that
 partially duplicates the offset given in the symbol table.  But
 that's necessary precisely for the reason you mention, that the
 function name != the symbol name in general.)
"Splitting" func name from argument names is partly to potentially
 save space — if we'd had "int bar(int x)" instead then 'bar' could
 share its type record with 'foo'.  And partly just because the
 name of the function itself is no more part of its type than the
 name of an integer variable is part of the integer's type.
(Whereas names of parameters are like names of struct members:
 while they are not part of the 'pure type' from a language
 perspective, they are part of the type from the perspective of
 debugging, which is why they belong in the BTF type record.)

> There are C, bpftrace, p4 and python frontends. These languages
> should be free to put into BTF KIND_FUNC name that makes sense
> from the language point of view.
I'm paying attention to BTF because I'm adding support for it into
 my ebpf_asm.  Don't you think I *know* that frontends for BPF are
 more than just C?

>>  and in the 'variables' section we might have
>> 1 "quux" type=1 where=stack func=1 offset=-8
> that doesn't work. stack slots can be reused by compiler.
And who says that there can't be multiple records pointing to the
 same stack slot with different types & names?

> Instead we will annotate every load/store with btf type id.
That's certainly more useful; but I think most useful of all is to
 have *both* (though the stack slot types should be optional).

> The global variables for given .c file will look like single KIND_STRUCT
That's exactly the kind of superficially-clever but nasty hack
 that results from the continued insistence on conflating types
 and instances (objects).  In the long run it will make
 maintenance harder, and frustrate new features owing to the need
 to find new hacks to shoehorn them into the same model.
Instead there should be entries for the globals in something like
 the variables table I mentioned,
2 "fred" type=1 where=global func=0 offset=8
 in which 'func' is unused and 'offset' gives offset in .bss.
 'where' might also include indication of whether it's static.

Then for linkage you can extend this with index of which file it
 came from.

But maybe discussing global variables is a bit premature as eBPF
 doesn't have any such thing yet.

> yes we do see these things differently.
> To us function name is the debug info that fits well into BTF description.
> Whereas you see the function name part of function declaration
> as something 'entirely different'.
I'm not saying that the function name is 'entirely different'
 to the rest of the type.  (Though I do think it doesn't
 belong in the type, that's a weaker and contingent point.)
I'm saying that the *function* is entirely different to its
 *type*.  It's a category error to conflate them:
    f: x ↦ x + 1
 is a function.
    int → int
 is a type, and specifically the type of the object named "f".
(And the nature of mathematical notation for functions happens
 to put the name 'x' in the former, whereas we are putting the
 parameter name in the latter, but that's irrelevant.)
Similarly, "1" is an integer, but "integer" is a type, and is
 not itself an integer, while "1" is not a type.  They are at
 different meta-levels.

-Ed
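To make Edward's proposal concrete, here is a rough C rendering of the separation he argues for. Type records carry no function name. A separate functions table names each function and points at a possibly shared prototype type. A variables table records where each object lives. All struct and field names are hypothetical, loosely following the examples in the mail (type=1, where=stack, func=1, offset=-8), and this does not describe any merged BTF format.

```c
#include <assert.h>
#include <stdint.h>

enum var_where { WHERE_STACK, WHERE_GLOBAL };

/* Type record for a function prototype, e.g. "int (int x)". Parameter
 * names live here, like struct member names; the function's own name
 * does not. */
struct proto_type {
	uint32_t ret_type_id;
	uint32_t nr_params;
	const char *param_names[4];
};

/* Functions table: a named object pointing at its (shareable) type. */
struct func_rec {
	const char *name;
	uint32_t type_id;
	uint32_t insn_idx; /* first instruction of the function */
};

/* Variables table: an instance of a type at some location. */
struct var_rec {
	const char *name;
	uint32_t type_id;
	enum var_where where;
	uint32_t func;  /* owning function for stack slots; 0 for globals */
	int32_t offset; /* stack offset, or offset in .bss for globals */
};

/* Example data. Type 1 is assumed to be "int" and type 2 to be
 * "int (int x)"; instruction indices are arbitrary. */
static const struct proto_type int_of_int = { 1, 1, { "x" } };

static const struct func_rec funcs[] = {
	{ "foo", 2, 0 },  /* func record 1 */
	{ "bar", 2, 16 }, /* func record 2: shares foo's type record */
};

static const struct var_rec vars[] = {
	{ "quux", 1, WHERE_STACK, 1, -8 },
	{ "fred", 1, WHERE_GLOBAL, 0, 8 },
};
```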

Re: [PATCH net-next] ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()

2018-11-06 Thread David Miller

From: Eric Dumazet 
Date: Tue, 6 Nov 2018 14:51:15 -0800

> On Tue, Nov 6, 2018 at 2:41 PM David Miller  wrote:
>>
>> From: Eric Dumazet 
>> Date: Tue,  6 Nov 2018 14:25:52 -0800
>>
>> > + if (unlikely(nlen > sizeof(struct ipv6hdr))) {
>> > + if (memcmp(iph + 1, iph2 + 1,
>> > +nlen - sizeof(struct ipv6hdr)))
>> > + goto not_same_flow;
>> > + }
>>
>> Is this even possible?
> 
> I believe that nlen can indeed be > sizeof(struct ipv6hdr) in the presence
> of exthdrs, e.g. if ipv6_gso_pull_exthdrs() had to be called (line 201).
> 
> I admit I have not checked if this was actually possible.

Indeed, that does make it possible.

Patch applied, thanks!
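The subject of Eric's patch is avoiding memcmp() on a fixed-size header in the GRO fast path. One common way to do that is to compare the 40-byte IPv6 header as 32-bit words, OR the differences together, and mask out the fields that may differ. The sketch below illustrates that technique in userspace. It is not the code from the patch. It assumes the rule that every field except payload length and traffic class must match.

```c
#include <arpa/inet.h> /* htonl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN 40

static uint32_t load32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v)); /* alignment-safe; keeps wire byte order */
	return v;
}

/* True if two IPv6 headers belong to the same flow for GRO purposes:
 * every field must match except traffic class and payload length. */
static bool ipv6_same_flow(const uint8_t *a, const uint8_t *b)
{
	/* Word 0: version(4) | traffic class(8) | flow label(20). */
	uint32_t diff = (load32(a) ^ load32(b)) & htonl(0xF00FFFFF);

	/* Word 1: payload length(16) | next header(8) | hop limit(8). */
	diff |= (load32(a + 4) ^ load32(b + 4)) & htonl(0x0000FFFF);

	/* Words 2..9: source and destination addresses. */
	for (int off = 8; off < IPV6_HDR_LEN; off += 4)
		diff |= load32(a + off) ^ load32(b + off);

	return diff == 0;
}
```

Because the masks go through htonl(), the comparison works on the header as laid out on the wire, regardless of host endianness.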

Re: [PATCH net-next] ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()

2018-11-06 Thread Eric Dumazet

On Tue, Nov 6, 2018 at 2:41 PM David Miller  wrote:
>
> From: Eric Dumazet 
> Date: Tue,  6 Nov 2018 14:25:52 -0800
>
> > + if (unlikely(nlen > sizeof(struct ipv6hdr))) {
> > + if (memcmp(iph + 1, iph2 + 1,
> > +nlen - sizeof(struct ipv6hdr)))
> > + goto not_same_flow;
> > + }
>
> Is this even possible?

I believe that nlen can indeed be > sizeof(struct ipv6hdr) in the presence
of exthdrs, e.g. if ipv6_gso_pull_exthdrs() had to be called (line 201).

I admit I have not checked if this was actually possible.


>
> off = skb_gro_offset(skb);
> hlen = off + sizeof(*iph);
> iph = skb_gro_header_fast(skb, off);
>
> off is some offset to the ipv6hdr in skb.  This is GRO's CB data_offset.
>
> skb_set_network_header(skb, off);
> skb_gro_pull(skb, sizeof(*iph));
> skb_set_transport_header(skb, skb_gro_offset(skb));
>
> Set network header to location of iph in SKB.
>
> GRO pull causes an increment of data_offset by sizeof(*iph) bytes.
>
> Set transport header to new data_offset value.
>
> nlen = skb_network_header_len(skb);
>
> This is transport_header - network_header.
>
> From what I can see, it is impossible for this to take on any value
> other than sizeof(*ipv6hdr).
>
> If you agree, please let's get rid of nlen and this useless code, and
> replace with sizeof(*ipv6hdr) as needed.
>
> Thanks.

Re: [PATCH net-next 1/3] devlink: Add fw_version_check generic parameter

2018-11-06 Thread Jakub Kicinski

On Tue, 6 Nov 2018 22:37:51 +0200, Ido Schimmel wrote:
> On Tue, Nov 06, 2018 at 12:19:13PM -0800, Jakub Kicinski wrote:
> > On Tue, 6 Nov 2018 20:05:00 +, Ido Schimmel wrote:  
> > > From: Shalom Toledo 
> > > 
> > > Many drivers check the device's firmware version during the
> > > initialization flow and flash a compatible version if the current
> > > version is not compatible.
> > > 
> > > fw_version_check gives the ability to skip this check, which allows
> > > running the device with a different firmware version than the one
> > > required by the driver, for testing and/or debugging purposes.
> > > 
> > > Signed-off-by: Shalom Toledo 
> > > Reviewed-by: Jiri Pirko 
> > > Signed-off-by: Ido Schimmel   
> > 
> > The documentation is missing, so it's hard to comment on the definition
> > of the parameter...
> 
> I assume you mean Documentation/networking/devlink-params.txt ?

Yes

> > We have a FW loading policy for NFP, too, so it'd be good to see if we
> > can find a common ground.  
> 
> If the parameter is set, then device runs with whatever firmware version
> was last flashed (via ethtool, for example). Otherwise, the driver will
> flash a version according to its policy. In mlxsw, it is a specific
> version.
> 
> Will that work for you?

Our FW is always backward compatible, so there is no need to downgrade.

What we have is more along these lines: there are two images, one
on disk and a second one in the flash.  The FW loading policy can decide
which of those should be preferred, or whether the versions should be
compared and the newer one win (default).  But we don't flash the newer
FW today; we just potentially load it from disk.

I'm not sure whether 'fw_version_check' describes the general behaviour
of not updating the FW in flash.  The policy of updating the FW in the
flash if the one on disk is newer seems to be something we could adopt
as well.  Can we come up with a more general parameter which could
select a FW loading policy that would work for both cases?

Would values like these make any sense to you?
 - driver preferred (your default behaviour, we don't support since
   driver doesn't care);
 - newest (our default, device compares images and picks newer);
 - always disk (always run with what's on the disk, regardless of
   versions);
 - always flash (always run with what's already in flash, don't look at
   disk);

Separate bool parameter 'fw_flash_auto_update' would decide whether the
selected FW should be flashed to the device (always true for you AFAIU).

Let me know if that makes sense, it would be nice if we could converge
on a common solution, or at least name our parameters sufficiently
distinctly to avoid confusion :)
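Jakub's proposed values can be made concrete with a small decision function. Everything here is a hypothetical illustration of the proposal in this thread, not an existing devlink parameter. That includes the enum names, the integer version comparison, the flash_auto_update flag, and the assumption that "driver preferred" means "run the version shipped on disk".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for a generic "FW loading policy" parameter. */
enum fw_load_policy {
	FW_LOAD_DRIVER_PREFERRED, /* driver decides (mlxsw-style pinning) */
	FW_LOAD_NEWEST,           /* compare disk vs flash, newer wins */
	FW_LOAD_ALWAYS_DISK,      /* run what is on disk, regardless */
	FW_LOAD_ALWAYS_FLASH,     /* run what is in flash, ignore disk */
};

enum fw_source { FW_FROM_FLASH, FW_FROM_DISK };

struct fw_decision {
	enum fw_source run;
	bool write_flash; /* also flash the selected image to the device */
};

/* Versions are plain integers here; the disk image is assumed to be
 * the version the driver requires. */
static struct fw_decision fw_decide(enum fw_load_policy policy,
				    unsigned int disk_ver,
				    unsigned int flash_ver,
				    bool flash_auto_update)
{
	struct fw_decision d = { FW_FROM_FLASH, false };

	switch (policy) {
	case FW_LOAD_DRIVER_PREFERRED:
		if (flash_ver != disk_ver)
			d.run = FW_FROM_DISK; /* may be a downgrade */
		break;
	case FW_LOAD_NEWEST:
		if (disk_ver > flash_ver)
			d.run = FW_FROM_DISK;
		break;
	case FW_LOAD_ALWAYS_DISK:
		d.run = FW_FROM_DISK;
		break;
	case FW_LOAD_ALWAYS_FLASH:
		break;
	}
	/* Only write flash if the disk image was chosen, differs from
	 * what is already there, and auto-update is allowed. */
	d.write_flash = flash_auto_update && d.run == FW_FROM_DISK &&
			disk_ver != flash_ver;
	return d;
}
```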

Re: [PATCH net-next] ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()

2018-11-06 Thread David Miller

From: Eric Dumazet 
Date: Tue,  6 Nov 2018 14:25:52 -0800

> + if (unlikely(nlen > sizeof(struct ipv6hdr))) {
> + if (memcmp(iph + 1, iph2 + 1,
> +nlen - sizeof(struct ipv6hdr)))
> + goto not_same_flow;
> + }

Is this even possible?

off = skb_gro_offset(skb);
hlen = off + sizeof(*iph);
iph = skb_gro_header_fast(skb, off);

off is some offset to the ipv6hdr in skb.  This is GRO's CB data_offset.

skb_set_network_header(skb, off);
skb_gro_pull(skb, sizeof(*iph));
skb_set_transport_header(skb, skb_gro_offset(skb));

Set network header to location of iph in SKB.

GRO pull causes an increment of data_offset by sizeof(*iph) bytes.

Set transport header to new data_offset value.

nlen = skb_network_header_len(skb);

This is transport_header - network_header.

From what I can see, it is impossible for this to take on any value
other than sizeof(*ipv6hdr).

If you agree, please let's get rid of nlen and this useless code, and
replace with sizeof(*ipv6hdr) as needed.

Thanks.
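David's argument is plain offset arithmetic. Eric's counterpoint is that the extension-header path can move the transport header further along. The model below covers both cases. It is a simplification of the real receive path, the struct and helper names are hypothetical, and only the 40-byte fixed IPv6 header size comes from the protocol.

```c
#include <assert.h>
#include <stddef.h>

#define IPV6_HDR_LEN 40 /* sizeof(struct ipv6hdr) */

/* Minimal model of the offsets discussed above. */
struct gro_offsets {
	size_t data_offset; /* GRO's current parse position */
	size_t network_header;
	size_t transport_header;
};

static size_t ipv6_nlen(size_t off, size_t exthdr_len)
{
	struct gro_offsets o = { off, 0, 0 };

	o.network_header = o.data_offset;   /* skb_set_network_header(skb, off) */
	o.data_offset += IPV6_HDR_LEN;      /* skb_gro_pull(skb, sizeof(*iph)) */
	o.transport_header = o.data_offset; /* skb_set_transport_header(...) */

	if (exthdr_len) {
		/* Extension-header path: the headers are pulled and the
		 * transport header is reset past them. */
		o.data_offset += exthdr_len;
		o.transport_header = o.data_offset;
	}

	/* skb_network_header_len(): transport_header - network_header */
	return o.transport_header - o.network_header;
}
```

Without extension headers nlen is always 40, as David observes. With 8 bytes of extension headers pulled it becomes 48.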

[PATCH v4 1/3] net: emac: implement 802.1Q VLAN TX tagging support

2018-11-06 Thread Christian Lamparter

As per the APM82181 Embedded Processor User Manual (26.1 EMAC Features):
VLAN:
 - Support for VLAN tag ID in compliance with IEEE 802.3ac.
 - VLAN tag insertion or replacement for transmit packets

This patch completes the missing code for the VLAN TX tagging
support, as EMAC_MR1_VLE was already enabled.

Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/core.c | 32 
 drivers/net/ethernet/ibm/emac/core.h |  6 +-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 760b2ad8e295..be560f9031f4 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -674,7 +675,7 @@ static int emac_configure(struct emac_instance *dev)
 ndev->dev_addr[5]);
 
/* VLAN Tag Protocol ID */
-   out_be32(&p->vtpid, 0x8100);
+   out_be32(&p->vtpid, ETH_P_8021Q);
 
/* Receive mode register */
r = emac_iff2rmr(ndev);
@@ -1435,6 +1436,22 @@ static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
return NETDEV_TX_OK;
 }
 
+static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
+{
+   /* Handle VLAN TPID and TCI insert if this is a VLAN skb */
+   if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX) &&
+   skb_vlan_tag_present(skb)) {
+   struct emac_regs __iomem *p = dev->emacp;
+
+   /* update the VLAN TCI */
+   out_be32(&p->vtci, (u32)skb_vlan_tag_get(skb));
+
+   /* Insert VLAN tag */
+   return EMAC_TX_CTRL_IVT;
+   }
+   return 0;
+}
+
 /* Tx lock BH */
 static netdev_tx_t emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 {
@@ -1443,7 +1460,7 @@ static netdev_tx_t emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
int slot;
 
u16 ctrl = EMAC_TX_CTRL_GFCS | EMAC_TX_CTRL_GP | MAL_TX_CTRL_READY |
-   MAL_TX_CTRL_LAST | emac_tx_csum(dev, skb);
+   MAL_TX_CTRL_LAST | emac_tx_csum(dev, skb) | emac_tx_vlan(dev, skb);
 
slot = dev->tx_slot++;
if (dev->tx_slot == NUM_TX_BUFF) {
@@ -1518,7 +1535,7 @@ emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
goto stop_queue;
 
ctrl = EMAC_TX_CTRL_GFCS | EMAC_TX_CTRL_GP | MAL_TX_CTRL_READY |
-   emac_tx_csum(dev, skb);
+   emac_tx_csum(dev, skb) | emac_tx_vlan(dev, skb);
slot = dev->tx_slot;
 
/* skb data */
@@ -2891,7 +2908,8 @@ static int emac_init_config(struct emac_instance *dev)
if (of_device_is_compatible(np, "ibm,emac-apm821xx")) {
dev->features |= (EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE |
  EMAC_FTR_APM821XX_NO_HALF_DUPLEX |
- EMAC_FTR_460EX_PHY_CLK_FIX);
+ EMAC_FTR_460EX_PHY_CLK_FIX |
+ EMAC_FTR_HAS_VLAN_CTAG_TX);
}
} else if (of_device_is_compatible(np, "ibm,emac4")) {
dev->features |= EMAC_FTR_EMAC4;
@@ -3148,6 +3166,12 @@ static int emac_probe(struct platform_device *ofdev)
 
if (dev->tah_dev) {
ndev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG;
+
+   if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX)) {
+   ndev->vlan_features |= ndev->hw_features;
+   ndev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;
+   }
+
ndev->features |= ndev->hw_features | NETIF_F_RXCSUM;
}
ndev->watchdog_timeo = 5 * HZ;
diff --git a/drivers/net/ethernet/ibm/emac/core.h b/drivers/net/ethernet/ibm/emac/core.h
index 84caa4a3fc52..8d84d439168c 100644
--- a/drivers/net/ethernet/ibm/emac/core.h
+++ b/drivers/net/ethernet/ibm/emac/core.h
@@ -334,6 +334,8 @@ struct emac_instance {
  * APM821xx does not support Half Duplex mode
  */
 #define EMAC_FTR_APM821XX_NO_HALF_DUPLEX   0x1000
+/* EMAC can insert 802.1Q tag */
+#define EMAC_FTR_HAS_VLAN_CTAG_TX  0x2000
 
 /* Right now, we don't quite handle the always/possible masks on the
  * most optimal way as we don't have a way to say something like
@@ -363,7 +365,9 @@ enum {
EMAC_FTR_460EX_PHY_CLK_FIX |
EMAC_FTR_440EP_PHY_CLK_FIX |
EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE |
-   EMAC_FTR_APM821XX_NO_HALF_DUPLEX,
+   EMAC_FTR_APM821XX_NO_HALF_DUPLEX |
+   EMAC_FTR_HAS_VLAN_CTAG_TX |
+   0,
 };
 
 static inline int emac_has_feature(struct emac_instance *dev,
-- 
2.19.1
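The EMAC inserts the tag in hardware once EMAC_TX_CTRL_IVT is set in the descriptor. The sketch below shows the on-wire result in software: the TPID that the driver programs into vtpid (ETH_P_8021Q, 0x8100), followed by the 16-bit TCI it writes to vtci, placed right after the destination and source MAC addresses. `vlan_tci()` and `vlan_insert_tag()` are hypothetical helpers, not driver code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define VLAN_HLEN	4
#define ETH_P_8021Q	0x8100	/* TPID, the value programmed into vtpid */

/* TCI = PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits) */
static uint16_t vlan_tci(unsigned int pcp, unsigned int dei, unsigned int vid)
{
	return (uint16_t)(((pcp & 0x7u) << 13) | ((dei & 0x1u) << 12) |
			  (vid & 0xfffu));
}

/* Insert TPID + TCI after the destination and source MAC addresses.
 * Requires len >= 2 * ETH_ALEN and room for len + VLAN_HLEN bytes.
 * Returns the new frame length. */
static size_t vlan_insert_tag(uint8_t *frame, size_t len, uint16_t tci)
{
	const size_t hdr = 2 * ETH_ALEN;

	/* shift the EtherType and payload back by four bytes */
	memmove(frame + hdr + VLAN_HLEN, frame + hdr, len - hdr);
	frame[hdr + 0] = (uint8_t)(ETH_P_8021Q >> 8);	/* network byte order */
	frame[hdr + 1] = (uint8_t)(ETH_P_8021Q & 0xff);
	frame[hdr + 2] = (uint8_t)(tci >> 8);
	frame[hdr + 3] = (uint8_t)(tci & 0xff);
	return len + VLAN_HLEN;
}
```

In the driver, emac_tx_vlan() writes skb_vlan_tag_get(skb) to vtci and returns EMAC_TX_CTRL_IVT, so the hardware does the shifting shown here.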

[PATCH v4 2/3] net: emac: implement TCP segmentation offload (TSO)

2018-11-06 Thread Christian Lamparter

This patch enables the TSO(v4) hardware feature for the emac driver,
as at least the APM82181's TCP/IP acceleration hardware
controller (TAH) provides TCP segmentation support in
the transmit path.
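The segment-size selection can be tried out on its own. This is a minimal sketch built only on the tah_ss[] table from the patch. `find_ss_start()` mirrors emac_find_tso_ss_for_mtu(), and `pick_ssr()` mirrors the loop in emac_tx_tso(). Both names are hypothetical, and the returned index stands in for what the driver passes to EMAC_TX_CTRL_TAH_SSR(). TAH_NO_SSR is set to 6 here to match the six entries. Its real definition is in tah.h, which is not shown.

```c
/* Table copied from the patch; TAH_NO_SSR = 6 is assumed from its length. */
#define TAH_NO_SSR 6
static const unsigned int tah_ss[TAH_NO_SSR] = { 1500, 1344, 1152, 960, 768, 416 };

/* Like emac_find_tso_ss_for_mtu(): index of the first entry <= mtu,
 * or TAH_NO_SSR if there is none. */
static int find_ss_start(unsigned int mtu)
{
	int i;

	for (i = 0; i < TAH_NO_SSR; i++)
		if (tah_ss[i] <= mtu)
			break;
	return i;
}

/* Like the loop in emac_tx_tso(): seg_size is gso_size plus the TCP and
 * IP header lengths; returns the SSR index to use, or -1 to signal the
 * software fallback. */
static int pick_ssr(int start, unsigned int gso_size, unsigned int tcp_hlen,
		    unsigned int ip_hlen)
{
	unsigned int seg_size = gso_size + tcp_hlen + ip_hlen;
	int i;

	for (i = start; i < TAH_NO_SSR; i++)
		if (tah_ss[i] <= seg_size)
			return i;
	return -1;
}
```

Take a 1500-byte MTU and a 1448-byte MSS with a 32-byte TCP header and a 20-byte IPv4 header. seg_size is then 1500, so index 0 is chosen. With an MTU below 416, find_ss_start() returns TAH_NO_SSR and every TSO skb takes the software path.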

Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/core.c | 112 ++-
 drivers/net/ethernet/ibm/emac/core.h |   7 ++
 drivers/net/ethernet/ibm/emac/emac.h |   7 ++
 drivers/net/ethernet/ibm/emac/tah.c  |  22 +-
 drivers/net/ethernet/ibm/emac/tah.h  |   2 +
 5 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index be560f9031f4..80aafd7552aa 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -38,6 +38,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1118,6 +1121,32 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
return ret;
 }
 
+/* Restriction applied to the segmentation size
+ * when using the HW segmentation offload feature. The size
+ * of the segment must not be less than 168 bytes for
+ * DIX formatted segments, or 176 bytes for
+ * IEEE formatted segments. However, based on actual
+ * tests, any MTU less than 416 causes excessive retries
+ * due to TX FIFO underruns.
+ */
+const u32 tah_ss[TAH_NO_SSR] = { 1500, 1344, 1152, 960, 768, 416 };
+
+/* look-up matching segment size for the given mtu */
+static void emac_find_tso_ss_for_mtu(struct emac_instance *dev)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(tah_ss); i++) {
+   if (tah_ss[i] <= dev->ndev->mtu)
+   break;
+   }
+   /* if no matching segment size is found, set the tso_ss_mtu_start
+* variable anyway. This will cause the emac_tx_tso to skip straight
+* to the software fallback.
+*/
+   dev->tso_ss_mtu_start = i;
+}
+
 /* Process ctx, rtnl_lock semaphore */
 static int emac_change_mtu(struct net_device *ndev, int new_mtu)
 {
@@ -1134,6 +1163,7 @@ static int emac_change_mtu(struct net_device *ndev, int new_mtu)
 
if (!ret) {
ndev->mtu = new_mtu;
+   emac_find_tso_ss_for_mtu(dev);
dev->rx_skb_size = emac_rx_skb_size(new_mtu);
dev->rx_sync_size = emac_rx_sync_size(new_mtu);
}
@@ -1410,6 +1440,33 @@ static inline u16 emac_tx_csum(struct emac_instance *dev,
return 0;
 }
 
+static int emac_tx_tso(struct emac_instance *dev, struct sk_buff *skb,
+  u16 *ctrl)
+{
+   if (emac_has_feature(dev, EMAC_FTR_TAH_HAS_TSO) && skb_is_gso(skb) &&
+   !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+   u32 seg_size = 0, i;
+
+   /* Get the MTU */
+   seg_size = skb_shinfo(skb)->gso_size + tcp_hdrlen(skb) +
+  skb_network_header_len(skb);
+
+   for (i = dev->tso_ss_mtu_start; i < ARRAY_SIZE(tah_ss); i++) {
+   if (tah_ss[i] > seg_size)
+   continue;
+
+   *ctrl |= EMAC_TX_CTRL_TAH_SSR(i);
+   return 0;
+   }
+
+   /* none found fall back to software */
+   return -EINVAL;
+   }
+
+   *ctrl |= emac_tx_csum(dev, skb);
+   return 0;
+}
+
 static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
 {
struct emac_regs __iomem *p = dev->emacp;
@@ -1452,6 +1509,46 @@ static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
return 0;
 }
 
+static netdev_tx_t
+emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev);
+
+static int
+emac_sw_tso(struct sk_buff *skb, struct net_device *ndev)
+{
+   struct emac_instance *dev = netdev_priv(ndev);
+   struct sk_buff *segs, *curr;
+   unsigned int i, frag_slots;
+
+   /* make sure to not overflow the tx ring */
+   frag_slots = dev->tx_cnt;
+   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+   struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
+
+   frag_slots += mal_tx_chunks(skb_frag_size(frag));
+
+   if (frag_slots >= NUM_TX_BUFF)
+   return -ENOSPC;
+   }
+
+   segs = skb_gso_segment(skb, ndev->features &
+   ~(NETIF_F_TSO | NETIF_F_TSO6));
+   if (IS_ERR_OR_NULL(segs)) {
+   ++dev->estats.tx_dropped;
+   dev_kfree_skb_any(skb);
+   } else {
+   while (segs) {
+   curr = segs;
+   segs = curr->next;
+   curr->next = NULL;
+
+   emac_start_xmit_sg(curr, ndev);
+   }
+   dev_consume_skb_any(skb);
+   }
+
+   return 0;
+}
+
 /* Tx lock BH */
 static netdev_tx_t emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
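The ring-space check at the top of emac_sw_tso() reduces to a running sum of descriptor counts. This is a minimal sketch with hypothetical names. NUM_TX_BUFF = 64 and MAL_MAX_TX_SIZE = 4080 are assumed values for illustration. `tx_chunks()` assumes that mal_tx_chunks() is a round-up division by MAL_MAX_TX_SIZE.

```c
#include <stddef.h>

/* Assumed values for illustration; the driver takes NUM_TX_BUFF from its
 * configuration and MAL_MAX_TX_SIZE from mal.h. */
#define NUM_TX_BUFF	64u
#define MAL_MAX_TX_SIZE	4080u

/* Assumed behaviour of mal_tx_chunks(): descriptors needed for len bytes. */
static unsigned int tx_chunks(unsigned int len)
{
	return (len + MAL_MAX_TX_SIZE - 1) / MAL_MAX_TX_SIZE;
}

/* The check at the top of emac_sw_tso(): start from the descriptors
 * already in flight, add each fragment's share, and refuse (the driver
 * returns -ENOSPC) as soon as the ring would be full. */
static int sw_tso_fits(unsigned int tx_cnt, const unsigned int *frag_len,
		       size_t nr_frags)
{
	unsigned int slots = tx_cnt;
	size_t i;

	for (i = 0; i < nr_frags; i++) {
		slots += tx_chunks(frag_len[i]);
		if (slots >= NUM_TX_BUFF)
			return 0;
	}
	return 1;
}
```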

[PATCH v4 3/3] net: emac: remove IBM_EMAC_RX_SKB_HEADROOM

2018-11-06 Thread Christian Lamparter

The EMAC driver had a custom IBM_EMAC_RX_SKB_HEADROOM
Kconfig option that reserved additional skb headroom for RX.
This patch removes the option and migrates the code
to use napi_alloc_skb() and netdev_alloc_skb_ip_align()
in its place.
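netdev_alloc_skb_ip_align() reserves NET_IP_ALIGN bytes of headroom so that the IP header, which follows the 14-byte Ethernet header, starts on a 4-byte boundary. This is also why the RX path in the patch maps the buffer at skb->data - NET_IP_ALIGN and gives the device that address plus NET_IP_ALIGN. The arithmetic below is a minimal sketch. It assumes NET_IP_ALIGN = 2, the generic default (some architectures define it differently), an untagged frame, and a 4-byte-aligned buffer start. The helper names are hypothetical.

```c
#define ETH_HLEN	14u	/* Ethernet header, no VLAN tag */
#define NET_IP_ALIGN	2u	/* generic kernel default; arches may override */

/* Offset of the IP header from the (assumed 4-byte-aligned) start of an
 * RX buffer when the frame is written `headroom` bytes into it. */
static unsigned int ip_header_offset(unsigned int headroom)
{
	return headroom + ETH_HLEN;
}

/* Nonzero when the IP header, and so its 32-bit fields, starts on a
 * 4-byte boundary. */
static int ip_header_aligned(unsigned int headroom)
{
	return ip_header_offset(headroom) % 4u == 0;
}
```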

Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/Kconfig | 12 --
 drivers/net/ethernet/ibm/emac/core.c  | 57 +++
 drivers/net/ethernet/ibm/emac/core.h  | 10 ++---
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/Kconfig b/drivers/net/ethernet/ibm/emac/Kconfig
index 90d49191beb3..eacf7e141fdc 100644
--- a/drivers/net/ethernet/ibm/emac/Kconfig
+++ b/drivers/net/ethernet/ibm/emac/Kconfig
@@ -28,18 +28,6 @@ config IBM_EMAC_RX_COPY_THRESHOLD
depends on IBM_EMAC
default "256"
 
-config IBM_EMAC_RX_SKB_HEADROOM
-   int "Additional RX skb headroom (bytes)"
-   depends on IBM_EMAC
-   default "0"
-   help
- Additional receive skb headroom. Note, that driver
- will always reserve at least 2 bytes to make IP header
- aligned, so usually there is no need to add any additional
- headroom.
-
- If unsure, set to 0.
-
 config IBM_EMAC_DEBUG
bool "Debugging"
depends on IBM_EMAC
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 80aafd7552aa..266b6614125b 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -1075,7 +1075,9 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 
/* Second pass, allocate new skbs */
for (i = 0; i < NUM_RX_BUFF; ++i) {
-   struct sk_buff *skb = alloc_skb(rx_skb_size, GFP_ATOMIC);
+   struct sk_buff *skb;
+
+   skb = netdev_alloc_skb_ip_align(dev->ndev, rx_skb_size);
if (!skb) {
ret = -ENOMEM;
goto oom;
@@ -1084,7 +1086,6 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
BUG_ON(!dev->rx_skb[i]);
dev_kfree_skb(dev->rx_skb[i]);
 
-   skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
dev->rx_desc[i].data_ptr =
dma_map_single(&dev->ofdev->dev, skb->data - 2, rx_sync_size,
   DMA_FROM_DEVICE) + 2;
@@ -1205,20 +1206,18 @@ static void emac_clean_rx_ring(struct emac_instance *dev)
}
 }
 
-static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
-   gfp_t flags)
+static inline int
+__emac_prepare_rx_skb(struct sk_buff *skb, struct emac_instance *dev, int slot)
 {
-   struct sk_buff *skb = alloc_skb(dev->rx_skb_size, flags);
if (unlikely(!skb))
return -ENOMEM;
 
dev->rx_skb[slot] = skb;
dev->rx_desc[slot].data_len = 0;
 
-   skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
dev->rx_desc[slot].data_ptr =
-   dma_map_single(&dev->ofdev->dev, skb->data - 2, dev->rx_sync_size,
-  DMA_FROM_DEVICE) + 2;
+   dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+  dev->rx_sync_size, DMA_FROM_DEVICE) + NET_IP_ALIGN;
wmb();
dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
(slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
@@ -1226,6 +1225,27 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
return 0;
 }
 
+static inline int
+emac_alloc_rx_skb(struct emac_instance *dev, int slot)
+{
+   struct sk_buff *skb;
+
+   skb = __netdev_alloc_skb_ip_align(dev->ndev, dev->rx_skb_size,
+ GFP_KERNEL);
+
+   return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
+static inline int
+emac_alloc_rx_skb_napi(struct emac_instance *dev, int slot)
+{
+   struct sk_buff *skb;
+
+   skb = napi_alloc_skb(&dev->mal->napi, dev->rx_skb_size);
+
+   return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
 static void emac_print_link_status(struct emac_instance *dev)
 {
if (netif_carrier_ok(dev->ndev))
@@ -1256,7 +1276,7 @@ static int emac_open(struct net_device *ndev)
 
/* Allocate RX ring */
for (i = 0; i < NUM_RX_BUFF; ++i)
-   if (emac_alloc_rx_skb(dev, i, GFP_KERNEL)) {
+   if (emac_alloc_rx_skb(dev, i)) {
printk(KERN_ERR "%s: failed to allocate RX ring\n",
   ndev->name);
goto oom;
@@ -1779,8 +1799,9 @@ static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
DBG2(dev, "recycle %d %d" NL, slot, len);
 
if (len)
-   dma_map_single(&dev->ofdev->dev, skb->data - 2,
-  EMAC_DMA_ALIGN(len + 2), DMA_FROM_DEVICE);
+   dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+

[PATCH net-next] ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()

2018-11-06 Thread Eric Dumazet

ipv6_gro_receive() compares 34 bytes using slow memcmp(),
while hand-coding the comparison with a couple of ipv6_addr_equal() calls is much faster.

Before this patch, "perf top -e cycles:pp -C " would
see memcmp() using ~10% of cpu cycles on a 40Gbit NIC
receiving IPv6 TCP traffic.
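The 34 bytes are nexthdr (1), hop_limit (1), saddr (16) and daddr (16). That is everything from offsetof(struct ipv6hdr, nexthdr) to the end of the 40-byte header. The sketch below is a minimal user-space version of the replacement comparison. It assumes a hypothetical `struct ip6h` that mirrors the header layout. `addr6_equal()` imitates the word-wise approach of ipv6_addr_equal() with two 64-bit XORs. As in the patch, the adjacent nexthdr/hop_limit pair is compared as one 16-bit value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct addr6 { uint8_t b[16]; };

/* Hypothetical mirror of the fixed 40-byte IPv6 header (struct ipv6hdr). */
struct ip6h {
	uint32_t first_word;	/* version | traffic class | flow label */
	uint16_t payload_len;
	uint8_t  nexthdr;
	uint8_t  hop_limit;
	struct addr6 saddr;
	struct addr6 daddr;
};

/* Two 64-bit XORs OR-ed together instead of a byte-wise memcmp();
 * memcpy() keeps the loads valid regardless of alignment. */
static int addr6_equal(const struct addr6 *a, const struct addr6 *b)
{
	uint64_t a0, a1, b0, b1;

	memcpy(&a0, a->b, 8);
	memcpy(&a1, a->b + 8, 8);
	memcpy(&b0, b->b, 8);
	memcpy(&b1, b->b + 8, 8);
	return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}

/* Compares the same 34 bytes as the old memcmp(), field by field. */
static int same_flow_tail(const struct ip6h *a, const struct ip6h *b)
{
	uint16_t na, nb;

	/* nexthdr and hop_limit are adjacent bytes: compare them as one u16 */
	memcpy(&na, (const unsigned char *)a + offsetof(struct ip6h, nexthdr), 2);
	memcpy(&nb, (const unsigned char *)b + offsetof(struct ip6h, nexthdr), 2);
	return na == nb &&
	       addr6_equal(&a->saddr, &b->saddr) &&
	       addr6_equal(&a->daddr, &b->daddr);
}
```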

Signed-off-by: Eric Dumazet 
---
 net/ipv6/ip6_offload.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c7e495f1201105f1ac1724a7b8fd82399efcce32..70f525c33cb6c1f375919b94a7afc45cc6bdcd5f 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -229,14 +229,21 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 * XXX skbs on the gro_list have all been parsed and pulled
 * already so we don't need to compare nlen
* (nlen != (sizeof(*iph2) + ipv6_exthdrs_len(iph2, &ops)))
-* memcmp() alone below is suffcient, right?
+* memcmp() alone below is sufficient, right?
 */
 if ((first_word & htonl(0xF00FFFFF)) ||
-   memcmp(&iph->nexthdr, &iph2->nexthdr,
-  nlen - offsetof(struct ipv6hdr, nexthdr))) {
+   !ipv6_addr_equal(&iph->saddr, &iph2->saddr) ||
+   !ipv6_addr_equal(&iph->daddr, &iph2->daddr) ||
+   *(u16 *)&iph->nexthdr != *(u16 *)&iph2->nexthdr) {
+not_same_flow:
NAPI_GRO_CB(p)->same_flow = 0;
continue;
}
+   if (unlikely(nlen > sizeof(struct ipv6hdr))) {
+   if (memcmp(iph + 1, iph2 + 1,
+  nlen - sizeof(struct ipv6hdr)))
+   goto not_same_flow;
+   }
/* flush if Traffic Class fields are different */
NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
NAPI_GRO_CB(p)->flush |= flush;
-- 
2.19.1.930.g4563a0d9d0-goog
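The two masks applied to first_word follow from the layout of the first 32-bit word of the IPv6 header (RFC 8200): 4 bits of version, 8 bits of traffic class and 20 bits of flow label. first_word is the XOR of the two headers' first words, so any set bit marks a field that differs. This is a minimal sketch in host byte order; the kernel applies htonl() to the masks instead. The helper names are hypothetical.

```c
#include <stdint.h>

/* version (4) | traffic class (8) | flow label (20), host byte order */
#define V6_VERSION_FLOWLABEL_MASK	0xF00FFFFFu
#define V6_TCLASS_MASK			0x0FF00000u

static uint32_t v6_first_word(unsigned int version, unsigned int tclass,
			      unsigned int flowlabel)
{
	return ((uint32_t)(version & 0xfu) << 28) |
	       ((uint32_t)(tclass & 0xffu) << 20) |
	       (uint32_t)(flowlabel & 0xfffffu);
}

/* Version or flow label differ: the packets are not the same flow. */
static int v6_not_same_flow(uint32_t a, uint32_t b)
{
	return ((a ^ b) & V6_VERSION_FLOWLABEL_MASK) != 0;
}

/* Only traffic class differs: same flow, but GRO must flush. */
static int v6_needs_flush(uint32_t a, uint32_t b)
{
	return ((a ^ b) & V6_TCLASS_MASK) != 0;
}
```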

Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Jason Gunthorpe

On Tue, Nov 06, 2018 at 05:10:53PM -0500, Doug Ledford wrote:
> On Tue, 2018-11-06 at 22:02 +, Jason Gunthorpe wrote:
> > On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> > > On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > > > From: Yishai Hadas 
> > > > 
> > > > Adapt XRC SRQ to the latest HW specification with fixed definition
> > > > around umem valid bits. The previous definition relied on a bit which
> > > > was taken for other purposes in legacy FW.
> > > > 
> > > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > > > Signed-off-by: Yishai Hadas 
> > > > Reviewed-by: Artemy Kovalyov 
> > > > Signed-off-by: Leon Romanovsky 
> > > > Hi Doug, Jason
> > > > 
> > > > This commit fixes code sent in this merge window, so I'm not marking it
> > > > with any rdma-rc/rdma-next. It will be better to be sent during this merge
> > > > window if you have extra pull request to issue, or as a -rc material, if
> > > > not.
> > > > 
> > > > BTW, we didn't combine reserved fields, because our convention is to align such
> > > > fields to 32 bits for better readability.
> > > > 
> > > > Thanks
> > > 
> > > This looks fine.  Let me know when it's in the mlx5-next tree to pull.
> > 
> > It needs to go to -rc... 
> > 
> > This needs a mlx5-rc branch for this I guess?
> 
> I don't think so.  As long as it's the first commit in mlx5-next, and
> mlx5-next is 4.20-rc1 based, then pulling this commit into the -rc tree
> will only pull the single commit.  Then when we pull into for-next for
> the first time, we will get this in for-next too.  That seems best to
> me.

That works too, if Leon is fast :)

Jason

Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Doug Ledford

On Tue, 2018-11-06 at 22:02 +, Jason Gunthorpe wrote:
> On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> > On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > > From: Yishai Hadas 
> > > 
> > > Adapt XRC SRQ to the latest HW specification with fixed definition
> > > around umem valid bits. The previous definition relied on a bit which
> > > was taken for other purposes in legacy FW.
> > > 
> > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > > Signed-off-by: Yishai Hadas 
> > > Reviewed-by: Artemy Kovalyov 
> > > Signed-off-by: Leon Romanovsky 
> > > Hi Doug, Jason
> > > 
> > > This commit fixes code sent in this merge window, so I'm not marking it
> > > with any rdma-rc/rdma-next. It will be better to be sent during this merge
> > > window if you have extra pull request to issue, or as a -rc material, if
> > > not.
> > > 
> > > BTW, we didn't combine reserved fields, because our convention is to align such
> > > fields to 32 bits for better readability.
> > > 
> > > Thanks
> > 
> > This looks fine.  Let me know when it's in the mlx5-next tree to pull.
> 
> It needs to go to -rc... 
> 
> This needs a mlx5-rc branch for this I guess?

I don't think so.  As long as it's the first commit in mlx5-next, and
mlx5-next is 4.20-rc1 based, then pulling this commit into the -rc tree
will only pull the single commit.  Then when we pull into for-next for
the first time, we will get this in for-next too.  That seems best to
me.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Jason Gunthorpe

On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > From: Yishai Hadas 
> > 
> > Adapt XRC SRQ to the latest HW specification with fixed definition
> > around umem valid bits. The previous definition relied on a bit which
> > was taken for other purposes in legacy FW.
> > 
> > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > Signed-off-by: Yishai Hadas 
> > Reviewed-by: Artemy Kovalyov 
> > Signed-off-by: Leon Romanovsky 
> > Hi Doug, Jason
> > 
> > This commit fixes code sent in this merge window, so I'm not marking it
> > with any rdma-rc/rdma-next. It will be better to be sent during this merge
> > window if you have extra pull request to issue, or as a -rc material, if
> > not.
> > 
> > BTW, we didn't combine reserved fields, because our convention is to align such
> > fields to 32 bits for better readability.
> > 
> > Thanks
> 
> This looks fine.  Let me know when it's in the mlx5-next tree to pull.

It needs to go to -rc... 

This needs a mlx5-rc branch for this I guess?

Jason

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-06 Thread Alexei Starovoitov

On Tue, Nov 06, 2018 at 06:52:11PM +, Edward Cree wrote:
> On 06/11/18 06:29, Alexei Starovoitov wrote:
> > BTF is not pure type information. BTF is everything that verifier needs
> > to know to make safety decisions that bpf instruction set doesn't have.
> Yes, I'm not disputing that and never have.
> I'm just saying that it will be much cleaner and better if it's
>  internally organised differently.
> 
> > Splitting pure types into one section, variables into another,
> > functions into yet another is not practical, since the same
> > modifiers (like const or volatile) need to be applied to
> > variables and functions. At the end all sections will have
> > the same style of encoding, hence no need to duplicate
> > the encoding three times and instead it's cleaner to encode
> > all of them BTF-style via different KINDs.
> This shows that you've misunderstood what I'm proposing, probably
>  I explained it poorly so I'll try again.
> I'm not suggesting that the 'functions' and 'variables' sections
>  would have _type_ records in them, only that they would reference
>  records in the 'types' section.  So if for instance we have
>     int foo(int x) { int quux; /* ... */ }
>     int bar(int y) { /* ... */ }
>  in the source, then in the 'types' section we would have
> 1 INT 32bits encoding=signed offset=0
> 2 FUNC args=(name="x" type=1,), ret=1
> 3 FUNC args=(name="y" type=1,), ret=1
>  while in the 'functions' section we would have
> 1 "foo" type=2 start_insn_idx=23 insn_count=19 (... maybe other info too...)
> 2 "bar" type=3 start_insn_idx=42 insn_count=5

that looks very weird to me. Why split func name from argument names?
The 'function name' as seen by the BTF may not be the symbol name
as seen in elf file.
Like C++ mangles names for elf. If/when we have C++ front-end for BPF
the BTF func name will be 'class::method' string.
While elf symbol name will still be mangled string in the elf file.

btw libbpf processes that elf symbol name for bpf prog already
and passes it to the kernel as bpf_attr.prog_name.
BTF's func name should be the one seen in the source.
Whatever that source code might be.
There are C, bpftrace, p4 and python frontends. These languages
should be free to put into BTF KIND_FUNC name that makes sense
from the language point of view.
Ideally it's a C-like name, so when bpftool prints that BTF
it would be meaningful.

btw we've been thinking to teach libbpf to generate BTF KIND_FUNC
on the fly based on elf symbol name (when real BTF is missing in the elf),
but decided not to go that route for now.

>  and in the 'variables' section we might have
> 1 "quux" type=1 where=stack func=1 offset=-8

that doesn't work. stack slots can be reused by compiler.
variable can be in the register too.
right now we're not planning to have an equivalent of such dwarf
debug info in BTF, since -O2 is mandatory and with optimizations
variable tracking (gcc term) is not effective.
Instead we will annotate every load/store with btf type id.
Meaning no plans to tackle local variables.

The global variables for a given .c file will look like a single KIND_STRUCT
(which is variable length)
and when libbpf will learn to 'link' multiple .o the BTF deduplication
work (which we're doing for vmlinux) will apply as-is to
combine multiple .o together.

> 
> Thus the graph of types lives entirely in the 'types' section, but
>  things-that-are-not-types don't.  I'm not making a distinction
>  between "pure types" and (somehow) impure types; I'm making a
>  distinction between types (with all their impurities) and
>  *instances* of those types.
> Note that these 'sections' may all really be regions of the '.BTF'
>  ELF section, if that makes the implementation easier.  Also, the
>  'functions' and 'variables' sections _won't_ have the same style
>  of encoding as the 'types', because they're storing entirely
>  different data and in fact don't need variable record sizes.

yes we do see these things differently.
To us function name is the debug info that fits well into BTF description.
Whereas you see the function name part of function declaration
as something 'entirely different'. I don't quite get that point.
In C elf symbol name and in-source func name are the same,
which is probably causing this terminology confusion.
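Edward's proposal is easier to weigh against Alexei's objections once it is written down as data. The sketch below is hypothetical and is not the BTF format. It encodes the example records from the quoted mail in two tables: a 'types' table that holds only the type graph, and a fixed-size 'functions' table whose entries reference it. It adds a lookup from an instruction index to the function instance that covers it.

```c
#include <stddef.h>

/* Hypothetical encoding of the proposal in the quoted mail; NOT BTF. */
enum rec_kind { REC_INT, REC_FUNC };

struct type_rec {		/* 'types' section: only the type graph */
	enum rec_kind kind;
	unsigned int bits;	/* REC_INT: width in bits */
	const char *arg_name;	/* REC_FUNC: single argument, for brevity */
	unsigned int arg_type;	/* REC_FUNC: type id of that argument */
	unsigned int ret_type;	/* REC_FUNC: type id of the return value */
};

struct func_rec {		/* 'functions' section: fixed-size instances */
	const char *name;
	unsigned int type;	/* id into types[] */
	unsigned int start_insn_idx;
	unsigned int insn_count;
};

/* Records from the example; id 0 is left empty (BTF reserves it for void). */
static const struct type_rec types[] = {
	[1] = { REC_INT, 32, NULL, 0, 0 },
	[2] = { REC_FUNC, 0, "x", 1, 1 },
	[3] = { REC_FUNC, 0, "y", 1, 1 },
};

static const struct func_rec funcs[] = {
	{ "foo", 2, 23, 19 },
	{ "bar", 3, 42, 5 },
};

/* Find the function instance whose instruction range covers insn. */
static const struct func_rec *func_for_insn(unsigned int insn)
{
	size_t i;

	for (i = 0; i < sizeof(funcs) / sizeof(funcs[0]); i++)
		if (insn >= funcs[i].start_insn_idx &&
		    insn < funcs[i].start_insn_idx + funcs[i].insn_count)
			return &funcs[i];
	return NULL;
}
```

The argument names sit in the type records and the function names in the instance records. That split is exactly the one Alexei questions above.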

[PATCH iproute2 net-next 0/2] Add DF configuration for VxLAN and GENEVE link types

2018-11-06 Thread Stefano Brivio

This series adds configuration of the DF bit in outgoing IPv4 packets for
VxLAN and GENEVE link types.

I also included uapi/linux/if_link.h changes for convenience, please let
me know if I should repost without them.

Stefano Brivio (2):
  iplink_vxlan: Add DF configuration
  iplink_geneve: Add DF configuration

 include/uapi/linux/if_link.h | 18 ++
 ip/iplink_geneve.c   | 29 +
 ip/iplink_vxlan.c| 29 +
 man/man8/ip-link.8.in| 28 
 4 files changed, 104 insertions(+)

-- 
2.19.1

[PATCH iproute2 net-next 2/2] iplink_geneve: Add DF configuration

2018-11-06 Thread Stefano Brivio

Allow setting the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.
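The df keyword handling that this patch adds to geneve_parse_opt() and geneve_print_opt() is a simple string-to-enum mapping. This is a minimal standalone sketch that assumes the enum values from the patch's if_link.h hunk. `parse_df()` and `df_name()` are hypothetical helpers. `parse_df()` returns -1 where iproute2 would call invarg().

```c
#include <string.h>

/* Values as added to include/uapi/linux/if_link.h by the patch. */
enum ifla_geneve_df {
	GENEVE_DF_UNSET = 0,
	GENEVE_DF_SET,
	GENEVE_DF_INHERIT,
	__GENEVE_DF_END,
	GENEVE_DF_MAX = __GENEVE_DF_END - 1,
};

/* Same keyword mapping as the "df" branch in geneve_parse_opt(). */
static int parse_df(const char *arg)
{
	if (strcmp(arg, "unset") == 0)
		return GENEVE_DF_UNSET;
	if (strcmp(arg, "set") == 0)
		return GENEVE_DF_SET;
	if (strcmp(arg, "inherit") == 0)
		return GENEVE_DF_INHERIT;
	return -1;
}

/* Reverse mapping for printing; NULL for out-of-range values. */
static const char *df_name(int df)
{
	static const char *const names[] = { "unset", "set", "inherit" };

	return (df >= 0 && df <= GENEVE_DF_MAX) ? names[df] : NULL;
}
```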

Signed-off-by: Stefano Brivio 
---
 include/uapi/linux/if_link.h |  9 +
 ip/iplink_geneve.c   | 29 +
 man/man8/ip-link.8.in| 14 ++
 3 files changed, 52 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4caf683ce546..183ca7527178 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -563,10 +563,19 @@ enum {
IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
IFLA_GENEVE_LABEL,
IFLA_GENEVE_TTL_INHERIT,
+   IFLA_GENEVE_DF,
__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX (__IFLA_GENEVE_MAX - 1)
 
+enum ifla_geneve_df {
+   GENEVE_DF_UNSET = 0,
+   GENEVE_DF_SET,
+   GENEVE_DF_INHERIT,
+   __GENEVE_DF_END,
+   GENEVE_DF_MAX = __GENEVE_DF_END - 1,
+};
+
 /* PPP section */
 enum {
IFLA_PPP_UNSPEC,
diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index c417842b2a5b..1872b74c5d70 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -24,6 +24,7 @@ static void print_explain(FILE *f)
"  remote ADDR\n"
"  [ ttl TTL ]\n"
"  [ tos TOS ]\n"
+   "  [ df DF ]\n"
"  [ flowlabel LABEL ]\n"
"  [ dstport PORT ]\n"
"  [ [no]external ]\n"
@@ -35,6 +36,7 @@ static void print_explain(FILE *f)
"   ADDR  := IP_ADDRESS\n"
"   TOS   := { NUMBER | inherit }\n"
"   TTL   := { 1..255 | auto | inherit }\n"
+   "   DF    := { unset | set | inherit }\n"
"   LABEL := 0-1048575\n"
);
 }
@@ -115,6 +117,22 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
tos = uval;
} else
tos = 1;
+   } else if (!matches(*argv, "df")) {
+   enum ifla_geneve_df df;
+
+   NEXT_ARG();
+   check_duparg(&attrs, IFLA_GENEVE_DF, "df", *argv);
+   if (strcmp(*argv, "unset") == 0)
+   df = GENEVE_DF_UNSET;
+   else if (strcmp(*argv, "set") == 0)
+   df = GENEVE_DF_SET;
+   else if (strcmp(*argv, "inherit") == 0)
+   df = GENEVE_DF_INHERIT;
+   else
+   invarg("DF must be 'unset', 'set' or 'inherit'",
+  *argv);
+
+   addattr8(n, 1024, IFLA_GENEVE_DF, df);
} else if (!matches(*argv, "label") ||
   !matches(*argv, "flowlabel")) {
__u32 uval;
@@ -287,6 +305,17 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
print_string(PRINT_FP, NULL, "tos %s ", "inherit");
}
 
+   if (tb[IFLA_GENEVE_DF]) {
+   enum ifla_geneve_df df = rta_getattr_u8(tb[IFLA_GENEVE_DF]);
+
+   if (df == GENEVE_DF_UNSET)
+   print_string(PRINT_JSON, "df", "df %s ", "unset");
+   else if (df == GENEVE_DF_SET)
+   print_string(PRINT_ANY, "df", "df %s ", "set");
+   else if (df == GENEVE_DF_INHERIT)
+   print_string(PRINT_ANY, "df", "df %s ", "inherit");
+   }
+
if (tb[IFLA_GENEVE_LABEL]) {
__u32 label = rta_getattr_u32(tb[IFLA_GENEVE_LABEL]);
 
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 1b899dd06b92..568e0aa02579 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1180,6 +1180,8 @@ the following additional arguments are supported:
 ] [
 .BI tos " TOS "
 ] [
+.BI df " DF "
+] [
 .BI flowlabel " FLOWLABEL "
 ] [
 .BI dstport " PORT"
@@ -1212,6 +1214,18 @@ ttl. Default option is "0".
 .BI tos " TOS"
 - specifies the TOS value to use in outgoing packets.
 
+.sp
+.BI df " DF"
+- specifies the usage of the DF bit in outgoing packets with IPv4 headers.
+The value
+.B inherit
+causes the bit to be copied from the original IP header. The values
+.B unset
+and
+.B set
+cause the bit to be always unset or always set, respectively. By default, the
+bit is not set.
+
 .sp
 .BI flowlabel " FLOWLABEL"
 - specifies the flow label to use in outgoing packets.
-- 
2.19.1

[PATCH iproute2 net-next 1/2] iplink_vxlan: Add DF configuration

2018-11-06 Thread Stefano Brivio

Allow setting the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

Signed-off-by: Stefano Brivio 
---
 include/uapi/linux/if_link.h |  9 +
 ip/iplink_vxlan.c| 29 +
 man/man8/ip-link.8.in| 14 ++
 3 files changed, 52 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9c254603ebda..4caf683ce546 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -530,6 +530,7 @@ enum {
IFLA_VXLAN_LABEL,
IFLA_VXLAN_GPE,
IFLA_VXLAN_TTL_INHERIT,
+   IFLA_VXLAN_DF,
__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
@@ -539,6 +540,14 @@ struct ifla_vxlan_port_range {
__be16  high;
 };
 
+enum ifla_vxlan_df {
+   VXLAN_DF_UNSET = 0,
+   VXLAN_DF_SET,
+   VXLAN_DF_INHERIT,
+   __VXLAN_DF_END,
+   VXLAN_DF_MAX = __VXLAN_DF_END - 1,
+};
+
 /* GENEVE section */
 enum {
IFLA_GENEVE_UNSPEC,
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 7fc0e2b4eb06..86afbe1334f0 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -31,6 +31,7 @@ static void print_explain(FILE *f)
" [ local ADDR ]\n"
" [ ttl TTL ]\n"
" [ tos TOS ]\n"
+   " [ df DF ]\n"
" [ flowlabel LABEL ]\n"
" [ dev PHYS_DEV ]\n"
" [ dstport PORT ]\n"
@@ -52,6 +53,7 @@ static void print_explain(FILE *f)
"   ADDR  := { IP_ADDRESS | any }\n"
"   TOS   := { NUMBER | inherit }\n"
"   TTL   := { 1..255 | auto | inherit }\n"
+   "   DF    := { unset | set | inherit }\n"
"   LABEL := 0-1048575\n"
);
 }
@@ -170,6 +172,22 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
} else
tos = 1;
addattr8(n, 1024, IFLA_VXLAN_TOS, tos);
+   } else if (!matches(*argv, "df")) {
+   enum ifla_vxlan_df df;
+
+   NEXT_ARG();
+   check_duparg(&attrs, IFLA_VXLAN_DF, "df", *argv);
+   if (strcmp(*argv, "unset") == 0)
+   df = VXLAN_DF_UNSET;
+   else if (strcmp(*argv, "set") == 0)
+   df = VXLAN_DF_SET;
+   else if (strcmp(*argv, "inherit") == 0)
+   df = VXLAN_DF_INHERIT;
+   else
+   invarg("DF must be 'unset', 'set' or 'inherit'",
+  *argv);
+
+   addattr8(n, 1024, IFLA_VXLAN_DF, df);
} else if (!matches(*argv, "label") ||
   !matches(*argv, "flowlabel")) {
__u32 uval;
@@ -538,6 +556,17 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
print_string(PRINT_FP, NULL, "ttl %s ", "auto");
}
 
+   if (tb[IFLA_VXLAN_DF]) {
+   enum ifla_vxlan_df df = rta_getattr_u8(tb[IFLA_VXLAN_DF]);
+
+   if (df == VXLAN_DF_UNSET)
+   print_string(PRINT_JSON, "df", "df %s ", "unset");
+   else if (df == VXLAN_DF_SET)
+   print_string(PRINT_ANY, "df", "df %s ", "set");
+   else if (df == VXLAN_DF_INHERIT)
+   print_string(PRINT_ANY, "df", "df %s ", "inherit");
+   }
+
if (tb[IFLA_VXLAN_LABEL]) {
__u32 label = rta_getattr_u32(tb[IFLA_VXLAN_LABEL]);
 
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 5132f514b279..1b899dd06b92 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -496,6 +496,8 @@ the following additional arguments are supported:
 ] [
 .BI tos " TOS "
 ] [
+.BI df " DF "
+] [
 .BI flowlabel " FLOWLABEL "
 ] [
 .BI dstport " PORT "
@@ -565,6 +567,18 @@ parameter.
 .BI tos " TOS"
 - specifies the TOS value to use in outgoing packets.
 
+.sp
+.BI df " DF"
+- specifies the usage of the DF bit in outgoing packets with IPv4 headers.
+The value
+.B inherit
+causes the bit to be copied from the original IP header. The values
+.B unset
+and
+.B set
+cause the bit to be always unset or always set, respectively. By default, the
+bit is not set.
+
 .sp
 .BI flowlabel " FLOWLABEL"
 - specifies the flow label to use in outgoing packets.
-- 
2.19.1

[PATCH net-next 03/11] vxlan: Allow configuration of DF behaviour

2018-11-06 Thread Stefano Brivio

Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.
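The decision added to vxlan_xmit_one() can be restated as a pure function of the configured mode and the inner packet. This is a minimal host-byte-order sketch. `vxlan_outer_df()` and its `already_df` parameter are hypothetical. `already_df` stands for DF having already been set by the code just above the new block. The enum values and the rule that IPv6 always gets DF under "inherit" follow the patch.

```c
#include <stdint.h>

/* Values as added by the patch to include/uapi/linux/if_link.h. */
enum ifla_vxlan_df { VXLAN_DF_UNSET = 0, VXLAN_DF_SET, VXLAN_DF_INHERIT };

#define ETH_P_IP	0x0800
#define ETH_P_IPV6	0x86DD
#define IP_DF		0x4000	/* "Don't Fragment" flag in frag_off */

/* Returns nonzero if the outer IPv4 header should carry DF. */
static int vxlan_outer_df(enum ifla_vxlan_df cfg, int already_df,
			  uint16_t inner_proto, uint16_t inner_frag_off)
{
	if (already_df)
		return 1;
	if (cfg == VXLAN_DF_SET)
		return 1;
	if (cfg == VXLAN_DF_INHERIT)
		/* IPv6 has no DF bit: always set; IPv4: copy the inner bit */
		return inner_proto == ETH_P_IPV6 ||
		       (inner_proto == ETH_P_IP && (inner_frag_off & IP_DF));
	return 0;	/* VXLAN_DF_UNSET: current behaviour, DF stays clear */
}
```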

For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.

According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.

Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 drivers/net/vxlan.c  | 29 +
 include/net/vxlan.h  |  1 +
 include/uapi/linux/if_link.h |  9 +
 3 files changed, 39 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7706e392b2a7..ccb19b833706 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2289,6 +2289,19 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
df = htons(IP_DF);
}
 
+   if (!df) {
+   if (vxlan->cfg.df == VXLAN_DF_SET) {
+   df = htons(IP_DF);
+   } else if (vxlan->cfg.df == VXLAN_DF_INHERIT) {
+   struct ethhdr *eth = eth_hdr(skb);
+
+   if (ntohs(eth->h_proto) == ETH_P_IPV6 ||
+   (ntohs(eth->h_proto) == ETH_P_IP &&
+old_iph->frag_off & htons(IP_DF)))
+   df = htons(IP_DF);
+   }
+   }
+
ndst = &rt->dst;
skb_tunnel_check_pmtu(skb, ndst, VXLAN_HEADROOM);
 
@@ -2837,6 +2850,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_GPE]= { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL]  = { .type = NLA_FLAG },
[IFLA_VXLAN_TTL_INHERIT]= { .type = NLA_FLAG },
+   [IFLA_VXLAN_DF] = { .type = NLA_U8 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -2893,6 +2907,16 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
}
}
 
+   if (data[IFLA_VXLAN_DF]) {
+   enum ifla_vxlan_df df = nla_get_u8(data[IFLA_VXLAN_DF]);
+
+   if (df < 0 || df > VXLAN_DF_MAX) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_DF],
+   "Invalid DF attribute");
+   return -EINVAL;
+   }
+   }
+
return 0;
 }
 
@@ -3538,6 +3562,9 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
conf->mtu = nla_get_u32(tb[IFLA_MTU]);
}
 
+   if (data[IFLA_VXLAN_DF])
+   conf->df = nla_get_u8(data[IFLA_VXLAN_DF]);
+
return 0;
 }
 
@@ -3630,6 +3657,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_TTL */
nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_TTL_INHERIT */
nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_TOS */
+   nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_DF */
nla_total_size(sizeof(__be32)) + /* IFLA_VXLAN_LABEL */
nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_LEARNING */
nla_total_size(sizeof(__u8)) +  /* IFLA_VXLAN_PROXY */
@@ -3696,6 +3724,7 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
nla_put_u8(skb, IFLA_VXLAN_TTL_INHERIT,
   !!(vxlan->cfg.flags & VXLAN_F_TTL_INHERIT)) ||
nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->cfg.tos) ||
+   nla_put_u8(skb, IFLA_VXLAN_DF, vxlan->cfg.df) ||
nla_put_be32(skb, IFLA_VXLAN_LABEL, vxlan->cfg.label) ||
nla_put_u8(skb, IFLA_VXLAN_LEARNING,
!!(vxlan->cfg.flags & VXLAN_F_LEARN)) ||
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 03431c148e16..ec999c49df1f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -216,6 +216,7 @@ struct vxlan_config {
unsigned long   age_interval;
unsigned intaddrmax;
boolno_share;
+   enum ifla_vxlan_df  df;
 };
 
 struct vxlan_dev_node {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1debfa42cba1..efc588949431 100644
--- a/include/uapi/linux/if_link.h
+++

[PATCH net-next 01/11] udp: Handle ICMP errors for tunnels with same destination port on both endpoints

2018-11-06 Thread Stefano Brivio

For both IPv4 and IPv6, if we can't match errors to a socket, try
tunnels before ignoring them. Look up a socket with the original source
and destination ports as found in the UDP packet inside the ICMP payload:
this will work for tunnels that force the same destination port for both
endpoints, i.e. VxLAN and GENEVE.
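
A toy model of why the ports are not reversed here. It assumes a simplified socket table keyed only by local UDP port; the real __udp4_lib_lookup() also matches addresses and the interface. All names below are hypothetical. The ICMP payload quotes the datagram we sent, so its source port is our own, typically flow-hash-derived, source port. Its destination port is the well-known tunnel port that the local tunnel socket is also bound to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UDP ports quoted in the ICMP payload: the datagram *we* sent. */
struct quoted_udp {
	uint16_t source;  /* our source port, e.g. a flow hash */
	uint16_t dest;    /* the peer's tunnel port, e.g. 4789 */
};

/* Toy socket table entry: a local UDP port with a bound socket. */
struct toy_sock {
	uint16_t local_port;
	const char *owner;
};

static const struct toy_sock *lookup_local(const struct toy_sock *tbl,
					   size_t n, uint16_t port)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].local_port == port)
			return &tbl[i];
	return NULL;
}

/* Regular error path: the sending socket owns the quoted source port. */
static const struct toy_sock *err_lookup_regular(const struct toy_sock *tbl,
						 size_t n,
						 const struct quoted_udp *uh)
{
	return lookup_local(tbl, n, uh->source);
}

/* Tunnel fallback: the tunnel socket is bound to the same port the peer
 * listens on, so match our local port against the quoted destination. */
static const struct toy_sock *err_lookup_encap(const struct toy_sock *tbl,
					       size_t n,
					       const struct quoted_udp *uh)
{
	return lookup_local(tbl, n, uh->dest);
}
```

With a VxLAN socket bound to 4789 and a quoted header of source 51234, destination 4789, only the tunnel fallback finds the socket.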

For IPv6 redirect messages, call ip6_redirect() directly with the output
interface argument set to the interface we received the packet from (as
it's the very interface we should build the exception on), otherwise the
new nexthop will be rejected. There's no such need for IPv4.

Tunnels can now export an encap_err_lookup() operation that indicates a
match. Pass the packet to the lookup function, and if the tunnel driver
reports a matching association, continue with regular ICMP error handling.
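
The contract of the new operation can be sketched as follows. The sketch assumes, consistent with the IPv4 code in this patch, that the callback returns 0 on a match and a negative error otherwise, and that a socket without the callback never matches. The struct and function names are hypothetical stand-ins for udp_sock and its encap_err_lookup member.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pkt;       /* opaque packet, stands in for struct sk_buff */
struct tun_sock;

/* 0: the error matches one of our associations; <0: not ours. */
typedef int (*err_lookup_fn)(struct tun_sock *sk, const struct pkt *p);

/* Hypothetical stand-in for the relevant part of struct udp_sock. */
struct tun_sock {
	err_lookup_fn encap_err_lookup;  /* NULL: no error lookup support */
};

/* Keep the socket only if it exports a lookup that reports a match. */
static struct tun_sock *confirm_match(struct tun_sock *sk, const struct pkt *p)
{
	if (!sk || !sk->encap_err_lookup || sk->encap_err_lookup(sk, p))
		return NULL;
	return sk;
}

/* Example callbacks for the two possible outcomes. */
static int lookup_match(struct tun_sock *sk, const struct pkt *p)
{
	(void)sk;
	(void)p;
	return 0;
}

static int lookup_no_match(struct tun_sock *sk, const struct pkt *p)
{
	(void)sk;
	(void)p;
	return -ENOENT;
}
```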

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 include/linux/udp.h  |  1 +
 include/net/udp_tunnel.h |  3 ++
 net/ipv4/udp.c   | 76 +++-
 net/ipv4/udp_tunnel.c|  1 +
 net/ipv6/udp.c   | 83 ++--
 5 files changed, 144 insertions(+), 20 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 320d49d85484..c8410837f044 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -71,6 +71,7 @@ struct udp_sock {
 * For encapsulation sockets.
 */
int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
+   int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb);
void (*encap_destroy)(struct sock *sk);
 
/* GRO functions for UDP socket */
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index fe680ab6b15a..bf2f84984392 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -64,6 +64,8 @@ static inline int udp_sock_create(struct net *net,
 }
 
 typedef int (*udp_tunnel_encap_rcv_t)(struct sock *sk, struct sk_buff *skb);
+typedef int (*udp_tunnel_encap_err_lookup_t)(struct sock *sk,
+struct sk_buff *skb);
 typedef void (*udp_tunnel_encap_destroy_t)(struct sock *sk);
 typedef struct sk_buff *(*udp_tunnel_gro_receive_t)(struct sock *sk,
struct list_head *head,
@@ -76,6 +78,7 @@ struct udp_tunnel_sock_cfg {
/* Used for setting up udp_sock fields, see udp.h for details */
__u8  encap_type;
udp_tunnel_encap_rcv_t encap_rcv;
+   udp_tunnel_encap_err_lookup_t encap_err_lookup;
udp_tunnel_encap_destroy_t encap_destroy;
udp_tunnel_gro_receive_t gro_receive;
udp_tunnel_gro_complete_t gro_complete;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca3ed931f2a9..1f054a85062d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -585,6 +585,59 @@ static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
return true;
 }
 
+DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
+void udp_encap_enable(void)
+{
+   static_branch_enable(&udp_encap_needed_key);
+}
+EXPORT_SYMBOL(udp_encap_enable);
+
+/* Try to match ICMP errors to UDP tunnels by looking up a socket without
+ * reversing source and destination port: this will match tunnels that force the
+ * same destination port on both endpoints (e.g. VxLAN, GENEVE). Then ask the
+ * tunnel implementation to match the error against a valid association.
+ *
+ * Return the socket if we have a match.
+ */
+static struct sock *__udp4_lib_err_encap(struct net *net,
+const struct iphdr *iph,
+struct udphdr *uh,
+struct udp_table *udptable,
+struct sk_buff *skb)
+{
+   int (*lookup)(struct sock *sk, struct sk_buff *skb);
+   int network_offset, transport_offset;
+   struct udp_sock *up;
+   struct sock *sk;
+
+   sk = __udp4_lib_lookup(net, iph->daddr, uh->source,
+  iph->saddr, uh->dest, skb->dev->ifindex, 0,
+  udptable, NULL);
+   if (!sk)
+   return NULL;
+
+   network_offset = skb_network_offset(skb);
+   transport_offset = skb_transport_offset(skb);
+
+   skb_reset_network_header(skb);
+
+   /* Network header needs to point to the outer IPv4 header inside ICMP */
+   skb_reset_network_header(skb);
+   iph = ip_hdr(skb);
+   /* Transport header needs to point to the UDP header */
+   skb_set_transport_header(skb, iph->ihl << 2);
+
+   up = udp_sk(sk);
+   lookup = READ_ONCE(up->encap_err_lookup);
+   if (!lookup || lookup(sk, skb))
+   sk = NULL;
+
+   skb_set_transport_header(skb, transport_offset);
+   skb_set_network_header(skb, network_offset);
+
+   return sk;
+}
+
 /*
  * This routine is called by the ICMP module when it gets some
  * sort of error condition.  If err < 0 then the socket

[PATCH net-next 05/11] geneve: ICMP error lookup handler

2018-11-06 Thread Stefano Brivio

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.
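
As a rough sketch of the header checks involved, the following parses the fixed 8-byte Geneve header from a raw buffer. It rejects unknown versions and non-Ethernet payloads, the same two checks the handler below performs, and then extracts the 24-bit VNI. It works on plain bytes rather than an skb, and the constants are defined locally. The subsequent tunnel lookup by address and VNI is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define GENEVE_FIXED_HLEN 8       /* fixed part of the Geneve header */
#define GENEVE_VERSION    0
#define ETHERTYPE_TEB     0x6558  /* Transparent Ethernet Bridging */

/*
 * Validate the fixed Geneve header and extract the 24-bit VNI.
 * Layout: byte 0 = Ver (2 bits) | Opt Len (6 bits), byte 1 = flags,
 * bytes 2-3 = protocol type, bytes 4-6 = VNI, byte 7 = reserved.
 */
static int geneve_parse_vni(const uint8_t *buf, size_t len, uint32_t *vni)
{
	assert(buf != NULL && vni != NULL);

	if (len < GENEVE_FIXED_HLEN)
		return -EINVAL;
	if ((buf[0] >> 6) != GENEVE_VERSION)
		return -EINVAL;
	if (((buf[2] << 8) | buf[3]) != ETHERTYPE_TEB)
		return -EINVAL;

	*vni = ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 8) | buf[6];
	return 0;
}
```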

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 drivers/net/geneve.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a0cd1c41cf5f..8a69879d516a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -387,6 +387,57 @@ static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
+/* Callback from net/ipv{4,6}/udp.c to check that we have a tunnel for errors */
+static int geneve_udp_encap_err_lookup(struct sock *sk, struct sk_buff *skb)
+{
+   struct genevehdr *geneveh;
+   struct geneve_sock *gs;
+   u8 zero_vni[3] = { 0 };
+   u8 *vni = zero_vni;
+
+   if (skb->len < GENEVE_BASE_HLEN)
+   return -EINVAL;
+
+   geneveh = geneve_hdr(skb);
+   if (geneveh->ver != GENEVE_VER)
+   return -EINVAL;
+
+   if (geneveh->proto_type != htons(ETH_P_TEB))
+   return -EINVAL;
+
+   gs = rcu_dereference_sk_user_data(sk);
+   if (!gs)
+   return -ENOENT;
+
+   if (geneve_get_sk_family(gs) == AF_INET) {
+   struct iphdr *iph = ip_hdr(skb);
+   __be32 addr4 = 0;
+
+   if (!gs->collect_md) {
+   vni = geneve_hdr(skb)->vni;
+   addr4 = iph->daddr;
+   }
+
+   return geneve_lookup(gs, addr4, vni) ? 0 : -ENOENT;
+   }
+
+#if IS_ENABLED(CONFIG_IPV6)
+   if (geneve_get_sk_family(gs) == AF_INET6) {
+   struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   struct in6_addr addr6 = { 0 };
+
+   if (!gs->collect_md) {
+   vni = geneve_hdr(skb)->vni;
+   addr6 = ip6h->daddr;
+   }
+
+   return geneve6_lookup(gs, addr6, vni) ? 0 : -ENOENT;
+   }
+#endif
+
+   return -EPFNOSUPPORT;
+}
+
 static struct socket *geneve_create_sock(struct net *net, bool ipv6,
 __be16 port, bool ipv6_rx_csum)
 {
@@ -544,6 +595,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
tunnel_cfg.gro_receive = geneve_gro_receive;
tunnel_cfg.gro_complete = geneve_gro_complete;
tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
+   tunnel_cfg.encap_err_lookup = geneve_udp_encap_err_lookup;
tunnel_cfg.encap_destroy = NULL;
setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
list_add(&gs->list, &gn->sock_list);
-- 
2.19.1

[PATCH net-next 09/11] udp: Support for error handlers of tunnels with arbitrary destination port

2018-11-06 Thread Stefano Brivio

ICMP error handling is currently not possible for UDP tunnels not
employing a receiving socket with local destination port matching the
remote one, because we have no way to look them up.

Add an err_handler tunnel encapsulation operation that can be exported by
tunnels in order to pass the error to the protocol implementing the
encapsulation. We can't easily use a lookup function as we did for VxLAN
and GENEVE, as protocol error handlers, which would be in turn called by
implementations of this new operation, handle the errors themselves,
together with the tunnel lookup.

Without a socket, we can't be sure which encapsulation error handler is
the appropriate one: encapsulation handlers (the ones for FoU and GUE
introduced in the next patch, e.g.) will need to check the new error codes
returned by protocol handlers to figure out if errors match the given
encapsulation, and, in turn, report this error back, so that we can try
all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.
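
The probing loop can be sketched like this under simplified types. `encap_ops` is a hypothetical stand-in for the per-encapsulation ops structures, and only the new hook is modelled. The RCU dereference used by the kernel is omitted. Each handler returns 0 when the error belongs to it, and the first match ends the search.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENCAP_OPS 8  /* same size as MAX_IPTUN_ENCAP_OPS in the patch */

/* Hypothetical stand-in for ip_tunnel_encap_ops, new hook only. */
struct encap_ops {
	int (*err_handler)(const void *pkt, unsigned int info);
};

/* Try each registered encapsulation until one claims the error (returns
 * 0); empty slots and ops without an error handler are skipped. */
static int err_encap_no_sk(const struct encap_ops *const *ops,
			   const void *pkt, unsigned int info)
{
	for (size_t i = 0; i < MAX_ENCAP_OPS; i++) {
		if (!ops[i] || !ops[i]->err_handler)
			continue;
		if (!ops[i]->err_handler(pkt, info))
			return 0;
	}
	return -ENOENT;
}

/* Example handlers: one never matches, one always does. */
static int handler_decline(const void *pkt, unsigned int info)
{
	(void)pkt;
	(void)info;
	return -ENOENT;
}

static int handler_claim(const void *pkt, unsigned int info)
{
	(void)pkt;
	(void)info;
	return 0;
}
```

This is why protocol error handlers need an int return value: a handler that silently returned void could not tell the loop to keep probing.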

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 include/net/ip6_tunnel.h |  2 +
 include/net/ip_tunnels.h |  1 +
 net/ipv4/udp.c   | 75 +++--
 net/ipv6/udp.c   | 80 ++--
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 236e40ba06bf..7855966b4a19 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -69,6 +69,8 @@ struct ip6_tnl_encap_ops {
size_t (*encap_hlen)(struct ip_tunnel_encap *e);
int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
u8 *protocol, struct flowi6 *fl6);
+   int (*err_handler)(struct sk_buff *, struct inet6_skb_parm *opt,
+  u8 type, u8 code, int offset, __be32 info);
 };
 
 #ifdef CONFIG_INET
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index b0d022ff6ea1..5980659312e5 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -311,6 +311,7 @@ struct ip_tunnel_encap_ops {
size_t (*encap_hlen)(struct ip_tunnel_encap *e);
int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
u8 *protocol, struct flowi4 *fl4);
+   int (*err_handler)(struct sk_buff *, u32);
 };
 
 #define MAX_IPTUN_ENCAP_OPS 8
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b89c4cfd7c62..83950b4faced 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -105,6 +105,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -592,30 +593,48 @@ void udp_encap_enable(void)
 }
 EXPORT_SYMBOL(udp_encap_enable);
 
+/* Handler for tunnels with arbitrary destination ports: no socket lookup, go
+ * through error handlers in encapsulations looking for a match.
+ */
+static int __udp4_lib_err_encap_no_sk(struct sk_buff *skb, u32 info)
+{
+   int i;
+
+   for (i = 0; i < MAX_IPTUN_ENCAP_OPS; i++) {
+   int (*handler)(struct sk_buff *skb, u32 info);
+
+   if (!iptun_encaps[i])
+   continue;
+   handler = rcu_dereference(iptun_encaps[i]->err_handler);
+   if (handler && !handler(skb, info))
+   return 0;
+   }
+
+   return -ENOENT;
+}
+
 /* Try to match ICMP errors to UDP tunnels by looking up a socket without
 * reversing source and destination port: this will match tunnels that force the
- * same destination port on both endpoints (e.g. VxLAN, GENEVE). Then ask the
- * tunnel implementation to match the error against a valid association.
+ * same destination port on both endpoints (e.g. VxLAN, GENEVE). If this doesn't
+ * match any socket, probe tunnels with arbitrary destination ports (e.g. FoU,
+ * GUE): there, the receiving socket is useless, as the port we've sent packets
+ * to won't necessarily match the local destination port.
+ *
+ * Then ask the tunnel implementation to match the error against a valid
+ * association.
  *
- * Return the socket if we have a match.
+ * Return an error if we can't find a match, the socket if we need further
+ * processing, zero otherwise.
  */
 static struct sock *__udp4_lib_err_encap(struct net *net,
 const struct iphdr *iph,
 struct udphdr *uh,
 struct udp_table *udptable,
-struct sk_buff *skb)
+struct sk_buff *skb, u32 info)
 {
-   int (*lookup)(struct sock *sk, struct sk_buff *skb);
int network_offset, transport_offset;
-   struct udp_sock *up;
struct sock *sk;
 
-   sk = __udp4_lib_lookup(net, iph->daddr, uh->source,
-  iph->saddr, uh->dest, skb->dev->ifindex, 0,
-  udptable, NULL);
-   if (!sk)
-

[PATCH net-next 08/11] net: Convert protocol error handlers from void to int

2018-11-06 Thread Stefano Brivio

We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever errors don't match a protocol or any of its states.

This has no effect on existing error handling paths.

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 include/net/icmp.h|  2 +-
 include/net/protocol.h|  9 ++--
 include/net/sctp/sctp.h   |  2 +-
 include/net/tcp.h |  2 +-
 include/net/udp.h |  2 +-
 net/dccp/ipv4.c   | 13 +++
 net/dccp/ipv6.c   | 13 +++
 net/ipv4/gre_demux.c  |  9 ++--
 net/ipv4/icmp.c   |  6 +++--
 net/ipv4/ip_gre.c | 48 ---
 net/ipv4/ipip.c   | 14 ++--
 net/ipv4/tcp_ipv4.c   | 22 ++
 net/ipv4/tunnel4.c| 18 ++-
 net/ipv4/udp.c| 10 
 net/ipv4/udp_impl.h   |  2 +-
 net/ipv4/udplite.c|  4 ++--
 net/ipv4/xfrm4_protocol.c | 18 ++-
 net/ipv6/icmp.c   |  4 +++-
 net/ipv6/ip6_gre.c| 18 ---
 net/ipv6/tcp_ipv6.c   | 13 +++
 net/ipv6/tunnel6.c| 12 ++
 net/ipv6/udp.c| 18 +++
 net/ipv6/udp_impl.h   |  4 ++--
 net/ipv6/udplite.c|  5 ++--
 net/ipv6/xfrm6_protocol.c | 18 ++-
 net/sctp/input.c  |  5 ++--
 net/sctp/ipv6.c   |  7 --
 27 files changed, 177 insertions(+), 121 deletions(-)

diff --git a/include/net/icmp.h b/include/net/icmp.h
index 3ef2743a8eec..6ac3a5bd0117 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -41,7 +41,7 @@ struct net;
 
 void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
 int icmp_rcv(struct sk_buff *skb);
-void icmp_err(struct sk_buff *skb, u32 info);
+int icmp_err(struct sk_buff *skb, u32 info);
 int icmp_init(void);
 void icmp_out_count(struct net *net, unsigned char type);
 
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 4fc75f7ae23b..92b3eaad6088 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -42,7 +42,10 @@ struct net_protocol {
int (*early_demux)(struct sk_buff *skb);
int (*early_demux_handler)(struct sk_buff *skb);
int (*handler)(struct sk_buff *skb);
-   void(*err_handler)(struct sk_buff *skb, u32 info);
+
+   /* This returns an error if we weren't able to handle the error. */
+   int (*err_handler)(struct sk_buff *skb, u32 info);
+
unsigned intno_policy:1,
netns_ok:1,
/* does the protocol do more stringent
@@ -58,10 +61,12 @@ struct inet6_protocol {
void(*early_demux_handler)(struct sk_buff *skb);
int (*handler)(struct sk_buff *skb);
 
-   void(*err_handler)(struct sk_buff *skb,
+   /* This returns an error if we weren't able to handle the error. */
+   int (*err_handler)(struct sk_buff *skb,
   struct inet6_skb_parm *opt,
   u8 type, u8 code, int offset,
   __be32 info);
+
unsigned intflags;  /* INET6_PROTO_xxx */
 };
 
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 8c2caa370e0f..9a3b48a35e90 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -151,7 +151,7 @@ int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc,
  * sctp/input.c
  */
 int sctp_rcv(struct sk_buff *skb);
-void sctp_v4_err(struct sk_buff *skb, u32 info);
+int sctp_v4_err(struct sk_buff *skb, u32 info);
 void sctp_hash_endpoint(struct sctp_endpoint *);
 void sctp_unhash_endpoint(struct sctp_endpoint *);
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a18914d20486..4743836bed2e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -313,7 +313,7 @@ extern struct proto tcp_prot;
 
 void tcp_tasklet_init(void);
 
-void tcp_v4_err(struct sk_buff *skb, u32);
+int tcp_v4_err(struct sk_buff *skb, u32);
 
 void tcp_shutdown(struct sock *sk, int how);
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 9e82cb391dea..7e3f1e2b68eb 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -272,7 +272,7 @@ bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst);
 int udp_get_port(struct sock *sk, unsigned short snum,
 int (*saddr_cmp)(const struct sock *,
  const struct

[PATCH net-next 11/11] selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests

2018-11-06 Thread Stefano Brivio

Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload,
on IPv4 and IPv6 transport, that check that PMTU exceptions are created
with the right value when exceeding the MTU on a link of the path.
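
The expected PMTU values in such tests come down to subtracting header overhead from the link MTU. The following is a minimal sketch that assumes only three sources of overhead: the outer IP header (20 bytes for IPv4 without options, 40 for IPv6), the 8-byte UDP header and, for GUE, the 4-byte base GUE header. FoU adds no header of its own. The helper name is hypothetical. The selftest itself may account for extra overhead, such as the encapsulation limit option that ip6tnl devices can add.

```c
#include <assert.h>
#include <stdbool.h>

/* Nominal per-packet overhead, in bytes, of the encapsulation headers. */
enum {
	IPV4_HDR = 20,  /* outer IPv4 header without options */
	IPV6_HDR = 40,  /* outer IPv6 fixed header */
	UDP_HDR  = 8,
	GUE_HDR  = 4,   /* base GUE header, no optional fields */
};

/* Largest inner packet that fits a link MTU over FoU or GUE, ignoring any
 * IPv6 extension headers the tunnel device may add. */
static int fou_gue_inner_mtu(int link_mtu, bool outer_ipv6, bool gue)
{
	int overhead = (outer_ipv6 ? IPV6_HDR : IPV4_HDR) + UDP_HDR;

	if (gue)
		overhead += GUE_HDR;
	assert(link_mtu > overhead);
	return link_mtu - overhead;
}
```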

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 tools/testing/selftests/net/pmtu.sh | 179 
 1 file changed, 179 insertions(+)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index e9bb0c37bdfc..b746bcb29e0f 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -43,6 +43,14 @@
 # - pmtu_ipv6_geneve6_exception
 #  Same as pmtu_ipv6_vxlan6, but using a GENEVE tunnel instead of VxLAN
 #
+# - pmtu_ipv{4,6}_fou{4,6}_exception
+#  Same as pmtu_ipv4_vxlan6, but using a direct IPv4/IPv6 encapsulation
+#  (FoU) over IPv4/IPv6, instead of VxLAN
+#
+# - pmtu_ipv{4,6}_gue{4,6}_exception
+#  Same as pmtu_ipv4_vxlan6, but using a generic UDP IPv4/IPv6
+#  encapsulation (GUE) over IPv4/IPv6, instead of VxLAN
+#
 # - pmtu_vti4_exception
 #  Set up vti tunnel on top of veth, with xfrm states and policies, in two
 #  namespaces with matching endpoints. Check that route exception is not
@@ -93,6 +101,14 @@ tests="
pmtu_ipv6_vxlan6_exception  IPv6 over vxlan6: PMTU exceptions
pmtu_ipv4_geneve6_exception IPv4 over geneve6: PMTU exceptions
pmtu_ipv6_geneve6_exception IPv6 over geneve6: PMTU exceptions
+   pmtu_ipv4_fou4_exception    IPv4 over fou4: PMTU exceptions
+   pmtu_ipv6_fou4_exception    IPv6 over fou4: PMTU exceptions
+   pmtu_ipv4_fou6_exception    IPv4 over fou6: PMTU exceptions
+   pmtu_ipv6_fou6_exception    IPv6 over fou6: PMTU exceptions
+   pmtu_ipv4_gue4_exception    IPv4 over gue4: PMTU exceptions
+   pmtu_ipv6_gue4_exception    IPv6 over gue4: PMTU exceptions
+   pmtu_ipv4_gue6_exception    IPv4 over gue6: PMTU exceptions
+   pmtu_ipv6_gue6_exception    IPv6 over gue6: PMTU exceptions
pmtu_vti6_exception vti6: PMTU exceptions
pmtu_vti4_exception vti4: PMTU exceptions
pmtu_vti4_default_mtu   vti4: default MTU assignment
@@ -180,6 +196,89 @@ nsname() {
eval echo \$NS_$1
 }
 
+setup_fou_or_gue() {
+   outer="${1}"
+   inner="${2}"
+   encap="${3}"
+
+   if [ "${outer}" = "4" ]; then
+   modprobe fou || return 2
+   a_addr="${prefix4}.${a_r1}.1"
+   b_addr="${prefix4}.${b_r1}.1"
+   if [ "${inner}" = "4" ]; then
+   type="ipip"
+   ipproto="4"
+   else
+   type="sit"
+   ipproto="41"
+   fi
+   else
+   modprobe fou6 || return 2
+   a_addr="${prefix6}:${a_r1}::1"
+   b_addr="${prefix6}:${b_r1}::1"
+   if [ "${inner}" = "4" ]; then
+   type="ip6tnl"
+   mode="mode ipip6"
+   ipproto="4 -6"
+   else
+   type="ip6tnl"
+   mode="mode ip6ip6"
+   ipproto="41 -6"
+   fi
+   fi
+
+   ${ns_a} ip fou add port  ipproto ${ipproto} || return 2
+   ${ns_a} ip link add ${encap}_a type ${type} ${mode} local ${a_addr} remote ${b_addr} encap ${encap} encap-sport auto encap-dport 5556 || return 2
+
+   ${ns_b} ip fou add port 5556 ipproto ${ipproto}
+   ${ns_b} ip link add ${encap}_b type ${type} ${mode} local ${b_addr} remote ${a_addr} encap ${encap} encap-sport auto encap-dport 
+
+   if [ "${inner}" = "4" ]; then
+   ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${encap}_a
+   ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${encap}_b
+   else
+   ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${encap}_a
+   ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${encap}_b
+   fi
+
+   ${ns_a} ip link set ${encap}_a up
+   ${ns_b} ip link set ${encap}_b up
+
+   sleep 1
+}
+
+setup_fou44() {
+   setup_fou_or_gue 4 4 fou
+}
+
+setup_fou46() {
+   setup_fou_or_gue 4 6 fou
+}
+
+setup_fou64() {
+   setup_fou_or_gue 6 4 fou
+}
+
+setup_fou66() {
+   setup_fou_or_gue 6 6 fou
+}
+
+setup_gue44() {
+   setup_fou_or_gue 4 4 gue
+}
+
+setup_gue46() {
+   setup_fou_or_gue 4 6 gue
+}
+
+setup_gue64() {
+   setup_fou_or_gue 6 4 gue
+}
+
+setup_gue66() {
+   setup_fou_or_gue 6 6 gue
+}
+
 setup_namespaces() {
for n in ${NS_A} ${NS_B} ${NS_R1} ${NS_R2}; do
ip netns add ${n} || return 1
@@ -560,6 +659,86 @@ test_pmtu_ipvX_over_vxlan6_or_geneve6_exception() {
check_pmtu_value ${exp_mtu} "${pmtu}" "exceeding link layer MTU on ${type} interface"
 }

[PATCH net-next 10/11] fou, fou6: ICMP error handlers for FoU and GUE

2018-11-06 Thread Stefano Brivio

As the destination port in FoU and GUE receiving sockets doesn't
necessarily match the remote destination port, we can't associate errors
to the encapsulating tunnels with a socket lookup -- we need to blindly
try them instead. This means we don't even know if we are handling errors
for FoU or GUE without digging into the packets.

Hence, implement a single handler covering both FoU and GUE, in an IPv4 and
an IPv6 variant, that checks whether the packet that generated the ICMP
error used direct IP encapsulation or carried a GUE header, and passes the
error to the matching protocol handler, if any.
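
The dispatch on the first byte after the UDP header can be sketched as below. The sketch relies on the GUE design in which version 1 means direct IP encapsulation. IPv4 (0100) and IPv6 (0110) headers both begin with the bits 01, so the two-bit GUE version reads as 1, and the IP version nibble then tells the two families apart. `gue_classify()` and its enum are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting the first byte after the UDP header. */
enum gue_kind {
	GUE_FULL_HEADER,  /* GUE version 0: a GUE header follows */
	GUE_DIRECT_IPV4,  /* GUE version 1: bare IPv4 packet */
	GUE_DIRECT_IPV6,  /* GUE version 1: bare IPv6 packet */
	GUE_UNSUPPORTED,
};

static enum gue_kind gue_classify(uint8_t first_byte)
{
	switch (first_byte >> 6) {  /* GUE version: top two bits */
	case 0:
		return GUE_FULL_HEADER;
	case 1:
		/* IPv4 (0100) and IPv6 (0110) both start with bits 01, so
		 * the IP version nibble tells the two apart. */
		if ((first_byte >> 4) == 4)
			return GUE_DIRECT_IPV4;
		if ((first_byte >> 4) == 6)
			return GUE_DIRECT_IPV6;
		return GUE_UNSUPPORTED;
	default:
		return GUE_UNSUPPORTED;
	}
}
```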

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 net/ipv4/fou.c  | 68 +
 net/ipv4/protocol.c |  1 +
 net/ipv6/fou6.c | 74 +
 3 files changed, 143 insertions(+)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 500a59906b87..0d0ad19ecb87 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1003,15 +1004,82 @@ static int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
return 0;
 }
 
+static int gue_err_proto_handler(int proto, struct sk_buff *skb, u32 info)
+{
+   const struct net_protocol *ipprot = rcu_dereference(inet_protos[proto]);
+
+   if (ipprot && ipprot->err_handler) {
+   if (!ipprot->err_handler(skb, info))
+   return 0;
+   }
+
+   return -ENOENT;
+}
+
+static int gue_err(struct sk_buff *skb, u32 info)
+{
+   int transport_offset = skb_transport_offset(skb);
+   struct guehdr *guehdr;
+   size_t optlen;
+   int ret;
+
+   if (skb->len < sizeof(struct udphdr) + sizeof(struct guehdr))
+   return -EINVAL;
+
+   guehdr = (struct guehdr *)&udp_hdr(skb)[1];
+
+   switch (guehdr->version) {
+   case 0: /* Full GUE header present */
+   break;
+   case 1: {
+   /* Direct encapsulation of IPv4 or IPv6 */
+   skb_set_transport_header(skb, -(int)sizeof(struct icmphdr));
+
+   switch (((struct iphdr *)guehdr)->version) {
+   case 4:
+   ret = gue_err_proto_handler(IPPROTO_IPIP, skb, info);
+   goto out;
+#if IS_ENABLED(CONFIG_IPV6)
+   case 6:
+   ret = gue_err_proto_handler(IPPROTO_IPV6, skb, info);
+   goto out;
+#endif
+   default:
+   ret = -EOPNOTSUPP;
+   goto out;
+   }
+   }
+   default: /* Undefined version */
+   return -EOPNOTSUPP;
+   }
+
+   if (guehdr->control)
+   return -ENOENT;
+
+   optlen = guehdr->hlen << 2;
+
+   if (validate_gue_flags(guehdr, optlen))
+   return -EINVAL;
+
+   skb_set_transport_header(skb, -(int)sizeof(struct icmphdr));
+   ret = gue_err_proto_handler(guehdr->proto_ctype, skb, info);
+
+out:
+   skb_set_transport_header(skb, transport_offset);
+   return ret;
+}
+
 
 static const struct ip_tunnel_encap_ops fou_iptun_ops = {
.encap_hlen = fou_encap_hlen,
.build_header = fou_build_header,
+   .err_handler = gue_err,
 };
 
 static const struct ip_tunnel_encap_ops gue_iptun_ops = {
.encap_hlen = gue_encap_hlen,
.build_header = gue_build_header,
+   .err_handler = gue_err,
 };
 
 static int ip_tunnel_encap_add_fou_ops(void)
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 32a691b7ce2c..92d249e053be 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -29,6 +29,7 @@
 #include 
 
 struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
+EXPORT_SYMBOL(inet_protos);
 const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
 EXPORT_SYMBOL(inet_offloads);
 
diff --git a/net/ipv6/fou6.c b/net/ipv6/fou6.c
index 6de3c04b0f30..bd675c61deb1 100644
--- a/net/ipv6/fou6.c
+++ b/net/ipv6/fou6.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -69,14 +70,87 @@ static int gue6_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
return 0;
 }
 
+static int gue6_err_proto_handler(int proto, struct sk_buff *skb,
+ struct inet6_skb_parm *opt,
+ u8 type, u8 code, int offset, u32 info)
+{
+   const struct inet6_protocol *ipprot;
+
+   ipprot = rcu_dereference(inet6_protos[proto]);
+   if (ipprot && ipprot->err_handler) {
+   if (!ipprot->err_handler(skb, opt, type, code, offset, info))
+   return 0;
+   }
+
+   return -ENOENT;
+}
+
+static int gue6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+   u8 type, u8 code, int offset, __be32 info)
+{
+   int transport_offset = skb_transport_offset(skb);
+   struct

[PATCH net-next 02/11] vxlan: ICMP error lookup handler

2018-11-06 Thread Stefano Brivio

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.
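
A byte-level sketch of the same matching. It assumes a raw 8-byte VxLAN header in the RFC 7348 layout: a flags byte with the I bit 0x08, and the VNI in bytes 4 to 6. A hypothetical flat array of configured VNIs takes the place of vxlan_vs_find_vni(), which in the kernel also takes the interface index into account. Like the handler below, it returns -EINVAL for malformed headers and -ENOENT for unknown VNIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_FIXED_HLEN 8     /* VxLAN header only, UDP header excluded */
#define VXLAN_FLAG_I     0x08  /* "VNI present" flag, RFC 7348 */

/*
 * Return 0 if the header carries a VNI found in vnis[], -EINVAL for a
 * short or flagless header, -ENOENT for an unknown VNI.
 */
static int vxlan_err_match(const uint8_t *buf, size_t len,
			   const uint32_t *vnis, size_t n)
{
	uint32_t vni;

	assert(buf != NULL && (vnis != NULL || n == 0));

	if (len < VXLAN_FIXED_HLEN || !(buf[0] & VXLAN_FLAG_I))
		return -EINVAL;

	/* 24-bit VNI in bytes 4..6, network byte order */
	vni = ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 8) | buf[6];

	for (size_t i = 0; i < n; i++)
		if (vnis[i] == vni)
			return 0;
	return -ENOENT;
}
```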

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 drivers/net/vxlan.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 297cdeaef479..7706e392b2a7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1552,6 +1552,34 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
+/* Callback from net/ipv{4,6}/udp.c to check that we have a VNI for errors */
+static int vxlan_err_lookup(struct sock *sk, struct sk_buff *skb)
+{
+   struct vxlan_dev *vxlan;
+   struct vxlan_sock *vs;
+   struct vxlanhdr *hdr;
+   __be32 vni;
+
+   if (skb->len < VXLAN_HLEN)
+   return -EINVAL;
+
+   hdr = vxlan_hdr(skb);
+
+   if (!(hdr->vx_flags & VXLAN_HF_VNI))
+   return -EINVAL;
+
+   vs = rcu_dereference_sk_user_data(sk);
+   if (!vs)
+   return -ENOENT;
+
+   vni = vxlan_vni(hdr->vx_vni);
+   vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni);
+   if (!vxlan)
+   return -ENOENT;
+
+   return 0;
+}
+
 static int arp_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -2948,6 +2976,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
tunnel_cfg.sk_user_data = vs;
tunnel_cfg.encap_type = 1;
tunnel_cfg.encap_rcv = vxlan_rcv;
+   tunnel_cfg.encap_err_lookup = vxlan_err_lookup;
tunnel_cfg.encap_destroy = NULL;
tunnel_cfg.gro_receive = vxlan_gro_receive;
tunnel_cfg.gro_complete = vxlan_gro_complete;
-- 
2.19.1

[PATCH net-next 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6

2018-11-06 Thread Stefano Brivio

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.
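
The expected PMTU checked by these tests follows from the encapsulation overhead spelled out in the exp_mtu computation of test_pmtu_ipvX_over_vxlan6_exception(). That overhead is a 40-byte IPv6 header, an 8-byte UDP header, an 8-byte VxLAN header and the 14-byte inner Ethernet header. The arithmetic can be checked with a hypothetical helper:

```c
#include <assert.h>

/* Per-packet overhead, in bytes, for IPv4/IPv6 over VxLAN over IPv6. */
enum {
	OUTER_IPV6_HDR = 40,
	OUTER_UDP_HDR  = 8,
	VXLAN_HDR      = 8,
	INNER_ETH_HDR  = 14,  /* VxLAN carries whole Ethernet frames */
};

/* Hypothetical helper: largest inner IP packet that fits the link MTU. */
static int vxlan6_expected_pmtu(int link_mtu)
{
	int overhead = OUTER_IPV6_HDR + OUTER_UDP_HDR + VXLAN_HDR +
		       INNER_ETH_HDR;

	assert(link_mtu > overhead);
	return link_mtu - overhead;
}
```

With the test's link MTU of 4000 bytes, this gives 3930.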

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
This only introduces tests over VxLAN over IPv6 right now. I'll introduce
tests over IPv4 (they can be added trivially) once DF configuration support
is accepted into iproute2.

 tools/testing/selftests/net/pmtu.sh | 115 +++-
 1 file changed, 97 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index a369d616b390..19ede74af560 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -26,6 +26,17 @@
 # - pmtu_ipv6
 #  Same as pmtu_ipv4, except for locked PMTU tests, using IPv6
 #
+# - pmtu_ipv4_vxlan6_exception
+#  Set up the same network topology as pmtu_ipv4, create a VxLAN tunnel
+#  over IPv6 between A and B, routed via R1. On the link between R1 and B,
+#  set an MTU lower than the VxLAN MTU and the MTU on the link between A and
+#  R1. Send IPv4 packets, exceeding the MTU between R1 and B, over VxLAN
+#  from A to B and check that the PMTU exception is created with the right
+#  value on A.
+#
+# - pmtu_ipv6_vxlan6_exception
+#  Same as pmtu_ipv4_vxlan6_exception, but send IPv6 packets from A to B
+#
 # - pmtu_vti4_exception
 #  Set up vti tunnel on top of veth, with xfrm states and policies, in two
 #  namespaces with matching endpoints. Check that route exception is not
@@ -72,6 +83,8 @@ which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
 tests="
pmtu_ipv4_exception ipv4: PMTU exceptions
pmtu_ipv6_exception ipv6: PMTU exceptions
+   pmtu_ipv4_vxlan6_exception  IPv4 over vxlan6: PMTU exceptions
+   pmtu_ipv6_vxlan6_exception  IPv6 over vxlan6: PMTU exceptions
pmtu_vti6_exception vti6: PMTU exceptions
pmtu_vti4_exception vti4: PMTU exceptions
pmtu_vti4_default_mtu   vti4: default MTU assignment
@@ -95,8 +108,8 @@ ns_r2="ip netns exec ${NS_R2}"
 # Addresses are:
 # - IPv4: PREFIX4.SEGMENT.ID (/24)
 # - IPv6: PREFIX6:SEGMENT::ID (/64)
-prefix4="192.168"
-prefix6="fd00"
+prefix4="10.0"
+prefix6="fc00"
 a_r1=1
 a_r2=2
 b_r1=3
@@ -129,12 +142,12 @@ veth6_a_addr="fd00:1::a"
 veth6_b_addr="fd00:1::b"
 veth6_mask="64"
 
-vti4_a_addr="192.168.2.1"
-vti4_b_addr="192.168.2.2"
-vti4_mask="24"
-vti6_a_addr="fd00:2::a"
-vti6_b_addr="fd00:2::b"
-vti6_mask="64"
+tunnel4_a_addr="192.168.2.1"
+tunnel4_b_addr="192.168.2.2"
+tunnel4_mask="24"
+tunnel6_a_addr="fd00:2::a"
+tunnel6_b_addr="fd00:2::b"
+tunnel6_mask="64"
 
 dummy6_0_addr="fc00:1000::0"
 dummy6_1_addr="fc00:1001::0"
@@ -202,11 +215,34 @@ setup_vti() {
 }
 
 setup_vti4() {
-   setup_vti 4 ${veth4_a_addr} ${veth4_b_addr} ${vti4_a_addr} ${vti4_b_addr} ${vti4_mask}
+   setup_vti 4 ${veth4_a_addr} ${veth4_b_addr} ${tunnel4_a_addr} ${tunnel4_b_addr} ${tunnel4_mask}
 }
 
 setup_vti6() {
-   setup_vti 6 ${veth6_a_addr} ${veth6_b_addr} ${vti6_a_addr} ${vti6_b_addr} ${vti6_mask}
+   setup_vti 6 ${veth6_a_addr} ${veth6_b_addr} ${tunnel6_a_addr} ${tunnel6_b_addr} ${tunnel6_mask}
+}
+
+setup_vxlan() {
+   a_addr="${1}"
+   b_addr="${2}"
+
+   ${ns_a} ip link add vxlan_a type vxlan id 1 local ${a_addr} remote ${b_addr} ttl 64 dstport 4789 || return 1
+   ${ns_b} ip link add vxlan_b type vxlan id 1 local ${b_addr} remote ${a_addr} ttl 64 dstport 4789
+
+   ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask}   dev vxlan_a
+   ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask}   dev vxlan_b
+
+   ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask}   dev vxlan_a
+   ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask}   dev vxlan_b
+
+   ${ns_a} ip link set vxlan_a up
+   ${ns_b} ip link set vxlan_b up
+
+   sleep 1
+}
+
+setup_vxlan6() {
+   setup_vxlan ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1
 }
 
 setup_xfrm() {
@@ -465,6 +501,49 @@ test_pmtu_ipv6_exception() {
test_pmtu_ipvX 6
 }
 
+test_pmtu_ipvX_over_vxlan6_exception() {
+   family=${1}
+   ll_mtu=4000
+
+   setup namespaces routing vxlan6 || return 2
+   #  IPv6 header   UDP header   VxLAN header   Ethernet header
+   exp_mtu=$((${ll_mtu} - 40  - 8  - 8   - 14))
+
+   trace "${ns_a}" vxlan_a  "${ns_b}"  vxlan_b \
+     "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
+     "${ns_b}" veth_B-R1 "${ns_r1}" veth_R1-B
+
+   if [ ${family} -eq 4 ]; then
+   ping=ping
+   dst=${tunnel4_b_addr}
+   else
+   ping=${ping6}
+   dst=${tunnel6_b_addr}
+   fi
+
+   # Create route exception by exceeding link layer MTU
+   mtu "${ns_a}"  veth_A-R1 $((${ll_mtu} +

[PATCH net-next 06/11] geneve: Allow configuration of DF behaviour

2018-11-06 Thread Stefano Brivio

draft-ietf-nvo3-geneve-08 says:

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Now that ICMP error handling is working for GENEVE, we can comply with
this recommendation.

Make this configurable, though, to avoid breaking existing setups. By
default, DF won't be set. It can be set or inherited from inner IPv4
packets. If it's configured to be inherited and we are encapsulating IPv6,
it will be set.

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
 drivers/net/geneve.c | 52 +++-
 include/uapi/linux/if_link.h |  9 +++
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 8a69879d516a..cafdee06b5c8 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -70,6 +70,7 @@ struct geneve_dev {
bool   collect_md;
bool   use_udp6_rx_checksums;
bool   ttl_inherit;
+   enum ifla_geneve_df df;
 };
 
 struct geneve_sock {
@@ -898,7 +899,24 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
ttl = key->ttl;
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
}
+
df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+   if (!df) {
+   if (geneve->df == GENEVE_DF_SET) {
+   df = htons(IP_DF);
+   } else if (geneve->df == GENEVE_DF_INHERIT) {
+   struct ethhdr *eth = eth_hdr(skb);
+
+   if (ntohs(eth->h_proto) == ETH_P_IPV6) {
+   df = htons(IP_DF);
+   } else if (ntohs(eth->h_proto) == ETH_P_IP) {
+   struct iphdr *iph = ip_hdr(skb);
+
+   if (iph->frag_off & htons(IP_DF))
+   df = htons(IP_DF);
+   }
+   }
+   }
 
err = geneve_build_skb(&rt->dst, skb, info, xnet, sizeof(struct iphdr));
if (unlikely(err))
@@ -1145,6 +1163,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
[IFLA_GENEVE_UDP_ZERO_CSUM6_TX] = { .type = NLA_U8 },
[IFLA_GENEVE_UDP_ZERO_CSUM6_RX] = { .type = NLA_U8 },
[IFLA_GENEVE_TTL_INHERIT]   = { .type = NLA_U8 },
+   [IFLA_GENEVE_DF]= { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -1180,6 +1199,16 @@ static int geneve_validate(struct nlattr *tb[], struct nlattr *data[],
}
}
 
+   if (data[IFLA_GENEVE_DF]) {
+   enum ifla_geneve_df df = nla_get_u8(data[IFLA_GENEVE_DF]);
+
+   if (df < 0 || df > GENEVE_DF_MAX) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_GENEVE_DF],
+   "Invalid DF attribute");
+   return -EINVAL;
+   }
+   }
+
return 0;
 }
 
@@ -1225,7 +1254,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
struct netlink_ext_ack *extack,
const struct ip_tunnel_info *info,
bool metadata, bool ipv6_rx_csum,
-   bool ttl_inherit)
+   bool ttl_inherit, enum ifla_geneve_df df)
 {
struct geneve_net *gn = net_generic(net, geneve_net_id);
struct geneve_dev *t, *geneve = netdev_priv(dev);
@@ -1275,6 +1304,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
geneve->collect_md = metadata;
geneve->use_udp6_rx_checksums = ipv6_rx_csum;
geneve->ttl_inherit = ttl_inherit;
+   geneve->df = df;
 
err = register_netdevice(dev);
if (err)
@@ -1294,7 +1324,7 @@ static int geneve_nl2info(struct nlattr *tb[], struct nlattr *data[],
  struct netlink_ext_ack *extack,
  struct ip_tunnel_info *info, bool *metadata,
  bool *use_udp6_rx_checksums, bool *ttl_inherit,
- bool changelink)
+ enum ifla_geneve_df *df, bool changelink)
 {
int attrtype;
 
@@ -1382,6 +1412,9 @@ static int geneve_nl2info(struct nlattr *tb[], struct nlattr *data[],
if (data[IFLA_GENEVE_TOS])
info->key.tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
 
+   if (data[IFLA_GENEVE_DF])
+   *df = nla_get_u8(data[IFLA_GENEVE_DF]);
+
if (data[IFLA_GENEVE_LABEL]) {
info->key.label = nla_get_be32(data[IFLA_GENEVE_LABEL]) &
  IPV6_FLOWLABEL_MASK;
@@ -1500,6 +1533,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev,

[PATCH net-next 00/11] ICMP error handling for UDP tunnels

2018-11-06 Thread Stefano Brivio

This series introduces ICMP error handling for UDP tunnels and
encapsulations and related selftests. We need to handle ICMP errors to
support PMTU discovery and route redirection -- this support is entirely
missing right now:

- patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
  the same destination port on both endpoints -- i.e. VxLAN and GENEVE
- patches 2/11 to 7/11 are specific to VxLAN and GENEVE
- patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
  where sent packets cannot be matched via receiving socket lookup, i.e.
  FoU and GUE
- patches 10/11 and 11/11 are specific to FoU and GUE
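The idea behind patch 1/11 can be illustrated with a simplified model. An ICMP error quotes a packet we sent, so its source port is our (often hash-derived) source port rather than the port our tunnel socket listens on. Since VxLAN and GENEVE use the same destination port on both endpoints, the quoted destination port can be matched against local tunnel sockets instead. The table layout and icmp_err_lookup() below are hypothetical and do not reflect the kernel's socket lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket table entry: a UDP socket bound to a local port,
 * flagged if it belongs to a tunnel using the same destination port on
 * both endpoints (VxLAN, GENEVE). */
struct udp_sock_entry {
	uint16_t local_port;
	int same_dport_tunnel;
};

/* UDP header quoted inside the ICMP error: a packet *we* sent. */
struct quoted_udp {
	uint16_t sport; /* source port chosen by the sender */
	uint16_t dport; /* tunnel port, e.g. 4789 for VxLAN */
};

static const struct udp_sock_entry *
icmp_err_lookup(const struct udp_sock_entry *tbl, size_t n,
		const struct quoted_udp *q)
{
	size_t i;

	assert(tbl != NULL || n == 0);

	/* Regular lookup: the quoted source port is our local port. */
	for (i = 0; i < n; i++)
		if (tbl[i].local_port == q->sport)
			return &tbl[i];

	/* Tunnel fallback: the peer's port equals the one we listen on,
	 * so match the quoted destination port against tunnel sockets. */
	for (i = 0; i < n; i++)
		if (tbl[i].same_dport_tunnel && tbl[i].local_port == q->dport)
			return &tbl[i];

	return NULL;
}
```

The regular lookup is tried first so that ordinary UDP sockets keep their existing behaviour; the fallback only considers sockets flagged as same-port tunnels.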

Stefano Brivio (11):
  udp: Handle ICMP errors for tunnels with same destination port on both
endpoints
  vxlan: ICMP error lookup handler
  vxlan: Allow configuration of DF behaviour
  selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6
  geneve: ICMP error lookup handler
  geneve: Allow configuration of DF behaviour
  selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv6
  net: Convert protocol error handlers from void to int
  udp: Support for error handlers of tunnels with arbitrary destination
port
  fou, fou6: ICMP error handlers for FoU and GUE
  selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests

 drivers/net/geneve.c| 104 -
 drivers/net/vxlan.c |  58 +
 include/linux/udp.h |   1 +
 include/net/icmp.h  |   2 +-
 include/net/ip6_tunnel.h|   2 +
 include/net/ip_tunnels.h|   1 +
 include/net/protocol.h  |   9 +-
 include/net/sctp/sctp.h |   2 +-
 include/net/tcp.h   |   2 +-
 include/net/udp.h   |   2 +-
 include/net/udp_tunnel.h|   3 +
 include/net/vxlan.h |   1 +
 include/uapi/linux/if_link.h|  18 ++
 net/dccp/ipv4.c |  13 +-
 net/dccp/ipv6.c |  13 +-
 net/ipv4/fou.c  |  68 ++
 net/ipv4/gre_demux.c|   9 +-
 net/ipv4/icmp.c |   6 +-
 net/ipv4/ip_gre.c   |  48 ++--
 net/ipv4/ipip.c |  14 +-
 net/ipv4/protocol.c |   1 +
 net/ipv4/tcp_ipv4.c |  22 +-
 net/ipv4/tunnel4.c  |  18 +-
 net/ipv4/udp.c  | 119 --
 net/ipv4/udp_impl.h |   2 +-
 net/ipv4/udp_tunnel.c   |   1 +
 net/ipv4/udplite.c  |   4 +-
 net/ipv4/xfrm4_protocol.c   |  18 +-
 net/ipv6/fou6.c |  74 +++
 net/ipv6/icmp.c |   4 +-
 net/ipv6/ip6_gre.c  |  18 +-
 net/ipv6/tcp_ipv6.c |  13 +-
 net/ipv6/tunnel6.c  |  12 +-
 net/ipv6/udp.c  | 141 ++--
 net/ipv6/udp_impl.h |   4 +-
 net/ipv6/udplite.c  |   5 +-
 net/ipv6/xfrm6_protocol.c   |  18 +-
 net/sctp/input.c|   5 +-
 net/sctp/ipv6.c |   7 +-
 tools/testing/selftests/net/pmtu.sh | 326 ++--
 40 files changed, 1025 insertions(+), 163 deletions(-)

-- 
2.19.1

[PATCH net-next 07/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv6

2018-11-06 Thread Stefano Brivio

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.
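The per-packet overhead that the selftest subtracts from ll_mtu can be checked with a small computation. This sketch assumes GENEVE without options (an 8-byte base header, the same size as the VxLAN header); the helper name expected_tunnel_pmtu() is hypothetical.

```c
#include <assert.h>

/* Overhead of Ethernet-over-UDP tunnels over IPv6, as counted in the
 * selftest comment. GENEVE options are assumed absent. */
#define IPV6_HDR_LEN   40
#define UDP_HDR_LEN     8
#define TUN_HDR_LEN     8  /* VxLAN header, or base GENEVE header */
#define INNER_ETH_LEN  14

/* Expected PMTU on the tunnel device for an underlay MTU of ll_mtu. */
static int expected_tunnel_pmtu(int ll_mtu)
{
	int overhead = IPV6_HDR_LEN + UDP_HDR_LEN + TUN_HDR_LEN + INNER_ETH_LEN;

	assert(ll_mtu > overhead);
	return ll_mtu - overhead;
}
```

With ll_mtu=4000, as in the test, the overhead is 70 bytes and the expected PMTU is 3930.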

Reviewed-by: Sabrina Dubroca 
Signed-off-by: Stefano Brivio 
---
This only introduces tests over GENEVE over IPv6 right now. I'll introduce
tests over IPv4 (they can be added trivially) once DF configuration support
is accepted into iproute2.

 tools/testing/selftests/net/pmtu.sh | 78 -
 1 file changed, 55 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 19ede74af560..e9bb0c37bdfc 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -37,6 +37,12 @@
 # - pmtu_ipv6_vxlan6_exception
 #  Same as pmtu_ipv4_vxlan6_exception, but send IPv6 packets from A to B
 #
+# - pmtu_ipv4_geneve6_exception
+#  Same as pmtu_ipv4_vxlan6, but using a GENEVE tunnel instead of VxLAN
+#
+# - pmtu_ipv6_geneve6_exception
+#  Same as pmtu_ipv6_vxlan6, but using a GENEVE tunnel instead of VxLAN
+#
 # - pmtu_vti4_exception
 #  Set up vti tunnel on top of veth, with xfrm states and policies, in two
 #  namespaces with matching endpoints. Check that route exception is not
@@ -85,6 +91,8 @@ tests="
pmtu_ipv6_exception ipv6: PMTU exceptions
pmtu_ipv4_vxlan6_exception  IPv4 over vxlan6: PMTU exceptions
pmtu_ipv6_vxlan6_exception  IPv6 over vxlan6: PMTU exceptions
+   pmtu_ipv4_geneve6_exception IPv4 over geneve6: PMTU exceptions
+   pmtu_ipv6_geneve6_exception IPv6 over geneve6: PMTU exceptions
pmtu_vti6_exception vti6: PMTU exceptions
pmtu_vti4_exception vti4: PMTU exceptions
pmtu_vti4_default_mtu   vti4: default MTU assignment
@@ -222,27 +230,42 @@ setup_vti6() {
setup_vti 6 ${veth6_a_addr} ${veth6_b_addr} ${tunnel6_a_addr} ${tunnel6_b_addr} ${tunnel6_mask}
 }
 
-setup_vxlan() {
-   a_addr="${1}"
-   b_addr="${2}"
+setup_vxlan_or_geneve() {
+   type="${1}"
+   a_addr="${2}"
+   b_addr="${3}"
+
+   if [ "${type}" = "vxlan" ]; then
+   opts="ttl 64 dstport 4789"
+   opts_a="local ${a_addr}"
+   opts_b="local ${b_addr}"
+   else
+   opts=""
+   opts_a=""
+   opts_b=""
+   fi
 
-   ${ns_a} ip link add vxlan_a type vxlan id 1 local ${a_addr} remote ${b_addr} ttl 64 dstport 4789 || return 1
-   ${ns_b} ip link add vxlan_b type vxlan id 1 local ${b_addr} remote ${a_addr} ttl 64 dstport 4789
+   ${ns_a} ip link add ${type}_a type ${type} id 1 ${opts_a} remote ${b_addr} ${opts} || return 1
+   ${ns_b} ip link add ${type}_b type ${type} id 1 ${opts_b} remote ${a_addr} ${opts}
 
-   ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask}   dev vxlan_a
-   ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask}   dev vxlan_b
+   ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${type}_a
+   ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${type}_b
 
-   ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask}   dev vxlan_a
-   ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask}   dev vxlan_b
+   ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${type}_a
+   ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${type}_b
 
-   ${ns_a} ip link set vxlan_a up
-   ${ns_b} ip link set vxlan_b up
+   ${ns_a} ip link set ${type}_a up
+   ${ns_b} ip link set ${type}_b up
 
sleep 1
 }
 
+setup_geneve6() {
+   setup_vxlan_or_geneve geneve ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1
+}
+
 setup_vxlan6() {
-   setup_vxlan ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1
+   setup_vxlan_or_geneve vxlan ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1
 }
 
 setup_xfrm() {
@@ -501,15 +524,16 @@ test_pmtu_ipv6_exception() {
test_pmtu_ipvX 6
 }
 
-test_pmtu_ipvX_over_vxlan6_exception() {
-   family=${1}
+test_pmtu_ipvX_over_vxlan6_or_geneve6_exception() {
+   type=${1}
+   family=${2}
ll_mtu=4000
 
-   setup namespaces routing vxlan6 || return 2
-   #  IPv6 header   UDP header   VxLAN header   Ethernet header
-   exp_mtu=$((${ll_mtu} - 40  - 8  - 8   - 14))
+   setup namespaces routing ${type}6 || return 2
+   #  IPv6 header   UDP header   VxLAN/GENEVE header   Ethernet header
+   exp_mtu=$((${ll_mtu} - 40  - 8  - 8   - 14))
 
-   trace "${ns_a}" vxlan_a  "${ns_b}"  vxlan_b \
+   trace "${ns_a}" ${type}_a "${ns_b}"  ${type}_b \
      "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
      "${ns_b}" veth_B-R1 "${ns_r1}" veth_R1-B
 
@@ -527,21 +551,29 @@ test_pmtu_ipvX_over_vxlan6_exception() {
mtu "${ns_b}"  veth_B-R1

Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Doug Ledford

On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> From: Yishai Hadas 
> 
> Adapt XRC SRQ to the latest HW specification with fixed definition
> around umem valid bits. The previous definition relied on a bit which
> was taken for other purposes in legacy FW.
> 
> Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> Signed-off-by: Yishai Hadas 
> Reviewed-by: Artemy Kovalyov 
> Signed-off-by: Leon Romanovsky 
> ---
> Hi Doug, Jason
> 
> This commit fixes code sent in this merge window, so I'm not marking it
> with any rdma-rc/rdma-next. It will be better to be sent during this merge
> window if you have extra pull request to issue, or as a -rc material, if
> not.
> 
> BTW, we didn't combine reserved fields, because our convention is to align such
> fields to 32 bits for better readability.
> 
> Thanks

This looks fine.  Let me know when it's in the mlx5-next tree to pull.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



[PATCH net-next 4/5] net: dsa: bcm_sf2: Get rid of unmarshalling functions

2018-11-06 Thread Florian Fainelli

Now that we have migrated the CFP rule handling to a list with a
software copy, the delete/get operation just returns what is on the
list, no need to read from the hardware which is both slow and more
error prone.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2_cfp.c | 310 --
 1 file changed, 310 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 5034e4f56fd5..8f734abde7b3 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -974,316 +974,6 @@ static void bcm_sf2_invert_masks(struct ethtool_rx_flow_spec *flow)
flow->m_ext.data[1] ^= cpu_to_be32(~0);
 }
 
-static int __maybe_unused bcm_sf2_cfp_unslice_ipv4(struct bcm_sf2_priv *priv,
-  struct ethtool_tcpip4_spec *v4_spec,
-  bool mask)
-{
-   u32 reg, offset, ipv4;
-   u16 src_dst_port;
-
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(3);
-   else
-   offset = CORE_CFP_DATA_PORT(3);
-
-   reg = core_readl(priv, offset);
-   /* src port [15:8] */
-   src_dst_port = reg << 8;
-
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(2);
-   else
-   offset = CORE_CFP_DATA_PORT(2);
-
-   reg = core_readl(priv, offset);
-   /* src port [7:0] */
-   src_dst_port |= (reg >> 24);
-
-   v4_spec->pdst = cpu_to_be16(src_dst_port);
-   v4_spec->psrc = cpu_to_be16((u16)(reg >> 8));
-
-   /* IPv4 dst [15:8] */
-   ipv4 = (reg & 0xff) << 8;
-
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(1);
-   else
-   offset = CORE_CFP_DATA_PORT(1);
-
-   reg = core_readl(priv, offset);
-   /* IPv4 dst [31:16] */
-   ipv4 |= ((reg >> 8) & 0xffff) << 16;
-   /* IPv4 dst [7:0] */
-   ipv4 |= (reg >> 24) & 0xff;
-   v4_spec->ip4dst = cpu_to_be32(ipv4);
-
-   /* IPv4 src [15:8] */
-   ipv4 = (reg & 0xff) << 8;
-
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(0);
-   else
-   offset = CORE_CFP_DATA_PORT(0);
-   reg = core_readl(priv, offset);
-
-   /* Once the TCAM is programmed, the mask reflects the slice number
-* being matched, don't bother checking it when reading back the
-* mask spec
-*/
-   if (!mask && !(reg & SLICE_VALID))
-   return -EINVAL;
-
-   /* IPv4 src [7:0] */
-   ipv4 |= (reg >> 24) & 0xff;
-   /* IPv4 src [31:16] */
-   ipv4 |= ((reg >> 8) & 0xffff) << 16;
-   v4_spec->ip4src = cpu_to_be32(ipv4);
-
-   return 0;
-}
-
-static int bcm_sf2_cfp_ipv4_rule_get(struct bcm_sf2_priv *priv, int port,
-struct ethtool_rx_flow_spec *fs)
-{
-   struct ethtool_tcpip4_spec *v4_spec = NULL, *v4_m_spec = NULL;
-   u32 reg;
-   int ret;
-
-   reg = core_readl(priv, CORE_CFP_DATA_PORT(6));
-
-   switch ((reg & IPPROTO_MASK) >> IPPROTO_SHIFT) {
-   case IPPROTO_TCP:
-   fs->flow_type = TCP_V4_FLOW;
-   v4_spec = &fs->h_u.tcp_ip4_spec;
-   v4_m_spec = &fs->m_u.tcp_ip4_spec;
-   break;
-   case IPPROTO_UDP:
-   fs->flow_type = UDP_V4_FLOW;
-   v4_spec = &fs->h_u.udp_ip4_spec;
-   v4_m_spec = &fs->m_u.udp_ip4_spec;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   fs->m_ext.data[0] = cpu_to_be32((reg >> IP_FRAG_SHIFT) & 1);
-   v4_spec->tos = (reg >> IPTOS_SHIFT) & IPTOS_MASK;
-
-   ret = bcm_sf2_cfp_unslice_ipv4(priv, v4_spec, false);
-   if (ret)
-   return ret;
-
-   return bcm_sf2_cfp_unslice_ipv4(priv, v4_m_spec, true);
-}
-
-static int __maybe_unused bcm_sf2_cfp_unslice_ipv6(struct bcm_sf2_priv *priv,
-  __be32 *ip6_addr,
-  __be16 *port,
-  bool mask)
-{
-   u32 reg, tmp, offset;
-
-   /* C-Tag[31:24]
-* UDF_n_B8 [23:8] (port)
-* UDF_n_B7 (upper) [7:0] (addr[15:8])
-*/
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(4);
-   else
-   offset = CORE_CFP_DATA_PORT(4);
-   reg = core_readl(priv, offset);
-   *port = cpu_to_be32(reg) >> 8;
-   tmp = (u32)(reg & 0xff) << 8;
-
-   /* UDF_n_B7 (lower) [31:24] (addr[7:0])
-* UDF_n_B6 [23:8] (addr[31:16])
-* UDF_n_B5 (upper) [7:0] (addr[47:40])
-*/
-   if (mask)
-   offset = CORE_CFP_MASK_PORT(3);
-   else
-   offset = CORE_CFP_DATA_PORT(3);
-   reg = core_readl(priv, offset);
-   tmp |= (reg >> 24) & 0xff;
-   tmp |= (u32)((reg >> 8) << 16);
-   ip6_addr[3] = cpu_to_be32(tmp);
-   tmp = (u32)(reg & 0xff) << 8;
-
-

[PATCH net-next 5/5] net: systemport: Restore Broadcom tag match filters upon resume

2018-11-06 Thread Florian Fainelli

Some of the system suspend states that we support entirely wipe out the
HW contents. If we had a Wake-on-LAN filter programmed prior to going
into suspend, but we did not actually wake up from Wake-on-LAN and
instead used a deeper suspend state, make sure we restore the CID number
that we need to match against.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 12 
 drivers/net/ethernet/broadcom/bcmsysport.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 0e2d99c737e3..2e60dda32adc 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1068,6 +1068,7 @@ static void mpd_enable_set(struct bcm_sysport_priv *priv, bool enable)
 
 static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
 {
+   unsigned int index;
u32 reg;
 
/* Disable RXCHK, active filters and Broadcom tag matching */
@@ -1076,6 +1077,15 @@ static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
 RXCHK_BRCM_TAG_MATCH_SHIFT | RXCHK_EN | RXCHK_BRCM_TAG_EN);
rxchk_writel(priv, reg, RXCHK_CONTROL);
 
+   /* Make sure we restore correct CID index in case HW lost
+* its context during deep idle state
+*/
+   for_each_set_bit(index, priv->filters, RXCHK_BRCM_TAG_MAX) {
+   rxchk_writel(priv, priv->filters_loc[index] <<
+RXCHK_BRCM_TAG_CID_SHIFT, RXCHK_BRCM_TAG(index));
+   rxchk_writel(priv, 0xff00, RXCHK_BRCM_TAG_MASK(index));
+   }
+
/* Clear the MagicPacket detection logic */
mpd_enable_set(priv, false);
 
@@ -2189,6 +2199,7 @@ static int bcm_sysport_rule_set(struct bcm_sysport_priv *priv,
rxchk_writel(priv, reg, RXCHK_BRCM_TAG(index));
rxchk_writel(priv, 0xff00, RXCHK_BRCM_TAG_MASK(index));
 
+   priv->filters_loc[index] = nfc->fs.location;
set_bit(index, priv->filters);
 
return 0;
@@ -2208,6 +2219,7 @@ static int bcm_sysport_rule_del(struct bcm_sysport_priv *priv,
 * be taken care of during suspend time by bcm_sysport_suspend_to_wol
 */
clear_bit(index, priv->filters);
+   priv->filters_loc[index] = 0;
 
return 0;
 }
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index a7a230884a87..7a0b7bfedd19 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -786,6 +786,7 @@ struct bcm_sysport_priv {
/* Ethtool */
u32 msg_enable;
DECLARE_BITMAP(filters, RXCHK_BRCM_TAG_MAX);
+   u32 filters_loc[RXCHK_BRCM_TAG_MAX];
 
struct bcm_sysport_stats64  stats64;
 
-- 
2.17.1

[PATCH net-next 2/5] net: dsa: bcm_sf2: Split rule handling from HW operation

2018-11-06 Thread Florian Fainelli

In preparation for restoring CFP rules during system-wide
suspend/resume, where the hardware loses its context, split rule
validation from the actual insertion, and rule removal from the actual
hardware deletion.
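The split can be sketched in isolation: a hardware-only insert step that can be replayed later, and a user-facing set step that validates, allocates the software copy, and then calls it. Everything below (struct cfp_state, rule_insert_hw(), rule_set(), NUM_SLOTS) is a hypothetical simplification of bcm_sf2_cfp_rule_insert() and bcm_sf2_cfp_rule_set(), not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NUM_SLOTS 16 /* hypothetical CFP table size */

struct cfp_state {
	int used[NUM_SLOTS];      /* software bookkeeping, like priv->cfp.used */
	int hw_valid[NUM_SLOTS];  /* simulated hardware valid bits */
	int hw_cookie[NUM_SLOTS]; /* simulated hardware action */
};

struct rule {
	unsigned int loc;
	int cookie;
};

/* Hardware-only step: no validation, no allocation, so it can be
 * replayed later from a cached rule (e.g. on resume). */
static int rule_insert_hw(struct cfp_state *s, const struct rule *r)
{
	s->hw_cookie[r->loc] = r->cookie;
	s->hw_valid[r->loc] = 1;
	return 0;
}

/* User-facing step: validate, keep a software copy, then program HW. */
static int rule_set(struct cfp_state *s, unsigned int loc, int cookie,
		    struct rule **out)
{
	struct rule *r;
	int ret;

	assert(s != NULL && out != NULL);

	if (loc >= NUM_SLOTS)
		return -EINVAL;
	if (s->used[loc])
		return -EBUSY;

	r = calloc(1, sizeof(*r));
	if (!r)
		return -ENOMEM;
	r->loc = loc;
	r->cookie = cookie;

	ret = rule_insert_hw(s, r);
	if (ret) {
		free(r);
		return ret;
	}

	s->used[loc] = 1;
	*out = r;
	return 0;
}
```

Keeping rule_insert_hw() free of checks is what makes it safe to call again for a rule that was already validated once.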

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2_cfp.c | 89 +--
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 29b6b4204662..396bfa43c2e1 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -789,32 +789,14 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
return ret;
 }
 
-static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
-   struct ethtool_rx_flow_spec *fs)
+static int bcm_sf2_cfp_rule_insert(struct dsa_switch *ds, int port,
+  struct ethtool_rx_flow_spec *fs)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
s8 cpu_port = ds->ports[port].cpu_dp->index;
__u64 ring_cookie = fs->ring_cookie;
unsigned int queue_num, port_num;
-   struct cfp_rule *rule = NULL;
-   int ret = -EINVAL;
-
-   /* Check for unsupported extensions */
-   if ((fs->flow_type & FLOW_EXT) && (fs->m_ext.vlan_etype ||
-fs->m_ext.data[1]))
-   return -EINVAL;
-
-   if (fs->location != RX_CLS_LOC_ANY &&
-   test_bit(fs->location, priv->cfp.used))
-   return -EBUSY;
-
-   if (fs->location != RX_CLS_LOC_ANY &&
-   fs->location > bcm_sf2_cfp_rule_size(priv))
-   return -EINVAL;
-
-   ret = bcm_sf2_cfp_rule_cmp(priv, port, fs);
-   if (ret == 0)
-   return -EEXIST;
+   int ret;
 
/* This rule is a Wake-on-LAN filter and we must specifically
 * target the CPU port in order for it to be working.
@@ -841,10 +823,6 @@ static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
if (port_num >= 7)
port_num -= 1;
 
-   rule = kzalloc(sizeof(*rule), GFP_KERNEL);
-   if (!rule)
-   return -ENOMEM;
-
switch (fs->flow_type & ~FLOW_EXT) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
@@ -861,6 +839,38 @@ static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
break;
}
 
+   return ret;
+}
+
+static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
+   struct ethtool_rx_flow_spec *fs)
+{
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   struct cfp_rule *rule = NULL;
+   int ret = -EINVAL;
+
+   /* Check for unsupported extensions */
+   if ((fs->flow_type & FLOW_EXT) && (fs->m_ext.vlan_etype ||
+fs->m_ext.data[1]))
+   return -EINVAL;
+
+   if (fs->location != RX_CLS_LOC_ANY &&
+   test_bit(fs->location, priv->cfp.used))
+   return -EBUSY;
+
+   if (fs->location != RX_CLS_LOC_ANY &&
+   fs->location > bcm_sf2_cfp_rule_size(priv))
+   return -EINVAL;
+
+   ret = bcm_sf2_cfp_rule_cmp(priv, port, fs);
+   if (ret == 0)
+   return -EEXIST;
+
+   rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+   if (!rule)
+   return -ENOMEM;
+
+   ret = bcm_sf2_cfp_rule_insert(ds, port, fs);
if (ret) {
kfree(rule);
return ret;
@@ -910,13 +920,28 @@ static int bcm_sf2_cfp_rule_del_one(struct bcm_sf2_priv *priv, int port,
return 0;
 }
 
-static int bcm_sf2_cfp_rule_del(struct bcm_sf2_priv *priv, int port,
-   u32 loc)
+static int bcm_sf2_cfp_rule_remove(struct bcm_sf2_priv *priv, int port,
+  u32 loc)
 {
-   struct cfp_rule *rule;
u32 next_loc = 0;
int ret;
 
+   ret = bcm_sf2_cfp_rule_del_one(priv, port, loc, &next_loc);
+   if (ret)
+   return ret;
+
+   /* If this was an IPv6 rule, delete is companion rule too */
+   if (next_loc)
+   ret = bcm_sf2_cfp_rule_del_one(priv, port, next_loc, NULL);
+
+   return ret;
+}
+
+static int bcm_sf2_cfp_rule_del(struct bcm_sf2_priv *priv, int port, u32 loc)
+{
+   struct cfp_rule *rule;
+   int ret;
+
/* Refuse deleting unused rules, and those that are not unique since
 * that could leave IPv6 rules with one of the chained rule in the
 * table.
@@ -928,13 +953,7 @@ static int bcm_sf2_cfp_rule_del(struct bcm_sf2_priv *priv, int port,
if (!rule)
return -EINVAL;
 
-   ret = bcm_sf2_cfp_rule_del_one(priv, port, loc, &next_loc);
-   if (ret)
-   return ret;
-
-   /* If this was an IPv6 rule, delete is companion rule too */
-   if (next_loc)
-   ret = bcm_sf2_cfp_rule_del_one(priv, port, next_loc, NULL);
+   ret = bcm_sf2_cfp_rule_remove(priv, port,

[PATCH net-next 3/5] net: dsa: bcm_sf2: Restore CFP rules during system resume

2018-11-06 Thread Florian Fainelli

The hardware can lose its context during system suspend. Depending on
the switch generation (7445 vs. 7278), the rules may still be present,
but with their valid bit cleared (because that is the fastest way for
the HW to reset things). Make sure we re-apply them when coming back
from resume. The 7445 switch is an older version of the core with some
quirky RAM technology that requires a delete followed by a re-insert to
guarantee the RAM entries are properly latched.
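Under the same kind of simplification, resume amounts to walking the cached rule list and re-programming each entry, removing it first as the 7445 requires. This sketch simulates the TCAM with arrays and omits the CFP disable and reset that bcm_sf2_cfp_resume() performs before replaying; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 16 /* hypothetical CFP table size */

struct rule {
	unsigned int loc;
	int cookie;
	struct rule *next; /* software list of installed rules */
};

struct hw_table {
	int valid[NUM_SLOTS];
	int cookie[NUM_SLOTS];
};

static void hw_remove(struct hw_table *hw, unsigned int loc)
{
	hw->valid[loc] = 0;
	hw->cookie[loc] = 0;
}

static void hw_insert(struct hw_table *hw, const struct rule *r)
{
	hw->cookie[r->loc] = r->cookie;
	hw->valid[r->loc] = 1;
}

/* Replay the cached rules after resume: delete each entry, then insert
 * it again. Returns the number of rules restored. */
static int cfp_resume(struct hw_table *hw, const struct rule *list)
{
	const struct rule *r;
	int n = 0;

	for (r = list; r != NULL; r = r->next) {
		assert(r->loc < NUM_SLOTS);
		hw_remove(hw, r->loc);
		hw_insert(hw, r);
		n++;
	}
	return n;
}
```

Because the software list is the source of truth, the replay does not depend on what survived in the hardware.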

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c |  4 
 drivers/net/dsa/bcm_sf2.h |  1 +
 drivers/net/dsa/bcm_sf2_cfp.c | 36 +++
 3 files changed, 41 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 89722dbfafb8..d8b93043b789 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -710,6 +710,10 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
return ret;
}
 
+   ret = bcm_sf2_cfp_resume(ds);
+   if (ret)
+   return ret;
+
if (priv->hw_params.num_gphy == 1)
bcm_sf2_gphy_enable_set(ds, true);
 
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index 03444982c25e..faaef320ec48 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -215,5 +215,6 @@ int bcm_sf2_set_rxnfc(struct dsa_switch *ds, int port,
  struct ethtool_rxnfc *nfc);
 int bcm_sf2_cfp_rst(struct bcm_sf2_priv *priv);
 void bcm_sf2_cfp_exit(struct dsa_switch *ds);
+int bcm_sf2_cfp_resume(struct dsa_switch *ds);
 
 #endif /* __BCM_SF2_H */
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 396bfa43c2e1..5034e4f56fd5 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -1443,3 +1443,39 @@ void bcm_sf2_cfp_exit(struct dsa_switch *ds)
list_for_each_entry_safe_reverse(rule, n, &priv->cfp.rules_list, next)
bcm_sf2_cfp_rule_del(priv, rule->port, rule->fs.location);
 }
+
+int bcm_sf2_cfp_resume(struct dsa_switch *ds)
+{
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   struct cfp_rule *rule;
+   int ret = 0;
+   u32 reg;
+
+   if (list_empty(&priv->cfp.rules_list))
+   return ret;
+
+   reg = core_readl(priv, CORE_CFP_CTL_REG);
+   reg &= ~CFP_EN_MAP_MASK;
+   core_writel(priv, reg, CORE_CFP_CTL_REG);
+
+   ret = bcm_sf2_cfp_rst(priv);
+   if (ret)
+   return ret;
+
+   list_for_each_entry(rule, &priv->cfp.rules_list, next) {
+   ret = bcm_sf2_cfp_rule_remove(priv, rule->port,
+ rule->fs.location);
+   if (ret) {
+   dev_err(ds->dev, "failed to remove rule\n");
+   return ret;
+   }
+
+   ret = bcm_sf2_cfp_rule_insert(ds, rule->port, &rule->fs);
+   if (ret) {
+   dev_err(ds->dev, "failed to restore rule\n");
+   return ret;
+   }
+   };
+
+   return ret;
+}
-- 
2.17.1

[PATCH net-next 0/5] net: dsa: bcm_sf2: Store rules in lists

2018-11-06 Thread Florian Fainelli

Hi all,

This patch series changes the bcm-sf2 driver to keep a copy of the
inserted rules as opposed to using the HW as a storage area for a number
of reasons:

- this lets us detect duplicate rules faster; previously that would have
  required a full rule read from the hardware

- this helps with Pablo's ongoing work to convert ethtool_rx_flow_spec
  to a more generic flow rule structure, by reducing the number of code
  paths to convert to the new structure/helpers

- we need cached copies to restore the rules when the driver resumes,
  because depending on the low power mode the system has entered, the
  switch may have lost all of its context

Florian Fainelli (5):
  net: dsa: bcm_sf2: Keep copy of inserted rules
  net: dsa: bcm_sf2: Split rule handling from HW operation
  net: dsa: bcm_sf2: Restore CFP rules during system resume
  net: dsa: bcm_sf2: Get rid of unmarshalling functions
  net: systemport: Restore Broadcom tag match filters upon resume

 drivers/net/dsa/bcm_sf2.c  |   6 +
 drivers/net/dsa/bcm_sf2.h  |   3 +
 drivers/net/dsa/bcm_sf2_cfp.c  | 497 -
 drivers/net/ethernet/broadcom/bcmsysport.c |  12 +
 drivers/net/ethernet/broadcom/bcmsysport.h |   1 +
 5 files changed, 204 insertions(+), 315 deletions(-)

-- 
2.17.1

[PATCH net-next 1/5] net: dsa: bcm_sf2: Keep copy of inserted rules

2018-11-06 Thread Florian Fainelli

We tried hard to use the hardware as a storage area. That did not
require any driver-level allocations, which was nice, but it made things
needlessly complex, because we had to both marshal and unmarshal the
ethtool_rx_flow_spec into and from what the CFP hardware understands.

Keep a copy of the ethtool_rx_flow_spec rule we want to insert, and also
make sure we don't have a duplicate rule already. This greatly speeds up
the deletion time since we only need to clear the slice's valid bit and
not perform a full read.
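A minimal sketch of the two list operations this enables: lookup by (port, location) and duplicate detection by comparing match and mask. It uses a plain singly linked list instead of struct list_head, and a trimmed-down hypothetical flow_spec in place of struct ethtool_rx_flow_spec. It compares only flow type, match and mask (the patch also compares ring_cookie and m_ext.data[0]). rule_find() returns NULL explicitly when nothing matches; with list_for_each_entry(), the cursor does not point at a valid entry once the loop runs to completion, so a not-found case needs the same explicit handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct ethtool_rx_flow_spec. */
struct flow_spec {
	unsigned int flow_type;
	unsigned int location;
	unsigned char match[16]; /* stands in for the h_u union */
	unsigned char mask[16];  /* stands in for the m_u union */
};

struct sw_rule {
	int port;
	struct flow_spec fs;
	struct sw_rule *next;
};

/* Lookup by (port, location); returns NULL when no rule matches. */
static struct sw_rule *rule_find(struct sw_rule *head, int port,
				 unsigned int loc)
{
	struct sw_rule *r;

	for (r = head; r != NULL; r = r->next)
		if (r->port == port && r->fs.location == loc)
			return r;
	return NULL;
}

/* Duplicate check: same port, flow type, match and mask; the location
 * is deliberately ignored. */
static int rule_is_duplicate(const struct sw_rule *head, int port,
			     const struct flow_spec *fs)
{
	const struct sw_rule *r;

	assert(fs != NULL);
	for (r = head; r != NULL; r = r->next) {
		if (r->port != port || r->fs.flow_type != fs->flow_type)
			continue;
		if (!memcmp(r->fs.match, fs->match, sizeof(fs->match)) &&
		    !memcmp(r->fs.mask, fs->mask, sizeof(fs->mask)))
			return 1;
	}
	return 0;
}
```

Deletion then only needs rule_find() plus clearing the hardware valid bit, which is where the speed-up described above comes from.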

This is a preparatory step for being able to restore rules upon system
resumption where the hardware loses its context partially or entirely.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c |   2 +
 drivers/net/dsa/bcm_sf2.h |   2 +
 drivers/net/dsa/bcm_sf2_cfp.c | 144 +++---
 3 files changed, 137 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 2eb68769562c..89722dbfafb8 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1061,6 +1061,7 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
spin_lock_init(&priv->indir_lock);
mutex_init(&priv->stats_mutex);
mutex_init(&priv->cfp.lock);
+   INIT_LIST_HEAD(&priv->cfp.rules_list);
 
/* CFP rule #0 cannot be used for specific classifications, flag it as
 * permanently used
@@ -1166,6 +1167,7 @@ static int bcm_sf2_sw_remove(struct platform_device *pdev)
 
priv->wol_ports_mask = 0;
dsa_unregister_switch(priv->dev->ds);
+   bcm_sf2_cfp_exit(priv->dev->ds);
/* Disable all ports and interrupts */
bcm_sf2_sw_suspend(priv->dev->ds);
bcm_sf2_mdio_unregister(priv);
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index cc31e986e6e3..03444982c25e 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -56,6 +56,7 @@ struct bcm_sf2_cfp_priv {
DECLARE_BITMAP(used, CFP_NUM_RULES);
DECLARE_BITMAP(unique, CFP_NUM_RULES);
unsigned int rules_cnt;
+   struct list_head rules_list;
 };
 
 struct bcm_sf2_priv {
@@ -213,5 +214,6 @@ int bcm_sf2_get_rxnfc(struct dsa_switch *ds, int port,
 int bcm_sf2_set_rxnfc(struct dsa_switch *ds, int port,
  struct ethtool_rxnfc *nfc);
 int bcm_sf2_cfp_rst(struct bcm_sf2_priv *priv);
+void bcm_sf2_cfp_exit(struct dsa_switch *ds);
 
 #endif /* __BCM_SF2_H */
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 47c5f272a084..29b6b4204662 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -20,6 +20,12 @@
 #include "bcm_sf2.h"
 #include "bcm_sf2_regs.h"
 
+struct cfp_rule {
+   int port;
+   struct ethtool_rx_flow_spec fs;
+   struct list_head next;
+};
+
 struct cfp_udf_slice_layout {
u8 slices[UDFS_PER_SLICE];
u32 mask_value;
@@ -515,6 +521,61 @@ static void bcm_sf2_cfp_slice_ipv6(struct bcm_sf2_priv *priv,
core_writel(priv, reg, offset);
 }
 
+static struct cfp_rule *bcm_sf2_cfp_rule_find(struct bcm_sf2_priv *priv,
+ int port, u32 location)
+{
+   struct cfp_rule *rule = NULL;
+
+   list_for_each_entry(rule, &priv->cfp.rules_list, next) {
+   if (rule->port == port && rule->fs.location == location)
+   break;
+   };
+
+   return rule;
+}
+
+static int bcm_sf2_cfp_rule_cmp(struct bcm_sf2_priv *priv, int port,
+   struct ethtool_rx_flow_spec *fs)
+{
+   struct cfp_rule *rule = NULL;
+   size_t fs_size = 0;
+   int ret = 1;
+
+   if (list_empty(&priv->cfp.rules_list))
+   return ret;
+
+   list_for_each_entry(rule, &priv->cfp.rules_list, next) {
+   ret = 1;
+   if (rule->port != port)
+   continue;
+
+   if (rule->fs.flow_type != fs->flow_type ||
+   rule->fs.ring_cookie != fs->ring_cookie ||
+   rule->fs.m_ext.data[0] != fs->m_ext.data[0])
+   continue;
+
+   switch (fs->flow_type & ~FLOW_EXT) {
+   case TCP_V6_FLOW:
+   case UDP_V6_FLOW:
+   fs_size = sizeof(struct ethtool_tcpip6_spec);
+   break;
+   case TCP_V4_FLOW:
+   case UDP_V4_FLOW:
+   fs_size = sizeof(struct ethtool_tcpip4_spec);
+   break;
+   default:
+   continue;
+   }
+
+   ret = memcmp(&rule->fs.h_u, &fs->h_u, fs_size);
+   ret |= memcmp(&rule->fs.m_u, &fs->m_u, fs_size);
+   if (ret == 0)
+   break;
+   }
+
+   return ret;
+}
+
 static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
 unsigned int port_num,
 unsigned int queue_num,
@@ -735,6 +796,7
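As written, bcm_sf2_cfp_rule_find() above does not return NULL on a miss: list_for_each_entry() reassigns `rule` on every pass, and when the loop runs to completion without a `break`, `rule` is left pointing at the container of the list head rather than at a real rule. A minimal user-space sketch of a lookup that does return NULL follows. `struct list_head`, `container_of()`, `list_init()`, `list_add_tail()` and the cut-down `struct cfp_rule` are simplified stand-ins written for this example, not the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's intrusive doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, cut-down version of struct cfp_rule. */
struct cfp_rule {
	int port;
	unsigned int location;
	struct list_head next;
};

/* Return the matching rule, or NULL when nothing matches. */
static struct cfp_rule *rule_find(struct list_head *rules, int port,
				  unsigned int location)
{
	struct list_head *pos;

	assert(rules != NULL);

	for (pos = rules->next; pos != rules; pos = pos->next) {
		struct cfp_rule *rule = container_of(pos, struct cfp_rule, next);

		if (rule->port == port && rule->location == location)
			return rule;
	}

	/* Ran off the end: the cursor now aliases the head, so don't return it. */
	return NULL;
}
```

Returning from inside the loop (or copying the match into a separate variable before breaking) keeps the iterator from escaping, so callers can rely on a NULL check.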

[PATCH net-next 2/3] net: Add extack argument to ip_fib_metrics_init

2018-11-06 Thread David Ahern

From: David Ahern 

Add extack argument to ip_fib_metrics_init and add messages for invalid
metrics.

Signed-off-by: David Ahern 
---
 include/net/ip.h |  3 ++-
 net/ipv4/fib_semantics.c |  2 +-
 net/ipv4/metrics.c   | 26 +++---
 net/ipv6/route.c |  5 +++--
 4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 72593e171d14..462182f78236 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -421,7 +421,8 @@ static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
 }
 
 struct dst_metrics *ip_fib_metrics_init(struct net *net, struct nlattr *fc_mx,
-   int fc_mx_len);
+   int fc_mx_len,
+   struct netlink_ext_ack *extack);
 static inline void ip_fib_metrics_put(struct dst_metrics *fib_metrics)
 {
if (fib_metrics != _default_metrics &&
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index b5c3937ca6ec..5022bc63863a 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1076,7 +1076,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
if (!fi)
goto failure;
fi->fib_metrics = ip_fib_metrics_init(fi->fib_net, cfg->fc_mx,
- cfg->fc_mx_len);
+ cfg->fc_mx_len, extack);
if (unlikely(IS_ERR(fi->fib_metrics))) {
err = PTR_ERR(fi->fib_metrics);
kfree(fi);
diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
index 6d218f5a2e71..ca9a5fefdefa 100644
--- a/net/ipv4/metrics.c
+++ b/net/ipv4/metrics.c
@@ -6,7 +6,8 @@
 #include 
 
 static int ip_metrics_convert(struct net *net, struct nlattr *fc_mx,
- int fc_mx_len, u32 *metrics)
+ int fc_mx_len, u32 *metrics,
+ struct netlink_ext_ack *extack)
 {
bool ecn_ca = false;
struct nlattr *nla;
@@ -21,19 +22,26 @@ static int ip_metrics_convert(struct net *net, struct nlattr *fc_mx,
 
if (!type)
continue;
-   if (type > RTAX_MAX)
+   if (type > RTAX_MAX) {
+   NL_SET_ERR_MSG(extack, "Invalid metric type");
return -EINVAL;
+   }
 
if (type == RTAX_CC_ALGO) {
char tmp[TCP_CA_NAME_MAX];
 
nla_strlcpy(tmp, nla, sizeof(tmp));
val = tcp_ca_get_key_by_name(net, tmp, &ecn_ca);
-   if (val == TCP_CA_UNSPEC)
+   if (val == TCP_CA_UNSPEC) {
+   NL_SET_ERR_MSG(extack, "Unknown tcp congestion algorithm");
return -EINVAL;
+   }
} else {
-   if (nla_len(nla) != sizeof(u32))
+   if (nla_len(nla) != sizeof(u32)) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Invalid attribute in metrics");
return -EINVAL;
+   }
val = nla_get_u32(nla);
}
if (type == RTAX_ADVMSS && val > 65535 - 40)
@@ -42,8 +50,10 @@ static int ip_metrics_convert(struct net *net, struct nlattr *fc_mx,
val = 65535 - 15;
if (type == RTAX_HOPLIMIT && val > 255)
val = 255;
-   if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
+   if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK)) {
+   NL_SET_ERR_MSG(extack, "Unknown flag set in feature mask in metrics attribute");
return -EINVAL;
+   }
metrics[type - 1] = val;
}
 
@@ -54,7 +64,8 @@ static int ip_metrics_convert(struct net *net, struct nlattr *fc_mx,
 }
 
 struct dst_metrics *ip_fib_metrics_init(struct net *net, struct nlattr *fc_mx,
-   int fc_mx_len)
+   int fc_mx_len,
+   struct netlink_ext_ack *extack)
 {
struct dst_metrics *fib_metrics;
int err;
@@ -66,7 +77,8 @@ struct dst_metrics *ip_fib_metrics_init(struct net *net, struct nlattr *fc_mx,
if (unlikely(!fib_metrics))
return ERR_PTR(-ENOMEM);
 
-   err = ip_metrics_convert(net, fc_mx, fc_mx_len, fib_metrics->metrics);
+   err = ip_metrics_convert(net, fc_mx, fc_mx_len, fib_metrics->metrics,
+extack);
if (!err) {
refcount_set(&fib_metrics->refcnt, 1);
} else {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2a7423c39456..b2447b7c7303 100644
--- a/net/ipv6/route.c
+++
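The validation added to ip_metrics_convert() follows a common pattern: on the first invalid attribute, record a human-readable message in the extended ACK and return -EINVAL; for out-of-range but tolerable values, clamp silently. Below is a user-space sketch of that pattern. `struct ext_ack`, `SET_ERR_MSG()`, the `METRIC_*` ids, `FEATURE_MASK` and the flat `struct metric_attr` input are hypothetical stand-ins for `struct netlink_ext_ack`, `NL_SET_ERR_MSG()`, the RTAX_* constants and netlink attributes; only the hop-limit clamp to 255, the u32 length check and the error strings come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct ext_ack {
	const char *msg;
};

/* Like NL_SET_ERR_MSG(), tolerate a NULL ack. */
#define SET_ERR_MSG(ack, text)			\
	do {					\
		if (ack)			\
			(ack)->msg = (text);	\
	} while (0)

/* Hypothetical metric ids; the real RTAX_* values differ. */
enum { METRIC_HOPLIMIT = 1, METRIC_FEATURES = 2, METRIC_MAX = 2 };
#define FEATURE_MASK 0x0fu	/* hypothetical set of known feature bits */

struct metric_attr {
	unsigned int type;
	size_t len;		/* payload length in bytes */
	uint32_t value;
};

static int metrics_convert(const struct metric_attr *attrs, size_t n,
			   uint32_t *metrics, struct ext_ack *ack)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int type = attrs[i].type;
		uint32_t val;

		if (!type)
			continue;	/* type 0 is skipped */
		if (type > METRIC_MAX) {
			SET_ERR_MSG(ack, "Invalid metric type");
			return -EINVAL;
		}
		if (attrs[i].len != sizeof(uint32_t)) {
			SET_ERR_MSG(ack, "Invalid attribute in metrics");
			return -EINVAL;
		}
		val = attrs[i].value;
		if (type == METRIC_HOPLIMIT && val > 255)
			val = 255;	/* clamped, not rejected */
		if (type == METRIC_FEATURES && (val & ~FEATURE_MASK)) {
			SET_ERR_MSG(ack, "Unknown flag set in feature mask in metrics attribute");
			return -EINVAL;
		}
		metrics[type - 1] = val;
	}
	return 0;
}
```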

[PATCH net-next 1/3] net: Add extack argument to rtnl_create_link

2018-11-06 Thread David Ahern

From: David Ahern 

Add extack arg to rtnl_create_link and add messages for invalid
number of Tx or Rx queues.

Signed-off-by: David Ahern 
---
 drivers/net/can/vxcan.c |  2 +-
 drivers/net/geneve.c|  2 +-
 drivers/net/veth.c  |  2 +-
 drivers/net/vxlan.c |  2 +-
 include/net/rtnetlink.h |  3 ++-
 net/core/rtnetlink.c| 18 --
 net/ipv4/ip_gre.c   |  2 +-
 7 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index ed6828821fbd..80af658e530d 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -207,7 +207,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
return PTR_ERR(peer_net);
 
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
-   _link_ops, tbp);
+   _link_ops, tbp, extack);
if (IS_ERR(peer)) {
put_net(peer_net);
return PTR_ERR(peer);
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a0cd1c41cf5f..fbfc13d81f66 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1666,7 +1666,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
 
memset(tb, 0, sizeof(tb));
dev = rtnl_create_link(net, name, name_assign_type,
-  _link_ops, tb);
+  _link_ops, tb, NULL);
if (IS_ERR(dev))
return dev;
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 890fa5b905e2..f412ea1cef18 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1253,7 +1253,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
return PTR_ERR(net);
 
peer = rtnl_create_link(net, ifname, name_assign_type,
-   _link_ops, tbp);
+   _link_ops, tbp, extack);
if (IS_ERR(peer)) {
put_net(net);
return PTR_ERR(peer);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 297cdeaef479..ae969f806d56 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3749,7 +3749,7 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
memset(&tb, 0, sizeof(tb));
 
dev = rtnl_create_link(net, name, name_assign_type,
-  _link_ops, tb);
+  _link_ops, tb, NULL);
if (IS_ERR(dev))
return dev;
 
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index cf26e5aacac4..e2091bb2b3a8 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -159,7 +159,8 @@ struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[]);
 struct net_device *rtnl_create_link(struct net *net, const char *ifname,
unsigned char name_assign_type,
const struct rtnl_link_ops *ops,
-   struct nlattr *tb[]);
+   struct nlattr *tb[],
+   struct netlink_ext_ack *extack);
 int rtnl_delete_link(struct net_device *dev);
 int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 33d9227a8b80..f787b7640d49 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2885,9 +2885,11 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
 }
 EXPORT_SYMBOL(rtnl_configure_link);
 
-struct net_device *rtnl_create_link(struct net *net,
-   const char *ifname, unsigned char name_assign_type,
-   const struct rtnl_link_ops *ops, struct nlattr *tb[])
+struct net_device *rtnl_create_link(struct net *net, const char *ifname,
+   unsigned char name_assign_type,
+   const struct rtnl_link_ops *ops,
+   struct nlattr *tb[],
+   struct netlink_ext_ack *extack)
 {
struct net_device *dev;
unsigned int num_tx_queues = 1;
@@ -2903,11 +2905,15 @@ struct net_device *rtnl_create_link(struct net *net,
else if (ops->get_num_rx_queues)
num_rx_queues = ops->get_num_rx_queues();
 
-   if (num_tx_queues < 1 || num_tx_queues > 4096)
+   if (num_tx_queues < 1 || num_tx_queues > 4096) {
+   NL_SET_ERR_MSG(extack, "Invalid number of transmit queues");
return ERR_PTR(-EINVAL);
+   }
 
-   if (num_rx_queues < 1 || num_rx_queues > 4096)
+   if (num_rx_queues < 1 || num_rx_queues > 4096) {
+   NL_SET_ERR_MSG(extack, "Invalid number of receive queues");
return ERR_PTR(-EINVAL);
+   }
 
dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
   ops->setup, num_tx_queues,

[PATCH net-next 3/3] rtnetlink: Add more extack messages to rtnl_newlink

2018-11-06 Thread David Ahern

From: David Ahern 

Add extack arg to the nla_parse_nested calls in rtnl_newlink, and
add messages for unknown device type and link network namespace id.
In particular, it improves the failure message when the wrong link
type is used. From
$ ip li add bond1 type bonding
RTNETLINK answers: Operation not supported
to
$ ip li add bond1 type bonding
Error: Unknown device type.

(The module name is bonding but the link type is bond.)

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f787b7640d49..86f2d9cbdae3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3054,7 +3054,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
err = nla_parse_nested(attr, ops->maxtype,
   linkinfo[IFLA_INFO_DATA],
-  ops->policy, NULL);
+  ops->policy, extack);
if (err < 0)
return err;
data = attr;
@@ -3076,7 +3076,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
   m_ops->slave_maxtype,
   linkinfo[IFLA_INFO_SLAVE_DATA],
   m_ops->slave_policy,
-  NULL);
+  extack);
if (err < 0)
return err;
slave_data = slave_attr;
@@ -3140,6 +3140,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
goto replay;
}
 #endif
+   NL_SET_ERR_MSG(extack, "Unknown device type");
return -EOPNOTSUPP;
}
 
@@ -3160,6 +3161,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
link_net = get_net_ns_by_id(dest_net, id);
if (!link_net) {
+   NL_SET_ERR_MSG(extack, "Unknown network namespace id");
err =  -EINVAL;
goto out;
}
-- 
2.11.0
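The "Unknown device type" message is reached when no registered link type matches the requested kind; as the commit message notes, the module is named `bonding` while the kind is `bond`. The sketch below models that lookup with a hypothetical static table (a small subset of kinds, chosen for illustration) and a hypothetical `lookup_link_kind()` helper, in place of the kernel's list of registered rtnl_link_ops and its module autoloading.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the registered rtnl_link_ops kinds. */
static const char *const known_kinds[] = { "bond", "veth", "vxlan", "geneve" };

/* Return 0 if kind is registered; otherwise set *errmsg, return -EOPNOTSUPP. */
static int lookup_link_kind(const char *kind, const char **errmsg)
{
	for (size_t i = 0; i < sizeof(known_kinds) / sizeof(known_kinds[0]); i++) {
		if (strcmp(known_kinds[i], kind) == 0)
			return 0;
	}
	if (errmsg)
		*errmsg = "Unknown device type";
	return -EOPNOTSUPP;
}
```

The error code stays -EOPNOTSUPP, as in the patch; the extack string is what turns "Operation not supported" into a message that points at the real problem.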

[PATCH net-next 0/3] net: More extack messages

2018-11-06 Thread David Ahern

From: David Ahern 

Add more extack messages for several link create errors (e.g., invalid
number of queues, unknown link kind) and invalid metrics argument.

David Ahern (3):
  net: Add extack argument to rtnl_create_link
  net: Add extack argument to ip_fib_metrics_init
  rtnetlink: Add more extack messages to rtnl_newlink

 drivers/net/can/vxcan.c  |  2 +-
 drivers/net/geneve.c |  2 +-
 drivers/net/veth.c   |  2 +-
 drivers/net/vxlan.c  |  2 +-
 include/net/ip.h |  3 ++-
 include/net/rtnetlink.h  |  3 ++-
 net/core/rtnetlink.c | 24 
 net/ipv4/fib_semantics.c |  2 +-
 net/ipv4/ip_gre.c|  2 +-
 net/ipv4/metrics.c   | 26 +++---
 net/ipv6/route.c |  5 +++--
 11 files changed, 48 insertions(+), 25 deletions(-)

-- 
2.11.0

Re: [PATCH net-next 1/3] devlink: Add fw_version_check generic parameter

2018-11-06 Thread Ido Schimmel

On Tue, Nov 06, 2018 at 12:19:13PM -0800, Jakub Kicinski wrote:
> On Tue, 6 Nov 2018 20:05:00 +, Ido Schimmel wrote:
> > From: Shalom Toledo 
> > 
> > Many drivers check the device's firmware version during the
> > initialization flow and flash a compatible version if the current
> > version is not compatible.
> > 
> > fw_version_check gives the ability to skip this check, which allows
> > running the device with a different firmware version than required by
> > the driver for testing and/or debugging purposes.
> > 
> > Signed-off-by: Shalom Toledo 
> > Reviewed-by: Jiri Pirko 
> > Signed-off-by: Ido Schimmel 
> 
> The documentation is missing, so it's hard to comment on the definition
> of the parameter...  

I assume you mean Documentation/networking/devlink-params.txt?

> We have a FW loading policy for NFP, too, so it'd be good to see if we
> can find a common ground.

If the parameter is set, then the device runs with whatever firmware version
was last flashed (via ethtool, for example). Otherwise, the driver will
flash a version according to its policy. In mlxsw, it is a specific
version.

Will that work for you?

[PATCH bpf-next 0/2] TCP-BPF event notification support

2018-11-06 Thread Sowmini Varadhan

This patchset uses an eBPF perf-event based notification mechanism to solve
the problem described in
   https://marc.info/?l=linux-netdev=154022219423571=2.
Thanks to Daniel Borkmann for feedback/input.

The problem statement is
  We would like to monitor some subset of TCP sockets in user-space
  (the monitoring application would define 4-tuples it wants to monitor)
  using TCP_INFO stats to analyze reported problems. The idea is to
  use those stats to see where the bottlenecks are likely to be ("is it
  application-limited?" or "is there evidence of BufferBloat in the
  path?" etc)

  Today we can do this by periodically polling for tcp_info, but this
  could be made more efficient if the kernel would asynchronously
  notify the application via tcp_info when some "interesting"
  thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
  are reached. And to make this effective, it is better if
  we could apply the threshold check *before* constructing the
  tcp_info netlink notification, so that we don't waste resources
  constructing notifications that will be discarded by the filter.

This patchset solves the problem by adding perf-event based notification
support for sock_ops (Patch 1). The eBPF kernel module can thus
be designed to apply any desired filters to the bpf_sock_ops and
trigger a perf-event notification based on the verdict from the filter.
The user-space component can use these perf-event notifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
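The filter-before-notify idea can be illustrated without any BPF machinery. The following user-space C sketch is only a model: `struct sock_sample`, `struct thresholds`, `notify()` and `filter_and_notify()` are hypothetical, standing in for the fields a sock_ops program would read and for a call to bpf_perf_event_output(). It shows the point of the design: the threshold test runs first, so samples that fail the filter cost only a comparison and never produce a notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket snapshot, loosely modelled on tcp_info fields. */
struct sock_sample {
	uint32_t rttvar_us;
	uint32_t total_retrans;
};

/* Hypothetical thresholds chosen by the monitoring application. */
struct thresholds {
	uint32_t max_rttvar_us;
	uint32_t max_total_retrans;
};

/* Notifications "sent" so far; stands in for the perf ring buffer. */
static unsigned int notifications;

static void notify(const struct sock_sample *s)
{
	(void)s;	/* a real program would copy *s out to user space */
	notifications++;
}

/* Apply the filter first; only emit a notification on a hit. */
static bool filter_and_notify(const struct sock_sample *s,
			      const struct thresholds *t)
{
	if (s->rttvar_us > t->max_rttvar_us ||
	    s->total_retrans > t->max_total_retrans) {
		notify(s);
		return true;
	}
	return false;
}
```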

Patch 2 provides a simple example that shows how to use this infrastructure
(and also provides a test case for it).

Sowmini Varadhan (2):
  bpf: add perf-event notification support for sock_ops
  selftests/bpf: add a test case for sock_ops perf-event notification

 net/core/filter.c |   19 ++
 tools/testing/selftests/bpf/Makefile  |4 +-
 tools/testing/selftests/bpf/perf-sys.h|   74 
 tools/testing/selftests/bpf/test_tcpnotify.h  |   19 ++
 tools/testing/selftests/bpf/test_tcpnotify_kern.c |   95 +++
 tools/testing/selftests/bpf/test_tcpnotify_user.c |  186 +
 6 files changed, 396 insertions(+), 1 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/perf-sys.h
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify.h
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_user.c

[PATCH bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification

2018-11-06 Thread Sowmini Varadhan

This patch provides a tcp_bpf-based eBPF sample. The test
- uses ncat(1) as the TCP client program to connect() to a port
  with the intention of triggering SYN retransmissions: we
  first install an iptables DROP rule to make sure ncat SYNs are
  resent (instead of aborting instantly after a TCP RST)
- has a bpf kernel module that sends a perf-event notification for
  each TCP retransmit, and also tracks the number of such notifications
  sent in the global_map
The test passes when the number of event notifications intercepted
in user-space matches the value in the global_map.

Signed-off-by: Sowmini Varadhan 
---
 tools/testing/selftests/bpf/Makefile  |4 +-
 tools/testing/selftests/bpf/perf-sys.h|   74 
 tools/testing/selftests/bpf/test_tcpnotify.h  |   19 ++
 tools/testing/selftests/bpf/test_tcpnotify_kern.c |   95 +++
 tools/testing/selftests/bpf/test_tcpnotify_user.c |  186 +
 5 files changed, 377 insertions(+), 1 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/perf-sys.h
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify.h
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_user.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e39dfb4..6c94048 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,12 +24,13 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
-   test_netcnt
+   test_netcnt test_tcpnotify_user
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
+   test_tcpnotify_kern.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
@@ -74,6 +75,7 @@ $(OUTPUT)/test_sock_addr: cgroup_helpers.c
 $(OUTPUT)/test_socket_cookie: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c
+$(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
 $(OUTPUT)/test_progs: trace_helpers.c
 $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
 $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
diff --git a/tools/testing/selftests/bpf/perf-sys.h b/tools/testing/selftests/bpf/perf-sys.h
new file mode 100644
index 000..3eb7a39
--- /dev/null
+++ b/tools/testing/selftests/bpf/perf-sys.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _PERF_SYS_H
+#define _PERF_SYS_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef __powerpc__
+#define CPUINFO_PROC   {"cpu"}
+#endif
+
+#ifdef __s390__
+#define CPUINFO_PROC   {"vendor_id"}
+#endif
+
+#ifdef __sh__
+#define CPUINFO_PROC   {"cpu type"}
+#endif
+
+#ifdef __hppa__
+#define CPUINFO_PROC   {"cpu"}
+#endif
+
+#ifdef __sparc__
+#define CPUINFO_PROC   {"cpu"}
+#endif
+
+#ifdef __alpha__
+#define CPUINFO_PROC   {"cpu model"}
+#endif
+
+#ifdef __arm__
+#define CPUINFO_PROC   {"model name", "Processor"}
+#endif
+
+#ifdef __mips__
+#define CPUINFO_PROC   {"cpu model"}
+#endif
+
+#ifdef __arc__
+#define CPUINFO_PROC   {"Processor"}
+#endif
+
+#ifdef __xtensa__
+#define CPUINFO_PROC   {"core ID"}
+#endif
+
+#ifndef CPUINFO_PROC
+#define CPUINFO_PROC   { "model name", }
+#endif
+
+static inline int
+sys_perf_event_open(struct perf_event_attr *attr,
+ pid_t pid, int cpu, int group_fd,
+ unsigned long flags)
+{
+   int fd;
+
+   fd = syscall(__NR_perf_event_open, attr, pid, cpu,
+group_fd, flags);
+
+#ifdef HAVE_ATTR_TEST
+   if (unlikely(test_attr__enabled))
+   test_attr__open(attr, pid, cpu, fd, group_fd, flags);
+#endif
+   return fd;
+}
+
+#endif /* _PERF_SYS_H */
diff --git a/tools/testing/selftests/bpf/test_tcpnotify.h b/tools/testing/selftests/bpf/test_tcpnotify.h
new file mode 100644
index 000..8b6cea0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpnotify.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef _TEST_TCPBPF_H
+#define _TEST_TCPBPF_H
+
+struct tcpnotify_globals {
+   __u32 total_retrans;
+   __u32 ncalls;
+};
+
+struct tcp_notifier {
+   __u8type;
+   __u8subtype;
+   __u8source;
+   __u8

[PATCH bpf-next 1/2] bpf: add perf-event notification support for sock_ops

2018-11-06 Thread Sowmini Varadhan

This patch allows eBPF programs that use sock_ops to send
perf-based event notifications using bpf_perf_event_output().

Signed-off-by: Sowmini Varadhan 
---
 net/core/filter.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index e521c5e..23464a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4048,6 +4048,23 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
return ret;
 }
 
+BPF_CALL_5(bpf_sock_opts_event_output, struct bpf_sock_ops *, skops,
+  struct bpf_map *, map, u64, flags, void *, data, u64, size)
+{
+   return bpf_event_output(map, flags, data, size, NULL, 0, NULL);
+}
+
+static const struct bpf_func_proto bpf_sock_ops_event_output_proto =  {
+   .func   = bpf_sock_opts_event_output,
+   .gpl_only   = true,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_CONST_MAP_PTR,
+   .arg3_type  = ARG_ANYTHING,
+   .arg4_type  = ARG_PTR_TO_MEM,
+   .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
+};
+
 static const struct bpf_func_proto bpf_setsockopt_proto = {
.func   = bpf_setsockopt,
.gpl_only   = false,
@@ -5226,6 +5243,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
switch (func_id) {
+   case BPF_FUNC_perf_event_output:
+   return &bpf_sock_ops_event_output_proto;
case BPF_FUNC_setsockopt:
return &bpf_setsockopt_proto;
case BPF_FUNC_getsockopt:
-- 
1.7.1

Re: [PATCH net-next 1/3] devlink: Add fw_version_check generic parameter

2018-11-06 Thread Jakub Kicinski

On Tue, 6 Nov 2018 20:05:00 +, Ido Schimmel wrote:
> From: Shalom Toledo 
> 
> Many drivers check the device's firmware version during the
> initialization flow and flash a compatible version if the current
> version is not compatible.
> 
> fw_version_check gives the ability to skip this check, which allows
> running the device with a different firmware version than required by
> the driver for testing and/or debugging purposes.
> 
> Signed-off-by: Shalom Toledo 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 

The documentation is missing, so it's hard to comment on the definition
of the parameter...  We have a FW loading policy for NFP, too, so it'd
be good to see if we can find a common ground.

bpf-next is OPEN

2018-11-06 Thread Daniel Borkmann

Merge window is over so bpf-next is open again!

Thanks,
Daniel

[PATCH net-next 0/3] mlxsw: Add fw_version_check devlink parameter

2018-11-06 Thread Ido Schimmel

Shalom says:

Currently, drivers do not have the ability to skip the firmware version
check during their initialization flow. Hence, drivers will always flash
a compatible version. This prevents drivers from running the device with
a different firmware version for testing and/or debugging purposes, for
example, when testing a firmware bug fix.

For these situations, the new devlink generic parameter,
fw_version_check, gives the ability to skip the version check and allows
drivers to run with a different firmware version than what is required
by the driver.
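The decision the new parameter adds can be modelled in a few lines. The sketch below is a simplified, hypothetical model of the flow patch #3 adds to mlxsw_sp_fw_rev_validate(): `struct fw_rev`, `enum fw_action`, `fw_rev_ge()` and `fw_rev_validate()` are illustrative stand-ins (the real driver reads the value with devlink_param_driverinit_value_get() and flashes firmware instead of returning an action), and the "same major, minor.subminor at least the required one" rule is an assumption made for this example. The version numbers in the test are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware revision triple. */
struct fw_rev {
	unsigned int major, minor, subminor;
};

enum fw_action { FW_OK, FW_NEEDS_FLASH, FW_INCOMPATIBLE };

/* Assumed rule for this sketch: minor.subminor >= the required one. */
static bool fw_rev_ge(const struct fw_rev *rev, const struct fw_rev *req)
{
	if (rev->minor != req->minor)
		return rev->minor > req->minor;
	return rev->subminor >= req->subminor;
}

static enum fw_action fw_rev_validate(const struct fw_rev *rev,
				      const struct fw_rev *req,
				      bool fw_version_check)
{
	/* Check disabled: run whatever firmware is already on the device. */
	if (!fw_version_check)
		return FW_OK;
	if (rev->major != req->major)
		return FW_INCOMPATIBLE;
	return fw_rev_ge(rev, req) ? FW_OK : FW_NEEDS_FLASH;
}
```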

Patch #1 adds the new parameter to devlink. The other two patches, #2
and #3, add support for this parameter in the mlxsw driver.

Shalom Toledo (3):
  devlink: Add fw_version_check generic parameter
  mlxsw: core: Reset firmware after flash during driver initialization
  mlxsw: spectrum: Skip firmware version check based on devlink
parameter

 drivers/net/ethernet/mellanox/mlxsw/core.c| 45 +--
 drivers/net/ethernet/mellanox/mlxsw/core.h|  2 +
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 11 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c| 45 +++
 include/net/devlink.h |  4 ++
 net/core/devlink.c|  5 +++
 6 files changed, 98 insertions(+), 14 deletions(-)

-- 
2.19.1

[PATCH net-next 3/3] mlxsw: spectrum: Skip firmware version check based on devlink parameter

2018-11-06 Thread Ido Schimmel

From: Shalom Toledo 

Based on the fw_version_check devlink parameter, skip the firmware version
check in order to run the device with a different firmware version than
required by the driver for testing and/or debugging purposes.

Signed-off-by: Shalom Toledo 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c| 13 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h|  2 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c| 45 +++
 3 files changed, 60 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index b2e1b83525db..281aeb1c2386 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1036,6 +1036,12 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
goto err_devlink_register;
}
 
+   if (mlxsw_driver->params_register && !reload) {
+   err = mlxsw_driver->params_register(mlxsw_core);
+   if (err)
+   goto err_register_params;
+   }
+
err = mlxsw_hwmon_init(mlxsw_core, mlxsw_bus_info, &mlxsw_core->hwmon);
if (err)
goto err_hwmon_init;
@@ -1058,6 +1064,9 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_thermal_init:
mlxsw_hwmon_fini(mlxsw_core->hwmon);
 err_hwmon_init:
+   if (mlxsw_driver->params_unregister && !reload)
+   mlxsw_driver->params_unregister(mlxsw_core);
+err_register_params:
if (!reload)
devlink_unregister(devlink);
 err_devlink_register:
@@ -1121,6 +1130,8 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
mlxsw_core->driver->fini(mlxsw_core);
mlxsw_thermal_fini(mlxsw_core->thermal);
mlxsw_hwmon_fini(mlxsw_core->hwmon);
+   if (mlxsw_core->driver->params_unregister && !reload)
+   mlxsw_core->driver->params_unregister(mlxsw_core);
if (!reload)
devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
@@ -1133,6 +1144,8 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
return;
 
 reload_fail_deinit:
+   if (mlxsw_core->driver->params_unregister)
+   mlxsw_core->driver->params_unregister(mlxsw_core);
devlink_unregister(devlink);
devlink_resources_unregister(devlink, NULL);
devlink_free(devlink);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index c35be477856f..d811be8989b0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -282,6 +282,8 @@ struct mlxsw_driver {
 const struct mlxsw_config_profile *profile,
 u64 *p_single_size, u64 *p_double_size,
 u64 *p_linear_size);
+   int (*params_register)(struct mlxsw_core *mlxsw_core);
+   void (*params_unregister)(struct mlxsw_core *mlxsw_core);
u8 txhdr_len;
const struct mlxsw_config_profile *profile;
bool res_query_enabled;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 9bec940330a4..ab1ca8c1d6df 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -318,6 +318,7 @@ static int mlxsw_sp_fw_rev_validate(struct mlxsw_sp *mlxsw_sp)
const struct mlxsw_fw_rev *rev = &mlxsw_sp->bus_info->fw_rev;
const struct mlxsw_fw_rev *req_rev = mlxsw_sp->req_rev;
const char *fw_filename = mlxsw_sp->fw_filename;
+   union devlink_param_value value;
const struct firmware *firmware;
int err;
 
@@ -325,6 +326,15 @@ static int mlxsw_sp_fw_rev_validate(struct mlxsw_sp *mlxsw_sp)
if (!req_rev || !fw_filename)
return 0;
 
+   /* Don't check if devlink fw_version_check param is false */
+   err = devlink_param_driverinit_value_get(priv_to_devlink(mlxsw_sp->core),
+                                            DEVLINK_PARAM_GENERIC_ID_FW_VERSION_CHECK,
+                                            &value);
+   if (err)
+   return err;
+   if (!value.vbool)
+   return 0;
+
/* Validate driver & FW are compatible */
if (rev->major != req_rev->major) {
WARN(1, "Mismatch in major FW version [%d:%d] is never expected; Please contact support\n",
@@ -4171,6 +4181,37 @@ static int mlxsw_sp_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
return 0;
 }
 
+static const struct devlink_param mlxsw_sp_devlink_params[] = {
+   DEVLINK_PARAM_GENERIC(FW_VERSION_CHECK,
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ NULL, NULL, NULL),
+};
+
+static int

[PATCH net-next 1/3] devlink: Add fw_version_check generic parameter

2018-11-06 Thread Ido Schimmel

From: Shalom Toledo 

Many drivers check the device's firmware version during the
initialization flow and flash a compatible version if the current
version is not compatible.

fw_version_check gives the ability to skip this check, which allows
running the device with a different firmware version than required by
the driver for testing and/or debugging purposes.

Signed-off-by: Shalom Toledo 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 include/net/devlink.h | 4 
 net/core/devlink.c| 5 +
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 45db0c79462d..d47ea9d38252 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -365,6 +365,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI,
DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+   DEVLINK_PARAM_GENERIC_ID_FW_VERSION_CHECK,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -392,6 +393,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME "msix_vec_per_pf_min"
 #define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE DEVLINK_PARAM_TYPE_U32
 
+#define DEVLINK_PARAM_GENERIC_FW_VERSION_CHECK_NAME "fw_version_check"
+#define DEVLINK_PARAM_GENERIC_FW_VERSION_CHECK_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3a4b29a13d31..1a09ad057851 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2692,6 +2692,11 @@ static const struct devlink_param devlink_param_generic[] = {
.name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME,
.type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_FW_VERSION_CHECK,
+   .name = DEVLINK_PARAM_GENERIC_FW_VERSION_CHECK_NAME,
+   .type = DEVLINK_PARAM_GENERIC_FW_VERSION_CHECK_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
2.19.1

[PATCH net-next 2/3] mlxsw: core: Reset firmware after flash during driver initialization

2018-11-06 Thread Ido Schimmel

From: Shalom Toledo 

After flashing new firmware during the driver initialization flow (reload
or not), the driver should do a firmware reset when it gets -EAGAIN in
order to load the new one.
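The wrapper added to core.c retries registration exactly once after -EAGAIN, so a second -EAGAIN is passed back to the caller instead of looping forever. A minimal user-space sketch of that bounded retry, using a hypothetical `try_register()` that simulates freshly flashed firmware by failing a configurable number of times (the counters are test scaffolding, not driver state):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical scaffolding: attempts that still report "FW updated". */
static int eagain_left;
static int attempts;

/* Stand-in for __mlxsw_core_bus_device_register(). */
static int try_register(void)
{
	attempts++;
	if (eagain_left > 0) {
		eagain_left--;
		return -EAGAIN;	/* FW was flashed; a reset is needed */
	}
	return 0;
}

/* Retry at most once after -EAGAIN, mirroring the called_again flag. */
static int register_with_retry(void)
{
	bool called_again = false;
	int err;

again:
	err = try_register();
	if (err == -EAGAIN && !called_again) {
		called_again = true;
		goto again;
	}
	return err;
}
```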

Signed-off-by: Shalom Toledo 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 32 +++---
 drivers/net/ethernet/mellanox/mlxsw/pci.c  | 11 +---
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 30f751e69698..b2e1b83525db 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -965,10 +965,11 @@ static const struct devlink_ops mlxsw_devlink_ops = {
 	.sb_occ_tc_port_bind_get	= mlxsw_devlink_sb_occ_tc_port_bind_get,
 };
 
-int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
-				   const struct mlxsw_bus *mlxsw_bus,
-				   void *bus_priv, bool reload,
-				   struct devlink *devlink)
+static int
+__mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
+				 const struct mlxsw_bus *mlxsw_bus,
+				 void *bus_priv, bool reload,
+				 struct devlink *devlink)
 {
 	const char *device_kind = mlxsw_bus_info->device_kind;
 	struct mlxsw_core *mlxsw_core;
@@ -1076,6 +1077,29 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_devlink_alloc:
 	return err;
 }
+
+int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
+				   const struct mlxsw_bus *mlxsw_bus,
+				   void *bus_priv, bool reload,
+				   struct devlink *devlink)
+{
+	bool called_again = false;
+	int err;
+
+again:
+	err = __mlxsw_core_bus_device_register(mlxsw_bus_info, mlxsw_bus,
+					       bus_priv, reload, devlink);
+	/* -EAGAIN is returned in case the FW was updated. FW needs
+	 * a reset, so lets try to call __mlxsw_core_bus_device_register()
+	 * again.
+	 */
+	if (err == -EAGAIN && !called_again) {
+		called_again = true;
+		goto again;
+	}
+
+	return err;
+}
 EXPORT_SYMBOL(mlxsw_core_bus_device_register);
 
 void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 5890fdfd62c3..66b8098c6fd2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1720,7 +1720,6 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	const char *driver_name = pdev->driver->name;
 	struct mlxsw_pci *mlxsw_pci;
-	bool called_again = false;
 	int err;
 
 	mlxsw_pci = kzalloc(sizeof(*mlxsw_pci), GFP_KERNEL);
@@ -1777,18 +1776,10 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mlxsw_pci->bus_info.dev = &pdev->dev;
 	mlxsw_pci->id = id;
 
-again:
 	err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
 					     &mlxsw_pci_bus, mlxsw_pci, false,
 					     NULL);
-	/* -EAGAIN is returned in case the FW was updated. FW needs
-	 * a reset, so lets try to call mlxsw_core_bus_device_register()
-	 * again.
-	 */
-	if (err == -EAGAIN && !called_again) {
-		called_again = true;
-		goto again;
-	} else if (err) {
+	if (err) {
 		dev_err(&pdev->dev, "cannot register bus device\n");
 		goto err_bus_device_register;
 	}
-- 
2.19.1

net-next is OPEN...

2018-11-06 Thread David Miller



Do it, to it.

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-06 Thread Edward Cree

On 06/11/18 06:29, Alexei Starovoitov wrote:
> BTF is not pure type information. BTF is everything that verifier needs
> to know to make safety decisions that bpf instruction set doesn't have.
Yes, I'm not disputing that and never have.
I'm just saying that it will be much cleaner and better if it's
 internally organised differently.

> Splitting pure types into one section, variables into another,
> functions into yet another is not practical, since the same
> modifiers (like const or volatile) need to be applied to
> variables and functions. At the end all sections will have
> the same style of encoding, hence no need to duplicate
> the encoding three times and instead it's cleaner to encode
> all of them BTF-style via different KINDs.
This shows that you've misunderstood what I'm proposing; probably
 I explained it poorly, so I'll try again.
I'm not suggesting that the 'functions' and 'variables' sections
 would have _type_ records in them, only that they would reference
 records in the 'types' section.  So if for instance we have
    int foo(int x) { int quux; /* ... */ }
    int bar(int y) { /* ... */ }
 in the source, then in the 'types' section we would have
1 INT 32bits encoding=signed offset=0
2 FUNC args=(name="x" type=1,), ret=1
3 FUNC args=(name="y" type=1,), ret=1
 while in the 'functions' section we would have
1 "foo" type=2 start_insn_idx=23 insn_count=19 (... maybe other info too...)
2 "bar" type=3 start_insn_idx=42 insn_count=5
 and in the 'variables' section we might have
1 "quux" type=1 where=stack func=1 offset=-8

Thus the graph of types lives entirely in the 'types' section, but
 things-that-are-not-types don't.  I'm not making a distinction
 between "pure types" and (somehow) impure types; I'm making a
 distinction between types (with all their impurities) and
 *instances* of those types.
Note that these 'sections' may all really be regions of the '.BTF'
 ELF section, if that makes the implementation easier.  Also, the
 'functions' and 'variables' sections _won't_ have the same style
 of encoding as the 'types', because they're storing entirely
 different data and in fact don't need variable record sizes.

And note that in this case any const or volatile qualifiers happen
 _in the 'types' section_, because they're just another way of
 deriving a type, and the records in other sections that might want
 them will just point at a BTF_KIND_CONST or BTF_KIND_VOLATILE
 record in the 'types' section.

-Ed
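To make the proposed split concrete, here is a minimal C sketch of the three regions from the foo/bar/quux example. The record layouts (type_rec, func_rec, var_rec) and the K_* kinds are hypothetical illustrations, not the real BTF encoding or any kernel structure; argument names are omitted, and the const local "limit" is a hypothetical addition to show a qualifier. Instance records are fixed-size and hold only type ids; the const qualifier is its own record in the 'types' region, which resolve_type() follows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kinds; NOT the real BTF_KIND_* values. */
enum kind { K_UNUSED = 0, K_INT, K_CONST, K_FUNC };

/* 'types' region: every derived type, qualifiers included. */
struct type_rec {
	enum kind kind;
	uint32_t ref;		/* K_CONST: qualified type; K_FUNC: return type */
	uint32_t nargs;		/* K_FUNC only */
	uint32_t arg_types[2];	/* K_FUNC only; fixed size to keep it short */
};

/* 'functions' region: fixed-size instance records. */
struct func_rec {
	const char *name;
	uint32_t type_id;	/* id of a K_FUNC record in 'types' */
	uint32_t start_insn_idx;
	uint32_t insn_count;
};

/* 'variables' region: fixed-size instance records (stack vars only). */
struct var_rec {
	const char *name;
	uint32_t type_id;	/* any record in 'types', e.g. a K_CONST */
	uint32_t func;		/* id of the owning record in 'functions' */
	int32_t offset;		/* stack offset */
};

/* Ids are 1-based, as in the example; slot 0 is unused. */
static const struct type_rec types[] = {
	[1] = { .kind = K_INT },
	[2] = { .kind = K_FUNC, .ref = 1, .nargs = 1, .arg_types = { 1 } },
	[3] = { .kind = K_FUNC, .ref = 1, .nargs = 1, .arg_types = { 1 } },
	[4] = { .kind = K_CONST, .ref = 1 },	/* const int */
};

static const struct func_rec funcs[] = {
	[1] = { "foo", 2, 23, 19 },
	[2] = { "bar", 3, 42, 5 },
};

static const struct var_rec vars[] = {
	[1] = { "quux", 1, 1, -8 },
	[2] = { "limit", 4, 2, -4 },	/* hypothetical: const int in bar() */
};

#define N_TYPES (sizeof(types) / sizeof(types[0]))

/* Follow K_CONST links until an unqualified type is reached. */
static uint32_t resolve_type(uint32_t id)
{
	assert(id > 0 && id < N_TYPES);
	while (types[id].kind == K_CONST) {
		id = types[id].ref;
		assert(id > 0 && id < N_TYPES);
	}
	return id;
}

/* A function instance's return type is found via the 'types' graph. */
static uint32_t func_ret_type(uint32_t func_id)
{
	const struct type_rec *t = &types[funcs[func_id].type_id];

	assert(t->kind == K_FUNC);
	return t->ref;
}
```

Both "quux" and the hypothetical "limit" resolve to the same INT record; only the 'types' region grows when a qualifier is added, while the instance regions keep their fixed record size.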

Re: [PATCH v2 net 0/3] net: bql: better deal with GSO

2018-11-06 Thread David Miller

From: Edward Cree 
Date: Tue, 6 Nov 2018 18:23:42 +

> I'm doing a patch to update sfc to use this new helper.  Is this 'net'
>  material, or should I wait for 'net-next' to open back up?

As the driver maintainer, I guess I can leave that judgment up to you,
at least to a certain extent.

Re: [PATCH net-next v4 9/9] ipv6: do not drop vrf udp multicast packets

2018-11-06 Thread David Ahern

On 11/2/18 1:10 PM, Mike Manning wrote:
> From: Dewi Morgan 
> 
> For bound udp sockets in a vrf, also check the sdif to get the index
> for ingress devices enslaved to an l3mdev.
> 
> Signed-off-by: Dewi Morgan 
> Signed-off-by: Mike Manning 
> ---
>  net/ipv6/udp.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 

Reviewed-by: David Ahern
