Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread Ilias Apalodimas
On Fri, Dec 07, 2018 at 11:06:55PM -0800, David Miller wrote:

> This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
> because the page private is unsigned long.
> 
> Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
> cases where this happens, then ARCH_DMA_ADDR_T_64BIT.

Noted, thanks for the heads up.

Thanks
/Ilias


Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread Ilias Apalodimas
On Fri, Dec 07, 2018 at 11:15:14PM -0800, David Miller wrote:
> From: Jesper Dangaard Brouer 
> Date: Fri, 07 Dec 2018 00:25:47 +0100
> 
> > @@ -744,6 +745,10 @@ struct sk_buff {
> > head_frag:1,
> > xmit_more:1,
> > pfmemalloc:1;
> > +   /* TODO: Future idea, extend mem_info with __u8 flags, and
> > +* move bits head_frag and pfmemalloc there.
> > +*/
> > +   struct xdp_mem_info mem_info;
> 
> This is 4 bytes right?

With this patchset yes

> 
> I guess I can live with this.

Great news!

> 
> Please do some microbenchmarks to make sure this doesn't show any
> obvious regressions.

Will do

Thanks
/Ilias


Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:47 +0100

> @@ -744,6 +745,10 @@ struct sk_buff {
>   head_frag:1,
>   xmit_more:1,
>   pfmemalloc:1;
> + /* TODO: Future idea, extend mem_info with __u8 flags, and
> +  * move bits head_frag and pfmemalloc there.
> +  */
> + struct xdp_mem_info mem_info;

This is 4 bytes right?

I guess I can live with this.

Please do some microbenchmarks to make sure this doesn't show any
obvious regressions.

Thanks.


Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:32 +0100

> From: Ilias Apalodimas 
> 
> Add helper functions for retrieving dma_addr_t stored in page_private and
> unmapping dma addresses, mapped via the page_pool API.
> 
> Signed-off-by: Ilias Apalodimas 
> Signed-off-by: Jesper Dangaard Brouer 

This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
because the page private is unsigned long.

Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
cases where this happens, then ARCH_DMA_ADDR_T_64BIT.
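
The truncation concern can be illustrated with a minimal userspace sketch.
It assumes a 32-bit unsigned long (modelled as uint32_t) and a 64-bit
dma_addr_t (modelled as uint64_t). The typedefs and helper names below are
hypothetical and are not part of the page_pool API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel types on a 32-bit platform with 64-bit DMA
 * addressing. These typedefs are assumptions made for illustration only.
 */
typedef uint32_t fake_ulong_t;    /* unsigned long on a 32-bit arch */
typedef uint64_t fake_dma_addr_t; /* dma_addr_t when it is a u64 */

/* Hypothetical helper: store a DMA address in a page-private-sized slot. */
static fake_ulong_t store_in_private(fake_dma_addr_t addr)
{
	return (fake_ulong_t)addr; /* the upper 32 bits are silently dropped */
}

/* Returns true if the address survives the round trip unchanged. */
static bool dma_addr_round_trips(fake_dma_addr_t addr)
{
	return (fake_dma_addr_t)store_in_private(addr) == addr;
}
```

Any address at or above 4 GiB fails the round trip in this model, which is
the failure mode described above.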


Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-07 Thread David Miller
From: Benjamin Herrenschmidt 
Date: Fri, 07 Dec 2018 15:05:04 +1100

> This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9.
> 
> This commit is completely bogus. The STACR register has two formats, old
> and new, depending on the version of the IP block used. There's a pair of
> device-tree properties that can be used to specify the format used:
> 
>   has-inverted-stacr-oc
>   has-new-stacr-staopc
> 
> What this commit did was to change the bit definition used with the old
> parts to match the new parts. This of course breaks the driver on all
> the old ones.
> 
> Instead, the author should have set the appropriate properties in the
> device-tree for the variant used on his board.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Found while setting up some old ppc440 boxes for test/CI

Applied, thanks.


Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller
From: Maciej Żenczykowski 
Date: Fri, 7 Dec 2018 16:46:36 -0800

>> This doesn't apply to the current net tree.
>>
>> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>>
>> Thank you.
> 
> Interesting... this patch was on top of net-next/master, and it still
> rebases cleanly on current net-next/master.
> 
> Would you like it on net/master instead?  It indeed doesn't apply
> cleanly there...

Well, it is a bug fix isn't it?  Or is this more like a behavioral feature?


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/08/2018 12:46 AM, David Miller wrote:
> From: Marek Vasut 
> Date: Fri, 7 Dec 2018 23:59:58 +0100
> 
>> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>>>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>>>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>>>> configured correctly in such setups.
>>>
>>> Hi Marek
>>
>> Hi Andrew,
>>
>>> Please make this a patch series, not two individual patches.
>>
>> This actually is an individual patch, it doesn't depend on anything.
>> Or do you mean a series with the DT documentation change ?
> 
> Yes, but all of this stuff is building up for one single purpose,
> and that is to support a new mode of operation with DSA or whatever.

I'll group together the ones which make sense to group together and are
not orthogonal if that's OK with you. The reset handling really is
orthogonal from the rest and can go in independently of the rest.

> So please group them together in a series with an appropriate
> header posting.

Sure

-- 
Best regards,
Marek Vasut


Re: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Marek Vasut
On 12/08/2018 01:52 AM, tristram...@microchip.com wrote:
>> -padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
>> +padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb->len;
> 
> The requirement is the tail tag should be at the end of frame before FCS.
> When the length is less than 60 the MAC controller will pad the frame to
> legal size.  That is why this function makes sure the padding is done
> manually.  Increasing the size just increases the length of the frame and the
> chance to allocate new socket buffer.
> 
> Example of using ping size of 18 will have the sizes of request and response
> differ by 4 bytes.  Not that it matters much.
> 
> Are you concerned the MAC controller will remove the VLAN tag and so the frame
> will not be sent? Or the switch removes the VLAN tag and is not able to send?

With TI CPSW in dual-ethernet configuration, which adds internal VLAN
tag at the end of the frame, the KSZ switch fails. The CPU will send out
packets and the switch will reject them as corrupted. It needs this
extra VLAN tag padding.

-- 
Best regards,
Marek Vasut


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut
On 12/08/2018 01:13 AM, tristram...@microchip.com wrote:
>> Do you have a git tree with all the KSZ patches based on -next
>> somewhere, so I don't have to look for them in random MLs ?
> 
> I just sent it this Monday and the subject for that patch is
> "[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."

Is all that collected in some git tree somewhere, so I don't have to
look for various patches in varying states of decay throughout the ML?

-- 
Best regards,
Marek Vasut


Re: [PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps

2018-12-07 Thread Yonghong Song


On 12/7/18 4:53 PM, Roman Gushchin wrote:
> Implement bpffs pretty printing for cgroup local storage maps
> (both shared and per-cpu).
> Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):
> 
> Shared:
>$ cat /sys/fs/bpf/map_2
># WARNING!! The output is for debug purpose only
># WARNING!! The output format will change
>{4294968594,1}: {,1039896}
> 
> Per-cpu:
>$ cat /sys/fs/bpf/map_1
># WARNING!! The output is for debug purpose only
># WARNING!! The output format will change
>{4294968594,1}: {
>   cpu0: {0,0,0,0,0}
>   cpu1: {0,0,0,0,0}
>   cpu2: {1,104,0,0,0}
>   cpu3: {0,0,0,0,0}
>}
> 
> Signed-off-by: Roman Gushchin 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> ---
>   include/linux/btf.h| 10 +
>   kernel/bpf/local_storage.c | 90 +-
>   2 files changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 8c2199b5d250..ac67bc4cbfd9 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -5,6 +5,7 @@
>   #define _LINUX_BTF_H 1
>   
>   #include 
> +#include 
>   
>   struct btf;
>   struct btf_type;
> @@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
>   }
>   #endif
>   
> +static inline const struct btf_type *btf_orig_type(const struct btf *btf,
> +const struct btf_type *t)
> +{
> + while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF)
> + t = btf_type_by_id(btf, t->type);
Technically, the type modifiers "const" and "volatile" can apply to a member
type as well, but these modifiers don't really make sense here.
Could you add a comment here mentioning that they will be treated
as an error, since such usage is not recommended?

> +
> + return t;
> +}
> +
>   #endif
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index b65017dead44..7b51fe1aba3c 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -1,11 +1,13 @@
>   //SPDX-License-Identifier: GPL-2.0
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   DEFINE_PER_CPU(struct bpf_cgroup_storage*, 
> bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
>   
> @@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
>   return -EINVAL;
>   }
>   
> +static int cgroup_storage_check_btf(const struct bpf_map *map,
> + const struct btf *btf,
> + const struct btf_type *key_type,
> + const struct btf_type *value_type)
> +{
> + const struct btf_type *st, *t;
> + struct btf_member *m;
> +
> + /* Key is expected to be of struct bpf_cgroup_storage_key type,
> +  * which is:
> +  * struct bpf_cgroup_storage_key {
> +  *  __u64   cgroup_inode_id;
> +  *  __u32   attach_type;
> +  * };
> +  */
> +
> + /*
> +  * Key_type must be a structure (or a typedef of a structure) with
> +  * two members.
> +  */
> + st = btf_orig_type(btf, key_type);
> + if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT ||
> + BTF_INFO_VLEN(st->info) != 2)
> + return -EINVAL;
> +
> + /*
> +  * The first field must be a 64 bit integer at 0 offset.
> +  */
> + m = (struct btf_member *)(st + 1);
> + t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
> + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset ||
> + t->size !=
> + FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id))
> + return -EINVAL;

We should not use t->size here. "t->size" is the size of the type, whereas
the actual number of bits held by the member is BTF_INT_BITS(...), applied
to the u32 int value that follows "t".

> +
> + /*
> +  * The second field must be a 32 bit integer at 0 offset.
> +  */
> + m = m + 1;
> + t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
> + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT ||
> + m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) *
> + BITS_PER_BYTE || t->size !=
> + FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type))
> + return -EINVAL;

The same applies here: t->size should not be used;
BTF_INT_BITS(...) should be used instead.
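
The distinction between the type size and the bit count can be sketched in
userspace as follows. The struct is a simplified local mirror of struct
btf_type, and the mask matches BTF_INT_BITS() in include/uapi/linux/btf.h.
The names ending in _sketch and both helper functions are hypothetical, so
this is an illustration of the review comment rather than the code that
should go into the patch.

```c
#include <stdint.h>
#include <string.h>

/* Simplified local mirror of struct btf_type (the kernel struct keeps
 * size in a union with type). Defined here so the sketch builds alone.
 */
struct btf_type_sketch {
	uint32_t name_off;
	uint32_t info;
	uint32_t size; /* storage size of the type, in bytes */
};

/* A BTF_KIND_INT header is directly followed by one extra u32. This
 * container exists only so callers of the sketch can build test data.
 */
struct btf_int_sketch {
	struct btf_type_sketch hdr;
	uint32_t int_data;
};

/* Same mask as BTF_INT_BITS(): the low 8 bits hold the bit count. */
#define SKETCH_BTF_INT_BITS(val) ((val) & 0x000000ff)

/* Read the u32 that follows the header and extract the number of bits. */
static uint32_t int_type_bits(const struct btf_type_sketch *t)
{
	uint32_t int_data;

	memcpy(&int_data, (const unsigned char *)t + sizeof(*t),
	       sizeof(int_data));
	return SKETCH_BTF_INT_BITS(int_data);
}

/* Hypothetical check: the int must occupy all of its storage bytes. */
static int int_type_is_full_width(const struct btf_type_sketch *t,
				  uint32_t bytes)
{
	return t->size == bytes && int_type_bits(t) == bytes * 8;
}
```

A type with size 8 but only 3 bits recorded would pass a check on t->size
alone and fail the bit-count check, which is the case the comment is about.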

> +
> + return 0;
> +}
> +
> +static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key,
> +  struct seq_file *m)
> +{
> + enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
> + struct bpf_cgroup_storage_key *key = _key;
> + struct bpf_cgroup_storage *storage;
> + int cpu;
> +
> + rcu_read_lock();
> + storage = cgroup_storage_lookup(map_to_storage(map), key, false);
> + if (!storage) 

[PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps

2018-12-07 Thread Roman Gushchin
Implement bpffs pretty printing for cgroup local storage maps
(both shared and per-cpu).
Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):

Shared:
  $ cat /sys/fs/bpf/map_2
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {,1039896}

Per-cpu:
  $ cat /sys/fs/bpf/map_1
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {
cpu0: {0,0,0,0,0}
cpu1: {0,0,0,0,0}
cpu2: {1,104,0,0,0}
cpu3: {0,0,0,0,0}
  }

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 include/linux/btf.h| 10 +
 kernel/bpf/local_storage.c | 90 +-
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8c2199b5d250..ac67bc4cbfd9 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -5,6 +5,7 @@
 #define _LINUX_BTF_H 1
 
 #include 
+#include 
 
 struct btf;
 struct btf_type;
@@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
 }
 #endif
 
+static inline const struct btf_type *btf_orig_type(const struct btf *btf,
+  const struct btf_type *t)
+{
+   while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF)
+   t = btf_type_by_id(btf, t->type);
+
+   return t;
+}
+
 #endif
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index b65017dead44..7b51fe1aba3c 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -1,11 +1,13 @@
 //SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(struct bpf_cgroup_storage*, 
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
 
@@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
return -EINVAL;
 }
 
+static int cgroup_storage_check_btf(const struct bpf_map *map,
+   const struct btf *btf,
+   const struct btf_type *key_type,
+   const struct btf_type *value_type)
+{
+   const struct btf_type *st, *t;
+   struct btf_member *m;
+
+   /* Key is expected to be of struct bpf_cgroup_storage_key type,
+* which is:
+* struct bpf_cgroup_storage_key {
+*  __u64   cgroup_inode_id;
+*  __u32   attach_type;
+* };
+*/
+
+   /*
+* Key_type must be a structure (or a typedef of a structure) with
+* two members.
+*/
+   st = btf_orig_type(btf, key_type);
+   if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT ||
+   BTF_INFO_VLEN(st->info) != 2)
+   return -EINVAL;
+
+   /*
+* The first field must be a 64 bit integer at 0 offset.
+*/
+   m = (struct btf_member *)(st + 1);
+   t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
+   if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset ||
+   t->size !=
+   FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id))
+   return -EINVAL;
+
+   /*
+* The second field must be a 32 bit integer at 64 bit offset.
+*/
+   m = m + 1;
+   t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
+   if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT ||
+   m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) *
+   BITS_PER_BYTE || t->size !=
+   FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type))
+   return -EINVAL;
+
+   return 0;
+}
+
+static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key,
+struct seq_file *m)
+{
+   enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
+   struct bpf_cgroup_storage_key *key = _key;
+   struct bpf_cgroup_storage *storage;
+   int cpu;
+
+   rcu_read_lock();
+   storage = cgroup_storage_lookup(map_to_storage(map), key, false);
+   if (!storage) {
+   rcu_read_unlock();
+   return;
+   }
+
+   btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+   stype = cgroup_storage_type(map);
+   if (stype == BPF_CGROUP_STORAGE_SHARED) {
+   seq_puts(m, ": ");
+   btf_type_seq_show(map->btf, map->btf_value_type_id,
+ &READ_ONCE(storage->buf)->data[0], m);
+   seq_puts(m, "\n");
+   } else {
+   seq_puts(m, ": {\n");
+   for_each_possible_cpu(cpu) {
+   seq_printf(m, "\tcpu%d: ", cpu);
+   btf_type_seq_show(map->btf, map->btf_value_type_id,
+ per_cpu_ptr(storage->percpu_buf, cpu),
+

[PATCH bpf-next 3/3] selftests/bpf: add btf annotations for cgroup_local_storage maps

2018-12-07 Thread Roman Gushchin
Add btf annotations to cgroup local storage maps (per-cpu and shared)
in the network packet counting example.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 tools/testing/selftests/bpf/netcnt_prog.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/netcnt_prog.c b/tools/testing/selftests/bpf/netcnt_prog.c
index 1198abca1360..9f741e69cebe 100644
--- a/tools/testing/selftests/bpf/netcnt_prog.c
+++ b/tools/testing/selftests/bpf/netcnt_prog.c
@@ -16,12 +16,18 @@ struct bpf_map_def SEC("maps") percpu_netcnt = {
.value_size = sizeof(struct percpu_net_cnt),
 };
 
+BPF_ANNOTATE_KV_PAIR(percpu_netcnt, struct bpf_cgroup_storage_key,
+struct percpu_net_cnt);
+
 struct bpf_map_def SEC("maps") netcnt = {
.type = BPF_MAP_TYPE_CGROUP_STORAGE,
.key_size = sizeof(struct bpf_cgroup_storage_key),
.value_size = sizeof(struct net_cnt),
 };
 
+BPF_ANNOTATE_KV_PAIR(netcnt, struct bpf_cgroup_storage_key,
+struct net_cnt);
+
 SEC("cgroup/skb")
 int bpf_nextcnt(struct __sk_buff *skb)
 {
-- 
2.19.2



[PATCH bpf-next 1/3] bpf: pass struct btf pointer to the map_check_btf() callback

2018-12-07 Thread Roman Gushchin
If key_type or value_type are of non-trivial data types
(e.g. structure or typedef), it's not possible to check them without
the additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass btf pointer to the map_check_btf() callbacks.
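
Why the btf pointer is needed can be sketched with a small userspace model:
a typedef only records the id of the type it aliases, so resolving it
requires a lookup in the type table owned by the btf object. Every name
below is hypothetical and only loosely mirrors the kernel structures.

```c
#include <stddef.h>
#include <stdint.h>

enum kind_sketch { KIND_INT, KIND_STRUCT, KIND_TYPEDEF };

struct type_sketch {
	enum kind_sketch kind;
	uint32_t type; /* for KIND_TYPEDEF: id of the aliased type */
};

/* Stand-in for struct btf: it owns the table that ids index into. */
struct btf_sketch {
	const struct type_sketch *types;
	uint32_t nr_types;
};

/* Look a type up by id; returns NULL for an out-of-range id. */
static const struct type_sketch *type_by_id(const struct btf_sketch *btf,
					    uint32_t id)
{
	return id < btf->nr_types ? &btf->types[id] : NULL;
}

/* Follow typedefs until a non-typedef type (or NULL) is reached.
 * Assumes the table has no typedef cycles.
 */
static const struct type_sketch *orig_type(const struct btf_sketch *btf,
					   const struct type_sketch *t)
{
	while (t && t->kind == KIND_TYPEDEF)
		t = type_by_id(btf, t->type);
	return t;
}
```

Without the table there is no way to get from the typedef to the struct it
names, so a callback that receives only key_type and value_type cannot
inspect the underlying layout.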

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h   | 3 +++
 kernel/bpf/arraymap.c | 1 +
 kernel/bpf/lpm_trie.c | 1 +
 kernel/bpf/syscall.c  | 3 ++-
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..128d93540b23 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@ struct bpf_prog;
 struct bpf_map;
 struct sock;
 struct seq_file;
+struct btf;
 struct btf_type;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
@@ -52,6 +53,7 @@ struct bpf_map_ops {
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
int (*map_check_btf)(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type);
 };
@@ -126,6 +128,7 @@ static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 }
 
 int map_check_no_btf(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type);
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 24583da9ffd1..25632a75d630 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -382,6 +382,7 @@ static void percpu_array_map_seq_show_elem(struct bpf_map *map, void *key,
 }
 
 static int array_map_check_btf(const struct bpf_map *map,
+  const struct btf *btf,
   const struct btf_type *key_type,
   const struct btf_type *value_type)
 {
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index bfd4882e1106..abf1002080df 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -728,6 +728,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
 }
 
 static int trie_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
  const struct btf_type *key_type,
  const struct btf_type *value_type)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index aa05aa38f4a8..7c2e8ab03a34 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -456,6 +456,7 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 }
 
 int map_check_no_btf(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type)
 {
@@ -478,7 +479,7 @@ static int map_check_btf(const struct bpf_map *map, const struct btf *btf,
return -EINVAL;
 
if (map->ops->map_check_btf)
-   ret = map->ops->map_check_btf(map, key_type, value_type);
+   ret = map->ops->map_check_btf(map, btf, key_type, value_type);
 
return ret;
 }
-- 
2.19.2



RE: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Tristram.Ha
> - padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
> + padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb->len;

The requirement is that the tail tag be at the end of the frame, before the
FCS.  When the length is less than 60 the MAC controller will pad the frame
to the legal size.  That is why this function makes sure the padding is done
manually.  Increasing the size just increases the length of the frame and the
chance of having to allocate a new socket buffer.

For example, with a ping size of 18 the sizes of the request and the response
will differ by 4 bytes.  Not that it matters much.

Are you concerned the MAC controller will remove the VLAN tag and so the frame
will not be sent? Or the switch removes the VLAN tag and is not able to send?
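
The padding arithmetic under discussion can be sketched as below. The two
constants use the values of ETH_ZLEN (60) and VLAN_ETH_ZLEN (64) from the
kernel headers; the helper name is hypothetical and the function only
models the padlen expression from the patch, not the full tag routine.

```c
/* Minimum frame lengths, excluding the FCS. Values copied from the
 * kernel's ETH_ZLEN and VLAN_ETH_ZLEN; renamed to keep the sketch local.
 */
#define ETH_ZLEN_SKETCH      60
#define VLAN_ETH_ZLEN_SKETCH 64

/* Number of pad bytes needed to bring skb_len up to min_len. */
static unsigned int tail_tag_padlen(unsigned int skb_len,
				    unsigned int min_len)
{
	return skb_len >= min_len ? 0 : min_len - skb_len;
}
```

Assuming IPv4 without options, a ping with an 18-byte payload gives a
14 + 20 + 8 + 18 = 60 byte frame. That frame needs no padding against
ETH_ZLEN but 4 bytes against VLAN_ETH_ZLEN, which matches the 4-byte
difference mentioned above.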



Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread Maciej Żenczykowski
> This doesn't apply to the current net tree.
>
> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>
> Thank you.

Interesting... this patch was on top of net-next/master, and it still
rebases cleanly on current net-next/master.

Would you like it on net/master instead?  It indeed doesn't apply
cleanly there...


[PATCH bpf-next 1/7] bpf: Add bpf_line_info support

2018-12-07 Thread Martin KaFai Lau
This patch adds bpf_line_info support.

It accepts an array of bpf_line_info objects during BPF_PROG_LOAD.
The "line_info", "line_info_cnt" and "line_info_rec_size" are added
to the "union bpf_attr".  The "line_info_rec_size" makes
bpf_line_info extensible in the future.

The new "check_btf_line()" ensures the userspace line_info is valid
for the kernel to use.

When the verifier is translating/patching the bpf_prog (through
"bpf_patch_insn_single()"), the line_infos' insn_off is also
adjusted by the newly added "bpf_adj_linfo()".

If the bpf_prog is jited, this patch also provides the jited addrs (in
aux->jited_linfo) for the corresponding line_info.insn_off.
"bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo.
It is currently called by the x86 jit.  Other jits can also use
"bpf_prog_fill_jited_linfo()" and it will be done in the followup patches.
In the future, if it is deemed necessary, a particular jit could also provide
its own "bpf_prog_fill_jited_linfo()" implementation.

A few "*line_info*" fields are added to the bpf_prog_info such
that the user can get the xlated line_info back (i.e. the line_info
with its insn_off reflecting the translated prog).  The jited_line_info
is available if the prog is jited.  It is an array of __u64.
If the prog is not jited, jited_line_info_cnt is 0.

The verifier's verbose log with line_info will be done in
a follow up patch.
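
The insn_off adjustment described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
kernel's bpf_adj_linfo(): the struct carries only insn_off, and the helper
name and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for struct bpf_line_info: only the field needed here. */
struct line_info_sketch {
	uint32_t insn_off;
};

/* When the instruction at offset `off` is patched into a longer sequence
 * that adds `delta` instructions, every record pointing past `off` has to
 * move by `delta` so that it still refers to the same source line.
 */
static void adj_linfo_sketch(struct line_info_sketch *linfo, size_t nr,
			     uint32_t off, uint32_t delta)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (linfo[i].insn_off > off)
			linfo[i].insn_off += delta;
}
```

Records at or before the patched offset keep their insn_off, so a record
attached to the patched instruction itself continues to point at the start
of the replacement sequence.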

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 arch/x86/net/bpf_jit_comp.c  |   2 +
 include/linux/bpf.h  |  21 
 include/linux/bpf_verifier.h |   1 +
 include/linux/btf.h  |   1 +
 include/linux/filter.h   |   7 ++
 include/uapi/linux/bpf.h |  19 
 kernel/bpf/btf.c |   2 +-
 kernel/bpf/core.c| 118 -
 kernel/bpf/syscall.c |  83 +--
 kernel/bpf/verifier.c| 198 ++-
 10 files changed, 419 insertions(+), 33 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 2580cd2e98b1..5542303c43d9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1181,6 +1181,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
}
 
if (!image || !prog->is_func || extra_pass) {
+   if (image)
+   bpf_prog_fill_jited_linfo(prog, addrs);
 out_addrs:
kfree(addrs);
kfree(jit_data);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..0c992b86eb2c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -319,7 +319,28 @@ struct bpf_prog_aux {
struct bpf_prog_offload *offload;
struct btf *btf;
struct bpf_func_info *func_info;
+   /* bpf_line_info loaded from userspace.  linfo->insn_off
+* has the xlated insn offset.
+* Both the main and sub prog share the same linfo.
+* The subprog can access its first linfo by
+* using the linfo_idx.
+*/
+   struct bpf_line_info *linfo;
+   /* jited_linfo is the jited addr of the linfo.  It has a
+* one to one mapping to linfo:
+* jited_linfo[i] is the jited addr for the linfo[i]->insn_off.
+* Both the main and sub prog share the same jited_linfo.
+* The subprog can access its first jited_linfo by
+* using the linfo_idx.
+*/
+   void **jited_linfo;
u32 func_info_cnt;
+   u32 nr_linfo;
+   /* subprog can use linfo_idx to access its first linfo and
+* jited_linfo.
+* main prog always has linfo_idx == 0
+*/
+   u32 linfo_idx;
union {
struct work_struct work;
struct rcu_head rcu;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 11f5df1092d9..c736945be7c5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -203,6 +203,7 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 
 struct bpf_subprog_info {
u32 start; /* insn idx of function entry point */
+   u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
u16 stack_depth; /* max. stack depth used by this function */
 };
 
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8c2199b5d250..b98405a56383 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -46,6 +46,7 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
   struct seq_file *m);
 int btf_get_fd_by_id(u32 id);
 u32 btf_id(const struct btf *btf);
+bool btf_name_offset_valid(const struct btf *btf, u32 offset);
 
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d16deead65c6..29f21f9d7f68 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -718,6 +718,13 @@ void bpf_prog_free(struct bpf_prog *

[PATCH bpf-next 6/7] bpf: libbpf: Add btf_line_info support to libbpf

2018-12-07 Thread Martin KaFai Lau
This patch adds bpf_line_info support to libbpf:
1) Parsing the line_info sec from ".BTF.ext"
2) Relocating the line_info.  If the main prog *_info relocation
   fails, it will ignore the remaining subprog line_info and continue.
   If the subprog *_info relocation fails, it will bail out.
3) BPF_PROG_LOAD a prog with line_info
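
The record-size handling used when retrying BPF_PROG_LOAD (the
alloc_zero_tailing_info() helper in the diff below) can be sketched as a
standalone userspace function. The function name is hypothetical, and the
early parameter check is an addition of this sketch rather than something
the patch does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy cnt records of actual_rec_size bytes each, keeping only the first
 * expected_rec_size bytes of every record and zeroing the rest, so that a
 * kernel knowing only the smaller layout sees no unknown non-zero bytes.
 * Returns a malloc()ed buffer the caller must free(), or NULL on error.
 */
static void *zero_record_tails(const void *orecord, uint32_t cnt,
			       uint32_t actual_rec_size,
			       uint32_t expected_rec_size)
{
	const unsigned char *src = orecord;
	unsigned char *info, *dst;
	uint32_t i;

	/* Sketch-only sanity check: nothing to do, or sizes inverted. */
	if (!cnt || expected_rec_size > actual_rec_size)
		return NULL;

	info = malloc((size_t)actual_rec_size * cnt);
	if (!info)
		return NULL;

	dst = info;
	for (i = 0; i < cnt; i++) {
		memcpy(dst, src, expected_rec_size);
		memset(dst + expected_rec_size, 0,
		       actual_rec_size - expected_rec_size);
		src += actual_rec_size;
		dst += actual_rec_size;
	}

	return info;
}
```

The stride stays at actual_rec_size, so the caller can keep passing the
original record size to the kernel while the unknown tail bytes read as
zero.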

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/bpf.c|  86 +++--
 tools/lib/bpf/bpf.h|   3 +
 tools/lib/bpf/btf.c| 209 +
 tools/lib/bpf/btf.h|  10 +-
 tools/lib/bpf/libbpf.c |  20 
 5 files changed, 239 insertions(+), 89 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9fbbc0ed5952..3caaa3428774 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -173,11 +173,36 @@ int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
  -1);
 }
 
+static void *
+alloc_zero_tailing_info(const void *orecord, __u32 cnt,
+   __u32 actual_rec_size, __u32 expected_rec_size)
+{
+   __u64 info_len = actual_rec_size * cnt;
+   void *info, *nrecord;
+   int i;
+
+   info = malloc(info_len);
+   if (!info)
+   return NULL;
+
+   /* zero out bytes kernel does not understand */
+   nrecord = info;
+   for (i = 0; i < cnt; i++) {
+   memcpy(nrecord, orecord, expected_rec_size);
+   memset(nrecord + expected_rec_size, 0,
+  actual_rec_size - expected_rec_size);
+   orecord += actual_rec_size;
+   nrecord += actual_rec_size;
+   }
+
+   return info;
+}
+
 int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
   char *log_buf, size_t log_buf_sz)
 {
+   void *finfo = NULL, *linfo = NULL;
union bpf_attr attr;
-   void *finfo = NULL;
__u32 name_len;
int fd;
 
@@ -201,6 +226,9 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
attr.func_info_rec_size = load_attr->func_info_rec_size;
attr.func_info_cnt = load_attr->func_info_cnt;
attr.func_info = ptr_to_u64(load_attr->func_info);
+   attr.line_info_rec_size = load_attr->line_info_rec_size;
+   attr.line_info_cnt = load_attr->line_info_cnt;
+   attr.line_info = ptr_to_u64(load_attr->line_info);
memcpy(attr.prog_name, load_attr->name,
   min(name_len, BPF_OBJ_NAME_LEN - 1));
 
@@ -212,36 +240,35 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 * to give user space a hint how to deal with loading failure.
 * Check to see whether we can make some changes and load again.
 */
-   if (errno == E2BIG && attr.func_info_cnt &&
-   attr.func_info_rec_size < load_attr->func_info_rec_size) {
-   __u32 actual_rec_size = load_attr->func_info_rec_size;
-   __u32 expected_rec_size = attr.func_info_rec_size;
-   __u32 finfo_cnt = load_attr->func_info_cnt;
-   __u64 finfo_len = actual_rec_size * finfo_cnt;
-   const void *orecord;
-   void *nrecord;
-   int i;
-
-   finfo = malloc(finfo_len);
-   if (!finfo)
-   /* further try with log buffer won't help */
-   return fd;
-
-   /* zero out bytes kernel does not understand */
-   orecord = load_attr->func_info;
-   nrecord = finfo;
-   for (i = 0; i < load_attr->func_info_cnt; i++) {
-   memcpy(nrecord, orecord, expected_rec_size);
-   memset(nrecord + expected_rec_size, 0,
-  actual_rec_size - expected_rec_size);
-   orecord += actual_rec_size;
-   nrecord += actual_rec_size;
+   while (errno == E2BIG && (!finfo || !linfo)) {
+   if (!finfo && attr.func_info_cnt &&
+   attr.func_info_rec_size < load_attr->func_info_rec_size) {
+   /* try with corrected func info records */
+   finfo = alloc_zero_tailing_info(load_attr->func_info,
+   load_attr->func_info_cnt,
+   load_attr->func_info_rec_size,
+   attr.func_info_rec_size);
+   if (!finfo)
+   goto done;
+
+   attr.func_info = ptr_to_u64(finfo);
+   attr.func_info_rec_size = load_attr->func_info_rec_size;
+   } else if (!linfo && attr.line_info_cnt &&
+  attr.line_info_rec_size <
+  load_attr->line_info_rec_size) {
+   linfo = alloc_zero_tailing_info(

[PATCH bpf-next 3/7] bpf: Refactor and bug fix in test_func_type in test_btf.c

2018-12-07 Thread Martin KaFai Lau
1) bpf_load_program_xattr() is absorbing the E2BIG error
   which makes testing this case impossible.  It is replaced
   with a direct syscall(__NR_bpf, BPF_PROG_LOAD,...).
2) The test_func_type() is renamed to test_info_raw() to
   prepare for the new line_info test in the next patch.
3) The bpf_obj_get_info_by_fd() testing for func_info
   is refactored to test_get_finfo().  A new
   test_get_linfo() will be added in the next patch
   for testing line_info purpose.
4) The test->func_info_cnt is checked instead of
   a static value "2".
5) Remove unnecessary "\n" in error message.
6) Adding back info_raw_test_num to the cmd arg such
   that a specific test case can be tested, like
   all other existing tests.

7) Fix a bug in handling expected_prog_load_failure.
   A test could pass even if prog_fd != -1 while
   expected_prog_load_failure is true.
8) The min rec_size check should be < 8 instead of < 4.
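
The intent of item 7 can be sketched as a single predicate: the outcome of
the load must match the expectation in both directions. The function below
is a hypothetical illustration and is not the code in test_btf.c.

```c
#include <stdbool.h>

/* Returns 0 when the load outcome matches the expectation, -1 otherwise.
 * A load that succeeds (prog_fd != -1) while a failure was expected is
 * reported as an error, which is the case the old check let through.
 */
static int check_load_result(int prog_fd, bool expected_failure)
{
	bool failed = prog_fd == -1;

	return failed == expected_failure ? 0 : -1;
}
```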

Fixes: 4798c4ba3ba9 ("tools/bpf: extends test_btf to test load/retrieve func_type info")
Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_btf.c | 211 +++--
 1 file changed, 125 insertions(+), 86 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index ff0952ea757a..8d5777c89620 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -114,12 +115,13 @@ static struct args {
unsigned int raw_test_num;
unsigned int file_test_num;
unsigned int get_info_test_num;
+   unsigned int info_raw_test_num;
bool raw_test;
bool file_test;
bool get_info_test;
bool pprint_test;
bool always_log;
-   bool func_type_test;
+   bool info_raw_test;
 } args;
 
 static char btf_log_buf[BTF_LOG_BUF_SIZE];
@@ -3051,7 +3053,7 @@ static int test_pprint(void)
return err;
 }
 
-static struct btf_func_type_test {
+static struct prog_info_raw_test {
const char *descr;
const char *str_sec;
__u32 raw_types[MAX_NR_RAW_TYPES];
@@ -3062,7 +3064,7 @@ static struct btf_func_type_test {
__u32 func_info_rec_size;
__u32 func_info_cnt;
bool expected_prog_load_failure;
-} func_type_test[] = {
+} info_raw_tests[] = {
 {
.descr = "func_type (main func + one sub)",
.raw_types = {
@@ -3198,90 +3200,44 @@ static size_t probe_prog_length(const struct bpf_insn *fp)
return len + 1;
 }
 
-static int do_test_func_type(int test_num)
+static int test_get_finfo(const struct prog_info_raw_test *test,
+ int prog_fd)
 {
-   const struct btf_func_type_test *test = &func_type_test[test_num];
-   unsigned int raw_btf_size, info_len, rec_size;
-   int i, btf_fd = -1, prog_fd = -1, err = 0;
-   struct bpf_load_program_attr attr = {};
-   void *raw_btf, *func_info = NULL;
struct bpf_prog_info info = {};
struct bpf_func_info *finfo;
-
-   fprintf(stderr, "%s..", test->descr);
-   raw_btf = btf_raw_create(&hdr_tmpl, test->raw_types,
-test->str_sec, test->str_sec_size,
-&raw_btf_size);
-
-   if (!raw_btf)
-   return -1;
-
-   *btf_log_buf = '\0';
-   btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
- btf_log_buf, BTF_LOG_BUF_SIZE,
- args.always_log);
-   free(raw_btf);
-
-   if (CHECK(btf_fd == -1, "invalid btf_fd errno:%d", errno)) {
-   err = -1;
-   goto done;
-   }
-
-   if (*btf_log_buf && args.always_log)
-   fprintf(stderr, "\n%s", btf_log_buf);
-
-   attr.prog_type = test->prog_type;
-   attr.insns = test->insns;
-   attr.insns_cnt = probe_prog_length(attr.insns);
-   attr.license = "GPL";
-   attr.prog_btf_fd = btf_fd;
-   attr.func_info_rec_size = test->func_info_rec_size;
-   attr.func_info_cnt = test->func_info_cnt;
-   attr.func_info = test->func_info;
-
-   *btf_log_buf = '\0';
-   prog_fd = bpf_load_program_xattr(&attr, btf_log_buf,
-BTF_LOG_BUF_SIZE);
-   if (test->expected_prog_load_failure && prog_fd == -1) {
-   err = 0;
-   goto done;
-   }
-   if (CHECK(prog_fd == -1, "invalid prog_id errno:%d", errno)) {
-   fprintf(stderr, "%s\n", btf_log_buf);
-   err = -1;
-   goto done;
-   }
+   __u32 info_len, rec_size, i;
+   void *func_info = NULL;
+   int err;
 
/* get necessary lens */
info_len = sizeof(struct bpf_prog_info);
err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
if (CHECK(err == -1, "invalid get info (1st) errno:%d", errno)) {
fprintf(stderr, "%s\n", btf_log_b

[PATCH bpf-next 0/7] Introduce bpf_line_info

2018-12-07 Thread Martin KaFai Lau
This patch series introduces the bpf_line_info.  Please see individual patch
for details.

It will be useful for introspection purpose, like:

[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
[...]
int test_long_fname_2(struct dummy_tracepoint_args * arg):
bpf_prog_44a040bf25481309_test_long_fname_2:
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x30,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   xor%esi,%esi
; int key = 0;
  27:   mov%esi,-0x4(%rbp)
; if (!arg->sock)
  2a:   mov0x8(%rdi),%rdi
; if (!arg->sock)
  2e:   cmp$0x0,%rdi
  32:   je 0x0070
  34:   mov%rbp,%rsi
; counts = bpf_map_lookup_elem(&btf_map, &key);
  37:   add$0xfffc,%rsi
  3b:   movabs $0x8881139d7480,%rdi
  45:   add$0x110,%rdi
  4c:   mov0x0(%rsi),%eax
  4f:   cmp$0x4,%rax
  53:   jae0x005e
  55:   shl$0x3,%rax
  59:   add%rdi,%rax
  5c:   jmp0x0060
  5e:   xor%eax,%eax
; if (!counts)
  60:   cmp$0x0,%rax
  64:   je 0x0070
; counts->v6++;
  66:   mov0x4(%rax),%edi
  69:   add$0x1,%rdi
  6d:   mov%edi,0x4(%rax)
  70:   mov0x0(%rbp),%rbx
  74:   mov0x8(%rbp),%r13
  78:   mov0x10(%rbp),%r14
  7c:   mov0x18(%rbp),%r15
  80:   add$0x28,%rbp
  84:   leaveq
  85:   retq
[...]

Martin KaFai Lau (7):
  bpf: Add bpf_line_info support
  bpf: tools: Sync uapi bpf.h
  bpf: Refactor and bug fix in test_func_type in test_btf.c
  bpf: Add unit tests for bpf_line_info
  bpf: libbpf: Refactor and bug fix on the bpf_func_info loading logic
  bpf: libbpf: Add btf_line_info support to libbpf
  bpf: libbpf: bpftool: Print bpf_line_info during prog dump

 arch/x86/net/bpf_jit_comp.c   |   2 +
 include/linux/bpf.h   |  21 +
 include/linux/bpf_verifier.h  |   1 +
 include/linux/btf.h   |   1 +
 include/linux/filter.h|   7 +
 include/uapi/linux/bpf.h  |  19 +
 kernel/bpf/btf.c  |   2 +-
 kernel/bpf/core.c | 118 ++-
 kernel/bpf/syscall.c  |  83 +-
 kernel/bpf/verifier.c | 198 -
 .../bpftool/Documentation/bpftool-prog.rst|  16 +-
 tools/bpf/bpftool/bash-completion/bpftool |   6 +-
 tools/bpf/bpftool/btf_dumper.c|  64 ++
 tools/bpf/bpftool/jit_disasm.c|  23 +-
 tools/bpf/bpftool/main.h  |  23 +-
 tools/bpf/bpftool/prog.c  | 100 ++-
 tools/bpf/bpftool/xlated_dumper.c |  30 +-
 tools/bpf/bpftool/xlated_dumper.h |   7 +-
 tools/include/uapi/linux/bpf.h|  19 +
 tools/lib/bpf/Build   |   2 +-
 tools/lib/bpf/bpf.c   |  93 ++-
 tools/lib/bpf/bpf.h   |   3 +
 tools/lib/bpf/bpf_prog_linfo.c| 253 ++
 tools/lib/bpf/btf.c   | 342 
 tools/lib/bpf/btf.h   |  25 +-
 tools/lib/bpf/libbpf.c| 159 +++-
 tools/lib/bpf/libbpf.h|  13 +
 tools/lib/bpf/libbpf.map  |   4 +
 tools/testing/selftests/bpf/test_btf.c| 790 +++---
 29 files changed, 2036 insertions(+), 388 deletions(-)
 create mode 100644 tools/lib/bpf/bpf_prog_linfo.c

-- 
2.17.1



[PATCH bpf-next 7/7] bpf: libbpf: bpftool: Print bpf_line_info during prog dump

2018-12-07 Thread Martin KaFai Lau
This patch adds bpf_line_info printing to 'prog dump jited'
and 'prog dump xlated':

[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
[...]
int test_long_fname_2(struct dummy_tracepoint_args * arg):
bpf_prog_44a040bf25481309_test_long_fname_2:
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x30,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   xor%esi,%esi
; int key = 0;
  27:   mov%esi,-0x4(%rbp)
; if (!arg->sock)
  2a:   mov0x8(%rdi),%rdi
; if (!arg->sock)
  2e:   cmp$0x0,%rdi
  32:   je 0x0070
  34:   mov%rbp,%rsi
; counts = bpf_map_lookup_elem(&btf_map, &key);
  37:   add$0xfffc,%rsi
  3b:   movabs $0x8881139d7480,%rdi
  45:   add$0x110,%rdi
  4c:   mov0x0(%rsi),%eax
  4f:   cmp$0x4,%rax
  53:   jae0x005e
  55:   shl$0x3,%rax
  59:   add%rdi,%rax
  5c:   jmp0x0060
  5e:   xor%eax,%eax
; if (!counts)
  60:   cmp$0x0,%rax
  64:   je 0x0070
; counts->v6++;
  66:   mov0x4(%rax),%edi
  69:   add$0x1,%rdi
  6d:   mov%edi,0x4(%rax)
  70:   mov0x0(%rbp),%rbx
  74:   mov0x8(%rbp),%r13
  78:   mov0x10(%rbp),%r14
  7c:   mov0x18(%rbp),%r15
  80:   add$0x28,%rbp
  84:   leaveq
  85:   retq
[...]

With linum:
[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv linum
int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
bpf_prog_b07ccb89267cf242__dummy_tracepoint:
; return test_long_fname_1(arg); [file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c line_num:54 line_col:9]
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x28,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   callq  0x851e
; return test_long_fname_1(arg); [file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c line_num:54 line_col:2]
  2a:   xor%eax,%eax
  2c:   mov0x0(%rbp),%rbx
  30:   mov0x8(%rbp),%r13
  34:   mov0x10(%rbp),%r14
  38:   mov0x18(%rbp),%r15
  3c:   add$0x28,%rbp
  40:   leaveq
  41:   retq
[...]

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 .../bpftool/Documentation/bpftool-prog.rst|  16 +-
 tools/bpf/bpftool/bash-completion/bpftool |   6 +-
 tools/bpf/bpftool/btf_dumper.c|  64 +
 tools/bpf/bpftool/jit_disasm.c|  23 +-
 tools/bpf/bpftool/main.h  |  23 +-
 tools/bpf/bpftool/prog.c  | 100 ++-
 tools/bpf/bpftool/xlated_dumper.c |  30 ++-
 tools/bpf/bpftool/xlated_dumper.h |   7 +-
 tools/lib/bpf/Build   |   2 +-
 tools/lib/bpf/bpf_prog_linfo.c| 253 ++
 tools/lib/bpf/libbpf.h|  13 +
 tools/lib/bpf/libbpf.map  |   4 +
 12 files changed, 516 insertions(+), 25 deletions(-)
 create mode 100644 tools/lib/bpf/bpf_prog_linfo.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 5524b6dccd85..7c30731a9b73 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -22,8 +22,8 @@ MAP COMMANDS
 =
 
 |  **bpftool** **prog { show | list }** [*PROG*]
-|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual**}]
-|  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
+|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual** | **linum**}]
+|  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes** | **linum**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
 |  **bpftool** **prog { load | loadall }** *OBJ* *PATH* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
 |  **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
@@ -56,7 +56,7 @@ DESCRIPTION
  Output will start with program ID followed by program type and
  zero or more named attributes (depending on kernel version).
 
-   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** | **visual** }]
+   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** | **visual** | **linum** }]
  Dump eBPF instructions of the program from the kernel. By
   

[PATCH bpf-next 2/7] bpf: tools: Sync uapi bpf.h

2018-12-07 Thread Martin KaFai Lau
Sync uapi bpf.h to tools/include/uapi/linux for
the new bpf_line_info.

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/include/uapi/linux/bpf.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16263e8827fc..7973c28b24a0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -356,6 +356,9 @@ union bpf_attr {
__u32   func_info_rec_size; /* userspace bpf_func_info size */
__aligned_u64   func_info;  /* func info */
__u32   func_info_cnt;  /* number of bpf_func_info records */
+   __u32   line_info_rec_size; /* userspace bpf_line_info size */
+   __aligned_u64   line_info;  /* line info */
+   __u32   line_info_cnt;  /* number of bpf_line_info records */
};
 
struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -2679,6 +2682,12 @@ struct bpf_prog_info {
__u32 func_info_rec_size;
__aligned_u64 func_info;
__u32 func_info_cnt;
+   __u32 line_info_cnt;
+   __aligned_u64 line_info;
+   __aligned_u64 jited_line_info;
+   __u32 jited_line_info_cnt;
+   __u32 line_info_rec_size;
+   __u32 jited_line_info_rec_size;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -2995,4 +3004,14 @@ struct bpf_func_info {
__u32   type_id;
 };
 
+#define BPF_LINE_INFO_LINE_NUM(line_col)   ((line_col) >> 10)
+#define BPF_LINE_INFO_LINE_COL(line_col)   ((line_col) & 0x3ff)
+
+struct bpf_line_info {
+   __u32   insn_off;
+   __u32   file_name_off;
+   __u32   line_off;
+   __u32   line_col;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.17.1
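
The two macros added in the patch above split line_col into a line number
(upper 22 bits) and a column (lower 10 bits).  A minimal sketch of the round
trip follows; pack_line_col() is a hypothetical helper used only for
illustration and is not part of the uapi header:

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch above. */
#define BPF_LINE_INFO_LINE_NUM(line_col)	((line_col) >> 10)
#define BPF_LINE_INFO_LINE_COL(line_col)	((line_col) & 0x3ff)

/* Hypothetical helper (not in bpf.h): build a line_col value. */
static uint32_t pack_line_col(uint32_t line_num, uint32_t col)
{
	assert(line_num < (1u << 22));	/* 22 bits for the line number */
	assert(col < (1u << 10));	/* 10 bits for the column */

	return (line_num << 10) | col;
}
```

For the "line_num:54 line_col:9" annotation shown in the linum dump, the
packed value would be (54 << 10) | 9 = 55305, and the two macros recover 54
and 9 from it.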



[PATCH bpf-next 4/7] bpf: Add unit tests for bpf_line_info

2018-12-07 Thread Martin KaFai Lau
Add unit tests for bpf_line_info for both BPF_PROG_LOAD and
BPF_OBJ_GET_INFO_BY_FD.

jit enabled:
[root@arch-fb-vm1 bpf]# ./test_btf -k 0
BTF prog info raw test[5] (line_info (No subprog)): OK
BTF prog info raw test[6] (line_info (No subprog. insn_off >= prog->len)): OK
BTF prog info raw test[7] (line_info (No subprog. zero tailing line_info): OK
BTF prog info raw test[8] (line_info (No subprog. nonzero tailing line_info)): OK
BTF prog info raw test[9] (line_info (subprog)): OK
BTF prog info raw test[10] (line_info (subprog + func_info)): OK
BTF prog info raw test[11] (line_info (subprog. missing 1st func line info)): OK
BTF prog info raw test[12] (line_info (subprog. missing 2nd func line info)): OK
BTF prog info raw test[13] (line_info (subprog. unordered insn offset)): OK

jit disabled:
BTF prog info raw test[5] (line_info (No subprog)): not jited. skipping jited_line_info check. OK
BTF prog info raw test[6] (line_info (No subprog. insn_off >= prog->len)): OK
BTF prog info raw test[7] (line_info (No subprog. zero tailing line_info): not jited. skipping jited_line_info check. OK
BTF prog info raw test[8] (line_info (No subprog. nonzero tailing line_info)): OK
BTF prog info raw test[9] (line_info (subprog)): not jited. skipping jited_line_info check. OK
BTF prog info raw test[10] (line_info (subprog + func_info)): not jited. skipping jited_line_info check. OK
BTF prog info raw test[11] (line_info (subprog. missing 1st func line info)): OK
BTF prog info raw test[12] (line_info (subprog. missing 2nd func line info)): OK
BTF prog info raw test[13] (line_info (subprog. unordered insn offset)): OK

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_btf.c | 597 -
 1 file changed, 580 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index 8d5777c89620..7707273736ac 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -108,7 +108,7 @@ static int __base_pr(const char *format, ...)
 #define BTF_END_RAW 0xdeadbeef
 #define NAME_TBD 0xdeadb33f
 
-#define MAX_NR_RAW_TYPES 1024
+#define MAX_NR_RAW_U32 1024
 #define BTF_LOG_BUF_SIZE 65535
 
 static struct args {
@@ -137,7 +137,7 @@ struct btf_raw_test {
const char *str_sec;
const char *map_name;
const char *err_str;
-   __u32 raw_types[MAX_NR_RAW_TYPES];
+   __u32 raw_types[MAX_NR_RAW_U32];
__u32 str_sec_size;
enum bpf_map_type map_type;
__u32 key_size;
@@ -156,6 +156,9 @@ struct btf_raw_test {
int str_len_delta;
 };
 
+#define BTF_STR_SEC(str) \
+   .str_sec = str, .str_sec_size = sizeof(str)
+
 static struct btf_raw_test raw_tests[] = {
 /* enum E {
  * E0,
@@ -1858,11 +1861,11 @@ static const char *get_next_str(const char *start, const char *end)
return start < end - 1 ? start + 1 : NULL;
 }
 
-static int get_type_sec_size(const __u32 *raw_types)
+static int get_raw_sec_size(const __u32 *raw_types)
 {
int i;
 
-   for (i = MAX_NR_RAW_TYPES - 1;
+   for (i = MAX_NR_RAW_U32 - 1;
 i >= 0 && raw_types[i] != BTF_END_RAW;
 i--)
;
@@ -1874,7 +1877,8 @@ static void *btf_raw_create(const struct btf_header *hdr,
const __u32 *raw_types,
const char *str,
unsigned int str_sec_size,
-   unsigned int *btf_size)
+   unsigned int *btf_size,
+   const char **ret_next_str)
 {
const char *next_str = str, *end_str = str + str_sec_size;
unsigned int size_needed, offset;
@@ -1883,7 +1887,7 @@ static void *btf_raw_create(const struct btf_header *hdr,
uint32_t *ret_types;
void *raw_btf;
 
-   type_sec_size = get_type_sec_size(raw_types);
+   type_sec_size = get_raw_sec_size(raw_types);
if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
return NULL;
 
@@ -1922,6 +1926,8 @@ static void *btf_raw_create(const struct btf_header *hdr,
ret_hdr->str_len = str_sec_size;
 
*btf_size = size_needed;
+   if (ret_next_str)
+   *ret_next_str = next_str;
 
return raw_btf;
 }
@@ -1941,7 +1947,7 @@ static int do_test_raw(unsigned int test_num)
 test->raw_types,
 test->str_sec,
 test->str_sec_size,
-&raw_btf_size);
+&raw_btf_size, NULL);
 
if (!raw_btf)
return -1;
@@ -2018,7 +2024,7 @@ static int test_raw(void)
 struct btf_get_info_test {
const char *descr;
const char *str_sec;
-   __u32 raw_types[MAX_NR_RAW_TYPES];
+   __u32 raw_types[MAX_NR_RAW_U32];
__u32 str_sec_size;
int btf_size_delta;

[PATCH bpf-next 5/7] bpf: libbpf: Refactor and bug fix on the bpf_func_info loading logic

2018-12-07 Thread Martin KaFai Lau
This patch refactors libbpf's bpf_func_info loading logic and fixes a bug
in it.  Both the bug fix and the refactoring target the same
commit 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections"),
which is in the bpf-next branch.

1) In bpf_load_program_xattr(), it should retry when errno == E2BIG
   regardless of log_buf and log_buf_sz.  This patch fixes it.

2) btf_ext__reloc_init() and btf_ext__reloc() are essentially
   the same except btf_ext__reloc_init() always has insns_cnt == 0.
   Hence, btf_ext__reloc_init() is removed.

   btf_ext__reloc() is also renamed to btf_ext__reloc_func_info()
   to get ready for the line_info support in the next patch.

3) Consolidate func_info section logic from "btf_ext_parse_hdr()",
   "btf_ext_validate_func_info()" and "btf_ext__new()" to
   a new function "btf_ext_copy_func_info()" such that similar
   logic can be reused by the later libbpf's line_info patch.

4) The next line_info patch will store line_info_cnt instead of
   line_info_len in the bpf_program because the kernel also takes
   line_info_cnt.  It will save a few "len" to "cnt" conversions
   and will also save some function args.

   Hence, this patch also makes bpf_program store func_info_cnt
   instead of func_info_len.

5) btf_ext depends on btf.  e.g. the func_info's type_id
   in ".BTF.ext" is not useful when ".BTF" is absent.
   This patch only initializes the obj->btf_ext pointer after
   obj->btf has been initialized successfully.

   This can avoid always checking "obj->btf && obj->btf_ext"
   together for accessing ".BTF.ext".  Checking "obj->btf_ext"
   alone will do.

6) Move "struct btf_sec_func_info" from btf.h to btf.c.
   There is no external usage outside btf.c.
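
The control flow described in item 1 can be sketched as below.  This is a
simplified model and not the libbpf code: load_fn and stub_load_e2big_once()
are hypothetical stand-ins for sys_bpf(BPF_PROG_LOAD, ...), and the attribute
adjustment that the real function performs before retrying is elided:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for sys_bpf(BPF_PROG_LOAD, ...). */
typedef int (*load_fn)(int attempt, int with_log);

/* Sketch of the flow after the fix.  Returns an fd, or -1 on failure. */
static int load_with_retry(load_fn load, char *log_buf, size_t log_buf_sz)
{
	int attempt = 0;
	int fd;

	assert(load != NULL);

	fd = load(attempt++, 0);
	if (fd >= 0)
		return fd;

	/* Retry on E2BIG even when the caller passed no log buffer.
	 * (The real code adjusts attr before retrying; elided here.)
	 */
	if (errno == E2BIG) {
		fd = load(attempt++, 0);
		if (fd >= 0)
			return fd;
	}

	/* Only the final "try again with log" step needs a log buffer. */
	if (!log_buf || !log_buf_sz)
		return fd;

	return load(attempt, 1);
}

/* Example stub: fails once with E2BIG, then succeeds with fd 3. */
static int stub_load_e2big_once(int attempt, int with_log)
{
	(void)with_log;
	if (attempt == 0) {
		errno = E2BIG;
		return -1;
	}
	return 3;
}
```

Before the fix, a caller without a log buffer returned right after the first
failed attempt, so the E2BIG retry was never reached.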

Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/bpf.c|   7 +-
 tools/lib/bpf/btf.c| 191 -
 tools/lib/bpf/btf.h|  17 +---
 tools/lib/bpf/libbpf.c | 139 --
 4 files changed, 177 insertions(+), 177 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5c3be06bf0dd..9fbbc0ed5952 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -205,7 +205,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
   min(name_len, BPF_OBJ_NAME_LEN - 1));
 
fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
-   if (fd >= 0 || !log_buf || !log_buf_sz)
+   if (fd >= 0)
return fd;
 
/* After bpf_prog_load, the kernel may modify certain attributes
@@ -244,10 +244,13 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 
fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 
-   if (fd >= 0 || !log_buf || !log_buf_sz)
+   if (fd >= 0)
goto done;
}
 
+   if (!log_buf || !log_buf_sz)
+   goto done;
+
/* Try again with log */
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 85d6446cf832..aa4fa02b13fc 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -43,6 +43,13 @@ struct btf_ext {
__u32 func_info_len;
 };
 
+struct btf_sec_func_info {
+   __u32   sec_name_off;
+   __u32   num_func_info;
+   /* Followed by num_func_info number of bpf func_info records */
+   __u8data[0];
+};
+
 /* The minimum bpf_func_info checked by the loader */
 struct bpf_func_info_min {
__u32   insn_off;
@@ -479,41 +486,66 @@ int btf__get_from_id(__u32 id, struct btf **btf)
return err;
 }
 
-static int btf_ext_validate_func_info(const void *finfo, __u32 size,
- btf_print_fn_t err_log)
+static int btf_ext_copy_func_info(struct btf_ext *btf_ext,
+ __u8 *data, __u32 data_size,
+ btf_print_fn_t err_log)
 {
-   int sec_hdrlen = sizeof(struct btf_sec_func_info);
-   __u32 size_left, num_records, record_size;
+   const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
const struct btf_sec_func_info *sinfo;
-   __u64 total_record_size;
+   __u32 info_left, record_size;
+   /* The start of the info sec (including the __u32 record_size). */
+   const void *info;
+
+   /* data and data_size do not include btf_ext_header from now on */
+   data = data + hdr->hdr_len;
+   data_size -= hdr->hdr_len;
+
+   if (hdr->func_info_off & 0x03) {
+   elog("BTF.ext func_info section is not aligned to 4 bytes\n");
+   return -EINVAL;
+   }
+
+   if (data_size < hdr->func_info_off ||
+   hdr->func_info_len > data_size - hdr->func_info_off) {
+   elog("func_info section (off:%u len:%u) is beyond the end of 
the ELF section .BTF.ext\n",
+  

Re: [PATCH net-next 0/4] tc-testing: implement command timeouts and better results tracking

2018-12-07 Thread David Miller
From: Lucas Bates 
Date: Thu,  6 Dec 2018 17:42:23 -0500

> Patch 1 adds a timeout feature for any command tdc launches in a subshell.
> This prevents tdc from hanging indefinitely.
> 
> Patches 2-4 introduce a new method for tracking and generating test case
> results, and implements it across the core script and all applicable
> plugins.

Series applied.


Re: [PATCH net v2 0/2] Fix slab out-of-bounds on insufficient headroom for IPv6 packets

2018-12-07 Thread David Miller
From: Stefano Brivio 
Date: Thu,  6 Dec 2018 19:30:35 +0100

> Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
> IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.
> 
> Patch 2/2 makes sure we avoid writing before the allocated buffer in
> neigh_hh_output() in case the headroom is enough for the unaligned hardware
> header size, but not enough for the aligned one, and that we warn if we hit
> this condition.

Series applied and queued up for -stable, thanks.


Re: [PATCH net] tcp: lack of available data can also cause TSO defer

2018-12-07 Thread David Miller
From: Eric Dumazet 
Date: Thu,  6 Dec 2018 09:58:24 -0800

> tcp_tso_should_defer() can return true in three different cases :
> 
>  1) We are cwnd-limited
>  2) We are rwnd-limited
>  3) We are application limited.
> 
> Neal pointed out that my recent fix went too far, since
> it assumed that if we were not in 1) case, we must be rwnd-limited
> 
> Fix this by properly populating the is_cwnd_limited and
> is_rwnd_limited booleans.
> 
> After this change, we can finally move the silly check for FIN
> flag only for the application-limited case.
> 
> The same move for EOR bit will be handled in net-next,
> since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
> with eor mark (MSG_EOR)") is scheduled for linux-4.21
> 
> Tested by running 200 concurrent netperf -t TCP_RR -- -r 6,100
> and checking none of them was rwnd_limited in the chrono_stat
> output from "ss -ti" command.
> 
> Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
> Signed-off-by: Eric Dumazet 
> Suggested-by: Neal Cardwell 
> Reviewed-by: Neal Cardwell 
> Acked-by: Soheil Hassas Yeganeh 
> Reviewed-by: Yuchung Cheng 
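
The three cases in the commit message can be modelled roughly as below.  This
is a simplified sketch and not the kernel function: classify_defer(), the
enum, and the rule used to pick the limiting window (the smaller of the
congestion window and the receive window, with the skb counted as
window-limited only if it does not fit) are assumptions made for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum defer_reason {
	DEFER_CWND_LIMITED,
	DEFER_RWND_LIMITED,
	DEFER_APP_LIMITED,
};

/* Hypothetical model: cong_win and send_win are the bytes still allowed by
 * the congestion window and the receive window, skb_len is the size of the
 * skb being considered.
 */
static enum defer_reason classify_defer(uint32_t cong_win, uint32_t send_win,
					uint32_t skb_len,
					bool *is_cwnd_limited,
					bool *is_rwnd_limited)
{
	assert(is_cwnd_limited != NULL && is_rwnd_limited != NULL);

	*is_cwnd_limited = false;
	*is_rwnd_limited = false;

	if (cong_win < send_win) {
		if (cong_win <= skb_len) {
			*is_cwnd_limited = true;
			return DEFER_CWND_LIMITED;
		}
	} else if (send_win <= skb_len) {
		*is_rwnd_limited = true;
		return DEFER_RWND_LIMITED;
	}

	/* Neither window is the constraint: application limited. */
	return DEFER_APP_LIMITED;
}
```

The point of the model is that "not cwnd-limited" no longer implies
"rwnd-limited"; both booleans stay false in the application-limited case.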

Applied.


Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller
From: Maciej Żenczykowski 
Date: Wed,  5 Dec 2018 12:59:17 -0800

> From: Maciej Żenczykowski 
> 
> During udp socket lookup cpu match should be lowest priority,
> hence it should increase score by only 1.
> 
> The next priority is delivering v4 to v4 sockets, and v6 to v6 sockets.
> The v6 code path doesn't have to deal with this so it always gets
> a score of '4'.  The v4 code path uses '4' or '2' depending on
> whether we're delivering to a v4 socket or a dualstack v6 socket.
> 
> This is more important than cpu match, so has to be greater than
> the '1' bump in score from cpu match.
> 
> All other matches (src/dst ip, src port) are even *more* important,
> so need to bump score by 4 for ipv4.
> 
> For ipv6 we could simply bump by 2, but let's keep the two code
> paths as similar as possible.
> 
> (also, while at it, remove two unnecessary unconditional score bumps)
> 
> Signed-off-by: Maciej Żenczykowski 
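
The priority ordering described in the quoted commit message can be
illustrated with a small sketch of the IPv4 path.  struct candidate and
score_v4() are hypothetical and do not mirror the kernel's lookup code; only
the weights (4 or 2 for the address family, 4 per address/port match, 1 for
the cpu match) come from the text above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one candidate socket on the IPv4 path. */
struct candidate {
	bool is_v4_socket;		/* false: dual-stack v6 socket */
	int nr_addr_port_matches;	/* matched src/dst ip, src port */
	bool cpu_match;
};

static int score_v4(const struct candidate *c)
{
	int score;

	assert(c != NULL);
	assert(c->nr_addr_port_matches >= 0);

	/* v4 socket scores 4, dual-stack v6 socket scores 2. */
	score = c->is_v4_socket ? 4 : 2;

	/* Every address/port match outweighs the family preference. */
	score += 4 * c->nr_addr_port_matches;

	/* cpu match is the lowest priority: it can only break ties. */
	if (c->cpu_match)
		score += 1;

	return score;
}
```

With these weights, a v4 socket without a cpu match (score 4) still beats a
dual-stack socket with a cpu match (score 2 + 1 = 3).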

This doesn't apply to the current net tree.

Also "net-udp: " is a weird subsystem prefix, just use "udp: ".

Thank you.


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> Do you have a git tree with all the KSZ patches based on -next
> somewhere, so I don't have to look for them in random MLs ?

I just sent it this Monday and the subject for that patch is
"[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."



Re: [Patch v2 net-next] call sk_dst_reset when set SO_DONTROUTE

2018-12-07 Thread David Miller
From: yupeng 
Date: Wed,  5 Dec 2018 18:56:28 -0800

> After SO_DONTROUTE is set to 1, the IP layer should not route packets if
> the dest IP address is not in link scope. But if the socket has cached
> the dst_entry, such packets would be routed until the sk_dst_cache
> expires. So we should clear the sk_dst_cache when a user sets the
> SO_DONTROUTE option. Below are server/client python scripts which
> could reproduce this issue:
 ...
> Signed-off-by: yupeng 

Applied.


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> > I think if you do this without setting offload_fwd_mark you will
> > receive duplicate frame.
> 
> I don't think it will, at least not in the normal case. The hardware
> should know the egress port, so there is no need to forward a copy to
> the CPU. The only time it should forward to the CPU is when the egress
> port is not known, so it floods. Without offload_fwd_mark set, the SW
> bridge will flood it back out the ports causing duplication. But that
> is not too bad. The Marvell driver did this for a while and nothing
> bad was reported.

For unicast frames it is okay as the CPU port does not see it after the first
one.  For multicast frames there will be duplicates, and it is tolerated?



Re: [PATCH v2 net-next] neighbor: Improve garbage collection

2018-12-07 Thread David Miller
From: David Ahern 
Date: Fri,  7 Dec 2018 12:24:57 -0800

> From: David Ahern 
> 
> The existing garbage collection algorithm has a number of problems:
 ...
> This patch addresses these problems as follows:
> 
> 1. Use of a separate list_head to track entries that can be garbage
>collected along with a separate counter. PERMANENT entries are not
>added to this list.
> 
>The gc_thresh parameters are only compared to the new counter, not the
>total entries in the table. The forced_gc function is updated to only
>walk this new gc_list looking for entries to evict.
> 
> 2. Entries are added to the list head at the tail and removed from the
>front.
> 
> 3. Entries are only evicted if they were last updated more than 5 seconds
>ago, adhering to the original intent of gc_thresh2.
> 
> 4. Forced gc is stopped once the number of gc_entries drops below
>gc_thresh2.
> 
> 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
>when allocating a new neighbor for a PERMANENT entry. By extension this
>means there are no explicit limits on the number of PERMANENT entries
>that can be created, but this is no different than FIB entries or FDB
>entries.
> 
> Signed-off-by: David Ahern 
> ---
> v2
> - remove on_gc_list boolean in favor of !list_empty
> - fix neigh_alloc to add new entry to tail of list_head
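
Items 2 to 4 of the quoted description can be sketched as follows.  This is a
toy model, not the kernel implementation: the gc_list is represented as an
array ordered oldest-first instead of a list_head, entries are marked rather
than unlinked, and struct entry and forced_gc() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#define GC_MIN_AGE 5	/* seconds, from item 3 */

struct entry {
	long updated;	/* time of last update, in seconds */
	int evicted;	/* set to 1 when the entry is evicted */
};

/* Walk the list from the front (oldest entries first) and evict entries
 * that were last updated more than GC_MIN_AGE seconds ago.  Stop once the
 * number of remaining entries drops below gc_thresh2.
 * Returns the number of entries left on the list.
 */
static size_t forced_gc(struct entry *list, size_t gc_entries,
			size_t gc_thresh2, long now)
{
	size_t i, n = gc_entries;

	assert(list != NULL || gc_entries == 0);

	for (i = 0; i < n && gc_entries >= gc_thresh2; i++) {
		if (now - list[i].updated > GC_MIN_AGE) {
			list[i].evicted = 1;
			gc_entries--;
		}
	}

	return gc_entries;
}
```

PERMANENT entries never appear in this list, which matches item 1: they are
neither counted against the thresholds nor considered for eviction.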

Again, looks great, applied.


Re: [PATCH net-next 00/14] net: hns3: Additions/optimizations related to HNS3 H/W err handling

2018-12-07 Thread David Miller
From: Salil Mehta 
Date: Fri, 7 Dec 2018 21:07:57 +

> This patch set primarily does the following additions and optimizations
> related to error handling in the HNS3 Ethernet driver:
> 
>  1. Name changes for enable and process functions and minor loop
> optimizations. [PATCH 1-6]
>  2. Modify query and clearing of RAS errors using a new set of commands
> because module-specific commands for clearing RCB PPP PF, SSU are
> obsolete. [PATCH 7]
>  3. Deletes logging 1-bit errors for RAS in HNS3 driver as these never
> get reported to the driver. [PATCH 8]
>  4. Add handling of NIC hw errors reported through MSIx rather than
> PCIe AER channel. [PATCH 9]
>  5. Add handling for the HW RAS and MSIx errors in the modules MAC, PPP
> PF, MSIx SRAM, RCB and SSU. [PATCH 10-13]
>  6. Add handling of RoCEE RAS errors. [PATCH 14]

Series applied, thank you.


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> >> If two ports are in the same bridge and in forwarding state, the packets
> >> must be able to pass between them in both directions. The current code
> >> only configures this bridge membership for a port newly added to the
> >> bridge, but does not update all the other ports. Thus, ingress packets
> >> on the new port will be forwarded, but ingress packets on other ports
> >> destined for the new port (eg. a reply) will not be forwarded back to
> >> the new port, because they are not configured to do so. This patch fixes
> >> that by updating the membership registers of all ports.
> >>
> >> Signed-off-by: Marek Vasut 
> >> Cc: Vivien Didelot 
> >> Cc: Woojung Huh 
> >> Cc: David S. Miller 
> >> Cc: Tristram Ha 
> >> ---
> >>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> >> b/drivers/net/dsa/microchip/ksz9477.c
> >> index 0684657fbf9a9..e24dd14ccde77 100644
> >> --- a/drivers/net/dsa/microchip/ksz9477.c
> >> +++ b/drivers/net/dsa/microchip/ksz9477.c
> >> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>struct ksz_device *dev = ds->priv;
> >>struct ksz_port *p = &dev->ports[port];
> >>u8 data;
> >> -  int member = -1;
> >> +  int i, member = -1;
> >>
> >>ksz_pread8(dev, port, P_STP_CTRL, &data);
> >>data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> >> PORT_LEARN_DISABLE);
> >> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>dev->tx_ports &= ~(1 << port);
> >>
> >>/* Port membership may share register with STP state. */
> >> -  if (member >= 0 && member != p->member)
> >> -  ksz9477_cfg_port_member(dev, port, (u8)member);
> >> +  for (i = 0; i < SWITCH_PORT_NUM; i++)
> >> +  ksz9477_cfg_port_member(dev, i, (u8)member);
> >>
> >>/* Check if forwarding needs to be updated. */
> >>if (state != BR_STATE_FORWARDING) {
> >
> > The original DSA model did not have a way to tell the bridge device not to
> > forward the frame, so the switch driver always setup the membership to
> > disable forwarding between ports.
> >
> > When lan devices are setup they act like individual devices.  A bridge
> > device adding them under it will forward the frames.
> >
> > The new switchdev model adds the offload_fwd_mark bit to tell the bridge
> > not to forward frame.
> >
> > The ksz_update_port_member function in ksz_common.c is doing this
> > membership setup for all forwarding ports.  It was finally enabled in one
> > of the RFC patches I submitted recently (Add switch forward offloading
> > support).
> >
> > I think if you do this without setting offload_fwd_mark you will receive
> > duplicate frame.
> >
> 
> Either I am misreading Marek's patch, or I don't quite understand your
> response, but what is happening when you enslave a switch port into a
> bridge is that you need to make sure that:
> 
> - the switch port being enslaved will be part of the same forwarding
> group as any other switch port already in the bridge
> - any existing switch port already enslaved in the bridge must now also
> be allowed to forward to the port that is being enslaved
> 
> That is to me, exactly what Marek's patch is fixing, your response is
> about something slightly orthogonal here.
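
The two requirements listed in the quoted reply can be sketched with plain
bitmasks.  This is an illustration only: NUM_PORTS, member[] and
bridge_add_port() are hypothetical names, and the sketch ignores the CPU port
and the STP forwarding state that the real driver has to take into account:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 8	/* hypothetical port count */

/* member[p] is the bitmask of ports that port p may forward to. */
static void bridge_add_port(uint8_t member[NUM_PORTS], uint8_t *bridge_mask,
			    unsigned int port)
{
	unsigned int i;

	assert(port < NUM_PORTS);

	*bridge_mask |= (uint8_t)(1u << port);

	/* Update every port in the bridge, not only the new one, so that
	 * existing members can also forward to the newly added port.
	 */
	for (i = 0; i < NUM_PORTS; i++) {
		if (*bridge_mask & (1u << i))
			member[i] = *bridge_mask & (uint8_t)~(1u << i);
	}
}
```

Updating only member[port] would give the one-directional forwarding that
the patch description complains about.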

I got confused here as the code is obviously wrong and should not work,
so I found out why it works in the bridge device situation.  There is actually
a bug in the driver that enables this behavior.  The port_vlan_filtering
function turns off the port membership enforcement.  Fixing this problem should
be easy, but this port_vlan_filtering function is also not implemented right.
It treats the operation as a simple VLAN on/off, but it is more complex than
that.

Anyway it seems to work in the bridge device situation, but it does not work
in the default situation:

Assume there are two port devices lan1 and lan2.  The device lan1 is assigned
an IP address and can talk to outside.  Enable and then disable lan2 by doing
"ifconfig lan2 up" and "ifconfig lan2 down".  After that, the device lan1 is no
longer working.

Create a bridge device and add a child device by doing "brctl addbr br0" and
"brctl addif br0 lan1."  This will call port_vlan_filtering and then the feature
UNICAST_VLAN_BOUNDARY is disabled.  This causes port membership to have no
effect on unicast packets and so it does not matter what member value is used.
The device lan1 can start working again.

The fix is to avoid disabling UNICAST_VLAN_BOUNDARY and it should be set all
the time.  In this switch the default is on.
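
The two requirements for enslaving a port that were listed earlier in this
thread (the new port joins the forwarding group of the ports already in the
bridge, and those ports may forward to the new one) can be sketched as a plain
bitmask computation.  This is only an illustration under stated assumptions:
NUM_PORTS, the CPU port index and compute_port_members() are hypothetical
names and not the ksz driver's code, and real hardware also has to account for
STP state and the UNICAST_VLAN_BOUNDARY setting discussed above.

```c
#include <stdint.h>

#define NUM_PORTS 5	/* hypothetical port count, including the CPU port */

/*
 * Compute a per-port forwarding membership mask.
 *
 * bridge_mask: bit p is set when port p is enslaved to the bridge.
 * cpu_port:    index of the port facing the host CPU.
 * members:     output; bit q of members[p] means "port p may forward to q".
 */
static void compute_port_members(uint8_t bridge_mask, unsigned int cpu_port,
				 uint8_t members[NUM_PORTS])
{
	uint8_t all_ports = (uint8_t)((1u << NUM_PORTS) - 1);
	uint8_t cpu_bit = (uint8_t)(1u << cpu_port);
	unsigned int p;

	for (p = 0; p < NUM_PORTS; p++) {
		uint8_t self = (uint8_t)(1u << p);

		if (p == cpu_port)
			/* The CPU port can reach every other port. */
			members[p] = (uint8_t)(all_ports & ~self);
		else if (bridge_mask & self)
			/* Bridged port: other bridge members plus the CPU. */
			members[p] = (uint8_t)((bridge_mask | cpu_bit) & ~self);
		else
			/* Standalone port: only talks to the CPU. */
			members[p] = cpu_bit;
	}
}
```

With ports 0 and 1 bridged and port 4 as the CPU port, port 0 ends up with
mask 0x12 and port 1 with 0x11, while the standalone ports keep 0x10.  Every
bridged port's mask has to be recomputed when a port joins or leaves, not just
the mask of the port being changed.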


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread David Miller
From: Marek Vasut 
Date: Fri, 7 Dec 2018 23:59:58 +0100

> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>>> configured correctly in such setups.
>> 
>> Hi Marek
> 
> Hi Andrew,
> 
>> Please make this a patch series, not two individual patches.
> 
> This actually is an individual patch, it doesn't depend on anything.
> Or do you mean a series with the DT documentation change ?

Yes, but all of this stuff is building up for one single purpose,
and that is to support a new mode of operation with DSA or whatever.

So please group them together in a series with an appropriate
header posting.


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Miller
From: Eric Dumazet 
Date: Fri, 7 Dec 2018 15:03:04 -0800

> On 12/07/2018 02:24 PM, David Ahern wrote:
>> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>> 
>> /* --- cacheline 3 boundary (192 bytes) --- */
>> struct hh_cache hh;      /*   192    48 */
>> 
>> ...
>> 
>> but does not change the actual allocation size which is rounded to 512.
>> 
> 
> I have not talked about the allocation size, but alignment of ->ha field,
> which is kind of assuming long alignment, in a strange way.

Right, neigh->ha[] should probably be kept 8-byte aligned.
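
The alignment concern can be reproduced in user space with a reduced stand-in
for the tail of struct neighbour.  The sketch below is a model under stated
assumptions, not the kernel definition: fake_seqlock stands in for seqlock_t
(8 bytes with 4-byte alignment, matching the pahole output quoted in this
thread), hh is reduced to a single 8-byte member, and all struct names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for seqlock_t: 8 bytes, but only 4-byte aligned. */
struct fake_seqlock {
	uint32_t sequence;
	uint32_t lock;
};

/* Layout before the patch: four u8 fields fill the word after probes. */
struct neigh_before {
	uint64_t used;
	uint32_t probes;
	uint8_t flags;
	uint8_t nud_state;
	uint8_t type;
	uint8_t dead;
	struct fake_seqlock ha_lock;
	unsigned char ha[32];
	uint64_t hh;
};

/* Layout after the patch: a fifth u8 pushes ha_lock and ha down by 4 bytes. */
struct neigh_after {
	uint64_t used;
	uint32_t probes;
	uint8_t flags;
	uint8_t nud_state;
	uint8_t type;
	uint8_t dead;
	uint8_t protocol;
	struct fake_seqlock ha_lock;
	unsigned char ha[32];
	uint64_t hh;
};

/* Offset of ha[] modulo 8: 0 means the array starts on an 8-byte boundary. */
static size_t ha_misalignment_before(void)
{
	return offsetof(struct neigh_before, ha) % 8;
}

static size_t ha_misalignment_after(void)
{
	return offsetof(struct neigh_after, ha) % 8;
}
```

In this model ha[] moves from offset 24 to offset 28, so it is no longer
8-byte aligned, which mirrors the 152 to 156 shift visible in the pahole
output.  Placing the new u8 somewhere that does not sit between dead and
ha_lock would avoid the shift.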


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet



On 12/07/2018 02:24 PM, David Ahern wrote:
> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>>
>>
>> On 12/07/2018 01:49 PM, David Ahern wrote:
>>> From: David Ahern 
>>>
>>> Similar to routes and rules, add protocol attribute to neighbor entries
>>> for easier tracking of how each was created.
>>>
>>> Signed-off-by: David Ahern 
>>> ---
>>>  include/net/neighbour.h|  2 ++
>>>  include/uapi/linux/neighbour.h |  1 +
>>>  net/core/neighbour.c   | 24 +++-
>>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>>> index 6c13072910ab..e93c59df9501 100644
>>> --- a/include/net/neighbour.h
>>> +++ b/include/net/neighbour.h
>>> @@ -149,6 +149,7 @@ struct neighbour {
> >>>     __u8            nud_state;
> >>>     __u8            type;
> >>>     __u8            dead;
> >>> +   u8              protocol;
> >>>     seqlock_t       ha_lock;
> >>>     unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
>>
> >> This looks like ha[] alignment would change, I am not sure how critical
> >> it is.
> 
> Just adds 4 bytes to neighbour:
> 
> ...
> /* --- cacheline 2 boundary (128 bytes) --- */
> long unsigned int  used; /*   128 8 */
> atomic_t   probes;   /*   136 4 */
> __u8   flags;/*   140 1 */
> __u8   nud_state;/*   141 1 */
> __u8   type; /*   142 1 */
> __u8   dead; /*   143 1 */
> u8 protocol; /*   144 1 */
> 
> /* XXX 3 bytes hole, try to pack */
> seqlock_t  ha_lock;  /*   148 8 */
> unsigned char  ha[32];   /*   156    32 */
> /* XXX 4 bytes hole, try to pack */
> 
> /* --- cacheline 3 boundary (192 bytes) --- */
> struct hh_cache hh;      /*   192    48 */
> 
> ...
> 
> but does not change the actual allocation size which is rounded to 512.
> 

I have not talked about the allocation size, but alignment of ->ha field,
which is kind of assuming long alignment, in a strange way.

As I said, I do not know how performance critical this might be.



Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/07/2018 11:24 PM, Andrew Lunn wrote:
> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>> configured correctly in such setups.
> 
> Hi Marek

Hi Andrew,

> Please make this a patch series, not two individual patches.

This actually is an individual patch, it doesn't depend on anything.
Or do you mean a series with the DT documentation change ?

> And as David has already said, include a cover letter.
> 
> Otherwise, this looks O.K.
> 
> Thanks
>   Andrew
> 


-- 
Best regards,
Marek Vasut


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Ahern
On 12/7/18 3:20 PM, Eric Dumazet wrote:
> 
> 
> On 12/07/2018 01:49 PM, David Ahern wrote:
>> From: David Ahern 
>>
>> Similar to routes and rules, add protocol attribute to neighbor entries
>> for easier tracking of how each was created.
>>
>> Signed-off-by: David Ahern 
>> ---
>>  include/net/neighbour.h|  2 ++
>>  include/uapi/linux/neighbour.h |  1 +
>>  net/core/neighbour.c   | 24 +++-
>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 6c13072910ab..e93c59df9501 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -149,6 +149,7 @@ struct neighbour {
>>      __u8            nud_state;
>>      __u8            type;
>>      __u8            dead;
>> +    u8              protocol;
>>      seqlock_t       ha_lock;
>>      unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
> 
> This looks like ha[] alignment would change, I am not sure how critical it is.

Just adds 4 bytes to neighbour:

...
/* --- cacheline 2 boundary (128 bytes) --- */
long unsigned int  used; /*   128 8 */
atomic_t   probes;   /*   136 4 */
__u8   flags;/*   140 1 */
__u8   nud_state;/*   141 1 */
__u8   type; /*   142 1 */
__u8   dead; /*   143 1 */
u8 protocol; /*   144 1 */

/* XXX 3 bytes hole, try to pack */
seqlock_t  ha_lock;  /*   148 8 */
unsigned char  ha[32];   /*   156    32 */
/* XXX 4 bytes hole, try to pack */

/* --- cacheline 3 boundary (192 bytes) --- */
struct hh_cache hh;      /*   192    48 */

...

but does not change the actual allocation size which is rounded to 512.


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn
On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
> has a reset GPIO line which can be controlled by the CPU, so make sure it is
> configured correctly in such setups.

Hi Marek

Please make this a patch series, not two individual patches.

And as David has already said, include a cover letter.

Otherwise, this looks O.K.

Thanks
Andrew


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet



On 12/07/2018 01:49 PM, David Ahern wrote:
> From: David Ahern 
> 
> Similar to routes and rules, add protocol attribute to neighbor entries
> for easier tracking of how each was created.
> 
> Signed-off-by: David Ahern 
> ---
>  include/net/neighbour.h|  2 ++
>  include/uapi/linux/neighbour.h |  1 +
>  net/core/neighbour.c   | 24 +++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 6c13072910ab..e93c59df9501 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -149,6 +149,7 @@ struct neighbour {
>      __u8            nud_state;
>      __u8            type;
>      __u8            dead;
> +    u8              protocol;
>      seqlock_t       ha_lock;
>      unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];

This looks like ha[] alignment would change, I am not sure how critical it is.

>   struct hh_cache hh;
> @@ -173,6 +174,7 @@ struct pneigh_entry {
>   possible_net_t  net;
>   struct net_device   *dev;
>   u8  flags;
> + u8  protocol;
>   u8  key[0];
>  };
>  




[PATCH] net: dsa: Add optional reset GPIO to Microchip KSZ switch binding

2018-12-07 Thread Marek Vasut
Add optional reset GPIO, as such a signal is available on the KSZ switches.

Signed-off-by: Marek Vasut 
Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 Documentation/devicetree/bindings/net/dsa/ksz.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt b/Documentation/devicetree/bindings/net/dsa/ksz.txt
index ac145b885e955..0f407fb371ce1 100644
--- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
+++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
@@ -8,6 +8,10 @@ Required properties:
   - "microchip,ksz9477"
   - "microchip,ksz9897"
 
+Optional properties:
+
+- reset-gpios  : Should be a gpio specifier for a reset line
+
 See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional
 required and optional properties.
 
-- 
2.18.0



Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/07/2018 08:55 PM, Andrew Lunn wrote:
>> +dev->reset_gpio = -1;
>> +reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
>> + &reset_gpio_flags);
>> +if (reset_gpio >= 0) {
>> +flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
>> +GPIOF_ACTIVE_LOW : 0;
> 
> Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
> Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.

Done

> You also need to update the binding documentation for this new
> property.

Will do in a separate patch.

-- 
Best regards,
Marek Vasut


[PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
Add code to handle optional reset GPIO in the KSZ switch driver. The switch
has a reset GPIO line which can be controlled by the CPU, so make sure it is
configured correctly in such setups.

Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
Cc: Tristram Ha 
---
V2: Switch to devm_gpiod_get_optional()
---
 drivers/net/dsa/microchip/ksz_common.c | 17 +
 drivers/net/dsa/microchip/ksz_priv.h   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 9705808c3af7a..3b12e2dcff31b 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -8,12 +8,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -294,6 +296,17 @@ int ksz_switch_register(struct ksz_device *dev,
if (dev->pdata)
dev->chip_id = dev->pdata->chip_id;
 
+   dev->reset_gpio = devm_gpiod_get_optional(dev->dev, "reset",
+ GPIOD_OUT_LOW);
+   if (IS_ERR(dev->reset_gpio))
+   return PTR_ERR(dev->reset_gpio);
+
+   if (dev->reset_gpio) {
+   gpiod_set_value(dev->reset_gpio, 1);
+   mdelay(10);
+   gpiod_set_value(dev->reset_gpio, 0);
+   }
+
mutex_init(&dev->reg_mutex);
mutex_init(&dev->stats_mutex);
mutex_init(&dev->alu_mutex);
@@ -329,6 +342,10 @@ void ksz_switch_remove(struct ksz_device *dev)
 {
dev->dev_ops->exit(dev);
dsa_unregister_switch(dev->ds);
+
+   if (dev->reset_gpio)
+   gpiod_set_value(dev->reset_gpio, 1);
+
 }
 EXPORT_SYMBOL(ksz_switch_remove);
 
diff --git a/drivers/net/dsa/microchip/ksz_priv.h b/drivers/net/dsa/microchip/ksz_priv.h
index a38ff0841ed4e..60b49010904bf 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -59,6 +59,8 @@ struct ksz_device {
 
void *priv;
 
+   struct gpio_desc *reset_gpio;   /* Optional reset GPIO */
+
/* chip specific data */
u32 chip_id;
int num_vlans;
-- 
2.18.0



[PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Ahern
From: David Ahern 

Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.

Signed-off-by: David Ahern 
---
 include/net/neighbour.h|  2 ++
 include/uapi/linux/neighbour.h |  1 +
 net/core/neighbour.c   | 24 +++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6c13072910ab..e93c59df9501 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -149,6 +149,7 @@ struct neighbour {
    __u8            nud_state;
    __u8            type;
    __u8            dead;
+   u8              protocol;
    seqlock_t       ha_lock;
    unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
    struct hh_cache hh;
@@ -173,6 +174,7 @@ struct pneigh_entry {
possible_net_t  net;
struct net_device   *dev;
u8  flags;
+   u8  protocol;
u8  key[0];
 };
 
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index 998155444e0d..cd144e3099a3 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -28,6 +28,7 @@ enum {
NDA_MASTER,
NDA_LINK_NETNSID,
NDA_SRC_VNI,
+   NDA_PROTOCOL,  /* Originator of entry */
__NDA_MAX
 };
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c3b58712e98b..56984695585d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1799,6 +1799,7 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net_device *dev = NULL;
struct neighbour *neigh;
void *dst, *lladdr;
+   u8 protocol = 0;
int err;
 
ASSERT_RTNL();
@@ -1838,6 +1839,14 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
dst = nla_data(tb[NDA_DST]);
lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;
 
+   if (tb[NDA_PROTOCOL]) {
+   if (nla_len(tb[NDA_PROTOCOL]) != sizeof(u8)) {
+   NL_SET_ERR_MSG(extack, "Invalid protocol attribute");
+   goto out;
+   }
+   protocol = nla_get_u8(tb[NDA_PROTOCOL]);
+   }
+
if (ndm->ndm_flags & NTF_PROXY) {
struct pneigh_entry *pn;
 
@@ -1845,6 +1854,8 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
pn = pneigh_lookup(tbl, net, dst, dev, 1);
if (pn) {
pn->flags = ndm->ndm_flags;
+   if (protocol)
+   pn->protocol = protocol;
err = 0;
}
goto out;
@@ -1893,6 +1904,10 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
} else
err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags,
 NETLINK_CB(skb).portid, extack);
+
+   if (protocol)
+   neigh->protocol = protocol;
+
neigh_release(neigh);
 
 out:
@@ -2386,6 +2401,9 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
goto nla_put_failure;
 
+   if (neigh->protocol && nla_put_u8(skb, NDA_PROTOCOL, neigh->protocol))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -2417,6 +2435,9 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
goto nla_put_failure;
 
+   if (pn->protocol && nla_put_u8(skb, NDA_PROTOCOL, pn->protocol))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -3072,7 +3093,8 @@ static inline size_t neigh_nlmsg_size(void)
   + nla_total_size(MAX_ADDR_LEN) /* NDA_DST */
   + nla_total_size(MAX_ADDR_LEN) /* NDA_LLADDR */
   + nla_total_size(sizeof(struct nda_cacheinfo))
-  + nla_total_size(4); /* NDA_PROBES */
+  + nla_total_size(4)  /* NDA_PROBES */
+  + nla_total_size(1); /* NDA_PROTOCOL */
 }
 
 static void __neigh_notify(struct neighbour *n, int type, int flags,
-- 
2.11.0
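
The length check applied to NDA_PROTOCOL before reading it can be illustrated
outside the kernel.  The sketch below is a user-space model under stated
assumptions: attr_hdr mimics the two 16-bit fields of a netlink attribute
header (total length, then type), and attr_get_u8_checked() is a hypothetical
helper, not a kernel or libnl function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a netlink attribute header: total length (header included), type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_HDRLEN ((int)sizeof(struct attr_hdr))

/*
 * Read a one-byte attribute payload from buf.
 * Returns 0 and stores the value in *out when the payload is exactly one
 * byte long; returns -1 for truncated buffers or any other payload size.
 */
static int attr_get_u8_checked(const unsigned char *buf, size_t buflen,
			       uint8_t *out)
{
	struct attr_hdr hdr;

	if (buflen < sizeof(hdr))
		return -1;

	/* Copy the header out to avoid unaligned access into buf. */
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.len > buflen)
		return -1;

	/* Same idea as nla_len(attr) != sizeof(u8) in the patch above. */
	if ((int)hdr.len - ATTR_HDRLEN != (int)sizeof(uint8_t))
		return -1;

	*out = buf[ATTR_HDRLEN];
	return 0;
}
```

Rejecting any payload that is not exactly one byte keeps a malformed request
from being silently truncated or read past its end.  A value of 0 is treated
as "not set" in the patch, which is why the fill functions only emit the
attribute when the stored protocol is non-zero.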



Re: [PATCH net-next v2 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-07 Thread David Woodhouse
On Fri, 2018-12-07 at 21:46 +0100, Paolo Abeni wrote:
> > I wonder if we can declare the common case functions as 'weak' so that
> > the link failures don't happen when they're absent.
> 
> I experimented a previous version with alias. I avoided weak alias
> usage, because I [mis?]understood not all compilers have a complete
> support for them (e.g. clang).
> Also, with weak ref, a coding error that is now discovered at build
> time will result in worse performance at runtime, likely with some
> uncommon configuration, possibly not as easily detected. I'm unsure
> that would be better ?!?

I think everything supports weak linkage; we've been using it for
years.

> > Once we extend this past the network code, especially to file systems'
> > f_ops, I suspect we're going to want to use something like static keys
> > to patch the common cases at runtime — perhaps changing the f_ops
> > default according to what the root file system is, etc.
> 
> I'm sorry, I don't follow here. I think static keys can't be used for
> the reported network case: we have different list elements each
> contaning a different function pointer and we access/use
> different ptr on a per packet basis.

Yes, the alternatives would be used to change the "likely" case.

We still do the "if (fn == default_fn) default_fn(); else (*fn)();"
part; or even the variant with two (or more) common cases. 

It's just that the value of 'default_fn' can be changed at runtime
(with patching like alternatives/static keys, since of course it has to
be a direct call).
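
The "if (fn == default_fn) default_fn(); else (*fn)();" pattern can be written
as a small macro.  The sketch below is a stand-alone illustration under stated
assumptions: CALL_LIKELY_1, rx_handler_t and the two handlers are hypothetical
names chosen for the example, and the runtime patching of the likely target
described above is not modelled.

```c
/* Hypothetical handler type, standing in for a per-packet callback. */
typedef int (*rx_handler_t)(int);

static int common_handler(int x)
{
	return x + 1;
}

static int rare_handler(int x)
{
	return x * 2;
}

/*
 * Compare the pointer against the expected target first.  On a match the
 * compiler can emit a direct call to likely_fn; only the fallback goes
 * through the function pointer.
 */
#define CALL_LIKELY_1(fn, likely_fn, ...) \
	((fn) == (likely_fn) ? likely_fn(__VA_ARGS__) : (fn)(__VA_ARGS__))

static int dispatch(rx_handler_t fn, int arg)
{
	return CALL_LIKELY_1(fn, common_handler, arg);
}
```

The result is the same whichever branch is taken; only the call mechanism
differs.  Because likely_fn is named directly in the expansion, it must exist
at link time, which is where the weak-linkage question in this thread comes
from.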






Re: [PATCH bpf 1/2] selftests/bpf: use thoff instead of nhoff in BPF flow dissector

2018-12-07 Thread Alexei Starovoitov
On Wed, Dec 05, 2018 at 08:40:47PM -0800, Stanislav Fomichev wrote:
> We are returning thoff from the flow dissector, not the nhoff. Pass
> thoff along with nhoff to the bpf program (initially thoff == nhoff)
> and expect flow dissector amend/return thoff, not nhoff.
> 
> This avoids confusion, when by the time bpf flow dissector exits,
> nhoff == thoff, which doesn't make much sense.
> 
> Signed-off-by: Stanislav Fomichev 

applied both to bpf tree. thanks



Re: [PATCH v2 bpf-next 0/7] bpf: support BPF_ALU | BPF_ARSH

2018-12-07 Thread Alexei Starovoitov
On Wed, Dec 05, 2018 at 01:52:29PM -0500, Jiong Wang wrote:
> BPF_ALU | BPF_ARSH | BPF_* were rejected by commit: 7891a87efc71
> ("bpf: arsh is not supported in 32 bit alu thus reject it"). As explained
> in the commit message, this is because there is no complete support for them
> on interpreter and various JIT compilation back-ends.
> 
> This patch set is a follow-up which completes the missing bits. This also
> paves the way for running bpf programs compiled with ALU32 instructions
> enabled by specifying -mattr=+alu32 to LLVM, in which case there are likely
> to be more BPF_ALU | BPF_ARSH insns that will trigger the rejection code.
> 
> test_verifier.c is updated accordingly.
> 
> I have tested this patch set on x86-64 and NFP, I need help of review and
> test on the arch changes (mips/ppc/s390).
> 
> Note, there might be merge conflict on mips change which is better to be
> applied on top of:
> 
>   commit: 20b880a05f06 ("mips: bpf: fix encoding bug for mm_srlv32_op"),
> 
> which is on mips-fixes branch at the moment.
> 
> Thanks.
> 
> v1->v2:
>  - Fix ppc implementation bug. Should zero high bits explicitly.

I've applied this set and earlier commit "mips: bpf: fix encoding bug for
mm_srlv32_op" to bpf-next.

Thanks
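
For reference, the behaviour a JIT has to reproduce for a 32-bit arithmetic
right shift can be modelled in portable C.  This is a sketch under stated
assumptions: alu32_arsh() and arsh32() are hypothetical helper names, the
shift count is masked to 0..31 here for safety, and the model follows the rule
mentioned in the v2 note that the high 32 bits of the destination must end up
zero.

```c
#include <stdint.h>

/*
 * Arithmetic right shift of a 32-bit value without relying on the
 * implementation-defined behaviour of >> on negative signed integers.
 */
static uint32_t arsh32(uint32_t value, uint32_t shift)
{
	shift &= 31;	/* assumption of this sketch: keep the count in range */

	if (value & 0x80000000u)
		/* Negative: shift the complement, then complement back so
		 * that ones are shifted in from the top. */
		return ~(~value >> shift);

	return value >> shift;
}

/*
 * Model of a 32-bit ALU ARSH on a 64-bit register: operate on the low
 * 32 bits only and zero-extend the result into the full register.
 */
static uint64_t alu32_arsh(uint64_t dst_reg, uint32_t shift)
{
	return (uint64_t)arsh32((uint32_t)dst_reg, shift);
}
```

The upper half of the input register is ignored and the upper half of the
result is always zero, regardless of the sign of the low 32 bits.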



Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Alexei Starovoitov
On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> From: Björn Töpel 
> 
> Hi!
> 
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
> 
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
> return bpf_xsk_redirect(ctx);
>   }
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs have precedence over the
> builtin one.

The feature makes sense to me.
Maybe XDP_ATTACH_BUILTIN would be a better name?
Also I think it needs another parameter to say which builtin
program to use.
This unconditional xsk_redirect is fine for performance
benchmarking, but for production I suspect the users would want
an easy way to stay safe when they're playing with AF_XDP.
So another builtin program that redirects ssh and ping traffic
back to the kernel would be a nice addition.
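
The decision such a "safe" builtin program would make can be sketched in plain
C on a raw Ethernet frame.  This is only a user-space model under stated
assumptions: classify_frame() and the verdict names are hypothetical, only
untagged IPv4 is parsed, fragments are not handled, and it is not written
against the BPF helper API.

```c
#include <stddef.h>
#include <stdint.h>

enum verdict {
	VERDICT_PASS_TO_KERNEL,		/* let the normal stack handle it */
	VERDICT_REDIRECT_TO_SOCKET,	/* hand the frame to the AF_XDP socket */
};

#define ETH_HDR_LEN	14
#define ETHERTYPE_IPV4	0x0800
#define PROTO_ICMP	1
#define PROTO_TCP	6
#define SSH_PORT	22

/* Read a big-endian 16-bit field. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Keep ping and ssh on the kernel path; redirect everything else. */
static enum verdict classify_frame(const uint8_t *pkt, size_t len)
{
	const uint8_t *ip, *tcp;
	size_t ihl;

	/* Anything we cannot parse stays with the kernel. */
	if (len < ETH_HDR_LEN + 20)
		return VERDICT_PASS_TO_KERNEL;
	if (rd16(pkt + 12) != ETHERTYPE_IPV4)
		return VERDICT_PASS_TO_KERNEL;

	ip = pkt + ETH_HDR_LEN;
	if ((ip[0] >> 4) != 4)
		return VERDICT_PASS_TO_KERNEL;

	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ihl < 20 || len < ETH_HDR_LEN + ihl)
		return VERDICT_PASS_TO_KERNEL;

	if (ip[9] == PROTO_ICMP)
		return VERDICT_PASS_TO_KERNEL;

	if (ip[9] == PROTO_TCP) {
		/* Need at least the source and destination ports. */
		if (len < ETH_HDR_LEN + ihl + 4)
			return VERDICT_PASS_TO_KERNEL;
		tcp = ip + ihl;
		if (rd16(tcp) == SSH_PORT || rd16(tcp + 2) == SSH_PORT)
			return VERDICT_PASS_TO_KERNEL;
	}

	return VERDICT_REDIRECT_TO_SOCKET;
}
```

Defaulting to the kernel path for anything that fails to parse keeps
management traffic such as ARP reachable while a user experiments with the
socket.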



[PATCH net-next 01/14] net: hns3: remove existing process error functions and reorder hw_blk table

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

1. The command interface for querying and clearing hw errors is
  changed, which requires the new process error functions to be added.
  This patch removes all the current process error functions and
  associated definitions. The new functions to handle ras errors
  would be added in this patch set.

2. Fixed order issue of the hw_blk table.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  12 -
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 462 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  19 -
 3 files changed, 18 insertions(+), 475 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index e1805b9..d2fb210 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -216,25 +216,13 @@ enum hclge_opcode_type {
 
/* Error INT commands */
HCLGE_TM_SCH_ECC_INT_EN = 0x0829,
-   HCLGE_TM_SCH_ECC_ERR_RINT_CMD   = 0x082d,
-   HCLGE_TM_SCH_ECC_ERR_RINT_CE= 0x082f,
-   HCLGE_TM_SCH_ECC_ERR_RINT_NFE   = 0x0830,
-   HCLGE_TM_SCH_ECC_ERR_RINT_FE= 0x0831,
-   HCLGE_TM_SCH_MBIT_ECC_INFO_CMD  = 0x0833,
HCLGE_COMMON_ECC_INT_CFG= 0x1505,
-   HCLGE_IGU_EGU_TNL_INT_QUERY = 0x1802,
HCLGE_IGU_EGU_TNL_INT_EN= 0x1803,
-   HCLGE_IGU_EGU_TNL_INT_CLR   = 0x1804,
-   HCLGE_IGU_COMMON_INT_QUERY  = 0x1805,
HCLGE_IGU_COMMON_INT_EN = 0x1806,
-   HCLGE_IGU_COMMON_INT_CLR= 0x1807,
HCLGE_TM_QCN_MEM_INT_CFG= 0x1A14,
-   HCLGE_TM_QCN_MEM_INT_INFO_CMD   = 0x1A17,
HCLGE_PPP_CMD0_INT_CMD  = 0x2100,
HCLGE_PPP_CMD1_INT_CMD  = 0x2101,
-   HCLGE_NCSI_INT_QUERY= 0x2400,
HCLGE_NCSI_INT_EN   = 0x2401,
-   HCLGE_NCSI_INT_CLR  = 0x2402,
 };
 
 #define HCLGE_TQP_REG_OFFSET   0x8
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 6da9e22..ac9ab3c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -336,25 +336,6 @@ static const struct hclge_hw_error hclge_qcn_ecc_err_int[] = {
{ /* sentinel */ }
 };
 
-static void hclge_log_error(struct device *dev,
-   const struct hclge_hw_error *err_list,
-   u32 err_sts)
-{
-   const struct hclge_hw_error *err;
-   int i = 0;
-
-   while (err_list[i].msg) {
-   err = &err_list[i];
-   if (!(err->int_msk & err_sts)) {
-   i++;
-   continue;
-   }
-   dev_warn(dev, "%s [error status=0x%x] found\n",
-err->msg, err_sts);
-   i++;
-   }
-}
-
 /* hclge_cmd_query_error: read the error information
  * @hdev: pointer to struct hclge_dev
  * @desc: descriptor for describing the command
@@ -391,53 +372,6 @@ static int hclge_cmd_query_error(struct hclge_dev *hdev,
return ret;
 }
 
-/* hclge_cmd_clear_error: clear the error status
- * @hdev: pointer to struct hclge_dev
- * @desc: descriptor for describing the command
- * @desc_src: prefilled descriptor from the previous command for reusing
- * @cmd:  command opcode
- * @flag: flag for extended command structure
- *
- * This function clear the error status in the hw register/s using command
- */
-static int hclge_cmd_clear_error(struct hclge_dev *hdev,
-struct hclge_desc *desc,
-struct hclge_desc *desc_src,
-u32 cmd, u16 flag)
-{
-   struct device *dev = &hdev->pdev->dev;
-   int num = 1;
-   int ret, i;
-
-   if (cmd) {
-   hclge_cmd_setup_basic_desc(&desc[0], cmd, false);
-   if (flag) {
-   desc[0].flag |= cpu_to_le16(flag);
-   hclge_cmd_setup_basic_desc(&desc[1], cmd, false);
-   num = 2;
-   }
-   if (desc_src) {
-   for (i = 0; i < 6; i++) {
-   desc[0].data[i] = desc_src[0].data[i];
-   if (flag)
-   desc[1].data[i] = desc_src[1].data[i];
-   }
-   }
-   } else {
-   hclge_cmd_reuse_desc(&desc[0], false);
-   if (flag) {
-   desc[0].flag |= cpu_to_le16(flag);
-   hclge_cmd_reuse_desc(&desc[1], false);
-   num = 2;
-   }
-   }
-   ret = hclge_cmd_send(&hdev->hw, &desc[0], num);
-   if (ret)
-   dev_err(dev, "clear e

[PATCH net-next 00/14] net: hns3: Additions/optimizations related to HNS3 H/W err handling

2018-12-07 Thread Salil Mehta
This patch set primarily makes the following additions and optimizations
related to error handling in HNS3 Ethernet driver:

 1. Name changes for enable and process functions and minor loop
optimizations. [PATCH 1-6]
 2. Modify query and clearing of RAS errors using new set of commands
    because module-specific commands for clearing RCB PPP PF, SSU are
    obsolete. [PATCH 7]
 3. Deletes logging 1-bit errors for RAS in HNS3 driver as these never
get reported to the driver. [PATCH 8]
 4. Add handling of NIC hw errors reported through MSIx rather than
PCIe AER channel. [PATCH 9]
 5. Add handling for the HW RAS and MSIx errors in the modules MAC, PPP
PF, MSIx SRAM, RCB and SSU. [PATCH 10-13]
 6. Add handling of RoCEE RAS errors. [PATCH 14]

Salil Mehta (1):
  net: hns3: add handling of hw errors reported through MSIX

Shiju Jose (13):
  net: hns3: remove existing process error functions and reorder hw_blk
table
  net: hns3: rename enable error interrupt functions
  net: hns3: re-enable error interrupts on hw reset
  net: hns3: deletes unnecessary settings of the descriptor data
  net: hns3: rename process_hw_error function
  net: hns3: add optimization in the hclge_hw_error_set_state
  net: hns3: add handling of hw ras errors using new set of commands
  net: hns3: deleted logging 1 bit errors
  net: hns3: add handling of hw errors of MAC
  net: hns3: handle hw errors of PPP PF
  net: hns3: handle hw errors of PPU(RCB)
  net: hns3: handle hw errors of SSU
  net: hns3: add handling of RDMA RAS errors

 drivers/net/ethernet/hisilicon/hns3/hnae3.h|3 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c|4 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   27 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 1554 
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |   79 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|   55 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|1 +
 7 files changed, 1067 insertions(+), 656 deletions(-)

-- 
2.7.4
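
The error tables used throughout this series are arrays terminated by an empty
sentinel entry, and decoding a status word means walking the table and testing
each mask.  The sketch below is a stand-alone illustration under stated
assumptions: struct hw_error, collect_errors() and the local BIT() macro are
hypothetical stand-ins, and the three table entries are copied from the
hclge_ppu_pf_abnormal_int table in patch 12 purely as sample data.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Stand-in for struct hclge_hw_error: one mask and its message. */
struct hw_error {
	uint32_t int_msk;
	const char *msg;
};

static const struct hw_error ppu_pf_abnormal_int[] = {
	{ .int_msk = BIT(0), .msg = "over_8bd_no_fe" },
	{ .int_msk = BIT(1), .msg = "tso_mss_cmp_min_err" },
	{ .int_msk = BIT(2), .msg = "tso_mss_cmp_max_err" },
	{ /* sentinel */ }
};

/*
 * Walk a sentinel-terminated table and collect the messages whose mask
 * matches the status word.  Returns the number of messages stored in out,
 * never more than max.
 */
static size_t collect_errors(const struct hw_error *err, uint32_t err_sts,
			     const char **out, size_t max)
{
	size_t n = 0;

	/* The sentinel entry has a NULL msg, which ends the loop. */
	for (; err->msg; err++) {
		if ((err->int_msk & err_sts) && n < max)
			out[n++] = err->msg;
	}

	return n;
}
```

A driver would log each match instead of collecting it; returning the
messages here just makes the walk easy to check.  Forgetting the sentinel
would make the loop run past the end of the array, which is why every table in
the series ends with one.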




[PATCH net-next 12/14] net: hns3: handle hw errors of PPU(RCB)

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch enables and handles hw RAS and MSIx errors of PPU(RCB).

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   3 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 162 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  15 ++
 3 files changed, 180 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 46af567..0223e83 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -217,6 +217,9 @@ enum hclge_opcode_type {
/* Error INT commands */
HCLGE_MAC_COMMON_INT_EN = 0x030E,
HCLGE_TM_SCH_ECC_INT_EN = 0x0829,
+   HCLGE_PPU_MPF_ECC_INT_CMD   = 0x0B40,
+   HCLGE_PPU_MPF_OTHER_INT_CMD = 0x0B41,
+   HCLGE_PPU_PF_OTHER_INT_CMD  = 0x0B42,
HCLGE_COMMON_ECC_INT_CFG= 0x1505,
HCLGE_QUERY_RAS_INT_STS_BD_NUM  = 0x1510,
HCLGE_QUERY_CLEAR_MPF_RAS_INT   = 0x1511,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index e82ef4f..00086ce 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -222,6 +222,47 @@ static const struct hclge_hw_error hclge_mac_afifo_tnl_int[] = {
{ /* sentinel */ }
 };
 
+static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st2[] = {
+   { .int_msk = BIT(13), .msg = "rpu_rx_pkt_bit32_ecc_mbit_err" },
+   { .int_msk = BIT(14), .msg = "rpu_rx_pkt_bit33_ecc_mbit_err" },
+   { .int_msk = BIT(15), .msg = "rpu_rx_pkt_bit34_ecc_mbit_err" },
+   { .int_msk = BIT(16), .msg = "rpu_rx_pkt_bit35_ecc_mbit_err" },
+   { .int_msk = BIT(17), .msg = "rcb_tx_ring_ecc_mbit_err" },
+   { .int_msk = BIT(18), .msg = "rcb_rx_ring_ecc_mbit_err" },
+   { .int_msk = BIT(19), .msg = "rcb_tx_fbd_ecc_mbit_err" },
+   { .int_msk = BIT(20), .msg = "rcb_rx_ebd_ecc_mbit_err" },
+   { .int_msk = BIT(21), .msg = "rcb_tso_info_ecc_mbit_err" },
+   { .int_msk = BIT(22), .msg = "rcb_tx_int_info_ecc_mbit_err" },
+   { .int_msk = BIT(23), .msg = "rcb_rx_int_info_ecc_mbit_err" },
+   { .int_msk = BIT(24), .msg = "tpu_tx_pkt_0_ecc_mbit_err" },
+   { .int_msk = BIT(25), .msg = "tpu_tx_pkt_1_ecc_mbit_err" },
+   { .int_msk = BIT(26), .msg = "rd_bus_err" },
+   { .int_msk = BIT(27), .msg = "wr_bus_err" },
+   { .int_msk = BIT(28), .msg = "reg_search_miss" },
+   { .int_msk = BIT(29), .msg = "rx_q_search_miss" },
+   { .int_msk = BIT(30), .msg = "ooo_ecc_err_detect" },
+   { .int_msk = BIT(31), .msg = "ooo_ecc_err_multpl" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st3[] = {
+   { .int_msk = BIT(4), .msg = "gro_bd_ecc_mbit_err" },
+   { .int_msk = BIT(5), .msg = "gro_context_ecc_mbit_err" },
+   { .int_msk = BIT(6), .msg = "rx_stash_cfg_ecc_mbit_err" },
+   { .int_msk = BIT(7), .msg = "axi_rd_fbd_ecc_mbit_err" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_ppu_pf_abnormal_int[] = {
+   { .int_msk = BIT(0), .msg = "over_8bd_no_fe" },
+   { .int_msk = BIT(1), .msg = "tso_mss_cmp_min_err" },
+   { .int_msk = BIT(2), .msg = "tso_mss_cmp_max_err" },
+   { .int_msk = BIT(3), .msg = "tx_rd_fbd_poison" },
+   { .int_msk = BIT(4), .msg = "rx_rd_ebd_poison" },
+   { .int_msk = BIT(5), .msg = "buf_wait_timeout" },
+   { /* sentinel */ }
+};
+
 static void hclge_log_error(struct device *dev, char *reg,
const struct hclge_hw_error *err,
u32 err_sts)
@@ -489,6 +530,82 @@ static int hclge_config_mac_err_int(struct hclge_dev *hdev, bool en)
return ret;
 }
 
+static int hclge_config_ppu_error_interrupts(struct hclge_dev *hdev, u32 cmd,
+bool en)
+{
+   struct device *dev = &hdev->pdev->dev;
+   struct hclge_desc desc[2];
+   int num = 1;
+   int ret;
+
+   /* configure PPU error interrupts */
+   if (cmd == HCLGE_PPU_MPF_ECC_INT_CMD) {
+   hclge_cmd_setup_basic_desc(&desc[0], cmd, false);
+   desc[0].flag |= HCLGE_CMD_FLAG_NEXT;
+   hclge_cmd_setup_basic_desc(&desc[1], cmd, false);
+   if (en) {
+   desc[0].data[0] = HCLGE_PPU_MPF_ABNORMAL_INT0_EN;
+   desc[0].data[1] = HCLGE_PPU_MPF_ABNORMAL_INT1_EN;
+   desc[1].data[3] = HCLGE_PPU_MPF_ABNORMAL_INT3_EN;
+   desc[1].data[4] = HCLGE_PPU_MPF_ABNORMAL_INT2_EN;
+   }
+
+   desc[1].data[0] = HCLGE_PPU_MPF_ABNORMAL_INT0_EN_MASK;
+   desc[1].data[1] = HCLGE_PPU_MPF_ABNORMAL_INT1_EN

[PATCH net-next 07/14] net: hns3: add handling of hw ras errors using new set of commands

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

1. This patch adds handling of hw ras errors using a new set of
   common commands.
2. It updates the error message tables to match the register names and
   the error status returned by the commands.
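
The table-driven decoding that these error message tables feed can be illustrated with a minimal userspace sketch. It assumes a table entry holds a bit mask and a message, terminated by an all-zero sentinel, as the tables in this patch do. The names hw_error, msix_sram_ecc_int and decode_errors are hypothetical stand-ins, and the body of the driver's own hclge_log_error is not shown in this thread, so this is not its implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical userspace stand-in for struct hclge_hw_error. */
struct hw_error {
	uint32_t int_msk;
	const char *msg;
};

/* Only the multi-bit (mbit) error bits are listed, as in the updated tables. */
static const struct hw_error msix_sram_ecc_int[] = {
	{ .int_msk = BIT(1), .msg = "msix_nic_ecc_mbit_err" },
	{ .int_msk = BIT(3), .msg = "msix_rocee_ecc_mbit_err" },
	{ 0 } /* sentinel: msg == NULL ends the walk */
};

/* Store up to max messages whose mask bit is set in err_sts; return the count. */
static size_t decode_errors(const struct hw_error *err, uint32_t err_sts,
			    const char **out, size_t max)
{
	size_t n = 0;

	for (; err->msg; err++) {
		if ((err->int_msk & err_sts) && n < max)
			out[n++] = err->msg;
	}

	return n;
}
```

Status bits that have no table entry are simply ignored by the walk, so dropping an entry from a table is enough to stop that bit from being reported.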

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   3 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 489 ++---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |   9 +
 3 files changed, 331 insertions(+), 170 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index d2fb210..0a0eb6c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -217,6 +217,9 @@ enum hclge_opcode_type {
/* Error INT commands */
HCLGE_TM_SCH_ECC_INT_EN = 0x0829,
HCLGE_COMMON_ECC_INT_CFG= 0x1505,
+   HCLGE_QUERY_RAS_INT_STS_BD_NUM  = 0x1510,
+   HCLGE_QUERY_CLEAR_MPF_RAS_INT   = 0x1511,
+   HCLGE_QUERY_CLEAR_PF_RAS_INT= 0x1512,
HCLGE_IGU_EGU_TNL_INT_EN= 0x1803,
HCLGE_IGU_COMMON_INT_EN = 0x1806,
HCLGE_TM_QCN_MEM_INT_CFG= 0x1A14,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index d1c9f7a..22e7c5b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -20,12 +20,7 @@ static const struct hclge_hw_error hclge_imp_tcm_ecc_int[] = {
{ .int_msk = BIT(13), .msg = "imp_dtcm1_mem0_ecc_mbit_err" },
{ .int_msk = BIT(14), .msg = "imp_dtcm1_mem1_ecc_1bit_err" },
{ .int_msk = BIT(15), .msg = "imp_dtcm1_mem1_ecc_mbit_err" },
-   { /* sentinel */ }
-};
-
-static const struct hclge_hw_error hclge_imp_itcm4_ecc_int[] = {
-   { .int_msk = BIT(0), .msg = "imp_itcm4_ecc_1bit_err" },
-   { .int_msk = BIT(1), .msg = "imp_itcm4_ecc_mbit_err" },
+   { .int_msk = BIT(17), .msg = "imp_itcm4_ecc_mbit_err" },
{ /* sentinel */ }
 };
 
@@ -46,26 +41,14 @@ static const struct hclge_hw_error hclge_cmdq_nic_mem_ecc_int[] = {
{ .int_msk = BIT(13), .msg = "cmdq_nic_rx_addr_ecc_mbit_err" },
{ .int_msk = BIT(14), .msg = "cmdq_nic_tx_addr_ecc_1bit_err" },
{ .int_msk = BIT(15), .msg = "cmdq_nic_tx_addr_ecc_mbit_err" },
-   { /* sentinel */ }
-};
-
-static const struct hclge_hw_error hclge_cmdq_rocee_mem_ecc_int[] = {
-   { .int_msk = BIT(0), .msg = "cmdq_rocee_rx_depth_ecc_1bit_err" },
-   { .int_msk = BIT(1), .msg = "cmdq_rocee_rx_depth_ecc_mbit_err" },
-   { .int_msk = BIT(2), .msg = "cmdq_rocee_tx_depth_ecc_1bit_err" },
-   { .int_msk = BIT(3), .msg = "cmdq_rocee_tx_depth_ecc_mbit_err" },
-   { .int_msk = BIT(4), .msg = "cmdq_rocee_rx_tail_ecc_1bit_err" },
-   { .int_msk = BIT(5), .msg = "cmdq_rocee_rx_tail_ecc_mbit_err" },
-   { .int_msk = BIT(6), .msg = "cmdq_rocee_tx_tail_ecc_1bit_err" },
-   { .int_msk = BIT(7), .msg = "cmdq_rocee_tx_tail_ecc_mbit_err" },
-   { .int_msk = BIT(8), .msg = "cmdq_rocee_rx_head_ecc_1bit_err" },
-   { .int_msk = BIT(9), .msg = "cmdq_rocee_rx_head_ecc_mbit_err" },
-   { .int_msk = BIT(10), .msg = "cmdq_rocee_tx_head_ecc_1bit_err" },
-   { .int_msk = BIT(11), .msg = "cmdq_rocee_tx_head_ecc_mbit_err" },
-   { .int_msk = BIT(12), .msg = "cmdq_rocee_rx_addr_ecc_1bit_err" },
-   { .int_msk = BIT(13), .msg = "cmdq_rocee_rx_addr_ecc_mbit_err" },
-   { .int_msk = BIT(14), .msg = "cmdq_rocee_tx_addr_ecc_1bit_err" },
-   { .int_msk = BIT(15), .msg = "cmdq_rocee_tx_addr_ecc_mbit_err" },
+   { .int_msk = BIT(17), .msg = "cmdq_rocee_rx_depth_ecc_mbit_err" },
+   { .int_msk = BIT(19), .msg = "cmdq_rocee_tx_depth_ecc_mbit_err" },
+   { .int_msk = BIT(21), .msg = "cmdq_rocee_rx_tail_ecc_mbit_err" },
+   { .int_msk = BIT(23), .msg = "cmdq_rocee_tx_tail_ecc_mbit_err" },
+   { .int_msk = BIT(25), .msg = "cmdq_rocee_rx_head_ecc_mbit_err" },
+   { .int_msk = BIT(27), .msg = "cmdq_rocee_tx_head_ecc_mbit_err" },
+   { .int_msk = BIT(29), .msg = "cmdq_rocee_rx_addr_ecc_mbit_err" },
+   { .int_msk = BIT(31), .msg = "cmdq_rocee_tx_addr_ecc_mbit_err" },
{ /* sentinel */ }
 };
 
@@ -85,7 +68,13 @@ static const struct hclge_hw_error hclge_tqp_int_ecc_int[] = {
{ /* sentinel */ }
 };
 
-static const struct hclge_hw_error hclge_igu_com_err_int[] = {
+static const struct hclge_hw_error hclge_msix_sram_ecc_int[] = {
+   { .int_msk = BIT(1), .msg = "msix_nic_ecc_mbit_err" },
+   { .int_msk = BIT(3), .msg = "msix_rocee_ecc_mbit_err" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_igu_int[] = {
{ .int_msk = BIT(0), .msg = "igu_rx_buf0_ecc_mbit_err" },
{ .int_msk = BIT(1), .msg = "igu_rx_buf0_ecc_1bit_err" },
{ .

[PATCH net-next 10/14] net: hns3: add handling of hw errors of MAC

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch enables the hw error interrupts of the MAC block
and adds handling of the errors they report.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 48 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  2 +
 3 files changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 08d02b9..46af567 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -215,6 +215,7 @@ enum hclge_opcode_type {
HCLGE_OPC_SFP_GET_SPEED = 0x7104,
 
/* Error INT commands */
+   HCLGE_MAC_COMMON_INT_EN = 0x030E,
HCLGE_TM_SCH_ECC_INT_EN = 0x0829,
HCLGE_COMMON_ECC_INT_CFG= 0x1505,
HCLGE_QUERY_RAS_INT_STS_BD_NUM  = 0x1510,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 0676670..20f8bb5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -210,6 +210,18 @@ static const struct hclge_hw_error hclge_qcn_ecc_rint[] = {
{ /* sentinel */ }
 };
 
+static const struct hclge_hw_error hclge_mac_afifo_tnl_int[] = {
+   { .int_msk = BIT(0), .msg = "egu_cge_afifo_ecc_1bit_err" },
+   { .int_msk = BIT(1), .msg = "egu_cge_afifo_ecc_mbit_err" },
+   { .int_msk = BIT(2), .msg = "egu_lge_afifo_ecc_1bit_err" },
+   { .int_msk = BIT(3), .msg = "egu_lge_afifo_ecc_mbit_err" },
+   { .int_msk = BIT(4), .msg = "cge_igu_afifo_ecc_1bit_err" },
+   { .int_msk = BIT(5), .msg = "cge_igu_afifo_ecc_mbit_err" },
+   { .int_msk = BIT(6), .msg = "lge_igu_afifo_ecc_1bit_err" },
+   { .int_msk = BIT(7), .msg = "lge_igu_afifo_ecc_mbit_err" },
+   { /* sentinel */ }
+};
+
 static void hclge_log_error(struct device *dev, char *reg,
const struct hclge_hw_error *err,
u32 err_sts)
@@ -452,6 +464,27 @@ static int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en)
return ret;
 }
 
+static int hclge_config_mac_err_int(struct hclge_dev *hdev, bool en)
+{
+   struct device *dev = &hdev->pdev->dev;
+   struct hclge_desc desc;
+   int ret;
+
+   /* configure MAC common error interrupts */
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_MAC_COMMON_INT_EN, false);
+   if (en)
+   desc.data[0] = cpu_to_le32(HCLGE_MAC_COMMON_ERR_INT_EN);
+
+   desc.data[1] = cpu_to_le32(HCLGE_MAC_COMMON_ERR_INT_EN_MASK);
+
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret)
+   dev_err(dev,
+   "fail(%d) to configure MAC COMMON error intr\n", ret);
+
+   return ret;
+}
+
 #define HCLGE_SET_DEFAULT_RESET_REQUEST(reset_type) \
do { \
if (ae_dev->ops->set_default_reset_request) \
@@ -688,6 +721,10 @@ static const struct hclge_hw_blk hw_blk[] = {
  .msk = BIT(5), .name = "COMMON",
  .config_err_int = hclge_config_common_hw_err_int,
},
+   {
+ .msk = BIT(8), .name = "MAC",
+ .config_err_int = hclge_config_mac_err_int,
+   },
{ /* sentinel */ }
 };
 
@@ -735,7 +772,9 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
u32 mpf_bd_num, pf_bd_num, bd_num;
struct hclge_desc desc_bd;
struct hclge_desc *desc;
+   __le32 *desc_data;
int ret = 0;
+   u32 status;
 
/* set default handling */
set_bit(HNAE3_FUNC_RESET, reset_requests);
@@ -774,6 +813,15 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
goto msi_error;
}
 
+   /* log MAC errors */
+   desc_data = (__le32 *)&desc[1];
+   status = le32_to_cpu(*desc_data);
+   if (status) {
+   hclge_log_error(dev, "MAC_AFIFO_TNL_INT_R",
+   &hclge_mac_afifo_tnl_int[0], status);
+   set_bit(HNAE3_GLOBAL_RESET, reset_requests);
+   }
+
/* clear all main PF MSIx errors */
hclge_cmd_reuse_desc(&desc[0], false);
desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
index 05adccb..8e7d151 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
@@ -44,6 +44,8 @@
 #define HCLGE_TM_QCN_MEM_ERR_INT_EN    0xFF
 #define HCLGE_NCSI_ERR_INT_EN  0x3
 #define HCLGE_NCSI_ERR_INT_TYPE        0x9
+#define HCLGE_MAC_COMMON_ERR_INT_EN    GENMASK(7, 0)
+#define HCLGE_MAC_COMMON_ERR_INT_EN_MASK   GENMASK(7, 0)
 
 #define HCLGE_IGU_INT_MASK GENMASK(3, 0)
 #define HCLGE_IGU

[PATCH net-next 09/14] net: hns3: add handling of hw errors reported through MSIX

2018-12-07 Thread Salil Mehta
This patch adds handling for HNS3 hardware errors (non-standard)
which are reported through MSIX interrupts and not through the
PCIe AER channel.

These MSIX-reported hardware errors are handled using the common
misc. interrupt handler. The hardware error registers cannot be
cleared in the context of the interrupt received, as they require
*heavy* access to hardware using IMP (Integrated Management
Processor) commands. Hence, we defer the clearing of such error
events until a later time.

Since we have deferred the exact identification of the errors, we
also have to defer the decision on the level of recovery/reset
which might be required. Hence, a new reset type, UNKNOWN reset,
has been introduced, which effectively defers the assertion
of the reset until we know the kind of errors at a later
time.
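
The deferral described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the enum mirrors the reset types visible in hnae3.h in this patch, but dev_state, irq_handler and resolve_reset are hypothetical names, hw_status stands in for the status words that the IMP commands would return, and the choice between a function reset and a global reset is simplified compared to the driver.

```c
#include <stdint.h>

/* Mirrors the order of enum hnae3_reset_type shown in this patch. */
enum reset_type {
	FUNC_RESET,
	CORE_RESET,
	GLOBAL_RESET,
	IMP_RESET,
	UNKNOWN_RESET,
	NONE_RESET,
};

/* Hypothetical device state: a reset-request bitmap and a fake status word. */
struct dev_state {
	unsigned long reset_requests;	/* one bit per enum reset_type */
	uint32_t hw_status;		/* stand-in for the queried error status */
};

/* Interrupt context: only record that an error of unknown severity occurred. */
static void irq_handler(struct dev_state *d)
{
	d->reset_requests |= 1UL << UNKNOWN_RESET;
}

/* Deferred context: query the (fake) status and pick a concrete reset level. */
static enum reset_type resolve_reset(struct dev_state *d)
{
	enum reset_type type;

	if (!(d->reset_requests & (1UL << UNKNOWN_RESET)))
		return NONE_RESET;

	d->reset_requests &= ~(1UL << UNKNOWN_RESET);

	/* Default to a function reset; escalate when an error is latched. */
	type = d->hw_status ? GLOBAL_RESET : FUNC_RESET;
	d->reset_requests |= 1UL << type;

	return type;
}
```

The point of the split is that irq_handler touches nothing but a bitmap, while all slow hardware access is left to resolve_reset, which can run in a context that is allowed to sleep.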

Signed-off-by: Salil Mehta 
Signed-off-by: Shiju Jose 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  3 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 93 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  5 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 39 -
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  1 +
 6 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 9d9f4f9..294e725 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -136,6 +136,7 @@ enum hnae3_reset_type {
HNAE3_CORE_RESET,
HNAE3_GLOBAL_RESET,
HNAE3_IMP_RESET,
+   HNAE3_UNKNOWN_RESET,
HNAE3_NONE_RESET,
 };
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 0a0eb6c..08d02b9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -220,6 +220,9 @@ enum hclge_opcode_type {
HCLGE_QUERY_RAS_INT_STS_BD_NUM  = 0x1510,
HCLGE_QUERY_CLEAR_MPF_RAS_INT   = 0x1511,
HCLGE_QUERY_CLEAR_PF_RAS_INT= 0x1512,
+   HCLGE_QUERY_MSIX_INT_STS_BD_NUM = 0x1513,
+   HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT  = 0x1514,
+   HCLGE_QUERY_CLEAR_ALL_PF_MSIX_INT   = 0x1515,
HCLGE_IGU_EGU_TNL_INT_EN= 0x1803,
HCLGE_IGU_COMMON_INT_EN = 0x1806,
HCLGE_TM_QCN_MEM_INT_CFG= 0x1A14,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 7371ae4..0676670 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -727,3 +727,96 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev)
 
return PCI_ERS_RESULT_RECOVERED;
 }
+
+int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
+  unsigned long *reset_requests)
+{
+   struct device *dev = &hdev->pdev->dev;
+   u32 mpf_bd_num, pf_bd_num, bd_num;
+   struct hclge_desc desc_bd;
+   struct hclge_desc *desc;
+   int ret = 0;
+
+   /* set default handling */
+   set_bit(HNAE3_FUNC_RESET, reset_requests);
+
+   /* query the number of bds for the MSIx int status */
+   hclge_cmd_setup_basic_desc(&desc_bd, HCLGE_QUERY_MSIX_INT_STS_BD_NUM,
+  true);
+   ret = hclge_cmd_send(&hdev->hw, &desc_bd, 1);
+   if (ret) {
+   dev_err(dev, "fail(%d) to query msix int status bd num\n",
+   ret);
+   /* reset everything for now */
+   set_bit(HNAE3_GLOBAL_RESET, reset_requests);
+   return ret;
+   }
+
+   mpf_bd_num = le32_to_cpu(desc_bd.data[0]);
+   pf_bd_num = le32_to_cpu(desc_bd.data[1]);
+   bd_num = max_t(u32, mpf_bd_num, pf_bd_num);
+
+   desc = kcalloc(bd_num, sizeof(struct hclge_desc), GFP_KERNEL);
+   if (!desc)
+   goto out;
+
+   /* query all main PF MSIx errors */
+   hclge_cmd_setup_basic_desc(&desc[0], HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT,
+  true);
+   desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
+
+   ret = hclge_cmd_send(&hdev->hw, &desc[0], mpf_bd_num);
+   if (ret) {
+   dev_err(dev, "query all mpf msix int cmd failed (%d)\n",
+   ret);
+   /* reset everything for now */
+   set_bit(HNAE3_GLOBAL_RESET, reset_requests);
+   goto msi_error;
+   }
+
+   /* clear all main PF MSIx errors */
+   hclge_cmd_reuse_desc(&desc[0], false);
+   desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
+
+   ret = hclge_cmd_send(&hdev->hw, &desc[0], mpf_bd_num);
+   if (ret) {
+   dev_err(dev, "clear all mpf msix int cmd failed (%d)\n",
+   ret);
+   /* reset everything

[PATCH net-next 03/14] net: hns3: re-enable error interrupts on hw reset

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch calls the hclge_hw_error_set_state function to
re-enable the error interrupts, which get disabled on
hw reset.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c  |  2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h  |  1 -
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 14 +-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 21437fe..7e23d36 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -540,7 +540,7 @@ static int hclge_config_ppp_hw_err_int(struct hclge_dev *hdev, bool en)
return ret;
 }
 
-int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en)
+static int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en)
 {
struct device *dev = &hdev->pdev->dev;
struct hclge_desc desc;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
index 856374c..405739b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
@@ -59,6 +59,5 @@ struct hclge_hw_error {
 };
 
 int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state);
-int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en);
 pci_ers_result_t hclge_process_ras_hw_error(struct hnae3_ae_dev *ae_dev);
 #endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 5cea95c..431d92a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -7269,7 +7269,7 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
ret = hclge_hw_error_set_state(hdev, true);
if (ret) {
dev_err(&pdev->dev,
-   "hw error interrupts enable failed, ret =%d\n", ret);
+   "fail(%d) to enable hw error interrupts\n", ret);
goto err_mdiobus_unreg;
}
 
@@ -7405,11 +7405,15 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
 
-   /* Re-enable the TM hw error interrupts because
-* they get disabled on core/global reset.
+   /* Re-enable the hw error interrupts because
+* the interrupts get disabled on core/global reset.
 */
-   if (hclge_config_tm_hw_err_int(hdev, true))
-   dev_err(&pdev->dev, "failed to enable TM hw error interrupts\n");
+   ret = hclge_hw_error_set_state(hdev, true);
+   if (ret) {
+   dev_err(&pdev->dev,
+   "fail(%d) to re-enable HNS hw error interrupts\n", ret);
+   return ret;
+   }
 
hclge_reset_vport_state(hdev);
 
-- 
2.7.4




[PATCH net-next 04/14] net: hns3: deletes unnecessary settings of the descriptor data

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch deletes unnecessary setting of the descriptor data
to 0 for disabling error interrupts because
it is already done by the hclge_cmd_setup_basic_desc function.
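
Why the removed else branches are redundant can be shown with a small userspace sketch. It assumes, as this commit message states, that the descriptor setup helper zeroes the descriptor before use. The names desc, setup_basic_desc, config_err_int, OPC_ERR_INT_EN and ERR_INT_EN are hypothetical, the struct layout is not the driver's, and the cpu_to_le32 conversions are left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command descriptor; not the layout of struct hclge_desc. */
struct desc {
	uint16_t opcode;
	uint16_t flag;
	uint32_t data[6];
};

#define OPC_ERR_INT_EN	0x1234	/* made-up opcode for the sketch */
#define ERR_INT_EN	0x3u	/* made-up enable bits for the sketch */

/* Assumed behaviour of the setup helper: clear everything, then set the opcode. */
static void setup_basic_desc(struct desc *d, uint16_t opcode)
{
	memset(d, 0, sizeof(*d));
	d->opcode = opcode;
}

static void config_err_int(struct desc *d, bool en)
{
	setup_basic_desc(d, OPC_ERR_INT_EN);
	if (en)
		d->data[0] = ERR_INT_EN;
	/* No else branch: data[0] is already 0 after the setup call. */
}
```

Even when the descriptor holds stale data from an earlier command, the disable path ends up with data[0] equal to 0, which is exactly what the deleted assignments produced.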

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 27 --
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 7e23d36..62fab23 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -390,13 +390,8 @@ static int hclge_config_common_hw_err_int(struct hclge_dev *hdev, bool en)
desc[0].data[3] = cpu_to_le32(HCLGE_IMP_RD_POISON_ERR_INT_EN);
desc[0].data[4] = cpu_to_le32(HCLGE_TQP_ECC_ERR_INT_EN);
desc[0].data[5] = cpu_to_le32(HCLGE_IMP_ITCM4_ECC_ERR_INT_EN);
-   } else {
-   desc[0].data[0] = 0;
-   desc[0].data[2] = 0;
-   desc[0].data[3] = 0;
-   desc[0].data[4] = 0;
-   desc[0].data[5] = 0;
}
+
desc[1].data[0] = cpu_to_le32(HCLGE_IMP_TCM_ECC_ERR_INT_EN_MASK);
desc[1].data[2] = cpu_to_le32(HCLGE_CMDQ_NIC_ECC_ERR_INT_EN_MASK |
HCLGE_CMDQ_ROCEE_ECC_ERR_INT_EN_MASK);
@@ -425,8 +420,6 @@ static int hclge_config_ncsi_hw_err_int(struct hclge_dev *hdev, bool en)
hclge_cmd_setup_basic_desc(&desc, HCLGE_NCSI_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_NCSI_ERR_INT_EN);
-   else
-   desc.data[0] = 0;
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret)
@@ -446,8 +439,7 @@ static int hclge_config_igu_egu_hw_err_int(struct hclge_dev *hdev, bool en)
hclge_cmd_setup_basic_desc(&desc, HCLGE_IGU_COMMON_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_IGU_ERR_INT_EN);
-   else
-   desc.data[0] = 0;
+
desc.data[1] = cpu_to_le32(HCLGE_IGU_ERR_INT_EN_MASK);
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
@@ -460,8 +452,7 @@ static int hclge_config_igu_egu_hw_err_int(struct hclge_dev *hdev, bool en)
hclge_cmd_setup_basic_desc(&desc, HCLGE_IGU_EGU_TNL_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_IGU_TNL_ERR_INT_EN);
-   else
-   desc.data[0] = 0;
+
desc.data[1] = cpu_to_le32(HCLGE_IGU_TNL_ERR_INT_EN_MASK);
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
@@ -494,10 +485,8 @@ static int hclge_config_ppp_error_interrupt(struct hclge_dev *hdev, u32 cmd,
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT0_EN);
desc[0].data[1] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT1_EN);
-   } else {
-   desc[0].data[0] = 0;
-   desc[0].data[1] = 0;
}
+
desc[1].data[0] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT0_EN_MASK);
desc[1].data[1] =
@@ -508,10 +497,8 @@ static int hclge_config_ppp_error_interrupt(struct hclge_dev *hdev, u32 cmd,
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT2_EN);
desc[0].data[1] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT3_EN);
-   } else {
-   desc[0].data[0] = 0;
-   desc[0].data[1] = 0;
}
+
desc[1].data[0] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT2_EN_MASK);
desc[1].data[1] =
@@ -550,8 +537,6 @@ static int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en)
hclge_cmd_setup_basic_desc(&desc, HCLGE_TM_SCH_ECC_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_TM_SCH_ECC_ERR_INT_EN);
-   else
-   desc.data[0] = 0;
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret) {
@@ -570,8 +555,6 @@ static int hclge_config_tm_hw_err_int(struct hclge_dev *hdev, bool en)
hclge_cmd_reuse_desc(&desc, false);
if (en)
desc.data[1] = cpu_to_le32(HCLGE_TM_QCN_MEM_ERR_INT_EN);
-   else
-   desc.data[1] = 0;
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret)
-- 
2.7.4




[PATCH net-next 02/14] net: hns3: rename enable error interrupt functions

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch
- renames the enable error interrupt functions, because
  these functions are used to both enable and disable
  the error interrupts.

- removes redundant logs from the enable error interrupt functions.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 83 --
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  4 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|  2 +-
 3 files changed, 34 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index ac9ab3c..21437fe 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -372,18 +372,18 @@ static int hclge_cmd_query_error(struct hclge_dev *hdev,
return ret;
 }
 
-static int hclge_enable_common_error(struct hclge_dev *hdev, bool en)
+static int hclge_config_common_hw_err_int(struct hclge_dev *hdev, bool en)
 {
struct device *dev = &hdev->pdev->dev;
struct hclge_desc desc[2];
int ret;
 
+   /* configure common error interrupts */
hclge_cmd_setup_basic_desc(&desc[0], HCLGE_COMMON_ECC_INT_CFG, false);
desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
hclge_cmd_setup_basic_desc(&desc[1], HCLGE_COMMON_ECC_INT_CFG, false);
 
if (en) {
-   /* enable COMMON error interrupts */
desc[0].data[0] = cpu_to_le32(HCLGE_IMP_TCM_ECC_ERR_INT_EN);
desc[0].data[2] = cpu_to_le32(HCLGE_CMDQ_NIC_ECC_ERR_INT_EN |
HCLGE_CMDQ_ROCEE_ECC_ERR_INT_EN);
@@ -391,7 +391,6 @@ static int hclge_enable_common_error(struct hclge_dev *hdev, bool en)
desc[0].data[4] = cpu_to_le32(HCLGE_TQP_ECC_ERR_INT_EN);
desc[0].data[5] = cpu_to_le32(HCLGE_IMP_ITCM4_ECC_ERR_INT_EN);
} else {
-   /* disable COMMON error interrupts */
desc[0].data[0] = 0;
desc[0].data[2] = 0;
desc[0].data[3] = 0;
@@ -408,13 +407,12 @@ static int hclge_enable_common_error(struct hclge_dev *hdev, bool en)
ret = hclge_cmd_send(&hdev->hw, &desc[0], 2);
if (ret)
dev_err(dev,
-   "failed(%d) to enable/disable COMMON err interrupts\n",
-   ret);
+   "fail(%d) to configure common err interrupts\n", ret);
 
return ret;
 }
 
-static int hclge_enable_ncsi_error(struct hclge_dev *hdev, bool en)
+static int hclge_config_ncsi_hw_err_int(struct hclge_dev *hdev, bool en)
 {
struct device *dev = &hdev->pdev->dev;
struct hclge_desc desc;
@@ -423,7 +421,7 @@ static int hclge_enable_ncsi_error(struct hclge_dev *hdev, bool en)
if (hdev->pdev->revision < 0x21)
return 0;
 
-   /* enable/disable NCSI  error interrupts */
+   /* configure NCSI error interrupts */
hclge_cmd_setup_basic_desc(&desc, HCLGE_NCSI_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_NCSI_ERR_INT_EN);
@@ -433,19 +431,18 @@ static int hclge_enable_ncsi_error(struct hclge_dev *hdev, bool en)
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret)
dev_err(dev,
-   "failed(%d) to enable/disable NCSI error interrupts\n",
-   ret);
+   "fail(%d) to configure  NCSI error interrupts\n", ret);
 
return ret;
 }
 
-static int hclge_enable_igu_egu_error(struct hclge_dev *hdev, bool en)
+static int hclge_config_igu_egu_hw_err_int(struct hclge_dev *hdev, bool en)
 {
struct device *dev = &hdev->pdev->dev;
struct hclge_desc desc;
int ret;
 
-   /* enable/disable error interrupts */
+   /* configure IGU,EGU error interrupts */
hclge_cmd_setup_basic_desc(&desc, HCLGE_IGU_COMMON_INT_EN, false);
if (en)
desc.data[0] = cpu_to_le32(HCLGE_IGU_ERR_INT_EN);
@@ -456,8 +453,7 @@ static int hclge_enable_igu_egu_error(struct hclge_dev *hdev, bool en)
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret) {
dev_err(dev,
-   "failed(%d) to enable/disable IGU common interrupts\n",
-   ret);
+   "fail(%d) to configure IGU common interrupts\n", ret);
return ret;
}
 
@@ -471,26 +467,23 @@ static int hclge_enable_igu_egu_error(struct hclge_dev *hdev, bool en)
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret) {
dev_err(dev,
-   "failed(%d) to enable/disable IGU-EGU TNL interrupts\n",
-   ret);
+   "fail(%d) to configure IGU-EGU TNL interrupts\n", ret);
return ret;
}
 
-   ret = hclge_enable_n

[PATCH net-next 06/14] net: hns3: add optimization in the hclge_hw_error_set_state

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

1. This patch adds a minor loop optimization in the
   hclge_hw_error_set_state function.
2. It logs the module's name if configuring the
   error interrupts fails.
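
The loop shape this patch moves to, a pointer walked over a sentinel-terminated table, can be sketched as follows. This is a userspace illustration only: hw_blk, set_state, cfg_ok and cfg_fail are hypothetical names, the void pointer stands in for struct hclge_dev, and the call counter exists purely to make the early return observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hclge_hw_blk. */
struct hw_blk {
	const char *name;
	int (*config_err_int)(void *hdev, bool en);
};

static int calls;	/* counts callback invocations for the sketch */

static int cfg_ok(void *hdev, bool en)
{
	(void)hdev;
	(void)en;
	calls++;
	return 0;
}

static int cfg_fail(void *hdev, bool en)
{
	(void)hdev;
	(void)en;
	calls++;
	return -5;
}

/* Walk until the sentinel (name == NULL); skip blocks without a callback. */
static int set_state(const struct hw_blk *module, void *hdev, bool state)
{
	int ret = 0;

	while (module->name) {
		if (module->config_err_int) {
			ret = module->config_err_int(hdev, state);
			if (ret)
				return ret;
		}
		module++;
	}

	return ret;
}
```

Compared with the index-based version, there is a single increment at the bottom of the loop, so no path can skip it or run it twice.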

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 2d07be8..d1c9f7a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -586,18 +586,16 @@ static const struct hclge_hw_blk hw_blk[] = {
 
 int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state)
 {
+   const struct hclge_hw_blk *module = hw_blk;
int ret = 0;
-   int i = 0;
 
-   while (hw_blk[i].name) {
-   if (!hw_blk[i].config_err_int) {
-   i++;
-   continue;
+   while (module->name) {
+   if (module->config_err_int) {
+   ret = module->config_err_int(hdev, state);
+   if (ret)
+   return ret;
}
-   ret = hw_blk[i].config_err_int(hdev, state);
-   if (ret)
-   return ret;
-   i++;
+   module++;
}
 
return ret;
-- 
2.7.4




[PATCH net-next 05/14] net: hns3: rename process_hw_error function

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch renames process_hw_error function to
handle_hw_ras_error function to match the purpose
of the function. This is because hw errors reported through
ras and msix interrupts will be handled separately.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index a1707b7..9d9f4f9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -454,7 +454,7 @@ struct hnae3_ae_ops {
int (*restore_fd_rules)(struct hnae3_handle *handle);
void (*enable_fd)(struct hnae3_handle *handle, bool enable);
int (*dbg_run_cmd)(struct hnae3_handle *handle, char *cmd_buf);
-   pci_ers_result_t (*process_hw_error)(struct hnae3_ae_dev *ae_dev);
+   pci_ers_result_t (*handle_hw_ras_error)(struct hnae3_ae_dev *ae_dev);
bool (*get_hw_reset_stat)(struct hnae3_handle *handle);
bool (*ae_dev_resetting)(struct hnae3_handle *handle);
unsigned long (*ae_dev_reset_cnt)(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index d1b2de2..69142a3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1828,8 +1828,8 @@ static pci_ers_result_t hns3_error_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_NONE;
}
 
-   if (ae_dev->ops->process_hw_error)
-   ret = ae_dev->ops->process_hw_error(ae_dev);
+   if (ae_dev->ops->handle_hw_ras_error)
+   ret = ae_dev->ops->handle_hw_ras_error(ae_dev);
else
return PCI_ERS_RESULT_NONE;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 62fab23..2d07be8 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -603,7 +603,7 @@ int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state)
return ret;
 }
 
-pci_ers_result_t hclge_process_ras_hw_error(struct hnae3_ae_dev *ae_dev)
+pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev)
 {
struct hclge_dev *hdev = ae_dev->priv;
struct device *dev = &hdev->pdev->dev;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
index 405739b..9fe1c96 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
@@ -59,5 +59,5 @@ struct hclge_hw_error {
 };
 
 int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state);
-pci_ers_result_t hclge_process_ras_hw_error(struct hnae3_ae_dev *ae_dev);
+pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev);
 #endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 431d92a..354ac5f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -7935,7 +7935,7 @@ static const struct hnae3_ae_ops hclge_ops = {
.restore_fd_rules = hclge_restore_fd_entries,
.enable_fd = hclge_enable_fd,
.dbg_run_cmd = hclge_dbg_run_cmd,
-   .process_hw_error = hclge_process_ras_hw_error,
+   .handle_hw_ras_error = hclge_handle_hw_ras_error,
.get_hw_reset_stat = hclge_get_hw_reset_stat,
.ae_dev_resetting = hclge_ae_dev_resetting,
.ae_dev_reset_cnt = hclge_ae_dev_reset_cnt,
-- 
2.7.4




[PATCH net-next 08/14] net: hns3: deleted logging 1 bit errors

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch deletes the logging of 1 bit errors for the following reasons.
1. AER does not notify the device drivers of 1 bit errors.
   However, AER reports 1 bit errors to userspace through
   trace_aer_event for logging in the rasdaemon.
2. Firmware clears the status of 1 bit errors in the hw registers.

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 37 --
 1 file changed, 37 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 22e7c5b..7371ae4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -4,42 +4,26 @@
 #include "hclge_err.h"
 
 static const struct hclge_hw_error hclge_imp_tcm_ecc_int[] = {
-   { .int_msk = BIT(0), .msg = "imp_itcm0_ecc_1bit_err" },
{ .int_msk = BIT(1), .msg = "imp_itcm0_ecc_mbit_err" },
-   { .int_msk = BIT(2), .msg = "imp_itcm1_ecc_1bit_err" },
{ .int_msk = BIT(3), .msg = "imp_itcm1_ecc_mbit_err" },
-   { .int_msk = BIT(4), .msg = "imp_itcm2_ecc_1bit_err" },
{ .int_msk = BIT(5), .msg = "imp_itcm2_ecc_mbit_err" },
-   { .int_msk = BIT(6), .msg = "imp_itcm3_ecc_1bit_err" },
{ .int_msk = BIT(7), .msg = "imp_itcm3_ecc_mbit_err" },
-   { .int_msk = BIT(8), .msg = "imp_dtcm0_mem0_ecc_1bit_err" },
{ .int_msk = BIT(9), .msg = "imp_dtcm0_mem0_ecc_mbit_err" },
-   { .int_msk = BIT(10), .msg = "imp_dtcm0_mem1_ecc_1bit_err" },
{ .int_msk = BIT(11), .msg = "imp_dtcm0_mem1_ecc_mbit_err" },
-   { .int_msk = BIT(12), .msg = "imp_dtcm1_mem0_ecc_1bit_err" },
{ .int_msk = BIT(13), .msg = "imp_dtcm1_mem0_ecc_mbit_err" },
-   { .int_msk = BIT(14), .msg = "imp_dtcm1_mem1_ecc_1bit_err" },
{ .int_msk = BIT(15), .msg = "imp_dtcm1_mem1_ecc_mbit_err" },
{ .int_msk = BIT(17), .msg = "imp_itcm4_ecc_mbit_err" },
{ /* sentinel */ }
 };
 
 static const struct hclge_hw_error hclge_cmdq_nic_mem_ecc_int[] = {
-   { .int_msk = BIT(0), .msg = "cmdq_nic_rx_depth_ecc_1bit_err" },
{ .int_msk = BIT(1), .msg = "cmdq_nic_rx_depth_ecc_mbit_err" },
-   { .int_msk = BIT(2), .msg = "cmdq_nic_tx_depth_ecc_1bit_err" },
{ .int_msk = BIT(3), .msg = "cmdq_nic_tx_depth_ecc_mbit_err" },
-   { .int_msk = BIT(4), .msg = "cmdq_nic_rx_tail_ecc_1bit_err" },
{ .int_msk = BIT(5), .msg = "cmdq_nic_rx_tail_ecc_mbit_err" },
-   { .int_msk = BIT(6), .msg = "cmdq_nic_tx_tail_ecc_1bit_err" },
{ .int_msk = BIT(7), .msg = "cmdq_nic_tx_tail_ecc_mbit_err" },
-   { .int_msk = BIT(8), .msg = "cmdq_nic_rx_head_ecc_1bit_err" },
{ .int_msk = BIT(9), .msg = "cmdq_nic_rx_head_ecc_mbit_err" },
-   { .int_msk = BIT(10), .msg = "cmdq_nic_tx_head_ecc_1bit_err" },
{ .int_msk = BIT(11), .msg = "cmdq_nic_tx_head_ecc_mbit_err" },
-   { .int_msk = BIT(12), .msg = "cmdq_nic_rx_addr_ecc_1bit_err" },
{ .int_msk = BIT(13), .msg = "cmdq_nic_rx_addr_ecc_mbit_err" },
-   { .int_msk = BIT(14), .msg = "cmdq_nic_tx_addr_ecc_1bit_err" },
{ .int_msk = BIT(15), .msg = "cmdq_nic_tx_addr_ecc_mbit_err" },
{ .int_msk = BIT(17), .msg = "cmdq_rocee_rx_depth_ecc_mbit_err" },
{ .int_msk = BIT(19), .msg = "cmdq_rocee_tx_depth_ecc_mbit_err" },
@@ -53,12 +37,6 @@ static const struct hclge_hw_error hclge_cmdq_nic_mem_ecc_int[] = {
 };
 
 static const struct hclge_hw_error hclge_tqp_int_ecc_int[] = {
-   { .int_msk = BIT(0), .msg = "tqp_int_cfg_even_ecc_1bit_err" },
-   { .int_msk = BIT(1), .msg = "tqp_int_cfg_odd_ecc_1bit_err" },
-   { .int_msk = BIT(2), .msg = "tqp_int_ctrl_even_ecc_1bit_err" },
-   { .int_msk = BIT(3), .msg = "tqp_int_ctrl_odd_ecc_1bit_err" },
-   { .int_msk = BIT(4), .msg = "tx_que_scan_int_ecc_1bit_err" },
-   { .int_msk = BIT(5), .msg = "rx_que_scan_int_ecc_1bit_err" },
{ .int_msk = BIT(6), .msg = "tqp_int_cfg_even_ecc_mbit_err" },
{ .int_msk = BIT(7), .msg = "tqp_int_cfg_odd_ecc_mbit_err" },
{ .int_msk = BIT(8), .msg = "tqp_int_ctrl_even_ecc_mbit_err" },
@@ -76,9 +54,7 @@ static const struct hclge_hw_error hclge_msix_sram_ecc_int[] = {
 
 static const struct hclge_hw_error hclge_igu_int[] = {
{ .int_msk = BIT(0), .msg = "igu_rx_buf0_ecc_mbit_err" },
-   { .int_msk = BIT(1), .msg = "igu_rx_buf0_ecc_1bit_err" },
{ .int_msk = BIT(2), .msg = "igu_rx_buf1_ecc_mbit_err" },
-   { .int_msk = BIT(3), .msg = "igu_rx_buf1_ecc_1bit_err" },
{ /* sentinel */ }
 };
 
@@ -93,7 +69,6 @@ static const struct hclge_hw_error hclge_igu_egu_tnl_int[] = {
 };
 
 static const struct hclge_hw_error hclge_ncsi_err_int[] = {
-   { .int_msk = BIT(0), .msg = "ncsi_tx_ecc_1bit_err" },
{ .int_msk = BIT(1), .msg = "ncsi_tx_ecc_mbit_err" },
{ /* sentinel */ }
 };
@@ -154,7 +129,6 @@ static const struct hcl

[PATCH net-next 11/14] net: hns3: handle hw errors of PPP PF

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch handles PF hw errors of the PPP (Programmable Packet Processor).

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 20f8bb5..e82ef4f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -112,8 +112,8 @@ static const struct hclge_hw_error hclge_ppp_mpf_abnormal_int_st1[] = {
{ /* sentinel */ }
 };
 
-static const struct hclge_hw_error hclge_ppp_pf_int[] = {
-   { .int_msk = BIT(0), .msg = "Tx_vlan_tag_err" },
+static const struct hclge_hw_error hclge_ppp_pf_abnormal_int[] = {
+   { .int_msk = BIT(0), .msg = "tx_vlan_tag_err" },
{ .int_msk = BIT(1), .msg = "rss_list_tc_unassigned_queue_err" },
{ /* sentinel */ }
 };
@@ -385,12 +385,16 @@ static int hclge_config_ppp_error_interrupt(struct hclge_dev *hdev, u32 cmd,
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT0_EN);
desc[0].data[1] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT1_EN);
+   desc[0].data[4] = cpu_to_le32(HCLGE_PPP_PF_ERR_INT_EN);
}
 
desc[1].data[0] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT0_EN_MASK);
desc[1].data[1] =
cpu_to_le32(HCLGE_PPP_MPF_ECC_ERR_INT1_EN_MASK);
+   if (hdev->pdev->revision >= 0x21)
+   desc[1].data[2] =
+   cpu_to_le32(HCLGE_PPP_PF_ERR_INT_EN_MASK);
} else if (cmd == HCLGE_PPP_CMD1_INT_CMD) {
if (en) {
desc[0].data[0] =
@@ -850,6 +854,13 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
goto msi_error;
}
 
+   /* read and log PPP PF errors */
+   desc_data = (__le32 *)&desc[2];
+   status = le32_to_cpu(*desc_data);
+   if (status)
+   hclge_log_error(dev, "PPP_PF_ABNORMAL_INT_ST0",
+   &hclge_ppp_pf_abnormal_int[0], status);
+
/* clear all PF MSIx errors */
hclge_cmd_reuse_desc(&desc[0], false);
desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-- 
2.7.4
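
For readers following the table changes in this series: each hclge_hw_error
array is a list of { mask, message } pairs closed by an empty sentinel, and
the logging helper walks it until .msg is NULL. Below is a minimal user-space
sketch of that pattern, assuming the helper reports every entry whose mask
bit is set in the status word. The names hw_error, demo_ppp_pf_int and
log_errors are made up for illustration and are not driver symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1U << (n))

/* Illustrative stand-in for struct hclge_hw_error (hypothetical name). */
struct hw_error {
	uint32_t int_msk;
	const char *msg;
};

/* Same two entries as the PPP PF table in the patch; the empty entry ends it. */
static const struct hw_error demo_ppp_pf_int[] = {
	{ .int_msk = BIT(0), .msg = "tx_vlan_tag_err" },
	{ .int_msk = BIT(1), .msg = "rss_list_tc_unassigned_queue_err" },
	{ 0 } /* sentinel: msg is NULL */
};

/* Print every entry whose bit is set in status and return the match count. */
static int log_errors(const char *reg, const struct hw_error *err,
		      uint32_t status)
{
	int hits = 0;

	assert(reg && err);
	while (err->msg) {
		if (err->int_msk & status) {
			printf("%s %s found [error status=0x%x]\n",
			       reg, err->msg, (unsigned int)status);
			hits++;
		}
		err++;
	}
	return hits;
}
```

With this shape, dropping the 1-bit entries from a table (as patch 10/14
does) simply means those status bits no longer match any row and are not
logged.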




[PATCH net-next 13/14] net: hns3: handle hw errors of SSU

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch enables and handles hw errors of the Storage Switch Unit (SSU).

Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   2 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 187 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  14 ++
 3 files changed, 203 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 0223e83..eb91519 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -217,6 +217,8 @@ enum hclge_opcode_type {
/* Error INT commands */
HCLGE_MAC_COMMON_INT_EN = 0x030E,
HCLGE_TM_SCH_ECC_INT_EN = 0x0829,
+   HCLGE_SSU_ECC_INT_CMD   = 0x0989,
+   HCLGE_SSU_COMMON_INT_CMD= 0x098C,
HCLGE_PPU_MPF_ECC_INT_CMD   = 0x0B40,
HCLGE_PPU_MPF_OTHER_INT_CMD = 0x0B41,
HCLGE_PPU_PF_OTHER_INT_CMD  = 0x0B42,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 00086ce..660320d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -263,6 +263,80 @@ static const struct hclge_hw_error hclge_ppu_pf_abnormal_int[] = {
{ /* sentinel */ }
 };
 
+static const struct hclge_hw_error hclge_ssu_com_err_int[] = {
+   { .int_msk = BIT(0), .msg = "buf_sum_err" },
+   { .int_msk = BIT(1), .msg = "ppp_mb_num_err" },
+   { .int_msk = BIT(2), .msg = "ppp_mbid_err" },
+   { .int_msk = BIT(3), .msg = "ppp_rlt_mac_err" },
+   { .int_msk = BIT(4), .msg = "ppp_rlt_host_err" },
+   { .int_msk = BIT(5), .msg = "cks_edit_position_err" },
+   { .int_msk = BIT(6), .msg = "cks_edit_condition_err" },
+   { .int_msk = BIT(7), .msg = "vlan_edit_condition_err" },
+   { .int_msk = BIT(8), .msg = "vlan_num_ot_err" },
+   { .int_msk = BIT(9), .msg = "vlan_num_in_err" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_ssu_port_based_err_int[] = {
+   { .int_msk = BIT(0), .msg = "roc_pkt_without_key_port" },
+   { .int_msk = BIT(1), .msg = "tpu_pkt_without_key_port" },
+   { .int_msk = BIT(2), .msg = "igu_pkt_without_key_port" },
+   { .int_msk = BIT(3), .msg = "roc_eof_mis_match_port" },
+   { .int_msk = BIT(4), .msg = "tpu_eof_mis_match_port" },
+   { .int_msk = BIT(5), .msg = "igu_eof_mis_match_port" },
+   { .int_msk = BIT(6), .msg = "roc_sof_mis_match_port" },
+   { .int_msk = BIT(7), .msg = "tpu_sof_mis_match_port" },
+   { .int_msk = BIT(8), .msg = "igu_sof_mis_match_port" },
+   { .int_msk = BIT(11), .msg = "ets_rd_int_rx_port" },
+   { .int_msk = BIT(12), .msg = "ets_wr_int_rx_port" },
+   { .int_msk = BIT(13), .msg = "ets_rd_int_tx_port" },
+   { .int_msk = BIT(14), .msg = "ets_wr_int_tx_port" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_ssu_fifo_overflow_int[] = {
+   { .int_msk = BIT(0), .msg = "ig_mac_inf_int" },
+   { .int_msk = BIT(1), .msg = "ig_host_inf_int" },
+   { .int_msk = BIT(2), .msg = "ig_roc_buf_int" },
+   { .int_msk = BIT(3), .msg = "ig_host_data_fifo_int" },
+   { .int_msk = BIT(4), .msg = "ig_host_key_fifo_int" },
+   { .int_msk = BIT(5), .msg = "tx_qcn_fifo_int" },
+   { .int_msk = BIT(6), .msg = "rx_qcn_fifo_int" },
+   { .int_msk = BIT(7), .msg = "tx_pf_rd_fifo_int" },
+   { .int_msk = BIT(8), .msg = "rx_pf_rd_fifo_int" },
+   { .int_msk = BIT(9), .msg = "qm_eof_fifo_int" },
+   { .int_msk = BIT(10), .msg = "mb_rlt_fifo_int" },
+   { .int_msk = BIT(11), .msg = "dup_uncopy_fifo_int" },
+   { .int_msk = BIT(12), .msg = "dup_cnt_rd_fifo_int" },
+   { .int_msk = BIT(13), .msg = "dup_cnt_drop_fifo_int" },
+   { .int_msk = BIT(14), .msg = "dup_cnt_wrb_fifo_int" },
+   { .int_msk = BIT(15), .msg = "host_cmd_fifo_int" },
+   { .int_msk = BIT(16), .msg = "mac_cmd_fifo_int" },
+   { .int_msk = BIT(17), .msg = "host_cmd_bitmap_empty_int" },
+   { .int_msk = BIT(18), .msg = "mac_cmd_bitmap_empty_int" },
+   { .int_msk = BIT(19), .msg = "dup_bitmap_empty_int" },
+   { .int_msk = BIT(20), .msg = "out_queue_bitmap_empty_int" },
+   { .int_msk = BIT(21), .msg = "bank2_bitmap_empty_int" },
+   { .int_msk = BIT(22), .msg = "bank1_bitmap_empty_int" },
+   { .int_msk = BIT(23), .msg = "bank0_bitmap_empty_int" },
+   { /* sentinel */ }
+};
+
+static const struct hclge_hw_error hclge_ssu_ets_tcg_int[] = {
+   { .int_msk = BIT(0), .msg = "ets_rd_int_rx_tcg" },
+   { .int_msk = BIT(1), .msg = "ets_wr_int_rx_tcg" },
+   { .int_msk = BIT(2), .msg = "ets_rd_int_tx_tcg" },
+   { .int_msk = BIT(3), .msg = "ets_wr_int_tx_tcg" },
+   { /* se

[PATCH net-next 14/14] net: hns3: add handling of RDMA RAS errors

2018-12-07 Thread Salil Mehta
From: Shiju Jose 

This patch handles the RDMA RAS errors.
1. Enable RAS interrupt, print error detail info and clear error status.
2. Do a CORE reset to recover when these non-fatal errors happen.

Signed-off-by: Xiaofei Tan 
Signed-off-by: Shiju Jose 
Signed-off-by: Salil Mehta 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   3 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 185 -
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h |  12 ++
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index eb91519..b1ee6fe 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -229,6 +229,9 @@ enum hclge_opcode_type {
HCLGE_QUERY_MSIX_INT_STS_BD_NUM = 0x1513,
HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT  = 0x1514,
HCLGE_QUERY_CLEAR_ALL_PF_MSIX_INT   = 0x1515,
+   HCLGE_CONFIG_ROCEE_RAS_INT_EN   = 0x1580,
+   HCLGE_QUERY_CLEAR_ROCEE_RAS_INT = 0x1581,
+   HCLGE_ROCEE_PF_RAS_INT_CMD  = 0x1584,
HCLGE_IGU_EGU_TNL_INT_EN= 0x1803,
HCLGE_IGU_COMMON_INT_EN = 0x1806,
HCLGE_TM_QCN_MEM_INT_CFG= 0x1A14,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 660320d..2b52a51 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -337,6 +337,30 @@ static const struct hclge_hw_error hclge_ssu_port_based_pf_int[] = {
{ /* sentinel */ }
 };
 
+static const struct hclge_hw_error hclge_rocee_qmm_ovf_err_int[] = {
+   { .int_msk = 0, .msg = "rocee qmm ovf: sgid invalid err" },
+   { .int_msk = 0x4, .msg = "rocee qmm ovf: sgid ovf err" },
+   { .int_msk = 0x8, .msg = "rocee qmm ovf: smac invalid err" },
+   { .int_msk = 0xC, .msg = "rocee qmm ovf: smac ovf err" },
+   { .int_msk = 0x10, .msg = "rocee qmm ovf: cqc invalid err" },
+   { .int_msk = 0x11, .msg = "rocee qmm ovf: cqc ovf err" },
+   { .int_msk = 0x12, .msg = "rocee qmm ovf: cqc hopnum err" },
+   { .int_msk = 0x13, .msg = "rocee qmm ovf: cqc ba0 err" },
+   { .int_msk = 0x14, .msg = "rocee qmm ovf: srqc invalid err" },
+   { .int_msk = 0x15, .msg = "rocee qmm ovf: srqc ovf err" },
+   { .int_msk = 0x16, .msg = "rocee qmm ovf: srqc hopnum err" },
+   { .int_msk = 0x17, .msg = "rocee qmm ovf: srqc ba0 err" },
+   { .int_msk = 0x18, .msg = "rocee qmm ovf: mpt invalid err" },
+   { .int_msk = 0x19, .msg = "rocee qmm ovf: mpt ovf err" },
+   { .int_msk = 0x1A, .msg = "rocee qmm ovf: mpt hopnum err" },
+   { .int_msk = 0x1B, .msg = "rocee qmm ovf: mpt ba0 err" },
+   { .int_msk = 0x1C, .msg = "rocee qmm ovf: qpc invalid err" },
+   { .int_msk = 0x1D, .msg = "rocee qmm ovf: qpc ovf err" },
+   { .int_msk = 0x1E, .msg = "rocee qmm ovf: qpc hopnum err" },
+   { .int_msk = 0x1F, .msg = "rocee qmm ovf: qpc ba0 err" },
+   { /* sentinel */ }
+};
+
 static void hclge_log_error(struct device *dev, char *reg,
const struct hclge_hw_error *err,
u32 err_sts)
@@ -1023,6 +1047,148 @@ static int hclge_handle_all_ras_errors(struct hclge_dev *hdev)
return ret;
 }
 
+static int hclge_log_rocee_ovf_error(struct hclge_dev *hdev)
+{
+   struct device *dev = &hdev->pdev->dev;
+   struct hclge_desc desc[2];
+   int ret;
+
+   /* read overflow error status */
+   ret = hclge_cmd_query_error(hdev, &desc[0],
+   HCLGE_ROCEE_PF_RAS_INT_CMD,
+   0, 0, 0);
+   if (ret) {
+   dev_err(dev, "failed(%d) to query ROCEE OVF error sts\n", ret);
+   return ret;
+   }
+
+   /* log overflow error */
+   if (le32_to_cpu(desc[0].data[0]) & HCLGE_ROCEE_OVF_ERR_INT_MASK) {
+   const struct hclge_hw_error *err;
+   u32 err_sts;
+
+   err = &hclge_rocee_qmm_ovf_err_int[0];
+   err_sts = HCLGE_ROCEE_OVF_ERR_TYPE_MASK &
+ le32_to_cpu(desc[0].data[0]);
+   while (err->msg) {
+   if (err->int_msk == err_sts) {
+   dev_warn(dev, "%s [error status=0x%x] found\n",
+err->msg,
+le32_to_cpu(desc[0].data[0]));
+   break;
+   }
+   err++;
+   }
+   }
+
+   if (le32_to_cpu(desc[0].data[1]) & HCLGE_ROCEE_OVF_ERR_INT_MASK) {
+   dev_warn(dev, "ROCEE TSP OVF [error status=0x%x] found\n",
+le32_to_cpu(desc[0].data[1]));
+   }
+
+   if (le32_to_cpu(desc

Re: [PATCH net-next v2 0/4] net: mitigate retpoline overhead

2018-12-07 Thread David Miller
From: Paolo Abeni 
Date: Fri, 07 Dec 2018 21:29:20 +0100

> Are you building with CONFIG_IPV6=m ?

I always build allmodconfig


Re: [PATCH net-next v2 00/12] mlxsw: Un/offload FDB on NVE detach/attach

2018-12-07 Thread David Miller
From: Ido Schimmel 
Date: Fri, 7 Dec 2018 19:55:01 +

> Petr says:
> 
> When a VXLAN device is attached to a bridge of a driver capable of
> offloading such, or upped, the FDB entries already present at the device
> need to be offloaded. Similarly when an offloaded VXLAN device ceases
> being interesting (it is downed, or detached, or a front-panel port
> netdevice is detached from the bridge that the VXLAN device is attached
> to), any offloaded FDB entries need to be unoffloaded and unmarked. This
> attach / detach processing is implemented in this patchset.
> 
> In patch #1, a code pattern is extracted into a named function for
> easier reuse.
> 
> In patch #2, vxlan_fdb_replay() is added to send
> SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE for each FDB entry with a given VNI.
> The intention is that the offloading driver will interpret these events
> like any other and thus offload the FDB entries that existed prior to
> VXLAN attach.
> 
> In patches #3 and #4, the functions vxlan_fdb_clear_offload() resp.
> br_fdb_clear_offload() are added. These clear the offloaded flag at
> matching FDB entries.
> 
> In patches #5-#9, we introduce FID-type-specific and NVE-type-specific
> ops necessary to properly abstract invocations of the replay/clear
> functions.
> 
> Finally patch #10 implements the FDB management.
> 
> In patch #11, the mlxsw-specific test case is extended to check that the
> management of offload marks under the newly-supported situations is
> correct. Patch #12, from Ido, exercises the new code paths in actual
> functional test.
 ...

Series applied, thanks.


Re: [PATCH iproute2-next 0/2] devlink: Add support for 'fw_load_policy' generic parameter

2018-12-07 Thread David Ahern
On 12/4/18 3:14 AM, Shalom Toledo wrote:
> Patch #1 add string to uint conversion support for generic parameters.
> Patch #2 add string to uint support for 'fw_load_policy' generic parameter
> 
> Shalom Toledo (2):
>   devlink: Add string to uint{8,16,32} conversion for generic parameters
>   devlink: Add support for 'fw_load_policy' generic parameter
> 
>  devlink/devlink.c| 156 ---
>  include/uapi/linux/devlink.h |   5 ++
>  2 files changed, 151 insertions(+), 10 deletions(-)
> 

applied to iproute2-next. Thanks


Re: [PATCH 1/5] net: dsa: ksz: Add MIB counter reading support

2018-12-07 Thread David Miller


Every patch series should have a header posting with Subject of
the form "[PATCH 0/N] ..." explaining what the series does at
a high level, how it does it, and why it does it that way.


Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren
* Russell King - ARM Linux  [181207 19:27]:
> You mentioned that edge mode didn't work as well as level mode on
> duovero smsc controller, I think this may help to solve the same
> issue but for edge IRQs - we need a mask_ack_irq function to avoid
> acking while the edge interrupt is masked.  Let me know if that
> lowers the smsc ping latency while in edge mode.

Looks like smsc edge interrupt is still producing varying
ping latencies with this. Seems like the mask_ack_irq is
a nice improvement though.

Regards,

Tony


Re: [PATCH v2 net-next 0/4] net: aquantia: add RSS configuration

2018-12-07 Thread David Miller
From: Igor Russkikh 
Date: Fri, 7 Dec 2018 14:00:09 +

> In this patchset few bugs related to RSS are fixed and RSS table and
> hash key configuration is added.
> 
> We also do increase max number of HW rings upto 8.
> 
> v2: removed extra arg check

Series applied.


Re: [PATCH net-next v2 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-07 Thread Paolo Abeni
On Fri, 2018-12-07 at 09:46 +, David Woodhouse wrote:
> On Wed, 2018-12-05 at 19:13 +0100, Paolo Abeni wrote:
> > +/*
> > + * We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is
> > + * builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6
> > + * alternatives
> > + */
> > +#if IS_BUILTIN(CONFIG_IPV6)
> > +#define INDIRECT_CALL_INET(f, f2, f1, ...) \
> > +   INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
> > +#elif IS_ENABLED(CONFIG_INET)
> > +#define INDIRECT_CALL_INET(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> > +#else
> > +#define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__)
> > +#endif
> > +
> > +#endif
> 
> Thanks for working on this.
> 
> I'm not stunningly keen on the part cited above. And it doesn't seem to
> be working either, given Dave's later error and reversion.

My bad, I did not test vs a relevant cfg. Hopefully that can be fixed.

> I wonder if we can declare the common case functions as 'weak' so that
> the link failures don't happen when they're absent.

I experimented with aliases in a previous version. I avoided weak aliases
because I [mis?]understood that not all compilers fully support them
(e.g. clang).
Also, with a weak ref, a coding error that is now discovered at build
time would instead result in worse performance at runtime, likely with some
uncommon configuration, and possibly not as easily detected. I'm unsure
that would be better.

> Once we extend this past the network code, especially to file systems'
> f_ops, I suspect we're going to want to use something like static keys
> to patch the common cases at runtime — perhaps changing the f_ops
> default according to what the root file system is, etc.

I'm sorry, I don't follow here. I think static keys can't be used for the
reported network case: we have different list elements, each containing a
different function pointer, and we access/use a different ptr on a per
packet basis.

> I'd quite like to see the API for this taking that into account even if
> it's left to be a future development.

Again, I'm lost here. Can you please give more hints?

Thanks,

Paolo
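
For anyone skimming the thread, the wrappers under discussion turn an
indirect call into a pointer compare followed by a direct call to the
expected target, falling back to the indirect call otherwise. A minimal
user-space sketch follows, assuming a GCC/Clang-style __builtin_expect; the
handlers v4_receive, v6_receive, other_receive and the dispatch() helper are
hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>

/* Assumption: GCC/Clang builtin; the kernel provides its own likely(). */
#define likely(x) __builtin_expect(!!(x), 1)

/* If f is the expected target, call it directly; otherwise go indirect. */
#define INDIRECT_CALL_1(f, f1, ...) \
	(likely((f) == (f1)) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

/* Same idea with two candidates, tried in order f2 then f1. */
#define INDIRECT_CALL_2(f, f2, f1, ...) \
	(likely((f) == (f2)) ? f2(__VA_ARGS__) : \
			       INDIRECT_CALL_1(f, f1, __VA_ARGS__))

typedef int (*rx_fn)(int);

/* Hypothetical handlers that return distinguishable values. */
static int v4_receive(int len) { return len + 4; }
static int v6_receive(int len) { return len + 6; }
static int other_receive(int len) { return len; }

/* Dispatch through the wrapper; the result equals a plain fn(len) call. */
static int dispatch(rx_fn fn, int len)
{
	assert(fn);
	return INDIRECT_CALL_2(fn, v6_receive, v4_receive, len);
}
```

Note that f1 and f2 appear as direct calls in the expansion, so both symbols
have to resolve when the calling object is linked. That is the constraint
behind the IS_BUILTIN(CONFIG_IPV6) check in the quoted macro.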



Re: [PATCH] net/mlx4_core: Correctly set PFC param if global pause is turned off.

2018-12-07 Thread David Miller
From: Tarick Bedeir 
Date: Fri,  7 Dec 2018 00:30:26 -0800

> rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1.
> 
> Fixes: 6e8814ceb7e8 ("net/mlx4_en: Fix mixed PFC and Global pause user control requests")
> Signed-off-by: Tarick Bedeir 

Tariq and co., please review.


Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Florian Fainelli
On 12/7/18 9:26 AM, Andrew Lunn wrote:
>> Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?
> 
> I think it makes sense to allow a fixed phy carrier to be changed from
> user space. However, i don't think you can easily plumb that to
> .ndo_change_carrier(), since that is a MAC feature. You need to change
> the fixed_phy_status to indicate the PHY has lost link, and then let
> the usual mechanisms tell the MAC it is down and change the carrier
> status.

Joakim, I still don't understand what did not work with:

- adding an ndo_change_carrier() interface which keeps a boolean flag
recording whether the link is up or not

- registering a fixed_link_update callback for your fixed PHY, which just
propagates that flag back to the fixed PHY

and that should take care of having the carrier go down, as driven by
the PHY state machine, for that fixed device.
-- 
Florian
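
The two steps above can be sketched in user space as follows. This is only
an illustration of the data flow, assuming the carrier hook does nothing but
record the request and the link-update callback copies it into the status
the PHY state machine reads. All names (demo_priv, demo_fixed_status,
demo_change_carrier, demo_fixed_link_update) are hypothetical and are not
kernel structures or signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; not the kernel's structures. */
struct demo_fixed_status {
	int link;
};

struct demo_priv {
	bool carrier_up;
};

/* Plays the role of the carrier-change hook: only record the request. */
static int demo_change_carrier(struct demo_priv *priv, bool new_carrier)
{
	assert(priv);
	priv->carrier_up = new_carrier;
	return 0;
}

/* Plays the role of the fixed-link update callback: copy the recorded
 * flag into the status that the PHY state machine would poll.
 */
static int demo_fixed_link_update(const struct demo_priv *priv,
				  struct demo_fixed_status *status)
{
	assert(priv && status);
	status->link = priv->carrier_up ? 1 : 0;
	return 0;
}
```

The point of the split is that the MAC driver never toggles the carrier
itself; the link state only changes once the callback has been polled.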


Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-07 Thread Jason Gunthorpe
On Fri, Dec 07, 2018 at 08:04:26AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 08:27:06PM -0700, Jason Gunthorpe wrote:
> > On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > From Danit,
> > >
> > > Packet based credit mode is an alternative end-to-end credit mode for QPs
> > > set during their creation. Credits are transported from the responder
> > > to the requester to optimize the use of its receive resources.
> > > In packet-based credit mode, credits are issued on a per packet basis.
> > >
> > > The advantage of this feature comes while sending large RDMA messages
> > > through switches that are short in memory.
> > >
> > > The first commit exposes QP creation flag and the HCA capability. The
> > > second commit adds support for a new DV QP creation flag. The last
> > > commit report packet based credit mode capability via the MLX5DV device
> > > capabilities.
> > >
> > > Thanks
> > >
> > > Danit Goldberg (3):
> > >   net/mlx5: Expose packet based credit mode
> > >   IB/mlx5: Add packet based credit mode support
> > >   IB/mlx5: Report packet based credit mode device capability
> >
> > This looks fine to me, can you update the shared branch please
> 
> Done, thanks
> 3fd3c80acc17 net/mlx5: Expose packet based credit mode

Applied to for-next

Thanks,
Jason


Re: [PATCH net-next v2 0/4] net: mitigate retpoline overhead

2018-12-07 Thread Paolo Abeni
Hi,

On Thu, 2018-12-06 at 22:28 -0800, David Miller wrote:
> From: David Miller 
> Date: Thu, 06 Dec 2018 22:24:09 -0800 (PST)
> 
> > Series applied, thanks!
> 
> Erm... actually reverted.  Please fix these build failures:

oops ...
I'm sorry for the late reply. I'm travelling and I will not be able to
re-post soon.

> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_receive':
> ip6_offload.c:(.text+0xda2): undefined reference to `udp6_gro_receive'
> ld: ip6_offload.c:(.text+0xdb6): undefined reference to `udp6_gro_receive'
> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_complete':
> ip6_offload.c:(.text+0x1953): undefined reference to `udp6_gro_complete'
> ld: ip6_offload.c:(.text+0x1966): undefined reference to `udp6_gro_complete'
> make: *** [Makefile:1036: vmlinux] Error 1

Are you building with CONFIG_IPV6=m ? I tested vs some common cfg, but
I omitted that in my last iteration (my bad). With such a conf, ip6
offloads are builtin while udp6 offloads end up in the ipv6 module, so
I can't use them with the given conf.

I'll try to fix the above in v3.

I'm sorry for this mess,

Paolo



[PATCH v2 net-next] neighbor: Improve garbage collection

2018-12-07 Thread David Ahern
From: David Ahern 

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table, evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

   -----|---------x|----------|-----
       t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is overkill and contributes to thrashing,
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
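
The eviction rules above can be sketched in user space as follows. This is a
simplified model, not the code in net/core/neighbour.c: it assumes a singly
linked FIFO instead of list_head, whole-second timestamps, no locking or
refcounts, and that PERMANENT entries are simply never passed to
table_add(). The names entry, table, table_add and forced_gc are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	long updated;		/* time of last update, in seconds */
	int evicted;		/* set once gc has removed the entry */
	struct entry *next;
};

struct table {
	struct entry *head, *tail;	/* gc list: oldest at the head */
	int gc_entries;			/* count of entries on the gc list */
	int gc_thresh2;
};

/* New collectable entries go to the tail of the gc list. */
static void table_add(struct table *t, struct entry *e)
{
	assert(t && e);
	e->next = NULL;
	e->evicted = 0;
	if (t->tail)
		t->tail->next = e;
	else
		t->head = e;
	t->tail = e;
	t->gc_entries++;
}

/* Walk from the head, evict entries last updated more than 5 seconds
 * ago, and stop as soon as gc_entries drops below gc_thresh2.
 * Returns the number of entries evicted.
 */
static int forced_gc(struct table *t, long now)
{
	struct entry **pp = &t->head;
	struct entry *prev = NULL;
	int shrunk = 0;

	assert(t);
	while (*pp && t->gc_entries >= t->gc_thresh2) {
		struct entry *e = *pp;

		if (now - e->updated > 5) {
			*pp = e->next;		/* unlink e */
			if (t->tail == e)
				t->tail = prev;
			e->next = NULL;
			e->evicted = 1;
			t->gc_entries--;
			shrunk++;
		} else {
			prev = e;		/* too recent: keep it */
			pp = &e->next;
		}
	}
	return shrunk;
}
```

In this model a recently updated entry is skipped rather than evicted, and
the walk ends early once the count is back under the threshold, so a burst
of new entries does not flush the whole table.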

Signed-off-by: David Ahern 
---
v2
- remove on_gc_list boolean in favor of !list_empty
- fix neigh_alloc to add new entry to tail of list_head

 Documentation/networking/ip-sysctl.txt |   4 +-
 include/net/neighbour.h|   3 +
 net/core/neighbour.c   | 119 +++--
 3 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af2a69439b93..acdfb5d2bcaa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -108,8 +108,8 @@ neigh/default/gc_thresh2 - INTEGER
Default: 512
 
 neigh/default/gc_thresh3 - INTEGER
-   Maximum number of neighbor entries allowed.  Increase this
-   when using large numbers of interfaces and when communicating
+   Maximum number of non-PERMANENT neighbor entries allowed.  Increase
+   this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
Default: 1024
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..6c13072910ab 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -154,6 +154,7 @@ struct neighbour {
struct hh_cache hh;
int (*output)(struct neighbour *, struct sk_buff *);
const struct neigh_ops  *ops;
+   struct list_headgc_list;
struct rcu_head rcu;
struct net_device   *dev;
u8  primary_key[0];
@@ -214,6 +215,8 @@ struct neigh_table {
struct timer_list   proxy_timer;
struct sk_buff_head proxy_queue;
atomic_tentries;
+   atomic_tgc_entries;
+   struct list_headgc_list;
rwlock_tlock;
unsigned long   last_rand;
struct neigh_statistics __percpu *stats;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6d479b5562be..c3b58712e98b 100644
--- a/net/core/neighbour.c
+++ b/

Re: [PATCH net] ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output

2018-12-07 Thread David Miller
From: Shmulik Ladkani 
Date: Fri,  7 Dec 2018 09:50:17 +0200

> In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
> initialization.
> 
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
> Signed-off-by: Shmulik Ladkani 

Applied and queued up for -stable, thanks.


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Andrew Lunn
> I think if you do this without setting offload_fwd_mark you will
> receive duplicate frame.

I don't think it will, at least not in the normal case. The hardware
should know the egress port, so there is no need to forward a copy to
the CPU. The only time it should forward to the CPU is when the egress
port is not known, so it floods. Without offload_fwd_mark set, the SW
bridge will flood it back out the ports causing duplication. But that
is not too bad. The Marvell driver did this for a while and nothing
bad was reported.

Andrew


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut
On 12/07/2018 08:37 PM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = &dev->ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, &data);
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always setup the membership to
> disable forwarding between ports.
> 
> When lan devices are setup they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not
> to forward frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC
> patches I submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive
> duplicate frame.

Do you have a git tree with all the KSZ patches based on -next
somewhere, so I don't have to look for them in random MLs ?

-- 
Best regards,
Marek Vasut
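
The duplicate-frame concern raised in the quoted reply can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions: struct port, struct frame, hw_domain and both function names
are hypothetical stand-ins, not the bridge or ksz driver code. It shows
why a frame that hardware has already forwarded must carry a mark, or the
software bridge sends it again to ports on the same switch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a bridge port and a frame. */
struct port {
	int hw_domain;		/* ports of one switch share a domain id */
};

struct frame {
	bool offload_fwd_mark;	/* set when hardware already forwarded it */
	int ingress_hw_domain;	/* domain of the port the frame arrived on */
};

/* Should the software bridge transmit this frame on egress port p? */
bool sw_bridge_allowed_egress(const struct port *p, const struct frame *f)
{
	return !f->offload_fwd_mark || f->ingress_hw_domain != p->hw_domain;
}

/* Count the copies the software bridge would send to the given ports. */
size_t sw_bridge_copies(const struct port *ports, size_t n,
			const struct frame *f)
{
	size_t copies = 0;

	for (size_t i = 0; i < n; i++)
		if (sw_bridge_allowed_egress(&ports[i], f))
			copies++;
	return copies;
}
```

In this model an unmarked frame is sent in software to every port, on top
of whatever the switch already forwarded, while a marked frame is only
sent to ports outside the ingress switch.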


[PATCH net-next v2 06/12] mlxsw: spectrum_switchdev: Publish mlxsw_sp_switchdev_notifier

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

The notifier block will need to be passed to vxlan_fdb_replay() in a
follow-up patch.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 00d783136d11..6b0dc40fa213 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -387,6 +387,7 @@ int mlxsw_sp_bridge_vxlan_join(struct mlxsw_sp *mlxsw_sp,
   struct netlink_ext_ack *extack);
 void mlxsw_sp_bridge_vxlan_leave(struct mlxsw_sp *mlxsw_sp,
 const struct net_device *vxlan_dev);
+extern struct notifier_block mlxsw_sp_switchdev_notifier;
 
 /* spectrum.c */
 int mlxsw_sp_port_ets_set(struct mlxsw_sp_port *mlxsw_sp_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7c38231bbd89..402f652cbf1b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -3186,7 +3186,7 @@ static int mlxsw_sp_switchdev_event(struct notifier_block *unused,
return NOTIFY_BAD;
 }
 
-static struct notifier_block mlxsw_sp_switchdev_notifier = {
+struct notifier_block mlxsw_sp_switchdev_notifier = {
.notifier_call = mlxsw_sp_switchdev_event,
 };
 
-- 
2.19.1



[PATCH net-next v2 11/12] selftests: mlxsw: vxlan: Test FDB un/marking on VXLAN join/leave

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

When a VXLAN device is attached to an offloaded bridge, or when a
front-panel port is attached to a bridge that already has a VXLAN
device, mlxsw should offload the existing offloadable FDB entries.
Similarly, when a VXLAN device is downed, the FDB entries are unoffloaded,
and the marks thus need to be cleared. The offload marks likewise need
updating when a front-panel port device is attached to a bridge with a
VXLAN device, or when VLAN flags are tweaked on a VXLAN port attached to
a VLAN-aware bridge.

Test that the replaying / clearing logic works by observing transitions
in presence of offload marks under different scenarios.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../selftests/drivers/net/mlxsw/vxlan.sh  | 177 ++
 1 file changed, 177 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/vxlan.sh b/tools/testing/selftests/drivers/net/mlxsw/vxlan.sh
index 90b4998a3b70..ea11535f5a6e 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/vxlan.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/vxlan.sh
@@ -647,12 +647,159 @@ offload_indication_decap_route_test()
noudpcsum ttl 20 tos inherit local 198.51.100.1 dstport 4789
 }
 
+check_fdb_offloaded()
+{
+   local mac=00:11:22:33:44:55
+   local zmac=00:00:00:00:00:00
+
+   bridge fdb show dev vxlan0 | grep $mac | grep self | grep -q offload
+   check_err $?
+   bridge fdb show dev vxlan0 | grep $mac | grep master | grep -q offload
+   check_err $?
+
+   bridge fdb show dev vxlan0 | grep $zmac | grep self | grep -q offload
+   check_err $?
+}
+
+check_vxlan_fdb_not_offloaded()
+{
+   local mac=00:11:22:33:44:55
+   local zmac=00:00:00:00:00:00
+
+   bridge fdb show dev vxlan0 | grep $mac | grep -q self
+   check_err $?
+   bridge fdb show dev vxlan0 | grep $mac | grep self | grep -q offload
+   check_fail $?
+
+   bridge fdb show dev vxlan0 | grep $zmac | grep -q self
+   check_err $?
+   bridge fdb show dev vxlan0 | grep $zmac | grep self | grep -q offload
+   check_fail $?
+}
+
+check_bridge_fdb_not_offloaded()
+{
+   local mac=00:11:22:33:44:55
+   local zmac=00:00:00:00:00:00
+
+   bridge fdb show dev vxlan0 | grep $mac | grep -q master
+   check_err $?
+   bridge fdb show dev vxlan0 | grep $mac | grep master | grep -q offload
+   check_fail $?
+}
+
+__offload_indication_join_vxlan_first()
+{
+   local vid=$1; shift
+
+   local mac=00:11:22:33:44:55
+   local zmac=00:00:00:00:00:00
+
+   bridge fdb append $zmac dev vxlan0 self dst 198.51.100.2
+
+   ip link set dev vxlan0 master br0
+   bridge fdb add dev vxlan0 $mac self master static dst 198.51.100.2
+
+   RET=0
+   check_vxlan_fdb_not_offloaded
+   ip link set dev $swp1 master br0
+   sleep .1
+   check_fdb_offloaded
+   log_test "offload indication - attach vxlan first"
+
+   RET=0
+   ip link set dev vxlan0 down
+   check_vxlan_fdb_not_offloaded
+   check_bridge_fdb_not_offloaded
+   log_test "offload indication - set vxlan down"
+
+   RET=0
+   ip link set dev vxlan0 up
+   sleep .1
+   check_fdb_offloaded
+   log_test "offload indication - set vxlan up"
+
+   if [[ ! -z $vid ]]; then
+   RET=0
+   bridge vlan del dev vxlan0 vid $vid
+   check_vxlan_fdb_not_offloaded
+   check_bridge_fdb_not_offloaded
+   log_test "offload indication - delete VLAN"
+
+   RET=0
+   bridge vlan add dev vxlan0 vid $vid
+   check_vxlan_fdb_not_offloaded
+   check_bridge_fdb_not_offloaded
+   log_test "offload indication - add tagged VLAN"
+
+   RET=0
+   bridge vlan add dev vxlan0 vid $vid pvid untagged
+   sleep .1
+   check_fdb_offloaded
+   log_test "offload indication - add pvid/untagged VLAN"
+   fi
+
+   RET=0
+   ip link set dev $swp1 nomaster
+   check_vxlan_fdb_not_offloaded
+   log_test "offload indication - detach port"
+}
+
+offload_indication_join_vxlan_first()
+{
+   ip link add dev br0 up type bridge mcast_snooping 0
+   ip link add name vxlan0 up type vxlan id 10 nolearning noudpcsum \
+   ttl 20 tos inherit local 198.51.100.1 dstport 4789
+
+   __offload_indication_join_vxlan_first
+
+   ip link del dev vxlan0
+   ip link del dev br0
+}
+
+__offload_indication_join_vxlan_last()
+{
+   local zmac=00:00:00:00:00:00
+
+   RET=0
+
+   bridge fdb append $zmac dev vxlan0 self dst 198.51.100.2
+
+   ip link set dev $swp1 master br0
+
+   bridge fdb show dev vxlan0 | grep $zmac | grep self | grep -q offload
+   check_fail $?
+
+   ip link set dev vxlan0 master br0
+
+   bridge fdb show dev vxlan0 | grep $zmac | grep self | grep -q offload
+   check_err $?
+
+   log_test "offload indication - a

[PATCH net-next v2 12/12] selftests: forwarding: Add PVID test case for VXLAN with VLAN-aware bridges

2018-12-07 Thread Ido Schimmel
When using VLAN-aware bridges with VXLAN, the VLAN that is mapped to the
VNI of the VXLAN device is that which is configured as "pvid untagged"
on the corresponding bridge port.

When these flags are toggled or when the VLAN is deleted entirely,
remote hosts should not be able to receive packets from the VTEP.

Add a test case for the above-mentioned scenarios.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../net/forwarding/vxlan_bridge_1q.sh | 70 +++
 1 file changed, 70 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q.sh
index bac2e568d22c..a5789721ba92 100755
--- a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q.sh
+++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q.sh
@@ -95,6 +95,7 @@ export VXPORT
test_flood
test_unicast
test_learning
+   test_pvid
 "}
 
 NUM_NETIFS=6
@@ -610,6 +611,75 @@ test_unicast()
done
 }
 
+test_pvid()
+{
+   local -a expects=(0 0 0 0 0)
+   local mac=de:ad:be:ef:13:37
+   local dst=192.0.2.100
+   local vid=10
+
+   # Check that flooding works
+   RET=0
+
+   expects[0]=10; expects[1]=10; expects[3]=10
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   log_test "VXLAN: flood before pvid off"
+
+   # Toggle PVID off and test that flood to remote hosts does not work
+   RET=0
+
+   bridge vlan add vid 10 dev vx10
+
+   expects[0]=10; expects[1]=0; expects[3]=0
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   log_test "VXLAN: flood after pvid off"
+
+   # Toggle PVID on and test that flood to remote hosts does work
+   RET=0
+
+   bridge vlan add vid 10 dev vx10 pvid untagged
+
+   expects[0]=10; expects[1]=10; expects[3]=10
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   log_test "VXLAN: flood after pvid on"
+
+   # Add a new VLAN and test that it does not affect flooding
+   RET=0
+
+   bridge vlan add vid 30 dev vx10
+
+   expects[0]=10; expects[1]=10; expects[3]=10
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   bridge vlan del vid 30 dev vx10
+
+   log_test "VXLAN: flood after vlan add"
+
+   # Remove currently mapped VLAN and test that flood to remote hosts does
+   # not work
+   RET=0
+
+   bridge vlan del vid 10 dev vx10
+
+   expects[0]=10; expects[1]=0; expects[3]=0
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   log_test "VXLAN: flood after vlan delete"
+
+   # Re-add the VLAN and test that flood to remote hosts does work
+   RET=0
+
+   bridge vlan add vid 10 dev vx10 pvid untagged
+
+   expects[0]=10; expects[1]=10; expects[3]=10
+   vxlan_flood_test $mac $dst $vid "${expects[@]}"
+
+   log_test "VXLAN: flood after vlan re-add"
+}
+
 vxlan_ping_test()
 {
local ping_dev=$1; shift
-- 
2.19.1



[PATCH net-next v2 08/12] mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_clear_offload

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

If there are any offloaded FDB entries at an NVE device at the time that
it's un-offloaded, their offloaded marks need to be cleared. How that is
done depends on the NVE device type, so add a per-NVE-type operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index e2f945543433..02937ea95bc3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -42,6 +42,7 @@ struct mlxsw_sp_nve_ops {
const struct mlxsw_sp_nve_config *config);
void (*fini)(struct mlxsw_sp_nve *nve);
int (*fdb_replay)(const struct net_device *nve_dev, __be32 vni);
+   void (*fdb_clear_offload)(const struct net_device *nve_dev, __be32 vni);
 };
 
 extern const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 1651c912ef77..74e564c4ac19 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -219,6 +219,14 @@ mlxsw_sp_nve_vxlan_fdb_replay(const struct net_device *nve_dev, __be32 vni)
return vxlan_fdb_replay(nve_dev, vni, &mlxsw_sp_switchdev_notifier);
 }
 
+static void
+mlxsw_sp_nve_vxlan_clear_offload(const struct net_device *nve_dev, __be32 vni)
+{
+   if (WARN_ON(!netif_is_vxlan(nve_dev)))
+   return;
+   vxlan_fdb_clear_offload(nve_dev, vni);
+}
+
 const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
.type   = MLXSW_SP_NVE_TYPE_VXLAN,
.can_offload= mlxsw_sp1_nve_vxlan_can_offload,
@@ -226,6 +234,7 @@ const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
.init   = mlxsw_sp1_nve_vxlan_init,
.fini   = mlxsw_sp1_nve_vxlan_fini,
.fdb_replay = mlxsw_sp_nve_vxlan_fdb_replay,
+   .fdb_clear_offload = mlxsw_sp_nve_vxlan_clear_offload,
 };
 
 static bool mlxsw_sp2_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
@@ -252,4 +261,5 @@ const struct mlxsw_sp_nve_ops mlxsw_sp2_nve_vxlan_ops = {
.init   = mlxsw_sp2_nve_vxlan_init,
.fini   = mlxsw_sp2_nve_vxlan_fini,
.fdb_replay = mlxsw_sp_nve_vxlan_fdb_replay,
+   .fdb_clear_offload = mlxsw_sp_nve_vxlan_clear_offload,
 };
-- 
2.19.1



[PATCH net-next v2 07/12] mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_replay

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

A replay of the FDB needs to be performed so that the FDB entries existing
at the NVE device are offloaded. How the replay is done depends on the NVE
device type, so add a per-NVE-type operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index 4cc3297e13d6..e2f945543433 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -41,6 +41,7 @@ struct mlxsw_sp_nve_ops {
int (*init)(struct mlxsw_sp_nve *nve,
const struct mlxsw_sp_nve_config *config);
void (*fini)(struct mlxsw_sp_nve *nve);
+   int (*fdb_replay)(const struct net_device *nve_dev, __be32 vni);
 };
 
 extern const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 4e9cc00a88fd..1651c912ef77 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -211,12 +211,21 @@ static void mlxsw_sp1_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
 config->udp_dport);
 }
 
+static int
+mlxsw_sp_nve_vxlan_fdb_replay(const struct net_device *nve_dev, __be32 vni)
+{
+   if (WARN_ON(!netif_is_vxlan(nve_dev)))
+   return -EINVAL;
+   return vxlan_fdb_replay(nve_dev, vni, &mlxsw_sp_switchdev_notifier);
+}
+
 const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
.type   = MLXSW_SP_NVE_TYPE_VXLAN,
.can_offload= mlxsw_sp1_nve_vxlan_can_offload,
.nve_config = mlxsw_sp_nve_vxlan_config,
.init   = mlxsw_sp1_nve_vxlan_init,
.fini   = mlxsw_sp1_nve_vxlan_fini,
+   .fdb_replay = mlxsw_sp_nve_vxlan_fdb_replay,
 };
 
 static bool mlxsw_sp2_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
@@ -242,4 +251,5 @@ const struct mlxsw_sp_nve_ops mlxsw_sp2_nve_vxlan_ops = {
.nve_config = mlxsw_sp_nve_vxlan_config,
.init   = mlxsw_sp2_nve_vxlan_init,
.fini   = mlxsw_sp2_nve_vxlan_fini,
+   .fdb_replay = mlxsw_sp_nve_vxlan_fdb_replay,
 };
-- 
2.19.1



[PATCH net-next v2 10/12] mlxsw: spectrum_nve: Un/offload FDB on nve_fid_disable/enable

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

Any existing NVE FDB entries need to be offloaded when NVE is enabled
for a given FID. Recent patches have added the fdb_replay op for this, so
just invoke it from mlxsw_sp_nve_fid_enable().

When NVE is disabled on a FID, any existing FDB offload marks need to
be cleared on the NVE device as well as on its bridge master. An op to
handle this, fdb_clear_offload, has been added to FID ops and NVE ops in
previous patches. Add code to resolve the NVE device and NVE type, and
to dispatch to both fdb_clear_offload ops.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../ethernet/mellanox/mlxsw/spectrum_nve.c| 41 +++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index e8e4cb6dfd38..9a86a7cde3b9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -789,6 +789,21 @@ static void mlxsw_sp_nve_fdb_flush_by_fid(struct mlxsw_sp *mlxsw_sp,
mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfdf), sfdf_pl);
 }
 
+static void mlxsw_sp_nve_fdb_clear_offload(struct mlxsw_sp *mlxsw_sp,
+  const struct mlxsw_sp_fid *fid,
+  const struct net_device *nve_dev,
+  __be32 vni)
+{
+   const struct mlxsw_sp_nve_ops *ops;
+   enum mlxsw_sp_nve_type type;
+
+   if (WARN_ON(mlxsw_sp_fid_nve_type(fid, &type)))
+   return;
+
+   ops = mlxsw_sp->nve->nve_ops_arr[type];
+   ops->fdb_clear_offload(nve_dev, vni);
+}
+
 int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
struct mlxsw_sp_nve_params *params,
struct netlink_ext_ack *extack)
@@ -826,8 +841,16 @@ int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
 
nve->config = config;
 
+   err = ops->fdb_replay(params->dev, params->vni);
+   if (err) {
+   NL_SET_ERR_MSG_MOD(extack, "Failed to offload the FDB");
+   goto err_fdb_replay;
+   }
+
return 0;
 
+err_fdb_replay:
+   mlxsw_sp_fid_vni_clear(fid);
 err_fid_vni_set:
mlxsw_sp_nve_tunnel_fini(mlxsw_sp);
return err;
@@ -837,9 +860,27 @@ void mlxsw_sp_nve_fid_disable(struct mlxsw_sp *mlxsw_sp,
  struct mlxsw_sp_fid *fid)
 {
u16 fid_index = mlxsw_sp_fid_index(fid);
+   struct net_device *nve_dev;
+   int nve_ifindex;
+   __be32 vni;
 
mlxsw_sp_nve_flood_ip_flush(mlxsw_sp, fid);
mlxsw_sp_nve_fdb_flush_by_fid(mlxsw_sp, fid_index);
+
+   if (WARN_ON(mlxsw_sp_fid_nve_ifindex(fid, &nve_ifindex) ||
+   mlxsw_sp_fid_vni(fid, &vni)))
+   goto out;
+
+   nve_dev = dev_get_by_index(&init_net, nve_ifindex);
+   if (!nve_dev)
+   goto out;
+
+   mlxsw_sp_nve_fdb_clear_offload(mlxsw_sp, fid, nve_dev, vni);
+   mlxsw_sp_fid_fdb_clear_offload(fid, nve_dev);
+
+   dev_put(nve_dev);
+
+out:
mlxsw_sp_fid_vni_clear(fid);
mlxsw_sp_nve_tunnel_fini(mlxsw_sp);
 }
-- 
2.19.1
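
The error handling added to mlxsw_sp_nve_fid_enable() in the patch above
follows the usual goto-unwind pattern: each label undoes the steps that
succeeded before the failure, in reverse order. A minimal userspace sketch
of that shape follows. It assumes a hypothetical struct nve_state with
failure-injection flags, and the helper names are illustrative stand-ins
rather than the driver's functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state; the fail_* flags exist only to inject errors. */
struct nve_state {
	bool tunnel_up;
	bool vni_set;
	bool fail_vni_set;
	bool fail_replay;
};

static int tunnel_init(struct nve_state *s)
{
	s->tunnel_up = true;
	return 0;
}

static void tunnel_fini(struct nve_state *s)
{
	s->tunnel_up = false;
}

static int fid_vni_set(struct nve_state *s)
{
	if (s->fail_vni_set)
		return -EINVAL;
	s->vni_set = true;
	return 0;
}

static void fid_vni_clear(struct nve_state *s)
{
	s->vni_set = false;
}

static int fdb_replay_step(struct nve_state *s)
{
	return s->fail_replay ? -ENOMEM : 0;
}

int nve_fid_enable(struct nve_state *s)
{
	int err;

	err = tunnel_init(s);
	if (err)
		return err;

	err = fid_vni_set(s);
	if (err)
		goto err_fid_vni_set;

	/* The newly added last step: replay existing FDB entries. */
	err = fdb_replay_step(s);
	if (err)
		goto err_fdb_replay;

	return 0;

err_fdb_replay:
	fid_vni_clear(s);	/* undo fid_vni_set() */
err_fid_vni_set:
	tunnel_fini(s);		/* undo tunnel_init() */
	return err;
}
```

A failed replay therefore leaves neither the VNI mapping nor the tunnel
behind, which is why the patch adds a new label above the existing one
instead of reusing it.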



[PATCH net-next v2 09/12] mlxsw: spectrum: Add mlxsw_sp_fid_ops.fdb_clear_offload

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

If there are any offloaded FDB entries at the bridge master of an NVE
device at the time that it's un-offloaded, their offloaded marks need to be
cleared. How that is done depends on whether the bridge in question is
VLAN-aware. Therefore add a per-FID-type operation.

Implement the operation for the 802.1q and 802.1d bridges.

Add and publish a function mlxsw_sp_fid_fdb_clear_offload() to dispatch
to the new operation according to FID type.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../net/ethernet/mellanox/mlxsw/spectrum.h|  2 ++
 .../ethernet/mellanox/mlxsw/spectrum_fid.c| 28 +++
 2 files changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 6b0dc40fa213..2d8f3692a949 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -763,6 +763,8 @@ int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, enum mlxsw_sp_nve_type type,
 __be32 vni, int nve_ifindex);
 void mlxsw_sp_fid_vni_clear(struct mlxsw_sp_fid *fid);
 bool mlxsw_sp_fid_vni_is_set(const struct mlxsw_sp_fid *fid);
+void mlxsw_sp_fid_fdb_clear_offload(const struct mlxsw_sp_fid *fid,
+   const struct net_device *nve_dev);
 int mlxsw_sp_fid_flood_set(struct mlxsw_sp_fid *fid,
   enum mlxsw_sp_flood_type packet_type, u8 local_port,
   bool member);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index fe16e0be716e..7adb1494ebba 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -85,6 +85,8 @@ struct mlxsw_sp_fid_ops {
int (*nve_flood_index_set)(struct mlxsw_sp_fid *fid,
   u32 nve_flood_index);
void (*nve_flood_index_clear)(struct mlxsw_sp_fid *fid);
+   void (*fdb_clear_offload)(const struct mlxsw_sp_fid *fid,
+ const struct net_device *nve_dev);
 };
 
 struct mlxsw_sp_fid_family {
@@ -277,6 +279,16 @@ bool mlxsw_sp_fid_vni_is_set(const struct mlxsw_sp_fid *fid)
return fid->vni_valid;
 }
 
+void mlxsw_sp_fid_fdb_clear_offload(const struct mlxsw_sp_fid *fid,
+   const struct net_device *nve_dev)
+{
+   struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+   const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
+
+   if (ops->fdb_clear_offload)
+   ops->fdb_clear_offload(fid, nve_dev);
+}
+
 static const struct mlxsw_sp_flood_table *
 mlxsw_sp_fid_flood_table_lookup(const struct mlxsw_sp_fid *fid,
enum mlxsw_sp_flood_type packet_type)
@@ -766,6 +778,13 @@ static void mlxsw_sp_fid_8021d_nve_flood_index_clear(struct mlxsw_sp_fid *fid)
fid->vni_valid, 0, false);
 }
 
+static void
+mlxsw_sp_fid_8021d_fdb_clear_offload(const struct mlxsw_sp_fid *fid,
+const struct net_device *nve_dev)
+{
+   br_fdb_clear_offload(nve_dev, 0);
+}
+
 static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021d_ops = {
.setup  = mlxsw_sp_fid_8021d_setup,
.configure  = mlxsw_sp_fid_8021d_configure,
@@ -779,6 +798,7 @@ static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021d_ops = {
.vni_clear  = mlxsw_sp_fid_8021d_vni_clear,
.nve_flood_index_set= mlxsw_sp_fid_8021d_nve_flood_index_set,
.nve_flood_index_clear  = mlxsw_sp_fid_8021d_nve_flood_index_clear,
+   .fdb_clear_offload  = mlxsw_sp_fid_8021d_fdb_clear_offload,
 };
 
 static const struct mlxsw_sp_flood_table mlxsw_sp_fid_8021d_flood_tables[] = {
@@ -815,6 +835,13 @@ static const struct mlxsw_sp_fid_family mlxsw_sp_fid_8021d_family = {
.lag_vid_valid  = 1,
 };
 
+static void
+mlxsw_sp_fid_8021q_fdb_clear_offload(const struct mlxsw_sp_fid *fid,
+const struct net_device *nve_dev)
+{
+   br_fdb_clear_offload(nve_dev, mlxsw_sp_fid_8021q_vid(fid));
+}
+
 static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021q_emu_ops = {
.setup  = mlxsw_sp_fid_8021q_setup,
.configure  = mlxsw_sp_fid_8021d_configure,
@@ -828,6 +855,7 @@ static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021q_emu_ops = {
.vni_clear  = mlxsw_sp_fid_8021d_vni_clear,
.nve_flood_index_set= mlxsw_sp_fid_8021d_nve_flood_index_set,
.nve_flood_index_clear  = mlxsw_sp_fid_8021d_nve_flood_index_clear,
+   .fdb_clear_offload  = mlxsw_sp_fid_8021q_fdb_clear_offload,
 };
 
 /* There are 4K-2 emulated 802.1Q FIDs, starting right after the 802.1D FIDs */
-- 
2.19.1
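
The per-FID-type dispatch in the patch above can be sketched in userspace
C as an ops table with an optional callback. This is a sketch under
assumptions, not the driver code: struct fid, struct bridge and all
function names here are hypothetical, and the bridge merely records which
VID it was asked to clear. It mirrors two points from the patch: the
802.1D variant passes VID 0 while the 802.1Q variant passes the FID's own
VID, and FID types without the op are skipped.

```c
#include <stddef.h>
#include <stdint.h>

struct fid;

/* Hypothetical bridge stand-in that records the last clear request. */
struct bridge {
	int calls;
	uint16_t last_vid;
};

struct fid_ops {
	/* Optional: NULL for FID types that have nothing to clear. */
	void (*fdb_clear_offload)(const struct fid *fid, struct bridge *br);
};

struct fid {
	const struct fid_ops *ops;
	uint16_t vid;
};

static void bridge_clear_offload(struct bridge *br, uint16_t vid)
{
	br->calls++;
	br->last_vid = vid;
}

/* VLAN-unaware bridge: there is no VID to match, so pass 0. */
static void fid_8021d_clear(const struct fid *fid, struct bridge *br)
{
	(void)fid;
	bridge_clear_offload(br, 0);
}

/* VLAN-aware bridge: clear only entries in this FID's VLAN. */
static void fid_8021q_clear(const struct fid *fid, struct bridge *br)
{
	bridge_clear_offload(br, fid->vid);
}

const struct fid_ops fid_8021d_ops = { .fdb_clear_offload = fid_8021d_clear };
const struct fid_ops fid_8021q_ops = { .fdb_clear_offload = fid_8021q_clear };
/* A FID type that does not implement the op at all. */
const struct fid_ops fid_other_ops = { .fdb_clear_offload = NULL };

void fid_fdb_clear_offload(const struct fid *fid, struct bridge *br)
{
	const struct fid_ops *ops = fid->ops;

	if (ops->fdb_clear_offload)
		ops->fdb_clear_offload(fid, br);
}
```

Checking the pointer in the dispatcher keeps callers free of type-specific
knowledge, at the cost of silently doing nothing for types that lack the
op.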



Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn
> + dev->reset_gpio = -1;
> + reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
> +  &reset_gpio_flags);
> + if (reset_gpio >= 0) {
> + flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
> + GPIOF_ACTIVE_LOW : 0;

Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.

You also need to update the binding documentation for this new
property.

Andrew


[PATCH net-next v2 00/12] mlxsw: Un/offload FDB on NVE detach/attach

2018-12-07 Thread Ido Schimmel
Petr says:

When a VXLAN device is attached to a bridge of a driver capable of
offloading such, or upped, the FDB entries already present at the device
need to be offloaded. Similarly when an offloaded VXLAN device ceases
being interesting (it is downed, or detached, or a front-panel port
netdevice is detached from the bridge that the VXLAN device is attached
to), any offloaded FDB entries need to be unoffloaded and unmarked. This
attach / detach processing is implemented in this patchset.

In patch #1, a code pattern is extracted into a named function for
easier reuse.

In patch #2, vxlan_fdb_replay() is added to send
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE for each FDB entry with a given VNI.
The intention is that the offloading driver will interpret these events
like any other and thus offload the FDB entries that existed prior to
VXLAN attach.

In patches #3 and #4, the functions vxlan_fdb_clear_offload() and
br_fdb_clear_offload() are added, respectively. These clear the offloaded
flag at matching FDB entries.

In patches #5-#9, we introduce FID-type-specific and NVE-type-specific
ops necessary to properly abstract invocations of the replay/clear
functions.

Finally, patch #10 implements the FDB management.

In patch #11, the mlxsw-specific test case is extended to check that the
management of offload marks under the newly-supported situations is
correct. Patch #12, from Ido, exercises the new code paths in an actual
functional test.

v2:
- Patch #1:
    - Modify vxlan_fdb_switchdev_notifier_info() to initialize the
      structure through a passed-in pointer argument, instead of returning
      it as a value.
- Patch #2:
    - Adapt to API change in vxlan_fdb_switchdev_notifier_info()
Ido Schimmel (1):
  selftests: forwarding: Add PVID test case for VXLAN with VLAN-aware
    bridges

Petr Machata (11):
  vxlan: Add a function to init switchdev_notifier_vxlan_fdb_info
  vxlan: Add vxlan_fdb_replay()
  vxlan: Add vxlan_fdb_clear_offload()
  bridge: Add br_fdb_clear_offload()
  mlxsw: spectrum: Track NVE type at FIDs
  mlxsw: spectrum_switchdev: Publish mlxsw_sp_switchdev_notifier
  mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_replay
  mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_clear_offload
  mlxsw: spectrum: Add mlxsw_sp_fid_ops.fdb_clear_offload
  mlxsw: spectrum_nve: Un/offload FDB on nve_fid_disable/enable
  selftests: mlxsw: vxlan: Test FDB un/marking on VXLAN join/leave

 .../net/ethernet/mellanox/mlxsw/spectrum.h|  16 +-
 .../ethernet/mellanox/mlxsw/spectrum_fid.c|  44 -
 .../ethernet/mellanox/mlxsw/spectrum_nve.c|  44 -
 .../ethernet/mellanox/mlxsw/spectrum_nve.h|   2 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c   |  20 ++
 .../mellanox/mlxsw/spectrum_switchdev.c   |   2 +-
 drivers/net/vxlan.c   | 110 ---
 include/linux/if_bridge.h |   6 +
 include/net/vxlan.h   |  15 ++
 net/bridge/br_fdb.c   |  20 ++
 .../selftests/drivers/net/mlxsw/vxlan.sh  | 177 ++
 .../net/forwarding/vxlan_bridge_1q.sh |  70 +++
 12 files changed, 495 insertions(+), 31 deletions(-)

-- 
2.19.1



[PATCH net-next v2 02/12] vxlan: Add vxlan_fdb_replay()

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

When a VXLAN device becomes relevant to a driver (such as when it is
attached to an offloaded bridge), the driver will generally need to walk
the existing FDB entries and offload them.

Add a function vxlan_fdb_replay() to call a given notifier block for
each FDB entry with a given VNI.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/vxlan.c | 47 +
 include/net/vxlan.h |  9 +
 2 files changed, 56 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d3db0313c97e..d9cb0d903283 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -552,6 +552,53 @@ int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
 }
 EXPORT_SYMBOL_GPL(vxlan_fdb_find_uc);
 
+static int vxlan_fdb_notify_one(struct notifier_block *nb,
+   const struct vxlan_dev *vxlan,
+   const struct vxlan_fdb *f,
+   const struct vxlan_rdst *rdst)
+{
+   struct switchdev_notifier_vxlan_fdb_info fdb_info;
+   int rc;
+
+   vxlan_fdb_switchdev_notifier_info(vxlan, f, rdst, &fdb_info);
+   rc = nb->notifier_call(nb, SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
+  &fdb_info);
+   return notifier_to_errno(rc);
+}
+
+int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
+struct notifier_block *nb)
+{
+   struct vxlan_dev *vxlan;
+   struct vxlan_rdst *rdst;
+   struct vxlan_fdb *f;
+   unsigned int h;
+   int rc = 0;
+
+   if (!netif_is_vxlan(dev))
+   return -EINVAL;
+   vxlan = netdev_priv(dev);
+
+   spin_lock_bh(&vxlan->hash_lock);
+   for (h = 0; h < FDB_HASH_SIZE; ++h) {
+   hlist_for_each_entry(f, &vxlan->fdb_head[h], hlist) {
+   if (f->vni == vni) {
+   list_for_each_entry(rdst, &f->remotes, list) {
+   rc = vxlan_fdb_notify_one(nb, vxlan,
+ f, rdst);
+   if (rc)
+   goto out;
+   }
+   }
+   }
+   }
+
+out:
+   spin_unlock_bh(&vxlan->hash_lock);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(vxlan_fdb_replay);
+
 /* Replace destination of unicast mac */
 static int vxlan_fdb_replace(struct vxlan_fdb *f,
 union vxlan_addr *ip, __be16 port, __be32 vni,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b73c670df184..f49aa9afe598 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -427,6 +427,9 @@ struct switchdev_notifier_vxlan_fdb_info {
 #if IS_ENABLED(CONFIG_VXLAN)
 int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
  struct switchdev_notifier_vxlan_fdb_info *fdb_info);
+int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
+struct notifier_block *nb);
+
 #else
 static inline int
 vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
@@ -434,6 +437,12 @@ vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
 {
return -ENOENT;
 }
+
+static inline int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
+  struct notifier_block *nb)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 #endif
-- 
2.19.1
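
The walk performed by vxlan_fdb_replay() in the patch above can be
sketched in userspace C as follows. This is a minimal model under stated
assumptions: the table layout, the type names, HASH_SIZE and the counting
callback are all hypothetical, and the locking that the kernel function
does with hash_lock is left out. It shows the two properties the commit
message relies on: the callback runs once per remote of every entry with
the given VNI, and the walk stops at the first error.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 4	/* hypothetical bucket count for this sketch */

struct remote {
	uint32_t ip;
	struct remote *next;
};

struct fdb_entry {
	uint32_t vni;
	struct remote *remotes;
	struct fdb_entry *next;
};

struct fdb_table {
	struct fdb_entry *head[HASH_SIZE];
};

typedef int (*replay_cb)(const struct fdb_entry *f, const struct remote *r,
			 void *ctx);

/* Call cb for each remote of each entry whose VNI matches. */
int fdb_replay(const struct fdb_table *t, uint32_t vni, replay_cb cb,
	       void *ctx)
{
	for (size_t h = 0; h < HASH_SIZE; h++) {
		for (const struct fdb_entry *f = t->head[h]; f; f = f->next) {
			if (f->vni != vni)
				continue;
			for (const struct remote *r = f->remotes; r;
			     r = r->next) {
				int rc = cb(f, r, ctx);

				if (rc)
					return rc;	/* first error wins */
			}
		}
	}
	return 0;
}

/* Example callback: count notifications, fail once a limit is hit. */
struct counter {
	int seen;
	int limit;
};

int count_cb(const struct fdb_entry *f, const struct remote *r, void *ctx)
{
	struct counter *c = ctx;

	(void)f;
	(void)r;
	if (c->seen >= c->limit)
		return -1;
	c->seen++;
	return 0;
}
```

Entries that were already delivered before a failure are not rolled back
in this sketch; the caller has to clean up after a non-zero return.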



[PATCH net-next v2 03/12] vxlan: Add vxlan_fdb_clear_offload()

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notifications one by one. Add a helper that walks the
FDB table, unsetting the offload flag on RDST with a given VNI.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/vxlan.c | 22 ++
 include/net/vxlan.h |  6 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d9cb0d903283..b56ef684ecac 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -599,6 +599,28 @@ int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
 }
 EXPORT_SYMBOL_GPL(vxlan_fdb_replay);
 
+void vxlan_fdb_clear_offload(const struct net_device *dev, __be32 vni)
+{
+   struct vxlan_dev *vxlan;
+   struct vxlan_rdst *rdst;
+   struct vxlan_fdb *f;
+   unsigned int h;
+
+   if (!netif_is_vxlan(dev))
+   return;
+   vxlan = netdev_priv(dev);
+
+   spin_lock_bh(&vxlan->hash_lock);
+   for (h = 0; h < FDB_HASH_SIZE; ++h) {
+   hlist_for_each_entry(f, &vxlan->fdb_head[h], hlist)
+   if (f->vni == vni)
+   list_for_each_entry(rdst, &f->remotes, list)
+   rdst->offloaded = false;
+   }
+   spin_unlock_bh(&vxlan->hash_lock);
+}
+EXPORT_SYMBOL_GPL(vxlan_fdb_clear_offload);
+
 /* Replace destination of unicast mac */
 static int vxlan_fdb_replace(struct vxlan_fdb *f,
 union vxlan_addr *ip, __be16 port, __be32 vni,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index f49aa9afe598..236403eb5ba6 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -429,6 +429,7 @@ int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
  struct switchdev_notifier_vxlan_fdb_info *fdb_info);
 int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
 struct notifier_block *nb);
+void vxlan_fdb_clear_offload(const struct net_device *dev, __be32 vni);
 
 #else
 static inline int
@@ -443,6 +444,11 @@ static inline int vxlan_fdb_replay(const struct net_device *dev, __be32 vni,
 {
return -EOPNOTSUPP;
 }
+
+static inline void
+vxlan_fdb_clear_offload(const struct net_device *dev, __be32 vni)
+{
+}
 #endif
 
 #endif
-- 
2.19.1
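
The bulk clear described in the patch above amounts to one pass over the
table that resets a flag, with no per-entry notification. A minimal
userspace sketch, assuming hypothetical struct fdb and struct rdst types
and a fixed FDB_BUCKETS count; the returned count is an addition for
illustration only (the kernel helper returns void), and locking is
omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FDB_BUCKETS 4	/* hypothetical bucket count for this sketch */

struct rdst {
	bool offloaded;
	struct rdst *next;
};

struct fdb {
	uint32_t vni;
	struct rdst *remotes;
	struct fdb *next;
};

/*
 * Clear the offloaded flag on every remote of every entry with the given
 * VNI. Returns how many flags were actually set before the call.
 */
size_t fdb_clear_offload(struct fdb *const buckets[FDB_BUCKETS], uint32_t vni)
{
	size_t cleared = 0;

	for (size_t h = 0; h < FDB_BUCKETS; h++) {
		for (struct fdb *f = buckets[h]; f; f = f->next) {
			if (f->vni != vni)
				continue;
			for (struct rdst *r = f->remotes; r; r = r->next) {
				if (r->offloaded)
					cleared++;
				r->offloaded = false;
			}
		}
	}
	return cleared;
}
```

Calling it twice with the same VNI is harmless: the second pass finds
nothing left to clear.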



[PATCH net-next v2 05/12] mlxsw: spectrum: Track NVE type at FIDs

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

A follow-up patch will add support for replay and for clearing of
offload marks. These are NVE type-sensitive operations, and to be able
to dispatch them properly, a FID needs to know what NVE type is attached
to it.

Therefore, track the NVE type at struct mlxsw_sp_fid. Extend
mlxsw_sp_fid_vni_set() to take it as an argument, and add
mlxsw_sp_fid_nve_type().

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   | 13 -
 .../net/ethernet/mellanox/mlxsw/spectrum_fid.c   | 16 +++-
 .../net/ethernet/mellanox/mlxsw/spectrum_nve.c   |  3 ++-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a3e564e0da39..00d783136d11 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -81,6 +81,10 @@ enum mlxsw_sp_fid_type {
MLXSW_SP_FID_TYPE_MAX,
 };
 
+enum mlxsw_sp_nve_type {
+   MLXSW_SP_NVE_TYPE_VXLAN,
+};
+
 struct mlxsw_sp_mid {
struct list_head list;
unsigned char addr[ETH_ALEN];
@@ -745,6 +749,8 @@ bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
  u16 fid_index);
 int mlxsw_sp_fid_nve_ifindex(const struct mlxsw_sp_fid *fid, int *nve_ifindex);
+int mlxsw_sp_fid_nve_type(const struct mlxsw_sp_fid *fid,
+ enum mlxsw_sp_nve_type *p_type);
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_vni(struct mlxsw_sp *mlxsw_sp,
__be32 vni);
 int mlxsw_sp_fid_vni(const struct mlxsw_sp_fid *fid, __be32 *vni);
@@ -752,7 +758,8 @@ int mlxsw_sp_fid_nve_flood_index_set(struct mlxsw_sp_fid *fid,
 u32 nve_flood_index);
 void mlxsw_sp_fid_nve_flood_index_clear(struct mlxsw_sp_fid *fid);
 bool mlxsw_sp_fid_nve_flood_index_is_set(const struct mlxsw_sp_fid *fid);
-int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, __be32 vni, int nve_ifindex);
+int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, enum mlxsw_sp_nve_type type,
+__be32 vni, int nve_ifindex);
 void mlxsw_sp_fid_vni_clear(struct mlxsw_sp_fid *fid);
 bool mlxsw_sp_fid_vni_is_set(const struct mlxsw_sp_fid *fid);
 int mlxsw_sp_fid_flood_set(struct mlxsw_sp_fid *fid,
@@ -823,10 +830,6 @@ extern const struct mlxsw_sp_mr_tcam_ops mlxsw_sp1_mr_tcam_ops;
 extern const struct mlxsw_sp_mr_tcam_ops mlxsw_sp2_mr_tcam_ops;
 
 /* spectrum_nve.c */
-enum mlxsw_sp_nve_type {
-   MLXSW_SP_NVE_TYPE_VXLAN,
-};
-
 struct mlxsw_sp_nve_params {
enum mlxsw_sp_nve_type type;
__be32 vni;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index f9af68230455..fe16e0be716e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -30,6 +30,7 @@ struct mlxsw_sp_fid {
struct rhash_head ht_node;
 
struct rhash_head vni_ht_node;
+   enum mlxsw_sp_nve_type nve_type;
__be32 vni;
u32 nve_flood_index;
int nve_ifindex;
@@ -151,6 +152,17 @@ int mlxsw_sp_fid_nve_ifindex(const struct mlxsw_sp_fid *fid, int *nve_ifindex)
return 0;
 }
 
+int mlxsw_sp_fid_nve_type(const struct mlxsw_sp_fid *fid,
+ enum mlxsw_sp_nve_type *p_type)
+{
+   if (!fid->vni_valid)
+   return -EINVAL;
+
+   *p_type = fid->nve_type;
+
+   return 0;
+}
+
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_vni(struct mlxsw_sp *mlxsw_sp,
__be32 vni)
 {
@@ -211,7 +223,8 @@ bool mlxsw_sp_fid_nve_flood_index_is_set(const struct mlxsw_sp_fid *fid)
return fid->nve_flood_index_valid;
 }
 
-int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, __be32 vni, int nve_ifindex)
+int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, enum mlxsw_sp_nve_type type,
+__be32 vni, int nve_ifindex)
 {
struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
@@ -221,6 +234,7 @@ int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, __be32 vni, int nve_ifindex)
if (WARN_ON(!ops->vni_set || fid->vni_valid))
return -EINVAL;
 
+   fid->nve_type = type;
fid->nve_ifindex = nve_ifindex;
fid->vni = vni;
err = rhashtable_lookup_insert_fast(&mlxsw_sp->fid_core->vni_ht,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index c4d5a0865c8f..e8e4cb6dfd38 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -817,7 +817,8 @@ int mlxsw_sp_nve_fid_ena

[PATCH net-next v2 01/12] vxlan: Add a function to init switchdev_notifier_vxlan_fdb_info

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

There are currently two places that need to initialize the notifier info
structure, and one more is coming next when vxlan_fdb_replay() is
introduced. These three instances have / will have very similar code
that is easy to abstract away into a named function.

Add such function, vxlan_fdb_switchdev_notifier_info(), and call it from
vxlan_fdb_switchdev_call_notifiers() and vxlan_fdb_find_uc().

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/vxlan.c | 41 ++---
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 901eef428280..d3db0313c97e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -358,6 +358,22 @@ static void __vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
+static void vxlan_fdb_switchdev_notifier_info(const struct vxlan_dev *vxlan,
+   const struct vxlan_fdb *fdb,
+   const struct vxlan_rdst *rd,
+   struct switchdev_notifier_vxlan_fdb_info *fdb_info)
+{
+   fdb_info->info.dev = vxlan->dev;
+   fdb_info->remote_ip = rd->remote_ip;
+   fdb_info->remote_port = rd->remote_port;
+   fdb_info->remote_vni = rd->remote_vni;
+   fdb_info->remote_ifindex = rd->remote_ifindex;
+   memcpy(fdb_info->eth_addr, fdb->eth_addr, ETH_ALEN);
+   fdb_info->vni = fdb->vni;
+   fdb_info->offloaded = rd->offloaded;
+   fdb_info->added_by_user = fdb->flags & NTF_VXLAN_ADDED_BY_USER;
+}
+
 static void vxlan_fdb_switchdev_call_notifiers(struct vxlan_dev *vxlan,
   struct vxlan_fdb *fdb,
   struct vxlan_rdst *rd,
@@ -371,18 +387,7 @@ static void vxlan_fdb_switchdev_call_notifiers(struct vxlan_dev *vxlan,
 
notifier_type = adding ? SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE
   : SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE;
-
-   info = (struct switchdev_notifier_vxlan_fdb_info){
-   .remote_ip = rd->remote_ip,
-   .remote_port = rd->remote_port,
-   .remote_vni = rd->remote_vni,
-   .remote_ifindex = rd->remote_ifindex,
-   .vni = fdb->vni,
-   .offloaded = rd->offloaded,
-   .added_by_user = fdb->flags & NTF_VXLAN_ADDED_BY_USER,
-   };
-   memcpy(info.eth_addr, fdb->eth_addr, ETH_ALEN);
-
+   vxlan_fdb_switchdev_notifier_info(vxlan, fdb, rd, &info);
call_switchdev_notifiers(notifier_type, vxlan->dev,
 &info.info);
 }
@@ -539,17 +544,7 @@ int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
}
 
rdst = first_remote_rcu(f);
-
-   memset(fdb_info, 0, sizeof(*fdb_info));
-   fdb_info->info.dev = dev;
-   fdb_info->remote_ip = rdst->remote_ip;
-   fdb_info->remote_port = rdst->remote_port;
-   fdb_info->remote_vni = rdst->remote_vni;
-   fdb_info->remote_ifindex = rdst->remote_ifindex;
-   fdb_info->vni = vni;
-   fdb_info->offloaded = rdst->offloaded;
-   fdb_info->added_by_user = f->flags & NTF_VXLAN_ADDED_BY_USER;
-   ether_addr_copy(fdb_info->eth_addr, mac);
+   vxlan_fdb_switchdev_notifier_info(vxlan, f, rdst, fdb_info);
 
 out:
rcu_read_unlock();
-- 
2.19.1



[PATCH net-next v2 04/12] bridge: Add br_fdb_clear_offload()

2018-12-07 Thread Ido Schimmel
From: Petr Machata 

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that unsets the
offload flag on FDB entries on a given bridge port and VLAN.

Signed-off-by: Petr Machata 
Acked-by: Nikolay Aleksandrov 
Signed-off-by: Ido Schimmel 
---
 include/linux/if_bridge.h |  6 ++
 net/bridge/br_fdb.c   | 20 
 2 files changed, 26 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ef7c3d376b21..627b788ba0ff 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -119,6 +119,7 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
 struct net_device *br_fdb_find_port(const struct net_device *br_dev,
const unsigned char *addr,
__u16 vid);
+void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 #else
 static inline struct net_device *
@@ -128,6 +129,11 @@ br_fdb_find_port(const struct net_device *br_dev,
 {
return NULL;
 }
+
+static inline void br_fdb_clear_offload(const struct net_device *dev, u16 vid)
+{
+}
+
 static inline bool
 br_port_flag_is_set(const struct net_device *dev, unsigned long flag)
 {
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index e56ba3912a90..38b1d0dd0529 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1164,3 +1164,23 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 
spin_unlock_bh(&br->hash_lock);
 }
+
+void br_fdb_clear_offload(const struct net_device *dev, u16 vid)
+{
+   struct net_bridge_fdb_entry *f;
+   struct net_bridge_port *p;
+
+   ASSERT_RTNL();
+
+   p = br_port_get_rtnl(dev);
+   if (!p)
+   return;
+
+   spin_lock_bh(&p->br->hash_lock);
+   hlist_for_each_entry(f, &p->br->fdb_list, fdb_node) {
+   if (f->dst == p && f->key.vlan_id == vid)
+   f->offloaded = 0;
+   }
+   spin_unlock_bh(&p->br->hash_lock);
+}
+EXPORT_SYMBOL_GPL(br_fdb_clear_offload);
-- 
2.19.1



Re: [PATCH net-next] neighbour: Improve garbage collection

2018-12-07 Thread David Ahern
On 12/6/18 8:59 PM, David Miller wrote:
> But why do you need the on_gc_list boolean state? f

mental blockage.

v2 coming up.


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Florian Fainelli
On 12/7/18 11:37 AM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = &dev->ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, &data);
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always setup the membership to
> disable forwarding between ports.
> 
> When lan devices are setup they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not
> to forward frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC
> patches I submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive
> duplicate frame.
> 

Either I am misreading Marek's patch, or I don't quite understand your
response, but what is happening when you enslave a switch port into a
bridge is that you need to make sure that:

- the switch port being enslaved will be part of the same forwarding
group as any other switch port already in the bridge
- any existing switch port already enslaved in the bridge must now also
be allowed to forward to the port that is being enslaved

That is to me, exactly what Marek's patch is fixing, your response is
about something slightly orthogonal here.
-- 
Florian
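
The two requirements listed above can be sketched in plain C. This is only
a toy model, not the ksz9477 driver: the struct, the function names and the
5-port count are all hypothetical, the CPU port is ignored, and the
"membership register" is just a byte per port holding the mask of ports it
may forward to. The point is that a change on one port recomputes the
membership of every port, so forwarding is symmetric.

```c
#include <stdint.h>

#define TOY_NUM_PORTS 5	/* hypothetical port count, not a real chip value */

/* Toy switch state; none of these names exist in the kernel. */
struct toy_switch {
	uint8_t fwd_mask;		/* ports currently in forwarding state */
	uint8_t member[TOY_NUM_PORTS];	/* per-port mask of allowed egress ports */
};

/* Recompute the membership of every port from the forwarding mask. */
static void toy_update_members(struct toy_switch *sw)
{
	for (int i = 0; i < TOY_NUM_PORTS; i++) {
		uint8_t bit = (uint8_t)(1u << i);

		if (sw->fwd_mask & bit)
			/* A forwarding port may reach all other forwarding ports. */
			sw->member[i] = sw->fwd_mask & (uint8_t)~bit;
		else
			/* A non-forwarding port reaches nobody. */
			sw->member[i] = 0;
	}
}

/* Put a port into or out of forwarding and update all ports. */
static void toy_port_set_forwarding(struct toy_switch *sw, int port, int on)
{
	uint8_t bit = (uint8_t)(1u << port);

	if (on)
		sw->fwd_mask |= bit;
	else
		sw->fwd_mask &= (uint8_t)~bit;

	toy_update_members(sw);
}
```

With ports 0 and 2 forwarding, member[0] holds bit 2 and member[2] holds
bit 0; updating only the newly added port would have left member[0] empty,
which is the one-directional forwarding described in the patch.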


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> If two ports are in the same bridge and in forwarding state, the packets
> must be able to pass between them in both directions. The current code
> only configures this bridge membership for a port newly added to the
> bridge, but does not update all the other ports. Thus, ingress packets
> on the new port will be forwarded, but ingress packets on other ports
> destined for the new port (eg. a reply) will not be forwarded back to
> the new port, because they are not configured to do so. This patch fixes
> that by updating the membership registers of all ports.
> 
> Signed-off-by: Marek Vasut 
> Cc: Vivien Didelot 
> Cc: Woojung Huh 
> Cc: David S. Miller 
> Cc: Tristram Ha 
> ---
>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> b/drivers/net/dsa/microchip/ksz9477.c
> index 0684657fbf9a9..e24dd14ccde77 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   struct ksz_device *dev = ds->priv;
>   struct ksz_port *p = &dev->ports[port];
>   u8 data;
> - int member = -1;
> + int i, member = -1;
> 
>   ksz_pread8(dev, port, P_STP_CTRL, &data);
>   data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> PORT_LEARN_DISABLE);
> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   dev->tx_ports &= ~(1 << port);
> 
>   /* Port membership may share register with STP state. */
> - if (member >= 0 && member != p->member)
> - ksz9477_cfg_port_member(dev, port, (u8)member);
> + for (i = 0; i < SWITCH_PORT_NUM; i++)
> + ksz9477_cfg_port_member(dev, i, (u8)member);
> 
>   /* Check if forwarding needs to be updated. */
>   if (state != BR_STATE_FORWARDING) {

The original DSA model did not have a way to tell the bridge device not to
forward the frame, so the switch driver always setup the membership to
disable forwarding between ports.

When lan devices are setup they act like individual devices.  A bridge device
adding them under it will forward the frames.

The new switchdev model adds the offload_fwd_mark bit to tell the bridge not to
forward frame.

The ksz_update_port_member function in ksz_common.c is doing this membership
setup for all forwarding ports.  It was finally enabled in one of the RFC
patches I submitted recently (Add switch forward offloading support).

I think if you do this without setting offload_fwd_mark you will receive
duplicate frame.



Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Russell King - ARM Linux
On Fri, Dec 07, 2018 at 11:03:12AM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [181207 18:14]:
> > Hi,
> > 
> > * Russell King - ARM Linux  [181207 18:01]:
> > > Hi Tony,
> > > 
> > > You know most of what's been going on from IRC, but here's the patch
> > > which gets me:
> > > 
> > > 1) working interrupts for networking
> > > 2) solves the stuck-wakeup problem
> > > 
> > > It also contains some of the debug bits I added.
> > 
> > This is excellent news :) Will test today.
> 
> Yes your patch seems to work great based on brief testing :)
> 
> > > I think what this means is that we should strip out ec0daae685b2
> > > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> > 
> > Yes the only reason for the wakeup quirk was the stuck wakeup
> > state seen on omap4, it can be just dropped if this works.
> > Adding Grygorii to Cc too.
> 
> I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
> Add level wakeup handling for omap4 based SoCs") shortly.

Hi,

You mentioned that edge mode didn't work as well as level mode on the
duovero smsc controller.  I think this may help to solve the same
issue for edge IRQs: we need a mask_ack_irq function to avoid
acking while the edge interrupt is masked.  Let me know if that
lowers the smsc ping latency while in edge mode.

Thanks.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 3d021f648c5d..b1ad6098e894 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
+#define DEBUG
 #include 
 #include 
 #include 
@@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct gpio_bank *bank, int gpio,
  trigger & IRQ_TYPE_LEVEL_LOW);
omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
  trigger & IRQ_TYPE_LEVEL_HIGH);
+   /*
+* We need the edge detect enabled for the idle mode detection
+* to function on OMAP4430.
+*/
omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_RISING);
+ trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_FALLING);
+ trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
 
bank->context.leveldetect0 =
readl_relaxed(bank->base + bank->regs->leveldetect0);
@@ -899,6 +903,19 @@ static void omap_gpio_mask_irq(struct irq_data *d)
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
+static void omap_gpio_mask_ack_irq(struct irq_data *d)
+{
+   struct gpio_bank *bank = omap_irq_data_get_bank(d);
+   unsigned offset = d->hwirq;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&bank->lock, flags);
+   omap_clear_gpio_irqstatus(bank, offset);
+   omap_set_gpio_irqenable(bank, offset, 0);
+   omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+   raw_spin_unlock_irqrestore(&bank->lock, flags);
+}
+
 static void omap_gpio_unmask_irq(struct irq_data *d)
 {
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -910,14 +927,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
if (trigger)
omap_set_gpio_triggering(bank, offset, trigger);
 
+   omap_set_gpio_irqenable(bank, offset, 1);
+
/* For level-triggered GPIOs, the clearing must be done after
-* the HW source is cleared, thus after the handler has run */
-   if (bank->level_mask & BIT(offset)) {
-   omap_set_gpio_irqenable(bank, offset, 0);
+* the HW source is cleared, thus after the handler has run.
+* OMAP4 needs this done _after_ enabling the interrupt to clear
+* the wakeup status.
+*/
+   if (bank->level_mask & BIT(offset))
omap_clear_gpio_irqstatus(bank, offset);
-   }
 
-   omap_set_gpio_irqenable(bank, offset, 1);
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
@@ -1377,6 +1396,7 @@ static int omap_gpio_probe(struct platform_device *pdev)
 
irqc->irq_startup = omap_gpio_irq_startup,
irqc->irq_shutdown = omap_gpio_irq_shutdown,
+   irqc->irq_mask_ack = omap_gpio_mask_ack_irq,
irqc->irq_ack = omap_gpio_ack_irq,
irqc->irq_mask = omap_gpio_mask_irq,
irqc->irq_unmask = omap_gpio_unmask_irq,
@@ -1520,6 +1540,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool may_lose_context)
struct device *dev = bank->chip.parent;
u32 l1 = 0, l2 = 0;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
if (bank->funcs.idle_enable_level_quirk)
bank->funcs.idle_enable_level_quirk(ba

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren
* Tony Lindgren  [181207 18:14]:
> Hi,
> 
> * Russell King - ARM Linux  [181207 18:01]:
> > Hi Tony,
> > 
> > You know most of what's been going on from IRC, but here's the patch
> > which gets me:
> > 
> > 1) working interrupts for networking
> > 2) solves the stuck-wakeup problem
> > 
> > It also contains some of the debug bits I added.
> 
> This is excellent news :) Will test today.

Yes your patch seems to work great based on brief testing :)

> > I think what this means is that we should strip out ec0daae685b2
> > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> 
> Yes the only reason for the wakeup quirk was the stuck wakeup
> state seen on omap4, it can be just dropped if this works.
> Adding Grygorii to Cc too.

I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
Add level wakeup handling for omap4 based SoCs") shortly.

Thanks,

Tony


Re: [PATCH bpf-next] bpf: relax verifier restriction on BPF_MOV | BPF_ALU

2018-12-07 Thread Alexei Starovoitov
On Fri, Dec 07, 2018 at 05:19:21PM +, Jiong Wang wrote:
> On 06/12/2018 03:13, Alexei Starovoitov wrote:
> > On Wed, Dec 05, 2018 at 03:32:50PM +, Jiong Wang wrote:
> > > On 05/12/2018 14:52, Edward Cree wrote:
> > > > On 05/12/18 09:46, Jiong Wang wrote:
> > > > > There is NO processed instruction number regression, either with or
> > > > > without -mattr=+alu32.
> > > > 
> > > > > Cilium bpf
> > > > > ===
> > > > > bpf_lb-DLB_L3.o 2110/2110 1730/1733
> > > > That looks like a regression of 3 insns in the 32-bit case.
> > > > May be worth investigating why.
> > > 
> > > Will look into this.
> > > 
> > > > 
> > > > > +  dst_reg = insn->dst_reg;
> > > > > +  regs[dst_reg] = regs[src_reg];
> > > > > +  if (BPF_CLASS(insn->code) == BPF_ALU) {
> > > > > +  /* Update type and range info. */
> > > > > +  regs[dst_reg].type = SCALAR_VALUE;
> > > > > +  coerce_reg_to_size(&regs[dst_reg], 4);
> > > > Won't this break when handed a pointer (as root, so allowed to leak
> > > >   it)?  E.g. (pointer + x) gets turned into scalar x, rather than
> > > >   unknown scalar in range [0, 0x].
> > > 
> > > Initially I was gating this to scalar_value only, later was thinking it
> > > could be extended to ptr case if ptr leak is allowed.
> > > 
> > > But, your comment remind me min/max value doesn't mean real min/max value
> > > for ptr types value, it means the offset only if I am understanding the
> > > issue correctly. So, it will break pointer case.
> > 
> > > correct. In case of is_pointer_value() && root -> mark_reg_unknown() has to be called
> > 
> > The explanation of additional 3 steps from another email makes sense to me.
> > 
> > Can you take a look why it helps default (llvm alu64) case too ?
> > bpf_overlay.o   3096/2898
> 
> It is embarrassing that I am not able to reproduce this number after tried
> quite a few env configurations. I think the number must be wrong because
> llvm alu64 binary doesn't contain alu32 move so shouldn't be impacted by
> this patch even though I double checked the raw data I collected on llvm
> alu64, re-calculated the number before this patch, it is still 3096. I
> guess there must be something wrong with the binary I was loading.
> 
> I improved my benchmarking methodology to build all alu64 and alu32
> binaries first, and never change them later. Then used a script to load and
> collect the processed number. (borrowed the script from
> https://github.com/4ast/bpf_cilium_test/, only my binaries are built from
> latest Cilium repo and contains alu32 version as well)
> 
> I ran this new benchmarking env for several times, and could get the
> following new results consistently:
> 
> bpf_lb-DLB_L3.o:2085/2085 1685/1687
> bpf_lb-DLB_L4.o:2287/2287 1986/1982
> bpf_lb-DUNKNOWN.o:  690/690   622/622
> bpf_lxc.o:  95033/95033   N/A
> bpf_netdev.o:   7245/7245 N/A
> bpf_overlay.o:  2898/2898 3085/2947
> 
> No change on alu64 binary.
> 
> For alu32, bpf_overlay.o still get fewer processed instruction number, this
> is because there is the following sequence (and another similar one).
> Before this patch, r2 at insn 139 is unknown, so verifier always explore
> both path-taken and path-fall_through. After this patch, it explores
> path-fall_through only, so saved some insns.
> 
>   129: (b4) (u32) r7 = (u32) -140
>   ...
>   136: (bc) (u32) r2 = (u32) r7
>   137: (74) (u32) r2 >>= (u32) 31
>   138: (4c) (u32) r2 |= (u32) r1
>   139: (15) if r2 == 0x0 goto pc+342
>   140: (b4) (u32) r1 = (u32) 2
> 
> And a permissive register value for r2 hasn't released more path prune for
> this test, so in all, after this patch, there is fewer processed insn.
> 
> I have sent out a v2, gated this change under SCALAR_VALUE, and also
> updated the patch description.

Thanks for the update. Makes sense.
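
The arithmetic behind the quoted insns 129-139 can be checked with a small
C sketch. It models the three alu32 instructions on concrete 32-bit values
only; it does not model the verifier's range tracking, the function name is
made up for illustration, and r1 is passed in as a parameter because its
value is not shown in the listing.

```c
#include <stdint.h>

/*
 * Hypothetical model of insns 136-138 from the quoted listing,
 * operating on concrete values rather than verifier state.
 */
static uint32_t toy_alu32_seq(uint32_t r7, uint32_t r1)
{
	uint32_t r2 = r7;	/* 136: (u32) r2 = (u32) r7 */

	r2 >>= 31;		/* 137: (u32) r2 >>= (u32) 31 */
	r2 |= r1;		/* 138: (u32) r2 |= (u32) r1 */

	return r2;
}
```

With r7 = (u32) -140 = 0xffffff74 the shift leaves 1, so bit 0 of r2 is set
whatever r1 holds and the "r2 == 0x0" test at insn 139 cannot succeed. That
is consistent with only the fall-through path being explored once the 32-bit
move keeps r7's known value instead of marking r2 unknown.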



[PATCH 5/5] net: dsa: ksz: Add Microchip KSZ8795 DSA driver

2018-12-07 Thread Marek Vasut
From: Tristram Ha 

Add Microchip KSZ8795 DSA driver.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 drivers/net/dsa/microchip/Kconfig   |   17 +
 drivers/net/dsa/microchip/Makefile  |2 +
 drivers/net/dsa/microchip/ksz8795.c | 1351 +++
 drivers/net/dsa/microchip/ksz8795_reg.h | 1016 +
 drivers/net/dsa/microchip/ksz8795_spi.c |  166 +++
 drivers/net/dsa/microchip/ksz_priv.h|1 +
 6 files changed, 2553 insertions(+)
 create mode 100644 drivers/net/dsa/microchip/ksz8795.c
 create mode 100644 drivers/net/dsa/microchip/ksz8795_reg.h
 create mode 100644 drivers/net/dsa/microchip/ksz8795_spi.c

diff --git a/drivers/net/dsa/microchip/Kconfig b/drivers/net/dsa/microchip/Kconfig
index bea29fde9f3d1..d17aec084c8c7 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -14,3 +14,20 @@ config NET_DSA_MICROCHIP_KSZ9477_SPI
depends on NET_DSA_MICROCHIP_KSZ9477 && SPI
help
  Select to enable support for registering switches configured through SPI.
+
+menuconfig NET_DSA_MICROCHIP_KSZ8795
+   tristate "Microchip KSZ8795 series switch support"
+   depends on NET_DSA
+   select NET_DSA_TAG_KSZ8795
+   select NET_DSA_MICROCHIP_KSZ_COMMON
+   help
+ This driver adds support for Microchip KSZ8795 switch chips.
+
+config NET_DSA_MICROCHIP_KSZ8795_SPI
+   tristate "KSZ8795 series SPI connected switch driver"
+   depends on NET_DSA_MICROCHIP_KSZ8795 && SPI
+   help
+ This driver accesses KSZ8795 chip through SPI.
+
+ It is required to use the KSZ8795 switch driver as the only access
+ is through SPI.
diff --git a/drivers/net/dsa/microchip/Makefile b/drivers/net/dsa/microchip/Makefile
index 3142c18b8f573..18ab64172e0bb 100644
--- a/drivers/net/dsa/microchip/Makefile
+++ b/drivers/net/dsa/microchip/Makefile
@@ -1,3 +1,5 @@
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ_COMMON) += ksz_common.o
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ9477)+= ksz9477.o
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ9477_SPI)+= ksz9477_spi.o
+obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ8795)+= ksz8795.o
+obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ8795_SPI)+= ksz8795_spi.o
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
new file mode 100644
index 0..4ba31794a13a1
--- /dev/null
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -0,0 +1,1351 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Microchip KSZ8795 switch driver
+ *
+ * Copyright (C) 2017 Microchip Technology Inc.
+ * Tristram Ha 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ksz_priv.h"
+#include "ksz_common.h"
+#include "ksz8795_reg.h"
+
+static const struct {
+   char string[ETH_GSTRING_LEN];
+} mib_names[TOTAL_SWITCH_COUNTER_NUM] = {
+   { "rx_hi" },
+   { "rx_undersize" },
+   { "rx_fragments" },
+   { "rx_oversize" },
+   { "rx_jabbers" },
+   { "rx_symbol_err" },
+   { "rx_crc_err" },
+   { "rx_align_err" },
+   { "rx_mac_ctrl" },
+   { "rx_pause" },
+   { "rx_bcast" },
+   { "rx_mcast" },
+   { "rx_ucast" },
+   { "rx_64_or_less" },
+   { "rx_65_127" },
+   { "rx_128_255" },
+   { "rx_256_511" },
+   { "rx_512_1023" },
+   { "rx_1024_1522" },
+   { "rx_1523_2000" },
+   { "rx_2001" },
+   { "tx_hi" },
+   { "tx_late_col" },
+   { "tx_pause" },
+   { "tx_bcast" },
+   { "tx_mcast" },
+   { "tx_ucast" },
+   { "tx_deferred" },
+   { "tx_total_col" },
+   { "tx_exc_col" },
+   { "tx_single_col" },
+   { "tx_mult_col" },
+   { "rx_total" },
+   { "tx_total" },
+   { "rx_discards" },
+   { "tx_discards" },
+};
+
+static int ksz8795_reset_switch(struct ksz_device *dev)
+{
+   /* reset switch */
+   ksz_write8(dev, REG_POWER_MANAGEMENT_1,
+  SW_SOFTWARE_POWER_DOWN << SW_POWER_MANAGEMENT_MODE_S);
+   ksz_write8(dev, REG_POWER_MANAGEMENT_1, 0);
+
+   return 0;
+}
+
+static void ksz8795_set_prio_queue(struct ksz_device *dev, int port, int queue)
+{
+   u8 hi;
+   u8 lo;
+
+   /* Number of queues can only be 1, 2, or 4. */
+   switch (queue) {
+   case 4:
+   case 3:
+   queue = PORT_QUEUE_SPLIT_4;
+   break;
+   case 2:
+   queue = PORT_QUEUE_SPLIT_2;
+   break;
+   default:
+   queue = PORT_QUEUE_SPLIT_1;
+   }
+   ksz_pread8(dev, port, REG_PORT_CTRL_0, &lo);
+   ksz_pread8(dev, port, P_DROP_TAG_CTRL, &hi);
+   lo &= ~PORT_QUEUE_SPLIT_L;
+   if (queue & PORT_QUEUE_SPLIT_2)
+   lo |= PORT_QUEUE_SPLIT_L;
+   hi &= ~PORT_QUEUE_SPLIT_H;
+   if (queue & PORT_QUEUE_SPLIT_4)
+   hi |= PORT_QUEUE_S

[PATCH 2/5] net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477

2018-12-07 Thread Marek Vasut
From: Tristram Ha 

Rename the tag Kconfig option and related macros in preparation for
addition of new KSZ family switches with different tag formats.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 drivers/net/dsa/microchip/Kconfig   | 2 +-
 drivers/net/dsa/microchip/ksz9477.c | 2 +-
 include/net/dsa.h   | 2 +-
 net/dsa/Kconfig | 4 
 net/dsa/dsa.c   | 4 ++--
 net/dsa/dsa_priv.h  | 2 +-
 net/dsa/tag_ksz.c   | 2 +-
 7 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/microchip/Kconfig b/drivers/net/dsa/microchip/Kconfig
index a8caf9249d50f..bea29fde9f3d1 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -4,7 +4,7 @@ config NET_DSA_MICROCHIP_KSZ_COMMON
 menuconfig NET_DSA_MICROCHIP_KSZ9477
tristate "Microchip KSZ9477 series switch support"
depends on NET_DSA
-   select NET_DSA_TAG_KSZ
+   select NET_DSA_TAG_KSZ9477
select NET_DSA_MICROCHIP_KSZ_COMMON
help
  This driver adds support for Microchip KSZ9477 switch chips.
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index ace8f2e3c781d..547bfb097dee7 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -330,7 +330,7 @@ static void ksz9477_port_init_cnt(struct ksz_device *dev, int port)
 static enum dsa_tag_protocol ksz9477_get_tag_protocol(struct dsa_switch *ds,
  int port)
 {
-   return DSA_TAG_PROTO_KSZ;
+   return DSA_TAG_PROTO_KSZ9477;
 }
 
 static int ksz9477_phy_read16(struct dsa_switch *ds, int addr, int reg)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7a03274ea981b..dff6afc22ab1e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -35,7 +35,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_BRCM_PREPEND,
DSA_TAG_PROTO_DSA,
DSA_TAG_PROTO_EDSA,
-   DSA_TAG_PROTO_KSZ,
+   DSA_TAG_PROTO_KSZ9477,
DSA_TAG_PROTO_LAN9303,
DSA_TAG_PROTO_MTK,
DSA_TAG_PROTO_QCA,
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 4183e4ba27a50..8cdf73a31374e 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -41,6 +41,10 @@ config NET_DSA_TAG_EDSA
 config NET_DSA_TAG_KSZ
bool
 
+config NET_DSA_TAG_KSZ9477
+   bool
+   select NET_DSA_TAG_KSZ
+
 config NET_DSA_TAG_LAN9303
bool
 
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 9f3209ff7ffde..4d4a381367d4d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -52,8 +52,8 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_EDSA
[DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops,
 #endif
-#ifdef CONFIG_NET_DSA_TAG_KSZ
-   [DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops,
+#ifdef CONFIG_NET_DSA_TAG_KSZ9477
+   [DSA_TAG_PROTO_KSZ9477] = &ksz9477_netdev_ops,
 #endif
 #ifdef CONFIG_NET_DSA_TAG_LAN9303
[DSA_TAG_PROTO_LAN9303] = &lan9303_netdev_ops,
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 3964c6f7a7c0d..b48b533294544 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -206,7 +206,7 @@ extern const struct dsa_device_ops dsa_netdev_ops;
 extern const struct dsa_device_ops edsa_netdev_ops;
 
 /* tag_ksz.c */
-extern const struct dsa_device_ops ksz_netdev_ops;
+extern const struct dsa_device_ops ksz9477_netdev_ops;
 
 /* tag_lan9303.c */
 extern const struct dsa_device_ops lan9303_netdev_ops;
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index cad4406d9d4c2..036bc62198f28 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -96,7 +96,7 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
return skb;
 }
 
-const struct dsa_device_ops ksz_netdev_ops = {
+const struct dsa_device_ops ksz9477_netdev_ops = {
.xmit   = ksz_xmit,
.rcv= ksz_rcv,
.overhead = KSZ_INGRESS_TAG_LEN,
-- 
2.18.0


