date:20181207

Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread Ilias Apalodimas

On Fri, Dec 07, 2018 at 11:06:55PM -0800, David Miller wrote:

> This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
> because the page private is unsigned long.
> 
> Grep for PHY_ADDR_T_64BIT under arch/ to see the vast majority of the
> cases where this happens, then ARCH_DMA_ADDR_T_64BIT.

Noted, thanks for the heads up.

Thanks
/Ilias

Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread Ilias Apalodimas

On Fri, Dec 07, 2018 at 11:15:14PM -0800, David Miller wrote:
> From: Jesper Dangaard Brouer 
> Date: Fri, 07 Dec 2018 00:25:47 +0100
> 
> > @@ -744,6 +745,10 @@ struct sk_buff {
> > head_frag:1,
> > xmit_more:1,
> > pfmemalloc:1;
> > +   /* TODO: Future idea, extend mem_info with __u8 flags, and
> > +* move bits head_frag and pfmemalloc there.
> > +*/
> > +   struct xdp_mem_info mem_info;
> 
> This is 4 bytes right?

With this patchset yes

> 
> I guess I can live with this.

Great news!

> 
> Please do some microbenchmarks to make sure this doesn't show any
> obvious regressions.

Will do

Thanks
/Ilias

Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread David Miller

From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:47 +0100

> @@ -744,6 +745,10 @@ struct sk_buff {
>   head_frag:1,
>   xmit_more:1,
>   pfmemalloc:1;
> + /* TODO: Future idea, extend mem_info with __u8 flags, and
> +  * move bits head_frag and pfmemalloc there.
> +  */
> + struct xdp_mem_info mem_info;

This is 4 bytes right?

I guess I can live with this.

Please do some microbenchmarks to make sure this doesn't show any
obvious regressions.

Thanks.

Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread David Miller

From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:32 +0100

> From: Ilias Apalodimas 
> 
> Add helper functions for retreiving dma_addr_t stored in page_private and
> unmapping dma addresses, mapped via the page_pool API.
> 
> Signed-off-by: Ilias Apalodimas 
> Signed-off-by: Jesper Dangaard Brouer 

This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
because the page private is unsigned long.

Grep for PHY_ADDR_T_64BIT under arch/ to see the vast majority of the
cases where this happens, then ARCH_DMA_ADDR_T_64BIT.

Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-07 Thread David Miller

From: Benjamin Herrenschmidt 
Date: Fri, 07 Dec 2018 15:05:04 +1100

> This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9.
> 
> This commit is completely bogus. The STACR register has two formats, old
> and new, depending on the version of the IP block used. There's a pair of
> device-tree properties that can be used to specify the format used:
> 
>   has-inverted-stacr-oc
>   has-new-stacr-staopc
> 
> What this commit did was to change the bit definition used with the old
> parts to match the new parts. This of course breaks the driver on all
> the old ones.
> 
> Instead, the author should have set the appropriate properties in the
> device-tree for the variant used on his board.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Found while setting up some old ppc440 boxes for test/CI

Applied, thanks.

Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller

From: Maciej Żenczykowski 
Date: Fri, 7 Dec 2018 16:46:36 -0800

>> This doesn't apply to the current net tree.
>>
>> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>>
>> Thank you.
> 
> Interesting... this patch was on top of net-next/master, and it still
> rebases cleanly on current net-next/master.
> 
> Would you like it on net/master instead?  It indeed doesn't apply
> cleanly there...

Well, it is a bug fix isn't it?  Or is this more like a behavioral feature?

Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut

On 12/08/2018 12:46 AM, David Miller wrote:
> From: Marek Vasut 
> Date: Fri, 7 Dec 2018 23:59:58 +0100
> 
>> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
 Add code to handle optional reset GPIO in the KSZ switch driver. The switch
 has a reset GPIO line which can be controlled by the CPU, so make sure it 
 is
 configured correctly in such setups.
>>>
>>> Hi Marek
>>
>> Hi Andrew,
>>
>>> Please make this a patch series, not two individual patches.
>>
>> This actually is an individual patch, it doesn't depend on anything.
>> Or do you mean a series with the DT documentation change ?
> 
> Yes, but all of this stuff is building up for one single purpose,
> and that is to support a new mode of operation with DSA or whatever.

I'll group together the ones which make sense to group together and are
not orthogonal if that's OK with you. The reset handling really is
orthogonal from the rest and can go in independently of the rest.

> So please group them together in a series with an appropriate
> header posting.

Sure

-- 
Best regards,
Marek Vasut

Re: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Marek Vasut

On 12/08/2018 01:52 AM, tristram...@microchip.com wrote:
>> -padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
>> +padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb-
>>> len;
> 
> The requirement is the tail tag should be at the end of frame before FCS.
> When the length is less than 60 the MAC controller will pad the frame to
> legal size.  That is why this function makes sure the padding is done
> manually.  Increasing the size just increases the length of the frame and the
> chance to allocate new socket buffer.
> 
> Example of using ping size of 18 will have the sizes of request and response
> differ by 4 bytes.  Not that it matters much.
> 
> Are you concerned the MAC controller will remove the VLAN tag and so the frame
> will not be sent? Or the switch removes the VLAN tag and is not able to send?

With TI CPSW in dual-ethernet configuration, which adds internal VLAN
tag at the end of the frame, the KSZ switch fails. The CPU will send out
packets and the switch will reject them as corrupted. It needs this
extra VLAN tag padding.

-- 
Best regards,
Marek Vasut

Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut

On 12/08/2018 01:13 AM, tristram...@microchip.com wrote:
>> Do you have a git tree with all the KSZ patches based on -next
>> somewhere, so I don't have to look for them in random MLs ?
> 
> I just sent it this Monday and the subject for that patch is
> "[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."

Is all that collected in some git tree somewhere, so I don't have to
look for various patches in varying states of decay throughout the ML?

-- 
Best regards,
Marek Vasut

RE: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Tristram.Ha

> - padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
> + padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb-
> >len;

The requirement is the tail tag should be at the end of frame before FCS.
When the length is less than 60 the MAC controller will pad the frame to
legal size.  That is why this function makes sure the padding is done
manually.  Increasing the size just increases the length of the frame and the
chance to allocate new socket buffer.

Example of using ping size of 18 will have the sizes of request and response
differ by 4 bytes.  Not that it matters much.

Are you concerned the MAC controller will remove the VLAN tag and so the frame
will not be sent? Or the switch removes the VLAN tag and is not able to send?

Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread Maciej Żenczykowski

> This doesn't apply to the current net tree.
>
> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>
> Thank you.

Interesting... this patch was on top of net-next/master, and it still
rebases cleanly on current net-next/master.

Would you like it on net/master instead?  It indeed doesn't apply
cleanly there...

[PATCH bpf-next 1/7] bpf: Add bpf_line_info support

2018-12-07 Thread Martin KaFai Lau

This patch adds bpf_line_info support.

It accepts an array of bpf_line_info objects during BPF_PROG_LOAD.
The "line_info", "line_info_cnt" and "line_info_rec_size" are added
to the "union bpf_attr".  The "line_info_rec_size" makes
bpf_line_info extensible in the future.

The new "check_btf_line()" ensures the userspace line_info is valid
for the kernel to use.

When the verifier is translating/patching the bpf_prog (through
"bpf_patch_insn_single()"), the line_infos' insn_off is also
adjusted by the newly added "bpf_adj_linfo()".

If the bpf_prog is jited, this patch also provides the jited addrs (in
aux->jited_linfo) for the corresponding line_info.insn_off.
"bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo.
It is currently called by the x86 jit.  Other jits can also use
"bpf_prog_fill_jited_linfo()" and it will be done in the followup patches.
In the future, if it deemed necessary, a particular jit could also provide
its own "bpf_prog_fill_jited_linfo()" implementation.

A few "*line_info*" fields are added to the bpf_prog_info such
that the user can get the xlated line_info back (i.e. the line_info
with its insn_off reflecting the translated prog).  The jited_line_info
is available if the prog is jited.  It is an array of __u64.
If the prog is not jited, jited_line_info_cnt is 0.

The verifier's verbose log with line_info will be done in
a follow up patch.

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 arch/x86/net/bpf_jit_comp.c  |   2 +
 include/linux/bpf.h  |  21 
 include/linux/bpf_verifier.h |   1 +
 include/linux/btf.h  |   1 +
 include/linux/filter.h   |   7 ++
 include/uapi/linux/bpf.h |  19 
 kernel/bpf/btf.c |   2 +-
 kernel/bpf/core.c| 118 -
 kernel/bpf/syscall.c |  83 +--
 kernel/bpf/verifier.c| 198 ++-
 10 files changed, 419 insertions(+), 33 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 2580cd2e98b1..5542303c43d9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1181,6 +1181,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
}
 
if (!image || !prog->is_func || extra_pass) {
+   if (image)
+   bpf_prog_fill_jited_linfo(prog, addrs);
 out_addrs:
kfree(addrs);
kfree(jit_data);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..0c992b86eb2c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -319,7 +319,28 @@ struct bpf_prog_aux {
struct bpf_prog_offload *offload;
struct btf *btf;
struct bpf_func_info *func_info;
+   /* bpf_line_info loaded from userspace.  linfo->insn_off
+* has the xlated insn offset.
+* Both the main and sub prog share the same linfo.
+* The subprog can access its first linfo by
+* using the linfo_idx.
+*/
+   struct bpf_line_info *linfo;
+   /* jited_linfo is the jited addr of the linfo.  It has a
+* one to one mapping to linfo:
+* jited_linfo[i] is the jited addr for the linfo[i]->insn_off.
+* Both the main and sub prog share the same jited_linfo.
+* The subprog can access its first jited_linfo by
+* using the linfo_idx.
+*/
+   void **jited_linfo;
u32 func_info_cnt;
+   u32 nr_linfo;
+   /* subprog can use linfo_idx to access its first linfo and
+* jited_linfo.
+* main prog always has linfo_idx == 0
+*/
+   u32 linfo_idx;
union {
struct work_struct work;
struct rcu_head rcu;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 11f5df1092d9..c736945be7c5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -203,6 +203,7 @@ static inline bool bpf_verifier_log_needed(const struct 
bpf_verifier_log *log)
 
 struct bpf_subprog_info {
u32 start; /* insn idx of function entry point */
+   u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
u16 stack_depth; /* max. stack depth used by this function */
 };
 
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8c2199b5d250..b98405a56383 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -46,6 +46,7 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, 
void *obj,
   struct seq_file *m);
 int btf_get_fd_by_id(u32 id);
 u32 btf_id(const struct btf *btf);
+bool btf_name_offset_valid(const struct btf *btf, u32 offset);
 
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d16deead65c6..29f21f9d7f68 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -718,6 +718,13 @@ void bpf_prog_free(struct bpf_prog

[PATCH bpf-next 6/7] bpf: libbpf: Add btf_line_info support to libbpf

2018-12-07 Thread Martin KaFai Lau

This patch adds bpf_line_info support to libbpf:
1) Parsing the line_info sec from ".BTF.ext"
2) Relocating the line_info.  If the main prog *_info relocation
   fails, it will ignore the remaining subprog line_info and continue.
   If the subprog *_info relocation fails, it will bail out.
3) BPF_PROG_LOAD a prog with line_info

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/bpf.c|  86 +++--
 tools/lib/bpf/bpf.h|   3 +
 tools/lib/bpf/btf.c| 209 +
 tools/lib/bpf/btf.h|  10 +-
 tools/lib/bpf/libbpf.c |  20 
 5 files changed, 239 insertions(+), 89 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9fbbc0ed5952..3caaa3428774 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -173,11 +173,36 @@ int bpf_create_map_in_map(enum bpf_map_type map_type, 
const char *name,
  -1);
 }
 
+static void *
+alloc_zero_tailing_info(const void *orecord, __u32 cnt,
+   __u32 actual_rec_size, __u32 expected_rec_size)
+{
+   __u64 info_len = actual_rec_size * cnt;
+   void *info, *nrecord;
+   int i;
+
+   info = malloc(info_len);
+   if (!info)
+   return NULL;
+
+   /* zero out bytes kernel does not understand */
+   nrecord = info;
+   for (i = 0; i < cnt; i++) {
+   memcpy(nrecord, orecord, expected_rec_size);
+   memset(nrecord + expected_rec_size, 0,
+  actual_rec_size - expected_rec_size);
+   orecord += actual_rec_size;
+   nrecord += actual_rec_size;
+   }
+
+   return info;
+}
+
 int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
   char *log_buf, size_t log_buf_sz)
 {
+   void *finfo = NULL, *linfo = NULL;
union bpf_attr attr;
-   void *finfo = NULL;
__u32 name_len;
int fd;
 
@@ -201,6 +226,9 @@ int bpf_load_program_xattr(const struct 
bpf_load_program_attr *load_attr,
attr.func_info_rec_size = load_attr->func_info_rec_size;
attr.func_info_cnt = load_attr->func_info_cnt;
attr.func_info = ptr_to_u64(load_attr->func_info);
+   attr.line_info_rec_size = load_attr->line_info_rec_size;
+   attr.line_info_cnt = load_attr->line_info_cnt;
+   attr.line_info = ptr_to_u64(load_attr->line_info);
memcpy(attr.prog_name, load_attr->name,
   min(name_len, BPF_OBJ_NAME_LEN - 1));
 
@@ -212,36 +240,35 @@ int bpf_load_program_xattr(const struct 
bpf_load_program_attr *load_attr,
 * to give user space a hint how to deal with loading failure.
 * Check to see whether we can make some changes and load again.
 */
-   if (errno == E2BIG && attr.func_info_cnt &&
-   attr.func_info_rec_size < load_attr->func_info_rec_size) {
-   __u32 actual_rec_size = load_attr->func_info_rec_size;
-   __u32 expected_rec_size = attr.func_info_rec_size;
-   __u32 finfo_cnt = load_attr->func_info_cnt;
-   __u64 finfo_len = actual_rec_size * finfo_cnt;
-   const void *orecord;
-   void *nrecord;
-   int i;
-
-   finfo = malloc(finfo_len);
-   if (!finfo)
-   /* further try with log buffer won't help */
-   return fd;
-
-   /* zero out bytes kernel does not understand */
-   orecord = load_attr->func_info;
-   nrecord = finfo;
-   for (i = 0; i < load_attr->func_info_cnt; i++) {
-   memcpy(nrecord, orecord, expected_rec_size);
-   memset(nrecord + expected_rec_size, 0,
-  actual_rec_size - expected_rec_size);
-   orecord += actual_rec_size;
-   nrecord += actual_rec_size;
+   while (errno == E2BIG && (!finfo || !linfo)) {
+   if (!finfo && attr.func_info_cnt &&
+   attr.func_info_rec_size < load_attr->func_info_rec_size) {
+   /* try with corrected func info records */
+   finfo = alloc_zero_tailing_info(load_attr->func_info,
+   
load_attr->func_info_cnt,
+   
load_attr->func_info_rec_size,
+   
attr.func_info_rec_size);
+   if (!finfo)
+   goto done;
+
+   attr.func_info = ptr_to_u64(finfo);
+   attr.func_info_rec_size = load_attr->func_info_rec_size;
+   } else if (!linfo && attr.line_info_cnt &&
+  attr.line_info_rec_size <
+  load_attr->line_info_rec_size) {
+   linfo =

[PATCH bpf-next 2/7] bpf: tools: Sync uapi bpf.h

2018-12-07 Thread Martin KaFai Lau

Sync uapi bpf.h to tools/include/uapi/linux for
the new bpf_line_info.

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/include/uapi/linux/bpf.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16263e8827fc..7973c28b24a0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -356,6 +356,9 @@ union bpf_attr {
__u32   func_info_rec_size; /* userspace 
bpf_func_info size */
__aligned_u64   func_info;  /* func info */
__u32   func_info_cnt;  /* number of bpf_func_info 
records */
+   __u32   line_info_rec_size; /* userspace 
bpf_line_info size */
+   __aligned_u64   line_info;  /* line info */
+   __u32   line_info_cnt;  /* number of bpf_line_info 
records */
};
 
struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -2679,6 +2682,12 @@ struct bpf_prog_info {
__u32 func_info_rec_size;
__aligned_u64 func_info;
__u32 func_info_cnt;
+   __u32 line_info_cnt;
+   __aligned_u64 line_info;
+   __aligned_u64 jited_line_info;
+   __u32 jited_line_info_cnt;
+   __u32 line_info_rec_size;
+   __u32 jited_line_info_rec_size;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -2995,4 +3004,14 @@ struct bpf_func_info {
__u32   type_id;
 };
 
+#define BPF_LINE_INFO_LINE_NUM(line_col)   ((line_col) >> 10)
+#define BPF_LINE_INFO_LINE_COL(line_col)   ((line_col) & 0x3ff)
+
+struct bpf_line_info {
+   __u32   insn_off;
+   __u32   file_name_off;
+   __u32   line_off;
+   __u32   line_col;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.17.1

[PATCH bpf-next 3/7] bpf: Refactor and bug fix in test_func_type in test_btf.c

2018-12-07 Thread Martin KaFai Lau

1) bpf_load_program_xattr() is absorbing the EBIG error
   which makes testing this case impossible.  It is replaced
   with a direct syscall(__NR_bpf, BPF_PROG_LOAD,...).
2) The test_func_type() is renamed to test_info_raw() to
   prepare for the new line_info test in the next patch.
3) The bpf_obj_get_info_by_fd() testing for func_info
   is refactored to test_get_finfo().  A new
   test_get_linfo() will be added in the next patch
   for testing line_info purpose.
4) The test->func_info_cnt is checked instead of
   a static value "2".
5) Remove unnecessary "\n" in error message.
6) Adding back info_raw_test_num to the cmd arg such
   that a specific test case can be tested, like
   all other existing tests.

7) Fix a bug in handling expected_prog_load_failure.
   A test could pass even if prog_fd != -1 while
   expected_prog_load_failure is true.
8) The min rec_size check should be < 8 instead of < 4.

Fixes: 4798c4ba3ba9 ("tools/bpf: extends test_btf to test load/retrieve 
func_type info")
Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_btf.c | 211 +++--
 1 file changed, 125 insertions(+), 86 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c 
b/tools/testing/selftests/bpf/test_btf.c
index ff0952ea757a..8d5777c89620 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -114,12 +115,13 @@ static struct args {
unsigned int raw_test_num;
unsigned int file_test_num;
unsigned int get_info_test_num;
+   unsigned int info_raw_test_num;
bool raw_test;
bool file_test;
bool get_info_test;
bool pprint_test;
bool always_log;
-   bool func_type_test;
+   bool info_raw_test;
 } args;
 
 static char btf_log_buf[BTF_LOG_BUF_SIZE];
@@ -3051,7 +3053,7 @@ static int test_pprint(void)
return err;
 }
 
-static struct btf_func_type_test {
+static struct prog_info_raw_test {
const char *descr;
const char *str_sec;
__u32 raw_types[MAX_NR_RAW_TYPES];
@@ -3062,7 +3064,7 @@ static struct btf_func_type_test {
__u32 func_info_rec_size;
__u32 func_info_cnt;
bool expected_prog_load_failure;
-} func_type_test[] = {
+} info_raw_tests[] = {
 {
.descr = "func_type (main func + one sub)",
.raw_types = {
@@ -3198,90 +3200,44 @@ static size_t probe_prog_length(const struct bpf_insn 
*fp)
return len + 1;
 }
 
-static int do_test_func_type(int test_num)
+static int test_get_finfo(const struct prog_info_raw_test *test,
+ int prog_fd)
 {
-   const struct btf_func_type_test *test = _type_test[test_num];
-   unsigned int raw_btf_size, info_len, rec_size;
-   int i, btf_fd = -1, prog_fd = -1, err = 0;
-   struct bpf_load_program_attr attr = {};
-   void *raw_btf, *func_info = NULL;
struct bpf_prog_info info = {};
struct bpf_func_info *finfo;
-
-   fprintf(stderr, "%s..", test->descr);
-   raw_btf = btf_raw_create(_tmpl, test->raw_types,
-test->str_sec, test->str_sec_size,
-_btf_size);
-
-   if (!raw_btf)
-   return -1;
-
-   *btf_log_buf = '\0';
-   btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
- btf_log_buf, BTF_LOG_BUF_SIZE,
- args.always_log);
-   free(raw_btf);
-
-   if (CHECK(btf_fd == -1, "invalid btf_fd errno:%d", errno)) {
-   err = -1;
-   goto done;
-   }
-
-   if (*btf_log_buf && args.always_log)
-   fprintf(stderr, "\n%s", btf_log_buf);
-
-   attr.prog_type = test->prog_type;
-   attr.insns = test->insns;
-   attr.insns_cnt = probe_prog_length(attr.insns);
-   attr.license = "GPL";
-   attr.prog_btf_fd = btf_fd;
-   attr.func_info_rec_size = test->func_info_rec_size;
-   attr.func_info_cnt = test->func_info_cnt;
-   attr.func_info = test->func_info;
-
-   *btf_log_buf = '\0';
-   prog_fd = bpf_load_program_xattr(, btf_log_buf,
-BTF_LOG_BUF_SIZE);
-   if (test->expected_prog_load_failure && prog_fd == -1) {
-   err = 0;
-   goto done;
-   }
-   if (CHECK(prog_fd == -1, "invalid prog_id errno:%d", errno)) {
-   fprintf(stderr, "%s\n", btf_log_buf);
-   err = -1;
-   goto done;
-   }
+   __u32 info_len, rec_size, i;
+   void *func_info = NULL;
+   int err;
 
/* get necessary lens */
info_len = sizeof(struct bpf_prog_info);
err = bpf_obj_get_info_by_fd(prog_fd, , _len);
if (CHECK(err == -1, "invalid get info (1st) errno:%d", errno)) {
fprintf(stderr, "%s\n", btf_log_buf);
-   err =

[PATCH bpf-next 0/7] Introduce bpf_line_info

2018-12-07 Thread Martin KaFai Lau

This patch series introduces the bpf_line_info.  Please see individual patch
for details.

It will be useful for introspection purpose, like:

[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool 
prog dump jited pinned /sys/fs/bpf/test_btf_haskv
[...]
int test_long_fname_2(struct dummy_tracepoint_args * arg):
bpf_prog_44a040bf25481309_test_long_fname_2:
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x30,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   xor%esi,%esi
; int key = 0;
  27:   mov%esi,-0x4(%rbp)
; if (!arg->sock)
  2a:   mov0x8(%rdi),%rdi
; if (!arg->sock)
  2e:   cmp$0x0,%rdi
  32:   je 0x0070
  34:   mov%rbp,%rsi
; counts = bpf_map_lookup_elem(_map, );
  37:   add$0xfffc,%rsi
  3b:   movabs $0x8881139d7480,%rdi
  45:   add$0x110,%rdi
  4c:   mov0x0(%rsi),%eax
  4f:   cmp$0x4,%rax
  53:   jae0x005e
  55:   shl$0x3,%rax
  59:   add%rdi,%rax
  5c:   jmp0x0060
  5e:   xor%eax,%eax
; if (!counts)
  60:   cmp$0x0,%rax
  64:   je 0x0070
; counts->v6++;
  66:   mov0x4(%rax),%edi
  69:   add$0x1,%rdi
  6d:   mov%edi,0x4(%rax)
  70:   mov0x0(%rbp),%rbx
  74:   mov0x8(%rbp),%r13
  78:   mov0x10(%rbp),%r14
  7c:   mov0x18(%rbp),%r15
  80:   add$0x28,%rbp
  84:   leaveq
  85:   retq
[...]

Martin KaFai Lau (7):
  bpf: Add bpf_line_info support
  bpf: tools: Sync uapi bpf.h
  bpf: Refactor and bug fix in test_func_type in test_btf.c
  bpf: Add unit tests for bpf_line_info
  bpf: libbpf: Refactor and bug fix on the bpf_func_info loading logic
  bpf: libbpf: Add btf_line_info support to libbpf
  bpf: libbpf: bpftool: Print bpf_line_info during prog dump

 arch/x86/net/bpf_jit_comp.c   |   2 +
 include/linux/bpf.h   |  21 +
 include/linux/bpf_verifier.h  |   1 +
 include/linux/btf.h   |   1 +
 include/linux/filter.h|   7 +
 include/uapi/linux/bpf.h  |  19 +
 kernel/bpf/btf.c  |   2 +-
 kernel/bpf/core.c | 118 ++-
 kernel/bpf/syscall.c  |  83 +-
 kernel/bpf/verifier.c | 198 -
 .../bpftool/Documentation/bpftool-prog.rst|  16 +-
 tools/bpf/bpftool/bash-completion/bpftool |   6 +-
 tools/bpf/bpftool/btf_dumper.c|  64 ++
 tools/bpf/bpftool/jit_disasm.c|  23 +-
 tools/bpf/bpftool/main.h  |  23 +-
 tools/bpf/bpftool/prog.c  | 100 ++-
 tools/bpf/bpftool/xlated_dumper.c |  30 +-
 tools/bpf/bpftool/xlated_dumper.h |   7 +-
 tools/include/uapi/linux/bpf.h|  19 +
 tools/lib/bpf/Build   |   2 +-
 tools/lib/bpf/bpf.c   |  93 ++-
 tools/lib/bpf/bpf.h   |   3 +
 tools/lib/bpf/bpf_prog_linfo.c| 253 ++
 tools/lib/bpf/btf.c   | 342 
 tools/lib/bpf/btf.h   |  25 +-
 tools/lib/bpf/libbpf.c| 159 +++-
 tools/lib/bpf/libbpf.h|  13 +
 tools/lib/bpf/libbpf.map  |   4 +
 tools/testing/selftests/bpf/test_btf.c| 790 +++---
 29 files changed, 2036 insertions(+), 388 deletions(-)
 create mode 100644 tools/lib/bpf/bpf_prog_linfo.c

-- 
2.17.1

[PATCH bpf-next 7/7] bpf: libbpf: bpftool: Print bpf_line_info during prog dump

2018-12-07 Thread Martin KaFai Lau

This patch adds print bpf_line_info function in 'prog dump jitted'
and 'prog dump xlated':

[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool 
prog dump jited pinned /sys/fs/bpf/test_btf_haskv
[...]
int test_long_fname_2(struct dummy_tracepoint_args * arg):
bpf_prog_44a040bf25481309_test_long_fname_2:
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x30,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   xor%esi,%esi
; int key = 0;
  27:   mov%esi,-0x4(%rbp)
; if (!arg->sock)
  2a:   mov0x8(%rdi),%rdi
; if (!arg->sock)
  2e:   cmp$0x0,%rdi
  32:   je 0x0070
  34:   mov%rbp,%rsi
; counts = bpf_map_lookup_elem(_map, );
  37:   add$0xfffc,%rsi
  3b:   movabs $0x8881139d7480,%rdi
  45:   add$0x110,%rdi
  4c:   mov0x0(%rsi),%eax
  4f:   cmp$0x4,%rax
  53:   jae0x005e
  55:   shl$0x3,%rax
  59:   add%rdi,%rax
  5c:   jmp0x0060
  5e:   xor%eax,%eax
; if (!counts)
  60:   cmp$0x0,%rax
  64:   je 0x0070
; counts->v6++;
  66:   mov0x4(%rax),%edi
  69:   add$0x1,%rdi
  6d:   mov%edi,0x4(%rax)
  70:   mov0x0(%rbp),%rbx
  74:   mov0x8(%rbp),%r13
  78:   mov0x10(%rbp),%r14
  7c:   mov0x18(%rbp),%r15
  80:   add$0x28,%rbp
  84:   leaveq
  85:   retq
[...]

With linum:
[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool 
prog dump jited pinned /sys/fs/bpf/test_btf_haskv linum
int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
bpf_prog_b07ccb89267cf242__dummy_tracepoint:
; return test_long_fname_1(arg); 
[file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c
 line_num:54 line_col:9]
   0:   push   %rbp
   1:   mov%rsp,%rbp
   4:   sub$0x28,%rsp
   b:   sub$0x28,%rbp
   f:   mov%rbx,0x0(%rbp)
  13:   mov%r13,0x8(%rbp)
  17:   mov%r14,0x10(%rbp)
  1b:   mov%r15,0x18(%rbp)
  1f:   xor%eax,%eax
  21:   mov%rax,0x20(%rbp)
  25:   callq  0x851e
; return test_long_fname_1(arg); 
[file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c
 line_num:54 line_col:2]
  2a:   xor%eax,%eax
  2c:   mov0x0(%rbp),%rbx
  30:   mov0x8(%rbp),%r13
  34:   mov0x10(%rbp),%r14
  38:   mov0x18(%rbp),%r15
  3c:   add$0x28,%rbp
  40:   leaveq
  41:   retq
[...]

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 .../bpftool/Documentation/bpftool-prog.rst|  16 +-
 tools/bpf/bpftool/bash-completion/bpftool |   6 +-
 tools/bpf/bpftool/btf_dumper.c|  64 +
 tools/bpf/bpftool/jit_disasm.c|  23 +-
 tools/bpf/bpftool/main.h  |  23 +-
 tools/bpf/bpftool/prog.c  | 100 ++-
 tools/bpf/bpftool/xlated_dumper.c |  30 ++-
 tools/bpf/bpftool/xlated_dumper.h |   7 +-
 tools/lib/bpf/Build   |   2 +-
 tools/lib/bpf/bpf_prog_linfo.c| 253 ++
 tools/lib/bpf/libbpf.h|  13 +
 tools/lib/bpf/libbpf.map  |   4 +
 12 files changed, 516 insertions(+), 25 deletions(-)
 create mode 100644 tools/lib/bpf/bpf_prog_linfo.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 5524b6dccd85..7c30731a9b73 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -22,8 +22,8 @@ MAP COMMANDS
 =
 
 |  **bpftool** **prog { show | list }** [*PROG*]
-|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
| **visual**}]
-|  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
+|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
| **visual** | **linum**}]
+|  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes** 
| **linum**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
 |  **bpftool** **prog { load | loadall }** *OBJ* *PATH* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
 |  **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
@@ -56,7 +56,7 @@ DESCRIPTION
  Output will start with program ID followed by program type and
  zero or more named attributes (depending on kernel version).
 
-   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** | 
**visual** }]
+   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** | 
**visual** | **linum** }]
  Dump eBPF instructions of the program from the kernel. By

[PATCH bpf-next 4/7] bpf: Add unit tests for bpf_line_info

2018-12-07 Thread Martin KaFai Lau

Add unit tests for bpf_line_info for both BPF_PROG_LOAD and
BPF_OBJ_GET_INFO_BY_FD.

jit enabled:
[root@arch-fb-vm1 bpf]# ./test_btf -k 0
BTF prog info raw test[5] (line_info (No subprog)): OK
BTF prog info raw test[6] (line_info (No subprog. insn_off >= prog->len)): OK
BTF prog info raw test[7] (line_info (No subprog. zero tailing line_info): OK
BTF prog info raw test[8] (line_info (No subprog. nonzero tailing line_info)): 
OK
BTF prog info raw test[9] (line_info (subprog)): OK
BTF prog info raw test[10] (line_info (subprog + func_info)): OK
BTF prog info raw test[11] (line_info (subprog. missing 1st func line info)): OK
BTF prog info raw test[12] (line_info (subprog. missing 2nd func line info)): OK
BTF prog info raw test[13] (line_info (subprog. unordered insn offset)): OK

jit disabled:
BTF prog info raw test[5] (line_info (No subprog)): not jited. skipping 
jited_line_info check. OK
BTF prog info raw test[6] (line_info (No subprog. insn_off >= prog->len)): OK
BTF prog info raw test[7] (line_info (No subprog. zero tailing line_info): not 
jited. skipping jited_line_info check. OK
BTF prog info raw test[8] (line_info (No subprog. nonzero tailing line_info)): 
OK
BTF prog info raw test[9] (line_info (subprog)): not jited. skipping 
jited_line_info check. OK
BTF prog info raw test[10] (line_info (subprog + func_info)): not jited. 
skipping jited_line_info check. OK
BTF prog info raw test[11] (line_info (subprog. missing 1st func line info)): OK
BTF prog info raw test[12] (line_info (subprog. missing 2nd func line info)): OK
BTF prog info raw test[13] (line_info (subprog. unordered insn offset)): OK

Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_btf.c | 597 -
 1 file changed, 580 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c 
b/tools/testing/selftests/bpf/test_btf.c
index 8d5777c89620..7707273736ac 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -108,7 +108,7 @@ static int __base_pr(const char *format, ...)
 #define BTF_END_RAW 0xdeadbeef
 #define NAME_TBD 0xdeadb33f
 
-#define MAX_NR_RAW_TYPES 1024
+#define MAX_NR_RAW_U32 1024
 #define BTF_LOG_BUF_SIZE 65535
 
 static struct args {
@@ -137,7 +137,7 @@ struct btf_raw_test {
const char *str_sec;
const char *map_name;
const char *err_str;
-   __u32 raw_types[MAX_NR_RAW_TYPES];
+   __u32 raw_types[MAX_NR_RAW_U32];
__u32 str_sec_size;
enum bpf_map_type map_type;
__u32 key_size;
@@ -156,6 +156,9 @@ struct btf_raw_test {
int str_len_delta;
 };
 
+#define BTF_STR_SEC(str) \
+   .str_sec = str, .str_sec_size = sizeof(str)
+
 static struct btf_raw_test raw_tests[] = {
 /* enum E {
  * E0,
@@ -1858,11 +1861,11 @@ static const char *get_next_str(const char *start, 
const char *end)
return start < end - 1 ? start + 1 : NULL;
 }
 
-static int get_type_sec_size(const __u32 *raw_types)
+static int get_raw_sec_size(const __u32 *raw_types)
 {
int i;
 
-   for (i = MAX_NR_RAW_TYPES - 1;
+   for (i = MAX_NR_RAW_U32 - 1;
 i >= 0 && raw_types[i] != BTF_END_RAW;
 i--)
;
@@ -1874,7 +1877,8 @@ static void *btf_raw_create(const struct btf_header *hdr,
const __u32 *raw_types,
const char *str,
unsigned int str_sec_size,
-   unsigned int *btf_size)
+   unsigned int *btf_size,
+   const char **ret_next_str)
 {
const char *next_str = str, *end_str = str + str_sec_size;
unsigned int size_needed, offset;
@@ -1883,7 +1887,7 @@ static void *btf_raw_create(const struct btf_header *hdr,
uint32_t *ret_types;
void *raw_btf;
 
-   type_sec_size = get_type_sec_size(raw_types);
+   type_sec_size = get_raw_sec_size(raw_types);
if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
return NULL;
 
@@ -1922,6 +1926,8 @@ static void *btf_raw_create(const struct btf_header *hdr,
ret_hdr->str_len = str_sec_size;
 
*btf_size = size_needed;
+   if (ret_next_str)
+   *ret_next_str = next_str;
 
return raw_btf;
 }
@@ -1941,7 +1947,7 @@ static int do_test_raw(unsigned int test_num)
 test->raw_types,
 test->str_sec,
 test->str_sec_size,
-_btf_size);
+_btf_size, NULL);
 
if (!raw_btf)
return -1;
@@ -2018,7 +2024,7 @@ static int test_raw(void)
 struct btf_get_info_test {
const char *descr;
const char *str_sec;
-   __u32 raw_types[MAX_NR_RAW_TYPES];
+   __u32 raw_types[MAX_NR_RAW_U32];
__u32 str_sec_size;
int btf_size_delta;
int

[PATCH bpf-next 5/7] bpf: libbpf: Refactor and bug fix on the bpf_func_info loading logic

2018-12-07 Thread Martin KaFai Lau

This patch refactor and fix a bug in the libbpf's bpf_func_info loading
logic.  The bug fix and refactoring are targeting the same
commit 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
which is in the bpf-next branch.

1) In bpf_load_program_xattr(), it should retry when errno == E2BIG
   regardless of log_buf and log_buf_sz.  This patch fixes it.

2) btf_ext__reloc_init() and btf_ext__reloc() are essentially
   the same except btf_ext__reloc_init() always has insns_cnt == 0.
   Hence, btf_ext__reloc_init() is removed.

   btf_ext__reloc() is also renamed to btf_ext__reloc_func_info()
   to get ready for the line_info support in the next patch.

3) Consolidate func_info section logic from "btf_ext_parse_hdr()",
   "btf_ext_validate_func_info()" and "btf_ext__new()" to
   a new function "btf_ext_copy_func_info()" such that similar
   logic can be reused by the later libbpf's line_info patch.

4) The next line_info patch will store line_info_cnt instead of
   line_info_len in the bpf_program because the kernel is taking
   line_info_cnt also.  It will save a few "len" to "cnt" conversions
   and will also save some function args.

   Hence, this patch also makes bpf_program to store func_info_cnt
   instead of func_info_len.

5) btf_ext depends on btf.  e.g. the func_info's type_id
   in ".BTF.ext" is not useful when ".BTF" is absent.
   This patch only init the obj->btf_ext pointer after
   it has successfully init the obj->btf pointer.

   This can avoid always checking "obj->btf && obj->btf_ext"
   together for accessing ".BTF.ext".  Checking "obj->btf_ext"
   alone will do.

6) Move "struct btf_sec_func_info" from btf.h to btf.c.
   There is no external usage outside btf.c.

Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/bpf.c|   7 +-
 tools/lib/bpf/btf.c| 191 -
 tools/lib/bpf/btf.h|  17 +---
 tools/lib/bpf/libbpf.c | 139 --
 4 files changed, 177 insertions(+), 177 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5c3be06bf0dd..9fbbc0ed5952 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -205,7 +205,7 @@ int bpf_load_program_xattr(const struct 
bpf_load_program_attr *load_attr,
   min(name_len, BPF_OBJ_NAME_LEN - 1));
 
fd = sys_bpf(BPF_PROG_LOAD, , sizeof(attr));
-   if (fd >= 0 || !log_buf || !log_buf_sz)
+   if (fd >= 0)
return fd;
 
/* After bpf_prog_load, the kernel may modify certain attributes
@@ -244,10 +244,13 @@ int bpf_load_program_xattr(const struct 
bpf_load_program_attr *load_attr,
 
fd = sys_bpf(BPF_PROG_LOAD, , sizeof(attr));
 
-   if (fd >= 0 || !log_buf || !log_buf_sz)
+   if (fd >= 0)
goto done;
}
 
+   if (!log_buf || !log_buf_sz)
+   goto done;
+
/* Try again with log */
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 85d6446cf832..aa4fa02b13fc 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -43,6 +43,13 @@ struct btf_ext {
__u32 func_info_len;
 };
 
+struct btf_sec_func_info {
+   __u32   sec_name_off;
+   __u32   num_func_info;
+   /* Followed by num_func_info number of bpf func_info records */
+   __u8data[0];
+};
+
 /* The minimum bpf_func_info checked by the loader */
 struct bpf_func_info_min {
__u32   insn_off;
@@ -479,41 +486,66 @@ int btf__get_from_id(__u32 id, struct btf **btf)
return err;
 }
 
-static int btf_ext_validate_func_info(const void *finfo, __u32 size,
- btf_print_fn_t err_log)
+static int btf_ext_copy_func_info(struct btf_ext *btf_ext,
+ __u8 *data, __u32 data_size,
+ btf_print_fn_t err_log)
 {
-   int sec_hdrlen = sizeof(struct btf_sec_func_info);
-   __u32 size_left, num_records, record_size;
+   const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
const struct btf_sec_func_info *sinfo;
-   __u64 total_record_size;
+   __u32 info_left, record_size;
+   /* The start of the info sec (including the __u32 record_size). */
+   const void *info;
+
+   /* data and data_size do not include btf_ext_header from now on */
+   data = data + hdr->hdr_len;
+   data_size -= hdr->hdr_len;
+
+   if (hdr->func_info_off & 0x03) {
+   elog("BTF.ext func_info section is not aligned to 4 bytes\n");
+   return -EINVAL;
+   }
+
+   if (data_size < hdr->func_info_off ||
+   hdr->func_info_len > data_size - hdr->func_info_off) {
+   elog("func_info section (off:%u len:%u) is beyond the end of 
the ELF section .BTF.ext\n",
+

Re: [PATCH net-next 0/4] tc-testing: implement command timeouts and better results tracking

2018-12-07 Thread David Miller

From: Lucas Bates 
Date: Thu,  6 Dec 2018 17:42:23 -0500

> Patch 1 adds a timeout feature for any command tdc launches in a subshell.
> This prevents tdc from hanging indefinitely.
> 
> Patches 2-4 introduce a new method for tracking and generating test case
> results, and implements it across the core script and all applicable
> plugins.

Series applied.

Re: [PATCH net v2 0/2] Fix slab out-of-bounds on insufficient headroom for IPv6 packets

2018-12-07 Thread David Miller

From: Stefano Brivio 
Date: Thu,  6 Dec 2018 19:30:35 +0100

> Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
> IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.
> 
> Patch 2/2 makes sure we avoid writing before the allocated buffer in
> neigh_hh_output() in case the headroom is enough for the unaligned hardware
> header size, but not enough for the aligned one, and that we warn if we hit
> this condition.

Series applied and queued up for -stable, thanks.

Re: [PATCH net] tcp: lack of available data can also cause TSO defer

2018-12-07 Thread David Miller

From: Eric Dumazet 
Date: Thu,  6 Dec 2018 09:58:24 -0800

> tcp_tso_should_defer() can return true in three different cases :
> 
>  1) We are cwnd-limited
>  2) We are rwnd-limited
>  3) We are application limited.
> 
> Neal pointed out that my recent fix went too far, since
> it assumed that if we were not in 1) case, we must be rwnd-limited
> 
> Fix this by properly populating the is_cwnd_limited and
> is_rwnd_limited booleans.
> 
> After this change, we can finally move the silly check for FIN
> flag only for the application-limited case.
> 
> The same move for EOR bit will be handled in net-next,
> since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
> with eor mark (MSG_EOR)") is scheduled for linux-4.21
> 
> Tested by running 200 concurrent netperf -t TCP_RR -- -r 6,100
> and checking none of them was rwnd_limited in the chrono_stat
> output from "ss -ti" command.
> 
> Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
> Signed-off-by: Eric Dumazet 
> Suggested-by: Neal Cardwell 
> Reviewed-by: Neal Cardwell 
> Acked-by: Soheil Hassas Yeganeh 
> Reviewed-by: Yuchung Cheng 

Applied.

Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller

From: Maciej Żenczykowski 
Date: Wed,  5 Dec 2018 12:59:17 -0800

> From: Maciej Żenczykowski 
> 
> During udp socket lookup cpu match should be lowest priority,
> hence it should increase score by only 1.
> 
> The next priority is delivering v4 to v4 sockets, and v6 to v6 sockets.
> The v6 code path doesn't have to deal with this so it always gets
> a score of '4'.  The v4 code path uses '4' or '2' depending on
> whether we're delivering to a v4 socket or a dualstack v6 socket.
> 
> This is more important than cpu match, so has to be greater than
> the '1' bump in score from cpu match.
> 
> All other matches (src/dst ip, src port) are even *more* important,
> so need to bump score by 4 for ipv4.
> 
> For ipv6 we could simply bump by 2, but let's keep the two code
> paths as similar as possible.
> 
> (also, while at it, remove two unnecessary unconditional score bumps)
> 
> Signed-off-by: Maciej Żenczykowski 

This doesn't apply to the current net tree.

Also "net-udp: " is a weird subsystem prefix, just use "udp: ".

Thank you.

RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha

> Do you have a git tree with all the KSZ patches based on -next
> somewhere, so I don't have to look for them in random MLs ?

I just sent it this Monday and the subject for that patch is
"[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."

Re: [Patch v2 net-next] call sk_dst_reset when set SO_DONTROUTE

2018-12-07 Thread David Miller

From: yupeng 
Date: Wed,  5 Dec 2018 18:56:28 -0800

> after set SO_DONTROUTE to 1, the IP layer should not route packets if
> the dest IP address is not in link scope. But if the socket has cached
> the dst_entry, such packets would be routed until the sk_dst_cache
> expires. So we should clean the sk_dst_cache when a user set
> SO_DONTROUTE option. Below are server/client python scripts which
> could reprodue this issue:
 ...
> Signed-off-by: yupeng 

Applied.

RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha

> > I think if you do this without setting offload_fwd_mark you will
> > receive duplicate frame.
> 
> I don't think it will, at least not in the normal case. The hardware
> should know the egress port, so there is no need to forward a copy to
> the CPU. The only time it should forward to the CPU is when the egress
> port is not known, so it floods. Without offload_fwd_mark set, the SW
> bridge will flood it back out the ports causing duplication. But that
> is not too bad. The Marvell driver did this for a while and nothing
> bad was reported.

For unicast frames it is okay as the CPU port does not see it after the first
one.  For multicast frames there will be duplicates, and it is tolerated?

Re: [PATCH v2 net-next] neighbor: Improve garbage collection

2018-12-07 Thread David Miller

From: David Ahern 
Date: Fri,  7 Dec 2018 12:24:57 -0800

> From: David Ahern 
> 
> The existing garbage collection algorithm has a number of problems:
 ...
> This patch addresses these problems as follows:
> 
> 1. Use of a separate list_head to track entries that can be garbage
>collected along with a separate counter. PERMANENT entries are not
>added to this list.
> 
>The gc_thresh parameters are only compared to the new counter, not the
>total entries in the table. The forced_gc function is updated to only
>walk this new gc_list looking for entries to evict.
> 
> 2. Entries are added to the list head at the tail and removed from the
>front.
> 
> 3. Entries are only evicted if they were last updated more than 5 seconds
>ago, adhering to the original intent of gc_thresh2.
> 
> 4. Forced gc is stopped once the number of gc_entries drops below
>gc_thresh2.
> 
> 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
>when allocating a new neighbor for a PERMANENT entry. By extension this
>means there are no explicit limits on the number of PERMANENT entries
>that can be created, but this is no different than FIB entries or FDB
>entries.
> 
> Signed-off-by: David Ahern 
> ---
> v2
> - remove on_gc_list boolean in favor of !list_empty
> - fix neigh_alloc to add new entry to tail of list_head

Again, looks great, applied.

RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha

> >> If two ports are in the same bridge and in forwarding state, the packets
> >> must be able to pass between them in both directions. The current code
> >> only configures this bridge membership for a port newly added to the
> >> bridge, but does not update all the other ports. Thus, ingress packets
> >> on the new port will be forwarded, but ingress packets on other ports
> >> destined for the new port (eg. a reply) will not be forwarded back to
> >> the new port, because they are not configured to do so. This patch fixes
> >> that by updating the membership registers of all ports.
> >>
> >> Signed-off-by: Marek Vasut 
> >> Cc: Vivien Didelot 
> >> Cc: Woojung Huh 
> >> Cc: David S. Miller 
> >> Cc: Tristram Ha 
> >> ---
> >>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> >> b/drivers/net/dsa/microchip/ksz9477.c
> >> index 0684657fbf9a9..e24dd14ccde77 100644
> >> --- a/drivers/net/dsa/microchip/ksz9477.c
> >> +++ b/drivers/net/dsa/microchip/ksz9477.c
> >> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>struct ksz_device *dev = ds->priv;
> >>struct ksz_port *p = >ports[port];
> >>u8 data;
> >> -  int member = -1;
> >> +  int i, member = -1;
> >>
> >>ksz_pread8(dev, port, P_STP_CTRL, );
> >>data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> >> PORT_LEARN_DISABLE);
> >> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>dev->tx_ports &= ~(1 << port);
> >>
> >>/* Port membership may share register with STP state. */
> >> -  if (member >= 0 && member != p->member)
> >> -  ksz9477_cfg_port_member(dev, port, (u8)member);
> >> +  for (i = 0; i < SWITCH_PORT_NUM; i++)
> >> +  ksz9477_cfg_port_member(dev, i, (u8)member);
> >>
> >>/* Check if forwarding needs to be updated. */
> >>if (state != BR_STATE_FORWARDING) {
> >
> > The original DSA model did not have a way to tell the bridge device not to
> > forward the frame, so the switch driver always setup the membership to
> > disable forwarding between ports.
> >
> > When lan devices are setup they act like individual devices.  A bridge 
> > device
> > adding them under it will forward the frames.
> >
> > The new switchdev model adds the offload_fwd_mark bit to tell the bridge
> not to
> > forward frame.
> >
> > The ksz_update_port_member function in ksz_common.c is doing this
> membership
> > setup for all forwarding ports.  It was finally enabled in one of the RFC
> patches I
> > submitted recently (Add switch forward offloading support).
> >
> > I think if you do this without setting offload_fwd_mark you will receive
> duplicate
> > frame.
> >
> 
> Either I am misreading Marek's patch, or I don't quite understand your
> response, but what is happening when you enslave a switch port into a
> bridge is that you need to make sure that:
> 
> - the switch port being enslaved will be part of the same forwarding
> group as any other switch port already in the bridge
> - any existing switch port already enslaved in the bridge must now also
> be allowed to forward to the port that is being enslaved
> 
> That is to me, exactly what Marek's patch is fixing, your response is
> about something slightly orthogonal here.

I got confused here as the code is obviously wrong and should not work,
so I found out why it works in the bridge device situation.  There is actually
a bug in the driver that enables this behavior.  The port_vlan_filtering 
function
turns off the port membership enforcement.  Fixing this problem should be
easy, but this port_vlan_filtering function is also not implemented right.  It 
treats the
operation as a simple VLAN on/off, but it is more complex than that.

Anyway it seems to work in the bridge device situation, but it does not work
in the default situation:

Assume there are two port devices lan1 and lan2.  The device lan1 is assigned
an IP address and can talk to outside.  Enabling and disabling lan2 by doing
"ifconfig lan2 up" and "ifconfig lan2 down."  The device lan1 is no longer 
working.

Create a bridge device and add a child device by doing "brctl addbr br0" and
"brctl addif br0 lan1."  This will call port_vlan_filtering and then the feature
UNICAST_VLAN_BOUNDARY is disabled.  This causes port membership to have no
effect on unicast packets and so it does not matter what member value is used.
The device lan1 can start working again.

The fix is to avoid disabling UNICAST_VLAN_BOUNDARY and it should be set all
the time.  In this switch the default is on.

Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread David Miller

From: Marek Vasut 
Date: Fri, 7 Dec 2018 23:59:58 +0100

> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>>> configured correctly in such setups.
>> 
>> Hi Marek
> 
> Hi Andrew,
> 
>> Please make this a patch series, not two individual patches.
> 
> This actually is an individual patch, it doesn't depend on anything.
> Or do you mean a series with the DT documentation change ?

Yes, but all of this stuff is building up for one single purpose,
and that is to support a new mode of operation with DSA or whatever.

So please group them together in a series with an appropriate
header posting.

Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Miller

From: Eric Dumazet 
Date: Fri, 7 Dec 2018 15:03:04 -0800

> On 12/07/2018 02:24 PM, David Ahern wrote:
>> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>> 
>> /* --- cacheline 3 boundary (192 bytes) --- */
>> struct hh_cachehh;   /*   19248 */
>> 
>> ...
>> 
>> but does not change the actual allocation size which is rounded to 512.
>> 
> 
> I have not talked about the allocation size, but alignment of ->ha field,
> which is kind of assuming long alignment, in a strange way.

Right, neigh->ha[] should probably be kept 8-byte aligned.

Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet




On 12/07/2018 02:24 PM, David Ahern wrote:
> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>>
>>
>> On 12/07/2018 01:49 PM, David Ahern wrote:
>>> From: David Ahern 
>>>
>>> Similar to routes and rules, add protocol attribute to neighbor entries
>>> for easier tracking of how each was created.
>>>
>>> Signed-off-by: David Ahern 
>>> ---
>>>  include/net/neighbour.h|  2 ++
>>>  include/uapi/linux/neighbour.h |  1 +
>>>  net/core/neighbour.c   | 24 +++-
>>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>>> index 6c13072910ab..e93c59df9501 100644
>>> --- a/include/net/neighbour.h
>>> +++ b/include/net/neighbour.h
>>> @@ -149,6 +149,7 @@ struct neighbour {
>>> __u8nud_state;
>>> __u8type;
>>> __u8dead;
>>> +   u8  protocol;
>>> seqlock_t   ha_lock;
>>> unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
>>
>> This looks like ha[] alignment would change, I am not sure how critical it 
>> is.
> 
> Just adds 4 bytes to neighbour:
> 
> ...
> /* --- cacheline 2 boundary (128 bytes) --- */
> long unsigned int  used; /*   128 8 */
> atomic_t   probes;   /*   136 4 */
> __u8   flags;/*   140 1 */
> __u8   nud_state;/*   141 1 */
> __u8   type; /*   142 1 */
> __u8   dead; /*   143 1 */
> u8 protocol; /*   144 1 */
> 
> /* XXX 3 bytes hole, try to pack */
> seqlock_t  ha_lock;  /*   148 8 */
> unsigned char  ha[32];   /*   15632 */
> /* XXX 4 bytes hole, try to pack */
> 
> /* --- cacheline 3 boundary (192 bytes) --- */
> struct hh_cachehh;   /*   19248 */
> 
> ...
> 
> but does not change the actual allocation size which is rounded to 512.
> 

I have not talked about the allocation size, but alignment of ->ha field,
which is kind of assuming long alignment, in a strange way.

As I said, I do not know how performance critical this might be.

Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut

On 12/07/2018 11:24 PM, Andrew Lunn wrote:
> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>> configured correctly in such setups.
> 
> Hi Marek

Hi Andrew,

> Please make this a patch series, not two individual patches.

This actually is an individual patch, it doesn't depend on anything.
Or do you mean a series with the DT documentation change ?

> And as David has already said, include a cover letter.
> 
> Otherwise, this looks O.K.
> 
> Thanks
>   Andrew
> 


-- 
Best regards,
Marek Vasut

Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Ahern

On 12/7/18 3:20 PM, Eric Dumazet wrote:
> 
> 
> On 12/07/2018 01:49 PM, David Ahern wrote:
>> From: David Ahern 
>>
>> Similar to routes and rules, add protocol attribute to neighbor entries
>> for easier tracking of how each was created.
>>
>> Signed-off-by: David Ahern 
>> ---
>>  include/net/neighbour.h|  2 ++
>>  include/uapi/linux/neighbour.h |  1 +
>>  net/core/neighbour.c   | 24 +++-
>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 6c13072910ab..e93c59df9501 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -149,6 +149,7 @@ struct neighbour {
>>  __u8nud_state;
>>  __u8type;
>>  __u8dead;
>> +u8  protocol;
>>  seqlock_t   ha_lock;
>>  unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
> 
> This looks like ha[] alignment would change, I am not sure how critical it is.

Just adds 4 bytes to neighbour:

...
/* --- cacheline 2 boundary (128 bytes) --- */
long unsigned int  used; /*   128 8 */
atomic_t   probes;   /*   136 4 */
__u8   flags;/*   140 1 */
__u8   nud_state;/*   141 1 */
__u8   type; /*   142 1 */
__u8   dead; /*   143 1 */
u8 protocol; /*   144 1 */

/* XXX 3 bytes hole, try to pack */
seqlock_t  ha_lock;  /*   148 8 */
unsigned char  ha[32];   /*   15632 */
/* XXX 4 bytes hole, try to pack */

/* --- cacheline 3 boundary (192 bytes) --- */
struct hh_cachehh;   /*   19248 */

...

but does not change the actual allocation size which is rounded to 512.

Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn

On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
> has a reset GPIO line which can be controlled by the CPU, so make sure it is
> configured correctly in such setups.

Hi Marek

Please make this a patch series, not two individual patches.

And as David has already said, include a cover letter.

Otherwise, this looks O.K.

Thanks
Andrew

Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet




On 12/07/2018 01:49 PM, David Ahern wrote:
> From: David Ahern 
> 
> Similar to routes and rules, add protocol attribute to neighbor entries
> for easier tracking of how each was created.
> 
> Signed-off-by: David Ahern 
> ---
>  include/net/neighbour.h|  2 ++
>  include/uapi/linux/neighbour.h |  1 +
>  net/core/neighbour.c   | 24 +++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 6c13072910ab..e93c59df9501 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -149,6 +149,7 @@ struct neighbour {
>   __u8nud_state;
>   __u8type;
>   __u8dead;
> + u8  protocol;
>   seqlock_t   ha_lock;
>   unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];

This looks like ha[] alignment would change, I am not sure how critical it is.

>   struct hh_cache hh;
> @@ -173,6 +174,7 @@ struct pneigh_entry {
>   possible_net_t  net;
>   struct net_device   *dev;
>   u8  flags;
> + u8  protocol;
>   u8  key[0];
>  };
>

[PATCH] net: dsa: Add optional reset GPIO to Microchip KSZ switch binding

2018-12-07 Thread Marek Vasut

Add optional reset GPIO, as such a signal is available on the KSZ switches.

Signed-off-by: Marek Vasut 
Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 Documentation/devicetree/bindings/net/dsa/ksz.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt 
b/Documentation/devicetree/bindings/net/dsa/ksz.txt
index ac145b885e955..0f407fb371ce1 100644
--- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
+++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
@@ -8,6 +8,10 @@ Required properties:
   - "microchip,ksz9477"
   - "microchip,ksz9897"
 
+Optional properties:
+
+- reset-gpios  : Should be a gpio specifier for a reset line
+
 See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional
 required and optional properties.
 
-- 
2.18.0

Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut

On 12/07/2018 08:55 PM, Andrew Lunn wrote:
>> +dev->reset_gpio = -1;
>> +reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
>> + _gpio_flags);
>> +if (reset_gpio >= 0) {
>> +flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
>> +GPIOF_ACTIVE_LOW : 0;
> 
> Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
> Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.

Done

> You also need to update the binding documentation for this new
> property.

Will do in a separate patch.

-- 
Best regards,
Marek Vasut

[PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut

Add code to handle optional reset GPIO in the KSZ switch driver. The switch
has a reset GPIO line which can be controlled by the CPU, so make sure it is
configured correctly in such setups.

Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
Cc: Tristram Ha 
---
V2: Switch to devm_gpiod_get_optional()
---
 drivers/net/dsa/microchip/ksz_common.c | 17 +
 drivers/net/dsa/microchip/ksz_priv.h   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz_common.c 
b/drivers/net/dsa/microchip/ksz_common.c
index 9705808c3af7a..3b12e2dcff31b 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -8,12 +8,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -294,6 +296,17 @@ int ksz_switch_register(struct ksz_device *dev,
if (dev->pdata)
dev->chip_id = dev->pdata->chip_id;
 
+   dev->reset_gpio = devm_gpiod_get_optional(dev->dev, "reset",
+ GPIOD_OUT_LOW);
+   if (IS_ERR(dev->reset_gpio))
+   return PTR_ERR(dev->reset_gpio);
+
+   if (dev->reset_gpio) {
+   gpiod_set_value(dev->reset_gpio, 1);
+   mdelay(10);
+   gpiod_set_value(dev->reset_gpio, 0);
+   }
+
mutex_init(>reg_mutex);
mutex_init(>stats_mutex);
mutex_init(>alu_mutex);
@@ -329,6 +342,10 @@ void ksz_switch_remove(struct ksz_device *dev)
 {
dev->dev_ops->exit(dev);
dsa_unregister_switch(dev->ds);
+
+   if (dev->reset_gpio)
+   gpiod_set_value(dev->reset_gpio, 1);
+
 }
 EXPORT_SYMBOL(ksz_switch_remove);
 
diff --git a/drivers/net/dsa/microchip/ksz_priv.h 
b/drivers/net/dsa/microchip/ksz_priv.h
index a38ff0841ed4e..60b49010904bf 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -59,6 +59,8 @@ struct ksz_device {
 
void *priv;
 
+   struct gpio_desc *reset_gpio;   /* Optional reset GPIO */
+
/* chip specific data */
u32 chip_id;
int num_vlans;
-- 
2.18.0

[PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Ahern

From: David Ahern 

Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.

Signed-off-by: David Ahern 
---
 include/net/neighbour.h|  2 ++
 include/uapi/linux/neighbour.h |  1 +
 net/core/neighbour.c   | 24 +++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6c13072910ab..e93c59df9501 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -149,6 +149,7 @@ struct neighbour {
__u8nud_state;
__u8type;
__u8dead;
+   u8  protocol;
seqlock_t   ha_lock;
unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
struct hh_cache hh;
@@ -173,6 +174,7 @@ struct pneigh_entry {
possible_net_t  net;
struct net_device   *dev;
u8  flags;
+   u8  protocol;
u8  key[0];
 };
 
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index 998155444e0d..cd144e3099a3 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -28,6 +28,7 @@ enum {
NDA_MASTER,
NDA_LINK_NETNSID,
NDA_SRC_VNI,
+   NDA_PROTOCOL,  /* Originator of entry */
__NDA_MAX
 };
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c3b58712e98b..56984695585d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1799,6 +1799,7 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr 
*nlh,
struct net_device *dev = NULL;
struct neighbour *neigh;
void *dst, *lladdr;
+   u8 protocol = 0;
int err;
 
ASSERT_RTNL();
@@ -1838,6 +1839,14 @@ static int neigh_add(struct sk_buff *skb, struct 
nlmsghdr *nlh,
dst = nla_data(tb[NDA_DST]);
lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;
 
+   if (tb[NDA_PROTOCOL]) {
+   if (nla_len(tb[NDA_PROTOCOL]) != sizeof(u8)) {
+   NL_SET_ERR_MSG(extack, "Invalid protocol attribute");
+   goto out;
+   }
+   protocol = nla_get_u8(tb[NDA_PROTOCOL]);
+   }
+
if (ndm->ndm_flags & NTF_PROXY) {
struct pneigh_entry *pn;
 
@@ -1845,6 +1854,8 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr 
*nlh,
pn = pneigh_lookup(tbl, net, dst, dev, 1);
if (pn) {
pn->flags = ndm->ndm_flags;
+   if (protocol)
+   pn->protocol = protocol;
err = 0;
}
goto out;
@@ -1893,6 +1904,10 @@ static int neigh_add(struct sk_buff *skb, struct 
nlmsghdr *nlh,
} else
err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags,
 NETLINK_CB(skb).portid, extack);
+
+   if (protocol)
+   neigh->protocol = protocol;
+
neigh_release(neigh);
 
 out:
@@ -2386,6 +2401,9 @@ static int neigh_fill_info(struct sk_buff *skb, struct 
neighbour *neigh,
nla_put(skb, NDA_CACHEINFO, sizeof(ci), ))
goto nla_put_failure;
 
+   if (neigh->protocol && nla_put_u8(skb, NDA_PROTOCOL, neigh->protocol))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -2417,6 +2435,9 @@ static int pneigh_fill_info(struct sk_buff *skb, struct 
pneigh_entry *pn,
if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
goto nla_put_failure;
 
+   if (pn->protocol && nla_put_u8(skb, NDA_PROTOCOL, pn->protocol))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -3072,7 +3093,8 @@ static inline size_t neigh_nlmsg_size(void)
   + nla_total_size(MAX_ADDR_LEN) /* NDA_DST */
   + nla_total_size(MAX_ADDR_LEN) /* NDA_LLADDR */
   + nla_total_size(sizeof(struct nda_cacheinfo))
-  + nla_total_size(4); /* NDA_PROBES */
+  + nla_total_size(4)  /* NDA_PROBES */
+  + nla_total_size(1); /* NDA_PROTOCOL */
 }
 
 static void __neigh_notify(struct neighbour *n, int type, int flags,
-- 
2.11.0

Re: [PATCH bpf 1/2] selftests/bpf: use thoff instead of nhoff in BPF flow dissector

2018-12-07 Thread Alexei Starovoitov

On Wed, Dec 05, 2018 at 08:40:47PM -0800, Stanislav Fomichev wrote:
> We are returning thoff from the flow dissector, not the nhoff. Pass
> thoff along with nhoff to the bpf program (initially thoff == nhoff)
> and expect flow dissector amend/return thoff, not nhoff.
> 
> This avoids confusion, when by the time bpf flow dissector exits,
> nhoff == thoff, which doesn't make much sense.
> 
> Signed-off-by: Stanislav Fomichev 

applied both to bpf tree. thanks

Re: [PATCH v2 bpf-next 0/7] bpf: support BPF_ALU | BPF_ARSH

2018-12-07 Thread Alexei Starovoitov

On Wed, Dec 05, 2018 at 01:52:29PM -0500, Jiong Wang wrote:
> BPF_ALU | BPF_ARSH | BPF_* were rejected by commit: 7891a87efc71
> ("bpf: arsh is not supported in 32 bit alu thus reject it"). As explained
> in the commit message, this is due to there is no complete support for them
> on interpreter and various JIT compilation back-ends.
> 
> This patch set is a follow-up which completes the missing bits. This also
> pave the way for running bpf program compiled with ALU32 instruction
> enabled by specifing -mattr=+alu32 to LLVM for which case there is likely
> to have more BPF_ALU | BPF_ARSH insns that will trigger the rejection code.
> 
> test_verifier.c is updated accordingly.
> 
> I have tested this patch set on x86-64 and NFP, I need help of review and
> test on the arch changes (mips/ppc/s390).
> 
> Note, there might be merge confict on mips change which is better to be
> applied on top of:
> 
>   commit: 20b880a05f06 ("mips: bpf: fix encoding bug for mm_srlv32_op"),
> 
> which is on mips-fixes branch at the moment.
> 
> Thanks.
> 
> v1->v2:
>  - Fix ppc implementation bug. Should zero high bits explicitly.

I've applied this set and earlier commit "mips: bpf: fix encoding bug for 
mm_srlv32_op"
to bpf-next.

Thanks

Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Alexei Starovoitov

On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> From: Björn Töpel 
> 
> Hi!
> 
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
> 
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
> return bpf_xsk_redirect(ctx);
>   }
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.

The feature makes sense to me.
May be XDP_ATTACH_BUILTIN would be a better name ?
Also I think it needs another parameter to say which builtin
program to use.
This unconditional xsk_redirect is fine for performance
benchmarking, but for production I suspect the users would want
an easy way to stay safe when they're playing with AF_XDP.
So another builtin program that redirects ssh and ping traffic
back to the kernel would be a nice addition.

Re: [PATCH iproute2-next 0/2] devlink: Add support for 'fw_load_policy' generic parameter

2018-12-07 Thread David Ahern

On 12/4/18 3:14 AM, Shalom Toledo wrote:
> Patch #1 add string to uint conversion support for generic parameters.
> Patch #2 add string to uint support for 'fw_load_policy' generic parameter
> 
> Shalom Toledo (2):
>   devlink: Add string to uint{8,16,32} conversion for generic parameters
>   devlink: Add support for 'fw_load_policy' generic parameter
> 
>  devlink/devlink.c| 156 ---
>  include/uapi/linux/devlink.h |   5 ++
>  2 files changed, 151 insertions(+), 10 deletions(-)
> 

applied to iproute2-next. Thanks

Re: [PATCH 1/5] net: dsa: ksz: Add MIB counter reading support

2018-12-07 Thread David Miller



Every patch series should have a header posting with Subject of
the form "[PATCH 0/N] ..." explaining what the series does at
a high level, how it does it, and why it does it that way.

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren

* Russell King - ARM Linux  [181207 19:27]:
> You mentioned that edge mode didn't work as well as level mode on
> duovero smsc controller, I think this may help to solve the same
> issue but for edge IRQs - we need a mask_ack_irq function to avoid
> acking while the edge interrupt is masked.  Let me know if that
> lowers the smsc ping latency while in edge mode.

Looks like smsc edge interrupt is still producing varying
ping latencies with this. Seems like the mas_ack_irq is
a nice improvment though.

Regards,

Tony

Re: [PATCH v2 net-next 0/4] net: aquantia: add RSS configuration

2018-12-07 Thread David Miller

From: Igor Russkikh 
Date: Fri, 7 Dec 2018 14:00:09 +

> In this patchset few bugs related to RSS are fixed and RSS table and
> hash key configuration is added.
> 
> We also do increase max number of HW rings upto 8.
> 
> v2: removed extra arg check

Series applied.

Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Florian Fainelli

On 12/7/18 9:26 AM, Andrew Lunn wrote:
>> Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?
> 
> I think it makes sense to allow a fixed phy carrier to be changed from
> user space. However, i don't think you can easily plumb that to
> .ndo_change_carrier(), since that is a MAC feature. You need to change
> the fixed_phy_status to indicate the PHY has lost link, and then let
> the usual mechanisms tell the MAC it is down and change the carrier
> status.

Joakim, I still don't understand what did not work with:

- adding a ndo_change_carrier() interface which keeps a boolean flag
whether the link was up or not

- register a fixed_link_update callback for your fixed PHY, which just
propagates that flag back to the fixed PHY

and that should take care of having the carrier go down, as driven by
the PHY state machine, for that fixed device.
-- 
Florian

Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-07 Thread Jason Gunthorpe

On Fri, Dec 07, 2018 at 08:04:26AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 08:27:06PM -0700, Jason Gunthorpe wrote:
> > On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > >From Danit,
> > >
> > > Packet based credit mode is an alternative end-to-end credit mode for QPs
> > > set during their creation. Credits are transported from the responder
> > > to the requester to optimize the use of its receive resources.
> > > In packet-based credit mode, credits are issued on a per packet basis.
> > >
> > > The advantage of this feature comes while sending large RDMA messages
> > > through switches that are short in memory.
> > >
> > > The first commit exposes QP creation flag and the HCA capability. The
> > > second commit adds support for a new DV QP creation flag. The last
> > > commit report packet based credit mode capability via the MLX5DV device
> > > capabilities.
> > >
> > > Thanks
> > >
> > > Danit Goldberg (3):
> > >   net/mlx5: Expose packet based credit mode
> > >   IB/mlx5: Add packet based credit mode support
> > >   IB/mlx5: Report packet based credit mode device capability
> >
> > This looks fine to me, can you update the shared branch please
> 
> Done, thanks
> 3fd3c80acc17 net/mlx5: Expose packet based credit mode

Applied to for-next

Thanks,
Jason

Re: [PATCH net-next v2 0/4] net: mitigate retpoline overhead

2018-12-07 Thread Paolo Abeni

Hi,

On Thu, 2018-12-06 at 22:28 -0800, David Miller wrote:
> From: David Miller 
> Date: Thu, 06 Dec 2018 22:24:09 -0800 (PST)
> 
> > Series applied, thanks!
> 
> Erm... actually reverted.  Please fix these build failures:

oops ...
I'm sorry for the late reply. I'm travelling and I will not able to re-
post soon.

> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_receive':
> ip6_offload.c:(.text+0xda2): undefined reference to `udp6_gro_receive'
> ld: ip6_offload.c:(.text+0xdb6): undefined reference to `udp6_gro_receive'
> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_complete':
> ip6_offload.c:(.text+0x1953): undefined reference to `udp6_gro_complete'
> ld: ip6_offload.c:(.text+0x1966): undefined reference to `udp6_gro_complete'
> make: *** [Makefile:1036: vmlinux] Error 1

Are you building with CONFIG_IPV6=m ? I tested vs some common cfg, but
I omitted that in my last iteration (my bad). With such conf ip6
offloads are builtin while udp6 offloads end-up in the ipv6 module, so
I can't use them with the given conf.

I'll try to fix the above in v3.

I'm sorry for this mess,

Paolo

[PATCH v2 net-next] neighbor: Improve garbage collection

2018-12-07 Thread David Ahern

From: David Ahern 

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

-|-x|--|-
t-5 t t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern 
---
v2
- remove on_gc_list boolean in favor of !list_empty
- fix neigh_alloc to add new entry to tail of list_head

 Documentation/networking/ip-sysctl.txt |   4 +-
 include/net/neighbour.h|   3 +
 net/core/neighbour.c   | 119 +++--
 3 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index af2a69439b93..acdfb5d2bcaa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -108,8 +108,8 @@ neigh/default/gc_thresh2 - INTEGER
Default: 512
 
 neigh/default/gc_thresh3 - INTEGER
-   Maximum number of neighbor entries allowed.  Increase this
-   when using large numbers of interfaces and when communicating
+   Maximum number of non-PERMANENT neighbor entries allowed.  Increase
+   this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
Default: 1024
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..6c13072910ab 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -154,6 +154,7 @@ struct neighbour {
struct hh_cache hh;
int (*output)(struct neighbour *, struct sk_buff *);
const struct neigh_ops  *ops;
+   struct list_headgc_list;
struct rcu_head rcu;
struct net_device   *dev;
u8  primary_key[0];
@@ -214,6 +215,8 @@ struct neigh_table {
struct timer_list   proxy_timer;
struct sk_buff_head proxy_queue;
atomic_tentries;
+   atomic_tgc_entries;
+   struct list_headgc_list;
rwlock_tlock;
unsigned long   last_rand;
struct neigh_statistics __percpu *stats;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6d479b5562be..c3b58712e98b 100644
--- a/net/core/neighbour.c
+++

Re: [PATCH net] ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output

2018-12-07 Thread David Miller

From: Shmulik Ladkani 
Date: Fri,  7 Dec 2018 09:50:17 +0200

> In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
> initialization.
> 
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and 
> injection with lwtunnels")
> Signed-off-by: Shmulik Ladkani 

Applied and queued up for -stable, thanks.

Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Andrew Lunn

> I think if you do this without setting offload_fwd_mark you will
> receive duplicate frame.

I don't think it will, at least not in the normal case. The hardware
should know the egress port, so there is no need to forward a copy to
the CPU. The only time it should forward to the CPU is when the egress
port is not known, so it floods. Without offload_fwd_mark set, the SW
bridge will flood it back out the ports causing duplication. But that
is not too bad. The Marvell driver did this for a while and nothing
bad was reported.

Andrew

Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut

On 12/07/2018 08:37 PM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = >ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, );
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always setup the membership to
> disable forwarding between ports.
> 
> When lan devices are setup they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not 
> to
> forward frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC 
> patches I
> submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive 
> duplicate
> frame.

Do you have a git tree with all the KSZ patches based on -next
somewhere, so I don't have to look for them in random MLs ?

-- 
Best regards,
Marek Vasut

Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn

> + dev->reset_gpio = -1;
> + reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
> +  _gpio_flags);
> + if (reset_gpio >= 0) {
> + flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
> + GPIOF_ACTIVE_LOW : 0;

Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.

You also need to update the binding documentation for this new
property.

Andrew

Re: [PATCH net-next] neighbour: Improve garbage collection

2018-12-07 Thread David Ahern

On 12/6/18 8:59 PM, David Miller wrote:
> But why do you need the on_gc_list boolean state? f

mental blockage.

v2 coming up.

Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Florian Fainelli

On 12/7/18 11:37 AM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = >ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, );
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always setup the membership to
> disable forwarding between ports.
> 
> When lan devices are setup they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not 
> to
> forward frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC 
> patches I
> submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive 
> duplicate
> frame.
> 

Either I am misreading Marek's patch, or I don't quite understand your
response, but what is happening when you enslave a switch port into a
bridge is that you need to make sure that:

- the switch port being enslaved will be part of the same forwarding
group as any other switch port already in the bridge
- any existing switch port already enslaved in the bridge must now also
be allowed to forward to the port that is being enslaved

That is to me, exactly what Marek's patch is fixing, your response is
about something slightly orthogonal here.
-- 
Florian

RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha

> If two ports are in the same bridge and in forwarding state, the packets
> must be able to pass between them in both directions. The current code
> only configures this bridge membership for a port newly added to the
> bridge, but does not update all the other ports. Thus, ingress packets
> on the new port will be forwarded, but ingress packets on other ports
> destined for the new port (eg. a reply) will not be forwarded back to
> the new port, because they are not configured to do so. This patch fixes
> that by updating the membership registers of all ports.
> 
> Signed-off-by: Marek Vasut 
> Cc: Vivien Didelot 
> Cc: Woojung Huh 
> Cc: David S. Miller 
> Cc: Tristram Ha 
> ---
>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> b/drivers/net/dsa/microchip/ksz9477.c
> index 0684657fbf9a9..e24dd14ccde77 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   struct ksz_device *dev = ds->priv;
>   struct ksz_port *p = >ports[port];
>   u8 data;
> - int member = -1;
> + int i, member = -1;
> 
>   ksz_pread8(dev, port, P_STP_CTRL, );
>   data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> PORT_LEARN_DISABLE);
> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   dev->tx_ports &= ~(1 << port);
> 
>   /* Port membership may share register with STP state. */
> - if (member >= 0 && member != p->member)
> - ksz9477_cfg_port_member(dev, port, (u8)member);
> + for (i = 0; i < SWITCH_PORT_NUM; i++)
> + ksz9477_cfg_port_member(dev, i, (u8)member);
> 
>   /* Check if forwarding needs to be updated. */
>   if (state != BR_STATE_FORWARDING) {

The original DSA model did not have a way to tell the bridge device not to
forward the frame, so the switch driver always setup the membership to
disable forwarding between ports.

When lan devices are setup they act like individual devices.  A bridge device
adding them under it will forward the frames.

The new switchdev model adds the offload_fwd_mark bit to tell the bridge not to
forward frame.

The ksz_update_port_member function in ksz_common.c is doing this membership
setup for all forwarding ports.  It was finally enabled in one of the RFC 
patches I
submitted recently (Add switch forward offloading support).

I think if you do this without setting offload_fwd_mark you will receive 
duplicate
frame.

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Russell King - ARM Linux

On Fri, Dec 07, 2018 at 11:03:12AM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [181207 18:14]:
> > Hi,
> > 
> > * Russell King - ARM Linux  [181207 18:01]:
> > > Hi Tony,
> > > 
> > > You know most of what's been going on from IRC, but here's the patch
> > > which gets me:
> > > 
> > > 1) working interrupts for networking
> > > 2) solves the stuck-wakeup problem
> > > 
> > > It also contains some of the debug bits I added.
> > 
> > This is excellent news :) Will test today.
> 
> Yes your patch seems to work great based on brief testing :)
> 
> > > I think what this means is that we should strip out ec0daae685b2
> > > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> > 
> > Yes the only reason for the wakeup quirk was the stuck wakeup
> > state seen on omap4, it can be just dropped if this works.
> > Adding Grygorii to Cc too.
> 
> I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
> Add level wakeup handling for omap4 based SoCs") shortly.

Hi,

You mentioned that edge mode didn't work as well as level mode on
duovero smsc controller, I think this may help to solve the same
issue but for edge IRQs - we need a mask_ack_irq function to avoid
acking while the edge interrupt is masked.  Let me know if that
lowers the smsc ping latency while in edge mode.

Thanks.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 3d021f648c5d..b1ad6098e894 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
+#define DEBUG
 #include 
 #include 
 #include 
@@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct gpio_bank 
*bank, int gpio,
  trigger & IRQ_TYPE_LEVEL_LOW);
omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
  trigger & IRQ_TYPE_LEVEL_HIGH);
+   /*
+* We need the edge detect enabled for the idle mode detection
+* to function on OMAP4430.
+*/
omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_RISING);
+ trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_FALLING);
+ trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
 
bank->context.leveldetect0 =
readl_relaxed(bank->base + bank->regs->leveldetect0);
@@ -899,6 +903,19 @@ static void omap_gpio_mask_irq(struct irq_data *d)
raw_spin_unlock_irqrestore(>lock, flags);
 }
 
+static void omap_gpio_mask_ack_irq(struct irq_data *d)
+{
+   struct gpio_bank *bank = omap_irq_data_get_bank(d);
+   unsigned offset = d->hwirq;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(>lock, flags);
+   omap_clear_gpio_irqstatus(bank, offset);
+   omap_set_gpio_irqenable(bank, offset, 0);
+   omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+   raw_spin_unlock_irqrestore(>lock, flags);
+}
+
 static void omap_gpio_unmask_irq(struct irq_data *d)
 {
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -910,14 +927,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
if (trigger)
omap_set_gpio_triggering(bank, offset, trigger);
 
+   omap_set_gpio_irqenable(bank, offset, 1);
+
/* For level-triggered GPIOs, the clearing must be done after
-* the HW source is cleared, thus after the handler has run */
-   if (bank->level_mask & BIT(offset)) {
-   omap_set_gpio_irqenable(bank, offset, 0);
+* the HW source is cleared, thus after the handler has run.
+* OMAP4 needs this done _after_ enabing the interrupt to clear
+* the wakeup status.
+*/
+   if (bank->level_mask & BIT(offset))
omap_clear_gpio_irqstatus(bank, offset);
-   }
 
-   omap_set_gpio_irqenable(bank, offset, 1);
raw_spin_unlock_irqrestore(>lock, flags);
 }
 
@@ -1377,6 +1396,7 @@ static int omap_gpio_probe(struct platform_device *pdev)
 
irqc->irq_startup = omap_gpio_irq_startup,
irqc->irq_shutdown = omap_gpio_irq_shutdown,
+   irqc->irq_mask_ack = omap_gpio_mask_ack_irq,
irqc->irq_ack = omap_gpio_ack_irq,
irqc->irq_mask = omap_gpio_mask_irq,
irqc->irq_unmask = omap_gpio_unmask_irq,
@@ -1520,6 +1540,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
struct device *dev = bank->chip.parent;
u32 l1 = 0, l2 = 0;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
if (bank->funcs.idle_enable_level_quirk)
bank->funcs.idle_enable_level_quirk(bank);
 
@@ -1553,6

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren

* Tony Lindgren  [181207 18:14]:
> Hi,
> 
> * Russell King - ARM Linux  [181207 18:01]:
> > Hi Tony,
> > 
> > You know most of what's been going on from IRC, but here's the patch
> > which gets me:
> > 
> > 1) working interrupts for networking
> > 2) solves the stuck-wakeup problem
> > 
> > It also contains some of the debug bits I added.
> 
> This is excellent news :) Will test today.

Yes your patch seems to work great based on brief testing :)

> > I think what this means is that we should strip out ec0daae685b2
> > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> 
> Yes the only reason for the wakeup quirk was the stuck wakeup
> state seen on omap4, it can be just dropped if this works.
> Adding Grygorii to Cc too.

I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
Add level wakeup handling for omap4 based SoCs") shortly.

Thanks,

Tony

Re: [PATCH bpf-next] bpf: relax verifier restriction on BPF_MOV | BPF_ALU

2018-12-07 Thread Alexei Starovoitov

On Fri, Dec 07, 2018 at 05:19:21PM +, Jiong Wang wrote:
> On 06/12/2018 03:13, Alexei Starovoitov wrote:
> > On Wed, Dec 05, 2018 at 03:32:50PM +, Jiong Wang wrote:
> > > On 05/12/2018 14:52, Edward Cree wrote:
> > > > On 05/12/18 09:46, Jiong Wang wrote:
> > > > > There is NO processed instruction number regression, either with or 
> > > > > without
> > > > > -mattr=+alu32.
> > > > 
> > > > > Cilium bpf
> > > > > ===
> > > > > bpf_lb-DLB_L3.o 2110/21101730/1733
> > > > That looks like a regression of 3 insns in the 32-bit case.
> > > > May be worth investigating why.
> > > 
> > > Will look into this.
> > > 
> > > > 
> > > > > +  dst_reg = insn->dst_reg;
> > > > > +  regs[dst_reg] = regs[src_reg];
> > > > > +  if (BPF_CLASS(insn->code) == BPF_ALU) {
> > > > > +  /* Update type and range info. */
> > > > > +  regs[dst_reg].type = SCALAR_VALUE;
> > > > > +  coerce_reg_to_size([dst_reg], 4);
> > > > Won't this break when handed a pointer (as root, so allowed to leak
> > > >   it)?  E.g. (pointer + x) gets turned into scalar x, rather than
> > > >   unknown scalar in range [0, 0x].
> > > 
> > > Initially I was gating this to scalar_value only, later was thinking it
> > > could be extended to ptr case if ptr leak is allowed.
> > > 
> > > But, your comment remind me min/max value doesn't mean real min/max value
> > > for ptr types value, it means the offset only if I am understanding the
> > > issue correctly. So, it will break pointer case.
> > 
> > correct. In case of is_pointer_value() && root -> mark_reg_unknown() has to 
> > be called
> > 
> > The explanation of additional 3 steps from another email makes sense to me.
> > 
> > Can you take a look why it helps default (llvm alu64) case too ?
> > bpf_overlay.o   3096/2898
> 
> It is embarrassing that I am not able to reproduce this number after tried
> quite a few env configurations. I think the number must be wrong because
> llvm alu64 binary doesn't contain alu32 move so shouldn't be impacted by
> this patch even though I double checked the raw data I collected on llvm
> alu64, re-calculated the number before this patch, it is still 3096. I
> guess there must be something wrong with the binary I was loading.
> 
> I improved my benchmarking methodology to build all alu64 and alu32
> binaries first, and never change them later. Then used a script to load and
> collect the processed number. (borrowed the script from
> https://github.com/4ast/bpf_cilium_test/, only my binaries are built from
> latest Cilium repo and contains alu32 version as well)
> 
> I ran this new benchmarking env for several times, and could get the
> following new results consistently:
> 
> bpf_lb-DLB_L3.o:2085/2085 1685/1687
> bpf_lb-DLB_L4.o:2287/2287 1986/1982
> bpf_lb-DUNKNOWN.o:  690/690   622/622
> bpf_lxc.o:  95033/95033   N/A
> bpf_netdev.o:   7245/7245 N/A
> bpf_overlay.o:  2898/2898 3085/2947
> 
> No change on alu64 binary.
> 
> For alu32, bpf_overlay.o still get fewer processed instruction number, this
> is because there is the following sequence (and another similar one).
> Before this patch, r2 at insn 139 is unknown, so verifier always explore
> both path-taken and path-fall_through. After this patch, it explores
> path-fall_through only, so saved some insns.
> 
>   129: (b4) (u32) r7 = (u32) -140
>   ...
>   136: (bc) (u32) r2 = (u32) r7
>   137: (74) (u32) r2 >>= (u32) 31
>   138: (4c) (u32) r2 |= (u32) r1
>   139: (15) if r2 == 0x0 goto pc+342
>   140: (b4) (u32) r1 = (u32) 2
> 
> And a permissive register value for r2 hasn't released more path prune for
> this test, so in all, after this patch, there is fewer processed insn.
> 
> I have sent out a v2, gated this change under SCALAR_VALUE, and also
> updated the patch description.

Thanks for the update. Makes sense.

[PATCH 5/5] net: dsa: ksz: Add Microchip KSZ8795 DSA driver

2018-12-07 Thread Marek Vasut

From: Tristram Ha 

Add Microchip KSZ8795 DSA driver.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 drivers/net/dsa/microchip/Kconfig   |   17 +
 drivers/net/dsa/microchip/Makefile  |2 +
 drivers/net/dsa/microchip/ksz8795.c | 1351 +++
 drivers/net/dsa/microchip/ksz8795_reg.h | 1016 +
 drivers/net/dsa/microchip/ksz8795_spi.c |  166 +++
 drivers/net/dsa/microchip/ksz_priv.h|1 +
 6 files changed, 2553 insertions(+)
 create mode 100644 drivers/net/dsa/microchip/ksz8795.c
 create mode 100644 drivers/net/dsa/microchip/ksz8795_reg.h
 create mode 100644 drivers/net/dsa/microchip/ksz8795_spi.c

diff --git a/drivers/net/dsa/microchip/Kconfig 
b/drivers/net/dsa/microchip/Kconfig
index bea29fde9f3d1..d17aec084c8c7 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -14,3 +14,20 @@ config NET_DSA_MICROCHIP_KSZ9477_SPI
depends on NET_DSA_MICROCHIP_KSZ9477 && SPI
help
  Select to enable support for registering switches configured through 
SPI.
+
+menuconfig NET_DSA_MICROCHIP_KSZ8795
+   tristate "Microchip KSZ8795 series switch support"
+   depends on NET_DSA
+   select NET_DSA_TAG_KSZ8795
+   select NET_DSA_MICROCHIP_KSZ_COMMON
+   help
+ This driver adds support for Microchip KSZ8795 switch chips.
+
+config NET_DSA_MICROCHIP_KSZ8795_SPI
+   tristate "KSZ8795 series SPI connected switch driver"
+   depends on NET_DSA_MICROCHIP_KSZ8795 && SPI
+   help
+ This driver accesses KSZ8795 chip through SPI.
+
+ It is required to use the KSZ8795 switch driver as the only access
+ is through SPI.
diff --git a/drivers/net/dsa/microchip/Makefile 
b/drivers/net/dsa/microchip/Makefile
index 3142c18b8f573..18ab64172e0bb 100644
--- a/drivers/net/dsa/microchip/Makefile
+++ b/drivers/net/dsa/microchip/Makefile
@@ -1,3 +1,5 @@
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ_COMMON) += ksz_common.o
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ9477)+= ksz9477.o
 obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ9477_SPI)+= ksz9477_spi.o
+obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ8795)+= ksz8795.o
+obj-$(CONFIG_NET_DSA_MICROCHIP_KSZ8795_SPI)+= ksz8795_spi.o
diff --git a/drivers/net/dsa/microchip/ksz8795.c 
b/drivers/net/dsa/microchip/ksz8795.c
new file mode 100644
index 0..4ba31794a13a1
--- /dev/null
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -0,0 +1,1351 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Microchip KSZ8795 switch driver
+ *
+ * Copyright (C) 2017 Microchip Technology Inc.
+ * Tristram Ha 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ksz_priv.h"
+#include "ksz_common.h"
+#include "ksz8795_reg.h"
+
+static const struct {
+   char string[ETH_GSTRING_LEN];
+} mib_names[TOTAL_SWITCH_COUNTER_NUM] = {
+   { "rx_hi" },
+   { "rx_undersize" },
+   { "rx_fragments" },
+   { "rx_oversize" },
+   { "rx_jabbers" },
+   { "rx_symbol_err" },
+   { "rx_crc_err" },
+   { "rx_align_err" },
+   { "rx_mac_ctrl" },
+   { "rx_pause" },
+   { "rx_bcast" },
+   { "rx_mcast" },
+   { "rx_ucast" },
+   { "rx_64_or_less" },
+   { "rx_65_127" },
+   { "rx_128_255" },
+   { "rx_256_511" },
+   { "rx_512_1023" },
+   { "rx_1024_1522" },
+   { "rx_1523_2000" },
+   { "rx_2001" },
+   { "tx_hi" },
+   { "tx_late_col" },
+   { "tx_pause" },
+   { "tx_bcast" },
+   { "tx_mcast" },
+   { "tx_ucast" },
+   { "tx_deferred" },
+   { "tx_total_col" },
+   { "tx_exc_col" },
+   { "tx_single_col" },
+   { "tx_mult_col" },
+   { "rx_total" },
+   { "tx_total" },
+   { "rx_discards" },
+   { "tx_discards" },
+};
+
+static int ksz8795_reset_switch(struct ksz_device *dev)
+{
+   /* reset switch */
+   ksz_write8(dev, REG_POWER_MANAGEMENT_1,
+  SW_SOFTWARE_POWER_DOWN << SW_POWER_MANAGEMENT_MODE_S);
+   ksz_write8(dev, REG_POWER_MANAGEMENT_1, 0);
+
+   return 0;
+}
+
+static void ksz8795_set_prio_queue(struct ksz_device *dev, int port, int queue)
+{
+   u8 hi;
+   u8 lo;
+
+   /* Number of queues can only be 1, 2, or 4. */
+   switch (queue) {
+   case 4:
+   case 3:
+   queue = PORT_QUEUE_SPLIT_4;
+   break;
+   case 2:
+   queue = PORT_QUEUE_SPLIT_2;
+   break;
+   default:
+   queue = PORT_QUEUE_SPLIT_1;
+   }
+   ksz_pread8(dev, port, REG_PORT_CTRL_0, );
+   ksz_pread8(dev, port, P_DROP_TAG_CTRL, );
+   lo &= ~PORT_QUEUE_SPLIT_L;
+   if (queue & PORT_QUEUE_SPLIT_2)
+   lo |= PORT_QUEUE_SPLIT_L;
+   hi &= ~PORT_QUEUE_SPLIT_H;
+   if (queue & PORT_QUEUE_SPLIT_4)
+   hi |=

[PATCH 2/5] net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477

2018-12-07 Thread Marek Vasut

From: Tristram Ha 

Rename the tag Kconfig option and related macros in preparation for
addition of new KSZ family switches with different tag formats.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 drivers/net/dsa/microchip/Kconfig   | 2 +-
 drivers/net/dsa/microchip/ksz9477.c | 2 +-
 include/net/dsa.h   | 2 +-
 net/dsa/Kconfig | 4 
 net/dsa/dsa.c   | 4 ++--
 net/dsa/dsa_priv.h  | 2 +-
 net/dsa/tag_ksz.c   | 2 +-
 7 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/microchip/Kconfig 
b/drivers/net/dsa/microchip/Kconfig
index a8caf9249d50f..bea29fde9f3d1 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -4,7 +4,7 @@ config NET_DSA_MICROCHIP_KSZ_COMMON
 menuconfig NET_DSA_MICROCHIP_KSZ9477
tristate "Microchip KSZ9477 series switch support"
depends on NET_DSA
-   select NET_DSA_TAG_KSZ
+   select NET_DSA_TAG_KSZ9477
select NET_DSA_MICROCHIP_KSZ_COMMON
help
  This driver adds support for Microchip KSZ9477 switch chips.
diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index ace8f2e3c781d..547bfb097dee7 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -330,7 +330,7 @@ static void ksz9477_port_init_cnt(struct ksz_device *dev, 
int port)
 static enum dsa_tag_protocol ksz9477_get_tag_protocol(struct dsa_switch *ds,
  int port)
 {
-   return DSA_TAG_PROTO_KSZ;
+   return DSA_TAG_PROTO_KSZ9477;
 }
 
 static int ksz9477_phy_read16(struct dsa_switch *ds, int addr, int reg)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7a03274ea981b..dff6afc22ab1e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -35,7 +35,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_BRCM_PREPEND,
DSA_TAG_PROTO_DSA,
DSA_TAG_PROTO_EDSA,
-   DSA_TAG_PROTO_KSZ,
+   DSA_TAG_PROTO_KSZ9477,
DSA_TAG_PROTO_LAN9303,
DSA_TAG_PROTO_MTK,
DSA_TAG_PROTO_QCA,
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 4183e4ba27a50..8cdf73a31374e 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -41,6 +41,10 @@ config NET_DSA_TAG_EDSA
 config NET_DSA_TAG_KSZ
bool
 
+config NET_DSA_TAG_KSZ9477
+   bool
+   select NET_DSA_TAG_KSZ
+
 config NET_DSA_TAG_LAN9303
bool
 
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 9f3209ff7ffde..4d4a381367d4d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -52,8 +52,8 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_EDSA
[DSA_TAG_PROTO_EDSA] = _netdev_ops,
 #endif
-#ifdef CONFIG_NET_DSA_TAG_KSZ
-   [DSA_TAG_PROTO_KSZ] = _netdev_ops,
+#ifdef CONFIG_NET_DSA_TAG_KSZ9477
+   [DSA_TAG_PROTO_KSZ9477] = _netdev_ops,
 #endif
 #ifdef CONFIG_NET_DSA_TAG_LAN9303
[DSA_TAG_PROTO_LAN9303] = _netdev_ops,
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 3964c6f7a7c0d..b48b533294544 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -206,7 +206,7 @@ extern const struct dsa_device_ops dsa_netdev_ops;
 extern const struct dsa_device_ops edsa_netdev_ops;
 
 /* tag_ksz.c */
-extern const struct dsa_device_ops ksz_netdev_ops;
+extern const struct dsa_device_ops ksz9477_netdev_ops;
 
 /* tag_lan9303.c */
 extern const struct dsa_device_ops lan9303_netdev_ops;
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index cad4406d9d4c2..036bc62198f28 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -96,7 +96,7 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct 
net_device *dev,
return skb;
 }
 
-const struct dsa_device_ops ksz_netdev_ops = {
+const struct dsa_device_ops ksz9477_netdev_ops = {
.xmit   = ksz_xmit,
.rcv= ksz_rcv,
.overhead = KSZ_INGRESS_TAG_LEN,
-- 
2.18.0

[PATCH 1/5] net: dsa: ksz: Add MIB counter reading support

2018-12-07 Thread Marek Vasut

From: Tristram Ha 

Add MIB counter reading support to KSZ9477 driver. This makes the MIB
counter code more generic by removing the TOTAL_SWITCH_COUNTER_NUM and
instead making that configurable per switch model.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 drivers/net/dsa/microchip/ksz9477.c| 129 +
 drivers/net/dsa/microchip/ksz_common.c | 118 ++
 drivers/net/dsa/microchip/ksz_common.h |   4 +
 drivers/net/dsa/microchip/ksz_priv.h   |   8 +-
 4 files changed, 214 insertions(+), 45 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index e24dd14ccde77..ace8f2e3c781d 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -259,6 +259,74 @@ static int ksz9477_reset_switch(struct ksz_device *dev)
return 0;
 }
 
+static void ksz9477_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
+ u64 *cnt)
+{
+   u32 data;
+   int timeout;
+
+   /* retain the flush/freeze bit */
+   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4, );
+   data &= MIB_COUNTER_FLUSH_FREEZE;
+
+   data |= MIB_COUNTER_READ;
+   data |= (addr << MIB_COUNTER_INDEX_S);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data);
+
+   timeout = 1000;
+   do {
+   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
+   );
+   usleep_range(1, 10);
+   if (!(data & MIB_COUNTER_READ))
+   break;
+   } while (timeout-- > 0);
+
+   /* failed to read MIB. get out of loop */
+   if (!timeout) {
+   dev_dbg(dev->dev, "Failed to get MIB\n");
+   return;
+   }
+
+   /* count resets upon read */
+   ksz_pread32(dev, port, REG_PORT_MIB_DATA, );
+   *cnt += data;
+}
+
+static void ksz9477_r_mib_pkt(struct ksz_device *dev, int port, u16 addr,
+ u64 *dropped, u64 *cnt)
+{
+   addr = mib_names[addr].index;
+   ksz9477_r_mib_cnt(dev, port, addr, cnt);
+}
+
+static void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze)
+{
+   /* enable the port for flush/freeze function */
+   if (freeze)
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
+MIB_COUNTER_FLUSH_FREEZE);
+   ksz_cfg(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FREEZE, freeze);
+
+   /* disable the port after freeze is done */
+   if (!freeze)
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, 0);
+}
+
+static void ksz9477_port_init_cnt(struct ksz_device *dev, int port)
+{
+   struct ksz_port_mib *mib = >ports[port].mib;
+
+   /* flush all enabled port MIB counters */
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
+MIB_COUNTER_FLUSH_FREEZE);
+   ksz_write8(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FLUSH);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, 0);
+
+   mib->cnt_ptr = 0;
+   memset(mib->counters, 0, dev->mib_cnt * sizeof(u64));
+}
+
 static enum dsa_tag_protocol ksz9477_get_tag_protocol(struct dsa_switch *ds,
  int port)
 {
@@ -342,47 +410,6 @@ static void ksz9477_get_strings(struct dsa_switch *ds, int 
port,
}
 }
 
-static void ksz_get_ethtool_stats(struct dsa_switch *ds, int port,
- uint64_t *buf)
-{
-   struct ksz_device *dev = ds->priv;
-   int i;
-   u32 data;
-   int timeout;
-
-   mutex_lock(>stats_mutex);
-
-   for (i = 0; i < TOTAL_SWITCH_COUNTER_NUM; i++) {
-   data = MIB_COUNTER_READ;
-   data |= ((ksz9477_mib_names[i].index & 0xFF) <<
-   MIB_COUNTER_INDEX_S);
-   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data);
-
-   timeout = 1000;
-   do {
-   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
-   );
-   usleep_range(1, 10);
-   if (!(data & MIB_COUNTER_READ))
-   break;
-   } while (timeout-- > 0);
-
-   /* failed to read MIB. get out of loop */
-   if (!timeout) {
-   dev_dbg(dev->dev, "Failed to get MIB\n");
-   break;
-   }
-
-   /* count resets upon read */
-   ksz_pread32(dev, port, REG_PORT_MIB_DATA, );
-
-   dev->mib_value[i] += (uint64_t)data;
-   buf[i] = dev->mib_value[i];
-   }
-
-   mutex_unlock(>stats_mutex);
-}
-
 static void ksz9477_cfg_port_member(struct ksz_device *dev, int port,
u8 member)
 {
@@ -969,6 +996,17 @@ static void ksz9477_port_mirror_del(struct dsa_switch *ds,

[PATCH 4/5] net: dsa: ksz: Add KSZ8795 tag code

2018-12-07 Thread Marek Vasut

From: Tristram Ha 

Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler
and the tag is only 1 byte, instead of 2 as is the case with KSZ9477.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 include/net/dsa.h  |  1 +
 net/dsa/Kconfig|  4 
 net/dsa/dsa.c  |  3 +++
 net/dsa/dsa_priv.h |  1 +
 net/dsa/tag_ksz.c  | 59 ++
 5 files changed, 68 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index dff6afc22ab1e..7aee9c33ceb78 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -35,6 +35,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_BRCM_PREPEND,
DSA_TAG_PROTO_DSA,
DSA_TAG_PROTO_EDSA,
+   DSA_TAG_PROTO_KSZ8795,
DSA_TAG_PROTO_KSZ9477,
DSA_TAG_PROTO_LAN9303,
DSA_TAG_PROTO_MTK,
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 8cdf73a31374e..0059237d50ec3 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -41,6 +41,10 @@ config NET_DSA_TAG_EDSA
 config NET_DSA_TAG_KSZ
bool
 
+config NET_DSA_TAG_KSZ8795
+   bool
+   select NET_DSA_TAG_KSZ
+
 config NET_DSA_TAG_KSZ9477
bool
select NET_DSA_TAG_KSZ
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 4d4a381367d4d..bf009192b0d53 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -52,6 +52,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_EDSA
[DSA_TAG_PROTO_EDSA] = _netdev_ops,
 #endif
+#ifdef CONFIG_NET_DSA_TAG_KSZ8795
+   [DSA_TAG_PROTO_KSZ8795] = _netdev_ops,
+#endif
 #ifdef CONFIG_NET_DSA_TAG_KSZ9477
[DSA_TAG_PROTO_KSZ9477] = _netdev_ops,
 #endif
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b48b533294544..903c5118d5cbf 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -206,6 +206,7 @@ extern const struct dsa_device_ops dsa_netdev_ops;
 extern const struct dsa_device_ops edsa_netdev_ops;
 
 /* tag_ksz.c */
+extern const struct dsa_device_ops ksz8795_netdev_ops;
 extern const struct dsa_device_ops ksz9477_netdev_ops;
 
 /* tag_lan9303.c */
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index d94bad1ab7e53..ee05629b5a342 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -88,6 +88,65 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct 
net_device *dev,
return skb;
 }
 
+/*
+ * For Ingress (Host -> KSZ8795), 1 byte is added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag(1byte)|FCS(4bytes)
+ * ---
+ * tag : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
+ *
+ * For Egress (KSZ8795 -> Host), 1 byte is added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : zero-based value represents port
+ *   (eg, 0x00=port1, 0x02=port3, 0x06=port7)
+ */
+
+#defineKSZ8795_INGRESS_TAG_LEN 1
+
+#define KSZ8795_TAIL_TAG_OVERRIDE  BIT(6)
+#define KSZ8795_TAIL_TAG_LOOKUPBIT(7)
+
+static void ksz8795_get_tag(u8 *tag, unsigned int *port, unsigned int *len)
+{
+   *port = tag[0] & 7;
+   *len = KSZ_EGRESS_TAG_LEN;
+}
+
+static void ksz8795_set_tag(struct sk_buff *skb, struct net_device *dev)
+{
+   struct dsa_port *dp = dsa_slave_to_port(dev);
+   u8 *tag = skb_put(skb, KSZ8795_INGRESS_TAG_LEN);
+   u8 *addr = skb_mac_header(skb);
+
+   *tag = 1 << dp->index;
+   if (!memcmp(addr, special_mult_addr, ETH_ALEN))
+   *tag |= KSZ8795_TAIL_TAG_OVERRIDE;
+}
+
+static struct ksz_tag_ops ksz8795_tag_ops = {
+   .get_tag= ksz8795_get_tag,
+   .set_tag= ksz8795_set_tag,
+};
+
+static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device 
*dev)
+{
+   return ksz_xmit(skb, dev, _tag_ops, KSZ8795_INGRESS_TAG_LEN);
+}
+
+static struct sk_buff *ksz8795_rcv(struct sk_buff *skb, struct net_device *dev,
+ struct packet_type *pt)
+{
+   return ksz_rcv(skb, dev, pt, _tag_ops);
+}
+
+const struct dsa_device_ops ksz8795_netdev_ops = {
+   .xmit   = ksz8795_xmit,
+   .rcv= ksz8795_rcv,
+   .overhead = KSZ8795_INGRESS_TAG_LEN,
+};
+
 /*
  * For Ingress (Host -> KSZ9477), 2 bytes are added before FCS.
  * ---
-- 
2.18.0

[PATCH 3/5] net: dsa: ksz: Factor out common tag code

2018-12-07 Thread Marek Vasut

From: Tristram Ha 

Factor out common code from the tag_ksz , so that the code can be used
with other KSZ family switches which use differenly sized tags.

Signed-off-by: Tristram Ha 
Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
---
 net/dsa/tag_ksz.c | 125 +-
 1 file changed, 90 insertions(+), 35 deletions(-)

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 036bc62198f28..d94bad1ab7e53 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -14,34 +14,30 @@
 #include 
 #include "dsa_priv.h"
 
-/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
- * ---
- * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
- * ---
- * tag0 : Prioritization (not used now)
- * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
- *
- * For Egress (KSZ -> Host), 1 byte is added before FCS.
- * ---
- * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
- * ---
- * tag0 : zero-based value represents port
- *   (eg, 0x00=port1, 0x02=port3, 0x06=port7)
- */
+struct ksz_tag_ops {
+   void(*get_tag)(u8 *tag, unsigned int *port, unsigned int *len);
+   void(*set_tag)(struct sk_buff *skb, struct net_device *dev);
+};
 
-#defineKSZ_INGRESS_TAG_LEN 2
-#defineKSZ_EGRESS_TAG_LEN  1
+/* Typically only one byte is used for tail tag. */
+#define KSZ_EGRESS_TAG_LEN 1
 
-static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
+/* Frames with following addresse may need to be sent even when the port is
+ * closed.
+ */
+static const u8 special_mult_addr[] = {
+   0x01, 0x80, 0xC2, 0x00, 0x00, 0x00
+};
+
+static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev,
+   struct ksz_tag_ops *ops, int len)
 {
-   struct dsa_port *dp = dsa_slave_to_port(dev);
struct sk_buff *nskb;
int padlen;
-   u8 *tag;
 
padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb->len;
 
-   if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
+   if (skb_tailroom(skb) >= padlen + len) {
/* Let dsa_slave_xmit() free skb */
if (__skb_put_padto(skb, skb->len + padlen, false))
return NULL;
@@ -49,7 +45,7 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
nskb = skb;
} else {
nskb = alloc_skb(NET_IP_ALIGN + skb->len +
-padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
+padlen + len, GFP_ATOMIC);
if (!nskb)
return NULL;
skb_reserve(nskb, NET_IP_ALIGN);
@@ -70,34 +66,93 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
consume_skb(skb);
}
 
-   tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-   tag[0] = 0;
-   tag[1] = 1 << dp->index; /* destination port */
+   ops->set_tag(nskb, dev);
 
return nskb;
 }
 
 static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
-  struct packet_type *pt)
+  struct packet_type *pt, struct ksz_tag_ops *ops)
 {
-   u8 *tag;
-   int source_port;
+   int sp, len;
+   u8 *tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
 
-   tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
+   ops->get_tag(tag, , );
 
-   source_port = tag[0] & 7;
-
-   skb->dev = dsa_master_find_slave(dev, 0, source_port);
+   skb->dev = dsa_master_find_slave(dev, 0, sp);
if (!skb->dev)
return NULL;
 
-   pskb_trim_rcsum(skb, skb->len - KSZ_EGRESS_TAG_LEN);
+   pskb_trim_rcsum(skb, skb->len - len);
 
return skb;
 }
 
+/*
+ * For Ingress (Host -> KSZ9477), 2 bytes are added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : Prioritization (not used now)
+ * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
+ *
+ * For Egress (KSZ9477 -> Host), 1 byte is added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : zero-based value represents port
+ *   (eg, 0x00=port1,

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren

Hi,

* Russell King - ARM Linux  [181207 18:01]:
> Hi Tony,
> 
> You know most of what's been going on from IRC, but here's the patch
> which gets me:
> 
> 1) working interrupts for networking
> 2) solves the stuck-wakeup problem
> 
> It also contains some of the debug bits I added.

This is excellent news :) Will test today.

> I think what this means is that we should strip out ec0daae685b2
> ("gpio: omap: Add level wakeup handling for omap4 based SoCs").

Yes the only reason for the wakeup quirk was the stuck wakeup
state seen on omap4, it can be just dropped if this works.
Adding Grygorii to Cc too.

Regards,

Tony

> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index 3d021f648c5d..528ffd1b9832 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -11,7 +11,7 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> -
> +#define DEBUG
>  #include 
>  #include 
>  #include 
> @@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct 
> gpio_bank *bank, int gpio,
> trigger & IRQ_TYPE_LEVEL_LOW);
>   omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
> trigger & IRQ_TYPE_LEVEL_HIGH);
> + /*
> +  * We need the edge detect enabled for the idle mode detection
> +  * to function on OMAP4430.
> +  */
>   omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
> -   trigger & IRQ_TYPE_EDGE_RISING);
> +   trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
>   omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
> -   trigger & IRQ_TYPE_EDGE_FALLING);
> +   trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
>  
>   bank->context.leveldetect0 =
>   readl_relaxed(bank->base + bank->regs->leveldetect0);
> @@ -910,14 +914,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
>   if (trigger)
>   omap_set_gpio_triggering(bank, offset, trigger);
>  
> + omap_set_gpio_irqenable(bank, offset, 1);
> +
>   /* For level-triggered GPIOs, the clearing must be done after
> -  * the HW source is cleared, thus after the handler has run */
> - if (bank->level_mask & BIT(offset)) {
> - omap_set_gpio_irqenable(bank, offset, 0);
> +  * the HW source is cleared, thus after the handler has run.
> +  * OMAP4 needs this done _after_ enabing the interrupt to clear
> +  * the wakeup status.
> +  */
> + if (bank->level_mask & BIT(offset))
>   omap_clear_gpio_irqstatus(bank, offset);
> - }
>  
> - omap_set_gpio_irqenable(bank, offset, 1);
>   raw_spin_unlock_irqrestore(>lock, flags);
>  }
>  
> @@ -1520,6 +1526,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, 
> bool may_lose_context)
>   struct device *dev = bank->chip.parent;
>   u32 l1 = 0, l2 = 0;
>  
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
> +
>   if (bank->funcs.idle_enable_level_quirk)
>   bank->funcs.idle_enable_level_quirk(bank);
>  
> @@ -1553,6 +1563,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, 
> bool may_lose_context)
>   bank->get_context_loss_count(dev);
>  
>   omap_gpio_dbck_disable(bank);
> +
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
>  }
>  
>  static void omap_gpio_init_context(struct gpio_bank *p);
> @@ -1563,6 +1577,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
>   u32 l = 0, gen, gen0, gen1;
>   int c;
>  
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
> +
>   /*
>* On the first resume during the probe, the context has not
>* been initialised and so initialise it now. Also initialise
> @@ -1648,6 +1666,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
>   }
>  
>   bank->workaround_enabled = false;
> +
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
>  }
>  
>  static void omap_gpio_init_context(struct gpio_bank *p)
> @@ -1720,6 +1742,7 @@ static int __maybe_unused 
> omap_gpio_runtime_suspend(struct device *dev)
>   error = -EBUSY;
>   goto unlock;
>   }
> + dev_dbg(dev, "%s()\n", __func__);
>   omap_gpio_idle(bank, true);
>   bank->is_suspended = true;
>  unlock:
> @@ -1741,6 +1764,7 @@ static int __maybe_unused 
> omap_gpio_runtime_resume(struct device *dev)
>

I wait to hear from you.

2018-12-07 Thread Mr. David Abraham

My Greeting, How are you today?Did you receive the letter i sent to
you. Please answer me.
Best Regard,
Mr.David Abraham.

RE: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ioana Ciocoi Radulescu

> -Original Message-
> From: Ilias Apalodimas 
> Sent: Friday, December 7, 2018 7:52 PM
> To: Ioana Ciocoi Radulescu 
> Cc: Jesper Dangaard Brouer ;
> netdev@vger.kernel.org; da...@davemloft.net; Ioana Ciornei
> ; dsah...@gmail.com; Camelia Alexandra Groza
> 
> Subject: Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support
> 
> Hi Ioana,
> > > > >
> > > I only did a quick grep around the driver so i might be missing something,
> > > but i can only see allocations via napi_alloc_frag(). XDP requires pages
> > > (either a single page per packet or a driver that does the page
> management
> > > of
> > > its own and fits 2 frames in a single page, assuming 4kb pages).
> > > Am i missing something on the driver?
> >
> > No, I guess I'm the one missing stuff, I didn't realise single page per 
> > packet
> > is a hard requirement for XDP. Could you point me to more info on this?
> >
> 
> Well if you don't have to use 64kb pages you can use the page_pool API (only
> used from mlx5 atm) and get the xdp recycling for free. The memory 'waste'
> for
> 4kb pages isn't too much if the platforms the driver sits on have decent
> amounts
> of memory  (and the number of descriptors used is not too high).
> We still have work in progress with Jesper (just posted an RFC)with
> improvements
> on the API.
> Using it is fairly straightforward. This is a patchset on marvell's mvneta
> driver with the API changes needed:
> https://www.spinics.net/lists/netdev/msg538285.html
> 
> If you need 64kb pages you would have to introduce page recycling and
> sharing
> like intel/mlx drivers on your driver.

Thanks a lot for the info, will look into this. Do you have any pointers
as to why the full page restriction exists in the first place? Sorry if it's
a dumb question, but I haven't found details on this and I'd really like
to understand it.

Thanks
Ioana

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Russell King - ARM Linux

Hi Tony,

You know most of what's been going on from IRC, but here's the patch
which gets me:

1) working interrupts for networking
2) solves the stuck-wakeup problem

It also contains some of the debug bits I added.

I think what this means is that we should strip out ec0daae685b2
("gpio: omap: Add level wakeup handling for omap4 based SoCs").

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 3d021f648c5d..528ffd1b9832 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
+#define DEBUG
 #include 
 #include 
 #include 
@@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct gpio_bank 
*bank, int gpio,
  trigger & IRQ_TYPE_LEVEL_LOW);
omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
  trigger & IRQ_TYPE_LEVEL_HIGH);
+   /*
+* We need the edge detect enabled for the idle mode detection
+* to function on OMAP4430.
+*/
omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_RISING);
+ trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_FALLING);
+ trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
 
bank->context.leveldetect0 =
readl_relaxed(bank->base + bank->regs->leveldetect0);
@@ -910,14 +914,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
if (trigger)
omap_set_gpio_triggering(bank, offset, trigger);
 
+   omap_set_gpio_irqenable(bank, offset, 1);
+
/* For level-triggered GPIOs, the clearing must be done after
-* the HW source is cleared, thus after the handler has run */
-   if (bank->level_mask & BIT(offset)) {
-   omap_set_gpio_irqenable(bank, offset, 0);
+* the HW source is cleared, thus after the handler has run.
+* OMAP4 needs this done _after_ enabing the interrupt to clear
+* the wakeup status.
+*/
+   if (bank->level_mask & BIT(offset))
omap_clear_gpio_irqstatus(bank, offset);
-   }
 
-   omap_set_gpio_irqenable(bank, offset, 1);
raw_spin_unlock_irqrestore(>lock, flags);
 }
 
@@ -1520,6 +1526,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
struct device *dev = bank->chip.parent;
u32 l1 = 0, l2 = 0;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
if (bank->funcs.idle_enable_level_quirk)
bank->funcs.idle_enable_level_quirk(bank);
 
@@ -1553,6 +1563,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
bank->get_context_loss_count(dev);
 
omap_gpio_dbck_disable(bank);
+
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
 }
 
 static void omap_gpio_init_context(struct gpio_bank *p);
@@ -1563,6 +1577,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
u32 l = 0, gen, gen0, gen1;
int c;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
/*
 * On the first resume during the probe, the context has not
 * been initialised and so initialise it now. Also initialise
@@ -1648,6 +1666,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
}
 
bank->workaround_enabled = false;
+
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
 }
 
 static void omap_gpio_init_context(struct gpio_bank *p)
@@ -1720,6 +1742,7 @@ static int __maybe_unused 
omap_gpio_runtime_suspend(struct device *dev)
error = -EBUSY;
goto unlock;
}
+   dev_dbg(dev, "%s()\n", __func__);
omap_gpio_idle(bank, true);
bank->is_suspended = true;
 unlock:
@@ -1741,6 +1764,7 @@ static int __maybe_unused omap_gpio_runtime_resume(struct 
device *dev)
error = -EBUSY;
goto unlock;
}
+   dev_dbg(dev, "%s()\n", __func__);
omap_gpio_unidle(bank);
bank->is_suspended = false;
 unlock:
@@ -1827,8 +1851,8 @@ static const struct omap_gpio_platform_data omap4_pdata = 
{
.regs = _gpio_regs,
.bank_width = 32,
.dbck_flag = true,
-   .quirks =

Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ilias Apalodimas

Hi Ioana,
> > > >
> > I only did a quick grep around the driver so i might be missing something,
> > but i can only see allocations via napi_alloc_frag(). XDP requires pages
> > (either a single page per packet or a driver that does the page management
> > of
> > its own and fits 2 frames in a single page, assuming 4kb pages).
> > Am i missing something on the driver?
> 
> No, I guess I'm the one missing stuff, I didn't realise single page per packet
> is a hard requirement for XDP. Could you point me to more info on this?
> 

Well if you don't have to use 64kb pages you can use the page_pool API (only
used from mlx5 atm) and get the xdp recycling for free. The memory 'waste' for
4kb pages isn't too much if the platforms the driver sits on have decent amounts
of memory  (and the number of descriptors used is not too high).
We still have work in progress with Jesper (just posted an RFC)with improvements
on the API.
Using it is fairly straightforward. This is a patchset on marvell's mvneta
driver with the API changes needed: 
https://www.spinics.net/lists/netdev/msg538285.html

If you need 64kb pages you would have to introduce page recycling and sharing 
like intel/mlx drivers on your driver.

/Ilias

[PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut

If two ports are in the same bridge and in forwarding state, the packets
must be able to pass between them in both directions. The current code
only configures this bridge membership for a port newly added to the
bridge, but does not update all the other ports. Thus, ingress packets
on the new port will be forwarded, but ingress packets on other ports
destined for the new port (eg. a reply) will not be forwarded back to
the new port, because they are not configured to do so. This patch fixes
that by updating the membership registers of all ports.

Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
Cc: Tristram Ha 
---
 drivers/net/dsa/microchip/ksz9477.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index 0684657fbf9a9..e24dd14ccde77 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct dsa_switch 
*ds, int port,
struct ksz_device *dev = ds->priv;
struct ksz_port *p = >ports[port];
u8 data;
-   int member = -1;
+   int i, member = -1;
 
ksz_pread8(dev, port, P_STP_CTRL, );
data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE | PORT_LEARN_DISABLE);
@@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct dsa_switch 
*ds, int port,
dev->tx_ports &= ~(1 << port);
 
/* Port membership may share register with STP state. */
-   if (member >= 0 && member != p->member)
-   ksz9477_cfg_port_member(dev, port, (u8)member);
+   for (i = 0; i < SWITCH_PORT_NUM; i++)
+   ksz9477_cfg_port_member(dev, i, (u8)member);
 
/* Check if forwarding needs to be updated. */
if (state != BR_STATE_FORWARDING) {
-- 
2.18.0

[PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut

Add code to handle optional reset GPIO in the KSZ switch driver. The switch
has a reset GPIO line which can be controlled by the CPU, so make sure it is
configured correctly in such setups.

Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
Cc: Tristram Ha 
---
 drivers/net/dsa/microchip/ksz_common.c | 26 ++
 drivers/net/dsa/microchip/ksz_priv.h   |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz_common.c 
b/drivers/net/dsa/microchip/ksz_common.c
index 9705808c3af7a..1f50b31722958 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -8,12 +8,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -289,11 +291,31 @@ EXPORT_SYMBOL(ksz_switch_alloc);
 int ksz_switch_register(struct ksz_device *dev,
const struct ksz_dev_ops *ops)
 {
+   struct device_node *np = dev->dev->of_node;
+   enum of_gpio_flags reset_gpio_flags;
+   unsigned long flags;
+   int reset_gpio;
int ret;
 
if (dev->pdata)
dev->chip_id = dev->pdata->chip_id;
 
+   dev->reset_gpio = -1;
+   reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
+_gpio_flags);
+   if (reset_gpio >= 0) {
+   flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
+   GPIOF_ACTIVE_LOW : 0;
+   ret = devm_gpio_request_one(dev->dev, reset_gpio, flags,
+   "switch-reset");
+   if (ret) {
+   dev_err(dev->dev, "failed to get reset-gpios: %d\n", 
ret);
+   return -EIO;
+   }
+   dev->reset_gpio = reset_gpio;
+   gpiod_set_value(gpio_to_desc(reset_gpio), 0);
+   }
+
mutex_init(>reg_mutex);
mutex_init(>stats_mutex);
mutex_init(>alu_mutex);
@@ -329,6 +351,10 @@ void ksz_switch_remove(struct ksz_device *dev)
 {
dev->dev_ops->exit(dev);
dsa_unregister_switch(dev->ds);
+
+   if (dev->reset_gpio >= 0)
+   gpiod_set_value(gpio_to_desc(dev->reset_gpio), 1);
+
 }
 EXPORT_SYMBOL(ksz_switch_remove);
 
diff --git a/drivers/net/dsa/microchip/ksz_priv.h 
b/drivers/net/dsa/microchip/ksz_priv.h
index a38ff0841ed4e..6dd2ebfd6e12f 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -59,6 +59,8 @@ struct ksz_device {
 
void *priv;
 
+   int reset_gpio; /* Optional reset GPIO */
+
/* chip specific data */
u32 chip_id;
int num_vlans;
-- 
2.18.0

[PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Marek Vasut

Make sure to cater even for network packets with VLAN tags in them,
increase the minimal packets size to account for those.

Signed-off-by: Marek Vasut 
Cc: Vivien Didelot 
Cc: Woojung Huh 
Cc: David S. Miller 
Cc: Tristram Ha 
---
 net/dsa/tag_ksz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 96411f70ab9f4..cad4406d9d4c2 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -39,7 +39,7 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
int padlen;
u8 *tag;
 
-   padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
+   padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb->len;
 
if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
/* Let dsa_slave_xmit() free skb */
-- 
2.18.0

[PATCH] net: ethernet: ti: cpsw: Assign OF node to slave devices

2018-12-07 Thread Marek Vasut

Assign OF node to CPSW slave devices, otherwise it is not possible to
bind eg. DSA switch to them. Without this patch, the DSA code tries
to find the ethernet device by OF match, but fails to do so because
the slave device has NULL OF node.

Signed-off-by: Marek Vasut 
Cc: David S. Miller 
Cc: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a591583d120e1..702101b0aebd1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -383,6 +383,7 @@ struct cpsw_hw_stats {
 };
 
 struct cpsw_slave_data {
+   struct device_node *slave_node;
struct device_node *phy_node;
charphy_id[MII_BUS_ID_SIZE];
int phy_if;
@@ -3291,6 +3292,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
return ret;
}
 
+   slave_data->slave_node = slave_node;
slave_data->phy_node = of_parse_phandle(slave_node,
"phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", );
@@ -3441,6 +3443,7 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 
/* register the network device */
SET_NETDEV_DEV(ndev, cpsw->dev);
+   ndev->dev.of_node = cpsw->slaves[1].data->slave_node;
ret = register_netdev(ndev);
if (ret) {
dev_err(cpsw->dev, "cpsw: error registering net device\n");
@@ -3709,6 +3712,7 @@ static int cpsw_probe(struct platform_device *pdev)
 
/* register the network device */
SET_NETDEV_DEV(ndev, >dev);
+   ndev->dev.of_node = cpsw->slaves[0].data->slave_node;
ret = register_netdev(ndev);
if (ret) {
dev_err(priv->dev, "error registering net device\n");
-- 
2.18.0

RE: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ioana Ciocoi Radulescu




> -Original Message-
> From: Ilias Apalodimas 
> Sent: Friday, December 7, 2018 7:20 PM
> To: Ioana Ciocoi Radulescu 
> Cc: Jesper Dangaard Brouer ;
> netdev@vger.kernel.org; da...@davemloft.net; Ioana Ciornei
> ; dsah...@gmail.com; Camelia Alexandra Groza
> 
> Subject: Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support
> 
> Hi Ioana,
> > >
> > > > Add support for XDP programs. Only XDP_PASS, XDP_DROP and
> XDP_TX
> > > > actions are supported for now. Frame header changes are also
> > > > allowed.
> 
> I only did a quick grep around the driver so i might be missing something,
> but i can only see allocations via napi_alloc_frag(). XDP requires pages
> (either a single page per packet or a driver that does the page management
> of
> its own and fits 2 frames in a single page, assuming 4kb pages).
> Am i missing something on the driver?

No, I guess I'm the one missing stuff, I didn't realise single page per packet
is a hard requirement for XDP. Could you point me to more info on this?

Thanks,
Ioana

> 
> > >
> > > Do you have any XDP performance benchmarks on this hardware?
> >
> > We have some preliminary perf data that doesn't look great,
> > but we hope to improve it :)
> 
> As Jesper said we are doing similar work on a cortex a-53 and plan to work on
> a-72 as well. We might be able to help out.
> 
> /Ilias

Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Andrew Lunn

> Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?

I think it makes sense to allow a fixed phy carrier to be changed from
user space. However, i don't think you can easily plumb that to
.ndo_change_carrier(), since that is a MAC feature. You need to change
the fixed_phy_status to indicate the PHY has lost link, and then let
the usual mechanisms tell the MAC it is down and change the carrier
status.

Andrew

Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ilias Apalodimas

Hi Ioana,
> > 
> > > Add support for XDP programs. Only XDP_PASS, XDP_DROP and XDP_TX
> > > actions are supported for now. Frame header changes are also
> > > allowed.

I only did a quick grep around the driver so i might be missing something, 
but i can only see allocations via napi_alloc_frag(). XDP requires pages 
(either a single page per packet or a driver that does the page management of
its own and fits 2 frames in a single page, assuming 4kb pages). 
Am i missing something on the driver? 

> > 
> > Do you have any XDP performance benchmarks on this hardware?
> 
> We have some preliminary perf data that doesn't look great,
> but we hope to improve it :)

As Jesper said we are doing similar work on a cortex a-53 and plan to work on
a-72 as well. We might be able to help out.

/Ilias

Re: [PATCH bpf-next] bpf: relax verifier restriction on BPF_MOV | BPF_ALU

2018-12-07 Thread Jiong Wang


On 06/12/2018 03:13, Alexei Starovoitov wrote:

On Wed, Dec 05, 2018 at 03:32:50PM +, Jiong Wang wrote:

On 05/12/2018 14:52, Edward Cree wrote:

On 05/12/18 09:46, Jiong Wang wrote:

There is NO processed instruction number regression, either with or without
-mattr=+alu32.



Cilium bpf
===
bpf_lb-DLB_L3.o 2110/21101730/1733

That looks like a regression of 3 insns in the 32-bit case.
May be worth investigating why.


Will look into this.




+  dst_reg = insn->dst_reg;
+  regs[dst_reg] = regs[src_reg];
+  if (BPF_CLASS(insn->code) == BPF_ALU) {
+  /* Update type and range info. */
+  regs[dst_reg].type = SCALAR_VALUE;
+  coerce_reg_to_size([dst_reg], 4);

Won't this break when handed a pointer (as root, so allowed to leak
  it)?  E.g. (pointer + x) gets turned into scalar x, rather than
  unknown scalar in range [0, 0x].


Initially I was gating this to scalar_value only, later was thinking it
could be extended to ptr case if ptr leak is allowed.

But, your comment remind me min/max value doesn't mean real min/max value
for ptr types value, it means the offset only if I am understanding the
issue correctly. So, it will break pointer case.


correct. In case of is_pointer_value() && root -> mark_reg_unknown() has to be 
called

The explanation of additional 3 steps from another email makes sense to me.

Can you take a look why it helps default (llvm alu64) case too ?
bpf_overlay.o   3096/2898


It is embarrassing that I am not able to reproduce this number after tried
quite a few env configurations. I think the number must be wrong because
llvm alu64 binary doesn't contain alu32 move so shouldn't be impacted by
this patch even though I double checked the raw data I collected on llvm
alu64, re-calculated the number before this patch, it is still 3096. I
guess there must be something wrong with the binary I was loading.

I improved my benchmarking methodology to build all alu64 and alu32
binaries first, and never change them later. Then used a script to load and
collect the processed number. (borrowed the script from
https://github.com/4ast/bpf_cilium_test/, only my binaries are built from
latest Cilium repo and contains alu32 version as well)

I ran this new benchmarking env for several times, and could get the
following new results consistently:

bpf_lb-DLB_L3.o:2085/2085 1685/1687
bpf_lb-DLB_L4.o:2287/2287 1986/1982
bpf_lb-DUNKNOWN.o:  690/690   622/622
bpf_lxc.o:  95033/95033   N/A
bpf_netdev.o:   7245/7245 N/A
bpf_overlay.o:  2898/2898 3085/2947

No change on alu64 binary.

For alu32, bpf_overlay.o still get fewer processed instruction number, this
is because there is the following sequence (and another similar one).
Before this patch, r2 at insn 139 is unknown, so verifier always explore
both path-taken and path-fall_through. After this patch, it explores
path-fall_through only, so saved some insns.

  129: (b4) (u32) r7 = (u32) -140
  ...
  136: (bc) (u32) r2 = (u32) r7
  137: (74) (u32) r2 >>= (u32) 31
  138: (4c) (u32) r2 |= (u32) r1
  139: (15) if r2 == 0x0 goto pc+342
  140: (b4) (u32) r1 = (u32) 2

And a permissive register value for r2 hasn't released more path prune for
this test, so in all, after this patch, there is fewer processed insn.

I have sent out a v2, gated this change under SCALAR_VALUE, and also
updated the patch description.

Thanks.

Regards,
Jiong

[PATCH v2 bpf-next] bpf: relax verifier restriction on BPF_MOV | BPF_ALU

2018-12-07 Thread Jiong Wang

Currently, the destination register is marked as unknown for 32-bit
sub-register move (BPF_MOV | BPF_ALU) whenever the source register type is
SCALAR_VALUE.

This is too conservative that some valid cases will be rejected.
Especially, this may turn a constant scalar value into unknown value that
could break some assumptions of verifier.

For example, test_l4lb_noinline.c has the following C code:

struct real_definition *dst

1:  if (!get_packet_dst(, , vip_info, is_ipv6))
2:return TC_ACT_SHOT;
3:
4:  if (dst->flags & F_IPV6) {

get_packet_dst is responsible for initializing "dst" into valid pointer and
return true (1), otherwise return false (0). The compiled instruction
sequence using alu32 will be:

  412: (54) (u32) r7 &= (u32) 1
  413: (bc) (u32) r0 = (u32) r7
  414: (95) exit

insn 413, a BPF_MOV | BPF_ALU, however will turn r0 into unknown value even
r7 contains SCALAR_VALUE 1.

This causes trouble when verifier is walking the code path that hasn't
initialized "dst" inside get_packet_dst, for which case 0 is returned and
we would then expect verifier concluding line 1 in the above C code pass
the "if" check, therefore would skip fall through path starting at line 4.
Now, because r0 returned from callee has became unknown value, so verifier
won't skip analyzing path starting at line 4 and "dst->flags" requires
dereferencing the pointer "dst" which actually hasn't be initialized for
this path.

This patch relaxed the code marking sub-register move destination. For a
SCALAR_VALUE, it is safe to just copy the value from source then truncate
it into 32-bit.

A unit test also included to demonstrate this issue. This test will fail
before this patch.

This relaxation could let verifier skipping more paths for conditional
comparison against immediate. It also let verifier recording a more
accurate/strict value for one register at one state, if this state end up
with going through exit without rejection and it is used for state
comparison later, then it is possible an inaccurate/permissive value is
better. So the real impact on verifier processed insn number is complex.
But in all, without this fix, valid program could be rejected.

>From real benchmarking on kernel selftests and Cilium bpf tests, there is
no impact on processed instruction number when tests ares compiled with
default compilation options. There is slightly improvements when they are
compiled with -mattr=+alu32 after this patch.

Also, test_xdp_noinline/-mattr=+alu32 now passed verification. It is
rejected before this fix.

Insn processed before/after this patch:

default -mattr=+alu32

Kernel selftest
===
test_xdp.o  371/371  369/369
test_l4lb.o 6345/63455623/5623
test_xdp_noinline.o 2971/2971rejected/2727
test_tcp_estates.o  429/429  430/430

Cilium bpf
===
bpf_lb-DLB_L3.o:2085/2085 1685/1687
bpf_lb-DLB_L4.o:2287/2287 1986/1982
bpf_lb-DUNKNOWN.o:  690/690   622/622
bpf_lxc.o:  95033/95033   N/A
bpf_netdev.o:   7245/7245 N/A
bpf_overlay.o:  2898/2898 3085/2947

NOTE:
  - bpf_lxc.o and bpf_netdev.o compiled by -mattr=+alu32 are rejected by
verifier due to another issue inside verifier on supporting alu32
binary.
  - Each cilium bpf program could generate several processed insn number,
above number is sum of them.

v1->v2:
 - Restrict the change on SCALAR_VALUE.
 - Update benchmark numbers on Cilium bpf tests.

Signed-off-by: Jiong Wang 
---
 kernel/bpf/verifier.c   | 16 
 tools/testing/selftests/bpf/test_verifier.c | 13 +
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9584438..20a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3583,12 +3583,15 @@ static int check_alu_op(struct bpf_verifier_env *env, 
struct bpf_insn *insn)
return err;
 
if (BPF_SRC(insn->code) == BPF_X) {
+   struct bpf_reg_state *src_reg = regs + insn->src_reg;
+   struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
+
if (BPF_CLASS(insn->code) == BPF_ALU64) {
/* case: R1 = R2
 * copy register state to dest reg
 */
-   regs[insn->dst_reg] = regs[insn->src_reg];
-   regs[insn->dst_reg].live |= REG_LIVE_WRITTEN;
+   *dst_reg = *src_reg;
+   dst_reg->live |= REG_LIVE_WRITTEN;
} else {
/* R1 = (u32) R2 */
if (is_pointer_value(env, insn->src_reg)) {
@@ -3596,9 +3599,14 @@ static int check_alu_op(struct bpf_verifier_env *env, 
struct bpf_insn *insn)

Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Joakim Tjernlund

On Fri, 2018-12-07 at 15:15 +0100, Andrew Lunn wrote:
> 
> 
> > Been a bit busy today but now I have played with dormant using ip link and 
> > got some odd results:
> > # > ifconfig eth0
> > eth0: flags=4163  mtu 1500
> > inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
> > inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
> > ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
> > RX packets 1848903  bytes 736764445 (702.6 MiB)
> > RX errors 0  dropped 0  overruns 0  frame 0
> > TX packets 627462  bytes 222453345 (212.1 MiB)
> > TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> > device interrupt 16  memory 0xdc20-dc22
> > # > ip link set mode dormant dev eth0
> >  ping sunet.se
> > PING sunet.se (192.36.171.231) 56(84) bytes of data.
> > 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.22 ms
> > 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.17 ms
> > 
> > # > ifconfig eth0
> > eth0: flags=4163  mtu 1500
> > inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
> > inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
> > ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
> > RX packets 1905479  bytes 753549572 (718.6 MiB)
> > RX errors 0  dropped 0  overruns 0  frame 0
> > TX packets 648266  bytes 224421617 (214.0 MiB)
> > TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> > device interrupt 16  memory 0xdc20-dc22
> > still RUNNING ..
> > 
> > #> ip link show eth0
> > 5: eth0:  mtu 1500 qdisc fq_codel state UP 
> > mode DORMANT group default qlen
> > 
> > state is still UP ?
> > 
> > # > ip link set state dormant dev eth0
> > # > ip link show eth0
> > 5: eth0:  mtu 1500 qdisc 
> > fq_codel state DORMANT mode DORMANT group default qlen 1000
> > # > ifconfig eth0
> > eth0: flags=4099  mtu 1500
> > ...
> > 
> > Now both state and not RUNNING :)
> > but ...
> >  # ping sunet.se
> > PING sunet.se (192.36.171.231) 56(84) bytes of data.
> > 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.43 ms
> > 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.31 ms
> > 
> > I can still ping though. Is this how it is supposed to work ? No sure how 
> > state and mode relate to each other either.
> 
> I don't actually know. I know some things are supposed to work while
> dormant. You should be able to perform 802.1x negotiation, etc.
> 
> You might find the people on the wireless list know more. I think it
> is used by wpa_supplicant and hostapd during device authentication.


I am not sure why .ndo_change_carrier() is a no go in real drivers. Consider 
PHY less (aka Fixed PHYs)
devices. Here it makes sense to be able to control carrier from /sys I think.

Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?
I could also rework ndo_change_carrier to only use netif_carrier_on/off like 
team and dummy do.

 Jocke

RE: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ioana Ciocoi Radulescu

> -Original Message-
> From: Jesper Dangaard Brouer 
> Sent: Wednesday, December 5, 2018 5:45 PM
> To: Ioana Ciocoi Radulescu 
> Cc: bro...@redhat.com; netdev@vger.kernel.org; da...@davemloft.net;
> Ioana Ciornei ; dsah...@gmail.com; Camelia
> Alexandra Groza ; Ilias Apalodimas
> 
> Subject: Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support
> 
> On Mon, 26 Nov 2018 16:27:28 +
> Ioana Ciocoi Radulescu  wrote:
> 
> > Add support for XDP programs. Only XDP_PASS, XDP_DROP and XDP_TX
> > actions are supported for now. Frame header changes are also
> > allowed.
> 
> Do you have any XDP performance benchmarks on this hardware?

We have some preliminary perf data that doesn't look great,
but we hope to improve it :)

On a LS2088A with A72 cores @2GHz (numbers in Mpps):
1core   8cores
-
XDP_DROP (no touching data) 5.6829.6 (linerate)
XDP_DROP (xdp1 sample)  3.4625.18
XDP_TX(xdp2 sample) 1.7113.26

For comparison, plain IP forwarding through the stack
is currently around 0.5Mpps (1c) / 3.8Mpps (8c).

>
> Also what boards (and arch's) are using this dpaa2-eth driver?

Currently supported LS2088A, LS1088A, soon LX2160A (all with
ARM64 cores).

> Any devel board I can buy?

I should have an answer for this early next week and will
get back to you.

Thanks,
Ioana

> 
> 
> p.s. Ilias and I are coding up page_pool and XDP support for Marvell
> mvneta driver, which is avail on a number of avail boards, see here[1]
> 
> [1]
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> hub.com%2Fxdp-project%2Fxdp-
> project%2Fblob%2Fmaster%2Fareas%2Farm64%2Farm01_selecting_hardwar
> e.orgdata=02%7C01%7Cruxandra.radulescu%40nxp.com%7C546868ba
> aa074902ded608d65ac8a594%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7
> C0%7C636796215148994553sdata=za6xUoIrv2jo%2BbvuKjXfpOXeQ3tw
> 96bZZzRB2Vny1iw%3Dreserved=0
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn:
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fww
> w.linkedin.com%2Fin%2Fbrouerdata=02%7C01%7Cruxandra.radulescu
> %40nxp.com%7C546868baaa074902ded608d65ac8a594%7C686ea1d3bc2b4c6f
> a92cd99c5c301635%7C0%7C0%7C636796215148994553sdata=vTe2jd3V
> FXUpEVPLkbGN6i2OyyPfhQ9HacCaPZbm%2Bk8%3Dreserved=0

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-07 Thread Michael Roth

Quoting Michael Ellerman (2018-12-07 06:31:13)
> Michael Roth  writes:
> 
> > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> > JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> > and is adjusted again at init time if MODULES_VADDR is defined.
> >
> > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
> 
> But maybe it should be, I don't know why we don't define it.
> 
> > the compile-time default at boot-time, which is 0x9c40 when
> > using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> > value:
> >
> >   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
> >   -1673527296
> >
> > and can cause various unexpected failures throughout the network
> > stack. In one case `strace dhclient eth0` reported:
> >
> >   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
> > 16) = -1 ENOTSUPP (Unknown error 524)
> >
> > and similar failures can be seen with tools like tcpdump. This doesn't
> > always reproduce however, and I'm not sure why. The more consistent
> > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> > would time out on systemd/netplan configuring a virtio-net NIC with no
> > noticeable errors in the logs.
> >
> > Fix this by limiting the compile-time default for bpf_jit_limit to
> > INT_MAX.
> 
> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
> whether we should be using PAGE_SIZE here at all. I guess each BPF
> program uses at least one page is the thinking?

That seems to be the case, at least, the max number of minimum-sized
allocations would be less on ppc64 since the allocations are always at
least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
so it seemed consistent to do that here too.

> 
> Thanks for tracking this down. For some reason none of my ~10 test boxes
> have hit this, perhaps I don't have new enough userspace?

I'm not too sure, I would've thought things like the dhclient case in
the commit log would fail every time, but sometimes I need to reboot the
guest before I start seeing the behavior. Maybe there's something special
about when JIT allocations are actually done that can affect
reproducibility?

In my case at least the virtio-net networking timeout was consistent
enough for a bisect, but maybe it depends on the specific network
configuration (single NIC, basic DHCP through netplan/systemd in my case).

> 
> You don't mention why you needed to add BPF_MIN(), I assume because the
> kernel version of min() has gotten too complicated to work here?

I wasn't sure if it was safe here or not, so I tried looking at other
users and came across:

mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) /* 
can't use min() */

I'm not sure what the reasoning was (or whether it still applies), but I
figured it was safer to do the same here. Maybe Nick still recalls?

> 
> Daniel I assume you'll merge this via your tree?
> 
> cheers
> 
> > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> > allocations")
> > Cc: linuxppc-...@ozlabs.org
> > Cc: Daniel Borkmann 
> > Cc: Sandipan Das 
> > Cc: Alexei Starovoitov 
> > Signed-off-by: Michael Roth 
> > ---
> >  kernel/bpf/core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index b1a3545d0ec8..55de4746cdfd 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >  }
> >  
> >  #ifdef CONFIG_BPF_JIT
> > -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> > +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), INT_MAX)
> >  
> >  /* All BPF JIT sysctl knobs here. */
> >  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> > -- 
> > 2.17.1
>

Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Björn Töpel

Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer :
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel  wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> > return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

A side-note: What one can do, which I did for the plumbers work, is
bypassing the indirect call in bpf_prog_run_xdp by doing a "if the XDP
program is a builtin, call internal_bpf_xsk_redirect directly". Then,
the XDP_PASS path wont be taxed by the indirect call -- but only for
this special XDP_ATTACH program. And you'll still get an additional
if-statement in for the skb-path.


Björn

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +   struct bpf_redirect_info *ri = this_cpu_ptr(_redirect_info);
> +   struct xdp_sock *xsk;
> +
> +   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +   if (xsk) {
> +   ri->xsk = xsk;
> +   return XDP_REDIRECT;
> +   }
> +
> +   return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH] net: Fix xps_needed inc/dec mismatch

2018-12-07 Thread Ross Lagerwall


On 12/7/18 11:01 AM, Sabrina Dubroca wrote:

Hi Ross,

2018-12-07, 10:16:21 +, Ross Lagerwall wrote:

xps_needed is incremented only when a new dev map is allocated (in
__netif_set_xps_queue). Therefore it should be decremented only when we
actually have a dev map to destroy. Without this, it may be decremented
too many times which causes netif_reset_xps_queues to return early and
not actually clean up the old dev maps. This results in a crash in
__netif_set_xps_queue when it is called later.

The crash occurred when having multiple ixgbe devices in a host. lldpad
would reconfigure them to be FCoE-capable causing reset_xps_queues /
set_xps_queue to be called several times. The xps_needed count would get
out of sync and eventually the above-mentioned crash would occur.

Signed-off-by: Ross Lagerwall 


I posted another patchset recently (commits f28c020fb488 and
867d0ad476db in the "net" tree) for issues in XPS, including broken
xps_needed accounting, so your patch won't apply to David's "net"
tree. Could you try it with your use case, and if you still see
issues, fix them on top? You can grab the latest net tree here:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git



Your two commits fix the issue I was seeing. Thanks!

--
Ross Lagerwall

Re: IP fragmentation performance and don't fragment bug when forwarding

2018-12-07 Thread Risto Pajula


Hello.

I have been to track the poor forwarding latency to the TCP Window scale 
options. The Netgem device uses rather large windows scale options 
(x256) and I have been able to reproduce the routers poor forwarding 
latency also with linux box running in the internal network and changing 
the net.ipv4.tcp_rmem to a large value and thus changing the TCP window 
scaling options to larger ones. I still do not have clue why this causes 
the forwarfing in the linux kernel to block? Maybe something in the 
connection tracking?



With the ICMP timestamp messages I have been able to also pinpoint that 
the latency is caused in the eth1 sending side (the following hping3 
example is run in the router toward the internal network...



xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 
data bytes

len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235

len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3

len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47

len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3

^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms



BR.
Risto

On 2.12.2018 23:32, Risto Pajula wrote:

Hello.

You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
There isn't actually big IP packets on the wire, instead there is 
burst of packets on the wire, which are combined by the GRO... And 
thus dropping them should not happen. Sorry about the invalid bug report.


However the poor latency from intenal network to the internet still 
remain, both GRO enabled and disabled. I will try to study further...



BR.
Risto


On 2.12.2018 14:01, Risto Pajula wrote:

Hello.

I have encountered a weird performance problem in Linux IP 
fragmentation when using video streaming services behind the NAT. 
Also I have studied a possible bug in the DF bit (don't fragment) 
handling when forwarding the IP packets.


First the system setup description:

[host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre 
router]-internet


where:
host1: is a Netgem N7800 "cable box" for online video streaming 
services provided by local telco (Can access Netflix, HBO nordic, 
"live TV", etc.)
linux router: Linux computer with Dualcore Intel Celeron G1840, 
running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, 
mtu set to 1500, RTL8169sb/8110sb
eth0: Linux Routers internet facing interface, public ip address, mtu 
set to 1500,  RTL8168evl/8111evl
fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
connected to the eth0 of the Linux router.


And now when using the Netgem N7800 with online video services 
(Netflix, HBO nordic, etc) the Linux router will receive very BIG IP 
packets in the eth0 upto ~20kB, this seems to lead to the following 
problems in the Linux IP stack.


IP fragmentation performance:
When the Linux router receives these large IP packets in the eth0 
everything works, but it seems that them cause very large performance 
degradation from internal network to the internet regarding the 
latency when the IP fragmentation is performed. The ping latency from 
internal network to the internel network increases from stable 
15ms-20ms up to 700-800ms AND also the ping from the internal network 
to the linux router eth1 (192.168.0.). However up link works 
perfectly, the ping is still stable when streaming the online 
services (From linux router to the internet). It seems that the IP 
fragmentation is somehow blocking the eth1 reception or transmission 
for very long time (which it shouldn't). I'm able to test and debug 
the issue further, but advice regarding where to look would be 
appreciated.



DF Bit, mtu bug when forwarding:
I have started to study the above mentioned problem and have found a 
possible bug in the DF bit and mtu handling in IP forwarding. The BIG 
packets received from streaming services all have the "DF bit" set 
and the question is that

Urgently need money? We can help you!

2018-12-07 Thread Mr. Muller Dieter

Urgently need money? We can help you!
Are you by the current situation in trouble or threatens you in trouble?
In this way, we give you the ability to take a new development.
As a rich person I feel obliged to assist people who are struggling to give 
them a chance. Everyone deserved a second chance and since the Government 
fails, it will have to come from others.
No amount is too crazy for us and the maturity we determine by mutual agreement.
No surprises, no extra costs, but just the agreed amounts and nothing else.
Don't wait any longer and comment on this post. Please specify the amount you 
want to borrow and we will contact you with all the possibilities. contact us 
today at stewarrt.l...@gmail.com

Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Andrew Lunn

> Been a bit busy today but now I have played with dormant using ip link and 
> got some odd results:
> # > ifconfig eth0
> eth0: flags=4163  mtu 1500
> inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
> inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
> ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
> RX packets 1848903  bytes 736764445 (702.6 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 627462  bytes 222453345 (212.1 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> device interrupt 16  memory 0xdc20-dc22  
> # > ip link set mode dormant dev eth0
>  ping sunet.se
> PING sunet.se (192.36.171.231) 56(84) bytes of data.
> 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.22 ms
> 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.17 ms
> 
> # > ifconfig eth0
> eth0: flags=4163  mtu 1500
> inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
> inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
> ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
> RX packets 1905479  bytes 753549572 (718.6 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 648266  bytes 224421617 (214.0 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> device interrupt 16  memory 0xdc20-dc22  
> still RUNNING ..
> 
> #> ip link show eth0
> 5: eth0:  mtu 1500 qdisc fq_codel state UP 
> mode DORMANT group default qlen 
> 
> state is still UP ?
> 
> # > ip link set state dormant dev eth0
> # > ip link show eth0
> 5: eth0:  mtu 1500 qdisc fq_codel 
> state DORMANT mode DORMANT group default qlen 1000
> # > ifconfig eth0
> eth0: flags=4099  mtu 1500
> ...
> 
> Now both state and not RUNNING :)
> but ...
>  # ping sunet.se
> PING sunet.se (192.36.171.231) 56(84) bytes of data.
> 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.43 ms
> 64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.31 ms
> 
> I can still ping though. Is this how it is supposed to work ? No sure how 
> state and mode relate to each other either.

I don't actually know. I know some things are supposed to work while
dormant. You should be able to perform 802.1x negotiation, etc.

You might find the people on the wireless list know more. I think it
is used by wpa_supplicant and hostapd during device authentication.

   Andrew

Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Joakim Tjernlund

On Thu, 2018-12-06 at 20:43 +0100, Andrew Lunn wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
> 
> 
> > I can have a look at using dormant, but what is change_carrier
> > supposed to do if not this?
> 
> It is intended for interfaces which are stacked, like the team driver,
> and for devices which don't have a phy, e.g. tun, and dummy.
> 
> > I didn't find a tool for DORMANT, I guess i will have to write one
> > myself(using SIOCGIFFLAGS, SIOCSIFFLAGS)?
> 
> ip link should be able to set it.
> 
> Try ip link set mode dormant dev eth0
> 
> Andrew

Been a bit busy today but now I have played with dormant using ip link and got 
some odd results:
# > ifconfig eth0
eth0: flags=4163  mtu 1500
inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
RX packets 1848903  bytes 736764445 (702.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 627462  bytes 222453345 (212.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
device interrupt 16  memory 0xdc20-dc22  
# > ip link set mode dormant dev eth0
 ping sunet.se
PING sunet.se (192.36.171.231) 56(84) bytes of data.
64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.22 ms
64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.17 ms

# > ifconfig eth0
eth0: flags=4163  mtu 1500
inet 172.20.0.246  netmask 255.255.0.0  broadcast 172.20.255.255
inet6 fe80::ad9c:b230:1da8:1821  prefixlen 64  scopeid 0x20
ether 8c:16:45:89:cf:c6  txqueuelen 1000  (Ethernet)
RX packets 1905479  bytes 753549572 (718.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 648266  bytes 224421617 (214.0 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
device interrupt 16  memory 0xdc20-dc22  
still RUNNING ..

#> ip link show eth0
5: eth0:  mtu 1500 qdisc fq_codel state UP 
mode DORMANT group default qlen 

state is still UP ?

# > ip link set state dormant dev eth0
# > ip link show eth0
5: eth0:  mtu 1500 qdisc fq_codel 
state DORMANT mode DORMANT group default qlen 1000
# > ifconfig eth0
eth0: flags=4099  mtu 1500
...

Now both state and not RUNNING :)
but ...
 # ping sunet.se
PING sunet.se (192.36.171.231) 56(84) bytes of data.
64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=1 ttl=54 time=2.43 ms
64 bytes from webc.sunet.se (192.36.171.231): icmp_seq=2 ttl=54 time=2.31 ms

I can still ping though. Is this how it is supposed to work ? No sure how state 
and mode relate to each other either.

 Jocke

Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Björn Töpel

Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer :
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel  wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> > return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

Yes, you are correct. The builtin XDP program, just like a regular XDP
program, affects the whole netdev. So, yes the non-AF_XDP queues would
get a performance hit from this. Just to reiterate -- this isn't new
for this series. This has always been the case for XDP when acting on
just one queue.

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +   struct bpf_redirect_info *ri = this_cpu_ptr(_redirect_info);
> +   struct xdp_sock *xsk;
> +
> +   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +   if (xsk) {
> +   ri->xsk = xsk;
> +   return XDP_REDIRECT;
> +   }
> +
> +   return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>

>From my AF_XDP perspective, having a program per queue would make
sense. The discussion of a per-queue has been up before, and I think
the conclusion was that it would be too complex from a
configuration/tooling point-of-view. Again, for AF_XDP this would be
great.

When we started to hack on AF_PACKET v4, we had some ideas of doing
the "queue slicing" on a netdev level. So, e.g. take a netdev, and
create, say, macvlans that took over parts of parents queues
(something in line of what John did with NETIF_F_HW_L2FW_DOFFLOAD for
macvlan) and then use the macvlan interface as the dedicated AF_XDP
interface.

Personally, I like the current queue slicing model, and having a way
of loading an XDP program per queue would be nice -- unless the UX for
the poor sysadmin will be terrible. :-)


Björn

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

[PATCH v2 net-next 3/4] net: aquantia: fix initialization of RSS table

2018-12-07 Thread Igor Russkikh

From: Dmitry Bogdanov 

Now RSS indirection table is initialized before setting up the number of
hw queues, consequently the table may be filled by non existing queues.
This patch moves the initialization when the number of hw queues is
known.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index d617289d95f7..0147c037ca96 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -84,8 +84,6 @@ void aq_nic_cfg_start(struct aq_nic_s *self)
 
cfg->is_lro = AQ_CFG_IS_LRO_DEF;
 
-   aq_nic_rss_init(self, cfg->num_rss_queues);
-
/*descriptors */
cfg->rxds = min(cfg->aq_hw_caps->rxds_max, AQ_CFG_RXDS_DEF);
cfg->txds = min(cfg->aq_hw_caps->txds_max, AQ_CFG_TXDS_DEF);
@@ -106,6 +104,8 @@ void aq_nic_cfg_start(struct aq_nic_s *self)
 
cfg->num_rss_queues = min(cfg->vecs, AQ_CFG_NUM_RSS_QUEUES_DEF);
 
+   aq_nic_rss_init(self, cfg->num_rss_queues);
+
cfg->irq_type = aq_pci_func_get_irq_type(self);
 
if ((cfg->irq_type == AQ_HW_IRQ_LEGACY) ||
-- 
2.17.1

[PATCH v2 net-next 1/4] net: aquantia: fix RSS table and key sizes

2018-12-07 Thread Igor Russkikh

From: Dmitry Bogdanov 

Set RSS indirection table and RSS hash key sizes to their real size.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/aq_cfg.h | 4 ++--
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h 
b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 91eb8910b1c9..90a0e1d0d622 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -42,8 +42,8 @@
 #define AQ_CFG_IS_LRO_DEF   1U
 
 /* RSS */
-#define AQ_CFG_RSS_INDIRECTION_TABLE_MAX  128U
-#define AQ_CFG_RSS_HASHKEY_SIZE   320U
+#define AQ_CFG_RSS_INDIRECTION_TABLE_MAX  64U
+#define AQ_CFG_RSS_HASHKEY_SIZE   40U
 
 #define AQ_CFG_IS_RSS_DEF   1U
 #define AQ_CFG_NUM_RSS_QUEUES_DEF   AQ_CFG_VECS_DEF
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 279ea58f4a9e..d617289d95f7 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -44,7 +44,7 @@ static void aq_nic_rss_init(struct aq_nic_s *self, unsigned 
int num_rss_queues)
struct aq_rss_parameters *rss_params = >aq_rss;
int i = 0;
 
-   static u8 rss_key[40] = {
+   static u8 rss_key[AQ_CFG_RSS_HASHKEY_SIZE] = {
0x1e, 0xad, 0x71, 0x87, 0x65, 0xfc, 0x26, 0x7d,
0x0d, 0x45, 0x67, 0x74, 0xcd, 0x06, 0x1a, 0x18,
0xb6, 0xc1, 0xf0, 0xc7, 0xbb, 0x18, 0xbe, 0xf8,
-- 
2.17.1

[PATCH v2 net-next 4/4] net: aquantia: add support of RSS configuration

2018-12-07 Thread Igor Russkikh

From: Dmitry Bogdanov 

Add support of configuration of RSS hash key and RSS indirection table.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 .../ethernet/aquantia/atlantic/aq_ethtool.c   | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index a5fd71692c8b..fcbfecf41c45 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -202,6 +202,41 @@ static int aq_ethtool_get_rss(struct net_device *ndev, u32 
*indir, u8 *key,
return 0;
 }
 
+static int aq_ethtool_set_rss(struct net_device *netdev, const u32 *indir,
+ const u8 *key, const u8 hfunc)
+{
+   struct aq_nic_s *aq_nic = netdev_priv(netdev);
+   struct aq_nic_cfg_s *cfg;
+   unsigned int i = 0U;
+   u32 rss_entries;
+   int err = 0;
+
+   cfg = aq_nic_get_cfg(aq_nic);
+   rss_entries = cfg->aq_rss.indirection_table_size;
+
+   /* We do not allow change in unsupported parameters */
+   if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+   return -EOPNOTSUPP;
+   /* Fill out the redirection table */
+   if (indir)
+   for (i = 0; i < rss_entries; i++)
+   cfg->aq_rss.indirection_table[i] = indir[i];
+
+   /* Fill out the rss hash key */
+   if (key) {
+   memcpy(cfg->aq_rss.hash_secret_key, key,
+  sizeof(cfg->aq_rss.hash_secret_key));
+   err = aq_nic->aq_hw_ops->hw_rss_hash_set(aq_nic->aq_hw,
+   >aq_rss);
+   if (err)
+   return err;
+   }
+
+   err = aq_nic->aq_hw_ops->hw_rss_set(aq_nic->aq_hw, >aq_rss);
+
+   return err;
+}
+
 static int aq_ethtool_get_rxnfc(struct net_device *ndev,
struct ethtool_rxnfc *cmd,
u32 *rule_locs)
@@ -549,6 +584,7 @@ const struct ethtool_ops aq_ethtool_ops = {
.set_pauseparam  = aq_ethtool_set_pauseparam,
.get_rxfh_key_size   = aq_ethtool_get_rss_key_size,
.get_rxfh= aq_ethtool_get_rss,
+   .set_rxfh= aq_ethtool_set_rss,
.get_rxnfc   = aq_ethtool_get_rxnfc,
.set_rxnfc   = aq_ethtool_set_rxnfc,
.get_sset_count  = aq_ethtool_get_sset_count,
-- 
2.17.1

[PATCH v2 net-next 2/4] net: aquantia: increase max number of hw queues

2018-12-07 Thread Igor Russkikh

From: Dmitry Bogdanov 

Increase the upper limit of the hw queues up to 8.
This makes RSS better on multiheaded cpus.

This is a maximum AQC hardware supports in one traffic class.

The actual value is still limited by a number of available cpu cores.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/aq_cfg.h   | 2 +-
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h 
b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 90a0e1d0d622..3944ce7f0870 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -12,7 +12,7 @@
 #ifndef AQ_CFG_H
 #define AQ_CFG_H
 
-#define AQ_CFG_VECS_DEF   4U
+#define AQ_CFG_VECS_DEF   8U
 #define AQ_CFG_TCS_DEF1U
 
 #define AQ_CFG_TXDS_DEF4096U
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index 6af7d7f0cdca..08596a7a6486 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -21,7 +21,7 @@
 
 #define DEFAULT_B0_BOARD_BASIC_CAPABILITIES \
.is_64_dma = true,\
-   .msix_irqs = 4U,  \
+   .msix_irqs = 8U,  \
.irq_mask = ~0U,  \
.vecs = HW_ATL_B0_RSS_MAX,\
.tcs = HW_ATL_B0_TC_MAX,  \
-- 
2.17.1

[PATCH v2 net-next 0/4] net: aquantia: add RSS configuration

2018-12-07 Thread Igor Russkikh

In this patchset few bugs related to RSS are fixed and RSS table and
hash key configuration is added.

We also do increase max number of HW rings upto 8.

v2: removed extra arg check

Dmitry Bogdanov (4):
  net: aquantia: fix RSS table and key sizes
  net: aquantia: increase max number of hw queues
  net: aquantia: fix initialization of RSS table
  net: aquantia: add support of RSS configuration

 .../net/ethernet/aquantia/atlantic/aq_cfg.h   |  6 ++--
 .../ethernet/aquantia/atlantic/aq_ethtool.c   | 36 +++
 .../net/ethernet/aquantia/atlantic/aq_nic.c   |  6 ++--
 .../aquantia/atlantic/hw_atl/hw_atl_b0.c  |  2 +-
 4 files changed, 43 insertions(+), 7 deletions(-)

-- 
2.17.1

Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Jesper Dangaard Brouer

On Fri,  7 Dec 2018 12:44:24 +0100
Björn Töpel  wrote:

> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
> return bpf_xsk_redirect(ctx);
>   }
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.
> 
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, regular XDP program will have precedence over the builtin
> one.
> 
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
> 
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
> 
> The builtin XDP program functionally adds even more complexity to the
> already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> store the program in the struct net_device together with the install
> flags instead of calling the ndo_bpf multiple times?

(As far as I can see from reading the code, correct me if I'm wrong.)

If an AF_XDP program uses XDP_ATTACH, then it installs the
builtin-program as the XDP program on the "entire" device.  That means
all RX-queues will call this XDP-bpf program (indirect call), and it is
actually only relevant for the specific queue_index.  Yes, the helper
call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
and return XDP_PASS if it is NULL:

+BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
+{
+   struct bpf_redirect_info *ri = this_cpu_ptr(_redirect_info);
+   struct xdp_sock *xsk;
+
+   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
+   if (xsk) {
+   ri->xsk = xsk;
+   return XDP_REDIRECT;
+   }
+
+   return XDP_PASS;
+}

Why do every normal XDP_PASS packet have to pay this overhead (indirect
call), when someone loads an AF_XDP socket program?  The AF_XDP socket
is tied hard and only relevant to a specific RX-queue (which is why we
get a performance boost due to SPSC queues).

I acknowledge there is a need for this, but this use-case shows there
is a need for attaching XDP programs per RX-queue basis.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-07 Thread Michael Ellerman

Michael Roth  writes:

> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> and is adjusted again at init time if MODULES_VADDR is defined.
>
> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with

But maybe it should be, I don't know why we don't define it.

> the compile-time default at boot-time, which is 0x9c40 when
> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
> and can cause various unexpected failures throughout the network
> stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
> 16) = -1 ENOTSUPP (Unknown error 524)
>
> and similar failures can be seen with tools like tcpdump. This doesn't
> always reproduce however, and I'm not sure why. The more consistent
> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> would time out on systemd/netplan configuring a virtio-net NIC with no
> noticeable errors in the logs.
>
> Fix this by limiting the compile-time default for bpf_jit_limit to
> INT_MAX.

INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
whether we should be using PAGE_SIZE here at all. I guess each BPF
program uses at least one page is the thinking?

Thanks for tracking this down. For some reason none of my ~10 test boxes
have hit this, perhaps I don't have new enough userspace?

You don't mention why you needed to add BPF_MIN(), I assume because the
kernel version of min() has gotten too complicated to work here?

Daniel I assume you'll merge this via your tree?

cheers

> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")
> Cc: linuxppc-...@ozlabs.org
> Cc: Daniel Borkmann 
> Cc: Sandipan Das 
> Cc: Alexei Starovoitov 
> Signed-off-by: Michael Roth 
> ---
>  kernel/bpf/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545d0ec8..55de4746cdfd 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
>  
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), INT_MAX)
>  
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> -- 
> 2.17.1

Re: [RFC PATCH 4/6] dt-bindings: update mvneta binding document

2018-12-07 Thread Russell King - ARM Linux

On Fri, Dec 07, 2018 at 05:30:52PM +0530, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 07/12/18 5:03 PM, Russell King - ARM Linux wrote:
> > On Fri, Dec 07, 2018 at 04:43:27PM +0530, Kishon Vijay Abraham I wrote:
> >> Russell,
> >>
> >> No, I haven't merged patches from this series. That would have failed
> >> compilation since Grygorii modified enum phy_mode which is used in this 
> >> series.
> >> You have also noted this in your cover letter.
> > 
> > Ok, but in any case, given the complexities of modifying the patch
> > and properly testing it, I think I'll wait until those changes have
> > hit mainline before re-spinning this series.  Alternatively, if
> > you're happy to take just build-tested version, I could re-spin
> > with that so at least we can get the phy bits queued for the merge
> > window.
> 
> I'd prefer we test it before merging.

Okay, expect it sometime after Christmas.  In any case, waiting for
the upheaval in the phy API to hit mainline will need to happen to
that netdev is in sync with the revised phy API.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

Re: [PATCH 0/3] net: macb: DMA race condition fixes

2018-12-07 Thread Anssi Hannula

On 5.12.2018 22:35, David Miller wrote:
> From: Anssi Hannula 
> Date: Fri, 30 Nov 2018 20:21:34 +0200
>
>> Here are a couple of race condition fixes for the macb driver. The first
>> two are issues observed on real HW.
> It looks like there is still an active discussion about the memory
> barriers in patch #3 being excessive.
>
> Once that is sorted out to everyone's satisfaction, would you
> please repost this series with appropriate ACKs, reviewed-by's,
> tested-by's, etc. added?
>
> Thank you.


OK, I'll do that once everything is sorted.

Nicolas, were Claudiu's tests the ones you wanted to wait for or are we
waiting for more tests?

-- 
Anssi Hannula / Bitwise Oy
+358 503803997

Re: [RFC PATCH 4/6] dt-bindings: update mvneta binding document

2018-12-07 Thread Kishon Vijay Abraham I

Hi,

On 07/12/18 5:03 PM, Russell King - ARM Linux wrote:
> On Fri, Dec 07, 2018 at 04:43:27PM +0530, Kishon Vijay Abraham I wrote:
>> Russell,
>>
>> No, I haven't merged patches from this series. That would have failed
>> compilation since Grygorii modified enum phy_mode which is used in this 
>> series.
>> You have also noted this in your cover letter.
> 
> Ok, but in any case, given the complexities of modifying the patch
> and properly testing it, I think I'll wait until those changes have
> hit mainline before re-spinning this series.  Alternatively, if
> you're happy to take just build-tested version, I could re-spin
> with that so at least we can get the phy bits queued for the merge
> window.

I'd prefer we test it before merging.
> 
> In any case, I'm busy trying to get to the bottom of several OMAP4
> bugs while ill, so this isn't something I want to do at the moment.

Take care!

Thanks
Kishon
>

Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-07 Thread Anssi Hannula

On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote:
> Hi Anssi,

Hi!

> On 05.12.2018 16:00, Anssi Hannula wrote:
>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>> On 30.11.2018 20:21, Anssi Hannula wrote:
 When reading buffer descriptors on RX or on TX completion, an
 RX_USED/TX_USED bit is checked first to ensure that the descriptor has
 been populated. However, there are no memory barriers to ensure that the
 data protected by the RX_USED/TX_USED bit is up-to-date with respect to
 that bit.

 Fix that by adding DMA read memory barriers on those paths.

 I did not observe any actual issues caused by these being missing,
 though.

 Tested on a ZynqMP based system.

 Signed-off-by: Anssi Hannula 
 Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
 Cc: Nicolas Ferre 
 ---
  drivers/net/ethernet/cadence/macb_main.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/net/ethernet/cadence/macb_main.c 
 b/drivers/net/ethernet/cadence/macb_main.c
 index 430b7a0f5436..c93baa8621d5 100644
 --- a/drivers/net/ethernet/cadence/macb_main.c
 +++ b/drivers/net/ethernet/cadence/macb_main.c
 @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue 
 *queue)

/* First, update TX stats if needed */
if (skb) {
 +  /* Ensure all of desc is at least as up-to-date
 +   * as ctrl (TX_USED bit)
 +   */
 +  dma_rmb();
 +
>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>>> data specific to this descriptor was completed. The TX descriptors for next
>>> data to be send is updated under a locked spinlock.
>> The previous rmb() is before the TX_USED check, so my understanding is
>> that the following could happen in theory:
> We are using this IP on and ARM architecture, so, with regards to rmb(), I
> understand from [1] that dsb completes when:
> "All explicit memory accesses before this instruction complete.
> All Cache, Branch predictor and TLB maintenance operations before this
> instruction complete."
>
>> 1. rmb().
> According to [1] this should end after all previous instructions (loads,
> stores) ends.
>
>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>> are crossed.
> But, as per [1], no onward instruction will be reached until all
> instruction prior to dsb ends, so, after rmb() all descriptor's members
> should be updated, right?

The descriptor that triggered the TX interrupt should be visible now, yes.
However, the controller may be writing to any other descriptors at the
same time as the loop is processing through them, as there are multiple
TX buffers.

>> 3. HW writes timestamp and sets TX_USED (or they become visible).
> I expect hardware to set TX_USED and timestamp before raising TX complete
> interrupt. If so, there should be no on-flight updates of this descriptor,
> right? Hardware raised a TX_USED bit read interrupt when it reads a
> descriptor like this and hangs TX.

For the first iteration of the loop, that is correct - there should be
no in-flight writes from controller as it already raised the interrupt.
However, the following iterations of the loop process descriptors that
may or may not have the interrupt raised yet, and therefore may still
have in-flight writes.

>> 4. Code checks TX_USED.
>> 5. Code operates on timestamp that is actually garbage.
>>
>> I'm not 100% sure that there isn't some lighter/cleaner way to do this
>> than dma_rmb(), though.
> If you still think this scenario could happen why not calling a dsb in
> gem_ptp_do_timestamp(). I feel like that is a proper place to call it.

OK, I will move it there. Unless we arrive at a conclusion that it is
unnecessary altogether, of course :)

> Moreover, there is bit 32 of desc->ctrl which tells you if a valid
> timestamp was placed in the descriptor. But, again, I expect the timestamp
> and TX_USED to be set by hardware before raising TX complete interrupt.

Yes, but since my concern is that without barriers in between,
desc->ctrl might be read after ts_1/ts_2, so that bit might be seen as
set even though ts_1 is not yet an actual timestamp. And per above, all
this may occur before the TX complete interrupt is raised for the
descriptor in question.

I agree that this TX case seems somewhat unlikely to be triggered in
real life at least at present, though, as the ts_1/ts_2 read is so far
after the desc->ctrl read, and behind function calls, so unlikely to be
reordered before desc->ctrl read.
But the similar RX cases below seem more problematic as the racy loads
in question are right after each other in code.

> [1]
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html
>

1 2 >

1 - 100 of 117 matches

Mail list logo