Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions

2017-04-26 Thread Christoph Hellwig
On Wed, Apr 26, 2017 at 12:11:33PM -0600, Logan Gunthorpe wrote:
> Ok, well for starters I think you are mistaken about kmap being able to
> fail. I'm having a hard time finding many users of that function that
> bother to check for an error when calling it.

A quick audit of the arch code shows you're right - kmap can't fail
anywhere anymore.

> The main difficulty we
> have now is that neither of those functions are expected to fail and we
> need them to be able to in cases where the page doesn't map to system
> RAM. This patch series is trying to address it for users of scatterlist.
> I'm certainly open to other suggestions.

I think you'll need to follow the existing kmap semantics and never
fail the iomem version either.  Otherwise you'll have a special case
that's almost never used that has a different error path.

> There are a fair number of cases in the kernel that do something like:
> 
> if (something)
> x = kmap(page);
> else
> x = kmap_atomic(page);
> ...
> if (something)
> kunmap(page)
> else
> kunmap_atomic(x)
> 
> Which just seems cumbersome to me.

Passing a different flag based on something isn't really much better.

> In any case, if you can accept an sg_kmap and sg_kmap_atomic api just
> say so and I'll make the change. But I'll still need a flags variable
> for SG_MAP_MUST_NOT_FAIL to support legacy cases that have no fail path
> and both of those functions will need to be pretty nearly replicas of
> each other.

Again, wrong way.  Suddenly making things fail for your special case
that normally don't fail is a receipe for bugs.


Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it

2017-04-26 Thread Christoph Hellwig
On Tue, Apr 25, 2017 at 02:39:55PM -0500, Bjorn Helgaas wrote:
> This still leaves these:
> 
>   [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it
>   [PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it
>   [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it
> 
> I haven't seen any response to 4 and 6.  Felix reported an unused
> variable in 7.  Let me know if you'd like me to do anything with
> these.

Now that Jeff ACKed 4 it might be worth to add it to the pci tree last
minute.  I'll resend liquidio and qat to the respective maintainers for
the next merge window.


RE: [iproute2] iplink: add support for IFLA_CARRIER attribute

2017-04-26 Thread 张胜举
> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, April 26, 2017 11:08 PM
> To: Zhang Shengju 
> Cc: netdev@vger.kernel.org
> Subject: Re: [iproute2] iplink: add support for IFLA_CARRIER attribute
> 
> On Wed, 26 Apr 2017 15:08:39 +0800
> Zhang Shengju  wrote:
> 
> > Add support to set IFLA_CARRIER attribute.
> >
> > Signed-off-by: Zhang Shengju 
> > ---
> >  ip/iplink.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/ip/iplink.c b/ip/iplink.c index 866ad72..263bfdd 100644
> > --- a/ip/iplink.c
> > +++ b/ip/iplink.c
> > @@ -72,6 +72,7 @@ void iplink_usage(void)
> > " [ allmulticast { on | off } ]\n"
> > " [ promisc { on | off } ]\n"
> > " [ trailers { on | off } ]\n"
> > +   " [ carrier { on | off } ]\n"
> > " [ txqueuelen PACKETS ]\n"
> > " [ name NEWNAME ]\n"
> > " [ address LLADDR ]\n"
> > @@ -673,6 +674,17 @@ int iplink_parse(int argc, char **argv, struct
> iplink_req *req,
> > req->i.ifi_flags |= IFF_NOARP;
> > else
> > return on_off("arp", *argv);
> > +   } else if (strcmp(*argv, "carrier") == 0) {
> > +   int carrier;
> > +   NEXT_ARG();
> > +   if (strcmp(*argv, "on") == 0)
> > +   carrier = 1;
> > +   else if (strcmp(*argv, "off") == 0)
> > +   carrier = 0;
> > +   else
> > +   return on_off("carrier", *argv);
> > +
> > +   addattr8(&req->n, sizeof(*req), IFLA_CARRIER,
> carrier);
> > } else if (strcmp(*argv, "vf") == 0) {
> > struct rtattr *vflist;
> >
> 
> The general policy of ip link command is all options should be invertable.
Yes, so I add 'on' and 'off' subcommand to make sure that it can be
invertable.

> There are some VPN's that use this to save and restore state. So if you
add
> an option to set something there should be similar output under the
detailed
> show command.
Currently, "show command" already can display 'carrier' status. Such as:
 dummy0: 







Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-26 Thread Jiri Pirko
Wed, Apr 26, 2017 at 10:07:08PM CEST, j...@mojatatu.com wrote:
>On 17-04-26 09:56 AM, Jiri Pirko wrote:
>> Wed, Apr 26, 2017 at 03:14:38PM CEST, j...@mojatatu.com wrote:
>> > On 17-04-26 08:08 AM, Jiri Pirko wrote:
>
>[..]
>

[...]

>
>> > Again: You are looking at this from a manageability point of view which
>> > is useful but not the only input into a design. If i can squeeze more
>> > data without killing usability - I am all for it. It just doesnt
>> > compute that it is ok to use a flag per attribute because it looks
>> > beautiful.
>> 
>> Hmm. Now that I'm thinking about it, why don't we have NLA_FLAGS with
>> couple of helpers around it? It will be obvious what the attr is, all
>> kernel code would use the same helpers. Would be nice.
>> 
>
>I think to have flags at that level is useful but it
>is a different hierarchy level. I am not sure the
>"actions dump large messages" is a fit for that level.

Jamal, the idea is to have exactly what you want to have. Only does not
use NLA_U32 attr for that but a special attr NLA_FLAGS which would have
well defined semantics and set of helpers to work with and enforce it.

Then, this could be easily reused in other subsystem that uses netlink


[RFC net-next 2/2] bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID

2017-04-26 Thread Martin KaFai Lau
Add test to exercise the bpf_prog id generation
and iteration.

Signed-off-by: Martin KaFai Lau 
---
 tools/include/uapi/linux/bpf.h |  6 ++
 tools/lib/bpf/bpf.c| 11 
 tools/lib/bpf/bpf.h|  1 +
 tools/testing/selftests/bpf/Makefile   |  2 +-
 tools/testing/selftests/bpf/test_prog_id.c | 93 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_prog_id.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e553529929f6..270f501c5597 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -82,6 +82,7 @@ enum bpf_cmd {
BPF_PROG_ATTACH,
BPF_PROG_DETACH,
BPF_PROG_TEST_RUN,
+   BPF_PROG_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -201,6 +202,11 @@ union bpf_attr {
__u32   repeat;
__u32   duration;
} test;
+
+   struct { /* anonymous struct used by BPF_PROG_GET_NEXT_ID */
+   __u32   start_id;
+   __aligned_u64   next_id;
+   };
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4fe444b8092e..e23011f88fb4 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -235,3 +235,14 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, 
__u32 size,
*duration = attr.test.duration;
return ret;
 }
+
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
+{
+   union bpf_attr attr;
+
+   bzero(&attr, sizeof(attr));
+   attr.start_id = start_id;
+   attr.next_id = ptr_to_u64(next_id);
+
+   return sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index edb4daeff7a5..200f1ffc9cf9 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -50,5 +50,6 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type 
type);
 int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
  void *data_out, __u32 *size_out, __u32 *retval,
  __u32 *duration);
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 
 #endif
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index d8d94b9bd76c..68c4a920cb56 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -11,7 +11,7 @@ endif
 CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) 
-I../../../include
 LDLIBS += -lcap -lelf
 
-TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs
+TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs test_prog_id
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o
 
diff --git a/tools/testing/selftests/bpf/test_prog_id.c 
b/tools/testing/selftests/bpf/test_prog_id.c
new file mode 100644
index ..f6c56c649d72
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_prog_id.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "bpf_util.h"
+
+static int bpf_prog_load(const char *file, enum bpf_prog_type type,
+struct bpf_object **pobj, int *prog_fd)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   int err;
+
+   obj = bpf_object__open(file);
+   if (IS_ERR(obj))
+   return -ENOENT;
+
+   prog = bpf_program__next(NULL, obj);
+   if (!prog) {
+   bpf_object__close(obj);
+   return -ENOENT;
+   }
+
+   bpf_program__set_type(prog, type);
+   err = bpf_object__load(obj);
+   if (err) {
+   bpf_object__close(obj);
+   return -EINVAL;
+   }
+
+   *pobj = obj;
+   *prog_fd = bpf_program__fd(prog);
+   return 0;
+}
+
+int main(void)
+{
+   struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+   const char *file = "./test_pkt_access.o";
+   const int nr_iters = 16;
+   int bpf_prog_fds[nr_iters];
+   int i, err = 0;
+   uint32_t next_id = 0;
+
+   if (setrlimit(RLIMIT_MEMLOCK, &rinf)) {
+   perror("setrlimit");
+   return -1;
+   }
+
+   memset(bpf_prog_fds, -1, sizeof(bpf_prog_fds));
+
+   for (i = 0; i < nr_iters; i++) {
+   struct bpf_object *obj;
+   int prog_fd;
+
+   err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj,
+   &prog_fd);
+   if (err) {
+   perror("bpf_prog_load");
+ 

[RFC net-next 1/2] bpf: Introduce bpf_prog ID

2017-04-26 Thread Martin KaFai Lau
The ID is generated by the existing idr_alloc_cyclic().

This patch also adds BPF_PROG_GET_NEXT_ID to allow userspace to iterate
all bpf_prog id(s).  The API is trying to be consistent with the
existing BPF_MAP_GET_NEXT_KEY.

Signed-off-by: Martin KaFai Lau 
---
 include/linux/filter.h   |  1 +
 include/uapi/linux/bpf.h |  6 ++
 kernel/bpf/syscall.c | 47 ++-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa..6eeeb83c4013 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -417,6 +417,7 @@ struct bpf_prog {
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
+   u32 id;
u8  tag[BPF_TAG_SIZE];
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e553529929f6..270f501c5597 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,6 +82,7 @@ enum bpf_cmd {
BPF_PROG_ATTACH,
BPF_PROG_DETACH,
BPF_PROG_TEST_RUN,
+   BPF_PROG_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -201,6 +202,11 @@ union bpf_attr {
__u32   repeat;
__u32   duration;
} test;
+
+   struct { /* anonymous struct used by BPF_PROG_GET_NEXT_ID */
+   __u32   start_id;
+   __aligned_u64   next_id;
+   };
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13642c73dca0..6a654e17bd3c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -22,8 +22,11 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(int, bpf_prog_active);
+DEFINE_IDR(prog_idr);
+DEFINE_SPINLOCK(prog_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -663,6 +666,9 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 void bpf_prog_put(struct bpf_prog *prog)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
+   spin_lock(&prog_idr_lock);
+   idr_remove(&prog_idr, prog->id);
+   spin_unlock(&prog_idr_lock);
trace_bpf_prog_put_rcu(prog);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
@@ -790,7 +796,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 {
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog;
-   int err;
+   int err, id;
char license[128];
bool is_gpl;
 
@@ -848,6 +854,15 @@ static int bpf_prog_load(union bpf_attr *attr)
if (err < 0)
goto free_used_maps;
 
+   spin_lock(&prog_idr_lock);
+   id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_USER);
+   spin_unlock(&prog_idr_lock);
+   if (id < 0) {
+   err = id;
+   goto free_used_maps;
+   }
+   prog->id = id;
+
/* eBPF program is ready to be JITed */
prog = bpf_prog_select_runtime(prog, &err);
if (err < 0)
@@ -995,6 +1010,33 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
return ret;
 }
 
+#define BPF_PROG_GET_NEXT_ID_LAST_FIELD next_id
+
+static int bpf_prog_get_next_id(union bpf_attr *attr)
+{
+   u64 __user *unext_id = u64_to_user_ptr(attr->next_id);
+   u32 next_id = attr->start_id;
+   struct bpf_prog *prog;
+
+   if (CHECK_ATTR(BPF_PROG_GET_NEXT_ID))
+   return -EINVAL;
+
+   if (next_id++ >= INT_MAX)
+   return -EINVAL;
+
+   spin_lock(&prog_idr_lock);
+   prog = idr_get_next(&prog_idr, &next_id);
+   spin_unlock(&prog_idr_lock);
+
+   if (!prog)
+   return -ENOENT;
+
+   if (copy_to_user(unext_id, &next_id, sizeof(next_id)))
+   return -EFAULT;
+
+   return 0;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, 
size)
 {
union bpf_attr attr = {};
@@ -1072,6 +1114,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, 
uattr, unsigned int, siz
case BPF_PROG_TEST_RUN:
err = bpf_prog_test_run(&attr, uattr);
break;
+   case BPF_PROG_GET_NEXT_ID:
+   err = bpf_prog_get_next_id(&attr);
+   break;
default:
err = -EINVAL;
break;
-- 
2.9.3



[RFC net-next 0/2] Introduce bpf_prog ID and iteration

2017-04-26 Thread Martin KaFai Lau
This patchset introduces the bpf_prog ID and a new bpf cmd to
iterate all bpf_prog in the system.

It is still incomplete.  The idea can be extended to bpf_map.

Martin KaFai Lau (2):
  bpf: Introduce bpf_prog ID
  bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID

 include/linux/filter.h |  1 +
 include/uapi/linux/bpf.h   |  6 ++
 kernel/bpf/syscall.c   | 47 ++-
 tools/include/uapi/linux/bpf.h |  6 ++
 tools/lib/bpf/bpf.c| 11 
 tools/lib/bpf/bpf.h|  1 +
 tools/testing/selftests/bpf/Makefile   |  2 +-
 tools/testing/selftests/bpf/test_prog_id.c | 93 ++
 8 files changed, 165 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_prog_id.c

-- 
2.9.3



rhashtable - Cap total number of entries to 2^31

2017-04-26 Thread Herbert Xu
On Tue, Apr 25, 2017 at 10:48:22AM -0400, David Miller wrote:
> From: Florian Westphal 
> Date: Tue, 25 Apr 2017 16:17:49 +0200
> 
> > I'd have less of an issue with this if we'd be talking about
> > something computationally expensive, but this is about storing
> > an extra value inside a struct just to avoid one "shr" in insert path...
> 
> Agreed, this shift is probably filling an available cpu cycle :-)

OK, but we need to have an extra field for another reason anyway.
The problem is that we're not capping the total number of elements
in the hashtable when max_size is not set, this means that nelems
can overflow which will cause havoc with the automatic shrinking
when it tries to fit 2^32 entries into a minimum-sized table.

So I'm taking that hole back for now :)

---8<---
When max_size is not set or if it set to a sufficiently large
value, the nelems counter can overflow.  This would cause havoc
with the automatic shrinking as it would then attempt to fit a
huge number of entries into a tiny hash table.

This patch fixes this by adding max_elems to struct rhashtable
to cap the number of elements.  This is set to 2^31 as nelems is
not a precise count.  This is sufficiently smaller than UINT_MAX
that it should be safe.

When max_size is set max_elems will be lowered to at most twice
max_size as is the status quo.

Signed-off-by: Herbert Xu 

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index ae93b65..45f8936 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -155,6 +155,7 @@ struct rhashtable_params {
  * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
  * @p: Configuration parameters
+ * @max_elems: Maximum number of elements in table
  * @rhlist: True if this is an rhltable
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -165,6 +166,7 @@ struct rhashtable {
atomic_tnelems;
unsigned intkey_len;
struct rhashtable_paramsp;
+   unsigned intmax_elems;
boolrhlist;
struct work_struct  run_work;
struct mutexmutex;
@@ -327,8 +329,7 @@ static inline bool rht_grow_above_100(const struct 
rhashtable *ht,
 static inline bool rht_grow_above_max(const struct rhashtable *ht,
  const struct bucket_table *tbl)
 {
-   return ht->p.max_size &&
-  (atomic_read(&ht->nelems) / 2u) >= ht->p.max_size;
+   return atomic_read(&ht->nelems) >= ht->max_elems;
 }
 
 /* The bucket lock is selected based on the hash and protects mutations
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index f3b82e0..751630b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -961,6 +961,11 @@ int rhashtable_init(struct rhashtable *ht,
if (params->max_size)
ht->p.max_size = rounddown_pow_of_two(params->max_size);
 
+   /* Cap total entries at 2^31 to avoid nelems overflow. */
+   ht->max_elems = 1u << 31;
+   if (ht->p.max_size < ht->max_elems / 2)
+   ht->max_elems = ht->p.max_size * 2;
+
ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
if (params->nelem_hint)

-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: ipsec doesn't route TCP with 4.11 kernel

2017-04-26 Thread Cong Wang
(Cc'ing netdev and IPSec maintainers)

On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman  wrote:
> I'm not sure how to describe this.
>
> 4.11rc2 worked, after that, no.
>
> My ipsec tunnel comes up ok. ICMP works. UDP works. But TCP, the
> sender [which is the ipsec client] does not reach the destination.
>
> Its not a routing rule issue (since ICMP/UDP work).
> Its not a traffic selector just selecting TCP (I think) since ipsec
> status shows just a subnet, no protocol.
>
> Using tcpdump:
> # iptables -t mangle -I PREROUTING -m policy --pol ipsec --dir in -j
> NFLOG --nflog-group 5
> # iptables -t mangle -I POSTROUTING -m policy --pol ipsec --dir out -j
> NFLOG --nflog-group 5
> # tcpdump -s 0 -n -i nflog:5
>
> I see that it thinks it is sending the TCP packet, but the server end
> does not receive.
>
> Does anyone have any suggestion to try?
>
> strongswan is 5.5.1 [on ubuntu 17.04]
> kernel is 4.11.0-041100rc8-generic
>
> My rightsubnet is
> rightsubnet = 192.168.128.0/17,10.0.0.0/8
>
> so no specific protocol selected, the result is:
> CHILD_SA sv{1} established with SPIs c05f1b6c_i 0d58815a_o and TS
> 192.168.130.4/32 === 10.0.0.0/8 192.168.128.0/17
>
> I tried changing charondebug net=3, but i'm not sure how to interpret
> the output:
>
> Apr 25 21:06:34 office charon: 04[NET] received packet: from
> 64.7.137.180[4500] to 172.16.0.8[4500] (80 bytes)
> Apr 25 21:06:34 office charon: 04[ENC] parsed INFORMATIONAL request 4 [ ]
> Apr 25 21:06:34 office charon: 04[ENC] generating INFORMATIONAL response 4 [ ]
> Apr 25 21:06:34 office charon: 04[NET] sending packet: from
> 172.16.0.8[4500] to 64.7.137.180[4500] (80 bytes)


Re: [PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access

2017-04-26 Thread Iyappan Subramanian
On Wed, Apr 26, 2017 at 5:06 PM, Florian Fainelli  wrote:
> On 04/26/2017 04:38 PM, Iyappan Subramanian wrote:
>> From: Quan Nguyen 
>>
>> This patch adds lock to protect indirect mac access sequence.
>>
>> Signed-off-by: Quan Nguyen 
>> Signed-off-by: Iyappan Subramanian 
>> ---
>>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++
>>  drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 1 +
>>  drivers/net/ethernet/apm/xgene/xgene_enet_main.h  | 1 +
>>  drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++-
>>  drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++
>>  5 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
>> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> index 2a835e0..3697ba7 100644
>> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> @@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct 
>> xgene_enet_pdata *pdata,
>>   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
>>   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
>>
>> + spin_lock(&pdata->mac_lock);
>>   if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
>>   netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n",
>>  rd_addr);
>> + spin_unlock(&pdata->mac_lock);
>
> Why not fold the spinlock manipulation within xgenet_enet_rd_indirect()?
> That way the caller does not have to know that acquiring the spinlock is
> required and you avoid potential bugs in the future where you are
> missing the spinlock on indirect accesses?

Thanks.  We'll do that and post the next version.

> --
> Florian


[PATCH net-next] tcp: tcp_rack_reo_timeout() must update tp->tcp_mstamp

2017-04-26 Thread Eric Dumazet
From: Eric Dumazet 

I wrongly assumed tp->tcp_mstamp was up to date at the time
tcp_rack_reo_timeout() was called.

It is not true, since we only update tcp->tcp_mstamp when receiving
a packet (as initially done in commit 69e996c58a35 ("tcp: add
tp->tcp_mstamp field")

tcp_rack_reo_timeout() being called by a timer and not an incoming
packet, we need to refresh tp->tcp_mstamp

Fixes: 7c1c7308592f ("tcp: do not pass timestamp to tcp_rack_detect_loss()")
Signed-off-by: Eric Dumazet 
Cc: Soheil Hassas Yeganeh 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp_recovery.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 
cd72b3d3879e88181c8a4639f0334a24e4cda852..362b8c75bfab44cf87c2a01398a146a271bc1119
 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -166,6 +166,7 @@ void tcp_rack_reo_timeout(struct sock *sk)
u32 timeout, prior_inflight;
 
prior_inflight = tcp_packets_in_flight(tp);
+   skb_mstamp_get(&tp->tcp_mstamp);
tcp_rack_detect_loss(sk, &timeout);
if (prior_inflight != tcp_packets_in_flight(tp)) {
if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) {




Re: [PATCH v1 net-next 3/6] net: add new control message for incoming HW-timestamped packets

2017-04-26 Thread kbuild test robot
Hi Miroslav,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Miroslav-Lichvar/Extend-socket-timestamping-API/20170427-093243
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   net/socket.c: In function 'put_ts_pktinfo':
>> net/socket.c:680:28: error: 'SCM_TIMESTAMPING_PKTINFO' undeclared (first use 
>> in this function)
 put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO,
   ^
   net/socket.c:680:28: note: each undeclared identifier is reported only once 
for each function it appears in

vim +/SCM_TIMESTAMPING_PKTINFO +680 net/socket.c

   674  orig_dev = dev_get_by_napi_id(skb->napi_id);
   675  if (!orig_dev)
   676  return;
   677  
   678  ts_pktinfo.if_index = orig_dev->ifindex;
   679  ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb);
 > 680  put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO,
   681   sizeof(ts_pktinfo), &ts_pktinfo);
   682  #endif
   683  }

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2 07/21] crypto: shash, caam: Make use of the new sg_map helper function

2017-04-26 Thread Herbert Xu
On Tue, Apr 25, 2017 at 12:20:54PM -0600, Logan Gunthorpe wrote:
> Very straightforward conversion to the new function in the caam driver
> and shash library.
> 
> Signed-off-by: Logan Gunthorpe 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> ---
>  crypto/shash.c| 9 ++---
>  drivers/crypto/caam/caamalg.c | 8 +++-
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/crypto/shash.c b/crypto/shash.c
> index 5e31c8d..5914881 100644
> --- a/crypto/shash.c
> +++ b/crypto/shash.c
> @@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, 
> struct shash_desc *desc)
>   if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) {
>   void *data;
>  
> - data = kmap_atomic(sg_page(sg));
> - err = crypto_shash_digest(desc, data + offset, nbytes,
> + data = sg_map(sg, 0, SG_KMAP_ATOMIC);
> + if (IS_ERR(data))
> + return PTR_ERR(data);
> +
> + err = crypto_shash_digest(desc, data, nbytes,
> req->result);
> - kunmap_atomic(data);
> + sg_unmap(sg, data, 0, SG_KMAP_ATOMIC);
>   crypto_yield(desc->flags);
>   } else
>   err = crypto_shash_init(desc) ?:

Nack.  This is an optimisation for the special case of a single
SG list entry.  In fact in the common case the kmap_atomic should
disappear altogether in the no-highmem case.  So replacing it
with sg_map is not acceptable.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next] net: vrf: Do not allow looback to be moved to a VRF

2017-04-26 Thread Greg Rose
On Wed, 2017-04-26 at 07:58 -0700, David Ahern wrote:
> Moving the loopback into a VRF breaks networking for the default VRF.
> Since the VRF device is the loopback for VRF domains, there is no
> reason to move the loopback. Given the repercussions, block attempts
> to set lo into a VRF.
> 
> Signed-off-by: David Ahern 
> ---
>  drivers/net/vrf.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index aa5d30428bba..ceda5861da78 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -877,6 +877,12 @@ static int do_vrf_add_slave(struct net_device *dev, 
> struct net_device *port_dev)
>  {
>   int ret;
>  
> + /* do not allow loopback device to be enslaved to a VRF.
> +  * The vrf device acts as the loopback for the vrf.
> +  */
> + if (port_dev == dev_net(dev)->loopback_dev)
> + return -EOPNOTSUPP;
> +
>   port_dev->priv_flags |= IFF_L3MDEV_SLAVE;
>   ret = netdev_master_upper_dev_link(port_dev, dev, NULL, NULL);
>   if (ret < 0)

I think that's a great idea.

Reviewed-by: Greg Rose 



[PATCH net v4 3/3] net: hns: fixed bug that skb used after kfree

2017-04-26 Thread Yankejian
From: lipeng 

There is KASAN warning which turn out it's a skb used after free:

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635]  alloc_debug_processing+0x18c/0x1a0
[17659.117208]  __slab_alloc+0x52c/0x560
[17659.120909]  kmem_cache_alloc_node+0xac/0x2c0
[17659.125309]  __alloc_skb+0x6c/0x260
[17659.128837]  tcp_send_ack+0x8c/0x280
[17659.132449]  __tcp_ack_snd_check+0x9c/0xf0
[17659.136587]  tcp_rcv_established+0x5a4/0xa70
[17659.140899]  tcp_v4_do_rcv+0x27c/0x620
[17659.144687]  tcp_prequeue_process+0x108/0x170
[17659.149085]  tcp_recvmsg+0x940/0x1020
[17659.152787]  inet_recvmsg+0x124/0x180
[17659.156488]  sock_recvmsg+0x64/0x80
[17659.160012]  SyS_recvfrom+0xd8/0x180
[17659.163626]  __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000]  free_debug_processing+0x1d4/0x2c0
[17659.178486]  __slab_free+0x240/0x390
[17659.182100]  kmem_cache_free+0x24c/0x270
[17659.186062]  kfree_skbmem+0xa0/0xb0
[17659.189587]  __kfree_skb+0x28/0x40
[17659.193025]  napi_gro_receive+0x168/0x1c0
[17659.197074]  hns_nic_rx_up_pro+0x58/0x90
[17659.201038]  hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352]  hns_nic_common_poll+0x94/0x140
[17659.209576]  net_rx_action+0x458/0x5e0
[17659.213363]  __do_softirq+0x1b8/0x480
[17659.217062]  run_ksoftirqd+0x64/0x80
[17659.220679]  smpboot_thread_fn+0x224/0x310
[17659.224821]  kthread+0x150/0x170
[17659.228084]  ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490]  __slab_alloc+0x52c/0x560
[17751.084188]  kmem_cache_alloc+0x244/0x280
[17751.088238]  __build_skb+0x40/0x150
[17751.091764]  build_skb+0x28/0x100
[17751.095115]  __alloc_rx_skb+0x94/0x150
[17751.098900]  __napi_alloc_skb+0x34/0x90
[17751.102776]  hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097]  hns_nic_common_poll+0x94/0x140
[17751.111333]  net_rx_action+0x458/0x5e0
[17751.115123]  __do_softirq+0x1b8/0x480
[17751.118823]  run_ksoftirqd+0x64/0x80
[17751.122437]  smpboot_thread_fn+0x224/0x310
[17751.126575]  kthread+0x150/0x170
[17751.129838]  ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951]  free_debug_processing+0x1d4/0x2c0
[17751.144436]  __slab_free+0x240/0x390
[17751.148051]  kmem_cache_free+0x24c/0x270
[17751.152014]  kfree_skbmem+0xa0/0xb0
[17751.155543]  __kfree_skb+0x28/0x40
[17751.159022]  napi_gro_receive+0x168/0x1c0
[17751.163074]  hns_nic_rx_up_pro+0x58/0x90
[17751.167041]  hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358]  hns_nic_common_poll+0x94/0x140
[17751.175585]  net_rx_action+0x458/0x5e0
[17751.179373]  __do_softirq+0x1b8/0x480
[17751.183076]  run_ksoftirqd+0x64/0x80
[17751.186691]  smpboot_thread_fn+0x224/0x310
[17751.190826]  kthread+0x150/0x170
[17751.194093]  ret_from_fork+0x10/0x40

in drivers/net/ethernet/hisilicon/hns/hns_enet.c

static netdev_tx_t hns_nic_net_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
struct hns_nic_priv *priv = netdev_priv(ndev);
int ret;

assert(skb->queue_mapping < ndev->ae_handle->q_num);
ret = hns_nic_net_xmit_hw(ndev, skb,
  &tx_ring_data(priv, skb->queue_mapping));
if (ret == NETDEV_TX_OK) {
netif_trans_update(ndev);
ndev->stats.tx_bytes += skb->len;
ndev->stats.tx_packets++;
}
 return (netdev_tx_t)ret;
}

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK:

out_err_tx_ok:

dev_kfree_skb_any(skb);
return NETDEV_TX_OK;

This patch fix this bug.

Reported-by: Jun He 
Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 22 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  6 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index fca37e2..1e04df6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -300,9 +300,9 @@ static void fill_tso_desc(struct hnae_ring *ring, void 
*priv,
 mtu);
 }
 
-int hns_nic_net_xmit_hw(struct net_device *ndev,
-   struct sk_buff *skb,
-   struct hns_nic_ring_data *ring_data)
+netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev,
+  

[PATCH net v4 0/3] net: hns: bug fix for HNS driver

2017-04-26 Thread Yankejian
From: lipeng 

The first two patches [1/3] [2/3] of this serie add support defered
dsaf probe when mdio and mbigen module is not insmod. The third
patch [3/3] fixes a bug that skb still be used after freed, which will
cuase panic.

For more details, please refer to individual patch.

change log:
V3 -> V4:
1. Delete redundant commit message;
2. add Reviewed-by: Matthias Brugger ;

V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;

V1 -> V2:
1. Return appropriate errno in hns_mac_register_phy;

lipeng (3):
  net: hns: support deferred probe when can not obtain irq
  net: hns: support deferred probe when no mdio
  net: hns: fixed bug that skb used after kfree

 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  4 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  8 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 22 ++---
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  6 ++--
 6 files changed, 57 insertions(+), 24 deletions(-)

-- 
1.9.1



[PATCH net v4 2/3] net: hns: support deferred probe when no mdio

2017-04-26 Thread Yankejian
From: lipeng 

In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio
module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

We check for probe deferral in the mac init, so we not init DSAF
when there is no mdio, and free all resource, to later learn that
we need to defer the probe.

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Reviewed-by: Matthias Brugger 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index bdd8cdd..8c00e09 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -735,6 +735,8 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
rc = phy_device_register(phy);
if (rc) {
phy_device_free(phy);
+   dev_err(&mdio->dev, "registered phy fail at address %i\n",
+   addr);
return -ENODEV;
}
 
@@ -745,7 +747,7 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
return 0;
 }
 
-static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
+static int hns_mac_register_phy(struct hns_mac_cb *mac_cb)
 {
struct acpi_reference_args args;
struct platform_device *pdev;
@@ -755,24 +757,39 @@ static void hns_mac_register_phy(struct hns_mac_cb 
*mac_cb)
 
/* Loop over the child nodes and register a phy_device for each one */
if (!to_acpi_device_node(mac_cb->fw_port))
-   return;
+   return -ENODEV;
 
rc = acpi_node_get_property_reference(
mac_cb->fw_port, "mdio-node", 0, &args);
if (rc)
-   return;
+   return rc;
 
addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port);
if (addr < 0)
-   return;
+   return addr;
 
/* dev address in adev */
pdev = hns_dsaf_find_platform_device(acpi_fwnode_handle(args.adev));
+   if (!pdev) {
+   dev_err(mac_cb->dev, "mac%d mdio pdev is NULL\n",
+   mac_cb->mac_id);
+   return  -EINVAL;
+   }
+
mii_bus = platform_get_drvdata(pdev);
+   if (!mii_bus) {
+   dev_err(mac_cb->dev,
+   "mac%d mdio is NULL, dsaf will probe again later\n",
+   mac_cb->mac_id);
+   return -EPROBE_DEFER;
+   }
+
rc = hns_mac_register_phydev(mii_bus, mac_cb, addr);
if (!rc)
dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n",
mac_cb->mac_id, addr);
+
+   return rc;
 }
 
 #define MAC_MEDIA_TYPE_MAX_LEN 16
@@ -793,7 +810,7 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
  *@np:device node
  * return: 0 --success, negative --fail
  */
-static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
+static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
 {
struct device_node *np;
struct regmap *syscon;
@@ -903,7 +920,15 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
}
}
} else if (is_acpi_node(mac_cb->fw_port)) {
-   hns_mac_register_phy(mac_cb);
+   ret = hns_mac_register_phy(mac_cb);
+   /*
+* Mac can work well if there is phy or not.If the port don't
+* connect with phy, the return value will be ignored. Only
+* when there is phy but can't find mdio bus, the return value
+* will be handled.
+*/
+   if (ret == -EPROBE_DEFER)
+   return ret;
} else {
dev_err(mac_cb->dev, "mac%d cannot find phy node\n",
mac_cb->mac_id);
@@ -1065,6 +1090,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
dsaf_dev->mac_cb[port_id] = mac_cb;
}
}
+
/* init mac_cb for all port */
for (port_id = 0; port_id < max_port_num; port_id++) {
mac_cb = dsaf_dev->mac_cb[port_id];
@@ -1074,6 +1100,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
ret = hns_mac_get_cfg(dsaf_dev, mac_cb);
if (ret)
return ret;
+
ret = hns_mac_init_ex(mac_cb);
if (ret)
return ret;
-- 
1.9.1



[PATCH net v4 1/3] net: hns: support deferred probe when can not obtain irq

2017-04-26 Thread Yankejian
From: lipeng 

In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to mbigen hw module.
The mbigen module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Reviewed-by: Matthias Brugger 
---
change log:
V3 -> V4:
1. Delete redundant commit message;
2. add Reviewed-by: Matthias Brugger ;

V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 6ea8722..a41cf95 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
 
hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
 
-   hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+   ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+   if (ret)
+   goto get_rcb_cfg_fail;
}
 
for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index f0ed80d6..673a5d3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb 
*rcb_common)
  *hns_rcb_get_cfg - get rcb config
  *@rcb_common: rcb common device
  */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 {
struct ring_pair_cb *ring_pair_cb;
u32 i;
@@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
  platform_get_irq(pdev, base_irq_idx + i * 3);
+   if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
+   (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
+   return -EPROBE_DEFER;
+
ring_pair_cb->q.phy_base =
RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
hns_rcb_ring_pair_get_cfg(ring_pair_cb);
}
+
+   return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
index 99b4e1b..3d7b484 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -110,7 +110,7 @@ struct rcb_common_cb {
 void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
 int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
 void hns_rcb_start(struct hnae_queue *q, u32 val);
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
 void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
u16 *max_vfn, u16 *max_q_per_vf);
 
-- 
1.9.1



Re: more test_progs...

2017-04-26 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 26 Apr 2017 16:49:35 -0700

> It's only needed for test_pkt_access.c test,
> since it's being fancy and doing iph->ihl * 4.

It is going to be a common idiom in anything looking at
transport headers, no?

> Also such tcp->urg_ptr access into packed struct is more efficient
> after JITing then sparc's own load_half_unaligned in asm, since it's
> done inline and doesn't need a call.
> 
> Note that such tcphdr workaround is not necessary for more
> real programs: test_l4lb.c and test_xdp.c, since they do:
> if (iph->ihl != 5)
>return drop;

Hmmm...

> Does sparc64 have some special instructions like that?

Unfortunately not.

I think we need to seriously consider tracking "Pointer
incremented by power of 2 N" and stuff like that.

I know it's not easy, but it is necessary.

Having stuff like the packed thing above is just a completely
unnecessary detail to expose to users writing these programs.



Re: [PATCH net v3 1/3] net: hns: support deferred probe when can not obtain irq

2017-04-26 Thread lipeng (Y)



On 2017/4/26 22:03, Matthias Brugger wrote:



On 26/04/17 09:00, Yankejian wrote:

From: lipeng 

In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to mbigen hw module.
The mbigen module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

We check for probe deferral in the hw layer probe, so we not
probe into the main layer and memories, etc., to later learn
that we need to defer the probe.


This paragraph does not hold any more and should be deleted.

OK , will delete it.


Other then this:
Reviewed-by: Matthias Brugger 



Thanks for your review



Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
---
V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c

index 6ea8722..a41cf95 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)

 hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);

-hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+if (ret)
+goto get_rcb_cfg_fail;
 }

 for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c

index f0ed80d6..673a5d3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct 
rcb_common_cb *rcb_common)

  *hns_rcb_get_cfg - get rcb config
  *@rcb_common: rcb common device
  */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 {
 struct ring_pair_cb *ring_pair_cb;
 u32 i;
@@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb 
*rcb_common)

 ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
 is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
   platform_get_irq(pdev, base_irq_idx + i * 3);
+if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == 
-EPROBE_DEFER) ||

+(ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
+return -EPROBE_DEFER;
+
 ring_pair_cb->q.phy_base =
 RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
 hns_rcb_ring_pair_get_cfg(ring_pair_cb);
 }
+
+return 0;
 }

 /**
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h

index 99b4e1b..3d7b484 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -110,7 +110,7 @@ struct rcb_common_cb {
 void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 
comm_index);

 int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
 void hns_rcb_start(struct hnae_queue *q, u32 val);
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
 void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
 u16 *max_vfn, u16 *max_q_per_vf);



.





Re: [PATCH net-next] bpf: restore skb->sk before pskb_trim() call

2017-04-26 Thread Willem de Bruijn
On Wed, Apr 26, 2017 at 4:53 PM, Alexei Starovoitov
 wrote:
> On Wed, Apr 26, 2017 at 09:09:23AM -0700, Eric Dumazet wrote:
>> From: Eric Dumazet 
>>
>> While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE()
>> in skb_try_coalesce() reported by Andrey, I found that we had an skb
>> with skb->sk set but no skb->destructor.
>>
>> This invalidated heuristic found in commit 158f323b9868 ("net: adjust
>> skb->truesize in pskb_expand_head()") and in cited patch.
>>
>> Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should
>> restrain the temporary setting to a minimal section.
>>
>> [1] https://patchwork.ozlabs.org/patch/755570/
>> net: adjust skb->truesize in ___pskb_trim()
>>
>> Fixes: 8f917bba0042 ("bpf: pass sk to helper functions")
>> Signed-off-by: Eric Dumazet 
>> Cc: Willem de Bruijn 
>> Cc: Andrey Konovalov 
>
> Ahh. Thanks for the fix.
> Acked-by: Alexei Starovoitov 

Acked-by: Willem de Bruijn 

Thanks, Eric.


[PATCH net] tcp: do not underestimate skb->truesize in tcp_trim_head()

2017-04-26 Thread Eric Dumazet
From: Eric Dumazet 

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket over loopback interface.

I believe one issue with looped skbs is that tcp_trim_head() can end up
producing skb with under estimated truesize.

It hardly matters for normal conditions, since packets sent over
loopback are never truncated.

Bytes trimmed from skb->head should not change skb truesize, since
skb->head is not reallocated.

Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
---
 net/ipv4/tcp_output.c |   19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 
c3c082ed38796f65948e7a1042e37813b71d5abf..a85d863c44196e60fd22e25471cf773e72d2c133
 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1267,7 +1267,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, 
u32 len,
  * eventually). The difference is that pulled data not copied, but
  * immediately discarded.
  */
-static void __pskb_trim_head(struct sk_buff *skb, int len)
+static int __pskb_trim_head(struct sk_buff *skb, int len)
 {
struct skb_shared_info *shinfo;
int i, k, eat;
@@ -1277,7 +1277,7 @@ static void __pskb_trim_head(struct sk_buff *skb, int len)
__skb_pull(skb, eat);
len -= eat;
if (!len)
-   return;
+   return 0;
}
eat = len;
k = 0;
@@ -1303,23 +1303,28 @@ static void __pskb_trim_head(struct sk_buff *skb, int 
len)
skb_reset_tail_pointer(skb);
skb->data_len -= len;
skb->len = skb->data_len;
+   return len;
 }
 
 /* Remove acked data from a packet in the transmit queue. */
 int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
 {
+   u32 delta_truesize;
+
if (skb_unclone(skb, GFP_ATOMIC))
return -ENOMEM;
 
-   __pskb_trim_head(skb, len);
+   delta_truesize = __pskb_trim_head(skb, len);
 
TCP_SKB_CB(skb)->seq += len;
skb->ip_summed = CHECKSUM_PARTIAL;
 
-   skb->truesize-= len;
-   sk->sk_wmem_queued   -= len;
-   sk_mem_uncharge(sk, len);
-   sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
+   if (delta_truesize) {
+   skb->truesize  -= delta_truesize;
+   sk->sk_wmem_queued -= delta_truesize;
+   sk_mem_uncharge(sk, delta_truesize);
+   sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
+   }
 
/* Any change of skb->len requires recalculation of tso factor. */
if (tcp_skb_pcount(skb) > 1)




Re: [PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access

2017-04-26 Thread Florian Fainelli
On 04/26/2017 04:38 PM, Iyappan Subramanian wrote:
> From: Quan Nguyen 
> 
> This patch adds lock to protect indirect mac access sequence.
> 
> Signed-off-by: Quan Nguyen 
> Signed-off-by: Iyappan Subramanian 
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++
>  drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 1 +
>  drivers/net/ethernet/apm/xgene/xgene_enet_main.h  | 1 +
>  drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++-
>  drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++
>  5 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> index 2a835e0..3697ba7 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> @@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct 
> xgene_enet_pdata *pdata,
>   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
>   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
>  
> + spin_lock(&pdata->mac_lock);
>   if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
>   netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n",
>  rd_addr);
> + spin_unlock(&pdata->mac_lock);

Why not fold the spinlock manipulation within xgenet_enet_rd_indirect()?
That way the caller does not have to know that acquiring the spinlock is
required and you avoid potential bugs in the future where you are
missing the spinlock on indirect accesses?
-- 
Florian


Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping

2017-04-26 Thread Willem de Bruijn
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 81ef53f..42bff22 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3300,8 +3300,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
>
>  static inline void sw_tx_timestamp(struct sk_buff *skb)
>  {
> -   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> -   !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> +   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
> skb_tstamp_tx(skb, NULL);
>  }
>
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 8fcae35..d251972 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -27,8 +27,9 @@ enum {
> SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
> SOF_TIMESTAMPING_OPT_STATS = (1<<12),
> SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
> +   SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>
> -   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO,
> +   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TX_SWHW,
> SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>  SOF_TIMESTAMPING_LAST
>  };
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 58604c1..db5aa19 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3874,6 +3874,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> if (!sk)
> return;
>
> +   if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
> +   skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
> +   return;
> +

This check should only happen for software transmit timestamps, so simpler to
revise the check in sw_tx_timestamp above to

  if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
-!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
+  (!(skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)) ||
+  (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)

> diff --git a/net/socket.c b/net/socket.c
> index 68b9304..14b7688 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct 
> sock *sk,
> empty = 0;
> if (shhwtstamps &&
> (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> +   (empty || !skb_is_err_queue(skb)) &&
> ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {

I find skb->tstamp == 0 easier to understand than the condition on empty.

Indeed, this is so non-obvious that I would suggest another helper function
skb_is_hwtx_tstamp with a concise comment about the race condition
between tx software and hardware timestamps (as in the last sentence of
the commit message).


[PATCH] netvsc: make sure napi enabled before vmbus_open

2017-04-26 Thread Stephen Hemminger
This fixes a race where vmbus callback for new packet arriving
could occur before NAPI is initialized. Happens more on WS2008
which takes longer to setup channel.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c   | 8 +---
 drivers/net/hyperv/rndis_filter.c | 4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f99651c03e0a..8150febb86d6 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1323,6 +1323,10 @@ int netvsc_device_add(struct hv_device *device,
nvchan->channel = device->channel;
}
 
+   /* Enable NAPI handler before init callbacks */
+   netif_napi_add(ndev, &net_device->chan_table[0].napi,
+  netvsc_poll, NAPI_POLL_WEIGHT);
+
/* Open the channel */
ret = vmbus_open(device->channel, ring_size * PAGE_SIZE,
 ring_size * PAGE_SIZE, NULL, 0,
@@ -1330,6 +1334,7 @@ int netvsc_device_add(struct hv_device *device,
 net_device->chan_table);
 
if (ret != 0) {
+   netif_napi_del(&net_device->chan_table[0].napi);
netdev_err(ndev, "unable to open channel: %d\n", ret);
goto cleanup;
}
@@ -1337,9 +1342,6 @@ int netvsc_device_add(struct hv_device *device,
/* Channel is opened */
netdev_dbg(ndev, "hv_netvsc channel opened successfully\n");
 
-   /* Enable NAPI handler for init callbacks */
-   netif_napi_add(ndev, &net_device->chan_table[0].napi,
-  netvsc_poll, NAPI_POLL_WEIGHT);
napi_enable(&net_device->chan_table[0].napi);
 
/* Writing nvdev pointer unlocks netvsc_send(), make sure chn_table is
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index ab92c3c95951..a9e9fb4bb9c5 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1017,8 +1017,10 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
 netvsc_channel_cb, nvchan);
if (ret == 0)
napi_enable(&nvchan->napi);
-   else
+   else {
+   netif_napi_del(&nvchan->napi);
netdev_err(ndev, "sub channel open failed (%d)\n", ret);
+   }
 
if (refcount_dec_and_test(&nvscdev->sc_offered))
complete(&nvscdev->channel_init_wait);
-- 
2.11.0



Re: more test_progs...

2017-04-26 Thread Alexei Starovoitov

On 4/26/17 7:55 AM, David Miller wrote:

From: Daniel Borkmann 
Date: Wed, 26 Apr 2017 10:14:42 +0200


On 04/26/2017 05:42 AM, Alexei Starovoitov wrote:

That sucks for sparc, of course. I don't have good suggestions for
workaround other than marking tcphdr as packed :(
 From code efficiency point of view it still will be faster than
LD_ABS insn.


Plus, ld_abs would also only work on skbs right now, and would
need JIT support for XDP. But it's also cumbersome to use with
f.e. IPv6, etc, so should not be encouraged, imo.

One other option for !HAVE_EFFICIENT_UNALIGNED_ACCESS archs could
be to provide a bpf_skb_load_bytes() helper equivalent for XDP,
where you eventually do the memcpy() inside. We could see to inline
the helper itself to optimize it slightly.


We have to do something that works transparently and always,
regardless of whether HAVE_EFFICIENT_UNALIGNED_ACCESS is in
play or not.

And no, marking objects with __packed is not the answer.


I'm not suggesting to mark everything as __packed.
Why the following is not the answer?
index fd1e0832d409..c215dffd7189 100644
--- a/tools/testing/selftests/bpf/test_pkt_access.c
+++ b/tools/testing/selftests/bpf/test_pkt_access.c
@@ -19,13 +19,52 @@
 #define barrier() __asm__ __volatile__("": : :"memory")
 int _version SEC("version") = 1;

+struct __tcphdr {
+   __u16  source;
...
+   __u16  window;
+   __u16 check;
+   __u16  urg_ptr;
+}
+#if defined(__sparc__) || defined(__s390__)
+__packed
+#endif
+;
 SEC("test1")
 int process(struct __sk_buff *skb)
 {
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
struct ethhdr *eth = (struct ethhdr *)(data);
-   struct tcphdr *tcp = NULL;
+   struct __tcphdr *tcp = NULL;
__u8 proto = 255;

It's only needed for test_pkt_access.c test,
since it's being fancy and doing iph->ihl * 4.

Also such tcp->urg_ptr access into packed struct is more efficient
after JITing then sparc's own load_half_unaligned in asm, since it's
done inline and doesn't need a call.

Note that such tcphdr workaround is not necessary for more
real programs: test_l4lb.c and test_xdp.c, since they do:
if (iph->ihl != 5)
   return drop;

Another idea:
x64, arm64, ppc have efficient unaligned.
s390 and mips64 have special instructions to do unaligned
access efficiently and we can make verifier convert unknown-align
load/stores into special internal load/stores, so they can be
executed differently by interpreter and by JITs.
Does sparc64 have some special instructions like that?



Re: [PATCH net-next v1] net: ipv6: make sure multicast packets are not forwarded beyond the different scopes

2017-04-26 Thread kbuild test robot
Hi Donatas,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Donatas-Abraitis/net-ipv6-make-sure-multicast-packets-are-not-forwarded-beyond-the-different-scopes/20170426-180846
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net//ipv6/ip6_input.c: In function 'ipv6_rcv':
>> net//ipv6/ip6_input.c:174:10: error: expected ')' before 'goto'
 goto err;
 ^~~~
>> net//ipv6/ip6_input.c:225:1: error: expected expression before '}' token
}
^
>> net//ipv6/ip6_input.c:166:3: error: label 'err' used but not defined
  goto err;
  ^~~~
>> net//ipv6/ip6_input.c:95:3: error: label 'drop' used but not defined
  goto drop;
  ^~~~
   net//ipv6/ip6_input.c:77:6: warning: unused variable 'pkt_len' 
[-Wunused-variable]
 u32 pkt_len;
 ^~~
   net//ipv6/ip6_input.c:225:1: warning: control reaches end of non-void 
function [-Wreturn-type]
}
^

vim +174 net//ipv6/ip6_input.c

89  
90  __IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_IN, skb->len);
91  
92  if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL ||
93  !idev || unlikely(idev->cnf.disable_ipv6)) {
94  __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
  > 95  goto drop;
96  }
97  
98  memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
99  
   100  /*
   101   * Store incoming device index. When the packet will
   102   * be queued, we cannot refer to skb->dev anymore.
   103   *
   104   * BTW, when we send a packet for our own local address on a
   105   * non-loopback interface (e.g. ethX), it is being delivered
   106   * via the loopback interface (lo) here; skb->dev = 
loopback_dev.
   107   * It, however, should be considered as if it is being
   108   * arrived via the sending interface (ethX), because of the
   109   * nature of scoping architecture. --yoshfuji
   110   */
   111  IP6CB(skb)->iif = skb_valid_dst(skb) ? 
ip6_dst_idev(skb_dst(skb))->dev->ifindex : dev->ifindex;
   112  
   113  if (unlikely(!pskb_may_pull(skb, sizeof(*hdr
   114  goto err;
   115  
   116  hdr = ipv6_hdr(skb);
   117  
   118  if (hdr->version != 6)
   119  goto err;
   120  
   121  __IP6_ADD_STATS(net, idev,
   122  IPSTATS_MIB_NOECTPKTS +
   123  (ipv6_get_dsfield(hdr) & INET_ECN_MASK),
   124  max_t(unsigned short, 1, 
skb_shinfo(skb)->gso_segs));
   125  /*
   126   * RFC4291 2.5.3
   127   * The loopback address must not be used as the source address 
in IPv6
   128   * packets that are sent outside of a single node. [..]
   129   * A packet received on an interface with a destination address
   130   * of loopback must be dropped.
   131   */
   132  if ((ipv6_addr_loopback(&hdr->saddr) ||
   133   ipv6_addr_loopback(&hdr->daddr)) &&
   134   !(dev->flags & IFF_LOOPBACK))
   135  goto err;
   136  
   137  /* RFC4291 Errata ID: 3480
   138   * Interface-Local scope spans only a single interface on a
   139   * node and is useful only for loopback transmission of
   140   * multicast.  Packets with interface-local scope received
   141   * from another node must be discarded.
   142   */
   143  if (!(skb->pkt_type == PACKET_LOOPBACK ||
   144dev->flags & IFF_LOOPBACK) &&
   145  ipv6_addr_is_multicast(&hdr->daddr) &&
   146  IPV6_ADDR_MC_SCOPE(&hdr->daddr) == 1)
   147  goto err;
   148  
   149  /* If enabled, drop unicast packets that were encapsulated in 
link-layer
   150   * multicast or broadcast to protected against the so-called 
"hole-196"
   151   * attack in 802.11 wireless.
   152   */
   153  if (!ipv6_addr_is_multicast(&hdr->daddr) &&
   154  (skb->pkt_type == PACKET_BROADCAST ||
   155   skb->pkt_type == PACKET_MULTICAST) &&
   156  idev->cnf.drop_unicast_in_l2_multicast)
   157  goto err;
   158  
   159  /* RFC4291 2.7
   160   * Nodes must not originate a packet to a multicast address 
whose scope
   161   * field contains the reserved value 0; 

[PATCH net-next] Fix inaccurate helper function description

2017-04-26 Thread Chenbo Feng
From: Chenbo Feng 

The description inside uapi/linux/bpf.h about bpf_get_socket_uid
helper function is no longer valid. It returns overflowuid rather
than 0 when failed.

Signed-off-by: Chenbo Feng 
---
 include/uapi/linux/bpf.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e553529..945a1f5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -481,8 +481,7 @@ union bpf_attr {
  * u32 bpf_get_socket_uid(skb)
  * Get the owner uid of the socket stored inside sk_buff.
  * @skb: pointer to skb
- * Return: uid of the socket owner on success or 0 if the socket pointer
- * inside sk_buff is NULL
+ * Return: uid of the socket owner on success or overflowuid if failed.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
-- 
2.7.4



[PATCH net-next 5/5] bpf: provide a generic macro for percpu values for selftests

2017-04-26 Thread Daniel Borkmann
To overcome bugs as described and fixed in 89087c456fb5 ("bpf: Fix
values type used in test_maps"), provide a generic BPF_DECLARE_PERCPU()
and bpf_percpu() accessor macro for all percpu map values used in
tests.

Declaring variables works as follows (also works for structs):

  BPF_DECLARE_PERCPU(uint32_t, my_value);

They can then be accessed normally as uint32_t type through:

  bpf_percpu(my_value, )

For example:

  bpf_percpu(my_value, 0)++;

Implicitly, we make sure that the passed type is allocated and aligned
by gcc at least on a 8-byte boundary, so that it works together with
the map lookup/update syscall for percpu maps. We use it as a usage
example in test_maps, so that others are free to adapt this into their
code when necessary.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/bpf_util.h  |  7 +++
 tools/testing/selftests/bpf/test_maps.c | 37 ++---
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_util.h 
b/tools/testing/selftests/bpf/bpf_util.h
index 7de2796..369e7d7 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -54,4 +54,11 @@ static inline unsigned int bpf_num_possible_cpus(void)
return possible_cpus;
 }
 
+#define __bpf_percpu_val_align __attribute__((__aligned__(8)))
+
+#define BPF_DECLARE_PERCPU(type, name) \
+   struct { type v; /* padding */ } __bpf_percpu_val_align \
+   name[bpf_num_possible_cpus()]
+#define bpf_percpu(name, cpu) name[(cpu)].v
+
 #endif /* __BPF_UTIL__ */
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index a977c4f..9331452 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -137,20 +137,20 @@ static void test_hashmap_sizes(int task, void *data)
 static void test_hashmap_percpu(int task, void *data)
 {
unsigned int nr_cpus = bpf_num_possible_cpus();
-   long long value[nr_cpus];
+   BPF_DECLARE_PERCPU(long, value);
long long key, next_key, first_key;
int expected_key_mask = 0;
int fd, i;
 
fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
-   sizeof(value[0]), 2, map_flags);
+   sizeof(bpf_percpu(value, 0)), 2, map_flags);
if (fd < 0) {
printf("Failed to create hashmap '%s'!\n", strerror(errno));
exit(1);
}
 
for (i = 0; i < nr_cpus; i++)
-   value[i] = i + 100;
+   bpf_percpu(value, i) = i + 100;
 
key = 1;
/* Insert key=1 element. */
@@ -170,8 +170,9 @@ static void test_hashmap_percpu(int task, void *data)
/* Check that key=1 can be found. Value could be 0 if the lookup
 * was run from a different CPU.
 */
-   value[0] = 1;
-   assert(bpf_map_lookup_elem(fd, &key, value) == 0 && value[0] == 100);
+   bpf_percpu(value, 0) = 1;
+   assert(bpf_map_lookup_elem(fd, &key, value) == 0 &&
+  bpf_percpu(value, 0) == 100);
 
key = 2;
/* Check that key=2 is not found. */
@@ -211,7 +212,7 @@ static void test_hashmap_percpu(int task, void *data)
assert(bpf_map_lookup_elem(fd, &next_key, value) == 0);
 
for (i = 0; i < nr_cpus; i++)
-   assert(value[i] == i + 100);
+   assert(bpf_percpu(value, i) == i + 100);
 
key = next_key;
}
@@ -296,34 +297,36 @@ static void test_arraymap(int task, void *data)
 static void test_arraymap_percpu(int task, void *data)
 {
unsigned int nr_cpus = bpf_num_possible_cpus();
+   BPF_DECLARE_PERCPU(long, values);
int key, next_key, fd, i;
-   long long values[nr_cpus];
 
fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
-   sizeof(values[0]), 2, 0);
+   sizeof(bpf_percpu(values, 0)), 2, 0);
if (fd < 0) {
printf("Failed to create arraymap '%s'!\n", strerror(errno));
exit(1);
}
 
for (i = 0; i < nr_cpus; i++)
-   values[i] = i + 100;
+   bpf_percpu(values, i) = i + 100;
 
key = 1;
/* Insert key=1 element. */
assert(bpf_map_update_elem(fd, &key, values, BPF_ANY) == 0);
 
-   values[0] = 0;
+   bpf_percpu(values, 0) = 0;
assert(bpf_map_update_elem(fd, &key, values, BPF_NOEXIST) == -1 &&
   errno == EEXIST);
 
/* Check that key=1 can be found. */
-   assert(bpf_map_lookup_elem(fd, &key, values) == 0 && values[0] == 100);
+   assert(bpf_map_lookup_elem(fd, &key, values) == 0 &&
+  bpf_percpu(values, 0) == 100);
 
key = 0;
/* Check that key=0 is also found and zero initialized. */
assert(bpf_map_lookup_elem(fd,

[PATCH net-next 1/5] bpf, x86_64/arm64: remove old ldimm64 artifacts from jits

2017-04-26 Thread Daniel Borkmann
For both cases, the verifier is already rejecting such invalid
formed instructions. Thus, remove these artifacts from old times
and align it with ppc64, sparc64 and s390x JITs that don't have
them in the first place.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 arch/arm64/net/bpf_jit_comp.c | 9 -
 arch/x86/net/bpf_jit_comp.c   | 7 ---
 2 files changed, 16 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a785554..3047368 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -604,15 +604,6 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
const struct bpf_insn insn1 = insn[1];
u64 imm64;
 
-   if (insn1.code != 0 || insn1.src_reg != 0 ||
-   insn1.dst_reg != 0 || insn1.off != 0) {
-   /* Note: verifier in BPF core must catch invalid
-* instructions.
-*/
-   pr_err_once("Invalid BPF_LD_IMM64 instruction\n");
-   return -EINVAL;
-   }
-
imm64 = (u64)insn1.imm << 32 | (u32)imm;
emit_a64_mov_i64(dst, imm64, ctx);
 
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 32322ce..14f840d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -490,13 +490,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, 
u8 *image,
break;
 
case BPF_LD | BPF_IMM | BPF_DW:
-   if (insn[1].code != 0 || insn[1].src_reg != 0 ||
-   insn[1].dst_reg != 0 || insn[1].off != 0) {
-   /* verifier must catch invalid insns */
-   pr_err("invalid BPF_LD_IMM64 insn\n");
-   return -EINVAL;
-   }
-
/* optimization: if imm64 is zero, use 'xor ,'
 * to save 7 bytes.
 */
-- 
1.9.3



[PATCH net-next 3/5] bpf: bpf_lock on kallsysms doesn't need to be irqsave

2017-04-26 Thread Daniel Borkmann
From: Hannes Frederic Sowa 

Hannes rightfully spotted that the bpf_lock doesn't need to be
irqsave variant. We never perform any such updates where this
would be necessary (neither right now nor in future), therefore
relax this further.

Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/core.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b4f1cb0..6f81e0f 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -394,27 +394,23 @@ static bool bpf_prog_kallsyms_verify_off(const struct 
bpf_prog *fp)
 
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
-   unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp) ||
!capable(CAP_SYS_ADMIN))
return;
 
-   spin_lock_irqsave(&bpf_lock, flags);
+   spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_add(fp->aux);
-   spin_unlock_irqrestore(&bpf_lock, flags);
+   spin_unlock_bh(&bpf_lock);
 }
 
 void bpf_prog_kallsyms_del(struct bpf_prog *fp)
 {
-   unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp))
return;
 
-   spin_lock_irqsave(&bpf_lock, flags);
+   spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_del(fp->aux);
-   spin_unlock_irqrestore(&bpf_lock, flags);
+   spin_unlock_bh(&bpf_lock);
 }
 
 static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
-- 
1.9.3



[PATCH net-next 4/5] bpf: fix _htons occurences in test_progs

2017-04-26 Thread Daniel Borkmann
Dave reported that on sparc test_progs generates buggy swapped
eth->h_proto protocol comparisons:

  10: (15) if r3 == 0xdd86 goto pc+9
   R0=imm2,min_value=2,max_value=2 R1=pkt(id=0,off=0,r=14) R2=pkt_end R3=inv
   R4=pkt(id=0,off=14,r=14) R5=inv56 R10=fp

This is due to the unconditional ...

  #define htons __builtin_bswap16
  #define ntohs __builtin_bswap16

... in test_progs that causes this. Make use of asm/byteorder.h
and use __constant_htons() where possible and only perform the
bswap16 when on little endian in non-constant case.

Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range 
checks")
Fixes: 37821613626e ("selftests/bpf: add l4 load balancer test based on 
sched_cls")
Reported-by: David S. Miller 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/bpf_util.h| 19 +++
 tools/testing/selftests/bpf/test_l4lb.c   | 11 +--
 tools/testing/selftests/bpf/test_pkt_access.c |  6 +++---
 tools/testing/selftests/bpf/test_progs.c  | 10 --
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_util.h 
b/tools/testing/selftests/bpf/bpf_util.h
index 84a5d18..7de2796 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -6,6 +6,25 @@
 #include 
 #include 
 
+#include 
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+# define __bpf_ntohs(x)__builtin_bswap16(x)
+# define __bpf_htons(x)__builtin_bswap16(x)
+#elif __BYTE_ORDER == __BIG_ENDIAN
+# define __bpf_ntohs(x)(x)
+# define __bpf_htons(x)(x)
+#else
+# error "Fix your __BYTE_ORDER?!"
+#endif
+
+#define bpf_htons(x)   \
+   (__builtin_constant_p(x) ?  \
+__constant_htons(x) : __bpf_htons(x))
+#define bpf_ntohs(x)   \
+   (__builtin_constant_p(x) ?  \
+__constant_ntohs(x) : __bpf_ntohs(x))
+
 static inline unsigned int bpf_num_possible_cpus(void)
 {
static const char *fcpu = "/sys/devices/system/cpu/possible";
diff --git a/tools/testing/selftests/bpf/test_l4lb.c 
b/tools/testing/selftests/bpf/test_l4lb.c
index 368bfe8..b68b212 100644
--- a/tools/testing/selftests/bpf/test_l4lb.c
+++ b/tools/testing/selftests/bpf/test_l4lb.c
@@ -19,9 +19,8 @@
 #include 
 #include "bpf_helpers.h"
 #include "test_iptunnel_common.h"
+#include "bpf_util.h"
 
-#define htons __builtin_bswap16
-#define ntohs __builtin_bswap16
 int _version SEC("version") = 1;
 
 static inline __u32 rol32(__u32 word, unsigned int shift)
@@ -355,7 +354,7 @@ static __always_inline int process_packet(void *data, __u64 
off, void *data_end,
iph_len = sizeof(struct ipv6hdr);
protocol = ip6h->nexthdr;
pckt.proto = protocol;
-   pkt_bytes = ntohs(ip6h->payload_len);
+   pkt_bytes = bpf_ntohs(ip6h->payload_len);
off += iph_len;
if (protocol == IPPROTO_FRAGMENT) {
return TC_ACT_SHOT;
@@ -377,7 +376,7 @@ static __always_inline int process_packet(void *data, __u64 
off, void *data_end,
 
protocol = iph->protocol;
pckt.proto = protocol;
-   pkt_bytes = ntohs(iph->tot_len);
+   pkt_bytes = bpf_ntohs(iph->tot_len);
off += IPV4_HDR_LEN_NO_OPT;
 
if (iph->frag_off & PCKT_FRAGMENTED)
@@ -464,9 +463,9 @@ int balancer_ingress(struct __sk_buff *ctx)
if (data + nh_off > data_end)
return TC_ACT_SHOT;
eth_proto = eth->eth_proto;
-   if (eth_proto == htons(ETH_P_IP))
+   if (eth_proto == bpf_htons(ETH_P_IP))
return process_packet(data, nh_off, data_end, false, ctx);
-   else if (eth_proto == htons(ETH_P_IPV6))
+   else if (eth_proto == bpf_htons(ETH_P_IPV6))
return process_packet(data, nh_off, data_end, true, ctx);
else
return TC_ACT_SHOT;
diff --git a/tools/testing/selftests/bpf/test_pkt_access.c 
b/tools/testing/selftests/bpf/test_pkt_access.c
index fd1e083..7113005 100644
--- a/tools/testing/selftests/bpf/test_pkt_access.c
+++ b/tools/testing/selftests/bpf/test_pkt_access.c
@@ -14,8 +14,8 @@
 #include 
 #include 
 #include "bpf_helpers.h"
+#include "bpf_util.h"
 
-#define _htons __builtin_bswap16
 #define barrier() __asm__ __volatile__("": : :"memory")
 int _version SEC("version") = 1;
 
@@ -32,7 +32,7 @@ int process(struct __sk_buff *skb)
if (eth + 1 > data_end)
return TC_ACT_SHOT;
 
-   if (eth->h_proto == _htons(ETH_P_IP)) {
+   if (eth->h_proto == bpf_htons(ETH_P_IP)) {
struct iphdr *iph = (struct iphdr *)(eth + 1);
 
if (iph + 1 > data_end)
@@ -40,7 +40,7 @@ int process(struct __sk_buff *skb)
ihl_len = iph->ihl * 4;
proto = iph->protocol;
 

[PATCH net-next 2/5] bpf: add various test cases to verifier selftests

2017-04-26 Thread Daniel Borkmann
Add several test cases around ldimm64, fp arithmetic and direct
packet access.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 137 
 1 file changed, 137 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 95a8d5f..d3395c1 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -191,6 +191,86 @@ struct test_val {
.result = REJECT,
},
{
+   "test6 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 0),
+   BPF_RAW_INSN(0, 0, 0, 0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   },
+   {
+   "test7 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1),
+   BPF_RAW_INSN(0, 0, 0, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   },
+   {
+   "test8 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 1, 1),
+   BPF_RAW_INSN(0, 0, 0, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "uses reserved fields",
+   .result = REJECT,
+   },
+   {
+   "test9 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1),
+   BPF_RAW_INSN(0, 0, 0, 1, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "invalid bpf_ld_imm64 insn",
+   .result = REJECT,
+   },
+   {
+   "test10 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1),
+   BPF_RAW_INSN(0, BPF_REG_1, 0, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "invalid bpf_ld_imm64 insn",
+   .result = REJECT,
+   },
+   {
+   "test11 ld_imm64",
+   .insns = {
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1),
+   BPF_RAW_INSN(0, 0, BPF_REG_1, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "invalid bpf_ld_imm64 insn",
+   .result = REJECT,
+   },
+   {
+   "test12 ld_imm64",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 
0, 1),
+   BPF_RAW_INSN(0, 0, 0, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "not pointing to valid bpf_map",
+   .result = REJECT,
+   },
+   {
+   "test13 ld_imm64",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 
0, 1),
+   BPF_RAW_INSN(0, 0, BPF_REG_1, 0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "invalid bpf_ld_imm64 insn",
+   .result = REJECT,
+   },
+   {
"no bpf_exit",
.insns = {
BPF_ALU64_REG(BPF_MOV, BPF_REG_0, BPF_REG_2),
@@ -331,6 +411,30 @@ struct test_val {
.result = REJECT,
},
{
+   "invalid fp arithmetic",
+   /* If this gets ever changed, make sure JITs can deal with it. 
*/
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 8),
+   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .errstr_unpriv = "R1 pointer arithmetic",
+   .result_unpriv = REJECT,
+   .errstr = "R1 invalid mem access",
+   .result = REJECT,
+   },
+   {
+   "non-invalid fp arithmetic",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   },
+   {
"invalid argument register",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
@@ -1801,6 +1905,20 @@ struct test_val {
.result = ACCEPT,
},
{
+   "unpriv: adding of fp",
+

[PATCH net-next 0/5] Misc BPF updates

2017-04-26 Thread Daniel Borkmann
This set cleans up ldimm64 leftovers from early eBPF days and
adds couple of test cases related to this to the verifier test
suite. It also cleans up the kallsyms spinlock (had same patch
also in queue) by relaxing it through switching to _bh variant.
It fixes up test_progs in relation to htons/ntohs and adds
accessor macros for the percpu tests in test_maps.

Thanks!

Daniel Borkmann (4):
  bpf, x86_64/arm64: remove old ldimm64 artifacts from jits
  bpf: add various test cases to verifier selftests
  bpf: fix _htons occurences in test_progs
  bpf: provide a generic macro for percpu values for selftests

Hannes Frederic Sowa (1):
  bpf: bpf_lock on kallsysms doesn't need to be irqsave

 arch/arm64/net/bpf_jit_comp.c |   9 --
 arch/x86/net/bpf_jit_comp.c   |   7 --
 kernel/bpf/core.c |  12 +--
 tools/testing/selftests/bpf/bpf_util.h|  26 +
 tools/testing/selftests/bpf/test_l4lb.c   |  11 +--
 tools/testing/selftests/bpf/test_maps.c   |  37 +++
 tools/testing/selftests/bpf/test_pkt_access.c |   6 +-
 tools/testing/selftests/bpf/test_progs.c  |  10 +-
 tools/testing/selftests/bpf/test_verifier.c   | 137 ++
 9 files changed, 199 insertions(+), 56 deletions(-)

-- 
1.9.3



[PATCH net-next 7/9] drivers: net: xgene: Workaround for HW errata 10GE_4

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds workaround for HW errata 10GE_4:
"XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read".

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 5 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index f79eb78..60b1a07 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -623,6 +623,8 @@ static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata 
*pdata,
xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, &count);
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - Fix ICM_ECM_DROP_COUNT not clear-on-read */
+   xgene_enet_rd_mcx_csr(pdata, ECM_CONFIG0_REG_0_ADDR, &count);
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
index b253069..ca1f723 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
@@ -197,6 +197,11 @@ static void xgene_sgmac_get_drop_cnt(struct 
xgene_enet_pdata *pdata,
count = xgene_enet_rd_mcx_csr(pdata, addr);
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */
+   addr = (pdata->enet_id != XGENE_ENET1) ?
+   XG_MCX_ECM_CONFIG0_REG_0_ADDR :
+   ECM_CONFIG0_REG_0_ADDR + pdata->port_id * OFFSET_4;
+   xgene_enet_rd_mcx_csr(pdata, addr);
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *p)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index a317596..3e4cd79 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -239,6 +239,8 @@ static void xgene_xgmac_get_drop_cnt(struct 
xgene_enet_pdata *pdata,
xgene_enet_rd_axg_csr(pdata, XGENET_ICM_ECM_DROP_COUNT_REG0, &count);
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */
+   xgene_enet_rd_axg_csr(pdata, XGENET_ECM_CONFIG0_REG_0, &count);
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
-- 
1.9.1



[PATCH net-next 9/9] drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds workaround for HW errata 10GE_10 and ENET_15:
"HW statistic counters value are duplicated".

- RFCS duplicates RALN counter
- RFLR duplicates RUND counter
- TFCS duplicates TFRG counter
- RALN should be intepreted as 0 in 10G mode

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 30 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   | 20 ---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  2 ++
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 5e8660e..8d9ed2b 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -71,6 +71,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16),
XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
+   XGENE_EXTD_STAT(rx_jabber_recov_cntr, DUMP, 0),
XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0),
XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
@@ -96,9 +97,16 @@ struct xgene_gstrings_stats {
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
+#define RFCS_IDX   7
+#define RALN_IDX   13
+#define RFLR_IDX   14
 #define FALSE_RFLR_IDX 15
-#define RX_OVERRUN_IDX 23
-#define TX_UNDERRUN_IDX42
+#define RUND_IDX   18
+#define FALSE_RJBR_IDX 22
+#define RX_OVERRUN_IDX 24
+#define TFCS_IDX   38
+#define TFRG_IDX   42
+#define TX_UNDERRUN_IDX43
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -217,24 +225,36 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
 
 static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata)
 {
+   u32 tmp[XGENE_EXTD_STATS_LEN];
u32 rx_drop, tx_drop;
-   u32 tmp;
int i;
 
for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
pdata->mac_ops->read_stats(pdata,
-  gstrings_extd_stats[i].addr, &tmp);
+   gstrings_extd_stats[i].addr, &tmp[i]);
if (gstrings_extd_stats[i].mask)
-   pdata->extd_stats[i] += tmp &
+   pdata->extd_stats[i] += tmp[i] &
GENMASK(gstrings_extd_stats[i].mask - 1, 0);
}
 
+   if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
+   /* Errata 10GE_10 - SW should intepret RALN as 0 */
+   pdata->extd_stats[RALN_IDX] = 0;
+   } else {
+   /* Errata ENET_15 - Fixes RFCS, RFLR, TFCS counter */
+   pdata->extd_stats[RFCS_IDX] -= tmp[RALN_IDX];
+   pdata->extd_stats[RFLR_IDX] -= tmp[RUND_IDX];
+   pdata->extd_stats[TFCS_IDX] -= tmp[TFRG_IDX];
+   }
+
pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop);
pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop;
pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop;
 
/* Errata 10GE_8 -  Update Frame recovered from Errata 10GE_8/ENET_11 */
pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr;
+   /* Errata ENET_15 - Jabber Frame recov'ed from Errata 10GE_10/ENET_15 */
+   pdata->extd_stats[FALSE_RJBR_IDX] = pdata->vlan_rjbr;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index c2d38da..01e389d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -656,6 +656,18 @@ static void xgene_enet_free_pagepool(struct 
xgene_enet_desc_ring *buf_pool,
buf_pool->head = head;
 }
 
+/* Errata 10GE_10 and ENET_15 - Fix duplicated HW statistic counters */
+static bool xgene_enet_errata_10GE_10(struct sk_buff *skb, u32 len, u8 status)
+{
+   if (status == INGRESS_CRC &&
+   len >= (ETHER_STD_PACKET + 1) &&
+   len <= (ETHER_STD_PACKET + 4) &&
+   skb->protocol == htons(ETH_P_8021Q))
+   return true;
+
+   return false;
+}
+
 /* Errata 10GE_8 and ENET_11 - allow packet with length <=64B */
 static bool xgene_enet_errata_10GE_8(struct sk_buff *skb, u32 len, u8 status)
 {
@@ -706,14 +718,16 @@ static int xgene_enet_rx_frame(struct 
xgene_enet_desc_ring *rx_ring,
status = (GET_VAL(ELERR, le64_to_cpu(raw_desc->m0)) << LERR_LEN) |
  GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
if (unlikely(status)) {
-   if (!xgene_

[PATCH net-next 6/9] drivers: net: xgene: Add rx_overrun/tx_underrun statistic

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds rx_overrun and tx_underrun ethtool statistic counters.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 16 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c  | 11 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h  |  8 
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h|  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c   | 14 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c   | 11 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h   |  4 
 7 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index bbc90b6..369658b 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -71,6 +71,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
+   XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0),
XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31),
XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16),
@@ -88,11 +89,14 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12),
XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12),
XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12),
-   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12)
+   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12),
+   XGENE_EXTD_STAT(tx_underrun_cntr, DUMP, 0)
 };
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
+#define RX_OVERRUN_IDX 22
+#define TX_UNDERRUN_IDX41
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -211,15 +215,21 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
 
 static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata)
 {
+   u32 rx_drop, tx_drop;
u32 tmp;
int i;
 
for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
pdata->mac_ops->read_stats(pdata,
   gstrings_extd_stats[i].addr, &tmp);
-   pdata->extd_stats[i] += tmp &
-   GENMASK(gstrings_extd_stats[i].mask - 1, 0);
+   if (gstrings_extd_stats[i].mask)
+   pdata->extd_stats[i] += tmp &
+   GENMASK(gstrings_extd_stats[i].mask - 1, 0);
}
+
+   pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop);
+   pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop;
+   pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index ec5f61f..f79eb78 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -615,6 +615,16 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
xgene_enet_wr_csr(pdata, CFG_BYPASS_ADDR, RESUME_TX);
 }
 
+static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata,
+   u32 *rx, u32 *tx)
+{
+   u32 count;
+
+   xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, &count);
+   *rx = ICM_DROP_COUNT(count);
+   *tx = ECM_DROP_COUNT(count);
+}
+
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
 {
u32 val = 0x;
@@ -1025,6 +1035,7 @@ void xgene_enet_mdio_remove(struct xgene_enet_pdata 
*pdata)
.rx_disable = xgene_gmac_rx_disable,
.tx_disable = xgene_gmac_tx_disable,
.read_stats = xgene_enet_rd_mcx_stats,
+   .get_drop_cnt = xgene_gmac_get_drop_cnt,
.set_speed = xgene_gmac_set_speed,
.set_mac_addr = xgene_gmac_set_mac_addr,
.set_framesize = xgene_enet_set_frame_size,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 9130c05..c5672df 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -192,6 +192,10 @@ enum xgene_enet_rm {
 #define CFG_CLE_NXTFPSEL0(val) (((val) << 20) & GENMASK(23, 20))
 #define ICM_CONFIG0_REG_0_ADDR 0x0400
 #define ICM_CONFIG2_REG_0_ADDR 0x0410
+#define ECM_CONFIG0_REG_0_ADDR 0x0500
+#define ECM_CONFIG0_REG_1_ADDR 0x0504
+#define ICM_ECM_DROP_COUNT_REG0_ADDR   0x0508
+#define ICM_ECM_DROP_COUNT_REG1_ADDR   0x050c
 #define RX_DV_GATE_REG_0_ADDR  0x05fc
 #define TX_DV_GATE_EN0  

[PATCH net-next 5/9] drivers: net: xgene: Extend ethtool statistics

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds extended ethtool statistics support.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 90 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 20 +
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 50 
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  8 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  5 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c  | 16 
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c  | 20 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |  1 +
 8 files changed, 209 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 217cde8..bbc90b6 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -23,9 +23,17 @@
 struct xgene_gstrings_stats {
char name[ETH_GSTRING_LEN];
int offset;
+   u32 addr;
+   u32 mask;
 };
 
 #define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) }
+#define XGENE_EXTD_STAT(s, a, m)   \
+   {   \
+   .name = #s, \
+   .addr = a ## _ADDR, \
+   .mask = m   \
+   }
 
 static const struct xgene_gstrings_stats gstrings_stats[] = {
XGENE_STAT(rx_packets),
@@ -40,7 +48,51 @@ struct xgene_gstrings_stats {
XGENE_STAT(rx_fifo_errors)
 };
 
+static const struct xgene_gstrings_stats gstrings_extd_stats[] = {
+   XGENE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64, 31),
+   XGENE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127, 31),
+   XGENE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255, 31),
+   XGENE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511, 31),
+   XGENE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K, 31),
+   XGENE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX, 31),
+   XGENE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV, 31),
+   XGENE_EXTD_STAT(rx_fcs_error_cntr, RFCS, 16),
+   XGENE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA, 31),
+   XGENE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA, 31),
+   XGENE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF, 16),
+   XGENE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF, 16),
+   XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16),
+   XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16),
+   XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16),
+   XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16),
+   XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16),
+   XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16),
+   XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16),
+   XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
+   XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
+   XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
+   XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
+   XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31),
+   XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16),
+   XGENE_EXTD_STAT(tx_defer_pkt_cntr, TDFR, 31),
+   XGENE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF, 31),
+   XGENE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL, 31),
+   XGENE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL, 31),
+   XGENE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL, 31),
+   XGENE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL, 31),
+   XGENE_EXTD_STAT(tx_total_col_cntr, TNCL, 31),
+   XGENE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH, 16),
+   XGENE_EXTD_STAT(tx_drop_frame_cntr, TDRP, 16),
+   XGENE_EXTD_STAT(tx_jabber_frame_cntr, TJBR, 12),
+   XGENE_EXTD_STAT(tx_fcs_error_cntr, TFCS, 12),
+   XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12),
+   XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12),
+   XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12),
+   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12)
+};
+
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
+#define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -142,6 +194,11 @@ static void xgene_get_strings(struct net_device *ndev, u32 
stringset, u8 *data)
memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
+
+   for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
+   memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN);
+   p += ETH_GSTRING_LEN;
+   }
 }
 
 static int xgene_get_sset_count(struct net_device *ndev, int sset)
@@ -149,19 +206,50 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
if (sset != ETH_SS_STATS)
return -EINVAL;
 
-   return XGENE_STATS_LEN;
+   return XGENE_STATS_LEN + XGENE_EXTD_STATS_LEN;
+}
+
+static void xgene_get_extd_stats(struct xgene_enet_

[PATCH net-next 8/9] drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds statistic counter for frames recovered from HW errata
10GE_8 and ENET_11:
"HW reports Length error for valid 64 byte frames with len <46 bytes".

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 9 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 369658b..5e8660e 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -64,6 +64,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16),
XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16),
XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16),
+   XGENE_EXTD_STAT(rx_frame_len_err_recov_cntr, DUMP, 0),
XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16),
XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16),
XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16),
@@ -95,8 +96,9 @@ struct xgene_gstrings_stats {
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
-#define RX_OVERRUN_IDX 22
-#define TX_UNDERRUN_IDX41
+#define FALSE_RFLR_IDX 15
+#define RX_OVERRUN_IDX 23
+#define TX_UNDERRUN_IDX42
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -230,6 +232,9 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata 
*pdata)
pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop);
pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop;
pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop;
+
+   /* Errata 10GE_8 -  Update Frame recovered from Errata 10GE_8/ENET_11 */
+   pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index bd2486e..c2d38da 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -712,6 +712,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring 
*rx_ring,
xgene_enet_parse_error(rx_ring, status);
rx_ring->rx_dropped++;
goto out;
+   } else {
+   pdata->false_rflr++;
}
}
 
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index e4b7786..8afae57 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -225,6 +225,7 @@ struct xgene_enet_pdata {
enum xgene_enet_rm rm;
struct xgene_enet_cle cle;
u64 *extd_stats;
+   u64 false_rflr;
spinlock_t stats_lock; /* statistics lock */
const struct xgene_mac_ops *mac_ops;
spinlock_t mac_lock; /* mac lock */
-- 
1.9.1



[PATCH net-next 3/9] drivers: net: xgene: Refactor statistics error parsing code

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch fixes the tx error counters and adds more rx error counters.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |  6 --
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h   |  2 --
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 26 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  2 ++
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 3697ba7..06bef14 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -205,30 +205,24 @@ static u32 xgene_enet_ring_len(struct 
xgene_enet_desc_ring *ring)
 }
 
 void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring,
-   struct xgene_enet_pdata *pdata,
enum xgene_enet_err_code status)
 {
switch (status) {
case INGRESS_CRC:
ring->rx_crc_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_CHECKSUM:
case INGRESS_CHECKSUM_COMPUTE:
ring->rx_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_TRUNC_FRAME:
ring->rx_frame_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_PKT_LEN:
ring->rx_length_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_PKT_UNDER:
ring->rx_frame_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_FIFO_OVERRUN:
ring->rx_fifo_errors++;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index d250bfe..c6b5bbd 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -380,9 +380,7 @@ static inline u16 xgene_enet_get_numslots(u16 id, u32 size)
 }
 
 void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring,
-   struct xgene_enet_pdata *pdata,
enum xgene_enet_err_code status);
-
 int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata);
 void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata);
 bool xgene_ring_mgr_init(struct xgene_enet_pdata *p);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index e4f2ef2..3f24b83 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -246,9 +246,9 @@ static int xgene_enet_tx_completion(struct 
xgene_enet_desc_ring *cp_ring,
skb_frag_t *frag;
dma_addr_t *frag_dma_addr;
u16 skb_index;
-   u8 status;
-   int i, ret = 0;
u8 mss_index;
+   u8 status;
+   int i;
 
skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
skb = cp_ring->cp_skb[skb_index];
@@ -275,19 +275,17 @@ static int xgene_enet_tx_completion(struct 
xgene_enet_desc_ring *cp_ring,
/* Checking for error */
status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
if (unlikely(status > 2)) {
-   xgene_enet_parse_error(cp_ring, netdev_priv(cp_ring->ndev),
-  status);
-   ret = -EIO;
+   cp_ring->tx_dropped++;
+   cp_ring->tx_errors++;
}
 
if (likely(skb)) {
dev_kfree_skb_any(skb);
} else {
netdev_err(cp_ring->ndev, "completion skb is NULL\n");
-   ret = -EIO;
}
 
-   return ret;
+   return 0;
 }
 
 static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss)
@@ -711,7 +709,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring 
*rx_ring,
if (!xgene_enet_errata_10GE_8(skb, datalen, status)) {
dev_kfree_skb_any(skb);
xgene_enet_free_pagepool(page_pool, raw_desc, exp_desc);
-   xgene_enet_parse_error(rx_ring, pdata, status);
+   xgene_enet_parse_error(rx_ring, status);
+   rx_ring->rx_dropped++;
goto out;
}
}
@@ -1477,6 +1476,8 @@ static void xgene_enet_get_stats64(
if (ring) {
stats->tx_packets += ring->tx_packets;
stats->tx_bytes += ring->tx_bytes;
+   stats->tx_dropped += ring->tx_dropped;
+   stats->tx_errors += ring->tx_errors;
}
}
 
@@ -1485,11 +1486,16 @@ static void xgene_enet_get_stats64(
if (ring) {
stats->rx_packets += ring->rx_packets;
stats->rx_bytes += ring->rx_by

[PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds lock to protect indirect mac access sequence.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  | 1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 2a835e0..3697ba7 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct xgene_enet_pdata 
*pdata,
cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
 
+   spin_lock(&pdata->mac_lock);
if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n",
   rd_addr);
+   spin_unlock(&pdata->mac_lock);
 }
 
 static void xgene_gmac_set_mac_addr(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 5f37ed3..9a28ac3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -2055,6 +2055,7 @@ static int xgene_enet_probe(struct platform_device *pdev)
goto err;
 
xgene_enet_setup_ops(pdata);
+   spin_lock_init(&pdata->mac_lock);
 
if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
ndev->features |= NETIF_F_TSO | NETIF_F_RXCSUM;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 0d4be24..827b33d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -221,6 +221,7 @@ struct xgene_enet_pdata {
struct xgene_enet_cle cle;
struct rtnl_link_stats64 stats;
const struct xgene_mac_ops *mac_ops;
+   spinlock_t mac_lock; /* mac lock */
const struct xgene_port_ops *port_ops;
struct xgene_ring_ops *ring_ops;
const struct xgene_cle_ops *cle_ops;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
index a8e063b..4dd41f5 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
@@ -130,6 +130,7 @@ static u32 xgene_enet_rd_indirect(struct xgene_indirect_ctl 
*ctl, u32 rd_addr)
 
 static u32 xgene_enet_rd_mac(struct xgene_enet_pdata *p, u32 rd_addr)
 {
+   u32 val;
struct xgene_indirect_ctl ctl = {
.addr = p->mcx_mac_addr + MAC_ADDR_REG_OFFSET,
.ctl = p->mcx_mac_addr + MAC_READ_REG_OFFSET,
@@ -137,7 +138,11 @@ static u32 xgene_enet_rd_mac(struct xgene_enet_pdata *p, 
u32 rd_addr)
.cmd_done = p->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET
};
 
-   return xgene_enet_rd_indirect(&ctl, rd_addr);
+   spin_lock(&p->mac_lock);
+   val = xgene_enet_rd_indirect(&ctl, rd_addr);
+   spin_unlock(&p->mac_lock);
+
+   return val;
 }
 
 static int xgene_enet_ecc_init(struct xgene_enet_pdata *p)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index 423240c..9a2d0ca 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -158,9 +158,11 @@ static void xgene_enet_rd_mac(struct xgene_enet_pdata 
*pdata,
cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
 
+   spin_lock(&pdata->mac_lock);
if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n",
   rd_addr);
+   spin_unlock(&pdata->mac_lock);
 }
 
 static bool xgene_enet_rd_pcs(struct xgene_enet_pdata *pdata,
-- 
1.9.1



[PATCH net-next 2/9] drivers: net: xgene: Remove redundant local stats

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

Commit 5944701df90d ("net: remove useless memset's in drivers get_stats64")
makes the pdata->stats redundant. This patch removes pdata->stats and
updates get_stats64() callback accordingly.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 7 ---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 4 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 -
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 28fdedc..217cde8 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -25,7 +25,7 @@ struct xgene_gstrings_stats {
int offset;
 };
 
-#define XGENE_STAT(m) { #m, offsetof(struct xgene_enet_pdata, stats.m) }
+#define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) }
 
 static const struct xgene_gstrings_stats gstrings_stats[] = {
XGENE_STAT(rx_packets),
@@ -156,11 +156,12 @@ static void xgene_get_ethtool_stats(struct net_device 
*ndev,
struct ethtool_stats *dummy,
u64 *data)
 {
-   void *pdata = netdev_priv(ndev);
+   struct rtnl_link_stats64 stats;
int i;
 
+   dev_get_stats(ndev, &stats);
for (i = 0; i < XGENE_STATS_LEN; i++)
-   *data++ = *(u64 *)(pdata + gstrings_stats[i].offset);
+   data[i] = *(u64 *)((char *)&stats + gstrings_stats[i].offset);
 }
 
 static void xgene_get_pauseparam(struct net_device *ndev,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 9a28ac3..e4f2ef2 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1466,10 +1466,9 @@ static int xgene_enet_create_desc_rings(struct 
net_device *ndev)
 
 static void xgene_enet_get_stats64(
struct net_device *ndev,
-   struct rtnl_link_stats64 *storage)
+   struct rtnl_link_stats64 *stats)
 {
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
-   struct rtnl_link_stats64 *stats = &pdata->stats;
struct xgene_enet_desc_ring *ring;
int i;
 
@@ -1493,7 +1492,6 @@ static void xgene_enet_get_stats64(
stats->rx_dropped += ring->rx_dropped;
}
}
-   memcpy(storage, stats, sizeof(struct rtnl_link_stats64));
 }
 
 static int xgene_enet_set_mac_address(struct net_device *ndev, void *addr)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 827b33d..5e6fd71 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -219,7 +219,6 @@ struct xgene_enet_pdata {
int phy_mode;
enum xgene_enet_rm rm;
struct xgene_enet_cle cle;
-   struct rtnl_link_stats64 stats;
const struct xgene_mac_ops *mac_ops;
spinlock_t mac_lock; /* mac lock */
const struct xgene_port_ops *port_ops;
-- 
1.9.1



[PATCH net-next 4/9] drivers: net: xgene: Remove unused macros

2017-04-26 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch cleans up unused macros to improve readability.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index c6b5bbd..5a9f9d5 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -217,12 +217,6 @@ enum xgene_enet_rm {
 #define FULL_DUPLEX2   BIT(0)
 #define PAD_CRCBIT(2)
 #define LENGTH_CHK BIT(4)
-#define SCAN_AUTO_INCR BIT(5)
-#define TBYT_ADDR  0x38
-#define TPKT_ADDR  0x39
-#define TDRP_ADDR  0x45
-#define TFCS_ADDR  0x47
-#define TUND_ADDR  0x4a
 
 #define TSO_IPPROTO_TCP1
 
-- 
1.9.1



[PATCH net-next 0/9] drivers: net: xgene: Add ethtool stats and bug fixes

2017-04-26 Thread Iyappan Subramanian
This patch set,

- adds ethtool extended statistics support
- addresses errata workarounds
- fixes bugs related to statistics

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Quan Nguyen 
---

Quan Nguyen (9):
  drivers: net: xgene: Protect indirect MAC access
  drivers: net: xgene: Remove redundant local stats
  drivers: net: xgene: Refactor statistics error parsing code
  drivers: net: xgene: Remove unused macros
  drivers: net: xgene: Extend ethtool statistics
  drivers: net: xgene: Add rx_overrun/tx_underrun statistic
  drivers: net: xgene: Workaround for HW errata 10GE_4
  drivers: net: xgene: Add frame recovered statistics counter for errata
10GE_8/ENET_11
  drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15

 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 132 -
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c |  41 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h |  66 +--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  57 ++---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  13 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c  |  42 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c  |  35 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |   5 +
 8 files changed, 357 insertions(+), 34 deletions(-)

-- 
1.9.1



Re: [PATCH v1 net-next 3/6] net: add new control message for incoming HW-timestamped packets

2017-04-26 Thread Willem de Bruijn
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1b3317c..0a78f7f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -160,6 +160,7 @@ static int netif_rx_internal(struct sk_buff *skb);
>  static int call_netdevice_notifiers_info(unsigned long val,
>  struct net_device *dev,
>  struct netdev_notifier_info *info);
> +static struct napi_struct *napi_by_id(unsigned int napi_id);
>
>  /*
>   * The @dev_base_head list is protected by @dev_base_lock and the rtnl
> @@ -863,6 +864,23 @@ struct net_device *dev_get_by_index(struct net *net, int 
> ifindex)
>  }
>  EXPORT_SYMBOL(dev_get_by_index);
>
> +struct net_device *dev_get_by_napi_id(unsigned int napi_id)
> +{
> +   struct net_device *dev = NULL;
> +   struct napi_struct *napi;
> +
> +   rcu_read_lock();
> +
> +   napi = napi_by_id(napi_id);
> +   if (napi)
> +   dev = napi->dev;
> +
> +   rcu_read_unlock();
> +
> +   return dev;
> +}
> +EXPORT_SYMBOL(dev_get_by_napi_id);

Returning dev without holding a reference is not safe. You'll probably
have to call this with rcu_read_lock held instead.

Also, this generic napi function should probably be in a separate patch.

> diff --git a/net/socket.c b/net/socket.c
> index c2564eb..5ea5f29 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -662,6 +662,26 @@ static bool skb_is_err_queue(const struct sk_buff *skb)
> return skb->pkt_type == PACKET_OUTGOING;
>  }
>
> +static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb)
> +{
> +#ifdef CONFIG_NET_RX_BUSY_POLL

Let's limit the ifdef scope to napi code. Perhaps we need an skb_get_napi_id
helper that returns 0 if the feature is not compiled in.

> +   struct scm_ts_pktinfo ts_pktinfo;
> +   struct net_device *orig_dev;
> +
> +   if (skb->napi_id < MIN_NAPI_ID || !skb_mac_header_was_set(skb))
> +   return;

This MIN_NAPI_ID check can be moved into dev_get_by_napi_id.

>  /*
>   * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
>   */
> @@ -699,8 +719,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct 
> sock *sk,
> empty = 0;
> if (shhwtstamps &&
> (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&

This information is also informative with software timestamps.

And getting the real iif is definitely useful outside timestamps. An
alternative approach is to add versioning to IP_PKTINFO with a new
setsockopt IP_PKTINFO_VERSION plus a new struct in_pktinfo_v2
that extends in_pktinfo. Just a thought.


Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions

2017-04-26 Thread Logan Gunthorpe


On 26/04/17 02:59 AM,   wrote:
> Good to know that somebody is working on this. Those problems troubled
> us as well.

Thanks Christian. It's a daunting problem and a there's a lot of work to
do before we will ever be where we need to be so any help, even an ack,
is greatly appreciated.

Logan



Re: [PATCH net-next 02/10] tcp: do not pass timestamp to tcp_rack_detect_loss()

2017-04-26 Thread Eric Dumazet
On Tue, 2017-04-25 at 10:15 -0700, Eric Dumazet wrote:
> We can use tp->tcp_mstamp as it contains a recent timestamp.
> 

> @@ -165,12 +164,10 @@ void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, 
> u32 end_seq,
>  void tcp_rack_reo_timeout(struct sock *sk)
>  {
>   struct tcp_sock *tp = tcp_sk(sk);
> - struct skb_mstamp now;
>   u32 timeout, prior_inflight;
>  
> - skb_mstamp_get(&now);

Oh this is silly, a timer event is not updating tp->tcp_mstamp yet.

I will provide a patch, after tests.

>   prior_inflight = tcp_packets_in_flight(tp);
> - tcp_rack_detect_loss(sk, &now, &timeout);
> + tcp_rack_detect_loss(sk, &timeout);
>   if (prior_inflight != tcp_packets_in_flight(tp)) {
>   if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) {
>   tcp_enter_recovery(sk, false);

Something like :

diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 
fdac262e277b2f25492f155bbb295d6d87e31d02..75096becac21f09fd14466b5f87dffe5820325b5
 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -167,6 +167,7 @@ void tcp_rack_reo_timeout(struct sock *sk)
u32 timeout, prior_inflight;
 
prior_inflight = tcp_packets_in_flight(tp);
+   skb_mstamp_get(&tp->tcp_mstamp);
tcp_rack_detect_loss(sk, &timeout);
if (prior_inflight != tcp_packets_in_flight(tp)) {
if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) {




Re: [PATCH v2 0/8] NFC: fix device allocation and nfcmrvl crashes

2017-04-26 Thread Samuel Ortiz
Hi Johan,

On Thu, Mar 30, 2017 at 12:15:34PM +0200, Johan Hovold wrote:
> This started out with the observation that the nfcmrvl_uart driver
> unconditionally dereferenced the tty class device despite the fact that
> not every tty has an associated struct device (Unix98 ptys). Some
> further changes were needed in the common nfcmrvl code to fully address
> this, some of which also incidentally fixed a few related bugs (e.g.
> resource leaks in error paths).
> 
> While fixing this I stumbled over a regression in NFC core that lead to
> broken registration error paths and misnamed workqueues.
> 
> Note that this has only been tested by configuring the n_hci line
> discipline for different ttys without any actual NFC hardware connected.
> 
> Johan
> 
> 
> Changes in v2
>  - fix typo in commit message (1/8)
>  - release reset gpio in error paths (3/8)
>  - fix description of patch impact (3/8)
>  - allow gpio 0 to be used for reset signalling (8/8, new)
> 
> 
> Johan Hovold (8):
>   NFC: fix broken device allocation
>   NFC: nfcmrvl_uart: add missing tty-device sanity check
>   NFC: nfcmrvl: do not use device-managed resources
>   NFC: nfcmrvl: use nfc-device for firmware download
>   NFC: nfcmrvl: fix firmware-management initialisation
>   NFC: nfcmrvl_uart: fix device-node leak during probe
>   NFC: nfcmrvl_usb: use interface as phy device
>   NFC: nfcmrvl: allow gpio 0 for reset signalling
Applied, thanks.

Cheers,
Samuel.


Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it

2017-04-26 Thread Jeff Kirsher
On Tue, 2017-04-25 at 14:39 -0500, Bjorn Helgaas wrote:
> On Mon, Apr 24, 2017 at 04:35:07PM +0200, Christoph Hellwig wrote:
> > On Mon, Apr 24, 2017 at 02:16:31PM +, Byczkowski, Jakub wrote:
> > > Tested-by: Jakub Byczkowski 
> > 
> > Are you (and Doug) ok with queueing this up in the PCI tree?
> 
> Applied this with Jakub's tested-by and Doug's ack to pci/virtualization
> for v4.12.
> 
> This still leaves these:
> 
>   [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it
>   [PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it
>   [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it
> 
> I haven't seen any response to 4 and 6.  Felix reported an unused
> variable in 7.  Let me know if you'd like me to do anything with
> these.

Just provided my ACK for ixgbe patch.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it

2017-04-26 Thread Jeff Kirsher
On Thu, 2017-04-13 at 16:53 +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)

Acked-by: Jeff Kirsher 

Sorry for the late ACK.

signature.asc
Description: This is a digitally signed message part


[PATCH 2/2] arm64: dts: apq8016-sbc: Correct WLAN LED default-trigger

2017-04-26 Thread Bjorn Andersson
The TX status trigger of the wlan interface is named phy0tx, so this
updates the default-trigger for the WLAN LED to use that instead.

Signed-off-by: Bjorn Andersson 
---

Note that without patch 1/2 this trigger does not fire - but there's also no
harm in picking the two patches through separate trees.

 arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi 
b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
index 5d83b02b7c4a..21a8f5ce8955 100644
--- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
@@ -178,7 +178,7 @@
led@5 {
label = "apq8016-sbc:yellow:wlan";
gpios = <&pm8916_mpps 2 GPIO_ACTIVE_HIGH>;
-   linux,default-trigger = "wlan";
+   linux,default-trigger = "phy0tx";
default-state = "off";
};
 
-- 
2.12.0



[PATCH 1/2] wcn36xx: Pass used skb to ieee80211_tx_status()

2017-04-26 Thread Bjorn Andersson
As the tx skbs are collected they should be passed to
ieee80211_tx_status() rather than ieee80211_free_txskb(), as the prior
will take care of monitoring and LED triggers while the latter will
consider the skb dropped.

Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/dxe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c 
b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 87dfdaf9044c..938b7bd733cf 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -371,7 +371,7 @@ static void reap_tx_dxes(struct wcn36xx *wcn, struct 
wcn36xx_dxe_ch *ch)
info = IEEE80211_SKB_CB(ctl->skb);
if (!(info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS)) {
/* Keep frame until TX status comes */
-   ieee80211_free_txskb(wcn->hw, ctl->skb);
+   ieee80211_tx_status(wcn->hw, ctl->skb);
}
spin_lock(&ctl->skb_lock);
if (wcn->queues_stopped) {
-- 
2.12.0



Re: [PATCH net-next 6/6] bpf: show bpf programs

2017-04-26 Thread Daniel Borkmann

On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:

Signed-off-by: Hannes Frederic Sowa 
---
  include/uapi/linux/bpf.h | 32 +++-
  kernel/bpf/core.c| 30 +-
  2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e553529929f683..d6506e320953d5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,20 +101,26 @@ enum bpf_map_type {
BPF_MAP_TYPE_HASH_OF_MAPS,
  };

+#define BPF_PROG_TYPES \
+   X(BPF_PROG_TYPE_UNSPEC),\
+   X(BPF_PROG_TYPE_SOCKET_FILTER), \
+   X(BPF_PROG_TYPE_KPROBE),\
+   X(BPF_PROG_TYPE_SCHED_CLS), \
+   X(BPF_PROG_TYPE_SCHED_ACT), \
+   X(BPF_PROG_TYPE_TRACEPOINT),\
+   X(BPF_PROG_TYPE_XDP),   \
+   X(BPF_PROG_TYPE_PERF_EVENT),\
+   X(BPF_PROG_TYPE_CGROUP_SKB),\
+   X(BPF_PROG_TYPE_CGROUP_SOCK),   \
+   X(BPF_PROG_TYPE_LWT_IN),\
+   X(BPF_PROG_TYPE_LWT_OUT),   \
+   X(BPF_PROG_TYPE_LWT_XMIT),
+
+
  enum bpf_prog_type {
-   BPF_PROG_TYPE_UNSPEC,
-   BPF_PROG_TYPE_SOCKET_FILTER,
-   BPF_PROG_TYPE_KPROBE,
-   BPF_PROG_TYPE_SCHED_CLS,
-   BPF_PROG_TYPE_SCHED_ACT,
-   BPF_PROG_TYPE_TRACEPOINT,
-   BPF_PROG_TYPE_XDP,
-   BPF_PROG_TYPE_PERF_EVENT,
-   BPF_PROG_TYPE_CGROUP_SKB,
-   BPF_PROG_TYPE_CGROUP_SOCK,
-   BPF_PROG_TYPE_LWT_IN,
-   BPF_PROG_TYPE_LWT_OUT,
-   BPF_PROG_TYPE_LWT_XMIT,
+#define X(type) type


Defining X in uapi could clash easily with other headers e.g.
from application side.


+   BPF_PROG_TYPES
+#undef X
  };

  enum bpf_attach_type {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3ba175a24e971a..685c1d0f31e029 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
rcu_read_unlock();
  }

+static const char *bpf_type_string(enum bpf_prog_type type)
+{
+   static const char *bpf_type_names[] = {
+#define X(type) #type
+   BPF_PROG_TYPES
+#undef X
+   };
+
+   if (type >= ARRAY_SIZE(bpf_type_names))
+   return "";
+
+   return bpf_type_names[type];
+}
+
  static int ebpf_proc_show(struct seq_file *s, void *v)
  {
+   struct bpf_prog *prog;
+   struct bpf_prog_aux *aux;
+   char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
+
if (v == SEQ_START_TOKEN) {
-   seq_printf(s, "# tag\n");
+   seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
return 0;
}

+   aux = v;
+   prog = aux->prog;
+
+   bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
+   seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
+  bpf_type_string(prog->type),
+  prog->jited ? "jit" : "int",
+  prog->priv_cap_sys_admin ? "priv" : "unpriv",
+  prog->pages * 1ULL << PAGE_SHIFT);


Yeah, so that would be quite similar to what we dump in
bpf_prog_show_fdinfo() modulo the priv bit.

I generally agree that a facility for dumping all progs is needed
and it was also on the TODO list after the bpf(2) cmd for dumping
program insns back to user space.

I think the procfs interface has pro and cons: the upside is that
you can use it with tools like cat to inspect it, but what you still
cannot do is to say that you want to see the prog insns for, say,
prog #4 from that list. If we could iterate over that list through fds
via bpf(2) syscall, you could i) present the same info you have above
via fdinfo already and ii) also dump the BPF insns from that specific
program through a BPF_PROG_DUMP bpf(2) command. Once that dump also
supports maps in progs, you could go further and fetch related map
fds for inspection, etc.

Such option of iterating through that would need a new BPF syscall
cmd aka BPF_PROG_GET_NEXT which returns the first prog from the list
and you would walk the next one by passing the current fd, which can
later also be closed as not needed anymore. We could restrict that
dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to
ship a tool e.g. under tools/bpf/ that can be used for inspection.


+
return 0;
  }






Re: [PATCH net-next 6/6] bpf: show bpf programs

2017-04-26 Thread Alexei Starovoitov
On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote:
>  
> +static const char *bpf_type_string(enum bpf_prog_type type)
> +{
> + static const char *bpf_type_names[] = {
> +#define X(type) #type
> + BPF_PROG_TYPES
> +#undef X
> + };
> +
> + if (type >= ARRAY_SIZE(bpf_type_names))
> + return "";
> +
> + return bpf_type_names[type];
> +}
> +
>  static int ebpf_proc_show(struct seq_file *s, void *v)
>  {
> + struct bpf_prog *prog;
> + struct bpf_prog_aux *aux;
> + char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +
>   if (v == SEQ_START_TOKEN) {
> - seq_printf(s, "# tag\n");
> + seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>   return 0;
>   }
>  
> + aux = v;
> + prog = aux->prog;
> +
> + bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
> + seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
> +bpf_type_string(prog->type),
> +prog->jited ? "jit" : "int",
> +prog->priv_cap_sys_admin ? "priv" : "unpriv",
> +prog->pages * 1ULL << PAGE_SHIFT);

As I said several times already I'm strongly against procfs
style of exposing information about the programs.
I don't want this to become debugfs for bpf.
Maintaining the list of all loaded programs is fine
and we need a way to iterate through them, but procfs
is obviously not the interface to do that.
Programs/maps are binary whereas any fs interface is text.
It also doesn't scale with large number of programs/maps.
I prefer Daniel's suggestion on adding 'get_next' like API.
Also would be good if you can wait for Martin to finish his
prog->handle/id patches. Then user space will be able
to iterate through all the progs/maps and fetch all info about
them through syscall in extensible way.
And you wouldn't need to abuse kallsyms list for different purpose.



Re: [PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities

2017-04-26 Thread Alexei Starovoitov
On Wed, Apr 26, 2017 at 08:24:17PM +0200, Hannes Frederic Sowa wrote:
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  include/linux/filter.h | 6 --
>  kernel/bpf/core.c  | 4 +++-
>  kernel/bpf/syscall.c   | 7 ---
>  kernel/bpf/verifier.c  | 4 ++--
>  net/core/filter.c  | 6 +++---
>  5 files changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 63624c619e371b..635311f57bf24f 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -413,7 +413,8 @@ struct bpf_prog {
>   locked:1,   /* Program image locked? */
>   gpl_compatible:1, /* Is filter GPL compatible? 
> */
>   cb_access:1,/* Is control block accessed? */
> - dst_needed:1;   /* Do we need dst entry? */
> + dst_needed:1,   /* Do we need dst entry? */
> + priv_cap_sys_admin:1; /* Where we loaded as 
> sys_admin? */

This is no go.
You didn't provide any explanation whatsoever why you want to see this boolean 
value.



Re: [PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities

2017-04-26 Thread Daniel Borkmann

On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:

Signed-off-by: Hannes Frederic Sowa 


Ahh, looks this got swapped with 3/6.


---
  include/linux/filter.h | 6 --
  kernel/bpf/core.c  | 4 +++-
  kernel/bpf/syscall.c   | 7 ---
  kernel/bpf/verifier.c  | 4 ++--
  net/core/filter.c  | 6 +++---
  5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 63624c619e371b..635311f57bf24f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -413,7 +413,8 @@ struct bpf_prog {
locked:1,   /* Program image locked? */
gpl_compatible:1, /* Is filter GPL compatible? 
*/
cb_access:1,/* Is control block accessed? */
-   dst_needed:1;   /* Do we need dst entry? */
+   dst_needed:1,   /* Do we need dst entry? */
+   priv_cap_sys_admin:1; /* Where we loaded as 
sys_admin? */
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */

[...]

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f8b6ed690be93..24c9dac374770f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3488,7 +3488,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
if (ret < 0)
goto skip_full_check;

-   env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+   env->allow_ptr_leaks = env->prog->priv_cap_sys_admin;

ret = do_check(env);

@@ -3589,7 +3589,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct 
bpf_ext_analyzer_ops *ops,
if (ret < 0)
goto skip_full_check;

-   env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+   env->allow_ptr_leaks = prog->priv_cap_sys_admin;

ret = do_check(env);

diff --git a/net/core/filter.c b/net/core/filter.c
index 9a37860a80fc78..dc020d40bb770a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1100,7 +1100,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct 
sock_fprog_kern *fprog)
if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;

-   fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
+   fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0, false);
if (!fp)
return -ENOMEM;



Did you check that transferring allow_ptr_leaks doesn't have a side
effect on the nfp JIT? I believe it can also do cbpf migrations to
a certain extend.


Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?

2017-04-26 Thread Andy Gospodarek
On Wed, Apr 26, 2017 at 10:58:45AM -0700, Alexei Starovoitov wrote:
> On 4/26/17 9:35 AM, John Fastabend wrote:
> > 
> > > As Alexei also mentioned before, ifindex vs port makes no real
> > > difference seen from the bpf program side.  It is userspace's
> > > responsibility to add ifindex/port's to the bpf-maps, according to how
> > > the bpf program "policy" want to "connect" these ports.  The
> > > port-table system add one extra step, of also adding this port to the
> > > port-table (which lives inside the kernel).
> > > 
> > 
> > I'm not sure I understand the "lives inside the kernel" bit. I assumed
> > the 'map' should be a bpf map and behave like any other bpf map.
> > 
> > I wanted a new map to be defined, something like this from the bpf 
> > programmer
> > side.
> > 
> > struct bpf_map_def SEC("maps") port_table =
> > .type = BPF_MAP_TYPE_PORT_CONNECTION,
> > .key_size = sizeof(u32),
> > .value_size = BPF_PORT_CONNECTION_SIZE,
> > .max_entries = 256,
> > };
> 
> I like the idea.
> We have prog_array, perf_event_array, cgroup_array map specializations.
> This one can be new netdev_array with some new bpf_redirect-like helper
> accessing it.
> 
> > > When loading the XDP program, we also need to pass along a port table
> > > "id" this XDP program is associated with (and if it doesn't exists you
> > > create it).  And your userspace "control-plane" application also need
> > > to know this port table "id", when adding a new port.
> > 
> > So the user space application that is loading the program also needs
> > to handle this map. This seems correct to me. But I don't see the
> > value in making some new port table when we already have well understood
> > framework for maps.
> 
> +1
> 
> > > 
> > > The concept of having multiple port tables is key.  As this implies we
> > > can have several simultaneous "data-planes" that is *isolated* from
> > > each-other.  Think about how network-namespaces/containers want
> > > isolation. A subtle thing I'm afraid to mention, is that oppose to the
> > > ifindex model, a port table with mapping to a net_device pointer, would
> > > allow (faster) delivery into the container's inner net_device, which
> > > sort of violates the isolation, but I would argue it is not a problem
> > > as this net_device pointer could only be added from a process within the
> > > namespace.  I like this feature, but it could easily be disallowed via
> > > port insertion-time validation.
> > > 
> > 
> > I think the above optimization should be allowed. And agree multiple port
> > tables (maps?) is needed. Again all this points to using standard maps
> > logic in my mind. For permissions and different domains, which I think
> > you were starting to touch on, it looks like we could extend the pinning 
> > API.
> > At the moment it does an inode_permission(inode, MAY_WRITE) check but I
> > presume this could be extended. None of this would be needed in v1 and
> > could be added subsequently. read-only maps seems doable.
> 
> this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated
> the user space can make it readonly to prevent further changes.
> 
> From user space it can be done similar to perf_events/cgroups as well.
> bpf_map_update_elem(&netdev_array, &port_num, &ifindex)
> should work.
> For bpf_map_lookup_elem() from such netdev_array we can return
> ifindex back.
> The bpf_map_show_fdinfo() can be customized as well to pretty print
> ifindexes of netdevs stored in there.
> 

I agree with both of you on all of these points.  Having the port
redirection in a new type of map and/or array seems like the way to go.

I understood Jesper's perspecitive when thinking about a way to pass a
port-table id down, but I think the idea that the userspace loader code
defining the maps is going to be the one making this link is the right
idea and handling things like ifindex changes (rather than identifiers
that perform lookups in other tables) is going to have to be yet another
exercise left up to the...user.  :-)




[Patch net-next] ipv4: get rid of ip_ra_lock

2017-04-26 Thread Cong Wang
After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
we always take RTNL lock for ip_ra_control() which is the only place
we update the list ip_ra_chain, so the ip_ra_lock is no longer needed,
we just need to disable BH there.

Signed-off-by: Cong Wang 
---
 net/ipv4/ip_sockglue.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 1d46d05..2923ea1 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -330,7 +330,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, 
struct ipcm_cookie *ipc,
sent to multicast group to reach destination designated router.
  */
 struct ip_ra_chain __rcu *ip_ra_chain;
-static DEFINE_SPINLOCK(ip_ra_lock);
 
 
 static void ip_ra_destroy_rcu(struct rcu_head *head)
@@ -352,21 +351,21 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 
new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
-   spin_lock_bh(&ip_ra_lock);
+   ASSERT_RTNL();
+   local_bh_disable();
for (rap = &ip_ra_chain;
-(ra = rcu_dereference_protected(*rap,
-   lockdep_is_held(&ip_ra_lock))) != NULL;
+(ra = rtnl_dereference(*rap)) != NULL;
 rap = &ra->next) {
if (ra->sk == sk) {
if (on) {
-   spin_unlock_bh(&ip_ra_lock);
+   local_bh_enable();
kfree(new_ra);
return -EADDRINUSE;
}
/* dont let ip_call_ra_chain() use sk again */
ra->sk = NULL;
RCU_INIT_POINTER(*rap, ra->next);
-   spin_unlock_bh(&ip_ra_lock);
+   local_bh_enable();
 
if (ra->destructor)
ra->destructor(sk);
@@ -381,7 +380,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
}
}
if (!new_ra) {
-   spin_unlock_bh(&ip_ra_lock);
+   local_bh_enable();
return -ENOBUFS;
}
new_ra->sk = sk;
@@ -390,7 +389,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
RCU_INIT_POINTER(new_ra->next, ra);
rcu_assign_pointer(*rap, new_ra);
sock_hold(sk);
-   spin_unlock_bh(&ip_ra_lock);
+   local_bh_enable();
 
return 0;
 }
-- 
2.5.5



Re: [PATCH net-next] bpf: restore skb->sk before pskb_trim() call

2017-04-26 Thread Alexei Starovoitov
On Wed, Apr 26, 2017 at 09:09:23AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE()
> in skb_try_coalesce() reported by Andrey, I found that we had an skb
> with skb->sk set but no skb->destructor.
> 
> This invalidated heuristic found in commit 158f323b9868 ("net: adjust
> skb->truesize in pskb_expand_head()") and in cited patch.
> 
> Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should
> restrain the temporary setting to a minimal section.
> 
> [1] https://patchwork.ozlabs.org/patch/755570/ 
> net: adjust skb->truesize in ___pskb_trim()
> 
> Fixes: 8f917bba0042 ("bpf: pass sk to helper functions")
> Signed-off-by: Eric Dumazet 
> Cc: Willem de Bruijn 
> Cc: Andrey Konovalov 

Ahh. Thanks for the fix.
Acked-by: Alexei Starovoitov 



Re: [PATCH net-next 3/6] bpf: bpf_progs stores all loaded programs

2017-04-26 Thread Daniel Borkmann

[ -dan...@iogearbox.com (wrong address) ]

On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:

We later want to give users a quick dump of what is possible with procfs,
so store a list of all currently loaded bpf programs. Later this list
will be printed in procfs.

Signed-off-by: Hannes Frederic Sowa 
---
  include/linux/filter.h |  4 ++--
  kernel/bpf/core.c  | 51 +++---
  kernel/bpf/syscall.c   |  4 ++--
  3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa53..63624c619e371b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -753,8 +753,8 @@ bpf_address_lookup(unsigned long addr, unsigned long *size,
return ret;
  }

-void bpf_prog_kallsyms_add(struct bpf_prog *fp);
-void bpf_prog_kallsyms_del(struct bpf_prog *fp);
+void bpf_prog_link(struct bpf_prog *fp);
+void bpf_prog_unlink(struct bpf_prog *fp);

  #else /* CONFIG_BPF_JIT */

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 043f634ff58d87..2139118258cdf8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,22 +365,6 @@ static struct latch_tree_root bpf_tree __cacheline_aligned;

  int bpf_jit_kallsyms __read_mostly;

-static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux)
-{
-   WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head));
-   list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
-   latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-}
-
-static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux)
-{
-   if (list_empty(&aux->bpf_progs_head))
-   return;
-
-   latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-   list_del_rcu(&aux->bpf_progs_head);
-}
-
  static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
  {
return fp->jited && !bpf_prog_was_classic(fp);
@@ -392,38 +376,45 @@ static bool bpf_prog_kallsyms_verify_off(const struct 
bpf_prog *fp)
   fp->aux->bpf_progs_head.prev == LIST_POISON2;
  }

-void bpf_prog_kallsyms_add(struct bpf_prog *fp)
+void bpf_prog_link(struct bpf_prog *fp)
  {
-   if (!bpf_prog_kallsyms_candidate(fp) ||
-   !capable(CAP_SYS_ADMIN))
-   return;
+   struct bpf_prog_aux *aux = fp->aux;

spin_lock_bh(&bpf_lock);
-   bpf_prog_ksym_node_add(fp->aux);
+   list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
+   if (bpf_prog_kallsyms_candidate(fp))
+   latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);


Hmm, this has the side-effect that it will hook up all progs
to kallsyms (I left out !capable(CAP_SYS_ADMIN) intentionally).


spin_unlock_bh(&bpf_lock);
  }

-void bpf_prog_kallsyms_del(struct bpf_prog *fp)
+void bpf_prog_unlink(struct bpf_prog *fp)
  {
-   if (!bpf_prog_kallsyms_candidate(fp))
-   return;
+   struct bpf_prog_aux *aux = fp->aux;

spin_lock_bh(&bpf_lock);
-   bpf_prog_ksym_node_del(fp->aux);
+   list_del_rcu(&aux->bpf_progs_head);
+   if (bpf_prog_kallsyms_candidate(fp))
+   latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
spin_unlock_bh(&bpf_lock);
  }

  static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
  {
struct latch_tree_node *n;
+   struct bpf_prog *prog;

if (!bpf_jit_kallsyms_enabled())
return NULL;

n = latch_tree_find((void *)addr, &bpf_tree, &bpf_tree_ops);
-   return n ?
-  container_of(n, struct bpf_prog_aux, ksym_tnode)->prog :
-  NULL;
+   if (!n)
+   return NULL;
+
+   prog = container_of(n, struct bpf_prog_aux, ksym_tnode)->prog;
+   if (!prog->priv_cap_sys_admin)


Where is this bit defined?

If we return NULL on them anyway, why adding them to the tree
in the first place, just wastes resources on the traversal?


+   return NULL;
+
+   return prog;
  }

  const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
@@ -474,6 +465,10 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,

rcu_read_lock();
list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) {
+   if (!bpf_prog_kallsyms_candidate(aux->prog) ||
+   !aux->prog->priv_cap_sys_admin)


Same here.


+   continue;
+
if (it++ != symnum)
continue;

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13642c73dca0b4..d61d1bd3e6fee6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -664,7 +664,7 @@ void bpf_prog_put(struct bpf_prog *prog)
  {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
-   bpf_prog_kallsyms_del(prog);
+   bpf_prog_unlink(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
  }
@@ -858,7 +858,7 @@ st

Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions

2017-04-26 Thread Logan Gunthorpe


On 26/04/17 01:44 AM, Christoph Hellwig wrote:
> I think we'll at least need a draft of those to make sense of these
> patches.  Otherwise they just look very clumsy.

Ok, I'll work up a draft proposal and send it in a couple days. But
without a lot of cleanup such as this series it's not going to even be
able to compile.

> I'm sorry but this API is just a trainwreck.  Right now we have the
> nice little kmap_atomic API, which never fails and has a very nice
> calling convention where we just pass back the return address, but does
> not support sleeping inside the critical section.
> 
> And kmap, whіch may fail and requires the original page to be passed
> back.  Anything that mixes these two concepts up is simply a non-starter.

Ok, well for starters I think you are mistaken about kmap being able to
fail. I'm having a hard time finding many users of that function that
bother to check for an error when calling it. The main difficulty we
have now is that neither of those functions are expected to fail and we
need them to be able to in cases where the page doesn't map to system
RAM. This patch series is trying to address it for users of scatterlist.
I'm certainly open to other suggestions.

I also have to disagree that kmap and kmap_atomic are all that "nice".
Except for the sleeping restriction and performance, they effectively do
the same thing. And it was necessary to write a macro wrapper around
kunmap_atomic to ensure that users of that function don't screw it up.
(See 597781f3e5.) I'd say the kmap/kmap_atomic functions are the
trainwreck and I'm trying to do my best to cleanup a few cases.

There are a fair number of cases in the kernel that do something like:

if (something)
x = kmap(page);
else
x = kmap_atomic(page);
...
if (something)
kunmap(page)
else
kunmap_atomic(x)

Which just seems cumbersome to me.

In any case, if you can accept an sg_kmap and sg_kmap_atomic api just
say so and I'll make the change. But I'll still need a flags variable
for SG_MAP_MUST_NOT_FAIL to support legacy cases that have no fail path
and both of those functions will need to be pretty nearly replicas of
each other.

Logan




Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-26 Thread Jamal Hadi Salim

On 17-04-26 09:56 AM, Jiri Pirko wrote:

Wed, Apr 26, 2017 at 03:14:38PM CEST, j...@mojatatu.com wrote:

On 17-04-26 08:08 AM, Jiri Pirko wrote:


[..]


Jiri, what are you arguing about if you have done the math? ;->


I can do 3*2*64. What I cannot do is to figure out the real performance
impact.



Jiri, I do a lot of very large data dumping and setting towards the
kernel. You know that. It is why I even have these patches to begin
with.

The math should be convincing enough.
48B per rule extra for just MPLS in a filter rule. I havent started
testing the overhead of flower but i do plan to use it - with about a
million rules for offloading. I will give you the numbers then.

I think we are at a stalemate.
You are not going to convince me to use an attribute with a
u8 for a bit flag when I can fit 32 of them in one attribute (with
the same cost). And I am not able to convince you that you are
wrong to put beauty first.


Again: You are looking at this from a manageability point of view which
is useful but not the only input into a design. If i can squeeze more
data without killing usability - I am all for it. It just doesnt
compute that it is ok to use a flag per attribute because it looks
beautiful.


Hmm. Now that I'm thinking about it, why don't we have NLA_FLAGS with
couple of helpers around it? It will be obvious what the attr is, all
kernel code would use the same helpers. Would be nice.



I think to have flags at that level is useful but it
is a different hierarchy level. I am not sure the
"actions dump large messages" is a fit for that level.

cheers,
jamal


Low speed MPLS to virtio-net

2017-04-26 Thread Алексей Болдырев
Started MPLS on the branch - Everything was fine. When I tried to run MPLS on a 
real network of virtual machines, there were problems with the speed:
root@containers:~# iperf3 -c 10.194.10.2 -B 10.194.10.1 -Z   
Connecting to host 10.194.10.2, port 5201
[  4] local 10.194.10.1 port 49533 connected to 10.194.10.2 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-1.00   sec  1018 KBytes  8.34 Mbits/sec  238   5.64 KBytes   
[  4]   1.00-2.00   sec  1.42 MBytes  11.9 Mbits/sec  373   1.41 KBytes   
[  4]   2.00-3.00   sec  1.43 MBytes  12.0 Mbits/sec  379   5.64 KBytes   
[  4]   3.00-4.00   sec  1.43 MBytes  12.0 Mbits/sec  376   5.64 KBytes   
[  4]   4.00-5.00   sec  1.41 MBytes  11.8 Mbits/sec  375   2.82 KBytes   
[  4]   5.00-6.00   sec  1.42 MBytes  11.9 Mbits/sec  376   2.82 KBytes   
[  4]   6.00-7.00   sec  1.42 MBytes  11.9 Mbits/sec  373   5.64 KBytes   
[  4]   7.00-8.00   sec  1.41 MBytes  11.8 Mbits/sec  372   5.64 KBytes   
[  4]   8.00-9.00   sec  1.42 MBytes  11.9 Mbits/sec  379   2.82 KBytes   
[  4]   9.00-10.00  sec  1.42 MBytes  11.9 Mbits/sec  373   5.64 KBytes   
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-10.00  sec  13.8 MBytes  11.5 Mbits/sec  3614 sender
[  4]   0.00-10.00  sec  13.6 MBytes  11.4 Mbits/sec  receiver

iperf Done.
root@containers:~# 
Here are the settings:
test0:
lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
inet6 ::1  prefixlen 128  scopeid 0x10
loop  txqueuelen 1000  (Local Loopback)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo:1: flags=73  mtu 65536
inet 10.194.10.1  netmask 255.255.255.255
loop  txqueuelen 1000  (Local Loopback)

test0p1: flags=4163  mtu 1500
inet 10.194.1.50  netmask 255.255.255.0  broadcast 10.194.1.255
inet6 fe80::b0a7:b1ff:fec1:3d5c  prefixlen 64  scopeid 0x20
ether b2:a7:b1:c1:3d:5c  txqueuelen 1000  (Ethernet)
RX packets 19974  bytes 1410944 (1.3 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 3726844  bytes 5236310466 (4.8 GiB)
TX errors 0  dropped 3604 overruns 0  carrier 0  collisions 0

test1:
lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
inet6 ::1  prefixlen 128  scopeid 0x10
loop  txqueuelen 1000  (Local Loopback)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo:1: flags=73  mtu 65536
inet 10.194.10.2  netmask 255.255.255.255
loop  txqueuelen 1000  (Local Loopback)

test1p1: flags=4163  mtu 1500
inet 10.194.1.51  netmask 255.255.255.0  broadcast 10.194.1.255
inet6 fe80::5cc0:45ff:fe1a:9705  prefixlen 64  scopeid 0x20
ether 5e:c0:45:1a:97:05  txqueuelen 1000  (Ethernet)
RX packets 2001923  bytes 2806406771 (2.6 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 19907  bytes 1485150 (1.4 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Server configuration:
root@ne-vlezay80:~# ip -M r
100 via inet 10.194.1.50 dev vlan11 
101 via inet 10.194.1.51 dev vlan11 
root@ne-vlezay80:~# 
root@ne-vlezay80:~# ifconfig
eth0  Link encap:Ethernet  HWaddr 52:54:00:5d:81:90  
  inet addr:10.247.0.250  Bcast:10.247.0.255  Mask:255.255.255.0
  inet6 addr: fe80::5054:ff:fe5d:8190/64 Scope:Link
  inet6 addr: fd00:1002:1289:10::10/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:2500  Metric:1
  RX packets:7403 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4182 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:621871 (607.2 KiB)  TX bytes:445766 (435.3 KiB)

eth1  Link encap:Ethernet  HWaddr 52:54:00:0b:ff:2e  
  inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
  inet6 addr: fd00:104:1::1/64 Scope:Global
  inet6 addr: fe80::5054:ff:fe0b:ff2e/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:2500  Metric:1
  RX packets:2204837 errors:0 dropped:5 overruns:0 frame:0
  TX packets:2083636 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3073876412 (2.8 GiB)  TX bytes:2897017540 (2.6 GiB)

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:75 errors:0 dropped:0 overruns:0 frame:0
  TX packets:75 errors:0 dropped:0 overruns:0 carrier:0
  collisio

[GIT] Networking

2017-04-26 Thread David Miller

1) MLX5 bug fixes from Saeed Mahameed et al.
   a) Release wrong resources when firmware timeout happens
   b) Wrong check for encapsulation size limits
   c) UAR memory leak
   d) ETHTOOL_GRXCLSRLALL fails to fill in info->data

2) Don't cache l3mdev on mis-matches local route, causes
   net devices to leak refs.  From Robert Shearman.

3) Handle fragmented SKBs properly in macsec driver, the problem
   is that we were mis-sizing the sgvec table.  From Jason A.
   Donenfeld.

4) We cannot have checksum offload enabled for inner UDP tunneled
   packet during IPSEC, from Ansis Atteka.

5) Fix double SKB free in ravb driver, from Dan Carpenter.

6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.

7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
   from Maksim Salau.

8) Fix device leak in macvlan driver, from Herbert Xu.  We have to
   purge the broadcast queue properly on port destroy.

9) Fix tx ring entry limit on EF10 devices in sfc driver.  From
   Bert Kenward.

10) Fix memory leaks in team driver, from Pan Bian.

11) Don't setup ipv6_stub before it can be actually used, from Paolo
   Abeni.

12) Fix tipc socket flow control accounting, from Parthasarathy
Bhuvaragan.

13) Fix crash on module unload in hso driver, from Andreas Kemnade.

14) Fix purging of bridge multicast entries, the problem is that if
we don't defer it to ndo_uninit it's possible for new entries to
get added after we purge.  Fix from Xin Long.

15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
Potapenko.

16) Fix autoneg stall properly in PHY layer, and revert micrel driver
change that was papering over it.  From Alexander Kochetkov.

17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
code, from Cong Wang.

18) Clear out the congestion control private of the TCP socket in all
of the right places, from Wei Wang.

19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
Bainbridge.

Please pull, thanks a lot!

The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:

  Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 
16:37:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8:

  ipv6: check raw payload size correctly in ioctl (2017-04-26 14:59:35 -0400)


Alexander Kochetkov (1):
  net: phy: fix auto-negotiation stall due to unavailable interrupt

Alexander Potapenko (1):
  net/packet: check length in getsockopt() called with PACKET_HDRLEN

Andreas Kemnade (1):
  net: hso: fix module unloading

Ansis Atteka (1):
  udp: disable inner UDP checksum offloads in IPsec case

Bert Kenward (1):
  sfc: tx ring can only have 2048 entries for all EF10 NICs

Dan Carpenter (2):
  net: tc35815: move free after the dereference
  ravb: Double free on error in ravb_start_xmit()

David Ahern (2):
  net: ipv6: send unsolicited NA if enabled for all interfaces
  net: ipv6: regenerate host route if moved to gc list

David S. Miller (4):
  Merge tag 'mlx5-fixes-2017-04-22' of git://git.kernel.org/.../saeed/linux
  Merge branch 'dsa-b53-58xx-fixes'
  Merge tag 'linux-can-fixes-for-4.11-20170425' of 
git://git.kernel.org/.../mkl/linux-can
  Revert "phy: micrel: Disable auto negotiation on startup"

Eugenia Emantayev (1):
  net/mlx5e: Fix small packet threshold

Florian Fainelli (3):
  net: dsa: b53: Include IMP/CPU port in dumb forwarding mode
  net: dsa: b53: Implement software reset for 58xx devices
  net: dsa: b53: Fix CPU port for 58xx devices

Herbert Xu (1):
  macvlan: Fix device ref leak when purging bc_queue

Ilan Tayari (1):
  net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling

Jamie Bainbridge (1):
  ipv6: check raw payload size correctly in ioctl

Jason A. Donenfeld (2):
  macsec: avoid heap overflow in skb_to_sgvec
  macsec: dynamically allocate space for sglist

Maksim Salau (1):
  net: can: usb: gs_usb: Fix buffer on stack

Maor Gottlieb (1):
  net/mlx5: Fix UAR memory leak

Martin KaFai Lau (1):
  net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats

Mohamad Haj Yahia (1):
  net/mlx5: Fix driver load bad flow when having fw initializing timeout

Myungho Jung (1):
  net: core: Prevent from dereferencing null pointer when releasing SKB

Or Gerlitz (3):
  net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
  net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
  net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels

Pan Bian (1):
  team: fix memory leaks

Paolo Abeni (1):
  ipv6: move stub initialization after ipv6 setup completion

Parthasarathy Bhuvaragan (2):
  tipc: fix socket flow control accounting error at tipc

Re: [PATCH] ipv6: check raw payload size correctly in ioctl

2017-04-26 Thread David Miller
From: Jamie Bainbridge 
Date: Wed, 26 Apr 2017 10:43:27 +1000

> In situations where an skb is paged, the transport header pointer and
> tail pointer can be the same because the skb contents are in frags.
> 
> This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
> length of 0 when the length to receive is actually greater than zero.
> 
> skb->len is already correctly set in ip6_input_finish() with
> pskb_pull(), so use skb->len as it always returns the correct result
> for both linear and paged data.
> 
> Signed-off-by: Jamie Bainbridge 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next] tcp: memset ca_priv data to 0 properly

2017-04-26 Thread David Miller
From: Wei Wang 
Date: Tue, 25 Apr 2017 17:38:02 -0700

> From: Wei Wang 
> 
> Always zero out ca_priv data in tcp_assign_congestion_control() so that
> ca_priv data is cleared out during socket creation.
> Also always zero out ca_priv data in tcp_reinit_congestion_control() so
> that when cc algorithm is changed, ca_priv data is cleared out as well.
> We should still zero out ca_priv data even in TCP_CLOSE state because
> user could call connect() on AF_UNSPEC to disconnect the socket and
> leave it in TCP_CLOSE state and later call setsockopt() to switch cc
> algorithm on this socket.
> 
> Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
> Reported-by: Andrey Konovalov  
> Signed-off-by: Wei Wang 
> Acked-by: Eric Dumazet 
> Acked-by: Yuchung Cheng 
> Acked-by: Neal Cardwell 

Applied to 'net' and queued up for -stable, thanks.


Re: [Patch net] ipv6: check skb->protocol before lookup for nexthop

2017-04-26 Thread David Miller
From: Cong Wang 
Date: Tue, 25 Apr 2017 14:37:15 -0700

> Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
> is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
> neigh key as an IPv6 address:
> 
> neigh = dst_neigh_lookup(skb_dst(skb),
>  &ipv6_hdr(skb)->daddr);
> if (!neigh)
> goto tx_err_link_failure;
> 
> addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
> addr_type = ipv6_addr_type(addr6);
> 
> if (addr_type == IPV6_ADDR_ANY)
> addr6 = &ipv6_hdr(skb)->daddr;
> 
> memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
> 
> Also the network header of the skb at this point should be still IPv4
> for 4in6 tunnels, we shold not just use it as IPv6 header.
> 
> This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
> is, we are safe to do the nexthop lookup using skb_dst() and
> ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
> dest address we can pick here, we have to rely on callers to fill it
> from tunnel config, so just fall to ip6_route_output() to make the
> decision.
> 
> Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
> Reported-by: Andrey Konovalov 
> Tested-by: Andrey Konovalov 
> Cc: Steffen Klassert 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks Cong.


Re: [PATCH net-next] virtio-net: on tx, only call napi_disable if tx napi is on

2017-04-26 Thread David Miller
From: Willem de Bruijn 
Date: Tue, 25 Apr 2017 15:59:17 -0400

> From: Willem de Bruijn 
> 
> As of tx napi, device down (`ip link set dev $dev down`) hangs unless
> tx napi is enabled. Else napi_enable is not called, so napi_disable
> will spin on test_and_set_bit NAPI_STATE_SCHED.
> 
> Only call napi_disable if tx napi is enabled.
> 
> Fixes: 5a719c2552ca ("virtio-net: transmit napi")
> Reported-by: Jason Wang 
> Signed-off-by: Willem de Bruijn 

Applied, thanks.


Re: [PATCH net-next 0/2] Move sub crq init out of interrupt context

2017-04-26 Thread David Miller
From: Nathan Fontenot 
Date: Tue, 25 Apr 2017 15:00:58 -0400

> The sub crqs are currently intialized in interrupt context when
> handling a crq response fromn the vios server. There is no reason
> they must be initialized there.
> 
> Moving the initialization of the sub crqs to the ibmvnic_init routine
> allows us to do the initialization outside of interrupt context and
> make all of the allocations with GFP_KERNEL instead of GFP_ATOMIC.

Series applied, thanks.


Re: [PATCH v3] net: core: Prevent from dereferencing null pointer when releasing SKB

2017-04-26 Thread David Miller
From: Myungho Jung 
Date: Tue, 25 Apr 2017 11:58:15 -0700

> Added NULL check to make __dev_kfree_skb_irq consistent with kfree
> family of functions.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
> 
> Signed-off-by: Myungho Jung 

Applied, thank you.


Re: [PATCH net-next] dt-bindings: mdio: Clarify binding document

2017-04-26 Thread David Miller
From: Florian Fainelli 
Date: Tue, 25 Apr 2017 11:33:03 -0700

> The described GPIO reset property is applicable to *all* child PHYs. If
> we have one reset line per PHY present on the MDIO bus, these
> automatically become properties of the child PHY nodes.
> 
> Finally, indicate how the RESET pulse width must be defined, which is
> the maximum value of all individual PHYs RESET pulse widths determined
> by reading their datasheets.
> 
> Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.


Re: [PATCH net-next 00/10] tcp: do not use tcp_time_stamp for rcv autotuning

2017-04-26 Thread David Miller
From: Eric Dumazet 
Date: Tue, 25 Apr 2017 10:15:31 -0700

> Some devices or linux distributions use HZ=100 or HZ=250
> 
> TCP receive buffer autotuning has poor behavior caused by this choice.
> Since autotuning happens after 4 ms or 10 ms, short distance flows
> get their receive buffer tuned to a very high value, but after an initial
> period where it was frozen to (too small) initial value.
> 
> With BBR (or other CC allowing to increase BDP), we are willing to
> increase tcp_rmem[2], but this receive autotuning defect is a blocker
> for hosts dealing with gazillions of TCP flows in the data centers,
> since many of them have inflated RCVBUF. Risk of OOM is too high.
> 
> Note that TSO autodefer, tcp cubic, and TCP TS options (RFC 7323)
> also suffer from our dependency to jiffies (via tcp_time_stamp).
> 
> We have ongoing efforts to improve all that in the future.

Looks great, series applied, thanks Eric.


Re: [oss-drivers] Re: [RFC 3/4] nfp: make use of extended ack message reporting

2017-04-26 Thread Simon Horman
On Wed, Apr 26, 2017 at 10:44:16AM -0400, David Miller wrote:
> From: Simon Horman 
> Date: Wed, 26 Apr 2017 13:13:16 +0200
> 
> > On Tue, Apr 25, 2017 at 10:20:22AM -0400, David Miller wrote:
> >> From: Jamal Hadi Salim 
> >> Date: Tue, 25 Apr 2017 08:42:32 -0400
> >> 
> >> > So are we going to standardize these strings?
> >> 
> >> No.
> >> 
> >> > i.e what if some user has written a bash script that depends on this
> >> > string and it gets changed later.
> >> 
> >> They can't do that.
> >> 
> >> It's free form extra information an application may or not provide
> >> to the user when the kernel emits it.
> > 
> > I don't feel strongly about this and perhaps it can be revisited at some
> > point but perhaps it would be worth documenting that he strings do not
> > form part of the UAPI as my expectation would have been that they do f.e. to
> > facilitate internationalisation.
> 
> These two things are entirely separate.
> 
> We can maintain uptodate translations of the strings, yet document that
> they can change at any time and are thus not UAPI.

Thanks, I see that now.


Re: [PATCH v2] macsec: dynamically allocate space for sglist

2017-04-26 Thread David Miller
From: "Jason A. Donenfeld" 
Date: Tue, 25 Apr 2017 19:08:18 +0200

> We call skb_cow_data, which is good anyway to ensure we can actually
> modify the skb as such (another error from prior). Now that we have the
> number of fragments required, we can safely allocate exactly that amount
> of memory.
> 
> Signed-off-by: Jason A. Donenfeld 
> Cc: Sabrina Dubroca 
> Cc: secur...@kernel.org
> Cc: sta...@vger.kernel.org

Applied, thanks.


Re: [PATCH net-next] rhashtable: remove insecure_max_entries param

2017-04-26 Thread David Miller
From: Florian Westphal 
Date: Tue, 25 Apr 2017 11:41:34 +0200

> no users in the tree, insecure_max_entries is always set to
> ht->p.max_size * 2 in rhtashtable_init().
> 
> Replace only spot that uses it with a ht->p.max_size check.
> 
> Signed-off-by: Florian Westphal 

Applied, thanks Florian.


Re: [RFC PATCH] netxen_nic: null-terminate serial number string in netxen_check_options()

2017-04-26 Thread David Miller
From: "Jerome Marchand" 
Date: Tue, 25 Apr 2017 09:42:29 +0200

> The serial_num string in netxen_check_options() is not always properly
> null-terminated. I couldn't find the documention on the serial number
> format and I suspect a proper integer to string conversion is in
> order, but this patch a least prevents the out-of-bound access.
> 
> It solves the following kasan warning:
 ...
> @@ -842,7 +842,7 @@ netxen_check_options(struct netxen_adapter *adapter)
>  {
>   u32 fw_major, fw_minor, fw_build, prev_fw_version;
>   char brd_name[NETXEN_MAX_SHORT_NAME];
> - char serial_num[32];
> + char serial_num[33];
>   int i, offset, val, err;
>   __le32 *ptr32;
>   struct pci_dev *pdev = adapter->pdev;

Another problem is that the serial_num array is only 4-byte aligned by
accident.  Steps are necessary to make sure the ptr32 assignments don't
take unaligned traps.

Something like:

union {
char buf[33];
__le32 dummy;
} serial_num;


[PATCH v2 net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
From: Craig Gallek 

The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4).  An MTU adjustment is done to account for
these options as well.  However, the options are never present in the
generated packets.

The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used.  This caused the options to no longer be included in v6
encapsulated packets.

The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.

Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in 
Advanced API (RFC3542)")
Signed-off-by: Craig Gallek 
---

v2: Change tunnel code to use dst1opt rather than making the checks for
dst0opt more permissive.

 net/ipv6/ip6_tunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ad15d38b41e8..c81f9541f1f7 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -954,7 +954,7 @@ static void init_tel_txopt(struct ipv6_tel_txoption *opt, 
__u8 encap_limit)
opt->dst_opt[5] = IPV6_TLV_PADN;
opt->dst_opt[6] = 1;
 
-   opt->ops.dst0opt = (struct ipv6_opt_hdr *) opt->dst_opt;
+   opt->ops.dst1opt = (struct ipv6_opt_hdr *) opt->dst_opt;
opt->ops.opt_nflen = 8;
 }
 
@@ -1176,7 +1176,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device 
*dev, __u8 dsfield,
 
if (encap_limit >= 0) {
init_tel_txopt(&opt, encap_limit);
-   ipv6_push_nfrag_opts(skb, &opt.ops, &proto, NULL, NULL);
+   ipv6_push_frag_opts(skb, &opt.ops, &proto);
}
 
/* Calculate max headroom for all the headers and adjust
-- 
2.13.0.rc0.306.g87b477812d-goog



Re: [PATCH v1] net: phy: fix auto-negotiation stall due to unavailable interrupt

2017-04-26 Thread David Miller
From: Alexander Kochetkov 
Date: Thu, 20 Apr 2017 14:00:04 +0300

> The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
> cable was plugged before the Ethernet interface was brought up.
> 
> The patch trigger PHY state machine to update link state if PHY was requested 
> to
> do auto-negotiation and auto-negotiation complete flag already set.
> 
> During power-up cycle the PHY do auto-negotiation, generate interrupt and set
> auto-negotiation complete flag. Interrupt is handled by PHY state machine but
> doesn't update link state because PHY is in PHY_READY state. After some time
> MAC bring up, start and request PHY to do auto-negotiation. If there are no 
> new
> settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation.
> PHY continue to stay in auto-negotiation complete state and doesn't fire
> interrupt. At the same time PHY state machine expect that PHY started
> auto-negotiation and is waiting for interrupt from PHY and it won't get it.
> 
> Signed-off-by: Alexander Kochetkov 
> Cc: stable  # v4.9+

Applied, and I reverted the micrel commit too.

Thanks.


GSO + changing TX queues == crash?

2017-04-26 Thread Jakub Kicinski
Hi!

I'm seeing crashes with GSO on when changing the number of TX rings.  I
initially thought it was a nfp driver problem but I managed to
reproduce it with i40e now.

What I run on the nfp was iperf sending two dozen streams.  Then I run
this little script while 40Gbps is being sent:

bl() { 
tc -s qdisc show dev p4p1 | \
tail -$((MAX_TX*3)) | \
grep backlog | \
grep -v "backlog 0b"
}

while true
do 
ethtool -L p4p1 tx 0

a=$(bl | wc -l)
echo down $a

# there are 8 combined queues, we shouldn't see more backlog
[ $a -gt 8 ] && break  

sleep 2

ethtool -L p4p1 tx 10

echo up $(bl | wc -l)

sleep 2
done

The idea is to catch when after reconfig more queues have backlog than
are configured.  Right after this script exits driver gets traffic on
queues which are down.  It usually reproduces within a minute, I run it
with tso on gso off and tso off gso off for 15 minutes and that didn't
crash.

i40e machine was running kernel 4.10, with the NFP driver I'm able to
reproduce on net-next all the way back to 3.16 (I haven't tried older).

FWIW the nfp driver is doing:

netif_carrier_off()
for enabled rings:
disable_irq()
napi_disable()
netif_tx_disable()

nfp's free_tx_bufs()

netif_set_real_num_tx_queues()

for enabled rings:
napi_enable()
enable_irq()
netif_tx_wake_all_queues()
nfp's read_link() # does netif_carrier_on()

I was entirely convinced that it's a driver problem, but the fact I
crashed the i40e made me worry :|

This is a crash from i40e, it takes a bit longer to kill it than the
nfp, maybe because it takes longer to reconfig:

[  461.822381] BUG: unable to handle kernel NULL pointer dereference at 
0008
[  461.831229] IP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e]
[  461.837045] PGD 0 
[  461.837046] 
[  461.841089] Oops: 0002 [#1] SMP
[  461.844665] Modules linked in: xfs nls_iso8859_1 ipmi_devintf ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
nf_defrag_ipv4 xtc
[  461.924168]  fscache tg3 i40e ptp ahci pps_core libahci fjes
[  461.930595] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 
4.11.0-041100rc1-generic #201703051731
[  461.940340] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 
11/08/2016
[  461.948823] task: a087db2b task.stack: b9b90335
[  461.955546] RIP: 0010:i40e_lan_xmit_frame+0xf1/0x1420 [i40e]
[  461.961965] RSP: 0018:a087df1c3d80 EFLAGS: 00010293
[  461.967899] RAX:  RBX: a087c3077d00 RCX: 
[  461.975971] RDX:  RSI: 0007 RDI: a087b4e1d70c
[  461.984044] RBP: a087df1c3e20 R08: a087d440349c R09: 
[  461.992117] R10: a087de821a08 R11:  R12: a087b7c6
[  462.000188] R13: 0002 R14: 05ea R15: a087d97b3000
[  462.008261] FS:  () GS:a087df1c() 
knlGS:
[  462.017423] CS:  0010 DS:  ES:  CR0: 80050033
[  462.023941] CR2: 0008 CR3: 0006efe09000 CR4: 003406e0
[  462.032014] DR0:  DR1:  DR2: 
[  462.040083] DR3:  DR6: fffe0ff0 DR7: 0400
[  462.048154] Call Trace:
[  462.050969]  
[  462.053311]  ? update_load_avg+0x79/0x520
[  462.057876]  ? sched_clock_cpu+0x11/0xb0
[  462.062356]  dev_hard_start_xmit+0xa3/0x1f0
[  462.067127]  sch_direct_xmit+0xfc/0x1c0
[  462.071509]  __qdisc_run+0x122/0x270
[  462.075598]  net_tx_action+0xfd/0x1e0
[  462.079786]  __do_softirq+0x104/0x2af
[  462.083973]  irq_exit+0xb6/0xc0
[  462.087577]  smp_apic_timer_interrupt+0x3d/0x50
[  462.092732]  apic_timer_interrupt+0x89/0x90
[  462.097501] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[  462.103240] RSP: 0018:b9b903353e58 EFLAGS: 0246 ORIG_RAX: 
ff10
[  462.111818] RAX:  RBX: 0004 RCX: 001f
[  462.119890] RDX: 006b86c1bdc1 RSI: a087df1d8998 RDI: 
[  462.127962] RBP: b9b903353e98 R08: 00084c3f R09: 0018
[  462.136032] R10: 000310a2 R11: 00055e38 R12: a087df1e5800
[  462.144094] R13: b86ec998 R14: 0004 R15: b86ec980
[  462.152165]  
[  462.154601]  ? cpuidle_enter_state+0x110/0x2c0
[  462.159662]  cpuidle_enter+0x17/0x20
[  462.163748]  call_cpuidle+0x23/0x40
[  462.167740]  do_idle+0x189/0x200
[  462.171438]  cpu_startup_entry+0x71/0x80
[  462.175908]  start_secondary+0x154/0x190
[  462.180384]  start_cpu+0x14/0x14
[  462.184080] Code: 8d 75 05 66 39 c8 0f 86 e4 04 00 00 01 d0 0f b7 d1 29 d0 
83 e8 01 39 c6 0f 8f ed 05 00 00 49 8b 44 24 20 48 8d 14 89 4c 8d 1c d0 <49> 89 
5b 08 8b 83 
[  462.205347] RIP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e] RSP: a087df1c3d80
[  462.213418] CR2: 0008
[  462.2

Re: [PATCH] stmmac: Add support for SIMATIC IOT2000 platform

2017-04-26 Thread Andy Shevchenko
On Tue, Apr 25, 2017 at 3:15 PM, Jan Kiszka  wrote:
> On 2017-04-25 13:42, Andy Shevchenko wrote:
>> On Tue, Apr 25, 2017 at 1:09 PM, Jan Kiszka  wrote:
>>> On 2017-04-25 12:07, Jan Kiszka wrote:
 On 2017-04-25 11:46, Andy Shevchenko wrote:
> On Tue, Apr 25, 2017 at 12:00 PM, Jan Kiszka  
> wrote:

>{
>.name = "SIMATIC IOT2000",
>.func = 6,
>.phy_addr = 1,
>},
>{
>.name = "SIMATIC IOT2000",
>.func = 7,
>.phy_addr = 1,
>},
>
> That's all what you need.

 Nope. Again: the asset tag is the way to tell both apart AND to ensure
 that we do not match on future devices.
>>
>>> To be more verbose: your version (which is our old one) would even
>>> enable the second, not connected port on the IOT2020. Incorrectly.
>>
>> So, name has 2000 for 2020 device? It's clear bug in DMI table you have. :-(
>>

>> What else do you have in DMI which can be used to distinguish those devices?

So, except asset tag is there anything else to use?

Whatever you choose would be nice to split this long conditional:

if (!strcmp(dmi->name, name) && dmi->func == func) {
/* ASSET_TAG is optional */
if (dmi->asset_tag && strcmp(dmi->asset_tag, asset_tag))
continue;
return dmi->phy_addr;
}

> Andy, there are devices out there in the field, if we as engineers like
> it or not, that are all called "IOT2000" although they are sightly
> different inside. This patch accounts for that. And it does that even
> without adding "platform_data" hacks like in other patches of mine. ;)

Yes, which is good.

-- 
With Best Regards,
Andy Shevchenko


[PATCH net-next 3/6] bpf: bpf_progs stores all loaded programs

2017-04-26 Thread Hannes Frederic Sowa
We later want to give users a quick dump of what is possible with procfs,
so store a list of all currently loaded bpf programs. Later this list
will be printed in procfs.

Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/filter.h |  4 ++--
 kernel/bpf/core.c  | 51 +++---
 kernel/bpf/syscall.c   |  4 ++--
 3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa53..63624c619e371b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -753,8 +753,8 @@ bpf_address_lookup(unsigned long addr, unsigned long *size,
return ret;
 }
 
-void bpf_prog_kallsyms_add(struct bpf_prog *fp);
-void bpf_prog_kallsyms_del(struct bpf_prog *fp);
+void bpf_prog_link(struct bpf_prog *fp);
+void bpf_prog_unlink(struct bpf_prog *fp);
 
 #else /* CONFIG_BPF_JIT */
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 043f634ff58d87..2139118258cdf8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,22 +365,6 @@ static struct latch_tree_root bpf_tree __cacheline_aligned;
 
 int bpf_jit_kallsyms __read_mostly;
 
-static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux)
-{
-   WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head));
-   list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
-   latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-}
-
-static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux)
-{
-   if (list_empty(&aux->bpf_progs_head))
-   return;
-
-   latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-   list_del_rcu(&aux->bpf_progs_head);
-}
-
 static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 {
return fp->jited && !bpf_prog_was_classic(fp);
@@ -392,38 +376,45 @@ static bool bpf_prog_kallsyms_verify_off(const struct 
bpf_prog *fp)
   fp->aux->bpf_progs_head.prev == LIST_POISON2;
 }
 
-void bpf_prog_kallsyms_add(struct bpf_prog *fp)
+void bpf_prog_link(struct bpf_prog *fp)
 {
-   if (!bpf_prog_kallsyms_candidate(fp) ||
-   !capable(CAP_SYS_ADMIN))
-   return;
+   struct bpf_prog_aux *aux = fp->aux;
 
spin_lock_bh(&bpf_lock);
-   bpf_prog_ksym_node_add(fp->aux);
+   list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
+   if (bpf_prog_kallsyms_candidate(fp))
+   latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
spin_unlock_bh(&bpf_lock);
 }
 
-void bpf_prog_kallsyms_del(struct bpf_prog *fp)
+void bpf_prog_unlink(struct bpf_prog *fp)
 {
-   if (!bpf_prog_kallsyms_candidate(fp))
-   return;
+   struct bpf_prog_aux *aux = fp->aux;
 
spin_lock_bh(&bpf_lock);
-   bpf_prog_ksym_node_del(fp->aux);
+   list_del_rcu(&aux->bpf_progs_head);
+   if (bpf_prog_kallsyms_candidate(fp))
+   latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
spin_unlock_bh(&bpf_lock);
 }
 
 static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
 {
struct latch_tree_node *n;
+   struct bpf_prog *prog;
 
if (!bpf_jit_kallsyms_enabled())
return NULL;
 
n = latch_tree_find((void *)addr, &bpf_tree, &bpf_tree_ops);
-   return n ?
-  container_of(n, struct bpf_prog_aux, ksym_tnode)->prog :
-  NULL;
+   if (!n)
+   return NULL;
+
+   prog = container_of(n, struct bpf_prog_aux, ksym_tnode)->prog;
+   if (!prog->priv_cap_sys_admin)
+   return NULL;
+
+   return prog;
 }
 
 const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
@@ -474,6 +465,10 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
 
rcu_read_lock();
list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) {
+   if (!bpf_prog_kallsyms_candidate(aux->prog) ||
+   !aux->prog->priv_cap_sys_admin)
+   continue;
+
if (it++ != symnum)
continue;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13642c73dca0b4..d61d1bd3e6fee6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -664,7 +664,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
-   bpf_prog_kallsyms_del(prog);
+   bpf_prog_unlink(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
 }
@@ -858,7 +858,7 @@ static int bpf_prog_load(union bpf_attr *attr)
/* failed to allocate fd */
goto free_used_maps;
 
-   bpf_prog_kallsyms_add(prog);
+   bpf_prog_link(prog);
trace_bpf_prog_load(prog, err);
return err;
 
-- 
2.9.3



[PATCH net-next 5/6] bpf: add skeleton for procfs printing of bpf_progs

2017-04-26 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 kernel/bpf/core.c | 90 +++
 1 file changed, 90 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 048e2d79718a16..3ba175a24e971a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -32,6 +32,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 #include 
 
 /* Registers */
@@ -488,6 +491,93 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
return ret;
 }
 
+/* ebpf procfs implementation */
+
+static struct proc_dir_entry *ebpf_proc_dir;
+
+static void *ebpf_proc_start(struct seq_file *s, loff_t *pos)
+   __acquires(RCU_BH)
+{
+   struct bpf_prog_aux *aux;
+   loff_t off = 0;
+
+   rcu_read_lock();
+
+   if (*pos == 0)
+   return SEQ_START_TOKEN;
+
+   list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head)
+   if (++off == *pos)
+   return aux;
+
+   return NULL;
+}
+
+static void *ebpf_proc_next(struct seq_file *s, void *v, loff_t *pos)
+{
+   struct bpf_prog_aux *aux;
+
+   ++*pos;
+
+   if (v == SEQ_START_TOKEN)
+   return list_first_or_null_rcu(&bpf_progs, struct bpf_prog_aux,
+ bpf_progs_head);
+
+   aux = v;
+   return list_next_or_null_rcu(&bpf_progs,
+&aux->bpf_progs_head,
+struct bpf_prog_aux,
+bpf_progs_head);
+}
+
+static void ebpf_proc_stop(struct seq_file *s, void *v)
+   __releases(RCU_BH)
+{
+   rcu_read_unlock();
+}
+
+static int ebpf_proc_show(struct seq_file *s, void *v)
+{
+   if (v == SEQ_START_TOKEN) {
+   seq_printf(s, "# tag\n");
+   return 0;
+   }
+
+   return 0;
+}
+
+static const struct seq_operations ebpf_seq_ops = {
+   .start = ebpf_proc_start,
+   .next = ebpf_proc_next,
+   .stop = ebpf_proc_stop,
+   .show = ebpf_proc_show,
+};
+
+static int ebpf_proc_open(struct inode *inode, struct file *file)
+{
+   return seq_open(file, &ebpf_seq_ops);
+}
+
+static const struct file_operations ebpf_proc_operations = {
+   .open = ebpf_proc_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = seq_release,
+};
+
+static int __init ebpf_proc_init(void)
+{
+   ebpf_proc_dir = proc_mkdir("bpf", NULL);
+   if (!ebpf_proc_dir)
+   return 0;
+   proc_create("programs", 0400, ebpf_proc_dir, &ebpf_proc_operations);
+   return 0;
+}
+
+device_initcall(ebpf_proc_init);
+
+/* end of bpf proc inmplementation */
+
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 unsigned int alignment,
-- 
2.9.3



[PATCH net-next 1/6] bpf: bpf_lock needs only block bottom half

2017-04-26 Thread Hannes Frederic Sowa
We never modify bpf programs from hardirqs ever.

Signed-off-by: Hannes Frederic Sowa 
---
 kernel/bpf/core.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b4f1cb0c5ac710..6f81e0f5a0faa2 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -394,27 +394,23 @@ static bool bpf_prog_kallsyms_verify_off(const struct 
bpf_prog *fp)
 
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
-   unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp) ||
!capable(CAP_SYS_ADMIN))
return;
 
-   spin_lock_irqsave(&bpf_lock, flags);
+   spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_add(fp->aux);
-   spin_unlock_irqrestore(&bpf_lock, flags);
+   spin_unlock_bh(&bpf_lock);
 }
 
 void bpf_prog_kallsyms_del(struct bpf_prog *fp)
 {
-   unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp))
return;
 
-   spin_lock_irqsave(&bpf_lock, flags);
+   spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_del(fp->aux);
-   spin_unlock_irqrestore(&bpf_lock, flags);
+   spin_unlock_bh(&bpf_lock);
 }
 
 static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
-- 
2.9.3



[PATCH net-next 6/6] bpf: show bpf programs

2017-04-26 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 include/uapi/linux/bpf.h | 32 +++-
 kernel/bpf/core.c| 30 +-
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e553529929f683..d6506e320953d5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,20 +101,26 @@ enum bpf_map_type {
BPF_MAP_TYPE_HASH_OF_MAPS,
 };
 
+#define BPF_PROG_TYPES \
+   X(BPF_PROG_TYPE_UNSPEC),\
+   X(BPF_PROG_TYPE_SOCKET_FILTER), \
+   X(BPF_PROG_TYPE_KPROBE),\
+   X(BPF_PROG_TYPE_SCHED_CLS), \
+   X(BPF_PROG_TYPE_SCHED_ACT), \
+   X(BPF_PROG_TYPE_TRACEPOINT),\
+   X(BPF_PROG_TYPE_XDP),   \
+   X(BPF_PROG_TYPE_PERF_EVENT),\
+   X(BPF_PROG_TYPE_CGROUP_SKB),\
+   X(BPF_PROG_TYPE_CGROUP_SOCK),   \
+   X(BPF_PROG_TYPE_LWT_IN),\
+   X(BPF_PROG_TYPE_LWT_OUT),   \
+   X(BPF_PROG_TYPE_LWT_XMIT),
+
+
 enum bpf_prog_type {
-   BPF_PROG_TYPE_UNSPEC,
-   BPF_PROG_TYPE_SOCKET_FILTER,
-   BPF_PROG_TYPE_KPROBE,
-   BPF_PROG_TYPE_SCHED_CLS,
-   BPF_PROG_TYPE_SCHED_ACT,
-   BPF_PROG_TYPE_TRACEPOINT,
-   BPF_PROG_TYPE_XDP,
-   BPF_PROG_TYPE_PERF_EVENT,
-   BPF_PROG_TYPE_CGROUP_SKB,
-   BPF_PROG_TYPE_CGROUP_SOCK,
-   BPF_PROG_TYPE_LWT_IN,
-   BPF_PROG_TYPE_LWT_OUT,
-   BPF_PROG_TYPE_LWT_XMIT,
+#define X(type) type
+   BPF_PROG_TYPES
+#undef X
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3ba175a24e971a..685c1d0f31e029 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
rcu_read_unlock();
 }
 
+static const char *bpf_type_string(enum bpf_prog_type type)
+{
+   static const char *bpf_type_names[] = {
+#define X(type) #type
+   BPF_PROG_TYPES
+#undef X
+   };
+
+   if (type >= ARRAY_SIZE(bpf_type_names))
+   return "";
+
+   return bpf_type_names[type];
+}
+
 static int ebpf_proc_show(struct seq_file *s, void *v)
 {
+   struct bpf_prog *prog;
+   struct bpf_prog_aux *aux;
+   char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
+
if (v == SEQ_START_TOKEN) {
-   seq_printf(s, "# tag\n");
+   seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
return 0;
}
 
+   aux = v;
+   prog = aux->prog;
+
+   bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
+   seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
+  bpf_type_string(prog->type),
+  prog->jited ? "jit" : "int",
+  prog->priv_cap_sys_admin ? "priv" : "unpriv",
+  prog->pages * 1ULL << PAGE_SHIFT);
+
return 0;
 }
 
-- 
2.9.3



[PATCH net-next 0/6] bpf: list all loaded ebpf programs in /proc/bpf/programs

2017-04-26 Thread Hannes Frederic Sowa
Right now it seems difficult to list all active ebpf programs in a
system. This new /proc/bpf/programs node should help and print
basically essential information about loaded ebpf programs.

This should help an admin to get a quick look of what is going on in
his system.

Feedback welcome!

Hannes Frederic Sowa (6):
  bpf: bpf_lock needs only block bottom half
  bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head
  bpf: bpf_progs stores all loaded programs
  bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
  bpf: add skeleton for procfs printing of bpf_progs
  bpf: show bpf programs

 include/linux/bpf.h  |   2 +-
 include/linux/filter.h   |  10 ++-
 include/uapi/linux/bpf.h |  32 
 kernel/bpf/core.c| 195 +--
 kernel/bpf/syscall.c |  11 +--
 kernel/bpf/verifier.c|   4 +-
 net/core/filter.c|   6 +-
 7 files changed, 190 insertions(+), 70 deletions(-)

-- 
2.9.3



[PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities

2017-04-26 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/filter.h | 6 --
 kernel/bpf/core.c  | 4 +++-
 kernel/bpf/syscall.c   | 7 ---
 kernel/bpf/verifier.c  | 4 ++--
 net/core/filter.c  | 6 +++---
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 63624c619e371b..635311f57bf24f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -413,7 +413,8 @@ struct bpf_prog {
locked:1,   /* Program image locked? */
gpl_compatible:1, /* Is filter GPL compatible? 
*/
cb_access:1,/* Is control block accessed? */
-   dst_needed:1;   /* Do we need dst entry? */
+   dst_needed:1,   /* Do we need dst entry? */
+   priv_cap_sys_admin:1; /* Where we loaded as 
sys_admin? */
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
@@ -615,7 +616,8 @@ static inline int sk_filter(struct sock *sk, struct sk_buff 
*skb)
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);
 void bpf_prog_free(struct bpf_prog *fp);
 
-struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags);
+struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags,
+   bool cap_sys_admin);
 struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
  gfp_t gfp_extra_flags);
 void __bpf_prog_free(struct bpf_prog *fp);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2139118258cdf8..048e2d79718a16 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -74,7 +74,8 @@ void *bpf_internal_load_pointer_neg_helper(const struct 
sk_buff *skb, int k, uns
return NULL;
 }
 
-struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
+struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags,
+   bool cap_sys_admin)
 {
gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO |
  gfp_extra_flags;
@@ -94,6 +95,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t 
gfp_extra_flags)
return NULL;
}
 
+   fp->priv_cap_sys_admin = cap_sys_admin;
fp->pages = size / PAGE_SIZE;
fp->aux = aux;
fp->aux->prog = fp;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d61d1bd3e6fee6..ed698c17578a49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -786,7 +786,7 @@ EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 /* last field in 'union bpf_attr' used by this command */
 #defineBPF_PROG_LOAD_LAST_FIELD kern_version
 
-static int bpf_prog_load(union bpf_attr *attr)
+static int bpf_prog_load(union bpf_attr *attr, bool cap_sys_admin)
 {
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog;
@@ -817,7 +817,8 @@ static int bpf_prog_load(union bpf_attr *attr)
return -EPERM;
 
/* plain bpf_prog allocation */
-   prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
+   prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER,
+ cap_sys_admin);
if (!prog)
return -ENOMEM;
 
@@ -1053,7 +1054,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, 
uattr, unsigned int, siz
err = map_get_next_key(&attr);
break;
case BPF_PROG_LOAD:
-   err = bpf_prog_load(&attr);
+   err = bpf_prog_load(&attr, capable(CAP_SYS_ADMIN));
break;
case BPF_OBJ_PIN:
err = bpf_obj_pin(&attr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f8b6ed690be93..24c9dac374770f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3488,7 +3488,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
if (ret < 0)
goto skip_full_check;
 
-   env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+   env->allow_ptr_leaks = env->prog->priv_cap_sys_admin;
 
ret = do_check(env);
 
@@ -3589,7 +3589,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct 
bpf_ext_analyzer_ops *ops,
if (ret < 0)
goto skip_full_check;
 
-   env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+   env->allow_ptr_leaks = prog->priv_cap_sys_admin;
 
ret = do_check(env);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 9a37860a80fc78..dc020d40bb770a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1100,7 +1100,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct 
sock_fprog_kern *fprog)
if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
 

[PATCH net-next 2/6] bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head

2017-04-26 Thread Hannes Frederic Sowa
We will soon put all bpf programs on this list, thus use apropriate names.

Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/bpf.h |  2 +-
 kernel/bpf/core.c   | 18 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d76faf42a..0fbf6a76555cc9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -172,7 +172,7 @@ struct bpf_prog_aux {
u32 used_map_cnt;
u32 max_ctx_offset;
struct latch_tree_node ksym_tnode;
-   struct list_head ksym_lnode;
+   struct list_head bpf_progs_head;
const struct bpf_verifier_ops *ops;
struct bpf_map **used_maps;
struct bpf_prog *prog;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 6f81e0f5a0faa2..043f634ff58d87 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -98,7 +98,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t 
gfp_extra_flags)
fp->aux = aux;
fp->aux->prog = fp;
 
-   INIT_LIST_HEAD_RCU(&fp->aux->ksym_lnode);
+   INIT_LIST_HEAD_RCU(&fp->aux->bpf_progs_head);
 
return fp;
 }
@@ -360,25 +360,25 @@ static const struct latch_tree_ops bpf_tree_ops = {
 };
 
 static DEFINE_SPINLOCK(bpf_lock);
-static LIST_HEAD(bpf_kallsyms);
+static LIST_HEAD(bpf_progs);
 static struct latch_tree_root bpf_tree __cacheline_aligned;
 
 int bpf_jit_kallsyms __read_mostly;
 
 static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux)
 {
-   WARN_ON_ONCE(!list_empty(&aux->ksym_lnode));
-   list_add_tail_rcu(&aux->ksym_lnode, &bpf_kallsyms);
+   WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head));
+   list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
 }
 
 static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux)
 {
-   if (list_empty(&aux->ksym_lnode))
+   if (list_empty(&aux->bpf_progs_head))
return;
 
latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-   list_del_rcu(&aux->ksym_lnode);
+   list_del_rcu(&aux->bpf_progs_head);
 }
 
 static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
@@ -388,8 +388,8 @@ static bool bpf_prog_kallsyms_candidate(const struct 
bpf_prog *fp)
 
 static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
 {
-   return list_empty(&fp->aux->ksym_lnode) ||
-  fp->aux->ksym_lnode.prev == LIST_POISON2;
+   return list_empty(&fp->aux->bpf_progs_head) ||
+  fp->aux->bpf_progs_head.prev == LIST_POISON2;
 }
 
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
@@ -473,7 +473,7 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
return ret;
 
rcu_read_lock();
-   list_for_each_entry_rcu(aux, &bpf_kallsyms, ksym_lnode) {
+   list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) {
if (it++ != symnum)
continue;
 
-- 
2.9.3



Re: linux-next: Tree for Apr 26 (net/can/bcm.c)

2017-04-26 Thread Oliver Hartkopp

Hi Randy,

thanks for the report!

Some fallout of my namespace support integration %-)

I posted a patch for it:

http://marc.info/?l=linux-can&m=149323049630039&w=2

Many thanks & best regards,
Oliver

On 04/26/2017 04:53 PM, Randy Dunlap wrote:

On 04/26/17 01:03, Stephen Rothwell wrote:

Hi all,

Changes since 20170424:



on x86_64:

when CONFIG_PROC_FS is not enabled:

../net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 
'bcmproc_dir'

2 of those are "protected" by
if (IS_ENABLED(CONFIG_PROC_FS)) {
but that doesn't seem to help/work here.

gcc v4.8.5





Re: Bug and configuration MPLS error?

2017-04-26 Thread David Ahern
On 4/26/17 6:40 AM, Алексей Болдырев wrote:
> 
> 
> 26.04.2017, 05:23, "David Ahern" :
>> On 4/25/17 11:28 AM, Алексей Болдырев wrote:
>>>  226 sysctl -w net.mpls.conf.lo.input=1
>>>  227 sysctl -w net.mpls.platform_labels=1048575
>>>  228 ip link add veth0 type veth peer name veth1
>>>  229 ip link add veth2 type veth peer name veth3
>>>  230 sysctl -w net.mpls.conf.veth0.input=1
>>>  231 sysctl -w net.mpls.conf.veth2.input=1
>>>  232 ifconfig veth0 10.3.3.1 netmask 255.255.255.0
>>>  233 ifconfig veth2 10.4.4.1 netmask 255.255.255.0
>>>  234 ip netns add host1
>>>  235 ip netns add host2
>>>  236 ip link set veth1 netns host1
>>>  237 ip link set veth3 netns host2
>>>  238 ip netns exec host1 ifconfig veth1 10.3.3.2 netmask 255.255.255.0 up
>>>  239 ip netns exec host2 ifconfig veth3 10.4.4.2 netmask 255.255.255.0 up
>>>  240 ip netns exec host1 ip route add 10.10.10.2/32 encap mpls 112 via inet 
>>> 10.3.3.1
>>>  241 ip netns exec host2 ip route add 10.10.10.1/32 encap mpls 111 via inet 
>>> 10.4.4.1
>>>  242 ip -f mpls route add 111 via inet 10.3.3.2
>>>  243 ip -f mpls route add 112 via inet 10.4.4.2
>>
>> your setup is incomplete.
>>
>> # ip netns exec host2 ping 10.10.10.1
>> PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
>> ^C
>> --- 10.10.10.1 ping statistics ---
>> 2 packets transmitted, 0 received, 100% packet loss, time 1038ms
>>
>> If you run tcpdump on veth1 in host1 you see the packets come in but no
>> response:
>>
>> # ip netns exec host1 tcpdump -n -i veth1
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes
>> 19:20:24.599529 IP6 fe80::347d:e3ff:fe93:944b > ff02::2: ICMP6, router
>> solicitation, length 16
>> 19:20:27.413901 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq
>> 1, length 64
>> 19:20:28.439574 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq
>> 2, length 64
>>
>> and the lack of response is b/c:
>> 1. host1 has no address for 10.10.10.1 and
>> 2. even if it did, there is no return route to 10.4.4.2:
>>
>> # ip -netns host1 ro ls
>> 10.3.3.0/24 dev veth1 proto kernel scope link src 10.3.3.2
>> 10.10.10.2 encap mpls 112 via 10.3.3.1 dev veth1
> 
> As for ping, you need to enter this:
> Ip netns exec host2 ping 10.10.10.1 -A 10.10.10.2
> Here I published the results of testing on new (>4.9) cores. (in Russian):
> http://forum.nag.ru/forum/index.php?s=d09f0e5186fda59b3099eb81ad07ee63&showtopic=128927
> But on the old kernels:
> http://forum.nag.ru/forum/index.php?showtopic=128927&view=findpost&p=1396067
> 

host1 does not have 10.10.10.1 as a local address.
host2 does not have 10.10.10.2 as a local address.

Given that, host1 has no business replying to a ping destined to
10.10.10.1, and host2 will not use 10.10.10.2 as a source address.

I don't have time right now to build and test on older kernels, but
based on the network config I do not see how it can work.

If you add:
  ip -netns host1 addr add dev lo 10.10.10.1/32
  ip -netns host2 addr add dev lo 10.10.10.2/32

Then it works.


[PATCH net-next] can: fix CAN BCM build with CONFIG_PROC_FS disabled

2017-04-26 Thread Oliver Hartkopp
The introduced namespace support moved the BCM variables for procfs into a
per-net data structure. This leads to a build failure with disabled procfs:

on x86_64:

when CONFIG_PROC_FS is not enabled:

../net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 
'bcmproc_dir'
../net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 
'bcmproc_dir'

http://marc.info/?l=linux-can&m=149321842526524&w=2

Reported-by: Randy Dunlap 
Signed-off-by: Oliver Hartkopp 
---
 net/can/bcm.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 0e855917b7e1..78409841b74c 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -147,6 +147,7 @@ static inline ktime_t bcm_timeval_to_ktime(struct 
bcm_timeval tv)
 /*
  * procfs functions
  */
+#if IS_ENABLED(CONFIG_PROC_FS)
 static char *bcm_proc_getifname(struct net *net, char *result, int ifindex)
 {
struct net_device *dev;
@@ -251,6 +252,7 @@ static const struct file_operations bcm_proc_fops = {
.llseek = seq_lseek,
.release= single_release,
 };
+#endif /* CONFIG_PROC_FS */
 
 /*
  * bcm_can_tx - send the (next) CAN frame to the appropriate CAN interface
@@ -1537,9 +1539,11 @@ static int bcm_release(struct socket *sock)
bcm_remove_op(op);
}
 
+#if IS_ENABLED(CONFIG_PROC_FS)
/* remove procfs entry */
if (net->can.bcmproc_dir && bo->bcm_proc_read)
remove_proc_entry(bo->procname, net->can.bcmproc_dir);
+#endif /* CONFIG_PROC_FS */
 
/* remove device reference */
if (bo->bound) {
@@ -1598,6 +1602,7 @@ static int bcm_connect(struct socket *sock, struct 
sockaddr *uaddr, int len,
bo->ifindex = 0;
}
 
+#if IS_ENABLED(CONFIG_PROC_FS)
if (net->can.bcmproc_dir) {
/* unique socket address as filename */
sprintf(bo->procname, "%lu", sock_i_ino(sk));
@@ -1609,6 +1614,7 @@ static int bcm_connect(struct socket *sock, struct 
sockaddr *uaddr, int len,
goto fail;
}
}
+#endif /* CONFIG_PROC_FS */
 
bo->bound = 1;
 
@@ -1691,22 +1697,22 @@ static const struct can_proto bcm_can_proto = {
 
 static int canbcm_pernet_init(struct net *net)
 {
+#if IS_ENABLED(CONFIG_PROC_FS)
/* create /proc/net/can-bcm directory */
-   if (IS_ENABLED(CONFIG_PROC_FS)) {
-   net->can.bcmproc_dir =
-   proc_net_mkdir(net, "can-bcm", net->proc_net);
-   }
+   net->can.bcmproc_dir =
+   proc_net_mkdir(net, "can-bcm", net->proc_net);
+#endif /* CONFIG_PROC_FS */
 
return 0;
 }
 
 static void canbcm_pernet_exit(struct net *net)
 {
+#if IS_ENABLED(CONFIG_PROC_FS)
/* remove /proc/net/can-bcm directory */
-   if (IS_ENABLED(CONFIG_PROC_FS)) {
-   if (net->can.bcmproc_dir)
-   remove_proc_entry("can-bcm", net->proc_net);
-   }
+   if (net->can.bcmproc_dir)
+   remove_proc_entry("can-bcm", net->proc_net);
+#endif /* CONFIG_PROC_FS */
 }
 
 static struct pernet_operations canbcm_pernet_ops __read_mostly = {
-- 
2.11.0



Re: [PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
On Wed, Apr 26, 2017 at 1:07 PM, Craig Gallek  wrote:
> From: Craig Gallek 
>
> The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
> IPV6_TLV_PADN options when an encapsulation limit is defined (the
> default is a limit of 4).  An MTU adjustment is done to account for
> these options as well.  However, the options are never present in the
> generated packets.
>
> ipv6_push_nfrag_opts requires that IPV6_RTHDR be present in order to
> include any IPV6_DSTOPTS options.  The v6 tunnel code does not
> use routing options, so the encap limit options are not included.
>
> A brief reading of RFC 3542 section 9.2 (specifically the 4th paragraph)
> makes me believe that this requirement in the kernel is incorrect.
Looking more closely, I think I'm wrong here.  Specifically, the cmsg
parser puts IPV6_RTHDRDSTOPTS in dst0opt and IPV6_DSTOPTS in dst1opt.
The tunnel code is currently building dst0opt and using
ipv6_push_nfrag_opts.  Perhaps it should be building dst1opt and
calling ipv6_push_frag_opts?


Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?

2017-04-26 Thread Alexei Starovoitov

On 4/26/17 9:35 AM, John Fastabend wrote:



As Alexei also mentioned before, ifindex vs port makes no real
difference seen from the bpf program side.  It is userspace's
responsibility to add ifindex/port's to the bpf-maps, according to how
the bpf program "policy" want to "connect" these ports.  The
port-table system add one extra step, of also adding this port to the
port-table (which lives inside the kernel).



I'm not sure I understand the "lives inside the kernel" bit. I assumed
the 'map' should be a bpf map and behave like any other bpf map.

I wanted a new map to be defined, something like this from the bpf programmer
side.

struct bpf_map_def SEC("maps") port_table =
.type = BPF_MAP_TYPE_PORT_CONNECTION,
.key_size = sizeof(u32),
.value_size = BPF_PORT_CONNECTION_SIZE,
.max_entries = 256,
};


I like the idea.
We have prog_array, perf_event_array, cgroup_array map specializations.
This one can be new netdev_array with some new bpf_redirect-like helper
accessing it.


When loading the XDP program, we also need to pass along a port table
"id" this XDP program is associated with (and if it doesn't exists you
create it).  And your userspace "control-plane" application also need
to know this port table "id", when adding a new port.


So the user space application that is loading the program also needs
to handle this map. This seems correct to me. But I don't see the
value in making some new port table when we already have well understood
framework for maps.


+1



The concept of having multiple port tables is key.  As this implies we
can have several simultaneous "data-planes" that is *isolated* from
each-other.  Think about how network-namespaces/containers want
isolation. A subtle thing I'm afraid to mention, is that oppose to the
ifindex model, a port table with mapping to a net_device pointer, would
allow (faster) delivery into the container's inner net_device, which
sort of violates the isolation, but I would argue it is not a problem
as this net_device pointer could only be added from a process within the
namespace.  I like this feature, but it could easily be disallowed via
port insertion-time validation.



I think the above optimization should be allowed. And agree multiple port
tables (maps?) is needed. Again all this points to using standard maps
logic in my mind. For permissions and different domains, which I think
you were starting to touch on, it looks like we could extend the pinning API.
At the moment it does an inode_permission(inode, MAY_WRITE) check but I
presume this could be extended. None of this would be needed in v1 and
could be added subsequently. read-only maps seems doable.


this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated
the user space can make it readonly to prevent further changes.

From user space it can be done similar to perf_events/cgroups as well.
bpf_map_update_elem(&netdev_array, &port_num, &ifindex)
should work.
For bpf_map_lookup_elem() from such netdev_array we can return
ifindex back.
The bpf_map_show_fdinfo() can be customized as well to pretty print
ifindexes of netdevs stored in there.



Re: [Patch net 3/3] team: use a larger struct for mac address

2017-04-26 Thread Jarod Wilson

On 2017-04-26 1:28 PM, Cong Wang wrote:

On Wed, Apr 26, 2017 at 9:46 AM, Jarod Wilson  wrote:

On 2017-04-26 12:11 PM, Cong Wang wrote:


On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson  wrote:



We already have struct sockaddr_storage that could be used throughout
this
set as well. We just converted a few pieces of the bonding driver over to
using it for better support of ipoib bonds, via commit
faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use
that
in both bonding and team, rather than having different per-driver
structs,
or Yet Another Address Storage implementation.



Technically, struct sockaddr_storage is not enough either, given the
max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage.



Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and sockaddr_storage
is a #define for __kernel_sockaddr_storage, which has it's __data member
defined as being of size 128 - sizeof(unsigned short).


My bad, I thought it is same with sizeof(in6addr) without looking into it.
The question is, why do we waste 126 - 32 = 94 bytes on stack to just
use struct sockaddr_storage?


That's a fair point.


I totally understand we want a unified struct, but we already redefine
it in multiple places in tree...


Something unified and centralized with a data storage of MAX_ADDR_LEN 
does seem reasonable to get both consistency and minimized waste, and I 
could certainly do a follow-up patch for the bonding driver to switch 
the bits now using sockaddr_storage over to whatever new struct gets added.


--
Jarod Wilson
ja...@redhat.com


RE: qed*: debug infrastructures

2017-04-26 Thread Elior, Ariel
Jiri, Florian, Jakub,
Thanks all for you suggestions.

Some answers to questions posted: The signal tracing in our device can be used 
for tracing things like load/store/program_counter from our fastpath processors
(which handle every packet) which can then be re-run in a simulative environment
(recreating the recorded scenario). Other interesting uses for this feature can
be partial pci recording or partial network recording (poor man's analyzer)
which can also be very effective where full blown lab equipment is unavailable.

I reviewed the code of the drivers under hw_tracing (thanks Florian) and I think
we might be a good fit.

Jiri indicated dpipe was not intended for this sort of thing and suggested an
additional dev link object, although it seems to me that this will have to be
either a very generic object which would be susceptible to abuse similar to
debugfs, or it would be tailored to our device so much that no one else would
use it, so I am somewhat less inclined to go down this path (the code
abstracting our debug feature is accessed via ~20 api functions accepting ~10
params each, i.e. quite a handful of configuraion to generalize).

The ethtool debug dump presets (thanks Jakub) are far too narrow to encompass
the full flexibility required here.

Dave, I think my next step would be to send an RFC adding to our core module
(qed) the necessary APIs (mostly to provide some details to this rather abstract
discussion). I will plan to connect those to a new hwtracing driver I'll create
for this purpose, unless a different direction is suggested.

Thanks,
Ariel


Re: Corrupted SKB

2017-04-26 Thread Cong Wang
On Tue, Apr 25, 2017 at 9:42 PM, Michael Ma  wrote:
> 2017-04-18 21:46 GMT-07:00 Michael Ma :
>> 2017-04-18 16:12 GMT-07:00 Cong Wang :
>>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma  wrote:
 Hi -

 We've implemented a "glue" qdisc similar to mqprio which can associate
 one qdisc to multiple txqs as the root qdisc. Reference count of the
 child qdiscs have been adjusted properly in this case so that it
 represents the number of txqs it has been attached to. However when
 sending packets we saw the skb from dequeue_skb() corrupted with the
 following call stack:

 [exception RIP: netif_skb_features+51]
 RIP: 815292b3  RSP: 8817f6987940  RFLAGS: 00010246

  #9 [8817f6987968] validate_xmit_skb at 815294aa
 #10 [8817f69879a0] validate_xmit_skb at 8152a0d9
 #11 [8817f69879b0] __qdisc_run at 8154a193
 #12 [8817f6987a00] dev_queue_xmit at 81529e03

 It looks like the skb has already been released since its dev pointer
 field is invalid.

 Any clue on how this can be investigated further? My current thought
 is to add some instrumentation to the place where skb is released and
 analyze whether there is any race condition happening there. However
>>>
>>> Either dropwatch or perf could do the work to instrument kfree_skb().
>>
>> Thanks - will try it out.
>
> I'm using perf to collect the callstack for kfree_skb and trying to
> correlate that with the corrupted SKB address however when system
> crashes the perf.data file is also corrupted - how can I view this
> file in case the system crashes before perf exits?

Hmm, KASAN is pretty good at detecting use-after-free,
its report can nicely shows where we allocate/free it and the
use after free.

https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kasan.html


Re: [Patch net 3/3] team: use a larger struct for mac address

2017-04-26 Thread Cong Wang
On Wed, Apr 26, 2017 at 9:46 AM, Jarod Wilson  wrote:
> On 2017-04-26 12:11 PM, Cong Wang wrote:
>>
>> On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson  wrote:
>>>
>>>
>>> We already have struct sockaddr_storage that could be used throughout
>>> this
>>> set as well. We just converted a few pieces of the bonding driver over to
>>> using it for better support of ipoib bonds, via commit
>>> faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use
>>> that
>>> in both bonding and team, rather than having different per-driver
>>> structs,
>>> or Yet Another Address Storage implementation.
>>
>>
>> Technically, struct sockaddr_storage is not enough either, given the
>> max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage.
>
>
> Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and sockaddr_storage
> is a #define for __kernel_sockaddr_storage, which has it's __data member
> defined as being of size 128 - sizeof(unsigned short).

My bad, I thought it is same with sizeof(in6addr) without looking into it.
The question is, why do we waste 126 - 32 = 94 bytes on stack to just
use struct sockaddr_storage?

I totally understand we want a unified struct, but we already redefine
it in multiple places in tree...


Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.

2017-04-26 Thread Eric Anholt
Florian Fainelli  writes:

> On 04/25/2017 04:53 PM, Eric Anholt wrote:
>> Cygnus has a single AMAC controller connected to the B53 switch with 2
>> PHYs.  On the BCM911360_EP platform, those two PHYs are connected to
>> the external ethernet jacks.
>> 
>> Signed-off-by: Eric Anholt 
>> Reviewed-by: Florian Fainelli 
>> ---
>> 
>> v2: Call the node "switch", just call the ports "port" (suggestions by
>> Florian), drop max-speed on the phys (suggestion by Andrew Lunn),
>> call the other nodes "ethernet" and "ethernet-phy" (suggestions by
>> Sergei Shtylyov)
>> 
>>  arch/arm/boot/dts/bcm-cygnus.dtsi  | 58 
>> ++
>>  arch/arm/boot/dts/bcm911360_entphn.dts |  8 +
>>  2 files changed, 66 insertions(+)
>> 
>> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi 
>> b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> index 009f1346b817..9fd89be0f5e0 100644
>> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi
>> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> @@ -142,6 +142,54 @@
>>  interrupts = <0>;
>>  };
>>  
>> +mdio: mdio@18002000 {
>> +compatible = "brcm,iproc-mdio";
>> +reg = <0x18002000 0x8>;
>> +#size-cells = <1>;
>> +#address-cells = <0>;
>
> Sorry for not noticing earlier, since you override this correctly in the
> board-level DTS file can you put a:
>
>   status = "disabled"
>
> property in there by default?

Done.


signature.asc
Description: PGP signature


[PATCH] net: hso: register netdev later to avoid a race condition

2017-04-26 Thread Andreas Kemnade
If the netdev is accessed before the urbs are initialized,
there will be NULL pointer dereferences. That is avoided by
registering it when it is fully initialized.

This case occurs e.g. if dhcpcd is running in the background
and the device is probed, either after insmod hso or
when the device appears on the usb bus.

A backtrace is the following:

[ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
[ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
[ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
[ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
[ 1357.574096] usb 1-2: Manufacturer: Option N.V.
[ 1357.685882] hso 1-2:1.5: Not our interface
[ 1460.886352] hso: unloaded
[ 1460.889984] usbcore: deregistering interface driver hso
[ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
[ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual 
address 0030
[ 1513.887664] hso 1-2:1.5: Not our interface
[ 1513.906890] usbcore: registered new interface driver hso
[ 1513.937988] pgd = ecdec000
[ 1513.949890] [0030] *pgd=acd15831, *pte=, *ppte=
[ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether 
usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 
cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper 
cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils 
sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea 
encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 
pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio 
omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 
videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap 
bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 
bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
[ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 
industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 
twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio 
w2sg0004 ehci_omap omapdss [last unloaded: hso]
[ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: GW   
4.11.0-rc8-letux+ #1
[ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 1514.077758] task: ee748240 task.stack: ecdd6000
[ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
[ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
[ 1514.093231] pc : []lr : []psr: a00f0013
sp : ecdd7e20  ip :   fp : 
[ 1514.105316] r10:   r9 : ed0e080c  r8 : ecd8fe2c
[ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 :   r4 : ed0dbd80
[ 1514.117706] r3 :   r2 : c0020c80  r1 :   r0 : ecdb7800
[ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 0051
[ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
[ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
[ 1514.149078] 7e20: ed0dbd80 ecd8fe98 0001  ecd8f800 ecd8fe00 
ecd8fe60 
[ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 0001 bf7a0258 
ecd8f830 c068d958
[ 1514.166320] 7e60: c068d8b8 ecd8f800 0001 1091 1090 c068dba4 
ecd8f800 1090
[ 1514.174926] 7e80: ecd8f940 ecd8f800  c068dc60  0001 
ed0e0800 ecd8f800
[ 1514.183563] 7ea0:  c06feaa8 c0ca39c2 beea57dc 0020  
306f7368 
[ 1514.192169] 7ec0:   1091    
 8914
[ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 0006  
00046858 c066a948
[ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 0006 c0282c90 
c000 c0283654
[ 1514.218017] 7f20: c09b0c00 c098bc31 0001 c0c5e513 c0c5e513  
c0151354 c01a20c0
[ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 
 ee748240
[ 1514.235260] 7f60: ee748734  ecc6b0c0 ecc6b0c0 beea57dc 8914 
0006 
[ 1514.243896] 7f80: 00046858 c02837b0 1091 0003a1f0 00046608 0003a248 
0036 c01071e4
[ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 0006 8914 
beea57dc 1091
[ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 0036 0003ac0c 00046608 
00046610 00046858
[ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 0006 
 
[ 1514.278411] [] (hso_start_net_device [hso]) from [] 
(hso_net_open+0x68/0x84 [hso])
[ 1514.288238] [] (hso_net_open [hso]) from [] 
(__dev_open+0xa0/0xf4)
[ 1514.296600] [] (__dev_open) from [] 
(__dev_change_flags+0x8c/0x130)
[ 1514.305023] [] (__dev_change_flags) from [] 
(dev_change_flags+0x18/0x48)
[ 1514.313934] [] (dev_change_flags) f

Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.

2017-04-26 Thread Eric Anholt
Florian Fainelli  writes:

> On 04/25/2017 04:53 PM, Eric Anholt wrote:
>> Cygnus has a single AMAC controller connected to the B53 switch with 2
>> PHYs.  On the BCM911360_EP platform, those two PHYs are connected to
>> the external ethernet jacks.
>> 
>> Signed-off-by: Eric Anholt 
>> Reviewed-by: Florian Fainelli 
>> ---
>> 
>> v2: Call the node "switch", just call the ports "port" (suggestions by
>> Florian), drop max-speed on the phys (suggestion by Andrew Lunn),
>> call the other nodes "ethernet" and "ethernet-phy" (suggestions by
>> Sergei Shtylyov)
>> 
>>  arch/arm/boot/dts/bcm-cygnus.dtsi  | 58 
>> ++
>>  arch/arm/boot/dts/bcm911360_entphn.dts |  8 +
>>  2 files changed, 66 insertions(+)
>> 
>> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi 
>> b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> index 009f1346b817..9fd89be0f5e0 100644
>> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi
>> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> @@ -142,6 +142,54 @@
>>  interrupts = <0>;
>>  };
>>  
>> +mdio: mdio@18002000 {
>> +compatible = "brcm,iproc-mdio";
>> +reg = <0x18002000 0x8>;
>> +#size-cells = <1>;
>> +#address-cells = <0>;
>
> Sorry for not noticing earlier, since you override this correctly in the
> board-level DTS file can you put a:
>
>   status = "disabled"
>
> property in there by default?

I didn't have the override in the board file either, just switch and
ethernet.  Fixed.


signature.asc
Description: PGP signature


Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.

2017-04-26 Thread Eric Anholt
Andrew Lunn  writes:

>> +eth0: ethernet@18042000 {
>> +compatible = "brcm,amac";
>> +reg = <0x18042000 0x1000>,
>> +  <0x1811 0x1000>;
>> +reg-names = "amac_base", "idm_base";
>> +interrupts = ;
>> +max-speed = <1000>;
>
> Hi Eric
>
> Sorry i missed this the first time. Does this Ethernet controller do >
> 1Gbps? Does this max-speed do anything useful?

It doesn't look like it.  Dropped.


signature.asc
Description: PGP signature


  1   2   3   >