Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
On Wed, Apr 26, 2017 at 12:11:33PM -0600, Logan Gunthorpe wrote: > Ok, well for starters I think you are mistaken about kmap being able to > fail. I'm having a hard time finding many users of that function that > bother to check for an error when calling it. A quick audit of the arch code shows you're right - kmap can't fail anywhere anymore. > The main difficulty we > have now is that neither of those functions are expected to fail and we > need them to be able to in cases where the page doesn't map to system > RAM. This patch series is trying to address it for users of scatterlist. > I'm certainly open to other suggestions. I think you'll need to follow the existing kmap semantics and never fail the iomem version either. Otherwise you'll have a special case that's almost never used that has a different error path. > There are a fair number of cases in the kernel that do something like: > > if (something) > x = kmap(page); > else > x = kmap_atomic(page); > ... > if (something) > kunmap(page) > else > kunmap_atomic(x) > > Which just seems cumbersome to me. Passing a different flag based on something isn't really much better. > In any case, if you can accept an sg_kmap and sg_kmap_atomic api just > say so and I'll make the change. But I'll still need a flags variable > for SG_MAP_MUST_NOT_FAIL to support legacy cases that have no fail path > and both of those functions will need to be pretty nearly replicas of > each other. Again, wrong way. Suddenly making things fail for your special case that normally don't fail is a receipe for bugs.
Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
On Tue, Apr 25, 2017 at 02:39:55PM -0500, Bjorn Helgaas wrote: > This still leaves these: > > [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it > [PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it > [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it > > I haven't seen any response to 4 and 6. Felix reported an unused > variable in 7. Let me know if you'd like me to do anything with > these. Now that Jeff ACKed 4 it might be worth to add it to the pci tree last minute. I'll resend liquidio and qat to the respective maintainers for the next merge window.
RE: [iproute2] iplink: add support for IFLA_CARRIER attribute
> -Original Message- > From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Wednesday, April 26, 2017 11:08 PM > To: Zhang Shengju > Cc: netdev@vger.kernel.org > Subject: Re: [iproute2] iplink: add support for IFLA_CARRIER attribute > > On Wed, 26 Apr 2017 15:08:39 +0800 > Zhang Shengju wrote: > > > Add support to set IFLA_CARRIER attribute. > > > > Signed-off-by: Zhang Shengju > > --- > > ip/iplink.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/ip/iplink.c b/ip/iplink.c index 866ad72..263bfdd 100644 > > --- a/ip/iplink.c > > +++ b/ip/iplink.c > > @@ -72,6 +72,7 @@ void iplink_usage(void) > > " [ allmulticast { on | off } ]\n" > > " [ promisc { on | off } ]\n" > > " [ trailers { on | off } ]\n" > > + " [ carrier { on | off } ]\n" > > " [ txqueuelen PACKETS ]\n" > > " [ name NEWNAME ]\n" > > " [ address LLADDR ]\n" > > @@ -673,6 +674,17 @@ int iplink_parse(int argc, char **argv, struct > iplink_req *req, > > req->i.ifi_flags |= IFF_NOARP; > > else > > return on_off("arp", *argv); > > + } else if (strcmp(*argv, "carrier") == 0) { > > + int carrier; > > + NEXT_ARG(); > > + if (strcmp(*argv, "on") == 0) > > + carrier = 1; > > + else if (strcmp(*argv, "off") == 0) > > + carrier = 0; > > + else > > + return on_off("carrier", *argv); > > + > > + addattr8(&req->n, sizeof(*req), IFLA_CARRIER, > carrier); > > } else if (strcmp(*argv, "vf") == 0) { > > struct rtattr *vflist; > > > > The general policy of ip link command is all options should be invertable. Yes, so I add 'on' and 'off' subcommand to make sure that it can be invertable. > There are some VPN's that use this to save and restore state. So if you add > an option to set something there should be similar output under the detailed > show command. Currently, "show command" already can display 'carrier' status. Such as: dummy0:
Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
Wed, Apr 26, 2017 at 10:07:08PM CEST, j...@mojatatu.com wrote: >On 17-04-26 09:56 AM, Jiri Pirko wrote: >> Wed, Apr 26, 2017 at 03:14:38PM CEST, j...@mojatatu.com wrote: >> > On 17-04-26 08:08 AM, Jiri Pirko wrote: > >[..] > [...] > >> > Again: You are looking at this from a manageability point of view which >> > is useful but not the only input into a design. If i can squeeze more >> > data without killing usability - I am all for it. It just doesnt >> > compute that it is ok to use a flag per attribute because it looks >> > beautiful. >> >> Hmm. Now that I'm thinking about it, why don't we have NLA_FLAGS with >> couple of helpers around it? It will be obvious what the attr is, all >> kernel code would use the same helpers. Would be nice. >> > >I think to have flags at that level is useful but it >is a different hierarchy level. I am not sure the >"actions dump large messages" is a fit for that level. Jamal, the idea is to have exactly what you want to have. Only does not use NLA_U32 attr for that but a special attr NLA_FLAGS which would have well defined semantics and set of helpers to work with and enforce it. Then, this could be easily reused in other subsystem that uses netlink
[RFC net-next 2/2] bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID
Add test to exercise the bpf_prog id generation and iteration. Signed-off-by: Martin KaFai Lau --- tools/include/uapi/linux/bpf.h | 6 ++ tools/lib/bpf/bpf.c| 11 tools/lib/bpf/bpf.h| 1 + tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/test_prog_id.c | 93 ++ 5 files changed, 112 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/test_prog_id.c diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index e553529929f6..270f501c5597 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -82,6 +82,7 @@ enum bpf_cmd { BPF_PROG_ATTACH, BPF_PROG_DETACH, BPF_PROG_TEST_RUN, + BPF_PROG_GET_NEXT_ID, }; enum bpf_map_type { @@ -201,6 +202,11 @@ union bpf_attr { __u32 repeat; __u32 duration; } test; + + struct { /* anonymous struct used by BPF_PROG_GET_NEXT_ID */ + __u32 start_id; + __aligned_u64 next_id; + }; } __attribute__((aligned(8))); /* BPF helper function descriptions: diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 4fe444b8092e..e23011f88fb4 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -235,3 +235,14 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size, *duration = attr.test.duration; return ret; } + +int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id) +{ + union bpf_attr attr; + + bzero(&attr, sizeof(attr)); + attr.start_id = start_id; + attr.next_id = ptr_to_u64(next_id); + + return sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)); +} diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index edb4daeff7a5..200f1ffc9cf9 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -50,5 +50,6 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type); int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size, void *data_out, __u32 *size_out, __u32 *retval, __u32 *duration); +int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id); #endif diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index d8d94b9bd76c..68c4a920cb56 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -11,7 +11,7 @@ endif CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include LDLIBS += -lcap -lelf -TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs +TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs test_prog_id TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o diff --git a/tools/testing/selftests/bpf/test_prog_id.c b/tools/testing/selftests/bpf/test_prog_id.c new file mode 100644 index ..f6c56c649d72 --- /dev/null +++ b/tools/testing/selftests/bpf/test_prog_id.c @@ -0,0 +1,93 @@ +/* Copyright (c) 2017 Facebook + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_util.h" + +static int bpf_prog_load(const char *file, enum bpf_prog_type type, +struct bpf_object **pobj, int *prog_fd) +{ + struct bpf_program *prog; + struct bpf_object *obj; + int err; + + obj = bpf_object__open(file); + if (IS_ERR(obj)) + return -ENOENT; + + prog = bpf_program__next(NULL, obj); + if (!prog) { + bpf_object__close(obj); + return -ENOENT; + } + + bpf_program__set_type(prog, type); + err = bpf_object__load(obj); + if (err) { + bpf_object__close(obj); + return -EINVAL; + } + + *pobj = obj; + *prog_fd = bpf_program__fd(prog); + return 0; +} + +int main(void) +{ + struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY }; + const char *file = "./test_pkt_access.o"; + const int nr_iters = 16; + int bpf_prog_fds[nr_iters]; + int i, err = 0; + uint32_t next_id = 0; + + if (setrlimit(RLIMIT_MEMLOCK, &rinf)) { + perror("setrlimit"); + return -1; + } + + memset(bpf_prog_fds, -1, sizeof(bpf_prog_fds)); + + for (i = 0; i < nr_iters; i++) { + struct bpf_object *obj; + int prog_fd; + + err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, + &prog_fd); + if (err) { + perror("bpf_prog_load"); +
[RFC net-next 1/2] bpf: Introduce bpf_prog ID
The ID is generated by the existing idr_alloc_cyclic(). This patch also adds BPF_PROG_GET_NEXT_ID to allow userspace to iterate all bpf_prog id(s). The API is trying to be consistent with the existing BPF_MAP_GET_NEXT_KEY. Signed-off-by: Martin KaFai Lau --- include/linux/filter.h | 1 + include/uapi/linux/bpf.h | 6 ++ kernel/bpf/syscall.c | 47 ++- 3 files changed, 53 insertions(+), 1 deletion(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 9a7786db14fa..6eeeb83c4013 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -417,6 +417,7 @@ struct bpf_prog { kmemcheck_bitfield_end(meta); enum bpf_prog_type type; /* Type of BPF program */ u32 len;/* Number of filter blocks */ + u32 id; u8 tag[BPF_TAG_SIZE]; struct bpf_prog_aux *aux; /* Auxiliary fields */ struct sock_fprog_kern *orig_prog; /* Original BPF program */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e553529929f6..270f501c5597 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -82,6 +82,7 @@ enum bpf_cmd { BPF_PROG_ATTACH, BPF_PROG_DETACH, BPF_PROG_TEST_RUN, + BPF_PROG_GET_NEXT_ID, }; enum bpf_map_type { @@ -201,6 +202,11 @@ union bpf_attr { __u32 repeat; __u32 duration; } test; + + struct { /* anonymous struct used by BPF_PROG_GET_NEXT_ID */ + __u32 start_id; + __aligned_u64 next_id; + }; } __attribute__((aligned(8))); /* BPF helper function descriptions: diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 13642c73dca0..6a654e17bd3c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -22,8 +22,11 @@ #include #include #include +#include DEFINE_PER_CPU(int, bpf_prog_active); +DEFINE_IDR(prog_idr); +DEFINE_SPINLOCK(prog_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly; @@ -663,6 +666,9 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu) void bpf_prog_put(struct bpf_prog *prog) { if (atomic_dec_and_test(&prog->aux->refcnt)) { + spin_lock(&prog_idr_lock); + idr_remove(&prog_idr, prog->id); + spin_unlock(&prog_idr_lock); trace_bpf_prog_put_rcu(prog); bpf_prog_kallsyms_del(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); @@ -790,7 +796,7 @@ static int bpf_prog_load(union bpf_attr *attr) { enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog; - int err; + int err, id; char license[128]; bool is_gpl; @@ -848,6 +854,15 @@ static int bpf_prog_load(union bpf_attr *attr) if (err < 0) goto free_used_maps; + spin_lock(&prog_idr_lock); + id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_USER); + spin_unlock(&prog_idr_lock); + if (id < 0) { + err = id; + goto free_used_maps; + } + prog->id = id; + /* eBPF program is ready to be JITed */ prog = bpf_prog_select_runtime(prog, &err); if (err < 0) @@ -995,6 +1010,33 @@ static int bpf_prog_test_run(const union bpf_attr *attr, return ret; } +#define BPF_PROG_GET_NEXT_ID_LAST_FIELD next_id + +static int bpf_prog_get_next_id(union bpf_attr *attr) +{ + u64 __user *unext_id = u64_to_user_ptr(attr->next_id); + u32 next_id = attr->start_id; + struct bpf_prog *prog; + + if (CHECK_ATTR(BPF_PROG_GET_NEXT_ID)) + return -EINVAL; + + if (next_id++ >= INT_MAX) + return -EINVAL; + + spin_lock(&prog_idr_lock); + prog = idr_get_next(&prog_idr, &next_id); + spin_unlock(&prog_idr_lock); + + if (!prog) + return -ENOENT; + + if (copy_to_user(unext_id, &next_id, sizeof(next_id))) + return -EFAULT; + + return 0; +} + SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) { union bpf_attr attr = {}; @@ -1072,6 +1114,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_PROG_TEST_RUN: err = bpf_prog_test_run(&attr, uattr); break; + case BPF_PROG_GET_NEXT_ID: + err = bpf_prog_get_next_id(&attr); + break; default: err = -EINVAL; break; -- 2.9.3
[RFC net-next 0/2] Introduce bpf_prog ID and iteration
This patchset introduces the bpf_prog ID and a new bpf cmd to iterate all bpf_prog in the system. It is still incomplete. The idea can be extended to bpf_map. Martin KaFai Lau (2): bpf: Introduce bpf_prog ID bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID include/linux/filter.h | 1 + include/uapi/linux/bpf.h | 6 ++ kernel/bpf/syscall.c | 47 ++- tools/include/uapi/linux/bpf.h | 6 ++ tools/lib/bpf/bpf.c| 11 tools/lib/bpf/bpf.h| 1 + tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/test_prog_id.c | 93 ++ 8 files changed, 165 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/bpf/test_prog_id.c -- 2.9.3
rhashtable - Cap total number of entries to 2^31
On Tue, Apr 25, 2017 at 10:48:22AM -0400, David Miller wrote: > From: Florian Westphal > Date: Tue, 25 Apr 2017 16:17:49 +0200 > > > I'd have less of an issue with this if we'd be talking about > > something computationally expensive, but this is about storing > > an extra value inside a struct just to avoid one "shr" in insert path... > > Agreed, this shift is probably filling an available cpu cycle :-) OK, but we need to have an extra field for another reason anyway. The problem is that we're not capping the total number of elements in the hashtable when max_size is not set, this means that nelems can overflow which will cause havoc with the automatic shrinking when it tries to fit 2^32 entries into a minimum-sized table. So I'm taking that hole back for now :) ---8<--- When max_size is not set or if it set to a sufficiently large value, the nelems counter can overflow. This would cause havoc with the automatic shrinking as it would then attempt to fit a huge number of entries into a tiny hash table. This patch fixes this by adding max_elems to struct rhashtable to cap the number of elements. This is set to 2^31 as nelems is not a precise count. This is sufficiently smaller than UINT_MAX that it should be safe. When max_size is set max_elems will be lowered to at most twice max_size as is the status quo. Signed-off-by: Herbert Xu diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index ae93b65..45f8936 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -155,6 +155,7 @@ struct rhashtable_params { * @nelems: Number of elements in table * @key_len: Key length for hashfn * @p: Configuration parameters + * @max_elems: Maximum number of elements in table * @rhlist: True if this is an rhltable * @run_work: Deferred worker to expand/shrink asynchronously * @mutex: Mutex to protect current/future table swapping @@ -165,6 +166,7 @@ struct rhashtable { atomic_tnelems; unsigned intkey_len; struct rhashtable_paramsp; + unsigned intmax_elems; boolrhlist; struct work_struct run_work; struct mutexmutex; @@ -327,8 +329,7 @@ static inline bool rht_grow_above_100(const struct rhashtable *ht, static inline bool rht_grow_above_max(const struct rhashtable *ht, const struct bucket_table *tbl) { - return ht->p.max_size && - (atomic_read(&ht->nelems) / 2u) >= ht->p.max_size; + return atomic_read(&ht->nelems) >= ht->max_elems; } /* The bucket lock is selected based on the hash and protects mutations diff --git a/lib/rhashtable.c b/lib/rhashtable.c index f3b82e0..751630b 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -961,6 +961,11 @@ int rhashtable_init(struct rhashtable *ht, if (params->max_size) ht->p.max_size = rounddown_pow_of_two(params->max_size); + /* Cap total entries at 2^31 to avoid nelems overflow. */ + ht->max_elems = 1u << 31; + if (ht->p.max_size < ht->max_elems / 2) + ht->max_elems = ht->p.max_size * 2; + ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE); if (params->nelem_hint) -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: ipsec doesn't route TCP with 4.11 kernel
(Cc'ing netdev and IPSec maintainers) On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman wrote: > I'm not sure how to describe this. > > 4.11rc2 worked, after that, no. > > My ipsec tunnel comes up ok. ICMP works. UDP works. But TCP, the > sender [which is the ipsec client] does not reach the destination. > > Its not a routing rule issue (since ICMP/UDP work). > Its not a traffic selector just selecting TCP (I think) since ipsec > status shows just a subnet, no protocol. > > Using tcpdump: > # iptables -t mangle -I PREROUTING -m policy --pol ipsec --dir in -j > NFLOG --nflog-group 5 > # iptables -t mangle -I POSTROUTING -m policy --pol ipsec --dir out -j > NFLOG --nflog-group 5 > # tcpdump -s 0 -n -i nflog:5 > > I see that it thinks it is sending the TCP packet, but the server end > does not receive. > > Does anyone have any suggestion to try? > > strongswan is 5.5.1 [on ubuntu 17.04] > kernel is 4.11.0-041100rc8-generic > > My rightsubnet is > rightsubnet = 192.168.128.0/17,10.0.0.0/8 > > so no specific protocol selected, the result is: > CHILD_SA sv{1} established with SPIs c05f1b6c_i 0d58815a_o and TS > 192.168.130.4/32 === 10.0.0.0/8 192.168.128.0/17 > > I tried changing charondebug net=3, but i'm not sure how to interpret > the output: > > Apr 25 21:06:34 office charon: 04[NET] received packet: from > 64.7.137.180[4500] to 172.16.0.8[4500] (80 bytes) > Apr 25 21:06:34 office charon: 04[ENC] parsed INFORMATIONAL request 4 [ ] > Apr 25 21:06:34 office charon: 04[ENC] generating INFORMATIONAL response 4 [ ] > Apr 25 21:06:34 office charon: 04[NET] sending packet: from > 172.16.0.8[4500] to 64.7.137.180[4500] (80 bytes)
Re: [PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access
On Wed, Apr 26, 2017 at 5:06 PM, Florian Fainelli wrote: > On 04/26/2017 04:38 PM, Iyappan Subramanian wrote: >> From: Quan Nguyen >> >> This patch adds lock to protect indirect mac access sequence. >> >> Signed-off-by: Quan Nguyen >> Signed-off-by: Iyappan Subramanian >> --- >> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++ >> drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 + >> drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 1 + >> drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++- >> drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++ >> 5 files changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> index 2a835e0..3697ba7 100644 >> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> @@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct >> xgene_enet_pdata *pdata, >> cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; >> cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; >> >> + spin_lock(&pdata->mac_lock); >> if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data)) >> netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n", >> rd_addr); >> + spin_unlock(&pdata->mac_lock); > > Why not fold the spinlock manipulation within xgenet_enet_rd_indirect()? > That way the caller does not have to know that acquiring the spinlock is > required and you avoid potential bugs in the future where you are > missing the spinlock on indirect accesses? Thanks. We'll do that and post the next version. > -- > Florian
[PATCH net-next] tcp: tcp_rack_reo_timeout() must update tp->tcp_mstamp
From: Eric Dumazet I wrongly assumed tp->tcp_mstamp was up to date at the time tcp_rack_reo_timeout() was called. It is not true, since we only update tcp->tcp_mstamp when receiving a packet (as initially done in commit 69e996c58a35 ("tcp: add tp->tcp_mstamp field") tcp_rack_reo_timeout() being called by a timer and not an incoming packet, we need to refresh tp->tcp_mstamp Fixes: 7c1c7308592f ("tcp: do not pass timestamp to tcp_rack_detect_loss()") Signed-off-by: Eric Dumazet Cc: Soheil Hassas Yeganeh Cc: Neal Cardwell Cc: Yuchung Cheng --- net/ipv4/tcp_recovery.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index cd72b3d3879e88181c8a4639f0334a24e4cda852..362b8c75bfab44cf87c2a01398a146a271bc1119 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -166,6 +166,7 @@ void tcp_rack_reo_timeout(struct sock *sk) u32 timeout, prior_inflight; prior_inflight = tcp_packets_in_flight(tp); + skb_mstamp_get(&tp->tcp_mstamp); tcp_rack_detect_loss(sk, &timeout); if (prior_inflight != tcp_packets_in_flight(tp)) { if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) {
Re: [PATCH v1 net-next 3/6] net: add new control message for incoming HW-timestamped packets
Hi Miroslav, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Miroslav-Lichvar/Extend-socket-timestamping-API/20170427-093243 config: xtensa-allmodconfig (attached as .config) compiler: xtensa-linux-gcc (GCC) 4.9.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=xtensa All errors (new ones prefixed by >>): net/socket.c: In function 'put_ts_pktinfo': >> net/socket.c:680:28: error: 'SCM_TIMESTAMPING_PKTINFO' undeclared (first use >> in this function) put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, ^ net/socket.c:680:28: note: each undeclared identifier is reported only once for each function it appears in vim +/SCM_TIMESTAMPING_PKTINFO +680 net/socket.c 674 orig_dev = dev_get_by_napi_id(skb->napi_id); 675 if (!orig_dev) 676 return; 677 678 ts_pktinfo.if_index = orig_dev->ifindex; 679 ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb); > 680 put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, 681 sizeof(ts_pktinfo), &ts_pktinfo); 682 #endif 683 } --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 07/21] crypto: shash, caam: Make use of the new sg_map helper function
On Tue, Apr 25, 2017 at 12:20:54PM -0600, Logan Gunthorpe wrote: > Very straightforward conversion to the new function in the caam driver > and shash library. > > Signed-off-by: Logan Gunthorpe > Cc: Herbert Xu > Cc: "David S. Miller" > --- > crypto/shash.c| 9 ++--- > drivers/crypto/caam/caamalg.c | 8 +++- > 2 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/crypto/shash.c b/crypto/shash.c > index 5e31c8d..5914881 100644 > --- a/crypto/shash.c > +++ b/crypto/shash.c > @@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, > struct shash_desc *desc) > if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) { > void *data; > > - data = kmap_atomic(sg_page(sg)); > - err = crypto_shash_digest(desc, data + offset, nbytes, > + data = sg_map(sg, 0, SG_KMAP_ATOMIC); > + if (IS_ERR(data)) > + return PTR_ERR(data); > + > + err = crypto_shash_digest(desc, data, nbytes, > req->result); > - kunmap_atomic(data); > + sg_unmap(sg, data, 0, SG_KMAP_ATOMIC); > crypto_yield(desc->flags); > } else > err = crypto_shash_init(desc) ?: Nack. This is an optimisation for the special case of a single SG list entry. In fact in the common case the kmap_atomic should disappear altogether in the no-highmem case. So replacing it with sg_map is not acceptable. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH net-next] net: vrf: Do not allow looback to be moved to a VRF
On Wed, 2017-04-26 at 07:58 -0700, David Ahern wrote: > Moving the loopback into a VRF breaks networking for the default VRF. > Since the VRF device is the loopback for VRF domains, there is no > reason to move the loopback. Given the repercussions, block attempts > to set lo into a VRF. > > Signed-off-by: David Ahern > --- > drivers/net/vrf.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c > index aa5d30428bba..ceda5861da78 100644 > --- a/drivers/net/vrf.c > +++ b/drivers/net/vrf.c > @@ -877,6 +877,12 @@ static int do_vrf_add_slave(struct net_device *dev, > struct net_device *port_dev) > { > int ret; > > + /* do not allow loopback device to be enslaved to a VRF. > + * The vrf device acts as the loopback for the vrf. > + */ > + if (port_dev == dev_net(dev)->loopback_dev) > + return -EOPNOTSUPP; > + > port_dev->priv_flags |= IFF_L3MDEV_SLAVE; > ret = netdev_master_upper_dev_link(port_dev, dev, NULL, NULL); > if (ret < 0) I think that's a great idea. Reviewed-by: Greg Rose
[PATCH net v4 3/3] net: hns: fixed bug that skb used after kfree
From: lipeng There is KASAN warning which turn out it's a skb used after free: BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940... [17659.112635] alloc_debug_processing+0x18c/0x1a0 [17659.117208] __slab_alloc+0x52c/0x560 [17659.120909] kmem_cache_alloc_node+0xac/0x2c0 [17659.125309] __alloc_skb+0x6c/0x260 [17659.128837] tcp_send_ack+0x8c/0x280 [17659.132449] __tcp_ack_snd_check+0x9c/0xf0 [17659.136587] tcp_rcv_established+0x5a4/0xa70 [17659.140899] tcp_v4_do_rcv+0x27c/0x620 [17659.144687] tcp_prequeue_process+0x108/0x170 [17659.149085] tcp_recvmsg+0x940/0x1020 [17659.152787] inet_recvmsg+0x124/0x180 [17659.156488] sock_recvmsg+0x64/0x80 [17659.160012] SyS_recvfrom+0xd8/0x180 [17659.163626] __sys_trace_return+0x0/0x4 [17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13 [17659.174000] free_debug_processing+0x1d4/0x2c0 [17659.178486] __slab_free+0x240/0x390 [17659.182100] kmem_cache_free+0x24c/0x270 [17659.186062] kfree_skbmem+0xa0/0xb0 [17659.189587] __kfree_skb+0x28/0x40 [17659.193025] napi_gro_receive+0x168/0x1c0 [17659.197074] hns_nic_rx_up_pro+0x58/0x90 [17659.201038] hns_nic_rx_poll_one+0x518/0xbc0 [17659.205352] hns_nic_common_poll+0x94/0x140 [17659.209576] net_rx_action+0x458/0x5e0 [17659.213363] __do_softirq+0x1b8/0x480 [17659.217062] run_ksoftirqd+0x64/0x80 [17659.220679] smpboot_thread_fn+0x224/0x310 [17659.224821] kthread+0x150/0x170 [17659.228084] ret_from_fork+0x10/0x40 BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0... [17751.080490] __slab_alloc+0x52c/0x560 [17751.084188] kmem_cache_alloc+0x244/0x280 [17751.088238] __build_skb+0x40/0x150 [17751.091764] build_skb+0x28/0x100 [17751.095115] __alloc_rx_skb+0x94/0x150 [17751.098900] __napi_alloc_skb+0x34/0x90 [17751.102776] hns_nic_rx_poll_one+0x180/0xbc0 [17751.107097] hns_nic_common_poll+0x94/0x140 [17751.111333] net_rx_action+0x458/0x5e0 [17751.115123] __do_softirq+0x1b8/0x480 [17751.118823] run_ksoftirqd+0x64/0x80 [17751.122437] smpboot_thread_fn+0x224/0x310 [17751.126575] kthread+0x150/0x170 [17751.129838] ret_from_fork+0x10/0x40 [17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43 [17751.139951] free_debug_processing+0x1d4/0x2c0 [17751.144436] __slab_free+0x240/0x390 [17751.148051] kmem_cache_free+0x24c/0x270 [17751.152014] kfree_skbmem+0xa0/0xb0 [17751.155543] __kfree_skb+0x28/0x40 [17751.159022] napi_gro_receive+0x168/0x1c0 [17751.163074] hns_nic_rx_up_pro+0x58/0x90 [17751.167041] hns_nic_rx_poll_one+0x518/0xbc0 [17751.171358] hns_nic_common_poll+0x94/0x140 [17751.175585] net_rx_action+0x458/0x5e0 [17751.179373] __do_softirq+0x1b8/0x480 [17751.183076] run_ksoftirqd+0x64/0x80 [17751.186691] smpboot_thread_fn+0x224/0x310 [17751.190826] kthread+0x150/0x170 [17751.194093] ret_from_fork+0x10/0x40 in drivers/net/ethernet/hisilicon/hns/hns_enet.c static netdev_tx_t hns_nic_net_xmit(struct sk_buff *skb, struct net_device *ndev) { struct hns_nic_priv *priv = netdev_priv(ndev); int ret; assert(skb->queue_mapping < ndev->ae_handle->q_num); ret = hns_nic_net_xmit_hw(ndev, skb, &tx_ring_data(priv, skb->queue_mapping)); if (ret == NETDEV_TX_OK) { netif_trans_update(ndev); ndev->stats.tx_bytes += skb->len; ndev->stats.tx_packets++; } return (netdev_tx_t)ret; } skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK: out_err_tx_ok: dev_kfree_skb_any(skb); return NETDEV_TX_OK; This patch fix this bug. Reported-by: Jun He Signed-off-by: lipeng Reviewed-by: Yisen Zhuang --- drivers/net/ethernet/hisilicon/hns/hns_enet.c | 22 ++ drivers/net/ethernet/hisilicon/hns/hns_enet.h | 6 +++--- 2 files changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c index fca37e2..1e04df6 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c @@ -300,9 +300,9 @@ static void fill_tso_desc(struct hnae_ring *ring, void *priv, mtu); } -int hns_nic_net_xmit_hw(struct net_device *ndev, - struct sk_buff *skb, - struct hns_nic_ring_data *ring_data) +netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev, +
[PATCH net v4 0/3] net: hns: bug fix for HNS driver
From: lipeng The first two patches [1/3] [2/3] of this serie add support defered dsaf probe when mdio and mbigen module is not insmod. The third patch [3/3] fixes a bug that skb still be used after freed, which will cuase panic. For more details, please refer to individual patch. change log: V3 -> V4: 1. Delete redundant commit message; 2. add Reviewed-by: Matthias Brugger ; V2 -> V3: 1. Check return value when platform_get_irq in hns_rcb_get_cfg; V1 -> V2: 1. Return appropriate errno in hns_mac_register_phy; lipeng (3): net: hns: support deferred probe when can not obtain irq net: hns: support deferred probe when no mdio net: hns: fixed bug that skb used after kfree drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++ drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 ++- drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 - drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +- drivers/net/ethernet/hisilicon/hns/hns_enet.c | 22 ++--- drivers/net/ethernet/hisilicon/hns/hns_enet.h | 6 ++-- 6 files changed, 57 insertions(+), 24 deletions(-) -- 1.9.1
[PATCH net v4 2/3] net: hns: support deferred probe when no mdio
From: lipeng In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. We check for probe deferral in the mac init, so we not init DSAF when there is no mdio, and free all resource, to later learn that we need to defer the probe. Signed-off-by: lipeng Reviewed-by: Yisen Zhuang Reviewed-by: Matthias Brugger --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++ 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c index bdd8cdd..8c00e09 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c @@ -735,6 +735,8 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb) rc = phy_device_register(phy); if (rc) { phy_device_free(phy); + dev_err(&mdio->dev, "registered phy fail at address %i\n", + addr); return -ENODEV; } @@ -745,7 +747,7 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb) return 0; } -static void hns_mac_register_phy(struct hns_mac_cb *mac_cb) +static int hns_mac_register_phy(struct hns_mac_cb *mac_cb) { struct acpi_reference_args args; struct platform_device *pdev; @@ -755,24 +757,39 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb) /* Loop over the child nodes and register a phy_device for each one */ if (!to_acpi_device_node(mac_cb->fw_port)) - return; + return -ENODEV; rc = acpi_node_get_property_reference( mac_cb->fw_port, "mdio-node", 0, &args); if (rc) - return; + return rc; addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port); if (addr < 0) - return; + return addr; /* dev address in adev */ pdev = hns_dsaf_find_platform_device(acpi_fwnode_handle(args.adev)); + if (!pdev) { + dev_err(mac_cb->dev, "mac%d mdio pdev is NULL\n", + mac_cb->mac_id); + return -EINVAL; + } + mii_bus = platform_get_drvdata(pdev); + if (!mii_bus) { + dev_err(mac_cb->dev, + "mac%d mdio is NULL, dsaf will probe again later\n", + mac_cb->mac_id); + return -EPROBE_DEFER; + } + rc = hns_mac_register_phydev(mii_bus, mac_cb, addr); if (!rc) dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n", mac_cb->mac_id, addr); + + return rc; } #define MAC_MEDIA_TYPE_MAX_LEN 16 @@ -793,7 +810,7 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb) *@np:device node * return: 0 --success, negative --fail */ -static int hns_mac_get_info(struct hns_mac_cb *mac_cb) +static int hns_mac_get_info(struct hns_mac_cb *mac_cb) { struct device_node *np; struct regmap *syscon; @@ -903,7 +920,15 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) } } } else if (is_acpi_node(mac_cb->fw_port)) { - hns_mac_register_phy(mac_cb); + ret = hns_mac_register_phy(mac_cb); + /* +* Mac can work well if there is phy or not.If the port don't +* connect with phy, the return value will be ignored. Only +* when there is phy but can't find mdio bus, the return value +* will be handled. +*/ + if (ret == -EPROBE_DEFER) + return ret; } else { dev_err(mac_cb->dev, "mac%d cannot find phy node\n", mac_cb->mac_id); @@ -1065,6 +1090,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev) dsaf_dev->mac_cb[port_id] = mac_cb; } } + /* init mac_cb for all port */ for (port_id = 0; port_id < max_port_num; port_id++) { mac_cb = dsaf_dev->mac_cb[port_id]; @@ -1074,6 +1100,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev) ret = hns_mac_get_cfg(dsaf_dev, mac_cb); if (ret) return ret; + ret = hns_mac_init_ex(mac_cb); if (ret) return ret; -- 1.9.1
[PATCH net v4 1/3] net: hns: support deferred probe when can not obtain irq
From: lipeng In the hip06 and hip07 SoCs, the interrupt lines from the DSAF controllers are connected to mbigen hw module. The mbigen module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. Signed-off-by: lipeng Reviewed-by: Yisen Zhuang Reviewed-by: Matthias Brugger --- change log: V3 -> V4: 1. Delete redundant commit message; 2. add Reviewed-by: Matthias Brugger ; V2 -> V3: 1. Check return value when platform_get_irq in hns_rcb_get_cfg; --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++- drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++- drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c index 6ea8722..a41cf95 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c @@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev) hns_ppe_get_cfg(dsaf_dev->ppe_common[i]); - hns_rcb_get_cfg(dsaf_dev->rcb_common[i]); + ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]); + if (ret) + goto get_rcb_cfg_fail; } for (i = 0; i < HNS_PPE_COM_NUM; i++) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c index f0ed80d6..673a5d3 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c @@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common) *hns_rcb_get_cfg - get rcb config *@rcb_common: rcb common device */ -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) { struct ring_pair_cb *ring_pair_cb; u32 i; @@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] = is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) : platform_get_irq(pdev, base_irq_idx + i * 3); + if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) || + (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER)) + return -EPROBE_DEFER; + ring_pair_cb->q.phy_base = RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i); hns_rcb_ring_pair_get_cfg(ring_pair_cb); } + + return 0; } /** diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h index 99b4e1b..3d7b484 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h @@ -110,7 +110,7 @@ struct rcb_common_cb { void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index); int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common); void hns_rcb_start(struct hnae_queue *q, u32 val); -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common); +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common); void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode, u16 *max_vfn, u16 *max_q_per_vf); -- 1.9.1
Re: more test_progs...
From: Alexei Starovoitov Date: Wed, 26 Apr 2017 16:49:35 -0700 > It's only needed for test_pkt_access.c test, > since it's being fancy and doing iph->ihl * 4. It is going to be a common idiom in anything looking at transport headers, no? > Also such tcp->urg_ptr access into packed struct is more efficient > after JITing then sparc's own load_half_unaligned in asm, since it's > done inline and doesn't need a call. > > Note that such tcphdr workaround is not necessary for more > real programs: test_l4lb.c and test_xdp.c, since they do: > if (iph->ihl != 5) >return drop; Hmmm... > Does sparc64 have some special instructions like that? Unfortunately not. I think we need to seriously consider tracking "Pointer incremented by power of 2 N" and stuff like that. I know it's not easy, but it is necessary. Having stuff like the packed thing above is just a completely unnecessary detail to expose to users writing these programs.
Re: [PATCH net v3 1/3] net: hns: support deferred probe when can not obtain irq
On 2017/4/26 22:03, Matthias Brugger wrote: On 26/04/17 09:00, Yankejian wrote: From: lipeng In the hip06 and hip07 SoCs, the interrupt lines from the DSAF controllers are connected to mbigen hw module. The mbigen module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. We check for probe deferral in the hw layer probe, so we not probe into the main layer and memories, etc., to later learn that we need to defer the probe. This paragraph does not hold any more and should be deleted. OK , will delete it. Other then this: Reviewed-by: Matthias Brugger Thanks for your review Signed-off-by: lipeng Reviewed-by: Yisen Zhuang --- V2 -> V3: 1. Check return value when platform_get_irq in hns_rcb_get_cfg; --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++- drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++- drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c index 6ea8722..a41cf95 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c @@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev) hns_ppe_get_cfg(dsaf_dev->ppe_common[i]); -hns_rcb_get_cfg(dsaf_dev->rcb_common[i]); +ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]); +if (ret) +goto get_rcb_cfg_fail; } for (i = 0; i < HNS_PPE_COM_NUM; i++) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c index f0ed80d6..673a5d3 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c @@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common) *hns_rcb_get_cfg - get rcb config *@rcb_common: rcb common device */ -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) { struct ring_pair_cb *ring_pair_cb; u32 i; @@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common) ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] = is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) : platform_get_irq(pdev, base_irq_idx + i * 3); +if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) || +(ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER)) +return -EPROBE_DEFER; + ring_pair_cb->q.phy_base = RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i); hns_rcb_ring_pair_get_cfg(ring_pair_cb); } + +return 0; } /** diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h index 99b4e1b..3d7b484 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h @@ -110,7 +110,7 @@ struct rcb_common_cb { void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index); int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common); void hns_rcb_start(struct hnae_queue *q, u32 val); -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common); +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common); void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode, u16 *max_vfn, u16 *max_q_per_vf); .
Re: [PATCH net-next] bpf: restore skb->sk before pskb_trim() call
On Wed, Apr 26, 2017 at 4:53 PM, Alexei Starovoitov wrote: > On Wed, Apr 26, 2017 at 09:09:23AM -0700, Eric Dumazet wrote: >> From: Eric Dumazet >> >> While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE() >> in skb_try_coalesce() reported by Andrey, I found that we had an skb >> with skb->sk set but no skb->destructor. >> >> This invalidated heuristic found in commit 158f323b9868 ("net: adjust >> skb->truesize in pskb_expand_head()") and in cited patch. >> >> Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should >> restrain the temporary setting to a minimal section. >> >> [1] https://patchwork.ozlabs.org/patch/755570/ >> net: adjust skb->truesize in ___pskb_trim() >> >> Fixes: 8f917bba0042 ("bpf: pass sk to helper functions") >> Signed-off-by: Eric Dumazet >> Cc: Willem de Bruijn >> Cc: Andrey Konovalov > > Ahh. Thanks for the fix. > Acked-by: Alexei Starovoitov Acked-by: Willem de Bruijn Thanks, Eric.
[PATCH net] tcp: do not underestimate skb->truesize in tcp_trim_head()
From: Eric Dumazet Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in skb_try_coalesce() using syzkaller and a filter attached to a TCP socket over loopback interface. I believe one issue with looped skbs is that tcp_trim_head() can end up producing skb with under estimated truesize. It hardly matters for normal conditions, since packets sent over loopback are never truncated. Bytes trimmed from skb->head should not change skb truesize, since skb->head is not reallocated. Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov --- net/ipv4/tcp_output.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c3c082ed38796f65948e7a1042e37813b71d5abf..a85d863c44196e60fd22e25471cf773e72d2c133 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1267,7 +1267,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, * eventually). The difference is that pulled data not copied, but * immediately discarded. */ -static void __pskb_trim_head(struct sk_buff *skb, int len) +static int __pskb_trim_head(struct sk_buff *skb, int len) { struct skb_shared_info *shinfo; int i, k, eat; @@ -1277,7 +1277,7 @@ static void __pskb_trim_head(struct sk_buff *skb, int len) __skb_pull(skb, eat); len -= eat; if (!len) - return; + return 0; } eat = len; k = 0; @@ -1303,23 +1303,28 @@ static void __pskb_trim_head(struct sk_buff *skb, int len) skb_reset_tail_pointer(skb); skb->data_len -= len; skb->len = skb->data_len; + return len; } /* Remove acked data from a packet in the transmit queue. */ int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len) { + u32 delta_truesize; + if (skb_unclone(skb, GFP_ATOMIC)) return -ENOMEM; - __pskb_trim_head(skb, len); + delta_truesize = __pskb_trim_head(skb, len); TCP_SKB_CB(skb)->seq += len; skb->ip_summed = CHECKSUM_PARTIAL; - skb->truesize-= len; - sk->sk_wmem_queued -= len; - sk_mem_uncharge(sk, len); - sock_set_flag(sk, SOCK_QUEUE_SHRUNK); + if (delta_truesize) { + skb->truesize -= delta_truesize; + sk->sk_wmem_queued -= delta_truesize; + sk_mem_uncharge(sk, delta_truesize); + sock_set_flag(sk, SOCK_QUEUE_SHRUNK); + } /* Any change of skb->len requires recalculation of tso factor. */ if (tcp_skb_pcount(skb) > 1)
Re: [PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access
On 04/26/2017 04:38 PM, Iyappan Subramanian wrote: > From: Quan Nguyen > > This patch adds lock to protect indirect mac access sequence. > > Signed-off-by: Quan Nguyen > Signed-off-by: Iyappan Subramanian > --- > drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++ > drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 + > drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 1 + > drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++- > drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++ > 5 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > index 2a835e0..3697ba7 100644 > --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > @@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct > xgene_enet_pdata *pdata, > cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; > cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; > > + spin_lock(&pdata->mac_lock); > if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data)) > netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n", > rd_addr); > + spin_unlock(&pdata->mac_lock); Why not fold the spinlock manipulation within xgenet_enet_rd_indirect()? That way the caller does not have to know that acquiring the spinlock is required and you avoid potential bugs in the future where you are missing the spinlock on indirect accesses? -- Florian
Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 81ef53f..42bff22 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3300,8 +3300,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb, > > static inline void sw_tx_timestamp(struct sk_buff *skb) > { > - if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP && > - !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) > + if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP) > skb_tstamp_tx(skb, NULL); > } > > diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h > index 8fcae35..d251972 100644 > --- a/include/uapi/linux/net_tstamp.h > +++ b/include/uapi/linux/net_tstamp.h > @@ -27,8 +27,9 @@ enum { > SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), > SOF_TIMESTAMPING_OPT_STATS = (1<<12), > SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), > + SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14), > > - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, > + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TX_SWHW, > SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | > SOF_TIMESTAMPING_LAST > }; > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 58604c1..db5aa19 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3874,6 +3874,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, > if (!sk) > return; > > + if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) && > + skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS) > + return; > + This check should only happen for software transmit timestamps, so simpler to revise the check in sw_tx_timestamp above to if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP && -!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) + (!(skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)) || + (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) > diff --git a/net/socket.c b/net/socket.c > index 68b9304..14b7688 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct > sock *sk, > empty = 0; > if (shhwtstamps && > (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && > + (empty || !skb_is_err_queue(skb)) && > ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) { I find skb->tstamp == 0 easier to understand than the condition on empty. Indeed, this is so non-obvious that I would suggest another helper function skb_is_hwtx_tstamp with a concise comment about the race condition between tx software and hardware timestamps (as in the last sentence of the commit message).
[PATCH] netvsc: make sure napi enabled before vmbus_open
This fixes a race where vmbus callback for new packet arriving could occur before NAPI is initialized. Happens more on WS2008 which takes longer to setup channel. Signed-off-by: Stephen Hemminger --- drivers/net/hyperv/netvsc.c | 8 +--- drivers/net/hyperv/rndis_filter.c | 4 +++- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index f99651c03e0a..8150febb86d6 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -1323,6 +1323,10 @@ int netvsc_device_add(struct hv_device *device, nvchan->channel = device->channel; } + /* Enable NAPI handler before init callbacks */ + netif_napi_add(ndev, &net_device->chan_table[0].napi, + netvsc_poll, NAPI_POLL_WEIGHT); + /* Open the channel */ ret = vmbus_open(device->channel, ring_size * PAGE_SIZE, ring_size * PAGE_SIZE, NULL, 0, @@ -1330,6 +1334,7 @@ int netvsc_device_add(struct hv_device *device, net_device->chan_table); if (ret != 0) { + netif_napi_del(&net_device->chan_table[0].napi); netdev_err(ndev, "unable to open channel: %d\n", ret); goto cleanup; } @@ -1337,9 +1342,6 @@ int netvsc_device_add(struct hv_device *device, /* Channel is opened */ netdev_dbg(ndev, "hv_netvsc channel opened successfully\n"); - /* Enable NAPI handler for init callbacks */ - netif_napi_add(ndev, &net_device->chan_table[0].napi, - netvsc_poll, NAPI_POLL_WEIGHT); napi_enable(&net_device->chan_table[0].napi); /* Writing nvdev pointer unlocks netvsc_send(), make sure chn_table is diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index ab92c3c95951..a9e9fb4bb9c5 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1017,8 +1017,10 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc) netvsc_channel_cb, nvchan); if (ret == 0) napi_enable(&nvchan->napi); - else + else { + netif_napi_del(&nvchan->napi); netdev_err(ndev, "sub channel open failed (%d)\n", ret); + } if (refcount_dec_and_test(&nvscdev->sc_offered)) complete(&nvscdev->channel_init_wait); -- 2.11.0
Re: more test_progs...
On 4/26/17 7:55 AM, David Miller wrote: From: Daniel Borkmann Date: Wed, 26 Apr 2017 10:14:42 +0200 On 04/26/2017 05:42 AM, Alexei Starovoitov wrote: That sucks for sparc, of course. I don't have good suggestions for workaround other than marking tcphdr as packed :( From code efficiency point of view it still will be faster than LD_ABS insn. Plus, ld_abs would also only work on skbs right now, and would need JIT support for XDP. But it's also cumbersome to use with f.e. IPv6, etc, so should not be encouraged, imo. One other option for !HAVE_EFFICIENT_UNALIGNED_ACCESS archs could be to provide a bpf_skb_load_bytes() helper equivalent for XDP, where you eventually do the memcpy() inside. We could see to inline the helper itself to optimize it slightly. We have to do something that works transparently and always, regardless of whether HAVE_EFFICIENT_UNALIGNED_ACCESS is in play or not. And no, marking objects with __packed is not the answer. I'm not suggesting to mark everything as __packed. Why the following is not the answer? index fd1e0832d409..c215dffd7189 100644 --- a/tools/testing/selftests/bpf/test_pkt_access.c +++ b/tools/testing/selftests/bpf/test_pkt_access.c @@ -19,13 +19,52 @@ #define barrier() __asm__ __volatile__("": : :"memory") int _version SEC("version") = 1; +struct __tcphdr { + __u16 source; ... + __u16 window; + __u16 check; + __u16 urg_ptr; +} +#if defined(__sparc__) || defined(__s390__) +__packed +#endif +; SEC("test1") int process(struct __sk_buff *skb) { void *data_end = (void *)(long)skb->data_end; void *data = (void *)(long)skb->data; struct ethhdr *eth = (struct ethhdr *)(data); - struct tcphdr *tcp = NULL; + struct __tcphdr *tcp = NULL; __u8 proto = 255; It's only needed for test_pkt_access.c test, since it's being fancy and doing iph->ihl * 4. Also such tcp->urg_ptr access into packed struct is more efficient after JITing then sparc's own load_half_unaligned in asm, since it's done inline and doesn't need a call. Note that such tcphdr workaround is not necessary for more real programs: test_l4lb.c and test_xdp.c, since they do: if (iph->ihl != 5) return drop; Another idea: x64, arm64, ppc have efficient unaligned. s390 and mips64 have special instructions to do unaligned access efficiently and we can make verifier convert unknown-align load/stores into special internal load/stores, so they can be executed differently by interpreter and by JITs. Does sparc64 have some special instructions like that?
Re: [PATCH net-next v1] net: ipv6: make sure multicast packets are not forwarded beyond the different scopes
Hi Donatas, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Donatas-Abraitis/net-ipv6-make-sure-multicast-packets-are-not-forwarded-beyond-the-different-scopes/20170426-180846 config: x86_64-rhel (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): net//ipv6/ip6_input.c: In function 'ipv6_rcv': >> net//ipv6/ip6_input.c:174:10: error: expected ')' before 'goto' goto err; ^~~~ >> net//ipv6/ip6_input.c:225:1: error: expected expression before '}' token } ^ >> net//ipv6/ip6_input.c:166:3: error: label 'err' used but not defined goto err; ^~~~ >> net//ipv6/ip6_input.c:95:3: error: label 'drop' used but not defined goto drop; ^~~~ net//ipv6/ip6_input.c:77:6: warning: unused variable 'pkt_len' [-Wunused-variable] u32 pkt_len; ^~~ net//ipv6/ip6_input.c:225:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ vim +174 net//ipv6/ip6_input.c 89 90 __IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_IN, skb->len); 91 92 if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL || 93 !idev || unlikely(idev->cnf.disable_ipv6)) { 94 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); > 95 goto drop; 96 } 97 98 memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); 99 100 /* 101 * Store incoming device index. When the packet will 102 * be queued, we cannot refer to skb->dev anymore. 103 * 104 * BTW, when we send a packet for our own local address on a 105 * non-loopback interface (e.g. ethX), it is being delivered 106 * via the loopback interface (lo) here; skb->dev = loopback_dev. 107 * It, however, should be considered as if it is being 108 * arrived via the sending interface (ethX), because of the 109 * nature of scoping architecture. --yoshfuji 110 */ 111 IP6CB(skb)->iif = skb_valid_dst(skb) ? ip6_dst_idev(skb_dst(skb))->dev->ifindex : dev->ifindex; 112 113 if (unlikely(!pskb_may_pull(skb, sizeof(*hdr 114 goto err; 115 116 hdr = ipv6_hdr(skb); 117 118 if (hdr->version != 6) 119 goto err; 120 121 __IP6_ADD_STATS(net, idev, 122 IPSTATS_MIB_NOECTPKTS + 123 (ipv6_get_dsfield(hdr) & INET_ECN_MASK), 124 max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs)); 125 /* 126 * RFC4291 2.5.3 127 * The loopback address must not be used as the source address in IPv6 128 * packets that are sent outside of a single node. [..] 129 * A packet received on an interface with a destination address 130 * of loopback must be dropped. 131 */ 132 if ((ipv6_addr_loopback(&hdr->saddr) || 133 ipv6_addr_loopback(&hdr->daddr)) && 134 !(dev->flags & IFF_LOOPBACK)) 135 goto err; 136 137 /* RFC4291 Errata ID: 3480 138 * Interface-Local scope spans only a single interface on a 139 * node and is useful only for loopback transmission of 140 * multicast. Packets with interface-local scope received 141 * from another node must be discarded. 142 */ 143 if (!(skb->pkt_type == PACKET_LOOPBACK || 144dev->flags & IFF_LOOPBACK) && 145 ipv6_addr_is_multicast(&hdr->daddr) && 146 IPV6_ADDR_MC_SCOPE(&hdr->daddr) == 1) 147 goto err; 148 149 /* If enabled, drop unicast packets that were encapsulated in link-layer 150 * multicast or broadcast to protected against the so-called "hole-196" 151 * attack in 802.11 wireless. 152 */ 153 if (!ipv6_addr_is_multicast(&hdr->daddr) && 154 (skb->pkt_type == PACKET_BROADCAST || 155 skb->pkt_type == PACKET_MULTICAST) && 156 idev->cnf.drop_unicast_in_l2_multicast) 157 goto err; 158 159 /* RFC4291 2.7 160 * Nodes must not originate a packet to a multicast address whose scope 161 * field contains the reserved value 0;
[PATCH net-next] Fix inaccurate helper function description
From: Chenbo Feng The description inside uapi/linux/bpf.h about bpf_get_socket_uid helper function is no longer valid. It returns overflowuid rather than 0 when failed. Signed-off-by: Chenbo Feng --- include/uapi/linux/bpf.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e553529..945a1f5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -481,8 +481,7 @@ union bpf_attr { * u32 bpf_get_socket_uid(skb) * Get the owner uid of the socket stored inside sk_buff. * @skb: pointer to skb - * Return: uid of the socket owner on success or 0 if the socket pointer - * inside sk_buff is NULL + * Return: uid of the socket owner on success or overflowuid if failed. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ -- 2.7.4
[PATCH net-next 5/5] bpf: provide a generic macro for percpu values for selftests
To overcome bugs as described and fixed in 89087c456fb5 ("bpf: Fix values type used in test_maps"), provide a generic BPF_DECLARE_PERCPU() and bpf_percpu() accessor macro for all percpu map values used in tests. Declaring variables works as follows (also works for structs): BPF_DECLARE_PERCPU(uint32_t, my_value); They can then be accessed normally as uint32_t type through: bpf_percpu(my_value, ) For example: bpf_percpu(my_value, 0)++; Implicitly, we make sure that the passed type is allocated and aligned by gcc at least on a 8-byte boundary, so that it works together with the map lookup/update syscall for percpu maps. We use it as a usage example in test_maps, so that others are free to adapt this into their code when necessary. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- tools/testing/selftests/bpf/bpf_util.h | 7 +++ tools/testing/selftests/bpf/test_maps.c | 37 ++--- 2 files changed, 27 insertions(+), 17 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h index 7de2796..369e7d7 100644 --- a/tools/testing/selftests/bpf/bpf_util.h +++ b/tools/testing/selftests/bpf/bpf_util.h @@ -54,4 +54,11 @@ static inline unsigned int bpf_num_possible_cpus(void) return possible_cpus; } +#define __bpf_percpu_val_align __attribute__((__aligned__(8))) + +#define BPF_DECLARE_PERCPU(type, name) \ + struct { type v; /* padding */ } __bpf_percpu_val_align \ + name[bpf_num_possible_cpus()] +#define bpf_percpu(name, cpu) name[(cpu)].v + #endif /* __BPF_UTIL__ */ diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c index a977c4f..9331452 100644 --- a/tools/testing/selftests/bpf/test_maps.c +++ b/tools/testing/selftests/bpf/test_maps.c @@ -137,20 +137,20 @@ static void test_hashmap_sizes(int task, void *data) static void test_hashmap_percpu(int task, void *data) { unsigned int nr_cpus = bpf_num_possible_cpus(); - long long value[nr_cpus]; + BPF_DECLARE_PERCPU(long, value); long long key, next_key, first_key; int expected_key_mask = 0; int fd, i; fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key), - sizeof(value[0]), 2, map_flags); + sizeof(bpf_percpu(value, 0)), 2, map_flags); if (fd < 0) { printf("Failed to create hashmap '%s'!\n", strerror(errno)); exit(1); } for (i = 0; i < nr_cpus; i++) - value[i] = i + 100; + bpf_percpu(value, i) = i + 100; key = 1; /* Insert key=1 element. */ @@ -170,8 +170,9 @@ static void test_hashmap_percpu(int task, void *data) /* Check that key=1 can be found. Value could be 0 if the lookup * was run from a different CPU. */ - value[0] = 1; - assert(bpf_map_lookup_elem(fd, &key, value) == 0 && value[0] == 100); + bpf_percpu(value, 0) = 1; + assert(bpf_map_lookup_elem(fd, &key, value) == 0 && + bpf_percpu(value, 0) == 100); key = 2; /* Check that key=2 is not found. */ @@ -211,7 +212,7 @@ static void test_hashmap_percpu(int task, void *data) assert(bpf_map_lookup_elem(fd, &next_key, value) == 0); for (i = 0; i < nr_cpus; i++) - assert(value[i] == i + 100); + assert(bpf_percpu(value, i) == i + 100); key = next_key; } @@ -296,34 +297,36 @@ static void test_arraymap(int task, void *data) static void test_arraymap_percpu(int task, void *data) { unsigned int nr_cpus = bpf_num_possible_cpus(); + BPF_DECLARE_PERCPU(long, values); int key, next_key, fd, i; - long long values[nr_cpus]; fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key), - sizeof(values[0]), 2, 0); + sizeof(bpf_percpu(values, 0)), 2, 0); if (fd < 0) { printf("Failed to create arraymap '%s'!\n", strerror(errno)); exit(1); } for (i = 0; i < nr_cpus; i++) - values[i] = i + 100; + bpf_percpu(values, i) = i + 100; key = 1; /* Insert key=1 element. */ assert(bpf_map_update_elem(fd, &key, values, BPF_ANY) == 0); - values[0] = 0; + bpf_percpu(values, 0) = 0; assert(bpf_map_update_elem(fd, &key, values, BPF_NOEXIST) == -1 && errno == EEXIST); /* Check that key=1 can be found. */ - assert(bpf_map_lookup_elem(fd, &key, values) == 0 && values[0] == 100); + assert(bpf_map_lookup_elem(fd, &key, values) == 0 && + bpf_percpu(values, 0) == 100); key = 0; /* Check that key=0 is also found and zero initialized. */ assert(bpf_map_lookup_elem(fd,
[PATCH net-next 1/5] bpf, x86_64/arm64: remove old ldimm64 artifacts from jits
For both cases, the verifier is already rejecting such invalid formed instructions. Thus, remove these artifacts from old times and align it with ppc64, sparc64 and s390x JITs that don't have them in the first place. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- arch/arm64/net/bpf_jit_comp.c | 9 - arch/x86/net/bpf_jit_comp.c | 7 --- 2 files changed, 16 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index a785554..3047368 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -604,15 +604,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) const struct bpf_insn insn1 = insn[1]; u64 imm64; - if (insn1.code != 0 || insn1.src_reg != 0 || - insn1.dst_reg != 0 || insn1.off != 0) { - /* Note: verifier in BPF core must catch invalid -* instructions. -*/ - pr_err_once("Invalid BPF_LD_IMM64 instruction\n"); - return -EINVAL; - } - imm64 = (u64)insn1.imm << 32 | (u32)imm; emit_a64_mov_i64(dst, imm64, ctx); diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 32322ce..14f840d 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -490,13 +490,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, break; case BPF_LD | BPF_IMM | BPF_DW: - if (insn[1].code != 0 || insn[1].src_reg != 0 || - insn[1].dst_reg != 0 || insn[1].off != 0) { - /* verifier must catch invalid insns */ - pr_err("invalid BPF_LD_IMM64 insn\n"); - return -EINVAL; - } - /* optimization: if imm64 is zero, use 'xor ,' * to save 7 bytes. */ -- 1.9.3
[PATCH net-next 3/5] bpf: bpf_lock on kallsysms doesn't need to be irqsave
From: Hannes Frederic Sowa Hannes rightfully spotted that the bpf_lock doesn't need to be irqsave variant. We never perform any such updates where this would be necessary (neither right now nor in future), therefore relax this further. Signed-off-by: Hannes Frederic Sowa Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- kernel/bpf/core.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index b4f1cb0..6f81e0f 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -394,27 +394,23 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) void bpf_prog_kallsyms_add(struct bpf_prog *fp) { - unsigned long flags; - if (!bpf_prog_kallsyms_candidate(fp) || !capable(CAP_SYS_ADMIN)) return; - spin_lock_irqsave(&bpf_lock, flags); + spin_lock_bh(&bpf_lock); bpf_prog_ksym_node_add(fp->aux); - spin_unlock_irqrestore(&bpf_lock, flags); + spin_unlock_bh(&bpf_lock); } void bpf_prog_kallsyms_del(struct bpf_prog *fp) { - unsigned long flags; - if (!bpf_prog_kallsyms_candidate(fp)) return; - spin_lock_irqsave(&bpf_lock, flags); + spin_lock_bh(&bpf_lock); bpf_prog_ksym_node_del(fp->aux); - spin_unlock_irqrestore(&bpf_lock, flags); + spin_unlock_bh(&bpf_lock); } static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr) -- 1.9.3
[PATCH net-next 4/5] bpf: fix _htons occurences in test_progs
Dave reported that on sparc test_progs generates buggy swapped eth->h_proto protocol comparisons: 10: (15) if r3 == 0xdd86 goto pc+9 R0=imm2,min_value=2,max_value=2 R1=pkt(id=0,off=0,r=14) R2=pkt_end R3=inv R4=pkt(id=0,off=14,r=14) R5=inv56 R10=fp This is due to the unconditional ... #define htons __builtin_bswap16 #define ntohs __builtin_bswap16 ... in test_progs that causes this. Make use of asm/byteorder.h and use __constant_htons() where possible and only perform the bswap16 when on little endian in non-constant case. Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range checks") Fixes: 37821613626e ("selftests/bpf: add l4 load balancer test based on sched_cls") Reported-by: David S. Miller Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- tools/testing/selftests/bpf/bpf_util.h| 19 +++ tools/testing/selftests/bpf/test_l4lb.c | 11 +-- tools/testing/selftests/bpf/test_pkt_access.c | 6 +++--- tools/testing/selftests/bpf/test_progs.c | 10 -- 4 files changed, 31 insertions(+), 15 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h index 84a5d18..7de2796 100644 --- a/tools/testing/selftests/bpf/bpf_util.h +++ b/tools/testing/selftests/bpf/bpf_util.h @@ -6,6 +6,25 @@ #include #include +#include + +#if __BYTE_ORDER == __LITTLE_ENDIAN +# define __bpf_ntohs(x)__builtin_bswap16(x) +# define __bpf_htons(x)__builtin_bswap16(x) +#elif __BYTE_ORDER == __BIG_ENDIAN +# define __bpf_ntohs(x)(x) +# define __bpf_htons(x)(x) +#else +# error "Fix your __BYTE_ORDER?!" +#endif + +#define bpf_htons(x) \ + (__builtin_constant_p(x) ? \ +__constant_htons(x) : __bpf_htons(x)) +#define bpf_ntohs(x) \ + (__builtin_constant_p(x) ? \ +__constant_ntohs(x) : __bpf_ntohs(x)) + static inline unsigned int bpf_num_possible_cpus(void) { static const char *fcpu = "/sys/devices/system/cpu/possible"; diff --git a/tools/testing/selftests/bpf/test_l4lb.c b/tools/testing/selftests/bpf/test_l4lb.c index 368bfe8..b68b212 100644 --- a/tools/testing/selftests/bpf/test_l4lb.c +++ b/tools/testing/selftests/bpf/test_l4lb.c @@ -19,9 +19,8 @@ #include #include "bpf_helpers.h" #include "test_iptunnel_common.h" +#include "bpf_util.h" -#define htons __builtin_bswap16 -#define ntohs __builtin_bswap16 int _version SEC("version") = 1; static inline __u32 rol32(__u32 word, unsigned int shift) @@ -355,7 +354,7 @@ static __always_inline int process_packet(void *data, __u64 off, void *data_end, iph_len = sizeof(struct ipv6hdr); protocol = ip6h->nexthdr; pckt.proto = protocol; - pkt_bytes = ntohs(ip6h->payload_len); + pkt_bytes = bpf_ntohs(ip6h->payload_len); off += iph_len; if (protocol == IPPROTO_FRAGMENT) { return TC_ACT_SHOT; @@ -377,7 +376,7 @@ static __always_inline int process_packet(void *data, __u64 off, void *data_end, protocol = iph->protocol; pckt.proto = protocol; - pkt_bytes = ntohs(iph->tot_len); + pkt_bytes = bpf_ntohs(iph->tot_len); off += IPV4_HDR_LEN_NO_OPT; if (iph->frag_off & PCKT_FRAGMENTED) @@ -464,9 +463,9 @@ int balancer_ingress(struct __sk_buff *ctx) if (data + nh_off > data_end) return TC_ACT_SHOT; eth_proto = eth->eth_proto; - if (eth_proto == htons(ETH_P_IP)) + if (eth_proto == bpf_htons(ETH_P_IP)) return process_packet(data, nh_off, data_end, false, ctx); - else if (eth_proto == htons(ETH_P_IPV6)) + else if (eth_proto == bpf_htons(ETH_P_IPV6)) return process_packet(data, nh_off, data_end, true, ctx); else return TC_ACT_SHOT; diff --git a/tools/testing/selftests/bpf/test_pkt_access.c b/tools/testing/selftests/bpf/test_pkt_access.c index fd1e083..7113005 100644 --- a/tools/testing/selftests/bpf/test_pkt_access.c +++ b/tools/testing/selftests/bpf/test_pkt_access.c @@ -14,8 +14,8 @@ #include #include #include "bpf_helpers.h" +#include "bpf_util.h" -#define _htons __builtin_bswap16 #define barrier() __asm__ __volatile__("": : :"memory") int _version SEC("version") = 1; @@ -32,7 +32,7 @@ int process(struct __sk_buff *skb) if (eth + 1 > data_end) return TC_ACT_SHOT; - if (eth->h_proto == _htons(ETH_P_IP)) { + if (eth->h_proto == bpf_htons(ETH_P_IP)) { struct iphdr *iph = (struct iphdr *)(eth + 1); if (iph + 1 > data_end) @@ -40,7 +40,7 @@ int process(struct __sk_buff *skb) ihl_len = iph->ihl * 4; proto = iph->protocol;
[PATCH net-next 2/5] bpf: add various test cases to verifier selftests
Add several test cases around ldimm64, fp arithmetic and direct packet access. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_verifier.c | 137 1 file changed, 137 insertions(+) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 95a8d5f..d3395c1 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -191,6 +191,86 @@ struct test_val { .result = REJECT, }, { + "test6 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 0), + BPF_RAW_INSN(0, 0, 0, 0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + }, + { + "test7 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1), + BPF_RAW_INSN(0, 0, 0, 0, 1), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + }, + { + "test8 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 1, 1), + BPF_RAW_INSN(0, 0, 0, 0, 1), + BPF_EXIT_INSN(), + }, + .errstr = "uses reserved fields", + .result = REJECT, + }, + { + "test9 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1), + BPF_RAW_INSN(0, 0, 0, 1, 1), + BPF_EXIT_INSN(), + }, + .errstr = "invalid bpf_ld_imm64 insn", + .result = REJECT, + }, + { + "test10 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1), + BPF_RAW_INSN(0, BPF_REG_1, 0, 0, 1), + BPF_EXIT_INSN(), + }, + .errstr = "invalid bpf_ld_imm64 insn", + .result = REJECT, + }, + { + "test11 ld_imm64", + .insns = { + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 1), + BPF_RAW_INSN(0, 0, BPF_REG_1, 0, 1), + BPF_EXIT_INSN(), + }, + .errstr = "invalid bpf_ld_imm64 insn", + .result = REJECT, + }, + { + "test12 ld_imm64", + .insns = { + BPF_MOV64_IMM(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 1), + BPF_RAW_INSN(0, 0, 0, 0, 1), + BPF_EXIT_INSN(), + }, + .errstr = "not pointing to valid bpf_map", + .result = REJECT, + }, + { + "test13 ld_imm64", + .insns = { + BPF_MOV64_IMM(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 1), + BPF_RAW_INSN(0, 0, BPF_REG_1, 0, 1), + BPF_EXIT_INSN(), + }, + .errstr = "invalid bpf_ld_imm64 insn", + .result = REJECT, + }, + { "no bpf_exit", .insns = { BPF_ALU64_REG(BPF_MOV, BPF_REG_0, BPF_REG_2), @@ -331,6 +411,30 @@ struct test_val { .result = REJECT, }, { + "invalid fp arithmetic", + /* If this gets ever changed, make sure JITs can deal with it. */ + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 8), + BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .errstr_unpriv = "R1 pointer arithmetic", + .result_unpriv = REJECT, + .errstr = "R1 invalid mem access", + .result = REJECT, + }, + { + "non-invalid fp arithmetic", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + }, + { "invalid argument register", .insns = { BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, @@ -1801,6 +1905,20 @@ struct test_val { .result = ACCEPT, }, { + "unpriv: adding of fp", +
[PATCH net-next 0/5] Misc BPF updates
This set cleans up ldimm64 leftovers from early eBPF days and adds couple of test cases related to this to the verifier test suite. It also cleans up the kallsyms spinlock (had same patch also in queue) by relaxing it through switching to _bh variant. It fixes up test_progs in relation to htons/ntohs and adds accessor macros for the percpu tests in test_maps. Thanks! Daniel Borkmann (4): bpf, x86_64/arm64: remove old ldimm64 artifacts from jits bpf: add various test cases to verifier selftests bpf: fix _htons occurences in test_progs bpf: provide a generic macro for percpu values for selftests Hannes Frederic Sowa (1): bpf: bpf_lock on kallsysms doesn't need to be irqsave arch/arm64/net/bpf_jit_comp.c | 9 -- arch/x86/net/bpf_jit_comp.c | 7 -- kernel/bpf/core.c | 12 +-- tools/testing/selftests/bpf/bpf_util.h| 26 + tools/testing/selftests/bpf/test_l4lb.c | 11 +-- tools/testing/selftests/bpf/test_maps.c | 37 +++ tools/testing/selftests/bpf/test_pkt_access.c | 6 +- tools/testing/selftests/bpf/test_progs.c | 10 +- tools/testing/selftests/bpf/test_verifier.c | 137 ++ 9 files changed, 199 insertions(+), 56 deletions(-) -- 1.9.3
[PATCH net-next 7/9] drivers: net: xgene: Workaround for HW errata 10GE_4
From: Quan Nguyen This patch adds workaround for HW errata 10GE_4: "XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read". Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++ drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 5 + drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++ 3 files changed, 9 insertions(+) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index f79eb78..60b1a07 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -623,6 +623,8 @@ static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata, xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, &count); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - Fix ICM_ECM_DROP_COUNT not clear-on-read */ + xgene_enet_rd_mcx_csr(pdata, ECM_CONFIG0_REG_0_ADDR, &count); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c index b253069..ca1f723 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c @@ -197,6 +197,11 @@ static void xgene_sgmac_get_drop_cnt(struct xgene_enet_pdata *pdata, count = xgene_enet_rd_mcx_csr(pdata, addr); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */ + addr = (pdata->enet_id != XGENE_ENET1) ? + XG_MCX_ECM_CONFIG0_REG_0_ADDR : + ECM_CONFIG0_REG_0_ADDR + pdata->port_id * OFFSET_4; + xgene_enet_rd_mcx_csr(pdata, addr); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *p) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c index a317596..3e4cd79 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c @@ -239,6 +239,8 @@ static void xgene_xgmac_get_drop_cnt(struct xgene_enet_pdata *pdata, xgene_enet_rd_axg_csr(pdata, XGENET_ICM_ECM_DROP_COUNT_REG0, &count); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */ + xgene_enet_rd_axg_csr(pdata, XGENET_ECM_CONFIG0_REG_0, &count); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) -- 1.9.1
[PATCH net-next 9/9] drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15
From: Quan Nguyen This patch adds workaround for HW errata 10GE_10 and ENET_15: "HW statistic counters value are duplicated". - RFCS duplicates RALN counter - RFLR duplicates RUND counter - TFCS duplicates TFRG counter - RALN should be intepreted as 0 in 10G mode Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 30 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 20 --- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 2 ++ 3 files changed, 44 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 5e8660e..8d9ed2b 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -71,6 +71,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16), XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), + XGENE_EXTD_STAT(rx_jabber_recov_cntr, DUMP, 0), XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0), XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), @@ -96,9 +97,16 @@ struct xgene_gstrings_stats { #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) +#define RFCS_IDX 7 +#define RALN_IDX 13 +#define RFLR_IDX 14 #define FALSE_RFLR_IDX 15 -#define RX_OVERRUN_IDX 23 -#define TX_UNDERRUN_IDX42 +#define RUND_IDX 18 +#define FALSE_RJBR_IDX 22 +#define RX_OVERRUN_IDX 24 +#define TFCS_IDX 38 +#define TFRG_IDX 42 +#define TX_UNDERRUN_IDX43 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -217,24 +225,36 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) { + u32 tmp[XGENE_EXTD_STATS_LEN]; u32 rx_drop, tx_drop; - u32 tmp; int i; for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { pdata->mac_ops->read_stats(pdata, - gstrings_extd_stats[i].addr, &tmp); + gstrings_extd_stats[i].addr, &tmp[i]); if (gstrings_extd_stats[i].mask) - pdata->extd_stats[i] += tmp & + pdata->extd_stats[i] += tmp[i] & GENMASK(gstrings_extd_stats[i].mask - 1, 0); } + if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) { + /* Errata 10GE_10 - SW should intepret RALN as 0 */ + pdata->extd_stats[RALN_IDX] = 0; + } else { + /* Errata ENET_15 - Fixes RFCS, RFLR, TFCS counter */ + pdata->extd_stats[RFCS_IDX] -= tmp[RALN_IDX]; + pdata->extd_stats[RFLR_IDX] -= tmp[RUND_IDX]; + pdata->extd_stats[TFCS_IDX] -= tmp[TFRG_IDX]; + } + pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop); pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop; pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop; /* Errata 10GE_8 - Update Frame recovered from Errata 10GE_8/ENET_11 */ pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr; + /* Errata ENET_15 - Jabber Frame recov'ed from Errata 10GE_10/ENET_15 */ + pdata->extd_stats[FALSE_RJBR_IDX] = pdata->vlan_rjbr; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index c2d38da..01e389d 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -656,6 +656,18 @@ static void xgene_enet_free_pagepool(struct xgene_enet_desc_ring *buf_pool, buf_pool->head = head; } +/* Errata 10GE_10 and ENET_15 - Fix duplicated HW statistic counters */ +static bool xgene_enet_errata_10GE_10(struct sk_buff *skb, u32 len, u8 status) +{ + if (status == INGRESS_CRC && + len >= (ETHER_STD_PACKET + 1) && + len <= (ETHER_STD_PACKET + 4) && + skb->protocol == htons(ETH_P_8021Q)) + return true; + + return false; +} + /* Errata 10GE_8 and ENET_11 - allow packet with length <=64B */ static bool xgene_enet_errata_10GE_8(struct sk_buff *skb, u32 len, u8 status) { @@ -706,14 +718,16 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, status = (GET_VAL(ELERR, le64_to_cpu(raw_desc->m0)) << LERR_LEN) | GET_VAL(LERR, le64_to_cpu(raw_desc->m0)); if (unlikely(status)) { - if (!xgene_
[PATCH net-next 6/9] drivers: net: xgene: Add rx_overrun/tx_underrun statistic
From: Quan Nguyen This patch adds rx_overrun and tx_underrun ethtool statistic counters. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 16 +--- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 11 +++ drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 8 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 + drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 14 ++ drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 11 +++ drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 4 7 files changed, 62 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index bbc90b6..369658b 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -71,6 +71,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), + XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0), XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31), XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16), @@ -88,11 +89,14 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12), XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12), XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12), - XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12) + XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12), + XGENE_EXTD_STAT(tx_underrun_cntr, DUMP, 0) }; #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) +#define RX_OVERRUN_IDX 22 +#define TX_UNDERRUN_IDX41 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -211,15 +215,21 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) { + u32 rx_drop, tx_drop; u32 tmp; int i; for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { pdata->mac_ops->read_stats(pdata, gstrings_extd_stats[i].addr, &tmp); - pdata->extd_stats[i] += tmp & - GENMASK(gstrings_extd_stats[i].mask - 1, 0); + if (gstrings_extd_stats[i].mask) + pdata->extd_stats[i] += tmp & + GENMASK(gstrings_extd_stats[i].mask - 1, 0); } + + pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop); + pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop; + pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index ec5f61f..f79eb78 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -615,6 +615,16 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata) xgene_enet_wr_csr(pdata, CFG_BYPASS_ADDR, RESUME_TX); } +static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata, + u32 *rx, u32 *tx) +{ + u32 count; + + xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, &count); + *rx = ICM_DROP_COUNT(count); + *tx = ECM_DROP_COUNT(count); +} + static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) { u32 val = 0x; @@ -1025,6 +1035,7 @@ void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata) .rx_disable = xgene_gmac_rx_disable, .tx_disable = xgene_gmac_tx_disable, .read_stats = xgene_enet_rd_mcx_stats, + .get_drop_cnt = xgene_gmac_get_drop_cnt, .set_speed = xgene_gmac_set_speed, .set_mac_addr = xgene_gmac_set_mac_addr, .set_framesize = xgene_enet_set_frame_size, diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index 9130c05..c5672df 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -192,6 +192,10 @@ enum xgene_enet_rm { #define CFG_CLE_NXTFPSEL0(val) (((val) << 20) & GENMASK(23, 20)) #define ICM_CONFIG0_REG_0_ADDR 0x0400 #define ICM_CONFIG2_REG_0_ADDR 0x0410 +#define ECM_CONFIG0_REG_0_ADDR 0x0500 +#define ECM_CONFIG0_REG_1_ADDR 0x0504 +#define ICM_ECM_DROP_COUNT_REG0_ADDR 0x0508 +#define ICM_ECM_DROP_COUNT_REG1_ADDR 0x050c #define RX_DV_GATE_REG_0_ADDR 0x05fc #define TX_DV_GATE_EN0
[PATCH net-next 5/9] drivers: net: xgene: Extend ethtool statistics
From: Quan Nguyen This patch adds extended ethtool statistics support. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 90 +- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 20 + drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 50 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 8 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 5 ++ drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 16 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 20 + drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 1 + 8 files changed, 209 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 217cde8..bbc90b6 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -23,9 +23,17 @@ struct xgene_gstrings_stats { char name[ETH_GSTRING_LEN]; int offset; + u32 addr; + u32 mask; }; #define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) } +#define XGENE_EXTD_STAT(s, a, m) \ + { \ + .name = #s, \ + .addr = a ## _ADDR, \ + .mask = m \ + } static const struct xgene_gstrings_stats gstrings_stats[] = { XGENE_STAT(rx_packets), @@ -40,7 +48,51 @@ struct xgene_gstrings_stats { XGENE_STAT(rx_fifo_errors) }; +static const struct xgene_gstrings_stats gstrings_extd_stats[] = { + XGENE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64, 31), + XGENE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127, 31), + XGENE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255, 31), + XGENE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511, 31), + XGENE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K, 31), + XGENE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX, 31), + XGENE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV, 31), + XGENE_EXTD_STAT(rx_fcs_error_cntr, RFCS, 16), + XGENE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA, 31), + XGENE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA, 31), + XGENE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF, 16), + XGENE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF, 16), + XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16), + XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16), + XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16), + XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16), + XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16), + XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16), + XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16), + XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), + XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), + XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), + XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), + XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31), + XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16), + XGENE_EXTD_STAT(tx_defer_pkt_cntr, TDFR, 31), + XGENE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF, 31), + XGENE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL, 31), + XGENE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL, 31), + XGENE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL, 31), + XGENE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL, 31), + XGENE_EXTD_STAT(tx_total_col_cntr, TNCL, 31), + XGENE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH, 16), + XGENE_EXTD_STAT(tx_drop_frame_cntr, TDRP, 16), + XGENE_EXTD_STAT(tx_jabber_frame_cntr, TJBR, 12), + XGENE_EXTD_STAT(tx_fcs_error_cntr, TFCS, 12), + XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12), + XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12), + XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12), + XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12) +}; + #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) +#define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -142,6 +194,11 @@ static void xgene_get_strings(struct net_device *ndev, u32 stringset, u8 *data) memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN); p += ETH_GSTRING_LEN; } + + for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { + memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN); + p += ETH_GSTRING_LEN; + } } static int xgene_get_sset_count(struct net_device *ndev, int sset) @@ -149,19 +206,50 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) if (sset != ETH_SS_STATS) return -EINVAL; - return XGENE_STATS_LEN; + return XGENE_STATS_LEN + XGENE_EXTD_STATS_LEN; +} + +static void xgene_get_extd_stats(struct xgene_enet_
[PATCH net-next 8/9] drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11
From: Quan Nguyen This patch adds statistic counter for frames recovered from HW errata 10GE_8 and ENET_11: "HW reports Length error for valid 64 byte frames with len <46 bytes". Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 9 +++-- drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 2 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 + 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 369658b..5e8660e 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -64,6 +64,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16), XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16), XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16), + XGENE_EXTD_STAT(rx_frame_len_err_recov_cntr, DUMP, 0), XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16), XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16), XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16), @@ -95,8 +96,9 @@ struct xgene_gstrings_stats { #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) -#define RX_OVERRUN_IDX 22 -#define TX_UNDERRUN_IDX41 +#define FALSE_RFLR_IDX 15 +#define RX_OVERRUN_IDX 23 +#define TX_UNDERRUN_IDX42 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -230,6 +232,9 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) pdata->mac_ops->get_drop_cnt(pdata, &rx_drop, &tx_drop); pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop; pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop; + + /* Errata 10GE_8 - Update Frame recovered from Errata 10GE_8/ENET_11 */ + pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index bd2486e..c2d38da 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -712,6 +712,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, xgene_enet_parse_error(rx_ring, status); rx_ring->rx_dropped++; goto out; + } else { + pdata->false_rflr++; } } diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h index e4b7786..8afae57 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h @@ -225,6 +225,7 @@ struct xgene_enet_pdata { enum xgene_enet_rm rm; struct xgene_enet_cle cle; u64 *extd_stats; + u64 false_rflr; spinlock_t stats_lock; /* statistics lock */ const struct xgene_mac_ops *mac_ops; spinlock_t mac_lock; /* mac lock */ -- 1.9.1
[PATCH net-next 3/9] drivers: net: xgene: Refactor statistics error parsing code
From: Quan Nguyen This patch fixes the tx error counters and adds more rx error counters. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 6 -- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 2 -- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 26 +++- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 2 ++ 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 3697ba7..06bef14 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -205,30 +205,24 @@ static u32 xgene_enet_ring_len(struct xgene_enet_desc_ring *ring) } void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring, - struct xgene_enet_pdata *pdata, enum xgene_enet_err_code status) { switch (status) { case INGRESS_CRC: ring->rx_crc_errors++; - ring->rx_dropped++; break; case INGRESS_CHECKSUM: case INGRESS_CHECKSUM_COMPUTE: ring->rx_errors++; - ring->rx_dropped++; break; case INGRESS_TRUNC_FRAME: ring->rx_frame_errors++; - ring->rx_dropped++; break; case INGRESS_PKT_LEN: ring->rx_length_errors++; - ring->rx_dropped++; break; case INGRESS_PKT_UNDER: ring->rx_frame_errors++; - ring->rx_dropped++; break; case INGRESS_FIFO_OVERRUN: ring->rx_fifo_errors++; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index d250bfe..c6b5bbd 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -380,9 +380,7 @@ static inline u16 xgene_enet_get_numslots(u16 id, u32 size) } void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring, - struct xgene_enet_pdata *pdata, enum xgene_enet_err_code status); - int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata); void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata); bool xgene_ring_mgr_init(struct xgene_enet_pdata *p); diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index e4f2ef2..3f24b83 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -246,9 +246,9 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring, skb_frag_t *frag; dma_addr_t *frag_dma_addr; u16 skb_index; - u8 status; - int i, ret = 0; u8 mss_index; + u8 status; + int i; skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0)); skb = cp_ring->cp_skb[skb_index]; @@ -275,19 +275,17 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring, /* Checking for error */ status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0)); if (unlikely(status > 2)) { - xgene_enet_parse_error(cp_ring, netdev_priv(cp_ring->ndev), - status); - ret = -EIO; + cp_ring->tx_dropped++; + cp_ring->tx_errors++; } if (likely(skb)) { dev_kfree_skb_any(skb); } else { netdev_err(cp_ring->ndev, "completion skb is NULL\n"); - ret = -EIO; } - return ret; + return 0; } static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss) @@ -711,7 +709,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, if (!xgene_enet_errata_10GE_8(skb, datalen, status)) { dev_kfree_skb_any(skb); xgene_enet_free_pagepool(page_pool, raw_desc, exp_desc); - xgene_enet_parse_error(rx_ring, pdata, status); + xgene_enet_parse_error(rx_ring, status); + rx_ring->rx_dropped++; goto out; } } @@ -1477,6 +1476,8 @@ static void xgene_enet_get_stats64( if (ring) { stats->tx_packets += ring->tx_packets; stats->tx_bytes += ring->tx_bytes; + stats->tx_dropped += ring->tx_dropped; + stats->tx_errors += ring->tx_errors; } } @@ -1485,11 +1486,16 @@ static void xgene_enet_get_stats64( if (ring) { stats->rx_packets += ring->rx_packets; stats->rx_bytes += ring->rx_by
[PATCH net-next 1/9] drivers: net: xgene: Protect indirect MAC access
From: Quan Nguyen This patch adds lock to protect indirect mac access sequence. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 + drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 1 + drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 7 ++- drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++ 5 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 2a835e0..3697ba7 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -365,9 +365,11 @@ static void xgene_enet_rd_mcx_mac(struct xgene_enet_pdata *pdata, cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; + spin_lock(&pdata->mac_lock); if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data)) netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n", rd_addr); + spin_unlock(&pdata->mac_lock); } static void xgene_gmac_set_mac_addr(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 5f37ed3..9a28ac3 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -2055,6 +2055,7 @@ static int xgene_enet_probe(struct platform_device *pdev) goto err; xgene_enet_setup_ops(pdata); + spin_lock_init(&pdata->mac_lock); if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) { ndev->features |= NETIF_F_TSO | NETIF_F_RXCSUM; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h index 0d4be24..827b33d 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h @@ -221,6 +221,7 @@ struct xgene_enet_pdata { struct xgene_enet_cle cle; struct rtnl_link_stats64 stats; const struct xgene_mac_ops *mac_ops; + spinlock_t mac_lock; /* mac lock */ const struct xgene_port_ops *port_ops; struct xgene_ring_ops *ring_ops; const struct xgene_cle_ops *cle_ops; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c index a8e063b..4dd41f5 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c @@ -130,6 +130,7 @@ static u32 xgene_enet_rd_indirect(struct xgene_indirect_ctl *ctl, u32 rd_addr) static u32 xgene_enet_rd_mac(struct xgene_enet_pdata *p, u32 rd_addr) { + u32 val; struct xgene_indirect_ctl ctl = { .addr = p->mcx_mac_addr + MAC_ADDR_REG_OFFSET, .ctl = p->mcx_mac_addr + MAC_READ_REG_OFFSET, @@ -137,7 +138,11 @@ static u32 xgene_enet_rd_mac(struct xgene_enet_pdata *p, u32 rd_addr) .cmd_done = p->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET }; - return xgene_enet_rd_indirect(&ctl, rd_addr); + spin_lock(&p->mac_lock); + val = xgene_enet_rd_indirect(&ctl, rd_addr); + spin_unlock(&p->mac_lock); + + return val; } static int xgene_enet_ecc_init(struct xgene_enet_pdata *p) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c index 423240c..9a2d0ca 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c @@ -158,9 +158,11 @@ static void xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; + spin_lock(&pdata->mac_lock); if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data)) netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n", rd_addr); + spin_unlock(&pdata->mac_lock); } static bool xgene_enet_rd_pcs(struct xgene_enet_pdata *pdata, -- 1.9.1
[PATCH net-next 2/9] drivers: net: xgene: Remove redundant local stats
From: Quan Nguyen Commit 5944701df90d ("net: remove useless memset's in drivers get_stats64") makes the pdata->stats redundant. This patch removes pdata->stats and updates get_stats64() callback accordingly. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 7 --- drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 4 +--- drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 - 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 28fdedc..217cde8 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -25,7 +25,7 @@ struct xgene_gstrings_stats { int offset; }; -#define XGENE_STAT(m) { #m, offsetof(struct xgene_enet_pdata, stats.m) } +#define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) } static const struct xgene_gstrings_stats gstrings_stats[] = { XGENE_STAT(rx_packets), @@ -156,11 +156,12 @@ static void xgene_get_ethtool_stats(struct net_device *ndev, struct ethtool_stats *dummy, u64 *data) { - void *pdata = netdev_priv(ndev); + struct rtnl_link_stats64 stats; int i; + dev_get_stats(ndev, &stats); for (i = 0; i < XGENE_STATS_LEN; i++) - *data++ = *(u64 *)(pdata + gstrings_stats[i].offset); + data[i] = *(u64 *)((char *)&stats + gstrings_stats[i].offset); } static void xgene_get_pauseparam(struct net_device *ndev, diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 9a28ac3..e4f2ef2 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -1466,10 +1466,9 @@ static int xgene_enet_create_desc_rings(struct net_device *ndev) static void xgene_enet_get_stats64( struct net_device *ndev, - struct rtnl_link_stats64 *storage) + struct rtnl_link_stats64 *stats) { struct xgene_enet_pdata *pdata = netdev_priv(ndev); - struct rtnl_link_stats64 *stats = &pdata->stats; struct xgene_enet_desc_ring *ring; int i; @@ -1493,7 +1492,6 @@ static void xgene_enet_get_stats64( stats->rx_dropped += ring->rx_dropped; } } - memcpy(storage, stats, sizeof(struct rtnl_link_stats64)); } static int xgene_enet_set_mac_address(struct net_device *ndev, void *addr) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h index 827b33d..5e6fd71 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h @@ -219,7 +219,6 @@ struct xgene_enet_pdata { int phy_mode; enum xgene_enet_rm rm; struct xgene_enet_cle cle; - struct rtnl_link_stats64 stats; const struct xgene_mac_ops *mac_ops; spinlock_t mac_lock; /* mac lock */ const struct xgene_port_ops *port_ops; -- 1.9.1
[PATCH net-next 4/9] drivers: net: xgene: Remove unused macros
From: Quan Nguyen This patch cleans up unused macros to improve readability. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index c6b5bbd..5a9f9d5 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -217,12 +217,6 @@ enum xgene_enet_rm { #define FULL_DUPLEX2 BIT(0) #define PAD_CRCBIT(2) #define LENGTH_CHK BIT(4) -#define SCAN_AUTO_INCR BIT(5) -#define TBYT_ADDR 0x38 -#define TPKT_ADDR 0x39 -#define TDRP_ADDR 0x45 -#define TFCS_ADDR 0x47 -#define TUND_ADDR 0x4a #define TSO_IPPROTO_TCP1 -- 1.9.1
[PATCH net-next 0/9] drivers: net: xgene: Add ethtool stats and bug fixes
This patch set, - adds ethtool extended statistics support - addresses errata workarounds - fixes bugs related to statistics Signed-off-by: Iyappan Subramanian Signed-off-by: Quan Nguyen --- Quan Nguyen (9): drivers: net: xgene: Protect indirect MAC access drivers: net: xgene: Remove redundant local stats drivers: net: xgene: Refactor statistics error parsing code drivers: net: xgene: Remove unused macros drivers: net: xgene: Extend ethtool statistics drivers: net: xgene: Add rx_overrun/tx_underrun statistic drivers: net: xgene: Workaround for HW errata 10GE_4 drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11 drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 132 - drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 41 ++- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 66 +-- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 57 ++--- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 13 +- drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 42 ++- drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 35 ++ drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 5 + 8 files changed, 357 insertions(+), 34 deletions(-) -- 1.9.1
Re: [PATCH v1 net-next 3/6] net: add new control message for incoming HW-timestamped packets
> diff --git a/net/core/dev.c b/net/core/dev.c > index 1b3317c..0a78f7f 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -160,6 +160,7 @@ static int netif_rx_internal(struct sk_buff *skb); > static int call_netdevice_notifiers_info(unsigned long val, > struct net_device *dev, > struct netdev_notifier_info *info); > +static struct napi_struct *napi_by_id(unsigned int napi_id); > > /* > * The @dev_base_head list is protected by @dev_base_lock and the rtnl > @@ -863,6 +864,23 @@ struct net_device *dev_get_by_index(struct net *net, int > ifindex) > } > EXPORT_SYMBOL(dev_get_by_index); > > +struct net_device *dev_get_by_napi_id(unsigned int napi_id) > +{ > + struct net_device *dev = NULL; > + struct napi_struct *napi; > + > + rcu_read_lock(); > + > + napi = napi_by_id(napi_id); > + if (napi) > + dev = napi->dev; > + > + rcu_read_unlock(); > + > + return dev; > +} > +EXPORT_SYMBOL(dev_get_by_napi_id); Returning dev without holding a reference is not safe. You'll probably have to call this with rcu_read_lock held instead. Also, this generic napi function should probably be in a separate patch. > diff --git a/net/socket.c b/net/socket.c > index c2564eb..5ea5f29 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -662,6 +662,26 @@ static bool skb_is_err_queue(const struct sk_buff *skb) > return skb->pkt_type == PACKET_OUTGOING; > } > > +static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb) > +{ > +#ifdef CONFIG_NET_RX_BUSY_POLL Let's limit the ifdef scope to napi code. Perhaps we need an skb_get_napi_id helper that returns 0 if the feature is not compiled in. > + struct scm_ts_pktinfo ts_pktinfo; > + struct net_device *orig_dev; > + > + if (skb->napi_id < MIN_NAPI_ID || !skb_mac_header_was_set(skb)) > + return; This MIN_NAPI_ID check can be moved into dev_get_by_napi_id. > /* > * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) > */ > @@ -699,8 +719,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct > sock *sk, > empty = 0; > if (shhwtstamps && > (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && This information is also informative with software timestamps. And getting the real iif is definitely useful outside timestamps. An alternative approach is to add versioning to IP_PKTINFO with a new setsockopt IP_PKTINFO_VERSION plus a new struct in_pktinfo_v2 that extends in_pktinfo. Just a thought.
Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
On 26/04/17 02:59 AM, wrote: > Good to know that somebody is working on this. Those problems troubled > us as well. Thanks Christian. It's a daunting problem and a there's a lot of work to do before we will ever be where we need to be so any help, even an ack, is greatly appreciated. Logan
Re: [PATCH net-next 02/10] tcp: do not pass timestamp to tcp_rack_detect_loss()
On Tue, 2017-04-25 at 10:15 -0700, Eric Dumazet wrote: > We can use tp->tcp_mstamp as it contains a recent timestamp. > > @@ -165,12 +164,10 @@ void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, > u32 end_seq, > void tcp_rack_reo_timeout(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > - struct skb_mstamp now; > u32 timeout, prior_inflight; > > - skb_mstamp_get(&now); Oh this is silly, a timer event is not updating tp->tcp_mstamp yet. I will provide a patch, after tests. > prior_inflight = tcp_packets_in_flight(tp); > - tcp_rack_detect_loss(sk, &now, &timeout); > + tcp_rack_detect_loss(sk, &timeout); > if (prior_inflight != tcp_packets_in_flight(tp)) { > if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) { > tcp_enter_recovery(sk, false); Something like : diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index fdac262e277b2f25492f155bbb295d6d87e31d02..75096becac21f09fd14466b5f87dffe5820325b5 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -167,6 +167,7 @@ void tcp_rack_reo_timeout(struct sock *sk) u32 timeout, prior_inflight; prior_inflight = tcp_packets_in_flight(tp); + skb_mstamp_get(&tp->tcp_mstamp); tcp_rack_detect_loss(sk, &timeout); if (prior_inflight != tcp_packets_in_flight(tp)) { if (inet_csk(sk)->icsk_ca_state != TCP_CA_Recovery) {
Re: [PATCH v2 0/8] NFC: fix device allocation and nfcmrvl crashes
Hi Johan, On Thu, Mar 30, 2017 at 12:15:34PM +0200, Johan Hovold wrote: > This started out with the observation that the nfcmrvl_uart driver > unconditionally dereferenced the tty class device despite the fact that > not every tty has an associated struct device (Unix98 ptys). Some > further changes were needed in the common nfcmrvl code to fully address > this, some of which also incidentally fixed a few related bugs (e.g. > resource leaks in error paths). > > While fixing this I stumbled over a regression in NFC core that lead to > broken registration error paths and misnamed workqueues. > > Note that this has only been tested by configuring the n_hci line > discipline for different ttys without any actual NFC hardware connected. > > Johan > > > Changes in v2 > - fix typo in commit message (1/8) > - release reset gpio in error paths (3/8) > - fix description of patch impact (3/8) > - allow gpio 0 to be used for reset signalling (8/8, new) > > > Johan Hovold (8): > NFC: fix broken device allocation > NFC: nfcmrvl_uart: add missing tty-device sanity check > NFC: nfcmrvl: do not use device-managed resources > NFC: nfcmrvl: use nfc-device for firmware download > NFC: nfcmrvl: fix firmware-management initialisation > NFC: nfcmrvl_uart: fix device-node leak during probe > NFC: nfcmrvl_usb: use interface as phy device > NFC: nfcmrvl: allow gpio 0 for reset signalling Applied, thanks. Cheers, Samuel.
Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
On Tue, 2017-04-25 at 14:39 -0500, Bjorn Helgaas wrote: > On Mon, Apr 24, 2017 at 04:35:07PM +0200, Christoph Hellwig wrote: > > On Mon, Apr 24, 2017 at 02:16:31PM +, Byczkowski, Jakub wrote: > > > Tested-by: Jakub Byczkowski > > > > Are you (and Doug) ok with queueing this up in the PCI tree? > > Applied this with Jakub's tested-by and Doug's ack to pci/virtualization > for v4.12. > > This still leaves these: > > [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it > [PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it > [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it > > I haven't seen any response to 4 and 6. Felix reported an unused > variable in 7. Let me know if you'd like me to do anything with > these. Just provided my ACK for ixgbe patch. signature.asc Description: This is a digitally signed message part
Re: [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it
On Thu, 2017-04-13 at 16:53 +0200, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 ++-- > 1 file changed, 2 insertions(+), 14 deletions(-) Acked-by: Jeff Kirsher Sorry for the late ACK. signature.asc Description: This is a digitally signed message part
[PATCH 2/2] arm64: dts: apq8016-sbc: Correct WLAN LED default-trigger
The TX status trigger of the wlan interface is named phy0tx, so this updates the default-trigger for the WLAN LED to use that instead. Signed-off-by: Bjorn Andersson --- Note that without patch 1/2 this trigger does not fire - but there's also no harm in picking the two patches through separate trees. arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi index 5d83b02b7c4a..21a8f5ce8955 100644 --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi @@ -178,7 +178,7 @@ led@5 { label = "apq8016-sbc:yellow:wlan"; gpios = <&pm8916_mpps 2 GPIO_ACTIVE_HIGH>; - linux,default-trigger = "wlan"; + linux,default-trigger = "phy0tx"; default-state = "off"; }; -- 2.12.0
[PATCH 1/2] wcn36xx: Pass used skb to ieee80211_tx_status()
As the tx skbs are collected they should be passed to ieee80211_tx_status() rather than ieee80211_free_txskb(), as the prior will take care of monitoring and LED triggers while the latter will consider the skb dropped. Signed-off-by: Bjorn Andersson --- drivers/net/wireless/ath/wcn36xx/dxe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c index 87dfdaf9044c..938b7bd733cf 100644 --- a/drivers/net/wireless/ath/wcn36xx/dxe.c +++ b/drivers/net/wireless/ath/wcn36xx/dxe.c @@ -371,7 +371,7 @@ static void reap_tx_dxes(struct wcn36xx *wcn, struct wcn36xx_dxe_ch *ch) info = IEEE80211_SKB_CB(ctl->skb); if (!(info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS)) { /* Keep frame until TX status comes */ - ieee80211_free_txskb(wcn->hw, ctl->skb); + ieee80211_tx_status(wcn->hw, ctl->skb); } spin_lock(&ctl->skb_lock); if (wcn->queues_stopped) { -- 2.12.0
Re: [PATCH net-next 6/6] bpf: show bpf programs
On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote: Signed-off-by: Hannes Frederic Sowa --- include/uapi/linux/bpf.h | 32 +++- kernel/bpf/core.c| 30 +- 2 files changed, 48 insertions(+), 14 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e553529929f683..d6506e320953d5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -101,20 +101,26 @@ enum bpf_map_type { BPF_MAP_TYPE_HASH_OF_MAPS, }; +#define BPF_PROG_TYPES \ + X(BPF_PROG_TYPE_UNSPEC),\ + X(BPF_PROG_TYPE_SOCKET_FILTER), \ + X(BPF_PROG_TYPE_KPROBE),\ + X(BPF_PROG_TYPE_SCHED_CLS), \ + X(BPF_PROG_TYPE_SCHED_ACT), \ + X(BPF_PROG_TYPE_TRACEPOINT),\ + X(BPF_PROG_TYPE_XDP), \ + X(BPF_PROG_TYPE_PERF_EVENT),\ + X(BPF_PROG_TYPE_CGROUP_SKB),\ + X(BPF_PROG_TYPE_CGROUP_SOCK), \ + X(BPF_PROG_TYPE_LWT_IN),\ + X(BPF_PROG_TYPE_LWT_OUT), \ + X(BPF_PROG_TYPE_LWT_XMIT), + + enum bpf_prog_type { - BPF_PROG_TYPE_UNSPEC, - BPF_PROG_TYPE_SOCKET_FILTER, - BPF_PROG_TYPE_KPROBE, - BPF_PROG_TYPE_SCHED_CLS, - BPF_PROG_TYPE_SCHED_ACT, - BPF_PROG_TYPE_TRACEPOINT, - BPF_PROG_TYPE_XDP, - BPF_PROG_TYPE_PERF_EVENT, - BPF_PROG_TYPE_CGROUP_SKB, - BPF_PROG_TYPE_CGROUP_SOCK, - BPF_PROG_TYPE_LWT_IN, - BPF_PROG_TYPE_LWT_OUT, - BPF_PROG_TYPE_LWT_XMIT, +#define X(type) type Defining X in uapi could clash easily with other headers e.g. from application side. + BPF_PROG_TYPES +#undef X }; enum bpf_attach_type { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 3ba175a24e971a..685c1d0f31e029 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v) rcu_read_unlock(); } +static const char *bpf_type_string(enum bpf_prog_type type) +{ + static const char *bpf_type_names[] = { +#define X(type) #type + BPF_PROG_TYPES +#undef X + }; + + if (type >= ARRAY_SIZE(bpf_type_names)) + return ""; + + return bpf_type_names[type]; +} + static int ebpf_proc_show(struct seq_file *s, void *v) { + struct bpf_prog *prog; + struct bpf_prog_aux *aux; + char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; + if (v == SEQ_START_TOKEN) { - seq_printf(s, "# tag\n"); + seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n"); return 0; } + aux = v; + prog = aux->prog; + + bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); + seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag, + bpf_type_string(prog->type), + prog->jited ? "jit" : "int", + prog->priv_cap_sys_admin ? "priv" : "unpriv", + prog->pages * 1ULL << PAGE_SHIFT); Yeah, so that would be quite similar to what we dump in bpf_prog_show_fdinfo() modulo the priv bit. I generally agree that a facility for dumping all progs is needed and it was also on the TODO list after the bpf(2) cmd for dumping program insns back to user space. I think the procfs interface has pro and cons: the upside is that you can use it with tools like cat to inspect it, but what you still cannot do is to say that you want to see the prog insns for, say, prog #4 from that list. If we could iterate over that list through fds via bpf(2) syscall, you could i) present the same info you have above via fdinfo already and ii) also dump the BPF insns from that specific program through a BPF_PROG_DUMP bpf(2) command. Once that dump also supports maps in progs, you could go further and fetch related map fds for inspection, etc. Such option of iterating through that would need a new BPF syscall cmd aka BPF_PROG_GET_NEXT which returns the first prog from the list and you would walk the next one by passing the current fd, which can later also be closed as not needed anymore. We could restrict that dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to ship a tool e.g. under tools/bpf/ that can be used for inspection. + return 0; }
Re: [PATCH net-next 6/6] bpf: show bpf programs
On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote: > > +static const char *bpf_type_string(enum bpf_prog_type type) > +{ > + static const char *bpf_type_names[] = { > +#define X(type) #type > + BPF_PROG_TYPES > +#undef X > + }; > + > + if (type >= ARRAY_SIZE(bpf_type_names)) > + return ""; > + > + return bpf_type_names[type]; > +} > + > static int ebpf_proc_show(struct seq_file *s, void *v) > { > + struct bpf_prog *prog; > + struct bpf_prog_aux *aux; > + char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; > + > if (v == SEQ_START_TOKEN) { > - seq_printf(s, "# tag\n"); > + seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n"); > return 0; > } > > + aux = v; > + prog = aux->prog; > + > + bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); > + seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag, > +bpf_type_string(prog->type), > +prog->jited ? "jit" : "int", > +prog->priv_cap_sys_admin ? "priv" : "unpriv", > +prog->pages * 1ULL << PAGE_SHIFT); As I said several times already I'm strongly against procfs style of exposing information about the programs. I don't want this to become debugfs for bpf. Maintaining the list of all loaded programs is fine and we need a way to iterate through them, but procfs is obviously not the interface to do that. Programs/maps are binary whereas any fs interface is text. It also doesn't scale with large number of programs/maps. I prefer Daniel's suggestion on adding 'get_next' like API. Also would be good if you can wait for Martin to finish his prog->handle/id patches. Then user space will be able to iterate through all the progs/maps and fetch all info about them through syscall in extensible way. And you wouldn't need to abuse kallsyms list for different purpose.
Re: [PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
On Wed, Apr 26, 2017 at 08:24:17PM +0200, Hannes Frederic Sowa wrote: > Signed-off-by: Hannes Frederic Sowa > --- > include/linux/filter.h | 6 -- > kernel/bpf/core.c | 4 +++- > kernel/bpf/syscall.c | 7 --- > kernel/bpf/verifier.c | 4 ++-- > net/core/filter.c | 6 +++--- > 5 files changed, 16 insertions(+), 11 deletions(-) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 63624c619e371b..635311f57bf24f 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -413,7 +413,8 @@ struct bpf_prog { > locked:1, /* Program image locked? */ > gpl_compatible:1, /* Is filter GPL compatible? > */ > cb_access:1,/* Is control block accessed? */ > - dst_needed:1; /* Do we need dst entry? */ > + dst_needed:1, /* Do we need dst entry? */ > + priv_cap_sys_admin:1; /* Where we loaded as > sys_admin? */ This is no go. You didn't provide any explanation whatsoever why you want to see this boolean value.
Re: [PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote: Signed-off-by: Hannes Frederic Sowa Ahh, looks this got swapped with 3/6. --- include/linux/filter.h | 6 -- kernel/bpf/core.c | 4 +++- kernel/bpf/syscall.c | 7 --- kernel/bpf/verifier.c | 4 ++-- net/core/filter.c | 6 +++--- 5 files changed, 16 insertions(+), 11 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 63624c619e371b..635311f57bf24f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -413,7 +413,8 @@ struct bpf_prog { locked:1, /* Program image locked? */ gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1,/* Is control block accessed? */ - dst_needed:1; /* Do we need dst entry? */ + dst_needed:1, /* Do we need dst entry? */ + priv_cap_sys_admin:1; /* Where we loaded as sys_admin? */ kmemcheck_bitfield_end(meta); enum bpf_prog_type type; /* Type of BPF program */ [...] diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6f8b6ed690be93..24c9dac374770f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3488,7 +3488,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) if (ret < 0) goto skip_full_check; - env->allow_ptr_leaks = capable(CAP_SYS_ADMIN); + env->allow_ptr_leaks = env->prog->priv_cap_sys_admin; ret = do_check(env); @@ -3589,7 +3589,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops, if (ret < 0) goto skip_full_check; - env->allow_ptr_leaks = capable(CAP_SYS_ADMIN); + env->allow_ptr_leaks = prog->priv_cap_sys_admin; ret = do_check(env); diff --git a/net/core/filter.c b/net/core/filter.c index 9a37860a80fc78..dc020d40bb770a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1100,7 +1100,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog) if (!bpf_check_basics_ok(fprog->filter, fprog->len)) return -EINVAL; - fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0); + fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0, false); if (!fp) return -ENOMEM; Did you check that transferring allow_ptr_leaks doesn't have a side effect on the nfp JIT? I believe it can also do cbpf migrations to a certain extend.
Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
On Wed, Apr 26, 2017 at 10:58:45AM -0700, Alexei Starovoitov wrote: > On 4/26/17 9:35 AM, John Fastabend wrote: > > > > > As Alexei also mentioned before, ifindex vs port makes no real > > > difference seen from the bpf program side. It is userspace's > > > responsibility to add ifindex/port's to the bpf-maps, according to how > > > the bpf program "policy" want to "connect" these ports. The > > > port-table system add one extra step, of also adding this port to the > > > port-table (which lives inside the kernel). > > > > > > > I'm not sure I understand the "lives inside the kernel" bit. I assumed > > the 'map' should be a bpf map and behave like any other bpf map. > > > > I wanted a new map to be defined, something like this from the bpf > > programmer > > side. > > > > struct bpf_map_def SEC("maps") port_table = > > .type = BPF_MAP_TYPE_PORT_CONNECTION, > > .key_size = sizeof(u32), > > .value_size = BPF_PORT_CONNECTION_SIZE, > > .max_entries = 256, > > }; > > I like the idea. > We have prog_array, perf_event_array, cgroup_array map specializations. > This one can be new netdev_array with some new bpf_redirect-like helper > accessing it. > > > > When loading the XDP program, we also need to pass along a port table > > > "id" this XDP program is associated with (and if it doesn't exists you > > > create it). And your userspace "control-plane" application also need > > > to know this port table "id", when adding a new port. > > > > So the user space application that is loading the program also needs > > to handle this map. This seems correct to me. But I don't see the > > value in making some new port table when we already have well understood > > framework for maps. > > +1 > > > > > > > The concept of having multiple port tables is key. As this implies we > > > can have several simultaneous "data-planes" that is *isolated* from > > > each-other. Think about how network-namespaces/containers want > > > isolation. A subtle thing I'm afraid to mention, is that oppose to the > > > ifindex model, a port table with mapping to a net_device pointer, would > > > allow (faster) delivery into the container's inner net_device, which > > > sort of violates the isolation, but I would argue it is not a problem > > > as this net_device pointer could only be added from a process within the > > > namespace. I like this feature, but it could easily be disallowed via > > > port insertion-time validation. > > > > > > > I think the above optimization should be allowed. And agree multiple port > > tables (maps?) is needed. Again all this points to using standard maps > > logic in my mind. For permissions and different domains, which I think > > you were starting to touch on, it looks like we could extend the pinning > > API. > > At the moment it does an inode_permission(inode, MAY_WRITE) check but I > > presume this could be extended. None of this would be needed in v1 and > > could be added subsequently. read-only maps seems doable. > > this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated > the user space can make it readonly to prevent further changes. > > From user space it can be done similar to perf_events/cgroups as well. > bpf_map_update_elem(&netdev_array, &port_num, &ifindex) > should work. > For bpf_map_lookup_elem() from such netdev_array we can return > ifindex back. > The bpf_map_show_fdinfo() can be customized as well to pretty print > ifindexes of netdevs stored in there. > I agree with both of you on all of these points. Having the port redirection in a new type of map and/or array seems like the way to go. I understood Jesper's perspecitive when thinking about a way to pass a port-table id down, but I think the idea that the userspace loader code defining the maps is going to be the one making this link is the right idea and handling things like ifindex changes (rather than identifiers that perform lookups in other tables) is going to have to be yet another exercise left up to the...user. :-)
[Patch net-next] ipv4: get rid of ip_ra_lock
After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control") we always take RTNL lock for ip_ra_control() which is the only place we update the list ip_ra_chain, so the ip_ra_lock is no longer needed, we just need to disable BH there. Signed-off-by: Cong Wang --- net/ipv4/ip_sockglue.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 1d46d05..2923ea1 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -330,7 +330,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, sent to multicast group to reach destination designated router. */ struct ip_ra_chain __rcu *ip_ra_chain; -static DEFINE_SPINLOCK(ip_ra_lock); static void ip_ra_destroy_rcu(struct rcu_head *head) @@ -352,21 +351,21 @@ int ip_ra_control(struct sock *sk, unsigned char on, new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL; - spin_lock_bh(&ip_ra_lock); + ASSERT_RTNL(); + local_bh_disable(); for (rap = &ip_ra_chain; -(ra = rcu_dereference_protected(*rap, - lockdep_is_held(&ip_ra_lock))) != NULL; +(ra = rtnl_dereference(*rap)) != NULL; rap = &ra->next) { if (ra->sk == sk) { if (on) { - spin_unlock_bh(&ip_ra_lock); + local_bh_enable(); kfree(new_ra); return -EADDRINUSE; } /* dont let ip_call_ra_chain() use sk again */ ra->sk = NULL; RCU_INIT_POINTER(*rap, ra->next); - spin_unlock_bh(&ip_ra_lock); + local_bh_enable(); if (ra->destructor) ra->destructor(sk); @@ -381,7 +380,7 @@ int ip_ra_control(struct sock *sk, unsigned char on, } } if (!new_ra) { - spin_unlock_bh(&ip_ra_lock); + local_bh_enable(); return -ENOBUFS; } new_ra->sk = sk; @@ -390,7 +389,7 @@ int ip_ra_control(struct sock *sk, unsigned char on, RCU_INIT_POINTER(new_ra->next, ra); rcu_assign_pointer(*rap, new_ra); sock_hold(sk); - spin_unlock_bh(&ip_ra_lock); + local_bh_enable(); return 0; } -- 2.5.5
Re: [PATCH net-next] bpf: restore skb->sk before pskb_trim() call
On Wed, Apr 26, 2017 at 09:09:23AM -0700, Eric Dumazet wrote: > From: Eric Dumazet > > While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE() > in skb_try_coalesce() reported by Andrey, I found that we had an skb > with skb->sk set but no skb->destructor. > > This invalidated heuristic found in commit 158f323b9868 ("net: adjust > skb->truesize in pskb_expand_head()") and in cited patch. > > Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should > restrain the temporary setting to a minimal section. > > [1] https://patchwork.ozlabs.org/patch/755570/ > net: adjust skb->truesize in ___pskb_trim() > > Fixes: 8f917bba0042 ("bpf: pass sk to helper functions") > Signed-off-by: Eric Dumazet > Cc: Willem de Bruijn > Cc: Andrey Konovalov Ahh. Thanks for the fix. Acked-by: Alexei Starovoitov
Re: [PATCH net-next 3/6] bpf: bpf_progs stores all loaded programs
[ -dan...@iogearbox.com (wrong address) ] On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote: We later want to give users a quick dump of what is possible with procfs, so store a list of all currently loaded bpf programs. Later this list will be printed in procfs. Signed-off-by: Hannes Frederic Sowa --- include/linux/filter.h | 4 ++-- kernel/bpf/core.c | 51 +++--- kernel/bpf/syscall.c | 4 ++-- 3 files changed, 27 insertions(+), 32 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 9a7786db14fa53..63624c619e371b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -753,8 +753,8 @@ bpf_address_lookup(unsigned long addr, unsigned long *size, return ret; } -void bpf_prog_kallsyms_add(struct bpf_prog *fp); -void bpf_prog_kallsyms_del(struct bpf_prog *fp); +void bpf_prog_link(struct bpf_prog *fp); +void bpf_prog_unlink(struct bpf_prog *fp); #else /* CONFIG_BPF_JIT */ diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 043f634ff58d87..2139118258cdf8 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -365,22 +365,6 @@ static struct latch_tree_root bpf_tree __cacheline_aligned; int bpf_jit_kallsyms __read_mostly; -static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux) -{ - WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head)); - list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs); - latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); -} - -static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux) -{ - if (list_empty(&aux->bpf_progs_head)) - return; - - latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); - list_del_rcu(&aux->bpf_progs_head); -} - static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp) { return fp->jited && !bpf_prog_was_classic(fp); @@ -392,38 +376,45 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) fp->aux->bpf_progs_head.prev == LIST_POISON2; } -void bpf_prog_kallsyms_add(struct bpf_prog *fp) +void bpf_prog_link(struct bpf_prog *fp) { - if (!bpf_prog_kallsyms_candidate(fp) || - !capable(CAP_SYS_ADMIN)) - return; + struct bpf_prog_aux *aux = fp->aux; spin_lock_bh(&bpf_lock); - bpf_prog_ksym_node_add(fp->aux); + list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs); + if (bpf_prog_kallsyms_candidate(fp)) + latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); Hmm, this has the side-effect that it will hook up all progs to kallsyms (I left out !capable(CAP_SYS_ADMIN) intentionally). spin_unlock_bh(&bpf_lock); } -void bpf_prog_kallsyms_del(struct bpf_prog *fp) +void bpf_prog_unlink(struct bpf_prog *fp) { - if (!bpf_prog_kallsyms_candidate(fp)) - return; + struct bpf_prog_aux *aux = fp->aux; spin_lock_bh(&bpf_lock); - bpf_prog_ksym_node_del(fp->aux); + list_del_rcu(&aux->bpf_progs_head); + if (bpf_prog_kallsyms_candidate(fp)) + latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); spin_unlock_bh(&bpf_lock); } static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr) { struct latch_tree_node *n; + struct bpf_prog *prog; if (!bpf_jit_kallsyms_enabled()) return NULL; n = latch_tree_find((void *)addr, &bpf_tree, &bpf_tree_ops); - return n ? - container_of(n, struct bpf_prog_aux, ksym_tnode)->prog : - NULL; + if (!n) + return NULL; + + prog = container_of(n, struct bpf_prog_aux, ksym_tnode)->prog; + if (!prog->priv_cap_sys_admin) Where is this bit defined? If we return NULL on them anyway, why adding them to the tree in the first place, just wastes resources on the traversal? + return NULL; + + return prog; } const char *__bpf_address_lookup(unsigned long addr, unsigned long *size, @@ -474,6 +465,10 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, rcu_read_lock(); list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) { + if (!bpf_prog_kallsyms_candidate(aux->prog) || + !aux->prog->priv_cap_sys_admin) Same here. + continue; + if (it++ != symnum) continue; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 13642c73dca0b4..d61d1bd3e6fee6 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -664,7 +664,7 @@ void bpf_prog_put(struct bpf_prog *prog) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); - bpf_prog_kallsyms_del(prog); + bpf_prog_unlink(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } } @@ -858,7 +858,7 @@ st
Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
On 26/04/17 01:44 AM, Christoph Hellwig wrote: > I think we'll at least need a draft of those to make sense of these > patches. Otherwise they just look very clumsy. Ok, I'll work up a draft proposal and send it in a couple days. But without a lot of cleanup such as this series it's not going to even be able to compile. > I'm sorry but this API is just a trainwreck. Right now we have the > nice little kmap_atomic API, which never fails and has a very nice > calling convention where we just pass back the return address, but does > not support sleeping inside the critical section. > > And kmap, whіch may fail and requires the original page to be passed > back. Anything that mixes these two concepts up is simply a non-starter. Ok, well for starters I think you are mistaken about kmap being able to fail. I'm having a hard time finding many users of that function that bother to check for an error when calling it. The main difficulty we have now is that neither of those functions are expected to fail and we need them to be able to in cases where the page doesn't map to system RAM. This patch series is trying to address it for users of scatterlist. I'm certainly open to other suggestions. I also have to disagree that kmap and kmap_atomic are all that "nice". Except for the sleeping restriction and performance, they effectively do the same thing. And it was necessary to write a macro wrapper around kunmap_atomic to ensure that users of that function don't screw it up. (See 597781f3e5.) I'd say the kmap/kmap_atomic functions are the trainwreck and I'm trying to do my best to cleanup a few cases. There are a fair number of cases in the kernel that do something like: if (something) x = kmap(page); else x = kmap_atomic(page); ... if (something) kunmap(page) else kunmap_atomic(x) Which just seems cumbersome to me. In any case, if you can accept an sg_kmap and sg_kmap_atomic api just say so and I'll make the change. But I'll still need a flags variable for SG_MAP_MUST_NOT_FAIL to support legacy cases that have no fail path and both of those functions will need to be pretty nearly replicas of each other. Logan
Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
On 17-04-26 09:56 AM, Jiri Pirko wrote: Wed, Apr 26, 2017 at 03:14:38PM CEST, j...@mojatatu.com wrote: On 17-04-26 08:08 AM, Jiri Pirko wrote: [..] Jiri, what are you arguing about if you have done the math? ;-> I can do 3*2*64. What I cannot do is to figure out the real performance impact. Jiri, I do a lot of very large data dumping and setting towards the kernel. You know that. It is why I even have these patches to begin with. The math should be convincing enough. 48B per rule extra for just MPLS in a filter rule. I havent started testing the overhead of flower but i do plan to use it - with about a million rules for offloading. I will give you the numbers then. I think we are at a stalemate. You are not going to convince me to use an attribute with a u8 for a bit flag when I can fit 32 of them in one attribute (with the same cost). And I am not able to convince you that you are wrong to put beauty first. Again: You are looking at this from a manageability point of view which is useful but not the only input into a design. If i can squeeze more data without killing usability - I am all for it. It just doesnt compute that it is ok to use a flag per attribute because it looks beautiful. Hmm. Now that I'm thinking about it, why don't we have NLA_FLAGS with couple of helpers around it? It will be obvious what the attr is, all kernel code would use the same helpers. Would be nice. I think to have flags at that level is useful but it is a different hierarchy level. I am not sure the "actions dump large messages" is a fit for that level. cheers, jamal
Low speed MPLS to virtio-net
Started MPLS on the branch - Everything was fine. When I tried to run MPLS on a real network of virtual machines, there were problems with the speed: root@containers:~# iperf3 -c 10.194.10.2 -B 10.194.10.1 -Z Connecting to host 10.194.10.2, port 5201 [ 4] local 10.194.10.1 port 49533 connected to 10.194.10.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 1018 KBytes 8.34 Mbits/sec 238 5.64 KBytes [ 4] 1.00-2.00 sec 1.42 MBytes 11.9 Mbits/sec 373 1.41 KBytes [ 4] 2.00-3.00 sec 1.43 MBytes 12.0 Mbits/sec 379 5.64 KBytes [ 4] 3.00-4.00 sec 1.43 MBytes 12.0 Mbits/sec 376 5.64 KBytes [ 4] 4.00-5.00 sec 1.41 MBytes 11.8 Mbits/sec 375 2.82 KBytes [ 4] 5.00-6.00 sec 1.42 MBytes 11.9 Mbits/sec 376 2.82 KBytes [ 4] 6.00-7.00 sec 1.42 MBytes 11.9 Mbits/sec 373 5.64 KBytes [ 4] 7.00-8.00 sec 1.41 MBytes 11.8 Mbits/sec 372 5.64 KBytes [ 4] 8.00-9.00 sec 1.42 MBytes 11.9 Mbits/sec 379 2.82 KBytes [ 4] 9.00-10.00 sec 1.42 MBytes 11.9 Mbits/sec 373 5.64 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 13.8 MBytes 11.5 Mbits/sec 3614 sender [ 4] 0.00-10.00 sec 13.6 MBytes 11.4 Mbits/sec receiver iperf Done. root@containers:~# Here are the settings: test0: lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo:1: flags=73 mtu 65536 inet 10.194.10.1 netmask 255.255.255.255 loop txqueuelen 1000 (Local Loopback) test0p1: flags=4163 mtu 1500 inet 10.194.1.50 netmask 255.255.255.0 broadcast 10.194.1.255 inet6 fe80::b0a7:b1ff:fec1:3d5c prefixlen 64 scopeid 0x20 ether b2:a7:b1:c1:3d:5c txqueuelen 1000 (Ethernet) RX packets 19974 bytes 1410944 (1.3 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3726844 bytes 5236310466 (4.8 GiB) TX errors 0 dropped 3604 overruns 0 carrier 0 collisions 0 test1: lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo:1: flags=73 mtu 65536 inet 10.194.10.2 netmask 255.255.255.255 loop txqueuelen 1000 (Local Loopback) test1p1: flags=4163 mtu 1500 inet 10.194.1.51 netmask 255.255.255.0 broadcast 10.194.1.255 inet6 fe80::5cc0:45ff:fe1a:9705 prefixlen 64 scopeid 0x20 ether 5e:c0:45:1a:97:05 txqueuelen 1000 (Ethernet) RX packets 2001923 bytes 2806406771 (2.6 GiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 19907 bytes 1485150 (1.4 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 Server configuration: root@ne-vlezay80:~# ip -M r 100 via inet 10.194.1.50 dev vlan11 101 via inet 10.194.1.51 dev vlan11 root@ne-vlezay80:~# root@ne-vlezay80:~# ifconfig eth0 Link encap:Ethernet HWaddr 52:54:00:5d:81:90 inet addr:10.247.0.250 Bcast:10.247.0.255 Mask:255.255.255.0 inet6 addr: fe80::5054:ff:fe5d:8190/64 Scope:Link inet6 addr: fd00:1002:1289:10::10/64 Scope:Global UP BROADCAST RUNNING MULTICAST MTU:2500 Metric:1 RX packets:7403 errors:0 dropped:0 overruns:0 frame:0 TX packets:4182 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:621871 (607.2 KiB) TX bytes:445766 (435.3 KiB) eth1 Link encap:Ethernet HWaddr 52:54:00:0b:ff:2e inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0 inet6 addr: fd00:104:1::1/64 Scope:Global inet6 addr: fe80::5054:ff:fe0b:ff2e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2500 Metric:1 RX packets:2204837 errors:0 dropped:5 overruns:0 frame:0 TX packets:2083636 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3073876412 (2.8 GiB) TX bytes:2897017540 (2.6 GiB) loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:75 errors:0 dropped:0 overruns:0 frame:0 TX packets:75 errors:0 dropped:0 overruns:0 carrier:0 collisio
[GIT] Networking
1) MLX5 bug fixes from Saeed Mahameed et al. a) Release wrong resources when firmware timeout happens b) Wrong check for encapsulation size limits c) UAR memory leak d) ETHTOOL_GRXCLSRLALL fails to fill in info->data 2) Don't cache l3mdev on mis-matches local route, causes net devices to leak refs. From Robert Shearman. 3) Handle fragmented SKBs properly in macsec driver, the problem is that we were mis-sizing the sgvec table. From Jason A. Donenfeld. 4) We cannot have checksum offload enabled for inner UDP tunneled packet during IPSEC, from Ansis Atteka. 5) Fix double SKB free in ravb driver, from Dan Carpenter. 6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli. 7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver, from Maksim Salau. 8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge the broadcast queue properly on port destroy. 9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert Kenward. 10) Fix memory leaks in team driver, from Pan Bian. 11) Don't setup ipv6_stub before it can be actually used, from Paolo Abeni. 12) Fix tipc socket flow control accounting, from Parthasarathy Bhuvaragan. 13) Fix crash on module unload in hso driver, from Andreas Kemnade. 14) Fix purging of bridge multicast entries, the problem is that if we don't defer it to ndo_uninit it's possible for new entries to get added after we purge. Fix from Xin Long. 15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander Potapenko. 16) Fix autoneg stall properly in PHY layer, and revert micrel driver change that was papering over it. From Alexander Kochetkov. 17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel code, from Cong Wang. 18) Clear out the congestion control private of the TCP socket in all of the right places, from Wei Wang. 19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie Bainbridge. Please pull, thanks a lot! The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772: Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8: ipv6: check raw payload size correctly in ioctl (2017-04-26 14:59:35 -0400) Alexander Kochetkov (1): net: phy: fix auto-negotiation stall due to unavailable interrupt Alexander Potapenko (1): net/packet: check length in getsockopt() called with PACKET_HDRLEN Andreas Kemnade (1): net: hso: fix module unloading Ansis Atteka (1): udp: disable inner UDP checksum offloads in IPsec case Bert Kenward (1): sfc: tx ring can only have 2048 entries for all EF10 NICs Dan Carpenter (2): net: tc35815: move free after the dereference ravb: Double free on error in ravb_start_xmit() David Ahern (2): net: ipv6: send unsolicited NA if enabled for all interfaces net: ipv6: regenerate host route if moved to gc list David S. Miller (4): Merge tag 'mlx5-fixes-2017-04-22' of git://git.kernel.org/.../saeed/linux Merge branch 'dsa-b53-58xx-fixes' Merge tag 'linux-can-fixes-for-4.11-20170425' of git://git.kernel.org/.../mkl/linux-can Revert "phy: micrel: Disable auto negotiation on startup" Eugenia Emantayev (1): net/mlx5e: Fix small packet threshold Florian Fainelli (3): net: dsa: b53: Include IMP/CPU port in dumb forwarding mode net: dsa: b53: Implement software reset for 58xx devices net: dsa: b53: Fix CPU port for 58xx devices Herbert Xu (1): macvlan: Fix device ref leak when purging bc_queue Ilan Tayari (1): net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling Jamie Bainbridge (1): ipv6: check raw payload size correctly in ioctl Jason A. Donenfeld (2): macsec: avoid heap overflow in skb_to_sgvec macsec: dynamically allocate space for sglist Maksim Salau (1): net: can: usb: gs_usb: Fix buffer on stack Maor Gottlieb (1): net/mlx5: Fix UAR memory leak Martin KaFai Lau (1): net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats Mohamad Haj Yahia (1): net/mlx5: Fix driver load bad flow when having fw initializing timeout Myungho Jung (1): net: core: Prevent from dereferencing null pointer when releasing SKB Or Gerlitz (3): net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5 net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels Pan Bian (1): team: fix memory leaks Paolo Abeni (1): ipv6: move stub initialization after ipv6 setup completion Parthasarathy Bhuvaragan (2): tipc: fix socket flow control accounting error at tipc
Re: [PATCH] ipv6: check raw payload size correctly in ioctl
From: Jamie Bainbridge Date: Wed, 26 Apr 2017 10:43:27 +1000 > In situations where an skb is paged, the transport header pointer and > tail pointer can be the same because the skb contents are in frags. > > This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a > length of 0 when the length to receive is actually greater than zero. > > skb->len is already correctly set in ip6_input_finish() with > pskb_pull(), so use skb->len as it always returns the correct result > for both linear and paged data. > > Signed-off-by: Jamie Bainbridge Applied and queued up for -stable, thanks.
Re: [PATCH net-next] tcp: memset ca_priv data to 0 properly
From: Wei Wang Date: Tue, 25 Apr 2017 17:38:02 -0700 > From: Wei Wang > > Always zero out ca_priv data in tcp_assign_congestion_control() so that > ca_priv data is cleared out during socket creation. > Also always zero out ca_priv data in tcp_reinit_congestion_control() so > that when cc algorithm is changed, ca_priv data is cleared out as well. > We should still zero out ca_priv data even in TCP_CLOSE state because > user could call connect() on AF_UNSPEC to disconnect the socket and > leave it in TCP_CLOSE state and later call setsockopt() to switch cc > algorithm on this socket. > > Fixes: 2b0a8c9ee ("tcp: add CDG congestion control") > Reported-by: Andrey Konovalov > Signed-off-by: Wei Wang > Acked-by: Eric Dumazet > Acked-by: Yuchung Cheng > Acked-by: Neal Cardwell Applied to 'net' and queued up for -stable, thanks.
Re: [Patch net] ipv6: check skb->protocol before lookup for nexthop
From: Cong Wang Date: Tue, 25 Apr 2017 14:37:15 -0700 > Andrey reported a out-of-bound access in ip6_tnl_xmit(), this > is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4 > neigh key as an IPv6 address: > > neigh = dst_neigh_lookup(skb_dst(skb), > &ipv6_hdr(skb)->daddr); > if (!neigh) > goto tx_err_link_failure; > > addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE > addr_type = ipv6_addr_type(addr6); > > if (addr_type == IPV6_ADDR_ANY) > addr6 = &ipv6_hdr(skb)->daddr; > > memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr)); > > Also the network header of the skb at this point should be still IPv4 > for 4in6 tunnels, we shold not just use it as IPv6 header. > > This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it > is, we are safe to do the nexthop lookup using skb_dst() and > ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which > dest address we can pick here, we have to rely on callers to fill it > from tunnel config, so just fall to ip6_route_output() to make the > decision. > > Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.") > Reported-by: Andrey Konovalov > Tested-by: Andrey Konovalov > Cc: Steffen Klassert > Signed-off-by: Cong Wang Applied and queued up for -stable, thanks Cong.
Re: [PATCH net-next] virtio-net: on tx, only call napi_disable if tx napi is on
From: Willem de Bruijn Date: Tue, 25 Apr 2017 15:59:17 -0400 > From: Willem de Bruijn > > As of tx napi, device down (`ip link set dev $dev down`) hangs unless > tx napi is enabled. Else napi_enable is not called, so napi_disable > will spin on test_and_set_bit NAPI_STATE_SCHED. > > Only call napi_disable if tx napi is enabled. > > Fixes: 5a719c2552ca ("virtio-net: transmit napi") > Reported-by: Jason Wang > Signed-off-by: Willem de Bruijn Applied, thanks.
Re: [PATCH net-next 0/2] Move sub crq init out of interrupt context
From: Nathan Fontenot Date: Tue, 25 Apr 2017 15:00:58 -0400 > The sub crqs are currently intialized in interrupt context when > handling a crq response fromn the vios server. There is no reason > they must be initialized there. > > Moving the initialization of the sub crqs to the ibmvnic_init routine > allows us to do the initialization outside of interrupt context and > make all of the allocations with GFP_KERNEL instead of GFP_ATOMIC. Series applied, thanks.
Re: [PATCH v3] net: core: Prevent from dereferencing null pointer when releasing SKB
From: Myungho Jung Date: Tue, 25 Apr 2017 11:58:15 -0700 > Added NULL check to make __dev_kfree_skb_irq consistent with kfree > family of functions. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289 > > Signed-off-by: Myungho Jung Applied, thank you.
Re: [PATCH net-next] dt-bindings: mdio: Clarify binding document
From: Florian Fainelli Date: Tue, 25 Apr 2017 11:33:03 -0700 > The described GPIO reset property is applicable to *all* child PHYs. If > we have one reset line per PHY present on the MDIO bus, these > automatically become properties of the child PHY nodes. > > Finally, indicate how the RESET pulse width must be defined, which is > the maximum value of all individual PHYs RESET pulse widths determined > by reading their datasheets. > > Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.") > Signed-off-by: Florian Fainelli Applied, thanks Florian.
Re: [PATCH net-next 00/10] tcp: do not use tcp_time_stamp for rcv autotuning
From: Eric Dumazet Date: Tue, 25 Apr 2017 10:15:31 -0700 > Some devices or linux distributions use HZ=100 or HZ=250 > > TCP receive buffer autotuning has poor behavior caused by this choice. > Since autotuning happens after 4 ms or 10 ms, short distance flows > get their receive buffer tuned to a very high value, but after an initial > period where it was frozen to (too small) initial value. > > With BBR (or other CC allowing to increase BDP), we are willing to > increase tcp_rmem[2], but this receive autotuning defect is a blocker > for hosts dealing with gazillions of TCP flows in the data centers, > since many of them have inflated RCVBUF. Risk of OOM is too high. > > Note that TSO autodefer, tcp cubic, and TCP TS options (RFC 7323) > also suffer from our dependency to jiffies (via tcp_time_stamp). > > We have ongoing efforts to improve all that in the future. Looks great, series applied, thanks Eric.
Re: [oss-drivers] Re: [RFC 3/4] nfp: make use of extended ack message reporting
On Wed, Apr 26, 2017 at 10:44:16AM -0400, David Miller wrote: > From: Simon Horman > Date: Wed, 26 Apr 2017 13:13:16 +0200 > > > On Tue, Apr 25, 2017 at 10:20:22AM -0400, David Miller wrote: > >> From: Jamal Hadi Salim > >> Date: Tue, 25 Apr 2017 08:42:32 -0400 > >> > >> > So are we going to standardize these strings? > >> > >> No. > >> > >> > i.e what if some user has written a bash script that depends on this > >> > string and it gets changed later. > >> > >> They can't do that. > >> > >> It's free form extra information an application may or not provide > >> to the user when the kernel emits it. > > > > I don't feel strongly about this and perhaps it can be revisited at some > > point but perhaps it would be worth documenting that he strings do not > > form part of the UAPI as my expectation would have been that they do f.e. to > > facilitate internationalisation. > > These two things are entirely separate. > > We can maintain uptodate translations of the strings, yet document that > they can change at any time and are thus not UAPI. Thanks, I see that now.
Re: [PATCH v2] macsec: dynamically allocate space for sglist
From: "Jason A. Donenfeld" Date: Tue, 25 Apr 2017 19:08:18 +0200 > We call skb_cow_data, which is good anyway to ensure we can actually > modify the skb as such (another error from prior). Now that we have the > number of fragments required, we can safely allocate exactly that amount > of memory. > > Signed-off-by: Jason A. Donenfeld > Cc: Sabrina Dubroca > Cc: secur...@kernel.org > Cc: sta...@vger.kernel.org Applied, thanks.
Re: [PATCH net-next] rhashtable: remove insecure_max_entries param
From: Florian Westphal Date: Tue, 25 Apr 2017 11:41:34 +0200 > no users in the tree, insecure_max_entries is always set to > ht->p.max_size * 2 in rhtashtable_init(). > > Replace only spot that uses it with a ht->p.max_size check. > > Signed-off-by: Florian Westphal Applied, thanks Florian.
Re: [RFC PATCH] netxen_nic: null-terminate serial number string in netxen_check_options()
From: "Jerome Marchand" Date: Tue, 25 Apr 2017 09:42:29 +0200 > The serial_num string in netxen_check_options() is not always properly > null-terminated. I couldn't find the documention on the serial number > format and I suspect a proper integer to string conversion is in > order, but this patch a least prevents the out-of-bound access. > > It solves the following kasan warning: ... > @@ -842,7 +842,7 @@ netxen_check_options(struct netxen_adapter *adapter) > { > u32 fw_major, fw_minor, fw_build, prev_fw_version; > char brd_name[NETXEN_MAX_SHORT_NAME]; > - char serial_num[32]; > + char serial_num[33]; > int i, offset, val, err; > __le32 *ptr32; > struct pci_dev *pdev = adapter->pdev; Another problem is that the serial_num array is only 4-byte aligned by accident. Steps are necessary to make sure the ptr32 assignments don't take unaligned traps. Something like: union { char buf[33]; __le32 dummy; } serial_num;
[PATCH v2 net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option
From: Craig Gallek The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are never present in the generated packets. The issue appears to be a subtlety between IPV6_DSTOPTS and IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was written, the encap limit options were included as IPV6_RTHDRDSTOPTS in dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrags_opts was (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS are to be used. This caused the options to no longer be included in v6 encapsulated packets. The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions) instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement. Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.") Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)") Signed-off-by: Craig Gallek --- v2: Change tunnel code to use dst1opt rather than making the checks for dst0opt more permissive. net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index ad15d38b41e8..c81f9541f1f7 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -954,7 +954,7 @@ static void init_tel_txopt(struct ipv6_tel_txoption *opt, __u8 encap_limit) opt->dst_opt[5] = IPV6_TLV_PADN; opt->dst_opt[6] = 1; - opt->ops.dst0opt = (struct ipv6_opt_hdr *) opt->dst_opt; + opt->ops.dst1opt = (struct ipv6_opt_hdr *) opt->dst_opt; opt->ops.opt_nflen = 8; } @@ -1176,7 +1176,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, if (encap_limit >= 0) { init_tel_txopt(&opt, encap_limit); - ipv6_push_nfrag_opts(skb, &opt.ops, &proto, NULL, NULL); + ipv6_push_frag_opts(skb, &opt.ops, &proto); } /* Calculate max headroom for all the headers and adjust -- 2.13.0.rc0.306.g87b477812d-goog
Re: [PATCH v1] net: phy: fix auto-negotiation stall due to unavailable interrupt
From: Alexander Kochetkov Date: Thu, 20 Apr 2017 14:00:04 +0300 > The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet > cable was plugged before the Ethernet interface was brought up. > > The patch trigger PHY state machine to update link state if PHY was requested > to > do auto-negotiation and auto-negotiation complete flag already set. > > During power-up cycle the PHY do auto-negotiation, generate interrupt and set > auto-negotiation complete flag. Interrupt is handled by PHY state machine but > doesn't update link state because PHY is in PHY_READY state. After some time > MAC bring up, start and request PHY to do auto-negotiation. If there are no > new > settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation. > PHY continue to stay in auto-negotiation complete state and doesn't fire > interrupt. At the same time PHY state machine expect that PHY started > auto-negotiation and is waiting for interrupt from PHY and it won't get it. > > Signed-off-by: Alexander Kochetkov > Cc: stable # v4.9+ Applied, and I reverted the micrel commit too. Thanks.
GSO + changing TX queues == crash?
Hi! I'm seeing crashes with GSO on when changing the number of TX rings. I initially thought it was a nfp driver problem but I managed to reproduce it with i40e now. What I run on the nfp was iperf sending two dozen streams. Then I run this little script while 40Gbps is being sent: bl() { tc -s qdisc show dev p4p1 | \ tail -$((MAX_TX*3)) | \ grep backlog | \ grep -v "backlog 0b" } while true do ethtool -L p4p1 tx 0 a=$(bl | wc -l) echo down $a # there are 8 combined queues, we shouldn't see more backlog [ $a -gt 8 ] && break sleep 2 ethtool -L p4p1 tx 10 echo up $(bl | wc -l) sleep 2 done The idea is to catch when after reconfig more queues have backlog than are configured. Right after this script exits driver gets traffic on queues which are down. It usually reproduces within a minute, I run it with tso on gso off and tso off gso off for 15 minutes and that didn't crash. i40e machine was running kernel 4.10, with the NFP driver I'm able to reproduce on net-next all the way back to 3.16 (I haven't tried older). FWIW the nfp driver is doing: netif_carrier_off() for enabled rings: disable_irq() napi_disable() netif_tx_disable() nfp's free_tx_bufs() netif_set_real_num_tx_queues() for enabled rings: napi_enable() enable_irq() netif_tx_wake_all_queues() nfp's read_link() # does netif_carrier_on() I was entirely convinced that it's a driver problem, but the fact I crashed the i40e made me worry :| This is a crash from i40e, it takes a bit longer to kill it than the nfp, maybe because it takes longer to reconfig: [ 461.822381] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 461.831229] IP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e] [ 461.837045] PGD 0 [ 461.837046] [ 461.841089] Oops: 0002 [#1] SMP [ 461.844665] Modules linked in: xfs nls_iso8859_1 ipmi_devintf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xtc [ 461.924168] fscache tg3 i40e ptp ahci pps_core libahci fjes [ 461.930595] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.11.0-041100rc1-generic #201703051731 [ 461.940340] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 461.948823] task: a087db2b task.stack: b9b90335 [ 461.955546] RIP: 0010:i40e_lan_xmit_frame+0xf1/0x1420 [i40e] [ 461.961965] RSP: 0018:a087df1c3d80 EFLAGS: 00010293 [ 461.967899] RAX: RBX: a087c3077d00 RCX: [ 461.975971] RDX: RSI: 0007 RDI: a087b4e1d70c [ 461.984044] RBP: a087df1c3e20 R08: a087d440349c R09: [ 461.992117] R10: a087de821a08 R11: R12: a087b7c6 [ 462.000188] R13: 0002 R14: 05ea R15: a087d97b3000 [ 462.008261] FS: () GS:a087df1c() knlGS: [ 462.017423] CS: 0010 DS: ES: CR0: 80050033 [ 462.023941] CR2: 0008 CR3: 0006efe09000 CR4: 003406e0 [ 462.032014] DR0: DR1: DR2: [ 462.040083] DR3: DR6: fffe0ff0 DR7: 0400 [ 462.048154] Call Trace: [ 462.050969] [ 462.053311] ? update_load_avg+0x79/0x520 [ 462.057876] ? sched_clock_cpu+0x11/0xb0 [ 462.062356] dev_hard_start_xmit+0xa3/0x1f0 [ 462.067127] sch_direct_xmit+0xfc/0x1c0 [ 462.071509] __qdisc_run+0x122/0x270 [ 462.075598] net_tx_action+0xfd/0x1e0 [ 462.079786] __do_softirq+0x104/0x2af [ 462.083973] irq_exit+0xb6/0xc0 [ 462.087577] smp_apic_timer_interrupt+0x3d/0x50 [ 462.092732] apic_timer_interrupt+0x89/0x90 [ 462.097501] RIP: 0010:cpuidle_enter_state+0x122/0x2c0 [ 462.103240] RSP: 0018:b9b903353e58 EFLAGS: 0246 ORIG_RAX: ff10 [ 462.111818] RAX: RBX: 0004 RCX: 001f [ 462.119890] RDX: 006b86c1bdc1 RSI: a087df1d8998 RDI: [ 462.127962] RBP: b9b903353e98 R08: 00084c3f R09: 0018 [ 462.136032] R10: 000310a2 R11: 00055e38 R12: a087df1e5800 [ 462.144094] R13: b86ec998 R14: 0004 R15: b86ec980 [ 462.152165] [ 462.154601] ? cpuidle_enter_state+0x110/0x2c0 [ 462.159662] cpuidle_enter+0x17/0x20 [ 462.163748] call_cpuidle+0x23/0x40 [ 462.167740] do_idle+0x189/0x200 [ 462.171438] cpu_startup_entry+0x71/0x80 [ 462.175908] start_secondary+0x154/0x190 [ 462.180384] start_cpu+0x14/0x14 [ 462.184080] Code: 8d 75 05 66 39 c8 0f 86 e4 04 00 00 01 d0 0f b7 d1 29 d0 83 e8 01 39 c6 0f 8f ed 05 00 00 49 8b 44 24 20 48 8d 14 89 4c 8d 1c d0 <49> 89 5b 08 8b 83 [ 462.205347] RIP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e] RSP: a087df1c3d80 [ 462.213418] CR2: 0008 [ 462.2
Re: [PATCH] stmmac: Add support for SIMATIC IOT2000 platform
On Tue, Apr 25, 2017 at 3:15 PM, Jan Kiszka wrote: > On 2017-04-25 13:42, Andy Shevchenko wrote: >> On Tue, Apr 25, 2017 at 1:09 PM, Jan Kiszka wrote: >>> On 2017-04-25 12:07, Jan Kiszka wrote: On 2017-04-25 11:46, Andy Shevchenko wrote: > On Tue, Apr 25, 2017 at 12:00 PM, Jan Kiszka > wrote: >{ >.name = "SIMATIC IOT2000", >.func = 6, >.phy_addr = 1, >}, >{ >.name = "SIMATIC IOT2000", >.func = 7, >.phy_addr = 1, >}, > > That's all what you need. Nope. Again: the asset tag is the way to tell both apart AND to ensure that we do not match on future devices. >> >>> To be more verbose: your version (which is our old one) would even >>> enable the second, not connected port on the IOT2020. Incorrectly. >> >> So, name has 2000 for 2020 device? It's clear bug in DMI table you have. :-( >> >> What else do you have in DMI which can be used to distinguish those devices? So, except asset tag is there anything else to use? Whatever you choose would be nice to split this long conditional: if (!strcmp(dmi->name, name) && dmi->func == func) { /* ASSET_TAG is optional */ if (dmi->asset_tag && strcmp(dmi->asset_tag, asset_tag)) continue; return dmi->phy_addr; } > Andy, there are devices out there in the field, if we as engineers like > it or not, that are all called "IOT2000" although they are sightly > different inside. This patch accounts for that. And it does that even > without adding "platform_data" hacks like in other patches of mine. ;) Yes, which is good. -- With Best Regards, Andy Shevchenko
[PATCH net-next 3/6] bpf: bpf_progs stores all loaded programs
We later want to give users a quick dump of what is possible with procfs, so store a list of all currently loaded bpf programs. Later this list will be printed in procfs. Signed-off-by: Hannes Frederic Sowa --- include/linux/filter.h | 4 ++-- kernel/bpf/core.c | 51 +++--- kernel/bpf/syscall.c | 4 ++-- 3 files changed, 27 insertions(+), 32 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 9a7786db14fa53..63624c619e371b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -753,8 +753,8 @@ bpf_address_lookup(unsigned long addr, unsigned long *size, return ret; } -void bpf_prog_kallsyms_add(struct bpf_prog *fp); -void bpf_prog_kallsyms_del(struct bpf_prog *fp); +void bpf_prog_link(struct bpf_prog *fp); +void bpf_prog_unlink(struct bpf_prog *fp); #else /* CONFIG_BPF_JIT */ diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 043f634ff58d87..2139118258cdf8 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -365,22 +365,6 @@ static struct latch_tree_root bpf_tree __cacheline_aligned; int bpf_jit_kallsyms __read_mostly; -static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux) -{ - WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head)); - list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs); - latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); -} - -static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux) -{ - if (list_empty(&aux->bpf_progs_head)) - return; - - latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); - list_del_rcu(&aux->bpf_progs_head); -} - static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp) { return fp->jited && !bpf_prog_was_classic(fp); @@ -392,38 +376,45 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) fp->aux->bpf_progs_head.prev == LIST_POISON2; } -void bpf_prog_kallsyms_add(struct bpf_prog *fp) +void bpf_prog_link(struct bpf_prog *fp) { - if (!bpf_prog_kallsyms_candidate(fp) || - !capable(CAP_SYS_ADMIN)) - return; + struct bpf_prog_aux *aux = fp->aux; spin_lock_bh(&bpf_lock); - bpf_prog_ksym_node_add(fp->aux); + list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs); + if (bpf_prog_kallsyms_candidate(fp)) + latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); spin_unlock_bh(&bpf_lock); } -void bpf_prog_kallsyms_del(struct bpf_prog *fp) +void bpf_prog_unlink(struct bpf_prog *fp) { - if (!bpf_prog_kallsyms_candidate(fp)) - return; + struct bpf_prog_aux *aux = fp->aux; spin_lock_bh(&bpf_lock); - bpf_prog_ksym_node_del(fp->aux); + list_del_rcu(&aux->bpf_progs_head); + if (bpf_prog_kallsyms_candidate(fp)) + latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); spin_unlock_bh(&bpf_lock); } static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr) { struct latch_tree_node *n; + struct bpf_prog *prog; if (!bpf_jit_kallsyms_enabled()) return NULL; n = latch_tree_find((void *)addr, &bpf_tree, &bpf_tree_ops); - return n ? - container_of(n, struct bpf_prog_aux, ksym_tnode)->prog : - NULL; + if (!n) + return NULL; + + prog = container_of(n, struct bpf_prog_aux, ksym_tnode)->prog; + if (!prog->priv_cap_sys_admin) + return NULL; + + return prog; } const char *__bpf_address_lookup(unsigned long addr, unsigned long *size, @@ -474,6 +465,10 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, rcu_read_lock(); list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) { + if (!bpf_prog_kallsyms_candidate(aux->prog) || + !aux->prog->priv_cap_sys_admin) + continue; + if (it++ != symnum) continue; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 13642c73dca0b4..d61d1bd3e6fee6 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -664,7 +664,7 @@ void bpf_prog_put(struct bpf_prog *prog) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); - bpf_prog_kallsyms_del(prog); + bpf_prog_unlink(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } } @@ -858,7 +858,7 @@ static int bpf_prog_load(union bpf_attr *attr) /* failed to allocate fd */ goto free_used_maps; - bpf_prog_kallsyms_add(prog); + bpf_prog_link(prog); trace_bpf_prog_load(prog, err); return err; -- 2.9.3
[PATCH net-next 5/6] bpf: add skeleton for procfs printing of bpf_progs
Signed-off-by: Hannes Frederic Sowa --- kernel/bpf/core.c | 90 +++ 1 file changed, 90 insertions(+) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 048e2d79718a16..3ba175a24e971a 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -32,6 +32,9 @@ #include #include +#include +#include + #include /* Registers */ @@ -488,6 +491,93 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, return ret; } +/* ebpf procfs implementation */ + +static struct proc_dir_entry *ebpf_proc_dir; + +static void *ebpf_proc_start(struct seq_file *s, loff_t *pos) + __acquires(RCU_BH) +{ + struct bpf_prog_aux *aux; + loff_t off = 0; + + rcu_read_lock(); + + if (*pos == 0) + return SEQ_START_TOKEN; + + list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) + if (++off == *pos) + return aux; + + return NULL; +} + +static void *ebpf_proc_next(struct seq_file *s, void *v, loff_t *pos) +{ + struct bpf_prog_aux *aux; + + ++*pos; + + if (v == SEQ_START_TOKEN) + return list_first_or_null_rcu(&bpf_progs, struct bpf_prog_aux, + bpf_progs_head); + + aux = v; + return list_next_or_null_rcu(&bpf_progs, +&aux->bpf_progs_head, +struct bpf_prog_aux, +bpf_progs_head); +} + +static void ebpf_proc_stop(struct seq_file *s, void *v) + __releases(RCU_BH) +{ + rcu_read_unlock(); +} + +static int ebpf_proc_show(struct seq_file *s, void *v) +{ + if (v == SEQ_START_TOKEN) { + seq_printf(s, "# tag\n"); + return 0; + } + + return 0; +} + +static const struct seq_operations ebpf_seq_ops = { + .start = ebpf_proc_start, + .next = ebpf_proc_next, + .stop = ebpf_proc_stop, + .show = ebpf_proc_show, +}; + +static int ebpf_proc_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &ebpf_seq_ops); +} + +static const struct file_operations ebpf_proc_operations = { + .open = ebpf_proc_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +static int __init ebpf_proc_init(void) +{ + ebpf_proc_dir = proc_mkdir("bpf", NULL); + if (!ebpf_proc_dir) + return 0; + proc_create("programs", 0400, ebpf_proc_dir, &ebpf_proc_operations); + return 0; +} + +device_initcall(ebpf_proc_init); + +/* end of bpf proc inmplementation */ + struct bpf_binary_header * bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, unsigned int alignment, -- 2.9.3
[PATCH net-next 1/6] bpf: bpf_lock needs only block bottom half
We never modify bpf programs from hardirqs ever. Signed-off-by: Hannes Frederic Sowa --- kernel/bpf/core.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index b4f1cb0c5ac710..6f81e0f5a0faa2 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -394,27 +394,23 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) void bpf_prog_kallsyms_add(struct bpf_prog *fp) { - unsigned long flags; - if (!bpf_prog_kallsyms_candidate(fp) || !capable(CAP_SYS_ADMIN)) return; - spin_lock_irqsave(&bpf_lock, flags); + spin_lock_bh(&bpf_lock); bpf_prog_ksym_node_add(fp->aux); - spin_unlock_irqrestore(&bpf_lock, flags); + spin_unlock_bh(&bpf_lock); } void bpf_prog_kallsyms_del(struct bpf_prog *fp) { - unsigned long flags; - if (!bpf_prog_kallsyms_candidate(fp)) return; - spin_lock_irqsave(&bpf_lock, flags); + spin_lock_bh(&bpf_lock); bpf_prog_ksym_node_del(fp->aux); - spin_unlock_irqrestore(&bpf_lock, flags); + spin_unlock_bh(&bpf_lock); } static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr) -- 2.9.3
[PATCH net-next 6/6] bpf: show bpf programs
Signed-off-by: Hannes Frederic Sowa --- include/uapi/linux/bpf.h | 32 +++- kernel/bpf/core.c| 30 +- 2 files changed, 48 insertions(+), 14 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e553529929f683..d6506e320953d5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -101,20 +101,26 @@ enum bpf_map_type { BPF_MAP_TYPE_HASH_OF_MAPS, }; +#define BPF_PROG_TYPES \ + X(BPF_PROG_TYPE_UNSPEC),\ + X(BPF_PROG_TYPE_SOCKET_FILTER), \ + X(BPF_PROG_TYPE_KPROBE),\ + X(BPF_PROG_TYPE_SCHED_CLS), \ + X(BPF_PROG_TYPE_SCHED_ACT), \ + X(BPF_PROG_TYPE_TRACEPOINT),\ + X(BPF_PROG_TYPE_XDP), \ + X(BPF_PROG_TYPE_PERF_EVENT),\ + X(BPF_PROG_TYPE_CGROUP_SKB),\ + X(BPF_PROG_TYPE_CGROUP_SOCK), \ + X(BPF_PROG_TYPE_LWT_IN),\ + X(BPF_PROG_TYPE_LWT_OUT), \ + X(BPF_PROG_TYPE_LWT_XMIT), + + enum bpf_prog_type { - BPF_PROG_TYPE_UNSPEC, - BPF_PROG_TYPE_SOCKET_FILTER, - BPF_PROG_TYPE_KPROBE, - BPF_PROG_TYPE_SCHED_CLS, - BPF_PROG_TYPE_SCHED_ACT, - BPF_PROG_TYPE_TRACEPOINT, - BPF_PROG_TYPE_XDP, - BPF_PROG_TYPE_PERF_EVENT, - BPF_PROG_TYPE_CGROUP_SKB, - BPF_PROG_TYPE_CGROUP_SOCK, - BPF_PROG_TYPE_LWT_IN, - BPF_PROG_TYPE_LWT_OUT, - BPF_PROG_TYPE_LWT_XMIT, +#define X(type) type + BPF_PROG_TYPES +#undef X }; enum bpf_attach_type { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 3ba175a24e971a..685c1d0f31e029 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v) rcu_read_unlock(); } +static const char *bpf_type_string(enum bpf_prog_type type) +{ + static const char *bpf_type_names[] = { +#define X(type) #type + BPF_PROG_TYPES +#undef X + }; + + if (type >= ARRAY_SIZE(bpf_type_names)) + return ""; + + return bpf_type_names[type]; +} + static int ebpf_proc_show(struct seq_file *s, void *v) { + struct bpf_prog *prog; + struct bpf_prog_aux *aux; + char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; + if (v == SEQ_START_TOKEN) { - seq_printf(s, "# tag\n"); + seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n"); return 0; } + aux = v; + prog = aux->prog; + + bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); + seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag, + bpf_type_string(prog->type), + prog->jited ? "jit" : "int", + prog->priv_cap_sys_admin ? "priv" : "unpriv", + prog->pages * 1ULL << PAGE_SHIFT); + return 0; } -- 2.9.3
[PATCH net-next 0/6] bpf: list all loaded ebpf programs in /proc/bpf/programs
Right now it seems difficult to list all active ebpf programs in a system. This new /proc/bpf/programs node should help and print basically essential information about loaded ebpf programs. This should help an admin to get a quick look of what is going on in his system. Feedback welcome! Hannes Frederic Sowa (6): bpf: bpf_lock needs only block bottom half bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head bpf: bpf_progs stores all loaded programs bpf: track if the bpf program was loaded with SYS_ADMIN capabilities bpf: add skeleton for procfs printing of bpf_progs bpf: show bpf programs include/linux/bpf.h | 2 +- include/linux/filter.h | 10 ++- include/uapi/linux/bpf.h | 32 kernel/bpf/core.c| 195 +-- kernel/bpf/syscall.c | 11 +-- kernel/bpf/verifier.c| 4 +- net/core/filter.c| 6 +- 7 files changed, 190 insertions(+), 70 deletions(-) -- 2.9.3
[PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
Signed-off-by: Hannes Frederic Sowa --- include/linux/filter.h | 6 -- kernel/bpf/core.c | 4 +++- kernel/bpf/syscall.c | 7 --- kernel/bpf/verifier.c | 4 ++-- net/core/filter.c | 6 +++--- 5 files changed, 16 insertions(+), 11 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 63624c619e371b..635311f57bf24f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -413,7 +413,8 @@ struct bpf_prog { locked:1, /* Program image locked? */ gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1,/* Is control block accessed? */ - dst_needed:1; /* Do we need dst entry? */ + dst_needed:1, /* Do we need dst entry? */ + priv_cap_sys_admin:1; /* Where we loaded as sys_admin? */ kmemcheck_bitfield_end(meta); enum bpf_prog_type type; /* Type of BPF program */ u32 len;/* Number of filter blocks */ @@ -615,7 +616,8 @@ static inline int sk_filter(struct sock *sk, struct sk_buff *skb) struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err); void bpf_prog_free(struct bpf_prog *fp); -struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags); +struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags, + bool cap_sys_admin); struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size, gfp_t gfp_extra_flags); void __bpf_prog_free(struct bpf_prog *fp); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 2139118258cdf8..048e2d79718a16 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -74,7 +74,8 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns return NULL; } -struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) +struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags, + bool cap_sys_admin) { gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | gfp_extra_flags; @@ -94,6 +95,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) return NULL; } + fp->priv_cap_sys_admin = cap_sys_admin; fp->pages = size / PAGE_SIZE; fp->aux = aux; fp->aux->prog = fp; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index d61d1bd3e6fee6..ed698c17578a49 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -786,7 +786,7 @@ EXPORT_SYMBOL_GPL(bpf_prog_get_type); /* last field in 'union bpf_attr' used by this command */ #defineBPF_PROG_LOAD_LAST_FIELD kern_version -static int bpf_prog_load(union bpf_attr *attr) +static int bpf_prog_load(union bpf_attr *attr, bool cap_sys_admin) { enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog; @@ -817,7 +817,8 @@ static int bpf_prog_load(union bpf_attr *attr) return -EPERM; /* plain bpf_prog allocation */ - prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER); + prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER, + cap_sys_admin); if (!prog) return -ENOMEM; @@ -1053,7 +1054,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz err = map_get_next_key(&attr); break; case BPF_PROG_LOAD: - err = bpf_prog_load(&attr); + err = bpf_prog_load(&attr, capable(CAP_SYS_ADMIN)); break; case BPF_OBJ_PIN: err = bpf_obj_pin(&attr); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6f8b6ed690be93..24c9dac374770f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3488,7 +3488,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) if (ret < 0) goto skip_full_check; - env->allow_ptr_leaks = capable(CAP_SYS_ADMIN); + env->allow_ptr_leaks = env->prog->priv_cap_sys_admin; ret = do_check(env); @@ -3589,7 +3589,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops, if (ret < 0) goto skip_full_check; - env->allow_ptr_leaks = capable(CAP_SYS_ADMIN); + env->allow_ptr_leaks = prog->priv_cap_sys_admin; ret = do_check(env); diff --git a/net/core/filter.c b/net/core/filter.c index 9a37860a80fc78..dc020d40bb770a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1100,7 +1100,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog) if (!bpf_check_basics_ok(fprog->filter, fprog->len)) return -EINVAL;
[PATCH net-next 2/6] bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head
We will soon put all bpf programs on this list, thus use apropriate names. Signed-off-by: Hannes Frederic Sowa --- include/linux/bpf.h | 2 +- kernel/bpf/core.c | 18 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6bb38d76faf42a..0fbf6a76555cc9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -172,7 +172,7 @@ struct bpf_prog_aux { u32 used_map_cnt; u32 max_ctx_offset; struct latch_tree_node ksym_tnode; - struct list_head ksym_lnode; + struct list_head bpf_progs_head; const struct bpf_verifier_ops *ops; struct bpf_map **used_maps; struct bpf_prog *prog; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 6f81e0f5a0faa2..043f634ff58d87 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -98,7 +98,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) fp->aux = aux; fp->aux->prog = fp; - INIT_LIST_HEAD_RCU(&fp->aux->ksym_lnode); + INIT_LIST_HEAD_RCU(&fp->aux->bpf_progs_head); return fp; } @@ -360,25 +360,25 @@ static const struct latch_tree_ops bpf_tree_ops = { }; static DEFINE_SPINLOCK(bpf_lock); -static LIST_HEAD(bpf_kallsyms); +static LIST_HEAD(bpf_progs); static struct latch_tree_root bpf_tree __cacheline_aligned; int bpf_jit_kallsyms __read_mostly; static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux) { - WARN_ON_ONCE(!list_empty(&aux->ksym_lnode)); - list_add_tail_rcu(&aux->ksym_lnode, &bpf_kallsyms); + WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head)); + list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs); latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); } static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux) { - if (list_empty(&aux->ksym_lnode)) + if (list_empty(&aux->bpf_progs_head)) return; latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops); - list_del_rcu(&aux->ksym_lnode); + list_del_rcu(&aux->bpf_progs_head); } static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp) @@ -388,8 +388,8 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp) static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) { - return list_empty(&fp->aux->ksym_lnode) || - fp->aux->ksym_lnode.prev == LIST_POISON2; + return list_empty(&fp->aux->bpf_progs_head) || + fp->aux->bpf_progs_head.prev == LIST_POISON2; } void bpf_prog_kallsyms_add(struct bpf_prog *fp) @@ -473,7 +473,7 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, return ret; rcu_read_lock(); - list_for_each_entry_rcu(aux, &bpf_kallsyms, ksym_lnode) { + list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) { if (it++ != symnum) continue; -- 2.9.3
Re: linux-next: Tree for Apr 26 (net/can/bcm.c)
Hi Randy, thanks for the report! Some fallout of my namespace support integration %-) I posted a patch for it: http://marc.info/?l=linux-can&m=149323049630039&w=2 Many thanks & best regards, Oliver On 04/26/2017 04:53 PM, Randy Dunlap wrote: On 04/26/17 01:03, Stephen Rothwell wrote: Hi all, Changes since 20170424: on x86_64: when CONFIG_PROC_FS is not enabled: ../net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 'bcmproc_dir' 2 of those are "protected" by if (IS_ENABLED(CONFIG_PROC_FS)) { but that doesn't seem to help/work here. gcc v4.8.5
Re: Bug and configuration MPLS error?
On 4/26/17 6:40 AM, Алексей Болдырев wrote: > > > 26.04.2017, 05:23, "David Ahern" : >> On 4/25/17 11:28 AM, Алексей Болдырев wrote: >>> 226 sysctl -w net.mpls.conf.lo.input=1 >>> 227 sysctl -w net.mpls.platform_labels=1048575 >>> 228 ip link add veth0 type veth peer name veth1 >>> 229 ip link add veth2 type veth peer name veth3 >>> 230 sysctl -w net.mpls.conf.veth0.input=1 >>> 231 sysctl -w net.mpls.conf.veth2.input=1 >>> 232 ifconfig veth0 10.3.3.1 netmask 255.255.255.0 >>> 233 ifconfig veth2 10.4.4.1 netmask 255.255.255.0 >>> 234 ip netns add host1 >>> 235 ip netns add host2 >>> 236 ip link set veth1 netns host1 >>> 237 ip link set veth3 netns host2 >>> 238 ip netns exec host1 ifconfig veth1 10.3.3.2 netmask 255.255.255.0 up >>> 239 ip netns exec host2 ifconfig veth3 10.4.4.2 netmask 255.255.255.0 up >>> 240 ip netns exec host1 ip route add 10.10.10.2/32 encap mpls 112 via inet >>> 10.3.3.1 >>> 241 ip netns exec host2 ip route add 10.10.10.1/32 encap mpls 111 via inet >>> 10.4.4.1 >>> 242 ip -f mpls route add 111 via inet 10.3.3.2 >>> 243 ip -f mpls route add 112 via inet 10.4.4.2 >> >> your setup is incomplete. >> >> # ip netns exec host2 ping 10.10.10.1 >> PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data. >> ^C >> --- 10.10.10.1 ping statistics --- >> 2 packets transmitted, 0 received, 100% packet loss, time 1038ms >> >> If you run tcpdump on veth1 in host1 you see the packets come in but no >> response: >> >> # ip netns exec host1 tcpdump -n -i veth1 >> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >> listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes >> 19:20:24.599529 IP6 fe80::347d:e3ff:fe93:944b > ff02::2: ICMP6, router >> solicitation, length 16 >> 19:20:27.413901 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq >> 1, length 64 >> 19:20:28.439574 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq >> 2, length 64 >> >> and the lack of response is b/c: >> 1. host1 has no address for 10.10.10.1 and >> 2. even if it did, there is no return route to 10.4.4.2: >> >> # ip -netns host1 ro ls >> 10.3.3.0/24 dev veth1 proto kernel scope link src 10.3.3.2 >> 10.10.10.2 encap mpls 112 via 10.3.3.1 dev veth1 > > As for ping, you need to enter this: > Ip netns exec host2 ping 10.10.10.1 -A 10.10.10.2 > Here I published the results of testing on new (>4.9) cores. (in Russian): > http://forum.nag.ru/forum/index.php?s=d09f0e5186fda59b3099eb81ad07ee63&showtopic=128927 > But on the old kernels: > http://forum.nag.ru/forum/index.php?showtopic=128927&view=findpost&p=1396067 > host1 does not have 10.10.10.1 as a local address. host2 does not have 10.10.10.2 as a local address. Given that, host1 has no business replying to a ping destined to 10.10.10.1, and host2 will not use 10.10.10.2 as a source address. I don't have time right now to build and test on older kernels, but based on the network config I do not see how it can work. If you add: ip -netns host1 addr add dev lo 10.10.10.1/32 ip -netns host2 addr add dev lo 10.10.10.2/32 Then it works.
[PATCH net-next] can: fix CAN BCM build with CONFIG_PROC_FS disabled
The introduced namespace support moved the BCM variables for procfs into a per-net data structure. This leads to a build failure with disabled procfs: on x86_64: when CONFIG_PROC_FS is not enabled: ../net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 'bcmproc_dir' ../net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 'bcmproc_dir' http://marc.info/?l=linux-can&m=149321842526524&w=2 Reported-by: Randy Dunlap Signed-off-by: Oliver Hartkopp --- net/can/bcm.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/net/can/bcm.c b/net/can/bcm.c index 0e855917b7e1..78409841b74c 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -147,6 +147,7 @@ static inline ktime_t bcm_timeval_to_ktime(struct bcm_timeval tv) /* * procfs functions */ +#if IS_ENABLED(CONFIG_PROC_FS) static char *bcm_proc_getifname(struct net *net, char *result, int ifindex) { struct net_device *dev; @@ -251,6 +252,7 @@ static const struct file_operations bcm_proc_fops = { .llseek = seq_lseek, .release= single_release, }; +#endif /* CONFIG_PROC_FS */ /* * bcm_can_tx - send the (next) CAN frame to the appropriate CAN interface @@ -1537,9 +1539,11 @@ static int bcm_release(struct socket *sock) bcm_remove_op(op); } +#if IS_ENABLED(CONFIG_PROC_FS) /* remove procfs entry */ if (net->can.bcmproc_dir && bo->bcm_proc_read) remove_proc_entry(bo->procname, net->can.bcmproc_dir); +#endif /* CONFIG_PROC_FS */ /* remove device reference */ if (bo->bound) { @@ -1598,6 +1602,7 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, bo->ifindex = 0; } +#if IS_ENABLED(CONFIG_PROC_FS) if (net->can.bcmproc_dir) { /* unique socket address as filename */ sprintf(bo->procname, "%lu", sock_i_ino(sk)); @@ -1609,6 +1614,7 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, goto fail; } } +#endif /* CONFIG_PROC_FS */ bo->bound = 1; @@ -1691,22 +1697,22 @@ static const struct can_proto bcm_can_proto = { static int canbcm_pernet_init(struct net *net) { +#if IS_ENABLED(CONFIG_PROC_FS) /* create /proc/net/can-bcm directory */ - if (IS_ENABLED(CONFIG_PROC_FS)) { - net->can.bcmproc_dir = - proc_net_mkdir(net, "can-bcm", net->proc_net); - } + net->can.bcmproc_dir = + proc_net_mkdir(net, "can-bcm", net->proc_net); +#endif /* CONFIG_PROC_FS */ return 0; } static void canbcm_pernet_exit(struct net *net) { +#if IS_ENABLED(CONFIG_PROC_FS) /* remove /proc/net/can-bcm directory */ - if (IS_ENABLED(CONFIG_PROC_FS)) { - if (net->can.bcmproc_dir) - remove_proc_entry("can-bcm", net->proc_net); - } + if (net->can.bcmproc_dir) + remove_proc_entry("can-bcm", net->proc_net); +#endif /* CONFIG_PROC_FS */ } static struct pernet_operations canbcm_pernet_ops __read_mostly = { -- 2.11.0
Re: [PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option
On Wed, Apr 26, 2017 at 1:07 PM, Craig Gallek wrote: > From: Craig Gallek > > The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and > IPV6_TLV_PADN options when an encapsulation limit is defined (the > default is a limit of 4). An MTU adjustment is done to account for > these options as well. However, the options are never present in the > generated packets. > > ipv6_push_nfrag_opts requires that IPV6_RTHDR be present in order to > include any IPV6_DSTOPTS options. The v6 tunnel code does not > use routing options, so the encap limit options are not included. > > A brief reading of RFC 3542 section 9.2 (specifically the 4th paragraph) > makes me believe that this requirement in the kernel is incorrect. Looking more closely, I think I'm wrong here. Specifically, the cmsg parser puts IPV6_RTHDRDSTOPTS in dst0opt and IPV6_DSTOPTS in dst1opt. The tunnel code is currently building dst0opt and using ipv6_push_nfrag_opts. Perhaps it should be building dst1opt and calling ipv6_push_frag_opts?
Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
On 4/26/17 9:35 AM, John Fastabend wrote: As Alexei also mentioned before, ifindex vs port makes no real difference seen from the bpf program side. It is userspace's responsibility to add ifindex/port's to the bpf-maps, according to how the bpf program "policy" want to "connect" these ports. The port-table system add one extra step, of also adding this port to the port-table (which lives inside the kernel). I'm not sure I understand the "lives inside the kernel" bit. I assumed the 'map' should be a bpf map and behave like any other bpf map. I wanted a new map to be defined, something like this from the bpf programmer side. struct bpf_map_def SEC("maps") port_table = .type = BPF_MAP_TYPE_PORT_CONNECTION, .key_size = sizeof(u32), .value_size = BPF_PORT_CONNECTION_SIZE, .max_entries = 256, }; I like the idea. We have prog_array, perf_event_array, cgroup_array map specializations. This one can be new netdev_array with some new bpf_redirect-like helper accessing it. When loading the XDP program, we also need to pass along a port table "id" this XDP program is associated with (and if it doesn't exists you create it). And your userspace "control-plane" application also need to know this port table "id", when adding a new port. So the user space application that is loading the program also needs to handle this map. This seems correct to me. But I don't see the value in making some new port table when we already have well understood framework for maps. +1 The concept of having multiple port tables is key. As this implies we can have several simultaneous "data-planes" that is *isolated* from each-other. Think about how network-namespaces/containers want isolation. A subtle thing I'm afraid to mention, is that oppose to the ifindex model, a port table with mapping to a net_device pointer, would allow (faster) delivery into the container's inner net_device, which sort of violates the isolation, but I would argue it is not a problem as this net_device pointer could only be added from a process within the namespace. I like this feature, but it could easily be disallowed via port insertion-time validation. I think the above optimization should be allowed. And agree multiple port tables (maps?) is needed. Again all this points to using standard maps logic in my mind. For permissions and different domains, which I think you were starting to touch on, it looks like we could extend the pinning API. At the moment it does an inode_permission(inode, MAY_WRITE) check but I presume this could be extended. None of this would be needed in v1 and could be added subsequently. read-only maps seems doable. this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated the user space can make it readonly to prevent further changes. From user space it can be done similar to perf_events/cgroups as well. bpf_map_update_elem(&netdev_array, &port_num, &ifindex) should work. For bpf_map_lookup_elem() from such netdev_array we can return ifindex back. The bpf_map_show_fdinfo() can be customized as well to pretty print ifindexes of netdevs stored in there.
Re: [Patch net 3/3] team: use a larger struct for mac address
On 2017-04-26 1:28 PM, Cong Wang wrote: On Wed, Apr 26, 2017 at 9:46 AM, Jarod Wilson wrote: On 2017-04-26 12:11 PM, Cong Wang wrote: On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson wrote: We already have struct sockaddr_storage that could be used throughout this set as well. We just converted a few pieces of the bonding driver over to using it for better support of ipoib bonds, via commit faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use that in both bonding and team, rather than having different per-driver structs, or Yet Another Address Storage implementation. Technically, struct sockaddr_storage is not enough either, given the max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage. Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and sockaddr_storage is a #define for __kernel_sockaddr_storage, which has it's __data member defined as being of size 128 - sizeof(unsigned short). My bad, I thought it is same with sizeof(in6addr) without looking into it. The question is, why do we waste 126 - 32 = 94 bytes on stack to just use struct sockaddr_storage? That's a fair point. I totally understand we want a unified struct, but we already redefine it in multiple places in tree... Something unified and centralized with a data storage of MAX_ADDR_LEN does seem reasonable to get both consistency and minimized waste, and I could certainly do a follow-up patch for the bonding driver to switch the bits now using sockaddr_storage over to whatever new struct gets added. -- Jarod Wilson ja...@redhat.com
RE: qed*: debug infrastructures
Jiri, Florian, Jakub, Thanks all for you suggestions. Some answers to questions posted: The signal tracing in our device can be used for tracing things like load/store/program_counter from our fastpath processors (which handle every packet) which can then be re-run in a simulative environment (recreating the recorded scenario). Other interesting uses for this feature can be partial pci recording or partial network recording (poor man's analyzer) which can also be very effective where full blown lab equipment is unavailable. I reviewed the code of the drivers under hw_tracing (thanks Florian) and I think we might be a good fit. Jiri indicated dpipe was not intended for this sort of thing and suggested an additional dev link object, although it seems to me that this will have to be either a very generic object which would be susceptible to abuse similar to debugfs, or it would be tailored to our device so much that no one else would use it, so I am somewhat less inclined to go down this path (the code abstracting our debug feature is accessed via ~20 api functions accepting ~10 params each, i.e. quite a handful of configuraion to generalize). The ethtool debug dump presets (thanks Jakub) are far too narrow to encompass the full flexibility required here. Dave, I think my next step would be to send an RFC adding to our core module (qed) the necessary APIs (mostly to provide some details to this rather abstract discussion). I will plan to connect those to a new hwtracing driver I'll create for this purpose, unless a different direction is suggested. Thanks, Ariel
Re: Corrupted SKB
On Tue, Apr 25, 2017 at 9:42 PM, Michael Ma wrote: > 2017-04-18 21:46 GMT-07:00 Michael Ma : >> 2017-04-18 16:12 GMT-07:00 Cong Wang : >>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma wrote: Hi - We've implemented a "glue" qdisc similar to mqprio which can associate one qdisc to multiple txqs as the root qdisc. Reference count of the child qdiscs have been adjusted properly in this case so that it represents the number of txqs it has been attached to. However when sending packets we saw the skb from dequeue_skb() corrupted with the following call stack: [exception RIP: netif_skb_features+51] RIP: 815292b3 RSP: 8817f6987940 RFLAGS: 00010246 #9 [8817f6987968] validate_xmit_skb at 815294aa #10 [8817f69879a0] validate_xmit_skb at 8152a0d9 #11 [8817f69879b0] __qdisc_run at 8154a193 #12 [8817f6987a00] dev_queue_xmit at 81529e03 It looks like the skb has already been released since its dev pointer field is invalid. Any clue on how this can be investigated further? My current thought is to add some instrumentation to the place where skb is released and analyze whether there is any race condition happening there. However >>> >>> Either dropwatch or perf could do the work to instrument kfree_skb(). >> >> Thanks - will try it out. > > I'm using perf to collect the callstack for kfree_skb and trying to > correlate that with the corrupted SKB address however when system > crashes the perf.data file is also corrupted - how can I view this > file in case the system crashes before perf exits? Hmm, KASAN is pretty good at detecting use-after-free, its report can nicely shows where we allocate/free it and the use after free. https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kasan.html
Re: [Patch net 3/3] team: use a larger struct for mac address
On Wed, Apr 26, 2017 at 9:46 AM, Jarod Wilson wrote: > On 2017-04-26 12:11 PM, Cong Wang wrote: >> >> On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson wrote: >>> >>> >>> We already have struct sockaddr_storage that could be used throughout >>> this >>> set as well. We just converted a few pieces of the bonding driver over to >>> using it for better support of ipoib bonds, via commit >>> faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use >>> that >>> in both bonding and team, rather than having different per-driver >>> structs, >>> or Yet Another Address Storage implementation. >> >> >> Technically, struct sockaddr_storage is not enough either, given the >> max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage. > > > Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and sockaddr_storage > is a #define for __kernel_sockaddr_storage, which has it's __data member > defined as being of size 128 - sizeof(unsigned short). My bad, I thought it is same with sizeof(in6addr) without looking into it. The question is, why do we waste 126 - 32 = 94 bytes on stack to just use struct sockaddr_storage? I totally understand we want a unified struct, but we already redefine it in multiple places in tree...
Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
Florian Fainelli writes: > On 04/25/2017 04:53 PM, Eric Anholt wrote: >> Cygnus has a single AMAC controller connected to the B53 switch with 2 >> PHYs. On the BCM911360_EP platform, those two PHYs are connected to >> the external ethernet jacks. >> >> Signed-off-by: Eric Anholt >> Reviewed-by: Florian Fainelli >> --- >> >> v2: Call the node "switch", just call the ports "port" (suggestions by >> Florian), drop max-speed on the phys (suggestion by Andrew Lunn), >> call the other nodes "ethernet" and "ethernet-phy" (suggestions by >> Sergei Shtylyov) >> >> arch/arm/boot/dts/bcm-cygnus.dtsi | 58 >> ++ >> arch/arm/boot/dts/bcm911360_entphn.dts | 8 + >> 2 files changed, 66 insertions(+) >> >> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi >> b/arch/arm/boot/dts/bcm-cygnus.dtsi >> index 009f1346b817..9fd89be0f5e0 100644 >> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi >> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi >> @@ -142,6 +142,54 @@ >> interrupts = <0>; >> }; >> >> +mdio: mdio@18002000 { >> +compatible = "brcm,iproc-mdio"; >> +reg = <0x18002000 0x8>; >> +#size-cells = <1>; >> +#address-cells = <0>; > > Sorry for not noticing earlier, since you override this correctly in the > board-level DTS file can you put a: > > status = "disabled" > > property in there by default? Done. signature.asc Description: PGP signature
[PATCH] net: hso: register netdev later to avoid a race condition
If the netdev is accessed before the urbs are initialized, there will be NULL pointer dereferences. That is avoided by registering it when it is fully initialized. This case occurs e.g. if dhcpcd is running in the background and the device is probed, either after insmod hso or when the device appears on the usb bus. A backtrace is the following: [ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap [ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800 [ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0 [ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem [ 1357.574096] usb 1-2: Manufacturer: Option N.V. [ 1357.685882] hso 1-2:1.5: Not our interface [ 1460.886352] hso: unloaded [ 1460.889984] usbcore: deregistering interface driver hso [ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless [ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 0030 [ 1513.887664] hso 1-2:1.5: Not our interface [ 1513.906890] usbcore: registered new interface driver hso [ 1513.937988] pgd = ecdec000 [ 1513.949890] [0030] *pgd=acd15831, *pte=, *ppte= [ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM [ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007 [ 1514.037384] bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso] [ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: GW 4.11.0-rc8-letux+ #1 [ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 1514.077758] task: ee748240 task.stack: ecdd6000 [ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso] [ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso] [ 1514.093231] pc : []lr : []psr: a00f0013 sp : ecdd7e20 ip : fp : [ 1514.105316] r10: r9 : ed0e080c r8 : ecd8fe2c [ 1514.110839] r7 : bf79cef4 r6 : ecd8fe00 r5 : r4 : ed0dbd80 [ 1514.117706] r3 : r2 : c0020c80 r1 : r0 : ecdb7800 [ 1514.124572] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 1514.132110] Control: 10c5387d Table: acdec019 DAC: 0051 [ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218) [ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000) [ 1514.149078] 7e20: ed0dbd80 ecd8fe98 0001 ecd8f800 ecd8fe00 ecd8fe60 [ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 0001 bf7a0258 ecd8f830 c068d958 [ 1514.166320] 7e60: c068d8b8 ecd8f800 0001 1091 1090 c068dba4 ecd8f800 1090 [ 1514.174926] 7e80: ecd8f940 ecd8f800 c068dc60 0001 ed0e0800 ecd8f800 [ 1514.183563] 7ea0: c06feaa8 c0ca39c2 beea57dc 0020 306f7368 [ 1514.192169] 7ec0: 1091 8914 [ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 0006 00046858 c066a948 [ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 0006 c0282c90 c000 c0283654 [ 1514.218017] 7f20: c09b0c00 c098bc31 0001 c0c5e513 c0c5e513 c0151354 c01a20c0 [ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 ee748240 [ 1514.235260] 7f60: ee748734 ecc6b0c0 ecc6b0c0 beea57dc 8914 0006 [ 1514.243896] 7f80: 00046858 c02837b0 1091 0003a1f0 00046608 0003a248 0036 c01071e4 [ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 0006 8914 beea57dc 1091 [ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 0036 0003ac0c 00046608 00046610 00046858 [ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 0006 [ 1514.278411] [] (hso_start_net_device [hso]) from [] (hso_net_open+0x68/0x84 [hso]) [ 1514.288238] [] (hso_net_open [hso]) from [] (__dev_open+0xa0/0xf4) [ 1514.296600] [] (__dev_open) from [] (__dev_change_flags+0x8c/0x130) [ 1514.305023] [] (__dev_change_flags) from [] (dev_change_flags+0x18/0x48) [ 1514.313934] [] (dev_change_flags) f
Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
Florian Fainelli writes: > On 04/25/2017 04:53 PM, Eric Anholt wrote: >> Cygnus has a single AMAC controller connected to the B53 switch with 2 >> PHYs. On the BCM911360_EP platform, those two PHYs are connected to >> the external ethernet jacks. >> >> Signed-off-by: Eric Anholt >> Reviewed-by: Florian Fainelli >> --- >> >> v2: Call the node "switch", just call the ports "port" (suggestions by >> Florian), drop max-speed on the phys (suggestion by Andrew Lunn), >> call the other nodes "ethernet" and "ethernet-phy" (suggestions by >> Sergei Shtylyov) >> >> arch/arm/boot/dts/bcm-cygnus.dtsi | 58 >> ++ >> arch/arm/boot/dts/bcm911360_entphn.dts | 8 + >> 2 files changed, 66 insertions(+) >> >> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi >> b/arch/arm/boot/dts/bcm-cygnus.dtsi >> index 009f1346b817..9fd89be0f5e0 100644 >> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi >> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi >> @@ -142,6 +142,54 @@ >> interrupts = <0>; >> }; >> >> +mdio: mdio@18002000 { >> +compatible = "brcm,iproc-mdio"; >> +reg = <0x18002000 0x8>; >> +#size-cells = <1>; >> +#address-cells = <0>; > > Sorry for not noticing earlier, since you override this correctly in the > board-level DTS file can you put a: > > status = "disabled" > > property in there by default? I didn't have the override in the board file either, just switch and ethernet. Fixed. signature.asc Description: PGP signature
Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
Andrew Lunn writes: >> +eth0: ethernet@18042000 { >> +compatible = "brcm,amac"; >> +reg = <0x18042000 0x1000>, >> + <0x1811 0x1000>; >> +reg-names = "amac_base", "idm_base"; >> +interrupts = ; >> +max-speed = <1000>; > > Hi Eric > > Sorry i missed this the first time. Does this Ethernet controller do > > 1Gbps? Does this max-speed do anything useful? It doesn't look like it. Dropped. signature.asc Description: PGP signature