Re: [RFC PATCH net-next 00/12] selftests: forwarding: Add VRF-based tests
On Mon, Jan 15, 2018 at 04:48:25PM -0700, David Ahern wrote: > On 1/15/18 4:17 PM, Jiri Pirko wrote: > >> A couple of feature requests: > >> 1. an option to pause on any error to allow inspection of the setup > > > > Good idea. Should be easy to add. > > Here is a snippet from my vrf test script: > > PAUSE_ON_FAIL=no > -p option sets PAUSE_ON_FAIL=yes > > log_test() > { > local rc=$1 > local expected=$2 > local msg="$3" > > if [ ${rc} -eq ${expected} ]; then > nsuccess=$((nsuccess+1)) > printf "\nTEST: %-80s [ OK ]\n" "${msg}" > else > nfail=$((nfail+1)) > printf "\nTEST: %-80s [FAIL]\n" "${msg}" > if [ "${PAUSE_ON_FAIL}" = "yes" ]; then > echo > echo "hit enter to continue, 'q' to quit" > read a > [ "$a" = "q" ] && exit 1 > fi > fi > } Nice. Will add. [...] > >> 2. an option to configure the system and leave it in that state (ie, > >> don't trap exit and run cleanup). By extension, an option is needed to > >> do cleanup only. > > > > Checkout the last patch. It has "noprepare" and "nocleanup" options. > > So I guess you imagine something like that, but generic? > > > > Sure that is one way. I think we can do something similar to your 'PAUSE_ON_FAIL' option. At the end of the run the system is supposed to be configured as it was in the beginning of the test, so we can have the trap wait for user to hit 'c' for cleanup if the option is set. By default it will run cleanup. > Something else I have found useful is to not redirect stdout/stderr from > the commands and to have tags that can be grep'ed to provide a summary. > I run my VRF test script as: > > $ run-test.sh 2>&1 | tee vrf-results.txt | grep TEST Good idea. Will change.
Re: KASAN: slab-out-of-bounds Write in tcp_v6_syn_recv_sock
On Thu, Jan 4, 2018 at 12:31 AM, Cong Wang wrote: > On Wed, Jan 3, 2018 at 12:55 PM, Ozgur wrote: >> >> >> 03.01.2018, 21:57, "Cong Wang" : >>> On Tue, Jan 2, 2018 at 3:58 PM, syzbot >>> wrote: Hello, syzkaller hit the following crash on 61233580f1f33c50e159c50e24d80ffd2ba2e06b git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See https://goo.gl/kgGztJ for information about syzkaller reproducers IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+6dc95bddc6976b800...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters. == BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:344 [inline] BUG: KASAN: slab-out-of-bounds in tcp_v6_syn_recv_sock+0x628/0x23a0 net/ipv6/tcp_ipv6.c:1144 Write of size 160 at addr 8801cbdd7460 by task syzkaller545407/3196 CPU: 1 PID: 3196 Comm: syzkaller545407 Not tainted 4.15.0-rc5+ #241 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 memcpy include/linux/string.h:344 [inline] tcp_v6_syn_recv_sock+0x628/0x23a0 net/ipv6/tcp_ipv6.c:1144 >>> >>> tls_init() changes sk->sk_prot from IPv6 to IPv4, which leads >>> to this bug. I guess IPv6 is not supported for TLS? If so, need >>> a check on proto in tls_init()... >> >> Hello, >> >> I think IPv6 supports with TLS. >> There was a previously posted commit by Mellanox: >> >> https://patchwork.ozlabs.org/patch/801530/ > > Good to know. > > Can you resend the fix? It could probably fix another warning > reported by syzbot too. FTR this has been assigned CVE-2018-5703
Re: [PATCH] Remove structure passing and assignment to save stack and no coping structures.
Hi Karim, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.15-rc8 next-20180115] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Karim-Eshapa/Remove-structure-passing-and-assignment-to-save-stack-and-no-coping-structures/20180116-130502 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> kernel/bpf/verifier.c:1002:29: sparse: incorrect type in argument 2 >> (different modifiers) @@ expected struct tnum @@ got structstruct tnum @@ kernel/bpf/verifier.c:1002:29: expected struct tnum kernel/bpf/verifier.c:1002:29: got struct tnum const >> kernel/bpf/verifier.c:1003:17: sparse: not addressable kernel/bpf/verifier.c:1028:29: sparse: incorrect type in argument 2 (different modifiers) @@ expected struct tnum @@ got structstruct tnum @@ kernel/bpf/verifier.c:1028:29: expected struct tnum kernel/bpf/verifier.c:1028:29: got struct tnum const kernel/bpf/verifier.c:1029:17: sparse: not addressable kernel/bpf/verifier.c:1969:46: sparse: incorrect type in argument 2 (different modifiers) @@ expected struct tnum @@ got structstruct tnum @@ kernel/bpf/verifier.c:1969:46: expected struct tnum kernel/bpf/verifier.c:1969:46: got struct tnum const >> kernel/bpf/verifier.c:1970:26: sparse: incorrect type in argument 3 >> (different modifiers) @@ expected struct tnum lib @@ got structstruct tnum >> lib @@ kernel/bpf/verifier.c:1970:26: expected struct tnum lib kernel/bpf/verifier.c:1970:26: got struct tnum const kernel/bpf/verifier.c:4527:38: sparse: subtraction of Share your drugs >> kernel/bpf/verifier.c:1002:17: sparse: call with no type! kernel/bpf/verifier.c:1028:17: sparse: call with no type! kernel/bpf/verifier.c: In function 'check_pkt_ptr_alignment': kernel/bpf/verifier.c:1003:3: error: lvalue required as unary '&' operand &tnum_const(ip_align + reg->off + off)); ^ kernel/bpf/verifier.c:1002:21: warning: passing argument 2 of 'tnum_add' discards 'const' qualifier from pointer target type tnum_add(®_off, ®->var_off, ^ In file included from include/linux/bpf_verifier.h:12:0, from kernel/bpf/verifier.c:17: include/linux/tnum.h:29:6: note: expected 'struct tnum but argument is of type 'const struct tnum void tnum_add(struct tnum struct tnum struct tnum ^~~~ kernel/bpf/verifier.c: In function 'check_generic_ptr_alignment': kernel/bpf/verifier.c:1029:3: error: lvalue required as unary '&' operand &tnum_const(reg->off + off)); ^ kernel/bpf/verifier.c:1028:21: warning: passing argument 2 of 'tnum_add' discards 'const' qualifier from pointer target type tnum_add(®_off, ®->var_off, ^ In file included from include/linux/bpf_verifier.h:12:0, from kernel/bpf/verifier.c:17: include/linux/tnum.h:29:6: note: expected 'struct tnum but argument is of type 'const struct tnum void tnum_add(struct tnum struct tnum struct tnum ^~~~ kernel/bpf/verifier.c: In function 'adjust_ptr_min_max_vals': kernel/bpf/verifier.c:1969:31: warning: passing argument 2 of 'tnum_add' discards 'const' qualifier from pointer target type tnum_add(&dst_reg->var_off, &ptr_reg->var_off, ^ In file included from include/linux/bpf_verifier.h:12:0, from kernel/bpf/verifier.c:17: include/linux/tnum.h:29:6: note: expected 'struct tnum but argument is of type 'const struct tnum void tnum_add(struct tnum struct tnum struct tnum ^~~~ kernel/bpf/verifier.c:1970:4: warning: passing argument 3 of 'tnum_add' discards 'const' qualifier from pointer target type &off_reg->var_off); ^ In file included from include/linux/bpf_verifier.h:12:0, from kernel/bpf/verifier.c:17: include/linux/tnum.h:29:6: note: expected 'struct tnum but argument is of type 'const struct tnum void tnum_add(struct tnum struct tnum struct tnum ^~~~ vim +1002 kernel/bpf/verifier.c 980 981 static int check_pkt_ptr_alignment(struct bpf_verifier_env *env, 982 const struct bpf_reg_state *reg, 983 int off, int size, bool strict) 984 { 985 struct tnum reg_off; 986 int ip_align; 987 988 /* Byte size accesses are always allowed. */ 989 if (!strict || size == 1) 990 return 0; 991 992 /* For platforms that do not have a Kconfig enabling 993 * CONFIG_HAVE_EFFI
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On Mon, Jan 15, 2018 at 10:37 PM, Jason Wang wrote: > > > On 2018年01月16日 14:33, Cong Wang wrote: >> But __tun_detach(true) is not a hot path, a memset() doesn't harm >> anything. > > > Yes, but it looks more more like a workaround or trick to me. > I'd blame skb_array API for this. ;) Ideally, skb_array_cleanup() should take care of everything I put in tun_cleanup_tx_array(). As I mentioned in changelog, we can always improve it in -net-next, so I don't want to bother it for -net.
Re: [PATCH net-next] net: sched: fix use before alloc of per cpu stats
On Mon, Jan 15, 2018 at 10:33 PM, Prashant Bhole wrote: > Pardon my ignorance, I think it is necessary to forward d59f5ffa59d8 to > -net. Because previously flags were set during init and checked after init > to allocate memory. static_flags makes the flags available before init, > hence we can allocate memory before init. Oh, right, but we can allocate it unconditionally and free it if not needed after ->init(). It is not prettier though. ;) > Sorry, but from your reply I couldn't understand whether we need v2 for > net-next. > I don't think you need to send v2, because the one for -net will be merged (with a conflict) into -net-next by DaveM.
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On 2018年01月16日 14:33, Cong Wang wrote: On Mon, Jan 15, 2018 at 10:12 PM, Jason Wang wrote: On 2018年01月16日 14:07, Cong Wang wrote: On Mon, Jan 15, 2018 at 10:00 PM, Jason Wang wrote: I mean we can leave __tun_detach() as is, and just add the cleanup to tun_detach_all(). This is because in both cases, we're sure skb array has been initialized before. Oh, I thought the same before sending v3, but I believe it is easier to understand 'if (tfile->tx_array.ring.queue)' than 'if (tun)', because tx_array only depends on itself rather tfile->tun in this way. Maybe just add a comment to explain in __tun_detach(), it avoids memset() anyway. But __tun_detach(true) is not a hot path, a memset() doesn't harm anything. Yes, but it looks more more like a workaround or trick to me. Thanks
Re: [PATCH net-next] net: sched: fix use before alloc of per cpu stats
On 1/16/2018 2:57 PM, Cong Wang wrote: On Mon, Jan 15, 2018 at 9:47 PM, Prashant Bhole wrote: On 1/16/2018 2:08 PM, Cong Wang wrote: On Sun, Jan 14, 2018 at 9:52 PM, Prashant Bhole wrote: About bug: During init of clsact/ingress, it links qdisc's cpu_bstats,cpu_qstats with mini qdisc. TCQ_F_CPUSTATS is set in qdisc->flags during init and this flag is checked after init to allocate memory for stats. Hence mini qdisc points to null per-cpu-stats. The problem isn't caught while updating stats via mini qdisc because per_cpu_ptr(NULL, cpu_num) or this_cpu_ptr(NULL) gives a valid pointer. About fix: Currently stats memory is allocated at two places. - in qdisc_alloc() if TCQ_F_CPUSTATS is set in Qdisc_ops->static_flags - in qdisc_create() if TCQ_F_CPUSTATS is set in Qdisc->flags Qdisc_ops->static_flags is propagated to Qdisc->flags. So to fix this bug, we set TCQ_F_CPUSTATS in static flags and additional condition to avoid allocation after init. Good catch! One nit below. Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath") Signed-off-by: Prashant Bhole --- net/sched/sch_api.c | 3 ++- net/sched/sch_ingress.c | 6 ++ 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 8a04c36e579f..de99a5e80944 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1094,7 +1094,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev, goto err_out5; } - if (qdisc_is_percpu_stats(sch)) { + if (!(ops->static_flags & TCQ_F_CPUSTATS) && + qdisc_is_percpu_stats(sch)) { Instead of checking both flags, it is simpler to just check if sch->cpu_bstats and sch->cpu_qstats are NULL here? Also, this patch should go to -net tree, not just net-next, but -net doesn't have ops->static_flags yet. ;) Given this, consider moving the sch->cpu_bstats allocation before ops->init(), which might be simpler. Cong, Thanks for reviewing. Based on this patch, Daniel submitted another patch for -net: http://patchwork.ozlabs.org/patch/861098/ Odd, I don't even see it in my inbox... Let me know if we still need v2 of my patch on -net-next (with your suggested nit picks). It is okay, but it doesn't have to forward commit d59f5ffa59d8 to net, reordering allocation before ->init() should be simpler. Pardon my ignorance, I think it is necessary to forward d59f5ffa59d8 to -net. Because previously flags were set during init and checked after init to allocate memory. static_flags makes the flags available before init, hence we can allocate memory before init. Sorry, but from your reply I couldn't understand whether we need v2 for net-next. -Prashant
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On Mon, Jan 15, 2018 at 10:12 PM, Jason Wang wrote: > > > On 2018年01月16日 14:07, Cong Wang wrote: >> >> On Mon, Jan 15, 2018 at 10:00 PM, Jason Wang wrote: >>> >>> I mean we can leave __tun_detach() as is, and just add the cleanup to >>> tun_detach_all(). This is because in both cases, we're sure skb array has >>> been initialized before. >>> >> Oh, I thought the same before sending v3, but I believe it is easier to >> understand 'if (tfile->tx_array.ring.queue)' than 'if (tun)', because >> tx_array >> only depends on itself rather tfile->tun in this way. > > > Maybe just add a comment to explain in __tun_detach(), it avoids memset() > anyway. > But __tun_detach(true) is not a hot path, a memset() doesn't harm anything.
Re: [PATCH net-next 2/2] net: sched: add xfrm policy ematch
On Fri, Jan 12, 2018 at 4:57 AM, Eyal Birger wrote: > +static void em_policy_destroy(struct tcf_ematch *em) > +{ > + const struct xt_policy_info *info = (const void *)em->data; > + > + if (!info) > + return; > + > + kfree((void *)em->data); > +} Nit: kfree() could handle NULL, no need to check.
Re: [PATCH net-next v5 2/4] phy: cp110-comphy: 2.5G SGMII mode
On Friday 12 January 2018 01:21 PM, Antoine Tenart wrote: > This patch allow the CP100 comphy to configure some lanes in the > 2.5G SGMII mode. This mode is quite close to SGMII and uses nearly the > same code path. > > Signed-off-by: Antoine Tenart Acked-by: Kishon Vijay Abraham I > --- > drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 17 ++--- > 1 file changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > index a0d522154cdf..4ef429250d7b 100644 > --- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > @@ -135,19 +135,25 @@ struct mvebu_comhy_conf { > static const struct mvebu_comhy_conf mvebu_comphy_cp110_modes[] = { > /* lane 0 */ > MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1), > + MVEBU_COMPHY_CONF(0, 1, PHY_MODE_2500SGMII, 0x1), > /* lane 1 */ > MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1), > + MVEBU_COMPHY_CONF(1, 2, PHY_MODE_2500SGMII, 0x1), > /* lane 2 */ > MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1), > + MVEBU_COMPHY_CONF(2, 0, PHY_MODE_2500SGMII, 0x1), > MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1), > /* lane 3 */ > MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2), > + MVEBU_COMPHY_CONF(3, 1, PHY_MODE_2500SGMII, 0x2), > /* lane 4 */ > MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2), > + MVEBU_COMPHY_CONF(4, 0, PHY_MODE_2500SGMII, 0x2), > MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2), > MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1), > /* lane 5 */ > MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1), > + MVEBU_COMPHY_CONF(5, 2, PHY_MODE_2500SGMII, 0x1), > }; > > struct mvebu_comphy_priv { > @@ -206,6 +212,10 @@ static void mvebu_comphy_ethernet_init_reset(struct > mvebu_comphy_lane *lane, > if (mode == PHY_MODE_10GKR) > val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0xe) | > MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0xe); > + else if (mode == PHY_MODE_2500SGMII) > + val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x8) | > +MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x8) | > +MVEBU_COMPHY_SERDES_CFG0_HALF_BUS; > else if (mode == PHY_MODE_SGMII) > val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x6) | > MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x6) | > @@ -296,13 +306,13 @@ static int mvebu_comphy_init_plls(struct > mvebu_comphy_lane *lane, > return 0; > } > > -static int mvebu_comphy_set_mode_sgmii(struct phy *phy) > +static int mvebu_comphy_set_mode_sgmii(struct phy *phy, enum phy_mode mode) > { > struct mvebu_comphy_lane *lane = phy_get_drvdata(phy); > struct mvebu_comphy_priv *priv = lane->priv; > u32 val; > > - mvebu_comphy_ethernet_init_reset(lane, PHY_MODE_SGMII); > + mvebu_comphy_ethernet_init_reset(lane, mode); > > val = readl(priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id)); > val &= ~MVEBU_COMPHY_RX_CTRL1_CLK8T_EN; > @@ -487,7 +497,8 @@ static int mvebu_comphy_power_on(struct phy *phy) > > switch (lane->mode) { > case PHY_MODE_SGMII: > - ret = mvebu_comphy_set_mode_sgmii(phy); > + case PHY_MODE_2500SGMII: > + ret = mvebu_comphy_set_mode_sgmii(phy, lane->mode); > break; > case PHY_MODE_10GKR: > ret = mvebu_comphy_set_mode_10gkr(phy); >
Re: [PATCH net-next v5 1/4] phy: add 2.5G SGMII mode to the phy_mode enum
On Tuesday 16 January 2018 12:51 AM, David Miller wrote: > From: Antoine Tenart > Date: Fri, 12 Jan 2018 08:51:27 +0100 > >> This patch adds one more generic PHY mode to the phy_mode enum, to allow >> configuring generic PHYs to the 2.5G SGMII mode by using the set_mode >> callback. >> >> Signed-off-by: Antoine Tenart Acked-by: Kishon Vijay Abraham I > > PHY layer folks, and reviews please? > >> --- >> include/linux/phy/phy.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h >> index 4f8423a948d5..5a80e9de3686 100644 >> --- a/include/linux/phy/phy.h >> +++ b/include/linux/phy/phy.h >> @@ -28,6 +28,7 @@ enum phy_mode { >> PHY_MODE_USB_DEVICE, >> PHY_MODE_USB_OTG, >> PHY_MODE_SGMII, >> +PHY_MODE_2500SGMII, >> PHY_MODE_10GKR, >> PHY_MODE_UFS_HS_A, >> PHY_MODE_UFS_HS_B, >> -- >> 2.14.3 >>
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On 2018年01月16日 14:07, Cong Wang wrote: On Mon, Jan 15, 2018 at 10:00 PM, Jason Wang wrote: I mean we can leave __tun_detach() as is, and just add the cleanup to tun_detach_all(). This is because in both cases, we're sure skb array has been initialized before. Oh, I thought the same before sending v3, but I believe it is easier to understand 'if (tfile->tx_array.ring.queue)' than 'if (tun)', because tx_array only depends on itself rather tfile->tun in this way. Maybe just add a comment to explain in __tun_detach(), it avoids memset() anyway. Thanks
Re: [PATCH net] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
On Tue, Jan 16, 2018 at 2:58 AM, Neil Horman wrote: > On Tue, Jan 16, 2018 at 01:20:28AM +0800, Xin Long wrote: >> On Mon, Jan 15, 2018 at 9:06 PM, Neil Horman wrote: >> > On Mon, Jan 15, 2018 at 05:01:36PM +0800, Xin Long wrote: >> >> After commit cea0cc80a677 ("sctp: use the right sk after waking up from >> >> wait_buf sleep"), it may change to lock another sk if the asoc has been >> >> peeled off in sctp_wait_for_sndbuf. >> >> >> >> However, the asoc's new sk could be already closed elsewhere, as it's in >> >> the sendmsg context of the old sk that can't avoid the new sk's closing. >> >> If the sk's last one refcnt is held by this asoc, later on after putting >> >> this asoc, the new sk will be freed, while under it's own lock. >> >> >> >> This patch is to revert that commit, but fix the old issue by returning >> >> error under the old sk's lock. >> >> >> >> Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from >> >> wait_buf sleep") >> >> Reported-by: syzbot+ac6ea7baa4432811e...@syzkaller.appspotmail.com >> >> Signed-off-by: Xin Long >> >> --- >> >> net/sctp/socket.c | 16 ++-- >> >> 1 file changed, 6 insertions(+), 10 deletions(-) >> >> >> >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >> >> index 15ae018..feb2ca6 100644 >> >> --- a/net/sctp/socket.c >> >> +++ b/net/sctp/socket.c >> >> @@ -85,7 +85,7 @@ >> >> static int sctp_writeable(struct sock *sk); >> >> static void sctp_wfree(struct sk_buff *skb); >> >> static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long >> >> *timeo_p, >> >> - size_t msg_len, struct sock **orig_sk); >> >> + size_t msg_len); >> >> static int sctp_wait_for_packet(struct sock *sk, int *err, long >> >> *timeo_p); >> >> static int sctp_wait_for_connect(struct sctp_association *, long >> >> *timeo_p); >> >> static int sctp_wait_for_accept(struct sock *sk, long timeo); >> >> @@ -1977,7 +1977,7 @@ static int sctp_sendmsg(struct sock *sk, struct >> >> msghdr *msg, size_t msg_len) >> >> timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); >> >> if (!sctp_wspace(asoc)) { >> >> /* sk can be changed by peel off when waiting for buf. */ >> >> - err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk); >> >> + err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); >> >> if (err) { >> >> if (err == -ESRCH) { >> >> /* asoc is already dead. */ >> >> @@ -8022,12 +8022,12 @@ void sctp_sock_rfree(struct sk_buff *skb) >> >> >> >> /* Helper function to wait for space in the sndbuf. */ >> >> static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long >> >> *timeo_p, >> >> - size_t msg_len, struct sock **orig_sk) >> >> + size_t msg_len) >> >> { >> >> struct sock *sk = asoc->base.sk; >> >> - int err = 0; >> >> long current_timeo = *timeo_p; >> >> DEFINE_WAIT(wait); >> >> + int err = 0; >> >> >> >> pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc, >> >>*timeo_p, msg_len); >> >> @@ -8056,17 +8056,13 @@ static int sctp_wait_for_sndbuf(struct >> >> sctp_association *asoc, long *timeo_p, >> >> release_sock(sk); >> >> current_timeo = schedule_timeout(current_timeo); >> >> lock_sock(sk); >> >> - if (sk != asoc->base.sk) { >> >> - release_sock(sk); >> >> - sk = asoc->base.sk; >> >> - lock_sock(sk); >> >> - } >> >> + if (sk != asoc->base.sk) >> >> + goto do_error; >> > Is this a safe comparison to make (thinking in terms both of non-cache >> > coherent >> > arches, or, more likely, of cases where the sock slab reuses an object >> > leading >> > to the same pointer). Would it be better to have a single point of >> > freeing the >> > sock and use the SOCK_DEAD flag here? >> Hi, Neil, You meant leading to 'asoc->base.sk is the same as sk' ? >> Here sk is being used in it's sendmsg context, this sk can't even be closed. > if thats the case, then I'm confused. Your changelog message asserted that > the > existing mechanism was broken because the socket might get closed during the > execution of this code. Can you provide a example of how the current > implementation might break? Here are two SKs, asoc's NEW sk and OLD sk. "However, the asoc's new sk could be already closed elsewhere, as it's in the sendmsg context of the old sk that can't avoid the new sk's closing." It's in asoc's OLD sk's sendmsg, the asoc's NEW sk can be closed elsewhere. Example: If it's in wait_buf. After peeling off the assoc and returning the NEW sk, just close() this NEW sk. Please let me know if it's still confusing. :-) > >> it's impossible that the sock slab may reuses this sk(still alive) to >> asoc->base.sk in s
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On Mon, Jan 15, 2018 at 10:00 PM, Jason Wang wrote: > I mean we can leave __tun_detach() as is, and just add the cleanup to > tun_detach_all(). This is because in both cases, we're sure skb array has > been initialized before. > Oh, I thought the same before sending v3, but I believe it is easier to understand 'if (tfile->tx_array.ring.queue)' than 'if (tun)', because tx_array only depends on itself rather tfile->tun in this way.
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On 2018年01月16日 13:49, Cong Wang wrote: On Mon, Jan 15, 2018 at 9:46 PM, Jason Wang wrote: I think then you don't even need the memset trick since we are sure it has been implemented? It doesn't look like sk_alloc() zero's the memory of tfile. Typo, for "implemented" I mean "initialized". I mean we can leave __tun_detach() as is, and just add the cleanup to tun_detach_all(). This is because in both cases, we're sure skb array has been initialized before. Thanks
Re: [PATCH net-next] net: sched: fix use before alloc of per cpu stats
On Mon, Jan 15, 2018 at 9:47 PM, Prashant Bhole wrote: > > > On 1/16/2018 2:08 PM, Cong Wang wrote: >> >> On Sun, Jan 14, 2018 at 9:52 PM, Prashant Bhole >> wrote: >>> >>> About bug: >>> During init of clsact/ingress, it links qdisc's cpu_bstats,cpu_qstats >>> with mini qdisc. TCQ_F_CPUSTATS is set in qdisc->flags during init and >>> this flag is checked after init to allocate memory for stats. >>> >>> Hence mini qdisc points to null per-cpu-stats. The problem isn't caught >>> while updating stats via mini qdisc because per_cpu_ptr(NULL, cpu_num) >>> or this_cpu_ptr(NULL) gives a valid pointer. >>> >>> About fix: >>> Currently stats memory is allocated at two places. >>> - in qdisc_alloc() if TCQ_F_CPUSTATS is set in Qdisc_ops->static_flags >>> - in qdisc_create() if TCQ_F_CPUSTATS is set in Qdisc->flags >>> >>> Qdisc_ops->static_flags is propagated to Qdisc->flags. So to fix this >>> bug, >>> we set TCQ_F_CPUSTATS in static flags and additional condition to avoid >>> allocation after init. >> >> >> >> Good catch! One nit below. >> >> >>> >>> Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage >>> of tp->q for clsact fastpath") >>> Signed-off-by: Prashant Bhole >>> --- >>> net/sched/sch_api.c | 3 ++- >>> net/sched/sch_ingress.c | 6 ++ >>> 2 files changed, 4 insertions(+), 5 deletions(-) >>> >>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c >>> index 8a04c36e579f..de99a5e80944 100644 >>> --- a/net/sched/sch_api.c >>> +++ b/net/sched/sch_api.c >>> @@ -1094,7 +1094,8 @@ static struct Qdisc *qdisc_create(struct net_device >>> *dev, >>> goto err_out5; >>> } >>> >>> - if (qdisc_is_percpu_stats(sch)) { >>> + if (!(ops->static_flags & TCQ_F_CPUSTATS) && >>> + qdisc_is_percpu_stats(sch)) { >> >> >> Instead of checking both flags, it is simpler to just check if >> sch->cpu_bstats >> and sch->cpu_qstats are NULL here? >> >> Also, this patch should go to -net tree, not just net-next, but -net >> doesn't have ops->static_flags yet. ;) Given this, consider moving >> the sch->cpu_bstats allocation before ops->init(), which might be >> simpler. > > > Cong, > Thanks for reviewing. Based on this patch, Daniel submitted another patch > for -net: > http://patchwork.ozlabs.org/patch/861098/ Odd, I don't even see it in my inbox... > > Let me know if we still need v2 of my patch on -net-next (with your > suggested nit picks). It is okay, but it doesn't have to forward commit d59f5ffa59d8 to net, reordering allocation before ->init() should be simpler.
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On Mon, Jan 15, 2018 at 9:46 PM, Jason Wang wrote: > > > I think then you don't even need the memset trick since we are sure it has > been implemented? It doesn't look like sk_alloc() zero's the memory of tfile.
Re: [PATCH net-next] net: sched: fix use before alloc of per cpu stats
On 1/16/2018 2:08 PM, Cong Wang wrote: On Sun, Jan 14, 2018 at 9:52 PM, Prashant Bhole wrote: About bug: During init of clsact/ingress, it links qdisc's cpu_bstats,cpu_qstats with mini qdisc. TCQ_F_CPUSTATS is set in qdisc->flags during init and this flag is checked after init to allocate memory for stats. Hence mini qdisc points to null per-cpu-stats. The problem isn't caught while updating stats via mini qdisc because per_cpu_ptr(NULL, cpu_num) or this_cpu_ptr(NULL) gives a valid pointer. About fix: Currently stats memory is allocated at two places. - in qdisc_alloc() if TCQ_F_CPUSTATS is set in Qdisc_ops->static_flags - in qdisc_create() if TCQ_F_CPUSTATS is set in Qdisc->flags Qdisc_ops->static_flags is propagated to Qdisc->flags. So to fix this bug, we set TCQ_F_CPUSTATS in static flags and additional condition to avoid allocation after init. Good catch! One nit below. Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath") Signed-off-by: Prashant Bhole --- net/sched/sch_api.c | 3 ++- net/sched/sch_ingress.c | 6 ++ 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 8a04c36e579f..de99a5e80944 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1094,7 +1094,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev, goto err_out5; } - if (qdisc_is_percpu_stats(sch)) { + if (!(ops->static_flags & TCQ_F_CPUSTATS) && + qdisc_is_percpu_stats(sch)) { Instead of checking both flags, it is simpler to just check if sch->cpu_bstats and sch->cpu_qstats are NULL here? Also, this patch should go to -net tree, not just net-next, but -net doesn't have ops->static_flags yet. ;) Given this, consider moving the sch->cpu_bstats allocation before ops->init(), which might be simpler. Cong, Thanks for reviewing. Based on this patch, Daniel submitted another patch for -net: http://patchwork.ozlabs.org/patch/861098/ Let me know if we still need v2 of my patch on -net-next (with your suggested nit picks). -Prashant
Re: [Patch net v3] tun: fix a memory leak for tfile->tx_array
On 2018年01月16日 03:37, Cong Wang wrote: tfile->tun could be detached before we close the tun fd, via tun_detach_all(), so it should not be used to check for tfile->tx_array. As Jason suggested, we probably have to clean it up unconditionally both in __tun_deatch() and tun_detach_all(), but this requires to check if it is initialized or not. Currently skb_array_cleanup() doesn't have such a check, so I check it in the caller and introduce a helper function, it is a bit ugly but we can always improve it in net-next. Reported-by: Dmitry Vyukov Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Cc: Jason Wang Signed-off-by: Cong Wang --- drivers/net/tun.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 4f4a842a1c9c..a8ec589d1359 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -611,6 +611,14 @@ static void tun_queue_purge(struct tun_file *tfile) skb_queue_purge(&tfile->sk.sk_error_queue); } +static void tun_cleanup_tx_array(struct tun_file *tfile) +{ + if (tfile->tx_array.ring.queue) { + skb_array_cleanup(&tfile->tx_array); + memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + } +} + static void __tun_detach(struct tun_file *tfile, bool clean) { struct tun_file *ntfile; @@ -657,8 +665,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) tun->dev->reg_state == NETREG_REGISTERED) unregister_netdevice(tun->dev); } - if (tun) - skb_array_cleanup(&tfile->tx_array); + tun_cleanup_tx_array(tfile); sock_put(&tfile->sk); } } @@ -700,11 +707,13 @@ static void tun_detach_all(struct net_device *dev) /* Drop read queue */ tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) { tun_enable_queue(tfile); tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } BUG_ON(tun->numdisabled != 0); @@ -2851,6 +2860,8 @@ static int tun_chr_open(struct inode *inode, struct file * file) sock_set_flag(&tfile->sk, SOCK_ZEROCOPY); + memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + return 0; } I think then you don't even need the memset trick since we are sure it has been implemented? Thanks
Re: [PATCH] brcmfmac: Make sure CLM downloading is optional
On Mon 15 Jan 11:40 PST 2018, Arend van Spriel wrote: > On 1/15/2018 6:10 PM, Bjorn Andersson wrote: > > The presence of a CLM file is described as optional, but missing the clm > > blob causes the preinit to return unsuccessfully. Fix this by ignoring > > the return value of the brcmf_c_process_clm_blob(). > > > > Also remove the extra debug print, as brcmf_c_process_clm_blob() already > > did print a useful error message before returning. > > > > Fixes: fdd0bd88ceae ("brcmfmac: add CLM download support") > > Cc: sta...@vger.kernel.org > > Signed-off-by: Bjorn Andersson > > --- > > > > This regression was introduced in v4.15-rc1, but I unfortunately didn't test > > WiFi until now. Included a Cc to stable@ in case you choose to pick this up > > after v4.15. > > Hi Bjorn, > > Thanks for looking into this. Actually there already have been a couple of > fixes posted on the linux-wireless list. [1] was rejected, [2] is being > discussed, and [2] was posted by me and I was awaiting response from the guy > reporting it. > Thanks for pointing me to these discussions, I did for some reason not find them this morning. I don't see the need for the retry in [1], so I think that's invalid. I tested [2] and it works well for me, but I agree with Kalle that a better description of why would be in order. Unfortunatley I can't find it in my inbox...even though I'm subscribed to linux-wireless@. The answer to Kalle's question should probably include a reference to 0542ad88fbdd ("firmware loader: Fix _request_firmware_load() return val for fw load abort") > Now the thing is that for old (Broadcom) firmware this is optional. Those > firmwares don't even have CLM support and those that do have the CLM data > embedded in firmware. I don't know which applies to my device, but it at least doesn't come with a dedicated clm blob - and the device won't receive any upgrades and hence will never get a clm blob. > However, Cypress wants to provide their customers with > firmware without CLM data. For those devices it is not optional. I still > prefer we add a mechanism to the driver to detect that, but we do not have > that yet. > That sounds reasonable, but I hope we can sort out the regression first and then add such logic. > Now regarding your patch some comments below ... > > > drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c | 8 ++-- > > 1 file changed, 2 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c > > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c > > index 6a59d0609d30..0c67ba6ae135 100644 > > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c > > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c > > @@ -278,12 +278,8 @@ int brcmf_c_preinit_dcmds(struct brcmf_if *ifp) > > } > > ri->result = err; > > > > - /* Do any CLM downloading */ > > - err = brcmf_c_process_clm_blob(ifp); > > - if (err < 0) { > > - brcmf_err("download CLM blob file failed, %d\n", err); > > - goto done; > > - } > > + /* Do any optional CLM downloading */ > > + brcmf_c_process_clm_blob(ifp); > > The error print is indeed redundant and should be removed here. However, > brcmf_c_process_clm_blob() also returns -ENOMEM upon allocation failure and > I would still fail the driver probe for that. > Agreed, but as we want to let a few errors, specifically from the firmware loader, slip by I believe it's better to check the return value there instead. So I prefer Write's [2]. Regards, Bjorn
Re: [PATCH net-next] net: sched: fix use before alloc of per cpu stats
On Sun, Jan 14, 2018 at 9:52 PM, Prashant Bhole wrote: > About bug: > During init of clsact/ingress, it links qdisc's cpu_bstats,cpu_qstats > with mini qdisc. TCQ_F_CPUSTATS is set in qdisc->flags during init and > this flag is checked after init to allocate memory for stats. > > Hence mini qdisc points to null per-cpu-stats. The problem isn't caught > while updating stats via mini qdisc because per_cpu_ptr(NULL, cpu_num) > or this_cpu_ptr(NULL) gives a valid pointer. > > About fix: > Currently stats memory is allocated at two places. > - in qdisc_alloc() if TCQ_F_CPUSTATS is set in Qdisc_ops->static_flags > - in qdisc_create() if TCQ_F_CPUSTATS is set in Qdisc->flags > > Qdisc_ops->static_flags is propagated to Qdisc->flags. So to fix this bug, > we set TCQ_F_CPUSTATS in static flags and additional condition to avoid > allocation after init. Good catch! One nit below. > > Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of > tp->q for clsact fastpath") > Signed-off-by: Prashant Bhole > --- > net/sched/sch_api.c | 3 ++- > net/sched/sch_ingress.c | 6 ++ > 2 files changed, 4 insertions(+), 5 deletions(-) > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c > index 8a04c36e579f..de99a5e80944 100644 > --- a/net/sched/sch_api.c > +++ b/net/sched/sch_api.c > @@ -1094,7 +1094,8 @@ static struct Qdisc *qdisc_create(struct net_device > *dev, > goto err_out5; > } > > - if (qdisc_is_percpu_stats(sch)) { > + if (!(ops->static_flags & TCQ_F_CPUSTATS) && > + qdisc_is_percpu_stats(sch)) { Instead of checking both flags, it is simpler to just check if sch->cpu_bstats and sch->cpu_qstats are NULL here? Also, this patch should go to -net tree, not just net-next, but -net doesn't have ops->static_flags yet. ;) Given this, consider moving the sch->cpu_bstats allocation before ops->init(), which might be simpler. Thanks.
linux-next: manual merge of the akpm tree with the net-next tree
Hi Andrew, Today's linux-next merge of the akpm-current tree got a conflict in: net/ipv6/route.c between commit: 6802f3adcb3f ("ipv6: Fix build with gcc-4.4.5") from the net-next tree and patch: "net/ipv6/route.c: work around gcc-4.4.4 anon union initializer issue" from the akpm tree. I fixed it up (I just used the net-next tree version) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: KASAN: use-after-free Read in tipc_group_is_open
On Mon, Jan 15, 2018 at 7:58 PM, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 594831a8aba3fd045c3212a3e3bb9788c77b989d > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > C reproducer is attached > syzkaller reproducer is attached. See https://goo.gl/kgGztJ > for information about syzkaller reproducers > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+799dafde028679585...@syzkaller.appspotmail.com > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. > > == > BUG: KASAN: use-after-free in tipc_group_is_open+0x3a/0x40 > net/tipc/group.c:106 > Read of size 1 at addr 8801d89f7378 by task syzkaller275009/3704 > > CPU: 0 PID: 3704 Comm: syzkaller275009 Not tainted 4.15.0-rc7+ #190 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > print_address_description+0x73/0x250 mm/kasan/report.c:252 > kasan_report_error mm/kasan/report.c:351 [inline] > kasan_report+0x25b/0x340 mm/kasan/report.c:409 > __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 > tipc_group_is_open+0x3a/0x40 net/tipc/group.c:106 > tipc_poll+0x364/0x4d0 net/tipc/socket.c:740 commit eb929a91b213d2a72c5a8b4af9a1acf63bfb8287 Author: Jon Maloy Date: Mon Jan 8 21:03:31 2018 +0100 tipc: improve poll() for group member socket Apparently Jon's commit doesn't fix this. I also don't understand why Jon believes sock_poll_wait() could sync with setsockopt path... > sock_poll+0x141/0x320 net/socket.c: > do_pollfd fs/select.c:822 [inline] > do_poll fs/select.c:872 [inline] > do_sys_poll+0x715/0x10b0 fs/select.c:966 > SYSC_poll fs/select.c:1024 [inline] > SyS_poll+0x10d/0x450 fs/select.c:1012 > entry_SYSCALL_64_fastpath+0x23/0x9a > RIP: 0033:0x446129 > RSP: 002b:7f0f2df96db8 EFLAGS: 0246 ORIG_RAX: 0007 > RAX: ffda RBX: 006dbc54 RCX: 00446129 > RDX: 7fff RSI: 000a RDI: 20ef5000 > RBP: 006dbc50 R08: R09: > R10: R11: 0246 R12: > R13: 7ffc07e2923f R14: 7f0f2df979c0 R15: 0007 > > Allocated by task 3702: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 > kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3610 > kmalloc include/linux/slab.h:499 [inline] > kzalloc include/linux/slab.h:688 [inline] > tipc_group_create+0x144/0x900 net/tipc/group.c:180 > tipc_sk_join net/tipc/socket.c:2762 [inline] > tipc_setsockopt+0x274/0xce0 net/tipc/socket.c:2877 > SYSC_setsockopt net/socket.c:1823 [inline] > SyS_setsockopt+0x189/0x360 net/socket.c:1802 > entry_SYSCALL_64_fastpath+0x23/0x9a > > Freed by task 3702: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 > __cache_free mm/slab.c:3488 [inline] > kfree+0xd6/0x260 mm/slab.c:3803 > tipc_group_delete+0x2c8/0x3d0 net/tipc/group.c:234 > tipc_sk_join net/tipc/socket.c:2775 [inline] > tipc_setsockopt+0xaa9/0xce0 net/tipc/socket.c:2877 > SYSC_setsockopt net/socket.c:1823 [inline] > SyS_setsockopt+0x189/0x360 net/socket.c:1802 > entry_SYSCALL_64_fastpath+0x23/0x9a > > The buggy address belongs to the object at 8801d89f7300 > which belongs to the cache kmalloc-128 of size 128 > The buggy address is located 120 bytes inside of > 128-byte region [8801d89f7300, 8801d89f7380) > The buggy address belongs to the page: > page:ea0007627dc0 count:1 mapcount:0 mapping:8801d89f7000 index:0x0 > flags: 0x2fffc000100(slab) > raw: 02fffc000100 8801d89f7000 00010015 > raw: ea0007622720 ea0007627ce0 8801dac00640 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > 8801d89f7200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb > 8801d89f7280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >> >> 8801d89f7300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > ^ > 8801d89f7380: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 > 8801d89f7400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc > == > > > --- > This bug is generated by a dumb bot. It may contain errors. > See https://goo.gl/tpsmEJ for details. > Direct all questions to syzkal...@googlegroups.com. > > syzbot will keep track
Re: iproute2 net-next
On Mon, 15 Jan 2018 19:59:05 -0700, David Ahern wrote: > On 1/15/18 6:56 PM, Marcelo Ricardo Leitner wrote: > > On Fri, Dec 29, 2017 at 08:00:28PM -0800, Stephen Hemminger wrote: > >> On Fri, 29 Dec 2017 09:58:23 +0100 > >> Jiri Pirko wrote: > >> > >>> Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote: > On 12/26/2017 10:35 AM, Leon Romanovsky wrote: > > On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote: > >> On Tue, 26 Dec 2017 06:47:43 +0200 > >> Leon Romanovsky wrote: > >> > >>> On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote: > David Ahern has agreed to take over managing the net-next branch of > iproute2. > The new location is: > > https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/ > > In the past, I have accepted new features into iproute2 master > branch, but > am changing the policy so that outside of the merge window (up until > -rc1) > new features will get put into net-next to get some more review and > testing > time. This means that things like the proposed batch streaming mode > will > go through net-next. > >>> > >>> Did you consider to create one shared repo for the iproute2 to allow > >>> multiple committers workflow? > >> > >> For now having separate trees is best, there is no need for multiple > >> committers the load is very light. > >> > >>> It will be much convenient for the users to have one place for > >>> master/stable/net-next branches, instead of actually following two > >>> different repositories. > >> > >> If you are doing network development, you already need to deal with > >> multiple repo's on the kernel side so there is no difference. > > > > I agree with you that one extra "git remote add .." is not so huge and > > all people who develop for the netdev will do it. My concern is about > > Documentation and newcomers, who will have a hard time to find a right > > tree. > > I guess it would certainly help to identify the official repo to rebase > against much quicker if it would be under a common group on korg e.g. > > * iproute2/iproute2.git - for current cycle > * iproute2/iproute2-next.git- for net-next bits > > and also be in line with other tooling (ethtool and others), even if > not as high volume, but it would make it unambiguous right away from > the other, private iproute2 repos on korg, imho. Just a thought. > >>> > >>> +1 > >>> > >>> I was about to suggest this. This is nice opportunity to do such change. > >>> > >>> > > >>> Example, of such shared repo: > >>> BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ > >>> Bluetooth: > >>> https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/ > >>> RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/ > >>> > >> > >> Most of these are high volume or vendor silo'd which is not the case > >> here. > Cheers, > Daniel > >> > >> Good news > >> kup does support links so could make links from personal to iproute2 > >> directory > >> > >> Bad news > >> kup won't allow me to make iproute2 directory right now. Will have to wait > >> for > >> Konstantin > >> > > > > Hi, any news on this? Not sure if Konstantin is back already or not. > > Done a few days ago. The new canonical URLs for those repos are: > pub/scm/network/iproute2/iproute2 > pub/scm/network/iproute2/iproute2-next > > So clone URLs > git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git > https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git > and > git://git.kernel.org/pub/scm/network/iproute2/iproute2.git > https://git.kernel.org/pub/scm/network/iproute2/iproute2.git About the branches - what should we base our patches on for net-next? Most -next repos just use the master, but it seems that in case of iproute2-next.git the net-next branch is still active, and there is an inactive net-next branch in iproute2.git.. Is this transitional or will it stay this way?
Re: iproute2 net-next
On 1/15/18 6:56 PM, Marcelo Ricardo Leitner wrote: > On Fri, Dec 29, 2017 at 08:00:28PM -0800, Stephen Hemminger wrote: >> On Fri, 29 Dec 2017 09:58:23 +0100 >> Jiri Pirko wrote: >> >>> Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote: On 12/26/2017 10:35 AM, Leon Romanovsky wrote: > On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote: >> On Tue, 26 Dec 2017 06:47:43 +0200 >> Leon Romanovsky wrote: >> >>> On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote: David Ahern has agreed to take over managing the net-next branch of iproute2. The new location is: https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/ In the past, I have accepted new features into iproute2 master branch, but am changing the policy so that outside of the merge window (up until -rc1) new features will get put into net-next to get some more review and testing time. This means that things like the proposed batch streaming mode will go through net-next. >>> >>> Did you consider to create one shared repo for the iproute2 to allow >>> multiple committers workflow? >> >> For now having separate trees is best, there is no need for multiple >> committers the load is very light. >> >>> It will be much convenient for the users to have one place for >>> master/stable/net-next branches, instead of actually following two >>> different repositories. >> >> If you are doing network development, you already need to deal with >> multiple repo's on the kernel side so there is no difference. > > I agree with you that one extra "git remote add .." is not so huge and > all people who develop for the netdev will do it. My concern is about > Documentation and newcomers, who will have a hard time to find a right > tree. I guess it would certainly help to identify the official repo to rebase against much quicker if it would be under a common group on korg e.g. * iproute2/iproute2.git - for current cycle * iproute2/iproute2-next.git- for net-next bits and also be in line with other tooling (ethtool and others), even if not as high volume, but it would make it unambiguous right away from the other, private iproute2 repos on korg, imho. Just a thought. >>> >>> +1 >>> >>> I was about to suggest this. This is nice opportunity to do such change. >>> >>> >>> Example, of such shared repo: >>> BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ >>> Bluetooth: >>> https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/ >>> RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/ >> >> Most of these are high volume or vendor silo'd which is not the case >> here. Cheers, Daniel >> >> Good news >> kup does support links so could make links from personal to iproute2 >> directory >> >> Bad news >> kup won't allow me to make iproute2 directory right now. Will have to wait >> for >> Konstantin >> > > Hi, any news on this? Not sure if Konstantin is back already or not. Done a few days ago. The new canonical URLs for those repos are: pub/scm/network/iproute2/iproute2 pub/scm/network/iproute2/iproute2-next So clone URLs git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git and git://git.kernel.org/pub/scm/network/iproute2/iproute2.git https://git.kernel.org/pub/scm/network/iproute2/iproute2.git
Re: [PATCH net-next 2/2] cxgb4: speed up on-chip memory read
Hi Rahul, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/cxgb4-rework-on-chip-memory-read/20180116-050826 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c:48:21: sparse: cast >> removes address space of expression vim +48 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c 37 38 unsigned int cudbg_mem_read_avx(struct cudbg_init *pdbg_init, u32 start, 39 u32 offset, u32 size, u32 mem_aperture, 40 u8 *outbuf) 41 { 42 #ifdef CONFIG_AS_AVX 43 u32 max_read_len = CUDBG_MEM_ALIGN_AVX; 44 struct adapter *adap = pdbg_init->adap; 45 u8 *reg_addr, *src_addr, *dst_addr; 46 u32 bytes_read, read_len; 47 > 48 reg_addr = (u8 *)adap->regs + start + offset; --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[PATCH bpf] bpf, arm64: fix stack_depth tracking in combination with tail calls
Using dynamic stack_depth tracking in arm64 JIT is currently broken in combination with tail calls. In prologue, we cache ctx->stack_size and adjust SP reg for setting up function call stack, and tearing it down again in epilogue. Problem is that when doing a tail call, the cached ctx->stack_size might not be the same. One way to fix the problem with minimal overhead is to re-adjust SP in emit_bpf_tail_call() and properly adjust it to the current program's ctx->stack_size. Tested on Cavium ThunderX ARMv8. Fixes: f1c9eed7f437 ("bpf, arm64: take advantage of stack_depth tracking") Signed-off-by: Daniel Borkmann --- arch/arm64/net/bpf_jit_comp.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index ba38d40..bb32f7f 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -148,7 +148,8 @@ static inline int epilogue_offset(const struct jit_ctx *ctx) /* Stack must be multiples of 16B */ #define STACK_ALIGN(sz) (((sz) + 15) & ~15) -#define PROLOGUE_OFFSET 8 +/* Tail call offset to jump into */ +#define PROLOGUE_OFFSET 7 static int build_prologue(struct jit_ctx *ctx) { @@ -200,19 +201,19 @@ static int build_prologue(struct jit_ctx *ctx) /* Initialize tail_call_cnt */ emit(A64_MOVZ(1, tcc, 0, 0), ctx); - /* 4 byte extra for skb_copy_bits buffer */ - ctx->stack_size = prog->aux->stack_depth + 4; - ctx->stack_size = STACK_ALIGN(ctx->stack_size); - - /* Set up function call stack */ - emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); - cur_offset = ctx->idx - idx0; if (cur_offset != PROLOGUE_OFFSET) { pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n", cur_offset, PROLOGUE_OFFSET); return -1; } + + /* 4 byte extra for skb_copy_bits buffer */ + ctx->stack_size = prog->aux->stack_depth + 4; + ctx->stack_size = STACK_ALIGN(ctx->stack_size); + + /* Set up function call stack */ + emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); return 0; } @@ -260,11 +261,12 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx) emit(A64_LDR64(prg, tmp, prg), ctx); emit(A64_CBZ(1, prg, jmp_offset), ctx); - /* goto *(prog->bpf_func + prologue_size); */ + /* goto *(prog->bpf_func + prologue_offset); */ off = offsetof(struct bpf_prog, bpf_func); emit_a64_mov_i64(tmp, off, ctx); emit(A64_LDR64(tmp, prg, tmp), ctx); emit(A64_ADD_I(1, tmp, tmp, sizeof(u32) * PROLOGUE_OFFSET), ctx); + emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); emit(A64_BR(tmp), ctx); /* out: */ -- 2.9.5
[PATCH bpf-next v2 03/11] net: sched: cls_flower: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_flower, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_flower.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index f1640a98a3d2..fe7d96d12435 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -234,14 +234,15 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) static int fl_hw_replace_filter(struct tcf_proto *tp, struct flow_dissector *dissector, struct fl_flow_key *mask, - struct cls_fl_filter *f) + struct cls_fl_filter *f, + struct netlink_ext_ack *extack) { struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; bool skip_sw = tc_skip_sw(f->flags); int err; - tc_cls_common_offload_init(&cls_flower.common, tp, NULL); + tc_cls_common_offload_init(&cls_flower.common, tp, extack); cls_flower.command = TC_CLSFLOWER_REPLACE; cls_flower.cookie = (unsigned long) f; cls_flower.dissector = dissector; @@ -939,7 +940,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, err = fl_hw_replace_filter(tp, &head->dissector, &mask.key, - fnew); + fnew, + extack); if (err) goto errout_idr; } -- 2.15.1
[PATCH 3/3] openvswitch: drop GSO packets that are too large
Open vSwitch attempts to detect if a packet is too large to be sent to the destination device. However, this test does not consider GSO packets, and it is possible that a GSO packet, when resegmented, will be larger than the device can send. Add detection for packets which are too large. We use skb_gso_validate_len, reusing the length calculation in the existing checks - see 738314a084aa ("openvswitch: use hard_header_len instead of hardcoded ETH_HLEN") This is different from the is_skb_forwardable logic in that it only allows for the length of a VLAN tag if one is actually present. Signed-off-by: Daniel Axtens --- net/openvswitch/vport.c | 37 ++--- 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index b6c8524032a0..290eeaa82344 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -464,16 +464,16 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, return 0; } -static unsigned int packet_length(const struct sk_buff *skb, - struct net_device *dev) +static unsigned int packet_length_offset(const struct sk_buff *skb, +const struct net_device *dev) { - unsigned int length = skb->len - dev->hard_header_len; + unsigned int length = dev->hard_header_len; if (!skb_vlan_tag_present(skb) && eth_type_vlan(skb->protocol)) - length -= VLAN_HLEN; + length += VLAN_HLEN; - /* Don't subtract for multiple VLAN tags. Most (all?) drivers allow + /* Don't adjust for multiple VLAN tags. Most (all?) drivers allow * (ETH_LEN + VLAN_HLEN) in addition to the mtu value, but almost none * account for 802.1ad. e.g. is_skb_forwardable(). */ @@ -481,6 +481,21 @@ static unsigned int packet_length(const struct sk_buff *skb, return length; } +static inline unsigned int packet_length(const struct sk_buff *skb, +const struct net_device *dev) +{ + return skb->len - packet_length_offset(skb, dev); +} + +static inline bool vport_gso_validate_len(const struct sk_buff *skb, + const struct net_device *dev, + unsigned int mtu) +{ + unsigned int len = mtu + packet_length_offset(skb, dev); + + return skb_gso_validate_len(skb, len); +} + void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto) { int mtu = vport->dev->mtu; @@ -504,13 +519,21 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto) goto drop; } - if (unlikely(packet_length(skb, vport->dev) > mtu && -!skb_is_gso(skb))) { + if (!skb_is_gso(skb) && + unlikely(packet_length(skb, vport->dev) > mtu)) { net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n", vport->dev->name, packet_length(skb, vport->dev), mtu); vport->dev->stats.tx_errors++; goto drop; + } else if (skb_is_gso(skb) && + unlikely(!vport_gso_validate_len(skb, vport->dev, mtu))) { + net_warn_ratelimited("%s: dropped over-mtu GSO packet: " +"gso_size = %d, mtu = %d\n", +vport->dev->name, +skb_shinfo(skb)->gso_size, mtu); + vport->dev->stats.tx_errors++; + goto drop; } skb->dev = vport->dev; -- 2.14.1
[PATCH 0/3] Check gso_size of packets when forwarding
When regular packets are forwarded, we validate their size against the MTU of the destination device. However, when GSO packets are forwarded, we do not validate their size against the MTU. We implicitly assume that when they are segmented, the resultant packets will be correctly sized. This is not always the case. We observed a case where a packet received on an ibmveth device had a GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x device, where it caused a firmware assert. This is described in detail at [0] and was the genesis of this series. Rather than fixing it in the driver, this series fixes the forwarding path. To fix this: - Move a helper in patch 1. - Validate GSO segment lengths in is_skb_forwardable() in the GSO case, rather than assuming all will be well. This fixes bridges. This is patch 2. - Open vSwitch uses its own slightly specialised algorithm for checking lengths. Wire up checking for that in patch 3. [0]: https://patchwork.ozlabs.org/patch/859410/ Cc: manish.cho...@cavium.com Cc: d...@openvswitch.org Daniel Axtens (3): net: move skb_gso_mac_seglen to skbuff.h net: is_skb_forwardable: validate length of GSO packet segments openvswitch: drop GSO packets that are too large include/linux/skbuff.h | 16 net/core/dev.c | 7 --- net/core/skbuff.c | 34 ++ net/openvswitch/vport.c | 37 ++--- net/sched/sch_tbf.c | 10 -- 5 files changed, 84 insertions(+), 20 deletions(-) -- 2.14.1
[PATCH bpf-next v2 04/11] net: sched: cls_matchall: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_matchall, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_matchall.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 24bec37e9747..fe6b673db5c6 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -85,14 +85,15 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, static int mall_replace_hw_filter(struct tcf_proto *tp, struct cls_mall_head *head, - unsigned long cookie) + unsigned long cookie, + struct netlink_ext_ack *extack) { struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; bool skip_sw = tc_skip_sw(head->flags); int err; - tc_cls_common_offload_init(&cls_mall.common, tp, NULL); + tc_cls_common_offload_init(&cls_mall.common, tp, extack); cls_mall.command = TC_CLSMATCHALL_REPLACE; cls_mall.exts = &head->exts; cls_mall.cookie = cookie; @@ -202,7 +203,8 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, goto err_set_parms; if (!tc_skip_hw(new->flags)) { - err = mall_replace_hw_filter(tp, new, (unsigned long) new); + err = mall_replace_hw_filter(tp, new, (unsigned long)new, +extack); if (err) goto err_replace_hw_filter; } -- 2.15.1
[PATCH 1/3] net: move skb_gso_mac_seglen to skbuff.h
We're about to use this elsewhere, so move it into the header with the other related functions like skb_gso_network_seglen(). Signed-off-by: Daniel Axtens --- include/linux/skbuff.h | 15 +++ net/sched/sch_tbf.c| 10 -- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b8e0da6c27d6..b8acafd73272 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4120,6 +4120,21 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb) return hdr_len + skb_gso_transport_seglen(skb); } +/** + * skb_gso_mac_seglen - Return length of individual segments of a gso packet + * + * @skb: GSO skb + * + * skb_gso_mac_seglen is used to determine the real size of the + * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4 + * headers (TCP/UDP). + */ +static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb) +{ + unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb); + return hdr_len + skb_gso_transport_seglen(skb); +} + /* Local Checksum Offload. * Compute outer checksum based on the assumption that the * inner checksum will be offloaded later. diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c index 83e76d046993..229172d509cc 100644 --- a/net/sched/sch_tbf.c +++ b/net/sched/sch_tbf.c @@ -142,16 +142,6 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r, return len; } -/* - * Return length of individual segments of a gso packet, - * including all headers (MAC, IP, TCP/UDP) - */ -static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb) -{ - unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb); - return hdr_len + skb_gso_transport_seglen(skb); -} - /* GSO packet is too big, segment it so that tbf can transmit * each segment in time */ -- 2.14.1
[PATCH 2/3] net: is_skb_forwardable: validate length of GSO packet segments
is_skb_forwardable attempts to detect if a packet is too large to be sent to the destination device. However, this test does not consider GSO packets, and it is possible that a GSO packet, when resegmented, will be larger than the device can transmit. Add detection for packets which will be too large by creating a skb_gso_validate_len() routine which is similar to skb_gso_validate_mtu(), but which considers L2 headers, and wire it up in is_skb_forwardable(). Signed-off-by: Daniel Axtens --- include/linux/skbuff.h | 1 + net/core/dev.c | 7 --- net/core/skbuff.c | 34 ++ 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b8acafd73272..6a9e4b6f80a7 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen); void skb_scrub_packet(struct sk_buff *skb, bool xnet); unsigned int skb_gso_transport_seglen(const struct sk_buff *skb); bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu); +bool skb_gso_validate_len(const struct sk_buff *skb, unsigned int len); struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features); struct sk_buff *skb_vlan_untag(struct sk_buff *skb); int skb_ensure_writable(struct sk_buff *skb, int write_len); diff --git a/net/core/dev.c b/net/core/dev.c index 94435cd09072..5af04be128c1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1841,11 +1841,12 @@ bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb) if (skb->len <= len) return true; - /* if TSO is enabled, we don't care about the length as the packet -* could be forwarded without being segmented before + /* +* if TSO is enabled, we need to check the size of the +* segmented packets */ if (skb_is_gso(skb)) - return true; + return skb_gso_validate_len(skb, len); return false; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 01e8285aea73..aefe049e4b0c 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4945,6 +4945,40 @@ bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu) } EXPORT_SYMBOL_GPL(skb_gso_validate_mtu); +/** + * skb_gso_validate_len - Will a split GSO skb fit in a given length? + * + * @skb: GSO skb + * @len: length to validate against + * + * skb_gso_validate_len validates if a given skb will fit a wanted + * length once split, including L2, L3 and L4 headers. + * + * Similar to skb_gso_validate_mtu, but inclusive of L2 headers. + */ +bool skb_gso_validate_len(const struct sk_buff *skb, unsigned int len) +{ + const struct skb_shared_info *shinfo = skb_shinfo(skb); + const struct sk_buff *iter; + unsigned int hlen; + + hlen = skb_gso_mac_seglen(skb); + + if (shinfo->gso_size != GSO_BY_FRAGS) + return hlen <= len; + + /* Undo this so we can re-use header sizes */ + hlen -= GSO_BY_FRAGS; + + skb_walk_frags(skb, iter) { + if (hlen + skb_headlen(iter) > len) + return false; + } + + return true; +} +EXPORT_SYMBOL_GPL(skb_gso_validate_len); + static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb) { if (skb_cow(skb, skb_headroom(skb)) < 0) { -- 2.14.1
[PATCH bpf-next v2 05/11] net: sched: cls_u32: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_u32, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_u32.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 8127b15d8d50..ef1b746de80b 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -501,7 +501,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h) } static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; @@ -509,7 +509,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h, bool offloaded = false; int err; - tc_cls_common_offload_init(&cls_u32.common, tp, NULL); + tc_cls_common_offload_init(&cls_u32.common, tp, extack); cls_u32.command = TC_CLSU32_NEW_HNODE; cls_u32.hnode.divisor = h->divisor; cls_u32.hnode.handle = h->handle; @@ -542,14 +542,14 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle) } static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; bool skip_sw = tc_skip_sw(flags); int err; - tc_cls_common_offload_init(&cls_u32.common, tp, NULL); + tc_cls_common_offload_init(&cls_u32.common, tp, extack); cls_u32.command = TC_CLSU32_REPLACE_KNODE; cls_u32.knode.handle = n->handle; cls_u32.knode.fshift = n->fshift; @@ -943,7 +943,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return err; } - err = u32_replace_hw_knode(tp, new, flags); + err = u32_replace_hw_knode(tp, new, flags, extack); if (err) { u32_destroy_key(tp, new, false); return err; @@ -990,7 +990,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, ht->prio = tp->prio; idr_init(&ht->handle_idr); - err = u32_replace_hw_hnode(tp, ht, flags); + err = u32_replace_hw_hnode(tp, ht, flags, extack); if (err) { idr_remove_ext(&tp_c->handle_idr, handle); kfree(ht); @@ -1088,7 +1088,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, struct tc_u_knode __rcu **ins; struct tc_u_knode *pins; - err = u32_replace_hw_knode(tp, n, flags); + err = u32_replace_hw_knode(tp, n, flags, extack); if (err) goto errhw; -- 2.15.1
[PATCH bpf-next v2 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
From: Quentin Monnet Add checks to test that netlink extack messages are correctly displayed in some expected error cases for eBPF offload to netdevsim with TC and XDP. iproute2 may be built without libmnl support, in which case the extack messages will not be reported. Try to detect this condition, and when enountered print a mild warning to the user and skip the extack validation. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- v2: - auto-skip the extack message validation (David A). tools/testing/selftests/bpf/test_offload.py | 104 +--- 1 file changed, 78 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py index e3c750f17cb8..28ddb30884fd 100755 --- a/tools/testing/selftests/bpf/test_offload.py +++ b/tools/testing/selftests/bpf/test_offload.py @@ -25,6 +25,7 @@ import time logfile = None log_level = 1 +skip_extack = False bpf_test_dir = os.path.dirname(os.path.realpath(__file__)) pp = pprint.PrettyPrinter() devs = [] # devices we created for clean up @@ -131,7 +132,7 @@ netns = [] # net namespaces to be removed if f in files: files.remove(f) -def tool(name, args, flags, JSON=True, ns="", fail=True): +def tool(name, args, flags, JSON=True, ns="", fail=True, include_stderr=False): params = "" if JSON: params += "%s " % (flags["json"]) @@ -139,9 +140,20 @@ netns = [] # net namespaces to be removed if ns != "": ns = "ip netns exec %s " % (ns) -ret, out = cmd(ns + name + " " + params + args, fail=fail) -if JSON and len(out.strip()) != 0: -return ret, json.loads(out) +if include_stderr: +ret, stdout, stderr = cmd(ns + name + " " + params + args, + fail=fail, include_stderr=True) +else: +ret, stdout = cmd(ns + name + " " + params + args, + fail=fail, include_stderr=False) + +if JSON and len(stdout.strip()) != 0: +out = json.loads(stdout) +else: +out = stdout + +if include_stderr: +return ret, out, stderr else: return ret, out @@ -164,13 +176,15 @@ netns = [] # net namespaces to be removed time.sleep(0.05) raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs)) -def ip(args, force=False, JSON=True, ns="", fail=True): +def ip(args, force=False, JSON=True, ns="", fail=True, include_stderr=False): if force: args = "-force " + args -return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail) +return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail, +include_stderr=include_stderr) -def tc(args, JSON=True, ns="", fail=True): -return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail) +def tc(args, JSON=True, ns="", fail=True, include_stderr=False): +return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail, +include_stderr=include_stderr) def ethtool(dev, opt, args, fail=True): return cmd("ethtool %s %s %s" % (opt, dev["ifname"], args), fail=fail) @@ -311,13 +325,13 @@ netns = [] # net namespaces to be removed return ip("link set dev %s mtu %d" % (self.dev["ifname"], mtu), fail=fail) -def set_xdp(self, bpf, mode, force=False, fail=True): +def set_xdp(self, bpf, mode, force=False, fail=True, include_stderr=False): return ip("link set dev %s xdp%s %s" % (self.dev["ifname"], mode, bpf), - force=force, fail=fail) + force=force, fail=fail, include_stderr=include_stderr) -def unset_xdp(self, mode, force=False, fail=True): +def unset_xdp(self, mode, force=False, fail=True, include_stderr=False): return ip("link set dev %s xdp%s off" % (self.dev["ifname"], mode), - force=force, fail=fail) + force=force, fail=fail, include_stderr=include_stderr) def ip_link_show(self, xdp): _, link = ip("link show dev %s" % (self['ifname'])) @@ -373,7 +387,7 @@ netns = [] # net namespaces to be removed return filters def cls_bpf_add_filter(self, bpf, da=False, skip_sw=False, skip_hw=False, - fail=True): + fail=True, include_stderr=False): params = "" if da: params += " da" @@ -382,7 +396,8 @@ netns = [] # net namespaces to be removed if skip_hw: params += " skip_hw" return tc("filter add dev %s ingress bpf %s %s" % - (self['ifname'], bpf, params), fail=fail) + (self['ifname'], bpf, params), fail=fail, + include_stderr=include_stderr) def set_ethtool_tc_offloads(self, enable, fail=True): args = "hw-tc-offload %s" % ("on" if enable else "off") @@ -434,6 +449,13 @@ ne
[PATCH bpf-next v2 06/11] net: sched: cls_bpf: plumb extack support in filter for hardware offload
From: Quentin Monnet Pass the extack pointer obtained in the `->change()` filter operation to cls_bpf_offload() and then to cls_bpf_offload_cmd(), where it can be used to initialize the new field of tc_cls_common_offload passed to the callback for offload. This makes it possible to use this extack pointer in drivers offloading BPF programs in a future patch. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_bpf.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index b8729124d209..d15ef9ab7243 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -147,7 +147,8 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog) } static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, - struct cls_bpf_prog *oldprog) + struct cls_bpf_prog *oldprog, + struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_bpf_offload cls_bpf = {}; @@ -158,7 +159,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, skip_sw = prog && tc_skip_sw(prog->gen_flags); obj = prog ?: oldprog; - tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); + tc_cls_common_offload_init(&cls_bpf.common, tp, extack); cls_bpf.command = TC_CLSBPF_OFFLOAD; cls_bpf.exts = &obj->exts; cls_bpf.prog = prog ? prog->filter : NULL; @@ -170,7 +171,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw); if (prog) { if (err < 0) { - cls_bpf_offload_cmd(tp, oldprog, prog); + cls_bpf_offload_cmd(tp, oldprog, prog, extack); return err; } else if (err > 0) { prog->gen_flags |= TCA_CLS_FLAGS_IN_HW; @@ -184,7 +185,8 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, } static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog, - struct cls_bpf_prog *oldprog) + struct cls_bpf_prog *oldprog, + struct netlink_ext_ack *extack) { if (prog && oldprog && prog->gen_flags != oldprog->gen_flags) return -EINVAL; @@ -196,7 +198,7 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog, if (!prog && !oldprog) return 0; - return cls_bpf_offload_cmd(tp, prog, oldprog); + return cls_bpf_offload_cmd(tp, prog, oldprog, extack); } static void cls_bpf_stop_offload(struct tcf_proto *tp, @@ -204,7 +206,7 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp, { int err; - err = cls_bpf_offload_cmd(tp, NULL, prog); + err = cls_bpf_offload_cmd(tp, NULL, prog, NULL); if (err) pr_err("Stopping hardware offload failed: %d\n", err); } @@ -501,7 +503,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, if (ret < 0) goto errout_idr; - ret = cls_bpf_offload(tp, prog, oldprog); + ret = cls_bpf_offload(tp, prog, oldprog, extack); if (ret) goto errout_parms; -- 2.15.1
[PATCH bpf-next v2 09/11] nfp: bpf: use extack support to improve debugging
From: Quentin Monnet Use the recently added extack support for eBPF offload in the driver. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c| 31 ++-- drivers/net/ethernet/netronome/nfp/bpf/main.h| 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++--- 3 files changed, 39 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index e8816ab8fb63..a638c3ab6b61 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -70,7 +70,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, if (prog && running && !xdp_running) return -EBUSY; - ret = nfp_net_bpf_offload(nn, prog, running); + ret = nfp_net_bpf_offload(nn, prog, running, extack); /* Stop offload if replace not possible */ if (ret && prog) nfp_bpf_xdp_offload(app, nn, NULL, extack); @@ -125,17 +125,31 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, struct nfp_bpf_vnic *bv; int err; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(nn->dp.netdev) || - !nfp_net_ebpf_capable(nn) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + if (!tc_can_offload_extack(nn->dp.netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + if (!nfp_net_ebpf_capable(nn)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "NFP firmware does not support eBPF offload"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; /* Only support TC direct action */ if (!cls_bpf->exts_integrated || tcf_exts_has_actions(cls_bpf->exts)) { - nn_err(nn, "only direct action with no legacy actions supported\n"); + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only direct action with no legacy actions supported"); return -EOPNOTSUPP; } @@ -152,7 +166,8 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, return 0; } - err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog); + err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog, + cls_bpf->common.extack); if (err) return err; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index b80e75a8ecda..80855d43b25e 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -334,7 +334,7 @@ struct nfp_net; int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf); int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog, - bool old_prog); + bool old_prog, struct netlink_ext_ack *extack); struct nfp_insn_meta * nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index e2859b2e9c6a..9c78a09cda24 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -271,7 +271,9 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf) } } -static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) +static int +nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog, +struct netlink_ext_ack *extack) { struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; unsigned int max_mtu; @@ -281,7 +283,7 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) max_mtu = nn_readb(nn, NFP_NET_CFG_BPF_INL_MTU) * 64 - 32; if (max_mtu < nn->dp.netdev->mtu) { - nn_info(nn, "BPF offload not supported with MTU larger than HW packet split boundary\n"); + NL_SET_ERR_MSG_MOD(extack, "BPF offload not supported with MTU larger than HW packet split boundary"); return -EOPNOTSUPP; } @@ -303,7 +305,8 @@ static int nfp_net_bpf_load(struct n
[PATCH bpf-next v2 02/11] net: sched: prepare extack support for offload via tc_cls_common_offload
From: Quentin Monnet Prepare for extack support for hardware offload of classifiers. In order to achieve this, a pointer to a struct netlink_ext_ack is added to the struct tc_cls_common_offload that is passed to the callback for setting up the classifier. Function tc_cls_common_offload_init() is updated to support initialization of this new attribute. Extack plumbing is not complete yet. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h| 5 - net/sched/cls_bpf.c | 4 ++-- net/sched/cls_flower.c | 6 +++--- net/sched/cls_matchall.c | 4 ++-- net/sched/cls_u32.c | 8 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 0d1343cba84c..c88c61234cb3 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -590,15 +590,18 @@ struct tc_cls_common_offload { u32 chain_index; __be16 protocol; u32 prio; + struct netlink_ext_ack *extack; }; static inline void tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common, - const struct tcf_proto *tp) + const struct tcf_proto *tp, + struct netlink_ext_ack *extack) { cls_common->chain_index = tp->chain->index; cls_common->protocol = tp->protocol; cls_common->prio = tp->prio; + cls_common->extack = extack; } struct tc_cls_u32_knode { diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index fcb831b3917e..b8729124d209 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -158,7 +158,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, skip_sw = prog && tc_skip_sw(prog->gen_flags); obj = prog ?: oldprog; - tc_cls_common_offload_init(&cls_bpf.common, tp); + tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); cls_bpf.command = TC_CLSBPF_OFFLOAD; cls_bpf.exts = &obj->exts; cls_bpf.prog = prog ? prog->filter : NULL; @@ -215,7 +215,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp, struct tcf_block *block = tp->chain->block; struct tc_cls_bpf_offload cls_bpf = {}; - tc_cls_common_offload_init(&cls_bpf.common, tp); + tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); cls_bpf.command = TC_CLSBPF_STATS; cls_bpf.exts = &prog->exts; cls_bpf.prog = prog->filter; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 998ee4faf934..f1640a98a3d2 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_DESTROY; cls_flower.cookie = (unsigned long) f; @@ -241,7 +241,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(f->flags); int err; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_REPLACE; cls_flower.cookie = (unsigned long) f; cls_flower.dissector = dissector; @@ -270,7 +270,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_STATS; cls_flower.cookie = (unsigned long) f; cls_flower.exts = &f->exts; diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index dc3c57116bbd..24bec37e9747 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -76,7 +76,7 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_mall.common, tp); + tc_cls_common_offload_init(&cls_mall.common, tp, NULL); cls_mall.command = TC_CLSMATCHALL_DESTROY; cls_mall.cookie = cookie; @@ -92,7 +92,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(head->flags); int err; - tc_cls_common_offload_init(&cls_mall.common, tp); + tc_cls_common_offload_init(&cls_mall.common, tp, NULL); cls_mall.command = TC_CLSMATCHALL_REPLACE; cls_mall.exts = &head->exts; cls_mall.cookie = cookie; diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3ef5c32741c1..8127b15d8d50 100644 --- a/n
[PATCH bpf-next v2 07/11] net: sched: create tc_can_offload_extack() wrapper
From: Quentin Monnet Create a wrapper around tc_can_offload() that takes an additional extack pointer argument in order to output an error message if TC offload is disabled on the device. In this way, the error message is handled by the core and can be the same for all drivers. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index c88c61234cb3..a3ad6a5a2d12 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -644,6 +644,17 @@ static inline bool tc_can_offload(const struct net_device *dev) return dev->features & NETIF_F_HW_TC; } +static inline bool tc_can_offload_extack(const struct net_device *dev, +struct netlink_ext_ack *extack) +{ + bool can = tc_can_offload(dev); + + if (!can) + NL_SET_ERR_MSG(extack, "TC offload is disabled on net device"); + + return can; +} + static inline bool tc_skip_hw(u32 flags) { return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false; -- 2.15.1
[PATCH bpf-next v2 08/11] nfp: bpf: plumb extack into functions related to XDP offload
From: Quentin Monnet Pass a pointer to an extack object to nfp_app_xdp_offload() in order to prepare for extack usage in the nfp driver. Next step will be to forward this extack pointer to nfp_net_bpf_offload(), once this function is able to use it for printing error messages. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfp_app.h| 9 ++--- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index 8823c8360047..e8816ab8fb63 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -54,7 +54,7 @@ static bool nfp_net_ebpf_capable(struct nfp_net *nn) static int nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, struct netlink_ext_ack *extack) { bool running, xdp_running; int ret; @@ -73,7 +73,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, ret = nfp_net_bpf_offload(nn, prog, running); /* Stop offload if replace not possible */ if (ret && prog) - nfp_bpf_xdp_offload(app, nn, NULL); + nfp_bpf_xdp_offload(app, nn, NULL, extack); nn->dp.bpf_offload_xdp = prog && !ret; return ret; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 6a6eb02b516e..1229a34f8da5 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -43,6 +43,7 @@ struct bpf_prog; struct net_device; struct netdev_bpf; +struct netlink_ext_ack; struct pci_dev; struct sk_buff; struct sk_buff; @@ -134,7 +135,8 @@ struct nfp_app_type { int (*bpf)(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *xdp); int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog); + struct bpf_prog *prog, + struct netlink_ext_ack *extack); int (*sriov_enable)(struct nfp_app *app, int num_vfs); void (*sriov_disable)(struct nfp_app *app); @@ -320,11 +322,12 @@ static inline int nfp_app_bpf(struct nfp_app *app, struct nfp_net *nn, } static inline int nfp_app_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, + struct netlink_ext_ack *extack) { if (!app || !app->type->xdp_offload) return -EOPNOTSUPP; - return app->type->xdp_offload(app, nn, prog); + return app->type->xdp_offload(app, nn, prog, extack); } static inline bool __nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 2b5cad3069a7..14f23e8d27fa 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3395,7 +3395,7 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags, if (err) return err; - err = nfp_app_xdp_offload(nn->app, nn, offload_prog); + err = nfp_app_xdp_offload(nn->app, nn, offload_prog, extack); if (err && flags & XDP_FLAGS_HW_MODE) return err; -- 2.15.1
[PATCH bpf-next v2 01/11] net: sched: add extack support to change() classifier operation
From: Quentin Monnet Add an extra argument to `->change()` operation for passing a pointer to a struct netlink_ext_ack. Update the operation for all classifiers accordingly. Extack is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/sch_generic.h | 3 ++- net/sched/cls_api.c | 3 ++- net/sched/cls_basic.c | 3 ++- net/sched/cls_bpf.c | 2 +- net/sched/cls_cgroup.c| 3 ++- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c| 2 +- net/sched/cls_fw.c| 2 +- net/sched/cls_matchall.c | 2 +- net/sched/cls_route.c | 3 ++- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c | 3 ++- net/sched/cls_u32.c | 3 ++- 13 files changed, 20 insertions(+), 13 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index ac029d5d88e4..5e77f2639c67 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -232,7 +232,8 @@ struct tcf_proto_ops { int (*change)(struct net *net, struct sk_buff *, struct tcf_proto*, unsigned long, u32 handle, struct nlattr **, - void **, bool); + void **, bool, + struct netlink_ext_ack *); int (*delete)(struct tcf_proto*, void *, bool*); void(*walk)(struct tcf_proto*, struct tcf_walker *arg); void(*bind_class)(void *, u32, unsigned long); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 6708b6953bfa..0460cc22d48c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -912,7 +912,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, } err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh, - n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE); + n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE, + extack); if (err == 0) { if (tp_created) tcf_chain_tp_insert(chain, &chain_info, tp); diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index 5f169ded347e..2cc38cd71938 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -175,7 +175,8 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp, static int basic_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, - struct nlattr **tca, void **arg, bool ovr) + struct nlattr **tca, void **arg, bool ovr, + struct netlink_ext_ack *extack) { int err; struct basic_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 8d78e7f4ecc3..fcb831b3917e 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -449,7 +449,7 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp, static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct cls_bpf_head *head = rtnl_dereference(tp->root); struct cls_bpf_prog *oldprog = *arg; diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c index 309d5899265f..b74af0b55820 100644 --- a/net/sched/cls_cgroup.c +++ b/net/sched/cls_cgroup.c @@ -91,7 +91,8 @@ static void cls_cgroup_destroy_rcu(struct rcu_head *root) static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, -void **arg, bool ovr) +void **arg, bool ovr, +struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_CGROUP_MAX + 1]; struct cls_cgroup_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c index 25c2a888e1f0..e944f01d5394 100644 --- a/net/sched/cls_flow.c +++ b/net/sched/cls_flow.c @@ -401,7 +401,7 @@ static void flow_destroy_filter(struct rcu_head *head) static int flow_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct flow_head *head = rtnl_dereference(tp->root); struct flow_filter *fold, *fne
[PATCH bpf-next v2 10/11] netdevsim: add extack support for TC eBPF offload
From: Quentin Monnet Use the recently added extack support for TC eBPF filters in netdevsim. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/netdevsim/bpf.c | 35 --- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c index 5134d5c1306c..0de8ba91b262 100644 --- a/drivers/net/netdevsim/bpf.c +++ b/drivers/net/netdevsim/bpf.c @@ -109,17 +109,35 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, struct netdevsim *ns = cb_priv; struct bpf_prog *oldprog; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(ns->netdev) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NSIM_EA(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + + if (!tc_can_offload_extack(ns->netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NSIM_EA(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; - if (!ns->bpf_tc_accept) + if (!ns->bpf_tc_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject BPF TC offload"); return -EOPNOTSUPP; + } /* Note: progs without skip_sw will probably not be dev bound */ - if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) + if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject unbound programs"); return -EOPNOTSUPP; + } if (cls_bpf->command != TC_CLSBPF_OFFLOAD) return -EOPNOTSUPP; @@ -131,8 +149,11 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, oldprog = NULL; if (!cls_bpf->prog) return 0; - if (ns->bpf_offloaded) + if (ns->bpf_offloaded) { + NSIM_EA(cls_bpf->common.extack, + "driver and netdev offload states mismatch"); return -EBUSY; + } } return nsim_bpf_offload(ns, cls_bpf->prog, oldprog); -- 2.15.1
[PATCH bpf-next v2 00/11] net: sched: add extack support for cls offload
Hi! This series adds extack to cls offloads, as such it could arguably be targeted at net-next. Unfortunately, git am is not able to deal cleanly with minor conflicts on the nfp patches.. Since the series is really about cls_bpf I hope it's OK if it went via the bpf-next tree. Quentin says: This series tries to improve user experience when eBPF hardware offload hits error paths at load time. In particular, it introduces netlink extended ack support in the nfp driver. To that aim, transmission of the pointer to the extack object is piped through the `change()` operation of the existing classifiers (patch 1 to 6). Then it is used for TC offload in the nfp driver (patch 8) and in netdevsim (patch 9, selftest in patch 10). Patch 7 adds a helper to handle extack messages in the core when TC offload is disabled on the net device. For completeness extack is propagated for classifiers other than cls_bpf, but it's up to the drivers to make use of it. v2: - auto-skip the extack message validation in patch 11 (Dave A). Quentin Monnet (11): net: sched: add extack support to change() classifier operation net: sched: prepare extack support for offload via tc_cls_common_offload net: sched: cls_flower: propagate extack support for filter offload net: sched: cls_matchall: propagate extack support for filter offload net: sched: cls_u32: propagate extack support for filter offload net: sched: cls_bpf: plumb extack support in filter for hardware offload net: sched: create tc_can_offload_extack() wrapper nfp: bpf: plumb extack into functions related to XDP offload nfp: bpf: use extack support to improve debugging netdevsim: add extack support for TC eBPF offload selftests/bpf: add checks on extack messages for eBPF hw offload tests drivers/net/ethernet/netronome/nfp/bpf/main.c | 35 +-- drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++-- drivers/net/ethernet/netronome/nfp/nfp_app.h | 9 +- .../net/ethernet/netronome/nfp/nfp_net_common.c| 2 +- drivers/net/netdevsim/bpf.c| 35 +-- include/net/pkt_cls.h | 16 +++- include/net/sch_generic.h | 3 +- net/sched/cls_api.c| 3 +- net/sched/cls_basic.c | 3 +- net/sched/cls_bpf.c| 20 ++-- net/sched/cls_cgroup.c | 3 +- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c | 14 +-- net/sched/cls_fw.c | 2 +- net/sched/cls_matchall.c | 12 ++- net/sched/cls_route.c | 3 +- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c| 3 +- net/sched/cls_u32.c| 21 +++-- tools/testing/selftests/bpf/test_offload.py| 104 +++-- 21 files changed, 221 insertions(+), 97 deletions(-) -- 2.15.1
Re: [PATCH net-next 1/2] cxgb4: rework on-chip memory read
Hi Rahul, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/cxgb4-rework-on-chip-memory-read/20180116-050826 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: sparse: >> incorrect type in assignment (different base types) @@ expected restricted >> __be32 @@ got unsignrestricted __be32 @@ drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: expected restricted __be32 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: got unsigned int vim +29 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c 21 22 unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init, 23 u32 start, u32 offset, u32 size, 24 u32 mem_aperture, u8 *outbuf) 25 { 26 struct adapter *adap = pdbg_init->adap; 27 __be32 *buf = (__be32 *)outbuf; 28 > 29 *buf = le32_to_cpu((__force __le32) 30 t4_read_reg(adap, start + offset)); 31 32 return sizeof(__be32); 33 } 34 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: iproute2 net-next
On Fri, Dec 29, 2017 at 08:00:28PM -0800, Stephen Hemminger wrote: > On Fri, 29 Dec 2017 09:58:23 +0100 > Jiri Pirko wrote: > > > Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote: > > >On 12/26/2017 10:35 AM, Leon Romanovsky wrote: > > >> On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote: > > >>> On Tue, 26 Dec 2017 06:47:43 +0200 > > >>> Leon Romanovsky wrote: > > >>> > > On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote: > > > David Ahern has agreed to take over managing the net-next branch of > > > iproute2. > > > The new location is: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/ > > > > > > In the past, I have accepted new features into iproute2 master > > > branch, but > > > am changing the policy so that outside of the merge window (up until > > > -rc1) > > > new features will get put into net-next to get some more review and > > > testing > > > time. This means that things like the proposed batch streaming mode > > > will > > > go through net-next. > > > > Did you consider to create one shared repo for the iproute2 to allow > > multiple committers workflow? > > >>> > > >>> For now having separate trees is best, there is no need for multiple > > >>> committers the load is very light. > > >>> > > It will be much convenient for the users to have one place for > > master/stable/net-next branches, instead of actually following two > > different repositories. > > >>> > > >>> If you are doing network development, you already need to deal with > > >>> multiple repo's on the kernel side so there is no difference. > > >> > > >> I agree with you that one extra "git remote add .." is not so huge and > > >> all people who develop for the netdev will do it. My concern is about > > >> Documentation and newcomers, who will have a hard time to find a right > > >> tree. > > > > > >I guess it would certainly help to identify the official repo to rebase > > >against much quicker if it would be under a common group on korg e.g. > > > > > > * iproute2/iproute2.git - for current cycle > > > * iproute2/iproute2-next.git- for net-next bits > > > > > >and also be in line with other tooling (ethtool and others), even if > > >not as high volume, but it would make it unambiguous right away from > > >the other, private iproute2 repos on korg, imho. Just a thought. > > > > +1 > > > > I was about to suggest this. This is nice opportunity to do such change. > > > > > > > > > Example, of such shared repo: > > BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ > > Bluetooth: > > https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/ > > RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/ > > >>> > > >>> Most of these are high volume or vendor silo'd which is not the case > > >>> here. > > >Cheers, > > >Daniel > > Good news > kup does support links so could make links from personal to iproute2 directory > > Bad news > kup won't allow me to make iproute2 directory right now. Will have to wait for > Konstantin > Hi, any news on this? Not sure if Konstantin is back already or not. Thanks, Marcelo
Re: [PATCH bpf-next] bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs
On 01/15/2018 08:49 PM, Roman Gushchin wrote: > On Mon, Jan 15, 2018 at 07:32:01PM +, Quentin Monnet wrote: >> 2018-01-15 19:16 UTC+ ~ Roman Gushchin >>> Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs, >>> so the prog show command prints the numeric type value: >>> >>> $ bpftool prog show >>> 1: type 15 name bpf_prog1 tag ac9f93dbfd6d9b74 >>> loaded_at Jan 15/07:58 uid 0 >>> xlated 96B jited 105B memlock 4096B >>> >>> This patch defines the corresponding textual representation: >>> >>> $ bpftool prog show >>> 1: cgroup_device name bpf_prog1 tag ac9f93dbfd6d9b74 >>> loaded_at Jan 15/07:58 uid 0 >>> xlated 96B jited 105B memlock 4096B >>> >>> Signed-off-by: Roman Gushchin >>> Cc: Jakub Kicinski >>> Cc: Quentin Monnet >>> Cc: Daniel Borkmann >>> Cc: Alexei Starovoitov >>> --- >>> tools/bpf/bpftool/prog.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c >>> index c6a28be4665c..099e21cf1b5c 100644 >>> --- a/tools/bpf/bpftool/prog.c >>> +++ b/tools/bpf/bpftool/prog.c >>> @@ -66,6 +66,7 @@ static const char * const prog_type_name[] = { >>> [BPF_PROG_TYPE_LWT_XMIT]= "lwt_xmit", >>> [BPF_PROG_TYPE_SOCK_OPS]= "sock_ops", >>> [BPF_PROG_TYPE_SK_SKB] = "sk_skb", >>> + [BPF_PROG_TYPE_CGROUP_DEVICE] = "cgroup_device", >>> }; >>> >>> static void print_boot_time(__u64 nsecs, char *buf, unsigned int size) >> >> Looks good, thanks Roman! >> Would you mind updating the map names as well? It seems the >> BPF_MAP_TYPE_CPUMAP is missing from the list in map.c. > > Hello, Quentin! > > Sure, I'll take a look. Ok, I presume this comes in as a separate one anyway, so I've applied this one into bpf-next already, thanks Roman!
RE: [patch iproute2 v10 0/2] tc: Add batchsize feature to batch mode
> -Original Message- > From: David Ahern [mailto:dsah...@gmail.com] > Sent: Tuesday, January 16, 2018 12:41 AM > To: Chris Mi ; netdev@vger.kernel.org > Cc: gerlitz...@gmail.com; step...@networkplumber.org; > marcelo.leit...@gmail.com; p...@nwl.cc > Subject: Re: [patch iproute2 v10 0/2] tc: Add batchsize feature to batch mode > > On 1/11/18 10:13 PM, Chris Mi wrote: > > Currently in tc batch mode, only one command is read from the batch > > file and sent to kernel to process. With this patchset, at most 128 > > commands can be accumulated before sending to kernel. > > > > We introduced a new function in patch 1 to support for sending > > multiple messages. In patch 2, we add this support for filter > > add/delete/change/replace and actions add/change/replace commands. > > > > But please note that kernel still processes the requests one by one. > > To process the requests in parallel in kernel is another effort. > > The time we're saving in this patchset is the user mode and kernel > > mode context switch. So this patchset works on top of the current kernel. > > > > Using the following script in kernel, we can generate 1,000,000 rules. > > tools/testing/selftests/tc-testing/tdc_batch.py > > > > Without this patchset, 'tc -b $file' exection time is: > > > > real0m15.555s > > user0m7.211s > > sys 0m8.284s > > > > With this patchset, 'tc -b $file' exection time is: > > > > real0m12.360s > > user0m6.082s > > sys 0m6.213s > > > > The insertion rate is improved more than 10%. > > LGTM. Applied to iproute2-next. Thank you, David. And thanks for your careful review that improves the code quality and design very much. -Chris
Re: [PATCH bpf-next 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
On Mon, 15 Jan 2018 18:02:17 -0700, David Ahern wrote: > On 1/15/18 5:55 PM, Jakub Kicinski wrote: > > On Mon, 15 Jan 2018 17:49:07 -0700, David Ahern wrote: > >> On 1/15/18 5:30 PM, Jakub Kicinski wrote: > >>> A new flag ("--skip-extack") is added to the Python script so as to > >>> allow to skip these checks. This is because extack messages cannot be > >>> displayed by tc and ip if tools from iproute2 package were compiled > >>> without libmnl, but we do not want this to prevent users to run the > >>> other checks. > >> > >> That is unfortunate. Did you consider auto-detecting support? e.g., run > >> a command that is known to fail and return a message. For example, > >> > >> $ ip ro add 1.1.1.1/32 dev eth0 onlink > >> Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set. > > > > That's a good idea. And then if it fails would you suggest to skip the > > test entirely or just skip extack checks? With the --skip-extack flag > > the hope was that people would be inclined to install libmnl to save > > themselves the hassle of setting the flag each time. If it's detected > > automatically there is no hassle/nudge to install libmnl.. > > > > My inclination is to avoid unnecessary options and adapt to the > environment. ie., just skip the checks. You can still nudge users with a > warning that it appears ip or tc does not have extack support. OK, I'll respin shortly, thanks!
Re: [PATCH bpf-next 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
On 1/15/18 5:55 PM, Jakub Kicinski wrote: > On Mon, 15 Jan 2018 17:49:07 -0700, David Ahern wrote: >> On 1/15/18 5:30 PM, Jakub Kicinski wrote: >>> A new flag ("--skip-extack") is added to the Python script so as to >>> allow to skip these checks. This is because extack messages cannot be >>> displayed by tc and ip if tools from iproute2 package were compiled >>> without libmnl, but we do not want this to prevent users to run the >>> other checks. >> >> That is unfortunate. Did you consider auto-detecting support? e.g., run >> a command that is known to fail and return a message. For example, >> >> $ ip ro add 1.1.1.1/32 dev eth0 onlink >> Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set. > > That's a good idea. And then if it fails would you suggest to skip the > test entirely or just skip extack checks? With the --skip-extack flag > the hope was that people would be inclined to install libmnl to save > themselves the hassle of setting the flag each time. If it's detected > automatically there is no hassle/nudge to install libmnl.. > My inclination is to avoid unnecessary options and adapt to the environment. ie., just skip the checks. You can still nudge users with a warning that it appears ip or tc does not have extack support.
Re: [PATCH bpf-next 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
On Mon, 15 Jan 2018 17:49:07 -0700, David Ahern wrote: > On 1/15/18 5:30 PM, Jakub Kicinski wrote: > > A new flag ("--skip-extack") is added to the Python script so as to > > allow to skip these checks. This is because extack messages cannot be > > displayed by tc and ip if tools from iproute2 package were compiled > > without libmnl, but we do not want this to prevent users to run the > > other checks. > > That is unfortunate. Did you consider auto-detecting support? e.g., run > a command that is known to fail and return a message. For example, > > $ ip ro add 1.1.1.1/32 dev eth0 onlink > Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set. That's a good idea. And then if it fails would you suggest to skip the test entirely or just skip extack checks? With the --skip-extack flag the hope was that people would be inclined to install libmnl to save themselves the hassle of setting the flag each time. If it's detected automatically there is no hassle/nudge to install libmnl..
Re: [PATCH bpf-next 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
On 1/15/18 5:30 PM, Jakub Kicinski wrote: > A new flag ("--skip-extack") is added to the Python script so as to > allow to skip these checks. This is because extack messages cannot be > displayed by tc and ip if tools from iproute2 package were compiled > without libmnl, but we do not want this to prevent users to run the > other checks. That is unfortunate. Did you consider auto-detecting support? e.g., run a command that is known to fail and return a message. For example, $ ip ro add 1.1.1.1/32 dev eth0 onlink Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set.
Re: [PATCH] netfilter: nf_tables: flow_offload depends on flow_table
On Fri, Jan 12, 2018 at 04:50:26PM +0100, Arnd Bergmann wrote: > Without CONFIG_NF_FLOW_TABLE, the new nft_flow_offload module produces > a link error: > > net/netfilter/nft_flow_offload.o: In function > `nft_flow_offload_iterate_cleanup': > nft_flow_offload.c:(.text+0xb0): undefined reference to > `nf_flow_table_iterate' > net/netfilter/nft_flow_offload.o: In function `flow_offload_iterate_cleanup': > nft_flow_offload.c:(.text+0x160): undefined reference to `flow_offload_dead' > net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_eval': > nft_flow_offload.c:(.text+0xc4c): undefined reference to `flow_offload_alloc' > nft_flow_offload.c:(.text+0xc64): undefined reference to `flow_offload_add' > nft_flow_offload.c:(.text+0xc94): undefined reference to `flow_offload_free' > > This adds a Kconfig dependency for it. Applied, thanks Arnd.
Re: [PATCH net-next 1/2] netfilter: nf_defrag: mark xt_table structures 'const' again
On Mon, Jan 15, 2018 at 04:49:05PM +0100, Arnd Bergmann wrote: > As a side-effect of adding the module option, we now get a section > mismatch warning: > > WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in > reference from the variable packet_raw to the function > .init.text:iptable_raw_table_init() > The variable packet_raw references > the function __init iptable_raw_table_init() > If the reference is valid then annotate the > variable with __init* or __refdata (see linux/init.h) or name the variable: > *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console Applied, thanks Arnd.
Re: [PATCH net-next 2/2] netfilter: nf_defrag: move NF_CONNTRACK bits into #ifdef
On Mon, Jan 15, 2018 at 04:49:06PM +0100, Arnd Bergmann wrote: > We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is > disabled: > > net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag': > net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no > member named '_nfct' > net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag': > net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has > no member named '_nfct' > > Both functions already have an #ifdef for this, so let's move the > check in there. Also applied, thanks.
[PATCH bpf-next 03/11] net: sched: cls_flower: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_flower, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_flower.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index f1640a98a3d2..fe7d96d12435 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -234,14 +234,15 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) static int fl_hw_replace_filter(struct tcf_proto *tp, struct flow_dissector *dissector, struct fl_flow_key *mask, - struct cls_fl_filter *f) + struct cls_fl_filter *f, + struct netlink_ext_ack *extack) { struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; bool skip_sw = tc_skip_sw(f->flags); int err; - tc_cls_common_offload_init(&cls_flower.common, tp, NULL); + tc_cls_common_offload_init(&cls_flower.common, tp, extack); cls_flower.command = TC_CLSFLOWER_REPLACE; cls_flower.cookie = (unsigned long) f; cls_flower.dissector = dissector; @@ -939,7 +940,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, err = fl_hw_replace_filter(tp, &head->dissector, &mask.key, - fnew); + fnew, + extack); if (err) goto errout_idr; } -- 2.15.1
[PATCH bpf-next 00/11] net: sched: add extack support for cls offload
Hi! This series adds extack to cls offloads, as such it could arguably be targeted at net-next. Unfortunately, git am is not able to deal cleanly with minor conflicts on the nfp patches.. Since the series is really about cls_bpf I hope it's OK if it went via the bpf-next tree. Quentin says: This series tries to improve user experience when eBPF hardware offload hits error paths at load time. In particular, it introduces netlink extended ack support in the nfp driver. To that aim, transmission of the pointer to the extack object is piped through the `change()` operation of the existing classifiers (patch 1 to 6). Then it is used for TC offload in the nfp driver (patch 8) and in netdevsim (patch 9, selftest in patch 10). Patch 7 adds a helper to handle extack messages in the core when TC offload is disabled on the net device. For completeness extack is propagated for classifiers other than cls_bpf, but it's up to the drivers to make use of it. Quentin Monnet (11): net: sched: add extack support to change() classifier operation net: sched: prepare extack support for offload via tc_cls_common_offload net: sched: cls_flower: propagate extack support for filter offload net: sched: cls_matchall: propagate extack support for filter offload net: sched: cls_u32: propagate extack support for filter offload net: sched: cls_bpf: plumb extack support in filter for hardware offload net: sched: create tc_can_offload_extack() wrapper nfp: bpf: plumb extack into functions related to XDP offload nfp: bpf: use extack support to improve debugging netdevsim: add extack support for TC eBPF offload selftests/bpf: add checks on extack messages for eBPF hw offload tests drivers/net/ethernet/netronome/nfp/bpf/main.c | 35 +--- drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 -- drivers/net/ethernet/netronome/nfp/nfp_app.h | 9 +- .../net/ethernet/netronome/nfp/nfp_net_common.c| 2 +- drivers/net/netdevsim/bpf.c| 35 ++-- include/net/pkt_cls.h | 16 +++- include/net/sch_generic.h | 3 +- net/sched/cls_api.c| 3 +- net/sched/cls_basic.c | 3 +- net/sched/cls_bpf.c| 20 +++-- net/sched/cls_cgroup.c | 3 +- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c | 14 ++-- net/sched/cls_fw.c | 2 +- net/sched/cls_matchall.c | 12 +-- net/sched/cls_route.c | 3 +- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c| 3 +- net/sched/cls_u32.c| 21 ++--- tools/testing/selftests/bpf/test_offload.py| 97 -- 21 files changed, 214 insertions(+), 97 deletions(-) -- 2.15.1
[PATCH bpf-next 01/11] net: sched: add extack support to change() classifier operation
From: Quentin Monnet Add an extra argument to `->change()` operation for passing a pointer to a struct netlink_ext_ack. Update the operation for all classifiers accordingly. Extack is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/sch_generic.h | 3 ++- net/sched/cls_api.c | 3 ++- net/sched/cls_basic.c | 3 ++- net/sched/cls_bpf.c | 2 +- net/sched/cls_cgroup.c| 3 ++- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c| 2 +- net/sched/cls_fw.c| 2 +- net/sched/cls_matchall.c | 2 +- net/sched/cls_route.c | 3 ++- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c | 3 ++- net/sched/cls_u32.c | 3 ++- 13 files changed, 20 insertions(+), 13 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index ac029d5d88e4..5e77f2639c67 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -232,7 +232,8 @@ struct tcf_proto_ops { int (*change)(struct net *net, struct sk_buff *, struct tcf_proto*, unsigned long, u32 handle, struct nlattr **, - void **, bool); + void **, bool, + struct netlink_ext_ack *); int (*delete)(struct tcf_proto*, void *, bool*); void(*walk)(struct tcf_proto*, struct tcf_walker *arg); void(*bind_class)(void *, u32, unsigned long); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 6708b6953bfa..0460cc22d48c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -912,7 +912,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, } err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh, - n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE); + n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE, + extack); if (err == 0) { if (tp_created) tcf_chain_tp_insert(chain, &chain_info, tp); diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index 5f169ded347e..2cc38cd71938 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -175,7 +175,8 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp, static int basic_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, - struct nlattr **tca, void **arg, bool ovr) + struct nlattr **tca, void **arg, bool ovr, + struct netlink_ext_ack *extack) { int err; struct basic_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 8d78e7f4ecc3..fcb831b3917e 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -449,7 +449,7 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp, static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct cls_bpf_head *head = rtnl_dereference(tp->root); struct cls_bpf_prog *oldprog = *arg; diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c index 309d5899265f..b74af0b55820 100644 --- a/net/sched/cls_cgroup.c +++ b/net/sched/cls_cgroup.c @@ -91,7 +91,8 @@ static void cls_cgroup_destroy_rcu(struct rcu_head *root) static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, -void **arg, bool ovr) +void **arg, bool ovr, +struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_CGROUP_MAX + 1]; struct cls_cgroup_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c index 25c2a888e1f0..e944f01d5394 100644 --- a/net/sched/cls_flow.c +++ b/net/sched/cls_flow.c @@ -401,7 +401,7 @@ static void flow_destroy_filter(struct rcu_head *head) static int flow_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct flow_head *head = rtnl_dereference(tp->root); struct flow_filter *fold, *fne
[PATCH bpf-next 11/11] selftests/bpf: add checks on extack messages for eBPF hw offload tests
From: Quentin Monnet Add checks to test that netlink extack messages are correctly displayed in some expected error cases for eBPF offload to netdevsim with TC and XDP. A new flag ("--skip-extack") is added to the Python script so as to allow to skip these checks. This is because extack messages cannot be displayed by tc and ip if tools from iproute2 package were compiled without libmnl, but we do not want this to prevent users to run the other checks. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- tools/testing/selftests/bpf/test_offload.py | 97 + 1 file changed, 71 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py index e3c750f17cb8..94ced38f650b 100755 --- a/tools/testing/selftests/bpf/test_offload.py +++ b/tools/testing/selftests/bpf/test_offload.py @@ -131,7 +131,7 @@ netns = [] # net namespaces to be removed if f in files: files.remove(f) -def tool(name, args, flags, JSON=True, ns="", fail=True): +def tool(name, args, flags, JSON=True, ns="", fail=True, include_stderr=False): params = "" if JSON: params += "%s " % (flags["json"]) @@ -139,9 +139,20 @@ netns = [] # net namespaces to be removed if ns != "": ns = "ip netns exec %s " % (ns) -ret, out = cmd(ns + name + " " + params + args, fail=fail) -if JSON and len(out.strip()) != 0: -return ret, json.loads(out) +if include_stderr: +ret, stdout, stderr = cmd(ns + name + " " + params + args, + fail=fail, include_stderr=True) +else: +ret, stdout = cmd(ns + name + " " + params + args, + fail=fail, include_stderr=False) + +if JSON and len(stdout.strip()) != 0: +out = json.loads(stdout) +else: +out = stdout + +if include_stderr: +return ret, out, stderr else: return ret, out @@ -164,13 +175,15 @@ netns = [] # net namespaces to be removed time.sleep(0.05) raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs)) -def ip(args, force=False, JSON=True, ns="", fail=True): +def ip(args, force=False, JSON=True, ns="", fail=True, include_stderr=False): if force: args = "-force " + args -return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail) +return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail, +include_stderr=include_stderr) -def tc(args, JSON=True, ns="", fail=True): -return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail) +def tc(args, JSON=True, ns="", fail=True, include_stderr=False): +return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail, +include_stderr=include_stderr) def ethtool(dev, opt, args, fail=True): return cmd("ethtool %s %s %s" % (opt, dev["ifname"], args), fail=fail) @@ -311,13 +324,13 @@ netns = [] # net namespaces to be removed return ip("link set dev %s mtu %d" % (self.dev["ifname"], mtu), fail=fail) -def set_xdp(self, bpf, mode, force=False, fail=True): +def set_xdp(self, bpf, mode, force=False, fail=True, include_stderr=False): return ip("link set dev %s xdp%s %s" % (self.dev["ifname"], mode, bpf), - force=force, fail=fail) + force=force, fail=fail, include_stderr=include_stderr) -def unset_xdp(self, mode, force=False, fail=True): +def unset_xdp(self, mode, force=False, fail=True, include_stderr=False): return ip("link set dev %s xdp%s off" % (self.dev["ifname"], mode), - force=force, fail=fail) + force=force, fail=fail, include_stderr=include_stderr) def ip_link_show(self, xdp): _, link = ip("link show dev %s" % (self['ifname'])) @@ -373,7 +386,7 @@ netns = [] # net namespaces to be removed return filters def cls_bpf_add_filter(self, bpf, da=False, skip_sw=False, skip_hw=False, - fail=True): + fail=True, include_stderr=False): params = "" if da: params += " da" @@ -382,7 +395,8 @@ netns = [] # net namespaces to be removed if skip_hw: params += " skip_hw" return tc("filter add dev %s ingress bpf %s %s" % - (self['ifname'], bpf, params), fail=fail) + (self['ifname'], bpf, params), fail=fail, + include_stderr=include_stderr) def set_ethtool_tc_offloads(self, enable, fail=True): args = "hw-tc-offload %s" % ("on" if enable else "off") @@ -434,9 +448,18 @@ netns = [] # net namespaces to be removed fail(dev["ns_dev"] != 0, "Device perameters not zero on removed") fail(dev["ns_inode"] != 0, "Device perameters not zero on removed") +def check_extack(output, reference,
[PATCH bpf-next 04/11] net: sched: cls_matchall: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_matchall, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_matchall.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 24bec37e9747..fe6b673db5c6 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -85,14 +85,15 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, static int mall_replace_hw_filter(struct tcf_proto *tp, struct cls_mall_head *head, - unsigned long cookie) + unsigned long cookie, + struct netlink_ext_ack *extack) { struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; bool skip_sw = tc_skip_sw(head->flags); int err; - tc_cls_common_offload_init(&cls_mall.common, tp, NULL); + tc_cls_common_offload_init(&cls_mall.common, tp, extack); cls_mall.command = TC_CLSMATCHALL_REPLACE; cls_mall.exts = &head->exts; cls_mall.cookie = cookie; @@ -202,7 +203,8 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, goto err_set_parms; if (!tc_skip_hw(new->flags)) { - err = mall_replace_hw_filter(tp, new, (unsigned long) new); + err = mall_replace_hw_filter(tp, new, (unsigned long)new, +extack); if (err) goto err_replace_hw_filter; } -- 2.15.1
[PATCH bpf-next 07/11] net: sched: create tc_can_offload_extack() wrapper
From: Quentin Monnet Create a wrapper around tc_can_offload() that takes an additional extack pointer argument in order to output an error message if TC offload is disabled on the device. In this way, the error message is handled by the core and can be the same for all drivers. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index c88c61234cb3..a3ad6a5a2d12 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -644,6 +644,17 @@ static inline bool tc_can_offload(const struct net_device *dev) return dev->features & NETIF_F_HW_TC; } +static inline bool tc_can_offload_extack(const struct net_device *dev, +struct netlink_ext_ack *extack) +{ + bool can = tc_can_offload(dev); + + if (!can) + NL_SET_ERR_MSG(extack, "TC offload is disabled on net device"); + + return can; +} + static inline bool tc_skip_hw(u32 flags) { return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false; -- 2.15.1
[PATCH bpf-next 02/11] net: sched: prepare extack support for offload via tc_cls_common_offload
From: Quentin Monnet Prepare for extack support for hardware offload of classifiers. In order to achieve this, a pointer to a struct netlink_ext_ack is added to the struct tc_cls_common_offload that is passed to the callback for setting up the classifier. Function tc_cls_common_offload_init() is updated to support initialization of this new attribute. Extack plumbing is not complete yet. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h| 5 - net/sched/cls_bpf.c | 4 ++-- net/sched/cls_flower.c | 6 +++--- net/sched/cls_matchall.c | 4 ++-- net/sched/cls_u32.c | 8 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 0d1343cba84c..c88c61234cb3 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -590,15 +590,18 @@ struct tc_cls_common_offload { u32 chain_index; __be16 protocol; u32 prio; + struct netlink_ext_ack *extack; }; static inline void tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common, - const struct tcf_proto *tp) + const struct tcf_proto *tp, + struct netlink_ext_ack *extack) { cls_common->chain_index = tp->chain->index; cls_common->protocol = tp->protocol; cls_common->prio = tp->prio; + cls_common->extack = extack; } struct tc_cls_u32_knode { diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index fcb831b3917e..b8729124d209 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -158,7 +158,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, skip_sw = prog && tc_skip_sw(prog->gen_flags); obj = prog ?: oldprog; - tc_cls_common_offload_init(&cls_bpf.common, tp); + tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); cls_bpf.command = TC_CLSBPF_OFFLOAD; cls_bpf.exts = &obj->exts; cls_bpf.prog = prog ? prog->filter : NULL; @@ -215,7 +215,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp, struct tcf_block *block = tp->chain->block; struct tc_cls_bpf_offload cls_bpf = {}; - tc_cls_common_offload_init(&cls_bpf.common, tp); + tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); cls_bpf.command = TC_CLSBPF_STATS; cls_bpf.exts = &prog->exts; cls_bpf.prog = prog->filter; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 998ee4faf934..f1640a98a3d2 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_DESTROY; cls_flower.cookie = (unsigned long) f; @@ -241,7 +241,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(f->flags); int err; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_REPLACE; cls_flower.cookie = (unsigned long) f; cls_flower.dissector = dissector; @@ -270,7 +270,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_flower.common, tp); + tc_cls_common_offload_init(&cls_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_STATS; cls_flower.cookie = (unsigned long) f; cls_flower.exts = &f->exts; diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index dc3c57116bbd..24bec37e9747 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -76,7 +76,7 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(&cls_mall.common, tp); + tc_cls_common_offload_init(&cls_mall.common, tp, NULL); cls_mall.command = TC_CLSMATCHALL_DESTROY; cls_mall.cookie = cookie; @@ -92,7 +92,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(head->flags); int err; - tc_cls_common_offload_init(&cls_mall.common, tp); + tc_cls_common_offload_init(&cls_mall.common, tp, NULL); cls_mall.command = TC_CLSMATCHALL_REPLACE; cls_mall.exts = &head->exts; cls_mall.cookie = cookie; diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3ef5c32741c1..8127b15d8d50 100644 --- a/n
[PATCH bpf-next 09/11] nfp: bpf: use extack support to improve debugging
From: Quentin Monnet Use the recently added extack support for eBPF offload in the driver. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c| 31 ++-- drivers/net/ethernet/netronome/nfp/bpf/main.h| 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++--- 3 files changed, 39 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index e8816ab8fb63..a638c3ab6b61 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -70,7 +70,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, if (prog && running && !xdp_running) return -EBUSY; - ret = nfp_net_bpf_offload(nn, prog, running); + ret = nfp_net_bpf_offload(nn, prog, running, extack); /* Stop offload if replace not possible */ if (ret && prog) nfp_bpf_xdp_offload(app, nn, NULL, extack); @@ -125,17 +125,31 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, struct nfp_bpf_vnic *bv; int err; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(nn->dp.netdev) || - !nfp_net_ebpf_capable(nn) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + if (!tc_can_offload_extack(nn->dp.netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + if (!nfp_net_ebpf_capable(nn)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "NFP firmware does not support eBPF offload"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; /* Only support TC direct action */ if (!cls_bpf->exts_integrated || tcf_exts_has_actions(cls_bpf->exts)) { - nn_err(nn, "only direct action with no legacy actions supported\n"); + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only direct action with no legacy actions supported"); return -EOPNOTSUPP; } @@ -152,7 +166,8 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, return 0; } - err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog); + err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog, + cls_bpf->common.extack); if (err) return err; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index b80e75a8ecda..80855d43b25e 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -334,7 +334,7 @@ struct nfp_net; int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf); int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog, - bool old_prog); + bool old_prog, struct netlink_ext_ack *extack); struct nfp_insn_meta * nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index e2859b2e9c6a..9c78a09cda24 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -271,7 +271,9 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf) } } -static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) +static int +nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog, +struct netlink_ext_ack *extack) { struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; unsigned int max_mtu; @@ -281,7 +283,7 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) max_mtu = nn_readb(nn, NFP_NET_CFG_BPF_INL_MTU) * 64 - 32; if (max_mtu < nn->dp.netdev->mtu) { - nn_info(nn, "BPF offload not supported with MTU larger than HW packet split boundary\n"); + NL_SET_ERR_MSG_MOD(extack, "BPF offload not supported with MTU larger than HW packet split boundary"); return -EOPNOTSUPP; } @@ -303,7 +305,8 @@ static int nfp_net_bpf_load(struct n
[PATCH bpf-next 05/11] net: sched: cls_u32: propagate extack support for filter offload
From: Quentin Monnet Propagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_u32, and feed it to tc_cls_common_offload_init(). This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_u32.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 8127b15d8d50..ef1b746de80b 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -501,7 +501,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h) } static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; @@ -509,7 +509,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h, bool offloaded = false; int err; - tc_cls_common_offload_init(&cls_u32.common, tp, NULL); + tc_cls_common_offload_init(&cls_u32.common, tp, extack); cls_u32.command = TC_CLSU32_NEW_HNODE; cls_u32.hnode.divisor = h->divisor; cls_u32.hnode.handle = h->handle; @@ -542,14 +542,14 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle) } static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; bool skip_sw = tc_skip_sw(flags); int err; - tc_cls_common_offload_init(&cls_u32.common, tp, NULL); + tc_cls_common_offload_init(&cls_u32.common, tp, extack); cls_u32.command = TC_CLSU32_REPLACE_KNODE; cls_u32.knode.handle = n->handle; cls_u32.knode.fshift = n->fshift; @@ -943,7 +943,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return err; } - err = u32_replace_hw_knode(tp, new, flags); + err = u32_replace_hw_knode(tp, new, flags, extack); if (err) { u32_destroy_key(tp, new, false); return err; @@ -990,7 +990,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, ht->prio = tp->prio; idr_init(&ht->handle_idr); - err = u32_replace_hw_hnode(tp, ht, flags); + err = u32_replace_hw_hnode(tp, ht, flags, extack); if (err) { idr_remove_ext(&tp_c->handle_idr, handle); kfree(ht); @@ -1088,7 +1088,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, struct tc_u_knode __rcu **ins; struct tc_u_knode *pins; - err = u32_replace_hw_knode(tp, n, flags); + err = u32_replace_hw_knode(tp, n, flags, extack); if (err) goto errhw; -- 2.15.1
[PATCH bpf-next 06/11] net: sched: cls_bpf: plumb extack support in filter for hardware offload
From: Quentin Monnet Pass the extack pointer obtained in the `->change()` filter operation to cls_bpf_offload() and then to cls_bpf_offload_cmd(), where it can be used to initialize the new field of tc_cls_common_offload passed to the callback for offload. This makes it possible to use this extack pointer in drivers offloading BPF programs in a future patch. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_bpf.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index b8729124d209..d15ef9ab7243 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -147,7 +147,8 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog) } static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, - struct cls_bpf_prog *oldprog) + struct cls_bpf_prog *oldprog, + struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_bpf_offload cls_bpf = {}; @@ -158,7 +159,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, skip_sw = prog && tc_skip_sw(prog->gen_flags); obj = prog ?: oldprog; - tc_cls_common_offload_init(&cls_bpf.common, tp, NULL); + tc_cls_common_offload_init(&cls_bpf.common, tp, extack); cls_bpf.command = TC_CLSBPF_OFFLOAD; cls_bpf.exts = &obj->exts; cls_bpf.prog = prog ? prog->filter : NULL; @@ -170,7 +171,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw); if (prog) { if (err < 0) { - cls_bpf_offload_cmd(tp, oldprog, prog); + cls_bpf_offload_cmd(tp, oldprog, prog, extack); return err; } else if (err > 0) { prog->gen_flags |= TCA_CLS_FLAGS_IN_HW; @@ -184,7 +185,8 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, } static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog, - struct cls_bpf_prog *oldprog) + struct cls_bpf_prog *oldprog, + struct netlink_ext_ack *extack) { if (prog && oldprog && prog->gen_flags != oldprog->gen_flags) return -EINVAL; @@ -196,7 +198,7 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog, if (!prog && !oldprog) return 0; - return cls_bpf_offload_cmd(tp, prog, oldprog); + return cls_bpf_offload_cmd(tp, prog, oldprog, extack); } static void cls_bpf_stop_offload(struct tcf_proto *tp, @@ -204,7 +206,7 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp, { int err; - err = cls_bpf_offload_cmd(tp, NULL, prog); + err = cls_bpf_offload_cmd(tp, NULL, prog, NULL); if (err) pr_err("Stopping hardware offload failed: %d\n", err); } @@ -501,7 +503,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, if (ret < 0) goto errout_idr; - ret = cls_bpf_offload(tp, prog, oldprog); + ret = cls_bpf_offload(tp, prog, oldprog, extack); if (ret) goto errout_parms; -- 2.15.1
[PATCH bpf-next 10/11] netdevsim: add extack support for TC eBPF offload
From: Quentin Monnet Use the recently added extack support for TC eBPF filters in netdevsim. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/netdevsim/bpf.c | 35 --- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c index 5134d5c1306c..0de8ba91b262 100644 --- a/drivers/net/netdevsim/bpf.c +++ b/drivers/net/netdevsim/bpf.c @@ -109,17 +109,35 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, struct netdevsim *ns = cb_priv; struct bpf_prog *oldprog; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(ns->netdev) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NSIM_EA(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + + if (!tc_can_offload_extack(ns->netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NSIM_EA(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; - if (!ns->bpf_tc_accept) + if (!ns->bpf_tc_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject BPF TC offload"); return -EOPNOTSUPP; + } /* Note: progs without skip_sw will probably not be dev bound */ - if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) + if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject unbound programs"); return -EOPNOTSUPP; + } if (cls_bpf->command != TC_CLSBPF_OFFLOAD) return -EOPNOTSUPP; @@ -131,8 +149,11 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, oldprog = NULL; if (!cls_bpf->prog) return 0; - if (ns->bpf_offloaded) + if (ns->bpf_offloaded) { + NSIM_EA(cls_bpf->common.extack, + "driver and netdev offload states mismatch"); return -EBUSY; + } } return nsim_bpf_offload(ns, cls_bpf->prog, oldprog); -- 2.15.1
[PATCH bpf-next 08/11] nfp: bpf: plumb extack into functions related to XDP offload
From: Quentin Monnet Pass a pointer to an extack object to nfp_app_xdp_offload() in order to prepare for extack usage in the nfp driver. Next step will be to forward this extack pointer to nfp_net_bpf_offload(), once this function is able to use it for printing error messages. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfp_app.h| 9 ++--- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index 8823c8360047..e8816ab8fb63 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -54,7 +54,7 @@ static bool nfp_net_ebpf_capable(struct nfp_net *nn) static int nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, struct netlink_ext_ack *extack) { bool running, xdp_running; int ret; @@ -73,7 +73,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, ret = nfp_net_bpf_offload(nn, prog, running); /* Stop offload if replace not possible */ if (ret && prog) - nfp_bpf_xdp_offload(app, nn, NULL); + nfp_bpf_xdp_offload(app, nn, NULL, extack); nn->dp.bpf_offload_xdp = prog && !ret; return ret; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 6a6eb02b516e..1229a34f8da5 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -43,6 +43,7 @@ struct bpf_prog; struct net_device; struct netdev_bpf; +struct netlink_ext_ack; struct pci_dev; struct sk_buff; struct sk_buff; @@ -134,7 +135,8 @@ struct nfp_app_type { int (*bpf)(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *xdp); int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog); + struct bpf_prog *prog, + struct netlink_ext_ack *extack); int (*sriov_enable)(struct nfp_app *app, int num_vfs); void (*sriov_disable)(struct nfp_app *app); @@ -320,11 +322,12 @@ static inline int nfp_app_bpf(struct nfp_app *app, struct nfp_net *nn, } static inline int nfp_app_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, + struct netlink_ext_ack *extack) { if (!app || !app->type->xdp_offload) return -EOPNOTSUPP; - return app->type->xdp_offload(app, nn, prog); + return app->type->xdp_offload(app, nn, prog, extack); } static inline bool __nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 2b5cad3069a7..14f23e8d27fa 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3395,7 +3395,7 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags, if (err) return err; - err = nfp_app_xdp_offload(nn->app, nn, offload_prog); + err = nfp_app_xdp_offload(nn->app, nn, offload_prog, extack); if (err && flags & XDP_FLAGS_HW_MODE) return err; -- 2.15.1
Re: [PATCH 4.15-rc8] net/core: Increase default optmem_max limit
With the new Linux Kernel Crypto API User Space Interface and its underlying socket interface, the current default value for `net.core.optmem_max` can be exhausted pretty quick. On 32 bit systems it is not even enough for sending 16 IOVECs at once to the socket interface. To provide consumers of this new user space interface a sufficient and reasonable maximum ancillary buffer size per socket by default, the limit is increased to four times of the previous setting: * 32 bit systems: from 10240 bytes to 40960 bytes * 64 bit systems: from 20480 bytes to 81920 bytes This allows for sending 32/64 (32/64 bit) parallel IOVECs at once to the socket interface, which should be enough for use in real world applications. Signed-off-by: Björn Esser --- Index: linux-4.15/net/core/sock.c === --- linux-4.15.orig/net/core/sock.c +++ linux-4.15/net/core/sock.c @@ -316,7 +316,7 @@ __u32 sysctl_wmem_default __read_mostly __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX; /* Maximal space eaten by iovec or ancillary data plus some space */ -int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512); +int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*4*(2*UIO_MAXIOV+512); EXPORT_SYMBOL(sysctl_optmem_max); int sysctl_tstamp_allow_data __read_mostly = 1;
Re: [PATCH v2] bnx2x: disable GSO where gso_size is too big for hardware
David Miller writes: > From: Daniel Axtens > Date: Fri, 12 Jan 2018 10:59:05 +1100 > >> If a bnx2x card is passed a GSO packet with a gso_size larger than >> ~9700 bytes, it will cause a firmware error that will bring the card >> down: >> >> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert! >> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2 >> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = >> 0x 0x25e43e47 0x00463e01 0x00010052 >> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW >> Version: 7_13_1 >> ... (dump of values continues) ... >> >> Detect when gso_size + header length is greater than the maximum >> packet size (9700 bytes) and disable GSO. For simplicity and speed >> this is approximated by comparing gso_size against 9200 and assuming >> no-one will have more than 500 bytes of headers. > > What is the MTU size configured on the bnx2x device when these 9700 > byte packets are seen? > > If it's less than 9700, whatever is allowing your device (openvswitch, > ibmveth, whatever) needs to be fixed. Sure, I had an approach that checks the gso_size in is_skb_forwardable and the equivalent openvswitch path - I'll send that. Regards, Daniel > > I don't like this at all, quite frankly. We'll have one device now that > has this special check, probably many others can run into this situation > as well but they won't be used on these kinds of powerpc boxes and > therefore nobody is going to notice. > > I'm not applying this without more information or better justification, > sorry.
[PATCH 4.15-rc8] net/core: Increase default optmem_max limit
With the new Linux Kernel Crypto API User Space Interface and its underlying socket interface, the current default value for `net.core.optmem_max` can be exhausted pretty quick. On 32 bit systems it is not even enough for sending 16 IOVECs at once to the socket interface. To provide consumers of this new user space interface a sufficient and reasonable maximum ancillary buffer size per socket by default, the limit is increased to four times of the previous setting: * 32 bit systems: from 10240 bytes to 40960 bytes * 64 bit systems: from 20480 bytes to 81920 bytes This allows for sending 32/64 (32/64 bit) parallel IOVECs at once to the socket interface, which should be enough for use in real world applications. Signed-off-by: Björn Esser --- Index: linux-4.15/net/core/sock.c === --- linux-4.15.orig/net/core/sock.c +++ linux-4.15/net/core/sock.c @@ -316,7 +316,7 @@ __u32 sysctl_wmem_default __read_mostly __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX; /* Maximal space eaten by iovec or ancillary data plus some space */ -int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512); +int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*4*(2*UIO_MAXIOV+512); EXPORT_SYMBOL(sysctl_optmem_max); int sysctl_tstamp_allow_data __read_mostly = 1; signature.asc Description: This is a digitally signed message part
[PATCH 4.15-rc8] net/core: Increase default optmem_max limit
With the new Linux Kernel Crypto API User Space Interface and its underlying socket interface, the current default value for `net.core.optmem_max` can be exhausted pretty quick. On 32 bit systems it is not even enough for sending 16 IOVECs at once to the socket interface. To provide consumers of this new user space interface a sufficient and reasonable maximum ancillary buffer size per socket by default, the limit is increased to four times of the previous setting: * 32 bit systems: from 10240 bytes to 40960 bytes * 64 bit systems: from 20480 bytes to 81920 bytes This allows for sending 32/64 (32/64 bit) parallel IOVECs at once to the socket interface, which should be enough for use in real world applications. Signed-off-by: Björn Esser --- Index: linux-4.15/net/core/sock.c === --- linux-4.15.orig/net/core/sock.c +++ linux- 4.15/net/core/sock.c @@ -316,7 +316,7 @@ __u32 sysctl_wmem_default __read_mostly __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX; /* Maximal space eaten by iovec or ancillary data plus some space */ -int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512); +int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*4*(2*UIO_MAXIOV+512); EXPORT_SYMBOL(sysctl_optmem_max); int sysctl_tstamp_allow_data __read_mostly = 1;
Re: [RFC PATCH net-next 00/12] selftests: forwarding: Add VRF-based tests
On 1/15/18 4:17 PM, Jiri Pirko wrote: >> A couple of feature requests: >> 1. an option to pause on any error to allow inspection of the setup > > Good idea. Should be easy to add. Here is a snippet from my vrf test script: PAUSE_ON_FAIL=no -p option sets PAUSE_ON_FAIL=yes log_test() { local rc=$1 local expected=$2 local msg="$3" if [ ${rc} -eq ${expected} ]; then nsuccess=$((nsuccess+1)) printf "\nTEST: %-80s [ OK ]\n" "${msg}" else nfail=$((nfail+1)) printf "\nTEST: %-80s [FAIL]\n" "${msg}" if [ "${PAUSE_ON_FAIL}" = "yes" ]; then echo echo "hit enter to continue, 'q' to quit" read a [ "$a" = "q" ] && exit 1 fi fi } 'rc' is the return code from the command run 'expected' is the expected return code. This allows negative testing where rc can be non-0 and the test passes 'msg' is the name of the test. The above maps close to print_result in lib.sh (nsuccess and nfail track total number of tests run that pass/fail. I print a summary when the test set is done.) > > >> >> 2. an option to configure the system and leave it in that state (ie, >> don't trap exit and run cleanup). By extension, an option is needed to >> do cleanup only. > > Checkout the last patch. It has "noprepare" and "nocleanup" options. > So I guess you imagine something like that, but generic? > Sure that is one way. Something else I have found useful is to not redirect stdout/stderr from the commands and to have tags that can be grep'ed to provide a summary. I run my VRF test script as: $ run-test.sh 2>&1 | tee vrf-results.txt | grep TEST The terminal gets a nice summary: TEST SECTION: IPv4 ping TEST: ping out - VRF device, peer IP [ OK ] TEST: ping out - enslaved device, peer IP[ OK ] TEST: ping in - My IP[ OK ] TEST: ping in - VRF IP [FAIL] TEST: ping local - VRF device, My IP [ OK ] TEST: ping local - VRF device, VRF IP[ OK ] TEST: ping local - VRF device, loopback [ OK ] TEST: ping local - enslaved device, My IP[ OK ] TEST: ping local - enslaved device, VRF IP [ OK ] TEST: ping out - VRF device, VRF source, peer IP [ OK ] TEST: ping out - enslaved device, VRF source, peer IP[ OK ] And when I find a test failing I can go look at the full output: $ vi vrf-results.txt Test setup VRF name: lisa Enslaved device veth1 Device addresses: 172.16.99.1 2001:db8:99::1 VRF local IP: 1.1.1.1 :1::1 Peer device: veth2 Peer addresses: 172.16.99.254 2001:db8:99::64 Configuring network namespace Resetting config ... ## TEST SECTION: IPv4 ping ## COMMAND: ping -c1 -w1 -I lisa 172.16.99.254 ping: Warning: source address might be selected on device other than lisa. PING 172.16.99.254 (172.16.99.254) from 172.16.99.1 lisa: 56(84) bytes of data. 64 bytes from 172.16.99.254: icmp_seq=1 ttl=64 time=0.084 ms --- 172.16.99.254 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.084/0.084/0.084/0.000 ms TEST: ping out - VRF device, peer IP [ OK ] ...
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/ipv6/ip6_output.c between commit: 749439bfac6e ("ipv6: fix udpv6 sendmsg crash caused by too small MTU") from the net tree and commit: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc net/ipv6/ip6_output.c index 4f7d8de56611,18547a44bdaf.. --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@@ -1206,18 -1215,16 +1215,18 @@@ static int ip6_setup_cork(struct sock * v6_cork->tclass = ipc6->tclass; if (rt->dst.flags & DST_XFRM_TUNNEL) mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? -rt->dst.dev->mtu : dst_mtu(&rt->dst); +READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst); else mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path); -rt->dst.dev->mtu : dst_mtu(xfrm_dst_path(&rt->dst)); ++READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst)); if (np->frag_size < mtu) { if (np->frag_size) mtu = np->frag_size; } + if (mtu < IPV6_MIN_MTU) + return -EINVAL; cork->base.fragsize = mtu; - if (dst_allfrag(rt->dst.path)) + if (dst_allfrag(xfrm_dst_path(&rt->dst))) cork->base.flags |= IPCORK_ALLFRAG; cork->base.length = 0;
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/openvswitch/flow_netlink.c between commit: 95a332088ecb ("Revert "openvswitch: Add erspan tunnel support."") from the net tree and commit: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") from the net-next tree. I fixed it up (I removed the bits of code rmeoved by the former that were updated in the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: [RFC PATCH net-next 00/12] selftests: forwarding: Add VRF-based tests
Mon, Jan 15, 2018 at 09:14:56PM CET, dsah...@gmail.com wrote: >On 1/15/18 12:18 PM, Ido Schimmel wrote: >> One of the nice things about network namespaces is that they allow one >> to easily create and test complex environments. >> >> Unfortunately, these namespaces can not be used with actual switching >> ASICs, as their ports can not be migrated to other network namespaces >> (NETIF_F_NETNS_LOCAL) and most of them probably do not support the >> L1-separation provided by namespaces. >> >> However, a similar kind of flexibility can be achieved by using VRFs and >> by looping the switch ports together. For example: >> >> br0 >> + >>vrf-h1 | vrf-h2 >> ++---+++ >> |||| >> 192.0.2.1/24 ++++ 192.0.2.2/24 >>swp1 swp2 swp3 swp4 >> ++++ >> |||| >> ++++ >> >> The VRFs act as lightweight namespaces representing hosts connected to >> the switch. >> >> This approach for testing switch ASICs has several advantages over the >> traditional method that requires multiple physical machines, to name a >> few: >> >> 1. Only the device under test (DUT) is being tested without noise from >> other system. >> >> 2. Ability to easily provision complex topologies. Testing bridging >> between 4-ports LAGs or 8-way ECMP requires many physical links that are >> not always available. With the VRF-based approach one merely needs to >> loopback more ports. >> >> These tests are written with switch ASICs in mind, but they can be run >> on any Linux box using veth pairs to emulate physical loopbacks. >> >> Feedback is is welcome. Particularly regarding the best location for >> these tests (e.g., current location, tools/testing/selftests/net). >> > >Awesome. Thanks for working on this. > >A couple of feature requests: >1. an option to pause on any error to allow inspection of the setup Good idea. Should be easy to add. > >2. an option to configure the system and leave it in that state (ie, >don't trap exit and run cleanup). By extension, an option is needed to >do cleanup only. Checkout the last patch. It has "noprepare" and "nocleanup" options. So I guess you imagine something like that, but generic? > >This framework will be very useful.
Re: [PATCH v2] net: delete /proc THIS_MODULE references
On Mon, Jan 15, 2018 at 02:18:00PM -0800, Stephen Hemminger wrote: > On Tue, 16 Jan 2018 00:42:40 +0300 > Alexey Dobriyan wrote: > > > /proc has been ignoring struct file_operations::owner field for 10 years. > > Specifically, it started with commit > > 786d7e1612f0b0adb6046f19b906609e4fe8b1ba > > ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where > > inode->i_fop is initialized with proxy struct file_operations for > > regular files: > > > > - if (de->proc_fops) > > - inode->i_fop = de->proc_fops; > > + if (de->proc_fops) { > > + if (S_ISREG(inode->i_mode)) > > + inode->i_fop = &proc_reg_file_ops; > > + else > > + inode->i_fop = de->proc_fops; > > + } > > > > VFS stopped pinning module at this point. > > > > Signed-off-by: Alexey Dobriyan > > What happens if /proc file for the module is open and the module is unloaded? > Just because it is old doesn't mean that it wasn't a bug. /proc ensures that ->release hook is called for every file which has it set in its struct file_operations either at normal close() time or at rmmod time, see fs/proc/inode.c Normal filesystems don't have this problem because they pin module at mount time. This whole series was started to fix all the races wrt proc entries disappearing while they're being used. Here is likely incomplete list: commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba commit c2319540cd7330fa9066e5b9b84d357a2c8631a2 commit 5a622f2d0f86b316b07b55a4866ecb5518dd1cf7 commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc commit 59b7435149eab2dd06dd678742faff6049cb655f commit 881adb85358309ea9c6f707394002719982ec607 commit 300b994b74e75120dd1a48529552a44977e0a82a commit 3dec7f59c370c7b58184d63293c3dc984d475840 commit 99b76233803beab302123d243eea9e41149804f3 and more fixes by Al commit 866ad9a747bbf5461739fcae6d0a41c8971bbe1d commit ca469f35a8e9ef12571a4b80ac6d7fdc0260fb44 commit 05c0ae21c034a6f7c6f4c0c63a31167ebb4b061f commit 7e0e953bb0cf649f93277ac8fb67ecbb7f7b04a9
Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq
On Mon, Jan 15, 2018 at 02:54:22PM -0800, Florian Fainelli wrote: > On 01/15/2018 02:45 PM, Andrew Lunn wrote: > > We only register the ATU and VTU irq when we have a chip level IRQ. > > In the error path, we should only attempt to remove the ATU and VTU > > irq if we also have a chip level IRQ. > > > > Signed-off-by: Andrew Lunn > > --- > > drivers/net/dsa/mv88e6xxx/chip.c | 6 -- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c > > b/drivers/net/dsa/mv88e6xxx/chip.c > > index 54cb00a27408..eb328bade225 100644 > > --- a/drivers/net/dsa/mv88e6xxx/chip.c > > +++ b/drivers/net/dsa/mv88e6xxx/chip.c > > @@ -3999,9 +3999,11 @@ static int mv88e6xxx_probe(struct mdio_device > > *mdiodev) > > out_mdio: > > mv88e6xxx_mdios_unregister(chip); > > out_g1_vtu_prob_irq: > > - mv88e6xxx_g1_vtu_prob_irq_free(chip); > > + if (chip->irq > 0) > > + mv88e6xxx_g1_vtu_prob_irq_free(chip); > > Why not move this check to mv88e6xxx_g1_vtu_prob_irq_free() and make it > a no-op if chip->irq <= 0? Hi Florian It keeps it symmetrical with the setup code: if (chip->irq > 0) { /* Has to be performed before the MDIO bus is created, * because the PHYs will link there interrupts to these * interrupt controllers */ mutex_lock(&chip->reg_lock); err = mv88e6xxx_g1_irq_setup(chip); mutex_unlock(&chip->reg_lock); if (err) goto out; if (chip->info->g2_irqs > 0) { err = mv88e6xxx_g2_irq_setup(chip); if (err) goto out_g1_irq; } err = mv88e6xxx_g1_atu_prob_irq_setup(chip); if (err) goto out_g2_irq; err = mv88e6xxx_g1_vtu_prob_irq_setup(chip); if (err) goto out_g1_atu_prob_irq; } The general flow in probe() is to only call functions if they are needed. So i would prefer to keep to that. Andrew
Re: [PATCH] net: delete /proc THIS_MODULE references
On Mon, Jan 15, 2018 at 02:50:12PM -0500, David Miller wrote: > From: Alexey Dobriyan > Date: Sat, 13 Jan 2018 20:11:52 +0300 > > > /proc has been ignoring struct file_operations::owner field for ages. > > > > Signed-off-by: Alexey Dobriyan > > What, then, makes sure that the procfs files are unregistered before the > referencing module is unloaded? Procfs interposes a bunch of wrappers for file_operations methods. Attempt to remove proc_dir_entry marks it as going down and waits for use count to reach 0. Wrappers check if it's marked and bump the use count for the duration of actual method call otherwise. IOW, removal prevents any subsequent method calls and waits for all calls currently in progress to finish... > Please explain the situation, and add a reference to the commit that > made procfs stop using the fops owner field. That commit and it's > commit message may help explain why all of this is fine. Umm... It got started in commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba Author: Alexey Dobriyan Date: Sun Jul 15 23:39:00 2007 -0700 Fix rmmod/read/write races in /proc entries Bugger if I remember how long did it take for the dust to settle - there had been bugs in the original, and it took some time to fix. I think the last straggler had been commit 7e0e953bb0cf649f93277ac8fb67ecbb7f7b04a9 Author: Al Viro Date: Sat Feb 21 22:16:11 2015 -0500 procfs: fix race between symlink removals and traversals and it had been a while between that and the previous one. Of course, we still might have bugs there, but that's not different from anywhere else in the kernel... Procfs file_operations had not been supposed to need ->owner for at least a decade...
Re: KASAN: use-after-free Write in array_map_update_elem
On 01/15/2018 04:07 PM, Dmitry Vyukov wrote: > On Mon, Jan 15, 2018 at 3:58 PM, syzbot > wrote: >> Hello, >> >> syzkaller hit the following crash on >> 8418f88764046d0e8ca6a3c04a69a0e57189aa1e >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> C reproducer is attached >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> for information about syzkaller reproducers >> >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+a2215ab5e92b3bb1c...@syzkaller.appspotmail.com >> It will help syzbot understand when the bug is fixed. See footer for >> details. >> If you forward the report, please keep this part and the footer. >> >> audit: type=1400 audit(1515680378.393:8): avc: denied { sys_admin } for >> pid=3508 comm="syzkaller203412" capability=21 >> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 >> tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=cap_userns >> permissive=1 >> audit: type=1400 audit(1515680378.446:9): avc: denied { sys_chroot } for >> pid=3509 comm="syzkaller203412" capability=18 >> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 >> tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=cap_userns >> permissive=1 >> == >> BUG: KASAN: use-after-free in memcpy include/linux/string.h:344 [inline] >> BUG: KASAN: use-after-free in array_map_update_elem+0x197/0x240 >> kernel/bpf/arraymap.c:249 >> Write of size 1 at addr 8801cb125bc0 by task syzkaller203412/3509 >> >> CPU: 1 PID: 3509 Comm: syzkaller203412 Not tainted 4.15.0-rc7-next-20180111+ >> #94 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> Google 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:17 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:53 >> print_address_description+0x73/0x250 mm/kasan/report.c:256 >> kasan_report_error mm/kasan/report.c:354 [inline] >> kasan_report+0x23b/0x360 mm/kasan/report.c:412 >> check_memory_region_inline mm/kasan/kasan.c:260 [inline] >> check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 >> memcpy+0x37/0x50 mm/kasan/kasan.c:303 >> memcpy include/linux/string.h:344 [inline] >> array_map_update_elem+0x197/0x240 kernel/bpf/arraymap.c:249 >> map_update_elem kernel/bpf/syscall.c:687 [inline] >> SYSC_bpf kernel/bpf/syscall.c:1811 [inline] >> SyS_bpf+0x32df/0x4400 kernel/bpf/syscall.c:1782 >> entry_SYSCALL_64_fastpath+0x29/0xa0 >> RIP: 0033:0x440ac9 >> RSP: 002b:007dff68 EFLAGS: 0203 ORIG_RAX: 0141 >> RAX: ffda RBX: 7ffe6a583b10 RCX: 00440ac9 >> RDX: 0020 RSI: 20eaf000 RDI: 0002 >> RBP: R08: R09: >> R10: R11: 0203 R12: 004022a0 >> R13: 00402330 R14: R15: >> >> Allocated by task 3071: >> save_stack+0x43/0xd0 mm/kasan/kasan.c:447 >> set_track mm/kasan/kasan.c:459 [inline] >> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552 >> kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 >> kmem_cache_alloc+0x12e/0x760 mm/slab.c:3541 >> getname_flags+0xcb/0x580 fs/namei.c:138 >> user_path_at_empty+0x2d/0x50 fs/namei.c:2587 >> user_path_at include/linux/namei.h:57 [inline] >> SYSC_faccessat fs/open.c:384 [inline] >> SyS_faccessat fs/open.c:353 [inline] >> SYSC_access fs/open.c:431 [inline] >> SyS_access+0x22c/0x6a0 fs/open.c:429 >> entry_SYSCALL_64_fastpath+0x29/0xa0 >> >> Freed by task 3071: >> save_stack+0x43/0xd0 mm/kasan/kasan.c:447 >> set_track mm/kasan/kasan.c:459 [inline] >> __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520 >> kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527 >> __cache_free mm/slab.c:3485 [inline] >> kmem_cache_free+0x86/0x2b0 mm/slab.c:3743 >> putname+0xee/0x130 fs/namei.c:258 >> filename_lookup+0x315/0x500 fs/namei.c:2342 >> user_path_at_empty+0x40/0x50 fs/namei.c:2587 >> user_path_at include/linux/namei.h:57 [inline] >> SYSC_faccessat fs/open.c:384 [inline] >> SyS_faccessat fs/open.c:353 [inline] >> SYSC_access fs/open.c:431 [inline] >> SyS_access+0x22c/0x6a0 fs/open.c:429 >> entry_SYSCALL_64_fastpath+0x29/0xa0 >> >> The buggy address belongs to the object at 8801cb124e40 >> which belongs to the cache names_cache of size 4096 >> The buggy address is located 3456 bytes inside of >> 4096-byte region [8801cb124e40, 8801cb125e40) >> The buggy address belongs to the page: >> page:ea00072c4900 count:1 mapcount:0 mapping:8801cb124e40 index:0x0 >> compound_mapcount: 0 >> flags: 0x2fffc008100(slab|head) >> raw: 02fffc008100 8801cb124e40 00010001 >> raw: ea000729bc20 ea000729bca0 8801dae30600 >> page dumped because: kasan: bad access detected >> >> Memory stat
Re: KASAN: slab-out-of-bounds Write in array_map_update_elem
On 01/15/2018 04:07 PM, Dmitry Vyukov wrote: > On Mon, Jan 15, 2018 at 3:58 PM, syzbot > wrote: >> Hello, >> >> syzkaller hit the following crash on >> 4147d50978df60f34d444c647dde9e5b34a4315e >> git://git.cmpxchg.org/linux-mmots.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> C reproducer is attached >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> for information about syzkaller reproducers >> >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+23dc48712d6941968...@syzkaller.appspotmail.com >> It will help syzbot understand when the bug is fixed. See footer for >> details. >> If you forward the report, please keep this part and the footer. >> >> audit: type=1400 audit(1515679662.716:8): avc: denied { sys_admin } for >> pid=3485 comm="syzkaller377846" capability=21 >> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 >> tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=cap_userns >> permissive=1 >> audit: type=1400 audit(1515679662.770:9): avc: denied { sys_chroot } for >> pid=3486 comm="syzkaller377846" capability=18 >> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 >> tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=cap_userns >> permissive=1 >> == >> BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:344 [inline] >> BUG: KASAN: slab-out-of-bounds in array_map_update_elem+0x197/0x240 >> kernel/bpf/arraymap.c:249 >> Write of size 1 at addr 8801cb0a3900 by task syzkaller377846/3486 >> >> CPU: 1 PID: 3486 Comm: syzkaller377846 Not tainted 4.15.0-rc7-mm1+ #53 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> Google 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:17 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:53 >> print_address_description+0x73/0x250 mm/kasan/report.c:256 >> kasan_report_error mm/kasan/report.c:354 [inline] >> kasan_report+0x23b/0x360 mm/kasan/report.c:412 >> check_memory_region_inline mm/kasan/kasan.c:260 [inline] >> check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 >> memcpy+0x37/0x50 mm/kasan/kasan.c:303 >> memcpy include/linux/string.h:344 [inline] >> array_map_update_elem+0x197/0x240 kernel/bpf/arraymap.c:249 >> map_update_elem kernel/bpf/syscall.c:687 [inline] >> SYSC_bpf kernel/bpf/syscall.c:1811 [inline] >> SyS_bpf+0x32df/0x4400 kernel/bpf/syscall.c:1782 >> entry_SYSCALL_64_fastpath+0x29/0xa0 >> RIP: 0033:0x440ac9 >> RSP: 002b:007dff68 EFLAGS: 0203 ORIG_RAX: 0141 >> RAX: ffda RBX: 7ffc88b23300 RCX: 00440ac9 >> RDX: 0020 RSI: 20eaf000 RDI: 0002 >> RBP: R08: R09: >> R10: R11: 0203 R12: 004022a0 >> R13: 00402330 R14: R15: >> >> Allocated by task 3072: >> save_stack+0x43/0xd0 mm/kasan/kasan.c:447 >> set_track mm/kasan/kasan.c:459 [inline] >> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552 >> kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 >> kmem_cache_alloc+0x12e/0x760 mm/slab.c:3541 >> getname_kernel+0x54/0x340 fs/namei.c:218 >> open_exec+0x17/0x60 fs/exec.c:877 >> load_elf_binary+0x1348/0x4c10 fs/binfmt_elf.c:780 >> search_binary_handler+0x142/0x6b0 fs/exec.c:1639 >> exec_binprm fs/exec.c:1681 [inline] >> do_execveat_common.isra.30+0x1711/0x22a0 fs/exec.c:1803 >> do_execve fs/exec.c:1848 [inline] >> SYSC_execve fs/exec.c:1929 [inline] >> SyS_execve+0x39/0x50 fs/exec.c:1924 >> do_syscall_64+0x273/0x920 arch/x86/entry/common.c:285 >> return_from_SYSCALL_64+0x0/0x75 >> >> Freed by task 3072: >> save_stack+0x43/0xd0 mm/kasan/kasan.c:447 >> set_track mm/kasan/kasan.c:459 [inline] >> __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520 >> kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527 >> __cache_free mm/slab.c:3485 [inline] >> kmem_cache_free+0x86/0x2b0 mm/slab.c:3743 >> putname+0xee/0x130 fs/namei.c:258 >> open_exec+0x41/0x60 fs/exec.c:882 >> load_elf_binary+0x1348/0x4c10 fs/binfmt_elf.c:780 >> search_binary_handler+0x142/0x6b0 fs/exec.c:1639 >> exec_binprm fs/exec.c:1681 [inline] >> do_execveat_common.isra.30+0x1711/0x22a0 fs/exec.c:1803 >> do_execve fs/exec.c:1848 [inline] >> SYSC_execve fs/exec.c:1929 [inline] >> SyS_execve+0x39/0x50 fs/exec.c:1924 >> do_syscall_64+0x273/0x920 arch/x86/entry/common.c:285 >> return_from_SYSCALL_64+0x0/0x75 >> >> The buggy address belongs to the object at 8801cb0a2380 >> which belongs to the cache names_cache of size 4096 >> The buggy address is located 1408 bytes to the right of >> 4096-byte region [8801cb0a2380, 8801cb0a3380) >> The buggy address belongs to the page: >> page:ea00072c2880 count:1 mapcount:0 mapping:8801cb0a2380 index:0x0 >> compound_mapcount: 0 >> flags: 0x2fffc0081
Re: [patch iproute2 net-next v8 2/3] tc: introduce support for block-handle for filter operations
Mon, Jan 15, 2018 at 06:46:00PM CET, dsah...@gmail.com wrote: >On 1/12/18 8:49 AM, Jiri Pirko wrote: >> From: Jiri Pirko >> >> Signed-off-by: Jiri Pirko >> --- >> tc/tc_filter.c | 127 >> + >> 1 file changed, 110 insertions(+), 17 deletions(-) >> > >Once the API questions settle out, please add a proper commit message to >the iproute2 patches and include examples of how to use the new feature. > Of course. That is the plan once the kernel patches are applied.
Re: [PATCHv3 1/5] net: dsa: Support internal phy on 'cpu' port
> int dsa_port_fixed_link_register_of(struct dsa_port *dp) > { > struct device_node *dn = dp->dn; > @@ -305,6 +354,10 @@ int dsa_port_fixed_link_register_of(struct dsa_port *dp) > ds->ops->adjust_link(ds, port, phydev); > > put_device(&phydev->mdio.dev); > + } else { > + err = dsa_port_setup_phy_of(dp, true); > + if (err) > + return err; Hi Sebastian First off, i tend to agree with Florian. I'm not sure how reliable this is. There is normally a state machine moving the PHY between different states. But in order to do that, i think you need a netdev. Have you tried multiple down/up of the other MAC/PHY? Does it always work? But, at the moment, we don't have much better. What i don't like is having this code inside dsa_port_fixed_link_register_of(). This has nothing to do with a fixed link. Please export functions from port.c and call them directly from dsa_port_setup() and dsa_port_teardown(). Andrew
Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq
On 01/15/2018 02:45 PM, Andrew Lunn wrote: > We only register the ATU and VTU irq when we have a chip level IRQ. > In the error path, we should only attempt to remove the ATU and VTU > irq if we also have a chip level IRQ. > > Signed-off-by: Andrew Lunn > --- > drivers/net/dsa/mv88e6xxx/chip.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c > b/drivers/net/dsa/mv88e6xxx/chip.c > index 54cb00a27408..eb328bade225 100644 > --- a/drivers/net/dsa/mv88e6xxx/chip.c > +++ b/drivers/net/dsa/mv88e6xxx/chip.c > @@ -3999,9 +3999,11 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev) > out_mdio: > mv88e6xxx_mdios_unregister(chip); > out_g1_vtu_prob_irq: > - mv88e6xxx_g1_vtu_prob_irq_free(chip); > + if (chip->irq > 0) > + mv88e6xxx_g1_vtu_prob_irq_free(chip); Why not move this check to mv88e6xxx_g1_vtu_prob_irq_free() and make it a no-op if chip->irq <= 0? -- Florian
Re: [patch net-next v8 08/14] net: sched: add rt netlink message type for block get
Mon, Jan 15, 2018 at 06:44:41PM CET, dsah...@gmail.com wrote: >On 1/15/18 10:27 AM, Jiri Pirko wrote: >> Mon, Jan 15, 2018 at 06:21:44PM CET, dsah...@gmail.com wrote: >>> On 1/15/18 10:08 AM, David Ahern wrote: On 1/15/18 10:03 AM, Jiri Pirko wrote: > Mon, Jan 15, 2018 at 05:56:31PM CET, dsah...@gmail.com wrote: >> On 1/12/18 8:46 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >> >> Why can't this be done with RTM_GETQDISC? > > I don't follow. Could you please describe a bit more what do you think? Why are you adding RTM_{NEW,GET,DEL}BLOCK? Can't you get the same information using RTM_GETQDISC and updating it to check for the 'tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK' path >> >> I might, but it bould be an ugly hack. I would use cmd that is used to >> manipulate qdisc to some entirely different purpose. That does not make >> any sense to me :( >> >> >> >>> >>> The above question is because a user specifies a shared block in a >>> 'qdisc add'. >> >> Qdisc and block is a different entity >> >> >>> >>> Alternatively, what about RTM_GETTFILTER? You already update >>> tc_ctl_tfilter to check for TCM_IFINDEX_MAGIC_BLOCK >> >> The object is still filter! Only the handle is different. You cannot >> compare that, sorry. >> >> >>> >>> My main question is why can't existing RTM_ commands be used? > >What I am struggling with is the idea that you need a new set of RTM_ >commands to see if a block exists or to get notifications of a change to >a block, but you don't need that API to create or modify the blocks. There is no modify. Create and destroy is done implicitly. That is the same as for the chains. I was thinking about how to get info if block exists or not. One way was new set of rtms that would be also usable for notifications. If we don't need notifications, it can be done by listing all qdiscs and see if they don't reference the block. That was not originally possible because the block info was per-qdisc. Now, per Roopa's suggestion it is generic. Okay. I will use qdisc dump for checking block existence. I guess that block create/destroy notifications are not that useful anyway. We don't have them for chains either. Thanks.
[PATCH net-next 1/2] net: dsa: mv88e6xxx: Return error from irq_find_mapping()
Fix a cut/paste error. When irq_find_mapping() returns an error for the ATU or VTU interrupt, return that error, not the value of chip->device_irq. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/global1_atu.c | 2 +- drivers/net/dsa/mv88e6xxx/global1_vtu.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c index b97de9d36337..20d941f4273b 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c @@ -377,7 +377,7 @@ int mv88e6xxx_g1_atu_prob_irq_setup(struct mv88e6xxx_chip *chip) chip->atu_prob_irq = irq_find_mapping(chip->g1_irq.domain, MV88E6XXX_G1_STS_IRQ_ATU_PROB); if (chip->atu_prob_irq < 0) - return chip->device_irq; + return chip->atu_prob_irq; err = request_threaded_irq(chip->atu_prob_irq, NULL, mv88e6xxx_g1_atu_prob_irq_thread_fn, diff --git a/drivers/net/dsa/mv88e6xxx/global1_vtu.c b/drivers/net/dsa/mv88e6xxx/global1_vtu.c index 53d58a01484a..40fe81af64d5 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_vtu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_vtu.c @@ -570,7 +570,7 @@ int mv88e6xxx_g1_vtu_prob_irq_setup(struct mv88e6xxx_chip *chip) chip->vtu_prob_irq = irq_find_mapping(chip->g1_irq.domain, MV88E6XXX_G1_STS_IRQ_VTU_PROB); if (chip->vtu_prob_irq < 0) - return chip->device_irq; + return chip->chip->vtu_prob_irq; err = request_threaded_irq(chip->vtu_prob_irq, NULL, mv88e6xxx_g1_vtu_prob_irq_thread_fn, -- 2.15.1
[PATCH net-next 2/2] net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq
We only register the ATU and VTU irq when we have a chip level IRQ. In the error path, we should only attempt to remove the ATU and VTU irq if we also have a chip level IRQ. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 54cb00a27408..eb328bade225 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3999,9 +3999,11 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev) out_mdio: mv88e6xxx_mdios_unregister(chip); out_g1_vtu_prob_irq: - mv88e6xxx_g1_vtu_prob_irq_free(chip); + if (chip->irq > 0) + mv88e6xxx_g1_vtu_prob_irq_free(chip); out_g1_atu_prob_irq: - mv88e6xxx_g1_atu_prob_irq_free(chip); + if (chip->irq > 0) + mv88e6xxx_g1_atu_prob_irq_free(chip); out_g2_irq: if (chip->info->g2_irqs > 0 && chip->irq > 0) mv88e6xxx_g2_irq_free(chip); -- 2.15.1
[PATCH net-next 0/2] ATU and VTU irq fixes
Further testing and core review found two sets of bugs. Core review found a cut/paste error in the irq setup code. A board which does not have an interrupt line from the switch to the SoC, and experiancing an EPROBE_DEFER throw a splat when the ATU irq was freed but never registered. Andrew Lunn (2): net: dsa: mv88e6xxx: Return error from irq_find_mapping() net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq drivers/net/dsa/mv88e6xxx/chip.c| 6 -- drivers/net/dsa/mv88e6xxx/global1_atu.c | 2 +- drivers/net/dsa/mv88e6xxx/global1_vtu.c | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) -- 2.15.1
Re: DPAA Ethernet problems with mainstream Linux kernels
Hello Madalin-cristian On 15/01/2018, Madalin-cristian Bucur wrote: >> > The device tree that you mention, cyrus_p5020.eth.dts is not found in >> > the Linux kernel sources. The cyrus_p5020.dts file from the fsl ppc >> > device tree folder does not include the PHY information for the DPAA >> > interfaces. The problems that you experience may be caused by some >> > issues with the PHY configuration (i.e. internal delay). >> The cyrus_p5020.eth.dts is a modified version of the cyrus_p5020.dts, >> which of course was based off the original p5020ds.dts file. As you >> noted, the current cyrus_p5020.dts file is incomplete, and does not >> map the Ethernet connections properly. This is because the current linux kernel version of cyrus_p5020.dts includes 'p5020si-pre.dtsi' and 'p5020si-post.dtsi' include files, which orginally gave us working ethernet (when we used the SDK kernel) However at some point you moved the ethernet aliases from the board dts file to the p5020si-pre.dtsi file breaking the linkages for our board. cyrus-pre.dtsi is simply p5020si-pre.dtsi with only the correct aliases in. >> ** I have attached both the cyrus_p5020.eth.dts and cyrus-pre.dtsi >> files with this email for comparison. Please let me know if you see >> any corrections that should be made to either file. > > At a first glance they look fine to me. That's good to know. >> I have started testing along that line, using Wireshark to view the >> traffic on the X5000/20 itself, and from another machine connected >> on the same subnet. So far (as indicated by some details of in my >> initial email), I can see outgoing broadcast requests (for DHCP) >> being sent out from the X5000/20, and these requests are correctly >> constructed and visible outside the X5000/20. >> >> However, no responses to the DHCP broadcasts appear to reach >> to X5000/20's DPAA Ethernet. I will need to setup some further >> tests to determine if the DHCP server saw the requests and responded >> to them. (I assume the DHCP server is getting them, and responding, >> as I can always get a successful DHCP response to the X5000/20 >> when using an add-on Ethernet PICe card on the same subnet). This matches what I see, the interface I have connected to the network shows an increasing number of transmitted packets, but no received ones. Jamie also noticed the following error in dmesg (right after the ethernet port becomes active) [4.112165] fsl_dpa dpaa-ethernet.0 eth0: Probed interface eth0 [4.116616] fsl_dpa dpaa-ethernet.1 eth1: Probed interface eth1 [ ... ] [ 106.501055] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready [ 106.570944] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready [ 106.605044] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 106.674918] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 108.702771] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 109.032798] fsl-pamu: pamu_av_isr: access violation interrupt [ 109.032806] fsl-pamu: pamu_av_isr: POES1= [ 109.032808] fsl-pamu: pamu_av_isr: POES2= [ 109.032809] fsl-pamu: pamu_av_isr: AVS1=002d0081 [ 109.032811] fsl-pamu: pamu_av_isr: AVS2=0081 [ 109.032813] fsl-pamu: pamu_av_isr: AVA=0001f1328000 [ 109.032815] fsl-pamu: pamu_av_isr: UDAD=002d0001 [ 109.032817] fsl-pamu: pamu_av_isr: POEA= I haven't seen this anywhere else, and wondered if it is relevant. Regards Darren
Re: [PATCH v2] net: delete /proc THIS_MODULE references
On Tue, 16 Jan 2018 00:42:40 +0300 Alexey Dobriyan wrote: > /proc has been ignoring struct file_operations::owner field for 10 years. > Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba > ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where > inode->i_fop is initialized with proxy struct file_operations for > regular files: > > - if (de->proc_fops) > - inode->i_fop = de->proc_fops; > + if (de->proc_fops) { > + if (S_ISREG(inode->i_mode)) > + inode->i_fop = &proc_reg_file_ops; > + else > + inode->i_fop = de->proc_fops; > + } > > VFS stopped pinning module at this point. > > Signed-off-by: Alexey Dobriyan What happens if /proc file for the module is open and the module is unloaded? Just because it is old doesn't mean that it wasn't a bug.
[PATCH net] net, sched: fix panic when updating miniq {b,q}stats
While working on fixing another bug, I ran into the following panic on arm64 by simply attaching clsact qdisc, adding a filter and running traffic on ingress to it: [...] [ 178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000 [ 178.197314] Mem abort info: [ 178.200121] ESR = 0x9604 [ 178.203168] Exception class = DABT (current EL), IL = 32 bits [ 178.209095] SET = 0, FnV = 0 [ 178.212157] EA = 0, S1PTW = 0 [ 178.215288] Data abort info: [ 178.218175] ISV = 0, ISS = 0x0004 [ 178.222019] CM = 0, WnR = 0 [ 178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 23cb3f33 [ 178.231531] [810fb501f000] *pgd= [ 178.236508] Internal error: Oops: 9604 [#1] SMP [...] [ 178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: GW 4.15.0-rc7+ #5 [ 178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017 [ 178.326887] pstate: 6045 (nZCv daif +PAN -UAO) [ 178.331685] pc : __netif_receive_skb_core+0x49c/0xac8 [ 178.336728] lr : __netif_receive_skb+0x28/0x78 [ 178.341161] sp : 2344b750 [ 178.344465] x29: 2344b750 x28: 810fbdfd0580 [ 178.349769] x27: x26: 09378000 [...] [ 178.418715] x1 : 0054 x0 : [ 178.424020] Process ping (pid: 2497, stack limit = 0x9f0a3ff4) [ 178.430537] Call trace: [ 178.432976] __netif_receive_skb_core+0x49c/0xac8 [ 178.437670] __netif_receive_skb+0x28/0x78 [ 178.441757] process_backlog+0x9c/0x160 [ 178.445584] net_rx_action+0x2f8/0x3f0 [...] Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init() which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc. Problem is that this cannot work since they are actually set up right after the qdisc ->init() callback in qdisc_create(), so first packet going into sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we therefore panic. In order to fix this, allocation of {b,q}stats needs to happen before we call into ->init(). In net-next, there's already such option through commit d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats"). However, the bug needs to be fixed in net still for 4.15. Thus, include these bits to reduce any merge churn and reuse the static_flags field to set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since there is no other user left. Prashant Bhole ran into the same issue but for net-next, thus adding him below as well as co-author. Same issue was also reported by Sandipan Das when using bcc. Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath") Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html Reported-by: Sandipan Das Co-authored-by: Prashant Bhole Co-authored-by: John Fastabend Signed-off-by: Daniel Borkmann Cc: Jiri Pirko --- include/net/sch_generic.h | 2 ++ net/sched/sch_api.c | 15 +-- net/sched/sch_generic.c | 18 +- net/sched/sch_ingress.c | 19 --- 4 files changed, 24 insertions(+), 30 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 83a3e47..becf86a 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -179,6 +179,7 @@ struct Qdisc_ops { const struct Qdisc_class_ops*cl_ops; charid[IFNAMSIZ]; int priv_size; + unsigned intstatic_flags; int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch, @@ -444,6 +445,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n, unsigned int len); struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, const struct Qdisc_ops *ops); +void qdisc_free(struct Qdisc *qdisc); struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue, const struct Qdisc_ops *ops, u32 parentid); void __qdisc_calculate_pkt_len(struct sk_buff *skb, diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 0f1eab9..52529b7 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1063,17 +1063,6 @@ static struct Qdisc *qdisc_create(struct net_device *dev, } if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) { - if (qdisc_is_percpu_stats(sch)) { - sch->cpu_bstats = - netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); - if (!sch->cpu_bstats) - goto err_out4; - - sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue); - if (!sch->cpu_qstats) -
Re: [PATCH v2 net-next 1/5] l2tp: fix switch default error handling in l2tp_nl_cmd_session_create()
> On 15 January 2018 at 21:18, Lorenzo Bianconi > wrote: >>> On Sun, Jan 14, 2018 at 03:50:54PM +0100, Lorenzo Bianconi wrote: Although this issue is harmless since that code path is protected by the check on l2tp_nl_cmd_ops[]/l2tp_nl_cmd_ops[]->session_create(), fix error handling for L2TP_PWTYPE_IP/default case in l2tp_nl_cmd_session_create() Signed-off-by: Lorenzo Bianconi --- net/l2tp/l2tp_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index e1ca29f79821..48b5bf30ec50 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -635,7 +635,7 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf case L2TP_PWTYPE_IP: default: ret = -EPROTONOSUPPORT; - break; + goto out_tunnel; } >>> Not sure if this change is really worthwhile. As you noted, this is >>> unreachable code. And this switch should better be removed entirely: >>> it doesn't do anything for supported pseudo-wires. >>> >>> And if PWTYPE_ETH_VLAN were to be implemented, it should perform its >>> configuration consistency checking in its own PW specific code, not in >>> the genl handler. >>> >> >> Personally I would prefer to not remove some code that could be useful >> for a future implementation, but just fix it if it presents issues to >> address. >> Anyway we can simply drop this patch from the series and I can send a >> new one to remove the switch entirely. >> >> @James what do you think? > > Keep the patch series focused. If you read the patch series as a set, > this patch stands out as not fitting the purpose of the series. I > agree with Guillaume. > Ok, fine. What about the fix? Do you prefer to remove the switch or just fix it? >> >> Regards, >> Lorenzo >> >>> Anyway, removing this switch isn't the purpose of this series, so I >>> think you can drop this patch. > > I agree.
Re: [PATCH net-next v5 0/4] dwmac-meson8b: clock fixes for Meson8b
On Mon, Jan 15, 2018 at 06:10:11PM +0100, Martin Blumenstingl wrote: > Hi Dave, > > this series is now successfully tested, thus we think it's ready to be > applied to your net-next tree. > > Emiliano reported [0] that he couldn't get dwmac-meson8b to work on his > Odroid-C1. This is the (hopefully) final version of this series, which > was successfully tested. > > Due to the fact that the public S805/S905/S912 datasheets all seem to > be outdated regarding the description of the PRG_ETH0 (also called > PRG_ETHERNET_ADDR0) register Linus Lüssing offered to help testing with > an oscilloscope and an Odroid-C1. I would like to say HUGE thanks to him > at this point as he spent hours figuring out the effects of the bits > that are (though to be) relevant to get Ethernet working on the > Odroid-C1. > We tested three scenarios, all based on version 3 of this series: > 1) MPLL2 at ~500MHz, m250_div set to 1, bit 10 enabled > this resulted in a clock rate twice as high as expected at the RGMII TX > clock pin (250MHz instead of 125MHz for Gbit connections and 50MHz > instead of 25MHz for 100Mbit/s connections). it did not change the > rate at the XTAL_IN pin of PHY (which stayed consistenly at 25MHz) > 2) MPLL2 at ~250MHz, m250_div set to 1, bit 10 disabled > the oscilloscope shows "no clock" for the RGMII TX clock pin at it's > highest resolution (and random rates at lower resolutions). XTAL_IN is > still at 25MHz > 3) MPLL2 at ~250MHz, m250_div set to 1, bit 10 enabled > this resulted in a 125MHz signal at the RGMII TX clock pin for Gbit > speeds and 25MHz for 100Mbit/s - both values are as expected. The rate > on the XTAL_IN pin was at 25MHz > -> boot-logs (with the PRG_ETH0 register value) and screenshots from the > readings of the oscilloscope can be found at: > https://metameute.de/~tux/linux/amlogic/odroidc1/ethernet/ > > Version 4 of this series is based on the results from Linus Lüssing's > help with the oscilloscope and Odroid-C1. > Unfortunately I don't have any Meson8b boards with RGMII PHY so I could > only partially test this. @Emiliano: Could you please give this version > a try and let me know about the results (preferably with a "Tested-by" > if it works)? > You obviously still need your two "ARM: dts: meson8b" patches which > - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi > - enable Ethernet on the Odroid-C1 (according to your last thest a TX > delay of 4ns is required to make it work properly) > > When testing on Meson8b this also needs a fix for the MPLL clock driver: > "clk: meson: mpll: use 64-bit maths in params_from_rate", see: > https://patchwork.kernel.org/patch/10131677/ > > I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY) > and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working > fine (so let's hope that this also fixes your Meson8b issue :)). > > changes since v4 at [4]: > - dropped "RFT" status since Jerome tested this series successfully! > - dropped PATCH #2 ("simplify generating the clock names"). I will > improve the whole clock registration in a separate series. since that > patch didn't really improve anything I dropped it for now > - added Jerome's Acked-/Reviewed-/Tested-by's - many thanks! > > changes since v3 at [3]: > - renamed the function PATCH #1 from meson8b_init_rgmii_clk to > meson8b_init_rgmii_tx_clk since we now know what the register bits > mean > - rewrote PATCH #3 because bit 10 is a gate clock and it seems that > there is an internal fixed divide-by-2 clock. see the patch > description for a detailed explanation > - updated the description of PATCH #4 and #5 as the clock we're trying > to fix is the "RGMII TX" clock (old version stated that this is the > "RGMII clock" or "PHY reference clock"). also updated the numbers in > the description now that we have the clock hierarchy right (at least > we hope so) > > changes since v2 at [2]: > - added PATCH #2 to make the following patch easier > - Emiliano reported that there's currently another bug in the > dwmac-meson8b driver which prevents it from working with RGMII PHYs on > Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate > (instead of a divide by 5 or divide by 10 clock divider). This has not > been visible on GXBB and later due to the input clock which always led > to a selection of "divide by 10" (which is done internally in the IP > block, but the bit actually means "enable RGMII clock output"). > PATCH #3 was added to address this issue. > - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were > updated and the patch itself rebased because the m25_div clock was > removed with the new PATCH #3 (so some of the statements were not > valid anymore) > > changes since v1 at [1]: > - changed the subject of the cover-letter to indicate that this is all > about the RGMII clock > - added PATCH #1 which ensures that we don't unnecessarily change the > parent clocks in RMII mode
Re: [PATCH v2 net-next 1/5] l2tp: fix switch default error handling in l2tp_nl_cmd_session_create()
On 15 January 2018 at 21:18, Lorenzo Bianconi wrote: >> On Sun, Jan 14, 2018 at 03:50:54PM +0100, Lorenzo Bianconi wrote: >>> Although this issue is harmless since that code path is protected by the >>> check on l2tp_nl_cmd_ops[]/l2tp_nl_cmd_ops[]->session_create(), fix error >>> handling for L2TP_PWTYPE_IP/default case in l2tp_nl_cmd_session_create() >>> >>> Signed-off-by: Lorenzo Bianconi >>> --- >>> net/l2tp/l2tp_netlink.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c >>> index e1ca29f79821..48b5bf30ec50 100644 >>> --- a/net/l2tp/l2tp_netlink.c >>> +++ b/net/l2tp/l2tp_netlink.c >>> @@ -635,7 +635,7 @@ static int l2tp_nl_cmd_session_create(struct sk_buff >>> *skb, struct genl_info *inf >>> case L2TP_PWTYPE_IP: >>> default: >>> ret = -EPROTONOSUPPORT; >>> - break; >>> + goto out_tunnel; >>> } >>> >> Not sure if this change is really worthwhile. As you noted, this is >> unreachable code. And this switch should better be removed entirely: >> it doesn't do anything for supported pseudo-wires. >> >> And if PWTYPE_ETH_VLAN were to be implemented, it should perform its >> configuration consistency checking in its own PW specific code, not in >> the genl handler. >> > > Personally I would prefer to not remove some code that could be useful > for a future implementation, but just fix it if it presents issues to > address. > Anyway we can simply drop this patch from the series and I can send a > new one to remove the switch entirely. > > @James what do you think? Keep the patch series focused. If you read the patch series as a set, this patch stands out as not fitting the purpose of the series. I agree with Guillaume. > > Regards, > Lorenzo > >> Anyway, removing this switch isn't the purpose of this series, so I >> think you can drop this patch. I agree.
Re: [PATCH v2 0/2] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
From: Jim Westfall Date: Mon, 15 Jan 2018 13:42:38 -0800 > David Miller wrote [01.15.18]: >> From: Jim Westfall >> Date: Sun, 14 Jan 2018 04:18:49 -0800 >> >> > This used to be the previous behavior in older kernels but became broken in >> > a263b3093641f (ipv4: Make neigh lookups directly in output packet path) >> > and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix >> > neigh >> > lookup keying over loopback/point-to-point devices) >> > >> > Not having this results in there being an arp entry for every remote ip >> > address that the device talks to. Given a fairly active device it can >> > cause the arp table to become huge and/or having to add/purge large number >> > of entires to keep within table size thresholds. >> ... >> > v2: >> > - fixes coding style issues >> >> Series applied and queued up for -stable, thank you. > > Thanks for applying these. We see the same type of behavior with ipv6 > over point-to-point interfaces and I would like to fix these as well by > mapping all the ndisc_cache entries to in6addr_any. However my knowledge > of ndisc is limited and I'm unclear if its safe to assume ndisc, like > arp, would never exist on the point-to-point interface. Ok, hopefully some ipv6 experts can chime in. Thank you.
Re: [PATCH 1/3 net] ibmvnic: Modify buffer size on failover
On 01/15/2018 03:11 PM, John Allen wrote: > Using newer backing devices can cause the required padding at the end of > rx buffers to change. Currently we assume that the size of buffers will > never change, but in the case that we failover from a backing device with > smaller padding requirement to a backing device with a larger padding > requirement, the vnic server will fail to post rx buffers due to > inadequate space in our rx pool. This patch fixes the issue by checking > whether or not the buffer size has changed on a reset and if it has, > reallocate the buffer. > > Signed-off-by: John Allen > --- > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > b/drivers/net/ethernet/ibm/ibmvnic.c > index b676fa9..5b68a28 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -412,13 +412,25 @@ static int reset_rx_pools(struct ibmvnic_adapter > *adapter) > int rx_scrqs; > int i, j, rc; > > + size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + > + be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); > + > rx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); > for (i = 0; i < rx_scrqs; i++) { > rx_pool = &adapter->rx_pool[i]; > > netdev_dbg(adapter->netdev, "Re-setting rx_pool[%d]\n", i); > > - rc = reset_long_term_buff(adapter, &rx_pool->long_term_buff); > + if (rx_pool->buff_size != be64_to_cpu(size_array[i])) { > + rx_pool->buff_size = be64_to_cpu(size_array[i]); > + rc = alloc_long_term_buff(adapter, > + &rx_pool->long_term_buff, > + rx_pool->size * > + rx_pool->buff_size); We should be freeing the long_term_buff here before allocating a new one. I will send a new version with the changes. Please ignore this version. -John > + } else { > + rc = reset_long_term_buff(adapter, > + &rx_pool->long_term_buff); > + } > if (rc) > return rc; >
Re: [PATCH v2 0/2] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
David Miller wrote [01.15.18]: > From: Jim Westfall > Date: Sun, 14 Jan 2018 04:18:49 -0800 > > > This used to be the previous behavior in older kernels but became broken in > > a263b3093641f (ipv4: Make neigh lookups directly in output packet path) > > and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix > > neigh > > lookup keying over loopback/point-to-point devices) > > > > Not having this results in there being an arp entry for every remote ip > > address that the device talks to. Given a fairly active device it can > > cause the arp table to become huge and/or having to add/purge large number > > of entires to keep within table size thresholds. > ... > > v2: > > - fixes coding style issues > > Series applied and queued up for -stable, thank you. Thanks for applying these. We see the same type of behavior with ipv6 over point-to-point interfaces and I would like to fix these as well by mapping all the ndisc_cache entries to in6addr_any. However my knowledge of ndisc is limited and I'm unclear if its safe to assume ndisc, like arp, would never exist on the point-to-point interface. Thanks jim
[PATCH v2] net: delete /proc THIS_MODULE references
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan --- net/8021q/vlanproc.c |2 -- net/appletalk/aarp.c |1 - net/appletalk/atalk_proc.c |3 --- net/atm/br2684.c |1 - net/atm/lec.c |1 - net/atm/mpoa_proc.c|1 - net/atm/proc.c |1 - net/ax25/af_ax25.c |1 - net/ax25/ax25_route.c |1 - net/ax25/ax25_uid.c|1 - net/bluetooth/cmtp/capi.c |1 - net/can/bcm.c |1 - net/can/proc.c |6 -- net/core/neighbour.c |1 - net/core/net-procfs.c |4 net/core/pktgen.c |3 --- net/core/sock.c|1 - net/decnet/af_decnet.c |1 - net/decnet/dn_dev.c|1 - net/decnet/dn_neigh.c |1 - net/decnet/dn_route.c |1 - net/ipv4/arp.c |1 - net/ipv4/fib_trie.c|3 --- net/ipv4/igmp.c|2 -- net/ipv4/ipconfig.c|1 - net/ipv4/ipmr.c|2 -- net/ipv4/netfilter/ipt_CLUSTERIP.c |1 - net/ipv4/proc.c|3 --- net/ipv4/raw.c |1 - net/ipv4/route.c |3 --- net/ipv4/tcp_ipv4.c|1 - net/ipv4/udp.c |1 - net/ipv4/udplite.c |1 - net/ipv6/addrconf.c|1 - net/ipv6/anycast.c |1 - net/ipv6/ip6_flowlabel.c |1 - net/ipv6/ip6mr.c |2 -- net/ipv6/mcast.c |2 -- net/ipv6/proc.c|3 --- net/ipv6/raw.c |1 - net/ipv6/route.c |2 -- net/ipv6/tcp_ipv6.c|1 - net/ipv6/udp.c |1 - net/ipv6/udplite.c |1 - net/ipx/ipx_proc.c |3 --- net/kcm/kcmproc.c |2 -- net/l2tp/l2tp_ppp.c|1 - net/llc/llc_proc.c |2 -- net/netlink/af_netlink.c |1 - net/netrom/af_netrom.c |1 - net/netrom/nr_route.c |2 -- net/packet/af_packet.c |1 - net/phonet/socket.c|2 -- net/rose/af_rose.c |1 - net/rose/rose_route.c |3 --- net/rxrpc/proc.c |2 -- net/sched/sch_api.c|1 - net/sctp/proc.c|1 - net/unix/af_unix.c |1 - net/wireless/wext-proc.c |1 - net/xfrm/xfrm_proc.c |1 - 61 files changed, 96 deletions(-) --- a/net/8021q/vlanproc.c +++ b/net/8021q/vlanproc.c @@ -80,7 +80,6 @@ static int vlan_seq_open(struct inode *inode, struct file *file) } static const struct file_operations vlan_fops = { - .owner = THIS_MODULE, .open= vlan_seq_open, .read= seq_read, .llseek = seq_lseek, @@ -97,7 +96,6 @@ static int vlandev_seq_open(struct inode *inode, struct file *file) } static const struct file_operations vlandev_fops = { - .owner = THIS_MODULE, .open= vlandev_seq_open, .read= seq_read, .llseek = seq_lseek, --- a/net/appletalk/aarp.c +++ b/net/appletalk/aarp.c @@ -1047,7 +1047,6 @@ static int aarp_seq_open(struct inode *inode, struct file *file) } const struct file_operations atalk_seq_arp_fops = { - .owner = THIS_MODULE, .open = aarp_seq_open, .read = seq_read, .llseek = seq_lseek, --- a/net/appletalk/atalk_proc.c +++ b/net/appletalk/atalk_proc.c @@ -226,7 +226,6 @@ static int atalk_seq_socket_open(struct inode *inode, struct file *file) } static const struct file_operations atalk_seq_interface_fops = { - .owner = THIS_MODULE, .open = atalk_seq_interface_open, .read = seq_read, .llseek = seq_lseek, @@ -234,7 +233,6 @@ static const struct
Re: DPAA Ethernet traffice troubles with Linux kernel
Some extra info: When Ubuntu boots, Eth0 (192.168.22.44) is not brought up automaticly.. When i bring up eth0 by hand, its still not active.. root@X5000LNX:/home/skateman# ifconfig eth0 up root@X5000LNX:/home/skateman# ping 192.168.22.44 connect: Network is unreachable When i use mii-tool too Kick the tranceiver... it comes alive.. i can ping the eth0 itself root@X5000LNX:/home/skateman# mii-tool -R eth0 resetting the transceiver... root@X5000LNX:/home/skateman# ping 192.168.22.44 PING 192.168.22.44 (192.168.22.44) 56(84) bytes of data. 64 bytes from 192.168.22.44: icmp_seq=1 ttl=64 time=0.045 ms 64 bytes from 192.168.22.44: icmp_seq=2 ttl=64 time=0.046 ms 64 bytes from 192.168.22.44: icmp_seq=3 ttl=64 time=0.047 ms 64 bytes from 192.168.22.44: icmp_seq=4 ttl=64 time=0.048 ms eth0 Link encap:Ethernet HWaddr 00:04:9f:01:02:03 inet addr:192.168.22.44 Bcast:192.168.22.255 Mask:255.255.255.0 inet6 addr: fe80::c84b:9f6b:f2f6:8933/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:48 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:7600 (7.6 KB) Memory:fe4e6000-fe4e6fff Not realy sure why the Tranceiver is not brought up directly when linux starts.. On 1/15/18, Christian Zigotzky wrote: > Sorry, I have forgotten the download link. Please test it with the DPAA > Ethernet. >> Hi All, >> >> I compiled the RC8 of kernel 4.15 with Joakim's patch for the AmigaOne >> X5000 today. Many thanks to Joakim for the mdio patch. >> >> Download: http://www.xenosoft.de/uImage-4.15-rc8_with_mdio_patch.tar.gz >> >> Please test it on your X5000. >> >> Thanks, >> Christian >> >> >> On 15 January 2018 at 5:59PM, Joakim Tjernlund wrote: Hi, Please use text logs instead of pictures next time, it's easier to read. The errors you see are related to missing MAC addresses for the unused interfaces, you can ignore these are they are not relevant for the issue you encounter. Normally the unused interfaces should have status disabled in the device tree but there is not a big deal if they fail like that. As I've advised Jamie on the other thread, please try to connect the device back 2 back to a known good machine and determine what is broken - Rx/Tx? Is there another software version that does work on these machines? >>> Hi, just saw this and thought of a small patch I just wrote for mdio >>> bus, o idea >>> if it is relevant but here goes: >>> >>> From fe0b98d54a79779482700676331b4d10a0f3cada Mon Sep 17 00:00:00 2001 >>> From: Joakim Tjernlund >>> Date: Sun, 14 Jan 2018 21:27:20 +0100 >>> Subject: [PATCH] of_mdiobus_register: Continue after error >>> >>> of_mdiobus_register unregister itself if one phy fails to register >>> which is bad for system having all its PHYs on the same MDIO bus. >>> Just log the error and continue with the remaining PHYs instead. >>> >>> Signed-off-by: Joakim Tjernlund >>> --- >>> drivers/of/of_mdio.c | 6 -- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c >>> index 98258583abb0..76ff28a41dad 100644 >>> --- a/drivers/of/of_mdio.c >>> +++ b/drivers/of/of_mdio.c >>> @@ -229,7 +229,8 @@ int of_mdiobus_register(struct mii_bus *mdio, >>> struct device_node *np) >>> else >>> rc = of_mdiobus_register_device(mdio, child, addr); >>> if (rc) >>> - goto unregister; >>> + pr_warn(FW_WARN >>> + "%pOF: Failed to register MDIO device.\n", child); >>> } >>> if (!scanphys) >>> @@ -253,7 +254,8 @@ int of_mdiobus_register(struct mii_bus *mdio, >>> struct device_node *np) >>> if (of_mdiobus_child_is_phy(child)) { >>> rc = of_mdiobus_register_phy(mdio, child, addr); >>> if (rc) >>> - goto unregister; >>> + pr_warn(FW_WARN >>> + "%pOF: Failed to register MDIO PHY.\n", child); >>> } >>> } >>> } >> >> > >
Re: [PATCH v2 net-next 1/5] l2tp: fix switch default error handling in l2tp_nl_cmd_session_create()
> On Sun, Jan 14, 2018 at 03:50:54PM +0100, Lorenzo Bianconi wrote: >> Although this issue is harmless since that code path is protected by the >> check on l2tp_nl_cmd_ops[]/l2tp_nl_cmd_ops[]->session_create(), fix error >> handling for L2TP_PWTYPE_IP/default case in l2tp_nl_cmd_session_create() >> >> Signed-off-by: Lorenzo Bianconi >> --- >> net/l2tp/l2tp_netlink.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c >> index e1ca29f79821..48b5bf30ec50 100644 >> --- a/net/l2tp/l2tp_netlink.c >> +++ b/net/l2tp/l2tp_netlink.c >> @@ -635,7 +635,7 @@ static int l2tp_nl_cmd_session_create(struct sk_buff >> *skb, struct genl_info *inf >> case L2TP_PWTYPE_IP: >> default: >> ret = -EPROTONOSUPPORT; >> - break; >> + goto out_tunnel; >> } >> > Not sure if this change is really worthwhile. As you noted, this is > unreachable code. And this switch should better be removed entirely: > it doesn't do anything for supported pseudo-wires. > > And if PWTYPE_ETH_VLAN were to be implemented, it should perform its > configuration consistency checking in its own PW specific code, not in > the genl handler. > Personally I would prefer to not remove some code that could be useful for a future implementation, but just fix it if it presents issues to address. Anyway we can simply drop this patch from the series and I can send a new one to remove the switch entirely. @James what do you think? Regards, Lorenzo > Anyway, removing this switch isn't the purpose of this series, so I > think you can drop this patch.
Re: pull-request: can-next 2017-12-01,Re: pull-request: can-nexto 2017-12-01,Re: pull-request: can-next 2017-12-01,Re: pull-request: can-next 2017-12-01
From: Marc Kleine-Budde Date: Mon, 15 Jan 2018 21:54:29 +0100 > On 01/15/2018 06:55 PM, David Miller wrote: >> From: Marc Kleine-Budde >> Date: Mon, 15 Jan 2018 14:02:23 +0100 >> >>> Hello David, >>> >>> >>> On 01/05/2018 12:08 PM, Marc Kleine-Budde wrote: Hello David, this is a pull request of 7 patches for net-next/master. All patches are by me. Patch 6 is for the "can_raw" protocol and add error checking to the bind() function. All other patches clean up the coding style and remove unused parameters in various CAN drivers and infrastructure. >>> >>> this pull request is still open. Any objections to pull this? >> >> You gave me an SSH URL: >> >> >> ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git >> tags/linux-can-next-for-4.16-20180105 >> >> Or did you plan to give my your SSH password so that I can pull this? :-) >> >> Please resubmit with a proper git:// URL I can actually pull from. > > I don't see a reason, why you need my ssh keys. That was intentional. > > As long as you have your SSH keys, you have read only access to my (and > the git repos of the other kernel developers) on git.kernel.org. I'm > already fetching from net and net-next via ssh: Ok, I didn't know this. Pulled, thanks.
[PATCH 3/3 net] ibmvnic: Allocate and request vpd in init_resources
In reset events in which our memory allocations need to be reallocated, VPD data is being freed, but never reallocated. This can cause issues if we later attempt to access that memory or reset and attempt to free the memory. This patch moves the allocation of the VPD data to init_resources so that it will be symmetrically freed during release resources. Signed-off-by: John Allen --- diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index bb56460..f0dbb76 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -867,7 +867,7 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) if (adapter->vpd->buff) len = adapter->vpd->len; - reinit_completion(&adapter->fw_done); + init_completion(&adapter->fw_done); crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; crq.get_vpd_size.cmd = GET_VPD_SIZE; ibmvnic_send_crq(adapter, &crq); @@ -929,6 +929,13 @@ static int init_resources(struct ibmvnic_adapter *adapter) if (!adapter->vpd) return -ENOMEM; + /* Vital Product Data (VPD) */ + rc = ibmvnic_get_vpd(adapter); + if (rc) { + netdev_err(netdev, "failed to initialize Vital Product Data (VPD)\n"); + return rc; + } + adapter->map_id = 1; adapter->napi = kcalloc(adapter->req_rx_queues, sizeof(struct napi_struct), GFP_KERNEL); @@ -1002,7 +1009,7 @@ static int __ibmvnic_open(struct net_device *netdev) static int ibmvnic_open(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - int rc, vpd; + int rc; mutex_lock(&adapter->reset_lock); @@ -1030,11 +1037,6 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); - /* Vital Product Data (VPD) */ - vpd = ibmvnic_get_vpd(adapter); - if (vpd) - netdev_err(netdev, "failed to initialize Vital Product Data (VPD)\n"); - mutex_unlock(&adapter->reset_lock); return rc;