Re: [PATCH net-next 5/6] qed: Allow chance for fast ramrod completions
> > + while (iter_cnt--) { > > + /* Validate we receive completion update */ > > smp_rmb(); > > if (comp_done->done == 1) { > > if (p_fw_ret) > > *p_fw_ret = comp_done->fw_return_code; > > return 0; > > } > Note that this smp_rmb() and accesses to ->done and ->fw_return_code are > racy. > fw_return_code needs to be written _before_ done. Thanks for catching this up. Dave - do you want me to re-spin for this? I believe it's a day-1 issue [not introduced by this series], and merits its own patch [and not incorporated into this one]. Still, if you'd like me to re-spin and include Eric's fix, I'll do it.
Re: [PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround
Andy Shevchenko writes: >> +#define SMC_outw(lp, v, a, r) \ >> + _SMC_outw_align4((v), (a), (r), \ >> +IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\ >> +lp->half_word_align4) > > Hmm... Isn't enough to have just (r) & 2 && lp->half_word_align4 ? It wouldn't be equivalent to what we had before. The point of the previous code was to compile out as much as possible of this test. Therefore, at compilation time for omap1 boards, the compiler would evaluate the test to 0, and never leave the workaround code compiled. So it would be enough, but worse performance wise and not equivalent for non-pxa boards, hence this test. Cheers. -- Robert
Re: slab corruption with current -git
From: Linus Torvalds Date: Sun, 9 Oct 2016 20:41:17 -0700 > Note that the "correct way" of doing list operations also almost > inevitably is the shortest way by far, since it gets rid of all the > special cases. So the patch looks nice. It gets rid of the magic > "nf_set_hooks_head()" thing too, because once you do list following > right, the head is no different from any other pointer in the list. Perhaps we should have some "slist" primitives added to include/linux/list.h but since the comparison differs for each user I guess it's hard to abstract in a way that's generic and inlines properly. I'll start taking a look at your patch and this stuff as well, thanks Linus.
Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
On Sun, Oct 9, 2016 at 7:49 PM, Linus Torvalds wrote: > > There is one *correct* way to remove an entry from a singly linked > list, and it looks like this: > > struct entry **pp, *p; > > pp = &head; > while ((p = *pp) != NULL) { > if (right_entry(p)) { > *pp = p->next; > break; > } > pp = &p->next; > } > > and that's it. Nothing else. This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this. I repeat: it's ENTIRELY UNTESTED. I just converted the insertion and deletion to the proper pattern, but I could easily have gotten the insertion priority test the wrong way around entirely, for example. Or it could simply have some other completely broken bug in it. It compiles for me, but that's all I actually checked. Note that the "correct way" of doing list operations also almost inevitably is the shortest way by far, since it gets rid of all the special cases. So the patch looks nice. It gets rid of the magic "nf_set_hooks_head()" thing too, because once you do list following right, the head is no different from any other pointer in the list. So the patch stats look good: net/netfilter/core.c | 108 --- 1 file changed, 33 insertions(+), 75 deletions(-) but again, it's entirely *entirely* untested. Please consider this just a "this is generally how list insert/delete operations should be done, avoiding special cases for the first entry". ALSO NOTE! The code assumes that the "nf_hook_mutex" locking only protects the actual *lists*, and that the address to the list can be looked up without holding the lock. That's generally how things are done, and it simplifies error handling (because you can do the "there is no such list at all" test before you do anything else. But again, I don't actually know the code, and if there is something that actually expands the number of lists etc that depends on that mutex, then the list head lookup may need to be inside the lock too. Linus net/netfilter/core.c | 108 --- 1 file changed, 33 insertions(+), 75 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index c9d90eb64046..814258641fcc 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -65,49 +65,24 @@ static DEFINE_MUTEX(nf_hook_mutex); #define nf_entry_dereference(e) \ rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex)) -static struct nf_hook_entry *nf_hook_entry_head(struct net *net, - const struct nf_hook_ops *reg) +static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const struct nf_hook_ops *reg) { - struct nf_hook_entry *hook_head = NULL; - if (reg->pf != NFPROTO_NETDEV) - hook_head = nf_entry_dereference(net->nf.hooks[reg->pf] -[reg->hooknum]); - else if (reg->hooknum == NF_NETDEV_INGRESS) { + return net->nf.hooks[reg->pf]+reg->hooknum; + #ifdef CONFIG_NETFILTER_INGRESS + if (reg->hooknum == NF_NETDEV_INGRESS) { if (reg->dev && dev_net(reg->dev) == net) - hook_head = - nf_entry_dereference( - reg->dev->nf_hooks_ingress); -#endif + return ®->dev->nf_hooks_ingress; } - return hook_head; -} - -/* must hold nf_hook_mutex */ -static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg, - struct nf_hook_entry *entry) -{ - switch (reg->pf) { - case NFPROTO_NETDEV: -#ifdef CONFIG_NETFILTER_INGRESS - /* We already checked in nf_register_net_hook() that this is -* used from ingress. -*/ - rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry); #endif - break; - default: - rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum], - entry); - break; - } + return NULL; } int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg) { - struct nf_hook_entry *hooks_entry; - struct nf_hook_entry *entry; + struct nf_hook_entry __rcu **pp; + struct nf_hook_entry *entry, *p; if (reg->pf == NFPROTO_NETDEV) { #ifndef CONFIG_NETFILTER_INGRESS @@ -119,6 +94,10 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg) return -EINVAL; } + pp = nf_hook_entry_head(net, reg); + if (!pp) + return -EINVAL; + entry = kmalloc(sizeof(*entry), GFP_KERNEL); if (!entry) return -ENOMEM; @@ -128,26 +107,15 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg) entry->next = NULL; mutex_lock(&nf
[PATCH net] net_sched: do not broadcast RTM_GETTFILTER result
From: Eric Dumazet There are two ways to get tc filters from kernel to user space. 1) Full dump (tc_dump_tfilter()) 2) RTM_GETTFILTER to get one precise filter, reducing overhead. The second operation is unfortunately broadcasting its result, polluting "tc monitor" users. This patch makes sure only the requester gets the result, using netlink_unicast() instead of rtnetlink_send() Jamal cooked an iproute2 patch to implement "tc filter get" operation, but other user space libraries already use RTM_GETTFILTER when a single filter is queried, instead of dumping all filters. Signed-off-by: Eric Dumazet Cc: Jamal Hadi Salim --- net/sched/cls_api.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 11da7da0b7c4..2ee29a3375f6 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -101,7 +101,7 @@ EXPORT_SYMBOL(unregister_tcf_proto_ops); static int tfilter_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct tcf_proto *tp, - unsigned long fh, int event); + unsigned long fh, int event, bool unicast); static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, @@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL; it_chain = &tp->next) - tfilter_notify(net, oskb, n, tp, 0, event); + tfilter_notify(net, oskb, n, tp, 0, event, false); } /* Select new prio value from the range, managed by kernel. */ @@ -319,7 +319,8 @@ replay: RCU_INIT_POINTER(*back, next); - tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER); + tfilter_notify(net, skb, n, tp, fh, + RTM_DELTFILTER, false); tcf_destroy(tp, true); err = 0; goto errout; @@ -345,14 +346,14 @@ replay: struct tcf_proto *next = rtnl_dereference(tp->next); tfilter_notify(net, skb, n, tp, fh, - RTM_DELTFILTER); + RTM_DELTFILTER, false); if (tcf_destroy(tp, false)) RCU_INIT_POINTER(*back, next); } goto errout; case RTM_GETTFILTER: err = tfilter_notify(net, skb, n, tp, fh, -RTM_NEWTFILTER); +RTM_NEWTFILTER, true); goto errout; default: err = -EINVAL; @@ -367,7 +368,7 @@ replay: RCU_INIT_POINTER(tp->next, rtnl_dereference(*back)); rcu_assign_pointer(*back, tp); } - tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER); + tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false); } else { if (tp_created) tcf_destroy(tp, true); @@ -419,7 +420,7 @@ nla_put_failure: static int tfilter_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct tcf_proto *tp, - unsigned long fh, int event) + unsigned long fh, int event, bool unicast) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -433,6 +434,9 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, return -EINVAL; } + if (unicast) + return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT); + return rtnetlink_send(skb, net, portid, RTNLGRP_TC, n->nlmsg_flags & NLM_F_ECHO); }
Re: net: BUG still has locks held in unix_stream_splice_read
On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote: > On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote: > > I suspect this is: > > > > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4 > > Author: Al Viro > > Date: Sat Sep 17 21:02:10 2016 -0400 > > skb_splice_bits(): get rid of callback > > since pipe_lock is the outermost now, we don't need to drop/regain > > socket locks around the call of splice_to_pipe() from skb_splice_bits(), > > which kills the need to have a socket-specific callback; we can just > > call splice_to_pipe() and be done with that. > > Unlikely, since that particular commit removes unlocking/relocking ->iolock > around the call of splice_to_pipe(). Original would've retaken the same > lock on the way out; it's not as if we could leave the syscall there. > > It might be splice-related, but I don't believe that you've got the right > commit here. It's not that commit, all right - it's "can't call unix_stream_read_generic() with any locks held" stepped onto a couple of commits prior by "splice: lift pipe_lock out of splice_to_pipe()". Could somebody explain what is that about? E.g what will happen if some code does a read on AF_UNIX socket with some local mutex held? AFAICS, there are exactly two callers of freezable_schedule_timeout() - this one and one in XFS; the latter is in a kernel thread where we do have good warranties about the locking environment, but here it's in the bleeding ->recvmsg/->splice_read and for those assumption that caller doesn't hold any locks is pretty strong, especially since it's not documented anywhere. What's going on there?
Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
On Sun, Oct 9, 2016 at 6:35 PM, Aaron Conole wrote: > > I was just about to build and test something similar: So I haven't actually tested that one, but looking at the code, it really looks very bogus. In fact, that code just looks like crap. It does *not* do a proper "remove singly linked list entry". It's exactly the kind of code that I rail against, and that people should never write. Any code that can't even traverse a linked list is not worth looking at. There is one *correct* way to remove an entry from a singly linked list, and it looks like this: struct entry **pp, *p; pp = &head; while ((p = *pp) != NULL) { if (right_entry(p)) { *pp = p->next; break; } pp = &p->next; } and that's it. Nothing else. The above code exits the loop with "p" containing the entry that was removed, or NULL if nothing was. It can't get any simpler than that, but more importantly, anything more complicated than that is WRONG. Seriously, nothing else is acceptable. In particular, any linked list traversal that makes a special case of the first entry or the last entry should not be allowed to exist. Note how there is not a single special case in the above correct code. It JustWorks(tm). That nf_unregister_net_hook() code has all the signs of exactly that kind of broken list-handling code: special-casing the head of the loop, and having the loop condition test both current and that odd "next to next" pointer etc. It's all very very wrong. So I really see two options: - do that singly-linked list traversal right (and I'm serious: nothing but the code above can ever be right) - don't make up your own list handling code at all, and use the standard linux list code. So either e3b37f11e6e4 needs to be reverted, or it needs to be taught to use real list handling. If the code doesn't want to use the regular list.h (either the doubly linked one, or the hlist one), it needs to at least learn to do list removal right. Linus
Re: net: BUG still has locks held in unix_stream_splice_read
On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote: > I suspect this is: > > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4 > Author: Al Viro > Date: Sat Sep 17 21:02:10 2016 -0400 > skb_splice_bits(): get rid of callback > since pipe_lock is the outermost now, we don't need to drop/regain > socket locks around the call of splice_to_pipe() from skb_splice_bits(), > which kills the need to have a socket-specific callback; we can just > call splice_to_pipe() and be done with that. Unlikely, since that particular commit removes unlocking/relocking ->iolock around the call of splice_to_pipe(). Original would've retaken the same lock on the way out; it's not as if we could leave the syscall there. It might be splice-related, but I don't believe that you've got the right commit here.
Re: [PATCH iproute2 0/2] ip rule: merger iprule_flush and add selector support
On Thu, 22 Sep 2016 14:28:47 +0800 Hangbin Liu wrote: > When merge iprule_flush() and iprule_list_or_save(). Renamed > rtnl_filter_t filter to filter_fn because we want to use global > variable 'filter' to filter nlmsg in the next patch. > > Hangbin Liu (2): > ip rule: merge ip rule flush and list, save together > ip rule: add selector support > > ip/iprule.c| 295 > + > man/man8/ip-rule.8 | 6 +- > 2 files changed, 231 insertions(+), 70 deletions(-) > > -- Applied to net-next (for 4.9)
Re: [PATCH iproute2] ip link: Add support to configure SR-IOV VF to vlan protocol 802.1ad (VST QinQ)
On Wed, 28 Sep 2016 10:58:59 +0300 Tariq Toukan wrote: > From: Moshe Shemesh > > Introduce a new API that exposes a list of vlans per VF (IFLA_VF_VLAN_LIST), > giving the ability for user-space application to specify it for the VF as > an option to support 802.1ad (VST QinQ). > > We introduce struct vf_vlan_info, which extends struct vf_vlan and adds > an optional VF VLAN proto parameter. > Default VLAN-protocol is 802.1Q. > > Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward > compatibility with older kernel versions. > > Suitable ip link tool command examples: > - Set vf vlan protocol 802.1ad (S-TAG) > ip link set eth0 vf 1 vlan 100 proto 802.1ad > - Set vf vlan S-TAG and vlan C-TAG (VST QinQ) > ip link set eth0 vf 1 vlan 100 proto 802.1ad vlan 30 proto 802.1Q > - Set vf to VST (802.1Q) mode > ip link set eth0 vf 1 vlan 100 proto 802.1Q > - Or by omitting the new parameter (backward compatible) > ip link set eth0 vf 1 vlan 100 > > Signed-off-by: Moshe Shemesh > Signed-off-by: Tariq Toukan Applied to net-next (for 4.9)
Re: [PATCH v2 iproute2 net-next] tc: m_vlan: Add vlan modify action
On Thu, 22 Sep 2016 21:01:05 +0300 Shmulik Ladkani wrote: > The 'vlan modify' action allows to replace an existing 802.1q tag > according to user provided settings. > It accepts same arguments as the 'vlan push' action. > > For example, this replaces vid 6 with vid 5: > > # tc filter add dev veth0 parent : pref 1 protocol 802.1q \ > basic match 'meta(vlan mask 0xfff eq 6)' \ > action vlan modify id 5 continue > > Signed-off-by: Shmulik Ladkani Applied to net-next (for 4.9)
Re: [PATCH iproute2 net-next] ipmroute: add support for age dumping
On Wed, 21 Sep 2016 11:45:58 +0200 Nikolay Aleksandrov wrote: > Add support to dump the mroute cache entry age if the show_stats (-s) > switch is provided. > Example: > $ ip -s mroute > (0.0.0.0, 239.10.10.10) Iif: eth0 Oifs: eth0 > 0 packets, 0 bytes, Age 245.44 > > Signed-off-by: Nikolay Aleksandrov Applied to net-next (pending for 4.9)
Re: [PATCH iproute2-next] tc: fq: display unthrottle latency
On Wed, 28 Sep 2016 06:23:15 -0700 Eric Dumazet wrote: > From: Eric Dumazet > > In linux-4.9 fq packet scheduler got a new stat : > > unthrottle_latency in nano second units. > > Gives a good indication of system load or timer implementation > latencies. > > Signed-off-by: Eric Dumazet Applied to net-next (for 4.9)
Re: [PATCH iproute2 6/9] actions: add skbmod action
On Sat, 1 Oct 2016 16:48:34 -0400 Jamal Hadi Salim wrote: > From: Jamal Hadi Salim > > This action is intended to be an upgrade from a usability perspective > from pedit (as well as operational debugability). > Compare this: > > sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ > u32 match ip protocol 1 0xff flowid 1:2 \ > action pedit munge offset -14 u8 set 0x02 \ > munge offset -13 u8 set 0x15 \ > munge offset -12 u8 set 0x15 \ > munge offset -11 u8 set 0x15 \ > munge offset -10 u16 set 0x1515 \ > pipe > > to: > > sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ > u32 match ip protocol 1 0xff flowid 1:2 \ > action skbmod dmac 02:15:15:15:15:15 > > Or worse, try to debug a policy with destination mac, source mac and > etherype. Then make that a hundred rules and you'll get my point. > > The most important ethernet use case at the moment is when redirecting or > mirroring packets to a remote machine. The dst mac address needs a re-write > so that it doesnt get dropped or confuse an interconnecting (learning) switch > or dropped by a target machine (which looks at the dst mac). > > In the future common use cases on pedit can be migrated to this action > (as an example different fields in ip v4/6, transports like tcp/udp/sctp > etc). For this first cut, this allows modifying basic ethernet header. > > Signed-off-by: Jamal Hadi Salim Lots of checkpatch errors on this. Please fix and resubmit series. For example: ERROR: spaces required around that '+=' (ctx:WxV) #442: FILE: tc/m_skbmod.c:79: + ok +=1; ERROR: code indent should use tabs where possible #567: FILE: tc/m_skbmod.c:204: +SPRINT_BUF(b1);$ WARNING: please, no spaces at the start of a line #567: FILE: tc/m_skbmod.c:204: +SPRINT_BUF(b1);$ ERROR: code indent should use tabs where possible #568: FILE: tc/m_skbmod.c:205: +SPRINT_BUF(b2);$ WARNING: please, no spaces at the start of a line #568: FILE: tc/m_skbmod.c:205: +SPRINT_BUF(b2);$ WARNING: braces {} are not necessary for single statement blocks #610: FILE: tc/m_skbmod.c:247: + if (p->flags & SKBMOD_F_SWAPMAC) { + fprintf(f, "swap mac "); + } ERROR: trailing whitespace #816: FILE: man/man8/tc-skbmod.8:28: +.IR CONTROL " := {" $
[ANNOUNCE] iproute 4.8
Release of iproute2 for Linux 4.8, slightly late because of netdev. Update to iproute2 utility to support new features in Linux 4.8. Includes support for MACSEC and ILA. Plus the usual array of documentation and minor fixes. Source: http://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-4.8.0.tar.gz Repository: git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git Report problems (or enhancements) to the netdev@vger.kernel.org mailing list. --- Andrey Jr. Melnikov (1): iproute: disallow ip rule del without parameters David Ahern (1): ip rule: Add support for l3mdev rules Davide Caratti (5): macsec: fix input of 'port', improve documentation of 'address' man: ip.8: add missing 'macsec' item to OBJECT list macsec: fix byte ordering on input/display of 'sci' tc: don't accept qdisc 'handle' greater than macsec: fix input range of 'icvlen' parameter Eric Dumazet (1): ip: report IFLA_GSO_MAX_SIZE and IFLA_GSO_MAX_SEGS Gustavo Zacarias (1): ss: fix build with musl libc Hangbin Liu (5): nstat: add sctp snmp support gitignore: Ignore 'tags' file generated by ctags ip route: check ftell, fseek return value misc/ss: tcp cwnd should be unsigned ip: Use specific slave id Hannes Frederic Sowa (1): iptuntap: show processes using tuntap interface Igor Ryzhov (1): fix netlink message length checks Iskren Chernev (1): iproute: fix documentation for ip rule scan order Jamal Hadi Salim (2): actions: skbedit add support for mod-ing skb pkt_type tc classifiers: Modernize tcindex classifier Jiri Benc (2): vxlan: group address requires net device tunnels: use macros for IPv6 address comparison Liping Zhang (1): ipmonitor: fix ip monitor can't work when NET_NS is not enabled Nikolay Aleksandrov (1): ip: route: fix multicast route dumps Or Gerlitz (1): devlink: Add e-switch support Phil Sutter (4): man: ip-link.8: Document missing geneve options ip-link: add missing {min,max}_tx_rate to help text ip-route: Prevent some double spaces in output iproute: fix documentation for ip rule scan order Richard Alpe (2): tipc: fix UDP bearer synopsis tipc: refactor bearer identification Roman Mashak (3): police: add extra space to improve police result printing police: improve usage message police: bug fix man page Roopa Prabhu (2): bridge: print_vlan: add missing check for json instance bridge: print_vlan: add missing check for json instance Sabrina Dubroca (4): libgenl: introduce genl_init_handle macsec: show usage even if the module is not available fou: show usage even if the module is not available ila: show usage even if the module is not available Simon Horman (1): iproute2: correct port in FOU/GRE example Stephen Hemminger (14): minor header update from net-next fib_rules.h update header file iprule: whitespace cleanup update kernel headers (net-next) update kernel header (4.7 net-next) update headers files to current net-next include: update net-next XDP headers update kernel headers update BPF headers devlink: whitespace cleanup remove useless return statement ip: iptuntap cleanup update kernel headers from 4.8-rc4 v4.8.0 Sushma Sitaram (1): tc: f_u32: Fill in 'linkid' provided by user Thomas Graf (1): tuntap: Add name attribute to usage text Tom Herbert (6): ila: Support for checksum neutral translation ila: Support for configuring ila to use netfilter hook ip6tnl: Support for fou encapsulation gre6: Support for fou encapsulation fou: Allowing configuring IPv6 listener ipila: Fixed unitialized variables WANG Cong (1): tc: fix a misleading failure Xin Long (1): ip route: restore_handler should check tb[RTA_PREFSRC] for local networks Yotam Gigi (2): tc: Add support for the matchall traffic classifier. tc: man: Add man entry for the matchall classifier. anuradhak (1): bridge: Fix garbled json output seen if a vlan filter is specified
Re: [PATCH iproute2] fix netlink message length checks
On Tue, 4 Oct 2016 13:16:55 +0300 Igor Ryzhov wrote: > Signed-off-by: Igor Ryzhov Makes sense applied, I wonder why one of the static checkers didn't see this.
Re: [iproute2] bridge: Fix garbled json output seen if a vlan filter is specified
On Fri, 7 Oct 2016 09:40:18 -0700 Anuradha Karuppiah wrote: > From: anuradhak > > json objects were started but not completed if the fdb vlan did not > match the specified filter vlan. > > Sample output: > $ bridge -j fdb show vlan 111 > [{ > "mac": "44:38:39:00:69:88", > "dev": "br0", > "vlan": 111, > "master": "br0", > "state": "permanent" > } > ] > $ bridge -j fdb show vlan 100 > [] > $ > > Signed-off-by: Anuradha Karuppiah Applied for 4.8 version
Re: [PATCH] tc: f_u32: Fill in 'linkid' provided by user
On Wed, 28 Sep 2016 11:30:16 -0700 Sushma Sitaram wrote: > Currently, 'linkid' input by the user is parsed but 'handle' is appended to > the netlink message. > > # tc filter add dev enp1s0f1 protocol ip parent : prio 99 u32 ht 800: \ > order 1 link 1: offset at 0 mask 0f00 shift 6 plus 0 eat match ip \ > protocol 6 ff > > resulted in: > filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 > match 0006/00ff at 8 > offset 0f00>>6 at 0 eat > > This patch results in: > filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 link 1: > match 0006/00ff at 8 > offset 0f00>>6 at 0 eat > > > Signed-off-by Sushma Sitaram: Sushma Sitaram Applied (for 4.8).
Re: [PATCH v3] iproute2: build nsid-name cache only for commands that need it
On Tue, 20 Sep 2016 06:01:27 + Anton Aksola wrote: > The calling of netns_map_init() before command parsing introduced > a performance issue with large number of namespaces. > > As commands such as add, del and exec do not need to iterate through > /var/run/netns it would be good not no build the cache before executing > these commands. > > Example: > unpatched: > time seq 1 1000 | xargs -n 1 ip netns add > > real0m16.832s > user0m1.350s > sys0m15.029s > > patched: > time seq 1 1000 | xargs -n 1 ip netns add > > real0m3.859s > user0m0.132s > sys0m3.205s > > Signed-off-by: Anton Aksola > --- > ip/ip_common.h| 1 + > ip/ipmonitor.c| 1 + > ip/ipnetns.c | 31 > ++- > testsuite/tests/ip/netns/set_nsid.t | 22 ++ > testsuite/tests/ip/netns/set_nsid_batch.t | 18 ++ > 5 files changed, 64 insertions(+), 9 deletions(-) > create mode 100755 testsuite/tests/ip/netns/set_nsid.t > create mode 100755 testsuite/tests/ip/netns/set_nsid_batch.t > Applied to net-next (ie for 4.9)
Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
Florian Westphal writes: > Linus Torvalds wrote: >> On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds >> wrote: >> > >> > Anyway, I don't think I can bisect it, but I'll try to narrow it down >> > a *bit* at least. >> > >> > Not doing any more pulls on this unstable base, I've been puttering >> > around in trying to clean up some stupid printk logging issues >> > instead. >> >> So I finally got a oops with slub debugging enabled. It doesn't really >> narrow things down, though, it kind of extends on the possible >> suspects. Now adding David Miller and Pablo, because it looks like it >> may be netfilter that does something bad and corrupts memory. > > Quite possible, the netns interactions are not nice :-/ > >> Without further ado, here's the new oops: >> >>general protection fault: [#1] SMP >>CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted >> 4.8.0-11288-gb66484cd7470 #1 >>Hardware name: System manufacturer System Product Name/Z170-K, BIOS > .. >>Call Trace: >> netfilter_net_exit+0x2f/0x60 >> ops_exit_list.isra.4+0x38/0x60 >> cleanup_net+0x1ba/0x2a0 >> process_one_work+0x1f1/0x480 >> worker_thread+0x48/0x4d0 >> ? process_one_work+0x480/0x480 > > .. > >> like it's a pointer loaded from a free'd allocation. >> >> The code disassembles to >> >>0: 0f b6 ca movzbl %dl,%ecx >>3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax >>a: 00 >>b: 49 8b 5c c5 00 mov0x0(%r13,%rax,8),%rbx >> 10: 48 85 db test %rbx,%rbx >> 13: 0f 84 cb 00 00 00 je 0xe4 >> 19: 4c 3b 63 40 cmp0x40(%rbx),%r12 >> 1d: 48 8b 03 mov(%rbx),%rax >> 20: 0f 84 e9 00 00 00 je 0x10f >> 26: 48 85 c0 test %rax,%rax >> 29: 74 26 je 0x51 >> 2b:* 4c 3b 60 40 cmp0x40(%rax),%r12 <-- trapping instruction >> 2f: 75 08 jne0x39 >> 31: e9 ef 00 00 00 jmpq 0x125 >> 36: 48 89 d8 mov%rbx,%rax >> 39: 48 8b 18 mov(%rax),%rbx >> 3c: 48 85 db test %rbx,%rbx >> >> and that oopsing instruction seems to be the compare of >> "hooks_entry->orig_ops" from hooks_entry in this expression: >> >> if (hooks_entry && hooks_entry->orig_ops == reg) { >> >> so hooks_entry() is bogus. It was gotten from >> >> hooks_entry = nf_hook_entry_head(net, reg); >> >> but that's as far as I dug. And yes, I do have >> CONFIG_NETFILTER_INGRESS=y in case that matters. >> >> And all this code has changed pretty radically in commit e3b37f11e6e4 >> ("netfilter: replace list_head with single linked list"), and there >> was clearly already something wrong with that code, with commit >> 5119e4381a90 ("netfilter: Fix potential null pointer dereference") >> adding the test against NULL. But I suspect that only hid the "oops, >> it's actually not NULL, it loaded some uninitialized value" problem. >> >> Over to the networking guys.. Ideas? > > Sorry, not off the top of my head. > Pablo is currently travelling back home from netdev 1.2 in Tokyo, > I can help starting Wednesday when I am back. > > One shot in the dark (not even compile tested; wonder if we can end up > zapping bogus hook ...) > I was just about to build and test something similar: diff --git a/net/netfilter/core.c b/net/netfilter/core.c index c9d90eb..e84103f 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -189,7 +189,7 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg) unlock: mutex_unlock(&nf_hook_mutex); - if (!hooks_entry) { + if (!hooks_entry || hooks_entry->orig_ops != reg) { WARN(1, "nf_unregister_net_hook: hook not found!\n"); return; }
Re: [PATCH kernel v2] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)
Anyone, ping? On 29/09/16 15:21, Alexey Kardashevskiy wrote: > There is at least one Chelsio 10Gb card which uses VPD area to store > some custom blocks (example below). However pci_vpd_size() returns > the length of the first block only assuming that there can be only > one VPD "End Tag" and VFIO blocks access beyond that offset > (since 4e1a63555) which leads to the situation when the guest "cxgb3" > driver fails to probe the device. The host system does not have this > problem as the drives accesses the config space directly without > pci_read_vpd()/... > > This adds a quirk to override the VPD size to a bigger value. > The maximum size is taken from EEPROMSIZE in > drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag > as the cxgb3 driver does as the driver supports writing to EEPROM/VPD > and when it writes, it only checks for 8192 bytes boundary. The quirk > is registerted for all devices supported by the cxgb3 driver. > > This adds a quirk to the PCI layer (not to the cxgb3 driver) as > the cxgb3 driver itself accesses VPD directly and the problem only exists > with the vfio-pci driver (when cxgb3 is not running on the host and > may not be even loaded) which blocks accesses beyond the first block > of VPD data. However vfio-pci itself does not have quirks mechanism so > we add it to PCI. > > Tested on: > Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port > Adapter [1425:0030] > > This is its VPD: > Large item 42 bytes; name 0x2 Identifier String > b'10 Gigabit Ethernet-SR PCI Express Adapter' > #00 [EC] len=7: b'D76809 ' > #0a [FN] len=7: b'46K7897' > #14 [PN] len=7: b'46K7897' > #1e [MN] len=4: b'1037' > #25 [FC] len=4: b'5769' > #2c [SN] len=12: b'YL102035603V' > #3b [NA] len=12: b'00145E992ED1' > > 0c00 Large item 16 bytes; name 0x2 Identifier String > b'S310E-SR-X ' > 0c13 Large item 234 bytes; name 0x10 > #00 [PN] len=16: b'TBD ' > #13 [EC] len=16: b'110107730D2 ' > #26 [SN] len=16: b'97YL102035603V ' > #39 [NA] len=12: b'00145E992ED1' > #48 [V0] len=6: b'175000' > #51 [V1] len=6: b'26' > #5a [V2] len=6: b'26' > #63 [V3] len=6: b'2000 ' > #6c [V4] len=2: b'1 ' > #71 [V5] len=6: b'c2' > #7a [V6] len=6: b'0 ' > #83 [V7] len=2: b'1 ' > #88 [V8] len=2: b'0 ' > #8d [V9] len=2: b'0 ' > #92 [VA] len=2: b'0 ' > #97 [RV] len=80: > b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'... > 0d00 Large item 252 bytes; name 0x11 > #00 [VC] len=16: b'122310_1222 dp ' > #13 [VD] len=16: b'610-0001-00 H1\x00\x00' > #26 [VE] len=16: b'122310_1353 fp ' > #39 [VF] len=16: b'610-0001-00 H1\x00\x00' > #4c [RW] len=173: > b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'... > 0dff Small item 0 bytes; name 0xf End Tag > > Signed-off-by: Alexey Kardashevskiy > --- > Changes: > v2: > * used pci_set_vpd_size() helper > * added explicit list of IDs from cxgb3 driver > * added a note in the commit log why the quirk is not in cxgb3 > --- > drivers/pci/quirks.c | 22 ++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 44e0ff3..b22fce5 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3243,6 +3243,28 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, > PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE, > quirk_thunderbolt_hotplug_msi); > > +static void quirk_chelsio_extend_vpd(struct pci_dev *dev) > +{ > + if (!dev->vpd) > + return; > + > + pci_set_vpd_size(dev, max_t(unsigned int, dev->vpd->len, 8192)); > +} > + > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x20, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x21, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x22, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x23, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x24, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x25, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x26, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x30, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x31, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x32, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x35, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x36, > quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x3
Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
Linus Torvalds wrote: > On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds > wrote: > > > > Anyway, I don't think I can bisect it, but I'll try to narrow it down > > a *bit* at least. > > > > Not doing any more pulls on this unstable base, I've been puttering > > around in trying to clean up some stupid printk logging issues > > instead. > > So I finally got a oops with slub debugging enabled. It doesn't really > narrow things down, though, it kind of extends on the possible > suspects. Now adding David Miller and Pablo, because it looks like it > may be netfilter that does something bad and corrupts memory. Quite possible, the netns interactions are not nice :-/ > Without further ado, here's the new oops: > >general protection fault: [#1] SMP >CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 > #1 >Hardware name: System manufacturer System Product Name/Z170-K, BIOS .. >Call Trace: > netfilter_net_exit+0x2f/0x60 > ops_exit_list.isra.4+0x38/0x60 > cleanup_net+0x1ba/0x2a0 > process_one_work+0x1f1/0x480 > worker_thread+0x48/0x4d0 > ? process_one_work+0x480/0x480 .. > like it's a pointer loaded from a free'd allocation. > > The code disassembles to > >0: 0f b6 ca movzbl %dl,%ecx >3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax >a: 00 >b: 49 8b 5c c5 00 mov0x0(%r13,%rax,8),%rbx > 10: 48 85 db test %rbx,%rbx > 13: 0f 84 cb 00 00 00 je 0xe4 > 19: 4c 3b 63 40 cmp0x40(%rbx),%r12 > 1d: 48 8b 03 mov(%rbx),%rax > 20: 0f 84 e9 00 00 00 je 0x10f > 26: 48 85 c0 test %rax,%rax > 29: 74 26 je 0x51 > 2b:* 4c 3b 60 40 cmp0x40(%rax),%r12 <-- trapping instruction > 2f: 75 08 jne0x39 > 31: e9 ef 00 00 00 jmpq 0x125 > 36: 48 89 d8 mov%rbx,%rax > 39: 48 8b 18 mov(%rax),%rbx > 3c: 48 85 db test %rbx,%rbx > > and that oopsing instruction seems to be the compare of > "hooks_entry->orig_ops" from hooks_entry in this expression: > > if (hooks_entry && hooks_entry->orig_ops == reg) { > > so hooks_entry() is bogus. It was gotten from > > hooks_entry = nf_hook_entry_head(net, reg); > > but that's as far as I dug. And yes, I do have > CONFIG_NETFILTER_INGRESS=y in case that matters. > > And all this code has changed pretty radically in commit e3b37f11e6e4 > ("netfilter: replace list_head with single linked list"), and there > was clearly already something wrong with that code, with commit > 5119e4381a90 ("netfilter: Fix potential null pointer dereference") > adding the test against NULL. But I suspect that only hid the "oops, > it's actually not NULL, it loaded some uninitialized value" problem. > > Over to the networking guys.. Ideas? Sorry, not off the top of my head. Pablo is currently travelling back home from netdev 1.2 in Tokyo, I can help starting Wednesday when I am back. One shot in the dark (not even compile tested; wonder if we can end up zapping bogus hook ...) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index c9d90eb..fd6a2ce 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -189,6 +189,9 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg) unlock: mutex_unlock(&nf_hook_mutex); + + WARN_ON(hooks_entry && hooks_entry->orig_ops != reg); + if (!hooks_entry) { WARN(1, "nf_unregister_net_hook: hook not found!\n"); return;
Re: [PATCH net-next 5/6] qed: Allow chance for fast ramrod completions
On Sun, 2016-10-09 at 18:25 +0300, Yuval Mintz wrote: > From: Yuval Mintz > > Whenever a ramrod is being sent for some device configuration, > the driver is going to sleep at least 5ms between each iteration > of polling on the completion of the ramrod. > > However, in almost every configuration scenario the firmware > would be able to comply and complete the ramrod in a manner of > several usecs. This is especially important in cases where there > might be a lot of sequential configurations applying to the hardware > [e.g., RoCE], in which case the existing scheme might cause some > visible user delays. > > This patch changes the completion scheme - instead of immediately > starting to sleep for a 'long' period, allow the device to quickly > poll on the first iteration after a couple of usecs. > > Signed-off-by: Yuval Mintz > --- > drivers/net/ethernet/qlogic/qed/qed_spq.c | 85 > +-- > 1 file changed, 59 insertions(+), 26 deletions(-) > > diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c > b/drivers/net/ethernet/qlogic/qed/qed_spq.c > index caff415..259a615 100644 > --- a/drivers/net/ethernet/qlogic/qed/qed_spq.c > +++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c > @@ -37,7 +37,11 @@ > ***/ > > #define SPQ_HIGH_PRI_RESERVE_DEFAULT(1) > -#define SPQ_BLOCK_SLEEP_LENGTH (1000) > + > +#define SPQ_BLOCK_DELAY_MAX_ITER(10) > +#define SPQ_BLOCK_DELAY_US (10) > +#define SPQ_BLOCK_SLEEP_MAX_ITER(1000) > +#define SPQ_BLOCK_SLEEP_MS (5) > > /*** > * Blocking Imp. (BLOCK/EBLOCK mode) > @@ -57,53 +61,81 @@ static void qed_spq_blocking_cb(struct qed_hwfn *p_hwfn, > smp_wmb(); > } > > -static int qed_spq_block(struct qed_hwfn *p_hwfn, > - struct qed_spq_entry *p_ent, > - u8 *p_fw_ret) > +static int __qed_spq_block(struct qed_hwfn *p_hwfn, > +struct qed_spq_entry *p_ent, > +u8 *p_fw_ret, bool sleep_between_iter) > { > - int sleep_count = SPQ_BLOCK_SLEEP_LENGTH; > struct qed_spq_comp_done *comp_done; > - int rc; > + u32 iter_cnt; > > comp_done = (struct qed_spq_comp_done *)p_ent->comp_cb.cookie; > - while (sleep_count) { > - /* validate we receive completion update */ > + iter_cnt = sleep_between_iter ? SPQ_BLOCK_SLEEP_MAX_ITER > + : SPQ_BLOCK_DELAY_MAX_ITER; > + > + while (iter_cnt--) { > + /* Validate we receive completion update */ > smp_rmb(); > if (comp_done->done == 1) { > if (p_fw_ret) > *p_fw_ret = comp_done->fw_return_code; > return 0; > } Note that this smp_rmb() and accesses to ->done and ->fw_return_code are racy. fq_return_code needs to be written _before_ done. Something like the following patch would make sense... diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c index caff41544898baed09f45a41829cb0ba9c719fb9..eefa45eab3728791f4d15a088c4550f318a1d1da 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_spq.c +++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c @@ -50,11 +50,9 @@ static void qed_spq_blocking_cb(struct qed_hwfn *p_hwfn, comp_done = (struct qed_spq_comp_done *)cookie; - comp_done->done = 0x1; - comp_done->fw_return_code = fw_return_code; + comp_done->fw_return_code = fw_return_code; - /* make update visible to waiting thread */ - smp_wmb(); + smp_store_release(&comp_done->done, 0x1); } static int qed_spq_block(struct qed_hwfn *p_hwfn, @@ -68,8 +66,8 @@ static int qed_spq_block(struct qed_hwfn *p_hwfn, comp_done = (struct qed_spq_comp_done *)p_ent->comp_cb.cookie; while (sleep_count) { /* validate we receive completion update */ - smp_rmb(); - if (comp_done->done == 1) { + if (READ_ONCE(comp_done->done) == 1) { + smp_read_barrier_depends(); if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; @@ -87,8 +85,8 @@ static int qed_spq_block(struct qed_hwfn *p_hwfn, sleep_count = SPQ_BLOCK_SLEEP_LENGTH; while (sleep_count) { /* validate we receive completion update */ - smp_rmb(); - if (comp_done->done == 1) { + if (READ_ONCE(comp_done->done) == 1) { + smp_read_barrier_depends(); if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; @@ -97,7 +95,8 @@ static int qed_
Re: Accelerated receive flow steering (aRFS) for UDP
On Sun, 2016-10-09 at 19:48 +, Chopra, Manish wrote: > Hi Eric, I used "-n" as well with "-N" but still the problem doesn't > go away. > > This is what I have done - > > Started "netserver" on local/test setup > > #netserver > Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family > AF_UNSPEC > > It starts listening on port "12865" > > From remote setup, started multiple netperf using different ports for > data sockets specified using "-P" with "-N" and "-n" options specified > as well. > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 8,8 -- -N -n -m 1400 > -P 6660,5550 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 9,9 -- -N -n -m 1400 > -P 9990,9880 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 10,10 -- -N -n -m > 1400 -P 4455,4400 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 11,11 -- -N -n -m > 1400 -P 3300,7800 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 12,12 -- -N -n -m > 1400 -P 50512,4 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 13,13 -- -N -n -m > 1400 -P 10512,45672 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 14,14 -- -N -n -m > 1400 -P ,56721 & > netperf -H 192.168.200.40 -l 150 -t UDP_STREAM -T 15,15 -- -N -n -m > 1400 -P 9300,8899 & > > When on local/test receiving setup, I dump skb's IP header protocol > field in .ndo_rx_flow_steer() handler - it is still always > IPPROTO_TCP. > Which has destined port 12865. But that handler never receives a SKB > whose IP header protocol field is set to IPPROTO_UDP. > > As suspected, I believe in receive flow, packets always go in the path > where it never match any entry in global flow table in get_rps_cpu() > function > ,possibly due to packets don't get received from the flow of > inet_recvmsg() which updates the global flow table ? > > 3571 /* First check into global flow table if there is > a match */ > 3572 ident = sock_flow_table->ents[hash & > sock_flow_table->mask]; > 3573 if ((ident ^ hash) & ~rps_cpu_mask) > 3574 goto try_rps; > > Hence, it never call set_rps_cpu() which internally is supposed to > call .ndo_rx_flow_steer() for the SKB's whose flows to be steered. > > On another side, when I use "Iperf" for sending UDP stream, which I > believe receives the packets from the intet_recvmsg() flow > and I do see flows getting steered for UDP packets. [Actually seeing > SKB's whose IP header protocol set to IPPROTO_UDP arriving > in .ndo-rx_flow_steer()]. > > iperf -s -u > iperf -u -c 192.168.200.40 -t 3000 -i 10 -P 8 OK, I am adding/CC Rick Jones, netperf author, since it seems a netperf bug, not a kernel one. I believe I already mentioned fact that "UDP_STREAM -- -N" was not doing a connect() on the receiver side. > > >
Re: [PATCH net-next v9 1/1] net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.
Hi, On 07/10/16 22:22, Andrew Lunn wrote: > Overall, this is much better. I just have a few nitpicks. It is good to hear that we are getting closer ;-) > dt-bindings/net/mscc-phy-vsc8531.h is removed by this patch. It would > be good to also remove the reference. My bad. > You don't need edge_slowdown and vddmac in the private structure, > since they are never used after determining what rate_magic is. Year... I was actually a bit confused about this... But assumed that you had some conventions about saving "input" configuration. Well, clearly not - I will clean this up. Actually, there is no need to store rate_magic either... It is only used in the init function. This means that there is no need for private date... The code could look somthing like this (it is not tested, so just look at see if you like the idea): #ifdef CONFIG_OF_MDIO static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev) { int rc; u8 sd; u16 vdd; struct device *dev = &phydev->mdio.dev; struct device_node *of_node = dev->of_node; u8 sd_array_size = ARRAY_SIZE(edge_table[0].slowdown); if (!of_node) return -ENODEV; rc = of_property_read_u16(of_node, "vsc8531,vddmac", &vdd); if (rc != 0) vdd = MSCC_VDDMAC_3300; rc = of_property_read_u8(of_node, "vsc8531,edge-slowdown", &sd); if (rc != 0) sd = 0; for (vdd = 0; vdd < ARRAY_SIZE(edge_table); vdd++) if (edge_table[vdd].vddmac == vddmac) for (sd = 0; sd < sd_array_size; sd++) if (edge_table[vdd].slowdown[sd] == slowdown) return (sd_array_size - sd - 1); return -EINVAL; } #else static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev) { return 0; } #endif /* CONFIG_OF_MDIO */ static int vsc85xx_config_init(struct phy_device *phydev) { int rc, rate_magic; ... rate_magic = vsc85xx_edge_rate_magic_get(phydev); if (rate_magic < 0) return rate_magic; rc = vsc85xx_edge_rate_cntl_set(phydev, rate_magic); if (rc) return rc; ... } This is clearly how I would prefere it, as it would simplify the implementation. It should be no problem to have this tested, and have a new patch avialable tomorrow. /Allan
Re: [PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround
On Sun, Oct 9, 2016 at 11:33 PM, Robert Jarzmik wrote: > Writes to u16 has a special handling on 3 PXA platforms, where the > hardware wiring forces these writes to be u32 aligned. > > This patch isolates this handling for PXA platforms as before, but > enables this "workaround" to be set up dynamically, which will be the > case in device-tree build types. > > This patch was tested on 2 PXA platforms : mainstone, which relies on > the workaround, and lubbock, which doesn't. > @@ -2276,6 +2277,9 @@ static int smc_drv_probe(struct platform_device *pdev) > memcpy(&lp->cfg, pd, sizeof(lp->cfg)); > lp->io_shift = SMC91X_IO_SHIFT(lp->cfg.flags); > } > + lp->half_word_align4 = > + machine_is_mainstone() || machine_is_stargate2() || > + machine_is_pxa_idp(); > /* We actually can't write halfwords properly if not word aligned */ > -static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) > +static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg, > + bool use_align4_workaround) > { > - if ((machine_is_mainstone() || machine_is_stargate2() || > -machine_is_pxa_idp()) && reg & 2) { > + if (use_align4_workaround) { > unsigned int v = val << 16; > v |= readl(ioaddr + (reg & ~2)) & 0x; > writel(v, ioaddr + (reg & ~2)); > +#define SMC_outw(lp, v, a, r) \ > + _SMC_outw_align4((v), (a), (r), \ > +IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\ > +lp->half_word_align4) Hmm... Isn't enough to have just (r) & 2 && lp->half_word_align4 ? -- With Best Regards, Andy Shevchenko
slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds wrote: > > Anyway, I don't think I can bisect it, but I'll try to narrow it down > a *bit* at least. > > Not doing any more pulls on this unstable base, I've been puttering > around in trying to clean up some stupid printk logging issues > instead. So I finally got a oops with slub debugging enabled. It doesn't really narrow things down, though, it kind of extends on the possible suspects. Now adding David Miller and Pablo, because it looks like it may be netfilter that does something bad and corrupts memory. Of course, maybe this is another symptom, and not the root cause for my troubles, but it does look like it might be getting closer to the cause... In particular, now it very much looks like a use-after-free in the netfilter code, which *could* explain my original symptom with later allocation users oopsing randomly. Without further ado, here's the new oops: general protection fault: [#1] SMP CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 #1 Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016 Workqueue: netns cleanup_net task: 91935e001fc0 task.stack: b4e2c213c000 RIP: nf_unregister_net_hook+0x5f/0x190 RSP: :b4e2c213fd40 EFLAGS: 00010202 RAX: 6b6b6b6b6b6b6b6b RBX: 91933c4ab968 RCX: 0002 RDX: 0002 RSI: c0642280 RDI: 91cf9820 RBP: b4e2c213fd58 R08: 91933c4a86c8 R09: 0025 R10: 00cc R11: 91935dd22000 R12: c0642280 R13: 91934cc0ea80 R14: 91cf97e0 R15: FS: () GS:919376dc() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 03e7c000 CR3: 0003fdb62000 CR4: 003406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: netfilter_net_exit+0x2f/0x60 ops_exit_list.isra.4+0x38/0x60 cleanup_net+0x1ba/0x2a0 process_one_work+0x1f1/0x480 worker_thread+0x48/0x4d0 ? process_one_work+0x480/0x480 ? process_one_work+0x480/0x480 kthread+0xd9/0xf0 ? kthread_park+0x60/0x60 ret_from_fork+0x22/0x30 Code: 0f b6 ca 48 8d 84 c8 00 01 00 00 49 8b 5c c5 00 48 85 db 0f 84 cb 00 00 00 4c 3b 63 40 48 8b 03 0f 84 e9 00 00 00 48 85 c0 74 26 <4c> 3b 60 40 75 08 e9 ef 00 00 00 48 89 d8 48 8b 18 48 85 db 0f RIP [] nf_unregister_net_hook+0x5f/0x190 and note the value in %rax: 6b is POISON_FREE, so it very much looks like it's a pointer loaded from a free'd allocation. The code disassembles to 0: 0f b6 ca movzbl %dl,%ecx 3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax a: 00 b: 49 8b 5c c5 00 mov0x0(%r13,%rax,8),%rbx 10: 48 85 db test %rbx,%rbx 13: 0f 84 cb 00 00 00 je 0xe4 19: 4c 3b 63 40 cmp0x40(%rbx),%r12 1d: 48 8b 03 mov(%rbx),%rax 20: 0f 84 e9 00 00 00 je 0x10f 26: 48 85 c0 test %rax,%rax 29: 74 26 je 0x51 2b:* 4c 3b 60 40 cmp0x40(%rax),%r12 <-- trapping instruction 2f: 75 08 jne0x39 31: e9 ef 00 00 00 jmpq 0x125 36: 48 89 d8 mov%rbx,%rax 39: 48 8b 18 mov(%rax),%rbx 3c: 48 85 db test %rbx,%rbx and that oopsing instruction seems to be the compare of "hooks_entry->orig_ops" from hooks_entry in this expression: if (hooks_entry && hooks_entry->orig_ops == reg) { so hooks_entry() is bogus. It was gotten from hooks_entry = nf_hook_entry_head(net, reg); but that's as far as I dug. And yes, I do have CONFIG_NETFILTER_INGRESS=y in case that matters. And all this code has changed pretty radically in commit e3b37f11e6e4 ("netfilter: replace list_head with single linked list"), and there was clearly already something wrong with that code, with commit 5119e4381a90 ("netfilter: Fix potential null pointer dereference") adding the test against NULL. But I suspect that only hid the "oops, it's actually not NULL, it loaded some uninitialized value" problem. Over to the networking guys.. Ideas? Linus
Please Check.
I have a proposal to share Joseph
[PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround
Writes to u16 has a special handling on 3 PXA platforms, where the hardware wiring forces these writes to be u32 aligned. This patch isolates this handling for PXA platforms as before, but enables this "workaround" to be set up dynamically, which will be the case in device-tree build types. This patch was tested on 2 PXA platforms : mainstone, which relies on the workaround, and lubbock, which doesn't. Signed-off-by: Robert Jarzmik --- drivers/net/ethernet/smsc/smc91x.c | 6 ++- drivers/net/ethernet/smsc/smc91x.h | 80 ++ 2 files changed, 52 insertions(+), 34 deletions(-) diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c index 77ad2a3f59db..b1f74e06d98e 100644 --- a/drivers/net/ethernet/smsc/smc91x.c +++ b/drivers/net/ethernet/smsc/smc91x.c @@ -602,7 +602,8 @@ static void smc_hardware_send_pkt(unsigned long data) SMC_PUSH_DATA(lp, buf, len & ~1); /* Send final ctl word with the last byte if there is one */ - SMC_outw(((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, DATA_REG(lp)); + SMC_outw(lp, ((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, +DATA_REG(lp)); /* * If THROTTLE_TX_PKTS is set, we stop the queue here. This will @@ -2276,6 +2277,9 @@ static int smc_drv_probe(struct platform_device *pdev) memcpy(&lp->cfg, pd, sizeof(lp->cfg)); lp->io_shift = SMC91X_IO_SHIFT(lp->cfg.flags); } + lp->half_word_align4 = + machine_is_mainstone() || machine_is_stargate2() || + machine_is_pxa_idp(); #if IS_BUILTIN(CONFIG_OF) match = of_match_device(of_match_ptr(smc91x_match), &pdev->dev); diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h index 1a55c7976df0..2b7752db8635 100644 --- a/drivers/net/ethernet/smsc/smc91x.h +++ b/drivers/net/ethernet/smsc/smc91x.h @@ -66,10 +66,10 @@ #define SMC_IRQ_FLAGS (-1)/* from resource */ /* We actually can't write halfwords properly if not word aligned */ -static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) +static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg, + bool use_align4_workaround) { - if ((machine_is_mainstone() || machine_is_stargate2() || -machine_is_pxa_idp()) && reg & 2) { + if (use_align4_workaround) { unsigned int v = val << 16; v |= readl(ioaddr + (reg & ~2)) & 0x; writel(v, ioaddr + (reg & ~2)); @@ -78,6 +78,12 @@ static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) } } +#define SMC_outw(lp, v, a, r) \ + _SMC_outw_align4((v), (a), (r), \ +IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\ +lp->half_word_align4) + + #elif defined(CONFIG_SH_SH4202_MICRODEV) #define SMC_CAN_USE_8BIT 0 @@ -88,7 +94,8 @@ static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) #define SMC_inw(a, r) inw((a) + (r) - 0xa000) #define SMC_inl(a, r) inl((a) + (r) - 0xa000) #define SMC_outb(v, a, r) outb(v, (a) + (r) - 0xa000) -#define SMC_outw(v, a, r) outw(v, (a) + (r) - 0xa000) +#define _SMC_outw(v, a, r) outw(v, (a) + (r) - 0xa000) +#define SMC_outw(lp, v, a, r) _SMC_outw((v), (a), (r)) #define SMC_outl(v, a, r) outl(v, (a) + (r) - 0xa000) #define SMC_insl(a, r, p, l) insl((a) + (r) - 0xa000, p, l) #define SMC_outsl(a, r, p, l) outsl((a) + (r) - 0xa000, p, l) @@ -106,7 +113,8 @@ static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) #define SMC_inb(a, r) inb(((u32)a) + (r)) #define SMC_inw(a, r) inw(((u32)a) + (r)) #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r)) -#define SMC_outw(v, a, r) outw(v, ((u32)a) + (r)) +#define _SMC_outw(v, a, r) outw(v, ((u32)a) + (r)) +#define SMC_outw(lp, v, a, r) _SMC_outw((v), (a), (r)) #define SMC_insw(a, r, p, l) insw(((u32)a) + (r), p, l) #define SMC_outsw(a, r, p, l) outsw(((u32)a) + (r), p, l) @@ -134,7 +142,8 @@ static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg) #define SMC_inw(a, r) readw((a) + (r)) #define SMC_inl(a, r) readl((a) + (r)) #define SMC_outb(v, a, r) writeb(v, (a) + (r)) -#define SMC_outw(v, a, r) writew(v, (a) + (r)) +#define _SMC_outw(v, a, r) writew(v, (a) + (r)) +#define SMC_outw(lp, v, a, r) _SMC_outw((v), (a), (r)) #define SMC_outl(v, a, r) writel(v, (a) + (r)) #define SMC_insw(a, r, p, l)readsw((a) + (r), p, l) #define SMC_outsw(a, r, p, l) writesw((a) + (r), p, l) @@ -166,7 +175,8 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l) } #define SMC_inw(a, r) _swapw(readw((a) + (r))) -#define SMC_outw(v, a
[PATCH v2 2/3] net: smc91x: take into account half-word workaround
For device-tree builds, platforms such as mainstone, idp and stargate2 must have their u16 writes all aligned on 32 bit boundaries. This is already enabled in platform data builds, and this patch adds it to device-tree builds. Signed-off-by: Robert Jarzmik --- Since v1: rename dt property to pxa-u16-align4 --- drivers/net/ethernet/smsc/smc91x.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c index b1f74e06d98e..e2f151258b38 100644 --- a/drivers/net/ethernet/smsc/smc91x.c +++ b/drivers/net/ethernet/smsc/smc91x.c @@ -2323,6 +2323,8 @@ static int smc_drv_probe(struct platform_device *pdev) if (!device_property_read_u32(&pdev->dev, "reg-shift", &val)) lp->io_shift = val; + lp->half_word_align4 = + device_property_read_bool(&pdev->dev, "pxa-u16-align4"); } #endif -- 2.1.4
[PATCH v2 3/3] net: smsc91x: add u16 workaround for pxa platforms
Add a workaround for mainstone, idp and stargate2 boards, for u16 writes which must be aligned on 32 bits addresses. Signed-off-by: Robert Jarzmik Cc: Jeremy Linton --- Since v1: rename dt property to pxa-u16-align4 change the binding documentation file --- Documentation/devicetree/bindings/net/smsc-lan91c111.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt index e77e167593db..309e37eb7c7c 100644 --- a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt +++ b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt @@ -13,3 +13,5 @@ Optional properties: 16-bit access only. - power-gpios: GPIO to control the PWRDWN pin - reset-gpios: GPIO to control the RESET pin +- pxa-u16-align4 : Boolean, put in place the workaround the force all + u16 writes to be 32 bits aligned -- 2.1.4
Re: [PATCH net-next v9 1/1] net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.
> Year... I was actually a bit confused about this... But assumed that you had > some conventions about saving "input" configuration. Nope. Humm, actually, i messed up here and missed something. Sorry. You should really do all the DT parsing in the probe function, or a function it calls. We want the probe to return -EINVAL if the device tree is invalid. Either you can program the magic value immediately, if it will survive a reset, or save it in the private structure until the init is called. Andrew
RE: Accelerated receive flow steering (aRFS) for UDP
> -Original Message- > From: Eric Dumazet [mailto:eric.duma...@gmail.com] > Sent: Sunday, October 09, 2016 10:45 PM > To: Chopra, Manish > Cc: netdev@vger.kernel.org; ma...@mellanox.com; t...@herbertland.com > Subject: Re: Accelerated receive flow steering (aRFS) for UDP > > On Sat, 2016-10-08 at 12:25 +, Chopra, Manish wrote: > > > -Original Message- > > > From: Eric Dumazet [mailto:eric.duma...@gmail.com] > > > Sent: Saturday, October 08, 2016 5:08 AM > > > To: Chopra, Manish > > > Cc: netdev@vger.kernel.org; ma...@mellanox.com; t...@herbertland.com > > > Subject: Re: Accelerated receive flow steering (aRFS) for UDP > > > > > > On Fri, 2016-10-07 at 22:55 +, Chopra, Manish wrote: > > > > Hello Folks, > > > > > > > > I am experimenting aRFS with our NIC devices, and for that I have > > > > kernel 4.8.x installed with below config. > > > > > > > > CONFIG_RPS=y > > > > CONFIG_RFS_ACCEL=y > > > > > > > > # cat /proc/cpuinfo | grep processor > > > > processor : 0 > > > > processor : 1 > > > > processor : 2 > > > > processor : 3 > > > > processor : 4 > > > > processor : 5 > > > > processor : 6 > > > > processor : 7 > > > > processor : 8 > > > > processor : 9 > > > > processor : 10 > > > > processor : 11 > > > > processor : 12 > > > > processor : 13 > > > > processor : 14 > > > > processor : 15 > > > > > > > > I configured rps_sock_flow_entries and our NIC rx queues with below > > > > values > > > > > > > > echo 32768 > /proc/sys/net/core/rps_sock_flow_entries > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-0/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-1/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-2/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-3/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-4/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-5/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-6/rps_flow_cnt > > > > echo 4096 > /sys/class/net/p4p1/queues/rx-7/rps_flow_cnt > > > > > > > > echo > /sys/class/net/p4p1/queues/rx-0/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-1/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-2/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-3/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-4/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-5/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-6/rps_cpus > > > > echo > /sys/class/net/p4p1/queues/rx-7/rps_cpus > > > > > > > > Below is IRQ affinity configuration for NIC irqs used. > > > > > > > > # cat /proc/irq/67/smp_affinity_list > > > > 8 > > > > # cat /proc/irq/68/smp_affinity_list > > > > 9 > > > > # cat /proc/irq/69/smp_affinity_list > > > > 10 > > > > # cat /proc/irq/70/smp_affinity_list > > > > 11 > > > > # cat /proc/irq/71/smp_affinity_list > > > > 12 > > > > # cat /proc/irq/72/smp_affinity_list > > > > 13 > > > > # cat /proc/irq/73/smp_affinity_list > > > > 14 > > > > # cat /proc/irq/74/smp_affinity_list > > > > 15 > > > > > > > > Driver has required feature NETIF_F_NTUPLE set, ndo_rx_flow_steer() > > > > registered and I am running UDP multiple connections stream using > > > > netperf to the host where I am experimenting aRFS. > > > > > > > > # netperf -V > > > > Netperf version 2.7.0 > > > > > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 8,8 -- -m 1470 -P > > > > 5001,48512 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 9,9 -- -m 1470 -P > > > > 5001,37990 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 10,10 -- -m 1470 -P > > > > 5001,40302 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 11,11 -- -m 1470 -P > > > > 5001,39071 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 12,12 -- -m 1470 -P > > > > 5001,58994 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 13,13 -- -m 1470 -P > > > > 5001,59884 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 14,14 -- -m 1470 -P > > > > 5001,40282 & > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 15,15 -- -m 1470 -P > > > > 5001,56042 & > > > > > > > > I see that our registered callback for ndo_rx_flow_steer() "NEVER" > > > > gets invoked for UDP packets, with TCP_STREAM I do see it gets > > > > invoked. > > > > But while running UDP_STREAM I see it gets invoked for some of TCP > > > > packets as netperf also uses TCP managed connections while running > > > > UDP_STREAM. > > > > > > > > My initial investigation suspects that while running UDP_STREAM with > > > > netperf, rps_sock_flow_table doesn't get updated, as packets never > > > > reach to the flow of inet_recvmsg() > > > > where it gets updated using sock_rps_record_flow(). Which might be the > > > > reason it never invokes NIC's flow steering handler ? > > > > > > > > Please note that when I run UDP stream using "iperf" - I do see that > > > > ou
Re: Accelerated receive flow steering (aRFS) for UDP
On Sat, 2016-10-08 at 12:25 +, Chopra, Manish wrote: > > -Original Message- > > From: Eric Dumazet [mailto:eric.duma...@gmail.com] > > Sent: Saturday, October 08, 2016 5:08 AM > > To: Chopra, Manish > > Cc: netdev@vger.kernel.org; ma...@mellanox.com; t...@herbertland.com > > Subject: Re: Accelerated receive flow steering (aRFS) for UDP > > > > On Fri, 2016-10-07 at 22:55 +, Chopra, Manish wrote: > > > Hello Folks, > > > > > > I am experimenting aRFS with our NIC devices, and for that I have > > > kernel 4.8.x installed with below config. > > > > > > CONFIG_RPS=y > > > CONFIG_RFS_ACCEL=y > > > > > > # cat /proc/cpuinfo | grep processor > > > processor : 0 > > > processor : 1 > > > processor : 2 > > > processor : 3 > > > processor : 4 > > > processor : 5 > > > processor : 6 > > > processor : 7 > > > processor : 8 > > > processor : 9 > > > processor : 10 > > > processor : 11 > > > processor : 12 > > > processor : 13 > > > processor : 14 > > > processor : 15 > > > > > > I configured rps_sock_flow_entries and our NIC rx queues with below > > > values > > > > > > echo 32768 > /proc/sys/net/core/rps_sock_flow_entries > > > echo 4096 > /sys/class/net/p4p1/queues/rx-0/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-1/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-2/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-3/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-4/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-5/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-6/rps_flow_cnt > > > echo 4096 > /sys/class/net/p4p1/queues/rx-7/rps_flow_cnt > > > > > > echo > /sys/class/net/p4p1/queues/rx-0/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-1/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-2/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-3/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-4/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-5/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-6/rps_cpus > > > echo > /sys/class/net/p4p1/queues/rx-7/rps_cpus > > > > > > Below is IRQ affinity configuration for NIC irqs used. > > > > > > # cat /proc/irq/67/smp_affinity_list > > > 8 > > > # cat /proc/irq/68/smp_affinity_list > > > 9 > > > # cat /proc/irq/69/smp_affinity_list > > > 10 > > > # cat /proc/irq/70/smp_affinity_list > > > 11 > > > # cat /proc/irq/71/smp_affinity_list > > > 12 > > > # cat /proc/irq/72/smp_affinity_list > > > 13 > > > # cat /proc/irq/73/smp_affinity_list > > > 14 > > > # cat /proc/irq/74/smp_affinity_list > > > 15 > > > > > > Driver has required feature NETIF_F_NTUPLE set, ndo_rx_flow_steer() > > > registered and I am running UDP multiple connections stream using > > > netperf to the host where I am experimenting aRFS. > > > > > > # netperf -V > > > Netperf version 2.7.0 > > > > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 8,8 -- -m 1470 -P > > > 5001,48512 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 9,9 -- -m 1470 -P > > > 5001,37990 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 10,10 -- -m 1470 -P > > > 5001,40302 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 11,11 -- -m 1470 -P > > > 5001,39071 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 12,12 -- -m 1470 -P > > > 5001,58994 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 13,13 -- -m 1470 -P > > > 5001,59884 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 14,14 -- -m 1470 -P > > > 5001,40282 & > > > netperf -H 192.168.200.40 -t UDP_STREAM -l 150 -T 15,15 -- -m 1470 -P > > > 5001,56042 & > > > > > > I see that our registered callback for ndo_rx_flow_steer() "NEVER" > > > gets invoked for UDP packets, with TCP_STREAM I do see it gets > > > invoked. > > > But while running UDP_STREAM I see it gets invoked for some of TCP > > > packets as netperf also uses TCP managed connections while running > > > UDP_STREAM. > > > > > > My initial investigation suspects that while running UDP_STREAM with > > > netperf, rps_sock_flow_table doesn't get updated, as packets never > > > reach to the flow of inet_recvmsg() > > > where it gets updated using sock_rps_record_flow(). Which might be the > > > reason it never invokes NIC's flow steering handler ? > > > > > > Please note that when I run UDP stream using "iperf" - I do see that > > > our registered callback function for flow steering gets invoked for > > > "UDP" packets. > > > I am not sure if I am missing something in configuration or something > > > else which is I am unware of ? > > > > > > I appreciate any help for this. > > > > Make sure you use connected UDP flows > > > > > > netperf -t UDP_STREAM ... -- -N -n > > > > Otherwise, one UDP socket can be involved in millions of 4-tuples (aka > > flows) > > > > > > Hi Eric, I tried with it but it doesn't help with the said issue.
[PATCH net-next 3/6] qede: Prevent GSO on long Geneve headers
From: Manish Chopra Due to hardware limitation, when transmitting a geneve-encapsulated packet with more than 32 bytes worth of geneve options the hardware would not be able to crack the packet and consider it a regular UDP packet. This implements the ndo_features_check() in qede in order to prevent GSO on said transmitted packets. Signed-off-by: Manish Chopra Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qede/qede_main.c | 35 1 file changed, 35 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 7d5dc1e..6c2b09c 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -2240,6 +2240,40 @@ static void qede_udp_tunnel_del(struct net_device *dev, schedule_delayed_work(&edev->sp_task, 0); } +/* 8B udp header + 8B base tunnel header + 32B option length */ +#define QEDE_MAX_TUN_HDR_LEN 48 + +static netdev_features_t qede_features_check(struct sk_buff *skb, +struct net_device *dev, +netdev_features_t features) +{ + if (skb->encapsulation) { + u8 l4_proto = 0; + + switch (vlan_get_protocol(skb)) { + case htons(ETH_P_IP): + l4_proto = ip_hdr(skb)->protocol; + break; + case htons(ETH_P_IPV6): + l4_proto = ipv6_hdr(skb)->nexthdr; + break; + default: + return features; + } + + /* Disable offloads for geneve tunnels, as HW can't parse +* the geneve header which has option length greater than 32B. +*/ + if ((l4_proto == IPPROTO_UDP) && + ((skb_inner_mac_header(skb) - + skb_transport_header(skb)) > QEDE_MAX_TUN_HDR_LEN)) + return features & ~(NETIF_F_CSUM_MASK | + NETIF_F_GSO_MASK); + } + + return features; +} + static const struct net_device_ops qede_netdev_ops = { .ndo_open = qede_open, .ndo_stop = qede_close, @@ -2264,6 +2298,7 @@ static const struct net_device_ops qede_netdev_ops = { #endif .ndo_udp_tunnel_add = qede_udp_tunnel_add, .ndo_udp_tunnel_del = qede_udp_tunnel_del, + .ndo_features_check = qede_features_check, }; /* - -- 1.9.3
[PATCH net-next 1/6] qed: Pass MAC hints to VFs
From: Yuval Mintz Some hypervisors can support MAC hints to their VFs. Even though we don't have such a hypervisor API in linux, we add sufficient logic for the VF to be able to receive such hints and set the mac accordingly - as long as the VF has not been set with a MAC already. Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 ++-- drivers/net/ethernet/qlogic/qede/qede_main.c | 6 +- include/linux/qed/qed_eth_if.h | 2 +- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c b/drivers/net/ethernet/qlogic/qed/qed_vf.c index abf5bf1..f580bf4 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_vf.c +++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c @@ -1230,8 +1230,8 @@ static void qed_handle_bulletin_change(struct qed_hwfn *hwfn) is_mac_exist = qed_vf_bulletin_get_forced_mac(hwfn, mac, &is_mac_forced); - if (is_mac_exist && is_mac_forced && cookie) - ops->force_mac(cookie, mac); + if (is_mac_exist && cookie) + ops->force_mac(cookie, mac, !!is_mac_forced); /* Always update link configuration according to bulletin */ qed_link_update(hwfn); diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 343038c..9866d95 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -171,10 +171,14 @@ static struct pci_driver qede_pci_driver = { #endif }; -static void qede_force_mac(void *dev, u8 *mac) +static void qede_force_mac(void *dev, u8 *mac, bool forced) { struct qede_dev *edev = dev; + /* MAC hints take effect only if we haven't set one already */ + if (is_valid_ether_addr(edev->ndev->dev_addr) && !forced) + return; + ether_addr_copy(edev->ndev->dev_addr, mac); ether_addr_copy(edev->primary_mac, mac); } diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h index 33c24eb..1c77948 100644 --- a/include/linux/qed/qed_eth_if.h +++ b/include/linux/qed/qed_eth_if.h @@ -129,7 +129,7 @@ struct qed_tunn_params { struct qed_eth_cb_ops { struct qed_common_cb_ops common; - void (*force_mac) (void *dev, u8 *mac); + void (*force_mac) (void *dev, u8 *mac, bool forced); }; #ifdef CONFIG_DCB -- 1.9.3
[PATCH net-next 5/6] qed: Allow chance for fast ramrod completions
From: Yuval Mintz Whenever a ramrod is being sent for some device configuration, the driver is going to sleep at least 5ms between each iteration of polling on the completion of the ramrod. However, in almost every configuration scenario the firmware would be able to comply and complete the ramrod in a manner of several usecs. This is especially important in cases where there might be a lot of sequential configurations applying to the hardware [e.g., RoCE], in which case the existing scheme might cause some visible user delays. This patch changes the completion scheme - instead of immediately starting to sleep for a 'long' period, allow the device to quickly poll on the first iteration after a couple of usecs. Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed_spq.c | 85 +-- 1 file changed, 59 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c index caff415..259a615 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_spq.c +++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c @@ -37,7 +37,11 @@ ***/ #define SPQ_HIGH_PRI_RESERVE_DEFAULT(1) -#define SPQ_BLOCK_SLEEP_LENGTH (1000) + +#define SPQ_BLOCK_DELAY_MAX_ITER(10) +#define SPQ_BLOCK_DELAY_US (10) +#define SPQ_BLOCK_SLEEP_MAX_ITER(1000) +#define SPQ_BLOCK_SLEEP_MS (5) /*** * Blocking Imp. (BLOCK/EBLOCK mode) @@ -57,53 +61,81 @@ static void qed_spq_blocking_cb(struct qed_hwfn *p_hwfn, smp_wmb(); } -static int qed_spq_block(struct qed_hwfn *p_hwfn, -struct qed_spq_entry *p_ent, -u8 *p_fw_ret) +static int __qed_spq_block(struct qed_hwfn *p_hwfn, + struct qed_spq_entry *p_ent, + u8 *p_fw_ret, bool sleep_between_iter) { - int sleep_count = SPQ_BLOCK_SLEEP_LENGTH; struct qed_spq_comp_done *comp_done; - int rc; + u32 iter_cnt; comp_done = (struct qed_spq_comp_done *)p_ent->comp_cb.cookie; - while (sleep_count) { - /* validate we receive completion update */ + iter_cnt = sleep_between_iter ? SPQ_BLOCK_SLEEP_MAX_ITER + : SPQ_BLOCK_DELAY_MAX_ITER; + + while (iter_cnt--) { + /* Validate we receive completion update */ smp_rmb(); if (comp_done->done == 1) { if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; } - usleep_range(5000, 1); - sleep_count--; + + if (sleep_between_iter) + msleep(SPQ_BLOCK_SLEEP_MS); + else + udelay(SPQ_BLOCK_DELAY_US); } + return -EBUSY; +} + +static int qed_spq_block(struct qed_hwfn *p_hwfn, +struct qed_spq_entry *p_ent, +u8 *p_fw_ret, bool skip_quick_poll) +{ + struct qed_spq_comp_done *comp_done; + int rc; + + /* A relatively short polling period w/o sleeping, to allow the FW to +* complete the ramrod and thus possibly to avoid the following sleeps. +*/ + if (!skip_quick_poll) { + rc = __qed_spq_block(p_hwfn, p_ent, p_fw_ret, false); + if (!rc) + return 0; + } + + /* Move to polling with a sleeping period between iterations */ + rc = __qed_spq_block(p_hwfn, p_ent, p_fw_ret, true); + if (!rc) + return 0; + DP_INFO(p_hwfn, "Ramrod is stuck, requesting MCP drain\n"); rc = qed_mcp_drain(p_hwfn, p_hwfn->p_main_ptt); - if (rc != 0) + if (rc) { DP_NOTICE(p_hwfn, "MCP drain failed\n"); + goto err; + } /* Retry after drain */ - sleep_count = SPQ_BLOCK_SLEEP_LENGTH; - while (sleep_count) { - /* validate we receive completion update */ - smp_rmb(); - if (comp_done->done == 1) { - if (p_fw_ret) - *p_fw_ret = comp_done->fw_return_code; - return 0; - } - usleep_range(5000, 1); - sleep_count--; - } + rc = __qed_spq_block(p_hwfn, p_ent, p_fw_ret, true); + if (!rc) + return 0; + comp_done = (struct qed_spq_comp_done *)p_ent->comp_cb.cookie; if (comp_done->done == 1) { if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; } - - DP_NOTICE(p_hwfn, "Ramrod is stuck, MCP drain failed\n"); +err: + DP_NOTICE(p_hwf
[PATCH net-next 2/6] qede: GSO support for tunnels with outer csum
From: Manish Chopra This patch adds GSO support for GRE and UDP tunnels where outer checksums are enabled. Signed-off-by: Manish Chopra Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 26 +++--- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 28c0e9f..f50e527 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -320,6 +320,7 @@ struct qede_fastpath { #define XMIT_L4_CSUM BIT(0) #define XMIT_LSO BIT(1) #define XMIT_ENC BIT(2) +#define XMIT_ENC_GSO_L4_CSUM BIT(3) #define QEDE_CSUM_ERRORBIT(0) #define QEDE_CSUM_UNNECESSARY BIT(1) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 9866d95..7d5dc1e 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -400,8 +400,19 @@ static u32 qede_xmit_type(struct qede_dev *edev, (ipv6_hdr(skb)->nexthdr == NEXTHDR_IPV6)) *ipv6_ext = 1; - if (skb->encapsulation) + if (skb->encapsulation) { rc |= XMIT_ENC; + if (skb_is_gso(skb)) { + unsigned short gso_type = skb_shinfo(skb)->gso_type; + + if ((gso_type & SKB_GSO_UDP_TUNNEL_CSUM) || + (gso_type & SKB_GSO_GRE_CSUM)) + rc |= XMIT_ENC_GSO_L4_CSUM; + + rc |= XMIT_LSO; + return rc; + } + } if (skb_is_gso(skb)) rc |= XMIT_LSO; @@ -637,6 +648,12 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb, if (unlikely(xmit_type & XMIT_ENC)) { first_bd->data.bd_flags.bitfields |= 1 << ETH_TX_1ST_BD_FLAGS_TUNN_IP_CSUM_SHIFT; + + if (xmit_type & XMIT_ENC_GSO_L4_CSUM) { + u8 tmp = ETH_TX_1ST_BD_FLAGS_TUNN_L4_CSUM_SHIFT; + + first_bd->data.bd_flags.bitfields |= 1 << tmp; + } hlen = qede_get_skb_hlen(skb, true); } else { first_bd->data.bd_flags.bitfields |= @@ -2320,11 +2337,14 @@ static void qede_init_ndev(struct qede_dev *edev) /* Encap features*/ hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL | - NETIF_F_TSO_ECN; + NETIF_F_TSO_ECN | NETIF_F_GSO_UDP_TUNNEL_CSUM | + NETIF_F_GSO_GRE_CSUM; ndev->hw_enc_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GSO_GRE | - NETIF_F_GSO_UDP_TUNNEL | NETIF_F_RXCSUM; + NETIF_F_GSO_UDP_TUNNEL | NETIF_F_RXCSUM | + NETIF_F_GSO_UDP_TUNNEL_CSUM | + NETIF_F_GSO_GRE_CSUM; ndev->vlan_features = hw_features | NETIF_F_RXHASH | NETIF_F_RXCSUM | NETIF_F_HIGHDMA; -- 1.9.3
[PATCH net-next 0/6] qed*: driver updates
From: Yuval Mintz There are several new additions in this series; Most are connected to either Tx offloading or Rx classifications [either fastpath changes or supporting configuration]. In addition, there's a single IOV enhancement. Dave, Please consider applying this series to `net-next'. Thanks, Yuval Manish Chopra (2): qede: GSO support for tunnels with outer csum qede: Prevent GSO on long Geneve headers Yuval Mintz (4): qed: Pass MAC hints to VFs qed*: Allow unicast filtering qed: Allow chance for fast ramrod completions qed: Handle malicious VFs events drivers/net/ethernet/qlogic/qed/qed_l2.c | 12 ++- drivers/net/ethernet/qlogic/qed/qed_spq.c| 85 ++-- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 114 ++- drivers/net/ethernet/qlogic/qed/qed_sriov.h | 1 + drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 +- drivers/net/ethernet/qlogic/qed/qed_vf.h | 1 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 71 +++-- include/linux/qed/qed_eth_if.h | 3 +- 9 files changed, 236 insertions(+), 56 deletions(-) -- 1.9.3
[PATCH net-next 6/6] qed: Handle malicious VFs events
From: Yuval Mintz Malicious VFs might be caught in several different methods: - Misusing their bar permission and being blocked by hardware. - Misusing their fastpath logic and being blocked by firmware. - Misusing their interaction with their PF via hw-channel, and being blocked by PF driver. On the first two items, firmware would indicate to driver that the VF is to be considered malicious, but would sometime still allow the VF to communicate with the PF [depending on the exact nature of the malicious activity done by the VF]. The current existing logic on the PF side lacks handling of such events, and might allow the PF to perform some incorrect configuration on behalf of a VF that was previously indicated as malicious. The new scheme is simple - Once the PF determines a VF is malicious it would: a. Ignore any further requests on behalf of the VF-driver. b. Prevent any configurations initiated by the hyperuser for the malicious VF, as firmware isn't willing to serve such. The malicious indication would be cleared upon the VF flr, after which it would become usable once again. Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 114 +++- drivers/net/ethernet/qlogic/qed/qed_sriov.h | 1 + drivers/net/ethernet/qlogic/qed/qed_vf.h| 1 + 3 files changed, 96 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c index d2d6621..6f029f9 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -109,7 +109,8 @@ static int qed_sp_vf_stop(struct qed_hwfn *p_hwfn, } static bool qed_iov_is_valid_vfid(struct qed_hwfn *p_hwfn, - int rel_vf_id, bool b_enabled_only) + int rel_vf_id, + bool b_enabled_only, bool b_non_malicious) { if (!p_hwfn->pf_iov_info) { DP_NOTICE(p_hwfn->cdev, "No iov info\n"); @@ -124,6 +125,10 @@ static bool qed_iov_is_valid_vfid(struct qed_hwfn *p_hwfn, b_enabled_only) return false; + if ((p_hwfn->pf_iov_info->vfs_array[rel_vf_id].b_malicious) && + b_non_malicious) + return false; + return true; } @@ -138,7 +143,8 @@ static struct qed_vf_info *qed_iov_get_vf_info(struct qed_hwfn *p_hwfn, return NULL; } - if (qed_iov_is_valid_vfid(p_hwfn, relative_vf_id, b_enabled_only)) + if (qed_iov_is_valid_vfid(p_hwfn, relative_vf_id, + b_enabled_only, false)) vf = &p_hwfn->pf_iov_info->vfs_array[relative_vf_id]; else DP_ERR(p_hwfn, "qed_iov_get_vf_info: VF[%d] is not enabled\n", @@ -542,7 +548,8 @@ int qed_iov_hw_info(struct qed_hwfn *p_hwfn) return 0; } -static bool qed_iov_pf_sanity_check(struct qed_hwfn *p_hwfn, int vfid) +bool _qed_iov_pf_sanity_check(struct qed_hwfn *p_hwfn, + int vfid, bool b_fail_malicious) { /* Check PF supports sriov */ if (IS_VF(p_hwfn->cdev) || !IS_QED_SRIOV(p_hwfn->cdev) || @@ -550,12 +557,17 @@ static bool qed_iov_pf_sanity_check(struct qed_hwfn *p_hwfn, int vfid) return false; /* Check VF validity */ - if (!qed_iov_is_valid_vfid(p_hwfn, vfid, true)) + if (!qed_iov_is_valid_vfid(p_hwfn, vfid, true, b_fail_malicious)) return false; return true; } +bool qed_iov_pf_sanity_check(struct qed_hwfn *p_hwfn, int vfid) +{ + return _qed_iov_pf_sanity_check(p_hwfn, vfid, true); +} + static void qed_iov_set_vf_to_disable(struct qed_dev *cdev, u16 rel_vf_id, u8 to_disable) { @@ -652,6 +664,9 @@ static int qed_iov_enable_vf_access(struct qed_hwfn *p_hwfn, qed_iov_vf_igu_reset(p_hwfn, p_ptt, vf); + /* It's possible VF was previously considered malicious */ + vf->b_malicious = false; + rc = qed_mcp_config_vf_msix(p_hwfn, p_ptt, vf->abs_vf_id, vf->num_sbs); if (rc) return rc; @@ -2804,6 +2819,13 @@ qed_iov_execute_vf_flr_cleanup(struct qed_hwfn *p_hwfn, return rc; } + /* Workaround to make VF-PF channel ready, as FW +* doesn't do that as a part of FLR. +*/ + REG_WR(p_hwfn, + GTT_BAR0_MAP_REG_USDM_RAM + + USTORM_VF_PF_CHANNEL_READY_OFFSET(vfid), 1); + /* VF_STOPPED has to be set only after final cleanup * but prior to re-enabling the VF. */ @@ -2942,7 +2964,8 @@ static void qed_iov_process_mbx_req(struct qed_hwfn *p_hwfn, mbx->first_tlv = mbx->req_virt->first_tlv; /* check if tlv type is known */ - if (qed_iov_tlv_supported(mbx->first_tlv.tl.type)) { +
[PATCH net-next 4/6] qed*: Allow unicast filtering
From: Yuval Mintz Apparently qede fails to set IFF_UNICAST_FLT, and as a result is not actually performing unicast MAC filtering. While we're at it - relax a hard-coded limitation that limits each interface into using at most 15 unicast MAC addresses before turning promiscuous. Instead utilize the HW resources to their limit. Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed_l2.c | 12 ++-- drivers/net/ethernet/qlogic/qede/qede_main.c | 4 +++- include/linux/qed/qed_eth_if.h | 1 + 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index ddd410a..572ed04 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -1652,6 +1652,7 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev, if (IS_PF(cdev)) { int max_vf_vlan_filters = 0; + int max_vf_mac_filters = 0; if (cdev->int_params.out.int_mode == QED_INT_MODE_MSIX) { for_each_hwfn(cdev, i) @@ -1665,11 +1666,18 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev, info->num_queues = cdev->num_hwfns; } - if (IS_QED_SRIOV(cdev)) + if (IS_QED_SRIOV(cdev)) { max_vf_vlan_filters = cdev->p_iov_info->total_vfs * QED_ETH_VF_NUM_VLAN_FILTERS; - info->num_vlan_filters = RESC_NUM(&cdev->hwfns[0], QED_VLAN) - + max_vf_mac_filters = cdev->p_iov_info->total_vfs * +QED_ETH_VF_NUM_MAC_FILTERS; + } + info->num_vlan_filters = RESC_NUM(QED_LEADING_HWFN(cdev), + QED_VLAN) - max_vf_vlan_filters; + info->num_mac_filters = RESC_NUM(QED_LEADING_HWFN(cdev), +QED_MAC) - + max_vf_mac_filters; ether_addr_copy(info->port_mac, cdev->hwfns[0].hw_info.hw_mac_addr); diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 6c2b09c..0e483af 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -2365,6 +2365,8 @@ static void qede_init_ndev(struct qede_dev *edev) qede_set_ethtool_ops(ndev); + ndev->priv_flags = IFF_UNICAST_FLT; + /* user-changeble features */ hw_features = NETIF_F_GRO | NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | @@ -3937,7 +3939,7 @@ static void qede_config_rx_mode(struct net_device *ndev) /* Check for promiscuous */ if ((ndev->flags & IFF_PROMISC) || - (uc_count > 15)) { /* @@@TBD resource allocation - 1 */ + (uc_count > edev->dev_info.num_mac_filters - 1)) { accept_flags = QED_FILTER_RX_MODE_TYPE_PROMISC; } else { /* Add MAC filters according to the unicast secondary macs */ diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h index 1c77948..1513080 100644 --- a/include/linux/qed/qed_eth_if.h +++ b/include/linux/qed/qed_eth_if.h @@ -23,6 +23,7 @@ struct qed_dev_eth_info { u8 port_mac[ETH_ALEN]; u8 num_vlan_filters; + u16 num_mac_filters; /* Legacy VF - this affects the datapath, so qede has to know */ bool is_legacy; -- 1.9.3
[PATCH] net: dsa: slave: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes --- net/dsa/slave.c | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 6b1282c..68714a5 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -641,7 +641,8 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) /* ethtool operations ***/ static int -dsa_slave_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) +dsa_slave_get_link_ksettings(struct net_device *dev, +struct ethtool_link_ksettings *cmd) { struct dsa_slave_priv *p = netdev_priv(dev); int err; @@ -650,19 +651,20 @@ dsa_slave_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) if (p->phy != NULL) { err = phy_read_status(p->phy); if (err == 0) - err = phy_ethtool_gset(p->phy, cmd); + err = phy_ethtool_ksettings_get(p->phy, cmd); } return err; } static int -dsa_slave_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) +dsa_slave_set_link_ksettings(struct net_device *dev, +const struct ethtool_link_ksettings *cmd) { struct dsa_slave_priv *p = netdev_priv(dev); if (p->phy != NULL) - return phy_ethtool_sset(p->phy, cmd); + return phy_ethtool_ksettings_set(p->phy, cmd); return -EOPNOTSUPP; } @@ -990,8 +992,6 @@ void dsa_cpu_port_ethtool_init(struct ethtool_ops *ops) } static const struct ethtool_ops dsa_slave_ethtool_ops = { - .get_settings = dsa_slave_get_settings, - .set_settings = dsa_slave_set_settings, .get_drvinfo= dsa_slave_get_drvinfo, .get_regs_len = dsa_slave_get_regs_len, .get_regs = dsa_slave_get_regs, @@ -1007,6 +1007,8 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = { .get_wol= dsa_slave_get_wol, .set_eee= dsa_slave_set_eee, .get_eee= dsa_slave_get_eee, + .get_link_ksettings = dsa_slave_get_link_ksettings, + .set_link_ksettings = dsa_slave_set_link_ksettings, }; static const struct net_device_ops dsa_slave_netdev_ops = { -- 1.7.4.4
[PATCH] qed: Fix to use list_for_each_entry_safe() when delete items
From: Wei Yongjun Since we will remove items off the list using list_del() we need to use a safe version of the list_for_each_entry() macro aptly named list_for_each_entry_safe(). Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support") Signed-off-by: Wei Yongjun --- drivers/net/ethernet/qlogic/qed/qed_ll2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c index a6db107..4428333 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c @@ -1517,7 +1517,7 @@ static void qed_ll2_register_cb_ops(struct qed_dev *cdev, static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params) { struct qed_ll2_info ll2_info; - struct qed_ll2_buffer *buffer; + struct qed_ll2_buffer *buffer, *tmp; enum qed_ll2_conn_type conn_type; struct qed_ptt *p_ptt; int rc, i; @@ -1587,7 +1587,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params) /* Post all Rx buffers to FW */ spin_lock_bh(&cdev->ll2->lock); - list_for_each_entry(buffer, &cdev->ll2->list, list) { + list_for_each_entry_safe(buffer, tmp, &cdev->ll2->list, list) { rc = qed_ll2_post_rx_buffer(QED_LEADING_HWFN(cdev), cdev->ll2->handle, buffer->phys_addr, 0, buffer, 1);
RE: [PATCH] qed: Fix to use list_for_each_entry_safe() when delete items
> From: Wei Yongjun > > Since we will remove items off the list using list_del() we need to use a safe > version of the list_for_each_entry() macro aptly named > list_for_each_entry_safe(). > > Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support") > Signed-off-by: Wei Yongjun > --- > drivers/net/ethernet/qlogic/qed/qed_ll2.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c > b/drivers/net/ethernet/qlogic/qed/qed_ll2.c > index a6db107..4428333 100644 > --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c > +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c > @@ -1517,7 +1517,7 @@ static void qed_ll2_register_cb_ops(struct qed_dev > *cdev, static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params > *params) { > struct qed_ll2_info ll2_info; > - struct qed_ll2_buffer *buffer; > + struct qed_ll2_buffer *buffer, *tmp; > enum qed_ll2_conn_type conn_type; > struct qed_ptt *p_ptt; > int rc, i; > @@ -1587,7 +1587,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct > qed_ll2_params *params) > > /* Post all Rx buffers to FW */ > spin_lock_bh(&cdev->ll2->lock); > - list_for_each_entry(buffer, &cdev->ll2->list, list) { > + list_for_each_entry_safe(buffer, tmp, &cdev->ll2->list, list) { > rc = qed_ll2_post_rx_buffer(QED_LEADING_HWFN(cdev), > cdev->ll2->handle, > buffer->phys_addr, 0, buffer, 1); Thanks for the catch. A single petty comment - having the variable called 'tmp' in such a long function is far from informative. Perhaps 'tmp_buffer' would have been better. Regardless, Acked-by: Yuval Mintz
[PATCH] staging: net: netlogic: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes --- drivers/staging/netlogic/xlr_net.c | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/staging/netlogic/xlr_net.c b/drivers/staging/netlogic/xlr_net.c index 552a7dc..cdf01b9 100644 --- a/drivers/staging/netlogic/xlr_net.c +++ b/drivers/staging/netlogic/xlr_net.c @@ -172,29 +172,31 @@ static struct phy_device *xlr_get_phydev(struct xlr_net_priv *priv) /* * Ethtool operation */ -static int xlr_get_settings(struct net_device *ndev, struct ethtool_cmd *ecmd) +static int xlr_get_link_ksettings(struct net_device *ndev, + struct ethtool_link_ksettings *ecmd) { struct xlr_net_priv *priv = netdev_priv(ndev); struct phy_device *phydev = xlr_get_phydev(priv); if (!phydev) return -ENODEV; - return phy_ethtool_gset(phydev, ecmd); + return phy_ethtool_ksettings_get(phydev, ecmd); } -static int xlr_set_settings(struct net_device *ndev, struct ethtool_cmd *ecmd) +static int xlr_set_link_ksettings(struct net_device *ndev, + const struct ethtool_link_ksettings *ecmd) { struct xlr_net_priv *priv = netdev_priv(ndev); struct phy_device *phydev = xlr_get_phydev(priv); if (!phydev) return -ENODEV; - return phy_ethtool_sset(phydev, ecmd); + return phy_ethtool_ksettings_set(phydev, ecmd); } static const struct ethtool_ops xlr_ethtool_ops = { - .get_settings = xlr_get_settings, - .set_settings = xlr_set_settings, + .get_link_ksettings = xlr_get_link_ksettings, + .set_link_ksettings = xlr_set_link_ksettings, }; /* -- 1.7.4.4
Re: Code quality and XDP
From: Tom Herbert Date: Sat, 8 Oct 2016 12:11:51 -0700 > On Fri, Oct 7, 2016 at 9:28 PM, Jesper Dangaard Brouer > wrote: >> >> On Sat, 8 Oct 2016 07:25:01 +0900 Tom Herbert wrote: >> >>> One concern raised at netdev concerning XDP is how are we going to >>> maintain code quality, security, and correctness of programs being >>> loaded. With kernel bypass it is not just the kernel code path that is >>> being bypassed, but also the processes that hold the quality of code >>> being accepted to a high bar. Our users expect that quality to be >>> maintained. >>> >>> I suggest that we need XDP programs to be subject to the same scrutiny >>> that any other kernel netdev code is. One idea is to sign programs >>> that have been accepted into the kernel. By default only signed >>> programs would be allowed to be loaded, the override to allow unsigned >>> programs might be a kernel config or a least a boot parameter >>> (enabling the override needs to be very explicit). >> >> Sorry, I think this "lock-down" will kill the DDoS use-case. In the >> DDoS mitigation use-case, is all about flexibility to adapt quickly to >> changing attacks. Thus, you need the ability to quickly modify your >> programs to catch attack signatures. >> > As I mentioned the ability to run arbitrary programs can be explicitly > be disabled for such use-cases. But not all use cases of XDP require > such dynamic program-ability and not every user is going to need or > want this capability. For instance, an ILA router should be a > straightforward program to implement and not really need dynamic > modification (it is controlled through configuration). If someone > chooses to do a proprietary ILA router themselves they can do that by > disabling the lock-down, but they shouldn't expect any support from > the community when things got wrong. This is no different then when > people post to netdev about problems they are having with proprietary > modules. If they use an in-tree implementation then we could support > that. The only entities that care about lock down are the same entities that use kernel module signing. And these area people who put together distributions where they wish to make sure only signed kernel modules and XDP programs can execute without somehow "tainting" the system so that crash dumps sent in for support and analysis are appropriately marked as such. The default should definitely be to allow the administrator to load any XDP program they want. Administrators and distribution maintainers can both optionally change the system to only run signed XDP programs. I frankly am not so fearful of people running arbitrary XDP programs. If a CAP_SYS_ADMIN privileged entity on a physical host does it, it's his problem. He could have messed up the system even more. A CAP_SYS_ADMIN privileged entity on a virtual host can only screw up his own traffic, and this doesn't hurt anyone else on the machine. Powerful facilities come at a certain cost. Let's not defang this by default until we really know it will bite people.
Re: [PATCH net-next 0/5] be2net: patch-set
From: Sriharsha Basavapatna Date: Sun, 9 Oct 2016 09:58:48 +0530 > The following patch set contains a few bug fixes. > Please consider applying this to the net-next tree. > Thanks. > > Patch-1 Obtains proper PF number for BEx chips > Patch-2 Fixes a FW update issue seen with BEx chips > Patch-3 Updates copyright string > Patch-4 Fixes TX stats for TSO packets > Patch-5 Enables VF link state setting for BE3 Please do not target real bug fixes at net-next, they should target 'net' instead. And that's where I have applied this series. Thanks.
[PATCH] net: usb: lan78xx: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes --- drivers/net/usb/lan78xx.c | 70 +--- 1 files changed, 40 insertions(+), 30 deletions(-) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index db558b8..13f033c 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -1092,7 +1092,7 @@ static int lan78xx_update_flowcontrol(struct lan78xx_net *dev, u8 duplex, static int lan78xx_link_reset(struct lan78xx_net *dev) { struct phy_device *phydev = dev->net->phydev; - struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET }; + struct ethtool_link_ksettings ecmd; int ladv, radv, ret; u32 buf; @@ -1126,12 +1126,12 @@ static int lan78xx_link_reset(struct lan78xx_net *dev) } else if (phydev->link && !dev->link_on) { dev->link_on = true; - phy_ethtool_gset(phydev, &ecmd); + phy_ethtool_ksettings_get(phydev, &ecmd); ret = phy_read(phydev, LAN88XX_INT_STS); if (dev->udev->speed == USB_SPEED_SUPER) { - if (ethtool_cmd_speed(&ecmd) == 1000) { + if (ecmd.base.speed == 1000) { /* disable U2 */ ret = lan78xx_read_reg(dev, USB_CFG1, &buf); buf &= ~USB_CFG1_DEV_U2_INIT_EN_; @@ -1159,9 +1159,10 @@ static int lan78xx_link_reset(struct lan78xx_net *dev) netif_dbg(dev, link, dev->net, "speed: %u duplex: %d anadv: 0x%04x anlpa: 0x%04x", - ethtool_cmd_speed(&ecmd), ecmd.duplex, ladv, radv); + ecmd.base.speed, ecmd.base.duplex, ladv, radv); - ret = lan78xx_update_flowcontrol(dev, ecmd.duplex, ladv, radv); + ret = lan78xx_update_flowcontrol(dev, ecmd.base.duplex, ladv, +radv); phy_mac_interrupt(phydev, 1); if (!timer_pending(&dev->stat_monitor)) { @@ -1484,7 +1485,8 @@ static void lan78xx_set_mdix_status(struct net_device *net, __u8 mdix_ctrl) dev->mdix_ctrl = mdix_ctrl; } -static int lan78xx_get_settings(struct net_device *net, struct ethtool_cmd *cmd) +static int lan78xx_get_link_ksettings(struct net_device *net, + struct ethtool_link_ksettings *cmd) { struct lan78xx_net *dev = netdev_priv(net); struct phy_device *phydev = net->phydev; @@ -1495,20 +1497,20 @@ static int lan78xx_get_settings(struct net_device *net, struct ethtool_cmd *cmd) if (ret < 0) return ret; - ret = phy_ethtool_gset(phydev, cmd); + ret = phy_ethtool_ksettings_get(phydev, cmd); buf = lan78xx_get_mdix_status(net); buf &= LAN88XX_EXT_MODE_CTRL_MDIX_MASK_; if (buf == LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_) { - cmd->eth_tp_mdix = ETH_TP_MDI_AUTO; - cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO; + cmd->base.eth_tp_mdix = ETH_TP_MDI_AUTO; + cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO; } else if (buf == LAN88XX_EXT_MODE_CTRL_MDI_) { - cmd->eth_tp_mdix = ETH_TP_MDI; - cmd->eth_tp_mdix_ctrl = ETH_TP_MDI; + cmd->base.eth_tp_mdix = ETH_TP_MDI; + cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI; } else if (buf == LAN88XX_EXT_MODE_CTRL_MDI_X_) { - cmd->eth_tp_mdix = ETH_TP_MDI_X; - cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_X; + cmd->base.eth_tp_mdix = ETH_TP_MDI_X; + cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_X; } usb_autopm_put_interface(dev->intf); @@ -1516,7 +1518,8 @@ static int lan78xx_get_settings(struct net_device *net, struct ethtool_cmd *cmd) return ret; } -static int lan78xx_set_settings(struct net_device *net, struct ethtool_cmd *cmd) +static int lan78xx_set_link_ksettings(struct net_device *net, + const struct ethtool_link_ksettings *cmd) { struct lan78xx_net *dev = netdev_priv(net); struct phy_device *phydev = net->phydev; @@ -1527,14 +1530,13 @@ static int lan78xx_set_settings(struct net_device *net, struct ethtool_cmd *cmd) if (ret < 0) return ret; - if (dev->mdix_ctrl != cmd->eth_tp_mdix_ctrl) { - lan78xx_set_mdix_status(net, cmd->eth_tp_mdix_ctrl); - } + if (dev->mdix_ctrl != cmd->base.eth_tp_mdix_ctrl) + lan78xx_set_mdix_status(net, cmd->base.eth_tp_mdix_ctrl); /* change speed & duplex */ - ret = phy_ethtool_sset(phydev, cmd); + ret = phy_ethtool_ksettings_set(phydev, cmd); - if (!cmd->autoneg) { + if (!cmd->base.autoneg) { /* force link down
Re: net: BUG still has locks held in unix_stream_splice_read
Hello, While running syzkaller fuzzer on commit b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting: [ BUG: syz-executor/15138 still has locks held! ] 4.8.0+ #29 Not tainted - 1 lock held by syz-executor/15138: #0: (&pipe->mutex/1){+.+.+.}, at: [< inline >] pipe_lock_nested fs/pipe.c:66 #0: (&pipe->mutex/1){+.+.+.}, at: [] pipe_lock+0x5b/0x70 fs/pipe.c:74 stack backtrace: CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 880044d4fa38 82d383c9 fbfff1097248 88005a44a3c0 88005a44a3c0 dc00 88005a44a3c0 8800541fb9b8 880044d4fa58 81463cd5 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0x12e/0x185 lib/dump_stack.c:51 [< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296 [] debug_check_no_locks_held+0x125/0x140 kernel/locking/lockdep.c:4302 [< inline >] try_to_freeze include/linux/freezer.h:65 [< inline >] freezer_count include/linux/freezer.h:127 [< inline >] freezable_schedule_timeout include/linux/freezer.h:192 [< inline >] unix_stream_data_wait net/unix/af_unix.c:2223 [] unix_stream_read_generic+0x1317/0x1b70 net/unix/af_unix.c:2332 [] unix_stream_splice_read+0x15b/0x1d0 net/unix/af_unix.c:2506 [] sock_splice_read+0xbe/0x100 net/socket.c:775 [] do_splice_to+0x10f/0x170 fs/splice.c:908 [< inline >] do_splice fs/splice.c:1196 [< inline >] SYSC_splice fs/splice.c:1420 [] SyS_splice+0x114c/0x15b0 fs/splice.c:1403 [] entry_SYSCALL_64_fastpath+0x23/0xc6 I suspect this is: commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4 Author: Al Viro Date: Sat Sep 17 21:02:10 2016 -0400 skb_splice_bits(): get rid of callback since pipe_lock is the outermost now, we don't need to drop/regain socket locks around the call of splice_to_pipe() from skb_splice_bits(), which kills the need to have a socket-specific callback; we can just call splice_to_pipe() and be done with that.
net: BUG still has locks held in unix_stream_splice_read
Hello, While running syzkaller fuzzer on commit b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting: [ BUG: syz-executor/15138 still has locks held! ] 4.8.0+ #29 Not tainted - 1 lock held by syz-executor/15138: #0: (&pipe->mutex/1){+.+.+.}, at: [< inline >] pipe_lock_nested fs/pipe.c:66 #0: (&pipe->mutex/1){+.+.+.}, at: [] pipe_lock+0x5b/0x70 fs/pipe.c:74 stack backtrace: CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 880044d4fa38 82d383c9 fbfff1097248 88005a44a3c0 88005a44a3c0 dc00 88005a44a3c0 8800541fb9b8 880044d4fa58 81463cd5 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0x12e/0x185 lib/dump_stack.c:51 [< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296 [] debug_check_no_locks_held+0x125/0x140 kernel/locking/lockdep.c:4302 [< inline >] try_to_freeze include/linux/freezer.h:65 [< inline >] freezer_count include/linux/freezer.h:127 [< inline >] freezable_schedule_timeout include/linux/freezer.h:192 [< inline >] unix_stream_data_wait net/unix/af_unix.c:2223 [] unix_stream_read_generic+0x1317/0x1b70 net/unix/af_unix.c:2332 [] unix_stream_splice_read+0x15b/0x1d0 net/unix/af_unix.c:2506 [] sock_splice_read+0xbe/0x100 net/socket.c:775 [] do_splice_to+0x10f/0x170 fs/splice.c:908 [< inline >] do_splice fs/splice.c:1196 [< inline >] SYSC_splice fs/splice.c:1420 [] SyS_splice+0x114c/0x15b0 fs/splice.c:1403 [] entry_SYSCALL_64_fastpath+0x23/0xc6 I suspect this is: commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4 Author: Al Viro Date: Sat Sep 17 21:02:10 2016 -0400 skb_splice_bits(): get rid of callback since pipe_lock is the outermost now, we don't need to drop/regain socket locks around the call of splice_to_pipe() from skb_splice_bits(), which kills the need to have a socket-specific callback; we can just call splice_to_pipe() and be done with that.
Re: [PATCH iproute2] devlink: Convert conditional in dl_argv_handle_port() to switch()
On Sun, Oct 09, 2016 at 10:14:18AM +0800, Hangbin Liu wrote: > Discovered by Phil's covscan. The final return statement is never reached. > This is not inherently clear from looking at the code, so change the > conditional to a switch() statement which should clarify this. > > CC: Phil Sutter > Signed-off-by: Hangbin Liu Acked-by: Phil Sutter
[PATCH v2] Add support for ethtool operations to RDC R6040.
Signed-off-by: Venkat Prashanth B U Changes since v1: 1. Made the commit message more clear 2. Add enumeration data type RTL_FLAG_MAX 3. Modified the locking interface used in r6040_get_regs() 4. Initialized mutex dynamically in a function r6040_get_regs() 5. Declared u32 msg_enable in struct r6040_private --- drivers/net/ethernet/rdc/r6040.c | 95 1 file changed, 95 insertions(+) diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c index cb29ee2..167ff59 100644 --- a/drivers/net/ethernet/rdc/r6040.c +++ b/drivers/net/ethernet/rdc/r6040.c @@ -183,6 +183,10 @@ struct r6040_descriptor { u32 rev2; /* 1C-1F */ } __aligned(32); +enum rtl_flag { + RTL_FLAG_MAX +}; + struct r6040_private { spinlock_t lock;/* driver lock */ struct pci_dev *pdev; @@ -196,12 +200,18 @@ struct r6040_private { dma_addr_t tx_ring_dma; u16 tx_free_desc; u16 mcr0; + u32 msg_enable; struct net_device *dev; struct mii_bus *mii_bus; struct napi_struct napi; void __iomem *base; int old_link; int old_duplex; + struct { + DECLARE_BITMAP(flags, RTL_FLAG_MAX); + struct mutex mutex; + struct work_struct work; + } wk; }; static char version[] = DRV_NAME @@ -955,12 +965,97 @@ static void netdev_get_drvinfo(struct net_device *dev, strlcpy(info->bus_info, pci_name(rp->pdev), sizeof(info->bus_info)); } +static void r6040_lock_work(struct r6040_private *tp) +{ + mutex_lock(&tp->wk.mutex); +} + +static void r6040_unlock_work(struct r6040_private *tp) +{ + mutex_unlock(&tp->wk.mutex); +} + +static int r6040_get_regs_len (struct net_device *dev) +{ + return R6040_IO_SIZE; +} + +static void r6040_get_regs (struct net_device *dev, struct ethtool_regs *regs, void *p) +{ + struct r6040_private *tp = netdev_priv (dev); + u32 __iomem *data = tp->base; + u32 *dw = p; + int i; + + r6040_lock_work (tp); + for (i = 0; i < R6040_IO_SIZE; i += 4) +memcpy_fromio (dw++, data++, 4); + r6040_unlock_work (tp); +} + +static u32 r6040_get_msglevel (struct net_device *dev) +{ + struct r6040_private *tp = netdev_priv (dev); + + return tp->msg_enable; +} + +static void r6040_set_msglevel (struct net_device *dev, u32 value) +{ + struct r6040_private *tp = netdev_priv (dev); + + tp->msg_enable = value; +} + +static const char r6040_gstrings[][ETH_GSTRING_LEN] = { + "tx_packets", + "rx_packets", + "tx_errors", + "rx_errors", + "rx_missed", + "align_errors", + "tx_single_collisions", + "tx_multi_collisions", + "unicast", + "broadcast", + "multicast", + "tx_aborted", + "tx_underrun", +}; + +static int r6040_get_sset_count (struct net_device *dev, int sset) +{ + switch (sset) +{ +case ETH_SS_STATS: + return ARRAY_SIZE (r6040_gstrings); +default: + return -EOPNOTSUPP; +} +} + +static void r6040_get_strings (struct net_device *dev, u32 stringset, u8 * data) +{ + switch (stringset) +{ +case ETH_SS_STATS: + memcpy (data, *r6040_gstrings, sizeof (r6040_gstrings)); + break; +} +} + static const struct ethtool_ops netdev_ethtool_ops = { .get_drvinfo= netdev_get_drvinfo, .get_link = ethtool_op_get_link, .get_ts_info= ethtool_op_get_ts_info, .get_link_ksettings = phy_ethtool_get_link_ksettings, .set_link_ksettings = phy_ethtool_set_link_ksettings, + .get_regs_len = r6040_get_regs_len, + .get_msglevel = r6040_get_msglevel, + .set_msglevel = r6040_set_msglevel, + .get_regs = r6040_get_regs, + .get_strings = r6040_get_strings, + .get_sset_count = r6040_get_sset_count, }; static const struct net_device_ops r6040_netdev_ops = { -- 1.9.2