Re: netlink: GPF in sock_sndtimeo
On Fri, Dec 9, 2016 at 8:13 PM, Cong Wang wrote:
> On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs wrote:
>> On 2016-12-08 22:57, Cong Wang wrote:
>>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs wrote:
>>> > I also tried to extend Cong Wang's idea to attempt to proactively respond
>>> > to a NETLINK_URELEASE on the audit_sock and reset it, but ran into a
>>> > locking error stack dump using mutex_lock(&audit_cmd_mutex) in the
>>> > notifier callback. Eliminating the lock since the sock is dead anyways
>>> > eliminates the error.
>>> >
>>> > Is it safe? I'll resubmit if this looks remotely sane. Meanwhile I'll
>>> > try to get the test case to compile.
>>>
>>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and
>>> 'audit_pid' are updated as a whole and race between audit_receive_msg()
>>> and NETLINK_URELEASE.
>>
>> This is what I expected and why I originally added the mutex lock in the
>> callback... The dumps I got were bare with no wrapper identifying the
>> process context or specific error, so I'm at a bit of a loss how to
>> solve this (without thinking more about it) other than instinctively
>> removing the mutex.
>
> Netlink notifier can safely be converted to blocking one, I will send
> a patch.
>
> But I seriously doubt you really need NETLINK_URELEASE here,
> it adds nothing but overhead, b/c the netlink notifier is called on
> every netlink socket in the system, but for net exit path, that is
> relatively a slow path.
>
> Also, kauditd_send_skb() needs audit_cmd_mutex too.

Please let me know what you think about the attached patch?

Thanks!
commit a12b43ee814625933ff155c20dc863c59cfcf240
Author: Cong Wang
Date:   Fri Dec 9 17:56:42 2016 -0800

    audit: close a race condition on audit_sock

    Signed-off-by: Cong Wang

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..ab947d8 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -423,6 +423,8 @@ static void kauditd_send_skb(struct sk_buff *skb)
 				snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
 				audit_log_lost(s);
 				audit_pid = 0;
+				audit_nlk_portid = 0;
+				sock_put(audit_sock);
 				audit_sock = NULL;
 			} else {
 				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
@@ -899,6 +901,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
 			audit_pid = new_pid;
 			audit_nlk_portid = NETLINK_CB(skb).portid;
+			sock_hold(skb->sk);
+			if (audit_sock)
+				sock_put(audit_sock);
 			audit_sock = skb->sk;
 		}
 		if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1167,10 +1172,6 @@ static void __net_exit audit_net_exit(struct net *net)
 {
 	struct audit_net *aunet = net_generic(net, audit_net_id);
 	struct sock *sock = aunet->nlsk;
-	if (sock == audit_sock) {
-		audit_pid = 0;
-		audit_sock = NULL;
-	}
 
 	RCU_INIT_POINTER(aunet->nlsk, NULL);
 	synchronize_net();
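
The hold-then-put ordering the patch applies to audit_sock can be illustrated outside the kernel. What follows is only a userspace sketch: struct ref_obj, obj_hold(), obj_put(), obj_replace() and current_obj are hypothetical stand-ins invented for this example, not kernel APIs, and the caller is assumed to hold whatever lock serializes updates (audit_cmd_mutex in the thread above).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only the reference count matters. */
struct ref_obj {
	int refcnt;
};

/* Illustrative helpers, loosely mirroring sock_hold()/sock_put(). */
static void obj_hold(struct ref_obj *o) { o->refcnt++; }
static void obj_put(struct ref_obj *o) { o->refcnt--; }

/* Shared pointer, analogous in spirit to audit_sock. */
static struct ref_obj *current_obj;

/*
 * Publish a new object: take a reference on the new object first, then
 * drop the reference held on the previous one. The stored pointer is
 * therefore always backed by a reference. The caller is assumed to hold
 * the lock that serializes all updates of current_obj.
 */
static void obj_replace(struct ref_obj *new_obj)
{
	if (new_obj)
		obj_hold(new_obj);
	if (current_obj)
		obj_put(current_obj);
	current_obj = new_obj;
}
```

Calling obj_replace(NULL) corresponds to the reset path in kauditd_send_skb(): the old reference is dropped and the pointer is cleared in one step.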
[PATCH] net: socket: removed an unnecessary newline
From: Amit Kushwaha

This patch removes a newline which was added in socket.c file in net-next

Signed-off-by: Amit Kushwaha
---
 net/socket.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index 5835383..dc01d7b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1,4 +1,3 @@
-
 /*
  * NET		An implementation of the SOCKET network access protocol.
  *
-- 
1.7.9.5
Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
On Fri, Dec 9, 2016 at 12:17 PM, Selvin Xavier wrote:
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.

Please use bnxt_re branch in this git repository.
https://github.com/Broadcom/linux-rdma-nxt.git

Thanks,
Selvin Xavier
[Patch net-next] netlink: use blocking notifier
netlink_chain is called in ->release(), which is apparently
a process context, so we don't have to use an atomic notifier
here.

Signed-off-by: Cong Wang
---
 net/netlink/af_netlink.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 246f29d..801d474 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -113,7 +113,7 @@ static atomic_t nl_table_users = ATOMIC_INIT(0);
 
 #define nl_deref_protected(X) rcu_dereference_protected(X, lockdep_is_held(&nl_table_lock));
 
-static ATOMIC_NOTIFIER_HEAD(netlink_chain);
+static BLOCKING_NOTIFIER_HEAD(netlink_chain);
 
 static DEFINE_SPINLOCK(netlink_tap_lock);
 static struct list_head netlink_tap_all __read_mostly;
@@ -711,7 +711,7 @@ static int netlink_release(struct socket *sock)
 			.protocol = sk->sk_protocol,
 			.portid = nlk->portid,
 		};
-		atomic_notifier_call_chain(&netlink_chain,
+		blocking_notifier_call_chain(&netlink_chain,
 				NETLINK_URELEASE, &n);
 	}
 
@@ -2504,13 +2504,13 @@ static const struct file_operations netlink_seq_fops = {
 
 int netlink_register_notifier(struct notifier_block *nb)
 {
-	return atomic_notifier_chain_register(&netlink_chain, nb);
+	return blocking_notifier_chain_register(&netlink_chain, nb);
 }
 EXPORT_SYMBOL(netlink_register_notifier);
 
 int netlink_unregister_notifier(struct notifier_block *nb)
 {
-	return atomic_notifier_chain_unregister(&netlink_chain, nb);
+	return blocking_notifier_chain_unregister(&netlink_chain, nb);
 }
 EXPORT_SYMBOL(netlink_unregister_notifier);
 
-- 
2.5.5
[Patch net-next] ipvs: remove an annoying printk in netns init
At most it is used for debugging purpose, but I don't think
it is even useful for debugging, just remove it.

Cc: Simon Horman
Signed-off-by: Cong Wang
---
 net/netfilter/ipvs/ip_vs_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 2c1b498..febc7f3 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2231,8 +2231,6 @@ static int __net_init __ip_vs_init(struct net *net)
 	if (ip_vs_sync_net_init(ipvs) < 0)
 		goto sync_fail;
 
-	printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
-			 sizeof(struct netns_ipvs), ipvs->gen);
 	return 0;
 /*
  * Error handling
-- 
2.5.5
[GIT] Networking
1) Limit the number of can filters to avoid > MAX_ORDER allocations.
   Fix from Marc Kleine-Budde.

2) Limit GSO max size in netvsc driver to avoid problems with NVGRE
   configurations. From Stephen Hemminger.

3) Return proper error when memory allocation fails in
   ser_gigaset_init(), from Dan Carpenter.

4) Missing linkage undo in error paths of ipvlan_link_new(), from
   Gao Feng.

5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers, from
   Florian Fainelli.

6) Handle probe deferral properly in smsc911x driver.

Please pull, thanks a lot!

The following changes since commit bc3913a5378cd0ddefd1dfec6917cc12eb23a946:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2016-12-06 09:24:11 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to d33695fbfab73a4a6550fa5c2d0bacc68d7c5901:

  net: mlx5: Fix Kconfig help text (2016-12-09 23:08:32 -0500)

Alex (1):
      drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links

Arjun V (1):
      cxgb4/cxgb4vf: Assign netdev->dev_port with port ID

Christopher Covington (1):
      net: mlx5: Fix Kconfig help text

Dan Carpenter (1):
      ser_gigaset: return -ENOMEM on error instead of success

Daniele Palmas (1):
      NET: usb: cdc_mbim: add quirk for supporting Telit LE922A

David S. Miller (3):
      Merge tag 'linux-can-fixes-for-4.9-20161207' of git://git.kernel.org/.../mkl/linux-can
      Merge tag 'linux-can-fixes-for-4.9-20161208' of git://git.kernel.org/.../mkl/linux-can
      Merge branch 'ethernet-missing-netdev-parent'

Florian Fainelli (3):
      phy: Don't increment MDIO bus refcount unless it's a different owner
      net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
      net: ethernet: cpmac: Call SET_NETDEV_DEV()

Gao Feng (1):
      driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed

Linus Walleij (1):
      net: smsc911x: back out silently on probe deferrals

Marc Kleine-Budde (1):
      can: raw: raw_setsockopt: limit number of can_filter that can be set

Peng Tao (1):
      vhost-vsock: fix orphan connection reset

Thomas Falcon (1):
      ibmveth: set correct gso_size and gso_type

stephen hemminger (1):
      netvsc: reduce maximum GSO size

추지호 (1):
      can: peak: fix bad memory access and free sequence

 drivers/isdn/gigaset/ser-gigaset.c                  |  4 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_core.c        |  6 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     |  1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c          |  1 -
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c |  1 +
 drivers/net/ethernet/ibm/ibmveth.c                  | 65 +++--
 drivers/net/ethernet/ibm/ibmveth.h                  |  1 +
 drivers/net/ethernet/lantiq_etop.c                  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig     |  2 --
 drivers/net/ethernet/smsc/smsc911x.c                |  9 -
 drivers/net/ethernet/ti/cpmac.c                     |  1 +
 drivers/net/ethernet/ti/cpsw-phy-sel.c              |  1 +
 drivers/net/hyperv/netvsc_drv.c                     |  5 +
 drivers/net/ipvlan/ipvlan_main.c                    |  4 +++-
 drivers/net/phy/phy_device.c                        | 16 +---
 drivers/net/usb/cdc_mbim.c                          | 21 +
 drivers/net/usb/cdc_ncm.c                           | 14 +-
 drivers/vhost/vsock.c                               |  2 +-
 include/linux/usb/cdc_ncm.h                         |  3 ++-
 include/uapi/linux/can.h                            |  1 +
 net/can/raw.c                                       |  3 +++
 21 files changed, 142 insertions(+), 20 deletions(-)
Re: Soft lockup in inet_put_port on 4.6
On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>
> Hmm... Is your ephemeral port range includes the port your load
> balancing app is using ?

I suspect that you might have processes doing bind( port = 0) that are
trapped into the bind_conflict() scan ?

With 100,000 + timewaits there, this possibly hurts.

Can you try the following loop breaker ?

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d5d3ead0a6c31e42e8843d30f8c643324a91b8e9..74f0f5ee6a02c624edb0263b9ddd27813f68d0a5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -51,7 +51,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
-
+	unsigned int max_count;
 	/*
 	 * Unlike other sk lookup places we do not check
 	 * for sk_net here, since _all_ the socks listed
@@ -59,6 +59,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	 * one this bucket belongs to.
 	 */
 
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    !inet_v6_ipv6only(sk2) &&
@@ -84,6 +85,8 @@ int inet_csk_bind_conflict(const struct sock *sk,
 					break;
 			}
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 	return sk2 != NULL;
 }
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 1c86c478f578b49373e61a4c397f23f3dc7f3fc6..4f63d06e0d601da94eb3f2b35a988abd060e156c 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -35,12 +35,14 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
+	unsigned int max_count;
 
 	/* We must walk the whole port owner list in this case. -DaveM */
 	/*
 	 * See comment in inet_csk_bind_conflict about sock lookup
 	 * vs net namespaces issues.
 	 */
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    (!sk->sk_bound_dev_if ||
@@ -61,6 +63,8 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 			    ipv6_rcv_saddr_equal(sk, sk2, true))
 				break;
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 
 	return sk2 != NULL;
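
The idea behind the loop breaker can be shown in isolation. This is a minimal userspace sketch, assuming a plain singly linked list in place of the bind bucket owner list; struct node and scan_conflict() are hypothetical names for illustration only. As in the patch, an exhausted budget is reported as a conflict, so the caller moves on to another port instead of walking a very long list.

```c
#include <stddef.h>

/* Hypothetical list element standing in for a socket on the owners list. */
struct node {
	int key;
	struct node *next;
};

/*
 * Walk the list looking for a conflicting key, but give up after visiting
 * max_count entries. Returns 1 when a conflict is found or the budget runs
 * out (treated conservatively as a conflict), 0 when the whole list was
 * scanned without a match. Passing ~0U makes the scan effectively unbounded.
 */
static int scan_conflict(const struct node *head, int key,
			 unsigned int max_count)
{
	const struct node *n;

	for (n = head; n != NULL; n = n->next) {
		if (n->key == key)
			return 1;
		if (--max_count == 0)
			return 1;
	}
	return 0;
}
```

Note the trade-off: with a finite budget the function can report a conflict that does not exist, which is acceptable only when the caller can simply try a different candidate.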
Re: netlink: GPF in sock_sndtimeo
On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs wrote:
> On 2016-12-08 22:57, Cong Wang wrote:
>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs wrote:
>> > I also tried to extend Cong Wang's idea to attempt to proactively respond
>> > to a NETLINK_URELEASE on the audit_sock and reset it, but ran into a
>> > locking error stack dump using mutex_lock(&audit_cmd_mutex) in the
>> > notifier callback. Eliminating the lock since the sock is dead anyways
>> > eliminates the error.
>> >
>> > Is it safe? I'll resubmit if this looks remotely sane. Meanwhile I'll
>> > try to get the test case to compile.
>>
>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and
>> 'audit_pid' are updated as a whole and race between audit_receive_msg()
>> and NETLINK_URELEASE.
>
> This is what I expected and why I originally added the mutex lock in the
> callback... The dumps I got were bare with no wrapper identifying the
> process context or specific error, so I'm at a bit of a loss how to
> solve this (without thinking more about it) other than instinctively
> removing the mutex.

Netlink notifier can safely be converted to blocking one, I will send
a patch.

But I seriously doubt you really need NETLINK_URELEASE here,
it adds nothing but overhead, b/c the netlink notifier is called on
every netlink socket in the system, but for net exit path, that is
relatively a slow path.

Also, kauditd_send_skb() needs audit_cmd_mutex too.

I will send a formal patch.

Thanks.
Re: [PATCH] net: mlx5: Fix Kconfig help text
From: Christopher Covington
Date: Fri, 9 Dec 2016 16:53:05 -0500

> Since the following commit, Infiniband and Ethernet have not been
> mutually exclusive.
>
> Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet
>
> Signed-off-by: Christopher Covington

Applied.
Re: [PATCH net-next] net: skb_condense() can also deal with empty skbs
From: Eric Dumazet
Date: Fri, 09 Dec 2016 08:02:05 -0800

> From: Eric Dumazet
>
> It seems attackers can also send UDP packets with no payload at all.
>
> skb_condense() can still be a win in this case.
>
> It will be possible to replace the custom code in tcp_add_backlog()
> to get full benefit from skb_condense()
>
> Signed-off-by: Eric Dumazet

Applied.
Re: [PATCH] i40e: don't truncate match_method assignment
From: Jacob Keller
Date: Fri, 9 Dec 2016 13:39:21 -0800

> The .match_method field is a u8, so we shouldn't be casting to a u16,
> and because it is only one byte, we do not need to byte swap anything.
> Just assign the value directly. This avoids issues on Big Endian
> architectures which would have byte swapped and then incorrectly
> truncated the value.
>
> Signed-off-by: Jacob Keller
> Cc: Stephen Rothwell
> Cc: Bimmy Pujari
> ---
> Not sure if this was already in Jeff's queue, but since it's an obvious
> fix for the issue found by Stephen, I thought I'd send it out now just
> to make sure. Thanks for catching this, and sorry we didn't find the fix
> earlier.

Jeff, what do you want me to do with this?
Re: [PATCH] net: smsc911x: back out silently on probe deferrals
From: Linus Walleij
Date: Fri, 9 Dec 2016 14:18:00 +0100

> When trying to get a regulator we may get deferred and we see
> this noise:
>
> smsc911x 1b80.ethernet-ebi2 (unnamed net_device) (uninitialized):
>    couldn't get regulators -517
>
> Then the driver continues anyway. Which means that the regulator
> may not be properly retrieved and reference counted, and may be
> switched off in case no one else is using it.
>
> Fix this by returning silently on deferred probe and let the
> system work it out.
>
> Cc: Jeremy Linton
> Signed-off-by: Linus Walleij

Looks good, applied, thanks.
Re: pull-request: mac80211-next 2016-12-09
From: Johannes Berg
Date: Fri, 9 Dec 2016 13:00:13 +0100

> Closing net-next caught me by surprise, so I had to rebase a bit,
> but these three patches really should go in soon. I'm not sending
> them for 4.9 this late though.
>
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.
Re: [PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta
Date: Fri, 9 Dec 2016 10:05:46 +

> There are hardware PCI implementations of Cadence GEM network controller.
> This patch will allow to use such hardware with reuse of existing Platform
> Driver.

Please properly format your commit message text to 80 columns.

>
> Signed-off-by: Bartosz Folta
> ---
>  drivers/net/ethernet/cadence/Kconfig    |   9 ++
>  drivers/net/ethernet/cadence/Makefile   |   1 +
>  drivers/net/ethernet/cadence/macb.c     |  31 +--
>  drivers/net/ethernet/cadence/macb_pci.c | 152
>  include/linux/platform_data/macb.h      |   6 ++
>  5 files changed, 194 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

This patch doesn't apply to net-next, please respin.
Re: [PATCH] ibmveth: set correct gso_size and gso_type
From: Thomas Falcon
Date: Thu, 8 Dec 2016 16:40:03 -0600

> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
>
> "We recently encountered a bug where a few customers using ibmveth on the
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the
> one side was advertising a zero window repeatedly.
>
> We narrowed this down to the fact the ibmveth driver did not set gso_size
> which is translated by TCP into the MSS later up the stack. The MSS is
> used to calculate the TCP window size and as that was abnormally large,
> it was calculating a zero window, even although the sockets receive buffer
> was completely empty."
>
> We rely on the Virtual I/O Server partition in a pseries
> environment to provide the MSS through the TCP header checksum
> field. The stipulation is that users should not disable checksum
> offloading if rx packet aggregation is enabled through VIOS.
>
> Some firmware offerings provide the MSS in the RX buffer.
> This is signalled by a bit in the RX queue descriptor.
>
> Reviewed-by: Brian King
> Reviewed-by: Pradeep Satyanarayana
> Reviewed-by: Marcelo Ricardo Leitner
> Reviewed-by: Jonathan Maxwell
> Reviewed-by: David Dai
> Signed-off-by: Thomas Falcon

Applied, although mis-using the TCP checksum field for this is kind
of bogus.

I'm surprised there wasn't some other place you could stick this
value, which wouldn't modify the packet contents.
Re: Soft lockup in inet_put_port on 4.6
On Fri, 2016-12-09 at 20:59 -0500, Josef Bacik wrote:
> On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote:
> >
> >> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote:
> >>
> >>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
> >>>
> >>> We can reproduce the problem at will, still trying to run down the
> >>> problem. I'll try and find one of the boxes that dumped a core
> >>> and get a bt of everybody. Thanks,
> >>
> >> OK, sounds good.
> >>
> >> I had a look and :
> >> - could not spot a fix that came after 4.6.
> >> - could not spot an obvious bug.
> >>
> >> Anything special in the program triggering the issue ?
> >> SO_REUSEPORT and/or special socket options ?
> >>
> >
> > So they recently started using SO_REUSEPORT, that's what triggered
> > it, if they don't use it then everything is fine.
> >
> > I added some instrumentation for get_port to see if it was looping in
> > there and none of my printk's triggered. The softlockup messages are
> > always on the inet_bind_bucket lock, sometimes in the process context
> > in get_port or in the softirq context either through inet_put_port or
> > inet_kill_twsk. On the box that I have a coredump for there's only
> > one processor in the inet code so I'm not sure what to make of that.
> > That was a box from last week so I'll look at a more recent core and
> > see if it's different. Thanks,
>
> Ok more investigation today, a few bullet points
>
> - With all the debugging turned on the boxes seem to recover after
> about a minute. I'd get the spam of the soft lockup messages all on
> the inet_bind_bucket, and then the box would be fine.
> - I looked at a core I had from before I started investigating things
> and there's only one process trying to get the inet_bind_bucket of all
> the 48 cpus.
> - I noticed that there was over 100k twsk's in that original core.
> - I put a global counter of the twsk's (since most of the softlockup
> messages have the twsk timers in the stack) and noticed with the
> debugging kernel it started around 16k twsk's and once it recovered it
> was down to less than a thousand. There's a jump where it goes from 8k
> to 2k and then there's only one more softlockup message and the box is
> fine.
> - This happens when we restart the service with the config option to
> start using SO_REUSEPORT.
>
> The application is our load balancing app, so obviously has lots of
> connections opened at any given time. What I'm wondering and will test
> on Monday is if the SO_REUSEPORT change even matters, or if simply
> restarting the service is what triggers the problem. One thing I
> forgot to mention is that it's also using TCP_FASTOPEN in both the
> non-reuseport and reuseport variants.
>
> What I suspect is happening is the service stops, all of the sockets it
> had open go into TIMEWAIT with relatively the same timer period, and
> then suddenly all wake up at the same time which coupled with the
> massive amount of traffic that we see per box anyway results in so much
> contention and ksoftirqd usage that the box livelocks for a while.
> With the lock debugging and stuff turned on we aren't able to service
> as much traffic so it recovers relatively quickly, whereas a normal
> production kernel never recovers.
>
> Please keep in mind that I'm a file system developer so my conclusions
> may be completely insane, any guidance would be welcome. I'll continue
> hammering on this on Monday. Thanks,

Hmm... Is your ephemeral port range includes the port your load
balancing app is using ?
Re: [PATCH net v2] ibmveth: set correct gso_size and gso_type
On Fri, 2016-12-09 at 19:31 -0600, Thomas Falcon wrote:
> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
>
> +			DIV_ROUND_UP(skb->len - hdr_len, mss);
> +	} else if (offset) {
> +		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
> +		skb_shinfo(skb)->gso_segs =
> +			DIV_ROUND_UP(skb->len - hdr_len,
> +				     skb_shinfo(skb)->gso_size);
> +		tcph->check = 0;
> +	}

Are you sure that tcph->check could never be 0 on some cases ?

That would crash on a divide by 0
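
To make the concern concrete, here is a small userspace sketch of the segment count computation with a guard for a zero divisor. It is only an illustration, not the driver's code: safe_gso_segs() is a hypothetical helper, the DIV_ROUND_UP macro is redefined locally in the same shape as the kernel's, and returning 0 for "no usable MSS" is an assumption about how a caller might want to handle the case.

```c
/* Local copy of the usual round-up division macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of MSS-sized segments needed for the payload of a packet of
 * total length len with hdr_len bytes of headers. Returns 0 when mss is
 * 0 (for example a zero value read from the checksum field) or when there
 * is no payload, instead of dividing by zero.
 */
static unsigned int safe_gso_segs(unsigned int len, unsigned int hdr_len,
				  unsigned int mss)
{
	if (mss == 0 || len <= hdr_len)
		return 0;
	return DIV_ROUND_UP(len - hdr_len, mss);
}
```

A caller seeing 0 could then skip setting the GSO fields for that packet rather than trusting the value.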
Re: [PATCH v3 net-next 0/4] udp: receive path optimizations
From: Eric Dumazet
Date: Thu, 8 Dec 2016 11:41:53 -0800

> This patch series provides about 100 % performance increase under flood.
>
> v2: added Paolo feedback on udp_rmem_release() for tiny sk_rcvbuf
>     added the last patch touching sk_rmem_alloc later

Series applied, thanks.
Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
Hi,

On 09.12.2016 12:21, Pavel Machek wrote:
> On Fri 2016-12-09 00:19:43, Francois Romieu wrote:
>> Lino Sanfilippo:
>> [...]
>> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
>> > this is caused by that locking scheme (in a way I have not figured out yet)
>> > or if it is a different issue.
>>
>> stmmac_tx_err races with stmmac_xmit.
>
> Umm, yes, that looks real.
>
> And that means that removing tx_lock will not be completely trivial
> :-(. Lino, any ideas there?
>

Ok, the race is there but it looks like a problem that is not related to
the use or removal of the private lock.

From a glimpse into other drivers (e.g. sky2 or e1000), a possible way to
handle a tx error is to start a separate task and restart the tx path in
that task instead of in the irq handler (or timer in case of the watchdog).

In that task we could do:

1. deactivate napi
2. deactivate irqs
3. wait for running napi/irqs to complete (_sync)
4. call stmmac_tx_err()
5. reenable napi
6. reenable irqs

We have to ensure that no xmit() is executing while stmmac_tx_err() does
the cleanup, so stmmac_tx_err() should IMO rather call netif_tx_disable()
instead of netif_stop_queue() (the former grabs the xmit lock before it
sets __QUEUE_STATE_DRV_XOFF to disable the queue).

Regards,
Lino
Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII
On 2016/12/10 0:39, Andrew Lunn wrote:
> On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
>>
>> On 2016/12/9 6:15, Florian Fainelli wrote:
>>> On 12/06/2016 07:57 PM, Jie Deng wrote:
>>>> This patch adds phy-mode support for Synopsys XLGMAC
>>> The functional changes look good, but I would like to see some
>>> description of what the XL part stands for here.
>>>
>>> While you are modifying this, do you also mind submitting a Device Tree
>>> specification change:
>>>
>>> https://www.devicetree.org/specifications/
>>>
>>> Thanks!
>> Thank you for the information.
>>
>> Currently, the XLGMAC is a new IP from Synopsys.
> I think Florian wants to know about the IEEE standard or what ever
> which defines what the phy-mode XLGMAC is, in the same way there are
> standards for RGMII, SGMII, etc.
>
> Andrew

Understood! Thank you !
Re: Synopsys Ethernet QoS
On 2016/12/10 8:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli wrote:
>
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
>> actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
> Wait, you would like to tell that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them earlier, than have n+ copies.
>
> P.S. Though, I don't see how sxgbe got in the list. First glance on
> the code doesn't show similarities.

A glance at sxgbe_reg.h suggests the registers come from the Synopsys
XGMAC IP... Probably amd-xgbe and sxgbe target the same IP.
Re: [PATCH 0/2 v3] net: qcom/emac: simplify support for different SOCs
From: Timur Tabi
Date: Thu, 8 Dec 2016 13:24:19 -0600

> On SOCs that have the Qualcomm EMAC network controller, the internal
> PHY block is always different. Sometimes the differences are small,
> sometimes it might be a completely different IP. Either way, using version
> numbers to differentiate them and putting all of the init code in one
> file does not scale.
>
> This patchset does two things: The first breaks up the current code into
> different files, and the second patch adds support for a third SOC, the
> Qualcomm Technologies QDF2400 ARM Server SOC.

Series applied.
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote:
>
>> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote:
>>
>>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>>
>>> We can reproduce the problem at will, still trying to run down the
>>> problem. I'll try and find one of the boxes that dumped a core
>>> and get a bt of everybody. Thanks,
>>
>> OK, sounds good.
>>
>> I had a look and :
>> - could not spot a fix that came after 4.6.
>> - could not spot an obvious bug.
>>
>> Anything special in the program triggering the issue ?
>> SO_REUSEPORT and/or special socket options ?
>>
>
> So they recently started using SO_REUSEPORT, that's what triggered
> it, if they don't use it then everything is fine.
>
> I added some instrumentation for get_port to see if it was looping in
> there and none of my printk's triggered. The softlockup messages are
> always on the inet_bind_bucket lock, sometimes in the process context
> in get_port or in the softirq context either through inet_put_port or
> inet_kill_twsk. On the box that I have a coredump for there's only
> one processor in the inet code so I'm not sure what to make of that.
> That was a box from last week so I'll look at a more recent core and
> see if it's different. Thanks,

Ok more investigation today, a few bullet points

- With all the debugging turned on the boxes seem to recover after
about a minute. I'd get the spam of the soft lockup messages all on
the inet_bind_bucket, and then the box would be fine.
- I looked at a core I had from before I started investigating things
and there's only one process trying to get the inet_bind_bucket of all
the 48 cpus.
- I noticed that there was over 100k twsk's in that original core.
- I put a global counter of the twsk's (since most of the softlockup
messages have the twsk timers in the stack) and noticed with the
debugging kernel it started around 16k twsk's and once it recovered it
was down to less than a thousand. There's a jump where it goes from 8k
to 2k and then there's only one more softlockup message and the box is
fine.
- This happens when we restart the service with the config option to
start using SO_REUSEPORT.

The application is our load balancing app, so obviously has lots of
connections opened at any given time. What I'm wondering and will test
on Monday is if the SO_REUSEPORT change even matters, or if simply
restarting the service is what triggers the problem. One thing I
forgot to mention is that it's also using TCP_FASTOPEN in both the
non-reuseport and reuseport variants.

What I suspect is happening is the service stops, all of the sockets it
had open go into TIMEWAIT with relatively the same timer period, and
then suddenly all wake up at the same time which coupled with the
massive amount of traffic that we see per box anyway results in so much
contention and ksoftirqd usage that the box livelocks for a while.
With the lock debugging and stuff turned on we aren't able to service
as much traffic so it recovers relatively quickly, whereas a normal
production kernel never recovers.

Please keep in mind that I'm a file system developer so my conclusions
may be completely insane, any guidance would be welcome. I'll continue
hammering on this on Monday. Thanks,

Josef
Re: Synopsys Ethernet QoS
On 12/09/16 at 16:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli wrote:
>
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
>> actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
>
> Wait, you would like to tell that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them earlier, than have n+ copies.

Unfortunately that is the case, see this email:

https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html

dwc_eth_qos and stmmac have some overlap. There seems to be work
underway to unify these two to begin with.

>
> P.S. Though, I don't see how sxgbe got in the list. First glance on
> the code doesn't show similarities.

Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
just my cursory look at the code, it may very well be something entirely
different. The descriptor formats just look suspiciously similar.
-- 
Florian
[PATCH net v2] ibmveth: set correct gso_size and gso_type
This patch is based on an earlier one submitted by Jon Maxwell with the following commit message: "We recently encountered a bug where a few customers using ibmveth on the same LPAR hit an issue where a TCP session hung when large receive was enabled. Closer analysis revealed that the session was stuck because the one side was advertising a zero window repeatedly. We narrowed this down to the fact the ibmveth driver did not set gso_size which is translated by TCP into the MSS later up the stack. The MSS is used to calculate the TCP window size and as that was abnormally large, it was calculating a zero window, even although the sockets receive buffer was completely empty." We rely on the Virtual I/O Server partition in a pseries environment to provide the MSS through the TCP header checksum field. The stipulation is that users should not disable checksum offloading if rx packet aggregation is enabled through VIOS. Some firmware offerings provide the MSS in the RX buffer. This is signalled by a bit in the RX queue descriptor. 
Reviewed-by: Brian KingReviewed-by: Pradeep Satyanarayana Reviewed-by: Marcelo Ricardo Leitner Reviewed-by: Jonathan Maxwell Reviewed-by: David Dai Signed-off-by: Thomas Falcon --- v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch and make sure everyone is included on CC --- drivers/net/ethernet/ibm/ibmveth.c | 72 -- drivers/net/ethernet/ibm/ibmveth.h | 1 + 2 files changed, 71 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index ebe6071..f0c3ae7 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -58,7 +58,7 @@ static const char ibmveth_driver_name[] = "ibmveth"; static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver"; -#define ibmveth_driver_version "1.05" +#define ibmveth_driver_version "1.06" MODULE_AUTHOR("Santiago Leon "); MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver"); @@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter) return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK; } +static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter) +{ + return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT; +} + static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter) { return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length); @@ -1174,6 +1179,52 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb, goto retry_bounce; } +static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt) +{ + struct tcphdr *tcph; + int offset = 0; + int hdr_len; + + /* only TCP packets will be aggregated */ + if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph = (struct iphdr *)skb->data; + + if (iph->protocol == IPPROTO_TCP) { + offset = iph->ihl * 4; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } else { + return; + } + } else if (skb->protocol == htons(ETH_P_IPV6)) { + struct ipv6hdr *iph6 = 
(struct ipv6hdr *)skb->data; + + if (iph6->nexthdr == IPPROTO_TCP) { + offset = sizeof(struct ipv6hdr); + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6; + } else { + return; + } + } else { + return; + } + /* if mss is not set through Large Packet bit/mss in rx buffer, +* expect that the mss will be written to the tcp header checksum. +*/ + tcph = (struct tcphdr *)(skb->data + offset); + hdr_len = offset + tcph->doff * 4; + if (lrg_pkt) { + skb_shinfo(skb)->gso_size = mss; + skb_shinfo(skb)->gso_segs = + DIV_ROUND_UP(skb->len - hdr_len, mss); + } else if (offset) { + skb_shinfo(skb)->gso_size = ntohs(tcph->check); + skb_shinfo(skb)->gso_segs = + DIV_ROUND_UP(skb->len - hdr_len, +skb_shinfo(skb)->gso_size); + tcph->check = 0; + } +} + static int ibmveth_poll(struct napi_struct *napi, int budget) { struct ibmveth_adapter *adapter = @@ -1182,6 +1233,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) int frames_processed = 0; unsigned long lpar_rc; struct iphdr *iph; + u16 mss = 0; restart_poll: while (frames_processed < budget) { @@ -1199,9 +1251,21 @@
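The gso_segs arithmetic in ibmveth_rx_mss_helper() above can be sketched in isolation. This is a minimal userspace illustration, not driver code: the helper name rx_gso_segs() and the zero-MSS guard are assumptions added for the example, and DIV_ROUND_UP is redefined locally to mirror the kernel macro.

```c
#include <stdint.h>

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper (not part of the patch): number of segments an
 * aggregated frame splits into, given the total length, the combined
 * IP + TCP header length, and the MSS.
 */
static inline uint16_t rx_gso_segs(uint32_t skb_len, uint32_t hdr_len,
				   uint16_t mss)
{
	/* Guard against a zero MSS or a frame with no payload. */
	if (mss == 0 || skb_len <= hdr_len)
		return 0;
	return (uint16_t)DIV_ROUND_UP(skb_len - hdr_len, mss);
}
```

For instance, a 14520-byte frame with 40 bytes of headers and an MSS of 1448 carries exactly ten segments of payload; one extra payload byte rounds up to eleven.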
Re: Synopsys Ethernet QoS
On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli wrote:
> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
> actually pioneer the upstreaming effort, but it is good to see people
> from Synopsys willing to fix that in the future.

Wait, are you saying that we have more than 2 drivers for the same (okay, same vendor) IP?! It's better to unify them early than to have n+ copies.

P.S. Though, I don't see how sxgbe got into the list. A first glance at the code doesn't show similarities.

-- 
With Best Regards,
Andy Shevchenko
fib_frontend: Add network specific broadcasts, when it takes a sense
Hello-

A number of us are working on an OSS overlay network system called flannel. It is used in a variety of Linux container systems and one of the backends is VXLAN.

The issue we have: when creating the VXLAN interface and assigning it an address we see a broadcast route being added by the Kernel. For example if we have 10.4.0.0/16 a broadcast route to 10.4.0.0 is created. This route is unwanted because we assign 10.4.0.0 to one of our VXLAN interfaces.

However, the Kernel interface bring-up comment reads:

    Add network specific broadcasts, when it takes a sense.

The code is here: https://github.com/torvalds/linux/blob/master/net/ipv4/fib_frontend.c#L859-L872

Can someone explain why creation of the broadcast route is non-optional? Would a patch to make it optional be acceptable? Is it safe for us to simply delete the route?

We have a patch that simply deletes the broadcast route after interface creation but don't know why the Kernel code "makes sense".

You can read more information about the issue here: https://github.com/coreos/flannel/pull/569

Thank You,
Brandon
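To make the 10.4.0.0/16 example concrete, here is a rough sketch of how the two "network specific" broadcast addresses of a prefix can be derived. It assumes the kernel adds a broadcast route for both the all-zeros and the all-ones host address of the prefix, which matches the 10.4.0.0 route described above but should be checked against the linked code. The function names are made up for this illustration and addresses are in host byte order.

```c
#include <stdint.h>

/* Netmask for a prefix length, host byte order; plen must be 0..32. */
static uint32_t prefix_mask(unsigned int plen)
{
	return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* All-zeros host part: the network address itself (10.4.0.0 for a /16). */
static uint32_t net_broadcast_zeros(uint32_t addr, unsigned int plen)
{
	return addr & prefix_mask(plen);
}

/* All-ones host part: the directed broadcast (10.4.255.255 for a /16). */
static uint32_t net_broadcast_ones(uint32_t addr, unsigned int plen)
{
	return addr | ~prefix_mask(plen);
}
```

With an interface address of 10.4.0.1/16 (0x0A040001), the first helper yields 10.4.0.0, which is exactly the address flannel wants to assign to a VXLAN interface.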
Re: [PATCH V2 03/22] bnxt_re: register with the NIC driver
On 12/09/2016 01:47 AM, Selvin Xavier wrote: > This patch handles the registration with bnxt_en driver. The driver registers > with netdev notifier chain. Upon receiving NETDEV_REGISTER event, the driver > in turn registers with bnxt_en driver. > 1. bnxt_en's ulp_probe function returns a structure that contains > information > about the device and additional entry points. > 2. bnxt_en driver returns 'struct bnxt_eth_dev' that contains set of > operation > vectors that RocE driver invokes later. > 3. bnxt_request_msix() allows the RoCE driver to specify the number of > MSI-X > vectors that are needed. > 4. bnxt_send_fw_msg () can be used to send messages to the FW > 5. bnxt_register_async_events() can be used to register for async event > callbacks. > > v2: Remove some sparse warning. Also, remove some unused code from unreg path. > > Signed-off-by: Eddie Wai> Signed-off-by: Devesh Sharma > Signed-off-by: Somnath Kotur > Signed-off-by: Sriharsha Basavapatna > Signed-off-by: Selvin Xavier > --- > drivers/infiniband/hw/bnxtre/bnxt_re.h | 48 +++ > drivers/infiniband/hw/bnxtre/bnxt_re_main.c | 436 > > 2 files changed, 484 insertions(+) > [...] 
> #endif
> diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> index ebe1c69..029824a 100644
> --- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +
> +static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
> +{
> +	int i, j, rc;
> +
> +	/* Registered a new RoCE device instance to netdev */
> +	rc = bnxt_re_register_netdev(rdev);
> +	if (rc) {
> +		pr_err("Failed to register with netedev: %#x\n", rc);
> +		return -EINVAL;
> +	}
> +	set_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, >flags);
> +
> +	rc = bnxt_re_request_msix(rdev);
> +	if (rc) {
> +		pr_err("Failed to get MSI-X vectors: %#x\n", rc);
> +		rc = -EINVAL;
> +		goto fail;
> +	}
> +	set_bit(BNXT_RE_FLAG_GOT_MSIX, >flags);

Though this exit path looks correct (need to verify) once all patches are applied, this looks incorrect if only considering this specific patch. I think you need the following:

+	return 0;

> +
> +fail:
> +	bnxt_re_ib_unreg(rdev, true);
> +	return rc;
> +}
> +
Re: [PATCH 2/6] net: ethernet: ti: cpts: add support for ext rftclk selection
On 12/08/2016 06:47 PM, Stephen Boyd wrote: > On 12/06, Grygorii Strashko wrote: >> Subject: [PATCH] cpts refclk sel >> >> Signed-off-by: Grygorii Strashko>> --- >> arch/arm/boot/dts/keystone-k2e-netcp.dtsi | 10 +- >> drivers/net/ethernet/ti/cpts.c| 52 >> ++- >> 2 files changed, 60 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi >> b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi >> index 919e655..b27aa22 100644 >> --- a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi >> +++ b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi >> @@ -138,7 +138,7 @@ netcp: netcp@2400 { >> /* NetCP address range */ >> ranges = <0 0x2400 0x100>; >> >> -clocks = <>, <>, <>; >> +clocks = <>, <>, <_mux>; ^^ mux clock used here >> clock-names = "pa_clk", "ethss_clk", "cpts"; >> dma-coherent; >> >> @@ -162,6 +162,14 @@ netcp: netcp@2400 { >> cpts-ext-ts-inputs = <6>; >> cpts-ts-comp-length; >> >> +cpts_mux: cpts_refclk_mux { >> +#clock-cells = <0>; >> +clocks = <>, <>; >> +cpts-mux-tbl = <0>, <1>; >> +assigned-clocks = <_mux>; >> +assigned-clock-parents = <>; > > Is there a binding update? this was pure RFC-DEV patch just to check the possibility of modeling CPTS_RFTCLK_SEL register as mux clock. Original patch: https://lkml.org/lkml/2016/11/28/780 I've plan to resend it using clk framework. Why the subnode? Sry, I did not get this question - is there another way to pas phandle on clock in clocks list property? Am I missing smth.? Sry, this is my first clock :) > Why not have it as part of the netcp node? cpts is part of gbe ethss, which is part of netcp. Only netcp is modeled as DD - cpts and gbe ethss implemented without using DD model, so generic resources acquired by netcp and then passed to cpts and gbe ethss. CPTS has register to control an external multiplexer that selects one of up to 32 clocks for time sync reference (RFTCLK) > Does the cpts-mux-tbl property change? 
On Keystone 2 66AK2e (as example) the following list of clocks can be selected as ref clocks (list is different for other SoCs): = SYSCLK2 0001 = SYSCLK3 0010 = TIMI0 0011 = TIMI1 0100 = TSIPCLKA 1000 = TSREFCLK 1100 = TSIPCLKB Others = Reserved and only 0 and 1 are internal, other external and board specific (parameters unknown and corresponding inputs can be used for other purposes), so I can't define all parent clocks, only internal: clocks = <>, <>; cpts-mux-tbl = <0>, <1>; to use another, external, clock - it should be explicitly defined in board file the board file timi1clk: timi1clk { #clock-cells = <0>; compatible = "fixed-clock"; ... _mux { clocks = <>, <>, ; ^^^ i can't predict value here cpts-mux-tbl = <0>, <1>, <3>; ^^i can't predict value here assigned-clocks = <_mux>; assigned-clock-parents = <>; }; or I understood your question wrongly? > >> +}; >> + >> interfaces { >> gbe0: interface-0 { >> slave-port = <0>; >> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c >> index 938de22..ef94316 100644 >> --- a/drivers/net/ethernet/ti/cpts.c >> +++ b/drivers/net/ethernet/ti/cpts.c >> @@ -17,6 +17,7 @@ >> * along with this program; if not, write to the Free Software >> * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >> */ >> +#include >> #include >> #include >> #include >> @@ -672,6 +673,7 @@ int cpts_register(struct cpts *cpts) >> cpts->phc_index = ptp_clock_index(cpts->clock); >> >> schedule_delayed_work(>overflow_work, cpts->ov_check_period); >> + > > Maybe in another patch. > sure >> return 0; >> >> err_ptp: >> @@ -741,6 +743,54 @@ static void cpts_calc_mult_shift(struct cpts *cpts) >> freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC)); >> } >> ... 
>> +
>> +	reg = >reg->rftclk_sel;
>> +
>> +	clk = clk_register_mux_table(cpts->dev, refclk_np->name,
>> +				     parent_names, num_parents,
>> +				     0, reg, 0, 0x1F, 0, mux_table, NULL);
>> +	if (IS_ERR(clk))
>> +		return PTR_ERR(clk);
>> +
>> +	return of_clk_add_provider(refclk_np, of_clk_src_simple_get, clk);
>
> Can you please use the clk_hw APIs instead?
>

ok

-- 
regards,
-grygorii
Re: Synopsys Ethernet QoS
On 12/09/2016 02:25 PM, Andy Shevchenko wrote:
> On Fri, Dec 9, 2016 at 5:41 PM, David Miller wrote:
>
>> But one thing I am against is changing the driver name for existing
>> users. If an existing chip is supported by the stmmac driver for
>> existing users, they should still continue to use the "stmmac" driver.
>>
>> Therefore, if consolidation changes the driver module name for
>> existing users, then that is not a good plan at all.
>
> You have at least one supporter here. Though I jumped in to the
> discussion very late, not sure if everyone have time to answer to
> that.

I don't have many stakes in the stmmac driver (or other Synopsys drivers for that matter), but renaming seems like a terrible idea that is going to make backporting of fixes difficult for distribution. While moving the driver into a separate directory could be done, and git knows how to track files, renaming the driver entirely would break many platforms (including but not limited to, Device Tree) that you may not have visibility over (compatible strings, properties, and platform device driver name for instance).

It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did actually pioneer the upstreaming effort, but it is good to see people from Synopsys willing to fix that in the future.

-- 
Florian
Re: [PATCH] net:ethernet:samsung:initialize cur_rx_qnum
Rayagond Kokatanur:
> This patch initialize the cur_rx_qnum upon occurence of rx interrupt,
> without this initialization driver will not work with multiple rx queues
> configurations.
>
> NOTE: This patch is not tested on actual hw.

(your patch should include a Signed-off-by)

Imho the driver needs more changes to support multiple rx queues.

- rx interrupt for queue A -> priv->cur_rx_qnum = A
- rx interrupt for queue B -> priv->cur_rx_qnum = B
- rx napi processing -> Err...

Please start turning priv->cur_rx_qnum into a SXGBE_RX_QUEUES sized bitmap.

-- 
Ueimor
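The bitmap idea can be illustrated with a small userspace sketch. This is only an assumption about what such a change might look like, not the driver's code: RX_QUEUES stands in for SXGBE_RX_QUEUES (its real value is not shown in this thread), the struct and function names are hypothetical, and a real driver would need atomic bit operations since the interrupt handler and the NAPI poll can run concurrently.

```c
#include <stdint.h>

#define RX_QUEUES 8	/* stand-in for SXGBE_RX_QUEUES; value assumed */

/* One pending bit per rx queue, replacing a single "current queue" field. */
struct rx_pending {
	uint32_t bits;
};

/* Interrupt side: remember that queue qnum has work, without
 * overwriting what an earlier interrupt recorded. */
static void rx_mark_pending(struct rx_pending *p, unsigned int qnum)
{
	if (qnum < RX_QUEUES)
		p->bits |= 1u << qnum;
}

/* Poll side: take the lowest-numbered pending queue, or -1 if none. */
static int rx_take_pending(struct rx_pending *p)
{
	for (unsigned int q = 0; q < RX_QUEUES; q++) {
		if (p->bits & (1u << q)) {
			p->bits &= ~(1u << q);
			return (int)q;
		}
	}
	return -1;
}
```

With this shape, an interrupt for queue B arriving after one for queue A no longer hides queue A from the poll loop, which is the failure sketched in the three-step list above.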
Re: [PATCH v2 1/4] net: hix5hd2_gmac: add generic compatible string
On Mon, Dec 05, 2016 at 09:27:58PM +0800, Dongpo Li wrote:
> The "hix5hd2" is SoC name, add the generic ethernet driver name.
> The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
> the SG/TXCSUM/TSO/UFO features.
>
> Signed-off-by: Dongpo Li
> ---
>  .../devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt | 9 +++--
>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 15 +++
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> index 75d398b..75920f0 100644
> --- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> +++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> @@ -1,7 +1,12 @@
>  Hisilicon hix5hd2 gmac controller
>
>  Required properties:
> -- compatible: should be "hisilicon,hix5hd2-gmac".
> +- compatible: should contain one of the following SoC strings:
> +	* "hisilicon,hix5hd2-gemac"
> +	* "hisilicon,hi3798cv200-gemac"
> +	and one of the following version string:
> +	* "hisilicon,hisi-gemac-v1"
> +	* "hisilicon,hisi-gemac-v2"

What combinations are valid? I assume both chips don't have both v1 and v2. 2 SoCs and 2 versions so far, I don't think there is much point to have the v1 and v2 compatible strings.

>  - reg: specifies base physical address(s) and size of the device registers.
>    The first region is the MAC register base and size.
>    The second region is external interface control register.
> @@ -20,7 +25,7 @@ Required properties:
>
>  Example:
>  	gmac0: ethernet@f984 {
> -		compatible = "hisilicon,hix5hd2-gmac";
> +		compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";

You can't just change compatible strings.

>  		reg = <0xf984 0x1000>,<0xf984300c 0x4>;
>  		interrupts = <0 71 4>;
>  		#address-cells = <1>;
Re: Synopsys Ethernet QoS
On Fri, Dec 9, 2016 at 5:41 PM, David Miller wrote:
> But one thing I am against is changing the driver name for existing
> users. If an existing chip is supported by the stmmac driver for
> existing users, they should still continue to use the "stmmac" driver.
>
> Therefore, if consolidation changes the driver module name for
> existing users, then that is not a good plan at all.

You have at least one supporter here. Though I jumped in to the discussion very late, not sure if everyone have time to answer to that.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH iproute2] Makefile: really suppress printing of directories
On 12/9/16 12:50 PM, Stephen Hemminger wrote:
> On Wed, 7 Dec 2016 12:55:09 -0800
> David Ahern wrote:
>
>> Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
>> defined however Config always defines VERBOSE. Update the check to
>> whether VERBOSE is 0.
>>
>> Fixes: 57bdf8b76451 ("Make builds default to quiet mode")
>> Signed-off-by: David Ahern
>
> Applied to net-next.
>
> Patch only works with net-next, please label it next time.
>

That does not sound right. The patch this one fixes was applied back in May, and Makefile has only had one other commit against it since.

Regardless, I will add the label to git to default to net-next.
[PATCH] net: mlx5: Fix Kconfig help text
Since the following commit, Infiniband and Ethernet have not been mutually exclusive.

Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet
Signed-off-by: Christopher Covington
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index aae4688..521cfdb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -18,8 +18,6 @@ config MLX5_CORE_EN
 	default n
 	---help---
 	  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
-	  Ethernet and Infiniband support in ConnectX-4 are currently mutually
-	  exclusive.
 
 config MLX5_CORE_EN_DCB
 	bool "Data Center Bridging (DCB) Support"
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH] i40e: don't truncate match_method assignment
The .match_method field is a u8, so we shouldn't be casting to a u16, and because it is only one byte, we do not need to byte swap anything. Just assign the value directly. This avoids issues on Big Endian architectures which would have byte swapped and then incorrectly truncated the value.

Signed-off-by: Jacob Keller
Cc: Stephen Rothwell
Cc: Bimmy Pujari
---
Not sure if this was already in Jeff's queue, but since it's an obvious fix for the issue found by Stephen, I thought I'd send it out now just to make sure. Thanks for catching this, and sorry we didn't find the fix earlier.

 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69a51a4119d6..6ccf18464339 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2257,8 +2257,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
 			}
 			add_list[num_add].queue_number = 0;
 			/* set invalid match method for later detection */
-			add_list[num_add].match_method =
-				cpu_to_le16((u16)I40E_AQC_MM_ERR_NO_RES);
+			add_list[num_add].match_method = I40E_AQC_MM_ERR_NO_RES;
 			cmd_flags |= I40E_AQC_MACVLAN_ADD_PERFECT_MATCH;
 			add_list[num_add].flags = cpu_to_le16(cmd_flags);
 			num_add++;
-- 
2.11.0.rc2.152.g4d04e67
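The truncation described in the commit message can be reproduced in userspace. The sketch below only models the arithmetic: swab16() plays the role cpu_to_le16() has on a big-endian CPU, the function names are invented for the illustration, and the value 0xFF used in the note afterwards is an assumed stand-in for I40E_AQC_MM_ERR_NO_RES rather than a value taken from the patch.

```c
#include <stdint.h>

/* Byte swap, which is what cpu_to_le16() amounts to on a big-endian CPU. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Old assignment on big-endian: widen, swap to LE, truncate back to u8. */
static uint8_t match_method_old_be(uint8_t val)
{
	return (uint8_t)swab16((uint16_t)val);
}

/* Old assignment on little-endian: cpu_to_le16() is a no-op there. */
static uint8_t match_method_old_le(uint8_t val)
{
	return (uint8_t)(uint16_t)val;
}

/* Fixed assignment: a single byte needs no swap on either endianness. */
static uint8_t match_method_fixed(uint8_t val)
{
	return val;
}
```

Feeding 0xFF through the old big-endian path gives 0x00, because the swap moves the meaningful byte into the half that the u8 store discards; the little-endian path and the fixed assignment both keep 0xFF.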
Re: [PATCH net-next v3 1/2] net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac
On Sat, Dec 03, 2016 at 12:32:37AM +0100, Martin Blumenstingl wrote:
> This allows configuring the RGMII TX clock delay. The RGMII clock is
> generated by underlying hardware of the the Meson 8b / GXBB DWMAC glue.
> The configuration depends on the actual hardware (no delay may be
> needed due to the design of the actual circuit, the PHY might add this
> delay, etc.).
>
> Signed-off-by: Martin Blumenstingl
> Tested-by: Neil Armstrong
> ---
>  Documentation/devicetree/bindings/net/meson-dwmac.txt | 14 ++
>  1 file changed, 14 insertions(+)

Acked-by: Rob Herring
Re: [PATCH net-next 1/3] net/mlx5e: use %pad format string for dma_addr_t
On 12/08/2016 11:57 PM, Arnd Bergmann wrote:
> On 32-bit ARM with 64-bit dma_addr_t I get this warning about an
> incorrect format string:
>
> In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
>
> We have the special %pad format for printing dma_addr_t, so use that
> to print the correct address and avoid the warning.
>
> Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)")
> Signed-off-by: Arnd Bergmann

Thank you Arnd !!

Acked-by: Saeed Mahameed
Re: [PATCH iproute2] Makefile: really suppress printing of directories
On Wed, 7 Dec 2016 12:55:09 -0800
David Ahern wrote:

> Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
> defined however Config always defines VERBOSE. Update the check to
> whether VERBOSE is 0.
>
> Fixes: 57bdf8b76451 ("Make builds default to quiet mode")
> Signed-off-by: David Ahern

Applied to net-next.

Patch only works with net-next, please label it next time.
Re: [PATCH v2 iproute2/net-next 3/3] tc: flower: support matching on ICMP type and code
On Wed, 7 Dec 2016 14:54:03 +0100
Simon Horman wrote:

> Support matching on ICMP type and code.
>
> Example usage:
>
> tc qdisc add dev eth0 ingress
>
> tc filter add dev eth0 protocol ip parent : flower \
>     indev eth0 ip_proto icmp type 8 code 0 action drop
>
> tc filter add dev eth0 protocol ipv6 parent : flower \
>     indev eth0 ip_proto icmpv6 type 128 code 0 action drop
>
> Signed-off-by: Simon Horman

Applied to net-next
Re: [PATCH v2 iproute2/net-next 2/3] tc: flower: introduce enum flower_endpoint
On Wed, 7 Dec 2016 14:54:02 +0100
Simon Horman wrote:

> Introduce enum flower_endpoint and use it instead of a bool
> as the type for paramatising source and destination.
>
> This is intended to improve read-ability and provide some type
> checking of endpoint parameters.
>
> Signed-off-by: Simon Horman

Applied to net-next
Re: [PATCH v2 iproute2/net-next 1/3] tc: flower: update headers for TCA_FLOWER_KEY_ICMP*
On Wed, 7 Dec 2016 14:54:01 +0100
Simon Horman wrote:

> These are proposed changes for net-next.
>
> Signed-off-by: Simon Horman

Picked this up with upstream headers update
Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
On Fri, Dec 09, 2016 at 03:18:02PM +, Bart Van Assche wrote: > On 12/08/16 22:40, Madhani, Himanshu wrote: > > We’ll take a look and send patches to resolve these warnings. > > Thanks! > > Bart. > Sounds good. I posted what I have so far so that you can start from that. -- MST
Re: [PATCH iproute2 net-next] bpf: Fix number of retries when growing log buffer
On Wed, 7 Dec 2016 10:47:59 +0100
Thomas Graf wrote:

> The log buffer is automatically grown when the verifier output does not
> fit into the default buffer size. The number of growing attempts was
> not sufficient to reach the maximum buffer size so far.
>
> Perform 9 iterations to reach max and let the 10th one fail.
>
> j:0 i:65536    max:16777215
> j:1 i:131072   max:16777215
> j:2 i:262144   max:16777215
> j:3 i:524288   max:16777215
> j:4 i:1048576  max:16777215
> j:5 i:2097152  max:16777215
> j:6 i:4194304  max:16777215
> j:7 i:8388608  max:16777215
> j:8 i:16777216 max:16777215
>
> Signed-off-by: Thomas Graf
> Acked-by: Daniel Borkmann

Applied to net-next
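The doubling sequence in the quoted table can be checked with a few lines of C. This is a standalone sketch under the assumption that the buffer starts at 65536 bytes and doubles on each attempt, as the j/i columns suggest; the function name is hypothetical and the code does not reproduce how iproute2 counts its retries.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many buffer sizes fit under max when starting at start
 * and doubling each time. Illustrative helper, not iproute2 code.
 */
static unsigned int log_sizes_within(size_t start, size_t max)
{
	unsigned int n = 0;
	size_t size = start;

	if (start == 0)
		return 0;
	while (size <= max) {
		n++;
		if (size > SIZE_MAX / 2)	/* doubling would overflow */
			break;
		size <<= 1;
	}
	return n;
}
```

Under these assumptions, eight sizes (65536 through 8388608) fit below the 16777215 limit, and the next doubling, 16777216, exceeds it by one byte, which is the j:8 row of the table.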
Re: [PATCH] uio-hv-generic: store physical addresses instead of virtual
On Friday, December 9, 2016 9:28:44 AM CET Stephen Hemminger wrote:
> On Fri, 9 Dec 2016 12:44:40 +0100
> Arnd Bergmann wrote:
> > Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
> > Signed-off-by: Arnd Bergmann
>
> Thanks, the code was inherited from outside, and only tested on x86_64.
> Not sure which platform and GCC version generates the warning, was this just
> W=1?
>
> Acked-by: Stephen Hemminger

This was a regular warning with a randconfig build on arm32, but it happens on any 32-bit architecture when CONFIG_PHYS_ADDR_T_64BIT is enabled.

	Arnd
Re: [PATCH net-next v2] dsa:mv88e6xxx: dispose irq mapping for chip->irq
On Wed, Dec 07, 2016 at 05:40:12PM +0100, Volodymyr Bendiuga wrote:
> Yes, most of the users of of_irq_get() do not use irq_dispose_mapping().
>
> But some of them do (some irq chips), and I believe the correct way
> of doing this is to
>
> dispose irq mapping, as the description for this function says that
> it unmaps
>
> the irq, which is mapped by of_irq_parse_and_map(). Not disposing
> irq might not make
>
> any affect on most drivers, but some, that get EPROBE_DEFER error do
> need to dispose.
>
> This is what I get when I run the code.
>
> of_irq_put() could be implemented, and it would be a wrapper for
> irq_dispose_mapping()
>
> as I can see it. Should I do it this way?

Hi Volodymyr

Yes, I think having of_irq_put() would be good. It gives some symmetry to the API.

	Andrew
Re: [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf
On 8 December 2016 at 21:18, Wangnan (F)wrote: > > > On 2016/12/9 13:04, Wangnan (F) wrote: >> >> >> >> On 2016/12/9 10:46, Joe Stringer wrote: >> >> [SNIP] >> >>> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile >>> index 62d89d50fcbd..616bd55f3be8 100644 >>> --- a/tools/lib/bpf/Makefile >>> +++ b/tools/lib/bpf/Makefile >>> @@ -149,6 +149,8 @@ CMD_TARGETS = $(LIB_FILE) >>> TARGETS = $(CMD_TARGETS) >>> +libbpf: all >>> + >> >> >> Why we need this? I tested this patch without it and it seems to work, and >> this line causes an extra error: >> $ pwd >> /home/wn/kernel/tools/lib/bpf >> $ make libbpf >> ... >> gcc -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DHAVE_ELF_GETPHDRNUM_SUPPORT >> -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security >> -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes >> -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked >> -Wredundant-decls -Wshadow -Wstrict-aliasing=3 -Wstrict-prototypes >> -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Werror >> -Wall -fPIC -I. -I/home/wn/kernel-hydrogen/tools/include >> -I/home/wn/kernel-hydrogen/tools/arch/x86/include/uapi >> -I/home/wn/kernel-hydrogen/tools/include/uapilibbpf.c all -o libbpf >> gcc: error: all: No such file or directory >> make: *** [libbpf] Error 1 >> >> Thank you. > > > It is not 'caused' by your patch. 'make libbpf' fails without > your change because it tries to build an executable from > libbpf.c, but main() is missing. > > I think libbpf should never be used as a make target. Your > new dependency looks strange. Thanks for the feedback, I sent a patch to address this on top of perf/core: https://lkml.org/lkml/2016/12/9/518
Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
On Fri, Dec 9, 2016 at 8:56 PM, David Miller wrote:
> From: Selvin Xavier
> Date: Thu, 8 Dec 2016 22:47:54 -0800
>
>> This series introduces the RoCE driver for the Broadcom
>> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs.
>> This driver is dependent on the bnxt_en NIC driver and is
>> based on the bnxt_re branch in Doug's repository. bnxt_en changes
>> required for this patch series is already available in this branch.
>>
>> I am preparing a git repository with these changes as per Jason's
>> comment and will share the details later today.
>
> If this is targetted at the net-next tree, it is too late as I've
> closed the net-next tree two nights ago.
>

This patch series is targeting the linux-rdma tree. netdev is copied since this series is dependent on bnxt_en.

Thanks
Selvin
[PATCH perf/core] samples/bpf: Drop unnecessary build targets.
Commit f72179ef11db ("samples/bpf: Switch over to libbpf") added these two makefile changes that were unnecessary for switching samples to use libbpf. The extra make is already handled by the build dependency, and libbpf target doesn't build because it lacks main(). Remove these.

Reported-by: Wang Nan
Signed-off-by: Joe Stringer
---
 samples/bpf/Makefile   | 1 -
 tools/lib/bpf/Makefile | 2 --
 2 files changed, 3 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9ffa6a2c061d..60ffc8115b67 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -127,7 +127,6 @@ CLANG ?= clang
 
 # Trick to allow make to be run from this directory
 all:
-	$(MAKE) -C ../../ tools/lib/bpf/
 	$(MAKE) -C ../../ $$PWD/
 
 clean:
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 616bd55f3be8..62d89d50fcbd 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -149,8 +149,6 @@ CMD_TARGETS = $(LIB_FILE)
 
 TARGETS = $(CMD_TARGETS)
 
-libbpf: all
-
 all: fixdep $(VERSION_FILES) all_cmd
 
 all_cmd: $(CMD_TARGETS)
-- 
2.10.2
Re: [PATCH] uio-hv-generic: store physical addresses instead of virtual
On Fri, 9 Dec 2016 12:44:40 +0100
Arnd Bergmann wrote:

> gcc warns about the newly added driver when phys_addr_t is wider than
> a pointer:
>
> drivers/uio/uio_hv_generic.c: In function 'hv_uio_mmap':
> drivers/uio/uio_hv_generic.c:71:17: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
>     virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT,
> drivers/uio/uio_hv_generic.c: In function 'hv_uio_probe':
> drivers/uio/uio_hv_generic.c:140:5: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>     = (phys_addr_t)dev->channel->ringbuffer_pages;
> drivers/uio/uio_hv_generic.c:147:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>     (phys_addr_t)vmbus_connection.int_page;
> drivers/uio/uio_hv_generic.c:153:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>     (phys_addr_t)vmbus_connection.monitor_pages[1];
>
> I can't see why we store a virtual address in a phys_addr_t here,
> as the only user of that variable converts it into a physical
> address anyway, so this moves the conversion to where it logically
> fits according to the types.
>
> Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
> Signed-off-by: Arnd Bergmann

Thanks, the code was inherited from outside, and only tested on x86_64. Not sure which platform and GCC version generates the warning, was this just W=1?

Acked-by: Stephen Hemminger
Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
On Fri, Dec 9, 2016 at 8:53 AM, Eric Dumazet wrote:
> On Fri, 2016-12-09 at 08:43 -0800, Tom Herbert wrote:
>>
>> Are you thinking of allowing unconnected socket to have multiple input
>> queues? Sort of an automatic and transparent SO_REUSEPORT...
>
> It all depends if the user application is using a single thread or
> multiple threads to drain the queue.
>

If they're using multiple threads hopefully there's no reason they can't use SO_REUSEPORT. Since we should always assume DDoS is a possibility it seems like that should be a general recommendation: If you have multiple threads listening on a port use SO_REUSEPORT.

> Since we used to grab socket lock in udp_recvmsg(), I guess nobody uses
> multiple threads to read packets from a single socket.
>

That's the hope! So the problem at hand is multiple producer CPUs and one consumer CPU.

> So heavy users must use SO_REUSEPORT already, not sure what we would
> gain trying to go to a single socket, with the complexity of mem
> charging.
>

I think you're making a good point about the possibility that any unconnected UDP socket could be subject to an attack, so any use of unconnected UDP has the potential to become a "heavy user" (in fact we've seen this bring down whole networks before in production). Therefore the single thread reader case is relevant to consider.

Tom
Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
On Fri, 2016-12-09 at 08:43 -0800, Tom Herbert wrote:
>
> Are you thinking of allowing unconnected socket to have multiple input
> queues? Sort of an automatic and transparent SO_REUSEPORT...

It all depends if the user application is using a single thread or multiple threads to drain the queue.

Since we used to grab socket lock in udp_recvmsg(), I guess nobody uses multiple threads to read packets from a single socket.

So heavy users must use SO_REUSEPORT already, not sure what we would gain trying to go to a single socket, with the complexity of mem charging.
Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII
On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
>
>
> On 2016/12/9 6:15, Florian Fainelli wrote:
> > On 12/06/2016 07:57 PM, Jie Deng wrote:
> >> This patch adds phy-mode support for Synopsys XLGMAC
> > The functional changes look good, but I would like to see some
> > description of what the XL part stands for here.
> >
> > While you are modifying this, do you also mind submitting a Device Tree
> > specification change:
> >
> > https://www.devicetree.org/specifications/
> >
> > Thanks!
> Thank you for the information.
>
> Currenlty, the XLGMAC is a new IP from Synopsys.

I think Florian wants to know about the IEEE standard, or whatever document defines what the phy-mode XLGMAC is, in the same way there are standards for RGMII, SGMII, etc.

	Andrew
Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
On Fri, 2016-12-09 at 17:05 +0100, Jesper Dangaard Brouer wrote: > On Thu, 08 Dec 2016 13:13:15 -0800 > Eric Dumazetwrote: > > > On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote: > > > On Thu, 8 Dec 2016 09:38:55 -0800 > > > Eric Dumazet wrote: > > > > > > > This patch series provides about 100 % performance increase under > > > > flood. > > > > > > Could you please explain a bit more about what kind of testing you are > > > doing that can show 100% performance improvement? > > > > > > I've tested this patchset and my tests show *huge* speeds ups, but > > > reaping the performance benefit depend heavily on setup and enabling > > > the right UDP socket settings, and most importantly where the > > > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer). > > > > Right. > > > > So here at Google we do not try (yet) to downgrade our expensive > > Multiqueue Nics into dumb NICS from last decade by using a single queue > > on them. Maybe it will happen when we can process 10Mpps per core, > > but we are not there yet ;) > > > > So my test is using a NIC, programmed with 8 queues, on a dual-socket > > machine. (2 physical packages) > > > > 4 queues are handled by 4 cpus on socket0 (NUMA node 0) > > 4 queues are handled by 4 cpus on socket1 (NUMA node 1) > > Interesting setup, it will be good to catch cache-line bouncing and > false-sharing, which the streak of recent patches show ;-) (Hopefully > such setup are avoided for production). Well, if you have 100Gbit NIC, and 2 NUMA nodes, what do you suggest exactly, when jobs run on both nodes ? If you suggest to remove one package, or force jobs to run on Socket0, just because the NIC is attached to it, it wont be an option. Most of the traffic is TCP, so RSS comes nicely here to affine traffic on one RX queue of the NIC. Now, if for some reason an innocent UDP socket is the target of a flood, we need to not make all cpus blocked in a spinlock to eventually queue a packet. 
Be assured that high performance UDP servers use kernel bypass, or SO_REUSEPORT already. My effort is not targeting these special users, since they already have good performance. My effort is to provide some isolation, a bit like the effort I did for SYN flood attacks (Cpus were all spinning on listener spinlock) > > > > So I explicitly put my poor single thread UDP application in the worst > > condition, having skbs produced on two NUMA nodes. > > On which CPU do you place the single thread UDP application? No matter in this case. You can either force it to run on a group of cpu, or let the scheduler choose. If you let the scheduler choose, then it might help the single tuple flood attack, since the user thread will be moved on a difference cpu than the ksoftirqd. > > E.g. do you allow it to run on a CPU that also process ksoftirq? > My experience is that performance is approx half, if ksoftirq and > UDP-thread share a CPU (after you fixed the softirq issue). Well, this is exactly what I said earlier. Your choices about cpu pinning might help or might hurt in different scenarios. > > > > Then my load generator use trafgen, with spoofed UDP source addresses, > > like a UDP flood would use. Or typical DNS traffic, malicious or not. > > I also like trafgen > https://github.com/netoptimizer/network-testing/tree/master/trafgen > > > So I have 8 cpus all trying to queue packets in a single UDP socket. > > > > Of course, a real high performance server would use 8 UDP sockets, and > > SO_REUSEPORT with nice eBPF filter to spread the packets based on the > > queue/cpu they arrived. > > Once the ksoftirq and UDP-threads are silo'ed like that, it should > basically correspond to the benchmarks of my single queue test, > multiplied by the number of CPUs/UDP-threads. Well, if one cpu is shared by the producer and consumer then packets are hot in caches, so trying to avoid cache line misses as I did is not really helping. 
I optimized the case where we do not assume both parties run on the same cpu. If you leave process scheduler do its job, then your throughput can be doubled ;) Now if for some reason you are stuck with a single CPU, this is a very different problem, and af_packet might be better. > > I think it might be a good idea (for me) to implement such a > UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF > filter) to demonstrate and make sure the stack scales (and every > time we/I improve single queue performance, the numbers should multiply > with the scaling). Maybe you already have such an example program? Well, I do have something using SO_REUSEPORT, but not yet BPF, so not in a state I can share at this moment.
Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
On Thu, Dec 08, 2016 at 10:47:54PM -0800, Selvin Xavier wrote:
> ...
> create mode 100644 include/uapi/rdma/bnxt_re_uverbs_abi.h

Please use the already established naming format for this file. It will simplify our future integration with the rdma-core library.

Thanks

➜ linux-rdma git:(master) ls -l include/uapi/rdma/*-abi.h
-rw-r--r-- 1 leonro leonro 2291 Dec 7 13:07 include/uapi/rdma/cxgb3-abi.h
-rw-r--r-- 1 leonro leonro 2488 Dec 7 13:07 include/uapi/rdma/cxgb4-abi.h
-rw-r--r-- 1 leonro leonro 2864 Dec 7 13:07 include/uapi/rdma/mlx4-abi.h
-rw-r--r-- 1 leonro leonro 6103 Dec 8 12:52 include/uapi/rdma/mlx5-abi.h
-rw-r--r-- 1 leonro leonro 2932 Dec 7 13:07 include/uapi/rdma/mthca-abi.h
-rw-r--r-- 1 leonro leonro 3380 Dec 7 13:07 include/uapi/rdma/nes-abi.h
-rw-r--r-- 1 leonro leonro 3918 Dec 7 13:07 include/uapi/rdma/ocrdma-abi.h
-rw-r--r-- 1 leonro leonro 2559 Dec 7 13:07 include/uapi/rdma/qedr-abi.h

> --
> 2.5.5
Re: stmmac DT property snps,axi_all
Hi Niklas On 12/09/2016 10:53 AM, Niklas Cassel wrote: On 12/09/2016 10:20 AM, Niklas Cassel wrote: On 12/08/2016 02:36 PM, Alexandre Torgue wrote: Hi Niklas, On 12/05/2016 05:18 PM, Niklas Cassel wrote: Hello Giuseppe I'm trying to figure out what snps,axi_all is supposed to represent. It appears that the value is saved, but never used in the code. Looking at the register specification, I'm guessing that it represents Address-Aligned Beats, but there is already the property snps,aal for that. IMO, it is not useful. Indeed AXI_AAL is a read only bit (in AXI bus mode register) and reflects the aal bit in DMA bus register. As you know we use "snps,aal" to set aal bit in DMA bus register. So "snps,axi_all" entry seems useless. Let's see with Peppe. Ok, I see. GMAC and GMAC4 is different here. For GMAC4 AAL only exists in DMA_SYS_BUS_MODE. It's not reflected anywhere else. The code is correct in the driver. If snps,axi_all is just created for a read-only register, and it is currently never used in the code, while we have snps,aal, which is correct and works, I guess it should be ok to remove snps,axi_all. I can cook up a patch. Here we go :) I will send it as a real patch once net-next reopens. Thanks ;). Just check with Peppe next week (as he added in the past this property). Regards Alex From defc01cb7c22611b89d9cf1fcae72544092bd62c Mon Sep 17 00:00:00 2001 From: Niklas CasselDate: Fri, 9 Dec 2016 10:27:00 +0100 Subject: [PATCH net-next] net: stmmac: remove unused duplicate property snps,axi_all For core revision 3.x Address-Aligned Beats is available in two registers. The DT property snps,aal was created for AAL in the DMA bus register, which is a read/write bit. The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode register, which is a read only bit that reflects the value of AAL in the DMA bus register. 
Since the value of snps,axi_all is never used in the driver, and since the property was created for a bit that is read only, it should be safe to remove the property. Signed-off-by: Niklas Cassel --- Documentation/devicetree/bindings/net/stmmac.txt | 1 - drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 - include/linux/stmmac.h| 1 - 3 files changed, 3 deletions(-) diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index 128da752fec9..c3d2fd480a1b 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -65,7 +65,6 @@ Optional properties: - snps,wr_osr_lmt: max write outstanding req. limit - snps,rd_osr_lmt: max read outstanding req. limit - snps,kbbe: do not cross 1KiB boundary. -- snps,axi_all: align address - snps,blen: this is a vector of supported burst length. - snps,fb: fixed-burst - snps,mb: mixed-burst diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 082cd48db6a7..60ba8993c650 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -121,7 +121,6 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev) axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en"); axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm"); axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe"); -axi->axi_axi_all = of_property_read_bool(np, "snps,axi_all"); axi->axi_fb = of_property_read_bool(np, "snps,axi_fb"); axi->axi_mb = of_property_read_bool(np, "snps,axi_mb"); axi->axi_rb = of_property_read_bool(np, "snps,axi_rb"); diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 266dab9ad782..889e0e9a3f1c 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -103,7 +103,6 @@ struct stmmac_axi { u32 axi_wr_osr_lmt; u32 axi_rd_osr_lmt; bool axi_kbbe; 
-bool axi_axi_all; u32 axi_blen[AXI_BLEN]; bool axi_fb; bool axi_mb;
Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
On Thu, 08 Dec 2016 13:13:15 -0800 Eric Dumazetwrote: > On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote: > > On Thu, 8 Dec 2016 09:38:55 -0800 > > Eric Dumazet wrote: > > > > > This patch series provides about 100 % performance increase under flood. > > > > > > > Could you please explain a bit more about what kind of testing you are > > doing that can show 100% performance improvement? > > > > I've tested this patchset and my tests show *huge* speeds ups, but > > reaping the performance benefit depend heavily on setup and enabling > > the right UDP socket settings, and most importantly where the > > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer). > > Right. > > So here at Google we do not try (yet) to downgrade our expensive > Multiqueue Nics into dumb NICS from last decade by using a single queue > on them. Maybe it will happen when we can process 10Mpps per core, > but we are not there yet ;) > > So my test is using a NIC, programmed with 8 queues, on a dual-socket > machine. (2 physical packages) > > 4 queues are handled by 4 cpus on socket0 (NUMA node 0) > 4 queues are handled by 4 cpus on socket1 (NUMA node 1) Interesting setup, it will be good to catch cache-line bouncing and false-sharing, which the streak of recent patches show ;-) (Hopefully such setup are avoided for production). > So I explicitly put my poor single thread UDP application in the worst > condition, having skbs produced on two NUMA nodes. On which CPU do you place the single thread UDP application? E.g. do you allow it to run on a CPU that also process ksoftirq? My experience is that performance is approx half, if ksoftirq and UDP-thread share a CPU (after you fixed the softirq issue). > Then my load generator use trafgen, with spoofed UDP source addresses, > like a UDP flood would use. Or typical DNS traffic, malicious or not. 
I also like trafgen https://github.com/netoptimizer/network-testing/tree/master/trafgen > So I have 8 cpus all trying to queue packets in a single UDP socket. > > Of course, a real high performance server would use 8 UDP sockets, and > SO_REUSEPORT with nice eBPF filter to spread the packets based on the > queue/cpu they arrived. Once the ksoftirq and UDP-threads are silo'ed like that, it should basically correspond to the benchmarks of my single queue test, multiplied by the number of CPUs/UDP-threads. I think it might be a good idea (for me) to implement such a UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF filter) to demonstrate and make sure the stack scales (and every time we/I improve single queue performance, the numbers should multiply with the scaling). Maybe you already have such an example program? > In the case you have one cpu that you need to share between ksoftirq and > all user threads, then your test results depend on process scheduler > decisions more than anything we can code in network land. Yes, also my experience, the scheduler have large influence. > It is actually easy for user space to get more than 50% of the cycles, > and 'starve' ksoftirqd. FYI, Paolo recently added an option for parsing of pktgen payload in the udp_sink.c program, this way we can simulate the app doing something. I've started testing with 4 CPUs doing ksoftirq, multiple flows (pktgen_sample04_many_flows.sh) and then increasing adding udp_sink --reuse-port programs, on other 4 CPUs, and it looks like it scales nicely :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
[PATCH net-next] net: skb_condense() can also deal with empty skbs
From: Eric Dumazet

It seems attackers can also send UDP packets with no payload at all.

skb_condense() can still be a win in this case.

It will be possible to replace the custom code in tcp_add_backlog() to get full benefit from skb_condense()

Signed-off-by: Eric Dumazet
---
 net/core/skbuff.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 84151cf40aebb973bad5bee3ee4be0758084d83c..b1451e66d570269252ce628b2dc1714b860e1ca4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4946,16 +4946,20 @@ EXPORT_SYMBOL(pskb_extract);
  */
 void skb_condense(struct sk_buff *skb)
 {
-	if (!skb->data_len ||
-	    skb->data_len > skb->end - skb->tail ||
-	    skb_cloned(skb))
-		return;
-
-	/* Nice, we can free page frag(s) right now */
-	__pskb_pull_tail(skb, skb->data_len);
+	if (skb->data_len) {
+		if (skb->data_len > skb->end - skb->tail ||
+		    skb_cloned(skb))
+			return;
 
-	/* Now adjust skb->truesize, since __pskb_pull_tail() does
-	 * not do this.
+		/* Nice, we can free page frag(s) right now */
+		__pskb_pull_tail(skb, skb->data_len);
+	}
+	/* At this point, skb->truesize might be over estimated,
+	 * because skb had a fragment, and fragments do not tell
+	 * their truesize.
+	 * When we pulled its content into skb->head, fragment
+	 * was freed, but __pskb_pull_tail() could not possibly
+	 * adjust skb->truesize, not knowing the frag truesize.
 	 */
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 }
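The control-flow change in this patch can be illustrated with a small userspace model. This is only a sketch under loud assumptions: struct toy_skb, its fields, TOY_OVERHEAD and toy_truesize() are invented stand-ins and do not match the kernel's struct sk_buff or SKB_TRUESIZE(). It shows one thing only, namely that after the patch an skb with no fragment data no longer returns early and still gets its truesize recomputed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Made-up fixed metadata cost; the real SKB_TRUESIZE() differs. */
#define TOY_OVERHEAD 256

/* Toy stand-in for struct sk_buff; all field names are hypothetical. */
struct toy_skb {
	size_t head_size;  /* bytes allocated for the linear area */
	size_t linear_len; /* bytes used in the linear area */
	size_t data_len;   /* bytes held in page fragments */
	size_t truesize;   /* memory charged for this buffer */
	bool cloned;
};

/* Toy equivalent of SKB_TRUESIZE(skb_end_offset(skb)). */
static size_t toy_truesize(const struct toy_skb *skb)
{
	return skb->head_size + TOY_OVERHEAD;
}

/* Models the patched skb_condense() control flow. */
static void toy_condense(struct toy_skb *skb)
{
	if (skb->data_len) {
		size_t tailroom = skb->head_size - skb->linear_len;

		/* Fragments do not fit, or buffer is shared: leave as is. */
		if (skb->data_len > tailroom || skb->cloned)
			return;

		/* Pull fragment bytes into the linear area. */
		skb->linear_len += skb->data_len;
		skb->data_len = 0;
	}
	/* Reached for empty buffers too: drop any over-estimate. */
	skb->truesize = toy_truesize(skb);
}
```

In this model, a buffer with data_len == 0 and an inflated truesize comes out with truesize equal to head_size + TOY_OVERHEAD, which is the "empty skbs" case the subject line refers to.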
Re: Synopsys Ethernet QoS
At 3:41 PM on 12/9/2016, David Miller wrote: > From: Joao Pinto > Date: Fri, 9 Dec 2016 15:36:38 + > >> Of course, I started a general discussion about the subject and >> those were the conclusions, but I would like to know if you as the >> subsystem maintainer also support the approach or have any >> suggestion. > > Generally, I support whatever the interested parties agree to. > > But one thing I am against is changing the driver name for existing > users. If an existing chip is supported by the stmmac driver for > existing users, they should still continue to use the "stmmac" driver. > > Therefore, if consolidation changes the driver module name for > existing users, then that is not a good plan at all. > Of course, 100% with you! Retro-compatibility for existing drivers is a must-have. The consolidation is going to be done with extreme care. Joao
Re: Synopsys Ethernet QoS
From: Joao Pinto Date: Fri, 9 Dec 2016 15:36:38 + > Of course, I started a general discussion about the subject and > those were the conclusions, but I would like to know if you as the > subsystem maintainer also support the approach or have any > suggestion. Generally, I support whatever the interested parties agree to. But one thing I am against is changing the driver name for existing users. If an existing chip is supported by the stmmac driver for existing users, they should still continue to use the "stmmac" driver. Therefore, if consolidation changes the driver module name for existing users, then that is not a good plan at all.
Re: Synopsys Ethernet QoS
Hi David, Of course, I started a general discussion about the subject and those were the conclusions, but I would like to know if you as the subsystem maintainer also support the approach or have any suggestion. Thanks, Joao At 3:33 PM on 12/9/2016, David Miller wrote: > From: Joao Pinto > Date: Fri, 9 Dec 2016 11:29:02 + > >> Dear David Miller, > ... >> I would like to know if you support this plan. > > This is not how this works. > > You need to discuss and work out a plan with the other people > with a direct interest in the existing drivers and maintainence. > > Not me. >
Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
On 12/08/16 22:40, Madhani, Himanshu wrote: > We’ll take a look and send patches to resolve these warnings. Thanks! Bart.
Re: Synopsys Ethernet QoS
From: Joao Pinto Date: Fri, 9 Dec 2016 11:29:02 + > Dear David Miller, ... > I would like to know if you support this plan. This is not how this works. You need to discuss and work out a plan with the other people with a direct interest in the existing drivers and maintenance. Not me.
Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier Date: Thu, 8 Dec 2016 22:47:54 -0800 > This series introduces the RoCE driver for the Broadcom > NetXtreme-E 10/25/40/50 gigabit RoCE HCAs. > This driver is dependent on the bnxt_en NIC driver and is > based on the bnxt_re branch in Doug's repository. bnxt_en changes > required for this patch series is already available in this branch. > > I am preparing a git repository with these changes as per Jason's > comment and will share the details later today. If this is targeted at the net-next tree, it is too late as I closed the net-next tree two nights ago. Please resubmit this after the upcoming merge window closes. Thanks.
Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
Hi Arnaldo, On 12/09/2016 04:09 PM, Arnaldo Carvalho de Melo wrote: On Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer wrote: (Was "libbpf: Synchronize implementations") Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the samples/bpf/ code, then get rid of all of the duplicate BPF libraries in samples/bpf/libbpf.[ch]. --- v3: Add ack for first patch. Split out second patch from v2 into separate changes for remaining diff. Add patches to switch samples/bpf over to using tools/lib/. v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html Don't shift non-bpf code into libbpf. Drop the patch to synchronize ELF definitions with tc. v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html First post. Thanks, applied after addressing the -I$(objtree) issue raised by Wang, [ Sorry for the late reply. ] First of all, glad to see us getting rid of the duplicate lib eventually! :) Please note that this might result in what is hopefully just a minor merge issue with net-next. It looks like patch 4/7 touches test_maps.c and test_verifier.c, which moved to a new bpf selftest suite [1] this net-next cycle. It seems to be just the log buffer and some renames there, which can be discarded for both files sitting in selftests. Thanks, Daniel [1] https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/tools/testing/selftests/bpf
Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
On Fri, 2016-12-09 at 06:24 -0800, Eric Dumazet wrote: > It looks that you want a seqcount, even on 64bit arches, > so that CPU 2 can restart its loop, and more importantly you need > to not accumulate the values you read, because they might be old/invalid. Untested patch to give general idea. I can polish it a bit later today. net/netfilter/nft_counter.c | 59 +- 1 file changed, 23 insertions(+), 36 deletions(-) diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c index f6a02c5071c2aeafca7635da3282a809aa04d6ab..57ed95b024473a2aa76298fe5bb5013bf709801b 100644 --- a/net/netfilter/nft_counter.c +++ b/net/netfilter/nft_counter.c @@ -31,18 +31,25 @@ struct nft_counter_percpu_priv { struct nft_counter_percpu __percpu *counter; }; +static DEFINE_PER_CPU(seqcount_t, nft_counter_seq); + static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_counter_percpu *this_cpu; + seqcount_t *myseq; local_bh_disable(); this_cpu = this_cpu_ptr(priv->counter); - u64_stats_update_begin(_cpu->syncp); + myseq = this_cpu_ptr(_counter_seq); + + write_seqcount_begin(myseq); + this_cpu->counter.bytes += pkt->skb->len; this_cpu->counter.packets++; - u64_stats_update_end(_cpu->syncp); + + write_seqcount_end(myseq); local_bh_enable(); } @@ -110,52 +117,30 @@ static void nft_counter_fetch(struct nft_counter_percpu __percpu *counter, memset(total, 0, sizeof(*total)); for_each_possible_cpu(cpu) { + seqcount_t *seqp = per_cpu_ptr(_counter_seq, cpu); + cpu_stats = per_cpu_ptr(counter, cpu); do { - seq = u64_stats_fetch_begin_irq(_stats->syncp); + seq = read_seqcount_begin(seqp); bytes = cpu_stats->counter.bytes; packets = cpu_stats->counter.packets; - } while (u64_stats_fetch_retry_irq(_stats->syncp, seq)); + } while (read_seqcount_retry(seqp, seq)); total->packets += packets; total->bytes += bytes; } } -static u64 __nft_counter_reset(u64 *counter) -{ - u64 ret, old; - - do { - old = *counter; - 
ret = cmpxchg64(counter, old, 0); - } while (ret != old); - - return ret; -} - static void nft_counter_reset(struct nft_counter_percpu __percpu *counter, struct nft_counter *total) { struct nft_counter_percpu *cpu_stats; - u64 bytes, packets; - unsigned int seq; - int cpu; - memset(total, 0, sizeof(*total)); - for_each_possible_cpu(cpu) { - bytes = packets = 0; - - cpu_stats = per_cpu_ptr(counter, cpu); - do { - seq = u64_stats_fetch_begin_irq(_stats->syncp); - packets += __nft_counter_reset(_stats->counter.packets); - bytes += __nft_counter_reset(_stats->counter.bytes); - } while (u64_stats_fetch_retry_irq(_stats->syncp, seq)); - - total->packets += packets; - total->bytes += bytes; - } + local_bh_disable(); + cpu_stats = this_cpu_ptr(counter); + cpu_stats->counter.packets -= total->packets; + cpu_stats->counter.bytes -= total->bytes; + local_bh_enable(); } static int nft_counter_do_dump(struct sk_buff *skb, @@ -164,10 +149,9 @@ static int nft_counter_do_dump(struct sk_buff *skb, { struct nft_counter total; + nft_counter_fetch(priv->counter, ); if (reset) nft_counter_reset(priv->counter, ); - else - nft_counter_fetch(priv->counter, ); if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes), NFTA_COUNTER_PAD) || @@ -285,7 +269,10 @@ static struct nft_expr_type nft_counter_type __read_mostly = { static int __init nft_counter_module_init(void) { - int err; + int err, cpu; + + for_each_possible_cpu(cpu) + seqcount_init(per_cpu_ptr(_counter_seq, cpu)); err = nft_register_obj(_counter_obj); if (err < 0)
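The seqcount idea behind this patch (the reader restarts its loop and overwrites, rather than accumulates, whatever it read during a concurrent update) can be sketched in userspace with C11 atomics. This is an illustrative model only, not the kernel's seqcount_t implementation: struct toy_counter, toy_counter_add() and toy_counter_read() are made-up names, and the sketch assumes exactly one writer per counter, which in the patch is guaranteed by the per-CPU data plus local_bh_disable().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of one per-CPU counter with its seqcount. */
struct toy_counter {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

struct toy_snapshot {
	uint64_t bytes;
	uint64_t packets;
};

/* Writer side. Assumes a single writer per counter (no locking here). */
static void toy_counter_add(struct toy_counter *c, uint64_t len)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&c->bytes, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&c->packets, memory_order_relaxed);

	/* Mark the update as in progress (sequence becomes odd). */
	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&c->bytes, b + len, memory_order_relaxed);
	atomic_store_explicit(&c->packets, p + 1, memory_order_relaxed);

	/* Publish the update (sequence becomes even again). */
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: retry until both fields were read within one stable window. */
static struct toy_snapshot toy_counter_read(struct toy_counter *c)
{
	struct toy_snapshot snap;
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&c->seq, memory_order_acquire);
		/* Plain assignment: a retried pass discards the old values. */
		snap.bytes = atomic_load_explicit(&c->bytes,
						  memory_order_relaxed);
		snap.packets = atomic_load_explicit(&c->packets,
						    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return snap;
}
```

A caller summing several such counters would add snap.bytes and snap.packets to its totals only after toy_counter_read() returns, which mirrors how the patched nft_counter_fetch() adds to total outside the retry loop.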
Re: [PATCH net-next 0/2] Initial driver for Synopsys DWC XLGMAC
Hi Jie, I don't think we have the need to create the "dwc" subdirectory under "synopsys". Its preferable to have them directly under drivers/net/ethernet/synopsys. Regards, C.Palminha On 07-12-2016 03:57, Jie Deng wrote: > This series provides the support for 25/40/50/100 GbE > devices using Synopsys DWC Enterprise Ethernet (XLGMAC). > > The first patch adds support for Synopsys XLGMII. > The second patch provides the initial driver for Synopsys XLGMAC > > The driver has three layers by refactoring AMD XGBE. > > dwc-eth-xxx.x > The DWC ethernet core layer (DWC ECL). This layer contains codes > can be shared by different DWC series ethernet cores > > dwc-xxx.x (e.g. dwc-xlgmac.c) > The DWC MAC HW adapter layer (DWC MHAL). This layer contains > special support for a specific MAC. e.g. currently, XLGMAC. > > xxx-xxx-pci.c xxx-xxx-plat.c (e.g. dwc-xlgmac-pci.c) > The glue adapter layer (GAL). Vendors who adopt Synopsys Etherent > cores can develop a glue driver for their platform. > > Jie Deng (2): > net: phy: add extension of phy-mode for XLGMII > net: ethernet: Initial driver for Synopsys DWC XLGMAC > > Documentation/devicetree/bindings/net/ethernet.txt |1 + > MAINTAINERS|6 + > drivers/net/ethernet/synopsys/Kconfig |2 + > drivers/net/ethernet/synopsys/Makefile |1 + > drivers/net/ethernet/synopsys/dwc/Kconfig | 37 + > drivers/net/ethernet/synopsys/dwc/Makefile |9 + > drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c| 228 ++ > .../net/ethernet/synopsys/dwc/dwc-eth-debugfs.c| 328 +++ > drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c | 715 + > .../net/ethernet/synopsys/dwc/dwc-eth-ethtool.c| 567 > drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c | 3098 > > drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c | 252 ++ > drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c| 2319 +++ > drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c| 216 ++ > drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h | 1115 +++ > drivers/net/ethernet/synopsys/dwc/dwc-eth.h| 738 + > 
drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c | 538 > drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c | 135 + > drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h | 85 + > include/linux/phy.h|3 + > 20 files changed, 10393 insertions(+) > create mode 100644 drivers/net/ethernet/synopsys/dwc/Kconfig > create mode 100644 drivers/net/ethernet/synopsys/dwc/Makefile > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-debugfs.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ethtool.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth.h > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c > create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h >
Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
On Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer wrote: > (Was "libbpf: Synchronize implementations") > > Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the > samples/bpf/ code, then get rid of all of the duplicate BPF libraries in > samples/bpf/libbpf.[ch]. > > --- > v3: Add ack for first patch. > Split out second patch from v2 into separate changes for remaining diff. > Add patches to switch samples/bpf over to using tools/lib/. > v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html > Don't shift non-bpf code into libbpf. > Drop the patch to synchronize ELF definitions with tc. > v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html > First post. Thanks, applied after addressing the -I$(objtree) issue raised by Wang, - Arnaldo
Re: [PATCHv3 perf/core 6/7] samples/bpf: Remove perf_event_open() declaration
Em Thu, Dec 08, 2016 at 06:46:19PM -0800, Joe Stringer escreveu: > This declaration was made in samples/bpf/libbpf.c for convenience, but > there's already one in tools/perf/perf-sys.h. Reuse that one. > > Signed-off-by: Joe Stringer> --- > v3: First post. > --- > samples/bpf/Makefile| 3 ++- > samples/bpf/bpf_load.c | 3 ++- > samples/bpf/libbpf.c| 7 --- > samples/bpf/libbpf.h| 3 --- > samples/bpf/sampleip_user.c | 3 ++- > samples/bpf/trace_event_user.c | 9 + > samples/bpf/trace_output_user.c | 3 ++- > samples/bpf/tracex6_user.c | 3 ++- > 8 files changed, 15 insertions(+), 19 deletions(-) > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > index c8f7ed37b2de..0adc47e67e65 100644 > --- a/samples/bpf/Makefile > +++ b/samples/bpf/Makefile > @@ -92,7 +92,8 @@ always += test_current_task_under_cgroup_kern.o > always += trace_event_kern.o > always += sampleip_kern.o > > -HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ > +HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ \ > + -I$(objtree)/tools/include -I$(objtree)/tools/perf Switching these to $(srctree) as well, to support building it like: make -j4 O=../build/v4.9.0-rc8+ samples/bpf/ > > HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable > HOSTLOADLIBES_fds_example += -lelf > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c > index f8e3c58a0897..d683bd278171 100644 > --- a/samples/bpf/bpf_load.c > +++ b/samples/bpf/bpf_load.c > @@ -19,6 +19,7 @@ > #include > #include "libbpf.h" > #include "bpf_load.h" > +#include "perf-sys.h" > > #define DEBUGFS "/sys/kernel/debug/tracing/" > > @@ -168,7 +169,7 @@ static int load_and_attach(const char *event, struct > bpf_insn *prog, int size) > id = atoi(buf); > attr.config = id; > > - efd = perf_event_open(, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0); > + efd = sys_perf_event_open(, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, > 0); > if (efd < 0) { > printf("event %d fd %d err %s\n", id, efd, strerror(errno)); > return -1; 
> diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c > index d9af876b4a2c..bee473a494f1 100644 > --- a/samples/bpf/libbpf.c > +++ b/samples/bpf/libbpf.c > @@ -34,10 +34,3 @@ int open_raw_sock(const char *name) > > return sock; > } > - > -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu, > - int group_fd, unsigned long flags) > -{ > - return syscall(__NR_perf_event_open, attr, pid, cpu, > -group_fd, flags); > -} > diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h > index cc815624aacf..09aedc320009 100644 > --- a/samples/bpf/libbpf.h > +++ b/samples/bpf/libbpf.h > @@ -188,7 +188,4 @@ struct bpf_insn; > /* create RAW socket and bind to interface 'name' */ > int open_raw_sock(const char *name); > > -struct perf_event_attr; > -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu, > - int group_fd, unsigned long flags); > #endif > diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c > index 09ab620b324c..476a11947180 100644 > --- a/samples/bpf/sampleip_user.c > +++ b/samples/bpf/sampleip_user.c > @@ -21,6 +21,7 @@ > #include > #include "libbpf.h" > #include "bpf_load.h" > +#include "perf-sys.h" > > #define DEFAULT_FREQ 99 > #define DEFAULT_SECS 5 > @@ -50,7 +51,7 @@ static int sampling_start(int *pmu_fd, int freq) > }; > > for (i = 0; i < nr_cpus; i++) { > - pmu_fd[i] = perf_event_open(_sample_attr, -1 /* pid */, i, > + pmu_fd[i] = sys_perf_event_open(_sample_attr, -1 /* pid */, > i, > -1 /* group_fd */, 0 /* flags */); > if (pmu_fd[i] < 0) { > fprintf(stderr, "ERROR: Initializing perf sampling\n"); > diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c > index de8fd0266d78..ccb0cba8324a 100644 > --- a/samples/bpf/trace_event_user.c > +++ b/samples/bpf/trace_event_user.c > @@ -20,6 +20,7 @@ > #include > #include "libbpf.h" > #include "bpf_load.h" > +#include "perf-sys.h" > > #define SAMPLE_FREQ 50 > > @@ -126,9 +127,9 @@ static void test_perf_event_all_cpu(struct > perf_event_attr 
*attr) > > /* open perf_event on all cpus */ > for (i = 0; i < nr_cpus; i++) { > - pmu_fd[i] = perf_event_open(attr, -1, i, -1, 0); > + pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0); > if (pmu_fd[i] < 0) { > - printf("perf_event_open failed\n"); > + printf("sys_perf_event_open failed\n"); > goto all_cpu_err; > } > assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == > 0); > @@ -147,9 +148,9 @@ static void test_perf_event_task(struct perf_event_attr >
Re: 4.9.0-rc8: tg3 dead after resume
On Thu, Dec 8, 2016 at 4:03 AM, Siva Reddy Kallam wrote: > On Thu, Dec 8, 2016 at 12:14 AM, Billy Shuman wrote: >> On Wed, Dec 7, 2016 at 12:37 PM, Michael Chan >> wrote: >>> On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman wrote: After resume on 4.9.0-rc8 tg3 is dead. In logs I see: kernel: tg3 :44:00.0: phy probe failed, err -19 kernel: tg3 :44:00.0: Problem fetching invariants of chip, aborting >>> >>> -19 is -ENODEV which means tg3 cannot read the PHY ID. >>> >>> If it's a true suspend/resume operation, the driver does not have to >>> go through probe during resume. Please explain how you do >>> suspend/resume. >>> >> >> Sorry my previous message was accidentally sent too early. >> >> I used systemd (systemctl suspend) to suspend. >> > We need more information to proceed further. > Without suspend, Are you able to use the tg3 port? Yes, the port works fine without suspend. > Which Broadcom card are you having in laptop? The NIC is a NetXtreme BCM57762 Gigabit Ethernet PCIe in a Thunderbolt 3 dock. > Please provide complete tg3 specific logs in dmesg. 
> [ 32.084010] tg3.c:v3.137 (May 11, 2014) [ 32.124695] tg3 :44:00.0 eth0: Tigon3 [partno(BCM957762) rev 57766001] (PCI Express) MAC address 98:e7:f4:8b:13:19 [ 32.124698] tg3 :44:00.0 eth0: attached PHY is 57765 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) [ 32.124699] tg3 :44:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1] [ 32.124700] tg3 :44:00.0 eth0: dma_rwctrl[0001] dma_mask[64-bit] [ 32.219764] tg3 :44:00.0 enp68s0: renamed from eth0 [ 36.219245] tg3 :44:00.0 enp68s0: Link is up at 1000 Mbps, full duplex [ 36.219250] tg3 :44:00.0 enp68s0: Flow control is on for TX and on for RX [ 36.219251] tg3 :44:00.0 enp68s0: EEE is disabled after resume [ 92.292838] tg3 :44:00.0 enp68s0: No firmware running [ 93.521744] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE= [ 106.704655] tg3 :44:00.0 enp68s0: Link is down [ 108.370356] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE= after rmmod, modprobe [ 570.933636] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE= [ 604.847215] tg3.c:v3.137 (May 11, 2014) [ 605.010075] tg3 :44:00.0: phy probe failed, err -19 [ 605.010077] tg3 :44:00.0: Problem fetching invariants of chip, aborting >>> Did this work before? There has been very few changes to tg3 recently. >>> >> >> This is a new laptop for me, but the same behavior is seen on 4.4.36 and >> 4.8.12. >> rmmod and modprobe does not fix the problem only a reboot resolves the issue. Billy
Re: [PATCHv3 perf/core 3/7] tools lib bpf: Add flags to bpf_create_map()
On Fri, Dec 09, 2016 at 11:36:18AM +0800, Wangnan (F) wrote:
>
>
> On 2016/12/9 10:46, Joe Stringer wrote:
> > The map_flags argument to bpf_create_map() was previously not exposed.
> > By exposing it, users can access flags such as whether or not to
> > preallocate the map.
> >
> > Signed-off-by: Joe Stringer
>
> Please mention commit 6c90598174322b029e40dd84a4eb01f56afe in
> commit message:
>
> Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces
> map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new
> parameter in libbpf.

will do it, thanks.

- Arnaldo

> Acked-by: Wang Nan
>
> > ---
> > v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf".
> > ---
> >  tools/lib/bpf/bpf.c    | 3 ++-
> >  tools/lib/bpf/bpf.h    | 2 +-
> >  tools/lib/bpf/libbpf.c | 3 ++-
> >  3 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 89e8e8e5b60e..d0afb26c2e0f 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -54,7 +54,7 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> >  }
> >
> >  int bpf_create_map(enum bpf_map_type map_type, int key_size,
> > -                   int value_size, int max_entries)
> > +                   int value_size, int max_entries, __u32 map_flags)
> >  {
> >          union bpf_attr attr;
> >
> > @@ -64,6 +64,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
> >          attr.key_size = key_size;
> >          attr.value_size = value_size;
> >          attr.max_entries = max_entries;
> > +        attr.map_flags = map_flags;
> >
> >          return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
> >  }
> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> > index 61130170a6ad..7fcdce16fd62 100644
> > --- a/tools/lib/bpf/bpf.h
> > +++ b/tools/lib/bpf/bpf.h
> > @@ -24,7 +24,7 @@
> >  #include
> >
> >  int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
> > -                   int max_entries);
> > +                   int max_entries, __u32 map_flags);
> >
> >  /* Recommend log buffer size */
> >  #define BPF_LOG_BUF_SIZE 65536
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2e974593f3e8..84e6b35da4bd 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -854,7 +854,8 @@ bpf_object__create_maps(struct bpf_object *obj)
> >                  *pfd = bpf_create_map(def->type,
> >                                        def->key_size,
> >                                        def->value_size,
> > -                                      def->max_entries);
> > +                                      def->max_entries,
> > +                                      0);
> >                  if (*pfd < 0) {
> >                          size_t j;
> >                          int err = *pfd;
>
Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
On Fri, 2016-12-09 at 11:24 +0100, Pablo Neira Ayuso wrote:
> Hi Paul,

Hi Pablo

Given that bytes/packets counters are modified without cmpxchg64() :

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
                                       struct nft_regs *regs,
                                       const struct nft_pktinfo *pkt)
{
        struct nft_counter_percpu *this_cpu;

        local_bh_disable();
        this_cpu = this_cpu_ptr(priv->counter);
        u64_stats_update_begin(&this_cpu->syncp);
        this_cpu->counter.bytes += pkt->skb->len;
        this_cpu->counter.packets++;
        u64_stats_update_end(&this_cpu->syncp);
        local_bh_enable();
}

It means that the cmpxchg64() used to clear the stats is not good enough.

It does not help to make sure stats are properly cleared.

On 64 bit, the ->syncp is not there, so the nft_counter_reset() might
not see that a bytes or packets counter was modified by another cpu.

CPU 1                             CPU 2

LOAD PTR->BYTES into REG_A        old = *counter;
REG_A += skb->len;
                                  cmpxchg64(counter, old, 0);
PTR->BYTES = REG_A

It looks like you want a seqcount, even on 64bit arches,
so that CPU 2 can restart its loop, and more importantly you need
to not accumulate the values you read, because they might be old/invalid.

Another way would be to not use cmpxchg64() at all.
Way too expensive in fast path !

The percpu value would never be modified by any cpu other than the owner.

You need a per cpu seqcount, no need to add a syncp per nft percpu counter.

static DEFINE_PER_CPU(seqcount_t, nft_pcpu_seq);

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
                                       struct nft_regs *regs,
                                       const struct nft_pktinfo *pkt)
{
        struct nft_counter_percpu *this_cpu;
        seqcount_t *myseq;

        local_bh_disable();
        this_cpu = this_cpu_ptr(priv->counter);
        myseq = this_cpu_ptr(&nft_pcpu_seq);

        write_seqcount_begin(myseq);

        this_cpu->counter.bytes += pkt->skb->len;
        this_cpu->counter.packets++;

        write_seqcount_end(myseq);

        local_bh_enable();
}

Thanks !
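[Editorial sketch, not part of the original message.] The message above shows only the writer side of the proposed seqcount scheme. A rough userspace model of both sides, using C11 atomics instead of the kernel's seqcount_t, might look like the following. The names pcpu_counter, counter_add and counter_read are made up for illustration, and every access is sequentially consistent for simplicity, which the kernel primitives do not require. The point it tries to show is that the reader overwrites its snapshot on each retry rather than accumulating.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one per-CPU counter plus its seqcount. */
struct pcpu_counter {
    atomic_uint seq;           /* even: stable, odd: update in progress */
    _Atomic uint64_t bytes;
    _Atomic uint64_t packets;
};

/* Writer: must only be called by the single owner of the counter. */
static void counter_add(struct pcpu_counter *c, uint64_t len)
{
    assert((atomic_load(&c->seq) & 1) == 0);   /* no nested writers */
    atomic_fetch_add(&c->seq, 1);              /* begin: seq becomes odd */
    atomic_fetch_add(&c->bytes, len);
    atomic_fetch_add(&c->packets, 1);
    atomic_fetch_add(&c->seq, 1);              /* end: seq becomes even */
}

/* Reader: retries until it sees a snapshot not torn by a concurrent update. */
static void counter_read(struct pcpu_counter *c,
                         uint64_t *bytes, uint64_t *packets)
{
    unsigned int start;

    do {
        start = atomic_load(&c->seq);
        /* Overwrite on every attempt; never accumulate across retries. */
        *bytes = atomic_load(&c->bytes);
        *packets = atomic_load(&c->packets);
    } while ((start & 1) || atomic_load(&c->seq) != start);
}
```

In the kernel the reader would presumably use read_seqcount_begin() and read_seqcount_retry() on the per-cpu seqcount; this model only mirrors that retry structure.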
Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
Hello, John. On Thu, Dec 08, 2016 at 09:39:38PM -0800, John Stultz wrote: > So just to clarify the discussion for my purposes and make sure I > understood, per-cgroup CAP rules was not desired, and instead we > should either utilize an existing cap (are there still objections to > CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie, > bring back the older CAP_CGROUP_MIGRATE patch). Let's create a new one. It looks to be a bit too different to share with an existing one. > Tejun: Do you have a more finished version of your patch that I should > add my changes on top of? Oh, just submit the patch on top of the current for-next. I can queue mine on top of yours. They are mostly orthogonal. Thanks. -- tejun
[PATCH] net: smsc911x: back out silently on probe deferrals
When trying to get a regulator we may get deferred and we see
this noise:

smsc911x 1b80.ethernet-ebi2 (unnamed net_device) (uninitialized):
   couldn't get regulators -517

Then the driver continues anyway, which means that the regulator
may not be properly retrieved and reference counted, and may be
switched off in case no one else is using it.

Fix this by returning silently on deferred probe and let the
system work it out.

Cc: Jeremy Linton
Signed-off-by: Linus Walleij
---
 drivers/net/ethernet/smsc/smsc911x.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 86b7c04e3738..c492e4ffd9e7 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -442,9 +442,16 @@ static int smsc911x_request_resources(struct platform_device *pdev)
         ret = regulator_bulk_get(&pdev->dev,
                         ARRAY_SIZE(pdata->supplies),
                         pdata->supplies);
-        if (ret)
+        if (ret) {
+                /*
+                 * Retry on deferrals, else just report the error
+                 * and try to continue.
+                 */
+                if (ret == -EPROBE_DEFER)
+                        return ret;
                 netdev_err(ndev, "couldn't get regulators %d\n",
                                 ret);
+        }

         /* Request optional RESET GPIO */
         pdata->reset_gpiod = devm_gpiod_get_optional(&pdev->dev,
--
2.7.4
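[Editorial sketch, not part of the original patch.] The control flow the patch introduces can be modelled outside the kernel roughly as below. handle_regulator_result and the logged flag are hypothetical names used only here, and EPROBE_DEFER is defined locally as 517 to match the -517 in the log message quoted above. A deferral is passed back to the caller without logging, any other error is reported and then ignored, and success falls through.

```c
#include <stdio.h>

#define EPROBE_DEFER 517   /* kernel-internal value, matching the -517 in the log */

/*
 * Hypothetical model of the patched error handling; ret plays the role
 * of the value returned by regulator_bulk_get().
 */
static int handle_regulator_result(int ret, int *logged)
{
    *logged = 0;
    if (ret) {
        if (ret == -EPROBE_DEFER)
            return ret;        /* back out quietly so probe can be retried */
        fprintf(stderr, "couldn't get regulators %d\n", ret);
        *logged = 1;           /* report the error, then try to continue */
    }
    return 0;
}
```

Returning the deferral code from probe is what lets the driver core retry the probe later, once the regulator provider has registered.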
Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
On 16-12-08 10:23 PM, Hayes Wang wrote:
> Mark Lord
>
> I find an issue about autosuspend, and it may result in the same
> problem with you. I don't sure if this is helpful to you, because
> it only occurs when enabling the autosuspend.

Thanks. I am using ASIX adapters now.

I did try the latest 4.9-rc8, and 4.8.12 kernels with the r8152 dongle yesterday, in hope that perhaps the many EHCI fixes from those kernels might help out.

The dongle was unusable with those newer kernels. Most of the time it failed with "Get ether addr fail\n" at startup. On the occasions where it got past that point, it often failed the DHCP negotiation, but this looks more like a bug elsewhere in the kernel, possibly racing against initialization of the random number generators.

Adding a 2-second sleep to the r8152 probe function made this error mostly go away.

Cheers
--
Mark Lord
[PATCH] net:ethernet:samsung:initialize cur_rx_qnum
This patch initializes cur_rx_qnum when an rx interrupt occurs.
Without this initialization the driver will not work with
configurations that use multiple rx queues.

NOTE: This patch is not tested on actual hw.
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
index ea44a24..580a1a4 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
@@ -1681,6 +1681,7 @@ static irqreturn_t sxgbe_rx_interrupt(int irq, void *dev_id)
         struct sxgbe_rx_queue *rxq = (struct sxgbe_rx_queue *)dev_id;
         struct sxgbe_priv_data *priv = rxq->priv_ptr;

+        priv->cur_rx_qnum = rxq->queue_no;
         /* get the channel status */
         status = priv->hw->dma->rx_dma_int_status(priv->ioaddr, rxq->queue_no,
                                                   &priv->xstats);
--
1.9.1
Re: netlink: GPF in sock_sndtimeo
On 2016-12-09 12:53, Dmitry Vyukov wrote: > On Fri, Dec 9, 2016 at 12:48 PM, Richard Guy Briggswrote: > > On 2016-12-09 11:49, Dmitry Vyukov wrote: > >> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs wrote: > >> > On 2016-11-29 23:52, Richard Guy Briggs wrote: > >> > I tried a quick compile attempt on the test case (I assume it is a > >> > socket fuzzer) and get the following compile error: > >> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c > >> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined > >> > : warning: this is the location of the previous definition > >> > socket_fuzz.c: In function ‘segv_handler’: > >> > socket_fuzz.c:89: warning: implicit declaration of function > >> > ‘__atomic_load_n’ > >> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in > >> > this function) > >> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only > >> > once > >> > socket_fuzz.c:89: error: for each function it appears in.) > >> > socket_fuzz.c: In function ‘loop’: > >> > socket_fuzz.c:280: warning: unused variable ‘errno0’ > >> > socket_fuzz.c: In function ‘test’: > >> > socket_fuzz.c:303: warning: implicit declaration of function > >> > ‘__atomic_fetch_add’ > >> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in > >> > this function) > >> > socket_fuzz.c:303: warning: implicit declaration of function > >> > ‘__atomic_fetch_sub’ > >> > >> -std=gnu99 should help > >> ignore warnings > > > > I got a little further, left with "__ATOMIC_RELAXED undeclared", > > "__ATOMIC_SEQ_CST > > undeclared" under gcc 4.4.7-16. > > > > gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'" > > add -lrt Ok, that helped. Thanks! > > What compiler version do you recommend? > > 6.x sounds reasonable > 4.4 branch is 7.5 years old, surprised that it does not disintegrate > into dust yet :) These are under RHEL6... so there are updates to them, but yeah, they are old. 
> >> >> - RGB > >> > > >> > - RGB > > > > - RGB > > > > -- > > Richard Guy Briggs > > Kernel Security Engineering, Base Operating Systems, Red Hat > > Remote, Ottawa, Canada > > Voice: +1.647.777.2635, Internal: (81) 32635 - RGB -- Richard Guy Briggs Kernel Security Engineering, Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635
Re: [PATCH V2 21/22] bnxt_re: Add QP event handling
Hello! On 12/9/2016 9:48 AM, Selvin Xavier wrote: Implements callback handler for processing affiliated Async events of a QP. This patch also implements the control path command completion handling. Signed-off-by: Eddie WaiSigned-off-by: Devesh Sharma Signed-off-by: Somnath Kotur Signed-off-by: Sriharsha Basavapatna Signed-off-by: Selvin Xavier --- drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c | 49 ++ 1 file changed, 49 insertions(+) diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c index 5b71acd..3e6bb3f 100644 --- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c +++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c @@ -246,6 +246,46 @@ static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw, return 0; } +static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, + struct creq_qp_event *qp_event) +{ + struct bnxt_qplib_crsq *crsq = >crsq; + struct bnxt_qplib_hwq *cmdq = >cmdq; + struct bnxt_qplib_crsqe *crsqe; + u16 cbit, cookie, blocked = 0; + unsigned long flags; + u32 sw_cons; + + switch (qp_event->event) { + case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: + break; + default: + { + /* Command Response */ + spin_lock_irqsave(>lock, flags); + sw_cons = HWQ_CMP(crsq->cons, crsq); + crsqe = >crsq[sw_cons]; + crsq->cons++; + memcpy(>qp_event, qp_event, sizeof(crsqe->qp_event)); + + cookie = le16_to_cpu(crsqe->qp_event.cookie); + blocked = cookie & RCFW_CMD_IS_BLOCKING; + cookie &= RCFW_MAX_COOKIE_VALUE; + cbit = cookie % RCFW_MAX_OUTSTANDING_CMD; + if (!test_and_clear_bit(cbit, rcfw->cmdq_bitmap)) + dev_warn(>pdev->dev, +"QPLIB: CMD bit %d was not requested", cbit); + + cmdq->cons += crsqe->req_size; + spin_unlock_irqrestore(>lock, flags); + if (!blocked) + wake_up(>waitq); + break; + } + } Hum, strange indentation... Not seeing why you need {} in the *default* at all... 
+ return 0; +} + /* SP - CREQ Completion handlers */ static void bnxt_qplib_service_creq(unsigned long data) { @@ -269,6 +309,15 @@ static void bnxt_qplib_service_creq(unsigned long data) type = creqe->type & CREQ_BASE_TYPE_MASK; switch (type) { case CREQ_BASE_TYPE_QP_EVENT: + if (!bnxt_qplib_process_qp_event + (rcfw, (struct creq_qp_event *)creqe)) + rcfw->creq_qp_event_processed++; + else { CodingStyle: there should be {} used in all branches if it's used on at least branch of *if*. + dev_warn(>pdev->dev, "QPLIB: crsqe with"); + dev_warn(>pdev->dev, +"QPLIB: type = 0x%x not handled", +type); + } break; case CREQ_BASE_TYPE_FUNC_EVENT: if (!bnxt_qplib_process_func_event MBR, Sergei
Re: [PATCH] ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
Hello!

On 12/9/2016 6:08 AM, Zheng Li wrote:

> From: zheng li
>
> There is an inconsitent conditional judgement in __ip_append_data and

   Inconsistent.

> ip_finish_output functions, the variable length in __ip_append_data just
> include the length of applicatoin's payload and udp header, don't include

   Application.

> the length of ip header, but in ip_finish_output use
> (skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
> length of ip header.
>
> That cuase some particular applicatoin's udp payload whose length is

   Causes, application.

> between (MTU - IP Header) and MTU were framented by ip_fragment even

   Fragmented.

> though the rst->dev support UFO feature.
>
> Add the length of ip header to length in __ip_append_data to keep
> consistent conditional judgement as ip_finish_output for ip fragment.
>
> Signed-off-by: Zheng Li
[...]

MBR, Sergei
pull-request: mac80211-next 2016-12-09
Hi Dave, Closing net-next caught me by surprise, so I had to rebase a bit, but these three patches really should go in soon. I'm not sending them for 4.9 this late though. Please pull and let me know if there's any problem. Thanks, johannes The following changes since commit f81a8a02bb3b3e882ba6aa580230c13b5be64849: Merge branch 'mV88e6xxx-interrupt-fixes' (2016-11-20 21:16:14 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2016-12-09 for you to fetch changes up to e6f462df9acd2a3295e5d34eb29e2823220cf129: cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts (2016-12-09 12:57:49 +0100) Three fixes: * fix a logic bug introduced by a previous cleanup * fix nl80211 attribute confusing (trying to use a single attribute for two purposes) * fix a long-standing BSS leak that happens when an association attempt is abandoned Johannes Berg (2): nl80211: fix logic inversion in start_nan() cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts Vamsi Krishna (1): nl80211: Use different attrs for BSSID and random MAC addr in scan req include/net/cfg80211.h | 11 +++ include/uapi/linux/nl80211.h | 7 ++- net/mac80211/mlme.c | 21 - net/wireless/core.h | 1 + net/wireless/mlme.c | 12 net/wireless/nl80211.c | 18 -- net/wireless/sme.c | 14 ++ 7 files changed, 72 insertions(+), 12 deletions(-)
Re: netlink: GPF in sock_sndtimeo
On Fri, Dec 9, 2016 at 12:48 PM, Richard Guy Briggswrote: > On 2016-12-09 11:49, Dmitry Vyukov wrote: >> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs wrote: >> > On 2016-11-29 23:52, Richard Guy Briggs wrote: >> > I tried a quick compile attempt on the test case (I assume it is a >> > socket fuzzer) and get the following compile error: >> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c >> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined >> > : warning: this is the location of the previous definition >> > socket_fuzz.c: In function ‘segv_handler’: >> > socket_fuzz.c:89: warning: implicit declaration of function >> > ‘__atomic_load_n’ >> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this >> > function) >> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once >> > socket_fuzz.c:89: error: for each function it appears in.) >> > socket_fuzz.c: In function ‘loop’: >> > socket_fuzz.c:280: warning: unused variable ‘errno0’ >> > socket_fuzz.c: In function ‘test’: >> > socket_fuzz.c:303: warning: implicit declaration of function >> > ‘__atomic_fetch_add’ >> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this >> > function) >> > socket_fuzz.c:303: warning: implicit declaration of function >> > ‘__atomic_fetch_sub’ >> >> -std=gnu99 should help >> ignore warnings > > I got a little further, left with "__ATOMIC_RELAXED undeclared", > "__ATOMIC_SEQ_CST > undeclared" under gcc 4.4.7-16. > > gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'" add -lrt > What compiler version do you recommend? 6.x sounds reasonable 4.4 branch is 7.5 years old, surprised that it does not disintegrate into dust yet :) >> >> - RGB >> > >> > - RGB > > - RGB > > -- > Richard Guy Briggs > Kernel Security Engineering, Base Operating Systems, Red Hat > Remote, Ottawa, Canada > Voice: +1.647.777.2635, Internal: (81) 32635
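[Editorial sketch, not part of the original thread.] The "__ATOMIC_RELAXED undeclared" errors under gcc 4.4 are consistent with the __atomic builtins having been added in GCC 4.7; older GCC releases offer only the legacy __sync builtins. If the test case had to build with an old compiler, one possible workaround is a small compatibility layer along these lines. The fuzz_* macro names are invented for this example and do not come from socket_fuzz.c, and the __sync fallbacks are full barriers, so they are stronger than the relaxed load they replace.

```c
/*
 * GCC >= 4.7 and clang provide the __atomic builtins; older GCC only has
 * the __sync family. clang reports itself as GCC 4.2, hence the extra check.
 */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
#define fuzz_fetch_add(p, v) __atomic_fetch_add((p), (v), __ATOMIC_SEQ_CST)
#define fuzz_fetch_sub(p, v) __atomic_fetch_sub((p), (v), __ATOMIC_SEQ_CST)
#define fuzz_load(p)         __atomic_load_n((p), __ATOMIC_RELAXED)
#else
#define fuzz_fetch_add(p, v) __sync_fetch_and_add((p), (v))
#define fuzz_fetch_sub(p, v) __sync_fetch_and_sub((p), (v))
#define fuzz_load(p)         __sync_fetch_and_add((p), 0)   /* atomic read via no-op add */
#endif
```

The separate "undefined reference to clock_gettime" link error fits a glibc older than 2.17, where that function still lives in librt, which is why adding -lrt helps.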
Re: netlink: GPF in sock_sndtimeo
On 2016-12-09 11:49, Dmitry Vyukov wrote: > On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggswrote: > > On 2016-11-29 23:52, Richard Guy Briggs wrote: > > I tried a quick compile attempt on the test case (I assume it is a > > socket fuzzer) and get the following compile error: > > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c > > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined > > : warning: this is the location of the previous definition > > socket_fuzz.c: In function ‘segv_handler’: > > socket_fuzz.c:89: warning: implicit declaration of function > > ‘__atomic_load_n’ > > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this > > function) > > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once > > socket_fuzz.c:89: error: for each function it appears in.) > > socket_fuzz.c: In function ‘loop’: > > socket_fuzz.c:280: warning: unused variable ‘errno0’ > > socket_fuzz.c: In function ‘test’: > > socket_fuzz.c:303: warning: implicit declaration of function > > ‘__atomic_fetch_add’ > > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this > > function) > > socket_fuzz.c:303: warning: implicit declaration of function > > ‘__atomic_fetch_sub’ > > -std=gnu99 should help > ignore warnings I got a little further, left with "__ATOMIC_RELAXED undeclared", "__ATOMIC_SEQ_CST undeclared" under gcc 4.4.7-16. gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'" What compiler version do you recommend? > >> - RGB > > > > - RGB - RGB -- Richard Guy Briggs Kernel Security Engineering, Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635
[PATCH] uio-hv-generic: store physical addresses instead of virtual
gcc warns about the newly added driver when phys_addr_t is wider than a pointer: drivers/uio/uio_hv_generic.c: In function 'hv_uio_mmap': drivers/uio/uio_hv_generic.c:71:17: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT, drivers/uio/uio_hv_generic.c: In function 'hv_uio_probe': drivers/uio/uio_hv_generic.c:140:5: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] = (phys_addr_t)dev->channel->ringbuffer_pages; drivers/uio/uio_hv_generic.c:147:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] (phys_addr_t)vmbus_connection.int_page; drivers/uio/uio_hv_generic.c:153:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] (phys_addr_t)vmbus_connection.monitor_pages[1]; I can't see why we store a virtual address in a phys_addr_t here, as the only user of that variable converts it into a physical address anyway, so this moves the conversion to where it logically fits according to the types. 
Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus") Signed-off-by: Arnd Bergmann--- drivers/uio/uio_hv_generic.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c index ad3ab5805ad8..50958f167305 100644 --- a/drivers/uio/uio_hv_generic.c +++ b/drivers/uio/uio_hv_generic.c @@ -68,7 +68,7 @@ hv_uio_mmap(struct uio_info *info, struct vm_area_struct *vma) mi = (int)vma->vm_pgoff; return remap_pfn_range(vma, vma->vm_start, - virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT, + info->mem[mi].addr >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot); } @@ -137,20 +137,20 @@ hv_uio_probe(struct hv_device *dev, /* mem resources */ pdata->info.mem[TXRX_RING_MAP].name = "txrx_rings"; pdata->info.mem[TXRX_RING_MAP].addr - = (phys_addr_t)dev->channel->ringbuffer_pages; + = virt_to_phys(dev->channel->ringbuffer_pages); pdata->info.mem[TXRX_RING_MAP].size = dev->channel->ringbuffer_pagecount * PAGE_SIZE; pdata->info.mem[TXRX_RING_MAP].memtype = UIO_MEM_LOGICAL; pdata->info.mem[INT_PAGE_MAP].name = "int_page"; pdata->info.mem[INT_PAGE_MAP].addr = - (phys_addr_t)vmbus_connection.int_page; + virt_to_phys(vmbus_connection.int_page); pdata->info.mem[INT_PAGE_MAP].size = PAGE_SIZE; pdata->info.mem[INT_PAGE_MAP].memtype = UIO_MEM_LOGICAL; pdata->info.mem[MON_PAGE_MAP].name = "monitor_pages"; pdata->info.mem[MON_PAGE_MAP].addr = - (phys_addr_t)vmbus_connection.monitor_pages[1]; + virt_to_phys(vmbus_connection.monitor_pages[1]); pdata->info.mem[MON_PAGE_MAP].size = PAGE_SIZE; pdata->info.mem[MON_PAGE_MAP].memtype = UIO_MEM_LOGICAL; -- 2.9.0
Synopsys Ethernet QoS
Dear David Miller, These past 2 weeks we have been discussing the right way to go in terms of Synopsys QoS support in the kernel. The approach that raised more supporters was: a) Test /stmicro/stmmac driver in a reference hardware prototyping platform (QoS IPK) [Status: In Progress | 90% finished] b) Merge the necessary features from AXIS’ synopsys based qos driver to the /stmicro/stmmac [Status: In Queue] c) Rename /stmicro/stmmac driver to synopsys/ and re-factor the driver if necessary [Status: In Queue] d) Add QoS features incrementally to the new synopsys/ driver [Status: In Queue] This approach has the green light from AXIS and STMicro maintainers (Lars and Peppe). I would like to know if you support this plan. Best Regards, Joao
Re: Misalignment, MIPS, and ip_hdr(skb)->version
On Wed, 07 Dec 2016 23:34:21 -0500, Daniel Kahn Gillmor wrote: > fwiw, i'm not convinced that "most protocols of the IETF follow this > mantra". we've had multiple discussions in different protocol groups > about shaving or bloating by a few bytes here or there in different > protocols, and i don't think anyone has brought up memory alignment as > an argument in any of the discussions i've followed. Which is sad. One would expect that this would be well understood for decades already. Jiri
Re: [PATCH V2 22/22] bnxt_re: Add bnxt_re driver build support
Hi Selvin, [auto build test ERROR on rdma/master] [also build test ERROR on v4.9-rc8 next-20161208] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Selvin-Xavier/Broadcom-RoCE-Driver-bnxt_re/20161209-154823 base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master config: parisc-allyesconfig (attached as .config) compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All errors (new ones prefixed by >>): drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c: In function 'bnxt_qplib_creq_irq': >> drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c:359:2: error: implicit >> declaration of function 'prefetch' [-Werror=implicit-function-declaration] prefetch(_ptr[CREQ_PG(sw_cons)][CREQ_IDX(sw_cons)]); ^~~~ cc1: some warnings being treated as errors -- drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_service_nq': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:145:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] bnxt_qplib_arm_cq_enable((struct bnxt_qplib_cq *) ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:147:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] if (!nq->cqn_handler(nq, (struct bnxt_qplib_cq *) ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_nq_irq': >> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:182:2: error: implicit >> declaration of function 'prefetch' [-Werror=implicit-function-declaration] prefetch(_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)]); ^~~~ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_create_qp': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:484:16: warning: cast from pointer to integer 
of different size [-Wpointer-to-int-cast] psn_search = (unsigned long long int) ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_destroy_qp': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1071:22: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] __clean_cq(qp->scq, (u64)qp); ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1073:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] __clean_cq(qp->rcq, (u64)qp); ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function '__flush_sq': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1630:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cqe->qp_handle = (u64)qp; ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function '__flush_rq': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1664:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cqe->qp_handle = (u64)qp; ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_req': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1688:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle); ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1720:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cqe->qp_handle = (u64)qp; ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_rc': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1782:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle); ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1794:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cqe->qp_handle = (u64)qp; ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_ud': drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1836:7: 
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle); ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1847:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cqe->qp_handle = (u64)qp; ^ drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_raweth_qp1': dri
Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
On Fri 2016-12-09 00:19:43, Francois Romieu wrote: > Lino Sanfilippo: > [...] > > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if > > this is caused by that locking scheme (in a way I have not figured out yet) > > or if it is a different issue. > > stmmac_tx_err races with stmmac_xmit. Umm, yes, that looks real. And that means that removing tx_lock will not be completely trivial :-(. Lino, any ideas there? netif_tx_lock_irqsave() would help, but afaict that one does not exist. Plus, does someone know how to trigger the status == tx_hard_error? I tried powering down the switch, but that did not do it. Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] net: socket: preferred __aligned(size) for control buffer
Hello!

On 12/8/2016 3:51 PM, kushwah...@samsung.com wrote:

> From: Amit Kushwaha
>
> This patch cleanup checkpatch.pl warning
> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
>
> Signed-off-by: Amit Kushwaha
> ---
>  net/socket.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index e631894..5835383 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1,3 +1,4 @@
> +

   Why?

>  /*
>   * NET An implementation of the SOCKET network access protocol.
>   *
[...]

MBR, Sergei
Re: netlink: GPF in sock_sndtimeo
On 2016-12-08 22:57, Cong Wang wrote:
> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs wrote:
> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> > Eliminating the lock since the sock is dead anyways eliminates the error.
> >
> > Is it safe? I'll resubmit if this looks remotely sane. Meanwhile I'll try to
> > get the test case to compile.
>
> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
> are updated as a whole and race between audit_receive_msg() and
> NETLINK_URELEASE.

This is what I expected and why I originally added the mutex lock in the
callback... The dumps I got were bare with no wrapper identifying the
process context or specific error, so I'm at a bit of a loss how to
solve this (without thinking more about it) other than instinctively
removing the mutex.

Another approach might be to look at consolidating the three into one
identifier or derive the other two from one, or serialize their access.

> > @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
> >  {
> >          struct audit_net *aunet = net_generic(net, audit_net_id);
> >          struct sock *sock = aunet->nlsk;
> > +
> > +        mutex_lock(&audit_cmd_mutex);
> >          if (sock == audit_sock) {
> >                  audit_pid = 0;
> > +                audit_nlk_portid = 0;
> >                  audit_sock = NULL;
> >          }
> > +        mutex_unlock(&audit_cmd_mutex);
>
> If you decide to use NETLINK_URELEASE notifier, the above piece is no
> longer needed, the net_exit path simply releases a refcnt.

Good point. It would have already killed it off. So this piece is
arguably too late anyways.

- RGB

--
Richard Guy Briggs
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635
[PATCH] bnxt_re: fix itnull.cocci warnings
list_for_each_entry iterator variable cannot be NULL. Generated by: scripts/coccinelle/iterators/itnull.cocci CC: Selvin Xavier <selvin.xav...@broadcom.com> Signed-off-by: Julia Lawall <julia.law...@lip6.fr> Signed-off-by: Fengguang Wu <fengguang...@intel.com> --- url: https://github.com/0day-ci/linux/commits/Selvin-Xavier/Broadcom-RoCE-Driver-bnxt_re/20161209-154823 base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master I received some other warnings as well. Not sure if they have been passed along already: >> drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c:2455:4-14: code aligned with following code on line 2456 -- >> drivers/infiniband/hw/bnxtre/bnxt_re_main.c:1047:2-20: code aligned with following code on line 1048 drivers/infiniband/hw/bnxtre/bnxt_re_main.c:1188:3-43: code aligned with following code on line 1190 -- >> drivers/infiniband/hw/bnxtre/bnxt_re_main.c:834:6-8: ERROR: iterator variable bound on line 832 cannot be NULL -- >> drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c:2512:5-13: WARNING: Unsigned expression compared with zero: pkt_type < 0 bnxt_re_main.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c +++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c @@ -831,7 +831,7 @@ static void bnxt_re_dev_stop(struct bnxt mutex_lock(>qp_lock); list_for_each_entry(qp, >qp_list, list) { /* Modify the state of all QPs except QP1/Shadow QP */ - if (qp && !bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) { + if (!bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) { if (qp->qplib_qp.state != CMDQ_MODIFY_QP_NEW_STATE_RESET || qp->qplib_qp.state !=
Re: stmmac driver...
Hello Jie Deng

In your cover letter you wrote

  dwc-eth-xxx.x
    The DWC ethernet core layer (DWC ECL). This layer contains codes
    can be shared by different DWC series ethernet cores

Does this mean that code in dwc-eth-xxx.x is common to all the different
Synopsys IPs, GMAC, XGMAC and XLGMAC?

Regards,
Niklas

On Fri, Dec 9, 2016 at 11:05 AM, Jie Deng wrote:
>
>
> On 2016/12/8 23:25, David Miller wrote:
>> From: Alexandre Torgue
>> Date: Thu, 8 Dec 2016 14:55:04 +0100
>>
>>> Maybe I forget some series. Do you have others in mind ?
>> Please see the thread titled:
>>
>> "net: ethernet: Initial driver for Synopsys DWC XLGMAC"
>>
>> which seems to be discussing consolidation of various drivers
>> for the same IP core, of which stmmac is one.
>>
>> I personally am against any change of the driver name and
>> things like this, and wish the people doing that work would
>> simply contribute to making whatever changes they need directly
>> to the stmmac driver.
>>
>> You really need to voice your opinion when major changes are being
>> proposed for the driver you maintain.
>>
> Hi David and Alex,
>
> XLGMAC is not a version of GMAC. Synopsys has several IPs and each IP has
> several versions.
>
> GMAC(QoS): 3.5, 3.7, 4.0, 4.10, 4.20...
> XGMAC: 1.00, 1.10, 1.20, 2.00, 2.10, 2.11...
> XLGMAC (Synopsys DesignWare Core Enterprise Ethernet): this is a new IP.
>
> Regards,
> Jie
>
Re: netlink: GPF in sock_sndtimeo
On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs wrote:
> On 2016-11-29 23:52, Richard Guy Briggs wrote:
>> On 2016-11-29 15:13, Cong Wang wrote:
>> > On Tue, Nov 29, 2016 at 8:48 AM, Richard Guy Briggs wrote:
>> > > On 2016-11-26 17:11, Cong Wang wrote:
>> > >> It is racy on audit_sock, especially on the netns exit path.
>> > >
>> > > I think that is the only place it is racy.  The other places audit_sock
>> > > is set is when the socket failure has just triggered a reset.
>> > >
>> > > Is there a notifier callback for failed or reaped sockets?
>> >
>> > Is NETLINK_URELEASE event what you are looking for?
>>
>> Possibly, yes.  Thanks, I'll have a look.
>
> I tried a quick compile attempt on the test case (I assume it is a
> socket fuzzer) and get the following compile error:
> cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
> socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
> : warning: this is the location of the previous definition
> socket_fuzz.c: In function ‘segv_handler’:
> socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
> socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
> socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
> socket_fuzz.c:89: error: for each function it appears in.)
> socket_fuzz.c: In function ‘loop’:
> socket_fuzz.c:280: warning: unused variable ‘errno0’
> socket_fuzz.c: In function ‘test’:
> socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
> socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
> socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’

-std=gnu99 should help ignore warnings

> I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> Eliminating the lock since the sock is dead anways eliminates the error.
>
> Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> get the test case to compile.
>
> This is being tracked as https://github.com/linux-audit/audit-kernel/issues/30
>
> Subject: [PATCH] audit: proactively reset audit_sock on matching NETLINK_URELEASE
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f1ca116..91d222d 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -423,6 +423,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
>  				snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
>  				audit_log_lost(s);
>  				audit_pid = 0;
> +				audit_nlk_portid = 0;
>  				audit_sock = NULL;
>  			} else {
>  				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> @@ -1143,6 +1144,28 @@ static int audit_bind(struct net *net, int group)
>  	return 0;
>  }
>
> +static int audit_sock_netlink_notify(struct notifier_block *nb,
> +				     unsigned long event,
> +				     void *_notify)
> +{
> +	struct netlink_notify *notify = _notify;
> +	struct audit_net *aunet = net_generic(notify->net, audit_net_id);
> +
> +	if (event == NETLINK_URELEASE && notify->protocol == NETLINK_AUDIT) {
> +		if (audit_nlk_portid == notify->portid &&
> +		    audit_sock == aunet->nlsk) {
> +			audit_pid = 0;
> +			audit_nlk_portid = 0;
> +			audit_sock = NULL;
> +		}
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block audit_netlink_notifier = {
> +	.notifier_call = audit_sock_netlink_notify,
> +};
> +
>  static int __net_init audit_net_init(struct net *net)
>  {
>  	struct netlink_kernel_cfg cfg = {
> @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
>  	struct audit_net *aunet = net_generic(net, audit_net_id);
>  	struct sock *sock = aunet->nlsk;
> +
> +	mutex_lock(&audit_cmd_mutex);
>  	if (sock == audit_sock) {
>  		audit_pid = 0;
> +		audit_nlk_portid = 0;
>  		audit_sock = NULL;
>  	}
> +	mutex_unlock(&audit_cmd_mutex);
>
>  	RCU_INIT_POINTER(aunet->nlsk, NULL);
>  	synchronize_net();
> @@ -1202,6 +1229,7 @@ static int __init audit_init(void)
>  	audit_enabled = audit_default;
>  	audit_ever_enabled |= !!audit_default;
>
> +	netlink_register_notifier(&audit_netlink_notifier);
>  	audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
>
>  	for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> --
> 1.7.1
>
>
>> - RGB
>
> - RGB
>
> --
> Richard Guy Briggs
> Kernel Security
Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
Hi Paul,

On Thu, Dec 08, 2016 at 07:40:14PM -0500, Paul Gortmaker wrote:
> On Wed, Dec 7, 2016 at 4:52 PM, Pablo Neira Ayuso wrote:
> > This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic
> > dump-and-reset of the stateful object. This also comes with add support
> > for atomic dump and reset for counter and quota objects.
>
> This triggered a new build failure in linux-next on parisc-32, which a
> hands-off bisect run lists as resulting from this:
>
> ERROR: "__cmpxchg_u64" [net/netfilter/nft_counter.ko] undefined!
> make[2]: *** [__modpost] Error 1
> make[1]: *** [modules] Error 2
> make: *** [sub-make] Error 2
> 43da04a593d8b2626f1cf4b56efe9402f6b53652 is the first bad commit
> commit 43da04a593d8b2626f1cf4b56efe9402f6b53652
> Author: Pablo Neira Ayuso
> Date:   Mon Nov 28 00:05:44 2016 +0100
>
>     netfilter: nf_tables: atomic dump and reset for stateful objects
>
>     This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic
>     dump-and-reset of the stateful object. This also comes with add support
>     for atomic dump and reset for counter and quota objects.
>
>     Signed-off-by: Pablo Neira Ayuso
>
> :04 04 6cd4554f69247e5c837db52342f26888beda1623 5908aca93c89e7922336546c3753bfcf2aceefba M include
> :04 04 f25d5831eb30972436bd198c5bb237a0cb0b4856 4ee5751c8de02bb5a8dcaadb2a2df7986d90f8e9 M net
> bisect run success
>
> Guessing this is more an issue with parisc than it is with netfilter, but I
> figured I'd mention it anyway.

I'm planning to submit this patch to parisc, I'm attaching it to this
email.

From c9d320ac0be2a32a7b2bfad398be549865088ecf Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso
Date: Thu, 8 Dec 2016 22:55:33 +0100
Subject: [PATCH] parisc: export symbol __cmpxchg_u64()

kbuild test robot reports:

>> ERROR: "__cmpxchg_u64" [net/netfilter/nft_counter.ko] undefined!

Commit 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for
stateful objects") introduces the first client of cmpxchg64() from
modules.

Patch 54b668009076 ("parisc: Add native high-resolution sched_clock()
implementation") removed __cmpxchg_u64() dependency on CONFIG_64BIT.
So, let's fix this problem by exporting this symbol unconditionally.

Reported-by: kbuild test robot
Signed-off-by: Pablo Neira Ayuso
---
 arch/parisc/kernel/parisc_ksyms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 3cad8aadc69e..cfa704548cf3 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -40,8 +40,8 @@ EXPORT_SYMBOL(__atomic_hash);
 #endif
 #ifdef CONFIG_64BIT
 EXPORT_SYMBOL(__xchg64);
-EXPORT_SYMBOL(__cmpxchg_u64);
 #endif
+EXPORT_SYMBOL(__cmpxchg_u64);

 #include
 EXPORT_SYMBOL(lclear_user);
--
2.1.4
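For readers wondering why a counter module would need a 64-bit compare-and-exchange at all, here is a minimal userspace sketch of an atomic dump-and-reset. It uses C11 <stdatomic.h> in place of the kernel's cmpxchg64(), and struct counter, counter_add() and counter_dump_and_reset() are hypothetical names; the actual nft_counter implementation may differ in layout and in how it performs the reset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter; not the nft_counter layout. */
struct counter {
	_Atomic uint64_t packets;
};

static void counter_add(struct counter *c, uint64_t n)
{
	atomic_fetch_add(&c->packets, n);
}

/*
 * Return the current value and zero the counter in one atomic step.
 * If another thread changes the counter between the load and the swap,
 * the compare-exchange fails, refreshes 'old' and the loop retries, so
 * no increment is lost or reported twice.
 */
static uint64_t counter_dump_and_reset(struct counter *c)
{
	uint64_t old = atomic_load(&c->packets);

	while (!atomic_compare_exchange_weak(&c->packets, &old, 0))
		; /* 'old' now holds the latest value; try again */
	return old;
}
```

On a 32-bit architecture a 64-bit compare-exchange like this needs a helper provided by the architecture, which is the role __cmpxchg_u64() plays in the build error quoted above. For a single value an atomic exchange would also work; the loop form is shown because it mirrors the cmpxchg primitive named in the commit message.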
Re: [PATCH v3 0/4] vsock: cancel connect packets when failing to connect
On Fri, Dec 09, 2016 at 01:12:32AM +0800, Peng Tao wrote:
> Currently, if a connect call fails on a signal or timeout (e.g., guest is still
> in the process of starting up), we'll just return to caller and leave the connect
> packet queued and they are sent even though the connection is considered a failure,
> which can confuse applications with unwanted false connect attempt.
>
> The patchset enables vsock (both host and guest) to cancel queued packets when
> a connect attempt is considered to fail.
>
> v3 changelog:
>   - define cancel_pkt callback in struct vsock_transport rather than struct
>     virtio_transport
>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
> v2 changelog:
>   - fix queued_replies counting and resume tx/rx when necessary
>
>
> Peng Tao (4):
>   vsock: track pkt owner vsock
>   vhost-vsock: add pkt cancel capability
>   vsock: add pkt cancel capability
>   vsock: cancel packets when failing to connect
>
>  drivers/vhost/vsock.c                   | 41
>  include/linux/virtio_vsock.h            |  2 ++
>  include/net/af_vsock.h                  |  3 +++
>  net/vmw_vsock/af_vsock.c                | 14 +++
>  net/vmw_vsock/virtio_transport.c        | 42 +
>  net/vmw_vsock/virtio_transport_common.c |  7 ++
>  6 files changed, 109 insertions(+)

I'm happy, although I pointed out two unnecessary (void *) casts.

Please wait for Jorgen to be happy with the af_vsock.c changes before
applying.
Re: [PATCH v3 4/4] vsock: cancel packets when failing to connect
On Fri, Dec 09, 2016 at 01:12:36AM +0800, Peng Tao wrote:
> Otherwise we'll leave the packets queued until releasing vsock device.
> E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
> will get the connect requests from failed host sockets.
>
> Reviewed-by: Stefan Hajnoczi

Please do not include Reviewed-by: if the patch has undergone substantial
changes.

I am happy with this latest version:

Reviewed-by: Stefan Hajnoczi
Re: [PATCH v3 2/4] vhost-vsock: add pkt cancel capability
On Fri, Dec 09, 2016 at 01:12:34AM +0800, Peng Tao wrote:
> To allow canceling all packets of a connection.
>
> Reviewed-by: Stefan Hajnoczi
> Signed-off-by: Peng Tao
> ---
>  drivers/vhost/vsock.c  | 41 +
>  include/net/af_vsock.h |  3 +++
>  2 files changed, 44 insertions(+)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a504e2e0..db64d51 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>
> +static int
> +vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct vhost_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	/* Find the vhost_vsock according to guest context id  */
> +	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> +	if (!vsock)
> +		return -ENODEV;
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

It's not necessary to cast to void * in C.  Object pointers convert to
void * implicitly, without compiler warnings.  The warnings and explicit
casts are a C++ thing.
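The review comment about the cast can be checked with a small standalone C file. The types below are hypothetical stand-ins (struct owner for struct vsock_sock, struct pkt for struct virtio_vsock_pkt, and count_owned() is an invented helper); only the pointer rules are the point. In C, an object pointer may be assigned to, and compared for equality with, a void * without any cast.

```c
#include <assert.h>
#include <stddef.h>

struct owner { int id; };           /* stand-in for struct vsock_sock */
struct pkt { void *cancel_token; }; /* stand-in for the queued packet */

/* Counts packets whose token refers to 'o'; note the absence of casts. */
static int count_owned(const struct pkt *pkts, size_t n, struct owner *o)
{
	int cnt = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pkts[i].cancel_token == o) /* void * compared with struct owner * */
			cnt++;
	}
	return cnt;
}
```

Storing the token works the same way: an assignment such as pkt.cancel_token = &some_owner compiles cleanly in C without a cast, so the (void *) in the quoted hunk can simply be dropped.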
Re: [PATCH v3 3/4] vsock: add pkt cancel capability
On Fri, Dec 09, 2016 at 01:12:35AM +0800, Peng Tao wrote:
> Reviewed-by: Stefan Hajnoczi
> Signed-off-by: Peng Tao
> ---
>  net/vmw_vsock/virtio_transport.c | 42
>  1 file changed, 42 insertions(+)
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 936d7ee..95c1162 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -170,6 +170,47 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>
> +static int
> +virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct virtio_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	vsock = virtio_vsock_get();
> +	if (!vsock) {
> +		return -ENODEV;
> +	}
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

The cast is unnecessary here.
[PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.
There are hardware PCI implementations of Cadence GEM network
controller. This patch will allow to use such hardware with reuse of
existing Platform Driver.

Signed-off-by: Bartosz Folta
---
 drivers/net/ethernet/cadence/Kconfig    |   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c     |  31 +--
 drivers/net/ethernet/cadence/macb_pci.c | 152
 include/linux/platform_data/macb.h      |   6 ++
 5 files changed, 194 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..00d833e 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
 	  To compile this driver as a module, choose M here: the module
 	  will be called macb.

+config MACB_PCI
+	tristate "Cadence PCI MACB/GEM support"
+	depends on MACB
+	---help---
+	  This is PCI wrapper for MACB driver.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #

 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
 			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
 			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
 		}
+	} else {
+		phydev->irq = PHY_POLL;
 	}

 	/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
 			goto err_out_unregister_bus;
 		}
 	} else {
+		for (i = 0; i < PHY_MAX_ADDR; i++)
+			bp->mii_bus->irq[i] = PHY_POLL;
+
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;

@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
 			 struct clk **rx_clk)
 {
+	struct macb_platform_data *pdata;
 	int err;

-	*pclk = devm_clk_get(&pdev->dev, "pclk");
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		*pclk = pdata->pclk;
+		*hclk = pdata->hclk;
+	} else {
+		*pclk = devm_clk_get(&pdev->dev, "pclk");
+		*hclk = devm_clk_get(&pdev->dev, "hclk");
+	}
+
 	if (IS_ERR(*pclk)) {
 		err = PTR_ERR(*pclk);
 		dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
 		return err;
 	}

-	*hclk = devm_clk_get(&pdev->dev, "hclk");
 	if (IS_ERR(*hclk)) {
 		err = PTR_ERR(*hclk);
 		dev_err(&pdev->dev, "failed to get hclk (%u)\n", err);
@@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
 #endif /* CONFIG_OF */

+static const struct macb_config default_gem_config = {
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+	.dma_burst_length = 16,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+	.jumbo_max_len = 10240,
+};
+
 static int macb_probe(struct platform_device *pdev)
 {
+	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
 			struct clk **, struct clk **, struct clk **)
-					      = macb_clk_init;
-	int (*init)(struct platform_device *) = macb_init;
+					      = macb_config->clk_init;
+	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct device_node *phy_node;
-	const struct macb_config *macb_config = NULL;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 000..b440960
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,152 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo
+ *
Re: stmmac driver...
On 2016/12/8 23:25, David Miller wrote:
> From: Alexandre Torgue
> Date: Thu, 8 Dec 2016 14:55:04 +0100
>
>> Maybe I forget some series. Do you have others in mind ?
> Please see the thread titled:
>
> "net: ethernet: Initial driver for Synopsys DWC XLGMAC"
>
> which seems to be discussing consolidation of various drivers
> for the same IP core, of which stmmac is one.
>
> I personally am against any change of the driver name and
> things like this, and wish the people doing that work would
> simply contribute to making whatever changes they need directly
> to the stmmac driver.
>
> You really need to voice your opinion when major changes are being
> proposed for the driver you maintain.
>
Hi David and Alex,

XLGMAC is not a version of GMAC. Synopsys has several IPs and each IP has
several versions.

GMAC(QoS): 3.5, 3.7, 4.0, 4.10, 4.20...
XGMAC: 1.00, 1.10, 1.20, 2.00, 2.10, 2.11...
XLGMAC (Synopsys DesignWare Core Enterprise Ethernet): this is a new IP.

Regards,
Jie