[PATCH net] mpls: fix clearing of dead nh_flags on link up
From: Roopa Prabhu recent fixes to use WRITE_ONCE for nh_flags on link up, accidently ended up leaving the deadflags on a nh. This patch fixes the WRITE_ONCE to use freshly evaluated nh_flags. Fixes: 39eb8cd17588 ("net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE") Reported-by: Satish Ashok Signed-off-by: Roopa Prabhu --- net/mpls/af_mpls.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index 94b3317..b51582d 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -1483,7 +1483,7 @@ static void mpls_ifup(struct net_device *dev, unsigned int flags) continue; alive++; nh_flags &= ~flags; - WRITE_ONCE(nh->nh_flags, flags); + WRITE_ONCE(nh->nh_flags, nh_flags); } endfor_nexthops(rt); WRITE_ONCE(rt->rt_nhn_alive, alive); -- 1.9.1
Re: [PULL] topic/e1000e-fix
On Wed, May 31, 2017 at 7:54 AM, Daniel Vetter wrote: > On Wed, May 31, 2017 at 1:06 AM, Dave Airlie wrote: >> On 31 May 2017 at 08:10, David Miller wrote: >>> From: Daniel Vetter >>> Date: Tue, 30 May 2017 22:15:42 +0200 >>> If the e1000e maintainer wants to coalesce or not return statements this simple way, that's imo on him to change the color as needed. >>> >>> That's not how things work. >>> >>> If the maintainer wants you to style things a certain way, either you >>> do it that way or your patch isn't accepted. > > Consider this pull a regression report, pls handle it. And I guess I pile of more cc, to make this regression report complete. I mean you got the backtrace, bisect and a proposed fix, and the almost-whitespace change demanded is something gcc does in its sleep. I'd understand a request to retest if it would be a real functional change, but in this situation I have no idea why this regression just can't be fixed already. Not sure if it's really preferred if regression reports come incomplete, without bugfix and bisect attached. -Daniel >> I'm not really sure why Chris just couldn't respin already. >> >> Though really I think Chris should have just asked for a revert of the >> original patch that broke stuff, instead of trying to patch a driver >> if he doesn't have time to get the patch right for the maintainer. > > Ok, can we pls revert 2800209994f8 ("e1000e: Refactor PM flows") then? > > Thanks, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
[PATCH net-next] powerpc: use asm-generic/socket.h as much as possible
asm-generic/socket.h already has an exception for the differences that powerpc needs, so just include it after defining the differences. Signed-off-by: Stephen Rothwell --- arch/powerpc/include/uapi/asm/socket.h | 92 +- 1 file changed, 1 insertion(+), 91 deletions(-) Build tested using powerpc allyesconfig, pseries_le_defconfig, 32 bit and 64 bit allnoconfig and ppc44x_defconfig builds. Dave, this is on top of the net-next tree and I am sending it to you since it would produce a conflict if it was merged via the powerpc tree. diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h index bc4ca72faf99..3c590c7c42c0 100644 --- a/arch/powerpc/include/uapi/asm/socket.h +++ b/arch/powerpc/include/uapi/asm/socket.h @@ -8,28 +8,6 @@ * 2 of the License, or (at your option) any later version. */ -#include - -/* For setsockopt(2) */ -#define SOL_SOCKET 1 - -#define SO_DEBUG 1 -#define SO_REUSEADDR 2 -#define SO_TYPE3 -#define SO_ERROR 4 -#define SO_DONTROUTE 5 -#define SO_BROADCAST 6 -#define SO_SNDBUF 7 -#define SO_RCVBUF 8 -#define SO_SNDBUFFORCE 32 -#define SO_RCVBUFFORCE 33 -#define SO_KEEPALIVE 9 -#define SO_OOBINLINE 10 -#define SO_NO_CHECK11 -#define SO_PRIORITY12 -#define SO_LINGER 13 -#define SO_BSDCOMPAT 14 -#define SO_REUSEPORT 15 #define SO_RCVLOWAT16 #define SO_SNDLOWAT17 #define SO_RCVTIMEO18 @@ -37,74 +15,6 @@ #define SO_PASSCRED20 #define SO_PEERCRED21 -/* Security levels - as per NRL IPv6 - don't actually do anything */ -#define SO_SECURITY_AUTHENTICATION 22 -#define SO_SECURITY_ENCRYPTION_TRANSPORT 23 -#define SO_SECURITY_ENCRYPTION_NETWORK 24 - -#define SO_BINDTODEVICE25 - -/* Socket filtering */ -#define SO_ATTACH_FILTER 26 -#define SO_DETACH_FILTER 27 -#define SO_GET_FILTER SO_ATTACH_FILTER - -#define SO_PEERNAME28 -#define SO_TIMESTAMP 29 -#define SCM_TIMESTAMP SO_TIMESTAMP - -#define SO_ACCEPTCONN 30 - -#define SO_PEERSEC 31 -#define SO_PASSSEC 34 -#define SO_TIMESTAMPNS 35 -#define SCM_TIMESTAMPNSSO_TIMESTAMPNS - -#define SO_MARK36 - -#define SO_TIMESTAMPING37 -#define SCM_TIMESTAMPING SO_TIMESTAMPING - -#define SO_PROTOCOL38 -#define SO_DOMAIN 39 - -#define SO_RXQ_OVFL 40 - -#define SO_WIFI_STATUS 41 -#define SCM_WIFI_STATUSSO_WIFI_STATUS -#define SO_PEEK_OFF42 - -/* Instruct lower device to use last 4-bytes of skb data as FCS */ -#define SO_NOFCS 43 - -#define SO_LOCK_FILTER 44 - -#define SO_SELECT_ERR_QUEUE45 - -#define SO_BUSY_POLL 46 - -#define SO_MAX_PACING_RATE 47 - -#define SO_BPF_EXTENSIONS 48 - -#define SO_INCOMING_CPU49 - -#define SO_ATTACH_BPF 50 -#define SO_DETACH_BPF SO_DETACH_FILTER - -#define SO_ATTACH_REUSEPORT_CBPF 51 -#define SO_ATTACH_REUSEPORT_EBPF 52 - -#define SO_CNX_ADVICE 53 - -#define SCM_TIMESTAMPING_OPT_STATS 54 - -#define SO_MEMINFO 55 - -#define SO_INCOMING_NAPI_ID56 - -#define SO_COOKIE 57 - -#define SCM_TIMESTAMPING_PKTINFO 58 +#include #endif /* _ASM_POWERPC_SOCKET_H */ -- 2.11.0 -- Cheers, Stephen Rothwell
Re: [PATCH net-next 0/8] Introduce bpf ID
On Tue, May 30, 2017 at 06:35:37PM -0600, David Ahern wrote: > On 5/30/17 6:08 PM, Martin KaFai Lau wrote: > > This patch series: > > 1) Introduce ID for both bpf_prog and bpf_map. > > 2) Add bpf commands to iterate the prog IDs and map > >IDs of the system. > > 3) Add bpf commands to get a prog/map fd from an ID > > 4) Add bpf command to get prog/map info from a fd. > >The prog/map info is a jump start in this patchset > >and it is not meant to be a complete list. They can > >be extended in the future patches. > > Interesting. This really simplifies the bpf retrieval I have been > working on: > https://github.com/dsahern/linux/commits/bpf/retrieve-bpf-wip > > Having the net device, filter, route, socket, cgroup, etc return the ids > for attached programs solves the problem of getting a reference to all > of the places BPF programs can be placed. Right, it is the idea. I am also baking some follow-up patches for returning xdp-prog's id through netlink. > > This patch set turns an id into a fd, and my patches allow the fd to be > used to get the bpf code.
Re: [PATCH 1/2] ARM: dts: imx6ul-14x14-evk: Add ksz8081 phy properties
On Tue, May 30, 2017 at 2:34 PM, Leonard Crestez wrote: > Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same > register values can be set through the micrel phy driver by using dts > properties. > > This seems preferable and allows cleanly fixing suspend/resume. > > Signed-off-by: Leonard Crestez Reviewed-by: Fabio Estevam
Re: [net-bluetooth] question about potential null pointer dereference
Hi Marcel, Quoting "Gustavo A. R. Silva" : Hi Marcel, Quoting Marcel Holtmann : Hi Gustavo, While looking into Coverity ID 1357456 I ran into the following piece of code at net/bluetooth/smp.c:166 166/* The following functions map to the LE SC SMP crypto functions 167 * AES-CMAC, f4, f5, f6, g2 and h6. 168 */ 169 170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16], const u8 *m, 171size_t len, u8 mac[16]) 172{ 173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX]; 174SHASH_DESC_ON_STACK(desc, tfm); 175int err; 176 177if (len > CMAC_MSG_MAX) 178return -EFBIG; 179 180if (!tfm) { 181BT_ERR("tfm %p", tfm); 182return -EINVAL; 183} 184 BTW, what do you think about removing the IF block above? 185desc->tfm = tfm; 186desc->flags = 0; 187 188/* Swap key and message from LSB to MSB */ 189swap_buf(k, tmp, 16); 190swap_buf(m, msg_msb, len); 191 192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m); 193SMP_DBG("key %16phN", k); 194 195err = crypto_shash_setkey(tfm, tmp, 16); 196if (err) { 197BT_ERR("cipher setkey failed: %d", err); 198return err; 199} 200 201err = crypto_shash_digest(desc, msg_msb, len, mac_msb); 202shash_desc_zero(desc); 203if (err) { 204BT_ERR("Hash computation error %d", err); 205return err; 206} 207 208swap_buf(mac_msb, mac, 16); 209 210SMP_DBG("mac %16phN", mac); 211 212return 0; 213} The issue here is that line 180 implies that pointer tfm might be NULL. If this is the case, there is a potential NULL pointer dereference at line 174 once pointer tfm is indirectly dereferenced inside macro SHASH_DESC_ON_STACK(). My question is if there is any chance that pointer tfm maybe be NULL when calling macro SHASH_DESC_ON_STACK()? I think the part you are after is this: smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0); if (IS_ERR(smp->tfm_cmac)) { BT_ERR("Unable to create CMAC crypto context"); crypto_free_cipher(smp->tfm_aes); kzfree(smp); return NULL; } Yeah, this makes it all clear. So the tfm_cmac is part of the smp structure. However if there is no cipher, we destroy the smp structure and essentially run without SMP support. So it can not really be called anyway. What I take from this is that as a general rule, I should first try to identify whether the code I'm debugging is reachable or not, depending on the specific structures and variables I'm interested in. Maybe commenting this might be a good idea. Yep, it wouldn't hurt. In the meantime I will triage and document this as a false positive. Thank you very much for the clarification, Marcel, I really appreciate it. -- Gustavo A. R. Silva Thanks -- Gustavo A. R. Silva
Re: [net-bluetooth] question about potential null pointer dereference
Hi Marcel, Quoting Marcel Holtmann : Hi Gustavo, While looking into Coverity ID 1357456 I ran into the following piece of code at net/bluetooth/smp.c:166 166/* The following functions map to the LE SC SMP crypto functions 167 * AES-CMAC, f4, f5, f6, g2 and h6. 168 */ 169 170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16], const u8 *m, 171size_t len, u8 mac[16]) 172{ 173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX]; 174SHASH_DESC_ON_STACK(desc, tfm); 175int err; 176 177if (len > CMAC_MSG_MAX) 178return -EFBIG; 179 180if (!tfm) { 181BT_ERR("tfm %p", tfm); 182return -EINVAL; 183} 184 185desc->tfm = tfm; 186desc->flags = 0; 187 188/* Swap key and message from LSB to MSB */ 189swap_buf(k, tmp, 16); 190swap_buf(m, msg_msb, len); 191 192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m); 193SMP_DBG("key %16phN", k); 194 195err = crypto_shash_setkey(tfm, tmp, 16); 196if (err) { 197BT_ERR("cipher setkey failed: %d", err); 198return err; 199} 200 201err = crypto_shash_digest(desc, msg_msb, len, mac_msb); 202shash_desc_zero(desc); 203if (err) { 204BT_ERR("Hash computation error %d", err); 205return err; 206} 207 208swap_buf(mac_msb, mac, 16); 209 210SMP_DBG("mac %16phN", mac); 211 212return 0; 213} The issue here is that line 180 implies that pointer tfm might be NULL. If this is the case, there is a potential NULL pointer dereference at line 174 once pointer tfm is indirectly dereferenced inside macro SHASH_DESC_ON_STACK(). My question is if there is any chance that pointer tfm maybe be NULL when calling macro SHASH_DESC_ON_STACK()? I think the part you are after is this: smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0); if (IS_ERR(smp->tfm_cmac)) { BT_ERR("Unable to create CMAC crypto context"); crypto_free_cipher(smp->tfm_aes); kzfree(smp); return NULL; } Yeah, this makes it all clear. So the tfm_cmac is part of the smp structure. However if there is no cipher, we destroy the smp structure and essentially run without SMP support. So it can not really be called anyway. What I take from this is that as a general rule, I should first try to identify whether the code I'm debugging is reachable or not, depending on the specific structures and variables I'm interested in. Maybe commenting this might be a good idea. Yep, it wouldn't hurt. In the meantime I will triage and document this as a false positive. Thank you very much for the clarification, Marcel, I really appreciate it. -- Gustavo A. R. Silva
Re: [PATCH net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID
Hi Martin, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Martin-KaFai-Lau/bpf-Introduce-bpf_prog-ID/20170531-110650 config: x86_64-randconfig-x015-201722 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): In file included from arch/x86/include/asm/bug.h:81:0, from include/linux/bug.h:4, from include/linux/thread_info.h:11, from arch/x86/include/asm/preempt.h:6, from include/linux/preempt.h:80, from include/linux/spinlock.h:50, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/linux/ktime.h:24, from include/linux/timer.h:5, from include/linux/workqueue.h:8, from include/linux/bpf.h:12, from kernel/bpf/syscall.c:12: kernel/bpf/syscall.c: In function 'bpf_prog_inc_not_zero': >> kernel/bpf/syscall.c:834:16: error: implicit declaration of function >> 'lockdep_is_held' [-Werror=implicit-function-declaration] WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock)); ^ include/asm-generic/bug.h:65:25: note: in definition of macro 'WARN_ON_ONCE' int __ret_warn_on = !!(condition); \ ^ cc1: some warnings being treated as errors vim +/lockdep_is_held +834 kernel/bpf/syscall.c 828 EXPORT_SYMBOL_GPL(bpf_prog_inc); 829 830 static struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog) 831 { 832 int refold; 833 > 834 WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock)); 835 refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0); 836 837 if (refold >= BPF_MAX_REFCNT) { --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH net-next] bnxt_en: Fix xmit_more with BQL.
From: Michael Chan Date: Tue, 30 May 2017 20:03:00 -0400 > We need to write the doorbell if BQL has stopped the queue and > skb->xmit_more is set. Otherwise it is possible for the tx queue to > rot and cause tx timeout. > > Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.") > Suggested-by: Yuval Mintz > Signed-off-by: Michael Chan Applied, thanks.
[PATCH] enic: Fix another sleep-in-atomic bug
The driver may sleep under a spin lock in some function call paths. The 1st function call path is: enic_tx_hang_reset (acquire the lock by spin_lock) enic_stop enic_synchronize_irqs synchronize_irq --> may sleep The 2nd function call path is: enic_tx_hang_reset (acquire the lock by spin_lock) enic_dev_wait schedule_timeout_uninterruptible --> may sleep The 3rd function call path is: enic_tx_hang_reset (acquire the lock by spin_lock) enic_open enic_request_intr enic_set_rx_cpu_rmap enic_free_rx_cpu_rmap free_irq_cpu_rmap --> may sleep To fix it, the "spin_lock" and "spin_unlock" are removed in enic_tx_hang_reset. Signed-off-by: Jia-Ju Bai --- drivers/net/ethernet/cisco/enic/enic_main.c |2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 4b87bee..d6523e2 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -2330,7 +2330,6 @@ static void enic_tx_hang_reset(struct work_struct *work) rtnl_lock(); - spin_lock(&enic->enic_api_lock); enic_dev_hang_notify(enic); enic_stop(enic->netdev); enic_dev_hang_reset(enic); @@ -2339,7 +2338,6 @@ static void enic_tx_hang_reset(struct work_struct *work) enic_set_rss_nic_cfg(enic); enic_dev_set_ig_vlan_rewrite_mode(enic); enic_open(enic->netdev); - spin_unlock(&enic->enic_api_lock); call_netdevice_notifiers(NETDEV_REBOOT, enic->netdev); rtnl_unlock(); -- 1.7.9.5
[PATCH] enic: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock in some function call paths. The 1st function call path is: enic_reset (acquire the lock by spin_lock) enic_stop enic_synchronize_irqs synchronize_irq --> may sleep The 2nd function call path is: enic_reset (acquire the lock by spin_lock) enic_dev_soft_reset enic_dev_wait schedule_timeout_uninterruptible --> may sleep The 3rd function call path is: enic_reset (acquire the lock by spin_lock) enic_open enic_request_intr enic_set_rx_cpu_rmap enic_free_rx_cpu_rmap free_irq_cpu_rmap --> may sleep To fix it, the "spin_lock" and "spin_unlock" are removed in enic_reset. Signed-off-by: Jia-Ju Bai --- drivers/net/ethernet/cisco/enic/enic_main.c |2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 4b87bee..2a9bef8 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -2310,7 +2310,6 @@ static void enic_reset(struct work_struct *work) rtnl_lock(); - spin_lock(&enic->enic_api_lock); enic_stop(enic->netdev); enic_dev_soft_reset(enic); enic_reset_addr_lists(enic); @@ -2318,7 +2317,6 @@ static void enic_reset(struct work_struct *work) enic_set_rss_nic_cfg(enic); enic_dev_set_ig_vlan_rewrite_mode(enic); enic_open(enic->netdev); - spin_unlock(&enic->enic_api_lock); call_netdevice_notifiers(NETDEV_REBOOT, enic->netdev); rtnl_unlock(); -- 1.7.9.5
[PATCH] i40e: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock, and the function call path is: i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh) i40e_vsi_remove_pvid i40e_vlan_stripping_disable i40e_aq_update_vsi_params i40e_asq_send_command mutex_lock --> may sleep To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and the lock is acquired again after this function. Signed-off-by: Jia-Ju Bai --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 95c23fb..0fb38ca 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -3017,10 +3017,12 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id, VLAN_VID_MASK)); } + spin_unlock_bh(&vsi->mac_filter_hash_lock); if (vlan_id || qos) ret = i40e_vsi_add_pvid(vsi, vlanprio); else i40e_vsi_remove_pvid(vsi); + spin_lock_bh(&vsi->mac_filter_hash_lock); if (vlan_id) { dev_info(&pf->pdev->dev, "Setting VLAN %d, QOS 0x%x on VF %d\n", -- 1.7.9.5
Re: Deleting a dynamic mac entry..
Please see inline.. On Fri, May 26, 2017 at 1:15 AM, Toshiaki Makita wrote: > On 2017/05/26 14:08, Manohar Kumar wrote: >> On Wed, May 24, 2017 at 6:11 PM, Toshiaki Makita >> wrote: >>> On 2017/05/25 3:05, Manohar Kumar wrote: Thanks, Toshiaki. What is the right way to set the default_pvid using the bridge command ? I tried this, which fails.. root@net-3:~# ip link set dev vxlan0 name untagged type vlan id 0 RTNETLINK answers: Operation not supported root@net-3:~# All the interfaces in the bridge are untagged. >>> >>> # ip link set br0 down >>> # echo 0 > /sys/class/net/br0/bridge/default_pvid >>> >> >> I have the bridge inside a network namespace. There is no >> /sys/class/net entry for the bridge interface inside the namespace. >> >> I see online references to this concept of 'tagged' sysfs directories >> to support namespaces. But its not clear how to access the sysfs >> entries for a particular namespace. > > Did you remount /sys after entering netns? No, that was the problem. Mounting /sys did the trick. Appreciate your help. Trying to understanding this better. How is the sysfs tree maintained per network namespace ? I assume only the /sys/class/net is per network namespace and the rest of the /sys is shared with the host namespace ? Also, how does mounting /sys in the network namespace make the new /sys/class/net tree visible ? thanks, Manohar. > If not, remount /sys or use "ip netns" which automatically do it. > >> Is there any other way to set the default_pvid for the bridge >> interface inside namespace ? > > No, as of 3.19. > > > Toshiaki Makita >
[PATCH] isdn: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock, the function call path is: isdn_ppp_mp_receive (acquire the lock) isdn_ppp_mp_reassembly isdn_ppp_push_higher isdn_ppp_decompress isdn_ppp_ccp_reset_trans isdn_ppp_ccp_reset_alloc_state kzalloc(GFP_KERNEL) --> may sleep To fixed it, the "GFP_KERNEL" is replaced with "GFP_ATOMIC". Signed-off-by: Jia-Ju Bai --- drivers/isdn/i4l/isdn_ppp.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c index d07dd519..8aa158a 100644 --- a/drivers/isdn/i4l/isdn_ppp.c +++ b/drivers/isdn/i4l/isdn_ppp.c @@ -2364,7 +2364,7 @@ static struct ippp_ccp_reset_state *isdn_ppp_ccp_reset_alloc_state(struct ippp_s id); return NULL; } else { - rs = kzalloc(sizeof(struct ippp_ccp_reset_state), GFP_KERNEL); + rs = kzalloc(sizeof(struct ippp_ccp_reset_state), GFP_ATOMIC); if (!rs) return NULL; rs->state = CCPResetIdle; -- 1.7.9.5
Re: [i40e] regression on TCP stream and TCP maerts, kernel-4.12.0-0.rc2
On Tue, May 30, 2017 at 8:41 AM, Alexander Duyck wrote: > On Tue, May 30, 2017 at 6:43 AM, Adam Okuliar wrote: >> Hello, >> >> we found regression on intel card(XL710) with i40e driver. Regression is >> about ~45% >> on TCP_STREAM and TCP_MAERTS test for IPv4 and IPv6. Regression was first >> visible in kernel-4.12.0-0.rc1. >> >> More details about results you can see in uploaded images in bugzilla. [0] >> >> >> [0] https://bugzilla.kernel.org/show_bug.cgi?id=195923 >> >> >> Best regards, / S pozdravom, >> >> Adrián Tomašov >> Kernel Performance QE >> atoma...@redhat.com > > I have added the i40e driver maintainer and the intel-wired-lan > mailing list so that we can make are developers aware of the issue. > > Thanks. > > - Alex Adam, We are having some issues trying to reproduce what you reported. Can you provide some additional data. Specifically we would be looking for an "ethtool -i", and an "ethtool -S" for the port before and after the test. If you can attach it to the bugzilla that would be appreciated. Thanks. - Alex
RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()
Ok, I will send v2 later > -Original Message- > From: David Ahern [mailto:dsah...@gmail.com] > Sent: Wednesday, May 31, 2017 8:40 AM > To: YUAN Linyu; David Ahern; Joe Perches; David Miller; cug...@163.com > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of > __ndisc_fill_addr_option() > > On 5/30/17 6:29 PM, YUAN Linyu wrote: > >>> that function should be converted to skb_put_zero once it hits net-next. > >> I will check it > > I can't find skb_put_zero > > I believe the decision was to put it in Johannes' tree and it will make > its way to DaveM's tree. Give some time.
Re: [PATCH iproute2 0/4] ip: Add vrf show commmand
On Sat, 27 May 2017 17:34:46 -0600 David Ahern wrote: > Refactor ip address to export its capability to save a list of nlmsg's > for links and its link filter. Use both to add an 'ip vrf show' command > to list all configured VRF with table id. > > David Ahern (4): > ip address: Export ip_linkaddr_list > ip address: Move filter struct to ip_common.h > ip address: Change print_linkinfo_brief to take filter as an input > ip vrf: Add show command > > include/libnetlink.h | 10 > ip/ip_common.h | 27 - > ip/ipaddress.c | 144 +++- > ip/iplink.c | 2 +- > ip/ipvrf.c | 153 > +-- > man/man8/ip-vrf.8| 11 > 6 files changed, 266 insertions(+), 81 deletions(-) > Looks good, series applied to master.
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
On Tue, May 30, 2017 at 05:16:27PM -0700, Florian Fainelli wrote: > On 05/30/2017 05:06 PM, Andrew Lunn wrote: > >> - past the initial setup, if we start creating bridge devices and so on, > >> we have no way to tell: group Ports 0-3 together and send traffic to CPU > >> port 0, then let Port 5 alone and send traffic to CPU port 1, that's a > >> DSA-only problem though, because we still have the CPU port(s) as > >> independent network interfaces. > > > > What is the problem here? Frames come out the master interface, get > > untagged and passed to the slave interface and go upto the bridge. It > > should all just work. Same in the reverse direction. > > The problem is really that is you have multiple CPU ports, how do you > define which one gets all the traffic by default? Ascending order of > port number? Descending order? I would probably default to round robin when allocating user ports to CPU ports. That probably gives you the best default. > I actually tend to think that most use cases our there are in the order > of dedicating one CPU port to one corresponding switch port (user > facing, or internal) in order to provided guaranteed bandwidth for that > port. Which is generally a waste of bandwidth. Best case, i get 40Mbps Internet access. Meaning 960Mbps of a dedicated cpu port would be wasted. > But as an user, I want to choose how the grouping is going to > work, and right now, I cannot, unless this is hardcoded in Device Tree, > which sounds both wrong and inadequate. So how about round-robin default, and then devlink to move a user port to a specific cpu port? We also need to watch out for asymmetry. I think newer marvell chips don't support egress to multiple CPU ports. Ingress to the switch i think is unlimited. The older chips are more flexible in this respect. So we need some degree of flexibility here. Andrew
Re: [net-bluetooth] question about potential null pointer dereference
Hi Gustavo, > While looking into Coverity ID 1357456 I ran into the following piece of code > at net/bluetooth/smp.c:166 > > 166/* The following functions map to the LE SC SMP crypto functions > 167 * AES-CMAC, f4, f5, f6, g2 and h6. > 168 */ > 169 > 170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16], const u8 *m, > 171size_t len, u8 mac[16]) > 172{ > 173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX]; > 174SHASH_DESC_ON_STACK(desc, tfm); > 175int err; > 176 > 177if (len > CMAC_MSG_MAX) > 178return -EFBIG; > 179 > 180if (!tfm) { > 181BT_ERR("tfm %p", tfm); > 182return -EINVAL; > 183} > 184 > 185desc->tfm = tfm; > 186desc->flags = 0; > 187 > 188/* Swap key and message from LSB to MSB */ > 189swap_buf(k, tmp, 16); > 190swap_buf(m, msg_msb, len); > 191 > 192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m); > 193SMP_DBG("key %16phN", k); > 194 > 195err = crypto_shash_setkey(tfm, tmp, 16); > 196if (err) { > 197BT_ERR("cipher setkey failed: %d", err); > 198return err; > 199} > 200 > 201err = crypto_shash_digest(desc, msg_msb, len, mac_msb); > 202shash_desc_zero(desc); > 203if (err) { > 204BT_ERR("Hash computation error %d", err); > 205return err; > 206} > 207 > 208swap_buf(mac_msb, mac, 16); > 209 > 210SMP_DBG("mac %16phN", mac); > 211 > 212return 0; > 213} > > The issue here is that line 180 implies that pointer tfm might be NULL. If > this is the case, there is a potential NULL pointer dereference at line 174 > once pointer tfm is indirectly dereferenced inside macro > SHASH_DESC_ON_STACK(). > > My question is if there is any chance that pointer tfm maybe be NULL when > calling macro SHASH_DESC_ON_STACK()? I think the part you are after is this: smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0); if (IS_ERR(smp->tfm_cmac)) { BT_ERR("Unable to create CMAC crypto context"); crypto_free_cipher(smp->tfm_aes); kzfree(smp); return NULL; } So the tfm_cmac is part of the smp structure. However if there is no cipher, we destroy the smp structure and essentially run without SMP support. So it can not really be called anyway. Maybe commenting this might be a good idea. Regards Marcel
Re: [PATCH iproute2 -master 0/2] Two misc BPF updates
On Sat, 13 May 2017 02:32:33 +0200 Daniel Borkmann wrote: > Requires header rebase with -net. > > Thanks! > > Daniel Borkmann (2): > bpf: update printing of generic xdp mode > bpf: dump error to the user when retrieving pinned prog fails > > ip/iplink_xdp.c | 19 +++ > lib/bpf.c | 12 +++- > 2 files changed, 22 insertions(+), 9 deletions(-) > Applied thanks.
Re: [patch iproute2] ipvtap: Adding support for ipvtap device management
On Tue, 23 May 2017 16:22:05 -0700 Sainath Grandhi wrote: > This patch adds support for managing ipvtap devices using ip link. ipvtap > support > is added to linux with commit 235a9d89da976e2975b3de9afc0bed7b72557983 Please resend with a Signed-off-by line. Also, you need to update the Usage message and the man page.
Re: [patch iproute2] tc: flower: add support for tcp flags
On Tue, 23 May 2017 23:51:39 +0200 Jiri Pirko wrote: > From: Jiri Pirko > > Allow user to insert a flower classifier filter rule which includes > match for tcp flags. > > Signed-off-by: Jiri Pirko Applied to net-next
Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"
On 5/30/17 5:32 PM, Sowmini Varadhan wrote: > On (05/30/17 16:20), Stephen Hemminger wrote: >> >> Please don't copy/paste chunks of code. Instead refactor and make this >> into a helper function. > > sure, I have no problems with that, and as I pointed out, I've not > tested ipv6 for this yet either. I'll do all of this after getting > some feedback on the more basic issue here: > > I was first looking for comments on the more fundamental refcnt > management behind the fix (I'm surprised no one noticed this > before, is there some deep reason for leaving it like this, that > I am missing? Does it break something else?) It has been noticed. I have not sent a patch since adjusting gc parameters will reclaim FAILED entries at whatever rate the user wants.
Re: [PATCH iproute2] ip: add handling for new CAN netlink interface
On Fri, 19 May 2017 14:54:49 +0200 Remigiusz Kołłątaj wrote: > This patch adds handling for new CAN netlink interface introduced in > 4.11 kernel: > - IFLA_CAN_TERMINATION, > - IFLA_CAN_TERMINATION_CONST, > - IFLA_CAN_BITRATE_CONST, > - IFLA_CAN_DATA_BITRATE_CONST > > Output example: > $ip -d link show can0 > 6: can0: mtu 16 qdisc noop state DOWN mode DEFAULT group default > qlen 10 > link/can promiscuity 0 > can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0 > bitrate 8 > [ 2,3,5,8,8, 10, > 125000, 15, 175000, 20, 225000, 25, > 275000, 30, 50, 625000, 80, 100 ] > termination 0 [ 0, 120 ] > clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs > 65535 > > Signed-off-by: Remigiusz Kołłątaj Applied, thanks for confirming the output change.
Re: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()
On 5/30/17 6:29 PM, YUAN Linyu wrote: >>> that function should be converted to skb_put_zero once it hits net-next. >> I will check it > I can't find skb_put_zero I believe the decision was to put it in Johannes' tree and it will make its way to DaveM's tree. Give some time.
Re: [iproute PATCH] tc: m_xt: Prevent a segfault in libipt
On Tue, 23 May 2017 15:40:57 +0200 Phil Sutter wrote: > From: Phil Sutter > > This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These > are still not usable with this patch, but at least tc doesn't crash > anymore when one tries to use them. > > Signed-off-by: Phil Sutter Applied, thanks
Re: [PATCH iproute2] devlink: Add option to set and show eswitch encapsulation support
On Sun, 21 May 2017 08:37:27 +0300 Roi Dayan wrote: > This is an e-switch global knob to enable HW support for applying > encapsulation/decapsulation to VF traffic as part of SRIOV e-switch > offloading. > > The actual encap/decap is carried out (along with the matching and other > actions) per offloaded e-switch rules, e.g as done when offloading the TC > tunnel > key action. > > Possible values are enable/disable. > > Signed-off-by: Roi Dayan > Reviewed-by: Jiri Pirko Applied, thanks
Re: [PATCH] netlink: Change rtnl_dump_done to always show error
On Tue, 16 May 2017 14:22:46 -0700 David Ahern wrote: > The original code which became rtnl_dump_done only shows netlink errors > if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always appends > the length which contains any error encountered during the dump. Update > rtnl_dump_done to always show the error if there is one. > > As an *example* without this patch, dumping a route object that exceeds > the internal buffer size terminates with no message to the user -- the > dump just ends because the NLMSG_DONE attribute was received. With this > patch the user at least gets a message that the dump was aborted. > > $ ip ro ls > default via 10.0.2.2 dev eth0 > 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 > 10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1 > 172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1 > Error: Buffer too small for object > Dump terminated > > The point of this patch is to notify the user of a failure versus > silently exiting on a partial dump. Because the NLMSG_DONE attribute > was received, the entire dump needs to be restarted to use a larger > buffer for EMSGSIZE errors. That could be done automatically but it > has other user impacts (e.g., duplicate output if the dump is > restarted) and should be the subject of a different patch. > > Signed-off-by: David Ahern Applied. It is easier to seen iproute2 patches with: Subject: [PATCH iproute2] netlink: Change rtnl_dump_done to always show error
Re: [PATCH net-next 0/8] Introduce bpf ID
On 5/30/17 6:08 PM, Martin KaFai Lau wrote: > This patch series: > 1) Introduce ID for both bpf_prog and bpf_map. > 2) Add bpf commands to iterate the prog IDs and map >IDs of the system. > 3) Add bpf commands to get a prog/map fd from an ID > 4) Add bpf command to get prog/map info from a fd. >The prog/map info is a jump start in this patchset >and it is not meant to be a complete list. They can >be extended in the future patches. Interesting. This really simplifies the bpf retrieval I have been working on: https://github.com/dsahern/linux/commits/bpf/retrieve-bpf-wip Having the net device, filter, route, socket, cgroup, etc return the ids for attached programs solves the problem of getting a reference to all of the places BPF programs can be placed. This patch set turns an id into a fd, and my patches allow the fd to be used to get the bpf code.
Re: [PATCH 1/2] ip: include libc headers first
On Mon, 22 May 2017 16:27:53 +0300 Baruch Siach wrote: > Including libc headers first helps as a workaround to redefinition of struct > ethhdr with a suitably patched musl libc that suppresses the kernel > if_ether.h. > > Signed-off-by: Baruch Siach > --- > ip/iplink_bridge.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c > index 818b43c89b5b..cccdec1c203a 100644 > --- a/ip/iplink_bridge.c > +++ b/ip/iplink_bridge.c > @@ -13,9 +13,9 @@ > #include > #include > #include > +#include > #include > #include > -#include > #include > > #include "rt_names.h" Applied. Next time please fix the Subject line. You implied two patches by using 1/2 but only one was sent. It is easier for me if you include iproute2 in subject since then it goes into a separate folder. Subject: [PATCH iproute2] ip: include libc headers first or if intended for net-next Subject: [PATCH iproute2 net-next] ip: add magic VRF support
RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()
> > -Original Message- > > From: David Ahern [mailto:dsah...@gmail.com] > > Sent: Tuesday, May 30, 2017 11:42 AM > > To: Joe Perches; David Miller; cug...@163.com > > Cc: netdev@vger.kernel.org; YUAN Linyu > > Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of > > __ndisc_fill_addr_option() > > > > On 5/29/17 9:41 PM, Joe Perches wrote: > > > On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote: > > >> From: yuan linyu > > >> Date: Sat, 27 May 2017 06:00:52 +0800 > > >> > > >>> From: yuan linyu > > >>> > > >>> Signed-off-by: yuan linyu > > >> Applied, thanks. > > > OK, but is it really safe though? > > > > > > Could "space" (an int) ever be negative after > > > subtracting "pad" and "data_len"? > > > > > > > that function should be converted to skb_put_zero once it hits net-next. > I will check it I can't find skb_put_zero
RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()
> -Original Message- > From: David Ahern [mailto:dsah...@gmail.com] > Sent: Tuesday, May 30, 2017 11:42 AM > To: Joe Perches; David Miller; cug...@163.com > Cc: netdev@vger.kernel.org; YUAN Linyu > Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of > __ndisc_fill_addr_option() > > On 5/29/17 9:41 PM, Joe Perches wrote: > > On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote: > >> From: yuan linyu > >> Date: Sat, 27 May 2017 06:00:52 +0800 > >> > >>> From: yuan linyu > >>> > >>> Signed-off-by: yuan linyu > >> Applied, thanks. > > OK, but is it really safe though? > > > > Could "space" (an int) ever be negative after > > subtracting "pad" and "data_len"? > > > > that function should be converted to skb_put_zero once it hits net-next. I will check it
RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()
Hi joe, > -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Joe Perches > Sent: Tuesday, May 30, 2017 11:41 AM > To: David Miller; cug...@163.com > Cc: netdev@vger.kernel.org; dsah...@gmail.com; YUAN Linyu > Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of > __ndisc_fill_addr_option() > > On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote: > > From: yuan linyu > > Date: Sat, 27 May 2017 06:00:52 +0800 > > > > > From: yuan linyu > > > > > > Signed-off-by: yuan linyu > > > > Applied, thanks. > > OK, but is it really safe though? > > Could "space" (an int) ever be negative after > subtracting "pad" and "data_len"? It's safe, check int space = __ndisc_opt_addr_space(data_len, pad); space is maximum aligned value. And I check current pad, the maximum value is 2.
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
On 05/30/2017 05:06 PM, Andrew Lunn wrote: >> - past the initial setup, if we start creating bridge devices and so on, >> we have no way to tell: group Ports 0-3 together and send traffic to CPU >> port 0, then let Port 5 alone and send traffic to CPU port 1, that's a >> DSA-only problem though, because we still have the CPU port(s) as >> independent network interfaces. > > What is the problem here? Frames come out the master interface, get > untagged and passed to the slave interface and go upto the bridge. It > should all just work. Same in the reverse direction. The problem is really that is you have multiple CPU ports, how do you define which one gets all the traffic by default? Ascending order of port number? Descending order? > > In order to make best use of the extra bandwidth of having two cpu > ports, i probably want the user ports reasonably evenly distributed > between the CPU ports. Dedicating one CPU port to one user port is > probably sub-optimal. How many people have 1Gbps Fibre to the home, > which could fully utilise a one-to-one mapping for the WAN port? I actually tend to think that most use cases our there are in the order of dedicating one CPU port to one corresponding switch port (user facing, or internal) in order to provided guaranteed bandwidth for that port. But as an user, I want to choose how the grouping is going to work, and right now, I cannot, unless this is hardcoded in Device Tree, which sounds both wrong and inadequate. > >> Now, that would still force the user to configure two bridges in order >> to properly steer traffic towards the requested ports but it would allow >> us to be very flexible (which is probably desired here) in how ports are >> grouped together. > > We want a sensible default, spreading the slave ports evenly over the > CPU ports. We could add a devlink command to change the defaults at > runtime. Sensible default is fine for the first time boot, but we should let users be entirely flexible in how they want their user-facing ports to map to a CPU port as you say, and IMHO using separate bridges to configure that is a possible way to go since there is already knowledge in the bridge join/leave code in DSA that already knows the dwnstream/user-facing ports, but does not yet know about CPU ports. Code speaks better, so let me see if I can cook something to illustrate this. Thanks! -- Florian
[PATCH net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID
Add BPF_MAP_GET_FD_BY_ID command to allow user to get a fd from a bpf_map's ID. bpf_map_inc_not_zero() is added and is called with map_idr_lock held. __bpf_map_put() is also added which has the 'bool do_idr_lock' param to decide if the map_idr_lock should be acquired when freeing the map->id. In the error path of bpf_map_inc_not_zero(), it may have to call __bpf_map_put(map, false) which does not need to take the map_idr_lock when freeing the map->id. It is currently limited to CAP_SYS_ADMIN which we can consider to lift it in followup patches. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 2 + kernel/bpf/syscall.c | 95 +++- 2 files changed, 87 insertions(+), 10 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 6d4e1cc5bd18..cf704e8b6e65 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -85,6 +85,7 @@ enum bpf_cmd { BPF_PROG_GET_NEXT_ID, BPF_MAP_GET_NEXT_ID, BPF_PROG_GET_FD_BY_ID, + BPF_MAP_GET_FD_BY_ID, }; enum bpf_map_type { @@ -217,6 +218,7 @@ union bpf_attr { union { __u32 start_id; __u32 prog_id; + __u32 map_id; }; __u32 next_id; }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 06d146f54ffb..f524431dd6ce 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -135,11 +135,19 @@ static int bpf_map_alloc_id(struct bpf_map *map) return id > 0 ? 0 : id; } -static void bpf_map_free_id(struct bpf_map *map) +static void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock) { - spin_lock_bh(&map_idr_lock); + if (do_idr_lock) + spin_lock_bh(&map_idr_lock); + else + __acquire(&map_idr_lock); + idr_remove(&map_idr, map->id); - spin_unlock_bh(&map_idr_lock); + + if (do_idr_lock) + spin_unlock_bh(&map_idr_lock); + else + __release(&map_idr_lock); } /* called from workqueue */ @@ -163,16 +171,21 @@ static void bpf_map_put_uref(struct bpf_map *map) /* decrement map refcnt and schedule it for freeing via workqueue * (unrelying map implementation ops->map_free() might sleep) */ -void bpf_map_put(struct bpf_map *map) +static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock) { if (atomic_dec_and_test(&map->refcnt)) { /* bpf_map_free_id() must be called first */ - bpf_map_free_id(map); + bpf_map_free_id(map, do_idr_lock); INIT_WORK(&map->work, bpf_map_free_deferred); schedule_work(&map->work); } } +void bpf_map_put(struct bpf_map *map) +{ + __bpf_map_put(map, true); +} + void bpf_map_put_with_uref(struct bpf_map *map) { bpf_map_put_uref(map); @@ -271,15 +284,20 @@ static int map_create(union bpf_attr *attr) goto free_map; err = bpf_map_new_fd(map); - if (err < 0) - /* failed to allocate fd */ - goto free_id; + if (err < 0) { + /* failed to allocate fd. +* bpf_map_put() is needed because the above +* bpf_map_alloc_id() has published the map +* to the userspace and the userspace may +* have refcnt-ed it through BPF_MAP_GET_FD_BY_ID. +*/ + bpf_map_put(map); + return err; + } trace_bpf_map_create(map, err); return err; -free_id: - bpf_map_free_id(map); free_map: bpf_map_uncharge_memlock(map); free_map_nouncharge: @@ -331,6 +349,28 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd) return map; } +static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, + bool uref) +{ + int refold; + + WARN_ON_ONCE(!lockdep_is_held(&map_idr_lock)); + refold = __atomic_add_unless(&map->refcnt, 1, 0); + + if (refold >= BPF_MAX_REFCNT) { + __bpf_map_put(map, false); + return ERR_PTR(-EBUSY); + } + + if (!refold) + return ERR_PTR(-ENOENT); + + if (uref) + atomic_inc(&map->usercnt); + + return map; +} + int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) { return -ENOTSUPP; @@ -1165,6 +1205,38 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr) return fd; } +#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id + +static int bpf_map_get_fd_by_id(const union bpf_attr *attr) +{ + struct bpf_map *map; + u32 id = attr->map_id; + int fd; + + if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID)) + return -EINVAL; + + if (!capable(CAP_SYS_ADMIN)) +
[PATCH net-next 6/8] bpf: Add jited_len to struct bpf_prog
Add jited_len to struct bpf_prog. It will be useful for the struct bpf_prog_info which will be added in the later patch. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- arch/arm64/net/bpf_jit_comp.c | 1 + arch/powerpc/net/bpf_jit_comp64.c | 1 + arch/s390/net/bpf_jit_comp.c | 1 + arch/sparc/net/bpf_jit_comp_64.c | 1 + arch/x86/net/bpf_jit_comp.c | 1 + include/linux/filter.h| 1 + 6 files changed, 6 insertions(+) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 71f930501ade..6b21dccf01e8 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -900,6 +900,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) bpf_jit_binary_lock_ro(header); prog->bpf_func = (void *)ctx.image; prog->jited = 1; + prog->jited_len = image_size; out_off: kfree(ctx.offset); diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index aee2bb817ac6..a3f904cd8b1e 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -1052,6 +1052,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) fp->bpf_func = (void *)image; fp->jited = 1; + fp->jited_len = alloclen; bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE)); diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 6e97a2e3fd8d..76b7e9d5d591 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -1329,6 +1329,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) bpf_jit_binary_lock_ro(header); fp->bpf_func = (void *) jit.prg_buf; fp->jited = 1; + fp->jited_len = jit.size; free_addrs: kfree(jit.addrs); out: diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c index 21de77419f48..beb46b14fbbc 100644 --- a/arch/sparc/net/bpf_jit_comp_64.c +++ b/arch/sparc/net/bpf_jit_comp_64.c @@ -1555,6 +1555,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) prog->bpf_func = (void *)ctx.image; prog->jited = 1; + prog->jited_len = image_size; out_off: kfree(ctx.offset); diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index f58939393eef..9341030133ae 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1162,6 +1162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) bpf_jit_binary_lock_ro(header); prog->bpf_func = (void *)image; prog->jited = 1; + prog->jited_len = proglen; } else { prog = orig_prog; } diff --git a/include/linux/filter.h b/include/linux/filter.h index 62d948f80730..a322a41f394b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -429,6 +429,7 @@ struct bpf_prog { kmemcheck_bitfield_end(meta); enum bpf_prog_type type; /* Type of BPF program */ u32 len;/* Number of filter blocks */ + u32 jited_len; /* Size of jited insns in bytes */ u8 tag[BPF_TAG_SIZE]; struct bpf_prog_aux *aux; /* Auxiliary fields */ struct sock_fprog_kern *orig_prog; /* Original BPF program */ -- 2.9.3
[PATCH net-next 2/8] bpf: Introduce bpf_map ID
This patch generates an unique ID for each created bpf_map. The approach is similar to the earlier patch for bpf_prog ID. It is worth to note that the bpf_map's ID and bpf_prog's ID are in two independent ID spaces and both have the same valid range: [1, INT_MAX). Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/bpf.h | 1 + kernel/bpf/syscall.c | 34 +- 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c2793a732edc..ea78d87cbc3e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -46,6 +46,7 @@ struct bpf_map { u32 max_entries; u32 map_flags; u32 pages; + u32 id; struct user_struct *user; const struct bpf_map_ops *ops; struct work_struct work; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 697bdb6ceceb..a6fe034dbc09 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -27,6 +27,8 @@ DEFINE_PER_CPU(int, bpf_prog_active); DEFINE_IDR(prog_idr); DEFINE_SPINLOCK(prog_idr_lock); +DEFINE_IDR(map_idr); +DEFINE_SPINLOCK(map_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly; @@ -117,6 +119,29 @@ static void bpf_map_uncharge_memlock(struct bpf_map *map) free_uid(user); } +static int bpf_map_alloc_id(struct bpf_map *map) +{ + int id; + + spin_lock_bh(&map_idr_lock); + id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC); + if (id > 0) + map->id = id; + spin_unlock_bh(&map_idr_lock); + + if (WARN_ON_ONCE(!id)) + return -ENOSPC; + + return id > 0 ? 0 : id; +} + +static void bpf_map_free_id(struct bpf_map *map) +{ + spin_lock_bh(&map_idr_lock); + idr_remove(&map_idr, map->id); + spin_unlock_bh(&map_idr_lock); +} + /* called from workqueue */ static void bpf_map_free_deferred(struct work_struct *work) { @@ -141,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map) void bpf_map_put(struct bpf_map *map) { if (atomic_dec_and_test(&map->refcnt)) { + bpf_map_free_id(map); INIT_WORK(&map->work, bpf_map_free_deferred); schedule_work(&map->work); } @@ -239,14 +265,20 @@ static int map_create(union bpf_attr *attr) if (err) goto free_map_nouncharge; + err = bpf_map_alloc_id(map); + if (err) + goto free_map; + err = bpf_map_new_fd(map); if (err < 0) /* failed to allocate fd */ - goto free_map; + goto free_id; trace_bpf_map_create(map, err); return err; +free_id: + bpf_map_free_id(map); free_map: bpf_map_uncharge_memlock(map); free_map_nouncharge: -- 2.9.3
[PATCH net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
This patch adds BPF_PROG_GET_NEXT_ID and BPF_MAP_GET_NEXT_ID to allow userspace to iterate all bpf_prog IDs and bpf_map IDs. The API is trying to be consistent with the existing BPF_MAP_GET_NEXT_KEY. It is currently limited to CAP_SYS_ADMIN which we can consider to lift it in followup patches. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 7 +++ kernel/bpf/syscall.c | 38 ++ 2 files changed, 45 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 94dfa9def355..e5c88f39bdc3 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -82,6 +82,8 @@ enum bpf_cmd { BPF_PROG_ATTACH, BPF_PROG_DETACH, BPF_PROG_TEST_RUN, + BPF_PROG_GET_NEXT_ID, + BPF_MAP_GET_NEXT_ID, }; enum bpf_map_type { @@ -209,6 +211,11 @@ union bpf_attr { __u32 repeat; __u32 duration; } test; + + struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */ + __u32 start_id; + __u32 next_id; + }; } __attribute__((aligned(8))); /* BPF helper function descriptions: diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a6fe034dbc09..34dc92b3c349 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -166,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map) void bpf_map_put(struct bpf_map *map) { if (atomic_dec_and_test(&map->refcnt)) { + /* bpf_map_free_id() must be called first */ bpf_map_free_id(map); INIT_WORK(&map->work, bpf_map_free_deferred); schedule_work(&map->work); @@ -726,6 +727,7 @@ void bpf_prog_put(struct bpf_prog *prog) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); + /* bpf_prog_free_id() must be called first */ bpf_prog_free_id(prog); bpf_prog_kallsyms_del(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); @@ -1067,6 +1069,34 @@ static int bpf_prog_test_run(const union bpf_attr *attr, return ret; } +#define BPF_OBJ_GET_NEXT_ID_LAST_FIELD next_id + +static int bpf_obj_get_next_id(const union bpf_attr *attr, + union bpf_attr __user *uattr, + struct idr *idr, + spinlock_t *lock) +{ + u32 next_id = attr->start_id; + int err = 0; + + if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX) + return -EINVAL; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + next_id++; + spin_lock_bh(lock); + if (!idr_get_next(idr, &next_id)) + err = -ENOENT; + spin_unlock_bh(lock); + + if (!err) + err = put_user(next_id, &uattr->next_id); + + return err; +} + SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) { union bpf_attr attr = {}; @@ -1144,6 +1174,14 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_PROG_TEST_RUN: err = bpf_prog_test_run(&attr, uattr); break; + case BPF_PROG_GET_NEXT_ID: + err = bpf_obj_get_next_id(&attr, uattr, + &prog_idr, &prog_idr_lock); + break; + case BPF_MAP_GET_NEXT_ID: + err = bpf_obj_get_next_id(&attr, uattr, + &map_idr, &map_idr_lock); + break; default: err = -EINVAL; break; -- 2.9.3
[PATCH net-next 8/8] bpf: Test for bpf ID
Add test to exercise the bpf_prog/map id generation, bpf_(prog|map)_get_next_id(), bpf_(prog|map)_get_fd_by_id() and bpf_get_obj_info_by_fd(). Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- tools/include/uapi/linux/bpf.h| 41 +++ tools/lib/bpf/bpf.c | 68 +++ tools/lib/bpf/bpf.h | 5 + tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/test_obj_id.c | 35 ++ tools/testing/selftests/bpf/test_progs.c | 191 ++ 6 files changed, 341 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/test_obj_id.c diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 94dfa9def355..677b570da8ec 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -82,6 +82,11 @@ enum bpf_cmd { BPF_PROG_ATTACH, BPF_PROG_DETACH, BPF_PROG_TEST_RUN, + BPF_PROG_GET_NEXT_ID, + BPF_MAP_GET_NEXT_ID, + BPF_PROG_GET_FD_BY_ID, + BPF_MAP_GET_FD_BY_ID, + BPF_OBJ_GET_INFO_BY_FD, }; enum bpf_map_type { @@ -209,6 +214,21 @@ union bpf_attr { __u32 repeat; __u32 duration; } test; + + struct { /* anonymous struct used by BPF_*_GET_*_ID */ + union { + __u32 start_id; + __u32 prog_id; + __u32 map_id; + }; + __u32 next_id; + }; + + struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ + __u32 bpf_fd; + __u32 info_len; + __aligned_u64 info; + } info; } __attribute__((aligned(8))); /* BPF helper function descriptions: @@ -670,4 +690,25 @@ struct xdp_md { __u32 data_end; }; +#define BPF_TAG_SIZE 8 + +struct bpf_prog_info { + __u32 type; + __u32 id; + __u8 tag[BPF_TAG_SIZE]; + __u32 jited_prog_len; + __u32 xlated_prog_len; + __aligned_u64 jited_prog_insns; + __aligned_u64 xlated_prog_insns; +} __attribute__((aligned(8))); + +struct bpf_map_info { + __u32 type; + __u32 id; + __u32 key_size; + __u32 value_size; + __u32 max_entries; + __u32 map_flags; +} __attribute__((aligned(8))); + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 6e178987af8e..7e0405e1651d 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -257,3 +257,71 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size, *duration = attr.test.duration; return ret; } + +int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id) +{ + union bpf_attr attr; + int err; + + bzero(&attr, sizeof(attr)); + attr.start_id = start_id; + + err = sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)); + if (!err) + *next_id = attr.next_id; + + return err; +} + +int bpf_map_get_next_id(__u32 start_id, __u32 *next_id) +{ + union bpf_attr attr; + int err; + + bzero(&attr, sizeof(attr)); + attr.start_id = start_id; + + err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, sizeof(attr)); + if (!err) + *next_id = attr.next_id; + + return err; +} + +int bpf_prog_get_fd_by_id(__u32 id) +{ + union bpf_attr attr; + + bzero(&attr, sizeof(attr)); + attr.prog_id = id; + + return sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr)); +} + +int bpf_map_get_fd_by_id(__u32 id) +{ + union bpf_attr attr; + + bzero(&attr, sizeof(attr)); + attr.map_id = id; + + return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr)); +} + +int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len) +{ + union bpf_attr attr; + int err; + + bzero(&attr, sizeof(attr)); + bzero(info, *info_len); + attr.info.bpf_fd = prog_fd; + attr.info.info_len = *info_len; + attr.info.info = ptr_to_u64(info); + + err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr)); + if (!err) + *info_len = attr.info.info_len; + + return err; +} diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 972bd8333eb7..16de44a14b48 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -54,5 +54,10 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type); int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size, void *data_out, __u32 *size_out, __u32 *retval, __u32 *duration); +int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id); +int bpf_map_get_next_id(__u32 start_id, __u32 *next_id); +int bpf_prog_get_fd_by_id(__u32 id); +int bpf_map_get_fd_by_id(__u32 id); +int bpf_obj_get_i
[PATCH net-next 1/8] bpf: Introduce bpf_prog ID
This patch generates an unique ID for each BPF_PROG_LOAD-ed prog. It is worth to note that each BPF_PROG_LOAD-ed prog will have a different ID even they have the same bpf instructions. The ID is generated by the existing idr_alloc_cyclic(). The ID is ranged from [1, INT_MAX). It is allocated in cyclic manner, so an ID will get reused every 2 billion BPF_PROG_LOAD. The bpf_prog_alloc_id() is done after bpf_prog_select_runtime() because the jit process may have allocated a new prog. Hence, we need to ensure the value of pointer 'prog' will not be changed any more before storing the prog to the prog_idr. After bpf_prog_select_runtime(), the prog is read-only. Hence, the id is stored in 'struct bpf_prog_aux'. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/bpf.h | 1 + kernel/bpf/syscall.c | 40 +++- 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6bb38d76faf4..c2793a732edc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -171,6 +171,7 @@ struct bpf_prog_aux { atomic_t refcnt; u32 used_map_cnt; u32 max_ctx_offset; + u32 id; struct latch_tree_node ksym_tnode; struct list_head ksym_lnode; const struct bpf_verifier_ops *ops; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 265a0d854e33..697bdb6ceceb 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -22,8 +22,11 @@ #include #include #include +#include DEFINE_PER_CPU(int, bpf_prog_active); +DEFINE_IDR(prog_idr); +DEFINE_SPINLOCK(prog_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly; @@ -650,6 +653,34 @@ static void bpf_prog_uncharge_memlock(struct bpf_prog *prog) free_uid(user); } +static int bpf_prog_alloc_id(struct bpf_prog *prog) +{ + int id; + + spin_lock_bh(&prog_idr_lock); + id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); + if (id > 0) + prog->aux->id = id; + spin_unlock_bh(&prog_idr_lock); + + /* id is in [1, INT_MAX) */ + if (WARN_ON_ONCE(!id)) + return -ENOSPC; + + return id > 0 ? 0 : id; +} + +static void bpf_prog_free_id(struct bpf_prog *prog) +{ + /* cBPF to eBPF migrations are currently not in the idr store. */ + if (!prog->aux->id) + return; + + spin_lock_bh(&prog_idr_lock); + idr_remove(&prog_idr, prog->aux->id); + spin_unlock_bh(&prog_idr_lock); +} + static void __bpf_prog_put_rcu(struct rcu_head *rcu) { struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu); @@ -663,6 +694,7 @@ void bpf_prog_put(struct bpf_prog *prog) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); + bpf_prog_free_id(prog); bpf_prog_kallsyms_del(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } @@ -855,15 +887,21 @@ static int bpf_prog_load(union bpf_attr *attr) if (err < 0) goto free_used_maps; + err = bpf_prog_alloc_id(prog); + if (err) + goto free_used_maps; + err = bpf_prog_new_fd(prog); if (err < 0) /* failed to allocate fd */ - goto free_used_maps; + goto free_id; bpf_prog_kallsyms_add(prog); trace_bpf_prog_load(prog, err); return err; +free_id: + bpf_prog_free_id(prog); free_used_maps: free_used_maps(prog->aux); free_prog: -- 2.9.3
[PATCH net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD
A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info for both bpf_prog and bpf_map. The kernel can figure out the fd is associated with a bpf_prog or bpf_map. The suggested struct bpf_prog_info and struct bpf_map_info are not meant to be a complete list and it is not the goal of this patch. New fields can be added in the future patch. The focus of this patch is to create the interface, BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and bpf_map's info. The obj's info, which will be extended (and get bigger) over time, is separated from the bpf_attr to avoid bloating the bpf_attr. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/filter.h | 2 - include/uapi/linux/bpf.h | 28 kernel/bpf/syscall.c | 163 ++- 3 files changed, 174 insertions(+), 19 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index a322a41f394b..46274f23f127 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -66,8 +66,6 @@ struct bpf_prog_aux; /* BPF program can access up to 512 bytes of stack space. */ #define MAX_BPF_STACK 512 -#define BPF_TAG_SIZE 8 - /* Helper macros for filter block array initializers. */ /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index cf704e8b6e65..677b570da8ec 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -86,6 +86,7 @@ enum bpf_cmd { BPF_MAP_GET_NEXT_ID, BPF_PROG_GET_FD_BY_ID, BPF_MAP_GET_FD_BY_ID, + BPF_OBJ_GET_INFO_BY_FD, }; enum bpf_map_type { @@ -222,6 +223,12 @@ union bpf_attr { }; __u32 next_id; }; + + struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ + __u32 bpf_fd; + __u32 info_len; + __aligned_u64 info; + } info; } __attribute__((aligned(8))); /* BPF helper function descriptions: @@ -683,4 +690,25 @@ struct xdp_md { __u32 data_end; }; +#define BPF_TAG_SIZE 8 + +struct bpf_prog_info { + __u32 type; + __u32 id; + __u8 tag[BPF_TAG_SIZE]; + __u32 jited_prog_len; + __u32 xlated_prog_len; + __aligned_u64 jited_prog_insns; + __aligned_u64 xlated_prog_insns; +} __attribute__((aligned(8))); + +struct bpf_map_info { + __u32 type; + __u32 id; + __u32 key_size; + __u32 value_size; + __u32 max_entries; + __u32 map_flags; +} __attribute__((aligned(8))); + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f524431dd6ce..b21af69bf57b 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1237,6 +1237,145 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr) return fd; } +static int check_uarg_tail_zero(void __user *uaddr, + size_t expected_size, + size_t actual_size) +{ + unsigned char __user *addr; + unsigned char __user *end; + unsigned char val; + int err; + + if (actual_size <= expected_size) + return 0; + + addr = uaddr + expected_size; + end = uaddr + actual_size; + + for (; addr < end; addr++) { + err = get_user(val, addr); + if (err) + return err; + if (val) + return -E2BIG; + } + + return 0; +} + +static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, + const union bpf_attr *attr, + union bpf_attr *uattr) +{ + struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info); + struct bpf_prog_info info = {}; + u32 info_len = attr->info.info_len; + char __user *uinsns; + u32 ulen; + int err; + + err = check_uarg_tail_zero(uinfo, sizeof(info), info_len); + if (err) + return err; + info_len = min_t(u32, sizeof(info), info_len); + + if (copy_from_user(&info, uinfo, info_len)) + return err; + + info.type = prog->type; + info.id = prog->aux->id; + + memcpy(info.tag, prog->tag, sizeof(prog->tag)); + + if (!capable(CAP_SYS_ADMIN)) { + info.jited_prog_len = 0; + info.xlated_prog_len = 0; + goto done; + } + + ulen = info.jited_prog_len; + info.jited_prog_len = prog->jited_len; + if (info.jited_prog_len && ulen) { + uinsns = u64_to_user_ptr(info.jited_prog_insns); + ulen = min_t(u32, info.jited_prog_len, ulen); + if (copy_to_user(uinsns, prog->bpf_func, ulen)) + return -EFAULT; + } + + ulen = info.xlated_prog_len; + info.xlated_prog_le
[PATCH net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID
Add BPF_PROG_GET_FD_BY_ID command to allow user to get a fd from a bpf_prog's ID. bpf_prog_inc_not_zero() is added and is called with prog_idr_lock held. __bpf_prog_put() is also added which has the 'bool do_idr_lock' param to decide if the prog_idr_lock should be acquired when freeing the prog->id. In the error path of bpf_prog_inc_not_zero(), it may have to call __bpf_prog_put(map, false) which does not need to take the prog_idr_lock when freeing the prog->id. It is currently limited to CAP_SYS_ADMIN which we can consider to lift it in followup patches. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 8 +++-- kernel/bpf/syscall.c | 91 ++-- 2 files changed, 87 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e5c88f39bdc3..6d4e1cc5bd18 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -84,6 +84,7 @@ enum bpf_cmd { BPF_PROG_TEST_RUN, BPF_PROG_GET_NEXT_ID, BPF_MAP_GET_NEXT_ID, + BPF_PROG_GET_FD_BY_ID, }; enum bpf_map_type { @@ -212,8 +213,11 @@ union bpf_attr { __u32 duration; } test; - struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */ - __u32 start_id; + struct { /* anonymous struct used by BPF_*_GET_*_ID */ + union { + __u32 start_id; + __u32 prog_id; + }; __u32 next_id; }; } __attribute__((aligned(8))); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 34dc92b3c349..06d146f54ffb 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -703,15 +703,23 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog) return id > 0 ? 0 : id; } -static void bpf_prog_free_id(struct bpf_prog *prog) +static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock) { /* cBPF to eBPF migrations are currently not in the idr store. */ if (!prog->aux->id) return; - spin_lock_bh(&prog_idr_lock); + if (do_idr_lock) + spin_lock_bh(&prog_idr_lock); + else + __acquire(&prog_idr_lock); + idr_remove(&prog_idr, prog->aux->id); - spin_unlock_bh(&prog_idr_lock); + + if (do_idr_lock) + spin_unlock_bh(&prog_idr_lock); + else + __release(&prog_idr_lock); } static void __bpf_prog_put_rcu(struct rcu_head *rcu) @@ -723,16 +731,21 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu) bpf_prog_free(aux->prog); } -void bpf_prog_put(struct bpf_prog *prog) +static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); /* bpf_prog_free_id() must be called first */ - bpf_prog_free_id(prog); + bpf_prog_free_id(prog, do_idr_lock); bpf_prog_kallsyms_del(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } } + +void bpf_prog_put(struct bpf_prog *prog) +{ + __bpf_prog_put(prog, true); +} EXPORT_SYMBOL_GPL(bpf_prog_put); static int bpf_prog_release(struct inode *inode, struct file *filp) @@ -814,6 +827,24 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) } EXPORT_SYMBOL_GPL(bpf_prog_inc); +static struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog) +{ + int refold; + + WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock)); + refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0); + + if (refold >= BPF_MAX_REFCNT) { + __bpf_prog_put(prog, false); + return ERR_PTR(-EBUSY); + } + + if (!refold) + return ERR_PTR(-ENOENT); + + return prog; +} + static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type) { struct fd f = fdget(ufd); @@ -926,16 +957,21 @@ static int bpf_prog_load(union bpf_attr *attr) goto free_used_maps; err = bpf_prog_new_fd(prog); - if (err < 0) - /* failed to allocate fd */ - goto free_id; + if (err < 0) { + /* failed to allocate fd. +* bpf_prog_put() is needed because the above +* bpf_prog_alloc_id() has published the prog +* to the userspace and the userspace may +* have refcnt-ed it through BPF_PROG_GET_FD_BY_ID. +*/ + bpf_prog_put(prog); + return err; + } bpf_prog_kallsyms_add(prog); trace_bpf_prog_load(prog, err); return err; -free_id: - bpf_prog_free_id(prog); free_used_maps: free_used_maps(prog->aux); free_prog: @@ -1097,6 +1133,38 @@ static int bpf_obj_get_ne
[PATCH net-next 0/8] Introduce bpf ID
This patch series: 1) Introduce ID for both bpf_prog and bpf_map. 2) Add bpf commands to iterate the prog IDs and map IDs of the system. 3) Add bpf commands to get a prog/map fd from an ID 4) Add bpf command to get prog/map info from a fd. The prog/map info is a jump start in this patchset and it is not meant to be a complete list. They can be extended in the future patches. Martin KaFai Lau (8): bpf: Introduce bpf_prog ID bpf: Introduce bpf_map ID bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command bpf: Add BPF_PROG_GET_FD_BY_ID bpf: Add BPF_MAP_GET_FD_BY_ID bpf: Add jited_len to struct bpf_prog bpf: Add BPF_OBJ_GET_INFO_BY_FD bpf: Test for bpf ID arch/arm64/net/bpf_jit_comp.c | 1 + arch/powerpc/net/bpf_jit_comp64.c | 1 + arch/s390/net/bpf_jit_comp.c | 1 + arch/sparc/net/bpf_jit_comp_64.c | 1 + arch/x86/net/bpf_jit_comp.c | 1 + include/linux/bpf.h | 2 + include/linux/filter.h| 3 +- include/uapi/linux/bpf.h | 41 +++ kernel/bpf/syscall.c | 433 -- tools/include/uapi/linux/bpf.h| 41 +++ tools/lib/bpf/bpf.c | 68 + tools/lib/bpf/bpf.h | 5 + tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/test_obj_id.c | 35 +++ tools/testing/selftests/bpf/test_progs.c | 191 + 15 files changed, 798 insertions(+), 28 deletions(-) create mode 100644 tools/testing/selftests/bpf/test_obj_id.c -- 2.9.3
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
> - past the initial setup, if we start creating bridge devices and so on, > we have no way to tell: group Ports 0-3 together and send traffic to CPU > port 0, then let Port 5 alone and send traffic to CPU port 1, that's a > DSA-only problem though, because we still have the CPU port(s) as > independent network interfaces. What is the problem here? Frames come out the master interface, get untagged and passed to the slave interface and go upto the bridge. It should all just work. Same in the reverse direction. In order to make best use of the extra bandwidth of having two cpu ports, i probably want the user ports reasonably evenly distributed between the CPU ports. Dedicating one CPU port to one user port is probably sub-optimal. How many people have 1Gbps Fibre to the home, which could fully utilise a one-to-one mapping for the WAN port? > Now, that would still force the user to configure two bridges in order > to properly steer traffic towards the requested ports but it would allow > us to be very flexible (which is probably desired here) in how ports are > grouped together. We want a sensible default, spreading the slave ports evenly over the CPU ports. We could add a devlink command to change the defaults at runtime. Andrew
[PATCH net-next] bnxt_en: Fix xmit_more with BQL.
We need to write the doorbell if BQL has stopped the queue and skb->xmit_more is set. Otherwise it is possible for the tx queue to rot and cause tx timeout. Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.") Suggested-by: Yuval Mintz Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 954758f..c1cd72a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -463,7 +463,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) prod = NEXT_TX(prod); txr->tx_prod = prod; - if (!skb->xmit_more) + if (!skb->xmit_more || netif_xmit_stopped(txq)) bnxt_db_write(bp, txr->tx_doorbell, DB_KEY_TX | prod); tx_done: -- 1.8.3.1
[PATCH v5 net-next 5/5] dsa: add maintainer of Microchip KSZ switches
From: Woojung Huh Adding maintainer of Microchip KSZ switches. Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: Woojung Huh --- MAINTAINERS | 10 ++ 1 file changed, 10 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 42378cf..0fcb5e75 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8479,6 +8479,16 @@ F: drivers/media/platform/atmel/atmel-isc.c F: drivers/media/platform/atmel/atmel-isc-regs.h F: devicetree/bindings/media/atmel-isc.txt +MICROCHIP KSZ SERIES ETHERNET SWITCH DRIVER +M: Woojung Huh +M: Microchip Linux Driver Support +L: netdev@vger.kernel.org +S: Maintained +F: net/dsa/tag_ksz.c +F: drivers/net/dsa/microchip/* +F: include/linux/platform_data/microchip-ksz.h +F: Documentation/devicetree/bindings/net/dsa/ksz.txt + MICROCHIP USB251XB DRIVER M: Richard Leitner L: linux-...@vger.kernel.org -- 2.7.4
[PATCH v5 net-next 2/5] phy: micrel: add Microchip KSZ 9477 Switch PHY support
From: Woojung Huh Adding Microchip 9477 Phy included in KSZ9477 Switch. Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: Woojung Huh --- drivers/net/phy/micrel.c | 11 +++ include/linux/micrel_phy.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 4cfd541..46e80bc 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -20,6 +20,7 @@ *ksz8081, ksz8091, *ksz8061, * Switch : ksz8873, ksz886x + * ksz9477 */ #include @@ -996,6 +997,16 @@ static struct phy_driver ksphy_driver[] = { .read_status= ksz8873mll_read_status, .suspend= genphy_suspend, .resume = genphy_resume, +}, { + .phy_id = PHY_ID_KSZ9477, + .phy_id_mask= MICREL_PHY_ID_MASK, + .name = "Microchip KSZ9477", + .features = PHY_GBIT_FEATURES, + .config_init= kszphy_config_init, + .config_aneg= genphy_config_aneg, + .read_status= genphy_read_status, + .suspend= genphy_suspend, + .resume = genphy_resume, } }; module_phy_driver(ksphy_driver); diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index f541da6..472fa4d 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -37,6 +37,8 @@ #define PHY_ID_KSZ8795 0x00221550 +#definePHY_ID_KSZ9477 0x00221631 + /* struct phy_device dev_flags definitions */ #define MICREL_PHY_50MHZ_CLK 0x0001 #define MICREL_PHY_FXEN0x0002 -- 2.7.4
[PATCH v5 net-next 1/5] dsa: add support for Microchip KSZ tail tagging
From: Woojung Huh Adding support for the Microchip KSZ switch family tail tagging. Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: Woojung Huh --- include/net/dsa.h | 1 + net/dsa/Kconfig| 3 ++ net/dsa/Makefile | 1 + net/dsa/dsa.c | 3 ++ net/dsa/dsa_priv.h | 3 ++ net/dsa/tag_ksz.c | 101 + 6 files changed, 112 insertions(+) create mode 100644 net/dsa/tag_ksz.c diff --git a/include/net/dsa.h b/include/net/dsa.h index c0e567c..42c28ae 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -31,6 +31,7 @@ enum dsa_tag_protocol { DSA_TAG_PROTO_BRCM, DSA_TAG_PROTO_DSA, DSA_TAG_PROTO_EDSA, + DSA_TAG_PROTO_KSZ, DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_MTK, DSA_TAG_PROTO_QCA, diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig index 297389b..cc5f8f9 100644 --- a/net/dsa/Kconfig +++ b/net/dsa/Kconfig @@ -25,6 +25,9 @@ config NET_DSA_TAG_DSA config NET_DSA_TAG_EDSA bool +config NET_DSA_TAG_KSZ + bool + config NET_DSA_TAG_LAN9303 bool diff --git a/net/dsa/Makefile b/net/dsa/Makefile index 90e5aa6..fcce25d 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -6,6 +6,7 @@ dsa_core-y += dsa.o dsa2.o legacy.o port.o slave.o switch.o dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o +dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 3288a80..402459e 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -49,6 +49,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = { #ifdef CONFIG_NET_DSA_TAG_EDSA [DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops, #endif +#ifdef CONFIG_NET_DSA_TAG_KSZ + [DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops, +#endif #ifdef CONFIG_NET_DSA_TAG_LAN9303 [DSA_TAG_PROTO_LAN9303] = &lan9303_netdev_ops, #endif diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index c1d4180..7459d57 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -167,6 +167,9 @@ extern const struct dsa_device_ops dsa_netdev_ops; /* tag_edsa.c */ extern const struct dsa_device_ops edsa_netdev_ops; +/* tag_ksz.c */ +extern const struct dsa_device_ops ksz_netdev_ops; + /* tag_lan9303.c */ extern const struct dsa_device_ops lan9303_netdev_ops; diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c new file mode 100644 index 000..0b08a40 --- /dev/null +++ b/net/dsa/tag_ksz.c @@ -0,0 +1,101 @@ +/* + * net/dsa/tag_ksz.c - Microchip KSZ Switch tag format handling + * Copyright (c) 2017 Microchip Technology + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include "dsa_priv.h" + +/* For Ingress (Host -> KSZ), 2 bytes are added before FCS. + * --- + * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) + * --- + * tag0 : Prioritization (not used now) + * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5) + * + * For Egress (KSZ -> Host), 1 byte is added before FCS. + * --- + * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes) + * --- + * tag0 : zero-based value represents port + * (eg, 0x00=port1, 0x02=port3, 0x06=port7) + */ + +#defineKSZ_INGRESS_TAG_LEN 2 +#defineKSZ_EGRESS_TAG_LEN 1 + +static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct dsa_slave_priv *p = netdev_priv(dev); + struct sk_buff *nskb; + int padlen; + u8 *tag; + + padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len; + + if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) { + nskb = skb; + } else { + nskb = alloc_skb(NET_IP_ALIGN + skb->len + +padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC); + if (!nskb) { + kfree_skb(skb); + return NULL; + } + skb_reserve(nskb, NET_IP_ALIGN); + + skb_reset_mac_header(nskb); + skb_set_network_header(nskb, + skb_network_header(skb) - skb->head); + skb_set_transport_header(nskb, +
[PATCH v5 net-next 4/5] net: dsa: Add Microchip KSZ switches binding
From: Woojung Huh A sample SPI configuration for Microchip KSZ switches. Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: Woojung Huh --- Documentation/devicetree/bindings/net/dsa/ksz.txt | 72 +++ 1 file changed, 72 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt b/Documentation/devicetree/bindings/net/dsa/ksz.txt new file mode 100644 index 000..9acdb6c --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt @@ -0,0 +1,72 @@ +Microchip KSZ Series Ethernet switches +== + +Required properties: + +- compatible: For external switch chips, compatible string must be exactly one + of: "microchip,ksz9477" + +See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional +required and optional properties. + +Examples: + +Ethernet switch connected via SPI to the host, CPU port wired to eth0: + + eth0: ethernet@10001000 { + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + + spi1: spi@f8008000 { + pinctrl-0 = <&pinctrl_spi_ksz>; + cs-gpios = <&pioC 25 0>; + id = <1>; + status = "okay"; + + ksz9477: ksz9477@0 { + compatible = "microchip,ksz9477"; + reg = <0>; + + spi-max-frequency = <4400>; + spi-cpha; + spi-cpol; + + status = "okay"; + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "lan1"; + }; + port@1 { + reg = <1>; + label = "lan2"; + }; + port@2 { + reg = <2>; + label = "lan3"; + }; + port@3 { + reg = <3>; + label = "lan4"; + }; + port@4 { + reg = <4>; + label = "lan5"; + }; + port@5 { + reg = <5>; + label = "cpu"; + ethernet = <ð0>; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + }; + }; -- 2.7.4
[PATCH v5 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477
From: Woojung Huh The KSZ9477 is a fully integrated layer 2, managed, 7 ports GigE switch with numerous advanced features. 5 ports incorporate 10/100/1000 Mbps PHYs. The other 2 ports have interfaces that can be configured as SGMII, RGMII, MII or RMII. Either of these may connect directly to a host processor or to an external PHY. The SGMII port may interface to a fiber optic transceiver. This driver currently supports vlan, fdb, mdb & mirror dsa switch operations. Reviewed-by: Florian Fainelli Signed-off-by: Woojung Huh --- drivers/net/dsa/Kconfig |2 + drivers/net/dsa/Makefile|1 + drivers/net/dsa/microchip/Kconfig | 12 + drivers/net/dsa/microchip/Makefile |2 + drivers/net/dsa/microchip/ksz_9477_reg.h| 1676 +++ drivers/net/dsa/microchip/ksz_common.c | 1278 drivers/net/dsa/microchip/ksz_priv.h| 210 drivers/net/dsa/microchip/ksz_spi.c | 215 include/linux/platform_data/microchip-ksz.h | 29 + 9 files changed, 3425 insertions(+) create mode 100644 drivers/net/dsa/microchip/Kconfig create mode 100644 drivers/net/dsa/microchip/Makefile create mode 100644 drivers/net/dsa/microchip/ksz_9477_reg.h create mode 100644 drivers/net/dsa/microchip/ksz_common.c create mode 100644 drivers/net/dsa/microchip/ksz_priv.h create mode 100644 drivers/net/dsa/microchip/ksz_spi.c create mode 100644 include/linux/platform_data/microchip-ksz.h diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig index 68131a4..83a9bc8 100644 --- a/drivers/net/dsa/Kconfig +++ b/drivers/net/dsa/Kconfig @@ -39,6 +39,8 @@ config NET_DSA_MV88E6060 This enables support for the Marvell 88E6060 ethernet switch chip. +source "drivers/net/dsa/microchip/Kconfig" + source "drivers/net/dsa/mv88e6xxx/Kconfig" config NET_DSA_QCA8K diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile index 9613f36..4a5b5bd 100644 --- a/drivers/net/dsa/Makefile +++ b/drivers/net/dsa/Makefile @@ -8,4 +8,5 @@ obj-$(CONFIG_NET_DSA_SMSC_LAN9303) += lan9303-core.o obj-$(CONFIG_NET_DSA_SMSC_LAN9303_I2C) += lan9303_i2c.o obj-$(CONFIG_NET_DSA_SMSC_LAN9303_MDIO) += lan9303_mdio.o obj-y += b53/ +obj-y += microchip/ obj-y += mv88e6xxx/ diff --git a/drivers/net/dsa/microchip/Kconfig b/drivers/net/dsa/microchip/Kconfig new file mode 100644 index 000..a8b8f59 --- /dev/null +++ b/drivers/net/dsa/microchip/Kconfig @@ -0,0 +1,12 @@ +menuconfig MICROCHIP_KSZ + tristate "Microchip KSZ series switch support" + depends on NET_DSA + select NET_DSA_TAG_KSZ + help + This driver adds support for Microchip KSZ switch chips. + +config MICROCHIP_KSZ_SPI_DRIVER + tristate "KSZ series SPI connected switch driver" + depends on MICROCHIP_KSZ && SPI + help + Select to enable support for registering switches configured through SPI. diff --git a/drivers/net/dsa/microchip/Makefile b/drivers/net/dsa/microchip/Makefile new file mode 100644 index 000..ed335e2 --- /dev/null +++ b/drivers/net/dsa/microchip/Makefile @@ -0,0 +1,2 @@ +obj-$(CONFIG_MICROCHIP_KSZ)+= ksz_common.o +obj-$(CONFIG_MICROCHIP_KSZ_SPI_DRIVER) += ksz_spi.o diff --git a/drivers/net/dsa/microchip/ksz_9477_reg.h b/drivers/net/dsa/microchip/ksz_9477_reg.h new file mode 100644 index 000..6aa6752 --- /dev/null +++ b/drivers/net/dsa/microchip/ksz_9477_reg.h @@ -0,0 +1,1676 @@ +/* + * Microchip KSZ9477 register definitions + * + * Copyright (C) 2017 + * + * Permission to use, copy, modify, and/or distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#ifndef __KSZ9477_REGS_H +#define __KSZ9477_REGS_H + +#define KS_PRIO_M 0x7 +#define KS_PRIO_S 4 + +/* 0 - Operation */ +#define REG_CHIP_ID0__10x + +#define REG_CHIP_ID1__10x0001 + +#define FAMILY_ID 0x95 +#define FAMILY_ID_94 0x94 +#define FAMILY_ID_95 0x95 +#define FAMILY_ID_85 0x85 +#define FAMILY_ID_98 0x98 +#define FAMILY_ID_88 0x88 + +#define REG_CHIP_ID2__10x000
[PATCH v5 net-next 0/5] dsa: add Microchip KSZ9477 DSA driver
From: Woojung Huh This series of patches is for Microchip KSZ9477 DSA driver. KSZ9477 is 7 ports GigE switch with numerous advanced features. 5 ports are 10/100/1000 Mbps internal PHYs and 2 ports have Interfaces to SGMII, RGMII, MII or RMII. This patch supports VLAN, MDB, FDB and port mirroring offloads. Welcome reviews and comments from community. Note: Tests are performed on internal development board. V5 - drivers/net/dsa/microchip/ksz_common.c * remove wrong mutex_unlock() in ksz_port_mdb_del() V4 - update per review comments - cosmetic changes - net/dsa/tag_ksz.c * skb_put() & memset() are changed to skb_put_padto() - drivers/net/dsa/microchip/ksz_common. * vlan access mutex is updated * mib_names[] is changed to static const V3 - update per review comments - cosmetic changes - drivers/net/dsa/microchip/ksz_common.c * clean up ksz_switch_chips[] * consolidate checking loops into functions * update mutex for better locking * replace devm_kmalloc_array() to devm_kcalloc() - MAINTAINERS * add missing net/dsa/tag_ksz.c V2 - update per review comments - several cosmetic changes - net/dsa/tag_ksz.c * constants are changed to defines * remove skb_linearize() in ksz_rcv() * ksz_xmit()checks skb tailroom before allocate new skb - drivers/net/phy/micrel.c * remove PHY_HAS_MAGICANEG from ksphy_driver[] - drivers/net/dsa/microchip/ksz_common.c * add timeout to avoid endless loop * port initialization is move to ksz_port_enable() instead of ksz_setup_ports() - Documentation/devicetree/bindings/net/dsa/ksz.txt * fix typo and indentations Woojung Huh (5): dsa: add support for Microchip KSZ tail tagging phy: micrel: add Microchip KSZ 9477 Switch PHY support dsa: add DSA switch driver for Microchip KSZ9477 net: dsa: Add Microchip KSZ switches binding dsa: add maintainer of Microchip KSZ switches Documentation/devicetree/bindings/net/dsa/ksz.txt | 72 + MAINTAINERS | 10 + drivers/net/dsa/Kconfig |2 + drivers/net/dsa/Makefile |1 + drivers/net/dsa/microchip/Kconfig | 12 + drivers/net/dsa/microchip/Makefile|2 + drivers/net/dsa/microchip/ksz_9477_reg.h | 1676 + drivers/net/dsa/microchip/ksz_common.c| 1278 drivers/net/dsa/microchip/ksz_priv.h | 210 +++ drivers/net/dsa/microchip/ksz_spi.c | 215 +++ drivers/net/phy/micrel.c | 11 + include/linux/micrel_phy.h|2 + include/linux/platform_data/microchip-ksz.h | 29 + include/net/dsa.h |1 + net/dsa/Kconfig |3 + net/dsa/Makefile |1 + net/dsa/dsa.c |3 + net/dsa/dsa_priv.h|3 + net/dsa/tag_ksz.c | 101 ++ 19 files changed, 3632 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt create mode 100644 drivers/net/dsa/microchip/Kconfig create mode 100644 drivers/net/dsa/microchip/Makefile create mode 100644 drivers/net/dsa/microchip/ksz_9477_reg.h create mode 100644 drivers/net/dsa/microchip/ksz_common.c create mode 100644 drivers/net/dsa/microchip/ksz_priv.h create mode 100644 drivers/net/dsa/microchip/ksz_spi.c create mode 100644 include/linux/platform_data/microchip-ksz.h create mode 100644 net/dsa/tag_ksz.c -- 2.7.4
Re: loosing netdevices with namespaces and unshare?
On 5/30/17 4:07 PM, Harald Welte wrote: > In case you're wondering what I'm actually trying to achieve: Find > an easy way to run a single program in an isolated namespace that only > has one physical (usb) ethernet device. I would like to execute that > program as unprivileged user but still be able to bind to privileged > ports. And I want to do this using simple command-line tools without > all the bloat and overhead of "container" solutions that have 99% of > features I don't need. But let that not distract you, I think the > mysteriously disappearing netdevices are a more general and important > issue. An alternative approach is to create a bridge and add the usb ethernet device to it. As you want to launch a program, create a veth pair. Put one end into the bridge, and the other end into the new network namespace. All of this can be scripted quite easily with 'ip' - including configuring the veth device pushed into the namespace and running the command. Use unshare for the other namespaces.
Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"
On (05/30/17 16:20), Stephen Hemminger wrote: > > Please don't copy/paste chunks of code. Instead refactor and make this > into a helper function. sure, I have no problems with that, and as I pointed out, I've not tested ipv6 for this yet either. I'll do all of this after getting some feedback on the more basic issue here: I was first looking for comments on the more fundamental refcnt management behind the fix (I'm surprised no one noticed this before, is there some deep reason for leaving it like this, that I am missing? Does it break something else?) And fwiw I was merging pieces of ___neigh_lookup_noref, which figures out the hash val, and neigh_flush_dev/neigh_forced_gc
Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"
On Mon, 29 May 2017 20:48:16 -0700 Sowmini Varadhan wrote: > + np = &nht->hash_buckets[hash_val]; > + while ((n = rcu_dereference_protected(*np, > + lockdep_is_held(&tbl->lock))) != NULL) { > + write_lock(&n->lock); > + if (n == ndel) { > + bool retval = false; > + > + if (atomic_read(&n->refcnt) == 1) { > + rcu_assign_pointer(*np, > + rcu_dereference_protected(n->next, > + lockdep_is_held(&tbl->lock))); > + n->dead = 1; > + retval = true; > + } > + write_unlock(&n->lock); > + if (retval) > + neigh_cleanup_and_release(n); > + write_unlock_bh(&tbl->lock); > + return retval; > + } > + write_unlock(&n->lock); > + np = &n->next; > + } > + Please don't copy/paste chunks of code. Instead refactor and make this into a helper function.
Re: loosing netdevices with namespaces and unshare?
On Tue, May 30, 2017 at 3:07 PM, Harald Welte wrote: > But, to the contrary, this doesn't happen. The unshare-created netns is > gone, but the netdevice did not get moved back to the root namespace > either. The only hack to get back to the "eth0" device is to unload the > driver and re-load it. Net namespace simply unregisters all netdevices inside when it is gone, no matter where they are from. I am pretty sure you can move it back to root-ns if you want, it is a little tricky because you have to give the root-ns a name first. > > I can reproduce the above without starting any other process inside that > namespace. I have verified that there are no /proc/*/ns/net symlinks > left pointing to the ID of that namespace. What am I missing here? Is > this the intended behavior? Yes it is. > > Of course I know I could simply do something like "ip link set eth0 > netns 1" from within the namespace before leaving. But what if the > process is not bash and the process exits abnormally? I'd consider > that explicit reassignment more like a hack than a proper solution... It doesn't make sense to move it back to where it is from, for example, what if you move a veth0 from netns1 to netns2 and netns1 is gone before netns2? Regards.
[PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"
Noticed during some testing: the command # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2 adds an entry like the following (listed by "arp -an") ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2 but the symmetric deletion command # arp -i eth2 -d 62.2.0.1 does not remove the PERM entry from the table, and instead leaves behind ? (62.2.0.1) at on eth2 The reason is that there is a refcnt of 1 for the arp_tbl itself (neigh_alloc starts off the entry with a refcnt of 1), thus the neigh_release() call from arp_invalidate() will (at best) just decrement the ref to 1, but will never actually free it from the table. To fix this, we need to do something like neigh_forced_gc: if the refcnt is 1 (i.e., on the table's ref), remove the entry from the table and free it. We may need something symmetric for IPv6- I was going to check into that, after getting some feedback on this RFC patch. Signed-off-by: Sowmini Varadhan --- include/net/neighbour.h |1 + net/core/neighbour.c| 42 ++ net/ipv4/arp.c |1 + 3 files changed, 44 insertions(+), 0 deletions(-) diff --git a/include/net/neighbour.h b/include/net/neighbour.h index e4dd3a2..639b675 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -317,6 +317,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey, int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags, u32 nlmsg_pid); void __neigh_set_probe_once(struct neighbour *neigh); +bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev); int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb); diff --git a/net/core/neighbour.c b/net/core/neighbour.c index d274f81..0a09f6f 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -117,6 +117,48 @@ unsigned long neigh_rand_reach_time(unsigned long base) } EXPORT_SYMBOL(neigh_rand_reach_time); +bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl) +{ + struct neigh_hash_table *nht; + void *pkey = ndel->primary_key; + u32 hash_val; + struct neighbour *n; + struct neighbour __rcu **np; + + write_lock_bh(&tbl->lock); + nht = rcu_dereference_protected(tbl->nht, + lockdep_is_held(&tbl->lock)); + hash_val = tbl->hash(pkey, ndel->dev, nht->hash_rnd); + hash_val = hash_val >> (32 - nht->hash_shift); + + np = &nht->hash_buckets[hash_val]; + while ((n = rcu_dereference_protected(*np, + lockdep_is_held(&tbl->lock))) != NULL) { + write_lock(&n->lock); + if (n == ndel) { + bool retval = false; + + if (atomic_read(&n->refcnt) == 1) { + rcu_assign_pointer(*np, + rcu_dereference_protected(n->next, + lockdep_is_held(&tbl->lock))); + n->dead = 1; + retval = true; + } + write_unlock(&n->lock); + if (retval) + neigh_cleanup_and_release(n); + write_unlock_bh(&tbl->lock); + return retval; + } + write_unlock(&n->lock); + np = &n->next; + } + + write_unlock_bh(&tbl->lock); + return false; +} +EXPORT_SYMBOL(neigh_remove_one); static int neigh_forced_gc(struct neigh_table *tbl) { diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index e9f3386..5264004 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1120,6 +1120,7 @@ static int arp_invalidate(struct net_device *dev, __be32 ip) NEIGH_UPDATE_F_OVERRIDE| NEIGH_UPDATE_F_ADMIN, 0); neigh_release(neigh); + neigh_remove_one(neigh, &arp_tbl); } return err; -- 1.7.1
Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume
On Tue, 2017-05-30 at 15:19 -0700, Florian Fainelli wrote: > On 05/30/2017 03:08 PM, Leonard Crestez wrote: > > On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote: > > > Should not we actually call kszphy_config_init() in order to restore > > > broadcast and nand disable bits as well? > > > > I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and > > NAND_TREE_ON is already off by the time it gets to linux. > > > > The bit that get lost seem to disappear just as the phy is resumed. I > > added some prints and they look like this: > > > > PM: early resume of devices complete after 6.534 msecs > > begin resume > > 0x1F=0x8190 0x16=0x202 > > after genphy_resume 0x1F=0x8100 0x16=0x202 > > end > > resume 0x1F=0x8190 0x16=0x202 > > OK, so it actually would not hurt to call config_init() again here, right? No, but if only some registers are lost then why reconfigure others? In particular it seems that only MII_KSZPHY_CTRL_2 is lost but not MII_KSZPHY_OMSO. However a few extra phy reads don't really matter. Calling config_init is the path of least resistance. Another problem is that the ksz9031 driver uses kszphy_resume but has a completely different init function. The bits I am concerned about do not seem to exist in hardware but it does something completely unrelated with parsing OF properties. Should I split this into ksz8021_resume and ksz9031_resume? It seems likely that more of the kszphy drivers need suspend/resume code but nobody got around to testing them. > > > If not, I would be more comfortable if we did create a specific function > > > that takes care of setting the reference clock and LED mode. > > > > Ok, I can add a function called kszphy_config_reset() with a comment > > explaining it's for bits lost on reset/resume. > > > > Or perhaps a better option would be to just save/restore the entire > > 0x1F register? > > Register 0 through 15 are standardized, but those after that are not, > and PHYs with a gazillion of registers already need to have a specific > set of suspend/resume sequence due to their proprietary indirect > register scheme, so we cannot quite come up with a generic function that > would cache all 16 or 32 registers and flat out restore those. > > Similarly, having phy_resume() systematically calling into config_init() > is a bit of a stretch and can be pretty inefficient. I'm not suggesting changing this at the phy core level. Just that maybe kszphy_resume specifically could save/restore register 0x1F instead of going through config logic again (which would involve extra reads and writes). However it seems that this has complications, for example on some versions the leds are written to a different register. It's not really worth optimizing to such an extent.
RE: [PATCH v4 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477
Hi Andrew, > > +static int ksz_port_mdb_del(struct dsa_switch *ds, int port, > > + const struct switchdev_obj_port_mdb *mdb) > > +{ > > + struct ksz_device *dev = ds->priv; > > + u32 static_table[4]; > > + u32 data; > > + int index; > > + int ret = 0; > > + u32 mac_hi, mac_lo; > > + > > + mac_hi = ((mdb->addr[0] << 8) | mdb->addr[1]); > > + mac_lo = ((mdb->addr[2] << 24) | (mdb->addr[3] << 16)); > > + mac_lo |= ((mdb->addr[4] << 8) | mdb->addr[5]); > > + > > + mutex_lock(&dev->alu_mutex); > > + > > + for (index = 0; index < dev->num_statics; index++) { > > + /* find empty slot first */ > > + data = (index << ALU_STAT_INDEX_S) | > > + ALU_STAT_READ | ALU_STAT_START; > > + ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data); > > + > > + /* wait to be finished */ > > + ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000); > > + if (ret < 0) { > > + dev_dbg(dev->dev, "Failed to read ALU STATIC\n"); > > + goto exit; > > + } > > + > > + /* read ALU static table */ > > + read_table(ds, static_table); > > + > > + mutex_unlock(&dev->alu_mutex); > > Is this mutex unlock here correct? It looks like we will unlock it > again when we eventually get to exit: below. > Thanks for catching this. Will submit new patch soon. - Woojung
[PATCH] net: freescale: fix potential null pointer dereference
Add NULL check before dereferencing pointer _id_ in order to avoid a potential NULL pointer dereference. Addresses-Coverity-ID: 1397995 Signed-off-by: Gustavo A. R. Silva --- drivers/net/ethernet/freescale/fsl_pq_mdio.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/fsl_pq_mdio.c b/drivers/net/ethernet/freescale/fsl_pq_mdio.c index 446c7b3..a10de1e 100644 --- a/drivers/net/ethernet/freescale/fsl_pq_mdio.c +++ b/drivers/net/ethernet/freescale/fsl_pq_mdio.c @@ -381,7 +381,7 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev) { const struct of_device_id *id = of_match_device(fsl_pq_mdio_match, &pdev->dev); - const struct fsl_pq_mdio_data *data = id->data; + const struct fsl_pq_mdio_data *data; struct device_node *np = pdev->dev.of_node; struct resource res; struct device_node *tbi; @@ -389,6 +389,13 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev) struct mii_bus *new_bus; int err; + if (!id) { + dev_err(&pdev->dev, "Failed to match device\n"); + return -ENODEV; + } + + data = id->data; + dev_dbg(&pdev->dev, "found %s compatible node\n", id->compatible); new_bus = mdiobus_alloc_size(sizeof(*priv)); -- 2.5.0
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
+Jiri, Ido, On 05/30/2017 03:44 AM, John Crispin wrote: > Some boards have two CPU interfaces connected to the switch, e.g. WiFi > access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and > two port connected to the SoC. > > This patch extends DSA to allows both CPU ports to be used. The "cpu" > node in the DSA tree can now have a phandle to the host interface it > connects to. Each user port can have a phandle to a cpu port which > should be used for traffic between the port and the CPU. Thus simple > load sharing over the two CPU ports can be achieved. Part of the problem here is that: - we have the initial state where we only allow the user-ports to be talking to the CPU port, now we have more than one CPU port, so which one do we choose? You have proposed a Device Tree change for that, and while this works, we are going beyond pure HW description and we are already describing a desired policy, not ideal - past the initial setup, if we start creating bridge devices and so on, we have no way to tell: group Ports 0-3 together and send traffic to CPU port 0, then let Port 5 alone and send traffic to CPU port 1, that's a DSA-only problem though, because we still have the CPU port(s) as independent network interfaces. Right now, we cannot enslave master network devices into the bridge when a tagging protocol is used (see 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e) so we cannot quite associate a bridge device with its underlying CPU port. I suppose we could revise that commit and let the enslaving happen in order for the switch drivers to "learn" which CPU/master network device is being added to the bridge, and just not register the RX handler for that interface. Now, that would still force the user to configure two bridges in order to properly steer traffic towards the requested ports but it would allow us to be very flexible (which is probably desired here) in how ports are grouped together. Does that make sense? > > Signed-off-by: John Crispin > --- > include/net/dsa.h | 21 - > net/dsa/dsa2.c | 35 +++ > net/dsa/dsa_priv.h | 1 + > net/dsa/slave.c| 26 -- > 4 files changed, 68 insertions(+), 15 deletions(-) > > diff --git a/include/net/dsa.h b/include/net/dsa.h > index c0e567c0c824..d2994bd2c507 100644 > --- a/include/net/dsa.h > +++ b/include/net/dsa.h > @@ -186,6 +186,8 @@ struct dsa_port { > u8 stp_state; > struct net_device *bridge_dev; > struct devlink_port devlink_port; > + struct net_device *ethernet; > + int upstream; > }; > > struct dsa_switch { > @@ -251,7 +253,7 @@ struct dsa_switch { > > static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p) > { > - return ds->dst->cpu_dp == &ds->ports[p]; > + return !!(ds->cpu_port_mask & (1 << p)); > } > > static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p) > @@ -269,6 +271,11 @@ static inline bool dsa_is_port_initialized(struct > dsa_switch *ds, int p) > return ds->enabled_port_mask & (1 << p) && ds->ports[p].netdev; > } > > +static inline bool dsa_is_upstream_port(struct dsa_switch *ds, int p) > +{ > + return dsa_is_cpu_port(ds, p) || dsa_is_dsa_port(ds, p); > +} > + > static inline u8 dsa_upstream_port(struct dsa_switch *ds) > { > struct dsa_switch_tree *dst = ds->dst; > @@ -285,6 +292,18 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds) > return ds->rtable[dst->cpu_dp->ds->index]; > } > > +static inline u8 dsa_port_upstream_port(struct dsa_switch *ds, int port) > +{ > + /* > + * If this port has a specific upstream cpu port, use it, > + * otherwise use the switch default. > + */ > + if (ds->ports[port].upstream) > + return ds->ports[port].upstream; > + else > + return dsa_upstream_port(ds); > +} > + > struct dsa_switch_ops { > /* >* Legacy probing. > diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c > index 4301f52e4f5a..8b13aa735c40 100644 > --- a/net/dsa/dsa2.c > +++ b/net/dsa/dsa2.c > @@ -253,8 +253,6 @@ static int dsa_cpu_port_apply(struct dsa_port *port, u32 > index, > return err; > } > > - ds->cpu_port_mask |= BIT(index); > - > memset(&ds->ports[index].devlink_port, 0, > sizeof(ds->ports[index].devlink_port)); > err = devlink_port_register(ds->devlink, &ds->ports[index].devlink_port, > @@ -269,6 +267,10 @@ static void dsa_cpu_port_unapply(struct dsa_port *port, > u32 index, > dsa_cpu_dsa_destroy(port); > ds->cpu_port_mask &= ~BIT(index); > > + if (ds->ports[index].ethernet) { > + dev_put(ds->ports[index].ethernet); > + ds->ports[index].ethernet = NULL; > + } > } > > static int dsa_user_port_apply(struct dsa_port *port, u32 index, > @@ -530,6 +532,29 @@ static int dsa_cpu_parse(struct dsa_port *port, u32
Suggest stable pick for nfnetlink_queue memory leak
Hi, I was running kmemleak on a kernel based on 4.4.64 and I noticed that my memory leak was fixed by commit b18bcb0019cf ("netfilter: nfnetlink_queue: fix memory leak when attach expectation successfully"). It seems like perhaps this should go to linux-stable? It's not a huge leak, but the fix is simple. It looks as if the leak has existed since commit bd0779370588 ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Thanks! -Doug
Re: [PATCH v4 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477
> +static int ksz_port_mdb_del(struct dsa_switch *ds, int port, > + const struct switchdev_obj_port_mdb *mdb) > +{ > + struct ksz_device *dev = ds->priv; > + u32 static_table[4]; > + u32 data; > + int index; > + int ret = 0; > + u32 mac_hi, mac_lo; > + > + mac_hi = ((mdb->addr[0] << 8) | mdb->addr[1]); > + mac_lo = ((mdb->addr[2] << 24) | (mdb->addr[3] << 16)); > + mac_lo |= ((mdb->addr[4] << 8) | mdb->addr[5]); > + > + mutex_lock(&dev->alu_mutex); > + > + for (index = 0; index < dev->num_statics; index++) { > + /* find empty slot first */ > + data = (index << ALU_STAT_INDEX_S) | > + ALU_STAT_READ | ALU_STAT_START; > + ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data); > + > + /* wait to be finished */ > + ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000); > + if (ret < 0) { > + dev_dbg(dev->dev, "Failed to read ALU STATIC\n"); > + goto exit; > + } > + > + /* read ALU static table */ > + read_table(ds, static_table); > + > + mutex_unlock(&dev->alu_mutex); Is this mutex unlock here correct? It looks like we will unlock it again when we eventually get to exit: below. > + > + if (static_table[0] & ALU_V_STATIC_VALID) { > + /* check this has same vid & mac address */ > + > + if (((static_table[2] >> ALU_V_FID_S) == (mdb->vid)) && > + ((static_table[2] & ALU_V_MAC_ADDR_HI) == mac_hi) && > + (static_table[3] == mac_lo)) { > + /* found matching one */ > + break; > + } > + } > + } > + > + /* no available entry */ > + if (index == dev->num_statics) { > + ret = -EINVAL; > + goto exit; > + } > + > + /* clear port */ > + static_table[1] &= ~BIT(port); > + > + if ((static_table[1] & ALU_V_PORT_MAP) == 0) { > + /* delete entry */ > + static_table[0] = 0; > + static_table[1] = 0; > + static_table[2] = 0; > + static_table[3] = 0; > + } > + > + write_table(ds, static_table); > + > + data = (index << ALU_STAT_INDEX_S) | ALU_STAT_START; > + ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data); > + > + /* wait to be finished */ > + ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000); > + if (ret < 0) > + dev_dbg(dev->dev, "Failed to read ALU STATIC\n"); > + > +exit: > + mutex_unlock(&dev->alu_mutex); > + > + return ret; > +}
Re: [PATCH] net: fix auto-loading of Marvell DSA driver
On Tue, May 30, 2017 at 09:38:18PM +0100, Russell King wrote: > Auto-loading of the Marvell DSA driver has stopped working with recent > kernels. This seems to be due to the change of binding for DSA devices, > moving them from the platform bus to the MDIO bus. > > In order for module auto-loading to work, we need to provide a MODALIAS > string in the uevent file for the device. However, the device core does > not automatically provide this, and needs each bus_type to implement a > uevent method to generate these strings. The MDIO bus does not provide > such a method, so no MODALIAS string is provided: > > .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent > DRIVER=mv88e6085 > OF_NAME=switch > OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4 > OF_COMPATIBLE_0=marvell,mv88e6085 > OF_COMPATIBLE_N=1 > > In the case of OF-based devices, the solution is easy - > of_device_uevent_modalias() does the work for us. After this is done, > the uevent file looks like this: > > .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent > DRIVER=mv88e6085 > OF_NAME=switch > OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4 > OF_COMPATIBLE_0=marvell,mv88e6085 > OF_COMPATIBLE_N=1 > MODALIAS=of:NswitchTCmarvell,mv88e6085 > > which results in auto-loading of the Marvell DSA driver on Clearfog > platforms. > > Fixes: c0405563a613 ("ARM: dts: armada-388-clearfog: Utilize new DSA binding") > Signed-off-by: Russell King Reviewed-by: Andrew Lunn Andrew
loosing netdevices with namespaces and unshare?
Hi all, I know I'm kind of late to the party in terms of deeper exploration of Linux network namespaces. Also, I'm not sure if the netdev list is the riight place to ask, but a moderate amount of web searching didn't bring up a solution in multiple hours, and it seems like I could trigger the kernel (4.11.0) to loose netdevices, which I think is a serious issue. What I'm doing: * start a process using the 'unshare' command line tool provided with util-linux, e.g. "unshare -nUr bash". I do this as a non-privileged user but now that is mapped to uid '0' inside the new process/namespace, so I can adjust interface configuration. * I use "echo $$" to get the PID of that bash process. * On another terminal in a root shell, I use "ip link set eth0 netns $PID" in order to move a given physical device into that namespace. * I then "exit" that bash, which should - to my knowledge - return the "eth0" netdev back to the root namespace, as the bash process was the only one using that network namespace But, to the contrary, this doesn't happen. The unshare-created netns is gone, but the netdevice did not get moved back to the root namespace either. The only hack to get back to the "eth0" device is to unload the driver and re-load it. I can reproduce the above without starting any other process inside that namespace. I have verified that there are no /proc/*/ns/net symlinks left pointing to the ID of that namespace. What am I missing here? Is this the intended behavior? Of course I know I could simply do something like "ip link set eth0 netns 1" from within the namespace before leaving. But what if the process is not bash and the process exits abnormally? I'd consider that explicit reassignment more like a hack than a proper solution... Regards, Harald p.s.: In case you're wondering what I'm actually trying to achieve: Find an easy way to run a single program in an isolated namespace that only has one physical (usb) ethernet device. I would like to execute that program as unprivileged user but still be able to bind to privileged ports. And I want to do this using simple command-line tools without all the bloat and overhead of "container" solutions that have 99% of features I don't need. But let that not distract you, I think the mysteriously disappearing netdevices are a more general and important issue. -- - Harald Weltehttp://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH 1/2] ARM: dts: imx6ul-14x14-evk: Add ksz8081 phy properties
On Tue, 2017-05-30 at 11:10 -0700, Florian Fainelli wrote: > On 05/30/2017 10:34 AM, Leonard Crestez wrote: > > Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same > > register values can be set through the micrel phy driver by using dts > > properties. > > > > This seems preferable and allows cleanly fixing suspend/resume. > > > > Signed-off-by: Leonard Crestez > > Should you have a Fixes: tag for that? Sounds like something you'd want > to backport to stable trees as well, no? I don't know if suspend over nfs ever worked on this board in upstream so there is no commit that this patch "Fixes: ". It would make sense to CC stable.
Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume
On 05/30/2017 03:08 PM, Leonard Crestez wrote: > On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote: >> On 05/30/2017 10:34 AM, Leonard Crestez wrote: >>> These bits seem to be lost after a suspend/resume cycle so just set them >>> again. >>> >>> This patch fixes ethernet suspend/resume on imx6ul-14x14-evk boards. >>> >>> Signed-off-by: Leonard Crestez >>> --- >>> drivers/net/phy/micrel.c | 15 +++ >>> 1 file changed, 15 insertions(+) >>> >>> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c >>> index 6a5fd18..c53ee17 100644 >>> --- a/drivers/net/phy/micrel.c >>> +++ b/drivers/net/phy/micrel.c >>> @@ -700,6 +700,9 @@ static int kszphy_suspend(struct phy_device *phydev) >>> >>> static int kszphy_resume(struct phy_device *phydev) >>> { >>> + struct kszphy_priv *priv = phydev->priv; >>> + int ret; >>> + >>> genphy_resume(phydev); >>> >>> /* Enable PHY Interrupts */ >>> @@ -709,6 +712,18 @@ static int kszphy_resume(struct phy_device *phydev) >>> phydev->drv->config_intr(phydev); >>> } >>> >>> + if (priv->rmii_ref_clk_sel) { >>> + ret = kszphy_rmii_clk_sel(phydev, priv->rmii_ref_clk_sel_val); >>> + if (ret) { >>> + phydev_err(phydev, >>> + "failed to set rmii reference clock\n"); >>> + return ret; >>> + } >>> + } >>> + >>> + if (priv->led_mode >= 0) >>> + kszphy_setup_led(phydev, priv->type->led_mode_reg, >>> priv->led_mode); >> >> Should not we actually call kszphy_config_init() in order to restore >> broadcast and nand disable bits as well? > > I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and > NAND_TREE_ON is already off by the time it gets to linux. > > The bit that get lost seem to disappear just as the phy is resumed. I > added some prints and they look like this: > > PM: early resume of devices complete after 6.534 msecs > begin resume > 0x1F=0x8190 0x16=0x202 > after genphy_resume 0x1F=0x8100 0x16=0x202 > end > resume 0x1F=0x8190 0x16=0x202 OK, so it actually would not hurt to call config_init() again here, right? > >> If not, I would be more comfortable if we did create a specific function >> that takes care of setting the reference clock and LED mode. > > Ok, I can add a function called kszphy_config_reset() with a comment > explaining it's for bits lost on reset/resume. > > Or perhaps a better option would be to just save/restore the entire > 0x1F register? Register 0 through 15 are standardized, but those after that are not, and PHYs with a gazillion of registers already need to have a specific set of suspend/resume sequence due to their proprietary indirect register scheme, so we cannot quite come up with a generic function that would cache all 16 or 32 registers and flat out restore those. Similarly, having phy_resume() systematically calling into config_init() is a bit of a stretch and can be pretty inefficient. -- Florian
Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume
On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote: > On 05/30/2017 10:34 AM, Leonard Crestez wrote: > > These bits seem to be lost after a suspend/resume cycle so just set them > > again. > > > > This patch fixes ethernet suspend/resume on imx6ul-14x14-evk boards. > > > > Signed-off-by: Leonard Crestez > > --- > > drivers/net/phy/micrel.c | 15 +++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c > > index 6a5fd18..c53ee17 100644 > > --- a/drivers/net/phy/micrel.c > > +++ b/drivers/net/phy/micrel.c > > @@ -700,6 +700,9 @@ static int kszphy_suspend(struct phy_device *phydev) > > > > static int kszphy_resume(struct phy_device *phydev) > > { > > + struct kszphy_priv *priv = phydev->priv; > > + int ret; > > + > > genphy_resume(phydev); > > > > /* Enable PHY Interrupts */ > > @@ -709,6 +712,18 @@ static int kszphy_resume(struct phy_device *phydev) > > phydev->drv->config_intr(phydev); > > } > > > > + if (priv->rmii_ref_clk_sel) { > > + ret = kszphy_rmii_clk_sel(phydev, priv->rmii_ref_clk_sel_val); > > + if (ret) { > > + phydev_err(phydev, > > + "failed to set rmii reference clock\n"); > > + return ret; > > + } > > + } > > + > > + if (priv->led_mode >= 0) > > + kszphy_setup_led(phydev, priv->type->led_mode_reg, > > priv->led_mode); > > Should not we actually call kszphy_config_init() in order to restore > broadcast and nand disable bits as well? I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and NAND_TREE_ON is already off by the time it gets to linux. The bit that get lost seem to disappear just as the phy is resumed. I added some prints and they look like this: PM: early resume of devices complete after 6.534 msecs begin resume 0x1F=0x8190 0x16=0x202 after genphy_resume 0x1F=0x8100 0x16=0x202 end resume 0x1F=0x8190 0x16=0x202 > If not, I would be more comfortable if we did create a specific function > that takes care of setting the reference clock and LED mode. Ok, I can add a function called kszphy_config_reset() with a comment explaining it's for bits lost on reset/resume. Or perhaps a better option would be to just save/restore the entire 0x1F register?
[PATCH net-next] netlink: include netnsid only when netns differs.
Don't include netns id for notifications broadcasts when the socket and the skb are in the same netns because it will be an error which can't be distinguished from a peer netns failing to allocate an id. Signed-off-by: Flavio Leitner --- net/netlink/af_netlink.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index ee841f0..b9f1392 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1414,8 +1414,10 @@ static void do_one_broadcast(struct sock *sk, p->skb2 = NULL; goto out; } - NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net); - NETLINK_CB(p->skb2).nsid_is_set = true; + if (!net_eq(sock_net(sk), p->net)) { + NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net); + NETLINK_CB(p->skb2).nsid_is_set = true; + } val = netlink_broadcast_deliver(sk, p->skb2); if (val < 0) { netlink_overrun(sk); -- 2.9.4
Re: [PATCH] dt-bindings: net: move FMan binding
On Thu, May 25, 2017 at 12:57:26PM +, Madalin-Cristian Bucur wrote: > > -Original Message- > > From: David Miller [mailto:da...@davemloft.net] > > Sent: Monday, May 15, 2017 5:31 PM > > Subject: Re: [PATCH] dt-bindings: net: move FMan binding > > > > From: Madalin Bucur > > Date: Mon, 15 May 2017 16:36:38 +0300 > > > > > Besides the PPC SoCs, the QorIQ DPAA FMan is also present on ARM SoCs, > > > moving the device tree binding document into the bindings/net folder. > > > > > > Signed-off-by: Madalin Bucur > > > > What tree is this targetted at for merging? > > I hope Rob Herring will take this into his tree, it's a device tree binding > change. Applied. Rob
Re: [PATCH V2 1/3] Documentation: devicetree: add multiple cpu port DSA binding
On 05/30/2017 03:44 AM, John Crispin wrote: > Extend the DSA binding documentation, adding the new property required > when there is more than one CPU port attached to the switch. > > Cc: Rob Herring > Cc: devicet...@vger.kernel.org > Signed-off-by: John Crispin > --- > Documentation/devicetree/bindings/net/dsa/dsa.txt | 61 > ++- > 1 file changed, 60 insertions(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt > b/Documentation/devicetree/bindings/net/dsa/dsa.txt > index cfe8f64eca4f..c164eb38ccc5 100644 > --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt > +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt > @@ -55,6 +55,11 @@ A user port has the following optional property: > - label : Describes the label associated with this > port, which >will become the netdev name. > > +- cpu: Option for non "cpu"/"dsa" ports. A phandle > to a > + "cpu" port, which will be used for passing packets > + from this port to the host. If not present, the first > + "cpu" port will be used. So this option essentially allow us to "partition" the switch between vectors of ports and their upstream/CPU port. While using Device Tree is an obvious choice for making the initial partitioning, it seems like we are missing a configuration mechanism whereby we can properly assign ports to a specific upstream CPU port. Let's move the actual discussion into patch 2 in order not to pollute the DT maintainers' inbox. > + > Port child nodes may also contain the following optional standardised > properties, described in binding documents: > > @@ -71,7 +76,7 @@ properties, described in binding documents: > Documentation/devicetree/bindings/net/fixed-link.txt > for details. > > -Example > +Examples > > The following example shows three switches on three MDIO busses, > linked into one DSA cluster. > @@ -264,6 +269,60 @@ linked into one DSA cluster. > }; > }; > > +The following example shows a switch that has two cpu ports each connecting > +to a different MAC. > + > +&mdio0 { > + switch@0 { > + compatible = "mediatek,mt7530"; > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0>; > + > + ports { > + #address-cells = <1>; > + #size-cells = <0>; > + port@0 { > + reg = <0>; > + label = "lan0"; > + cpu = <&cpu_port1>; > + }; > + > + port@1 { > + reg = <1>; > + label = "lan1"; > + cpu = <&cpu_port1>; > + }; > + > + port@2 { > + reg = <2>; > + label = "lan2"; > + cpu = <&cpu_port1>; > + }; > + > + port@3 { > + reg = <3>; > + label = "wan"; > + cpu = <&cpu_port2>; > + }; > + > + cpu_port2: port@5 { > + reg = <5>; > + label = "cpu"; > + ethernet = <&gmac2>; > + phy-mode = "trgmii"; > + }; > + > + cpu_port1: port@6 { > + reg = <6>; > + label = "cpu"; > + ethernet = <&gmac1>; > + phy-mode = "trgmii"; > + }; > + }; > + }; > +}; > + > Deprecated Binding > -- > > -- Florian
Re: [PATCH net-next 1/7] net: dsa: hide dsa_uses_tagged_protocol code
Hi Vivien, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Vivien-Didelot/net-dsa-tagger-simplification/20170531-032911 config: ia64-allmodconfig (attached as .config) compiler: ia64-linux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=ia64 All errors (new ones prefixed by >>): net/built-in.o: In function `eth_type_trans': >> (.text+0xbf5f2): undefined reference to `dsa_uses_tagged_protocol' net/built-in.o: In function `ic_close_devs': >> ipconfig.c:(.init.text+0x72c2): undefined reference to >> `dsa_uses_tagged_protocol' net/built-in.o: In function `ip_auto_config': ipconfig.c:(.init.text+0x8882): undefined reference to `dsa_uses_tagged_protocol' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH net-next v2 6/6] net: dsa: factor skb freeing on xmit
On 05/30/2017 11:33 AM, Vivien Didelot wrote: > The taggers are currently responsible to free the original SKB if they > made a copy of it, or in case of error. > > This patch simplifies this by freeing the original SKB in the > dsa_slave_xmit caller, but only if an error (NULL) is returned. Is not it a clearer contract if the tagging protocol must always free the original SKB, whether it succeeded in creating a new one or not? > > It is still the responsibility of the tagger to free the original SKB if > it returned a copy of it. > > At the same time, fix the checkpatch NULL comparison check: > > CHECK: Comparison to NULL could be written "!nskb" > #208: FILE: net/dsa/tag_trailer.c:35: > + if (nskb == NULL) > > Signed-off-by: Vivien Didelot > --- > net/dsa/slave.c | 8 ++-- > net/dsa/tag_brcm.c| 6 +- > net/dsa/tag_dsa.c | 8 ++-- > net/dsa/tag_edsa.c| 8 ++-- > net/dsa/tag_lan9303.c | 5 + > net/dsa/tag_mtk.c | 6 +- > net/dsa/tag_qca.c | 6 +- > net/dsa/tag_trailer.c | 4 +--- > 8 files changed, 15 insertions(+), 36 deletions(-) > > diff --git a/net/dsa/slave.c b/net/dsa/slave.c > index 887e26695519..4473aec9dcda 100644 > --- a/net/dsa/slave.c > +++ b/net/dsa/slave.c > @@ -362,10 +362,14 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, > struct net_device *dev) > dev->stats.tx_packets++; > dev->stats.tx_bytes += skb->len; > > - /* Transmit function may have to reallocate the original SKB */ > + /* Transmit function may have to reallocate the original SKB, > + * in which case it must have freed it. Only free it here on error. > + */ > nskb = p->xmit(skb, dev); > - if (!nskb) > + if (!nskb) { > + kfree_skb(skb); > return NETDEV_TX_OK; > + } > > /* SKB for netpoll still need to be mangled with the protocol-specific >* tag to be successfully transmitted > diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c > index 10fa4c0ca46b..5db6bcfde025 100644 > --- a/net/dsa/tag_brcm.c > +++ b/net/dsa/tag_brcm.c > @@ -65,7 +65,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, > struct net_device *dev > u8 *brcm_tag; > > if (skb_cow_head(skb, BRCM_TAG_LEN) < 0) > - goto out_free; > + return NULL; > > skb_push(skb, BRCM_TAG_LEN); > > @@ -86,10 +86,6 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, > struct net_device *dev > brcm_tag[3] = (1 << p->dp->index) & BRCM_IG_DSTMAP1_MASK; > > return skb; > - > -out_free: > - kfree_skb(skb); > - return NULL; > } > > static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device > *dev) > diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c > index 9f5fecaa4a93..6837ca9160a8 100644 > --- a/net/dsa/tag_dsa.c > +++ b/net/dsa/tag_dsa.c > @@ -28,7 +28,7 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct > net_device *dev) >*/ > if (skb->protocol == htons(ETH_P_8021Q)) { > if (skb_cow_head(skb, 0) < 0) > - goto out_free; > + return NULL; > > /* >* Construct tagged FROM_CPU DSA tag from 802.1q tag. > @@ -46,7 +46,7 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct > net_device *dev) > } > } else { > if (skb_cow_head(skb, DSA_HLEN) < 0) > - goto out_free; > + return NULL; > skb_push(skb, DSA_HLEN); > > memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN); > @@ -62,10 +62,6 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, > struct net_device *dev) > } > > return skb; > - > -out_free: > - kfree_skb(skb); > - return NULL; > } > > static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev) > diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c > index a9a06e19abeb..96ddc90472a2 100644 > --- a/net/dsa/tag_edsa.c > +++ b/net/dsa/tag_edsa.c > @@ -30,7 +30,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, > struct net_device *dev) >*/ > if (skb->protocol == htons(ETH_P_8021Q)) { > if (skb_cow_head(skb, DSA_HLEN) < 0) > - goto out_free; > + return NULL; > skb_push(skb, DSA_HLEN); > > memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN); > @@ -55,7 +55,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, > struct net_device *dev) > } > } else { > if (skb_cow_head(skb, EDSA_HLEN) < 0) > - goto out_free; > + return NULL; > skb_push(skb, EDSA_HLEN); > > memmove(skb->data, skb->data + EDSA_HLEN, 2 * ETH_ALEN); > @@ -75,10 +75,6 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, > struct net_device *dev) >
Re: [PATCH net-next v2 5/6] net: dsa: remove out_drop label in taggers rcv
On 05/30/2017 11:33 AM, Vivien Didelot wrote: > Many rcv functions from net/dsa/tag_*.c have a useless out_drop goto > label which simply returns NULL. Kill it in favor of the obvious. Why not > > Signed-off-by: Vivien Didelot Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH net-next v2 3/6] net: dsa: remove dsa_uses_tagged_protocol
On 05/30/2017 11:33 AM, Vivien Didelot wrote: > Since dev->dsa_ptr is a pointer to a dsa_switch_tree, there is no need > to have another inline helper just to check rcv. > > Remove dsa_uses_tagged_protocol and check dsa_ptr && dsa_ptr->rcv > together at the same time. > > Signed-off-by: Vivien Didelot FYI, the part that matters here is that we know whether the master network device actually uses a tagged DSA protocol, this is important for e.g: bridge (see 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e). Some drivers like bcmsysport.c also care about that in order to enable specific switch tagging protocol parsing hints. Reviewed-by: Florian Fainelli > --- > include/net/dsa.h | 8 +--- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/include/net/dsa.h b/include/net/dsa.h > index 58297e4c6b31..4675b52c964c 100644 > --- a/include/net/dsa.h > +++ b/include/net/dsa.h > @@ -467,16 +467,10 @@ struct mii_bus *dsa_host_dev_to_mii_bus(struct device > *dev); > struct net_device *dsa_dev_to_net_device(struct device *dev); > > /* Keep inline for faster access in hot path */ > -static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst) > -{ > - return dst->rcv != NULL; > -} > - > static inline bool netdev_uses_dsa(struct net_device *dev) > { > #if IS_ENABLED(CONFIG_NET_DSA) > - if (dev->dsa_ptr != NULL) > - return dsa_uses_tagged_protocol(dev->dsa_ptr); > + return dev->dsa_ptr && dev->dsa_ptr->rcv; > #endif > return false; > } > -- Florian
[PATCH] net: fix auto-loading of Marvell DSA driver
Auto-loading of the Marvell DSA driver has stopped working with recent kernels. This seems to be due to the change of binding for DSA devices, moving them from the platform bus to the MDIO bus. In order for module auto-loading to work, we need to provide a MODALIAS string in the uevent file for the device. However, the device core does not automatically provide this, and needs each bus_type to implement a uevent method to generate these strings. The MDIO bus does not provide such a method, so no MODALIAS string is provided: .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent DRIVER=mv88e6085 OF_NAME=switch OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4 OF_COMPATIBLE_0=marvell,mv88e6085 OF_COMPATIBLE_N=1 In the case of OF-based devices, the solution is easy - of_device_uevent_modalias() does the work for us. After this is done, the uevent file looks like this: .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent DRIVER=mv88e6085 OF_NAME=switch OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4 OF_COMPATIBLE_0=marvell,mv88e6085 OF_COMPATIBLE_N=1 MODALIAS=of:NswitchTCmarvell,mv88e6085 which results in auto-loading of the Marvell DSA driver on Clearfog platforms. Fixes: c0405563a613 ("ARM: dts: armada-388-clearfog: Utilize new DSA binding") Signed-off-by: Russell King --- drivers/net/phy/mdio_bus.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 8e73f5f36e71..f99c21f78b63 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -658,6 +658,18 @@ static int mdio_bus_match(struct device *dev, struct device_driver *drv) return 0; } +static int mdio_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + int rc; + + /* Some devices have extra OF data and an OF-style MODALIAS */ + rc = of_device_uevent_modalias(dev, env); + if (rc != -ENODEV) + return rc; + + return 0; +} + #ifdef CONFIG_PM static int mdio_bus_suspend(struct device *dev) { @@ -708,6 +720,7 @@ static const struct dev_pm_ops mdio_bus_pm_ops = { struct bus_type mdio_bus_type = { .name = "mdio_bus", .match = mdio_bus_match, + .uevent = mdio_uevent, .pm = MDIO_BUS_PM_OPS, }; EXPORT_SYMBOL(mdio_bus_type); -- 2.7.4
Re: [PATCH net-next v2 2/6] net: dsa: do not cast dst
On 05/30/2017 11:33 AM, Vivien Didelot wrote: > dsa_ptr is not a void pointer anymore since Nov 2011, as of cf50dcc24f82 > ("dsa: Change dsa_uses_{dsa, trailer}_tags() into inline functions"), > but an explicit dsa_switch_tree pointer, thus remove the (void *) cast. > > Signed-off-by: Vivien Didelot Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH net-next v2 1/6] net: dsa: comment hot path requirements
On 05/30/2017 11:33 AM, Vivien Didelot wrote: > The DSA layer uses inline helpers and copies of the tagging functions > for faster access in hot path. Add comments to detail that. > > Signed-off-by: Vivien Didelot Reviewed-by: Florian Fainelli -- Florian
[PATCH net-next 2/9] bpf: split bpf core interpreter
split __bpf_prog_run() interpreter into stack allocation and execution parts. The code section shrinks which helps interpreter performance in some cases. textdata bss dec hex filename 26350 10328 624 3730291b6 kernel/bpf/core.o.before 25777 10328 624 367298f79 kernel/bpf/core.o.after Very short programs got slower (due to extra function call): Before: test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS After: test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS Longer programs got faster: Before: test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS After: test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS For real world production programs the difference is noise. This patch is first step towards reducing interpreter stack consumption. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/core.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 339289402b96..abd410d394bc 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -763,10 +763,10 @@ EXPORT_SYMBOL_GPL(__bpf_call_base); * * Decode and execute eBPF instructions. */ -static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) +static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, + u64 *stack) { - u64 stack[MAX_BPF_STACK / sizeof(u64)]; - u64 regs[MAX_BPF_REG], tmp; + u64 tmp; static const void *jumptable[256] = { [0 ... 255] = &&default_label, /* Now overwrite non-defaults ... */ @@ -874,9 +874,6 @@ static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) #define CONT({ insn++; goto select_insn; }) #define CONT_JMP ({ insn++; goto select_insn; }) - FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; - ARG1 = (u64) (unsigned long) ctx; - select_insn: goto *jumptable[insn->code]; @@ -1219,7 +1216,17 @@ static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code); return 0; } -STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */ +STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */ + +static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) +{ + u64 stack[MAX_BPF_STACK / sizeof(u64)]; + u64 regs[MAX_BPF_REG]; + + FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; + ARG1 = (u64) (unsigned long) ctx; + return ___bpf_prog_run(regs, insn, stack); +} bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp) -- 2.9.3
[PATCH net-next 7/9] bpf: use different interpreter depending on required stack size
16 __bpf_prog_run() interpreters for various stack sizes add .text but not a lot comparing to run-time stack savings textdata bss dec hex filename 26350 10328 624 3730291b6 kernel/bpf/core.o.before_split 25777 10328 624 367298f79 kernel/bpf/core.o.after_split 26970 10328 624 379229422 kernel/bpf/core.o.now Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/core.c | 40 +++- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index abd410d394bc..774069ca18a7 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1218,16 +1218,38 @@ static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, } STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */ -static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) -{ - u64 stack[MAX_BPF_STACK / sizeof(u64)]; - u64 regs[MAX_BPF_REG]; - - FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; - ARG1 = (u64) (unsigned long) ctx; - return ___bpf_prog_run(regs, insn, stack); +#define PROG_NAME(stack_size) __bpf_prog_run##stack_size +#define DEFINE_BPF_PROG_RUN(stack_size) \ +static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \ +{ \ + u64 stack[stack_size / sizeof(u64)]; \ + u64 regs[MAX_BPF_REG]; \ +\ + FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \ + ARG1 = (u64) (unsigned long) ctx; \ + return ___bpf_prog_run(regs, insn, stack); \ } +#define EVAL1(FN, X) FN(X) +#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y) +#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y) +#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y) +#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y) +#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y) + +EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192); +EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384); +EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512); + +#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size), + +static unsigned int (*interpreters[])(const void *ctx, + const struct bpf_insn *insn) = { +EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192) +EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384) +EVAL4(PROG_NAME_LIST, 416, 448, 480, 512) +}; + bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp) { @@ -1275,7 +1297,7 @@ static int bpf_check_tail_call(const struct bpf_prog *fp) */ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) { - fp->bpf_func = (void *) __bpf_prog_run; + fp->bpf_func = interpreters[round_down(fp->aux->stack_depth, 32) / 32]; /* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during -- 2.9.3
Re: [patch net-next 0/9] mlxsw: Support firmware flash
On Sun, May 28, 2017 at 10:26 AM, Yotam Gigi wrote: > On 05/23/2017 06:38 PM, David Miller wrote: >> From: Yotam Gigi >> Date: Tue, 23 May 2017 18:14:15 +0300 >>> Sorry, I am not sure I understand. You think that drivers should not >>> implement >>> ethtool's flash_device callback anymore? do you have an alternative for >>> firmware flash? >> As stated, export an MTD device. > So, after we have been going over MTD, it seems like it does not fit our needs > at all. > MTD device provides (erasable-)block access to a flash storage, where in our > case the firmware burn process is just pouring a binary BLOB into the device. > The driver is not aware of the internal storage used for storing the firmware > as > it is not defined in our driver-hardware API. > > Needless to say that block access has no meaning in our case, so any solution > that will involve MTD device to burn our firmware (if there is a solution at > all) will be a workaround and will not fit MTD purpose. > > Apart for boot time firmware flash, which we have already pushed we would > really > like to allow the user to ask for a specific firmware version. Do you have any > other solution for us apart from "ethtool -f"? > > This problem is even more relevant in the Mellanox HCA driver team, which > would > like to use that code in order to burn the HCA firmware, but not intend to > trigger it on boot time, which means that must have a way for the user to > trigger it. Hi Dave, We had few more emails on this thread with Jakub, and he's now happy with our replies, so where do we go from here? could you comment on Yotam's note. thanks, Or.
[PATCH net-next 1/9] bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual indirect call by register and use kernel internal opcode to mark call instruction into bpf_tail_call() helper. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- arch/arm64/net/bpf_jit_comp.c | 2 +- arch/powerpc/net/bpf_jit_comp64.c | 2 +- arch/s390/net/bpf_jit_comp.c | 2 +- arch/sparc/net/bpf_jit_comp_64.c | 2 +- arch/x86/net/bpf_jit_comp.c | 2 +- include/linux/filter.h| 3 +++ kernel/bpf/core.c | 2 +- kernel/bpf/verifier.c | 2 +- 8 files changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 71f930501ade..b1d38eeb24f6 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -586,7 +586,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) break; } /* tail call */ - case BPF_JMP | BPF_CALL | BPF_X: + case BPF_JMP | BPF_TAIL_CALL: if (emit_bpf_tail_call(ctx)) return -EFAULT; break; diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index aee2bb817ac6..a01366584a4b 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -938,7 +938,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, /* * Tail call */ - case BPF_JMP | BPF_CALL | BPF_X: + case BPF_JMP | BPF_TAIL_CALL: ctx->seen |= SEEN_TAILCALL; bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]); break; diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 6e97a2e3fd8d..42ad3832586c 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -991,7 +991,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i } break; } - case BPF_JMP | BPF_CALL | BPF_X: + case BPF_JMP | BPF_TAIL_CALL: /* * Implicit input: * B1: pointer to ctx diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c index 21de77419f48..4a52d34facf9 100644 --- a/arch/sparc/net/bpf_jit_comp_64.c +++ b/arch/sparc/net/bpf_jit_comp_64.c @@ -1217,7 +1217,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) } /* tail call */ - case BPF_JMP | BPF_CALL |BPF_X: + case BPF_JMP | BPF_TAIL_CALL: emit_tail_call(ctx); break; diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index f58939393eef..fec12eaa0dec 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -877,7 +877,7 @@ xadd: if (is_imm8(insn->off)) } break; - case BPF_JMP | BPF_CALL | BPF_X: + case BPF_JMP | BPF_TAIL_CALL: emit_bpf_tail_call(&prog); break; diff --git a/include/linux/filter.h b/include/linux/filter.h index 62d948f80730..a20ba40fcb73 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -57,6 +57,9 @@ struct bpf_prog_aux; #define BPF_REG_AX MAX_BPF_REG #define MAX_BPF_JIT_REG(MAX_BPF_REG + 1) +/* unused opcode to mark special call to bpf_tail_call() helper */ +#define BPF_TAIL_CALL 0xf0 + /* As per nm, we expose JITed images as text (code) section for * kallsyms. That way, tools like perf can find it to match * addresses. diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index dedf367f59bb..339289402b96 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -824,7 +824,7 @@ static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn) [BPF_ALU64 | BPF_NEG] = &&ALU64_NEG, /* Call instruction */ [BPF_JMP | BPF_CALL] = &&JMP_CALL, - [BPF_JMP | BPF_CALL | BPF_X] = &&JMP_TAIL_CALL, + [BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL, /* Jumps */ [BPF_JMP | BPF_JA] = &&JMP_JA, [BPF_JMP | BPF_JEQ | BPF_X] = &&JMP_JEQ_X, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 339c8a1371de..28113d0e8e92 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3469,7 +3469,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) * that doesn't support bpf_tail_call yet */ insn->imm = 0; - insn->code |= BPF_X; + insn->code = BPF_JMP | BPF_TAIL_CALL; continue; } -- 2.9.3
[PATCH net-next 6/9] bpf: fix stack_depth usage by test_bpf.ko
test_bpf.ko doesn't call verifier before selecting interpreter or JITing, hence the tests need to manually specify the amount of stack they consume. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- lib/test_bpf.c | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index be88cbaadde3..070bde56474c 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -84,6 +84,7 @@ struct bpf_test { } test[MAX_SUBTESTS]; int (*fill_helper)(struct bpf_test *self); __u8 frag_data[MAX_DATA]; + int stack_depth; /* for eBPF only, since tests don't call verifier */ }; /* Large test cases need separate allocation and fill handler. */ @@ -455,6 +456,7 @@ static int __bpf_fill_stxdw(struct bpf_test *self, int size) self->u.ptr.insns = insn; self->u.ptr.len = len; + self->stack_depth = 40; return 0; } @@ -2317,7 +2319,8 @@ static struct bpf_test tests[] = { { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x06, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x10, 0xbf, 0x48, 0xd6, 0x43, 0xd6}, - { { 38, 256 } } + { { 38, 256 } }, + .stack_depth = 64, }, /* BPF_ALU | BPF_MOV | BPF_X */ { @@ -4169,6 +4172,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0xff } }, + .stack_depth = 40, }, { "ST_MEM_B: Store/Load byte: max positive", @@ -4181,6 +4185,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x7f } }, + .stack_depth = 40, }, { "STX_MEM_B: Store/Load byte: max negative", @@ -4194,6 +4199,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0xff } }, + .stack_depth = 40, }, { "ST_MEM_H: Store/Load half word: max negative", @@ -4206,6 +4212,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, { "ST_MEM_H: Store/Load half word: max positive", @@ -4218,6 +4225,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x7fff } }, + .stack_depth = 40, }, { "STX_MEM_H: Store/Load half word: max negative", @@ -4231,6 +4239,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, { "ST_MEM_W: Store/Load word: max negative", @@ -4243,6 +4252,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, { "ST_MEM_W: Store/Load word: max positive", @@ -4255,6 +4265,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x7fff } }, + .stack_depth = 40, }, { "STX_MEM_W: Store/Load word: max negative", @@ -4268,6 +4279,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, { "ST_MEM_DW: Store/Load double word: max negative", @@ -4280,6 +4292,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, { "ST_MEM_DW: Store/Load double word: max negative 2", @@ -4297,6 +4310,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x1 } }, + .stack_depth = 40, }, { "ST_MEM_DW: Store/Load double word: max positive", @@ -4309,6 +4323,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x7fff } }, + .stack_depth = 40, }, { "STX_MEM_DW: Store/Load double word: max negative", @@ -4322,6 +4337,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x } }, + .stack_depth = 40, }, /* BPF_STX | BPF_XADD | BPF_W/DW */ { @@ -4336,6 +4352,7 @@ static struct bpf_test tests[] = { INTERNAL, { }, { { 0, 0x22 } }, + .stack_depth = 40, }, { "STX_XADD_W: Test side-effects, r10: 0x12 + 0x10 = 0x22", @@ -4351,6 +4368,7 @@
[PATCH net-next 0/9] bpf: stack depth tracking
Introduce tracking of bpf program stack depth in the verifier and use that info to reduce bpf program stack consumption in the interpreter and x64 JIT. Other JITs can take advantage of it as well in the future. Most of the programs consume very little stack, so it's good optimization in general and it's the first step toward bpf to bpf function calls. Also use internal opcode for bpf_tail_call() marking to make clear that jmp|call|x opcode is not uapi and may be used for actual indirect call opcode in the future. Alexei Starovoitov (9): bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode bpf: split bpf core interpreter bpf: teach verifier to track stack depth bpf: reconcile bpf_tail_call and stack_depth bpf: track stack depth of classic bpf programs bpf: fix stack_depth usage by test_bpf.ko bpf: use different interpreter depending on required stack size bpf: change x86 JITed program stack layout bpf: take advantage of stack_depth tracking in x64 JIT arch/arm64/net/bpf_jit_comp.c | 2 +- arch/powerpc/net/bpf_jit_comp64.c | 2 +- arch/s390/net/bpf_jit_comp.c | 2 +- arch/sparc/net/bpf_jit_comp_64.c | 2 +- arch/x86/net/bpf_jit.S| 20 ++-- arch/x86/net/bpf_jit_comp.c | 65 +-- include/linux/bpf.h | 1 + include/linux/filter.h| 3 ++ kernel/bpf/core.c | 47 ++-- kernel/bpf/verifier.c | 13 ++-- lib/test_bpf.c| 25 ++- net/core/filter.c | 36 +- 12 files changed, 147 insertions(+), 71 deletions(-) -- 2.9.3
[PATCH net-next 9/9] bpf: take advantage of stack_depth tracking in x64 JIT
Take advantage of stack_depth tracking in x64 JIT. Round up allocated stack by 8 bytes to make sure it stays aligned for functions called from JITed bpf program. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- arch/x86/net/bpf_jit_comp.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index c96dac838f3e..617eac9c4511 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -206,7 +206,7 @@ struct jit_context { /* emit x64 prologue code for BPF program and check it's size. * bpf_tail_call helper will skip it while jumping into another program */ -static void emit_prologue(u8 **pprog) +static void emit_prologue(u8 **pprog, u32 stack_depth) { u8 *prog = *pprog; int cnt = 0; @@ -214,8 +214,9 @@ static void emit_prologue(u8 **pprog) EMIT1(0x55); /* push rbp */ EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */ - /* sub rsp, MAX_BPF_STACK + AUX_STACK_SPACE */ - EMIT3_off32(0x48, 0x81, 0xEC, MAX_BPF_STACK + AUX_STACK_SPACE); + /* sub rsp, rounded_stack_depth + AUX_STACK_SPACE */ + EMIT3_off32(0x48, 0x81, 0xEC, + round_up(stack_depth, 8) + AUX_STACK_SPACE); /* sub rbp, AUX_STACK_SPACE */ EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE); @@ -363,7 +364,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, int proglen = 0; u8 *prog = temp; - emit_prologue(&prog); + emit_prologue(&prog, bpf_prog->aux->stack_depth); if (seen_ld_abs) emit_load_skb_data_hlen(&prog); -- 2.9.3
[PATCH net-next 8/9] bpf: change x86 JITed program stack layout
in order to JIT programs with different stack sizes we need to make epilogue and exception path to be stack size independent, hence move auxiliary stack space from the bottom of the stack to the top of the stack. Nice side effect is that JITed function prologue becomes shorter due to imm8 offset encoding vs imm32. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- arch/x86/net/bpf_jit.S | 20 +++- arch/x86/net/bpf_jit_comp.c | 58 - 2 files changed, 40 insertions(+), 38 deletions(-) diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S index f2a7faf4706e..b33093f84528 100644 --- a/arch/x86/net/bpf_jit.S +++ b/arch/x86/net/bpf_jit.S @@ -19,9 +19,6 @@ */ #define SKBDATA%r10 #define SKF_MAX_NEG_OFF$(-0x20) /* SKF_LL_OFF from filter.h */ -#define MAX_BPF_STACK (512 /* from filter.h */ + \ - 32 /* space for rbx,r13,r14,r15 */ + \ - 8 /* space for skb_copy_bits */) #define FUNC(name) \ .globl name; \ @@ -66,7 +63,7 @@ FUNC(sk_load_byte_positive_offset) /* rsi contains offset and can be scratched */ #define bpf_slow_path_common(LEN) \ - lea -MAX_BPF_STACK + 32(%rbp), %rdx;\ + lea 32(%rbp), %rdx;\ FRAME_BEGIN;\ mov %rbx, %rdi; /* arg1 == skb */ \ push%r9;\ @@ -83,14 +80,14 @@ FUNC(sk_load_byte_positive_offset) bpf_slow_path_word: bpf_slow_path_common(4) js bpf_error - mov - MAX_BPF_STACK + 32(%rbp),%eax + mov 32(%rbp),%eax bswap %eax ret bpf_slow_path_half: bpf_slow_path_common(2) js bpf_error - mov - MAX_BPF_STACK + 32(%rbp),%ax + mov 32(%rbp),%ax rol $8,%ax movzwl %ax,%eax ret @@ -98,7 +95,7 @@ FUNC(sk_load_byte_positive_offset) bpf_slow_path_byte: bpf_slow_path_common(1) js bpf_error - movzbl - MAX_BPF_STACK + 32(%rbp),%eax + movzbl 32(%rbp),%eax ret #define sk_negative_common(SIZE) \ @@ -148,9 +145,10 @@ FUNC(sk_load_byte_negative_offset) bpf_error: # force a return 0 from jit handler xor %eax,%eax - mov - MAX_BPF_STACK(%rbp),%rbx - mov - MAX_BPF_STACK + 8(%rbp),%r13 - mov - MAX_BPF_STACK + 16(%rbp),%r14 - mov - MAX_BPF_STACK + 24(%rbp),%r15 + mov (%rbp),%rbx + mov 8(%rbp),%r13 + mov 16(%rbp),%r14 + mov 24(%rbp),%r15 + add $40, %rbp leaveq ret diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index fec12eaa0dec..c96dac838f3e 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -197,12 +197,11 @@ struct jit_context { #define BPF_MAX_INSN_SIZE 128 #define BPF_INSN_SAFETY64 -#define STACKSIZE \ - (MAX_BPF_STACK + \ -32 /* space for rbx, r13, r14, r15 */ + \ +#define AUX_STACK_SPACE \ + (32 /* space for rbx, r13, r14, r15 */ + \ 8 /* space for skb_copy_bits() buffer */) -#define PROLOGUE_SIZE 48 +#define PROLOGUE_SIZE 37 /* emit x64 prologue code for BPF program and check it's size. * bpf_tail_call helper will skip it while jumping into another program @@ -215,13 +214,16 @@ static void emit_prologue(u8 **pprog) EMIT1(0x55); /* push rbp */ EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */ - /* sub rsp, STACKSIZE */ - EMIT3_off32(0x48, 0x81, 0xEC, STACKSIZE); + /* sub rsp, MAX_BPF_STACK + AUX_STACK_SPACE */ + EMIT3_off32(0x48, 0x81, 0xEC, MAX_BPF_STACK + AUX_STACK_SPACE); + + /* sub rbp, AUX_STACK_SPACE */ + EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE); /* all classic BPF filters use R6(rbx) save it */ - /* mov qword ptr [rbp-X],rbx */ - EMIT3_off32(0x48, 0x89, 0x9D, -STACKSIZE); + /* mov qword ptr [rbp+0],rbx */ + EMIT4(0x48, 0x89, 0x5D, 0); /* bpf_convert_filter() maps classic BPF register X to R7 and uses R8 * as temporary, so all tcpdump filters need to spill/fill R7(r13) and @@ -231,12 +233,12 @@ static void emit_prologue(u8 **pprog) * than synthetic ones. Therefore not worth adding complexity. */ - /* mov qword ptr [rbp-X],r13 */ - EMIT3_off32(0x4C, 0x89, 0xAD, -STACKSIZE + 8); - /* mov qword ptr [rbp-X],r14 */ - EMIT3_off32(0x4C, 0x89, 0xB5, -STACKSIZE + 16); - /* mov qword ptr [rbp-X],r15 */ - EMIT3_off32(0x4C, 0x89, 0xBD, -STACKSIZE + 24); + /* mov qword ptr [rbp+8],r13 */ + EMIT4(0x4C, 0x89, 0x6D, 8); + /* mov qword ptr [rbp+16],r14 */ + EMIT4(0x4C, 0x89, 0x75, 16); + /* mov qword ptr [rbp+24],r15 */ + EMIT4(0x4C, 0x89, 0x7D, 24); /* Clear the tail call counter (tail_call_cnt): for eBPF tail calls * we need to reset the c
[PATCH net-next 5/9] bpf: track stack depth of classic bpf programs
To track stack depth of classic bpf programs we only need to analyze ST|STX instructions, since check_load_and_stores() verifies that programs can load from stack only after write. We also need to change the way cBPF stack slots map to eBPF stack, since typical classic programs are using slots 0 and 1, so they need to map to stack offsets -4 and -8 respectively in order to take advantage of small stack interpreter and JITs. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- net/core/filter.c | 36 ++-- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index a6bb95fa87b2..946f758d44f2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -352,7 +352,7 @@ static bool convert_bpf_extensions(struct sock_filter *fp, * bpf_convert_filter - convert filter program * @prog: the user passed filter program * @len: the length of the user passed filter program - * @new_prog: buffer where converted program will be stored + * @new_prog: allocated 'struct bpf_prog' or NULL * @new_len: pointer to store length of converted program * * Remap 'sock_filter' style classic BPF (cBPF) instruction set to 'bpf_insn' @@ -364,14 +364,13 @@ static bool convert_bpf_extensions(struct sock_filter *fp, * * 2) 2nd pass to remap in two passes: 1st pass finds new *jump offsets, 2nd pass remapping: - * new_prog = kmalloc(sizeof(struct bpf_insn) * new_len); * bpf_convert_filter(old_prog, old_len, new_prog, &new_len); */ static int bpf_convert_filter(struct sock_filter *prog, int len, - struct bpf_insn *new_prog, int *new_len) + struct bpf_prog *new_prog, int *new_len) { - int new_flen = 0, pass = 0, target, i; - struct bpf_insn *new_insn; + int new_flen = 0, pass = 0, target, i, stack_off; + struct bpf_insn *new_insn, *first_insn = NULL; struct sock_filter *fp; int *addrs = NULL; u8 bpf_src; @@ -383,6 +382,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int len, return -EINVAL; if (new_prog) { + first_insn = new_prog->insnsi; addrs = kcalloc(len, sizeof(*addrs), GFP_KERNEL | __GFP_NOWARN); if (!addrs) @@ -390,11 +390,11 @@ static int bpf_convert_filter(struct sock_filter *prog, int len, } do_pass: - new_insn = new_prog; + new_insn = first_insn; fp = prog; /* Classic BPF related prologue emission. */ - if (new_insn) { + if (new_prog) { /* Classic BPF expects A and X to be reset first. These need * to be guaranteed to be the first two instructions. */ @@ -415,7 +415,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int len, struct bpf_insn *insn = tmp_insns; if (addrs) - addrs[i] = new_insn - new_prog; + addrs[i] = new_insn - first_insn; switch (fp->code) { /* All arithmetic insns and skb loads map as-is. */ @@ -561,17 +561,25 @@ static int bpf_convert_filter(struct sock_filter *prog, int len, /* Store to stack. */ case BPF_ST: case BPF_STX: + stack_off = fp->k * 4 + 4; *insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) == BPF_ST ? BPF_REG_A : BPF_REG_X, - -(BPF_MEMWORDS - fp->k) * 4); + -stack_off); + /* check_load_and_stores() verifies that classic BPF can +* load from stack only after write, so tracking +* stack_depth for ST|STX insns is enough +*/ + if (new_prog && new_prog->aux->stack_depth < stack_off) + new_prog->aux->stack_depth = stack_off; break; /* Load from stack. */ case BPF_LD | BPF_MEM: case BPF_LDX | BPF_MEM: + stack_off = fp->k * 4 + 4; *insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ? BPF_REG_A : BPF_REG_X, BPF_REG_FP, - -(BPF_MEMWORDS - fp->k) * 4); + -stack_off); break; /* A = K or X = K */ @@ -619,13 +627,13 @@ static int bpf_convert_filter(struct sock_filter *prog, int len, if (!new_prog) { /* Only calculating new length. */ - *new_len = new_insn - new_prog; + *new_len = new_insn - first_insn;
[PATCH net-next 3/9] bpf: teach verifier to track stack depth
teach verifier to track bpf program stack depth Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/bpf.h | 1 + kernel/bpf/verifier.c | 10 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6bb38d76faf4..fcc80ca11045 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -171,6 +171,7 @@ struct bpf_prog_aux { atomic_t refcnt; u32 used_map_cnt; u32 max_ctx_offset; + u32 stack_depth; struct latch_tree_node ksym_tnode; struct list_head ksym_lnode; const struct bpf_verifier_ops *ops; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 28113d0e8e92..d96f27ff9f6f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -926,6 +926,10 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off, verbose("invalid stack off=%d size=%d\n", off, size); return -EACCES; } + + if (env->prog->aux->stack_depth < -off) + env->prog->aux->stack_depth = -off; + if (t == BPF_WRITE) { if (!env->allow_ptr_leaks && state->stack_slot_type[MAX_BPF_STACK + off] == STACK_SPILL && @@ -1032,6 +1036,9 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno, return -EACCES; } + if (env->prog->aux->stack_depth < -off) + env->prog->aux->stack_depth = -off; + if (meta && meta->raw_mode) { meta->access_size = access_size; meta->regno = regno; @@ -3167,7 +3174,8 @@ static int do_check(struct bpf_verifier_env *env) insn_idx++; } - verbose("processed %d insns\n", insn_processed); + verbose("processed %d insns, stack depth %d\n", + insn_processed, env->prog->aux->stack_depth); return 0; } -- 2.9.3
[PATCH net-next 4/9] bpf: reconcile bpf_tail_call and stack_depth
The next set of patches will take advantage of stack_depth tracking, so make sure that the program that does bpf_tail_call() has stack depth large enough for the callee. We could have tracked the stack depth of the prog_array owner program and only allow insertion of the programs with stack depth less than the owner, but it will break existing applications. Some of them have trivial root bpf program that only does multiple bpf_tail_calls and at init time the prog array is empty. In the future we may add a flag to do such tracking optionally, but for now play simple and safe. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/verifier.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d96f27ff9f6f..14ccb0759fa4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3470,6 +3470,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) * the program array. */ prog->cb_access = 1; + env->prog->aux->stack_depth = MAX_BPF_STACK; /* mark bpf_tail_call as different opcode to avoid * conditional branch in the interpeter for every normal -- 2.9.3
Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv
Florian Fainelli writes: >> Is orig_dev really needed in the tagging implementation, or only in the >> layer above? (dsa_slave_xmit and dsa_switch_rcv.) > > It's needed: > > https://github.com/ffainelli/linux/commits/switch-tag > https://github.com/ffainelli/linux/commit/61f9ca70dd21bb8c1615f959934cd0ce80a2f3ce Patch dropped!
Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv
On 05/30/2017 12:55 PM, Vivien Didelot wrote: > Hi Florian, > > Florian Fainelli writes: > >> I actually have a patch pending that adds support for HW >> insertion/extraction of switch tags (broadcom HW supports that) which >> require the orig_dev to be there so we know what features are supported >> by the master network device. > > Is orig_dev really needed in the tagging implementation, or only in the > layer above? (dsa_slave_xmit and dsa_switch_rcv.) It's needed: https://github.com/ffainelli/linux/commits/switch-tag https://github.com/ffainelli/linux/commit/61f9ca70dd21bb8c1615f959934cd0ce80a2f3ce > >> Do you mind dropping this one from your patch series? > > I don't mind if they are actually needed in taggers. I'd wait for > reviews of the other patches before respinning though. > > Thanks, > > Vivien > -- Florian
Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv
Hi Florian, Florian Fainelli writes: > I actually have a patch pending that adds support for HW > insertion/extraction of switch tags (broadcom HW supports that) which > require the orig_dev to be there so we know what features are supported > by the master network device. Is orig_dev really needed in the tagging implementation, or only in the layer above? (dsa_slave_xmit and dsa_switch_rcv.) > Do you mind dropping this one from your patch series? I don't mind if they are actually needed in taggers. I'd wait for reviews of the other patches before respinning though. Thanks, Vivien
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
Hi John, Vivien Didelot writes: >> +int port_cpu = ds->ports[port].upstream; > > ds->ports[port] is p->dp. I misread this part, p is not yet allocated in that chunk, please ignore this one comment ;-) Thanks, Vivien
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
Hi John, John Crispin writes: > +static inline bool dsa_is_upstream_port(struct dsa_switch *ds, int p) > +{ > + return dsa_is_cpu_port(ds, p) || dsa_is_dsa_port(ds, p); > +} This looks confusing to me. What DSA calls an "upstream" port for the moment is the port which goes to the CPU interface. CPU0 (eth0) | | sw0 sw1 sw2 [p0 p1 p2]--[p0 p1 p2]--[p0 p1 p2] | | | | eth1 eth2eth3 eth4 So in the example above, sw1p0 is an upstream port, but sw1p2 is not. This is why dsa_upstream_port makes use of ds->rtable. > @@ -1140,11 +1140,16 @@ int dsa_slave_create(struct dsa_switch *ds, struct > device *parent, > struct net_device *master; > struct net_device *slave_dev; > struct dsa_slave_priv *p; > + int port_cpu = ds->ports[port].upstream; ds->ports[port] is p->dp. > int ret; > > - master = ds->dst->master_netdev; > - if (ds->master_netdev) > + if (port_cpu && ds->ports[port_cpu].ethernet) 0 is a valid port index for a CPU, e.g. Marvell 88E6390. > + master = ds->ports[port_cpu].ethernet; > + else if (ds->master_netdev) > master = ds->master_netdev; > + else > + master = ds->dst->master_netdev; > + master->dsa_ptr = (void *)ds->dst; > > slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name, >NET_NAME_UNKNOWN, ether_setup); > @@ -1173,6 +1178,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct > device *parent, > p->dp = &ds->ports[port]; > INIT_LIST_HEAD(&p->mall_tc_list); > p->xmit = dst->tag_ops->xmit; > + p->master = master; I'm a bit confused why we need all these references to net devices. We now have ds->master_netdev, dst->master_netdev, dp->ethernet and p->master... Wouldn't it be simpler if we only had dp->ethernet (or whichever more explicit name) for the conduit interface used to send/receive frames? Maybe I am missing something, in which case I'm sorry in advance. Thanks, Vivien
Re: [net-wireless-orinoco] question about potential null pointer dereference
Hi, > The issue here is that line 56 implies that pointer tfm_michael > might be NULL. If this is the case, there is a potential NULL > pointer dereference at line 52 once pointer tfm_michael is > indirectly dereferenced inside macro SHASH_DESC_ON_STACK(). > > My question is if there is any chance that pointer tfm_michael > might be NULL when calling macro SHASH_DESC_ON_STACK() ? > > I'm trying to figure out if this is a false positive or something > that needs to be fixed somehow. Look, if you're just sending the coverity reports to the list without reflecting and researching them, that's not actually useful - we can look at them ourselves. It took me at most a few minutes to figure this one out, so please just do the same, look at the code, and figure out what the right answer is here. johannes
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
On 05/30/2017 12:15 PM, Florian Fainelli wrote: > Hi John, > > On 05/30/2017 11:37 AM, John Crispin wrote: >> Hi, >> >> the patch series is based on net-next from 12 hours ago and works fine >> on that tree. I compile and runtime tested it quite intensively on >> various boards > > The warning is legit though: > > 572if (dsa_port_is_cpu(port)) > 573err = dsa_cpu_parse(port, index, dst, ds); > > 574else if (dsa_is_normal_port(port)) > 575err = dsa_user_parse(port->dn, index, ds); > > and after applying your patches I also met it: > > net/dsa/dsa2.c: In function 'dsa_ds_parse': > net/dsa/dsa2.c:574:3: warning: passing argument 1 of > 'dsa_is_normal_port' from incompatible pointer type [enabled by default] >else if (dsa_is_normal_port(port)) >^ > In file included from net/dsa/dsa_priv.h:17:0, > from net/dsa/dsa2.c:22: > ./include/net/dsa.h:264:20: note: expected 'struct dsa_switch *' but > argument is of type 'struct dsa_port *' > static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p) > ^ > net/dsa/dsa2.c:574:3: error: too few arguments to function > 'dsa_is_normal_port' >else if (dsa_is_normal_port(port)) >^ > In file included from net/dsa/dsa_priv.h:17:0, > from net/dsa/dsa2.c:22: > ./include/net/dsa.h:264:20: note: declared here > static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p) > ^ > CC net/bridge/br_stp.o > scripts/Makefile.build:302: recipe for target 'net/dsa/dsa2.o' failed > make[4]: *** [net/dsa/dsa2.o] Error 1 > scripts/Makefile.build:561: recipe for target 'net/dsa' failed > make[3]: *** [net/dsa] Error 2 > make[3]: *** Waiting for unfinished jobs > > We need something like this, I have some comments on your patches that I > will send shortly. Thanks! > > diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c > index 8b13aa735c40..124c5acfa123 100644 > --- a/net/dsa/dsa2.c > +++ b/net/dsa/dsa2.c > @@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port, > u32 index, > return 0; > } > > -static int dsa_user_parse(struct device_node *port, u32 index, > +static int dsa_user_parse(struct dsa_port *port, u32 index, > struct dsa_switch *ds) > { > struct device_node *cpu_port; > const unsigned int *cpu_port_reg; > int cpu_port_index; > > - cpu_port = of_parse_phandle(port, "cpu", 0); > + cpu_port = of_parse_phandle(port->dn, "cpu", 0); > if (cpu_port) { > cpu_port_reg = of_get_property(cpu_port, "reg", NULL); > if (!cpu_port_reg) > @@ -572,7 +572,7 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst, > struct dsa_switch *ds) > if (dsa_port_is_cpu(port)) > err = dsa_cpu_parse(port, index, dst, ds); > else if (dsa_is_normal_port(port)) > - err = dsa_user_parse(port->dn, index, ds); > + err = dsa_user_parse(port, index, ds); > > if (err) > return err; here is a version that builds, I missed one hunk: diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 8b13aa735c40..d71a2a610340 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index, return 0; } -static int dsa_user_parse(struct device_node *port, u32 index, +static int dsa_user_parse(struct dsa_port *port, u32 index, struct dsa_switch *ds) { struct device_node *cpu_port; const unsigned int *cpu_port_reg; int cpu_port_index; - cpu_port = of_parse_phandle(port, "cpu", 0); + cpu_port = of_parse_phandle(port->dn, "cpu", 0); if (cpu_port) { cpu_port_reg = of_get_property(cpu_port, "reg", NULL); if (!cpu_port_reg) @@ -571,8 +571,8 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst, struct dsa_switch *ds) if (dsa_port_is_cpu(port)) err = dsa_cpu_parse(port, index, dst, ds); - else if (dsa_is_normal_port(port)) - err = dsa_user_parse(port->dn, index, ds); + else if (dsa_is_normal_port(ds, index)) + err = dsa_user_parse(port, index, ds); if (err) return err; -- Florian
Re: [PATCH linux-firmware] Mellanox: Add firmware for mlxsw_spectrum
On Mon, May 29, 2017 at 01:42:28PM +0300, Yotam Gigi wrote: > Add first firmware for the Mellanox Spectrum switch, as a followup to the > recently added commit: > 6b7421992b8d ("mlxsw: spectrum: Validate firmware revision on init") > > The version of the firmware release is 13.1420.122 > > Signed-off-by: Yotam Gigi > --- > mlxsw_spectrum-13.1420.122.mfa2 | Bin 0 -> 860424 bytes > 1 file changed, 0 insertions(+), 0 deletions(-) > create mode 100644 mlxsw_spectrum-13.1420.122.mfa2 > > diff --git a/mlxsw_spectrum-13.1420.122.mfa2 b/mlxsw_spectrum-13.1420.122.mfa2 > new file mode 100644 Can we please put this in a subdirectory if we're going to be adding more versions of this in the future? --Kyle
Re: [PATCH v2] arm: eBPF JIT compiler
Forwarding this to net-dev and eBPF folks, who weren't on CC... -Kees On Thu, May 25, 2017 at 4:13 PM, Shubham Bansal wrote: > The JIT compiler emits ARM 32 bit instructions. Currently, It supports > eBPF only. Classic BPF is supported because of the conversion by BPF > core. > > This patch is essentially changing the current implementation of JIT > compiler of Berkeley Packet Filter from classic to internal with almost > all instructions from eBPF ISA supported except the following > BPF_ALU64 | BPF_DIV | BPF_K > BPF_ALU64 | BPF_DIV | BPF_X > BPF_ALU64 | BPF_MOD | BPF_K > BPF_ALU64 | BPF_MOD | BPF_X > BPF_STX | BPF_XADD | BPF_W > BPF_STX | BPF_XADD | BPF_DW > BPF_JMP | BPF_CALL > > Implementation is using scratch space to emulate 64 bit eBPF ISA on 32 bit > ARM because of deficiency of general purpose registers on ARM. Currently, > only LITTLE ENDIAN machines are supported in this eBPF JIT Compiler. > > Tested on ARMv7 with QEMU by me (Shubham Bansal). > Tested on ARMv5 by Andrew Lunn (and...@lunn.ch). > Expected to work on ARMv6 as well, as its a part ARMv7 and part ARMv5. > Although, a proper testing is not done for ARMv6. > > Both of these testing are done with and without CONFIG_FRAME_POINTER > separately for LITTLE ENDIAN machine. > > For testing: > > 1. JIT is enabled with > echo 1 > /proc/sys/net/core/bpf_jit_enable > 2. Constant Blinding can be enabled along with JIT using > echo 1 > /proc/sys/net/core/bpf_jit_enable > echo 2 > /proc/sys/net/core/bpf_jit_harden > > See Documentation/networking/filter.txt for more information. > > Result : test_bpf: Summary: 314 PASSED, 0 FAILED, [278/306 JIT'ed] > > Signed-off-by: Shubham Bansal > --- > Documentation/networking/filter.txt |4 +- > arch/arm/Kconfig|2 +- > arch/arm/net/bpf_jit_32.c | 2404 > --- > arch/arm/net/bpf_jit_32.h | 108 +- > 4 files changed, 1713 insertions(+), 805 deletions(-) > > diff --git a/Documentation/networking/filter.txt > b/Documentation/networking/filter.txt > index b69b205..01165ac 100644 > --- a/Documentation/networking/filter.txt > +++ b/Documentation/networking/filter.txt > @@ -596,8 +596,8 @@ skb pointer). All constraints and restrictions from > bpf_check_classic() apply > before a conversion to the new layout is being done behind the scenes! > > Currently, the classic BPF format is being used for JITing on most 32-bit > -architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform JIT > -compilation from eBPF instruction set. > +architectures, whereas x86-64, aarch64, arm, s390x, powerpc64, sparc64 > perform > +JIT compilation from eBPF instruction set. > > Some core changes of the new internal format: > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 8a7ab5e..13ade46 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -47,7 +47,7 @@ config ARM > select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT) > select HAVE_ARCH_TRACEHOOK > select HAVE_ARM_SMCCC if CPU_V7 > - select HAVE_CBPF_JIT > + select HAVE_EBPF_JIT > select HAVE_CC_STACKPROTECTOR > select HAVE_CONTEXT_TRACKING > select HAVE_C_RECORDMCOUNT > diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c > index 93d0b6d..c7476e5 100644 > --- a/arch/arm/net/bpf_jit_32.c > +++ b/arch/arm/net/bpf_jit_32.c > @@ -1,13 +1,15 @@ > /* > - * Just-In-Time compiler for BPF filters on 32bit ARM > + * Just-In-Time compiler for eBPF filters on 32bit ARM > * > * Copyright (c) 2011 Mircea Gherzan > + * Copyright (c) 2017 Shubham Bansal > * > * This program is free software; you can redistribute it and/or modify it > * under the terms of the GNU General Public License as published by the > * Free Software Foundation; version 2 of the License. > */ > > +#include > #include > #include > #include > @@ -23,44 +25,91 @@ > > #include "bpf_jit_32.h" > > +int bpf_jit_enable __read_mostly; > + > +#define STACK_OFFSET(k)(k) > +#define TMP_REG_1 (MAX_BPF_JIT_REG + 0) /* TEMP Register 1 */ > +#define TMP_REG_2 (MAX_BPF_JIT_REG + 1) /* TEMP Register 2 */ > +#define TCALL_CNT (MAX_BPF_JIT_REG + 2) /* Tail Call Count */ > + > +/* Flags used for JIT optimization */ > +#define SEEN_CALL (1 << 0) > + > +#define FLAG_IMM_OVERFLOW (1 << 0) > + > /* > - * ABI: > + * Map eBPF registers to ARM 32bit registers or stack scratch space. > + * > + * 1. First argument is passed using the arm 32bit registers and rest of the > + * arguments are passed on stack scratch space. > + * 2. First callee-saved aregument is mapped to arm 32 bit registers and rest > + * arguments are mapped to scratch space on stack. > + * 3. We need two 64 bit temp registers to do complex operations on eBPF > + * registers. > + * > + * As the eBPF registers are all 64 bit registers and arm has only 3
Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support
Hi John, On 05/30/2017 11:37 AM, John Crispin wrote: > Hi, > > the patch series is based on net-next from 12 hours ago and works fine > on that tree. I compile and runtime tested it quite intensively on > various boards The warning is legit though: 572if (dsa_port_is_cpu(port)) 573err = dsa_cpu_parse(port, index, dst, ds); > 574else if (dsa_is_normal_port(port)) 575err = dsa_user_parse(port->dn, index, ds); and after applying your patches I also met it: net/dsa/dsa2.c: In function 'dsa_ds_parse': net/dsa/dsa2.c:574:3: warning: passing argument 1 of 'dsa_is_normal_port' from incompatible pointer type [enabled by default] else if (dsa_is_normal_port(port)) ^ In file included from net/dsa/dsa_priv.h:17:0, from net/dsa/dsa2.c:22: ./include/net/dsa.h:264:20: note: expected 'struct dsa_switch *' but argument is of type 'struct dsa_port *' static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p) ^ net/dsa/dsa2.c:574:3: error: too few arguments to function 'dsa_is_normal_port' else if (dsa_is_normal_port(port)) ^ In file included from net/dsa/dsa_priv.h:17:0, from net/dsa/dsa2.c:22: ./include/net/dsa.h:264:20: note: declared here static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p) ^ CC net/bridge/br_stp.o scripts/Makefile.build:302: recipe for target 'net/dsa/dsa2.o' failed make[4]: *** [net/dsa/dsa2.o] Error 1 scripts/Makefile.build:561: recipe for target 'net/dsa' failed make[3]: *** [net/dsa] Error 2 make[3]: *** Waiting for unfinished jobs We need something like this, I have some comments on your patches that I will send shortly. Thanks! diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 8b13aa735c40..124c5acfa123 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index, return 0; } -static int dsa_user_parse(struct device_node *port, u32 index, +static int dsa_user_parse(struct dsa_port *port, u32 index, struct dsa_switch *ds) { struct device_node *cpu_port; const unsigned int *cpu_port_reg; int cpu_port_index; - cpu_port = of_parse_phandle(port, "cpu", 0); + cpu_port = of_parse_phandle(port->dn, "cpu", 0); if (cpu_port) { cpu_port_reg = of_get_property(cpu_port, "reg", NULL); if (!cpu_port_reg) @@ -572,7 +572,7 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst, struct dsa_switch *ds) if (dsa_port_is_cpu(port)) err = dsa_cpu_parse(port, index, dst, ds); else if (dsa_is_normal_port(port)) - err = dsa_user_parse(port->dn, index, ds); + err = dsa_user_parse(port, index, ds); if (err) return err; > > John > > > On 30/05/17 17:38, kbuild test robot wrote: >> Hi John, >> >> [auto build test ERROR on net-next/master] >> [also build test ERROR on next-20170530] >> [cannot apply to v4.12-rc3] >> [if your patch is applied to the wrong git tree, please drop us a note >> to help improve the system] >> >> url: >> https://github.com/0day-ci/linux/commits/John-Crispin/Documentation-devicetree-add-multiple-cpu-port-DSA-binding/20170530-224954 >> >> config: x86_64-randconfig-x014-201722 (attached as .config) >> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 >> reproduce: >> # save the attached .config to linux build tree >> make ARCH=x86_64 >> >> All error/warnings (new ones prefixed by >>): >> >> In file included from include/linux/ioport.h:12:0, >> from include/linux/device.h:16, >> from net//dsa/dsa2.c:13: >> net//dsa/dsa2.c: In function 'dsa_ds_parse': >>>> net//dsa/dsa2.c:574:31: error: passing argument 1 of >>>> 'dsa_is_normal_port' from incompatible pointer type >>>> [-Werror=incompatible-pointer-types] >>else if (dsa_is_normal_port(port)) >>^ >> include/linux/compiler.h:160:30: note: in definition of macro >> '__trace_if' >> if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ >> ^~~~ >>>> net//dsa/dsa2.c:574:8: note: in expansion of macro 'if' >>else if (dsa_is_normal_port(port)) >> ^~ >> In file included from net//dsa/dsa_priv.h:17:0, >>
Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
On Tue, May 30, 2017 at 10:37:46AM -0700, Alexei Starovoitov wrote: > On 5/30/17 9:51 AM, Peter Zijlstra wrote: > > I'm not entirely sure I see how that is required. Should per task not > > already work? The WARN that's there will only trigger if you call them > > on the wrong task, which is something you shouldn't do anyway. > > The kernel WARN is considered to be a bug of bpf infra. That's the > reason we do all these checks at map update time and at run-time. > The bpf program authors should be able to do all possible experiments > until their scripts work. Dealing with kernel warns and reboots is not > something user space folks like to do. > Today bpf_perf_event_read() for per-task events isn't really > working due to event->oncpu != cpu runtime check in there. > If we convert warns to returns the existing scripts will continue > to work as-is and per-task will be possible. Ah yes.. I always forget that side of things (not ever having seen a bpf thing working doesn't help either of course). Let me consider that for a bit..
Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv
Hey Vivien, On 05/30/2017 11:33 AM, Vivien Didelot wrote: > The struct dsa_device_ops defines the rcv function with 2 unused > arguments struct packet_type *pt, and struct net_device *orig_dev. > > This patch removes them from the definition and implementations. > > Reviewed-by: Andrew Lunn > Signed-off-by: Vivien Didelot I actually have a patch pending that adds support for HW insertion/extraction of switch tags (broadcom HW supports that) which require the orig_dev to be there so we know what features are supported by the master network device. Do you mind dropping this one from your patch series? Thanks! -- Florian