[PATCH net] mpls: fix clearing of dead nh_flags on link up

2017-05-30 Thread Roopa Prabhu
From: Roopa Prabhu 

recent fixes to use WRITE_ONCE for nh_flags on link up,
accidently ended up leaving the deadflags on a nh. This patch
fixes the WRITE_ONCE to use freshly evaluated nh_flags.

Fixes: 39eb8cd17588 ("net: mpls: rt_nhn_alive and nh_flags should be accessed 
using READ_ONCE")
Reported-by: Satish Ashok 
Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 94b3317..b51582d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1483,7 +1483,7 @@ static void mpls_ifup(struct net_device *dev, unsigned 
int flags)
continue;
alive++;
nh_flags &= ~flags;
-   WRITE_ONCE(nh->nh_flags, flags);
+   WRITE_ONCE(nh->nh_flags, nh_flags);
} endfor_nexthops(rt);
 
WRITE_ONCE(rt->rt_nhn_alive, alive);
-- 
1.9.1



Re: [PULL] topic/e1000e-fix

2017-05-30 Thread Daniel Vetter
On Wed, May 31, 2017 at 7:54 AM, Daniel Vetter  wrote:
> On Wed, May 31, 2017 at 1:06 AM, Dave Airlie  wrote:
>> On 31 May 2017 at 08:10, David Miller  wrote:
>>> From: Daniel Vetter 
>>> Date: Tue, 30 May 2017 22:15:42 +0200
>>>
 If the e1000e maintainer wants to coalesce or not return statements
 this simple way, that's imo on him to change the color as needed.
>>>
>>> That's not how things work.
>>>
>>> If the maintainer wants you to style things a certain way, either you
>>> do it that way or your patch isn't accepted.
>
> Consider this pull a regression report, pls handle it.

And I guess I pile of more cc, to make this regression report
complete. I mean you got the backtrace, bisect and a proposed fix, and
the almost-whitespace change demanded is something gcc does in its
sleep. I'd understand a request to retest if it would be a real
functional change, but in this situation I have no idea why this
regression just can't be fixed already.

Not sure if it's really preferred if regression reports come
incomplete, without bugfix and bisect attached.
-Daniel

>> I'm not really sure why Chris just couldn't respin already.
>>
>> Though really I think Chris should have just asked for a revert of the
>> original patch that broke stuff, instead of trying to patch a driver
>> if he doesn't have time to get the patch right for the maintainer.
>
> Ok, can we pls revert 2800209994f8 ("e1000e: Refactor PM flows") then?
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[PATCH net-next] powerpc: use asm-generic/socket.h as much as possible

2017-05-30 Thread Stephen Rothwell
asm-generic/socket.h already has an exception for the differences that
powerpc needs, so just include it after defining the differences.

Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/include/uapi/asm/socket.h | 92 +-
 1 file changed, 1 insertion(+), 91 deletions(-)

Build tested using powerpc allyesconfig, pseries_le_defconfig, 32 bit
and 64 bit allnoconfig and ppc44x_defconfig builds.

Dave, this is on top of the net-next tree and I am sending it to you
since it would produce a conflict if it was merged via the powerpc tree.

diff --git a/arch/powerpc/include/uapi/asm/socket.h 
b/arch/powerpc/include/uapi/asm/socket.h
index bc4ca72faf99..3c590c7c42c0 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -8,28 +8,6 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#include 
-
-/* For setsockopt(2) */
-#define SOL_SOCKET 1
-
-#define SO_DEBUG   1
-#define SO_REUSEADDR   2
-#define SO_TYPE3
-#define SO_ERROR   4
-#define SO_DONTROUTE   5
-#define SO_BROADCAST   6
-#define SO_SNDBUF  7
-#define SO_RCVBUF  8
-#define SO_SNDBUFFORCE 32
-#define SO_RCVBUFFORCE 33
-#define SO_KEEPALIVE   9
-#define SO_OOBINLINE   10
-#define SO_NO_CHECK11
-#define SO_PRIORITY12
-#define SO_LINGER  13
-#define SO_BSDCOMPAT   14
-#define SO_REUSEPORT   15
 #define SO_RCVLOWAT16
 #define SO_SNDLOWAT17
 #define SO_RCVTIMEO18
@@ -37,74 +15,6 @@
 #define SO_PASSCRED20
 #define SO_PEERCRED21
 
-/* Security levels - as per NRL IPv6 - don't actually do anything */
-#define SO_SECURITY_AUTHENTICATION 22
-#define SO_SECURITY_ENCRYPTION_TRANSPORT   23
-#define SO_SECURITY_ENCRYPTION_NETWORK 24
-
-#define SO_BINDTODEVICE25
-
-/* Socket filtering */
-#define SO_ATTACH_FILTER   26
-#define SO_DETACH_FILTER   27
-#define SO_GET_FILTER  SO_ATTACH_FILTER
-
-#define SO_PEERNAME28
-#define SO_TIMESTAMP   29
-#define SCM_TIMESTAMP  SO_TIMESTAMP
-
-#define SO_ACCEPTCONN  30
-
-#define SO_PEERSEC 31
-#define SO_PASSSEC 34
-#define SO_TIMESTAMPNS 35
-#define SCM_TIMESTAMPNSSO_TIMESTAMPNS
-
-#define SO_MARK36
-
-#define SO_TIMESTAMPING37
-#define SCM_TIMESTAMPING   SO_TIMESTAMPING
-
-#define SO_PROTOCOL38
-#define SO_DOMAIN  39
-
-#define SO_RXQ_OVFL 40
-
-#define SO_WIFI_STATUS 41
-#define SCM_WIFI_STATUSSO_WIFI_STATUS
-#define SO_PEEK_OFF42
-
-/* Instruct lower device to use last 4-bytes of skb data as FCS */
-#define SO_NOFCS   43
-
-#define SO_LOCK_FILTER 44
-
-#define SO_SELECT_ERR_QUEUE45
-
-#define SO_BUSY_POLL   46
-
-#define SO_MAX_PACING_RATE 47
-
-#define SO_BPF_EXTENSIONS  48
-
-#define SO_INCOMING_CPU49
-
-#define SO_ATTACH_BPF  50
-#define SO_DETACH_BPF  SO_DETACH_FILTER
-
-#define SO_ATTACH_REUSEPORT_CBPF   51
-#define SO_ATTACH_REUSEPORT_EBPF   52
-
-#define SO_CNX_ADVICE  53
-
-#define SCM_TIMESTAMPING_OPT_STATS 54
-
-#define SO_MEMINFO 55
-
-#define SO_INCOMING_NAPI_ID56
-
-#define SO_COOKIE  57
-
-#define SCM_TIMESTAMPING_PKTINFO   58
+#include 
 
 #endif /* _ASM_POWERPC_SOCKET_H */
-- 
2.11.0

-- 
Cheers,
Stephen Rothwell


Re: [PATCH net-next 0/8] Introduce bpf ID

2017-05-30 Thread Martin KaFai Lau
On Tue, May 30, 2017 at 06:35:37PM -0600, David Ahern wrote:
> On 5/30/17 6:08 PM, Martin KaFai Lau wrote:
> > This patch series:
> > 1) Introduce ID for both bpf_prog and bpf_map.
> > 2) Add bpf commands to iterate the prog IDs and map
> >IDs of the system.
> > 3) Add bpf commands to get a prog/map fd from an ID
> > 4) Add bpf command to get prog/map info from a fd.
> >The prog/map info is a jump start in this patchset
> >and it is not meant to be a complete list.  They can
> >be extended in the future patches.
>
> Interesting. This really simplifies the bpf retrieval I have been
> working on:
> https://github.com/dsahern/linux/commits/bpf/retrieve-bpf-wip
>
> Having the net device, filter, route, socket, cgroup, etc return the ids
> for attached programs solves the problem of getting a reference to all
> of the places BPF programs can be placed.
Right, it is the idea.

I am also baking some follow-up patches for returning xdp-prog's id
through netlink.

>
> This patch set turns an id into a fd, and my patches allow the fd to be
> used to get the bpf code.


Re: [PATCH 1/2] ARM: dts: imx6ul-14x14-evk: Add ksz8081 phy properties

2017-05-30 Thread Fabio Estevam
On Tue, May 30, 2017 at 2:34 PM, Leonard Crestez
 wrote:
> Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same
> register values can be set through the micrel phy driver by using dts
> properties.
>
> This seems preferable and allows cleanly fixing suspend/resume.
>
> Signed-off-by: Leonard Crestez 

Reviewed-by: Fabio Estevam 


Re: [net-bluetooth] question about potential null pointer dereference

2017-05-30 Thread Gustavo A. R. Silva

Hi Marcel,

Quoting "Gustavo A. R. Silva" :


Hi Marcel,

Quoting Marcel Holtmann :


Hi Gustavo,

While looking into Coverity ID 1357456 I ran into the following  
piece of code at net/bluetooth/smp.c:166


166/* The following functions map to the LE SC SMP crypto functions
167 * AES-CMAC, f4, f5, f6, g2 and h6.
168 */
169
170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16],  
const u8 *m,

171size_t len, u8 mac[16])
172{
173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX];
174SHASH_DESC_ON_STACK(desc, tfm);
175int err;
176
177if (len > CMAC_MSG_MAX)
178return -EFBIG;
179
180if (!tfm) {
181BT_ERR("tfm %p", tfm);
182return -EINVAL;
183}
184


BTW, what do you think about removing the IF block above?


185desc->tfm = tfm;
186desc->flags = 0;
187
188/* Swap key and message from LSB to MSB */
189swap_buf(k, tmp, 16);
190swap_buf(m, msg_msb, len);
191
192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m);
193SMP_DBG("key %16phN", k);
194
195err = crypto_shash_setkey(tfm, tmp, 16);
196if (err) {
197BT_ERR("cipher setkey failed: %d", err);
198return err;
199}
200
201err = crypto_shash_digest(desc, msg_msb, len, mac_msb);
202shash_desc_zero(desc);
203if (err) {
204BT_ERR("Hash computation error %d", err);
205return err;
206}
207
208swap_buf(mac_msb, mac, 16);
209
210SMP_DBG("mac %16phN", mac);
211
212return 0;
213}

The issue here is that line 180 implies that pointer tfm might be  
NULL. If this is the case, there is a potential NULL pointer  
dereference at line 174 once pointer tfm is indirectly  
dereferenced inside macro SHASH_DESC_ON_STACK().


My question is if there is any chance that pointer tfm maybe be  
NULL when calling macro SHASH_DESC_ON_STACK()?


I think the part you are after is this:

   smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
   if (IS_ERR(smp->tfm_cmac)) {
   BT_ERR("Unable to create CMAC crypto context");
   crypto_free_cipher(smp->tfm_aes);
   kzfree(smp);
   return NULL;
   }



Yeah, this makes it all clear.

So the tfm_cmac is part of the smp structure. However if there is  
no cipher, we destroy the smp structure and essentially run without  
SMP support. So it can not really be called anyway.




What I take from this is that as a general rule, I should first try  
to identify whether the code I'm debugging is reachable or not,  
depending on the specific structures and variables I'm interested in.



Maybe commenting this might be a good idea.



Yep, it wouldn't hurt.

In the meantime I will triage and document this as a false positive.

Thank you very much for the clarification, Marcel,
I really appreciate it.
--
Gustavo A. R. Silva


Thanks
--
Gustavo A. R. Silva








Re: [net-bluetooth] question about potential null pointer dereference

2017-05-30 Thread Gustavo A. R. Silva

Hi Marcel,

Quoting Marcel Holtmann :


Hi Gustavo,

While looking into Coverity ID 1357456 I ran into the following  
piece of code at net/bluetooth/smp.c:166


166/* The following functions map to the LE SC SMP crypto functions
167 * AES-CMAC, f4, f5, f6, g2 and h6.
168 */
169
170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16],  
const u8 *m,

171size_t len, u8 mac[16])
172{
173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX];
174SHASH_DESC_ON_STACK(desc, tfm);
175int err;
176
177if (len > CMAC_MSG_MAX)
178return -EFBIG;
179
180if (!tfm) {
181BT_ERR("tfm %p", tfm);
182return -EINVAL;
183}
184
185desc->tfm = tfm;
186desc->flags = 0;
187
188/* Swap key and message from LSB to MSB */
189swap_buf(k, tmp, 16);
190swap_buf(m, msg_msb, len);
191
192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m);
193SMP_DBG("key %16phN", k);
194
195err = crypto_shash_setkey(tfm, tmp, 16);
196if (err) {
197BT_ERR("cipher setkey failed: %d", err);
198return err;
199}
200
201err = crypto_shash_digest(desc, msg_msb, len, mac_msb);
202shash_desc_zero(desc);
203if (err) {
204BT_ERR("Hash computation error %d", err);
205return err;
206}
207
208swap_buf(mac_msb, mac, 16);
209
210SMP_DBG("mac %16phN", mac);
211
212return 0;
213}

The issue here is that line 180 implies that pointer tfm might be  
NULL. If this is the case, there is a potential NULL pointer  
dereference at line 174 once pointer tfm is indirectly dereferenced  
inside macro SHASH_DESC_ON_STACK().


My question is if there is any chance that pointer tfm maybe be  
NULL when calling macro SHASH_DESC_ON_STACK()?


I think the part you are after is this:

smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
if (IS_ERR(smp->tfm_cmac)) {
BT_ERR("Unable to create CMAC crypto context");
crypto_free_cipher(smp->tfm_aes);
kzfree(smp);
return NULL;
}



Yeah, this makes it all clear.

So the tfm_cmac is part of the smp structure. However if there is no  
cipher, we destroy the smp structure and essentially run without SMP  
support. So it can not really be called anyway.




What I take from this is that as a general rule, I should first try to  
identify whether the code I'm debugging is reachable or not, depending  
on the specific structures and variables I'm interested in.



Maybe commenting this might be a good idea.



Yep, it wouldn't hurt.

In the meantime I will triage and document this as a false positive.

Thank you very much for the clarification, Marcel,
I really appreciate it.
--
Gustavo A. R. Silva







Re: [PATCH net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID

2017-05-30 Thread kbuild test robot
Hi Martin,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Martin-KaFai-Lau/bpf-Introduce-bpf_prog-ID/20170531-110650
config: x86_64-randconfig-x015-201722 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/bug.h:81:0,
from include/linux/bug.h:4,
from include/linux/thread_info.h:11,
from arch/x86/include/asm/preempt.h:6,
from include/linux/preempt.h:80,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/ktime.h:24,
from include/linux/timer.h:5,
from include/linux/workqueue.h:8,
from include/linux/bpf.h:12,
from kernel/bpf/syscall.c:12:
   kernel/bpf/syscall.c: In function 'bpf_prog_inc_not_zero':
>> kernel/bpf/syscall.c:834:16: error: implicit declaration of function 
>> 'lockdep_is_held' [-Werror=implicit-function-declaration]
 WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock));
   ^
   include/asm-generic/bug.h:65:25: note: in definition of macro 'WARN_ON_ONCE'
 int __ret_warn_on = !!(condition);   \
^
   cc1: some warnings being treated as errors

vim +/lockdep_is_held +834 kernel/bpf/syscall.c

   828  EXPORT_SYMBOL_GPL(bpf_prog_inc);
   829  
   830  static struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
   831  {
   832  int refold;
   833  
 > 834  WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock));
   835  refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0);
   836  
   837  if (refold >= BPF_MAX_REFCNT) {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH net-next] bnxt_en: Fix xmit_more with BQL.

2017-05-30 Thread David Miller
From: Michael Chan 
Date: Tue, 30 May 2017 20:03:00 -0400

> We need to write the doorbell if BQL has stopped the queue and
> skb->xmit_more is set.  Otherwise it is possible for the tx queue to
> rot and cause tx timeout.
> 
> Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.")
> Suggested-by: Yuval Mintz 
> Signed-off-by: Michael Chan 

Applied, thanks.


[PATCH] enic: Fix another sleep-in-atomic bug

2017-05-30 Thread Jia-Ju Bai
The driver may sleep under a spin lock in some function call paths.
The 1st function call path is:
enic_tx_hang_reset (acquire the lock by spin_lock)
  enic_stop
enic_synchronize_irqs
  synchronize_irq --> may sleep
The 2nd function call path is:
enic_tx_hang_reset (acquire the lock by spin_lock)
  enic_dev_wait
schedule_timeout_uninterruptible --> may sleep
The 3rd function call path is:
enic_tx_hang_reset (acquire the lock by spin_lock)
  enic_open
enic_request_intr
  enic_set_rx_cpu_rmap
enic_free_rx_cpu_rmap
  free_irq_cpu_rmap --> may sleep

To fix it, the "spin_lock" and "spin_unlock" are removed 
in enic_tx_hang_reset.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/cisco/enic/enic_main.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 4b87bee..d6523e2 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2330,7 +2330,6 @@ static void enic_tx_hang_reset(struct work_struct *work)
 
rtnl_lock();
 
-   spin_lock(&enic->enic_api_lock);
enic_dev_hang_notify(enic);
enic_stop(enic->netdev);
enic_dev_hang_reset(enic);
@@ -2339,7 +2338,6 @@ static void enic_tx_hang_reset(struct work_struct *work)
enic_set_rss_nic_cfg(enic);
enic_dev_set_ig_vlan_rewrite_mode(enic);
enic_open(enic->netdev);
-   spin_unlock(&enic->enic_api_lock);
call_netdevice_notifiers(NETDEV_REBOOT, enic->netdev);
 
rtnl_unlock();
-- 
1.7.9.5




[PATCH] enic: Fix a sleep-in-atomic bug

2017-05-30 Thread Jia-Ju Bai
The driver may sleep under a spin lock in some function call paths.
The 1st function call path is:
enic_reset (acquire the lock by spin_lock)
  enic_stop
enic_synchronize_irqs
  synchronize_irq --> may sleep
The 2nd function call path is:
enic_reset (acquire the lock by spin_lock)
  enic_dev_soft_reset
enic_dev_wait
  schedule_timeout_uninterruptible --> may sleep
The 3rd function call path is:
enic_reset (acquire the lock by spin_lock)
  enic_open
enic_request_intr
  enic_set_rx_cpu_rmap
enic_free_rx_cpu_rmap
  free_irq_cpu_rmap --> may sleep

To fix it, the "spin_lock" and "spin_unlock" are removed in enic_reset.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/cisco/enic/enic_main.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 4b87bee..2a9bef8 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2310,7 +2310,6 @@ static void enic_reset(struct work_struct *work)
 
rtnl_lock();
 
-   spin_lock(&enic->enic_api_lock);
enic_stop(enic->netdev);
enic_dev_soft_reset(enic);
enic_reset_addr_lists(enic);
@@ -2318,7 +2317,6 @@ static void enic_reset(struct work_struct *work)
enic_set_rss_nic_cfg(enic);
enic_dev_set_ig_vlan_rewrite_mode(enic);
enic_open(enic->netdev);
-   spin_unlock(&enic->enic_api_lock);
call_netdevice_notifiers(NETDEV_REBOOT, enic->netdev);
 
rtnl_unlock();
-- 
1.7.9.5




[PATCH] i40e: Fix a sleep-in-atomic bug

2017-05-30 Thread Jia-Ju Bai
The driver may sleep under a spin lock, and the function call path is:
i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh)
  i40e_vsi_remove_pvid
i40e_vlan_stripping_disable
  i40e_aq_update_vsi_params
i40e_asq_send_command
  mutex_lock --> may sleep

To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and 
the lock is acquired again after this function.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 95c23fb..0fb38ca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -3017,10 +3017,12 @@ int i40e_ndo_set_vf_port_vlan(struct net_device 
*netdev, int vf_id,
   VLAN_VID_MASK));
}
 
+   spin_unlock_bh(&vsi->mac_filter_hash_lock);
if (vlan_id || qos)
ret = i40e_vsi_add_pvid(vsi, vlanprio);
else
i40e_vsi_remove_pvid(vsi);
+   spin_lock_bh(&vsi->mac_filter_hash_lock);
 
if (vlan_id) {
dev_info(&pf->pdev->dev, "Setting VLAN %d, QOS 0x%x on VF %d\n",
-- 
1.7.9.5




Re: Deleting a dynamic mac entry..

2017-05-30 Thread Manohar Kumar
Please see inline..

On Fri, May 26, 2017 at 1:15 AM, Toshiaki Makita
 wrote:
> On 2017/05/26 14:08, Manohar Kumar wrote:
>> On Wed, May 24, 2017 at 6:11 PM, Toshiaki Makita
>>  wrote:
>>> On 2017/05/25 3:05, Manohar Kumar wrote:
 Thanks, Toshiaki.

 What is the right way to set the default_pvid using the bridge command
 ? I tried this, which fails..

 root@net-3:~# ip link set dev vxlan0 name untagged type vlan id 0
 RTNETLINK answers: Operation not supported
 root@net-3:~#

 All the interfaces in the bridge are untagged.
>>>
>>> # ip link set br0 down
>>> # echo 0 > /sys/class/net/br0/bridge/default_pvid
>>>
>>
>> I have the bridge inside a network namespace. There is no
>> /sys/class/net entry for the bridge interface inside the namespace.
>>
>> I see online references to this concept of 'tagged' sysfs directories
>> to support namespaces. But its not clear how to access the sysfs
>> entries for a particular namespace.
>
> Did you remount /sys after entering netns?

No, that was the problem. Mounting /sys did the trick. Appreciate your help.

Trying to understanding this better. How is the sysfs tree maintained
per network namespace ? I assume only the /sys/class/net is per
network namespace and the rest of the /sys is shared with the host
namespace ? Also, how does mounting /sys in the network namespace make
the new /sys/class/net tree visible ?

thanks,
Manohar.


> If not, remount /sys or use "ip netns" which automatically do it.
>
>> Is there any other way to set the default_pvid for the bridge
>> interface inside namespace ?
>
> No, as of 3.19.
>
>
> Toshiaki Makita
>


[PATCH] isdn: Fix a sleep-in-atomic bug

2017-05-30 Thread Jia-Ju Bai
The driver may sleep under a spin lock, the function call path is:
isdn_ppp_mp_receive (acquire the lock)
  isdn_ppp_mp_reassembly
isdn_ppp_push_higher
  isdn_ppp_decompress
isdn_ppp_ccp_reset_trans
  isdn_ppp_ccp_reset_alloc_state
kzalloc(GFP_KERNEL) --> may sleep

To fixed it, the "GFP_KERNEL" is replaced with "GFP_ATOMIC".

Signed-off-by: Jia-Ju Bai 
---
 drivers/isdn/i4l/isdn_ppp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c
index d07dd519..8aa158a 100644
--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -2364,7 +2364,7 @@ static struct ippp_ccp_reset_state 
*isdn_ppp_ccp_reset_alloc_state(struct ippp_s
   id);
return NULL;
} else {
-   rs = kzalloc(sizeof(struct ippp_ccp_reset_state), GFP_KERNEL);
+   rs = kzalloc(sizeof(struct ippp_ccp_reset_state), GFP_ATOMIC);
if (!rs)
return NULL;
rs->state = CCPResetIdle;
-- 
1.7.9.5




Re: [i40e] regression on TCP stream and TCP maerts, kernel-4.12.0-0.rc2

2017-05-30 Thread Alexander Duyck
On Tue, May 30, 2017 at 8:41 AM, Alexander Duyck
 wrote:
> On Tue, May 30, 2017 at 6:43 AM, Adam Okuliar  wrote:
>> Hello,
>>
>> we found regression on intel card(XL710) with i40e driver. Regression is
>> about ~45%
>> on TCP_STREAM and TCP_MAERTS test for IPv4 and IPv6. Regression was first
>> visible in kernel-4.12.0-0.rc1.
>>
>> More details about results you can see in uploaded images in bugzilla. [0]
>>
>>
>> [0] https://bugzilla.kernel.org/show_bug.cgi?id=195923
>>
>>
>> Best regards, / S pozdravom,
>>
>> Adrián Tomašov
>> Kernel Performance QE
>> atoma...@redhat.com
>
> I have added the i40e driver maintainer and the intel-wired-lan
> mailing list so that we can make are developers aware of the issue.
>
> Thanks.
>
> - Alex

Adam,

We are having some issues trying to reproduce what you reported.

Can you provide some additional data. Specifically we would be looking
for an "ethtool -i", and an "ethtool -S" for the port before and after
the test. If you can attach it to the bugzilla that would be
appreciated.

Thanks.

- Alex


RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-30 Thread YUAN Linyu
Ok, I will send v2 later

> -Original Message-
> From: David Ahern [mailto:dsah...@gmail.com]
> Sent: Wednesday, May 31, 2017 8:40 AM
> To: YUAN Linyu; David Ahern; Joe Perches; David Miller; cug...@163.com
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of
> __ndisc_fill_addr_option()
> 
> On 5/30/17 6:29 PM, YUAN Linyu wrote:
> >>> that function should be converted to skb_put_zero once it hits net-next.
> >> I will check it
> > I can't find skb_put_zero
> 
> I believe the decision was to put it in Johannes' tree and it will make
> its way to DaveM's tree. Give some time.


Re: [PATCH iproute2 0/4] ip: Add vrf show commmand

2017-05-30 Thread Stephen Hemminger
On Sat, 27 May 2017 17:34:46 -0600
David Ahern  wrote:

> Refactor ip address to export its capability to save a list of nlmsg's
> for links and its link filter. Use both to add an 'ip vrf show' command
> to list all configured VRF with table id.
> 
> David Ahern (4):
>   ip address: Export ip_linkaddr_list
>   ip address: Move filter struct to ip_common.h
>   ip address: Change print_linkinfo_brief to take filter as an input
>   ip vrf: Add show command
> 
>  include/libnetlink.h |  10 
>  ip/ip_common.h   |  27 -
>  ip/ipaddress.c   | 144 +++-
>  ip/iplink.c  |   2 +-
>  ip/ipvrf.c   | 153 
> +--
>  man/man8/ip-vrf.8|  11 
>  6 files changed, 266 insertions(+), 81 deletions(-)
> 

Looks good, series applied to master.


Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Andrew Lunn
On Tue, May 30, 2017 at 05:16:27PM -0700, Florian Fainelli wrote:
> On 05/30/2017 05:06 PM, Andrew Lunn wrote:
> >> - past the initial setup, if we start creating bridge devices and so on,
> >> we have no way to tell: group Ports 0-3 together and send traffic to CPU
> >> port 0, then let Port 5 alone and send traffic to CPU port 1, that's a
> >> DSA-only problem though, because we still have the CPU port(s) as
> >> independent network interfaces.
> > 
> > What is the problem here? Frames come out the master interface, get
> > untagged and passed to the slave interface and go upto the bridge. It
> > should all just work. Same in the reverse direction.
> 
> The problem is really that is you have multiple CPU ports, how do you
> define which one gets all the traffic by default? Ascending order of
> port number? Descending order?

I would probably default to round robin when allocating user ports to
CPU ports. That probably gives you the best default.

> I actually tend to think that most use cases our there are in the order
> of dedicating one CPU port to one corresponding switch port (user
> facing, or internal) in order to provided guaranteed bandwidth for that
> port.

Which is generally a waste of bandwidth. Best case, i get 40Mbps
Internet access. Meaning 960Mbps of a dedicated cpu port would be
wasted.

>  But as an user, I want to choose how the grouping is going to
> work, and right now, I cannot, unless this is hardcoded in Device Tree,
> which sounds both wrong and inadequate.

So how about round-robin default, and then devlink to move a user port
to a specific cpu port?

We also need to watch out for asymmetry. I think newer marvell chips
don't support egress to multiple CPU ports. Ingress to the switch i
think is unlimited. The older chips are more flexible in this
respect. So we need some degree of flexibility here.
 
Andrew


Re: [net-bluetooth] question about potential null pointer dereference

2017-05-30 Thread Marcel Holtmann
Hi Gustavo,

> While looking into Coverity ID 1357456 I ran into the following piece of code 
> at net/bluetooth/smp.c:166
> 
> 166/* The following functions map to the LE SC SMP crypto functions
> 167 * AES-CMAC, f4, f5, f6, g2 and h6.
> 168 */
> 169
> 170static int aes_cmac(struct crypto_shash *tfm, const u8 k[16], const u8 *m,
> 171size_t len, u8 mac[16])
> 172{
> 173uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX];
> 174SHASH_DESC_ON_STACK(desc, tfm);
> 175int err;
> 176
> 177if (len > CMAC_MSG_MAX)
> 178return -EFBIG;
> 179
> 180if (!tfm) {
> 181BT_ERR("tfm %p", tfm);
> 182return -EINVAL;
> 183}
> 184
> 185desc->tfm = tfm;
> 186desc->flags = 0;
> 187
> 188/* Swap key and message from LSB to MSB */
> 189swap_buf(k, tmp, 16);
> 190swap_buf(m, msg_msb, len);
> 191
> 192SMP_DBG("msg (len %zu) %*phN", len, (int) len, m);
> 193SMP_DBG("key %16phN", k);
> 194
> 195err = crypto_shash_setkey(tfm, tmp, 16);
> 196if (err) {
> 197BT_ERR("cipher setkey failed: %d", err);
> 198return err;
> 199}
> 200
> 201err = crypto_shash_digest(desc, msg_msb, len, mac_msb);
> 202shash_desc_zero(desc);
> 203if (err) {
> 204BT_ERR("Hash computation error %d", err);
> 205return err;
> 206}
> 207
> 208swap_buf(mac_msb, mac, 16);
> 209
> 210SMP_DBG("mac %16phN", mac);
> 211
> 212return 0;
> 213}
> 
> The issue here is that line 180 implies that pointer tfm might be NULL. If 
> this is the case, there is a potential NULL pointer dereference at line 174 
> once pointer tfm is indirectly dereferenced inside macro 
> SHASH_DESC_ON_STACK().
> 
> My question is if there is any chance that pointer tfm maybe be NULL when 
> calling macro SHASH_DESC_ON_STACK()?

I think the part you are after is this:

smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);  
 
if (IS_ERR(smp->tfm_cmac)) {
 
BT_ERR("Unable to create CMAC crypto context"); 
 
crypto_free_cipher(smp->tfm_aes);   
 
kzfree(smp);
 
return NULL;
 
} 

So the tfm_cmac is part of the smp structure. However if there is no cipher, we 
destroy the smp structure and essentially run without SMP support. So it can 
not really be called anyway.

Maybe commenting this might be a good idea.

Regards

Marcel



Re: [PATCH iproute2 -master 0/2] Two misc BPF updates

2017-05-30 Thread Stephen Hemminger
On Sat, 13 May 2017 02:32:33 +0200
Daniel Borkmann  wrote:

> Requires header rebase with -net.
> 
> Thanks!
> 
> Daniel Borkmann (2):
>   bpf: update printing of generic xdp mode
>   bpf: dump error to the user when retrieving pinned prog fails
> 
>  ip/iplink_xdp.c | 19 +++
>  lib/bpf.c   | 12 +++-
>  2 files changed, 22 insertions(+), 9 deletions(-)
> 

Applied thanks.


Re: [patch iproute2] ipvtap: Adding support for ipvtap device management

2017-05-30 Thread Stephen Hemminger
On Tue, 23 May 2017 16:22:05 -0700
Sainath Grandhi  wrote:

> This patch adds support for managing ipvtap devices using ip link. ipvtap 
> support
> is added to linux with commit 235a9d89da976e2975b3de9afc0bed7b72557983

Please resend with a Signed-off-by line.

Also, you need to update the Usage message and the man page.


Re: [patch iproute2] tc: flower: add support for tcp flags

2017-05-30 Thread Stephen Hemminger
On Tue, 23 May 2017 23:51:39 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> Allow user to insert a flower classifier filter rule which includes
> match for tcp flags.
> 
> Signed-off-by: Jiri Pirko 

Applied to net-next


Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"

2017-05-30 Thread David Ahern
On 5/30/17 5:32 PM, Sowmini Varadhan wrote:
> On (05/30/17 16:20), Stephen Hemminger wrote:
>>
>> Please don't copy/paste chunks of code. Instead refactor and make this
>> into a helper function. 
> 
> sure, I have no problems with that, and as I pointed out, I've not
> tested ipv6 for this yet either. I'll do all of this after getting
> some feedback on the more basic issue here:
> 
> I was first looking for comments on the more fundamental refcnt
> management behind the fix (I'm surprised no one noticed this 
> before, is there some deep reason for leaving it like this, that
> I am missing? Does it break something else?)

It has been noticed. I have not sent a patch since adjusting gc
parameters will reclaim FAILED entries at whatever rate the user wants.



Re: [PATCH iproute2] ip: add handling for new CAN netlink interface

2017-05-30 Thread Stephen Hemminger
On Fri, 19 May 2017 14:54:49 +0200
Remigiusz Kołłątaj  wrote:

> This patch adds handling for new CAN netlink interface introduced in
> 4.11 kernel:
> - IFLA_CAN_TERMINATION,
> - IFLA_CAN_TERMINATION_CONST,
> - IFLA_CAN_BITRATE_CONST,
> - IFLA_CAN_DATA_BITRATE_CONST
> 
> Output example:
> $ip -d link show can0
> 6: can0:  mtu 16 qdisc noop state DOWN mode DEFAULT group default 
> qlen 10
> link/can  promiscuity 0
> can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>   bitrate 8
>  [   2,3,5,8,8,   10,
> 125000,   15,   175000,   20,   225000,   25,
> 275000,   30,   50,   625000,   80,  100 ]
>   termination 0 [ 0, 120 ]
>   clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 
> 65535
> 
> Signed-off-by: Remigiusz Kołłątaj 

Applied, thanks for confirming the output change.



Re: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-30 Thread David Ahern
On 5/30/17 6:29 PM, YUAN Linyu wrote:
>>> that function should be converted to skb_put_zero once it hits net-next.
>> I will check it
> I can't find skb_put_zero

I believe the decision was to put it in Johannes' tree and it will make
its way to DaveM's tree. Give some time.


Re: [iproute PATCH] tc: m_xt: Prevent a segfault in libipt

2017-05-30 Thread Stephen Hemminger
On Tue, 23 May 2017 15:40:57 +0200
Phil Sutter  wrote:

> From: Phil Sutter 
> 
> This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These
> are still not usable with this patch, but at least tc doesn't crash
> anymore when one tries to use them.
> 
> Signed-off-by: Phil Sutter 

Applied, thanks


Re: [PATCH iproute2] devlink: Add option to set and show eswitch encapsulation support

2017-05-30 Thread Stephen Hemminger
On Sun, 21 May 2017 08:37:27 +0300
Roi Dayan  wrote:

> This is an e-switch global knob to enable HW support for applying
> encapsulation/decapsulation to VF traffic as part of SRIOV e-switch 
> offloading.
> 
> The actual encap/decap is carried out (along with the matching and other
> actions) per offloaded e-switch rules, e.g as done when offloading the TC 
> tunnel
> key action.
> 
> Possible values are enable/disable.
> 
> Signed-off-by: Roi Dayan 
> Reviewed-by: Jiri Pirko 

Applied, thanks



Re: [PATCH] netlink: Change rtnl_dump_done to always show error

2017-05-30 Thread Stephen Hemminger
On Tue, 16 May 2017 14:22:46 -0700
David Ahern  wrote:

> The original code which became rtnl_dump_done only shows netlink errors
> if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always appends
> the length which contains any error encountered during the dump. Update
> rtnl_dump_done to always show the error if there is one.
> 
> As an *example* without this patch, dumping a route object that exceeds
> the internal buffer size terminates with no message to the user -- the
> dump just ends because the NLMSG_DONE attribute was received. With this
> patch the user at least gets a message that the dump was aborted.
> 
> $ ip ro ls
> default via 10.0.2.2 dev eth0
> 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
> 10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1
> 172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1
> Error: Buffer too small for object
> Dump terminated
> 
> The point of this patch is to notify the user of a failure versus
> silently exiting on a partial dump. Because the NLMSG_DONE attribute
> was received, the entire dump needs to be restarted to use a larger
> buffer for EMSGSIZE errors. That could be done automatically but it
> has other user impacts (e.g., duplicate output if the dump is
> restarted) and should be the subject of a different patch.
> 
> Signed-off-by: David Ahern 

Applied.

It is easier to seen iproute2 patches with:

Subject: [PATCH iproute2] netlink: Change rtnl_dump_done to always show error


Re: [PATCH net-next 0/8] Introduce bpf ID

2017-05-30 Thread David Ahern
On 5/30/17 6:08 PM, Martin KaFai Lau wrote:
> This patch series:
> 1) Introduce ID for both bpf_prog and bpf_map.
> 2) Add bpf commands to iterate the prog IDs and map
>IDs of the system.
> 3) Add bpf commands to get a prog/map fd from an ID
> 4) Add bpf command to get prog/map info from a fd.
>The prog/map info is a jump start in this patchset
>and it is not meant to be a complete list.  They can
>be extended in the future patches.

Interesting. This really simplifies the bpf retrieval I have been
working on:
https://github.com/dsahern/linux/commits/bpf/retrieve-bpf-wip

Having the net device, filter, route, socket, cgroup, etc return the ids
for attached programs solves the problem of getting a reference to all
of the places BPF programs can be placed.

This patch set turns an id into a fd, and my patches allow the fd to be
used to get the bpf code.


Re: [PATCH 1/2] ip: include libc headers first

2017-05-30 Thread Stephen Hemminger
On Mon, 22 May 2017 16:27:53 +0300
Baruch Siach  wrote:

> Including libc headers first helps as a workaround to redefinition of struct
> ethhdr with a suitably patched musl libc that suppresses the kernel
> if_ether.h.
> 
> Signed-off-by: Baruch Siach 
> ---
>  ip/iplink_bridge.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
> index 818b43c89b5b..cccdec1c203a 100644
> --- a/ip/iplink_bridge.c
> +++ b/ip/iplink_bridge.c
> @@ -13,9 +13,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> -#include 
>  #include 
>  
>  #include "rt_names.h"

Applied.

Next time please fix the Subject line.  You implied two patches by using 1/2
but only one was sent.

It is easier for me if you include iproute2 in subject since then it goes into a
separate folder.

Subject: [PATCH iproute2] ip: include libc headers first

or if intended for net-next

Subject: [PATCH iproute2 net-next] ip: add magic VRF support



RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-30 Thread YUAN Linyu
> > -Original Message-
> > From: David Ahern [mailto:dsah...@gmail.com]
> > Sent: Tuesday, May 30, 2017 11:42 AM
> > To: Joe Perches; David Miller; cug...@163.com
> > Cc: netdev@vger.kernel.org; YUAN Linyu
> > Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of
> > __ndisc_fill_addr_option()
> >
> > On 5/29/17 9:41 PM, Joe Perches wrote:
> > > On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote:
> > >> From: yuan linyu 
> > >> Date: Sat, 27 May 2017 06:00:52 +0800
> > >>
> > >>> From: yuan linyu 
> > >>>
> > >>> Signed-off-by: yuan linyu 
> > >> Applied, thanks.
> > > OK, but is it really safe though?
> > >
> > > Could "space" (an int) ever be negative after
> > > subtracting "pad" and "data_len"?
> > >
> >
> > that function should be converted to skb_put_zero once it hits net-next.
> I will check it
I can't find skb_put_zero


RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-30 Thread YUAN Linyu


> -Original Message-
> From: David Ahern [mailto:dsah...@gmail.com]
> Sent: Tuesday, May 30, 2017 11:42 AM
> To: Joe Perches; David Miller; cug...@163.com
> Cc: netdev@vger.kernel.org; YUAN Linyu
> Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of
> __ndisc_fill_addr_option()
> 
> On 5/29/17 9:41 PM, Joe Perches wrote:
> > On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote:
> >> From: yuan linyu 
> >> Date: Sat, 27 May 2017 06:00:52 +0800
> >>
> >>> From: yuan linyu 
> >>>
> >>> Signed-off-by: yuan linyu 
> >> Applied, thanks.
> > OK, but is it really safe though?
> >
> > Could "space" (an int) ever be negative after
> > subtracting "pad" and "data_len"?
> >
> 
> that function should be converted to skb_put_zero once it hits net-next.
I will check it


RE: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-30 Thread YUAN Linyu
Hi joe,

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Joe Perches
> Sent: Tuesday, May 30, 2017 11:41 AM
> To: David Miller; cug...@163.com
> Cc: netdev@vger.kernel.org; dsah...@gmail.com; YUAN Linyu
> Subject: Re: [PATCH net-next] net: ndisc.c: reduce size of
> __ndisc_fill_addr_option()
> 
> On Mon, 2017-05-29 at 23:30 -0400, David Miller wrote:
> > From: yuan linyu 
> > Date: Sat, 27 May 2017 06:00:52 +0800
> >
> > > From: yuan linyu 
> > >
> > > Signed-off-by: yuan linyu 
> >
> > Applied, thanks.
> 
> OK, but is it really safe though?
> 
> Could "space" (an int) ever be negative after
> subtracting "pad" and "data_len"?
It's safe, check
int space = __ndisc_opt_addr_space(data_len, pad);
space is maximum aligned value.
And I check current pad, the maximum value is 2.



Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Florian Fainelli
On 05/30/2017 05:06 PM, Andrew Lunn wrote:
>> - past the initial setup, if we start creating bridge devices and so on,
>> we have no way to tell: group Ports 0-3 together and send traffic to CPU
>> port 0, then let Port 5 alone and send traffic to CPU port 1, that's a
>> DSA-only problem though, because we still have the CPU port(s) as
>> independent network interfaces.
> 
> What is the problem here? Frames come out the master interface, get
> untagged and passed to the slave interface and go upto the bridge. It
> should all just work. Same in the reverse direction.

The problem is really that is you have multiple CPU ports, how do you
define which one gets all the traffic by default? Ascending order of
port number? Descending order?

> 
> In order to make best use of the extra bandwidth of having two cpu
> ports, i probably want the user ports reasonably evenly distributed
> between the CPU ports. Dedicating one CPU port to one user port is
> probably sub-optimal. How many people have 1Gbps Fibre to the home,
> which could fully utilise a one-to-one mapping for the WAN port?

I actually tend to think that most use cases our there are in the order
of dedicating one CPU port to one corresponding switch port (user
facing, or internal) in order to provided guaranteed bandwidth for that
port. But as an user, I want to choose how the grouping is going to
work, and right now, I cannot, unless this is hardcoded in Device Tree,
which sounds both wrong and inadequate.

> 
>> Now, that would still force the user to configure two bridges in order
>> to properly steer traffic towards the requested ports but it would allow
>> us to be very flexible (which is probably desired here) in how ports are
>> grouped together.
> 
> We want a sensible default, spreading the slave ports evenly over the
> CPU ports. We could add a devlink command to change the defaults at
> runtime.

Sensible default is fine for the first time boot, but we should let
users be entirely flexible in how they want their user-facing ports to
map to a CPU port as you say, and IMHO using separate bridges to
configure that is a possible way to go since there is already knowledge
in the bridge join/leave code in DSA that already knows the
dwnstream/user-facing ports, but does not yet know about CPU ports.

Code speaks better, so let me see if I can cook something to illustrate
this.

Thanks!
-- 
Florian


[PATCH net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID

2017-05-30 Thread Martin KaFai Lau
Add BPF_MAP_GET_FD_BY_ID command to allow user to get a fd
from a bpf_map's ID.

bpf_map_inc_not_zero() is added and is called with map_idr_lock
held.

__bpf_map_put() is also added which has the 'bool do_idr_lock'
param to decide if the map_idr_lock should be acquired when
freeing the map->id.

In the error path of bpf_map_inc_not_zero(), it may have to
call __bpf_map_put(map, false) which does not need
to take the map_idr_lock when freeing the map->id.

It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/uapi/linux/bpf.h |  2 +
 kernel/bpf/syscall.c | 95 +++-
 2 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6d4e1cc5bd18..cf704e8b6e65 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -85,6 +85,7 @@ enum bpf_cmd {
BPF_PROG_GET_NEXT_ID,
BPF_MAP_GET_NEXT_ID,
BPF_PROG_GET_FD_BY_ID,
+   BPF_MAP_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -217,6 +218,7 @@ union bpf_attr {
union {
__u32   start_id;
__u32   prog_id;
+   __u32   map_id;
};
__u32   next_id;
};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 06d146f54ffb..f524431dd6ce 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -135,11 +135,19 @@ static int bpf_map_alloc_id(struct bpf_map *map)
return id > 0 ? 0 : id;
 }
 
-static void bpf_map_free_id(struct bpf_map *map)
+static void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 {
-   spin_lock_bh(&map_idr_lock);
+   if (do_idr_lock)
+   spin_lock_bh(&map_idr_lock);
+   else
+   __acquire(&map_idr_lock);
+
idr_remove(&map_idr, map->id);
-   spin_unlock_bh(&map_idr_lock);
+
+   if (do_idr_lock)
+   spin_unlock_bh(&map_idr_lock);
+   else
+   __release(&map_idr_lock);
 }
 
 /* called from workqueue */
@@ -163,16 +171,21 @@ static void bpf_map_put_uref(struct bpf_map *map)
 /* decrement map refcnt and schedule it for freeing via workqueue
  * (unrelying map implementation ops->map_free() might sleep)
  */
-void bpf_map_put(struct bpf_map *map)
+static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 {
if (atomic_dec_and_test(&map->refcnt)) {
/* bpf_map_free_id() must be called first */
-   bpf_map_free_id(map);
+   bpf_map_free_id(map, do_idr_lock);
INIT_WORK(&map->work, bpf_map_free_deferred);
schedule_work(&map->work);
}
 }
 
+void bpf_map_put(struct bpf_map *map)
+{
+   __bpf_map_put(map, true);
+}
+
 void bpf_map_put_with_uref(struct bpf_map *map)
 {
bpf_map_put_uref(map);
@@ -271,15 +284,20 @@ static int map_create(union bpf_attr *attr)
goto free_map;
 
err = bpf_map_new_fd(map);
-   if (err < 0)
-   /* failed to allocate fd */
-   goto free_id;
+   if (err < 0) {
+   /* failed to allocate fd.
+* bpf_map_put() is needed because the above
+* bpf_map_alloc_id() has published the map
+* to the userspace and the userspace may
+* have refcnt-ed it through BPF_MAP_GET_FD_BY_ID.
+*/
+   bpf_map_put(map);
+   return err;
+   }
 
trace_bpf_map_create(map, err);
return err;
 
-free_id:
-   bpf_map_free_id(map);
 free_map:
bpf_map_uncharge_memlock(map);
 free_map_nouncharge:
@@ -331,6 +349,28 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
return map;
 }
 
+static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
+   bool uref)
+{
+   int refold;
+
+   WARN_ON_ONCE(!lockdep_is_held(&map_idr_lock));
+   refold = __atomic_add_unless(&map->refcnt, 1, 0);
+
+   if (refold >= BPF_MAX_REFCNT) {
+   __bpf_map_put(map, false);
+   return ERR_PTR(-EBUSY);
+   }
+
+   if (!refold)
+   return ERR_PTR(-ENOENT);
+
+   if (uref)
+   atomic_inc(&map->usercnt);
+
+   return map;
+}
+
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
return -ENOTSUPP;
@@ -1165,6 +1205,38 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr 
*attr)
return fd;
 }
 
+#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id
+
+static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
+{
+   struct bpf_map *map;
+   u32 id = attr->map_id;
+   int fd;
+
+   if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID))
+   return -EINVAL;
+
+   if (!capable(CAP_SYS_ADMIN))
+  

[PATCH net-next 6/8] bpf: Add jited_len to struct bpf_prog

2017-05-30 Thread Martin KaFai Lau
Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in the later patch.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 arch/arm64/net/bpf_jit_comp.c | 1 +
 arch/powerpc/net/bpf_jit_comp64.c | 1 +
 arch/s390/net/bpf_jit_comp.c  | 1 +
 arch/sparc/net/bpf_jit_comp_64.c  | 1 +
 arch/x86/net/bpf_jit_comp.c   | 1 +
 include/linux/filter.h| 1 +
 6 files changed, 6 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 71f930501ade..6b21dccf01e8 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -900,6 +900,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
bpf_jit_binary_lock_ro(header);
prog->bpf_func = (void *)ctx.image;
prog->jited = 1;
+   prog->jited_len = image_size;
 
 out_off:
kfree(ctx.offset);
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index aee2bb817ac6..a3f904cd8b1e 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1052,6 +1052,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
fp->bpf_func = (void *)image;
fp->jited = 1;
+   fp->jited_len = alloclen;
 
bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
 
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 6e97a2e3fd8d..76b7e9d5d591 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1329,6 +1329,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
bpf_jit_binary_lock_ro(header);
fp->bpf_func = (void *) jit.prg_buf;
fp->jited = 1;
+   fp->jited_len = jit.size;
 free_addrs:
kfree(jit.addrs);
 out:
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 21de77419f48..beb46b14fbbc 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1555,6 +1555,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
 
prog->bpf_func = (void *)ctx.image;
prog->jited = 1;
+   prog->jited_len = image_size;
 
 out_off:
kfree(ctx.offset);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index f58939393eef..9341030133ae 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1162,6 +1162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
bpf_jit_binary_lock_ro(header);
prog->bpf_func = (void *)image;
prog->jited = 1;
+   prog->jited_len = proglen;
} else {
prog = orig_prog;
}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 62d948f80730..a322a41f394b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -429,6 +429,7 @@ struct bpf_prog {
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
+   u32 jited_len;  /* Size of jited insns in bytes 
*/
u8  tag[BPF_TAG_SIZE];
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
-- 
2.9.3



[PATCH net-next 2/8] bpf: Introduce bpf_map ID

2017-05-30 Thread Martin KaFai Lau
This patch generates an unique ID for each created bpf_map.
The approach is similar to the earlier patch for bpf_prog ID.

It is worth to note that the bpf_map's ID and bpf_prog's ID
are in two independent ID spaces and both have the same valid range:
[1, INT_MAX).

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/bpf.h  |  1 +
 kernel/bpf/syscall.c | 34 +-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c2793a732edc..ea78d87cbc3e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -46,6 +46,7 @@ struct bpf_map {
u32 max_entries;
u32 map_flags;
u32 pages;
+   u32 id;
struct user_struct *user;
const struct bpf_map_ops *ops;
struct work_struct work;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 697bdb6ceceb..a6fe034dbc09 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -27,6 +27,8 @@
 DEFINE_PER_CPU(int, bpf_prog_active);
 DEFINE_IDR(prog_idr);
 DEFINE_SPINLOCK(prog_idr_lock);
+DEFINE_IDR(map_idr);
+DEFINE_SPINLOCK(map_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -117,6 +119,29 @@ static void bpf_map_uncharge_memlock(struct bpf_map *map)
free_uid(user);
 }
 
+static int bpf_map_alloc_id(struct bpf_map *map)
+{
+   int id;
+
+   spin_lock_bh(&map_idr_lock);
+   id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC);
+   if (id > 0)
+   map->id = id;
+   spin_unlock_bh(&map_idr_lock);
+
+   if (WARN_ON_ONCE(!id))
+   return -ENOSPC;
+
+   return id > 0 ? 0 : id;
+}
+
+static void bpf_map_free_id(struct bpf_map *map)
+{
+   spin_lock_bh(&map_idr_lock);
+   idr_remove(&map_idr, map->id);
+   spin_unlock_bh(&map_idr_lock);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -141,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
 void bpf_map_put(struct bpf_map *map)
 {
if (atomic_dec_and_test(&map->refcnt)) {
+   bpf_map_free_id(map);
INIT_WORK(&map->work, bpf_map_free_deferred);
schedule_work(&map->work);
}
@@ -239,14 +265,20 @@ static int map_create(union bpf_attr *attr)
if (err)
goto free_map_nouncharge;
 
+   err = bpf_map_alloc_id(map);
+   if (err)
+   goto free_map;
+
err = bpf_map_new_fd(map);
if (err < 0)
/* failed to allocate fd */
-   goto free_map;
+   goto free_id;
 
trace_bpf_map_create(map, err);
return err;
 
+free_id:
+   bpf_map_free_id(map);
 free_map:
bpf_map_uncharge_memlock(map);
 free_map_nouncharge:
-- 
2.9.3



[PATCH net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command

2017-05-30 Thread Martin KaFai Lau
This patch adds BPF_PROG_GET_NEXT_ID and BPF_MAP_GET_NEXT_ID
to allow userspace to iterate all bpf_prog IDs and bpf_map IDs.

The API is trying to be consistent with the existing
BPF_MAP_GET_NEXT_KEY.

It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/uapi/linux/bpf.h |  7 +++
 kernel/bpf/syscall.c | 38 ++
 2 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94dfa9def355..e5c88f39bdc3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,6 +82,8 @@ enum bpf_cmd {
BPF_PROG_ATTACH,
BPF_PROG_DETACH,
BPF_PROG_TEST_RUN,
+   BPF_PROG_GET_NEXT_ID,
+   BPF_MAP_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -209,6 +211,11 @@ union bpf_attr {
__u32   repeat;
__u32   duration;
} test;
+
+   struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */
+   __u32   start_id;
+   __u32   next_id;
+   };
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a6fe034dbc09..34dc92b3c349 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -166,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
 void bpf_map_put(struct bpf_map *map)
 {
if (atomic_dec_and_test(&map->refcnt)) {
+   /* bpf_map_free_id() must be called first */
bpf_map_free_id(map);
INIT_WORK(&map->work, bpf_map_free_deferred);
schedule_work(&map->work);
@@ -726,6 +727,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
+   /* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
@@ -1067,6 +1069,34 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
return ret;
 }
 
+#define BPF_OBJ_GET_NEXT_ID_LAST_FIELD next_id
+
+static int bpf_obj_get_next_id(const union bpf_attr *attr,
+  union bpf_attr __user *uattr,
+  struct idr *idr,
+  spinlock_t *lock)
+{
+   u32 next_id = attr->start_id;
+   int err = 0;
+
+   if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
+   return -EINVAL;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   next_id++;
+   spin_lock_bh(lock);
+   if (!idr_get_next(idr, &next_id))
+   err = -ENOENT;
+   spin_unlock_bh(lock);
+
+   if (!err)
+   err = put_user(next_id, &uattr->next_id);
+
+   return err;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, 
size)
 {
union bpf_attr attr = {};
@@ -1144,6 +1174,14 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, 
uattr, unsigned int, siz
case BPF_PROG_TEST_RUN:
err = bpf_prog_test_run(&attr, uattr);
break;
+   case BPF_PROG_GET_NEXT_ID:
+   err = bpf_obj_get_next_id(&attr, uattr,
+ &prog_idr, &prog_idr_lock);
+   break;
+   case BPF_MAP_GET_NEXT_ID:
+   err = bpf_obj_get_next_id(&attr, uattr,
+ &map_idr, &map_idr_lock);
+   break;
default:
err = -EINVAL;
break;
-- 
2.9.3



[PATCH net-next 8/8] bpf: Test for bpf ID

2017-05-30 Thread Martin KaFai Lau
Add test to exercise the bpf_prog/map id generation,
bpf_(prog|map)_get_next_id(), bpf_(prog|map)_get_fd_by_id() and
bpf_get_obj_info_by_fd().

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 tools/include/uapi/linux/bpf.h|  41 +++
 tools/lib/bpf/bpf.c   |  68 +++
 tools/lib/bpf/bpf.h   |   5 +
 tools/testing/selftests/bpf/Makefile  |   2 +-
 tools/testing/selftests/bpf/test_obj_id.c |  35 ++
 tools/testing/selftests/bpf/test_progs.c  | 191 ++
 6 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_obj_id.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94dfa9def355..677b570da8ec 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -82,6 +82,11 @@ enum bpf_cmd {
BPF_PROG_ATTACH,
BPF_PROG_DETACH,
BPF_PROG_TEST_RUN,
+   BPF_PROG_GET_NEXT_ID,
+   BPF_MAP_GET_NEXT_ID,
+   BPF_PROG_GET_FD_BY_ID,
+   BPF_MAP_GET_FD_BY_ID,
+   BPF_OBJ_GET_INFO_BY_FD,
 };
 
 enum bpf_map_type {
@@ -209,6 +214,21 @@ union bpf_attr {
__u32   repeat;
__u32   duration;
} test;
+
+   struct { /* anonymous struct used by BPF_*_GET_*_ID */
+   union {
+   __u32   start_id;
+   __u32   prog_id;
+   __u32   map_id;
+   };
+   __u32   next_id;
+   };
+
+   struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+   __u32   bpf_fd;
+   __u32   info_len;
+   __aligned_u64   info;
+   } info;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -670,4 +690,25 @@ struct xdp_md {
__u32 data_end;
 };
 
+#define BPF_TAG_SIZE   8
+
+struct bpf_prog_info {
+   __u32 type;
+   __u32 id;
+   __u8  tag[BPF_TAG_SIZE];
+   __u32 jited_prog_len;
+   __u32 xlated_prog_len;
+   __aligned_u64 jited_prog_insns;
+   __aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+   __u32 type;
+   __u32 id;
+   __u32 key_size;
+   __u32 value_size;
+   __u32 max_entries;
+   __u32 map_flags;
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 6e178987af8e..7e0405e1651d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -257,3 +257,71 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, 
__u32 size,
*duration = attr.test.duration;
return ret;
 }
+
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
+{
+   union bpf_attr attr;
+   int err;
+
+   bzero(&attr, sizeof(attr));
+   attr.start_id = start_id;
+
+   err = sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr));
+   if (!err)
+   *next_id = attr.next_id;
+
+   return err;
+}
+
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+{
+   union bpf_attr attr;
+   int err;
+
+   bzero(&attr, sizeof(attr));
+   attr.start_id = start_id;
+
+   err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, sizeof(attr));
+   if (!err)
+   *next_id = attr.next_id;
+
+   return err;
+}
+
+int bpf_prog_get_fd_by_id(__u32 id)
+{
+   union bpf_attr attr;
+
+   bzero(&attr, sizeof(attr));
+   attr.prog_id = id;
+
+   return sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_map_get_fd_by_id(__u32 id)
+{
+   union bpf_attr attr;
+
+   bzero(&attr, sizeof(attr));
+   attr.map_id = id;
+
+   return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+{
+   union bpf_attr attr;
+   int err;
+
+   bzero(&attr, sizeof(attr));
+   bzero(info, *info_len);
+   attr.info.bpf_fd = prog_fd;
+   attr.info.info_len = *info_len;
+   attr.info.info = ptr_to_u64(info);
+
+   err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
+   if (!err)
+   *info_len = attr.info.info_len;
+
+   return err;
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 972bd8333eb7..16de44a14b48 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -54,5 +54,10 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type 
type);
 int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
  void *data_out, __u32 *size_out, __u32 *retval,
  __u32 *duration);
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
+int bpf_prog_get_fd_by_id(__u32 id);
+int bpf_map_get_fd_by_id(__u32 id);
+int bpf_obj_get_i

[PATCH net-next 1/8] bpf: Introduce bpf_prog ID

2017-05-30 Thread Martin KaFai Lau
This patch generates an unique ID for each BPF_PROG_LOAD-ed prog.
It is worth to note that each BPF_PROG_LOAD-ed prog will have
a different ID even they have the same bpf instructions.

The ID is generated by the existing idr_alloc_cyclic().
The ID is ranged from [1, INT_MAX).  It is allocated in cyclic manner,
so an ID will get reused every 2 billion BPF_PROG_LOAD.

The bpf_prog_alloc_id() is done after bpf_prog_select_runtime()
because the jit process may have allocated a new prog.  Hence,
we need to ensure the value of pointer 'prog' will not be changed
any more before storing the prog to the prog_idr.

After bpf_prog_select_runtime(), the prog is read-only.  Hence,
the id is stored in 'struct bpf_prog_aux'.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/bpf.h  |  1 +
 kernel/bpf/syscall.c | 40 +++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d76faf4..c2793a732edc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -171,6 +171,7 @@ struct bpf_prog_aux {
atomic_t refcnt;
u32 used_map_cnt;
u32 max_ctx_offset;
+   u32 id;
struct latch_tree_node ksym_tnode;
struct list_head ksym_lnode;
const struct bpf_verifier_ops *ops;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 265a0d854e33..697bdb6ceceb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -22,8 +22,11 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(int, bpf_prog_active);
+DEFINE_IDR(prog_idr);
+DEFINE_SPINLOCK(prog_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -650,6 +653,34 @@ static void bpf_prog_uncharge_memlock(struct bpf_prog 
*prog)
free_uid(user);
 }
 
+static int bpf_prog_alloc_id(struct bpf_prog *prog)
+{
+   int id;
+
+   spin_lock_bh(&prog_idr_lock);
+   id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
+   if (id > 0)
+   prog->aux->id = id;
+   spin_unlock_bh(&prog_idr_lock);
+
+   /* id is in [1, INT_MAX) */
+   if (WARN_ON_ONCE(!id))
+   return -ENOSPC;
+
+   return id > 0 ? 0 : id;
+}
+
+static void bpf_prog_free_id(struct bpf_prog *prog)
+{
+   /* cBPF to eBPF migrations are currently not in the idr store. */
+   if (!prog->aux->id)
+   return;
+
+   spin_lock_bh(&prog_idr_lock);
+   idr_remove(&prog_idr, prog->aux->id);
+   spin_unlock_bh(&prog_idr_lock);
+}
+
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 {
struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
@@ -663,6 +694,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
+   bpf_prog_free_id(prog);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
@@ -855,15 +887,21 @@ static int bpf_prog_load(union bpf_attr *attr)
if (err < 0)
goto free_used_maps;
 
+   err = bpf_prog_alloc_id(prog);
+   if (err)
+   goto free_used_maps;
+
err = bpf_prog_new_fd(prog);
if (err < 0)
/* failed to allocate fd */
-   goto free_used_maps;
+   goto free_id;
 
bpf_prog_kallsyms_add(prog);
trace_bpf_prog_load(prog, err);
return err;
 
+free_id:
+   bpf_prog_free_id(prog);
 free_used_maps:
free_used_maps(prog->aux);
 free_prog:
-- 
2.9.3



[PATCH net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD

2017-05-30 Thread Martin KaFai Lau
A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map.  The kernel can figure out the
fd is associated with a bpf_prog or bpf_map.

The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be a complete list and it is not the goal of this patch.
New fields can be added in the future patch.

The focus of this patch is to create the interface,
BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and
bpf_map's info.

The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/filter.h   |   2 -
 include/uapi/linux/bpf.h |  28 
 kernel/bpf/syscall.c | 163 ++-
 3 files changed, 174 insertions(+), 19 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a322a41f394b..46274f23f127 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -66,8 +66,6 @@ struct bpf_prog_aux;
 /* BPF program can access up to 512 bytes of stack space. */
 #define MAX_BPF_STACK  512
 
-#define BPF_TAG_SIZE   8
-
 /* Helper macros for filter block array initializers. */
 
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cf704e8b6e65..677b570da8ec 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -86,6 +86,7 @@ enum bpf_cmd {
BPF_MAP_GET_NEXT_ID,
BPF_PROG_GET_FD_BY_ID,
BPF_MAP_GET_FD_BY_ID,
+   BPF_OBJ_GET_INFO_BY_FD,
 };
 
 enum bpf_map_type {
@@ -222,6 +223,12 @@ union bpf_attr {
};
__u32   next_id;
};
+
+   struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+   __u32   bpf_fd;
+   __u32   info_len;
+   __aligned_u64   info;
+   } info;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -683,4 +690,25 @@ struct xdp_md {
__u32 data_end;
 };
 
+#define BPF_TAG_SIZE   8
+
+struct bpf_prog_info {
+   __u32 type;
+   __u32 id;
+   __u8  tag[BPF_TAG_SIZE];
+   __u32 jited_prog_len;
+   __u32 xlated_prog_len;
+   __aligned_u64 jited_prog_insns;
+   __aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+   __u32 type;
+   __u32 id;
+   __u32 key_size;
+   __u32 value_size;
+   __u32 max_entries;
+   __u32 map_flags;
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f524431dd6ce..b21af69bf57b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1237,6 +1237,145 @@ static int bpf_map_get_fd_by_id(const union bpf_attr 
*attr)
return fd;
 }
 
+static int check_uarg_tail_zero(void __user *uaddr,
+   size_t expected_size,
+   size_t actual_size)
+{
+   unsigned char __user *addr;
+   unsigned char __user *end;
+   unsigned char val;
+   int err;
+
+   if (actual_size <= expected_size)
+   return 0;
+
+   addr = uaddr + expected_size;
+   end  = uaddr + actual_size;
+
+   for (; addr < end; addr++) {
+   err = get_user(val, addr);
+   if (err)
+   return err;
+   if (val)
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
+static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
+  const union bpf_attr *attr,
+  union bpf_attr *uattr)
+{
+   struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+   struct bpf_prog_info info = {};
+   u32 info_len = attr->info.info_len;
+   char __user *uinsns;
+   u32 ulen;
+   int err;
+
+   err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+   if (err)
+   return err;
+   info_len = min_t(u32, sizeof(info), info_len);
+
+   if (copy_from_user(&info, uinfo, info_len))
+   return err;
+
+   info.type = prog->type;
+   info.id = prog->aux->id;
+
+   memcpy(info.tag, prog->tag, sizeof(prog->tag));
+
+   if (!capable(CAP_SYS_ADMIN)) {
+   info.jited_prog_len = 0;
+   info.xlated_prog_len = 0;
+   goto done;
+   }
+
+   ulen = info.jited_prog_len;
+   info.jited_prog_len = prog->jited_len;
+   if (info.jited_prog_len && ulen) {
+   uinsns = u64_to_user_ptr(info.jited_prog_insns);
+   ulen = min_t(u32, info.jited_prog_len, ulen);
+   if (copy_to_user(uinsns, prog->bpf_func, ulen))
+   return -EFAULT;
+   }
+
+   ulen = info.xlated_prog_len;
+   info.xlated_prog_le

[PATCH net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID

2017-05-30 Thread Martin KaFai Lau
Add BPF_PROG_GET_FD_BY_ID command to allow user to get a fd
from a bpf_prog's ID.

bpf_prog_inc_not_zero() is added and is called with prog_idr_lock
held.

__bpf_prog_put() is also added which has the 'bool do_idr_lock'
param to decide if the prog_idr_lock should be acquired when
freeing the prog->id.

In the error path of bpf_prog_inc_not_zero(), it may have to
call __bpf_prog_put(map, false) which does not need
to take the prog_idr_lock when freeing the prog->id.

It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/uapi/linux/bpf.h |  8 +++--
 kernel/bpf/syscall.c | 91 ++--
 2 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e5c88f39bdc3..6d4e1cc5bd18 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_cmd {
BPF_PROG_TEST_RUN,
BPF_PROG_GET_NEXT_ID,
BPF_MAP_GET_NEXT_ID,
+   BPF_PROG_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -212,8 +213,11 @@ union bpf_attr {
__u32   duration;
} test;
 
-   struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */
-   __u32   start_id;
+   struct { /* anonymous struct used by BPF_*_GET_*_ID */
+   union {
+   __u32   start_id;
+   __u32   prog_id;
+   };
__u32   next_id;
};
 } __attribute__((aligned(8)));
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 34dc92b3c349..06d146f54ffb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -703,15 +703,23 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
return id > 0 ? 0 : id;
 }
 
-static void bpf_prog_free_id(struct bpf_prog *prog)
+static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 {
/* cBPF to eBPF migrations are currently not in the idr store. */
if (!prog->aux->id)
return;
 
-   spin_lock_bh(&prog_idr_lock);
+   if (do_idr_lock)
+   spin_lock_bh(&prog_idr_lock);
+   else
+   __acquire(&prog_idr_lock);
+
idr_remove(&prog_idr, prog->aux->id);
-   spin_unlock_bh(&prog_idr_lock);
+
+   if (do_idr_lock)
+   spin_unlock_bh(&prog_idr_lock);
+   else
+   __release(&prog_idr_lock);
 }
 
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
@@ -723,16 +731,21 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
bpf_prog_free(aux->prog);
 }
 
-void bpf_prog_put(struct bpf_prog *prog)
+static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
/* bpf_prog_free_id() must be called first */
-   bpf_prog_free_id(prog);
+   bpf_prog_free_id(prog, do_idr_lock);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
 }
+
+void bpf_prog_put(struct bpf_prog *prog)
+{
+   __bpf_prog_put(prog, true);
+}
 EXPORT_SYMBOL_GPL(bpf_prog_put);
 
 static int bpf_prog_release(struct inode *inode, struct file *filp)
@@ -814,6 +827,24 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_inc);
 
+static struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
+{
+   int refold;
+
+   WARN_ON_ONCE(!lockdep_is_held(&prog_idr_lock));
+   refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0);
+
+   if (refold >= BPF_MAX_REFCNT) {
+   __bpf_prog_put(prog, false);
+   return ERR_PTR(-EBUSY);
+   }
+
+   if (!refold)
+   return ERR_PTR(-ENOENT);
+
+   return prog;
+}
+
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
 {
struct fd f = fdget(ufd);
@@ -926,16 +957,21 @@ static int bpf_prog_load(union bpf_attr *attr)
goto free_used_maps;
 
err = bpf_prog_new_fd(prog);
-   if (err < 0)
-   /* failed to allocate fd */
-   goto free_id;
+   if (err < 0) {
+   /* failed to allocate fd.
+* bpf_prog_put() is needed because the above
+* bpf_prog_alloc_id() has published the prog
+* to the userspace and the userspace may
+* have refcnt-ed it through BPF_PROG_GET_FD_BY_ID.
+*/
+   bpf_prog_put(prog);
+   return err;
+   }
 
bpf_prog_kallsyms_add(prog);
trace_bpf_prog_load(prog, err);
return err;
 
-free_id:
-   bpf_prog_free_id(prog);
 free_used_maps:
free_used_maps(prog->aux);
 free_prog:
@@ -1097,6 +1133,38 @@ static int bpf_obj_get_ne

[PATCH net-next 0/8] Introduce bpf ID

2017-05-30 Thread Martin KaFai Lau
This patch series:
1) Introduce ID for both bpf_prog and bpf_map.
2) Add bpf commands to iterate the prog IDs and map
   IDs of the system.
3) Add bpf commands to get a prog/map fd from an ID
4) Add bpf command to get prog/map info from a fd.
   The prog/map info is a jump start in this patchset
   and it is not meant to be a complete list.  They can
   be extended in the future patches.

Martin KaFai Lau (8):
  bpf: Introduce bpf_prog ID
  bpf: Introduce bpf_map ID
  bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
  bpf: Add BPF_PROG_GET_FD_BY_ID
  bpf: Add BPF_MAP_GET_FD_BY_ID
  bpf: Add jited_len to struct bpf_prog
  bpf: Add BPF_OBJ_GET_INFO_BY_FD
  bpf: Test for bpf ID

 arch/arm64/net/bpf_jit_comp.c |   1 +
 arch/powerpc/net/bpf_jit_comp64.c |   1 +
 arch/s390/net/bpf_jit_comp.c  |   1 +
 arch/sparc/net/bpf_jit_comp_64.c  |   1 +
 arch/x86/net/bpf_jit_comp.c   |   1 +
 include/linux/bpf.h   |   2 +
 include/linux/filter.h|   3 +-
 include/uapi/linux/bpf.h  |  41 +++
 kernel/bpf/syscall.c  | 433 --
 tools/include/uapi/linux/bpf.h|  41 +++
 tools/lib/bpf/bpf.c   |  68 +
 tools/lib/bpf/bpf.h   |   5 +
 tools/testing/selftests/bpf/Makefile  |   2 +-
 tools/testing/selftests/bpf/test_obj_id.c |  35 +++
 tools/testing/selftests/bpf/test_progs.c  | 191 +
 15 files changed, 798 insertions(+), 28 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_obj_id.c

-- 
2.9.3



Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Andrew Lunn
> - past the initial setup, if we start creating bridge devices and so on,
> we have no way to tell: group Ports 0-3 together and send traffic to CPU
> port 0, then let Port 5 alone and send traffic to CPU port 1, that's a
> DSA-only problem though, because we still have the CPU port(s) as
> independent network interfaces.

What is the problem here? Frames come out the master interface, get
untagged and passed to the slave interface and go upto the bridge. It
should all just work. Same in the reverse direction.

In order to make best use of the extra bandwidth of having two cpu
ports, i probably want the user ports reasonably evenly distributed
between the CPU ports. Dedicating one CPU port to one user port is
probably sub-optimal. How many people have 1Gbps Fibre to the home,
which could fully utilise a one-to-one mapping for the WAN port?

> Now, that would still force the user to configure two bridges in order
> to properly steer traffic towards the requested ports but it would allow
> us to be very flexible (which is probably desired here) in how ports are
> grouped together.

We want a sensible default, spreading the slave ports evenly over the
CPU ports. We could add a devlink command to change the defaults at
runtime.

Andrew


[PATCH net-next] bnxt_en: Fix xmit_more with BQL.

2017-05-30 Thread Michael Chan
We need to write the doorbell if BQL has stopped the queue and
skb->xmit_more is set.  Otherwise it is possible for the tx queue to
rot and cause tx timeout.

Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.")
Suggested-by: Yuval Mintz 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 954758f..c1cd72a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -463,7 +463,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
prod = NEXT_TX(prod);
txr->tx_prod = prod;
 
-   if (!skb->xmit_more)
+   if (!skb->xmit_more || netif_xmit_stopped(txq))
bnxt_db_write(bp, txr->tx_doorbell, DB_KEY_TX | prod);
 
 tx_done:
-- 
1.8.3.1



[PATCH v5 net-next 5/5] dsa: add maintainer of Microchip KSZ switches

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

Adding maintainer of Microchip KSZ switches.

Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Signed-off-by: Woojung Huh 
---
 MAINTAINERS | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 42378cf..0fcb5e75 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8479,6 +8479,16 @@ F:   drivers/media/platform/atmel/atmel-isc.c
 F: drivers/media/platform/atmel/atmel-isc-regs.h
 F: devicetree/bindings/media/atmel-isc.txt
 
+MICROCHIP KSZ SERIES ETHERNET SWITCH DRIVER
+M: Woojung Huh 
+M: Microchip Linux Driver Support 
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/dsa/tag_ksz.c
+F: drivers/net/dsa/microchip/*
+F: include/linux/platform_data/microchip-ksz.h
+F: Documentation/devicetree/bindings/net/dsa/ksz.txt
+
 MICROCHIP USB251XB DRIVER
 M: Richard Leitner 
 L: linux-...@vger.kernel.org
-- 
2.7.4



[PATCH v5 net-next 2/5] phy: micrel: add Microchip KSZ 9477 Switch PHY support

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

Adding Microchip 9477 Phy included in KSZ9477 Switch.

Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Signed-off-by: Woojung Huh 
---
 drivers/net/phy/micrel.c   | 11 +++
 include/linux/micrel_phy.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 4cfd541..46e80bc 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -20,6 +20,7 @@
  *ksz8081, ksz8091,
  *ksz8061,
  * Switch : ksz8873, ksz886x
+ *  ksz9477
  */
 
 #include 
@@ -996,6 +997,16 @@ static struct phy_driver ksphy_driver[] = {
.read_status= ksz8873mll_read_status,
.suspend= genphy_suspend,
.resume = genphy_resume,
+}, {
+   .phy_id = PHY_ID_KSZ9477,
+   .phy_id_mask= MICREL_PHY_ID_MASK,
+   .name   = "Microchip KSZ9477",
+   .features   = PHY_GBIT_FEATURES,
+   .config_init= kszphy_config_init,
+   .config_aneg= genphy_config_aneg,
+   .read_status= genphy_read_status,
+   .suspend= genphy_suspend,
+   .resume = genphy_resume,
 } };
 
 module_phy_driver(ksphy_driver);
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h
index f541da6..472fa4d 100644
--- a/include/linux/micrel_phy.h
+++ b/include/linux/micrel_phy.h
@@ -37,6 +37,8 @@
 
 #define PHY_ID_KSZ8795 0x00221550
 
+#definePHY_ID_KSZ9477  0x00221631
+
 /* struct phy_device dev_flags definitions */
 #define MICREL_PHY_50MHZ_CLK   0x0001
 #define MICREL_PHY_FXEN0x0002
-- 
2.7.4



[PATCH v5 net-next 1/5] dsa: add support for Microchip KSZ tail tagging

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

Adding support for the Microchip KSZ switch family tail tagging.

Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Signed-off-by: Woojung Huh 
---
 include/net/dsa.h  |   1 +
 net/dsa/Kconfig|   3 ++
 net/dsa/Makefile   |   1 +
 net/dsa/dsa.c  |   3 ++
 net/dsa/dsa_priv.h |   3 ++
 net/dsa/tag_ksz.c  | 101 +
 6 files changed, 112 insertions(+)
 create mode 100644 net/dsa/tag_ksz.c

diff --git a/include/net/dsa.h b/include/net/dsa.h
index c0e567c..42c28ae 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -31,6 +31,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_BRCM,
DSA_TAG_PROTO_DSA,
DSA_TAG_PROTO_EDSA,
+   DSA_TAG_PROTO_KSZ,
DSA_TAG_PROTO_LAN9303,
DSA_TAG_PROTO_MTK,
DSA_TAG_PROTO_QCA,
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 297389b..cc5f8f9 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -25,6 +25,9 @@ config NET_DSA_TAG_DSA
 config NET_DSA_TAG_EDSA
bool
 
+config NET_DSA_TAG_KSZ
+   bool
+
 config NET_DSA_TAG_LAN9303
bool
 
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 90e5aa6..fcce25d 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -6,6 +6,7 @@ dsa_core-y += dsa.o dsa2.o legacy.o port.o slave.o switch.o
 dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
 dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
 dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
+dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o
 dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
 dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
 dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 3288a80..402459e 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -49,6 +49,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_EDSA
[DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops,
 #endif
+#ifdef CONFIG_NET_DSA_TAG_KSZ
+   [DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops,
+#endif
 #ifdef CONFIG_NET_DSA_TAG_LAN9303
[DSA_TAG_PROTO_LAN9303] = &lan9303_netdev_ops,
 #endif
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index c1d4180..7459d57 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -167,6 +167,9 @@ extern const struct dsa_device_ops dsa_netdev_ops;
 /* tag_edsa.c */
 extern const struct dsa_device_ops edsa_netdev_ops;
 
+/* tag_ksz.c */
+extern const struct dsa_device_ops ksz_netdev_ops;
+
 /* tag_lan9303.c */
 extern const struct dsa_device_ops lan9303_netdev_ops;
 
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
new file mode 100644
index 000..0b08a40
--- /dev/null
+++ b/net/dsa/tag_ksz.c
@@ -0,0 +1,101 @@
+/*
+ * net/dsa/tag_ksz.c - Microchip KSZ Switch tag format handling
+ * Copyright (c) 2017 Microchip Technology
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "dsa_priv.h"
+
+/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : Prioritization (not used now)
+ * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
+ *
+ * For Egress (KSZ -> Host), 1 byte is added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : zero-based value represents port
+ *   (eg, 0x00=port1, 0x02=port3, 0x06=port7)
+ */
+
+#defineKSZ_INGRESS_TAG_LEN 2
+#defineKSZ_EGRESS_TAG_LEN  1
+
+static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct sk_buff *nskb;
+   int padlen;
+   u8 *tag;
+
+   padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
+
+   if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
+   nskb = skb;
+   } else {
+   nskb = alloc_skb(NET_IP_ALIGN + skb->len +
+padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
+   if (!nskb) {
+   kfree_skb(skb);
+   return NULL;
+   }
+   skb_reserve(nskb, NET_IP_ALIGN);
+
+   skb_reset_mac_header(nskb);
+   skb_set_network_header(nskb,
+  skb_network_header(skb) - skb->head);
+   skb_set_transport_header(nskb,
+

[PATCH v5 net-next 4/5] net: dsa: Add Microchip KSZ switches binding

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

A sample SPI configuration for Microchip KSZ switches.

Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Signed-off-by: Woojung Huh 
---
 Documentation/devicetree/bindings/net/dsa/ksz.txt | 72 +++
 1 file changed, 72 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt

diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt 
b/Documentation/devicetree/bindings/net/dsa/ksz.txt
new file mode 100644
index 000..9acdb6c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
@@ -0,0 +1,72 @@
+Microchip KSZ Series Ethernet switches
+==
+
+Required properties:
+
+- compatible: For external switch chips, compatible string must be exactly one
+  of: "microchip,ksz9477"
+
+See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional
+required and optional properties.
+
+Examples:
+
+Ethernet switch connected via SPI to the host, CPU port wired to eth0:
+
+   eth0: ethernet@10001000 {
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+
+   spi1: spi@f8008000 {
+   pinctrl-0 = <&pinctrl_spi_ksz>;
+   cs-gpios = <&pioC 25 0>;
+   id = <1>;
+   status = "okay";
+
+   ksz9477: ksz9477@0 {
+   compatible = "microchip,ksz9477";
+   reg = <0>;
+
+   spi-max-frequency = <4400>;
+   spi-cpha;
+   spi-cpol;
+
+   status = "okay";
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   port@0 {
+   reg = <0>;
+   label = "lan1";
+   };
+   port@1 {
+   reg = <1>;
+   label = "lan2";
+   };
+   port@2 {
+   reg = <2>;
+   label = "lan3";
+   };
+   port@3 {
+   reg = <3>;
+   label = "lan4";
+   };
+   port@4 {
+   reg = <4>;
+   label = "lan5";
+   };
+   port@5 {
+   reg = <5>;
+   label = "cpu";
+   ethernet = <ð0>;
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   };
+   };
+   };
-- 
2.7.4



[PATCH v5 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

The KSZ9477 is a fully integrated layer 2, managed, 7 ports GigE switch
with numerous advanced features. 5 ports incorporate 10/100/1000 Mbps PHYs.
The other 2 ports have interfaces that can be configured as SGMII, RGMII, MII
or RMII. Either of these may connect directly to a host processor or
to an external PHY. The SGMII port may interface to a fiber optic transceiver.

This driver currently supports vlan, fdb, mdb & mirror dsa switch operations.

Reviewed-by: Florian Fainelli 
Signed-off-by: Woojung Huh 
---
 drivers/net/dsa/Kconfig |2 +
 drivers/net/dsa/Makefile|1 +
 drivers/net/dsa/microchip/Kconfig   |   12 +
 drivers/net/dsa/microchip/Makefile  |2 +
 drivers/net/dsa/microchip/ksz_9477_reg.h| 1676 +++
 drivers/net/dsa/microchip/ksz_common.c  | 1278 
 drivers/net/dsa/microchip/ksz_priv.h|  210 
 drivers/net/dsa/microchip/ksz_spi.c |  215 
 include/linux/platform_data/microchip-ksz.h |   29 +
 9 files changed, 3425 insertions(+)
 create mode 100644 drivers/net/dsa/microchip/Kconfig
 create mode 100644 drivers/net/dsa/microchip/Makefile
 create mode 100644 drivers/net/dsa/microchip/ksz_9477_reg.h
 create mode 100644 drivers/net/dsa/microchip/ksz_common.c
 create mode 100644 drivers/net/dsa/microchip/ksz_priv.h
 create mode 100644 drivers/net/dsa/microchip/ksz_spi.c
 create mode 100644 include/linux/platform_data/microchip-ksz.h

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 68131a4..83a9bc8 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -39,6 +39,8 @@ config NET_DSA_MV88E6060
  This enables support for the Marvell 88E6060 ethernet switch
  chip.
 
+source "drivers/net/dsa/microchip/Kconfig"
+
 source "drivers/net/dsa/mv88e6xxx/Kconfig"
 
 config NET_DSA_QCA8K
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index 9613f36..4a5b5bd 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_NET_DSA_SMSC_LAN9303) += lan9303-core.o
 obj-$(CONFIG_NET_DSA_SMSC_LAN9303_I2C) += lan9303_i2c.o
 obj-$(CONFIG_NET_DSA_SMSC_LAN9303_MDIO) += lan9303_mdio.o
 obj-y  += b53/
+obj-y  += microchip/
 obj-y  += mv88e6xxx/
diff --git a/drivers/net/dsa/microchip/Kconfig 
b/drivers/net/dsa/microchip/Kconfig
new file mode 100644
index 000..a8b8f59
--- /dev/null
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -0,0 +1,12 @@
+menuconfig MICROCHIP_KSZ
+   tristate "Microchip KSZ series switch support"
+   depends on NET_DSA
+   select NET_DSA_TAG_KSZ
+   help
+ This driver adds support for Microchip KSZ switch chips.
+
+config MICROCHIP_KSZ_SPI_DRIVER
+   tristate "KSZ series SPI connected switch driver"
+   depends on MICROCHIP_KSZ && SPI
+   help
+ Select to enable support for registering switches configured through 
SPI.
diff --git a/drivers/net/dsa/microchip/Makefile 
b/drivers/net/dsa/microchip/Makefile
new file mode 100644
index 000..ed335e2
--- /dev/null
+++ b/drivers/net/dsa/microchip/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_MICROCHIP_KSZ)+= ksz_common.o
+obj-$(CONFIG_MICROCHIP_KSZ_SPI_DRIVER) += ksz_spi.o
diff --git a/drivers/net/dsa/microchip/ksz_9477_reg.h 
b/drivers/net/dsa/microchip/ksz_9477_reg.h
new file mode 100644
index 000..6aa6752
--- /dev/null
+++ b/drivers/net/dsa/microchip/ksz_9477_reg.h
@@ -0,0 +1,1676 @@
+/*
+ * Microchip KSZ9477 register definitions
+ *
+ * Copyright (C) 2017
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef __KSZ9477_REGS_H
+#define __KSZ9477_REGS_H
+
+#define KS_PRIO_M  0x7
+#define KS_PRIO_S  4
+
+/* 0 - Operation */
+#define REG_CHIP_ID0__10x
+
+#define REG_CHIP_ID1__10x0001
+
+#define FAMILY_ID  0x95
+#define FAMILY_ID_94   0x94
+#define FAMILY_ID_95   0x95
+#define FAMILY_ID_85   0x85
+#define FAMILY_ID_98   0x98
+#define FAMILY_ID_88   0x88
+
+#define REG_CHIP_ID2__10x000

[PATCH v5 net-next 0/5] dsa: add Microchip KSZ9477 DSA driver

2017-05-30 Thread Woojung.Huh
From: Woojung Huh 

This series of patches is for Microchip KSZ9477 DSA driver.
KSZ9477 is 7 ports GigE switch with numerous advanced features.
5 ports are 10/100/1000 Mbps internal PHYs and 2 ports have
Interfaces to SGMII, RGMII, MII or RMII.

This patch supports VLAN, MDB, FDB and port mirroring offloads.

Welcome reviews and comments from community.

Note: Tests are performed on internal development board.

V5
- drivers/net/dsa/microchip/ksz_common.c
  * remove wrong mutex_unlock() in ksz_port_mdb_del()

V4
- update per review comments
- cosmetic changes
- net/dsa/tag_ksz.c
  * skb_put() & memset() are changed to skb_put_padto()
- drivers/net/dsa/microchip/ksz_common.
   * vlan access mutex is updated
   * mib_names[] is changed to static const

V3
- update per review comments
- cosmetic changes
- drivers/net/dsa/microchip/ksz_common.c 
  * clean up ksz_switch_chips[] 
  * consolidate checking loops into functions
  * update mutex for better locking
  * replace devm_kmalloc_array() to devm_kcalloc()
- MAINTAINERS
  * add missing net/dsa/tag_ksz.c

V2
- update per review comments
- several cosmetic changes
- net/dsa/tag_ksz.c
  * constants are changed to defines
  * remove skb_linearize() in ksz_rcv()
  * ksz_xmit()checks skb tailroom before allocate new skb
- drivers/net/phy/micrel.c
  * remove PHY_HAS_MAGICANEG from ksphy_driver[]
- drivers/net/dsa/microchip/ksz_common.c
  * add timeout to avoid endless loop
  * port initialization is move to ksz_port_enable() instead of  
ksz_setup_ports()
- Documentation/devicetree/bindings/net/dsa/ksz.txt
  * fix typo and indentations

Woojung Huh (5):
  dsa: add support for Microchip KSZ tail tagging
  phy: micrel: add Microchip KSZ 9477 Switch PHY support
  dsa: add DSA switch driver for Microchip KSZ9477
  net: dsa: Add Microchip KSZ switches binding
  dsa: add maintainer of Microchip KSZ switches

 Documentation/devicetree/bindings/net/dsa/ksz.txt |   72 +
 MAINTAINERS   |   10 +
 drivers/net/dsa/Kconfig   |2 +
 drivers/net/dsa/Makefile  |1 +
 drivers/net/dsa/microchip/Kconfig |   12 +
 drivers/net/dsa/microchip/Makefile|2 +
 drivers/net/dsa/microchip/ksz_9477_reg.h  | 1676 +
 drivers/net/dsa/microchip/ksz_common.c| 1278 
 drivers/net/dsa/microchip/ksz_priv.h  |  210 +++
 drivers/net/dsa/microchip/ksz_spi.c   |  215 +++
 drivers/net/phy/micrel.c  |   11 +
 include/linux/micrel_phy.h|2 +
 include/linux/platform_data/microchip-ksz.h   |   29 +
 include/net/dsa.h |1 +
 net/dsa/Kconfig   |3 +
 net/dsa/Makefile  |1 +
 net/dsa/dsa.c |3 +
 net/dsa/dsa_priv.h|3 +
 net/dsa/tag_ksz.c |  101 ++
 19 files changed, 3632 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt
 create mode 100644 drivers/net/dsa/microchip/Kconfig
 create mode 100644 drivers/net/dsa/microchip/Makefile
 create mode 100644 drivers/net/dsa/microchip/ksz_9477_reg.h
 create mode 100644 drivers/net/dsa/microchip/ksz_common.c
 create mode 100644 drivers/net/dsa/microchip/ksz_priv.h
 create mode 100644 drivers/net/dsa/microchip/ksz_spi.c
 create mode 100644 include/linux/platform_data/microchip-ksz.h
 create mode 100644 net/dsa/tag_ksz.c

-- 
2.7.4


Re: loosing netdevices with namespaces and unshare?

2017-05-30 Thread David Ahern
On 5/30/17 4:07 PM, Harald Welte wrote:
> In case you're wondering what I'm actually trying to achieve: Find
> an easy way to run a single program in an isolated namespace that only
> has one physical (usb) ethernet device.  I would like to execute that
> program as unprivileged user but still be able to bind to privileged
> ports.  And I want to do this using simple command-line tools without
> all the bloat and overhead of "container" solutions that have 99% of
> features I don't need.  But let that not distract you, I think the
> mysteriously disappearing netdevices are a more general and important
> issue.

An alternative approach is to create a bridge and add the usb ethernet
device to it. As you want to launch a program, create a veth pair. Put
one end into the bridge, and the other end into the new network namespace.

All of this can be scripted quite easily with 'ip' - including
configuring the veth device pushed into the namespace and running the
command. Use unshare for the other namespaces.


Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"

2017-05-30 Thread Sowmini Varadhan
On (05/30/17 16:20), Stephen Hemminger wrote:
> 
> Please don't copy/paste chunks of code. Instead refactor and make this
> into a helper function. 

sure, I have no problems with that, and as I pointed out, I've not
tested ipv6 for this yet either. I'll do all of this after getting
some feedback on the more basic issue here:

I was first looking for comments on the more fundamental refcnt
management behind the fix (I'm surprised no one noticed this 
before, is there some deep reason for leaving it like this, that
I am missing? Does it break something else?)

And fwiw I was merging pieces of ___neigh_lookup_noref, which figures
out the hash val, and neigh_flush_dev/neigh_forced_gc 


Re: [PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"

2017-05-30 Thread Stephen Hemminger
On Mon, 29 May 2017 20:48:16 -0700
Sowmini Varadhan  wrote:

> + np = &nht->hash_buckets[hash_val];
> + while ((n = rcu_dereference_protected(*np,
> + lockdep_is_held(&tbl->lock))) != NULL) {
> + write_lock(&n->lock);
> + if (n == ndel) {
> + bool retval = false;
> +
> + if  (atomic_read(&n->refcnt) == 1) {
> + rcu_assign_pointer(*np,
> + rcu_dereference_protected(n->next,
> + lockdep_is_held(&tbl->lock)));
> + n->dead = 1;
> + retval = true;
> + }
> + write_unlock(&n->lock);
> + if (retval)
> + neigh_cleanup_and_release(n);
> + write_unlock_bh(&tbl->lock);
> + return retval;
> + }
> + write_unlock(&n->lock);
> + np = &n->next;
> + }
> +

Please don't copy/paste chunks of code. Instead refactor and make this
into a helper function. 


Re: loosing netdevices with namespaces and unshare?

2017-05-30 Thread Cong Wang
On Tue, May 30, 2017 at 3:07 PM, Harald Welte  wrote:
> But, to the contrary, this doesn't happen.  The unshare-created netns is
> gone, but the netdevice did not get moved back to the root namespace
> either.  The only hack to get back to the "eth0" device is to unload the
> driver and re-load it.


Net namespace simply unregisters all netdevices inside when it is
gone, no matter where they are from. I am pretty sure you can move it
back to root-ns if you want, it is a little tricky because you have to give
the root-ns a name first.


>
> I can reproduce the above without starting any other process inside that
> namespace.  I have verified that there are no /proc/*/ns/net symlinks
> left pointing to the ID of that namespace.  What am I missing here?  Is
> this the intended behavior?

Yes it is.

>
> Of course I know I could simply do something like "ip link set eth0
> netns 1" from within the namespace before leaving.  But what if the
> process is not bash and the process exits abnormally?   I'd consider
> that explicit reassignment more like a hack than a proper solution...

It doesn't make sense to move it back to where it is from, for example,
what if you move a veth0 from netns1 to netns2 and netns1 is gone
before netns2?

Regards.


[PATCH RFC net-next] arp: Really delete an arp entry on "arp -d"

2017-05-30 Thread Sowmini Varadhan
Noticed during some testing: the command
  # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
adds an entry like the following (listed by "arp -an")
  ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
but the symmetric deletion command
  # arp -i eth2 -d 62.2.0.1
does not remove the PERM entry from the table, and instead leaves behind
  ? (62.2.0.1) at  on eth2

The reason is that there is a refcnt of 1 for the arp_tbl itself
(neigh_alloc starts off the entry with a refcnt of 1), thus
the neigh_release() call from arp_invalidate() will (at best) just
decrement the ref to 1, but will never actually free it from the
table.

To fix this, we need to do something like neigh_forced_gc: if
the refcnt is 1 (i.e., on the table's ref), remove the entry from
the table and free it.

We may need something symmetric for IPv6- I was going to check into
that, after getting some feedback on this RFC patch.

Signed-off-by: Sowmini Varadhan 
---
 include/net/neighbour.h |1 +
 net/core/neighbour.c|   42 ++
 net/ipv4/arp.c  |1 +
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index e4dd3a2..639b675 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -317,6 +317,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, 
const void *pkey,
 int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags,
 u32 nlmsg_pid);
 void __neigh_set_probe_once(struct neighbour *neigh);
+bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index d274f81..0a09f6f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -117,6 +117,48 @@ unsigned long neigh_rand_reach_time(unsigned long base)
 }
 EXPORT_SYMBOL(neigh_rand_reach_time);
 
+bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl)
+{
+   struct neigh_hash_table *nht;
+   void *pkey = ndel->primary_key;
+   u32 hash_val;
+   struct neighbour *n;
+   struct neighbour __rcu **np;
+
+   write_lock_bh(&tbl->lock);
+   nht = rcu_dereference_protected(tbl->nht,
+   lockdep_is_held(&tbl->lock));
+   hash_val = tbl->hash(pkey, ndel->dev, nht->hash_rnd);
+   hash_val = hash_val >> (32 - nht->hash_shift);
+
+   np = &nht->hash_buckets[hash_val];
+   while ((n = rcu_dereference_protected(*np,
+   lockdep_is_held(&tbl->lock))) != NULL) {
+   write_lock(&n->lock);
+   if (n == ndel) {
+   bool retval = false;
+
+   if  (atomic_read(&n->refcnt) == 1) {
+   rcu_assign_pointer(*np,
+   rcu_dereference_protected(n->next,
+   lockdep_is_held(&tbl->lock)));
+   n->dead = 1;
+   retval = true;
+   }
+   write_unlock(&n->lock);
+   if (retval)
+   neigh_cleanup_and_release(n);
+   write_unlock_bh(&tbl->lock);
+   return retval;
+   }
+   write_unlock(&n->lock);
+   np = &n->next;
+   }
+
+   write_unlock_bh(&tbl->lock);
+   return false;
+}
+EXPORT_SYMBOL(neigh_remove_one);
 
 static int neigh_forced_gc(struct neigh_table *tbl)
 {
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index e9f3386..5264004 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1120,6 +1120,7 @@ static int arp_invalidate(struct net_device *dev, __be32 
ip)
   NEIGH_UPDATE_F_OVERRIDE|
   NEIGH_UPDATE_F_ADMIN, 0);
neigh_release(neigh);
+   neigh_remove_one(neigh, &arp_tbl);
}
 
return err;
-- 
1.7.1



Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume

2017-05-30 Thread Leonard Crestez
On Tue, 2017-05-30 at 15:19 -0700, Florian Fainelli wrote:
> On 05/30/2017 03:08 PM, Leonard Crestez wrote:
> > On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote:

> > > Should not we actually call kszphy_config_init() in order to restore
> > > broadcast and nand disable bits as well?
> > 
> > I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and
> > NAND_TREE_ON is already off by the time it gets to linux.
> > 
> > The bit that get lost seem to disappear just as the phy is resumed. I
> > added some prints and they look like this:
> > 
> > PM: early resume of devices complete after 6.534 msecs
> > begin resume
> > 0x1F=0x8190 0x16=0x202
> > after genphy_resume 0x1F=0x8100 0x16=0x202
> > end
> > resume 0x1F=0x8190 0x16=0x202
> 
> OK, so it actually would not hurt to call config_init() again here, right?

No, but if only some registers are lost then why reconfigure others? In
particular it seems that only MII_KSZPHY_CTRL_2 is lost but not
MII_KSZPHY_OMSO.

However a few extra phy reads don't really matter. Calling config_init
is the path of least resistance.

Another problem is that the ksz9031 driver uses kszphy_resume but has a
completely different init function. The bits I am concerned about do
not seem to exist in hardware but it does something completely
unrelated with parsing OF properties. Should I split this into
ksz8021_resume and ksz9031_resume?

It seems likely that more of the kszphy drivers need suspend/resume
code but nobody got around to testing them.

> > > If not, I would be more comfortable if we did create a specific function
> > > that takes care of setting the reference clock and LED mode.
> > 
> > Ok, I can add a function called kszphy_config_reset() with a comment
> > explaining it's for bits lost on reset/resume.
> > 
> > Or perhaps a better option would be to just save/restore the entire
> > 0x1F register?
> 
> Register 0 through 15 are standardized, but those after that are not,
> and PHYs with a gazillion of registers already need to have a specific
> set of suspend/resume sequence due to their proprietary indirect
> register scheme, so we cannot quite come up with a generic function that
> would cache all 16 or 32 registers and flat out restore those.
> 
> Similarly, having phy_resume() systematically calling into config_init()
> is a bit of a stretch and can be pretty inefficient.

I'm not suggesting changing this at the phy core level. Just that maybe
kszphy_resume specifically could save/restore register 0x1F instead of
going through config logic again (which would involve extra reads and
writes).

However it seems that this has complications, for example on some
versions the leds are written to a different register. It's not really
worth optimizing to such an extent.


RE: [PATCH v4 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477

2017-05-30 Thread Woojung.Huh
Hi Andrew,
> > +static int ksz_port_mdb_del(struct dsa_switch *ds, int port,
> > +   const struct switchdev_obj_port_mdb *mdb)
> > +{
> > +   struct ksz_device *dev = ds->priv;
> > +   u32 static_table[4];
> > +   u32 data;
> > +   int index;
> > +   int ret = 0;
> > +   u32 mac_hi, mac_lo;
> > +
> > +   mac_hi = ((mdb->addr[0] << 8) | mdb->addr[1]);
> > +   mac_lo = ((mdb->addr[2] << 24) | (mdb->addr[3] << 16));
> > +   mac_lo |= ((mdb->addr[4] << 8) | mdb->addr[5]);
> > +
> > +   mutex_lock(&dev->alu_mutex);
> > +
> > +   for (index = 0; index < dev->num_statics; index++) {
> > +   /* find empty slot first */
> > +   data = (index << ALU_STAT_INDEX_S) |
> > +   ALU_STAT_READ | ALU_STAT_START;
> > +   ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data);
> > +
> > +   /* wait to be finished */
> > +   ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000);
> > +   if (ret < 0) {
> > +   dev_dbg(dev->dev, "Failed to read ALU STATIC\n");
> > +   goto exit;
> > +   }
> > +
> > +   /* read ALU static table */
> > +   read_table(ds, static_table);
> > +
> > +   mutex_unlock(&dev->alu_mutex);
> 
> Is this mutex unlock here correct? It looks like we will unlock it
> again when we eventually get to exit: below.
> 
Thanks for catching this. Will submit new patch soon.

- Woojung


[PATCH] net: freescale: fix potential null pointer dereference

2017-05-30 Thread Gustavo A. R. Silva
Add NULL check before dereferencing pointer _id_ in order to avoid
a potential NULL pointer dereference.

Addresses-Coverity-ID: 1397995
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/freescale/fsl_pq_mdio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fsl_pq_mdio.c 
b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
index 446c7b3..a10de1e 100644
--- a/drivers/net/ethernet/freescale/fsl_pq_mdio.c
+++ b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
@@ -381,7 +381,7 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev)
 {
const struct of_device_id *id =
of_match_device(fsl_pq_mdio_match, &pdev->dev);
-   const struct fsl_pq_mdio_data *data = id->data;
+   const struct fsl_pq_mdio_data *data;
struct device_node *np = pdev->dev.of_node;
struct resource res;
struct device_node *tbi;
@@ -389,6 +389,13 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev)
struct mii_bus *new_bus;
int err;
 
+   if (!id) {
+   dev_err(&pdev->dev, "Failed to match device\n");
+   return -ENODEV;
+   }
+
+   data = id->data;
+
dev_dbg(&pdev->dev, "found %s compatible node\n", id->compatible);
 
new_bus = mdiobus_alloc_size(sizeof(*priv));
-- 
2.5.0



Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Florian Fainelli
+Jiri, Ido,

On 05/30/2017 03:44 AM, John Crispin wrote:
> Some boards have two CPU interfaces connected to the switch, e.g. WiFi
> access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
> two port connected to the SoC.
> 
> This patch extends DSA to allows both CPU ports to be used. The "cpu"
> node in the DSA tree can now have a phandle to the host interface it
> connects to. Each user port can have a phandle to a cpu port which
> should be used for traffic between the port and the CPU. Thus simple
> load sharing over the two CPU ports can be achieved.

Part of the problem here is that:

- we have the initial state where we only allow the user-ports to be
talking to the CPU port, now we have more than one CPU port, so which
one do we choose? You have proposed a Device Tree change for that, and
while this works, we are going beyond pure HW description and we are
already describing a desired policy, not ideal

- past the initial setup, if we start creating bridge devices and so on,
we have no way to tell: group Ports 0-3 together and send traffic to CPU
port 0, then let Port 5 alone and send traffic to CPU port 1, that's a
DSA-only problem though, because we still have the CPU port(s) as
independent network interfaces.

Right now, we cannot enslave master network devices into the bridge when
a tagging protocol is used (see
8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e) so we cannot quite associate a
bridge device with its underlying CPU port.

I suppose we could revise that commit and let the enslaving happen in
order for the switch drivers to "learn" which CPU/master network device
is being added to the bridge, and just not register the RX handler for
that interface.

Now, that would still force the user to configure two bridges in order
to properly steer traffic towards the requested ports but it would allow
us to be very flexible (which is probably desired here) in how ports are
grouped together.

Does that make sense?

> 
> Signed-off-by: John Crispin 
> ---
>  include/net/dsa.h  | 21 -
>  net/dsa/dsa2.c | 35 +++
>  net/dsa/dsa_priv.h |  1 +
>  net/dsa/slave.c| 26 --
>  4 files changed, 68 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index c0e567c0c824..d2994bd2c507 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -186,6 +186,8 @@ struct dsa_port {
>   u8  stp_state;
>   struct net_device   *bridge_dev;
>   struct devlink_port devlink_port;
> + struct net_device   *ethernet;
> + int upstream;
>  };
>  
>  struct dsa_switch {
> @@ -251,7 +253,7 @@ struct dsa_switch {
>  
>  static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
>  {
> - return ds->dst->cpu_dp == &ds->ports[p];
> + return !!(ds->cpu_port_mask & (1 << p));
>  }
>  
>  static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p)
> @@ -269,6 +271,11 @@ static inline bool dsa_is_port_initialized(struct 
> dsa_switch *ds, int p)
>   return ds->enabled_port_mask & (1 << p) && ds->ports[p].netdev;
>  }
>  
> +static inline bool dsa_is_upstream_port(struct dsa_switch *ds, int p)
> +{
> + return dsa_is_cpu_port(ds, p) || dsa_is_dsa_port(ds, p);
> +}
> +
>  static inline u8 dsa_upstream_port(struct dsa_switch *ds)
>  {
>   struct dsa_switch_tree *dst = ds->dst;
> @@ -285,6 +292,18 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
>   return ds->rtable[dst->cpu_dp->ds->index];
>  }
>  
> +static inline u8 dsa_port_upstream_port(struct dsa_switch *ds, int port)
> +{
> + /*
> +  * If this port has a specific upstream cpu port, use it,
> +  * otherwise use the switch default.
> +  */
> + if (ds->ports[port].upstream)
> + return ds->ports[port].upstream;
> + else
> + return dsa_upstream_port(ds);
> +}
> +
>  struct dsa_switch_ops {
>   /*
>* Legacy probing.
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index 4301f52e4f5a..8b13aa735c40 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -253,8 +253,6 @@ static int dsa_cpu_port_apply(struct dsa_port *port, u32 
> index,
>   return err;
>   }
>  
> - ds->cpu_port_mask |= BIT(index);
> -
>   memset(&ds->ports[index].devlink_port, 0,
>  sizeof(ds->ports[index].devlink_port));
>   err = devlink_port_register(ds->devlink, &ds->ports[index].devlink_port,
> @@ -269,6 +267,10 @@ static void dsa_cpu_port_unapply(struct dsa_port *port, 
> u32 index,
>   dsa_cpu_dsa_destroy(port);
>   ds->cpu_port_mask &= ~BIT(index);
>  
> + if (ds->ports[index].ethernet) {
> + dev_put(ds->ports[index].ethernet);
> + ds->ports[index].ethernet = NULL;
> + }
>  }
>  
>  static int dsa_user_port_apply(struct dsa_port *port, u32 index,
> @@ -530,6 +532,29 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 

Suggest stable pick for nfnetlink_queue memory leak

2017-05-30 Thread Doug Anderson
Hi,

I was running kmemleak on a kernel based on 4.4.64 and I noticed that
my memory leak was fixed by commit b18bcb0019cf ("netfilter:
nfnetlink_queue: fix memory leak when attach expectation
successfully").  It seems like perhaps this should go to linux-stable?
 It's not a huge leak, but the fix is simple.

It looks as if the leak has existed since commit bd0779370588
("netfilter: nfnetlink_queue: allow to attach expectations to
conntracks")

Thanks!

-Doug


Re: [PATCH v4 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477

2017-05-30 Thread Andrew Lunn
> +static int ksz_port_mdb_del(struct dsa_switch *ds, int port,
> + const struct switchdev_obj_port_mdb *mdb)
> +{
> + struct ksz_device *dev = ds->priv;
> + u32 static_table[4];
> + u32 data;
> + int index;
> + int ret = 0;
> + u32 mac_hi, mac_lo;
> +
> + mac_hi = ((mdb->addr[0] << 8) | mdb->addr[1]);
> + mac_lo = ((mdb->addr[2] << 24) | (mdb->addr[3] << 16));
> + mac_lo |= ((mdb->addr[4] << 8) | mdb->addr[5]);
> +
> + mutex_lock(&dev->alu_mutex);
> +
> + for (index = 0; index < dev->num_statics; index++) {
> + /* find empty slot first */
> + data = (index << ALU_STAT_INDEX_S) |
> + ALU_STAT_READ | ALU_STAT_START;
> + ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data);
> +
> + /* wait to be finished */
> + ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000);
> + if (ret < 0) {
> + dev_dbg(dev->dev, "Failed to read ALU STATIC\n");
> + goto exit;
> + }
> +
> + /* read ALU static table */
> + read_table(ds, static_table);
> +
> + mutex_unlock(&dev->alu_mutex);

Is this mutex unlock here correct? It looks like we will unlock it
again when we eventually get to exit: below.

> +
> + if (static_table[0] & ALU_V_STATIC_VALID) {
> + /* check this has same vid & mac address */
> +
> + if (((static_table[2] >> ALU_V_FID_S) == (mdb->vid)) &&
> + ((static_table[2] & ALU_V_MAC_ADDR_HI) == mac_hi) &&
> + (static_table[3] == mac_lo)) {
> + /* found matching one */
> + break;
> + }
> + }
> + }
> +
> + /* no available entry */
> + if (index == dev->num_statics) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + /* clear port */
> + static_table[1] &= ~BIT(port);
> +
> + if ((static_table[1] & ALU_V_PORT_MAP) == 0) {
> + /* delete entry */
> + static_table[0] = 0;
> + static_table[1] = 0;
> + static_table[2] = 0;
> + static_table[3] = 0;
> + }
> +
> + write_table(ds, static_table);
> +
> + data = (index << ALU_STAT_INDEX_S) | ALU_STAT_START;
> + ksz_write32(dev, REG_SW_ALU_STAT_CTRL__4, data);
> +
> + /* wait to be finished */
> + ret = wait_alu_sta_ready(dev, ALU_STAT_START, 1000);
> + if (ret < 0)
> + dev_dbg(dev->dev, "Failed to read ALU STATIC\n");
> +
> +exit:
> + mutex_unlock(&dev->alu_mutex);
> +
> + return ret;
> +}


Re: [PATCH] net: fix auto-loading of Marvell DSA driver

2017-05-30 Thread Andrew Lunn
On Tue, May 30, 2017 at 09:38:18PM +0100, Russell King wrote:
> Auto-loading of the Marvell DSA driver has stopped working with recent
> kernels.  This seems to be due to the change of binding for DSA devices,
> moving them from the platform bus to the MDIO bus.
> 
> In order for module auto-loading to work, we need to provide a MODALIAS
> string in the uevent file for the device.  However, the device core does
> not automatically provide this, and needs each bus_type to implement a
> uevent method to generate these strings.  The MDIO bus does not provide
> such a method, so no MODALIAS string is provided:
> 
> .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent
> DRIVER=mv88e6085
> OF_NAME=switch
> OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4
> OF_COMPATIBLE_0=marvell,mv88e6085
> OF_COMPATIBLE_N=1
> 
> In the case of OF-based devices, the solution is easy -
> of_device_uevent_modalias() does the work for us.  After this is done,
> the uevent file looks like this:
> 
> .# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent
> DRIVER=mv88e6085
> OF_NAME=switch
> OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4
> OF_COMPATIBLE_0=marvell,mv88e6085
> OF_COMPATIBLE_N=1
> MODALIAS=of:NswitchTCmarvell,mv88e6085
> 
> which results in auto-loading of the Marvell DSA driver on Clearfog
> platforms.
> 
> Fixes: c0405563a613 ("ARM: dts: armada-388-clearfog: Utilize new DSA binding")
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew


loosing netdevices with namespaces and unshare?

2017-05-30 Thread Harald Welte
Hi all,

I know I'm kind of late to the party in terms of deeper exploration of
Linux network namespaces.  Also, I'm not sure if the netdev list is the
riight place to ask, but a moderate amount of web searching didn't bring
up a solution in multiple hours, and it seems like I could trigger the
kernel (4.11.0) to loose netdevices, which I think is a serious issue.

What I'm doing:
* start a process using the 'unshare' command line tool provided with
  util-linux, e.g. "unshare -nUr bash". I do this as a non-privileged
  user but now that is mapped to uid '0' inside the new
  process/namespace, so I can adjust interface configuration.
* I use "echo $$" to get the PID of that bash process.
* On another terminal in a root shell, I use "ip link set eth0 netns $PID"
  in order to move a given physical device into that namespace.
* I then "exit" that bash, which should - to my knowledge - return the
  "eth0" netdev back to the root namespace, as the bash process was the
  only one using that network namespace

But, to the contrary, this doesn't happen.  The unshare-created netns is
gone, but the netdevice did not get moved back to the root namespace
either.  The only hack to get back to the "eth0" device is to unload the
driver and re-load it.

I can reproduce the above without starting any other process inside that
namespace.  I have verified that there are no /proc/*/ns/net symlinks
left pointing to the ID of that namespace.  What am I missing here?  Is
this the intended behavior?

Of course I know I could simply do something like "ip link set eth0
netns 1" from within the namespace before leaving.  But what if the
process is not bash and the process exits abnormally?   I'd consider
that explicit reassignment more like a hack than a proper solution...

Regards,
Harald

p.s.: In case you're wondering what I'm actually trying to achieve: Find
an easy way to run a single program in an isolated namespace that only
has one physical (usb) ethernet device.  I would like to execute that
program as unprivileged user but still be able to bind to privileged
ports.  And I want to do this using simple command-line tools without
all the bloat and overhead of "container" solutions that have 99% of
features I don't need.  But let that not distract you, I think the
mysteriously disappearing netdevices are a more general and important
issue.

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH 1/2] ARM: dts: imx6ul-14x14-evk: Add ksz8081 phy properties

2017-05-30 Thread Leonard Crestez
On Tue, 2017-05-30 at 11:10 -0700, Florian Fainelli wrote:
> On 05/30/2017 10:34 AM, Leonard Crestez wrote:
> > Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same
> > register values can be set through the micrel phy driver by using dts
> > properties.
> > 
> > This seems preferable and allows cleanly fixing suspend/resume.
> > 
> > Signed-off-by: Leonard Crestez 
> 
> Should you have a Fixes: tag for that? Sounds like something you'd want
> to backport to stable trees as well, no?

I don't know if suspend over nfs ever worked on this board in upstream
so there is no commit that this patch "Fixes: ". It would make sense to
CC stable.


Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume

2017-05-30 Thread Florian Fainelli
On 05/30/2017 03:08 PM, Leonard Crestez wrote:
> On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote:
>> On 05/30/2017 10:34 AM, Leonard Crestez wrote:
>>> These bits seem to be lost after a suspend/resume cycle so just set them
>>> again.
>>>
>>> This patch fixes ethernet suspend/resume on imx6ul-14x14-evk boards.
>>>
>>> Signed-off-by: Leonard Crestez 
>>> ---
>>>  drivers/net/phy/micrel.c | 15 +++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
>>> index 6a5fd18..c53ee17 100644
>>> --- a/drivers/net/phy/micrel.c
>>> +++ b/drivers/net/phy/micrel.c
>>> @@ -700,6 +700,9 @@ static int kszphy_suspend(struct phy_device *phydev)
>>>  
>>>  static int kszphy_resume(struct phy_device *phydev)
>>>  {
>>> +   struct kszphy_priv *priv = phydev->priv;
>>> +   int ret;
>>> +
>>> genphy_resume(phydev);
>>>  
>>> /* Enable PHY Interrupts */
>>> @@ -709,6 +712,18 @@ static int kszphy_resume(struct phy_device *phydev)
>>> phydev->drv->config_intr(phydev);
>>> }
>>>  
>>> +   if (priv->rmii_ref_clk_sel) {
>>> +   ret = kszphy_rmii_clk_sel(phydev, priv->rmii_ref_clk_sel_val);
>>> +   if (ret) {
>>> +   phydev_err(phydev,
>>> +  "failed to set rmii reference clock\n");
>>> +   return ret;
>>> +   }
>>> +   }
>>> +
>>> +   if (priv->led_mode >= 0)
>>> +   kszphy_setup_led(phydev, priv->type->led_mode_reg, 
>>> priv->led_mode);
>>
>> Should not we actually call kszphy_config_init() in order to restore
>> broadcast and nand disable bits as well?
> 
> I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and
> NAND_TREE_ON is already off by the time it gets to linux.
> 
> The bit that get lost seem to disappear just as the phy is resumed. I
> added some prints and they look like this:
> 
> PM: early resume of devices complete after 6.534 msecs
> begin resume
> 0x1F=0x8190 0x16=0x202
> after genphy_resume 0x1F=0x8100 0x16=0x202
> end
> resume 0x1F=0x8190 0x16=0x202

OK, so it actually would not hurt to call config_init() again here, right?

> 
>> If not, I would be more comfortable if we did create a specific function
>> that takes care of setting the reference clock and LED mode.
> 
> Ok, I can add a function called kszphy_config_reset() with a comment
> explaining it's for bits lost on reset/resume.
> 
> Or perhaps a better option would be to just save/restore the entire
> 0x1F register?

Register 0 through 15 are standardized, but those after that are not,
and PHYs with a gazillion of registers already need to have a specific
set of suspend/resume sequence due to their proprietary indirect
register scheme, so we cannot quite come up with a generic function that
would cache all 16 or 32 registers and flat out restore those.

Similarly, having phy_resume() systematically calling into config_init()
is a bit of a stretch and can be pretty inefficient.
-- 
Florian


Re: [PATCH 2/2] net: phy: micrel: Restore led_mode and clk_sel on resume

2017-05-30 Thread Leonard Crestez
On Tue, 2017-05-30 at 11:05 -0700, Florian Fainelli wrote:
> On 05/30/2017 10:34 AM, Leonard Crestez wrote:
> > These bits seem to be lost after a suspend/resume cycle so just set them
> > again.
> > 
> > This patch fixes ethernet suspend/resume on imx6ul-14x14-evk boards.
> > 
> > Signed-off-by: Leonard Crestez 
> > ---
> >  drivers/net/phy/micrel.c | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> > index 6a5fd18..c53ee17 100644
> > --- a/drivers/net/phy/micrel.c
> > +++ b/drivers/net/phy/micrel.c
> > @@ -700,6 +700,9 @@ static int kszphy_suspend(struct phy_device *phydev)
> >  
> >  static int kszphy_resume(struct phy_device *phydev)
> >  {
> > +   struct kszphy_priv *priv = phydev->priv;
> > +   int ret;
> > +
> > genphy_resume(phydev);
> >  
> > /* Enable PHY Interrupts */
> > @@ -709,6 +712,18 @@ static int kszphy_resume(struct phy_device *phydev)
> > phydev->drv->config_intr(phydev);
> > }
> >  
> > +   if (priv->rmii_ref_clk_sel) {
> > +   ret = kszphy_rmii_clk_sel(phydev, priv->rmii_ref_clk_sel_val);
> > +   if (ret) {
> > +   phydev_err(phydev,
> > +  "failed to set rmii reference clock\n");
> > +   return ret;
> > +   }
> > +   }
> > +
> > +   if (priv->led_mode >= 0)
> > +   kszphy_setup_led(phydev, priv->type->led_mode_reg, 
> > priv->led_mode);
> 
> Should not we actually call kszphy_config_init() in order to restore
> broadcast and nand disable bits as well?

I don't know. In my case the B_CAST_OFF bit doesn't seem to be lost and
NAND_TREE_ON is already off by the time it gets to linux.

The bit that get lost seem to disappear just as the phy is resumed. I
added some prints and they look like this:

PM: early resume of devices complete after 6.534 msecs
begin resume
0x1F=0x8190 0x16=0x202
after genphy_resume 0x1F=0x8100 0x16=0x202
end
resume 0x1F=0x8190 0x16=0x202

> If not, I would be more comfortable if we did create a specific function
> that takes care of setting the reference clock and LED mode.

Ok, I can add a function called kszphy_config_reset() with a comment
explaining it's for bits lost on reset/resume.

Or perhaps a better option would be to just save/restore the entire
0x1F register?


[PATCH net-next] netlink: include netnsid only when netns differs.

2017-05-30 Thread Flavio Leitner
Don't include netns id for notifications broadcasts when the
socket and the skb are in the same netns because it will be
an error which can't be distinguished from a peer netns failing
to allocate an id.

Signed-off-by: Flavio Leitner 
---
 net/netlink/af_netlink.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ee841f0..b9f1392 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1414,8 +1414,10 @@ static void do_one_broadcast(struct sock *sk,
p->skb2 = NULL;
goto out;
}
-   NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net);
-   NETLINK_CB(p->skb2).nsid_is_set = true;
+   if (!net_eq(sock_net(sk), p->net)) {
+   NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net);
+   NETLINK_CB(p->skb2).nsid_is_set = true;
+   }
val = netlink_broadcast_deliver(sk, p->skb2);
if (val < 0) {
netlink_overrun(sk);
-- 
2.9.4




Re: [PATCH] dt-bindings: net: move FMan binding

2017-05-30 Thread Rob Herring
On Thu, May 25, 2017 at 12:57:26PM +, Madalin-Cristian Bucur wrote:
> > -Original Message-
> > From: David Miller [mailto:da...@davemloft.net]
> > Sent: Monday, May 15, 2017 5:31 PM
> > Subject: Re: [PATCH] dt-bindings: net: move FMan binding
> > 
> > From: Madalin Bucur 
> > Date: Mon, 15 May 2017 16:36:38 +0300
> > 
> > > Besides the PPC SoCs, the QorIQ DPAA FMan is also present on ARM SoCs,
> > > moving the device tree binding document into the bindings/net folder.
> > >
> > > Signed-off-by: Madalin Bucur 
> > 
> > What tree is this targetted at for merging?
> 
> I hope Rob Herring will take this into his tree, it's a device tree binding
> change. 

Applied.

Rob


Re: [PATCH V2 1/3] Documentation: devicetree: add multiple cpu port DSA binding

2017-05-30 Thread Florian Fainelli
On 05/30/2017 03:44 AM, John Crispin wrote:
> Extend the DSA binding documentation, adding the new property required
> when there is more than one CPU port attached to the switch.
> 
> Cc: Rob Herring 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: John Crispin 
> ---
>  Documentation/devicetree/bindings/net/dsa/dsa.txt | 61 
> ++-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt 
> b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> index cfe8f64eca4f..c164eb38ccc5 100644
> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> @@ -55,6 +55,11 @@ A user port has the following optional property:
>  - label  : Describes the label associated with this 
> port, which
>will become the netdev name.
>  
> +- cpu: Option for non "cpu"/"dsa" ports. A phandle 
> to a
> +   "cpu" port, which will be used for passing packets
> +   from this port to the host. If not present, the first
> +   "cpu" port will be used.

So this option essentially allow us to "partition" the switch between
vectors of ports and their upstream/CPU port.

While using Device Tree is an obvious choice for making the initial
partitioning, it seems like we are missing a configuration mechanism
whereby we can properly assign ports to a specific upstream CPU port.

Let's move the actual discussion into patch 2 in order not to pollute
the DT maintainers' inbox.

> +
>  Port child nodes may also contain the following optional standardised
>  properties, described in binding documents:
>  
> @@ -71,7 +76,7 @@ properties, described in binding documents:
> Documentation/devicetree/bindings/net/fixed-link.txt
> for details.
>  
> -Example
> +Examples
>  
>  The following example shows three switches on three MDIO busses,
>  linked into one DSA cluster.
> @@ -264,6 +269,60 @@ linked into one DSA cluster.
>   };
>  };
>  
> +The following example shows a switch that has two cpu ports each connecting
> +to a different MAC.
> +
> +&mdio0 {
> + switch@0 {
> + compatible = "mediatek,mt7530";
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0>;
> +
> + ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + port@0 {
> + reg = <0>;
> + label = "lan0";
> + cpu = <&cpu_port1>;
> + };
> +
> + port@1 {
> + reg = <1>;
> + label = "lan1";
> + cpu = <&cpu_port1>;
> + };
> +
> + port@2 {
> + reg = <2>;
> + label = "lan2";
> + cpu = <&cpu_port1>;
> + };
> +
> + port@3 {
> + reg = <3>;
> + label = "wan";
> + cpu = <&cpu_port2>;
> + };
> +
> + cpu_port2: port@5 {
> + reg = <5>;
> + label = "cpu";
> + ethernet = <&gmac2>;
> + phy-mode = "trgmii";
> + };
> +
> + cpu_port1: port@6 {
> + reg = <6>;
> + label = "cpu";
> + ethernet = <&gmac1>;
> + phy-mode = "trgmii";
> + };
> + };
> + };
> +};
> +
>  Deprecated Binding
>  --
>  
> 


-- 
Florian


Re: [PATCH net-next 1/7] net: dsa: hide dsa_uses_tagged_protocol code

2017-05-30 Thread kbuild test robot
Hi Vivien,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Vivien-Didelot/net-dsa-tagger-simplification/20170531-032911
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `eth_type_trans':
>> (.text+0xbf5f2): undefined reference to `dsa_uses_tagged_protocol'
   net/built-in.o: In function `ic_close_devs':
>> ipconfig.c:(.init.text+0x72c2): undefined reference to 
>> `dsa_uses_tagged_protocol'
   net/built-in.o: In function `ip_auto_config':
   ipconfig.c:(.init.text+0x8882): undefined reference to 
`dsa_uses_tagged_protocol'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH net-next v2 6/6] net: dsa: factor skb freeing on xmit

2017-05-30 Thread Florian Fainelli
On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> The taggers are currently responsible to free the original SKB if they
> made a copy of it, or in case of error.
> 
> This patch simplifies this by freeing the original SKB in the
> dsa_slave_xmit caller, but only if an error (NULL) is returned.

Is not it a clearer contract if the tagging protocol must always free
the original SKB, whether it succeeded in creating a new one or not?

> 
> It is still the responsibility of the tagger to free the original SKB if
> it returned a copy of it.
> 
> At the same time, fix the checkpatch NULL comparison check:
> 
> CHECK: Comparison to NULL could be written "!nskb"
> #208: FILE: net/dsa/tag_trailer.c:35:
> + if (nskb == NULL)
> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/slave.c   | 8 ++--
>  net/dsa/tag_brcm.c| 6 +-
>  net/dsa/tag_dsa.c | 8 ++--
>  net/dsa/tag_edsa.c| 8 ++--
>  net/dsa/tag_lan9303.c | 5 +
>  net/dsa/tag_mtk.c | 6 +-
>  net/dsa/tag_qca.c | 6 +-
>  net/dsa/tag_trailer.c | 4 +---
>  8 files changed, 15 insertions(+), 36 deletions(-)
> 
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 887e26695519..4473aec9dcda 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -362,10 +362,14 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   dev->stats.tx_packets++;
>   dev->stats.tx_bytes += skb->len;
>  
> - /* Transmit function may have to reallocate the original SKB */
> + /* Transmit function may have to reallocate the original SKB,
> +  * in which case it must have freed it. Only free it here on error.
> +  */
>   nskb = p->xmit(skb, dev);
> - if (!nskb)
> + if (!nskb) {
> + kfree_skb(skb);
>   return NETDEV_TX_OK;
> + }
>  
>   /* SKB for netpoll still need to be mangled with the protocol-specific
>* tag to be successfully transmitted
> diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
> index 10fa4c0ca46b..5db6bcfde025 100644
> --- a/net/dsa/tag_brcm.c
> +++ b/net/dsa/tag_brcm.c
> @@ -65,7 +65,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
> struct net_device *dev
>   u8 *brcm_tag;
>  
>   if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
> - goto out_free;
> + return NULL;
>  
>   skb_push(skb, BRCM_TAG_LEN);
>  
> @@ -86,10 +86,6 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
> struct net_device *dev
>   brcm_tag[3] = (1 << p->dp->index) & BRCM_IG_DSTMAP1_MASK;
>  
>   return skb;
> -
> -out_free:
> - kfree_skb(skb);
> - return NULL;
>  }
>  
>  static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device 
> *dev)
> diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
> index 9f5fecaa4a93..6837ca9160a8 100644
> --- a/net/dsa/tag_dsa.c
> +++ b/net/dsa/tag_dsa.c
> @@ -28,7 +28,7 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>*/
>   if (skb->protocol == htons(ETH_P_8021Q)) {
>   if (skb_cow_head(skb, 0) < 0)
> - goto out_free;
> + return NULL;
>  
>   /*
>* Construct tagged FROM_CPU DSA tag from 802.1q tag.
> @@ -46,7 +46,7 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>   }
>   } else {
>   if (skb_cow_head(skb, DSA_HLEN) < 0)
> - goto out_free;
> + return NULL;
>   skb_push(skb, DSA_HLEN);
>  
>   memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN);
> @@ -62,10 +62,6 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   }
>  
>   return skb;
> -
> -out_free:
> - kfree_skb(skb);
> - return NULL;
>  }
>  
>  static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev)
> diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
> index a9a06e19abeb..96ddc90472a2 100644
> --- a/net/dsa/tag_edsa.c
> +++ b/net/dsa/tag_edsa.c
> @@ -30,7 +30,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>*/
>   if (skb->protocol == htons(ETH_P_8021Q)) {
>   if (skb_cow_head(skb, DSA_HLEN) < 0)
> - goto out_free;
> + return NULL;
>   skb_push(skb, DSA_HLEN);
>  
>   memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN);
> @@ -55,7 +55,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   }
>   } else {
>   if (skb_cow_head(skb, EDSA_HLEN) < 0)
> - goto out_free;
> + return NULL;
>   skb_push(skb, EDSA_HLEN);
>  
>   memmove(skb->data, skb->data + EDSA_HLEN, 2 * ETH_ALEN);
> @@ -75,10 +75,6 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>  

Re: [PATCH net-next v2 5/6] net: dsa: remove out_drop label in taggers rcv

2017-05-30 Thread Florian Fainelli
On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> Many rcv functions from net/dsa/tag_*.c have a useless out_drop goto
> label which simply returns NULL. Kill it in favor of the obvious.

Why not

> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next v2 3/6] net: dsa: remove dsa_uses_tagged_protocol

2017-05-30 Thread Florian Fainelli
On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> Since dev->dsa_ptr is a pointer to a dsa_switch_tree, there is no need
> to have another inline helper just to check rcv.
> 
> Remove dsa_uses_tagged_protocol and check dsa_ptr && dsa_ptr->rcv
> together at the same time.
> 
> Signed-off-by: Vivien Didelot 

FYI, the part that matters here is that we know whether the master
network device actually uses a tagged DSA protocol, this is important
for e.g: bridge (see 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e). Some
drivers like bcmsysport.c also care about that in order to enable
specific switch tagging protocol parsing hints.

Reviewed-by: Florian Fainelli 

> ---
>  include/net/dsa.h | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 58297e4c6b31..4675b52c964c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -467,16 +467,10 @@ struct mii_bus *dsa_host_dev_to_mii_bus(struct device 
> *dev);
>  struct net_device *dsa_dev_to_net_device(struct device *dev);
>  
>  /* Keep inline for faster access in hot path */
> -static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
> -{
> - return dst->rcv != NULL;
> -}
> -
>  static inline bool netdev_uses_dsa(struct net_device *dev)
>  {
>  #if IS_ENABLED(CONFIG_NET_DSA)
> - if (dev->dsa_ptr != NULL)
> - return dsa_uses_tagged_protocol(dev->dsa_ptr);
> + return dev->dsa_ptr && dev->dsa_ptr->rcv;
>  #endif
>   return false;
>  }
> 


-- 
Florian


[PATCH] net: fix auto-loading of Marvell DSA driver

2017-05-30 Thread Russell King
Auto-loading of the Marvell DSA driver has stopped working with recent
kernels.  This seems to be due to the change of binding for DSA devices,
moving them from the platform bus to the MDIO bus.

In order for module auto-loading to work, we need to provide a MODALIAS
string in the uevent file for the device.  However, the device core does
not automatically provide this, and needs each bus_type to implement a
uevent method to generate these strings.  The MDIO bus does not provide
such a method, so no MODALIAS string is provided:

.# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent
DRIVER=mv88e6085
OF_NAME=switch
OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4
OF_COMPATIBLE_0=marvell,mv88e6085
OF_COMPATIBLE_N=1

In the case of OF-based devices, the solution is easy -
of_device_uevent_modalias() does the work for us.  After this is done,
the uevent file looks like this:

.# cat /sys/bus/mdio_bus/devices/f1072004.mdio-mii\:04/uevent
DRIVER=mv88e6085
OF_NAME=switch
OF_FULLNAME=/soc/internal-regs/mdio@72004/switch@4
OF_COMPATIBLE_0=marvell,mv88e6085
OF_COMPATIBLE_N=1
MODALIAS=of:NswitchTCmarvell,mv88e6085

which results in auto-loading of the Marvell DSA driver on Clearfog
platforms.

Fixes: c0405563a613 ("ARM: dts: armada-388-clearfog: Utilize new DSA binding")
Signed-off-by: Russell King 
---
 drivers/net/phy/mdio_bus.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 8e73f5f36e71..f99c21f78b63 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -658,6 +658,18 @@ static int mdio_bus_match(struct device *dev, struct 
device_driver *drv)
return 0;
 }
 
+static int mdio_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+   int rc;
+
+   /* Some devices have extra OF data and an OF-style MODALIAS */
+   rc = of_device_uevent_modalias(dev, env);
+   if (rc != -ENODEV)
+   return rc;
+
+   return 0;
+}
+
 #ifdef CONFIG_PM
 static int mdio_bus_suspend(struct device *dev)
 {
@@ -708,6 +720,7 @@ static const struct dev_pm_ops mdio_bus_pm_ops = {
 struct bus_type mdio_bus_type = {
.name   = "mdio_bus",
.match  = mdio_bus_match,
+   .uevent = mdio_uevent,
.pm = MDIO_BUS_PM_OPS,
 };
 EXPORT_SYMBOL(mdio_bus_type);
-- 
2.7.4



Re: [PATCH net-next v2 2/6] net: dsa: do not cast dst

2017-05-30 Thread Florian Fainelli
On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> dsa_ptr is not a void pointer anymore since Nov 2011, as of cf50dcc24f82
> ("dsa: Change dsa_uses_{dsa, trailer}_tags() into inline functions"),
> but an explicit dsa_switch_tree pointer, thus remove the (void *) cast.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next v2 1/6] net: dsa: comment hot path requirements

2017-05-30 Thread Florian Fainelli
On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> The DSA layer uses inline helpers and copies of the tagging functions
> for faster access in hot path. Add comments to detail that.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


[PATCH net-next 2/9] bpf: split bpf core interpreter

2017-05-30 Thread Alexei Starovoitov
split __bpf_prog_run() interpreter into stack allocation and execution parts.
The code section shrinks which helps interpreter performance in some cases.
   textdata bss dec hex filename
  26350   10328 624   3730291b6 kernel/bpf/core.o.before
  25777   10328 624   367298f79 kernel/bpf/core.o.after

Very short programs got slower (due to extra function call):
Before:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS
After:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS

Longer programs got faster:
Before:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS
After:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS

For real world production programs the difference is noise.

This patch is first step towards reducing interpreter stack consumption.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/core.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 339289402b96..abd410d394bc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -763,10 +763,10 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
  *
  * Decode and execute eBPF instructions.
  */
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
+   u64 *stack)
 {
-   u64 stack[MAX_BPF_STACK / sizeof(u64)];
-   u64 regs[MAX_BPF_REG], tmp;
+   u64 tmp;
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */
@@ -874,9 +874,6 @@ static unsigned int __bpf_prog_run(void *ctx, const struct 
bpf_insn *insn)
 #define CONT({ insn++; goto select_insn; })
 #define CONT_JMP ({ insn++; goto select_insn; })
 
-   FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
-   ARG1 = (u64) (unsigned long) ctx;
-
 select_insn:
goto *jumptable[insn->code];
 
@@ -1219,7 +1216,17 @@ static unsigned int __bpf_prog_run(void *ctx, const 
struct bpf_insn *insn)
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
-STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
+STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
+
+static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+{
+   u64 stack[MAX_BPF_STACK / sizeof(u64)];
+   u64 regs[MAX_BPF_REG];
+
+   FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
+   ARG1 = (u64) (unsigned long) ctx;
+   return ___bpf_prog_run(regs, insn, stack);
+}
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.9.3



[PATCH net-next 7/9] bpf: use different interpreter depending on required stack size

2017-05-30 Thread Alexei Starovoitov
16 __bpf_prog_run() interpreters for various stack sizes add .text
but not a lot comparing to run-time stack savings

   textdata bss dec hex filename
  26350   10328 624   3730291b6 kernel/bpf/core.o.before_split
  25777   10328 624   367298f79 kernel/bpf/core.o.after_split
  26970   10328 624   379229422 kernel/bpf/core.o.now

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/core.c | 40 +++-
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index abd410d394bc..774069ca18a7 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1218,16 +1218,38 @@ static unsigned int ___bpf_prog_run(u64 *regs, const 
struct bpf_insn *insn,
 }
 STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
 
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
-{
-   u64 stack[MAX_BPF_STACK / sizeof(u64)];
-   u64 regs[MAX_BPF_REG];
-
-   FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
-   ARG1 = (u64) (unsigned long) ctx;
-   return ___bpf_prog_run(regs, insn, stack);
+#define PROG_NAME(stack_size) __bpf_prog_run##stack_size
+#define DEFINE_BPF_PROG_RUN(stack_size) \
+static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct 
bpf_insn *insn) \
+{ \
+   u64 stack[stack_size / sizeof(u64)]; \
+   u64 regs[MAX_BPF_REG]; \
+\
+   FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
+   ARG1 = (u64) (unsigned long) ctx; \
+   return ___bpf_prog_run(regs, insn, stack); \
 }
 
+#define EVAL1(FN, X) FN(X)
+#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y)
+#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y)
+#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y)
+#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
+#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
+
+EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192);
+EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384);
+EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512);
+
+#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size),
+
+static unsigned int (*interpreters[])(const void *ctx,
+ const struct bpf_insn *insn) = {
+EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
+EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
+EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
+};
+
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
 {
@@ -1275,7 +1297,7 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
  */
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 {
-   fp->bpf_func = (void *) __bpf_prog_run;
+   fp->bpf_func = interpreters[round_down(fp->aux->stack_depth, 32) / 32];
 
/* eBPF JITs can rewrite the program in case constant
 * blinding is active. However, in case of error during
-- 
2.9.3



Re: [patch net-next 0/9] mlxsw: Support firmware flash

2017-05-30 Thread Or Gerlitz
On Sun, May 28, 2017 at 10:26 AM, Yotam Gigi  wrote:
> On 05/23/2017 06:38 PM, David Miller wrote:
>> From: Yotam Gigi 
>> Date: Tue, 23 May 2017 18:14:15 +0300

>>> Sorry, I am not sure I understand. You think that drivers should not 
>>> implement
>>> ethtool's flash_device callback anymore? do you have an alternative for
>>> firmware flash?

>> As stated, export an MTD device.

> So, after we have been going over MTD, it seems like it does not fit our needs
> at all.

> MTD device provides (erasable-)block access to a flash storage, where in our
> case the firmware burn process is just pouring a binary BLOB into the device.
> The driver is not aware of the internal storage used for storing the firmware 
> as
> it is not defined in our driver-hardware API.
>
> Needless to say that block access has no meaning in our case, so any solution
> that will involve MTD device to burn our firmware (if there is a solution at
> all) will be a workaround and will not fit MTD purpose.
>
> Apart for boot time firmware flash, which we have already pushed we would 
> really
> like to allow the user to ask for a specific firmware version. Do you have any
> other solution for us apart from "ethtool -f"?
>
> This problem is even more relevant in the Mellanox HCA driver team, which 
> would
> like to use that code in order to burn the HCA firmware, but not intend to
> trigger it on boot time, which means that must have a way for the user to
> trigger it.


Hi Dave,

We had few more emails on this thread with Jakub, and he's now happy
with our replies, so where do we go from here? could you comment on
Yotam's note.

thanks,

Or.


[PATCH net-next 1/9] bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode

2017-05-30 Thread Alexei Starovoitov
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 arch/arm64/net/bpf_jit_comp.c | 2 +-
 arch/powerpc/net/bpf_jit_comp64.c | 2 +-
 arch/s390/net/bpf_jit_comp.c  | 2 +-
 arch/sparc/net/bpf_jit_comp_64.c  | 2 +-
 arch/x86/net/bpf_jit_comp.c   | 2 +-
 include/linux/filter.h| 3 +++
 kernel/bpf/core.c | 2 +-
 kernel/bpf/verifier.c | 2 +-
 8 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 71f930501ade..b1d38eeb24f6 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -586,7 +586,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
break;
}
/* tail call */
-   case BPF_JMP | BPF_CALL | BPF_X:
+   case BPF_JMP | BPF_TAIL_CALL:
if (emit_bpf_tail_call(ctx))
return -EFAULT;
break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index aee2bb817ac6..a01366584a4b 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -938,7 +938,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
/*
 * Tail call
 */
-   case BPF_JMP | BPF_CALL | BPF_X:
+   case BPF_JMP | BPF_TAIL_CALL:
ctx->seen |= SEEN_TAILCALL;
bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
break;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 6e97a2e3fd8d..42ad3832586c 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -991,7 +991,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, 
struct bpf_prog *fp, int i
}
break;
}
-   case BPF_JMP | BPF_CALL | BPF_X:
+   case BPF_JMP | BPF_TAIL_CALL:
/*
 * Implicit input:
 *  B1: pointer to ctx
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 21de77419f48..4a52d34facf9 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1217,7 +1217,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
}
 
/* tail call */
-   case BPF_JMP | BPF_CALL |BPF_X:
+   case BPF_JMP | BPF_TAIL_CALL:
emit_tail_call(ctx);
break;
 
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index f58939393eef..fec12eaa0dec 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -877,7 +877,7 @@ xadd:   if (is_imm8(insn->off))
}
break;
 
-   case BPF_JMP | BPF_CALL | BPF_X:
+   case BPF_JMP | BPF_TAIL_CALL:
emit_bpf_tail_call(&prog);
break;
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 62d948f80730..a20ba40fcb73 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -57,6 +57,9 @@ struct bpf_prog_aux;
 #define BPF_REG_AX MAX_BPF_REG
 #define MAX_BPF_JIT_REG(MAX_BPF_REG + 1)
 
+/* unused opcode to mark special call to bpf_tail_call() helper */
+#define BPF_TAIL_CALL  0xf0
+
 /* As per nm, we expose JITed images as text (code) section for
  * kallsyms. That way, tools like perf can find it to match
  * addresses.
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dedf367f59bb..339289402b96 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -824,7 +824,7 @@ static unsigned int __bpf_prog_run(void *ctx, const struct 
bpf_insn *insn)
[BPF_ALU64 | BPF_NEG] = &&ALU64_NEG,
/* Call instruction */
[BPF_JMP | BPF_CALL] = &&JMP_CALL,
-   [BPF_JMP | BPF_CALL | BPF_X] = &&JMP_TAIL_CALL,
+   [BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
/* Jumps */
[BPF_JMP | BPF_JA] = &&JMP_JA,
[BPF_JMP | BPF_JEQ | BPF_X] = &&JMP_JEQ_X,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 339c8a1371de..28113d0e8e92 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3469,7 +3469,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 * that doesn't support bpf_tail_call yet
 */
insn->imm = 0;
-   insn->code |= BPF_X;
+   insn->code = BPF_JMP | BPF_TAIL_CALL;
continue;
}
 
-- 
2.9.3



[PATCH net-next 6/9] bpf: fix stack_depth usage by test_bpf.ko

2017-05-30 Thread Alexei Starovoitov
test_bpf.ko doesn't call verifier before selecting interpreter or JITing,
hence the tests need to manually specify the amount of stack they consume.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 lib/test_bpf.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index be88cbaadde3..070bde56474c 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -84,6 +84,7 @@ struct bpf_test {
} test[MAX_SUBTESTS];
int (*fill_helper)(struct bpf_test *self);
__u8 frag_data[MAX_DATA];
+   int stack_depth; /* for eBPF only, since tests don't call verifier */
 };
 
 /* Large test cases need separate allocation and fill handler. */
@@ -455,6 +456,7 @@ static int __bpf_fill_stxdw(struct bpf_test *self, int size)
 
self->u.ptr.insns = insn;
self->u.ptr.len = len;
+   self->stack_depth = 40;
 
return 0;
 }
@@ -2317,7 +2319,8 @@ static struct bpf_test tests[] = {
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x06, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0x10, 0xbf, 0x48, 0xd6, 0x43, 0xd6},
-   { { 38, 256 } }
+   { { 38, 256 } },
+   .stack_depth = 64,
},
/* BPF_ALU | BPF_MOV | BPF_X */
{
@@ -4169,6 +4172,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0xff } },
+   .stack_depth = 40,
},
{
"ST_MEM_B: Store/Load byte: max positive",
@@ -4181,6 +4185,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x7f } },
+   .stack_depth = 40,
},
{
"STX_MEM_B: Store/Load byte: max negative",
@@ -4194,6 +4199,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0xff } },
+   .stack_depth = 40,
},
{
"ST_MEM_H: Store/Load half word: max negative",
@@ -4206,6 +4212,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
{
"ST_MEM_H: Store/Load half word: max positive",
@@ -4218,6 +4225,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x7fff } },
+   .stack_depth = 40,
},
{
"STX_MEM_H: Store/Load half word: max negative",
@@ -4231,6 +4239,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
{
"ST_MEM_W: Store/Load word: max negative",
@@ -4243,6 +4252,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
{
"ST_MEM_W: Store/Load word: max positive",
@@ -4255,6 +4265,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x7fff } },
+   .stack_depth = 40,
},
{
"STX_MEM_W: Store/Load word: max negative",
@@ -4268,6 +4279,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
{
"ST_MEM_DW: Store/Load double word: max negative",
@@ -4280,6 +4292,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
{
"ST_MEM_DW: Store/Load double word: max negative 2",
@@ -4297,6 +4310,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x1 } },
+   .stack_depth = 40,
},
{
"ST_MEM_DW: Store/Load double word: max positive",
@@ -4309,6 +4323,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x7fff } },
+   .stack_depth = 40,
},
{
"STX_MEM_DW: Store/Load double word: max negative",
@@ -4322,6 +4337,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x } },
+   .stack_depth = 40,
},
/* BPF_STX | BPF_XADD | BPF_W/DW */
{
@@ -4336,6 +4352,7 @@ static struct bpf_test tests[] = {
INTERNAL,
{ },
{ { 0, 0x22 } },
+   .stack_depth = 40,
},
{
"STX_XADD_W: Test side-effects, r10: 0x12 + 0x10 = 0x22",
@@ -4351,6 +4368,7 @@

[PATCH net-next 0/9] bpf: stack depth tracking

2017-05-30 Thread Alexei Starovoitov
Introduce tracking of bpf program stack depth in the verifier and use that
info to reduce bpf program stack consumption in the interpreter and x64 JIT.
Other JITs can take advantage of it as well in the future.
Most of the programs consume very little stack, so it's good optimization
in general and it's the first step toward bpf to bpf function calls.

Also use internal opcode for bpf_tail_call() marking to make clear
that jmp|call|x opcode is not uapi and may be used for actual
indirect call opcode in the future.

Alexei Starovoitov (9):
  bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode
  bpf: split bpf core interpreter
  bpf: teach verifier to track stack depth
  bpf: reconcile bpf_tail_call and stack_depth
  bpf: track stack depth of classic bpf programs
  bpf: fix stack_depth usage by test_bpf.ko
  bpf: use different interpreter depending on required stack size
  bpf: change x86 JITed program stack layout
  bpf: take advantage of stack_depth tracking in x64 JIT

 arch/arm64/net/bpf_jit_comp.c |  2 +-
 arch/powerpc/net/bpf_jit_comp64.c |  2 +-
 arch/s390/net/bpf_jit_comp.c  |  2 +-
 arch/sparc/net/bpf_jit_comp_64.c  |  2 +-
 arch/x86/net/bpf_jit.S| 20 ++--
 arch/x86/net/bpf_jit_comp.c   | 65 +--
 include/linux/bpf.h   |  1 +
 include/linux/filter.h|  3 ++
 kernel/bpf/core.c | 47 ++--
 kernel/bpf/verifier.c | 13 ++--
 lib/test_bpf.c| 25 ++-
 net/core/filter.c | 36 +-
 12 files changed, 147 insertions(+), 71 deletions(-)

-- 
2.9.3



[PATCH net-next 9/9] bpf: take advantage of stack_depth tracking in x64 JIT

2017-05-30 Thread Alexei Starovoitov
Take advantage of stack_depth tracking in x64 JIT.
Round up allocated stack by 8 bytes to make sure it stays aligned
for functions called from JITed bpf program.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 arch/x86/net/bpf_jit_comp.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c96dac838f3e..617eac9c4511 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -206,7 +206,7 @@ struct jit_context {
 /* emit x64 prologue code for BPF program and check it's size.
  * bpf_tail_call helper will skip it while jumping into another program
  */
-static void emit_prologue(u8 **pprog)
+static void emit_prologue(u8 **pprog, u32 stack_depth)
 {
u8 *prog = *pprog;
int cnt = 0;
@@ -214,8 +214,9 @@ static void emit_prologue(u8 **pprog)
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
 
-   /* sub rsp, MAX_BPF_STACK + AUX_STACK_SPACE */
-   EMIT3_off32(0x48, 0x81, 0xEC, MAX_BPF_STACK + AUX_STACK_SPACE);
+   /* sub rsp, rounded_stack_depth + AUX_STACK_SPACE */
+   EMIT3_off32(0x48, 0x81, 0xEC,
+   round_up(stack_depth, 8) + AUX_STACK_SPACE);
 
/* sub rbp, AUX_STACK_SPACE */
EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE);
@@ -363,7 +364,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 
*image,
int proglen = 0;
u8 *prog = temp;
 
-   emit_prologue(&prog);
+   emit_prologue(&prog, bpf_prog->aux->stack_depth);
 
if (seen_ld_abs)
emit_load_skb_data_hlen(&prog);
-- 
2.9.3



[PATCH net-next 8/9] bpf: change x86 JITed program stack layout

2017-05-30 Thread Alexei Starovoitov
in order to JIT programs with different stack sizes we need to
make epilogue and exception path to be stack size independent,
hence move auxiliary stack space from the bottom of the stack
to the top of the stack.
Nice side effect is that JITed function prologue becomes shorter
due to imm8 offset encoding vs imm32.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 arch/x86/net/bpf_jit.S  | 20 +++-
 arch/x86/net/bpf_jit_comp.c | 58 -
 2 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
index f2a7faf4706e..b33093f84528 100644
--- a/arch/x86/net/bpf_jit.S
+++ b/arch/x86/net/bpf_jit.S
@@ -19,9 +19,6 @@
  */
 #define SKBDATA%r10
 #define SKF_MAX_NEG_OFF$(-0x20) /* SKF_LL_OFF from filter.h */
-#define MAX_BPF_STACK (512 /* from filter.h */ + \
-   32 /* space for rbx,r13,r14,r15 */ + \
-   8 /* space for skb_copy_bits */)
 
 #define FUNC(name) \
.globl name; \
@@ -66,7 +63,7 @@ FUNC(sk_load_byte_positive_offset)
 
 /* rsi contains offset and can be scratched */
 #define bpf_slow_path_common(LEN)  \
-   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
+   lea 32(%rbp), %rdx;\
FRAME_BEGIN;\
mov %rbx, %rdi; /* arg1 == skb */   \
push%r9;\
@@ -83,14 +80,14 @@ FUNC(sk_load_byte_positive_offset)
 bpf_slow_path_word:
bpf_slow_path_common(4)
js  bpf_error
-   mov - MAX_BPF_STACK + 32(%rbp),%eax
+   mov 32(%rbp),%eax
bswap   %eax
ret
 
 bpf_slow_path_half:
bpf_slow_path_common(2)
js  bpf_error
-   mov - MAX_BPF_STACK + 32(%rbp),%ax
+   mov 32(%rbp),%ax
rol $8,%ax
movzwl  %ax,%eax
ret
@@ -98,7 +95,7 @@ FUNC(sk_load_byte_positive_offset)
 bpf_slow_path_byte:
bpf_slow_path_common(1)
js  bpf_error
-   movzbl  - MAX_BPF_STACK + 32(%rbp),%eax
+   movzbl  32(%rbp),%eax
ret
 
 #define sk_negative_common(SIZE)   \
@@ -148,9 +145,10 @@ FUNC(sk_load_byte_negative_offset)
 bpf_error:
 # force a return 0 from jit handler
xor %eax,%eax
-   mov - MAX_BPF_STACK(%rbp),%rbx
-   mov - MAX_BPF_STACK + 8(%rbp),%r13
-   mov - MAX_BPF_STACK + 16(%rbp),%r14
-   mov - MAX_BPF_STACK + 24(%rbp),%r15
+   mov (%rbp),%rbx
+   mov 8(%rbp),%r13
+   mov 16(%rbp),%r14
+   mov 24(%rbp),%r15
+   add $40, %rbp
leaveq
ret
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index fec12eaa0dec..c96dac838f3e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -197,12 +197,11 @@ struct jit_context {
 #define BPF_MAX_INSN_SIZE  128
 #define BPF_INSN_SAFETY64
 
-#define STACKSIZE \
-   (MAX_BPF_STACK + \
-32 /* space for rbx, r13, r14, r15 */ + \
+#define AUX_STACK_SPACE \
+   (32 /* space for rbx, r13, r14, r15 */ + \
 8 /* space for skb_copy_bits() buffer */)
 
-#define PROLOGUE_SIZE 48
+#define PROLOGUE_SIZE 37
 
 /* emit x64 prologue code for BPF program and check it's size.
  * bpf_tail_call helper will skip it while jumping into another program
@@ -215,13 +214,16 @@ static void emit_prologue(u8 **pprog)
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
 
-   /* sub rsp, STACKSIZE */
-   EMIT3_off32(0x48, 0x81, 0xEC, STACKSIZE);
+   /* sub rsp, MAX_BPF_STACK + AUX_STACK_SPACE */
+   EMIT3_off32(0x48, 0x81, 0xEC, MAX_BPF_STACK + AUX_STACK_SPACE);
+
+   /* sub rbp, AUX_STACK_SPACE */
+   EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE);
 
/* all classic BPF filters use R6(rbx) save it */
 
-   /* mov qword ptr [rbp-X],rbx */
-   EMIT3_off32(0x48, 0x89, 0x9D, -STACKSIZE);
+   /* mov qword ptr [rbp+0],rbx */
+   EMIT4(0x48, 0x89, 0x5D, 0);
 
/* bpf_convert_filter() maps classic BPF register X to R7 and uses R8
 * as temporary, so all tcpdump filters need to spill/fill R7(r13) and
@@ -231,12 +233,12 @@ static void emit_prologue(u8 **pprog)
 * than synthetic ones. Therefore not worth adding complexity.
 */
 
-   /* mov qword ptr [rbp-X],r13 */
-   EMIT3_off32(0x4C, 0x89, 0xAD, -STACKSIZE + 8);
-   /* mov qword ptr [rbp-X],r14 */
-   EMIT3_off32(0x4C, 0x89, 0xB5, -STACKSIZE + 16);
-   /* mov qword ptr [rbp-X],r15 */
-   EMIT3_off32(0x4C, 0x89, 0xBD, -STACKSIZE + 24);
+   /* mov qword ptr [rbp+8],r13 */
+   EMIT4(0x4C, 0x89, 0x6D, 8);
+   /* mov qword ptr [rbp+16],r14 */
+   EMIT4(0x4C, 0x89, 0x75, 16);
+   /* mov qword ptr [rbp+24],r15 */
+   EMIT4(0x4C, 0x89, 0x7D, 24);
 
/* Clear the tail call counter (tail_call_cnt): for eBPF tail calls
 * we need to reset the c

[PATCH net-next 5/9] bpf: track stack depth of classic bpf programs

2017-05-30 Thread Alexei Starovoitov
To track stack depth of classic bpf programs we only need
to analyze ST|STX instructions, since check_load_and_stores()
verifies that programs can load from stack only after write.

We also need to change the way cBPF stack slots map to eBPF stack,
since typical classic programs are using slots 0 and 1, so they
need to map to stack offsets -4 and -8 respectively in order
to take advantage of small stack interpreter and JITs.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 net/core/filter.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a6bb95fa87b2..946f758d44f2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -352,7 +352,7 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
  * bpf_convert_filter - convert filter program
  * @prog: the user passed filter program
  * @len: the length of the user passed filter program
- * @new_prog: buffer where converted program will be stored
+ * @new_prog: allocated 'struct bpf_prog' or NULL
  * @new_len: pointer to store length of converted program
  *
  * Remap 'sock_filter' style classic BPF (cBPF) instruction set to 'bpf_insn'
@@ -364,14 +364,13 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
  *
  * 2) 2nd pass to remap in two passes: 1st pass finds new
  *jump offsets, 2nd pass remapping:
- *   new_prog = kmalloc(sizeof(struct bpf_insn) * new_len);
  *   bpf_convert_filter(old_prog, old_len, new_prog, &new_len);
  */
 static int bpf_convert_filter(struct sock_filter *prog, int len,
- struct bpf_insn *new_prog, int *new_len)
+ struct bpf_prog *new_prog, int *new_len)
 {
-   int new_flen = 0, pass = 0, target, i;
-   struct bpf_insn *new_insn;
+   int new_flen = 0, pass = 0, target, i, stack_off;
+   struct bpf_insn *new_insn, *first_insn = NULL;
struct sock_filter *fp;
int *addrs = NULL;
u8 bpf_src;
@@ -383,6 +382,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int 
len,
return -EINVAL;
 
if (new_prog) {
+   first_insn = new_prog->insnsi;
addrs = kcalloc(len, sizeof(*addrs),
GFP_KERNEL | __GFP_NOWARN);
if (!addrs)
@@ -390,11 +390,11 @@ static int bpf_convert_filter(struct sock_filter *prog, 
int len,
}
 
 do_pass:
-   new_insn = new_prog;
+   new_insn = first_insn;
fp = prog;
 
/* Classic BPF related prologue emission. */
-   if (new_insn) {
+   if (new_prog) {
/* Classic BPF expects A and X to be reset first. These need
 * to be guaranteed to be the first two instructions.
 */
@@ -415,7 +415,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int 
len,
struct bpf_insn *insn = tmp_insns;
 
if (addrs)
-   addrs[i] = new_insn - new_prog;
+   addrs[i] = new_insn - first_insn;
 
switch (fp->code) {
/* All arithmetic insns and skb loads map as-is. */
@@ -561,17 +561,25 @@ static int bpf_convert_filter(struct sock_filter *prog, 
int len,
/* Store to stack. */
case BPF_ST:
case BPF_STX:
+   stack_off = fp->k * 4  + 4;
*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, 
BPF_CLASS(fp->code) ==
BPF_ST ? BPF_REG_A : BPF_REG_X,
-   -(BPF_MEMWORDS - fp->k) * 4);
+   -stack_off);
+   /* check_load_and_stores() verifies that classic BPF can
+* load from stack only after write, so tracking
+* stack_depth for ST|STX insns is enough
+*/
+   if (new_prog && new_prog->aux->stack_depth < stack_off)
+   new_prog->aux->stack_depth = stack_off;
break;
 
/* Load from stack. */
case BPF_LD | BPF_MEM:
case BPF_LDX | BPF_MEM:
+   stack_off = fp->k * 4  + 4;
*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == 
BPF_LD  ?
BPF_REG_A : BPF_REG_X, BPF_REG_FP,
-   -(BPF_MEMWORDS - fp->k) * 4);
+   -stack_off);
break;
 
/* A = K or X = K */
@@ -619,13 +627,13 @@ static int bpf_convert_filter(struct sock_filter *prog, 
int len,
 
if (!new_prog) {
/* Only calculating new length. */
-   *new_len = new_insn - new_prog;
+   *new_len = new_insn - first_insn;

[PATCH net-next 3/9] bpf: teach verifier to track stack depth

2017-05-30 Thread Alexei Starovoitov
teach verifier to track bpf program stack depth

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/verifier.c | 10 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d76faf4..fcc80ca11045 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -171,6 +171,7 @@ struct bpf_prog_aux {
atomic_t refcnt;
u32 used_map_cnt;
u32 max_ctx_offset;
+   u32 stack_depth;
struct latch_tree_node ksym_tnode;
struct list_head ksym_lnode;
const struct bpf_verifier_ops *ops;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 28113d0e8e92..d96f27ff9f6f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -926,6 +926,10 @@ static int check_mem_access(struct bpf_verifier_env *env, 
u32 regno, int off,
verbose("invalid stack off=%d size=%d\n", off, size);
return -EACCES;
}
+
+   if (env->prog->aux->stack_depth < -off)
+   env->prog->aux->stack_depth = -off;
+
if (t == BPF_WRITE) {
if (!env->allow_ptr_leaks &&
state->stack_slot_type[MAX_BPF_STACK + off] == 
STACK_SPILL &&
@@ -1032,6 +1036,9 @@ static int check_stack_boundary(struct bpf_verifier_env 
*env, int regno,
return -EACCES;
}
 
+   if (env->prog->aux->stack_depth < -off)
+   env->prog->aux->stack_depth = -off;
+
if (meta && meta->raw_mode) {
meta->access_size = access_size;
meta->regno = regno;
@@ -3167,7 +3174,8 @@ static int do_check(struct bpf_verifier_env *env)
insn_idx++;
}
 
-   verbose("processed %d insns\n", insn_processed);
+   verbose("processed %d insns, stack depth %d\n",
+   insn_processed, env->prog->aux->stack_depth);
return 0;
 }
 
-- 
2.9.3



[PATCH net-next 4/9] bpf: reconcile bpf_tail_call and stack_depth

2017-05-30 Thread Alexei Starovoitov
The next set of patches will take advantage of stack_depth tracking,
so make sure that the program that does bpf_tail_call() has
stack depth large enough for the callee.
We could have tracked the stack depth of the prog_array owner program
and only allow insertion of the programs with stack depth less
than the owner, but it will break existing applications.
Some of them have trivial root bpf program that only does
multiple bpf_tail_calls and at init time the prog array is empty.
In the future we may add a flag to do such tracking optionally,
but for now play simple and safe.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d96f27ff9f6f..14ccb0759fa4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3470,6 +3470,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 * the program array.
 */
prog->cb_access = 1;
+   env->prog->aux->stack_depth = MAX_BPF_STACK;
 
/* mark bpf_tail_call as different opcode to avoid
 * conditional branch in the interpeter for every normal
-- 
2.9.3



Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv

2017-05-30 Thread Vivien Didelot
Florian Fainelli  writes:

>> Is orig_dev really needed in the tagging implementation, or only in the
>> layer above? (dsa_slave_xmit and dsa_switch_rcv.)
>
> It's needed:
>
> https://github.com/ffainelli/linux/commits/switch-tag
> https://github.com/ffainelli/linux/commit/61f9ca70dd21bb8c1615f959934cd0ce80a2f3ce

Patch dropped!


Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv

2017-05-30 Thread Florian Fainelli
On 05/30/2017 12:55 PM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> I actually have a patch pending that adds support for HW
>> insertion/extraction of switch tags (broadcom HW supports that) which
>> require the orig_dev to be there so we know what features are supported
>> by the master network device.
> 
> Is orig_dev really needed in the tagging implementation, or only in the
> layer above? (dsa_slave_xmit and dsa_switch_rcv.)

It's needed:

https://github.com/ffainelli/linux/commits/switch-tag
https://github.com/ffainelli/linux/commit/61f9ca70dd21bb8c1615f959934cd0ce80a2f3ce

> 
>> Do you mind dropping this one from your patch series?
> 
> I don't mind if they are actually needed in taggers. I'd wait for
> reviews of the other patches before respinning though.
> 
> Thanks,
> 
> Vivien
> 


-- 
Florian


Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv

2017-05-30 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> I actually have a patch pending that adds support for HW
> insertion/extraction of switch tags (broadcom HW supports that) which
> require the orig_dev to be there so we know what features are supported
> by the master network device.

Is orig_dev really needed in the tagging implementation, or only in the
layer above? (dsa_slave_xmit and dsa_switch_rcv.)

> Do you mind dropping this one from your patch series?

I don't mind if they are actually needed in taggers. I'd wait for
reviews of the other patches before respinning though.

Thanks,

Vivien


Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Vivien Didelot
Hi John,

Vivien Didelot  writes:

>> +int port_cpu = ds->ports[port].upstream;
>
> ds->ports[port] is p->dp.

I misread this part, p is not yet allocated in that chunk, please ignore
this one comment ;-)

Thanks,

Vivien


Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Vivien Didelot
Hi John,

John Crispin  writes:

> +static inline bool dsa_is_upstream_port(struct dsa_switch *ds, int p)
> +{
> + return dsa_is_cpu_port(ds, p) || dsa_is_dsa_port(ds, p);
> +}

This looks confusing to me. What DSA calls an "upstream" port for the
moment is the port which goes to the CPU interface.

 CPU0 (eth0)
   |
   | sw0 sw1 sw2
 [p0 p1 p2]--[p0 p1 p2]--[p0 p1 p2]
  |   |   |  |
  eth1 eth2eth3  eth4

So in the example above, sw1p0 is an upstream port, but sw1p2 is not.
This is why dsa_upstream_port makes use of ds->rtable.

> @@ -1140,11 +1140,16 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
> device *parent,
>   struct net_device *master;
>   struct net_device *slave_dev;
>   struct dsa_slave_priv *p;
> + int port_cpu = ds->ports[port].upstream;

ds->ports[port] is p->dp.

>   int ret;
>  
> - master = ds->dst->master_netdev;
> - if (ds->master_netdev)
> + if (port_cpu && ds->ports[port_cpu].ethernet)

0 is a valid port index for a CPU, e.g. Marvell 88E6390.

> + master = ds->ports[port_cpu].ethernet;
> + else if (ds->master_netdev)
>   master = ds->master_netdev;
> + else
> + master = ds->dst->master_netdev;
> + master->dsa_ptr = (void *)ds->dst;
>  
>   slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
>NET_NAME_UNKNOWN, ether_setup);
> @@ -1173,6 +1178,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
> device *parent,
>   p->dp = &ds->ports[port];
>   INIT_LIST_HEAD(&p->mall_tc_list);
>   p->xmit = dst->tag_ops->xmit;
> + p->master = master;

I'm a bit confused why we need all these references to net devices. We
now have ds->master_netdev, dst->master_netdev, dp->ethernet and
p->master...

Wouldn't it be simpler if we only had dp->ethernet (or whichever more
explicit name) for the conduit interface used to send/receive frames?

Maybe I am missing something, in which case I'm sorry in advance.


Thanks,

Vivien


Re: [net-wireless-orinoco] question about potential null pointer dereference

2017-05-30 Thread Johannes Berg
Hi,

> The issue here is that line 56 implies that pointer tfm_michael
> might be NULL. If this is the case, there is a potential NULL
> pointer dereference at line 52 once pointer tfm_michael is
> indirectly dereferenced inside macro SHASH_DESC_ON_STACK().
> 
> My question is if there is any chance that pointer tfm_michael
> might be NULL when calling macro SHASH_DESC_ON_STACK() ?
> 
> I'm trying to figure out if this is a false positive or something
> that needs to be fixed somehow.

Look, if you're just sending the coverity reports to the list without
reflecting and researching them, that's not actually useful - we can
look at them ourselves.

It took me at most a few minutes to figure this one out, so please just
do the same, look at the code, and figure out what the right answer is
here.

johannes


Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Florian Fainelli
On 05/30/2017 12:15 PM, Florian Fainelli wrote:
> Hi John,
> 
> On 05/30/2017 11:37 AM, John Crispin wrote:
>> Hi,
>>
>> the patch series is based on net-next from 12 hours ago and works fine
>> on that tree. I compile and runtime tested it quite intensively on
>> various boards
> 
> The warning is legit though:
> 
> 572if (dsa_port_is_cpu(port))
> 573err = dsa_cpu_parse(port, index, dst, ds);
>   > 574else if (dsa_is_normal_port(port))
> 575err = dsa_user_parse(port->dn, index,  ds);
> 
> and after applying your patches I also met it:
> 
> net/dsa/dsa2.c: In function 'dsa_ds_parse':
> net/dsa/dsa2.c:574:3: warning: passing argument 1 of
> 'dsa_is_normal_port' from incompatible pointer type [enabled by default]
>else if (dsa_is_normal_port(port))
>^
> In file included from net/dsa/dsa_priv.h:17:0,
>  from net/dsa/dsa2.c:22:
> ./include/net/dsa.h:264:20: note: expected 'struct dsa_switch *' but
> argument is of type 'struct dsa_port *'
>  static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p)
> ^
> net/dsa/dsa2.c:574:3: error: too few arguments to function
> 'dsa_is_normal_port'
>else if (dsa_is_normal_port(port))
>^
> In file included from net/dsa/dsa_priv.h:17:0,
>  from net/dsa/dsa2.c:22:
> ./include/net/dsa.h:264:20: note: declared here
>  static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p)
> ^
>   CC  net/bridge/br_stp.o
> scripts/Makefile.build:302: recipe for target 'net/dsa/dsa2.o' failed
> make[4]: *** [net/dsa/dsa2.o] Error 1
> scripts/Makefile.build:561: recipe for target 'net/dsa' failed
> make[3]: *** [net/dsa] Error 2
> make[3]: *** Waiting for unfinished jobs
> 
> We need something like this, I have some comments on your patches that I
> will send shortly. Thanks!
> 
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index 8b13aa735c40..124c5acfa123 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port,
> u32 index,
> return 0;
>  }
> 
> -static int dsa_user_parse(struct device_node *port, u32 index,
> +static int dsa_user_parse(struct dsa_port *port, u32 index,
>   struct dsa_switch *ds)
>  {
> struct device_node *cpu_port;
> const unsigned int *cpu_port_reg;
> int cpu_port_index;
> 
> -   cpu_port = of_parse_phandle(port, "cpu", 0);
> +   cpu_port = of_parse_phandle(port->dn, "cpu", 0);
> if (cpu_port) {
> cpu_port_reg = of_get_property(cpu_port, "reg", NULL);
> if (!cpu_port_reg)
> @@ -572,7 +572,7 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst,
> struct dsa_switch *ds)
> if (dsa_port_is_cpu(port))
> err = dsa_cpu_parse(port, index, dst, ds);
> else if (dsa_is_normal_port(port))
> -   err = dsa_user_parse(port->dn, index,  ds);
> +   err = dsa_user_parse(port, index,  ds);
> 
> if (err)
> return err;

here is a version that builds, I missed one hunk:

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8b13aa735c40..d71a2a610340 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port,
u32 index,
return 0;
 }

-static int dsa_user_parse(struct device_node *port, u32 index,
+static int dsa_user_parse(struct dsa_port *port, u32 index,
  struct dsa_switch *ds)
 {
struct device_node *cpu_port;
const unsigned int *cpu_port_reg;
int cpu_port_index;

-   cpu_port = of_parse_phandle(port, "cpu", 0);
+   cpu_port = of_parse_phandle(port->dn, "cpu", 0);
if (cpu_port) {
cpu_port_reg = of_get_property(cpu_port, "reg", NULL);
if (!cpu_port_reg)
@@ -571,8 +571,8 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst,
struct dsa_switch *ds)

if (dsa_port_is_cpu(port))
err = dsa_cpu_parse(port, index, dst, ds);
-   else if (dsa_is_normal_port(port))
-   err = dsa_user_parse(port->dn, index,  ds);
+   else if (dsa_is_normal_port(ds, index))
+   err = dsa_user_parse(port, index,  ds);

if (err)
return err;

-- 
Florian


Re: [PATCH linux-firmware] Mellanox: Add firmware for mlxsw_spectrum

2017-05-30 Thread Kyle McMartin
On Mon, May 29, 2017 at 01:42:28PM +0300, Yotam Gigi wrote:
> Add first firmware for the Mellanox Spectrum switch, as a followup to the
> recently added commit:
> 6b7421992b8d ("mlxsw: spectrum: Validate firmware revision on init")
> 
> The version of the firmware release is 13.1420.122
> 
> Signed-off-by: Yotam Gigi 
> ---
>  mlxsw_spectrum-13.1420.122.mfa2 | Bin 0 -> 860424 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 mlxsw_spectrum-13.1420.122.mfa2
> 
> diff --git a/mlxsw_spectrum-13.1420.122.mfa2 b/mlxsw_spectrum-13.1420.122.mfa2
> new file mode 100644

Can we please put this in a subdirectory if we're going to be adding
more versions of this in the future?

--Kyle


Re: [PATCH v2] arm: eBPF JIT compiler

2017-05-30 Thread Kees Cook
Forwarding this to net-dev and eBPF folks, who weren't on CC...

-Kees

On Thu, May 25, 2017 at 4:13 PM, Shubham Bansal
 wrote:
> The JIT compiler emits ARM 32 bit instructions. Currently, It supports
> eBPF only. Classic BPF is supported because of the conversion by BPF
> core.
>
> This patch is essentially changing the current implementation of JIT
> compiler of Berkeley Packet Filter from classic to internal with almost
> all instructions from eBPF ISA supported except the following
> BPF_ALU64 | BPF_DIV | BPF_K
> BPF_ALU64 | BPF_DIV | BPF_X
> BPF_ALU64 | BPF_MOD | BPF_K
> BPF_ALU64 | BPF_MOD | BPF_X
> BPF_STX | BPF_XADD | BPF_W
> BPF_STX | BPF_XADD | BPF_DW
> BPF_JMP | BPF_CALL
>
> Implementation is using scratch space to emulate 64 bit eBPF ISA on 32 bit
> ARM because of deficiency of general purpose registers on ARM. Currently,
> only LITTLE ENDIAN machines are supported in this eBPF JIT Compiler.
>
> Tested on ARMv7 with QEMU by me (Shubham Bansal).
> Tested on ARMv5 by Andrew Lunn (and...@lunn.ch).
> Expected to work on ARMv6 as well, as its a part ARMv7 and part ARMv5.
> Although, a proper testing is not done for ARMv6.
>
> Both of these testing are done with and without CONFIG_FRAME_POINTER
> separately for LITTLE ENDIAN machine.
>
> For testing:
>
> 1. JIT is enabled with
> echo 1 > /proc/sys/net/core/bpf_jit_enable
> 2. Constant Blinding can be enabled along with JIT using
> echo 1 > /proc/sys/net/core/bpf_jit_enable
> echo 2 > /proc/sys/net/core/bpf_jit_harden
>
> See Documentation/networking/filter.txt for more information.
>
> Result : test_bpf: Summary: 314 PASSED, 0 FAILED, [278/306 JIT'ed]
>
> Signed-off-by: Shubham Bansal 
> ---
>  Documentation/networking/filter.txt |4 +-
>  arch/arm/Kconfig|2 +-
>  arch/arm/net/bpf_jit_32.c   | 2404 
> ---
>  arch/arm/net/bpf_jit_32.h   |  108 +-
>  4 files changed, 1713 insertions(+), 805 deletions(-)
>
> diff --git a/Documentation/networking/filter.txt 
> b/Documentation/networking/filter.txt
> index b69b205..01165ac 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -596,8 +596,8 @@ skb pointer). All constraints and restrictions from 
> bpf_check_classic() apply
>  before a conversion to the new layout is being done behind the scenes!
>
>  Currently, the classic BPF format is being used for JITing on most 32-bit
> -architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform JIT
> -compilation from eBPF instruction set.
> +architectures, whereas x86-64, aarch64, arm, s390x, powerpc64, sparc64 
> perform
> +JIT compilation from eBPF instruction set.
>
>  Some core changes of the new internal format:
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 8a7ab5e..13ade46 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -47,7 +47,7 @@ config ARM
> select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARM_SMCCC if CPU_V7
> -   select HAVE_CBPF_JIT
> +   select HAVE_EBPF_JIT
> select HAVE_CC_STACKPROTECTOR
> select HAVE_CONTEXT_TRACKING
> select HAVE_C_RECORDMCOUNT
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index 93d0b6d..c7476e5 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -1,13 +1,15 @@
>  /*
> - * Just-In-Time compiler for BPF filters on 32bit ARM
> + * Just-In-Time compiler for eBPF filters on 32bit ARM
>   *
>   * Copyright (c) 2011 Mircea Gherzan 
> + * Copyright (c) 2017 Shubham Bansal 
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the
>   * Free Software Foundation; version 2 of the License.
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -23,44 +25,91 @@
>
>  #include "bpf_jit_32.h"
>
> +int bpf_jit_enable __read_mostly;
> +
> +#define STACK_OFFSET(k)(k)
> +#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)   /* TEMP Register 1 */
> +#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)   /* TEMP Register 2 */
> +#define TCALL_CNT  (MAX_BPF_JIT_REG + 2)   /* Tail Call Count */
> +
> +/* Flags used for JIT optimization */
> +#define SEEN_CALL  (1 << 0)
> +
> +#define FLAG_IMM_OVERFLOW  (1 << 0)
> +
>  /*
> - * ABI:
> + * Map eBPF registers to ARM 32bit registers or stack scratch space.
> + *
> + * 1. First argument is passed using the arm 32bit registers and rest of the
> + * arguments are passed on stack scratch space.
> + * 2. First callee-saved aregument is mapped to arm 32 bit registers and rest
> + * arguments are mapped to scratch space on stack.
> + * 3. We need two 64 bit temp registers to do complex operations on eBPF
> + * registers.
> + *
> + * As the eBPF registers are all 64 bit registers and arm has only 3

Re: [PATCH V2 2/3] net-next: dsa: add multi cpu port support

2017-05-30 Thread Florian Fainelli
Hi John,

On 05/30/2017 11:37 AM, John Crispin wrote:
> Hi,
> 
> the patch series is based on net-next from 12 hours ago and works fine
> on that tree. I compile and runtime tested it quite intensively on
> various boards

The warning is legit though:

572if (dsa_port_is_cpu(port))
573err = dsa_cpu_parse(port, index, dst, ds);
  > 574else if (dsa_is_normal_port(port))
575err = dsa_user_parse(port->dn, index,  ds);

and after applying your patches I also met it:

net/dsa/dsa2.c: In function 'dsa_ds_parse':
net/dsa/dsa2.c:574:3: warning: passing argument 1 of
'dsa_is_normal_port' from incompatible pointer type [enabled by default]
   else if (dsa_is_normal_port(port))
   ^
In file included from net/dsa/dsa_priv.h:17:0,
 from net/dsa/dsa2.c:22:
./include/net/dsa.h:264:20: note: expected 'struct dsa_switch *' but
argument is of type 'struct dsa_port *'
 static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p)
^
net/dsa/dsa2.c:574:3: error: too few arguments to function
'dsa_is_normal_port'
   else if (dsa_is_normal_port(port))
   ^
In file included from net/dsa/dsa_priv.h:17:0,
 from net/dsa/dsa2.c:22:
./include/net/dsa.h:264:20: note: declared here
 static inline bool dsa_is_normal_port(struct dsa_switch *ds, int p)
^
  CC  net/bridge/br_stp.o
scripts/Makefile.build:302: recipe for target 'net/dsa/dsa2.o' failed
make[4]: *** [net/dsa/dsa2.o] Error 1
scripts/Makefile.build:561: recipe for target 'net/dsa' failed
make[3]: *** [net/dsa] Error 2
make[3]: *** Waiting for unfinished jobs

We need something like this, I have some comments on your patches that I
will send shortly. Thanks!

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8b13aa735c40..124c5acfa123 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -539,14 +539,14 @@ static int dsa_cpu_parse(struct dsa_port *port,
u32 index,
return 0;
 }

-static int dsa_user_parse(struct device_node *port, u32 index,
+static int dsa_user_parse(struct dsa_port *port, u32 index,
  struct dsa_switch *ds)
 {
struct device_node *cpu_port;
const unsigned int *cpu_port_reg;
int cpu_port_index;

-   cpu_port = of_parse_phandle(port, "cpu", 0);
+   cpu_port = of_parse_phandle(port->dn, "cpu", 0);
if (cpu_port) {
cpu_port_reg = of_get_property(cpu_port, "reg", NULL);
if (!cpu_port_reg)
@@ -572,7 +572,7 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst,
struct dsa_switch *ds)
if (dsa_port_is_cpu(port))
err = dsa_cpu_parse(port, index, dst, ds);
else if (dsa_is_normal_port(port))
-   err = dsa_user_parse(port->dn, index,  ds);
+   err = dsa_user_parse(port, index,  ds);

if (err)
return err;


> 
> John
> 
> 
> On 30/05/17 17:38, kbuild test robot wrote:
>> Hi John,
>>
>> [auto build test ERROR on net-next/master]
>> [also build test ERROR on next-20170530]
>> [cannot apply to v4.12-rc3]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improve the system]
>>
>> url:   
>> https://github.com/0day-ci/linux/commits/John-Crispin/Documentation-devicetree-add-multiple-cpu-port-DSA-binding/20170530-224954
>>
>> config: x86_64-randconfig-x014-201722 (attached as .config)
>> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>> reproduce:
>>  # save the attached .config to linux build tree
>>  make ARCH=x86_64
>>
>> All error/warnings (new ones prefixed by >>):
>>
>> In file included from include/linux/ioport.h:12:0,
>>  from include/linux/device.h:16,
>>  from net//dsa/dsa2.c:13:
>> net//dsa/dsa2.c: In function 'dsa_ds_parse':
>>>> net//dsa/dsa2.c:574:31: error: passing argument 1 of
>>>> 'dsa_is_normal_port' from incompatible pointer type
>>>> [-Werror=incompatible-pointer-types]
>>else if (dsa_is_normal_port(port))
>>^
>> include/linux/compiler.h:160:30: note: in definition of macro
>> '__trace_if'
>>   if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
>>   ^~~~
>>>> net//dsa/dsa2.c:574:8: note: in expansion of macro 'if'
>>else if (dsa_is_normal_port(port))
>> ^~
>> In file included from net//dsa/dsa_priv.h:17:0,
>> 

Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types

2017-05-30 Thread Peter Zijlstra
On Tue, May 30, 2017 at 10:37:46AM -0700, Alexei Starovoitov wrote:
> On 5/30/17 9:51 AM, Peter Zijlstra wrote:

> > I'm not entirely sure I see how that is required. Should per task not
> > already work? The WARN that's there will only trigger if you call them
> > on the wrong task, which is something you shouldn't do anyway.
> 
> The kernel WARN is considered to be a bug of bpf infra. That's the
> reason we do all these checks at map update time and at run-time.
> The bpf program authors should be able to do all possible experiments
> until their scripts work. Dealing with kernel warns and reboots is not
> something user space folks like to do.
> Today bpf_perf_event_read() for per-task events isn't really
> working due to event->oncpu != cpu runtime check in there.
> If we convert warns to returns the existing scripts will continue
> to work as-is and per-task will be possible.

Ah yes.. I always forget that side of things (not ever having seen a
bpf thing working doesn't help either of course).

Let me consider that for a bit..


Re: [PATCH net-next v2 4/6] net: dsa: remove unused arguments of tagger rcv

2017-05-30 Thread Florian Fainelli
Hey Vivien,

On 05/30/2017 11:33 AM, Vivien Didelot wrote:
> The struct dsa_device_ops defines the rcv function with 2 unused
> arguments struct packet_type *pt, and struct net_device *orig_dev.
> 
> This patch removes them from the definition and implementations.
> 
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Vivien Didelot 

I actually have a patch pending that adds support for HW
insertion/extraction of switch tags (broadcom HW supports that) which
require the orig_dev to be there so we know what features are supported
by the master network device.

Do you mind dropping this one from your patch series?

Thanks!
-- 
Florian


  1   2   3   >