Re: commit : ppp: add rtnetlink device creation support - breaks netcf on my machine.

2016-12-05 Thread Brad Campbell

On 06/12/16 01:53, Guillaume Nault wrote:



> Probably not a mistake on your side. I've started looking at netcf's
> source code, but haven't found anything that could explain your issue.
> It'd really help if you could provide steps to reproduce the bug.


Further to my message this morning, I started with a clean linux.git 
4.9.0-rc7-00198-g0cb65c8 and did two runs. One untouched and one with 
the identified patch reverted. I logged both of these with NLCB=debug, 
then split out the ppp section and diffed them.


It appears the only difference of note is the new ATTR 18. I did a diff 
of the entire dump for both and nothing else popped out.



brad@test:~$ diff -u ppp-ok ppp-fail
--- ppp-ok  2016-12-06 13:32:04.358393578 +0800
+++ ppp-fail  2016-12-06 13:32:18.577864406 +0800
@@ -1,10 +1,10 @@
 --   BEGIN NETLINK MESSAGE 
---

   [HEADER] 16 octets
-.nlmsg_len = 628
+.nlmsg_len = 644
 .nlmsg_type = 16 
 .nlmsg_flags = 2 
-.nlmsg_seq = 1481001940
-.nlmsg_pid = 7462
+.nlmsg_seq = 1481002252
+.nlmsg_pid = 7376
   [PAYLOAD] 16 octets
 00 00 00 02 0a 00 00 00 d1 10 01 00 00 00 00 00   
   [ATTR 03] 5 octets
@@ -71,6 +71,8 @@
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
..
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
..

 00 00 00 00 00 00 ..
+  [ATTR 18] 12 octets
+08 00 01 00 70 70 70 00 04 00 02 00   ppp.
   [ATTR 26] 132 octets
 84 00 02 00 80 00 01 00 01 00 00 00 00 00 00 00 00 00 
..
 00 00 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 
..

@@ -81,3 +83,4 @@
 00 00 00 00 10 27 00 00 e8 03 00 00 00 00 00 00 00 00 
.'

 00 00 00 00 00 00 ..
 ---  END NETLINK MESSAGE 
---
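
For reference, the 12 octets of the new ATTR 18 can be decoded by hand. The following is only a sketch under assumptions: it takes the usual netlink attribute layout (16-bit length, 16-bit type, payload padded to 4 bytes), assumes the dump comes from a little-endian machine, and assumes attribute 18 is IFLA_LINKINFO with nested types 1 (IFLA_INFO_KIND) and 2 (IFLA_INFO_DATA) as listed in linux/if_link.h. The names attr18, TOY_NLA_* and walk_nested() are made up for illustration and are not libnl API.

```c
#include <stddef.h>
#include <stdint.h>

/* Payload of ATTR 18, copied from the dump above. */
static const uint8_t attr18[12] = {
	0x08, 0x00, 0x01, 0x00, 0x70, 0x70, 0x70, 0x00,
	0x04, 0x00, 0x02, 0x00,
};

#define TOY_NLA_HDRLEN		4u
#define TOY_NLA_ALIGN(n)	(((n) + 3u) & ~3u)

/*
 * Walk the nested attributes in buf. For each one, store its type and
 * payload length in types[] and lens[] (at most max entries).
 * Returns the number of attributes found, or -1 on a malformed length.
 */
static int walk_nested(const uint8_t *buf, size_t size,
		       uint16_t *types, uint16_t *lens, int max)
{
	size_t off = 0;
	int n = 0;

	while (off + TOY_NLA_HDRLEN <= size && n < max) {
		/* Header fields decoded as little-endian (assumption). */
		uint16_t len = (uint16_t)(buf[off] | (buf[off + 1] << 8));
		uint16_t type = (uint16_t)(buf[off + 2] | (buf[off + 3] << 8));

		if (len < TOY_NLA_HDRLEN || len > size - off)
			return -1;
		types[n] = type;
		lens[n] = (uint16_t)(len - TOY_NLA_HDRLEN);
		n++;
		off += TOY_NLA_ALIGN(len);	/* next header is 4-byte aligned */
	}
	return n;
}
```

Fed the 12 octets above, the walker finds two nested attributes: type 1 carrying the 4-byte string "ppp\0", and type 2 with an empty payload.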


Running with NLDBG=4 seems to generate this:
DBG<2>: While picking up for 0x26d2e00 , recvmsgs() returned -34:  (errno = Numerical result out of range)
DBG<1>: Clearing cache 0x26d2e00 ...


(skip forward 4 hours)

Ok, so I've spent the afternoon compiling and installing software.

I'm afraid I gave you a bum steer. The issue only manifests itself on 
libnl1. I had both installed and netcf was compiling against 1 and not 3.


I spent the afternoon compiling and installing various combinations of 
libnl and netcf and can only reproduce the issue if netcf is compiled 
against libnl <= 1.1.4. It won't compile against 2, 3, or 3.1 and it 
works against 3.2. That explains why it manifests itself on my clean 
Debian 7 machines.


I can work around it locally by recompiling all my stuff against libnl3 
if you don't feel inclined to chase it down, but it is certainly 
reproducible on nl1. I compiled up 1.1.4 and compiled netcf-0.2.8 
against that and the problem shows.


Regards,
Brad


[PATCHv2 net] team: team_port_add should check link_up before enable port

2016-12-05 Thread Xin Long
Currently, when a user adds a NIC to a team device, the port's 'enable'
option is true by default, because team_port_add() calls
team_port_enable() after dev_open().

But the port is enabled even if port_dev has no carrier, for example when
its cable is unplugged. As a result the team device does not work properly
if this port is chosen to connect, and it has no chance to switch to other
ports if link_watch is ethtool.

This patch enables the port in team_port_add() only when port_dev has
carrier.

v1 -> v2:
  use netif_carrier_ok() instead of !!netif_carrier_ok(), as it returns
  bool now.

Signed-off-by: Xin Long 
---
 drivers/net/team/team.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index a380649..4bc0103 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1140,6 +1140,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
 	struct net_device *dev = team->dev;
 	struct team_port *port;
 	char *portname = port_dev->name;
+	bool linkup;
 	int err;
 
 	if (port_dev->flags & IFF_LOOPBACK) {
@@ -1249,9 +1250,12 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
 
 	port->index = -1;
 	list_add_tail_rcu(&port->list, &team->port_list);
-	team_port_enable(team, port);
+	linkup = netif_carrier_ok(port_dev);
+	if (linkup)
+		team_port_enable(team, port);
+
 	__team_compute_features(team);
-	__team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
+	__team_port_change_port_added(port, linkup);
 	__team_options_change_check(team);
 
 	netdev_info(dev, "Port device %s added\n", portname);
-- 
2.1.0
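
The behavioural change can be modelled outside the kernel. What follows is a toy userspace sketch, not team driver code: struct toy_port, toy_port_add_old() and toy_port_add_new() are hypothetical names, and the carrier flag merely stands in for netif_carrier_ok(port_dev).

```c
#include <stdbool.h>

struct toy_port {
	bool carrier;	/* stands in for netif_carrier_ok(port_dev) */
	bool enabled;	/* stands in for the port's 'enable' option */
	bool linkup;	/* link state reported when the port is added */
};

/* Before the patch: the port is enabled unconditionally. */
static void toy_port_add_old(struct toy_port *p)
{
	p->enabled = true;
	p->linkup = p->carrier;
}

/* After the patch: sample carrier once, enable only when it is up. */
static void toy_port_add_new(struct toy_port *p)
{
	bool linkup = p->carrier;

	if (linkup)
		p->enabled = true;
	p->linkup = linkup;
}
```

With carrier down, the old variant leaves the port enabled while the new one leaves it disabled; with carrier up the two behave the same.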



[PATCH] net: return value of skb_linearize should be handled in Linux kernel

2016-12-05 Thread Zhouyi Zhou
kmalloc_reserve may fail to allocate memory inside skb_linearize,
which means skb_linearize's return value should not be ignored.
The following patch corrects the callers of skb_linearize that ignore it.

Compiled on x86_64

Signed-off-by: Zhouyi Zhou 
---
 drivers/infiniband/hw/nes/nes_nic.c           | 5 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 6 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +--
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c             | 7 +++++--
 drivers/scsi/fcoe/fcoe.c                      | 5 ++++-
 net/tipc/link.c                               | 3 ++-
 net/tipc/name_distr.c                         | 5 ++++-
 7 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index 2b27d13..69372ea 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -662,10 +662,11 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev)
nesnic->sq_head &= nesnic->sq_size-1;
}
} else {
-   nesvnic->linearized_skbs++;
hoffset = skb_transport_header(skb) - skb->data;
nhoffset = skb_network_header(skb) - skb->data;
-   skb_linearize(skb);
+   if (skb_linearize(skb))
+   return NETDEV_TX_BUSY;
+   nesvnic->linearized_skbs++;
skb_set_transport_header(skb, hoffset);
skb_set_network_header(skb, nhoffset);
if (!nes_nic_send(skb, netdev))
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 2a653ec..ab787cb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter,
 */
if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) &&
(fctl & FC_FC_END_SEQ)) {
-   skb_linearize(skb);
+   int err = 0;
+
+   err = skb_linearize(skb);
+   if (err)
+   return err;
crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc));
crc->fcoe_eof = FC_EOF_T;
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fee1f29..4926d48 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
total_rx_bytes += ddp_bytes;
total_rx_packets += DIV_ROUND_UP(ddp_bytes,
 mss);
-   }
-   if (!ddp_bytes) {
+   } else {
dev_kfree_skb_any(skb);
continue;
}
diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index f9ddb61..197d02e 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -542,8 +542,11 @@ static void bnx2fc_recv_frame(struct sk_buff *skb)
return;
}
 
-   if (skb_is_nonlinear(skb))
-   skb_linearize(skb);
+   if (skb_linearize(skb)) {
+   kfree_skb(skb);
+   return;
+   }
+
mac = eth_hdr(skb)->h_source;
dest_mac = eth_hdr(skb)->h_dest;
 
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 9bd41a3..f691b97 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1685,7 +1685,10 @@ static void fcoe_recv_frame(struct sk_buff *skb)
skb->dev ? skb->dev->name : "");
 
port = lport_priv(lport);
-   skb_linearize(skb); /* check for skb_is_nonlinear is within skb_linearize */
+   if (skb_linearize(skb)) {
+   kfree_skb(skb);
+   return;
+   }
 
/*
 * Frame length checks and setting up the header pointers
diff --git a/net/tipc/link.c b/net/tipc/link.c
index bda89bf..077c570 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1446,7 +1446,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
if (tipc_own_addr(l->net) > msg_prevnode(hdr))
l->net_plane = msg_net_plane(hdr);
 
-   skb_linearize(skb);
+   if (skb_linearize(skb))
+   goto exit;
hdr = buf_msg(skb);
data = msg_data(hdr);
 
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index c1cfd92..4e05d2a 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -356,7 +356,10 @@ void tipc_named_rcv(struct net 
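
The pattern applied throughout the patch (check the result of a call that may need to allocate, and bail out on failure) can be shown with a toy userspace model. This is only a sketch under assumptions: struct toy_buf, toy_linearize(), toy_recv() and the toy_alloc hook are hypothetical stand-ins and share nothing with the real sk_buff API beyond the idea.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy two-piece buffer, a stand-in for a non-linear sk_buff. */
struct toy_buf {
	char *head;
	size_t head_len;
	char *frag;		/* NULL when the buffer is already linear */
	size_t frag_len;
};

/* Allocator hook so that a test can force an allocation failure. */
static void *(*toy_alloc)(size_t) = malloc;

static void *toy_alloc_fail(size_t n)
{
	(void)n;
	return NULL;
}

/* Merge head and frag into one allocation. Returns 0 or -ENOMEM. */
static int toy_linearize(struct toy_buf *b)
{
	char *flat;

	if (!b->frag)
		return 0;		/* nothing to do */
	flat = toy_alloc(b->head_len + b->frag_len);
	if (!flat)
		return -ENOMEM;		/* buffer is left untouched */
	memcpy(flat, b->head, b->head_len);
	memcpy(flat + b->head_len, b->frag, b->frag_len);
	free(b->head);
	free(b->frag);
	b->head = flat;
	b->head_len += b->frag_len;
	b->frag = NULL;
	b->frag_len = 0;
	return 0;
}

/* Receive path in the style of the fixed callers: give up on failure. */
static int toy_recv(struct toy_buf *b, char *out, size_t out_len)
{
	int err = toy_linearize(b);

	if (err)
		return err;		/* the data is NOT contiguous here */
	if (b->head_len > out_len)
		return -EINVAL;
	memcpy(out, b->head, b->head_len);
	return (int)b->head_len;
}
```

If the return value were ignored, toy_recv() would copy only the head and silently lose the fragment whenever the allocation failed.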

Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Gao Feng
Hi Eric,

On Tue, Dec 6, 2016 at 2:53 PM, Eric Dumazet  wrote:
> On Tue, 2016-12-06 at 14:31 +0800, Gao Feng wrote:
>
>> Because I don't fully understand the ipvlan code yet, I am afraid that
>> someone may still get the port address during ipvlan_port_destroy. So
>> the original ipvlan_port_destroy uses kfree_rcu to avoid that.
>>
>> I am sure it is unnecessary to use kfree_rcu in ipvlan_port_create.
>
> And I am pretty sure it is unnecessary to use kfree_rcu() in
> ipvlan_port_destroy() as well.
>
> I highly suggest you spend time on learning why.
>
>
>

Thanks for your suggestion.
I will send a v2 patch once I have worked out the reason myself.

Regards
Feng




Re: [PATCH net] team: team_port_add should check link_up before enable port

2016-12-05 Thread Xin Long
On Sat, Dec 3, 2016 at 10:57 PM, Marcelo Ricardo Leitner
 wrote:
> On Sat, Dec 03, 2016 at 09:42:11PM +0800, Xin Long wrote:
>> Currently, when a user adds a NIC to a team device, the port's 'enable'
>> option is true by default, because team_port_add() calls
>> team_port_enable() after dev_open().
>>
>> But the port is enabled even if port_dev has no carrier, for example when
>> its cable is unplugged. As a result the team device does not work properly
>> if this port is chosen to connect, and it has no chance to switch to other
>> ports if link_watch is ethtool.
>>
>> This patch enables the port in team_port_add() only when port_dev has
>> carrier.
>>
>> Signed-off-by: Xin Long 
>> ---
>>  drivers/net/team/team.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> index a380649..42004ac 100644
>> --- a/drivers/net/team/team.c
>> +++ b/drivers/net/team/team.c
>> @@ -1140,6 +1140,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
>>   struct net_device *dev = team->dev;
>>   struct team_port *port;
>>   char *portname = port_dev->name;
>> + bool linkup;
>>   int err;
>>
>>   if (port_dev->flags & IFF_LOOPBACK) {
>> @@ -1249,9 +1250,12 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
>>
>>   port->index = -1;
>>   list_add_tail_rcu(&port->list, &team->port_list);
>> - team_port_enable(team, port);
>> + linkup = !!netif_carrier_ok(port_dev);
>
> The !! here is not needed anymore, netif_carrier_ok already returns a
> bool.
> static inline bool netif_carrier_ok(const struct net_device *dev)
will repost, thanks.

>
>
>> + if (linkup)
>> + team_port_enable(team, port);
>> +
>>   __team_compute_features(team);
>> - __team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
>> + __team_port_change_port_added(port, linkup);
>>   __team_options_change_check(team);
>>
>>   netdev_info(dev, "Port device %s added\n", portname);
>> --
>> 2.1.0
>>


Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Eric Dumazet
On Tue, 2016-12-06 at 14:31 +0800, Gao Feng wrote:

> Because I don't fully understand the ipvlan code yet, I am afraid that
> someone may still get the port address during ipvlan_port_destroy. So
> the original ipvlan_port_destroy uses kfree_rcu to avoid that.
>
> I am sure it is unnecessary to use kfree_rcu in ipvlan_port_create.

And I am pretty sure it is unnecessary to use kfree_rcu() in
ipvlan_port_destroy() as well.

I highly suggest you spend time on learning why.





Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Gao Feng
Hi Eric,

On Tue, Dec 6, 2016 at 2:25 PM, Eric Dumazet  wrote:
> On Tue, 2016-12-06 at 12:29 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng 
>>
>> Nothing can reference the "port" in ipvlan_port_create when
>> netdev_rx_handler_register fails, so it can be freed directly with
>> kfree instead of kfree_rcu.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>  drivers/net/ipvlan/ipvlan_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> index c6aa667..1a601151 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
>>   return 0;
>>
>>  err:
>> - kfree_rcu(port, rcu);
>> + kfree(port);
>>   return err;
>>  }
>>
>
> This looks like a partial patch.
>
> If you really care, why don't you also replace the kfree_rcu() in
> ipvlan_port_destroy() ?

Because I don't fully understand the ipvlan code yet, I am afraid that
someone may still get the port address during ipvlan_port_destroy. So
the original ipvlan_port_destroy uses kfree_rcu to avoid that.

I am sure it is unnecessary to use kfree_rcu in ipvlan_port_create.

Regards
Feng

>
>
>
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 05a62d2216c54651f6158c35d446d2e395b38dc3..031093e1c25f55244e6bdfde4ebeb65c0f2f10c1 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -97,7 +97,6 @@ struct ipvl_port {
>  	struct work_struct	wq;
>  	struct sk_buff_head	backlog;
>  	int			count;
> -	struct rcu_head		rcu;
>  };
>
>  static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 5430460167b5e8945d29a3febdd324461bf5af5c..ffe8994e64fc1791ef07d80ad2340bc82d541bba 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
>  	return 0;
>
>  err:
> -	kfree_rcu(port, rcu);
> +	kfree(port);
>  	return err;
>  }
>
> @@ -145,7 +145,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
>  	netdev_rx_handler_unregister(dev);
>  	cancel_work_sync(&port->wq);
>  	__skb_queue_purge(&port->backlog);
> -	kfree_rcu(port, rcu);
> +	kfree(port);
>  }
>
>  #define IPVLAN_FEATURES \
>
>
>




[patch net v4] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
Commit 80cca775cdc4 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when the driver is compiled with CONFIG_M5272, statistics-related
definitions do not exist, which results in build errors.

Fix that by adding an explicit #else branch to the !defined(CONFIG_M5272)
block.

Fixes: 80cca775cdc4 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko 
---
Changes from v3:
- fix reference commit id to match upstream tree

 drivers/net/ethernet/freescale/fec_main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5f77caa59534..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,6 +2313,8 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
+#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
+
 static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
@@ -2330,7 +2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device *dev,
if (netif_running(dev))
fec_enet_update_ethtool_stats(dev);
 
-   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
+   memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2355,6 +2357,12 @@ static int fec_enet_get_sset_count(struct net_device *dev, int sset)
return -EOPNOTSUPP;
}
 }
+
+#else  /* !defined(CONFIG_M5272) */
+#define FEC_STATS_SIZE 0
+static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 static int fec_enet_nway_reset(struct net_device *dev)
@@ -3293,8 +3301,7 @@ fec_probe(struct platform_device *pdev)
 
/* Init network device */
ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- ARRAY_SIZE(fec_stats) * sizeof(u64),
- num_tx_qs, num_rx_qs);
+ FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
 
-- 
2.1.4
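
The shape of the fix (give the compiled-out configuration a zero-sized constant and an empty inline stub so that shared code needs no #ifdef of its own) can be illustrated in isolation. This is a standalone sketch, not driver code: TOY_NO_STATS plays the role of CONFIG_M5272, and toy_stats, TOY_STATS_SIZE, toy_update_stats() and toy_priv_size() are hypothetical names.

```c
#include <stddef.h>

/* Build with -DTOY_NO_STATS to model a kernel built with CONFIG_M5272. */
#ifndef TOY_NO_STATS

static const char *const toy_stats[] = { "rx_ok", "tx_ok", "rx_drop" };

#define TOY_STATS_SIZE \
	(sizeof(toy_stats) / sizeof(toy_stats[0]) * sizeof(unsigned long long))

static int toy_updates;

static void toy_update_stats(void)
{
	toy_updates++;
}

#else /* TOY_NO_STATS */

/* No statistics: zero extra storage and a stub that compiles away. */
#define TOY_STATS_SIZE 0

static inline void toy_update_stats(void)
{
}

#endif /* TOY_NO_STATS */

/* Shared code: builds in both configurations without further #ifdefs. */
static size_t toy_priv_size(size_t base)
{
	toy_update_stats();
	return base + TOY_STATS_SIZE;
}
```

In the default build toy_priv_size() adds room for three counters; with -DTOY_NO_STATS it returns base unchanged and the call to the stub costs nothing.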



Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Eric Dumazet
On Tue, 2016-12-06 at 12:29 +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> Nothing can reference the "port" in ipvlan_port_create when
> netdev_rx_handler_register fails, so it can be freed directly with
> kfree instead of kfree_rcu.
> 
> Signed-off-by: Gao Feng 
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index c6aa667..1a601151 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
>   return 0;
>  
>  err:
> - kfree_rcu(port, rcu);
> + kfree(port);
>   return err;
>  }
>  

This looks like a partial patch.

If you really care, why don't you also replace the kfree_rcu() in
ipvlan_port_destroy() ?



diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 05a62d2216c54651f6158c35d446d2e395b38dc3..031093e1c25f55244e6bdfde4ebeb65c0f2f10c1 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -97,7 +97,6 @@ struct ipvl_port {
 	struct work_struct	wq;
 	struct sk_buff_head	backlog;
 	int			count;
-	struct rcu_head		rcu;
 };
 
 static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 5430460167b5e8945d29a3febdd324461bf5af5c..ffe8994e64fc1791ef07d80ad2340bc82d541bba 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
 	return 0;
 
 err:
-	kfree_rcu(port, rcu);
+	kfree(port);
 	return err;
 }
 
@@ -145,7 +145,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
 	netdev_rx_handler_unregister(dev);
 	cancel_work_sync(&port->wq);
 	__skb_queue_purge(&port->backlog);
-	kfree_rcu(port, rcu);
+	kfree(port);
 }
 
 #define IPVLAN_FEATURES \





Re: [PATCH net v2] ipv6: Allow IPv4-mapped address as next-hop

2016-12-05 Thread Erik Nordmark

On 12/5/16 11:52 AM, David Miller wrote:
> From: Erik Nordmark
> Date: Sat,  3 Dec 2016 20:57:09 -0800
>
>> Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
>>
>> It is possible to configure IP interfaces with IPv4-mapped addresses, and
>> one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
>> to this fix the kernel returned an EINVAL when attempting to add an IPv6
>> route with an IPv4-mapped address as a nexthop/gateway.
>>
>> RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
>> thus in order to support that type of address configuration the kernel
>> needs to allow IPv4-mapped addresses as nexthops.
>>
>> Signed-off-by: Erik Nordmark
>> Signed-off-by: Bob Gilligan
>
> Applied to net-next, thanks.


Thanks, especially for moving it from net to net-next.

I guess I don't fully understand what is considered a bug fix for net as 
opposed to new stuff for net-next. Is the former mostly for regressions 
and serious bugs? This was a fix for a bug that's been there since the 
beginning of IPv6 time AFAICT.



  Erik
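
For readers unfamiliar with the address form under discussion: an IPv4-mapped IPv6 address is ::ffff:a.b.c.d, with the IPv4 address in the last four bytes. A small userspace sketch using the POSIX inet_pton() call and the IN6_IS_ADDR_V4MAPPED() macro is given below. The helper names is_v4mapped() and v4mapped_to_v4() are made up for illustration, and the code says nothing about how the kernel route code performs its own check.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Returns 1 if str parses as an IPv4-mapped IPv6 address, 0 if it parses
 * as some other IPv6 address, and -1 if it does not parse at all.
 */
static int is_v4mapped(const char *str)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, str, &a) != 1)
		return -1;
	return IN6_IS_ADDR_V4MAPPED(&a) ? 1 : 0;
}

/* Copy the embedded IPv4 address (last 4 bytes) out of a mapped address. */
static int v4mapped_to_v4(const char *str, struct in_addr *out)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, str, &a) != 1 || !IN6_IS_ADDR_V4MAPPED(&a))
		return -1;
	memcpy(out, &a.s6_addr[12], sizeof(*out));
	return 0;
}
```

For example, "::ffff:192.0.2.1" is IPv4-mapped and embeds 192.0.2.1, while "2001:db8::1" is an ordinary IPv6 address.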




Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp

2016-12-05 Thread Alexei Starovoitov
On Mon, Dec 05, 2016 at 09:08:36PM -0800, Tom Herbert wrote:
> On Mon, Dec 5, 2016 at 7:05 PM, Alexei Starovoitov
>  wrote:
> > On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
> >>
> >> If one of those eBPF verifiers only accepts a certain number of INSN, as
> >> fundamental as backwards jumps, we might end up with two compiler?
> >
> > two compilers? We already have five. There is gcc bpf backend (unmaintained)
> > and now lua, python and ply project can generate bpf code without llvm.
> > The kernel verifier has to become smarter. Right now it understands
> > only certain instruction patterns which caused all five bpf generators to
> > do extra work to satisfy the verifier. The solution is to do
> > data flow analysis using proper compiler techniques.
> >
> >> program thinks). Ergo, more complexity. What do you do when one of those
> >> two systems fail? What is the reference data? What do you do if on a
> >> highly busy box during DoS constant reloading of your vmalloc happens (I
> >> don't know if it is a problem under DoS)?
> >
> > ddos is one of the key use cases for xdp. If the system is about to oom
> > during ddos, it has to be fixed. The faster we move with xdp development
> > the sooner we will find and fix those issues.
> > And xdp being a core component of the linux kernel we will fix ddos
> > for the whole internet. Anyone going dpdk route are simply in
> > business of selling ddos protection with proprietary solutions.
> >
> Hi Alexei,
> 
> I am wondering exactly how XDP fixes DDOS in a non-proprietary
> fashion. While the XDP infrastructure is part of the core kernel, the
> programs are not part of the kernel as you mention below. So what will
> a DDOS solution based on XDP for the whole Internet look like? Do you
> envision a set of "blessed" DDOS programs that various sites can use
> and configure (maybe some maintained open source repository), or will
> each site need to come up with their own XDP programs for DDOS?

At some point we would need a repository of these 'blessed' programs.
Some of them will not be programs, but program generators
similar to existing Cloudflare bpf setup:
https://github.com/cloudflare/bpftools
and instead of doing things like:
https://github.com/cloudflare/lua-aho-corasick
and reimplementing them in proprietary c++,
the dfa/aho-corasick will be implemented as a kernel helper.
That's what I was alluding to in
https://github.com/iovisor/bcc/issues/471
Then all of the research in that area like:
https://ir.nctu.edu.tw/bitstream/11536/26033/1/00028831946.pdf
will be applicable and researchers will be sharing
these detector programs.
Of course, not everyone will open up their secret sauce,
but a lot of folks will do and it will drive the innovation.



[PATCH net] be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.

2016-12-05 Thread Suresh Reddy
From: Venkat Duvvuru 

OPCODE_COMMON_GET_FN_PRIVILEGES returns only the DEVSEC privilege
(Unrestricted Administrative Privilege) for Lancer NIC functions.
As a result, the driver fails the SET_HSW_CONFIG command, because the DEVSEC
privilege is not set in the command's privilege bitmap. This patch fixes the
problem by setting the DEVSEC privilege in SET_HSW_CONFIG's privilege bitmap.

Signed-off-by: Venkat Duvvuru 
Signed-off-by: Suresh Reddy 
---
 drivers/net/ethernet/emulex/benet/be_cmds.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 1fb5d72..0e74529 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -90,7 +90,8 @@ static struct be_cmd_priv_map cmd_priv_map[] = {
{
OPCODE_COMMON_SET_HSW_CONFIG,
CMD_SUBSYSTEM_COMMON,
-   BE_PRIV_DEVCFG | BE_PRIV_VHADM
+   BE_PRIV_DEVCFG | BE_PRIV_VHADM |
+   BE_PRIV_DEVSEC
},
{
OPCODE_COMMON_GET_EXT_FAT_CAPABILITIES,
-- 
2.10.1
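
The effect of widening the mask can be shown with a toy bitmap check. This is a sketch under assumptions: the bit values below are invented (the real BE_PRIV_* constants are defined by the driver), and toy_cmd_allowed() assumes a command is permitted when the function holds at least one of the privileges listed for it, which is what the description above implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this example. */
#define TOY_PRIV_DEVCFG	(1u << 0)
#define TOY_PRIV_VHADM	(1u << 1)
#define TOY_PRIV_DEVSEC	(1u << 2)

/* Allowed if the function holds any privilege listed for the command. */
static bool toy_cmd_allowed(uint32_t cmd_mask, uint32_t fn_privileges)
{
	return (cmd_mask & fn_privileges) != 0;
}
```

A function that reports only TOY_PRIV_DEVSEC is rejected by the old mask (DEVCFG | VHADM) and accepted once DEVSEC is added to the mask.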



Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp

2016-12-05 Thread Tom Herbert
On Mon, Dec 5, 2016 at 7:05 PM, Alexei Starovoitov
 wrote:
> On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
>>
>> If one of those eBPF verifiers only accepts a certain number of INSN, as
>> fundamental as backwards jumps, we might end up with two compiler?
>
> two compilers? We already have five. There is gcc bpf backend (unmaintained)
> and now lua, python and ply project can generate bpf code without llvm.
> The kernel verifier has to become smarter. Right now it understands
> only certain instruction patterns which caused all five bpf generators to
> do extra work to satisfy the verifier. The solution is to do
> data flow analysis using proper compiler techniques.
>
>> program thinks). Ergo, more complexity. What do you do when one of those
>> two systems fail? What is the reference data? What do you do if on a
>> highly busy box during DoS constant reloading of your vmalloc happens (I
>> don't know if it is a problem under DoS)?
>
> ddos is one of the key use cases for xdp. If the system is about to oom
> during ddos, it has to be fixed. The faster we move with xdp development
> the sooner we will find and fix those issues.
> And xdp being a core component of the linux kernel we will fix ddos
> for the whole internet. Anyone going dpdk route are simply in
> business of selling ddos protection with proprietary solutions.
>
Hi Alexei,

I am wondering exactly how XDP fixes DDOS in a non-proprietary
fashion. While the XDP infrastructure is part of the core kernel, the
programs are not part of the kernel as you mention below. So what will
a DDOS solution based on XDP for the whole Internet look like? Do you
envision a set of "blessed" DDOS programs that various sites can use
and configure (maybe some maintained open source repository), or will
each site need to come up with their own XDP programs for DDOS?

Thanks,
Tom

>> I tried to argue that someone wanting to build netmap/DPDK-alike things
>> in XDP, one faces the problem of synchronized IPC. Hashmaps solve this
>> to some degree but cannot be synchronized.
>
> I don't see ipc as a problem and, yes, xdp is the best platform so far
> to deliver packets to user space. I think that the dataplane-in-the-driver
> is going to be faster than the fastest streaming to user space approach,
> but we cannot rule one way or the other without trying multiple
> approaches first and benchmarking them against each other.
> So I very much in favor of Jesper's effort to deliver packets to user space.
>
>> DPDK even can configure various hw offloads already before the kernel
>> can do so.
>
> that's a harsh lesson that the kernel needs to learn. Since people went
> to dpdk to do hw offload it means it's our fault that we were not
> accommodative and flexible enough to provide such frameworks within
> the kernel. imo John's flow/match api should have been accepted
> and it would have been solid building block towards such offloads.
>
>> If users want to use those, they switch to DPDK also, as I
>> have seen the industry always wanting the best performance. DPDK can use
>> SIMD instructions, all AVX, SSE and MMX stuff, and they do it.
>
> agree as well. The kernel needs to find a way to use all of these
> fancy instructions where performance matters.
> People who say "kernel cannot do simd" just didn't try hard enough.
>
>> Debugging is harder but currently worked on. But will probably always be
>> harder than simply using a debugger.
>
> That's actually the important value proposition of xdp+bpf, since
> non-working bpf program is not a concern for the kernel support team.
> Unlike kernel modules that the kernel team needs to bless and support
> in production, bpf programs are outside of that scope. They are part
> of user space apps and part of user space responsibility.
>
>> This all leads to gigantic user space control planes like neutron and
>> others that just make everyone's life much harder. The model requires
>> this. And that is what I fear.
>
> the neutron is complex and fragile, since it's using bridges on
> top of bridges with ebtables and ovs in the mix. Trying to manage
> many different kernel technologies and a mix of smaller control planes
> by this mega control plane is not an easy task.
>
>> I am not at all that negative against a hook before allocating the
>> packet, but making everyone using it and marketing as an alternative to
>> DPDK doesn't seem to fit for me.
>
> I don't see developers that are forced to use xdp. I see developers
> that are eager to use xdp as soon as support for it is available
> in their nics. Those like maglev who developed their own bypass
> are not going to use dpdk and people who already using dpdk are
> not going to switch to xdp, but there are lots of others who
> welcome xdp with open arms.
>
> Thanks
>


[PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread fgao
From: Gao Feng 

Nothing can reference the "port" in ipvlan_port_create when
netdev_rx_handler_register fails, so it can be freed directly with
kfree instead of kfree_rcu.

Signed-off-by: Gao Feng 
---
 drivers/net/ipvlan/ipvlan_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index c6aa667..1a601151 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
return 0;
 
 err:
-   kfree_rcu(port, rcu);
+   kfree(port);
return err;
 }
 
-- 
1.9.1




Re: "af_unix: conditionally use freezable blocking calls in read" is wrong

2016-12-05 Thread Cong Wang
On Sun, Dec 4, 2016 at 7:52 PM, Al Viro  wrote:
> On Sun, Dec 04, 2016 at 09:42:14PM -0500, David Miller wrote:
>> > I've run into that converting AF_UNIX to generic_file_splice_read();
>> > I can kludge around that ("freezable unless ->msg_iter is ITER_PIPE"), but
>> > that only delays trouble.
>> >
>> > Note that the only other user of freezable_schedule_timeout() is
>> > a very different story - it's a kernel thread, which *does* have a
>> > guaranteed locking environment.  Making such assumptions in
>> > unix_stream_recvmsg(), OTOH, is insane...
>>
>> We have to otherwise Android phones drain their batteries in 10
>> minutes.
>>
>> I'm not going to revert this and be responsible for that.
>>
>> So you have to find a way to make the freezable calls legitimate.
>
> Oh, well...  As I said, I can kludge around that - call from
> generic_file_splice_read() can be distinguished by looking at the
> ->msg_iter->type; it still means unpleasantness for kernel_recvmsg()
> users - in effect, it can only be called with locks held if you know that
> the socket is not an AF_UNIX one.
>
> BTW, how do they deal with plain pipes?

I suppose this question is for Colin. ;)


Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp

2016-12-05 Thread Alexei Starovoitov
On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
>
> If one of those eBPF verifiers only accepts a certain number of INSN, as
> fundamental as backwards jumps, we might end up with two compiler?

two compilers? We already have five. There is gcc bpf backend (unmaintained)
and now lua, python and ply project can generate bpf code without llvm.
The kernel verifier has to become smarter. Right now it understands
only certain instruction patterns which caused all five bpf generators to
do extra work to satisfy the verifier. The solution is to do
data flow analysis using proper compiler techniques.

> program thinks). Ergo, more complexity. What do you do when one of those
> two systems fail? What is the reference data? What do you do if on a
> highly busy box during DoS constant reloading of your vmalloc happens (I
> don't know if it is a problem under DoS)?

ddos is one of the key use cases for xdp. If the system is about to oom
during ddos, it has to be fixed. The faster we move with xdp development
the sooner we will find and fix those issues.
And with xdp being a core component of the linux kernel, we will fix ddos
for the whole internet. Anyone going the dpdk route is simply in the
business of selling ddos protection with proprietary solutions.

> I tried to argue that someone wanting to build netmap/DPDK-alike things
> in XDP, one faces the problem of synchronized IPC. Hashmaps solve this
> to some degree but cannot be synchronized.

I don't see ipc as a problem and, yes, xdp is the best platform so far
to deliver packets to user space. I think that the dataplane-in-the-driver
is going to be faster than the fastest streaming to user space approach,
but we cannot rule one way or the other without trying multiple
approaches first and benchmarking them against each other.
So I am very much in favor of Jesper's effort to deliver packets to user space.

> DPDK even can configure various hw offloads already before the kernel
> can do so.

that's a harsh lesson that the kernel needs to learn. Since people went
to dpdk to do hw offload it means it's our fault that we were not
accommodative and flexible enough to provide such frameworks within
the kernel. imo John's flow/match api should have been accepted
and it would have been a solid building block towards such offloads.

> If users want to use those, they switch to DPDK also, as I
> have seen the industry always wanting the best performance. DPDK can use
> SIMD instructions, all AVX, SSE and MMX stuff, and they do it.

agree as well. The kernel needs to find a way to use all of these
fancy instructions where performance matters.
People who say "kernel cannot do simd" just didn't try hard enough.

> Debugging is harder but currently worked on. But will probably always be
> harder than simply using a debugger.

That's actually the important value proposition of xdp+bpf, since a
non-working bpf program is not a concern for the kernel support team.
Unlike kernel modules that the kernel team needs to bless and support
in production, bpf programs are outside of that scope. They are part
of user space apps and part of user space responsibility.

> This all leads to gigantic user space control planes like neutron and
> others that just make everyone's life much harder. The model requires
> this. And that is what I fear.

the neutron is complex and fragile, since it's using bridges on
top of bridges with ebtables and ovs in the mix. Trying to manage
many different kernel technologies and a mix of smaller control planes
by this mega control plane is not an easy task.

> I am not at all that negative against a hook before allocating the
> packet, but making everyone using it and marketing as an alternative to
> DPDK doesn't seem to fit for me.

I don't see developers that are forced to use xdp. I see developers
that are eager to use xdp as soon as support for it is available
in their nics. Those like maglev who developed their own bypass
are not going to use dpdk and people who are already using dpdk are
not going to switch to xdp, but there are lots of others who
welcome xdp with open arms.

Thanks



Re: [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()

2016-12-05 Thread Michael S. Tsirkin
On Mon, Dec 05, 2016 at 06:10:58PM -0800, Andy Lutomirski wrote:
> With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
> pointer to the stack and it will OOPS.  Copy the address to the heap
> to prevent the crash.
> 
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Cc: Laura Abbott 
> Reported-by: zbys...@in.waw.pl
> Signed-off-by: Andy Lutomirski 

Acked-by: Michael S. Tsirkin 

> ---
> 
> Very lightly tested.
> 
>  drivers/net/virtio_net.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7276d5a95bd0..cbf1c613c67a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
>   struct virtnet_info *vi = netdev_priv(dev);
>   struct virtio_device *vdev = vi->vdev;
>   int ret;
> - struct sockaddr *addr = p;
> + struct sockaddr *addr;
>   struct scatterlist sg;
>  
> - ret = eth_prepare_mac_addr_change(dev, p);
> + addr = kmalloc(sizeof(*addr), GFP_KERNEL);
> + if (!addr)
> + return -ENOMEM;
> + memcpy(addr, p, sizeof(*addr));
> +
> + ret = eth_prepare_mac_addr_change(dev, addr);
>   if (ret)
> - return ret;
> + goto out;
>  
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
>   sg_init_one(&sg, addr->sa_data, dev->addr_len);
> @@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
>   dev_warn(&vdev->dev,
>"Failed to set mac address by vq command.\n");
> - return -EINVAL;
> + ret = -EINVAL;
> + goto out;
>   }
>   } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
>  !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> @@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
>   }
>  
>   eth_commit_mac_addr_change(dev, p);
> + ret = 0;
>  
> - return 0;
> +out:
> + kfree(addr);
> + return ret;
>  }
>  
>  static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> -- 
> 2.9.3


Re: [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()

2016-12-05 Thread Jason Wang



On 2016-12-06 10:10, Andy Lutomirski wrote:

With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS.  Copy the address to the heap
to prevent the crash.

Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Laura Abbott 
Reported-by: zbys...@in.waw.pl
Signed-off-by: Andy Lutomirski 
---

Very lightly tested.

  drivers/net/virtio_net.c | 19 ++++++++++++++-----
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7276d5a95bd0..cbf1c613c67a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
int ret;
-   struct sockaddr *addr = p;
+   struct sockaddr *addr;
struct scatterlist sg;
  
-	ret = eth_prepare_mac_addr_change(dev, p);

+   addr = kmalloc(sizeof(*addr), GFP_KERNEL);
+   if (!addr)
+   return -ENOMEM;
+   memcpy(addr, p, sizeof(*addr));
+
+   ret = eth_prepare_mac_addr_change(dev, addr);
if (ret)
-   return ret;
+   goto out;
  
  	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {

sg_init_one(&sg, addr->sa_data, dev->addr_len);
@@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
  VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
dev_warn(&vdev->dev,
 "Failed to set mac address by vq command.\n");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out;
}
} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
@@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
}
  
  	eth_commit_mac_addr_change(dev, p);

+   ret = 0;
  
-	return 0;

+out:
+   kfree(addr);
+   return ret;
  }
  
  static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,


Acked-by: Jason Wang 


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-05 Thread Serge E. Hallyn
On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 5, 2016 at 4:28 PM, John Stultz  wrote:
> > On Tue, Nov 22, 2016 at 4:57 PM, John Stultz  wrote:
> >> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski wrote:
> >>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov wrote:
> >>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
> >>>>>
> >>>>> I hate to say it, but I think I may see a problem.  Current
> >>>>> developments are afoot to make cgroups do more than resource control.
> >>>>> For example, there's Landlock and there's Daniel's ingress/egress
> >>>>> filter thing.  Current cgroup controllers can mostly just DoS their
> >>>>> controlled processes.  These new controllers (or controller-like
> >>>>> things) can exfiltrate data and change semantics.
> >>>>>
> >>>>> Does anyone have a security model in mind for these controllers and
> >>>>> the cgroups that they're attached to?  I'm reasonably confident that
> >>>>> CAP_SYS_RESOURCE is not the answer...
> >>>>
> >>>> and specifically the answer is... ?
> >>>> Also would be great if you start with specifying the question first
> >>>> and the problem you're trying to solve.
> >>>>
> >>>
> >>> I don't have a good answer right now.  Here are some constraints, though:
> >>>
> >>> 1. An insufficiently privileged process should not be able to move a
> >>> victim into a dangerous cgroup.
> >>>
> >>> 2. An insufficiently privileged process should not be able to move
> >>> itself into a dangerous cgroup and then use execve to gain privilege
> >>> such that the execve'd program can be compromised.
> >>>
> >>> 3. An insufficiently privileged process should not be able to make an
> >>> existing cgroup dangerous in a way that could compromise a victim in
> >>> that cgroup.
> >>>
> >>> 4. An insufficiently privileged process should not be able to make a
> >>> cgroup dangerous in a way that bypasses protections that would
> >>> otherwise protect execve() as used by itself or some other process in
> >>> that cgroup.
> >>>
> >>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
> >>> addition to the cgroup being controlled.
> >>
> >> Sorry for taking a while to get back to you here.  I'm a little
> >> befuddled as to what next steps I should consider (and honestly, I'm
> >> not totally sure I really grok your concern here, particularly what
> >> you mean with "dangerous cgroups").
> >>
> >> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> >> separate "sufficiently" from "insufficiently privileged") better?
> >>
> >> Or something closer to the original method Android used of each cgroup
> >> having an allow_attach() check which could determine what is
> >> sufficiently privileged for the respective level of danger the cgroup
> >> might pose?
> >>
> >> Or just stepping back, what method would you imagine to be reasonable
> >> to allow a specified task to migrate other tasks between cgroups
> >> without it having to be root/suid?
> >
> > Any suggested feedback here?
> 
> I really don't know.  The cgroupfs interface is a bit unfortunate in
> that it doesn't really express the constraints.  To safely migrate a
> task, ISTM you ought to have some form of privilege over the task
> *and* some form of privilege over the cgroup.

Agreed.  The problem is that the privilege required should depend on
the controller (I guess).  For memory and cpuset, CAP_SYS_NICE seems
right.  Perhaps CAP_SYS_RESOURCE would be needed for some..  but then,
as I look through the lists (capabilities(7) and the list of controllers),
it seems like CAP_SYS_NICE works for everything.  What else would we need?
Maybe CAP_NET_ADMIN for net_cls and net_prio?  CAP_SYS_RESOURCE|CAP_SYS_ADMIN
for pids?

>   cgroupfs only handles
> the latter.

If we need different checks for different controllers, we can add
checks to cgroupfs.

> CAP_CGROUP_MIGRATE ought to be okay.  Or maybe cgroupfs needs to gain
> a concept of "dangerous" cgroups and further restrict them and
> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups?  I think I
> favor the latter, but it might be nice to hear from Tejun first.
> 
> --Andy


Re: [PATCH next] Revert "dctcp: update cwnd on congestion event"

2016-12-05 Thread Neal Cardwell
On Mon, Dec 5, 2016 at 6:23 PM, Florian Westphal  wrote:
> Neal Cardwell says:
>  If I am reading the code correctly, then I would have two concerns:
>  1) Has that been tested? That seems like an extremely dramatic
> decrease in cwnd. For example, if the cwnd is 80, and there are 40
> ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
> calculations seem to suggest that after just 11 ACKs the cwnd would be
> down to a minimal value of 2 [..]
>  2) That seems to contradict another passage in the draft [..] where it
> says:
>Just as specified in [RFC3168], DCTCP does not react to congestion
>indications more than once for every window of data.
>
> Neal is right.  Fortunately we don't have to complicate this by testing
> vs. current rtt estimate, we can just revert the patch.
>
> Normal stack already handles this for us: receiving ACKs with ECE
> set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
> adjusted and prr will take care of cwnd adjustment.
>
> Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
> Cc: Neal Cardwell 
> Signed-off-by: Florian Westphal 
> ---

Acked-by: Neal Cardwell 

Looks good to me. :-)

Thanks,
neal


[PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()

2016-12-05 Thread Andy Lutomirski
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS.  Copy the address to the heap
to prevent the crash.

Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Laura Abbott 
Reported-by: zbys...@in.waw.pl
Signed-off-by: Andy Lutomirski 
---

Very lightly tested.

 drivers/net/virtio_net.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7276d5a95bd0..cbf1c613c67a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
int ret;
-   struct sockaddr *addr = p;
+   struct sockaddr *addr;
struct scatterlist sg;
 
-   ret = eth_prepare_mac_addr_change(dev, p);
+   addr = kmalloc(sizeof(*addr), GFP_KERNEL);
+   if (!addr)
+   return -ENOMEM;
+   memcpy(addr, p, sizeof(*addr));
+
+   ret = eth_prepare_mac_addr_change(dev, addr);
if (ret)
-   return ret;
+   goto out;
 
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
sg_init_one(&sg, addr->sa_data, dev->addr_len);
@@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
  VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
dev_warn(&vdev->dev,
 "Failed to set mac address by vq command.\n");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out;
}
} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
@@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
}
 
eth_commit_mac_addr_change(dev, p);
+   ret = 0;
 
-   return 0;
+out:
+   kfree(addr);
+   return ret;
 }
 
 static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
-- 
2.9.3
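
The shape of this fix (copy the caller's buffer to the heap, then route every exit through a single cleanup label) can be sketched in plain userspace C. This is only an illustration under stated assumptions: set_mac_sketch(), fake_prepare(), the device_ok flag and the live_copies counter are hypothetical names, and malloc()/free() stand in for kmalloc()/kfree(). None of it is the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Sketch-only counter of heap copies that have not been freed yet. */
static int live_copies;

/* Hypothetical stand-in for eth_prepare_mac_addr_change(): here it only
 * rejects multicast addresses (low bit of the first octet set). */
static int fake_prepare(const unsigned char *mac)
{
	return (mac[0] & 1) ? -EADDRNOTAVAIL : 0;
}

/* Copy the caller's address to the heap, then leave through one label so
 * the copy is freed on every path. device_ok is a hypothetical flag that
 * simulates whether the device accepted the command. */
static int set_mac_sketch(unsigned char *dev_mac, const unsigned char *p,
			  int device_ok)
{
	unsigned char *addr;
	int ret;

	assert(dev_mac && p);

	addr = malloc(MAC_LEN);
	if (!addr)
		return -ENOMEM;	/* nothing allocated yet, plain return is fine */
	live_copies++;
	memcpy(addr, p, MAC_LEN);

	ret = fake_prepare(addr);
	if (ret)
		goto out;

	if (!device_ok) {
		ret = -EINVAL;
		goto out;
	}

	memcpy(dev_mac, addr, MAC_LEN);
	ret = 0;

out:
	free(addr);
	live_copies--;
	return ret;
}
```

Once the allocation has succeeded, every early return becomes "set ret, goto out", which is what keeps the heap copy from leaking on the error paths.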



[PATCH] net: ethernet: ti: cpsw: fix early budget split

2016-12-05 Thread Ivan Khoronzhuk
The budget split function requires the phy speed to be known.
During ndo_open, phy speed identification is postponed until the
link is up. Hence, move the split to the appropriate callback, which
runs when the link comes up.

Reported-by: Grygorii Strashko 
Fixes: 8feb0a196507 ("net: ethernet: ti: cpsw: split tx budget according between channels")
Signed-off-by: Ivan Khoronzhuk 
---
Based on net-next/master

 drivers/net/ethernet/ti/cpsw.c | 154 -
 1 file changed, 77 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3f96c57..f373a4b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -753,6 +753,82 @@ static void cpsw_rx_handler(void *token, int len, int status)
dev_kfree_skb_any(new_skb);
 }
 
+/* split budget depending on channel rates */
+static void cpsw_split_budget(struct net_device *ndev)
+{
+   struct cpsw_priv *priv = netdev_priv(ndev);
+   struct cpsw_common *cpsw = priv->cpsw;
+   struct cpsw_vector *txv = cpsw->txv;
+   u32 consumed_rate, bigest_rate = 0;
+   int budget, bigest_rate_ch = 0;
+   struct cpsw_slave *slave;
+   int i, rlim_ch_num = 0;
+   u32 ch_rate, max_rate;
+   int ch_budget = 0;
+
+   if (cpsw->data.dual_emac)
+   slave = &cpsw->slaves[priv->emac_port];
+   else
+   slave = &cpsw->slaves[cpsw->data.active_slave];
+
+   max_rate = slave->phy->speed * 1000;
+
+   consumed_rate = 0;
+   for (i = 0; i < cpsw->tx_ch_num; i++) {
+   ch_rate = cpdma_chan_get_rate(txv[i].ch);
+   if (!ch_rate)
+   continue;
+
+   rlim_ch_num++;
+   consumed_rate += ch_rate;
+   }
+
+   if (cpsw->tx_ch_num == rlim_ch_num) {
+   max_rate = consumed_rate;
+   } else {
+   ch_budget = (consumed_rate * CPSW_POLL_WEIGHT) / max_rate;
+   ch_budget = (CPSW_POLL_WEIGHT - ch_budget) /
+   (cpsw->tx_ch_num - rlim_ch_num);
+   bigest_rate = (max_rate - consumed_rate) /
+ (cpsw->tx_ch_num - rlim_ch_num);
+   }
+
+   /* split tx budget */
+   budget = CPSW_POLL_WEIGHT;
+   for (i = 0; i < cpsw->tx_ch_num; i++) {
+   ch_rate = cpdma_chan_get_rate(txv[i].ch);
+   if (ch_rate) {
+   txv[i].budget = (ch_rate * CPSW_POLL_WEIGHT) / max_rate;
+   if (!txv[i].budget)
+   txv[i].budget = 1;
+   if (ch_rate > bigest_rate) {
+   bigest_rate_ch = i;
+   bigest_rate = ch_rate;
+   }
+   } else {
+   txv[i].budget = ch_budget;
+   if (!bigest_rate_ch)
+   bigest_rate_ch = i;
+   }
+
+   budget -= txv[i].budget;
+   }
+
+   if (budget)
+   txv[bigest_rate_ch].budget += budget;
+
+   /* split rx budget */
+   budget = CPSW_POLL_WEIGHT;
+   ch_budget = budget / cpsw->rx_ch_num;
+   for (i = 0; i < cpsw->rx_ch_num; i++) {
+   cpsw->rxv[i].budget = ch_budget;
+   budget -= ch_budget;
+   }
+
+   if (budget)
+   cpsw->rxv[0].budget += budget;
+}
+
 static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 {
struct cpsw_common *cpsw = dev_id;
@@ -941,6 +1017,7 @@ static void cpsw_adjust_link(struct net_device *ndev)
for_each_slave(priv, _cpsw_adjust_link, priv, &link);
 
if (link) {
+   cpsw_split_budget(priv->ndev);
netif_carrier_on(ndev);
if (netif_running(ndev))
netif_tx_wake_all_queues(ndev);
@@ -1280,82 +1357,6 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
}
 }
 
-/* split budget depending on channel rates */
-static void cpsw_split_budget(struct net_device *ndev)
-{
-   struct cpsw_priv *priv = netdev_priv(ndev);
-   struct cpsw_common *cpsw = priv->cpsw;
-   struct cpsw_vector *txv = cpsw->txv;
-   u32 consumed_rate, bigest_rate = 0;
-   int budget, bigest_rate_ch = 0;
-   struct cpsw_slave *slave;
-   int i, rlim_ch_num = 0;
-   u32 ch_rate, max_rate;
-   int ch_budget = 0;
-
-   if (cpsw->data.dual_emac)
-   slave = &cpsw->slaves[priv->emac_port];
-   else
-   slave = &cpsw->slaves[cpsw->data.active_slave];
-
-   max_rate = slave->phy->speed * 1000;
-
-   consumed_rate = 0;
-   for (i = 0; i < cpsw->tx_ch_num; i++) {
-   ch_rate = cpdma_chan_get_rate(txv[i].ch);
-   if (!ch_rate)
-   continue;
-
-   rlim_ch_num++;
-   consumed_rate += ch_rate;
-   }
-
-   if 
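
For readers following the arithmetic, the tx half of cpsw_split_budget() can be tried out in userspace. What follows is a sketch under assumptions, not the driver code: split_tx_budget() is a hypothetical name, plain arrays replace the cpsw_vector/cpdma structures, POLL_WEIGHT is assumed to be 64, and link_rate is the link speed in the same unit as the channel rates (the patch computes it as phy speed * 1000).

```c
#include <assert.h>
#include <stdint.h>

#define POLL_WEIGHT 64	/* assumption: stands in for CPSW_POLL_WEIGHT */

/* ch_rate[i] == 0 means channel i is not rate limited.
 * budget_out[i] receives the NAPI budget share of channel i. */
static void split_tx_budget(const uint32_t *ch_rate, int n,
			    uint32_t link_rate, int *budget_out)
{
	uint32_t consumed_rate = 0, bigest_rate = 0, max_rate = link_rate;
	int rlim_ch_num = 0, ch_budget = 0, bigest_rate_ch = 0;
	int budget, i;

	assert(ch_rate && budget_out && n >= 0);

	/* Sum the rates of the rate-limited channels. */
	for (i = 0; i < n; i++) {
		if (!ch_rate[i])
			continue;
		rlim_ch_num++;
		consumed_rate += ch_rate[i];
	}

	if (n == rlim_ch_num) {
		/* Every channel is limited: split relative to their sum. */
		max_rate = consumed_rate;
	} else {
		/* Unlimited channels share what the limited ones leave.
		 * This divides by max_rate, so the link speed must be known. */
		ch_budget = (consumed_rate * POLL_WEIGHT) / max_rate;
		ch_budget = (POLL_WEIGHT - ch_budget) / (n - rlim_ch_num);
		bigest_rate = (max_rate - consumed_rate) / (n - rlim_ch_num);
	}

	budget = POLL_WEIGHT;
	for (i = 0; i < n; i++) {
		if (ch_rate[i]) {
			budget_out[i] = (ch_rate[i] * POLL_WEIGHT) / max_rate;
			if (!budget_out[i])
				budget_out[i] = 1;
			if (ch_rate[i] > bigest_rate) {
				bigest_rate_ch = i;
				bigest_rate = ch_rate[i];
			}
		} else {
			budget_out[i] = ch_budget;
			if (!bigest_rate_ch)
				bigest_rate_ch = i;
		}
		budget -= budget_out[i];
	}

	/* Hand any rounding remainder to the selected channel. */
	if (budget)
		budget_out[bigest_rate_ch] += budget;
}
```

With four channels, one of them limited to a quarter of the link rate, each channel ends up with a budget of 16. The division by max_rate in the mixed case is also why the split only makes sense once the link speed is known, which matches the motivation given in the commit message.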

[PATCH] drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links

2016-12-05 Thread Alexandru Gagniuc
Support for setting the RGMII_IDMODE bit was added in commit:
"drivers: net: cpsw-phy-sel: add support to configure rgmii internal delay"
However, that commit did not add the symmetrical clearing of the bit
by way of setting it in "mask". Add it here.

Note that the documentation marks clearing this bit as "reserved",
however, according to TI, support for delaying the clock does exist in
the MAC, although it is not officially supported.
We tested this on a board with an RGMII to RGMII link that will not
work unless this bit is cleared.

Signed-off-by: Alexandru Gagniuc 
---
 drivers/net/ethernet/ti/cpsw-phy-sel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ti/cpsw-phy-sel.c b/drivers/net/ethernet/ti/cpsw-phy-sel.c
index ba1e45f..1801364 100644
--- a/drivers/net/ethernet/ti/cpsw-phy-sel.c
+++ b/drivers/net/ethernet/ti/cpsw-phy-sel.c
@@ -81,6 +81,7 @@ static void cpsw_gmii_sel_am3352(struct cpsw_phy_sel_priv *priv,
};
 
mask = GMII_SEL_MODE_MASK << (slave * 2) | BIT(slave + 6);
+   mask |= BIT(slave + 4);
mode <<= slave * 2;
 
if (priv->rmii_clock_external) {
-- 
2.7.4
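
The effect of the extra mask bit is easiest to see on a plain integer. The sketch below assumes the driver applies the usual read-modify-write sequence (reg &= ~mask; reg |= mode;), which is not visible in the hunk above. gmii_sel_update() and its idmode_in_mask switch are hypothetical names, and GMII_SEL_MODE_MASK is assumed to cover two mode bits per slave.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))
#define GMII_SEL_MODE_MASK	0x3u	/* assumption: two mode bits per slave */

/* Returns the new register value for one slave.
 * rgmii_id:       request the RGMII_IDMODE bit for this slave.
 * idmode_in_mask: 1 mimics the patched mask, 0 the old one. */
static uint32_t gmii_sel_update(uint32_t reg, int slave, uint32_t mode,
				int rgmii_id, int idmode_in_mask)
{
	uint32_t mask;

	assert(slave == 0 || slave == 1);

	mask = GMII_SEL_MODE_MASK << (slave * 2) | BIT(slave + 6);
	if (idmode_in_mask)
		mask |= BIT(slave + 4);	/* the line added by the patch */
	mode <<= slave * 2;

	if (rgmii_id)
		mode |= BIT(slave + 4);

	/* Assumed read-modify-write: only bits in mask can be cleared. */
	reg &= ~mask;
	reg |= mode;
	return reg;
}
```

Starting from a register that already has bit 4 set, the old mask leaves that bit in place even when rgmii_id is 0; with the bit included in the mask it is cleared, which is the symmetry the commit message describes.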



Re: commit : ppp: add rtnetlink device creation support - breaks netcf on my machine.

2016-12-05 Thread Brad Campbell

On 06/12/16 01:53, Guillaume Nault wrote:


Can you send a minimal configuration file that triggers the bug?
I've set up a virtual machine (Linux 4.7.0, netcf 0.2.8 backported from
Debian Sid), but couldn't reproduce the issue so far.



Ok I reproduced this in a VM. I used the stock Debian 7 install netcf 
version 0.1.9 which exhibits the fault just as well, so no backporting 
required.


The server is (as before) a Debian 7.11 box.
I downloaded rp-pppoe and compiled it up on the server.

Moved into the rp-pppoe-3.12/src directory.
Created these two files :
root@srv:~/src/rp-pppoe-3.12/src# cat allip
10.0.1.2-200

root@srv:~/src/rp-pppoe-3.12/src# cat pppoe-server-options
#require-pap
noauth
lcp-echo-interval 10
lcp-echo-failure 2
ms-dns 192.168.2.1
netmask 255.255.255.0
usepeerdns
nobsdcomp
noccp
novj
noipx

And ran the server :
./pppoe-server -I br1 -C isp -S jimmy -L 10.0.1.1 -O 
`pwd`/pppoe-server-options -p `pwd`/allip -q /usr/sbin/pppd -Q 
`pwd`/pppoe -F


The VM is a stock Debian VM installed from the Debian 7.5 netinst and 
upgraded to the latest packages. The VM is run on KVM and the network 
interface is bound to br2.


Install ppp and netcf. Created :
root@debian64:/home/brad# cat /etc/ppp/peers/dsl-provider
debug
noipdefault
noauth
persist
maxfail 0
plugin rp-pppoe.so eth0

And added the following to /etc/network/interfaces:

auto dsl-provider
iface dsl-provider inet ppp
provider dsl-provider

With the stock debian kernel this brings up the ppp0 interface, and 
ncftool works fine.


I compiled up a 4.8.12 kernel with the attached config (static kernel, 
no modules) resulting in :


root@debian64:/home/brad# ifdown dsl-provider
root@debian64:/home/brad# ifconfig
eth0  Link encap:Ethernet  HWaddr 52:54:00:ae:c0:74
  inet addr:192.168.253.50  Bcast:192.168.253.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1841 errors:0 dropped:0 overruns:0 frame:0
  TX packets:674 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:230980 (225.5 KiB)  TX bytes:99559 (97.2 KiB)

lo    Link encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@debian64:/home/brad# ncftool
ncftool> quit
root@debian64:/home/brad# ifup dsl-provider
Plugin rp-pppoe.so loaded.
root@debian64:/home/brad# ifconfig
eth0  Link encap:Ethernet  HWaddr 52:54:00:ae:c0:74
  inet addr:192.168.253.50  Bcast:192.168.253.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1949 errors:0 dropped:0 overruns:0 frame:0
  TX packets:741 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:240006 (234.3 KiB)  TX bytes:110717 (108.1 KiB)

lo    Link encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ppp0  Link encap:Point-to-Point Protocol
  inet addr:10.0.1.3  P-t-P:10.0.1.1  Mask:255.255.255.255
  UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1492  Metric:1
  RX packets:4 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:3
  RX bytes:52 (52.0 B)  TX bytes:46 (46.0 B)

root@debian64:/home/brad# ncftool
Failed to initialize netcf

Hope this helps.

Regards,
Brad
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.8.12 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y

[PATCH iproute2 -net-next] bpf: add initial support for attaching xdp progs

2016-12-05 Thread Daniel Borkmann
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.

Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.

Minimal example:

  clang -target bpf -O2 -Wall -c prog.c -o prog.o
  ip [-force] link set dev em1 xdp obj prog.o   # attaching
  ip [-d] link  # dumping
  ip link set dev em1 xdp off   # detaching

For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/bpf_api.h |  5 +++
 include/bpf_elf.h |  1 +
 include/utils.h   |  7 +++-
 ip/Makefile   |  2 +-
 ip/ipaddress.c|  3 ++
 ip/iplink.c   | 22 +++-
 ip/iplink_xdp.c   | 75 
 ip/xdp.h  |  9 +
 lib/bpf.c |  6 
 man/man8/ip-link.8.in | 95 ++-
 10 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 ip/iplink_xdp.c
 create mode 100644 ip/xdp.h

diff --git a/include/bpf_api.h b/include/bpf_api.h
index 7642623..72578c9 100644
--- a/include/bpf_api.h
+++ b/include/bpf_api.h
@@ -72,6 +72,11 @@
__section(__stringify(ID) "/" __stringify(KEY))
 #endif
 
+#ifndef __section_xdp_entry
+# define __section_xdp_entry   \
+   __section(ELF_SECTION_PROG)
+#endif
+
 #ifndef __section_cls_entry
 # define __section_cls_entry   \
__section(ELF_SECTION_CLASSIFIER)
diff --git a/include/bpf_elf.h b/include/bpf_elf.h
index 36cc988..239a0f3 100644
--- a/include/bpf_elf.h
+++ b/include/bpf_elf.h
@@ -15,6 +15,7 @@
 /* ELF section names, etc */
 #define ELF_SECTION_LICENSE"license"
 #define ELF_SECTION_MAPS   "maps"
+#define ELF_SECTION_PROG   "prog"
 #define ELF_SECTION_CLASSIFIER "classifier"
 #define ELF_SECTION_ACTION "action"
 
diff --git a/include/utils.h b/include/utils.h
index 1b4f939..26c970d 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -239,7 +239,12 @@ ssize_t getcmdline(char **line, size_t *len, FILE *in);
 int makeargs(char *line, char *argv[], int maxargs);
 int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
 
-struct iplink_req;
+struct iplink_req {
+   struct nlmsghdr n;
+   struct ifinfomsg i;
+   char buf[1024];
+};
+
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
char **name, char **type, char **link, char **dev,
int *group, int *index);
diff --git a/ip/Makefile b/ip/Makefile
index 86c8cdc..c8e6c61 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -2,7 +2,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
 rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
 ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o iptoken.o \
 ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
-iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
+iplink_vlan.o link_veth.o link_gre.o iplink_can.o iplink_xdp.o \
 iplink_macvlan.o ipl2tp.o link_vti.o link_vti6.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 50897e6..de64877 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -35,6 +35,7 @@
 #include "utils.h"
 #include "ll_map.h"
 #include "ip_common.h"
+#include "xdp.h"
 #include "color.h"
 
 enum {
@@ -838,6 +839,8 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
if (tb[IFLA_MTU])
fprintf(fp, "mtu %u ", *(int *)RTA_DATA(tb[IFLA_MTU]));
+   if (tb[IFLA_XDP])
+   xdp_dump(fp, tb[IFLA_XDP]);
if (tb[IFLA_QDISC])
fprintf(fp, "qdisc %s ", rta_getattr_str(tb[IFLA_QDISC]));
if (tb[IFLA_MASTER]) {
diff --git a/ip/iplink.c b/ip/iplink.c
index 

[PATCH iproute2 -net-next] bpf: check for owner_prog_type and notify users when differ

2016-12-05 Thread Daniel Borkmann
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 lib/bpf.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index 8a5b84b..f714993 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -273,11 +273,11 @@ static void bpf_map_pin_report(const struct bpf_elf_map *pin,
 }
 
 static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map,
-   int length)
+   int length, enum bpf_prog_type type)
 {
char file[PATH_MAX], buff[4096];
struct bpf_elf_map tmp = {}, zero = {};
-   unsigned int val;
+   unsigned int val, owner_type = 0;
FILE *fp;
 
snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd);
@@ -299,10 +299,19 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map,
tmp.max_elem = val;
else if (sscanf(buff, "map_flags:\t%i", &val) == 1)
tmp.flags = val;
+   else if (sscanf(buff, "owner_prog_type:\t%i", &val) == 1)
+   owner_type = val;
}
 
fclose(fp);
 
+   /* The decision to reject this is on kernel side eventually, but
+* at least give the user a chance to know what's wrong.
+*/
+   if (owner_type && owner_type != type)
+   fprintf(stderr, "Program array map owner types differ: %u (obj) != %u (pin)\n",
+   type, owner_type);
+
if (!memcmp(&tmp, map, length)) {
return 0;
} else {
@@ -818,7 +827,8 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
}
 
ret = bpf_map_selfcheck_pinned(map_fd, &test,
-  offsetof(struct bpf_elf_map, max_elem));
+  offsetof(struct bpf_elf_map, max_elem),
+  type);
if (ret < 0) {
fprintf(stderr, "Map \'%s\' self-check failed!\n", map_path);
goto out_map;
@@ -1300,7 +1310,7 @@ static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
if (fd > 0) {
ret = bpf_map_selfcheck_pinned(fd, map,
   offsetof(struct bpf_elf_map,
-   id));
+   id), ctx->type);
if (ret < 0) {
close(fd);
fprintf(stderr, "Map \'%s\' self-check failed!\n",
-- 
1.9.3
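
The check added above comes down to scanning fdinfo text for one more key and comparing it with the program type being loaded. The following is a minimal userspace sketch of that idea, not the iproute2 code: fdinfo_owner_prog_type() and owner_type_differs() are hypothetical helpers, the input is a string rather than /proc/<pid>/fdinfo/<fd>, and %u is used here where the patch uses %i.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value of the "owner_prog_type:" line in fdinfo-style text,
 * or 0 if no such line is present. */
static unsigned int fdinfo_owner_prog_type(const char *text)
{
	unsigned int val, owner_type = 0;
	const char *line = text;

	assert(text);

	while (line && *line) {
		/* sscanf() matches only at the start of the current line. */
		if (sscanf(line, "owner_prog_type:\t%u", &val) == 1)
			owner_type = val;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return owner_type;
}

/* Mirror of the added condition: warn only if an owner type is recorded
 * and it differs from the type of the program being loaded. */
static int owner_type_differs(const char *text, unsigned int type)
{
	unsigned int owner_type = fdinfo_owner_prog_type(text);

	return owner_type && owner_type != type;
}
```

As in the patch, a missing owner_prog_type line (for example on an older kernel) is treated as "nothing to report" rather than as a mismatch.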



Re: [net-next][PATCH 02/18] RDS: mark few internal functions static to make sparse build happy

2016-12-05 Thread Santosh Shilimkar

On 12/5/2016 1:45 AM, Sergei Shtylyov wrote:

Hello!

On 12/5/2016 9:57 AM, Santosh Shilimkar wrote:


[...]


-void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
+static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
  struct rds_info_iterator *iter,
  struct rds_info_lengths *lens,
  int (*visitor)(struct rds_conn_path *, void *),


   You now need to realign the continuation lines.


Right. Will fix that. Thanks !!

Regards,
Santosh


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Ben Pfaff
On Mon, Dec 05, 2016 at 11:52:47PM +0100, Michał Mirosław wrote:
> On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> > On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > > intact through linux networking stack.
> > > > This appears to change the established Open vSwitch userspace API.  You
> > > > can see that simply from the way that it changes the documentation for
> > > > the userspace API.  If I'm right about that, then this change will break
> > > > all userspace programs that use the Open vSwitch kernel module,
> > > > including Open vSwitch itself.
> > > 
> > > If I understood the code correctly, it does change expected meaning for
> > > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > > be impossible to differentiate this case from the VLAN TCI == 0.
> > > 
> > > I guess this is a problem with OVS API, because it doesn't directly show
> > > the "missing" state of elements, but relies on an "invalid" value.
> > 
> > That particular corner case should not be a huge problem in any case.
> > 
> > The real problem is that this appears to break the common case use of
> > VLANs in Open vSwitch.  After this patch, parse_vlan() in
> > net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> > accelerated version of them or the version in the skb data) into
> > sw_flow_key members.  OK, that's fine on its own.  However, I don't see
> > any corresponding change to the code in flow_netlink.c to compensate for
> > the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> > was always required to be set to 1 in flow matches inside Netlink
> > messages sent from userspace, and the kernel always set it to 1 in
> > corresponding messages sent to userspace.
> > 
> > In other words, if I'm reading this change correctly:
> > 
> > * With a kernel before this change, userspace always had to set
> >   VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
> >   reject the flow match.
> > 
> > * With a kernel after this change, userspace must not set
> >   VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
> >   match but nothing will ever match because packets do not actually
> >   have the CFI bit set.
> > 
> > Take a look at this code that the patch deletes from
> > validate_vlan_from_nlattrs(), for example, and see how it insisted that
> > VLAN_TAG_PRESENT was set:
> > 
> > if (!(tci & htons(VLAN_TAG_PRESENT))) {
> > if (tci) {
> > OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> >   (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> > /* Corner case for truncated VLAN header. */
> > OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> >   (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > }
> > }
> > 
> > Please let me know if I'm overlooking something.
> 
> Hmm. So the easiest change without disrupting current userspace, would be
> to flip the CFI bit on the way to/from OVS userspace. Does this seem
> correct?

That sounds correct.  (The bit should not be flipped in the mask.)

Thanks,

Ben.
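
A minimal sketch of one reading of "flip the CFI bit on the way to/from OVS userspace", assuming the TCI is held in host byte order and that "flip" means toggling bit 0x1000 in the key while leaving the mask untouched. The helper names are hypothetical and this is not the code that was eventually proposed for flow_netlink.c.

```c
#include <stdint.h>

#define VLAN_CFI_MASK 0x1000u /* DEI/CFI bit of the 16-bit TCI, host order */

/* Hypothetical translation helper: toggle the CFI bit of a TCI *key*
 * when it crosses the kernel/userspace boundary. Applying it twice
 * returns the original value.
 */
static uint16_t tci_key_flip_cfi(uint16_t tci)
{
	return (uint16_t)(tci ^ VLAN_CFI_MASK);
}

/* The mask is passed through unchanged, as noted above. */
static uint16_t tci_mask_passthrough(uint16_t mask)
{
	return mask;
}
```

Because the toggle is its own inverse, the same helper would serve both directions, so existing userspace that sets the bit to 1 would map onto a kernel-side key with the bit clear.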


Re: [patch net v3] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread David Miller
From: Nikita Yushchenko 
Date: Mon,  5 Dec 2016 20:41:01 +0300

> Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
> introduced unconditional statistics-related actions.

I do not see this commit in any of my trees:

[davem@localhost net-next]$ git describe 4dfb80d18d05
fatal: Not a valid object name 4dfb80d18d05
[davem@localhost net-next]$ cd ../net
[davem@localhost net]$ git describe 4dfb80d18d05
fatal: Not a valid object name 4dfb80d18d05
[davem@localhost net]$


Re: [PATCH net-next 4/7] liquidio CN23XX: VF scatter gather lists

2016-12-05 Thread David Miller
From: Raghu Vatsavayi 
Date: Mon, 5 Dec 2016 01:15:15 -0800

> + kfree((void *)lio->glist);
> + kfree((void *)lio->glist_lock);
> +}
 ...
> + if (!lio->glist) {
> + kfree((void *)lio->glist_lock);
> + return 1;
> + }

These void casts are unnecessary, please remove them.
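
A small userspace illustration of the point, using free() as a stand-in for kfree(): in C any object pointer converts to a void pointer implicitly, so the cast adds nothing. The struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's gather-list entry. */
struct glist_entry {
	int id;
};

/* Allocate and release an array of entries. Returns 0 on success,
 * 1 if the allocation failed.
 */
static int alloc_and_release(size_t n)
{
	struct glist_entry *glist = calloc(n, sizeof(*glist));

	if (!glist)
		return 1;

	glist[0].id = 42;
	/* No (void *) cast needed: the conversion is implicit. */
	free(glist);
	return 0;
}
```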


Re: [v3 PATCH] netlink: Do not schedule work from sk_destruct

2016-12-05 Thread David Miller
From: Herbert Xu 
Date: Mon, 5 Dec 2016 15:28:21 +0800

> It is wrong to schedule a work from sk_destruct using the socket
> as the memory reserve because the socket will be freed immediately
> after the return from sk_destruct.
> 
> Instead we should do the deferral prior to sk_free.
> 
> This patch does just that.
> 
> Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
> Signed-off-by: Herbert Xu 

Applied, thanks Herbert.


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-05 Thread Andy Lutomirski
On Mon, Dec 5, 2016 at 4:28 PM, John Stultz  wrote:
> On Tue, Nov 22, 2016 at 4:57 PM, John Stultz  wrote:
>> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski  wrote:
>>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>>>  wrote:
>>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>> I hate to say it, but I think I may see a problem.  Current
>>>>> developments are afoot to make cgroups do more than resource control.
>>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>>> filter thing.  Current cgroup controllers can mostly just DoS their
>>>>> controlled processes.  These new controllers (or controller-like
>>>>> things) can exfiltrate data and change semantics.
>>>>>
>>>>> Does anyone have a security model in mind for these controllers and
>>>>> the cgroups that they're attached to?  I'm reasonably confident that
>>>>> CAP_SYS_RESOURCE is not the answer...
>>>>
>>>> and specifically the answer is... ?
>>>> Also would be great if you start with specifying the question first
>>>> and the problem you're trying to solve.
>>>>
>>>
>>> I don't have a good answer right now.  Here are some constraints, though:
>>>
>>> 1. An insufficiently privileged process should not be able to move a
>>> victim into a dangerous cgroup.
>>>
>>> 2. An insufficiently privileged process should not be able to move
>>> itself into a dangerous cgroup and then use execve to gain privilege
>>> such that the execve'd program can be compromised.
>>>
>>> 3. An insufficiently privileged process should not be able to make an
>>> existing cgroup dangerous in a way that could compromise a victim in
>>> that cgroup.
>>>
>>> 4. An insufficiently privileged process should not be able to make a
>>> cgroup dangerous in a way that bypasses protections that would
>>> otherwise protect execve() as used by itself or some other process in
>>> that cgroup.
>>>
>>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>>> addition to the cgroup being controlled.
>>
>> Sorry for taking a while to get back to you here.  I'm a little
>> befuddled as to what next steps I should consider (and honestly, I'm
>> not totally sure I really grok your concern here, particularly what
>> you mean with "dangerous cgroups").
>>
>> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
>> separate "sufficiently" from "insufficiently privileged") better?
>>
>> Or something closer to the original method Android used of each cgroup
>> having an allow_attach() check which could determine what is
>> sufficiently privileged for the respective level of danger the cgroup
>> might pose?
>>
>> Or just stepping back, what method would you imagine to be reasonable
>> to allow a specified task to migrate other tasks between cgroups
>> without it having to be root/suid?
>
> Any suggested feedback here?

I really don't know.  The cgroupfs interface is a bit unfortunate in
that it doesn't really express the constraints.  To safely migrate a
task, ISTM you ought to have some form of privilege over the task
*and* some form of privilege over the cgroup.  cgroupfs only handles
the latter.

CAP_CGROUP_MIGRATE ought to be okay.  Or maybe cgroupfs needs to gain
a concept of "dangerous" cgroups and further restrict them and
CAP_SYS_RESOURCE should be fine for non-dangerous cgroups?  I think I
favor the latter, but it might be nice to hear from Tejun first.

--Andy
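
To make the two-sided check concrete, here is a purely illustrative model of the rule sketched above: migration needs privilege over the task and over the cgroup, and a "dangerous" cgroup needs something stronger on top. Every name in it is hypothetical; none of this is an existing kernel interface or a proposal for one.

```c
#include <stdbool.h>

/* Hypothetical model of a migration request, not a kernel structure. */
struct migrate_request {
	bool privileged_over_task;    /* caller may act on the target task */
	bool privileged_over_cgroup;  /* caller may write the destination cgroup */
	bool cgroup_is_dangerous;     /* destination can change semantics or leak data */
	bool has_elevated_capability; /* stand-in for a stronger capability check */
};

/* Both kinds of privilege are required; a dangerous destination
 * additionally requires the elevated capability.
 */
static bool may_migrate(const struct migrate_request *req)
{
	if (!req->privileged_over_task || !req->privileged_over_cgroup)
		return false;
	if (req->cgroup_is_dangerous && !req->has_elevated_capability)
		return false;
	return true;
}
```

What counts as "privilege over the task" and how a cgroup gets marked dangerous are exactly the open questions in this thread; the model only shows how the pieces would combine.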


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-05 Thread John Stultz
On Tue, Nov 22, 2016 at 4:57 PM, John Stultz  wrote:
> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski  wrote:
>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:

>>>> I hate to say it, but I think I may see a problem.  Current
>>>> developments are afoot to make cgroups do more than resource control.
>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>> filter thing.  Current cgroup controllers can mostly just DoS their
>>>> controlled processes.  These new controllers (or controller-like
>>>> things) can exfiltrate data and change semantics.
>>>>
>>>> Does anyone have a security model in mind for these controllers and
>>>> the cgroups that they're attached to?  I'm reasonably confident that
>>>> CAP_SYS_RESOURCE is not the answer...
>>>
>>> and specifically the answer is... ?
>>> Also would be great if you start with specifying the question first
>>> and the problem you're trying to solve.
>>>
>>
>> I don't have a good answer right now.  Here are some constraints, though:
>>
>> 1. An insufficiently privileged process should not be able to move a
>> victim into a dangerous cgroup.
>>
>> 2. An insufficiently privileged process should not be able to move
>> itself into a dangerous cgroup and then use execve to gain privilege
>> such that the execve'd program can be compromised.
>>
>> 3. An insufficiently privileged process should not be able to make an
>> existing cgroup dangerous in a way that could compromise a victim in
>> that cgroup.
>>
>> 4. An insufficiently privileged process should not be able to make a
>> cgroup dangerous in a way that bypasses protections that would
>> otherwise protect execve() as used by itself or some other process in
>> that cgroup.
>>
>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>> addition to the cgroup being controlled.
>
> Sorry for taking a while to get back to you here.  I'm a little
> befuddled as to what next steps I should consider (and honestly, I'm
> not totally sure I really grok your concern here, particularly what
> you mean with "dangerous cgroups").
>
> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> separate "sufficiently" from "insufficiently privileged") better?
>
> Or something closer to the original method Android used of each cgroup
> having an allow_attach() check which could determine what is
> sufficiently privileged for the respective level of danger the cgroup
> might pose?
>
> Or just stepping back, what method would you imagine to be reasonable
> to allow a specified task to migrate other tasks between cgroups
> without it having to be root/suid?

Any suggested feedback here?

thanks
-john


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Willem de Bruijn  wrote:
> While we're discussing the patch, another question, about revisions: I
> tested both modified and original iptables binaries on both standard
> and modified kernels. It all works as expected, except for the case
> where both binaries are used on a single kernel. For instance:
> 
>   iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port 8000'`" -j LOG
>   ./iptables.new -L
> 
> Here the new binary will interpret the object as xt_bpf_match_v1, but
> iptables has inserted xt_bpf_match. The same problem happens the other
> way around. A new binary can be made robust to detect old structs, but
> not the other way around. Specific to bpf, the existing xt_bpf code
> has an unfortunate bug that it always prints at least one line of
> code, even if ->bpf_program_num_elems == 0.
> 
> I notice that other extensions also do not necessarily only extend
> struct vN in vN+1. Is the above a known issue?

Yes, I guess no one ever bothered to fix this.

The kernel blob should contain the match/target revision number,
so userspace can in fact see that 'this is bpf v42', but iirc
the netfilter userspace just loads the highest userspace revision
supported by the kernel (which is then different for the 2 iptables
binaries).

But we *could* display a message like 'kernel uses revision 2 but I can
only find 0 and 1' or fall back to the lower supported revision without
guess-the-struct-by-size games.
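
A rough sketch of the lookup described above, assuming userspace can read the revision number from the kernel blob and has a list of the revisions it implements. find_revision() is a hypothetical helper, not an existing libxtables function.

```c
#include <stddef.h>

/* Hypothetical helper: given the revision stored in the kernel blob and
 * the revisions this userspace binary knows, return the index of the
 * exact match, or -1 so the caller can report the mismatch instead of
 * guessing the struct layout by size.
 */
static int find_revision(unsigned int kernel_rev,
			 const unsigned int *known, size_t n_known)
{
	size_t i;

	for (i = 0; i < n_known; i++)
		if (known[i] == kernel_rev)
			return (int)i;
	return -1;
}
```

With known revisions 0 and 1, a blob carrying revision 2 yields -1, which is the point where a message like the one quoted above could be printed.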


Oops with CONFIG_VMAP_STACK and bond device + virtio-net

2016-12-05 Thread Laura Abbott
Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1401612
In qemu with two virtio-net interfaces:

$ ip l
...
5: ens14:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:41 brd ff:ff:ff:ff:ff:ff
6: ens15:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:42 brd ff:ff:ff:ff:ff:ff

$ sudo ip link add bond1 type bond
$ sudo ip link set ens14 master bond1
Segmentation fault

 [ cut here ]
 kernel BUG at ./include/linux/scatterlist.h:140!
 invalid opcode:  [#1] SMP
 Modules linked in: bonding ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
xt_conntrack ip
  ata_generic crc32c_intel qxl drm_kms_helper virtio_pci serio_raw ttm drm 
pata_acpi
 CPU: 5 PID: 1983 Comm: ip Not tainted 4.9.0-0.rc6.git2.1.fc26.x86_64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
 task: 9d50a3583240 task.stack: b06e4104
 RIP: 0010:[]  [] sg_init_one+0x8c/0xa0
 RSP: 0018:b06e41043698  EFLAGS: 00010246
 RAX:  RBX: b06e41043774 RCX: 0028
 RDX: 131ec1043774 RSI: 0013 RDI: b06ec1043774
 RBP: b06e410436b0 R08: 001ddbe0 R09: b06e410436c8
 R10: 0001 R11:  R12: 0006
 R13: b06e410436c8 R14: 9d50b2dc1800 R15: 9d50b3db9600
 FS:  7f15347e5700() GS:9d50bb00() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7ffc09bc4000 CR3: 000135797000 CR4: 000406e0
 Stack:
  9d50b229d000  b06e41043772 b06e41043720
  c0051123 9d50a3583240 87654321 0002
     7b8f5301
 Call Trace:
  [] virtnet_set_mac_address+0xb3/0x140 [virtio_net]
  [] dev_set_mac_address+0x55/0xc0
  [] bond_enslave+0x34e/0x1180 [bonding]
  [] do_setlink+0x6cf/0xd10
  [] ? get_page_from_freelist+0x6ba/0xca0
  [] ? sched_clock+0x9/0x10
  [] ? kvm_sched_clock_read+0x25/0x40
  [] ? __lock_acquire+0x346/0x1290
  [] ? nla_parse+0xa6/0x120
  [] rtnl_newlink+0x5c8/0x870
  [] ? avc_has_perm_noaudit+0x32/0x210
  [] ? ns_capable_common+0x7a/0x90
  [] ? ns_capable+0x13/0x20
  [] rtnetlink_rcv_msg+0xe6/0x210
  [] ? rtnetlink_rcv+0x1b/0x40
  [] ? rtnetlink_rcv+0x1b/0x40
  [] ? rtnl_newlink+0x870/0x870
  [] netlink_rcv_skb+0xa4/0xc0
  [] rtnetlink_rcv+0x2a/0x40
  [] netlink_unicast+0x1f7/0x2f0
  [] ? netlink_unicast+0x16f/0x2f0
  [] netlink_sendmsg+0x302/0x3c0
  [] sock_sendmsg+0x38/0x50
  [] ___sys_sendmsg+0x2e3/0x2f0
  [] ? __audit_syscall_entry+0xad/0xf0
  [] ? kvm_sched_clock_read+0x25/0x40
  [] ? sched_clock+0x9/0x10
  [] ? __audit_syscall_entry+0xad/0xf0
  [] ? __audit_syscall_entry+0xad/0xf0
  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
  [] __sys_sendmsg+0x54/0x90
  [] SyS_sendmsg+0x12/0x20
  [] do_syscall_64+0x6c/0x1f0
  [] entry_SYSCALL64_slow_path+0x25/0x25
 Code: ca 75 2c 49 8b 55 08 f6 c2 01 75 25 83 e2 03 81 e3 ff 0f 00 00 45 89 65 
14 48
 RIP  [] sg_init_one+0x8c/0xa0
  RSP 
 ---[ end trace 9076d2284efbf735 ]---

This looks like an issue with CONFIG_VMAP_STACK since bond_enslave uses
struct sockaddr from the stack and virtnet_set_mac_address calls
sg_init_one which triggers BUG_ON(!virt_addr_valid(buf));

I know there have been a lot of CONFIG_VMAP_STACK fixes around but I
didn't find this one reported yet.

Thanks,
Laura


Re: wl1251 & mac address & calibration data

2016-12-05 Thread Tony Lindgren
* Pali Rohár <pali.ro...@gmail.com> [161126 09:21]:
> On Thursday 24 November 2016 19:46:01 Aaro Koskinen wrote:
> > Hi,
> > 
> > On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > > Proprietary, signed and closed bootloader NOLO does not support DT.
> > > So for booting you need to append DTS file to kernel image.
> > > 
> > > U-Boot is optional and can be used as intermediate bootloader
> > > between NOLO and kernel. But still it has problems with reading
> > > from nand, so cannot read NVS data nor MAC address.
> > 
> > You could use kexec to pass the fixed DT.
> > 
> > A.
> 
> IIRC it was broken for N900/omap3, no idea if somebody fixed it.

FYI, at least in next-20161205 kexec works on omap3 for me.

Regards,

Tony




Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:29 PM, Willem de Bruijn  wrote:
> On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso  wrote:
>> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
>> [...]
>>> Eric also suggests a private variable to avoid being subject to
>>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>>> length than current PATH_MAX.
>>
>> Good.
>>
>>> FWIW, there is a workaround for users with deeply nested paths: the
>>> path passed does not have to be absolute. It is literally what is
>>> passed on the command line to iptables right now, including relative
>>> addresses.
>>
>> If iptables userspace always expects to have the bpf file repository
>> in some given location (suggesting to have a directory that we specify
>> at ./configure time, similar to what we do with connlabel.conf), then
>> I think we can rely on relative paths. Would this be flexible enough
>> for your usecase?
>
> As long as it accepts relative paths, I think it will always work.
> Worst case, a user has to cd. No need for hardcoding the bpf mount
> point at compile time.
>
> I have the matching iptables patch for pinned objects, btw. Not for
> elf objects, which requires linking to libelf and parsing the object,
> which is more work (and perhaps best punted on by expanding libbpf in
> bcc to include this functionality. it already exists under samples/bpf
> and iproute2).

While we're discussing the patch, another question, about revisions: I
tested both modified and original iptables binaries on both standard
and modified kernels. It all works as expected, except for the case
where both binaries are used on a single kernel. For instance:

  iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port 8000'`" -j LOG
  ./iptables.new -L

Here the new binary will interpret the object as xt_bpf_match_v1, but
iptables has inserted xt_bpf_match. The same problem happens the other
way around. A new binary can be made robust to detect old structs, but
not the other way around. Specific to bpf, the existing xt_bpf code
has an unfortunate bug that it always prints at least one line of
code, even if ->bpf_program_num_elems == 0.

I notice that other extensions also do not necessarily only extend
struct vN in vN+1. Is the above a known issue?


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
> [...]
>> Eric also suggests a private variable to avoid being subject to
>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>> length than current PATH_MAX.
>
> Good.
>
>> FWIW, there is a workaround for users with deeply nested paths: the
>> path passed does not have to be absolute. It is literally what is
>> passed on the command line to iptables right now, including relative
>> addresses.
>
> If iptables userspace always expects to have the bpf file repository
> in some given location (suggesting to have a directory that we specify
> at ./configure time, similar to what we do with connlabel.conf), then
> I think we can rely on relative paths. Would this be flexible enough
> for your usecase?

As long as it accepts relative paths, I think it will always work.
Worst case, a user has to cd. No need for hardcoding the bpf mount
point at compile time.

I have the matching iptables patch for pinned objects, btw. Not for
elf objects, which requires linking to libelf and parsing the object,
which is more work (and perhaps best punted on by expanding libbpf in
bcc to include this functionality. it already exists under samples/bpf
and iproute2).


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
[...]
> Eric also suggests a private variable to avoid being subject to
> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
> length than current PATH_MAX.

Good.

> FWIW, there is a workaround for users with deeply nested paths: the
> path passed does not have to be absolute. It is literally what is
> passed on the command line to iptables right now, including relative
> addresses.

If iptables userspace always expects to have the bpf file repository
in some given location (suggesting to have a directory that we specify
at ./configure time, similar to what we do with connlabel.conf), then
I think we can rely on relative paths. Would this be flexible enough
for your usecase?


[PATCH next] Revert "dctcp: update cwnd on congestion event"

2016-12-05 Thread Florian Westphal
Neal Cardwell says:
 If I am reading the code correctly, then I would have two concerns:
 1) Has that been tested? That seems like an extremely dramatic
decrease in cwnd. For example, if the cwnd is 80, and there are 40
ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
calculations seem to suggest that after just 11 ACKs the cwnd would be
down to a minimal value of 2 [..]
 2) That seems to contradict another passage in the draft [..] where it
says:
   Just as specified in [RFC3168], DCTCP does not react to congestion
   indications more than once for every window of data.

Neal is right.  Fortunately we don't have to complicate this by testing
vs. current rtt estimate, we can just revert the patch.

Normal stack already handles this for us: receiving ACKs with ECE
set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
adjusted and prr will take care of cwnd adjustment.

Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
Cc: Neal Cardwell 
Signed-off-by: Florian Westphal 
---
 net/ipv4/tcp_dctcp.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index bde22ebb92a8..5f5e5936760e 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -188,8 +188,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
 
 static void dctcp_update_alpha(struct sock *sk, u32 flags)
 {
+   const struct tcp_sock *tp = tcp_sk(sk);
struct dctcp *ca = inet_csk_ca(sk);
-   struct tcp_sock *tp = tcp_sk(sk);
u32 acked_bytes = tp->snd_una - ca->prior_snd_una;
 
/* If ack did not advance snd_una, count dupack as MSS size.
@@ -229,13 +229,6 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
WRITE_ONCE(ca->dctcp_alpha, alpha);
dctcp_reset(tp, ca);
}
-
-   if (flags & CA_ACK_ECE) {
-   unsigned int cwnd = dctcp_ssthresh(sk);
-
-   if (cwnd != tp->snd_cwnd)
-   tp->snd_cwnd = cwnd;
-   }
 }
 
 static void dctcp_state(struct sock *sk, u8 new_state)
-- 
2.7.3
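
To illustrate Neal's first concern, here is a small userspace model of what the reverted hunk did: apply the DCTCP reduction on every ECE-marked ACK. The formula cwnd - ((cwnd * alpha) >> 11), floored at 2, is my understanding of dctcp_ssthresh(); the alpha value of 1024 (the maximum, scaled by 2^10) is an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

#define ALPHA_MAX 1024u /* assumed: alpha at its maximum, scaled by 2^10 */

/* One reduction step: cwnd - ((cwnd * alpha) >> 11), floored at 2. */
static uint32_t reduced_cwnd(uint32_t cwnd, uint32_t alpha)
{
	uint32_t cut = (cwnd * alpha) >> 11;
	uint32_t next = cwnd - cut;

	return next > 2 ? next : 2;
}

/* Apply the reduction once per ECE-marked ACK, as the reverted code did. */
static uint32_t cwnd_after_ece_acks(uint32_t cwnd, uint32_t alpha,
				    unsigned int n_ece_acks)
{
	while (n_ece_acks--)
		cwnd = reduced_cwnd(cwnd, alpha);
	return cwnd;
}
```

Under these assumptions a cwnd of 80 goes 40, 20, 10, 5, 3, 2 over six marked ACKs. If every other ACK is marked, starting with the first, the sixth marked ACK is the 11th ACK overall, which is in line with the estimate quoted above.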



Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

2016-12-05 Thread Lino Sanfilippo

> 
> You mean stmmac_xmit()? That's also softirq AFAICT, it's the TX softirq
> 
> Regards,
> Lino
> 
> 

Hmm. netdevices.txt says:

ndo_start_xmit:
...

Context: Process with BHs disabled or BH (timer),
 will be called with interrupts disabled by netconsole.

...

If this is correct it can indeed be process context, too. However BHs are
already disabled.



Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:00 PM, Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
>> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
>> > Eric Dumazet  wrote:
>> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
>> > > > From: Willem de Bruijn 
>> > > >
>> > > > Add support for attaching an eBPF object by file descriptor.
>> > > >
>> > > > The iptables binary can be called with a path to an elf object or a
>> > > > pinned bpf object. Also pass the mode and path to the kernel to be
>> > > > able to return it later for iptables dump and save.
>> > > >
>> > > > Signed-off-by: Willem de Bruijn 
>> > > > ---
>> > >
>> > > Assuming there is no simple way to get variable matchsize in iptables,
>> > > this looks good to me, thanks.
>> >
>> > It should be possible by setting kernel .matchsize to ~0 which
>> > suppresses strict size enforcement.
>> >
>> > It's currently only used by ebt_among, but this should work for any xtables
>> > module.
>>
>> This is likely going to trigger a large rewrite of the core userspace
>> iptables codebase, and likely going to pull part of the mess we have
>> in ebtables into iptables. So I'd prefer not to follow this path.
>
> So this variable path is there to annotate what userspace claims that
> is the file that contains the bpf blob that was loaded, actually this
> is irrelevant to the kernel, so this is just there to dump it back
> when iptables-save is called. Just a side note, one could set
> anything there from userspace, point somewhere else actually...
>
> Well anyway, going back to the path problem to keep it simple: Why
> not just trim this down to something smaller, are you really
> expecting to reach PATH_MAX in your usecase?

Not often. Module-specific limitations that differ from global
definitions are just a pain when they bite. This module also has an
arbitrary low limit on the length of the cBPF program passed, for
instance.

Eric also suggests a private variable to avoid being subject to
changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
length than current PATH_MAX.

FWIW, there is a workaround for users with deeply nested paths: the
path passed does not have to be absolute. It is literally what is
passed on the command line to iptables right now, including relative
addresses.


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 02:59:09PM -0800, Eric Dumazet wrote:
> On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:
> 
> > Fair enough, I have no objections to the patch.
> 
> An additional question is about PATH_MAX :
> 
> Is it guaranteed to stay at 4096 forever ?
> 
> To be safe, maybe we should use a constant of our own.

Right, this reminds me we have to fix something else.

So constant of our own plus something smaller, if possible, would be
good to go. Thanks.
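
A sketch of what "a constant of our own" could look like: a module-private length that does not move if PATH_MAX ever changes, plus a bounded copy that rejects overlong paths. EXAMPLE_BPF_PATH_MAX, its value of 256, and the struct and helper names are all hypothetical; they are not the names or the size used in the xt_bpf patch.

```c
#include <string.h>

#define EXAMPLE_BPF_PATH_MAX 256 /* hypothetical private limit */

/* Hypothetical stand-in for the part of the match info holding the path. */
struct example_bpf_info {
	char path[EXAMPLE_BPF_PATH_MAX];
};

/* Copy a user-supplied path, rejecting anything that would not fit
 * together with its terminating NUL. Returns 0 on success, -1 otherwise.
 */
static int example_set_path(struct example_bpf_info *info, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(info->path))
		return -1;
	memcpy(info->path, path, len + 1);
	return 0;
}
```

Since the path may be relative, a limit well below 4096 would still leave users a way out: cd closer to the object and pass a shorter path.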


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 5:55 PM, Daniel Borkmann  wrote:
> Hi Willem,
>
> On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn 
>>
>> Add support for attaching an eBPF object by file descriptor.
>>
>> The iptables binary can be called with a path to an elf object or a
>> pinned bpf object. Also pass the mode and path to the kernel to be
>> able to return it later for iptables dump and save.
>>
>> Signed-off-by: Willem de Bruijn 
>
>
> just out of pure curiosity, use case is for android guys wrt
> accounting, or anything specific that cls_bpf on tc ingress +
> egress cannot do already?

That is the immediate motivation, yes.


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet  wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn 
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn 
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > It's currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

So this variable path is there to annotate what userspace claims that
is the file that contains the bpf blob that was loaded, actually this
is irrelevant to the kernel, so this is just there to dump it back
when iptables-save is called. Just a side note, one could set
anything there from userspace, point somewhere else actually...

Well anyway, going back to the path problem to keep it simple: Why
not just trim this down to something smaller, are you really
expecting to reach PATH_MAX in your usecase?


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Eric Dumazet
On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:

> Fair enough, I have no objections to the patch.

An additional question is about PATH_MAX :

Is it guaranteed to stay at 4096 forever ?

To be safe, maybe we should use a constant of our own.




Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Daniel Borkmann

Hi Willem,

On 12/05/2016 09:28 PM, Willem de Bruijn wrote:

From: Willem de Bruijn 

Add support for attaching an eBPF object by file descriptor.

The iptables binary can be called with a path to an elf object or a
pinned bpf object. Also pass the mode and path to the kernel to be
able to return it later for iptables dump and save.

Signed-off-by: Willem de Bruijn 


just out of pure curiosity, use case is for android guys wrt
accounting, or anything specific that cls_bpf on tc ingress +
egress cannot do already?

Thanks,
Daniel


Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

2016-12-05 Thread Lino Sanfilippo
On 05.12.2016 23:40, Pavel Machek wrote:
> On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
>> Hi Pavel,
>> 
>> On 05.12.2016 23:02, Pavel Machek wrote:
>> > 
>> > we need spin_lock_bh at minimum, as we are locking user context
>> > against timer.
>> > 
>> > Best regards,
>> >Pavel
>> > 
>> 
>> I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
>> (one time in the timer handler and one time in napi poll handler) so a spin_lock() should
>> be sufficient. I can't see how this is called from userspace. If it were, a spin_lock_bh()
>> would have to be used, of course.
> 
> stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
> context as far as I can tell. So... spin_lock_bh() at
> minimum... right?
> 
>   Pavel
> 

You mean stmmac_xmit()? That's also softirq AFAICT, it's the TX softirq

Regards,
Lino




Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Michał Mirosław
On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > intact through linux networking stack.
> > > This appears to change the established Open vSwitch userspace API.  You
> > > can see that simply from the way that it changes the documentation for
> > > the userspace API.  If I'm right about that, then this change will break
> > > all userspace programs that use the Open vSwitch kernel module,
> > > including Open vSwitch itself.
> > 
> > If I understood the code correctly, it does change expected meaning for
> > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > be impossible to differentiate this case from the VLAN TCI == 0.
> > 
> > I guess this is a problem with OVS API, because it doesn't directly show
> > the "missing" state of elements, but relies on an "invalid" value.
> 
> That particular corner case should not be a huge problem in any case.
> 
> The real problem is that this appears to break the common case use of
> VLANs in Open vSwitch.  After this patch, parse_vlan() in
> net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> accelerated version of them or the version in the skb data) into
> sw_flow_key members.  OK, that's fine on its own.  However, I don't see
> any corresponding change to the code in flow_netlink.c to compensate for
> the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> was always required to be set to 1 in flow matches inside Netlink
> messages sent from userspace, and the kernel always set it to 1 in
> corresponding messages sent to userspace.
> 
> In other words, if I'm reading this change correctly:
> 
> * With a kernel before this change, userspace always had to set
>   VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
>   reject the flow match.
> 
> * With a kernel after this change, userspace must not set
>   VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
>   match but nothing will ever match because packets do not actually
>   have the CFI bit set.
> 
> Take a look at this code that the patch deletes from
> validate_vlan_from_nlattrs(), for example, and see how it insisted that
> VLAN_TAG_PRESENT was set:
> 
>     if (!(tci & htons(VLAN_TAG_PRESENT))) {
>         if (tci) {
>             OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
>                       (inner) ? "C-VLAN" : "VLAN");
>             return -EINVAL;
>         } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
>             /* Corner case for truncated VLAN header. */
>             OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
>                       (inner) ? "C-VLAN" : "VLAN");
>             return -EINVAL;
>         }
>     }
> 
> Please let me know if I'm overlooking something.

Hmm. So the easiest change that does not disrupt current userspace would be
to flip the CFI bit on the way to/from OVS userspace. Does this seem
correct?
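
The idea of flipping the bit at the userspace boundary can be sketched as a single XOR helper. This is only an illustration under stated assumptions: the helper name `tci_flip_cfi` is hypothetical, the value is taken in host byte order (the Netlink attribute itself is big-endian), and the "no VLAN at all" case (TCI == 0) discussed earlier in the thread is not handled here.

```c
#include <stdint.h>

/* CFI/DEI bit of the 16-bit VLAN TCI, host byte order (IEEE 802.1Q).
 * This is the bit the thread refers to as VLAN_TAG_PRESENT.
 */
#define TCI_CFI_MASK 0x1000u

/* Hypothetical helper: translate a TCI between the kernel-internal view
 * and the existing userspace view by inverting the CFI bit. Applying it
 * twice yields the original value, so one helper serves both directions.
 */
static uint16_t tci_flip_cfi(uint16_t tci_host_order)
{
	return (uint16_t)(tci_host_order ^ TCI_CFI_MASK);
}
```

For example, VLAN ID 100 with the bit clear (0x0064) becomes 0x1064 on the way out, and 0x1064 coming back in becomes 0x0064 again.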

Best Regards,
Michał Mirosław


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet  wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn 
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn 
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > It's currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

Fair enough, I have no objections to the patch.


Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

2016-12-05 Thread Pavel Machek
On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
> Hi Pavel,
> 
> On 05.12.2016 23:02, Pavel Machek wrote:
> > 
> > we need spin_lock_bh at minimum, as we are locking user context
> > against timer.
> > 
> > Best regards,
> > Pavel
> > 
> 
> I was referring to stmmac_tx_clean(), which AFAICS is only called from
> softirq context (once in the timer handler and once in the NAPI poll
> handler), so a spin_lock() should be sufficient. I can't see how this is
> called from userspace. If it were, a spin_lock_bh() would have to be
> used, of course.

stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
context as far as I can tell. So... spin_lock_bh() at
minimum... right?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

2016-12-05 Thread Lino Sanfilippo
Hi Pavel,

On 05.12.2016 23:02, Pavel Machek wrote:
> 
> we need spin_lock_bh at minimum, as we are locking user context
> against timer.
> 
> Best regards,
>   Pavel
> 

I was referring to stmmac_tx_clean(), which AFAICS is only called from
softirq context (once in the timer handler and once in the NAPI poll
handler), so a spin_lock() should be sufficient. I can't see how this is
called from userspace. If it were, a spin_lock_bh() would have to be
used, of course.

Regards,
Lino 


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> Eric Dumazet  wrote:
> > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn 
> > > 
> > > Add support for attaching an eBPF object by file descriptor.
> > > 
> > > The iptables binary can be called with a path to an elf object or a
> > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > able to return it later for iptables dump and save.
> > > 
> > > Signed-off-by: Willem de Bruijn 
> > > ---
> > 
> > Assuming there is no simple way to get variable matchsize in iptables,
> > this looks good to me, thanks.
> 
> It should be possible by setting kernel .matchsize to ~0 which
> suppresses strict size enforcement.
> 
> It's currently only used by ebt_among, but this should work for any xtables
> module.

This is likely going to trigger a large rewrite of the core userspace
iptables codebase, and likely going to pull part of the mess we have
in ebtables into iptables. So I'd prefer not to follow this path.


[PATCH v3 net-next v3 2/4] net: dsa: mv88e6xxx: add helper to hardware reset

2016-12-05 Thread Vivien Didelot
Add a helper to toggle the GPIO connected to the reset pin, if there is one.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1d4d3be..27dfb5d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2356,6 +2356,19 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
mutex_unlock(&chip->reg_lock);
 }
 
+static void mv88e6xxx_hardware_reset(struct mv88e6xxx_chip *chip)
+{
+   struct gpio_desc *gpiod = chip->reset;
+
+   /* If there is a GPIO connected to the reset pin, toggle it */
+   if (gpiod) {
+   gpiod_set_value_cansleep(gpiod, 1);
+   usleep_range(1, 2);
+   gpiod_set_value_cansleep(gpiod, 0);
+   usleep_range(1, 2);
+   }
+}
+
 static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
 {
int i, err;
@@ -2380,7 +2393,6 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
-   struct gpio_desc *gpiod = chip->reset;
unsigned long timeout;
u16 reg;
int err;
@@ -2389,13 +2401,7 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
-   /* If there is a gpio connected to the reset pin, toggle it */
-   if (gpiod) {
-   gpiod_set_value_cansleep(gpiod, 1);
-   usleep_range(1, 2);
-   gpiod_set_value_cansleep(gpiod, 0);
-   usleep_range(1, 2);
-   }
+   mv88e6xxx_hardware_reset(chip);
 
/* Reset the switch. Keep the PPU active if requested. The PPU
 * needs to be active to support indirect phy register access
-- 
2.10.2



[PATCH v3 net-next v3 4/4] net: dsa: mv88e6xxx: add PPU operations

2016-12-05 Thread Vivien Didelot
Some Marvell chips can enable/disable the PPU on demand. This is needed
to access the PHY registers when there is no indirection mechanism.

Add two new ppu_enable and ppu_disable ops to describe this and finally
get rid of the MV88E6XXX_FLAG_PPU* flags.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 76 ---
 drivers/net/dsa/mv88e6xxx/global1.c   | 57 ++
 drivers/net/dsa/mv88e6xxx/global1.h   |  3 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 19 ++---
 4 files changed, 81 insertions(+), 74 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5aae5d7..731d262 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -527,58 +527,18 @@ int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg, u16 update)
 
 static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
 {
-   u16 val;
-   int i, err;
+   if (!chip->info->ops->ppu_disable)
+   return 0;
 
-   err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
-   if (err)
-   return err;
-
-   err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
-val & ~GLOBAL_CONTROL_PPU_ENABLE);
-   if (err)
-   return err;
-
-   for (i = 0; i < 16; i++) {
-   err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
-   if (err)
-   return err;
-
-   usleep_range(1000, 2000);
-   val &= GLOBAL_STATUS_PPU_STATE_MASK;
-   if (val != GLOBAL_STATUS_PPU_STATE_POLLING)
-   return 0;
-   }
-
-   return -ETIMEDOUT;
+   return chip->info->ops->ppu_disable(chip);
 }
 
 static int mv88e6xxx_ppu_enable(struct mv88e6xxx_chip *chip)
 {
-   u16 val;
-   int i, err;
+   if (!chip->info->ops->ppu_enable)
+   return 0;
 
-   err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
-   if (err)
-   return err;
-
-   err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
-val | GLOBAL_CONTROL_PPU_ENABLE);
-   if (err)
-   return err;
-
-   for (i = 0; i < 16; i++) {
-   err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
-   if (err)
-   return err;
-
-   usleep_range(1000, 2000);
-   val &= GLOBAL_STATUS_PPU_STATE_MASK;
-   if (val == GLOBAL_STATUS_PPU_STATE_POLLING)
-   return 0;
-   }
-
-   return -ETIMEDOUT;
+   return chip->info->ops->ppu_enable(chip);
 }
 
 static void mv88e6xxx_ppu_reenable_work(struct work_struct *ugly)
@@ -2746,22 +2706,12 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
u32 upstream_port = dsa_upstream_port(ds);
-   u16 reg;
int err;
 
/* Enable the PHY Polling Unit if present, don't discard any packets,
 * and mask all interrupt sources.
 */
-   err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &reg);
-   if (err < 0)
-   return err;
-
-   reg &= ~GLOBAL_CONTROL_PPU_ENABLE;
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU) ||
-   mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE))
-   reg |= GLOBAL_CONTROL_PPU_ENABLE;
-
-   err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, reg);
+   err = mv88e6xxx_ppu_enable(chip);
if (err)
return err;
 
@@ -3223,6 +3173,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .ppu_enable = mv88e6185_g1_ppu_enable,
+   .ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
 };
 
@@ -3241,6 +3193,8 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
.stats_get_strings = mv88e6095_stats_get_strings,
.stats_get_stats = mv88e6095_stats_get_stats,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .ppu_enable = mv88e6185_g1_ppu_enable,
+   .ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
 };
 
@@ -3311,6 +3265,8 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .ppu_enable = mv88e6185_g1_ppu_enable,
+   .ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
 };
 
@@ -3483,6 +3439,8 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .ppu_enable = 

[PATCH v3 net-next v3 3/4] net: dsa: mv88e6xxx: add a soft reset operation

2016-12-05 Thread Vivien Didelot
Marvell chips have different ways to issue a software reset.

Old chips (such as 88E6060) have a reset bit in an ATU control register.

Newer chips moved this bit to a Global control register. Chips with
controllable PPU should reset the PPU when resetting the switch.

Add a new reset operation to implement these differences and introduce a
mv88e6xxx_software_reset() helper to wrap it conveniently.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  |  72 ++--
 drivers/net/dsa/mv88e6xxx/global1.c   | 121 ++
 drivers/net/dsa/mv88e6xxx/global1.h   |   4 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  15 +++--
 4 files changed, 172 insertions(+), 40 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 27dfb5d..5aae5d7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -545,7 +545,8 @@ static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
return err;
 
usleep_range(1000, 2000);
-   if ((val & GLOBAL_STATUS_PPU_MASK) != GLOBAL_STATUS_PPU_POLLING)
+   val &= GLOBAL_STATUS_PPU_STATE_MASK;
+   if (val != GLOBAL_STATUS_PPU_STATE_POLLING)
return 0;
}
 
@@ -572,7 +573,8 @@ static int mv88e6xxx_ppu_enable(struct mv88e6xxx_chip *chip)
return err;
 
usleep_range(1000, 2000);
-   if ((val & GLOBAL_STATUS_PPU_MASK) == GLOBAL_STATUS_PPU_POLLING)
+   val &= GLOBAL_STATUS_PPU_STATE_MASK;
+   if (val == GLOBAL_STATUS_PPU_STATE_POLLING)
return 0;
}
 
@@ -2356,6 +2358,14 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_software_reset(struct mv88e6xxx_chip *chip)
+{
+   if (chip->info->ops->reset)
+   return chip->info->ops->reset(chip);
+
+   return 0;
+}
+
 static void mv88e6xxx_hardware_reset(struct mv88e6xxx_chip *chip)
 {
struct gpio_desc *gpiod = chip->reset;
@@ -2391,10 +2401,6 @@ static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
 
 static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
-   bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
-   u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
-   unsigned long timeout;
-   u16 reg;
int err;
 
err = mv88e6xxx_disable_ports(chip);
@@ -2403,34 +2409,7 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 
mv88e6xxx_hardware_reset(chip);
 
-   /* Reset the switch. Keep the PPU active if requested. The PPU
-* needs to be active to support indirect phy register access
-* through global registers 0x18 and 0x19.
-*/
-   if (ppu_active)
-   err = mv88e6xxx_g1_write(chip, 0x04, 0xc000);
-   else
-   err = mv88e6xxx_g1_write(chip, 0x04, 0xc400);
-   if (err)
-   return err;
-
-   /* Wait up to one second for reset to complete. */
-   timeout = jiffies + 1 * HZ;
-   while (time_before(jiffies, timeout)) {
-   err = mv88e6xxx_g1_read(chip, 0x00, &reg);
-   if (err)
-   return err;
-
-   if ((reg & is_reset) == is_reset)
-   break;
-   usleep_range(1000, 2000);
-   }
-   if (time_after(jiffies, timeout))
-   err = -ETIMEDOUT;
-   else
-   err = 0;
-
-   return err;
+   return mv88e6xxx_software_reset(chip);
 }
 
 static int mv88e6xxx_serdes_power_on(struct mv88e6xxx_chip *chip)
@@ -3244,6 +3223,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6095_ops = {
@@ -3261,6 +3241,7 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
.stats_get_strings = mv88e6095_stats_get_strings,
.stats_get_stats = mv88e6095_stats_get_stats,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6097_ops = {
@@ -3285,6 +3266,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+   .reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6123_ops = {
@@ -3304,6 +3286,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,

[PATCH v3 net-next v3 0/4] net: dsa: mv88e6xxx: rework reset and PPU code

2016-12-05 Thread Vivien Didelot
Old Marvell chips (like 88E6060) don't have a PHY Polling Unit (PPU).

Later chips (like 88E6185) have a PPU, which has exclusive access to the
PHY registers and thus must be disabled before accessing them.

Newer chips (like 88E6352) have an indirect mechanism to access the PHY
registers at any time, and thus lose control over the PPU (always enabled).

Here's a summary:

Model | PPU? | Has PPU ctrl?  | PPU state readable? | PHY access
----- | ---- | -------------- | ------------------- | ------------------
 6060 | no   | no             | no                  | direct
 6185 | yes  | yes, PPUEn bit | yes, PPUState 2-bit | direct w/ PPU dis.
 6352 | yes  | no             | yes, PPUState 1-bit | indirect
 6390 | yes  | no             | yes, InitState bit  | indirect

Depending on the PPU control, a switch may have to restart the PPU when
resetting the switch. Once the switch is reset, we must wait for the PPU
state to be active polling again before accessing the registers.
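
The "wait for the PPU state" step amounts to a bounded polling loop. The following is a userspace sketch of that pattern only, not the driver code: the accessor type `read_status_fn`, the fake device, and the mask/state values 0xc000 and 0x4000 used with it are assumptions made up for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor; a real driver would read the switch
 * over MDIO. Returns 0 on success and stores the value in *val.
 */
typedef int (*read_status_fn)(void *ctx, uint16_t *val);

/* Poll up to max_tries times until (status & mask) == want.
 * Returns 0 once the state matches, the accessor's error if a read
 * fails, or -ETIMEDOUT if the state never matched.
 */
static int wait_for_state(read_status_fn read_status, void *ctx,
			  uint16_t mask, uint16_t want, int max_tries)
{
	uint16_t val;
	int i, err;

	for (i = 0; i < max_tries; i++) {
		err = read_status(ctx, &val);
		if (err)
			return err;
		if ((val & mask) == want)
			return 0;
		/* A driver would sleep between reads here. */
	}
	return -ETIMEDOUT;
}

/* Fake device for demonstration: reports the wanted state only after
 * a configurable number of reads.
 */
struct fake_switch {
	int reads_until_ready;
};

static int fake_read_status(void *ctx, uint16_t *val)
{
	struct fake_switch *sw = ctx;

	*val = (sw->reads_until_ready-- <= 0) ? 0x4000 : 0x0000;
	return 0;
}
```

Bounding the loop by a try count (or a deadline) keeps a stuck chip from hanging the caller; the timeout error is then propagated like any other register access failure.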

For that purpose, add new operations to the chips to enable/disable the
PPU, and execute software reset. With these new ops in place, rework the
switch reset code and finally get rid of the MV88E6XXX_FLAG_PPU* flags.
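
The per-chip operations replace feature flags with optional function pointers: a chip that lacks a capability simply leaves the pointer unset, and a wrapper treats that as success. A self-contained sketch of the pattern follows; the type and function names (`chip_ops`, `chip_ppu_enable`, `toy_ppu_enable`) are simplified stand-ins, not the driver's definitions.

```c
#include <stddef.h>

struct chip;

/* Per-model operations; a NULL pointer means "not applicable". */
struct chip_ops {
	int (*ppu_enable)(struct chip *chip);
	int (*ppu_disable)(struct chip *chip);
	int (*reset)(struct chip *chip);
};

struct chip {
	const struct chip_ops *ops;
	int ppu_enabled;	/* toy state for the example */
};

/* Wrapper: chips without the op succeed trivially. */
static int chip_ppu_enable(struct chip *chip)
{
	if (!chip->ops->ppu_enable)
		return 0;
	return chip->ops->ppu_enable(chip);
}

/* Toy implementation standing in for a real register write. */
static int toy_ppu_enable(struct chip *chip)
{
	chip->ppu_enabled = 1;
	return 0;
}

/* One model with a controllable PPU, and one without. */
static const struct chip_ops with_ppu_ctrl_ops = { .ppu_enable = toy_ppu_enable };
static const struct chip_ops no_ppu_ctrl_ops = { 0 };
```

Callers never test flags; they call the wrapper unconditionally and the ops table of each model decides what, if anything, happens.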

Changes in v3:
  - consider 6097 as 6352 (no PPU ops and use mv88e6352_g1_reset).

Changes in v2:
  - wait in ppu/reset ops so that ppu_polling is not needed anymore.

Vivien Didelot (4):
  net: dsa: mv88e6xxx: add helper to disable ports
  net: dsa: mv88e6xxx: add helper to hardware reset
  net: dsa: mv88e6xxx: add a soft reset operation
  net: dsa: mv88e6xxx: add PPU operations

 drivers/net/dsa/mv88e6xxx/chip.c  | 176 +++--
 drivers/net/dsa/mv88e6xxx/global1.c   | 178 ++
 drivers/net/dsa/mv88e6xxx/global1.h   |   7 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  34 +++
 4 files changed, 276 insertions(+), 119 deletions(-)

-- 
2.10.2



[PATCH v3 net-next v3 1/4] net: dsa: mv88e6xxx: add helper to disable ports

2016-12-05 Thread Vivien Didelot
Before resetting a switch, the ports should be set to the Disabled state
and the transmit queues should be drained.

Add a helper to make that explicit.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ca453f3..1d4d3be 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2356,6 +2356,26 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
+{
+   int i, err;
+
+   /* Set all ports to the Disabled state */
+   for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
+   err = mv88e6xxx_port_set_state(chip, i,
+  PORT_CONTROL_STATE_DISABLED);
+   if (err)
+   return err;
+   }
+
+   /* Wait for transmit queues to drain,
+* i.e. 2ms for a maximum frame to be transmitted at 10 Mbps.
+*/
+   usleep_range(2000, 4000);
+
+   return 0;
+}
+
 static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
@@ -2364,18 +2384,10 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
unsigned long timeout;
u16 reg;
int err;
-   int i;
 
-   /* Set all ports to the disabled state. */
-   for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
-   err = mv88e6xxx_port_set_state(chip, i,
-  PORT_CONTROL_STATE_DISABLED);
-   if (err)
-   return err;
-   }
-
-   /* Wait for transmit queues to drain. */
-   usleep_range(2000, 4000);
+   err = mv88e6xxx_disable_ports(chip);
+   if (err)
+   return err;
 
/* If there is a gpio connected to the reset pin, toggle it */
if (gpiod) {
-- 
2.10.2
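
The 2 ms drain delay in the comment of the patch above can be checked with a short calculation. The sketch below assumes a 1522-byte maximum frame (a standard Ethernet frame with one VLAN tag); that figure is an assumption of this example, not stated in the patch, and the helper name `frame_time_us` is hypothetical.

```c
#include <stdint.h>

/* Microseconds needed to serialize frame_bytes at rate_mbps, rounded up.
 * 1 Mbps is one bit per microsecond, so time in us = bits / rate_mbps.
 */
static uint32_t frame_time_us(uint32_t frame_bytes, uint32_t rate_mbps)
{
	uint32_t bits = frame_bytes * 8u;

	return (bits + rate_mbps - 1u) / rate_mbps;
}
```

Under that assumption, `frame_time_us(1522, 10)` gives 1218 us, so the 2000 us lower bound of the sleep leaves some margin at 10 Mbps; at 1000 Mbps the same frame takes about 13 us.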



Re: [Intel-wired-lan] [RFC PATCH] i40e: enable PCIe relax ordering for SPARC

2016-12-05 Thread tndave



On 12/05/2016 01:54 PM, Alexander Duyck wrote:

On Mon, Dec 5, 2016 at 9:07 AM, Tushar Dave  wrote:

Unlike the previous generation of NICs (e.g. ixgbe), i40e doesn't seem to
have a standard CSR where PCIe relaxed ordering can be set. Without PCIe
relaxed ordering enabled, i40e performance is significantly lower on SPARC.

This patch sets PCIe relaxed ordering for the SPARC arch by setting the DMA
attr DMA_ATTR_WEAK_ORDERING for every tx and rx DMA map/unmap.
This has shown a 10x increase in performance numbers.

e.g.
iperf TCP test with 10 threads on SPARC S7

Test 1: Without this patch

# iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40926
[  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40934
[  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40930
[  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40928
[  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40922
[  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40932
[ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40920
[ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40924
[ 14] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40982
[ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40980
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-20.0 sec   566 MBytes   237 Mbits/sec
[  5]  0.0-20.0 sec   532 MBytes   223 Mbits/sec
[  6]  0.0-20.0 sec   537 MBytes   225 Mbits/sec
[  8]  0.0-20.0 sec   546 MBytes   229 Mbits/sec
[ 11]  0.0-20.0 sec   592 MBytes   248 Mbits/sec
[  7]  0.0-20.0 sec   539 MBytes   226 Mbits/sec
[  9]  0.0-20.0 sec   572 MBytes   240 Mbits/sec
[ 10]  0.0-20.0 sec   604 MBytes   253 Mbits/sec
[ 14]  0.0-20.0 sec   567 MBytes   238 Mbits/sec
[ 12]  0.0-20.0 sec   511 MBytes   214 Mbits/sec
[SUM]  0.0-20.0 sec  5.44 GBytes  2.33 Gbits/sec

Test 2: With this patch

# iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

TCP: request_sock_TCP: Possible SYN flooding on port 5001. Sending
cookies.  Check SNMP counters.
[  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46876
[  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46874
[  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46872
[  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46880
[  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46878
[  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46884
[ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46886
[ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46890
[ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46888
[ 13] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46882
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-20.0 sec  7.45 GBytes  3.19 Gbits/sec
[  5]  0.0-20.0 sec  7.48 GBytes  3.21 Gbits/sec
[  7]  0.0-20.0 sec  7.34 GBytes  3.15 Gbits/sec
[  8]  0.0-20.0 sec  7.42 GBytes  3.18 Gbits/sec
[  9]  0.0-20.0 sec  7.24 GBytes  3.11 Gbits/sec
[ 10]  0.0-20.0 sec  7.40 GBytes  3.17 Gbits/sec
[ 12]  0.0-20.0 sec  7.49 GBytes  3.21 Gbits/sec
[  6]  0.0-20.0 sec  7.30 GBytes  3.13 Gbits/sec
[ 11]  0.0-20.0 sec  7.44 GBytes  3.19 Gbits/sec
[ 13]  0.0-20.0 sec  7.22 GBytes  3.10 Gbits/sec
[SUM]  0.0-20.0 sec  73.8 GBytes  31.6 Gbits/sec

NOTE: In my testing, this patch does _not_ show any harm to i40e
performance numbers on x86.

Signed-off-by: Tushar Dave 


You went through and replaced all of the dma_unmap/map_page calls with
dma_map/unmap_single_attrs.  I would prefer you didn't do that.  I have

Yes, because currently there is no DMA API for dma_map/unmap_page with
dma attr*

patches to add the ability to map and unmap pages with attributes that
should be available for 4.10-rc1 so if you could wait on this patch
until then it would be preferred.

:-) thanks. I will wait until your patches are out.



---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 69 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 +
 2 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6287bf6..800dca7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -551,15 +551,17 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
else
dev_kfree_skb_any(tx_buffer->skb);
if (dma_unmap_len(tx_buffer, len))
-   dma_unmap_single(ring->dev,
-dma_unmap_addr(tx_buffer, dma),
-

Re: [PATCH v2 net-next v2 4/4] net: dsa: mv88e6xxx: add PPU operations

2016-12-05 Thread Vivien Didelot
Stefan Eichenberger  writes:

> Hi Vivien,
>
> On Mon, Dec 05, 2016 at 11:27:03AM -0500, Vivien Didelot wrote:
>> @@ -3266,6 +3220,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
>>  .g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
>>  .g1_set_egress_port = mv88e6095_g1_set_egress_port,
>>  .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
>> +.ppu_enable = mv88e6185_g1_ppu_enable,
>> +.ppu_disable = mv88e6185_g1_ppu_disable,
>>  .reset = mv88e6185_g1_reset,
>>  };
>
> The mv88e6097 should use indirect access to the PHYs; bit 14 in the g1
> control register is marked as reserved. The datasheet says that
> disabling the PPU is still supported, but indirect access via g2 should
> be used because disabling the PPU is no longer recommended.

Oh, OK, thanks. I'll respin a v3 right away with this removed and with
mv88e6352_g1_reset instead.

Vivien


Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

2016-12-05 Thread Pavel Machek
Hi!

> > 
> > Actually, I was wrong. irqlock protection is needed, since
> > stmmac_tx_clean() is called from timer, and that's interrupt context,
> > as you can confirm using BUG_ON(in_interrupt());
> > 
> 
> in_interrupt() can mean both softirq and hardirq context. In this case it
> means softirq. So I guess you were right before, and no irq locking is needed.

Are you absolutely sure? Because my testing seems to indicate
otherwise (but I may have made a mistake).

According to

https://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/c214.html

we need spin_lock_bh at minimum, as we are locking user context
against timer.
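
The point quoted above, that in_interrupt() covers both softirq and hardirq context, can be illustrated with a simplified model of the kernel's preempt_count bit fields. This is a userspace sketch: the `model_*` names are hypothetical, the field widths are illustrative, and the NMI bits that the real in_interrupt() also checks are left out.

```c
#include <stdbool.h>

/* Simplified model of the preempt_count layout: a softirq field above
 * the preemption counter, and a hardirq field above that.
 */
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define SOFTIRQ_MASK  (0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK  (0xfu << HARDIRQ_SHIFT)

/* Model of in_interrupt(): true in hardirq OR softirq context. */
static bool model_in_interrupt(unsigned int count)
{
	return (count & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* Model of in_irq(): true only in hardirq context. */
static bool model_in_irq(unsigned int count)
{
	return (count & HARDIRQ_MASK) != 0;
}

/* Model of in_softirq(): true when the softirq field is non-zero. */
static bool model_in_softirq(unsigned int count)
{
	return (count & SOFTIRQ_MASK) != 0;
}
```

In this model a timer callback running in softirq context makes model_in_interrupt() true while model_in_irq() stays false, which is why a BUG_ON(in_interrupt()) firing does not by itself show that hardirq-safe locking is needed.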

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




[PATCH v4 net-next 2/2] MAINTAINERS: add entry for slicoss ethernet driver

2016-12-05 Thread Lino Sanfilippo
Add myself as maintainer for the slicoss ethernet driver.

Signed-off-by: Lino Sanfilippo 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6781a3f..bb9af28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -562,6 +562,11 @@ T: git git://linuxtv.org/anttip/media_tree.git
 S: Maintained
 F: drivers/media/usb/airspy/
 
+ALACRITECH GIGABIT ETHERNET DRIVER
+M: Lino Sanfilippo 
+S: Maintained
+F: drivers/net/ethernet/alacritech/*
+
 ALCATEL SPEEDTOUCH USB DRIVER
 M: Duncan Sands 
 L: linux-...@vger.kernel.org
-- 
1.9.1



[PATCH v4 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-12-05 Thread Lino Sanfilippo
Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
interface control) technology. The driver provides basic support without
SLIC for the following devices:

- Mojave cards (single port PCI Gigabit) both copper and fiber
- Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
- Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber

Signed-off-by: Lino Sanfilippo 
---
 drivers/net/ethernet/Kconfig  |1 +
 drivers/net/ethernet/Makefile |1 +
 drivers/net/ethernet/alacritech/Kconfig   |   28 +
 drivers/net/ethernet/alacritech/Makefile  |4 +
 drivers/net/ethernet/alacritech/slic.h|  575 +
 drivers/net/ethernet/alacritech/slicoss.c | 1882 +
 6 files changed, 2491 insertions(+)
 create mode 100644 drivers/net/ethernet/alacritech/Kconfig
 create mode 100644 drivers/net/ethernet/alacritech/Makefile
 create mode 100644 drivers/net/ethernet/alacritech/slic.h
 create mode 100644 drivers/net/ethernet/alacritech/slicoss.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 2ffd634..a4cc87fe 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -21,6 +21,7 @@ source "drivers/net/ethernet/3com/Kconfig"
 source "drivers/net/ethernet/adaptec/Kconfig"
 source "drivers/net/ethernet/aeroflex/Kconfig"
 source "drivers/net/ethernet/agere/Kconfig"
+source "drivers/net/ethernet/alacritech/Kconfig"
 source "drivers/net/ethernet/allwinner/Kconfig"
 source "drivers/net/ethernet/alteon/Kconfig"
 source "drivers/net/ethernet/altera/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 1d349e9..b448027 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_NET_VENDOR_8390) += 8390/
 obj-$(CONFIG_NET_VENDOR_ADAPTEC) += adaptec/
 obj-$(CONFIG_GRETH) += aeroflex/
 obj-$(CONFIG_NET_VENDOR_AGERE) += agere/
+obj-$(CONFIG_NET_VENDOR_ALACRITECH) += alacritech/
 obj-$(CONFIG_NET_VENDOR_ALLWINNER) += allwinner/
 obj-$(CONFIG_NET_VENDOR_ALTEON) += alteon/
 obj-$(CONFIG_ALTERA_TSE) += altera/
diff --git a/drivers/net/ethernet/alacritech/Kconfig b/drivers/net/ethernet/alacritech/Kconfig
new file mode 100644
index 000..09496e1
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/Kconfig
@@ -0,0 +1,28 @@
+config NET_VENDOR_ALACRITECH
+   bool "Alacritech devices"
+   default y
+   ---help---
+ If you have a network (Ethernet) card belonging to this class, say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all the
+ questions about Alacritech devices. If you say Y, you will be asked
+ for your specific device in the following questions.
+
+if NET_VENDOR_ALACRITECH
+
+config SLICOSS
+   tristate "Alacritech Slicoss support"
+   depends on PCI
+   select CRC32
+   ---help---
+ This driver supports Gigabit Ethernet adapters based on the
+ Session Layer Interface (SLIC) technology by Alacritech.
+
+ Supported are Mojave (1 port) and Oasis (1, 2 and 4 port) cards,
+ both copper and fiber.
+
+ To compile this driver as a module, choose M here: the module
+ will be called slicoss. This is recommended.
+
+endif # NET_VENDOR_ALACRITECH
diff --git a/drivers/net/ethernet/alacritech/Makefile b/drivers/net/ethernet/alacritech/Makefile
new file mode 100644
index 000..8790e9e
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/Makefile
@@ -0,0 +1,4 @@
+#
+# Makefile for the Alacritech Slicoss driver
+#
+obj-$(CONFIG_SLICOSS) += slicoss.o
diff --git a/drivers/net/ethernet/alacritech/slic.h b/drivers/net/ethernet/alacritech/slic.h
new file mode 100644
index 000..08931b4
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/slic.h
@@ -0,0 +1,575 @@
+
+#ifndef _SLIC_H
+#define _SLIC_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SLIC_VGBSTAT_XPERR     0x4000
+#define SLIC_VGBSTAT_XERRSHFT  25
+#define SLIC_VGBSTAT_XCSERR    0x23
+#define SLIC_VGBSTAT_XUFLOW    0x22
+#define SLIC_VGBSTAT_XHLEN     0x20
+#define SLIC_VGBSTAT_NETERR    0x0100
+#define SLIC_VGBSTAT_NERRSHFT  16
+#define SLIC_VGBSTAT_NERRMSK   0x1ff
+#define SLIC_VGBSTAT_NCSERR    0x103
+#define SLIC_VGBSTAT_NUFLOW    0x102
+#define SLIC_VGBSTAT_NHLEN     0x100
+#define SLIC_VGBSTAT_LNKERR    0x0080
+#define SLIC_VGBSTAT_LERRMSK   0xff
+#define SLIC_VGBSTAT_LDEARLY   0x86
+#define SLIC_VGBSTAT_LBOFLO    0x85
+#define SLIC_VGBSTAT_LCODERR   0x84
+#define SLIC_VGBSTAT_LDBLNBL   0x83
+#define SLIC_VGBSTAT_LCRCERR   0x82
+#define SLIC_VGBSTAT_LOFLO     0x81
+#define SLIC_VGBSTAT_LUFLO     0x80
+

Gigabit ethernet driver for Alacritechs SLIC devices (v4)

2016-12-05 Thread Lino Sanfilippo
Hi,

this is the fourth version of the slicoss gigabit ethernet driver (which is a
rework of the driver from Alacritech which can currently be found under
drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
Kalahari cards, for both copper and fiber.

If this code is accepted the staging version can be removed.

The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).

v4:
- fix wrong driver name in Kconfig file (reported by Rami Rosen)
- remove unused variable from driver struct (reported by Rami Rosen)
- return "err" instead of 0 in slic_load_rcvseq_firmware() (reported by Rami 
Rosen)
- Fix typos in constants, comments and error message (reported by Markus Böhme)
- fix various warnings concerning signedness (reported by Markus Böhme)
- improve line formatting (reported by Markus Böhme)
- add comment describing the need for SLIC_MAX_TX_COMPLETIONS (suggested by 
Florian Fainelli)
- do not zero out complete rx descriptor (suggested by Florian Fainelli)
- add missing write barrier (reported by Florian Fainelli)
- remove unneeded assignment of net_device to skb (reported by Florian Fainelli)
- use napi_complete_done() instead of napi_complete (suggested by Florian 
Fainelli)
- use napi_schedule_irqoff() instead of napi_schedule (suggested by Florian 
Fainelli)
- do not map error returned by slic_init() to -ENOMEM
- do proper dma syncs before and after rx descriptor status is set to 0
- if after dma sync for CPU rx descriptor is not used return it to HW by means 
of dma sync for device

v3:
- don't add defines to pci_ids.h but instead put them into the driver's header file
(requested by Greg Kroah-Hartman)

v2:
- remove unusual padding in statistic strings (suggested by Andrew Lunn)
- for mdio register and bit names use defines from mii.h instead of own ones
  (suggested by Andrew Lunn)
- remove unused defines
- ensure PCI flush at two more places
- use mmiowb before lock to prevent mmio writes leaking out of lock
- fix some typos in comments
- add copyright and GPL header

Regards,
Lino 



Kernel panic in netfilter 4.8.10 probably on conntrack -L

2016-12-05 Thread Denys Fedoryshchenko

Hi!

I have a quite heavily loaded NAT server (approx. 17 Gbps of traffic) where a
periodic "conntrack -L" seems to trigger a kernel panic about once per day.
I am not sure whether it is triggered exactly when the tool runs, or
just by enabling events.

Here is the panic message:

 [221287.380762] general protection fault:  [#1] SMP
 [221287.381029] Modules linked in: xt_rateest xt_RATEEST nf_conntrack_netlink netconsole configfs tun nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp llc bonding ixgbe dca

 [221287.384913] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10-build-0121 #10
 [221287.385184] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
 [221287.385634] task: 8200b4c0 task.stack: 8200
 [221287.385900] RIP: 0010:[]  [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]

 [221287.386428] RSP: 0018:882fbf603df8  EFLAGS: 00010202
 [221287.386693] RAX:  RBX: 882f96a51da8 RCX:
 [221287.387134] RDX:  RSI: 882fbf603e00 RDI: 0004
 [221287.387575] RBP: 882fbf603e38 R08: ff81822024ff R09: 0004
 [221287.388011] R10: 882fbf603de0 R11: 820050c0 R12: 882f810bf0c0
 [221287.388445] R13:  R14:  R15: 0004
 [221287.388877] FS:  () GS:882fbf60() knlGS:
 [221287.389311] CS:  0010 DS:  ES:  CR0: 80050033
 [221287.389567] CR2: 7faff0bd8978 CR3: 02006000 CR4: 001406f0

 [221287.389998] Stack:
 [221287.390238]  00049f292300
 882f810bf0c0
 
 882f810bf0c0

 [221287.390913]  882f96a51d80
 
 
 820050c8

 [221287.391587]  882fbf603e68
 a0098bd3
 8100
 a0098c85

 [221287.392262] Call Trace:
 [221287.392508]  

 [221287.392579]  [] nf_ct_delete+0x7a/0x12c [nf_conntrack]
 [221287.393082]  [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
 [221287.393351]  [] death_by_timeout+0xd/0xf [nf_conntrack]
 [221287.393617]  [] call_timer_fn.isra.5+0x17/0x6b
 [221287.393881]  [] expire_timers+0x6f/0x7e
 [221287.394134]  [] run_timer_softirq+0x69/0x8b
 [221287.394390]  [] __do_softirq+0xbd/0x1aa
 [221287.394643]  [] irq_exit+0x37/0x7c
 [221287.394898]  [] smp_trace_call_function_single_interrupt+0x2e/0x30
 [221287.395341]  [] smp_call_function_single_interrupt+0x9/0xb
 [221287.395600]  [] call_function_single_interrupt+0x7c/0x90
 [221287.395857]
 [221287.395926]  [] ? mwait_idle+0x64/0x7a
 [221287.396413]  [] arch_cpu_idle+0xa/0xc
 [221287.396665]  [] default_idle_call+0x27/0x29
 [221287.396919]  [] cpu_startup_entry+0x11d/0x1c7
 [221287.397175]  [] rest_init+0x72/0x74
 [221287.397428]  [] start_kernel+0x3ba/0x3c7
 [221287.397681]  [] x86_64_start_reservations+0x2a/0x2c
 [221287.397937]  [] x86_64_start_kernel+0x12a/0x135

 [221287.402124] Code: f2 89 75 d0 75 04 4c 8b 73 08 0f b7 73 10 41 89 ff 4d 89 f1 4d 09 f9 31 c0 49 85 f1 74 67 41 89 d5 89 7d c4 48 8d 75 c8 44 09 f7
 ff 10 89 c2 c1 ea 1f 75 05 4d 85 f6 74 4b 49 83 c4 04 89 45

 [221287.406724] RIP  [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]
 [221287.407234]  RSP
 [221287.407489] ---[ end trace 4b077b9412fc7065 ]---
 [221287.407746] Kernel panic - not syncing: Fatal exception in interrupt

 [221287.408013] Kernel Offset: disabled
 [221287.408270] Rebooting in 5 seconds..
Dec  5 23:17:58 10.0.253.34
Dec  5 23:17:58 10.0.253.34 [221292.408645] ACPI MEMORY or I/O RESET_REG.


Re: [Intel-wired-lan] [RFC PATCH] i40e: enable PCIe relax ordering for SPARC

2016-12-05 Thread Alexander Duyck
On Mon, Dec 5, 2016 at 9:07 AM, Tushar Dave  wrote:
> Unlike previous generation NIC (e.g. ixgbe) i40e doesn't seem to have
> standard CSR where PCIe relaxed ordering can be set. Without PCIe relax
> ordering enabled, i40e performance is significantly low on SPARC.
>
> This patch sets PCIe relax ordering for SPARC arch by setting dma attr
> DMA_ATTR_WEAK_ORDERING for every tx and rx DMA map/unmap.
> This has shown 10x increase in performance numbers.
>
> e.g.
> iperf TCP test with 10 threads on SPARC S7
>
> Test 1: Without this patch
>
> [root@brm-snt1-03 net]# iperf -s
> 
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> 
> [  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40926
> [  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40934
> [  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40930
> [  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40928
> [  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40922
> [  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40932
> [ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40920
> [ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40924
> [ 14] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40982
> [ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40980
> [ ID] Interval   Transfer Bandwidth
> [  4]  0.0-20.0 sec   566 MBytes   237 Mbits/sec
> [  5]  0.0-20.0 sec   532 MBytes   223 Mbits/sec
> [  6]  0.0-20.0 sec   537 MBytes   225 Mbits/sec
> [  8]  0.0-20.0 sec   546 MBytes   229 Mbits/sec
> [ 11]  0.0-20.0 sec   592 MBytes   248 Mbits/sec
> [  7]  0.0-20.0 sec   539 MBytes   226 Mbits/sec
> [  9]  0.0-20.0 sec   572 MBytes   240 Mbits/sec
> [ 10]  0.0-20.0 sec   604 MBytes   253 Mbits/sec
> [ 14]  0.0-20.0 sec   567 MBytes   238 Mbits/sec
> [ 12]  0.0-20.0 sec   511 MBytes   214 Mbits/sec
> [SUM]  0.0-20.0 sec  5.44 GBytes  2.33 Gbits/sec
>
> Test 2: with this patch:
>
> [root@brm-snt1-03 net]# iperf -s
> 
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> 
> TCP: request_sock_TCP: Possible SYN flooding on port 5001. Sending
> cookies.  Check SNMP counters.
> [  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46876
> [  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46874
> [  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46872
> [  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46880
> [  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46878
> [  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46884
> [ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46886
> [ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46890
> [ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46888
> [ 13] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46882
> [ ID] Interval   Transfer Bandwidth
> [  4]  0.0-20.0 sec  7.45 GBytes  3.19 Gbits/sec
> [  5]  0.0-20.0 sec  7.48 GBytes  3.21 Gbits/sec
> [  7]  0.0-20.0 sec  7.34 GBytes  3.15 Gbits/sec
> [  8]  0.0-20.0 sec  7.42 GBytes  3.18 Gbits/sec
> [  9]  0.0-20.0 sec  7.24 GBytes  3.11 Gbits/sec
> [ 10]  0.0-20.0 sec  7.40 GBytes  3.17 Gbits/sec
> [ 12]  0.0-20.0 sec  7.49 GBytes  3.21 Gbits/sec
> [  6]  0.0-20.0 sec  7.30 GBytes  3.13 Gbits/sec
> [ 11]  0.0-20.0 sec  7.44 GBytes  3.19 Gbits/sec
> [ 13]  0.0-20.0 sec  7.22 GBytes  3.10 Gbits/sec
> [SUM]  0.0-20.0 sec  73.8 GBytes  31.6 Gbits/sec
>
> NOTE: In my testing, this patch does _not_ show any harm to i40e
> performance numbers on x86.
>
> Signed-off-by: Tushar Dave 

You went through and replaced all of the dma_unmap/map_page calls with
dma_map/unmap_single_attrs. I would prefer you didn't do that. I have
patches to add the ability to map and unmap pages with attributes that
should be available for 4.10-rc1, so if you could wait on this patch
until then it would be preferred.

> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 69 
> -
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 +
>  2 files changed, 49 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 6287bf6..800dca7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -551,15 +551,17 @@ static void i40e_unmap_and_free_tx_resource(struct 
> i40e_ring *ring,
> else
> dev_kfree_skb_any(tx_buffer->skb);
> if (dma_unmap_len(tx_buffer, len))
> -   dma_unmap_single(ring->dev,
> -

Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Eric Dumazet  wrote:
> On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> > 
> > Add support for attaching an eBPF object by file descriptor.
> > 
> > The iptables binary can be called with a path to an elf object or a
> > pinned bpf object. Also pass the mode and path to the kernel to be
> > able to return it later for iptables dump and save.
> > 
> > Signed-off-by: Willem de Bruijn 
> > ---
> 
> Assuming there is no simple way to get variable matchsize in iptables,
> this looks good to me, thanks.

It should be possible by setting kernel .matchsize to ~0 which
suppresses strict size enforcement.

It's currently only used by ebt_among, but this should work for any xtables
module.
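
For illustration, the size check being referred to can be sketched in plain C
roughly as follows. This is a simplified userspace sketch, not the kernel code:
`struct match_desc`, `MATCHSIZE_ANY` and `check_match_size()` are made-up names,
and the alignment step is left out. The only point is that a match registering
a size of ~0 skips the strict comparison.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a match registration; not the kernel's struct. */
struct match_desc {
	const char *name;
	unsigned int matchsize;	/* ~0U means "variable size, do not enforce" */
};

#define MATCHSIZE_ANY (~0U)

/* Returns true if a userspace-supplied blob of 'size' bytes is acceptable. */
static bool check_match_size(const struct match_desc *m, unsigned int size)
{
	if (m->matchsize == MATCHSIZE_ANY)
		return true;		/* strict enforcement suppressed */

	return m->matchsize == size;	/* otherwise sizes must agree exactly */
}
```

With a check like this, a module that opts out has to validate the length of
its own data, since nothing else does it on its behalf.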


Re: [PATCH v2 net-next v2 3/4] net: dsa: mv88e6xxx: add a soft reset operation

2016-12-05 Thread Stefan Eichenberger
Hi Vivien

On Mon, Dec 05, 2016 at 11:27:02AM -0500, Vivien Didelot wrote:
>  static const struct mv88e6xxx_ops mv88e6097_ops = {
> @@ -3285,6 +3266,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
>   .g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
>   .g1_set_egress_port = mv88e6095_g1_set_egress_port,
>   .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
> + .reset = mv88e6185_g1_reset,
>  };

Because it is not necessary to disable/enable the PPU and bit 14 is
marked as reserved in the datasheet, I think the following should be
used instead:
.reset = mv88e6352_g1_reset

Regards,
Stefan



Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Eric Dumazet
On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Add support for attaching an eBPF object by file descriptor.
> 
> The iptables binary can be called with a path to an elf object or a
> pinned bpf object. Also pass the mode and path to the kernel to be
> able to return it later for iptables dump and save.
> 
> Signed-off-by: Willem de Bruijn 
> ---

Assuming there is no simple way to get variable matchsize in iptables,
this looks good to me, thanks.

Reviewed-by: Eric Dumazet 






Re: [PATCH v2 net-next v2 4/4] net: dsa: mv88e6xxx: add PPU operations

2016-12-05 Thread Stefan Eichenberger
Hi Vivien,

On Mon, Dec 05, 2016 at 11:27:03AM -0500, Vivien Didelot wrote:
> @@ -3266,6 +3220,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
>   .g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
>   .g1_set_egress_port = mv88e6095_g1_set_egress_port,
>   .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
> + .ppu_enable = mv88e6185_g1_ppu_enable,
> + .ppu_disable = mv88e6185_g1_ppu_disable,
>   .reset = mv88e6185_g1_reset,
>  };

The mv88e6097 should use the indirect access to the phys, bit 14 in g1
control is marked as reserved. They write in the datasheet that
disabling the PPU is still supported but indirect access via g2 should
be used because disabling the PPU  is no longer recommended.

Best regards,
Stefan


Re: [PATCH 0/5] cpsw: add per channel shaper configuration

2016-12-05 Thread Ivan Khoronzhuk
On Mon, Dec 05, 2016 at 02:33:40PM -0600, Grygorii Strashko wrote:
> Hi Ivan,
> 
> On 11/29/2016 09:00 AM, Ivan Khoronzhuk wrote:
> > This series is intended to allow user to set rate for per channel
> > shapers at cpdma level. This patchset doesn't have impact on performance.
> > The rate can be set with:
> > 
> > echo 100 > /sys/class/net/ethX/queues/tx-0/tx_maxrate
> > 
> > Tested on am572xx
> > Based on net-next/master
> > 
> > Ivan Khoronzhuk (5):
> >   net: ethernet: ti: davinci_cpdma: add weight function for channels
> >   net: ethernet: ti: davinci_cpdma: add set rate for a channel
> >   net: ethernet: ti: cpsw: add .ndo to set per-queue rate
> >   net: ethernet: ti: cpsw: optimize end of poll cycle
> >   net: ethernet: ti: cpsw: split tx budget according between channels
> > 
> >  drivers/net/ethernet/ti/cpsw.c  | 264 +++
> >  drivers/net/ethernet/ti/davinci_cpdma.c | 453 
> > 
> >  drivers/net/ethernet/ti/davinci_cpdma.h |   6 +
> >  3 files changed, 624 insertions(+), 99 deletions(-)
> > 
> 
> 
> I've just tried net-next on BBB and got below back-trace:
I hadn't tested on BBB, and expected some review before this was applied.
It seems that's because of the PHY speed; let me finish a fix patch.
Thanks.
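
A minimal sketch of the kind of guard that would avoid the division, assuming
the divisor is a link speed that is still 0 before the PHY reports a link. That
assumption rests only on the guess above; `split_budget_pct()` and its
parameters are hypothetical and not taken from cpsw.c.

```c
/*
 * Hypothetical helper: share of the budget (in percent) for one channel,
 * given its configured rate and the current link speed, both in Mbit/s.
 */
static unsigned int split_budget_pct(unsigned int ch_rate,
				     unsigned int link_speed)
{
	unsigned long long pct;

	if (!link_speed)
		return 0;	/* link not up yet: nothing to divide by */

	pct = (unsigned long long)ch_rate * 100 / link_speed;

	return pct > 100 ? 100 : (unsigned int)pct;	/* cap at the whole budget */
}
```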


> INIT: Entering runlevel: 5
> Configuring network interfaces... [   15.018356] net eth0: initializing cpsw 
> version 1.12 (0)
> [   15.120153] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver 
> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=4a101000.mdio:00, irq=-1)
> [   15.138578] Division by zero in kernel.
> [   15.142667] CPU: 0 PID: 755 Comm: ifconfig Not tainted 
> 4.9.0-rc7-01617-g6ea3f00 #5
> [   15.150277] Hardware name: Generic AM33XX (Flattened Device Tree)
> [   15.156399] Backtrace: 
> [   15.158898] [] (dump_backtrace) from [] 
> (show_stack+0x18/0x1c)
> [   15.166508]  r7: r6:600f0013 r5: r4:c0d395d0
> [   15.172200] [] (show_stack) from [] 
> (dump_stack+0x8c/0xa0)
> [   15.179460] [] (dump_stack) from [] (__div0+0x1c/0x20)
> [   15.186368]  r7: r6:ddf1c010 r5:0001 r4:0001
> [   15.192068] [] (__div0) from [] (Ldiv0+0x8/0x14)
> [   15.198503] [] (cpsw_split_budget [ti_cpsw]) from [] 
> (cpsw_ndo_open+0x4b8/0x5e4 [ti_cpsw])
> [   15.208554]  r10:ddf1c010 r9: r8: r7:ddf1c010 r6:dcc88000 
> r5:dcc88500
> [   15.216418]  r4:ddf1c0b0
> [   15.218985] [] (cpsw_ndo_open [ti_cpsw]) from [] 
> (__dev_open+0xb0/0x114)
> [   15.227466]  r10: r9: r8: r7:dcc88030 r6:bf080364 
> r5:
> [   15.235330]  r4:dcc88000
> [   15.237880] [] (__dev_open) from [] 
> (__dev_change_flags+0x9c/0x14c)
> [   15.245923]  r7:1002 r6:0001 r5:1043 r4:dcc88000
> [   15.251613] [] (__dev_change_flags) from [] 
> (dev_change_flags+0x20/0x50)
> [   15.260093]  r9: r8: r7:dcbabf0c r6:1002 r5:dcc8813c 
> r4:dcc88000
> [   15.267886] [] (dev_change_flags) from [] 
> (devinet_ioctl+0x6d4/0x794)
> [   15.276105]  r9: r8:bee8cc64 r7:dcbabf0c r6: r5:dc921e80 
> r4:
> [   15.283891] [] (devinet_ioctl) from [] 
> (inet_ioctl+0x19c/0x1c8)
> [   15.291587]  r10: r9:dc92 r8:bee8cc64 r7:c0d64bc0 r6:bee8cc64 
> r5:dd2728e0
> [   15.299450]  r4:8914
> [   15.302002] [] (inet_ioctl) from [] 
> (sock_ioctl+0x14c/0x300)
> [   15.309443] [] (sock_ioctl) from [] 
> (do_vfs_ioctl+0xa8/0x98c)
> [   15.316962]  r7:0003 r6:ddf0a780 r5:dd2728e0 r4:bee8cc64
> [   15.322653] [] (do_vfs_ioctl) from [] 
> (SyS_ioctl+0x3c/0x64)
> [   15.33]  r10: r9:dc92 r8:bee8cc64 r7:8914 r6:ddf0a780 
> r5:0003
> [   15.337864]  r4:ddf0a780
> [   15.340416] [] (SyS_ioctl) from [] 
> (ret_fast_syscall+0x0/0x3c)
> [   15.348024]  r9:dc92 r8:c0107f24 r7:0036 r6:bee8cf4d r5:bee8ce4c 
> r4:000949f0
> [   15.361174] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.23.1) started
> 
> 
> -- 
> regards,
> -grygorii


[PATCH net v4] tcp: warn on bogus MSS and try to amend it

2016-12-05 Thread Marcelo Ricardo Leitner
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on rx
path. This causes TCP to assume that the MSS is actually the size of the
aggregated packet, which is invalid.

Although the proper fix is to be done at each driver, it's often hard
and cumbersome for one to debug, come to such root cause and report/fix
it.

This patch amends this situation in two ways. First, it adds a warning
when this situation occurs, so it gives a hint to those trying to
debug this. It also limits the maximum probed MSS to the advertised MSS,
as it should never be any higher than that.

The result is that the connection may not have the best performance ever
but it shouldn't stall, and the admin will have a hint on what to look
for.
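
The two measures can be sketched in plain C as below. This is a userspace
approximation under assumptions, not the patch itself: `struct conn`,
`measure_rcv_mss()` and the `warned` flag are illustrative names, and only the
clamp and the one-time warning are modelled.

```c
#include <stdbool.h>
#include <stdio.h>

struct conn {
	unsigned int rcv_mss;	/* current estimate of the peer's MSS */
	unsigned int advmss;	/* MSS we advertised; estimate must not exceed it */
};

static bool warned;		/* warn only once, globally */

static void measure_rcv_mss(struct conn *c, unsigned int gso_size,
			    unsigned int skb_len)
{
	/* A driver that leaves gso_size at 0 makes us fall back to the
	 * full (aggregated) packet length.
	 */
	unsigned int len = gso_size ? gso_size : skb_len;

	if (len >= c->rcv_mss) {
		/* Never let the estimate grow beyond what we advertised. */
		c->rcv_mss = len < c->advmss ? len : c->advmss;
		if (c->rcv_mss != len && !warned) {
			warned = true;
			fprintf(stderr, "suspect GRO implementation, MSS clamped\n");
		}
	}
}
```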

Tested with virtio by forcing gso_size to 0.

v2: updated msg per David's suggestion
v3: use skb_iif to find the interface and also log its name, per Eric
Dumazet's suggestion. As the skb may be backlogged and the interface
gone by then, we need to check if the number still has a meaning.
v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
David's suggestion

Cc: Jonathan Maxwell 
Signed-off-by: Marcelo Ricardo Leitner 
---
Thanks

 net/ipv4/tcp_input.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..c71d49ce0c9379cd68317bcc135b7a2761110887 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -128,6 +128,23 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define REXMIT_LOST 1 /* retransmit packets marked lost */
 #define REXMIT_NEW 2 /* FRTO-style transmit of unsent/new packets */
 
+static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
+{
+   static bool __once __read_mostly;
+
+   if (!__once) {
+   struct net_device *dev;
+
+   __once = true;
+
+   rcu_read_lock();
+   dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
+   pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
+   dev ? dev->name : "Unknown driver");
+   rcu_read_unlock();
+   }
+}
+
 /* Adapt the MSS value used to make delayed ack decision to the
  * real world.
  */
@@ -144,7 +161,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 */
len = skb_shinfo(skb)->gso_size ? : skb->len;
if (len >= icsk->icsk_ack.rcv_mss) {
-   icsk->icsk_ack.rcv_mss = len;
+   icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
+  tcp_sk(sk)->advmss);
+   if (unlikely(icsk->icsk_ack.rcv_mss != len))
+   tcp_gro_dev_warn(sk, skb);
} else {
/* Otherwise, we make more careful check taking into account,
 * that SACKs block is variable.
-- 
2.9.3



Re: [PATCH net] net: ep93xx_eth: Do not crash unloading module

2016-12-05 Thread David Miller
From: Florian Fainelli 
Date: Sun,  4 Dec 2016 19:22:05 -0800

> When we unload the ep93xx_eth, whether we have opened the network
> interface or not, we will either hit a kernel paging request error, or a
> simple NULL pointer de-reference because:
> 
> - if ep93xx_open has been called, we have created a valid DMA mapping
>   for ep->descs, when we call ep93xx_stop, we also call
>   ep93xx_free_buffers, ep->descs now has a stale value
> 
> - if ep93xx_open has not been called, we have a NULL pointer for
>   ep->descs, so performing any operation against that address just won't
>   work
> 
> Fix this by adding a NULL pointer check for ep->descs which means that
> ep93xx_free_buffers() was able to successfully tear down the descriptors
> and free the DMA cookie as well.
> 
> Fixes: 1d22e05df818 ("[PATCH] Cirrus Logic ep93xx ethernet driver")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.
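
For readers skimming the thread, the pattern described in the changelog can be
sketched in userspace C as below. It is only an illustration under assumptions:
`struct dev_priv`, `dev_open()` and `dev_free_buffers()` are invented names, and
plain heap memory stands in for the DMA descriptors.

```c
#include <stdlib.h>

struct dev_priv {
	int *descs;	/* descriptor memory, NULL while not allocated */
};

/* Allocate the descriptors; returns 0 on success, -1 on failure. */
static int dev_open(struct dev_priv *p)
{
	p->descs = calloc(64, sizeof(*p->descs));
	return p->descs ? 0 : -1;
}

/* Safe to call whether or not dev_open() ever ran, and safe to call twice. */
static void dev_free_buffers(struct dev_priv *p)
{
	if (!p->descs)
		return;		/* never opened, or already torn down */

	free(p->descs);
	p->descs = NULL;	/* do not leave a stale pointer behind */
}
```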


Re: [PATCH] net: calxeda: xgmac: use new api ethtool_{get|set}_link_ksettings

2016-12-05 Thread David Miller
From: Philippe Reynes 
Date: Sun,  4 Dec 2016 23:37:53 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 0/5] cpsw: add per channel shaper configuration

2016-12-05 Thread Grygorii Strashko
Hi Ivan,

On 11/29/2016 09:00 AM, Ivan Khoronzhuk wrote:
> This series is intended to allow user to set rate for per channel
> shapers at cpdma level. This patchset doesn't have impact on performance.
> The rate can be set with:
> 
> echo 100 > /sys/class/net/ethX/queues/tx-0/tx_maxrate
> 
> Tested on am572xx
> Based on net-next/master
> 
> Ivan Khoronzhuk (5):
>   net: ethernet: ti: davinci_cpdma: add weight function for channels
>   net: ethernet: ti: davinci_cpdma: add set rate for a channel
>   net: ethernet: ti: cpsw: add .ndo to set per-queue rate
>   net: ethernet: ti: cpsw: optimize end of poll cycle
>   net: ethernet: ti: cpsw: split tx budget according between channels
> 
>  drivers/net/ethernet/ti/cpsw.c  | 264 +++
>  drivers/net/ethernet/ti/davinci_cpdma.c | 453 
> 
>  drivers/net/ethernet/ti/davinci_cpdma.h |   6 +
>  3 files changed, 624 insertions(+), 99 deletions(-)
> 


I've just tried net-next on BBB and got below back-trace:
INIT: Entering runlevel: 5
Configuring network interfaces... [   15.018356] net eth0: initializing cpsw 
version 1.12 (0)
[   15.120153] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver [SMSC 
LAN8710/LAN8720] (mii_bus:phy_addr=4a101000.mdio:00, irq=-1)
[   15.138578] Division by zero in kernel.
[   15.142667] CPU: 0 PID: 755 Comm: ifconfig Not tainted 
4.9.0-rc7-01617-g6ea3f00 #5
[   15.150277] Hardware name: Generic AM33XX (Flattened Device Tree)
[   15.156399] Backtrace: 
[   15.158898] [] (dump_backtrace) from [] 
(show_stack+0x18/0x1c)
[   15.166508]  r7: r6:600f0013 r5: r4:c0d395d0
[   15.172200] [] (show_stack) from [] 
(dump_stack+0x8c/0xa0)
[   15.179460] [] (dump_stack) from [] (__div0+0x1c/0x20)
[   15.186368]  r7: r6:ddf1c010 r5:0001 r4:0001
[   15.192068] [] (__div0) from [] (Ldiv0+0x8/0x14)
[   15.198503] [] (cpsw_split_budget [ti_cpsw]) from [] 
(cpsw_ndo_open+0x4b8/0x5e4 [ti_cpsw])
[   15.208554]  r10:ddf1c010 r9: r8: r7:ddf1c010 r6:dcc88000 
r5:dcc88500
[   15.216418]  r4:ddf1c0b0
[   15.218985] [] (cpsw_ndo_open [ti_cpsw]) from [] 
(__dev_open+0xb0/0x114)
[   15.227466]  r10: r9: r8: r7:dcc88030 r6:bf080364 
r5:
[   15.235330]  r4:dcc88000
[   15.237880] [] (__dev_open) from [] 
(__dev_change_flags+0x9c/0x14c)
[   15.245923]  r7:1002 r6:0001 r5:1043 r4:dcc88000
[   15.251613] [] (__dev_change_flags) from [] 
(dev_change_flags+0x20/0x50)
[   15.260093]  r9: r8: r7:dcbabf0c r6:1002 r5:dcc8813c 
r4:dcc88000
[   15.267886] [] (dev_change_flags) from [] 
(devinet_ioctl+0x6d4/0x794)
[   15.276105]  r9: r8:bee8cc64 r7:dcbabf0c r6: r5:dc921e80 
r4:
[   15.283891] [] (devinet_ioctl) from [] 
(inet_ioctl+0x19c/0x1c8)
[   15.291587]  r10: r9:dc92 r8:bee8cc64 r7:c0d64bc0 r6:bee8cc64 
r5:dd2728e0
[   15.299450]  r4:8914
[   15.302002] [] (inet_ioctl) from [] 
(sock_ioctl+0x14c/0x300)
[   15.309443] [] (sock_ioctl) from [] 
(do_vfs_ioctl+0xa8/0x98c)
[   15.316962]  r7:0003 r6:ddf0a780 r5:dd2728e0 r4:bee8cc64
[   15.322653] [] (do_vfs_ioctl) from [] 
(SyS_ioctl+0x3c/0x64)
[   15.33]  r10: r9:dc92 r8:bee8cc64 r7:8914 r6:ddf0a780 
r5:0003
[   15.337864]  r4:ddf0a780
[   15.340416] [] (SyS_ioctl) from [] 
(ret_fast_syscall+0x0/0x3c)
[   15.348024]  r9:dc92 r8:c0107f24 r7:0036 r6:bee8cf4d r5:bee8ce4c 
r4:000949f0
[   15.361174] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.23.1) started


-- 
regards,
-grygorii


Re: [PATCH net-next 0/3] Minor BPF cleanups and digest

2016-12-05 Thread David Miller
From: Daniel Borkmann 
Date: Sun,  4 Dec 2016 23:19:38 +0100

> First two patches are minor cleanups, and the third one adds
> a prog digest. For details, please see individual patches.
> After this one, I have a set with tracepoint support that makes
> use of this facility as well.

Series applied, thanks.


Re: [PATCH net-next 0/3] net: ethoc: Misc improvements

2016-12-05 Thread David Miller
From: Florian Fainelli 
Date: Sun,  4 Dec 2016 12:40:27 -0800

> This patch series fixes/improves a few things:
> 
> - implement a proper PHYLIB adjust_link callback to set the duplex mode
>   accordingly
> - do not open code the fetching of a MAC address in OF/DT environments
> - demote an error message that occurs more frequently than expected in low
>   CPU/memory/bandwidth environments
> 
> Tested on a Cirrus Logic EP93xx / TS7300 board.

Series applied, thanks Florian.


[PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
From: Willem de Bruijn 

Add support for attaching an eBPF object by file descriptor.

The iptables binary can be called with a path to an elf object or a
pinned bpf object. Also pass the mode and path to the kernel to be
able to return it later for iptables dump and save.

Signed-off-by: Willem de Bruijn 
---
 include/uapi/linux/netfilter/xt_bpf.h | 21 
 net/netfilter/xt_bpf.c| 96 +--
 2 files changed, 101 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
index 1fad2c2..652d2b6 100644
--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -2,9 +2,11 @@
 #define _XT_BPF_H
 
 #include 
+#include 
 #include 
 
 #define XT_BPF_MAX_NUM_INSTR   64
+#define XT_BPF_MAX_NUM_INSTR_V1 (PATH_MAX / sizeof(struct sock_filter))
 
 struct bpf_prog;
 
@@ -16,4 +18,23 @@ struct xt_bpf_info {
struct bpf_prog *filter __attribute__((aligned(8)));
 };
 
+enum xt_bpf_modes {
+   XT_BPF_MODE_BYTECODE,
+   XT_BPF_MODE_FD_PINNED,
+   XT_BPF_MODE_FD_ELF,
+};
+
+struct xt_bpf_info_v1 {
+   __u16 mode;
+   __u16 bpf_program_num_elem;
+   __s32 fd;
+   union {
+   struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR_V1];
+   char path[PATH_MAX];
+   };
+
+   /* only used in the kernel */
+   struct bpf_prog *filter __attribute__((aligned(8)));
+};
+
 #endif /*_XT_BPF_H */
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index dffee9d47..2dedaa2 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -20,15 +21,15 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS("ipt_bpf");
 MODULE_ALIAS("ip6t_bpf");
 
-static int bpf_mt_check(const struct xt_mtchk_param *par)
+static int __bpf_mt_check_bytecode(struct sock_filter *insns, __u16 len,
+  struct bpf_prog **ret)
 {
-   struct xt_bpf_info *info = par->matchinfo;
struct sock_fprog_kern program;
 
-   program.len = info->bpf_program_num_elem;
-   program.filter = info->bpf_program;
+   program.len = len;
+   program.filter = insns;
 
-   if (bpf_prog_create(&info->filter, &program)) {
+   if (bpf_prog_create(ret, &program)) {
pr_info("bpf: check failed: parse error\n");
return -EINVAL;
}
@@ -36,6 +37,42 @@ static int bpf_mt_check(const struct xt_mtchk_param *par)
return 0;
 }
 
+static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
+{
+   struct bpf_prog *prog;
+
+   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+
+   *ret = prog;
+   return 0;
+}
+
+static int bpf_mt_check(const struct xt_mtchk_param *par)
+{
+   struct xt_bpf_info *info = par->matchinfo;
+
+   return __bpf_mt_check_bytecode(info->bpf_program,
+  info->bpf_program_num_elem,
+  &info->filter);
+}
+
+static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
+{
+   struct xt_bpf_info_v1 *info = par->matchinfo;
+
+   if (info->mode == XT_BPF_MODE_BYTECODE)
+   return __bpf_mt_check_bytecode(info->bpf_program,
+  info->bpf_program_num_elem,
+  &info->filter);
+   else if (info->mode == XT_BPF_MODE_FD_PINNED ||
+info->mode == XT_BPF_MODE_FD_ELF)
+   return __bpf_mt_check_fd(info->fd, &info->filter);
+   else
+   return -EINVAL;
+}
+
 static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
const struct xt_bpf_info *info = par->matchinfo;
@@ -43,31 +80,58 @@ static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
return BPF_PROG_RUN(info->filter, skb);
 }
 
+static bool bpf_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
+{
+   const struct xt_bpf_info_v1 *info = par->matchinfo;
+
+   return !!bpf_prog_run_save_cb(info->filter, (struct sk_buff *) skb);
+}
+
 static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
 {
const struct xt_bpf_info *info = par->matchinfo;
+
+   bpf_prog_destroy(info->filter);
+}
+
+static void bpf_mt_destroy_v1(const struct xt_mtdtor_param *par)
+{
+   const struct xt_bpf_info_v1 *info = par->matchinfo;
+
bpf_prog_destroy(info->filter);
 }
 
-static struct xt_match bpf_mt_reg __read_mostly = {
-   .name   = "bpf",
-   .revision   = 0,
-   .family = NFPROTO_UNSPEC,
-   .checkentry = bpf_mt_check,
-   .match  = bpf_mt,
-   .destroy= bpf_mt_destroy,
-   .matchsize  = sizeof(struct xt_bpf_info),
-   .me = THIS_MODULE,
+static struct 

Re: [PATCH v3 net-next] net_sched: gen_estimator: complete rewrite of rate estimators

2016-12-05 Thread David Miller
From: Eric Dumazet 
Date: Sun, 04 Dec 2016 09:48:16 -0800

> From: Eric Dumazet 
> 
> 1) Old code was hard to maintain, due to complex lock chains.
>(We probably will be able to remove some kfree_rcu() in callers)
> 
> 2) Using a single timer to update all estimators does not scale.
> 
> 3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
>is not supposed to work well)
> 
> In this rewrite :
> 
> - I removed the RB tree that had to be scanned in
>   gen_estimator_active(). qdisc dumps should be much faster.
> 
> - Each estimator has its own timer.
> 
> - Estimations are maintained in net_rate_estimator structure,
>   instead of dirtying the qdisc. Minor, but part of the simplification.
> 
> - Reading the estimator uses RCU and a seqcount to provide proper
>   support for 32bit kernels.
> 
> - We reduce memory need when estimators are not used, since
>   we store a pointer, instead of the bytes/packets counters.
> 
> - xt_rateest_mt() no longer has to grab a spinlock.
>   (In the future, xt_rateest_tg() could be switched to per cpu counters)
> 
> Signed-off-by: Eric Dumazet 
> ---
> v3: Renamed some parameters to please make htmldocs
> v2: Removed unwanted changes to tcp_output.c

This was probably long overdue, thanks for working on this.

Applied, thanks Eric.
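
For readers unfamiliar with the seqcount idea mentioned above for 32bit
kernels, here is a minimal userspace sketch using C11 atomics. It is not the
kernel's seqcount API and not the code from this patch; the names demo_est,
demo_est_write() and demo_est_read() are made up for illustration, and a
single writer is assumed. The point is only that a reader retries until it
sees an even, unchanged sequence number, so it never returns a half-updated
pair of values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical estimator: two values that must be read as a consistent pair. */
struct demo_est {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	_Atomic uint64_t bps;
	_Atomic uint64_t pps;
};

/* Single writer: make seq odd, publish the data, make seq even again. */
static void demo_est_write(struct demo_est *e, uint64_t bps, uint64_t pps)
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no nested or concurrent writers */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&e->bps, bps, memory_order_relaxed);
	atomic_store_explicit(&e->pps, pps, memory_order_relaxed);
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Lock-free reader: retry until seq is even and unchanged across the read. */
static void demo_est_read(struct demo_est *e, uint64_t *bps, uint64_t *pps)
{
	unsigned int s0, s1;

	do {
		s0 = atomic_load_explicit(&e->seq, memory_order_acquire);
		*bps = atomic_load_explicit(&e->bps, memory_order_relaxed);
		*pps = atomic_load_explicit(&e->pps, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&e->seq, memory_order_relaxed);
	} while (s0 != s1 || (s0 & 1));
}
```

The reader takes no lock, which is the property the changelog relies on when
it says xt_rateest_mt() no longer has to grab a spinlock.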


Re: [net PATCH 2/2] ipv4: Drop suffix update from resize code

2016-12-05 Thread Robert Shearman

On 05/12/16 17:28, David Miller wrote:

From: Robert Shearman 
Date: Mon, 5 Dec 2016 15:05:18 +


On 01/12/16 12:27, Alexander Duyck wrote:

It has been reported that update_suffix can be expensive when it is
called
on a large node in which most of the suffix lengths are the same.  The
time
required to add 200K entries had increased from around 3 seconds to
almost
49 seconds.

In order to address this we need to move the code for updating the
suffix
out of resize and instead just have it handled in the cases where we
are
pushing a node that increases the suffix length, or will decrease the
suffix length.

Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Reported-by: Robert Shearman 
Signed-off-by: Alexander Duyck 


$ time sudo ip route restore < ~/allroutes
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists


What are these errors all about?


These are just routes that are already added by the system but are 
present in the dump:


$ ip route showdump < ~/allroutes | grep -v 110.110.110.2
default via 192.168.100.1 dev eth0  proto static  metric 1024
10.37.96.0/20 dev eth2  proto kernel  scope link  src 10.37.96.204
110.110.110.0/24 dev eth1  proto kernel  scope link  src 110.110.110.1
192.168.100.0/24 dev eth0  proto kernel  scope link  src 192.168.100.153

So the errors are expected and are seen both with and without these patches.

Thanks,
Rob


Re: [PATCH V3 net-next] net: hns: Fix to conditionally convey RX checksum flag to stack

2016-12-05 Thread kbuild test robot
Hi Salil,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Salil-Mehta/net-hns-Fix-to-conditionally-convey-RX-checksum-flag-to-stack/20161206-022948
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 
'hns_nic_rx_poll_one':
>> drivers/net/ethernet/hisilicon/hns/hns_enet.c:606:5: warning: 'l3id' may be 
>> used uninitialized in this function [-Wmaybe-uninitialized]
 if ((l3id != HNS_RX_FLAG_L3ID_IPV4) && (l3id != HNS_RX_FLAG_L3ID_IPV6))
^
   drivers/net/ethernet/hisilicon/hns/hns_enet.c:573:6: note: 'l3id' was 
declared here
 u32 l3id;
 ^~~~
>> drivers/net/ethernet/hisilicon/hns/hns_enet.c:618:37: warning: 'l4id' may be 
>> used uninitialized in this function [-Wmaybe-uninitialized]
 if ((l4id != HNS_RX_FLAG_L4ID_TCP) &&
 ~~~^~
 (l4id != HNS_RX_FLAG_L4ID_UDP) &&
 ~~  
   drivers/net/ethernet/hisilicon/hns/hns_enet.c:574:6: note: 'l4id' was 
declared here
 u32 l4id;
 ^~~~

vim +/l3id +606 drivers/net/ethernet/hisilicon/hns/hns_enet.c

   600   * checksum or any other L3/L4 error, we will not (cannot) 
convey
   601   * checksum status for such cases to upper stack and will not 
maintain
   602   * the RX L3/L4 checksum counters as well.
   603   */
   604  
   605  /*  check L3 protocol for which checksum is supported */
 > 606  if ((l3id != HNS_RX_FLAG_L3ID_IPV4) && (l3id != 
 > HNS_RX_FLAG_L3ID_IPV6))
   607  return;
   608  
   609  /* check for any(not just checksum)flagged L3 protocol errors */
   610  if (unlikely(hnae_get_bit(flag, HNS_RXD_L3E_B)))
   611  return;
   612  
   613  /* we do not support checksum of fragmented packets */
   614  if (unlikely(hnae_get_bit(flag, HNS_RXD_FRAG_B)))
   615  return;
   616  
   617  /*  check L4 protocol for which checksum is supported */
 > 618  if ((l4id != HNS_RX_FLAG_L4ID_TCP) &&
   619  (l4id != HNS_RX_FLAG_L4ID_UDP) &&
   620  (l4id != HNS_RX_FLAG_L4ID_SCTP))
   621  return;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


[PATCH v4 05/13] net: ethernet: ti: cpts: fix registration order

2016-12-05 Thread Grygorii Strashko
The ptp clock is registered before the spinlock that protects it is
initialized, and before the timecounter and cyclecounter are
initialized in cpts_register().

So, ensure that the ptp clock is registered last, after everything
else is done.

Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 61198f1..3dda6d5 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -356,15 +356,8 @@ int cpts_register(struct device *dev, struct cpts *cpts,
  u32 mult, u32 shift)
 {
int err, i;
-   unsigned long flags;
 
cpts->info = cpts_info;
-   cpts->clock = ptp_clock_register(&cpts->info, dev);
-   if (IS_ERR(cpts->clock)) {
-   err = PTR_ERR(cpts->clock);
-   cpts->clock = NULL;
-   return err;
-   }
spin_lock_init(&cpts->lock);
 
cpts->cc.read = cpts_systim_read;
@@ -382,15 +375,26 @@ int cpts_register(struct device *dev, struct cpts *cpts,
cpts_write32(cpts, CPTS_EN, control);
cpts_write32(cpts, TS_PEND_EN, int_enable);
 
-   spin_lock_irqsave(&cpts->lock, flags);
timecounter_init(&cpts->tc, &cpts->cc, ktime_to_ns(ktime_get_real()));
-   spin_unlock_irqrestore(&cpts->lock, flags);
 
INIT_DELAYED_WORK(&cpts->overflow_work, cpts_overflow_check);
-   schedule_delayed_work(&cpts->overflow_work, CPTS_OVERFLOW_PERIOD);
 
+   cpts->clock = ptp_clock_register(&cpts->info, dev);
+   if (IS_ERR(cpts->clock)) {
+   err = PTR_ERR(cpts->clock);
+   cpts->clock = NULL;
+   goto err_ptp;
+   }
cpts->phc_index = ptp_clock_index(cpts->clock);
+
+   schedule_delayed_work(&cpts->overflow_work, CPTS_OVERFLOW_PERIOD);
+
return 0;
+
+err_ptp:
+   if (cpts->refclk)
+   cpts_clk_release(cpts);
+   return err;
 }
 EXPORT_SYMBOL_GPL(cpts_register);
 
-- 
2.10.1
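
The ordering rule in this patch (initialize everything first, publish last,
unwind on failure) can be sketched in plain userspace C. This is only an
illustration under assumed names: demo_dev, demo_publish() and
demo_register() are hypothetical and do not appear in cpts.c, and the boolean
fields merely stand in for the lock, the timecounter, the refclk and the ptp
clock registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; none of these names come from cpts.c. */
struct demo_dev {
	bool lock_ready;	/* stands in for spin_lock_init() */
	bool counter_ready;	/* stands in for timecounter_init() */
	bool clk_held;		/* stands in for the enabled refclk */
	bool registered;	/* true once visible to outside callers */
	bool fail_register;	/* test knob: force the last step to fail */
};

/* Stand-in for the externally visible registration step. */
static int demo_publish(struct demo_dev *d)
{
	/* Callers may arrive as soon as we publish, so everything they
	 * rely on must already be initialized. */
	assert(d->lock_ready && d->counter_ready);
	if (d->fail_register)
		return -ENODEV;
	d->registered = true;
	return 0;
}

static int demo_register(struct demo_dev *d)
{
	int err;

	d->clk_held = true;
	d->lock_ready = true;
	d->counter_ready = true;

	/* Publish last, after all internal state is ready. */
	err = demo_publish(d);
	if (err)
		goto err_publish;

	return 0;

err_publish:
	/* Undo what was acquired before the failing step. */
	d->clk_held = false;
	return err;
}
```

If the publish step were moved above the initialization, the assert in
demo_publish() would fire, which is the userspace analogue of the ptp core
calling into the driver before its lock and timecounter exist.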



Re: [PATCH net 0/2] bnx2x: fixes series

2016-12-05 Thread David Miller
From: Yuval Mintz 
Date: Sun, 4 Dec 2016 15:30:16 +0200

> Two unrelated fixes for bnx2x - the first one is nice-to-have,
> while the other fixes fatal behaviour in older adapters.
> 
> Please consider applying them to `net'.

Series applied, thanks.


Re: [PATCH net-next] net/sched: cls_flower: Set the filter Hardware device for all use-cases

2016-12-05 Thread David Miller
From: Hadar Hen Zion 
Date: Sun,  4 Dec 2016 15:25:19 +0200

> Check if the returned device from tcf_exts_get_dev function supports tc
> offload and in case the rule can't be offloaded, set the filter hw_dev
> parameter to the original device given by the user.
> 
> The filter hw_device parameter should always be set by fl_hw_replace_filter
> function, since this pointer is used by dump stats and destroy
> filter for each flower rule (offloaded or not).
> 
> Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress 
> Hardware device')
> Signed-off-by: Hadar Hen Zion 
> Reported-by: Simon Horman 

Applied, thank you.


[PATCH v4 02/13] net: ethernet: ti: allow cpts to be built separately

2016-12-05 Thread Grygorii Strashko
TI CPTS IP is used as part of the TI OMAP CPSW driver, but it is also
present as part of NETCP on TI Keystone 2 SoCs. So it is required
to enable the build of CPTS for both of these drivers, and this can be
achieved by allowing CPTS to be built separately.

Hence, allow cpts to be built separately and convert it to be
a module, as both the CPSW and NETCP drivers can be built as modules.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/Kconfig  |  2 +-
 drivers/net/ethernet/ti/Makefile |  3 ++-
 drivers/net/ethernet/ti/cpsw.c   | 22 +-
 drivers/net/ethernet/ti/cpts.c   | 16 
 drivers/net/ethernet/ti/cpts.h   | 18 ++
 5 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 9904d74..ff7f518 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -74,7 +74,7 @@ config TI_CPSW
  will be called cpsw.
 
 config TI_CPTS
-   bool "TI Common Platform Time Sync (CPTS) Support"
+   tristate "TI Common Platform Time Sync (CPTS) Support"
depends on TI_CPSW
select PTP_1588_CLOCK
---help---
diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
index d420d94..1e7c10b 100644
--- a/drivers/net/ethernet/ti/Makefile
+++ b/drivers/net/ethernet/ti/Makefile
@@ -12,8 +12,9 @@ obj-$(CONFIG_TI_DAVINCI_MDIO) += davinci_mdio.o
 obj-$(CONFIG_TI_DAVINCI_CPDMA) += davinci_cpdma.o
 obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
 obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
+obj-$(CONFIG_TI_CPTS) += cpts.o
 obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
-ti_cpsw-y := cpsw.o cpts.o
+ti_cpsw-y := cpsw.o
 
 obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
 keystone_netcp-y := netcp_core.o
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3f96c57..323174d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1594,7 +1594,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff 
*skb,
return NETDEV_TX_BUSY;
 }
 
-#ifdef CONFIG_TI_CPTS
+#if IS_ENABLED(CONFIG_TI_CPTS)
 
 static void cpsw_hwtstamp_v1(struct cpsw_common *cpsw)
 {
@@ -1742,7 +1742,16 @@ static int cpsw_hwtstamp_get(struct net_device *dev, 
struct ifreq *ifr)
 
return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0;
 }
+#else
+static int cpsw_hwtstamp_get(struct net_device *dev, struct ifreq *ifr)
+{
+   return -EOPNOTSUPP;
+}
 
+static int cpsw_hwtstamp_set(struct net_device *dev, struct ifreq *ifr)
+{
+   return -EOPNOTSUPP;
+}
 #endif /*CONFIG_TI_CPTS*/
 
 static int cpsw_ndo_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
@@ -1755,12 +1764,10 @@ static int cpsw_ndo_ioctl(struct net_device *dev, 
struct ifreq *req, int cmd)
return -EINVAL;
 
switch (cmd) {
-#ifdef CONFIG_TI_CPTS
case SIOCSHWTSTAMP:
return cpsw_hwtstamp_set(dev, req);
case SIOCGHWTSTAMP:
return cpsw_hwtstamp_get(dev, req);
-#endif
}
 
if (!cpsw->slaves[slave_no].phy)
@@ -2100,10 +2107,10 @@ static void cpsw_set_msglevel(struct net_device *ndev, 
u32 value)
priv->msg_enable = value;
 }
 
+#if IS_ENABLED(CONFIG_TI_CPTS)
 static int cpsw_get_ts_info(struct net_device *ndev,
struct ethtool_ts_info *info)
 {
-#ifdef CONFIG_TI_CPTS
struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 
info->so_timestamping =
@@ -2120,7 +2127,12 @@ static int cpsw_get_ts_info(struct net_device *ndev,
info->rx_filters =
(1 << HWTSTAMP_FILTER_NONE) |
(1 << HWTSTAMP_FILTER_PTP_V2_EVENT);
+   return 0;
+}
 #else
+static int cpsw_get_ts_info(struct net_device *ndev,
+   struct ethtool_ts_info *info)
+{
info->so_timestamping =
SOF_TIMESTAMPING_TX_SOFTWARE |
SOF_TIMESTAMPING_RX_SOFTWARE |
@@ -2128,9 +2140,9 @@ static int cpsw_get_ts_info(struct net_device *ndev,
info->phc_index = -1;
info->tx_types = 0;
info->rx_filters = 0;
-#endif
return 0;
 }
+#endif
 
 static int cpsw_get_link_ksettings(struct net_device *ndev,
   struct ethtool_link_ksettings *ecmd)
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index a42c449..8cb0369 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -31,8 +31,6 @@
 
 #include "cpts.h"
 
-#ifdef CONFIG_TI_CPTS
-
 #define cpts_read32(c, r)  readl_relaxed(&c->reg->r)
 #define cpts_write32(c, v, r)  writel_relaxed(v, &c->reg->r)
 
@@ -334,6 +332,7 @@ void cpts_rx_timestamp(struct cpts *cpts, struct sk_buff 
*skb)
memset(ssh, 0, sizeof(*ssh));
ssh->hwtstamp = ns_to_ktime(ns);
 }
+EXPORT_SYMBOL_GPL(cpts_rx_timestamp);
 
 void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb)
 {
@@ -349,13 +348,11 @@ 
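
A note on why the patch above switches from #ifdef CONFIG_TI_CPTS to
IS_ENABLED(CONFIG_TI_CPTS): once the option is tristate and set to "m", only
the _MODULE variant of the symbol is defined, so a plain #ifdef no longer
sees it. The sketch below is a simplified, renamed imitation of that macro
machinery for userspace; the DEMO_ and demo_ names and CONFIG_DEMO_CPTS are
made up, and the real definitions live in include/linux/kconfig.h.

```c
/* Simplified sketch of an IS_ENABLED()-style test; all names hypothetical. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val

/* Expands to 1 if x is defined to 1, otherwise to 0. */
#define DEMO_DEFINED(x) DEMO_DEFINED_(x)
#define DEMO_DEFINED_(val) DEMO_DEFINED__(DEMO_PLACEHOLDER_##val)
#define DEMO_DEFINED__(arg1_or_junk) DEMO_SECOND(arg1_or_junk 1, 0, 0)

/* Expands to 1 if x is 1, otherwise to y. */
#define DEMO_OR(x, y) DEMO_OR_(x, y)
#define DEMO_OR_(x, y) DEMO_OR__(DEMO_PLACEHOLDER_##x, y)
#define DEMO_OR__(arg1_or_junk, y) DEMO_SECOND(arg1_or_junk 1, y, 0)

#define DEMO_IS_BUILTIN(option) DEMO_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	DEMO_OR(DEMO_IS_BUILTIN(option), DEMO_IS_MODULE(option))

/* What a tristate option set to "m" looks like: only _MODULE is defined. */
#define CONFIG_DEMO_CPTS_MODULE 1

/* Returns 1 if a plain #ifdef on the option name sees it. */
static int demo_seen_by_ifdef(void)
{
#ifdef CONFIG_DEMO_CPTS
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 if the IS_ENABLED-style test sees it. */
static int demo_seen_by_is_enabled(void)
{
#if DEMO_IS_ENABLED(CONFIG_DEMO_CPTS)
	return 1;
#else
	return 0;
#endif
}
```

With only CONFIG_DEMO_CPTS_MODULE defined, the #ifdef variant reports the
feature as absent while the IS_ENABLED-style variant reports it as present.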

Re: [PATCH net 1/6] net/mlx5: Verify module parameters

2016-12-05 Thread David Miller
From: Saeed Mahameed 
Date: Sun,  4 Dec 2016 12:56:11 +0200

> +static uint prof_sel = MLX5_DEFAULT_PROF;

Please do not use type shorthands such as "uint", always expand
fully "unsigned int".

Thanks.


Re: [mm PATCH 0/3] Page fragment updates

2016-12-05 Thread Andrew Morton
On Mon, 5 Dec 2016 09:01:12 -0800 Alexander Duyck  
wrote:

> On Tue, Nov 29, 2016 at 10:23 AM, Alexander Duyck
>  wrote:
> > This patch series takes care of a few cleanups for the page fragments API.
> >
> > ...
> 
> It's been about a week since I submitted this series.  Just wanted to
> check in and see if anyone had any feedback or if this is good to be
> accepted for 4.10-rc1 with the rest of the set?

Looks good to me.  I have it all queued for post-4.9 processing.


[PATCH v4 12/13] net: ethernet: ti: cpts: calc mult and shift from refclk freq

2016-12-05 Thread Grygorii Strashko
The cyclecounter mult and shift values can be calculated based on the
CPTS refclk frequency, and the timekeeping framework provides the
required algorithms and APIs.

Hence, calculate mult and shift based on the CPTS refclk frequency if
neither the cpts_clock_shift nor the cpts_clock_mult property is provided
in DT (the basis of the calculation algorithm is borrowed from
__clocksource_update_freq_scale() commit 7d2f944a2b83 ("clocksource:
Provide a generic mult/shift factor calculation")). After this change the
cpts_clock_shift and cpts_clock_mult DT properties become optional.

Cc: John Stultz 
Cc: Thomas Gleixner 
Signed-off-by: Grygorii Strashko 
---
 Documentation/devicetree/bindings/net/cpsw.txt |  8 ++--
 drivers/net/ethernet/ti/cpts.c | 53 +++---
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt 
b/Documentation/devicetree/bindings/net/cpsw.txt
index 5ad439f..ebda7c9 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -20,8 +20,6 @@ Required properties:
 - slaves   : Specifies number for slaves
 - active_slave : Specifies the slave to use for time stamping,
  ethtool and SIOCGMIIPHY
-- cpts_clock_mult  : Numerator to convert input clock ticks into 
nanoseconds
-- cpts_clock_shift : Denominator to convert input clock ticks into 
nanoseconds
 
 Optional properties:
 - ti,hwmods: Must be "cpgmac0"
@@ -35,7 +33,11 @@ Optional properties:
  For example in dra72x-evm, pcf gpio has to be
  driven low so that cpsw slave 0 and phy data
  lines are connected via mux.
-
+- cpts_clock_mult  : Numerator to convert input clock ticks into 
nanoseconds
+- cpts_clock_shift : Denominator to convert input clock ticks into 
nanoseconds
+ Mult and shift will be calculated basing on CPTS
+ rftclk frequency if both cpts_clock_shift and
+ cpts_clock_mult properties are not provided.
 
 Slave Properties:
 Required properties:
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 59c09a4..361d13a 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -409,21 +409,60 @@ void cpts_unregister(struct cpts *cpts)
 }
 EXPORT_SYMBOL_GPL(cpts_unregister);
 
+static void cpts_calc_mult_shift(struct cpts *cpts)
+{
+   u64 frac, maxsec, ns;
+   u32 freq, mult, shift;
+
+   freq = clk_get_rate(cpts->refclk);
+
+   /* Calc the maximum number of seconds which we can run before
+* wrapping around.
+*/
+   maxsec = cpts->cc.mask;
+   do_div(maxsec, freq);
+   /* limit conversation rate to 10 sec as higher values will produce
+* too small mult factors and so reduce the conversion accuracy
+*/
+   if (maxsec > 10)
+   maxsec = 10;
+
+   if (cpts->cc_mult || cpts->cc.shift)
+   return;
+
+   clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
+
+   cpts->cc_mult = mult;
+   cpts->cc.mult = mult;
+   cpts->cc.shift = shift;
+
+   frac = 0;
+   ns = cyclecounter_cyc2ns(&cpts->cc, freq, cpts->cc.mask, &frac);
+
+   dev_info(cpts->dev,
+"CPTS: ref_clk_freq:%u calc_mult:%u calc_shift:%u error:%lld 
nsec/sec\n",
+freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
+}
+
 static int cpts_of_parse(struct cpts *cpts, struct device_node *node)
 {
int ret = -EINVAL;
u32 prop;
 
-   if (of_property_read_u32(node, "cpts_clock_mult", &prop))
-   goto  of_error;
/* save cc.mult original value as it can be modified
 * by cpts_ptp_adjfreq().
 */
-   cpts->cc_mult = prop;
+   cpts->cc_mult = 0;
+   if (!of_property_read_u32(node, "cpts_clock_mult", &prop))
+   cpts->cc_mult = prop;
+
+   cpts->cc.shift = 0;
+   if (!of_property_read_u32(node, "cpts_clock_shift", &prop))
+   cpts->cc.shift = prop;
 
-   if (of_property_read_u32(node, "cpts_clock_shift", &prop))
-   goto  of_error;
-   cpts->cc.shift = prop;
+   if ((cpts->cc_mult && !cpts->cc.shift) ||
+   (!cpts->cc_mult && cpts->cc.shift))
+   goto of_error;
 
return 0;
 
@@ -463,6 +502,8 @@ struct cpts *cpts_create(struct device *dev, void __iomem 
*regs,
cpts->cc.mask = CLOCKSOURCE_MASK(32);
cpts->info = cpts_info;
 
+   cpts_calc_mult_shift(cpts);
+
return cpts;
 }
 EXPORT_SYMBOL_GPL(cpts_create);
-- 
2.10.1
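
To make the mult/shift calculation in this patch easier to follow, here is a
userspace restatement of the search as I understand it from
clocks_calc_mult_shift() in kernel/time/clocksource.c. Treat it as a sketch:
demo_calc_mult_shift() and demo_cyc2ns() are made-up names, the kernel's
do_div() is replaced by plain 64-bit division, and the fractional-nanosecond
handling of cyclecounter_cyc2ns() is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Find a mult/shift pair so that ns ~= (cycles * mult) >> shift.
 * from: input frequency in Hz, to: target units per second,
 * maxsec: longest interval that must convert without 64-bit overflow. */
static void demo_calc_mult_shift(uint32_t *mult, uint32_t *shift,
				 uint32_t from, uint32_t to, uint32_t maxsec)
{
	uint64_t tmp;
	uint32_t sft, sftacc = 32;

	assert(from != 0);

	/* Reserve headroom so that maxsec worth of cycles times mult
	 * still fits in 64 bits. */
	tmp = ((uint64_t)maxsec * from) >> 32;
	while (tmp) {
		tmp >>= 1;
		sftacc--;
	}

	/* Take the largest shift whose rounded mult fits in that headroom. */
	for (sft = 32; sft > 0; sft--) {
		tmp = (uint64_t)to << sft;
		tmp += from / 2;
		tmp /= from;
		if ((tmp >> sftacc) == 0)
			break;
	}
	*mult = (uint32_t)tmp;
	*shift = sft;
}

/* Convert a cycle count to nanoseconds with the chosen pair. */
static uint64_t demo_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

In the patch, the equivalent call passes the refclk rate as the input
frequency, NSEC_PER_SEC as the target, and maxsec capped at 10. A larger
maxsec forces a smaller mult, which is why the cap helps conversion accuracy.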



[PATCH v4 10/13] net: ethernet: ti: cpts: move dt props parsing to cpts driver

2016-12-05 Thread Grygorii Strashko
Move DT properties parsing into the CPTS driver to simplify the CPSW
code and the porting of the CPTS driver to other SoCs in the future
(like Keystone 2). With this change it will not be required
to add the same DT parsing code to the Keystone 2 NETCP driver.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 16 +---
 drivers/net/ethernet/ti/cpsw.h |  2 --
 drivers/net/ethernet/ti/cpts.c | 32 +---
 drivers/net/ethernet/ti/cpts.h |  5 +++--
 4 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index deb008a..259c717 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2524,18 +2524,6 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
}
data->active_slave = prop;
 
-   if (of_property_read_u32(node, "cpts_clock_mult", &prop)) {
-   dev_err(&pdev->dev, "Missing cpts_clock_mult property in the 
DT.\n");
-   return -EINVAL;
-   }
-   data->cpts_clock_mult = prop;
-
-   if (of_property_read_u32(node, "cpts_clock_shift", &prop)) {
-   dev_err(&pdev->dev, "Missing cpts_clock_shift property in the 
DT.\n");
-   return -EINVAL;
-   }
-   data->cpts_clock_shift = prop;
-
data->slave_data = devm_kzalloc(&pdev->dev, data->slaves
* sizeof(struct cpsw_slave_data),
GFP_KERNEL);
@@ -2990,9 +2978,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_dma_ret;
}
 
-   cpsw->cpts = cpts_create(cpsw->dev, cpts_regs,
-cpsw->data.cpts_clock_mult,
-cpsw->data.cpts_clock_shift);
+   cpsw->cpts = cpts_create(cpsw->dev, cpts_regs, cpsw->dev->of_node);
if (IS_ERR(cpsw->cpts)) {
ret = PTR_ERR(cpsw->cpts);
goto clean_ale_ret;
diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h
index 16b54c6..6c3037a 100644
--- a/drivers/net/ethernet/ti/cpsw.h
+++ b/drivers/net/ethernet/ti/cpsw.h
@@ -31,8 +31,6 @@ struct cpsw_platform_data {
u32 channels;   /* number of cpdma channels (symmetric) */
u32 slaves; /* number of slave cpgmac ports */
u32 active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
-   u32 cpts_clock_mult;  /* convert input clock ticks to nanoseconds */
-   u32 cpts_clock_shift; /* convert input clock ticks to nanoseconds */
u32 ale_entries;/* ale table size */
u32 bd_ram_size;  /*buffer descriptor ram size */
u32 mac_control;/* Mac control register */
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 9356803..59c09a4 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -409,10 +409,34 @@ void cpts_unregister(struct cpts *cpts)
 }
 EXPORT_SYMBOL_GPL(cpts_unregister);
 
+static int cpts_of_parse(struct cpts *cpts, struct device_node *node)
+{
+   int ret = -EINVAL;
+   u32 prop;
+
+   if (of_property_read_u32(node, "cpts_clock_mult", &prop))
+   goto  of_error;
+   /* save cc.mult original value as it can be modified
+* by cpts_ptp_adjfreq().
+*/
+   cpts->cc_mult = prop;
+
+   if (of_property_read_u32(node, "cpts_clock_shift", &prop))
+   goto  of_error;
+   cpts->cc.shift = prop;
+
+   return 0;
+
+of_error:
+   dev_err(cpts->dev, "CPTS: Missing property in the DT.\n");
+   return ret;
+}
+
 struct cpts *cpts_create(struct device *dev, void __iomem *regs,
-u32 mult, u32 shift)
+struct device_node *node)
 {
struct cpts *cpts;
+   int ret;
 
cpts = devm_kzalloc(dev, sizeof(*cpts), GFP_KERNEL);
if (!cpts)
@@ -423,6 +447,10 @@ struct cpts *cpts_create(struct device *dev, void __iomem 
*regs,
spin_lock_init(&cpts->lock);
INIT_DELAYED_WORK(&cpts->overflow_work, cpts_overflow_check);
 
+   ret = cpts_of_parse(cpts, node);
+   if (ret)
+   return ERR_PTR(ret);
+
cpts->refclk = devm_clk_get(dev, "cpts");
if (IS_ERR(cpts->refclk)) {
dev_err(dev, "Failed to get cpts refclk\n");
@@ -433,8 +461,6 @@ struct cpts *cpts_create(struct device *dev, void __iomem 
*regs,
 
cpts->cc.read = cpts_systim_read;
cpts->cc.mask = CLOCKSOURCE_MASK(32);
-   cpts->cc.shift = shift;
-   cpts->cc_mult = mult;
cpts->info = cpts_info;
 
return cpts;
diff --git a/drivers/net/ethernet/ti/cpts.h b/drivers/net/ethernet/ti/cpts.h
index e7d857c..5da23af 100644
--- a/drivers/net/ethernet/ti/cpts.h
+++ b/drivers/net/ethernet/ti/cpts.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -133,7 +134,7 @@ void 

[PATCH v4 00/13] net: ethernet: ti: cpts: update and fixes

2016-12-05 Thread Grygorii Strashko
This is a preparation series intended to clean up and optimize the TI CPTS
driver to facilitate further integration with other TI SoCs like Keystone 2.

Changes in v4:
- fixed build error in patch
  "net: ethernet: ti: cpts: clean up event list if event pool is empty"
- rebased on top of net-next
 
Changes in v3:
- patches reordered: fixes and small updates moved first
- added comments in code about cpts->cc_mult
- conversion range (maxsec) limited to 10 sec

Changes in v2:
- patch "net: ethernet: ti: cpts: rework initialization/deinitialization"
  was split into 4 patches
- applied comments from Richard Cochran
- dropped patch
  "net: ethernet: ti: cpts: add return value to tx and rx timestamp funcitons"
- new patches added:
  "net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs"
  and "clocksource: export the clocks_calc_mult_shift to use by timestamp code"

Links to previous versions:
v3: https://www.spinics.net/lists/devicetree/msg153474.html
v2: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1282034.html
v1: http://www.spinics.net/lists/linux-omap/msg131925.html

Grygorii Strashko (11):
  net: ethernet: ti: cpts: switch to readl/writel_relaxed()
  net: ethernet: ti: allow cpts to be built separately
  net: ethernet: ti: cpsw: minimize direct access to struct cpts
  net: ethernet: ti: cpts: fix unbalanced clk api usage in 
cpts_register/unregister
  net: ethernet: ti: cpts: fix registration order
  net: ethernet: ti: cpts: disable cpts when unregistered
  net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs
  net: ethernet: ti: cpts: rework initialization/deinitialization
  net: ethernet: ti: cpts: move dt props parsing to cpts driver
  net: ethernet: ti: cpts: calc mult and shift from refclk freq
  net: ethernet: ti: cpts: fix overflow check period

Murali Karicheri (1):
  clocksource: export the clocks_calc_mult_shift to use by timestamp code

WingMan Kwok (1):
  net: ethernet: ti: cpts: clean up event list if event pool is empty

 Documentation/devicetree/bindings/net/cpsw.txt |   8 +-
 drivers/net/ethernet/ti/Kconfig|   2 +-
 drivers/net/ethernet/ti/Makefile   |   3 +-
 drivers/net/ethernet/ti/cpsw.c |  84 -
 drivers/net/ethernet/ti/cpsw.h |   2 -
 drivers/net/ethernet/ti/cpts.c | 239 +++--
 drivers/net/ethernet/ti/cpts.h |  80 -
 kernel/time/clocksource.c  |   1 +
 8 files changed, 304 insertions(+), 115 deletions(-)

-- 
2.10.1



[PATCH v4 01/13] net: ethernet: ti: cpts: switch to readl/writel_relaxed()

2016-12-05 Thread Grygorii Strashko
Switch to the readl/writel_relaxed() APIs, because these are the
recommended APIs and the CPTS IP is reused on Keystone 2 SoCs,
where LE/BE modes are supported.

Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 85a55b4..a42c449 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -33,8 +33,8 @@
 
 #ifdef CONFIG_TI_CPTS
 
-#define cpts_read32(c, r)  __raw_readl(&c->reg->r)
-#define cpts_write32(c, v, r)  __raw_writel(v, &c->reg->r)
+#define cpts_read32(c, r)  readl_relaxed(&c->reg->r)
+#define cpts_write32(c, v, r)  writel_relaxed(v, &c->reg->r)
 
 static int event_expired(struct cpts_event *event)
 {
-- 
2.10.1



[PATCH v4 03/13] net: ethernet: ti: cpsw: minimize direct access to struct cpts

2016-12-05 Thread Grygorii Strashko
This provides more flexibility in changing CPTS internals and is also
required for further changes.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 28 +++-
 drivers/net/ethernet/ti/cpts.h | 39 +++
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 323174d..ec05e20 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1562,7 +1562,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff 
*skb,
}
 
if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP &&
-   cpsw->cpts->tx_enable)
+   cpts_is_tx_enabled(cpsw->cpts))
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
skb_tx_timestamp(skb);
@@ -1601,7 +1601,8 @@ static void cpsw_hwtstamp_v1(struct cpsw_common *cpsw)
struct cpsw_slave *slave = &cpsw->slaves[cpsw->data.active_slave];
u32 ts_en, seq_id;
 
-   if (!cpsw->cpts->tx_enable && !cpsw->cpts->rx_enable) {
+   if (!cpts_is_tx_enabled(cpsw->cpts) &&
+   !cpts_is_rx_enabled(cpsw->cpts)) {
slave_write(slave, 0, CPSW1_TS_CTL);
return;
}
@@ -1609,10 +1610,10 @@ static void cpsw_hwtstamp_v1(struct cpsw_common *cpsw)
seq_id = (30 << CPSW_V1_SEQ_ID_OFS_SHIFT) | ETH_P_1588;
ts_en = EVENT_MSG_BITS << CPSW_V1_MSG_TYPE_OFS;
 
-   if (cpsw->cpts->tx_enable)
+   if (cpts_is_tx_enabled(cpsw->cpts))
ts_en |= CPSW_V1_TS_TX_EN;
 
-   if (cpsw->cpts->rx_enable)
+   if (cpts_is_rx_enabled(cpsw->cpts))
ts_en |= CPSW_V1_TS_RX_EN;
 
slave_write(slave, ts_en, CPSW1_TS_CTL);
@@ -1635,20 +1636,20 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
case CPSW_VERSION_2:
ctrl &= ~CTRL_V2_ALL_TS_MASK;
 
-   if (cpsw->cpts->tx_enable)
+   if (cpts_is_tx_enabled(cpsw->cpts))
ctrl |= CTRL_V2_TX_TS_BITS;
 
-   if (cpsw->cpts->rx_enable)
+   if (cpts_is_rx_enabled(cpsw->cpts))
ctrl |= CTRL_V2_RX_TS_BITS;
break;
case CPSW_VERSION_3:
default:
ctrl &= ~CTRL_V3_ALL_TS_MASK;
 
-   if (cpsw->cpts->tx_enable)
+   if (cpts_is_tx_enabled(cpsw->cpts))
ctrl |= CTRL_V3_TX_TS_BITS;
 
-   if (cpsw->cpts->rx_enable)
+   if (cpts_is_rx_enabled(cpsw->cpts))
ctrl |= CTRL_V3_RX_TS_BITS;
break;
}
@@ -1684,7 +1685,7 @@ static int cpsw_hwtstamp_set(struct net_device *dev, 
struct ifreq *ifr)
 
switch (cfg.rx_filter) {
case HWTSTAMP_FILTER_NONE:
-   cpts->rx_enable = 0;
+   cpts_rx_enable(cpts, 0);
break;
case HWTSTAMP_FILTER_ALL:
case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
@@ -1700,14 +1701,14 @@ static int cpsw_hwtstamp_set(struct net_device *dev, 
struct ifreq *ifr)
case HWTSTAMP_FILTER_PTP_V2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
-   cpts->rx_enable = 1;
+   cpts_rx_enable(cpts, 1);
cfg.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
break;
default:
return -ERANGE;
}
 
-   cpts->tx_enable = cfg.tx_type == HWTSTAMP_TX_ON;
+   cpts_tx_enable(cpts, cfg.tx_type == HWTSTAMP_TX_ON);
 
switch (cpsw->version) {
case CPSW_VERSION_1:
@@ -1736,8 +1737,9 @@ static int cpsw_hwtstamp_get(struct net_device *dev, 
struct ifreq *ifr)
return -EOPNOTSUPP;
 
cfg.flags = 0;
-   cfg.tx_type = cpts->tx_enable ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
-   cfg.rx_filter = (cpts->rx_enable ?
+   cfg.tx_type = cpts_is_tx_enabled(cpts) ?
+ HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+   cfg.rx_filter = (cpts_is_rx_enabled(cpts) ?
 HWTSTAMP_FILTER_PTP_V2_EVENT : HWTSTAMP_FILTER_NONE);
 
return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0;
diff --git a/drivers/net/ethernet/ti/cpts.h b/drivers/net/ethernet/ti/cpts.h
index 416ba2c..29a1e80c 100644
--- a/drivers/net/ethernet/ti/cpts.h
+++ b/drivers/net/ethernet/ti/cpts.h
@@ -132,6 +132,27 @@ void cpts_rx_timestamp(struct cpts *cpts, struct sk_buff 
*skb);
 void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb);
 int cpts_register(struct device *dev, struct cpts *cpts, u32 mult, u32 shift);
 void cpts_unregister(struct cpts *cpts);
+
+static inline void cpts_rx_enable(struct cpts *cpts, int enable)
+{
+   cpts->rx_enable = enable;
+}
+
+static inline bool cpts_is_rx_enabled(struct cpts *cpts)
+{
+   return !!cpts->rx_enable;
+}
+
+static inline void cpts_tx_enable(struct cpts *cpts, int 

[PATCH v4 11/13] clocksource: export the clocks_calc_mult_shift to use by timestamp code

2016-12-05 Thread Grygorii Strashko
From: Murali Karicheri 

The CPSW CPTS driver is capable of timestamping tx/rx packets and
needs to know the mult and shift factors for converting timestamps from
raw values to nanoseconds (ptp clock). Currently these mult and shift
factors are calculated manually and provided through DT, which makes it
very hard to support a large number of platforms, especially if the CPTS
refclk is not the same for some kinds of boards and depends on efuse
settings (Keystone 2 platforms). Hence, export clocks_calc_mult_shift()
to allow drivers like CPSW CPTS (and other ptp drivers) to benefit from
automatic calculation of mult and shift factors.

Cc: John Stultz 
Signed-off-by: Murali Karicheri 
Signed-off-by: Grygorii Strashko 
Acked-by: Thomas Gleixner 
---
 kernel/time/clocksource.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 7e4fad7..150242c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -89,6 +89,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 
to, u32 maxsec)
*mult = tmp;
*shift = sft;
 }
+EXPORT_SYMBOL_GPL(clocks_calc_mult_shift);
 
 /*[Clocksource internal variables]-
  * curr_clocksource:
-- 
2.10.1



[PATCH v4 06/13] net: ethernet: ti: cpts: disable cpts when unregistered

2016-12-05 Thread Grygorii Strashko
The CPTS is currently left enabled after unregistration.
Hence, disable it in cpts_unregister().

Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 3dda6d5..d3c1ac5 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -404,6 +404,10 @@ void cpts_unregister(struct cpts *cpts)
ptp_clock_unregister(cpts->clock);
cancel_delayed_work_sync(&cpts->overflow_work);
}
+
+   cpts_write32(cpts, 0, int_enable);
+   cpts_write32(cpts, 0, control);
+
if (cpts->refclk)
cpts_clk_release(cpts);
 }
-- 
2.10.1



[PATCH v4 07/13] net: ethernet: ti: cpts: clean up event list if event pool is empty

2016-12-05 Thread Grygorii Strashko
From: WingMan Kwok 

When a CPTS user does not exit gracefully by disabling cpts
timestamping and leaving a joined multicast group, the system
continues to receive and timestamp the ptp packets, which eventually
occupy all the event list entries.  When this happens, the added code
tries to remove the list entries which have expired.

Signed-off-by: WingMan Kwok 
Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpts.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index d3c1ac5..7ab1fa7 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -57,6 +57,26 @@ static int cpts_fifo_pop(struct cpts *cpts, u32 *high, u32 *low)
return -1;
 }
 
+static int cpts_purge_events(struct cpts *cpts)
+{
+   struct list_head *this, *next;
+   struct cpts_event *event;
+   int removed = 0;
+
+   list_for_each_safe(this, next, &cpts->events) {
+   event = list_entry(this, struct cpts_event, list);
+   if (event_expired(event)) {
+   list_del_init(&event->list);
+   list_add(&event->list, &cpts->pool);
+   ++removed;
+   }
+   }
+
+   if (removed)
+   pr_debug("cpts: event pool cleaned up %d\n", removed);
+   return removed ? 0 : -1;
+}
+
 /*
  * Returns zero if matching event type was found.
  */
@@ -69,10 +89,12 @@ static int cpts_fifo_read(struct cpts *cpts, int match)
for (i = 0; i < CPTS_FIFO_DEPTH; i++) {
if (cpts_fifo_pop(cpts, &hi, &lo))
break;
-   if (list_empty(&cpts->pool)) {
-   pr_err("cpts: event pool is empty\n");
+
+   if (list_empty(&cpts->pool) && cpts_purge_events(cpts)) {
+   pr_err("cpts: event pool empty\n");
return -1;
}
+
event = list_first_entry(&cpts->pool, struct cpts_event, list);
event->tmo = jiffies + 2;
event->high = hi;
-- 
2.10.1



[PATCH v4 13/13] net: ethernet: ti: cpts: fix overflow check period

2016-12-05 Thread Grygorii Strashko
The CPTS driver uses an 8 sec period for overflow checking, with the
assumption that the CPTS refclk will not exceed 500MHz. But that's not
true on some TI platforms (Keystone 2). As a result, it is possible that
the CPTS counter will overflow more than once between two readings.

Hence, fix it by selecting the overflow check period dynamically as
max_sec_before_overflow/2, where
 max_sec_before_overflow = max_counter_val / refclk_freq.
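
To make the arithmetic concrete, here is a rough userspace sketch of the calculation. It assumes a free-running 32-bit counter (consistent with the old "up to about 500 MHz" comment for an 8 sec period) and mirrors the 10 sec clamp visible in the diff context; the helper names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a free-running 32-bit counter clocked at refclk_hz. */
static uint64_t sketch_max_sec_before_overflow(uint32_t refclk_hz)
{
	assert(refclk_hz != 0);
	/* Whole seconds until the counter wraps, rounded down. */
	return ((uint64_t)1 << 32) / refclk_hz;
}

/*
 * Overflow check period in jiffies for a tick rate of hz per second:
 * half of the (clamped) time before overflow.
 */
static uint64_t sketch_ov_check_period(uint32_t refclk_hz, uint32_t hz)
{
	uint64_t maxsec = sketch_max_sec_before_overflow(refclk_hz);

	if (maxsec > 10)	/* same clamp as in the diff context */
		maxsec = 10;
	return ((uint64_t)hz * maxsec) / 2;
}
```

At 1 GHz the counter wraps after about 4 seconds, so a fixed 8 sec check would miss overflows, while the dynamic period comes out at 2 seconds (200 jiffies with hz = 100).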

Cc: John Stultz 
Cc: Thomas Gleixner 
Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 10 +++---
 drivers/net/ethernet/ti/cpts.h |  4 +---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 361d13a..a60d837 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -245,7 +245,7 @@ static void cpts_overflow_check(struct work_struct *work)
 
cpts_ptp_gettime(&cpts->info, &ts);
pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
-   schedule_delayed_work(&cpts->overflow_work, CPTS_OVERFLOW_PERIOD);
+   schedule_delayed_work(&cpts->overflow_work, cpts->ov_check_period);
 }
 
 static int cpts_match(struct sk_buff *skb, unsigned int ptp_class,
@@ -382,8 +382,7 @@ int cpts_register(struct cpts *cpts)
}
cpts->phc_index = ptp_clock_index(cpts->clock);
 
-   schedule_delayed_work(&cpts->overflow_work, CPTS_OVERFLOW_PERIOD);
-
+   schedule_delayed_work(&cpts->overflow_work, cpts->ov_check_period);
return 0;
 
 err_ptp:
@@ -427,6 +426,11 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
if (maxsec > 10)
maxsec = 10;
 
+   /* Calc overflow check period (maxsec / 2) */
+   cpts->ov_check_period = (HZ * maxsec) / 2;
+   dev_info(cpts->dev, "cpts: overflow check period %lu (jiffies)\n",
+cpts->ov_check_period);
+
if (cpts->cc_mult || cpts->cc.shift)
return;
 
diff --git a/drivers/net/ethernet/ti/cpts.h b/drivers/net/ethernet/ti/cpts.h
index 5da23af..c96eca2 100644
--- a/drivers/net/ethernet/ti/cpts.h
+++ b/drivers/net/ethernet/ti/cpts.h
@@ -97,9 +97,6 @@ enum {
CPTS_EV_TX,   /* Ethernet Transmit Event */
 };
 
-/* This covers any input clock up to about 500 MHz. */
-#define CPTS_OVERFLOW_PERIOD (HZ * 8)
-
 #define CPTS_FIFO_DEPTH 16
 #define CPTS_MAX_EVENTS 32
 
@@ -127,6 +124,7 @@ struct cpts {
struct list_head events;
struct list_head pool;
struct cpts_event pool_data[CPTS_MAX_EVENTS];
+   unsigned long ov_check_period;
 };
 
 void cpts_rx_timestamp(struct cpts *cpts, struct sk_buff *skb);
-- 
2.10.1



[PATCH v4 09/13] net: ethernet: ti: cpts: rework initialization/deinitialization

2016-12-05 Thread Grygorii Strashko
The current implementation of CPTS initialization and deinitialization
(represented by cpts_register/unregister()) does too much static
initialization from .ndo_open(), which is more reasonably done once at probe
time, and also requires the caller to allocate memory for struct cpts,
which is generally internal to the CPTS driver.

This patch splits CPTS initialization and deinitialization into two parts:

- static initialization, cpts_create()/cpts_release(), which is expected to be
executed when the parent driver is probed/removed;

- dynamic part, cpts_register/unregister(), which is expected to be executed
when the network device is opened/closed.

As a result, the current code of the CPTS parent driver (CPSW) will be
simplified (and it will also simplify adding support for Keystone 2 devices in
the future), plus more initialization errors will be caught earlier. In
addition, this change allows cleaning up cpts.h for the case when CPTS is
disabled.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c |  24 +-
 drivers/net/ethernet/ti/cpts.c | 102 -
 drivers/net/ethernet/ti/cpts.h |  26 +--
 3 files changed, 95 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ec05e20..deb008a 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1486,9 +1486,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
if (ret < 0)
goto err_cleanup;
 
-   if (cpts_register(cpsw->dev, cpsw->cpts,
- cpsw->data.cpts_clock_mult,
- cpsw->data.cpts_clock_shift))
+   if (cpts_register(cpsw->cpts))
dev_err(priv->dev, "error registering cpts device\n");
 
}
@@ -2796,6 +2794,7 @@ static int cpsw_probe(struct platform_device *pdev)
struct cpdma_params dma_params;
struct cpsw_ale_params  ale_params;
void __iomem*ss_regs;
+   void __iomem*cpts_regs;
struct resource *res, *ss_res;
const struct of_device_id   *of_id;
struct gpio_descs   *mode;
@@ -2823,12 +2822,6 @@ static int cpsw_probe(struct platform_device *pdev)
priv->dev  = >dev;
priv->msg_enable = netif_msg_init(debug_level, CPSW_DEBUG);
cpsw->rx_packet_max = max(rx_packet_max, 128);
-   cpsw->cpts = devm_kzalloc(>dev, sizeof(struct cpts), GFP_KERNEL);
-   if (!cpsw->cpts) {
-   dev_err(>dev, "error allocating cpts\n");
-   ret = -ENOMEM;
-   goto clean_ndev_ret;
-   }
 
mode = devm_gpiod_get_array_optional(>dev, "mode", GPIOD_OUT_LOW);
if (IS_ERR(mode)) {
@@ -2916,7 +2909,7 @@ static int cpsw_probe(struct platform_device *pdev)
switch (cpsw->version) {
case CPSW_VERSION_1:
cpsw->host_port_regs = ss_regs + CPSW1_HOST_PORT_OFFSET;
-   cpsw->cpts->reg  = ss_regs + CPSW1_CPTS_OFFSET;
+   cpts_regs   = ss_regs + CPSW1_CPTS_OFFSET;
cpsw->hw_stats   = ss_regs + CPSW1_HW_STATS;
dma_params.dmaregs   = ss_regs + CPSW1_CPDMA_OFFSET;
dma_params.txhdp = ss_regs + CPSW1_STATERAM_OFFSET;
@@ -2930,7 +2923,7 @@ static int cpsw_probe(struct platform_device *pdev)
case CPSW_VERSION_3:
case CPSW_VERSION_4:
cpsw->host_port_regs = ss_regs + CPSW2_HOST_PORT_OFFSET;
-   cpsw->cpts->reg  = ss_regs + CPSW2_CPTS_OFFSET;
+   cpts_regs   = ss_regs + CPSW2_CPTS_OFFSET;
cpsw->hw_stats   = ss_regs + CPSW2_HW_STATS;
dma_params.dmaregs   = ss_regs + CPSW2_CPDMA_OFFSET;
dma_params.txhdp = ss_regs + CPSW2_STATERAM_OFFSET;
@@ -2997,6 +2990,14 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_dma_ret;
}
 
+   cpsw->cpts = cpts_create(cpsw->dev, cpts_regs,
+cpsw->data.cpts_clock_mult,
+cpsw->data.cpts_clock_shift);
+   if (IS_ERR(cpsw->cpts)) {
+   ret = PTR_ERR(cpsw->cpts);
+   goto clean_ale_ret;
+   }
+
ndev->irq = platform_get_irq(pdev, 1);
if (ndev->irq < 0) {
dev_err(priv->dev, "error getting irq resource\n");
@@ -3112,6 +3113,7 @@ static int cpsw_remove(struct platform_device *pdev)
unregister_netdev(cpsw->slaves[1].ndev);
unregister_netdev(ndev);
 
+   cpts_release(cpsw->cpts);
cpsw_ale_destroy(cpsw->ale);
cpdma_ctlr_destroy(cpsw->dma);
cpsw_remove_dt(pdev);
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index fe1bb7f..9356803 100644
--- 

[PATCH v4 04/13] net: ethernet: ti: cpts: fix unbalanced clk api usage in cpts_register/unregister

2016-12-05 Thread Grygorii Strashko
There are two issues with the TI CPTS code which are reproducible when a TI
CPSW ethX device goes through a few up/down iterations:
- the cpts refclk prepare counter is incremented after each
up/down iteration and never decremented;
- devm_clk_get(dev, "cpts") is called many times.

Hence, fix these issues by using clk_disable_unprepare() in
cpts_clk_release() and skipping devm_clk_get() if the cpts refclk has been
acquired already.
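
The prepare-count leak can be illustrated with a toy model. The struct and helpers below are hypothetical stand-ins that only track two counters; they are not the kernel clk API, just a sketch of why pairing a prepare+enable with a plain disable leaves the prepare count growing.

```c
#include <assert.h>

/* Hypothetical toy model of a clock's reference counts. */
struct toy_clk {
	int prepare_count;
	int enable_count;
};

static void toy_prepare_enable(struct toy_clk *c)
{
	c->prepare_count++;
	c->enable_count++;
}

static void toy_disable(struct toy_clk *c)
{
	assert(c->enable_count > 0);
	c->enable_count--;
}

static void toy_disable_unprepare(struct toy_clk *c)
{
	toy_disable(c);
	assert(c->prepare_count > 0);
	c->prepare_count--;
}

/* Run n up/down cycles and return the leftover prepare count. */
static int toy_updown(int n, int balanced)
{
	struct toy_clk c = { 0, 0 };
	int i;

	for (i = 0; i < n; i++) {
		toy_prepare_enable(&c);		/* interface up */
		if (balanced)
			toy_disable_unprepare(&c);	/* fixed release path */
		else
			toy_disable(&c);		/* old release path */
	}
	return c.prepare_count;
}
```

In this model three unbalanced up/down cycles leave a prepare count of 3, while the balanced variant returns to 0.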

Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 8cb0369..61198f1 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -230,18 +230,20 @@ static void cpts_overflow_check(struct work_struct *work)
 
 static void cpts_clk_init(struct device *dev, struct cpts *cpts)
 {
-   cpts->refclk = devm_clk_get(dev, "cpts");
-   if (IS_ERR(cpts->refclk)) {
-   dev_err(dev, "Failed to get cpts refclk\n");
-   cpts->refclk = NULL;
-   return;
+   if (!cpts->refclk) {
+   cpts->refclk = devm_clk_get(dev, "cpts");
+   if (IS_ERR(cpts->refclk)) {
+   dev_err(dev, "Failed to get cpts refclk\n");
+   cpts->refclk = NULL;
+   return;
+   }
}
clk_prepare_enable(cpts->refclk);
 }
 
 static void cpts_clk_release(struct cpts *cpts)
 {
-   clk_disable(cpts->refclk);
+   clk_disable_unprepare(cpts->refclk);
 }
 
 static int cpts_match(struct sk_buff *skb, unsigned int ptp_class,
-- 
2.10.1



[PATCH v4 08/13] net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs

2016-12-05 Thread Grygorii Strashko
The CPTS module and IRQs are always enabled when CPTS is registered,
before the overflow check work is started, and disabled during
deregistration, when the overflow check work has already been canceled.
So there is no need to (re)enable the CPTS module and IRQs in
cpts_overflow_check().

Signed-off-by: Grygorii Strashko 
Acked-by: Richard Cochran 
---
 drivers/net/ethernet/ti/cpts.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 7ab1fa7..fe1bb7f 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -243,8 +243,6 @@ static void cpts_overflow_check(struct work_struct *work)
struct timespec64 ts;
struct cpts *cpts = container_of(work, struct cpts, overflow_work.work);
 
-   cpts_write32(cpts, CPTS_EN, control);
-   cpts_write32(cpts, TS_PEND_EN, int_enable);
cpts_ptp_gettime(&cpts->info, &ts);
pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
schedule_delayed_work(&cpts->overflow_work, CPTS_OVERFLOW_PERIOD);
-- 
2.10.1



Re: [PATCH 1/1] net: ethernet: qlogic: set error code on failure

2016-12-05 Thread David Miller
From: "Mintz, Yuval" 
Date: Sun, 4 Dec 2016 07:29:58 +

>> From: Pan Bian 
>> 
>> When calling dma_mapping_error(), the value of return variable rc is 0.
>> And when the call returns an unexpected value, rc is not set to a negative
>> errno. Thus, it will return 0 on the error path, and its callers cannot
>> detect the bug. This patch fixes the bug, assigning "-ENOMEM" to err.
>> 
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041
>> 
>> Signed-off-by: Pan Bian 
> 
> The title should have been "[PATCH net 1/1] qed: Set error code on failure".
> 
> But the fix itself is sound. Thanks.
> BTW, is -ENOMEM the right return code in case of DMA mapping errors?
> 
> Acked-by: Yuval Mintz 

Applied.

Indeed, -ENOMEM is usually the right thing to use for DMA mapping errors,
because usually the error occurs because we've run out of IOMMU resources
or similar.  And -ENOMEM is pretty much the error code which maps most
closely to that situation.
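
As a generic illustration of the bug class discussed above (not the actual qed code), the sketch below uses a hypothetical stand-in for the mapping check and shows how a goto-style error path returns 0 unless rc is set before the jump.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a mapping check; nonzero means failure. */
static int toy_mapping_error(int simulate_failure)
{
	return simulate_failure;
}

/* Buggy shape: rc is still 0 when the error label is reached. */
static int toy_setup_buggy(int simulate_failure)
{
	int rc = 0;

	if (toy_mapping_error(simulate_failure))
		goto err;
	return 0;
err:
	return rc;	/* returns 0, so the caller sees success */
}

/* Fixed shape: assign a negative errno before taking the error path. */
static int toy_setup_fixed(int simulate_failure)
{
	int rc = 0;

	if (toy_mapping_error(simulate_failure)) {
		rc = -ENOMEM;
		goto err;
	}
	return 0;
err:
	return rc;
}
```

With a simulated failure, the buggy variant returns 0 while the fixed variant returns -ENOMEM.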

