date:20160718

Re: PROBLEM: MTU of ipsec tunnel drops continuously until traffic stops

2016-07-18 Thread Matt Bennett

On 07/05/2016 03:55 PM, Matt Bennett wrote:
> On 07/04/2016 11:12 PM, Steffen Klassert wrote:
>> On Mon, Jul 04, 2016 at 03:52:50AM +, Matt Bennett wrote:
>>> *Resending as plain text so the mailing list accepts it.. Sorry Steffen and 
>>> Herbert*
>>>
>>> Hi,
>>>
>>> During long run testing of an ipsec tunnel over a PPP link it was found
>>> that occasionally traffic would stop flowing over the tunnel. Eventually
>>> the traffic would start again, however using the command "ip route flush
>>> cache" causes traffic to start flowing again immediately.
>>>
>>> Note, I am using a 4.4.6 based kernel, however I see no major differences
>>> between 4.4.6 and 4.4.14 (current LTS) in any of the code I am debugging. I
>>> have manually debugged the code as far as I can, however I don't know the
>>> code well enough to make further progress. What I have uncovered is
>>> outlined below:
>>>
>>> By pinging the other end of the tunnel when the traffic stops flowing I get 
>>> messages like the following:
>>>
>>> 10-AR4050#ping 172.16.0.5
>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
>>>   From 172.16.0.6 icmp_seq=1 Frag needed and DF set (mtu = 46)
>>>   From 172.16.0.6 icmp_seq=2 Frag needed and DF set (mtu = 46)
>>>
>>> but this is weird considering (note the mtu values):
>>>
>>> [root@10-AR4050 /flash]# ip link
>>> 16778240: ppp0: mtu 1492 qdisc htb state UP mode DEFAULT group default qlen 3
>>>   link/ppp
>>> 14: tunnel64@NONE: mtu 1200 qdisc htb state UNKNOWN mode DEFAULT group default qlen 1
>>>   link/ipip 203.0.113.10 peer 203.0.113.5
>>>
>>> The code that generates the ICMP_FRAG_NEEDED packet is vti_xmit() 
>>> (ip_vti.c) where there is a check of skb length against the mtu of dst 
>>> entry. Since the mtu is lower than the packet (debug shows the mtu is 46 as 
>>> expected from the ping output) the ICMP error is generated.
>>
>> Seems like you use vti tunnels. Is tunnel64@NONE a vti device, and
>> if so did you set the mtu to 1200?
> Yes it is a vti device with mtu manually set to 1200. Similarly the
> other end of the tunnel is a vti with mtu manually set to 1200. There is
> traffic flowing across the tunnel with random sizes between 512 and 1500 bytes.
>>
>> Not sure if it is related to your problem, but there was a recent
>> fix for vti pmtu handling. It was commit d6af1a31 ("vti: Add pmtu
>> handling to vti_xmit.") Do you have this on your branch?
> Yes, that problem was reported from another one of our tests. That patch
> is applied to our branch. It is the code added from that patch that
> explicitly sends the ICMP_FRAG_NEEDED packet. Before this patch I
> presume the packets would have simply been sent (even though the cached
> mtu values were buggy).
>>
>>>
>>> Digging further I find that when the issue occurs the mtu value is being 
>>> updated in what appears to be an error case in xfrm_bundle_ok 
>>> (xfrm_policy.c). Specifically the block of code:
>>>
>>> if (likely(!last))
>>>   return 1;
>>>
>>> is not hit meaning there is a difference between the cached mtu value and 
>>> the value just calculated. I then see this code being hit continuously and 
>>> each time the mtu keeps getting lowered. i.e. (I don't know if the drop by 
>>> 80 bytes is significant)
>>>
>>> 1200
>>> 1118
>>> 1038
>>> 958
>>> 878
>>>
>>> 46
>>
>> I remember that we had a similar problem with IPsec when no
>> vti was used some years ago...
>>
>> Unfortunately, today is my last office day before my vacation,
>> so no fix from me for the next two weeks.
>>
>
>
Hi Steffen,

I figured you must be back from your holiday soon. I haven't been able to make
much progress on this issue, however I have found an interesting patch that
appears to address an issue similar to the one I have reported (albeit for
the ipv6 case).

Commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be "ipv6: Skip XFRM lookup if 
dst_entry in socket cache is valid" mentions "... To put it another way, the 
path MTU shrinks each time we miss the flow cache, which later on leads to 
incorrectly fragmented payload."
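
The shrinking pattern reported in this thread (1118, 1038, 958, 878, ...) can be reproduced with a toy model. This is only a user-space sketch under a stated assumption: that each recalculation subtracts a fixed overhead from the already reduced cached value instead of from the device MTU. The function name and the 80-byte overhead are illustrative; the thread does not confirm that this is the actual root cause.

```c
/*
 * Toy model (assumption, not kernel code): every cache miss recomputes
 * the MTU from the previously cached value, so the per-packet overhead
 * is subtracted again and again.
 */
static unsigned int shrink_mtu(unsigned int cached_mtu,
                               unsigned int overhead,
                               unsigned int misses)
{
    /* Stop once another subtraction would underflow. */
    while (misses-- > 0 && cached_mtu > overhead)
        cached_mtu -= overhead;

    return cached_mtu;
}
```

With a starting value of 1118 and an overhead of 80, three misses give 878, which matches the sequence observed above.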

Re: [PATCH] net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int

2016-07-18 Thread David Miller

From: Konstantin Khlebnikov 
Date: Sat, 16 Jul 2016 17:08:56 +0300

> In the kernel, HTB keeps tokens as signed 64-bit values in nanoseconds. In the
> netlink protocol these values are converted into psched ticks (64ns for now) and
> truncated to 32-bit. In struct tc_htb_xstats the fields "tokens" and "ctokens"
> are declared as unsigned 32-bit but they could be negative, thus the tool 'tc'
> prints them as signed. Big values lose higher bits and/or become negative.
> 
> This patch clamps tokens in xstat into range from INT_MIN to INT_MAX.
> In this way it's easier to understand what's going on here.
> 
> Signed-off-by: Konstantin Khlebnikov 

Applied.
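
The clamping described in the patch can be sketched in user space as follows. This is a minimal illustration, not the applied patch: the function name is hypothetical, the 64ns tick size is taken from the description above, and plain division stands in for whatever conversion helper the kernel uses (rounding of negative values may differ).

```c
#include <stdint.h>

/* Assumed tick size: 64 ns per psched tick, per the patch description. */
#define NS_PER_TICK 64

/*
 * Convert a signed 64-bit nanosecond token count to ticks and clamp
 * the result into the signed 32-bit range instead of truncating it.
 */
static int32_t clamp_tokens_to_ticks(int64_t tokens_ns)
{
    int64_t ticks = tokens_ns / NS_PER_TICK;

    if (ticks > INT32_MAX)
        return INT32_MAX;
    if (ticks < INT32_MIN)
        return INT32_MIN;
    return (int32_t)ticks;
}
```

Without the clamp, a large positive value could wrap and be printed as negative; with it, out-of-range values saturate at the limits.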

[Patch-V2 2/3] chcr: Support for Chelsio's Crypto Hardware

2016-07-18 Thread Yeshaswi M R Gowda

Chelsio's Crypto Hardware can perform the following operations:
SHA1, SHA224, SHA256, SHA384 and SHA512, HMAC(SHA1), HMAC(SHA224),
HMAC(SHA256), HMAC(SHA384), HMAC(SHA512), AES-128-CBC, AES-192-CBC,
AES-256-CBC, AES-128-XTS, AES-256-XTS

This patch implements the driver for above mentioned features. This
driver is an Upper Layer Driver which is attached to Chelsio's LLD
(cxgb4) and uses the queue allocated by the LLD for sending the crypto
requests to the Hardware and receiving the responses from it.

The crypto operations can be performed by Chelsio's hardware from the
userspace applications and/or from within the kernel space using the
kernel's crypto API.

The above mentioned crypto features have been tested using the kernel's
tests mentioned in testmgr.h. They have also been tested from user
space using libkcapi and OpenSSL.

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/crypto/chelsio/Kconfig   |   21 +
 drivers/crypto/chelsio/Makefile  |4 +
 drivers/crypto/chelsio/chcr_algo.c   | 1509 ++
 drivers/crypto/chelsio/chcr_algo.h   |  503 
 drivers/crypto/chelsio/chcr_core.c   |  268 ++
 drivers/crypto/chelsio/chcr_core.h   |   80 ++
 drivers/crypto/chelsio/chcr_crypto.h |  204 +
 7 files changed, 2589 insertions(+)
 create mode 100644 drivers/crypto/chelsio/Kconfig
 create mode 100644 drivers/crypto/chelsio/Makefile
 create mode 100644 drivers/crypto/chelsio/chcr_algo.c
 create mode 100644 drivers/crypto/chelsio/chcr_algo.h
 create mode 100644 drivers/crypto/chelsio/chcr_core.c
 create mode 100644 drivers/crypto/chelsio/chcr_core.h
 create mode 100644 drivers/crypto/chelsio/chcr_crypto.h

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
new file mode 100644
index 000..4266cb2
--- /dev/null
+++ b/drivers/crypto/chelsio/Kconfig
@@ -0,0 +1,21 @@
+config CRYPTO_DEV_CHELSIO
+   tristate "Chelsio Crypto Co-processor Driver"
+   depends on PCI && NETDEVICES && ETHERNET
+   select CRYPTO_SHA1
+   select CRYPTO_SHA256
+   select CRYPTO_SHA512
+   select NET_VENDOR_CHELSIO
+   select CHELSIO_T4
+   ---help---
+ The Chelsio Crypto Co-processor driver for T6 adapters.
+
+ For general information about Chelsio and our products, visit
+ our website at .
+
+ For customer support, please visit our customer support page at
+ .
+
+ Please send feedback to .
+
+ To compile this driver as a module, choose M here: the module
+ will be called chcr.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
new file mode 100644
index 000..7e4fda5
--- /dev/null
+++ b/drivers/crypto/chelsio/Makefile
@@ -0,0 +1,4 @@
+ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
+
+ obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
+ chcr-objs :=  chcr_core.o chcr_algo.o
\ No newline at end of file
diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
new file mode 100644
index 000..a327b53
--- /dev/null
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -0,0 +1,1509 @@
+/*
+ * This file is part of the Chelsio T6 Crypto driver for Linux.
+ *
+ * Copyright (c) 2003-2016 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Written and Maintained by:
+ * Manoj Malviya (manojmalv...@chelsio.com)
+ * Atul Gupta (atul.gu...@chelsio.com)
+ * Jitendra Lulla (jlu...@chelsio.com)
+ * Yeshaswi M R Gowda (yesha...@chelsio.com)
+ *

[Patch-V2 1/3] cxgb4: Add Chelsio LLD support Chelsio Crypto ULD

2016-07-18 Thread Yeshaswi M R Gowda

The Chelsio crypto driver is an Upper Layer Driver (ULD), making use
of the Chelsio Lower Layer Driver (LLD - cxgb4). The LLD facilitates
the basic infrastructure services of the ULD. These services include
queue allocation, deallocation and registration with LLD. The queues
are used for sending the crypto requests to the Chelsio's hardware
and for receiving the responses from the hardware.

This patch enables the services mentioned for the Chelsio's crypto
driver.

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |   18 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   41 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   80 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |   10 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   64 +++
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|  437 
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  131 +-
 7 files changed, 770 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index b4fceb9..4de1e39 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -346,6 +346,8 @@ struct adapter_params {
 
unsigned int max_ordird_qp;   /* Max read depth per RDMA QP */
unsigned int max_ird_adapter; /* Max read depth per adapter */
+
+   unsigned char ulp_crypto_lookaside; /* crypto lookaside support */
 };
 
 /* State needed to monitor the forward progress of SGE Ingress DMA activities
@@ -435,7 +437,7 @@ enum {
MAX_CTRL_QUEUES = NCHAN,  /* # of control Tx queues */
MAX_RDMA_QUEUES = NCHAN,  /* # of streaming RDMA Rx queues */
MAX_RDMA_CIQS = 32,/* # of  RDMA concentrator IQs */
-
+   MAX_CRYPTO_QUEUES = 32,   /* # of crypto queues */
/* # of streaming iSCSIT Rx queues */
MAX_ISCSIT_QUEUES = MAX_OFLD_QSETS,
 };
@@ -455,7 +457,8 @@ enum {
INGQ_EXTRAS = 2,/* firmware event queue and */
/*   forwarded interrupts */
MAX_INGQ = MAX_ETH_QSETS + MAX_OFLD_QSETS + MAX_RDMA_QUEUES +
-  MAX_RDMA_CIQS + MAX_ISCSIT_QUEUES + INGQ_EXTRAS,
+  MAX_RDMA_CIQS + MAX_ISCSIT_QUEUES + INGQ_EXTRAS +
+  MAX_CRYPTO_QUEUES,
 };
 
 struct adapter;
@@ -509,6 +512,10 @@ enum { /* adapter flags */
FW_OFLD_CONN   = (1 << 9),
 };
 
+enum {
+   ULP_CRYPTO_LOOKASIDE = 1 << 0,
+};
+
 struct rx_sw_desc;
 
 struct sge_fl { /* SGE free-buffer queue state */
@@ -682,10 +689,12 @@ struct sge_ctrl_txq {   /* state for an SGE control Tx queue */
 struct sge {
struct sge_eth_txq ethtxq[MAX_ETH_QSETS];
struct sge_ofld_txq ofldtxq[MAX_OFLD_QSETS];
+   struct sge_ofld_txq cryptotxq[MAX_CRYPTO_QUEUES];
struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
struct sge_ofld_rxq iscsirxq[MAX_OFLD_QSETS];
+   struct sge_ofld_rxq cryptorxq[MAX_CRYPTO_QUEUES];
struct sge_ofld_rxq iscsitrxq[MAX_ISCSIT_QUEUES];
struct sge_ofld_rxq rdmarxq[MAX_RDMA_QUEUES];
struct sge_ofld_rxq rdmaciq[MAX_RDMA_CIQS];
@@ -699,10 +708,12 @@ struct sge {
u16 ethtxq_rover;   /* Tx queue to clean up next */
u16 iscsiqsets;  /* # of active iSCSI queue sets */
u16 niscsitq;   /* # of available iSCST Rx queues */
+   u16 ncryptoq;   /* # of available lookaside crypto queues */
u16 rdmaqs; /* # of available RDMA Rx queues */
u16 rdmaciqs;   /* # of available RDMA concentrator IQs */
u16 iscsi_rxq[MAX_OFLD_QSETS];
u16 iscsit_rxq[MAX_ISCSIT_QUEUES];
+   u16 crypto_rxq[MAX_CRYPTO_QUEUES];
u16 rdma_rxq[MAX_RDMA_QUEUES];
u16 rdma_ciq[MAX_RDMA_CIQS];
u16 timer_val[SGE_NTIMERS];
@@ -732,6 +743,7 @@ struct sge {
 #define for_each_iscsitrxq(sge, i) for (i = 0; i < (sge)->niscsitq; i++)
 #define for_each_rdmarxq(sge, i) for (i = 0; i < (sge)->rdmaqs; i++)
 #define for_each_rdmaciq(sge, i) for (i = 0; i < (sge)->rdmaciqs; i++)
+#define for_each_cryptorxq(sge, i) for (i = 0; i < (sge)->ncryptoq; i++)
 
 struct l2t_data;
 
@@ -1441,7 +1453,7 @@ int t4_fw_bye(struct adapter *adap, unsigned int mbox);
 int t4_early_init(struct adapter *adap, unsigned int mbox);
 int t4_fw_reset(struct adapter *adap, unsigned int mbox, int reset);
 int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
- unsigned int cache_line_size);
+unsigned int cache_line_size);
 int t4_fw_initialize(struct adapter *adap, unsigned int mbox);
 int t4_query_params(struct adapter *adap, unsigned int mbox, unsigned int pf,

[Patch-V2 0/3] crypto/chcr: Add Chelsio Crypto Driver

2016-07-18 Thread Yeshaswi M R Gowda

Hi Herbert,

This patch series contains 3 patches that add support for Chelsio's
Crypto Hardware.

The patch series has been created against Herbert Xu's tree (crypto-2.6).
It includes patches for the Chelsio Low Level Driver (cxgb4) and adds the new
crypto Upper Layer Driver (chcr) under a new directory drivers/crypto/chelsio.

The first of the patch series implements necessary changes in the Chelsio
LLD for queue allocation, deallocation and registration of the ULD.

The second patch implements the Chelsio crypto driver.

The third patch contains the changes to drivers/crypto/Kconfig and
drivers/crypto/Makefile to enable the Chelsio Crypto driver.

We have included all the maintainers of respective drivers. Kindly
review the changes and provide feedback on the same.

Thank you Joe Perches and Herbert Xu for your review, I have made appropriate
changes based on them.

[V1 -> V2]

1. Some residual code cleanup
2. Adds pr_fmt using chcr (KBUILD_MODNAME)
3. Renames variables in chcr_register_alg to keep lines under 80 columns
4. Support for printing the crypto queue stats
5. Fix compile warnings reported by kbuild bot for certain architectures
6. Dependency fix in Kconfig.
7. If the request has the MAY_BACKLOG bit set and the hardware queue is full,
   the request is queued up; otherwise -EBUSY is returned to throttle the user.
   The queued request, when executed and processed, returns -EINPROGRESS in
   completion.

Yeshaswi M R Gowda (3):
  cxgb4: Add Chelsio LLD support Chelsio Crypto ULD
  chcr: Support for Chelsio's Crypto Hardware
  crypto: Added Chelsio Menu to the Kconfig file

 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|1 +
 drivers/crypto/chelsio/Kconfig |   21 +
 drivers/crypto/chelsio/Makefile|4 +
 drivers/crypto/chelsio/chcr_algo.c | 1509 
 drivers/crypto/chelsio/chcr_algo.h |  503 +++
 drivers/crypto/chelsio/chcr_core.c |  268 
 drivers/crypto/chelsio/chcr_core.h |   80 ++
 drivers/crypto/chelsio/chcr_crypto.h   |  204 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |   18 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   41 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   80 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |   10 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   64 +
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|  437 ++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  131 +-
 16 files changed, 3362 insertions(+), 11 deletions(-)
 create mode 100644 drivers/crypto/chelsio/Kconfig
 create mode 100644 drivers/crypto/chelsio/Makefile
 create mode 100644 drivers/crypto/chelsio/chcr_algo.c
 create mode 100644 drivers/crypto/chelsio/chcr_algo.h
 create mode 100644 drivers/crypto/chelsio/chcr_core.c
 create mode 100644 drivers/crypto/chelsio/chcr_core.h
 create mode 100644 drivers/crypto/chelsio/chcr_crypto.h

-- 
1.7.10.1

[Patch-V2 3/3] crypto: Added Chelsio Menu to the Kconfig file

2016-07-18 Thread Yeshaswi M R Gowda

Adds the config entry for the Chelsio Crypto Driver, Makefile changes
for the same.

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/crypto/Kconfig  |2 ++
 drivers/crypto/Makefile |1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index d77ba2f..b44faf0 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -537,4 +537,6 @@ config CRYPTO_DEV_ROCKCHIP
  This driver interfaces with the hardware crypto accelerator.
  Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode.
 
+source "drivers/crypto/chelsio/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 3c6432d..ad7250f 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -31,3 +31,4 @@ obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
 obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
-- 
1.7.10.1

RE: [PATCH/RFC] packet: fix sock_tx_timestamp() in packet_snd() via sendto syscall

2016-07-18 Thread Yoshihiro Shimoda

Hi,

> From: Willem de Bruijn
> Sent: Saturday, July 16, 2016 12:31 AM
> 
> On Thu, Jul 14, 2016 at 10:49 PM, Yoshihiro Shimoda
>  wrote:
> > Since the sendto syscall doesn't have msg_control buffer,
> > the sock_tx_timestamp() in packet_snd() cannot work correctly because
> > the sockc.tsflags is set to 0.
> 
> You're right. __sock_tx_timestamp used to take sk->sk_tsflags as
> input, now it relies solely on this parameter tsflags. All callsites
> must either pass sk->sk_tsflags directly or initialize sockc.tsflags
> to this value.

Thank you very much for the comment!

> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 9f0983f..d76fd41 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -2887,6 +2887,11 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
> > err = sock_cmsg_send(sk, msg, &sockc);
> > if (unlikely(err))
> > goto out_unlock;
> > +   } else {
> > +   /* Set tsflags from sk because a syscall (e.g. sendto) doesn't
> > +* have msg_control buffer.
> > +*/
> > +   sockc.tsflags = sk->sk_tsflags;
> > }
> 
> Better to follow the example of other protocols. In all three packet
> variants, make the following initialization change:
> 
> -   sockc.tsflags = 0;
> +   sockc.tsflags = sk->sk_tsflags;

Thank you for the suggestion. I submitted a fixed patch now.

Best regards,
Yoshihiro Shimoda

> (I had to remove some recipients, because my reply was marked as spam
> and dropped otherwise..)
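
The initialization order suggested in this reply can be sketched outside the kernel as follows. This is a simplified illustration: the function name and parameters are hypothetical, and a plain override stands in for what sock_cmsg_send() does when a control message is present (the real function has more detailed flag handling).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Start from the socket-level timestamp flags and let a control
 * message override them only when one was actually supplied.
 * Before the fix, the starting value was 0, so a plain sendto()
 * (which has no msg_control buffer) lost the socket's flags.
 */
static uint16_t effective_tsflags(uint16_t sk_tsflags,
                                  bool has_cmsg,
                                  uint16_t cmsg_tsflags)
{
    uint16_t tsflags = sk_tsflags;   /* default comes from the socket */

    if (has_cmsg)
        tsflags = cmsg_tsflags;      /* simplified per-call override */

    return tsflags;
}
```

Seeding the default unconditionally, rather than in an else branch, keeps all three send paths in af_packet.c consistent with each other.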

[PATCH v3] packet: fix second argument of sock_tx_timestamp()

2016-07-18 Thread Yoshihiro Shimoda

This patch fixes an issue where a syscall (e.g. the sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have a msg_control buffer,
sock_tx_timestamp() in packet_snd() cannot work correctly because
sockc.tsflags is set to 0.
So, this patch sets sockc.tsflags to sk->sk_tsflags by default.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Cc: 
Reported-by: Kazuya Mizuguchi 
Reported-by: Keita Kobayashi 
Signed-off-by: Yoshihiro Shimoda 
---
 Changes from v2:
  - Fix build error...

 Changes from v1:
  - Set sockc.tsflags to sk->sk_tsflags as default instead of a condition.
  - Fix other sockc.tsflags values in the af_packet.c.
  - Revise the commit log.

 About v1 (as RFC):
  - http://thread.gmane.org/gmane.linux.kernel.renesas-soc/5646

 net/packet/af_packet.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..53e87ce 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1927,7 +1927,7 @@ retry:
goto out_unlock;
}
 
-   sockc.tsflags = 0;
+   sockc.tsflags = sk->sk_tsflags;
if (msg->msg_controllen) {
err = sock_cmsg_send(sk, msg, &sockc);
if (unlikely(err)) {
@@ -2678,7 +2678,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
}
 
-   sockc.tsflags = 0;
+   sockc.tsflags = po->sk.sk_tsflags;
if (msg->msg_controllen) {
err = sock_cmsg_send(&po->sk, msg, &sockc);
if (unlikely(err))
@@ -2881,7 +2881,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
if (unlikely(!(dev->flags & IFF_UP)))
goto out_unlock;
 
-   sockc.tsflags = 0;
+   sockc.tsflags = sk->sk_tsflags;
sockc.mark = sk->sk_mark;
if (msg->msg_controllen) {
err = sock_cmsg_send(sk, msg, &sockc);
-- 
1.9.1

Re: [PATCH V2] Add flow control to the portmapper

2016-07-18 Thread Leon Romanovsky

On Mon, Jul 18, 2016 at 02:23:30PM -0500, Shiraz Saleem wrote:
> From: Mustafa Ismail 
> 
> During connection establishment with a large number of connections,
> it is possible that the connection requests might fail. Adding flow
> control prevents this failure. Change ibnl unicast to use netlink
> messaging with blocking to enable flow control.

You are the only user of this new inline function.
Why don't you call netlink_unicast() directly in your ibnl_unicast()
without touching a widely visible header file?

Thanks

RE: [PATCH] packet: fix second argument of sock_tx_timestamp()

2016-07-18 Thread Yoshihiro Shimoda

> -----Original Message-----
> From: Yoshihiro Shimoda
> Sent: Tuesday, July 19, 2016 2:15 PM
> 
> This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
> work correctly. Since the sendto syscall doesn't have msg_control buffer,
> the sock_tx_timestamp() in packet_snd() cannot work correctly because
> sockc.tsflags is set to 0.
> So, this patch sets sockc.tsflags to sk->sk_tsflags by default.
> 
> Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
> Cc: 
> Reported-by: Kazuya Mizuguchi 
> Reported-by: Keita Kobayashi 
> Signed-off-by: Yoshihiro Shimoda 
> ---
>  Changes from v1:
>   - Set sockc.tsflags to sk->sk_tsflags as default instead of a condition.
>   - Fix other sockc.tsflags values in the af_packet.c.
>   - Revise the commit log.
> 
>  About v1 (as RFC):
>   - http://thread.gmane.org/gmane.linux.kernel.renesas-soc/5646
> 
> 
>  net/packet/af_packet.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 9f0983f..50ea97e 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1927,7 +1927,7 @@ retry:
>   goto out_unlock;
>   }
> 
> - sockc.tsflags = 0;
> + sockc.tsflags = sk->sk_tsflags;
>   if (msg->msg_controllen) {
>   err = sock_cmsg_send(sk, msg, &sockc);
>   if (unlikely(err)) {
> @@ -2678,7 +2678,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>   dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
>   }
> 
> - sockc.tsflags = 0;
> + sockc.tsflags = sk->sk_tsflags;

Oops! I made a mistake here. I will resubmit a fixed patch.

Best regards,
Yoshihiro Shimoda

[PATCH] packet: fix second argument of sock_tx_timestamp()

2016-07-18 Thread Yoshihiro Shimoda

This patch fixes an issue where a syscall (e.g. the sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have a msg_control buffer,
sock_tx_timestamp() in packet_snd() cannot work correctly because
sockc.tsflags is set to 0.
So, this patch sets sockc.tsflags to sk->sk_tsflags by default.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Cc: 
Reported-by: Kazuya Mizuguchi 
Reported-by: Keita Kobayashi 
Signed-off-by: Yoshihiro Shimoda 
---
 Changes from v1:
  - Set sockc.tsflags to sk->sk_tsflags as default instead of a condition.
  - Fix other sockc.tsflags values in the af_packet.c.
  - Revise the commit log.

 About v1 (as RFC):
  - http://thread.gmane.org/gmane.linux.kernel.renesas-soc/5646


 net/packet/af_packet.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..50ea97e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1927,7 +1927,7 @@ retry:
goto out_unlock;
}
 
-   sockc.tsflags = 0;
+   sockc.tsflags = sk->sk_tsflags;
if (msg->msg_controllen) {
err = sock_cmsg_send(sk, msg, &sockc);
if (unlikely(err)) {
@@ -2678,7 +2678,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
}
 
-   sockc.tsflags = 0;
+   sockc.tsflags = sk->sk_tsflags;
if (msg->msg_controllen) {
err = sock_cmsg_send(&po->sk, msg, &sockc);
if (unlikely(err))
@@ -2881,7 +2881,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
if (unlikely(!(dev->flags & IFF_UP)))
goto out_unlock;
 
-   sockc.tsflags = 0;
+   sockc.tsflags = sk->sk_tsflags;
sockc.mark = sk->sk_mark;
if (msg->msg_controllen) {
err = sock_cmsg_send(sk, msg, &sockc);
-- 
1.9.1

Re: [PATCH net-next] macvtap: correctly free skb during socket destruction

2016-07-18 Thread David Miller

From: Jason Wang 
Date: Tue, 19 Jul 2016 11:02:59 +0800

> We should use kfree_skb() instead of kfree() to free an skb.
> 
> Fixes: 362899b8725b ("macvtap: switch to use skb array")
> Reported-by: Dan Carpenter 
> Signed-off-by: Jason Wang 

Applied, thanks Jason.

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread David Miller

From: "Liang, Kan" 
Date: Tue, 19 Jul 2016 01:49:41 +

> Yes, rtnl will bring some overheads. But the configuration is one
> time thing for application or socket. It only happens on receiving
> first packet.

Thanks for destroying our connection rates.

This kind of overhead is simply unacceptable.

Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-18 Thread Florian Fainelli

On 18/07/2016 at 20:24, Andrew Lunn wrote:
> On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote:
>> Add a new function for DSA drivers to handle the switchdev
>> SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
>>
>> The ageing time is passed as milliseconds.
>>
>> Also because we can have multiple logical bridges on top of a physical
>> switch and ageing time are switch-wide, call the driver function with
>> the fastest ageing time in use on the chip instead of the requested one.
>>
>> Signed-off-by: Vivien Didelot 
>> ---
>>  include/net/dsa.h |  2 ++
>>  net/dsa/slave.c   | 41 +
> 

Hi Andrew,

> Hi Florian
> 
> It looks like the SF2 can do fast ageing per port. What I don't see is
> what configuration options you have. Can you get the fast and the
> normal age time per port? Or is it global?

The normal ageing is global and the value needs to be programmed in
seconds, and can range from 10 to 1,048,575 (encoded on 20 bits). The
fast-ageing can actually be per-port, per-VLAN id, for just dynamic or
static entries etc. and is just a poor name for a flush based on any of
these criteria.
-- 
Florian
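
Since the switchdev attribute is passed in milliseconds while the SF2 register described above takes seconds in a 20-bit field, a driver needs a conversion with a range check. The following is only a sketch: the function and macro names are hypothetical, and only the limits (10 to 1,048,575 seconds) come from this message.

```c
#include <stdint.h>

#define AGE_MIN_SECS 10u
#define AGE_MAX_SECS ((1u << 20) - 1)   /* 1,048,575: largest 20-bit value */

/*
 * Convert an ageing time in milliseconds to whole seconds.
 * Returns 0 and stores the result in *secs on success, or -1 when
 * the value does not fit the range quoted for the hardware.
 */
static int ageing_msecs_to_secs(uint32_t msecs, uint32_t *secs)
{
    uint32_t s = msecs / 1000;

    if (s < AGE_MIN_SECS || s > AGE_MAX_SECS)
        return -1;

    *secs = s;
    return 0;
}
```

Rejecting out-of-range values is one possible policy; a driver could equally clamp to the nearest supported value.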

Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote:
> Add a new function for DSA drivers to handle the switchdev
> SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
> 
> The ageing time is passed as milliseconds.
> 
> Also because we can have multiple logical bridges on top of a physical
> switch and ageing time are switch-wide, call the driver function with
> the fastest ageing time in use on the chip instead of the requested one.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  include/net/dsa.h |  2 ++
>  net/dsa/slave.c   | 41 +

Hi Florian

It looks like the SF2 can do fast ageing per port. What I don't see is
what configuration options you have. Can you get the fast and the
normal age time per port? Or is it global?

   Andrew
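
The "fastest ageing time in use on the chip" rule from the quoted patch description amounts to taking a minimum over all bridges that share the switch. A minimal sketch, with a hypothetical function name and a plain array standing in for however the DSA core tracks the per-bridge values:

```c
#include <stddef.h>

/*
 * Return the smallest (fastest) ageing time, in milliseconds, among
 * the newly requested value and the values already in use by other
 * logical bridges on the same physical switch.
 */
static unsigned int fastest_ageing_time_ms(const unsigned int *in_use_ms,
                                           size_t count,
                                           unsigned int requested_ms)
{
    unsigned int fastest = requested_ms;

    for (size_t i = 0; i < count; i++) {
        if (in_use_ms[i] < fastest)
            fastest = in_use_ms[i];
    }

    return fastest;
}
```

The consequence is that a bridge asking for a slow ageing time may still see entries aged out at the rate of the fastest bridge on the same chip.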

Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Fengguang Wu


On Mon, Jul 18, 2016 at 07:38:27PM -0700, Alexei Starovoitov wrote:

On Tue, Jul 19, 2016 at 08:38:02AM +0800, Fengguang Wu wrote:

Hi Alexei,

On Mon, Jul 18, 2016 at 05:33:07PM -0700, Alexei Starovoitov wrote:
>On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:
>>From: Andrew Morton 
>>Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug
>>
>>kernel/trace/bpf_trace.c: In function 'bpf_event_output':
>>kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in initializer
>>kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
>>kernel/trace/bpf_trace.c:312: warning: (near initialization for 'raw.frag.')
>>
>>Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event output")
>>Acked-by: Daniel Borkmann 
>>Cc: Alexei Starovoitov 
>>Cc: David S. Miller 
>>Signed-off-by: Andrew Morton 
>
>Acked-by: Alexei Starovoitov 
>
>Fengguang can you add gcc-4.4 to buildbot. Thanks!

Sure. Currently we only test gcc-6. It'd be easy to test more versions
concurrently, like

gcc-4.4
gcc-4.6
gcc-4.8
gcc-4.9
gcc-5
gcc-6


thanks! If you need to reduce the test matrix I don't see a concern
of dropping 4.6 and 4.8.
4.4 is good for old stuff, 4.9 is the most stable and 5/6 are good
for new warnings.


Not a burden at all. I've enabled them all. :)

Thanks,
Fengguang
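
For readers unfamiliar with this class of compiler bug, the usual workaround is to wrap the anonymous union's initializer in its own braces rather than naming its member directly at the outer level. The sketch below uses a made-up struct and is an assumption about the general technique, not a copy of the patch discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical struct with an anonymous union as its first member. */
struct mock_frag {
    union {
        struct mock_frag *next;
        unsigned long pad;
    };
    int size;
};

static struct mock_frag make_frag(struct mock_frag *next, int size)
{
    /*
     * Writing ".next = next" directly at the outer level is valid C11,
     * but very old gcc releases reject it. The extra braces initialize
     * the anonymous union positionally and work with both.
     */
    struct mock_frag f = {
        { .next = next },
        .size = size,
    };

    return f;
}
```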

[PATCH net-next] macvtap: correctly free skb during socket destruction

2016-07-18 Thread Jason Wang

We should use kfree_skb() instead of kfree() to free an skb.

Fixes: 362899b8725b ("macvtap: switch to use skb array")
Reported-by: Dan Carpenter 
Signed-off-by: Jason Wang 
---
 drivers/net/macvtap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9204d19..a38c0da 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -536,7 +536,7 @@ static void macvtap_sock_destruct(struct sock *sk)
struct sk_buff *skb;
 
while ((skb = skb_array_consume(&q->skb_array)) != NULL)
-   kfree(skb);
+   kfree_skb(skb);
 }
 
 static int macvtap_open(struct inode *inode, struct file *file)
-- 
2.7.4
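
Why a dedicated free routine matters can be shown with a user-space analogy. Everything here is a mock: the struct, the helpers and the allocation counter are hypothetical and much simpler than a real sk_buff, which also carries reference counts and destructors.

```c
#include <stdlib.h>

static int live_allocs;   /* counts outstanding allocations in this sketch */

struct mock_skb {
    unsigned char *data;  /* separately allocated payload */
};

static struct mock_skb *mock_alloc_skb(size_t len)
{
    struct mock_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->data = malloc(len);
    if (!skb->data) {
        free(skb);
        return NULL;
    }
    live_allocs += 2;
    return skb;
}

/* Releases the payload as well as the header, unlike a bare free(skb). */
static void mock_kfree_skb(struct mock_skb *skb)
{
    if (!skb)
        return;
    free(skb->data);
    free(skb);
    live_allocs -= 2;
}
```

Calling plain free() on the header alone would leave the payload allocated, which is the user-space equivalent of the leak the patch above avoids.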

Re: [PATCH net-next v3 03/12] net: dsa: mv88e6xxx: extract device mapping

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 08:45:31PM -0400, Vivien Didelot wrote:
> The Device Mapping register is an indirect table access.
> 
> Provide helpers to access this table and make explicit the check for the
> new DSA_RTABLE_NONE routing table value.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v8 04/11] net/mlx4_en: add support for fast rx drop bpf program

2016-07-18 Thread Alexei Starovoitov

On Mon, Jul 18, 2016 at 03:07:01PM +0200, Tom Herbert wrote:
> On Mon, Jul 18, 2016 at 2:48 PM, Thomas Graf  wrote:
> > On 07/18/16 at 01:39pm, Tom Herbert wrote:
> >> On Mon, Jul 18, 2016 at 11:10 AM, Thomas Graf  wrote:
> >> > I agree with that but I would like to keep the current per net_device
> >> > atomic properties.
> >>
> >> I don't see that there are any synchronization guarantees
> >> using xchg. For instance, if the pointer is set right after being read
> >> by a thread for one queue and right before being read by a thread for
> >> another queue, this could result in the old and new program running
> >> concurrently or old one running after new. If we need to synchronize
> >> the operation across all queues then sequence
> >> ifdown,modify-config,ifup will work.
> >
> > Right, there are no synchronization guarantees between threads and I
> > don't think that's needed. The guarantee that is provided is that if
> > I replace a BPF program, the replace either succeeds in which case
> > all packets have been either processed by the old or new program. Or
> > the replace failed in which case the old program was left intact and
> > all packets are still going through the old program.
> >
> > This is a nice atomic replacement principle which would be nice to
> > preserve.
> 
> Sure, if replace operation fails then old program should remain in
> place. But xchg can't fail, so it seems like part is just giving a
> false sense of security that program replacement is somehow
> synchronized across queues.

good point. we do read_once at the beginning of napi, so we can
process a bunch of packets in other cpus even after xchg is all done.
Then I guess we can have prog pointers in rings and it only marginally
increases the race. Why not if it doesn't increase the patch complexity...
btw we definitely want to avoid drain/start/stop or any slow operation
during prog xchg. When xdp is used for dos, the prog swap needs to be fast.
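To make the point about xchg and read_once concrete, here is a minimal userspace sketch. It uses C11 atomics as a stand-in for the kernel's xchg() and READ_ONCE(), and struct prog, struct dev, dev_swap_prog() and dev_poll() are hypothetical names for illustration, not the mlx4_en code. The swap itself cannot fail, but a poll that already took its snapshot keeps using the old program until the batch ends.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF program; not the kernel's struct bpf_prog. */
struct prog {
	int verdict;
};

/* Hypothetical device holding one program pointer shared by all rings. */
struct dev {
	_Atomic(struct prog *) prog;
};

/* Control path: publish the new program and hand back the old one.
 * The exchange is a single atomic step and cannot fail.
 */
static struct prog *dev_swap_prog(struct dev *d, struct prog *new_prog)
{
	return atomic_exchange(&d->prog, new_prog);
}

/* Data path: read the pointer once at the start of the poll, then use
 * that snapshot for every packet in the batch. A swap that happens
 * during the loop is only seen by the next poll.
 */
static int dev_poll(struct dev *d, int budget)
{
	struct prog *p = atomic_load(&d->prog);
	int i, sum = 0;

	assert(budget >= 0);
	for (i = 0; i < budget; i++)
		sum += p ? p->verdict : 0;

	return sum;
}
```

Each ring's poll takes its own snapshot, so two rings can briefly run different programs after a swap. That is the window discussed above, and the swap stays fast because nothing is drained or stopped.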

Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Alexei Starovoitov

On Tue, Jul 19, 2016 at 08:38:02AM +0800, Fengguang Wu wrote:
> Hi Alexei,
> 
> On Mon, Jul 18, 2016 at 05:33:07PM -0700, Alexei Starovoitov wrote:
> >On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:
> >>From: Andrew Morton 
> >>Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
> >>initialization bug
> >>
> >>kernel/trace/bpf_trace.c: In function 'bpf_event_output':
> >>kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
> >>initializer
> >>kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
> >>kernel/trace/bpf_trace.c:312: warning: (near initialization for 
> >>'raw.frag.')
> >>
> >>Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
> >>output")
> >>Acked-by: Daniel Borkmann 
> >>Cc: Alexei Starovoitov 
> >>Cc: David S. Miller 
> >>Signed-off-by: Andrew Morton 
> >
> >Acked-by: Alexei Starovoitov 
> >
> >Fengguang can you add gcc-4.4 to buildbot. Thanks!
> 
> Sure. Currently we only test gcc-6. It'd be easy to test more versions
> concurrently, like
> 
> gcc-4.4
> gcc-4.6
> gcc-4.8
> gcc-4.9
> gcc-5
> gcc-6

thanks! If you need to reduce the test matrix I don't see a concern
of dropping 4.6 and 4.8.
4.4 is good for old stuff, 4.9 is the most stable and 5/6 are good
for new warnings.
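For readers unfamiliar with the gcc-4.4 failure quoted above, the following sketch shows one common way to write such an initializer so that older compilers accept it. The patch body is not shown in this thread, so this is not necessarily the change that was applied; struct frag, struct record and make_record() are hypothetical, loosely modelled on the error message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record whose first member is an anonymous union. */
struct frag {
	union {
		struct frag *next;
		unsigned long pad;
	};
	unsigned int size;
	const void *data;
};

struct record {
	struct frag frag;
};

/* Form that old compilers reject with "unknown field 'next'":
 *   struct record r = { .frag = { .next = n, .size = s, .data = d } };
 * Giving the anonymous union its own pair of braces avoids naming one
 * of its members directly at the outer level.
 */
static struct record make_record(struct frag *next, unsigned int size,
				 const void *data)
{
	struct record r = {
		.frag = {
			{
				.next = next,
			},
			.size = size,
			.data = data,
		},
	};

	return r;
}
```

The extra braces are also valid for newer compilers, so the same source builds across the whole test matrix.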

[PATCH net-next v3 10/10] net/faraday: Mask PHY interrupt with NCSI mode

2016-07-18 Thread Gavin Shan

Bogus PHY interrupts are observed. This masks the PHY interrupt
when the interface works in NCSI mode as there is no attached
PHY under the circumstance.
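The idea can be sketched outside the driver as follows. The bit values below are made up for illustration (the real FTGMAC100_INT_* constants live in the driver header), and build_int_mask() and pending_errors() are hypothetical helpers. The per-device mask is computed once, and the PHY status bit is dropped in NCSI mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define INT_RPKT_BUF	(1u << 0)
#define INT_NO_RXBUF	(1u << 1)
#define INT_RPKT_LOST	(1u << 2)
#define INT_XPKT_ETH	(1u << 3)
#define INT_XPKT_LOST	(1u << 4)
#define INT_AHB_ERR	(1u << 5)
#define INT_PHYSTS_CHG	(1u << 6)

/* Build the per-device interrupt mask once, at probe time. */
static uint32_t build_int_mask(bool use_ncsi)
{
	uint32_t mask = INT_RPKT_LOST | INT_XPKT_ETH | INT_XPKT_LOST |
			INT_AHB_ERR | INT_PHYSTS_CHG | INT_RPKT_BUF |
			INT_NO_RXBUF;

	/* No PHY is attached in NCSI mode, so ignore PHY status changes. */
	if (use_ncsi)
		mask &= ~INT_PHYSTS_CHG;

	return mask;
}

/* Report only error events that are both pending and enabled. */
static uint32_t pending_errors(uint32_t status, uint32_t mask)
{
	return status & mask & (INT_NO_RXBUF | INT_RPKT_LOST |
				INT_AHB_ERR | INT_PHYSTS_CHG);
}
```

With the mask stored per device, the poll path and the interrupt-enable writes use the same value, so a bogus PHY event is neither enabled nor logged in NCSI mode.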

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index d8afa2d..2d4c7ea 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -74,6 +74,7 @@ struct ftgmac100 {
 
struct mii_bus *mii_bus;
int old_speed;
+   int int_mask_all;
bool use_ncsi;
bool enabled;
 };
@@ -84,14 +85,6 @@ static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
 /**
  * internal functions (hardware register access)
  */
-#define INT_MASK_ALL_ENABLED   (FTGMAC100_INT_RPKT_LOST| \
-FTGMAC100_INT_XPKT_ETH | \
-FTGMAC100_INT_XPKT_LOST| \
-FTGMAC100_INT_AHB_ERR  | \
-FTGMAC100_INT_PHYSTS_CHG   | \
-FTGMAC100_INT_RPKT_BUF | \
-FTGMAC100_INT_NO_RXBUF)
-
 static void ftgmac100_set_rx_ring_base(struct ftgmac100 *priv, dma_addr_t addr)
 {
iowrite32(addr, priv->base + FTGMAC100_OFFSET_RXR_BADR);
@@ -1070,8 +1063,9 @@ static int ftgmac100_poll(struct napi_struct *napi, int 
budget)
ftgmac100_tx_complete(priv);
}
 
-   if (status & (FTGMAC100_INT_NO_RXBUF | FTGMAC100_INT_RPKT_LOST |
- FTGMAC100_INT_AHB_ERR | FTGMAC100_INT_PHYSTS_CHG)) {
+   if (status & priv->int_mask_all & (FTGMAC100_INT_NO_RXBUF |
+   FTGMAC100_INT_RPKT_LOST | FTGMAC100_INT_AHB_ERR |
+   FTGMAC100_INT_PHYSTS_CHG)) {
if (net_ratelimit())
netdev_info(netdev, "[ISR] = 0x%x: %s%s%s%s\n", status,
status & FTGMAC100_INT_NO_RXBUF ? "NO_RXBUF 
" : "",
@@ -1094,7 +1088,8 @@ static int ftgmac100_poll(struct napi_struct *napi, int 
budget)
napi_complete(napi);
 
/* enable all interrupts */
-   iowrite32(INT_MASK_ALL_ENABLED, priv->base + 
FTGMAC100_OFFSET_IER);
+   iowrite32(priv->int_mask_all,
+ priv->base + FTGMAC100_OFFSET_IER);
}
 
return rx;
@@ -1140,7 +1135,7 @@ static int ftgmac100_open(struct net_device *netdev)
netif_start_queue(netdev);
 
/* enable all interrupts */
-   iowrite32(INT_MASK_ALL_ENABLED, priv->base + FTGMAC100_OFFSET_IER);
+   iowrite32(priv->int_mask_all, priv->base + FTGMAC100_OFFSET_IER);
 
/* Start the NCSI device */
if (priv->use_ncsi) {
@@ -1365,6 +1360,13 @@ static int ftgmac100_probe(struct platform_device *pdev)
/* MAC address from chip or random one */
ftgmac100_setup_mac(priv);
 
+   priv->int_mask_all = (FTGMAC100_INT_RPKT_LOST |
+ FTGMAC100_INT_XPKT_ETH |
+ FTGMAC100_INT_XPKT_LOST |
+ FTGMAC100_INT_AHB_ERR |
+ FTGMAC100_INT_PHYSTS_CHG |
+ FTGMAC100_INT_RPKT_BUF |
+ FTGMAC100_INT_NO_RXBUF);
if (pdev->dev.of_node &&
of_get_property(pdev->dev.of_node, "use-ncsi", NULL)) {
if (!IS_ENABLED(CONFIG_NET_NCSI)) {
@@ -1374,6 +1376,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
dev_info(&pdev->dev, "Using NCSI interface\n");

priv->use_ncsi = true;
+   priv->int_mask_all &= ~FTGMAC100_INT_PHYSTS_CHG;
priv->ndev = ncsi_register_dev(netdev, ftgmac100_ncsi_handler);
if (!priv->ndev)
goto err_ncsi_dev;
-- 
2.1.0

[PATCH net-next v3 01/10] net/ncsi: Resource management

2016-07-18 Thread Gavin Shan

NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.

   * The user (e.g. netdev driver) dereferences the NCSI device through
 "struct ncsi_dev", which is embedded in "struct ncsi_dev_priv".
 The latter is used by the NCSI stack internally.
   * Every NCSI device can have multiple packages simultaneously, up
 to 8 packages. Each is represented by "struct ncsi_package" and
 identified by a 3-bit ID.
   * Every NCSI package can have multiple channels, up to 32. Each is
 represented by "struct ncsi_channel" and identified by a 5-bit ID.
   * Every NCSI channel has version, statistics, various modes and
 filters. They are represented by "struct ncsi_channel_version",
 "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
 "struct ncsi_channel_filter" separately.
   * Apart from AEN (Asynchronous Event Notification), the NCSI stack
 works in terms of command and response. This introduces "struct
 ncsi_req" to represent a complete NCSI transaction made of NCSI
 request and response.
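The 3-bit package ID and 5-bit channel ID in the list above fit into a single byte. The sketch below assumes the package ID sits in the upper three bits and the channel ID in the lower five; that layout and the helper names are assumptions for illustration, not taken from net/ncsi/internal.h.

```c
#include <assert.h>
#include <stdint.h>

#define PKG_BITS	3
#define CHAN_BITS	5
#define PKG_MAX		((1u << PKG_BITS) - 1)		/* 7, i.e. 8 packages  */
#define CHAN_MAX	((1u << CHAN_BITS) - 1)		/* 31, i.e. 32 channels */

/* Combine package and channel IDs into one byte (assumed layout). */
static uint8_t ncsi_pack_id(unsigned int package, unsigned int channel)
{
	assert(package <= PKG_MAX && channel <= CHAN_MAX);
	return (uint8_t)((package << CHAN_BITS) | channel);
}

/* Extract the package ID from the combined byte. */
static unsigned int ncsi_id_package(uint8_t id)
{
	return id >> CHAN_BITS;
}

/* Extract the channel ID from the combined byte. */
static unsigned int ncsi_id_channel(uint8_t id)
{
	return id & CHAN_MAX;
}
```

For example, package 1 and channel 3 would be carried as 0x23 under this layout.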

link: 
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 include/net/ncsi.h |  46 ++
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/ncsi/Kconfig   |  12 ++
 net/ncsi/Makefile  |   4 +
 net/ncsi/internal.h| 256 +
 net/ncsi/ncsi-manage.c | 436 +
 7 files changed, 756 insertions(+)
 create mode 100644 include/net/ncsi.h
 create mode 100644 net/ncsi/Kconfig
 create mode 100644 net/ncsi/Makefile
 create mode 100644 net/ncsi/internal.h
 create mode 100644 net/ncsi/ncsi-manage.c

diff --git a/include/net/ncsi.h b/include/net/ncsi.h
new file mode 100644
index 0000000..70d14ee
--- /dev/null
+++ b/include/net/ncsi.h
@@ -0,0 +1,46 @@
+#ifndef __NET_NCSI_H
+#define __NET_NCSI_H
+
+/*
+ * The NCSI device states seen from external. More NCSI device states are
+ * only visible internally (in net/ncsi/internal.h). When the NCSI device
+ * is registered, it's in ncsi_dev_state_registered state. The state
+ * ncsi_dev_state_start is used to drive to choose active package and
+ * channel. After that, its state is changed to ncsi_dev_state_functional.
+ *
+ * The state ncsi_dev_state_stop helps to shut down the currently active
+ * package and channel while ncsi_dev_state_config helps to reconfigure
+ * them.
+ */
+enum {
+   ncsi_dev_state_registered   = 0x0000,
+   ncsi_dev_state_functional   = 0x0100,
+   ncsi_dev_state_probe= 0x0200,
+   ncsi_dev_state_config   = 0x0300,
+   ncsi_dev_state_suspend  = 0x0400,
+};
+
+struct ncsi_dev {
+   int   state;
+   int   link_up;
+   struct net_device *dev;
+   void  (*handler)(struct ncsi_dev *ndev);
+};
+
+#ifdef CONFIG_NET_NCSI
+struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
+  void (*notifier)(struct ncsi_dev *nd));
+void ncsi_unregister_dev(struct ncsi_dev *nd);
+#else /* !CONFIG_NET_NCSI */
+static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
+   void (*notifier)(struct ncsi_dev *nd))
+{
+   return NULL;
+}
+
+static inline void ncsi_unregister_dev(struct ncsi_dev *nd)
+{
+}
+#endif /* CONFIG_NET_NCSI */
+
+#endif /* __NET_NCSI_H */
diff --git a/net/Kconfig b/net/Kconfig
index ff40562..c2cdbce 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -237,6 +237,7 @@ source "net/hsr/Kconfig"
 source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
 source "net/qrtr/Kconfig"
+source "net/ncsi/Kconfig"
 
 config RPS
bool
diff --git a/net/Makefile b/net/Makefile
index bdd1455..9bd20bb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -79,3 +79,4 @@ ifneq ($(CONFIG_NET_L3_MASTER_DEV),)
 obj-y  += l3mdev/
 endif
 obj-$(CONFIG_QRTR) += qrtr/
+obj-$(CONFIG_NET_NCSI) += ncsi/
diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
new file mode 100644
index 0000000..08a8a60
--- /dev/null
+++ b/net/ncsi/Kconfig
@@ -0,0 +1,12 @@
+#
+# Configuration for NCSI support
+#
+
+config NET_NCSI
+   bool "NCSI interface support"
+   depends on INET
+   ---help---
+ This module provides NCSI (Network Controller Sideband Interface)
+ support. Enable this only if your system connects to a network
+ device via NCSI and the ethernet driver you're using supports
+ the protocol explicitly.
diff --git a/net/ncsi/Makefile b/net/ncsi/Makefile
new file mode 100644
index 0000000..07b5625
--- /dev/null
+++ b/net/ncsi/Makefile
@@ -0,0 +1,4 @@
+#
+#

[PATCH net-next v3 03/10] net/ncsi: NCSI response packet handler

2016-07-18 Thread Gavin Shan

The NCSI response packets are sent to MC (Management Controller)
from the remote end. They are responses of NCSI command packets
for multiple purposes: completion status of NCSI command packets,
return NCSI channel's capability or configuration etc.

This defines struct to represent NCSI response packets and introduces
function ncsi_rcv_rsp() which will be used to receive NCSI response
packets and parse them.

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 net/ncsi/Makefile   |2 +-
 net/ncsi/internal.h |2 +
 net/ncsi/ncsi-pkt.h |  208 +++
 net/ncsi/ncsi-rsp.c | 1016 +++
 4 files changed, 1227 insertions(+), 1 deletion(-)
 create mode 100644 net/ncsi/ncsi-rsp.c

diff --git a/net/ncsi/Makefile b/net/ncsi/Makefile
index abc4046..4751819 100644
--- a/net/ncsi/Makefile
+++ b/net/ncsi/Makefile
@@ -1,4 +1,4 @@
 #
 # Makefile for NCSI API
 #
-obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-manage.o
+obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-rsp.o ncsi-manage.o
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 3d81697..bd000c9 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -271,5 +271,7 @@ struct ncsi_dev *ncsi_find_dev(struct net_device *dev);
 /* Packet handlers */
 u32 ncsi_calculate_checksum(unsigned char *data, int len);
 int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca);
+int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
+struct packet_type *pt, struct net_device *orig_dev);
 
 #endif /* __NCSI_INTERNAL_H__ */
diff --git a/net/ncsi/ncsi-pkt.h b/net/ncsi/ncsi-pkt.h
index 5481458..4bdefd9 100644
--- a/net/ncsi/ncsi-pkt.h
+++ b/net/ncsi/ncsi-pkt.h
@@ -25,6 +25,12 @@ struct ncsi_cmd_pkt_hdr {
struct ncsi_pkt_hdr common; /* Common NCSI packet header */
 };
 
+struct ncsi_rsp_pkt_hdr {
+   struct ncsi_pkt_hdr common; /* Common NCSI packet header */
+   __be16  code;   /* Response code */
+   __be16  reason; /* Response reason   */
+};
+
 /* NCSI common command packet */
 struct ncsi_cmd_pkt {
struct ncsi_cmd_pkt_hdr cmd;  /* Command header */
@@ -32,6 +38,12 @@ struct ncsi_cmd_pkt {
unsigned char   pad[26];
 };
 
+struct ncsi_rsp_pkt {
+   struct ncsi_rsp_pkt_hdr rsp;  /* Response header */
+   __be32  checksum; /* Checksum*/
+   unsigned char   pad[22];
+};
+
 /* Select Package */
 struct ncsi_cmd_sp_pkt {
struct ncsi_cmd_pkt_hdr cmd;/* Command header */
@@ -133,6 +145,157 @@ struct ncsi_cmd_snfc_pkt {
unsigned char   pad[22];
 };
 
+/* Get Link Status */
+struct ncsi_rsp_gls_pkt {
+   struct ncsi_rsp_pkt_hdr rsp;/* Response header   */
+   __be32  status; /* Link status   */
+   __be32  other;  /* Other indications */
+   __be32  oem_status; /* OEM link status   */
+   __be32  checksum;
+   unsigned char   pad[10];
+};
+
+/* Get Version ID */
+struct ncsi_rsp_gvi_pkt {
+   struct ncsi_rsp_pkt_hdr rsp;  /* Response header */
+   __be32  ncsi_version; /* NCSI version*/
+   unsigned char   reserved[3];  /* Reserved*/
+   unsigned char   alpha2;   /* NCSI version*/
+   unsigned char   fw_name[12];  /* f/w name string */
+   __be32  fw_version;   /* f/w version */
+   __be16  pci_ids[4];   /* PCI IDs */
+   __be32  mf_id;/* Manufacture ID  */
+   __be32  checksum;
+};
+
+/* Get Capabilities */
+struct ncsi_rsp_gc_pkt {
+   struct ncsi_rsp_pkt_hdr rsp; /* Response header   */
+   __be32  cap; /* Capabilities  */
+   __be32  bc_cap;  /* Broadcast cap */
+   __be32  mc_cap;  /* Multicast cap */
+   __be32  buf_cap; /* Buffering cap */
+   __be32  aen_cap; /* AEN cap   */
+   unsigned char   vlan_cnt;/* VLAN filter count */
+   unsigned char   mixed_cnt;   /* Mix filter count  */
+   unsigned char   mc_cnt;  /* MC filter count   */
+   unsigned char   uc_cnt;  /* UC filter count   */
+   unsigned char   reserved[2]; /* Reserved  */
+   unsigned char   vlan_mode;   /* VLAN mode */
+   unsigned char   channel_cnt; /* Channel count */
+   __be32  checksum;/* Checksum  */
+};
+
+/* Get Parameters */
+struct ncsi_rsp_gp_pkt {
+   struct ncsi_rsp_pkt_hdr rsp;  /* Response header   */
+   unsigned char   mac_cnt;  /* Number of MAC addr*/
+   unsigned char

[PATCH net-next v3 08/10] net/faraday: Support NCSI mode

2016-07-18 Thread Gavin Shan

This makes ftgmac100 driver support NCSI mode. The NCSI is enabled
on the interface if property "use-nc-si" or "use-ncsi" is found from
the device node in device tree.

   * No PHY device is used when NCSI mode is enabled.
   * The NCSI device (struct ncsi_dev) is created when probing the
 device while it's enabled/started when the interface is brought
 up.
   * Hardware IP checksum doesn't work when NCSI mode is enabled. It
 is disabled when NCSI is enabled.

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 85 
 1 file changed, 75 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 2c3f656..1cd4975 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ftgmac100.h"
 
@@ -68,10 +69,13 @@ struct ftgmac100 {
 
struct net_device *netdev;
struct device *dev;
+   struct ncsi_dev *ndev;
struct napi_struct napi;
 
struct mii_bus *mii_bus;
int old_speed;
+   bool use_ncsi;
+   bool enabled;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
@@ -1010,7 +1014,10 @@ static irqreturn_t ftgmac100_interrupt(int irq, void 
*dev_id)
struct net_device *netdev = dev_id;
struct ftgmac100 *priv = netdev_priv(netdev);
 
-   if (likely(netif_running(netdev))) {
+   /* When running in NCSI mode, the interface should be ready for
+* receiving or transmitting NCSI packets before it's opened.
+*/
+   if (likely(priv->use_ncsi || netif_running(netdev))) {
/* Disable interrupts for polling */
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
napi_schedule(&priv->napi);
@@ -1123,17 +1130,33 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_hw;
 
ftgmac100_init_hw(priv);
-   ftgmac100_start_hw(priv, 10);
-
-   phy_start(netdev->phydev);
+   ftgmac100_start_hw(priv, priv->use_ncsi ? 100 : 10);
+   if (netdev->phydev)
+   phy_start(netdev->phydev);
+   else if (priv->use_ncsi)
+   netif_carrier_on(netdev);
 
napi_enable(&priv->napi);
netif_start_queue(netdev);
 
/* enable all interrupts */
iowrite32(INT_MASK_ALL_ENABLED, priv->base + FTGMAC100_OFFSET_IER);
+
+   /* Start the NCSI device */
+   if (priv->use_ncsi) {
+   err = ncsi_start_dev(priv->ndev);
+   if (err)
+   goto err_ncsi;
+   }
+
+   priv->enabled = true;
+
return 0;
 
+err_ncsi:
+   napi_disable(&priv->napi);
+   netif_stop_queue(netdev);
+   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 err_hw:
free_irq(priv->irq, netdev);
 err_irq:
@@ -1146,12 +1169,17 @@ static int ftgmac100_stop(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
 
+   if (!priv->enabled)
+   return 0;
+
/* disable all interrupts */
+   priv->enabled = false;
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
netif_stop_queue(netdev);
napi_disable(&priv->napi);
-   phy_stop(netdev->phydev);
+   if (netdev->phydev)
+   phy_stop(netdev->phydev);
 
ftgmac100_stop_hw(priv);
free_irq(priv->irq, netdev);
@@ -1192,6 +1220,9 @@ static int ftgmac100_hard_start_xmit(struct sk_buff *skb,
 /* optional */
 static int ftgmac100_do_ioctl(struct net_device *netdev, struct ifreq *ifr, 
int cmd)
 {
+   if (!netdev->phydev)
+   return -ENXIO;
+
return phy_mii_ioctl(netdev->phydev, ifr, cmd);
 }
 
@@ -1258,6 +1289,15 @@ static void ftgmac100_destroy_mdio(struct net_device 
*netdev)
mdiobus_free(priv->mii_bus);
 }
 
+static void ftgmac100_ncsi_handler(struct ncsi_dev *nd)
+{
+   if (unlikely(nd->state != ncsi_dev_state_functional))
+   return;
+
+   netdev_info(nd->dev, "NCSI interface %s\n",
+   nd->link_up ? "up" : "down");
+}
+
 /**
  * struct platform_driver functions
  */
@@ -1267,7 +1307,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
int irq;
struct net_device *netdev;
struct ftgmac100 *priv;
-   int err;
+   int err = 0;
 
if (!pdev)
return -ENODEV;
@@ -1291,7 +1331,6 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
netdev->ethtool_ops = &ftgmac100_ethtool_ops;
netdev->netdev_ops = &ftgmac100_netdev_ops;
-   netdev->features = NETIF_F_IP_CSUM | NETIF_F_GRO;
 
platform_set_drvdata(pdev, netdev);
 
@@ -1326,9 +1365,34 @@ static int

[PATCH net-next v3 05/10] net/ncsi: NCSI AEN packet handler

2016-07-18 Thread Gavin Shan

This introduces NCSI AEN packet handlers, which lead to one of two
outcomes: (A) the currently active channel is reconfigured; (B) the
currently active channel is deconfigured and disabled, and another
channel is chosen as the active one and configured. Case (B) won't
happen if hardware arbitration has been enabled; the channel that
was in active state is simply suspended.
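The link state change check in the handler below can be read as a small pure function. This is a sketch under assumptions: the two-value enum and lsc_needs_requeue() are hypothetical simplifications of the patch's conditions, and bit 0 of the status word is taken to be the link-up flag as the patch's 0x1 mask suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced channel state; the stack defines more states. */
enum chan_state {
	CHAN_INACTIVE,
	CHAN_ACTIVE,
};

/* Decide whether a link state change should queue the channel for
 * processing. Bit 0 of the status word is assumed to be "link up".
 */
static bool lsc_needs_requeue(uint32_t old_status, uint32_t new_status,
			      enum chan_state state, bool already_queued)
{
	bool link_up = new_status & 0x1;

	/* Nothing to do if the link bit did not flip or work is pending. */
	if (!((old_status ^ new_status) & 0x1) || already_queued)
		return false;

	/* Act on "inactive channel came up" or "active channel went down". */
	return (state == CHAN_INACTIVE && link_up) ||
	       (state == CHAN_ACTIVE && !link_up);
}
```

Any other combination, such as an active channel reporting link up again, leaves the current configuration alone.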

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 net/ncsi/Makefile   |   2 +-
 net/ncsi/internal.h |   1 +
 net/ncsi/ncsi-aen.c | 193 
 net/ncsi/ncsi-pkt.h |  36 ++
 net/ncsi/ncsi-rsp.c |   6 +-
 5 files changed, 236 insertions(+), 2 deletions(-)
 create mode 100644 net/ncsi/ncsi-aen.c

diff --git a/net/ncsi/Makefile b/net/ncsi/Makefile
index 4751819..dd12b56 100644
--- a/net/ncsi/Makefile
+++ b/net/ncsi/Makefile
@@ -1,4 +1,4 @@
 #
 # Makefile for NCSI API
 #
-obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-rsp.o ncsi-manage.o
+obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-rsp.o ncsi-aen.o ncsi-manage.o
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 38fc95a..33738c0 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -323,5 +323,6 @@ u32 ncsi_calculate_checksum(unsigned char *data, int len);
 int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca);
 int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 struct packet_type *pt, struct net_device *orig_dev);
+int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct sk_buff *skb);
 
 #endif /* __NCSI_INTERNAL_H__ */
diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
new file mode 100644
index 0000000..d463468
--- /dev/null
+++ b/net/ncsi/ncsi-aen.c
@@ -0,0 +1,193 @@
+/*
+ * Copyright Gavin Shan, IBM Corporation 2016.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "internal.h"
+#include "ncsi-pkt.h"
+
+static int ncsi_validate_aen_pkt(struct ncsi_aen_pkt_hdr *h,
+const unsigned short payload)
+{
+   u32 checksum;
+   __be32 *pchecksum;
+
+   if (h->common.revision != NCSI_PKT_REVISION)
+   return -EINVAL;
+   if (ntohs(h->common.length) != payload)
+   return -EINVAL;
+
+   /* Validate checksum, which might be zeroes if the
+* sender doesn't support checksum according to NCSI
+* specification.
+*/
+   pchecksum = (__be32 *)((void *)(h + 1) + payload - 4);
+   if (ntohl(*pchecksum) == 0)
+   return 0;
+
+   checksum = ncsi_calculate_checksum((unsigned char *)h,
+  sizeof(*h) + payload - 4);
+   if (*pchecksum != htonl(checksum))
+   return -EINVAL;
+
+   return 0;
+}
+
+static int ncsi_aen_handler_lsc(struct ncsi_dev_priv *ndp,
+   struct ncsi_aen_pkt_hdr *h)
+{
+   struct ncsi_aen_lsc_pkt *lsc;
+   struct ncsi_channel *nc;
+   struct ncsi_channel_mode *ncm;
+   unsigned long old_data;
+   unsigned long flags;
+
+   /* Find the NCSI channel */
+   ncsi_find_package_and_channel(ndp, h->common.channel, NULL, &nc);
+   if (!nc)
+   return -ENODEV;
+
+   /* Update the link status */
+   ncm = &nc->modes[NCSI_MODE_LINK];
+   lsc = (struct ncsi_aen_lsc_pkt *)h;
+   old_data = ncm->data[2];
+   ncm->data[2] = ntohl(lsc->status);
+   ncm->data[4] = ntohl(lsc->oem_status);
+   if (!((old_data ^ ncm->data[2]) & 0x1) ||
+   !list_empty(&nc->link))
+   return 0;
+   if (!(nc->state == NCSI_CHANNEL_INACTIVE && (ncm->data[2] & 0x1)) &&
+   !(nc->state == NCSI_CHANNEL_ACTIVE && !(ncm->data[2] & 0x1)))
+   return 0;
+
+   if (!(ndp->flags & NCSI_DEV_HWA) &&
+   nc->state == NCSI_CHANNEL_ACTIVE)
+   ndp->flags |= NCSI_DEV_RESHUFFLE;
+
+   ncsi_stop_channel_monitor(nc);
+   spin_lock_irqsave(&ndp->lock, flags);
+   list_add_tail_rcu(&nc->link, &ndp->channel_queue);
+   spin_unlock_irqrestore(&ndp->lock, flags);
+
+   return ncsi_process_next_channel(ndp);
+}
+
+static int ncsi_aen_handler_cr(struct ncsi_dev_priv *ndp,
+  struct ncsi_aen_pkt_hdr *h)
+{
+   struct ncsi_channel *nc;
+   unsigned long flags;
+
+   /* Find the NCSI channel */
+   ncsi_find_package_and_channel(ndp, h->common.channel, NULL, &nc);
+   if (!nc)
+   return -ENODEV;
+
+   if (!list_empty(&nc->link) ||
+   nc->state != NCSI_CHANNEL_ACTIVE)
+   return 0;
+
+   ncsi_stop_channel_monitor(nc);
+   spin_lock_irqsave(&ndp->lock, flags);
+   xchg(&nc->state, NCSI_CHANNEL_INACTIVE);
+

[PATCH net-next v3 02/10] net/ncsi: NCSI command packet handler

2016-07-18 Thread Gavin Shan

The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.

This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.
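The checksum that closes each packet can be sketched in plain C as below. It follows the calculation in ncsi_calculate_checksum() from this patch (two's complement of the sum of big-endian 16-bit words) and the rule from the AEN validation code that an all-zero checksum means the sender does not checksum. The function names and the even-length requirement are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's complement of the sum of big-endian 16-bit words. */
static uint32_t ncsi_csum_sketch(const unsigned char *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len % 2 == 0);	/* the loop reads bytes in pairs */
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	return ~sum + 1;
}

/* Check a packet whose last four bytes hold the big-endian checksum.
 * Returns 1 if the checksum matches or is zero (not supported by the
 * sender), 0 otherwise.
 */
static int ncsi_csum_ok_sketch(const unsigned char *pkt, size_t len)
{
	const unsigned char *p;
	uint32_t stored;

	if (len < 4 || len % 2)
		return 0;

	p = pkt + len - 4;
	stored = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		 ((uint32_t)p[2] << 8) | p[3];
	if (stored == 0)
		return 1;

	return stored == ncsi_csum_sketch(pkt, len - 4);
}
```

Because the header builder fills in the checksum last, the data area has to be fully populated before it is called, which matches the comment on ncsi_cmd_build_header().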

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 include/uapi/linux/if_ether.h |   1 +
 net/ncsi/Makefile |   2 +-
 net/ncsi/internal.h   |  19 +++
 net/ncsi/ncsi-cmd.c   | 367 ++
 net/ncsi/ncsi-pkt.h   | 171 
 5 files changed, 559 insertions(+), 1 deletion(-)
 create mode 100644 net/ncsi/ncsi-cmd.c
 create mode 100644 net/ncsi/ncsi-pkt.h

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index cec849a..117d02e 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -87,6 +87,7 @@
 #define ETH_P_8021AH   0x88E7  /* 802.1ah Backbone Service Tag */
 #define ETH_P_MVRP 0x88F5  /* 802.1Q MVRP  */
 #define ETH_P_1588 0x88F7  /* IEEE 1588 Timesync */
+#define ETH_P_NCSI 0x88F8  /* NCSI protocol*/
 #define ETH_P_PRP  0x88FB  /* IEC 62439-3 PRP/HSRv0*/
 #define ETH_P_FCOE 0x8906  /* Fibre Channel over Ethernet  */
 #define ETH_P_TDLS 0x890D  /* TDLS */
diff --git a/net/ncsi/Makefile b/net/ncsi/Makefile
index 07b5625..abc4046 100644
--- a/net/ncsi/Makefile
+++ b/net/ncsi/Makefile
@@ -1,4 +1,4 @@
 #
 # Makefile for NCSI API
 #
-obj-$(CONFIG_NET_NCSI) += ncsi-manage.o
+obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-manage.o
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 89028e1..3d81697 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -220,6 +220,21 @@ struct ncsi_dev_priv {
struct list_head    node;    /* Form NCSI device list  */
 };
 
+struct ncsi_cmd_arg {
+   struct ncsi_dev_priv *ndp;        /* Associated NCSI device         */
+   unsigned char        type;        /* Command in the NCSI packet     */
+   unsigned char        id;          /* Request ID (sequence number)   */
+   unsigned char        package;     /* Destination package ID         */
+   unsigned char        channel;     /* Destination channel ID or 0x1f */
+   unsigned short       payload;     /* Command packet payload length  */
+   bool                 driven;      /* Drive the state machine?       */
+   union {
+   unsigned char  bytes[16]; /* Command packet specific data  */
+   unsigned short words[8];
+   unsigned int   dwords[4];
+   };
+};
+
 extern struct list_head ncsi_dev_list;
 extern spinlock_t ncsi_dev_lock;
 
@@ -253,4 +268,8 @@ struct ncsi_request *ncsi_alloc_request(struct 
ncsi_dev_priv *ndp, bool driven);
 void ncsi_free_request(struct ncsi_request *nr);
 struct ncsi_dev *ncsi_find_dev(struct net_device *dev);
 
+/* Packet handlers */
+u32 ncsi_calculate_checksum(unsigned char *data, int len);
+int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca);
+
 #endif /* __NCSI_INTERNAL_H__ */
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
new file mode 100644
index 0000000..21057a8
--- /dev/null
+++ b/net/ncsi/ncsi-cmd.c
@@ -0,0 +1,367 @@
+/*
+ * Copyright Gavin Shan, IBM Corporation 2016.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "internal.h"
+#include "ncsi-pkt.h"
+
+u32 ncsi_calculate_checksum(unsigned char *data, int len)
+{
+   u32 checksum = 0;
+   int i;
+
+   for (i = 0; i < len; i += 2)
+   checksum += (((u32)data[i] << 8) | data[i + 1]);
+
+   checksum = (~checksum + 1);
+   return checksum;
+}
+
+/* This function should be called after the data area has been
+ * populated completely.
+ */
+static void ncsi_cmd_build_header(struct ncsi_pkt_hdr *h,
+ struct ncsi_cmd_arg *nca)
+{
+   u32 checksum;
+   __be32 *pchecksum;
+
+   h->mc_id        = 0;
+   h->revision     = NCSI_PKT_REVISION;
+   h->reserved     = 0;
+   h->id           = nca->id;
+   h->type         = nca->type;
+   h->channel      = NCSI_TO_CHANNEL(nca->package,
+                                     nca->channel);
+   h->length       = htons(nca->payload);
+   h->reserved1[0] = 0;
+   h->reserved1[1] = 0;
+
+   /* Fill with calculated checksum */
+   checksum =

[PATCH net-next v3 07/10] net/faraday: Read MAC address from chip

2016-07-18 Thread Gavin Shan

The device is assigned a random MAC address, which isn't reasonable.
A valid MAC address might have been provided by (uboot) firmware via
device-tree or in the chip. It's reasonable to use it to maintain consistency.

This uses the MAC address from device-tree or that in the chip if it's
valid. Otherwise, a random MAC address is given as before.
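The register decoding in the patch can be sketched in userspace as follows. The helper names are hypothetical; the byte order and the fallback to the reversed order follow the diff below, and the validity test (non-zero and not multicast) mirrors what is_valid_ether_addr() checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Valid here means: not all zeroes and not a multicast address. */
static int mac_is_valid(const unsigned char mac[MAC_LEN])
{
	static const unsigned char zero[MAC_LEN];

	return !(mac[0] & 0x01) && memcmp(mac, zero, MAC_LEN) != 0;
}

/* Decode a MAC address from the two address registers: m holds two
 * bytes, l holds four. Try the straight byte order first, then the
 * reversed one. Returns 1 if a valid address was found.
 */
static int mac_from_regs(uint32_t m, uint32_t l, unsigned char mac[MAC_LEN])
{
	mac[0] = (m >> 8) & 0xff;
	mac[1] = m & 0xff;
	mac[2] = (l >> 24) & 0xff;
	mac[3] = (l >> 16) & 0xff;
	mac[4] = (l >> 8) & 0xff;
	mac[5] = l & 0xff;
	if (mac_is_valid(mac))
		return 1;

	mac[5] = (m >> 8) & 0xff;
	mac[4] = m & 0xff;
	mac[3] = (l >> 24) & 0xff;
	mac[2] = (l >> 16) & 0xff;
	mac[1] = (l >> 8) & 0xff;
	mac[0] = l & 0xff;
	return mac_is_valid(mac);
}
```

When neither order yields a valid address, the caller falls back to a random one, as the driver did before this change.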

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 69 
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 9b09493..2c3f656 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -141,6 +141,64 @@ static void ftgmac100_set_mac(struct ftgmac100 *priv, 
const unsigned char *mac)
iowrite32(laddr, priv->base + FTGMAC100_OFFSET_MAC_LADR);
 }
 
+static void ftgmac100_setup_mac(struct ftgmac100 *priv)
+{
+   u8 mac[ETH_ALEN];
+   unsigned int m;
+   unsigned int l;
+   void *addr;
+
+   addr = device_get_mac_address(priv->dev, mac, ETH_ALEN);
+   if (addr) {
+   ether_addr_copy(priv->netdev->dev_addr, mac);
+   dev_info(priv->dev, "Read MAC address %pM from device tree\n",
+mac);
+   return;
+   }
+
+   m = ioread32(priv->base + FTGMAC100_OFFSET_MAC_MADR);
+   l = ioread32(priv->base + FTGMAC100_OFFSET_MAC_LADR);
+
+   mac[0] = (m >> 8) & 0xff;
+   mac[1] = m & 0xff;
+   mac[2] = (l >> 24) & 0xff;
+   mac[3] = (l >> 16) & 0xff;
+   mac[4] = (l >> 8) & 0xff;
+   mac[5] = l & 0xff;
+
+   if (!is_valid_ether_addr(mac)) {
+   mac[5] = (m >> 8) & 0xff;
+   mac[4] = m & 0xff;
+   mac[3] = (l >> 24) & 0xff;
+   mac[2] = (l >> 16) & 0xff;
+   mac[1] = (l >>  8) & 0xff;
+   mac[0] = l & 0xff;
+   }
+
+   if (is_valid_ether_addr(mac)) {
+   ether_addr_copy(priv->netdev->dev_addr, mac);
+   dev_info(priv->dev, "Read MAC address %pM from chip\n", mac);
+   } else {
+   eth_hw_addr_random(priv->netdev);
+   dev_info(priv->dev, "Generated random MAC address %pM\n",
+priv->netdev->dev_addr);
+   }
+}
+
+static int ftgmac100_set_mac_addr(struct net_device *dev, void *p)
+{
+   int ret;
+
+   ret = eth_prepare_mac_addr_change(dev, p);
+   if (ret < 0)
+   return ret;
+
+   eth_commit_mac_addr_change(dev, p);
+   ftgmac100_set_mac(netdev_priv(dev), dev->dev_addr);
+
+   return 0;
+}
+
 static void ftgmac100_init_hw(struct ftgmac100 *priv)
 {
/* setup ring buffer base registers */
@@ -1141,7 +1199,7 @@ static const struct net_device_ops ftgmac100_netdev_ops = 
{
.ndo_open   = ftgmac100_open,
.ndo_stop   = ftgmac100_stop,
.ndo_start_xmit = ftgmac100_hard_start_xmit,
-   .ndo_set_mac_address= eth_mac_addr,
+   .ndo_set_mac_address= ftgmac100_set_mac_addr,
.ndo_validate_addr  = eth_validate_addr,
.ndo_do_ioctl   = ftgmac100_do_ioctl,
 };
@@ -1265,6 +1323,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
priv->irq = irq;
 
+   /* MAC address from chip or random one */
+   ftgmac100_setup_mac(priv);
+
err = ftgmac100_setup_mdio(netdev);
if (err)
goto err_setup_mdio;
@@ -1278,12 +1339,6 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
netdev_info(netdev, "irq %d, mapped at %p\n", priv->irq, priv->base);
 
-   if (!is_valid_ether_addr(netdev->dev_addr)) {
-   eth_hw_addr_random(netdev);
-   netdev_info(netdev, "generated random MAC address %pM\n",
-   netdev->dev_addr);
-   }
-
return 0;
 
 err_register_netdev:
-- 
2.1.0

[PATCH net-next v3 09/10] net/faraday: Match driver according to compatible property

2016-07-18 Thread Gavin Shan

This matches the driver with devices compatible with "faraday,ftgmac100"
declared in the device tree. Originally, the device's name from the device
tree was used for the match.

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 1cd4975..d8afa2d 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1438,14 +1438,20 @@ static int __exit ftgmac100_remove(struct platform_device *pdev)
return 0;
 }
 
+static const struct of_device_id ftgmac100_of_match[] = {
+   { .compatible = "faraday,ftgmac100" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, ftgmac100_of_match);
+
 static struct platform_driver ftgmac100_driver = {
-   .probe  = ftgmac100_probe,
-   .remove = __exit_p(ftgmac100_remove),
-   .driver = {
-   .name   = DRV_NAME,
+   .probe  = ftgmac100_probe,
+   .remove = __exit_p(ftgmac100_remove),
+   .driver = {
+   .name   = DRV_NAME,
+   .of_match_table = ftgmac100_of_match,
},
 };
-
 module_platform_driver(ftgmac100_driver);
 
 MODULE_AUTHOR("Po-Yu Chuang ");
-- 
2.1.0

[PATCH net-next v3 00/10] NCSI Support

2016-07-18 Thread Gavin Shan

This series is based on David's linux-net git repo ("master" branch). It
adds NCSI stack support to drivers/net/ethernet/faraday/ftgmac100.c. The
implementation is based on the NCSI spec (version 1.1.0):
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf

As the following figure shows, and as defined in the NCSI spec:

 * NC-SI (aka NCSI) is defined as the interface between a (Base)
   Management Controller (BMC) and one or more Network Interface
   Controllers (NICs) on the host side. The interface is responsible for
   providing external network connectivity for the BMC.
 * Each BMC can connect to multiple packages, up to 8. Each package can have
   multiple channels, up to 32. Packages and channels are identified by
   3 bits and 5 bits in the NCSI packet, respectively.
 * An NCSI packet, encapsulated in an Ethernet frame, has 0x88F8 in the
   protocol field. The destination MAC address should be all 0xFF's, while
   the source MAC address can be arbitrary.
 * NCSI packets are classified as command, response, or AEN (Asynchronous
   Event Notification). Commands are sent from the BMC to the host (NIC) for
   configuration and information retrieval. Responses, corresponding to
   commands, are sent from the host to the BMC for confirmation and requested
   information. Each command should have one and only one response. AENs are
   sent from the host to the BMC for notification (e.g. link down on the
   active channel) so that the BMC can take appropriate action.

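   +-----------------+    +----------------------------------------------+
   |       BMC       |    |                     Host                     |
   |                 |    |                                              |
   |  +-----------+  |    | +-------------------+  +-------------------+ |
   |  |           |  |    | |     Package-A     |  |     Package-B     | |
   |  | ftgmac100 |  |    | +---------+---------+  +---------+---------+ |
   |  |           |  |    | | Channel | Channel |  | Channel | Channel | |
   +--+-----+-----+--+    +-+---------+---------+--+---------+---------+-+
            |                         |                      |
            |                         |                      |
            +-------------------------+----------------------+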
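
As a rough illustration of the addressing described above, the sketch below
splits a one-byte identifier into a 3-bit package ID and a 5-bit channel ID.
The bit layout (package in the upper three bits, channel in the lower five)
and all helper names are assumptions made for this example, not code from
the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; assumed layout is "pppc cccc" in a single byte. */
static inline uint8_t ncsi_id_pack(uint8_t package, uint8_t channel)
{
	assert(package < 8);	/* up to 8 packages per BMC */
	assert(channel < 32);	/* up to 32 channels per package */
	return (uint8_t)((package << 5) | channel);
}

static inline uint8_t ncsi_id_package(uint8_t id)
{
	return id >> 5;		/* upper 3 bits */
}

static inline uint8_t ncsi_id_channel(uint8_t id)
{
	return id & 0x1f;	/* lower 5 bits */
}
```

With this layout, package 1 / channel 3 would be encoded as 0x23.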

The design of the patchset is outlined below:

 * The network driver uses 3 interfaces exported from the NCSI stack:
   ncsi_register_dev() - Register (create) an associated NCSI device.
   ncsi_start_dev() - Bring up the NCSI device.
   ncsi_unregister_dev() - Destroy the registered NCSI device.
 * Several data structures are introduced for different objects:
   struct ncsi_dev - NCSI device seen by the network device driver.
   struct ncsi_dev_priv - NCSI device seen by the NCSI stack.
   struct ncsi_package - NCSI package, which can have multiple channels.
   struct ncsi_channel - NCSI channel.
 * The NCSI stack is driven internally by a workqueue and a state machine.
 * All available NCSI packages and channels are enumerated (probed) on
   the first call to ncsi_start_dev(). The NCSI topology won't change until
   the NCSI device is destroyed.
 * All available channels are brought up when hardware arbitration is
   enabled. Otherwise, only one channel is selected as the active one.
   Each channel has one of 3 states and can be put into a queue to request
   configuration or suspending. Channels queued in the inactive state will
   be configured (bringup), while channels queued in the active state will
   be suspended (teardown). A channel in the invisible state has its
   requested configuration or suspending in progress.
 * Failover, where another inactive channel is selected as the active one,
   can happen when hardware arbitration is disabled. It can be triggered by
   a link monitor timeout or by an AEN.
 * The NCSI stack should be configurable through netlink or another
   mechanism. This is not implemented in this patchset and is still TBD.
 * The first NIC driver that is aware of NCSI is
   drivers/net/ethernet/faraday/ftgmac100.c.
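
The channel-state rule in the list above could be sketched roughly as
follows. This is only an illustration: the enum and function names are made
up for the example, and the numeric values are chosen to mirror the
NCSI_CHANNEL_* defines from patch 04.

```c
#include <assert.h>

/* Illustrative names; values follow NCSI_CHANNEL_{INACTIVE,ACTIVE,INVISIBLE} */
enum channel_state { CH_INACTIVE = 1, CH_ACTIVE = 2, CH_INVISIBLE = 3 };
enum channel_action { ACT_NONE, ACT_CONFIGURE, ACT_SUSPEND };

/* Decide what to do with a channel that was taken off the queue. */
static enum channel_action next_action(enum channel_state s)
{
	switch (s) {
	case CH_INACTIVE:
		return ACT_CONFIGURE;	/* bringup */
	case CH_ACTIVE:
		return ACT_SUSPEND;	/* teardown */
	default:
		/* invisible: a request is already being applied */
		assert(s == CH_INVISIBLE);
		return ACT_NONE;
	}
}
```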

Changelog
=========
v2 -> v3:
 * Include (one line) change in include/uapi/linux/if_ether.h to fix build
   error.
v1 -> v2:
 * Support NCSI spec v1.1.0 (3 more commands and 4 hardware arbitration
   modes added).
 * Enable AEN packets according to the supported list.
 * Introduce NCSI channel states and processing queue in order to support
   the hardware arbitration.
 * Hardware arbitration is supported (tested in an emulated environment).
 * Introduce link monitor with GLS (Get Link Status) command/response as part
   of the error handling defined in NCSI spec.
 * Support IPv6 address discovery when CONFIG_IPV6 is enabled.

Gavin Shan (10):
  net/ncsi: Resource management
  net/ncsi: NCSI command packet handler
  net/ncsi: NCSI response packet handler
  net/ncsi:

[PATCH net-next v3 04/10] net/ncsi: Package and channel management

2016-07-18 Thread Gavin Shan

This manages NCSI packages and channels:

 * The available packages and channels are enumerated on the first
   call to ncsi_start_dev(). The channels' capabilities are probed at
   the same time. The NCSI network topology won't change until the
   NCSI device is destroyed.
 * There is a queue in every NCSI device. Each element in the queue,
   a channel, is waiting for configuration (bringup) or suspending
   (teardown). The channel's state (inactive/active) indicates which
   action (configuration or suspending) will be applied to the
   channel. A third channel state (invisible) means the requested
   action is being applied.
 * Hardware arbitration will be enabled if all available packages
   and channels support it. All available channels try to provide
   service when hardware arbitration is enabled. Otherwise, only one
   channel is selected as the active one at a time.
 * When a channel is in the active state, meaning it's providing
   service, a timer is started to retrieve the channel's link status.
   If the channel's link status fails to be updated within the
   determined period, the channel is reconfigured. This is the error
   handling as defined in the NCSI spec.

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 include/net/ncsi.h |   6 +
 net/ncsi/internal.h|  50 
 net/ncsi/ncsi-manage.c | 763 +
 net/ncsi/ncsi-rsp.c|  15 +
 4 files changed, 834 insertions(+)

diff --git a/include/net/ncsi.h b/include/net/ncsi.h
index 70d14ee..1dbf42f 100644
--- a/include/net/ncsi.h
+++ b/include/net/ncsi.h
@@ -30,6 +30,7 @@ struct ncsi_dev {
 #ifdef CONFIG_NET_NCSI
 struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
   void (*notifier)(struct ncsi_dev *nd));
+int ncsi_start_dev(struct ncsi_dev *nd);
 void ncsi_unregister_dev(struct ncsi_dev *nd);
 #else /* !CONFIG_NET_NCSI */
 static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
@@ -38,6 +39,11 @@ static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
return NULL;
 }
 
+static inline int ncsi_start_dev(struct ncsi_dev *nd)
+{
+   return -ENOTTY;
+}
+
 static inline void ncsi_unregister_dev(struct ncsi_dev *nd)
 {
 }
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index bd000c9..38fc95a 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -178,6 +178,7 @@ struct ncsi_channel {
int state;
 #define NCSI_CHANNEL_INACTIVE  1
 #define NCSI_CHANNEL_ACTIVE    2
+#define NCSI_CHANNEL_INVISIBLE 3
spinlock_t  lock;   /* Protect filters etc */
struct ncsi_package *package;
struct ncsi_channel_version version;
@@ -185,7 +186,11 @@ struct ncsi_channel {
struct ncsi_channel_mode    modes[NCSI_MODE_MAX];
struct ncsi_channel_filter  *filters[NCSI_FILTER_MAX];
struct ncsi_channel_stats   stats;
+   struct timer_list   timer;  /* Link monitor timer  */
+   bool                enabled;/* Timer is enabled    */
+   unsigned int        timeout;/* Times of timeout    */
struct list_head            node;
+   struct list_head    link;
 };
 
 struct ncsi_package {
@@ -209,14 +214,56 @@ struct ncsi_request {
bool enabled; /* Time has been enabled or not*/
 };
 
+enum {
+   ncsi_dev_state_major= 0xff00,
+   ncsi_dev_state_minor= 0x00ff,
+   ncsi_dev_state_probe_deselect   = 0x0201,
+   ncsi_dev_state_probe_package,
+   ncsi_dev_state_probe_channel,
+   ncsi_dev_state_probe_cis,
+   ncsi_dev_state_probe_gvi,
+   ncsi_dev_state_probe_gc,
+   ncsi_dev_state_probe_gls,
+   ncsi_dev_state_probe_dp,
+   ncsi_dev_state_config_sp= 0x0301,
+   ncsi_dev_state_config_cis,
+   ncsi_dev_state_config_sma,
+   ncsi_dev_state_config_ebf,
+#if IS_ENABLED(CONFIG_IPV6)
+   ncsi_dev_state_config_egmf,
+#endif
+   ncsi_dev_state_config_ecnt,
+   ncsi_dev_state_config_ec,
+   ncsi_dev_state_config_ae,
+   ncsi_dev_state_config_gls,
+   ncsi_dev_state_config_done,
+   ncsi_dev_state_suspend_select   = 0x0401,
+   ncsi_dev_state_suspend_dcnt,
+   ncsi_dev_state_suspend_dc,
+   ncsi_dev_state_suspend_deselect,
+   ncsi_dev_state_suspend_done
+};
+
 struct ncsi_dev_priv {
struct ncsi_dev ndev;        /* Associated NCSI device */
unsigned int    flags;       /* NCSI device flags      */
+#define NCSI_DEV_PROBED    1        /* Finalized NCSI topology */
+#define NCSI_DEV_HWA       2        /* Enabled HW arbitration  */
+#define NCSI_DEV_RESHUFFLE 4
spinlock_t      lock;        /* Protect the NCSI device */
+#if

[PATCH net-next v3 06/10] net/faraday: Helper functions to create or destroy MDIO interface

2016-07-18 Thread Gavin Shan

This introduces two helper functions to create or destroy the MDIO
interface. No logical changes are introduced, except that proper MDIO
bus names are given when there is more than one MDIO bus.
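
To illustrate the naming point, here is a small user-space sketch of building
a per-device bus ID from a name and an instance number, in the spirit of the
"%s-%d" format used by ftgmac100_setup_mdio() below. The function name and
buffer size are placeholders chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define EXAMPLE_BUS_ID_SIZE 32	/* stand-in for MII_BUS_ID_SIZE */

/* Build "<name>-<id>" so that each device instance gets its own bus ID.
 * Returns 0 on success, -1 if the result did not fit in the buffer.
 */
static int example_mdio_bus_id(char *buf, size_t len, const char *name, int id)
{
	int n;

	assert(buf && name && len > 0);
	n = snprintf(buf, len, "%s-%d", name, id);
	/* snprintf() returns the untruncated length */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two instances named "ftgmac100" with IDs 0 and 1 then end up with distinct
bus IDs, which a single fixed string cannot provide.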

Signed-off-by: Gavin Shan 
Acked-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 95 
 1 file changed, 60 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index e7cf313..9b09493 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1146,6 +1146,60 @@ static const struct net_device_ops ftgmac100_netdev_ops = {
.ndo_do_ioctl   = ftgmac100_do_ioctl,
 };
 
+static int ftgmac100_setup_mdio(struct net_device *netdev)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+   struct platform_device *pdev = to_platform_device(priv->dev);
+   int i, err = 0;
+
+   /* initialize mdio bus */
+   priv->mii_bus = mdiobus_alloc();
+   if (!priv->mii_bus)
+   return -EIO;
+
+   priv->mii_bus->name = "ftgmac100_mdio";
+   snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "%s-%d",
+pdev->name, pdev->id);
+   priv->mii_bus->priv = priv->netdev;
+   priv->mii_bus->read = ftgmac100_mdiobus_read;
+   priv->mii_bus->write = ftgmac100_mdiobus_write;
+
+   for (i = 0; i < PHY_MAX_ADDR; i++)
+   priv->mii_bus->irq[i] = PHY_POLL;
+
+   err = mdiobus_register(priv->mii_bus);
+   if (err) {
+   dev_err(priv->dev, "Cannot register MDIO bus!\n");
+   goto err_register_mdiobus;
+   }
+
+   err = ftgmac100_mii_probe(priv);
+   if (err) {
+   dev_err(priv->dev, "MII Probe failed!\n");
+   goto err_mii_probe;
+   }
+
+   return 0;
+
+err_mii_probe:
+   mdiobus_unregister(priv->mii_bus);
+err_register_mdiobus:
+   mdiobus_free(priv->mii_bus);
+   return err;
+}
+
+static void ftgmac100_destroy_mdio(struct net_device *netdev)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+
+   if (!netdev->phydev)
+   return;
+
+   phy_disconnect(netdev->phydev);
+   mdiobus_unregister(priv->mii_bus);
+   mdiobus_free(priv->mii_bus);
+}
+
 /**
  * struct platform_driver functions
  */
@@ -1211,31 +1265,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
priv->irq = irq;
 
-   /* initialize mdio bus */
-   priv->mii_bus = mdiobus_alloc();
-   if (!priv->mii_bus) {
-   err = -EIO;
-   goto err_alloc_mdiobus;
-   }
-
-   priv->mii_bus->name = "ftgmac100_mdio";
-   snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "ftgmac100_mii");
-
-   priv->mii_bus->priv = netdev;
-   priv->mii_bus->read = ftgmac100_mdiobus_read;
-   priv->mii_bus->write = ftgmac100_mdiobus_write;
-
-   err = mdiobus_register(priv->mii_bus);
-   if (err) {
-   dev_err(&pdev->dev, "Cannot register MDIO bus!\n");
-   goto err_register_mdiobus;
-   }
-
-   err = ftgmac100_mii_probe(priv);
-   if (err) {
-   dev_err(&pdev->dev, "MII Probe failed!\n");
-   goto err_mii_probe;
-   }
+   err = ftgmac100_setup_mdio(netdev);
+   if (err)
+   goto err_setup_mdio;
 
/* register network device */
err = register_netdev(netdev);
@@ -1255,12 +1287,8 @@ static int ftgmac100_probe(struct platform_device *pdev)
return 0;
 
 err_register_netdev:
-   phy_disconnect(netdev->phydev);
-err_mii_probe:
-   mdiobus_unregister(priv->mii_bus);
-err_register_mdiobus:
-   mdiobus_free(priv->mii_bus);
-err_alloc_mdiobus:
+   ftgmac100_destroy_mdio(netdev);
+err_setup_mdio:
iounmap(priv->base);
 err_ioremap:
release_resource(priv->res);
@@ -1280,10 +1308,7 @@ static int __exit ftgmac100_remove(struct platform_device *pdev)
priv = netdev_priv(netdev);
 
unregister_netdev(netdev);
-
-   phy_disconnect(netdev->phydev);
-   mdiobus_unregister(priv->mii_bus);
-   mdiobus_free(priv->mii_bus);
+   ftgmac100_destroy_mdio(netdev);
 
iounmap(priv->base);
release_resource(priv->res);
-- 
2.1.0

RE: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Liang, Kan



> 
> > Also of course it would be fundamentally less efficient than kernel
> > code doing that, just because of the additional context switches
> > needed.
> 
> Synchronizing or configuring any kind of queues already requires rtnl_mutex.
> I didn't test it but acquiring rtnl mutex in inet_recvmsg is unlikely to fly
> performance wise and

Yes, rtnl will bring some overhead. But the configuration is a one-time thing
for an application or socket. It only happens on receiving the first packet.
Unless the application/socket only transmits a few packets, the overhead
can be ignored. If it only transmits a few packets, why would it care about
performance?

> might even be very dangerous under DoS attacks (like
> I see in 24/30).
> 
Patch 29/30 tries to prevent such a case.

Thanks,
Kan

Re: [PATCH net-next v2 02/10] net/ncsi: NCSI command packet handler

2016-07-18 Thread Gavin Shan

On Mon, Jul 18, 2016 at 10:15:21AM +1000, Gavin Shan wrote:
>On Fri, Jul 15, 2016 at 10:08:23PM +0800, kbuild test robot wrote:
>>[auto build test ERROR on net-next/master]
>>
>>url:
>>https://github.com/0day-ci/linux/commits/Gavin-Shan/NCSI-Support/20160715-190549
>>config: i386-allmodconfig (attached as .config)
>>compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
>>reproduce:
>># save the attached .config to linux build tree
>>make ARCH=i386 
>>
>>All error/warnings (new ones prefixed by >>):
>>
>>   In file included from include/linux/swab.h:4:0,
>>from include/uapi/linux/byteorder/little_endian.h:12,
>>from include/linux/byteorder/little_endian.h:4,
>>from arch/x86/include/uapi/asm/byteorder.h:4,
>>from include/asm-generic/bitops/le.h:5,
>>from arch/x86/include/asm/bitops.h:504,
>>from include/linux/bitops.h:36,
>>from include/linux/kernel.h:10,
>>from include/linux/list.h:8,
>>from include/linux/module.h:9,
>>from net/ncsi/ncsi-cmd.c:10:
>>   net/ncsi/ncsi-cmd.c: In function 'ncsi_alloc_command':
>>>> net/ncsi/ncsi-cmd.c:301:24: error: 'ETH_P_NCSI' undeclared (first use in this function)
>> skb->protocol = htons(ETH_P_NCSI);
>>   ^
>
>The ETH_P_NCSI definition in include/uapi/linux/if_ether.h was missed from 
>this series. I
>will fix it in next respin to address comments received from this series.
>

I will send a follow-up v3 shortly, since David marked this series as
"changes requested" in patchwork.

Thanks,
Gavin

[PATCH net-next v3 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Vivien Didelot

Implement the DSA driver function to configure the bridge ageing time.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8e24c65..9ba2173 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3002,6 +3002,19 @@ static int mv88e6xxx_g1_set_age_time(struct mv88e6xxx_chip *chip,
return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, val);
 }
 
+static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds,
+unsigned int ageing_time)
+{
+   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
+   int err;
+
+   mutex_lock(&chip->reg_lock);
+   err = mv88e6xxx_g1_set_age_time(chip, ageing_time);
+   mutex_unlock(&chip->reg_lock);
+
+   return err;
+}
+
 static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
@@ -3980,6 +3993,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_driver = {
.set_eeprom = mv88e6xxx_set_eeprom,
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
+   .set_ageing_time= mv88e6xxx_set_ageing_time,
.port_bridge_join   = mv88e6xxx_port_bridge_join,
.port_bridge_leave  = mv88e6xxx_port_bridge_leave,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
-- 
2.9.0

[PATCH net-next v3 09/12] net: dsa: mv88e6xxx: add cap for IRL

2016-07-18 Thread Vivien Didelot

Add capability flags to describe the presence of Ingress Rate Limit unit
registers and a helper function to clear them.

In the meantime, fix a few harmless issues:

  - 6185 and 6095 don't have such registers (reserved)
  - the previous code didn't wait for the IRL operation to complete
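
The "write the command, then wait for the busy bit to clear" pattern that the
second point refers to can be sketched in user space as below. The fake
register model, the retry count and all names are assumptions for
illustration only; the real driver goes through its own register accessors.

```c
#include <assert.h>
#include <stdint.h>

#define IRL_CMD_BUSY		(1u << 15)
#define IRL_CMD_INIT_ALL	((0x1u << 12) | IRL_CMD_BUSY)

/* Fake device: the busy bit clears after a few status reads. */
struct fake_chip {
	uint16_t irl_cmd;
	int busy_reads;
};

static void fake_write(struct fake_chip *c, uint16_t val)
{
	c->irl_cmd = val;
	c->busy_reads = 3;
}

static uint16_t fake_read(struct fake_chip *c)
{
	if (c->busy_reads > 0 && --c->busy_reads == 0)
		c->irl_cmd &= (uint16_t)~IRL_CMD_BUSY;
	return c->irl_cmd;
}

/* Issue the command, then poll until the busy bit clears or we give up. */
static int irl_init_port(struct fake_chip *c, int port)
{
	int tries;

	assert(port >= 0 && port < 16);
	fake_write(c, (uint16_t)(IRL_CMD_INIT_ALL | ((unsigned int)port << 8)));

	for (tries = 0; tries < 16; tries++)
		if (!(fake_read(c) & IRL_CMD_BUSY))
			return 0;

	return -1;	/* timed out */
}
```

Returning an error on timeout, rather than moving on to the next port, keeps
a stuck operation from being silently ignored.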

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 50 ++-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 23 ++--
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 3422792..bc26b4f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3150,6 +3150,29 @@ static int mv88e6xxx_g2_clear_trunk(struct mv88e6xxx_chip *chip)
return 0;
 }
 
+static int mv88e6xxx_g2_clear_irl(struct mv88e6xxx_chip *chip)
+{
+   int port, err;
+
+   /* Init all Ingress Rate Limit resources of all ports */
+   for (port = 0; port < chip->info->num_ports; ++port) {
+   /* XXX newer chips (like 88E6390) have different 2-bit ops */
+   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_IRL_CMD,
+ GLOBAL2_IRL_CMD_OP_INIT_ALL |
+ (port << 8));
+   if (err)
+   break;
+
+   /* Wait for the operation to complete */
+   err = _mv88e6xxx_wait(chip, REG_GLOBAL2, GLOBAL2_IRL_CMD,
+ GLOBAL2_IRL_CMD_BUSY);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 /* Indirect write to the Switch MAC/WoL/WoF register */
 static int mv88e6xxx_g2_switch_mac_write(struct mv88e6xxx_chip *chip,
 unsigned int pointer, u8 data)
@@ -3198,7 +3221,6 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
u16 reg;
int err;
-   int i;
 
if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_MGMT_EN_2X)) {
/* Consider the frames with reserved multicast destination
@@ -3243,6 +3265,15 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_IRL)) {
+   /* Disable ingress rate limiting by resetting all per port
+* ingress rate limit resources to their initial state.
+*/
+   err = mv88e6xxx_g2_clear_irl(chip);
+   if (err)
+   return err;
+   }
+
if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_PVT)) {
/* Initialize Cross-chip Port VLAN Table to reset defaults */
err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_PVT_ADDR,
@@ -3258,23 +3289,6 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
return err;
}
 
-   if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6185_family(chip) || mv88e6xxx_6095_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
-   /* Disable ingress rate limiting by resetting all
-* ingress rate limit registers to their initial
-* state.
-*/
-   for (i = 0; i < chip->info->num_ports; i++) {
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_INGRESS_OP,
-  0x9000 | (i << 8));
-   if (err)
-   return err;
-   }
-   }
-
return 0;
 }
 
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 06b11fb..9ea5363 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -298,8 +298,13 @@
 #define GLOBAL2_TRUNK_MAPPING  0x08
 #define GLOBAL2_TRUNK_MAPPING_UPDATE   BIT(15)
 #define GLOBAL2_TRUNK_MAPPING_ID_SHIFT 11
-#define GLOBAL2_INGRESS_OP 0x09
-#define GLOBAL2_INGRESS_DATA   0x0a
+#define GLOBAL2_IRL_CMD                0x09
+#define GLOBAL2_IRL_CMD_BUSY           BIT(15)
+#define GLOBAL2_IRL_CMD_OP_INIT_ALL    ((0x1 << 12) | GLOBAL2_IRL_CMD_BUSY)
+#define GLOBAL2_IRL_CMD_OP_INIT_SEL    ((0x2 << 12) | GLOBAL2_IRL_CMD_BUSY)
+#define GLOBAL2_IRL_CMD_OP_WRITE_SEL   ((0x3 << 12) | GLOBAL2_IRL_CMD_BUSY)
+#define GLOBAL2_IRL_CMD_OP_READ_SEL    ((0x4 << 12) | GLOBAL2_IRL_CMD_BUSY)
+#define GLOBAL2_IRL_DATA               0x0a
 #define GLOBAL2_PVT_ADDR   0x0b
 #define GLOBAL2_PVT_ADDR_BUSY  BIT(15)
 #define GLOBAL2_PVT_ADDR_OP_INIT_ONES  ((0x01 << 12) | GLOBAL2_PVT_ADDR_BUSY)
@@ -393,6 +398,8 @@ enum mv88e6xxx_cap {

[PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-18 Thread Vivien Didelot

Add a new function for DSA drivers to handle the switchdev
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.

The ageing time is passed as milliseconds.

Also, because we can have multiple logical bridges on top of a physical
switch and the ageing time is switch-wide, call the driver function with
the fastest ageing time in use on the chip instead of the requested one.
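
A minimal user-space sketch of the "fastest ageing time" selection, assuming
that a per-port value of zero means no ageing time has been recorded for that
port. The function name and the array-based interface are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Return the smallest non-zero ageing time among 'requested' and the ports. */
static unsigned int fastest_ageing_time(const unsigned int *port_msecs,
					size_t nports, unsigned int requested)
{
	size_t i;

	assert(port_msecs || nports == 0);

	for (i = 0; i < nports; i++)
		if (port_msecs[i] && port_msecs[i] < requested)
			requested = port_msecs[i];

	return requested;
}
```

For example, with ports recorded at 300000 ms and 120000 ms, a request for
200000 ms would result in 120000 ms being programmed.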

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h |  2 ++
 net/dsa/slave.c   | 41 +
 2 files changed, 43 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 52ab18b..2217a3f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -141,6 +141,7 @@ struct dsa_switch_tree {
 struct dsa_port {
struct net_device   *netdev;
struct device_node  *dn;
+   unsigned int        ageing_time;
 };
 
 struct dsa_switch {
@@ -329,6 +330,7 @@ struct dsa_switch_driver {
/*
 * Bridge integration
 */
+   int (*set_ageing_time)(struct dsa_switch *ds, unsigned int msecs);
int (*port_bridge_join)(struct dsa_switch *ds, int port,
struct net_device *bridge);
void(*port_bridge_leave)(struct dsa_switch *ds, int port);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 7236eb2..fc91967 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -333,6 +333,44 @@ static int dsa_slave_vlan_filtering(struct net_device *dev,
return 0;
 }
 
+static int dsa_fastest_ageing_time(struct dsa_switch *ds,
+  unsigned int ageing_time)
+{
+   int i;
+
+   for (i = 0; i < DSA_MAX_PORTS; ++i) {
+   struct dsa_port *dp = &ds->ports[i];
+
+   if (dp && dp->ageing_time && dp->ageing_time < ageing_time)
+   ageing_time = dp->ageing_time;
+   }
+
+   return ageing_time;
+}
+
+static int dsa_slave_ageing_time(struct net_device *dev,
+const struct switchdev_attr *attr,
+struct switchdev_trans *trans)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_switch *ds = p->parent;
+   unsigned long ageing_jiffies = clock_t_to_jiffies(attr->u.ageing_time);
+   unsigned int ageing_time = jiffies_to_msecs(ageing_jiffies);
+
+   /* bridge skips -EOPNOTSUPP, so skip the prepare phase */
+   if (switchdev_trans_ph_prepare(trans))
+   return 0;
+
+   /* Keep the fastest ageing time in case of multiple bridges */
+   ds->ports[p->port].ageing_time = ageing_time;
+   ageing_time = dsa_fastest_ageing_time(ds, ageing_time);
+
+   if (ds->drv->set_ageing_time)
+   return ds->drv->set_ageing_time(ds, ageing_time);
+
+   return 0;
+}
+
 static int dsa_slave_port_attr_set(struct net_device *dev,
   const struct switchdev_attr *attr,
   struct switchdev_trans *trans)
@@ -346,6 +384,9 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
ret = dsa_slave_vlan_filtering(dev, attr, trans);
break;
+   case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
+   ret = dsa_slave_ageing_time(dev, attr, trans);
+   break;
default:
ret = -EOPNOTSUPP;
break;
-- 
2.9.0

[PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time

2016-07-18 Thread Vivien Didelot

All Marvell switch chips (from 88E6060 to 88E6390) have an ATU Control
register containing bits 11:4 to configure an ATU Age Time quotient.

However, the coefficient used to calculate the ATU Age Time varies with
the model. E.g. 88E6060, 88E6352 and 88E6390 use 16, 15 and 3.75 seconds
respectively.

Add an age_time_coeff to the info structure to handle this, and a Global 1
helper to set the default age time of 5 minutes in the setup code.
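
As a worked example of the arithmetic, the user-space sketch below converts a
time in milliseconds to the 8-bit quotient and places it in bits 11:4. The
function names are illustrative, and the coefficients are expressed in
milliseconds as an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert msecs to the 8-bit AgeTime quotient, rounding to the nearest
 * multiple of the coefficient. Returns -1 if the time is out of range.
 */
static int age_time_quotient(unsigned int msecs, unsigned int coeff_ms)
{
	assert(coeff_ms > 0);

	if (msecs < coeff_ms || msecs > 0xffu * coeff_ms)
		return -1;

	return (int)((msecs + coeff_ms / 2) / coeff_ms);
}

/* Place the quotient in bits 11:4, leaving the other bits untouched. */
static uint16_t atu_control_set_age(uint16_t reg, uint8_t quotient)
{
	return (uint16_t)((reg & ~0x0ff0u) | ((unsigned int)quotient << 4));
}
```

With a 15 second coefficient, 5 minutes (300000 ms) gives a quotient of 20,
i.e. 0x140 in bits 11:4, which matches the 0x0140 constant used by the code
this patch replaces.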

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 58 ---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  1 +
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index bc26b4f..8e24c65 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2975,6 +2975,33 @@ static int mv88e6xxx_g1_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
   (addr[4] << 8) | addr[5]);
 }
 
+static int mv88e6xxx_g1_set_age_time(struct mv88e6xxx_chip *chip,
+unsigned int msecs)
+{
+   const unsigned int coeff = chip->info->age_time_coeff;
+   const unsigned int min = 0x01 * coeff;
+   const unsigned int max = 0xff * coeff;
+   u8 age_time;
+   u16 val;
+   int err;
+
+   if (msecs < min || msecs > max)
+   return -ERANGE;
+
+   /* Round to nearest multiple of coeff */
+   age_time = (msecs + coeff / 2) / coeff;
+
+   err = mv88e6xxx_read(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, &val);
+   if (err)
+   return err;
+
+   /* AgeTime is 11:4 bits */
+   val &= ~0xff0;
+   val |= age_time << 4;
+
+   return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, val);
+}
+
 static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
@@ -3012,18 +3039,22 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
+   /* Clear all the VTU and STU entries */
+   err = _mv88e6xxx_vtu_stu_flush(chip);
+   if (err < 0)
+   return err;
+
/* Set the default address aging time to 5 minutes, and
 * enable address learn messages to be sent to all message
 * ports.
 */
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL,
-  0x0140 | GLOBAL_ATU_CONTROL_LEARN2ALL);
+   err = mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL,
+ GLOBAL_ATU_CONTROL_LEARN2ALL);
if (err)
return err;
 
-   /* Clear all the VTU and STU entries */
-   err = _mv88e6xxx_vtu_stu_flush(chip);
-   if (err < 0)
+   err = mv88e6xxx_g1_set_age_time(chip, 300000);
+   if (err)
return err;
 
/* Clear all ATU entries */
@@ -3634,6 +3665,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 4096,
.num_ports = 10,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6097,
},
 
@@ -3644,6 +3676,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 256,
.num_ports = 11,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6095,
},
 
@@ -3654,6 +3687,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 4096,
.num_ports = 3,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3664,6 +3698,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 256,
.num_ports = 8,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6185,
},
 
@@ -3674,6 +3709,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 4096,
.num_ports = 6,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3684,6 +3720,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 4096,
.num_ports = 6,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3694,6 +3731,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_databases = 4096,
.num_ports = 7,
.port_base_addr = 0x10,
+   .age_time_coeff = 15000,

[PATCH net-next v3 04/12] net: dsa: mv88e6xxx: extract trunk mapping

2016-07-18 Thread Vivien Didelot

The Trunk Mask and Trunk Mapping registers are two Global 2 indirect
accesses to trunking configuration.

Add helpers for these tables and simplify the Global 2 setup.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 68 ---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  1 +
 2 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9b2525a..d18e5c8 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3151,6 +3151,49 @@ static int mv88e6xxx_g2_set_device_mapping(struct mv88e6xxx_chip *chip)
return err;
 }
 
+static int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
+bool hask, u16 mask)
+{
+   const u16 port_mask = BIT(chip->info->num_ports) - 1;
+   u16 val = (num << 12) | (mask & port_mask);
+
+   if (hask)
+   val |= GLOBAL2_TRUNK_MASK_HASK;
+
+   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_TRUNK_MASK, val);
+}
+
+static int mv88e6xxx_g2_trunk_mapping_write(struct mv88e6xxx_chip *chip, int id,
+   u16 map)
+{
+   const u16 port_mask = BIT(chip->info->num_ports) - 1;
+   u16 val = (id << 11) | (map & port_mask);
+
+   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_TRUNK_MAPPING, val);
+}
+
+static int mv88e6xxx_g2_clear_trunk(struct mv88e6xxx_chip *chip)
+{
+   const u16 port_mask = BIT(chip->info->num_ports) - 1;
+   int i, err;
+
+   /* Clear all eight possible Trunk Mask vectors */
+   for (i = 0; i < 8; ++i) {
+   err = mv88e6xxx_g2_trunk_mask_write(chip, i, false, port_mask);
+   if (err)
+   return err;
+   }
+
+   /* Clear all sixteen possible Trunk ID routing vectors */
+   for (i = 0; i < 16; ++i) {
+   err = mv88e6xxx_g2_trunk_mapping_write(chip, i, 0);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
int err;
@@ -3180,27 +3223,10 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
-   /* Clear all trunk masks. */
-   for (i = 0; i < 8; i++) {
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_TRUNK_MASK,
-  0x8000 |
-  (i << GLOBAL2_TRUNK_MASK_NUM_SHIFT) |
-  ((1 << chip->info->num_ports) - 1));
-   if (err)
-   return err;
-   }
-
-   /* Clear all trunk mappings. */
-   for (i = 0; i < 16; i++) {
-   err = _mv88e6xxx_reg_write(
-   chip, REG_GLOBAL2,
-   GLOBAL2_TRUNK_MAPPING,
-   GLOBAL2_TRUNK_MAPPING_UPDATE |
-   (i << GLOBAL2_TRUNK_MAPPING_ID_SHIFT));
-   if (err)
-   return err;
-   }
+   /* Clear all trunk masks and mapping. */
+   err = mv88e6xxx_g2_clear_trunk(chip);
+   if (err)
+   return err;
 
if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 390dac5..876d9ea 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -294,6 +294,7 @@
 #define GLOBAL2_TRUNK_MASK 0x07
 #define GLOBAL2_TRUNK_MASK_UPDATE  BIT(15)
 #define GLOBAL2_TRUNK_MASK_NUM_SHIFT   12
+#define GLOBAL2_TRUNK_MASK_HASK        BIT(11)
 #define GLOBAL2_TRUNK_MAPPING  0x08
 #define GLOBAL2_TRUNK_MAPPING_UPDATE   BIT(15)
 #define GLOBAL2_TRUNK_MAPPING_ID_SHIFT 11
-- 
2.9.0

[PATCH net-next v3 06/12] net: dsa: mv88e6xxx: rework Switch MAC setter

2016-07-18 Thread Vivien Didelot

Switches such as 88E6185 have 3 Switch MAC registers in Global 1. Newer
chips such as 88E6352 have freed these registers in favor of an indirect
access in a Switch MAC/WoL/WoF register in Global 2.

Make this difference explicit with G1 and G2 helpers and flags.

Also note that this indirect access uses a single register and does not
require waiting for the operation to complete (like Switch MAC, Trunk
Mapping, etc.), contrary to multi-register indirect accesses with
several operations and a busy bit.
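
As a rough illustration of the single-register pointer/data layout
described above, here is a minimal sketch. It assumes the layout used by
the G2 helper in this patch (pointer in the bits above bit 7, data byte
in the low 8 bits, Update bit at bit 15). The names switch_mac_word()
and switch_mac_words() are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 15 triggers the write, mirroring BIT(15) in the patch. */
#define SKETCH_UPDATE_BIT (1u << 15)

/* Hypothetical helper: build the 16-bit word that writes one MAC byte.
 * 'pointer' is the byte index (0..5), 'data' is the byte value.
 */
static uint16_t switch_mac_word(unsigned int pointer, uint8_t data)
{
    assert(pointer < 6);
    return (uint16_t)(SKETCH_UPDATE_BIT | (pointer << 8) | data);
}

/* Hypothetical helper: expand a 6-byte MAC into the six register words
 * that would be written one after the other.
 */
static void switch_mac_words(const uint8_t addr[6], uint16_t out[6])
{
    unsigned int i;

    for (i = 0; i < 6; i++)
        out[i] = switch_mac_word(i, addr[i]);
}
```

Each word carries both the table index and the payload, which is why a
single register is enough for this kind of access.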

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 120 --
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  19 ++
 2 files changed, 64 insertions(+), 75 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index cf98884..d1b4a7a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -283,68 +283,6 @@ static int mv88e6xxx_reg_write(struct mv88e6xxx_chip 
*chip, int addr,
return ret;
 }
 
-static int mv88e6xxx_set_addr_direct(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-   int err;
-
-   err = mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_01,
- (addr[0] << 8) | addr[1]);
-   if (err)
-   return err;
-
-   err = mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_23,
- (addr[2] << 8) | addr[3]);
-   if (err)
-   return err;
-
-   return mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_45,
-  (addr[4] << 8) | addr[5]);
-}
-
-static int mv88e6xxx_set_addr_indirect(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-   int ret;
-   int i;
-
-   for (i = 0; i < 6; i++) {
-   int j;
-
-   /* Write the MAC address byte. */
-   ret = mv88e6xxx_reg_write(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MAC,
- GLOBAL2_SWITCH_MAC_BUSY |
- (i << 8) | addr[i]);
-   if (ret)
-   return ret;
-
-   /* Wait for the write to complete. */
-   for (j = 0; j < 16; j++) {
-   ret = mv88e6xxx_reg_read(chip, REG_GLOBAL2,
-GLOBAL2_SWITCH_MAC);
-   if (ret < 0)
-   return ret;
-
-   if ((ret & GLOBAL2_SWITCH_MAC_BUSY) == 0)
-   break;
-   }
-   if (j == 16)
-   return -ETIMEDOUT;
-   }
-
-   return 0;
-}
-
-static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_SWITCH_MAC))
-   return mv88e6xxx_set_addr_indirect(ds, addr);
-   else
-   return mv88e6xxx_set_addr_direct(ds, addr);
-}
-
 static int mv88e6xxx_mdio_read_direct(struct mv88e6xxx_chip *chip,
  int addr, int regnum)
 {
@@ -3019,6 +2957,24 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
return 0;
 }
 
+static int mv88e6xxx_g1_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
+{
+   int err;
+
+   err = mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_01,
+ (addr[0] << 8) | addr[1]);
+   if (err)
+   return err;
+
+   err = mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_23,
+ (addr[2] << 8) | addr[3]);
+   if (err)
+   return err;
+
+   return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_45,
+  (addr[4] << 8) | addr[5]);
+}
+
 static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
@@ -3194,6 +3150,28 @@ static int mv88e6xxx_g2_clear_trunk(struct 
mv88e6xxx_chip *chip)
return 0;
 }
 
+/* Indirect write to the Switch MAC/WoL/WoF register */
+static int mv88e6xxx_g2_switch_mac_write(struct mv88e6xxx_chip *chip,
+unsigned int pointer, u8 data)
+{
+   u16 val = (pointer << 8) | data;
+
+   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MAC, val);
+}
+
+static int mv88e6xxx_g2_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
+{
+   int i, err;
+
+   for (i = 0; i < 6; i++) {
+   err = mv88e6xxx_g2_switch_mac_write(chip, i, addr[i]);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
u16 reg;
@@ -3327,6 +3305,24 @@ unlock:
return err;
 }
 
+static int mv88e6xxx_set_addr(struct

[PATCH net-next v3 02/12] net: dsa: mv88e6xxx: split setup of Global 1 and 2

2016-07-18 Thread Vivien Didelot

Separate the setup of Global 1 and Global 2 internal SMI devices and add
a flag to describe the presence of this second register set.

Also rearrange the G1 setup in register order.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 71 ++-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 23 +---
 2 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 3feb842..1e39fa6 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2993,13 +2993,12 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
return 0;
 }
 
-static int mv88e6xxx_setup_global(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
u32 upstream_port = dsa_upstream_port(ds);
u16 reg;
int err;
-   int i;
 
/* Enable the PHY Polling Unit if present, don't discard any packets,
 * and mask all interrupt sources.
@@ -3040,6 +3039,16 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip 
*chip)
if (err)
return err;
 
+   /* Clear all the VTU and STU entries */
+   err = _mv88e6xxx_vtu_stu_flush(chip);
+   if (err < 0)
+   return err;
+
+   /* Clear all ATU entries */
+   err = _mv88e6xxx_atu_flush(chip, 0, true);
+   if (err)
+   return err;
+
/* Configure the IP ToS mapping registers. */
err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_IP_PRI_0, 0x);
if (err)
@@ -3071,6 +3080,26 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip 
*chip)
if (err)
return err;
 
+   /* Clear the statistics counters for all ports */
+   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_STATS_OP,
+  GLOBAL_STATS_OP_FLUSH_ALL);
+   if (err)
+   return err;
+
+   /* Wait for the flush to complete. */
+   err = _mv88e6xxx_stats_wait(chip);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
+{
+   struct dsa_switch *ds = chip->ds;
+   int err;
+   int i;
+
/* Send all frames with destination addresses matching
 * 01:80:c2:00:00:0x to the CPU port.
 */
@@ -3174,28 +3203,7 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip 
*chip)
}
}
 
-   /* Clear the statistics counters for all ports */
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_STATS_OP,
-  GLOBAL_STATS_OP_FLUSH_ALL);
-   if (err)
-   return err;
-
-   /* Wait for the flush to complete. */
-   err = _mv88e6xxx_stats_wait(chip);
-   if (err)
-   return err;
-
-   /* Clear all ATU entries */
-   err = _mv88e6xxx_atu_flush(chip, 0, true);
-   if (err)
-   return err;
-
-   /* Clear all the VTU and STU entries */
-   err = _mv88e6xxx_vtu_stu_flush(chip);
-   if (err < 0)
-   return err;
-
-   return err;
+   return 0;
 }
 
 static int mv88e6xxx_setup(struct dsa_switch *ds)
@@ -3216,12 +3224,21 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
 
-   err = mv88e6xxx_setup_global(chip);
+   /* Setup Switch Port Registers */
+   for (i = 0; i < chip->info->num_ports; i++) {
+   err = mv88e6xxx_setup_port(chip, i);
+   if (err)
+   goto unlock;
+   }
+
+   /* Setup Switch Global 1 Registers */
+   err = mv88e6xxx_g1_setup(chip);
if (err)
goto unlock;
 
-   for (i = 0; i < chip->info->num_ports; i++) {
-   err = mv88e6xxx_setup_port(chip, i);
+   /* Setup Switch Global 2 Registers */
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
+   err = mv88e6xxx_g2_setup(chip);
if (err)
goto unlock;
}
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 2ff62f4..390dac5 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -383,6 +383,11 @@ enum mv88e6xxx_cap {
 */
MV88E6XXX_CAP_EEPROM,
 
+   /* Switch Global 2 Registers.
+* The device contains a second set of global 16-bit registers.
+*/
+   MV88E6XXX_CAP_GLOBAL2,
+
/* Multi-chip Addressing Mode.
 * Some chips require an indirect SMI access when their SMI device
 * address is not zero. See SMI_CMD and SMI_DATA.
@@ -429,6 +434,7 @@ enum mv88e6xxx_cap {
 /* Bitmask of capabilities */
 #define MV88E6XXX_FLAG_EEE

[PATCH net-next v3 01/12] net: dsa: mv88e6xxx: remove basic function flags

2016-07-18 Thread Vivien Didelot

All 88E6xxx Marvell switches (even the old, not yet supported 88E6060)
have at least an ATU, per-port STP states and a VLAN map, to run basic
switch functions such as Spanning Tree and port-based VLANs.

Get rid of the related MV88E6XXX_FLAG_{ATU,PORTSTATE,VLANTABLE} flags,
as they are the default on every chip.

This enables STP on 6185 and removes many inconsistencies on others.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 23 --
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 46 +--
 2 files changed, 6 insertions(+), 63 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5cb06f7..3feb842 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1460,9 +1460,6 @@ static void mv88e6xxx_port_stp_state_set(struct 
dsa_switch *ds, int port,
int stp_state;
int err;
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_PORTSTATE))
-   return;
-
switch (state) {
case BR_STATE_DISABLED:
stp_state = PORT_CONTROL_STATE_DISABLED;
@@ -2398,11 +2395,6 @@ static int mv88e6xxx_port_fdb_prepare(struct dsa_switch 
*ds, int port,
  const struct switchdev_obj_port_fdb *fdb,
  struct switchdev_trans *trans)
 {
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_ATU))
-   return -EOPNOTSUPP;
-
/* We don't need any dynamic resource from the kernel (yet),
 * so skip the prepare phase.
 */
@@ -2418,9 +2410,6 @@ static void mv88e6xxx_port_fdb_add(struct dsa_switch *ds, 
int port,
GLOBAL_ATU_DATA_STATE_UC_STATIC;
struct mv88e6xxx_chip *chip = ds_to_priv(ds);
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_ATU))
-   return;
-
mutex_lock(&chip->reg_lock);
if (_mv88e6xxx_port_fdb_load(chip, port, fdb->addr, fdb->vid, state))
netdev_err(ds->ports[port].netdev,
@@ -2434,9 +2423,6 @@ static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, 
int port,
struct mv88e6xxx_chip *chip = ds_to_priv(ds);
int ret;
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_ATU))
-   return -EOPNOTSUPP;
-
mutex_lock(&chip->reg_lock);
ret = _mv88e6xxx_port_fdb_load(chip, port, fdb->addr, fdb->vid,
   GLOBAL_ATU_DATA_STATE_UNUSED);
@@ -2542,9 +2528,6 @@ static int mv88e6xxx_port_fdb_dump(struct dsa_switch *ds, 
int port,
u16 fid;
int err;
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_ATU))
-   return -EOPNOTSUPP;
-
mutex_lock(&chip->reg_lock);
 
/* Dump port's default Filtering Information Database (VLAN ID 0) */
@@ -2587,9 +2570,6 @@ static int mv88e6xxx_port_bridge_join(struct dsa_switch 
*ds, int port,
struct mv88e6xxx_chip *chip = ds_to_priv(ds);
int i, err = 0;
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_VLANTABLE))
-   return -EOPNOTSUPP;
-
mutex_lock(&chip->reg_lock);
 
/* Assign the bridge and remap each port's VLANTable */
@@ -2614,9 +2594,6 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch 
*ds, int port)
struct net_device *bridge = chip->ports[port].bridge_dev;
int i;
 
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_VLANTABLE))
-   return;
-
mutex_lock(&chip->reg_lock);
 
/* Unassign the bridge and remap each port's VLANTable */
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 83f0662..2ff62f4 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -374,11 +374,6 @@ enum mv88e6xxx_family {
 };
 
 enum mv88e6xxx_cap {
-   /* Address Translation Unit.
-* The ATU is used to lookup and learn MAC addresses. See GLOBAL_ATU_OP.
-*/
-   MV88E6XXX_CAP_ATU,
-
/* Energy Efficient Ethernet.
 */
MV88E6XXX_CAP_EEE,
@@ -394,11 +389,6 @@ enum mv88e6xxx_cap {
 */
MV88E6XXX_CAP_MULTI_CHIP,
 
-   /* Port State Filtering for 802.1D Spanning Tree.
-* See PORT_CONTROL_STATE_* values in the PORT_CONTROL register.
-*/
-   MV88E6XXX_CAP_PORTSTATE,
-
/* PHY Polling Unit.
 * See GLOBAL_CONTROL_PPU_ENABLE and GLOBAL_STATUS_PPU_POLLING.
 */
@@ -430,12 +420,6 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_TEMP,
MV88E6XXX_CAP_TEMP_LIMIT,
 
-   /* In-chip Port Based VLANs.
-* Each port VLANTable register (see PORT_BASE_VLAN) is used to restrict
-* the output (or egress) ports to which it is allowed to send frames.
-*/
-   MV88E6XXX_CAP_VLANTABLE,
-
/* VLAN Table Unit.
 * The VTU is used to program 802.1Q VLANs. See

[PATCH net-next v3 08/12] net: dsa: mv88e6xxx: add cap for Priority Override

2016-07-18 Thread Vivien Didelot

Add flags and helpers to describe the presence of Priority Override
Table (POT) related registers and simplify the setup of Global 2.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 36 +--
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  7 +++
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index f346ad7..3422792 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3172,6 +3172,28 @@ static int mv88e6xxx_g2_set_switch_mac(struct 
mv88e6xxx_chip *chip, u8 *addr)
return err;
 }
 
+static int mv88e6xxx_g2_pot_write(struct mv88e6xxx_chip *chip, int pointer,
+ u8 data)
+{
+   u16 val = (pointer << 8) | (data & 0x7);
+
+   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE, val);
+}
+
+static int mv88e6xxx_g2_clear_pot(struct mv88e6xxx_chip *chip)
+{
+   int i, err;
+
+   /* Clear all sixteen possible Priority Override entries */
+   for (i = 0; i < 16; i++) {
+   err = mv88e6xxx_g2_pot_write(chip, i, 0);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
u16 reg;
@@ -3229,17 +3251,11 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip 
*chip)
return err;
}
 
-   if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_POT)) {
/* Clear the priority override table. */
-   for (i = 0; i < 16; i++) {
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_PRIO_OVERRIDE,
-  0x8000 | (i << 8));
-   if (err)
-   return err;
-   }
+   err = mv88e6xxx_g2_clear_pot(chip);
+   if (err)
+   return err;
}
 
if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 4e03650..06b11fb 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -396,6 +396,7 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_G2_PVT_ADDR,  /* (0x0b) Cross Chip Port VLAN Addr */
MV88E6XXX_CAP_G2_PVT_DATA,  /* (0x0c) Cross Chip Port VLAN Data */
MV88E6XXX_CAP_G2_SWITCH_MAC,/* (0x0d) Switch MAC/WoL/WoF */
+   MV88E6XXX_CAP_G2_POT,   /* (0x0f) Priority Override Table */
 
/* Multi-chip Addressing Mode.
 * Some chips require an indirect SMI access when their SMI device
@@ -442,6 +443,7 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_G2_PVT_ADDR BIT(MV88E6XXX_CAP_G2_PVT_ADDR)
 #define MV88E6XXX_FLAG_G2_PVT_DATA BIT(MV88E6XXX_CAP_G2_PVT_DATA)
 #define MV88E6XXX_FLAG_G2_SWITCH_MAC   BIT(MV88E6XXX_CAP_G2_SWITCH_MAC)
+#define MV88E6XXX_FLAG_G2_POT  BIT(MV88E6XXX_CAP_G2_POT)
 #define MV88E6XXX_FLAG_MULTI_CHIP  BIT(MV88E6XXX_CAP_MULTI_CHIP)
 #define MV88E6XXX_FLAG_PPU BIT(MV88E6XXX_CAP_PPU)
 #define MV88E6XXX_FLAG_PPU_ACTIVE  BIT(MV88E6XXX_CAP_PPU_ACTIVE)
@@ -467,6 +469,7 @@ enum mv88e6xxx_cap {
(MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU |   \
 MV88E6XXX_FLAG_STU |   \
@@ -478,6 +481,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_STU |   \
 MV88E6XXX_FLAG_TEMP |  \
@@ -498,6 +502,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU_ACTIVE |\
 MV88E6XXX_FLAG_SMI_PHY |   \
@@ -511,6 +516,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU_ACTIVE |\
 MV88E6XXX_FLAG_SMI_PHY |   \
@@ -526,6 +532,7 @@ enum mv88e6xxx_cap

[PATCH net-next v3 07/12] net: dsa: mv88e6xxx: add cap for PVT

2016-07-18 Thread Vivien Didelot

Add flags to describe the presence of Cross-chip Port VLAN Table (PVT)
related registers and simplify the setup of Global 2.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 16 
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 28 +++-
 2 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d1b4a7a..f346ad7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3221,17 +3221,17 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip 
*chip)
if (err)
return err;
 
-   if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
-   /* Initialise cross-chip port VLAN table to reset
-* defaults.
-*/
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_PVT_ADDR, 0x9000);
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_PVT)) {
+   /* Initialize Cross-chip Port VLAN Table to reset defaults */
+   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_PVT_ADDR,
+ GLOBAL2_PVT_ADDR_OP_INIT_ONES);
if (err)
return err;
+   }
 
+   if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
+   mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
+   mv88e6xxx_6320_family(chip)) {
/* Clear the priority override table. */
for (i = 0; i < 16; i++) {
err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index da61db4..4e03650 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -301,6 +301,10 @@
 #define GLOBAL2_INGRESS_OP 0x09
 #define GLOBAL2_INGRESS_DATA   0x0a
 #define GLOBAL2_PVT_ADDR   0x0b
+#define GLOBAL2_PVT_ADDR_BUSY  BIT(15)
+#define GLOBAL2_PVT_ADDR_OP_INIT_ONES  ((0x01 << 12) | GLOBAL2_PVT_ADDR_BUSY)
+#define GLOBAL2_PVT_ADDR_OP_WRITE_PVLAN ((0x03 << 12) | GLOBAL2_PVT_ADDR_BUSY)
+#define GLOBAL2_PVT_ADDR_OP_READ   ((0x04 << 12) | GLOBAL2_PVT_ADDR_BUSY)
 #define GLOBAL2_PVT_DATA   0x0c
 #define GLOBAL2_SWITCH_MAC 0x0d
 #define GLOBAL2_ATU_STATS  0x0e
@@ -389,6 +393,8 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_GLOBAL2,
MV88E6XXX_CAP_G2_MGMT_EN_2X,/* (0x02) MGMT Enable Register 2x */
MV88E6XXX_CAP_G2_MGMT_EN_0X,/* (0x03) MGMT Enable Register 0x */
+   MV88E6XXX_CAP_G2_PVT_ADDR,  /* (0x0b) Cross Chip Port VLAN Addr */
+   MV88E6XXX_CAP_G2_PVT_DATA,  /* (0x0c) Cross Chip Port VLAN Data */
MV88E6XXX_CAP_G2_SWITCH_MAC,/* (0x0d) Switch MAC/WoL/WoF */
 
/* Multi-chip Addressing Mode.
@@ -433,6 +439,8 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_GLOBAL2 BIT(MV88E6XXX_CAP_GLOBAL2)
 #define MV88E6XXX_FLAG_G2_MGMT_EN_2X   BIT(MV88E6XXX_CAP_G2_MGMT_EN_2X)
 #define MV88E6XXX_FLAG_G2_MGMT_EN_0X   BIT(MV88E6XXX_CAP_G2_MGMT_EN_0X)
+#define MV88E6XXX_FLAG_G2_PVT_ADDR BIT(MV88E6XXX_CAP_G2_PVT_ADDR)
+#define MV88E6XXX_FLAG_G2_PVT_DATA BIT(MV88E6XXX_CAP_G2_PVT_DATA)
 #define MV88E6XXX_FLAG_G2_SWITCH_MAC   BIT(MV88E6XXX_CAP_G2_SWITCH_MAC)
 #define MV88E6XXX_FLAG_MULTI_CHIP  BIT(MV88E6XXX_CAP_MULTI_CHIP)
 #define MV88E6XXX_FLAG_PPU BIT(MV88E6XXX_CAP_PPU)
@@ -443,6 +451,11 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_TEMP_LIMIT  BIT(MV88E6XXX_CAP_TEMP_LIMIT)
 #define MV88E6XXX_FLAG_VTU BIT(MV88E6XXX_CAP_VTU)
 
+/* Cross-chip Port VLAN Table */
+#define MV88E6XXX_FLAGS_PVT\
+   (MV88E6XXX_FLAG_G2_PVT_ADDR |   \
+MV88E6XXX_FLAG_G2_PVT_DATA)
+
 #define MV88E6XXX_FLAGS_FAMILY_6095\
(MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
@@ -457,7 +470,8 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU |   \
 MV88E6XXX_FLAG_STU |   \
-MV88E6XXX_FLAG_VTU)
+MV88E6XXX_FLAG_VTU |   \
+MV88E6XXX_FLAGS_PVT)
 
 #define MV88E6XXX_FLAGS_FAMILY_6165\
(MV88E6XXX_FLAG_GLOBAL2 |   \
@@ -467,7 +481,8 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_STU |   \
 MV88E6XXX_FLAG_TEMP |  \
-MV88E6XXX_FLAG_VTU)
+MV88E6XXX_FLAG_VTU |   \
+MV88E6XXX_FLAGS_PVT)
 
 #define MV88E6XXX_FLAGS_FAMILY_6185\
(MV88E6XXX_FLAG_GLOBAL2 |   \
@@ -488,7 +503,8 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_SMI_PHY |   \

[PATCH net-next v3 00/12] net: dsa: mv88e6xxx: Global2 cleanup and STP

2016-07-18 Thread Vivien Didelot

The registers of Marvell switches are organized in distinct internal SMI
devices, such as the PHY, Port, Global 1 or Global 2 register sets.

Since not all chips support every register set, and some have slight
differences in them (such as the old 88E6060 or the new 88E6390, likely
to be supported soon), make the setup code clearer now by removing a few
family checks and adding flags to describe the Global 2 register map.

This patchset enables basic STP support and bridging on most chips by
getting rid of a few inconsistencies in chip descriptions (patch 1), and
adds bridge Ageing Time support to DSA and the mv88e6xxx driver.

Changes v2 -> v3:
  - rename mv88e6xxx_update_write to mv88e6xxx_update
  - set fastest ageing time in use in the chip for multiple bridges,
tested with a few printk

Changes v1 -> v2:
  - add a write helper for pointer-data Update registers
  - add ageing time support

Vivien Didelot (12):
  net: dsa: mv88e6xxx: remove basic function flags
  net: dsa: mv88e6xxx: split setup of Global 1 and 2
  net: dsa: mv88e6xxx: extract device mapping
  net: dsa: mv88e6xxx: extract trunk mapping
  net: dsa: mv88e6xxx: add cap for MGMT Enables bits
  net: dsa: mv88e6xxx: rework Switch MAC setter
  net: dsa: mv88e6xxx: add cap for PVT
  net: dsa: mv88e6xxx: add cap for Priority Override
  net: dsa: mv88e6xxx: add cap for IRL
  net: dsa: support switchdev ageing time attr
  net: dsa: mv88e6xxx: add G1 helper for ageing time
  net: dsa: mv88e6xxx: add support for DSA ageing time

 drivers/net/dsa/mv88e6xxx/chip.c  | 516 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 148 ++
 include/net/dsa.h |   2 +
 net/dsa/slave.c   |  41 +++
 4 files changed, 472 insertions(+), 235 deletions(-)

-- 
2.9.0

[PATCH net-next v3 03/12] net: dsa: mv88e6xxx: extract device mapping

2016-07-18 Thread Vivien Didelot

The Device Mapping register is an indirect table access.

Provide helpers to access this table and make the check of the new
DSA_RTABLE_NONE routing table value explicit.
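
For readers unfamiliar with this kind of indirect access, the wait-then-
write pattern can be sketched against a simulated register. This is only
an illustration under stated assumptions: struct fake_reg, fake_read(),
fake_update() and device_mapping_word() are hypothetical stand-ins, not
driver functions, and the fake register simply reports the Update bit as
set for a fixed number of reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_UPDATE_BIT (1u << 15)
#define SKETCH_MAX_POLLS  16

/* Hypothetical simulated register: the Update bit reads back as set
 * for the next 'busy_reads' reads, then reads back as clear.
 */
struct fake_reg {
    uint16_t value;
    int busy_reads;
};

static uint16_t fake_read(struct fake_reg *r)
{
    if (r->busy_reads > 0) {
        r->busy_reads--;
        return (uint16_t)(r->value | SKETCH_UPDATE_BIT);
    }
    return (uint16_t)(r->value & ~SKETCH_UPDATE_BIT);
}

/* Poll until the previous operation is done, then write the new
 * pointer/data word with the Update bit set.
 */
static int fake_update(struct fake_reg *r, uint16_t update)
{
    int i;

    assert(r != 0);

    for (i = 0; i < SKETCH_MAX_POLLS; ++i) {
        if (!(fake_read(r) & SKETCH_UPDATE_BIT))
            break;
    }

    if (i == SKETCH_MAX_POLLS)
        return -ETIMEDOUT;

    r->value = (uint16_t)(SKETCH_UPDATE_BIT | update);
    return 0;
}

/* Target device in the upper bits, routing port in the low nibble. */
static uint16_t device_mapping_word(int target, int port)
{
    return (uint16_t)((target << 8) | (port & 0xf));
}
```

The bounded loop means a stuck register produces -ETIMEDOUT instead of
hanging the caller.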

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 75 
 1 file changed, 60 insertions(+), 15 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1e39fa6..9b2525a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -216,6 +216,32 @@ static int mv88e6xxx_write(struct mv88e6xxx_chip *chip,
return 0;
 }
 
+/* Indirect write to single pointer-data register with an Update bit */
+static int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg,
+   u16 update)
+{
+   u16 val;
+   int i, err;
+
+   /* Wait until the previous operation is completed */
+   for (i = 0; i < 16; ++i) {
+   err = mv88e6xxx_read(chip, addr, reg, &val);
+   if (err)
+   return err;
+
+   if (!(val & BIT(15)))
+   break;
+   }
+
+   if (i == 16)
+   return -ETIMEDOUT;
+
+   /* Set the Update bit to trigger a write operation */
+   val = BIT(15) | update;
+
+   return mv88e6xxx_write(chip, addr, reg, val);
+}
+
 static int _mv88e6xxx_reg_read(struct mv88e6xxx_chip *chip, int addr, int reg)
 {
u16 val;
@@ -3094,9 +3120,39 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip 
*chip)
return 0;
 }
 
+static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
+int target, int port)
+{
+   u16 val = (target << 8) | (port & 0xf);
+
+   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING, val);
+}
+
+static int mv88e6xxx_g2_set_device_mapping(struct mv88e6xxx_chip *chip)
+{
+   int target, port;
+   int err;
+
+   /* Initialize the routing port to the 32 possible target devices */
+   for (target = 0; target < 32; ++target) {
+   port = 0xf;
+
+   if (target < DSA_MAX_SWITCHES) {
+   port = chip->ds->rtable[target];
+   if (port == DSA_RTABLE_NONE)
+   port = 0xf;
+   }
+
+   err = mv88e6xxx_g2_device_mapping_write(chip, target, port);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
-   struct dsa_switch *ds = chip->ds;
int err;
int i;
 
@@ -3120,20 +3176,9 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip 
*chip)
return err;
 
/* Program the DSA routing table. */
-   for (i = 0; i < 32; i++) {
-   int nexthop = 0x1f;
-
-   if (i != ds->index && i < DSA_MAX_SWITCHES)
-   nexthop = ds->rtable[i] & 0x1f;
-
-   err = _mv88e6xxx_reg_write(
-   chip, REG_GLOBAL2,
-   GLOBAL2_DEVICE_MAPPING,
-   GLOBAL2_DEVICE_MAPPING_UPDATE |
-   (i << GLOBAL2_DEVICE_MAPPING_TARGET_SHIFT) | nexthop);
-   if (err)
-   return err;
-   }
+   err = mv88e6xxx_g2_set_device_mapping(chip);
+   if (err)
+   return err;
 
/* Clear all trunk masks. */
for (i = 0; i < 8; i++) {
-- 
2.9.0

[PATCH net-next v3 05/12] net: dsa: mv88e6xxx: add cap for MGMT Enables bits

2016-07-18 Thread Vivien Didelot

Some switches provide a Rsvd2CPU mechanism used to choose which of the
16 reserved multicast destination addresses matching 01:80:c2:00:00:0x
should be considered as MGMT and thus forwarded to the CPU port.

Other switches extend this mechanism to also configure as MGMT the
additional 16 reserved multicast addresses matching 01:80:c2:00:00:2x.

This mechanism is exposed via two registers in Global 2, and an Rsvd2CPU
enable bit in the management register.

Newer chips (such as 88E6390) have replaced these registers with a new
indirect MGMT mechanism in Global 1.

The patch adds two MV88E6XXX_FLAG_G2_MGMT_EN_{0,2}X flags to describe
the presence of these Global 2 registers. If 88E6390 support is added, a
MV88E6XXX_FLAG_G1_MGMT_CTRL flag will be needed to set up Rsvd2CPU.

Note: all switches still support in parallel the ATU Load operation with
an MGMT Entry State to forward such frames in a less convenient way.
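
To make the address matching above concrete, here is a small sketch of
how a frame's destination address could be classified against the two
enable masks. It assumes one enable bit per address, indexed by the low
nibble of the last byte; that mapping is an assumption made for
illustration, and rsvd_index() and is_mgmt() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the low nibble (0..15) if 'addr' matches
 * 01:80:c2:00:00:<block>x, or -1 otherwise.
 * 'block' is 0x00 for the 0x range and 0x20 for the 2x range.
 */
static int rsvd_index(const uint8_t addr[6], uint8_t block)
{
    static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

    assert(block == 0x00 || block == 0x20);

    if (memcmp(addr, prefix, sizeof(prefix)) != 0)
        return -1;
    if ((addr[5] & 0xf0) != block)
        return -1;

    return addr[5] & 0x0f;
}

/* Assumed model: en_0x and en_2x each hold one enable bit per address. */
static bool is_mgmt(const uint8_t addr[6], uint16_t en_0x, uint16_t en_2x)
{
    int idx = rsvd_index(addr, 0x00);

    if (idx >= 0)
        return (en_0x >> idx) & 1;

    idx = rsvd_index(addr, 0x20);
    if (idx >= 0)
        return (en_2x >> idx) & 1;

    return false;
}
```

A chip without the 2x register would be modelled here by passing 0 for
en_2x.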

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 43 ---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 16 +
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d18e5c8..cf98884 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3196,25 +3196,40 @@ static int mv88e6xxx_g2_clear_trunk(struct 
mv88e6xxx_chip *chip)
 
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
+   u16 reg;
int err;
int i;
 
-   /* Send all frames with destination addresses matching
-* 01:80:c2:00:00:0x to the CPU port.
-*/
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2, GLOBAL2_MGMT_EN_0X,
-  0x);
-   if (err)
-   return err;
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_MGMT_EN_2X)) {
+   /* Consider the frames with reserved multicast destination
+* addresses matching 01:80:c2:00:00:2x as MGMT.
+*/
+   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_MGMT_EN_2X,
+ 0x);
+   if (err)
+   return err;
+   }
+
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_MGMT_EN_0X)) {
+   /* Consider the frames with reserved multicast destination
+* addresses matching 01:80:c2:00:00:0x as MGMT.
+*/
+   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_MGMT_EN_0X,
+ 0x);
+   if (err)
+   return err;
+   }
 
/* Ignore removed tag data on doubly tagged packets, disable
 * flow control messages, force flow control priority to the
 * highest, and send all special multicast frames to the CPU
 * port at the highest priority.
 */
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MGMT,
-  0x7 | GLOBAL2_SWITCH_MGMT_RSVD2CPU | 0x70 |
-  GLOBAL2_SWITCH_MGMT_FORCE_FLOW_CTRL_PRI);
+   reg = GLOBAL2_SWITCH_MGMT_FORCE_FLOW_CTRL_PRI | (0x7 << 4);
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_MGMT_EN_0X) ||
+   mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_MGMT_EN_2X))
+   reg |= GLOBAL2_SWITCH_MGMT_RSVD2CPU | 0x7;
+   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MGMT, reg);
if (err)
return err;
 
@@ -3231,14 +3246,6 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip 
*chip)
if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
mv88e6xxx_6320_family(chip)) {
-   /* Send all frames with destination addresses matching
-* 01:80:c2:00:00:2x to the CPU port.
-*/
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_MGMT_EN_2X, 0x);
-   if (err)
-   return err;
-
/* Initialise cross-chip port VLAN table to reset
 * defaults.
 */
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 876d9ea..d13b0b5 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -388,6 +388,8 @@ enum mv88e6xxx_cap {
 * The device contains a second set of global 16-bit registers.
 */
MV88E6XXX_CAP_GLOBAL2,
+   MV88E6XXX_CAP_G2_MGMT_EN_2X,/* (0x02) MGMT Enable Register 2x */
+   MV88E6XXX_CAP_G2_MGMT_EN_0X,/* (0x03) MGMT Enable Register 0x */
 
/* Multi-chip Addressing Mode.
 * Some chips require an indirect SMI access when their SMI device
@@ -436,6

Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Fengguang Wu

Hi Alexei,

On Mon, Jul 18, 2016 at 05:33:07PM -0700, Alexei Starovoitov wrote:

On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:

From: Andrew Morton 
Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
initialization bug

kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
initializer
kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
kernel/trace/bpf_trace.c:312: warning: (near initialization for 
'raw.frag.')

Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
output")
Acked-by: Daniel Borkmann 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 

Acked-by: Alexei Starovoitov 

Fengguang can you add gcc-4.4 to buildbot. Thanks!

Sure. Currently we only test gcc-6. It'd be easy to test more versions
concurrently, like

gcc-4.4
gcc-4.6
gcc-4.8
gcc-4.9
gcc-5
gcc-6

Thanks,
Fengguang

Re: [PATCH iproute2 -master] bpf: also check elf for official e_machine value

2016-07-18 Thread Alexei Starovoitov

On Tue, Jul 19, 2016 at 01:09:52AM +0200, Daniel Borkmann wrote:
> Use the official BPF ELF e_machine value that was assigned recently [1]
> and will be propagated to glibc, libelf et al. LLVM will switch to it
> in the 3.9 release, therefore we need to prepare tc to check for EM_BPF
> as well; older versions still have EM_NONE.
> 
>   [1] 
> https://github.com/llvm-mirror/llvm/commit/36b9c09330bfb5e771914cfe307588f30d5510d2
> 
> Signed-off-by: Daniel Borkmann 

Thank you for fixing it.
Acked-by: Alexei Starovoitov

Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Alexei Starovoitov

On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:
> From: Andrew Morton 
> Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
> initialization bug
> 
> kernel/trace/bpf_trace.c: In function 'bpf_event_output':
> kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
> initializer
> kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
> kernel/trace/bpf_trace.c:312: warning: (near initialization for 
> 'raw.frag.')
> 
> Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
> output")
> Acked-by: Daniel Borkmann 
> Cc: Alexei Starovoitov 
> Cc: David S. Miller 
> Signed-off-by: Andrew Morton 

Acked-by: Alexei Starovoitov 

Fengguang can you add gcc-4.4 to buildbot. Thanks!

[PATCH iproute2 -master] bpf: also check elf for official e_machine value

2016-07-18 Thread Daniel Borkmann

Use the official BPF ELF e_machine value that was assigned recently [1]
and will be propagated to glibc, libelf et al. LLVM will switch to it
in 3.9 release, therefore we need to prepare tc to check for EM_BPF as
well, older versions still have the EM_NONE.

  [1] 
https://github.com/llvm-mirror/llvm/commit/36b9c09330bfb5e771914cfe307588f30d5510d2

Signed-off-by: Daniel Borkmann 
---
 tc/tc_bpf.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index 86c6069..7eb1cd7 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -54,6 +54,10 @@
 #define AF_ALG 38
 #endif
 
+#ifndef EM_BPF
+#define EM_BPF 247
+#endif
+
 #ifdef HAVE_ELF
 static int bpf_obj_open(const char *path, enum bpf_prog_type type,
const char *sec, bool verbose);
@@ -1690,7 +1694,8 @@ static void bpf_hash_destroy(struct bpf_elf_ctx *ctx)
 static int bpf_elf_check_ehdr(const struct bpf_elf_ctx *ctx)
 {
if (ctx->elf_hdr.e_type != ET_REL ||
-   ctx->elf_hdr.e_machine != 0 ||
+   (ctx->elf_hdr.e_machine != EM_NONE &&
+ctx->elf_hdr.e_machine != EM_BPF) ||
ctx->elf_hdr.e_version != EV_CURRENT) {
fprintf(stderr, "ELF format error, ELF file not for eBPF?\n");
return -EINVAL;
-- 
1.9.3
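
For reference, a minimal standalone sketch of the header check this patch
introduces. It assumes a plain Elf64_Ehdr from <elf.h> instead of tc's
struct bpf_elf_ctx, and check_bpf_ehdr() is a hypothetical name used only
for illustration; the accepted values (ET_REL, EM_NONE or EM_BPF,
EV_CURRENT) are the ones from the diff above.

```c
#include <elf.h>
#include <errno.h>
#include <stdio.h>

/* Official BPF machine value; older <elf.h> headers may not define it. */
#ifndef EM_BPF
#define EM_BPF 247
#endif

/*
 * Hypothetical helper mirroring the patched condition: accept relocatable
 * objects whose e_machine is either the legacy EM_NONE or the official
 * EM_BPF, and reject everything else.
 */
static int check_bpf_ehdr(const Elf64_Ehdr *hdr)
{
	if (hdr->e_type != ET_REL ||
	    (hdr->e_machine != EM_NONE &&
	     hdr->e_machine != EM_BPF) ||
	    hdr->e_version != EV_CURRENT) {
		fprintf(stderr, "ELF format error, ELF file not for eBPF?\n");
		return -EINVAL;
	}

	return 0;
}
```

With this shape, objects built by an older LLVM (EM_NONE) and by a newer
one (EM_BPF) should both pass, while an object for any other machine is
rejected.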

Re: [PATCH resend] virtio-net: Remove more stack DMA

2016-07-18 Thread Michael S. Tsirkin

On Mon, Jul 18, 2016 at 03:34:49PM -0700, Andy Lutomirski wrote:
> VLAN and MQ control was doing DMA from the stack.  Fix it.
> 
> Cc: Michael S. Tsirkin 
> Cc: "netdev@vger.kernel.org" 
> Signed-off-by: Andy Lutomirski 

Acked-by: Michael S. Tsirkin 


> ---
> 
> I tested VLAN addition and removal with CONFIG_VMAP_STACK=y,
> CONFIG_DEBUG_SG=y and it got rid of the warnings I saw.  I haven't
> tested the MQ part because I don't know how to enable it in the first
> place (I'm guessing it needs me to enable some QEMU feature I don't
> know about.)
> 
> DaveM, contrary to what I thought last time I sent this, I think this
> should go through net-next as long as it makes it in time for 4.8.
> 
>  drivers/net/virtio_net.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e0638e556fe7..5044ca37d725 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -144,8 +144,10 @@ struct virtnet_info {
>   /* Control VQ buffers: protected by the rtnl lock */
>   struct virtio_net_ctrl_hdr ctrl_hdr;
>   virtio_net_ctrl_ack ctrl_status;
> + struct virtio_net_ctrl_mq ctrl_mq;
>   u8 ctrl_promisc;
>   u8 ctrl_allmulti;
> + u16 ctrl_vid;
>  
>   /* Ethtool settings */
>   u8 duplex;
> @@ -1116,14 +1118,13 @@ static void virtnet_ack_link_announce(struct 
> virtnet_info *vi)
>  static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>  {
>   struct scatterlist sg;
> - struct virtio_net_ctrl_mq s;
>   struct net_device *dev = vi->dev;
>  
>   if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
>   return 0;
>  
> - s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
> - sg_init_one(&sg, &s, sizeof(s));
> + vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
> + sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq));
>  
>   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
> VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
> @@ -1230,7 +1231,8 @@ static int virtnet_vlan_rx_add_vid(struct net_device 
> *dev,
>   struct virtnet_info *vi = netdev_priv(dev);
>   struct scatterlist sg;
>  
> - sg_init_one(&sg, &vid, sizeof(vid));
> + vi->ctrl_vid = vid;
> + sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
>  
>   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
> VIRTIO_NET_CTRL_VLAN_ADD, &sg))
> @@ -1244,7 +1246,8 @@ static int virtnet_vlan_rx_kill_vid(struct net_device 
> *dev,
>   struct virtnet_info *vi = netdev_priv(dev);
>   struct scatterlist sg;
>  
> - sg_init_one(&sg, &vid, sizeof(vid));
> + vi->ctrl_vid = vid;
> + sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
>  
>   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
> VIRTIO_NET_CTRL_VLAN_DEL, &sg))
> -- 
> 2.7.4

[patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread akpm

From: Andrew Morton 
Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
initialization bug

kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
initializer
kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
kernel/trace/bpf_trace.c:312: warning: (near initialization for 
'raw.frag.')

Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
output")
Acked-by: Daniel Borkmann 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 
---

 kernel/trace/bpf_trace.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff -puN 
kernel/trace/bpf_trace.c~kernel-trace-bpf_tracec-work-around-gcc-444-anon-union-initialization-bug
 kernel/trace/bpf_trace.c
--- 
a/kernel/trace/bpf_trace.c~kernel-trace-bpf_tracec-work-around-gcc-444-anon-union-initialization-bug
+++ a/kernel/trace/bpf_trace.c
@@ -309,7 +309,9 @@ u64 bpf_event_output(struct bpf_map *map
};
struct perf_raw_record raw = {
.frag = {
-   .next   = ctx_size ?  : NULL,
+   {
+   .next   = ctx_size ?  : NULL,
+   },
.size   = meta_size,
.data   = meta,
},
_
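
A minimal standalone sketch of the initializer shape used in this patch.
struct frag and struct record are hypothetical stand-ins (not the kernel's
definitions); the only point is that the designator for a member of an
anonymous union is wrapped in its own pair of braces, which is the form the
patch switches to for gcc-4.4.4.

```c
#include <stddef.h>

/* Hypothetical stand-in: a record whose first member is an anonymous union. */
struct frag {
	union {
		struct frag *next;
		unsigned long pad;
	};
	const void *data;
	unsigned int size;
};

struct record {
	struct frag frag;
};

/* Build a record using the braced form from the patch. */
static struct record make_record(struct frag *next, const void *data,
				 unsigned int size)
{
	struct record rec = {
		.frag = {
			{
				/* extra braces enclose the anonymous union */
				.next = next,
			},
			.size = size,
			.data = data,
		},
	};

	return rec;
}
```

Newer compilers accept both the braced and the unbraced form, so the extra
braces should be harmless there.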

[PATCH resend] virtio-net: Remove more stack DMA

2016-07-18 Thread Andy Lutomirski

VLAN and MQ control was doing DMA from the stack.  Fix it.

Cc: Michael S. Tsirkin 
Cc: "netdev@vger.kernel.org" 
Signed-off-by: Andy Lutomirski 
---

I tested VLAN addition and removal with CONFIG_VMAP_STACK=y,
CONFIG_DEBUG_SG=y and it got rid of the warnings I saw.  I haven't
tested the MQ part because I don't know how to enable it in the first
place (I'm guessing it needs me to enable some QEMU feature I don't
know about.)

DaveM, contrary to what I thought last time I sent this, I think this
should go through net-next as long as it makes it in time for 4.8.

 drivers/net/virtio_net.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e0638e556fe7..5044ca37d725 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -144,8 +144,10 @@ struct virtnet_info {
/* Control VQ buffers: protected by the rtnl lock */
struct virtio_net_ctrl_hdr ctrl_hdr;
virtio_net_ctrl_ack ctrl_status;
+   struct virtio_net_ctrl_mq ctrl_mq;
u8 ctrl_promisc;
u8 ctrl_allmulti;
+   u16 ctrl_vid;
 
/* Ethtool settings */
u8 duplex;
@@ -1116,14 +1118,13 @@ static void virtnet_ack_link_announce(struct 
virtnet_info *vi)
 static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
struct scatterlist sg;
-   struct virtio_net_ctrl_mq s;
struct net_device *dev = vi->dev;
 
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
 
-   s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
-   sg_init_one(&sg, &s, sizeof(s));
+   vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
+   sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
@@ -1230,7 +1231,8 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
 
-   sg_init_one(&sg, &vid, sizeof(vid));
+   vi->ctrl_vid = vid;
+   sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
  VIRTIO_NET_CTRL_VLAN_ADD, &sg))
@@ -1244,7 +1246,8 @@ static int virtnet_vlan_rx_kill_vid(struct net_device 
*dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
 
-   sg_init_one(&sg, &vid, sizeof(vid));
+   vi->ctrl_vid = vid;
+   sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
  VIRTIO_NET_CTRL_VLAN_DEL, &sg))
-- 
2.7.4
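
A minimal user-space sketch of the pattern the patch applies: copy the
value into a field of the long-lived per-device structure and describe that
field to the device, instead of describing a variable on the caller's
stack. struct sg_entry, struct dev_info, sg_set() and prepare_vlan_cmd()
are hypothetical stand-ins for illustration only, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a one-entry scatterlist. */
struct sg_entry {
	const void *addr;
	size_t len;
};

/* Hypothetical per-device state; lives as long as the device does. */
struct dev_info {
	uint16_t ctrl_vid;
};

static void sg_set(struct sg_entry *sg, const void *buf, size_t len)
{
	sg->addr = buf;
	sg->len = len;
}

/*
 * Store the argument in the device structure first, then point the
 * scatterlist entry at that copy rather than at the stack argument.
 */
static void prepare_vlan_cmd(struct dev_info *vi, uint16_t vid,
			     struct sg_entry *sg)
{
	vi->ctrl_vid = vid;
	sg_set(sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
}
```

The buffer then stays valid after the function returns, and it no longer
depends on where the stack happens to be mapped.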

Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-07-18 Thread pravin shelar

On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
 wrote:
> [CC Jiri Benc for portion regarding GRE]
>
> Hi Pravin,
>
> On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
>> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
>>  wrote:
>> > Hi Pravin,
>> >
>> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> >>  wrote:
>> >
>> > ...
>>
>> >
>> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> >> > index 0ea128eeeab2..86f2cfb19de3 100644
>> >> > --- a/net/openvswitch/flow.c
>> >> > +++ b/net/openvswitch/flow.c
>> >> ...
>> >>
>> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct 
>> >> > ip_tunnel_info *tun_info,
>> >> > key->phy.skb_mark = skb->mark;
>> >> > ovs_ct_fill_key(skb, key);
>> >> > key->ovs_flow_hash = 0;
>> >> > +   key->phy.is_layer3 = skb->mac_len == 0;
>> >>
>> >> I do not think mac_len can be used. mac_header needs to be checked.
>> >> ...
>> >
>> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
>> > slipped into the following patch, sorry about that.
>> >
>> > With that change in place I believe that this patch is internally
>> > consistent because mac_header and mac_len are set correctly by the
>> > call to key_extract() which is called by ovs_flow_key_extract() just
>> > after where the excerpt above ends.
>> >
>> > That said, I do think that it is possible to rely on skb_mac_header_was_set
>> > throughout the datapath, including action processing etc... I have provided
>> > an incremental patch - which I created on top of this entire series - at
>> > the end of this email. If you prefer that approach I am happy to take it,
>> > though I do feel that using mac_len leads to slightly cleaner code. Let me
>> > know what you think.
>> >
>>
>>
>> I am not sure if you can use only mac_len to detect L3 packet. This
>> does not work with MPLS packets, mac_len is used to account MPLS
>> headers pushed on skb. Therefore in case of a MPLS header on L3
>> packet, mac_len would be non zero and we have to look at either
>> mac_header or some other metadata like is_layer3 flag from key to
>> check for L3 packet.
>
> At least within OvS mac_len does not include the length of the MPLS label
> stack. Rather, the MPLS label stack length is the difference between the
> end of (mac_header + mac_len) and network_header.
>
> So I think that the scheme does work as mac_len is 0 if there is no L2
> header regardless of if an MPLS label stack is present or not.
>

I was thinking of the overall networking stack rather than just the OVS
datapath. I think we should have a consistent method of detecting L3
packets. As commented in the previous mail, it could be achieved using
skb->protocol and device type.
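
To make the offsets under discussion concrete, here is a small sketch of
the arithmetic Simon describes for the OvS datapath in the quoted text:
the MPLS label stack is whatever lies between the end of the L2 header
(mac_header + mac_len) and network_header, and a packet is treated as L3
when mac_len is 0. struct pkt_offsets and both helpers are hypothetical
and only model that description; they are not the sk_buff API.

```c
/* Hypothetical model of the three offsets discussed in this thread. */
struct pkt_offsets {
	unsigned int mac_header;	/* start of the L2 header */
	unsigned int mac_len;		/* length of the L2 header, 0 if none */
	unsigned int network_header;	/* start of the L3 header */
};

/* Bytes of MPLS label stack between the L2 header and the L3 header. */
static unsigned int mpls_stack_len(const struct pkt_offsets *p)
{
	return p->network_header - (p->mac_header + p->mac_len);
}

/* Under the scheme described for OvS, no L2 header means a layer 3 packet. */
static int is_layer3(const struct pkt_offsets *p)
{
	return p->mac_len == 0;
}
```

This only illustrates the OvS-internal convention; it says nothing about
how mac_len is used elsewhere in the networking stack, which is the concern
raised in this reply.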

RE: [E1000-devel] [PATCH 1/1] ixgbevf: avoid checking hang when performing hardware reset

2016-07-18 Thread Skidmore, Donald C

> -Original Message-
> From: zyjzyj2...@gmail.com [mailto:zyjzyj2...@gmail.com]
> Sent: Monday, July 18, 2016 6:45 AM
> To: e1000-de...@lists.sourceforge.net; netdev@vger.kernel.org; Kirsher,
> Jeffrey T 
> Subject: [E1000-devel] [PATCH 1/1] ixgbevf: avoid checking hang when
> performing hardware reset
> 
> From: Zhu Yanjun 
> 
> When performing hardware reset, it is not necessary to check hang.
> Or else, the call trace will appear.
> 
> Signed-off-by: Zhu Yanjun 
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index acc2401..d563d24 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -2792,9 +2792,14 @@ static void ixgbevf_reset_subtask(struct ixgbevf_adapter *adapter)
>  static void ixgbevf_check_hang_subtask(struct ixgbevf_adapter *adapter)
>  {
>   struct ixgbe_hw *hw = &adapter->hw;
> + struct ixgbe_mbx_info *mbx = &hw->mbx;
>   u32 eics = 0;
>   int i;
> 
> + /* When performing hardware reset, unnecessary to check hang. */
> + if (mbx->ops.check_for_rst(hw))
> + return;
> +
>   /* If we're down or resetting, just bail */
>   if (test_bit(__IXGBEVF_DOWN, &adapter->state) ||
>   test_bit(__IXGBEVF_RESETTING, &adapter->state))
> --
> 2.7.4
> 
> 

My concern with this patch is that check_for_rst does a read-to-clear on
the RSTD and RSTI bits.  So reading it in the service task may mean we miss
this transition in the mailbox protocol.  Likewise, it is possible they may have
already been cleared and you won't even catch the state you're looking for.

-Don Skidmore
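
A small simulation of the read-to-clear hazard described above. struct
fake_mbx, the bit positions and both helpers are hypothetical and do not
reflect the ixgbevf register layout or API; the sketch only shows that once
one reader has consumed the bits, a second reader no longer sees them.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FAKE_RSTI (1u << 0)
#define FAKE_RSTD (1u << 1)

/* Hypothetical mailbox status register with read-to-clear semantics. */
struct fake_mbx {
	uint32_t status;
};

/* Reading returns the latched bits and clears them as a side effect. */
static uint32_t read_to_clear(struct fake_mbx *m)
{
	uint32_t v = m->status;

	m->status = 0;
	return v;
}

/* Returns 1 if a reset indication was latched; consumes it either way. */
static int reset_seen(struct fake_mbx *m)
{
	return (read_to_clear(m) & (FAKE_RSTI | FAKE_RSTD)) != 0;
}
```

If the service task calls reset_seen() first, the mailbox code that polls
afterwards gets 0 and misses the transition; the reverse ordering hides the
reset from the service task instead.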

Re: [PATCH 2/2] net: ethernet: marvell: pxa168_eth: use phy_ethtool_{get|set}_link_ksettings

2016-07-18 Thread Florian Fainelli

On 07/17/2016 02:30 PM, Philippe Reynes wrote:
> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 
> ---
> - cmd.phy_address = pep->phy_addr;
> - cmd.speed = pep->phy_speed;
> - cmd.duplex = pep->phy_duplex;
> - cmd.advertising = PHY_BASIC_FEATURES;
> - cmd.autoneg = AUTONEG_ENABLE;
> + cmd.base.phy_address = pep->phy_addr;
> + cmd.base.speed = pep->phy_speed;
> + cmd.base.duplex = pep->phy_duplex;
> + ethtool_convert_legacy_u32_to_link_mode(cmd.link_modes.advertising,
> + PHY_BASIC_FEATURES);
> + cmd.base.autoneg = AUTONEG_ENABLE;
>  
> - if (cmd.speed != 0)
> - cmd.autoneg = AUTONEG_DISABLE;
> + if (cmd.base.speed != 0)
> + cmd.base.autoneg = AUTONEG_DISABLE;
>  
> - return pxa168_set_settings(dev, );
> + return phy_ethtool_set_link_ksettings(dev, );

This duplicates quite a bit of code that the core PHYLIB already deals
with, you should plan for a subsequent cleanup patch which removes all
of this.
-- 
Florian

Re: [PATCH 1/2] net: ethernet: marvell: pxa168_eth: use phydev from struct net_device

2016-07-18 Thread Philippe Reynes


Hi,

On 18/07/16 12:14, Sergei Shtylyov wrote:
> Hello.
>
> On 7/18/2016 12:30 AM, Philippe Reynes wrote:
>
>> The private structure contains a pointer to phydev, but the structure
>> net_device already contains such a pointer. So we can remove the pointer
>> phydev in the private structure, and update the driver to use the
>> one contained in struct net_device.
>>
>> Signed-off-by: Philippe Reynes
>> ---
>>   drivers/net/ethernet/marvell/pxa168_eth.c |   36 +
>>   1 files changed, 16 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c
>> b/drivers/net/ethernet/marvell/pxa168_eth.c
>> index 54d5154..d466326 100644
>> --- a/drivers/net/ethernet/marvell/pxa168_eth.c
>> +++ b/drivers/net/ethernet/marvell/pxa168_eth.c
>
> [...]
>
>> @@ -973,16 +972,17 @@ static int pxa168_init_phy(struct net_device *dev)
>>   {
>> struct pxa168_eth_private *pep = netdev_priv(dev);
>> struct ethtool_cmd cmd;
>> +   struct phy_device *phy = NULL;
>
> Initializer not really needed.

You're right, the first line using phy is an assignment, so the initializer
is not really useful.

>> int err;
>>
>> -   if (pep->phy)
>> +   if (dev->phydev)
>> return 0;
>>
>> -   pep->phy = mdiobus_scan(pep->smi_bus, pep->phy_addr);
>> -   if (IS_ERR(pep->phy))
>> -   return PTR_ERR(pep->phy);
>> +   phy = mdiobus_scan(pep->smi_bus, pep->phy_addr);
>> +   if (IS_ERR(phy))
>> +   return PTR_ERR(phy);
>>
>> -   err = phy_connect_direct(dev, pep->phy, pxa168_eth_adjust_link,
>> +   err = phy_connect_direct(dev, phy, pxa168_eth_adjust_link,
>>  pep->phy_intf);
>> if (err)
>> return err;
>
> Hm, where do you assign 'dev->phydev'?

dev->phydev is assigned in phy_connect_direct. In fact, phy_connect_direct calls
phy_attach_direct, and this last function assigns phydev to dev->phydev.

> [...]
>
> MBR, Sergei



Regards,
Philippe

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Hannes Frederic Sowa

Hello,

On Mon, Jul 18, 2016, at 21:43, Andi Kleen wrote:
> > I wonder if this can be attacked from a different angle. What would be
> > missing to add support for this in user space? The first possibility
> > that came to my mind is to just multiplex those hints in the kernel.
> 
> "just" is the handwaving part here -- you're proposing a micro kernel
> approach where part of the multiplexing job that the kernel is doing
> is farmed out to a message passing user space component.
> 
> I suspect this would be far more complicated to get right and
> perform well than a straight forward monolithic kernel subsystem --
> which is traditionally how Linux has approached things.

At the same time having any kind of policy in the kernel was also always
avoided.

> The daemon would always need to work with out of date state
> compared to the latest, because it cannot do any locking with the
> kernel state.  So you end up with a complex distributed system with
> multiple
> agents "fighting" with each other, and the tuning agent
> never being able to keep up with the actual work.

But you don't want to have the tuning agents in the fast path? If you
really try to synchronously update all queue mappings/irqs during socket
creation or connect time this would add rtnl lock to basically socket
creation, as drivers require that. This would slow down basic socket
operations a lot and synchronize them with the management interface.

Even dst_entries are not synchronously updated anymore nowadays as that
would require too much locking overhead in the kernel.

> Also of course it would be fundamentally less efficient than
> kernel code doing that, just because of the additional context
> switches needed.

Synchronizing or configuring any kind of queues already requires
rtnl_mutex. I didn't test it but acquiring rtnl mutex in inet_recvmsg is
unlikely to fly performance wise and might even be very dangerous under
DoS attacks (like I see in 24/30).

Bye,
Hannes

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Andrew Lunn

> OK. I think caching per-port (and thus per-bridge) ageing time would do
> the trick and keep DSA drivers simple. What about the following patch?

Hi Vivien

This looks good. I would suggest you do some testing with a printk,
and force some topology changes, just to make sure it works how we
think it works.

Andrew

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Daniel Borkmann


On 07/18/2016 08:30 PM, Liang, Kan wrote:
>> On 07/18/2016 08:55 AM, kan.li...@intel.com wrote:
[...]
>> On a higher level picture, why for example, a new cgroup in combination
>> with tc shouldn't be the ones resolving these policies on resource usage?
>
> The NET policy doesn't support cgroup yet, but it's on my todo list.
> The granularity for the device resource is per queue. The packet will be
> redirected to the specific queue.
> I'm not sure if cgroup with tc can do that.


Did you have a look at sch_mqprio, which can be used along with either
netprio cgroup or netcls cgroup plus tc on clsact's egress side to set
the priority for mqprio mappings from application side? At least ixgbe,
i40e, fm10k have offload support for it and a number of other nics. You
could also use cls_bpf for making the prio assignment if you need to
involve also other meta data from the skb (like mark or prio derived from
sockets, etc). Maybe it doesn't cover all of what you need, but could be
a start to extend upon?

Thanks,
Daniel

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> On Mon, Jul 18, 2016 at 03:59:38PM -0400, Vivien Didelot wrote:
>> Andrew Lunn  writes:
>> 
>> >> Nope, the bridge ageing time is not per-port, even though switchdev ops
>> >> are per-port by design. This is a switch-wide attribute.
>> >
>> > So you are saying the core is doing all the reference counting, etc,
>> > when swapping between fast and slow ageing?
>> 
>> I don't see how checking for the fastest ageing time would fix support
>> for multiple bridges...
>
> The bridge should switch to fast ageing after a topology change to
> flush out entries which are now wrong. Using the short age time for
> too long results in a bit more inefficiency, in that entries time out
> faster than they need to. But if we go back to slow ageing too
> quickly, e.g. because of another bridge, we get wrong operation, in
> that bad entries can get stuck in the table for up to 5 minutes.
>
> So either we need to keep fast ageing as long as there is one bridge
> fast ageing, or we need to flush the whole MAC cache for a bridge on
> topology change and don't bother with fast ageing at all.
>
>> Maybe we can keep it simple for the moment with this switch-wide
>> set_ageing_time operation, and later add a patch for the DSA layer to
>> cache and elect the ageing time per-port or per-bridge.
>
> I don't think it can be done at the DSA layer. It does not have the
> information needed.

OK. I think caching per-port (and thus per-bridge) ageing time would do
the trick and keep DSA drivers simple. What about the following patch?

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 238fad9..2217a3f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -141,6 +141,7 @@ struct dsa_switch_tree {
 struct dsa_port {
struct net_device   *netdev;
struct device_node  *dn;
+   unsigned intageing_time;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1074cb6..fc91967 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -333,6 +333,21 @@ static int dsa_slave_vlan_filtering(struct net_device *dev,
return 0;
 }
 
+static int dsa_fastest_ageing_time(struct dsa_switch *ds,
+  unsigned int ageing_time)
+{
+   int i;
+
+   for (i = 0; i < DSA_MAX_PORTS; ++i) {
+   struct dsa_port *dp = &ds->ports[i];
+
+   if (dp && dp->ageing_time && dp->ageing_time < ageing_time)
+   ageing_time = dp->ageing_time;
+   }
+
+   return ageing_time;
+}
+
 static int dsa_slave_ageing_time(struct net_device *dev,
 const struct switchdev_attr *attr,
 struct switchdev_trans *trans)
@@ -346,6 +361,10 @@ static int dsa_slave_ageing_time(struct net_device *dev,
if (switchdev_trans_ph_prepare(trans))
return 0;
 
+   /* Keep the fastest ageing time in case of multiple bridges */
+   ds->ports[p->port].ageing_time = ageing_time;
+   ageing_time = dsa_fastest_ageing_time(ds, ageing_time);
+
if (ds->drv->set_ageing_time)
return ds->drv->set_ageing_time(ds, ageing_time);
 

Thanks,

Vivien
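
A standalone sketch of the election proposed in the patch above: each port
caches the ageing time last requested for it, and the value programmed into
the switch is the smallest non-zero cached value. struct port and the two
helpers are hypothetical simplifications (no switchdev, no driver
callback); only the selection logic follows the diff.

```c
#include <stddef.h>

/* Hypothetical per-port cache; 0 means no ageing time was set yet. */
struct port {
	unsigned int ageing_time;
};

/* Return the smallest non-zero ageing time among the ports and the request. */
static unsigned int fastest_ageing_time(const struct port *ports, size_t n,
					unsigned int ageing_time)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ports[i].ageing_time &&
		    ports[i].ageing_time < ageing_time)
			ageing_time = ports[i].ageing_time;
	}

	return ageing_time;
}

/*
 * Cache the request for one port, then return the switch-wide value that
 * would be handed to the driver.
 */
static unsigned int set_port_ageing_time(struct port *ports, size_t n,
					 size_t port,
					 unsigned int ageing_time)
{
	ports[port].ageing_time = ageing_time;
	return fastest_ageing_time(ports, n, ageing_time);
}
```

With two bridges on ports 0 and 1, a request for slow ageing on port 0
keeps returning the fast value for as long as port 1 still has fast ageing
cached, which is the behaviour Andrew asks for earlier in the thread.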

RE: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Liang, Kan



> >>
> >> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen  wrote:
> >> >> It seems strange to me to add such policies to the kernel.
> >> >> Addmittingly, documentation of some settings is non-existent and
> >> >> one needs various different tools to set this (sysctl, procfs, sysfs,
> ethtool, etc).
> >> >
> >> > The problem is that different applications need different policies.
> >> >
> >> > The only entity which can efficiently negotiate between different
> >> > applications' conflicting requests is the kernel. And that is
> >> > pretty much the basic job description of a kernel: multiplex
> >> > hardware efficiently between different users.
> >> >
> >> > So yes the user space tuning approach works for simple cases ("only
> >> > run workloads that require the same tuning"), but is ultimately not
> >> > very interesting nor scalable.
> >>
> >> I haven't read the code yet, just the cover letter.
> >>
> >> We have global tunings, per-network-namespace tunings, per-socket
> tunings.
> >> It is still unclear why you can't just put different applications
> >> into different namespaces/containers to get different policies.
> >
> > In NET policy, we do per queue tunings.
> 
> Is it possible to isolate NIC queues for containers?

Yes, but we don't  have containers support yet.

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Cong Wang

On Mon, Jul 18, 2016 at 1:14 PM, Liang, Kan  wrote:
>
>
>>
>> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen  wrote:
>> >> It seems strange to me to add such policies to the kernel.
>> >> Admittedly, documentation of some settings is non-existent and one
>> >> needs various different tools to set this (sysctl, procfs, sysfs, 
>> >> ethtool, etc).
>> >
>> > The problem is that different applications need different policies.
>> >
>> > The only entity which can efficiently negotiate between different
>> > applications' conflicting requests is the kernel. And that is pretty
>> > much the basic job description of a kernel: multiplex hardware
>> > efficiently between different users.
>> >
>> > So yes the user space tuning approach works for simple cases ("only
>> > run workloads that require the same tuning"), but is ultimately not
>> > very interesting nor scalable.
>>
>> I haven't read the code yet, just the cover letter.
>>
>> We have global tunings, per-network-namespace tunings, per-socket tunings.
>> It is still unclear why you can't just put different applications into 
>> different
>> namespaces/containers to get different policies.
>
> In NET policy, we do per queue tunings.

Is it possible to isolate NIC queues for containers?

Re: Getting IP address and port of a file descriptor

2016-07-18 Thread Cong Wang

On Mon, Jul 18, 2016 at 12:09 PM, Peter Chen
 wrote:
> Hi,
>
>I was wondering, if I was in the kernel, and I intercepted a system
> call such as read(). Would I be able, from the fd, determine the
> whether the fd is (1) a network socket? (2) the IP address and port of
> this socket? What are the kernel data structures and functions that
> can get these information for me in the kernel? Thanks.

You can use sockfd_lookup() to read the sock structure from a given fd,
after that you can call, for example, sock->ops->getname() to read
local IP address etc.
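
For comparison only, here is the user-space counterpart of that lookup,
using getsockname(2). This is not the in-kernel path described above (it
does not use sockfd_lookup() or sock->ops->getname()); fd_local_ipv4() is a
hypothetical helper name, and the sketch handles IPv4 sockets only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return 0 and fill in the local IPv4 address and port if fd is an
 * AF_INET socket; return -1 if it is not a socket or not IPv4.
 */
static int fd_local_ipv4(int fd, char *ip, size_t ip_len, unsigned int *port)
{
	struct sockaddr_storage ss;
	socklen_t len = sizeof(ss);
	const struct sockaddr_in *sin;

	/* Fails (for example with ENOTSOCK) when fd is not a socket. */
	if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
		return -1;

	if (ss.ss_family != AF_INET)
		return -1;

	sin = (const struct sockaddr_in *)&ss;
	if (!inet_ntop(AF_INET, &sin->sin_addr, ip, (socklen_t)ip_len))
		return -1;

	*port = ntohs(sin->sin_port);
	return 0;
}
```

The buffer passed as ip should be at least INET_ADDRSTRLEN bytes. The
remote end of a connected socket can be read the same way with
getpeername(2).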

Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes

2016-07-18 Thread Michal Marek

On Wed, Jun 15, 2016 at 05:45:47PM +0200, Arnd Bergmann wrote:
> When building with separate object directories and driver specific
> Makefiles that add additional header include paths, Kbuild adjusts
> the gcc flags so that we include both the directory in the source
> tree and in the object tree.
> 
> However, due to another bug I fixed earlier, this did not actually
> include the correct directory in the object tree, so we know that
> we only really need the source tree here. Also, including the
> object tree sometimes causes warnings about nonexisting directories
> when the include path only exists in the source.
> 
> This changes the logic to only emit the -I argument for the srctree,
> not for objects. We still need both $(srctree)/$(src) and $(obj)
> though, so I'm adding them manually.
> 
> Signed-off-by: Arnd Bergmann 

Hi Arnd,

I applied the series up to this patch to kbuild.git#kbuild. The rest
seem to be related but not dependent patches, so I'll leave it up to the
respective maintainers to pick them up. Is that OK with you?

Thanks,
Michal

RE: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Liang, Kan



> 
> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen  wrote:
> >> It seems strange to me to add such policies to the kernel.
> >> Admittedly, documentation of some settings is non-existent and one
> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, 
> >> etc).
> >
> > The problem is that different applications need different policies.
> >
> > The only entity which can efficiently negotiate between different
> > applications' conflicting requests is the kernel. And that is pretty
> > much the basic job description of a kernel: multiplex hardware
> > efficiently between different users.
> >
> > So yes the user space tuning approach works for simple cases ("only
> > run workloads that require the same tuning"), but is ultimately not
> > very interesting nor scalable.
> 
> I haven't read the code yet, just the cover letter.
> 
> We have global tunings, per-network-namespace tunings, per-socket tunings.
> It is still unclear why you can't just put different applications into 
> different
> namespaces/containers to get different policies.

In NET policy, we do per queue tunings.


Thanks,
Kan

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 03:59:38PM -0400, Vivien Didelot wrote:
> Andrew Lunn  writes:
> 
> >> Nope, the bridge ageing time is not per-port, even though switchdev ops
> >> are per-port by design. This is a switch-wide attribute.
> >
> > So you are saying the core is doing all the reference counting, etc,
> > when swapping between fast and slow ageing?
> 
> I don't see how checking for the fastest ageing time would fix support
> for multiple bridges...

The bridge should switch to fast ageing after a topology change to
flush out entries which are now wrong. Using the short age time for
too long results in a bit more inefficiency, in that entries time out
faster than they need to. But if we go back to slow ageing too
quickly, e.g. because of another bridge, we get wrong operation, in
that bad entries can get stuck in the table for up to 5 minutes.

So either we need to keep fast ageing as long as there is one bridge
fast ageing, or we need to flush the whole MAC cache for a bridge on
topology change and don't bother with fast ageing at all.

> Maybe we can keep it simple for the moment with this switch-wide
> set_ageing_time operation, and later add a patch for the DSA layer to
> cache and elect the ageing time per-port or per-bridge.

I don't think it can be done at the DSA layer. It does not have the
information needed.

Andrew

Re: [PATCH net] bnxt_en: Remove locking around txr->dev_state

2016-07-18 Thread Michael Chan

On Mon, Jul 18, 2016 at 1:02 PM, Florian Fainelli  wrote:
> txr->dev_state was not consistently manipulated with the acquisition of
> the per-queue lock, after further inspection the lock does not seem
> necessary, either the value is read as BNXT_DEV_STATE_CLOSING or 0.
>
> Reported-by: coverity (CID 1339583)
> Fixes: c0c050c58d840 ("bnxt_en: New Broadcom ethernet driver.")
> Signed-off-by: Florian Fainelli 

Thanks Florian.

Acked-by: Michael Chan

[PATCH net] bnxt_en: Remove locking around txr->dev_state

2016-07-18 Thread Florian Fainelli

txr->dev_state was not consistently manipulated with the acquisition of
the per-queue lock, after further inspection the lock does not seem
necessary, either the value is read as BNXT_DEV_STATE_CLOSING or 0.

Reported-by: coverity (CID 1339583)
Fixes: c0c050c58d840 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- remove locking in bnxt_tx_disable() as recommended by Michael

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c777cde85ce4..15e1d1885919 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4580,9 +4580,7 @@ static void bnxt_tx_disable(struct bnxt *bp)
for (i = 0; i < bp->tx_nr_rings; i++) {
txr = &bp->tx_ring[i];
txq = netdev_get_tx_queue(bp->dev, i);
-   __netif_tx_lock(txq, smp_processor_id());
txr->dev_state = BNXT_DEV_STATE_CLOSING;
-   __netif_tx_unlock(txq);
}
}
/* Stop all TX queues */
-- 
2.7.4

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Vivien Didelot

Andrew Lunn  writes:

>> Nope, the bridge ageing time is not per-port, even though switchdev ops
>> are per-port by design. This is a switch-wide attribute.
>
> So you are saying the core is doing all the reference counting, etc,
> when swapping between fast and slow ageing?

I don't see how checking for the fastest ageing time would fix support
for multiple bridges... I think that would make the code much more
complex for a small value. Multiple logical bridges on top of a single
physical switch is still a tricky topic.

Maybe we can keep it simple for the moment with this switch-wide
set_ageing_time operation, and later add a patch for the DSA layer to
cache and elect the ageing time per-port or per-bridge.

Thanks,

Vivien

Re: [PATCH 0/2] Fix DMA channel misreporting for the Renesas Ethernet drivers

2016-07-18 Thread David Miller

From: Sergei Shtylyov 
Date: Mon, 18 Jul 2016 17:13:02 +0300

> Hello.
> 
> On 07/18/2016 09:24 AM, David Miller wrote:
> 
>>>Here's a set of 2 patches against DaveM's 'net.git' repo fixing
>>> up the DMA channel reporting by 'ifconfig'...
>>
>> Is fixing some ifconfig output that is effectively meaningless
>> appropriate for 'net'?
> 
>I just don't know any more users of SIOCGIFMAP ioctl... ip?
>It seems to  also be used in net/core/rtnetlink.c...

Who cares who uses it.  The important thing to ask is what in the
world could they do with the value?

DMA masks have no meaning or usage since ISA times.

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Andi Kleen

> > So where is your policy for power saving?  From past experience I can tell you
> 
> There is no policy for power saving yet. I will add it to my todo list.

Yes it's interesting to consider. The main goal here is to maximize CPU
idle residency? I wonder if that's that much different from the CPU policy.

-Andi

RE: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Liang, Kan



> On Sun, Jul 17, 2016 at 11:55 PM,   wrote:
> > From: Kan Liang 
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> > lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> 
> The problem as I see it is that this is just going to end up likely being
> an even more intrusive version of irqbalance.  I really don't like the way
> that turned out as it did a number of really dumb things that usually
> result in it being disabled as soon as you actually want to do anything
> that will actually involve any kind of performance tuning.  If this stuff
> is pushed into the kernel it will be even harder to get rid of and that is
> definitely a bad thing.
> 
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figure out how to merge different requests to get good
> > network performance.
> 
> So where is your policy for power saving?  From past experience I can tell you

There is no policy for power saving yet. I will add it to my todo list.

> that while performance tuning is a good thing, doing so at the expense of
> power management is bad.  In addition you seem to be making a lot of
> assumptions here that the end users are going to rewrite their applications to
> use the new socket options you added in order to try and tune the

Currently, they can set a per-task policy through /proc to get good
performance without code changes.

> performance.  I have a hard time believing most developers are going to go
> to all that trouble.  In addition I suspect that even if they do go to that
> trouble they will probably still screw it up and you will end up with
> applications advertising latency as a goal when they should have specified
> CPU and so on.
> 
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> 
> I would argue that your code is not very generic.  The fact that it is
> relying on flow director already greatly limits what you can do.  If you
> want to make this truly generic I would say you need to find ways to make
> this work on everything all the way down to things like i40evf and igb
> which don't have support for Flow Director.

Actually, the NET policy code uses ethtool's set_rxnfc interface to set
rules, so it should be generic.
I guess I emphasized Flow Director too much in the document, which caused
the confusion.

> 
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >    /proc/net/netpolicy/$DEV/policy
> >    User can set/get per device policy from /proc
> >
> >    /proc/$PID/net_policy
> >    User can set/get per task policy from /proc
> >    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >    An alternative way to set/get per task policy is from prctl.
> >
> >    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >    User can set/get per socket policy by setsockopt
> >
> >
> >    int (*ndo_netpolicy_init)(struct net_device *dev,
> >                              struct netpolicy_info *info);
> >    Initialize device driver for NET policy
> >
> >    int (*ndo_get_irq_info)(struct net_device *dev,
> >                            struct netpolicy_dev_info *info);
> >    Collect device irq information
> 
> Instead of making the irq info a part of the ndo ops it might make more
> sense to make it part of an ethtool op.  Maybe you could make it so that you
> could specify a single queue at a time and get things like statistics, IRQ,
> and ring information.

I will think about it. Thanks.

> 
> >    int (*ndo_set_net_policy)(struct net_device *dev,
> >                              enum netpolicy_name name);
> >    Configure device according to policy name
> 
> I really don't like this piece of it.  I really think we shouldn't be
> leaving so much up to the driver to determine how to handle things.

Some settings are device specific. For example, the interrupt
moderation for i40e for the BULK policy is (50, 125). For other devices, the
numbers could be different, and tuning interrupt moderation alone may
not be enough. So

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Andi Kleen

> I wonder if this can be attacked from a different angle. What would be
> missing to add support for this in user space? The first possibility
> that came to my mind is to just multiplex those hints in the kernel.

"just" is the handwaving part here -- you're proposing a micro kernel
approach where part of the multiplexing job that the kernel is doing
is farmed out to a message passing user space component.

I suspect this would be far more complicated to get right and
perform well than a straight forward monolithic kernel subsystem --
which is traditionally how Linux has approached things.

The daemon would always need to work with out of date state
compared to the latest, because it cannot do any locking with the
kernel state.  So you end up with a complex distributed system with multiple
agents "fighting" with each other, and the tuning agent
never being able to keep up with the actual work.

Also of course it would be fundamentally less efficient than
kernel code doing that, just because of the additional context
switches needed.

-Andi

Re: [PATCH v2 net-next v2 09/12] net: dsa: mv88e6xxx: add cap for IRL

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:25PM -0400, Vivien Didelot wrote:
> Add capability flags to describe the presence of Ingress Rate Limit unit
> registers and a helper function to clear it.
> 
> In the meantime, fix a few harmless issues:
> 
>   - 6185 and 6095 don't have such registers (reserved)
>   - the previous code didn't wait for the IRL operation to complete
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 03/12] net: dsa: mv88e6xxx: extract device mapping

2016-07-18 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> +/* Indirect write to single pointer-data register with an Update bit */
>> +static int mv88e6xxx_update_write(struct mv88e6xxx_chip *chip,
>> +  int addr, int reg, u16 update)
>
> Hi Vivien
>
> I don't think mv88e6xxx_update_read() makes any sense? Can we just
> infer write? Call it mv88e6xxx_update().

Yes it does: a read operation on such a register consists of write+read
(first write the pointer to read, then read the actual value).
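
To illustrate the write+read sequence, here is a minimal user-space
sketch; it is not the driver code. The fake_chip structure, the 7-bit
pointer / 8-bit data layout and all function names are assumptions made
up for the illustration. Only the 0x8000 Update bit and the
"pointer << 8" placement are taken from the patches in this series.

```c
#include <assert.h>
#include <stdint.h>

#define UPDATE_BIT 0x8000u	/* same value as the 0x8000 used in the series */

/* Hypothetical stand-in for a switch: 128 entries behind one register. */
struct fake_chip {
	uint8_t table[128];	/* entries selected by the pointer field */
	uint16_t reg;		/* the single pointer-data register */
};

/* Emulates what the hardware does on a raw register write. */
static void fake_reg_write(struct fake_chip *chip, uint16_t val)
{
	unsigned int ptr = (val >> 8) & 0x7f;

	if (val & UPDATE_BIT)
		chip->table[ptr] = val & 0xff;	/* commit data to the entry */

	/* The data field now reflects the entry selected by the pointer. */
	chip->reg = (uint16_t)((ptr << 8) | chip->table[ptr]);
}

static uint16_t fake_reg_read(const struct fake_chip *chip)
{
	return chip->reg;
}

/* Indirect write: a single access with the Update bit set. */
static void update_write(struct fake_chip *chip, unsigned int ptr, uint8_t data)
{
	assert(ptr < 128);
	fake_reg_write(chip, (uint16_t)(UPDATE_BIT | (ptr << 8) | data));
}

/* Indirect read: write the pointer with Update clear, then read back. */
static uint8_t update_read(struct fake_chip *chip, unsigned int ptr)
{
	assert(ptr < 128);
	fake_reg_write(chip, (uint16_t)(ptr << 8));
	return (uint8_t)(fake_reg_read(chip) & 0xff);
}
```

With this model, update_write(&chip, 3, 0x5) followed by
update_read(&chip, 3) gives back 0x5, and the read needs two register
accesses where the write needs one.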

>> +static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
>> + int target, int port)
>> +{
>> +u16 val = (target << 8) | (port & 0xf);
>> +
>> +return mv88e6xxx_update_write(chip, REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING,
>> +  val);
>
> This would then all be on one line and look better.

I plan to add more cleanup for register description later, such as
s/REG_GLOBAL2/ADDR_G2/ and s/GLOBAL2_/G2/. But that'll be a future patch.

Thanks,

Vivien

Re: [PATCH v2 net-next v2 08/12] net: dsa: mv88e6xxx: add cap for Priority Override

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:24PM -0400, Vivien Didelot wrote:
> Add flags and helpers to describe the presence of Priority Override
> Table (POT) related registers and simplify the setup of Global 2.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 06/12] net: dsa: mv88e6xxx: rework Switch MAC setter

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:22PM -0400, Vivien Didelot wrote:
> Switches such as 88E6185 have 3 Switch MAC registers in Global 1. Newer
> chips such as 88E6352 have freed these registers in favor of an indirect
> access in a Switch MAC/WoL/WoF register in Global 2.
> 
> Make this difference explicit with G1 and G2 helpers and flags.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 07/12] net: dsa: mv88e6xxx: add cap for PVT

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:23PM -0400, Vivien Didelot wrote:
> Add flags to describe the presence of Cross-chip Port VLAN Table (PVT)
> related registers and simplify the setup of Global 2.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 05/12] net: dsa: mv88e6xxx: add cap for MGMT Enables bits

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:21PM -0400, Vivien Didelot wrote:
> Some switches provide a Rsvd2CPU mechanism used to choose which of the
> 16 reserved multicast destination addresses matching 01:80:c2:00:00:0x
> should be considered as MGMT and thus forwarded to the CPU port.
> 
> Other switches extend this mechanism to also configure as MGMT the
> additional 16 reserved multicast addresses matching 01:80:c2:00:00:2x.
> 
> This mechanism is exposed via two registers in Global 2, and an Rsvd2CPU
> enable bit in the management register.
> 
> Newer chips (such as 88E6390) have replaced these registers with a new
> indirect MGMT mechanism in Global 1.
> 
> The patch adds two MV88E6XXX_FLAG_G2_MGMT_EN_{0,2}X flags to describe
> the presence of these Global 2 registers. If 88E6390 support is added, a
> MV88E6XXX_FLAG_G1_MGMT_CTRL flag will be needed to set up Rsvd2CPU.
> 
> Note: all switches still support in parallel the ATU Load operation with
> an MGMT Entry State to forward such frames in a less convenient way.
> 
> net: dsa: mv88e6xxx: add cap for MGMT Enable 2x

We seem to have an extra subject line here?

> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 04/12] net: dsa: mv88e6xxx: extract trunk mapping

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:20PM -0400, Vivien Didelot wrote:
> The Trunk Mask and Trunk Mapping registers are two Global 2 indirect
> accesses to trunking configuration.
> 
> Add helpers for these tables and simplify the Global 2 setup.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: brcmfmac: respect hidden_ssid for AP interfaces

2016-07-18 Thread Kalle Valo

Rafał Miłecki wrote:
> This was successfully tested with 4366B1. A small workaround is needed
> for the main interface otherwise it would get stuck at the hidden state.
> 
> Signed-off-by: Rafał Miłecki 

Thanks, 1 patch applied to wireless-drivers-next.git:

c940de10d45e brcmfmac: respect hidden_ssid for AP interfaces

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9216049/

Re: [PATCH v2 net-next v2 03/12] net: dsa: mv88e6xxx: extract device mapping

2016-07-18 Thread Andrew Lunn

> +/* Indirect write to single pointer-data register with an Update bit */
> +static int mv88e6xxx_update_write(struct mv88e6xxx_chip *chip,
> +   int addr, int reg, u16 update)

Hi Vivien

I don't think mv88e6xxx_update_read() makes any sense? Can we just
infer write? Call it mv88e6xxx_update().

> +static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
> +  int target, int port)
> +{
> + u16 val = (target << 8) | (port & 0xf);
> +
> + return mv88e6xxx_update_write(chip, REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING,
> +   val);

This would then all be on one line and look better.

 Andrew

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Andrew Lunn

> Nope, the bridge ageing time is not per-port, even though switchdev ops
> are per-port by design. This is a switch-wide attribute.

So you are saying the core is doing all the reference counting, etc,
when swapping between fast and slow ageing?

 Andrew

Re: wlcore/wl18xx: mesh: added initial mesh support for wl8

2016-07-18 Thread Kalle Valo

"Machani, Yaniv"  wrote:
> From: Maital Hahn 
> 
> 1. Added support for interface and role of mesh type.
> 2. Enabled enable/start of mesh-point role,
>and opening and closing a connection with a mesh peer.
> 3. Added multirole combination of mesh and ap
>under the same limits of dual ap mode.
> 4. Add support for 'sta_rc_update' opcode for mesh IF.
>The 'sta_rc_update' opcode is being used in mesh_plink.c.
> Add support in wlcore to handle this opcode correctly for mesh
> (as opposed to current implementation that handles STA only).
> 5. Bumped the firmware version to support new Mesh functionality
> 
> Signed-off-by: Maital Hahn 
> Signed-off-by: Yaniv Machani 

Thanks, 1 patch applied to wireless-drivers-next.git:

c0174ee28003 wlcore/wl18xx: mesh: added initial mesh support for wl8

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9202707/

Re: [PATCH v2 net-next v2 02/12] net: dsa: mv88e6xxx: split setup of Global 1 and 2

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:18PM -0400, Vivien Didelot wrote:
> Separate the setup of Global 1 and Global 2 internal SMI devices and add
> a flag to describe the presence of this second register set.

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> This is way too simplistic.
>
> The switchdev call is per port. This manipulates the whole switch. We
> need to somehow handle the difference.
>
> I've not looked at the bridge code, but i assume it initially sets each
> port to a long age time, probably 5 minutes. When there is a topology
> change, it enables fast ageing by setting a shorter age time in each
> port. After a while it will return to the default age time. Although
> the switchdev call is per port, i think the age time is a property of
> the bridge, not a port.
>
> For the Marvell devices, we only have a global setting. It will apply
> to all bridges we create on the switch. So if one bridge requests fast
> ageing, we need to apply it to all bridges. We should only go back to
> slow ageing when all bridges are out of fast ageing. That is, we need
> some sort of reference counting.
>
> I'm not sure we have enough information to know why the bridge changed
> the age timing. Did the user change the forwarding delay, or have we
> entered fast ageing? So i think for Marvell devices, we need an
> additional property passed down. Is this a fast or a slow age time?
> We can then determine what the fastest fast ageing and the fastest
> slow ageing are, perform reference counting as appropriate, and set the
> global setting as needed.

Nope, the bridge ageing time is not per-port, even though switchdev ops
are per-port by design. This is a switch-wide attribute.

See f55ac58ae64c ("switchdev: add bridge ageing_time attribute") [1]

Rocker and mlxsw implement AGEING_TIME switch-wide too.

[1] 
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=f55ac58ae64c

Thanks,

Vivien

[PATCH V2] Add flow control to the portmapper

2016-07-18 Thread Shiraz Saleem

From: Mustafa Ismail 

During connection establishment with a large number of connections,
it is possible that the connection requests might fail. Adding flow
control prevents this failure. Change ibnl unicast to use netlink
messaging with blocking to enable flow control.

Signed-off-by: Faisal Latif 
Signed-off-by: Mustafa Ismail 
Signed-off-by: Shiraz Saleem 
---

V2: update commit message with justification for flow control. CC'ing
linux-netdev mailing list.
 
 drivers/infiniband/core/netlink.c |  4 ++--
 include/net/netlink.h | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index 9b8c20c..6b09580 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -229,7 +229,7 @@ static void ibnl_rcv(struct sk_buff *skb)
 int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
__u32 pid)
 {
-   return nlmsg_unicast(nls, skb, pid);
+   return nlmsg_unicast_block(nls, skb, pid);
 }
 EXPORT_SYMBOL(ibnl_unicast);
 
@@ -251,7 +251,7 @@ int __init ibnl_init(void)
pr_warn("Failed to create netlink socket\n");
return -ENOMEM;
}
-
+   nls->sk_sndtimeo = 10 * HZ;
return 0;
 }
 
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 254a0fc..5434279 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -598,6 +598,23 @@ static inline int nlmsg_unicast(struct sock *sk, struct sk_buff *skb, u32 portid
 }
 
 /**
+ * nlmsg_unicast_block - unicast a netlink message with blocking
+ * @sk: netlink socket to spread message to
+ * @skb: netlink message as socket buffer
+ * @portid: netlink portid of the destination socket
+ */
+static inline int nlmsg_unicast_block(struct sock *sk, struct sk_buff *skb, u32 portid)
+{
+	int err;
+
+	err = netlink_unicast(sk, skb, portid, 0);
+	if (err > 0)
+		err = 0;
+
+	return err;
+}
+
+/**
  * nlmsg_for_each_msg - iterate over a stream of messages
  * @pos: loop counter, set to current message
  * @head: head of message stream
-- 
2.1.3

Re: [PATCH v2 net-next v2 01/12] net: dsa: mv88e6xxx: remove basic function flags

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:17PM -0400, Vivien Didelot wrote:
> All 88E6xxx Marvell switches (even the old, not yet supported 88E6060)
> have at least an ATU, per-port STP states and VLAN map, to run basic
> switch functions such as Spanning Tree and port based VLANs.
> 
> Get rid of the related MV88E6XXX_FLAG_{ATU,PORTSTATE,VLANTABLE} flags,
> as they are default on every chip.
> 
> This enables STP on 6185 and removes many inconsistencies on others.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-18 Thread Andrew Lunn

On Mon, Jul 18, 2016 at 02:46:28PM -0400, Vivien Didelot wrote:
> Implement the DSA driver function to configure the bridge ageing time.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index e2627a8..2101241 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3002,6 +3002,19 @@ static int mv88e6xxx_g1_set_age_time(struct mv88e6xxx_chip *chip,
>   return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, val);
>  }
>  
> +static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds,
> +				     unsigned int ageing_time)
> +{
> +	struct mv88e6xxx_chip *chip = ds_to_priv(ds);
> +	int err;
> +
> +	mutex_lock(&chip->reg_lock);
> +	err = mv88e6xxx_g1_set_age_time(chip, ageing_time);
> +	mutex_unlock(&chip->reg_lock);
> +
> +	return err;
> +}
> +
>  static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
>  {
>   struct dsa_switch *ds = chip->ds;
> @@ -3985,6 +3998,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_driver = {
>   .set_eeprom = mv88e6xxx_set_eeprom,
>   .get_regs_len   = mv88e6xxx_get_regs_len,
>   .get_regs   = mv88e6xxx_get_regs,
> + .set_ageing_time= mv88e6xxx_set_ageing_time,
>   .port_bridge_join   = mv88e6xxx_port_bridge_join,
>   .port_bridge_leave  = mv88e6xxx_port_bridge_leave,
>   .port_stp_state_set = mv88e6xxx_port_stp_state_set,

Hi Vivien

This is way too simplistic.

The switchdev call is per port. This manipulates the whole switch. We
need to somehow handle the difference.

I've not looked at the bridge code, but i assume it initially sets each
port to a long age time, probably 5 minutes. When there is a topology
change, it enables fast ageing by setting a shorter age time in each
port. After a while it will return to the default age time. Although
the switchdev call is per port, i think the age time is a property of
the bridge, not a port.

For the Marvell devices, we only have a global setting. It will apply
to all bridges we create on the switch. So if one bridge requests fast
ageing, we need to apply it to all bridges. We should only go back to
slow ageing when all bridges are out of fast ageing. That is, we need
some sort of reference counting.

I'm not sure we have enough information to know why the bridge changed
the age timing. Did the user change the forwarding delay, or have we
entered fast ageing? So i think for Marvell devices, we need an
additional property passed down. Is this a fast or a slow age time?
We can then determine what the fastest fast ageing and the fastest
slow ageing are, perform reference counting as appropriate, and set the
global setting as needed.
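
One possible shape for this, sketched in plain user-space C: remember
what each bridge asked for and always program the shortest request, so
the switch only returns to slow ageing once every bridge has left fast
ageing. This is an illustration only, not the patch under review. The
fake_switch structure, MAX_BRIDGES and the function names are
hypothetical, and times are in milliseconds purely for readability.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BRIDGES 4	/* hypothetical limit for the sketch */

struct fake_switch {
	unsigned int requested_ms[MAX_BRIDGES];	/* 0 = bridge slot unused */
	unsigned int programmed_ms;		/* value in the global register */
};

/* Pick the shortest ageing time requested by any active bridge. */
static unsigned int fastest_ageing_ms(const struct fake_switch *sw)
{
	unsigned int best = 0;
	size_t i;

	for (i = 0; i < MAX_BRIDGES; i++) {
		unsigned int t = sw->requested_ms[i];

		if (t && (!best || t < best))
			best = t;
	}

	return best;
}

/* Record one bridge's request, then reprogram the switch-wide value. */
static unsigned int bridge_set_ageing(struct fake_switch *sw, size_t bridge,
				      unsigned int ms)
{
	assert(bridge < MAX_BRIDGES);

	sw->requested_ms[bridge] = ms;
	sw->programmed_ms = fastest_ageing_ms(sw);

	return sw->programmed_ms;
}
```

With two bridges at 300000 ms, moving bridge 0 to 15000 ms programs
15000 ms for the whole switch, and the value only goes back to 300000 ms
when bridge 0 itself returns to the long time.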

   Andrew

Getting IP address and port of a file descriptor

2016-07-18 Thread Peter Chen

Hi,

   I was wondering, if I was in the kernel, and I intercepted a system
call such as read(), would I be able to determine, from the fd,
(1) whether the fd is a network socket, and (2) the IP address and port
of this socket? What are the kernel data structures and functions that
can get this information for me in the kernel? Thanks.

Peter

Re: [RFC PATCH 00/30] Kernel NET policy

2016-07-18 Thread Hannes Frederic Sowa

On 18.07.2016 17:45, Andi Kleen wrote:
>> It seems strange to me to add such policies to the kernel.
>> Admittedly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> The problem is that different applications need different policies.

I fear that if those policies get changed in future, people will rely on
some of their side-effects, causing us to add more and more policies
which basically just differ in those side-effects.

If you compare your policies to madvise or fadvise options, the latter
seem to have much stricter and narrower effects, which can be reasoned
about much more easily.

> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty 
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.

The multiplexing part seems to be not really relevant for the per-device
settings, thus being controllable from current user space just fine.
Per-task setting could be conflicting with per-socket settings which
could lead to non-deterministic behavior. Probably semantically it
should be made clear what overrides what here (here == cover letter).
Things like indeterminate allocation of sockets in a threaded
environment come to my mind. Also allocation strategy could very much
depend on the installed rss key.

> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I wonder if this can be attacked from a different angle. What would be
missing to add support for this in user space? The first possibility
that came to my mind is to just multiplex those hints in the kernel.
Implement a generic way to add metadata to sockets and allow tuning
daemons to retrieve them via sockdiag? I could imagine that if the
SO_INCOMING_CPU information would be visible in sockdiag, one could
already do more automatic tuning and basically allow to implement your
policy in user space.

Bye,
Hannes

Re: [PATCH v8 04/11] net/mlx4_en: add support for fast rx drop bpf program

2016-07-18 Thread Brenden Blanco

On Mon, Jul 18, 2016 at 01:39:02PM +0200, Tom Herbert wrote:
> On Mon, Jul 18, 2016 at 11:10 AM, Thomas Graf  wrote:
> > On 07/15/16 at 10:49am, Tom Herbert wrote:
[...]
> >> To me, an XDP program is just another attribute of an RX queue, it's
> >> really not special!. We already have a very good infrastructure for
> >> managing multiqueue and pretty much everything in the receive path
> >> operates at the queue level not the device level-- we should follow
> >> that model.
> >
> > I agree with that but I would like to keep the current per net_device
> > atomic properties.
> 
> > I don't see that there are any synchronization guarantees
> using xchg. For instance, if the pointer is set right after being read
> by a thread for one queue and right before being read by a thread for
> another queue, this could result in the old and new program running
> concurrently or old one running after new. If we need to synchronize
> the operation across all queues then sequence
> ifdown,modify-config,ifup will work.
The case you mentioned is a valid criticism. The reason I wanted to keep this
fast xchg around is because the full stop/start operation on mlx4 is a second
or longer of downtime. I think something like the following should suffice to
have a clean cut between programs without bringing the whole port down, buffers
and all:

{
	struct bpf_prog *old_prog;
	bool port_up;
	int i;

	mutex_lock(&mdev->state_lock);
	port_up = priv->port_up;
	priv->port_up = false;
	for (i = 0; i < priv->rx_ring_num; i++)
		napi_synchronize(&priv->rx_cq[i]->napi);

	old_prog = xchg(&priv->prog, prog);
	if (old_prog)
		bpf_prog_put(old_prog);

	priv->port_up = port_up;
	mutex_unlock(&mdev->state_lock);
}

Thoughts?
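
For reference, the exchange-and-release step on its own can be tried in
user space with C11 atomics. This is only a sketch of that one step: it
does not model the NAPI quiescing, and fake_prog, fake_priv,
fake_prog_put() and swap_prog() are hypothetical names, not mlx4 or bpf
APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted program object. */
struct fake_prog {
	int refcnt;
};

/* Hypothetical stand-in for the per-device private data. */
struct fake_priv {
	_Atomic(struct fake_prog *) prog;
};

/* Drop one reference on a program. */
static void fake_prog_put(struct fake_prog *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/*
 * Install new_prog and release whatever was installed before.
 * The exchange is atomic, so exactly one caller gets each old pointer.
 */
static struct fake_prog *swap_prog(struct fake_priv *priv,
				   struct fake_prog *new_prog)
{
	struct fake_prog *old = atomic_exchange(&priv->prog, new_prog);

	if (old)
		fake_prog_put(old);

	return old;
}
```

The exchange alone says nothing about readers that already loaded the
old pointer, which is why the proposal above pauses the rings first.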

> 
> Tom

[PATCH net] net: switchdev: change ageing_time type to clock_t

2016-07-18 Thread Vivien Didelot

The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME
attribute is a clock_t and requires helpers such as
clock_t_to_jiffies() to convert it to milliseconds.

Change ageing_time type from u32 to clock_t to make it explicit.
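
As a worked example of why the unit matters, here is a small user-space
sketch of the arithmetic. It assumes clock_t ticks of 1/USER_HZ seconds
with USER_HZ equal to 100; SKETCH_USER_HZ and
ageing_clock_t_to_msecs() are names made up for this illustration and
are not kernel helpers.

```c
#include <assert.h>

#define SKETCH_USER_HZ 100UL	/* assumption: USER_HZ is 100 */

/* Convert an ageing time in clock_t ticks (1/USER_HZ s) to milliseconds. */
static unsigned long ageing_clock_t_to_msecs(unsigned long ticks)
{
	/* Guard against overflow in the multiplication below. */
	assert(ticks <= (~0UL) / 1000UL);

	return ticks * 1000UL / SKETCH_USER_HZ;
}
```

Under that assumption, a value of 30000 ticks is 300000 ms, that is
5 minutes, and reading the same number as milliseconds would be off by a
factor of ten.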

Fixes: f55ac58ae64c ("switchdev: add bridge ageing_time attribute")
Signed-off-by: Vivien Didelot 
---
 include/net/switchdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 9023e3e..62f6a96 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -60,7 +60,7 @@ struct switchdev_attr {
 		struct netdev_phys_item_id ppid;	/* PORT_PARENT_ID */
 		u8 stp_state;				/* PORT_STP_STATE */
 		unsigned long brport_flags;		/* PORT_BRIDGE_FLAGS */
-		u32 ageing_time;			/* BRIDGE_AGEING_TIME */
+		clock_t ageing_time;			/* BRIDGE_AGEING_TIME */
 		bool vlan_filtering;			/* BRIDGE_VLAN_FILTERING */
 	} u;
 };
-- 
2.9.0

[PATCH v2 net-next v2 08/12] net: dsa: mv88e6xxx: add cap for Priority Override

2016-07-18 Thread Vivien Didelot

Add flags and helpers to describe the presence of Priority Override
Table (POT) related registers and simplify the setup of Global 2.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 37 +--
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  7 +++
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 4e8bdc5..0864a78 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3176,6 +3176,29 @@ static int mv88e6xxx_g2_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
return err;
 }
 
+static int mv88e6xxx_g2_pot_write(struct mv88e6xxx_chip *chip, int pointer,
+ u8 data)
+{
+   u16 val = (pointer << 8) | (data & 0x7);
+
+   return mv88e6xxx_update_write(chip, REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE,
+ val);
+}
+
+static int mv88e6xxx_g2_clear_pot(struct mv88e6xxx_chip *chip)
+{
+   int i, err;
+
+   /* Clear all sixteen possible Priority Override entries */
+   for (i = 0; i < 16; i++) {
+   err = mv88e6xxx_g2_pot_write(chip, i, 0);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
u16 reg;
@@ -3233,17 +3256,11 @@ static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
return err;
}
 
-   if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_POT)) {
/* Clear the priority override table. */
-   for (i = 0; i < 16; i++) {
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL2,
-  GLOBAL2_PRIO_OVERRIDE,
-  0x8000 | (i << 8));
-   if (err)
-   return err;
-   }
+   err = mv88e6xxx_g2_clear_pot(chip);
+   if (err)
+   return err;
}
 
if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) ||
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 4e03650..06b11fb 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -396,6 +396,7 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_G2_PVT_ADDR,  /* (0x0b) Cross Chip Port VLAN Addr */
MV88E6XXX_CAP_G2_PVT_DATA,  /* (0x0c) Cross Chip Port VLAN Data */
MV88E6XXX_CAP_G2_SWITCH_MAC,/* (0x0d) Switch MAC/WoL/WoF */
+   MV88E6XXX_CAP_G2_POT,   /* (0x0f) Priority Override Table */
 
/* Multi-chip Addressing Mode.
 * Some chips require an indirect SMI access when their SMI device
@@ -442,6 +443,7 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_G2_PVT_ADDR BIT(MV88E6XXX_CAP_G2_PVT_ADDR)
 #define MV88E6XXX_FLAG_G2_PVT_DATA BIT(MV88E6XXX_CAP_G2_PVT_DATA)
 #define MV88E6XXX_FLAG_G2_SWITCH_MAC   BIT(MV88E6XXX_CAP_G2_SWITCH_MAC)
+#define MV88E6XXX_FLAG_G2_POT  BIT(MV88E6XXX_CAP_G2_POT)
 #define MV88E6XXX_FLAG_MULTI_CHIP  BIT(MV88E6XXX_CAP_MULTI_CHIP)
 #define MV88E6XXX_FLAG_PPU BIT(MV88E6XXX_CAP_PPU)
 #define MV88E6XXX_FLAG_PPU_ACTIVE  BIT(MV88E6XXX_CAP_PPU_ACTIVE)
@@ -467,6 +469,7 @@ enum mv88e6xxx_cap {
(MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU |   \
 MV88E6XXX_FLAG_STU |   \
@@ -478,6 +481,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_STU |   \
 MV88E6XXX_FLAG_TEMP |  \
@@ -498,6 +502,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU_ACTIVE |\
 MV88E6XXX_FLAG_SMI_PHY |   \
@@ -511,6 +516,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
+MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAG_MULTI_CHIP |\
 MV88E6XXX_FLAG_PPU_ACTIVE |\
 MV88E6XXX_FLAG_SMI_PHY |   \
@@ -526,6 +532,7 @@ enum

[PATCH v2 net-next v2 00/12] net: dsa: mv88e6xxx: Global2 cleanup and STP

2016-07-18 Thread Vivien Didelot

The Marvell switch registers are organized in distinct internal SMI
devices, such as the PHY, Port, Global 1 and Global 2 register sets.

Since not all chips support every register set, and some have slight
differences in them (such as the old 88E6060, or the new 88E6390 which is
likely to be supported soon), make the setup code clearer now by removing
a few family checks and adding flags to describe the Global 2 register map.

This patchset enables basic STP support and bridging on most chips by
getting rid of a few inconsistencies in chip descriptions (patch 1), and
adds bridge Ageing Time support to DSA and the mv88e6xxx driver.
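
As a rough illustration of the flag approach described above, here is a
minimal standalone sketch. All demo_* names are hypothetical and the flag
sets are made up for the example; only the pattern (an enum of capabilities,
BIT() masks, per-family flag sets and a "has" check) follows what the
patches do.

```c
#include <stdbool.h>

#define BIT(n) (1UL << (n))

/* Hypothetical, trimmed-down capability list (illustrative names only). */
enum demo_cap {
	DEMO_CAP_GLOBAL2,       /* chip has a Global 2 register set */
	DEMO_CAP_G2_SWITCH_MAC, /* indirect Switch MAC register in Global 2 */
	DEMO_CAP_G2_POT,        /* Priority Override Table in Global 2 */
};

/* Bitmask of capabilities */
#define DEMO_FLAG_GLOBAL2       BIT(DEMO_CAP_GLOBAL2)
#define DEMO_FLAG_G2_SWITCH_MAC BIT(DEMO_CAP_G2_SWITCH_MAC)
#define DEMO_FLAG_G2_POT        BIT(DEMO_CAP_G2_POT)

/* Made-up family descriptions: each family is the OR of its flags. */
#define DEMO_FLAGS_FAMILY_OLDER 0UL
#define DEMO_FLAGS_FAMILY_NEWER \
	(DEMO_FLAG_GLOBAL2 |       \
	 DEMO_FLAG_G2_SWITCH_MAC | \
	 DEMO_FLAG_G2_POT)

struct demo_chip {
	const char *name;
	unsigned long flags;
};

/* True only if the chip advertises every flag in the requested mask. */
static bool demo_has(const struct demo_chip *chip, unsigned long flags)
{
	return (chip->flags & flags) == flags;
}
```

With this in place the setup code can test demo_has(chip, DEMO_FLAG_G2_POT)
instead of listing every chip family that happens to have the register.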

Changes v1 -> v2:
  - add a write helper for pointer-data Update registers
  - add ageing time support

Vivien Didelot (12):
  net: dsa: mv88e6xxx: remove basic function flags
  net: dsa: mv88e6xxx: split setup of Global 1 and 2
  net: dsa: mv88e6xxx: extract device mapping
  net: dsa: mv88e6xxx: extract trunk mapping
  net: dsa: mv88e6xxx: add cap for MGMT Enables bits
  net: dsa: mv88e6xxx: rework Switch MAC setter
  net: dsa: mv88e6xxx: add cap for PVT
  net: dsa: mv88e6xxx: add cap for Priority Override
  net: dsa: mv88e6xxx: add cap for IRL
  net: dsa: support switchdev ageing time attr
  net: dsa: mv88e6xxx: add G1 helper for ageing time
  net: dsa: mv88e6xxx: add support for DSA ageing time

 drivers/net/dsa/mv88e6xxx/chip.c  | 521 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 148 ++
 include/net/dsa.h |   1 +
 net/dsa/slave.c   |  22 ++
 4 files changed, 457 insertions(+), 235 deletions(-)

-- 
2.9.0

Re: [PATCH 3/3] mac80211: mesh: fixed HT ies in beacon template

2016-07-18 Thread Johannes Berg

On Mon, 2016-07-18 at 09:38 -0400, Bob Copeland wrote:
> On Wed, Jul 13, 2016 at 02:45:40PM +0300, Yaniv Machani wrote:
> > The HT capab info field inside the HT capab IE of the mesh beacon
> > is incorrect (in the case of 20MHz channel width).
> > To fix this driver will check configuration from cfg and
> > will build it accordingly.
> 
> > +/* determine capability flags */
> > +   cap = sband->ht_cap.cap;
> > +
> > +/* if channel width is 20MHz - configure HT capab
> > accordingly*/
> > +   if (sdata->vif.bss_conf.chandef.width ==
> > NL80211_CHAN_WIDTH_20) {
> > +   cap &= ~IEEE80211_HT_CAP_SUP_WIDTH_20_40;
> > +   cap &= ~IEEE80211_HT_CAP_DSSSCCK40;
> > +   }
> 
> Is it required that HT capability match the HT operation in this
> case?
> 

Is there ever a case that HT *capability* should be restricted
artificially like that? I can't remember any cases - we do something
like that to work around broken APs in some cases, but here?

johannes

[PATCH v2 net-next v2 02/12] net: dsa: mv88e6xxx: split setup of Global 1 and 2

2016-07-18 Thread Vivien Didelot

Separate the setup of the Global 1 and Global 2 internal SMI devices and
add a flag to describe the presence of this second register set.

Also rearrange the G1 setup to follow the register order.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 71 ++-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 23 +---
 2 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 3feb842..1e39fa6 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2993,13 +2993,12 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
return 0;
 }
 
-static int mv88e6xxx_setup_global(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
u32 upstream_port = dsa_upstream_port(ds);
u16 reg;
int err;
-   int i;
 
/* Enable the PHY Polling Unit if present, don't discard any packets,
 * and mask all interrupt sources.
@@ -3040,6 +3039,16 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
+   /* Clear all the VTU and STU entries */
+   err = _mv88e6xxx_vtu_stu_flush(chip);
+   if (err < 0)
+   return err;
+
+   /* Clear all ATU entries */
+   err = _mv88e6xxx_atu_flush(chip, 0, true);
+   if (err)
+   return err;
+
/* Configure the IP ToS mapping registers. */
err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_IP_PRI_0, 0x0000);
if (err)
@@ -3071,6 +3080,26 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
+   /* Clear the statistics counters for all ports */
+   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_STATS_OP,
+  GLOBAL_STATS_OP_FLUSH_ALL);
+   if (err)
+   return err;
+
+   /* Wait for the flush to complete. */
+   err = _mv88e6xxx_stats_wait(chip);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
+{
+   struct dsa_switch *ds = chip->ds;
+   int err;
+   int i;
+
/* Send all frames with destination addresses matching
 * 01:80:c2:00:00:0x to the CPU port.
 */
@@ -3174,28 +3203,7 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_chip *chip)
}
}
 
-   /* Clear the statistics counters for all ports */
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_STATS_OP,
-  GLOBAL_STATS_OP_FLUSH_ALL);
-   if (err)
-   return err;
-
-   /* Wait for the flush to complete. */
-   err = _mv88e6xxx_stats_wait(chip);
-   if (err)
-   return err;
-
-   /* Clear all ATU entries */
-   err = _mv88e6xxx_atu_flush(chip, 0, true);
-   if (err)
-   return err;
-
-   /* Clear all the VTU and STU entries */
-   err = _mv88e6xxx_vtu_stu_flush(chip);
-   if (err < 0)
-   return err;
-
-   return err;
+   return 0;
 }
 
 static int mv88e6xxx_setup(struct dsa_switch *ds)
@@ -3216,12 +3224,21 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
 
-   err = mv88e6xxx_setup_global(chip);
+   /* Setup Switch Port Registers */
+   for (i = 0; i < chip->info->num_ports; i++) {
+   err = mv88e6xxx_setup_port(chip, i);
+   if (err)
+   goto unlock;
+   }
+
+   /* Setup Switch Global 1 Registers */
+   err = mv88e6xxx_g1_setup(chip);
if (err)
goto unlock;
 
-   for (i = 0; i < chip->info->num_ports; i++) {
-   err = mv88e6xxx_setup_port(chip, i);
+   /* Setup Switch Global 2 Registers */
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
+   err = mv88e6xxx_g2_setup(chip);
if (err)
goto unlock;
}
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 2ff62f4..390dac5 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -383,6 +383,11 @@ enum mv88e6xxx_cap {
 */
MV88E6XXX_CAP_EEPROM,
 
+   /* Switch Global 2 Registers.
+* The device contains a second set of global 16-bit registers.
+*/
+   MV88E6XXX_CAP_GLOBAL2,
+
/* Multi-chip Addressing Mode.
 * Some chips require an indirect SMI access when their SMI device
 * address is not zero. See SMI_CMD and SMI_DATA.
@@ -429,6 +434,7 @@ enum mv88e6xxx_cap {
 /* Bitmask of capabilities */
 #define MV88E6XXX_FLAG_EEE BIT(MV88E6XXX_CAP_EEE)
 #define

[PATCH v2 net-next v2 06/12] net: dsa: mv88e6xxx: rework Switch MAC setter

2016-07-18 Thread Vivien Didelot

Switches such as the 88E6185 have 3 Switch MAC registers in Global 1. Newer
chips such as the 88E6352 have freed these registers in favor of an indirect
access through a Switch MAC/WoL/WoF register in Global 2.

Make this difference explicit with G1 and G2 helpers and flags.
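
To make the two access styles concrete, here is a small standalone sketch
of how the six MAC address bytes are packed in each case. The demo_* names
are hypothetical, and DEMO_UPDATE_BIT (0x8000) is an assumption standing in
for the busy/update bit of the Global 2 register; no real register I/O is
performed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the busy/update bit in the indirect register. */
#define DEMO_UPDATE_BIT 0x8000u

/*
 * Direct style (Global 1): three 16-bit registers, each holding two
 * consecutive address bytes with the lower-indexed byte in the high half.
 */
static void demo_mac_to_g1_words(const uint8_t addr[6], uint16_t words[3])
{
	size_t i;

	for (i = 0; i < 3; i++)
		words[i] = (uint16_t)((addr[2 * i] << 8) | addr[2 * i + 1]);
}

/*
 * Indirect style (Global 2): one register written six times, each write
 * carrying the byte index ("pointer") in bits 8 and up and the byte
 * value in the low 8 bits, with the update bit set.
 */
static uint16_t demo_g2_mac_word(unsigned int pointer, uint8_t data)
{
	return (uint16_t)(DEMO_UPDATE_BIT | (pointer << 8) | data);
}
```

For the address 00:11:22:33:44:55 the direct style yields the words 0x0011,
0x2233 and 0x4455, while the indirect style issues six writes, the last one
being 0x8555 (pointer 5, data 0x55).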

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 121 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  19 ++
 2 files changed, 65 insertions(+), 75 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 818..615c153 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -283,68 +283,6 @@ static int mv88e6xxx_reg_write(struct mv88e6xxx_chip *chip, int addr,
return ret;
 }
 
-static int mv88e6xxx_set_addr_direct(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-   int err;
-
-   err = mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_01,
- (addr[0] << 8) | addr[1]);
-   if (err)
-   return err;
-
-   err = mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_23,
- (addr[2] << 8) | addr[3]);
-   if (err)
-   return err;
-
-   return mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_MAC_45,
-  (addr[4] << 8) | addr[5]);
-}
-
-static int mv88e6xxx_set_addr_indirect(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-   int ret;
-   int i;
-
-   for (i = 0; i < 6; i++) {
-   int j;
-
-   /* Write the MAC address byte. */
-   ret = mv88e6xxx_reg_write(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MAC,
- GLOBAL2_SWITCH_MAC_BUSY |
- (i << 8) | addr[i]);
-   if (ret)
-   return ret;
-
-   /* Wait for the write to complete. */
-   for (j = 0; j < 16; j++) {
-   ret = mv88e6xxx_reg_read(chip, REG_GLOBAL2,
-GLOBAL2_SWITCH_MAC);
-   if (ret < 0)
-   return ret;
-
-   if ((ret & GLOBAL2_SWITCH_MAC_BUSY) == 0)
-   break;
-   }
-   if (j == 16)
-   return -ETIMEDOUT;
-   }
-
-   return 0;
-}
-
-static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
-
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_SWITCH_MAC))
-   return mv88e6xxx_set_addr_indirect(ds, addr);
-   else
-   return mv88e6xxx_set_addr_direct(ds, addr);
-}
-
 static int mv88e6xxx_mdio_read_direct(struct mv88e6xxx_chip *chip,
  int addr, int regnum)
 {
@@ -3019,6 +2957,24 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
return 0;
 }
 
+static int mv88e6xxx_g1_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
+{
+   int err;
+
+   err = mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_01,
+ (addr[0] << 8) | addr[1]);
+   if (err)
+   return err;
+
+   err = mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_23,
+ (addr[2] << 8) | addr[3]);
+   if (err)
+   return err;
+
+   return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_MAC_45,
+  (addr[4] << 8) | addr[5]);
+}
+
 static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
struct dsa_switch *ds = chip->ds;
@@ -3197,6 +3153,29 @@ static int mv88e6xxx_g2_clear_trunk(struct mv88e6xxx_chip *chip)
return 0;
 }
 
+/* Indirect write to the Switch MAC/WoL/WoF register */
+static int mv88e6xxx_g2_switch_mac_write(struct mv88e6xxx_chip *chip,
+unsigned int pointer, u8 data)
+{
+   u16 val = (pointer << 8) | data;
+
+   return mv88e6xxx_update_write(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MAC,
+ val);
+}
+
+static int mv88e6xxx_g2_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
+{
+   int i, err;
+
+   for (i = 0; i < 6; i++) {
+   err = mv88e6xxx_g2_switch_mac_write(chip, i, addr[i]);
+   if (err)
+   break;
+   }
+
+   return err;
+}
+
 static int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
 {
u16 reg;
@@ -3330,6 +3309,24 @@ unlock:
return err;
 }
 
+static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
+{
+   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
+   int err;
+
+   mutex_lock(&chip->reg_lock);
+
+   /* Has an indirect Switch MAC/WoL/WoF register in Global 2? */
+   if (mv88e6xxx_has(chip,
