date:20151216

Re: [PATCH 01/15] i40e: Add support for client interface for IWARP driver

2015-12-16 Thread Joe Perches

On Wed, 2015-12-16 at 13:58 -0600, Faisal Latif wrote:
> From: Anjali Singhai Jain 
> 
> This patch adds a client interface to support the i40iw driver.
> It also extends the virtchannel to support messages
> from the i40evf driver on behalf of the i40iwvf driver.
[]
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
[]
> + * Contact Information:
> + * e1000-devel Mailing List 

trivia:

This should probably be: intel-wired-...@lists.osuosl.org

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 15/15] i40iw: changes for build of i40iw module

2015-12-16 Thread Christoph Hellwig

> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -5,6 +5,7 @@
>  
>  enum {
>   RDMA_NL_RDMA_CM = 1,
> + RDMA_NL_I40IW,
>   RDMA_NL_NES,
>   RDMA_NL_C4IW,
>   RDMA_NL_LS, /* RDMA Local Services */

This changes the values of the existing RDMA_NL_NES, RDMA_NL_C4IW and
RDMA_NL_LS symbols.  Please add your new value at the end.  It should
also probably be a separate patch, as it's not related to the build
system and is referenced by the earlier patches.
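Enumerators without an explicit initializer take the previous value plus one, so inserting RDMA_NL_I40IW right after RDMA_NL_RDMA_CM renumbers every later symbol, and userspace built against the old uapi header would then disagree with the kernel about those values. A compile-time illustration, using hypothetical OLD_/MID_/END_ copies of the enum rather than the real rdma_netlink.h:

```c
/* Before the patch. */
enum old_rdma_nl {
	OLD_RDMA_NL_RDMA_CM = 1,
	OLD_RDMA_NL_NES,          /* 2 */
	OLD_RDMA_NL_C4IW,         /* 3 */
	OLD_RDMA_NL_LS,           /* 4 */
};

/* As posted: new value inserted in the middle. */
enum mid_rdma_nl {
	MID_RDMA_NL_RDMA_CM = 1,
	MID_RDMA_NL_I40IW,        /* 2, the value NES used to have */
	MID_RDMA_NL_NES,          /* 3 */
	MID_RDMA_NL_C4IW,         /* 4 */
	MID_RDMA_NL_LS,           /* 5 */
};

/* As requested: new value appended at the end. */
enum end_rdma_nl {
	END_RDMA_NL_RDMA_CM = 1,
	END_RDMA_NL_NES,          /* 2 */
	END_RDMA_NL_C4IW,         /* 3 */
	END_RDMA_NL_LS,           /* 4 */
	END_RDMA_NL_I40IW,        /* 5 */
};

/* Appending preserves every existing value... */
_Static_assert(END_RDMA_NL_NES == OLD_RDMA_NL_NES, "NES unchanged");
_Static_assert(END_RDMA_NL_LS == OLD_RDMA_NL_LS, "LS unchanged");
/* ...while inserting shifts them. */
_Static_assert(MID_RDMA_NL_NES == OLD_RDMA_NL_NES + 1, "NES shifted");
_Static_assert(MID_RDMA_NL_I40IW == OLD_RDMA_NL_NES, "I40IW reuses old NES value");
```

If the file compiles, the assertions hold; the same reasoning applies to any enum exported through a uapi header.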

Re: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread Eric Dumazet

On Wed, 2015-12-16 at 10:03 -0800, Haiyang Zhang wrote:
> To avoid performance overhead when using skb_flow_dissect_flow_keys(),
> we switch to the simple parsers to get the IP and port numbers.
> 
> Performance comparison: throughput (Gbps):
> Number of connections, before patch, after patch
> 1     8.56   10.18
> 4     11.17  14.07
> 16    12.21  21.78
> 64    18.71  32.08
> 256   15.92  26.32
> 1024  8.41   15.49
> 3000  7.82   11.58
> 
> Signed-off-by: Haiyang Zhang 
> Tested-by: Simon Xiao 
> Reviewed-by: K. Y. Srinivasan 
> ---
>  drivers/net/hyperv/netvsc_drv.c |   38 +-
>  1 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1c8db9a..e28951f 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -237,20 +237,40 @@ static u32 comp_hash(u8 *key, int klen, void *data, int dlen)
>  
>  static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
>  {
> - struct flow_keys flow;
> + struct iphdr *iphdr;
> + struct ipv6hdr *ipv6hdr;
> + __be32 dbuf[9];
>   int data_len;
>  
> - if (!skb_flow_dissect_flow_keys(skb, &flow, 0) ||
> - !(flow.basic.n_proto == htons(ETH_P_IP) ||
> -   flow.basic.n_proto == htons(ETH_P_IPV6)))
> + if (eth_hdr(skb)->h_proto != htons(ETH_P_IP) &&
> + eth_hdr(skb)->h_proto != htons(ETH_P_IPV6))
>   return false;
>  
> - if (flow.basic.ip_proto == IPPROTO_TCP)
> - data_len = 12;
> - else
> - data_len = 8;
> + iphdr = ip_hdr(skb);
> + ipv6hdr = ipv6_hdr(skb);
> +
> + if (iphdr->version == 4) {
> + dbuf[0] = iphdr->saddr;
> + dbuf[1] = iphdr->daddr;
> + if (iphdr->protocol == IPPROTO_TCP) {
> + dbuf[2] = *(__be32 *)&tcp_hdr(skb)->source;
> + data_len = 12;
> + } else {
> + data_len = 8;
> + }
> + } else if (ipv6hdr->version == 6) {
> + memcpy(dbuf, &ipv6hdr->saddr, 32);
> + if (ipv6hdr->nexthdr == IPPROTO_TCP) {
> + dbuf[8] = *(__be32 *)&tcp_hdr(skb)->source;
> + data_len = 36;
> + } else {
> + data_len = 32;
> + }
> + } else {
> + return false;
> + }
>  
> - *hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, &flow, data_len);
> + *hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, dbuf, data_len);
>  
>   return true;
>  }


This looks very, very wrong to me.

How many times per second is this called in the single-flow case?

Don't you use TSO in this driver?

What about encapsulation?

I suspect you have a quite different issue here.

You could simply use skb_get_hash(), since local TCP flows will provide
an L4 skb->hash and there is no further flow dissection to do.
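The point about skb_get_hash() is that the hash is computed at most once per skb and then cached (or supplied by TCP as an L4 hash), so the driver does not need to re-parse headers per packet. The sketch below is a userspace model of that caching pattern only; `struct fake_skb`, `fake_get_hash()` and `slow_flow_dissect()` are hypothetical stand-ins, not kernel APIs, and the "dissector" is a placeholder mixing function that counts how often it runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few sk_buff fields the hash cache uses. */
struct fake_skb {
	uint32_t hash;      /* cached flow hash */
	bool l4_hash;       /* hash already covers L4 ports (e.g. set by TCP) */
	bool sw_hash;       /* hash was computed in software earlier */
	int dissect_calls;  /* how often the slow path ran */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Placeholder for expensive flow dissection: mixes the 4-tuple.
 * This is NOT the kernel's flow hash, just something deterministic. */
static void slow_flow_dissect(struct fake_skb *skb)
{
	uint32_t h = skb->saddr ^ (skb->daddr * 2654435761u);

	h ^= ((uint32_t)skb->sport << 16) | skb->dport;
	skb->hash = h;
	skb->sw_hash = true;
	skb->dissect_calls++;
}

/* Same shape as skb_get_hash(): only dissect when nothing is cached. */
static uint32_t fake_get_hash(struct fake_skb *skb)
{
	if (!skb->l4_hash && !skb->sw_hash)
		slow_flow_dissect(skb);
	return skb->hash;
}
```

In the driver this would presumably reduce the hashing step to `*hash = skb_get_hash(skb);`, assuming the host accepts the stack's hash rather than the one comp_hash() computes with netvsc_hash_key.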





Re: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread Sergei Shtylyov


Hello.

On 12/16/2015 09:03 PM, Haiyang Zhang wrote:


> To avoid performance overhead when using skb_flow_dissect_flow_keys(),
> we switch to the simple parsers to get the IP and port numbers.
>
> Performance comparison: throughput (Gbps):
> Number of connections, before patch, after patch
> 1     8.56   10.18
> 4     11.17  14.07
> 16    12.21  21.78
> 64    18.71  32.08
> 256   15.92  26.32
> 1024  8.41   15.49
> 3000  7.82   11.58
>
> Signed-off-by: Haiyang Zhang
> Tested-by: Simon Xiao
> Reviewed-by: K. Y. Srinivasan
> ---
>   drivers/net/hyperv/netvsc_drv.c |   38 +-
>   1 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1c8db9a..e28951f 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -237,20 +237,40 @@ static u32 comp_hash(u8 *key, int klen, void *data, int dlen)

[...]

> +   if (iphdr->version == 4) {
> +   dbuf[0] = iphdr->saddr;
> +   dbuf[1] = iphdr->daddr;
> +   if (iphdr->protocol == IPPROTO_TCP) {
> +   dbuf[2] = *(__be32 *)&tcp_hdr(skb)->source;
> +   data_len = 12;
> +   } else {
> +   data_len = 8;
> +   }
> +   } else if (ipv6hdr->version == 6) {
> +   memcpy(dbuf, &ipv6hdr->saddr, 32);
> +   if (ipv6hdr->nexthdr == IPPROTO_TCP) {
> +   dbuf[8] = *(__be32 *)&tcp_hdr(skb)->source;
> +   data_len = 36;
> +   } else {
> +   data_len = 32;
> +   }
> +   } else {
> +   return false;
> +   }


   This is asking to be a *switch* statement.

[...]

MBR, Sergei
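The switch-based shape Sergei suggests can be sketched in userspace C. This is not the driver code: it parses a raw IP header from a byte buffer instead of using ip_hdr()/ipv6_hdr(), assumes the TCP header directly follows the IP header (no IPv6 extension headers, no length validation), and `build_hash_input()` is a hypothetical name. Dispatching once on the version nibble also makes explicit that the patch's `iphdr->version` and `ipv6hdr->version` tests read the same 4 bits of the same byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PROTO_TCP = 6 };   /* value of IPPROTO_TCP */

/*
 * Hypothetical helper: copy the fields the patch hashes into dbuf and
 * return how many bytes were filled, or -1 if the packet is neither IPv4
 * nor IPv6. `pkt` points at the IP header; the caller must guarantee the
 * buffer holds the IP header plus at least 4 bytes of TCP header.
 */
static int build_hash_input(const uint8_t *pkt, uint8_t dbuf[36])
{
	switch (pkt[0] >> 4) {                  /* IP version nibble */
	case 4: {
		size_t ihl = (size_t)(pkt[0] & 0x0f) * 4; /* IPv4 header length */

		memcpy(dbuf, pkt + 12, 8);      /* saddr + daddr */
		if (pkt[9] != PROTO_TCP)        /* protocol field */
			return 8;
		memcpy(dbuf + 8, pkt + ihl, 4); /* TCP source + dest port */
		return 12;
	}
	case 6:
		memcpy(dbuf, pkt + 8, 32);      /* saddr + daddr */
		if (pkt[6] != PROTO_TCP)        /* nexthdr; extension headers ignored */
			return 32;
		memcpy(dbuf + 32, pkt + 40, 4); /* TCP ports after fixed 40-byte header */
		return 36;
	default:
		return -1;
	}
}
```

The returned lengths (8/12 for IPv4, 32/36 for IPv6) match the data_len values in the posted patch.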


[PATCH 04/15] i40iw: add puda code

2015-12-16 Thread Faisal Latif

i40iw_puda.[ch] handle iWARP connection packets as well as
exception packets over multiple privileged-mode UDA queues.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_puda.c | 1443 ++
 drivers/infiniband/hw/i40iw/i40iw_puda.h |  183 
 2 files changed, 1626 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_puda.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_puda.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_puda.c b/drivers/infiniband/hw/i40iw/i40iw_puda.c
new file mode 100644
index 000..8e628af
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_puda.c
@@ -0,0 +1,1443 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_osdep.h"
+#include "i40iw_register.h"
+#include "i40iw_status.h"
+#include "i40iw_hmc.h"
+
+#include "i40iw_d.h"
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+#include "i40iw_puda.h"
+
+static void i40iw_ieq_receive(struct i40iw_sc_dev *dev,
+ struct i40iw_puda_buf *buf);
+static void i40iw_ieq_tx_compl(struct i40iw_sc_dev *dev, void *sqwrid);
+static void i40iw_ilq_putback_rcvbuf(struct i40iw_sc_qp *qp, u32 wqe_idx);
+static enum i40iw_status_code i40iw_puda_replenish_rq(struct i40iw_puda_rsrc
+ *rsrc, bool initial);
+/**
+ * i40iw_puda_get_listbuf - get buffer from puda list
+ * @list: list to use for buffers (ILQ or IEQ)
+ */
+static struct i40iw_puda_buf *i40iw_puda_get_listbuf(struct list_head *list)
+{
+   struct i40iw_puda_buf *buf = NULL;
+
+   if (!list_empty(list)) {
+   buf = (struct i40iw_puda_buf *)list->next;
+   list_del((struct list_head *)&buf->list);
+   }
+   return buf;
+}
+
+/**
+ * i40iw_puda_get_bufpool - return buffer from resource
+ * @rsrc: resource to use for buffer
+ */
+struct i40iw_puda_buf *i40iw_puda_get_bufpool(struct i40iw_puda_rsrc *rsrc)
+{
+   struct i40iw_puda_buf *buf = NULL;
+   struct list_head *list = &rsrc->bufpool;
+   unsigned long   flags;
+
+   spin_lock_irqsave(&rsrc->bufpool_lock, flags);
+   buf = i40iw_puda_get_listbuf(list);
+   if (buf)
+   rsrc->avail_buf_count--;
+   else
+   rsrc->stats_buf_alloc_fail++;
+   spin_unlock_irqrestore(&rsrc->bufpool_lock, flags);
+   return buf;
+}
+
+/**
+ * i40iw_puda_ret_bufpool - return buffer to rsrc list
+ * @rsrc: resource to use for buffer
+ * @buf: buffer to return to resource
+ */
+void i40iw_puda_ret_bufpool(struct i40iw_puda_rsrc *rsrc,
+   struct i40iw_puda_buf *buf)
+{
+   unsigned long   flags;
+
+   spin_lock_irqsave(&rsrc->bufpool_lock, flags);
+   list_add(&buf->list, &rsrc->bufpool);
+   spin_unlock_irqrestore(&rsrc->bufpool_lock, flags);
+   rsrc->avail_buf_count++;
+}
+
+/**
+ * i40iw_puda_post_recvbuf - set wqe for rcv buffer
+ * @rsrc: resource ptr
+ * @wqe_idx: wqe index to use
+ * @buf: puda buffer for rcv q
+ * @initial: flag if during init time
+ */
+static void i40iw_puda_post_recvbuf(struct i40iw_puda_rsrc *rsrc, u32 wqe_idx,
+   struct i40iw_puda_buf *buf, bool initial)
+{
+   u64 *wqe;
+   struct i40iw_sc_qp *qp = &rsrc->qp;
+   u64 offset24 = 0;
+
+   qp->qp_uk.rq_wrid_array[wqe_idx] = (uintptr_t)buf;
+
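In i40iw_puda_get_bufpool() above, avail_buf_count is decremented while bufpool_lock is held, but i40iw_puda_ret_bufpool() increments it after spin_unlock_irqrestore(); assuming the count is a plain integer, concurrent get/put calls could race on it. A minimal userspace analogue that keeps the list and its counter in one critical section is sketched below. It uses a pthread mutex instead of an IRQ-disabling spinlock and a singly linked LIFO instead of list_head; all names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pooled buffer: `next` links it into the free list. */
struct pool_buf {
	struct pool_buf *next;
	int id;
};

/* Free list plus counters, all protected by `lock`. */
struct buf_pool {
	pthread_mutex_t lock;
	struct pool_buf *head;
	unsigned int avail;
	unsigned int alloc_fail;
};

static void pool_init(struct buf_pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->head = NULL;
	p->avail = 0;
	p->alloc_fail = 0;
}

/* Take a buffer, or return NULL and count the failure. */
static struct pool_buf *pool_get(struct buf_pool *p)
{
	struct pool_buf *buf;

	pthread_mutex_lock(&p->lock);
	buf = p->head;
	if (buf) {
		p->head = buf->next;
		p->avail--;
	} else {
		p->alloc_fail++;
	}
	pthread_mutex_unlock(&p->lock);
	return buf;
}

/* Return a buffer; the counter changes under the same lock as the list. */
static void pool_put(struct buf_pool *p, struct pool_buf *buf)
{
	pthread_mutex_lock(&p->lock);
	buf->next = p->head;
	p->head = buf;
	p->avail++;
	pthread_mutex_unlock(&p->lock);
}
```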

[PATCH 11/15] i40iw: add X722 register file

2015-12-16 Thread Faisal Latif

X722 hardware register definitions for the iWARP component.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_register.h | 1027 ++
 1 file changed, 1027 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_register.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_register.h b/drivers/infiniband/hw/i40iw/i40iw_register.h
new file mode 100644
index 000..01da7c5
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_register.h
@@ -0,0 +1,1027 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#ifndef I40IW_REGISTER_H
+#define I40IW_REGISTER_H
+
+#define I40E_GLGEN_STAT   0x000B612C /* Reset: POR */
+
+#define I40E_PFHMC_PDINV   0x000C0300 /* Reset: PFR */
+#define I40E_PFHMC_PDINV_PMSDIDX_SHIFT 0
+#define I40E_PFHMC_PDINV_PMSDIDX_MASK  I40E_MASK(0xFFF, I40E_PFHMC_PDINV_PMSDIDX_SHIFT)
+#define I40E_PFHMC_PDINV_PMPDIDX_SHIFT 16
+#define I40E_PFHMC_PDINV_PMPDIDX_MASK  I40E_MASK(0x1FF, I40E_PFHMC_PDINV_PMPDIDX_SHIFT)
+#define I40E_PFHMC_SDCMD_PMSDWR_SHIFT  31
+#define I40E_PFHMC_SDCMD_PMSDWR_MASK   I40E_MASK(0x1, I40E_PFHMC_SDCMD_PMSDWR_SHIFT)
+#define I40E_PFHMC_SDDATALOW_PMSDVALID_SHIFT   0
+#define I40E_PFHMC_SDDATALOW_PMSDVALID_MASK    I40E_MASK(0x1, I40E_PFHMC_SDDATALOW_PMSDVALID_SHIFT)
+#define I40E_PFHMC_SDDATALOW_PMSDTYPE_SHIFT    1
+#define I40E_PFHMC_SDDATALOW_PMSDTYPE_MASK     I40E_MASK(0x1, I40E_PFHMC_SDDATALOW_PMSDTYPE_SHIFT)
+#define I40E_PFHMC_SDDATALOW_PMSDBPCOUNT_SHIFT 2
+#define I40E_PFHMC_SDDATALOW_PMSDBPCOUNT_MASK  I40E_MASK(0x3FF, I40E_PFHMC_SDDATALOW_PMSDBPCOUNT_SHIFT)
+
+#define I40E_PFINT_DYN_CTLN(_INTPF) (0x00034800 + ((_INTPF) * 4)) /* _i=0...511 */ /* Reset: PFR */
+#define I40E_PFINT_DYN_CTLN_INTENA_SHIFT   0
+#define I40E_PFINT_DYN_CTLN_INTENA_MASK    I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_INTENA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_CLEARPBA_SHIFT 1
+#define I40E_PFINT_DYN_CTLN_CLEARPBA_MASK  I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_CLEARPBA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3
+#define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK  I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT)
+
+#define I40E_VFINT_DYN_CTLN1(_INTVF)   (0x3800 + ((_INTVF) * 4)) /* _i=0...15 */ /* Reset: VFR */
+#define I40E_GLHMC_VFPDINV(_i)   (0x000C8300 + ((_i) * 4)) /* _i=0...31 */ /* Reset: CORER */
+
+#define I40E_PFHMC_PDINV_PMSDPARTSEL_SHIFT 15
+#define I40E_PFHMC_PDINV_PMSDPARTSEL_MASK  I40E_MASK(0x1, I40E_PFHMC_PDINV_PMSDPARTSEL_SHIFT)
+#define I40E_GLPCI_LBARCTRL    0x000BE484 /* Reset: POR */
+#define I40E_GLPCI_LBARCTRL_PE_DB_SIZE_SHIFT 4
+#define I40E_GLPCI_LBARCTRL_PE_DB_SIZE_MASK  I40E_MASK(0x3, I40E_GLPCI_LBARCTRL_PE_DB_SIZE_SHIFT)
+
+#define I40E_PFPE_AEQALLOC   0x00131180 /* Reset: PFR */
+#define I40E_PFPE_AEQALLOC_AECOUNT_SHIFT 0
+#define I40E_PFPE_AEQALLOC_AECOUNT_MASK  I40E_MASK(0x, I40E_PFPE_AEQALLOC_AECOUNT_SHIFT)
+#define I40E_PFPE_CCQPHIGH  0x8200 /* Reset: PFR */
+#define I40E_PFPE_CCQPHIGH_PECCQPHIGH_SHIFT 0
+#define I40E_PFPE_CCQPHIGH_PECCQPHIGH_MASK  I40E_MASK(0x, I40E_PFPE_CCQPHIGH_PECCQPHIGH_SHIFT)
+#define I40E_PFPE_CCQPLOW 0x8180 /* Reset: PFR */
+#define I40E_PFPE_CCQPLOW_PECCQPLOW_SHIFT 0

[PATCH] af_unix: Revert 'lock_interruptible' in stream receive code

2015-12-16 Thread Rainer Weikusat

With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem for the stream receive code (as opposed to its datagram
counterpart): it never went to sleep waiting for new messages with the
mutex held and thus couldn't cause secondary readers to block on the
mutex waiting for a sleeping primary reader. As the interruptible
locking makes the code more complicated for no benefit, change it back
to using mutex_lock.

Signed-off-by: Rainer Weikusat 
---

Considering that the datagram receive routine also doesn't go to sleep
with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
change to unix_autobind is now similarly purposeless.

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1c3c1f3..b1314c0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2263,14 +2263,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
/* Lock the socket to prevent queue disordering
 * while sleeps in memcpy_tomsg
 */
-   err = mutex_lock_interruptible(&u->readlock);
-   if (unlikely(err)) {
-   /* recvmsg() in non blocking mode is supposed to return -EAGAIN
-* sk_rcvtimeo is not honored by mutex_lock_interruptible()
-*/
-   err = noblock ? -EAGAIN : -ERESTARTSYS;
-   goto out;
-   }
+   mutex_lock(&u->readlock);
 
if (flags & MSG_PEEK)
skip = sk_peek_offset(sk, flags);
@@ -2314,12 +2307,12 @@ again:
timeo = unix_stream_data_wait(sk, timeo, last,
  last_len);
 
-   if (signal_pending(current) ||
-   mutex_lock_interruptible(&u->readlock)) {
+   if (signal_pending(current)) {
err = sock_intr_errno(timeo);
goto out;
}
 
+   mutex_lock(&u->readlock);
continue;
 unlock:
unix_state_unlock(sk);

Re: [PATCH v7 0/4] Support administratively closing application sockets

2015-12-16 Thread Jamal Hadi Salim


On 15-12-16 10:50 AM, Eric Dumazet wrote:
> On Wed, 2015-12-16 at 07:43 -0800, Stephen Hemminger wrote:
>> I see no security checks in the diag infrastructure.
>> Up until now diag has been read-only access and therefore has been
>> allowed for all users.
>
> It is still allowed to all users.
>
> Only the 'destroy' operation is restricted.


The question I had was the opposite when I saw this: why are
regular users allowed to read admin (and any other users') details? ;->
On this specific feature: why can't I, as a regular user, close
connections attributed to me (without needing CAP_NET_ADMIN)?

cheers,
jamal


Re: net: heap-out-of-bounds in sock_setsockopt

2015-12-16 Thread Cong Wang

On Wed, Dec 16, 2015 at 11:34 AM, Dmitry Vyukov  wrote:
> BUG: KASAN: slab-out-of-bounds in sock_setsockopt+0x1284/0x13d0 at
> addr 88006563ec10
> Read of size 4 by task syzkaller_execu/4755
> =
> BUG RAWv6 (Not tainted): kasan: bad access detected
> -
> INFO: Allocated in sk_prot_alloc+0x69/0x340 age=17 cpu=3 pid=4755
> [<  none  >] kmem_cache_alloc+0x244/0x2c0 mm/slub.c:2607
> [<  none  >] sk_prot_alloc+0x69/0x340 net/core/sock.c:1343
> [<  none  >] sk_alloc+0x3a/0x6b0 net/core/sock.c:1418
> [<  none  >] inet6_create+0x2c4/0xfd0 net/ipv6/af_inet6.c:170
> [<  none  >] __sock_create+0x37c/0x640 net/socket.c:1162
> [< inline >] sock_create net/socket.c:1202
> [< inline >] SYSC_socket net/socket.c:1232
> [<  none  >] SyS_socket+0xef/0x1b0 net/socket.c:1212
> [<  none  >] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
>
> Call Trace:
>  [] __asan_report_load4_noabort+0x3e/0x40
> mm/kasan/report.c:294
>  [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
>  [< inline >] SYSC_setsockopt net/socket.c:1746
>  [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
>  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185

Hmm, we should exclude the raw socket case with something like the
following. I am not sure whether the check is too strict, nor whether
we should return an error for this raw socket case.

diff --git a/net/core/sock.c b/net/core/sock.c
index 765be83..c26e80a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -872,7 +872,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,

if (val & SOF_TIMESTAMPING_OPT_ID &&
!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
-   if (sk->sk_protocol == IPPROTO_TCP) {
+   if (sk->sk_protocol == IPPROTO_TCP && sk->sk_type == SOCK_STREAM) {
if (sk->sk_state != TCP_ESTABLISHED) {
ret = -EINVAL;
break;
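The report involves a RAWv6 socket object: a raw socket can be created with protocol IPPROTO_TCP (for example `socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)`), so `sk->sk_protocol == IPPROTO_TCP` alone does not guarantee a TCP socket, and the code under that branch presumably goes on to read TCP-only state past the end of the smaller raw socket. The predicate change can be illustrated in userspace; `old_check()` and `is_tcp_stream()` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <sys/socket.h>   /* SOCK_STREAM, SOCK_RAW */

/* Old condition: protocol alone, which a SOCK_RAW socket can also match. */
static bool old_check(int protocol)
{
	return protocol == IPPROTO_TCP;
}

/* Proposed condition: only a stream socket running TCP qualifies. */
static bool is_tcp_stream(int protocol, int type)
{
	return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}
```

With the stricter check, a raw socket with protocol IPPROTO_TCP falls through to the non-TCP path instead of being treated as a TCP socket.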

[PATCH 08/15] i40iw: add files for iwarp interface

2015-12-16 Thread Faisal Latif

i40iw_verbs.[ch] handle the iWARP interface.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_ucontext.h |  110 ++
 drivers/infiniband/hw/i40iw/i40iw_verbs.c| 2492 ++
 drivers/infiniband/hw/i40iw/i40iw_verbs.h|  173 ++
 3 files changed, 2775 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_ucontext.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_verbs.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_verbs.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ucontext.h b/drivers/infiniband/hw/i40iw/i40iw_ucontext.h
new file mode 100644
index 000..5c65c25
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_ucontext.h
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2006 - 2015 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef I40IW_USER_CONTEXT_H
+#define I40IW_USER_CONTEXT_H
+
+#include 
+
+#define I40IW_ABI_USERSPACE_VER 4
+#define I40IW_ABI_KERNEL_VER    4
+struct i40iw_alloc_ucontext_req {
+   __u32 reserved32;
+   __u8 userspace_ver;
+   __u8 reserved8[3];
+};
+
+struct i40iw_alloc_ucontext_resp {
+   __u32 max_pds;  /* maximum pds allowed for this user process */
+   __u32 max_qps;  /* maximum qps allowed for this user process */
+   __u32 wq_size;  /* size of the WQs (sq+rq) allocated to the mmaped area */
+   __u8 kernel_ver;
+   __u8 reserved[3];
+};
+
+struct i40iw_alloc_pd_resp {
+   __u32 pd_id;
+   __u8 reserved[4];
+};
+
+struct i40iw_create_cq_req {
+   __u64 user_cq_buffer;
+   __u64 user_shadow_area;
+};
+
+struct i40iw_create_qp_req {
+   __u64 user_wqe_buffers;
+   __u64 user_compl_ctx;
+
+   /* UDA QP PHB */
+   __u64 user_sq_phb;  /* place for VA of the sq phb buff */
+   __u64 user_rq_phb;  /* place for VA of the rq phb buff */
+};
+
+enum i40iw_memreg_type {
+   IW_MEMREG_TYPE_MEM = 0x,
+   IW_MEMREG_TYPE_QP = 0x0001,
+   IW_MEMREG_TYPE_CQ = 0x0002,
+   IW_MEMREG_TYPE_MW = 0x0003,
+   IW_MEMREG_TYPE_FMR = 0x0004,
+   IW_MEMREG_TYPE_FMEM = 0x0005,
+};
+
+struct i40iw_mem_reg_req {
+   __u16 reg_type; /* Memory, QP or CQ */
+   __u16 cq_pages;
+   __u16 rq_pages;
+   __u16 sq_pages;
+};
+
+struct i40iw_create_cq_resp {
+   __u32 cq_id;
+   __u32 cq_size;
+   __u32 mmap_db_index;
+   __u32 reserved;
+};
+
+struct i40iw_create_qp_resp {
+   __u32 qp_id;
+   __u32 actual_sq_size;
+   __u32 actual_rq_size;
+   __u32 i40iw_drv_opt;
+   __u16 push_idx;
+   __u8  lsmm;
+   __u8  rsvd2;
+};
+
+#endif
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
new file mode 100644
index 000..9bdd95f
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -0,0 +1,2492 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree,
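The structures in i40iw_ucontext.h are exchanged between the kernel driver and the userspace provider library, so their layout should not depend on word size or compiler padding; the explicit reserved fields appear to serve that purpose. A sketch of compile-time layout checks one could keep alongside such a header, copying three of the structures with <stdint.h> types standing in for __u8/__u16/__u32/__u64; the expected sizes and offsets are derived by hand from the field lists, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of structures from i40iw_ucontext.h. */
struct i40iw_alloc_ucontext_req {
	uint32_t reserved32;
	uint8_t userspace_ver;
	uint8_t reserved8[3];
};

struct i40iw_create_qp_req {
	uint64_t user_wqe_buffers;
	uint64_t user_compl_ctx;
	uint64_t user_sq_phb;
	uint64_t user_rq_phb;
};

struct i40iw_create_qp_resp {
	uint32_t qp_id;
	uint32_t actual_sq_size;
	uint32_t actual_rq_size;
	uint32_t i40iw_drv_opt;
	uint16_t push_idx;
	uint8_t lsmm;
	uint8_t rsvd2;
};

/* Every field is naturally aligned and explicit reserved bytes fill the
 * gaps, so no implicit padding should appear on common ABIs. */
_Static_assert(sizeof(struct i40iw_alloc_ucontext_req) == 8, "req size");
_Static_assert(offsetof(struct i40iw_alloc_ucontext_req, userspace_ver) == 4,
	       "userspace_ver offset");
_Static_assert(sizeof(struct i40iw_create_qp_req) == 32, "qp req size");
_Static_assert(sizeof(struct i40iw_create_qp_resp) == 20, "qp resp size");
_Static_assert(offsetof(struct i40iw_create_qp_resp, push_idx) == 16,
	       "push_idx offset");
```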

[PATCH 13/15] i40iw: virtual channel handling files

2015-12-16 Thread Faisal Latif

i40iw_vf.[ch] and i40iw_virtchnl.[ch] provide virtual channel
support for the iWARP VF module.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_vf.c   |  85 +++
 drivers/infiniband/hw/i40iw/i40iw_vf.h   |  62 +++
 drivers/infiniband/hw/i40iw/i40iw_virtchnl.c | 750 +++
 drivers/infiniband/hw/i40iw/i40iw_virtchnl.h | 124 +
 4 files changed, 1021 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_vf.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_vf.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_virtchnl.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_virtchnl.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_vf.c b/drivers/infiniband/hw/i40iw/i40iw_vf.c
new file mode 100644
index 000..39bb0ca
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_vf.c
@@ -0,0 +1,85 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_osdep.h"
+#include "i40iw_register.h"
+#include "i40iw_status.h"
+#include "i40iw_hmc.h"
+#include "i40iw_d.h"
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+#include "i40iw_vf.h"
+
+/**
+ * i40iw_manage_vf_pble_bp - manage vf pble
+ * @cqp: cqp for cqp' sq wqe
+ * @info: pble info
+ * @scratch: pointer for completion
+ * @post_sq: to post and ring
+ */
+enum i40iw_status_code i40iw_manage_vf_pble_bp(struct i40iw_sc_cqp *cqp,
+  struct i40iw_manage_vf_pble_info *info,
+  u64 scratch,
+  bool post_sq)
+{
+   u64 *wqe;
+   u64 temp, header, pd_pl_pba = 0;
+
+   wqe = i40iw_sc_cqp_get_next_send_wqe(cqp, scratch);
+   if (!wqe)
+   return I40IW_ERR_RING_FULL;
+
+   temp = LS_64((info->pd_entry_cnt), I40IW_CQPSQ_MVPBP_PD_ENTRY_CNT) |
+   LS_64((info->first_pd_index), I40IW_CQPSQ_MVPBP_FIRST_PD_INX) |
+   LS_64((info->sd_index), I40IW_CQPSQ_MVPBP_SD_INX);
+   set_64bit_val(wqe, 16, temp);
+
+   header = LS_64((info->inv_pd_ent ? 1 : 0), I40IW_CQPSQ_MVPBP_INV_PD_ENT) |
+   LS_64(I40IW_CQP_OP_MANAGE_VF_PBLE_BP, I40IW_CQPSQ_OPCODE) |
+   LS_64(cqp->polarity, I40IW_CQPSQ_WQEVALID);
+   set_64bit_val(wqe, 24, header);
+
+   pd_pl_pba = LS_64(info->pd_pl_pba >> 3, I40IW_CQPSQ_MVPBP_PD_PLPBA);
+   set_64bit_val(wqe, 32, pd_pl_pba);
+
+   i40iw_debug_buf(cqp->dev, I40IW_DEBUG_WQE, "MANAGE VF_PBLE_BP WQE", wqe, I40IW_CQP_WQE_SIZE * 8);
+
+   if (post_sq)
+   i40iw_sc_cqp_post_sq(cqp);
+   return 0;
+}
+
+struct i40iw_vf_cqp_ops iw_vf_cqp_ops = {
+   i40iw_manage_vf_pble_bp
+};
diff --git a/drivers/infiniband/hw/i40iw/i40iw_vf.h b/drivers/infiniband/hw/i40iw/i40iw_vf.h
new file mode 100644
index 000..cfe112d
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_vf.h
@@ -0,0 +1,62 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD
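i40iw_manage_vf_pble_bp() builds each WQE qword by OR-ing together fields shifted and masked with LS_64(), which is defined in i40iw_d.h (patch 10 of this series). The sketch below reproduces that pattern in userspace. The DEMO_* field layouts are hypothetical, since the real I40IW_CQPSQ_* values are not shown here, and unlike the i40iw_d.h versions these macros parenthesize `val`: without that, an argument such as `x ? 1 : 0` or `a | b` would bind incorrectly, which is presumably why the call sites wrap each argument in parentheses.

```c
#include <stdint.h>

typedef uint64_t u64;

/* LS_64()/RS_64() in the style of i40iw_d.h, with `val` parenthesized. */
#define LS_64(val, field) (((u64)(val) << field ## _SHIFT) & (field ## _MASK))
#define RS_64(val, field) (((u64)(val) & field ## _MASK) >> field ## _SHIFT)

/* Hypothetical field layouts for illustration only. */
#define DEMO_OPCODE_SHIFT   32
#define DEMO_OPCODE_MASK    (0x3fULL << DEMO_OPCODE_SHIFT)
#define DEMO_INV_SHIFT      62
#define DEMO_INV_MASK       (1ULL << DEMO_INV_SHIFT)
#define DEMO_WQEVALID_SHIFT 63
#define DEMO_WQEVALID_MASK  (1ULL << DEMO_WQEVALID_SHIFT)

/* Build a header qword: each field is shifted into place, masked so it
 * cannot overflow into its neighbours, and OR-ed into the result. */
static u64 demo_header(unsigned int opcode, int invalidate, int polarity)
{
	return LS_64(invalidate ? 1 : 0, DEMO_INV) |
	       LS_64(opcode, DEMO_OPCODE) |
	       LS_64(polarity, DEMO_WQEVALID);
}
```

Decoding with RS_64() recovers each field, and an opcode wider than its 6-bit field is truncated by the mask rather than corrupting the INV or WQEVALID bits.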

[PATCH 10/15] i40iw: add hardware related header files

2015-12-16 Thread Faisal Latif

Header files for hardware access.

Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_d.h| 1713 ++
 drivers/infiniband/hw/i40iw/i40iw_p.h|  106 ++
 drivers/infiniband/hw/i40iw/i40iw_type.h | 1308 +++
 3 files changed, 3127 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_d.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_p.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_type.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_d.h b/drivers/infiniband/hw/i40iw/i40iw_d.h
new file mode 100644
index 000..f6668d7
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_d.h
@@ -0,0 +1,1713 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#ifndef I40IW_D_H
+#define I40IW_D_H
+
+#define I40IW_DB_ADDR_OFFSET    (4 * 1024 * 1024 - 64 * 1024)
+#define I40IW_VF_DB_ADDR_OFFSET (64 * 1024)
+
+#define I40IW_PUSH_OFFSET   (4 * 1024 * 1024)
+#define I40IW_PF_FIRST_PUSH_PAGE_INDEX 16
+#define I40IW_VF_PUSH_OFFSET    ((8 + 64) * 1024)
+#define I40IW_VF_FIRST_PUSH_PAGE_INDEX 2
+
+#define I40IW_PE_DB_SIZE_4M 1
+#define I40IW_PE_DB_SIZE_8M 2
+
+#define I40IW_DDP_VER 1
+#define I40IW_RDMAP_VER 1
+
+#define I40IW_RDMA_MODE_RDMAC 0
+#define I40IW_RDMA_MODE_IETF  1
+
+#define I40IW_QP_STATE_INVALID 0
+#define I40IW_QP_STATE_IDLE 1
+#define I40IW_QP_STATE_RTS 2
+#define I40IW_QP_STATE_CLOSING 3
+#define I40IW_QP_STATE_RESERVED 4
+#define I40IW_QP_STATE_TERMINATE 5
+#define I40IW_QP_STATE_ERROR 6
+
+#define I40IW_STAG_STATE_INVALID 0
+#define I40IW_STAG_STATE_VALID 1
+
+#define I40IW_STAG_TYPE_SHARED 0
+#define I40IW_STAG_TYPE_NONSHARED 1
+
+#define I40IW_MAX_USER_PRIORITY 8
+
+#define LS_64_1(val, bits)  ((u64)(uintptr_t)val << bits)
+#define RS_64_1(val, bits)  ((u64)(uintptr_t)val >> bits)
+#define LS_32_1(val, bits)  (u32)(val << bits)
+#define RS_32_1(val, bits)  (u32)(val >> bits)
+#define I40E_HI_DWORD(x)((u32)x) >> 16) >> 16) & 0x))
+
+#define LS_64(val, field) (((u64)val << field ## _SHIFT) & (field ## _MASK))
+
+#define RS_64(val, field) ((u64)(u64)(val & field ## _MASK) >> field ## _SHIFT)
+#define LS_32(val, field) ((val << field ## _SHIFT) & (field ## _MASK))
+#define RS_32(val, field) ((val & field ## _MASK) >> field ## _SHIFT)
+
+#define TERM_DDP_LEN_TAGGED 14
+#define TERM_DDP_LEN_UNTAGGED   18
+#define TERM_RDMA_LEN   28
+#define RDMA_OPCODE_MASK        0x0f
+#define RDMA_READ_REQ_OPCODE    1
+#define Q2_BAD_FRAME_OFFSET 72
+#define CQE_MAJOR_DRV   0x8000
+
+#define I40IW_TERM_SENT 0x01
+#define I40IW_TERM_RCVD 0x02
+#define I40IW_TERM_DONE 0x04
+#define I40IW_MAC_HLEN  14
+
+#define I40IW_INVALID_WQE_INDEX 0x
+
+#define I40IW_CQP_WAIT_POLL_REGS 1
+#define I40IW_CQP_WAIT_POLL_CQ 2
+#define I40IW_CQP_WAIT_EVENT 3
+
+#define I40IW_CQP_INIT_WQE(wqe) memset(wqe, 0, 64)
+
+#define I40IW_GET_CURRENT_CQ_ELEMENT(_cq) \
+   ( \
+   &((_cq)->cq_base[I40IW_RING_GETCURRENT_HEAD((_cq)->cq_ring)])  \
+   )
+#define I40IW_GET_CURRENT_EXTENDED_CQ_ELEMENT(_cq) \
+   ( \
+   &(((struct i40iw_extended_cqe *)\
+   ((_cq)->cq_base))[I40IW_RING_GETCURRENT_HEAD((_cq)->cq_ring)]) \
+   )
+
+#define I40IW_GET_CURRENT_AEQ_ELEMENT(_aeq) \
+   ( \
+   &_aeq->aeqe_base[I40IW_RING_GETCURRENT_TAIL(_aeq->aeq_ring)]   \
+   )
+
+#define

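The LS_*/RS_* helpers in i40iw_d.h pack and unpack register fields by pasting `_SHIFT` and `_MASK` onto a field name; i40iw_get_cqp_reg_info() uses RS_32() this way to pull the tail and error fields out of the CQP tail register. A minimal userspace sketch of the same pattern follows. The field names `DEMO_WQTAIL` and `DEMO_OP_ERR`, and their masks and shifts, are hypothetical and not taken from the i40iw register map. Unlike the posted macros, the sketch parenthesizes `val`: with the posted form, `LS_32(a | b, F)` expands to `((a | b << F_SHIFT) & F_MASK)`, because `<<` binds tighter than `|`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field: bits 0..10 of a 32-bit register. */
#define DEMO_WQTAIL_SHIFT 0
#define DEMO_WQTAIL_MASK  (0x7ffu << DEMO_WQTAIL_SHIFT)

/* Hypothetical field: bit 31 of the same register. */
#define DEMO_OP_ERR_SHIFT 31
#define DEMO_OP_ERR_MASK  (1u << DEMO_OP_ERR_SHIFT)

/* Same shape as LS_32()/RS_32(), with every argument parenthesized. */
#define DEMO_LS_32(val, field) \
	((uint32_t)((val) << field ## _SHIFT) & (field ## _MASK))
#define DEMO_RS_32(val, field) \
	(((val) & field ## _MASK) >> field ## _SHIFT)

/* Build a register image from a tail index and a one-bit error flag. */
static uint32_t demo_pack(uint32_t tail, uint32_t err)
{
	assert(err <= 1);
	return DEMO_LS_32(tail, DEMO_WQTAIL) | DEMO_LS_32(err, DEMO_OP_ERR);
}
```

Bits outside a field's mask are silently dropped on packing (a tail of 0x800 packs to 0 here), which is the same behaviour as the driver's LS_32().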
[PATCH 14/15] i40iw: Kconfig and Kbuild for iwarp module

2015-12-16 Thread Faisal Latif

Kconfig and Kbuild files needed to build the iwarp module.

Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/Kbuild  | 43 +
 drivers/infiniband/hw/i40iw/Kconfig |  7 ++
 2 files changed, 50 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/Kbuild
 create mode 100644 drivers/infiniband/hw/i40iw/Kconfig

diff --git a/drivers/infiniband/hw/i40iw/Kbuild b/drivers/infiniband/hw/i40iw/Kbuild
new file mode 100644
index 000..ba84a78
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/Kbuild
@@ -0,0 +1,43 @@
+
+#
+# * Copyright (c) 2015 Intel Corporation.  All rights reserved.
+# *
+# * This software is available to you under a choice of one of two
+# * licenses.  You may choose to be licensed under the terms of the GNU
+# * General Public License (GPL) Version 2, available from the file
+# * COPYING in the main directory of this source tree, or the
+# * OpenFabrics.org BSD license below:
+# *
+# *   Redistribution and use in source and binary forms, with or
+# *   without modification, are permitted provided that the following
+# *   conditions are met:
+# *
+# *- Redistributions of source code must retain the above
+# *copyright notice, this list of conditions and the following
+# *disclaimer.
+# *
+# *- Redistributions in binary form must reproduce the above
+# *copyright notice, this list of conditions and the following
+# *disclaimer in the documentation and/or other materials
+# *provided with the distribution.
+# *
+# * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+# * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+# * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+# * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+# * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+# * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+# * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# * SOFTWARE.
+#
+
+
+ccflags-y :=  -Idrivers/net/ethernet/intel/i40e
+
+obj-m += i40iw.o
+
+i40iw-objs :=\
+   i40iw_cm.o i40iw_ctrl.o \
+   i40iw_hmc.o i40iw_hw.o i40iw_main.o  \
+   i40iw_pble.o i40iw_puda.o i40iw_uk.o i40iw_utils.o \
+   i40iw_verbs.o i40iw_virtchnl.o i40iw_vf.o
diff --git a/drivers/infiniband/hw/i40iw/Kconfig b/drivers/infiniband/hw/i40iw/Kconfig
new file mode 100644
index 000..6e7d27a
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/Kconfig
@@ -0,0 +1,7 @@
+config INFINIBAND_I40IW
+   tristate "Intel(R) Ethernet X722 iWARP Driver"
+   depends on INET && I40E
+   select GENERIC_ALLOCATOR
+   ---help---
+   Intel(R) Ethernet X722 iWARP Driver
+   INET && I40IW && INFINIBAND && I40E
-- 
2.5.3


[PATCH 02/15] i40iw: add main, hdr, status

2015-12-16 Thread Faisal Latif

i40iw_main.c contains routines for the i40e <=> i40iw interface and setup.
i40iw.h is the header file for the main device data structures.
i40iw_status.h holds the return status codes.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw.h|  573 +
 drivers/infiniband/hw/i40iw/i40iw_main.c   | 1905 
 drivers/infiniband/hw/i40iw/i40iw_status.h |  100 ++
 3 files changed, 2578 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_main.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_status.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
new file mode 100644
index 000..c048f06b
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw.h
@@ -0,0 +1,573 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#ifndef I40IW_IW_H
+#define I40IW_IW_H
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "i40iw_status.h"
+#include "i40iw_osdep.h"
+#include "i40iw_d.h"
+#include "i40iw_hmc.h"
+
+#include 
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+#include "i40iw_ucontext.h"
+#include "i40iw_pble.h"
+#include "i40iw_verbs.h"
+#include "i40iw_cm.h"
+#include "i40iw_user.h"
+#include "i40iw_puda.h"
+
+#define I40IW_FW_VERSION  2
+#define I40IW_HW_VERSION  2
+
+#define I40IW_ARP_ADD 1
+#define I40IW_ARP_DELETE  2
+#define I40IW_ARP_RESOLVE 3
+
+#define I40IW_MACIP_ADD 1
+#define I40IW_MACIP_DELETE  2
+
+#define IW_CCQ_SIZE (I40IW_CQP_SW_SQSIZE_2048 + 1)
+#define IW_CEQ_SIZE 2048
+#define IW_AEQ_SIZE 2048
+
+#define RX_BUF_SIZE    (1536 + 8)
+#define IW_REG0_SIZE   (4 * 1024)
+#define IW_TX_TIMEOUT  (6 * HZ)
+#define IW_FIRST_QPN   1
+#define IW_SW_CONTEXT_ALIGN 1024
+
+#define MAX_DPC_ITERATIONS 128
+
+#define I40IW_EVENT_TIMEOUT        10
+#define I40IW_VCHNL_EVENT_TIMEOUT  10
+
+#defineI40IW_NO_VLAN   0x
+#defineI40IW_NO_QSET   0x
+
+/* access to mcast filter list */
+#define IW_ADD_MCAST false
+#define IW_DEL_MCAST true
+
+#define I40IW_DRV_OPT_ENABLE_MPA_VER_0 0x0001
+#define I40IW_DRV_OPT_DISABLE_MPA_CRC  0x0002
+#define I40IW_DRV_OPT_DISABLE_FIRST_WRITE  0x0004
+#define I40IW_DRV_OPT_DISABLE_INTF 0x0008
+#define I40IW_DRV_OPT_ENABLE_MSI   0x0010
+#define I40IW_DRV_OPT_DUAL_LOGICAL_PORT    0x0020
+#define I40IW_DRV_OPT_NO_INLINE_DATA   0x0080
+#define I40IW_DRV_OPT_DISABLE_INT_MOD  0x0100
+#define I40IW_DRV_OPT_DISABLE_VIRT_WQ  0x0200
+#define I40IW_DRV_OPT_ENABLE_PAU   0x0400
+#define I40IW_DRV_OPT_MCAST_LOGPORT_MAP    0x0800
+
+#define IW_HMC_OBJ_TYPE_NUM ARRAY_SIZE(iw_hmc_obj_types)
+#define IW_CFG_FPM_QP_COUNT    32768
+
+#define I40IW_MTU_TO_MSS   40
+#define I40IW_DEFAULT_MSS  1460
+
+struct i40iw_cqp_compl_info {
+   u32 op_ret_val;
+   u16 maj_err_code;
+   u16 min_err_code;
+   bool error;
+   u8 op_code;
+};
+
+#define CHECK_CQP_REQ(cqp_request) \
+{

[PATCH 15/15] i40iw: changes for build of i40iw module

2015-12-16 Thread Faisal Latif

Update MAINTAINERS, Kconfig, Makefile and rdma_netlink.h to include
i40iw.

Signed-off-by: Faisal Latif 
---
 MAINTAINERS  | 10 ++
 drivers/infiniband/Kconfig   |  1 +
 drivers/infiniband/hw/Makefile   |  1 +
 include/uapi/rdma/rdma_netlink.h |  1 +
 4 files changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 69c8a9c..fc0ee30 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5600,6 +5600,16 @@ F:   Documentation/networking/i40evf.txt
 F: drivers/net/ethernet/intel/
 F: drivers/net/ethernet/intel/*/
 
+INTEL RDMA RNIC DRIVER
+M: Faisal Latif 
+R: Chien Tin Tung 
+R: Mustafa Ismail 
+R: Shiraz Saleem 
+R: Tatyana Nikolova 
+L: linux-r...@vger.kernel.org
+S: Supported
+F: drivers/infiniband/hw/i40iw/
+
 INTEL-MID GPIO DRIVER
 M: David Cohen 
 L: linux-g...@vger.kernel.org
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index aa26f3c..7ddd81f 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -58,6 +58,7 @@ source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
+source "drivers/infiniband/hw/i40iw/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
 source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index aded2a5..c7ad0a4 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_INFINIBAND_MTHCA)  += mthca/
 obj-$(CONFIG_INFINIBAND_QIB)   += qib/
 obj-$(CONFIG_INFINIBAND_CXGB3) += cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
+obj-$(CONFIG_INFINIBAND_I40IW) += i40iw/
 obj-$(CONFIG_MLX4_INFINIBAND)  += mlx4/
 obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += nes/
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index c19a5dc..56bafbe 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -5,6 +5,7 @@
 
 enum {
RDMA_NL_RDMA_CM = 1,
+   RDMA_NL_I40IW,
RDMA_NL_NES,
RDMA_NL_C4IW,
RDMA_NL_LS, /* RDMA Local Services */
-- 
2.5.3


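As Christoph Hellwig noted in review, adding RDMA_NL_I40IW directly after `RDMA_NL_RDMA_CM = 1` renumbers every enumerator that follows it, and since rdma_netlink.h is a uapi header, binaries built against the old values would then address the wrong netlink client. The sketch below copies the members visible in the diff context into three hypothetical enums (prefixes `OLD_`, `INS_`, `APP_` are illustrative only) to show the effect. Any members after RDMA_NL_LS are not shown in the diff, so "append" here means after the last existing client value.

```c
#include <assert.h>

/* Layout before the patch (members visible in the diff context). */
enum old_layout {
	OLD_RDMA_NL_RDMA_CM = 1,
	OLD_RDMA_NL_NES,
	OLD_RDMA_NL_C4IW,
	OLD_RDMA_NL_LS,
};

/* Layout as posted: the new value is inserted in the middle. */
enum inserted_layout {
	INS_RDMA_NL_RDMA_CM = 1,
	INS_RDMA_NL_I40IW,
	INS_RDMA_NL_NES,
	INS_RDMA_NL_C4IW,
	INS_RDMA_NL_LS,
};

/* Layout as requested in review: the new value is appended. */
enum appended_layout {
	APP_RDMA_NL_RDMA_CM = 1,
	APP_RDMA_NL_NES,
	APP_RDMA_NL_C4IW,
	APP_RDMA_NL_LS,
	APP_RDMA_NL_I40IW,
};

/* Appending keeps every pre-existing value stable... */
static_assert((int)APP_RDMA_NL_LS == (int)OLD_RDMA_NL_LS,
	      "appending keeps existing values");
/* ...while inserting shifts them. */
static_assert((int)INS_RDMA_NL_LS != (int)OLD_RDMA_NL_LS,
	      "inserting renumbers later members");
```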
[PATCH 09/15] i40iw: add file to handle cqp calls

2015-12-16 Thread Faisal Latif

i40iw_ctrl.c provides hardware wqe support and cqp handling.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c | 4774 ++
 1 file changed, 4774 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_ctrl.c

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
new file mode 100644
index 000..d0f2a23
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -0,0 +1,4774 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_osdep.h"
+#include "i40iw_register.h"
+#include "i40iw_status.h"
+#include "i40iw_hmc.h"
+
+#include "i40iw_d.h"
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+#include "i40iw_vf.h"
+#include "i40iw_virtchnl.h"
+
+/**
+ * i40iw_insert_wqe_hdr - write wqe header
+ * @wqe: cqp wqe for header
+ * @header: header for the cqp wqe
+ */
+static inline void i40iw_insert_wqe_hdr(u64 *wqe, u64 header)
+{
+   wmb(); /* make sure WQE is populated before valid bit is set */
+   set_64bit_val(wqe, 24, header);
+}
+
+/**
+ * i40iw_get_cqp_reg_info - get head and tail for cqp using registers
+ * @cqp: struct for cqp hw
+ * @val: cqp tail register value
+ * @tail: wqtail register value
+ * @error: cqp processing err
+ */
+static inline void i40iw_get_cqp_reg_info(struct i40iw_sc_cqp *cqp,
+ u32 *val,
+ u32 *tail,
+ u32 *error)
+{
+   if (cqp->dev->is_pf) {
+   *val = rd32(cqp->dev->hw, I40E_PFPE_CQPTAIL);
+   *tail = RS_32(*val, I40E_PFPE_CQPTAIL_WQTAIL);
+   *error = RS_32(*val, I40E_PFPE_CQPTAIL_CQP_OP_ERR);
+   } else {
+   *val = rd32(cqp->dev->hw, I40E_VFPE_CQPTAIL1);
+   *tail = RS_32(*val, I40E_VFPE_CQPTAIL_WQTAIL);
+   *error = RS_32(*val, I40E_VFPE_CQPTAIL_CQP_OP_ERR);
+   }
+}
+
+/**
+ * i40iw_cqp_poll_registers - poll cqp registers
+ * @cqp: struct for cqp hw
+ * @tail: wqtail register value
+ * @count: how many times to try for completion
+ */
+static enum i40iw_status_code i40iw_cqp_poll_registers(
+   struct i40iw_sc_cqp *cqp,
+   u32 tail,
+   u32 count)
+{
+   u32 i = 0;
+   u32 newtail, error, val;
+
+   while (i < count) {
+   i++;
+   i40iw_get_cqp_reg_info(cqp, &val, &newtail, &error);
+   if (error) {
+   error = (cqp->dev->is_pf) ?
+rd32(cqp->dev->hw, I40E_PFPE_CQPERRCODES) :
+rd32(cqp->dev->hw, I40E_VFPE_CQPERRCODES1);
+   return I40IW_ERR_CQP_COMPL_ERROR;
+   }
+   if (newtail != tail) {
+   /* SUCCESS */
+   I40IW_RING_MOVE_TAIL(cqp->sq_ring);
+   return 0;
+   }
+   udelay(I40IW_SLEEP_COUNT);
+   }
+   return I40IW_ERR_TIMEOUT;
+}
+
+/**
+ * i40iw_sc_parse_fpm_commit_buf - parse fpm commit buffer
+ * @buf: ptr to fpm commit buffer
+ * @info: ptr to i40iw_hmc_obj_info struct
+ *
+ * parses fpm commit info and copy base value
+
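i40iw_insert_wqe_hdr() issues wmb() before writing the header qword at offset 24, so the hardware can never observe a valid header in front of a half-written WQE. The userspace sketch below models the same "fill payload, then publish" ordering with a C11 release store and an acquire load. This is an analogy only, assumed for illustration: C11 release/acquire orders memory between CPU threads and does not replace wmb() for memory read by a DMA device. `struct demo_wqe` and `DEMO_VALID_BIT` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 4-qword descriptor; qword 3 carries the header/valid bit. */
struct demo_wqe {
	uint64_t payload[3];
	_Atomic uint64_t header;
};

#define DEMO_VALID_BIT (1ULL << 63)

/* Producer: fill the payload first, then publish the header. */
static void demo_post(struct demo_wqe *wqe, const uint64_t data[3], uint64_t hdr)
{
	assert(wqe && data);
	for (int i = 0; i < 3; i++)
		wqe->payload[i] = data[i];
	/* Release: the payload stores above are visible to any thread
	 * that observes this header with an acquire load. */
	atomic_store_explicit(&wqe->header, hdr | DEMO_VALID_BIT,
			      memory_order_release);
}

/* Consumer: read the payload only after seeing the valid bit. */
static int demo_consume(struct demo_wqe *wqe, uint64_t out[3])
{
	uint64_t hdr = atomic_load_explicit(&wqe->header, memory_order_acquire);

	if (!(hdr & DEMO_VALID_BIT))
		return 0;
	for (int i = 0; i < 3; i++)
		out[i] = wqe->payload[i];
	return 1;
}
```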

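i40iw_cqp_poll_registers() reads the CQP tail register up to `count` times: an error bit ends the poll with I40IW_ERR_CQP_COMPL_ERROR, a tail that differs from the saved one means the WQE was consumed, and otherwise it delays and retries until I40IW_ERR_TIMEOUT. A minimal sketch of that bounded loop, assuming a callback in place of rd32() and omitting the udelay(), the CQPERRCODES read and the ring-tail update; all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum demo_status { DEMO_OK = 0, DEMO_ERR_COMPL = -1, DEMO_ERR_TIMEOUT = -2 };

/* Stand-in for the register read: returns the current tail and sets
 * *error when the hardware reports an operation error. */
typedef uint32_t (*demo_read_tail_fn)(void *ctx, uint32_t *error);

static enum demo_status demo_poll_tail(demo_read_tail_fn read_tail, void *ctx,
				       uint32_t old_tail, uint32_t count)
{
	assert(read_tail);
	for (uint32_t i = 0; i < count; i++) {
		uint32_t error = 0;
		uint32_t newtail = read_tail(ctx, &error);

		if (error)
			return DEMO_ERR_COMPL;
		if (newtail != old_tail)
			return DEMO_OK;	/* hardware consumed the WQE */
		/* The driver calls udelay(I40IW_SLEEP_COUNT) here. */
	}
	return DEMO_ERR_TIMEOUT;
}

/* Fake register: the tail advances once a fixed number of reads is reached. */
struct demo_fake_reg {
	uint32_t reads;
	uint32_t advance_after;
};

static uint32_t demo_fake_read(void *ctx, uint32_t *error)
{
	struct demo_fake_reg *reg = ctx;

	*error = 0;
	return ++reg->reads >= reg->advance_after ? 1 : 0;
}
```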
Re: [PATCH 15/15] i40iw: changes for build of i40iw module

2015-12-16 Thread kbuild test robot

Hi Faisal,

[auto build test WARNING on net/master]
[also build test WARNING on v4.4-rc5 next-20151216]
[cannot apply to net-next/master]

url: https://github.com/0day-ci/linux/commits/Faisal-Latif/add-Intel-R-X722-iWARP-driver/20151217-040340
config: arm-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from include/linux/byteorder/big_endian.h:4:0,
from arch/arm/include/uapi/asm/byteorder.h:19,
from include/asm-generic/bitops/le.h:5,
from arch/arm/include/asm/bitops.h:340,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/skbuff.h:17,
from include/linux/ip.h:20,
from drivers/infiniband/hw/i40iw/i40iw_cm.c:36:
   drivers/infiniband/hw/i40iw/i40iw_cm.c: In function 'i40iw_init_tcp_ctx':
   include/uapi/linux/byteorder/big_endian.h:32:26: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
 ^
   include/linux/byteorder/generic.h:87:21: note: in expansion of macro '__cpu_to_le32'
#define cpu_to_le32 __cpu_to_le32
^
>> drivers/infiniband/hw/i40iw/i40iw_cm.c:3513:18: note: in expansion of macro 'cpu_to_le32'
 tcp_info->ttl = cpu_to_le32(I40IW_DEFAULT_TTL);
 ^

vim +/cpu_to_le32 +3513 drivers/infiniband/hw/i40iw/i40iw_cm.c

2d207efd Faisal Latif 2015-12-16  3497   * i40iw_init_tcp_ctx - setup qp context
2d207efd Faisal Latif 2015-12-16  3498   * @cm_node: connection's node
2d207efd Faisal Latif 2015-12-16  3499   * @tcp_info: offload info for tcp
2d207efd Faisal Latif 2015-12-16  3500   * @iwqp: associate qp for the connection
2d207efd Faisal Latif 2015-12-16  3501   */
2d207efd Faisal Latif 2015-12-16  3502  static void i40iw_init_tcp_ctx(struct i40iw_cm_node *cm_node,
2d207efd Faisal Latif 2015-12-16  3503 struct i40iw_tcp_offload_info *tcp_info,
2d207efd Faisal Latif 2015-12-16  3504 struct i40iw_qp *iwqp)
2d207efd Faisal Latif 2015-12-16  3505  {
2d207efd Faisal Latif 2015-12-16  3506  tcp_info->ipv4 = cm_node->ipv4;
2d207efd Faisal Latif 2015-12-16  3507  tcp_info->drop_ooo_seg = true;
2d207efd Faisal Latif 2015-12-16  3508  tcp_info->wscale = true;
2d207efd Faisal Latif 2015-12-16  3509  tcp_info->ignore_tcp_opt = true;
2d207efd Faisal Latif 2015-12-16  3510  tcp_info->ignore_tcp_uns_opt = true;
2d207efd Faisal Latif 2015-12-16  3511  tcp_info->no_nagle = false;
2d207efd Faisal Latif 2015-12-16  3512  
2d207efd Faisal Latif 2015-12-16 @3513  tcp_info->ttl = cpu_to_le32(I40IW_DEFAULT_TTL);
2d207efd Faisal Latif 2015-12-16  3514  tcp_info->rtt_var = cpu_to_le32(I40IW_DEFAULT_RTT_VAR);
2d207efd Faisal Latif 2015-12-16  3515  tcp_info->ss_thresh = cpu_to_le32(I40IW_DEFAULT_SS_THRESH);
2d207efd Faisal Latif 2015-12-16  3516  tcp_info->rexmit_thresh = I40IW_DEFAULT_REXMIT_THRESH;
2d207efd Faisal Latif 2015-12-16  3517  
2d207efd Faisal Latif 2015-12-16  3518  tcp_info->tcp_state = I40IW_TCP_STATE_ESTABLISHED;
2d207efd Faisal Latif 2015-12-16  3519  tcp_info->snd_wscale = cm_node->tcp_cntxt.snd_wscale;
2d207efd Faisal Latif 2015-12-16  3520  tcp_info->rcv_wscale = cm_node->tcp_cntxt.rcv_wscale;
2d207efd Faisal Latif 2015-12-16  3521  

:: The code at line 3513 was first introduced by commit
:: 2d207efd7fd9e5a190b2ebd6f077139412b0343f i40iw: add connection management code

:: TO: Faisal Latif <faisal.la...@intel.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



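The warning appears only on the big-endian ARM build: there cpu_to_le32() byte-swaps its argument, so a small constant lands in the top byte of the 32-bit result and is lost when the result is stored in the narrower `ttl` field. On little-endian x86 the macro is the identity and the intended value is stored, so nothing is reported. Neither the value of I40IW_DEFAULT_TTL nor the type of `ttl` appears in this thread; the sketch assumes 0x40 and an 8-bit field, which fits "large integer implicitly truncated to unsigned type". One plausible fix is to assign the host-order constant directly; whether rtt_var and ss_thresh need the same change depends on how the driver later packs the context, which is not shown here. The `demo_*` helpers are hypothetical models, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap, the operation __swab32() performs. */
static uint32_t demo_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

/* Model of cpu_to_le32(): identity on little-endian, swap on big-endian. */
static uint32_t demo_cpu_to_le32(uint32_t x, int cpu_is_big_endian)
{
	return cpu_is_big_endian ? demo_swab32(x) : x;
}

/* Hypothetical 8-bit field standing in for tcp_info->ttl: keeps only the
 * low byte, as the implicit conversion in i40iw_init_tcp_ctx() does. */
static uint8_t demo_store_ttl(uint32_t value)
{
	return (uint8_t)value;
}
```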
[PATCH] Octeon: Fix logic for waking octeon ethernet tx queue.

2015-12-16 Thread Luuk Paulussen

The ethernet driver tx queue is stopped when the queue length exceeds
what is allowed.  It should only be started again when the queue length
is back within bounds.

The logic here was just re-enabling the queue when any buffers had been
freed.  The queue was stopped whenever the length exceeded 1000
(MAX_OUT_QUEUE_DEPTH), but was then started again almost immediately.
On a congested link, the queue length would keep increasing up to around
8000 (for average-size packets), at which point the hardware would start
refusing packets and they would begin to be dropped.
This prevented the qdisc layer from effectively managing and prioritising
packets, as essentially all packets were being allowed into the driver queue
and then dropped by the hardware.

This change only restarts the queue if the length is less than 1000
(MAX_OUT_QUEUE_DEPTH).

Reviewed-by: Kyeong Yoo 
Reviewed-by: Chris Packham 
Reviewed-by: Richard Laing 
---
 drivers/staging/octeon/ethernet-tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index c053c4a..31292fe 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -126,7 +126,7 @@ static void cvm_oct_free_tx_skbs(struct net_device *dev)
}
total_remaining += skb_queue_len(&priv->tx_free_list[qos]);
}
-   if (total_freed >= 0 && netif_queue_stopped(dev))
+   if (total_remaining < MAX_OUT_QUEUE_DEPTH && netif_queue_stopped(dev))
netif_wake_queue(dev);
if (total_remaining)
cvm_oct_kick_tx_poll_watchdog();
-- 
2.6.4


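The one-line Octeon fix changes the wake condition from "any cleanup pass" to "backlog below the stop threshold". Assuming `total_freed` is a counter that starts at 0 and only increases (its declaration is outside the diff context), the old test `total_freed >= 0` is always true, so a stopped queue was woken on every pass regardless of depth. The sketch isolates the two predicates; the function names are hypothetical and 1000 is the MAX_OUT_QUEUE_DEPTH value quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_OUT_QUEUE_DEPTH 1000	/* value quoted in the commit message */

/* Old wake test: with a non-negative counter, this is simply "stopped". */
static bool demo_should_wake_old(int total_freed, int total_remaining,
				 bool stopped)
{
	(void)total_remaining;
	return total_freed >= 0 && stopped;
}

/* New wake test: wake only once the backlog is back under the limit. */
static bool demo_should_wake_new(int total_freed, int total_remaining,
				 bool stopped)
{
	(void)total_freed;
	return total_remaining < DEMO_MAX_OUT_QUEUE_DEPTH && stopped;
}
```

With the new test, packets beyond the limit stay in the qdisc, where they can be prioritised or dropped by policy, instead of being accepted into the driver and later refused by the hardware.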
[PATCH 07/15] i40iw: add hw and utils files

2015-12-16 Thread Faisal Latif

i40iw_hw.c, i40iw_utils.c and i40iw_osdep.h are files to handle
interrupts and processing.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_hw.c|  705 +
 drivers/infiniband/hw/i40iw/i40iw_osdep.h |  235 ++
 drivers/infiniband/hw/i40iw/i40iw_utils.c | 1233 +
 3 files changed, 2173 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_hw.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_osdep.h
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_utils.c

diff --git a/drivers/infiniband/hw/i40iw/i40iw_hw.c b/drivers/infiniband/hw/i40iw/i40iw_hw.c
new file mode 100644
index 000..13d0d9e
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_hw.c
@@ -0,0 +1,705 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "i40iw.h"
+
+/**
+ * i40iw_initialize_hw_resources - initialize hw resource during open
+ * @iwdev: iwarp device
+ */
+u32 i40iw_initialize_hw_resources(struct i40iw_device *iwdev)
+{
+   unsigned long num_pds;
+   u32 resources_size;
+   u32 max_mr;
+   u32 max_qp;
+   u32 max_cq;
+   u32 arp_table_size;
+   u32 mrdrvbits;
+   void *resource_ptr;
+
+   max_qp = iwdev->sc_dev.hmc_info->hmc_obj[I40IW_HMC_IW_QP].cnt;
+   max_cq = iwdev->sc_dev.hmc_info->hmc_obj[I40IW_HMC_IW_CQ].cnt;
+   max_mr = iwdev->sc_dev.hmc_info->hmc_obj[I40IW_HMC_IW_MR].cnt;
+   arp_table_size = iwdev->sc_dev.hmc_info->hmc_obj[I40IW_HMC_IW_ARP].cnt;
+   iwdev->max_cqe = 0xF;
+   num_pds = max_qp * 4;
+   resources_size = sizeof(struct i40iw_arp_entry) * arp_table_size;
+   resources_size += sizeof(unsigned long) * BITS_TO_LONGS(max_qp);
+   resources_size += sizeof(unsigned long) * BITS_TO_LONGS(max_mr);
+   resources_size += sizeof(unsigned long) * BITS_TO_LONGS(max_cq);
+   resources_size += sizeof(unsigned long) * BITS_TO_LONGS(num_pds);
+   resources_size += sizeof(unsigned long) * BITS_TO_LONGS(arp_table_size);
+   resources_size += sizeof(struct i40iw_qp **) * max_qp;
+   iwdev->mem_resources = kzalloc(resources_size, GFP_KERNEL);
+
+   if (!iwdev->mem_resources)
+   return -ENOMEM;
+
+   iwdev->max_qp = max_qp;
+   iwdev->max_mr = max_mr;
+   iwdev->max_cq = max_cq;
+   iwdev->max_pd = num_pds;
+   iwdev->arp_table_size = arp_table_size;
+   iwdev->arp_table = (struct i40iw_arp_entry *)iwdev->mem_resources;
+   resource_ptr = iwdev->mem_resources + (sizeof(struct i40iw_arp_entry) * arp_table_size);
+
+   iwdev->device_cap_flags = IB_DEVICE_LOCAL_DMA_LKEY |
+   IB_DEVICE_MEM_WINDOW | IB_DEVICE_MEM_MGT_EXTENSIONS;
+
+   iwdev->allocated_qps = resource_ptr;
+   iwdev->allocated_cqs = &iwdev->allocated_qps[BITS_TO_LONGS(max_qp)];
+   iwdev->allocated_mrs = &iwdev->allocated_cqs[BITS_TO_LONGS(max_cq)];
+   iwdev->allocated_pds = &iwdev->allocated_mrs[BITS_TO_LONGS(max_mr)];
+   iwdev->allocated_arps = &iwdev->allocated_pds[BITS_TO_LONGS(num_pds)];
+   iwdev->qp_table = (struct i40iw_qp **)(&iwdev->allocated_arps[BITS_TO_LONGS(arp_table_size)]);
+   set_bit(0, iwdev->allocated_mrs);
+   set_bit(0, iwdev->allocated_qps);
+   set_bit(0, iwdev->allocated_cqs);
+

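i40iw_initialize_hw_resources() sizes a single kzalloc() for the ARP table, the QP/MR/CQ/PD/ARP allocation bitmaps and the QP pointer table, then carves consecutive regions out of it and reserves index 0 with set_bit(). The userspace sketch below reproduces that carving for a reduced set of regions (ARP table, two bitmaps, pointer table) using calloc(). `struct demo_arp_entry` and the other `demo_*` names are hypothetical stand-ins; the layout relies on each region's size leaving the next region suitably aligned, which holds here because the stand-in entry contains `unsigned long` members.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical stand-ins for i40iw_arp_entry and i40iw_qp. */
struct demo_arp_entry { unsigned long ip[4]; unsigned char mac[8]; };
struct demo_qp;

struct demo_dev {
	void *mem;			/* single backing allocation */
	struct demo_arp_entry *arp_table;
	unsigned long *allocated_qps;	/* bitmap, one bit per QP */
	unsigned long *allocated_cqs;	/* bitmap, one bit per CQ */
	struct demo_qp **qp_table;
};

/* Carve one zeroed allocation into a table, two bitmaps and a pointer array. */
static int demo_carve(struct demo_dev *d, size_t arp_cnt, size_t max_qp,
		      size_t max_cq)
{
	size_t size;

	assert(max_qp > 0 && max_cq > 0);	/* bit 0 is reserved below */
	size = sizeof(struct demo_arp_entry) * arp_cnt +
	       sizeof(unsigned long) * DEMO_BITS_TO_LONGS(max_qp) +
	       sizeof(unsigned long) * DEMO_BITS_TO_LONGS(max_cq) +
	       sizeof(struct demo_qp *) * max_qp;

	d->mem = calloc(1, size);
	if (!d->mem)
		return -1;
	d->arp_table = d->mem;
	d->allocated_qps = (unsigned long *)(d->arp_table + arp_cnt);
	d->allocated_cqs = d->allocated_qps + DEMO_BITS_TO_LONGS(max_qp);
	d->qp_table = (struct demo_qp **)(d->allocated_cqs +
					  DEMO_BITS_TO_LONGS(max_cq));
	/* Reserve index 0, as the driver does with set_bit(0, ...). */
	d->allocated_qps[0] |= 1UL;
	d->allocated_cqs[0] |= 1UL;
	return 0;
}
```

A single allocation keeps teardown to one kfree() (here, free()), at the cost of computing every offset by hand.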
[PATCH 03/15] i40iw: add connection management code

2015-12-16 Thread Faisal Latif

i40iw_cm.c and i40iw_cm.h are used for connection management.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_cm.c | 4447 
 drivers/infiniband/hw/i40iw/i40iw_cm.h |  456 
 2 files changed, 4903 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_cm.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_cm.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c b/drivers/infiniband/hw/i40iw/i40iw_cm.c
new file mode 100644
index 000..aa6263f
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c
@@ -0,0 +1,4447 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "i40iw.h"
+
+static void i40iw_rem_ref_cm_node(struct i40iw_cm_node *);
+static void i40iw_cm_post_event(struct i40iw_cm_event *event);
+static void i40iw_disconnect_worker(struct work_struct *work);
+
+/**
+ * i40iw_free_sqbuf - put back puda buffer if refcount = 0
+ * @dev: FPK device
+ * @bufp: puda buffer to free
+ */
+void i40iw_free_sqbuf(struct i40iw_sc_dev *dev, void *bufp)
+{
+   struct i40iw_puda_buf *buf = (struct i40iw_puda_buf *)bufp;
+   struct i40iw_puda_rsrc *ilq = dev->ilq;
+
+   if (!atomic_dec_return(&buf->refcount))
+   i40iw_puda_ret_bufpool(ilq, buf);
+}
+
+/**
+ * i40iw_derive_hw_ird_setting - Calculate IRD
+ *
+ * @cm_ird: IRD of connection's node
+ *
+ * The ird from the connection is rounded to a supported HW
+ * setting (2,8,32,64) and then encoded for ird_size field of
+ * qp_ctx
+ */
+static u8 i40iw_derive_hw_ird_setting(u16 cm_ird)
+{
+   u8 encoded_ird_size = 0;
+   u8 pof2_cm_ird = 1;
+
+   /* round up to next power of 2 */
+   while (pof2_cm_ird < cm_ird)
+   pof2_cm_ird *= 2;
+
+   /* ird_size field is encoded in qp_ctx */
+   switch (pof2_cm_ird) {
+   case I40IW_HW_IRD_SETTING_64:
+   encoded_ird_size = 3;
+   break;
+   case I40IW_HW_IRD_SETTING_32:
+   case I40IW_HW_IRD_SETTING_16:
+   encoded_ird_size = 2;
+   break;
+   case I40IW_HW_IRD_SETTING_8:
+   case I40IW_HW_IRD_SETTING_4:
+   encoded_ird_size = 1;
+   break;
+   case I40IW_HW_IRD_SETTING_2:
+   default:
+   encoded_ird_size = 0;
+   break;
+   }
+   return encoded_ird_size;
+}
+
+/**
+ * i40iw_record_ird_ord - Record IRD/ORD passed in
+ * @cm_node: connection's node
+ * @conn_ird: connection IRD
+ * @conn_ord: connection ORD
+ */
+static void i40iw_record_ird_ord(struct i40iw_cm_node *cm_node, u16 conn_ird, u16 conn_ord)
+{
+   if (conn_ird > I40IW_MAX_IRD_SIZE)
+   conn_ird = I40IW_MAX_IRD_SIZE;
+
+   if (conn_ord > I40IW_MAX_ORD_SIZE)
+   conn_ord = I40IW_MAX_ORD_SIZE;
+
+   cm_node->ird_size = conn_ird;
+   cm_node->ord_size = conn_ord;
+}
+
+/**
+ * i40iw_copy_ip_ntohl - change network to host ip
+ * @dst: host ip
+ * @src: big endian
+ */
+void i40iw_copy_ip_ntohl(u32 *dst, __be32 *src)
+{
+   *dst++ =

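i40iw_derive_hw_ird_setting() rounds the connection's IRD up to a power of two and maps it onto the 2-bit `ird_size` code (2 → 0, 4/8 → 1, 16/32 → 2, 64 → 3). The sketch below reproduces that mapping, assuming each I40IW_HW_IRD_SETTING_n constant equals n (their definitions are not in this excerpt). Two properties of the posted code are worth checking by reading it: `pof2_cm_ird` is a u8, so for `cm_ird` above 128 it would wrap to 0 and the loop would not terminate, and an IRD that rounds to 128 falls to the `default` case and encodes as 0. i40iw_record_ird_ord() clamps to I40IW_MAX_IRD_SIZE, whose value is not shown, so whether those inputs are reachable depends on the callers. The sketch uses a 32-bit accumulator so it always terminates; `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the I40IW_HW_IRD_SETTING_n constants equal n. */
enum {
	DEMO_IRD_2 = 2, DEMO_IRD_4 = 4, DEMO_IRD_8 = 8,
	DEMO_IRD_16 = 16, DEMO_IRD_32 = 32, DEMO_IRD_64 = 64,
};

/* Round up to a power of two, then map to the 2-bit ird_size code. */
static uint8_t demo_derive_hw_ird(uint16_t cm_ird)
{
	uint32_t pof2 = 1;	/* wider than the driver's u8, cannot wrap */

	while (pof2 < cm_ird)
		pof2 *= 2;

	switch (pof2) {
	case DEMO_IRD_64:
		return 3;
	case DEMO_IRD_32:
	case DEMO_IRD_16:
		return 2;
	case DEMO_IRD_8:
	case DEMO_IRD_4:
		return 1;
	case DEMO_IRD_2:
	default:	/* 1, 2, and anything above 64 */
		return 0;
	}
}
```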
[PATCH 06/15] i40iw: add hmc resource files

2015-12-16 Thread Faisal Latif

i40iw_hmc.[ch] manage the hmc for the device.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_hmc.c | 823 
 drivers/infiniband/hw/i40iw/i40iw_hmc.h | 241 ++
 2 files changed, 1064 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_hmc.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_hmc.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_hmc.c b/drivers/infiniband/hw/i40iw/i40iw_hmc.c
new file mode 100644
index 000..f4f4055
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_hmc.c
@@ -0,0 +1,823 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_osdep.h"
+#include "i40iw_register.h"
+#include "i40iw_status.h"
+#include "i40iw_hmc.h"
+#include "i40iw_d.h"
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+#include "i40iw_vf.h"
+#include "i40iw_virtchnl.h"
+
+/**
+ * i40iw_find_sd_index_limit - finds segment descriptor index limit
+ * @hmc_info: pointer to the HMC configuration information structure
+ * @type: type of HMC resources we're searching
+ * @idx: starting index for the object
+ * @cnt: number of objects we're trying to create
+ * @sd_idx: pointer to return index of the segment descriptor in question
+ * @sd_limit: pointer to return the maximum number of segment descriptors
+ *
+ * This function calculates the segment descriptor index and index limit
+ * for the resource defined by i40iw_hmc_rsrc_type.
+ */
+
+static inline void i40iw_find_sd_index_limit(struct i40iw_hmc_info *hmc_info,
+u32 type,
+u32 idx,
+u32 cnt,
+u32 *sd_idx,
+u32 *sd_limit)
+{
+   u64 fpm_addr, fpm_limit;
+
+   fpm_addr = hmc_info->hmc_obj[(type)].base +
+   hmc_info->hmc_obj[type].size * idx;
+   fpm_limit = fpm_addr + hmc_info->hmc_obj[type].size * cnt;
+   *sd_idx = (u32)(fpm_addr / I40IW_HMC_DIRECT_BP_SIZE);
+   *sd_limit = (u32)((fpm_limit - 1) / I40IW_HMC_DIRECT_BP_SIZE);
+   *sd_limit += 1;
+}
+
+/**
+ * i40iw_find_pd_index_limit - finds page descriptor index limit
+ * @hmc_info: pointer to the HMC configuration information struct
+ * @type: HMC resource type we're examining
+ * @idx: starting index for the object
+ * @cnt: number of objects we're trying to create
+ * @pd_idx: pointer to return page descriptor index
+ * @pd_limit: pointer to return page descriptor index limit
+ *
+ * Calculates the page descriptor index and index limit for the resource
+ * defined by i40iw_hmc_rsrc_type.
+ */
+
+static inline void i40iw_find_pd_index_limit(struct i40iw_hmc_info *hmc_info,
+u32 type,
+u32 idx,
+u32 cnt,
+u32 *pd_idx,
+u32 *pd_limit)
+{
+   u64 fpm_adr, fpm_limit;
+
+   fpm_adr = hmc_info->hmc_obj[type].base +
+   hmc_info->hmc_obj[type].size * idx;
+   fpm_limit = fpm_adr + (hmc_info)->hmc_obj[(type)].size * (cnt);
+   *(pd_idx) = (u32)(fpm_adr /

[PATCH 05/15] i40iw: add pble resource files

2015-12-16 Thread Faisal Latif

i40iw_pble.[ch] manage the PBLE resources for iWARP clients.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_pble.c | 618 +++
 drivers/infiniband/hw/i40iw/i40iw_pble.h | 131 +++
 2 files changed, 749 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_pble.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_pble.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_pble.c b/drivers/infiniband/hw/i40iw/i40iw_pble.c
new file mode 100644
index 000..217997e
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_pble.c
@@ -0,0 +1,618 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_status.h"
+#include "i40iw_osdep.h"
+#include "i40iw_register.h"
+#include "i40iw_hmc.h"
+
+#include "i40iw_d.h"
+#include "i40iw_type.h"
+#include "i40iw_p.h"
+
+#include 
+#include 
+#include 
+#include "i40iw_pble.h"
+#include "i40iw.h"
+
+struct i40iw_device;
+static enum i40iw_status_code add_pble_pool(struct i40iw_sc_dev *dev,
+   struct i40iw_hmc_pble_rsrc *pble_rsrc);
+static void i40iw_free_vmalloc_mem(struct i40iw_hw *hw, struct i40iw_chunk *chunk);
+
+/**
+ * i40iw_destroy_pble_pool - destroy pool during module unload
+ * @pble_rsrc: pble resources
+ */
+void i40iw_destroy_pble_pool(struct i40iw_sc_dev *dev, struct i40iw_hmc_pble_rsrc *pble_rsrc)
+{
+   struct list_head *clist;
+   struct list_head *tlist;
+   struct i40iw_chunk *chunk;
+   struct i40iw_pble_pool *pinfo = &pble_rsrc->pinfo;
+
+   if (pinfo->pool) {
+   list_for_each_safe(clist, tlist, &pinfo->clist) {
+   chunk = list_entry(clist, struct i40iw_chunk, list);
+   if (chunk->type == I40IW_VMALLOC)
+   i40iw_free_vmalloc_mem(dev->hw, chunk);
+   kfree(chunk);
+   }
+   gen_pool_destroy(pinfo->pool);
+   }
+}
+
+/**
+ * i40iw_hmc_init_pble - Initialize pble resources during module load
+ * @dev: i40iw_sc_dev struct
+ * @pble_rsrc: pble resources
+ */
+enum i40iw_status_code i40iw_hmc_init_pble(struct i40iw_sc_dev *dev,
+  struct i40iw_hmc_pble_rsrc *pble_rsrc)
+{
+   struct i40iw_hmc_info *hmc_info;
+   u32 fpm_idx = 0;
+
+   hmc_info = dev->hmc_info;
+   pble_rsrc->fpm_base_addr = hmc_info->hmc_obj[I40IW_HMC_IW_PBLE].base;
+   /* Now start the pble on a 4k boundary */
+   if (pble_rsrc->fpm_base_addr & 0xfff)
+   fpm_idx = (PAGE_SIZE - (pble_rsrc->fpm_base_addr & 0xfff)) >> 3;
+
+   pble_rsrc->unallocated_pble =
+   hmc_info->hmc_obj[I40IW_HMC_IW_PBLE].cnt - fpm_idx;
+   pble_rsrc->next_fpm_addr = pble_rsrc->fpm_base_addr + (fpm_idx << 3);
+
+   pble_rsrc->pinfo.pool_shift = POOL_SHIFT;
+   pble_rsrc->pinfo.pool = gen_pool_create(pble_rsrc->pinfo.pool_shift, -1);
+   INIT_LIST_HEAD(&pble_rsrc->pinfo.clist);
+   if (!pble_rsrc->pinfo.pool)
+   goto error;
+
+   if (add_pble_pool(dev, pble_rsrc))
+   goto error;
+
+   return 0;
+
+ error:i40iw_destroy_pble_pool(dev, pble_rsrc);
+   return I40IW_ERR_NO_MEMORY;
+}
+
+/**
+ * get_sd_pd_idx -  Returns sd index, pd index and rel_pd_idx from fpm address
+ * @ pble_rsrc:
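The alignment step in i40iw_hmc_init_pble() skips enough PBLEs for the pool to start on a 4 KiB boundary. The ">> 3" and "<< 3" imply 8-byte PBLEs, so the number skipped is the distance to the next boundary divided by 8. The sketch below is hypothetical (pble_align() is an illustrative name), hard-codes 4 KiB pages and, like the original, assumes the base address is 8-byte aligned. Note that the driver combines PAGE_SIZE with a 0xfff mask; the two agree only when PAGE_SIZE is 4 KiB.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical helper mirroring the alignment step in i40iw_hmc_init_pble():
 * returns how many 8-byte PBLEs are skipped so that the first usable PBLE
 * starts on a 4 KiB boundary, and stores that address in *next_fpm_addr. */
static uint32_t pble_align(uint64_t fpm_base_addr, uint64_t *next_fpm_addr)
{
	uint32_t fpm_idx = 0;

	if (fpm_base_addr & 0xfff)
		fpm_idx = (uint32_t)((SKETCH_PAGE_SIZE - (fpm_base_addr & 0xfff)) >> 3);

	*next_fpm_addr = fpm_base_addr + ((uint64_t)fpm_idx << 3);
	return fpm_idx;
}

/* Example: a base of 0x1008 is 8 bytes past a boundary, so
 * (4096 - 8) / 8 = 511 PBLEs are skipped and the pool starts at 0x2000. */
```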

[PATCH 12/15] i40iw: user kernel shared files

2015-12-16 Thread Faisal Latif

i40iw_user.h and i40iw_uk.c are used by both the user library and
kernel requests.

Acked-by: Anjali Singhai Jain 
Acked-by: Shannon Nelson 
Signed-off-by: Faisal Latif 
---
 drivers/infiniband/hw/i40iw/i40iw_uk.c   | 1213 ++
 drivers/infiniband/hw/i40iw/i40iw_user.h |  438 +++
 2 files changed, 1651 insertions(+)
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_uk.c
 create mode 100644 drivers/infiniband/hw/i40iw/i40iw_user.h

diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
new file mode 100644
index 000..d7ae9e6
--- /dev/null
+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -0,0 +1,1213 @@
+/***
+*
+* Copyright (c) 2015 Intel Corporation.  All rights reserved.
+*
+* This software is available to you under a choice of one of two
+* licenses.  You may choose to be licensed under the terms of the GNU
+* General Public License (GPL) Version 2, available from the file
+* COPYING in the main directory of this source tree, or the
+* OpenFabrics.org BSD license below:
+*
+*   Redistribution and use in source and binary forms, with or
+*   without modification, are permitted provided that the following
+*   conditions are met:
+*
+*- Redistributions of source code must retain the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer.
+*
+*- Redistributions in binary form must reproduce the above
+*  copyright notice, this list of conditions and the following
+*  disclaimer in the documentation and/or other materials
+*  provided with the distribution.
+*
+* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+* SOFTWARE.
+*
+***/
+
+#include "i40iw_osdep.h"
+#include "i40iw_status.h"
+#include "i40iw_d.h"
+#include "i40iw_user.h"
+#include "i40iw_register.h"
+
+static u32 nop_signature = 0x;
+
+/**
+ * i40iw_nop_1 - insert a nop wqe and move head. no post work
+ * @qp: hw qp ptr
+ */
+static enum i40iw_status_code i40iw_nop_1(struct i40iw_qp_uk *qp)
+{
+   u64 header, *wqe;
+   u64 *wqe_0 = NULL;
+   u32 wqe_idx, peek_head;
+   bool signaled = false;
+
+   if (!qp->sq_ring.head)
+   return I40IW_ERR_PARAM;
+
+   wqe_idx = I40IW_RING_GETCURRENT_HEAD(qp->sq_ring);
+   wqe = &qp->sq_base[wqe_idx << 2];
+   peek_head = (qp->sq_ring.head + 1) % qp->sq_ring.size;
+   wqe_0 = &qp->sq_base[peek_head << 2];
+   if (peek_head)
+   wqe_0[3] = LS_64(!qp->swqe_polarity, I40IWQPSQ_VALID);
+   else
+   wqe_0[3] = LS_64(qp->swqe_polarity, I40IWQPSQ_VALID);
+
+   set_64bit_val(wqe, 0, 0);
+   set_64bit_val(wqe, 8, 0);
+   set_64bit_val(wqe, 16, 0);
+
+   header = LS_64(I40IWQP_OP_NOP, I40IWQPSQ_OPCODE) |
+   LS_64(signaled, I40IWQPSQ_SIGCOMPL) |
+   LS_64(qp->swqe_polarity, I40IWQPSQ_VALID) | nop_signature++;
+
+   wmb();  /* Memory barrier to ensure data is written before valid bit is set */
+
+   set_64bit_val(wqe, 24, header);
+   return 0;
+}
+
+/**
+ * i40iw_qp_post_wr - post wr to hardware
+ * @qp: hw qp ptr
+ */
+void i40iw_qp_post_wr(struct i40iw_qp_uk *qp)
+{
+   u64 temp;
+   u32 hw_sq_tail;
+   u32 sw_sq_head;
+
+   wmb(); /* make sure valid bit is written */
+
+   /* read the doorbell shadow area */
+   get_64bit_val(qp->shadow_area, 0, &temp);
+
+   rmb(); /* make sure read is finished */
+
+   hw_sq_tail = (u32)RS_64(temp, I40IW_QP_DBSA_HW_SQ_TAIL);
+   sw_sq_head = I40IW_RING_GETCURRENT_HEAD(qp->sq_ring);
+   if (sw_sq_head != hw_sq_tail) {
+   if (sw_sq_head > qp->initial_ring.head) {
+   if ((hw_sq_tail >= qp->initial_ring.head) &&
+   (hw_sq_tail < sw_sq_head)) {
+   db_wr32(qp->wqe_alloc_reg, qp->qp_id);
+   }
+   } else if (sw_sq_head != qp->initial_ring.head) {
+   if ((hw_sq_tail >= qp->initial_ring.head) ||
+   (hw_sq_tail < sw_sq_head)) {
+   db_wr32(qp->wqe_alloc_reg, qp->qp_id);
+   }
+   }
+   }
+
+   qp->initial_ring.head = qp->sq_ring.head;
+}
+
+/**
+ * i40iw_qp_ring_push_db -  ring qp doorbell
+ * @qp: hw qp ptr
+ * @wqe_idx: wqe index
+

Re: net: heap-out-of-bounds in sock_setsockopt

2015-12-16 Thread Eric Dumazet

On Wed, Dec 16, 2015 at 12:22 PM, Cong Wang  wrote:

> Hmm, we should exclude the raw socket case, something like the
> following, but I am not sure if the check is too strict or not, also
> not sure if we should return an error for this raw socket case.
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 765be83..c26e80a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -872,7 +872,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>
> if (val & SOF_TIMESTAMPING_OPT_ID &&
> !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> -   if (sk->sk_protocol == IPPROTO_TCP) {
> +   if (sk->sk_protocol == IPPROTO_TCP && sk->sk_type == SOCK_STREAM) {
> if (sk->sk_state != TCP_ESTABLISHED) {
> ret = -EINVAL;
> break;

This looks right, please post this officially ;)

tcp_sk(sk) only works for TCP sockets, and the test must include
sk->sk_type == SOCK_STREAM
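The condition Eric asks for can be stated as a small predicate. The user-space sketch below is hypothetical (opt_id_uses_tcp_state() is not a kernel function) and takes the protocol and socket type as plain ints instead of reading sk->sk_protocol and sk->sk_type. It shows why Dmitry's reproducer, a SOCK_RAW socket opened with IPPROTO_TCP, must not take the TCP path even though its protocol number is IPPROTO_TCP: the underlying object is a raw socket, not a tcp_sock.

```c
#include <stdbool.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */
#include <sys/socket.h>   /* SOCK_STREAM, SOCK_RAW, SOCK_DGRAM */

/* Hypothetical restatement of the proposed check: only a genuine TCP
 * socket (protocol TCP *and* type SOCK_STREAM) may be treated as a
 * tcp_sock, e.g. before consulting TCP state via tcp_sk(sk). */
static bool opt_id_uses_tcp_state(int protocol, int type)
{
	return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}
```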

net: heap-out-of-bounds in sock_setsockopt

2015-12-16 Thread Dmitry Vyukov

Hello,

The following program triggers heap-out-of-bounds access in sock_setsockopt:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define SOF_TIMESTAMPING_OPT_ID (1<<7)

int main()
{
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_TCP);
struct sockaddr_in6 sa = {};
sa.sin6_family = AF_INET6;
sa.sin6_port = htons(13277);
inet_pton(AF_INET6, "::1", &sa.sin6_addr);
connect(fd, (struct sockaddr*)&sa, sizeof(sa));
int opt = SOF_TIMESTAMPING_OPT_ID;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &opt, sizeof(opt));
return 0;
}


BUG: KASAN: slab-out-of-bounds in sock_setsockopt+0x1284/0x13d0 at addr 88006563ec10
Read of size 4 by task syzkaller_execu/4755
=
BUG RAWv6 (Not tainted): kasan: bad access detected
-
INFO: Allocated in sk_prot_alloc+0x69/0x340 age=17 cpu=3 pid=4755
[<  none  >] kmem_cache_alloc+0x244/0x2c0 mm/slub.c:2607
[<  none  >] sk_prot_alloc+0x69/0x340 net/core/sock.c:1343
[<  none  >] sk_alloc+0x3a/0x6b0 net/core/sock.c:1418
[<  none  >] inet6_create+0x2c4/0xfd0 net/ipv6/af_inet6.c:170
[<  none  >] __sock_create+0x37c/0x640 net/socket.c:1162
[< inline >] sock_create net/socket.c:1202
[< inline >] SYSC_socket net/socket.c:1232
[<  none  >] SyS_socket+0xef/0x1b0 net/socket.c:1212
[<  none  >] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185

Call Trace:
 [] __asan_report_load4_noabort+0x3e/0x40 mm/kasan/report.c:294
 [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
 [< inline >] SYSC_setsockopt net/socket.c:1746
 [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
 [] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185


On commit edb42dc7bc0da0125ceacab810a553ce1f0cac8d (Dec 15).

RE: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread Haiyang Zhang

> -Original Message-
> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: Wednesday, December 16, 2015 12:08 PM
> 
> This looks very very wrong to me.
> 
> How many times this is called per second, for the 'one flow' case ?
> 
> Don't you use TSO in this driver ?
> 
> What about encapsulation ?
> 
> I suspect you have a quite different issue here.
> 
> You simply could use skb_get_hash() since local TCP flows will provide a
> l4 skb->hash and you have no further flow dissection to do.

In our test, we bisected and found that the following patch introduced a big
overhead into skb_flow_dissect_flow_keys() and caused a performance
regression:
commit: d34af823
net: Add VLAN ID to flow_keys

This patch didn't add many instructions, but we think the change to
the size of struct flow_keys may change the cache miss rate...

To avoid affecting other drivers that use this function, our patch limits the
change to our driver to fix this performance regression.

Regarding your suggestion of skb_get_hash(), I looked at the code and ran
some tests, and found that the skb->l4_hash and skb->sw_hash bits are not set,
so it calls __skb_get_hash(), which eventually calls
skb_flow_dissect_flow_keys(). So it still incurs the performance
overhead mentioned above.

static inline __u32 skb_get_hash(struct sk_buff *skb)
{
if (!skb->l4_hash && !skb->sw_hash)
__skb_get_hash(skb);

return skb->hash;
}


void __skb_get_hash(struct sk_buff *skb)
{
struct flow_keys keys;

__flow_hash_secret_init();

__skb_set_sw_hash(skb, ___skb_get_hash(skb, &keys, hashrnd),
  flow_keys_have_l4(&keys));
}


static inline u32 ___skb_get_hash(const struct sk_buff *skb,
  struct flow_keys *keys, u32 keyval)
{
skb_flow_dissect_flow_keys(skb, keys,
   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);

return __flow_hash_from_keys(keys, keyval);
}


Thanks,
- Haiyang
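The lazy-caching pattern in the quoted skb_get_hash() explains Haiyang's measurement: the expensive path runs whenever neither l4_hash nor sw_hash is set, which is the case for every skb that reaches the driver without a hash already attached, as Haiyang observed in his tests. The user-space model below is a simplification; struct pkt, pkt_get_hash(), pkt_compute_hash() and dissect_calls are hypothetical stand-ins, not kernel types, and the model only sets sw_hash, whereas __skb_set_sw_hash() also records whether the hash covers L4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hash-related fields of struct sk_buff. */
struct pkt {
	uint32_t hash;
	bool l4_hash;	/* hash already supplied, e.g. by the stack */
	bool sw_hash;	/* hash computed in software by an earlier call */
};

static unsigned int dissect_calls;	/* counts the expensive path */

/* Stand-in for __skb_get_hash(): the flow-dissector path. */
static void pkt_compute_hash(struct pkt *p)
{
	dissect_calls++;
	p->hash = 0x12345678u;	/* placeholder hash value */
	p->sw_hash = true;
}

/* Mirrors the structure of skb_get_hash(): compute once, then reuse. */
static uint32_t pkt_get_hash(struct pkt *p)
{
	if (!p->l4_hash && !p->sw_hash)
		pkt_compute_hash(p);
	return p->hash;
}
```

Repeated calls on the same packet are cheap, but each new packet without a hash pays the dissector cost once, so skb_get_hash() only avoids the overhead when the hash is already present.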

[PATCH 01/15] i40e: Add support for client interface for IWARP driver

2015-12-16 Thread Faisal Latif

From: Anjali Singhai Jain 

This patch adds a client interface to support the i40iw driver. It
also expands the Virtchannel to support messages from the i40evf
driver on behalf of the i40iwvf driver.

This client API is used by the i40iw and i40iwvf drivers
to access the core driver resources brokered by the i40e driver.

Signed-off-by: Anjali Singhai Jain 
---
 drivers/net/ethernet/intel/i40e/Makefile   |1 +
 drivers/net/ethernet/intel/i40e/i40e.h |   22 +
 drivers/net/ethernet/intel/i40e/i40e_client.c  | 1012 
 drivers/net/ethernet/intel/i40e/i40e_client.h  |  232 +
 drivers/net/ethernet/intel/i40e/i40e_main.c|  115 ++-
 drivers/net/ethernet/intel/i40e/i40e_type.h|3 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl.h|   34 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  247 -
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |4 +
 9 files changed, 1657 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_client.c
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_client.h

diff --git a/drivers/net/ethernet/intel/i40e/Makefile b/drivers/net/ethernet/intel/i40e/Makefile
index b4729ba..3b3c63e 100644
--- a/drivers/net/ethernet/intel/i40e/Makefile
+++ b/drivers/net/ethernet/intel/i40e/Makefile
@@ -41,6 +41,7 @@ i40e-objs := i40e_main.o \
i40e_diag.o \
i40e_txrx.o \
i40e_ptp.o  \
+   i40e_client.o   \
i40e_virtchnl_pf.o
 
 i40e-$(CONFIG_I40E_DCB) += i40e_dcb.o i40e_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 4dd3e26..1417ae8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -59,6 +59,7 @@
 #ifdef I40E_FCOE
 #include "i40e_fcoe.h"
 #endif
+#include "i40e_client.h"
 #include "i40e_virtchnl.h"
 #include "i40e_virtchnl_pf.h"
 #include "i40e_txrx.h"
@@ -178,6 +179,7 @@ struct i40e_lump_tracking {
u16 search_hint;
u16 list[0];
 #define I40E_PILE_VALID_BIT  0x8000
+#define I40E_IWARP_IRQ_PILE_ID  (I40E_PILE_VALID_BIT - 2)
 };
 
 #define I40E_DEFAULT_ATR_SAMPLE_RATE   20
@@ -264,6 +266,8 @@ struct i40e_pf {
 #endif /* I40E_FCOE */
u16 num_lan_qps;   /* num lan queues this PF has set up */
u16 num_lan_msix;  /* num queue vectors for the base PF vsi */
+   u16 num_iwarp_msix;/* num of iwarp vectors for this PF */
+   int iwarp_base_vector;
int queues_left;   /* queues left unclaimed */
u16 rss_size;  /* num queues in the RSS array */
u16 rss_size_max;  /* HW defined max RSS queues */
@@ -313,6 +317,7 @@ struct i40e_pf {
 #define I40E_FLAG_16BYTE_RX_DESC_ENABLED   BIT_ULL(13)
 #define I40E_FLAG_CLEAN_ADMINQ BIT_ULL(14)
 #define I40E_FLAG_FILTER_SYNC  BIT_ULL(15)
+#define I40E_FLAG_SERVICE_CLIENT_REQUESTED BIT_ULL(16)
 #define I40E_FLAG_PROCESS_MDD_EVENTBIT_ULL(17)
 #define I40E_FLAG_PROCESS_VFLR_EVENT   BIT_ULL(18)
 #define I40E_FLAG_SRIOV_ENABLEDBIT_ULL(19)
@@ -550,6 +555,8 @@ struct i40e_vsi {
struct kobject *kobj;  /* sysfs object */
bool current_isup; /* Sync 'link up' logging */
 
+   void *priv; /* client driver data reference. */
+
/* VSI specific handlers */
irqreturn_t (*irq_handler)(int irq, void *data);
 
@@ -702,6 +709,10 @@ void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
  struct i40e_vsi_context *ctxt,
  u8 enabled_tc, bool is_add);
 #endif
+void i40e_service_event_schedule(struct i40e_pf *pf);
+void i40e_notify_client_of_vf_msg(struct i40e_vsi *vsi, u32 vf_id,
+ u8 *msg, u16 len);
+
 int i40e_vsi_control_rings(struct i40e_vsi *vsi, bool enable);
 int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count);
 struct i40e_veb *i40e_veb_setup(struct i40e_pf *pf, u16 flags, u16 uplink_seid,
@@ -724,6 +735,17 @@ static inline void i40e_dbg_pf_exit(struct i40e_pf *pf) {}
 static inline void i40e_dbg_init(void) {}
 static inline void i40e_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS*/
+/* needed by client drivers */
+int i40e_lan_add_device(struct i40e_pf *pf);
+int i40e_lan_del_device(struct i40e_pf *pf);
+void i40e_client_subtask(struct i40e_pf *pf);
+void i40e_notify_client_of_l2_param_changes(struct i40e_vsi *vsi);
+void i40e_notify_client_of_netdev_open(struct i40e_vsi *vsi);
+void i40e_notify_client_of_netdev_close(struct i40e_vsi *vsi, bool reset);
+void i40e_notify_client_of_vf_enable(struct i40e_pf *pf, u32 num_vfs);
+void i40e_notify_client_of_vf_reset(struct i40e_pf *pf, u32 vf_id);
+int i40e_vf_client_capable(struct i40e_pf *pf, u32 vf_id,
+  enum i40e_client_type type);
 /**
  * i40e_irq_dynamic_enable -

[PATCH 00/15] add Intel(R) X722 iWARP driver

2015-12-16 Thread Faisal Latif

This series contains the addition of the i40iw.ko driver.

This driver provides iWARP RDMA functionality for the Intel(R) X722
Ethernet controller for PCI Physical Functions. It also supports
a Virtual Function driver (i40iwvf.ko) that will be part of a separate
patch series.

It cooperates with the Intel(R) X722 base driver (i40e.ko) to allocate
resources and program the controller.

This series includes 1 patch to i40e.ko to provide interface support
to i40iw.ko. The interface provides a driver registration mechanism,
resource allocations, and device reset coordination mechanisms.

This patch series is based on Doug Ledford's
/github.com/dledford/linux.git


Anjali Singhai Jain (1):
net/ethernet/intel/i40e: Add support for client interface for IWARP driver

Faisal Latif (14):
infiniband/hw/i40iw: add main, hdr, status
infiniband/hw/i40iw: add connection management code
infiniband/hw/i40iw: add puda code
infiniband/hw/i40iw: add pble resource files
infiniband/hw/i40iw: add hmc resource files
infiniband/hw/i40iw: add hw and utils files
infiniband/hw/i40iw: add files for iwarp interface
infiniband/hw/i40iw: add file to handle cqp calls
infiniband/hw/i40iw: add hardware related header files
infiniband/hw/i40iw: add X722 register file
infiniband/hw/i40iw: user kernel shared files
infiniband/hw/i40iw: virtual channel handling files
infiniband/hw/i40iw: Kconfig and Kbuild for iwarp module
infiniband/hw/i40iw: changes for build of i40iw module


Re: [PATCH 02/15] i40iw: add main, hdr, status

2015-12-16 Thread Joe Perches

On Wed, 2015-12-16 at 13:58 -0600, Faisal Latif wrote:
> i40iw_main.c contains routines for i40e <=> i40iw interface and setup.
> i40iw.h is header file for main device data structures.
> i40iw_status.h is for return status codes.
[]
> diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
[]
> +#define i40iw_pr_err(fmt, args ...) pr_err("%s: error " fmt, __func__, ## args)
> +
> +#define i40iw_pr_info(fmt, args ...) pr_info("%s: " fmt, __func__, ## args)
> +
> +#define i40iw_pr_warn(fmt, args ...) pr_warn("%s: " fmt, __func__, ## args)

Using "error " in the output doesn't really add much
as there's already a KERN_ERR with the output.

Using __func__ hardly adds anything.

Using netdev_ is generally preferred

> +
> +struct i40iw_cqp_request {
> + struct cqp_commands_info info;
> + wait_queue_head_t waitq;
> + struct list_head list;
> + atomic_t refcount;
> + void (*callback_fcn)(struct i40iw_cqp_request*, u32);
> + void *param;
> + struct i40iw_cqp_compl_info compl_info;
> + u8 waiting:1;
> + u8 request_done:1;
> + u8 dynamic:1;
> + u8 polling:1;

These bitfields might be better as bool.
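Joe does not spell out his reasons, but one concrete difference between a 1-bit unsigned bitfield and a bool is how a non-zero value other than 1 is stored. The sketch below uses hypothetical cut-down structs (not the driver's struct i40iw_cqp_request) and unsigned int bitfields instead of u8 so that it is portable C. Another commonly cited reason is that adjacent bitfields form a single memory location, so setting one flag is a read-modify-write of the byte shared with its neighbours, whereas separate bool members can be written independently.

```c
#include <stdbool.h>

/* Hypothetical cut-downs of a request struct, for illustration only. */
struct req_bits {
	unsigned int waiting:1;
	unsigned int request_done:1;
};

struct req_bool {
	bool waiting;
	bool request_done;
};

/* A 1-bit unsigned bitfield keeps only the low bit of the stored value. */
static int bits_store(unsigned int v)
{
	struct req_bits r = {0};

	r.waiting = v;	/* v is reduced modulo 2 */
	return r.waiting;
}

/* A bool converts any non-zero value to true (1). */
static int bool_store(unsigned int v)
{
	struct req_bool r = {0};

	r.waiting = v;
	return r.waiting;
}
```

So storing the result of a mask test such as (flags & 0x2) works as expected with a bool, but silently stores 0 in a 1-bit field.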


Re: [PATCH iproute2] ipv6: allow setting stable_secret mode

2015-12-16 Thread Hannes Frederic Sowa

On 16.12.2015 17:56, Bjørn Mork wrote:
> 
> 
> On December 16, 2015 12:30:08 PM CET, Hannes Frederic Sowa wrote:
>> Signed-off-by: Hannes Frederic Sowa 
>> ---
>> ip/iplink.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/ip/iplink.c b/ip/iplink.c
>> index f30de86d1858a0..e824082f7d8149 100644
>> --- a/ip/iplink.c
>> +++ b/ip/iplink.c
>> @@ -84,7 +84,7 @@ void iplink_usage(void)
>>  fprintf(stderr, "  [ state { auto | enable | disable} ] ]\n");
>>  fprintf(stderr, " [ master DEVICE ]\n");
>>  fprintf(stderr, " [ nomaster ]\n");
>> -fprintf(stderr, " [ addrgenmode { eui64 | none } ]\n");
>> +fprintf(stderr, " [ addrgenmode { eui64 | none | stable_secret } ]\n");
>>  fprintf(stderr, " [ protodown { on | off } ]\n");
>>  fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] [master DEV] [type TYPE]\n");
>>
>> @@ -176,6 +176,8 @@ static int get_addr_gen_mode(const char *mode)
>>  return IN6_ADDR_GEN_MODE_EUI64;
>>  if (strcasecmp(mode, "none") == 0)
>>  return IN6_ADDR_GEN_MODE_NONE;
>> +if (strcasecmp(mode, "stable_secret") == 0)
>> +return IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
>>  return -1;
>> }
>>
> 
> Ah, didn't notice this before I sent my similar patch. Sorry about that.
> 
> But FWIW, I really don't like the renaming you do here. 
> If the mode is 'stable_privacy', then it should be known
> by that name here as well.

I am fine with both ways. We can take your patch as you also updated the
man page. :)

The reason I chose stable_secret in the end was to align it with the
already visible proc filename stable_secret; I didn't care too much
about the internal variable names.

Thanks,
Hannes
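Whichever spelling is chosen, the parsing in get_addr_gen_mode() is a string-to-enum mapping. The sketch below is a hypothetical table-driven variant, not iproute2 code: the ADDR_GEN_* enum is a local stand-in for the kernel's IN6_ADDR_GEN_MODE_* constants, and get_addr_gen_mode_sketch() is an illustrative name. With a table, the naming question discussed here comes down to a single entry.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp */

/* Local stand-ins for IN6_ADDR_GEN_MODE_*, so the sketch is self-contained. */
enum addr_gen_mode {
	ADDR_GEN_EUI64,
	ADDR_GEN_NONE,
	ADDR_GEN_STABLE_PRIVACY,
};

/* The user-visible spelling of the third mode is one table entry. */
static const struct {
	const char *name;
	enum addr_gen_mode mode;
} addr_gen_modes[] = {
	{ "eui64",         ADDR_GEN_EUI64 },
	{ "none",          ADDR_GEN_NONE },
	{ "stable_secret", ADDR_GEN_STABLE_PRIVACY },
};

/* Case-insensitive lookup; returns -1 for an unknown mode, like the patch. */
static int get_addr_gen_mode_sketch(const char *mode)
{
	for (size_t i = 0; i < sizeof(addr_gen_modes) / sizeof(addr_gen_modes[0]); i++)
		if (strcasecmp(mode, addr_gen_modes[i].name) == 0)
			return addr_gen_modes[i].mode;
	return -1;
}
```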


Re: [PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-16 Thread Richard Cochran

On Wed, Dec 16, 2015 at 07:31:30PM +0100, Gregory CLEMENT wrote:
> +Optional properties for PHY child node:
> +- reset-gpios : Should specify the gpio for phy reset

reset-gpios plural or reset-gpio singular?

> +
>  Examples:
>  
>   macb0: ethernet@fffc4000 {
> @@ -29,4 +32,8 @@ Examples:
>   local-mac-address = [3a 0e 03 04 05 06];
>   clock-names = "pclk", "hclk", "tx_clk";
>   clocks = < 30>, < 30>, < 13>;
> + ethernet-phy@1 {
> + reg = <0x1>;
> + reset-gpios = < 6 1>;

Did you mean "gpioE" ?

Thanks,
Richard

Re: [PATCH net-next] ipv6: add IPV6_HDRINCL option for raw sockets

2015-12-16 Thread Joe Perches

On Wed, 2015-12-16 at 17:22 +0100, Hannes Frederic Sowa wrote:
> Same as in Windows, we miss IPV6_HDRINCL for SOL_IPV6 and SOL_RAW.
> The SOL_IP/IP_HDRINCL is not available for IPv6 sockets.
[]
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
[]
> @@ -972,6 +972,11 @@ static int do_rawv6_setsockopt(struct sock *sk, int level, int optname,
>   return -EFAULT;
>  
>   switch (optname) {
> + case IPV6_HDRINCL:
> + if (sk->sk_type != SOCK_RAW)
> + return -EINVAL;
> + inet_sk(sk)->hdrincl = !!val;

trivia:

ipv4/ip_sockglue.c uses the ternary '? 1 : 0' convention for this.
It might be nicer to be consistent.

Then again, ip_sockglue.c is inconsistent about that too.

net/ipv4/ip_sockglue.c:736: inet->hdrincl = val ? 1 : 0;
net/ipv4/ip_sockglue.c:743: inet->nodefrag = val ? 1 : 0;
net/ipv4/ip_sockglue.c:746: inet->bind_address_no_port = val ? 1 : 0;
net/ipv4/ip_sockglue.c:754: inet->recverr = !!val;
net/ipv4/ip_sockglue.c:772: inet->mc_loop = !!val;
net/ipv4/ip_sockglue.c:1123:inet->freebind = !!val;

There's no change to object code either way.
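Both idioms map 0 to 0 and any non-zero int to 1, which is why the choice is purely stylistic. The two hypothetical helpers below make that concrete; whether they also compile to identical code, as stated above, can be checked from the compiler's assembly output (for example gcc -O2 -S). The normalisation itself matters because, assuming hdrincl is a one-bit bitfield as the inet_sock flag fields of this period are, assigning an unnormalised value such as 2 would store 0.

```c
/* The two normalisation idioms, as plain functions. */
static int norm_not_not(int val)
{
	return !!val;		/* logical NOT twice: 0 -> 0, non-zero -> 1 */
}

static int norm_ternary(int val)
{
	return val ? 1 : 0;	/* explicit conditional, same mapping */
}
```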


Re: [PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-16 Thread Arnd Bergmann

On Wednesday 16 December 2015 19:31:30 Gregory CLEMENT wrote:
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 88c1e1a..35661aa 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 

Is this the patch that is already in linux-next?

I needed an additional

#include 

to avoid this build error on randconfig builds without GPIOLIB:

drivers/net/ethernet/cadence/macb.c: In function 'macb_probe':
drivers/net/ethernet/cadence/macb.c:2908:19: error: implicit declaration of function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
  bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
                   ^
drivers/net/ethernet/cadence/macb.c:2909:8: error: 'GPIOD_OUT_HIGH' undeclared (first use in this function)
        GPIOD_OUT_HIGH);
        ^
drivers/net/ethernet/cadence/macb.c:2909:8: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/cadence/macb.c: In function 'macb_remove':
drivers/net/ethernet/cadence/macb.c:2979:3: error: implicit declaration of function 'gpiod_set_value' [-Werror=implicit-function-declaration]


Arnd

Re: [PATCH iproute2] ipv6: allow setting stable_secret mode

2015-12-16 Thread Bjørn Mork



On December 16, 2015 12:30:08 PM CET, Hannes Frederic Sowa wrote:
>Signed-off-by: Hannes Frederic Sowa 
>---
> ip/iplink.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/ip/iplink.c b/ip/iplink.c
>index f30de86d1858a0..e824082f7d8149 100644
>--- a/ip/iplink.c
>+++ b/ip/iplink.c
>@@ -84,7 +84,7 @@ void iplink_usage(void)
>   fprintf(stderr, "  [ state { auto | enable | disable} ] ]\n");
>   fprintf(stderr, " [ master DEVICE ]\n");
>   fprintf(stderr, " [ nomaster ]\n");
>-  fprintf(stderr, " [ addrgenmode { eui64 | none } ]\n");
>+  fprintf(stderr, " [ addrgenmode { eui64 | none | stable_secret } ]\n");
>   fprintf(stderr, " [ protodown { on | off } ]\n");
>   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] [master DEV] [type TYPE]\n");
> 
>@@ -176,6 +176,8 @@ static int get_addr_gen_mode(const char *mode)
>   return IN6_ADDR_GEN_MODE_EUI64;
>   if (strcasecmp(mode, "none") == 0)
>   return IN6_ADDR_GEN_MODE_NONE;
>+  if (strcasecmp(mode, "stable_secret") == 0)
>+  return IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
>   return -1;
> }
> 

Ah, didn't notice this before I sent my similar patch. Sorry about that.

But FWIW, I really don't like the renaming you do here. 
If the mode is 'stable_privacy', then it should be known
by that name here as well.


Bjørn

Re: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread Sergei Shtylyov


On 12/16/2015 09:34 PM, Sergei Shtylyov wrote:


To avoid performance overhead when using skb_flow_dissect_flow_keys(),
we switch to the simple parsers to get the IP and port numbers.

Performance comparison: throughput (Gbps)

Connections   Before patch   After patch
1             8.56           10.18
4             11.17          14.07
16            12.21          21.78
64            18.71          32.08
256           15.92          26.32
1024          8.41           15.49
3000          7.82           11.58

Signed-off-by: Haiyang Zhang 
Tested-by: Simon Xiao 
Reviewed-by: K. Y. Srinivasan 
---
  drivers/net/hyperv/netvsc_drv.c |   38 +-
  1 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1c8db9a..e28951f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -237,20 +237,40 @@ static u32 comp_hash(u8 *key, int klen, void *data, int dlen)

[...]

+	if (iphdr->version == 4) {
+		dbuf[0] = iphdr->saddr;
+		dbuf[1] = iphdr->daddr;
+		if (iphdr->protocol == IPPROTO_TCP) {
+			dbuf[2] = *(__be32 *)&tcp_hdr(skb)->source;
+			data_len = 12;
+		} else {
+			data_len = 8;
+		}
+	} else if (ipv6hdr->version == 6) {
+		memcpy(dbuf, &ipv6hdr->saddr, 32);
+		if (ipv6hdr->nexthdr == IPPROTO_TCP) {
+			dbuf[8] = *(__be32 *)&tcp_hdr(skb)->source;
+			data_len = 36;
+		} else {
+			data_len = 32;
+		}
+	} else {
+		return false;
+	}


This is asking to be a *switch* statement.


   Oops, never mind. I'd misread the code.


[...]


MBR, Sergei


[PATCH net-next] nfp: clear ring delayed kick counters

2015-12-16 Thread Jakub Kicinski

We need to clear the delayed kick counters when we free rings; otherwise,
after ndo_close()/ndo_open() we could kick the HW by more entries than
were actually written to the rings.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 7060539d276a..6c5af4cb5bdc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1363,6 +1363,7 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring *tx_ring)
tx_ring->wr_p = 0;
tx_ring->rd_p = 0;
tx_ring->qcp_rd_p = 0;
+   tx_ring->wr_ptr_add = 0;
 
tx_ring->txbufs = NULL;
tx_ring->txds = NULL;
@@ -1437,6 +1438,7 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring *rx_ring)
rx_ring->cnt = 0;
rx_ring->wr_p = 0;
rx_ring->rd_p = 0;
+   rx_ring->wr_ptr_add = 0;
 
rx_ring->rxbufs = NULL;
rx_ring->rxds = NULL;
-- 
1.9.1
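The failure mode fixed here can be modelled in userspace: descriptors that were written but not yet announced to the hardware are counted in wr_ptr_add, and a stale count survives ndo_close()/ndo_open() unless the free path clears it. In this minimal sketch only the field names wr_p, rd_p and wr_ptr_add come from the patch; struct demo_ring and the demo_* functions are hypothetical, and the real driver's kick path is more involved.

```c
#include <stdbool.h>

/* Hypothetical, simplified ring holding only the fields the patch touches. */
struct demo_ring {
	unsigned int wr_p;       /* producer index */
	unsigned int rd_p;       /* consumer index */
	unsigned int wr_ptr_add; /* descriptors written but not yet kicked */
};

/* Queue n descriptors without notifying the hardware yet. */
static void demo_ring_write(struct demo_ring *r, unsigned int n)
{
	r->wr_p += n;
	r->wr_ptr_add += n;
}

/* Tell the hardware about everything written since the last kick. */
static unsigned int demo_ring_kick(struct demo_ring *r)
{
	unsigned int n = r->wr_ptr_add;

	r->wr_ptr_add = 0;
	return n;
}

/* Reset on free; clear_kick = true models the behaviour with the fix. */
static void demo_ring_free(struct demo_ring *r, bool clear_kick)
{
	r->wr_p = 0;
	r->rd_p = 0;
	if (clear_kick)
		r->wr_ptr_add = 0;
}
```

Without the reset, five descriptors queued but never kicked before the ring is freed are still counted afterwards, so writing two new descriptors and kicking announces seven.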


[PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-16 Thread Gregory CLEMENT

With device tree it is no longer possible to reset the PHY at board
level. Furthermore, doing it in the driver allows powering down the PHY
when the network interface is no longer used.

This reset can't be done at the PHY driver level. The PHY must be able to
answer the MII bus scan to let the kernel create a PHY device.

The patch introduces a new optional property "reset-gpios" at the PHY
level.

Signed-off-by: Gregory CLEMENT 
---
Hi,

I agree with Sasha that we should start with a good binding; indeed, the
reset is more related to the PHY than to the MAC, even if currently we
have to manipulate it at the MAC level.

I also followed Russell's advice not to use fwnode functions. However,
the following code seems to work:

struct fwnode_handle *phy_node =
	device_get_next_child_node(&pdev->dev, NULL);
bp->reset_gpio = fwnode_get_named_gpiod(phy_node, "reset-gpios");
if (IS_ERR(bp->reset_gpio))
	bp->reset_gpio = NULL;
gpiod_set_value(bp->reset_gpio, GPIOD_OUT_HIGH);

Given that it doesn't make the code better, I kept the of_get_named_gpio()
method.

Gregory

Documentation/devicetree/bindings/net/macb.txt |  7 +++
 drivers/net/ethernet/cadence/macb.c| 16 
 drivers/net/ethernet/cadence/macb.h|  1 +
 3 files changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index b5d7976..38c8e84 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -19,6 +19,9 @@ Required properties:
Optional elements: 'tx_clk'
 - clocks: Phandles to input clocks.
 
+Optional properties for PHY child node:
+- reset-gpios : Should specify the gpio for phy reset
+
 Examples:
 
macb0: ethernet@fffc4000 {
@@ -29,4 +32,8 @@ Examples:
local-mac-address = [3a 0e 03 04 05 06];
clock-names = "pclk", "hclk", "tx_clk";
clocks = < 30>, < 30>, < 13>;
+   ethernet-phy@1 {
+   reg = <0x1>;
+   reset-gpios = <&pioE 6 1>;
+   };
};
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 88c1e1a..35661aa 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -2813,6 +2814,7 @@ static int macb_probe(struct platform_device *pdev)
  = macb_clk_init;
int (*init)(struct platform_device *) = macb_init;
struct device_node *np = pdev->dev.of_node;
+   struct device_node *phy_node;
const struct macb_config *macb_config = NULL;
struct clk *pclk, *hclk, *tx_clk;
unsigned int queue_mask, num_queues;
@@ -2900,6 +2902,16 @@ static int macb_probe(struct platform_device *pdev)
else
macb_get_hwaddr(bp);
 
+   /* Power up the PHY if there is a GPIO reset */
+   phy_node =  of_get_next_available_child(np, NULL);
+   if (phy_node) {
+   int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
+   if (gpio_is_valid(gpio))
+   bp->reset_gpio = gpio_to_desc(gpio);
+   gpiod_set_value(bp->reset_gpio, GPIOD_OUT_HIGH);
+   }
+   of_node_put(phy_node);
+
err = of_get_phy_mode(np);
if (err < 0) {
pdata = dev_get_platdata(&pdev->dev);
@@ -2966,6 +2978,10 @@ static int macb_remove(struct platform_device *pdev)
mdiobus_unregister(bp->mii_bus);
kfree(bp->mii_bus->irq);
mdiobus_free(bp->mii_bus);
+
+   /* Shutdown the PHY if there is a GPIO reset */
+   gpiod_set_value(bp->reset_gpio, GPIOD_OUT_LOW);
+
unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);
clk_disable_unprepare(bp->hclk);
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 6e1faea..b6ec421 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -824,6 +824,7 @@ struct macb {
unsigned intdma_burst_length;
 
phy_interface_t phy_interface;
+   struct gpio_desc*reset_gpio;
 
/* AT91RM9200 transmit */
struct sk_buff *skb;/* holds skb until xmit interrupt completes */
-- 
2.5.0
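A note on the gpiod_set_value() calls in this patch: gpiod_set_value() takes a logical value (0 = inactive, 1 = active), while GPIOD_OUT_LOW and GPIOD_OUT_HIGH are enum gpiod_flags bitmasks intended for the gpiod_get*() family. The userspace sketch below copies those flag definitions as they are believed to have looked in include/linux/gpio/consumer.h at the time (an assumption; check the tree you build against) and shows that both constants are non-zero. If that holds, the GPIOD_OUT_LOW call in macb_remove() would most likely request the active level rather than the inactive one. requested_level() is a hypothetical stand-in, not a kernel function.

```c
/* Userspace copies of the gpiod_flags definitions (assumed to match
 * include/linux/gpio/consumer.h of that era). */
#define GPIOD_FLAGS_BIT_DIR_SET (1 << 0)
#define GPIOD_FLAGS_BIT_DIR_OUT (1 << 1)
#define GPIOD_FLAGS_BIT_DIR_VAL (1 << 2)

enum gpiod_flags {
	GPIOD_ASIS = 0,
	GPIOD_IN = GPIOD_FLAGS_BIT_DIR_SET,
	GPIOD_OUT_LOW = GPIOD_FLAGS_BIT_DIR_SET | GPIOD_FLAGS_BIT_DIR_OUT,
	GPIOD_OUT_HIGH = GPIOD_FLAGS_BIT_DIR_SET | GPIOD_FLAGS_BIT_DIR_OUT |
			 GPIOD_FLAGS_BIT_DIR_VAL,
};

/* Hypothetical helper: the logical level that a 0/1-style "value"
 * argument requests, treating any non-zero value as "active". */
static int requested_level(int value)
{
	return value != 0;
}
```

Passing plain 1 and 0 to gpiod_set_value() would state the intent directly.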


Re: Information leak in llcp_sock_bind/llcp_raw_sock_bind

2015-12-16 Thread Dmitry Vyukov

On Tue, Dec 15, 2015 at 9:58 PM, David Miller  wrote:
> From: Dmitry Vyukov 
> Date: Tue, 15 Dec 2015 21:55:37 +0100
>
>> I've seen a kernel address at least in pptp_bind,
>
> We're not talking about pptp_bind.
>
> We're talking about llcp_{,raw}_sock_bind().
>
> If your hex dump doesn't show it, don't report anything unless you are
> absolutely sure via code inspection that there could be a leak.  And
> in that case make it perfectly clear exactly how that can happen.
>
> I am generally unimpressed with your reports half of the time,
> and just a small amount of extra effort would extraordinarily
> improve the quality of the things your post.
>
> Thanks.
>
>> So it is almost impossible to prove that a PC cannot be leaked.
>
> You can't show that anything is actually being leaked in this specific
> case, period.

I am human and sometimes make mistakes.

In this case, I checked that the bind succeeds when I pass
sockaddrlen=0, which is suspicious and matches the behavior of the pptp
case (which does not check sockaddrlen at all). Then I looked at the code
and misread it, because it uses a different idiom from the other cases I
saw (which explicitly check the sockaddrlen value). Then I wrote a test
program and observed varying, garbage-looking values returned from
getsockname(). From that I concluded that there is an information leak.
This is wrong.

For the purpose of improving my reports, which other reports were you
not impressed with, and why?

Thank you
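Dmitry contrasts two ways a bind() handler can treat the caller-supplied address length: rejecting lengths that are too short outright, or tolerating them. When short lengths are tolerated, one way to stay leak-free is to zero a local copy and copy only min(sizeof, addrlen) bytes. The userspace sketch below shows that idiom; struct demo_sockaddr and demo_copy_addr() are hypothetical, and the code is not taken from llcp_sock_bind().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical address type standing in for a protocol's sockaddr. */
struct demo_sockaddr {
	unsigned short family;
	unsigned int dev_idx;
	unsigned char service_name[16];
};

/* Copy a caller-supplied address of length addrlen into *dst.
 * Zeroing first and copying at most sizeof(*dst) bytes means fields the
 * caller did not supply (including everything when addrlen == 0) read
 * back as zero, never as stale memory. Returns the number of bytes copied. */
static size_t demo_copy_addr(struct demo_sockaddr *dst, const void *src,
			     size_t addrlen)
{
	size_t len = addrlen < sizeof(*dst) ? addrlen : sizeof(*dst);

	memset(dst, 0, sizeof(*dst));
	if (len)
		memcpy(dst, src, len);
	return len;
}
```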

Re: [PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-16 Thread Gregory CLEMENT

Hi Richard,
 
On Wed., Dec. 16 2015, Richard Cochran wrote:

> On Wed, Dec 16, 2015 at 07:31:30PM +0100, Gregory CLEMENT wrote:
>> +Optional properties for PHY child node:
>> +- reset-gpios : Should specify the gpio for phy reset
>
> reset-gpios plural or reset-gpio singular?

The binding name is plural, but it handles only one GPIO.

See:
https://lkml.org/lkml/2015/12/9/623

>
>> +
>>  Examples:
>>  
>>  macb0: ethernet@fffc4000 {
>> @@ -29,4 +32,8 @@ Examples:
>>  local-mac-address = [3a 0e 03 04 05 06];
>>  clock-names = "pclk", "hclk", "tx_clk";
>>  clocks = < 30>, < 30>, < 13>;
>> +ethernet-phy@1 {
>> +reg = <0x1>;
>> +reset-gpios = <&pioE 6 1>;
>
> Did you mean "gpioE" ?

No, I really mean pioE: it is the name used for the at91 SoCs.

Thanks,

Gregory

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

Re: [PATCH net-next 2/2] net/mlx5_en: Add HW timestamping (TS) support

2015-12-16 Thread Richard Cochran

On Wed, Dec 16, 2015 at 06:46:34PM +0200, Saeed Mahameed wrote:
> From: Eran Ben Elisha 
> 
> Add support for enable/disable HW timestamping for incoming and/or
> outgoing packets. It adds and initializes all structs and callbacks
> needed by kernel TS API.  To enable/disable HW timestamping appropriate
> ioctl should be used.  Currently HWTSTAMP_FILTER_ALL/NONE and
> HWTSTAMP_TX_ON/OFF only are supported.  Make all relevant changes in
> RX/TX flows to consider TS request and plant HW timestamps into
> relevant structures.
> 
> Add a PHC support to the mlx5_en driver. Use reader/writer spinlocks to
> protect the timecounter since every packet received needs to call
> timecounter_cycle2time() when timestamping is enabled.  This can become
> a performance bottleneck with RSS and multiple receive queues if normal
> spinlocks are used.

The subject line doesn't mention PHC.  Please break this into two patches.

1. add hw time stamping
2. add phc support

Also, please CC the ptp maintainer for ptp patches.

Thanks,
Richard
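The ioctl mentioned in the quoted description is the standard SIOCSHWTSTAMP request, which takes a struct hwtstamp_config from <linux/net_tstamp.h>. Below is a minimal userspace sketch for the two modes the patch supports (TX on plus all RX packets, or everything off). The helper names are hypothetical, the call needs CAP_NET_ADMIN, and on devices without hardware timestamping the ioctl simply fails.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>

/* Fill a hwtstamp_config: everything on (TX_ON + FILTER_ALL) or off. */
static void build_hwtstamp_config(struct hwtstamp_config *cfg, int enable)
{
	memset(cfg, 0, sizeof(*cfg)); /* flags must be zero */
	cfg->tx_type = enable ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
	cfg->rx_filter = enable ? HWTSTAMP_FILTER_ALL : HWTSTAMP_FILTER_NONE;
}

/* Ask the driver behind ifname to apply *cfg. On success the driver may
 * have rewritten cfg->rx_filter with the filter it actually uses.
 * Returns 0 or -errno. */
static int set_hwtstamp(const char *ifname, struct hwtstamp_config *cfg)
{
	struct ifreq ifr;
	size_t len = strlen(ifname);
	int fd, ret;

	if (len >= IFNAMSIZ)
		return -EINVAL;
	memset(&ifr, 0, sizeof(ifr));
	memcpy(ifr.ifr_name, ifname, len);
	ifr.ifr_data = (char *)cfg;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;
	ret = ioctl(fd, SIOCSHWTSTAMP, &ifr) < 0 ? -errno : 0;
	close(fd);
	return ret;
}
```

A caller would run build_hwtstamp_config(&cfg, 1) and then set_hwtstamp() with the name of the mlx5 port, checking cfg.rx_filter afterwards.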

Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-16 Thread Dan Williams

On Wed, 2015-12-16 at 17:50 +0800, Xin Long wrote:
> Add the support for adding expire value to routes,  requested by
> Tom Gundersen  for systemd-networkd, and NetworkManager
> wants it too.
> 
> implement it by adding the new RTNETLINK attribute RTA_EXPIRES.

Could you also add bits to send RTA_EXPIRES back to userspace in the
route dump in rt6_fill_node(), so that userspace can figure out whether
RTA_EXPIRES is supported or not?

(Obviously having it there isn't foolproof: if there are no routes on
the system yet, userspace can't figure out support, but it's better than
nothing...)

Thanks!
Dan

> Signed-off-by: Xin Long 
> ---
>  include/uapi/linux/rtnetlink.h |  1 +
>  net/ipv6/route.c   | 10 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 123a5af..ca764b5 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -311,6 +311,7 @@ enum rtattr_type_t {
>   RTA_PREF,
>   RTA_ENCAP_TYPE,
>   RTA_ENCAP,
> + RTA_EXPIRES,
>   __RTA_MAX
>  };
>  
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c83b6a5..3c8834b 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -2709,6 +2709,7 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
>   [RTA_PREF]  = { .type = NLA_U8 },
>   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
>   [RTA_ENCAP] = { .type = NLA_NESTED },
> + [RTA_EXPIRES]   = { .type = NLA_U32 },
>  };
>  
>  static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
> @@ -2809,6 +2810,15 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
>   if (tb[RTA_ENCAP_TYPE])
>   cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]);
>  
> + if (tb[RTA_EXPIRES]) {
> + unsigned long timeout = addrconf_timeout_fixup(nla_get_u32(tb[RTA_EXPIRES]), HZ);
> +
> + if (addrconf_finite_timeout(timeout)) {
> + cfg->fc_expires = jiffies_to_clock_t(timeout * HZ);
> + cfg->fc_flags |= RTF_EXPIRES;
> + }
> + }
> +
>   err = 0;
>  errout:
>   return err;
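From userspace, the new attribute would be set like any other u32 route attribute, and Dan's request amounts to looking for that same attribute in RTM_NEWROUTE dump replies. The sketch below uses only the standard <linux/rtnetlink.h> macros and assumes headers that already define RTA_EXPIRES. struct route_req and the helper names are hypothetical, the socket send/receive code is omitted, and in the posted patch the value is in seconds (the kernel side multiplies it by HZ).

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request buffer in the usual iproute2 style. */
struct route_req {
	struct nlmsghdr n;
	struct rtmsg r;
	char attrs[64];
};

static void route_req_init(struct route_req *req)
{
	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
	req->n.nlmsg_type = RTM_NEWROUTE;
	req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE;
	req->r.rtm_family = AF_INET6;
}

/* Append a u32 attribute at the current end of the message. */
static int add_attr_u32(struct nlmsghdr *n, size_t maxlen, unsigned short type,
			uint32_t value)
{
	unsigned int len = RTA_LENGTH(sizeof(value));
	struct rtattr *rta;

	if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
		return -1;
	rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = len;
	memcpy(RTA_DATA(rta), &value, sizeof(value));
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
	return 0;
}

/* Find a u32 attribute in a route message (e.g. one from a dump).
 * Returns 0 and stores the value if present, -1 otherwise. */
static int find_attr_u32(struct nlmsghdr *n, unsigned short type, uint32_t *out)
{
	struct rtmsg *rtm = NLMSG_DATA(n);
	int len = n->nlmsg_len - NLMSG_LENGTH(sizeof(*rtm));
	struct rtattr *rta;

	for (rta = RTM_RTA(rtm); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == type && RTA_PAYLOAD(rta) == sizeof(*out)) {
			memcpy(out, RTA_DATA(rta), sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```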

Re: [PATCH 0/2] net: usb: cdc_ncm: Adding support for two new Dell devices

2015-12-16 Thread Dan Williams

On Wed, 2015-12-16 at 10:39 +0100, Daniele Palmas wrote:
> This patch series add support in the cdc_ncm driver for two devices
> based on the same platform, that are different only for carrier
> customization.
> 
> The devices do not have ARP capabilities.
> 
> Daniele Palmas (2):
>   net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband
> Card
>   net: usb: cdc_ncm: Adding Dell DW5813 LTE AT Mobile Broadband
> Card

Quite interesting; Google knows nothing about these devices that I can
find.  What platform are these based on?

But in any case, since these blocks are almost identical to the DW5550
block, maybe update the comments to indicate that they need NOARP
unlike the MBM platform that the 5550 is based on?

Dan

Re: [PATCH net-next v3 3/3] geneve: Remote Checksum Offload support

2015-12-16 Thread Jesse Gross

On Wed, Dec 16, 2015 at 8:41 AM, Tom Herbert  wrote:
> On Thu, Dec 10, 2015 at 1:17 PM, Jesse Gross  wrote:
>> On Thu, Dec 10, 2015 at 12:37 PM, Tom Herbert  wrote:
>>> Add support for remote checksum offload in both the normal and GRO
>>> paths. netlinks command are used to enable sending of the Remote
>>> Checksum Data, and allow processing of it on receive.
>>>
>>> Signed-off-by: Tom Herbert 
>>
>> Tom, can you please split this patch off and mark it as RFC or similar?
>>
>> I don't have any objections to implementing remote checksum offload
>> for Geneve in general but I think that it's pretty clear that the
>> format that you are using here is not the direction that the protocol
>> is going to evolve. We don't need to fragment the protocol by applying
>> this at this time.
>
> The first two patches were accepted, you should have enough to
> implement the remote checksum patches in the TLV format so you can
> take it from here.

Thanks, I'll pick it up from here.

Re: [PATCH iproute2] ipv6: allow setting stable_secret mode

2015-12-16 Thread Stephen Hemminger

On Wed, 16 Dec 2015 12:30:08 +0100
Hannes Frederic Sowa  wrote:

> Signed-off-by: Hannes Frederic Sowa 
> ---
>  ip/iplink.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/iplink.c b/ip/iplink.c
> index f30de86d1858a0..e824082f7d8149 100644
> --- a/ip/iplink.c
> +++ b/ip/iplink.c
> @@ -84,7 +84,7 @@ void iplink_usage(void)
>   fprintf(stderr, "  [ state { auto | enable | disable} ] ]\n");
>   fprintf(stderr, " [ master DEVICE ]\n");
>   fprintf(stderr, " [ nomaster ]\n");
> - fprintf(stderr, " [ addrgenmode { eui64 | none } ]\n");
> + fprintf(stderr, " [ addrgenmode { eui64 | none | stable_secret } ]\n");
>   fprintf(stderr, " [ protodown { on | off } ]\n");
>   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] [master DEV] [type TYPE]\n");
>  
> @@ -176,6 +176,8 @@ static int get_addr_gen_mode(const char *mode)
>   return IN6_ADDR_GEN_MODE_EUI64;
>   if (strcasecmp(mode, "none") == 0)
>   return IN6_ADDR_GEN_MODE_NONE;
> + if (strcasecmp(mode, "stable_secret") == 0)
> + return IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
>   return -1;
>  }
>  

ip options are supposed to be close to invertible, i.e. what is displayed
by the show command should match the options for set. There were even
some VPN scripts that depended on this.
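Stephen's invertibility point is easiest to keep when a single table drives both parsing and printing. A sketch of that approach follows. The table and function names are hypothetical (not iproute2's), the numeric values mirror enum in6_addr_gen_mode in <linux/if_link.h>, and the spelling follows Bjørn's suggestion of stable_privacy rather than the stable_secret name in the posted patch.

```c
#include <stddef.h>
#include <strings.h>

/* Values mirror enum in6_addr_gen_mode in <linux/if_link.h>
 * (EUI64 = 0, NONE = 1, STABLE_PRIVACY = 2); copied so the sketch
 * builds without kernel headers. */
enum { MODE_EUI64 = 0, MODE_NONE = 1, MODE_STABLE_PRIVACY = 2 };

/* One table drives both directions, so the name "show" prints is
 * exactly the name "set" accepts. */
static const struct {
	const char *name;
	int mode;
} addr_gen_modes[] = {
	{ "eui64", MODE_EUI64 },
	{ "none", MODE_NONE },
	{ "stable_privacy", MODE_STABLE_PRIVACY },
};

#define N_ADDR_GEN_MODES (sizeof(addr_gen_modes) / sizeof(addr_gen_modes[0]))

/* Name -> mode, case-insensitive like get_addr_gen_mode(); -1 if unknown. */
static int parse_addr_gen_mode(const char *name)
{
	size_t i;

	for (i = 0; i < N_ADDR_GEN_MODES; i++)
		if (strcasecmp(name, addr_gen_modes[i].name) == 0)
			return addr_gen_modes[i].mode;
	return -1;
}

/* Mode -> name for printing; NULL if the kernel reports an unknown mode. */
static const char *addr_gen_mode_name(int mode)
{
	size_t i;

	for (i = 0; i < N_ADDR_GEN_MODES; i++)
		if (addr_gen_modes[i].mode == mode)
			return addr_gen_modes[i].name;
	return NULL;
}
```

Adding a mode then means adding one table row, and the usage string can be generated from the same table.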

[PATCH net-next] tun: honor IFF_UP in tun_get_user()

2015-12-16 Thread Eric Dumazet

From: Eric Dumazet 

If a tun interface is turned down, we should not allow packet injection
into the kernel.

The kernel already does not send packets to the tun.

TUNATTACHFILTER cannot be used, as only tun_net_xmit() takes care
of it.

Reported-by: Curt Wohlgemuth 
Signed-off-by: Eric Dumazet 
---
 drivers/net/tun.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f0db770e8b2f..88bb8cc3555b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1095,6 +1095,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
u32 rxhash;
ssize_t n;
 
+   if (!(tun->dev->flags & IFF_UP))
+   return -EIO;
+
if (!(tun->flags & IFF_NO_PI)) {
if (len < sizeof(pi))
return -EINVAL;
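With this check, a write() on the tun file descriptor while the interface is administratively down fails with EIO instead of injecting the packet. A userspace writer can read the IFF_UP flag with the standard SIOCGIFFLAGS ioctl and treat EIO from write() as "interface down". The helper names below are hypothetical; because the interface can go down between a check and the write, handling EIO is the part that matters.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Return 1 if ifname is administratively up, 0 if down, -errno on error. */
static int iface_is_up(const char *ifname)
{
	struct ifreq ifr;
	size_t len = strlen(ifname);
	int fd, ret;

	if (len >= IFNAMSIZ)
		return -EINVAL;
	memset(&ifr, 0, sizeof(ifr));
	memcpy(ifr.ifr_name, ifname, len);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
		ret = -errno;
	else
		ret = (ifr.ifr_flags & IFF_UP) ? 1 : 0;
	close(fd);
	return ret;
}

/* Write one packet; *if_down is set when write() failed with EIO, which
 * with this patch most likely means the tun interface is down. */
static ssize_t tun_write_packet(int tun_fd, const void *pkt, size_t len,
				int *if_down)
{
	ssize_t n = write(tun_fd, pkt, len);

	*if_down = (n < 0 && errno == EIO);
	return n;
}
```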



Re: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread Eric Dumazet

On Wed, 2015-12-16 at 19:20 +, Haiyang Zhang wrote:
> > -Original Message-
> > From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> > Sent: Wednesday, December 16, 2015 12:08 PM
> > 
> > This looks very very wrong to me.
> > 
> > How many times this is called per second, for the 'one flow' case ?
> > 
> > Don't you use TSO in this driver ?
> > 
> > What about encapsulation ?
> > 
> > I suspect you have a quite different issue here.
> > 
> > You simply could use skb_get_hash() since local TCP flows will provide a
> > l4 skb->hash and you have no further flow dissection to do.
> 
> In our test, we have bisected and found the following patch introduced big 
> overhead into skb_flow_dissect_flow_keys(), and caused performance 
> regression:
> commit: d34af823
> net: Add VLAN ID to flow_keys

Adding Tom Herbert 

Your driver was assuming things about "struct flow_keys" layout.
This is not permitted.

Magic numbers like 12 and 8 are really bad...

static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
{
	struct flow_keys flow;
	int data_len;

	if (!skb_flow_dissect_flow_keys(skb, &flow, 0) ||
	    !(flow.basic.n_proto == htons(ETH_P_IP) ||
	      flow.basic.n_proto == htons(ETH_P_IPV6)))
		return false;

	if (flow.basic.ip_proto == IPPROTO_TCP)
		data_len = 12;
	else
		data_len = 8;

	*hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, &flow, data_len);

	return true;
}


> This patch didn't add too many instructions, but we think the change to 
> the size of struct flow_keys may cause different cache missing rate...
> 
> To avoid affecting other drivers using this function, our patch limits the 
> change inside our driver to fix this performance regression.
> 
> Regarding your suggestion on skb_get_hash(), I looked at the code and ran 
> some tests, and found the skb->l4_hash and skb->sw_hash bits are not set, 
> so it calls __skb_get_hash() which eventually calls 
> skb_flow_dissect_flow_keys(). So it still includes the performance 
> overhead mentioned above.

Okay, but have you tried this instead of just guessing?

Are you forwarding traffic, or is the traffic locally generated?

The TCP stack definitely sets skb->l4_hash in current kernels.

Your 'basic flow dissection' is very buggy and a step backward.

Just call skb_get_hash(): not only will your perf problem vanish, but
your driver will also work correctly with all possible malformed packets
(like ones pretending to be TCP packets but too small to contain even one
byte of TCP header) and well-formed ones, with all encapsulations.
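Eric's two objections, magic lengths and unchecked header reads, can be illustrated outside the kernel. The sketch below is a userspace model, not a proposed driver change: the struct and function names are hypothetical, it parses IPv4 only, and, as Eric and David say, the in-kernel answer is skb_get_hash() or fixing the flow dissector rather than open-coded parsing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash-input layouts. Naming the fields lets sizeof() and
 * offsetof() replace the magic lengths 8/12 (IPv4) and 32/36 (IPv6). */
struct v4_tuple {
	uint32_t saddr, daddr; /* network byte order */
	uint16_t sport, dport;
};

struct v6_tuple {
	uint8_t saddr[16], daddr[16];
	uint16_t sport, dport;
};

/* Fill *t from a raw packet starting at the IPv4 header. Every read is
 * checked against len, so a packet that claims to be TCP but is too short
 * to hold the ports is rejected instead of over-read.
 * Returns the number of bytes of *t to hash, or 0 if malformed. */
static size_t v4_tuple_from_packet(const uint8_t *pkt, size_t len,
				   struct v4_tuple *t)
{
	size_t ihl;
	unsigned int frag_off;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return 0;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || ihl > len)
		return 0;
	memcpy(&t->saddr, pkt + 12, 4);
	memcpy(&t->daddr, pkt + 16, 4);

	/* Only the first fragment of a TCP packet carries the ports. */
	frag_off = ((unsigned int)(pkt[6] & 0x1f) << 8) | pkt[7];
	if (pkt[9] != 6 /* IPPROTO_TCP */ || frag_off != 0)
		return offsetof(struct v4_tuple, sport); /* 8 */
	if (len - ihl < 4)
		return 0; /* claims TCP, but no room for the ports */
	memcpy(&t->sport, pkt + ihl, 2);
	memcpy(&t->dport, pkt + ihl + 2, 2);
	return sizeof(*t); /* 12 */
}
```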





[PATCH net-next v3 0/2] net: Allow accepted sockets to be bound to l3mdev domain

2015-12-16 Thread David Ahern

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. This version adds a sysctl
to control whether the setting is inherited, making the functionality
similar to sk_mark and its sysctl_tcp_fwmark_accept setting.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.

David Ahern (2):
  net: l3mdev: Add master device lookup by index
  net: Allow accepted sockets to be bound to l3mdev domain

 Documentation/networking/ip-sysctl.txt |  8 
 include/net/inet_sock.h| 14 ++
 include/net/l3mdev.h   | 23 +++
 include/net/netns/ipv4.h   |  3 +++
 net/ipv4/syncookies.c  |  4 ++--
 net/ipv4/sysctl_net_ipv4.c | 11 +++
 net/ipv4/tcp_input.c   |  2 +-
 net/ipv4/tcp_ipv4.c|  1 +
 net/ipv6/syncookies.c  |  4 ++--
 9 files changed, 65 insertions(+), 5 deletions(-)

-- 
1.9.1
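The child-socket binding rule in this series can be modelled in a few lines of userspace C. The device table and lookup below are hypothetical stand-ins for l3mdev_master_ifindex_by_index() (assumed here to return 0 when the device has no L3 master); the branch structure follows inet_request_bound_dev_if() in patch 2/2.

```c
#include <stddef.h>

/* Hypothetical device table: each entry maps an ifindex to the ifindex of
 * its L3 master (e.g. a VRF device), or 0 if it has none. */
struct demo_dev {
	int ifindex;
	int master_ifindex;
};

/* Model of a master lookup by index: 0 if unknown or not enslaved. */
static int demo_master_by_index(const struct demo_dev *devs, size_t n,
				int ifindex)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return devs[i].master_ifindex;
	return 0;
}

/* An explicitly bound listener keeps its binding; otherwise, with the
 * sysctl enabled, the child inherits the L3 master of the ingress device. */
static int demo_request_bound_dev_if(int sk_bound_dev_if, int l3mdev_accept,
				     const struct demo_dev *devs, size_t n,
				     int skb_iif)
{
	if (!sk_bound_dev_if && l3mdev_accept)
		return demo_master_by_index(devs, n, skb_iif);
	return sk_bound_dev_if;
}
```

For example, with device 2 enslaved to VRF device 10, an unbound listener accepting a connection that arrived on device 2 yields a child bound to 10 when tcp_l3mdev_accept is enabled, and an unbound child otherwise.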


Re: [PATCH 00/15] add Intel(R) X722 iWARP driver

2015-12-16 Thread Joe Perches

On Wed, 2015-12-16 at 13:58 -0600, Faisal Latif wrote:
> This series contains the addition of the i40iw.ko driver.

This series should probably be respun against -next
instead of Linus' tree.


Re: [PATCH net-next 2/2] net/mlx5_en: Add HW timestamping (TS) support

2015-12-16 Thread Or Gerlitz

On Wed, Dec 16, 2015 at 6:46 PM, Saeed Mahameed 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> index ea6a137..b6651b8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> @@ -98,6 +98,7 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
>  int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
>  int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
>  int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
> +cycle_t mlx5_core_read_clock(struct mlx5_core_dev *dev);
>
>  void mlx5e_init(void);
>  void mlx5e_cleanup(void);

This should go into the pre-patch too.

Re: [PATCH][iproute2] tc/q_htb.c: Fix the MPU value output in 'tc -d class show dev ' command

2015-12-16 Thread Dmitrii Shcherbakov

Phil,

> Dmitrii, did iproute2 without your change even print the overhead as
set by you before? Looking at the code, I'd assume not.

I used the iproute2 provided by the distribution and an OpenVZ 3.10
kernel, so there are differences, but I noticed that the 'manual'
overhead-related code is included there. I am going to get an upstream
kernel and iproute2 to check tomorrow. At least they print mpu
consistently.

The following output is for non-upstream kernels and the iproute2
versions provided by the distributions, so it probably won't be 100%
useful for answering your question.

This is what I have on quite an old OVZ kernel at home:

[root@localhost ~]# uname -r
3.10.0-229.7.2.vz7.8.8
[root@localhost ~]# tc qdisc add dev eth0 root handle 1: htb default 12
[root@localhost ~]# tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps mpu 256 overhead 64
[root@localhost ~]# tc -d class show dev eth0
class htb 1:1 root prio 0 quantum 1 rate 80bit overhead 64 ceil
80bit burst 1600b/1 mpu 0b overhead 1b cburst 1600b/1 mpu 0b
overhead 1b level 0

And this is the same on Ubuntu 14.04 with a 3.19 kernel (it looks like
the mpu value is not configured properly, which is strange, though the
'manual' overhead is there).

administrator@ubuntu-q87:~$ uname -r
3.19.0-33-generic

administrator@ubuntu-q87:~$ sudo tc qdisc add dev eth0 root handle 1: htb default 12
administrator@ubuntu-q87:~$ sudo tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps mpu 256 overhead 64
administrator@ubuntu-q87:~$ tc -d class show dev eth0
class htb 1:1 root prio 0 quantum 1 rate 80bit overhead 64 ceil
80bit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst
1600b/1 mpu 0b overhead 0b level 0

Thanks,
Dima

Re: [PATCH net-next] net: Pass ndm_state to route netlink FDB notifications.

2015-12-16 Thread David Miller

From: "Sokolowski, Hubert" 
Date: Tue, 15 Dec 2015 13:20:30 +

> Before this change applications monitoring FDB notifications
> were not able to determine whether a new FDB entry is permanent
> or not:
> bridge fdb add f1:f2:f3:f4:f5:f8 dev sw0p1 temp self
> bridge fdb add f1:f2:f3:f4:f5:f9 dev sw0p1 self
> 
> bridge monitor fdb
> 
> f1:f2:f3:f4:f5:f8 dev sw0p1 self permanent
> f1:f2:f3:f4:f5:f9 dev sw0p1 self permanent
> 
> With this change ndm_state from the original netlink message
> is passed to the new netlink message sent as notification.
> 
> bridge fdb add f1:f2:f3:f4:f5:f6 dev sw0p1 self
> bridge fdb add f1:f2:f3:f4:f5:f7 dev sw0p1 temp self
> 
> bridge monitor fdb
> f1:f2:f3:f4:f5:f6 dev sw0p1 self permanent
> f1:f2:f3:f4:f5:f7 dev sw0p1 self static
> 
> Signed-off-by: Hubert Sokolowski 

This looks fine, applied, thanks.

Re: [PATCH net-next] hv_netvsc: Use simple parser for IPv4 and v6 headers

2015-12-16 Thread David Miller

From: Haiyang Zhang 
Date: Wed, 16 Dec 2015 19:20:44 +

> In our test, we have bisected and found the following patch introduced big 
> overhead into skb_flow_dissect_flow_keys(), and caused performance 
> regression:
> commit: d34af823
> net: Add VLAN ID to flow_keys

NEVER _EVER_ work around this kind of problem by bypassing the code in
question in your driver.

ALWAYS work to fix the actual problem.

Re: [PATCH v7 0/4] Support administratively closing application sockets

2015-12-16 Thread David Miller

From: Jamal Hadi Salim 
Date: Wed, 16 Dec 2015 14:55:15 -0500

> The question i had was the opposite when i saw this: why are
> regular users allowed to read admin (and any other users) details?;->
> On this specific feature: why, as a regular user, I cant close
> connections attributed to me (and have to use CAP_NET_ADMIN)?

There is nothing provided via socket diag that can't be seen
also via /proc/net/tcp et al.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next v3 2/2] net: Allow accepted sockets to be bound to l3mdev domain

2015-12-16 Thread David Ahern

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior, which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.

Signed-off-by: David Ahern 
---
v3
- wrap the sysctl and its use with CONFIG_NET_L3_MASTER_DEV check

v2
- added sysctl option. wrapped l3mdev lookup in helper function
  similar to marks

 Documentation/networking/ip-sysctl.txt |  8 
 include/net/inet_sock.h| 14 ++
 include/net/netns/ipv4.h   |  3 +++
 net/ipv4/syncookies.c  |  4 ++--
 net/ipv4/sysctl_net_ipv4.c | 11 +++
 net/ipv4/tcp_input.c   |  2 +-
 net/ipv4/tcp_ipv4.c|  1 +
 net/ipv6/syncookies.c  |  4 ++--
 8 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 2ea4c45cf1c8..d104ec6cd2e4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -335,6 +335,14 @@ tcp_keepalive_intvl - INTEGER
after probes started. Default value: 75sec i.e. connection
will be aborted after ~11 minutes of retries.
 
+tcp_l3mdev_accept - BOOLEAN
+   Enables child sockets to inherit the L3 master device index.
+   Enabling this option allows a "global" listen socket to work
+   across L3 master domains (e.g., VRFs) with connected sockets
+   derived from the listen socket to be bound to the L3 domain in
+   which the packets originated. Only valid when the kernel was
+   compiled with CONFIG_NET_L3_MASTER_DEV.
+
 tcp_low_latency - BOOLEAN
If set, the TCP stack makes decisions that prefer lower
latency as opposed to higher throughput.  By default, this
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 2134e6d815bc..71c119d53b40 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /** struct ip_options - IP Options
  *
@@ -113,6 +114,19 @@ static inline u32 inet_request_mark(const struct sock *sk, struct sk_buff *skb)
return sk->sk_mark;
 }
 
+static inline int inet_request_bound_dev_if(const struct sock *sk,
+   struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_L3_MASTER_DEV
+   struct net *net = sock_net(sk);
+
+   if (!sk->sk_bound_dev_if && net->ipv4.sysctl_tcp_l3mdev_accept)
+   return l3mdev_master_ifindex_by_index(net, skb->skb_iif);
+#endif
+
+   return sk->sk_bound_dev_if;
+}
+
 struct inet_cork {
unsigned intflags;
__be32  addr;
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c68926b4899c..d75be32650ba 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -86,6 +86,9 @@ struct netns_ipv4 {
 
int sysctl_fwmark_reflect;
int sysctl_tcp_fwmark_accept;
+#ifdef CONFIG_NET_L3_MASTER_DEV
+   int sysctl_tcp_l3mdev_accept;
+#endif
int sysctl_tcp_mtu_probing;
int sysctl_tcp_base_mss;
int sysctl_tcp_probe_threshold;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 4cbe9f0a4281..643a86c49020 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -351,7 +351,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
treq->snt_synack.v64= 0;
treq->tfo_listener  = false;
 
-   ireq->ir_iif = sk->sk_bound_dev_if;
+   ireq->ir_iif = inet_request_bound_dev_if(sk, skb);
 
/* We throwed the options of the initial SYN away, so we hope
 * the ACK carries the same options again (see RFC1122 4.2.3.8)
@@ -371,7 +371,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 * hasn't changed since we received the original syn, but I see
 * no easy way to do this.
 */
-   flowi4_init_output(, sk->sk_bound_dev_if, ireq->ir_mark,
+   flowi4_init_output(, ireq->ir_iif, ireq->ir_mark,
   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP,
   inet_sk_flowi_flags(sk),
   opt->srr ? opt->faddr : ireq->ir_rmt_addr,
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index a0bd7a55193e..41ff1f87dfd7 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -915,6 +915,17 @@ static struct ctl_table

[PATCH net-next 1/2] net: l3mdev: Add master device lookup by index

2015-12-16 Thread David Ahern

Add helper to lookup l3mdev master index given a device index.

Signed-off-by: David Ahern 
---
 include/net/l3mdev.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 774d85b2d5d9..786226f8e77b 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -51,6 +51,24 @@ static inline int l3mdev_master_ifindex(struct net_device *dev)
return ifindex;
 }
 
+static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex)
+{
+   struct net_device *dev;
+   int rc = 0;
+
+   if (likely(ifindex)) {
+   rcu_read_lock();
+
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (dev)
+   rc = l3mdev_master_ifindex_rcu(dev);
+
+   rcu_read_unlock();
+   }
+
+   return rc;
+}
+
 /* get index of an interface to use for FIB lookups. For devices
  * enslaved to an L3 master device FIB lookups are based on the
  * master index
@@ -167,6 +185,11 @@ static inline int l3mdev_master_ifindex(struct net_device *dev)
return 0;
 }
 
+static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex)
+{
+   return 0;
+}
+
 static inline int l3mdev_fib_oif_rcu(struct net_device *dev)
 {
return dev ? dev->ifindex : 0;
-- 
1.9.1
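
To make the effect of tcp_l3mdev_accept easier to follow, the decision made by inet_request_bound_dev_if() and l3mdev_master_ifindex_by_index() above can be modeled in plain user-space C. This is a sketch, not kernel code: struct fake_dev and the model_* functions are hypothetical, a small lookup table replaces dev_get_by_index_rcu() and l3mdev_master_ifindex_rcu(), and the sysctl value is passed in as an argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for a net_device: its ifindex and the ifindex
 * of its L3 master (e.g. a VRF), or 0 if it is not enslaved. For a
 * master device itself, master_ifindex would be its own ifindex. */
struct fake_dev {
	int ifindex;
	int master_ifindex;
};

/* Models l3mdev_master_ifindex_by_index(): returns 0 for ifindex 0,
 * for an unknown device, or for a device without an L3 master. */
static int model_master_ifindex_by_index(const struct fake_dev *devs,
					 size_t ndevs, int ifindex)
{
	size_t i;

	if (!ifindex)
		return 0;
	for (i = 0; i < ndevs; i++)
		if (devs[i].ifindex == ifindex)
			return devs[i].master_ifindex;
	return 0;
}

/* Models inet_request_bound_dev_if(): a listener not bound to any
 * device, with tcp_l3mdev_accept enabled, takes the L3 master of the
 * ingress interface (skb_iif); otherwise its own binding is kept. */
static int model_request_bound_dev_if(int sk_bound_dev_if, int l3mdev_accept,
				      int skb_iif, const struct fake_dev *devs,
				      size_t ndevs)
{
	if (!sk_bound_dev_if && l3mdev_accept)
		return model_master_ifindex_by_index(devs, ndevs, skb_iif);
	return sk_bound_dev_if;
}
```

In this model, an unbound listener with the sysctl set gets the VRF's ifindex as ir_iif when the SYN arrives on an enslaved interface; with the sysctl off, ir_iif stays 0 as before.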

Re: net: heap-out-of-bounds in sock_setsockopt

2015-12-16 Thread Willem de Bruijn

>
> Hmm, we should exclude the raw socket case, something like the
> following, but I am not sure if the check is too strict or not, also
> not sure if we should return an error for this raw socket case.

No, SOF_TIMESTAMPING_OPT_ID with SOCK_RAW/IPPROTO_TCP
is legitimate. It should fall through to initializing sk->sk_tskey to zero.
Only stream sockets should use the special case where numbering
is bytestream and computed by subtracting the seqno from the seqno
at the time that the option is enabled.

>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 765be83..c26e80a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -872,7 +872,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>
> if (val & SOF_TIMESTAMPING_OPT_ID &&
> !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> -   if (sk->sk_protocol == IPPROTO_TCP) {
> +   if (sk->sk_protocol == IPPROTO_TCP &&
> sk->sk_type == SOCK_STREAM) {
> if (sk->sk_state != TCP_ESTABLISHED) {
> ret = -EINVAL;
> break;

I made the same error when later returning the tskey in
__skb_tstamp_tx:

if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
serr->ee.ee_data = skb_shinfo(skb)->tskey;
if (sk->sk_protocol == IPPROTO_TCP)
serr->ee.ee_data -= sk->sk_tskey;
}

This has no effect if sk->sk_tskey is initialized to 0 with your patch. Still,
it should not treat SOCK_RAW as a TCP sock here, either. Please
add this if you're about to send the patch. Or I can send it separately,
if you prefer. Thanks for the quick fix.
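
The rule above (byte-offset keys only for TCP stream sockets, a zero-based counter for everything else, including SOCK_RAW with IPPROTO_TCP) can be summarized as a small model. It is a sketch rather than the sock.c/skbuff.c code: the model_* names are hypothetical, and snd_una stands in for the sequence number sampled when SOF_TIMESTAMPING_OPT_ID is first enabled.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Only a TCP stream socket numbers its timestamps by byte offset. */
static int model_is_tcp_stream(int type, int protocol)
{
	return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}

/* Value stored in sk_tskey when SOF_TIMESTAMPING_OPT_ID is enabled. */
static uint32_t model_tskey_base(int type, int protocol, uint32_t snd_una)
{
	return model_is_tcp_stream(type, protocol) ? snd_una : 0;
}

/* ee_data reported for a timestamped skb whose key is skb_tskey. */
static uint32_t model_ee_data(int type, int protocol, uint32_t skb_tskey,
			      uint32_t sk_tskey)
{
	if (model_is_tcp_stream(type, protocol))
		return skb_tskey - sk_tskey;	/* bytes since option was set */
	return skb_tskey;			/* plain per-packet counter */
}
```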

[PATCH RFC] AF_UNIX SOCK_STREAM SO_PEEK_OFS oddity

2015-12-16 Thread Rainer Weikusat

While moving towards rectifying some more of my past missteps, I noticed
that the unix_stream_read_generic code loads the peek offset (into a
variable named skip) before entering the actual receive loop with the
u->readlock mutex held. If there's no data to return, it drops this lock
and goes to sleep until there is. The lock is then reacquired and it
proceeds using the old skip value which is later used to adjust the peek
offset for the socket in question. But since the readlock was dropped in
between, another reader could already have adjusted the peek offset by
the same amount based on the same, initial skip value: If there are two
concurrent 'peekers' and no skb available initially, they will both
return the same data to their respective callers and the peek offset
will end up being adjusted by twice the length of the returned number of
bytes, causing not-yet-peeked data immediately adjacent to that to be
skipped.
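
The interleaving described above can be replayed deterministically. The following is a model, not af_unix.c: struct model_sock and the model_* functions are hypothetical, the second peeker is assumed to run only after the first has returned, and the peek offset is assumed to advance by exactly the number of bytes returned.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the receive queue and SO_PEEK_OFF state. */
struct model_sock {
	const char *data;	/* bytes queued on the socket */
	size_t len;		/* number of queued bytes */
	size_t peek_off;	/* current peek offset */
};

/* One MSG_PEEK read of up to n bytes starting at skip; afterwards the
 * peek offset is advanced by the number of bytes returned. */
static size_t model_peek_at(struct model_sock *s, size_t skip, char *out,
			    size_t n)
{
	size_t avail = skip < s->len ? s->len - skip : 0;
	size_t chunk = avail < n ? avail : n;

	if (chunk)
		memcpy(out, s->data + skip, chunk);
	s->peek_off += chunk;
	return chunk;
}

/* Two peekers block on an empty queue, then msg arrives. With
 * stale != 0 each uses the skip it loaded before sleeping (current
 * code); with stale == 0 each reloads the offset after waking
 * (proposed fix). A third peek runs after both have returned.
 * Each output buffer must hold at least 9 bytes. */
static void model_two_peekers(const char *msg, int stale,
			      char *a, char *b, char *c)
{
	struct model_sock s = { "", 0, 0 };
	size_t skip_a = s.peek_off;	/* loaded before sleeping */
	size_t skip_b = s.peek_off;	/* loaded before sleeping */
	size_t n;

	s.data = msg;			/* the writer finally sends */
	s.len = strlen(msg);

	if (!stale)
		skip_a = s.peek_off;
	n = model_peek_at(&s, skip_a, a, 8);
	a[n] = '\0';

	if (!stale)
		skip_b = s.peek_off;
	n = model_peek_at(&s, skip_b, b, 8);
	b[n] = '\0';

	n = model_peek_at(&s, s.peek_off, c, 8);
	c[n] = '\0';
}
```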

Example program:

-
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SO_PEEK_OFF
#  define SO_PEEK_OFF   42  /* my system is too old to have this */
#endif

#define MSG "TWOTIMESNOTATALL12345678"

static void peek_n_print(int sk)
{
char peeked[8];
ssize_t nr;

nr = recv(sk, peeked, sizeof(peeked), MSG_PEEK);
fprintf(stderr, "%ld got %.*s\n", (long)getpid(), (int)nr, peeked);
}

int main(void)
{
int sks[2], x;

socketpair(AF_UNIX, SOCK_STREAM, 0, sks);

x = 0;
setsockopt(sks[1], SOL_SOCKET, SO_PEEK_OFF, &x, sizeof(x));

if (fork() == 0) {
if (fork() == 0) {
peek_n_print(sks[1]);
_exit(0);
}

peek_n_print(sks[1]);

wait(NULL);

peek_n_print(sks[1]);
_exit(0);
}

sleep(1);
write(*sks, MSG, sizeof(MSG) - 1);

wait(NULL);
return 0;
}
-

If I understand the description available here,

http://www.spinics.net/lists/netdev/msg189589.html

correctly, this should print TWOTIMES, NOTATALL and 12345678, but because
of the locking/offset handling issue (if it is an issue) described
above, it will actually print TWOTIMES twice followed by 12345678, while
NOTATALL remains invisible. If this is not the intended behaviour, I
propose the patch below to fix it. It changes the code to reload the
peek offset after the sleep.

Signed-off-by: Rainer Weikusat 
---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1c3c1f3..f020a81 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2320,6 +2320,9 @@ again:
goto out;
}
 
+   if (flags & MSG_PEEK)
+   skip = sk_peek_offset(sk, flags);
+
continue;
 unlock:
unix_state_unlock(sk);

Re: [PATCH v7 0/4] Support administratively closing application sockets

2015-12-16 Thread Eric Dumazet

On Wed, 2015-12-16 at 14:55 -0500, Jamal Hadi Salim wrote:
> On 15-12-16 10:50 AM, Eric Dumazet wrote:
> > On Wed, 2015-12-16 at 07:43 -0800, Stephen Hemminger wrote:
> >
> >>
> >> I see no security checks in the diag infrastructure.
> >> Up until now diag has been read-only access and therefore has been
> >> allowed for all users.
> >
> > It is still allowed to all users.
> >
> > Only the 'destroy' operation is restricted.
> 
> The question I had was the opposite when I saw this: why are
> regular users allowed to read admin (and any other users') details? ;->
> On this specific feature: why, as a regular user, can't I close
> connections attributed to me (and have to use CAP_NET_ADMIN)?

I guess that we need to be careful here.

A socket can be shared among multiple users via fd passing.

Who really owns it?
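
For readers unfamiliar with the fd-passing case mentioned above: an open socket can be handed to another process over an AF_UNIX socket as SCM_RIGHTS ancillary data, after which both hold descriptors for the same underlying socket, possibly under different credentials. A minimal sketch; send_fd() and recv_fd() are illustrative helpers, not kernel or libc APIs.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over the AF_UNIX socket sock as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

After the transfer, the sender and receiver both refer to the same socket, which is the ownership ambiguity raised above.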


Re: [PATCH net-next 2/2] net/mlx5_en: Add HW timestamping (TS) support

2015-12-16 Thread Or Gerlitz

On Wed, Dec 16, 2015 at 6:46 PM, Saeed Mahameed  wrote:
> From: Eran Ben Elisha 
>
> Add support for enable/disable HW timestamping [..]

>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en.h   |   25 +++
>  drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  226
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   32 +++
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  103 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   14 ++
>  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +

On top of Richard's comment, let's shrink this a bit more -- put the
below changes into a pure mlx5_core downstream patch:

>  drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 +++
>  include/linux/mlx5/device.h|   20 ++-
>  include/linux/mlx5/mlx5_ifc.h  |5 +-


>  12 files changed, 462 insertions(+), 7 deletions(-)

Re: [PATCH] gianfar: Don't enable RX Filer if not supported

2015-12-16 Thread David Miller

From: Manoil Claudiu 
Date: Tue, 15 Dec 2015 10:12:36 +

>>-Original Message-
>>From: Hamish Martin [mailto:hamish.mar...@alliedtelesis.co.nz]
>>Sent: Tuesday, December 15, 2015 3:15 AM
>>To: Manoil Claudiu-B08782 
>>Cc: netdev@vger.kernel.org; Hamish Martin
>>
>>Subject: [PATCH] gianfar: Don't enable RX Filer if not supported
>>
>>After commit 15bf176db1fb ("gianfar: Don't enable the Filer w/o the
>>Parser"), 'TSEC' model controllers (for example as seen on MPC8541E)
>>always have 8 bytes stripped from the front of received frames.
>>Only 'eTSEC' gianfar controllers have the RX Filer capability (amongst
>>other enhancements). Previously this was treated as always enabled
>>for both 'TSEC' and 'eTSEC' controllers.
>>In commit 15bf176db1fb ("gianfar: Don't enable the Filer w/o the Parser")
>>a subtle change was made to the setting of 'uses_rxfcb' to effectively
>>always set it (since 'rx_filer_enable' was always true). This had the
>>side-effect of always stripping 8 bytes from the front of received frames
>>on 'TSEC' type controllers.
>>
>>We now only enable the RX Filer capability on controller types that
>>support it, thereby avoiding the issue for 'TSEC' type controllers.
>>
> 
> 
> Reviewed-by: Claudiu Manoil 

Applied and queued up for -stable, thanks everyone.

Re: [PATCH] net: add Qualcomm IPC router

2015-12-16 Thread Courtney Cavin

On Fri, Dec 11, 2015 at 09:41:59PM +0100, Bjorn Andersson wrote:
> From: Courtney Cavin 
> 
> Add an implementation of Qualcomm's IPC router protocol, used to
> communicate with service providing remote processors.
> 
> Signed-off-by: Courtney Cavin 
> ---
[...]
> +static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> +{
> +   DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name);
> +   int (*enqueue_fn)(struct qrtr_node *, struct sk_buff *);
> +   struct qrtr_sock *ipc = qrtr_sk(sock->sk);
> +   struct sock *sk = sock->sk;
> +   struct qrtr_node *node;
> +   struct qrtr_hdr *hdr;
> +   struct sk_buff *skb;
> +   size_t plen;
> +   int rc;
> +
> +   if (msg->msg_flags & ~(MSG_DONTWAIT))
> +   return -EINVAL;
> +
> +   if (len > 65535)
> +   return -EMSGSIZE;
> +
> +   lock_sock(sk);
> +
> +   if (addr) {
> +   if (msg->msg_namelen < sizeof(*addr)) {
> +   release_sock(sk);
> +   return -EINVAL;
> +   }
> +
> +   if (addr->sq_family != AF_QIPCRTR) {
> +   release_sock(sk);
> +   return -EINVAL;
> +   }
> +
> +   rc = qrtr_autobind(sock);
> +   if (rc) {
> +   release_sock(sk);
> +   return rc;
> +   }
> +   } else if (sk->sk_state == TCP_ESTABLISHED) {
> +   addr = >peer;
> +   } else {
> +   release_sock(sk);
> +   return -ENOTCONN;
> +   }
> +
> +   node = NULL;
> +   if (addr->sq_node == QRTR_NODE_BCAST) {
> +   enqueue_fn = qrtr_bcast_enqueue;
> +   } else if (addr->sq_node == 0 || addr->sq_node == ipc->us.sq_node) {

'addr->sq_node == 0' should be removed from this if-condition.  Zero is
a valid node id.  Clients needing the local address can use
getsockname(2).

> +   enqueue_fn = qrtr_local_enqueue;
> +   } else {
> +   enqueue_fn = qrtr_node_enqueue;
> +   node = qrtr_node_lookup(addr->sq_node);
> +   if (!node) {
> +   release_sock(sk);
> +   return -ECONNRESET;
> +   }
> +   }

-Courtney
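
With that change, the destination choice reduces to the following sketch. MODEL_QRTR_NODE_BCAST and its value 0xffffffff are assumptions (the quoted hunk does not show the definition of QRTR_NODE_BCAST), and model_qrtr_route() is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>

/* Assumed value of QRTR_NODE_BCAST; not shown in the quoted hunk. */
#define MODEL_QRTR_NODE_BCAST 0xffffffffu

enum model_qrtr_route { ROUTE_BCAST, ROUTE_LOCAL, ROUTE_REMOTE };

/* Destination choice in qrtr_sendmsg() with the suggested change:
 * node 0 is an ordinary node id, not an alias for the local node. */
static enum model_qrtr_route model_qrtr_route(uint32_t sq_node,
					      uint32_t local_node)
{
	if (sq_node == MODEL_QRTR_NODE_BCAST)
		return ROUTE_BCAST;
	if (sq_node == local_node)
		return ROUTE_LOCAL;
	return ROUTE_REMOTE;
}
```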

Re: [PATCH net] tcp: restore fastopen with no data in SYN packet

2015-12-16 Thread Yuchung Cheng

On Wed, Dec 16, 2015 at 1:53 PM, Eric Dumazet  wrote:
>
> From: Eric Dumazet 
>
> Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert
> tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.
>
> Some Fast Open users do not actually add any data in the SYN packet.
>
> Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
> Reported-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 
> Cc: Al Viro 
> ---
Acked-by: Yuchung Cheng 
>  net/ipv4/tcp_output.c |   23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index cb7ca569052c..9bfc39ff2285 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3150,7 +3150,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>  {
> struct tcp_sock *tp = tcp_sk(sk);
> struct tcp_fastopen_request *fo = tp->fastopen_req;
> -   int syn_loss = 0, space, err = 0, copied;
> +   int syn_loss = 0, space, err = 0;
> unsigned long last_syn_loss = 0;
> struct sk_buff *syn_data;
>
> @@ -3188,17 +3188,18 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
> goto fallback;
> syn_data->ip_summed = CHECKSUM_PARTIAL;
> memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> -   copied = copy_from_iter(skb_put(syn_data, space), space,
> -   >data->msg_iter);
> -   if (unlikely(!copied)) {
> -   kfree_skb(syn_data);
> -   goto fallback;
> -   }
> -   if (copied != space) {
> -   skb_trim(syn_data, copied);
> -   space = copied;
> +   if (space) {
> +   int copied = copy_from_iter(skb_put(syn_data, space), space,
> +   >data->msg_iter);
> +   if (unlikely(!copied)) {
> +   kfree_skb(syn_data);
> +   goto fallback;
> +   }
> +   if (copied != space) {
> +   skb_trim(syn_data, copied);
> +   space = copied;
> +   }
> }
> -
> /* No more data pending in inet_wait_for_connect() */
> if (space == fo->size)
> fo->data = NULL;
>
>

[PATCH net] tcp: restore fastopen with no data in SYN packet

2015-12-16 Thread Eric Dumazet

From: Eric Dumazet 

Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert
tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.

Some Fast Open users do not actually add any data in the SYN packet.

Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
Reported-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Cc: Al Viro 
---
 net/ipv4/tcp_output.c |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cb7ca569052c..9bfc39ff2285 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3150,7 +3150,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_request *fo = tp->fastopen_req;
-   int syn_loss = 0, space, err = 0, copied;
+   int syn_loss = 0, space, err = 0;
unsigned long last_syn_loss = 0;
struct sk_buff *syn_data;
 
@@ -3188,17 +3188,18 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
goto fallback;
syn_data->ip_summed = CHECKSUM_PARTIAL;
memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
-   copied = copy_from_iter(skb_put(syn_data, space), space,
-   >data->msg_iter);
-   if (unlikely(!copied)) {
-   kfree_skb(syn_data);
-   goto fallback;
-   }
-   if (copied != space) {
-   skb_trim(syn_data, copied);
-   space = copied;
+   if (space) {
+   int copied = copy_from_iter(skb_put(syn_data, space), space,
+   >data->msg_iter);
+   if (unlikely(!copied)) {
+   kfree_skb(syn_data);
+   goto fallback;
+   }
+   if (copied != space) {
+   skb_trim(syn_data, copied);
+   space = copied;
+   }
}
-
/* No more data pending in inet_wait_for_connect() */
if (space == fo->size)
fo->data = NULL;
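
The effect of the change can be seen in a small model of the copy step. It is a sketch under stated assumptions: model_copy() stands in for copy_from_iter() and simply copies min(space, available) bytes, a return value of -1 means "goto fallback", and any other return value is the resulting SYN payload length.

```c
#include <stddef.h>

/* Stand-in for copy_from_iter(): copies at most `available` bytes. */
static size_t model_copy(size_t space, size_t available)
{
	return available < space ? available : space;
}

/* Before the fix: a zero-byte copy always forced the fallback path,
 * even when the caller supplied no data at all (space == 0). */
static long model_syn_payload_old(size_t space, size_t available)
{
	size_t copied = model_copy(space, available);

	if (!copied)
		return -1;
	return (long)copied;		/* skb_trim() when short */
}

/* After the fix: the copy is only attempted when there is space. */
static long model_syn_payload_new(size_t space, size_t available)
{
	if (space) {
		size_t copied = model_copy(space, available);

		if (!copied)
			return -1;
		if (copied != space)
			space = copied;	/* skb_trim() equivalent */
	}
	return (long)space;
}
```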


Re: [PATCH] 82xx: FCC: Fixing a bug causing to FCC port lock-up

2015-12-16 Thread David Miller

From: Martin Roth 
Date: Tue, 15 Dec 2015 04:17:53 +0200

> The patch fixes an FCC port lock-up, which occurs as a result of a bug
> during underrun/collision handling. Within the tx_startup() function
> in mac-fcc.c, the address of the last BD is not calculated correctly.
> Because of this miscalculation, the next transmitted BD may be set
> to an area outside the transmit BD ring.
> This causes a port lock-up that is not recoverable.
> 
> Signed-off-by: Martin Roth 

Applied, thanks.

Re: pull-request: mac80211 2015-12-15

2015-12-16 Thread David Miller

From: Johannes Berg 
Date: Tue, 15 Dec 2015 13:30:21 +0100

> After going through my patch queue, I have another set of fixes that
> I think is still appropriate for the current cycle.
> 
> Two of my own restart changes there are fairly big but for the most
> part just move code around so it can be called in a slightly different
> order.
> 
> Let me know if there are any issues.

Something about your text encoding kept this from ending up
in patchwork for some reason.

Anyways, pulled, thanks a lot!

Re: [PATCH net-next v4 8/8] openvswitch: Interface with NAT.

2015-12-16 Thread Jarno Rajahalme

Thanks for review, I removed these in version 5.

  Jarno

> On Dec 10, 2015, at 11:10 AM, Pablo Neira Ayuso  wrote:
> 
> On Tue, Dec 08, 2015 at 05:01:10PM -0800, Jarno Rajahalme wrote:
>> -/* Call the helper right after nf_conntrack_in() for confirmed
>> - * connections, but only when commiting for unconfirmed connections.
>> - */
>>  ct = nf_ct_get(skb, );
>> -if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit)
>> -&& ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
>> -WARN_ONCE(1, "helper rejected packet");
>> -return -EINVAL;
>> +if (ct) {
>> +#ifdef CONFIG_NF_NAT_NEEDED
>> +/* Packets starting a new connection must be NATted before the
>> + * helper, so that the helper knows about the NAT.  We enforce
>> + * this by delaying both NAT and helper calls for unconfirmed
>> + * connections until the commiting CT action.  For later
>> + * packets NAT and Helper may be called in either order.
>> + *
>> + * NAT will be done only if the CT action has NAT, and only
>> + * once per packet (per zone), as guarded by the NAT bits in
>> + * the key->ct.state.
>> + */
>> +if (info->nat && !(key->ct.state & OVS_CS_F_NAT_MASK) &&
>> +(nf_ct_is_confirmed(ct) || info->commit) &&
>> +ovs_ct_nat(net, key, info, skb, ct, ctinfo) != NF_ACCEPT) {
>> +WARN_ONCE(1, "NAT rejected packet");
> 
> NAT can drop packets, so I don't think you need this warn_on.
> 
>> +return -EINVAL;
>> +}
>> +#endif
>> +/* Call the helper whenever nf_conntrack_in() was called for
>> + * confirmed connections, but only when commiting for
>> + * unconfirmed connections.
>> + */
>> +if ((nf_ct_is_confirmed(ct) ? !cached : info->commit)
>> +&& ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
>> +WARN_ONCE(1, "helper rejected packet");
> 
> Same thing may happen with helpers.

Re: [PATCH] inet: sanitize socket() protocol value

2015-12-16 Thread Eric Dumazet

On Wed, Dec 16, 2015 at 4:57 PM, Vegard Nossum  wrote:
> If you create a raw socket with a protocol of e.g. 0x1, then
> inet_sk(sk)->inet_num will get set to 0 since it only has room for 16
> bits. This causes problems further down the line as lots of code makes
> assumptions about ->inet_num, for example connect()...inet_autobind()
> will attempt to call sk->sk_prot->get_port() which is invalid:
>
>   BUG: unable to handle kernel NULL pointer dereference at   (null)
>   IP: [<  (null)>]   (null)
>   PGD 19ea0067 PUD 19d3f067 PMD 0
>   Oops: 0010 [#1] SMP
>   CPU: 1 PID: 849 Comm: a.out Not tainted 4.4.0-rc4+ #287
>   task: 880019a12640 ti: 8808c000 task.ti: 8808c000
>   RIP: 0010:[<>]  [<  (null)>]   (null)
>   RSP: 0018:8808fe50  EFLAGS: 00010246
>   RAX: 81cc61f0 RBX: 880019d2ca80 RCX: 0802
>   RDX: 0001 RSI:  RDI: 880019d2ca80
>   RBP: 8808fe60 R08: 7f0e6b132e80 R09: 880019d2ca80
>   R10: 88001a7a72e0 R11: 880019a12640 R12: 000b
>   R13: 00400644 R14:  R15: 
>   FS:  7f0e6b355740() GS:88001a90() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2:  CR3: 19d3d000 CR4: 001406a0
>   Stack:
>815f1c69 880019d2ca80 8808fe88 815f1cde
>000b000b 8808fea0 880019484000 8808ff38
>8156f310  2a2a2a2a2a2a2a2a 812a
>   Call Trace:
>[] ? inet_autobind+0x23/0x50
>[] inet_dgram_connect+0x48/0x64
>[] SYSC_connect+0x84/0xae
>[] ? sock_alloc_file+0xb3/0x108
>[] ? fd_install+0x20/0x22
>[] ? SYSC_socket+0x62/0x90
>[] SyS_connect+0x9/0xb
>[] entry_SYSCALL_64_fastpath+0x12/0x71
>   Code:  Bad RIP value.
>   RIP  [<  (null)>]   (null)
>RSP 
>   CR2: 
>   ---[ end trace bd60b4fe2edc2537 ]---
>
> Signed-off-by: Vegard Nossum 
> Cc: Eric Dumazet 
> Cc: 
> ---
>  net/ipv4/af_inet.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git net/ipv4/af_inet.c net/ipv4/af_inet.c
> index 11c4ca1..4e1583a 100644
> --- net/ipv4/af_inet.c
> +++ net/ipv4/af_inet.c
> @@ -316,6 +316,12 @@ lookup_protocol:
>
> WARN_ON(!answer_prot->slab);
>
> +   /* Check that the protocol we were given will actually fit in
> +* inet->inet_num. */
> +   err = -EINVAL;
> +   if (protocol != (typeof(inet->inet_num)) protocol)
> +   goto out;
> +
> err = -ENOBUFS;
> sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
> if (!sk)
> --
> 1.9.1
>

This looks to be already fixed in a better way (IPv6, ...):

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=79462ad02e861803b3840cc782248c7359451cd9

Thanks
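
The truncation the proposed patch guards against is easy to reproduce in isolation. In this sketch, model_inet_num_t is a hypothetical stand-in for the 16-bit type of inet->inet_num, and protocol_fits_inet_num() mirrors the typeof() cast comparison in the quoted hunk; it does not reproduce the approach of the commit linked above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit type of inet->inet_num. */
typedef uint16_t model_inet_num_t;

/* Mirrors the check in the proposed hunk: reject protocol values that
 * would not survive assignment to inet_num unchanged. */
static int protocol_fits_inet_num(int protocol)
{
	return protocol == (model_inet_num_t)protocol;
}
```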

linux-next: manual merge of the akpm-current tree with the net-next tree

2015-12-16 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/net/sock.h
  net/ipv4/tcp_ipv4.c

between commit:

  64be0aed59ad ("net: diag: Add the ability to destroy a socket.")

from the net-next tree and commit:

  0e2cde9cf7b6 ("net: tcp_memcontrol: simplify linkage between socket and page counter")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc include/net/sock.h
index f772b8245cae,edd552ef8e38..
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@@ -309,8 -292,8 +293,8 @@@ struct cg_proto
*   @sk_send_head: front of stuff to transmit
*   @sk_security: used by security modules
*   @sk_mark: generic packet mark
 -  *   @sk_classid: this socket's cgroup classid
 +  *   @sk_cgrp_data: cgroup data for this cgroup
-   *   @sk_cgrp: this socket's cgroup-specific proto data
+   *   @sk_memcg: this socket's memory cgroup association
*   @sk_write_pending: a write to stream socket waits to start
*   @sk_state_change: callback to indicate change in the state of the sock
*   @sk_data_ready: callback to indicate there is data to be processed
@@@ -444,8 -428,11 +428,8 @@@ struct sock 
  #ifdef CONFIG_SECURITY
void*sk_security;
  #endif
 -  __u32   sk_mark;
 -#ifdef CONFIG_CGROUP_NET_CLASSID
 -  u32 sk_classid;
 -#endif
 +  struct sock_cgroup_data sk_cgrp_data;
-   struct cg_proto *sk_cgrp;
+   struct mem_cgroup   *sk_memcg;
void(*sk_state_change)(struct sock *sk);
void(*sk_data_ready)(struct sock *sk);
void(*sk_write_space)(struct sock *sk);
@@@ -1051,19 -1036,6 +1035,7 @@@ struct proto 
  #ifdef SOCK_REFCNT_DEBUG
atomic_tsocks;
  #endif
- #ifdef CONFIG_MEMCG_KMEM
-   /*
-* cgroup specific init/deinit functions. Called once for all
-* protocols that implement it, from cgroups populate function.
-* This function has to setup any files the protocol want to
-* appear in the kmem cgroup filesystem.
-*/
-   int (*init_cgroup)(struct mem_cgroup *memcg,
-  struct cgroup_subsys *ss);
-   void(*destroy_cgroup)(struct mem_cgroup *memcg);
-   struct cg_proto *(*proto_cgroup)(struct mem_cgroup *memcg);
- #endif
 +  int (*diag_destroy)(struct sock *sk, int err);
  };
  
  int proto_register(struct proto *prot, int alloc_slab);
diff --cc net/ipv4/tcp_ipv4.c
index 205e6745393f,34c26782e114..
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@@ -2336,12 -2339,6 +2338,7 @@@ struct proto tcp_prot = 
.compat_setsockopt  = compat_tcp_setsockopt,
.compat_getsockopt  = compat_tcp_getsockopt,
  #endif
- #ifdef CONFIG_MEMCG_KMEM
-   .init_cgroup= tcp_init_cgroup,
-   .destroy_cgroup = tcp_destroy_cgroup,
-   .proto_cgroup   = tcp_proto_cgroup,
- #endif
 +  .diag_destroy   = tcp_abort,
  };
  EXPORT_SYMBOL(tcp_prot);
  

Re: Load Balancing for AF_INET Raw Sockets

2015-12-16 Thread Prashant Upadhyaya

On Tue, Dec 15, 2015 at 8:09 PM, Eric Dumazet  wrote:
> On Tue, 2015-12-15 at 18:26 +0530, Prashant Upadhyaya wrote:
>> Hi,
>>
>> I open a raw socket for listening to all the UDP packets in a raw fashion --
>>
>> socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
>>
>> Then I use recvfrom to read the packets over the socket.
>>
>> The above works mighty fine.
>> I want to find out if it is possible to 'load balance' the UDP flows
>> by opening up multiple instances of this socket and then possibly
>> setting some socket options so that I can scale up the reading via
>> multiple threads doing recvfrom on these from multiple cores.
>> (I know it is possible over packet sockets, but that is a different usecase)
>
> No plan yet to support fanout on multiple raw sockets.
>
>
Hi,

One question about AF_INET6 raw sockets.
Here I don't get the IPv6 header at all when I read a packet.
I checked RFC 3542, and it specifies the following as the ancillary
data that can be obtained --

Four similar pieces of information can be returned for a received
   packet as ancillary data:

  1.  the destination IPv6 address,
  2.  the arriving interface index,
  3.  the arriving hop limit, and
  4.  the arriving traffic class value.

So how do I obtain the source IPv6 address?

Regards
-Prashant
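
As a pointer for the last question (a sketch, not taken from the thread): the source address of a packet received on a raw AF_INET6 socket is not delivered as ancillary data but in the src_addr argument of recvfrom() (msg_name for recvmsg()), as a struct sockaddr_in6. format_src6() below is a hypothetical helper that formats it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Format the sender address returned by recvfrom()/recvmsg().
 * Returns 0 on success, -1 if it is not an AF_INET6 address.
 *
 * Usage sketch, with fd an AF_INET6/SOCK_RAW socket:
 *	struct sockaddr_storage ss;
 *	socklen_t sslen = sizeof(ss);
 *	recvfrom(fd, pkt, sizeof(pkt), 0, (struct sockaddr *)&ss, &sslen);
 *	format_src6(&ss, text, sizeof(text));
 */
static int format_src6(const struct sockaddr_storage *ss, char *buf,
		       socklen_t len)
{
	const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;

	if (ss->ss_family != AF_INET6)
		return -1;
	return inet_ntop(AF_INET6, &sin6->sin6_addr, buf, len) ? 0 : -1;
}
```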

Re: [PATCH net-next 1/4] cxgb4: Use mask & shift while returning vlan_prio

2015-12-16 Thread David Miller

From: Casey Leedom 
Date: Wed, 16 Dec 2015 17:40:41 -0800

> 
>> On Dec 16, 2015, at 4:07 PM, David Miller  wrote:
>> 
>> From: Hariprasad Shenai 
>> Date: Wed, 16 Dec 2015 13:16:25 +0530
>> 
>>> @@ -66,7 +66,7 @@ struct l2t_data {
>>> 
>>> static inline unsigned int vlan_prio(const struct l2t_entry *e)
>>> {
>>> -   return e->vlan >> 13;
>>> +   return (e->vlan & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
>> 
>> e->vlan is a u16, the vlan priority is the top 3 bits of the 16-bit
>> value, and finally the right shift will be non-signed.
>> 
>> Therefore this change is absolutely not necessary.
>> 
>> Please remove this patch from the series and resend.
> 
> I assume that you only meant that the masking portion is
> unnecessary.  Doing the shift with the symbolic constant
> VLAN_PRIO_SHIFT instead of the literal constant “13” is still a
> reasonable change.  The masking was almost certainly from me because
> once one uses the symbolic constants, we’re not supposed to “know”
> about the internal structure of the operation.  Modern compilers are
> of course free to optimize away the mask, etc.

Yes I'm only objecting to the unnecessary mask operation.
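
The point about the mask can be checked exhaustively: for every 16-bit TCI value, shifting right by 13 already leaves only the three priority bits, so masking with VLAN_PRIO_MASK first changes nothing. The two constants are reproduced here with the values they have in <linux/if_vlan.h> (0xe000 and 13); the helper names are hypothetical.

```c
#include <stdint.h>

/* Same values as in <linux/if_vlan.h>. */
#define VLAN_PRIO_MASK	0xe000
#define VLAN_PRIO_SHIFT	13

static unsigned int prio_shift_only(uint16_t tci)
{
	return tci >> 13;
}

static unsigned int prio_mask_and_shift(uint16_t tci)
{
	return (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
}

/* Returns 1 if both forms agree for every 16-bit TCI value. */
static int prio_forms_agree(void)
{
	uint32_t v;

	for (v = 0; v <= 0xffff; v++)
		if (prio_shift_only((uint16_t)v) !=
		    prio_mask_and_shift((uint16_t)v))
			return 0;
	return 1;
}
```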

Re: [PATCH] drivers: net: cpsw: fix RMII/RGMII mode when used with fixed-link PHY

2015-12-16 Thread David Rivshin (Allworx)

On Wed, 16 Dec 2015 07:39:16 +0100
Markus Brunner  wrote:

> On Monday 14 December 2015 13:04:46 David Rivshin wrote:
> > On Sat, 12 Dec 2015 16:44:19 +0100
> ...
> > > Your patch works fine on my board, which uses MII and dual_emac
> > > with a fixed_phy and a real one.
> > 
> > Thanks for checking. The only dual_emac board I have available is
> > the EVMSK, which has two real PHYs. I'm not sure of the usual
> > etiquette (and Google was  unhelpful), should I add a Tested-by on
> > the next version?
> > 
> Yes you can. Documentation/SubmittingPatches has some notes about it.

Thanks, I didn't want to throw it on without permission. However, due
to the non-trivial change I mention below, I figured that the previous
testing wasn't fully valid anymore anyway, so I left it off the v2
emails.

> > > I wanted to keep changes small and didn't give much thought
> > > to already broken devicetrees. Since my patch is quite new, I
> > 
> > I'm honestly not sure it's an important consideration myself. Most
> > patches I've seen in this area for this or other drivers do not take
> > such behavior into account (e.g. the phy-handle parsing that went in
> > to cpsw in 4.3).
> > I would generally feel more comfortable with such a behavior tweak
> > (minor as it is) before 4.4 is released, to avoid ping-ponging the
> > behavior. But given how far along the cycle is, I'm not sure about
> > the chances of that.
> 
> Well I don't think compatibility for flawed DTs is such a high
> priority, especially if it is that unlikely that there are some
> affected. Keep the focus on the other _real_ problems you have
> encountered and fix those like you see fit. 

Since there's been no indication from anyone that being nice to such
broken DTs is desired, I decided to drop that aspect of the patch and 
leave the current 4.4-rc1..5 behavior. This also made it much more 
reasonable to chop up the patch into smaller pieces, which I think will 
be easier to review.

> > > don't see any problems with subtle changes like that. However you
> > > should update the documentation as well.
> > 
> > Your patch already updated .../bindings/net/cpsw.txt, which this
> > patch left alone. Are you referring to some other documentation,
> > or do you think I should update the binding documentation to state
> > that phy_id takes precedence over fixed-link? I figured that such
> > devicetrees were still officially malformed, so I thought the
> > existing text was appropriate.
> 
> "Either the properties phy_id and phy-mode, or the sub-node
> fixed-link can be specified" One flaw of my patch was to ignore the
> phy-mode for a fixed link. Do not mention the precedence of the
> phy_id, because it is an undefined behavior. Your patch should change
> it to: "Either the property phy_id, or the sub-node fixed-link can be
> specified"

Thanks for pointing that out. For some reason my brain skipped over the 
"and phy-mode" part. Fixed in v2.

Re: [PATCH 15/15] i40iw: changes for build of i40iw module

2015-12-16 Thread kbuild test robot

Hi Faisal,

[auto build test WARNING on net/master]
[also build test WARNING on v4.4-rc5 next-20151216]
[cannot apply to net-next/master]

url: https://github.com/0day-ci/linux/commits/Faisal-Latif/add-Intel-R-X722-iWARP-driver/20151217-040340
config: sparc-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc 

All warnings (new ones prefixed by >>):

   drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_setup_kmode_qp':
>> drivers/infiniband/hw/i40iw/i40iw_verbs.c:571:28: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 info->rq_pa = (uintptr_t)((u8 *)mem->pa + (sqdepth * I40IW_QP_WQE_MIN_SIZE));
   ^

vim +571 drivers/infiniband/hw/i40iw/i40iw_verbs.c

e4d636f5 Faisal Latif 2015-12-16  555   ukinfo->rq_wrid_array = (u64 *)&ukinfo->sq_wrtrk_array[sqdepth];
e4d636f5 Faisal Latif 2015-12-16  556  
e4d636f5 Faisal Latif 2015-12-16  557   size = (sqdepth + rqdepth) * I40IW_QP_WQE_MIN_SIZE;
e4d636f5 Faisal Latif 2015-12-16  558   size += (I40IW_SHADOW_AREA_SIZE << 3);
e4d636f5 Faisal Latif 2015-12-16  559  
e4d636f5 Faisal Latif 2015-12-16  560   status = i40iw_allocate_dma_mem(iwdev->sc_dev.hw, mem, size, 256);
e4d636f5 Faisal Latif 2015-12-16  561   if (status) {
e4d636f5 Faisal Latif 2015-12-16  562   kfree(ukinfo->sq_wrtrk_array);
e4d636f5 Faisal Latif 2015-12-16  563   ukinfo->sq_wrtrk_array = NULL;
e4d636f5 Faisal Latif 2015-12-16  564   return -ENOMEM;
e4d636f5 Faisal Latif 2015-12-16  565   }
e4d636f5 Faisal Latif 2015-12-16  566  
e4d636f5 Faisal Latif 2015-12-16  567   ukinfo->sq = mem->va;
e4d636f5 Faisal Latif 2015-12-16  568   info->sq_pa = mem->pa;
e4d636f5 Faisal Latif 2015-12-16  569  
e4d636f5 Faisal Latif 2015-12-16  570   ukinfo->rq = (u64 *)((u8 *)mem->va + (sqdepth * I40IW_QP_WQE_MIN_SIZE));
e4d636f5 Faisal Latif 2015-12-16 @571   info->rq_pa = (uintptr_t)((u8 *)mem->pa + (sqdepth * I40IW_QP_WQE_MIN_SIZE));
e4d636f5 Faisal Latif 2015-12-16  572  
e4d636f5 Faisal Latif 2015-12-16  573   ukinfo->shadow_area = (u64 *)((u8 *)ukinfo->rq +
e4d636f5 Faisal Latif 2015-12-16  574 (rqdepth * I40IW_QP_WQE_MIN_SIZE));
e4d636f5 Faisal Latif 2015-12-16  575   info->shadow_area_pa = info->rq_pa + (rqdepth * I40IW_QP_WQE_MIN_SIZE);
e4d636f5 Faisal Latif 2015-12-16  576  
e4d636f5 Faisal Latif 2015-12-16  577   ukinfo->sq_size = sq_size;
e4d636f5 Faisal Latif 2015-12-16  578   ukinfo->rq_size = rq_size;
e4d636f5 Faisal Latif 2015-12-16  579   ukinfo->qp_id = iwqp->ibqp.qp_num;

:: The code at line 571 was first introduced by commit
:: e4d636f5c9dea5d2dd1f5c74e3a2235218a537a8 i40iw: add files for iwarp interface

:: TO: Faisal Latif <faisal.la...@intel.com>
:: CC: 0day robot <fengguang...@intel.com>
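The warning means that, in this sparc configuration, the integer being cast (`mem->pa`) and a pointer do not have the same width, so the `(u8 *)` round trip is not safe. Offsetting the address with plain integer arithmetic avoids the cast entirely, the same style line 575 already uses for `shadow_area_pa`. A minimal sketch under stated assumptions: `dma_addr64_t`, `rq_pa_from_sq_pa()` and the value 32 for `WQE_MIN_SIZE` are hypothetical stand-ins, not i40iw code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a bus/physical address held in an integer. */
typedef uint64_t dma_addr64_t;

/* Hypothetical value standing in for I40IW_QP_WQE_MIN_SIZE. */
#define WQE_MIN_SIZE 32u

/*
 * Compute the RQ address from the SQ base using integer arithmetic only.
 * No pointer cast is involved, so the result does not depend on
 * sizeof(void *) and nothing is truncated.
 */
static dma_addr64_t rq_pa_from_sq_pa(dma_addr64_t sq_pa, uint32_t sqdepth)
{
	return sq_pa + (dma_addr64_t)sqdepth * WQE_MIN_SIZE;
}
```

In the driver the equivalent would be something along the lines of `info->rq_pa = mem->pa + (sqdepth * I40IW_QP_WQE_MIN_SIZE);`, offered here only as a suggestion.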

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH] Add tcindex to conntrack and add netfilter target/matches

2015-12-16 Thread kbuild test robot

Hi Luuk,

[auto build test ERROR on nf-next/master]
[also build test ERROR on v4.4-rc5 next-20151216]

url: https://github.com/0day-ci/linux/commits/Luuk-Paulussen/Add-tcindex-to-conntrack-and-add-netfilter-target-matches/20151216-082324
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next master
config: x86_64-randconfig-x019-12171138 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `conntcindex_tg_destroy':
>> xt_conntcindex.c:(.text+0x7cf5d): undefined reference to `nf_ct_l3proto_module_put'
   net/built-in.o: In function `conntcindex_mt_destroy':
   xt_conntcindex.c:(.text+0x7cf71): undefined reference to `nf_ct_l3proto_module_put'
   net/built-in.o: In function `conntcindex_tg_check':
>> xt_conntcindex.c:(.text+0x7d0aa): undefined reference to `nf_ct_l3proto_try_module_get'
   net/built-in.o: In function `conntcindex_mt_check':
   xt_conntcindex.c:(.text+0x7d0f2): undefined reference to `nf_ct_l3proto_try_module_get'


[PATCH v2] drivers: net: xgene: fix Tx flow control

2015-12-16 Thread Iyappan Subramanian

Currently the Tx flow control is based on reading the hardware state,
which is not accurate since it may not reflect the descriptors that
are not yet reached the memory.

To accurately control the Tx flow, changing it to be software based.
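A minimal sketch of the software accounting idea, assuming two free-running u32 counters: one bumped by the xmit path and one by completion processing. The struct and function names are hypothetical, not xgene driver code. Because unsigned subtraction in C is performed modulo 2^32, the in-flight count `tx_level - txc_level` stays correct even after `tx_level` wraps past zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical software Tx accounting: both counters only ever increase
 * and are allowed to wrap around at 2^32.
 */
struct tx_accounting {
	uint32_t tx_level;	/* descriptors handed to hardware by xmit */
	uint32_t txc_level;	/* descriptors reported back as completed */
};

/* Descriptors still in flight; unsigned subtraction wraps modulo 2^32. */
static uint32_t tx_in_flight(const struct tx_accounting *a)
{
	return a->tx_level - a->txc_level;
}

/* Stop the queue once more than qcnt_hi descriptors are outstanding. */
static bool tx_should_stop(const struct tx_accounting *a, uint32_t qcnt_hi)
{
	return tx_in_flight(a) > qcnt_hi;
}
```

For example, with `tx_level` = 5 (after wrapping) and `txc_level` = 0xfffffffe, the in-flight count is 7.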

Signed-off-by: Iyappan Subramanian 
---

v2: Address v1 review comments
- Removed use of atomic_t
- Added tx_level and txc_level counters to pdata

v1: Initial version
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 38 ++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  4 +--
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 9147a01..d0ae1a6 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -289,6 +289,7 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
struct sk_buff *skb)
 {
struct device *dev = ndev_to_dev(tx_ring->ndev);
+   struct xgene_enet_pdata *pdata = netdev_priv(tx_ring->ndev);
struct xgene_enet_raw_desc *raw_desc;
__le64 *exp_desc = NULL, *exp_bufs = NULL;
dma_addr_t dma_addr, pbuf_addr, *frag_dma_addr;
@@ -419,6 +420,7 @@ out:
raw_desc->m0 = cpu_to_le64(SET_VAL(LL, ll) | SET_VAL(NV, nv) |
   SET_VAL(USERINFO, tx_ring->tail));
tx_ring->cp_ring->cp_skb[tx_ring->tail] = skb;
+   pdata->tx_level += count;
tx_ring->tail = tail;
 
return count;
@@ -429,14 +431,13 @@ static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
 {
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
struct xgene_enet_desc_ring *tx_ring = pdata->tx_ring;
-   struct xgene_enet_desc_ring *cp_ring = tx_ring->cp_ring;
-   u32 tx_level, cq_level;
+   u32 tx_level = pdata->tx_level;
int count;
 
-   tx_level = pdata->ring_ops->len(tx_ring);
-   cq_level = pdata->ring_ops->len(cp_ring);
-   if (unlikely(tx_level > pdata->tx_qcnt_hi ||
-cq_level > pdata->cp_qcnt_hi)) {
+   if (tx_level < pdata->txc_level)
+   tx_level += ((typeof(pdata->tx_level))~0U);
+
+   if ((tx_level - pdata->txc_level) > pdata->tx_qcnt_hi) {
netif_stop_queue(ndev);
return NETDEV_TX_BUSY;
}
@@ -539,10 +540,13 @@ static int xgene_enet_process_ring(struct xgene_enet_desc_ring *ring,
struct xgene_enet_raw_desc *raw_desc, *exp_desc;
u16 head = ring->head;
u16 slots = ring->slots - 1;
-   int ret, count = 0, processed = 0;
+   int ret, desc_count, count = 0, processed = 0;
+   bool is_completion;
 
do {
raw_desc = &ring->raw_desc[head];
+   desc_count = 0;
+   is_completion = false;
exp_desc = NULL;
if (unlikely(xgene_enet_is_desc_slot_empty(raw_desc)))
break;
@@ -559,18 +563,24 @@ static int xgene_enet_process_ring(struct xgene_enet_desc_ring *ring,
}
dma_rmb();
count++;
+   desc_count++;
}
-   if (is_rx_desc(raw_desc))
+   if (is_rx_desc(raw_desc)) {
ret = xgene_enet_rx_frame(ring, raw_desc);
-   else
+   } else {
ret = xgene_enet_tx_completion(ring, raw_desc);
+   is_completion = true;
+   }
xgene_enet_mark_desc_slot_empty(raw_desc);
if (exp_desc)
xgene_enet_mark_desc_slot_empty(exp_desc);
 
head = (head + 1) & slots;
count++;
+   desc_count++;
processed++;
+   if (is_completion)
+   pdata->txc_level += desc_count;
 
if (ret)
break;
@@ -580,10 +590,8 @@ static int xgene_enet_process_ring(struct xgene_enet_desc_ring *ring,
pdata->ring_ops->wr_cmd(ring, -count);
ring->head = head;
 
-   if (netif_queue_stopped(ring->ndev)) {
-   if (pdata->ring_ops->len(ring) < pdata->cp_qcnt_low)
-   netif_wake_queue(ring->ndev);
-   }
+   if (netif_queue_stopped(ring->ndev))
+   netif_start_queue(ring->ndev);
}
 
return processed;
@@ -1033,9 +1041,7 @@ static int xgene_enet_create_desc_rings(struct net_device *ndev)
pdata->tx_ring->cp_ring = cp_ring;
pdata->tx_ring->dst_ring_num = xgene_enet_dst_ring_num(cp_ring);
 
-   pdata->tx_qcnt_hi = pdata->tx_ring->slots / 2;
-   pdata->cp_qcnt_hi = pdata->rx_ring->slots / 2;
-   pdata->cp_qcnt_low = pdata->cp_qcnt_hi

Re: [PATCH net-next 1/4] cxgb4: Use mask & shift while returning vlan_prio

2015-12-16 Thread David Miller

From: Hariprasad Shenai 
Date: Wed, 16 Dec 2015 13:16:25 +0530

> @@ -66,7 +66,7 @@ struct l2t_data {
>  
>  static inline unsigned int vlan_prio(const struct l2t_entry *e)
>  {
> - return e->vlan >> 13;
> + return (e->vlan & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;

e->vlan is a u16, the VLAN priority is the top 3 bits of the 16-bit
value, and finally the right shift will be unsigned.

Therefore this change is absolutely not necessary.

Please remove this patch from the series and resend.
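A quick way to check the point above: for a 16-bit value, only the top 3 bits survive a right shift by 13, so masking with VLAN_PRIO_MASK first changes nothing. A small self-check, with the mask and shift values copied from include/linux/if_vlan.h; the function names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in include/linux/if_vlan.h. */
#define VLAN_PRIO_MASK  0xe000
#define VLAN_PRIO_SHIFT 13

/* The original form: shift only. */
static unsigned int vlan_prio_shift_only(uint16_t vlan)
{
	return vlan >> VLAN_PRIO_SHIFT;
}

/* The proposed form: mask, then shift. */
static unsigned int vlan_prio_mask_shift(uint16_t vlan)
{
	return (vlan & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
}

/* Returns 1 if both forms agree for every possible 16-bit TCI value. */
static int vlan_prio_forms_agree(void)
{
	uint32_t v;

	for (v = 0; v <= 0xffff; v++)
		if (vlan_prio_shift_only((uint16_t)v) !=
		    vlan_prio_mask_shift((uint16_t)v))
			return 0;
	return 1;
}
```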

Re: [PATCH v7 0/4] Support administratively closing application sockets

2015-12-16 Thread David Miller

From: Eric Dumazet 
Date: Wed, 16 Dec 2015 15:35:35 -0800

> socket can be shared by fd passing to multiple users.
> 
> Who really owns it ?

Oh yes the old fd permission discussion...

Probably need to use the opener's permissions.


[PATCH] inet: sanitize socket() protocol value

2015-12-16 Thread Vegard Nossum

If you create a raw socket with a protocol of e.g. 0x10000, then
inet_sk(sk)->inet_num will get set to 0 since it only has room for 16
bits. This causes problems further down the line as lots of code makes
assumptions about ->inet_num. For example, connect()...inet_autobind()
will attempt to call sk->sk_prot->get_port(), which is NULL here:

  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP: [<  (null)>]   (null)
  PGD 19ea0067 PUD 19d3f067 PMD 0
  Oops: 0010 [#1] SMP
  CPU: 1 PID: 849 Comm: a.out Not tainted 4.4.0-rc4+ #287
  task: 880019a12640 ti: 8808c000 task.ti: 8808c000
  RIP: 0010:[<>]  [<  (null)>]   (null)
  RSP: 0018:8808fe50  EFLAGS: 00010246
  RAX: 81cc61f0 RBX: 880019d2ca80 RCX: 0802
  RDX: 0001 RSI:  RDI: 880019d2ca80
  RBP: 8808fe60 R08: 7f0e6b132e80 R09: 880019d2ca80
  R10: 88001a7a72e0 R11: 880019a12640 R12: 000b
  R13: 00400644 R14:  R15: 
  FS:  7f0e6b355740() GS:88001a90() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 19d3d000 CR4: 001406a0
  Stack:
   815f1c69 880019d2ca80 8808fe88 815f1cde
   000b000b 8808fea0 880019484000 8808ff38
   8156f310  2a2a2a2a2a2a2a2a 812a
  Call Trace:
   [] ? inet_autobind+0x23/0x50
   [] inet_dgram_connect+0x48/0x64
   [] SYSC_connect+0x84/0xae
   [] ? sock_alloc_file+0xb3/0x108
   [] ? fd_install+0x20/0x22
   [] ? SYSC_socket+0x62/0x90
   [] SyS_connect+0x9/0xb
   [] entry_SYSCALL_64_fastpath+0x12/0x71
  Code:  Bad RIP value.
  RIP  [<  (null)>]   (null)
   RSP 
  CR2: 
  ---[ end trace bd60b4fe2edc2537 ]---

Signed-off-by: Vegard Nossum 
Cc: Eric Dumazet 
Cc: 
---
 net/ipv4/af_inet.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git net/ipv4/af_inet.c net/ipv4/af_inet.c
index 11c4ca1..4e1583a 100644
--- net/ipv4/af_inet.c
+++ net/ipv4/af_inet.c
@@ -316,6 +316,12 @@ lookup_protocol:
 
WARN_ON(!answer_prot->slab);
 
+   /* Check that the protocol we were given will actually fit in
+* inet->inet_num. */
+   err = -EINVAL;
+   if (protocol != (typeof(inet->inet_num)) protocol)
+   goto out;
+
err = -ENOBUFS;
sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
if (!sk)
-- 
1.9.1
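A sketch of the round-trip test the patch adds, outside the kernel. It assumes, as the patch does, that inet->inet_num is 16 bits wide; `inet_num_t` and `protocol_fits_inet_num()` are hypothetical names used only for illustration. Converting an int to an unsigned 16-bit type keeps the value modulo 2^16, so any protocol number that changes under the cast cannot be stored faithfully.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the (16-bit) type of inet->inet_num. */
typedef uint16_t inet_num_t;

/*
 * True if 'protocol' survives a round trip through the 16-bit field.
 * This is the inverse of the patch's
 *   protocol != (typeof(inet->inet_num)) protocol
 * rejection test.
 */
static bool protocol_fits_inet_num(int protocol)
{
	return protocol == (inet_num_t)protocol;
}
```

With this check, 0x10000 (which would otherwise be stored as 0) and negative values are rejected, while 0..0xffff pass.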


Re: [PATCH] wireless: change cfg80211 regulatory domain info as debug messages

2015-12-16 Thread Dave Young

Hi,

On 12/11/15 at 03:26pm, Johannes Berg wrote:
> On Mon, 2015-11-23 at 09:37 +0800, Dave Young wrote:
> 
> > Seems there're a lot of other wireless messages. Should we refactor 
> > them as well? I still did not get chance to see where is the code.
> > (My wireless driver being used is iwlwifi)
> 
> Most are probably from net/mac80211/.
> 
> > # dmesg|grep "Limiting TX power"|wc
> >    4128   49600  360052
> > 
> 
> I fixed this one recently, due to a bug it was printed all the time
> instead of just once when taking effect.

Cool, has the fix been in mainline or somewhere else?

Thanks
Dave

[PATCH net-next v5 0/8] openvswitch: NAT support.

2015-12-16 Thread Jarno Rajahalme

This series adds NAT support to the openvswitch kernel module.  A few
changes are needed to the netfilter code to facilitate this (patches
1-3/8).  Patches 4-7 make the openvswitch kernel module ready for
patch 8, which adds the NAT support by calling into netfilter NAT code
from the openvswitch conntrack action.

This version addresses all the comments received on prior versions and
rebases to current net-next.

The OVS master now has the corresponding OVS userspace support to use
and test the NAT features.  Below is a walk-through of a simple use
case.

In this case ports 1 and 2 are in different namespaces.  The OpenFlow
table below only allows IPv4 connections initiated from port 1, and
applies source NAT to those connections:

   in_port=1,ip,action=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2
   in_port=2,ct_state=-trk,ip,action=ct(table=0,zone=1,nat)
   in_port=2,ct_state=+est,ct_zone=1,ip,action=1

The first rule matches all IPv4 traffic from port 1, runs it
through conntrack in zone 1, and NATs it.  The NAT is initialized to
do source IP mapping to the given range for the first packet of each
connection, after which the new connection is committed (confirmed).
For further packets of already tracked connections NAT is done
according to the connection state and the commit is a no-op.  Each
packet that is not flagged as a drop by the CT action is forwarded to
port 2.  The CT action does an implicit fragmentation reassembly, so
that only complete packets are run through conntrack.  Reassembled
packets are re-fragmented on output.

The IPv4 traffic coming from port 2 is first matched for the
non-tracked state (-trk), which means that the packet has not been
through a CT action yet.  Such traffic is run through conntrack in
zone 1 and all packets associated with a NATted connection are NATted
also in the return direction.  After the packet has been through
conntrack it is recirculated back to OpenFlow table 0 (which is the
default table, so all the rules above are in table 0).  The CT action
sets the 'trk' flag, so the packets after
matches the recirculated packets that were marked as established by
conntrack (+est), and the packet is output on port 1.  Matching on
ct_zone is not strictly needed, but in this test case it verifies that
the ct_zone key attribute is properly set by the conntrack action.

A full test case requires rules for ARP handling not shown here.

The flow table above is an OpenFlow table, and the rules therein
are translated to kernel flow entries on-demand by ovs-vswitchd.
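The +/- notation in the flow table above is a masked match on the connection state bits: '+flag' requires the bit to be set and '-flag' requires it to be clear, with all other bits wildcarded. A minimal sketch of that semantics in C, using the OVS_CS_F_* values from include/uapi/linux/openvswitch.h (including the two NAT bits added in patch 8); `ct_state_matches()` and `ct_state_has_nat()` are hypothetical helpers, not OVS code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection state bits, values as in include/uapi/linux/openvswitch.h. */
#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_REPLY_DIR   0x08
#define OVS_CS_F_INVALID     0x10
#define OVS_CS_F_TRACKED     0x20
#define OVS_CS_F_SRC_NAT     0x40
#define OVS_CS_F_DST_NAT     0x80
#define OVS_CS_F_NAT_MASK    (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)

/* 'plus' bits must be set, 'minus' bits must be clear; others ignored. */
static bool ct_state_matches(uint32_t state, uint32_t plus, uint32_t minus)
{
	return (state & plus) == plus && (state & minus) == 0;
}

/* True if NAT mangled either the source or the destination. */
static bool ct_state_has_nat(uint32_t state)
{
	return (state & OVS_CS_F_NAT_MASK) != 0;
}
```

Rule 2 (ct_state=-trk) corresponds to `ct_state_matches(state, 0, OVS_CS_F_TRACKED)` and rule 3 (ct_state=+est) to `ct_state_matches(state, OVS_CS_F_ESTABLISHED, 0)`; once the CT action has set the tracked bit, rule 2 no longer matches.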

Jarno Rajahalme (8):
  netfilter: Remove IP_CT_NEW_REPLY definition.
  netfilter: Factor out nf_ct_get_info().
  netfilter: Allow calling into nat helper without skb_dst.
  openvswitch: Update the CT state key only after nf_conntrack_in().
  openvswitch: Find existing conntrack entry after upcall.
  openvswitch: Handle NF_REPEAT in conntrack action.
  openvswitch: Delay conntrack helper call for new connections.
  openvswitch: Interface with NAT.

 include/net/netfilter/nf_conntrack.h   |  15 +
 include/uapi/linux/netfilter/nf_conntrack_common.h |  12 +-
 include/uapi/linux/openvswitch.h   |  47 ++
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c   |  30 +-
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c   |  30 +-
 net/netfilter/nf_conntrack_core.c  |  28 +-
 net/openvswitch/conntrack.c| 630 +++--
 net/openvswitch/conntrack.h|   3 +-
 8 files changed, 690 insertions(+), 105 deletions(-)

-- 
2.1.4


[PATCH net-next v5 6/8] openvswitch: Handle NF_REPEAT in conntrack action.

2015-12-16 Thread Jarno Rajahalme

Repeat the nf_conntrack_in() call when it returns NF_REPEAT.  This
avoids dropping a SYN packet re-opening an existing TCP connection.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 0c371d0..7aa38fa 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -470,6 +470,7 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
 */
if (!skb_nfct_cached(net, key, info, skb)) {
struct nf_conn *tmpl = info->ct;
+   int err;
 
/* Associate skb with specified zone. */
if (tmpl) {
@@ -480,8 +481,13 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
skb->nfctinfo = IP_CT_NEW;
}
 
-   if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING,
-   skb) != NF_ACCEPT)
+   /* Repeat if requested, see nf_iterate(). */
+   do {
+   err = nf_conntrack_in(net, info->family,
+ NF_INET_PRE_ROUTING, skb);
+   } while (err == NF_REPEAT);
+
+   if (err != NF_ACCEPT)
return -ENOENT;
 
ovs_ct_update_key(skb, key, true);
-- 
2.1.4


[PATCH net-next v5 7/8] openvswitch: Delay conntrack helper call for new connections.

2015-12-16 Thread Jarno Rajahalme

There is no need to help connections that are not confirmed, so we can
delay helping new connections to the time when they are confirmed.
This change is needed for NAT support, and having this as a separate
patch will make the following NAT patch a bit easier to review.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7aa38fa..ba44287 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -458,6 +458,7 @@ static bool skb_nfct_cached(struct net *net,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
+ * Note that invalid packets are accepted while the skb->nfct remains unset!
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
   const struct ovs_conntrack_info *info,
@@ -468,7 +469,11 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   if (!skb_nfct_cached(net, key, info, skb)) {
+   bool cached = skb_nfct_cached(net, key, info, skb);
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+
+   if (!cached) {
struct nf_conn *tmpl = info->ct;
int err;
 
@@ -491,11 +496,16 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
return -ENOENT;
 
ovs_ct_update_key(skb, key, true);
+   }
 
-   if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
-   WARN_ONCE(1, "helper rejected packet");
-   return -EINVAL;
-   }
+   /* Call the helper right after nf_conntrack_in() for confirmed
+* connections, but only when committing for unconfirmed connections.
+*/
+   ct = nf_ct_get(skb, &ctinfo);
+   if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit)
+   && ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
+   WARN_ONCE(1, "helper rejected packet");
+   return -EINVAL;
}
 
return 0;
-- 
2.1.4
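The helper condition added in this patch reduces to a small truth table. A sketch of it as a standalone predicate, with hypothetical boolean inputs standing in for `ct != NULL`, `nf_ct_is_confirmed(ct)`, `cached` and `info->commit`; this is a reading aid, not code from the patch.

```c
#include <stdbool.h>

/*
 * have_ct   - nf_ct_get() returned a connection
 * confirmed - nf_ct_is_confirmed(ct)
 * cached    - the skb already carried conntrack state for this zone
 * commit    - the CT action has the commit flag
 */
static bool should_call_helper(bool have_ct, bool confirmed, bool cached,
			       bool commit)
{
	if (!have_ct)
		return false;
	/* Confirmed: help only right after a fresh nf_conntrack_in() pass.
	 * Unconfirmed: help only when this action commits the connection. */
	return confirmed ? !cached : commit;
}
```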


[PATCH net-next v5 4/8] openvswitch: Update the CT state key only after nf_conntrack_in().

2015-12-16 Thread Jarno Rajahalme

Only a successful nf_conntrack_in() call can effect a connection state
change, so it suffices to update the key only after
nf_conntrack_in() returns.

This change is needed for the later NAT patches.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a28a819..10f4a6e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -194,7 +194,6 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
struct nf_conn *ct;
u32 new_mark;
 
-
/* The connection could be invalid, in which case set_mark is no-op. */
ct = nf_ct_get(skb, &ctinfo);
if (!ct)
@@ -385,6 +384,10 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
return true;
 }
 
+/* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
+ * not done already.  Update key with new CT state after passing the packet
+ * through conntrack.
+ */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
   const struct ovs_conntrack_info *info,
   struct sk_buff *skb)
@@ -410,14 +413,14 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
skb) != NF_ACCEPT)
return -ENOENT;
 
+   ovs_ct_update_key(skb, key, true);
+
if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
WARN_ONCE(1, "helper rejected packet");
return -EINVAL;
}
}
 
-   ovs_ct_update_key(skb, key, true);
-
return 0;
 }
 
-- 
2.1.4


[PATCH net-next v5 8/8] openvswitch: Interface with NAT.

2015-12-16 Thread Jarno Rajahalme

Extend the OVS conntrack interface to cover NAT.  A new nested
OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
attributes, new (non-committed/non-confirmed) connections are mangled
according to the rest of the nested attributes.
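The kernel-doc for `enum ovs_nat_attr` in this patch spells out which attribute combinations are allowed: at most one of SRC and DST, and the range and mapping attributes only together with COMMIT and one of SRC or DST. As a reading aid, a minimal sketch of those rules as a predicate; the struct and function names are hypothetical, and this is not the validation code in net/openvswitch/conntrack.c.

```c
#include <stdbool.h>

/* Hypothetical summary of which attributes a CT action carries. */
struct ct_nat_request {
	bool commit;     /* OVS_CT_ATTR_COMMIT present */
	bool nat_src;    /* OVS_NAT_ATTR_SRC present */
	bool nat_dst;    /* OVS_NAT_ATTR_DST present */
	bool has_params; /* any IP_MIN/IP_MAX/PROTO_MIN/PROTO_MAX/PERSISTENT/
			  * PROTO_HASH/PROTO_RANDOM attribute present */
};

static bool ct_nat_request_valid(const struct ct_nat_request *r)
{
	/* Only one of SRC and DST may be specified. */
	if (r->nat_src && r->nat_dst)
		return false;
	/* Range/mapping attributes need COMMIT plus SRC or DST. */
	if (r->has_params && !(r->commit && (r->nat_src || r->nat_dst)))
		return false;
	/* A bare NAT attribute is valid: it only mangles existing connections. */
	return true;
}
```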

The corresponding OVS userspace patch series includes test cases (in
tests/system-traffic.at) that also serve as example uses.

This work builds on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.

Signed-off-by: Jarno Rajahalme 
---
 include/uapi/linux/openvswitch.h |  47 
 net/openvswitch/conntrack.c  | 516 +--
 net/openvswitch/conntrack.h  |   3 +-
 3 files changed, 541 insertions(+), 25 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 28ccedd..1a4bbdc 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -454,6 +454,12 @@ struct ovs_key_ct_labels {
 #define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */
 #define OVS_CS_F_INVALID   0x10 /* Could not track connection. */
 #define OVS_CS_F_TRACKED   0x20 /* Conntrack has occurred. */
+#define OVS_CS_F_SRC_NAT   0x40 /* Packet's source address/port was
+  mangled by NAT. */
+#define OVS_CS_F_DST_NAT   0x80 /* Packet's destination address/port
+  was mangled by NAT. */
+
+#define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
@@ -632,6 +638,8 @@ struct ovs_action_hash {
  * mask. For each bit set in the mask, the corresponding bit in the value is
  * copied to the connection tracking label field in the connection.
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
+ * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
+ * translation (NAT) on the packet.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -641,12 +649,51 @@ enum ovs_ct_attr {
OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
+   OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
__OVS_CT_ATTR_MAX
 };
 
 #define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1)
 
 /**
+ * enum ovs_nat_attr - Attributes for %OVS_CT_ATTR_NAT.
+ *
+ * @OVS_NAT_ATTR_SRC: Flag for Source NAT (mangle source address/port).
+ * @OVS_NAT_ATTR_DST: Flag for Destination NAT (mangle destination
+ * address/port).  Only one of (@OVS_NAT_ATTR_SRC, @OVS_NAT_ATTR_DST) may be
+ * specified.  Effective only for packets for ct_state NEW connections.
+ * Packets of committed connections are mangled by the NAT action according to
+ * the committed NAT type regardless of the flags specified.  As a corollary, a
+ * NAT action without a NAT type flag will only mangle packets of committed
+ * connections.  The following NAT attributes only apply for NEW
+ * (non-committed) connections, and they may be included only when the CT
+ * action has the @OVS_CT_ATTR_COMMIT flag and either @OVS_NAT_ATTR_SRC or
+ * @OVS_NAT_ATTR_DST is also included.
+ * @OVS_NAT_ATTR_IP_MIN: struct in_addr or struct in6_addr
+ * @OVS_NAT_ATTR_IP_MAX: struct in_addr or struct in6_addr
+ * @OVS_NAT_ATTR_PROTO_MIN: u16 L4 protocol specific lower boundary (port)
+ * @OVS_NAT_ATTR_PROTO_MAX: u16 L4 protocol specific upper boundary (port)
+ * @OVS_NAT_ATTR_PERSISTENT: Flag for persistent IP mapping across reboots
+ * @OVS_NAT_ATTR_PROTO_HASH: Flag for pseudo random L4 port mapping (MD5)
+ * @OVS_NAT_ATTR_PROTO_RANDOM: Flag for fully randomized L4 port mapping
+ */
+enum ovs_nat_attr {
+   OVS_NAT_ATTR_UNSPEC,
+   OVS_NAT_ATTR_SRC,
+   OVS_NAT_ATTR_DST,
+   OVS_NAT_ATTR_IP_MIN,
+   OVS_NAT_ATTR_IP_MAX,
+   OVS_NAT_ATTR_PROTO_MIN,
+   OVS_NAT_ATTR_PROTO_MAX,
+   OVS_NAT_ATTR_PERSISTENT,
+   OVS_NAT_ATTR_PROTO_HASH,
+   OVS_NAT_ATTR_PROTO_RANDOM,
+   __OVS_NAT_ATTR_MAX,
+};
+
+#define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
+
+/**
  * enum ovs_action_attr - Action types.
  *
  * @OVS_ACTION_ATTR_OUTPUT: Output packet to port.
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index ba44287..096f5da 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -20,14 +20,24 @@
 #include 
 #include 
 
+#ifdef CONFIG_NF_NAT_NEEDED
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
+
 #include "datapath.h"
 #include "conntrack.h"
 #include "flow.h"
 #include "flow_netlink.h"
 
 struct ovs_ct_len_tbl {
-   size_t maxlen;
-   size_t minlen;
+   int maxlen;
+   int minlen;
 };

[PATCH net-next v5 3/8] netfilter: Allow calling into nat helper without skb_dst.

2015-12-16 Thread Jarno Rajahalme

NAT checksum recalculation code assumes the existence of skb_dst, which
becomes a problem for a later patch in the series ("openvswitch:
Interface with NAT.").  Simplify this by removing the check on
skb_dst, as the checksum will be dealt with later in the stack.
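With CHECKSUM_PARTIAL, the checksum field is seeded with the folded, uncomplemented one's-complement sum of the pseudo-header (that is what `~csum_tcpudp_magic(saddr, daddr, len, proto, 0)` stores), and whoever finishes the checksum later, offloading hardware or the software fallback, adds the payload from `csum_start` onward and complements the result. A minimal sketch of the seed computation for IPv4; the helper names are hypothetical, and values are kept in host byte order purely for readability, whereas the kernel helpers operate on network-order fields.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Folded (not complemented) sum of the IPv4 TCP/UDP pseudo-header:
 * source address, destination address, protocol and L4 length.
 */
static uint16_t pseudo_hdr_seed(uint32_t saddr, uint32_t daddr,
				uint16_t l4_len, uint8_t proto)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += proto;
	sum += l4_len;
	return csum_fold32(sum);
}
```

For example, 10.1.1.1 -> 10.1.1.2, TCP (6), 20-byte segment gives a seed of 0x161f.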

Suggested-by: Pravin Shelar 
Signed-off-by: Jarno Rajahalme 
---
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 30 --
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 30 --
 2 files changed, 16 insertions(+), 44 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index 61c7cc2..f8aad03 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -127,29 +127,15 @@ static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb,
u8 proto, void *data, __sum16 *check,
int datalen, int oldlen)
 {
-   const struct iphdr *iph = ip_hdr(skb);
-   struct rtable *rt = skb_rtable(skb);
-
if (skb->ip_summed != CHECKSUM_PARTIAL) {
-   if (!(rt->rt_flags & RTCF_LOCAL) &&
-   (!skb->dev || skb->dev->features &
-(NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
-   skb->ip_summed = CHECKSUM_PARTIAL;
-   skb->csum_start = skb_headroom(skb) +
- skb_network_offset(skb) +
- ip_hdrlen(skb);
-   skb->csum_offset = (void *)check - data;
-   *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-   datalen, proto, 0);
-   } else {
-   *check = 0;
-   *check = csum_tcpudp_magic(iph->saddr, iph->daddr,
-  datalen, proto,
-  csum_partial(data, datalen,
-   0));
-   if (proto == IPPROTO_UDP && !*check)
-   *check = CSUM_MANGLED_0;
-   }
+   const struct iphdr *iph = ip_hdr(skb);
+
+   skb->ip_summed = CHECKSUM_PARTIAL;
+   skb->csum_start = skb_headroom(skb) + skb_network_offset(skb) +
+   ip_hdrlen(skb);
+   skb->csum_offset = (void *)check - data;
+   *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen,
+   proto, 0);
} else
inet_proto_csum_replace2(check, skb,
 htons(oldlen), htons(datalen), true);
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index 6ce3099..e0be97e 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -131,29 +131,15 @@ static void nf_nat_ipv6_csum_recalc(struct sk_buff *skb,
u8 proto, void *data, __sum16 *check,
int datalen, int oldlen)
 {
-   const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
-   struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
-
if (skb->ip_summed != CHECKSUM_PARTIAL) {
-   if (!(rt->rt6i_flags & RTF_LOCAL) &&
-   (!skb->dev || skb->dev->features &
-(NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))) {
-   skb->ip_summed = CHECKSUM_PARTIAL;
-   skb->csum_start = skb_headroom(skb) +
- skb_network_offset(skb) +
- (data - (void *)skb->data);
-   skb->csum_offset = (void *)check - data;
-   *check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
- datalen, proto, 0);
-   } else {
-   *check = 0;
-   *check = csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
-datalen, proto,
-csum_partial(data, datalen,
- 0));
-   if (proto == IPPROTO_UDP && !*check)
-   *check = CSUM_MANGLED_0;
-   }
+   const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+
+   skb->ip_summed = CHECKSUM_PARTIAL;
+   skb->csum_start = skb_headroom(skb) + skb_network_offset(skb) +
+   (data - (void *)skb->data);
+   skb->csum_offset = (void *)check - data;
+   *check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+ datalen, proto, 0);
} else

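Both hunks above drop the software fallback and always set CHECKSUM_PARTIAL: the L4 checksum field is seeded with the folded pseudo-header sum (the complement of csum_tcpudp_magic()/csum_ipv6_magic() called with a zero sum), and csum_start/csum_offset tell the device, or the stack's software fallback, where to finish the job. The user-space sketch below checks, for IPv4 only, that seeding and then completing yields the same value as the removed one-shot software path. All helper names are hypothetical and addresses are used in host byte order for simplicity; this is not the kernel's checksum code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a 32-bit accumulator as big-endian 16-bit words; an odd
 * trailing byte is padded with zero.  Fine for short buffers (no overflow). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* Fold carries back into the low 16 bits (one's-complement addition). */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* IPv4 pseudo-header sum; addresses are taken in host byte order here. */
static uint32_t pseudo_hdr_sum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                               uint16_t len)
{
    return (saddr >> 16) + (saddr & 0xffff) +
           (daddr >> 16) + (daddr & 0xffff) + proto + len;
}

/* Store a 16-bit value big-endian. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Removed "else" branch: full checksum in software.  UDP's 0 -> 0xffff
 * (CSUM_MANGLED_0) substitution is left out of this sketch. */
static uint16_t l4_csum_full(uint32_t saddr, uint32_t daddr, uint8_t proto,
                             uint8_t *seg, uint16_t len, size_t check_off)
{
    put16(seg + check_off, 0);
    return (uint16_t)~csum_fold16(pseudo_hdr_sum(saddr, daddr, proto, len) +
                                  csum_add_bytes(0, seg, len));
}

/* CHECKSUM_PARTIAL step 1 (what the patched helpers do): seed the field
 * with the folded, uncomplemented pseudo-header sum. */
static void l4_csum_seed(uint32_t saddr, uint32_t daddr, uint8_t proto,
                         uint8_t *seg, uint16_t len, size_t check_off)
{
    put16(seg + check_off,
          csum_fold16(pseudo_hdr_sum(saddr, daddr, proto, len)));
}

/* Step 2, done later by the NIC or a software fallback: sum the segment
 * including the seeded field and store the complement. */
static void l4_csum_complete(uint8_t *seg, uint16_t len, size_t check_off)
{
    put16(seg + check_off,
          (uint16_t)~csum_fold16(csum_add_bytes(0, seg, len)));
}
```

The two results agree because both sums are congruent modulo 0xffff: seeding only moves the pseudo-header term into the data that step 2 sums.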
Re: [PATCH] wireless: change cfg80211 regulatory domain info as debug messages

2015-12-16 Thread Dave Young

Hi, Johannes

Sorry for the late feedback; I was busy with other things.

On 12/11/15 at 03:38pm, Johannes Berg wrote:
> On Sun, 2015-11-15 at 15:31 +0800, Dave Young wrote:
> > cfg80211 module prints a lot of messages like below. Actually
> > printing once is acceptable but sometimes it will print again and
> > again, it looks very annoying. It is better to change these detail
> > messages to debugging only.
> > 
> 
> Despite the objections, I've applied this patch now.
> 

Thanks a lot.

> I've made one change: keeping the alpha2 (e.g. "US") printed in some of
> the pr_err() cases in this file.
> I also got rid of CONFIG_CFG80211_REG_DEBUG in a separate patch.
> 
> I somewhat agree with the objections, but if the kernel is with
> CONFIG_DYNAMIC_DEBUG then it's really simple to get the messages back
> by enabling them for this file.
> 
> Where the messages were used as an indication of something having gone
> awry at a different level (e.g. mac80211 disconnect) I don't really
> quite agree - that then perhaps should have a more explicit (and less
> noisy) message.
> 
> I also agree that the regulatory code is quite opaque, and the way it
> arrives at certain conclusions is not always obvious. These messages
> don't help all that much though since they don't contain the actual
> input to the decisions. I think for that, we'd be much better served
> with some kind of tracepoint or so that records all the information.

I think you guys are the experts in this area, and I agree with all of the
above. But I hope we can at least have some rate-limited messages,
especially for things that repeat endlessly.

Thanks
Dave

Re: [PATCH net-next v5 05/19] net: ethtool: add new ETHTOOL_GSETTINGS/SSETTINGS API

2015-12-16 Thread David Decotigny

Thanks David: you are right, we should copy back sizeof(struct
ethtool_settings) in that case and not sizeof(usettings). Sorry about
that, will fix for v6.

A few questions before sending the update:
 - is this handshake reasonable? or should we have an ethtool cmd
dedicated to this kind of handshake, or should we hardcode bitmap
length once and for all (i.e. 128 instead of 32 today)?
 - is it ok to have the u32[] api part of bitmap.h? I was assuming it
could be used for other ioctl/syscalls outside ethtool... but maybe
this is being too pretentious and we could keep this internal to
net/core?
 - struct inheritance is used to have the link mode bitmaps
piggybacked at the end of the public struct ethtool_settings, hence this
"parent" field. I'm not super proud of this, but I find relying on the
compiler more comfortable. I could revise my position and use
constructor+accessor macros if you think that's preferred.


On Wed, Dec 16, 2015 at 8:38 AM, David Miller  wrote:
> From: David Decotigny 
> Date: Mon, 14 Dec 2015 13:03:52 -0800
>
>> +static int ethtool_get_ksettings(struct net_device *dev, void __user *useraddr)
>> +{
>  ...
>> + if (__ETHTOOL_LINK_MODE_MASK_NU32
>> + != ksettings.parent.link_mode_masks_nwords) {
>> + /* wrong link mode nbits requested */
>> + memset(&ksettings, 0, sizeof(ksettings));
>> + /* keep cmd field reset to 0 */
>> + /* send back number of words required as negative val */
>> + compiletime_assert(__ETHTOOL_LINK_MODE_MASK_NU32 <= S8_MAX,
>> +"need too many bits for link modes!");
>> + ksettings.parent.link_mode_masks_nwords
>> + = -((s8)__ETHTOOL_LINK_MODE_MASK_NU32);
>
> I'm trying to understand how this can work.
>
> Supposedly, the link_mode_masks_nwords field is there so that we can
> add new link modes yet still work with tools built against any
> particular link mode list in the UAPI header files.
>
> But here you're forcing the value of link_mode_masks_nwords and then
> copying that amount back to userspace.  If the user allocated less
> space than the link mode list in the kernel supports, we will
> overwrite past the end of the user's usettings object.
>
> You cannot unconditionally copy sizeof(usettings) back to the user,
> as store_ksettings_for_user() will do.
>
> I think you have to truncate here, copying only the array elements the
> user's structure actually has space for.  That's the only way this can
> work.
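David's point can be modelled in a few lines. Below is a hypothetical sketch (copy_link_modes() and KERNEL_NWORDS are invented for illustration and are not part of the ethtool API) of the behaviour Decotigny proposes for v6: when the caller's link_mode_masks_nwords disagrees with the kernel's, no bitmap words are written and the required size comes back as a negative number, so nothing past the caller's allocation is ever touched.

```c
#include <stdint.h>
#include <string.h>

#define KERNEL_NWORDS 3 /* hypothetical: u32 words the kernel's bitmap needs */

/* Hypothetical model of the link-mode handshake.  user_nwords is what the
 * caller put in link_mode_masks_nwords; user_buf holds exactly that many
 * words.  On a mismatch no bitmap words are written (only the fixed-size
 * header would go back) and the required size is reported as a negative
 * value.  On a match the bitmap is copied, never more than user_nwords. */
static int8_t copy_link_modes(uint32_t *user_buf, int8_t user_nwords,
                              const uint32_t kernel_mask[KERNEL_NWORDS])
{
    if (user_nwords != KERNEL_NWORDS)
        return (int8_t)-KERNEL_NWORDS;
    memcpy(user_buf, kernel_mask, sizeof(uint32_t) * KERNEL_NWORDS);
    return user_nwords;
}
```

A caller that gets a negative value back can allocate the reported number of words and retry.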

[PATCH net-next v5 5/8] openvswitch: Find existing conntrack entry after upcall.

2015-12-16 Thread Jarno Rajahalme

Add a new function ovs_ct_find_existing() to find an existing
conntrack entry to which this packet was already applied.  This is
only to be called when there is evidence that the packet was already
tracked and committed, but we lost the ct reference due to a
userspace upcall.

ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
hide the fact that the ct reference may have been lost due to an
upcall.  This allows ovs_ct_commit() to be simplified.

This patch is needed by later "openvswitch: Interface with NAT" patch,
as we need to be able to pass the packet through NAT using the
original ct reference also after the reference is lost after an
upcall.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 95 ++---
 1 file changed, 82 insertions(+), 13 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 10f4a6e..0c371d0 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -359,16 +359,87 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
return __nf_ct_expect_find(net, zone, &tuple);
 }
 
+/* Find an existing conntrack entry to which this packet was already applied.
+ * This is only called when there is evidence that the packet was already
+ * tracked and committed, but we lost the ct reference due to a userspace
+ * upcall. This means that on entry skb->nfct is NULL.
+ * On success, returns conntrack ptr, sets skb->nfct and ctinfo.
+ * Must be called rcu_read_lock()ed. */
+static struct nf_conn *
+ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
+u_int8_t l3num, struct sk_buff *skb,
+enum ip_conntrack_info *ctinfo)
+{
+   struct nf_conntrack_l3proto *l3proto;
+   struct nf_conntrack_l4proto *l4proto;
+   struct nf_conntrack_tuple tuple;
+   struct nf_conntrack_tuple_hash *h;
+   struct nf_conn *ct;
+   unsigned int dataoff;
+   u_int8_t protonum;
+
+   BUG_ON(skb->nfct != NULL);
+
+   l3proto = __nf_ct_l3proto_find(l3num);
+   if (!l3proto) {
+   pr_debug("ovs_ct_find_existing: Can't get l3proto\n");
+   return NULL;
+   }
+   if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
+&protonum) <= 0) {
+   pr_debug("ovs_ct_find_existing: Can't get protonum\n");
+   return NULL;
+   }
+   l4proto = __nf_ct_l4proto_find(l3num, protonum);
+   if (!l4proto) {
+   pr_debug("ovs_ct_find_existing: Can't get l4proto\n");
+   return NULL;
+   }
+   if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
+protonum, net, &tuple, l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Can't get tuple\n");
+   return NULL;
+   }
+
+   /* look for tuple match */
+   h = nf_conntrack_find_get(net, zone, &tuple);
+   if (!h)
+   return NULL;   /* Not found. */
+
+   ct = nf_ct_tuplehash_to_ctrack(h);
+
+   *ctinfo = nf_ct_get_info(h);
+   if (*ctinfo == IP_CT_NEW) {
+   /* This should not happen. */
+   WARN_ONCE(1, "ovs_ct_find_existing: new packet for %p\n", ct);
+   }
+   skb->nfct = &ct->ct_general;
+   skb->nfctinfo = *ctinfo;
+   return ct;
+}
+
 /* Determine whether skb->nfct is equal to the result of conntrack lookup. */
-static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
-   const struct ovs_conntrack_info *info)
+static bool skb_nfct_cached(struct net *net,
+   const struct sw_flow_key *key,
+   const struct ovs_conntrack_info *info,
+   struct sk_buff *skb)
 {
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
 
ct = nf_ct_get(skb, &ctinfo);
+   /* If no ct, check if we have evidence that an existing conntrack entry
+* might be found for this skb.  This happens when we lose a skb->nfct
+* due to an upcall.  If the connection was not confirmed, it is not
+* cached and needs to be run through conntrack again. */
+   if (!ct && key->ct.state & OVS_CS_F_TRACKED
+   && !(key->ct.state & OVS_CS_F_INVALID)
+   && key->ct.zone == info->zone.id)
+   ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
+ &ctinfo);
if (!ct)
return false;
+
if (!net_eq(net, read_pnet(&ct->ct_net)))
return false;
if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct)))
@@ -397,7 +468,7 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   if (!skb_nfct_cached(net, skb, info)) {
+
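The new evidence check in skb_nfct_cached() can be read as a plain predicate. A minimal sketch, assuming the OVS_CS_F_* values below mirror include/uapi/linux/openvswitch.h; struct ct_key and worth_lookup_existing() are hypothetical reductions of sw_flow_key and the inline condition in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the OVS_CS_F_* bits in include/uapi/linux/openvswitch.h. */
#define OVS_CS_F_INVALID 0x10
#define OVS_CS_F_TRACKED 0x20

/* Hypothetical reduced flow key: only the fields the check reads. */
struct ct_key {
    uint8_t state;  /* key->ct.state */
    uint16_t zone;  /* key->ct.zone */
};

/* True when skb->nfct is missing but the flow key still records that the
 * packet was tracked (and not invalid) in the zone this ct action uses,
 * i.e. when looking up the existing conntrack entry is worthwhile. */
static bool worth_lookup_existing(bool have_nfct, const struct ct_key *key,
                                  uint16_t action_zone)
{
    return !have_nfct &&
           (key->state & OVS_CS_F_TRACKED) &&
           !(key->state & OVS_CS_F_INVALID) &&
           key->zone == action_zone;
}
```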

[PATCH net-next v5 1/8] netfilter: Remove IP_CT_NEW_REPLY definition.

2015-12-16 Thread Jarno Rajahalme

Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
not make sense.  This allows the definition of IP_CT_NUMBER to be
simplified as well.

Signed-off-by: Jarno Rajahalme 
---
 include/uapi/linux/netfilter/nf_conntrack_common.h | 12 +---
 net/openvswitch/conntrack.c|  2 --
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 319f471..6d074d1 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -20,9 +20,15 @@ enum ip_conntrack_info {
 
IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY,
-   IP_CT_NEW_REPLY = IP_CT_NEW + IP_CT_IS_REPLY,   
-   /* Number of distinct IP_CT types (no NEW in reply dirn). */
-   IP_CT_NUMBER = IP_CT_IS_REPLY * 2 - 1
+   /* No NEW in reply direction. */
+
+   /* Number of distinct IP_CT types. */
+   IP_CT_NUMBER,
+
+   /* only for userspace compatibility */
+#ifndef __KERNEL__
+   IP_CT_NEW_REPLY = IP_CT_NUMBER,
+#endif
 };
 
 #define NF_CT_STATE_INVALID_BIT(1 << 0)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c2cc111..a28a819 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -73,7 +73,6 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
switch (ctinfo) {
case IP_CT_ESTABLISHED_REPLY:
case IP_CT_RELATED_REPLY:
-   case IP_CT_NEW_REPLY:
ct_state |= OVS_CS_F_REPLY_DIR;
break;
default:
@@ -90,7 +89,6 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
ct_state |= OVS_CS_F_RELATED;
break;
case IP_CT_NEW:
-   case IP_CT_NEW_REPLY:
ct_state |= OVS_CS_F_NEW;
break;
default:
-- 
2.1.4
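This patch should leave every numeric value that userspace can see unchanged. The compile-time sketch below checks that; the OLD_/NEW_ prefixes are added so both layouts fit in one file, and the first four members (not shown in the hunk) are assumed to be IP_CT_ESTABLISHED, IP_CT_RELATED, IP_CT_NEW and IP_CT_IS_REPLY, in that order.

```c
/* Enum before the patch (leading members assumed from the unchanged part
 * of nf_conntrack_common.h; OLD_ prefix added for this sketch). */
enum old_ct_info {
    OLD_IP_CT_ESTABLISHED,
    OLD_IP_CT_RELATED,
    OLD_IP_CT_NEW,
    OLD_IP_CT_IS_REPLY,
    OLD_IP_CT_ESTABLISHED_REPLY = OLD_IP_CT_ESTABLISHED + OLD_IP_CT_IS_REPLY,
    OLD_IP_CT_RELATED_REPLY = OLD_IP_CT_RELATED + OLD_IP_CT_IS_REPLY,
    OLD_IP_CT_NEW_REPLY = OLD_IP_CT_NEW + OLD_IP_CT_IS_REPLY,
    OLD_IP_CT_NUMBER = OLD_IP_CT_IS_REPLY * 2 - 1
};

/* Enum after the patch, as userspace (!__KERNEL__) sees it. */
enum new_ct_info {
    NEW_IP_CT_ESTABLISHED,
    NEW_IP_CT_RELATED,
    NEW_IP_CT_NEW,
    NEW_IP_CT_IS_REPLY,
    NEW_IP_CT_ESTABLISHED_REPLY = NEW_IP_CT_ESTABLISHED + NEW_IP_CT_IS_REPLY,
    NEW_IP_CT_RELATED_REPLY = NEW_IP_CT_RELATED + NEW_IP_CT_IS_REPLY,
    NEW_IP_CT_NUMBER,                      /* implicitly RELATED_REPLY + 1 */
    NEW_IP_CT_NEW_REPLY = NEW_IP_CT_NUMBER /* userspace-only alias */
};

/* Both values survive the rewrite, so existing binaries see no change. */
_Static_assert(OLD_IP_CT_NUMBER == NEW_IP_CT_NUMBER, "IP_CT_NUMBER changed");
_Static_assert(OLD_IP_CT_NEW_REPLY == NEW_IP_CT_NEW_REPLY,
               "IP_CT_NEW_REPLY changed");
```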


[PATCH v2 1/3] ethernet:ti:cpsw: fix phy identification with multiple slaves on fixed-phy

2015-12-16 Thread David Rivshin (Allworx)

From: Pascal Speck (Iktek) 
Date: Fri, 04 Dec 2015 16:55:17 +0100

When using more than one slave with ti cpsw and fixed phy the pd->phy_id
will be always zero, but slave_data->phy_id must be unique. pd->phy_id
means a "phy hardware id" whereas slave_data->phy_id means a "unique id",
so we should use pd->addr which has the same unique meaning.

Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link PHY")
Signed-off-by: Pascal Speck 
---
This was originally submitted by Pascal Speck on December 4, but was not
picked up by patchwork. I suspect that is because the patch was mangled by
the mailer. The only changes I made were to manually fix the patch whitespace
and wrapping, and add the Fixes: tag.

 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 48b92c9..e3b220d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2047,7 +2047,7 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
if (!pd)
return -ENODEV;
snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
-PHY_ID_FMT, pd->bus->id, pd->phy_id);
+PHY_ID_FMT, pd->bus->id, pd->addr);
goto no_phy_slave;
}
parp = of_get_property(slave_node, "phy_id", &lenp);
--
2.5.0
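Why pd->addr fixes the collision can be shown with a short user-space sketch. It assumes PHY_ID_FMT is "%s:%02x" and that fixed PHYs sit on an MDIO bus named "fixed-0"; struct fake_phy, make_bus_id(), bus_ids_collide() and PHY_ID_LEN are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define PHY_ID_FMT "%s:%02x" /* assumed to match include/linux/phy.h */
#define PHY_ID_LEN 20        /* hypothetical buffer size */

/* Hypothetical reduced view of the two fields the patch chooses between. */
struct fake_phy {
    unsigned int phy_id; /* PHY hardware ID: always 0 for these fixed PHYs */
    unsigned int addr;   /* address on the MDIO bus: unique per bus */
};

/* Format the string that ends up in slave_data->phy_id. */
static void make_bus_id(char buf[PHY_ID_LEN], const char *bus_id,
                        unsigned int n)
{
    snprintf(buf, PHY_ID_LEN, PHY_ID_FMT, bus_id, n);
}

/* Two slaves collide if their identifiers format to the same string. */
static int bus_ids_collide(const char *bus_id, unsigned int a, unsigned int b)
{
    char x[PHY_ID_LEN], y[PHY_ID_LEN];

    make_bus_id(x, bus_id, a);
    make_bus_id(y, bus_id, b);
    return strcmp(x, y) == 0;
}
```

With phy_id both slaves format to "fixed-0:00"; with addr they become "fixed-0:00" and "fixed-0:01".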


[PATCH v2 0/3] drivers: net: cpsw: Fix bugs in fixed-link PHY DT parsing

2015-12-16 Thread David Rivshin (Allworx)

Commit 1f71e8c96fc654724723ce987e0a8b2aeb81746d ("drivers: net: cpsw:
Add support for fixed-link PHY") added initial fixed-link PHY support
for CPSW, but missed a few considerations.

This series is based on the tip of the net tree. The first two patches
fix user-visible errors in different hardware configurations. The third
patch is for an internal reference counting issue. They are logically
independent changes, but in the same function, so must be applied in
order to apply cleanly.

The first patch was originally submitted by Pascal Speck on December 4,
but was not picked up by patchwork. I suspect that is because the patch
was mangled by the mailer. I fixed the mangling and am including it in
this series, as I believe it is the correct change.

I have tested on the following hardware configurations:
 - (EVMSK) dual emac with two real MDIO-connected phys using RGMII-TXID
 - single emac with fixed-link using RGMII
Testing of other CPSW emac configurations that folks may have would
be appreciated.


Changes from v1 [1]:
 - Split into 3 smaller patches.
 - Maintain 1f71e8c96fc6's preference for fixed-link over phy_id if
   they are both (incorrectly) specified in the slave node.
 - Update binding documentation to no longer say that phy_mode is also
   mutually exclusive with fixed-link.
 - Dropped unnecessary include of phy_fixed.h.

[1] https://patchwork.ozlabs.org/patch/554989/

David Rivshin (2):
  drivers: net: cpsw: fix RMII/RGMII mode when used with fixed-link PHY
  drivers: net: cpsw: increment reference count on fixed-link PHY node

Pascal Speck (Iktek) (1):
  ethernet:ti:cpsw: fix phy identification with multiple slaves on
fixed-phy

 Documentation/devicetree/bindings/net/cpsw.txt |  6 +--
 drivers/net/ethernet/ti/cpsw.c | 53 +++---
 2 files changed, 34 insertions(+), 25 deletions(-)

--
2.5.0


[PATCH v2 2/3] drivers: net: cpsw: fix RMII/RGMII mode when used with fixed-link PHY

2015-12-16 Thread David Rivshin (Allworx)

From: David Rivshin 

Commit 1f71e8c96fc654724723ce987e0a8b2aeb81746d ("drivers: net: cpsw: Add
support for fixed-link PHY") did not parse the "phy-mode" property in
the case of a fixed-link PHY, leaving slave_data->phy_if with its default
of PHY_INTERFACE_MODE_NA(0). This later gets passed to phy_connect() in
cpsw_slave_open(), and eventually to cpsw_phy_sel() where it hits a default
case that configures the MAC for MII mode.

The user visible symptom is that while kernel log messages seem to indicate
that the interface is set up, there is no network communication. Eventually
a watchdog error occurs:
NETDEV WATCHDOG: eth0 (cpsw): transmit queue 0 timed out

Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link PHY")
Signed-off-by: David Rivshin 
---
Changes from v1 (https://patchwork.ozlabs.org/patch/554989/):
 - Maintain 1f71e8c96fc6's preference for fixed-link over phy_id if they
   are both (incorrectly) specified in the slave node.
 - Update binding documentation to no longer say that phy_mode is also
   mutually exclusive with fixed-link.
 - Dropped unnecessary include of phy_fixed.h.
 - Commit message tweaked.

 Documentation/devicetree/bindings/net/cpsw.txt |  6 ++--
 drivers/net/ethernet/ti/cpsw.c | 40 ++
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 9853f8e..28a4781 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -40,18 +40,18 @@ Optional properties:

 Slave Properties:
 Required properties:
-- phy_id   : Specifies slave phy id
 - phy-mode : See ethernet.txt file in the same directory

 Optional properties:
 - dual_emac_res_vlan   : Specifies VID to be used to segregate the ports
 - mac-address  : See ethernet.txt file in the same directory
+- phy_id   : Specifies slave phy id
 - phy-handle   : See ethernet.txt file in the same directory

 Slave sub-nodes:
 - fixed-link   : See fixed-link.txt file in the same directory
- Either the properties phy_id and phy-mode,
- or the sub-node fixed-link can be specified
+ Either the property phy_id, or the sub-node
+ fixed-link can be specified

 Note: "ti,hwmods" field is used to fetch the base address and irq
 resources from TI, omap hwmod data base during device registration.
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e3b220d..bc6d20d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2026,17 +2026,15 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
for_each_child_of_node(node, slave_node) {
struct cpsw_slave_data *slave_data = data->slave_data + i;
const void *mac_addr = NULL;
-   u32 phyid;
int lenp;
const __be32 *parp;
-   struct device_node *mdio_node;
-   struct platform_device *mdio;

/* This is no slave child node, continue */
if (strcmp(slave_node->name, "slave"))
continue;

priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+   parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
struct phy_device *pd;

@@ -2048,23 +2046,29 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
return -ENODEV;
snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
 PHY_ID_FMT, pd->bus->id, pd->addr);
+   } else if (parp) {
+   u32 phyid;
+   struct device_node *mdio_node;
+   struct platform_device *mdio;
+
+   if (lenp != (sizeof(__be32) * 2)) {
+   dev_err(&pdev->dev, "Invalid slave[%d] phy_id property\n", i);
+   goto no_phy_slave;
+   }
+   mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
+   phyid = be32_to_cpup(parp+1);
+   mdio = of_find_device_by_node(mdio_node);
+   of_node_put(mdio_node);
+   if (!mdio) {
+   dev_err(&pdev->dev, "Missing mdio platform device\n");
+   return -EINVAL;
+   }
+   snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
+PHY_ID_FMT, mdio->name, phyid);
+   } else {
+   dev_err(&pdev->dev, "No slave[%d] phy_id or fixed-link

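In patch 2/3 the phy_id property is read as exactly two big-endian cells: a phandle to the MDIO node followed by the PHY address (be32_to_cpup(parp) and be32_to_cpup(parp+1)). Below is a hypothetical user-space sketch of that decoding; be32_cell() stands in for be32_to_cpup(), and unlike the patch, which sends a missing property to its final else branch, parse_phy_id_prop() folds the missing-property and wrong-length cases into one error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for be32_to_cpup(): decode one big-endian cell. */
static uint32_t be32_cell(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Parse a "phy_id = <&mdio_phandle phy_addr>" property value.  Returns 0 on
 * success, -1 if the property is absent or not exactly two cells long
 * (the patch's lenp != sizeof(__be32) * 2 check). */
static int parse_phy_id_prop(const uint8_t *prop, int lenp,
                             uint32_t *mdio_phandle, uint32_t *phyid)
{
    if (prop == NULL || lenp != (int)(sizeof(uint32_t) * 2))
        return -1;
    *mdio_phandle = be32_cell(prop);
    *phyid = be32_cell(prop + 4);
    return 0;
}
```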
[PATCH v2 3/3] drivers: net: cpsw: increment reference count on fixed-link PHY node

2015-12-16 Thread David Rivshin (Allworx)

From: David Rivshin 

When a fixed-link sub-node exists in a slave node, the slave node
is also the PHY node. Since this is a separate use of the slave node,
of_node_get() should be used to increment the reference count.

Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link PHY")
Signed-off-by: David Rivshin 
---
'pd' was renamed to 'phy_dev' to better fit the naming convention in the
function/file.

 drivers/net/ethernet/ti/cpsw.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index bc6d20d..3b489ca 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2036,16 +2036,21 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
-   struct phy_device *pd;
+   struct device_node *phy_node;
+   struct phy_device *phy_dev;

+   /* In the case of a fixed PHY, the DT node associated
+* to the PHY is the Ethernet MAC DT node.
+*/
ret = of_phy_register_fixed_link(slave_node);
if (ret)
return ret;
-   pd = of_phy_find_device(slave_node);
-   if (!pd)
+   phy_node = of_node_get(slave_node);
+   phy_dev = of_phy_find_device(phy_node);
+   if (!phy_dev)
return -ENODEV;
snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
-PHY_ID_FMT, pd->bus->id, pd->addr);
+PHY_ID_FMT, phy_dev->bus->id, phy_dev->addr);
} else if (parp) {
u32 phyid;
struct device_node *mdio_node;
--
2.5.0
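The reasoning in patch 3/3 is ordinary reference counting: one node, two independent users, so two references. A hypothetical sketch with a plain counter standing in for struct device_node and node_get()/node_put() standing in for of_node_get()/of_node_put(); it assumes, as the child iterator does, that the loop holds one reference on the current slave node.

```c
#include <assert.h>

/* Hypothetical node with a reference count (stand-in for device_node). */
struct node {
    int refcount;
};

/* Take an extra reference (stand-in for of_node_get()). */
static struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Drop a reference (stand-in for of_node_put()); never goes negative. */
static void node_put(struct node *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}
```

Without the extra node_get(), the iterator's node_put() on advancing would leave the PHY user holding a node with no reference of its own.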


Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-16 Thread David Miller

From: Hannes Frederic Sowa 
Date: Tue, 15 Dec 2015 21:01:54 +0100

> udp tunnel offloads tend to aggregate datagrams based on inner
> headers. gro engine gets notified by tunnel implementations about
> possible offloads. The match is solely based on the port number.
> 
> Imagine a tunnel bound to port 53: the offloading will look into all
> DNS packets and try to aggregate them based on the inner data found
> within. This could lead to data corruption and malformed DNS packets.
> 
> While this patch minimizes the problem and helps an administrator to find
> the issue by querying ip tunnel/fou, a better way would be to match on
> the specific destination ip address so if a user space socket is bound
> to the same address it will conflict.
> 
> Cc: Tom Herbert 
> Cc: Eric Dumazet 
> Signed-off-by: Hannes Frederic Sowa 

It looks like this issue is still being hashed out, so I've marked this
patch as deferred for now.

Thanks.
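The matching problem in the quoted patch can be sketched with a hypothetical offload table entry. match_port_only() models the port-only match the patch describes; match_port_addr() models the stricter address-plus-port match Hannes suggests. Treating an address of 0 as a wildcard is an assumption of this sketch, not something the thread specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registration entry for a UDP tunnel GRO offload. */
struct udp_offload_entry {
    uint16_t port;  /* UDP destination port the tunnel listens on */
    uint32_t daddr; /* local address the tunnel is bound to; 0 = any */
};

/* Port-only match: any datagram to this port is parsed as tunnel traffic. */
static bool match_port_only(const struct udp_offload_entry *e, uint16_t dport,
                            uint32_t daddr)
{
    (void)daddr; /* the destination address is ignored */
    return e->port == dport;
}

/* Port plus address match: traffic to another local address is left alone. */
static bool match_port_addr(const struct udp_offload_entry *e, uint16_t dport,
                            uint32_t daddr)
{
    return e->port == dport && (e->daddr == 0 || e->daddr == daddr);
}
```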

Re: [PATCH net 1/2] fou: clean up socket with kfree_rcu

2015-12-16 Thread David Miller

From: Hannes Frederic Sowa 
Date: Tue, 15 Dec 2015 21:01:53 +0100

> fou->udp_offloads is managed by RCU. As it is actually included inside
> the fou sockets, we cannot let the memory go out of scope before a grace
> period. We either can synchronize_rcu or switch over to kfree_rcu to
> manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
> and geneve.
> 
> Fixes: 23461551c00628c ("fou: Support for foo-over-udp RX path")
> Cc: Tom Herbert 
> Signed-off-by: Hannes Frederic Sowa 

Applied.

Re: [PATCH] net: add Qualcomm IPC router

2015-12-16 Thread Courtney Cavin

On Tue, Dec 15, 2015 at 10:01:14PM +0100, David Miller wrote:
> From: Bjorn Andersson 
> Date: Fri, 11 Dec 2015 12:41:59 -0800
> 
> > +static unsigned int qrtr_local_nid = 1;
> > +module_param_named(node_id, qrtr_local_nid, uint, S_IRUGO);
> > +MODULE_PARM_DESC(idVendor, "Local Node Identifier");
> 
> Module parameters suck.
> 
> Allow the user to choose this dynamically.  You have roughly two choices.
> 
> 1) Subvert the 'protocol' field passed to ->create() and use that, it is
>being ignored otherwise.
> 
> 2) Put it into the socket address for bind().

So each socket can have its own node id?  That doesn't seem right.

The way these node ids are assigned is by a system designer (in this
case Qualcomm).  The ARM, Linux CPU is always node 1, the audio DSP is
always node 5, etc.  Anyone with the know-how could reassign these
numbers, but there's no reason to have them be dynamic during runtime.
Additionally, allowing dynamic assignment would require code to prevent
id duplication for known remote nodes, as well as to deal with cases in
which remote node discovery happens after local sockets have acquired
that node's id.

Maybe the first socket created needs CAP_NET_ADMIN, and uses the
'protocol' field to set the node id?  Ugh. Gross.

We could hardcode the value in kconfig, but that seems like a worse
solution than a module parameter.

I'm open to further suggestions.

-Courtney

Re: [PATCH net-next 1/4] cxgb4: Use mask & shift while returning vlan_prio

2015-12-16 Thread Casey Leedom

> On Dec 16, 2015, at 4:07 PM, David Miller  wrote:
> 
> From: Hariprasad Shenai 
> Date: Wed, 16 Dec 2015 13:16:25 +0530
> 
>> @@ -66,7 +66,7 @@ struct l2t_data {
>> 
>> static inline unsigned int vlan_prio(const struct l2t_entry *e)
>> {
>> -return e->vlan >> 13;
>> +return (e->vlan & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
> 
> e->vlan is a u16, the vlan priority is the top 3 bits of the 16-bit
> value, and finally the right shift will be non-signed.
> 
> Therefore this change is absolutely not necessary.
> 
> Please remove this patch from the series and resend.

I assume that you only meant that the masking portion is unnecessary.  Doing 
the shift with the symbolic constant VLAN_PRIO_SHIFT instead of the literal 
constant “13” is still a reasonable change.  The masking was almost certainly 
from me because once one uses the symbolic constants, we’re not supposed to 
“know” about the internal structure of the operation.  Modern compilers are of 
course free to optimize away the mask, etc.

Casey
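Both sides of this exchange can be checked exhaustively: for a 16-bit value, the literal shift, the symbolic shift and the mask-and-shift all agree. A sketch assuming VLAN_PRIO_MASK is 0xe000 and VLAN_PRIO_SHIFT is 13, as in include/linux/if_vlan.h; the function names are hypothetical.

```c
#include <stdint.h>

#define VLAN_PRIO_MASK  0xe000 /* assumed to match include/linux/if_vlan.h */
#define VLAN_PRIO_SHIFT 13

/* Before the patch: literal shift, relying on e->vlan being a u16. */
static unsigned int prio_literal(uint16_t vlan)
{
    return vlan >> 13;
}

/* Symbolic shift only, the part Casey argues is still worth keeping. */
static unsigned int prio_shift(uint16_t vlan)
{
    return vlan >> VLAN_PRIO_SHIFT;
}

/* Mask and shift, as in the patch. */
static unsigned int prio_mask_shift(uint16_t vlan)
{
    return (vlan & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
}
```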

Re: [PATCH] net: add Qualcomm IPC router

2015-12-16 Thread David Miller

From: Courtney Cavin 
Date: Wed, 16 Dec 2015 16:01:41 -0800

> We could hardcode the value in kconfig, but that seems like a worse
> solution than a module parameter.
> 
> I'm open to further suggestions.

No module parameters, configure it via netlink or similar at run
time.

Re: [PATCH] inet: sanitize socket() protocol value

2015-12-16 Thread Vegard Nossum


On 12/17/2015 02:01 AM, Eric Dumazet wrote:

On Wed, Dec 16, 2015 at 4:57 PM, Vegard Nossum  wrote:

If you create a raw socket with a protocol of e.g. 0x10000, then
inet_sk(sk)->inet_num will get set to 0 since it only has room for 16
bits. This causes problems further down the line as lots of code makes
assumptions about ->inet_num, for example connect()...inet_autobind()
will attempt to call sk->sk_prot->get_port() which is invalid:

   BUG: unable to handle kernel NULL pointer dereference at   (null)
   IP: [<  (null)>]   (null)
   PGD 19ea0067 PUD 19d3f067 PMD 0
   Oops: 0010 [#1] SMP
   CPU: 1 PID: 849 Comm: a.out Not tainted 4.4.0-rc4+ #287
   task: 880019a12640 ti: 8808c000 task.ti: 8808c000
   RIP: 0010:[<>]  [<  (null)>]   (null)
   RSP: 0018:8808fe50  EFLAGS: 00010246
   RAX: 81cc61f0 RBX: 880019d2ca80 RCX: 0802
   RDX: 0001 RSI:  RDI: 880019d2ca80
   RBP: 8808fe60 R08: 7f0e6b132e80 R09: 880019d2ca80
   R10: 88001a7a72e0 R11: 880019a12640 R12: 000b
   R13: 00400644 R14:  R15: 
   FS:  7f0e6b355740() GS:88001a90() knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2:  CR3: 19d3d000 CR4: 001406a0
   Stack:
815f1c69 880019d2ca80 8808fe88 815f1cde
000b000b 8808fea0 880019484000 8808ff38
8156f310  2a2a2a2a2a2a2a2a 812a
   Call Trace:
[] ? inet_autobind+0x23/0x50
[] inet_dgram_connect+0x48/0x64
[] SYSC_connect+0x84/0xae
[] ? sock_alloc_file+0xb3/0x108
[] ? fd_install+0x20/0x22
[] ? SYSC_socket+0x62/0x90
[] SyS_connect+0x9/0xb
[] entry_SYSCALL_64_fastpath+0x12/0x71
   Code:  Bad RIP value.
   RIP  [<  (null)>]   (null)
RSP 
   CR2: 
   ---[ end trace bd60b4fe2edc2537 ]---

Signed-off-by: Vegard Nossum 
Cc: Eric Dumazet 
Cc: 
---
  net/ipv4/af_inet.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git net/ipv4/af_inet.c net/ipv4/af_inet.c
index 11c4ca1..4e1583a 100644
--- net/ipv4/af_inet.c
+++ net/ipv4/af_inet.c
@@ -316,6 +316,12 @@ lookup_protocol:

 WARN_ON(!answer_prot->slab);

+   /* Check that the protocol we were given will actually fit in
+* inet->inet_num. */
+   err = -EINVAL;
+   if (protocol != (typeof(inet->inet_num)) protocol)
+   goto out;
+
 err = -ENOBUFS;
 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
 if (!sk)
--
1.9.1



It looks like this was already fixed in a better way (IPv6, ...)

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=79462ad02e861803b3840cc782248c7359451cd9


Ha, sorry, I missed that!

Thanks for the quick response.


Vegard
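The check in Vegard's patch rejects any protocol value that changes when cast to the type of inet->inet_num, a 16-bit field. A minimal sketch with uint16_t standing in for that type; protocol_fits_inet_num() is a hypothetical name, and this only illustrates the truncation, not the fix Eric points to in the linked commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept the socket() protocol argument only if it survives a round trip
 * through the 16-bit type of inet->inet_num (uint16_t here). */
static bool protocol_fits_inet_num(int protocol)
{
    return protocol == (int)(uint16_t)protocol;
}
```

A protocol of 0x10000 truncates to 0, which is exactly the value that led inet_autobind() into the NULL ->get_port() call in the oops above.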

Re: [PATCH] wireless: change cfg80211 regulatory domain info as debug messages

2015-12-16 Thread Johannes Berg

On Thu, 2015-12-17 at 11:19 +0800, Dave Young wrote:
> 
> Cool, has the fix been in mainline or somewhere else?
> 

https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a87da0cbc42949cefc8282c39ab4cb8c460bd6ea

johannes

[Patch net] net: check both type and protocol for tcp sockets

2015-12-16 Thread Cong Wang

Dmitry reported the following out-of-bound access:

Call Trace:
 [] __asan_report_load4_noabort+0x3e/0x40
mm/kasan/report.c:294
 [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
 [< inline >] SYSC_setsockopt net/socket.c:1746
 [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

This is because we mistake a raw socket for a tcp socket.
We should check both sk->sk_type and sk->sk_protocol to ensure
it is a tcp socket.

Willem points out __skb_complete_tx_timestamp() needs to be fixed as well.

Reported-by: Dmitry Vyukov 
Cc: Willem de Bruijn 
Cc: Eric Dumazet 
Signed-off-by: Cong Wang 
---
 net/core/skbuff.c | 3 ++-
 net/core/sock.c   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5cc43d37..b2df375 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3643,7 +3643,8 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
serr->ee.ee_info = tstype;
if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
serr->ee.ee_data = skb_shinfo(skb)->tskey;
-   if (sk->sk_protocol == IPPROTO_TCP)
+   if (sk->sk_protocol == IPPROTO_TCP &&
+   sk->sk_type == SOCK_STREAM)
serr->ee.ee_data -= sk->sk_tskey;
}
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 765be83..0d91f7d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -872,7 +872,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 
if (val & SOF_TIMESTAMPING_OPT_ID &&
!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
-   if (sk->sk_protocol == IPPROTO_TCP) {
+   if (sk->sk_protocol == IPPROTO_TCP &&
+   sk->sk_type == SOCK_STREAM) {
if (sk->sk_state != TCP_ESTABLISHED) {
ret = -EINVAL;
break;
-- 
1.8.3.1
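The tightened condition in both hunks reduces to a two-field test. A minimal sketch, where is_tcp_socket() is a hypothetical helper and its arguments stand in for sk->sk_type and sk->sk_protocol: a socket created with socket(AF_INET, SOCK_RAW, IPPROTO_TCP) carries IPPROTO_TCP as its protocol but is not a TCP socket.

```c
#include <stdbool.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* A socket is treated as TCP only if both the protocol and the socket type
 * say so; a raw socket opened with IPPROTO_TCP fails the type test. */
static bool is_tcp_socket(int type, int protocol)
{
    return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}
```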
