Re: [patch] IB/hfi1: info leak in get_ctxt_info()

2015-09-16 Thread Julia Lawall
On Wed, 16 Sep 2015, Dan Carpenter wrote:

> The cinfo struct has a hole after the last struct member so we need to
> zero it out.  Otherwise we don't disclose some uninitialized stack data.

I think the "don't" wasn't intended in the second sentence?

julia

> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/staging/rdma/hfi1/file_ops.c 
> b/drivers/staging/rdma/hfi1/file_ops.c
> index 4698617..2c43ca5 100644
> --- a/drivers/staging/rdma/hfi1/file_ops.c
> +++ b/drivers/staging/rdma/hfi1/file_ops.c
> @@ -1181,6 +1181,7 @@ static int get_ctxt_info(struct file *fp, void __user *ubase, __u32 len)
>   struct hfi1_filedata *fd = fp->private_data;
>   int ret = 0;
>  
> + memset(&cinfo, 0, sizeof(cinfo));
>   ret = hfi1_get_base_kinfo(uctxt, &cinfo);
>   if (ret < 0)
>   goto done;
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] IB/hfi1: info leak in get_ctxt_info()

2015-09-16 Thread Dan Carpenter
The cinfo struct has a hole after the last struct member so we need to
zero it out.  Otherwise we don't disclose some uninitialized stack data.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/staging/rdma/hfi1/file_ops.c 
b/drivers/staging/rdma/hfi1/file_ops.c
index 4698617..2c43ca5 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -1181,6 +1181,7 @@ static int get_ctxt_info(struct file *fp, void __user *ubase, __u32 len)
struct hfi1_filedata *fd = fp->private_data;
int ret = 0;
 
+   memset(&cinfo, 0, sizeof(cinfo));
ret = hfi1_get_base_kinfo(uctxt, &cinfo);
if (ret < 0)
goto done;


[patch] IB/hfi1: fix a locking bug

2015-09-16 Thread Dan Carpenter
mutex_trylock() returns zero on failure, not EBUSY.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 654eafe..aa58e59 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -2710,7 +2710,7 @@ int acquire_lcb_access(struct hfi1_devdata *dd, int sleep_ok)
if (sleep_ok) {
mutex_lock(&ppd->hls_lock);
} else {
-   while (mutex_trylock(&ppd->hls_lock) == EBUSY)
+   while (!mutex_trylock(&ppd->hls_lock))
udelay(1);
}
 
@@ -2758,7 +2758,7 @@ int release_lcb_access(struct hfi1_devdata *dd, int sleep_ok)
if (sleep_ok) {
mutex_lock(&dd->pport->hls_lock);
} else {
-   while (mutex_trylock(&dd->pport->hls_lock) == EBUSY)
+   while (!mutex_trylock(&dd->pport->hls_lock))
udelay(1);
}
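
The return convention described in the changelog is easy to get backwards, so here is a rough userspace illustration. It is not kernel code: demo_lock, demo_trylock(), demo_unlock() and the two *_check_stops() helpers are hypothetical names, and a C11 atomic_flag stands in for the mutex. The sketch only mirrors the convention stated above (nonzero on success, zero on contention) and shows that the old comparison against EBUSY stops spinning even though the lock was not taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex; not the real struct mutex. */
struct demo_lock {
	atomic_flag held;
};

/* Kernel-style convention: 1 if the lock was taken, 0 if it was contended. */
static int demo_trylock(struct demo_lock *l)
{
	assert(l != NULL);
	return !atomic_flag_test_and_set(&l->held);
}

static void demo_unlock(struct demo_lock *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->held);
}

/* Old loop condition: keep spinning only while the result equals EBUSY. */
static bool old_check_stops(int trylock_ret)
{
	return trylock_ret != EBUSY;
}

/* Fixed loop condition: keep spinning while the result is zero. */
static bool new_check_stops(int trylock_ret)
{
	return trylock_ret != 0;
}
```

With a contended lock demo_trylock() returns 0, so old_check_stops() reports that the loop would exit without holding the lock, while new_check_stops() keeps it spinning.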
 


re: IB/ipath: infiniband verbs support

2015-09-16 Thread Dan Carpenter
Hello Bryan O'Sullivan,

The patch 6522108f19a9: "IB/ipath: infiniband verbs support" from Mar
29, 2006, leads to the following static checker warning:

drivers/staging/rdma/ipath/ipath_verbs.c:2289 show_hca()
warn: bool is not less than zero.

drivers/staging/rdma/ipath/ipath_verbs.c
  2281  static ssize_t show_hca(struct device *device, struct device_attribute *attr,
  2282  char *buf)
  2283  {
  2284  struct ipath_ibdev *dev =
  2285  container_of(device, struct ipath_ibdev, ibdev.dev);
  2286  int ret;
  2287  
  2288  ret = dev->dd->ipath_f_get_boardname(dev->dd, buf, 128);
  2289  if (ret < 0)

ret is either zero or one, not negative.  There is some dead code in
ipath_ht_boardname() which indicates that it might have returned error
codes at some point as well.

This warning comes from a Smatch check that produces too many false
positives to publish.

  2290  goto bail;
  2291  strcat(buf, "\n");
  2292  ret = strlen(buf);
  2293  
  2294  bail:
  2295  return ret;
  2296  }

regards,
dan carpenter


[patch v2] IB/hfi1: info leak in get_ctxt_info()

2015-09-16 Thread Dan Carpenter
The cinfo struct has a hole after the last struct member so we need to
zero it out.  Otherwise we disclose some uninitialized stack data.

Signed-off-by: Dan Carpenter 
---
v2: typo in changelog

diff --git a/drivers/staging/rdma/hfi1/file_ops.c 
b/drivers/staging/rdma/hfi1/file_ops.c
index 4698617..2c43ca5 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -1181,6 +1181,7 @@ static int get_ctxt_info(struct file *fp, void __user *ubase, __u32 len)
struct hfi1_filedata *fd = fp->private_data;
int ret = 0;
 
+   memset(&cinfo, 0, sizeof(cinfo));
ret = hfi1_get_base_kinfo(uctxt, &cinfo);
if (ret < 0)
goto done;
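
To make the "hole after the last struct member" concrete, here is a small userspace sketch. It is only an illustration under assumptions: struct demo_info and the demo_* helpers are hypothetical and do not reflect the real hfi1 structure layout, and the padding size depends on the ABI. It shows that the compiler may add trailing padding, and that zeroing the whole object before filling in the members leaves no stale bytes in that padding on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: on common ABIs the compiler pads after 'count'. */
struct demo_info {
	uint64_t flags;
	uint16_t count;
};

/* Number of bytes that follow the last member. */
static size_t demo_tail_padding(void)
{
	return sizeof(struct demo_info) -
	       (offsetof(struct demo_info, count) + sizeof(uint16_t));
}

/* Zero the whole object first, then fill in the members. */
static void demo_fill_info(struct demo_info *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	out->flags = 0x1;
	out->count = 42;
}

/* Return 1 if every trailing padding byte is zero, 0 otherwise. */
static int demo_tail_is_zero(const struct demo_info *in)
{
	const unsigned char *p = (const unsigned char *)in;
	size_t i;

	for (i = offsetof(struct demo_info, count) + sizeof(uint16_t);
	     i < sizeof(*in); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

Without the memset() in demo_fill_info(), whatever was previously in the padding bytes would be copied out along with the members.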



Re: [patch] IB/hfi1: info leak in get_ctxt_info()

2015-09-16 Thread Dan Carpenter
On Wed, Sep 16, 2015 at 08:25:00AM +0200, Julia Lawall wrote:
> On Wed, 16 Sep 2015, Dan Carpenter wrote:
> 
> > The cinfo struct has a hole after the last struct member so we need to
> > zero it out.  Otherwise we don't disclose some uninitialized stack data.
> 
> I think the "don't" wasn't intended in the second sentence?
> 

Derp...  I will resend.

regards,
dan carpenter



[patch] IB/hfi1: checking for NULL instead of IS_ERR

2015-09-16 Thread Dan Carpenter
__get_txreq() returns an ERR_PTR() but this checks for NULL so it would
oops on failure.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/staging/rdma/hfi1/verbs.c 
b/drivers/staging/rdma/hfi1/verbs.c
index 53ac214..41bb59e 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -749,11 +749,13 @@ static inline struct verbs_txreq *get_txreq(struct hfi1_ibdev *dev,
struct verbs_txreq *tx;
 
tx = kmem_cache_alloc(dev->verbs_txreq_cache, GFP_ATOMIC);
-   if (!tx)
+   if (!tx) {
/* call slow path to get the lock */
tx =  __get_txreq(dev, qp);
-   if (tx)
-   tx->qp = qp;
+   if (IS_ERR(tx))
+   return tx;
+   }
+   tx->qp = qp;
return tx;
 }
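
A rough userspace sketch of why a NULL check misses this kind of failure. The demo_* helpers below are simplified, hypothetical re-implementations of the error-pointer idea (a small negative errno value stored in the pointer itself); they are not the kernel's ERR_PTR(), IS_ERR() or PTR_ERR(), and the 4095 limit is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long error)
{
	assert(error < 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if the pointer falls in the top DEMO_MAX_ERRNO addresses. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

An error pointer is never NULL, so a caller that only tests for NULL treats it as a valid object and dereferences it.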
 


re: IB/hfi1: add driver files

2015-09-16 Thread Dan Carpenter
Hello Mike Marciniszyn,

The patch 7724105686e7: "IB/hfi1: add driver files" from Jul 30,
2015, leads to the following static checker warning:

drivers/staging/rdma/hfi1/user_sdma.c:1349 set_txreq_header_ahg()
warn: mask and shift to zero

drivers/staging/rdma/hfi1/user_sdma.c
  1347  /* Clear KDETH.SH on last packet */
  1348  if (unlikely(tx->flags & USER_SDMA_TXREQ_FLAGS_LAST_PKT)) {
  1349  val |= cpu_to_le16(KDETH_GET(hdr->kdeth.ver_tid_offset,
  1350  INTR) >> 16);


KDETH_GET(hdr->kdeth.ver_tid_offset, INTR) is zero or one.  1 >> 16 is
zero.  This line is a no-op.

  1351  val &= cpu_to_le16(~(1U << 13));
  1352  AHG_HEADER_SET(req->ahg, diff, 7, 16, 14, val);
  1353  } else
  1354  AHG_HEADER_SET(req->ahg, diff, 7, 16, 12, val);
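
The arithmetic behind the warning can be checked with a small userspace sketch. The helper names and the bit position used when calling them are hypothetical; this is not the driver's KDETH_GET() macro, only a generic single-bit extraction followed by the same right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a single-bit field; 'shift' is a hypothetical bit position. */
static uint32_t demo_get_flag(uint32_t word, unsigned int shift)
{
	assert(shift < 32);
	return (word >> shift) & 0x1u;
}

/* Mirrors the flagged expression: a 0/1 value shifted right by 16. */
static uint32_t demo_flag_shifted(uint32_t word, unsigned int shift)
{
	return demo_get_flag(word, shift) >> 16;
}
```

Whatever the input word, demo_flag_shifted() returns zero, so OR-ing its result into another value changes nothing.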

regards,
dan carpenter


re: IB/hfi1: add driver files

2015-09-16 Thread Dan Carpenter
Hello Mike Marciniszyn,

The patch 7724105686e7: "IB/hfi1: add driver files" from Jul 30,
2015, leads to the following static checker warning:

drivers/staging/rdma/hfi1/rc.c:2399 hfi1_rc_hdrerr()
warn: right shift assign to zero

drivers/staging/rdma/hfi1/rc.c
  2376  void hfi1_rc_hdrerr(
  2377  struct hfi1_ctxtdata *rcd,
  2378  struct hfi1_ib_header *hdr,
  2379  u32 rcv_flags,
  2380  struct hfi1_qp *qp)
  2381  {
  2382  int has_grh = rcv_flags & HFI1_HAS_GRH;
  2383  struct hfi1_other_headers *ohdr;
  2384  struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
  2385  int diff;
  2386  u8 opcode;
  2387  u32 psn;
  2388  
  2389  /* Check for GRH */
  2390  ohdr = &hdr->u.oth;
  2391  if (has_grh)
  2392  ohdr = &hdr->u.l.oth;
  2393  
  2394  opcode = be32_to_cpu(ohdr->bth[0]);
  2395  if (hfi1_ruc_check_hdr(ibp, hdr, has_grh, qp, opcode))
  2396  return;
  2397  
  2398  psn = be32_to_cpu(ohdr->bth[2]);
  2399  opcode >>= 24;
  2400  

opcode should probably be a u32 instead of a u8.
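
A small userspace sketch of the truncation, using hypothetical helper names and a made-up input word. Storing the 32-bit value in an 8-bit variable keeps only the low byte, so the later shift by 24 always yields zero; keeping 32 bits until after the shift preserves the top byte.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the flagged code: the u8 keeps only the low byte before the shift. */
static uint8_t demo_opcode_u8(uint32_t bth0)
{
	uint8_t opcode = (uint8_t)bth0;

	opcode >>= 24;
	return opcode;
}

/* Suggested shape: keep all 32 bits until after the shift. */
static uint8_t demo_opcode_u32(uint32_t bth0)
{
	uint32_t opcode = bth0;

	opcode >>= 24;
	assert(opcode <= UINT8_MAX);
	return (uint8_t)opcode;
}
```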

regards,
dan carpenter


[PATCH rdma-next 01/32] IB/core: Macro for RoCEv2 UDP port

2015-09-16 Thread Kamal Heib
From: Amir Vadai 

Adding a macro for RoCEv2 UDP destination port.

Signed-off-by: Amir Vadai 
Signed-off-by: Kamal Heib 
---
 include/rdma/ib_verbs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e6b6a86..4de9dfa 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -77,6 +77,8 @@ enum ib_gid_type {
IB_GID_TYPE_SIZE
 };
 
+#define ROCE_V2_UDP_DPORT  4791
+
 struct ib_gid_attr {
enum ib_gid_type gid_type;
struct net_device   *ndev;
-- 
1.8.3.1



[PATCH rdma-next 13/32] IB/rxe: Allocation pool for RDMA objects

2015-09-16 Thread Kamal Heib
Manage and allocate pool of objects with given limit on number of
elements.  Gets parameters from rxe_type_info. Pool elements are
allocated out of a slab cache.  Objects that are using this facility
are: PD, QP, SRQ, CQ, MR, FMR, MW, etc.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_pool.c | 511 +
 drivers/staging/rxe/rxe_pool.h | 161 +
 2 files changed, 672 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_pool.c
 create mode 100644 drivers/staging/rxe/rxe_pool.h

diff --git a/drivers/staging/rxe/rxe_pool.c b/drivers/staging/rxe/rxe_pool.c
new file mode 100644
index 000..1e0787a
--- /dev/null
+++ b/drivers/staging/rxe/rxe_pool.c
@@ -0,0 +1,511 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *Redistribution and use in source and binary forms, with or
+ *without modification, are permitted provided that the following
+ *conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/* info about object pools
+   note that mr, fmr and mw share a single index space
+   so that one can map an lkey to the correct type of object */
+struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = {
+   [RXE_TYPE_UC] = {
+   .name   = "uc",
+   .size   = sizeof(struct rxe_ucontext),
+   },
+   [RXE_TYPE_PD] = {
+   .name   = "pd",
+   .size   = sizeof(struct rxe_pd),
+   },
+   [RXE_TYPE_AH] = {
+   .name   = "ah",
+   .size   = sizeof(struct rxe_ah),
+   .flags  = RXE_POOL_ATOMIC,
+   },
+   [RXE_TYPE_SRQ] = {
+   .name   = "srq",
+   .size   = sizeof(struct rxe_srq),
+   .flags  = RXE_POOL_INDEX,
+   .min_index  = RXE_MIN_SRQ_INDEX,
+   .max_index  = RXE_MAX_SRQ_INDEX,
+   },
+   [RXE_TYPE_QP] = {
+   .name   = "qp",
+   .size   = sizeof(struct rxe_qp),
+   .cleanup    = rxe_qp_cleanup,
+   .flags  = RXE_POOL_INDEX,
+   .min_index  = RXE_MIN_QP_INDEX,
+   .max_index  = RXE_MAX_QP_INDEX,
+   },
+   [RXE_TYPE_CQ] = {
+   .name   = "cq",
+   .size   = sizeof(struct rxe_cq),
+   .cleanup    = rxe_cq_cleanup,
+   },
+   [RXE_TYPE_MR] = {
+   .name   = "mr",
+   .size   = sizeof(struct rxe_mem),
+   .cleanup    = rxe_mem_cleanup,
+   .flags  = RXE_POOL_INDEX,
+   .max_index  = RXE_MAX_MR_INDEX,
+   .min_index  = RXE_MIN_MR_INDEX,
+   },
+   [RXE_TYPE_FMR] = {
+   .name   = "fmr",
+   .size   = sizeof(struct rxe_mem),
+   .cleanup    = rxe_mem_cleanup,
+   .flags  = RXE_POOL_INDEX,
+   .max_index  = RXE_MAX_FMR_INDEX,
+   .min_index  = RXE_MIN_FMR_INDEX,
+   },
+   [RXE_TYPE_MW] = {
+   .name   = "mw",
+   .size   = sizeof(struct rxe_mem),
+   .flags  = RXE_POOL_INDEX,
+   .max_index  = RXE_MAX_MW_INDEX,
+   .min_index  = RXE_MIN_MW_INDEX,
+   },
+   [RXE_TYPE_MC_GRP] = {
+   .name  

[PATCH rdma-next 16/32] IB/rxe: Shared Receive Queue (SRQ) manipulation functions

2015-09-16 Thread Kamal Heib
Functions to manipulate SRQ.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_srq.c | 195 ++
 1 file changed, 195 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_srq.c

diff --git a/drivers/staging/rxe/rxe_srq.c b/drivers/staging/rxe/rxe_srq.c
new file mode 100644
index 000..1411fd2
--- /dev/null
+++ b/drivers/staging/rxe/rxe_srq.c
@@ -0,0 +1,195 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int rxe_srq_chk_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
+struct ib_srq_attr *attr, enum ib_srq_attr_mask mask)
+{
+   if (srq && srq->error) {
+   pr_warn("srq in error state\n");
+   goto err1;
+   }
+
+   if (mask & IB_SRQ_MAX_WR) {
+   if (attr->max_wr > rxe->attr.max_srq_wr) {
+   pr_warn("max_wr(%d) > max_srq_wr(%d)\n",
+   attr->max_wr, rxe->attr.max_srq_wr);
+   goto err1;
+   }
+
+   if (attr->max_wr <= 0) {
+   pr_warn("max_wr(%d) <= 0\n", attr->max_wr);
+   goto err1;
+   }
+
+   if (srq && srq->limit && (attr->max_wr < srq->limit)) {
+   pr_warn("max_wr (%d) < srq->limit (%d)\n",
+   attr->max_wr, srq->limit);
+   goto err1;
+   }
+
+   if (attr->max_wr < RXE_MIN_SRQ_WR)
+   attr->max_wr = RXE_MIN_SRQ_WR;
+   }
+
+   if (mask & IB_SRQ_LIMIT) {
+   if (attr->srq_limit > rxe->attr.max_srq_wr) {
+   pr_warn("srq_limit(%d) > max_srq_wr(%d)\n",
+   attr->srq_limit, rxe->attr.max_srq_wr);
+   goto err1;
+   }
+
+   if (srq && (attr->srq_limit > srq->rq.queue->buf->index_mask)) {
+   pr_warn("srq_limit (%d) > cur limit(%d)\n",
+   attr->srq_limit,
+srq->rq.queue->buf->index_mask);
+   goto err1;
+   }
+   }
+
+   if (mask == IB_SRQ_INIT_MASK) {
+   if (attr->max_sge > rxe->attr.max_srq_sge) {
+   pr_warn("max_sge(%d) > max_srq_sge(%d)\n",
+   attr->max_sge, rxe->attr.max_srq_sge);
+   goto err1;
+   }
+
+   if (attr->max_sge < RXE_MIN_SRQ_SGE)
+   attr->max_sge = RXE_MIN_SRQ_SGE;
+   }
+
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+int rxe_srq_from_init(struct rxe_dev *rxe, struct rxe_srq *srq,
+ struct ib_srq_init_attr *init,
+ struct ib_ucontext *context, struct ib_udata *udata)
+{
+   int err;
+   int srq_wqe_size;
+   struct rxe_queue *q;
+
+   srq->event_handler  = init->event_handler;
+   srq->context        = init->srq_context;
+   srq->limit          = init->attr.srq_limit;
+   srq->srq_num        = srq->pelem.index;
+   srq->rq.max_wr      = init->attr.max_wr;
+   srq->rq.max_sge     = init->attr.max_sge;
+
+   srq_wqe_size        = rcv_wqe_size(srq->rq.max_sge);
+
+   

[PATCH rdma-next 14/32] IB/rxe: RXE tasks handling

2015-09-16 Thread Kamal Heib
A 'task' is a short function that returns 0 as long as it needs to be
called again. rxe tasks are based on the kernel's tasklet infrastructure.
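
The "returns 0 as long as it needs to be called again" convention can be sketched in userspace as follows. This is a simplified illustration under assumptions: struct demo_task, demo_run_task() and demo_drain_one() are hypothetical names, and the sketch leaves out the tasklet scheduling and state locking that the patch adds around the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified task: just a function and its argument. */
struct demo_task {
	int (*func)(void *arg);
	void *arg;
	int ret;
};

/* Call func repeatedly until it returns nonzero, then record the result. */
static int demo_run_task(struct demo_task *task)
{
	int ret;

	assert(task != NULL && task->func != NULL);
	while ((ret = task->func(task->arg)) == 0)
		;
	task->ret = ret;
	return ret;
}

/* Example task body: consume one unit of work per call. */
static int demo_drain_one(void *arg)
{
	int *remaining = arg;

	if (*remaining == 0)
		return -EAGAIN;	/* nothing left, stop calling */
	(*remaining)--;
	return 0;		/* more work may remain, call again */
}
```

Each call does a bounded amount of work, which keeps the task function short; the caller decides how long to keep looping.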

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_task.c | 154 +
 drivers/staging/rxe/rxe_task.h |  95 +
 2 files changed, 249 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_task.c
 create mode 100644 drivers/staging/rxe/rxe_task.h

diff --git a/drivers/staging/rxe/rxe_task.c b/drivers/staging/rxe/rxe_task.c
new file mode 100644
index 000..162fa1a
--- /dev/null
+++ b/drivers/staging/rxe/rxe_task.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *Redistribution and use in source and binary forms, with or
+ *without modification, are permitted provided that the following
+ *conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "rxe_task.h"
+
+int __rxe_do_task(struct rxe_task *task)
+
+{
+   int ret;
+
+   while ((ret = task->func(task->arg)) == 0)
+   ;
+
+   task->ret = ret;
+
+   return ret;
+}
+
+/*
+ * this locking is due to a potential race where
+ * a second caller finds the task already running
+ * but looks just after the last call to func
+ */
+void rxe_do_task(unsigned long data)
+{
+   int cont;
+   int ret;
+   unsigned long flags;
+   struct rxe_task *task = (struct rxe_task *)data;
+
+   spin_lock_irqsave(&task->state_lock, flags);
+   switch (task->state) {
+   case TASK_STATE_START:
+   task->state = TASK_STATE_BUSY;
+   spin_unlock_irqrestore(&task->state_lock, flags);
+   break;
+
+   case TASK_STATE_BUSY:
+   task->state = TASK_STATE_ARMED;
+   /* fall through to */
+   case TASK_STATE_ARMED:
+   spin_unlock_irqrestore(&task->state_lock, flags);
+   return;
+
+   default:
+   spin_unlock_irqrestore(&task->state_lock, flags);
+   pr_warn("bad state = %d in rxe_do_task\n", task->state);
+   return;
+   }
+
+   do {
+   cont = 0;
+   ret = task->func(task->arg);
+
+   spin_lock_irqsave(&task->state_lock, flags);
+   switch (task->state) {
+   case TASK_STATE_BUSY:
+   if (ret)
+   task->state = TASK_STATE_START;
+   else
+   cont = 1;
+   break;
+
+   /* someone tried to run the task since the last time we called
+    * func, so we will call one more time regardless of the
+    * return value
+    */
+   case TASK_STATE_ARMED:
+   task->state = TASK_STATE_BUSY;
+   cont = 1;
+   break;
+
+   default:
+   pr_warn("bad state = %d in rxe_do_task\n",
+   task->state);
+   }
+   spin_unlock_irqrestore(&task->state_lock, flags);
+   } while (cont);
+
+   task->ret = ret;
+}
+
+int rxe_init_task(void *obj, struct rxe_task *task,
+ void *arg, int (*func)(void *), char *name)
+{
+   task->obj   = obj;
+   task->arg   = arg;
+   task->func  = func;
+   snprintf(task->name, sizeof(task->name), "%s", name);
+
+   tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
+
+   task->state = TASK_STATE_START;
+   

[PATCH rdma-next 31/32] IB/rxe: Add Soft-RoCE to kbuild and makefiles

2015-09-16 Thread Kamal Heib
Kconfig and Makefiles for RXE driver

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/Kconfig  |  2 ++
 drivers/staging/Makefile |  1 +
 drivers/staging/rxe/Kconfig  | 23 +++
 drivers/staging/rxe/Makefile | 24 
 4 files changed, 50 insertions(+)
 create mode 100644 drivers/staging/rxe/Kconfig
 create mode 100644 drivers/staging/rxe/Makefile

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 3f9f058..1654753 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -118,4 +118,6 @@ source "drivers/staging/ipath/Kconfig"
 
 source "drivers/staging/hfi1/Kconfig"
 
+source "drivers/staging/rxe/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 20f8276..15080c4 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -51,3 +51,4 @@ obj-$(CONFIG_WILC1000)+= wilc1000/
 obj-$(CONFIG_INFINIBAND_AMSO1100)  += amso1100/
 obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
 obj-$(CONFIG_INFINIBAND_HFI1)  += hfi1/
+obj-$(CONFIG_INFINIBAND_RXE)+= rxe/
diff --git a/drivers/staging/rxe/Kconfig b/drivers/staging/rxe/Kconfig
new file mode 100644
index 000..649b7be
--- /dev/null
+++ b/drivers/staging/rxe/Kconfig
@@ -0,0 +1,23 @@
+config INFINIBAND_RXE
+   tristate "Software RDMA over Ethernet (RoCE) driver"
+   depends on INET && PCI && INFINIBAND
+   ---help---
+   This driver implements the InfiniBand RDMA transport over
+   the Linux network stack. It enables a system with a
+   standard Ethernet adapter to interoperate with a RoCE
+   adapter or with another system running the RXE driver.
+   Documentation on InfiniBand and RoCE can be downloaded at
+   www.infinibandta.org and www.openfabrics.org. (See also
+   siw which is a similar software driver for iWARP.)
+
+   The driver is split into two layers, one interfaces with the
+   Linux RDMA stack and implements a kernel or user space
+   verbs API. The user space verbs API requires a support
+   library named librxe which is loaded by the generic user
+   space verbs API, libibverbs. The other layer interfaces
+   with the Linux network stack at layer 3.
+
+   To configure and work with soft-RoCE driver please use the
+   following wiki page under "configure Soft-RoCE (RXE)" section:
+
+   https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
diff --git a/drivers/staging/rxe/Makefile b/drivers/staging/rxe/Makefile
new file mode 100644
index 000..7cf7774
--- /dev/null
+++ b/drivers/staging/rxe/Makefile
@@ -0,0 +1,24 @@
+obj-$(CONFIG_INFINIBAND_RXE) += ib_rxe.o
+
+ib_rxe-y := \
+   rxe.o \
+   rxe_comp.o \
+   rxe_req.o \
+   rxe_resp.o \
+   rxe_recv.o \
+   rxe_pool.o \
+   rxe_queue.o \
+   rxe_verbs.o \
+   rxe_av.o \
+   rxe_srq.o \
+   rxe_qp.o \
+   rxe_cq.o \
+   rxe_mr.o \
+   rxe_dma.o \
+   rxe_opcode.o \
+   rxe_mmap.o \
+   rxe_icrc.o \
+   rxe_mcast.o \
+   rxe_task.o \
+   rxe_net.o \
+   rxe_sysfs.o
-- 
1.8.3.1



[PATCH rdma-next 15/32] IB/rxe: Address vector manipulation functions

2015-09-16 Thread Kamal Heib
Functions to manipulate Address Vector.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_av.c | 87 
 1 file changed, 87 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_av.c

diff --git a/drivers/staging/rxe/rxe_av.c b/drivers/staging/rxe/rxe_av.c
new file mode 100644
index 000..cc4b179
--- /dev/null
+++ b/drivers/staging/rxe/rxe_av.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *Redistribution and use in source and binary forms, with or
+ *without modification, are permitted provided that the following
+ *conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr)
+{
+   struct rxe_port *port;
+
+   if (attr->port_num < 1 || attr->port_num > rxe->num_ports) {
+   pr_info("rxe: invalid port_num = %d\n", attr->port_num);
+   return -EINVAL;
+   }
+
+   port = &rxe->port[attr->port_num - 1];
+
+   if (attr->ah_flags & IB_AH_GRH) {
+   if (attr->grh.sgid_index > port->attr.gid_tbl_len) {
+   pr_info("rxe: invalid sgid index = %d\n",
+   attr->grh.sgid_index);
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
+int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
+struct rxe_av *av, struct ib_ah_attr *attr)
+{
+   memset(av, 0, sizeof(*av));
+   memcpy(&av->grh, &attr->grh, sizeof(attr->grh));
+   av->port_num = port_num;
+   return 0;
+}
+
+int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
+  struct ib_ah_attr *attr)
+{
+   memcpy(&attr->grh, &av->grh, sizeof(av->grh));
+   attr->port_num = av->port_num;
+   return 0;
+}
+
+int rxe_av_fill_ip_info(struct rxe_dev *rxe,
+   struct rxe_av *av,
+   struct ib_ah_attr *attr,
+   struct ib_gid_attr *sgid_attr,
+   union ib_gid *sgid)
+{
+   rdma_gid2ip(&av->sgid_addr._sockaddr, sgid);
+   rdma_gid2ip(&av->dgid_addr._sockaddr, &attr->grh.dgid);
+   av->network_type = ib_gid_to_network_type(sgid_attr->gid_type, sgid);
+
+   return 0;
+}
-- 
1.8.3.1



[PATCH rdma-next 09/32] IB/rxe: Work request's opcode information table

2015-09-16 Thread Kamal Heib
Useful information about work request opcodes and pkt opcodes in table
form.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_opcode.c | 961 +++
 1 file changed, 961 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_opcode.c

diff --git a/drivers/staging/rxe/rxe_opcode.c b/drivers/staging/rxe/rxe_opcode.c
new file mode 100644
index 000..894efe7
--- /dev/null
+++ b/drivers/staging/rxe/rxe_opcode.c
@@ -0,0 +1,961 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include "rxe_opcode.h"
+#include "rxe_hdr.h"
+
+/* useful information about work request opcodes and pkt opcodes in
+ * table form
+ */
+struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
+   [IB_WR_RDMA_WRITE]  = {
+   .name   = "IB_WR_RDMA_WRITE",
+   .mask   = {
+   [IB_QPT_RC] = WR_INLINE_MASK | WR_WRITE_MASK,
+   [IB_QPT_UC] = WR_INLINE_MASK | WR_WRITE_MASK,
+   },
+   },
+   [IB_WR_RDMA_WRITE_WITH_IMM] = {
+   .name   = "IB_WR_RDMA_WRITE_WITH_IMM",
+   .mask   = {
+   [IB_QPT_RC] = WR_INLINE_MASK | WR_WRITE_MASK,
+   [IB_QPT_UC] = WR_INLINE_MASK | WR_WRITE_MASK,
+   },
+   },
+   [IB_WR_SEND]= {
+   .name   = "IB_WR_SEND",
+   .mask   = {
+   [IB_QPT_SMI]= WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_GSI]= WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_RC] = WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_UC] = WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_UD] = WR_INLINE_MASK | WR_SEND_MASK,
+   },
+   },
+   [IB_WR_SEND_WITH_IMM]   = {
+   .name   = "IB_WR_SEND_WITH_IMM",
+   .mask   = {
+   [IB_QPT_SMI]= WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_GSI]= WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_RC] = WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_UC] = WR_INLINE_MASK | WR_SEND_MASK,
+   [IB_QPT_UD] = WR_INLINE_MASK | WR_SEND_MASK,
+   },
+   },
+   [IB_WR_RDMA_READ]   = {
+   .name   = "IB_WR_RDMA_READ",
+   .mask   = {
+   [IB_QPT_RC] = WR_READ_MASK,
+   },
+   },
+   [IB_WR_ATOMIC_CMP_AND_SWP]  = {
+   .name   = "IB_WR_ATOMIC_CMP_AND_SWP",
+   .mask   = {
+   [IB_QPT_RC] = WR_ATOMIC_MASK,
+   },
+   },
+   [IB_WR_ATOMIC_FETCH_AND_ADD]= {
+   .name   = "IB_WR_ATOMIC_FETCH_AND_ADD",
+   .mask   = {
+   [IB_QPT_RC] = WR_ATOMIC_MASK,
+   },
+   },
+   [IB_WR_LSO] = {
+   .name   = "IB_WR_LSO",
+   .mask   = {
+   /* not supported */
+   },
+   },
+   [IB_WR_SEND_WITH_INV]   = {
+   .name   = 

[PATCH rdma-next 06/32] IB/rxe: External interface to lower level modules

2015-09-16 Thread Kamal Heib
Functions to be called by the networking layer.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe.h | 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe.h

diff --git a/drivers/staging/rxe/rxe.h b/drivers/staging/rxe/rxe.h
new file mode 100644
index 000..f781619
--- /dev/null
+++ b/drivers/staging/rxe/rxe.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_H
+#define RXE_H
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rxe_net.h"
+#include "rxe_opcode.h"
+#include "rxe_hdr.h"
+#include "rxe_param.h"
+#include "rxe_verbs.h"
+
+#define RXE_UVERBS_ABI_VERSION (1)
+
+#define IB_PHYS_STATE_LINK_UP  (5)
+
+#define RXE_ROCE_V2_SPORT  (0xc000)
+
+int rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu,
+   unsigned int port_num);
+
+int rxe_add(struct rxe_dev *rxe, unsigned int mtu);
+
+void rxe_remove(struct rxe_dev *rxe);
+
+int rxe_rcv(struct sk_buff *skb);
+
+#endif /* RXE_H */
-- 
1.8.3.1



[PATCH rdma-next 18/32] IB/rxe: Queue Pair (QP) handling

2015-09-16 Thread Kamal Heib
Functions to manipulate QP objects.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_qp.c | 835 +++
 1 file changed, 835 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_qp.c

diff --git a/drivers/staging/rxe/rxe_qp.c b/drivers/staging/rxe/rxe_qp.c
new file mode 100644
index 000..dcc3e2d
--- /dev/null
+++ b/drivers/staging/rxe/rxe_qp.c
@@ -0,0 +1,835 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *Redistribution and use in source and binary forms, with or
+ *without modification, are permitted provided that the following
+ *conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+#include "rxe_task.h"
+
+char *rxe_qp_state_name[] = {
+   [QP_STATE_RESET]= "RESET",
+   [QP_STATE_INIT] = "INIT",
+   [QP_STATE_READY]= "READY",
+   [QP_STATE_DRAIN]= "DRAIN",
+   [QP_STATE_DRAINED]  = "DRAINED",
+   [QP_STATE_ERROR]= "ERROR",
+};
+
+static int rxe_qp_chk_cap(struct rxe_dev *rxe, struct ib_qp_cap *cap,
+ int has_srq)
+{
+   if (cap->max_send_wr > rxe->attr.max_qp_wr) {
+   pr_warn("invalid send wr = %d > %d\n",
+   cap->max_send_wr, rxe->attr.max_qp_wr);
+   goto err1;
+   }
+
+   if (cap->max_send_sge > rxe->attr.max_sge) {
+   pr_warn("invalid send sge = %d > %d\n",
+   cap->max_send_sge, rxe->attr.max_sge);
+   goto err1;
+   }
+
+   if (!has_srq) {
+   if (cap->max_recv_wr > rxe->attr.max_qp_wr) {
+   pr_warn("invalid recv wr = %d > %d\n",
+   cap->max_recv_wr, rxe->attr.max_qp_wr);
+   goto err1;
+   }
+
+   if (cap->max_recv_sge > rxe->attr.max_sge) {
+   pr_warn("invalid recv sge = %d > %d\n",
+   cap->max_recv_sge, rxe->attr.max_sge);
+   goto err1;
+   }
+   }
+
+   if (cap->max_inline_data > rxe->max_inline_data) {
+   pr_warn("invalid max inline data = %d > %d\n",
+   cap->max_inline_data, rxe->max_inline_data);
+   goto err1;
+   }
+
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init)
+{
+   struct ib_qp_cap *cap = &init->cap;
+   struct rxe_port *port;
+   int port_num = init->port_num;
+
+   if (!init->recv_cq || !init->send_cq) {
+   pr_warn("missing cq\n");
+   goto err1;
+   }
+
+   if (rxe_qp_chk_cap(rxe, cap, !!init->srq))
+   goto err1;
+
+   if (init->qp_type == IB_QPT_SMI || init->qp_type == IB_QPT_GSI) {
+   if (port_num < 1 || port_num > rxe->num_ports) {
+   pr_warn("invalid port = %d\n", port_num);
+   goto err1;
+   }
+
+   port = &rxe->port[port_num - 1];
+
+   if (init->qp_type == IB_QPT_SMI && port->qp_smi_index) {
+   pr_warn("SMI QP exists for port %d\n", port_num);
+   goto err1;
+   }
+
+   if (init->qp_type == IB_QPT_GSI && port->qp_gsi_index) {
+   pr_warn("GSI QP exists for port %d\n", port_num);
+   goto err1;
+   }
+   }
+
+   return 0;
+

[PATCH rdma-next 30/32] IB/rxe: Shared objects between user and kernel

2015-09-16 Thread Kamal Heib
From: Amir Vadai 

Objects used by the userspace to post work requests.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 include/uapi/rdma/Kbuild   |   1 +
 include/uapi/rdma/ib_rxe.h | 139 +
 2 files changed, 140 insertions(+)
 create mode 100644 include/uapi/rdma/ib_rxe.h

diff --git a/include/uapi/rdma/Kbuild b/include/uapi/rdma/Kbuild
index 687ae33..91bc37a 100644
--- a/include/uapi/rdma/Kbuild
+++ b/include/uapi/rdma/Kbuild
@@ -5,3 +5,4 @@ header-y += ib_user_sa.h
 header-y += ib_user_verbs.h
 header-y += rdma_netlink.h
 header-y += rdma_user_cm.h
+header-y += ib_rxe.h
diff --git a/include/uapi/rdma/ib_rxe.h b/include/uapi/rdma/ib_rxe.h
new file mode 100644
index 000..fc1d9ca
--- /dev/null
+++ b/include/uapi/rdma/ib_rxe.h
@@ -0,0 +1,139 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_RXE_H
+#define IB_RXE_H
+
+#include 
+
+union rxe_gid {
+   __u8raw[16];
+   struct {
+   __be64  subnet_prefix;
+   __be64  interface_id;
+   } global;
+};
+
+struct rxe_global_route {
+   union rxe_gid   dgid;
+   __u32   flow_label;
+   __u8sgid_index;
+   __u8hop_limit;
+   __u8traffic_class;
+};
+
+struct rxe_av {
+   __u8port_num;
+   __u8network_type;
+   struct rxe_global_route grh;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
+};
+
+struct rxe_send_wr {
+   __u64   wr_id;
+   __u32   num_sge;
+   __u32   opcode;
+   __u32   send_flags;
+   union {
+   __u32   imm_data;
+   __u32   invalidate_rkey;
+   } ex;
+   union {
+   struct {
+   __u64   remote_addr;
+   __u32   rkey;
+   } rdma;
+   struct {
+   __u64   remote_addr;
+   __u64   compare_add;
+   __u64   swap;
+   __u32   rkey;
+   } atomic;
+   struct {
+   __u32   remote_qpn;
+   __u32   remote_qkey;
+   __u16   pkey_index;
+   } ud;
+   } wr;
+};
+
+struct rxe_sge {
+   __u64   addr;
+   __u32   length;
+   __u32   lkey;
+};
+
+struct mminfo {
+   __u64   offset;
+   __u32   size;
+   __u32   pad;
+};
+
+struct rxe_dma_info {
+   __u32   length;
+   __u32   resid;
+   __u32   cur_sge;
+   __u32   num_sge;
+   __u32   sge_offset;
+   union {
+   __u8inline_data[0];
+   struct rxe_sge  sge[0];
+   };
+};
+
+struct rxe_send_wqe {
+   struct rxe_send_wr  wr;
+   struct rxe_av   av;
+   __u32   status;
+   __u32   state;
+   __u64   iova;
+   __u32   mask;
+   __u32   first_psn;
+   __u32   last_psn;
+   __u32   ack_length;
+

[PATCH rdma-next 00/32] Soft-RoCE driver

2015-09-16 Thread Kamal Heib
Hi Doug and list,

This patchset introduces the Soft-RoCE driver.

Some background on the driver: The original Soft-RoCE driver was implemented by
Bob Pearson from SFW. Bob started the submission process [3], but his work was
abandoned after v2.
Mellanox decided to pick it up and continue the submission. As part of the
process we detected some problems with the original implementation. Mainly, we
wanted RoCEv2 support, and there were too many locks and context switches in
the data path. Most of them have already been removed.

We've located the driver in the staging subtree. This follows a requirement
to implement an IB transport library: Soft-RoCE is in the same boat as the
hfi1 driver. We need to define and implement a library to prevent this code
duplication.

We did address the feedback provided on the original submission.

Another issue is that this code is based on the RoCEv2 patchsets.

So why not wait and submit it when the RoCEv2 IB core bits are upstream?

The main reason we want to submit it now rather than wait is "submit early".
We understand that 4 years after v2 is not exactly early. But we started working
on it a few months ago and made some heavy modifications to the code and the
design, and we would like this work to be done under the eye of the community
and not in house (although this work was done on GitHub [4]).

Soft-RoCE is sitting on top of Matan's 3 patchsets for gid cache and RoCEv2.
The first [1] is in, the second [2] and third [5] were posted already.

RXE user space (librxe) is located on GitHub [6], with instructions on how to
use it [7].

Some notes on the architecture and design:

ib_rxe implements the RDMA transport and registers with the RDMA core as a
kernel verbs provider. It also implements the packet IO layer. ib_rxe attaches
to the Linux netdev stack as a UDP encapsulating protocol and can send and
receive packets over any Ethernet device. It uses the RoCEv2 protocol to handle
RDMA transport. 
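
To make the encapsulation concrete: a RoCEv2 packet is an ordinary UDP datagram
(destination port 4791) whose payload starts with the 12-byte InfiniBand Base
Transport Header (BTH). The following is only a user-space sketch of that
header layout, not code from this patchset; bth_pack() and the macro names are
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROCE_V2_UDP_DPORT 4791u        /* IANA-assigned RoCEv2 UDP port */
#define BTH_BYTES         12u          /* size of the Base Transport Header */
#define BTH_24BIT_MASK    0x00ffffffu  /* QPN and PSN are 24-bit fields */

/* Hypothetical helper: serialize a BTH in network byte order. */
static void bth_pack(uint8_t out[BTH_BYTES], uint8_t opcode, uint16_t pkey,
                     uint32_t dest_qpn, int ack_req, uint32_t psn)
{
    assert(out != NULL);

    dest_qpn &= BTH_24BIT_MASK;
    psn &= BTH_24BIT_MASK;

    out[0]  = opcode;                   /* transport opcode */
    out[1]  = 0;                        /* SE, M, PadCnt, TVer all zero */
    out[2]  = (uint8_t)(pkey >> 8);     /* partition key, high byte */
    out[3]  = (uint8_t)(pkey & 0xff);   /* partition key, low byte */
    out[4]  = 0;                        /* reserved */
    out[5]  = (uint8_t)(dest_qpn >> 16);
    out[6]  = (uint8_t)(dest_qpn >> 8);
    out[7]  = (uint8_t)dest_qpn;
    out[8]  = ack_req ? 0x80 : 0x00;    /* AckReq bit, 7 reserved bits */
    out[9]  = (uint8_t)(psn >> 16);
    out[10] = (uint8_t)(psn >> 8);
    out[11] = (uint8_t)psn;
}
```

A caller would place these 12 bytes at the start of the UDP payload, followed
by any extended headers and the data, and finally the ICRC.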

The modules are configured by entries in /sys. There is a configuration script
(rxe_cfg) that simplifies the use of this interface. rxe_cfg is part of the
rxe user space code, librxe.

The use of rxe verbs in user space requires the inclusion of librxe as a device
specific plug-in to libibverbs. librxe is packaged separately [6].

Copies of the user space library and tools for 'upstream' and a clone of Doug's
tree with these patches applied are available at github [4]

Architecture:

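~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

         +-----------------------------------------------------+
         |                     Application                     |
         +-----------------------------------------------------+
                           +-----------------------------------+
                           |            libibverbs             |
User                       +-----------------------------------+
                           +----------------+ +----------------+
                           | librxe         | | HW RoCE lib    |
                           +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         +--------------+                       +--------------+
         | Sockets      |                       | RDMA ULP     |
         +--------------+                       +--------------+
         +--------------+              +-----------------------+
         | TCP/IP       |              | ib_core               |
         +--------------+              +-----------------------+
                           +----------------+ +----------------+
Kernel                     | ib_rxe         | | HW RoCE driver |
                           +----------------+ +----------------+
         +-----------------------------------------------------+
         | NIC driver                                          |
         +-----------------------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~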

The driver components and a non-ASCII chart of the module can be found in a
PDF [8] presented by Bob before the original submission.
The design is very similar; one thing that changed is that the arbiter task
was removed. This reduced the number of context switches and locks in
the data path.

Currently, IPv4-based sessions aren't supported; this will be addressed for V1.

A TODO file is placed under the driver folder.

The patchset is applied and tested over Doug's to-be-rebased/for-4.3 branch
153b730 ("Merge branch 'hfi1-v4' into to-be-rebased/for-4.3").

Thanks,
Kamal, Liran and Amir

[1] - http://www.spinics.net/lists/netdev/msg337683.html
[2] - http://www.spinics.net/lists/linux-rdma/msg28031.html
[3] - http://www.spinics.net/lists/linux-rdma/msg08936.html
[4] - https://github.com/SoftRoCE
[5] - http://www.spinics.net/lists/linux-rdma/msg28120.html
[6] - https://github.com/SoftRoCE/librxe-dev
[7] - https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
[8] - 

[PATCH rdma-next 32/32] IB/rxe: TODO file while in staging

2015-09-16 Thread Kamal Heib
From: Amir Vadai 

Things to do in order to get out of the staging subtree.

Signed-off-by: Amir Vadai 
Signed-off-by: Kamal Heib 
---
 drivers/staging/rxe/TODO | 15 +++
 1 file changed, 15 insertions(+)
 create mode 100644 drivers/staging/rxe/TODO

diff --git a/drivers/staging/rxe/TODO b/drivers/staging/rxe/TODO
new file mode 100644
index 000..a621b27
--- /dev/null
+++ b/drivers/staging/rxe/TODO
@@ -0,0 +1,15 @@
+Aug, 2015
+
+- Remove software processing of IB protocol and place in library for use
+  by qib, ipath (if still present), hfi1, and soft-roce
+- Do not use tasklet in completion flow
+- Need to free resources if user space didn't.
+- Share structures from ib_user_verbs.h instead of copying in ib_rxe.h
+- Move IBA header types and methods from rxe_hdr.h into IB core
+- Cleanup members of rxe_pkt_info that already exists in packet header
+- Refactor post_send_one function to get better performance.
+- Refactor rxe_mem struct to be clear what is type of memory that it's holding.
+- Use single reference count from the pool to the device, instead of having a single
+  reference on the device kept by each element in the pool.
+- Calculate ICRC for incoming packets.
+- Use hash table to hold net_info instead of fixed size array used now.
-- 
1.8.3.1



[PATCH rdma-next 23/32] IB/rxe: QP request handling

2015-09-16 Thread Kamal Heib
QP request logic.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_req.c | 679 ++
 1 file changed, 679 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_req.c

diff --git a/drivers/staging/rxe/rxe_req.c b/drivers/staging/rxe/rxe_req.c
new file mode 100644
index 000..41d13a5
--- /dev/null
+++ b/drivers/staging/rxe/rxe_req.c
@@ -0,0 +1,679 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+static int next_opcode(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+  unsigned opcode);
+
+static inline void retry_first_write_send(struct rxe_qp *qp,
+ struct rxe_send_wqe *wqe,
+ unsigned mask, int npsn)
+{
+   int i;
+
+   for (i = 0; i < npsn; i++) {
+   int to_send = (wqe->dma.resid > qp->mtu) ?
+   qp->mtu : wqe->dma.resid;
+
+   qp->req.opcode = next_opcode(qp, wqe,
+wqe->wr.opcode);
+
+   if (wqe->wr.send_flags & IB_SEND_INLINE) {
+   wqe->dma.resid -= to_send;
+   wqe->dma.sge_offset += to_send;
+   } else {
+   advance_dma_data(&wqe->dma, to_send);
+   }
+   if (mask & WR_WRITE_MASK)
+   wqe->iova += qp->mtu;
+   }
+}
+
+static void req_retry(struct rxe_qp *qp)
+{
+   struct rxe_send_wqe *wqe;
+   unsigned int wqe_index;
+   unsigned int mask;
+   int npsn;
+   int first = 1;
+
+   wqe = queue_head(qp->sq.queue);
+   npsn = (qp->comp.psn - wqe->first_psn) & BTH_PSN_MASK;
+
+   qp->req.wqe_index   = consumer_index(qp->sq.queue);
+   qp->req.psn = qp->comp.psn;
+   qp->req.opcode  = -1;
+
+   for (wqe_index = consumer_index(qp->sq.queue);
+   wqe_index != producer_index(qp->sq.queue);
+   wqe_index = next_index(qp->sq.queue, wqe_index)) {
+   wqe = addr_from_index(qp->sq.queue, wqe_index);
+   mask = wr_opcode_mask(wqe->wr.opcode, qp);
+
+   if (wqe->state == wqe_state_posted)
+   break;
+
+   if (wqe->state == wqe_state_done)
+   continue;
+
+   wqe->iova = (mask & WR_ATOMIC_MASK) ?
+   wqe->wr.wr.atomic.remote_addr :
+   wqe->wr.wr.rdma.remote_addr;
+
+   if (!first || (mask & WR_READ_MASK) == 0) {
+   wqe->dma.resid = wqe->dma.length;
+   wqe->dma.cur_sge = 0;
+   wqe->dma.sge_offset = 0;
+   }
+
+   if (first) {
+   first = 0;
+
+   if (mask & WR_WRITE_OR_SEND_MASK)
+   retry_first_write_send(qp, wqe, mask, npsn);
+
+   if (mask & WR_READ_MASK)
+   wqe->iova += npsn * qp->mtu;
+   }
+
+   wqe->state = wqe_state_posted;
+   }
+}
+
+void rnr_nak_timer(unsigned long data)
+{
+   struct rxe_qp *qp = (struct rxe_qp *)data;
+
+   pr_debug("rnr nak timer fired\n");
+   rxe_run_task(&qp->req.task, 1);
+}
+
+static struct rxe_send_wqe *req_next_wqe(struct rxe_qp 
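
A note on the arithmetic in req_retry() above: PSNs are 24-bit counters that
wrap, so the number of packets already acknowledged is computed as a masked
difference rather than a plain subtraction. The sketch below reproduces that
calculation, and the per-packet clamp used in retry_first_write_send(), in
stand-alone C. It assumes BTH_PSN_MASK is 0x00ffffff (the 24-bit PSN width);
psn_delta() and bytes_for_next_packet() are hypothetical names, not functions
from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PSN_MASK 0x00ffffffu   /* assumption: same value as BTH_PSN_MASK */

/* Distance from 'older' to 'newer' in 24-bit wrapping PSN space. */
static uint32_t psn_delta(uint32_t newer, uint32_t older)
{
    /* Unsigned subtraction wraps modulo 2^32; the mask reduces it to 2^24. */
    return (newer - older) & PSN_MASK;
}

/* Payload size of the next packet: the remaining bytes, capped at the MTU. */
static uint32_t bytes_for_next_packet(uint32_t resid, uint32_t mtu)
{
    assert(mtu > 0);
    return resid > mtu ? mtu : resid;
}
```

For example, with first_psn = 0xfffffe and comp.psn = 0x000002 the masked
difference is 4, so four packets of the first WQE are skipped on retry even
though the counter wrapped in between.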

Re: [PATCH rdma-next 02/32] IB/core: Add SEND_LAST_INV and SEND_ONLY_INV opcodes

2015-09-16 Thread Christoph Hellwig
On Wed, Sep 16, 2015 at 04:42:36PM +0300, Kamal Heib wrote:
> Introduce SEND_LAST_INV and SEND_ONLY_INV opcodes in ib_pack.h to be
> used by RXE for RC.

Why does RXE need new public opcodes?



[PATCH rdma-next 28/32] IB/rxe: Interface to netdev stack

2015-09-16 Thread Kamal Heib
Linux netdev related code

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_net.c | 705 ++
 drivers/staging/rxe/rxe_net.h |  72 +
 2 files changed, 777 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_net.c
 create mode 100644 drivers/staging/rxe/rxe_net.h

diff --git a/drivers/staging/rxe/rxe_net.c b/drivers/staging/rxe/rxe_net.c
new file mode 100644
index 000..defcb0f
--- /dev/null
+++ b/drivers/staging/rxe/rxe_net.c
@@ -0,0 +1,705 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rxe.h"
+#include "rxe_net.h"
+#include "rxe_loc.h"
+
+/*
+ * note: this table is a replacement for a protocol specific pointer
+ * in struct net_device which exists for other ethertypes
+ * this allows us to not have to patch that data structure
+ * eventually we want to get our own when we're famous
+ */
+struct rxe_net_info net_info[RXE_MAX_IF_INDEX];
+spinlock_t net_info_lock; /* spinlock for net_info array */
+struct socket *rxe_sock;
+
+static __be64 rxe_mac_to_eui64(struct net_device *ndev)
+{
+   unsigned char *mac_addr = ndev->dev_addr;
+   __be64 eui64;
+   unsigned char *dst = (unsigned char *)&eui64;
+
+   dst[0] = mac_addr[0] ^ 2;
+   dst[1] = mac_addr[1];
+   dst[2] = mac_addr[2];
+   dst[3] = 0xff;
+   dst[4] = 0xfe;
+   dst[5] = mac_addr[3];
+   dst[6] = mac_addr[4];
+   dst[7] = mac_addr[5];
+
+   return eui64;
+}
+
+static __be64 node_guid(struct rxe_dev *rxe)
+{
+   return rxe_mac_to_eui64(rxe->ndev);
+}
+
+static __be64 port_guid(struct rxe_dev *rxe, unsigned int port_num)
+{
+   return rxe_mac_to_eui64(rxe->ndev);
+}
+
+static struct device *dma_device(struct rxe_dev *rxe)
+{
+   struct net_device *ndev;
+
+   ndev = rxe->ndev;
+
+   if (ndev->priv_flags & IFF_802_1Q_VLAN)
+   ndev = vlan_dev_real_dev(ndev);
+
+   return ndev->dev.parent;
+}
+
+static int mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+   int err;
+   unsigned char ll_addr[ETH_ALEN];
+
+   ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+   err = dev_mc_add(rxe->ndev, ll_addr);
+
+   return err;
+}
+
+static int mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+   int err;
+   unsigned char ll_addr[ETH_ALEN];
+
+   ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+   err = dev_mc_del(rxe->ndev, ll_addr);
+
+   return err;
+}
+
+static struct rtable *rxe_find_route4(struct in_addr *saddr,
+ struct in_addr *daddr)
+{
+   struct rtable *rt;
+   struct flowi4 fl = { { 0 } };
+
+   memset(&fl, 0, sizeof(fl));
+   memcpy(&fl.saddr, saddr, sizeof(*saddr));
+   memcpy(&fl.daddr, daddr, sizeof(*daddr));
+   fl.flowi4_proto = IPPROTO_UDP;
+
+   rt = ip_route_output_key(&init_net, &fl);
+   if (IS_ERR(rt)) {
+   pr_err("no route to %pI4\n", &daddr->s_addr);
+   return NULL;
+   }
+
+   return rt;
+}
+
+static struct dst_entry *rxe_find_route6(struct net_device *ndev,
+struct in6_addr *saddr,
+struct in6_addr *daddr)
+{
+   struct dst_entry *ndst;
+   struct flowi6 fl6 = { { 0 } };
+
+   memset(&fl6, 0, sizeof(fl6));
+   fl6.flowi6_oif = ndev->ifindex;
+  
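
For readers unfamiliar with the GUID derivation in rxe_mac_to_eui64() above:
it is the usual modified EUI-64 construction, where the 48-bit MAC is split in
half, 0xff 0xfe is inserted in the middle, and the universal/local bit of the
first byte is flipped. A stand-alone sketch of the same byte shuffling follows;
mac_to_eui64() is a hypothetical user-space stand-in that writes into a byte
array instead of returning a __be64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a modified EUI-64 from a 6-byte MAC address. */
static void mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
    assert(mac != NULL && eui != NULL);

    eui[0] = mac[0] ^ 0x02;   /* flip the universal/local bit */
    eui[1] = mac[1];
    eui[2] = mac[2];
    eui[3] = 0xff;            /* fixed filler bytes in the middle */
    eui[4] = 0xfe;
    eui[5] = mac[3];
    eui[6] = mac[4];
    eui[7] = mac[5];
}
```

With the MAC 00:11:22:33:44:55 this yields 02:11:22:ff:fe:33:44:55. Because
node_guid() and port_guid() both call the same helper on rxe->ndev, the node
and port GUIDs come out identical in this version of the driver.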

[PATCH rdma-next 19/32] IB/rxe: Memory Region (MR) handling

2015-09-16 Thread Kamal Heib
MR objects handling.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_mr.c | 764 +++
 1 file changed, 764 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_mr.c

diff --git a/drivers/staging/rxe/rxe_mr.c b/drivers/staging/rxe/rxe_mr.c
new file mode 100644
index 000..89a5c2b
--- /dev/null
+++ b/drivers/staging/rxe/rxe_mr.c
@@ -0,0 +1,764 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/*
+ * lfsr (linear feedback shift register) with period 255
+ */
+static u8 rxe_get_key(void)
+{
+   static unsigned key = 1;
+
+   key = key << 1;
+
+   key |= (0 != (key & 0x100)) ^ (0 != (key & 0x10))
+   ^ (0 != (key & 0x80)) ^ (0 != (key & 0x40));
+
+   key &= 0xff;
+
+   return key;
+}
+
+int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length)
+{
+   switch (mem->type) {
+   case RXE_MEM_TYPE_DMA:
+   return 0;
+
+   case RXE_MEM_TYPE_MR:
+   case RXE_MEM_TYPE_FMR:
+   return ((iova < mem->iova) ||
+   ((iova + length) > (mem->iova + mem->length))) ?
+   -EFAULT : 0;
+
+   default:
+   return -EFAULT;
+   }
+}
+
+#define IB_ACCESS_REMOTE   (IB_ACCESS_REMOTE_READ  \
+   | IB_ACCESS_REMOTE_WRITE\
+   | IB_ACCESS_REMOTE_ATOMIC)
+
+static void rxe_mem_init(int access, struct rxe_mem *mem)
+{
+   u32 lkey = mem->pelem.index << 8 | rxe_get_key();
+   u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0;
+
+   if (mem->pelem.pool->type == RXE_TYPE_MR) {
+   mem->ibmr.lkey  = lkey;
+   mem->ibmr.rkey  = rkey;
+   } else {
+   mem->ibfmr.lkey = lkey;
+   mem->ibfmr.rkey = rkey;
+   }
+
+   mem->pd = NULL;
+   mem->umem   = NULL;
+   mem->lkey   = lkey;
+   mem->rkey   = rkey;
+   mem->state  = RXE_MEM_STATE_INVALID;
+   mem->type   = RXE_MEM_TYPE_NONE;
+   mem->va = 0;
+   mem->iova   = 0;
+   mem->length = 0;
+   mem->offset = 0;
+   mem->access = 0;
+   mem->page_shift = 0;
+   mem->page_mask  = 0;
+   mem->map_shift  = ilog2(RXE_BUF_PER_MAP);
+   mem->map_mask   = 0;
+   mem->num_buf= 0;
+   mem->max_buf= 0;
+   mem->num_map= 0;
+   mem->map= NULL;
+}
+
+void rxe_mem_cleanup(void *arg)
+{
+   struct rxe_mem *mem = arg;
+   int i;
+
+   if (mem->umem)
+   ib_umem_release(mem->umem);
+
+   if (mem->map) {
+   for (i = 0; i < mem->num_map; i++)
+   kfree(mem->map[i]);
+
+   kfree(mem->map);
+   }
+}
+
+static int rxe_mem_alloc(struct rxe_dev *rxe, struct rxe_mem *mem, int num_buf)
+{
+   int i;
+   int num_map;
+   struct rxe_map **map = mem->map;
+
+   num_map = (num_buf + RXE_BUF_PER_MAP - 1) / RXE_BUF_PER_MAP;
+
+   mem->map = kmalloc_array(num_map, sizeof(*map), GFP_KERNEL);
+   if (!mem->map)
+   goto err1;
+
+   for (i = 0; i < num_map; i++) {
+   mem->map[i] = 
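
A side note on rxe_get_key() above, which supplies the low 8 bits of each
lkey/rkey. The sketch below copies the same shift-and-feedback step into a
pure function so its cycle length can be measured in user space; lfsr_step()
and lfsr_period() are hypothetical names. Tracing the recurrence from the seed
1 gives 0x02, 0x04, 0x08, 0x11, 0x22, 0x45, 0x8b, 0x16, 0x2c, 0x58, 0xb0, 0x60,
0xc0, 0x80, 0x01, that is, a cycle of 15 values in this model, so the
"period 255" comment may deserve a second look.

```c
#include <assert.h>
#include <stdint.h>

/* One step of the feedback shift register, same taps as rxe_get_key(). */
static uint8_t lfsr_step(uint8_t state)
{
    unsigned key = (unsigned)state << 1;

    /* Feedback bit: XOR of the post-shift bits 8, 7, 6 and 4. */
    key |= (unsigned)((0 != (key & 0x100)) ^ (0 != (key & 0x10))
                    ^ (0 != (key & 0x80)) ^ (0 != (key & 0x40)));

    return (uint8_t)(key & 0xff);
}

/* Number of steps until the register returns to 'seed' (capped at 256). */
static unsigned lfsr_period(uint8_t seed)
{
    uint8_t s = seed;
    unsigned n = 0;

    assert(seed != 0);   /* the all-zero state maps to itself */
    do {
        s = lfsr_step(s);
        n++;
    } while (s != seed && n < 256);

    return n;
}
```

Calling lfsr_period(1) reports how many distinct key bytes the generator can
hand out before repeating. A shorter cycle only affects how quickly the low
key byte recurs for a reused pool index; the upper bits of the lkey still come
from mem->pelem.index.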

[PATCH rdma-next 10/32] IB/rxe: User/kernel shared queues infrastructure

2015-09-16 Thread Kamal Heib
mmap routines

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_mmap.c | 173 +
 1 file changed, 173 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_mmap.c

diff --git a/drivers/staging/rxe/rxe_mmap.c b/drivers/staging/rxe/rxe_mmap.c
new file mode 100644
index 000..fbe3e1d
--- /dev/null
+++ b/drivers/staging/rxe/rxe_mmap.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+void rxe_mmap_release(struct kref *ref)
+{
+   struct rxe_mmap_info *ip = container_of(ref,
+   struct rxe_mmap_info, ref);
+   struct rxe_dev *rxe = to_rdev(ip->context->device);
+
+   spin_lock_bh(&rxe->pending_lock);
+
+   if (!list_empty(&ip->pending_mmaps))
+   list_del(&ip->pending_mmaps);
+
+   spin_unlock_bh(&rxe->pending_lock);
+
+   vfree(ip->obj); /* buf */
+   kfree(ip);
+}
+
+/*
+ * open and close keep track of how many times the memory region is mapped,
+ * to avoid releasing it.
+ */
+static void rxe_vma_open(struct vm_area_struct *vma)
+{
+   struct rxe_mmap_info *ip = vma->vm_private_data;
+
+   kref_get(&ip->ref);
+}
+
+static void rxe_vma_close(struct vm_area_struct *vma)
+{
+   struct rxe_mmap_info *ip = vma->vm_private_data;
+
+   kref_put(&ip->ref, rxe_mmap_release);
+}
+
+static struct vm_operations_struct rxe_vm_ops = {
+   .open = rxe_vma_open,
+   .close = rxe_vma_close,
+};
+
+/**
+ * rxe_mmap - create a new mmap region
+ * @context: the IB user context of the process making the mmap() call
+ * @vma: the VMA to be initialized
+ * Return zero if the mmap is OK. Otherwise, return an errno.
+ */
+int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+{
+   struct rxe_dev *rxe = to_rdev(context->device);
+   unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+   unsigned long size = vma->vm_end - vma->vm_start;
+   struct rxe_mmap_info *ip, *pp;
+   int ret;
+
+   /*
+* Search the device's list of objects waiting for a mmap call.
+* Normally, this list is very short since a call to create a
+* CQ, QP, or SRQ is soon followed by a call to mmap().
+*/
+   spin_lock_bh(&rxe->pending_lock);
+   list_for_each_entry_safe(ip, pp, &rxe->pending_mmaps, pending_mmaps) {
+   if (context != ip->context || (__u64)offset != ip->info.offset)
+   continue;
+
+   /* Don't allow a mmap larger than the object. */
+   if (size > ip->info.size) {
+   pr_err("mmap region is larger than the object!\n");
+   spin_unlock_bh(&rxe->pending_lock);
+   ret = -EINVAL;
+   goto done;
+   }
+
+   goto found_it;
+   }
+   pr_warn("unable to find pending mmap info\n");
+   spin_unlock_bh(&rxe->pending_lock);
+   ret = -EINVAL;
+   goto done;
+
+found_it:
+   list_del_init(&ip->pending_mmaps);
+   spin_unlock_bh(&rxe->pending_lock);
+
+   ret = remap_vmalloc_range(vma, ip->obj, 0);
+   if (ret) {
+   pr_err("rxe: err %d from remap_vmalloc_range\n", ret);
+   goto done;
+   }
+
+   vma->vm_ops = &rxe_vm_ops;
+   
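
The lookup that rxe_mmap() performs above (match on context and offset, reject a mapping larger than the object, then take the entry off the pending list) can be modelled in user space roughly as follows. This is only a sketch: the sk_ names, the flat array in place of the kernel list, and the "claimed" flag standing in for list_del_init() are assumptions made for illustration, and no locking is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the pending-mmap list. */
struct sk_pending {
	const void *context;		/* owner of the object */
	unsigned long long offset;	/* offset handed back to user space */
	unsigned long size;		/* size of the backing object */
	int claimed;			/* models removal from the list */
};

/*
 * Find the entry matching (context, offset). A request larger than the
 * object is rejected; a match is marked claimed so it cannot be found twice.
 */
int sk_claim_pending(struct sk_pending *list, size_t n, const void *context,
		     unsigned long long offset, unsigned long size,
		     struct sk_pending **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct sk_pending *e = &list[i];

		if (e->claimed || e->context != context || e->offset != offset)
			continue;

		/* Don't allow a mapping larger than the object. */
		if (size > e->size)
			return -EINVAL;

		e->claimed = 1;
		*out = e;
		return 0;
	}

	return -EINVAL;	/* nothing pending for this context/offset */
}

/* Small usage example. */
void sk_mmap_selfcheck(void)
{
	int ctx_a, ctx_b;
	struct sk_pending list[2] = {
		{ &ctx_a, 0, 4096, 0 },
		{ &ctx_a, 4096, 8192, 0 },
	};
	struct sk_pending *hit = NULL;

	assert(sk_claim_pending(list, 2, &ctx_b, 0, 4096, &hit) == -EINVAL);
	assert(sk_claim_pending(list, 2, &ctx_a, 4096, 16384, &hit) == -EINVAL);
	assert(sk_claim_pending(list, 2, &ctx_a, 4096, 8192, &hit) == 0);
	assert(hit == &list[1]);
	/* A second mmap of the same offset no longer finds the entry. */
	assert(sk_claim_pending(list, 2, &ctx_a, 4096, 8192, &hit) == -EINVAL);
}
```

In this model a too-large request leaves the entry pending, which matches the early "goto done" in the patch taking place before list_del_init().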

[PATCH rdma-next 08/32] IB/rxe: Add maintainer for rxe driver

2015-09-16 Thread Kamal Heib
Add maintainer for rxe driver

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 23438a1..916bbf1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6644,6 +6644,15 @@ W:   http://www.mellanox.com
 Q: http://patchwork.ozlabs.org/project/netdev/list/
 F: drivers/net/ethernet/mellanox/mlx4/en_*
 
+SOFT-ROCE DRIVER (rxe)
+M:  Kamal Heib 
+L:  linux-rdma@vger.kernel.org
+S:  Supported
+W:  https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
+Q:  http://patchwork.kernel.org/project/linux-rdma/list/
+F:  drivers/staging/rxe/
+F: include/uapi/rdma/ib_rxe.h
+
 MEMORY MANAGEMENT
 L: linux...@kvack.org
 W: http://www.linux-mm.org
-- 
1.8.3.1



[PATCH rdma-next 02/32] IB/core: Add SEND_LAST_INV and SEND_ONLY_INV opcodes

2015-09-16 Thread Kamal Heib
Introduce SEND_LAST_INV and SEND_ONLY_INV opcodes in ib_pack.h to be
used by RXE for RC.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 include/rdma/ib_pack.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index 3883fcd..12bce2e 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -102,6 +102,8 @@ enum {
IB_OPCODE_ATOMIC_ACKNOWLEDGE= 0x12,
IB_OPCODE_COMPARE_SWAP  = 0x13,
IB_OPCODE_FETCH_ADD = 0x14,
+   IB_OPCODE_SEND_LAST_INV = 0x16,
+   IB_OPCODE_SEND_ONLY_INV = 0x17,
 
/* real constants follow -- see comment about above IB_OPCODE()
   macro for more details */
@@ -128,6 +130,8 @@ enum {
IB_OPCODE(RC, ATOMIC_ACKNOWLEDGE),
IB_OPCODE(RC, COMPARE_SWAP),
IB_OPCODE(RC, FETCH_ADD),
+   IB_OPCODE(RC, SEND_LAST_INV),
+   IB_OPCODE(RC, SEND_ONLY_INV),
 
/* UC */
IB_OPCODE(UC, SEND_FIRST),
-- 
1.8.3.1
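
For readers unfamiliar with the IB_OPCODE() macro touched above: it pastes the transport and operation names together and adds the transport base to the operation value. A stand-alone sketch of that scheme follows. The SK_ names are hypothetical stand-ins so the fragment compiles outside the kernel tree; the numeric values are the ones from the patch and from the RC/UC bases in ib_pack.h as I understand them.

```c
#include <assert.h>

/*
 * Stand-alone model of the opcode layout. The SK_ names are hypothetical
 * stand-ins for the IB_OPCODE_ constants.
 */
enum {
	/* transport bases (upper bits of the opcode) */
	SK_OPCODE_RC = 0x00,
	SK_OPCODE_UC = 0x20,

	/* operations (lower bits of the opcode) */
	SK_OPCODE_SEND_FIRST    = 0x00,
	SK_OPCODE_FETCH_ADD     = 0x14,
	SK_OPCODE_SEND_LAST_INV = 0x16,	/* added by the patch */
	SK_OPCODE_SEND_ONLY_INV = 0x17,	/* added by the patch */
};

/* Builds SK_OPCODE_<transport>_<op> as transport base plus operation. */
#define SK_OPCODE(transport, op) \
	SK_OPCODE_ ## transport ## _ ## op = \
		SK_OPCODE_ ## transport + SK_OPCODE_ ## op

enum {
	SK_OPCODE(RC, FETCH_ADD),
	SK_OPCODE(RC, SEND_LAST_INV),
	SK_OPCODE(RC, SEND_ONLY_INV),
	SK_OPCODE(UC, SEND_FIRST),
};

/* Returns 1 if the opcode is one of the two RC send-with-invalidate forms. */
int sk_is_rc_send_inv(int opcode)
{
	return opcode == SK_OPCODE_RC_SEND_LAST_INV ||
	       opcode == SK_OPCODE_RC_SEND_ONLY_INV;
}

/* Small usage example. */
void sk_opcode_selfcheck(void)
{
	assert(SK_OPCODE_RC_SEND_LAST_INV == 0x16);
	assert(SK_OPCODE_RC_SEND_ONLY_INV == 0x17);
	assert(SK_OPCODE_UC_SEND_FIRST == 0x20);
	assert(sk_is_rc_send_inv(SK_OPCODE_RC_SEND_ONLY_INV));
	assert(!sk_is_rc_send_inv(SK_OPCODE_RC_FETCH_ADD));
}
```

Because the RC base is zero, the two new RC opcodes have the same numeric value as the bare operation codes.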



[PATCH rdma-next 05/32] IB/rxe: Default rxe device and port parameters

2015-09-16 Thread Kamal Heib
Default/initial rxe device parameter settings.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_param.h | 177 
 1 file changed, 177 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_param.h

diff --git a/drivers/staging/rxe/rxe_param.h b/drivers/staging/rxe/rxe_param.h
new file mode 100644
index 0000000..320b8e5
--- /dev/null
+++ b/drivers/staging/rxe/rxe_param.h
@@ -0,0 +1,177 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_PARAM_H
+#define RXE_PARAM_H
+
+static inline enum ib_mtu rxe_mtu_int_to_enum(int mtu)
+{
+   if (mtu < 256)
+   return 0;
+   else if (mtu < 512)
+   return IB_MTU_256;
+   else if (mtu < 1024)
+   return IB_MTU_512;
+   else if (mtu < 2048)
+   return IB_MTU_1024;
+   else if (mtu < 4096)
+   return IB_MTU_2048;
+   else
+   return IB_MTU_4096;
+}
+
+/* Find the IB mtu for a given network MTU. */
+static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
+{
+   mtu -= RXE_MAX_HDR_LENGTH;
+
+   return rxe_mtu_int_to_enum(mtu);
+}
+
+/* default/initial rxe device parameter settings */
+enum rxe_device_param {
+   RXE_FW_VER  = 0,
+   RXE_MAX_MR_SIZE = -1ull,
+   RXE_PAGE_SIZE_CAP   = 0xf000,
+   RXE_VENDOR_ID   = 0,
+   RXE_VENDOR_PART_ID  = 0,
+   RXE_HW_VER  = 0,
+   RXE_MAX_QP  = 0x1,
+   RXE_MAX_QP_WR   = 0x4000,
+   RXE_MAX_INLINE_DATA = 400,
+   RXE_DEVICE_CAP_FLAGS= IB_DEVICE_BAD_PKEY_CNTR
+   | IB_DEVICE_BAD_QKEY_CNTR
+   | IB_DEVICE_AUTO_PATH_MIG
+   | IB_DEVICE_CHANGE_PHY_PORT
+   | IB_DEVICE_UD_AV_PORT_ENFORCE
+   | IB_DEVICE_PORT_ACTIVE_EVENT
+   | IB_DEVICE_SYS_IMAGE_GUID
+   | IB_DEVICE_RC_RNR_NAK_GEN
+   | IB_DEVICE_SRQ_RESIZE,
+   RXE_MAX_SGE = 27,
+   RXE_MAX_SGE_RD  = 0,
+   RXE_MAX_CQ  = 16384,
+   RXE_MAX_LOG_CQE = 13,
+   RXE_MAX_MR  = 2 * 1024,
+   RXE_MAX_PD  = 0x7ffc,
+   RXE_MAX_QP_RD_ATOM  = 128,
+   RXE_MAX_EE_RD_ATOM  = 0,
+   RXE_MAX_RES_RD_ATOM = 0x3f000,
+   RXE_MAX_QP_INIT_RD_ATOM = 128,
+   RXE_MAX_EE_INIT_RD_ATOM = 0,
+   RXE_ATOMIC_CAP  = 1,
+   RXE_MAX_EE  = 0,
+   RXE_MAX_RDD = 0,
+   RXE_MAX_MW  = 0,
+   RXE_MAX_RAW_IPV6_QP = 0,
+   RXE_MAX_RAW_ETHY_QP = 0,
+   RXE_MAX_MCAST_GRP   = 8192,
+   RXE_MAX_MCAST_QP_ATTACH = 56,
+   RXE_MAX_TOT_MCAST_QP_ATTACH = 0x7,
+   RXE_MAX_AH  = 100,
+   RXE_MAX_FMR = 2 * 1024,
+   RXE_MAX_MAP_PER_FMR = 100,
+   RXE_MAX_SRQ = 960,
+   
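
As a worked example of the two MTU helpers in this patch: the Ethernet MTU is first reduced by the RXE header overhead, and the result is rounded down to the largest IB MTU that fits. The sketch below reproduces that logic outside the kernel. The sk_ names are hypothetical, and the header overhead of 80 bytes is an assumption made purely for illustration, since RXE_MAX_HDR_LENGTH is not defined in the part of the patch shown here.

```c
#include <assert.h>

/* Hypothetical mirror of enum ib_mtu; 0 means "no valid IB MTU fits". */
enum sk_mtu {
	SK_MTU_NONE = 0,
	SK_MTU_256  = 1,
	SK_MTU_512  = 2,
	SK_MTU_1024 = 3,
	SK_MTU_2048 = 4,
	SK_MTU_4096 = 5,
};

/* ASSUMPTION: illustrative value only, not taken from the patch. */
#define SK_MAX_HDR_LENGTH 80

/* Round a byte count down to the largest IB MTU that does not exceed it. */
enum sk_mtu sk_mtu_int_to_enum(int mtu)
{
	if (mtu < 256)
		return SK_MTU_NONE;
	else if (mtu < 512)
		return SK_MTU_256;
	else if (mtu < 1024)
		return SK_MTU_512;
	else if (mtu < 2048)
		return SK_MTU_1024;
	else if (mtu < 4096)
		return SK_MTU_2048;
	else
		return SK_MTU_4096;
}

/* Subtract the header overhead from the network MTU, then round down. */
enum sk_mtu sk_eth_mtu_to_enum(int eth_mtu)
{
	return sk_mtu_int_to_enum(eth_mtu - SK_MAX_HDR_LENGTH);
}

/* Small usage example. */
void sk_mtu_selfcheck(void)
{
	assert(sk_eth_mtu_to_enum(1500) == SK_MTU_1024);	/* 1420 payload bytes */
	assert(sk_eth_mtu_to_enum(9000) == SK_MTU_4096);	/* jumbo frames */
	assert(sk_mtu_int_to_enum(255) == SK_MTU_NONE);
}
```

With any overhead below 476 bytes, a standard 1500-byte Ethernet MTU lands on the 1024-byte IB MTU.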

[PATCH rdma-next 11/32] IB/rxe: Common user/kernel queue implementation

2015-09-16 Thread Kamal Heib
A simple circular buffer that can optionally be shared between user
space and the kernel and can be resized.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_queue.c | 217 
 drivers/staging/rxe/rxe_queue.h | 178 
 2 files changed, 395 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_queue.c
 create mode 100644 drivers/staging/rxe/rxe_queue.h

diff --git a/drivers/staging/rxe/rxe_queue.c b/drivers/staging/rxe/rxe_queue.c
new file mode 100644
index 0000000..aabe04b
--- /dev/null
+++ b/drivers/staging/rxe/rxe_queue.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int do_mmap_info(struct rxe_dev *rxe,
+struct ib_udata *udata,
+bool is_req,
+struct ib_ucontext *context,
+struct rxe_queue_buf *buf,
+size_t buf_size,
+struct rxe_mmap_info **ip_p)
+{
+   int err;
+   u32 len, offset;
+   struct rxe_mmap_info *ip = NULL;
+
+   if (udata) {
+   if (is_req) {
+   len = udata->outlen - sizeof(struct mminfo);
+   offset = sizeof(struct mminfo);
+   } else {
+   len = udata->outlen;
+   offset = 0;
+   }
+
+   if (len < sizeof(ip->info))
+   goto err1;
+
+   ip = rxe_create_mmap_info(rxe, buf_size, context, buf);
+   if (!ip)
+   goto err1;
+
+   err = copy_to_user(udata->outbuf + offset, &ip->info,
+  sizeof(ip->info));
+   if (err)
+   goto err2;
+
+   spin_lock_bh(&rxe->pending_lock);
+   list_add(&ip->pending_mmaps, &rxe->pending_mmaps);
+   spin_unlock_bh(&rxe->pending_lock);
+   }
+
+   *ip_p = ip;
+
+   return 0;
+
+err2:
+   kfree(ip);
+err1:
+   return -EINVAL;
+}
+
+struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe,
+int *num_elem,
+unsigned int elem_size)
+{
+   struct rxe_queue *q;
+   size_t buf_size;
+   unsigned int num_slots;
+
+   /* num_elem == 0 is allowed, but uninteresting */
+   if (*num_elem < 0)
+   goto err1;
+
+   q = kmalloc(sizeof(*q), GFP_KERNEL);
+   if (!q)
+   goto err1;
+
+   q->rxe = rxe;
+
+   /* used in resize, only need to copy used part of queue */
+   q->elem_size = elem_size;
+
+   /* pad element up to at least a cacheline and always a power of 2 */
+   if (elem_size < cache_line_size())
+   elem_size = cache_line_size();
+   elem_size = roundup_pow_of_two(elem_size);
+
+   q->log2_elem_size = order_base_2(elem_size);
+
+   num_slots = *num_elem + 1;
+   num_slots = roundup_pow_of_two(num_slots);
+   q->index_mask = num_slots - 1;
+
+   buf_size = sizeof(struct rxe_queue_buf) + num_slots * elem_size;
+
+   q->buf = vmalloc_user(buf_size);
+   if (!q->buf)
+   goto err2;
+
+   q->buf->log2_elem_size = q->log2_elem_size;
+   q->buf->index_mask = q->index_mask;
+
+   q->buf_size = buf_size;
+
+   *num_elem = 
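
To make the sizing arithmetic in rxe_queue_init() concrete: each element is padded to at least a cache line and to a power of two, one extra slot is added to the requested count, and the slot count is rounded up to a power of two so that an index can be wrapped with a mask. A user-space sketch of that calculation follows. The sk_ names are hypothetical, and the cache line size and queue header size are passed in as parameters because their real values depend on the platform and on struct rxe_queue_buf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of the sizing step. */
struct sk_queue_geom {
	unsigned int log2_elem_size;	/* log2 of the padded element size */
	unsigned int index_mask;	/* num_slots - 1 */
	size_t buf_size;		/* header plus all slots */
};

/* Smallest power of two >= v (v == 0 is treated as 1). */
static unsigned int sk_roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* log2 of a power of two. */
static unsigned int sk_log2(unsigned int v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

struct sk_queue_geom sk_queue_geometry(unsigned int num_elem,
				       unsigned int elem_size,
				       unsigned int cache_line,
				       size_t hdr_size)
{
	struct sk_queue_geom g;
	unsigned int num_slots;

	/* pad element up to at least a cache line and always a power of 2 */
	if (elem_size < cache_line)
		elem_size = cache_line;
	elem_size = sk_roundup_pow2(elem_size);
	g.log2_elem_size = sk_log2(elem_size);

	/* one slot more than requested, rounded up to a power of 2 */
	num_slots = sk_roundup_pow2(num_elem + 1);
	g.index_mask = num_slots - 1;

	g.buf_size = hdr_size + (size_t)num_slots * elem_size;
	return g;
}

/* Small usage example: 100 elements of 40 bytes, 64-byte cache line. */
void sk_queue_selfcheck(void)
{
	struct sk_queue_geom g = sk_queue_geometry(100, 40, 64, 128);

	assert(g.log2_elem_size == 6);		/* 40 -> 64 bytes */
	assert(g.index_mask == 127);		/* 101 -> 128 slots */
	assert(g.buf_size == 128 + 128 * 64);
}
```

The extra slot is presumably there so that a full queue can be told apart from an empty one; that reading is an inference, since the producer/consumer helpers are not part of the text shown above.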

[PATCH rdma-next 07/32] IB/rxe: Misc local interfaces between files in ib_rxe

2015-09-16 Thread Kamal Heib
Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_loc.h | 291 ++
 1 file changed, 291 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_loc.h

diff --git a/drivers/staging/rxe/rxe_loc.h b/drivers/staging/rxe/rxe_loc.h
new file mode 100644
index 0000000..814b51d
--- /dev/null
+++ b/drivers/staging/rxe/rxe_loc.h
@@ -0,0 +1,291 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_LOC_H
+#define RXE_LOC_H
+
+/* rxe_av.c */
+int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr);
+
+int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
+struct rxe_av *av, struct ib_ah_attr *attr);
+
+int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
+  struct ib_ah_attr *attr);
+
+int rxe_av_fill_ip_info(struct rxe_dev *rxe,
+   struct rxe_av *av,
+   struct ib_ah_attr *attr,
+   struct ib_gid_attr *sgid_attr,
+   union ib_gid *sgid);
+
+/* rxe_cq.c */
+int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
+   int cqe, int comp_vector, struct ib_udata *udata);
+
+int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
+int comp_vector, struct ib_ucontext *context,
+struct ib_udata *udata);
+
+int rxe_cq_resize_queue(struct rxe_cq *cq, int new_cqe, struct ib_udata *udata);
+
+int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited);
+
+void rxe_cq_cleanup(void *arg);
+
+/* rxe_mcast.c */
+int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+ struct rxe_mc_grp **grp_p);
+
+int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+  struct rxe_mc_grp *grp);
+
+int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+   union ib_gid *mgid);
+
+void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
+
+void rxe_mc_cleanup(void *arg);
+
+/* rxe_mmap.c */
+struct rxe_mmap_info {
+   struct list_headpending_mmaps;
+   struct ib_ucontext  *context;
+   struct kref ref;
+   void*obj;
+
+   struct mminfo info;
+};
+
+void rxe_mmap_release(struct kref *ref);
+
+struct rxe_mmap_info *rxe_create_mmap_info(struct rxe_dev *dev,
+  u32 size,
+  struct ib_ucontext *context,
+  void *obj);
+
+int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
+
+/* rxe_mr.c */
+enum copy_direction {
+   to_mem_obj,
+   from_mem_obj,
+};
+
+int rxe_mem_init_dma(struct rxe_dev *rxe, struct rxe_pd *pd,
+int access, struct rxe_mem *mem);
+
+int rxe_mem_init_phys(struct rxe_dev *rxe, struct rxe_pd *pd,
+ int access, u64 iova, struct ib_phys_buf *buf,
+ int num_buf, struct rxe_mem *mem);
+
+int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
+ u64 length, u64 iova, int access, struct ib_udata *udata,
+ struct rxe_mem *mr);
+
+int rxe_mem_init_fast(struct rxe_dev *rxe, struct rxe_pd *pd,
+ int max_pages, struct rxe_mem *mem);
+
+int rxe_mem_init_mw(struct rxe_dev *rxe, 

[PATCH rdma-next 04/32] IB/rxe: Bit mask and lengths declaration for different opcodes

2015-09-16 Thread Kamal Heib
Header bit mask definitions and header lengths, plus declarations of the
rxe_opcode_info and rxe_wr_opcode_info structs.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_opcode.h | 128 +++
 1 file changed, 128 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_opcode.h

diff --git a/drivers/staging/rxe/rxe_opcode.h b/drivers/staging/rxe/rxe_opcode.h
new file mode 100644
index 0000000..3682c16
--- /dev/null
+++ b/drivers/staging/rxe/rxe_opcode.h
@@ -0,0 +1,128 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_OPCODE_H
+#define RXE_OPCODE_H
+
+/*
+ * contains header bit mask definitions and header lengths
+ * declaration of the rxe_opcode_info struct and
+ * rxe_wr_opcode_info struct
+ */
+
+enum rxe_wr_mask {
+   WR_INLINE_MASK  = BIT(0),
+   WR_ATOMIC_MASK  = BIT(1),
+   WR_SEND_MASK= BIT(2),
+   WR_READ_MASK= BIT(3),
+   WR_WRITE_MASK   = BIT(4),
+   WR_LOCAL_MASK   = BIT(5),
+
+   WR_READ_OR_WRITE_MASK   = WR_READ_MASK | WR_WRITE_MASK,
+   WR_READ_WRITE_OR_SEND_MASK  = WR_READ_OR_WRITE_MASK | WR_SEND_MASK,
+   WR_WRITE_OR_SEND_MASK   = WR_WRITE_MASK | WR_SEND_MASK,
+   WR_ATOMIC_OR_READ_MASK  = WR_ATOMIC_MASK | WR_READ_MASK,
+};
+
+#define WR_MAX_QPT (8)
+
+struct rxe_wr_opcode_info {
+   char*name;
+   enum rxe_wr_maskmask[WR_MAX_QPT];
+};
+
+extern struct rxe_wr_opcode_info rxe_wr_opcode_info[];
+
+enum rxe_hdr_type {
+   RXE_LRH,
+   RXE_GRH,
+   RXE_BTH,
+   RXE_RETH,
+   RXE_AETH,
+   RXE_ATMETH,
+   RXE_ATMACK,
+   RXE_IETH,
+   RXE_RDETH,
+   RXE_DETH,
+   RXE_IMMDT,
+   RXE_PAYLOAD,
+   NUM_HDR_TYPES
+};
+
+enum rxe_hdr_mask {
+   RXE_LRH_MASK= BIT(RXE_LRH),
+   RXE_GRH_MASK= BIT(RXE_GRH),
+   RXE_BTH_MASK= BIT(RXE_BTH),
+   RXE_IMMDT_MASK  = BIT(RXE_IMMDT),
+   RXE_RETH_MASK   = BIT(RXE_RETH),
+   RXE_AETH_MASK   = BIT(RXE_AETH),
+   RXE_ATMETH_MASK = BIT(RXE_ATMETH),
+   RXE_ATMACK_MASK = BIT(RXE_ATMACK),
+   RXE_IETH_MASK   = BIT(RXE_IETH),
+   RXE_RDETH_MASK  = BIT(RXE_RDETH),
+   RXE_DETH_MASK   = BIT(RXE_DETH),
+   RXE_PAYLOAD_MASK= BIT(RXE_PAYLOAD),
+
+   RXE_REQ_MASK= BIT(NUM_HDR_TYPES + 0),
+   RXE_ACK_MASK= BIT(NUM_HDR_TYPES + 1),
+   RXE_SEND_MASK   = BIT(NUM_HDR_TYPES + 2),
+   RXE_WRITE_MASK  = BIT(NUM_HDR_TYPES + 3),
+   RXE_READ_MASK   = BIT(NUM_HDR_TYPES + 4),
+   RXE_ATOMIC_MASK = BIT(NUM_HDR_TYPES + 5),
+
+   RXE_RWR_MASK= BIT(NUM_HDR_TYPES + 6),
+   RXE_COMP_MASK   = BIT(NUM_HDR_TYPES + 7),
+
+   RXE_START_MASK  = BIT(NUM_HDR_TYPES + 8),
+   RXE_MIDDLE_MASK = BIT(NUM_HDR_TYPES + 9),
+   RXE_END_MASK= BIT(NUM_HDR_TYPES + 10),
+
+   RXE_LOOPBACK_MASK   = BIT(NUM_HDR_TYPES + 12),
+
+   RXE_READ_OR_ATOMIC  = (RXE_READ_MASK | RXE_ATOMIC_MASK),
+   RXE_WRITE_OR_SEND   = (RXE_WRITE_MASK | RXE_SEND_MASK),
+};
+
+#define OPCODE_NONE(-1)
+#define 
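
The rxe_hdr_mask enum above packs two kinds of flags into one word: the low bits say which headers a packet carries (one bit per rxe_hdr_type), and the bits from NUM_HDR_TYPES upward describe the kind of operation and its position in a message. A reduced, stand-alone sketch of how such a mask might be built and queried is shown below. The sk_ names are hypothetical, only a subset of the flags is reproduced, and the two example masks are illustrative guesses rather than entries copied from the driver's opcode table.

```c
#include <assert.h>

#define SK_BIT(n) (1u << (n))

/* Same ordering as enum rxe_hdr_type in the patch. */
enum sk_hdr_type {
	SK_LRH, SK_GRH, SK_BTH, SK_RETH, SK_AETH, SK_ATMETH,
	SK_ATMACK, SK_IETH, SK_RDETH, SK_DETH, SK_IMMDT, SK_PAYLOAD,
	SK_NUM_HDR_TYPES
};

/* Subset of the mask bits, laid out as in enum rxe_hdr_mask. */
enum sk_hdr_mask {
	SK_BTH_MASK      = SK_BIT(SK_BTH),
	SK_PAYLOAD_MASK  = SK_BIT(SK_PAYLOAD),

	SK_REQ_MASK      = SK_BIT(SK_NUM_HDR_TYPES + 0),
	SK_SEND_MASK     = SK_BIT(SK_NUM_HDR_TYPES + 2),
	SK_START_MASK    = SK_BIT(SK_NUM_HDR_TYPES + 8),
	SK_MIDDLE_MASK   = SK_BIT(SK_NUM_HDR_TYPES + 9),
	SK_END_MASK      = SK_BIT(SK_NUM_HDR_TYPES + 10),
	SK_LOOPBACK_MASK = SK_BIT(SK_NUM_HDR_TYPES + 12),
};

/* Illustrative masks for a send-only and a send-middle request packet. */
#define SK_EXAMPLE_SEND_ONLY \
	(SK_PAYLOAD_MASK | SK_REQ_MASK | SK_SEND_MASK | \
	 SK_START_MASK | SK_END_MASK)
#define SK_EXAMPLE_SEND_MIDDLE \
	(SK_PAYLOAD_MASK | SK_REQ_MASK | SK_SEND_MASK | SK_MIDDLE_MASK)

/* A message that starts and ends in the same packet is a single packet. */
int sk_is_single_packet(unsigned int mask)
{
	return (mask & (SK_START_MASK | SK_END_MASK)) ==
	       (SK_START_MASK | SK_END_MASK);
}

/* Does the packet described by mask carry the given header type? */
int sk_has_header(unsigned int mask, enum sk_hdr_type type)
{
	return (mask & SK_BIT(type)) != 0;
}

/* Small usage example. */
void sk_mask_selfcheck(void)
{
	assert(SK_NUM_HDR_TYPES == 12);
	assert(SK_REQ_MASK == SK_BIT(12));
	assert(SK_LOOPBACK_MASK == SK_BIT(24));
	assert(sk_is_single_packet(SK_EXAMPLE_SEND_ONLY));
	assert(!sk_is_single_packet(SK_EXAMPLE_SEND_MIDDLE));
	assert(sk_has_header(SK_EXAMPLE_SEND_ONLY, SK_PAYLOAD));
	assert(!sk_has_header(SK_EXAMPLE_SEND_ONLY, SK_RETH));
}
```

Since NUM_HDR_TYPES is 12 and the highest offset used in the patch is 12, the flags occupy bits 0 through 24 and fit in a 32-bit value.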

[PATCH rdma-next 17/32] IB/rxe: Completion Queue (CQ) manipulation functions

2015-09-16 Thread Kamal Heib
Functions to manipulate CQ.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_cq.c | 165 +++
 1 file changed, 165 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_cq.c

diff --git a/drivers/staging/rxe/rxe_cq.c b/drivers/staging/rxe/rxe_cq.c
new file mode 100644
index 0000000..a572e4d
--- /dev/null
+++ b/drivers/staging/rxe/rxe_cq.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
+   int cqe, int comp_vector, struct ib_udata *udata)
+{
+   int count;
+
+   if (cqe <= 0) {
+   pr_warn("cqe(%d) <= 0\n", cqe);
+   goto err1;
+   }
+
+   if (cqe > rxe->attr.max_cqe) {
+   pr_warn("cqe(%d) > max_cqe(%d)\n",
+   cqe, rxe->attr.max_cqe);
+   goto err1;
+   }
+
+   if (cq) {
+   count = queue_count(cq->queue);
+   if (cqe < count) {
+   pr_warn("cqe(%d) < current # elements in queue (%d)\n",
+   cqe, count);
+   goto err1;
+   }
+   }
+
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+static void rxe_send_complete(unsigned long data)
+{
+   struct rxe_cq *cq = (struct rxe_cq *)data;
+
+   cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
+}
+
+int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
+int comp_vector, struct ib_ucontext *context,
+struct ib_udata *udata)
+{
+   int err;
+
+   cq->queue = rxe_queue_init(rxe, &cqe,
+  sizeof(struct rxe_cqe));
+   if (!cq->queue) {
+   pr_warn("unable to create cq\n");
+   return -ENOMEM;
+   }
+
+   err = do_mmap_info(rxe, udata, false, context, cq->queue->buf,
+  cq->queue->buf_size, &cq->queue->ip);
+   if (err) {
+   kvfree(cq->queue->buf);
+   kfree(cq->queue);
+   return err;
+   }
+
+   if (udata)
+   cq->is_user = 1;
+
+   tasklet_init(&cq->comp_task, rxe_send_complete, (unsigned long)cq);
+
+   spin_lock_init(&cq->cq_lock);
+   cq->ibcq.cqe = cqe;
+   return 0;
+}
+
+int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe, struct ib_udata *udata)
+{
+   int err;
+
+   err = rxe_queue_resize(cq->queue, (unsigned int *)&cqe,
+  sizeof(struct rxe_cqe),
+  cq->queue->ip ? cq->queue->ip->context : NULL,
+  udata, NULL, &cq->cq_lock);
+   if (!err)
+   cq->ibcq.cqe = cqe;
+
+   return err;
+}
+
+int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
+{
+   struct ib_event ev;
+   unsigned long flags;
+
+   spin_lock_irqsave(&cq->cq_lock, flags);
+
+   if (unlikely(queue_full(cq->queue))) {
+   spin_unlock_irqrestore(&cq->cq_lock, flags);
+   if (cq->ibcq.event_handler) {
+   ev.device = cq->ibcq.device;
+   ev.element.cq = &cq->ibcq;
+   ev.event = IB_EVENT_CQ_ERR;
+   cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
+   }

Re: [PATCH for-next V1 00/10] Add RoCE GID cache usage in verbs/cma

2015-09-16 Thread Or Gerlitz

On 8/7/2015 4:00 PM, Matan Barak wrote:

The purpose of this series is to add usage of the GID cache to
the CMA and IB stack. Instead of passing Ethernet L2 attributes
via QP attributes, we could just use the GID cache that already
points to an ndev and thus to all required L2 attributes.


[...]

Hi Doug,

So 4.3-rc1 is out by now... can we start making progress on this series 
and the RoCE V2 one?



Or.


Re: [PATCH rdma-next 32/32] IB/rxe: TODO file while in staging

2015-09-16 Thread Sagi Grimberg

On 9/16/2015 4:43 PM, Kamal Heib wrote:

From: Amir Vadai 

Things to do in order to get out of the staging subtree.

Signed-off-by: Amir Vadai 
Signed-off-by: Kamal Heib 
---
  drivers/staging/rxe/TODO | 15 +++
  1 file changed, 15 insertions(+)
  create mode 100644 drivers/staging/rxe/TODO

diff --git a/drivers/staging/rxe/TODO b/drivers/staging/rxe/TODO
new file mode 100644
index 0000000..a621b27
--- /dev/null
+++ b/drivers/staging/rxe/TODO
@@ -0,0 +1,15 @@
+Aug, 2015
+
+- Remove software processing of IB protocol and place in library for use
+  by qib, ipath (if still present), hfi1, and soft-roce
+- Do not use tasklet in completion flow
+- Need to free resources if user space didn't.
+- Share structures from ib_user_verbs.h instead of copying in ib_rxe.h
+- Move IBA header types and methods from rxe_hdr.h into IB core
+- Cleanup members of rxe_pkt_info that already exists in packet header
+- Refactor post_send_one function to get better performance.
+- Refactor rxe_mem struct to be clear what is type of memory that it's holding.
+- Use single reference count from the pool to the device, instead of having a single
+  reference on the device kept by each element in the pool.
+- Calculate ICRC for incoming packets.
+- Use hash table to hold net_info instead of fixed size array used now.



Let's add:
- Support work request interface memory registration (I'll look into
that).


[PATCH rdma-next 22/32] IB/rxe: Completion handling

2015-09-16 Thread Kamal Heib
Handling of Work Completions.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_comp.c | 728 +
 1 file changed, 728 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_comp.c

diff --git a/drivers/staging/rxe/rxe_comp.c b/drivers/staging/rxe/rxe_comp.c
new file mode 100644
index 0000000..00a4cf7
--- /dev/null
+++ b/drivers/staging/rxe/rxe_comp.c
@@ -0,0 +1,728 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+#include "rxe_task.h"
+
+enum comp_state {
+   COMPST_GET_ACK,
+   COMPST_GET_WQE,
+   COMPST_COMP_WQE,
+   COMPST_COMP_ACK,
+   COMPST_CHECK_PSN,
+   COMPST_CHECK_ACK,
+   COMPST_READ,
+   COMPST_ATOMIC,
+   COMPST_WRITE_SEND,
+   COMPST_UPDATE_COMP,
+   COMPST_ERROR_RETRY,
+   COMPST_RNR_RETRY,
+   COMPST_ERROR,
+   COMPST_EXIT, /* We have an issue, and we want to rerun the completer */
+   COMPST_DONE, /* The completer finished successfully */
+};
+
+static char *comp_state_name[] =  {
+   [COMPST_GET_ACK]= "GET ACK",
+   [COMPST_GET_WQE]= "GET WQE",
+   [COMPST_COMP_WQE]   = "COMP WQE",
+   [COMPST_COMP_ACK]   = "COMP ACK",
+   [COMPST_CHECK_PSN]  = "CHECK PSN",
+   [COMPST_CHECK_ACK]  = "CHECK ACK",
+   [COMPST_READ]   = "READ",
+   [COMPST_ATOMIC] = "ATOMIC",
+   [COMPST_WRITE_SEND] = "WRITE/SEND",
+   [COMPST_UPDATE_COMP]= "UPDATE COMP",
+   [COMPST_ERROR_RETRY]= "ERROR RETRY",
+   [COMPST_RNR_RETRY]  = "RNR RETRY",
+   [COMPST_ERROR]  = "ERROR",
+   [COMPST_EXIT]   = "EXIT",
+   [COMPST_DONE]   = "DONE",
+};
+
+static unsigned long rnrnak_usec[32] = {
+   [IB_RNR_TIMER_655_36] = 655360,
+   [IB_RNR_TIMER_000_01] = 10,
+   [IB_RNR_TIMER_000_02] = 20,
+   [IB_RNR_TIMER_000_03] = 30,
+   [IB_RNR_TIMER_000_04] = 40,
+   [IB_RNR_TIMER_000_06] = 60,
+   [IB_RNR_TIMER_000_08] = 80,
+   [IB_RNR_TIMER_000_12] = 120,
+   [IB_RNR_TIMER_000_16] = 160,
+   [IB_RNR_TIMER_000_24] = 240,
+   [IB_RNR_TIMER_000_32] = 320,
+   [IB_RNR_TIMER_000_48] = 480,
+   [IB_RNR_TIMER_000_64] = 640,
+   [IB_RNR_TIMER_000_96] = 960,
+   [IB_RNR_TIMER_001_28] = 1280,
+   [IB_RNR_TIMER_001_92] = 1920,
+   [IB_RNR_TIMER_002_56] = 2560,
+   [IB_RNR_TIMER_003_84] = 3840,
+   [IB_RNR_TIMER_005_12] = 5120,
+   [IB_RNR_TIMER_007_68] = 7680,
+   [IB_RNR_TIMER_010_24] = 10240,
+   [IB_RNR_TIMER_015_36] = 15360,
+   [IB_RNR_TIMER_020_48] = 20480,
+   [IB_RNR_TIMER_030_72] = 30720,
+   [IB_RNR_TIMER_040_96] = 40960,
+   [IB_RNR_TIMER_061_44] = 61440,
+   [IB_RNR_TIMER_081_92] = 81920,
+   [IB_RNR_TIMER_122_88] = 122880,
+   [IB_RNR_TIMER_163_84] = 163840,
+   [IB_RNR_TIMER_245_76] = 245760,
+   [IB_RNR_TIMER_327_68] = 327680,
+   [IB_RNR_TIMER_491_52] = 491520,
+};
+
+static inline unsigned long rnrnak_jiffies(u8 timeout)
+{
+   return max_t(unsigned long,
+   usecs_to_jiffies(rnrnak_usec[timeout]), 1);
+}
+
+static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
+{
+   switch (opcode) {
+   case 

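Note on rnrnak_jiffies() above: the helper converts an RNR NAK delay in microseconds to timer ticks and never returns fewer than one tick. A minimal user-space sketch of that arithmetic follows. sketch_rnr_ticks() and its hz parameter are hypothetical names, and the round-up division is an assumed stand-in for usecs_to_jiffies(), not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Convert microseconds to timer ticks at tick rate hz, rounding up,
 * and clamp the result to at least one tick (hypothetical helper).
 */
static unsigned long sketch_rnr_ticks(unsigned long usec, unsigned long hz)
{
	uint64_t ticks;

	assert(hz > 0);
	ticks = ((uint64_t)usec * hz + 999999u) / 1000000u;
	return ticks ? (unsigned long)ticks : 1;
}
```

With hz = 250, the 10 usec entry still yields one tick, so the shortest RNR delays are bounded below by the tick period rather than by the table value.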
[PATCH rdma-next 21/32] IB/rxe: Received packets handling

2015-09-16 Thread Kamal Heib
Handles receiving new packets which are sent to either request or
response processing.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_recv.c | 371 +
 1 file changed, 371 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_recv.c

diff --git a/drivers/staging/rxe/rxe_recv.c b/drivers/staging/rxe/rxe_recv.c
new file mode 100644
index 000..092
--- /dev/null
+++ b/drivers/staging/rxe/rxe_recv.c
@@ -0,0 +1,371 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+static int check_type_state(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+   struct rxe_qp *qp)
+{
+   if (unlikely(!qp->valid))
+   goto err1;
+
+   switch (qp_type(qp)) {
+   case IB_QPT_RC:
+   if (unlikely((pkt->opcode & IB_OPCODE_RC) != 0)) {
+   pr_warn_ratelimited("bad qp type\n");
+   goto err1;
+   }
+   break;
+   case IB_QPT_UC:
+   if (unlikely(!(pkt->opcode & IB_OPCODE_UC))) {
+   pr_warn_ratelimited("bad qp type\n");
+   goto err1;
+   }
+   break;
+   case IB_QPT_UD:
+   case IB_QPT_SMI:
+   case IB_QPT_GSI:
+   if (unlikely(!(pkt->opcode & IB_OPCODE_UD))) {
+   pr_warn_ratelimited("bad qp type\n");
+   goto err1;
+   }
+   break;
+   default:
+   pr_warn_ratelimited("unsupported qp type\n");
+   goto err1;
+   }
+
+   if (pkt->mask & RXE_REQ_MASK) {
+   if (unlikely(qp->resp.state != QP_STATE_READY))
+   goto err1;
+   } else if (unlikely(qp->req.state < QP_STATE_READY ||
+   qp->req.state > QP_STATE_DRAINED))
+   goto err1;
+
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+static void set_bad_pkey_cntr(struct rxe_port *port)
+{
+   spin_lock_bh(&port->port_lock);
+   port->attr.bad_pkey_cntr = min((u32)0xffff,
+  port->attr.bad_pkey_cntr + 1);
+   spin_unlock_bh(&port->port_lock);
+}
+
+static void set_qkey_viol_cntr(struct rxe_port *port)
+{
+   spin_lock_bh(&port->port_lock);
+   port->attr.qkey_viol_cntr = min((u32)0xffff,
+   port->attr.qkey_viol_cntr + 1);
+   spin_unlock_bh(&port->port_lock);
+}
+
+static int check_keys(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+ u32 qpn, struct rxe_qp *qp)
+{
+   int i;
+   int found_pkey = 0;
+   struct rxe_port *port = &rxe->port[pkt->port_num - 1];
+   u16 pkey = bth_pkey(pkt);
+
+   pkt->pkey_index = 0;
+
+   if (qpn == 1) {
+   for (i = 0; i < port->attr.pkey_tbl_len; i++) {
+   if (pkey_match(pkey, port->pkey_tbl[i])) {
+   pkt->pkey_index = i;
+   found_pkey = 1;
+   break;
+   }
+   }
+
+   if (!found_pkey) {
+   pr_warn_ratelimited("bad pkey = 0x%x\n", pkey);
+   set_bad_pkey_cntr(port);
+   goto err1;
+   }
+   } else if (qpn != 0) {
+

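Note on set_bad_pkey_cntr() and set_qkey_viol_cntr() above: both bump a counter that must stick at its maximum instead of wrapping. A minimal sketch of that saturating increment, assuming a 16-bit counter; sketch_sat_inc16() is a hypothetical name and the locking done by the patch is left out.

```c
#include <stdint.h>

/* Increment a 16-bit counter, saturating at 0xffff instead of wrapping.
 * The addition is done in 32 bits so that 0xffff + 1 is representable
 * before the clamp is applied.
 */
static uint16_t sketch_sat_inc16(uint16_t cntr)
{
	uint32_t next = (uint32_t)cntr + 1;

	return (uint16_t)(next < 0xffffu ? next : 0xffffu);
}
```

The widening step matters: clamping after a 16-bit addition would be too late, because the sum would already have wrapped to zero.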
[PATCH rdma-next 20/32] IB/rxe: Multicast implementation

2015-09-16 Thread Kamal Heib
Multicast groups handling.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_mcast.c | 190 
 1 file changed, 190 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_mcast.c

diff --git a/drivers/staging/rxe/rxe_mcast.c b/drivers/staging/rxe/rxe_mcast.c
new file mode 100644
index 000..bcf37be
--- /dev/null
+++ b/drivers/staging/rxe/rxe_mcast.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+ struct rxe_mc_grp **grp_p)
+{
+   int err;
+   struct rxe_mc_grp *grp;
+
+   if (rxe->attr.max_mcast_qp_attach == 0) {
+   err = -EINVAL;
+   goto err1;
+   }
+
+   grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+   if (grp)
+   goto done;
+
+   grp = rxe_alloc(&rxe->mc_grp_pool);
+   if (!grp) {
+   err = -ENOMEM;
+   goto err1;
+   }
+
+   INIT_LIST_HEAD(&grp->qp_list);
+   spin_lock_init(&grp->mcg_lock);
+   grp->rxe = rxe;
+
+   rxe_add_key(grp, mgid);
+
+   err = rxe->ifc_ops->mcast_add(rxe, mgid);
+   if (err)
+   goto err2;
+
+done:
+   *grp_p = grp;
+   return 0;
+
+err2:
+   rxe_drop_ref(grp);
+err1:
+   return err;
+}
+
+int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+  struct rxe_mc_grp *grp)
+{
+   int err;
+   struct rxe_mc_elem *elem;
+
+   /* check to see if the qp is already a member of the group */
+   spin_lock_bh(&qp->grp_lock);
+   spin_lock_bh(&grp->mcg_lock);
+   list_for_each_entry(elem, &grp->qp_list, qp_list) {
+   if (elem->qp == qp) {
+   err = 0;
+   goto out;
+   }
+   }
+
+   if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   elem = rxe_alloc(&rxe->mc_elem_pool);
+   if (!elem) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   /* each qp holds a ref on the grp */
+   rxe_add_ref(grp);
+
+   grp->num_qp++;
+   elem->qp = qp;
+   elem->grp = grp;
+
+   list_add(&elem->qp_list, &grp->qp_list);
+   list_add(&elem->grp_list, &qp->grp_list);
+
+   err = 0;
+out:
+   spin_unlock_bh(&grp->mcg_lock);
+   spin_unlock_bh(&qp->grp_lock);
+   return err;
+}
+
+int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+   union ib_gid *mgid)
+{
+   struct rxe_mc_grp *grp;
+   struct rxe_mc_elem *elem, *tmp;
+
+   grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+   if (!grp)
+   goto err1;
+
+   spin_lock_bh(&qp->grp_lock);
+   spin_lock_bh(&grp->mcg_lock);
+
+   list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
+   if (elem->qp == qp) {
+   list_del(&elem->qp_list);
+   list_del(&elem->grp_list);
+   grp->num_qp--;
+
+   spin_unlock_bh(&grp->mcg_lock);
+   spin_unlock_bh(&qp->grp_lock);
+   rxe_drop_ref(elem);
+   rxe_drop_ref(grp);  /* ref held by QP */
+   rxe_drop_ref(grp);  /* 

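Note on rxe_mcast_add_grp_elem() above: attaching a QP to a group is idempotent, is capped by max_mcast_qp_attach, and takes one group reference per attached QP. A rough user-space sketch of that bookkeeping follows. All sketch_* names and the fixed-size array are hypothetical simplifications; the patch uses linked lists, pools and two spinlocks, none of which are modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_ATTACH 4	/* stands in for max_mcast_qp_attach */

struct sketch_grp {
	const void *qp[SKETCH_MAX_ATTACH];	/* attached QPs */
	size_t num_qp;
	int refcnt;				/* one reference per attached QP */
};

/* Attach qp to grp. Returns 0 on success or if already attached,
 * -ENOMEM when the group is full (the same error code the patch uses).
 */
static int sketch_attach(struct sketch_grp *grp, const void *qp)
{
	size_t i;

	assert(grp && qp);

	for (i = 0; i < grp->num_qp; i++)
		if (grp->qp[i] == qp)
			return 0;	/* already a member, nothing to do */

	if (grp->num_qp >= SKETCH_MAX_ATTACH)
		return -ENOMEM;

	grp->qp[grp->num_qp++] = qp;
	grp->refcnt++;
	return 0;
}

/* Attach two distinct QPs, one of them twice; returns the member count. */
static size_t sketch_attach_demo(void)
{
	struct sketch_grp grp = { .num_qp = 0 };
	int qp_a = 0, qp_b = 0;

	sketch_attach(&grp, &qp_a);
	sketch_attach(&grp, &qp_a);
	sketch_attach(&grp, &qp_b);
	return grp.num_qp;
}
```

The duplicate check runs before the capacity check, so re-attaching an existing member succeeds even when the group is full.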
[PATCH rdma-next 24/32] IB/rxe: QP response handling

2015-09-16 Thread Kamal Heib
QP response logic.

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_resp.c | 1368 
 1 file changed, 1368 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_resp.c

diff --git a/drivers/staging/rxe/rxe_resp.c b/drivers/staging/rxe/rxe_resp.c
new file mode 100644
index 000..78304c6
--- /dev/null
+++ b/drivers/staging/rxe/rxe_resp.c
@@ -0,0 +1,1368 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+enum resp_states {
+   RESPST_NONE,
+   RESPST_GET_REQ,
+   RESPST_CHK_PSN,
+   RESPST_CHK_OP_SEQ,
+   RESPST_CHK_OP_VALID,
+   RESPST_CHK_RESOURCE,
+   RESPST_CHK_LENGTH,
+   RESPST_CHK_RKEY,
+   RESPST_EXECUTE,
+   RESPST_READ_REPLY,
+   RESPST_COMPLETE,
+   RESPST_ACKNOWLEDGE,
+   RESPST_CLEANUP,
+   RESPST_DUPLICATE_REQUEST,
+   RESPST_ERR_MALFORMED_WQE,
+   RESPST_ERR_UNSUPPORTED_OPCODE,
+   RESPST_ERR_MISALIGNED_ATOMIC,
+   RESPST_ERR_PSN_OUT_OF_SEQ,
+   RESPST_ERR_MISSING_OPCODE_FIRST,
+   RESPST_ERR_MISSING_OPCODE_LAST_C,
+   RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+   RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+   RESPST_ERR_RNR,
+   RESPST_ERR_RKEY_VIOLATION,
+   RESPST_ERR_LENGTH,
+   RESPST_ERR_CQ_OVERFLOW,
+   RESPST_ERROR,
+   RESPST_RESET,
+   RESPST_DONE,
+   RESPST_EXIT,
+};
+
+static char *resp_state_name[] = {
+   [RESPST_NONE]   = "NONE",
+   [RESPST_GET_REQ]= "GET_REQ",
+   [RESPST_CHK_PSN]= "CHK_PSN",
+   [RESPST_CHK_OP_SEQ] = "CHK_OP_SEQ",
+   [RESPST_CHK_OP_VALID]   = "CHK_OP_VALID",
+   [RESPST_CHK_RESOURCE]   = "CHK_RESOURCE",
+   [RESPST_CHK_LENGTH] = "CHK_LENGTH",
+   [RESPST_CHK_RKEY]   = "CHK_RKEY",
+   [RESPST_EXECUTE]= "EXECUTE",
+   [RESPST_READ_REPLY] = "READ_REPLY",
+   [RESPST_COMPLETE]   = "COMPLETE",
+   [RESPST_ACKNOWLEDGE]= "ACKNOWLEDGE",
+   [RESPST_CLEANUP]= "CLEANUP",
+   [RESPST_DUPLICATE_REQUEST]  = "DUPLICATE_REQUEST",
+   [RESPST_ERR_MALFORMED_WQE]  = "ERR_MALFORMED_WQE",
+   [RESPST_ERR_UNSUPPORTED_OPCODE] = "ERR_UNSUPPORTED_OPCODE",
+   [RESPST_ERR_MISALIGNED_ATOMIC]  = "ERR_MISALIGNED_ATOMIC",
+   [RESPST_ERR_PSN_OUT_OF_SEQ] = "ERR_PSN_OUT_OF_SEQ",
+   [RESPST_ERR_MISSING_OPCODE_FIRST]   = "ERR_MISSING_OPCODE_FIRST",
+   [RESPST_ERR_MISSING_OPCODE_LAST_C]  = "ERR_MISSING_OPCODE_LAST_C",
+   [RESPST_ERR_MISSING_OPCODE_LAST_D1E]= "ERR_MISSING_OPCODE_LAST_D1E",
+   [RESPST_ERR_TOO_MANY_RDMA_ATM_REQ]  = "ERR_TOO_MANY_RDMA_ATM_REQ",
+   [RESPST_ERR_RNR]= "ERR_RNR",
+   [RESPST_ERR_RKEY_VIOLATION] = "ERR_RKEY_VIOLATION",
+   [RESPST_ERR_LENGTH] = "ERR_LENGTH",
+   [RESPST_ERR_CQ_OVERFLOW]= "ERR_CQ_OVERFLOW",
+   [RESPST_ERROR]  = "ERROR",
+   [RESPST_RESET]  = "RESET",
+   [RESPST_DONE]   = "DONE",
+   

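Note on the resp_state_name[] table above: a table built with designated initializers leaves NULL holes for any state that has no entry, so a lookup used for debug output is safer with a bounds check and a NULL check. A minimal sketch with a hypothetical three-state enum (the sketch_* names are not from the patch):

```c
#include <stddef.h>
#include <string.h>

enum sketch_state {
	SKETCH_GET_REQ,
	SKETCH_EXECUTE,
	SKETCH_DONE,
	SKETCH_NUM_STATES,	/* number of states, not a state itself */
};

static const char *const sketch_state_name[SKETCH_NUM_STATES] = {
	[SKETCH_GET_REQ]	= "GET_REQ",
	[SKETCH_EXECUTE]	= "EXECUTE",
	[SKETCH_DONE]		= "DONE",
};

/* Return a printable name for any value, including out-of-range ones. */
static const char *sketch_name(unsigned int state)
{
	if (state >= SKETCH_NUM_STATES || !sketch_state_name[state])
		return "UNKNOWN";
	return sketch_state_name[state];
}
```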
[PATCH rdma-next 29/32] IB/rxe: sysfs interface to RXE

2015-09-16 Thread Kamal Heib
sysfs interface for ib_rxe

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_sysfs.c | 168 
 1 file changed, 168 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_sysfs.c

diff --git a/drivers/staging/rxe/rxe_sysfs.c b/drivers/staging/rxe/rxe_sysfs.c
new file mode 100644
index 000..35bc299
--- /dev/null
+++ b/drivers/staging/rxe/rxe_sysfs.c
@@ -0,0 +1,168 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_net.h"
+
+/* Copy argument and remove trailing CR. Return the new length. */
+static int sanitize_arg(const char *val, char *intf, int intf_len)
+{
+   int len;
+
+   if (!val)
+   return 0;
+
+   /* Remove newline. */
+   for (len = 0; len < intf_len - 1 && val[len] && val[len] != '\n'; len++)
+   intf[len] = val[len];
+   intf[len] = 0;
+
+   if (len == 0 || (val[len] != 0 && val[len] != '\n'))
+   return 0;
+
+   return len;
+}
+
+/* Caller must hold net_info_lock */
+static void rxe_set_port_state(struct net_device *ndev)
+{
+   struct rxe_dev *rxe;
+
+   rxe = net_to_rxe(ndev);
+   if (!rxe)
+   goto out;
+
+   if (net_info[ndev->ifindex].status == IB_PORT_ACTIVE)
+   rxe_net_up(ndev);
+   else
+   rxe_net_down(ndev); /* down for unknown state */
+out:
+   return;
+}
+
+static int rxe_param_set_add(const char *val, struct kernel_param *kp)
+{
+   int i, len, err;
+   char intf[32];
+
+   len = sanitize_arg(val, intf, sizeof(intf));
+   if (!len) {
+   pr_err("rxe: add: invalid interface name\n");
+   return -EINVAL;
+   }
+
+   spin_lock_bh(&net_info_lock);
+   for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+   struct net_device *ndev = net_info[i].ndev;
+
+   if (ndev && (0 == strncmp(intf, ndev->name, len))) {
+   spin_unlock_bh(&net_info_lock);
+   if (net_info[i].rxe)
+   pr_info("rxe: already configured on %s\n",
+   intf);
+   else {
+   err = rxe_net_add(ndev);
+   if (!err && net_info[i].rxe) {
+   rxe_set_port_state(ndev);
+   } else {
+   pr_err("rxe: add appears to have failed for %s (index %d)\n",
+  intf, i);
+   }
+   }
+   return 0;
+   }
+   }
+   spin_unlock_bh(&net_info_lock);
+
+   pr_warn("interface %s not found\n", intf);
+
+   return 0;
+}
+
+static void rxe_remove_all(void)
+{
+   int i;
+   struct rxe_dev *rxe;
+
+   for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+   if (net_info[i].rxe) {
+   spin_lock_bh(&net_info_lock);
+   rxe = net_info[i].rxe;
+   net_info[i].rxe = NULL;
+   spin_unlock_bh(&net_info_lock);
+
+   rxe_remove(rxe);
+   }
+   }
+}
+
+static int rxe_param_set_remove(const char *val, struct kernel_param *kp)
+{
+   int i, len;
+   char intf[32];
+   struct rxe_dev *rxe;

[PATCH rdma-next 25/32] IB/rxe: Dummy DMA callbacks for RXE device

2015-09-16 Thread Kamal Heib
Dummy DMA processing for RXE device.


Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_dma.c | 166 ++
 1 file changed, 166 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_dma.c

diff --git a/drivers/staging/rxe/rxe_dma.c b/drivers/staging/rxe/rxe_dma.c
new file mode 100644
index 000..265b03d
--- /dev/null
+++ b/drivers/staging/rxe/rxe_dma.c
@@ -0,0 +1,166 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+#define DMA_BAD_ADDER ((u64)0)
+
+static int rxe_mapping_error(struct ib_device *dev, u64 dma_addr)
+{
+   return dma_addr == DMA_BAD_ADDER;
+}
+
+static u64 rxe_dma_map_single(struct ib_device *dev,
+ void *cpu_addr, size_t size,
+ enum dma_data_direction direction)
+{
+   WARN_ON(!valid_dma_direction(direction));
+   return (u64)cpu_addr;
+}
+
+static void rxe_dma_unmap_single(struct ib_device *dev,
+u64 addr, size_t size,
+enum dma_data_direction direction)
+{
+   WARN_ON(!valid_dma_direction(direction));
+}
+
+static u64 rxe_dma_map_page(struct ib_device *dev,
+   struct page *page,
+   unsigned long offset,
+   size_t size, enum dma_data_direction direction)
+{
+   u64 addr;
+
+   WARN_ON(!valid_dma_direction(direction));
+
+   if (offset + size > PAGE_SIZE) {
+   addr = DMA_BAD_ADDER;
+   goto done;
+   }
+
+   addr = (u64)page_address(page);
+   if (addr)
+   addr += offset;
+
+done:
+   return addr;
+}
+
+static void rxe_dma_unmap_page(struct ib_device *dev,
+  u64 addr, size_t size,
+  enum dma_data_direction direction)
+{
+   WARN_ON(!valid_dma_direction(direction));
+}
+
+static int rxe_map_sg(struct ib_device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction direction)
+{
+   struct scatterlist *sg;
+   u64 addr;
+   int i;
+   int ret = nents;
+
+   WARN_ON(!valid_dma_direction(direction));
+
+   for_each_sg(sgl, sg, nents, i) {
+   addr = (u64)page_address(sg_page(sg));
+   if (!addr) {
+   ret = 0;
+   break;
+   }
+   sg->dma_address = addr + sg->offset;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+   sg->dma_length = sg->length;
+#endif
+   }
+
+   return ret;
+}
+
+static void rxe_unmap_sg(struct ib_device *dev,
+struct scatterlist *sg, int nents,
+enum dma_data_direction direction)
+{
+   WARN_ON(!valid_dma_direction(direction));
+}
+
+static void rxe_sync_single_for_cpu(struct ib_device *dev,
+   u64 addr,
+   size_t size, enum dma_data_direction dir)
+{
+}
+
+static void rxe_sync_single_for_device(struct ib_device *dev,
+  u64 addr,
+  size_t size, enum dma_data_direction dir)
+{
+}
+
+static void *rxe_dma_alloc_coherent(struct ib_device *dev, size_t size,
+   u64 *dma_handle, gfp_t flag)
+{
+   struct page 

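Note on rxe_dma_map_page() above: since the "DMA address" is just the kernel virtual address of the page plus an offset, the only real check is that the requested range stays inside one page. A minimal sketch of that check, with a hypothetical 4096-byte page size and hypothetical sketch_* names. Unlike the patch, it compares without adding offset and size first, which is an assumption-free way to avoid wrap-around for very large arguments.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)	/* assumed page size */
#define SKETCH_BAD_ADDR  ((uint64_t)0)	/* same sentinel value as the patch */

/* Return page_base + offset, or the bad-address sentinel if the page
 * has no mapping (base 0) or [offset, offset + size) leaves the page.
 */
static uint64_t sketch_map_page(uint64_t page_base, size_t offset, size_t size)
{
	if (offset > SKETCH_PAGE_SIZE || size > SKETCH_PAGE_SIZE - offset)
		return SKETCH_BAD_ADDR;

	if (page_base == 0)
		return SKETCH_BAD_ADDR;

	return page_base + offset;
}
```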
[PATCH rdma-next 26/32] IB/rxe: ICRC calculations

2015-09-16 Thread Kamal Heib
Compute ICRC for UDP/IP/BTH headers

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
---
 drivers/staging/rxe/rxe_icrc.c | 96 ++
 1 file changed, 96 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_icrc.c

diff --git a/drivers/staging/rxe/rxe_icrc.c b/drivers/staging/rxe/rxe_icrc.c
new file mode 100644
index 000..02b73d6
--- /dev/null
+++ b/drivers/staging/rxe/rxe_icrc.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/* Compute a partial ICRC for all the IB transport headers. */
+u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+{
+   unsigned int bth_offset = 0;
+   struct iphdr *ip4h = NULL;
+   struct ipv6hdr *ip6h = NULL;
+   struct udphdr *udph;
+   struct rxe_bth *bth;
+   int crc;
+   int length;
+   int hdr_size = sizeof(struct udphdr) +
+   (skb->protocol == htons(ETH_P_IP) ?
+   sizeof(struct iphdr) : sizeof(struct ipv6hdr));
+   /* pseudo header buffer size is calculated using ipv6 header size since
+* it is bigger than ipv4
+*/
+   u8 pshdr[sizeof(struct udphdr) +
+   sizeof(struct ipv6hdr) +
+   RXE_BTH_BYTES];
+
+   /* This seed is the result of computing a CRC with a seed of
+* 0xffffffff and 8 bytes of 0xff representing a masked LRH.
+*/
+   crc = 0xdebb20e3;
+
+   if (skb->protocol == htons(ETH_P_IP)) { /* IPv4 */
+   memcpy(pshdr, ip_hdr(skb), hdr_size);
+   ip4h = (struct iphdr *)pshdr;
+   udph = (struct udphdr *)(ip4h + 1);
+
+   ip4h->ttl = 0xff;
+   ip4h->check = CSUM_MANGLED_0;
+   ip4h->tos = 0xff;
+   } else {/* IPv6 */
+   memcpy(pshdr, ipv6_hdr(skb), hdr_size);
+   ip6h = (struct ipv6hdr *)pshdr;
+   udph = (struct udphdr *)(ip6h + 1);
+
+   memset(ip6h->flow_lbl, 0xff, sizeof(ip6h->flow_lbl));
+   ip6h->priority = 0xf;
+   ip6h->hop_limit = 0xff;
+   }
+   udph->check = CSUM_MANGLED_0;
+
+   bth_offset += hdr_size;
+
+   memcpy(&pshdr[bth_offset], pkt->hdr, RXE_BTH_BYTES);
+   bth = (struct rxe_bth *)&pshdr[bth_offset];
+
+   /* exclude bth.resv8a */
+   bth->qpn |= cpu_to_be32(~BTH_QPN_MASK);
+
+   length = hdr_size + RXE_BTH_BYTES;
+   crc = crc32_le(crc, pshdr, length);
+
+   /* Finish computing the CRC over the remainder of the headers. */
+   crc = crc32_le(crc, pkt->hdr + RXE_BTH_BYTES,
+  rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES);
+   return crc;
+}
-- 
1.8.3.1

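Note on the 0xdebb20e3 seed in rxe_icrc_hdr() above: the comment says the constant is the CRC of an all-ones masked LRH, and that can be checked in user space. The sketch below is a bitwise little-endian CRC-32 with no pre- or post-inversion, which is assumed here to match the semantics of the kernel's crc32_le(). The sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32, reflected polynomial 0xedb88320, no inversion of the
 * seed or of the result.
 */
static uint32_t sketch_crc32_le(uint32_t crc, const uint8_t *p, size_t len)
{
	int bit;

	while (len--) {
		crc ^= *p++;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

/* CRC state after an 8-byte LRH whose bytes are all masked to 0xff,
 * starting from an all-ones seed.
 */
static uint32_t sketch_masked_lrh_seed(void)
{
	uint8_t lrh[8];

	memset(lrh, 0xff, sizeof(lrh));
	return sketch_crc32_le(0xffffffffu, lrh, sizeof(lrh));
}
```

Starting from the precomputed state lets the function skip the LRH, which RoCE packets do not carry, while still producing an ICRC compatible with the InfiniBand definition.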


[PATCH rdma-next 03/32] IB/rxe: IBA header types and methods

2015-09-16 Thread Kamal Heib
Add declarations for data structures used to hold per opcode
and per work request opcode tables.


Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_hdr.h | 950 ++
 1 file changed, 950 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_hdr.h

diff --git a/drivers/staging/rxe/rxe_hdr.h b/drivers/staging/rxe/rxe_hdr.h
new file mode 100644
index 000..d8bc4a3
--- /dev/null
+++ b/drivers/staging/rxe/rxe_hdr.h
@@ -0,0 +1,950 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_HDR_H
+#define RXE_HDR_H
+
+/* extracted information about a packet carried in an sk_buff struct fits in
+ * the skbuff cb array. Must be at most 48 bytes.
+ */
+struct rxe_pkt_info {
+   struct rxe_dev  *rxe;   /* device that owns packet */
+   struct rxe_qp   *qp;/* qp that owns packet */
+   struct rxe_send_wqe *wqe;   /* send wqe */
+   u8  *hdr;   /* points to bth */
+   u32 mask;   /* useful info about pkt */
+   u32 psn;/* bth psn of packet */
+   u16 pkey_index; /* partition of pkt */
+   u16 paylen; /* length of bth - icrc */
+   u8  port_num;   /* port pkt received on */
+   u8  opcode; /* bth opcode of packet */
+   u8  offset; /* bth offset from pkt->hdr */
+};
+
+#define SKB_TO_PKT(skb) ((struct rxe_pkt_info *)(skb)->cb)
+#define PKT_TO_SKB(pkt) container_of((void *)(pkt), struct sk_buff, cb)
+
+/*
+ * IBA header types and methods
+ *
+ * Some of these are for reference and completeness only, since
+ * rxe does not currently support RD transport.
+ * Most of this could be moved into IB core. ib_pack.h has
+ * part of this but is incomplete.
+ *
+ * Header-specific routines insert/extract values to/from headers.
+ * The routines named __hhh_(set_)fff() take a pointer to a
+ * hhh header and get (set) the fff field. The routines named
+ * hhh_(set_)fff() take a packet info struct and find the
+ * header and field based on the opcode in the packet.
+ * Conversion between network and CPU byte order is also done.
+ */
+
+#define RXE_ICRC_SIZE  (4)
+#define RXE_MAX_HDR_LENGTH (80)
+
+/**
+ * Base Transport Header
+ */
+struct rxe_bth {
+   u8  opcode;
+   u8  flags;
+   __be16  pkey;
+   __be32  qpn;
+   __be32  apsn;
+};
+
+#define BTH_TVER           (0)
+#define BTH_DEF_PKEY       (0xffff)
+
+#define BTH_SE_MASK        (0x80)
+#define BTH_MIG_MASK       (0x40)
+#define BTH_PAD_MASK       (0x30)
+#define BTH_TVER_MASK      (0x0f)
+#define BTH_FECN_MASK      (0x80000000)
+#define BTH_BECN_MASK      (0x40000000)
+#define BTH_RESV6A_MASK    (0x3f000000)
+#define BTH_QPN_MASK       (0x00ffffff)
+#define BTH_ACK_MASK       (0x80000000)
+#define BTH_RESV7_MASK     (0x7f000000)
+#define BTH_PSN_MASK       (0x00ffffff)
+
+static inline u8 

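Note on the BTH accessors described above: each getter converts a big-endian header word to CPU order and masks out one field, and each setter does the reverse while preserving the neighbouring bits. A minimal user-space sketch for the QPN field follows. It assumes the QPN occupies the low 24 bits of the 32-bit word at byte offset 4 (after opcode, flags and pkey in struct rxe_bth). All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QPN_MASK 0x00ffffffu	/* assumed: low 24 bits carry the QPN */

/* Load a big-endian 32-bit word from a byte buffer. */
static uint32_t sketch_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit word into a byte buffer in big-endian order. */
static void sketch_store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Getter: the qpn word sits at byte offset 4 of the 12-byte header. */
static uint32_t sketch_bth_qpn(const uint8_t *bth)
{
	assert(bth != NULL);
	return sketch_load_be32(bth + 4) & SKETCH_QPN_MASK;
}

/* Setter: replace the QPN bits and keep the upper byte untouched. */
static void sketch_bth_set_qpn(uint8_t *bth, uint32_t qpn)
{
	uint32_t word;

	assert(bth != NULL);
	word = sketch_load_be32(bth + 4);
	word = (word & ~SKETCH_QPN_MASK) | (qpn & SKETCH_QPN_MASK);
	sketch_store_be32(bth + 4, word);
}

/* Returns 1 if a QPN round-trips and the byte above it is preserved. */
static int sketch_bth_roundtrip(void)
{
	uint8_t bth[12] = { 0 };

	bth[4] = 0xa5;	/* non-QPN bits ahead of the QPN */
	sketch_bth_set_qpn(bth, 0x123456u);
	return sketch_bth_qpn(bth) == 0x123456u && bth[4] == 0xa5;
}
```

Working on bytes keeps the sketch independent of host endianness; the header itself would use cpu_to_be32() and be32_to_cpu() on the __be32 members instead.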
[PATCH rdma-next 27/32] IB/rxe: Module init hooks

2015-09-16 Thread Kamal Heib
Module main for ib_rxe

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe.c | 434 ++
 drivers/staging/rxe/rxe.h |   2 +
 2 files changed, 436 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe.c

diff --git a/drivers/staging/rxe/rxe.c b/drivers/staging/rxe/rxe.c
new file mode 100644
index 000..f6c81ba
--- /dev/null
+++ b/drivers/staging/rxe/rxe.c
@@ -0,0 +1,434 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
+MODULE_DESCRIPTION("Soft RDMA transport");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.2");
+
+/* free resources for all ports on a device */
+static void rxe_cleanup_ports(struct rxe_dev *rxe)
+{
+   unsigned int port_num;
+   struct rxe_port *port;
+
+   for (port_num = 1; port_num <= rxe->num_ports; port_num++) {
+   port = &rxe->port[port_num - 1];
+
+   kfree(port->pkey_tbl);
+   port->pkey_tbl = NULL;
+   }
+
+   kfree(rxe->port);
+   rxe->port = NULL;
+}
+
+/* free resources for a rxe device; all objects created for this device must
+ * have been destroyed
+ */
+static void rxe_cleanup(struct rxe_dev *rxe)
+{
+   rxe_pool_cleanup(&rxe->uc_pool);
+   rxe_pool_cleanup(&rxe->pd_pool);
+   rxe_pool_cleanup(&rxe->ah_pool);
+   rxe_pool_cleanup(&rxe->srq_pool);
+   rxe_pool_cleanup(&rxe->qp_pool);
+   rxe_pool_cleanup(&rxe->cq_pool);
+   rxe_pool_cleanup(&rxe->mr_pool);
+   rxe_pool_cleanup(&rxe->fmr_pool);
+   rxe_pool_cleanup(&rxe->mw_pool);
+   rxe_pool_cleanup(&rxe->mc_grp_pool);
+   rxe_pool_cleanup(&rxe->mc_elem_pool);
+
+   rxe_cleanup_ports(rxe);
+}
+
+/* called when all references have been dropped */
+void rxe_release(struct kref *kref)
+{
+   struct rxe_dev *rxe = container_of(kref, struct rxe_dev, ref_cnt);
+
+   rxe_cleanup(rxe);
+   ib_dealloc_device(&rxe->ib_dev);
+}
+
+void rxe_dev_put(struct rxe_dev *rxe)
+{
+   kref_put(&rxe->ref_cnt, rxe_release);
+}
+EXPORT_SYMBOL_GPL(rxe_dev_put);
+
+/* initialize rxe device parameters */
+static int rxe_init_device_param(struct rxe_dev *rxe)
+{
+   rxe->num_ports  = RXE_NUM_PORT;
+   rxe->max_inline_data= RXE_MAX_INLINE_DATA;
+
+   rxe->attr.fw_ver= RXE_FW_VER;
+   rxe->attr.max_mr_size   = RXE_MAX_MR_SIZE;
+   rxe->attr.page_size_cap = RXE_PAGE_SIZE_CAP;
+   rxe->attr.vendor_id = RXE_VENDOR_ID;
+   rxe->attr.vendor_part_id= RXE_VENDOR_PART_ID;
+   rxe->attr.hw_ver= RXE_HW_VER;
+   rxe->attr.max_qp= RXE_MAX_QP;
+   rxe->attr.max_qp_wr = RXE_MAX_QP_WR;
+   rxe->attr.device_cap_flags  = RXE_DEVICE_CAP_FLAGS;
+   rxe->attr.max_sge   = RXE_MAX_SGE;
+   rxe->attr.max_sge_rd= RXE_MAX_SGE_RD;
+   rxe->attr.max_cq= RXE_MAX_CQ;
+   rxe->attr.max_cqe   = (1 << RXE_MAX_LOG_CQE) - 1;
+   rxe->attr.max_mr= RXE_MAX_MR;
+   rxe->attr.max_pd= RXE_MAX_PD;
+   rxe->attr.max_qp_rd_atom= RXE_MAX_QP_RD_ATOM;
+   

[PATCH rdma-next 12/32] IB/rxe: Interface to ib_core

2015-09-16 Thread Kamal Heib
rxe interface to rdma/core

Signed-off-by: Kamal Heib 
Signed-off-by: Amir Vadai 
Signed-off-by: Haggai Eran 
---
 drivers/staging/rxe/rxe_verbs.c | 1429 +++
 drivers/staging/rxe/rxe_verbs.h |  496 ++
 2 files changed, 1925 insertions(+)
 create mode 100644 drivers/staging/rxe/rxe_verbs.c
 create mode 100644 drivers/staging/rxe/rxe_verbs.h

diff --git a/drivers/staging/rxe/rxe_verbs.c b/drivers/staging/rxe/rxe_verbs.c
new file mode 100644
index 000..c96d649
--- /dev/null
+++ b/drivers/staging/rxe/rxe_verbs.c
@@ -0,0 +1,1429 @@
+/*
+ * Copyright (c) 2015 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+static int rxe_query_device(struct ib_device *dev,
+   struct ib_device_attr *attr,
+   struct ib_udata *uhw)
+{
+   struct rxe_dev *rxe = to_rdev(dev);
+
+   if (uhw->inlen || uhw->outlen)
+   return -EINVAL;
+
+   *attr = rxe->attr;
+   return 0;
+}
+
+static int rxe_query_port(struct ib_device *dev,
+ u8 port_num, struct ib_port_attr *attr)
+{
+   struct rxe_dev *rxe = to_rdev(dev);
+   struct rxe_port *port;
+
+   if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+   pr_warn("invalid port_number %d\n", port_num);
+   goto err1;
+   }
+
+   port = &rxe->port[port_num - 1];
+
+   *attr = port->attr;
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+static int rxe_query_gid(struct ib_device *device,
+u8 port_num, int index, union ib_gid *gid)
+{
+   int ret;
+
+   if (index > RXE_PORT_GID_TBL_LEN)
+   return -EINVAL;
+
+   ret = ib_get_cached_gid(device, port_num, index, gid, NULL);
+   if (ret == -EAGAIN) {
+   memcpy(gid, &zgid, sizeof(*gid));
+   return 0;
+   }
+
+   return ret;
+}
+
+static int rxe_add_gid(struct ib_device *device, u8 port_num, unsigned int
+  index, const union ib_gid *gid,
+  const struct ib_gid_attr *attr, void **context)
+{
+   return 0;
+}
+
+static int rxe_del_gid(struct ib_device *device, u8 port_num, unsigned int
+  index, void **context)
+{
+   return 0;
+}
+
+static struct net_device *rxe_get_netdev(struct ib_device *device,
+u8 port_num)
+{
+   struct rxe_dev *rxe = to_rdev(device);
+
+   if (rxe->ndev)
+   return rxe->ndev;
+
+   return NULL;
+}
+
+static int rxe_query_pkey(struct ib_device *device,
+ u8 port_num, u16 index, u16 *pkey)
+{
+   struct rxe_dev *rxe = to_rdev(device);
+   struct rxe_port *port;
+
+   if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+   dev_warn(device->dma_device, "invalid port_num = %d\n",
+port_num);
+   goto err1;
+   }
+
+   port = &rxe->port[port_num - 1];
+
+   if (unlikely(index >= port->attr.pkey_tbl_len)) {
+   dev_warn(device->dma_device, "invalid index = %d\n",
+index);
+   goto err1;
+   }
+
+   *pkey = port->pkey_tbl[index];
+   return 0;
+
+err1:
+   return -EINVAL;
+}
+
+static int rxe_modify_device(struct ib_device *dev,
+int mask, struct ib_device_modify *attr)
+{
+

Re: [PATCH rdma-next 00/32] Soft-RoCE driver

2015-09-16 Thread Sagi Grimberg

On 9/16/2015 4:42 PM, Kamal Heib wrote:

Doug and list Hi,

This patchset introduces Soft RoCE driver.


Thanks guys,

Should probably mention that iser initiator was tested over this driver
and works pretty cool! (user-space TGT iser target was tested with librxe).

Cheers,
Sagi.


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Christoph Lameter
On Wed, 16 Sep 2015, Doug Ledford wrote:

> > Abusing it for send-side is probably the wrong
> > direction overall.
>
> I wouldn't "abuse" it for such, I would suggest adding a proper notion
> of send-only registrations.

That is really not necessary for IP traffic. There is no need to track
these since multicast can be sent without subscriptions. So I guess that
there will not be much support on netdev for such an approach.


[PATCH 1/1] Add support for TX/RX checksum offload

2015-09-16 Thread Bodong Wang
Add a device capability field csum_cap to denote IPv4 checksum offload
support. Devices should configure this field if they support
insertion/verification of IPv4, TCP and UDP checksums on outgoing/incoming
IPv4 packets according to the link layer and QP types.

Flags IBV_SEND_IP_CSUM and IBV_WC_IP_CSUM_OK are added for utilizing this
capability for send and receive separately.

Signed-off-by: Bodong Wang 
---
 examples/devinfo.c| 33 +
 include/infiniband/kern-abi.h |  7 +++
 include/infiniband/verbs.h| 22 --
 man/ibv_poll_cq.3 |  5 +
 man/ibv_post_send.3   |  4 
 src/cmd.c | 13 +
 6 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index a8de982..46d4614 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -253,6 +253,38 @@ void print_odp_caps(const struct ibv_odp_caps *caps)
print_odp_trans_caps(caps->per_transport_caps.ud_odp_caps);
 }
 
+void print_csum_caps(const struct ibv_csum_cap_per_link *caps)
+{
+   uint32_t unknown_csum_caps = ~(IBV_CSUM_SUPPORT_RAW |
+  IBV_CSUM_SUPPORT_UD);
+
+   printf("\teth_csum_cap:\n");
+   if (!caps->eth_csum_cap) {
+   printf("\t\t\t\t\tNO_SUPPORT\n");
+   } else {
+   if (caps->eth_csum_cap & IBV_CSUM_SUPPORT_RAW)
+   printf("\t\t\t\t\tRAW_QP_SUPPORT\n");
+   if (caps->eth_csum_cap & IBV_CSUM_SUPPORT_UD)
+   printf("\t\t\t\t\tUD_QP_SUPPORT\n");
+   if (caps->eth_csum_cap & unknown_csum_caps)
+   printf("\t\t\t\t\tUnknown flags: 0x%" PRIX32 "\n",
+  caps->eth_csum_cap & unknown_csum_caps);
+   }
+
+   printf("\tib_csum_cap:\n");
+   if (!caps->ib_csum_cap) {
+   printf("\t\t\t\t\tNO_SUPPORT\n");
+   } else {
+   if (caps->ib_csum_cap & IBV_CSUM_SUPPORT_RAW)
+   printf("\t\t\t\t\tRAW_QP_SUPPORT\n");
+   if (caps->ib_csum_cap & IBV_CSUM_SUPPORT_UD)
+   printf("\t\t\t\t\tUD_QP_SUPPORT\n");
+   if (caps->ib_csum_cap & unknown_csum_caps)
+   printf("\t\t\t\t\tUnknown flags: 0x%" PRIX32 "\n",
+  caps->ib_csum_cap & unknown_csum_caps);
+   }
+}
+
 static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
struct ibv_context *ctx;
@@ -339,6 +371,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
printf("\tlocal_ca_ack_delay:\t\t%d\n", 
device_attr.orig_attr.local_ca_ack_delay);
 
print_odp_caps(&device_attr.odp_caps);
+   print_csum_caps(&device_attr.csum_cap);
}
 
for (port = 1; port <= device_attr.orig_attr.phys_port_cnt; ++port) {
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 800c5ab..51d4fb0 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -262,11 +262,18 @@ struct ibv_odp_caps_resp {
__u32 reserved;
 };
 
+struct ibv_csum_cap_per_link_resp {
+   __u32 eth_csum_cap;
+   __u32 ib_csum_cap;
+};
+
 struct ibv_query_device_resp_ex {
struct ibv_query_device_resp base;
__u32 comp_mask;
__u32 response_length;
struct ibv_odp_caps_resp odp_caps;
+   __u64 reserved0[2];
+   struct ibv_csum_cap_per_link_resp csum_cap;
 };
 
 struct ibv_query_port {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 1ff5265..134359f 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -196,10 +196,16 @@ enum ibv_odp_general_caps {
IBV_ODP_SUPPORT = 1 << 0,
 };
 
+struct ibv_csum_cap_per_link {
+   uint32_t eth_csum_cap;
+   uint32_t ib_csum_cap;
+};
+
 struct ibv_device_attr_ex {
struct ibv_device_attr  orig_attr;
uint32_tcomp_mask;
struct ibv_odp_caps odp_caps;
+   struct ibv_csum_cap_per_link csum_cap;
 };
 
 enum ibv_mtu {
@@ -348,9 +354,14 @@ enum ibv_wc_opcode {
IBV_WC_RECV_RDMA_WITH_IMM
 };
 
+enum {
+   IBV_WC_IP_CSUM_OK_SHIFT = 2
+};
+
 enum ibv_wc_flags {
IBV_WC_GRH  = 1 << 0,
-   IBV_WC_WITH_IMM = 1 << 1
+   IBV_WC_WITH_IMM = 1 << 1,
+   IBV_WC_IP_CSUM_OK   = 1 << IBV_WC_IP_CSUM_OK_SHIFT
 };
 
 struct ibv_wc {
@@ -646,6 +657,11 @@ enum ibv_mig_state {
IBV_MIG_ARMED
 };
 
+enum ibv_csum_cap_flags {
+   IBV_CSUM_SUPPORT_UD = 1 << IBV_QPT_UD,
+   IBV_CSUM_SUPPORT_RAW= 1 << IBV_QPT_RAW_PACKET,
+};
+
 struct ibv_qp_attr {
enum ibv_qp_state   qp_state;
enum ibv_qp_state   cur_qp_state;
@@ -688,7 +704,8 @@ enum ibv_send_flags {
IBV_SEND_FENCE  = 1 << 0,
IBV_SEND_SIGNALED   = 1 << 1,

Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Doug Ledford
On 09/15/2015 07:53 PM, Christoph Lameter wrote:
> On Tue, 15 Sep 2015, Jason Gunthorpe wrote:
> 
>> The mcast list in the core is solely for listing subscriptions for
>> inbound - ie receive. Abusing it for send-side is probably the wrong
>> direction overall.
> 
> Ok then a simple approach would be to port timeout logic from
> OFED-1.5.X.

It's the simple fix, but not the right fix.  I would prefer to find the
right fix for upstream.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD






[PATCH 0/3] Enable checksum offload capability reporting

2015-09-16 Thread Bodong Wang
Checksum offload capability reporting is enabled based on extended verbs.
The capability field has a sub-field for every link layer and, depending on
device capabilities, each link layer supports specific QP types. These are
reported to user space.

I'm new to uverbs extensions and look forward to review comments on that
aspect of the patches.

Bodong Wang (3):
  IB/core: Add support of checksum capability reporting in ib verbs
  IB/uverbs: Add support for checksum capability reporting in user verbs
  IB/mlx4: Report checksum offload cap when query device

 drivers/infiniband/core/uverbs_cmd.c |  7 +++
 drivers/infiniband/hw/mlx4/main.c|  3 +++
 include/rdma/ib_verbs.h  | 10 ++
 include/uapi/rdma/ib_user_verbs.h|  6 ++
 4 files changed, 26 insertions(+)

-- 
1.8.3.1
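
The per-link-layer capability layout described in this cover letter can be sketched in plain C. This is only an illustration under stated assumptions: the sketch_ names, the QP type numbers and the helper function are hypothetical and are not the definitions used by the series. It shows one 32-bit mask per link layer, with one bit per QP type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical QP type numbers, chosen only for this sketch. */
enum sketch_qp_type { SKETCH_QPT_UD = 4, SKETCH_QPT_RAW_PACKET = 8 };

enum sketch_link_layer { SKETCH_LINK_ETH, SKETCH_LINK_IB };

/* One capability bit per QP type. */
#define SKETCH_CSUM_SUPPORT(qpt) (UINT32_C(1) << (qpt))

/* One mask per link layer, as in the csum_cap sub-fields. */
struct sketch_csum_cap_per_link {
	uint32_t eth_csum_cap;
	uint32_t ib_csum_cap;
};

/* Return true if checksum offload is reported for this link layer and QP type. */
static bool sketch_csum_supported(const struct sketch_csum_cap_per_link *caps,
				  enum sketch_link_layer link,
				  enum sketch_qp_type qpt)
{
	uint32_t mask;

	assert(caps != NULL);
	mask = (link == SKETCH_LINK_ETH) ? caps->eth_csum_cap
					 : caps->ib_csum_cap;
	return (mask & SKETCH_CSUM_SUPPORT(qpt)) != 0;
}
```

A driver in this model would OR the bits it supports into the mask for each link layer, and a consumer would test a single bit before requesting offload on a given QP.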



[PATCH 3/3] IB/mlx4: Report checksum offload cap when query device

2015-09-16 Thread Bodong Wang
Signed-off-by: Bodong Wang 
---
 drivers/infiniband/hw/mlx4/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 8be6db8..a70ca6a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -217,6 +217,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
}
 
+   props->csum_cap.eth_csum_cap |= IB_CSUM_SUPPORT_RAW;
+   props->csum_cap.ib_csum_cap |= IB_CSUM_SUPPORT_UD;
+
props->vendor_id   = be32_to_cpup((__be32 *) (out_mad->data + 
36)) &
0xff;
props->vendor_part_id  = dev->dev->persist->pdev->device;
-- 
1.8.3.1



[PATCH 2/3] IB/uverbs: Add support for checksum capability reporting in user verbs

2015-09-16 Thread Bodong Wang
Add a csum_cap field to the uverbs query-device response, mirroring the
field added to ib_verbs.

Signed-off-by: Bodong Wang 
---
 drivers/infiniband/core/uverbs_cmd.c | 7 +++
 include/uapi/rdma/ib_user_verbs.h| 6 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index bbb02ff..9d5deec 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3464,6 +3464,13 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
*file,
resp.hca_core_clock = attr.hca_core_clock;
resp.response_length += sizeof(resp.hca_core_clock);
 
+   if (ucore->outlen < resp.response_length + sizeof(resp.csum_cap))
+   goto end;
+
+   resp.csum_cap.eth_csum_cap = attr.csum_cap.eth_csum_cap;
+   resp.csum_cap.ib_csum_cap = attr.csum_cap.ib_csum_cap;
+   resp.response_length += sizeof(resp.csum_cap);
+
 end:
err = ib_copy_to_udata(ucore, &resp, resp.response_length);
if (err)
diff --git a/include/uapi/rdma/ib_user_verbs.h 
b/include/uapi/rdma/ib_user_verbs.h
index 978841e..9d69546 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -218,6 +218,11 @@ struct ib_uverbs_odp_caps {
__u32 reserved;
 };
 
+struct ib_uverbs_csum_cap_per_link {
+   __u32 eth_csum_cap;
+   __u32 ib_csum_cap;
+};
+
 struct ib_uverbs_ex_query_device_resp {
struct ib_uverbs_query_device_resp base;
__u32 comp_mask;
@@ -225,6 +230,7 @@ struct ib_uverbs_ex_query_device_resp {
struct ib_uverbs_odp_caps odp_caps;
__u64 timestamp_mask;
__u64 hca_core_clock; /* in KHZ */
+   struct ib_uverbs_csum_cap_per_link csum_cap;
 };
 
 struct ib_uverbs_query_port {
-- 
1.8.3.1
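
The hunk in ib_uverbs_ex_query_device() follows an append-only response pattern: a trailing field is filled in only if the caller's buffer is large enough, and response_length records how much was written. A user-space sketch of that idea, assuming a simplified response layout (struct sketch_resp and sketch_fill_resp() are hypothetical, not the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical response layout; newer fields are appended. */
struct sketch_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t hca_core_clock;
	uint32_t eth_csum_cap;	/* newest fields go last */
	uint32_t ib_csum_cap;
};

/*
 * Fill as many trailing fields as fit in outlen bytes and copy the
 * populated prefix to out. Returns the number of bytes written.
 */
static size_t sketch_fill_resp(void *out, size_t outlen, uint64_t core_clock,
			       uint32_t eth_cap, uint32_t ib_cap)
{
	struct sketch_resp resp;

	memset(&resp, 0, sizeof(resp));
	resp.response_length = offsetof(struct sketch_resp, hca_core_clock);
	if (outlen < resp.response_length)
		return 0;	/* not even the fixed part fits */

	if (outlen < resp.response_length + sizeof(resp.hca_core_clock))
		goto end;
	resp.hca_core_clock = core_clock;
	resp.response_length += sizeof(resp.hca_core_clock);

	if (outlen < resp.response_length + sizeof(resp.eth_csum_cap) +
		     sizeof(resp.ib_csum_cap))
		goto end;
	resp.eth_csum_cap = eth_cap;
	resp.ib_csum_cap = ib_cap;
	resp.response_length += sizeof(resp.eth_csum_cap) +
				sizeof(resp.ib_csum_cap);

end:
	memcpy(out, &resp, resp.response_length);
	return resp.response_length;
}
```

An older caller that passes a shorter buffer simply gets a shorter response, which is why new fields can only be added at the end of the structure.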



Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Christoph Lameter
On Wed, 16 Sep 2015, Doug Ledford wrote:

> On 09/15/2015 07:53 PM, Christoph Lameter wrote:
> > On Tue, 15 Sep 2015, Jason Gunthorpe wrote:
> >
> >> The mcast list in the core is solely for listing subscriptions for
> >> inbound - ie receive. Abusing it for send-side is probably the wrong
> >> direction overall.
> >
> > Ok then a simple approach would be to port timeout logic from
> > OFED-1.5.X.
>
> It's the simple fix, but not the right fix.  I would prefer to find the
> right fix for upstream.

We would have to track which sockets have sent sendonly multicast
traffic. Some sort of a refcount on the sendonly multicast group that
gets decremented when the socket is closed down. We need some sort of
custom callback during socket shutdown.

The IPoIB layer is not a protocol otherwise we would have a shutdown
callback to work with.

Hmmm... For the UDP protocol the shutdown function is not populated in the
protocol methods. There is an encap_destroy() that is called on
udp_destroy_sock(). We could add another check in udp_destroy_sock()
that does a callback for IPoIB. That could then release the refcount.

Question then is how do we know which socket has done a sendonly join to
which multicast groups? We cannot use the regular multicast list for a
socket. So add another list?
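
A rough user-space sketch of the refcount idea above, with hypothetical names (this is not IPoIB or socket-layer code): the first sender joins the group, every additional socket takes a reference, and the last socket to close triggers the leave. How a socket remembers which groups it holds references on is deliberately left open here, as in the question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one send-only multicast group. */
struct sketch_sendonly_group {
	unsigned int refcnt;	/* sockets that have sent to the group */
	bool joined;		/* send-only join currently active */
};

/* Called the first time a socket sends to the group. */
static void sketch_group_get(struct sketch_sendonly_group *g)
{
	if (g->refcnt++ == 0)
		g->joined = true;	/* first sender: do the send-only join */
}

/*
 * Called from the socket shutdown callback. Returns true if this was
 * the last reference and the group was left.
 */
static bool sketch_group_put(struct sketch_sendonly_group *g)
{
	assert(g->refcnt > 0);
	if (--g->refcnt == 0) {
		g->joined = false;	/* last sender gone: leave the group */
		return true;
	}
	return false;
}
```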


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2015 at 10:59:50AM -0500, Christoph Lameter wrote:
> On Wed, 16 Sep 2015, Doug Ledford wrote:
> 
> > > Abusing it for send-side is probably the wrong
> > > direction overall.
> >
> > I wouldn't "abuse" it for such, I would suggest adding a proper notion
> > of send-only registrations.
> 
> That is really not necessary for IP traffic. There is no need to track
> these since multicast can be sent without subscriptions. So I guess that
> there will not be much support on netdev for such an approach.

InfiniBand is not unique here, e.g., long ago proposals for IPoATM had
the same problem. Not sure if there are any other, more current networks
that work this way.

Jason


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Doug Ledford
On 09/16/2015 11:59 AM, Christoph Lameter wrote:
> On Wed, 16 Sep 2015, Doug Ledford wrote:
> 
>>> Abusing it for send-side is probably the wrong
>>> direction overall.
>>
>> I wouldn't "abuse" it for such, I would suggest adding a proper notion
>> of send-only registrations.
> 
> That is really not necessary for IP traffic. There is no need to track
> these since multicast can be sent without subscriptions. So I guess that
> there will not be much support on netdev for such an approach.

I agree with you that it's not necessary for Ethernet/IP traffic.  The
point is that this isn't really Ethernet/IP traffic, it's InfiniBand/IP
traffic.  And there it *is* necessary.  An argument can be made that the
core code shouldn't assume Ethernet/IP and should support either
Ethernet/IP or InfiniBand/IP.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD






[PATCH 2/2] Add support for TX/RX checksum offload

2015-09-16 Thread Bodong Wang
RX checksum verification status is reported through wc_flags when polling the
CQ if the device supports checksum offload. When IBV_WC_IP_CSUM_OK is set, both
the IPv4 header checksum and the TCP/UDP checksum are OK.

TX checksum offload is enabled for TCP/UDP over IPv4 if the user sets the
send flag IBV_SEND_IP_CSUM and the device supports checksum offload.

A new field, qp_cap_cache, is added to mlx4_qp to cache the device
capabilities and minimize the performance hit in the poll_one and post_send
paths. The capabilities are set inside mlx4_modify_qp. post_send returns an
error if the device doesn't support checksum offload but the user sets
IBV_SEND_IP_CSUM.

Signed-off-by: Bodong Wang 
---
 src/cq.c|  6 ++
 src/mlx4.c  |  1 +
 src/mlx4.h  | 23 ++-
 src/qp.c| 19 +++
 src/verbs.c | 54 ++
 src/wqe.h   |  8 +---
 6 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/src/cq.c b/src/cq.c
index 8b27795..32c9070 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -329,6 +329,12 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
wc->sl = ntohs(cqe->sl_vid) >> 13;
else
wc->sl = ntohs(cqe->sl_vid) >> 12;
+
+   if ((*cur_qp) && ((*cur_qp)->qp_cap_cache & 
MLX4_RX_CSUM_VALID)) {
+   wc->wc_flags |= ((cqe->status & 
htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) ==
+htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) <<
+   IBV_WC_IP_CSUM_OK_SHIFT;
+   }
}
 
return CQ_OK;
diff --git a/src/mlx4.c b/src/mlx4.c
index 9fe8c6a..427a3a8 100644
--- a/src/mlx4.c
+++ b/src/mlx4.c
@@ -205,6 +205,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
verbs_set_ctx_op(verbs_ctx, open_qp, mlx4_open_qp);
verbs_set_ctx_op(verbs_ctx, ibv_create_flow, ibv_cmd_create_flow);
verbs_set_ctx_op(verbs_ctx, ibv_destroy_flow, ibv_cmd_destroy_flow);
+   verbs_set_ctx_op(verbs_ctx, query_device_ex, mlx4_query_device_ex);
 
return 0;
 
diff --git a/src/mlx4.h b/src/mlx4.h
index d71450f..7e229d7 100644
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -257,6 +257,7 @@ struct mlx4_qp {
struct mlx4_wq  rq;
 
uint8_t link_layer;
+   uint32_tqp_cap_cache;
 };
 
 struct mlx4_av {
@@ -279,6 +280,22 @@ struct mlx4_ah {
uint8_t mac[6];
 };
 
+enum {
+   MLX4_CSUM_SUPPORT_UD_OVER_IB= (1 <<  0),
+   MLX4_CSUM_SUPPORT_RAW_OVER_ETH  = (1 <<  1),
+   /* Only report rx checksum when the validation is valid */
+   MLX4_RX_CSUM_VALID  = (1 <<  16),
+};
+
+enum mlx4_cqe_status {
+   MLX4_CQE_STATUS_TCP_UDP_CSUM_OK = (1 <<  2),
+   MLX4_CQE_STATUS_IPV4_PKT= (1 << 22),
+   MLX4_CQE_STATUS_IP_HDR_CSUM_OK  = (1 << 28),
+   MLX4_CQE_STATUS_IPV4_CSUM_OK= MLX4_CQE_STATUS_IPV4_PKT |
+   MLX4_CQE_STATUS_IP_HDR_CSUM_OK |
+   MLX4_CQE_STATUS_TCP_UDP_CSUM_OK
+};
+
 struct mlx4_cqe {
uint32_tvlan_my_qpn;
uint32_timmed_rss_invalid;
@@ -286,7 +303,7 @@ struct mlx4_cqe {
uint8_t sl_vid;
uint8_t reserved1;
uint16_trlid;
-   uint32_treserved2;
+   uint32_tstatus;
uint32_tbyte_cnt;
uint16_twqe_index;
uint16_tchecksum;
@@ -352,6 +369,10 @@ void mlx4_free_db(struct mlx4_context *context, enum 
mlx4_db_type type, uint32_t
 
 int mlx4_query_device(struct ibv_context *context,
   struct ibv_device_attr *attr);
+int mlx4_query_device_ex(struct ibv_context *context,
+const struct ibv_query_device_ex_input *input,
+struct ibv_device_attr_ex *attr,
+size_t attr_size);
 int mlx4_query_port(struct ibv_context *context, uint8_t port,
 struct ibv_port_attr *attr);
 
diff --git a/src/qp.c b/src/qp.c
index 721bed4..057490b 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -289,12 +289,31 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
set_datagram_seg(wqe, wr);
wqe  += sizeof (struct mlx4_wqe_datagram_seg);
size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
+
+   if (wr->send_flags & IBV_SEND_IP_CSUM) {
+   if (!(qp->qp_cap_cache & 
MLX4_CSUM_SUPPORT_UD_OVER_IB)) {
+   ret = EINVAL;
+   *bad_wr = wr;
+   goto out;
+   }
+   ctrl->srcrb_flags |= 
htonl(MLX4_WQE_CTRL_IP_HDR_CSUM |
+   

[PATCH 1/2] Update ibv_create_flow/ibv_destroy_flow according to change of libibverbs

2015-09-16 Thread Bodong Wang
Signed-off-by: Bodong Wang 
---
 src/mlx4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mlx4.c b/src/mlx4.c
index 2999150..9fe8c6a 100644
--- a/src/mlx4.c
+++ b/src/mlx4.c
@@ -203,8 +203,8 @@ static int mlx4_init_context(struct verbs_device *v_device,
verbs_set_ctx_op(verbs_ctx, get_srq_num, verbs_get_srq_num);
verbs_set_ctx_op(verbs_ctx, create_qp_ex, mlx4_create_qp_ex);
verbs_set_ctx_op(verbs_ctx, open_qp, mlx4_open_qp);
-   verbs_set_ctx_op(verbs_ctx, drv_ibv_create_flow, ibv_cmd_create_flow);
-   verbs_set_ctx_op(verbs_ctx, drv_ibv_destroy_flow, ibv_cmd_destroy_flow);
+   verbs_set_ctx_op(verbs_ctx, ibv_create_flow, ibv_cmd_create_flow);
+   verbs_set_ctx_op(verbs_ctx, ibv_destroy_flow, ibv_cmd_destroy_flow);
 
return 0;
 
-- 
1.8.3.1



[patch] IB/hfi1: mask vs shift confusion

2015-09-16 Thread Dan Carpenter
We are shifting by the _MASK macros instead of the _SHIFT ones.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
index a8c903c..3a457d2 100644
--- a/drivers/staging/rdma/hfi1/sdma.c
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -1848,7 +1848,7 @@ static void dump_sdma_state(struct sdma_engine *sde)
dd_dev_err(sde->dd,
"\taidx: %u amode: %u alen: %u\n",
(u8)((desc[1] & SDMA_DESC1_HEADER_INDEX_SMASK)
-   >> SDMA_DESC1_HEADER_INDEX_MASK),
+   >> SDMA_DESC1_HEADER_INDEX_SHIFT),
(u8)((desc[1] & SDMA_DESC1_HEADER_MODE_SMASK)
>> SDMA_DESC1_HEADER_MODE_SHIFT),
(u8)((desc[1] & SDMA_DESC1_HEADER_DWS_SMASK)
@@ -1926,7 +1926,7 @@ void sdma_seqfile_dump_sde(struct seq_file *s, struct 
sdma_engine *sde)
if (desc[0] & SDMA_DESC0_FIRST_DESC_FLAG)
seq_printf(s, "\t\tahgidx: %u ahgmode: %u\n",
(u8)((desc[1] & SDMA_DESC1_HEADER_INDEX_SMASK)
-   >> SDMA_DESC1_HEADER_INDEX_MASK),
+   >> SDMA_DESC1_HEADER_INDEX_SHIFT),
(u8)((desc[1] & SDMA_DESC1_HEADER_MODE_SMASK)
>> SDMA_DESC1_HEADER_MODE_SHIFT));
head = (head + 1) & sde->sdma_mask;
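
To illustrate the class of bug fixed by this patch, here is a small stand-alone sketch. The field position and the SKETCH_ macro values are hypothetical and are not the real SDMA descriptor layout; only the naming scheme (_SHIFT, _MASK, _SMASK for the shifted mask) follows the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-bit field at bits 8..12 of a 64-bit descriptor word. */
#define SKETCH_FIELD_SHIFT 8
#define SKETCH_FIELD_MASK  UINT64_C(0x1f)
#define SKETCH_FIELD_SMASK (SKETCH_FIELD_MASK << SKETCH_FIELD_SHIFT)

/* Correct: isolate the field with the shifted mask, then shift it down. */
static uint8_t sketch_field_get(uint64_t desc)
{
	return (uint8_t)((desc & SKETCH_FIELD_SMASK) >> SKETCH_FIELD_SHIFT);
}

/* Buggy: shifts right by the mask value (31) instead of the shift (8). */
static uint8_t sketch_field_get_buggy(uint64_t desc)
{
	return (uint8_t)((desc & SKETCH_FIELD_SMASK) >> SKETCH_FIELD_MASK);
}
```

With these assumed values the buggy variant always returns 0, because every bit kept by the shifted mask lies below bit 31. The code still compiles and runs, so this kind of mistake tends to show up only as wrong values in debug output.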


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Christoph Lameter
Another approach may be to tie the unsub from sendonly multicast joins to
the expiration of the layer 2 addresses in IPoIB. F.e. add code to
 __ipoib_reap_ah() to detect if the handle was used for a sendonly
multicast join. If so unsubscribe from the MC group. This will result in
behavior consistent with address resolution and caching on IPoIB.
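
A rough sketch of that aging approach, in user-space C with hypothetical names (struct sketch_ah and sketch_reap() are illustrations only, not the actual __ipoib_reap_ah() logic): when an address handle has not been used for longer than the maximum age and it belongs to a send-only multicast join, the reaper also leaves the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address-handle entry. */
struct sketch_ah {
	unsigned long last_used;	/* time of last transmit */
	bool sendonly_mcast;		/* created for a send-only join */
	bool joined;			/* group membership still active */
};

/*
 * Walk the table and leave the group for every expired send-only
 * entry. Returns how many groups were left.
 */
static unsigned int sketch_reap(struct sketch_ah *tbl, size_t n,
				unsigned long now, unsigned long max_age)
{
	unsigned int left = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (now - tbl[i].last_used < max_age)
			continue;	/* still fresh */
		if (tbl[i].sendonly_mcast && tbl[i].joined) {
			tbl[i].joined = false;	/* unsubscribe on expiry */
			left++;
		}
	}
	return left;
}
```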



Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Christoph Lameter
On Wed, 16 Sep 2015, Or Gerlitz wrote:

> On Wed, Sep 16, 2015 at 7:31 PM, Christoph Lameter  wrote:
> > Another approach may be to tie the unsub from sendonly multicast joins to
> > the expiration of the layer 2 addresses in IPoIB. F.e. add code to
> >  __ipoib_reap_ah() to detect if the handle was used for a sendonly
> > multicast join. If so unsubscribe from the MC group. This will result in
> > behavior consistent with address resolution and caching on IPoIB.
>
> yep, Erez has the patches to do so.

Would you please share them?



[PATCH] IB/hfi: Properly set permissions for user device files

2015-09-16 Thread ira . weiny
From: Ira Weiny 

Some of the device files must be user accessible for PSM, while most should
remain accessible only by root.

Add a parameter to hfi1_cdev_init which controls whether the user should have
access to the device. User-accessible devices are placed in a separate class
with the appropriate devnode callback.

In addition, set the devnode callback for the existing class so that its
permissions are explicit.

Signed-off-by: Haralanov, Mitko 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/device.c   | 48 ++--
 drivers/staging/rdma/hfi1/device.h   |  3 ++-
 drivers/staging/rdma/hfi1/diag.c |  5 ++--
 drivers/staging/rdma/hfi1/file_ops.c |  9 ---
 4 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/device.c 
b/drivers/staging/rdma/hfi1/device.c
index 07c87a87775f..b9315d71b20c 100644
--- a/drivers/staging/rdma/hfi1/device.c
+++ b/drivers/staging/rdma/hfi1/device.c
@@ -57,11 +57,13 @@
 #include "device.h"
 
 static struct class *class;
+static struct class *user_class;
 static dev_t hfi1_dev;
 
 int hfi1_cdev_init(int minor, const char *name,
   const struct file_operations *fops,
-  struct cdev *cdev, struct device **devp)
+  struct cdev *cdev, struct device **devp,
+  bool user_accessible)
 {
const dev_t dev = MKDEV(MAJOR(hfi1_dev), minor);
struct device *device = NULL;
@@ -78,7 +80,11 @@ int hfi1_cdev_init(int minor, const char *name,
goto done;
}
 
-   device = device_create(class, NULL, dev, NULL, "%s", name);
+   if (user_accessible)
+   device = device_create(user_class, NULL, dev, NULL, "%s", name);
+   else
+   device = device_create(class, NULL, dev, NULL, "%s", name);
+
if (!IS_ERR(device))
goto done;
ret = PTR_ERR(device);
@@ -110,6 +116,26 @@ const char *class_name(void)
return hfi1_class_name;
 }
 
+static char *hfi1_devnode(struct device *dev, umode_t *mode)
+{
+   if (mode)
+   *mode = 0600;
+   return kasprintf(GFP_KERNEL, "%s", dev_name(dev));
+}
+
+static const char *hfi1_class_name_user = "hfi1_user";
+const char *class_name_user(void)
+{
+   return hfi1_class_name_user;
+}
+
+static char *hfi1_user_devnode(struct device *dev, umode_t *mode)
+{
+   if (mode)
+   *mode = 0666;
+   return kasprintf(GFP_KERNEL, "%s", dev_name(dev));
+}
+
 int __init dev_init(void)
 {
int ret;
@@ -125,7 +151,20 @@ int __init dev_init(void)
ret = PTR_ERR(class);
pr_err("Could not create device class (err %d)\n", -ret);
unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
+   goto done;
}
+   class->devnode = hfi1_devnode;
+
+   user_class = class_create(THIS_MODULE, class_name_user());
+   if (IS_ERR(user_class)) {
+   ret = PTR_ERR(user_class);
+   pr_err("Could not create device class for user accessible files (err %d)\n",
+  -ret);
+   class_destroy(class);
+   class = NULL;
+   unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
+   goto done;
+   }
+   user_class->devnode = hfi1_user_devnode;
 
 done:
return ret;
@@ -138,5 +177,10 @@ void dev_cleanup(void)
class = NULL;
}
 
+   if (user_class) {
+   class_destroy(user_class);
+   user_class = NULL;
+   }
+
unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
 }
diff --git a/drivers/staging/rdma/hfi1/device.h b/drivers/staging/rdma/hfi1/device.h
index 98caecd3d807..2850ff739d81 100644
--- a/drivers/staging/rdma/hfi1/device.h
+++ b/drivers/staging/rdma/hfi1/device.h
@@ -52,7 +52,8 @@
 
 int hfi1_cdev_init(int minor, const char *name,
   const struct file_operations *fops,
-  struct cdev *cdev, struct device **devp);
+  struct cdev *cdev, struct device **devp,
+  bool user_accessible);
 void hfi1_cdev_cleanup(struct cdev *cdev, struct device **devp);
 const char *class_name(void);
 int __init dev_init(void);
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 6777d6b659cf..b87e4e942ae6 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -292,7 +292,7 @@ int hfi1_diag_add(struct hfi1_devdata *dd)
if (atomic_inc_return(_count) == 1) {
ret = hfi1_cdev_init(HFI1_DIAGPKT_MINOR, name,
 _file_ops, _cdev,
-_device);
+_device, false);
}
 
return ret;
@@ -592,7 +592,8 @@ static int hfi1_snoop_add(struct hfi1_devdata *dd, const char *name)
 
ret = 

Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Or Gerlitz
On Wed, Sep 16, 2015 at 11:17 PM, Christoph Lameter wrote:
> On Wed, 16 Sep 2015, Or Gerlitz wrote:
>
>> On Wed, Sep 16, 2015 at 7:31 PM, Christoph Lameter wrote:
>> > Another approach may be to tie the unsubscribe from sendonly multicast
>> > joins to the expiration of the layer 2 addresses in IPoIB. For example,
>> > add code to __ipoib_reap_ah() to detect if the handle was used for a
>> > sendonly multicast join. If so, unsubscribe from the MC group. This will
>> > result in behavior consistent with address resolution and caching on IPoIB.
>>
>> yep, Erez has the patches to do so.
>
> Would you please share them?

I will check with him tomorrow. I think he's pretty busy and hence
hasn't participated in these threads so far; we'll see.

Could you please post here a short (say 2-4 line) summary of what is
still missing or done wrong in 4.3-rc1, and how you suggest resolving
it?

The amount of text in Doug's IPoIB patch that went into rc1, plus what
has been written so far in this thread, is just too much for a busy
Erez and me to grasp and act on; we need your help summing this
up...

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Or Gerlitz
On Wed, Sep 16, 2015 at 7:31 PM, Christoph Lameter wrote:
> Another approach may be to tie the unsubscribe from sendonly multicast
> joins to the expiration of the layer 2 addresses in IPoIB. For example,
> add code to __ipoib_reap_ah() to detect if the handle was used for a
> sendonly multicast join. If so, unsubscribe from the MC group. This will
> result in behavior consistent with address resolution and caching on IPoIB.

yep, Erez has the patches to do so.
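
For readers following the thread, the approach quoted above might look
roughly like the sketch below. It is a self-contained userspace model, not
IPoIB code and not Erez's patches: struct model_ah, its fields,
model_mcast_leave() and model_reap_ah() are all hypothetical names, and the
real __ipoib_reap_ah() works on driver structures that are not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an address handle awaiting reaping. */
struct model_ah {
	bool expired;        /* the layer 2 address entry has timed out */
	bool sendonly_mcast; /* handle was created by a sendonly multicast join */
	bool joined;         /* still subscribed to the multicast group */
};

/* Stand-in for asking the subnet manager to drop the group membership. */
static void model_mcast_leave(struct model_ah *ah)
{
	ah->joined = false;
}

/*
 * Walk the handles and, for every expired one that came from a sendonly
 * join, leave the multicast group. Returns how many groups were left.
 */
static size_t model_reap_ah(struct model_ah *ahs, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ahs[i].expired)
			continue; /* address still cached, keep the membership */
		if (ahs[i].sendonly_mcast && ahs[i].joined) {
			model_mcast_leave(&ahs[i]);
			left++;
		}
	}
	return left;
}
```

The point of the model is only that the leave is driven by the same expiry
that already governs address caching, so no separate timer is needed.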

Or.


Re: [PATCH for-4.3] IB/ipoib: add module option for auto-creating mcast groups

2015-09-16 Thread Christoph Lameter
On Wed, 16 Sep 2015, Or Gerlitz wrote:

> Could you please post here a short (say 2-4 line) summary of what is
> still missing or done wrong in 4.3-rc1, and how you suggest resolving
> it?

With Doug's patch here, the only thing that is left to be done is to
properly leave the multicast group. And it seems that Erez's patch does just
that.

And then there are the 20 other things that I have pending with Mellanox
but those are different issues that do not belong here. This one is a
critical bug for us.



RE: [PATCH 1/3] IB/core: Add support of checksum capability reporting in ib verbs

2015-09-16 Thread Bodong Wang
For RX: if the QP type is not supported, the device does not validate the
csum, but packets are still received normally.
For TX: if the QP type is not supported for csum calculation and the user
application sets the IBV_SEND_IP_CSUM flag, an error is returned.
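
The two rules above can be sketched as follows. This is a minimal model
under stated assumptions, not the code from the patch: the capability names
come from the patch description, but their bit values, enum qp_kind,
SEND_IP_CSUM (standing in for IBV_SEND_IP_CSUM), the choice of -EINVAL as
the error, and both helper functions are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits named in the patch description; the values are assumed. */
enum {
	IB_CSUM_SUPPORT_UD  = 1 << 0,
	IB_CSUM_SUPPORT_RAW = 1 << 1,
};

/* Hypothetical stand-ins for the QP type and the IBV_SEND_IP_CSUM flag. */
enum qp_kind { QP_KIND_UD, QP_KIND_RAW, QP_KIND_OTHER };
#define SEND_IP_CSUM (1u << 4)

/* Does the capability mask cover checksum offload for this QP type? */
static bool csum_supported(uint32_t csum_cap, enum qp_kind kind)
{
	switch (kind) {
	case QP_KIND_UD:
		return (csum_cap & IB_CSUM_SUPPORT_UD) != 0;
	case QP_KIND_RAW:
		return (csum_cap & IB_CSUM_SUPPORT_RAW) != 0;
	default:
		return false;
	}
}

/* TX: asking for checksum offload on an unsupported QP type is an error. */
static int check_send_flags(uint32_t csum_cap, enum qp_kind kind,
			    uint32_t send_flags)
{
	if ((send_flags & SEND_IP_CSUM) && !csum_supported(csum_cap, kind))
		return -EINVAL; /* assumed error code */
	return 0;
}

/*
 * RX: the packet is delivered either way; this only reports whether the
 * device validated its checksum.
 */
static bool rx_csum_validated(uint32_t csum_cap, enum qp_kind kind)
{
	return csum_supported(csum_cap, kind);
}
```

With a mask of only IB_CSUM_SUPPORT_UD, a send on a raw QP with SEND_IP_CSUM
set is rejected, while receives on that QP still succeed without validation.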

-----Original Message-----
From: Christoph Lameter [mailto:c...@linux.com] 
Sent: Wednesday, September 16, 2015 12:07 PM
To: Bodong Wang
Cc: dledf...@redhat.com; linux-rdma@vger.kernel.org; Bodong Wang; Or Gerlitz; 
jguntho...@obsidianresearch.com; Moshe Lazer; Haggai Eran; Matan Barak
Subject: Re: [PATCH 1/3] IB/core: Add support of checksum capability reporting 
in ib verbs

On Wed, 16 Sep 2015, Bodong Wang wrote:

> A new field csum_cap is added to both ib_query_device. It contains two 
> members:
> eth_csum_cap and ib_csum_cap, which indicate the checksum capability of the 
> Ethernet and InfiniBand link layers respectively for different QP types.
>
> Current checksum caps use the following enum members:
> - IB_CSUM_SUPPORT_UD: device supports validation/calculation of csum for UD QP.
> - IB_CSUM_SUPPORT_RAW: device supports validation/calculation of csum for raw QP.

A combination? Is it possible then to also support calculation without 
validation? Maybe we want to receive packets that do have invalid checksums.
