Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-14 Thread Chuck Lever

> On Dec 9, 2015, at 8:45 PM, ira.weiny  wrote:
> 
> On Wed, Dec 09, 2015 at 10:42:35AM -0800, Christoph Hellwig wrote:
>> On Tue, Dec 08, 2015 at 07:52:03PM -0500, ira.weiny wrote:
>>> Searching patchworks...
>>> 
>>> I'm a bit worried about the size of the patch and I would like to see it
>>> split up for review.  But I agree Christoph's method is better long term.
>> 
>> I'd be happy to split it up if I could see a way to split it.  So if
>> anyone has an idea you're welcome!
> 
> Well this is a ~3300 line patch which is pretty hard to review in total.
> 
>> 
>>> Christoph, do you have this on github somewhere?  Perhaps it is split but I'm
>>> not finding it on patchworks?
>> 
>> No need for github, we have much better (and older) git hosting sites :)
>> 
>> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr

Tested-by: Chuck Lever 

With NFS/RDMA client and server.


> Another nice side effect of this patch is to get rid of all the struct
> ib_device_attr allocations which are littered all over the ULPs.
> 
> For the core, srp, ipoib, qib, hfi1 bits.  Generally the rest looks fine, I
> just did not have time to really go through it line by line.
> 
> Reviewed-by: Ira Weiny 
> 
> Doug, this is going to conflict with the rdmavt work.  So if you take this,
> could you respond on the list?


--
Chuck Lever
chuckle...@gmail.com
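
As context for the Tested-by above: the series converts ULPs from allocating
and querying a separate ib_device_attr to reading attributes cached on the
ib_device itself. A minimal before/after sketch in C (the exact post-series
field name, dev->attrs, is an assumption based on the tree linked above):

	/* Before: each ULP allocates and queries its own attribute copy. */
	struct ib_device_attr *attr;
	int max_sge, ret;

	attr = kmalloc(sizeof(*attr), GFP_KERNEL);
	if (!attr)
		return -ENOMEM;
	ret = ib_query_device(device, attr);
	if (!ret)
		max_sge = attr->max_sge;
	kfree(attr);

	/* After: attributes live on the ib_device; no allocation or query. */
	max_sge = device->attrs.max_sge;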





RE: [PATCH 1/3] staging/rdma/hfi1: consolidate kmalloc_array+memset into kcalloc

2015-12-14 Thread Marciniszyn, Mike
> --- a/drivers/staging/rdma/hfi1/chip.c
> +++ b/drivers/staging/rdma/hfi1/chip.c
> @@ -10128,8 +10128,7 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
>   goto bail;
>   if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
>   goto bail;
> - rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> - memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));
> + rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
>   /* init the local copy of the table */
>   for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
>   unsigned tctxt;
> --

I'm NAKing this.

There is a chip-specific difference that accounts for the current code.

Mike
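
The chip-specific difference behind the NAK is the fill value: init_qos()
fills the map with a revision-dependent byte, so a zeroing allocator is only
equivalent on A0 parts. From the code being patched (see the variable removed
by patch 3/3 below):

	u8 rxcontext = is_a0(dd) ? 0 : 0xff;  /* 0 is default if a0 ver. */

	rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
	memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));

	/* kcalloc() always zeroes, which silently changes behavior on
	 * non-A0 parts, where the table must start out as all 0xff.
	 */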


RE: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-14 Thread Christoph Lameter
On Mon, 14 Dec 2015, Weiny, Ira wrote:

> > How about
> >
> > ipoib_check_and_add_mcast_sendonly()
>
> Better.

Fixup patch:


Subject: ipoib: Fix up naming of ipoib_check_and_add_mcast_sendonly

Signed-off-by: Christoph Lameter 

Index: linux/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib.h
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -549,7 +549,7 @@ void ipoib_path_iter_read(struct ipoib_p
 int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
   union ib_gid *mgid, int set_qkey);
 void ipoib_mcast_remove_list(struct net_device *dev, struct list_head *remove_list);
-void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
+void ipoib_check_and_add_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
struct list_head *remove_list);

 int ipoib_init_qp(struct net_device *dev);
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1179,7 +1179,7 @@ static void __ipoib_reap_neigh(struct ip
/* was the neigh idle for two GC periods */
if (time_after(neigh_obsolete, neigh->alive)) {

-   ipoib_check_mcast_sendonly(priv, neigh->daddr + 4, &remove_list);
+   ipoib_check_and_add_mcast_sendonly(priv, neigh->daddr + 4, &remove_list);

rcu_assign_pointer(*np,
   rcu_dereference_protected(neigh->hnext,
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -708,7 +708,7 @@ static int ipoib_mcast_leave(struct net_
  * Check if the multicast group is sendonly. If so remove it from the maps
  * and add to the remove list
  */
-void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
+void ipoib_check_and_add_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
struct list_head *remove_list)
 {
/* Is this multicast ? */


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Matan Barak
On Mon, Dec 14, 2015 at 4:55 PM, Christoph Lameter  wrote:
> On Mon, 14 Dec 2015, Matan Barak wrote:
>
>> > +static PORT_PMA_ATTR(unicast_rcv_packets   ,  0, 64, 384, IB_PMA_PORT_COUNTERS_EXT);
>> > +static PORT_PMA_ATTR(multicast_xmit_packets,  0, 64, 448, IB_PMA_PORT_COUNTERS_EXT);
>> > +static PORT_PMA_ATTR(multicast_rcv_packets ,  0, 64, 512, IB_PMA_PORT_COUNTERS_EXT);
>> >
>>
>> Why do we use 0 as the counter argument for all EXT counters?
>
> No idea what the counter is doing. Saw another EXT counter implementation
> use 0 so I thought that was fine.

It seems like a counter index, but I might be wrong. If it is,
don't we want to preserve the existing non-EXT schema for the new
counters too?


Re: [PATCH 37/37] IB/rdmavt: Add support for new memory registration API

2015-12-14 Thread Sagi Grimberg

Hi Dennis and Ira,

This question is not directly related to this patch, but given that
this is a copy-paste from the qib driver I'll go ahead and take it
anyway. How does qib (and rvt now) do memory key invalidation? I didn't
see any reference to IB_WR_LOCAL_INV anywhere in the qib driver...

What am I missing?


ping?


RE: [PATCH 2/3] staging/rdma/hfi1: check return value of kcalloc

2015-12-14 Thread Marciniszyn, Mike
> @@ -10129,6 +10129,9 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
>   if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
>   goto bail;
>   rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> + if (!rsmmap)
> + goto bail;
> +

I checked out a linux-next remote at the next-20151214 tag.

The allocation method is clearly kmalloc_array(), not kcalloc().

Where are you seeing the kcalloc()?

While it is tempting to allocate and zero, there is a chip-rev-specific difference.

Mike


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Christoph Lameter
On Mon, 14 Dec 2015, Matan Barak wrote:

> > No idea what the counter is doing. Saw another EXT counter implementation
> > use 0 so I thought that was fine.
>
> It seems like a counter index, but I might be wrong. If it is,
> don't we want to preserve the existing non-EXT schema for the new
> counters too?

I do not see any use of that field so I am not sure what to put in there.
Could it be obsolete?
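
For reference, the pre-patch macro packs its arguments into a single sysfs
attribute index; this sketch of it is from memory, so treat the details as an
assumption:

	#define PORT_PMA_ATTR(_name, _counter, _width, _offset)		   \
	struct port_table_attribute port_pma_attr_##_name = {		   \
		.attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),   \
		.index = (_offset) | ((_width) << 16) | ((_counter) << 24) \
	}

show_pma_counter() appears to decode only the offset and width bits, which
would explain why no use of the counter field is visible.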



[PATCH V2 for-next 6/7] IB/uverbs: Introduce RWQ Indirection table

2015-12-14 Thread Yishai Hadas
Introduce RWQ indirection table uverbs commands, it includes:
create, destroy.

Signed-off-by: Yishai Hadas 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/core/uverbs.h  |   3 +
 drivers/infiniband/core/uverbs_cmd.c  | 190 ++
 drivers/infiniband/core/uverbs_main.c |  13 +++
 include/rdma/ib_verbs.h   |   1 +
 include/uapi/rdma/ib_user_verbs.h |  26 +
 5 files changed, 233 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index a0b1ee7..226a894 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -186,6 +186,7 @@ extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
 extern struct idr ib_uverbs_rule_idr;
 extern struct idr ib_uverbs_wq_idr;
+extern struct idr ib_uverbs_rwq_ind_tbl_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -282,5 +283,7 @@ IB_UVERBS_DECLARE_EX_CMD(create_qp);
 IB_UVERBS_DECLARE_EX_CMD(create_wq);
 IB_UVERBS_DECLARE_EX_CMD(modify_wq);
 IB_UVERBS_DECLARE_EX_CMD(destroy_wq);
+IB_UVERBS_DECLARE_EX_CMD(create_rwq_ind_table);
+IB_UVERBS_DECLARE_EX_CMD(destroy_rwq_ind_table);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e0dd5da..04742fa 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -58,6 +58,7 @@ static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
 static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 static struct uverbs_lock_class wq_lock_class = { .name = "WQ-uobj" };
+static struct uverbs_lock_class rwq_ind_table_lock_class = { .name = "IND_TBL-uobj" };
 
 /*
  * The ib_uobject locking scheme is as follows:
@@ -339,6 +340,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
INIT_LIST_HEAD(&ucontext->srq_list);
INIT_LIST_HEAD(&ucontext->ah_list);
+   INIT_LIST_HEAD(&ucontext->rwq_ind_tbl_list);
INIT_LIST_HEAD(&ucontext->xrcd_list);
INIT_LIST_HEAD(&ucontext->rule_list);
INIT_LIST_HEAD(>rule_list);
rcu_read_lock();
@@ -3278,6 +3280,194 @@ int ib_uverbs_ex_modify_wq(struct ib_uverbs_file *file,
return ret;
 }
 
+int ib_uverbs_ex_create_rwq_ind_table(struct ib_uverbs_file *file,
+ struct ib_device *ib_dev,
+ struct ib_udata *ucore,
+ struct ib_udata *uhw)
+{
+   struct ib_uverbs_ex_create_rwq_ind_table  cmd;
+   struct ib_uverbs_ex_create_rwq_ind_table_resp   resp;
+   struct ib_uobject *uobj;
+   int err = 0;
+   struct ib_rwq_ind_table_init_attr init_attr;
+   struct ib_rwq_ind_table *rwq_ind_tbl;
+   struct ib_wq**wqs = NULL;
+   u32 *wqs_handles = NULL;
+   struct ib_wq*wq = NULL;
+   int i, j, num_read_wqs;
+   u32 num_wq_handles;
+   u32 expected_in_size;
+
+   if (ucore->inlen < sizeof(cmd))
+   return -EINVAL;
+
+   if (ucore->outlen < sizeof(resp))
+   return -ENOSPC;
+
+   err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
+   if (err)
+   return err;
+
+   ucore->inbuf += sizeof(cmd);
+   ucore->inlen -= sizeof(cmd);
+
+   if (cmd.comp_mask)
+   return -EINVAL;
+
+   if (cmd.log_ind_tbl_size > IB_USER_VERBS_MAX_LOG_IND_TBL_SIZE)
+   return -EINVAL;
+
+   num_wq_handles = 1 << cmd.log_ind_tbl_size;
+   expected_in_size = num_wq_handles * sizeof(__u32);
+   if (num_wq_handles == 1)
+   /* input size for wq handles is u64 aligned */
+   expected_in_size += sizeof(__u32);
+
+   if (ucore->inlen != expected_in_size)
+   return -EINVAL;
+
+   wqs_handles = kcalloc(num_wq_handles, sizeof(*wqs_handles),
+ GFP_KERNEL);
+   if (!wqs_handles)
+   return -ENOMEM;
+
+   err = ib_copy_from_udata(wqs_handles, ucore,
+num_wq_handles * sizeof(__u32));
+   if (err)
+   goto err_free;
+
+   wqs = kcalloc(num_wq_handles, sizeof(*wqs), GFP_KERNEL);
+   if (!wqs) {
+   err = -ENOMEM;
+   goto err_free;
+   }
+
+   for (num_read_wqs = 0; num_read_wqs < num_wq_handles;
+   num_read_wqs++) {
+   wq = idr_read_wq(wqs_handles[num_read_wqs], file->ucontext);
+   if (!wq) {
+   err = -EINVAL;
+   goto put_wqs;
+   }
+
+   wqs[num_read_wqs] = wq;
+   }
+
+   uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
+   if (!uobj) {
+   err = -ENOMEM;
+   goto put_wqs;
+   }
+
+   init_uobj(uobj, 0, file->ucontext, &rwq_ind_table_lock_class);
+   down_write(&uobj->mutex);
+   memset(&init_attr, 0, sizeof(init_attr));
+   

[PATCH V2 for-next 5/7] IB: Introduce Receive Work Queue indirection table

2015-12-14 Thread Yishai Hadas
Introduce Receive Work Queue indirection table.
This object can be used to spread incoming traffic to different
receive Work Queues.

Signed-off-by: Yishai Hadas 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/core/verbs.c | 66 +
 include/rdma/ib_verbs.h | 24 +++
 2 files changed, 90 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 576c65d..e2951145 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1522,6 +1522,72 @@ int ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
 }
 EXPORT_SYMBOL(ib_modify_wq);
 
+/*
+ * ib_create_rwq_ind_table - Creates a RQ Indirection Table.
+ * @device: The device on which to create the rwq indirection table.
+ * @ib_rwq_ind_table_init_attr: A list of initial attributes required to
+ * create the Indirection Table.
+ *
+ * Note: The life time of ib_rwq_ind_table_init_attr->ind_tbl is not less
+ * than the created ib_rwq_ind_table object and the caller is responsible
+ * for its memory allocation/free.
+ */
+struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
+                                                struct ib_rwq_ind_table_init_attr *init_attr)
+{
+   struct ib_rwq_ind_table *rwq_ind_table;
+   int i;
+   u32 table_size;
+
+   if (!device->create_rwq_ind_table)
+   return ERR_PTR(-ENOSYS);
+
+   table_size = (1 << init_attr->log_ind_tbl_size);
+   rwq_ind_table = device->create_rwq_ind_table(device,
+   init_attr, NULL);
+   if (IS_ERR(rwq_ind_table))
+   return rwq_ind_table;
+
+   rwq_ind_table->ind_tbl = init_attr->ind_tbl;
+   rwq_ind_table->log_ind_tbl_size = init_attr->log_ind_tbl_size;
+   rwq_ind_table->device = device;
+   rwq_ind_table->uobject = NULL;
+   atomic_set(&rwq_ind_table->usecnt, 0);
+
+   for (i = 0; i < table_size; i++)
+   atomic_inc(&rwq_ind_table->ind_tbl[i]->usecnt);
+
+   return rwq_ind_table;
+}
+EXPORT_SYMBOL(ib_create_rwq_ind_table);
+
+/*
+ * ib_destroy_rwq_ind_table - Destroys the specified Indirection Table.
+ * @wq_ind_table: The Indirection Table to destroy.
+ */
+int ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *rwq_ind_table)
+{
+   int err, i;
+   u32 table_size = (1 << rwq_ind_table->log_ind_tbl_size);
+   struct ib_wq **ind_tbl = rwq_ind_table->ind_tbl;
+
+   if (!rwq_ind_table->device->destroy_rwq_ind_table)
+   return -ENOSYS;
+
+   if (atomic_read(&rwq_ind_table->usecnt))
+   return -EBUSY;
+
+   err = rwq_ind_table->device->destroy_rwq_ind_table(rwq_ind_table);
+   if (!err) {
+   for (i = 0; i < table_size; i++)
+   atomic_dec(&ind_tbl[i]->usecnt);
+   }
+
+   return err;
+}
+EXPORT_SYMBOL(ib_destroy_rwq_ind_table);
+
 struct ib_flow *ib_create_flow(struct ib_qp *qp,
   struct ib_flow_attr *flow_attr,
   int domain)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a3243ad..10107a8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1379,6 +1379,21 @@ struct ib_wq_attr {
 enum ib_wq_state curr_wq_state;
 };
 
+struct ib_rwq_ind_table {
+   struct ib_device*device;
+   struct ib_uobject  *uobject;
+   atomic_tusecnt;
+   u32 ind_tbl_num;
+   u32 log_ind_tbl_size;
+   struct ib_wq**ind_tbl;
+};
+
+struct ib_rwq_ind_table_init_attr {
+   u32 log_ind_tbl_size;
+   /* Each entry is a pointer to Receive Work Queue */
+   struct ib_wq**ind_tbl;
+};
+
 struct ib_qp {
struct ib_device   *device;
struct ib_pd   *pd;
@@ -1851,6 +1866,11 @@ struct ib_device {
struct ib_wq_attr *attr,
u32 wq_attr_mask,
struct ib_udata *udata);
+   struct ib_rwq_ind_table *  (*create_rwq_ind_table)(struct ib_device *device,
+                                                      struct ib_rwq_ind_table_init_attr *init_attr,
+                                                      struct ib_udata *udata);
+   int                        (*destroy_rwq_ind_table)(struct ib_rwq_ind_table *wq_ind_table);
+
 
struct ib_dma_mapping_ops   *dma_ops;
 
@@ -3082,6 +3102,10 @@ struct ib_wq *ib_create_wq(struct ib_pd *pd,
 int ib_destroy_wq(struct ib_wq *wq);
 int ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *attr,
 u32 wq_attr_mask);
+struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
+                                                struct ib_rwq_ind_table_init_attr *init_attr);

[PATCH V2 for-next 1/7] net/mlx5_core: Expose transobj APIs from mlx5 core

2015-12-14 Thread Yishai Hadas
Move transobj.h from the core library to include/linux/mlx5
to enable using its functionality outside of mlx5 core.

Signed-off-by: Yishai Hadas 
Signed-off-by: Eli Cohen 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/srq.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/transobj.c |  8 ++-
 drivers/net/ethernet/mellanox/mlx5/core/transobj.h | 72 -
 include/linux/mlx5/transobj.h  | 73 ++
 5 files changed, 82 insertions(+), 75 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/transobj.h
 create mode 100644 include/linux/mlx5/transobj.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 22e72bf..034ccab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -36,8 +36,8 @@
 #include 
 #include 
 #include 
+#include <linux/mlx5/transobj.h>
 #include "wq.h"
-#include "transobj.h"
 #include "mlx5_core.h"
 
 #define MLX5E_MAX_NUM_TC   8
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
index ffada80..16cbc37 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
@@ -35,9 +35,9 @@
 #include 
 #include 
 #include 
+#include <linux/mlx5/transobj.h>
 #include 
 #include "mlx5_core.h"
-#include "transobj.h"
 
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/transobj.c b/drivers/net/ethernet/mellanox/mlx5/core/transobj.c
index d7068f5..a4254e1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/transobj.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/transobj.c
@@ -30,9 +30,10 @@
  * SOFTWARE.
  */
 
+#include 
 #include 
+#include 
 #include "mlx5_core.h"
-#include "transobj.h"
 
 int mlx5_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn)
 {
@@ -83,6 +84,7 @@ int mlx5_core_create_rq(struct mlx5_core_dev *dev, u32 *in, int inlen, u32 *rqn)
 
return err;
 }
+EXPORT_SYMBOL(mlx5_core_create_rq);
 
 int mlx5_core_modify_rq(struct mlx5_core_dev *dev, u32 rqn, u32 *in, int inlen)
 {
@@ -94,6 +96,7 @@ int mlx5_core_modify_rq(struct mlx5_core_dev *dev, u32 rqn, u32 *in, int inlen)
memset(out, 0, sizeof(out));
return mlx5_cmd_exec_check_status(dev, in, inlen, out, sizeof(out));
 }
+EXPORT_SYMBOL(mlx5_core_modify_rq);
 
 void mlx5_core_destroy_rq(struct mlx5_core_dev *dev, u32 rqn)
 {
@@ -107,6 +110,7 @@ void mlx5_core_destroy_rq(struct mlx5_core_dev *dev, u32 rqn)
 
mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, sizeof(out));
 }
+EXPORT_SYMBOL(mlx5_core_destroy_rq);
 
 int mlx5_core_create_sq(struct mlx5_core_dev *dev, u32 *in, int inlen, u32 *sqn)
 {
@@ -386,6 +390,7 @@ int mlx5_core_create_rqt(struct mlx5_core_dev *dev, u32 *in, int inlen,
 
return err;
 }
+EXPORT_SYMBOL(mlx5_core_create_rqt);
 
 int mlx5_core_modify_rqt(struct mlx5_core_dev *dev, u32 rqtn, u32 *in,
 int inlen)
@@ -411,3 +416,4 @@ void mlx5_core_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn)
 
mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, sizeof(out));
 }
+EXPORT_SYMBOL(mlx5_core_destroy_rqt);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/transobj.h b/drivers/net/ethernet/mellanox/mlx5/core/transobj.h
deleted file mode 100644
index 74cae51..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/transobj.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN

[PATCH V2 for-next 4/7] IB/mlx5: Add receive Work Queue verbs

2015-12-14 Thread Yishai Hadas
QP can be created without internal WQs "packaged" inside it,
this QP can be configured to use "external" WQ object as its
receive/send queue.

WQ is a necessary component for RSS technology since RSS mechanism
is supposed to distribute the traffic between multiple
Receive Work Queues

Add receive Work Queue verbs, it includes:
creation, modification and destruction.

Signed-off-by: Yishai Hadas 
Signed-off-by: Eli Cohen 
---
 drivers/infiniband/hw/mlx5/main.c|   9 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  35 +
 drivers/infiniband/hw/mlx5/qp.c  | 263 +++
 drivers/infiniband/hw/mlx5/user.h|  15 ++
 4 files changed, 322 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7e97cb5..61e9a50 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1439,6 +1439,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
(1ull << IB_USER_VERBS_CMD_CLOSE_XRCD);
}
 
+   if (MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) {
+   dev->ib_dev.create_wq   = mlx5_ib_create_wq;
+   dev->ib_dev.modify_wq   = mlx5_ib_modify_wq;
+   dev->ib_dev.destroy_wq  = mlx5_ib_destroy_wq;
+   dev->ib_dev.uverbs_ex_cmd_mask |=
+   (1ull << IB_USER_VERBS_EX_CMD_CREATE_WQ) |
+   (1ull << IB_USER_VERBS_EX_CMD_MODIFY_WQ) |
+   (1ull << IB_USER_VERBS_EX_CMD_DESTROY_WQ);
+   }
err = init_node_data(dev);
if (err)
goto err_dealloc;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6333472..2418b91 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -142,12 +142,36 @@ struct mlx5_ib_wq {
void   *qend;
 };
 
+struct mlx5_ib_rwq {
+   struct ib_wqibwq;
+   u32 rqn;
+   u32 rq_num_pas;
+   u32 log_rq_stride;
+   u32 log_rq_size;
+   u32 rq_page_offset;
+   u32 log_page_size;
+   struct ib_umem  *umem;
+   size_t  buf_size;
+   unsigned intpage_shift;
+   int create_type;
+   struct mlx5_db  db;
+   u32 user_index;
+   u32 wqe_count;
+   u32 wqe_shift;
+   int wq_sig;
+};
+
 enum {
MLX5_QP_USER,
MLX5_QP_KERNEL,
MLX5_QP_EMPTY
 };
 
+enum {
+   MLX5_WQ_USER,
+   MLX5_WQ_KERNEL
+};
+
 /*
  * Connect-IB can trigger up to four concurrent pagefaults
  * per-QP.
@@ -478,6 +502,11 @@ static inline struct mlx5_ib_qp *to_mqp(struct ib_qp *ibqp)
return container_of(ibqp, struct mlx5_ib_qp, ibqp);
 }
 
+static inline struct mlx5_ib_rwq *to_mrwq(struct ib_wq *ibwq)
+{
+   return container_of(ibwq, struct mlx5_ib_rwq, ibwq);
+}
+
 static inline struct mlx5_ib_srq *to_mibsrq(struct mlx5_core_srq *msrq)
 {
return container_of(msrq, struct mlx5_ib_srq, msrq);
@@ -604,6 +633,12 @@ int mlx5_mr_ib_cont_pages(struct ib_umem *umem, u64 addr, int *count, int *shift
 void mlx5_umr_cq_handler(struct ib_cq *cq, void *cq_context);
 int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
struct ib_mr_status *mr_status);
+struct ib_wq *mlx5_ib_create_wq(struct ib_pd *pd,
+   struct ib_wq_init_attr *init_attr,
+   struct ib_udata *udata);
+int mlx5_ib_destroy_wq(struct ib_wq *wq);
+int mlx5_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
+ u32 wq_attr_mask, struct ib_udata *udata);
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 extern struct workqueue_struct *mlx5_ib_page_fault_wq;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 307bdbc..2179bd0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -32,6 +32,7 @@
 
 #include 
 #include 
+#include 
 #include "mlx5_ib.h"
 #include "user.h"
 
@@ -590,6 +591,71 @@ static int uuarn_to_uar_index(struct mlx5_uuar_info *uuari, int uuarn)
return uuari->uars[uuarn / MLX5_BF_REGS_PER_PAGE].index;
 }
 
+static void destroy_user_rq(struct ib_pd *pd, struct mlx5_ib_rwq *rwq)
+{
+   struct mlx5_ib_ucontext *context;
+
+   context = to_mucontext(pd->uobject->context);
+   mlx5_ib_db_unmap_user(context, &rwq->db);
+   if (rwq->umem)
+   ib_umem_release(rwq->umem);
+}
+
+static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd,
+ struct mlx5_ib_rwq *rwq,
+ struct 

[PATCH V2 for-next 7/7] IB/mlx5: Add Receive Work Queue Indirection table operations

2015-12-14 Thread Yishai Hadas
Add Receive Work Queue Indirection table operations, it includes:
create, destroy.

Signed-off-by: Yishai Hadas 
Signed-off-by: Eli Cohen 
---
 drivers/infiniband/hw/mlx5/main.c|  6 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 14 +
 drivers/infiniband/hw/mlx5/qp.c  | 56 
 3 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 61e9a50..e549c4e7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1446,7 +1446,11 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.uverbs_ex_cmd_mask |=
(1ull << IB_USER_VERBS_EX_CMD_CREATE_WQ) |
(1ull << IB_USER_VERBS_EX_CMD_MODIFY_WQ) |
-   (1ull << IB_USER_VERBS_EX_CMD_DESTROY_WQ);
+   (1ull << IB_USER_VERBS_EX_CMD_DESTROY_WQ) |
+   (1ull << IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL) |
+   (1ull << IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL);
+   dev->ib_dev.create_rwq_ind_table = mlx5_ib_create_rwq_ind_table;
+   dev->ib_dev.destroy_rwq_ind_table = mlx5_ib_destroy_rwq_ind_table;
}
err = init_node_data(dev);
if (err)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2418b91..0bedfa2 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -172,6 +172,11 @@ enum {
MLX5_WQ_KERNEL
 };
 
+struct mlx5_ib_rwq_ind_table {
+   struct ib_rwq_ind_table ib_rwq_ind_tbl;
+   u32 rqtn;
+};
+
 /*
  * Connect-IB can trigger up to four concurrent pagefaults
  * per-QP.
@@ -507,6 +512,11 @@ static inline struct mlx5_ib_rwq *to_mrwq(struct ib_wq *ibwq)
return container_of(ibwq, struct mlx5_ib_rwq, ibwq);
 }
 
+static inline struct mlx5_ib_rwq_ind_table *to_mrwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl)
+{
+   return container_of(ib_rwq_ind_tbl, struct mlx5_ib_rwq_ind_table, ib_rwq_ind_tbl);
+}
+
 static inline struct mlx5_ib_srq *to_mibsrq(struct mlx5_core_srq *msrq)
 {
return container_of(msrq, struct mlx5_ib_srq, msrq);
@@ -639,6 +649,10 @@ struct ib_wq *mlx5_ib_create_wq(struct ib_pd *pd,
 int mlx5_ib_destroy_wq(struct ib_wq *wq);
 int mlx5_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
  u32 wq_attr_mask, struct ib_udata *udata);
+struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
+                                                      struct ib_rwq_ind_table_init_attr *init_attr,
+                                                      struct ib_udata *udata);
+int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 extern struct workqueue_struct *mlx5_ib_page_fault_wq;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 2179bd0..f0ac9fa 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3387,6 +3387,62 @@ int mlx5_ib_destroy_wq(struct ib_wq *wq)
return 0;
 }
 
+struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
+                                                      struct ib_rwq_ind_table_init_attr *init_attr,
+                                                      struct ib_udata *udata)
+{
+   struct mlx5_ib_dev *dev = to_mdev(device);
+   struct mlx5_ib_rwq_ind_table *rwq_ind_tbl;
+   int sz = 1 << init_attr->log_ind_tbl_size;
+   int inlen;
+   int err;
+   int i;
+   u32 *in;
+   void *rqtc;
+
+   rwq_ind_tbl = kzalloc(sizeof(*rwq_ind_tbl), GFP_KERNEL);
+   if (!rwq_ind_tbl)
+   return ERR_PTR(-ENOMEM);
+
+   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + sizeof(u32) * sz;
+   in = mlx5_vzalloc(inlen);
+   if (!in) {
+   err = -ENOMEM;
+   goto err;
+   }
+
+   rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+
+   MLX5_SET(rqtc, rqtc, rqt_actual_size, sz);
+   MLX5_SET(rqtc, rqtc, rqt_max_size, sz);
+
+   for (i = 0; i < sz; i++)
+   MLX5_SET(rqtc, rqtc, rq_num[i], init_attr->ind_tbl[i]->wq_num);
+
+   err = mlx5_core_create_rqt(dev->mdev, in, inlen, &rwq_ind_tbl->rqtn);
+   kvfree(in);
+
+   if (err)
+   goto err;
+
+   rwq_ind_tbl->ib_rwq_ind_tbl.ind_tbl_num = rwq_ind_tbl->rqtn;
+   return &rwq_ind_tbl->ib_rwq_ind_tbl;
+err:
+   kfree(rwq_ind_tbl);
+   return ERR_PTR(err);
+}
+
+int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl)
+{
+   struct mlx5_ib_rwq_ind_table *rwq_ind_tbl = to_mrwq_ind_table(ib_rwq_ind_tbl);
+   struct mlx5_ib_dev *dev = 

[PATCH V2 for-next 3/7] IB/uverbs: Add WQ support

2015-12-14 Thread Yishai Hadas
Add Work Queue support, it includes create, modify and
destroy commands.

Signed-off-by: Yishai Hadas 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/core/uverbs.h  |   9 ++
 drivers/infiniband/core/uverbs_cmd.c  | 219 ++
 drivers/infiniband/core/uverbs_main.c |  25 
 include/rdma/ib_verbs.h   |   3 +
 include/uapi/rdma/ib_user_verbs.h |  41 +++
 5 files changed, 297 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 94bbd8c..a0b1ee7 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -162,6 +162,10 @@ struct ib_uqp_object {
struct ib_uxrcd_object *uxrcd;
 };
 
+struct ib_uwq_object {
+   struct ib_uevent_object uevent;
+};
+
 struct ib_ucq_object {
struct ib_uobject   uobject;
struct ib_uverbs_file  *uverbs_file;
@@ -181,6 +185,7 @@ extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
 extern struct idr ib_uverbs_rule_idr;
+extern struct idr ib_uverbs_wq_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -199,6 +204,7 @@ void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
 void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context);
 void ib_uverbs_cq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_qp_event_handler(struct ib_event *event, void *context_ptr);
+void ib_uverbs_wq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_srq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_event_handler(struct ib_event_handler *handler,
 struct ib_event *event);
@@ -273,5 +279,8 @@ IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
 IB_UVERBS_DECLARE_EX_CMD(create_cq);
 IB_UVERBS_DECLARE_EX_CMD(create_qp);
+IB_UVERBS_DECLARE_EX_CMD(create_wq);
+IB_UVERBS_DECLARE_EX_CMD(modify_wq);
+IB_UVERBS_DECLARE_EX_CMD(destroy_wq);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 94816ae..e0dd5da 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -57,6 +57,7 @@ static struct uverbs_lock_class ah_lock_class = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
 static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
+static struct uverbs_lock_class wq_lock_class = { .name = "WQ-uobj" };
 
 /*
  * The ib_uobject locking scheme is as follows:
@@ -241,6 +242,16 @@ static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
return idr_read_obj(_uverbs_qp_idr, qp_handle, context, 0);
 }
 
+static struct ib_wq *idr_read_wq(int wq_handle, struct ib_ucontext *context)
+{
+   return idr_read_obj(_uverbs_wq_idr, wq_handle, context, 0);
+}
+
+static void put_wq_read(struct ib_wq *wq)
+{
+   put_uobj_read(wq->uobject);
+}
+
 static struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
 {
struct ib_uobject *uobj;
@@ -327,6 +338,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
INIT_LIST_HEAD(&ucontext->qp_list);
INIT_LIST_HEAD(&ucontext->srq_list);
INIT_LIST_HEAD(&ucontext->ah_list);
+   INIT_LIST_HEAD(&ucontext->wq_list);
INIT_LIST_HEAD(&ucontext->xrcd_list);
INIT_LIST_HEAD(&ucontext->rule_list);
rcu_read_lock();
@@ -3059,6 +3071,213 @@ static int kern_spec_to_ib_spec(struct ib_uverbs_flow_spec *kern_spec,
return 0;
 }
 
+int ib_uverbs_ex_create_wq(struct ib_uverbs_file *file,
+  struct ib_device *ib_dev,
+  struct ib_udata *ucore,
+  struct ib_udata *uhw)
+{
+   struct ib_uverbs_ex_create_wq cmd;
+   struct ib_uverbs_ex_create_wq_resp resp;
+   struct ib_uwq_object   *obj;
+   int err = 0;
+   struct ib_cq *cq;
+   struct ib_pd *pd;
+   struct ib_wq *wq;
+   struct ib_wq_init_attr wq_init_attr;
+
+   if (ucore->inlen < sizeof(cmd))
+   return -EINVAL;
+
+   if (ucore->outlen < sizeof(resp))
+   return -ENOSPC;
+
+   err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
+   if (err)
+   return err;
+
+   if (cmd.comp_mask)
+   return -EINVAL;
+
+   obj = kmalloc(sizeof(*obj), GFP_KERNEL);
+   if (!obj)
+   return -ENOMEM;
+
+   init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext,
+             &wq_lock_class);
+   down_write(&obj->uevent.uobject.mutex);
+   pd  = idr_read_pd(cmd.pd_handle, file->ucontext);
+   if (!pd) {
+   err = -EINVAL;
+   goto err_uobj;
+   }
+
+   cq = idr_read_cq(cmd.cq_handle, file->ucontext, 0);
+

[PATCH V2 for-next 2/7] IB: Introduce Work Queue object and its verbs

2015-12-14 Thread Yishai Hadas
Introduce Work Queue object and its create/destroy/modify verbs.

QP can be created without internal WQs "packaged" inside it,
this QP can be configured to use "external" WQ object as its
receive/send queue.
WQ is a necessary component for RSS technology since RSS mechanism
is supposed to distribute the traffic between multiple
Receive Work Queues.

A WQ is associated (many to one) with a Completion Queue and owns the WQ
properties (PD, WQ size, etc.).
A WQ has a type; this patch introduces IB_WQT_RQ (i.e. receive queue),
and it may be extended to others such as IB_WQT_SQ (send queue).
A WQ of type IB_WQT_RQ contains receive work requests.

PD is an attribute of a work queue (i.e. send/receive queue); it's used
by the hardware for security validation before scattering to a memory
region that is pointed to by the WQ. For that, an external WQ object
needs a PD, letting the hardware make that validation.

When accessing a memory region that is pointed to by the WQ, its PD
is used and not the QP's PD; this behavior is similar
to that of an SRQ and a QP.

WQ context is subject to a well-defined state transitions done by
the modify_wq verb.
When WQ is created its initial state becomes IB_WQS_RESET.
From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY.
From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET
or to IB_WQS_ERR.
From IB_WQS_ERR it can be modified to IB_WQS_RESET.

Note: transition to IB_WQS_ERR might occur implicitly in case there
was some HW error.


Signed-off-by: Yishai Hadas 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/core/verbs.c | 85 +
 include/rdma/ib_verbs.h | 55 ++
 2 files changed, 140 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..576c65d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1437,6 +1437,91 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
 
+/**
+ * ib_create_wq - Creates a WQ associated with the specified protection
+ * domain.
+ * @pd: The protection domain associated with the WQ.
+ * @wq_init_attr: A list of initial attributes required to create the
+ * WQ. If WQ creation succeeds, then the attributes are updated to
+ * the actual capabilities of the created WQ.
+ *
+ * wq_init_attr->max_wr and wq_init_attr->max_sge determine
+ * the requested size of the WQ, and set to the actual values allocated
+ * on return.
+ * If ib_create_wq() succeeds, then max_wr and max_sge will always be
+ * at least as large as the requested values.
+ */
+struct ib_wq *ib_create_wq(struct ib_pd *pd,
+  struct ib_wq_init_attr *wq_attr)
+{
+   struct ib_wq *wq;
+
+   if (!pd->device->create_wq)
+   return ERR_PTR(-ENOSYS);
+
+   wq = pd->device->create_wq(pd, wq_attr, NULL);
+   if (!IS_ERR(wq)) {
+   wq->event_handler = wq_attr->event_handler;
+   wq->wq_context = wq_attr->wq_context;
+   wq->wq_type = wq_attr->wq_type;
+   wq->cq = wq_attr->cq;
+   wq->device = pd->device;
+   wq->pd = pd;
+   wq->uobject = NULL;
+   atomic_inc(&pd->usecnt);
+   atomic_inc(&wq_attr->cq->usecnt);
+   atomic_set(&wq->usecnt, 0);
+   }
+   return wq;
+}
+EXPORT_SYMBOL(ib_create_wq);
+
+/**
+ * ib_destroy_wq - Destroys the specified WQ.
+ * @wq: The WQ to destroy.
+ */
+int ib_destroy_wq(struct ib_wq *wq)
+{
+   int err;
+   struct ib_cq *cq = wq->cq;
+   struct ib_pd *pd = wq->pd;
+
+   if (!wq->device->destroy_wq)
+   return -ENOSYS;
+
+   if (atomic_read(&wq->usecnt))
+   return -EBUSY;
+
+   err = wq->device->destroy_wq(wq);
+   if (!err) {
+   atomic_dec(&pd->usecnt);
+   atomic_dec(&cq->usecnt);
+   }
+   return err;
+}
+EXPORT_SYMBOL(ib_destroy_wq);
+
+/**
+ * ib_modify_wq - Modifies the specified WQ.
+ * @wq: The WQ to modify.
+ * @wq_attr: On input, specifies the WQ attributes to modify.
+ * @wq_attr_mask: A bit-mask used to specify which attributes of the WQ
+ *   are being modified.
+ * On output, the current values of selected WQ attributes are returned.
+ */
+int ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
+u32 wq_attr_mask)
+{
+   int err;
+
+   if (!wq->device->modify_wq)
+   return -ENOSYS;
+
+   err = wq->device->modify_wq(wq, wq_attr, wq_attr_mask, NULL);
+   return err;
+}
+EXPORT_SYMBOL(ib_modify_wq);
+
 struct ib_flow *ib_create_flow(struct ib_qp *qp,
   struct ib_flow_attr *flow_attr,
   int domain)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9a68a19..277272d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1334,6 +1334,48 @@ struct ib_srq {
} ext;
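
To make the state machine in the commit message concrete, here is a minimal
sketch of driving a WQ from reset to ready and back with the new verb (the
IB_WQ_STATE mask bit name is an assumption; only ib_modify_wq's signature is
taken directly from the patch):

	struct ib_wq_attr wq_attr = {};
	int ret;

	/* A freshly created WQ starts in IB_WQS_RESET. */
	wq_attr.wq_state = IB_WQS_RDY;
	ret = ib_modify_wq(wq, &wq_attr, IB_WQ_STATE);
	if (ret)
		return ret;	/* RESET -> RDY failed */

	/* After an error (including an implicit HW move to IB_WQS_ERR),
	 * the only legal transition is back to IB_WQS_RESET.
	 */
	wq_attr.wq_state = IB_WQS_RESET;
	ret = ib_modify_wq(wq, &wq_attr, IB_WQ_STATE);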

[PATCH V2 for-next 0/7] Verbs RSS

2015-12-14 Thread Yishai Hadas
RSS (Receive Side Scaling) technology allows spreading incoming traffic
between different receive descriptor queues.
Assigning each queue to a different CPU core allows better load balancing of
the incoming traffic and improves performance.

This patch-set introduces some new objects and verbs in order to allow
verbs based solutions to utilize the RSS offload capability which is
widely supported today by many modern NICs. It extends the IB and uverbs
layers to support the above functionality and supplies a specific
implementation for the mlx5_ib driver.

The implementation is based on an RFC that was sent to the list some months ago
and describes the expected verbs and objects.
RFC: http://www.spinics.net/lists/linux-rdma/msg25012.html

In addition, the URL below can be used as a reference for the motivation and
the justification for adding the new objects that are described below.
http://lxr.free-electrons.com/source/Documentation/networking/scaling.txt

Overview of the changes:
- Add new objects: Work Queue and Receive Work Queues Indirection Table.
- Add new verbs that are required to handle the new objects:
  ib_create_wq(), ib_modify_wq(), ib_destroy_wq(),
  ib_create_rwq_ind_table(), ib_destroy_rwq_ind_table().

Work Queue: (ib_wq)
- Work Queue is associated (many to one) with a Completion Queue.
- It owns Work Queue properties (PD, WQ size etc.).
- Currently the Work Queue type can be IB_WQT_RQ (receive queue); others,
  such as IB_WQT_SQ (send queue), may be added in the future.
- A Work Queue of type IB_WQT_RQ contains receive work requests.
- Work Queue context is subject to well-defined state transitions done by the
  modify_wq verb.
- Work Queue is a necessary component for RSS technology since the RSS
  mechanism is supposed to distribute the traffic between multiple Receive
  Work Queues.

Receive Work Queue Indirection Table: (ib_rwq_ind_tbl)
- Serves to spread traffic between Work Queues of type RQ.
- Can be modified dynamically to give different queues different relative
  weights.
- The receive queue for a packet is determined by the computed hash for the
  incoming packet.
- Receive Work Queue Indirection Table is associated (one to many) with QPs.

Future extensions to this patch-set:
- Add ib_modify_rwq_ind_table() verb to enable a dynamic RQ mapping change.
- Introduce RSS hashing configuration that should be used to compute the
  required RQ entry for the incoming packet.
- Extend the ib_create_qp() verb to work with external WQs by the indirection
  table object and with RSS hash configuration.
  - Will enable a ULP/user application to benefit from the RSS scaling.
  - QPs that support flow steering rules can benefit from the RSS scaling in
    addition to the steering capabilities.
- Reflect RSS capabilities by the query device verb.
- User space support (i.e. libibverbs/vendor drivers) to expose the new verbs
  and objects.

Patches:
#1 - Exposes the required APIs from mlx5_core to be used in coming patches by
     the mlx5_ib driver.
#2 - Introduces the Work Queue object and its verbs in the IB layer.
#3 - Adds uverbs support for the Work Queue verbs.
#4 - Implements the Work Queue verbs in the mlx5_ib driver.
#5 - Introduces the Receive Work Queue indirection table and its verbs in the
     IB layer.
#6 - Adds uverbs support for the Receive Work Queue indirection table verbs.
#7 - Implements the Receive Work Queue indirection table verbs in the mlx5_ib
     driver.

Changes from V1:
IB patches were reviewed by Moshe Lazer; added Reviewed-by tags.
patch #2: Change ib_modify_wq to use u32 instead of enum for bitwise values.
patch #3: Improve usage of attr_mask/comp_mask.
patch #4: Fix driver issue in mlx5_ib on PPC.
patch #6: Limit unexpected memory allocation.

Changes from V0:
patch #2: Move the new verbs documentation to be in the C file, improve the commit message.
patch #5: Move the new verbs documentation to be in the C file.

Yishai Hadas (7):
  net/mlx5_core: Expose transobj APIs from mlx5 core
  IB: Introduce Work Queue object and its verbs
  IB/uverbs: Add WQ support
  IB/mlx5: Add receive Work Queue verbs
  IB: Introduce Receive Work Queue indirection table
  IB/uverbs: Introduce RWQ Indirection table
  IB/mlx5: Add Receive Work Queue Indirection table operations

 drivers/infiniband/core/uverbs.h   |  12 +
 drivers/infiniband/core/uverbs_cmd.c   | 409 +
 drivers/infiniband/core/uverbs_main.c  |  38 ++
 drivers/infiniband/core/verbs.c| 151 
 drivers/infiniband/hw/mlx5/main.c  |  13 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  49 +++
 drivers/infiniband/hw/mlx5/qp.c| 319 
 drivers/infiniband/hw/mlx5/user.h  |  15 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/srq.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/transobj.c |   8 +-
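
Putting the pieces together, here is a hedged end-to-end sketch of the RSS
setup described above, using the verbs introduced in patches #2 and #5 (error
handling elided; init-attr field names follow the patches in this series, and
pd, cq and device are assumed to exist):

	struct ib_wq *wqs[4];
	struct ib_wq_init_attr wq_attr = {
		.wq_type = IB_WQT_RQ,
		.max_wr  = 256,
		.max_sge = 1,
		.cq      = cq,	/* one CQ shared across RQs for brevity */
	};
	struct ib_rwq_ind_table_init_attr ind_attr = {
		.log_ind_tbl_size = 2,	/* 1 << 2 == 4 receive queues */
		.ind_tbl          = wqs,
	};
	struct ib_rwq_ind_table *ind_tbl;
	int i;

	for (i = 0; i < 4; i++)
		wqs[i] = ib_create_wq(pd, &wq_attr);

	ind_tbl = ib_create_rwq_ind_table(device, &ind_attr);
	/* A future extension (see above) ties this table, plus the RSS
	 * hash configuration, to a QP at ib_create_qp() time.
	 */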
 

[PATCH 1/3] staging/rdma/hfi1: consolidate kmalloc_array+memset into kcalloc

2015-12-14 Thread Nicholas Mc Guire
Rather than using kmalloc_array + memset, it seems cleaner to simply use
kcalloc, which will deliver memory set to zero.

Signed-off-by: Nicholas Mc Guire <hof...@osadl.org>
---

Patch was compile tested with: x86_64_defconfig
CONFIG_INFINIBAND=m, CONFIG_STAGING=y, CONFIG_STAGING_RDMA=m

Patch is against linux-next (localversion-next is -next-20151214)

 drivers/staging/rdma/hfi1/chip.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index dc69159..31eec8a 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -10128,8 +10128,7 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
goto bail;
if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
goto bail;
-   rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
-   memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));
+   rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
/* init the local copy of the table */
for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
unsigned tctxt;
-- 
1.7.10.4



[PATCH 3/3] staging/rdma/hfi1: fix build warning

2015-12-14 Thread Nicholas Mc Guire
Fix the following build warning:
drivers/staging/rdma/hfi1/chip.c: In function 'init_qos':
drivers/staging/rdma/hfi1/chip.c:10110:6: warning: unused variable
'rxcontext' [-Wunused-variable]

Signed-off-by: Nicholas Mc Guire <hof...@osadl.org>
---

Patch was compile tested with: x86_64_defconfig
CONFIG_INFINIBAND=m, CONFIG_STAGING=y, CONFIG_STAGING_RDMA=m

Patch is against linux-next (localversion-next is -next-20151214)

 drivers/staging/rdma/hfi1/chip.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 52d2bd7..ec368a8 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -10107,7 +10107,6 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
unsigned qpns_per_vl, ctxt, i, qpn, n = 1, m;
u64 *rsmmap;
u64 reg;
-   u8  rxcontext = is_a0(dd) ? 0 : 0xff;  /* 0 is default if a0 ver. */
 
/* validate */
if (dd->n_krcv_queues <= MIN_KERNEL_KCTXTS ||
-- 
1.7.10.4



[PATCH 2/3] staging/rdma/hfi1: check return value of kcalloc

2015-12-14 Thread Nicholas Mc Guire
Add a NULL check after the kcalloc call as proposed by
Mike Marciniszyn <mike.marcinis...@intel.com>.

Signed-off-by: Nicholas Mc Guire <hof...@osadl.org>
---

Patch was compile tested with: x86_64_defconfig
CONFIG_INFINIBAND=m, CONFIG_STAGING=y, CONFIG_STAGING_RDMA=m

Patch is against linux-next (localversion-next is -next-20151214)

 drivers/staging/rdma/hfi1/chip.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 31eec8a..52d2bd7 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -10129,6 +10129,9 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
goto bail;
rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
+   if (!rsmmap)
+   goto bail;
+
/* init the local copy of the table */
for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
unsigned tctxt;
-- 
1.7.10.4



RE: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-14 Thread Weiny, Ira
> 
> On Fri, 11 Dec 2015, ira.weiny wrote:
> 
> > I think I would rather see this called something like
> >
> > ipoib_add_to_list_sendonly
> >
> > Or something...
> >
> > Calling it ipoib_check* sounds like it should return a bool.
> 
> Hmm... It only adds the multicast group if the check was successful.
> 
> How about
> 
>   ipoib_check_and_add_mcast_sendonly()

Better.

> 
> 
> > > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> > > + struct list_head *remove_list)
> > > +{
> > > + /* Is this multicast ? */
> > > + if (*mgid == 0xff) {
> >
> > Odd to see a mgid variable which is only u8?
> >
> > How about "gid_prefix"?
> 
> That is only used in the qib driver and there it is a field.
> 
> mgid is a pointer to the series of bytes of the MGID and the first byte of that
> signifies multicast if it is 0xff

Understood, I misread the code at first.

Ira



Re: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-14 Thread Christoph Lameter
On Fri, 11 Dec 2015, ira.weiny wrote:

> I think I would rather see this called something like
>
> ipoib_add_to_list_sendonly
>
> Or something...
>
> Calling it ipoib_check* sounds like it should return a bool.

Hmm... It only adds the multicast group if the check was successful.

How about

ipoib_check_and_add_mcast_sendonly()


> > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> > +   struct list_head *remove_list)
> > +{
> > +   /* Is this multicast ? */
> > +   if (*mgid == 0xff) {
>
> Odd to see a mgid variable which is only u8?
>
> How about "gid_prefix"?

That is only used in the qib driver and there it is a field.

mgid is a pointer to the series of bytes of the MGID and the first byte of
that signifies multicast if it is 0xff



Re: [PATCH 21/37] IB/rdmavt: Move MR datastructures into rvt

2015-12-14 Thread Dennis Dalessandro

On Mon, Dec 07, 2015 at 03:39:17PM -0600, Hefty, Sean wrote:

+struct rvt_mregion {
+   struct ib_pd *pd;   /* shares refcnt of ibmr.pd */
+   u64 user_base;  /* User's address for this region */
+   u64 iova;   /* IB start address of this region */
+   size_t length;
+   u32 lkey;
+   u32 offset; /* offset (bytes) to start of region */
+   int access_flags;
+   u32 max_segs;   /* number of rvt_segs in all the arrays */
+   u32 mapsz;  /* size of the map array */
+   u8  page_shift; /* 0 - non uniform/non power-of-2 sizes */
+   u8  lkey_published; /* in global table */


Without looking ahead in the patch series, won't the access_flags indicate this?


I think it could. However, to me this is clearer: when we allocate an lkey
we set this, when it's freed we clear it, and we check this flag in the free
routine to decide if we should actually free it.


-Denny
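
The lifecycle Denny describes boils down to a small guard pattern (a sketch,
not the literal rvt code):

	/* alloc path: lkey handed out through the global table */
	mr->lkey_published = 1;

	/* free path: only unpublish and free what was actually published */
	if (mr->lkey_published) {
		mr->lkey_published = 0;
		/* remove the lkey from the global table, then free */
	}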


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-14 Thread Moni Shoua
On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak  wrote:
> Hi Doug,
>
> This series adds the support for RoCE v2. In order to support RoCE v2,
> we add gid_type attribute to every GID. When the RoCE GID management
> populates the GID table, it duplicates each GID with all supported types.
> This gives the user the ability to communicate over each supported
> type.
>
> Patch 0001, 0002 and 0003 add support for multiple GID types to the
> cache and related APIs. The third patch exposes the GID attributes
> information in sysfs.
>
> Patch 0004 adds the RoCE v2 GID type and the capabilities required
> from the vendor in order to implement RoCE v2. These capabilities
> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>
> RoCE v2 can work over IPv4 and IPv6 networks. When receiving ib_wc, this
> information should come from the vendor's driver. In case the vendor
> doesn't supply this information, we parse the packet headers and resolve
> its network type. Patch 0005 adds this information and required utilities.
>
> Patches 0006 and 0007 add route validation. This is mandatory to ensure
> that we send packets using GIDs which correspond to a net-device that
> can be routed to the destination.
>
> Patches 0008 and 0009 add configfs support (and the required
> infrastructure) for CMA. The administrator should be able to set the
> default RoCE type. This is done through a new per-port
> default_roce_mode configfs file.
>
> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
> packets. This is required for vendors which implement their
> QP1 as a Raw QP.
>
> Patch 0011 adds support for IPv4 multicast as an IPv4 network
> requires IGMP to be sent in order to join multicast groups.
>
> Vendor code isn't part of this patch-set. Soft-RoCE will be
> sent soon and depends on these patches. Other vendors, like
> mlx4, ocrdma and mlx5, will follow.
>
> This series is applied on top of "Change per-entry locks in GID cache to
> table lock", which was sent to the mailing list.
>
> Thanks,
> Matan
>
> Changed from V1:
>  - Rebased against Linux 4.4-rc2 master branch.
>  - Add route validation
>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>  - Add documentation for configfs and sysfs ABI
>  - Remove ifindex and gid_type from mcmember
>
> Changes from V0:
>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>
> Matan Barak (8):
>   IB/core: Add gid_type to gid attribute
>   IB/cm: Use the source GID index type
>   IB/core: Add gid attributes to sysfs
>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>   IB/core: Move rdma_is_upper_dev_rcu to header file
>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>   IB/rdma_cm: Add wrapper for cma reference count
>   IB/cma: Add configfs for rdma_cm
>
> Moni Shoua (2):
>   IB/core: Initialize UD header structure with IP and UDP headers
>   IB/cma: Join and leave multicast groups with IGMP
>
> Somnath Kotur (1):
>   IB/core: Add rdma_network_type to wc
>
>  Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>  drivers/infiniband/Kconfig   |   9 +
>  drivers/infiniband/core/Makefile |   2 +
>  drivers/infiniband/core/addr.c   | 185 +
>  drivers/infiniband/core/cache.c  | 169 
>  drivers/infiniband/core/cm.c |  31 ++-
>  drivers/infiniband/core/cma.c| 261 --
>  drivers/infiniband/core/cma_configfs.c   | 321 
> +++
>  drivers/infiniband/core/core_priv.h  |  45 
>  drivers/infiniband/core/device.c |  10 +-
>  drivers/infiniband/core/multicast.c  |  17 +-
>  drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
>  drivers/infiniband/core/sa_query.c   |  76 +-
>  drivers/infiniband/core/sysfs.c  | 184 -
>  drivers/infiniband/core/ud_header.c  | 155 ++-
>  drivers/infiniband/core/uverbs_marshall.c|   1 +
>  drivers/infiniband/core/verbs.c  | 170 ++--
>  drivers/infiniband/hw/mlx4/qp.c  |   7 +-
>  drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
>  include/rdma/ib_addr.h   |  11 +-
>  include/rdma/ib_cache.h  |   4 +
>  include/rdma/ib_pack.h   |  45 +++-
>  include/rdma/ib_sa.h |   3 +
>  include/rdma/ib_verbs.h  |  78 +-
>  26 files changed, 1704 insertions(+), 203 deletions(-)
>  create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
>  create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
>  create mode 100644 

Re: [PATCH] staging/rdma/hfi1: Fix a possible null pointer dereference

2015-12-14 Thread Nicholas Mc Guire
On Thu, Dec 10, 2015 at 11:13:38AM -0500, Mike Marciniszyn wrote:
> From: Easwar Hariharan <easwar.hariha...@intel.com>
> 
> A code inspection pointed out that kmalloc_array may return NULL and
> memset doesn't check the input pointer for NULL, resulting in a possible
> NULL dereference. This patch fixes this.
> 
> Reviewed-by: Mike Marciniszyn <mike.marcinis...@intel.com>
> Signed-off-by: Easwar Hariharan <easwar.hariha...@intel.com>
> ---
>  drivers/staging/rdma/hfi1/chip.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
> index dc69159..49d49b2 100644
> --- a/drivers/staging/rdma/hfi1/chip.c
> +++ b/drivers/staging/rdma/hfi1/chip.c
> @@ -10129,6 +10129,8 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
>   if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
>   goto bail;
>   rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> + if (!rsmmap)
> + goto bail;
>   memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));
>   /* init the local copy of the table */
>   for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
> 
> --

Based on this report, a generalization of the unchecked-use pattern turned up
one more case in the current kernel (patch sent). The "when" block probably
needs some cleanup, but findings like this are definitely a case for
coccinelle scanners.


/// check for missing NULL check before use 
//
//  missing check in: 
//  ./drivers/staging/rdma/hfi1/chip.c:10131 unchecked allocation
//  in -next-20151214
//  reported-by Mike Marciniszyn <mike.marcinis...@intel.com> 
//
//  after generalization this also found:
//  ./drivers/clk/shmobile/clk-div6.c:197 unchecked allocation

virtual context
virtual org
virtual report

@badmemset@
expression mem;
position p;
statement S;
@@

<+...
*mem = kmalloc_array@p(...);
  ... when != if (!mem || ...) S
  when != if (... && !mem) S
  when != if (mem == NULL || ...) S
  when != if (... && mem == NULL) S
  when != if (unlikely(mem == NULL)) S
  when != if (unlikely(!mem)) S
  when != if (likely(!mem)) S
  when != if (likely(mem == NULL)) S
  return;
...+>

@script:python@
p << badmemset.p;
@@

print "%s:%s unchecked allocation" % (p[0].file,p[0].line)



thx!
hofrat
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Christoph Lameter
On Mon, 14 Dec 2015, Matan Barak wrote:

> > +static PORT_PMA_ATTR(unicast_rcv_packets   ,  0, 64, 384, IB_PMA_PORT_COUNTERS_EXT);
> > +static PORT_PMA_ATTR(multicast_xmit_packets,  0, 64, 448, IB_PMA_PORT_COUNTERS_EXT);
> > +static PORT_PMA_ATTR(multicast_rcv_packets ,  0, 64, 512, IB_PMA_PORT_COUNTERS_EXT);
> >
>
> Why do we use 0 as the counter argument for all EXT counters?

No idea what the counter is doing. Saw another EXT counter implementation
use 0 so I thought that was fine.
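
For what it's worth, as far as I can tell the counter number only gets
packed into .index next to the width and offset, and the sysfs show path
pulls the value out of the MAD reply using offset/width alone, so 0 should
be harmless for the 64 bit rows. From memory (not verbatim sysfs.c), the
decode looks roughly like:

	int offset = tab_attr->index & 0xffff;		/* bit offset in the reply */
	int width  = (tab_attr->index >> 16) & 0xff;	/* counter width in bits */

	switch (width) {
	case 32:
		ret = sprintf(buf, "%u\n",
			      be32_to_cpup((__be32 *)(out_mad->data + 40 + offset / 8)));
		break;
	case 64:
		ret = sprintf(buf, "%llu\n", (unsigned long long)
			      be64_to_cpup((__be64 *)(out_mad->data + 40 + offset / 8)));
		break;
	}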
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Matan Barak
On Fri, Dec 11, 2015 at 8:25 PM, Christoph Lameter  wrote:
> Display the additional 64 bit counters available through the extended
> set and replace the existing 32 bit counters if there is a 64 bit
> alternative available.
>
> Note: This requires universal support of extended counters in
> the devices. If there are still devices around that do not
> support extended counters then we will have to add some fallback
> technique here.
>
> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/core/sysfs.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 0083a4f..f7f2954 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -406,10 +406,14 @@ static PORT_PMA_ATTR(port_rcv_constraint_errors   ,  8,  8, 136, IB_PMA_PORT_COUNTERS);
>  static PORT_PMA_ATTR(local_link_integrity_errors,  9,  4, 152, IB_PMA_PORT_COUNTERS);
>  static PORT_PMA_ATTR(excessive_buffer_overrun_errors, 10,  4, 156, IB_PMA_PORT_COUNTERS);
>  static PORT_PMA_ATTR(VL15_dropped  , 11, 16, 176, IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_xmit_data, 12, 32, 192, IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_rcv_data , 13, 32, 224, IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_xmit_packets , 14, 32, 256, IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_rcv_packets  , 15, 32, 288, IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_data,  0, 64,  64, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_rcv_data ,  0, 64, 128, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_xmit_packets ,  0, 64, 192, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_rcv_packets  ,  0, 64, 256, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(unicast_xmit_packets  ,  0, 64, 320, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(unicast_rcv_packets   ,  0, 64, 384, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(multicast_xmit_packets,  0, 64, 448, IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(multicast_rcv_packets ,  0, 64, 512, IB_PMA_PORT_COUNTERS_EXT);
>

Why do we use 0 as the counter argument for all EXT counters?

>  static struct attribute *pma_attrs[] = {
> 	&port_pma_attr_symbol_error.attr.attr,
> @@ -428,6 +432,10 @@ static struct attribute *pma_attrs[] = {
> 	&port_pma_attr_port_rcv_data.attr.attr,
> 	&port_pma_attr_port_xmit_packets.attr.attr,
> 	&port_pma_attr_port_rcv_packets.attr.attr,
> +	&port_pma_attr_unicast_rcv_packets.attr.attr,
> +	&port_pma_attr_unicast_xmit_packets.attr.attr,
> +	&port_pma_attr_multicast_rcv_packets.attr.attr,
> +	&port_pma_attr_multicast_xmit_packets.attr.attr,
> 	NULL
>  };
>
> --
> 2.5.0
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next v1 1/2] net/mlx5_core: Configure HW to support atomic request in host endianness

2015-12-14 Thread Eran Ben Elisha
HW supports two requestor endianness modes for standard 8-byte atomics:
BE (0x0) and host endianness (0x1). Read the supported modes from the HCA
atomic capabilities and configure the HW to host endianness mode if it is
supported.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Yishai Hadas 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 57 +-
 include/linux/mlx5/mlx5_ifc.h  | 22 ++
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 03aabdd..682a4c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -73,6 +73,11 @@ struct mlx5_device_context {
void   *context;
 };
 
+enum {
+   MLX5_ATOMIC_REQ_MODE_BE = 0x0,
+   MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS = 0x1,
+};
+
 static struct mlx5_profile profile[] = {
[0] = {
.mask   = 0,
@@ -335,7 +340,7 @@ query_ex:
return err;
 }
 
-static int set_caps(struct mlx5_core_dev *dev, void *in, int in_sz)
+static int set_caps(struct mlx5_core_dev *dev, void *in, int in_sz, int opmod)
 {
u32 out[MLX5_ST_SZ_DW(set_hca_cap_out)];
int err;
@@ -343,6 +348,7 @@ static int set_caps(struct mlx5_core_dev *dev, void *in, 
int in_sz)
memset(out, 0, sizeof(out));
 
MLX5_SET(set_hca_cap_in, in, opcode, MLX5_CMD_OP_SET_HCA_CAP);
+   MLX5_SET(set_hca_cap_in, in, op_mod, opmod << 1);
err = mlx5_cmd_exec(dev, in, in_sz, out, sizeof(out));
if (err)
return err;
@@ -352,6 +358,46 @@ static int set_caps(struct mlx5_core_dev *dev, void *in, 
int in_sz)
return err;
 }
 
+static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
+{
+   void *set_ctx;
+   void *set_hca_cap;
+   int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
+   int req_endianness;
+   int err;
+
+   if (MLX5_CAP_GEN(dev, atomic)) {
+   err = mlx5_core_get_caps(dev, MLX5_CAP_ATOMIC,
+HCA_CAP_OPMOD_GET_CUR);
+   if (err)
+   return err;
+   } else {
+   return 0;
+   }
+
+   req_endianness =
+   MLX5_CAP_ATOMIC(dev,
+   supported_atomic_req_8B_endianess_mode_1);
+
+   if (req_endianness != MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS)
+   return 0;
+
+   set_ctx = kzalloc(set_sz, GFP_KERNEL);
+   if (!set_ctx)
+   return -ENOMEM;
+
+   set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability);
+
+   /* Set requestor to host endianness */
+   MLX5_SET(atomic_caps, set_hca_cap, atomic_req_8B_endianess_mode,
+MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS);
+
+   err = set_caps(dev, set_ctx, set_sz, MLX5_SET_HCA_CAP_OP_MOD_ATOMIC);
+
+   kfree(set_ctx);
+   return err;
+}
+
 static int handle_hca_cap(struct mlx5_core_dev *dev)
 {
void *set_ctx = NULL;
@@ -393,7 +439,8 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
 
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
 
-   err = set_caps(dev, set_ctx, set_sz);
+   err = set_caps(dev, set_ctx, set_sz,
+  MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
 
 query_ex:
kfree(set_ctx);
@@ -764,6 +811,12 @@ static int mlx5_dev_init(struct mlx5_core_dev *dev, struct 
pci_dev *pdev)
goto reclaim_boot_pages;
}
 
+   err = handle_hca_cap_atomic(dev);
+   if (err) {
+		dev_err(&pdev->dev, "handle_hca_cap_atomic failed\n");
+   goto reclaim_boot_pages;
+   }
+
err = mlx5_satisfy_startup_pages(dev, 0);
if (err) {
dev_err(>dev, "failed to allocate init pages\n");
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index dd20974..3da1951 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -67,6 +67,11 @@ enum {
 };
 
 enum {
+   MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE= 0x0,
+   MLX5_SET_HCA_CAP_OP_MOD_ATOMIC= 0x3,
+};
+
+enum {
MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
MLX5_CMD_OP_QUERY_ADAPTER = 0x101,
MLX5_CMD_OP_INIT_HCA  = 0x102,
@@ -525,21 +530,24 @@ enum {
 struct mlx5_ifc_atomic_caps_bits {
u8 reserved_0[0x40];
 
-   u8 atomic_req_endianness[0x1];
-   u8 reserved_1[0x1f];
+   u8 atomic_req_8B_endianess_mode[0x2];
+   u8 reserved_1[0x4];
+   u8 supported_atomic_req_8B_endianess_mode_1[0x1];
 
-   u8 reserved_2[0x20];
+   u8 reserved_2[0x19];
 
-   u8 reserved_3[0x10];
-   u8 atomic_operations[0x10];
+   u8 

[PATCH for-next v1 0/2] Advertise atomic operations support in mlx5

2015-12-14 Thread Eran Ben Elisha
Hi Doug,

This patch set adds the functionality to advertise standard atomic operation
capabilities for the mlx5 driver. The hardware can be configured to work in
two modes, according to the device capabilities:

1. Big-endian requestor response
2. Host-endian requestor response

If the firmware supports host endianness, try to configure the hardware to
work this way and, on success, propagate this capability to the upper
layers; otherwise advertise that atomic operations aren't supported.

Thanks, Eran

Changes from v0:
- Rewrite the commit message of the second patch in the series


Eran Ben Elisha (2):
  net/mlx5_core: Configure HW to support atomic request in host
endianness
  IB/mlx5: Advertise atomic capabilities in query device

 drivers/infiniband/hw/mlx5/main.c  | 28 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 57 +-
 include/linux/mlx5/driver.h|  5 +++
 include/linux/mlx5/mlx5_ifc.h  | 22 ++
 4 files changed, 102 insertions(+), 10 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next v1 2/2] IB/mlx5: Advertise atomic capabilities in query device

2015-12-14 Thread Eran Ben Elisha
To ensure IB spec atomic correctness for atomic operations, advertise
IB_ATOMIC_HCA if the HW is configured to host endianness; if not, advertise
IB_ATOMIC_NONE.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Yishai Hadas 
---
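Note for reviewers (my reading of the capability layout, please
double-check): atomic_size_qp is a bitmask in which bit n set means
2^n-byte operands are supported, which is why 8-byte support is tested
against (1 << 3) below rather than against the value 8:

/* assumption: bit n of atomic_size_qp <=> 2^n-byte operand support */
static bool mlx5_atomic_size_supported(u8 atomic_size_qp, unsigned int bytes)
{
	return atomic_size_qp & (1 << ilog2(bytes));	/* needs <linux/log2.h> */
}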
 drivers/infiniband/hw/mlx5/main.c | 28 +++-
 include/linux/mlx5/driver.h   |  5 +
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index bdd60a6..1139b67 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -63,6 +63,10 @@ static char mlx5_version[] =
DRIVER_NAME ": Mellanox Connect-IB Infiniband driver v"
DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
 
+enum {
+   MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
+};
+
 static enum rdma_link_layer
 mlx5_ib_port_link_layer(struct ib_device *device)
 {
@@ -101,6 +105,28 @@ static int mlx5_get_vport_access_method(struct ib_device 
*ibdev)
return MLX5_VPORT_ACCESS_METHOD_HCA;
 }
 
+static void get_atomic_caps(struct mlx5_ib_dev *dev,
+   struct ib_device_attr *props)
+{
+   u8 tmp;
+   u8 atomic_operations = MLX5_CAP_ATOMIC(dev->mdev, atomic_operations);
+   u8 atomic_size_qp = MLX5_CAP_ATOMIC(dev->mdev, atomic_size_qp);
+   u8 atomic_req_8B_endianness_mode =
+   MLX5_CAP_ATOMIC(dev->mdev, atomic_req_8B_endianess_mode);
+
+   /* Check if HW supports 8 bytes standard atomic operations and capable
+* of host endianness respond
+*/
+   tmp = MLX5_ATOMIC_OPS_CMP_SWAP | MLX5_ATOMIC_OPS_FETCH_ADD;
+   if (((atomic_operations & tmp) == tmp) &&
+   (atomic_size_qp & MLX5_ATOMIC_SIZE_QP_8BYTES) &&
+   (atomic_req_8B_endianness_mode)) {
+   props->atomic_cap = IB_ATOMIC_HCA;
+   } else {
+   props->atomic_cap = IB_ATOMIC_NONE;
+   }
+}
+
 static int mlx5_query_system_image_guid(struct ib_device *ibdev,
__be64 *sys_image_guid)
 {
@@ -286,7 +312,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->max_res_rd_atom = props->max_qp_rd_atom * props->max_qp;
props->max_srq_sge = max_rq_sg - 1;
props->max_fast_reg_page_list_len = (unsigned int)-1;
-   props->atomic_cap  = IB_ATOMIC_NONE;
+   get_atomic_caps(dev, props);
props->masked_atomic_cap   = IB_ATOMIC_NONE;
props->max_mcast_grp   = 1 << MLX5_CAP_GEN(mdev, log_max_mcg);
props->max_mcast_qp_attach = MLX5_CAP_GEN(mdev, max_qp_mcg);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 8b6d6f2..4c5a7e6 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -115,6 +115,11 @@ enum {
MLX5_REG_HOST_ENDIANNESS = 0x7004,
 };
 
+enum {
+   MLX5_ATOMIC_OPS_CMP_SWAP= 1 << 0,
+   MLX5_ATOMIC_OPS_FETCH_ADD   = 1 << 1,
+};
+
 enum mlx5_page_fault_resume_flags {
MLX5_PAGE_FAULT_RESUME_REQUESTOR = 1 << 0,
MLX5_PAGE_FAULT_RESUME_WRITE = 1 << 1,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V1 2/5] IB/core: Add ib_is_udata_cleared

2015-12-14 Thread Matan Barak
On Sun, Dec 13, 2015 at 5:47 PM, Haggai Eran  wrote:
> On 10/12/2015 19:29, Matan Barak wrote:
>> On Thu, Dec 10, 2015 at 5:20 PM, Haggai Eran  wrote:
>>> On 10/12/2015 16:59, Matan Barak wrote:
 On Mon, Dec 7, 2015 at 3:18 PM, Haggai Eran  wrote:
> On 12/03/2015 05:44 PM, Matan Barak wrote:
>> Extending core and vendor verb commands require us to check that the
>> unknown part of the user's given command is all zeros.
>> Adding ib_is_udata_cleared in order to do so.
>>
>
> Why not copy the data into kernel space and run memchr_inv() on it?
>

 Probably less efficient, isn't it?
>>> Why do you think it is less efficient?
>>>
>>> I'm not sure calling copy_from_user multiple times is very efficient.
>>> For once, you are calling access_ok multiple times. I guess it depends
>>> on the amount of data you are copying.
>>>
>>
>> Isn't access_ok pretty cheap?
>> It calls __chk_range_not_ok which on x86 seems like a very cheap
>> function and __chk_user_ptr which is a compiler check.
>> I guess most kernel-user implementation will be pretty much in sync,
>> so we'll possibly call it for a few/dozens of bytes. In that case, I
>> think this implementation is a bit faster.
>>
 I know it isn't data path, but we'll execute this code in all extended
 functions (sometimes even more than once).
>>> Do you think it is important enough to maintain our own copy of
>>> memchr_inv()?
>>>
>>
>> True, I'm not sure it's important enough, but do you think it's that
>> complicated?
>
> It is complicated in my opinion. It is 67 lines of code, it's
> architecture dependent and relies on preprocessor macros and conditional
> code. I think this kind of stuff belongs in lib/string.c and not in the
> RDMA stack.
>

I'm not sure regarding the string.c location, as it deals with user
buffers, but in order not to be dependent on this, I'll change this code
to the following:

static inline bool ib_is_udata_cleared(struct ib_udata *udata,
				       u8 cleared_char,
				       size_t offset,
				       size_t len)
{
	const void __user *p = udata->inbuf + offset;
	bool ret = false;
	u8 *buf;

	if (len > USHRT_MAX)
		return false;

	buf = kmalloc(len, GFP_KERNEL);
	if (!buf)
		return false;

	if (copy_from_user(buf, p, len))
		goto free;

	ret = !memchr_inv(buf, cleared_char, len);

free:
	kfree(buf);
	return ret;
}
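A hypothetical caller in an extended verb handler would then be (cmd being
the known part of the command; names made up for illustration):

	if (udata->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(udata, 0, sizeof(cmd),
				 udata->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;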

> Haggai

Regards,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/13] IB/srp: use the new CQ API

2015-12-14 Thread Doug Ledford
On 12/13/2015 05:26 AM, Sagi Grimberg wrote:
> 
>> Allright.  How do you want to proceed?  The current rdma-cq branch
>> has all kinds of dependencies, but I've also prepared a new rdma-cq.2
>> branch that could go straight on top of your current queue:
>>
>> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq.2
>>
>> If you're ready to start the 4.5 tree I can send those out as a patch
>> series.
> 
> Will this get on top of iser-remote-inv? Or should I resend atop of this?

I'm going through my inbox right now.  I expect somewhere in there Or
will make his case for why he doesn't like Christoph's patch to get rid
of the attr struct.  I'll listen, and if I'm not convinced, I'll take
that patchset first and this one second. (I reviewed the patchset
alreadyaside from the fact that I *like* having the attr struct
elements in an organized sub-struct, it's fine and it definitely
improves on all of those query calls).

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD






Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Hal Rosenstock
On 12/11/2015 7:00 PM, Jason Gunthorpe wrote:
>> qib, mlx4 are fine.  mlx5 should be as well I would think (I don't have that
>> hardware.)

I'm not 100% sure but I don't think that mthca supports the
PortCountersExtended attribute.

> I have no specifics to add, but I keep running into systems, even
> today, where the 64 bit counters don't work. The MAD might be there,
> but several counters are wired to 0.

I have seen this too :-(

> Not sure exactly which HW though.
> 
> Mellanox should really confirm this for their hardware matrix.

I am trying to get definitive answer to this.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 22/37] IB/rdmavt: Add queue pair data structure to rdmavt

2015-12-14 Thread Dennis Dalessandro

On Thu, Dec 10, 2015 at 01:12:51PM -0600, Hefty, Sean wrote:

>> +struct rvt_rwqe {
>> +  u64 wr_id;
>> +  u8 num_sge;
>> +  struct ib_sge sg_list[0];
>> +};
>> +
>> +/*
>> + * This structure is used to contain the head pointer, tail pointer,
>> + * and receive work queue entries as a single memory allocation so
>> + * it can be mmap'ed into user space.
>> + * Note that the wq array elements are variable size so you can't
>> + * just index into the array to get the N'th element;
>> + * use get_rwqe_ptr() instead.
>
>Can you add/use an entry_size field?

I think we could work something like that, however what we have in
qib/hfi1 also works.  Any reason we really should change this?


I did not check to see what the drivers do.  Using entry_size is 
straightforward, may provide the best performance, and can be done in 
common code, versus adding callbacks to all users. 


Are you concerned that we have to do something different in each driver? If 
so, I do plan to move get_rwqe_ptr to rdmavt in a later patch.  It won't be 
part of the drivers any more.


I kind of like what we have for get_rwqe_ptr, it illustrates how we are 
fishing out the pointer. I'll take another look at it though for the coming 
patch, it could save us from doing a bit of math on the fly.
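
For reference, the math in question is small; the qib/hfi1 helper looks
roughly like this (typed from memory, using the rvt names it would get
after the move):

static inline struct rvt_rwqe *get_rwqe_ptr(struct rvt_rq *rq, unsigned n)
{
	return (struct rvt_rwqe *)
		((char *)rq->wq->wq +
		 (sizeof(struct rvt_rwqe) +
		  rq->max_sge * sizeof(struct ib_sge)) * n);
}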


-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 12/15] IB/hfi1: Remove srq from hfi1

2015-12-14 Thread Dennis Dalessandro
SRQ data structure has been moved to rdmavt. Make use of it.

Reviewed-by: Harish Chegondi 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/qp.c|2 +-
 drivers/staging/rdma/hfi1/ruc.c   |4 ++--
 drivers/staging/rdma/hfi1/srq.c   |   10 +-
 drivers/staging/rdma/hfi1/verbs.h |   13 -
 4 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index b82855f..045ee74 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -1092,7 +1092,7 @@ struct ib_qp *hfi1_create_qp(struct ib_pd *ibpd,
sz = sizeof(*qp);
sg_list_sz = 0;
if (init_attr->srq) {
-   struct hfi1_srq *srq = to_isrq(init_attr->srq);
+   struct rvt_srq *srq = ibsrq_to_rvtsrq(init_attr->srq);
 
if (srq->rq.max_sge > 1)
sg_list_sz = sizeof(*qp->r_sg_list) *
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
index 558dadb..9841e89 100644
--- a/drivers/staging/rdma/hfi1/ruc.c
+++ b/drivers/staging/rdma/hfi1/ruc.c
@@ -159,14 +159,14 @@ int hfi1_get_rwqe(struct rvt_qp *qp, int wr_id_only)
unsigned long flags;
struct rvt_rq *rq;
struct rvt_rwq *wq;
-   struct hfi1_srq *srq;
+   struct rvt_srq *srq;
struct rvt_rwqe *wqe;
void (*handler)(struct ib_event *, void *);
u32 tail;
int ret;
 
if (qp->ibqp.srq) {
-   srq = to_isrq(qp->ibqp.srq);
+   srq = ibsrq_to_rvtsrq(qp->ibqp.srq);
handler = srq->ibsrq.event_handler;
 		rq = &srq->rq;
} else {
diff --git a/drivers/staging/rdma/hfi1/srq.c b/drivers/staging/rdma/hfi1/srq.c
index 932bd96..78f190a 100644
--- a/drivers/staging/rdma/hfi1/srq.c
+++ b/drivers/staging/rdma/hfi1/srq.c
@@ -65,7 +65,7 @@
 int hfi1_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
  struct ib_recv_wr **bad_wr)
 {
-   struct hfi1_srq *srq = to_isrq(ibsrq);
+   struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq);
struct rvt_rwq *wq;
unsigned long flags;
int ret;
@@ -120,7 +120,7 @@ struct ib_srq *hfi1_create_srq(struct ib_pd *ibpd,
   struct ib_udata *udata)
 {
struct hfi1_ibdev *dev = to_idev(ibpd->device);
-   struct hfi1_srq *srq;
+   struct rvt_srq *srq;
u32 sz;
struct ib_srq *ret;
 
@@ -229,7 +229,7 @@ int hfi1_modify_srq(struct ib_srq *ibsrq, struct 
ib_srq_attr *attr,
enum ib_srq_attr_mask attr_mask,
struct ib_udata *udata)
 {
-   struct hfi1_srq *srq = to_isrq(ibsrq);
+   struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq);
struct rvt_rwq *wq;
int ret = 0;
 
@@ -367,7 +367,7 @@ bail:
 
 int hfi1_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr)
 {
-   struct hfi1_srq *srq = to_isrq(ibsrq);
+   struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq);
 
attr->max_wr = srq->rq.size - 1;
attr->max_sge = srq->rq.max_sge;
@@ -381,7 +381,7 @@ int hfi1_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr 
*attr)
  */
 int hfi1_destroy_srq(struct ib_srq *ibsrq)
 {
-   struct hfi1_srq *srq = to_isrq(ibsrq);
+   struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq);
struct hfi1_ibdev *dev = to_idev(ibsrq->device);
 
	spin_lock(&dev->n_srqs_lock);
diff --git a/drivers/staging/rdma/hfi1/verbs.h 
b/drivers/staging/rdma/hfi1/verbs.h
index fec5e7b..f4ec83c 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -263,14 +263,6 @@ struct hfi1_cq {
struct rvt_mmap_info *ip;
 };
 
-struct hfi1_srq {
-   struct ib_srq ibsrq;
-   struct rvt_rq rq;
-   struct rvt_mmap_info *ip;
-   /* send signal when number of RWQEs < limit */
-   u32 limit;
-};
-
 /*
  * hfi1 specific data structures that will be hidden from rvt after the queue
  * pair is made common
@@ -537,11 +529,6 @@ static inline struct hfi1_cq *to_icq(struct ib_cq *ibcq)
return container_of(ibcq, struct hfi1_cq, ibcq);
 }
 
-static inline struct hfi1_srq *to_isrq(struct ib_srq *ibsrq)
-{
-   return container_of(ibsrq, struct hfi1_srq, ibsrq);
-}
-
 static inline struct rvt_qp *to_iqp(struct ib_qp *ibqp)
 {
return container_of(ibqp, struct rvt_qp, ibqp);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 10/15] IB/hfi1: Implement hfi1 support for AH notification

2015-12-14 Thread Dennis Dalessandro
For OPA devices additional work is required to create an AH.
This patch adds support to set the VL correctly.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
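Context for reviewers: rdmavt invokes this hook right after it fills in a
new AH so the driver can derive per-AH state. The rdmavt side is expected
to look roughly like this (a sketch, not the exact rvt_create_ah body):

	if (rdi->driver_f.notify_new_ah)
		rdi->driver_f.notify_new_ah(pd->device, ah_attr, ah);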
 drivers/staging/rdma/hfi1/verbs.c |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/verbs.c 
b/drivers/staging/rdma/hfi1/verbs.c
index e007c52..6d69491 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -1629,6 +1629,29 @@ static int hfi1_check_ah(struct ib_device *ibdev, struct 
ib_ah_attr *ah_attr)
return 0;
 }
 
+static void hfi1_notify_new_ah(struct ib_device *ibdev,
+  struct ib_ah_attr *ah_attr,
+  struct rvt_ah *ah)
+{
+   struct hfi1_ibport *ibp;
+   struct hfi1_pportdata *ppd;
+   struct hfi1_devdata *dd;
+   u8 sc5;
+
+   /*
+* Do not trust reading anything from rvt_ah at this point as it is not
+* done being setup. We can however modify things which we need to set.
+*/
+
+   ibp = to_iport(ibdev, ah_attr->port_num);
+   ppd = ppd_from_ibp(ibp);
+   sc5 = ibp->sl_to_sc[ah->attr.sl];
+   dd = dd_from_ppd(ppd);
+   ah->vl = sc_to_vlt(dd, sc5);
+   if (ah->vl < num_vls || ah->vl == 15)
+   ah->log_pmtu = ilog2(dd->vld[ah->vl].mtu);
+}
+
 struct ib_ah *hfi1_create_qp0_ah(struct hfi1_ibport *ibp, u16 dlid)
 {
struct ib_ah_attr attr;
@@ -1919,6 +1942,7 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
dd->verbs_dev.rdi.driver_f.get_card_name = get_card_name;
dd->verbs_dev.rdi.driver_f.get_pci_dev = get_pci_dev;
dd->verbs_dev.rdi.driver_f.check_ah = hfi1_check_ah;
+   dd->verbs_dev.rdi.driver_f.notify_new_ah = hfi1_notify_new_ah;
dd->verbs_dev.rdi.dparms.props.max_ah = hfi1_max_ahs;
dd->verbs_dev.rdi.dparms.props.max_pd = hfi1_max_pds;
dd->verbs_dev.rdi.flags = (RVT_FLAG_MR_INIT_DRIVER |

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 11/15] IB/hfi1: Remove hfi1 MR and hfi1 specific qp type

2015-12-14 Thread Dennis Dalessandro
This patch does the actual removal of the queue pair from the hfi1 driver
along with a number of dependent data structures. These were moved to rvt.

It also removes the MR functions to use those in rdmavt.

These two pieces cannot reasonably be split apart because they depend on
each other.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/Makefile  |2 
 drivers/staging/rdma/hfi1/cq.c  |2 
 drivers/staging/rdma/hfi1/diag.c|8 
 drivers/staging/rdma/hfi1/driver.c  |   10 -
 drivers/staging/rdma/hfi1/hfi.h |   16 -
 drivers/staging/rdma/hfi1/keys.c|  356 -
 drivers/staging/rdma/hfi1/mmap.c|   24 +
 drivers/staging/rdma/hfi1/mr.c  |  522 ---
 drivers/staging/rdma/hfi1/pio.c |4 
 drivers/staging/rdma/hfi1/qp.c  |   88 +++--
 drivers/staging/rdma/hfi1/qp.h  |   41 +-
 drivers/staging/rdma/hfi1/rc.c  |  116 +++
 drivers/staging/rdma/hfi1/ruc.c |   89 +++--
 drivers/staging/rdma/hfi1/sdma.h|6 
 drivers/staging/rdma/hfi1/srq.c |   28 +-
 drivers/staging/rdma/hfi1/trace.h   |   22 +
 drivers/staging/rdma/hfi1/uc.c  |   10 -
 drivers/staging/rdma/hfi1/ud.c  |   18 +
 drivers/staging/rdma/hfi1/verbs.c   |  143 +++-
 drivers/staging/rdma/hfi1/verbs.h   |  372 +++---
 drivers/staging/rdma/hfi1/verbs_mcast.c |8 
 21 files changed, 349 insertions(+), 1536 deletions(-)
 delete mode 100644 drivers/staging/rdma/hfi1/keys.c
 delete mode 100644 drivers/staging/rdma/hfi1/mr.c

diff --git a/drivers/staging/rdma/hfi1/Makefile 
b/drivers/staging/rdma/hfi1/Makefile
index 2126b8b..3ba64fe 100644
--- a/drivers/staging/rdma/hfi1/Makefile
+++ b/drivers/staging/rdma/hfi1/Makefile
@@ -8,7 +8,7 @@
 obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
 
 hfi1-y := chip.o cq.o device.o diag.o driver.o eprom.o file_ops.o firmware.o \
-   init.o intr.o keys.o mad.o mmap.o mr.o pcie.o pio.o pio_copy.o \
+   init.o intr.o mad.o mmap.o pcie.o pio.o pio_copy.o \
qp.o qsfp.o rc.o ruc.o sdma.o srq.o sysfs.o trace.o twsi.o \
uc.o ud.o user_pages.o user_sdma.o verbs_mcast.o verbs.o
 hfi1-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/staging/rdma/hfi1/cq.c b/drivers/staging/rdma/hfi1/cq.c
index 4f046ff..ffd0e7a 100644
--- a/drivers/staging/rdma/hfi1/cq.c
+++ b/drivers/staging/rdma/hfi1/cq.c
@@ -479,7 +479,7 @@ int hfi1_resize_cq(struct ib_cq *ibcq, int cqe, struct 
ib_udata *udata)
 
if (cq->ip) {
struct hfi1_ibdev *dev = to_idev(ibcq->device);
-   struct hfi1_mmap_info *ip = cq->ip;
+   struct rvt_mmap_info *ip = cq->ip;
 
hfi1_update_mmap_info(dev, ip, sz, wc);
 
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index e172f2a..49822c1 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1618,7 +1618,7 @@ int snoop_recv_handler(struct hfi1_packet *packet)
 /*
  * Handle snooping and capturing packets when sdma is being used.
  */
-int snoop_send_dma_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+int snoop_send_dma_handler(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
   u64 pbc)
 {
pr_alert("Snooping/Capture of Send DMA Packets Is Not Supported!\n");
@@ -1631,13 +1631,13 @@ int snoop_send_dma_handler(struct hfi1_qp *qp, struct 
hfi1_pkt_state *ps,
  * bypass packets. The only way to send a bypass packet currently is to use the
  * diagpkt interface. When that interface is enable snoop/capture is not.
  */
-int snoop_send_pio_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+int snoop_send_pio_handler(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
   u64 pbc)
 {
struct hfi1_qp_priv *priv = qp->priv;
struct ahg_ib_header *ahdr = priv->s_hdr;
u32 hdrwords = qp->s_hdrwords;
-   struct hfi1_sge_state *ss = qp->s_cur_sge;
+   struct rvt_sge_state *ss = qp->s_cur_sge;
u32 len = qp->s_cur_size;
u32 dwords = (len + 3) >> 2;
u32 plen = hdrwords + dwords + 2; /* includes pbc */
@@ -1645,7 +1645,7 @@ int snoop_send_pio_handler(struct hfi1_qp *qp, struct 
hfi1_pkt_state *ps,
struct snoop_packet *s_packet = NULL;
	u32 *hdr = (u32 *)&ahdr->ibh;
u32 length = 0;
-   struct hfi1_sge_state temp_ss;
+   struct rvt_sge_state temp_ss;
void *data = NULL;
void *data_start = NULL;
int ret;
diff --git a/drivers/staging/rdma/hfi1/driver.c 
b/drivers/staging/rdma/hfi1/driver.c
index fb52d07..182e05f 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -318,7 +318,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct 

[RFC PATCH 06/15] IB/hfi1: Remove driver specific members from hfi1 qp type

2015-12-14 Thread Dennis Dalessandro
In preparation for moving the queue pair data structure to rdmavt, the
members of the driver-specific queue pair which are not common need to be
pushed off to a private driver structure. Once the queue pair moves to
rdmavt, this structure will be available in it as a void pointer. This
patch, while not adding a lot of value in and of itself, is a prerequisite
for moving the queue pair out of the drivers and into rdmavt.

The driver-specific, private queue pair data structure should condense as
more of the send-side code moves to rdmavt.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
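One note on the iowait_to_qp() used below: since the iowait now lives in
the private structure, getting back to the QP needs a container_of() plus
a back-pointer. A sketch, assuming the private struct keeps an owner
pointer back to the QP:

static inline struct hfi1_qp *iowait_to_qp(struct iowait *s_iowait)
{
	struct hfi1_qp_priv *priv;

	priv = container_of(s_iowait, struct hfi1_qp_priv, s_iowait);
	return priv->owner;	/* back-pointer set at QP init */
}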
 drivers/staging/rdma/hfi1/diag.c  |3 +
 drivers/staging/rdma/hfi1/pio.c   |6 ++-
 drivers/staging/rdma/hfi1/qp.c|   76 -
 drivers/staging/rdma/hfi1/qp.h|9 +++-
 drivers/staging/rdma/hfi1/rc.c|7 ++-
 drivers/staging/rdma/hfi1/ruc.c   |   46 --
 drivers/staging/rdma/hfi1/uc.c|7 ++-
 drivers/staging/rdma/hfi1/ud.c|   37 +-
 drivers/staging/rdma/hfi1/verbs.c |   54 --
 drivers/staging/rdma/hfi1/verbs.h |   26 ++---
 10 files changed, 166 insertions(+), 105 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 0cf324d..e172f2a 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1634,7 +1634,8 @@ int snoop_send_dma_handler(struct hfi1_qp *qp, struct 
hfi1_pkt_state *ps,
 int snoop_send_pio_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
   u64 pbc)
 {
-   struct ahg_ib_header *ahdr = qp->s_hdr;
+   struct hfi1_qp_priv *priv = qp->priv;
+   struct ahg_ib_header *ahdr = priv->s_hdr;
u32 hdrwords = qp->s_hdrwords;
struct hfi1_sge_state *ss = qp->s_cur_sge;
u32 len = qp->s_cur_size;
diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c
index eab58c1..b5bce5e 100644
--- a/drivers/staging/rdma/hfi1/pio.c
+++ b/drivers/staging/rdma/hfi1/pio.c
@@ -1498,6 +1498,7 @@ static void sc_piobufavail(struct send_context *sc)
struct list_head *list;
struct hfi1_qp *qps[PIO_WAIT_BATCH_SIZE];
struct hfi1_qp *qp;
+   struct hfi1_qp_priv *priv;
unsigned long flags;
unsigned i, n = 0;
 
@@ -1517,8 +1518,9 @@ static void sc_piobufavail(struct send_context *sc)
if (n == ARRAY_SIZE(qps))
goto full;
wait = list_first_entry(list, struct iowait, list);
-		qp = container_of(wait, struct hfi1_qp, s_iowait);
-		list_del_init(&qp->s_iowait.list);
+		qp = iowait_to_qp(wait);
+		priv = qp->priv;
+		list_del_init(&priv->s_iowait.list);
/* refcount held until actual wake up */
qps[n++] = qp;
}
diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index bb447b5..d49b1e9 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -349,11 +349,12 @@ bail:
  */
 static void reset_qp(struct hfi1_qp *qp, enum ib_qp_type type)
 {
+   struct hfi1_qp_priv *priv = qp->priv;
qp->remote_qpn = 0;
qp->qkey = 0;
qp->qp_access_flags = 0;
iowait_init(
-		&qp->s_iowait,
+		&priv->s_iowait,
1,
hfi1_do_send,
iowait_sleep,
@@ -460,6 +461,7 @@ static void clear_mr_refs(struct hfi1_qp *qp, int clr_sends)
 int hfi1_error_qp(struct hfi1_qp *qp, enum ib_wc_status err)
 {
struct hfi1_ibdev *dev = to_idev(qp->ibqp.device);
+   struct hfi1_qp_priv *priv = qp->priv;
struct ib_wc wc;
int ret = 0;
 
@@ -477,9 +479,9 @@ int hfi1_error_qp(struct hfi1_qp *qp, enum ib_wc_status err)
qp->s_flags &= ~HFI1_S_ANY_WAIT_SEND;
 
 	write_seqlock(&dev->iowait_lock);
-	if (!list_empty(&qp->s_iowait.list) && !(qp->s_flags & HFI1_S_BUSY)) {
+	if (!list_empty(&priv->s_iowait.list) && !(qp->s_flags & HFI1_S_BUSY)) {
 		qp->s_flags &= ~HFI1_S_ANY_WAIT_IO;
-		list_del_init(&qp->s_iowait.list);
+		list_del_init(&priv->s_iowait.list);
 		if (atomic_dec_and_test(&qp->refcount))
 			wake_up(&qp->wait);
}
@@ -544,11 +546,13 @@ bail:
 
 static void flush_tx_list(struct hfi1_qp *qp)
 {
-	while (!list_empty(&qp->s_iowait.tx_head)) {
+	struct hfi1_qp_priv *priv = qp->priv;
+
+	while (!list_empty(&priv->s_iowait.tx_head)) {
 		struct sdma_txreq *tx;
 
 		tx = list_first_entry(
-			&qp->s_iowait.tx_head,
+			&priv->s_iowait.tx_head,
 			struct sdma_txreq,
 			list);
 		list_del_init(&tx->list);
@@ -559,12 +563,13 @@ static void 

[RFC PATCH 09/15] IB/hfi1: Use address handle in rdmavt and remove from hfi1

2015-12-14 Thread Dennis Dalessandro
Original patch from Kamal Heib, split apart from the original and
modified to accommodate recent changes in rdmavt.

Remove AH from hfi1 and use rdmavt version.

Signed-off-by: Kamal Heib 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
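For reviewers unfamiliar with the rdmavt naming, ibah_to_rvtah() is the
usual container_of() accessor; the rdmavt helper should be equivalent to:

static inline struct rvt_ah *ibah_to_rvtah(struct ib_ah *ibah)
{
	return container_of(ibah, struct rvt_ah, ibah);
}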
 drivers/staging/rdma/hfi1/common.h |2 -
 drivers/staging/rdma/hfi1/mad.c|2 -
 drivers/staging/rdma/hfi1/qp.c |6 +-
 drivers/staging/rdma/hfi1/ruc.c|2 -
 drivers/staging/rdma/hfi1/ud.c |4 +
 drivers/staging/rdma/hfi1/verbs.c  |  131 ++--
 drivers/staging/rdma/hfi1/verbs.h  |   20 +
 7 files changed, 18 insertions(+), 149 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/common.h 
b/drivers/staging/rdma/hfi1/common.h
index 5dd9272..b0c415a 100644
--- a/drivers/staging/rdma/hfi1/common.h
+++ b/drivers/staging/rdma/hfi1/common.h
@@ -341,7 +341,6 @@ struct hfi1_message_header {
 #define FULL_MGMT_P_KEY  0x
 
 #define DEFAULT_P_KEY LIM_MGMT_P_KEY
-#define HFI1_PERMISSIVE_LID 0x
 #define HFI1_AETH_CREDIT_SHIFT 24
 #define HFI1_AETH_CREDIT_MASK 0x1F
 #define HFI1_AETH_CREDIT_INVAL 0x1F
@@ -353,7 +352,6 @@ struct hfi1_message_header {
 #define HFI1_BECN_SHIFT 30
 #define HFI1_BECN_MASK 1
 #define HFI1_BECN_SMASK (1 << HFI1_BECN_SHIFT)
-#define HFI1_MULTICAST_LID_BASE 0xC000
 
 static inline __u64 rhf_to_cpu(const __le32 *rbuf)
 {
diff --git a/drivers/staging/rdma/hfi1/mad.c b/drivers/staging/rdma/hfi1/mad.c
index 0a3f291..8e9d1e7 100644
--- a/drivers/staging/rdma/hfi1/mad.c
+++ b/drivers/staging/rdma/hfi1/mad.c
@@ -137,7 +137,7 @@ static void send_trap(struct hfi1_ibport *ibp, void *data, 
unsigned len)
ret = PTR_ERR(ah);
else {
send_buf->ah = ah;
-   ibp->sm_ah = to_iah(ah);
+   ibp->sm_ah = ibah_to_rvtah(ah);
ret = 0;
}
} else
diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index 7c356e4..bbe6b4d 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -424,7 +424,7 @@ static void clear_mr_refs(struct hfi1_qp *qp, int clr_sends)
if (qp->ibqp.qp_type == IB_QPT_UD ||
qp->ibqp.qp_type == IB_QPT_SMI ||
qp->ibqp.qp_type == IB_QPT_GSI)
-			atomic_dec(&to_iah(wqe->ud_wr.ah)->refcount);
+			atomic_dec(&ibah_to_rvtah(wqe->ud_wr.ah)->refcount);
if (++qp->s_last >= qp->s_size)
qp->s_last = 0;
}
@@ -642,7 +642,7 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
 
if (attr->ah_attr.dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE))
goto inval;
-		if (hfi1_check_ah(qp->ibqp.device, &attr->ah_attr))
+		if (rvt_check_ah(qp->ibqp.device, &attr->ah_attr))
 			goto inval;
 		sc = ah_to_sc(ibqp->device, &attr->ah_attr);
if (!qp_to_sdma_engine(qp, sc) &&
@@ -656,7 +656,7 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
if (attr->alt_ah_attr.dlid >=
be16_to_cpu(IB_MULTICAST_LID_BASE))
goto inval;
-		if (hfi1_check_ah(qp->ibqp.device, &attr->alt_ah_attr))
+		if (rvt_check_ah(qp->ibqp.device, &attr->alt_ah_attr))
goto inval;
if (attr->alt_pkey_index >= hfi1_get_npkeys(dd))
goto inval;
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
index 736b44d..4108c6a 100644
--- a/drivers/staging/rdma/hfi1/ruc.c
+++ b/drivers/staging/rdma/hfi1/ruc.c
@@ -891,7 +891,7 @@ void hfi1_send_complete(struct hfi1_qp *qp, struct 
hfi1_swqe *wqe,
if (qp->ibqp.qp_type == IB_QPT_UD ||
qp->ibqp.qp_type == IB_QPT_SMI ||
qp->ibqp.qp_type == IB_QPT_GSI)
-		atomic_dec(&to_iah(wqe->ud_wr.ah)->refcount);
+		atomic_dec(&ibah_to_rvtah(wqe->ud_wr.ah)->refcount);
 
/* See ch. 11.2.4.1 and 10.7.3.1 */
if (!(qp->s_flags & HFI1_S_SIGNAL_REQ_WR) ||
diff --git a/drivers/staging/rdma/hfi1/ud.c b/drivers/staging/rdma/hfi1/ud.c
index aad4e49..24b6077 100644
--- a/drivers/staging/rdma/hfi1/ud.c
+++ b/drivers/staging/rdma/hfi1/ud.c
@@ -98,7 +98,7 @@ static void ud_loopback(struct hfi1_qp *sqp, struct hfi1_swqe 
*swqe)
goto drop;
}
 
-	ah_attr = &to_iah(swqe->ud_wr.ah)->attr;
+	ah_attr = &ibah_to_rvtah(swqe->ud_wr.ah)->attr;
ppd = ppd_from_ibp(ibp);
 
if (qp->ibqp.qp_num > 1) {
@@ -310,7 +310,7 @@ int hfi1_make_ud_req(struct hfi1_qp *qp)
/* Construct the header. */
ibp = 

[RFC PATCH 05/15] IB/hfi1: Remove MR data structures from hfi1

2015-12-14 Thread Dennis Dalessandro
Remove MR data structures from hfi1 and use the version in rdmavt

Reviewed-by: Dean Luick 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
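For context, the RVT_SEGSZ arithmetic below walks rdmavt's two-level
segment map, which (reduced to the fields that matter here; the real
struct carries more) looks like:

struct rvt_seg {
	void *vaddr;
	size_t length;
};

struct rvt_segarray {
	struct rvt_seg segs[RVT_SEGSZ];
};

struct rvt_mregion {
	u32 mapsz;			/* size of the map array */
	struct rvt_segarray *map[0];	/* the segments themselves */
};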
 drivers/staging/rdma/hfi1/keys.c  |   30 +-
 drivers/staging/rdma/hfi1/mr.c|   22 ++---
 drivers/staging/rdma/hfi1/ruc.c   |4 +-
 drivers/staging/rdma/hfi1/sdma.h  |2 +
 drivers/staging/rdma/hfi1/ud.c|2 +
 drivers/staging/rdma/hfi1/verbs.c |   16 +
 drivers/staging/rdma/hfi1/verbs.h |   63 ++---
 7 files changed, 48 insertions(+), 91 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/keys.c b/drivers/staging/rdma/hfi1/keys.c
index 57a266f..ffaaa6f 100644
--- a/drivers/staging/rdma/hfi1/keys.c
+++ b/drivers/staging/rdma/hfi1/keys.c
@@ -63,21 +63,21 @@
  *
  */
 
-int hfi1_alloc_lkey(struct hfi1_mregion *mr, int dma_region)
+int hfi1_alloc_lkey(struct rvt_mregion *mr, int dma_region)
 {
unsigned long flags;
u32 r;
u32 n;
int ret = 0;
struct hfi1_ibdev *dev = to_idev(mr->pd->device);
-	struct hfi1_lkey_table *rkt = &dev->lk_table;
+	struct rvt_lkey_table *rkt = &dev->lk_table;
 
 	hfi1_get_mr(mr);
 	spin_lock_irqsave(&rkt->lock, flags);
 
/* special case for dma_mr lkey == 0 */
if (dma_region) {
-   struct hfi1_mregion *tmr;
+   struct rvt_mregion *tmr;
 
tmr = rcu_access_pointer(dev->dma_mr);
if (!tmr) {
@@ -133,13 +133,13 @@ bail:
  * hfi1_free_lkey - free an lkey
  * @mr: mr to free from tables
  */
-void hfi1_free_lkey(struct hfi1_mregion *mr)
+void hfi1_free_lkey(struct rvt_mregion *mr)
 {
unsigned long flags;
u32 lkey = mr->lkey;
u32 r;
struct hfi1_ibdev *dev = to_idev(mr->pd->device);
-	struct hfi1_lkey_table *rkt = &dev->lk_table;
+	struct rvt_lkey_table *rkt = &dev->lk_table;
 	int freed = 0;
 
 	spin_lock_irqsave(&rkt->lock, flags);
@@ -176,10 +176,10 @@ out:
  * Check the IB SGE for validity and initialize our internal version
  * of it.
  */
-int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct rvt_pd *pd,
+int hfi1_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 struct hfi1_sge *isge, struct ib_sge *sge, int acc)
 {
-   struct hfi1_mregion *mr;
+   struct rvt_mregion *mr;
unsigned n, m;
size_t off;
 
@@ -231,15 +231,15 @@ int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct 
rvt_pd *pd,
 
entries_spanned_by_off = off >> mr->page_shift;
off -= (entries_spanned_by_off << mr->page_shift);
-   m = entries_spanned_by_off / HFI1_SEGSZ;
-   n = entries_spanned_by_off % HFI1_SEGSZ;
+   m = entries_spanned_by_off / RVT_SEGSZ;
+   n = entries_spanned_by_off % RVT_SEGSZ;
} else {
m = 0;
n = 0;
while (off >= mr->map[m]->segs[n].length) {
off -= mr->map[m]->segs[n].length;
n++;
-   if (n >= HFI1_SEGSZ) {
+   if (n >= RVT_SEGSZ) {
m++;
n = 0;
}
@@ -274,8 +274,8 @@ bail:
 int hfi1_rkey_ok(struct hfi1_qp *qp, struct hfi1_sge *sge,
 u32 len, u64 vaddr, u32 rkey, int acc)
 {
-	struct hfi1_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
-	struct hfi1_mregion *mr;
+	struct rvt_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+	struct rvt_mregion *mr;
unsigned n, m;
size_t off;
 
@@ -328,15 +328,15 @@ int hfi1_rkey_ok(struct hfi1_qp *qp, struct hfi1_sge *sge,
 
entries_spanned_by_off = off >> mr->page_shift;
off -= (entries_spanned_by_off << mr->page_shift);
-   m = entries_spanned_by_off / HFI1_SEGSZ;
-   n = entries_spanned_by_off % HFI1_SEGSZ;
+   m = entries_spanned_by_off / RVT_SEGSZ;
+   n = entries_spanned_by_off % RVT_SEGSZ;
} else {
m = 0;
n = 0;
while (off >= mr->map[m]->segs[n].length) {
off -= mr->map[m]->segs[n].length;
n++;
-   if (n >= HFI1_SEGSZ) {
+   if (n >= RVT_SEGSZ) {
m++;
n = 0;
}
diff --git a/drivers/staging/rdma/hfi1/mr.c b/drivers/staging/rdma/hfi1/mr.c
index 02589b2..27f8081 100644
--- a/drivers/staging/rdma/hfi1/mr.c
+++ b/drivers/staging/rdma/hfi1/mr.c
@@ -56,7 +56,7 @@
 /* Fast memory region */
 struct hfi1_fmr {
struct ib_fmr ibfmr;
-   struct hfi1_mregion mr;/* must be last */
+   struct rvt_mregion mr;/* must be last */
 };
 
 static inline struct 

[RFC PATCH 14/15] IB/hfi1: Remove mmap from hfi1

2015-12-14 Thread Dennis Dalessandro
The mmap data structure has already been moved to rdmavt and hfi1 supports
it. Now that the mmap functionality has also been moved to rdmavt, it's
time for hfi1 to use that as well.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
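The kref_put(&cq->ip->ref, rvt_release_mmap_info) conversions below rely
on rdmavt's release callback doing what hfi1_release_mmap_info did;
roughly (a sketch of the rdmavt side, modulo exact locking):

void rvt_release_mmap_info(struct kref *ref)
{
	struct rvt_mmap_info *ip =
		container_of(ref, struct rvt_mmap_info, ref);
	struct rvt_dev_info *rdi = ib_to_rvt(ip->context->device);

	spin_lock_irq(&rdi->pending_lock);
	list_del(&ip->link);
	spin_unlock_irq(&rdi->pending_lock);

	vfree(ip->obj);
	kfree(ip);
}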
 drivers/staging/rdma/hfi1/Makefile |2 
 drivers/staging/rdma/hfi1/cq.c |   18 ++-
 drivers/staging/rdma/hfi1/mmap.c   |  192 
 drivers/staging/rdma/hfi1/qp.c |   12 +-
 drivers/staging/rdma/hfi1/srq.c|   20 ++--
 drivers/staging/rdma/hfi1/verbs.c  |6 -
 drivers/staging/rdma/hfi1/verbs.h  |   17 ---
 7 files changed, 27 insertions(+), 240 deletions(-)
 delete mode 100644 drivers/staging/rdma/hfi1/mmap.c

diff --git a/drivers/staging/rdma/hfi1/Makefile 
b/drivers/staging/rdma/hfi1/Makefile
index 3ba64fe..ff663b3 100644
--- a/drivers/staging/rdma/hfi1/Makefile
+++ b/drivers/staging/rdma/hfi1/Makefile
@@ -8,7 +8,7 @@
 obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
 
 hfi1-y := chip.o cq.o device.o diag.o driver.o eprom.o file_ops.o firmware.o \
-   init.o intr.o mad.o mmap.o pcie.o pio.o pio_copy.o \
+   init.o intr.o mad.o pcie.o pio.o pio_copy.o \
qp.o qsfp.o rc.o ruc.o sdma.o srq.o sysfs.o trace.o twsi.o \
uc.o ud.o user_pages.o user_sdma.o verbs_mcast.o verbs.o
 hfi1-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/staging/rdma/hfi1/cq.c b/drivers/staging/rdma/hfi1/cq.c
index ffd0e7a..25d1a2a 100644
--- a/drivers/staging/rdma/hfi1/cq.c
+++ b/drivers/staging/rdma/hfi1/cq.c
@@ -277,7 +277,7 @@ struct ib_cq *hfi1_create_cq(
if (udata && udata->outlen >= sizeof(__u64)) {
int err;
 
-   cq->ip = hfi1_create_mmap_info(dev, sz, context, wc);
+		cq->ip = rvt_create_mmap_info(&dev->rdi, sz, context, wc);
if (!cq->ip) {
ret = ERR_PTR(-ENOMEM);
goto bail_wc;
@@ -303,9 +303,9 @@ struct ib_cq *hfi1_create_cq(
spin_unlock(>n_cqs_lock);
 
if (cq->ip) {
-		spin_lock_irq(&dev->pending_lock);
-		list_add(&cq->ip->pending_mmaps, &dev->pending_mmaps);
-		spin_unlock_irq(&dev->pending_lock);
+		spin_lock_irq(&dev->rdi.pending_lock);
+		list_add(&cq->ip->pending_mmaps, &dev->rdi.pending_mmaps);
+		spin_unlock_irq(&dev->rdi.pending_lock);
}
 
/*
@@ -355,7 +355,7 @@ int hfi1_destroy_cq(struct ib_cq *ibcq)
dev->n_cqs_allocated--;
spin_unlock(>n_cqs_lock);
if (cq->ip)
-		kref_put(&cq->ip->ref, hfi1_release_mmap_info);
+		kref_put(&cq->ip->ref, rvt_release_mmap_info);
else
vfree(cq->queue);
kfree(cq);
@@ -481,7 +481,7 @@ int hfi1_resize_cq(struct ib_cq *ibcq, int cqe, struct 
ib_udata *udata)
struct hfi1_ibdev *dev = to_idev(ibcq->device);
struct rvt_mmap_info *ip = cq->ip;
 
-   hfi1_update_mmap_info(dev, ip, sz, wc);
+		rvt_update_mmap_info(&dev->rdi, ip, sz, wc);
 
/*
 * Return the offset to mmap.
@@ -494,10 +494,10 @@ int hfi1_resize_cq(struct ib_cq *ibcq, int cqe, struct 
ib_udata *udata)
goto bail;
}
 
-	spin_lock_irq(&dev->pending_lock);
+	spin_lock_irq(&dev->rdi.pending_lock);
 	if (list_empty(&ip->pending_mmaps))
-		list_add(&ip->pending_mmaps, &dev->pending_mmaps);
+		list_add(&ip->pending_mmaps, &dev->rdi.pending_mmaps);
-	spin_unlock_irq(&dev->pending_lock);
+	spin_unlock_irq(&dev->rdi.pending_lock);
}
 
ret = 0;
diff --git a/drivers/staging/rdma/hfi1/mmap.c b/drivers/staging/rdma/hfi1/mmap.c
deleted file mode 100644
index 4ce6be6..000
--- a/drivers/staging/rdma/hfi1/mmap.c
+++ /dev/null
@@ -1,192 +0,0 @@
-/*
- *
- * This file is provided under a dual BSD/GPLv2 license.  When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * Copyright(c) 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * BSD LICENSE
- *
- * Copyright(c) 2015 Intel Corporation.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- *  - Redistributions of source code must retain the above copyright
- *notice, this list of conditions and the following disclaimer.
- *  - 

[RFC PATCH 08/15] IB/hfi1: Use correct rdmavt header files after move.

2015-12-14 Thread Dennis Dalessandro
Rdmavt split the header files to be based on ibta object. This patch
makes changes in hfi1 to account for the move.

The actual removal of HFI1 code continues in the following patch.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
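The be16_to_cpu() sprinkled through the conversions below is needed
because, if memory serves, the ib_verbs.h constants are stored big-endian,
unlike the old host-order HFI1 defines:

/* include/rdma/ib_verbs.h */
#define IB_LID_PERMISSIVE	cpu_to_be16(0xFFFF)
#define IB_MULTICAST_LID_BASE	cpu_to_be16(0xC000)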
 drivers/staging/rdma/hfi1/driver.c |3 ++-
 drivers/staging/rdma/hfi1/mad.c|4 ++--
 drivers/staging/rdma/hfi1/qp.c |5 +++--
 drivers/staging/rdma/hfi1/ud.c |   14 +++---
 drivers/staging/rdma/hfi1/verbs.c  |4 ++--
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/driver.c 
b/drivers/staging/rdma/hfi1/driver.c
index 4f0103b..fb52d07 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "hfi.h"
 #include "trace.h"
@@ -316,7 +317,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct 
hfi1_pportdata *ppd,
 
/* Get the destination QP number. */
qp_num = be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
-   if (lid < HFI1_MULTICAST_LID_BASE) {
+   if (lid < be16_to_cpu(IB_MULTICAST_LID_BASE)) {
struct hfi1_qp *qp;
unsigned long flags;
 
diff --git a/drivers/staging/rdma/hfi1/mad.c b/drivers/staging/rdma/hfi1/mad.c
index 1c34396..0a3f291 100644
--- a/drivers/staging/rdma/hfi1/mad.c
+++ b/drivers/staging/rdma/hfi1/mad.c
@@ -1096,7 +1096,7 @@ static int __subn_set_opa_portinfo(struct opa_smp *smp, 
u32 am, u8 *data,
 
/* Must be a valid unicast LID address. */
if ((lid == 0 && ls_old > IB_PORT_INIT) ||
-lid >= HFI1_MULTICAST_LID_BASE) {
+lid >= be16_to_cpu(IB_MULTICAST_LID_BASE)) {
smp->status |= IB_SMP_INVALID_FIELD;
pr_warn("SubnSet(OPA_PortInfo) lid invalid 0x%x\n",
lid);
@@ -1129,7 +1129,7 @@ static int __subn_set_opa_portinfo(struct opa_smp *smp, 
u32 am, u8 *data,
 
/* Must be a valid unicast LID address. */
if ((smlid == 0 && ls_old > IB_PORT_INIT) ||
-smlid >= HFI1_MULTICAST_LID_BASE) {
+smlid >= be16_to_cpu(IB_MULTICAST_LID_BASE)) {
smp->status |= IB_SMP_INVALID_FIELD;
pr_warn("SubnSet(OPA_PortInfo) smlid invalid 0x%x\n", smlid);
} else if (smlid != ibp->sm_lid || msl != ibp->sm_sl) {
diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index d49b1e9..7c356e4 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -640,7 +640,7 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
if (attr_mask & IB_QP_AV) {
u8 sc;
 
-   if (attr->ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
+   if (attr->ah_attr.dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE))
goto inval;
 		if (hfi1_check_ah(qp->ibqp.device, &attr->ah_attr))
goto inval;
@@ -653,7 +653,8 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
if (attr_mask & IB_QP_ALT_PATH) {
u8 sc;
 
-   if (attr->alt_ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
+   if (attr->alt_ah_attr.dlid >=
+   be16_to_cpu(IB_MULTICAST_LID_BASE))
goto inval;
 		if (hfi1_check_ah(qp->ibqp.device, &attr->alt_ah_attr))
goto inval;
diff --git a/drivers/staging/rdma/hfi1/ud.c b/drivers/staging/rdma/hfi1/ud.c
index ba8a557..aad4e49 100644
--- a/drivers/staging/rdma/hfi1/ud.c
+++ b/drivers/staging/rdma/hfi1/ud.c
@@ -243,7 +243,7 @@ static void ud_loopback(struct hfi1_qp *sqp, struct 
hfi1_swqe *swqe)
wc.slid = ppd->lid | (ah_attr->src_path_bits & ((1 << ppd->lmc) - 1));
/* Check for loopback when the port lid is not set */
if (wc.slid == 0 && sqp->ibqp.qp_type == IB_QPT_GSI)
-   wc.slid = HFI1_PERMISSIVE_LID;
+   wc.slid = be16_to_cpu(IB_LID_PERMISSIVE);
wc.sl = ah_attr->sl;
wc.dlid_path_bits = ah_attr->dlid & ((1 << ppd->lmc) - 1);
wc.port_num = qp->port_num;
@@ -311,11 +311,11 @@ int hfi1_make_ud_req(struct hfi1_qp *qp)
ibp = to_iport(qp->ibqp.device, qp->port_num);
ppd = ppd_from_ibp(ibp);
 	ah_attr = &to_iah(wqe->ud_wr.ah)->attr;
-   if (ah_attr->dlid < HFI1_MULTICAST_LID_BASE ||
-   ah_attr->dlid == HFI1_PERMISSIVE_LID) {
+   if (ah_attr->dlid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
+   ah_attr->dlid == be16_to_cpu(IB_LID_PERMISSIVE)) {
lid = ah_attr->dlid & ~((1 << ppd->lmc) - 1);
if (unlikely(!loopback && (lid == ppd->lid ||
-   (lid == HFI1_PERMISSIVE_LID &&
+   (lid == be16_to_cpu(IB_LID_PERMISSIVE) &&

[RFC PATCH 02/15] IB/hfi1: Add basic rdmavt capability flags for hfi1

2015-12-14 Thread Dennis Dalessandro
Most functionality is still being done in the driver, set flags so that
rdmavt will let hfi1 continue to handle mr, qp, and cq init.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
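The intent is that rvt_register_device() consults these flags and skips
its own setup for the pieces the driver still owns; presumably something
along the lines of (a sketch with an illustrative helper name, not the
exact rdmavt code):

	if (!(rdi->flags & RVT_FLAG_QP_INIT_DRIVER)) {
		/* rvt_driver_qp_init is an illustrative name */
		ret = rvt_driver_qp_init(rdi);
		if (ret)
			goto bail;
	}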
 drivers/staging/rdma/hfi1/verbs.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/verbs.c 
b/drivers/staging/rdma/hfi1/verbs.c
index 4292d52..c457e82 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -2075,6 +2075,9 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
 */
dd->verbs_dev.rdi.driver_f.port_callback = hfi1_create_port_files;
dd->verbs_dev.rdi.dparms.props.max_pd = hfi1_max_pds;
+   dd->verbs_dev.rdi.flags = (RVT_FLAG_MR_INIT_DRIVER |
+  RVT_FLAG_QP_INIT_DRIVER |
+  RVT_FLAG_CQ_INIT_DRIVER);
 
	ret = rvt_register_device(&dd->verbs_dev.rdi);
if (ret)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2015-12-14 Thread Dennis Dalessandro
This patch series is being submitted as a Request For Comment only. It depends
on code submitted to the rdma subsystem [1].

This work is the first submission aimed to satisfy the TODO item for removing
duplicated code in hfi1. At the time of submission hfi1 and qib contained alot
of duplicated verbs processing code. The qib driver is having similar changes
made to use rdmavt. This will result in a common code base that both drivers and
future drivers such as soft-roce can use.

Note that due to the ongoing submission of hfi1 improvement patches, there will
likely be a number of conflicts which will still need to be resolved.

We also are still faced with the issue of separate trees for this work as was
discussed previously [2]. The result of that conversation was to keep the
drivers in separate trees until the 4.5 merge window. We are hoping that after
this merge window a single maintainer can take control of hfi1, qib, and rdmavt
so that these patches can move forward and be applied.

For now though we would like to get feedback on these patches with more to
follow.

[1] https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg30074.html
[2] https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg29360.html

---

Dennis Dalessandro (15):
  IB/hfi1: Begin to use rdmavt for verbs
  IB/hfi1: Add basic rdmavt capability flags for hfi1
  IB/hfi1: Consolidate dma ops for hfi1
  IB/hfi1: Use rdmavt protection domain
  IB/hfi1: Remove MR data structures from hfi1
  IB/hfi1: Remove driver specific members from hfi1 qp type
  IB/hfi1: Add device specific info prints
  IB/hfi1: Use correct rdmavt header files after move.
  IB/hfi1: Use address handle in rdmavt and remove from hfi1
  IB/hfi1: Implement hfi1 support for AH notification
  IB/hfi1: Remove hfi1 MR and hfi1 specific qp type
  IB/hfi1: Remove srq from hfi1
  IB/hfi1: Remove ibport and use rdmavt version
  IB/hfi1: Remove mmap from hfi1
  IB/hfi1: Use rdmavt pkey verbs function


 drivers/staging/rdma/hfi1/Kconfig   |2 
 drivers/staging/rdma/hfi1/Makefile  |4 
 drivers/staging/rdma/hfi1/chip.c|   36 +-
 drivers/staging/rdma/hfi1/common.h  |2 
 drivers/staging/rdma/hfi1/cq.c  |   20 +
 drivers/staging/rdma/hfi1/diag.c|   13 -
 drivers/staging/rdma/hfi1/driver.c  |   31 +-
 drivers/staging/rdma/hfi1/hfi.h |   27 +-
 drivers/staging/rdma/hfi1/init.c|5 
 drivers/staging/rdma/hfi1/intr.c|2 
 drivers/staging/rdma/hfi1/keys.c|  356 -
 drivers/staging/rdma/hfi1/mad.c |  163 +-
 drivers/staging/rdma/hfi1/mmap.c|  192 ---
 drivers/staging/rdma/hfi1/mr.c  |  522 --
 drivers/staging/rdma/hfi1/pio.c |   10 -
 drivers/staging/rdma/hfi1/qp.c  |  214 +++-
 drivers/staging/rdma/hfi1/qp.h  |   44 +--
 drivers/staging/rdma/hfi1/rc.c  |  155 +
 drivers/staging/rdma/hfi1/ruc.c |  161 +
 drivers/staging/rdma/hfi1/sdma.h|8 
 drivers/staging/rdma/hfi1/srq.c |   58 ++-
 drivers/staging/rdma/hfi1/sysfs.c   |   18 +
 drivers/staging/rdma/hfi1/trace.h   |   22 +
 drivers/staging/rdma/hfi1/uc.c  |   19 +
 drivers/staging/rdma/hfi1/ud.c  |   91 +++--
 drivers/staging/rdma/hfi1/verbs.c   |  526 +++
 drivers/staging/rdma/hfi1/verbs.h   |  531 ---
 drivers/staging/rdma/hfi1/verbs_mcast.c |   36 +-
 28 files changed, 843 insertions(+), 2425 deletions(-)
 delete mode 100644 drivers/staging/rdma/hfi1/keys.c
 delete mode 100644 drivers/staging/rdma/hfi1/mmap.c
 delete mode 100644 drivers/staging/rdma/hfi1/mr.c

--
-Denny


[RFC PATCH 03/15] IB/hfi1: Consolidate dma ops for hfi1

2015-12-14 Thread Dennis Dalessandro
Remove the dma.c file from hfi1 in favor of the version already
present in rdmavt.

Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/Makefile |2 +-
 drivers/staging/rdma/hfi1/verbs.c  |2 +-
 drivers/staging/rdma/hfi1/verbs.h  |2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/Makefile b/drivers/staging/rdma/hfi1/Makefile
index 2e5daa6..2126b8b 100644
--- a/drivers/staging/rdma/hfi1/Makefile
+++ b/drivers/staging/rdma/hfi1/Makefile
@@ -7,7 +7,7 @@
 #
 obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
 
-hfi1-y := chip.o cq.o device.o diag.o dma.o driver.o eprom.o file_ops.o firmware.o \
+hfi1-y := chip.o cq.o device.o diag.o driver.o eprom.o file_ops.o firmware.o \
init.o intr.o keys.o mad.o mmap.o mr.o pcie.o pio.o pio_copy.o \
qp.o qsfp.o rc.o ruc.o sdma.o srq.o sysfs.o trace.o twsi.o \
uc.o ud.o user_pages.o user_sdma.o verbs_mcast.o verbs.o
diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
index c457e82..22e2742 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -2064,7 +2064,7 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
ibdev->detach_mcast = hfi1_multicast_detach;
ibdev->process_mad = hfi1_process_mad;
ibdev->mmap = hfi1_mmap;
-   ibdev->dma_ops = &hfi1_dma_mapping_ops;
+   ibdev->dma_ops = NULL;
ibdev->get_port_immutable = port_immutable;
 
strncpy(ibdev->node_desc, init_utsname()->nodename,
diff --git a/drivers/staging/rdma/hfi1/verbs.h b/drivers/staging/rdma/hfi1/verbs.h
index a290ed3..4f23e00 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -1155,6 +1155,4 @@ extern unsigned int hfi1_max_srq_wrs;
 
 extern const u32 ib_hfi1_rnr_table[];
 
-extern struct ib_dma_mapping_ops hfi1_dma_mapping_ops;
-
 #endif  /* HFI1_VERBS_H */



[RFC PATCH 01/15] IB/hfi1: Begin to use rdmavt for verbs

2015-12-14 Thread Dennis Dalessandro
This patch begins to make use of rdmavt by registering with it and
providing access to the header files. This is just the beginning of
rdmavt support in hfi1.

Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/Kconfig |2 +-
 drivers/staging/rdma/hfi1/chip.c  |2 +-
 drivers/staging/rdma/hfi1/diag.c  |2 +-
 drivers/staging/rdma/hfi1/hfi.h   |1 +
 drivers/staging/rdma/hfi1/init.c  |5 +++--
 drivers/staging/rdma/hfi1/intr.c  |2 +-
 drivers/staging/rdma/hfi1/mad.c   |5 +++--
 drivers/staging/rdma/hfi1/qp.c|4 ++--
 drivers/staging/rdma/hfi1/sysfs.c |   18 +-
 drivers/staging/rdma/hfi1/verbs.c |   15 ++-
 drivers/staging/rdma/hfi1/verbs.h |8 ++--
 11 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/Kconfig b/drivers/staging/rdma/hfi1/Kconfig
index fd25078..55048fe 100644
--- a/drivers/staging/rdma/hfi1/Kconfig
+++ b/drivers/staging/rdma/hfi1/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_HFI1
tristate "Intel OPA Gen1 support"
-   depends on X86_64
+   depends on X86_64 && INFINIBAND_RDMAVT
default m
---help---
This is a low-level driver for Intel OPA Gen1 adapter.
diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index dc69159..f799b86 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -6631,7 +6631,7 @@ int set_link_state(struct hfi1_pportdata *ppd, u32 state)
sdma_all_running(dd);
 
/* Signal the IB layer that the port has went active */
-   event.device = &dd->verbs_dev.ibdev;
+   event.device = &dd->verbs_dev.rdi.ibdev;
event.element.port_num = ppd->port;
event.event = IB_EVENT_PORT_ACTIVE;
}
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 0aaad74..0cf324d 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -856,7 +856,7 @@ static ssize_t hfi1_snoop_write(struct file *fp, const char __user *data,
vl = sc4;
} else {
sl = (byte_two >> 4) & 0xf;
-   ibp = to_iport(&dd->verbs_dev.ibdev, 1);
+   ibp = to_iport(&dd->verbs_dev.rdi.ibdev, 1);
sc5 = ibp->sl_to_sc[sl];
vl = sc_to_vlt(dd, sc5);
if (vl != sc4) {
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index 54ed6b3..c4991be 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include <rdma/rdma_vt.h>
 
 #include "chip_registers.h"
 #include "common.h"
diff --git a/drivers/staging/rdma/hfi1/init.c b/drivers/staging/rdma/hfi1/init.c
index 1c8286f..1f64e4e 100644
--- a/drivers/staging/rdma/hfi1/init.c
+++ b/drivers/staging/rdma/hfi1/init.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include <rdma/rdma_vt.h>
 
 #include "hfi.h"
 #include "device.h"
@@ -985,7 +986,7 @@ void hfi1_free_devdata(struct hfi1_devdata *dd)
rcu_barrier(); /* wait for rcu callbacks to complete */
free_percpu(dd->int_counter);
free_percpu(dd->rcv_limit);
-   ib_dealloc_device(&dd->verbs_dev.ibdev);
+   ib_dealloc_device(&dd->verbs_dev.rdi.ibdev);
 }
 
 /*
@@ -1081,7 +1082,7 @@ struct hfi1_devdata *hfi1_alloc_devdata(struct pci_dev *pdev, size_t extra)
 bail:
if (!list_empty(>list))
list_del_init(>list);
-   ib_dealloc_device(&dd->verbs_dev.ibdev);
+   ib_dealloc_device(&dd->verbs_dev.rdi.ibdev);
return ERR_PTR(ret);
 }
 
diff --git a/drivers/staging/rdma/hfi1/intr.c b/drivers/staging/rdma/hfi1/intr.c
index 426582b..1283f2d 100644
--- a/drivers/staging/rdma/hfi1/intr.c
+++ b/drivers/staging/rdma/hfi1/intr.c
@@ -98,7 +98,7 @@ static void signal_ib_event(struct hfi1_pportdata *ppd, enum ib_event_type ev)
 */
if (!(dd->flags & HFI1_INITTED))
return;
-   event.device = &dd->verbs_dev.ibdev;
+   event.device = &dd->verbs_dev.rdi.ibdev;
event.element.port_num = ppd->port;
event.event = ev;
ib_dispatch_event();
diff --git a/drivers/staging/rdma/hfi1/mad.c b/drivers/staging/rdma/hfi1/mad.c
index a122565..1c34396 100644
--- a/drivers/staging/rdma/hfi1/mad.c
+++ b/drivers/staging/rdma/hfi1/mad.c
@@ -1387,7 +1387,7 @@ static int set_pkeys(struct hfi1_devdata *dd, u8 port, u16 *pkeys)
(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_PKEYS, 0);
 
event.event = IB_EVENT_PKEY_CHANGE;
-   event.device = &dd->verbs_dev.ibdev;
+   event.device = &dd->verbs_dev.rdi.ibdev;
event.element.port_num = port;

[RFC PATCH 04/15] IB/hfi1: Use rdmavt protection domain

2015-12-14 Thread Dennis Dalessandro
Remove protection domain from hfi1 and use rdmavt's version.

Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/keys.c  |4 +-
 drivers/staging/rdma/hfi1/mr.c|2 +
 drivers/staging/rdma/hfi1/ruc.c   |4 +-
 drivers/staging/rdma/hfi1/verbs.c |   67 +++--
 drivers/staging/rdma/hfi1/verbs.h |   15 +---
 5 files changed, 12 insertions(+), 80 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/keys.c b/drivers/staging/rdma/hfi1/keys.c
index cb4e608..57a266f 100644
--- a/drivers/staging/rdma/hfi1/keys.c
+++ b/drivers/staging/rdma/hfi1/keys.c
@@ -176,7 +176,7 @@ out:
  * Check the IB SGE for validity and initialize our internal version
  * of it.
  */
-int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct hfi1_pd *pd,
+int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct rvt_pd *pd,
 struct hfi1_sge *isge, struct ib_sge *sge, int acc)
 {
struct hfi1_mregion *mr;
@@ -285,7 +285,7 @@ int hfi1_rkey_ok(struct hfi1_qp *qp, struct hfi1_sge *sge,
 */
rcu_read_lock();
if (rkey == 0) {
-   struct hfi1_pd *pd = to_ipd(qp->ibqp.pd);
+   struct rvt_pd *pd = ibpd_to_rvtpd(qp->ibqp.pd);
struct hfi1_ibdev *dev = to_idev(pd->ibpd.device);
 
if (pd->user)
diff --git a/drivers/staging/rdma/hfi1/mr.c b/drivers/staging/rdma/hfi1/mr.c
index 568f185..02589b2 100644
--- a/drivers/staging/rdma/hfi1/mr.c
+++ b/drivers/staging/rdma/hfi1/mr.c
@@ -116,7 +116,7 @@ struct ib_mr *hfi1_get_dma_mr(struct ib_pd *pd, int acc)
struct ib_mr *ret;
int rval;
 
-   if (to_ipd(pd)->user) {
+   if (ibpd_to_rvtpd(pd)->user) {
ret = ERR_PTR(-EPERM);
goto bail;
}
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
index 317bf6f..eb7aea9 100644
--- a/drivers/staging/rdma/hfi1/ruc.c
+++ b/drivers/staging/rdma/hfi1/ruc.c
@@ -102,11 +102,11 @@ static int init_sge(struct hfi1_qp *qp, struct hfi1_rwqe *wqe)
int i, j, ret;
struct ib_wc wc;
struct hfi1_lkey_table *rkt;
-   struct hfi1_pd *pd;
+   struct rvt_pd *pd;
struct hfi1_sge_state *ss;
 
rkt = &to_idev(qp->ibqp.device)->lk_table;
-   pd = to_ipd(qp->ibqp.srq ? qp->ibqp.srq->pd : qp->ibqp.pd);
+   pd = ibpd_to_rvtpd(qp->ibqp.srq ? qp->ibqp.srq->pd : qp->ibqp.pd);
ss = &qp->r_sge;
ss->sg_list = qp->r_sg_list;
qp->r_len = 0;
diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
index 22e2742..1390755 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -368,7 +368,7 @@ static int post_one_send(struct hfi1_qp *qp, struct ib_send_wr *wr)
int j;
int acc;
struct hfi1_lkey_table *rkt;
-   struct hfi1_pd *pd;
+   struct rvt_pd *pd;
struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
@@ -413,7 +413,7 @@ static int post_one_send(struct hfi1_qp *qp, struct ib_send_wr *wr)
return -ENOMEM;
 
rkt = &to_idev(qp->ibqp.device)->lk_table;
-   pd = to_ipd(qp->ibqp.pd);
+   pd = ibpd_to_rvtpd(qp->ibqp.pd);
wqe = get_swqe_ptr(qp, qp->s_head);
 
 
@@ -1394,7 +1394,7 @@ static int query_device(struct ib_device *ibdev,
props->max_mr = dev->lk_table.max;
props->max_fmr = dev->lk_table.max;
props->max_map_per_fmr = 32767;
-   props->max_pd = hfi1_max_pds;
+   props->max_pd = dev->rdi.dparms.props.max_pd;
props->max_qp_rd_atom = HFI1_MAX_RDMA_ATOMIC;
props->max_qp_init_rd_atom = 255;
/* props->max_res_rd_atom */
@@ -1592,61 +1592,6 @@ static int query_gid(struct ib_device *ibdev, u8 port,
return ret;
 }
 
-static struct ib_pd *alloc_pd(struct ib_device *ibdev,
- struct ib_ucontext *context,
- struct ib_udata *udata)
-{
-   struct hfi1_ibdev *dev = to_idev(ibdev);
-   struct hfi1_pd *pd;
-   struct ib_pd *ret;
-
-   /*
-* This is actually totally arbitrary.  Some correctness tests
-* assume there's a maximum number of PDs that can be allocated.
-* We don't actually have this limit, but we fail the test if
-* we allow allocations of more than we report for this value.
-*/
-
-   pd = kmalloc(sizeof(*pd), GFP_KERNEL);
-   if (!pd) {
-   ret = ERR_PTR(-ENOMEM);
-   goto bail;
-   }
-
-   spin_lock(&dev->n_pds_lock);
-   if (dev->n_pds_allocated == hfi1_max_pds) {
-   spin_unlock(&dev->n_pds_lock);
-   kfree(pd);
-   ret = ERR_PTR(-ENOMEM);
-   goto bail;
-   }
-
-   dev->n_pds_allocated++;
-   

[RFC PATCH 07/15] IB/hfi1: Add device specific info prints

2015-12-14 Thread Dennis Dalessandro
Implement the get_card_name and get_pci_dev helper functions needed
by rdmavt for hfi1.

Reviewed-by: Mike Marciniszyn 
Reviewed-by: Ira Weiny 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/driver.c |   16 
 drivers/staging/rdma/hfi1/hfi.h|2 ++
 drivers/staging/rdma/hfi1/verbs.c  |2 ++
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/driver.c b/drivers/staging/rdma/hfi1/driver.c
index 4c52e78..4f0103b 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -162,6 +162,22 @@ const char *get_unit_name(int unit)
return iname;
 }
 
+const char *get_card_name(struct rvt_dev_info *rdi)
+{
+   struct hfi1_ibdev *ibdev = container_of(rdi, struct hfi1_ibdev, rdi);
+   struct hfi1_devdata *dd = container_of(ibdev,
+  struct hfi1_devdata, verbs_dev);
+   return get_unit_name(dd->unit);
+}
+
+struct pci_dev *get_pci_dev(struct rvt_dev_info *rdi)
+{
+   struct hfi1_ibdev *ibdev = container_of(rdi, struct hfi1_ibdev, rdi);
+   struct hfi1_devdata *dd = container_of(ibdev,
+  struct hfi1_devdata, verbs_dev);
+   return dd->pcidev;
+}
+
 /*
  * Return count of units with at least one port ACTIVE.
  */
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index c4991be..5925deb 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -1604,6 +1604,8 @@ int get_platform_config_field(struct hfi1_devdata *dd,
 dma_addr_t hfi1_map_page(struct pci_dev *, struct page *, unsigned long,
 size_t, int);
 const char *get_unit_name(int unit);
+const char *get_card_name(struct rvt_dev_info *rdi);
+struct pci_dev *get_pci_dev(struct rvt_dev_info *rdi);
 
 /*
  * Flush write combining store buffers (if present) and perform a write
diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
index b810142..1477d00 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -2031,6 +2031,8 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
 * Fill in rvt info object.
 */
dd->verbs_dev.rdi.driver_f.port_callback = hfi1_create_port_files;
+   dd->verbs_dev.rdi.driver_f.get_card_name = get_card_name;
+   dd->verbs_dev.rdi.driver_f.get_pci_dev = get_pci_dev;
dd->verbs_dev.rdi.dparms.props.max_pd = hfi1_max_pds;
dd->verbs_dev.rdi.flags = (RVT_FLAG_MR_INIT_DRIVER |
   RVT_FLAG_QP_INIT_DRIVER |



Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-14 Thread Devesh Sharma
Hello all,

On Sat, Dec 12, 2015 at 5:26 AM, ira.weiny  wrote:
> On Fri, Dec 11, 2015 at 12:25:35PM -0600, Christoph Lameter wrote:
>> Display the additional 64 bit counters available through the extended
>> set and replace the existing 32 bit counters if there is a 64 bit
>> alternative available.
>>
>> Note: This requires universal support of extended counters in
>> the devices. If there are still devices around that do not
>> support extended counters then we will have to add some fallback
>> technique here.
>
> Looks like ocrdma will break here.

Yes, today we report 32-bit counters, and to support this change a
simple patch is needed to replace those cpu_to_be32() calls with
cpu_to_be64(). Internally we already have 64-bit counters.
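
Something along these lines, presumably (a sketch only; the stats
structure and field names below are hypothetical, not the actual
ocrdma code):

    /* before: the 64-bit hardware counter is truncated to 32 bits */
    pma_cnt->port_xmit_data = cpu_to_be32(stats->tx_words);

    /* after: report the full value at the extended-counter offset */
    pma_cnt_ext->port_xmit_data = cpu_to_be64(stats->tx_words);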

>
> I'm not sure about mthca.
>
> qib, mlx4 are fine.  mlx5 should be as well I would think (I don't have that
> hardware.)
>
> hfi1 did not process these MADs previously as all the hardware counters are 64
> bits.  But with this patch series we would add it.
>
> ehca, amso1100, and ipath are all gone so they don't matter.
>
> I can whip up a patch for hfi1 and we have to wait for Doug to take over that
> driver anyway to make sure that the patch would apply.  So I think you can
> ignore it.
>
> ocrdma seems like it could be a quick patch pre this one.
>
> Ira
>
>>
>> Signed-off-by: Christoph Lameter 
>> ---
>>  drivers/infiniband/core/sysfs.c | 16 
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
>> index 0083a4f..f7f2954 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -406,10 +406,14 @@ static PORT_PMA_ATTR(port_rcv_constraint_errors ,  8,  8, 136, IB_PMA_PORT_C
>>  static PORT_PMA_ATTR(local_link_integrity_errors,  9,  4, 152, IB_PMA_PORT_COUNTERS);
>>  static PORT_PMA_ATTR(excessive_buffer_overrun_errors, 10,  4, 156, IB_PMA_PORT_COUNTERS);
>>  static PORT_PMA_ATTR(VL15_dropped, 11, 16, 176, IB_PMA_PORT_COUNTERS);
>> -static PORT_PMA_ATTR(port_xmit_data  , 12, 32, 192, IB_PMA_PORT_COUNTERS);
>> -static PORT_PMA_ATTR(port_rcv_data   , 13, 32, 224, IB_PMA_PORT_COUNTERS);
>> -static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256, IB_PMA_PORT_COUNTERS);
>> -static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288, IB_PMA_PORT_COUNTERS);
>> +static PORT_PMA_ATTR(port_xmit_data  ,  0, 64,  64, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(port_rcv_data   ,  0, 64, 128, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(port_xmit_packets   ,  0, 64, 192, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(port_rcv_packets,  0, 64, 256, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(unicast_xmit_packets,  0, 64, 320, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(unicast_rcv_packets ,  0, 64, 384, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(multicast_xmit_packets  ,  0, 64, 448, IB_PMA_PORT_COUNTERS_EXT);
>> +static PORT_PMA_ATTR(multicast_rcv_packets   ,  0, 64, 512, IB_PMA_PORT_COUNTERS_EXT);
>>
>>  static struct attribute *pma_attrs[] = {
>>   &port_pma_attr_symbol_error.attr.attr,
>> @@ -428,6 +432,10 @@ static struct attribute *pma_attrs[] = {
>>   &port_pma_attr_port_rcv_data.attr.attr,
>>   &port_pma_attr_port_xmit_packets.attr.attr,
>>   &port_pma_attr_port_rcv_packets.attr.attr,
>> + &port_pma_attr_unicast_rcv_packets.attr.attr,
>> + &port_pma_attr_unicast_xmit_packets.attr.attr,
>> + &port_pma_attr_multicast_rcv_packets.attr.attr,
>> + &port_pma_attr_multicast_xmit_packets.attr.attr,
>>   NULL
>>  };
>>
>> --
>> 2.5.0
>>
>>


[RFC PATCH 13/15] IB/hfi1: Remove ibport and use rdmavt version

2015-12-14 Thread Dennis Dalessandro
Remove most of the ibport members from hfi1 and use the rdmavt version.
Also register the port with rdmavt.

Reviewed-by: Mike Marciniszyn 
Reviewed-by: Harish Chegondi 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c|   34 +++
 drivers/staging/rdma/hfi1/driver.c  |2 
 drivers/staging/rdma/hfi1/hfi.h |8 +-
 drivers/staging/rdma/hfi1/mad.c |  152 ---
 drivers/staging/rdma/hfi1/qp.c  |   23 ++---
 drivers/staging/rdma/hfi1/qp.h  |2 
 drivers/staging/rdma/hfi1/rc.c  |   32 +++
 drivers/staging/rdma/hfi1/ruc.c |   14 ++-
 drivers/staging/rdma/hfi1/uc.c  |2 
 drivers/staging/rdma/hfi1/ud.c  |   16 ++-
 drivers/staging/rdma/hfi1/verbs.c   |   61 +++-
 drivers/staging/rdma/hfi1/verbs.h   |   51 +-
 drivers/staging/rdma/hfi1/verbs_mcast.c |   28 +++---
 13 files changed, 198 insertions(+), 227 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index f799b86..6d916d0 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -1543,8 +1543,8 @@ static u64 access_sw_cpu_##cntr(const struct cntr_entry *entry, \
  void *context, int vl, int mode, u64 data)  \
 {\
struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;\
-   return read_write_cpu(ppd->dd, &ppd->ibport_data.z_ ##cntr,   \
- ppd->ibport_data.cntr, vl,  \
+   return read_write_cpu(ppd->dd, &ppd->ibport_data.rvp.z_ ##cntr,   \
+ ppd->ibport_data.rvp.cntr, vl,  \
  mode, data);\
 }
 
@@ -1561,7 +1561,7 @@ static u64 access_ibp_##cntr(const struct cntr_entry *entry,\
if (vl != CNTR_INVALID_VL)\
return 0; \
  \
-   return read_write_sw(ppd->dd, &ppd->ibport_data.n_ ##cntr,\
+   return read_write_sw(ppd->dd, &ppd->ibport_data.rvp.n_ ##cntr,\
 mode, data); \
 }
 
@@ -5947,14 +5947,14 @@ static inline int init_cpu_counters(struct hfi1_devdata *dd)
 
ppd = (struct hfi1_pportdata *)(dd + 1);
for (i = 0; i < dd->num_pports; i++, ppd++) {
-   ppd->ibport_data.rc_acks = NULL;
-   ppd->ibport_data.rc_qacks = NULL;
-   ppd->ibport_data.rc_acks = alloc_percpu(u64);
-   ppd->ibport_data.rc_qacks = alloc_percpu(u64);
-   ppd->ibport_data.rc_delayed_comp = alloc_percpu(u64);
-   if ((ppd->ibport_data.rc_acks == NULL) ||
-   (ppd->ibport_data.rc_delayed_comp == NULL) ||
-   (ppd->ibport_data.rc_qacks == NULL))
+   ppd->ibport_data.rvp.rc_acks = NULL;
+   ppd->ibport_data.rvp.rc_qacks = NULL;
+   ppd->ibport_data.rvp.rc_acks = alloc_percpu(u64);
+   ppd->ibport_data.rvp.rc_qacks = alloc_percpu(u64);
+   ppd->ibport_data.rvp.rc_delayed_comp = alloc_percpu(u64);
+   if (!ppd->ibport_data.rvp.rc_acks ||
+   !ppd->ibport_data.rvp.rc_delayed_comp ||
+   !ppd->ibport_data.rvp.rc_qacks)
return -ENOMEM;
}
 
@@ -8010,14 +8010,14 @@ static void free_cntrs(struct hfi1_devdata *dd)
for (i = 0; i < dd->num_pports; i++, ppd++) {
kfree(ppd->cntrs);
kfree(ppd->scntrs);
-   free_percpu(ppd->ibport_data.rc_acks);
-   free_percpu(ppd->ibport_data.rc_qacks);
-   free_percpu(ppd->ibport_data.rc_delayed_comp);
+   free_percpu(ppd->ibport_data.rvp.rc_acks);
+   free_percpu(ppd->ibport_data.rvp.rc_qacks);
+   free_percpu(ppd->ibport_data.rvp.rc_delayed_comp);
ppd->cntrs = NULL;
ppd->scntrs = NULL;
-   ppd->ibport_data.rc_acks = NULL;
-   ppd->ibport_data.rc_qacks = NULL;
-   ppd->ibport_data.rc_delayed_comp = NULL;
+   ppd->ibport_data.rvp.rc_acks = NULL;
+   ppd->ibport_data.rvp.rc_qacks = NULL;
+   ppd->ibport_data.rvp.rc_delayed_comp = NULL;
}
kfree(dd->portcntrnames);
dd->portcntrnames = NULL;
diff --git a/drivers/staging/rdma/hfi1/driver.c b/drivers/staging/rdma/hfi1/driver.c
index 182e05f..3b17913 100644
--- 

[RFC PATCH 15/15] IB/hfi1: Use rdmavt pkey verbs function

2015-12-14 Thread Dennis Dalessandro
No need to keep providing the query pkey function. This is now being
done in rdmavt. Remove support from hfi1. The allocation and
maintenance of the list still reside in the driver.
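
For reference, the rdmavt-side replacement would look roughly like
this (a sketch only, assuming rdmavt exposes the driver-maintained
table through helpers such as rvt_get_npkeys() and rvt_get_pkey();
the port-number-to-index translation is elided):

    static int rvt_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
                              u16 *pkey)
    {
            struct rvt_dev_info *rdi = ib_to_rvt(ibdev);

            if (index >= rvt_get_npkeys(rdi))
                    return -EINVAL;

            /* The driver still owns the pkey table; rdmavt only reads it. */
            *pkey = rvt_get_pkey(rdi, port, index);
            return 0;
    }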

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
 drivers/staging/rdma/hfi1/verbs.c |   20 +---
 1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
index d208717..e1f249a 100644
--- a/drivers/staging/rdma/hfi1/verbs.c
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -1678,24 +1678,6 @@ unsigned hfi1_get_npkeys(struct hfi1_devdata *dd)
return ARRAY_SIZE(dd->pport[0].pkeys);
 }
 
-static int query_pkey(struct ib_device *ibdev, u8 port, u16 index,
- u16 *pkey)
-{
-   struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
-   int ret;
-
-   if (index >= hfi1_get_npkeys(dd)) {
-   ret = -EINVAL;
-   goto bail;
-   }
-
-   *pkey = hfi1_get_pkey(to_iport(ibdev, port), index);
-   ret = 0;
-
-bail:
-   return ret;
-}
-
 /**
  * alloc_ucontext - allocate a ucontest
  * @ibdev: the infiniband device
@@ -1863,7 +1845,7 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
ibdev->modify_device = modify_device;
ibdev->query_port = query_port;
ibdev->modify_port = modify_port;
-   ibdev->query_pkey = query_pkey;
+   ibdev->query_pkey = NULL;
ibdev->query_gid = query_gid;
ibdev->alloc_ucontext = alloc_ucontext;
ibdev->dealloc_ucontext = dealloc_ucontext;



Re: [PATCH 37/37] IB/rdmavt: Add support for new memory registration API

2015-12-14 Thread Dennis Dalessandro

On Mon, Dec 14, 2015 at 06:18:48PM +0200, Sagi Grimberg wrote:

This question is not directly related to this patch, but given that
this is a copy-paste from the qib driver I'll go ahead and take it
anyway. How does qib (and rvt now) do memory key invalidation? I didn't
see any reference to IB_WR_LOCAL_INV anywhere in the qib driver...

What am I missing?


ping?


In short, it doesn't look like qib or hfi1 support this.

That doesn't mean it can't be added to rdmavt as a future enhancement though 
if there is a need. Are you asking because soft-roce will need it?
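
For context, what a ULP posts for this is the standard ib_verbs work
request below (a sketch; post_local_inv() is a hypothetical helper).
Supporting it in rdmavt would mean recognizing the opcode in the
post_send path and invalidating the MR's key:

    static int post_local_inv(struct ib_qp *qp, u32 rkey)
    {
            struct ib_send_wr wr, *bad_wr;

            memset(&wr, 0, sizeof(wr));
            wr.opcode = IB_WR_LOCAL_INV;
            wr.send_flags = IB_SEND_SIGNALED;
            wr.ex.invalidate_rkey = rkey;

            return ib_post_send(qp, &wr, &bad_wr);
    }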


-Denny


Re: [PATCH 2/3] staging/rdma/hfi1: check return value of kcalloc

2015-12-14 Thread Nicholas Mc Guire
On Mon, Dec 14, 2015 at 03:21:24PM +, Marciniszyn, Mike wrote:
> > @@ -10129,6 +10129,9 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
> > if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
> > goto bail;
> > rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> > +   if (!rsmmap)
> > +   goto bail;
> > +
> 
> I checked out a linux-next remote at the next-20151214 tag.
> 
> The allocation method is clearly kmalloc_array() not kcalloc().
> 
> Where are you seeing the kcalloc()?
> 
> While it is tempting to allocate and zero, there is a chip rev specific 
> difference.
>
x = kmalloc_array(...)
if(!x)
   ...
memset(x...)

should be equivalent to

kcalloc - include/linux/slab.h

static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
{
return kmalloc_array(n, size, flags | __GFP_ZERO);
}

If the assumption that this is equivalent is wrong, I apologize;
the intent was simply API consolidation, as the patch description
stated.

thx!
hofrta


Re: [PATCH 1/3] staging/rdma/hfi1: consolidate kmalloc_array+memset into kcalloc

2015-12-14 Thread Nicholas Mc Guire
On Mon, Dec 14, 2015 at 03:28:46PM +, Marciniszyn, Mike wrote:
> > --- a/drivers/staging/rdma/hfi1/chip.c
> > +++ b/drivers/staging/rdma/hfi1/chip.c
> > @@ -10128,8 +10128,7 @@ static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
> > goto bail;
> > if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
> > goto bail;
> > -   rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> > -   memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));
> > +   rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
> > /* init the local copy of the table */
> > for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
> > unsigned tctxt;
> > --
> 
> I'm NAKing this.
> 
> There is a chip specific difference that accounts for the current code.
>
I obviously made a real mess here.
I incorrectly concluded that rxcontext is 0, which it is not in some cases.
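
For the record, the two forms only match when the fill byte is zero:

    /* original: every byte set to rxcontext, which may be non-zero */
    rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
    memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));

    /* kcalloc(): every byte set to 0 -- a different table whenever
     * rxcontext != 0
     */
    rsmmap = kcalloc(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);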

sorry for the noise.

thx!
hofrat





[PATCH v4 04/11] svcrdma: Improve allocation of struct svc_rdma_op_ctxt

2015-12-14 Thread Chuck Lever
When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d915c ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h  |6 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  102 ++
 2 files changed, 94 insertions(+), 14 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index f869807..be2804b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -69,6 +69,7 @@ extern atomic_t rdma_stat_sq_prod;
  * completes.
  */
 struct svc_rdma_op_ctxt {
+   struct list_head free;
struct svc_rdma_op_ctxt *read_hdr;
struct svc_rdma_fastreg_mr *frmr;
int hdr_count;
@@ -141,7 +142,10 @@ struct svcxprt_rdma {
struct ib_pd *sc_pd;
 
atomic_t sc_dma_used;
-   atomic_t sc_ctxt_used;
+   spinlock_t   sc_ctxt_lock;
+   struct list_head sc_ctxts;
+   int  sc_ctxt_used;
+
struct list_head sc_rq_dto_q;
spinlock_t   sc_rq_dto_lock;
struct ib_qp *sc_qp;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 0783f6e..58ed9f2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -153,18 +153,76 @@ static void svc_rdma_bc_free(struct svc_xprt *xprt)
 }
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
-struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
+static struct svc_rdma_op_ctxt *alloc_ctxt(struct svcxprt_rdma *xprt,
+  gfp_t flags)
 {
struct svc_rdma_op_ctxt *ctxt;
 
-   ctxt = kmem_cache_alloc(svc_rdma_ctxt_cachep,
-   GFP_KERNEL | __GFP_NOFAIL);
-   ctxt->xprt = xprt;
-   INIT_LIST_HEAD(&ctxt->dto_q);
+   ctxt = kmalloc(sizeof(*ctxt), flags);
+   if (ctxt) {
+   ctxt->xprt = xprt;
+   INIT_LIST_HEAD(&ctxt->free);
+   INIT_LIST_HEAD(&ctxt->dto_q);
+   }
+   return ctxt;
+}
+
+static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
+{
+   int i;
+
+   /* Each RPC/RDMA credit can consume a number of send
+* and receive WQEs. One ctxt is allocated for each.
+*/
+   i = xprt->sc_sq_depth + xprt->sc_max_requests;
+
+   while (i--) {
+   struct svc_rdma_op_ctxt *ctxt;
+
+   ctxt = alloc_ctxt(xprt, GFP_KERNEL);
+   if (!ctxt) {
+   dprintk("svcrdma: No memory for RDMA ctxt\n");
+   return false;
+   }
+   list_add(&ctxt->free, &xprt->sc_ctxts);
+   }
+   return true;
+}
+
+struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
+{
+   struct svc_rdma_op_ctxt *ctxt = NULL;
+
+   spin_lock_bh(&xprt->sc_ctxt_lock);
+   xprt->sc_ctxt_used++;
+   if (list_empty(&xprt->sc_ctxts))
+   goto out_empty;
+
+   ctxt = list_first_entry(&xprt->sc_ctxts,
+   struct svc_rdma_op_ctxt, free);
+   list_del_init(&ctxt->free);
+   spin_unlock_bh(&xprt->sc_ctxt_lock);
+
+out:
ctxt->count = 0;
ctxt->frmr = NULL;
-   atomic_inc(&xprt->sc_ctxt_used);
return ctxt;
+
+out_empty:
+   /* Either pre-allocation missed the mark, or send
+* queue accounting is broken.
+*/
+   spin_unlock_bh(&xprt->sc_ctxt_lock);
+
+   ctxt = alloc_ctxt(xprt, GFP_NOIO);
+   if (ctxt)
+   goto out;
+
+   spin_lock_bh(&xprt->sc_ctxt_lock);
+   xprt->sc_ctxt_used--;
+   spin_unlock_bh(&xprt->sc_ctxt_lock);
+   WARN_ON_ONCE("svcrdma: empty RDMA ctxt list?\n");
+   return NULL;
 }
 
 void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
@@ -190,16 +248,29 @@ void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
 
 void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
 {
-   struct svcxprt_rdma *xprt;
+   struct svcxprt_rdma *xprt = ctxt->xprt;
int i;
 
-   xprt = ctxt->xprt;
if (free_pages)
for (i = 0; i < ctxt->count; i++)
put_page(ctxt->pages[i]);
 
-   kmem_cache_free(svc_rdma_ctxt_cachep, ctxt);
-   atomic_dec(&xprt->sc_ctxt_used);
+   spin_lock_bh(&xprt->sc_ctxt_lock);
+   xprt->sc_ctxt_used--;
+   

[PATCH v4 02/11] svcrdma: Clean up rdma_create_xprt()

2015-12-14 Thread Chuck Lever
kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
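
That is, after

    cma_xprt = kzalloc(sizeof(*cma_xprt), GFP_KERNEL);

every field is already zero, so stores like

    atomic_set(&cma_xprt->sc_sq_count, 0);  /* redundant on zeroed memory */

can simply be dropped.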

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 9f3eb89..27f338a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -529,14 +529,6 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
spin_lock_init(&cma_xprt->sc_rq_dto_lock);
spin_lock_init(&cma_xprt->sc_frmr_q_lock);
 
-   cma_xprt->sc_ord = svcrdma_ord;
-
-   cma_xprt->sc_max_req_size = svcrdma_max_req_size;
-   cma_xprt->sc_max_requests = svcrdma_max_requests;
-   cma_xprt->sc_sq_depth = svcrdma_max_requests * RPCRDMA_SQ_DEPTH_MULT;
-   atomic_set(&cma_xprt->sc_sq_count, 0);
-   atomic_set(&cma_xprt->sc_ctxt_used, 0);
-
if (listener)
set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
 
@@ -918,6 +910,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
  (size_t)RPCSVC_MAXPAGES);
newxprt->sc_max_sge_rd = min_t(size_t, dev->max_sge_rd,
   RPCSVC_MAXPAGES);
+   newxprt->sc_max_req_size = svcrdma_max_req_size;
newxprt->sc_max_requests = min((size_t)dev->max_qp_wr,
   (size_t)svcrdma_max_requests);
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;



[PATCH v4 06/11] svcrdma: Remove unused req_map and ctxt kmem_caches

2015-12-14 Thread Chuck Lever
Clean up.

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h |1 +
 net/sunrpc/xprtrdma/svc_rdma.c  |   35 ---
 net/sunrpc/xprtrdma/xprt_rdma.h |7 ---
 3 files changed, 1 insertion(+), 42 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 05bf4fe..141edbb 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -242,6 +242,7 @@ extern struct svc_xprt_class svc_rdma_bc_class;
 #endif
 
 /* svc_rdma.c */
+extern struct workqueue_struct *svc_rdma_wq;
 extern int svc_rdma_init(void);
 extern void svc_rdma_cleanup(void);
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index 1b7051b..e894e06 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -71,10 +71,6 @@ atomic_t rdma_stat_rq_prod;
 atomic_t rdma_stat_sq_poll;
 atomic_t rdma_stat_sq_prod;
 
-/* Temporary NFS request map and context caches */
-struct kmem_cache *svc_rdma_map_cachep;
-struct kmem_cache *svc_rdma_ctxt_cachep;
-
 struct workqueue_struct *svc_rdma_wq;
 
 /*
@@ -243,8 +239,6 @@ void svc_rdma_cleanup(void)
svc_unreg_xprt_class(&svc_rdma_bc_class);
 #endif
svc_unreg_xprt_class(&svc_rdma_class);
-   kmem_cache_destroy(svc_rdma_map_cachep);
-   kmem_cache_destroy(svc_rdma_ctxt_cachep);
 }
 
 int svc_rdma_init(void)
@@ -264,39 +258,10 @@ int svc_rdma_init(void)
svcrdma_table_header =
register_sysctl_table(svcrdma_root_table);
 
-   /* Create the temporary map cache */
-   svc_rdma_map_cachep = kmem_cache_create("svc_rdma_map_cache",
-   sizeof(struct svc_rdma_req_map),
-   0,
-   SLAB_HWCACHE_ALIGN,
-   NULL);
-   if (!svc_rdma_map_cachep) {
-   printk(KERN_INFO "Could not allocate map cache.\n");
-   goto err0;
-   }
-
-   /* Create the temporary context cache */
-   svc_rdma_ctxt_cachep =
-   kmem_cache_create("svc_rdma_ctxt_cache",
- sizeof(struct svc_rdma_op_ctxt),
- 0,
- SLAB_HWCACHE_ALIGN,
- NULL);
-   if (!svc_rdma_ctxt_cachep) {
-   printk(KERN_INFO "Could not allocate WR ctxt cache.\n");
-   goto err1;
-   }
-
/* Register RDMA with the SVC transport switch */
svc_reg_xprt_class(&svc_rdma_class);
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
svc_reg_xprt_class(&svc_rdma_bc_class);
 #endif
return 0;
- err1:
-   kmem_cache_destroy(svc_rdma_map_cachep);
- err0:
-   unregister_sysctl_table(svcrdma_table_header);
-   destroy_workqueue(svc_rdma_wq);
-   return -ENOMEM;
 }
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 4197191..72276c7 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -528,11 +528,4 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *);
 void xprt_rdma_bc_destroy(struct rpc_xprt *, unsigned int);
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
-/* Temporary NFS request map cache. Created in svc_rdma.c  */
-extern struct kmem_cache *svc_rdma_map_cachep;
-/* WR context cache. Created in svc_rdma.c  */
-extern struct kmem_cache *svc_rdma_ctxt_cachep;
-/* Workqueue created in svc_rdma.c */
-extern struct workqueue_struct *svc_rdma_wq;
-
 #endif /* _LINUX_SUNRPC_XPRT_RDMA_H */



[PATCH v4 05/11] svcrdma: Improve allocation of struct svc_rdma_req_map

2015-12-14 Thread Chuck Lever
To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h  |8 ++-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|6 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   85 ++
 3 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index be2804b..05bf4fe 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -113,6 +113,7 @@ struct svc_rdma_fastreg_mr {
struct list_head frmr_list;
 };
 struct svc_rdma_req_map {
+   struct list_head free;
unsigned long count;
union {
struct kvec sge[RPCSVC_MAXPAGES];
@@ -145,6 +146,8 @@ struct svcxprt_rdma {
spinlock_t   sc_ctxt_lock;
struct list_head sc_ctxts;
int  sc_ctxt_used;
+   spinlock_t   sc_map_lock;
+   struct list_head sc_maps;
 
struct list_head sc_rq_dto_q;
spinlock_t   sc_rq_dto_lock;
@@ -223,8 +226,9 @@ extern int svc_rdma_create_listen(struct svc_serv *, int, struct sockaddr *);
 extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
 extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
 extern void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt);
-extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
-extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
+extern struct svc_rdma_req_map *svc_rdma_get_req_map(struct svcxprt_rdma *);
+extern void svc_rdma_put_req_map(struct svcxprt_rdma *,
+struct svc_rdma_req_map *);
 extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
 extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
  struct svc_rdma_fastreg_mr *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index bad5eaa..b75566c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -598,7 +598,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
/* Build an req vec for the XDR */
ctxt = svc_rdma_get_context(rdma);
ctxt->direction = DMA_TO_DEVICE;
-   vec = svc_rdma_get_req_map();
+   vec = svc_rdma_get_req_map(rdma);
ret = map_xdr(rdma, >rq_res, vec);
if (ret)
goto err0;
@@ -637,14 +637,14 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 
ret = send_reply(rdma, rqstp, res_page, rdma_resp, ctxt, vec,
 inline_bytes);
-   svc_rdma_put_req_map(vec);
+   svc_rdma_put_req_map(rdma, vec);
dprintk("svcrdma: send_reply returns %d\n", ret);
return ret;
 
  err1:
put_page(res_page);
  err0:
-   svc_rdma_put_req_map(vec);
+   svc_rdma_put_req_map(rdma, vec);
svc_rdma_put_context(ctxt, 0);
return ret;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 58ed9f2..ec10ae3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -273,23 +273,83 @@ static void svc_rdma_destroy_ctxts(struct svcxprt_rdma *xprt)
}
 }
 
-/*
- * Temporary NFS req mappings are shared across all transport
- * instances. These are short lived and should be bounded by the number
- * of concurrent server threads * depth of the SQ.
- */
-struct svc_rdma_req_map *svc_rdma_get_req_map(void)
+static struct svc_rdma_req_map *alloc_req_map(gfp_t flags)
 {
struct svc_rdma_req_map *map;
-   map = kmem_cache_alloc(svc_rdma_map_cachep,
-  GFP_KERNEL | __GFP_NOFAIL);
+
+   map = kmalloc(sizeof(*map), flags);
+   if (map)
+   INIT_LIST_HEAD(&map->free);
+   return map;
+}
+
+static bool svc_rdma_prealloc_maps(struct svcxprt_rdma *xprt)
+{
+   int i;
+
+   /* One for each receive buffer on this connection. */
+   i = xprt->sc_max_requests;
+
+   while (i--) {
+   struct svc_rdma_req_map *map;
+
+   map = alloc_req_map(GFP_KERNEL);
+   if (!map) {
+   dprintk("svcrdma: No memory for request map\n");
+   return false;
+   }
+   list_add(&map->free, &xprt->sc_maps);
+   }
+   return true;
+}
+
+struct svc_rdma_req_map *svc_rdma_get_req_map(struct svcxprt_rdma *xprt)
+{
+   struct svc_rdma_req_map *map = NULL;
+
+   spin_lock(&xprt->sc_map_lock);
+   if (list_empty(&xprt->sc_maps))
+   goto out_empty;
+
+   map = list_first_entry(&xprt->sc_maps,
+  struct svc_rdma_req_map, free);
+   list_del_init(&map->free);
+   spin_unlock(&xprt->sc_map_lock);
+
+out:
map->count = 0;
return map;
+
+out_empty:
+   

[PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk

2015-12-14 Thread Chuck Lever
Minor optimization: when dealing with write chunk XDR roundup, do
not post a Write WR for the zero bytes in the pad. Simply update
the write segment in the RPC-over-RDMA header to reflect the extra
pad bytes.

The Reply chunk is also a write chunk, but the server does not use
send_write_chunks() to send the Reply chunk. That's OK in this case:
the server Upper Layer typically marshals the Reply chunk contents
in a single contiguous buffer, without a separate tail for the XDR
pad.

The comments and the variable naming refer to "chunks" but what is
really meant is "segments." The existing code sends only one
xdr_write_chunk per RPC reply.

The fix assumes this as well. When the XDR pad in the first write
chunk is reached, the assumption is the Write list is complete and
send_write_chunks() returns.

That will remain a valid assumption until the server Upper Layer can
support multiple bulk payload results per RPC.
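
To illustrate the roundup arithmetic (a sketch; xdr_pad_size() is a
hypothetical helper, not part of the svcrdma code):

    /* XDR encodes in 4-byte units; a chunk carrying len payload bytes
     * is followed by 0-3 zero bytes of pad.
     */
    static inline size_t xdr_pad_size(size_t len)
    {
            return (4 - (len & 3)) & 3;
    }

For example, a 13-byte result occupies 16 XDR bytes: 13 data plus 3
pad. With this patch the server no longer posts an RDMA Write for
those 3 pad bytes; it only accounts for them in the reply's write
segment length.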

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 969a1ab..bad5eaa 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
arg_ch->rs_handle,
arg_ch->rs_offset,
write_len);
+
+   /* Do not send XDR pad bytes */
+   if (chunk_no && write_len < 4) {
+   chunk_no++;
+   break;
+   }
+
chunk_off = 0;
while (write_len) {
ret = send_write(xprt, rqstp,



[PATCH v4 00/11] NFS/RDMA server patches for v4.5

2015-12-14 Thread Chuck Lever
Here are patches to support server-side bi-directional RPC/RDMA
operation (to enable NFSv4.1 on RPC/RDMA transports). Thanks to
all who reviewed v1, v2, and v3. This version has some significant
changes since the previous one.

In preparation for Doug's final topic branch, Bruce, I've rebased
these on Christoph's ib_device_attr branch. There were some merge
conflicts which I've fixed and tested. These are ready for your
review.

Also available in the "nfsd-rdma-for-4.5" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.5


Changes since v3:
- Rebased on Christoph's ib_device_attr branch
- Backchannel patches have been squashed together
- Memory allocation overhaul to prevent blocking allocation
  when sending backchannel calls


Changes since v2:
- Rebased on v4.4-rc4
- Backchannel code in new source file to address dprintk issues
- svc_rdma_get_context() now uses a pre-allocated cache
- Dropped svc_rdma_send clean up


Changes since v1:

- Rebased on v4.4-rc3
- Removed the use of CONFIG_SUNRPC_BACKCHANNEL
- Fixed computation of forward and backward max_requests
- Updated some comments and patch descriptions
- pr_err and pr_info converted to dprintk
- Simplified svc_rdma_get_context()
- Dropped patch removing access_flags field
- NFSv4.1 callbacks tested with for-4.5 client

---

Chuck Lever (11):
  svcrdma: Do not send XDR roundup bytes for a write chunk
  svcrdma: Clean up rdma_create_xprt()
  svcrdma: Clean up process_context()
  svcrdma: Improve allocation of struct svc_rdma_op_ctxt
  svcrdma: Improve allocation of struct svc_rdma_req_map
  svcrdma: Remove unused req_map and ctxt kmem_caches
  svcrdma: Add gfp flags to svc_rdma_post_recv()
  svcrdma: Remove last two __GFP_NOFAIL call sites
  svcrdma: Make map_xdr non-static
  svcrdma: Define maximum number of backchannel requests
  svcrdma: Add class for RDMA backwards direction transport


 include/linux/sunrpc/svc_rdma.h|   37 ++-
 net/sunrpc/xprt.c  |1 
 net/sunrpc/xprtrdma/Makefile   |2 
 net/sunrpc/xprtrdma/svc_rdma.c |   41 ---
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c|   52 
 net/sunrpc/xprtrdma/svc_rdma_sendto.c  |   34 ++-
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |  284 -
 net/sunrpc/xprtrdma/transport.c|   30 +-
 net/sunrpc/xprtrdma/xprt_rdma.h|   20 +-
 10 files changed, 730 insertions(+), 142 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c



[PATCH v4 03/11] svcrdma: Clean up process_context()

2015-12-14 Thread Chuck Lever
Be sure the completed ctxt is put in every path.

The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.

Remove/disable debugging.

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   44 ++
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 27f338a..0783f6e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -386,46 +386,44 @@ static void rq_cq_reap(struct svcxprt_rdma *xprt)
 static void process_context(struct svcxprt_rdma *xprt,
struct svc_rdma_op_ctxt *ctxt)
 {
+   struct svc_rdma_op_ctxt *read_hdr;
+   int free_pages = 0;
+
svc_rdma_unmap_dma(ctxt);
 
switch (ctxt->wr_op) {
case IB_WR_SEND:
-   if (ctxt->frmr)
-   pr_err("svcrdma: SEND: ctxt->frmr != NULL\n");
-   svc_rdma_put_context(ctxt, 1);
+   free_pages = 1;
break;
 
case IB_WR_RDMA_WRITE:
-   if (ctxt->frmr)
-   pr_err("svcrdma: WRITE: ctxt->frmr != NULL\n");
-   svc_rdma_put_context(ctxt, 0);
break;
 
case IB_WR_RDMA_READ:
case IB_WR_RDMA_READ_WITH_INV:
svc_rdma_put_frmr(xprt, ctxt->frmr);
-   if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
-   struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
-   if (read_hdr) {
-   spin_lock_bh(&xprt->sc_rq_dto_lock);
-   set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
-   list_add_tail(&read_hdr->dto_q,
- &xprt->sc_read_complete_q);
-   spin_unlock_bh(&xprt->sc_rq_dto_lock);
-   } else {
-   pr_err("svcrdma: ctxt->read_hdr == NULL\n");
-   }
-   svc_xprt_enqueue(&xprt->sc_xprt);
-   }
+
+   if (!test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags))
+   break;
+
+   read_hdr = ctxt->read_hdr;
svc_rdma_put_context(ctxt, 0);
-   break;
+
+   spin_lock_bh(&xprt->sc_rq_dto_lock);
+   set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
+   list_add_tail(&read_hdr->dto_q,
+ &xprt->sc_read_complete_q);
+   spin_unlock_bh(&xprt->sc_rq_dto_lock);
+   svc_xprt_enqueue(&xprt->sc_xprt);
+   return;
 
default:
-   printk(KERN_ERR "svcrdma: unexpected completion type, "
-  "opcode=%d\n",
-  ctxt->wr_op);
+   dprintk("svcrdma: unexpected completion opcode=%d\n",
+   ctxt->wr_op);
break;
}
+
+   svc_rdma_put_context(ctxt, free_pages);
 }
 
 /*



[PATCH v4 11/11] svcrdma: Add class for RDMA backwards direction transport

2015-12-14 Thread Chuck Lever
To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h|5 
 net/sunrpc/xprt.c  |1 
 net/sunrpc/xprtrdma/Makefile   |2 
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c|   52 
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   14 +
 net/sunrpc/xprtrdma/transport.c|   30 +-
 net/sunrpc/xprtrdma/xprt_rdma.h|   15 +
 8 files changed, 475 insertions(+), 15 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 9a2c418..b13513a 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -195,6 +195,11 @@ struct svcxprt_rdma {
 
 #define RPCSVC_MAXPAYLOAD_RDMA RPCSVC_MAXPAYLOAD
 
+/* svc_rdma_backchannel.c */
+extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
+   struct rpcrdma_msg *rmsgp,
+   struct xdr_buf *rcvbuf);
+
 /* svc_rdma_marshal.c */
 extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
 extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 2e98f4a..37edea6 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1425,3 +1425,4 @@ void xprt_put(struct rpc_xprt *xprt)
if (atomic_dec_and_test(>count))
xprt_destroy(xprt);
 }
+EXPORT_SYMBOL_GPL(xprt_put);
diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index 33f99d3..dc9f3b5 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_SUNRPC_XPRT_RDMA) += rpcrdma.o
 
 rpcrdma-y := transport.o rpc_rdma.o verbs.o \
fmr_ops.o frwr_ops.o physical_ops.o \
-   svc_rdma.o svc_rdma_transport.o \
+   svc_rdma.o svc_rdma_backchannel.o svc_rdma_transport.o \
svc_rdma_marshal.o svc_rdma_sendto.o svc_rdma_recvfrom.o \
module.o
 rpcrdma-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel.o
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
new file mode 100644
index 000..417cec1
--- /dev/null
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -0,0 +1,371 @@
+/*
+ * Copyright (c) 2015 Oracle.  All rights reserved.
+ *
+ * Support for backward direction RPCs on RPC/RDMA (server-side).
+ */
+
+#include <linux/sunrpc/svc_rdma.h>
+#include "xprt_rdma.h"
+
+#define RPCDBG_FACILITYRPCDBG_SVCXPRT
+
+#undef SVCRDMA_BACKCHANNEL_DEBUG
+
+int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, struct rpcrdma_msg *rmsgp,
+struct xdr_buf *rcvbuf)
+{
+   struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+   struct kvec *dst, *src = &rcvbuf->head[0];
+   struct rpc_rqst *req;
+   unsigned long cwnd;
+   u32 credits;
+   size_t len;
+   __be32 xid;
+   __be32 *p;
+   int ret;
+
+   p = (__be32 *)src->iov_base;
+   len = src->iov_len;
+   xid = rmsgp->rm_xid;
+
+#ifdef SVCRDMA_BACKCHANNEL_DEBUG
+   pr_info("%s: xid=%08x, length=%zu\n",
+   __func__, be32_to_cpu(xid), len);
+   pr_info("%s: RPC/RDMA: %*ph\n",
+   __func__, (int)RPCRDMA_HDRLEN_MIN, rmsgp);
+   pr_info("%s:  RPC: %*ph\n",
+   __func__, (int)len, p);
+#endif
+
+   ret = -EAGAIN;
+   if (src->iov_len < 24)
+   goto out_shortreply;
+
+   spin_lock_bh(&xprt->transport_lock);
+   req = xprt_lookup_rqst(xprt, xid);
+   if (!req)
+   goto out_notfound;
+
+   dst = &req->rq_private_buf.head[0];
+   memcpy(&req->rq_private_buf, &req->rq_rcv_buf, sizeof(struct xdr_buf));
+   if (dst->iov_len < len)
+   goto out_unlock;
+   memcpy(dst->iov_base, p, len);
+
+   credits = be32_to_cpu(rmsgp->rm_credit);
+   if (credits == 0)
+   credits = 1;/* don't deadlock */
+   else if (credits > r_xprt->rx_buf.rb_bc_max_requests)
+   credits = r_xprt->rx_buf.rb_bc_max_requests;
+
+   cwnd = xprt->cwnd;
+   xprt->cwnd = credits << RPC_CWNDSHIFT;
+   if (xprt->cwnd > cwnd)
+   xprt_release_rqst_cong(req->rq_task);
+
+   ret = 0;
+   xprt_complete_rqst(req->rq_task, rcvbuf->len);
+   rcvbuf->len = 0;
+
+out_unlock:
+   spin_unlock_bh(&xprt->transport_lock);
+out:
+   return ret;
+
+out_shortreply:
+   dprintk("svcrdma: short bc reply: xprt=%p, len=%zu\n",
+   xprt, src->iov_len);
+   goto out;
+
+out_notfound:
+   dprintk("svcrdma: unrecognized bc reply: xprt=%p, xid=%08x\n",
+   xprt, be32_to_cpu(xid));
+
+   goto out_unlock;
+}
+
+/* Send a backwards 

[PATCH v4 10/11] svcrdma: Define maximum number of backchannel requests

2015-12-14 Thread Chuck Lever
Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.

The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
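
A sketch of the pitfall being avoided (a signed local compared
against a u32 field; do_something() is a placeholder):

    int i = -1;

    if (i < xprt->sc_max_requests)  /* i converts to UINT_MAX here */
            do_something();         /* never runs */

Keeping the loop counters and protocol fields uniformly unsigned
avoids such silent conversions.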

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h  |   13 ++---
 net/sunrpc/xprtrdma/svc_rdma.c   |6 --
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   24 ++--
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index aeffa30..9a2c418 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -51,6 +51,7 @@
 /* RPC/RDMA parameters and stats */
 extern unsigned int svcrdma_ord;
 extern unsigned int svcrdma_max_requests;
+extern unsigned int svcrdma_max_bc_requests;
 extern unsigned int svcrdma_max_req_size;
 
 extern atomic_t rdma_stat_recv;
@@ -134,10 +135,11 @@ struct svcxprt_rdma {
int  sc_max_sge;
int  sc_max_sge_rd; /* max sge for read target */
 
-   int  sc_sq_depth;   /* Depth of SQ */
atomic_t sc_sq_count;   /* Number of SQ WR on queue */
-
-   int  sc_max_requests;   /* Depth of RQ */
+   unsigned int sc_sq_depth;   /* Depth of SQ */
+   unsigned int sc_rq_depth;   /* Depth of RQ */
+   u32  sc_max_requests;   /* Forward credits */
+   u32  sc_max_bc_requests;/* Backward credits */
int  sc_max_req_size;   /* Size of each RQ WR buf */
 
struct ib_pd *sc_pd;
@@ -186,6 +188,11 @@ struct svcxprt_rdma {
 #define RPCRDMA_MAX_REQUESTS32
 #define RPCRDMA_MAX_REQ_SIZE4096
 
+/* Typical ULP usage of BC requests is NFSv4.1 backchannel. Our
+ * current NFSv4.1 implementation supports one backchannel slot.
+ */
+#define RPCRDMA_MAX_BC_REQUESTS2
+
 #define RPCSVC_MAXPAYLOAD_RDMA RPCSVC_MAXPAYLOAD
 
 /* svc_rdma_marshal.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index e894e06..c846ca9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -55,6 +55,7 @@ unsigned int svcrdma_ord = RPCRDMA_ORD;
 static unsigned int min_ord = 1;
 static unsigned int max_ord = 4096;
 unsigned int svcrdma_max_requests = RPCRDMA_MAX_REQUESTS;
+unsigned int svcrdma_max_bc_requests = RPCRDMA_MAX_BC_REQUESTS;
 static unsigned int min_max_requests = 4;
 static unsigned int max_max_requests = 16384;
 unsigned int svcrdma_max_req_size = RPCRDMA_MAX_REQ_SIZE;
@@ -245,9 +246,10 @@ int svc_rdma_init(void)
 {
dprintk("SVCRDMA Module Init, register RPC RDMA transport\n");
dprintk("\tsvcrdma_ord  : %d\n", svcrdma_ord);
-   dprintk("\tmax_requests : %d\n", svcrdma_max_requests);
-   dprintk("\tsq_depth : %d\n",
+   dprintk("\tmax_requests : %u\n", svcrdma_max_requests);
+   dprintk("\tsq_depth : %u\n",
svcrdma_max_requests * RPCRDMA_SQ_DEPTH_MULT);
+   dprintk("\tmax_bc_requests  : %u\n", svcrdma_max_bc_requests);
dprintk("\tmax_inline   : %d\n", svcrdma_max_req_size);
 
svc_rdma_wq = alloc_workqueue("svc_rdma", 0, 0);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 694ade4..35326a3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -169,12 +169,12 @@ static struct svc_rdma_op_ctxt *alloc_ctxt(struct svcxprt_rdma *xprt,
 
 static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
 {
-   int i;
+   unsigned int i;
 
/* Each RPC/RDMA credit can consume a number of send
 * and receive WQEs. One ctxt is allocated for each.
 */
-   i = xprt->sc_sq_depth + xprt->sc_max_requests;
+   i = xprt->sc_sq_depth + xprt->sc_rq_depth;
 
while (i--) {
struct svc_rdma_op_ctxt *ctxt;
@@ -285,7 +285,7 @@ static struct svc_rdma_req_map *alloc_req_map(gfp_t flags)
 
 static bool svc_rdma_prealloc_maps(struct svcxprt_rdma *xprt)
 {
-   int i;
+   unsigned int i;
 
/* One for each receive buffer on this connection. */
i = xprt->sc_max_requests;
@@ -1016,8 +1016,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
struct ib_device *dev;
int uninitialized_var(dma_mr_acc);
int need_dma_mr = 0;
+   unsigned int i;
int ret = 0;
-   int i;
 
listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
clear_bit(XPT_CONN, >xpt_flags);
@@ -1046,9 +1046,13 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
  

[PATCH v4 07/11] svcrdma: Add gfp flags to svc_rdma_post_recv()

2015-12-14 Thread Chuck Lever
svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.
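
To illustrate the pattern (a minimal sketch only; the helper name and
body below are hypothetical, not the patch itself):

/* Sketch: push the blocking policy up to the caller. Callers in
 * process context pass GFP_KERNEL; callers that must not sleep
 * pass a non-sleeping flag and handle -ENOMEM.
 */
static int example_post_recv(struct svcxprt_rdma *xprt, gfp_t flags)
{
	struct page *page = alloc_page(flags);

	if (!page)
		return -ENOMEM;	/* every call site tolerates this */

	/* ... map "page" and post the receive WR as usual ... */
	return 0;
}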

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h  |2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |8 +---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 141edbb..729ff35 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -221,7 +221,7 @@ extern struct rpcrdma_read_chunk *
 extern int svc_rdma_send(struct svcxprt_rdma *, struct ib_send_wr *);
 extern void svc_rdma_send_error(struct svcxprt_rdma *, struct rpcrdma_msg *,
enum rpcrdma_errcode);
-extern int svc_rdma_post_recv(struct svcxprt_rdma *);
+extern int svc_rdma_post_recv(struct svcxprt_rdma *, gfp_t);
 extern int svc_rdma_create_listen(struct svc_serv *, int, struct sockaddr *);
 extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
 extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index b75566c..2d3d7a4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -472,7 +472,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
int ret;
 
/* Post a recv buffer to handle another request. */
-   ret = svc_rdma_post_recv(rdma);
+   ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
if (ret) {
printk(KERN_INFO
   "svcrdma: could not post a receive buffer, err=%d."
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index ec10ae3..14b692d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -668,7 +668,7 @@ static struct svcxprt_rdma *rdma_create_xprt(struct 
svc_serv *serv,
return cma_xprt;
 }
 
-int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
+int svc_rdma_post_recv(struct svcxprt_rdma *xprt, gfp_t flags)
 {
struct ib_recv_wr recv_wr, *bad_recv_wr;
struct svc_rdma_op_ctxt *ctxt;
@@ -686,7 +686,9 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
pr_err("svcrdma: Too many sges (%d)\n", sge_no);
goto err_put_ctxt;
}
-   page = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+   page = alloc_page(flags);
+   if (!page)
+   goto err_put_ctxt;
ctxt->pages[sge_no] = page;
pa = ib_dma_map_page(xprt->sc_cm_id->device,
 page, 0, PAGE_SIZE,
@@ -1182,7 +1184,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
 
/* Post receive buffers */
for (i = 0; i < newxprt->sc_max_requests; i++) {
-   ret = svc_rdma_post_recv(newxprt);
+   ret = svc_rdma_post_recv(newxprt, GFP_KERNEL);
if (ret) {
dprintk("svcrdma: failure posting receive buffers\n");
goto errout;



[PATCH v4 09/11] svcrdma: Make map_xdr non-static

2015-12-14 Thread Chuck Lever
Prerequisite for using map_xdr in the backchannel code.

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/svc_rdma.h   |2 ++
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   14 +++---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 729ff35..aeffa30 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -213,6 +213,8 @@ extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, 
struct svc_rqst *,
u32, u32, u64, bool);
 
 /* svc_rdma_sendto.c */
+extern int svc_rdma_map_xdr(struct svcxprt_rdma *, struct xdr_buf *,
+   struct svc_rdma_req_map *);
 extern int svc_rdma_sendto(struct svc_rqst *);
 extern struct rpcrdma_read_chunk *
svc_rdma_get_read_chunk(struct rpcrdma_msg *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 9221086..ced3151 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -50,9 +50,9 @@
 
 #define RPCDBG_FACILITYRPCDBG_SVCXPRT
 
-static int map_xdr(struct svcxprt_rdma *xprt,
-  struct xdr_buf *xdr,
-  struct svc_rdma_req_map *vec)
+int svc_rdma_map_xdr(struct svcxprt_rdma *xprt,
+struct xdr_buf *xdr,
+struct svc_rdma_req_map *vec)
 {
int sge_no;
u32 sge_bytes;
@@ -62,7 +62,7 @@ static int map_xdr(struct svcxprt_rdma *xprt,
 
if (xdr->len !=
(xdr->head[0].iov_len + xdr->page_len + xdr->tail[0].iov_len)) {
-   pr_err("svcrdma: map_xdr: XDR buffer length error\n");
+   pr_err("svcrdma: %s: XDR buffer length error\n", __func__);
return -EIO;
}
 
@@ -97,9 +97,9 @@ static int map_xdr(struct svcxprt_rdma *xprt,
sge_no++;
}
 
-   dprintk("svcrdma: map_xdr: sge_no %d page_no %d "
+   dprintk("svcrdma: %s: sge_no %d page_no %d "
"page_base %u page_len %u head_len %zu tail_len %zu\n",
-   sge_no, page_no, xdr->page_base, xdr->page_len,
+   __func__, sge_no, page_no, xdr->page_base, xdr->page_len,
xdr->head[0].iov_len, xdr->tail[0].iov_len);
 
vec->count = sge_no;
@@ -599,7 +599,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
ctxt = svc_rdma_get_context(rdma);
ctxt->direction = DMA_TO_DEVICE;
vec = svc_rdma_get_req_map(rdma);
-   ret = map_xdr(rdma, &rqstp->rq_res, vec);
+   ret = svc_rdma_map_xdr(rdma, &rqstp->rq_res, vec);
if (ret)
goto err0;
inline_bytes = rqstp->rq_res.len;



[PATCH v4 08/11] svcrdma: Remove last two __GFP_NOFAIL call sites

2015-12-14 Thread Chuck Lever
Clean up.

These functions can otherwise fail, so check for page allocation
failures too.

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|5 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c |4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 2d3d7a4..9221086 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -605,7 +605,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
inline_bytes = rqstp->rq_res.len;
 
/* Create the RDMA response header */
-   res_page = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+   ret = -ENOMEM;
+   res_page = alloc_page(GFP_KERNEL);
+   if (!res_page)
+   goto err0;
rdma_resp = page_address(res_page);
reply_ary = svc_rdma_get_reply_array(rdma_argp);
if (reply_ary)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 14b692d..694ade4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1445,7 +1445,9 @@ void svc_rdma_send_error(struct svcxprt_rdma *xprt, 
struct rpcrdma_msg *rmsgp,
int length;
int ret;
 
-   p = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+   p = alloc_page(GFP_KERNEL);
+   if (!p)
+   return;
va = page_address(p);
 
/* XDR encode error */



Re: [PATCH v4 0/2] staging/rdma/hfi1: set Gen 3 half-swing for integrated devices.

2015-12-14 Thread Greg KH
On Mon, Dec 14, 2015 at 02:54:09PM -0500, ira.weiny wrote:
> Any further feedback on this series?

I'm way behind on staging patches right now (1000+ patches behind...),
hopefully will catch up on them this week.

thanks,

greg k-h


hfi1 patches and ordering

2015-12-14 Thread Marciniszyn, Mike
Greg,

We have other patch series close to being submitted.

Some depend on the inflight patches you are behind on, some on each other.

What is the best way of handling this to ensure no conflicts?

Mike


Re: [PATCH v4 0/2] staging/rdma/hfi1: set Gen 3 half-swing for integrated devices.

2015-12-14 Thread ira.weiny
Any further feedback on this series?

Ira

On Tue, Dec 01, 2015 at 02:47:55PM -0500, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This was a single patch before.  The change to dev_dbg required a precursor
> patch to add the dd_dev_dbg which is consistent with the other dev_* macros
> which automatically use struct hfi1_devdata.
> 
> Dean Luick (1):
>   staging/rdma/hfi1: set Gen3 half-swing for integrated devices
> 
> Ira Weiny (1):
>   staging/rdma/hfi1: add dd_dev_dbg
> 
>  drivers/staging/rdma/hfi1/chip_registers.h | 11 
>  drivers/staging/rdma/hfi1/hfi.h|  4 ++
>  drivers/staging/rdma/hfi1/pcie.c   | 82 
> --
>  3 files changed, 93 insertions(+), 4 deletions(-)
> 
> -- 
> 1.8.2
> 


Re: [PATCH 1/3] staging/rdma/hfi1: consolidate kmalloc_array+memset into kcalloc

2015-12-14 Thread Dan Carpenter
On Mon, Dec 14, 2015 at 05:41:23PM +, Nicholas Mc Guire wrote:
> I obviously made a real mess here.
> I incorrectly concluded that rxcontext is 0, which it is not in some cases.

Yep.  Plus you build tested it but assumed that the unused variable
warning must have been there in the original...  I've done that for
static checker warnings.  Lesson learned, hopefully.

regards,
dan carpenter



Re: hfi1 patches and ordering

2015-12-14 Thread Greg KH
On Mon, Dec 14, 2015 at 09:52:31PM +, Marciniszyn, Mike wrote:
> Greg,
> 
> We have other patch series close to being submitted.
> 
> Some depend on the inflight patches you are behind on, some on each other.
> 
> What is the best way of handling this to ensure no conflicts?

I apply patches in the order I receive them, so feel free to build on
patches you have already sent in.  For patches you are sending that have
dependencies on themselves, put them in a patch series, numbered
properly, and all should be fine.

thanks,

greg k-h


[PATCH v3 04/11] xprtrdma: Move struct ib_send_wr off the stack

2015-12-14 Thread Chuck Lever
For FRWR FASTREG and LOCAL_INV, move the ib_*_wr structure off
the stack. This allows frwr_op_map and frwr_op_unmap to chain
WRs together without limit to register or invalidate a set of MRs
with a single ib_post_send().

(This will be for chaining LOCAL_INV requests).
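
To make the chaining concrete, here is a minimal sketch (the helper is
hypothetical; struct ib_send_wr and ib_post_send() are the real verbs
API):

/* Sketch: two long-lived WRs linked through ->next are submitted
 * with a single ib_post_send(). Stack-allocated WRs cannot safely
 * be chained this way, since each would go out of scope before
 * the whole chain is posted.
 */
static int example_post_chain(struct ib_qp *qp,
			      struct ib_send_wr *first,
			      struct ib_send_wr *second)
{
	struct ib_send_wr *bad_wr;

	first->next = second;
	second->next = NULL;
	return ib_post_send(qp, first, &bad_wr);
}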

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/frwr_ops.c  |   38 --
 net/sunrpc/xprtrdma/xprt_rdma.h |2 ++
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index ae2a241..660d0b6 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -318,7 +318,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
struct rpcrdma_mw *mw;
struct rpcrdma_frmr *frmr;
struct ib_mr *mr;
-   struct ib_reg_wr reg_wr;
+   struct ib_reg_wr *reg_wr;
struct ib_send_wr *bad_wr;
int rc, i, n, dma_nents;
u8 key;
@@ -335,6 +335,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
frmr = &mw->r.frmr;
frmr->fr_state = FRMR_IS_VALID;
mr = frmr->fr_mr;
+   reg_wr = &frmr->fr_regwr;
 
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
@@ -380,19 +381,19 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
key = (u8)(mr->rkey & 0x00FF);
ib_update_fast_reg_key(mr, ++key);
 
-   reg_wr.wr.next = NULL;
-   reg_wr.wr.opcode = IB_WR_REG_MR;
-   reg_wr.wr.wr_id = (uintptr_t)mw;
-   reg_wr.wr.num_sge = 0;
-   reg_wr.wr.send_flags = 0;
-   reg_wr.mr = mr;
-   reg_wr.key = mr->rkey;
-   reg_wr.access = writing ?
-   IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
-   IB_ACCESS_REMOTE_READ;
+   reg_wr->wr.next = NULL;
+   reg_wr->wr.opcode = IB_WR_REG_MR;
+   reg_wr->wr.wr_id = (uintptr_t)mw;
+   reg_wr->wr.num_sge = 0;
+   reg_wr->wr.send_flags = 0;
+   reg_wr->mr = mr;
+   reg_wr->key = mr->rkey;
+   reg_wr->access = writing ?
+IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+IB_ACCESS_REMOTE_READ;
 
DECR_CQCOUNT(&r_xprt->rx_ep);
-   rc = ib_post_send(ia->ri_id->qp, &reg_wr.wr, &bad_wr);
+   rc = ib_post_send(ia->ri_id->qp, &reg_wr->wr, &bad_wr);
if (rc)
goto out_senderr;
 
@@ -422,23 +423,24 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mw *mw = seg1->rl_mw;
struct rpcrdma_frmr *frmr = &mw->r.frmr;
-   struct ib_send_wr invalidate_wr, *bad_wr;
+   struct ib_send_wr *invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;
 
dprintk("RPC:   %s: FRMR %p\n", __func__, mw);
 
seg1->rl_mw = NULL;
frmr->fr_state = FRMR_IS_INVALID;
+   invalidate_wr = &mw->r.frmr.fr_invwr;

-   memset(&invalidate_wr, 0, sizeof(invalidate_wr));
-   invalidate_wr.wr_id = (unsigned long)(void *)mw;
-   invalidate_wr.opcode = IB_WR_LOCAL_INV;
-   invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
+   memset(invalidate_wr, 0, sizeof(*invalidate_wr));
+   invalidate_wr->wr_id = (uintptr_t)mw;
+   invalidate_wr->opcode = IB_WR_LOCAL_INV;
+   invalidate_wr->ex.invalidate_rkey = frmr->fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);

ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
read_lock(&ia->ri_qplock);
-   rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
+   rc = ib_post_send(ia->ri_id->qp, invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
if (rc)
goto out_err;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 4197191..e60d817 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -206,6 +206,8 @@ struct rpcrdma_frmr {
enum rpcrdma_frmr_state fr_state;
struct work_struct  fr_work;
struct rpcrdma_xprt *fr_xprt;
+   struct ib_reg_wrfr_regwr;
+   struct ib_send_wr   fr_invwr;
 };
 
 struct rpcrdma_fmr {



[PATCH v3 06/11] xprtrdma: Add ro_unmap_sync method for FRWR

2015-12-14 Thread Chuck Lever
FRWR's ro_unmap is asynchronous. The new ro_unmap_sync posts
LOCAL_INV Work Requests and waits for them to complete before
returning.

Note also, DMA unmapping is now done _after_ invalidation.
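
The wait itself uses the usual signal-only-the-last-WR pattern;
condensed from the code below, where "f" is the frmr owning the last
WR in the chain:

/* Sketch: only the final LOCAL_INV is signaled. Send queue
 * ordering guarantees all earlier WRs in the chain completed
 * before its completion fires and wakes the poster.
 */
f->fr_waiter = true;
init_completion(&f->fr_linv_done);
f->fr_invwr.send_flags = IB_SEND_SIGNALED;

rc = ib_post_send(ia->ri_id->qp, invalidate_wrs, &bad_wr);
if (rc == 0)
	wait_for_completion(&f->fr_linv_done);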

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/frwr_ops.c  |  137 ++-
 net/sunrpc/xprtrdma/xprt_rdma.h |2 +
 2 files changed, 135 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 660d0b6..5b9e41d 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -244,12 +244,14 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
 rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
 }
 
-/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs to be reset. */
+/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs
+ * to be reset.
+ *
+ * WARNING: Only wr_id and status are reliable at this point
+ */
 static void
-frwr_sendcompletion(struct ib_wc *wc)
+__frwr_sendcompletion_flush(struct ib_wc *wc, struct rpcrdma_mw *r)
 {
-   struct rpcrdma_mw *r;
-
if (likely(wc->status == IB_WC_SUCCESS))
return;
 
@@ -260,9 +262,23 @@ frwr_sendcompletion(struct ib_wc *wc)
else
pr_warn("RPC:   %s: frmr %p error, status %s (%d)\n",
__func__, r, ib_wc_status_msg(wc->status), wc->status);
+
r->r.frmr.fr_state = FRMR_IS_STALE;
 }
 
+static void
+frwr_sendcompletion(struct ib_wc *wc)
+{
+   struct rpcrdma_mw *r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+   struct rpcrdma_frmr *f = &r->r.frmr;
+
+   if (unlikely(wc->status != IB_WC_SUCCESS))
+   __frwr_sendcompletion_flush(wc, r);
+
+   if (f->fr_waiter)
+   complete(&f->fr_linv_done);
+}
+
 static int
 frwr_op_init(struct rpcrdma_xprt *r_xprt)
 {
@@ -334,6 +350,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
} while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
frmr = &mw->r.frmr;
frmr->fr_state = FRMR_IS_VALID;
+   frmr->fr_waiter = false;
mr = frmr->fr_mr;
reg_wr = &frmr->fr_regwr;
 
@@ -413,6 +430,117 @@ out_senderr:
return rc;
 }
 
+static struct ib_send_wr *
+__frwr_prepare_linv_wr(struct rpcrdma_mr_seg *seg)
+{
+   struct rpcrdma_mw *mw = seg->rl_mw;
+   struct rpcrdma_frmr *f = &mw->r.frmr;
+   struct ib_send_wr *invalidate_wr;
+
+   f->fr_waiter = false;
+   f->fr_state = FRMR_IS_INVALID;
+   invalidate_wr = &f->fr_invwr;
+
+   memset(invalidate_wr, 0, sizeof(*invalidate_wr));
+   invalidate_wr->wr_id = (unsigned long)(void *)mw;
+   invalidate_wr->opcode = IB_WR_LOCAL_INV;
+   invalidate_wr->ex.invalidate_rkey = f->fr_mr->rkey;
+
+   return invalidate_wr;
+}
+
+static void
+__frwr_dma_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+int rc)
+{
+   struct ib_device *device = r_xprt->rx_ia.ri_device;
+   struct rpcrdma_mw *mw = seg->rl_mw;
+   int nsegs = seg->mr_nsegs;
+
+   seg->rl_mw = NULL;
+
+   while (nsegs--)
+   rpcrdma_unmap_one(device, seg++);
+
+   if (!rc)
+   rpcrdma_put_mw(r_xprt, mw);
+   else
+   __frwr_queue_recovery(mw);
+}
+
+/* Invalidate all memory regions that were registered for "req".
+ *
+ * Sleeps until it is safe for the host CPU to access the
+ * previously mapped memory regions.
+ */
+static void
+frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
+{
+   struct ib_send_wr *invalidate_wrs, *pos, *prev, *bad_wr;
+   struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+   struct rpcrdma_mr_seg *seg;
+   unsigned int i, nchunks;
+   struct rpcrdma_frmr *f;
+   int rc;
+
+   dprintk("RPC:   %s: req %p\n", __func__, req);
+
+   /* ORDER: Invalidate all of the req's MRs first
+*
+* Chain the LOCAL_INV Work Requests and post them with
+* a single ib_post_send() call.
+*/
+   invalidate_wrs = pos = prev = NULL;
+   seg = NULL;
+   for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
+   seg = &req->rl_segments[i];
+
+   pos = __frwr_prepare_linv_wr(seg);
+
+   if (!invalidate_wrs)
+   invalidate_wrs = pos;
+   else
+   prev->next = pos;
+   prev = pos;
+
+   i += seg->mr_nsegs;
+   }
+   f = &seg->rl_mw->r.frmr;
+
+   /* Strong send queue ordering guarantees that when the
+* last WR in the chain completes, all WRs in the chain
+* are complete.
+*/
+   f->fr_invwr.send_flags = IB_SEND_SIGNALED;
+   f->fr_waiter = true;
+   init_completion(&f->fr_linv_done);
+   INIT_CQCOUNT(&r_xprt->rx_ep);
+
+   /* Transport disconnect drains the receive CQ before it
+* replaces the QP. The RPC reply handler won't call us
+ 

[PATCH v3 05/11] xprtrdma: Introduce ro_unmap_sync method

2015-12-14 Thread Chuck Lever
In the current xprtrdma implementation, some memreg strategies
implement ro_unmap synchronously (the MR is knocked down before the
method returns) and some asynchronously (the MR will be knocked down
and returned to the pool in the background).

To guarantee the MR is truly invalid before the RPC consumer is
allowed to resume execution, we need an unmap method that is
always synchronous, invoked from the RPC/RDMA reply handler.

The new method unmaps all MRs for an RPC. The existing ro_unmap
method unmaps only one MR at a time.
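
The reply handler call site (added later in this series) then looks
like:

/* Sketch: fence all of this RPC's memory regions before the
 * RPC consumer is allowed to run again.
 */
if (req->rl_nchunks)
	r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);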

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/xprt_rdma.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index e60d817..d9f2f65 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -365,6 +365,8 @@ struct rpcrdma_xprt;
 struct rpcrdma_memreg_ops {
int (*ro_map)(struct rpcrdma_xprt *,
  struct rpcrdma_mr_seg *, int, bool);
+   void(*ro_unmap_sync)(struct rpcrdma_xprt *,
+struct rpcrdma_req *);
int (*ro_unmap)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *);
int (*ro_open)(struct rpcrdma_ia *,



[PATCH v3 07/11] xprtrdma: Add ro_unmap_sync method for FMR

2015-12-14 Thread Chuck Lever
FMR's ro_unmap method is already synchronous because ib_unmap_fmr()
is a synchronous verb. However, some improvements can be made here.

1. Gather all the MRs for the RPC request onto a list, and invoke
   ib_unmap_fmr() once with that list. This reduces the number of
   doorbells when there is more than one MR to invalidate.

2. Perform the DMA unmap _after_ the MRs are unmapped, not before.
   This is critical after invalidating a Write chunk.
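
Condensed, the flow from (1) and (2) is:

/* Sketch: one list, one verb call, then DMA unmap. */
LIST_HEAD(unmap_list);

/* for each MR registered for the RPC: */
list_add(&mw->r.fmr.fmr->list, &unmap_list);

rc = ib_unmap_fmr(&unmap_list);	/* one call for the whole list */

/* only now is it safe to DMA unmap the segments */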

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/fmr_ops.c |   64 +
 1 file changed, 64 insertions(+)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index f1e8daf..c14f3a4 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -179,6 +179,69 @@ out_maperr:
return rc;
 }
 
+static void
+__fmr_dma_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+   struct ib_device *device = r_xprt->rx_ia.ri_device;
+   struct rpcrdma_mw *mw = seg->rl_mw;
+   int nsegs = seg->mr_nsegs;
+
+   seg->rl_mw = NULL;
+
+   while (nsegs--)
+   rpcrdma_unmap_one(device, seg++);
+
+   rpcrdma_put_mw(r_xprt, mw);
+}
+
+/* Invalidate all memory regions that were registered for "req".
+ *
+ * Sleeps until it is safe for the host CPU to access the
+ * previously mapped memory regions.
+ */
+static void
+fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
+{
+   struct rpcrdma_mr_seg *seg;
+   unsigned int i, nchunks;
+   struct rpcrdma_mw *mw;
+   LIST_HEAD(unmap_list);
+   int rc;
+
+   dprintk("RPC:   %s: req %p\n", __func__, req);
+
+   /* ORDER: Invalidate all of the req's MRs first
+*
+* ib_unmap_fmr() is slow, so use a single call instead
+* of one call per mapped MR.
+*/
+   for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
+   seg = &req->rl_segments[i];
+   mw = seg->rl_mw;
+
+   list_add(&mw->r.fmr.fmr->list, &unmap_list);
+
+   i += seg->mr_nsegs;
+   }
+   rc = ib_unmap_fmr(&unmap_list);
+   if (rc)
+   pr_warn("%s: ib_unmap_fmr failed (%i)\n", __func__, rc);
+
+   /* ORDER: Now DMA unmap all of the req's MRs, and return
+* them to the free MW list.
+*/
+   for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
+   seg = &req->rl_segments[i];
+
+   __fmr_dma_unmap(r_xprt, seg);
+
+   i += seg->mr_nsegs;
+   seg->mr_nsegs = 0;
+   }
+
+   req->rl_nchunks = 0;
+}
+
 /* Use the ib_unmap_fmr() verb to prevent further remote
  * access via RDMA READ or RDMA WRITE.
  */
@@ -231,6 +294,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf)
 
 const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
+   .ro_unmap_sync  = fmr_op_unmap_sync,
.ro_unmap   = fmr_op_unmap,
.ro_open= fmr_op_open,
.ro_maxpages= fmr_op_maxpages,



[PATCH v3 02/11] xprtrdma: xprt_rdma_free() must not release backchannel reqs

2015-12-14 Thread Chuck Lever
Preserve any rpcrdma_req that is attached to rpc_rqst's allocated
for the backchannel. Otherwise, after all the pre-allocated
backchannel req's are consumed, incoming backward calls start
writing on freed memory.

Somehow this hunk got lost.

Fixes: f531a5dbc451 ('xprtrdma: Pre-allocate backward rpc_rqst')
Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/transport.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 8c545f7..740bddc 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -576,6 +576,9 @@ xprt_rdma_free(void *buffer)
 
rb = container_of(buffer, struct rpcrdma_regbuf, rg_base[0]);
req = rb->rg_owner;
+   if (req->rl_backchannel)
+   return;
+
r_xprt = container_of(req->rl_buffer, struct rpcrdma_xprt, rx_buf);
 
dprintk("RPC:   %s: called on 0x%p\n", __func__, req->rl_reply);



[PATCH v3 10/11] xprtrdma: Invalidate in the RPC reply handler

2015-12-14 Thread Chuck Lever
There is a window between the time the RPC reply handler wakes the
waiting RPC task and when xprt_release() invokes ops->buf_free.
During this time, memory regions containing the data payload may
still be accessed by a broken or malicious server, but the RPC
application has already been allowed access to the memory containing
the RPC request's data payloads.

The server should be fenced from client memory containing RPC data
payloads _before_ the RPC application is allowed to continue.

This change also more strongly enforces send queue accounting. There
is a maximum number of RPC calls allowed to be outstanding. When an
RPC/RDMA transport is set up, just enough send queue resources are
allocated to handle registration, Send, and invalidation WRs for
each of those RPCs at the same time.

Before, additional RPC calls could be dispatched while invalidation
WRs were still consuming send WQEs. When invalidation WRs backed
up, dispatching additional RPCs resulted in a send queue overrun.

Now, the reply handler prevents RPC dispatch until invalidation is
complete. This prevents RPC call dispatch until there are enough
send queue resources to proceed.

Still to do: If an RPC exits early (say, ^C), the reply handler has
no opportunity to perform invalidation. Currently, xprt_rdma_free()
still frees remaining RDMA resources, which could deadlock.
Additional changes are needed to handle invalidation properly in this
case.

Reported-by: Jason Gunthorpe 
Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0bc8c39..3d00c5d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -891,6 +891,16 @@ badheader:
break;
}
 
+   /* Invalidate and flush the data payloads before waking the
+* waiting application. This guarantees the memory region is
+* properly fenced from the server before the application
+* accesses the data. It also ensures proper send flow
+* control: waking the next RPC waits until this RPC has
+* relinquished all its Send Queue entries.
+*/
+   if (req->rl_nchunks)
+   r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);
+
credits = be32_to_cpu(headerp->rm_credit);
if (credits == 0)
credits = 1;/* don't deadlock */



[PATCH v3 11/11] xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit').

2015-12-14 Thread Chuck Lever
The root of the problem was that sends (especially unsignalled
FASTREG and LOCAL_INV Work Requests) were not properly flow-
controlled, which allowed a send queue overrun.

Now that the RPC/RDMA reply handler waits for invalidation to
complete, the send queue is properly flow-controlled. Thus this
limit is no longer necessary.
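
For reference, the scheme these macros implement (as used by
rpcrdma_ep_post(); a sketch, not part of this patch) is:

/* Sketch: request a signaled completion only every rep_cqinit
 * sends, so the provider can retire older unsignaled WRs; reset
 * the countdown when it expires.
 */
if (DECR_CQCOUNT(ep) > 0)
	send_wr.send_flags = 0;		/* unsignaled */
else {
	INIT_CQCOUNT(ep);
	send_wr.send_flags = IB_SEND_SIGNALED;
}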

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/verbs.c |6 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |6 --
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f23f3d6..1867e3a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -608,10 +608,8 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia 
*ia,
 
/* set trigger for requesting send completion */
ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1;
-   if (ep->rep_cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS)
-   ep->rep_cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS;
-   else if (ep->rep_cqinit <= 2)
-   ep->rep_cqinit = 0;
+   if (ep->rep_cqinit <= 2)
+   ep->rep_cqinit = 0; /* always signal? */
INIT_CQCOUNT(ep);
init_waitqueue_head(&ep->rep_connect_wait);
INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 089a7db..ba3bc3f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -87,12 +87,6 @@ struct rpcrdma_ep {
struct delayed_work rep_connect_worker;
 };
 
-/*
- * Force a signaled SEND Work Request every so often,
- * in case the provider needs to do some housekeeping.
- */
-#define RPCRDMA_MAX_UNSIGNALED_SENDS   (32)
-
 #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
 #define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)
 



[PATCH v3 09/11] SUNRPC: Introduce xprt_commit_rqst()

2015-12-14 Thread Chuck Lever
I'm about to add code in the RPC/RDMA reply handler between the
xprt_lookup_rqst() and xprt_complete_rqst() call site that needs
to execute outside of spinlock critical sections.

Add a hook to remove an rpc_rqst from the pending list once
the transport knows it's going to invoke xprt_complete_rqst().
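
The intended calling sequence in a reply handler (a sketch; it mirrors
the rpc_rdma.c hunk below):

spin_lock_bh(&xprt->transport_lock);
rqst = xprt_lookup_rqst(xprt, xid);
if (rqst) {
	/* reply is now committed; safe to drop the lock */
	xprt_commit_rqst(rqst->rq_task);
	spin_unlock_bh(&xprt->transport_lock);

	/* long-running work, e.g. invalidation, runs here */

	spin_lock_bh(&xprt->transport_lock);
	xprt_complete_rqst(rqst->rq_task, copied);
}
spin_unlock_bh(&xprt->transport_lock);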

Signed-off-by: Chuck Lever 
---
 include/linux/sunrpc/xprt.h|1 +
 net/sunrpc/xprt.c  |   14 ++
 net/sunrpc/xprtrdma/rpc_rdma.c |4 
 3 files changed, 19 insertions(+)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 69ef5b3..ab6c3a5 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -366,6 +366,7 @@ void
xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action);
 void   xprt_write_space(struct rpc_xprt *xprt);
 void   xprt_adjust_cwnd(struct rpc_xprt *xprt, struct rpc_task 
*task, int result);
 struct rpc_rqst *  xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
+void   xprt_commit_rqst(struct rpc_task *task);
 void   xprt_complete_rqst(struct rpc_task *task, int copied);
 void   xprt_release_rqst_cong(struct rpc_task *task);
 void   xprt_disconnect_done(struct rpc_xprt *xprt);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 2e98f4a..a5be4ab 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -837,6 +837,20 @@ static void xprt_update_rtt(struct rpc_task *task)
 }
 
 /**
+ * xprt_commit_rqst - remove rqst from pending list early
+ * @task: RPC request to remove
+ *
+ * Caller holds transport lock.
+ */
+void xprt_commit_rqst(struct rpc_task *task)
+{
+   struct rpc_rqst *req = task->tk_rqstp;
+
+   list_del_init(&req->rq_list);
+}
+EXPORT_SYMBOL_GPL(xprt_commit_rqst);
+
+/**
  * xprt_complete_rqst - called when reply processing is complete
  * @task: RPC request that recently completed
  * @copied: actual number of bytes received from the transport
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c10d969..0bc8c39 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -804,6 +804,9 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
if (req->rl_reply)
goto out_duplicate;
 
+   xprt_commit_rqst(rqst->rq_task);
+   spin_unlock_bh(&xprt->transport_lock);
+
dprintk("RPC:   %s: reply 0x%p completes request 0x%p\n"
"   RPC request 0x%p xid 0x%08x\n",
__func__, rep, req, rqst,
@@ -894,6 +897,7 @@ badheader:
else if (credits > r_xprt->rx_buf.rb_max_requests)
credits = r_xprt->rx_buf.rb_max_requests;
 
+   spin_lock_bh(&xprt->transport_lock);
cwnd = xprt->cwnd;
xprt->cwnd = credits << RPC_CWNDSHIFT;
if (xprt->cwnd > cwnd)



[PATCH v3 08/11] xprtrdma: Add ro_unmap_sync method for all-physical registration

2015-12-14 Thread Chuck Lever
physical's ro_unmap is synchronous already. The new ro_unmap_sync
method just has to DMA unmap all MRs associated with the RPC
request.

Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/physical_ops.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
b/net/sunrpc/xprtrdma/physical_ops.c
index 617b76f..dbb302e 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -83,6 +83,18 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg)
return 1;
 }
 
+/* DMA unmap all memory regions that were mapped for "req".
+ */
+static void
+physical_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
+{
+   struct ib_device *device = r_xprt->rx_ia.ri_device;
+   unsigned int i;
+
+   for (i = 0; req->rl_nchunks; --req->rl_nchunks)
+   rpcrdma_unmap_one(device, &req->rl_segments[i++]);
+}
+
 static void
 physical_op_destroy(struct rpcrdma_buffer *buf)
 {
@@ -90,6 +102,7 @@ physical_op_destroy(struct rpcrdma_buffer *buf)
 
 const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
+   .ro_unmap_sync  = physical_op_unmap_sync,
.ro_unmap   = physical_op_unmap,
.ro_open= physical_op_open,
.ro_maxpages= physical_op_maxpages,



[PATCH v3 00/11] NFS/RDMA client patches for 4.5

2015-12-14 Thread Chuck Lever
For 4.5, I'd like to address the send queue accounting and
invalidation/unmap ordering issues Jason brought up a couple of
months ago.

In preparation for Doug's final topic branch, Anna, I've rebased
these on Christoph's ib_device_attr branch, but there were no merge
conflicts or other changes needed. Could you begin preparing these
for linux-next and other final testing and review?

Also available in the "nfs-rdma-for-4.5" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.5


Changes since v2:
- Rebased on Christoph's ib_device_attr branch


Changes since v1:

- Rebased on v4.4-rc3
- Receive buffer safety margin patch dropped
- Backchannel pr_err and pr_info converted to dprintk
- Backchannel spin locks converted to work queue-safe locks
- Fixed premature release of backchannel request buffer
- NFSv4.1 callbacks tested with for-4.5 server

---

Chuck Lever (11):
  xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)
  xprtrdma: xprt_rdma_free() must not release backchannel reqs
  xprtrdma: Disable RPC/RDMA backchannel debugging messages
  xprtrdma: Move struct ib_send_wr off the stack
  xprtrdma: Introduce ro_unmap_sync method
  xprtrdma: Add ro_unmap_sync method for FRWR
  xprtrdma: Add ro_unmap_sync method for FMR
  xprtrdma: Add ro_unmap_sync method for all-physical registration
  SUNRPC: Introduce xprt_commit_rqst()
  xprtrdma: Invalidate in the RPC reply handler
  xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit').


 include/linux/sunrpc/xprt.h|1 
 net/sunrpc/xprt.c  |   14 +++
 net/sunrpc/xprtrdma/backchannel.c  |   22 ++---
 net/sunrpc/xprtrdma/fmr_ops.c  |   64 +
 net/sunrpc/xprtrdma/frwr_ops.c |  175 +++-
 net/sunrpc/xprtrdma/physical_ops.c |   13 +++
 net/sunrpc/xprtrdma/rpc_rdma.c |   14 +++
 net/sunrpc/xprtrdma/transport.c|3 +
 net/sunrpc/xprtrdma/verbs.c|   13 +--
 net/sunrpc/xprtrdma/xprt_rdma.h|   12 +-
 10 files changed, 283 insertions(+), 48 deletions(-)

--
Chuck Lever


[PATCH v3 01/11] xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)

2015-12-14 Thread Chuck Lever
Clean up.

rb_lock critical sections added in rpcrdma_ep_post_extra_recv()
should have first been converted to use normal spin_lock now that
the reply handler is a work queue.

The backchannel set up code should use the appropriate helper
instead of open-coding a rb_recv_bufs list add.

Problem introduced by glib patch re-ordering on my part.
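
For reference, the distinction being exploited here: the _irqsave
variants are only required when a lock can be taken from hard-IRQ
context. A before/after sketch:

/* Before: reply handler could run in bottom-half context. */
spin_lock_irqsave(&buffers->rb_lock, flags);
list_add(&rep->rr_list, &buffers->rb_recv_bufs);
spin_unlock_irqrestore(&buffers->rb_lock, flags);

/* After: reply handler runs from a work queue (process context),
 * so a plain spinlock is sufficient and cheaper.
 */
spin_lock(&buffers->rb_lock);
list_add(&rep->rr_list, &buffers->rb_recv_bufs);
spin_unlock(&buffers->rb_lock);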

Fixes: f531a5dbc451 ('xprtrdma: Pre-allocate backward rpc_rqst')
Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/backchannel.c |6 +-
 net/sunrpc/xprtrdma/verbs.c   |7 +++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c 
b/net/sunrpc/xprtrdma/backchannel.c
index 2dcb44f..11d2cfb 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -84,9 +84,7 @@ out_fail:
 static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
 unsigned int count)
 {
-   struct rpcrdma_buffer *buffers = &r_xprt->rx_buf;
struct rpcrdma_rep *rep;
-   unsigned long flags;
int rc = 0;
 
while (count--) {
@@ -98,9 +96,7 @@ static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
break;
}
 
-   spin_lock_irqsave(&buffers->rb_lock, flags);
-   list_add(&rep->rr_list, &buffers->rb_recv_bufs);
-   spin_unlock_irqrestore(&buffers->rb_lock, flags);
+   rpcrdma_recv_buffer_put(rep);
}
 
return rc;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 650034b..f23f3d6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1329,15 +1329,14 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, 
unsigned int count)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_ep *ep = &r_xprt->rx_ep;
struct rpcrdma_rep *rep;
-   unsigned long flags;
int rc;
 
while (count--) {
-   spin_lock_irqsave(&buffers->rb_lock, flags);
+   spin_lock(&buffers->rb_lock);
if (list_empty(&buffers->rb_recv_bufs))
goto out_reqbuf;
rep = rpcrdma_buffer_get_rep_locked(buffers);
-   spin_unlock_irqrestore(&buffers->rb_lock, flags);
+   spin_unlock(&buffers->rb_lock);
 
rc = rpcrdma_ep_post_recv(ia, ep, rep);
if (rc)
@@ -1347,7 +1346,7 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, 
unsigned int count)
return 0;
 
 out_reqbuf:
-   spin_unlock_irqrestore(&buffers->rb_lock, flags);
+   spin_unlock(&buffers->rb_lock);
pr_warn("%s: no extra receive buffers\n", __func__);
return -ENOMEM;
 



[PATCH v3 03/11] xprtrdma: Disable RPC/RDMA backchannel debugging messages

2015-12-14 Thread Chuck Lever
Clean up.

Fixes: 63cae47005af ('xprtrdma: Handle incoming backward direction')
Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/backchannel.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c 
b/net/sunrpc/xprtrdma/backchannel.c
index 11d2cfb..cd31181 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -15,7 +15,7 @@
 # define RPCDBG_FACILITY   RPCDBG_TRANS
 #endif
 
-#define RPCRDMA_BACKCHANNEL_DEBUG
+#undef RPCRDMA_BACKCHANNEL_DEBUG
 
 static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
 struct rpc_rqst *rqst)
@@ -136,6 +136,7 @@ int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned int 
reqs)
   __func__);
goto out_free;
}
+   dprintk("RPC:   %s: new rqst %p\n", __func__, rqst);
 
rqst->rq_xprt = &r_xprt->rx_xprt;
INIT_LIST_HEAD(&rqst->rq_list);
@@ -216,12 +217,14 @@ int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)
 
rpclen = rqst->rq_svec[0].iov_len;
 
+#ifdef RPCRDMA_BACKCHANNEL_DEBUG
pr_info("RPC:   %s: rpclen %zd headerp 0x%p lkey 0x%x\n",
__func__, rpclen, headerp, rdmab_lkey(req->rl_rdmabuf));
pr_info("RPC:   %s: RPC/RDMA: %*ph\n",
__func__, (int)RPCRDMA_HDRLEN_MIN, headerp);
pr_info("RPC:   %s:  RPC: %*ph\n",
__func__, (int)rpclen, rqst->rq_svec[0].iov_base);
+#endif
 
req->rl_send_iov[0].addr = rdmab_addr(req->rl_rdmabuf);
req->rl_send_iov[0].length = RPCRDMA_HDRLEN_MIN;
@@ -265,6 +268,9 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *rqst)
 {
struct rpc_xprt *xprt = rqst->rq_xprt;
 
+   dprintk("RPC:   %s: freeing rqst %p (req %p)\n",
+   __func__, rqst, rpcr_to_rdmar(rqst));
+
smp_mb__before_atomic();
WARN_ON_ONCE(!test_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state));
clear_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
@@ -329,9 +335,7 @@ void rpcrdma_bc_receive_call(struct rpcrdma_xprt *r_xprt,
struct rpc_rqst, rq_bc_pa_list);
list_del(&rqst->rq_bc_pa_list);
spin_unlock(&xprt->bc_pa_lock);
-#ifdef RPCRDMA_BACKCHANNEL_DEBUG
-   pr_info("RPC:   %s: using rqst %p\n", __func__, rqst);
-#endif
+   dprintk("RPC:   %s: using rqst %p\n", __func__, rqst);
 
/* Prepare rqst */
rqst->rq_reply_bytes_recvd = 0;
@@ -351,10 +355,8 @@ void rpcrdma_bc_receive_call(struct rpcrdma_xprt *r_xprt,
 * direction reply.
 */
req = rpcr_to_rdmar(rqst);
-#ifdef RPCRDMA_BACKCHANNEL_DEBUG
-   pr_info("RPC:   %s: attaching rep %p to req %p\n",
+   dprintk("RPC:   %s: attaching rep %p to req %p\n",
__func__, rep, req);
-#endif
req->rl_reply = rep;
 
/* Defeat the retransmit detection logic in send_request */
