RE: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth

2014-05-16 Thread Devesh Sharma
Chuck

This patch is causing a CPU soft-lockup if underlying vendor reports 
devattr.max_fast_reg_pagr_list_len = 0 and ia-ri_memreg_strategy = FRMR 
(Default option).
I think there is need to refer to device capability flags. If strategy = FRMR 
is forced and devattr.max_fast_reg_pagr_list_len=0 then flash an error and fail 
RPC with -EIO.

See inline:

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Chuck Lever
 Sent: Thursday, May 01, 2014 1:00 AM
 To: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
 Cc: anna.schuma...@netapp.com
 Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
 page list depth
 
 From: Steve Wise sw...@opengridcomputing.com
 
 Some rdma devices don't support a fast register page list depth of at least
 RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
 regions according to the minimum of the device max supported depth or
 RPCRDMA_MAX_DATA_SEGS.
 
 Signed-off-by: Steve Wise sw...@opengridcomputing.com
 Reviewed-by: Chuck Lever chuck.le...@oracle.com
 ---
 
  net/sunrpc/xprtrdma/rpc_rdma.c  |4 ---
  net/sunrpc/xprtrdma/verbs.c |   47 +-
 -
  net/sunrpc/xprtrdma/xprt_rdma.h |1 +
  3 files changed, 36 insertions(+), 16 deletions(-)
 
 diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
 b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
 --- a/net/sunrpc/xprtrdma/rpc_rdma.c
 +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
 @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
 xdr_buf *target,
   /* success. all failures return above */
   req-rl_nchunks = nchunks;
 
 - BUG_ON(nchunks == 0);
 - BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
 - (nchunks  3));
 -
   /*
* finish off header. If write, marshal discrim and nchunks.
*/
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
 index 9372656..55fb09a 100644
 --- a/net/sunrpc/xprtrdma/verbs.c
 +++ b/net/sunrpc/xprtrdma/verbs.c
 @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
 sockaddr *addr, int memreg)
   __func__);
   memreg = RPCRDMA_REGISTER;
  #endif
 + } else {
 + /* Mind the ia limit on FRMR page list depth */
 + ia-ri_max_frmr_depth = min_t(unsigned int,
 + RPCRDMA_MAX_DATA_SEGS,
 + devattr.max_fast_reg_page_list_len);
   }
   break;
   }
 @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
 rpcrdma_ia *ia,
   ep-rep_attr.srq = NULL;
   ep-rep_attr.cap.max_send_wr = cdata-max_requests;
   switch (ia-ri_memreg_strategy) {
 - case RPCRDMA_FRMR:
 + case RPCRDMA_FRMR: {
 + int depth = 7;
 +
   /* Add room for frmr register and invalidate WRs.
* 1. FRMR reg WR for head
* 2. FRMR invalidate WR for head
 -  * 3. FRMR reg WR for pagelist
 -  * 4. FRMR invalidate WR for pagelist
 +  * 3. N FRMR reg WRs for pagelist
 +  * 4. N FRMR invalidate WRs for pagelist
* 5. FRMR reg WR for tail
* 6. FRMR invalidate WR for tail
* 7. The RDMA_SEND WR
*/
 - ep-rep_attr.cap.max_send_wr *= 7;
 +
 + /* Calculate N if the device max FRMR depth is smaller than
 +  * RPCRDMA_MAX_DATA_SEGS.
 +  */
 + if (ia-ri_max_frmr_depth  RPCRDMA_MAX_DATA_SEGS) {
 + int delta = RPCRDMA_MAX_DATA_SEGS -
 + ia-ri_max_frmr_depth;
 +
 + do {
 + depth += 2; /* FRMR reg + invalidate */
 + delta -= ia-ri_max_frmr_depth;

If ia-ri_max_frmr_depth is = 0. This loop becomes infinite loop.

 + } while (delta  0);
 +
 + }
 + ep-rep_attr.cap.max_send_wr *= depth;
   if (ep-rep_attr.cap.max_send_wr  devattr.max_qp_wr) {
 - cdata-max_requests = devattr.max_qp_wr / 7;
 + cdata-max_requests = devattr.max_qp_wr / depth;
   if (!cdata-max_requests)
   return -EINVAL;
 - ep-rep_attr.cap.max_send_wr = cdata-
 max_requests * 7;
 + ep-rep_attr.cap.max_send_wr = cdata-
 max_requests *
 +depth;
   }
   break;
 + }
   case RPCRDMA_MEMWINDOWS_ASYNC:
   case RPCRDMA_MEMWINDOWS:
   /* Add room for mw_binds+unbinds - overkill! */ @@ -
 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
 struct rpcrdma_ep *ep,
   case RPCRDMA_FRMR:
   for (i 

[PATCH v3 0/2] Fix a use-after-free in ib_umad

2014-05-16 Thread Bart Van Assche
Changes compared to version 2 of this patch series:
* Converted explicit kobject() get calls into implicit calls by moving
  the kobj.parent assignments in front of the corresponding cdev_add()
  calls.

Changes compared to version 1 of this patch series:
* Folded the first patch into the second.
* Implemented Yann's suggestion to drop the test of ret in the
  non-error path of ib_umad_sm_open().
* Simplified the implementation of ib_umad_open() and ib_umad_sm_open()
  further.

This patch series consists of the following two patches:
0001-IB-umad-Fix-error-handling.patch
0002-IB-umad-Fix-a-use-after-free.patch
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] IB/umad: Fix error handling

2014-05-16 Thread Bart Van Assche
Avoid leaking a kref count in ib_umad_open() if port-ib_dev == NULL
or if nonseekable_open() fails. Avoid leaking a kref count, that
sm_sem is kept down and also that the IB_PORT_SM capability mask is
not cleared in ib_umad_sm_open() if nonseekable_open() fails.
Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL. Note: moving the kref_get() call
from the start of ib_umad_*open() to the end is safe since it is the
responsibility of the caller of these functions to ensure that the
cdev pointer remains valid until at least when these functions return.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Alex Chiang achi...@canonical.com
Cc: Yann Droneaud ydrone...@opteya.com
Cc: sta...@vger.kernel.org
---
 drivers/infiniband/core/user_mad.c | 58 ++
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c 
b/drivers/infiniband/core/user_mad.c
index f0d588f..2b3dfcc 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -780,27 +780,19 @@ static int ib_umad_open(struct inode *inode, struct file 
*filp)
 {
struct ib_umad_port *port;
struct ib_umad_file *file;
-   int ret;
+   int ret = -ENXIO;
 
port = container_of(inode-i_cdev, struct ib_umad_port, cdev);
-   if (port)
-   kref_get(port-umad_dev-ref);
-   else
-   return -ENXIO;
 
mutex_lock(port-file_mutex);
 
-   if (!port-ib_dev) {
-   ret = -ENXIO;
+   if (!port-ib_dev)
goto out;
-   }
 
+   ret = -ENOMEM;
file = kzalloc(sizeof *file, GFP_KERNEL);
-   if (!file) {
-   kref_put(port-umad_dev-ref, ib_umad_release_dev);
-   ret = -ENOMEM;
+   if (!file)
goto out;
-   }
 
mutex_init(file-mutex);
spin_lock_init(file-send_lock);
@@ -815,9 +807,20 @@ static int ib_umad_open(struct inode *inode, struct file 
*filp)
 
ret = nonseekable_open(inode, filp);
 
+   if (ret)
+   goto del;
+
+   kref_get(port-umad_dev-ref);
+
 out:
mutex_unlock(port-file_mutex);
+
return ret;
+
+del:
+   list_del(file-port_list);
+   kfree(file);
+   goto out;
 }
 
 static int ib_umad_close(struct inode *inode, struct file *filp)
@@ -880,36 +883,41 @@ static int ib_umad_sm_open(struct inode *inode, struct 
file *filp)
int ret;
 
port = container_of(inode-i_cdev, struct ib_umad_port, sm_cdev);
-   if (port)
-   kref_get(port-umad_dev-ref);
-   else
-   return -ENXIO;
 
if (filp-f_flags  O_NONBLOCK) {
if (down_trylock(port-sm_sem)) {
ret = -EAGAIN;
-   goto fail;
+   goto out;
}
} else {
if (down_interruptible(port-sm_sem)) {
ret = -ERESTARTSYS;
-   goto fail;
+   goto out;
}
}
 
ret = ib_modify_port(port-ib_dev, port-port_num, 0, props);
-   if (ret) {
-   up(port-sm_sem);
-   goto fail;
-   }
+   if (ret)
+   goto up_sem;
 
filp-private_data = port;
 
-   return nonseekable_open(inode, filp);
+   ret = nonseekable_open(inode, filp);
+   if (ret)
+   goto clr_sm_cap;
 
-fail:
-   kref_put(port-umad_dev-ref, ib_umad_release_dev);
+   kref_get(port-umad_dev-ref);
+
+out:
return ret;
+
+clr_sm_cap:
+   swap(props.set_port_cap_mask, props.clr_port_cap_mask);
+   ib_modify_port(port-ib_dev, port-port_num, 0, props);
+
+up_sem:
+   up(port-sm_sem);
+   goto out;
 }
 
 static int ib_umad_sm_close(struct inode *inode, struct file *filp)
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] IB/umad: Fix a use-after-free

2014-05-16 Thread Bart Van Assche
Avoid that closing /dev/infiniband/umadn or /dev/infiniband/issmn
triggers a use-after-free. __fput() in fs/file_table.c invokes
f_op-release() before it invokes cdev_put(). Make sure that the
ib_umad_device structure is freed by the cdev_put() call instead of
f_op-release(). This avoids that changing the port mode from IB into
Ethernet and back to IB followed by restarting opensmd triggers the
following kernel oops:

general protection fault:  [#1] PREEMPT SMP
RIP: 0010:[810cc65c]  [810cc65c] module_put+0x2c/0x170
Call Trace:
 [81190f20] cdev_put+0x20/0x30
 [8118e2ce] __fput+0x1ae/0x1f0
 [8118e35e] fput+0xe/0x10
 [810723bc] task_work_run+0xac/0xe0
 [81002a9f] do_notify_resume+0x9f/0xc0
 [814b8398] int_signal+0x12/0x17

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Alex Chiang achi...@canonical.com
Cc: Yann Droneaud ydrone...@opteya.com
Cc: sta...@vger.kernel.org
---
 drivers/infiniband/core/user_mad.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c 
b/drivers/infiniband/core/user_mad.c
index 2b3dfcc..4ac0d42 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -98,7 +98,7 @@ struct ib_umad_port {
 
 struct ib_umad_device {
int  start_port, end_port;
-   struct kref  ref;
+   struct kobject   kobj;
struct ib_umad_port  port[0];
 };
 
@@ -134,14 +134,18 @@ static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
 static void ib_umad_add_one(struct ib_device *device);
 static void ib_umad_remove_one(struct ib_device *device);
 
-static void ib_umad_release_dev(struct kref *ref)
+static void ib_umad_release_dev(struct kobject *kobj)
 {
struct ib_umad_device *dev =
-   container_of(ref, struct ib_umad_device, ref);
+   container_of(kobj, struct ib_umad_device, kobj);
 
kfree(dev);
 }
 
+static struct kobj_type ib_umad_dev_ktype = {
+   .release = ib_umad_release_dev,
+};
+
 static int hdr_size(struct ib_umad_file *file)
 {
return file-use_pkey_index ? sizeof (struct ib_user_mad_hdr) :
@@ -810,7 +814,7 @@ static int ib_umad_open(struct inode *inode, struct file 
*filp)
if (ret)
goto del;
 
-   kref_get(port-umad_dev-ref);
+   kobject_get(port-umad_dev-kobj);
 
 out:
mutex_unlock(port-file_mutex);
@@ -855,7 +859,7 @@ static int ib_umad_close(struct inode *inode, struct file 
*filp)
mutex_unlock(file-port-file_mutex);
 
kfree(file);
-   kref_put(dev-ref, ib_umad_release_dev);
+   kobject_put(dev-kobj);
 
return 0;
 }
@@ -906,7 +910,7 @@ static int ib_umad_sm_open(struct inode *inode, struct file 
*filp)
if (ret)
goto clr_sm_cap;
 
-   kref_get(port-umad_dev-ref);
+   kobject_get(port-umad_dev-kobj);
 
 out:
return ret;
@@ -935,7 +939,7 @@ static int ib_umad_sm_close(struct inode *inode, struct 
file *filp)
 
up(port-sm_sem);
 
-   kref_put(port-umad_dev-ref, ib_umad_release_dev);
+   kobject_put(port-umad_dev-kobj);
 
return ret;
 }
@@ -1003,6 +1007,7 @@ static int find_overflow_devnum(void)
 }
 
 static int ib_umad_init_port(struct ib_device *device, int port_num,
+struct ib_umad_device *umad_dev,
 struct ib_umad_port *port)
 {
int devnum;
@@ -1035,6 +1040,7 @@ static int ib_umad_init_port(struct ib_device *device, 
int port_num,
 
cdev_init(port-cdev, umad_fops);
port-cdev.owner = THIS_MODULE;
+   port-cdev.kobj.parent = umad_dev-kobj;
kobject_set_name(port-cdev.kobj, umad%d, port-dev_num);
if (cdev_add(port-cdev, base, 1))
goto err_cdev;
@@ -1053,6 +1059,7 @@ static int ib_umad_init_port(struct ib_device *device, 
int port_num,
base += IB_UMAD_MAX_PORTS;
cdev_init(port-sm_cdev, umad_sm_fops);
port-sm_cdev.owner = THIS_MODULE;
+   port-sm_cdev.kobj.parent = umad_dev-kobj;
kobject_set_name(port-sm_cdev.kobj, issm%d, port-dev_num);
if (cdev_add(port-sm_cdev, base, 1))
goto err_sm_cdev;
@@ -1146,7 +1153,7 @@ static void ib_umad_add_one(struct ib_device *device)
if (!umad_dev)
return;
 
-   kref_init(umad_dev-ref);
+   kobject_init(umad_dev-kobj, ib_umad_dev_ktype);
 
umad_dev-start_port = s;
umad_dev-end_port   = e;
@@ -1154,7 +1161,8 @@ static void ib_umad_add_one(struct ib_device *device)
for (i = s; i = e; ++i) {
umad_dev-port[i - s].umad_dev = umad_dev;
 
-   if (ib_umad_init_port(device, i, umad_dev-port[i - s]))
+   if (ib_umad_init_port(device, i, umad_dev,
+ umad_dev-port[i - s]))
goto 

Re: [PATCH v3 2/2] IB/umad: Fix a use-after-free

2014-05-16 Thread Yann Droneaud
Le vendredi 16 mai 2014 à 13:05 +0200, Bart Van Assche a écrit :
 Avoid that closing /dev/infiniband/umadn or /dev/infiniband/issmn
 triggers a use-after-free. __fput() in fs/file_table.c invokes
 f_op-release() before it invokes cdev_put(). Make sure that the
 ib_umad_device structure is freed by the cdev_put() call instead of
 f_op-release(). This avoids that changing the port mode from IB into
 Ethernet and back to IB followed by restarting opensmd triggers the
 following kernel oops:
 
 general protection fault:  [#1] PREEMPT SMP
 RIP: 0010:[810cc65c]  [810cc65c] module_put+0x2c/0x170
 Call Trace:
  [81190f20] cdev_put+0x20/0x30
  [8118e2ce] __fput+0x1ae/0x1f0
  [8118e35e] fput+0xe/0x10
  [810723bc] task_work_run+0xac/0xe0
  [81002a9f] do_notify_resume+0x9f/0xc0
  [814b8398] int_signal+0x12/0x17
 
 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
 Signed-off-by: Bart Van Assche bvanass...@acm.org
 Cc: Alex Chiang achi...@canonical.com
 Cc: Yann Droneaud ydrone...@opteya.com
 Cc: sta...@vger.kernel.org
 ---
  drivers/infiniband/core/user_mad.c | 30 +++---
  1 file changed, 19 insertions(+), 11 deletions(-)
 
 diff --git a/drivers/infiniband/core/user_mad.c 
 b/drivers/infiniband/core/user_mad.c
 index 2b3dfcc..4ac0d42 100644
 --- a/drivers/infiniband/core/user_mad.c
 +++ b/drivers/infiniband/core/user_mad.c
 @@ -98,7 +98,7 @@ struct ib_umad_port {
  
  struct ib_umad_device {
   int  start_port, end_port;
 - struct kref  ref;
 + struct kobject   kobj;
   struct ib_umad_port  port[0];
  };
  
 @@ -134,14 +134,18 @@ static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
  static void ib_umad_add_one(struct ib_device *device);
  static void ib_umad_remove_one(struct ib_device *device);
  
 -static void ib_umad_release_dev(struct kref *ref)
 +static void ib_umad_release_dev(struct kobject *kobj)
  {
   struct ib_umad_device *dev =
 - container_of(ref, struct ib_umad_device, ref);
 + container_of(kobj, struct ib_umad_device, kobj);
  
   kfree(dev);
  }
  
 +static struct kobj_type ib_umad_dev_ktype = {
 + .release = ib_umad_release_dev,
 +};
 +
  static int hdr_size(struct ib_umad_file *file)
  {
   return file-use_pkey_index ? sizeof (struct ib_user_mad_hdr) :
 @@ -810,7 +814,7 @@ static int ib_umad_open(struct inode *inode, struct file 
 *filp)
   if (ret)
   goto del;
  
 - kref_get(port-umad_dev-ref);
 + kobject_get(port-umad_dev-kobj);
  
  out:
   mutex_unlock(port-file_mutex);
 @@ -855,7 +859,7 @@ static int ib_umad_close(struct inode *inode, struct file 
 *filp)
   mutex_unlock(file-port-file_mutex);
  
   kfree(file);
 - kref_put(dev-ref, ib_umad_release_dev);
 + kobject_put(dev-kobj);
  
   return 0;
  }
 @@ -906,7 +910,7 @@ static int ib_umad_sm_open(struct inode *inode, struct 
 file *filp)
   if (ret)
   goto clr_sm_cap;
  
 - kref_get(port-umad_dev-ref);
 + kobject_get(port-umad_dev-kobj);
  
  out:
   return ret;
 @@ -935,7 +939,7 @@ static int ib_umad_sm_close(struct inode *inode, struct 
 file *filp)
  
   up(port-sm_sem);
  
 - kref_put(port-umad_dev-ref, ib_umad_release_dev);
 + kobject_put(port-umad_dev-kobj);
  
   return ret;
  }
 @@ -1003,6 +1007,7 @@ static int find_overflow_devnum(void)
  }
  
  static int ib_umad_init_port(struct ib_device *device, int port_num,
 +  struct ib_umad_device *umad_dev,
struct ib_umad_port *port)
  {
   int devnum;
 @@ -1035,6 +1040,7 @@ static int ib_umad_init_port(struct ib_device *device, 
 int port_num,
  
   cdev_init(port-cdev, umad_fops);
   port-cdev.owner = THIS_MODULE;
 + port-cdev.kobj.parent = umad_dev-kobj;
   kobject_set_name(port-cdev.kobj, umad%d, port-dev_num);
   if (cdev_add(port-cdev, base, 1))
   goto err_cdev;
 @@ -1053,6 +1059,7 @@ static int ib_umad_init_port(struct ib_device *device, 
 int port_num,
   base += IB_UMAD_MAX_PORTS;
   cdev_init(port-sm_cdev, umad_sm_fops);
   port-sm_cdev.owner = THIS_MODULE;
 + port-sm_cdev.kobj.parent = umad_dev-kobj;
   kobject_set_name(port-sm_cdev.kobj, issm%d, port-dev_num);
   if (cdev_add(port-sm_cdev, base, 1))
   goto err_sm_cdev;
 @@ -1146,7 +1153,7 @@ static void ib_umad_add_one(struct ib_device *device)
   if (!umad_dev)
   return;
  
 - kref_init(umad_dev-ref);
 + kobject_init(umad_dev-kobj, ib_umad_dev_ktype);
  
   umad_dev-start_port = s;
   umad_dev-end_port   = e;
 @@ -1154,7 +1161,8 @@ static void ib_umad_add_one(struct ib_device *device)
   for (i = s; i = e; ++i) {
   umad_dev-port[i - s].umad_dev = umad_dev;
  
 - if (ib_umad_init_port(device, i, umad_dev-port[i - s]))
 + if (ib_umad_init_port(device, i, 

[PATCH v3 0/9] SRP initiator patches for kernel 3.16

2014-05-16 Thread Bart Van Assche
Changes compared to v2:
- Reconnect to the SRP target if a local invalidation work request
  fails.
- Swapped the state-next_fmr / next_fr assignments to improve code
  readability.
- Clarified a comment in patch 1/9.
- Fixed error handling in srp_create_target() (was broken in v2).
- Added a missing PFX in two shost_printk() statements in patch 9/9.

Changes compared to v1:
- Modified the FMR code such that one FMR pool is allocated per
  connection instead of one pool per HCA.
- Dropped the patch Make srp_alloc_req_data() reallocate request data.
- Moved introduction of the register_always kernel module parameter
  into a separate patch.
- Removed the loop from around ib_create_fmr_pool() and
  srp_create_fr_pool(). max_pages_per_mr is now computed from
  max_mr_size and max_fast_reg_page_list_len.
- Reduced fast registration pool size from 1024 to scsi_host-can_queue.
- Added a patch that should fix a crash that had been reported by Sagi
  but that I have not yet been able to reproduce myself.

This patch series consists of the following nine patches:

0001-IB-srp-Fix-a-sporadic-crash-triggered-by-cable-pulli.patch
0002-IB-srp-Fix-kernel-doc-warnings.patch
0003-IB-srp-Introduce-an-additional-local-variable.patch
0004-IB-srp-Introduce-srp_map_fmr.patch
0005-IB-srp-Introduce-srp_finish_mapping.patch
0006-IB-srp-Introduce-the-register_always-kernel-module-p.patch
0007-IB-srp-One-FMR-pool-per-SRP-connection.patch
0008-IB-srp-Rename-FMR-related-variables.patch
0009-IB-srp-Add-fast-registration-support.patch

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/9] IB/srp: Fix a sporadic crash triggered by cable pulling

2014-05-16 Thread Bart Van Assche
Avoid that the loops that iterate over the request ring can
encounter a pointer to a SCSI command in req-scmnd that is
no longer associated with that request. If the function
srp_unmap_data() is invoked twice for a SCSI command that is
not in flight then that would cause ib_fmr_pool_unmap() to
be invoked with an invalid pointer as argument, resulting in
a kernel oops.

Reported by: Sagi Grimberg sa...@mellanox.com
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
Cc: stable sta...@vger.kernel.org
---
 drivers/infiniband/ulp/srp/ib_srp.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 66a908b..5b2bed8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1594,6 +1594,12 @@ err_unmap:
 err_iu:
srp_put_tx_iu(target, iu, SRP_IU_CMD);
 
+   /*
+* Avoid that the loops that iterate over the request ring can
+* encounter a dangling SCSI command pointer.
+*/
+   req-scmnd = NULL;
+
spin_lock_irqsave(target-lock, flags);
list_add(req-list, target-free_reqs);
 
-- 
1.8.4.5


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 9/9] IB/srp: Add fast registration support

2014-05-16 Thread Bart Van Assche
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.

In function srp_rport_reconnect(), move the the srp_finish_req()
loop from after to before the srp_create_target_ib() call. This is
needed to avoid that srp_finish_req() tries to queue any
invalidation requests for rkeys associated with the old queue pair
on the newly allocated queue pair. Invoking srp_finish_req() before
the queue pair has been reallocated is safe since srp_claim_req()
handles completions correctly that arrive after srp_finish_req()
has been invoked.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 457 +---
 drivers/infiniband/ulp/srp/ib_srp.h |  74 +-
 2 files changed, 444 insertions(+), 87 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 32ec11c..d24eeed 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -66,6 +66,7 @@ static unsigned int srp_sg_tablesize;
 static unsigned int cmd_sg_entries;
 static unsigned int indirect_sg_entries;
 static bool allow_ext_sg;
+static bool prefer_fr;
 static bool register_always;
 static int topspin_workarounds = 1;
 
@@ -88,6 +89,10 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 Enable workarounds for Topspin/Cisco SRP target bugs if != 
0);
 
+module_param(prefer_fr, bool, 0444);
+MODULE_PARM_DESC(prefer_fr,
+Whether to use fast registration if both FMR and fast registration are 
supported);
+
 module_param(register_always, bool, 0444);
 MODULE_PARM_DESC(register_always,
 Use memory registration even for contiguous memory regions);
@@ -311,6 +316,132 @@ static struct ib_fmr_pool *srp_alloc_fmr_pool(struct 
srp_target_port *target)
return ib_create_fmr_pool(dev-pd, fmr_param);
 }
 
+/**
+ * srp_destroy_fr_pool() - free the resources owned by a pool
+ * @pool: Fast registration pool to be destroyed.
+ */
+static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
+{
+   int i;
+   struct srp_fr_desc *d;
+
+   if (!pool)
+   return;
+
+   for (i = 0, d = pool-desc[0]; i  pool-size; i++, d++) {
+   if (d-frpl)
+   ib_free_fast_reg_page_list(d-frpl);
+   if (d-mr)
+   ib_dereg_mr(d-mr);
+   }
+   kfree(pool);
+}
+
+/**
+ * srp_create_fr_pool() - allocate and initialize a pool for fast registration
+ * @device:IB device to allocate fast registration descriptors for.
+ * @pd:Protection domain associated with the FR descriptors.
+ * @pool_size: Number of descriptors to allocate.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ */
+static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
+ struct ib_pd *pd, int pool_size,
+ int max_page_list_len)
+{
+   struct srp_fr_pool *pool;
+   struct srp_fr_desc *d;
+   struct ib_mr *mr;
+   struct ib_fast_reg_page_list *frpl;
+   int i, ret = -EINVAL;
+
+   if (pool_size = 0)
+   goto err;
+   ret = -ENOMEM;
+   pool = kzalloc(sizeof(struct srp_fr_pool) +
+  pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
+   if (!pool)
+   goto err;
+   pool-size = pool_size;
+   pool-max_page_list_len = max_page_list_len;
+   spin_lock_init(pool-lock);
+   INIT_LIST_HEAD(pool-free_list);
+
+   for (i = 0, d = pool-desc[0]; i  pool-size; i++, d++) {
+   mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
+   if (IS_ERR(mr)) {
+   ret = PTR_ERR(mr);
+   goto destroy_pool;
+   }
+   d-mr = mr;
+   frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
+   if (IS_ERR(frpl)) {
+   ret = PTR_ERR(frpl);
+   goto destroy_pool;
+   }
+   d-frpl = frpl;
+   list_add_tail(d-entry, pool-free_list);
+   }
+
+out:
+   return pool;
+
+destroy_pool:
+   srp_destroy_fr_pool(pool);
+
+err:
+   pool = ERR_PTR(ret);
+   goto out;
+}
+
+/**
+ * srp_fr_pool_get() - obtain a descriptor suitable for fast registration
+ * @pool: Pool to obtain descriptor from.
+ */
+static struct srp_fr_desc *srp_fr_pool_get(struct srp_fr_pool *pool)
+{
+   struct srp_fr_desc *d = NULL;
+   unsigned long flags;
+
+   spin_lock_irqsave(pool-lock, flags);
+   if 

[PATCH v3 8/9] IB/srp: Rename FMR-related variables

2014-05-16 Thread Bart Van Assche
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality. The start of this patch was
the changes applied via the following shell command:

sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g; \
s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g; \
s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g; \
s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g; \
s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 48 ++---
 drivers/infiniband/ulp/srp/ib_srp.h | 16 ++---
 2 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 72e3bf0..32ec11c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -302,8 +302,8 @@ static struct ib_fmr_pool *srp_alloc_fmr_pool(struct 
srp_target_port *target)
fmr_param.pool_size = target-scsi_host-can_queue;
fmr_param.dirty_watermark   = fmr_param.pool_size / 4;
fmr_param.cache = 1;
-   fmr_param.max_pages_per_fmr = dev-max_pages_per_fmr;
-   fmr_param.page_shift= ilog2(dev-fmr_page_size);
+   fmr_param.max_pages_per_fmr = dev-max_pages_per_mr;
+   fmr_param.page_shift= ilog2(dev-mr_page_size);
fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
   IB_ACCESS_REMOTE_WRITE |
   IB_ACCESS_REMOTE_READ);
@@ -659,7 +659,7 @@ static int srp_alloc_req_data(struct srp_target_port 
*target)
req = target-req_ring[i];
req-fmr_list = kmalloc(target-cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
-   req-map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
+   req-map_page = kmalloc(SRP_MAX_PAGES_PER_MR * sizeof(void *),
GFP_KERNEL);
req-indirect_desc = kmalloc(target-indirect_size, GFP_KERNEL);
if (!req-fmr_list || !req-map_page || !req-indirect_desc)
@@ -812,7 +812,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
return;
 
pfmr = req-fmr_list;
-   while (req-nfmr--)
+   while (req-nmdesc--)
ib_fmr_pool_unmap(*pfmr++);
 
ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
@@ -981,9 +981,9 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
return PTR_ERR(fmr);
 
*state-next_fmr++ = fmr;
-   state-nfmr++;
+   state-nmdesc++;
 
-   srp_map_desc(state, 0, state-fmr_len, fmr-fmr-rkey);
+   srp_map_desc(state, 0, state-dma_len, fmr-fmr-rkey);
 
return 0;
 }
@@ -997,14 +997,14 @@ static int srp_finish_mapping(struct srp_map_state *state,
return 0;
 
if (state-npages == 1  !register_always)
-   srp_map_desc(state, state-base_dma_addr, state-fmr_len,
+   srp_map_desc(state, state-base_dma_addr, state-dma_len,
 target-rkey);
else
ret = srp_map_finish_fmr(state, target);
 
if (ret == 0) {
state-npages = 0;
-   state-fmr_len = 0;
+   state-dma_len = 0;
}
 
return ret;
@@ -1049,7 +1049,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 * that were never quite defined, but went away when the initiator
 * avoided using FMR on such page fragments.
 */
-   if (dma_addr  ~dev-fmr_page_mask || dma_len  dev-fmr_max_size) {
+   if (dma_addr  ~dev-mr_page_mask || dma_len  dev-mr_max_size) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1068,7 +1068,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
 
while (dma_len) {
-   if (state-npages == SRP_FMR_SIZE) {
+   if (state-npages == SRP_MAX_PAGES_PER_MR) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1076,12 +1076,12 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
}
 
-   len = min_t(unsigned int, dma_len, dev-fmr_page_size);
+

[PATCH v3 4/9] IB/srp: Introduce srp_map_fmr()

2014-05-16 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Reviewed-by: Sagi Grimberg sa...@mellanox.com
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 77 ++---
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 281f785..be84c94 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1047,12 +1047,54 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return ret;
 }
 
+static void srp_map_fmr(struct srp_map_state *state,
+   struct srp_target_port *target, struct srp_request *req,
+   struct scatterlist *scat, int count)
+{
+   struct srp_device *dev = target-srp_host-srp_dev;
+   struct ib_device *ibdev = dev-dev;
+   struct scatterlist *sg;
+   int i, use_fmr;
+
+   state-desc = req-indirect_desc;
+   state-pages= req-map_page;
+   state-next_fmr = req-fmr_list;
+
+   use_fmr = dev-fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+
+   for_each_sg(scat, sg, count, i) {
+   if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
+   /* FMR mapping failed, so backtrack to the first
+* unmapped entry and continue on without using FMR.
+*/
+   dma_addr_t dma_addr;
+   unsigned int dma_len;
+
+backtrack:
+   sg = state-unmapped_sg;
+   i = state-unmapped_index;
+
+   dma_addr = ib_sg_dma_address(ibdev, sg);
+   dma_len = ib_sg_dma_len(ibdev, sg);
+   dma_len -= (state-unmapped_addr - dma_addr);
+   dma_addr = state-unmapped_addr;
+   use_fmr = SRP_MAP_NO_FMR;
+   srp_map_desc(state, dma_addr, dma_len, target-rkey);
+   }
+   }
+
+   if (use_fmr == SRP_MAP_ALLOW_FMR  srp_map_finish_fmr(state, target))
+   goto backtrack;
+
+   req-nfmr = state-nfmr;
+}
+
 static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port 
*target,
struct srp_request *req)
 {
-   struct scatterlist *scat, *sg;
+   struct scatterlist *scat;
struct srp_cmd *cmd = req-cmd-buf;
-   int i, len, nents, count, use_fmr;
+   int len, nents, count;
struct srp_device *dev;
struct ib_device *ibdev;
struct srp_map_state state;
@@ -,35 +1153,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct 
srp_target_port *target,
   target-indirect_size, DMA_TO_DEVICE);
 
memset(state, 0, sizeof(state));
-   state.desc  = req-indirect_desc;
-   state.pages = req-map_page;
-   state.next_fmr  = req-fmr_list;
-
-   use_fmr = dev-fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
-
-   for_each_sg(scat, sg, count, i) {
-   if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
-   /* FMR mapping failed, so backtrack to the first
-* unmapped entry and continue on without using FMR.
-*/
-   dma_addr_t dma_addr;
-   unsigned int dma_len;
-
-backtrack:
-   sg = state.unmapped_sg;
-   i = state.unmapped_index;
-
-   dma_addr = ib_sg_dma_address(ibdev, sg);
-   dma_len = ib_sg_dma_len(ibdev, sg);
-   dma_len -= (state.unmapped_addr - dma_addr);
-   dma_addr = state.unmapped_addr;
-   use_fmr = SRP_MAP_NO_FMR;
-   srp_map_desc(state, dma_addr, dma_len, target-rkey);
-   }
-   }
-
-   if (use_fmr == SRP_MAP_ALLOW_FMR  srp_map_finish_fmr(state, target))
-   goto backtrack;
+   srp_map_fmr(state, target, req, scat, count);
 
/* We've mapped the request, now pull as much of the indirect
 * descriptor table as we can into the command buffer. If this
@@ -1147,7 +1161,6 @@ backtrack:
 * guaranteed to fit into the command, as the SCSI layer won't
 * give us more S/G entries than we allow.
 */
-   req-nfmr = state.nfmr;
if (state.ndesc == 1) {
/* FMR mapping was able to collapse this to one entry,
 * so use a direct descriptor.
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 6/9] IB/srp: Introduce the 'register_always' kernel module parameter

2014-05-16 Thread Bart Van Assche
Add a kernel module parameter that enables memory registration also for SG-lists
that can be processed without memory registration. This makes it easier for 
kernel
developers to test the memory registration code.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index c2e0ad3..77ba965 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -66,6 +66,7 @@ static unsigned int srp_sg_tablesize;
 static unsigned int cmd_sg_entries;
 static unsigned int indirect_sg_entries;
 static bool allow_ext_sg;
+static bool register_always;
 static int topspin_workarounds = 1;
 
 module_param(srp_sg_tablesize, uint, 0444);
@@ -87,6 +88,10 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 Enable workarounds for Topspin/Cisco SRP target bugs if != 
0);
 
+module_param(register_always, bool, 0444);
+MODULE_PARM_DESC(register_always,
+Use memory registration even for contiguous memory regions);
+
 static struct kernel_param_ops srp_tmo_ops;
 
 static int srp_reconnect_delay = 10;
@@ -956,7 +961,7 @@ static int srp_finish_mapping(struct srp_map_state *state,
if (state-npages == 0)
return 0;
 
-   if (state-npages == 1)
+   if (state-npages == 1  !register_always)
srp_map_desc(state, state-base_dma_addr, state-fmr_len,
 target-rkey);
else
@@ -1138,7 +1143,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct 
srp_target_port *target,
fmt = SRP_DATA_DESC_DIRECT;
len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf);
 
-   if (count == 1) {
+   if (count == 1  !register_always) {
/*
 * The midlayer only generated a single gather/scatter
 * entry, or DMA mapping coalesced everything to a
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/9] IB/srp: Introduce srp_finish_mapping()

2014-05-16 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 42 -
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index be84c94..c2e0ad3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -935,16 +935,6 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
 
-   if (!state-npages)
-   return 0;
-
-   if (state-npages == 1) {
-   srp_map_desc(state, state-base_dma_addr, state-fmr_len,
-target-rkey);
-   state-npages = state-fmr_len = 0;
-   return 0;
-   }
-
fmr = ib_fmr_pool_map_phys(dev-fmr_pool, state-pages,
   state-npages, io_addr);
if (IS_ERR(fmr))
@@ -954,10 +944,32 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
state-nfmr++;
 
srp_map_desc(state, 0, state-fmr_len, fmr-fmr-rkey);
-   state-npages = state-fmr_len = 0;
+
return 0;
 }
 
+static int srp_finish_mapping(struct srp_map_state *state,
+ struct srp_target_port *target)
+{
+   int ret = 0;
+
+   if (state-npages == 0)
+   return 0;
+
+   if (state-npages == 1)
+   srp_map_desc(state, state-base_dma_addr, state-fmr_len,
+target-rkey);
+   else
+   ret = srp_map_finish_fmr(state, target);
+
+   if (ret == 0) {
+   state-npages = 0;
+   state-fmr_len = 0;
+   }
+
+   return ret;
+}
+
 static void srp_map_update_start(struct srp_map_state *state,
 struct scatterlist *sg, int sg_index,
 dma_addr_t dma_addr)
@@ -998,7 +1010,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 * avoided using FMR on such page fragments.
 */
if (dma_addr  ~dev-fmr_page_mask || dma_len  dev-fmr_max_size) {
-   ret = srp_map_finish_fmr(state, target);
+   ret = srp_finish_mapping(state, target);
if (ret)
return ret;
 
@@ -1017,7 +1029,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 
while (dma_len) {
if (state-npages == SRP_FMR_SIZE) {
-   ret = srp_map_finish_fmr(state, target);
+   ret = srp_finish_mapping(state, target);
if (ret)
return ret;
 
@@ -1040,7 +1052,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 */
ret = 0;
if (len != dev-fmr_page_size) {
-   ret = srp_map_finish_fmr(state, target);
+   ret = srp_finish_mapping(state, target);
if (!ret)
srp_map_update_start(state, NULL, 0, 0);
}
@@ -1083,7 +1095,7 @@ backtrack:
}
}
 
-   if (use_fmr == SRP_MAP_ALLOW_FMR  srp_map_finish_fmr(state, target))
+   if (use_fmr == SRP_MAP_ALLOW_FMR  srp_finish_mapping(state, target))
goto backtrack;
 
req-nfmr = state-nfmr;
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/9] IB/srp: Introduce an additional local variable

2014-05-16 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Reviewed-by: Sagi Grimberg sa...@mellanox.com
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 4f8be37..281f785 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -290,6 +290,7 @@ static int srp_new_cm_id(struct srp_target_port *target)
 
 static int srp_create_target_ib(struct srp_target_port *target)
 {
+   struct srp_device *dev = target-srp_host-srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
@@ -299,16 +300,14 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
if (!init_attr)
return -ENOMEM;
 
-   recv_cq = ib_create_cq(target-srp_host-srp_dev-dev,
-  srp_recv_completion, NULL, target,
+   recv_cq = ib_create_cq(dev-dev, srp_recv_completion, NULL, target,
   target-queue_size, target-comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
 
-   send_cq = ib_create_cq(target-srp_host-srp_dev-dev,
-  srp_send_completion, NULL, target,
+   send_cq = ib_create_cq(dev-dev, srp_send_completion, NULL, target,
   target-queue_size, target-comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
@@ -327,7 +326,7 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
init_attr-send_cq = send_cq;
init_attr-recv_cq = recv_cq;
 
-   qp = ib_create_qp(target-srp_host-srp_dev-pd, init_attr);
+   qp = ib_create_qp(dev-pd, init_attr);
if (IS_ERR(qp)) {
ret = PTR_ERR(qp);
goto err_send_cq;
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth

2014-05-16 Thread Steve Wise
I guess the client code doesn't verify that the device supports the 
chosen memreg mode.  That's not good.   Lemme fix this and respin this 
patch.



On 5/16/2014 2:08 AM, Devesh Sharma wrote:

Chuck

This patch is causing a CPU soft-lockup if underlying vendor reports 
devattr.max_fast_reg_pagr_list_len = 0 and ia-ri_memreg_strategy = FRMR 
(Default option).
I think there is need to refer to device capability flags. If strategy = FRMR 
is forced and devattr.max_fast_reg_pagr_list_len=0 then flash an error and fail 
RPC with -EIO.

See inline:


-Original Message-
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
ow...@vger.kernel.org] On Behalf Of Chuck Lever
Sent: Thursday, May 01, 2014 1:00 AM
To: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
Cc: anna.schuma...@netapp.com
Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
page list depth

From: Steve Wise sw...@opengridcomputing.com

Some rdma devices don't support a fast register page list depth of at least
RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
regions according to the minimum of the device max supported depth or
RPCRDMA_MAX_DATA_SEGS.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Reviewed-by: Chuck Lever chuck.le...@oracle.com
---

  net/sunrpc/xprtrdma/rpc_rdma.c  |4 ---
  net/sunrpc/xprtrdma/verbs.c |   47 +-
-
  net/sunrpc/xprtrdma/xprt_rdma.h |1 +
  3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
xdr_buf *target,
/* success. all failures return above */
req-rl_nchunks = nchunks;

-   BUG_ON(nchunks == 0);
-   BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
-   (nchunks  3));
-
/*
 * finish off header. If write, marshal discrim and nchunks.
 */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9372656..55fb09a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
sockaddr *addr, int memreg)
__func__);
memreg = RPCRDMA_REGISTER;
  #endif
+   } else {
+   /* Mind the ia limit on FRMR page list depth */
+   ia-ri_max_frmr_depth = min_t(unsigned int,
+   RPCRDMA_MAX_DATA_SEGS,
+   devattr.max_fast_reg_page_list_len);
}
break;
}
@@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
rpcrdma_ia *ia,
ep-rep_attr.srq = NULL;
ep-rep_attr.cap.max_send_wr = cdata-max_requests;
switch (ia-ri_memreg_strategy) {
-   case RPCRDMA_FRMR:
+   case RPCRDMA_FRMR: {
+   int depth = 7;
+
/* Add room for frmr register and invalidate WRs.
 * 1. FRMR reg WR for head
 * 2. FRMR invalidate WR for head
-* 3. FRMR reg WR for pagelist
-* 4. FRMR invalidate WR for pagelist
+* 3. N FRMR reg WRs for pagelist
+* 4. N FRMR invalidate WRs for pagelist
 * 5. FRMR reg WR for tail
 * 6. FRMR invalidate WR for tail
 * 7. The RDMA_SEND WR
 */
-   ep-rep_attr.cap.max_send_wr *= 7;
+
+   /* Calculate N if the device max FRMR depth is smaller than
+* RPCRDMA_MAX_DATA_SEGS.
+*/
+   if (ia-ri_max_frmr_depth  RPCRDMA_MAX_DATA_SEGS) {
+   int delta = RPCRDMA_MAX_DATA_SEGS -
+   ia-ri_max_frmr_depth;
+
+   do {
+   depth += 2; /* FRMR reg + invalidate */
+   delta -= ia-ri_max_frmr_depth;

If ia-ri_max_frmr_depth is = 0. This loop becomes infinite loop.


+   } while (delta  0);
+
+   }
+   ep-rep_attr.cap.max_send_wr *= depth;
if (ep-rep_attr.cap.max_send_wr  devattr.max_qp_wr) {
-   cdata-max_requests = devattr.max_qp_wr / 7;
+   cdata-max_requests = devattr.max_qp_wr / depth;
if (!cdata-max_requests)
return -EINVAL;
-   ep-rep_attr.cap.max_send_wr = cdata-

max_requests * 7;

+   ep-rep_attr.cap.max_send_wr = cdata-

max_requests *

+  depth;
}
break;
+   }
case RPCRDMA_MEMWINDOWS_ASYNC:
case 

Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth

2014-05-16 Thread Steve Wise
By the way, Devesh:  Is the device advertising FRMR support, yet setting 
the max page list len to zero?  That's a driver bug...




On 5/16/2014 9:10 AM, Steve Wise wrote:
I guess the client code doesn't verify that the device supports the 
chosen memreg mode.  That's not good.   Lemme fix this and respin this 
patch.



On 5/16/2014 2:08 AM, Devesh Sharma wrote:

Chuck

This patch is causing a CPU soft-lockup if underlying vendor reports 
devattr.max_fast_reg_pagr_list_len = 0 and ia-ri_memreg_strategy = 
FRMR (Default option).
I think there is need to refer to device capability flags. If 
strategy = FRMR is forced and devattr.max_fast_reg_pagr_list_len=0 
then flash an error and fail RPC with -EIO.


See inline:


-Original Message-
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
ow...@vger.kernel.org] On Behalf Of Chuck Lever
Sent: Thursday, May 01, 2014 1:00 AM
To: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
Cc: anna.schuma...@netapp.com
Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
page list depth

From: Steve Wise sw...@opengridcomputing.com

Some rdma devices don't support a fast register page list depth of 
at least

RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
regions according to the minimum of the device max supported depth or
RPCRDMA_MAX_DATA_SEGS.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Reviewed-by: Chuck Lever chuck.le...@oracle.com
---

  net/sunrpc/xprtrdma/rpc_rdma.c  |4 ---
  net/sunrpc/xprtrdma/verbs.c |   47 +-
-
  net/sunrpc/xprtrdma/xprt_rdma.h |1 +
  3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, 
struct

xdr_buf *target,
  /* success. all failures return above */
  req-rl_nchunks = nchunks;

-BUG_ON(nchunks == 0);
-BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
-(nchunks  3));
-
  /*
   * finish off header. If write, marshal discrim and nchunks.
   */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9372656..55fb09a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
sockaddr *addr, int memreg)
  __func__);
  memreg = RPCRDMA_REGISTER;
  #endif
+} else {
+/* Mind the ia limit on FRMR page list depth */
+ia-ri_max_frmr_depth = min_t(unsigned int,
+RPCRDMA_MAX_DATA_SEGS,
+devattr.max_fast_reg_page_list_len);
  }
  break;
  }
@@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
rpcrdma_ia *ia,
  ep-rep_attr.srq = NULL;
  ep-rep_attr.cap.max_send_wr = cdata-max_requests;
  switch (ia-ri_memreg_strategy) {
-case RPCRDMA_FRMR:
+case RPCRDMA_FRMR: {
+int depth = 7;
+
  /* Add room for frmr register and invalidate WRs.
   * 1. FRMR reg WR for head
   * 2. FRMR invalidate WR for head
- * 3. FRMR reg WR for pagelist
- * 4. FRMR invalidate WR for pagelist
+ * 3. N FRMR reg WRs for pagelist
+ * 4. N FRMR invalidate WRs for pagelist
   * 5. FRMR reg WR for tail
   * 6. FRMR invalidate WR for tail
   * 7. The RDMA_SEND WR
   */
-ep-rep_attr.cap.max_send_wr *= 7;
+
+/* Calculate N if the device max FRMR depth is smaller than
+ * RPCRDMA_MAX_DATA_SEGS.
+ */
+if (ia-ri_max_frmr_depth  RPCRDMA_MAX_DATA_SEGS) {
+int delta = RPCRDMA_MAX_DATA_SEGS -
+ia-ri_max_frmr_depth;
+
+do {
+depth += 2; /* FRMR reg + invalidate */
+delta -= ia-ri_max_frmr_depth;

If ia-ri_max_frmr_depth is = 0. This loop becomes infinite loop.


+} while (delta  0);
+
+}
+ep-rep_attr.cap.max_send_wr *= depth;
  if (ep-rep_attr.cap.max_send_wr  devattr.max_qp_wr) {
-cdata-max_requests = devattr.max_qp_wr / 7;
+cdata-max_requests = devattr.max_qp_wr / depth;
  if (!cdata-max_requests)
  return -EINVAL;
-ep-rep_attr.cap.max_send_wr = cdata-

max_requests * 7;

+ep-rep_attr.cap.max_send_wr = cdata-

max_requests *

+   depth;
  }
  break;
+}
  case RPCRDMA_MEMWINDOWS_ASYNC:
  case RPCRDMA_MEMWINDOWS:
  /* Add room for mw_binds+unbinds - overkill! */ @@ -
1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
struct rpcrdma_ep *ep,
  case RPCRDMA_FRMR:
  for (i = buf-rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
) {
 

[PATCH v3 2/9] IB/srp: Fix kernel-doc warnings

2014-05-16 Thread Bart Van Assche
Avoid that the kernel-doc tool warns about missing argument descriptions
for the ib_srp.[ch] source files.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Reviewed-by: Sagi Grimberg sa...@mellanox.com
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 5b2bed8..4f8be37 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -813,6 +813,10 @@ static struct scsi_cmnd *srp_claim_req(struct 
srp_target_port *target,
 
 /**
  * srp_free_req() - Unmap data and add request to the free request list.
+ * @target: SRP target port.
+ * @req:Request to be freed.
+ * @scmnd:  SCSI command associated with @req.
+ * @req_lim_delta: Amount to be added to @target-req_lim.
  */
 static void srp_free_req(struct srp_target_port *target,
 struct srp_request *req, struct scsi_cmnd *scmnd,
@@ -1455,6 +1459,7 @@ static void srp_handle_recv(struct srp_target_port 
*target, struct ib_wc *wc)
 
 /**
  * srp_tl_err_work() - handle a transport layer error
+ * @work: Work structure embedded in an SRP target port.
  *
  * Note: This function may get invoked before the rport has been created,
  * hence the target-rport test.
@@ -2316,6 +2321,8 @@ static struct class srp_class = {
 
 /**
  * srp_conn_unique() - check whether the connection to a target is unique
+ * @host:   SRP host.
+ * @target: SRP target port.
  */
 static bool srp_conn_unique(struct srp_host *host,
struct srp_target_port *target)
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 7/9] IB/srp: One FMR pool per SRP connection

2014-05-16 Thread Bart Van Assche
Allocate one FMR pool per SRP connection instead of one SRP pool
per HCA. This improves scalability of the SRP initiator.

Only request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM) but not after a permanent
mapping failure. This avoids that SCSI commands are retried
indefinitely if a permanent memory mapping failure occurs.

Tell the SCSI mid-layer to reduce queue depth temporarily in the
unlikely case where an application is queuing many requests with
more than max_pages_per_fmr sg-list elements.

For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow d...@thedillows.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 155 +---
 drivers/infiniband/ulp/srp/ib_srp.h |   6 +-
 2 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 77ba965..72e3bf0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -293,12 +293,31 @@ static int srp_new_cm_id(struct srp_target_port *target)
return 0;
 }
 
+static struct ib_fmr_pool *srp_alloc_fmr_pool(struct srp_target_port *target)
+{
+   struct srp_device *dev = target-srp_host-srp_dev;
+   struct ib_fmr_pool_param fmr_param;
+
+   memset(fmr_param, 0, sizeof(fmr_param));
+   fmr_param.pool_size = target-scsi_host-can_queue;
+   fmr_param.dirty_watermark   = fmr_param.pool_size / 4;
+   fmr_param.cache = 1;
+   fmr_param.max_pages_per_fmr = dev-max_pages_per_fmr;
+   fmr_param.page_shift= ilog2(dev-fmr_page_size);
+   fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
+  IB_ACCESS_REMOTE_WRITE |
+  IB_ACCESS_REMOTE_READ);
+
+   return ib_create_fmr_pool(dev-pd, fmr_param);
+}
+
 static int srp_create_target_ib(struct srp_target_port *target)
 {
struct srp_device *dev = target-srp_host-srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
+   struct ib_fmr_pool *fmr_pool = NULL;
int ret;
 
init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -341,6 +360,21 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
if (ret)
goto err_qp;
 
+   if (!target-qp || target-fmr_pool) {
+   fmr_pool = srp_alloc_fmr_pool(target);
+   if (IS_ERR(fmr_pool)) {
+   ret = PTR_ERR(fmr_pool);
+   shost_printk(KERN_WARNING, target-scsi_host, PFX
+FMR pool allocation failed (%d)\n, ret);
+   if (target-qp)
+   goto err_qp;
+   fmr_pool = NULL;
+   }
+   if (target-fmr_pool)
+   ib_destroy_fmr_pool(target-fmr_pool);
+   target-fmr_pool = fmr_pool;
+   }
+
if (target-qp)
ib_destroy_qp(target-qp);
if (target-recv_cq)
@@ -377,6 +411,8 @@ static void srp_free_target_ib(struct srp_target_port 
*target)
 {
int i;
 
+   if (target-fmr_pool)
+   ib_destroy_fmr_pool(target-fmr_pool);
ib_destroy_qp(target-qp);
ib_destroy_cq(target-send_cq);
ib_destroy_cq(target-recv_cq);
@@ -936,11 +972,10 @@ static void srp_map_desc(struct srp_map_state *state, 
dma_addr_t dma_addr,
 static int srp_map_finish_fmr(struct srp_map_state *state,
  struct srp_target_port *target)
 {
-   struct srp_device *dev = target-srp_host-srp_dev;
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
 
-   fmr = ib_fmr_pool_map_phys(dev-fmr_pool, state-pages,
+   fmr = ib_fmr_pool_map_phys(target-fmr_pool, state-pages,
   state-npages, io_addr);
if (IS_ERR(fmr))
return PTR_ERR(fmr);
@@ -1077,7 +1112,7 @@ static void srp_map_fmr(struct srp_map_state *state,
state-pages= req-map_page;
state-next_fmr = req-fmr_list;
 
-   use_fmr = dev-fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+   use_fmr = target-fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
 
for_each_sg(scat, sg, count, i) {
if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
@@ -1555,7 +1590,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, 
struct scsi_cmnd *scmnd)
struct srp_cmd *cmd;
struct ib_device *dev;
unsigned long flags;
-   int len, result;
+   int len, ret;
const bool in_scsi_eh = !in_interrupt()  current == 

Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth

2014-05-16 Thread Steve Wise
Looks like ocrdma does this.  See ocrdma_query_device().  It advertises 
IB_DEVICE_MEM_MGT_EXTENSIONS but sets max_fast_reg_page_list_len to 0.  
The Verbs spec sez if you advertise the mem extensions, then you need to 
support all of them.   Is this just a bug in the driver?  Or does it 
really not support FRMRs?


Steve.


On 5/16/2014 9:14 AM, Steve Wise wrote:
By the way, Devesh:  Is the device advertising FRMR support, yet 
setting the max page list len to zero?  That's a driver bug...




On 5/16/2014 9:10 AM, Steve Wise wrote:
I guess the client code doesn't verify that the device supports the 
chosen memreg mode.  That's not good.   Lemme fix this and respin 
this patch.



On 5/16/2014 2:08 AM, Devesh Sharma wrote:

Chuck

This patch is causing a CPU soft-lockup if underlying vendor reports 
devattr.max_fast_reg_pagr_list_len = 0 and ia-ri_memreg_strategy = 
FRMR (Default option).
I think there is need to refer to device capability flags. If 
strategy = FRMR is forced and devattr.max_fast_reg_pagr_list_len=0 
then flash an error and fail RPC with -EIO.


See inline:


-Original Message-
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
ow...@vger.kernel.org] On Behalf Of Chuck Lever
Sent: Thursday, May 01, 2014 1:00 AM
To: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
Cc: anna.schuma...@netapp.com
Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast 
register

page list depth

From: Steve Wise sw...@opengridcomputing.com

Some rdma devices don't support a fast register page list depth of 
at least

RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
regions according to the minimum of the device max supported depth or
RPCRDMA_MAX_DATA_SEGS.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Reviewed-by: Chuck Lever chuck.le...@oracle.com
---

  net/sunrpc/xprtrdma/rpc_rdma.c  |4 ---
  net/sunrpc/xprtrdma/verbs.c |   47 
+-

-
  net/sunrpc/xprtrdma/xprt_rdma.h |1 +
  3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, 
struct

xdr_buf *target,
  /* success. all failures return above */
  req-rl_nchunks = nchunks;

-BUG_ON(nchunks == 0);
-BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
-(nchunks  3));
-
  /*
   * finish off header. If write, marshal discrim and nchunks.
   */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9372656..55fb09a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
sockaddr *addr, int memreg)
  __func__);
  memreg = RPCRDMA_REGISTER;
  #endif
+} else {
+/* Mind the ia limit on FRMR page list depth */
+ia-ri_max_frmr_depth = min_t(unsigned int,
+RPCRDMA_MAX_DATA_SEGS,
+devattr.max_fast_reg_page_list_len);
  }
  break;
  }
@@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
rpcrdma_ia *ia,
  ep-rep_attr.srq = NULL;
  ep-rep_attr.cap.max_send_wr = cdata-max_requests;
  switch (ia-ri_memreg_strategy) {
-case RPCRDMA_FRMR:
+case RPCRDMA_FRMR: {
+int depth = 7;
+
  /* Add room for frmr register and invalidate WRs.
   * 1. FRMR reg WR for head
   * 2. FRMR invalidate WR for head
- * 3. FRMR reg WR for pagelist
- * 4. FRMR invalidate WR for pagelist
+ * 3. N FRMR reg WRs for pagelist
+ * 4. N FRMR invalidate WRs for pagelist
   * 5. FRMR reg WR for tail
   * 6. FRMR invalidate WR for tail
   * 7. The RDMA_SEND WR
   */
-ep-rep_attr.cap.max_send_wr *= 7;
+
+/* Calculate N if the device max FRMR depth is smaller than
+ * RPCRDMA_MAX_DATA_SEGS.
+ */
+if (ia-ri_max_frmr_depth  RPCRDMA_MAX_DATA_SEGS) {
+int delta = RPCRDMA_MAX_DATA_SEGS -
+ia-ri_max_frmr_depth;
+
+do {
+depth += 2; /* FRMR reg + invalidate */
+delta -= ia-ri_max_frmr_depth;

If ia-ri_max_frmr_depth is = 0. This loop becomes infinite loop.


+} while (delta  0);
+
+}
+ep-rep_attr.cap.max_send_wr *= depth;
  if (ep-rep_attr.cap.max_send_wr  devattr.max_qp_wr) {
-cdata-max_requests = devattr.max_qp_wr / 7;
+cdata-max_requests = devattr.max_qp_wr / depth;
  if (!cdata-max_requests)
  return -EINVAL;
-ep-rep_attr.cap.max_send_wr = cdata-

max_requests * 7;

+ep-rep_attr.cap.max_send_wr = cdata-

max_requests *

+   

[PATCH] iw_cxgb4: fix vlan support

2014-05-16 Thread Steve Wise
RDMA connections over a vlan interface don't work due to
import_ep() not using the correct egress device.

- use the real device in import_ep()

- use rdma_vlan_dev_real_dev() in get_real_dev().

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

 drivers/infiniband/hw/cxgb4/cm.c |   17 -
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 1f863a9..28114e6 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -47,6 +47,8 @@
 #include net/ip6_route.h
 #include net/addrconf.h
 
+#include rdma/ib_addr.h
+
 #include iw_cxgb4.h
 
 static char *states[] = {
@@ -341,10 +343,7 @@ static struct sk_buff *get_skb(struct sk_buff *skb, int 
len, gfp_t gfp)
 
 static struct net_device *get_real_dev(struct net_device *egress_dev)
 {
-   struct net_device *phys_dev = egress_dev;
-   if (egress_dev-priv_flags  IFF_802_1Q_VLAN)
-   phys_dev = vlan_dev_real_dev(egress_dev);
-   return phys_dev;
+   return rdma_vlan_dev_real_dev(egress_dev) ? : egress_dev;
 }
 
 static int our_interface(struct c4iw_dev *dev, struct net_device *egress_dev)
@@ -1746,16 +1745,16 @@ static int import_ep(struct c4iw_ep *ep, int iptype, 
__u8 *peer_ip,
if (!ep-l2t)
goto out;
ep-mtu = dst_mtu(dst);
-   ep-tx_chan = cxgb4_port_chan(n-dev);
-   ep-smac_idx = (cxgb4_port_viid(n-dev)  0x7F)  1;
+   ep-tx_chan = cxgb4_port_chan(pdev);
+   ep-smac_idx = (cxgb4_port_viid(pdev)  0x7F)  1;
step = cdev-rdev.lldi.ntxq /
cdev-rdev.lldi.nchan;
-   ep-txq_idx = cxgb4_port_idx(n-dev) * step;
-   ep-ctrlq_idx = cxgb4_port_idx(n-dev);
+   ep-txq_idx = cxgb4_port_idx(pdev) * step;
+   ep-ctrlq_idx = cxgb4_port_idx(pdev);
step = cdev-rdev.lldi.nrxq /
cdev-rdev.lldi.nchan;
ep-rss_qid = cdev-rdev.lldi.rxq_ids[
-   cxgb4_port_idx(n-dev) * step];
+   cxgb4_port_idx(pdev) * step];
 
if (clear_mpa_v1) {
ep-retry_with_mpa_v1 = 0;

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] drivers: net: ethernet: mellanox: mlx4: let mlx4 depend on SMP

2014-05-16 Thread Chen Gang
'struct irq_affinity_notify' and the related functions are only defined
when SMP enabled, so at present, mlx4 has to only run under SMP.

The related error (allmodconfig under unicore32):

CC [M]  drivers/net/ethernet/mellanox/mlx4/eq.o
  drivers/net/ethernet/mellanox/mlx4/eq.c:58: error: field ‘notify’ has 
incomplete type
  drivers/net/ethernet/mellanox/mlx4/eq.c: In function 
‘mlx4_irq_notifier_notify’:
  drivers/net/ethernet/mellanox/mlx4/eq.c:1094: error: type defaults to ‘int’ 
in declaration of ‘__mptr’
  drivers/net/ethernet/mellanox/mlx4/eq.c:1094: warning: initialization from 
incompatible pointer type
  drivers/net/ethernet/mellanox/mlx4/eq.c:1104: error: dereferencing pointer to 
incomplete type
  drivers/net/ethernet/mellanox/mlx4/eq.c: In function 
‘mlx4_release_irq_notifier’:
  drivers/net/ethernet/mellanox/mlx4/eq.c:: error: type defaults to ‘int’ 
in declaration of ‘__mptr’
  drivers/net/ethernet/mellanox/mlx4/eq.c:: warning: initialization from 
incompatible pointer type
  drivers/net/ethernet/mellanox/mlx4/eq.c: In function 
‘mlx4_assign_irq_notifier’:
  drivers/net/ethernet/mellanox/mlx4/eq.c:1133: error: implicit declaration of 
function ‘irq_set_affinity_notifier’
  make[5]: *** [drivers/net/ethernet/mellanox/mlx4/eq.o] Error 1
  make[4]: *** [drivers/net/ethernet/mellanox/mlx4] Error 2
  make[3]: *** [drivers/net/ethernet/mellanox] Error 2
  make[2]: *** [drivers/net/ethernet] Error 2
  make[1]: *** [drivers/net] Error 2
  make: *** [drivers] Error 2

Signed-off-by: Chen Gang gang.chen.5...@gmail.com
---
 drivers/infiniband/hw/mlx4/Kconfig | 2 +-
 drivers/net/ethernet/mellanox/mlx4/Kconfig | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/Kconfig 
b/drivers/infiniband/hw/mlx4/Kconfig
index fc01dea..e31e400 100644
--- a/drivers/infiniband/hw/mlx4/Kconfig
+++ b/drivers/infiniband/hw/mlx4/Kconfig
@@ -1,6 +1,6 @@
 config MLX4_INFINIBAND
tristate Mellanox ConnectX HCA support
-   depends on NETDEVICES  ETHERNET  PCI  INET
+   depends on NETDEVICES  ETHERNET  PCI  INET  SMP
select NET_VENDOR_MELLANOX
select MLX4_CORE
---help---
diff --git a/drivers/net/ethernet/mellanox/mlx4/Kconfig 
b/drivers/net/ethernet/mellanox/mlx4/Kconfig
index 1486ce9..a1f2380 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx4/Kconfig
@@ -4,7 +4,7 @@
 
 config MLX4_EN
tristate Mellanox Technologies 1/10/40Gbit Ethernet support
-   depends on PCI
+   depends on PCI  SMP
select MLX4_CORE
select PTP_1588_CLOCK
---help---
-- 
1.9.2.459.g68773ac
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drivers: net: ethernet: mellanox: mlx4: let mlx4 depend on SMP

2014-05-16 Thread David Miller
From: Chen Gang gang.chen.5...@gmail.com
Date: Sat, 17 May 2014 13:26:16 +0800

 'struct irq_affinity_notify' and the related functions are only defined
 when SMP enabled, so at present, mlx4 has to only run under SMP.
 
 The related error (allmodconfig under unicore32):

Making the entire driver depend upon SMP is not the answer,
other Mellanox developers said that a proper fix is pending
so please be patient.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drivers: net: ethernet: mellanox: mlx4: let mlx4 depend on SMP

2014-05-16 Thread Chen Gang
On 05/17/2014 01:36 PM, David Miller wrote:
 From: Chen Gang gang.chen.5...@gmail.com
 Date: Sat, 17 May 2014 13:26:16 +0800
 
 'struct irq_affinity_notify' and the related functions are only defined
 when SMP enabled, so at present, mlx4 has to only run under SMP.

 The related error (allmodconfig under unicore32):
 
 Making the entire driver depend upon SMP is not the answer,
 other Mellanox developers said that a proper fix is pending
 so please be patient.
 

OK, thank you for your information. I shall bypass it, and continue.

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html