[PATCH opensm] osm_sa_mad_ctrl.c: In sa_mad_ctrl_rcv_callback, improve 1A04 error log message

2014-04-11 Thread Hal Rosenstock

to include more info on incoming SA MAD

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/opensm/osm_sa_mad_ctrl.c b/opensm/osm_sa_mad_ctrl.c
index 902803e..61fcb0b 100644
--- a/opensm/osm_sa_mad_ctrl.c
+++ b/opensm/osm_sa_mad_ctrl.c
@@ -331,9 +331,12 @@ static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * 
p_madw, IN void *context,
if (p_sa_mad-sm_key != 0 
p_sa_mad-sm_key != p_ctrl-p_subn-opt.sa_key) {
OSM_LOG(p_ctrl-p_log, OSM_LOG_ERROR, ERR 1A04: 
-   Non-Zero SA MAD SM_Key: 0x% PRIx64  != SM_Key: 0x%
-   PRIx64 ; MAD ignored\n, cl_ntoh64(p_sa_mad-sm_key),
-   cl_ntoh64(p_ctrl-p_subn-opt.sa_key));
+   Non-Zero MAD SM_Key: 0x% PRIx64  != SM_Key: 0x%
+   PRIx64 ; SA MAD ignored for method 0x%X attribute 0x%X 
(%s)\n,
+   cl_ntoh64(p_sa_mad-sm_key),
+   cl_ntoh64(p_ctrl-p_subn-opt.sa_key),
+   p_sa_mad-method, cl_ntoh16(p_sa_mad-attr_id),
+   ib_get_sa_attr_str(p_sa_mad-attr_id));
osm_mad_pool_put(p_ctrl-p_mad_pool, p_madw);
goto Exit;
}
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Rare Duplicate Completions

2014-04-11 Thread Christopher Mitchell
Bert,

Thanks for the help. Unfortunately, it looks as if all of my
message-passing and RDMA operations use the SIGNALED flag. I'm
checking the  wc-status flag (==IBV_WC_SUCCESS), and it and
everything in the message appears to be completely identical to the
first completion. I notice that it only seems to happen when I have
server processes (ie, listeners) that also open active connections to
other listeners as well as accept connections from client processes. I
perform both RDMA and message-passing across those server-to-server
connections, so on a hunch I tried rewriting the RDMA operations as
message-passing, but the problem still occurs. Thanks in advance for
any other thoughts you might have.

Cheers,
Christopher

On Thu, Apr 10, 2014 at 5:42 PM, Christopher Mitchell
christop...@cemetech.net wrote:
 Bert,

 Thanks for the help. Unfortunately, it looks as if all of my message-passing
 and RDMA operations use the SIGNALED flag. I'm checking the  wc-status flag
 (==IBV_WC_SUCCESS), and it and everything in the message appears to be
 completely identical to the first completion. I notice that it only seems to
 happen when I have server processes (ie, listeners) that also open active
 connections to other listeners as well as accept connections from client
 processes. I perform both RDMA and message-passing across those
 server-to-server connections, so on a hunch I tried rewriting the RDMA
 operations as message-passing, but the problem still occurs. Thanks in
 advance for any other thoughts you might have.

 Cheers,
 Christopher


 On Mon, Apr 7, 2014 at 9:01 PM, Bart Van Assche bvanass...@acm.org wrote:

 On 4/04/2014 10:00, Christopher Mitchell wrote:

 I am working on building a distributed Infiniband application, testing
 using Mellanox Connect-X HCAs. In very rare cases, perhaps once in a
 few million operations, I appear to be receiving duplicate completions
 or incorrect completions. For instance, I'll send out an RDMA request
 and receive a completion for a Verb message response I had just
 handled, or send a Verb message request and receive a duplicate Verb
 message completion.. Needless to say, this is introducing instability
 in my application. Does anyone have experience with a bug like this,
 or am I encountering some arcane issue in how I'm manipulating the
 HCA? I'd be more than happy to furnish relevant sections of code to
 help nail down the issue.


 Hello Christopher,

 In e.g. the SCST SRP target driver there is code present that checks for
 duplicate and/or missing completions. Although this code is being used
 intensively I have not yet seen any error messages being logged by the code
 that verifies completions. Note: something that is nontrivial in the RDMA
 API and that you might already be aware of is that even for non-signaled
 work requests a completion is delivered if that work request fails. If these
 non-signaled work requests do not have a unique wr_id error completions for
 these requests might be misinterpret as duplicate completions.

 Bart.
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add cleanup routines.

2014-04-11 Thread Hannes Weisbach
Dynamically loaded library handles are saved in a list and dlclosed() on
exit. The list of struct ibv_driver *, as well as the global
struct ibv_device ** list are free()d.

Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net
---
 src/device.c | 10 ++
 src/init.c   | 51 ++-
 2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/device.c b/src/device.c
index beb7b3c..d5b76bb 100644
--- a/src/device.c
+++ b/src/device.c
@@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
}
 }
 default_symver(__ibv_ack_async_event, ibv_ack_async_event);
+
+FINI static void ibverbs_deinit()
+{
+   size_t i;
+   for (i = 0; i  num_devices; i++) {
+   /* driver callback needed. May not be malloc'd memory */
+   free(device_list[i]);
+   }
+   free(device_list);
+}
diff --git a/src/init.c b/src/init.c
index d0e4b1c..2a8bca4 100644
--- a/src/init.c
+++ b/src/init.c
@@ -67,6 +67,11 @@ struct ibv_driver_name {
struct ibv_driver_name *next;
 };
 
+struct ibv_so_list {
+   void *dlhandle;
+   struct ibv_so_list *next;
+};
+
 struct ibv_driver {
const char *name;
ibv_driver_init_funcinit_func;
@@ -77,6 +82,7 @@ struct ibv_driver {
 static struct ibv_sysfs_dev *sysfs_dev_list;
 static struct ibv_driver_name *driver_name_list;
 static struct ibv_driver *head_driver, *tail_driver;
+static struct ibv_so_list *so_list;
 
 static int find_sysfs_devs(void)
 {
@@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, 
verbs_driver_init_func init_func)
 static void load_driver(const char *name)
 {
char *so_name;
-   void *dlhandle;
+   struct ibv_so_list *elem;
+   struct ibv_so_list **list;
+
+   elem = malloc(sizeof(*elem));
+   if(!elem)
+   return;
+
+   elem-next = NULL;
 
 #define __IBV_QUOTE(x) #x
 #define IBV_QUOTE(x)   __IBV_QUOTE(x)
@@ -205,16 +218,25 @@ static void load_driver(const char *name)
 name)  0) {
fprintf(stderr, PFX Warning: couldn't load driver '%s'.\n,
name);
-   return;
+   goto err;
}
 
-   dlhandle = dlopen(so_name, RTLD_NOW);
-   if (!dlhandle) {
+   elem-dlhandle = dlopen(so_name, RTLD_NOW);
+   if (!elem-dlhandle) {
fprintf(stderr, PFX Warning: couldn't load driver '%s': %s\n,
name, dlerror());
-   goto out;
+   goto err;
}
 
+   for (list = so_list; *list; list = (*list)-next)
+   ;
+
+   *list = elem;
+
+   goto out;
+
+err:
+   free(elem);
 out:
free(so_name);
 }
@@ -573,3 +595,22 @@ out:
 
return num_devices;
 }
+
+FINI static void unload_drivers()
+{
+   struct ibv_driver *driver;
+   struct ibv_so_list * list;
+
+   for (driver = head_driver; driver;) {
+   struct ibv_driver *tmp = driver;
+   driver = driver-next;
+   free(tmp);
+   }
+
+   for (list = so_list; list;) {
+   struct ibv_so_list *tmp = list;
+   list = list-next;
+   dlclose(tmp-dlhandle);
+   free(tmp);
+   }
+}
-- 
1.8.5.5


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Clean index map on exit

2014-04-11 Thread Hannes Weisbach
This patch adds the function idm_free() to free all entries of an index
map. A call to this function is added in the ucma_cleanup destructor.
The ucma_idm struct index_map is cleaned.

Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net
---
 src/cma.c | 1 +
 src/indexer.c | 8 
 src/indexer.h | 1 +
 3 files changed, 10 insertions(+)

diff --git a/src/cma.c b/src/cma.c
index 0cf4203..a533bf9 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -139,6 +139,7 @@ static void ucma_cleanup(void)
ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
}
 
+   idm_free(ucma_idm);
fastlock_destroy(idm_lock);
free(cma_dev_array);
cma_dev_cnt = 0;
diff --git a/src/indexer.c b/src/indexer.c
index c8e8bce..4d1fd77 100644
--- a/src/indexer.c
+++ b/src/indexer.c
@@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
entry[idx_entry_index(index)] = NULL;
return item;
 }
+
+void idm_free(struct index_map *idm)
+{
+   size_t i;
+   for (i = 0; i  IDX_ARRAY_SIZE; i++) {
+   free(idm-array[i]);
+   }
+}
diff --git a/src/indexer.h b/src/indexer.h
index 0c5f388..829d46b 100644
--- a/src/indexer.h
+++ b/src/indexer.h
@@ -89,6 +89,7 @@ struct index_map
 
 int idm_set(struct index_map *idm, int index, void *item);
 void *idm_clear(struct index_map *idm, int index);
+void idm_free(struct index_map *idm);
 
 static inline void *idm_at(struct index_map *idm, int index)
 {
-- 
1.8.5.5


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add cleanup routines.

2014-04-11 Thread Yann Droneaud
Hi,

Le vendredi 11 avril 2014 à 18:50 +0200, Hannes Weisbach a écrit :
 Dynamically loaded library handles are saved in a list and dlclosed() on
 exit. The list of struct ibv_driver *, as well as the global
 struct ibv_device ** list are free()d.
 

Please adds some explanation, in particular the purpose of the changes.
Your commit message only explain how, but you should also explain
why.

 Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net
 ---
  src/device.c | 10 ++
  src/init.c   | 51 ++-
  2 files changed, 56 insertions(+), 5 deletions(-)
 
 diff --git a/src/device.c b/src/device.c
 index beb7b3c..d5b76bb 100644
 --- a/src/device.c
 +++ b/src/device.c
 @@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
   }
  }
  default_symver(__ibv_ack_async_event, ibv_ack_async_event);
 +
 +FINI static void ibverbs_deinit()

In C, empty prototype declare a function which accept any parameter. So
perhaps void ibverbs(void) is what you mean.

 +{
 + size_t i;
 + for (i = 0; i  num_devices; i++) {
 + /* driver callback needed. May not be malloc'd memory */

Seems dangerous. Such interrogation must be explicitly added in the
commit message. 

 + free(device_list[i]);
 + }
 + free(device_list);
 +}
 diff --git a/src/init.c b/src/init.c
 index d0e4b1c..2a8bca4 100644
 --- a/src/init.c
 +++ b/src/init.c
 @@ -67,6 +67,11 @@ struct ibv_driver_name {
   struct ibv_driver_name *next;
  };
  
 +struct ibv_so_list {
 + void *dlhandle;
 + struct ibv_so_list *next;
 +};
 +
  struct ibv_driver {
   const char *name;
   ibv_driver_init_funcinit_func;
 @@ -77,6 +82,7 @@ struct ibv_driver {
  static struct ibv_sysfs_dev *sysfs_dev_list;
  static struct ibv_driver_name *driver_name_list;
  static struct ibv_driver *head_driver, *tail_driver;
 +static struct ibv_so_list *so_list;
  
  static int find_sysfs_devs(void)
  {
 @@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, 
 verbs_driver_init_func init_func)
  static void load_driver(const char *name)
  {
   char *so_name;
 - void *dlhandle;
 + struct ibv_so_list *elem;
 + struct ibv_so_list **list;
 +
 + elem = malloc(sizeof(*elem));
 + if(!elem)
 + return;
 +
 + elem-next = NULL;
  
  #define __IBV_QUOTE(x)   #x
  #define IBV_QUOTE(x) __IBV_QUOTE(x)
 @@ -205,16 +218,25 @@ static void load_driver(const char *name)
name)  0) {
   fprintf(stderr, PFX Warning: couldn't load driver '%s'.\n,
   name);
 - return;
 + goto err;
   }
  
 - dlhandle = dlopen(so_name, RTLD_NOW);
 - if (!dlhandle) {
 + elem-dlhandle = dlopen(so_name, RTLD_NOW);
 + if (!elem-dlhandle) {
   fprintf(stderr, PFX Warning: couldn't load driver '%s': %s\n,
   name, dlerror());
 - goto out;
 + goto err;
   }
  
 + for (list = so_list; *list; list = (*list)-next)
 + ;
 +
 + *list = elem;
 +
 + goto out;
 +
 +err:
 + free(elem);
  out:
   free(so_name);
  }
 @@ -573,3 +595,22 @@ out:
  
   return num_devices;
  }
 +
 +FINI static void unload_drivers()

same remarks about prototype.

 +{
 + struct ibv_driver *driver;
 + struct ibv_so_list * list;
 +
 + for (driver = head_driver; driver;) {
 + struct ibv_driver *tmp = driver;
 + driver = driver-next;
 + free(tmp);
 + }
 +

Is it safe here to free the driver ?

 + for (list = so_list; list;) {
 + struct ibv_so_list *tmp = list;
 + list = list-next;
 + dlclose(tmp-dlhandle);
 + free(tmp);
 + }
 +}

Why not store the dlopen() handle in the struct ibv_driver itself ?
This way only one list has to be scanned.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Clean index map on exit

2014-04-11 Thread Yann Droneaud
Le vendredi 11 avril 2014 à 18:58 +0200, Hannes Weisbach a écrit :
 This patch adds the function idm_free() to free all entries of an index
 map. A call to this function is added in the ucma_cleanup destructor.
 The ucma_idm struct index_map is cleaned.
 

Please explain the purpose: what's your goal ?

 Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net
 ---
  src/cma.c | 1 +
  src/indexer.c | 8 
  src/indexer.h | 1 +
  3 files changed, 10 insertions(+)
 
 diff --git a/src/cma.c b/src/cma.c
 index 0cf4203..a533bf9 100644
 --- a/src/cma.c
 +++ b/src/cma.c
 @@ -139,6 +139,7 @@ static void ucma_cleanup(void)
   ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
   }
  
 + idm_free(ucma_idm);
   fastlock_destroy(idm_lock);
   free(cma_dev_array);
   cma_dev_cnt = 0;
 diff --git a/src/indexer.c b/src/indexer.c
 index c8e8bce..4d1fd77 100644
 --- a/src/indexer.c
 +++ b/src/indexer.c
 @@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
   entry[idx_entry_index(index)] = NULL;
   return item;
  }
 +
 +void idm_free(struct index_map *idm)
 +{
 + size_t i;
 + for (i = 0; i  IDX_ARRAY_SIZE; i++) {
 + free(idm-array[i]);
 + }
 +}

Is the array always full ? is the maximum index of the array
IDX_ARRAY_SIZE or could it be lower ?

And what about idm, is it free()'d somewhere else ?

 diff --git a/src/indexer.h b/src/indexer.h
 index 0c5f388..829d46b 100644
 --- a/src/indexer.h
 +++ b/src/indexer.h
 @@ -89,6 +89,7 @@ struct index_map
  
  int idm_set(struct index_map *idm, int index, void *item);
  void *idm_clear(struct index_map *idm, int index);
 +void idm_free(struct index_map *idm);
  
  static inline void *idm_at(struct index_map *idm, int index)
  {

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add cleanup routines.

2014-04-11 Thread Hannes Weisbach

Am 11.04.2014 um 19:00 schrieb Yann Droneaud ydrone...@opteya.com:

 Hi,
Thanks for your quick reply.

 Please adds some explanation, in particular the purpose of the changes.
 Your commit message only explain how, but you should also explain
 why.
Ok.

 In C, empty prototype declare a function which accept any parameter. So
 perhaps void ibverbs(void) is what you mean.
Yup, thanks.

 +{
 +size_t i;
 +for (i = 0; i  num_devices; i++) {
 +/* driver callback needed. May not be malloc'd memory */
 
 Seems dangerous. Such interrogation must be explicitly added in the
 commit message. 
Maybe it's better to leave it as a TODO and only free the device_list
itself?

 +for (driver = head_driver; driver;) {
 +struct ibv_driver *tmp = driver;
 +driver = driver-next;
 +free(tmp);
 +}
 +
 
 Is it safe here to free the driver ?
The struct itself is no longer needed.  All the driver libraries are
loaded and ibverbs_init() is call-once, so no new libraries are
dlopen()ed (and call ibv_register_driver()).
It's probably nicer first to dlclose() all libraries and then free all
struct ibv_drivers.

 +for (list = so_list; list;) {
 +struct ibv_so_list *tmp = list;
 +list = list-next;
 +dlclose(tmp-dlhandle);
 +free(tmp);
 +}
 +}
 
 Why not store the dlopen() handle in the struct ibv_driver itself ?
 This way only one list has to be scanned.
That was my first idea, but the struct ibv_driver doesn't exist at
that time.  First the (driver) library is dlopen()d and calls
ibv_register_driver().  Only there is the ibv_driver struct allocated,
but the handle is gone.

Best regards,
Hannes
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Clean index map on exit

2014-04-11 Thread Hannes Weisbach

 
 Is the array always full ?
No, but it has to be zero-initialized and free(NULL) is fine.

idm_set() does:
if (!idm-array[idx_array_index(index)]) {
if (idm_grow(idm, index)  0)
(idm_grow does calloc().)

So I concluded unused entries have to be zero already and I don't put
a new constraint on struct index_map.

 is the maximum index of the array
 IDX_ARRAY_SIZE or could it be lower ?
It's fixed:
struct index_map
{
void **array[IDX_ARRAY_SIZE];
};

 And what about idm, is it free()'d somewhere else ?
The owner has to take care of its struct index_map.  I can only free
the resources the map itself holds.
For what it's worth: all struct index_maps are declared statically.

Again, thank you for your quick feedback.

Best regards,
Hannes
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xprtrdma: mind the device's max fast register page list depth

2014-04-11 Thread Chuck Lever
Hi Steve-

I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR.
A couple of cosmetic notes are below:


On Apr 10, 2014, at 11:13 AM, Steve Wise sw...@opengridcomputing.com wrote:

 Some rdma devices don't support a fast register page list depth of
 at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
 register regions according to the minimum of the device max supported
 depth or RPCRDMA_MAX_DATA_SEGS.
 
 Signed-off-by: Steve Wise sw...@opengridcomputing.com
 ---
 
 net/sunrpc/xprtrdma/rpc_rdma.c  |2 -
 net/sunrpc/xprtrdma/verbs.c |   64 ---
 net/sunrpc/xprtrdma/xprt_rdma.h |1 +
 3 files changed, 47 insertions(+), 20 deletions(-)
 
 diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
 index 96ead52..14d7634 100644
 --- a/net/sunrpc/xprtrdma/rpc_rdma.c
 +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
 @@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct 
 xdr_buf *target,
   req-rl_nchunks = nchunks;
 
   BUG_ON(nchunks == 0);

I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here. 
If you
agree, can you remove the above BUG_ON as well, if you re-submit a new version 
of this
patch?

 - BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
 - (nchunks  3));
 
   /*
* finish off header. If write, marshal discrim and nchunks.
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
 index 93726560..c08f193 100644
 --- a/net/sunrpc/xprtrdma/verbs.c
 +++ b/net/sunrpc/xprtrdma/verbs.c
 @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct 
 sockaddr *addr, int memreg)
   __func__);
   memreg = RPCRDMA_REGISTER;
 #endif
 + } else {
 + /* Mind the ia limit on FRMR page list depth */
 + ia-ri_max_frmr_depth = min_t(unsigned int,
 + RPCRDMA_MAX_DATA_SEGS,
 + devattr.max_fast_reg_page_list_len);
   }
   break;
   }
 @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
 rpcrdma_ia *ia,
   ep-rep_attr.srq = NULL;
   ep-rep_attr.cap.max_send_wr = cdata-max_requests;
   switch (ia-ri_memreg_strategy) {
 - case RPCRDMA_FRMR:
 + case RPCRDMA_FRMR: {
 + int depth = 7;
 +
   /* Add room for frmr register and invalidate WRs.
* 1. FRMR reg WR for head
* 2. FRMR invalidate WR for head
 -  * 3. FRMR reg WR for pagelist
 -  * 4. FRMR invalidate WR for pagelist
 +  * 3. N FRMR reg WRs for pagelist
 +  * 4. N FRMR invalidate WRs for pagelist
* 5. FRMR reg WR for tail
* 6. FRMR invalidate WR for tail
* 7. The RDMA_SEND WR
*/
 - ep-rep_attr.cap.max_send_wr *= 7;
 +
 + /* Calculate N if the device max FRMR depth is smaller than
 +  * RPCRDMA_MAX_DATA_SEGS.
 +  */
 + if (ia-ri_max_frmr_depth  RPCRDMA_MAX_DATA_SEGS) {
 + int delta = RPCRDMA_MAX_DATA_SEGS -
 + ia-ri_max_frmr_depth;
 +
 + do {
 + depth += 2; /* FRMR reg + invalidate */
 + delta -= ia-ri_max_frmr_depth;
 + } while (delta  0);
 +
 + }
 + ep-rep_attr.cap.max_send_wr *= depth;
   if (ep-rep_attr.cap.max_send_wr  devattr.max_qp_wr) {
 - cdata-max_requests = devattr.max_qp_wr / 7;
 + cdata-max_requests = devattr.max_qp_wr / depth;
   if (!cdata-max_requests)
   return -EINVAL;
 - ep-rep_attr.cap.max_send_wr = cdata-max_requests * 7;
 + ep-rep_attr.cap.max_send_wr = cdata-max_requests *
 +depth;
   }
   break;
 + }
   case RPCRDMA_MEMWINDOWS_ASYNC:
   case RPCRDMA_MEMWINDOWS:
   /* Add room for mw_binds+unbinds - overkill! */
 @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, 
 struct rpcrdma_ep *ep,
   case RPCRDMA_FRMR:
   for (i = buf-rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
   r-r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia-ri_pd,
 -  RPCRDMA_MAX_SEGS);
 + ia-ri_max_frmr_depth);
   if (IS_ERR(r-r.frmr.fr_mr)) {
   rc = PTR_ERR(r-r.frmr.fr_mr);
   dprintk(RPC:   %s: ib_alloc_fast_reg_mr
failed %i\n, __func__, rc);
  

Re: [PATCH] nfs-rdma: Fix for FMR leaks

2014-04-11 Thread Chuck Lever
Hi Allen-

I reviewed and tested your patch. Comments inline below.


On Apr 9, 2014, at 6:40 PM, Allen Andrews allen.andr...@cox.net wrote:

 Two memory region leaks iwere found during testing:
 
 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's 
 ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is 
 called.  If ib_alloc_fast_reg_page_list returns an error it bails out of the 
 routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory 
 leak.  Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails.
 
 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the 
 MR's on the rb_mws list if there are rb_send_bufs present.  However, in 
 rpcrdma_buffer_create while the rb_mws list is being built if one of the MR 
 allocation requests fail after some MR's have been allocated on the rb_mws 
 list the routine never gets to create any rb_send_bufs but instead jumps to 
 the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws
 list because the rb_send_bufs were never created.   This leaks all the MR's
 on the rb_mws list that were created prior to one of the MR allocations 
 failing.
 
 Issue(2) was seen during testing. Our adapter had a finite number of MR's 
 available and we created enough connections to where we saw an MR allocation 
 failure on our Nth NFS connection request. After the kernel cleaned up the 
 resources it had allocated for the Nth connection we noticed that FMR's had 
 been leaked due to the coding error described above.
 
 Issue(1) was seen during a code review while debugging issue(2).
 
 Signed-off-by: Allen Andrews allen.andr...@emulex.com
 ---
 
 net/sunrpc/xprtrdma/verbs.c |   79 ---
 1 files changed, 44 insertions(+), 35 deletions(-)
 
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 
 9372656..54f592d 100644
 --- a/net/sunrpc/xprtrdma/verbs.c
 +++ b/net/sunrpc/xprtrdma/verbs.c
 @@ -1058,6 +1058,14 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, 
 struct rpcrdma_ep *ep,
   dprintk(RPC:   %s: 
   ib_alloc_fast_reg_page_list 
   failed %i\n, __func__, rc);
 +
 + rc = ib_dereg_mr(r-r.frmr.fr_mr);

This clears “rc,” wiping the error returned by ib_alloc_fast_reg_page_list().

Then rpcrdma_buffer_create() returns zero, even though it failed, and xprtrdma 
thinks the
transport is set up and ready for use. Panic.

I think you can safely ignore the return code from ib_dereg_mr() here.


 + if (rc)
 + dprintk(RPC:   %s:
 +  ib_dereg_mr
 +  failed %i\n,
 + __func__, rc);
 +
   goto out;
   }
   list_add(r-mw_list, buf-rb_mws); @@ -1194,41 
 +1202,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)

This line is broken: the “@@“ should start on the next line. Some kind of 
e-mail snafu?


   kfree(buf-rb_recv_bufs[i]);
   }
   if (buf-rb_send_bufs  buf-rb_send_bufs[i]) {
 - while (!list_empty(buf-rb_mws)) {
 - r = list_entry(buf-rb_mws.next,
 - struct rpcrdma_mw, mw_list);
 - list_del(r-mw_list);
 - switch (ia-ri_memreg_strategy) {
 - case RPCRDMA_FRMR:
 - rc = ib_dereg_mr(r-r.frmr.fr_mr);
 - if (rc)
 - dprintk(RPC:   %s:
 -  ib_dereg_mr
 -  failed %i\n,
 - __func__, rc);
 - 
 ib_free_fast_reg_page_list(r-r.frmr.fr_pgl);
 - break;
 - case RPCRDMA_MTHCAFMR:
 - rc = ib_dealloc_fmr(r-r.fmr);
 - if (rc)
 - dprintk(RPC:   %s:
 -  ib_dealloc_fmr
 -  failed %i\n,
 - __func__, rc);
 - break;
 - case RPCRDMA_MEMWINDOWS_ASYNC:
 - case RPCRDMA_MEMWINDOWS:
 - rc = ib_dealloc_mw(r-r.mw);
 - if (rc)
 - 

RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-11 Thread Devesh Sharma
Hi  Chuck,
Yes that is the case, Following is the trace I got.

4RPC:   355 setting alarm for 6 ms
4RPC:   355 sync task going to sleep
4RPC:   xprt_rdma_connect_worker: reconnect
4RPC:   rpcrdma_ep_disconnect: rdma_disconnect -1
4RPC:   rpcrdma_ep_connect: rpcrdma_ep_disconnect status -1
3ocrdma_mbx_create_qp(0) rq_err
3ocrdma_mbx_create_qp(0) sq_err
3ocrdma_create_qp(0) error=-1
4RPC:   rpcrdma_ep_connect: rdma_create_qp failed -1
4RPC:   355 __rpc_wake_up_task (now 4296956756)
4RPC:   355 disabling timer
4RPC:   355 removed from queue 880454578258 xprt_pending
4RPC:   __rpc_wake_up_task done
4RPC:   xprt_rdma_connect_worker: exit
4RPC:   355 sync task resuming
4RPC:   355 xprt_connect_status: error 1 connecting to server 192.168.1.1
4RPC:   wake_up_next(880454578190 xprt_sending)
4RPC:   355 call_connect_status (status -5)
4RPC:   355 return 0, status -5
4RPC:   355 release task
4RPC:   wake_up_next(880454578190 xprt_sending)
4RPC:   xprt_rdma_free: called on 0x(null)
1BUG: unable to handle kernel NULL pointer dereference at (null)
1IP: [a05b68ac] rpcrdma_deregister_frmr_external+0x9c/0xe0 
[xprtrdma]
4PGD 454554067 PUD 4665b7067 PMD 0
4Oops:  [#1] SMP
4last sysfs file: 
/sys/devices/pci:00/:00:03.0/:03:00.0/infiniband/ocrdma0/fwerr
4CPU 6
4Modules linked in: xprtrdma(U) nfs lockd fscache auth_rpcgss nfs_acl 
ocrdma(U) be2net(U) ip6table_filter ip6_tables ebtable_nat ebtables 
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables 
bridge stp llc autofs4 des_generic ecb md4 nls_utf8 cifs sunrpc rdma_ucm(U) 
rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) 
ib_umad(U) iw_nes(U) libcrc32c iw_cxgb4(U) cxgb4(U) iw_cxgb3(U) cxgb3(U) mdio 
ib_qib(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) 
compat(U) vhost_net macvtap macvlan tun kvm uinput power_meter sg microcode 
i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb ptp pps_core ioatdma dca 
i7core_edac edac_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif 
usb_storage pata_acpi ata_generic ata_piix mptsas mptscsih mptbase 
scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 
be2net]
4
4Pid: 3597, comm: ls Not tainted 2.6.32-358.el6.x86_64 #1 Cisco Systems Inc 
R210-2121605W/R210-2121605W
4RIP: 0010:[a05b68ac]  [a05b68ac] 
rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma]
4RSP: 0018:880465aff9a8  EFLAGS: 00010217
4RAX: 8804673fcc00 RBX: 880466578028 RCX: 
4RDX: 880465affa10 RSI: 880465aff9a8 RDI: 
4RBP: 880465affa38 R08:  R09: 
4R10: 000f R11: 000f R12: 880454578598
4R13: 880454578000 R14: 880466578068 R15: 
4FS:  7fe61f3107a0() GS:8800368c() knlGS:
4CS:  0010 DS:  ES:  CR0: 8005003b
4CR2:  CR3: 00046520a000 CR4: 07e0
4DR0:  DR1:  DR2: 
4DR3:  DR6: 0ff0 DR7: 0400
4Process ls (pid: 3597, threadinfo 880465afe000, task 8804639a5500)
4Stack:
4  880462287370  000a
4d 0802dd3b0002   
4d    
4Call Trace:
4 [8109c97f] ? up+0x2f/0x50
4 [a05b6a03] rpcrdma_deregister_external+0x113/0x2d0 [xprtrdma]
4 [8150d0d1] ? printk+0x41/0x48
4 [a05b44fc] xprt_rdma_free+0x8c/0x210 [xprtrdma]
4 [81082014] ? mod_timer+0x144/0x220
4 [a05c8a60] xprt_release+0xc0/0x220 [sunrpc]
4 [a05cff5d] rpc_release_resources_task+0x1d/0x50 [sunrpc]
4 [a05d0a84] __rpc_execute+0x174/0x350 [sunrpc]
4 [8150d0d1] ? printk+0x41/0x48
4 [81096b47] ? bit_waitqueue+0x17/0xd0
4 [a05d0cc1] rpc_execute+0x61/0xa0 [sunrpc]
4 [a05c73a5] rpc_run_task+0x75/0x90 [sunrpc]
4 [a05c74c2] rpc_call_sync+0x42/0x70 [sunrpc]
4 [81143a17] ? handle_pte_fault+0x487/0xb50
4 [a074e030] _nfs4_call_sync+0x30/0x40 [nfs]
4 [a07461dc] _nfs4_proc_getattr+0xac/0xc0 [nfs]
4 [a07494be] nfs4_proc_getattr+0x4e/0x70 [nfs]
4 [a072f3e3] __nfs_revalidate_inode+0xe3/0x220 [nfs]
4 [a072fdb6] nfs_getattr+0xb6/0x120 [nfs]
4 [81186951] vfs_getattr+0x51/0x80
4 [811869e0] vfs_fstatat+0x60/0x80
4 [81186b2b] vfs_stat+0x1b/0x20
4 [81186b54] sys_newstat+0x24/0x50
4 [810dc817] ? audit_syscall_entry+0x1d7/0x200
4 [8100b072] system_call_fastpath+0x16/0x1b
4Code: 48 89 85 78 ff ff ff 48 8b 40 08 8b 40 1c 89 45 94 b8 ff ff ff ff f0 
41 0f c1 85 e0 05 00 00 49 8b 04 24 48 8d 55 d8 48 8b 78 10 48 8b 07 ff 90 b0 
01 00 00