[PATCH opensm] osm_sa_mad_ctrl.c: In sa_mad_ctrl_rcv_callback, improve 1A04 error log message
to include more info on incoming SA MAD Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/opensm/osm_sa_mad_ctrl.c b/opensm/osm_sa_mad_ctrl.c index 902803e..61fcb0b 100644 --- a/opensm/osm_sa_mad_ctrl.c +++ b/opensm/osm_sa_mad_ctrl.c @@ -331,9 +331,12 @@ static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context, if (p_sa_mad-sm_key != 0 p_sa_mad-sm_key != p_ctrl-p_subn-opt.sa_key) { OSM_LOG(p_ctrl-p_log, OSM_LOG_ERROR, ERR 1A04: - Non-Zero SA MAD SM_Key: 0x% PRIx64 != SM_Key: 0x% - PRIx64 ; MAD ignored\n, cl_ntoh64(p_sa_mad-sm_key), - cl_ntoh64(p_ctrl-p_subn-opt.sa_key)); + Non-Zero MAD SM_Key: 0x% PRIx64 != SM_Key: 0x% + PRIx64 ; SA MAD ignored for method 0x%X attribute 0x%X (%s)\n, + cl_ntoh64(p_sa_mad-sm_key), + cl_ntoh64(p_ctrl-p_subn-opt.sa_key), + p_sa_mad-method, cl_ntoh16(p_sa_mad-attr_id), + ib_get_sa_attr_str(p_sa_mad-attr_id)); osm_mad_pool_put(p_ctrl-p_mad_pool, p_madw); goto Exit; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rare Duplicate Completions
Bert, Thanks for the help. Unfortunately, it looks as if all of my message-passing and RDMA operations use the SIGNALED flag. I'm checking the wc-status flag (==IBV_WC_SUCCESS), and it and everything in the message appears to be completely identical to the first completion. I notice that it only seems to happen when I have server processes (ie, listeners) that also open active connections to other listeners as well as accept connections from client processes. I perform both RDMA and message-passing across those server-to-server connections, so on a hunch I tried rewriting the RDMA operations as message-passing, but the problem still occurs. Thanks in advance for any other thoughts you might have. Cheers, Christopher On Thu, Apr 10, 2014 at 5:42 PM, Christopher Mitchell christop...@cemetech.net wrote: Bert, Thanks for the help. Unfortunately, it looks as if all of my message-passing and RDMA operations use the SIGNALED flag. I'm checking the wc-status flag (==IBV_WC_SUCCESS), and it and everything in the message appears to be completely identical to the first completion. I notice that it only seems to happen when I have server processes (ie, listeners) that also open active connections to other listeners as well as accept connections from client processes. I perform both RDMA and message-passing across those server-to-server connections, so on a hunch I tried rewriting the RDMA operations as message-passing, but the problem still occurs. Thanks in advance for any other thoughts you might have. Cheers, Christopher On Mon, Apr 7, 2014 at 9:01 PM, Bart Van Assche bvanass...@acm.org wrote: On 4/04/2014 10:00, Christopher Mitchell wrote: I am working on building a distributed Infiniband application, testing using Mellanox Connect-X HCAs. In very rare cases, perhaps once in a few million operations, I appear to be receiving duplicate completions or incorrect completions. For instance, I'll send out an RDMA request and receive a completion for a Verb message response I had just handled, or send a Verb message request and receive a duplicate Verb message completion.. Needless to say, this is introducing instability in my application. Does anyone have experience with a bug like this, or am I encountering some arcane issue in how I'm manipulating the HCA? I'd be more than happy to furnish relevant sections of code to help nail down the issue. Hello Christopher, In e.g. the SCST SRP target driver there is code present that checks for duplicate and/or missing completions. Although this code is being used intensively I have not yet seen any error messages being logged by the code that verifies completions. Note: something that is nontrivial in the RDMA API and that you might already be aware of is that even for non-signaled work requests a completion is delivered if that work request fails. If these non-signaled work requests do not have a unique wr_id error completions for these requests might be misinterpret as duplicate completions. Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add cleanup routines.
Dynamically loaded library handles are saved in a list and dlclosed() on exit. The list of struct ibv_driver *, as well as the global struct ibv_device ** list are free()d. Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net --- src/device.c | 10 ++ src/init.c | 51 ++- 2 files changed, 56 insertions(+), 5 deletions(-) diff --git a/src/device.c b/src/device.c index beb7b3c..d5b76bb 100644 --- a/src/device.c +++ b/src/device.c @@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event) } } default_symver(__ibv_ack_async_event, ibv_ack_async_event); + +FINI static void ibverbs_deinit() +{ + size_t i; + for (i = 0; i num_devices; i++) { + /* driver callback needed. May not be malloc'd memory */ + free(device_list[i]); + } + free(device_list); +} diff --git a/src/init.c b/src/init.c index d0e4b1c..2a8bca4 100644 --- a/src/init.c +++ b/src/init.c @@ -67,6 +67,11 @@ struct ibv_driver_name { struct ibv_driver_name *next; }; +struct ibv_so_list { + void *dlhandle; + struct ibv_so_list *next; +}; + struct ibv_driver { const char *name; ibv_driver_init_funcinit_func; @@ -77,6 +82,7 @@ struct ibv_driver { static struct ibv_sysfs_dev *sysfs_dev_list; static struct ibv_driver_name *driver_name_list; static struct ibv_driver *head_driver, *tail_driver; +static struct ibv_so_list *so_list; static int find_sysfs_devs(void) { @@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, verbs_driver_init_func init_func) static void load_driver(const char *name) { char *so_name; - void *dlhandle; + struct ibv_so_list *elem; + struct ibv_so_list **list; + + elem = malloc(sizeof(*elem)); + if(!elem) + return; + + elem-next = NULL; #define __IBV_QUOTE(x) #x #define IBV_QUOTE(x) __IBV_QUOTE(x) @@ -205,16 +218,25 @@ static void load_driver(const char *name) name) 0) { fprintf(stderr, PFX Warning: couldn't load driver '%s'.\n, name); - return; + goto err; } - dlhandle = dlopen(so_name, RTLD_NOW); - if (!dlhandle) { + elem-dlhandle = dlopen(so_name, RTLD_NOW); + if (!elem-dlhandle) { fprintf(stderr, PFX Warning: couldn't load driver '%s': %s\n, name, dlerror()); - goto out; + goto err; } + for (list = so_list; *list; list = (*list)-next) + ; + + *list = elem; + + goto out; + +err: + free(elem); out: free(so_name); } @@ -573,3 +595,22 @@ out: return num_devices; } + +FINI static void unload_drivers() +{ + struct ibv_driver *driver; + struct ibv_so_list * list; + + for (driver = head_driver; driver;) { + struct ibv_driver *tmp = driver; + driver = driver-next; + free(tmp); + } + + for (list = so_list; list;) { + struct ibv_so_list *tmp = list; + list = list-next; + dlclose(tmp-dlhandle); + free(tmp); + } +} -- 1.8.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Clean index map on exit
This patch adds the function idm_free() to free all entries of an index map. A call to this function is added in the ucma_cleanup destructor. The ucma_idm struct index_map is cleaned. Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net --- src/cma.c | 1 + src/indexer.c | 8 src/indexer.h | 1 + 3 files changed, 10 insertions(+) diff --git a/src/cma.c b/src/cma.c index 0cf4203..a533bf9 100644 --- a/src/cma.c +++ b/src/cma.c @@ -139,6 +139,7 @@ static void ucma_cleanup(void) ibv_close_device(cma_dev_array[cma_dev_cnt].verbs); } + idm_free(ucma_idm); fastlock_destroy(idm_lock); free(cma_dev_array); cma_dev_cnt = 0; diff --git a/src/indexer.c b/src/indexer.c index c8e8bce..4d1fd77 100644 --- a/src/indexer.c +++ b/src/indexer.c @@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index) entry[idx_entry_index(index)] = NULL; return item; } + +void idm_free(struct index_map *idm) +{ + size_t i; + for (i = 0; i IDX_ARRAY_SIZE; i++) { + free(idm-array[i]); + } +} diff --git a/src/indexer.h b/src/indexer.h index 0c5f388..829d46b 100644 --- a/src/indexer.h +++ b/src/indexer.h @@ -89,6 +89,7 @@ struct index_map int idm_set(struct index_map *idm, int index, void *item); void *idm_clear(struct index_map *idm, int index); +void idm_free(struct index_map *idm); static inline void *idm_at(struct index_map *idm, int index) { -- 1.8.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add cleanup routines.
Hi, Le vendredi 11 avril 2014 à 18:50 +0200, Hannes Weisbach a écrit : Dynamically loaded library handles are saved in a list and dlclosed() on exit. The list of struct ibv_driver *, as well as the global struct ibv_device ** list are free()d. Please adds some explanation, in particular the purpose of the changes. Your commit message only explain how, but you should also explain why. Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net --- src/device.c | 10 ++ src/init.c | 51 ++- 2 files changed, 56 insertions(+), 5 deletions(-) diff --git a/src/device.c b/src/device.c index beb7b3c..d5b76bb 100644 --- a/src/device.c +++ b/src/device.c @@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event) } } default_symver(__ibv_ack_async_event, ibv_ack_async_event); + +FINI static void ibverbs_deinit() In C, empty prototype declare a function which accept any parameter. So perhaps void ibverbs(void) is what you mean. +{ + size_t i; + for (i = 0; i num_devices; i++) { + /* driver callback needed. May not be malloc'd memory */ Seems dangerous. Such interrogation must be explicitly added in the commit message. + free(device_list[i]); + } + free(device_list); +} diff --git a/src/init.c b/src/init.c index d0e4b1c..2a8bca4 100644 --- a/src/init.c +++ b/src/init.c @@ -67,6 +67,11 @@ struct ibv_driver_name { struct ibv_driver_name *next; }; +struct ibv_so_list { + void *dlhandle; + struct ibv_so_list *next; +}; + struct ibv_driver { const char *name; ibv_driver_init_funcinit_func; @@ -77,6 +82,7 @@ struct ibv_driver { static struct ibv_sysfs_dev *sysfs_dev_list; static struct ibv_driver_name *driver_name_list; static struct ibv_driver *head_driver, *tail_driver; +static struct ibv_so_list *so_list; static int find_sysfs_devs(void) { @@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, verbs_driver_init_func init_func) static void load_driver(const char *name) { char *so_name; - void *dlhandle; + struct ibv_so_list *elem; + struct ibv_so_list **list; + + elem = malloc(sizeof(*elem)); + if(!elem) + return; + + elem-next = NULL; #define __IBV_QUOTE(x) #x #define IBV_QUOTE(x) __IBV_QUOTE(x) @@ -205,16 +218,25 @@ static void load_driver(const char *name) name) 0) { fprintf(stderr, PFX Warning: couldn't load driver '%s'.\n, name); - return; + goto err; } - dlhandle = dlopen(so_name, RTLD_NOW); - if (!dlhandle) { + elem-dlhandle = dlopen(so_name, RTLD_NOW); + if (!elem-dlhandle) { fprintf(stderr, PFX Warning: couldn't load driver '%s': %s\n, name, dlerror()); - goto out; + goto err; } + for (list = so_list; *list; list = (*list)-next) + ; + + *list = elem; + + goto out; + +err: + free(elem); out: free(so_name); } @@ -573,3 +595,22 @@ out: return num_devices; } + +FINI static void unload_drivers() same remarks about prototype. +{ + struct ibv_driver *driver; + struct ibv_so_list * list; + + for (driver = head_driver; driver;) { + struct ibv_driver *tmp = driver; + driver = driver-next; + free(tmp); + } + Is it safe here to free the driver ? + for (list = so_list; list;) { + struct ibv_so_list *tmp = list; + list = list-next; + dlclose(tmp-dlhandle); + free(tmp); + } +} Why not store the dlopen() handle in the struct ibv_driver itself ? This way only one list has to be scanned. Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Clean index map on exit
Le vendredi 11 avril 2014 à 18:58 +0200, Hannes Weisbach a écrit : This patch adds the function idm_free() to free all entries of an index map. A call to this function is added in the ucma_cleanup destructor. The ucma_idm struct index_map is cleaned. Please explain the purpose: what's your goal ? Signed-off-by: Hannes Weisbach hannes_weisb...@gmx.net --- src/cma.c | 1 + src/indexer.c | 8 src/indexer.h | 1 + 3 files changed, 10 insertions(+) diff --git a/src/cma.c b/src/cma.c index 0cf4203..a533bf9 100644 --- a/src/cma.c +++ b/src/cma.c @@ -139,6 +139,7 @@ static void ucma_cleanup(void) ibv_close_device(cma_dev_array[cma_dev_cnt].verbs); } + idm_free(ucma_idm); fastlock_destroy(idm_lock); free(cma_dev_array); cma_dev_cnt = 0; diff --git a/src/indexer.c b/src/indexer.c index c8e8bce..4d1fd77 100644 --- a/src/indexer.c +++ b/src/indexer.c @@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index) entry[idx_entry_index(index)] = NULL; return item; } + +void idm_free(struct index_map *idm) +{ + size_t i; + for (i = 0; i IDX_ARRAY_SIZE; i++) { + free(idm-array[i]); + } +} Is the array always full ? is the maximum index of the array IDX_ARRAY_SIZE or could it be lower ? And what about idm, is it free()'d somewhere else ? diff --git a/src/indexer.h b/src/indexer.h index 0c5f388..829d46b 100644 --- a/src/indexer.h +++ b/src/indexer.h @@ -89,6 +89,7 @@ struct index_map int idm_set(struct index_map *idm, int index, void *item); void *idm_clear(struct index_map *idm, int index); +void idm_free(struct index_map *idm); static inline void *idm_at(struct index_map *idm, int index) { Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add cleanup routines.
Am 11.04.2014 um 19:00 schrieb Yann Droneaud ydrone...@opteya.com: Hi, Thanks for your quick reply. Please adds some explanation, in particular the purpose of the changes. Your commit message only explain how, but you should also explain why. Ok. In C, empty prototype declare a function which accept any parameter. So perhaps void ibverbs(void) is what you mean. Yup, thanks. +{ +size_t i; +for (i = 0; i num_devices; i++) { +/* driver callback needed. May not be malloc'd memory */ Seems dangerous. Such interrogation must be explicitly added in the commit message. Maybe it's better to leave it as a TODO and only free the device_list itself? +for (driver = head_driver; driver;) { +struct ibv_driver *tmp = driver; +driver = driver-next; +free(tmp); +} + Is it safe here to free the driver ? The struct itself is no longer needed. All the driver libraries are loaded and ibverbs_init() is call-once, so no new libraries are dlopen()ed (and call ibv_register_driver()). It's probably nicer first to dlclose() all libraries and then free all struct ibv_drivers. +for (list = so_list; list;) { +struct ibv_so_list *tmp = list; +list = list-next; +dlclose(tmp-dlhandle); +free(tmp); +} +} Why not store the dlopen() handle in the struct ibv_driver itself ? This way only one list has to be scanned. That was my first idea, but the struct ibv_driver doesn't exist at that time. First the (driver) library is dlopen()d and calls ibv_register_driver(). Only there is the ibv_driver struct allocated, but the handle is gone. Best regards, Hannes -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Clean index map on exit
Is the array always full ? No, but it has to be zero-initialized and free(NULL) is fine. idm_set() does: if (!idm-array[idx_array_index(index)]) { if (idm_grow(idm, index) 0) (idm_grow does calloc().) So I concluded unused entries have to be zero already and I don't put a new constraint on struct index_map. is the maximum index of the array IDX_ARRAY_SIZE or could it be lower ? It's fixed: struct index_map { void **array[IDX_ARRAY_SIZE]; }; And what about idm, is it free()'d somewhere else ? The owner has to take care of its struct index_map. I can only free the resources the map itself holds. For what it's worth: all struct index_maps are declared statically. Again, thank you for your quick feedback. Best regards, Hannes -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xprtrdma: mind the device's max fast register page list depth
Hi Steve- I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR. A couple of cosmetic notes are below: On Apr 10, 2014, at 11:13 AM, Steve Wise sw...@opengridcomputing.com wrote: Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- net/sunrpc/xprtrdma/rpc_rdma.c |2 - net/sunrpc/xprtrdma/verbs.c | 64 --- net/sunrpc/xprtrdma/xprt_rdma.h |1 + 3 files changed, 47 insertions(+), 20 deletions(-) diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..14d7634 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target, req-rl_nchunks = nchunks; BUG_ON(nchunks == 0); I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here. If you agree, can you remove the above BUG_ON as well, if you re-submit a new version of this patch? - BUG_ON((r_xprt-rx_ia.ri_memreg_strategy == RPCRDMA_FRMR) - (nchunks 3)); /* * finish off header. If write, marshal discrim and nchunks. diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 93726560..c08f193 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg) __func__); memreg = RPCRDMA_REGISTER; #endif + } else { + /* Mind the ia limit on FRMR page list depth */ + ia-ri_max_frmr_depth = min_t(unsigned int, + RPCRDMA_MAX_DATA_SEGS, + devattr.max_fast_reg_page_list_len); } break; } @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia, ep-rep_attr.srq = NULL; ep-rep_attr.cap.max_send_wr = cdata-max_requests; switch (ia-ri_memreg_strategy) { - case RPCRDMA_FRMR: + case RPCRDMA_FRMR: { + int depth = 7; + /* Add room for frmr register and invalidate WRs. * 1. FRMR reg WR for head * 2. FRMR invalidate WR for head - * 3. FRMR reg WR for pagelist - * 4. FRMR invalidate WR for pagelist + * 3. N FRMR reg WRs for pagelist + * 4. N FRMR invalidate WRs for pagelist * 5. FRMR reg WR for tail * 6. FRMR invalidate WR for tail * 7. The RDMA_SEND WR */ - ep-rep_attr.cap.max_send_wr *= 7; + + /* Calculate N if the device max FRMR depth is smaller than + * RPCRDMA_MAX_DATA_SEGS. + */ + if (ia-ri_max_frmr_depth RPCRDMA_MAX_DATA_SEGS) { + int delta = RPCRDMA_MAX_DATA_SEGS - + ia-ri_max_frmr_depth; + + do { + depth += 2; /* FRMR reg + invalidate */ + delta -= ia-ri_max_frmr_depth; + } while (delta 0); + + } + ep-rep_attr.cap.max_send_wr *= depth; if (ep-rep_attr.cap.max_send_wr devattr.max_qp_wr) { - cdata-max_requests = devattr.max_qp_wr / 7; + cdata-max_requests = devattr.max_qp_wr / depth; if (!cdata-max_requests) return -EINVAL; - ep-rep_attr.cap.max_send_wr = cdata-max_requests * 7; + ep-rep_attr.cap.max_send_wr = cdata-max_requests * +depth; } break; + } case RPCRDMA_MEMWINDOWS_ASYNC: case RPCRDMA_MEMWINDOWS: /* Add room for mw_binds+unbinds - overkill! */ @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep, case RPCRDMA_FRMR: for (i = buf-rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) { r-r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia-ri_pd, - RPCRDMA_MAX_SEGS); + ia-ri_max_frmr_depth); if (IS_ERR(r-r.frmr.fr_mr)) { rc = PTR_ERR(r-r.frmr.fr_mr); dprintk(RPC: %s: ib_alloc_fast_reg_mr failed %i\n, __func__, rc);
Re: [PATCH] nfs-rdma: Fix for FMR leaks
Hi Allen- I reviewed and tested your patch. Comments inline below. On Apr 9, 2014, at 6:40 PM, Allen Andrews allen.andr...@cox.net wrote: Two memory region leaks iwere found during testing: 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is called. If ib_alloc_fast_reg_page_list returns an error it bails out of the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails. 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the MR's on the rb_mws list if there are rb_send_bufs present. However, in rpcrdma_buffer_create while the rb_mws list is being built if one of the MR allocation requests fail after some MR's have been allocated on the rb_mws list the routine never gets to create any rb_send_bufs but instead jumps to the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws list because the rb_send_bufs were never created. This leaks all the MR's on the rb_mws list that were created prior to one of the MR allocations failing. Issue(2) was seen during testing. Our adapter had a finite number of MR's available and we created enough connections to where we saw an MR allocation failure on our Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection we noticed that FMR's had been leaked due to the coding error described above. Issue(1) was seen during a code review while debugging issue(2). Signed-off-by: Allen Andrews allen.andr...@emulex.com --- net/sunrpc/xprtrdma/verbs.c | 79 --- 1 files changed, 44 insertions(+), 35 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 9372656..54f592d 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1058,6 +1058,14 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep, dprintk(RPC: %s: ib_alloc_fast_reg_page_list failed %i\n, __func__, rc); + + rc = ib_dereg_mr(r-r.frmr.fr_mr); This clears “rc,” wiping the error returned by ib_alloc_fast_reg_page_list(). Then rpcrdma_buffer_create() returns zero, even though it failed, and xprtrdma thinks the transport is set up and ready for use. Panic. I think you can safely ignore the return code from ib_dereg_mr() here. + if (rc) + dprintk(RPC: %s: + ib_dereg_mr + failed %i\n, + __func__, rc); + goto out; } list_add(r-mw_list, buf-rb_mws); @@ -1194,41 +1202,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf) This line is broken: the “@@“ should start on the next line. Some kind of e-mail snafu? kfree(buf-rb_recv_bufs[i]); } if (buf-rb_send_bufs buf-rb_send_bufs[i]) { - while (!list_empty(buf-rb_mws)) { - r = list_entry(buf-rb_mws.next, - struct rpcrdma_mw, mw_list); - list_del(r-mw_list); - switch (ia-ri_memreg_strategy) { - case RPCRDMA_FRMR: - rc = ib_dereg_mr(r-r.frmr.fr_mr); - if (rc) - dprintk(RPC: %s: - ib_dereg_mr - failed %i\n, - __func__, rc); - ib_free_fast_reg_page_list(r-r.frmr.fr_pgl); - break; - case RPCRDMA_MTHCAFMR: - rc = ib_dealloc_fmr(r-r.fmr); - if (rc) - dprintk(RPC: %s: - ib_dealloc_fmr - failed %i\n, - __func__, rc); - break; - case RPCRDMA_MEMWINDOWS_ASYNC: - case RPCRDMA_MEMWINDOWS: - rc = ib_dealloc_mw(r-r.mw); - if (rc) -
RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
Hi Chuck, Yes that is the case, Following is the trace I got. 4RPC: 355 setting alarm for 6 ms 4RPC: 355 sync task going to sleep 4RPC: xprt_rdma_connect_worker: reconnect 4RPC: rpcrdma_ep_disconnect: rdma_disconnect -1 4RPC: rpcrdma_ep_connect: rpcrdma_ep_disconnect status -1 3ocrdma_mbx_create_qp(0) rq_err 3ocrdma_mbx_create_qp(0) sq_err 3ocrdma_create_qp(0) error=-1 4RPC: rpcrdma_ep_connect: rdma_create_qp failed -1 4RPC: 355 __rpc_wake_up_task (now 4296956756) 4RPC: 355 disabling timer 4RPC: 355 removed from queue 880454578258 xprt_pending 4RPC: __rpc_wake_up_task done 4RPC: xprt_rdma_connect_worker: exit 4RPC: 355 sync task resuming 4RPC: 355 xprt_connect_status: error 1 connecting to server 192.168.1.1 4RPC: wake_up_next(880454578190 xprt_sending) 4RPC: 355 call_connect_status (status -5) 4RPC: 355 return 0, status -5 4RPC: 355 release task 4RPC: wake_up_next(880454578190 xprt_sending) 4RPC: xprt_rdma_free: called on 0x(null) 1BUG: unable to handle kernel NULL pointer dereference at (null) 1IP: [a05b68ac] rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma] 4PGD 454554067 PUD 4665b7067 PMD 0 4Oops: [#1] SMP 4last sysfs file: /sys/devices/pci:00/:00:03.0/:03:00.0/infiniband/ocrdma0/fwerr 4CPU 6 4Modules linked in: xprtrdma(U) nfs lockd fscache auth_rpcgss nfs_acl ocrdma(U) be2net(U) ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 des_generic ecb md4 nls_utf8 cifs sunrpc rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) iw_nes(U) libcrc32c iw_cxgb4(U) cxgb4(U) iw_cxgb3(U) cxgb3(U) mdio ib_qib(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) compat(U) vhost_net macvtap macvlan tun kvm uinput power_meter sg microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb ptp pps_core ioatdma dca i7core_edac edac_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net] 4 4Pid: 3597, comm: ls Not tainted 2.6.32-358.el6.x86_64 #1 Cisco Systems Inc R210-2121605W/R210-2121605W 4RIP: 0010:[a05b68ac] [a05b68ac] rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma] 4RSP: 0018:880465aff9a8 EFLAGS: 00010217 4RAX: 8804673fcc00 RBX: 880466578028 RCX: 4RDX: 880465affa10 RSI: 880465aff9a8 RDI: 4RBP: 880465affa38 R08: R09: 4R10: 000f R11: 000f R12: 880454578598 4R13: 880454578000 R14: 880466578068 R15: 4FS: 7fe61f3107a0() GS:8800368c() knlGS: 4CS: 0010 DS: ES: CR0: 8005003b 4CR2: CR3: 00046520a000 CR4: 07e0 4DR0: DR1: DR2: 4DR3: DR6: 0ff0 DR7: 0400 4Process ls (pid: 3597, threadinfo 880465afe000, task 8804639a5500) 4Stack: 4 880462287370 000a 4d 0802dd3b0002 4d 4Call Trace: 4 [8109c97f] ? up+0x2f/0x50 4 [a05b6a03] rpcrdma_deregister_external+0x113/0x2d0 [xprtrdma] 4 [8150d0d1] ? printk+0x41/0x48 4 [a05b44fc] xprt_rdma_free+0x8c/0x210 [xprtrdma] 4 [81082014] ? mod_timer+0x144/0x220 4 [a05c8a60] xprt_release+0xc0/0x220 [sunrpc] 4 [a05cff5d] rpc_release_resources_task+0x1d/0x50 [sunrpc] 4 [a05d0a84] __rpc_execute+0x174/0x350 [sunrpc] 4 [8150d0d1] ? printk+0x41/0x48 4 [81096b47] ? bit_waitqueue+0x17/0xd0 4 [a05d0cc1] rpc_execute+0x61/0xa0 [sunrpc] 4 [a05c73a5] rpc_run_task+0x75/0x90 [sunrpc] 4 [a05c74c2] rpc_call_sync+0x42/0x70 [sunrpc] 4 [81143a17] ? handle_pte_fault+0x487/0xb50 4 [a074e030] _nfs4_call_sync+0x30/0x40 [nfs] 4 [a07461dc] _nfs4_proc_getattr+0xac/0xc0 [nfs] 4 [a07494be] nfs4_proc_getattr+0x4e/0x70 [nfs] 4 [a072f3e3] __nfs_revalidate_inode+0xe3/0x220 [nfs] 4 [a072fdb6] nfs_getattr+0xb6/0x120 [nfs] 4 [81186951] vfs_getattr+0x51/0x80 4 [811869e0] vfs_fstatat+0x60/0x80 4 [81186b2b] vfs_stat+0x1b/0x20 4 [81186b54] sys_newstat+0x24/0x50 4 [810dc817] ? audit_syscall_entry+0x1d7/0x200 4 [8100b072] system_call_fastpath+0x16/0x1b 4Code: 48 89 85 78 ff ff ff 48 8b 40 08 8b 40 1c 89 45 94 b8 ff ff ff ff f0 41 0f c1 85 e0 05 00 00 49 8b 04 24 48 8d 55 d8 48 8b 78 10 48 8b 07 ff 90 b0 01 00 00