RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
Hi Chuck,

Yes, that is the case. Following is the trace I got:

<4>RPC: 355 setting alarm for 6 ms
<4>RPC: 355 sync task going to sleep
<4>RPC: xprt_rdma_connect_worker: reconnect
<4>RPC: rpcrdma_ep_disconnect: rdma_disconnect -1
<4>RPC: rpcrdma_ep_connect: rpcrdma_ep_disconnect status -1
<3>ocrdma_mbx_create_qp(0) rq_err
<3>ocrdma_mbx_create_qp(0) sq_err
<3>ocrdma_create_qp(0) error=-1
<4>RPC: rpcrdma_ep_connect: rdma_create_qp failed -1
<4>RPC: 355 __rpc_wake_up_task (now 4296956756)
<4>RPC: 355 disabling timer
<4>RPC: 355 removed from queue 880454578258 "xprt_pending"
<4>RPC: __rpc_wake_up_task done
<4>RPC: xprt_rdma_connect_worker: exit
<4>RPC: 355 sync task resuming
<4>RPC: 355 xprt_connect_status: error 1 connecting to server 192.168.1.1
<4>RPC: wake_up_next(880454578190 "xprt_sending")
<4>RPC: 355 call_connect_status (status -5)
<4>RPC: 355 return 0, status -5
<4>RPC: 355 release task
<4>RPC: wake_up_next(880454578190 "xprt_sending")
<4>RPC: xprt_rdma_free: called on 0x(null)
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [] rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma]
<4>PGD 454554067 PUD 4665b7067 PMD 0
<4>Oops: [#1] SMP
<4>last sysfs file: /sys/devices/pci:00/:00:03.0/:03:00.0/infiniband/ocrdma0/fwerr
<4>CPU 6
<4>Modules linked in: xprtrdma(U) nfs lockd fscache auth_rpcgss nfs_acl ocrdma(U) be2net(U) ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 des_generic ecb md4 nls_utf8 cifs sunrpc rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) iw_nes(U) libcrc32c iw_cxgb4(U) cxgb4(U) iw_cxgb3(U) cxgb3(U) mdio ib_qib(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) compat(U) vhost_net macvtap macvlan tun kvm uinput power_meter sg microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb ptp pps_core ioatdma dca i7core_edac edac_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
<4>
<4>Pid: 3597, comm: ls Not tainted 2.6.32-358.el6.x86_64 #1 Cisco Systems Inc R210-2121605W/R210-2121605W
<4>RIP: 0010:[] [] rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma]
<4>RSP: 0018:880465aff9a8 EFLAGS: 00010217
<4>RAX: 8804673fcc00 RBX: 880466578028 RCX:
<4>RDX: 880465affa10 RSI: 880465aff9a8 RDI:
<4>RBP: 880465affa38 R08: R09:
<4>R10: 000f R11: 000f R12: 880454578598
<4>R13: 880454578000 R14: 880466578068 R15:
<4>FS: 7fe61f3107a0() GS:8800368c() knlGS:
<4>CS: 0010 DS: ES: CR0: 8005003b
<4>CR2: CR3: 00046520a000 CR4: 07e0
<4>DR0: DR1: DR2:
<4>DR3: DR6: 0ff0 DR7: 0400
<4>Process ls (pid: 3597, threadinfo 880465afe000, task 8804639a5500)
<4>Stack:
<4> 880462287370 000a
<4> 0802dd3b0002
<4>
<4>Call Trace:
<4> [] ? up+0x2f/0x50
<4> [] rpcrdma_deregister_external+0x113/0x2d0 [xprtrdma]
<4> [] ? printk+0x41/0x48
<4> [] xprt_rdma_free+0x8c/0x210 [xprtrdma]
<4> [] ? mod_timer+0x144/0x220
<4> [] xprt_release+0xc0/0x220 [sunrpc]
<4> [] rpc_release_resources_task+0x1d/0x50 [sunrpc]
<4> [] __rpc_execute+0x174/0x350 [sunrpc]
<4> [] ? printk+0x41/0x48
<4> [] ? bit_waitqueue+0x17/0xd0
<4> [] rpc_execute+0x61/0xa0 [sunrpc]
<4> [] rpc_run_task+0x75/0x90 [sunrpc]
<4> [] rpc_call_sync+0x42/0x70 [sunrpc]
<4> [] ? handle_pte_fault+0x487/0xb50
<4> [] _nfs4_call_sync+0x30/0x40 [nfs]
<4> [] _nfs4_proc_getattr+0xac/0xc0 [nfs]
<4> [] nfs4_proc_getattr+0x4e/0x70 [nfs]
<4> [] __nfs_revalidate_inode+0xe3/0x220 [nfs]
<4> [] nfs_getattr+0xb6/0x120 [nfs]
<4> [] vfs_getattr+0x51/0x80
<4> [] vfs_fstatat+0x60/0x80
<4> [] vfs_stat+0x1b/0x20
<4> [] sys_newstat+0x24/0x50
<4> [] ? audit_syscall_entry+0x1d7/0x200
<4> [] system_call_fastpath+0x16/0x1b
<4>Code: 48 89 85 78 ff ff ff 48 8b 40 08 8b 40 1c 89 45 94 b8 ff ff ff ff f0 41 0f c1 85 e0 05 00 00 49 8b 04 24 48 8d 55 d8 48 8b 78 10 <48> 8b 07 ff 90 b0 01 00 00 85 c0 89 c3 74 09 80 3d 56 dd 03 00
<1>RIP [] rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma]
<4> RSP
<4>CR2:

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.le...@oracle.com]
> Sent: Friday, April 11, 2014 1:24 AM
> To: Devesh Sharma
> Cc: Linux NFS
Re: [PATCH] nfs-rdma: Fix for FMR leaks
Hi Allen-

I reviewed and tested your patch. Comments inline below.

On Apr 9, 2014, at 6:40 PM, Allen Andrews wrote:

> Two memory region leaks were found during testing:
>
> 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMRs, ib_alloc_fast_reg_mr
> is called and then ib_alloc_fast_reg_page_list is called. If
> ib_alloc_fast_reg_page_list returns an error, the routine bails out, dropping
> the last frmr region allocated by ib_alloc_fast_reg_mr and creating a memory
> leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails.
>
> 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the
> MRs on the rb_mws list if there are rb_send_bufs present. However, in
> rpcrdma_buffer_create, while the rb_mws list is being built, if one of the MR
> allocation requests fails after some MRs have been allocated on the rb_mws
> list, the routine never gets to create any rb_send_bufs but instead jumps to
> the rpcrdma_buffer_destroy routine, which will never free the MRs on the
> rb_mws list because the rb_send_bufs were never created. This leaks all the
> MRs on the rb_mws list that were created prior to one of the MR allocations
> failing.
>
> Issue (2) was seen during testing. Our adapter had a finite number of MRs
> available, and we created enough connections that we saw an MR allocation
> failure on our Nth NFS connection request. After the kernel cleaned up the
> resources it had allocated for the Nth connection, we noticed that FMRs had
> been leaked due to the coding error described above.
>
> Issue (1) was found during a code review while debugging issue (2).
>
> Signed-off-by: Allen Andrews
> ---
>
> net/sunrpc/xprtrdma/verbs.c | 79 ---
> 1 files changed, 44 insertions(+), 35 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 9372656..54f592d 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1058,6 +1058,14 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
> 				dprintk("RPC: %s: "
> 					"ib_alloc_fast_reg_page_list "
> 					"failed %i\n", __func__, rc);
> +
> +				rc = ib_dereg_mr(r->r.frmr.fr_mr);

This clears "rc", wiping the error returned by ib_alloc_fast_reg_page_list(). Then rpcrdma_buffer_create() returns zero, even though it failed, and xprtrdma thinks the transport is set up and ready for use. Panic. I think you can safely ignore the return code from ib_dereg_mr() here.

> +				if (rc)
> +					dprintk("RPC: %s:"
> +						" ib_dereg_mr"
> +						" failed %i\n",
> +						__func__, rc);
> +
> 				goto out;
> 			}
> 			list_add(&r->mw_list, &buf->rb_mws); @@ -1194,41 +1202,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)

This line is broken: the "@@" should start on the next line. Some kind of e-mail snafu?

> 			kfree(buf->rb_recv_bufs[i]);
> 		}
> 		if (buf->rb_send_bufs && buf->rb_send_bufs[i]) {
> -			while (!list_empty(&buf->rb_mws)) {
> -				r = list_entry(buf->rb_mws.next,
> -					struct rpcrdma_mw, mw_list);
> -				list_del(&r->mw_list);
> -				switch (ia->ri_memreg_strategy) {
> -				case RPCRDMA_FRMR:
> -					rc = ib_dereg_mr(r->r.frmr.fr_mr);
> -					if (rc)
> -						dprintk("RPC: %s:"
> -							" ib_dereg_mr"
> -							" failed %i\n",
> -							__func__, rc);
> -					ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> -					break;
> -				case RPCRDMA_MTHCAFMR:
> -					rc = ib_dealloc_fmr(r->r.fmr);
> -					if (rc)
> -						dprintk("RPC: %s:"
> -							" ib_dealloc_fmr"
> -							" failed %i\n",
> -							__func__, rc);
> -					break;
> -				case RPCRDMA_MEMWINDOWS_ASYNC:
> -				case RPCRDMA_MEMWINDOWS:
> -					rc = ib_dealloc_mw(r->r.mw);
>
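Issue (2) above boils down to a classic partial-construction hazard: teardown was gated on a structure (rb_send_bufs) that a mid-list allocation failure never creates. The hazard can be sketched outside the kernel; this is a hypothetical toy in plain C, not xprtrdma code, where an unconditional walk of the list frees exactly what was allocated:

```c
#include <stdlib.h>
#include <assert.h>

/* Toy version of the rb_mws leak: build a list of "MRs", fail partway
 * through, and make sure teardown walks the list unconditionally
 * instead of only running when later buffers were also created. */
struct mr_node { struct mr_node *next; };

static struct mr_node *mws;	/* stand-in for the rb_mws list */

/* Allocate `want` nodes; simulate an allocation failure at `fail_at`.
 * Reports via *allocated how many nodes made it onto the list. */
static int alloc_mrs(int want, int fail_at, int *allocated)
{
	int i;

	*allocated = 0;
	for (i = 0; i < want; i++) {
		struct mr_node *r;

		if (i == fail_at)
			return -1;	/* simulated ib_alloc_fast_reg_mr failure */
		r = malloc(sizeof(*r));
		if (!r)
			return -1;
		r->next = mws;
		mws = r;
		(*allocated)++;
	}
	return 0;
}

/* Unconditional teardown: frees whatever made it onto the list and
 * returns the count, so a leak would show up as a short count. */
static int destroy_mrs(void)
{
	int n = 0;

	while (mws) {
		struct mr_node *r = mws;

		mws = mws->next;
		free(r);
		n++;
	}
	return n;
}
```

The point of the sketch is that `destroy_mrs()` does not care whether the later allocation stage ever ran; it frees the five nodes that exist even though the sixth allocation failed.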
Re: [PATCH] xprtrdma: mind the device's max fast register page list depth
On 4/11/2014 1:34 PM, Chuck Lever wrote:
> Hi Steve-
>
> I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR. A couple of cosmetic notes are below:
>
> On Apr 10, 2014, at 11:13 AM, Steve Wise wrote:
>> Some rdma devices don't support a fast register page list depth of
>> at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast
>> register regions according to the minimum of the device max supported
>> depth or RPCRDMA_MAX_DATA_SEGS.
>>
>> Signed-off-by: Steve Wise
>> ---
>>
>> net/sunrpc/xprtrdma/rpc_rdma.c | 2 -
>> net/sunrpc/xprtrdma/verbs.c | 64 ---
>> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
>> 3 files changed, 47 insertions(+), 20 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index 96ead52..14d7634 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
>> 	req->rl_nchunks = nchunks;
>>
>> 	BUG_ON(nchunks == 0);
>
> I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here. If you agree, can you remove the above BUG_ON as well, if you re-submit a new version of this patch?

Sure.

>> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>> -		&& (nchunks > 3));
>>
>> 	/*
>> 	 * finish off header. If write, marshal discrim and nchunks.
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 93726560..c08f193 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>> 				__func__);
>> 			memreg = RPCRDMA_REGISTER;
>> #endif
>> +		} else {
>> +			/* Mind the ia limit on FRMR page list depth */
>> +			ia->ri_max_frmr_depth = min_t(unsigned int,
>> +				RPCRDMA_MAX_DATA_SEGS,
>> +				devattr.max_fast_reg_page_list_len);
>> 		}
>> 		break;
>> 	}
>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>> 	ep->rep_attr.srq = NULL;
>> 	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>> 	switch (ia->ri_memreg_strategy) {
>> -	case RPCRDMA_FRMR:
>> +	case RPCRDMA_FRMR: {
>> +		int depth = 7;
>> +
>> 		/* Add room for frmr register and invalidate WRs.
>> 		 * 1. FRMR reg WR for head
>> 		 * 2. FRMR invalidate WR for head
>> -		 * 3. FRMR reg WR for pagelist
>> -		 * 4. FRMR invalidate WR for pagelist
>> +		 * 3. N FRMR reg WRs for pagelist
>> +		 * 4. N FRMR invalidate WRs for pagelist
>> 		 * 5. FRMR reg WR for tail
>> 		 * 6. FRMR invalidate WR for tail
>> 		 * 7. The RDMA_SEND WR
>> 		 */
>> -		ep->rep_attr.cap.max_send_wr *= 7;
>> +
>> +		/* Calculate N if the device max FRMR depth is smaller than
>> +		 * RPCRDMA_MAX_DATA_SEGS.
>> +		 */
>> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>> +			int delta = RPCRDMA_MAX_DATA_SEGS -
>> +				    ia->ri_max_frmr_depth;
>> +
>> +			do {
>> +				depth += 2; /* FRMR reg + invalidate */
>> +				delta -= ia->ri_max_frmr_depth;
>> +			} while (delta > 0);
>> +
>> +		}
>> +		ep->rep_attr.cap.max_send_wr *= depth;
>> 		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>> -			cdata->max_requests = devattr.max_qp_wr / 7;
>> +			cdata->max_requests = devattr.max_qp_wr / depth;
>> 			if (!cdata->max_requests)
>> 				return -EINVAL;
>> -			ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
>> +			ep->rep_attr.cap.max_send_wr = cdata->max_requests *
>> +						       depth;
>> 		}
>> 		break;
>> +	}
>> 	case RPCRDMA_MEMWINDOWS_ASYNC:
>> 	case RPCRDMA_MEMWINDOWS:
>> 		/* Add room for mw_binds+unbinds - overkill! */
>> @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
>> 	case RPCRDMA_FRMR:
>> 		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
>> 			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>> -						RPCRDMA_MAX_SEGS);
>> +						ia->ri_max_frmr_depth);
>> 			if (IS_ERR(r->r.frmr.fr_mr)) {
>> 				rc = PTR_ERR(r->r.frmr.fr_mr);
>> 				dprintk("RPC: %s: ib_alloc_fast_reg_mr"
>>
Re: [PATCH] xprtrdma: mind the device's max fast register page list depth
Hi Steve-

I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR. A couple of cosmetic notes are below:

On Apr 10, 2014, at 11:13 AM, Steve Wise wrote:

> Some rdma devices don't support a fast register page list depth of
> at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast
> register regions according to the minimum of the device max supported
> depth or RPCRDMA_MAX_DATA_SEGS.
>
> Signed-off-by: Steve Wise
> ---
>
> net/sunrpc/xprtrdma/rpc_rdma.c | 2 -
> net/sunrpc/xprtrdma/verbs.c | 64 ---
> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
> 3 files changed, 47 insertions(+), 20 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 96ead52..14d7634 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
> 	req->rl_nchunks = nchunks;
>
> 	BUG_ON(nchunks == 0);

I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here. If you agree, can you remove the above BUG_ON as well, if you re-submit a new version of this patch?

> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
> -		&& (nchunks > 3));
>
> 	/*
> 	 * finish off header. If write, marshal discrim and nchunks.
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 93726560..c08f193 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
> 				__func__);
> 			memreg = RPCRDMA_REGISTER;
> #endif
> +		} else {
> +			/* Mind the ia limit on FRMR page list depth */
> +			ia->ri_max_frmr_depth = min_t(unsigned int,
> +				RPCRDMA_MAX_DATA_SEGS,
> +				devattr.max_fast_reg_page_list_len);
> 		}
> 		break;
> 	}
> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
> 	ep->rep_attr.srq = NULL;
> 	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
> 	switch (ia->ri_memreg_strategy) {
> -	case RPCRDMA_FRMR:
> +	case RPCRDMA_FRMR: {
> +		int depth = 7;
> +
> 		/* Add room for frmr register and invalidate WRs.
> 		 * 1. FRMR reg WR for head
> 		 * 2. FRMR invalidate WR for head
> -		 * 3. FRMR reg WR for pagelist
> -		 * 4. FRMR invalidate WR for pagelist
> +		 * 3. N FRMR reg WRs for pagelist
> +		 * 4. N FRMR invalidate WRs for pagelist
> 		 * 5. FRMR reg WR for tail
> 		 * 6. FRMR invalidate WR for tail
> 		 * 7. The RDMA_SEND WR
> 		 */
> -		ep->rep_attr.cap.max_send_wr *= 7;
> +
> +		/* Calculate N if the device max FRMR depth is smaller than
> +		 * RPCRDMA_MAX_DATA_SEGS.
> +		 */
> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
> +			int delta = RPCRDMA_MAX_DATA_SEGS -
> +				    ia->ri_max_frmr_depth;
> +
> +			do {
> +				depth += 2; /* FRMR reg + invalidate */
> +				delta -= ia->ri_max_frmr_depth;
> +			} while (delta > 0);
> +
> +		}
> +		ep->rep_attr.cap.max_send_wr *= depth;
> 		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
> -			cdata->max_requests = devattr.max_qp_wr / 7;
> +			cdata->max_requests = devattr.max_qp_wr / depth;
> 			if (!cdata->max_requests)
> 				return -EINVAL;
> -			ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
> +			ep->rep_attr.cap.max_send_wr = cdata->max_requests *
> +						       depth;
> 		}
> 		break;
> +	}
> 	case RPCRDMA_MEMWINDOWS_ASYNC:
> 	case RPCRDMA_MEMWINDOWS:
> 		/* Add room for mw_binds+unbinds - overkill! */
> @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
> 	case RPCRDMA_FRMR:
> 		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
> 			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
> -						RPCRDMA_MAX_SEGS);
> +						ia->ri_max_frmr_depth);
> 			if (IS_ERR(r->r.frmr.fr_mr)) {
> 				rc = PTR_ERR(r->r.frmr.fr_mr);
> 				dprintk("RPC: %s: ib_alloc_fast_reg_mr"
>
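The effect of the patch's depth calculation can be checked in isolation. The sketch below mirrors the loop from the diff in a standalone function; RPCRDMA_MAX_DATA_SEGS is assumed to be 64 here purely for illustration (the real constant lives in xprt_rdma.h). For a device whose max FRMR page list depth is at least RPCRDMA_MAX_DATA_SEGS, the send-WR multiplier stays at the original 7; smaller depths add one reg/invalidate WR pair per extra FRMR needed:

```c
#include <assert.h>

/* Illustrative value only; the real constant is in xprt_rdma.h. */
#define RPCRDMA_MAX_DATA_SEGS 64

/* Standalone mirror of the patch's send-WR depth computation:
 * 7 WRs per RPC, plus one FRMR reg+invalidate pair for every
 * additional FRMR needed when the device depth is below
 * RPCRDMA_MAX_DATA_SEGS. */
static int frmr_send_wr_depth(int max_frmr_depth)
{
	int depth = 7;

	if (max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
		int delta = RPCRDMA_MAX_DATA_SEGS - max_frmr_depth;

		do {
			depth += 2;	/* FRMR reg + invalidate */
			delta -= max_frmr_depth;
		} while (delta > 0);
	}
	return depth;
}
```

With a device depth of 64 or more the multiplier stays at 7, exactly as before the patch; a depth of 16 needs three extra FRMRs to cover a 64-segment payload, giving a multiplier of 13.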
Re: [PATCH] Clean index map on exit
> Is the array always full ?

No, but it has to be zero-initialized, and free(NULL) is fine. idm_set() does:

	if (!idm->array[idx_array_index(index)]) {
		if (idm_grow(idm, index) < 0)

(idm_grow does calloc().) So I concluded unused entries have to be zero already, and I don't put a new constraint on struct index_map.

> is the maximum index of the array IDX_ARRAY_SIZE or could it be lower ?

It's fixed:

	struct index_map {
		void **array[IDX_ARRAY_SIZE];
	};

> And what about idm, is it free()'d somewhere else ?

The owner has to take care of its struct index_map. I can only free the resources the map itself holds. For what it's worth: all struct index_maps are declared statically.

Again, thank you for your quick feedback.

Best regards,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
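The two properties Hannes relies on here are both guaranteed by C: objects with static storage duration are zero-initialized, and free(NULL) is defined to be a no-op. A simplified, single-level demonstration (the real struct index_map in indexer.h is two-level, `void **array[IDX_ARRAY_SIZE]`, and the size below is illustrative):

```c
#include <stdlib.h>
#include <assert.h>

#define IDX_ARRAY_SIZE 8	/* illustrative; the real value is in indexer.h */

/* Simplified one-level index map; the real one holds void** slabs. */
struct index_map {
	void *array[IDX_ARRAY_SIZE];
};

/* Static storage duration => every slot starts out NULL, so a blanket
 * free() over the whole array is safe even if most slots were never
 * populated -- exactly what the proposed idm_free() depends on. */
static struct index_map idm;

static void idm_free_demo(struct index_map *map)
{
	size_t i;

	for (i = 0; i < IDX_ARRAY_SIZE; i++) {
		free(map->array[i]);	/* free(NULL) is a no-op */
		map->array[i] = NULL;
	}
}
```

Populating only one slot and then freeing the whole map exercises both guarantees at once: the seven untouched slots pass NULL to free() without harm.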
Re: [PATCH] Add cleanup routines.
Am 11.04.2014 um 19:00 schrieb Yann Droneaud:

> Hi,

Thanks for your quick reply.

> Please add some explanation, in particular the purpose of the changes.
> Your commit message only explains "how", but you should also explain
> "why".

Ok.

> In C, an empty prototype declares a function which accepts any parameters.
> So perhaps void ibverbs(void) is what you mean.

Yup, thanks.

>> +{
>> +	size_t i;
>> +	for (i = 0; i < num_devices; i++) {
>> +		/* driver callback needed. May not be malloc'd memory */
>
> Seems dangerous. Such interrogation must be explicitly added in the
> commit message.

Maybe it's better to leave it as a TODO and only free the device_list itself?

>> +	for (driver = head_driver; driver;) {
>> +		struct ibv_driver *tmp = driver;
>> +		driver = driver->next;
>> +		free(tmp);
>> +	}
>> +
>
> Is it safe here to free the driver ?

The struct itself is no longer needed. All the driver libraries are loaded and ibverbs_init() is call-once, so no new libraries are dlopen()ed (and call ibv_register_driver()). It's probably nicer first to dlclose() all libraries and then free all struct ibv_drivers.

>> +	for (list = so_list; list;) {
>> +		struct ibv_so_list *tmp = list;
>> +		list = list->next;
>> +		dlclose(tmp->dlhandle);
>> +		free(tmp);
>> +	}
>> +}
>
> Why not store the dlopen() handle in the struct ibv_driver itself ?
> This way only one list has to be scanned.

That was my first idea, but the struct ibv_driver doesn't exist at that time. First the (driver) library is dlopen()ed and calls ibv_register_driver(). Only there is the struct ibv_driver allocated, but by then the handle is gone.

Best regards,
Hannes
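Yann's suggestion of keeping the dlopen() handle in the driver struct so teardown is a single list walk can be sketched as below. This is a hypothetical illustration, not libibverbs code: the chicken-and-egg problem Hannes describes (the struct is only allocated inside ibv_register_driver(), after dlopen() has returned) is assumed away, and dlclose() is left as a comment so the sketch stays self-contained:

```c
#include <stdlib.h>

/* Hypothetical merged node: driver metadata plus its dlopen() handle
 * in one struct, so cleanup needs only one list traversal. */
struct driver_node {
	void *dlhandle;			/* would hold the dlopen() result */
	struct driver_node *next;
};

static struct driver_node *head_driver;

static void add_driver(void *dlhandle)
{
	struct driver_node *d = malloc(sizeof(*d));

	if (!d)
		return;
	d->dlhandle = dlhandle;
	d->next = head_driver;
	head_driver = d;
}

/* Single-pass teardown using the same tmp-pointer idiom as the patch;
 * returns how many nodes were freed.  In real code the commented line
 * would be dlclose(tmp->dlhandle). */
static int unload_drivers(void)
{
	int n = 0;
	struct driver_node *driver;

	for (driver = head_driver; driver;) {
		struct driver_node *tmp = driver;

		driver = driver->next;
		/* dlclose(tmp->dlhandle); */
		free(tmp);
		n++;
	}
	head_driver = NULL;
	return n;
}
```

The tmp-pointer idiom matters: advancing `driver` before freeing `tmp` avoids reading `driver->next` out of freed memory.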
Re: [PATCH] Clean index map on exit
Le vendredi 11 avril 2014 à 18:58 +0200, Hannes Weisbach a écrit :

> This patch adds the function idm_free() to free all entries of an index
> map. A call to this function is added in the ucma_cleanup destructor.
> The ucma_idm struct index_map is cleaned.

Please explain the purpose: what's your goal ?

> Signed-off-by: Hannes Weisbach
> ---
> src/cma.c     | 1 +
> src/indexer.c | 8 ++++++++
> src/indexer.h | 1 +
> 3 files changed, 10 insertions(+)
>
> diff --git a/src/cma.c b/src/cma.c
> index 0cf4203..a533bf9 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -139,6 +139,7 @@ static void ucma_cleanup(void)
> 			ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
> 	}
>
> +	idm_free(&ucma_idm);
> 	fastlock_destroy(&idm_lock);
> 	free(cma_dev_array);
> 	cma_dev_cnt = 0;
> diff --git a/src/indexer.c b/src/indexer.c
> index c8e8bce..4d1fd77 100644
> --- a/src/indexer.c
> +++ b/src/indexer.c
> @@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
> 	entry[idx_entry_index(index)] = NULL;
> 	return item;
> }
> +
> +void idm_free(struct index_map *idm)
> +{
> +	size_t i;
> +	for (i = 0; i < IDX_ARRAY_SIZE; i++) {
> +		free(idm->array[i]);
> +	}
> +}

Is the array always full ? Is the maximum index of the array IDX_ARRAY_SIZE, or could it be lower ? And what about idm, is it free()'d somewhere else ?

> diff --git a/src/indexer.h b/src/indexer.h
> index 0c5f388..829d46b 100644
> --- a/src/indexer.h
> +++ b/src/indexer.h
> @@ -89,6 +89,7 @@ struct index_map
>
> int idm_set(struct index_map *idm, int index, void *item);
> void *idm_clear(struct index_map *idm, int index);
> +void idm_free(struct index_map *idm);
>
> static inline void *idm_at(struct index_map *idm, int index)
> {

Regards.

--
Yann Droneaud
OPTEYA
Re: [PATCH] Add cleanup routines.
Hi,

Le vendredi 11 avril 2014 à 18:50 +0200, Hannes Weisbach a écrit :

> Dynamically loaded library handles are saved in a list and dlclose()d on
> exit. The list of struct ibv_driver *, as well as the global
> struct ibv_device ** list, are free()d.

Please add some explanation, in particular the purpose of the changes. Your commit message only explains "how", but you should also explain "why".

> Signed-off-by: Hannes Weisbach
> ---
> src/device.c | 10 ++
> src/init.c   | 51 ++-
> 2 files changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/src/device.c b/src/device.c
> index beb7b3c..d5b76bb 100644
> --- a/src/device.c
> +++ b/src/device.c
> @@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
> 	}
> }
> default_symver(__ibv_ack_async_event, ibv_ack_async_event);
> +
> +FINI static void ibverbs_deinit()

In C, an empty prototype declares a function which accepts any parameters. So perhaps void ibverbs(void) is what you mean.

> +{
> +	size_t i;
> +	for (i = 0; i < num_devices; i++) {
> +		/* driver callback needed. May not be malloc'd memory */

Seems dangerous. Such interrogation must be explicitly added in the commit message.

> +		free(device_list[i]);
> +	}
> +	free(device_list);
> +}
> diff --git a/src/init.c b/src/init.c
> index d0e4b1c..2a8bca4 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -67,6 +67,11 @@ struct ibv_driver_name {
> 	struct ibv_driver_name *next;
> };
>
> +struct ibv_so_list {
> +	void *dlhandle;
> +	struct ibv_so_list *next;
> +};
> +
> struct ibv_driver {
> 	const char *name;
> 	ibv_driver_init_func	init_func;
> @@ -77,6 +82,7 @@ struct ibv_driver {
> static struct ibv_sysfs_dev *sysfs_dev_list;
> static struct ibv_driver_name *driver_name_list;
> static struct ibv_driver *head_driver, *tail_driver;
> +static struct ibv_so_list *so_list;
>
> static int find_sysfs_devs(void)
> {
> @@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, verbs_driver_init_func init_func)
> static void load_driver(const char *name)
> {
> 	char *so_name;
> -	void *dlhandle;
> +	struct ibv_so_list *elem;
> +	struct ibv_so_list **list;
> +
> +	elem = malloc(sizeof(*elem));
> +	if(!elem)
> +		return;
> +
> +	elem->next = NULL;
>
> #define __IBV_QUOTE(x) #x
> #define IBV_QUOTE(x) __IBV_QUOTE(x)
> @@ -205,16 +218,25 @@ static void load_driver(const char *name)
> 		   name) < 0) {
> 		fprintf(stderr, PFX "Warning: couldn't load driver '%s'.\n",
> 			name);
> -		return;
> +		goto err;
> 	}
>
> -	dlhandle = dlopen(so_name, RTLD_NOW);
> -	if (!dlhandle) {
> +	elem->dlhandle = dlopen(so_name, RTLD_NOW);
> +	if (!elem->dlhandle) {
> 		fprintf(stderr, PFX "Warning: couldn't load driver '%s': %s\n",
> 			name, dlerror());
> -		goto out;
> +		goto err;
> 	}
>
> +	for (list = &so_list; *list; list = &(*list)->next)
> +		;
> +
> +	*list = elem;
> +
> +	goto out;
> +
> +err:
> +	free(elem);
> out:
> 	free(so_name);
> }
> @@ -573,3 +595,22 @@ out:
>
> 	return num_devices;
> }
> +
> +FINI static void unload_drivers()

Same remark about the prototype.

> +{
> +	struct ibv_driver *driver;
> +	struct ibv_so_list *list;
> +
> +	for (driver = head_driver; driver;) {
> +		struct ibv_driver *tmp = driver;
> +		driver = driver->next;
> +		free(tmp);
> +	}
> +

Is it safe here to free the driver ?

> +	for (list = so_list; list;) {
> +		struct ibv_so_list *tmp = list;
> +		list = list->next;
> +		dlclose(tmp->dlhandle);
> +		free(tmp);
> +	}
> +}

Why not store the dlopen() handle in the struct ibv_driver itself ? This way only one list has to be scanned.

Regards.

--
Yann Droneaud
OPTEYA
[PATCH] Clean index map on exit
This patch adds the function idm_free() to free all entries of an index map. A call to this function is added in the ucma_cleanup destructor. The ucma_idm struct index_map is cleaned.

Signed-off-by: Hannes Weisbach
---
src/cma.c     | 1 +
src/indexer.c | 8 ++++++++
src/indexer.h | 1 +
3 files changed, 10 insertions(+)

diff --git a/src/cma.c b/src/cma.c
index 0cf4203..a533bf9 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -139,6 +139,7 @@ static void ucma_cleanup(void)
 			ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
 	}
 
+	idm_free(&ucma_idm);
 	fastlock_destroy(&idm_lock);
 	free(cma_dev_array);
 	cma_dev_cnt = 0;
diff --git a/src/indexer.c b/src/indexer.c
index c8e8bce..4d1fd77 100644
--- a/src/indexer.c
+++ b/src/indexer.c
@@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
 	entry[idx_entry_index(index)] = NULL;
 	return item;
 }
+
+void idm_free(struct index_map *idm)
+{
+	size_t i;
+	for (i = 0; i < IDX_ARRAY_SIZE; i++) {
+		free(idm->array[i]);
+	}
+}
diff --git a/src/indexer.h b/src/indexer.h
index 0c5f388..829d46b 100644
--- a/src/indexer.h
+++ b/src/indexer.h
@@ -89,6 +89,7 @@ struct index_map
 
 int idm_set(struct index_map *idm, int index, void *item);
 void *idm_clear(struct index_map *idm, int index);
+void idm_free(struct index_map *idm);
 
 static inline void *idm_at(struct index_map *idm, int index)
 {
-- 
1.8.5.5
[PATCH] Add cleanup routines.
Dynamically loaded library handles are saved in a list and dlclose()d on exit. The list of struct ibv_driver *, as well as the global struct ibv_device ** list, are free()d.

Signed-off-by: Hannes Weisbach
---
src/device.c | 10 ++
src/init.c   | 51 ++-
2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/device.c b/src/device.c
index beb7b3c..d5b76bb 100644
--- a/src/device.c
+++ b/src/device.c
@@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
 	}
 }
 default_symver(__ibv_ack_async_event, ibv_ack_async_event);
+
+FINI static void ibverbs_deinit()
+{
+	size_t i;
+	for (i = 0; i < num_devices; i++) {
+		/* driver callback needed. May not be malloc'd memory */
+		free(device_list[i]);
+	}
+	free(device_list);
+}
diff --git a/src/init.c b/src/init.c
index d0e4b1c..2a8bca4 100644
--- a/src/init.c
+++ b/src/init.c
@@ -67,6 +67,11 @@ struct ibv_driver_name {
 	struct ibv_driver_name *next;
 };
 
+struct ibv_so_list {
+	void *dlhandle;
+	struct ibv_so_list *next;
+};
+
 struct ibv_driver {
 	const char *name;
 	ibv_driver_init_func	init_func;
@@ -77,6 +82,7 @@ struct ibv_driver {
 static struct ibv_sysfs_dev *sysfs_dev_list;
 static struct ibv_driver_name *driver_name_list;
 static struct ibv_driver *head_driver, *tail_driver;
+static struct ibv_so_list *so_list;
 
 static int find_sysfs_devs(void)
 {
@@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, verbs_driver_init_func init_func)
 static void load_driver(const char *name)
 {
 	char *so_name;
-	void *dlhandle;
+	struct ibv_so_list *elem;
+	struct ibv_so_list **list;
+
+	elem = malloc(sizeof(*elem));
+	if(!elem)
+		return;
+
+	elem->next = NULL;
 
 #define __IBV_QUOTE(x) #x
 #define IBV_QUOTE(x) __IBV_QUOTE(x)
@@ -205,16 +218,25 @@ static void load_driver(const char *name)
 		   name) < 0) {
 		fprintf(stderr, PFX "Warning: couldn't load driver '%s'.\n",
 			name);
-		return;
+		goto err;
 	}
 
-	dlhandle = dlopen(so_name, RTLD_NOW);
-	if (!dlhandle) {
+	elem->dlhandle = dlopen(so_name, RTLD_NOW);
+	if (!elem->dlhandle) {
 		fprintf(stderr, PFX "Warning: couldn't load driver '%s': %s\n",
 			name, dlerror());
-		goto out;
+		goto err;
 	}
 
+	for (list = &so_list; *list; list = &(*list)->next)
+		;
+
+	*list = elem;
+
+	goto out;
+
+err:
+	free(elem);
 out:
 	free(so_name);
 }
@@ -573,3 +595,22 @@ out:
 
 	return num_devices;
 }
+
+FINI static void unload_drivers()
+{
+	struct ibv_driver *driver;
+	struct ibv_so_list *list;
+
+	for (driver = head_driver; driver;) {
+		struct ibv_driver *tmp = driver;
+		driver = driver->next;
+		free(tmp);
+	}
+
+	for (list = so_list; list;) {
+		struct ibv_so_list *tmp = list;
+		list = list->next;
+		dlclose(tmp->dlhandle);
+		free(tmp);
+	}
+}
-- 
1.8.5.5
Re: Rare Duplicate Completions
Bart,

Thanks for the help. Unfortunately, it looks as if all of my message-passing and RDMA operations use the SIGNALED flag. I'm checking the wc->status flag (== IBV_WC_SUCCESS), and it and everything in the message appear to be completely identical to the first completion. I notice that it only seems to happen when I have server processes (i.e., listeners) that also open active connections to other listeners as well as accept connections from client processes. I perform both RDMA and message-passing across those server-to-server connections, so on a hunch I tried rewriting the RDMA operations as message-passing, but the problem still occurs. Thanks in advance for any other thoughts you might have.

Cheers,
Christopher

On Thu, Apr 10, 2014 at 5:42 PM, Christopher Mitchell wrote:
> Bart,
>
> Thanks for the help. Unfortunately, it looks as if all of my message-passing
> and RDMA operations use the SIGNALED flag. I'm checking the wc->status flag
> (== IBV_WC_SUCCESS), and it and everything in the message appear to be
> completely identical to the first completion. I notice that it only seems to
> happen when I have server processes (i.e., listeners) that also open active
> connections to other listeners as well as accept connections from client
> processes. I perform both RDMA and message-passing across those
> server-to-server connections, so on a hunch I tried rewriting the RDMA
> operations as message-passing, but the problem still occurs. Thanks in
> advance for any other thoughts you might have.
>
> Cheers,
> Christopher
>
> On Mon, Apr 7, 2014 at 9:01 PM, Bart Van Assche wrote:
>>
>> On 4/04/2014 10:00, Christopher Mitchell wrote:
>>>
>>> I am working on building a distributed Infiniband application, testing
>>> using Mellanox Connect-X HCAs. In very rare cases, perhaps once in a
>>> few million operations, I appear to be receiving duplicate completions
>>> or incorrect completions. For instance, I'll send out an RDMA request
>>> and receive a completion for a Verb message response I had just
>>> handled, or send a Verb message request and receive a duplicate Verb
>>> message completion. Needless to say, this is introducing instability
>>> in my application. Does anyone have experience with a bug like this,
>>> or am I encountering some arcane issue in how I'm manipulating the
>>> HCA? I'd be more than happy to furnish relevant sections of code to
>>> help nail down the issue.
>>
>> Hello Christopher,
>>
>> In e.g. the SCST SRP target driver there is code present that checks for
>> duplicate and/or missing completions. Although this code is being used
>> intensively I have not yet seen any error messages being logged by the code
>> that verifies completions. Note: something that is nontrivial in the RDMA
>> API and that you might already be aware of is that even for non-signaled
>> work requests a completion is delivered if that work request fails. If these
>> non-signaled work requests do not have a unique wr_id, error completions for
>> these requests might be misinterpreted as duplicate completions.
>>
>> Bart.
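Bart's advice about unique wr_id values suggests a cheap way to make duplicate completions observable in an application like Christopher's: keep a table of outstanding work-request IDs and flag any completion that does not match an outstanding request. This is a hypothetical, verbs-free sketch of the bookkeeping only (all names are invented; a real application would fill wr_id in struct ibv_send_wr and read it back from struct ibv_wc):

```c
#include <assert.h>
#include <stdint.h>

/* Toy duplicate-completion detector: each posted work request gets a
 * monotonically increasing wr_id, and a slot table remembers which
 * IDs are still outstanding.  Completing an ID twice is caught. */
#define MAX_WRS 64	/* illustrative queue depth */

static uint8_t wr_outstanding[MAX_WRS];
static uint64_t next_wr_id;

/* "Post" a work request: hand out a unique wr_id and mark its slot.
 * Assumes no more than MAX_WRS requests are in flight at once. */
static uint64_t post_wr(void)
{
	uint64_t id = next_wr_id++;

	wr_outstanding[id % MAX_WRS] = 1;
	return id;
}

/* "Complete" a work request: returns 0 on the first completion of an
 * outstanding ID, -1 on a duplicate or unknown completion. */
static int complete_wr(uint64_t wr_id)
{
	if (!wr_outstanding[wr_id % MAX_WRS])
		return -1;	/* duplicate or never posted */
	wr_outstanding[wr_id % MAX_WRS] = 0;
	return 0;
}
```

In a real poller, the -1 path would log the wr_id and wc.status so a rare duplicate (or a misattributed error completion from a non-signaled request) is visible instead of silently corrupting state.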
[PATCH opensm] osm_sa_mad_ctrl.c: In sa_mad_ctrl_rcv_callback, improve 1A04 error log message to include more info on incoming SA MAD

Signed-off-by: Hal Rosenstock
---
diff --git a/opensm/osm_sa_mad_ctrl.c b/opensm/osm_sa_mad_ctrl.c
index 902803e..61fcb0b 100644
--- a/opensm/osm_sa_mad_ctrl.c
+++ b/opensm/osm_sa_mad_ctrl.c
@@ -331,9 +331,12 @@ static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * p_madw, IN void *context,
 	if (p_sa_mad->sm_key != 0 &&
 	    p_sa_mad->sm_key != p_ctrl->p_subn->opt.sa_key) {
 		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 1A04: "
-			"Non-Zero SA MAD SM_Key: 0x%" PRIx64 " != SM_Key: 0x%"
-			PRIx64 "; MAD ignored\n", cl_ntoh64(p_sa_mad->sm_key),
-			cl_ntoh64(p_ctrl->p_subn->opt.sa_key));
+			"Non-Zero MAD SM_Key: 0x%" PRIx64 " != SM_Key: 0x%"
+			PRIx64 "; SA MAD ignored for method 0x%X attribute 0x%X (%s)\n",
+			cl_ntoh64(p_sa_mad->sm_key),
+			cl_ntoh64(p_ctrl->p_subn->opt.sa_key),
+			p_sa_mad->method, cl_ntoh16(p_sa_mad->attr_id),
+			ib_get_sa_attr_str(p_sa_mad->attr_id));
 		osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
 		goto Exit;
 	}
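The new log line mixes 64-bit keys (via the PRIx64 macros) with the method and attribute fields. A standalone sketch of just the format string, with snprintf standing in for OSM_LOG and invented sample values, shows what the improved message looks like:

```c
#include <stdio.h>
#include <inttypes.h>

/* Hypothetical stand-in for the OSM_LOG call: formats the improved
 * 1A04 message into a caller-supplied buffer.  All values here are
 * invented samples, not real opensm data. */
static int format_1a04(char *buf, size_t len,
		       uint64_t mad_key, uint64_t sa_key,
		       unsigned method, unsigned attr_id,
		       const char *attr_str)
{
	return snprintf(buf, len,
			"ERR 1A04: Non-Zero MAD SM_Key: 0x%" PRIx64
			" != SM_Key: 0x%" PRIx64
			"; SA MAD ignored for method 0x%X attribute 0x%X (%s)",
			mad_key, sa_key, method, attr_id, attr_str);
}
```

Compared to the old "MAD ignored" text, the extra method/attribute fields let an administrator tell at a glance which query a client sent with the wrong SA key.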