RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

2014-04-11 Thread Devesh Sharma
Hi Chuck,
Yes, that is the case. Following is the trace I got.

<4>RPC:   355 setting alarm for 6 ms
<4>RPC:   355 sync task going to sleep
<4>RPC:   xprt_rdma_connect_worker: reconnect
<4>RPC:   rpcrdma_ep_disconnect: rdma_disconnect -1
<4>RPC:   rpcrdma_ep_connect: rpcrdma_ep_disconnect status -1
<3>ocrdma_mbx_create_qp(0) rq_err
<3>ocrdma_mbx_create_qp(0) sq_err
<3>ocrdma_create_qp(0) error=-1
<4>RPC:   rpcrdma_ep_connect: rdma_create_qp failed -1
<4>RPC:   355 __rpc_wake_up_task (now 4296956756)
<4>RPC:   355 disabling timer
<4>RPC:   355 removed from queue 880454578258 "xprt_pending"
<4>RPC:   __rpc_wake_up_task done
<4>RPC:   xprt_rdma_connect_worker: exit
<4>RPC:   355 sync task resuming
<4>RPC:   355 xprt_connect_status: error 1 connecting to server 192.168.1.1
<4>RPC:   wake_up_next(880454578190 "xprt_sending")
<4>RPC:   355 call_connect_status (status -5)
<4>RPC:   355 return 0, status -5
<4>RPC:   355 release task
<4>RPC:   wake_up_next(880454578190 "xprt_sending")
<4>RPC:   xprt_rdma_free: called on 0x(null)
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [] rpcrdma_deregister_frmr_external+0x9c/0xe0 
[xprtrdma]
<4>PGD 454554067 PUD 4665b7067 PMD 0
<4>Oops:  [#1] SMP
<4>last sysfs file: 
/sys/devices/pci:00/:00:03.0/:03:00.0/infiniband/ocrdma0/fwerr
<4>CPU 6
<4>Modules linked in: xprtrdma(U) nfs lockd fscache auth_rpcgss nfs_acl 
ocrdma(U) be2net(U) ip6table_filter ip6_tables ebtable_nat ebtables 
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables 
bridge stp llc autofs4 des_generic ecb md4 nls_utf8 cifs sunrpc rdma_ucm(U) 
rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) 
ib_umad(U) iw_nes(U) libcrc32c iw_cxgb4(U) cxgb4(U) iw_cxgb3(U) cxgb3(U) mdio 
ib_qib(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) 
compat(U) vhost_net macvtap macvlan tun kvm uinput power_meter sg microcode 
i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb ptp pps_core ioatdma dca 
i7core_edac edac_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif 
usb_storage pata_acpi ata_generic ata_piix mptsas mptscsih mptbase 
scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 
be2net]
<4>
<4>Pid: 3597, comm: ls Not tainted 2.6.32-358.el6.x86_64 #1 Cisco Systems Inc 
R210-2121605W/R210-2121605W
<4>RIP: 0010:[]  [] 
rpcrdma_deregister_frmr_external+0x9c/0xe0 [xprtrdma]
<4>RSP: 0018:880465aff9a8  EFLAGS: 00010217
<4>RAX: 8804673fcc00 RBX: 880466578028 RCX: 
<4>RDX: 880465affa10 RSI: 880465aff9a8 RDI: 
<4>RBP: 880465affa38 R08:  R09: 
<4>R10: 000f R11: 000f R12: 880454578598
<4>R13: 880454578000 R14: 880466578068 R15: 
<4>FS:  7fe61f3107a0() GS:8800368c() knlGS:
<4>CS:  0010 DS:  ES:  CR0: 8005003b
<4>CR2:  CR3: 00046520a000 CR4: 07e0
<4>DR0:  DR1:  DR2: 
<4>DR3:  DR6: 0ff0 DR7: 0400
<4>Process ls (pid: 3597, threadinfo 880465afe000, task 8804639a5500)
<4>Stack:
<4>  880462287370  000a
<4> 0802dd3b0002   
<4>    
<4>Call Trace:
<4> [] ? up+0x2f/0x50
<4> [] rpcrdma_deregister_external+0x113/0x2d0 [xprtrdma]
<4> [] ? printk+0x41/0x48
<4> [] xprt_rdma_free+0x8c/0x210 [xprtrdma]
<4> [] ? mod_timer+0x144/0x220
<4> [] xprt_release+0xc0/0x220 [sunrpc]
<4> [] rpc_release_resources_task+0x1d/0x50 [sunrpc]
<4> [] __rpc_execute+0x174/0x350 [sunrpc]
<4> [] ? printk+0x41/0x48
<4> [] ? bit_waitqueue+0x17/0xd0
<4> [] rpc_execute+0x61/0xa0 [sunrpc]
<4> [] rpc_run_task+0x75/0x90 [sunrpc]
<4> [] rpc_call_sync+0x42/0x70 [sunrpc]
<4> [] ? handle_pte_fault+0x487/0xb50
<4> [] _nfs4_call_sync+0x30/0x40 [nfs]
<4> [] _nfs4_proc_getattr+0xac/0xc0 [nfs]
<4> [] nfs4_proc_getattr+0x4e/0x70 [nfs]
<4> [] __nfs_revalidate_inode+0xe3/0x220 [nfs]
<4> [] nfs_getattr+0xb6/0x120 [nfs]
<4> [] vfs_getattr+0x51/0x80
<4> [] vfs_fstatat+0x60/0x80
<4> [] vfs_stat+0x1b/0x20
<4> [] sys_newstat+0x24/0x50
<4> [] ? audit_syscall_entry+0x1d7/0x200
<4> [] system_call_fastpath+0x16/0x1b
<4>Code: 48 89 85 78 ff ff ff 48 8b 40 08 8b 40 1c 89 45 94 b8 ff ff ff ff f0 
41 0f c1 85 e0 05 00 00 49 8b 04 24 48 8d 55 d8 48 8b 78 10 <48> 8b 07 ff 90 b0 
01 00 00 85 c0 89 c3 74 09 80 3d 56 dd 03 00
<1>RIP  [] rpcrdma_deregister_frmr_external+0x9c/0xe0 
[xprtrdma]
<4> RSP 
<4>CR2: 

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.le...@oracle.com]
> Sent: Friday, April 11, 2014 1:24 AM
> To: Devesh Sharma
> Cc: Linux NFS 

Re: [PATCH] nfs-rdma: Fix for FMR leaks

2014-04-11 Thread Chuck Lever
Hi Allen-

I reviewed and tested your patch. Comments inline below.


On Apr 9, 2014, at 6:40 PM, Allen Andrews  wrote:

> Two memory region leaks were found during testing:
> 
> 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's 
> ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is 
> called.  If ib_alloc_fast_reg_page_list returns an error it bails out of the 
> routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory 
> leak.  Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails.
> 
> 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the 
> MR's on the rb_mws list if there are rb_send_bufs present.  However, in 
> rpcrdma_buffer_create while the rb_mws list is being built if one of the MR 
> allocation requests fail after some MR's have been allocated on the rb_mws 
> list the routine never gets to create any rb_send_bufs but instead jumps to 
> the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws
> list because the rb_send_bufs were never created.   This leaks all the MR's
> on the rb_mws list that were created prior to one of the MR allocations 
> failing.
> 
> Issue(2) was seen during testing. Our adapter had a finite number of MR's 
> available and we created enough connections to where we saw an MR allocation 
> failure on our Nth NFS connection request. After the kernel cleaned up the 
> resources it had allocated for the Nth connection we noticed that FMR's had 
> been leaked due to the coding error described above.
> 
> Issue(1) was seen during a code review while debugging issue(2).
> 
> Signed-off-by: Allen Andrews 
> ---
> 
> net/sunrpc/xprtrdma/verbs.c |   79 ---
> 1 files changed, 44 insertions(+), 35 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 
> 9372656..54f592d 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1058,6 +1058,14 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, 
> struct rpcrdma_ep *ep,
>   dprintk("RPC:   %s: "
>   "ib_alloc_fast_reg_page_list "
>   "failed %i\n", __func__, rc);
> +
> + rc = ib_dereg_mr(r->r.frmr.fr_mr);

This clears “rc,” wiping the error returned by ib_alloc_fast_reg_page_list().

Then rpcrdma_buffer_create() returns zero, even though it failed, and
xprtrdma thinks the transport is set up and ready for use. Panic.

I think you can safely ignore the return code from ib_dereg_mr() here.


> + if (rc)
> + dprintk("RPC:   %s:"
> + " ib_dereg_mr"
> + " failed %i\n",
> + __func__, rc);
> +
>   goto out;
>   }
>   list_add(&r->mw_list, &buf->rb_mws); @@ -1194,41 
> +1202,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)

This line is broken: the “@@” should start on the next line. Some kind of
e-mail snafu?


>   kfree(buf->rb_recv_bufs[i]);
>   }
>   if (buf->rb_send_bufs && buf->rb_send_bufs[i]) {
> - while (!list_empty(&buf->rb_mws)) {
> - r = list_entry(buf->rb_mws.next,
> - struct rpcrdma_mw, mw_list);
> - list_del(&r->mw_list);
> - switch (ia->ri_memreg_strategy) {
> - case RPCRDMA_FRMR:
> - rc = ib_dereg_mr(r->r.frmr.fr_mr);
> - if (rc)
> - dprintk("RPC:   %s:"
> - " ib_dereg_mr"
> - " failed %i\n",
> - __func__, rc);
> - 
> ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> - break;
> - case RPCRDMA_MTHCAFMR:
> - rc = ib_dealloc_fmr(r->r.fmr);
> - if (rc)
> - dprintk("RPC:   %s:"
> - " ib_dealloc_fmr"
> - " failed %i\n",
> - __func__, rc);
> - break;
> - case RPCRDMA_MEMWINDOWS_ASYNC:
> - case RPCRDMA_MEMWINDOWS:
> - rc = ib_dealloc_mw(r->r.mw);
>

Re: [PATCH] xprtrdma: mind the device's max fast register page list depth

2014-04-11 Thread Steve Wise

On 4/11/2014 1:34 PM, Chuck Lever wrote:

Hi Steve-

I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR.
A couple of cosmetic notes are below:


On Apr 10, 2014, at 11:13 AM, Steve Wise  wrote:


Some rdma devices don't support a fast register page list depth of
at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
register regions according to the minimum of the device max supported
depth or RPCRDMA_MAX_DATA_SEGS.

Signed-off-by: Steve Wise 
---

net/sunrpc/xprtrdma/rpc_rdma.c  |2 -
net/sunrpc/xprtrdma/verbs.c |   64 ---
net/sunrpc/xprtrdma/xprt_rdma.h |1 +
3 files changed, 47 insertions(+), 20 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 96ead52..14d7634 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf 
*target,
req->rl_nchunks = nchunks;

BUG_ON(nchunks == 0);

I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here.
If you agree, can you remove the above BUG_ON as well if you re-submit a new
version of this patch?


Sure.


-   BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
-  && (nchunks > 3));

/*
 * finish off header. If write, marshal discrim and nchunks.
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 93726560..c08f193 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr 
*addr, int memreg)
__func__);
memreg = RPCRDMA_REGISTER;
#endif
+   } else {
+   /* Mind the ia limit on FRMR page list depth */
+   ia->ri_max_frmr_depth = min_t(unsigned int,
+   RPCRDMA_MAX_DATA_SEGS,
+   devattr.max_fast_reg_page_list_len);
}
break;
}
@@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
rpcrdma_ia *ia,
ep->rep_attr.srq = NULL;
ep->rep_attr.cap.max_send_wr = cdata->max_requests;
switch (ia->ri_memreg_strategy) {
-   case RPCRDMA_FRMR:
+   case RPCRDMA_FRMR: {
+   int depth = 7;
+
/* Add room for frmr register and invalidate WRs.
 * 1. FRMR reg WR for head
 * 2. FRMR invalidate WR for head
-* 3. FRMR reg WR for pagelist
-* 4. FRMR invalidate WR for pagelist
+* 3. N FRMR reg WRs for pagelist
+* 4. N FRMR invalidate WRs for pagelist
 * 5. FRMR reg WR for tail
 * 6. FRMR invalidate WR for tail
 * 7. The RDMA_SEND WR
 */
-   ep->rep_attr.cap.max_send_wr *= 7;
+
+   /* Calculate N if the device max FRMR depth is smaller than
+* RPCRDMA_MAX_DATA_SEGS.
+*/
+   if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
+   int delta = RPCRDMA_MAX_DATA_SEGS -
+   ia->ri_max_frmr_depth;
+
+   do {
+   depth += 2; /* FRMR reg + invalidate */
+   delta -= ia->ri_max_frmr_depth;
+   } while (delta > 0);
+
+   }
+   ep->rep_attr.cap.max_send_wr *= depth;
if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
-   cdata->max_requests = devattr.max_qp_wr / 7;
+   cdata->max_requests = devattr.max_qp_wr / depth;
if (!cdata->max_requests)
return -EINVAL;
-   ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
+   ep->rep_attr.cap.max_send_wr = cdata->max_requests *
+  depth;
}
break;
+   }
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
/* Add room for mw_binds+unbinds - overkill! */
@@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, 
struct rpcrdma_ep *ep,
case RPCRDMA_FRMR:
for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
-RPCRDMA_MAX_SEGS);
+   ia->ri_max_frmr_depth);
if (IS_ERR(r->r.frmr.fr_mr)) {
rc = PTR_ERR(r->r.frmr.fr_mr);
dprintk("RPC:   %s: ib_alloc_fast_reg_mr"
 

Re: [PATCH] xprtrdma: mind the device's max fast register page list depth

2014-04-11 Thread Chuck Lever
Hi Steve-

I reviewed and successfully tested this patch with mlx4 using FRMR and MTHCAFMR.
A couple of cosmetic notes are below:


On Apr 10, 2014, at 11:13 AM, Steve Wise  wrote:

> Some rdma devices don't support a fast register page list depth of
> at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
> register regions according to the minimum of the device max supported
> depth or RPCRDMA_MAX_DATA_SEGS.
> 
> Signed-off-by: Steve Wise 
> ---
> 
> net/sunrpc/xprtrdma/rpc_rdma.c  |2 -
> net/sunrpc/xprtrdma/verbs.c |   64 ---
> net/sunrpc/xprtrdma/xprt_rdma.h |1 +
> 3 files changed, 47 insertions(+), 20 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 96ead52..14d7634 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -249,8 +249,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct 
> xdr_buf *target,
>   req->rl_nchunks = nchunks;
> 
>   BUG_ON(nchunks == 0);

I don’t see a path through rpcrdma_create_chunks() where nchunks can be 0 here.
If you agree, can you remove the above BUG_ON as well if you re-submit a new
version of this patch?

> - BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
> -&& (nchunks > 3));
> 
>   /*
>* finish off header. If write, marshal discrim and nchunks.
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 93726560..c08f193 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct 
> sockaddr *addr, int memreg)
>   __func__);
>   memreg = RPCRDMA_REGISTER;
> #endif
> + } else {
> + /* Mind the ia limit on FRMR page list depth */
> + ia->ri_max_frmr_depth = min_t(unsigned int,
> + RPCRDMA_MAX_DATA_SEGS,
> + devattr.max_fast_reg_page_list_len);
>   }
>   break;
>   }
> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
> rpcrdma_ia *ia,
>   ep->rep_attr.srq = NULL;
>   ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>   switch (ia->ri_memreg_strategy) {
> - case RPCRDMA_FRMR:
> + case RPCRDMA_FRMR: {
> + int depth = 7;
> +
>   /* Add room for frmr register and invalidate WRs.
>* 1. FRMR reg WR for head
>* 2. FRMR invalidate WR for head
> -  * 3. FRMR reg WR for pagelist
> -  * 4. FRMR invalidate WR for pagelist
> +  * 3. N FRMR reg WRs for pagelist
> +  * 4. N FRMR invalidate WRs for pagelist
>* 5. FRMR reg WR for tail
>* 6. FRMR invalidate WR for tail
>* 7. The RDMA_SEND WR
>*/
> - ep->rep_attr.cap.max_send_wr *= 7;
> +
> + /* Calculate N if the device max FRMR depth is smaller than
> +  * RPCRDMA_MAX_DATA_SEGS.
> +  */
> + if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
> + int delta = RPCRDMA_MAX_DATA_SEGS -
> + ia->ri_max_frmr_depth;
> +
> + do {
> + depth += 2; /* FRMR reg + invalidate */
> + delta -= ia->ri_max_frmr_depth;
> + } while (delta > 0);
> +
> + }
> + ep->rep_attr.cap.max_send_wr *= depth;
>   if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
> - cdata->max_requests = devattr.max_qp_wr / 7;
> + cdata->max_requests = devattr.max_qp_wr / depth;
>   if (!cdata->max_requests)
>   return -EINVAL;
> - ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
> + ep->rep_attr.cap.max_send_wr = cdata->max_requests *
> +depth;
>   }
>   break;
> + }
>   case RPCRDMA_MEMWINDOWS_ASYNC:
>   case RPCRDMA_MEMWINDOWS:
>   /* Add room for mw_binds+unbinds - overkill! */
> @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, 
> struct rpcrdma_ep *ep,
>   case RPCRDMA_FRMR:
>   for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
>   r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
> -  RPCRDMA_MAX_SEGS);
> + ia->ri_max_frmr_depth);
>   if (IS_ERR(r->r.frmr.fr_mr)) {
>   rc = PTR_ERR(r->r.frmr.fr_mr);
>   dprintk("RPC:   %s: ib_alloc_fast_reg_mr"
> 

Re: [PATCH] Clean index map on exit

2014-04-11 Thread Hannes Weisbach

> 
> Is the array always full ?
No, but it has to be zero-initialized and free(NULL) is fine.

idm_set() does:

    if (!idm->array[idx_array_index(index)]) {
            if (idm_grow(idm, index) < 0)

(idm_grow() does calloc().)

So I concluded unused entries have to be zero already and I don't put
a new constraint on struct index_map.

> is the maximum index of the array
> IDX_ARRAY_SIZE or could it be lower ?
It's fixed:
struct index_map
{
void **array[IDX_ARRAY_SIZE];
};

> And what about idm, is it free()'d somewhere else ?
The owner has to take care of its struct index_map.  I can only free
the resources the map itself holds.
For what it's worth: all struct index_maps are declared statically.

Again, thank you for your quick feedback.

Best regards,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add cleanup routines.

2014-04-11 Thread Hannes Weisbach

Am 11.04.2014 um 19:00 schrieb Yann Droneaud :

> Hi,
Thanks for your quick reply.

> Please adds some explanation, in particular the purpose of the changes.
> Your commit message only explain "how", but you should also explain
> "why".
Ok.

> In C, an empty parameter list declares a function that accepts any
> parameters. So perhaps void ibverbs_deinit(void) is what you mean.
Yup, thanks.

>> +{
>> +size_t i;
>> +for (i = 0; i < num_devices; i++) {
>> +/* driver callback needed. May not be malloc'd memory */
> 
> Seems dangerous. Such an open question must be explicitly noted in the
> commit message.
Maybe it's better to leave it as a TODO and only free the device_list
itself?

>> +for (driver = head_driver; driver;) {
>> +struct ibv_driver *tmp = driver;
>> +driver = driver->next;
>> +free(tmp);
>> +}
>> +
> 
> Is it safe here to free the driver ?
The struct itself is no longer needed.  All the driver libraries are
loaded and ibverbs_init() runs only once, so no new libraries will be
dlopen()ed (and call ibv_register_driver()).
It's probably nicer first to dlclose() all libraries and then free all
struct ibv_drivers.

>> +for (list = so_list; list;) {
>> +struct ibv_so_list *tmp = list;
>> +list = list->next;
>> +dlclose(tmp->dlhandle);
>> +free(tmp);
>> +}
>> +}
> 
> Why not store the dlopen() handle in the struct ibv_driver itself ?
> This way only one list has to be scanned.
That was my first idea, but the struct ibv_driver doesn't exist at
that point.  First the driver library is dlopen()ed, and it calls
ibv_register_driver().  Only then is the struct ibv_driver allocated,
and by that time the dlopen() handle is gone.

Best regards,
Hannes


Re: [PATCH] Clean index map on exit

2014-04-11 Thread Yann Droneaud
Le vendredi 11 avril 2014 à 18:58 +0200, Hannes Weisbach a écrit :
> This patch adds the function idm_free() to free all entries of an index
> map. A call to this function is added in the ucma_cleanup destructor.
> The ucma_idm struct index_map is cleaned.
> 

Please explain the purpose: what's your goal ?

> Signed-off-by: Hannes Weisbach 
> ---
>  src/cma.c | 1 +
>  src/indexer.c | 8 
>  src/indexer.h | 1 +
>  3 files changed, 10 insertions(+)
> 
> diff --git a/src/cma.c b/src/cma.c
> index 0cf4203..a533bf9 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -139,6 +139,7 @@ static void ucma_cleanup(void)
>   ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
>   }
>  
> + idm_free(&ucma_idm);
>   fastlock_destroy(&idm_lock);
>   free(cma_dev_array);
>   cma_dev_cnt = 0;
> diff --git a/src/indexer.c b/src/indexer.c
> index c8e8bce..4d1fd77 100644
> --- a/src/indexer.c
> +++ b/src/indexer.c
> @@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
>   entry[idx_entry_index(index)] = NULL;
>   return item;
>  }
> +
> +void idm_free(struct index_map *idm)
> +{
> + size_t i;
> + for (i = 0; i < IDX_ARRAY_SIZE; i++) {
> + free(idm->array[i]);
> + }
> +}

Is the array always full ? is the maximum index of the array
IDX_ARRAY_SIZE or could it be lower ?

And what about idm, is it free()'d somewhere else ?

> diff --git a/src/indexer.h b/src/indexer.h
> index 0c5f388..829d46b 100644
> --- a/src/indexer.h
> +++ b/src/indexer.h
> @@ -89,6 +89,7 @@ struct index_map
>  
>  int idm_set(struct index_map *idm, int index, void *item);
>  void *idm_clear(struct index_map *idm, int index);
> +void idm_free(struct index_map *idm);
>  
>  static inline void *idm_at(struct index_map *idm, int index)
>  {

Regards.

-- 
Yann Droneaud
OPTEYA




Re: [PATCH] Add cleanup routines.

2014-04-11 Thread Yann Droneaud
Hi,

Le vendredi 11 avril 2014 à 18:50 +0200, Hannes Weisbach a écrit :
> Dynamically loaded library handles are saved in a list and dlclosed() on
> exit. The list of struct ibv_driver *, as well as the global
> struct ibv_device ** list are free()d.
> 

Please adds some explanation, in particular the purpose of the changes.
Your commit message only explain "how", but you should also explain
"why".

> Signed-off-by: Hannes Weisbach 
> ---
>  src/device.c | 10 ++
>  src/init.c   | 51 ++-
>  2 files changed, 56 insertions(+), 5 deletions(-)
> 
> diff --git a/src/device.c b/src/device.c
> index beb7b3c..d5b76bb 100644
> --- a/src/device.c
> +++ b/src/device.c
> @@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
>   }
>  }
>  default_symver(__ibv_ack_async_event, ibv_ack_async_event);
> +
> +FINI static void ibverbs_deinit()

In C, an empty parameter list declares a function that accepts any
parameters. So perhaps void ibverbs_deinit(void) is what you mean.

> +{
> + size_t i;
> + for (i = 0; i < num_devices; i++) {
> + /* driver callback needed. May not be malloc'd memory */

Seems dangerous. Such an open question must be explicitly noted in the
commit message.

> + free(device_list[i]);
> + }
> + free(device_list);
> +}
> diff --git a/src/init.c b/src/init.c
> index d0e4b1c..2a8bca4 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -67,6 +67,11 @@ struct ibv_driver_name {
>   struct ibv_driver_name *next;
>  };
>  
> +struct ibv_so_list {
> + void *dlhandle;
> + struct ibv_so_list *next;
> +};
> +
>  struct ibv_driver {
>   const char *name;
>   ibv_driver_init_funcinit_func;
> @@ -77,6 +82,7 @@ struct ibv_driver {
>  static struct ibv_sysfs_dev *sysfs_dev_list;
>  static struct ibv_driver_name *driver_name_list;
>  static struct ibv_driver *head_driver, *tail_driver;
> +static struct ibv_so_list *so_list;
>  
>  static int find_sysfs_devs(void)
>  {
> @@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, 
> verbs_driver_init_func init_func)
>  static void load_driver(const char *name)
>  {
>   char *so_name;
> - void *dlhandle;
> + struct ibv_so_list *elem;
> + struct ibv_so_list **list;
> +
> + elem = malloc(sizeof(*elem));
> + if(!elem)
> + return;
> +
> + elem->next = NULL;
>  
>  #define __IBV_QUOTE(x)   #x
>  #define IBV_QUOTE(x) __IBV_QUOTE(x)
> @@ -205,16 +218,25 @@ static void load_driver(const char *name)
>name) < 0) {
>   fprintf(stderr, PFX "Warning: couldn't load driver '%s'.\n",
>   name);
> - return;
> + goto err;
>   }
>  
> - dlhandle = dlopen(so_name, RTLD_NOW);
> - if (!dlhandle) {
> + elem->dlhandle = dlopen(so_name, RTLD_NOW);
> + if (!elem->dlhandle) {
>   fprintf(stderr, PFX "Warning: couldn't load driver '%s': %s\n",
>   name, dlerror());
> - goto out;
> + goto err;
>   }
>  
> + for (list = &so_list; *list; list = &(*list)->next)
> + ;
> +
> + *list = elem;
> +
> + goto out;
> +
> +err:
> + free(elem);
>  out:
>   free(so_name);
>  }
> @@ -573,3 +595,22 @@ out:
>  
>   return num_devices;
>  }
> +
> +FINI static void unload_drivers()

same remarks about prototype.

> +{
> + struct ibv_driver *driver;
> + struct ibv_so_list * list;
> +
> + for (driver = head_driver; driver;) {
> + struct ibv_driver *tmp = driver;
> + driver = driver->next;
> + free(tmp);
> + }
> +

Is it safe here to free the driver ?

> + for (list = so_list; list;) {
> + struct ibv_so_list *tmp = list;
> + list = list->next;
> + dlclose(tmp->dlhandle);
> + free(tmp);
> + }
> +}

Why not store the dlopen() handle in the struct ibv_driver itself ?
This way only one list has to be scanned.

Regards.

-- 
Yann Droneaud
OPTEYA




[PATCH] Clean index map on exit

2014-04-11 Thread Hannes Weisbach
This patch adds the function idm_free() to free all entries of an index
map. A call to this function is added in the ucma_cleanup destructor.
The ucma_idm struct index_map is cleaned.

Signed-off-by: Hannes Weisbach 
---
 src/cma.c | 1 +
 src/indexer.c | 8 
 src/indexer.h | 1 +
 3 files changed, 10 insertions(+)

diff --git a/src/cma.c b/src/cma.c
index 0cf4203..a533bf9 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -139,6 +139,7 @@ static void ucma_cleanup(void)
ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
}
 
+   idm_free(&ucma_idm);
fastlock_destroy(&idm_lock);
free(cma_dev_array);
cma_dev_cnt = 0;
diff --git a/src/indexer.c b/src/indexer.c
index c8e8bce..4d1fd77 100644
--- a/src/indexer.c
+++ b/src/indexer.c
@@ -164,3 +164,11 @@ void *idm_clear(struct index_map *idm, int index)
entry[idx_entry_index(index)] = NULL;
return item;
 }
+
+void idm_free(struct index_map *idm)
+{
+   size_t i;
+   for (i = 0; i < IDX_ARRAY_SIZE; i++) {
+   free(idm->array[i]);
+   }
+}
diff --git a/src/indexer.h b/src/indexer.h
index 0c5f388..829d46b 100644
--- a/src/indexer.h
+++ b/src/indexer.h
@@ -89,6 +89,7 @@ struct index_map
 
 int idm_set(struct index_map *idm, int index, void *item);
 void *idm_clear(struct index_map *idm, int index);
+void idm_free(struct index_map *idm);
 
 static inline void *idm_at(struct index_map *idm, int index)
 {
-- 
1.8.5.5




[PATCH] Add cleanup routines.

2014-04-11 Thread Hannes Weisbach
Dynamically loaded library handles are saved in a list and dlclosed() on
exit. The list of struct ibv_driver *, as well as the global
struct ibv_device ** list are free()d.

Signed-off-by: Hannes Weisbach 
---
 src/device.c | 10 ++
 src/init.c   | 51 ++-
 2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/device.c b/src/device.c
index beb7b3c..d5b76bb 100644
--- a/src/device.c
+++ b/src/device.c
@@ -305,3 +305,13 @@ void __ibv_ack_async_event(struct ibv_async_event *event)
}
 }
 default_symver(__ibv_ack_async_event, ibv_ack_async_event);
+
+FINI static void ibverbs_deinit()
+{
+   size_t i;
+   for (i = 0; i < num_devices; i++) {
+   /* driver callback needed. May not be malloc'd memory */
+   free(device_list[i]);
+   }
+   free(device_list);
+}
diff --git a/src/init.c b/src/init.c
index d0e4b1c..2a8bca4 100644
--- a/src/init.c
+++ b/src/init.c
@@ -67,6 +67,11 @@ struct ibv_driver_name {
struct ibv_driver_name *next;
 };
 
+struct ibv_so_list {
+   void *dlhandle;
+   struct ibv_so_list *next;
+};
+
 struct ibv_driver {
const char *name;
ibv_driver_init_funcinit_func;
@@ -77,6 +82,7 @@ struct ibv_driver {
 static struct ibv_sysfs_dev *sysfs_dev_list;
 static struct ibv_driver_name *driver_name_list;
 static struct ibv_driver *head_driver, *tail_driver;
+static struct ibv_so_list *so_list;
 
 static int find_sysfs_devs(void)
 {
@@ -193,7 +199,14 @@ void verbs_register_driver(const char *name, 
verbs_driver_init_func init_func)
 static void load_driver(const char *name)
 {
char *so_name;
-   void *dlhandle;
+   struct ibv_so_list *elem;
+   struct ibv_so_list **list;
+
+   elem = malloc(sizeof(*elem));
+   if(!elem)
+   return;
+
+   elem->next = NULL;
 
 #define __IBV_QUOTE(x) #x
 #define IBV_QUOTE(x)   __IBV_QUOTE(x)
@@ -205,16 +218,25 @@ static void load_driver(const char *name)
 name) < 0) {
fprintf(stderr, PFX "Warning: couldn't load driver '%s'.\n",
name);
-   return;
+   goto err;
}
 
-   dlhandle = dlopen(so_name, RTLD_NOW);
-   if (!dlhandle) {
+   elem->dlhandle = dlopen(so_name, RTLD_NOW);
+   if (!elem->dlhandle) {
fprintf(stderr, PFX "Warning: couldn't load driver '%s': %s\n",
name, dlerror());
-   goto out;
+   goto err;
}
 
+   for (list = &so_list; *list; list = &(*list)->next)
+   ;
+
+   *list = elem;
+
+   goto out;
+
+err:
+   free(elem);
 out:
free(so_name);
 }
@@ -573,3 +595,22 @@ out:
 
return num_devices;
 }
+
+FINI static void unload_drivers()
+{
+   struct ibv_driver *driver;
+   struct ibv_so_list * list;
+
+   for (driver = head_driver; driver;) {
+   struct ibv_driver *tmp = driver;
+   driver = driver->next;
+   free(tmp);
+   }
+
+   for (list = so_list; list;) {
+   struct ibv_so_list *tmp = list;
+   list = list->next;
+   dlclose(tmp->dlhandle);
+   free(tmp);
+   }
+}
-- 
1.8.5.5




Re: Rare Duplicate Completions

2014-04-11 Thread Christopher Mitchell
Bert,

Thanks for the help. Unfortunately, it looks as if all of my
message-passing and RDMA operations use the SIGNALED flag. I'm
checking the  wc->status flag (==IBV_WC_SUCCESS), and it and
everything in the message appears to be completely identical to the
first completion. I notice that it only seems to happen when I have
server processes (ie, listeners) that also open active connections to
other listeners as well as accept connections from client processes. I
perform both RDMA and message-passing across those server-to-server
connections, so on a hunch I tried rewriting the RDMA operations as
message-passing, but the problem still occurs. Thanks in advance for
any other thoughts you might have.

Cheers,
Christopher

On Thu, Apr 10, 2014 at 5:42 PM, Christopher Mitchell
 wrote:
> Bert,
>
> Thanks for the help. Unfortunately, it looks as if all of my message-passing
> and RDMA operations use the SIGNALED flag. I'm checking the  wc->status flag
> (==IBV_WC_SUCCESS), and it and everything in the message appears to be
> completely identical to the first completion. I notice that it only seems to
> happen when I have server processes (ie, listeners) that also open active
> connections to other listeners as well as accept connections from client
> processes. I perform both RDMA and message-passing across those
> server-to-server connections, so on a hunch I tried rewriting the RDMA
> operations as message-passing, but the problem still occurs. Thanks in
> advance for any other thoughts you might have.
>
> Cheers,
> Christopher
>
>
> On Mon, Apr 7, 2014 at 9:01 PM, Bart Van Assche  wrote:
>>
>> On 4/04/2014 10:00, Christopher Mitchell wrote:
>>>
>>> I am working on building a distributed Infiniband application, testing
>>> using Mellanox Connect-X HCAs. In very rare cases, perhaps once in a
>>> few million operations, I appear to be receiving duplicate completions
>>> or incorrect completions. For instance, I'll send out an RDMA request
>>> and receive a completion for a Verb message response I had just
>>> handled, or send a Verb message request and receive a duplicate Verb
>>> message completion. Needless to say, this is introducing instability
>>> in my application. Does anyone have experience with a bug like this,
>>> or am I encountering some arcane issue in how I'm manipulating the
>>> HCA? I'd be more than happy to furnish relevant sections of code to
>>> help nail down the issue.
>>
>>
>> Hello Christopher,
>>
>> In e.g. the SCST SRP target driver there is code present that checks for
>> duplicate and/or missing completions. Although this code is being used
>> intensively I have not yet seen any error messages being logged by the code
>> that verifies completions. Note: something that is nontrivial in the RDMA
>> API and that you might already be aware of is that even for non-signaled
>> work requests a completion is delivered if that work request fails. If these
>> non-signaled work requests do not have a unique wr_id, error completions for
>> these requests might be misinterpreted as duplicate completions.
>>
>> Bart.


[PATCH opensm] osm_sa_mad_ctrl.c: In sa_mad_ctrl_rcv_callback, improve 1A04 error log message

2014-04-11 Thread Hal Rosenstock

to include more info on incoming SA MAD

Signed-off-by: Hal Rosenstock 
---
diff --git a/opensm/osm_sa_mad_ctrl.c b/opensm/osm_sa_mad_ctrl.c
index 902803e..61fcb0b 100644
--- a/opensm/osm_sa_mad_ctrl.c
+++ b/opensm/osm_sa_mad_ctrl.c
@@ -331,9 +331,12 @@ static void sa_mad_ctrl_rcv_callback(IN osm_madw_t * 
p_madw, IN void *context,
if (p_sa_mad->sm_key != 0 &&
p_sa_mad->sm_key != p_ctrl->p_subn->opt.sa_key) {
OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 1A04: "
-   "Non-Zero SA MAD SM_Key: 0x%" PRIx64 " != SM_Key: 0x%"
-   PRIx64 "; MAD ignored\n", cl_ntoh64(p_sa_mad->sm_key),
-   cl_ntoh64(p_ctrl->p_subn->opt.sa_key));
+   "Non-Zero MAD SM_Key: 0x%" PRIx64 " != SM_Key: 0x%"
+   PRIx64 "; SA MAD ignored for method 0x%X attribute 0x%X 
(%s)\n",
+   cl_ntoh64(p_sa_mad->sm_key),
+   cl_ntoh64(p_ctrl->p_subn->opt.sa_key),
+   p_sa_mad->method, cl_ntoh16(p_sa_mad->attr_id),
+   ib_get_sa_attr_str(p_sa_mad->attr_id));
osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw);
goto Exit;
}