RE: Announcing IBM Platform MPI 9.1.2.1 FixPack

2014-05-06 Thread Hefty, Sean
> Yes, the delay seems to be coming here:
> 
> err = hpmp_rdmacm->rdma_connect(id, NULL);
> if (err) {
>     hpmp_printf("rdma_connect() failed");
>     return NULL;
> }
> t1 = MPI_Wtime();
> 
> retry3:
> err = hpmp_rdmacm->rdma_get_cm_event(hpmp_rdmacm->connect_cm_channel, &event);
> if (err) {
>     if (errno == EINTR) goto retry3;
>     hpmp_printf("rdma_get_cm_event() failed");
>     return NULL;
> }
> 
> if (event->event != RDMA_CM_EVENT_ESTABLISHED) {
>     hpmp_printf("rdma_get_cm_event() unexpected event (%d vs %d) "
>                 "while connecting to %d\n",
>                 event->event, RDMA_CM_EVENT_ESTABLISHED, port);
>     return NULL;
> }
> 
> t2 = MPI_Wtime();
> fprintf(stderr, "CONNECTION ESTABLISHED ON CONNECT %lf\n", t2 - t1);
> hpmp_rdmacm->rdma_ack_cm_event(event);
> 
> 
> 
> I get output such as:
> 
> [ 1] CONNECTION ESTABLISHED ON CONNECT 0.001447
> [ 9] CONNECTION ESTABLISHED ON CONNECT 6.145778
> [ 6] CONNECTION ESTABLISHED ON CONNECT 5.233660
> [ 0] CONNECTION ESTABLISHED ON CONNECT 0.001343
> [ 6] CONNECTION ESTABLISHED ON CONNECT 0.001155
> [ 7] CONNECTION ESTABLISHED ON CONNECT 4.517944
> [11] CONNECTION ESTABLISHED ON CONNECT 0.001445
> [ 3] CONNECTION ESTABLISHED ON CONNECT 0.001558
> [ 7] CONNECTION ESTABLISHED ON CONNECT 0.001627
> [ 5] CONNECTION ESTABLISHED ON CONNECT 6.145470
> [ 2] CONNECTION ESTABLISHED ON CONNECT 5.657639
> [ 9] CONNECTION ESTABLISHED ON CONNECT 0.001602
> [10] CONNECTION ESTABLISHED ON CONNECT 6.188743
> [ 1] CONNECTION ESTABLISHED ON CONNECT 0.001500
> [ 6] CONNECTION ESTABLISHED ON CONNECT 0.001061
> [ 1] CONNECTION ESTABLISHED ON CONNECT 0.001183
> [11] CONNECTION ESTABLISHED ON CONNECT 0.001213
> [ 5] CONNECTION ESTABLISHED ON CONNECT 0.210666

What version of Linux is this running?  Is the remote side responding to the 
connection request in a timely manner?  The connect call cannot complete until 
the remote side accepts the connection.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Announcing IBM Platform MPI 9.1.2.1 FixPack

2014-05-06 Thread David Solt
Yes, the delay seems to be coming here:

err = hpmp_rdmacm->rdma_connect(id, NULL);
if (err) {
    hpmp_printf("rdma_connect() failed");
    return NULL;
}
t1 = MPI_Wtime();

retry3:
err = hpmp_rdmacm->rdma_get_cm_event(hpmp_rdmacm->connect_cm_channel, &event);
if (err) {
    if (errno == EINTR) goto retry3;
    hpmp_printf("rdma_get_cm_event() failed");
    return NULL;
}

if (event->event != RDMA_CM_EVENT_ESTABLISHED) {
    hpmp_printf("rdma_get_cm_event() unexpected event (%d vs %d) "
                "while connecting to %d\n",
                event->event, RDMA_CM_EVENT_ESTABLISHED, port);
    return NULL;
}

t2 = MPI_Wtime();
fprintf(stderr, "CONNECTION ESTABLISHED ON CONNECT %lf\n", t2 - t1);
hpmp_rdmacm->rdma_ack_cm_event(event);



I get output such as:

[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001447
[ 9] CONNECTION ESTABLISHED ON CONNECT 6.145778
[ 6] CONNECTION ESTABLISHED ON CONNECT 5.233660
[ 0] CONNECTION ESTABLISHED ON CONNECT 0.001343
[ 6] CONNECTION ESTABLISHED ON CONNECT 0.001155
[ 7] CONNECTION ESTABLISHED ON CONNECT 4.517944
[11] CONNECTION ESTABLISHED ON CONNECT 0.001445
[ 3] CONNECTION ESTABLISHED ON CONNECT 0.001558
[ 7] CONNECTION ESTABLISHED ON CONNECT 0.001627
[ 5] CONNECTION ESTABLISHED ON CONNECT 6.145470
[ 2] CONNECTION ESTABLISHED ON CONNECT 5.657639
[ 9] CONNECTION ESTABLISHED ON CONNECT 0.001602
[10] CONNECTION ESTABLISHED ON CONNECT 6.188743
[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001500
[ 6] CONNECTION ESTABLISHED ON CONNECT 0.001061
[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001183
[11] CONNECTION ESTABLISHED ON CONNECT 0.001213
[ 5] CONNECTION ESTABLISHED ON CONNECT 0.210666









From:   "Hefty, Sean" 
To: David Solt/Dallas/IBM@IBMUS, 
Cc: "linux-rdma@vger.kernel.org" 
Date:   05/06/2014 03:48 PM
Subject: RE: Announcing IBM Platform MPI 9.1.2.1 FixPack



> I am trying to add rdmacm support to Platform MPI.  I noticed that the
> performance on our test cluster was very poor for creating connections.
> For 12 processes on 12 hosts to create n^2 connections takes about 12
> seconds.  I also discovered that if I create some TCP sockets and use
> those to ensure that only one process at a time is calling rdmacm_connect
> to any target at a time, that the performance changes dramatically and
> that I can then connect the 12 processes very quickly (didn't measure
> exactly, but similar to our old rdma code).  The order in which I am
> connecting processes avoids flooding a single target with many
> rdmacm_connects at once, but it is difficult to avoid the case where 2
> processes call rdmacm_connect to the same target at roughly the same time
> except when using my extra TCP socket connections.  I haven't played with
> MPICH code yet to see if they have the same issue, but will try that next.
> 
> 
> Our test cluster is a bit old:
> 
> 09:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0
> 5GT/s - IB QDR / 10GigE] (rev b0)
> 
> Is this a known problem?  Are you aware of any issues that would shed some
> light on this?

This is the first I've heard of slow connect times.  Are you sure that the 
time is coming from rdma_connect, versus route or address resolution?





Re: [PATCH V2 RFC 1/3] svcrdma: Transport and header file changes

2014-05-06 Thread J. Bruce Fields
On Tue, May 06, 2014 at 04:02:41PM -0500, Steve Wise wrote:
> On 5/6/2014 2:21 PM, J. Bruce Fields wrote:
> >On Tue, May 06, 2014 at 12:46:27PM -0500, Steve Wise wrote:
> >>From: Tom Tucker 
> >>
> >>Change poll logic to grab up to 6 completions at a time.
> >>
> >>RDMA write and send completions no longer deal with fastreg objects.
> >>
> >>Set SVCRDMA_DEVCAP_FAST_REG and allocate a dma_mr based on the device
> >>capabilities.
> >>
> >>Signed-off-by: Tom Tucker 
> >>Signed-off-by: Steve Wise 
> >>---
> >>
> >>  include/linux/sunrpc/svc_rdma.h  |3 -
> >>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 
> >> +-
> >>  2 files changed, 37 insertions(+), 28 deletions(-)
> >>
> >>diff --git a/include/linux/sunrpc/svc_rdma.h 
> >>b/include/linux/sunrpc/svc_rdma.h
> >>index 0b8e3e6..5cf99a0 100644
> >>--- a/include/linux/sunrpc/svc_rdma.h
> >>+++ b/include/linux/sunrpc/svc_rdma.h
> >>@@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
> >>struct list_head frmr_list;
> >>  };
> >>  struct svc_rdma_req_map {
> >>-   struct svc_rdma_fastreg_mr *frmr;
> >>unsigned long count;
> >>union {
> >>struct kvec sge[RPCSVC_MAXPAGES];
> >>struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
> >>+   unsigned long lkey[RPCSVC_MAXPAGES];
> >>};
> >>  };
> >>-#define RDMACTXT_F_FAST_UNREG  1
> >>  #define RDMACTXT_F_LAST_CTXT  2
> >>  #define   SVCRDMA_DEVCAP_FAST_REG 1   /* fast mr registration 
> >> */
> >>diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
> >>b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>index 25688fa..2c5b201 100644
> >>--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>@@ -1,4 +1,5 @@
> >>  /*
> >>+ * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> >>   * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
> >>   *
> >>   * This software is available to you under a choice of one of two
> >>@@ -160,7 +161,6 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
> >>schedule_timeout_uninterruptible(msecs_to_jiffies(500));
> >>}
> >>map->count = 0;
> >>-   map->frmr = NULL;
> >>return map;
> >>  }
> >>@@ -336,22 +336,21 @@ static void process_context(struct svcxprt_rdma *xprt,
> >>switch (ctxt->wr_op) {
> >>case IB_WR_SEND:
> >>-   if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
> >>-   svc_rdma_put_frmr(xprt, ctxt->frmr);
> >>+   BUG_ON(ctxt->frmr);
> >>svc_rdma_put_context(ctxt, 1);
> >>break;
> >>case IB_WR_RDMA_WRITE:
> >>+   BUG_ON(ctxt->frmr);
> >>svc_rdma_put_context(ctxt, 0);
> >>break;
> >>case IB_WR_RDMA_READ:
> >>case IB_WR_RDMA_READ_WITH_INV:
> >>+   svc_rdma_put_frmr(xprt, ctxt->frmr);
> >>if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
> >>struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
> >>BUG_ON(!read_hdr);
> >>-   if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
> >>-   svc_rdma_put_frmr(xprt, ctxt->frmr);
> >>spin_lock_bh(&xprt->sc_rq_dto_lock);
> >>set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
> >>list_add_tail(&read_hdr->dto_q,
> >>@@ -363,6 +362,7 @@ static void process_context(struct svcxprt_rdma *xprt,
> >>break;
> >>default:
> >>+   BUG_ON(1);
> >>printk(KERN_ERR "svcrdma: unexpected completion type, "
> >>   "opcode=%d\n",
> >>   ctxt->wr_op);
> >Note the printk's unreachable now.  Should some of these BUG_ON()'s be
> >WARN_ON()'s?
> 
> I'll remove the printk.  And if any of the new BUG_ON()'s can be
> WARN_ON(), then I'll do that.  But only if proceeding after a
> WARN_ON() results in a working server.

The other thing to keep in mind is what the consequences of the BUG
might be--e.g. if we BUG while holding an important lock then that lock
never gets dropped and the system can freeze pretty quickly--possibly
before we get any useful information to the system logs.  On a quick
check that doesn't look like the case here, though.

--b.


Re: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic

2014-05-06 Thread Steve Wise

On 5/6/2014 2:27 PM, J. Bruce Fields wrote:
> On Tue, May 06, 2014 at 12:46:21PM -0500, Steve Wise wrote:
>> This patch series refactors the NFSRDMA server marshalling logic to
>> remove the intermediary map structures.  It also fixes an existing bug
>> where the NFSRDMA server was not minding the device fast register page
>> list length limitations.
>>
>> I've also made a git repo available with these patches on top of 3.15-rc4:
>>
>> git://git.openfabrics.org/~swise/linux svcrdma-refactor
>>
>> Changes since V1:
>>
>> - fixed regression for devices that don't support FRMRs (see
>>   rdma_read_chunk_lcl())
>>
>> - split patch up for closer review.  However I request it be squashed
>>   before merging as it is not bisectable, and I think these changes
>>   should all be a single commit anyway.
>
> If it's not split up in a way that's bisectable, then yes, just don't
> bother.

I didn't see a good way to split it up, have it bisectable, and not have
all the big stuff in one patch.  I think it's a little more reviewable in
these 3 patches, but when I post V3, I'll put it back as an uber patch.
Hopefully folks can have a look at these 3 patches ignoring the bisect
issue.  Having said that, the rdma read logic really is better
reviewed by looking at the code after applying the patches.  That's why I
published a git branch.


Thanks!

Steve.



Re: [PATCH V2 RFC 1/3] svcrdma: Transport and header file changes

2014-05-06 Thread Steve Wise

On 5/6/2014 2:21 PM, J. Bruce Fields wrote:
> On Tue, May 06, 2014 at 12:46:27PM -0500, Steve Wise wrote:
>> From: Tom Tucker 
>>
>> Change poll logic to grab up to 6 completions at a time.
>>
>> RDMA write and send completions no longer deal with fastreg objects.
>>
>> Set SVCRDMA_DEVCAP_FAST_REG and allocate a dma_mr based on the device
>> capabilities.
>>
>> Signed-off-by: Tom Tucker 
>> Signed-off-by: Steve Wise 
>> ---
>>
>>  include/linux/sunrpc/svc_rdma.h          |    3 -
>>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 +-
>>  2 files changed, 37 insertions(+), 28 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index 0b8e3e6..5cf99a0 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
>>  	struct list_head frmr_list;
>>  };
>>  struct svc_rdma_req_map {
>> -	struct svc_rdma_fastreg_mr *frmr;
>>  	unsigned long count;
>>  	union {
>>  		struct kvec sge[RPCSVC_MAXPAGES];
>>  		struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
>> +		unsigned long lkey[RPCSVC_MAXPAGES];
>>  	};
>>  };
>> -#define RDMACTXT_F_FAST_UNREG	1
>>  #define RDMACTXT_F_LAST_CTXT	2
>>
>>  #define	SVCRDMA_DEVCAP_FAST_REG		1	/* fast mr registration */
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
>> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 25688fa..2c5b201 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -1,4 +1,5 @@
>>  /*
>> + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>>   * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
>>   *
>>   * This software is available to you under a choice of one of two
>> @@ -160,7 +161,6 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
>>  		schedule_timeout_uninterruptible(msecs_to_jiffies(500));
>>  	}
>>  	map->count = 0;
>> -	map->frmr = NULL;
>>  	return map;
>>  }
>>
>> @@ -336,22 +336,21 @@ static void process_context(struct svcxprt_rdma *xprt,
>>
>>  	switch (ctxt->wr_op) {
>>  	case IB_WR_SEND:
>> -		if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
>> -			svc_rdma_put_frmr(xprt, ctxt->frmr);
>> +		BUG_ON(ctxt->frmr);
>>  		svc_rdma_put_context(ctxt, 1);
>>  		break;
>>
>>  	case IB_WR_RDMA_WRITE:
>> +		BUG_ON(ctxt->frmr);
>>  		svc_rdma_put_context(ctxt, 0);
>>  		break;
>>
>>  	case IB_WR_RDMA_READ:
>>  	case IB_WR_RDMA_READ_WITH_INV:
>> +		svc_rdma_put_frmr(xprt, ctxt->frmr);
>>  		if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
>>  			struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
>>  			BUG_ON(!read_hdr);
>> -			if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
>> -				svc_rdma_put_frmr(xprt, ctxt->frmr);
>>  			spin_lock_bh(&xprt->sc_rq_dto_lock);
>>  			set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
>>  			list_add_tail(&read_hdr->dto_q,
>> @@ -363,6 +362,7 @@ static void process_context(struct svcxprt_rdma *xprt,
>>  		break;
>>
>>  	default:
>> +		BUG_ON(1);
>>  		printk(KERN_ERR "svcrdma: unexpected completion type, "
>>  		       "opcode=%d\n",
>>  		       ctxt->wr_op);
>
> Note the printk's unreachable now.  Should some of these BUG_ON()'s be
> WARN_ON()'s?

I'll remove the printk.  And if any of the new BUG_ON()'s can be
WARN_ON(), then I'll do that.  But only if proceeding after a
WARN_ON() results in a working server.

>> @@ -378,29 +378,42 @@ static void process_context(struct svcxprt_rdma *xprt,
>>  static void sq_cq_reap(struct svcxprt_rdma *xprt)
>>  {
>>  	struct svc_rdma_op_ctxt *ctxt = NULL;
>> -	struct ib_wc wc;
>> +	struct ib_wc wc_a[6];
>> +	struct ib_wc *wc;
>>  	struct ib_cq *cq = xprt->sc_sq_cq;
>>  	int ret;
>
> May want to keep an eye on the stack usage here?


Ok.  Perhaps I'll put the array in the svc_rdma_op_ctxt.



> --b.

>>
>> +	memset(wc_a, 0, sizeof(wc_a));
>> +
>>  	if (!test_and_clear_bit(RDMAXPRT_SQ_PENDING, &xprt->sc_flags))
>>  		return;
>>
>>  	ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);
>>  	atomic_inc(&rdma_stat_sq_poll);
>> -	while ((ret = ib_poll_cq(cq, 1, &wc)) > 0) {
>> -		if (wc.status != IB_WC_SUCCESS)
>> -			/* Close the transport */
>> -			set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
>> +	while ((ret = ib_poll_cq(cq, ARRAY_SIZE(wc_a), wc_a)) > 0) {
>> +		int i;
>>
>> -		/* Decrement used SQ WR count */
>> -		atomic_dec(&xprt->sc_sq_count);
>> -		wake_up(&xprt->sc_send_wait);
>> +		for (i = 0; i < ret; i++) {
>> +			wc = &wc_a[i];
>> +			if (wc->status != IB_WC_SUCCESS) {
>> +				dprintk("svcrdma: sq wc err status %d\n",
>> +					wc->status);

RE: Announcing IBM Platform MPI 9.1.2.1 FixPack

2014-05-06 Thread Hefty, Sean
> I am trying to add rdmacm support to Platform MPI.  I noticed that the
> performance on our test cluster was very poor for creating connections.
> For 12 processes on 12 hosts to create n^2 connections takes about 12
> seconds.  I also discovered that if I create some TCP sockets and use
> those to ensure that only one process at a time is calling rdmacm_connect
> to any target at a time, that the performance changes dramatically and
> that I can then connect the 12 processes very quickly (didn't measure
> exactly, but similar to our old rdma code).  The order in which I am
> connecting processes avoids flooding a single target with many
> rdmacm_connects at once, but it is difficult to avoid the case where 2
> processes call rdmacm_connect to the same target at roughly the same time
> except when using my extra TCP socket connections.  I haven't played with
> MPICH code yet to see if they have the same issue, but will try that next.
> 
> 
> Our test cluster is a bit old:
> 
> 09:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0
> 5GT/s - IB QDR / 10GigE] (rev b0)
> 
> Is this a known problem?  Are you aware of any issues that would shed some
> light on this?

This is the first I've heard of slow connect times.  Are you sure that the time 
is coming from rdma_connect, versus route or address resolution?


Re: [PATCH] IB/mlx4: Allow to always block UD multicast loopback

2014-05-06 Thread Doug Ledford
- Original Message -
> On Tue, 6 May 2014, Doug Ledford wrote:
> 
> > That table only tells the NIC to listen to specific multicast
> > addresses on the wire.  This is roughly equivalent to telling
> > the SM to subscribe the port to the multicast groups it needs
> > subscribed to.  In both cases, this merely gets the packets
> > to the card.  From there, the card (or the OS as is usually the
> > case on NICs and Ethernet multicast) must redistribute the
> > packet to all queue pairs that have subscribed to the group
> > that the packet was received from.  If you were to block it at
> > the group level, then it universally affects all applications
> > that subscribed to that group, and there might well be a number
> > of applications that did not request this behavior and would
> > be rightfully confused at the card/OS not sending them their
> > multicast packets.  So, I would suggest that the blocking of
> > loopback multicast packets needs to be "opt in" for all
> > applications.  The big hammer of blocking all loopback on an
> > entire card or an entire group, while possible, should be
> > highly discouraged.  It might work with limited applications
> > that know it is being done, but it can also lead to hard to
> > diagnose problems if you add a new application into the mix
> > and it is unaware of this hammer being used and unable to
> > handle the situation.
> 
> Right, the multicast blocking occurs at the socket level for regular
> networking (see the IP_MULTICAST_LOOP socket option). Sockets are owned
> by the application.
> 
> 
> A QP is roughly the equivalent thing at the RDMA level. So it seems
> to me
> that blocking needs to occur at the QP level and not at the multicast
> group level as suggested by Sean.

Nobody that I know of suggested that this should occur at the multicast
group level.  I suggested, and Sean agreed with, the idea that this
should  happen at multicast join time.  That means it would be on a
per queue pair, per multicast join basis.

Setting the IP_MULTICAST_LOOP on the socket affects all joins on that
socket, so this would be a slightly different API.

Right now, the IP stack is somewhat limited in that you would have
to create two different sockets and set IP_MULTICAST_LOOP differently
on the two sockets in order to have some joins reflect your data back
and some not.  You could sort of create what I'm talking about by
using the source address block, but that would block all applications
from your IP address, not just your own sent data, and so wouldn't
really work.

My original preference would have been to allow one queue pair to
have joins in either blocked or unblocked state, that way you would
only need one queue pair for both your reflected and non-reflected
joins.  But it would be easier to have a program capable of both
sockets and RDMA connections if we make the queue pairs follow the
sockets semantics here, so I'll withdraw my concerns over the current
patch (but also wouldn't object to it being done on a per queue
pair basis either, since we don't exactly follow socket semantics
either way given that the socket semantics separate the join and
the ioctl to control this behavior, so separating qp creation
and the setting of this behavior in the RDMA API seems reasonable
too).

> The 2008 patch does exactly that and allows setting the multicast
> loopback blocking from user space. Moreover the support is already
> there for in-kernel users. Could we just get that 2008 patch updated
> and merged?
> 
> 

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford



Fw: Announcing IBM Platform MPI 9.1.2.1 FixPack

2014-05-06 Thread David Solt
Resending as plain text for linux-rdma's sake:


From:   David Solt/Dallas/IBM
To: sean.he...@intel.com, 
Cc: Geoffrey Paulsen/Dallas/IBM@IBMUS, linux-rdma@vger.kernel.org
Date:   05/02/2014 09:33 AM
Subject: Re: Fw: Announcing IBM Platform MPI 9.1.2.1 FixPack


Hi Sean,

I am trying to add rdmacm support to Platform MPI.  I noticed that the
performance on our test cluster was very poor for creating connections.
For 12 processes on 12 hosts to create n^2 connections takes about 12
seconds.  I also discovered that if I create some TCP sockets and use
those to ensure that only one process at a time is calling rdmacm_connect
to any target at a time, that the performance changes dramatically and
that I can then connect the 12 processes very quickly (didn't measure
exactly, but similar to our old rdma code).  The order in which I am
connecting processes avoids flooding a single target with many
rdmacm_connects at once, but it is difficult to avoid the case where 2
processes call rdmacm_connect to the same target at roughly the same time
except when using my extra TCP socket connections.  I haven't played with
MPICH code yet to see if they have the same issue, but will try that next.

Our test cluster is a bit old:

09:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 
5GT/s - IB QDR / 10GigE] (rev b0) 

Is this a known problem?  Are you aware of any issues that would shed some 
light on this? 

Thanks,
Dave



RE: [PATCH] IB/mlx4: Allow to always block UD multicast loopback

2014-05-06 Thread Hefty, Sean
> Right, the multicast blocking occurs at the socket level for regular
> networking (see the IP_MULTICAST_LOOP socket option). Sockets are owned
> by the application.
> 
> 
> A QP is roughly the equivalent thing at the RDMA level. So it seems to me
> that blocking needs to occur at the QP level and not at the multicast
> group level as suggested by Sean.

I was unaware of the IP_MULTICAST_LOOP option.  Given that, it makes sense
to apply this at the QP level.


Re: [PATCH] IB/mlx4: Allow to always block UD multicast loopback

2014-05-06 Thread Christoph Lameter
On Tue, 6 May 2014, Doug Ledford wrote:

> That table only tells the NIC to listen to specific multicast
> addresses on the wire.  This is roughly equivalent to telling
> the SM to subscribe the port to the multicast groups it needs
> subscribed to.  In both cases, this merely gets the packets
> to the card.  From there, the card (or the OS as is usually the
> case on NICs and Ethernet multicast) must redistribute the
> packet to all queue pairs that have subscribed to the group
> that the packet was received from.  If you were to block it at
> the group level, then it universally affects all applications
> that subscribed to that group, and there might well be a number
> of applications that did not request this behavior and would
> be rightfully confused at the card/OS not sending them their
> multicast packets.  So, I would suggest that the blocking of
> loopback multicast packets needs to be "opt in" for all
> applications.  The big hammer of blocking all loopback on an
> entire card or an entire group, while possible, should be
> highly discouraged.  It might work with limited applications
> that know it is being done, but it can also lead to hard to
> diagnose problems if you add a new application into the mix
> and it is unaware of this hammer being used and unable to
> handle the situation.

Right, the multicast blocking occurs at the socket level for regular
networking (see the IP_MULTICAST_LOOP socket option). Sockets are owned
by the application.


A QP is roughly the equivalent thing at the RDMA level. So it seems to me
that blocking needs to occur at the QP level and not at the multicast
group level as suggested by Sean.

The 2008 patch does exactly that and allows setting the multicast loopback
blocking from user space. Moreover the support is already there for
in-kernel users. Could we just get that 2008 patch updated and merged?



Re: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic

2014-05-06 Thread J. Bruce Fields
On Tue, May 06, 2014 at 12:46:21PM -0500, Steve Wise wrote:
> This patch series refactors the NFSRDMA server marshalling logic to
> remove the intermediary map structures.  It also fixes an existing bug
> where the NFSRDMA server was not minding the device fast register page
> list length limitations.
> 
> I've also made a git repo available with these patches on top of 3.15-rc4:
> 
> git://git.openfabrics.org/~swise/linux svcrdma-refactor
> 
> Changes since V1:
> 
> - fixed regression for devices that don't support FRMRs (see
>   rdma_read_chunk_lcl())
> 
> - split patch up for closer review.  However I request it be squashed
>   before merging as it is not bisectable, and I think these changes
>   should all be a single commit anyway.

If it's not split up in a way that's bisectable, then yes, just don't
bother.

--b.

> 
> Please review, and test if you can.
> 
> Signed-off-by: Tom Tucker 
> Signed-off-by: Steve Wise 
> 
> ---
> 
> Tom Tucker (3):
>   svcrdma: Sendto changes
>   svcrdma: Recvfrom changes
>   svcrdma: Transport and header file changes
> 
> 
>  include/linux/sunrpc/svc_rdma.h  |3 
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  633 
> --
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
>  4 files changed, 318 insertions(+), 610 deletions(-)
> 
> -- 
> 
> Steve / Tom


Re: [PATCH] IB/mlx4: Allow to always block UD multicast loopback

2014-05-06 Thread Doug Ledford
- Original Message -
> On Mon, 5 May 2014, Hefty, Sean wrote:
> 
> > I agree with Doug here.  This makes more sense to specify on
> > multicast attach time, not QP creation time.
> 
> On topic response: Do devices support multicast loopback blocking on a
> multicast group?

I would think this is very doubtful...see below...

> From what I can tell the multicast support in NICs is
> based on having a table with MAC addresses of the multicast groups that
> the NIC is able to receive.

That table only tells the NIC to listen to specific multicast
addresses on the wire.  This is roughly equivalent to telling
the SM to subscribe the port to the multicast groups it needs
subscribed to.  In both cases, this merely gets the packets
to the card.  From there, the card (or the OS as is usually the
case on NICs and Ethernet multicast) must redistribute the
packet to all queue pairs that have subscribed to the group
that the packet was received from.  If you were to block it at
the group level, then it universally affects all applications
that subscribed to that group, and there might well be a number
of applications that did not request this behavior and would
be rightfully confused at the card/OS not sending them their
multicast packets.  So, I would suggest that the blocking of
loopback multicast packets needs to be "opt in" for all
applications.  The big hammer of blocking all loopback on an
entire card or an entire group, while possible, should be
highly discouraged.  It might work with limited applications
that know it is being done, but it can also lead to hard to
diagnose problems if you add a new application into the mix
and it is unaware of this hammer being used and unable to
handle the situation.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford



Re: [PATCH V2 RFC 1/3] svcrdma: Transport and header file changes

2014-05-06 Thread J. Bruce Fields
On Tue, May 06, 2014 at 12:46:27PM -0500, Steve Wise wrote:
> From: Tom Tucker 
> 
> Change poll logic to grab up to 6 completions at a time.
> 
> RDMA write and send completions no longer deal with fastreg objects.
> 
> Set SVCRDMA_DEVCAP_FAST_REG and allocate a dma_mr based on the device
> capabilities.
> 
> Signed-off-by: Tom Tucker 
> Signed-off-by: Steve Wise 
> ---
> 
>  include/linux/sunrpc/svc_rdma.h  |3 -
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 
> +-
>  2 files changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 0b8e3e6..5cf99a0 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
>   struct list_head frmr_list;
>  };
>  struct svc_rdma_req_map {
> - struct svc_rdma_fastreg_mr *frmr;
>   unsigned long count;
>   union {
>   struct kvec sge[RPCSVC_MAXPAGES];
>   struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
> + unsigned long lkey[RPCSVC_MAXPAGES];
>   };
>  };
> -#define RDMACTXT_F_FAST_UNREG1
>  #define RDMACTXT_F_LAST_CTXT 2
>  
>  #define  SVCRDMA_DEVCAP_FAST_REG 1   /* fast mr registration 
> */
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 25688fa..2c5b201 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -1,4 +1,5 @@
>  /*
> + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>   * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -160,7 +161,6 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
>   schedule_timeout_uninterruptible(msecs_to_jiffies(500));
>   }
>   map->count = 0;
> - map->frmr = NULL;
>   return map;
>  }
>  
> @@ -336,22 +336,21 @@ static void process_context(struct svcxprt_rdma *xprt,
>  
>   switch (ctxt->wr_op) {
>   case IB_WR_SEND:
> - if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
> - svc_rdma_put_frmr(xprt, ctxt->frmr);
> + BUG_ON(ctxt->frmr);
>   svc_rdma_put_context(ctxt, 1);
>   break;
>  
>   case IB_WR_RDMA_WRITE:
> + BUG_ON(ctxt->frmr);
>   svc_rdma_put_context(ctxt, 0);
>   break;
>  
>   case IB_WR_RDMA_READ:
>   case IB_WR_RDMA_READ_WITH_INV:
> + svc_rdma_put_frmr(xprt, ctxt->frmr);
>   if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
>   struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
>   BUG_ON(!read_hdr);
> - if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
> - svc_rdma_put_frmr(xprt, ctxt->frmr);
>   spin_lock_bh(&xprt->sc_rq_dto_lock);
>   set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
>   list_add_tail(&read_hdr->dto_q,
> @@ -363,6 +362,7 @@ static void process_context(struct svcxprt_rdma *xprt,
>   break;
>  
>   default:
> + BUG_ON(1);
>   printk(KERN_ERR "svcrdma: unexpected completion type, "
>  "opcode=%d\n",
>  ctxt->wr_op);

Note the printk is unreachable now.  Should some of these BUG_ON()s be
WARN_ON()s?
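For what it's worth, the trade-off can be modeled in userspace: a WARN_ON()-style default case reports the bogus opcode and drops the context instead of taking the machine down. Everything below is an illustrative stand-in for the svcrdma code, not the kernel API itself.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins for the IB work-request opcodes. */
enum { WR_SEND = 0, WR_RDMA_WRITE = 1, WR_RDMA_READ = 2 };

/*
 * Model of process_context()'s switch: returns 0 for a known opcode,
 * -1 for an unexpected one.  A WARN_ON()-style policy logs the bad
 * opcode and keeps going; BUG_ON() would panic the machine instead.
 */
static int handle_completion(int wr_op)
{
	switch (wr_op) {
	case WR_SEND:
	case WR_RDMA_WRITE:
	case WR_RDMA_READ:
		return 0;	/* normal context teardown */
	default:
		fprintf(stderr,
			"svcrdma: unexpected completion type, opcode=%d\n",
			wr_op);
		return -1;	/* survivable: caller can close the transport */
	}
}
```

Kernel guidance generally reserves BUG() for states where continuing would corrupt data; a bad opcode from the CQ arguably qualifies for a warning plus transport close rather than a panic.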

> @@ -378,29 +378,42 @@ static void process_context(struct svcxprt_rdma *xprt,
>  static void sq_cq_reap(struct svcxprt_rdma *xprt)
>  {
>   struct svc_rdma_op_ctxt *ctxt = NULL;
> - struct ib_wc wc;
> + struct ib_wc wc_a[6];
> + struct ib_wc *wc;
>   struct ib_cq *cq = xprt->sc_sq_cq;
>   int ret;

May want to keep an eye on the stack usage here?

--b.

>  
> + memset(wc_a, 0, sizeof(wc_a));
> +
>   if (!test_and_clear_bit(RDMAXPRT_SQ_PENDING, &xprt->sc_flags))
>   return;
>  
>   ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);
>   atomic_inc(&rdma_stat_sq_poll);
> - while ((ret = ib_poll_cq(cq, 1, &wc)) > 0) {
> - if (wc.status != IB_WC_SUCCESS)
> - /* Close the transport */
> - set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
> + while ((ret = ib_poll_cq(cq, ARRAY_SIZE(wc_a), wc_a)) > 0) {
> + int i;
>  
> - /* Decrement used SQ WR count */
> - atomic_dec(&xprt->sc_sq_count);
> - wake_up(&xprt->sc_send_wait);
> + for (i = 0; i < ret; i++) {
> + wc = &wc_a[i];
> + if (wc->status != IB_WC_SUCCESS) {
> + dprintk("svcrdma: sq wc err status %d\n",
> + wc->status);
>  
> - ctxt = (struct svc_rdma_op_ctxt *)(unsigned lon

RE: [PATCH] IB/mlx4: Allow to always block UD multicast loopback

2014-05-06 Thread Christoph Lameter
On Mon, 5 May 2014, Hefty, Sean wrote:

> I agree with Doug here.  This makes more sense to specify on multicast atta=
> ch time, not QP creation time.

On topic response: Do devices support multicast loopback blocking on a
multicast group? From what I can tell the multicast support in NICs is
based on having a table with MAC addresses of the multicast groups that
the NIC is able to receive.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 RFC 1/3] svcrdma: Transport and header file changes

2014-05-06 Thread Steve Wise
From: Tom Tucker 

Change poll logic to grab up to 6 completions at a time.

RDMA write and send completions no longer deal with fastreg objects.

Set SVCRDMA_DEVCAP_FAST_REG and allocate a dma_mr based on the device
capabilities.
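The effect of the "up to 6 completions" change can be sketched in userspace: batching the poll calls amortizes per-call overhead when the SQ is busy. The stub below merely stands in for ib_poll_cq(); none of these names are the kernel's.

```c
#include <assert.h>

/* Toy CQ holding a count of pending completions. */
struct stub_cq {
	int pending;
};

/* Stand-in for ib_poll_cq(): reaps up to num_entries completions. */
static int stub_poll(struct stub_cq *cq, int num_entries)
{
	int n = cq->pending < num_entries ? cq->pending : num_entries;

	cq->pending -= n;
	return n;
}

/*
 * Drain a CQ with the given batch size, as sq_cq_reap() now does with
 * a 6-entry array; returns how many poll calls were needed, counting
 * the final call that returns 0.
 */
static int poll_calls_to_drain(int pending, int batch)
{
	struct stub_cq cq = { pending };
	int calls = 0;

	while (stub_poll(&cq, batch) > 0)
		calls++;
	return calls + 1;
}
```

With 12 pending completions, one-at-a-time polling needs 13 calls where a 6-entry batch needs 3.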

Signed-off-by: Tom Tucker 
Signed-off-by: Steve Wise 
---

 include/linux/sunrpc/svc_rdma.h  |3 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 +-
 2 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 0b8e3e6..5cf99a0 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
struct list_head frmr_list;
 };
 struct svc_rdma_req_map {
-   struct svc_rdma_fastreg_mr *frmr;
unsigned long count;
union {
struct kvec sge[RPCSVC_MAXPAGES];
struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
+   unsigned long lkey[RPCSVC_MAXPAGES];
};
 };
-#define RDMACTXT_F_FAST_UNREG  1
 #define RDMACTXT_F_LAST_CTXT   2
 
 #define SVCRDMA_DEVCAP_FAST_REG 1   /* fast mr registration */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 25688fa..2c5b201 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -160,7 +161,6 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
schedule_timeout_uninterruptible(msecs_to_jiffies(500));
}
map->count = 0;
-   map->frmr = NULL;
return map;
 }
 
@@ -336,22 +336,21 @@ static void process_context(struct svcxprt_rdma *xprt,
 
switch (ctxt->wr_op) {
case IB_WR_SEND:
-   if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
-   svc_rdma_put_frmr(xprt, ctxt->frmr);
+   BUG_ON(ctxt->frmr);
svc_rdma_put_context(ctxt, 1);
break;
 
case IB_WR_RDMA_WRITE:
+   BUG_ON(ctxt->frmr);
svc_rdma_put_context(ctxt, 0);
break;
 
case IB_WR_RDMA_READ:
case IB_WR_RDMA_READ_WITH_INV:
+   svc_rdma_put_frmr(xprt, ctxt->frmr);
if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
BUG_ON(!read_hdr);
-   if (test_bit(RDMACTXT_F_FAST_UNREG, &ctxt->flags))
-   svc_rdma_put_frmr(xprt, ctxt->frmr);
spin_lock_bh(&xprt->sc_rq_dto_lock);
set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
list_add_tail(&read_hdr->dto_q,
@@ -363,6 +362,7 @@ static void process_context(struct svcxprt_rdma *xprt,
break;
 
default:
+   BUG_ON(1);
printk(KERN_ERR "svcrdma: unexpected completion type, "
   "opcode=%d\n",
   ctxt->wr_op);
@@ -378,29 +378,42 @@ static void process_context(struct svcxprt_rdma *xprt,
 static void sq_cq_reap(struct svcxprt_rdma *xprt)
 {
struct svc_rdma_op_ctxt *ctxt = NULL;
-   struct ib_wc wc;
+   struct ib_wc wc_a[6];
+   struct ib_wc *wc;
struct ib_cq *cq = xprt->sc_sq_cq;
int ret;
 
+   memset(wc_a, 0, sizeof(wc_a));
+
if (!test_and_clear_bit(RDMAXPRT_SQ_PENDING, &xprt->sc_flags))
return;
 
ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);
atomic_inc(&rdma_stat_sq_poll);
-   while ((ret = ib_poll_cq(cq, 1, &wc)) > 0) {
-   if (wc.status != IB_WC_SUCCESS)
-   /* Close the transport */
-   set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
+   while ((ret = ib_poll_cq(cq, ARRAY_SIZE(wc_a), wc_a)) > 0) {
+   int i;
 
-   /* Decrement used SQ WR count */
-   atomic_dec(&xprt->sc_sq_count);
-   wake_up(&xprt->sc_send_wait);
+   for (i = 0; i < ret; i++) {
+   wc = &wc_a[i];
+   if (wc->status != IB_WC_SUCCESS) {
+   dprintk("svcrdma: sq wc err status %d\n",
+   wc->status);
 
-   ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
-   if (ctxt)
-   process_context(xprt, ctxt);
+   /* Close the transport */
+   set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
+   }
 
-   svc_xprt_put(&xprt->sc_xprt);
+  

[PATCH V2 RFC 3/3] svcrdma: Sendto changes

2014-05-06 Thread Steve Wise
From: Tom Tucker 

Don't use fast-register mrs for the source of RDMA writes and sends.
Instead, use either a local dma lkey or a dma_mr lkey based on what the
device supports.
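The selection policy described here can be sketched as a tiny pure function: use the device's global local-DMA lkey when the HCA advertises one, otherwise fall back to the lkey of a dma_mr set up at transport creation. The capability bit and names below are illustrative stand-ins, not the verbs API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for a local-DMA-lkey device capability bit. */
#define CAP_LOCAL_DMA_LKEY 0x1u

/*
 * Model of the lkey choice the patch describes: prefer the device's
 * built-in local DMA lkey if present, else use the dma_mr's lkey.
 */
static uint32_t sendto_lkey(uint64_t dev_caps, uint32_t local_dma_lkey,
			    uint32_t dma_mr_lkey)
{
	if (dev_caps & CAP_LOCAL_DMA_LKEY)
		return local_dma_lkey;
	return dma_mr_lkey;
}
```

Either way the source buffers need no fast-register work requests, which is the point of the patch.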

Signed-off-by: Tom Tucker 
Signed-off-by: Steve Wise 
---

 net/sunrpc/xprtrdma/svc_rdma_sendto.c |  230 +++--
 1 files changed, 22 insertions(+), 208 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 7e024a5..49fd21a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -49,152 +50,6 @@
 
 #define RPCDBG_FACILITY RPCDBG_SVCXPRT
 
-/* Encode an XDR as an array of IB SGE
- *
- * Assumptions:
- * - head[0] is physically contiguous.
- * - tail[0] is physically contiguous.
- * - pages[] is not physically or virtually contiguous and consists of
- *   PAGE_SIZE elements.
- *
- * Output:
- * SGE[0]  reserved for RCPRDMA header
- * SGE[1]  data from xdr->head[]
- * SGE[2..sge_count-2] data from xdr->pages[]
- * SGE[sge_count-1]data from xdr->tail.
- *
- * The max SGE we need is the length of the XDR / pagesize + one for
- * head + one for tail + one for RPCRDMA header. Since RPCSVC_MAXPAGES
- * reserves a page for both the request and the reply header, and this
- * array is only concerned with the reply we are assured that we have
- * on extra page for the RPCRMDA header.
- */
-static int fast_reg_xdr(struct svcxprt_rdma *xprt,
-   struct xdr_buf *xdr,
-   struct svc_rdma_req_map *vec)
-{
-   int sge_no;
-   u32 sge_bytes;
-   u32 page_bytes;
-   u32 page_off;
-   int page_no = 0;
-   u8 *frva;
-   struct svc_rdma_fastreg_mr *frmr;
-
-   frmr = svc_rdma_get_frmr(xprt);
-   if (IS_ERR(frmr))
-   return -ENOMEM;
-   vec->frmr = frmr;
-
-   /* Skip the RPCRDMA header */
-   sge_no = 1;
-
-   /* Map the head. */
-   frva = (void *)((unsigned long)(xdr->head[0].iov_base) & PAGE_MASK);
-   vec->sge[sge_no].iov_base = xdr->head[0].iov_base;
-   vec->sge[sge_no].iov_len = xdr->head[0].iov_len;
-   vec->count = 2;
-   sge_no++;
-
-   /* Map the XDR head */
-   frmr->kva = frva;
-   frmr->direction = DMA_TO_DEVICE;
-   frmr->access_flags = 0;
-   frmr->map_len = PAGE_SIZE;
-   frmr->page_list_len = 1;
-   page_off = (unsigned long)xdr->head[0].iov_base & ~PAGE_MASK;
-   frmr->page_list->page_list[page_no] =
-   ib_dma_map_page(xprt->sc_cm_id->device,
-   virt_to_page(xdr->head[0].iov_base),
-   page_off,
-   PAGE_SIZE - page_off,
-   DMA_TO_DEVICE);
-   if (ib_dma_mapping_error(xprt->sc_cm_id->device,
-frmr->page_list->page_list[page_no]))
-   goto fatal_err;
-   atomic_inc(&xprt->sc_dma_used);
-
-   /* Map the XDR page list */
-   page_off = xdr->page_base;
-   page_bytes = xdr->page_len + page_off;
-   if (!page_bytes)
-   goto encode_tail;
-
-   /* Map the pages */
-   vec->sge[sge_no].iov_base = frva + frmr->map_len + page_off;
-   vec->sge[sge_no].iov_len = page_bytes;
-   sge_no++;
-   while (page_bytes) {
-   struct page *page;
-
-   page = xdr->pages[page_no++];
-   sge_bytes = min_t(u32, page_bytes, (PAGE_SIZE - page_off));
-   page_bytes -= sge_bytes;
-
-   frmr->page_list->page_list[page_no] =
-   ib_dma_map_page(xprt->sc_cm_id->device,
-   page, page_off,
-   sge_bytes, DMA_TO_DEVICE);
-   if (ib_dma_mapping_error(xprt->sc_cm_id->device,
-frmr->page_list->page_list[page_no]))
-   goto fatal_err;
-
-   atomic_inc(&xprt->sc_dma_used);
-   page_off = 0; /* reset for next time through loop */
-   frmr->map_len += PAGE_SIZE;
-   frmr->page_list_len++;
-   }
-   vec->count++;
-
- encode_tail:
-   /* Map tail */
-   if (0 == xdr->tail[0].iov_len)
-   goto done;
-
-   vec->count++;
-   vec->sge[sge_no].iov_len = xdr->tail[0].iov_len;
-
-   if (((unsigned long)xdr->tail[0].iov_base & PAGE_MASK) ==
-   ((unsigned long)xdr->head[0].iov_base & PAGE_MASK)) {
-   /*
-* If head and tail use the same page, we don't need
-* to map it again.
-*/
-   vec->sge[sge_no].iov_base = xdr->tail[0].iov_base;
-  

[PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic

2014-05-06 Thread Steve Wise
This patch series refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures.  It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.

I've also made a git repo available with these patches on top of 3.15-rc4:

git://git.openfabrics.org/~swise/linux svcrdma-refactor

Changes since V1:

- fixed regression for devices that don't support FRMRs (see
  rdma_read_chunk_lcl())

- split the patch up for closer review.  However, I request it be squashed
  before merging, as the series is not bisectable, and I think these changes
  should all be a single commit anyway.

Please review, and test if you can.

Signed-off-by: Tom Tucker 
Signed-off-by: Steve Wise 

---

Tom Tucker (3):
  svcrdma: Sendto changes
  svcrdma: Recvfrom changes
  svcrdma: Transport and header file changes


 include/linux/sunrpc/svc_rdma.h  |3 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  633 --
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
 4 files changed, 318 insertions(+), 610 deletions(-)

-- 

Steve / Tom


[PATCH V2 RFC 2/3] svcrdma: Recvfrom changes

2014-05-06 Thread Steve Wise
From: Tom Tucker 

Based on device support, RDMA read target sgls are fast-registered,
or composed using the local dma lkey or a dma_mr lkey.  A given NFS
Write chunk list will be split into a set of rdma reads based on the
limitations of the device.
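The splitting arithmetic is straightforward ceiling division: a chunk spans some number of pages, and each RDMA read can cover at most the device's fast-register page-list limit. A minimal sketch, assuming a 4 KiB page size for illustration:

```c
#include <assert.h>

#define PAGE_SZ 4096u	/* assumed page size, for illustration only */

/* Pages touched by a chunk of len bytes starting at a page offset. */
static unsigned int pages_spanned(unsigned int page_off, unsigned int len)
{
	return (page_off % PAGE_SZ + len + PAGE_SZ - 1) / PAGE_SZ;
}

/*
 * Number of RDMA reads a chunk becomes when each read may cover at
 * most max_pages pages (the device's page-list length limit).
 */
static unsigned int reads_needed(unsigned int page_off, unsigned int len,
				 unsigned int max_pages)
{
	unsigned int pages = pages_spanned(page_off, len);

	return (pages + max_pages - 1) / max_pages;
}
```

So a 64 KiB page-aligned chunk on a device limited to 4 pages per registration becomes 4 reads; an unaligned chunk can cost one extra page and hence possibly one extra read.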

Signed-off-by: Tom Tucker 
Signed-off-by: Steve Wise 
---

 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |  633 +--
 1 files changed, 259 insertions(+), 374 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 8d904e4..1c4c285 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -69,7 +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
/* Set up the XDR head */
rqstp->rq_arg.head[0].iov_base = page_address(page);
-   rqstp->rq_arg.head[0].iov_len = min(byte_count, ctxt->sge[0].length);
+   rqstp->rq_arg.head[0].iov_len =
+   min_t(size_t, byte_count, ctxt->sge[0].length);
rqstp->rq_arg.len = byte_count;
rqstp->rq_arg.buflen = byte_count;
 
@@ -85,7 +87,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
page = ctxt->pages[sge_no];
put_page(rqstp->rq_pages[sge_no]);
rqstp->rq_pages[sge_no] = page;
-   bc -= min(bc, ctxt->sge[sge_no].length);
+   bc -= min_t(u32, bc, ctxt->sge[sge_no].length);
rqstp->rq_arg.buflen += ctxt->sge[sge_no].length;
sge_no++;
}
@@ -113,291 +115,249 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
rqstp->rq_arg.tail[0].iov_len = 0;
 }
 
-/* Encode a read-chunk-list as an array of IB SGE
- *
- * Assumptions:
- * - chunk[0]->position points to pages[0] at an offset of 0
- * - pages[] is not physically or virtually contiguous and consists of
- *   PAGE_SIZE elements.
- *
- * Output:
- * - sge array pointing into pages[] array.
- * - chunk_sge array specifying sge index and count for each
- *   chunk in the read list
- *
- */
-static int map_read_chunks(struct svcxprt_rdma *xprt,
-  struct svc_rqst *rqstp,
-  struct svc_rdma_op_ctxt *head,
-  struct rpcrdma_msg *rmsgp,
-  struct svc_rdma_req_map *rpl_map,
-  struct svc_rdma_req_map *chl_map,
-  int ch_count,
-  int byte_count)
+static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-   int sge_no;
-   int sge_bytes;
-   int page_off;
-   int page_no;
-   int ch_bytes;
-   int ch_no;
-   struct rpcrdma_read_chunk *ch;
+   if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
+RDMA_TRANSPORT_IWARP)
+   return 1;
+   else
+   return min_t(int, sge_count, xprt->sc_max_sge);
+}
 
-   sge_no = 0;
-   page_no = 0;
-   page_off = 0;
-   ch = (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
-   ch_no = 0;
-   ch_bytes = ntohl(ch->rc_target.rs_length);
-   head->arg.head[0] = rqstp->rq_arg.head[0];
-   head->arg.tail[0] = rqstp->rq_arg.tail[0];
-   head->arg.pages = &head->pages[head->count];
-   head->hdr_count = head->count; /* save count of hdr pages */
-   head->arg.page_base = 0;
-   head->arg.page_len = ch_bytes;
-   head->arg.len = rqstp->rq_arg.len + ch_bytes;
-   head->arg.buflen = rqstp->rq_arg.buflen + ch_bytes;
-   head->count++;
-   chl_map->ch[0].start = 0;
-   while (byte_count) {
-   rpl_map->sge[sge_no].iov_base =
-   page_address(rqstp->rq_arg.pages[page_no]) + page_off;
-   sge_bytes = min_t(int, PAGE_SIZE-page_off, ch_bytes);
-   rpl_map->sge[sge_no].iov_len = sge_bytes;
-   /*
-* Don't bump head->count here because the same page
-* may be used by multiple SGE.
-*/
-   head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no];
-   rqstp->rq_respages = &rqstp->rq_arg.pages[page_no+1];
+typedef int (*rdma_reader_fn)(struct svcxprt_rdma *xprt,
+ struct svc_rqst *rqstp,
+ struct svc_rdma_op_ctxt *head,
+ int *page_no,
+ u32 *page_offset,
+ u32 rs_handle,
+ u32 rs_length,
+ u64 rs_offset,
+ int last);
+
+/* Issue an RDMA_READ using the local lkey to map the data sink */
+static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt

Re: [PATCH 0/5] add new verb command support

2014-05-06 Thread Roland Dreier
On Mon, May 5, 2014 at 6:14 AM, Alexey Ishchuk
 wrote:
> Dear community,
>
> I posted the patches to provide the DAPL API support on the s390x platform
> to the linux-rdma mailing list several weeks ago and still don't have any
> feedback to them. I would like to kindly ask the component maintainers to
> take care of the patches and integrate them to the appropriate source code
> trees. Could you tell me, are you planning to integrate the kernel patch
> into the Linux kernel? Could you tell me, are the userspace component
> maintainers planning to integrate the changes into the OFED components?
> Could you, please, give the feedback?

In my opinion, the same feedback I gave for the first posting still
applies.  This approach is too invasive and hacky looking.  To be
blunt, s390 is not a mainstream platform, and adding #ifdefs all over
just to support fake kernel bypass on s390 doesn't seem like a good
tradeoff.

So I think this needs to be rearchitected before it goes upstream.

 - R.


Re: [ANNOUNCE] libibverbs 1.1.8 is released

2014-05-06 Thread Or Gerlitz
On Mon, May 5, 2014 at 10:19 PM, Roland Dreier  wrote:
> libibverbs is a library that allows programs to use RDMA "verbs" for
> direct access to RDMA (currently InfiniBand and iWARP) hardware from
> userspace.
> The new stable release, 1.1.8, is available from


Hi Roland, as you've mentioned to us before, this release will be
accompanied by a release of libmlx4 too, right? so when?


Re: [PATCH 0/9] SRP initiator patches for kernel 3.16

2014-05-06 Thread Bart Van Assche
On 05/06/14 16:06, Jack Wang wrote:
> On 05/06/2014 02:49 PM, Bart Van Assche wrote:
>> This patch series consists of one patch that adds fast registration
>> support to the SRP initiator and eight preparation patches:
>>
>> 0001-IB-srp-Fix-kernel-doc-warnings.patch
>> 0002-IB-srp-Introduce-an-additional-local-variable.patch
>> 0003-IB-srp-Introduce-srp_alloc_fmr_pool.patch
>> 0004-IB-srp-Introduce-srp_map_fmr.patch
>> 0005-IB-srp-Introduce-srp_finish_mapping.patch
>> 0006-IB-srp-Make-srp_alloc_req_data-reallocate-request-da.patch
>> 0007-IB-srp-Avoid-triggering-an-infinite-loop-if-memory-m.patch
>> 0008-IB-srp-Rename-FMR-related-variables.patch
>> 0009-IB-srp-Add-fast-registration-support.patch
> 
> The 3rd patch does not show up on the mailing list; could you resend it?
> The first 2 look clear, right? :)

Hello Jack,

That's strange ... anyway, thanks for the notification. I have resent
patch 3/9.

Bart.



[PATCH 3/9] IB/srp: Introduce srp_alloc_fmr_pool()

2014-05-06 Thread Bart Van Assche
Introduce the srp_alloc_fmr_pool() function. Only set
srp_dev->fmr_max_size if FMR pool creation succeeded. This change is
safe since that variable is only used if FMR pool creation succeeded.
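The retry loop being factored out here halves the requested pages-per-MR until pool creation succeeds. A userspace model of that shape, with a stubbed success condition standing in for ib_create_fmr_pool() and illustrative values for the SRP constants:

```c
#include <assert.h>

#define FMR_SIZE	256	/* stand-in for SRP_FMR_SIZE */
#define FMR_MIN_SIZE	16	/* stand-in for SRP_FMR_MIN_SIZE */
#define PAGE_SZ		4096u	/* assumed page size */

/*
 * Model of srp_alloc_fmr_pool()'s loop: halve pages-per-MR until the
 * (stubbed) pool allocation succeeds.  Returns the resulting
 * fmr_max_size, or 0 when no size was accepted, in which case the
 * real code leaves fmr_max_size untouched.
 */
static unsigned int pick_fmr_max_size(int hca_max_pages)
{
	int pages;

	for (pages = FMR_SIZE; pages >= FMR_MIN_SIZE; pages /= 2)
		if (pages <= hca_max_pages)	/* stub: pool creation succeeds */
			return pages * PAGE_SZ;
	return 0;
}
```

Setting the size only on success is exactly why the change is safe: a consumer that never sees a pool never reads fmr_max_size.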

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 56 ++---
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8c03371..f41cc8c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2792,13 +2792,43 @@ free_host:
return NULL;
 }
 +static void srp_alloc_fmr_pool(struct srp_device *srp_dev)
+{
+   int max_pages_per_mr;
+   struct ib_fmr_pool_param fmr_param;
+   struct ib_fmr_pool *pool;
+
+   srp_dev->fmr_pool = NULL;
+
+   for (max_pages_per_mr = SRP_FMR_SIZE;
+max_pages_per_mr >= SRP_FMR_MIN_SIZE;
+max_pages_per_mr /= 2) {
+   memset(&fmr_param, 0, sizeof(fmr_param));
+   fmr_param.pool_size = SRP_FMR_POOL_SIZE;
+   fmr_param.dirty_watermark   = SRP_FMR_DIRTY_SIZE;
+   fmr_param.cache = 1;
+   fmr_param.max_pages_per_fmr = max_pages_per_mr;
+   fmr_param.page_shift= ilog2(srp_dev->fmr_page_size);
+   fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
+  IB_ACCESS_REMOTE_WRITE |
+  IB_ACCESS_REMOTE_READ);
+
+   pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param);
+   if (!IS_ERR(pool)) {
+   srp_dev->fmr_pool = pool;
+   srp_dev->fmr_max_size =
+   srp_dev->fmr_page_size * max_pages_per_mr;
+   break;
+   }
+   }
+}
+
 static void srp_add_one(struct ib_device *device)
 {
struct srp_device *srp_dev;
struct ib_device_attr *dev_attr;
-   struct ib_fmr_pool_param fmr_param;
struct srp_host *host;
-   int max_pages_per_fmr, fmr_page_shift, s, e, p;
+   int fmr_page_shift, s, e, p;
dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
if (!dev_attr)
@@ -2821,7 +2851,6 @@ static void srp_add_one(struct ib_device *device)
fmr_page_shift  = max(12, ffs(dev_attr->page_size_cap) - 1);
srp_dev->fmr_page_size  = 1 << fmr_page_shift;
srp_dev->fmr_page_mask  = ~((u64) srp_dev->fmr_page_size - 1);
-   srp_dev->fmr_max_size   = srp_dev->fmr_page_size * SRP_FMR_SIZE;
INIT_LIST_HEAD(&srp_dev->dev_list);
 @@ -2837,26 +2866,7 @@ static void srp_add_one(struct ib_device *device)
if (IS_ERR(srp_dev->mr))
goto err_pd;
 -  for (max_pages_per_fmr = SRP_FMR_SIZE;
-   max_pages_per_fmr >= SRP_FMR_MIN_SIZE;
-   max_pages_per_fmr /= 2, srp_dev->fmr_max_size /= 2) {
-   memset(&fmr_param, 0, sizeof fmr_param);
-   fmr_param.pool_size = SRP_FMR_POOL_SIZE;
-   fmr_param.dirty_watermark   = SRP_FMR_DIRTY_SIZE;
-   fmr_param.cache = 1;
-   fmr_param.max_pages_per_fmr = max_pages_per_fmr;
-   fmr_param.page_shift= fmr_page_shift;
-   fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
-  IB_ACCESS_REMOTE_WRITE |
-  IB_ACCESS_REMOTE_READ);
-
-   srp_dev->fmr_pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param);
-   if (!IS_ERR(srp_dev->fmr_pool))
-   break;
-   }
-
-   if (IS_ERR(srp_dev->fmr_pool))
-   srp_dev->fmr_pool = NULL;
+   srp_alloc_fmr_pool(srp_dev);
if (device->node_type == RDMA_NODE_IB_SWITCH) {
s = 0;
-- 
1.8.4.5



Re: [PATCH 0/9] SRP initiator patches for kernel 3.16

2014-05-06 Thread Jack Wang
On 05/06/2014 02:49 PM, Bart Van Assche wrote:
> This patch series consists of one patch that adds fast registration
> support to the SRP initiator and eight preparation patches:
> 
> 0001-IB-srp-Fix-kernel-doc-warnings.patch
> 0002-IB-srp-Introduce-an-additional-local-variable.patch
> 0003-IB-srp-Introduce-srp_alloc_fmr_pool.patch
> 0004-IB-srp-Introduce-srp_map_fmr.patch
> 0005-IB-srp-Introduce-srp_finish_mapping.patch
> 0006-IB-srp-Make-srp_alloc_req_data-reallocate-request-da.patch
> 0007-IB-srp-Avoid-triggering-an-infinite-loop-if-memory-m.patch
> 0008-IB-srp-Rename-FMR-related-variables.patch
> 0009-IB-srp-Add-fast-registration-support.patch
> 

Hi Bart,

The 3rd patch does not show up on the mailing list; could you resend it?
The first 2 look clear, right? :)

Jack



[PATCH 9/9] IB/srp: Add fast registration support

2014-05-06 Thread Bart Van Assche
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support FR but not FMR. Hence add FR support.
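The mode selection this implies, including the prefer_fr module parameter added below, can be modeled as a small decision function. The capability booleans are illustrative stand-ins for what srp_add_one() would derive from the device attributes:

```c
#include <assert.h>

enum reg_mode { REG_NONE, REG_FMR, REG_FR };

/*
 * Model of the registration-mode choice: FMR when that is all the
 * device offers, FR when FMR is absent (Connect-IB, ConnectX VF),
 * and the prefer_fr knob breaking the tie when both are supported.
 */
static enum reg_mode pick_reg_mode(int has_fmr, int has_fr, int prefer_fr)
{
	if (has_fr && (prefer_fr || !has_fmr))
		return REG_FR;
	if (has_fmr)
		return REG_FMR;
	return REG_NONE;	/* fall back to an unregistered rkey path */
}
```

The default keeps FMR on devices that support both, so existing setups see no behavior change unless prefer_fr is set.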

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 442 ++--
 drivers/infiniband/ulp/srp/ib_srp.h |  82 ++-
 2 files changed, 451 insertions(+), 73 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 017de46..fbda2ca 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -66,6 +66,8 @@ static unsigned int srp_sg_tablesize;
 static unsigned int cmd_sg_entries;
 static unsigned int indirect_sg_entries;
 static bool allow_ext_sg;
+static bool prefer_fr;
+static bool register_always;
 static int topspin_workarounds = 1;
 
 module_param(srp_sg_tablesize, uint, 0444);
@@ -87,6 +89,14 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 "Enable workarounds for Topspin/Cisco SRP target bugs if != 
0");
 
+module_param(prefer_fr, bool, 0444);
+MODULE_PARM_DESC(prefer_fr,
+"Whether to use FR if both FMR and FR are supported");
+
+module_param(register_always, bool, 0444);
+MODULE_PARM_DESC(register_always,
+"Use memory registration even for contiguous memory regions");
+
 static struct kernel_param_ops srp_tmo_ops;
 
 static int srp_reconnect_delay = 10;
@@ -288,12 +298,154 @@ static int srp_new_cm_id(struct srp_target_port *target)
return 0;
 }
 
+/**
+ * srp_destroy_fr_pool() - free the resources owned by a pool
+ * @pool: Fast registration pool to be destroyed.
+ */
+static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
+{
+   int i;
+   struct srp_fr_desc *d;
+
+   if (!pool)
+   return;
+
+   for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+   if (d->frpl)
+   ib_free_fast_reg_page_list(d->frpl);
+   if (d->mr)
+   ib_dereg_mr(d->mr);
+   }
+   kfree(pool);
+}
+
+/**
+ * srp_create_fr_pool() - allocate and initialize a pool for fast registration
+ * @device: IB device to allocate fast registration descriptors for.
+ * @pd: Protection domain associated with the FR descriptors.
+ * @pool_size: Number of descriptors to allocate.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ */
+static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
+ struct ib_pd *pd, int pool_size,
+ int max_page_list_len)
+{
+   struct srp_fr_pool *pool;
+   struct srp_fr_desc *d;
+   struct ib_mr *mr;
+   struct ib_fast_reg_page_list *frpl;
+   int i, ret = -EINVAL;
+
+   if (pool_size <= 0)
+   goto err;
+   ret = -ENOMEM;
+   pool = kzalloc(sizeof(struct srp_fr_pool) +
+  pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
+   if (!pool)
+   goto err;
+   pool->size = pool_size;
+   pool->max_page_list_len = max_page_list_len;
+   spin_lock_init(&pool->lock);
+   INIT_LIST_HEAD(&pool->free_list);
+
+   for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+   mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
+   if (IS_ERR(mr)) {
+   ret = PTR_ERR(mr);
+   goto destroy_pool;
+   }
+   d->mr = mr;
+   frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
+   if (IS_ERR(frpl)) {
+   ret = PTR_ERR(frpl);
+   goto destroy_pool;
+   }
+   d->frpl = frpl;
+   list_add_tail(&d->entry, &pool->free_list);
+   }
+
+out:
+   return pool;
+
+destroy_pool:
+   srp_destroy_fr_pool(pool);
+
+err:
+   pool = ERR_PTR(ret);
+   goto out;
+}
+
+/**
+ * srp_fr_pool_get() - obtain a descriptor suitable for fast registration
+ * @pool: Pool to obtain descriptor from.
+ */
+static struct srp_fr_desc *srp_fr_pool_get(struct srp_fr_pool *pool)
+{
+   struct srp_fr_desc *d = NULL;
+   unsigned long flags;
+
+   spin_lock_irqsave(&pool->lock, flags);
+   if (!list_empty(&pool->free_list)) {
+   d = list_first_entry(&pool->free_list, typeof(*d), entry);
+   list_del(&d->entry);
+   }
+   spin_unlock_irqrestore(&pool->lock, flags);
+
+   return d;
+}
+
+/**
+ * srp_fr_pool_put() - put an FR descriptor back in the free list
+ * @pool: Pool the descriptor was allocated from.
+ * @desc: Pointer to an array of fast registration descriptor pointers.
+ * @n: Number of descriptors to put back.
+ *
+ * Note: The caller must already have queued an invalidation 

[PATCH 8/9] IB/srp: Rename FMR-related variables

2014-05-06 Thread Bart Van Assche
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 44 ++---
 drivers/infiniband/ulp/srp/ib_srp.h | 18 +++
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index af94381..017de46 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -626,7 +626,7 @@ static int srp_alloc_req_data(struct srp_target_port 
*target)
req = &req_ring[i];
req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
-   req->map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
+   req->map_page = kmalloc(SRP_MAX_PAGES_PER_MR * sizeof(void *),
GFP_KERNEL);
req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
if (!req->fmr_list || !req->map_page || !req->indirect_desc)
@@ -784,7 +784,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
return;
 
pfmr = req->fmr_list;
-   while (req->nfmr--)
+   while (req->nmdesc--)
ib_fmr_pool_unmap(*pfmr++);
 
ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
@@ -954,9 +954,9 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
return PTR_ERR(fmr);
 
*state->next_fmr++ = fmr;
-   state->nfmr++;
+   state->nmdesc++;
 
-   srp_map_desc(state, 0, state->fmr_len, fmr->fmr->rkey);
+   srp_map_desc(state, 0, state->dma_len, fmr->fmr->rkey);
 
return 0;
 }
@@ -970,7 +970,7 @@ static int srp_finish_mapping(struct srp_map_state *state,
return 0;
 
if (state->npages == 1) {
-   srp_map_desc(state, state->base_dma_addr, state->fmr_len,
+   srp_map_desc(state, state->base_dma_addr, state->dma_len,
 target->rkey);
} else {
ret = srp_map_finish_fmr(state, target);
@@ -978,7 +978,7 @@ static int srp_finish_mapping(struct srp_map_state *state,
 
if (ret == 0) {
state->npages = 0;
-   state->fmr_len = 0;
+   state->dma_len = 0;
}
 
return ret;
@@ -1023,7 +1023,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 * that were never quite defined, but went away when the initiator
 * avoided using FMR on such page fragments.
 */
-   if (dma_addr & ~dev->fmr_page_mask || dma_len > dev->fmr_max_size) {
+   if (dma_addr & ~dev->mr_page_mask || dma_len > dev->fmr_max_size) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1042,7 +1042,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
 
while (dma_len) {
-   if (state->npages == SRP_FMR_SIZE) {
+   if (state->npages == SRP_MAX_PAGES_PER_MR) {
ret = srp_map_finish_fmr(state, target);
if (ret)
return ret;
@@ -1050,12 +1050,12 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
}
 
-   len = min_t(unsigned int, dma_len, dev->fmr_page_size);
+   len = min_t(unsigned int, dma_len, dev->mr_page_size);
 
if (!state->npages)
state->base_dma_addr = dma_addr;
state->pages[state->npages++] = dma_addr;
-   state->fmr_len += len;
+   state->dma_len += len;
dma_addr += len;
dma_len -= len;
}
@@ -1065,7 +1065,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 * boundries.
 */
ret = 0;
-   if (len != dev->fmr_page_size) {
+   if (len != dev->mr_page_size) {
ret = srp_map_finish_fmr(state, target);
if (!ret)
srp_map_update_start(state, NULL, 0, 0);
@@ -1112,7 +1112,7 @@ backtrack:
if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(state, target))
goto backtrack;
 
-   req->nfmr = state->nfmr;
+   req->nmdesc = state->nmdesc;
 }
 
 static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
@@ -1165,7 +1165,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *targ

[PATCH 7/9] IB/srp: Avoid triggering an infinite loop if memory mapping fails

2014-05-06 Thread Bart Van Assche
Only ask the SCSI mid-layer to retry a SCSI command after a temporary
mapping failure (-ENOMEM), not after a permanent mapping failure. This
patch prevents SCSI commands from being retried indefinitely when a
permanent memory mapping failure occurs.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1c4b0d3..af94381 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1564,7 +1564,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
struct srp_cmd *cmd;
struct ib_device *dev;
unsigned long flags;
-   int len, result;
+   int len, result, ret = SCSI_MLQUEUE_HOST_BUSY;
const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
 
/*
@@ -1580,6 +1580,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
if (unlikely(result)) {
scmnd->result = result;
scmnd->scsi_done(scmnd);
+   ret = 0;
goto unlock_rport;
}
 
@@ -1613,7 +1614,12 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
len = srp_map_data(scmnd, target, req);
if (len < 0) {
shost_printk(KERN_ERR, target->scsi_host,
-PFX "Failed to map data\n");
+PFX "Failed to map data (%d)\n", len);
+   if (len != -ENOMEM) {
+   scmnd->result = DID_ERROR << 16;
+   scmnd->scsi_done(scmnd);
+   ret = 0;
+   }
goto err_iu;
}
 
@@ -1625,11 +1631,13 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
goto err_unmap;
}
 
+   ret = 0;
+
 unlock_rport:
if (in_scsi_eh)
mutex_unlock(&rport->mutex);
 
-   return 0;
+   return ret;
 
 err_unmap:
srp_unmap_data(scmnd, target, req);
@@ -1643,10 +1651,7 @@ err_iu:
 err_unlock:
spin_unlock_irqrestore(&target->lock, flags);
 
-   if (in_scsi_eh)
-   mutex_unlock(&rport->mutex);
-
-   return SCSI_MLQUEUE_HOST_BUSY;
+   goto unlock_rport;
 }
 
 /*
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] IB/srp: Make srp_alloc_req_data() reallocate request data

2014-05-06 Thread Bart Van Assche
This patch is needed by the patch that adds fast registration support.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index ba434d6..1c4b0d3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -574,17 +574,18 @@ static void srp_disconnect_target(struct srp_target_port *target)
}
 }
 
-static void srp_free_req_data(struct srp_target_port *target)
+static void srp_free_req_data(struct srp_target_port *target,
+ struct srp_request *req_ring)
 {
struct ib_device *ibdev = target->srp_host->srp_dev->dev;
struct srp_request *req;
int i;
 
-   if (!target->req_ring)
+   if (!req_ring)
return;
 
for (i = 0; i < target->req_ring_size; ++i) {
-   req = &target->req_ring[i];
+   req = &req_ring[i];
kfree(req->fmr_list);
kfree(req->map_page);
if (req->indirect_dma_addr) {
@@ -595,27 +596,34 @@ static void srp_free_req_data(struct srp_target_port *target)
kfree(req->indirect_desc);
}
 
-   kfree(target->req_ring);
-   target->req_ring = NULL;
+   kfree(req_ring);
 }
 
+/**
+ * srp_alloc_req_data() - allocate or reallocate request data
+ * @target: SRP target port.
+ *
+ * If target->req_ring was non-NULL before this function got invoked it will
+ * also be non-NULL after this function has finished.
+ */
 static int srp_alloc_req_data(struct srp_target_port *target)
 {
struct srp_device *srp_dev = target->srp_host->srp_dev;
struct ib_device *ibdev = srp_dev->dev;
-   struct srp_request *req;
+   struct list_head free_reqs;
+   struct srp_request *req_ring, *req;
dma_addr_t dma_addr;
int i, ret = -ENOMEM;
 
-   INIT_LIST_HEAD(&target->free_reqs);
+   INIT_LIST_HEAD(&free_reqs);
 
-   target->req_ring = kzalloc(target->req_ring_size *
-  sizeof(*target->req_ring), GFP_KERNEL);
-   if (!target->req_ring)
+   req_ring = kzalloc(target->req_ring_size * sizeof(*req_ring),
+  GFP_KERNEL);
+   if (!req_ring)
goto out;
 
for (i = 0; i < target->req_ring_size; ++i) {
-   req = &target->req_ring[i];
+   req = &req_ring[i];
req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
req->map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
@@ -632,11 +640,16 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 
req->indirect_dma_addr = dma_addr;
req->index = i;
-   list_add_tail(&req->list, &target->free_reqs);
+   list_add_tail(&req->list, &free_reqs);
}
+   swap(target->req_ring, req_ring);
+   INIT_LIST_HEAD(&target->free_reqs);
+   list_splice(&free_reqs, &target->free_reqs);
ret = 0;
 
 out:
+   srp_free_req_data(target, req_ring);
+
return ret;
 }
 
@@ -669,7 +682,7 @@ static void srp_remove_target(struct srp_target_port *target)
srp_free_target_ib(target);
cancel_work_sync(&target->tl_err_work);
srp_rport_put(target->rport);
-   srp_free_req_data(target);
+   srp_free_req_data(target, target->req_ring);
 
spin_lock(&target->srp_host->target_lock);
list_del(&target->list);
@@ -2750,7 +2763,7 @@ err_free_ib:
srp_free_target_ib(target);
 
 err_free_mem:
-   srp_free_req_data(target);
+   srp_free_req_data(target, target->req_ring);
 
 err:
scsi_host_put(target_host);
-- 
1.8.4.5
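The reallocation pattern the patch above introduces — build the new ring first, swap it into place, then free whichever ring is now unused — is what keeps target->req_ring valid even when allocation fails. A minimal sketch of that ordering, using a toy ring structure rather than the driver's types:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy request ring.  The point is the swap-then-free ordering that
 * srp_alloc_req_data() now uses: on allocation failure the caller's
 * old ring is never touched; on success the old ring is the one that
 * gets freed. */
struct ring {
	int size;
	int *slots;
};

static int ring_realloc(struct ring *cur, int new_size)
{
	int *old, *slots = calloc(new_size, sizeof(*slots));

	if (!slots)
		return -ENOMEM;	/* old ring still valid and intact */

	old = cur->slots;	/* swap the new ring into place ... */
	cur->slots = slots;
	cur->size = new_size;
	free(old);		/* ... then release the previous one */
	return 0;
}
```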



[PATCH 5/9] IB/srp: Introduce srp_finish_mapping()

2014-05-06 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5fb607b..ba434d6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -935,16 +935,6 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
 
-   if (!state->npages)
-   return 0;
-
-   if (state->npages == 1) {
-   srp_map_desc(state, state->base_dma_addr, state->fmr_len,
-target->rkey);
-   state->npages = state->fmr_len = 0;
-   return 0;
-   }
-
fmr = ib_fmr_pool_map_phys(dev->fmr_pool, state->pages,
   state->npages, io_addr);
if (IS_ERR(fmr))
@@ -954,10 +944,33 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
state->nfmr++;
 
srp_map_desc(state, 0, state->fmr_len, fmr->fmr->rkey);
-   state->npages = state->fmr_len = 0;
+
return 0;
 }
 
+static int srp_finish_mapping(struct srp_map_state *state,
+ struct srp_target_port *target)
+{
+   int ret = 0;
+
+   if (state->npages == 0)
+   return 0;
+
+   if (state->npages == 1) {
+   srp_map_desc(state, state->base_dma_addr, state->fmr_len,
+target->rkey);
+   } else {
+   ret = srp_map_finish_fmr(state, target);
+   }
+
+   if (ret == 0) {
+   state->npages = 0;
+   state->fmr_len = 0;
+   }
+
+   return ret;
+}
+
 static void srp_map_update_start(struct srp_map_state *state,
 struct scatterlist *sg, int sg_index,
 dma_addr_t dma_addr)
@@ -998,7 +1011,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 * avoided using FMR on such page fragments.
 */
if (dma_addr & ~dev->fmr_page_mask || dma_len > dev->fmr_max_size) {
-   ret = srp_map_finish_fmr(state, target);
+   ret = srp_finish_mapping(state, target);
if (ret)
return ret;
 
-- 
1.8.4.5



[PATCH 4/9] IB/srp: Introduce srp_map_fmr()

2014-05-06 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f41cc8c..5fb607b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1047,12 +1047,54 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return ret;
 }
 
+static void srp_map_fmr(struct srp_map_state *state,
+   struct srp_target_port *target, struct srp_request *req,
+   struct scatterlist *scat, int count)
+{
+   struct srp_device *dev = target->srp_host->srp_dev;
+   struct ib_device *ibdev = dev->dev;
+   struct scatterlist *sg;
+   int i, use_fmr;
+
+   state->desc = req->indirect_desc;
+   state->pages= req->map_page;
+   state->next_fmr = req->fmr_list;
+
+   use_fmr = dev->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+
+   for_each_sg(scat, sg, count, i) {
+   if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
+   /* FMR mapping failed, so backtrack to the first
+* unmapped entry and continue on without using FMR.
+*/
+   dma_addr_t dma_addr;
+   unsigned int dma_len;
+
+backtrack:
+   sg = state->unmapped_sg;
+   i = state->unmapped_index;
+
+   dma_addr = ib_sg_dma_address(ibdev, sg);
+   dma_len = ib_sg_dma_len(ibdev, sg);
+   dma_len -= (state->unmapped_addr - dma_addr);
+   dma_addr = state->unmapped_addr;
+   use_fmr = SRP_MAP_NO_FMR;
+   srp_map_desc(state, dma_addr, dma_len, target->rkey);
+   }
+   }
+
+   if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(state, target))
+   goto backtrack;
+
+   req->nfmr = state->nfmr;
+}
+
 static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
struct srp_request *req)
 {
-   struct scatterlist *scat, *sg;
+   struct scatterlist *scat;
struct srp_cmd *cmd = req->cmd->buf;
-   int i, len, nents, count, use_fmr;
+   int len, nents, count;
struct srp_device *dev;
struct ib_device *ibdev;
struct srp_map_state state;
@@ -,35 +1153,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
   target->indirect_size, DMA_TO_DEVICE);
 
memset(&state, 0, sizeof(state));
-   state.desc  = req->indirect_desc;
-   state.pages = req->map_page;
-   state.next_fmr  = req->fmr_list;
-
-   use_fmr = dev->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
-
-   for_each_sg(scat, sg, count, i) {
-   if (srp_map_sg_entry(&state, target, sg, i, use_fmr)) {
-   /* FMR mapping failed, so backtrack to the first
-* unmapped entry and continue on without using FMR.
-*/
-   dma_addr_t dma_addr;
-   unsigned int dma_len;
-
-backtrack:
-   sg = state.unmapped_sg;
-   i = state.unmapped_index;
-
-   dma_addr = ib_sg_dma_address(ibdev, sg);
-   dma_len = ib_sg_dma_len(ibdev, sg);
-   dma_len -= (state.unmapped_addr - dma_addr);
-   dma_addr = state.unmapped_addr;
-   use_fmr = SRP_MAP_NO_FMR;
-   srp_map_desc(&state, dma_addr, dma_len, target->rkey);
-   }
-   }
-
-   if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(&state, target))
-   goto backtrack;
+   srp_map_fmr(&state, target, req, scat, count);
 
/* We've mapped the request, now pull as much of the indirect
 * descriptor table as we can into the command buffer. If this
@@ -1147,7 +1161,6 @@ backtrack:
 * guaranteed to fit into the command, as the SCSI layer won't
 * give us more S/G entries than we allow.
 */
-   req->nfmr = state.nfmr;
if (state.ndesc == 1) {
/* FMR mapping was able to collapse this to one entry,
 * so use a direct descriptor.
-- 
1.8.4.5




[PATCH 3/9] IB/srp: Introduce srp_alloc_fmr_pool()

2014-05-06 Thread Bart Van Assche
Introduce the srp_alloc_fmr_pool() function. Only set
srp_dev->fmr_max_size if FMR pool creation succeeded. This change is
safe since that variable is only used if FMR pool creation succeeded.
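The fallback loop this function wraps — start at the largest FMR size and halve until pool creation succeeds or a minimum is hit, setting fmr_max_size only on success — can be sketched with a stand-in for ib_create_fmr_pool() (the HCA limit below is a hypothetical stand-in for whatever makes creation fail):

```c
#include <assert.h>

/* Illustrative stand-in for ib_create_fmr_pool(): in this sketch a
 * pool can be created only when the requested pages-per-MR value is
 * within a hypothetical HCA limit. */
static int pool_creation_succeeds(int pages, int hca_limit)
{
	return pages <= hca_limit;
}

/* The strategy of srp_alloc_fmr_pool(): start at the maximum FMR
 * size and halve until creation succeeds or the minimum is reached.
 * Returns the pages-per-MR actually used, or 0 if no pool could be
 * created (in which case fmr_max_size stays unset). */
static int alloc_pool_halving(int max_pages, int min_pages, int hca_limit)
{
	int pages;

	for (pages = max_pages; pages >= min_pages; pages /= 2)
		if (pool_creation_succeeds(pages, hca_limit))
			return pages;	/* only now set fmr_max_size */
	return 0;			/* pool stays NULL */
}
```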

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 56 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8c03371..f41cc8c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2792,13 +2792,43 @@ free_host:
return NULL;
 }
 
+static void srp_alloc_fmr_pool(struct srp_device *srp_dev)
+{
+   int max_pages_per_mr;
+   struct ib_fmr_pool_param fmr_param;
+   struct ib_fmr_pool *pool;
+
+   srp_dev->fmr_pool = NULL;
+
+   for (max_pages_per_mr = SRP_FMR_SIZE;
+max_pages_per_mr >= SRP_FMR_MIN_SIZE;
+max_pages_per_mr /= 2) {
+   memset(&fmr_param, 0, sizeof(fmr_param));
+   fmr_param.pool_size = SRP_FMR_POOL_SIZE;
+   fmr_param.dirty_watermark   = SRP_FMR_DIRTY_SIZE;
+   fmr_param.cache = 1;
+   fmr_param.max_pages_per_fmr = max_pages_per_mr;
+   fmr_param.page_shift= ilog2(srp_dev->fmr_page_size);
+   fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
+  IB_ACCESS_REMOTE_WRITE |
+  IB_ACCESS_REMOTE_READ);
+
+   pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param);
+   if (!IS_ERR(pool)) {
+   srp_dev->fmr_pool = pool;
+   srp_dev->fmr_max_size =
+   srp_dev->fmr_page_size * max_pages_per_mr;
+   break;
+   }
+   }
+}
+
 static void srp_add_one(struct ib_device *device)
 {
struct srp_device *srp_dev;
struct ib_device_attr *dev_attr;
-   struct ib_fmr_pool_param fmr_param;
struct srp_host *host;
-   int max_pages_per_fmr, fmr_page_shift, s, e, p;
+   int fmr_page_shift, s, e, p;
 
dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
if (!dev_attr)
@@ -2821,7 +2851,6 @@ static void srp_add_one(struct ib_device *device)
fmr_page_shift  = max(12, ffs(dev_attr->page_size_cap) - 1);
srp_dev->fmr_page_size  = 1 << fmr_page_shift;
srp_dev->fmr_page_mask  = ~((u64) srp_dev->fmr_page_size - 1);
-   srp_dev->fmr_max_size   = srp_dev->fmr_page_size * SRP_FMR_SIZE;
 
INIT_LIST_HEAD(&srp_dev->dev_list);
 
@@ -2837,26 +2866,7 @@ static void srp_add_one(struct ib_device *device)
if (IS_ERR(srp_dev->mr))
goto err_pd;
 
-   for (max_pages_per_fmr = SRP_FMR_SIZE;
-   max_pages_per_fmr >= SRP_FMR_MIN_SIZE;
-   max_pages_per_fmr /= 2, srp_dev->fmr_max_size /= 2) {
-   memset(&fmr_param, 0, sizeof fmr_param);
-   fmr_param.pool_size = SRP_FMR_POOL_SIZE;
-   fmr_param.dirty_watermark   = SRP_FMR_DIRTY_SIZE;
-   fmr_param.cache = 1;
-   fmr_param.max_pages_per_fmr = max_pages_per_fmr;
-   fmr_param.page_shift= fmr_page_shift;
-   fmr_param.access= (IB_ACCESS_LOCAL_WRITE |
-  IB_ACCESS_REMOTE_WRITE |
-  IB_ACCESS_REMOTE_READ);
-
-   srp_dev->fmr_pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param);
-   if (!IS_ERR(srp_dev->fmr_pool))
-   break;
-   }
-
-   if (IS_ERR(srp_dev->fmr_pool))
-   srp_dev->fmr_pool = NULL;
+   srp_alloc_fmr_pool(srp_dev);
 
if (device->node_type == RDMA_NODE_IB_SWITCH) {
s = 0;
-- 
1.8.4.5



[PATCH 2/9] IB/srp: Introduce an additional local variable

2014-05-06 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cf80f7a..8c03371 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -290,6 +290,7 @@ static int srp_new_cm_id(struct srp_target_port *target)
 
 static int srp_create_target_ib(struct srp_target_port *target)
 {
+   struct srp_device *dev = target->srp_host->srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
@@ -299,16 +300,14 @@ static int srp_create_target_ib(struct srp_target_port *target)
if (!init_attr)
return -ENOMEM;
 
-   recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-  srp_recv_completion, NULL, target,
+   recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
   target->queue_size, target->comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
 
-   send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-  srp_send_completion, NULL, target,
+   send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
   target->queue_size, target->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
@@ -327,7 +326,7 @@ static int srp_create_target_ib(struct srp_target_port *target)
init_attr->send_cq = send_cq;
init_attr->recv_cq = recv_cq;
 
-   qp = ib_create_qp(target->srp_host->srp_dev->pd, init_attr);
+   qp = ib_create_qp(dev->pd, init_attr);
if (IS_ERR(qp)) {
ret = PTR_ERR(qp);
goto err_send_cq;
-- 
1.8.4.5



[PATCH 1/9] IB/srp: Fix kernel-doc warnings

2014-05-06 Thread Bart Van Assche
Avoid warnings from the kernel-doc tool about missing argument
descriptions in the ib_srp.[ch] source files.
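For reference, kernel-doc expects every parameter of a documented function to have a matching "@name:" line, which is exactly what the hunks below add. A minimal well-formed comment, on a hypothetical function rather than any real driver symbol:

```c
#include <assert.h>

/**
 * srp_example_op() - illustrate the kernel-doc comment format.
 * @base:  starting value (hypothetical parameter).
 * @delta: amount added to @base (hypothetical parameter).
 *
 * scripts/kernel-doc warns when a documented function lacks a
 * matching "@name:" line for one of its parameters; comments shaped
 * like this one pass through the tool without warnings.
 *
 * Return: the adjusted value.
 */
static int srp_example_op(int base, int delta)
{
	return base + delta;
}
```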

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Sagi Grimberg 
Cc: Vu Pham 
Cc: Sebastian Parschauer 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 66a908b..cf80f7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -813,6 +813,10 @@ static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
 
 /**
  * srp_free_req() - Unmap data and add request to the free request list.
+ * @target: SRP target port.
+ * @req: Request to be freed.
+ * @scmnd:  SCSI command associated with @req.
+ * @req_lim_delta: Amount to be added to @target->req_lim.
  */
 static void srp_free_req(struct srp_target_port *target,
 struct srp_request *req, struct scsi_cmnd *scmnd,
@@ -1455,6 +1459,7 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 
 /**
  * srp_tl_err_work() - handle a transport layer error
+ * @work: Work structure embedded in an SRP target port.
  *
  * Note: This function may get invoked before the rport has been created,
  * hence the target->rport test.
@@ -2310,6 +2315,8 @@ static struct class srp_class = {
 
 /**
  * srp_conn_unique() - check whether the connection to a target is unique
+ * @host:   SRP host.
+ * @target: SRP target port.
  */
 static bool srp_conn_unique(struct srp_host *host,
struct srp_target_port *target)
-- 
1.8.4.5



[PATCH 0/9] SRP initiator patches for kernel 3.16

2014-05-06 Thread Bart Van Assche
This patch series consists of one patch that adds fast registration
support to the SRP initiator and eight preparation patches:

0001-IB-srp-Fix-kernel-doc-warnings.patch
0002-IB-srp-Introduce-an-additional-local-variable.patch
0003-IB-srp-Introduce-srp_alloc_fmr_pool.patch
0004-IB-srp-Introduce-srp_map_fmr.patch
0005-IB-srp-Introduce-srp_finish_mapping.patch
0006-IB-srp-Make-srp_alloc_req_data-reallocate-request-da.patch
0007-IB-srp-Avoid-triggering-an-infinite-loop-if-memory-m.patch
0008-IB-srp-Rename-FMR-related-variables.patch
0009-IB-srp-Add-fast-registration-support.patch