RE: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread Steve Wise


 Steve
 
 I have not checked the code because I am away from my laptop
 But I assume mlx and mthca is using fmr and cxgb4 is using 
 rdma-read-with-invalidate
which
 has implicit fence.
 
 If invalidate is issued before read is completed its a problem.
 

You're correct.  And this bug appears to be in the current upstream code as 
well.  If an
IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it until the 
prior read
completes.

Good catch!  I'll post V4 soon.



 Regards
 Devesh
 
 From: Steve Wise [sw...@opengridcomputing.com]
 Sent: Friday, May 30, 2014 6:32 PM
 To: Devesh Sharma; bfie...@fieldses.org
 Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; 
 t...@opengridcomputing.com
 Subject: RE: [PATCH V3] svcrdma: refactor marshalling logic
 
 
  Hi Steve
 
  I am testing this patch. I have found that when server tries to initiate 
  RDMA-READ on
 ocrdma
  device the RDMA-READ posting fails because there is no FENCE bit set for
  Non-iwarp device which is using frmr. Because of this, whenever server 
  tries to
initiate
  RDMA_READ operation, it fails with completion error.
  This bug was there in v1 and v2 as well.
 
 
 Why would the FENCE bit not be required for mlx4, mthca, cxgb4, and yet be 
 required for
 ocrdma?
 
 
  Check inline for the exact location of the change.
 
  Rest is okay from my side, iozone is passing with this patch. Off-course 
  after putting
a FENCE
  indicator.
 
  -Regards
   Devesh
 
   -Original Message-
   From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
   ow...@vger.kernel.org] On Behalf Of Steve Wise
   Sent: Thursday, May 29, 2014 10:26 PM
   To: bfie...@fieldses.org
   Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org;
   t...@opengridcomputing.com
   Subject: [PATCH V3] svcrdma: refactor marshalling logic
  
   This patch refactors the NFSRDMA server marshalling logic to remove the
   intermediary map structures.  It also fixes an existing bug where the
   NFSRDMA server was not minding the device fast register page list length
   limitations.
  
   I've also made a git repo available with these patches on top of 3.15-rc7:
  
   git://git.linux-nfs.org/projects/swise/linux.git svcrdma-refactor-v3
  
   Changes since V2:
  
   - fixed logic bug in rdma_read_chunk_frmr() and rdma_read_chunk_lcl()
  
   - in rdma_read_chunks(), set the reader function pointer only once since
 it doesn't change
  
   - squashed the patch back into one patch since the previous split wasn't
 bisectable
  
   Changes since V1:
  
   - fixed regression for devices that don't support FRMRs (see
 rdma_read_chunk_lcl())
  
   - split patch up for closer review.  However I request it be squashed
 before merging as they is not bisectable, and I think these changes
 should all be a single commit anyway.
  
   Please review, and test if you can.  I'd like this to hit 3.16.
  
   Signed-off-by: Tom Tucker t...@opengridcomputing.com
   Signed-off-by: Steve Wise sw...@opengridcomputing.com
   ---
  
include/linux/sunrpc/svc_rdma.h  |3
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  643 +--
   ---
net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
4 files changed, 332 insertions(+), 606 deletions(-)
  
   diff --git a/include/linux/sunrpc/svc_rdma.h
   b/include/linux/sunrpc/svc_rdma.h index 0b8e3e6..5cf99a0 100644
   --- a/include/linux/sunrpc/svc_rdma.h
   +++ b/include/linux/sunrpc/svc_rdma.h
   @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
   struct list_head frmr_list;
};
struct svc_rdma_req_map {
   -   struct svc_rdma_fastreg_mr *frmr;
   unsigned long count;
   union {
   struct kvec sge[RPCSVC_MAXPAGES];
   struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
   +   unsigned long lkey[RPCSVC_MAXPAGES];
   };
};
   -#define RDMACTXT_F_FAST_UNREG  1
#define RDMACTXT_F_LAST_CTXT   2
  
#defineSVCRDMA_DEVCAP_FAST_REG 1   /*
   fast mr registration */
   diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
   b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
   index 8d904e4..52d9f2c 100644
   --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
   +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
   @@ -1,4 +1,5 @@
/*
   + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
 * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
 *
 * This software is available to you under a choice of one of two @@ 
   -69,7
   +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
  
   /* Set up the XDR head */
   rqstp-rq_arg.head[0].iov_base = page_address(page);
   -   rqstp-rq_arg.head[0].iov_len = min(byte_count, ctxt-
   sge[0].length);
   +   rqstp-rq_arg.head[0].iov_len =
   +   min_t(size_t, byte_count, ctxt-sge[0].length);
   rqstp

Re: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread J. Bruce Fields
On Mon, Jun 02, 2014 at 11:47:58AM -0500, Steve Wise wrote:
 
 
  Steve
  
  I have not checked the code because I am away from my laptop
  But I assume mlx and mthca is using fmr and cxgb4 is using 
  rdma-read-with-invalidate
 which
  has implicit fence.
  
  If invalidate is issued before read is completed its a problem.
  
 
 You're correct.  And this bug appears to be in the current upstream code as 
 well.  If an
 IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it until 
 the prior read
 completes.
 
 Good catch!  I'll post V4 soon.

Any chance that can be handled as a separate patch rather than folded
in?

(Disclaimer: I've been following the discussion only very
superficially.)

--b.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread Steve Wise
  You're correct.  And this bug appears to be in the current upstream code as 
  well.  If
an
  IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it until 
  the prior
 read
  completes.
 
  Good catch!  I'll post V4 soon.
 
 Any chance that can be handled as a separate patch rather than folded
 in?
 
 (Disclaimer: I've been following the discussion only very
 superficially.)
 

Sure.  I'll post the patch soon.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread 'J. Bruce Fields'
On Mon, Jun 02, 2014 at 11:52:47AM -0500, Steve Wise wrote:
   You're correct.  And this bug appears to be in the current upstream code 
   as well.  If
 an
   IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it 
   until the prior
  read
   completes.
  
   Good catch!  I'll post V4 soon.
  
  Any chance that can be handled as a separate patch rather than folded
  in?
  
  (Disclaimer: I've been following the discussion only very
  superficially.)
  
 
 Sure.  I'll post the patch soon.

Thanks, and, again, I'm not terribly happy about the monster
patch--anything you can split off it is great, even if that thing's
small.  As long as all the intermediate stages still build and run.

(And any bugs you've identified in upstream code are good candidates for
separate patches, hopefully preceding the rewrite.  That also allows us
to apply those fixes to stable kernels if appropriate.)

--b.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread Steve Wise

 On Mon, Jun 02, 2014 at 11:52:47AM -0500, Steve Wise wrote:
You're correct.  And this bug appears to be in the current upstream 
code as well.
If
  an
IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it 
until the
prior
   read
completes.
   
Good catch!  I'll post V4 soon.
  
   Any chance that can be handled as a separate patch rather than folded
   in?
  
   (Disclaimer: I've been following the discussion only very
   superficially.)
  
 
  Sure.  I'll post the patch soon.
 
 Thanks, and, again, I'm not terribly happy about the monster
 patch--anything you can split off it is great, even if that thing's
 small.  As long as all the intermediate stages still build and run.
 

I don't see any way to do this for this particular patch.  It rewrites the 
entire rdma
read logic.  

 (And any bugs you've identified in upstream code are good candidates for
 separate patches, hopefully preceding the rewrite.  That also allows us
 to apply those fixes to stable kernels if appropriate.)
 

If I do this, then I'd have to respin the refactor patch.  I really would like 
to get this
merged as-is (with the one change I'm sending soon), and move on.  I definitely 
will try
and keep the patches smaller and more discrete going forward.  

Will that work?


Steve.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] svcrdma: refactor marshalling logic

2014-06-02 Thread 'J. Bruce Fields'
On Mon, Jun 02, 2014 at 12:06:39PM -0500, Steve Wise wrote:
 
  On Mon, Jun 02, 2014 at 11:52:47AM -0500, Steve Wise wrote:
 You're correct.  And this bug appears to be in the current upstream 
 code as well.
 If
   an
 IB_WR_LOCAL_INV wr is used, it must include IB_SEND_FENCE to fence it 
 until the
 prior
read
 completes.

 Good catch!  I'll post V4 soon.
   
Any chance that can be handled as a separate patch rather than folded
in?
   
(Disclaimer: I've been following the discussion only very
superficially.)
   
  
   Sure.  I'll post the patch soon.
  
  Thanks, and, again, I'm not terribly happy about the monster
  patch--anything you can split off it is great, even if that thing's
  small.  As long as all the intermediate stages still build and run.
 
 I don't see any way to do this for this particular patch.  It rewrites the 
 entire rdma
 read logic.  

There's almost always a way.

  (And any bugs you've identified in upstream code are good candidates for
  separate patches, hopefully preceding the rewrite.  That also allows us
  to apply those fixes to stable kernels if appropriate.)
  
 
 If I do this, then I'd have to respin the refactor patch.  I really would 
 like to get this
 merged as-is (with the one change I'm sending soon), and move on.  I 
 definitely will try
 and keep the patches smaller and more discrete going forward.  
 
 Will that work?

Yes, just this once, we'll live.

--b.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V3] svcrdma: refactor marshalling logic

2014-05-30 Thread Steve Wise
 
 Hi Steve
 
 I am testing this patch. I have found that when server tries to initiate 
 RDMA-READ on ocrdma
 device the RDMA-READ posting fails because there is no FENCE bit set for
 Non-iwarp device which is using frmr. Because of this, whenever server tries 
 to initiate
 RDMA_READ operation, it fails with completion error.
 This bug was there in v1 and v2 as well.


Why would the FENCE bit not be required for mlx4, mthca, cxgb4, and yet be 
required for ocrdma?

 
 Check inline for the exact location of the change.
 
 Rest is okay from my side, iozone is passing with this patch. Off-course 
 after putting a FENCE
 indicator.
 
 -Regards
  Devesh
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Steve Wise
  Sent: Thursday, May 29, 2014 10:26 PM
  To: bfie...@fieldses.org
  Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org;
  t...@opengridcomputing.com
  Subject: [PATCH V3] svcrdma: refactor marshalling logic
 
  This patch refactors the NFSRDMA server marshalling logic to remove the
  intermediary map structures.  It also fixes an existing bug where the
  NFSRDMA server was not minding the device fast register page list length
  limitations.
 
  I've also made a git repo available with these patches on top of 3.15-rc7:
 
  git://git.linux-nfs.org/projects/swise/linux.git svcrdma-refactor-v3
 
  Changes since V2:
 
  - fixed logic bug in rdma_read_chunk_frmr() and rdma_read_chunk_lcl()
 
  - in rdma_read_chunks(), set the reader function pointer only once since
it doesn't change
 
  - squashed the patch back into one patch since the previous split wasn't
bisectable
 
  Changes since V1:
 
  - fixed regression for devices that don't support FRMRs (see
rdma_read_chunk_lcl())
 
  - split patch up for closer review.  However I request it be squashed
before merging as they is not bisectable, and I think these changes
should all be a single commit anyway.
 
  Please review, and test if you can.  I'd like this to hit 3.16.
 
  Signed-off-by: Tom Tucker t...@opengridcomputing.com
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
   include/linux/sunrpc/svc_rdma.h  |3
   net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  643 +--
  ---
   net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
   net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
   4 files changed, 332 insertions(+), 606 deletions(-)
 
  diff --git a/include/linux/sunrpc/svc_rdma.h
  b/include/linux/sunrpc/svc_rdma.h index 0b8e3e6..5cf99a0 100644
  --- a/include/linux/sunrpc/svc_rdma.h
  +++ b/include/linux/sunrpc/svc_rdma.h
  @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
  struct list_head frmr_list;
   };
   struct svc_rdma_req_map {
  -   struct svc_rdma_fastreg_mr *frmr;
  unsigned long count;
  union {
  struct kvec sge[RPCSVC_MAXPAGES];
  struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
  +   unsigned long lkey[RPCSVC_MAXPAGES];
  };
   };
  -#define RDMACTXT_F_FAST_UNREG  1
   #define RDMACTXT_F_LAST_CTXT   2
 
   #defineSVCRDMA_DEVCAP_FAST_REG 1   /*
  fast mr registration */
  diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  index 8d904e4..52d9f2c 100644
  --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  @@ -1,4 +1,5 @@
   /*
  + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
* Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two @@ -69,7
  +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
  /* Set up the XDR head */
  rqstp-rq_arg.head[0].iov_base = page_address(page);
  -   rqstp-rq_arg.head[0].iov_len = min(byte_count, ctxt-
  sge[0].length);
  +   rqstp-rq_arg.head[0].iov_len =
  +   min_t(size_t, byte_count, ctxt-sge[0].length);
  rqstp-rq_arg.len = byte_count;
  rqstp-rq_arg.buflen = byte_count;
 
  @@ -85,7 +87,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
  page = ctxt-pages[sge_no];
  put_page(rqstp-rq_pages[sge_no]);
  rqstp-rq_pages[sge_no] = page;
  -   bc -= min(bc, ctxt-sge[sge_no].length);
  +   bc -= min_t(u32, bc, ctxt-sge[sge_no].length);
  rqstp-rq_arg.buflen += ctxt-sge[sge_no].length;
  sge_no++;
  }
  @@ -113,291 +115,265 @@ static void rdma_build_arg_xdr(struct svc_rqst
  *rqstp,
  rqstp-rq_arg.tail[0].iov_len = 0;
   }
 
  -/* Encode a read-chunk-list as an array of IB SGE
  - *
  - * Assumptions:
  - * - chunk[0]-position points to pages[0] at an offset of 0
  - * - pages[] is not physically or virtually contiguous and consists of
  - *   PAGE_SIZE elements.
  - *
  - * Output:
  - * - sge array pointing into pages[] array

RE: [PATCH V3] svcrdma: refactor marshalling logic

2014-05-30 Thread Devesh Sharma

Steve

I have not checked the code because I am away from my laptop
But I assume mlx and mthca is using fmr and cxgb4 is using 
rdma-read-with-invalidate which has implicit fence.
 
If invalidate is issued before read is completed its a problem.

Regards
Devesh

From: Steve Wise [sw...@opengridcomputing.com]
Sent: Friday, May 30, 2014 6:32 PM
To: Devesh Sharma; bfie...@fieldses.org
Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; 
t...@opengridcomputing.com
Subject: RE: [PATCH V3] svcrdma: refactor marshalling logic


 Hi Steve

 I am testing this patch. I have found that when server tries to initiate 
 RDMA-READ on ocrdma
 device the RDMA-READ posting fails because there is no FENCE bit set for
 Non-iwarp device which is using frmr. Because of this, whenever server tries 
 to initiate
 RDMA_READ operation, it fails with completion error.
 This bug was there in v1 and v2 as well.


Why would the FENCE bit not be required for mlx4, mthca, cxgb4, and yet be 
required for ocrdma?


 Check inline for the exact location of the change.

 Rest is okay from my side, iozone is passing with this patch. Off-course 
 after putting a FENCE
 indicator.

 -Regards
  Devesh

  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Steve Wise
  Sent: Thursday, May 29, 2014 10:26 PM
  To: bfie...@fieldses.org
  Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org;
  t...@opengridcomputing.com
  Subject: [PATCH V3] svcrdma: refactor marshalling logic
 
  This patch refactors the NFSRDMA server marshalling logic to remove the
  intermediary map structures.  It also fixes an existing bug where the
  NFSRDMA server was not minding the device fast register page list length
  limitations.
 
  I've also made a git repo available with these patches on top of 3.15-rc7:
 
  git://git.linux-nfs.org/projects/swise/linux.git svcrdma-refactor-v3
 
  Changes since V2:
 
  - fixed logic bug in rdma_read_chunk_frmr() and rdma_read_chunk_lcl()
 
  - in rdma_read_chunks(), set the reader function pointer only once since
it doesn't change
 
  - squashed the patch back into one patch since the previous split wasn't
bisectable
 
  Changes since V1:
 
  - fixed regression for devices that don't support FRMRs (see
rdma_read_chunk_lcl())
 
  - split patch up for closer review.  However I request it be squashed
before merging as they is not bisectable, and I think these changes
should all be a single commit anyway.
 
  Please review, and test if you can.  I'd like this to hit 3.16.
 
  Signed-off-by: Tom Tucker t...@opengridcomputing.com
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
   include/linux/sunrpc/svc_rdma.h  |3
   net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  643 +--
  ---
   net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
   net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
   4 files changed, 332 insertions(+), 606 deletions(-)
 
  diff --git a/include/linux/sunrpc/svc_rdma.h
  b/include/linux/sunrpc/svc_rdma.h index 0b8e3e6..5cf99a0 100644
  --- a/include/linux/sunrpc/svc_rdma.h
  +++ b/include/linux/sunrpc/svc_rdma.h
  @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
  struct list_head frmr_list;
   };
   struct svc_rdma_req_map {
  -   struct svc_rdma_fastreg_mr *frmr;
  unsigned long count;
  union {
  struct kvec sge[RPCSVC_MAXPAGES];
  struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
  +   unsigned long lkey[RPCSVC_MAXPAGES];
  };
   };
  -#define RDMACTXT_F_FAST_UNREG  1
   #define RDMACTXT_F_LAST_CTXT   2
 
   #defineSVCRDMA_DEVCAP_FAST_REG 1   /*
  fast mr registration */
  diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  index 8d904e4..52d9f2c 100644
  --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
  @@ -1,4 +1,5 @@
   /*
  + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
* Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two @@ -69,7
  +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
  /* Set up the XDR head */
  rqstp-rq_arg.head[0].iov_base = page_address(page);
  -   rqstp-rq_arg.head[0].iov_len = min(byte_count, ctxt-
  sge[0].length);
  +   rqstp-rq_arg.head[0].iov_len =
  +   min_t(size_t, byte_count, ctxt-sge[0].length);
  rqstp-rq_arg.len = byte_count;
  rqstp-rq_arg.buflen = byte_count;
 
  @@ -85,7 +87,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
  page = ctxt-pages[sge_no];
  put_page(rqstp-rq_pages[sge_no]);
  rqstp-rq_pages[sge_no] = page;
  -   bc -= min(bc, ctxt-sge[sge_no].length);
  +   bc -= min_t(u32, bc, ctxt-sge

[PATCH V3] svcrdma: refactor marshalling logic

2014-05-29 Thread Steve Wise
This patch refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures.  It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.

I've also made a git repo available with these patches on top of 3.15-rc7:

git://git.linux-nfs.org/projects/swise/linux.git svcrdma-refactor-v3

Changes since V2:

- fixed logic bug in rdma_read_chunk_frmr() and rdma_read_chunk_lcl()

- in rdma_read_chunks(), set the reader function pointer only once since
  it doesn't change

- squashed the patch back into one patch since the previous split wasn't
  bisectable

Changes since V1:

- fixed regression for devices that don't support FRMRs (see
  rdma_read_chunk_lcl())

- split patch up for closer review.  However I request it be squashed
  before merging as they is not bisectable, and I think these changes
  should all be a single commit anyway.

Please review, and test if you can.  I'd like this to hit 3.16.

Signed-off-by: Tom Tucker t...@opengridcomputing.com
Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

 include/linux/sunrpc/svc_rdma.h  |3 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  643 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
 4 files changed, 332 insertions(+), 606 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 0b8e3e6..5cf99a0 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
struct list_head frmr_list;
 };
 struct svc_rdma_req_map {
-   struct svc_rdma_fastreg_mr *frmr;
unsigned long count;
union {
struct kvec sge[RPCSVC_MAXPAGES];
struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
+   unsigned long lkey[RPCSVC_MAXPAGES];
};
 };
-#define RDMACTXT_F_FAST_UNREG  1
 #define RDMACTXT_F_LAST_CTXT   2
 
 #defineSVCRDMA_DEVCAP_FAST_REG 1   /* fast mr registration 
*/
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 8d904e4..52d9f2c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -69,7 +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
/* Set up the XDR head */
rqstp-rq_arg.head[0].iov_base = page_address(page);
-   rqstp-rq_arg.head[0].iov_len = min(byte_count, ctxt-sge[0].length);
+   rqstp-rq_arg.head[0].iov_len =
+   min_t(size_t, byte_count, ctxt-sge[0].length);
rqstp-rq_arg.len = byte_count;
rqstp-rq_arg.buflen = byte_count;
 
@@ -85,7 +87,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
page = ctxt-pages[sge_no];
put_page(rqstp-rq_pages[sge_no]);
rqstp-rq_pages[sge_no] = page;
-   bc -= min(bc, ctxt-sge[sge_no].length);
+   bc -= min_t(u32, bc, ctxt-sge[sge_no].length);
rqstp-rq_arg.buflen += ctxt-sge[sge_no].length;
sge_no++;
}
@@ -113,291 +115,265 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
rqstp-rq_arg.tail[0].iov_len = 0;
 }
 
-/* Encode a read-chunk-list as an array of IB SGE
- *
- * Assumptions:
- * - chunk[0]-position points to pages[0] at an offset of 0
- * - pages[] is not physically or virtually contiguous and consists of
- *   PAGE_SIZE elements.
- *
- * Output:
- * - sge array pointing into pages[] array.
- * - chunk_sge array specifying sge index and count for each
- *   chunk in the read list
- *
- */
-static int map_read_chunks(struct svcxprt_rdma *xprt,
-  struct svc_rqst *rqstp,
-  struct svc_rdma_op_ctxt *head,
-  struct rpcrdma_msg *rmsgp,
-  struct svc_rdma_req_map *rpl_map,
-  struct svc_rdma_req_map *chl_map,
-  int ch_count,
-  int byte_count)
+static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-   int sge_no;
-   int sge_bytes;
-   int page_off;
-   int page_no;
-   int ch_bytes;
-   int ch_no;
-   struct rpcrdma_read_chunk *ch;
+   if (rdma_node_get_transport(xprt-sc_cm_id-device-node_type) ==
+RDMA_TRANSPORT_IWARP)
+   return 1;
+   else
+   return min_t(int, sge_count, xprt-sc_max_sge);
+}
 
-   sge_no = 0;
-   page_no = 0;
-   page_off = 0;
-   ch = (struct rpcrdma_read_chunk *)rmsgp-rm_body.rm_chunks[0];
-   ch_no = 0;
-  

Re: [PATCH V3] svcrdma: refactor marshalling logic

2014-05-29 Thread Chuck Lever
Test results:

On May 29, 2014, at 12:55 PM, Steve Wise sw...@opengridcomputing.com wrote:

 This patch refactors the NFSRDMA server marshalling logic to
 remove the intermediary map structures.  It also fixes an existing bug
 where the NFSRDMA server was not minding the device fast register page
 list length limitations.
 
 I've also made a git repo available with these patches on top of 3.15-rc7:
 
 git://git.linux-nfs.org/projects/swise/linux.git svcrdma-refactor-v3

Client:
v3.15-rc7 with my 24 patches applied.
ConnectX-2 adapter (mlx4)

Server:
stock Linux v3.15-rc7 with v3 refactoring patch applied
InfiniHost III adapter (mthca)



Connectathon tests: “./server -a -N30” with NFSv3 and NFSv4 against
Linux server. All passed. No hangs, stalls, or crashes.

Client tested in FRMR, FMR, and PHYSICAL memory registration modes.

Added “rsize=32768” mount option for PHYSICAL.


xfstests: “sudo ./check -nfs” with NFSv3 and NFSv4 against Linux
server. Test failure rate same as over TCP. No hangs, stalls, or
crashes.

Client ran in FRMR, FMR, and PHYSICAL memory registration modes.
PHYSICAL registration added “rsize=32768” mount option



 Changes since V2:
 
 - fixed logic bug in rdma_read_chunk_frmr() and rdma_read_chunk_lcl()
 
 - in rdma_read_chunks(), set the reader function pointer only once since
  it doesn't change
 
 - squashed the patch back into one patch since the previous split wasn't
  bisectable
 
 Changes since V1:
 
 - fixed regression for devices that don't support FRMRs (see
  rdma_read_chunk_lcl())
 
 - split patch up for closer review.  However I request it be squashed
  before merging as they is not bisectable, and I think these changes
  should all be a single commit anyway.
 
 Please review, and test if you can.  I'd like this to hit 3.16.
 
 Signed-off-by: Tom Tucker t...@opengridcomputing.com
 Signed-off-by: Steve Wise sw...@opengridcomputing.com
 ---
 
 include/linux/sunrpc/svc_rdma.h  |3 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  643 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c|  230 +--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
 4 files changed, 332 insertions(+), 606 deletions(-)
 
 diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
 index 0b8e3e6..5cf99a0 100644
 --- a/include/linux/sunrpc/svc_rdma.h
 +++ b/include/linux/sunrpc/svc_rdma.h
 @@ -115,14 +115,13 @@ struct svc_rdma_fastreg_mr {
   struct list_head frmr_list;
 };
 struct svc_rdma_req_map {
 - struct svc_rdma_fastreg_mr *frmr;
   unsigned long count;
   union {
   struct kvec sge[RPCSVC_MAXPAGES];
   struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
 + unsigned long lkey[RPCSVC_MAXPAGES];
   };
 };
 -#define RDMACTXT_F_FAST_UNREG1
 #define RDMACTXT_F_LAST_CTXT  2
 
 #define   SVCRDMA_DEVCAP_FAST_REG 1   /* fast mr registration 
 */
 diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
 b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 index 8d904e4..52d9f2c 100644
 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 @@ -1,4 +1,5 @@
 /*
 + * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
 @@ -69,7 +70,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
   /* Set up the XDR head */
   rqstp-rq_arg.head[0].iov_base = page_address(page);
 - rqstp-rq_arg.head[0].iov_len = min(byte_count, ctxt-sge[0].length);
 + rqstp-rq_arg.head[0].iov_len =
 + min_t(size_t, byte_count, ctxt-sge[0].length);
   rqstp-rq_arg.len = byte_count;
   rqstp-rq_arg.buflen = byte_count;
 
 @@ -85,7 +87,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
   page = ctxt-pages[sge_no];
   put_page(rqstp-rq_pages[sge_no]);
   rqstp-rq_pages[sge_no] = page;
 - bc -= min(bc, ctxt-sge[sge_no].length);
 + bc -= min_t(u32, bc, ctxt-sge[sge_no].length);
   rqstp-rq_arg.buflen += ctxt-sge[sge_no].length;
   sge_no++;
   }
 @@ -113,291 +115,265 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
   rqstp-rq_arg.tail[0].iov_len = 0;
 }
 
 -/* Encode a read-chunk-list as an array of IB SGE
 - *
 - * Assumptions:
 - * - chunk[0]-position points to pages[0] at an offset of 0
 - * - pages[] is not physically or virtually contiguous and consists of
 - *   PAGE_SIZE elements.
 - *
 - * Output:
 - * - sge array pointing into pages[] array.
 - * - chunk_sge array specifying sge index and count for each
 - *   chunk in the read list
 - *
 - */
 -static int map_read_chunks(struct svcxprt_rdma *xprt,
 -struct svc_rqst *rqstp,
 -struct svc_rdma_op_ctxt *head,
 -struct rpcrdma_msg *rmsgp,
 -