Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-23 Thread Bart Van Assche

On 07/22/13 15:11, Sagi Grimberg wrote:

So just to clarify the flow:
. at connection establishment allocate pool of fastreg descriptors
. upon each IOP take a fastreg descriptor from the pool
 . if it is not invalidated - invalidate it.
 . register using FRWR.
. when cleanup_task is called - just return the fastreg descriptor to
the pool.
. at connection teardown free all resources.
Still to come:
. upon each IOP response, check if the target used remote invalidate -
if so mark relevant fastreg as valid.
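The descriptor-pool flow above can be modelled in plain C. This is a user-space sketch under stated assumptions, not iSER code. `struct fastreg_desc`, `struct fastreg_pool`, `pool_get()` and `pool_put()` are hypothetical names. The points where the driver would post LOCAL_INV and fast-registration work requests are only counted or marked in comments.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an iSER fastreg descriptor (not the kernel struct). */
struct fastreg_desc {
	bool needs_inv;              /* a previous FRWR registration is still live */
	struct fastreg_desc *next;   /* free-list link */
};

struct fastreg_pool {
	struct fastreg_desc *descs;      /* backing array, allocated once */
	struct fastreg_desc *free_list;  /* descriptors available for new I/O */
	int local_invalidations;         /* LOCAL_INV work requests we would post */
};

/* Connection establishment: allocate every descriptor up front. */
static int pool_init(struct fastreg_pool *pool, int n)
{
	pool->descs = calloc((size_t)n, sizeof(*pool->descs));
	if (!pool->descs)
		return -1;
	pool->free_list = NULL;
	pool->local_invalidations = 0;
	for (int i = 0; i < n; i++) {
		pool->descs[i].next = pool->free_list;
		pool->free_list = &pool->descs[i];
	}
	return 0;
}

/* Per I/O: take a descriptor, invalidate it if needed, then register. */
static struct fastreg_desc *pool_get(struct fastreg_pool *pool)
{
	struct fastreg_desc *d = pool->free_list;

	if (!d)
		return NULL;                 /* pool exhausted */
	pool->free_list = d->next;
	if (d->needs_inv)
		pool->local_invalidations++; /* real code: post LOCAL_INV first */
	d->needs_inv = true;                 /* real code: post the FRWR here */
	return d;
}

/* cleanup_task: return the descriptor. remote_inv models the "still to
 * come" step where the target already invalidated the key remotely. */
static void pool_put(struct fastreg_pool *pool, struct fastreg_desc *d,
		     bool remote_inv)
{
	if (remote_inv)
		d->needs_inv = false;
	d->next = pool->free_list;
	pool->free_list = d;
}

/* Connection teardown: free all resources. */
static void pool_destroy(struct fastreg_pool *pool)
{
	free(pool->descs);
	pool->descs = NULL;
	pool->free_list = NULL;
}
```

Because the free list is LIFO, a descriptor returned by `pool_put()` is the next one handed out. That makes the "invalidate only if still live" rule easy to observe.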


Hello Sagi and Or,

Thanks for the clarifications. I have one more question though. My 
interpretation of section 10.6 "Memory Management" in the IB 
specification is that memory registration maps a memory region that 
either has contiguous virtual addresses or contiguous physical 
addresses. However, there is no such requirement for an sg-list. As an 
example, for direct I/O to a block device with a sector size of 512 
bytes it is only required that I/O occurs in multiples of 512 bytes and 
from memory aligned on 512-byte boundaries. So the use of direct I/O can 
result in an sg-list where the second and subsequent sg-list elements 
have a non-zero offset. Do you agree with this? Are such sg-lists 
mapped correctly by the FRWR code?


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-23 Thread Jeff Squyres (jsquyres)
Bump bump bump.

I know this isn't a huge / important patch, but it is a small thing that does 
decrease the latency reported by these example programs.
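The rule the quoted patch applies can be isolated into a small helper. In the patch, the inline capacity comes from `ibv_query_qp(..., IBV_QP_CAP, &init_attr)` and is read from `init_attr.cap.max_inline_data`. Here it is simply a parameter. `choose_send_flags()` and the `SKETCH_SEND_*` constants are hypothetical stand-ins for `IBV_SEND_SIGNALED` and `IBV_SEND_INLINE`, so the sketch compiles without `<infiniband/verbs.h>`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for IBV_SEND_SIGNALED / IBV_SEND_INLINE so this
 * sketch builds without libibverbs; use the real flags in real code. */
enum { SKETCH_SEND_SIGNALED = 1 << 1, SKETCH_SEND_INLINE = 1 << 3 };

/* Mirror of the patch's rule: always request a signaled send, and add the
 * inline flag when the QP reports enough inline capacity for the message. */
static int choose_send_flags(uint32_t max_inline_data, int size)
{
	int flags = SKETCH_SEND_SIGNALED;

	if (size >= 0 && max_inline_data >= (uint32_t)size)
		flags |= SKETCH_SEND_INLINE;
	return flags;
}
```

The flag is computed once after QP creation and stored in the context, as the patch does with `ctx->send_flags`, rather than being re-queried on every post.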


On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than or equal to the cap.max_inline_data
 reported by the qp, use the IBV_SEND_INLINE flag.  This not only shows
 an example of using ibv_query_qp(), it also reduces the latency shown
 by the pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void            *buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx->size = size;
 - ctx->rx_depth = rx_depth;
 + ctx->size       = size;
 + ctx->send_flags = IBV_SEND_SIGNALED;
 + ctx->rx_depth   = rx_depth;
 
   ctx->buf = memalign(page_size, size);
   if (!ctx->buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   }
 
   {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx->cq,
   .recv_cq = ctx->cq,
   .cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx->qp = ibv_create_qp(ctx->pd, &attr);
 + ctx->qp = ibv_create_qp(ctx->pd, &init_attr);
   if (!ctx->qp)  {
   fprintf(stderr, "Couldn't create QP\n");
   goto clean_cq;
   }
 +
 + ibv_query_qp(ctx->qp, &attr, IBV_QP_CAP, &init_attr);
 + if (init_attr.cap.max_inline_data >= size) {
 + ctx->send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   {
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
   .sg_list    = &list,
   .num_sge    = 1,
   .opcode     = IBV_WR_SEND,
 - .send_flags = IBV_SEND_SIGNALED,
 + .send_flags = ctx->send_flags,
   };
   struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void            *buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx->size     = size;
 - ctx->num_qp   = num_qp;
 - ctx->rx_depth = rx_depth;
 + ctx->size       = size;
 + ctx->send_flags = IBV_SEND_SIGNALED;
 + ctx->num_qp     = num_qp;
 + ctx->rx_depth   = rx_depth;
 
   ctx->buf = memalign(page_size, size);
   if (!ctx->buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   }
 
   for (i = 0; i < num_qp; ++i) {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx->cq,
   .recv_cq = ctx->cq,
   .srq = ctx->srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx->qp[i] = ibv_create_qp(ctx->pd, &attr);
 + ctx->qp[i] = ibv_create_qp(ctx->pd, &init_attr);
   if (!ctx->qp[i])  {
   fprintf(stderr, "Couldn't create QP[%d]\n", i);
   goto clean_qps;
   }
 + ibv_query_qp(ctx->qp[i], &attr, IBV_QP_CAP, &init_attr);
 + if (init_attr.cap.max_inline_data >= size) {
 + ctx->send_flags |= 

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-23 Thread Jeff Squyres (jsquyres)
On Jul 18, 2013, at 12:50 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 We need it for UD for our upcoming device, however, because the MTU
 is the only way to get the max message size.
 
 .. and UD is the least abstracted transport, so existing apps won't
 support Jeff's new NIC anyhow, MTU is the least of their problems.
 
 Existing apps with existing transports see the same old values.
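For context on Jason's point, libibverbs' `enum ibv_mtu` encodes only the five standard IB MTUs (`IBV_MTU_256` = 1 through `IBV_MTU_4096` = 5), and each code doubles the previous size. The hypothetical helper below decodes those codes into bytes and returns -1 for anything else. A device whose UD message size is not one of these five values has no code to report, which is the gap this thread is discussing. The sketch does not model what the proposed patch does with such values.

```c
/* Hypothetical decoder for the standard ibv_mtu encodings:
 * 1 -> 256, 2 -> 512, 3 -> 1024, 4 -> 2048, 5 -> 4096 bytes. */
static int mtu_code_to_bytes(int code)
{
	if (code < 1 || code > 5)
		return -1;          /* outside the standard encoding */
	return 256 << (code - 1);
}
```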


...so how do we move forward?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-23 Thread Or Gerlitz
On Tue, Jul 23, 2013 at 2:58 PM, Bart Van Assche bvanass...@acm.org wrote:

 [...]
 Hello Sagi and Or,

 Thanks for the clarifications. I have one more question though. My 
 interpretation of section 10.6 Memory Management in the IB specification is 
 that memory registration maps a memory region that either has contiguous 
 virtual addresses or contiguous physical addresses. However, there is no such 
 requirement for an sg-list. As an example, for direct I/O to a block device 
 with a sector size of 512 bytes it is only required that I/O occurs in 
 multiples of 512 bytes and from memory aligned on 512-byte boundaries. So the 
 use of direct I/O can result in an sg-list where the second and subsequent 
 sg-list elements have a non-zero offset. Do you agree with this ?



YES, this can happen.




 Are such sg-lists mapped correctly by the FRWR code ?



Bart, iSER's FMR and FRWR code works under the assumption that an SG
list is 4K aligned. For SGs which don't obey that assumption we're
using a bounce buffer.

Note that the SG page size used by FMRs/FRWRs doesn't have to be 1:1
with the OS page size, so in that respect, down the road we will
get rid of the bounce buffer by adding another FMR/FRWR pool
whose page size is 512B, to be used for SGs which are not 4K
aligned.
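Or's "4K aligned" condition can be expressed as a check on the SG-list. A single page list can describe the list only if every element except the first starts on a page boundary and every element except the last ends on one. This is a minimal sketch that assumes a simplified, hypothetical `struct sg_elem` rather than the kernel's `struct scatterlist`. Passing 512 as the page size models the 512B pool Or mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SG element: a DMA address and a length. */
struct sg_elem {
	uint64_t addr;
	uint32_t len;
};

/* True if one page list of page_size-byte pages (a power of two) can
 * describe the whole list: only the first element may start mid-page,
 * and only the last element may end mid-page. */
static bool sg_fits_one_mr(const struct sg_elem *sg, size_t n,
			   uint64_t page_size)
{
	uint64_t mask = page_size - 1;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && (sg[i].addr & mask))
			return false;   /* gap: element starts mid-page */
		if (i + 1 < n && ((sg[i].addr + sg[i].len) & mask))
			return false;   /* gap: element ends mid-page */
	}
	return true;
}
```

For Bart's direct I/O example, a 4 KB element followed by a 512-byte element at offset 0x200 fails the check with 4096-byte pages but passes with 512-byte pages.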

Or.


Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-23 Thread Sagi Grimberg

On 7/23/2013 2:58 PM, Bart Van Assche wrote:

On 07/22/13 15:11, Sagi Grimberg wrote:

So just to clarify the flow:
. at connection establishment allocate pool of fastreg descriptors
. upon each IOP take a fastreg descriptor from the pool
 . if it is not invalidated - invalidate it.
 . register using FRWR.
. when cleanup_task is called - just return the fastreg descriptor to
the pool.
. at connection teardown free all resources.
Still to come:
. upon each IOP response, check if the target used remote invalidate -
if so mark relevant fastreg as valid.


Hello Sagi and Or,

Thanks for the clarifications. I have one more question though. My 
interpretation of section 10.6 Memory Management in the IB 
specification is that memory registration maps a memory region that 
either has contiguous virtual addresses or contiguous physical 
addresses. However, there is no such requirement for an sg-list. As an 
example, for direct I/O to a block device with a sector size of 512 
bytes it is only required that I/O occurs in multiples of 512 bytes 
and from memory aligned on 512-byte boundaries. So the use of direct 
I/O can result in an sg-list where the second and subsequent sg-list 
elements have a non-zero offset. Do you agree with this ? Are such 
sg-lists mapped correctly by the FRWR code ?


Bart.



Hey Bart,

You are on the money with this observation: like FMRs, FRWR cannot 
register an arbitrary SG-list; the same limitations apply.
Unlike SRP, where the initiator uses multiple FMRs to register such 
unaligned SG-lists, iSER uses a bounce buffer to copy the data to a 
physically contiguous memory area (see the fall_to_bounce_buf routine 
in patch 5/7), and thus passes a single R_Key for each transaction.
An equivalent FRWR implementation for SRP would also use multiple FRWRs 
in order to register such unaligned SG-lists and publish the R_Keys 
in ib_sge.
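The SRP-style alternative described above splits the list wherever two neighbouring elements cannot share one page list. Each run is then registered separately and published with its own R_Key. The hypothetical sketch below only counts the runs. It assumes a simplified `struct sg_elem` and ignores any per-MR page-list length limit a real HCA imposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SG element: a DMA address and a length. */
struct sg_elem {
	uint64_t addr;
	uint32_t len;
};

/* Number of separate registrations needed if a new MR is started whenever
 * the previous element ends mid-page or the next one starts mid-page. */
static size_t sg_count_mrs(const struct sg_elem *sg, size_t n,
			   uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	size_t mrs = n ? 1 : 0;

	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = sg[i - 1].addr + sg[i - 1].len;

		if ((prev_end & mask) || (sg[i].addr & mask))
			mrs++;          /* boundary breaks the page list */
	}
	return mrs;
}
```

The bounce-buffer approach trades these extra registrations (and R_Keys) for a data copy into one contiguous buffer, which needs exactly one registration.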


Hope this helps,

-Sagi


Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-23 Thread Bart Van Assche

On 07/23/13 16:21, Or Gerlitz wrote:

Bart, iSER's FMR and FRWR code works under the assumption that an SG
list is 4K aligned. For SGs which don't obey that assumption we're
using bounce buffer.

Note that the SG page size used by FMRs/FRWRs doesn't have to be 1:1
with the OS page size, so in that respect  down the road, we will
get rid of the bounce buffer thing with having another FMR/FRWR pool
whose page size is 512B and will be used for SG which are not 4K
aligned.


Sorry, but I had overlooked the bounce buffer patch. Regarding page 
sizes: is an InfiniBand HCA required to support a page size of 512 
bytes? To me it seems like the smallest page size supported by e.g. the 
ocrdma driver is 4KB. From ocrdma_query_device():


attr->page_size_cap = 0x000;
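`page_size_cap` in the device attributes is a bitmask of supported page sizes: bit n set means 2^n-byte pages are supported. The smallest supported page size is therefore the lowest set bit. The helpers below are hypothetical and use illustrative mask values, not the ocrdma constant quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest supported page size: isolate the lowest set bit of the mask
 * (returns 0 if no page size is advertised). */
static uint64_t smallest_page_size(uint64_t page_size_cap)
{
	return page_size_cap & (~page_size_cap + 1);
}

/* size must be a power of two, e.g. 512 or 4096. */
static bool supports_page_size(uint64_t page_size_cap, uint64_t size)
{
	return (page_size_cap & size) != 0;
}
```

A mask such as 0xFFFFF000 advertises 4 KB and larger pages only. A 512-byte FRWR pool would need bit 9 (0x200) to be set.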

Still regarding page sizes: shouldn't ib_alloc_fast_reg_page_list() and 
ib_alloc_fast_reg_mr() multiply the SG list length by PAGE_SIZE / 
SIZE_4K to compensate for page size differences on architectures where 
virtual memory pages are larger than 4KB?
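Here is the suggested scaling worked through as arithmetic. If the fast-registration page list is built from 4 KB entries, each OS page of PAGE_SIZE bytes needs PAGE_SIZE / 4096 entries. On an architecture with 64 KB pages that factor is 16, so an SG-list of 16 OS pages would need 256 entries rather than 16. The helper and `SKETCH_SIZE_4K` are hypothetical. The sketch assumes PAGE_SIZE is a multiple of 4 KB.

```c
#include <stddef.h>

#define SKETCH_SIZE_4K 4096u   /* assumed HCA page size used for registration */

/* Page-list entries needed for sg_pages OS pages of os_page_size bytes,
 * i.e. length * (PAGE_SIZE / SIZE_4K). */
static size_t fastreg_entries_needed(size_t sg_pages, size_t os_page_size)
{
	return sg_pages * (os_page_size / SKETCH_SIZE_4K);
}
```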


Bart.


[PATCH opensm] opensm/osm_db_pack.c: Removed unneeded asserts

2013-07-23 Thread Hal Rosenstock

From: Alex Netes ale...@mellanox.com

Out-of-range LIDs aren't a fatal event, and the SM code just ignores
these LIDs.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_db_pack.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index 708a875..8cddd06 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -73,14 +73,18 @@ static inline int unpack_lids(IN char *p_lid_str, OUT 
uint16_t * p_min_lid,
if (!p_num)
return 1;
tmp = strtoul(p_num, NULL, 0);
-   CL_ASSERT(tmp  0x1);
+   if (tmp >= 0xC000)
+   return 1;
+
*p_min_lid = (uint16_t) tmp;
 
p_num = strtok_r(NULL, " \t", &p_next);
if (!p_num)
return 1;
tmp = strtoul(p_num, NULL, 0);
-   CL_ASSERT(tmp  0x1);
+   if (tmp >= 0xC000)
+   return 1;
+
*p_max_lid = (uint16_t) tmp;
 
return 0;
-- 
1.7.8.2
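The patched check can be exercised outside opensm with a self-contained sketch. `unpack_lids_sketch()` is a hypothetical re-creation of the hunk above, not the opensm source. It mirrors the `strtok_r()`/`strtoul()` parsing and returns 1 for any value at or above 0xC000, where the multicast LID range starts, instead of asserting.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse "min max" from p_lid_str. Returns 0 on success, 1 if a token is
 * missing or a value is >= 0xC000 (ignored, not fatal). */
static int unpack_lids_sketch(const char *p_lid_str, uint16_t *p_min_lid,
			      uint16_t *p_max_lid)
{
	char lids_str[24];      /* buffer size is an assumption for this sketch */
	char *p_next;
	char *p_num;
	unsigned long tmp;

	strncpy(lids_str, p_lid_str, sizeof(lids_str) - 1);
	lids_str[sizeof(lids_str) - 1] = '\0';

	p_num = strtok_r(lids_str, " \t", &p_next);
	if (!p_num)
		return 1;
	tmp = strtoul(p_num, NULL, 0);
	if (tmp >= 0xC000)
		return 1;       /* out of range: reject instead of asserting */
	*p_min_lid = (uint16_t)tmp;

	p_num = strtok_r(NULL, " \t", &p_next);
	if (!p_num)
		return 1;
	tmp = strtoul(p_num, NULL, 0);
	if (tmp >= 0xC000)
		return 1;
	*p_max_lid = (uint16_t)tmp;
	return 0;
}
```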



Re: infiniband build warning

2013-07-23 Thread Or Gerlitz
On Mon, Jul 22, 2013 at 8:40 AM, Hefty, Sean sean.he...@intel.com wrote:
 I am seeing build warnings in drivers/infiniband/core/cma.c starting with
 v3.11-rc1. These can be reproduced with gcc 4.6.3.
 Would you consider applying the following fix ?

 A patch to fix this was submitted to the linux-rdma list last week.

Hi Roland,

There is a nice set of patches (IPoIB virtualization fixes, mlx5
fixes, iSER patches to cope with Connect-IB's lack of FMR support, etc.)
posted to linux-rdma a little before and after 3.11-rc1 was out, and we
are at rc2 now. I think we need to get them upstream around this time
so that it's early enough in the cycle. Makes sense?

Or.


[ANNOUNCE] dapl-2.0.38

2013-07-23 Thread Davis, Arlin R
Rupert, please pull the newly released dapl-2.0.38 package into OFED 3.5.2.

Thanks, Arlin
--

Latest Packages (see ChangeLog for recent changes):

md5sum: 21b933fb24ed86d5c5413d9a269f913d dapl-2.0.38.tar.gz 

For v2.0 package install RPM packages as follows: 

dapl-2.0.38-1 
dapl-utils-2.0.38-1 
dapl-devel-2.0.38-1 
dapl-debuginfo-2.0.38-1 

Summary of v2.0 changes: 

Release 2.0.38 fixes (OFED 3.5.2) 

dapltest: add -n parameter to override default server port number (45278) 
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up 
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM 

