Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 07/22/13 15:11, Sagi Grimberg wrote:
> So just to clarify the flow:
> . at connection establishment allocate pool of fastreg descriptors
> . upon each IOP take a fastreg descriptor from the pool
> . if it is not invalidated - invalidate it.
> . register using FRWR.
> . when cleanup_task is called - just return the fastreg descriptor to the pool.
> . at connection teardown free all resources.
>
> Still to come:
> . upon each IOP response, check if the target used remote invalidate - if so mark relevant fastreg as valid.

Hello Sagi and Or,

Thanks for the clarifications. I have one more question though. My interpretation of section 10.6 Memory Management in the IB specification is that memory registration maps a memory region that either has contiguous virtual addresses or contiguous physical addresses. However, there is no such requirement for an sg-list. As an example, for direct I/O to a block device with a sector size of 512 bytes it is only required that I/O occurs in multiples of 512 bytes and from memory aligned on 512-byte boundaries. So the use of direct I/O can result in an sg-list where the second and subsequent sg-list elements have a non-zero offset. Do you agree with this? Are such sg-lists mapped correctly by the FRWR code?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs
Bump bump bump.

I know this isn't a huge / important patch, but it is a small thing that does decrease the latency reported by these example programs.

On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

> If the send size is less than the cap.max_inline_data reported by the
> qp, use the IBV_SEND_INLINE flag. This not only shows the example of
> using ibv_query_qp(), it also reduces the latency time shown by the
> pingpong programs when the sends can be inlined.
>
> Signed-off-by: Jeff Squyres jsquy...@cisco.com
> ---
>  examples/rc_pingpong.c  | 18 +++++++++++++-----
>  examples/srq_pingpong.c | 19 +++++++++++++------
>  examples/uc_pingpong.c  | 17 ++++++++++++-----
>  examples/ud_pingpong.c  | 18 +++++++++++++-----
>  4 files changed, 51 insertions(+), 21 deletions(-)
>
> diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
> index 15494a1..a8637a5 100644
> --- a/examples/rc_pingpong.c
> +++ b/examples/rc_pingpong.c
> @@ -65,6 +65,7 @@ struct pingpong_context {
>  	struct ibv_qp		*qp;
>  	void			*buf;
>  	int			 size;
> +	int			 send_flags;
>  	int			 rx_depth;
>  	int			 pending;
>  	struct ibv_port_attr	 portinfo;
> @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  	if (!ctx)
>  		return NULL;
>
> -	ctx->size     = size;
> -	ctx->rx_depth = rx_depth;
> +	ctx->size       = size;
> +	ctx->send_flags = IBV_SEND_SIGNALED;
> +	ctx->rx_depth   = rx_depth;
>
>  	ctx->buf = memalign(page_size, size);
>  	if (!ctx->buf) {
> @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  	}
>
>  	{
> -		struct ibv_qp_init_attr attr = {
> +		struct ibv_qp_attr attr;
> +		struct ibv_qp_init_attr init_attr = {
>  			.send_cq = ctx->cq,
>  			.recv_cq = ctx->cq,
>  			.cap     = {
> @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  			.qp_type = IBV_QPT_RC
>  		};
>
> -		ctx->qp = ibv_create_qp(ctx->pd, &attr);
> +		ctx->qp = ibv_create_qp(ctx->pd, &init_attr);
>  		if (!ctx->qp)  {
>  			fprintf(stderr, "Couldn't create QP\n");
>  			goto clean_cq;
>  		}
> +
> +		ibv_query_qp(ctx->qp, &attr, IBV_QP_CAP, &init_attr);
> +		if (init_attr.cap.max_inline_data >= size) {
> +			ctx->send_flags |= IBV_SEND_INLINE;
> +		}
>  	}
>
>  	{
> @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
>  		.sg_list    = list,
>  		.num_sge    = 1,
>  		.opcode     = IBV_WR_SEND,
> -		.send_flags = IBV_SEND_SIGNALED,
> +		.send_flags = ctx->send_flags,
>  	};
>  	struct ibv_send_wr *bad_wr;
>
> diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
> index 6e00f8c..552a144 100644
> --- a/examples/srq_pingpong.c
> +++ b/examples/srq_pingpong.c
> @@ -68,6 +68,7 @@ struct pingpong_context {
>  	struct ibv_qp		*qp[MAX_QP];
>  	void			*buf;
>  	int			 size;
> +	int			 send_flags;
>  	int			 num_qp;
>  	int			 rx_depth;
>  	int			 pending[MAX_QP];
> @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  	if (!ctx)
>  		return NULL;
>
> -	ctx->size     = size;
> -	ctx->num_qp   = num_qp;
> -	ctx->rx_depth = rx_depth;
> +	ctx->size       = size;
> +	ctx->send_flags = IBV_SEND_SIGNALED;
> +	ctx->num_qp     = num_qp;
> +	ctx->rx_depth   = rx_depth;
>
>  	ctx->buf = memalign(page_size, size);
>  	if (!ctx->buf) {
> @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  	}
>
>  	for (i = 0; i < num_qp; ++i) {
> -		struct ibv_qp_init_attr attr = {
> +		struct ibv_qp_attr attr;
> +		struct ibv_qp_init_attr init_attr = {
>  			.send_cq = ctx->cq,
>  			.recv_cq = ctx->cq,
>  			.srq     = ctx->srq,
> @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
>  			.qp_type = IBV_QPT_RC
>  		};
>
> -		ctx->qp[i] = ibv_create_qp(ctx->pd, &attr);
> +		ctx->qp[i] = ibv_create_qp(ctx->pd, &init_attr);
>  		if (!ctx->qp[i])  {
>  			fprintf(stderr, "Couldn't create QP[%d]\n", i);
>  			goto clean_qps;
>  		}
> +		ibv_query_qp(ctx->qp[i], &attr, IBV_QP_CAP, &init_attr);
> +		if (init_attr.cap.max_inline_data >= size) {
> +			ctx->send_flags |=
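The decision the patch makes after ibv_query_qp() can be isolated into a small helper. The following is only a sketch under stated assumptions: the flag constants are stand-ins defined here so the snippet builds without libibverbs (real code would use IBV_SEND_SIGNALED and IBV_SEND_INLINE from <infiniband/verbs.h>), and pick_send_flags() is a hypothetical name, not a function in the example programs. It assumes a send can be inlined when its size does not exceed the reported max_inline_data.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; they are NOT the libibverbs
 * constants. Real code takes IBV_SEND_SIGNALED / IBV_SEND_INLINE from
 * <infiniband/verbs.h>. */
enum {
	SKETCH_SEND_SIGNALED = 1 << 0,
	SKETCH_SEND_INLINE   = 1 << 1
};

/* Hypothetical helper: choose send flags once, at QP setup time.
 * max_inline_data is the value the provider reports in cap.max_inline_data
 * after the QP has been created and queried. */
int pick_send_flags(uint32_t max_inline_data, int size)
{
	int flags = SKETCH_SEND_SIGNALED;

	assert(size >= 0);

	/* Inline only when the whole payload fits in the inline area. */
	if (max_inline_data >= (uint32_t)size)
		flags |= SKETCH_SEND_INLINE;

	return flags;
}
```

Computing the flags once and storing them in the context, as the patch does with ctx->send_flags, keeps the per-send path free of the comparison.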
Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
On Jul 18, 2013, at 12:50 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

>> We need it for UD for our upcoming device, however, because the MTU is the only way to get the max message size.
>
> .. and UD is the least abstracted transport, so existing apps won't support Jeff's new NIC anyhow, MTU is the least of their problems. Existing apps with existing transports see the same old values.

...so how do we move forward?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On Tue, Jul 23, 2013 at 2:58 PM, Bart Van Assche bvanass...@acm.org wrote:
[...]
> Hello Sagi and Or,
>
> Thanks for the clarifications. I have one more question though. My interpretation of section 10.6 Memory Management in the IB specification is that memory registration maps a memory region that either has contiguous virtual addresses or contiguous physical addresses. However, there is no such requirement for an sg-list. As an example, for direct I/O to a block device with a sector size of 512 bytes it is only required that I/O occurs in multiples of 512 bytes and from memory aligned on 512-byte boundaries. So the use of direct I/O can result in an sg-list where the second and subsequent sg-list elements have a non-zero offset. Do you agree with this ?

YES, this can happen.

> Are such sg-lists mapped correctly by the FRWR code ?

Bart, iSER's FMR and FRWR code works under the assumption that an SG list is 4K aligned. For SGs which don't obey that assumption we're using bounce buffer. Note that the SG page size used by FMRs/FRWRs doesn't have to be 1:1 with the OS page size, so in that respect down the road, we will get rid of the bounce buffer thing with having another FMR/FRWR pool whose page size is 512B and will be used for SG which are not 4K aligned.

Or.
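The 4K-alignment assumption described in this reply can be sketched as a check over the SG elements. This is an illustration only, not the iSER code: struct sg_elem is a hypothetical stand-in for the kernel's scatterlist, and the rule encoded here is an assumption, namely that every element except the first must start on a 4K boundary and every element except the last must end on one, so that the list forms one gap-free run of 4K pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SIZE_4K 4096u

/* Hypothetical stand-in for one mapped scatter/gather element. */
struct sg_elem {
	uint64_t addr;	/* DMA address of the element */
	uint32_t len;	/* length in bytes */
};

/* Returns true when the list can be described as a run of 4K pages:
 * only the first element may start mid-page and only the last element
 * may end mid-page. */
bool sg_list_is_4k_aligned(const struct sg_elem *sg, size_t n)
{
	assert(sg != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t start = sg[i].addr;
		uint64_t end = start + sg[i].len;

		if (i > 0 && (start % SKETCH_SIZE_4K) != 0)
			return false;
		if (i + 1 < n && (end % SKETCH_SIZE_4K) != 0)
			return false;
	}
	return true;
}

/* Hypothetical decision helper: lists that fail the check would be copied
 * into a contiguous bounce buffer before registration. */
bool needs_bounce_buffer(const struct sg_elem *sg, size_t n)
{
	return !sg_list_is_4k_aligned(sg, n);
}
```

A direct I/O list whose second element starts at a 512-byte offset inside a page fails this check, which is the case raised in the question above.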
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 7/23/2013 2:58 PM, Bart Van Assche wrote:
> On 07/22/13 15:11, Sagi Grimberg wrote:
>> So just to clarify the flow:
>> . at connection establishment allocate pool of fastreg descriptors
>> . upon each IOP take a fastreg descriptor from the pool
>> . if it is not invalidated - invalidate it.
>> . register using FRWR.
>> . when cleanup_task is called - just return the fastreg descriptor to the pool.
>> . at connection teardown free all resources.
>>
>> Still to come:
>> . upon each IOP response, check if the target used remote invalidate - if so mark relevant fastreg as valid.
>
> Hello Sagi and Or,
>
> Thanks for the clarifications. I have one more question though. My interpretation of section 10.6 Memory Management in the IB specification is that memory registration maps a memory region that either has contiguous virtual addresses or contiguous physical addresses. However, there is no such requirement for an sg-list. As an example, for direct I/O to a block device with a sector size of 512 bytes it is only required that I/O occurs in multiples of 512 bytes and from memory aligned on 512-byte boundaries. So the use of direct I/O can result in an sg-list where the second and subsequent sg-list elements have a non-zero offset. Do you agree with this ? Are such sg-lists mapped correctly by the FRWR code ?
>
> Bart.

Hey Bart,

You are on the money with this observation: like FMRs, FRWR cannot register an arbitrary SG-list. The same limitations apply. Unlike SRP, where the initiator uses multiple FMRs to register such unaligned SG-lists, iSER uses a bounce buffer to copy the data to a physically contiguous memory area (see the fall_to_bounce_buf routine in patch 5/7), and thus passes a single R_Key for each transaction. An equivalent FRWR implementation for SRP would also use multiple FRWRs in order to register such unaligned SG-lists and publish the R_Keys in ib_sge.

Hope this helps,
-Sagi
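The descriptor life cycle from the quoted flow (allocate a pool, take a descriptor per IOP, invalidate if needed, register, return on cleanup) can be modelled in a few lines. This is a sketch under assumptions, not the patch's code: all names (fastreg_pool, pool_get, desc_register and so on) are hypothetical, no work requests are posted, and "valid" is taken to mean that the descriptor carries no stale registration and can be registered without a local invalidate first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 4	/* hypothetical pool depth */

/* Hypothetical stand-in for a fast registration descriptor. */
struct fastreg_desc {
	bool valid;			/* true: no stale registration, ready to register */
	bool in_use;			/* true: currently owned by a task */
	unsigned int invalidations;	/* counts local invalidates, for illustration */
};

struct fastreg_pool {
	struct fastreg_desc descs[SKETCH_POOL_SIZE];
};

/* Connection establishment: every descriptor starts out free and valid. */
void pool_init(struct fastreg_pool *pool)
{
	assert(pool != NULL);
	for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
		pool->descs[i].valid = true;
		pool->descs[i].in_use = false;
		pool->descs[i].invalidations = 0;
	}
}

/* Per IOP: take the first free descriptor, or NULL if the pool is empty. */
struct fastreg_desc *pool_get(struct fastreg_pool *pool)
{
	assert(pool != NULL);
	for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
		if (!pool->descs[i].in_use) {
			pool->descs[i].in_use = true;
			return &pool->descs[i];
		}
	}
	return NULL;
}

/* Invalidate first if the descriptor still carries an old registration,
 * then register. Real code would post work requests at both points. */
void desc_register(struct fastreg_desc *d)
{
	assert(d != NULL && d->in_use);
	if (!d->valid) {
		d->invalidations++;	/* local invalidate would go here */
		d->valid = true;
	}
	/* fast registration would go here */
	d->valid = false;		/* now holds a live registration */
}

/* "Still to come" step: the target invalidated the key remotely. */
void desc_remote_invalidated(struct fastreg_desc *d)
{
	assert(d != NULL);
	d->valid = true;
}

/* cleanup_task: hand the descriptor back without invalidating it. */
void pool_put(struct fastreg_desc *d)
{
	assert(d != NULL);
	d->in_use = false;
}
```

Deferring the invalidate to the next use, rather than doing it in cleanup, is what lets a remote invalidate from the target save the local one entirely.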
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 07/23/13 16:21, Or Gerlitz wrote:
> Bart, iSER's FMR and FRWR code works under the assumption that an SG list is 4K aligned. For SGs which don't obey that assumption we're using bounce buffer. Note that the SG page size used by FMRs/FRWRs doesn't have to be 1:1 with the OS page size, so in that respect down the road, we will get rid of the bounce buffer thing with having another FMR/FRWR pool whose page size is 512B and will be used for SG which are not 4K aligned.

Sorry but I had overlooked the bounce buffer patch. Regarding page sizes: is an InfiniBand HCA required to support a page size of 512 bytes? To me it seems like the smallest page size supported by e.g. the ocrdma driver is 4KB. From ocrdma_query_device():

    attr->page_size_cap = 0xffff000;

Still regarding page sizes: shouldn't ib_alloc_fast_reg_page_list() and ib_alloc_fast_reg_mr() multiply the SG list length by PAGE_SIZE / SIZE_4K to compensate for page size differences on architectures where virtual memory pages are larger than 4KB?

Bart.
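Both page-size questions come down to a little arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that page_size_cap is a bitmask in which bit n set means a page size of 2^n bytes is supported; the mask value used in the usage note is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SIZE_4K 4096u

/* Assumption: bit n of page_size_cap set <=> 2^n-byte pages supported. */
bool page_size_supported(uint64_t page_size_cap, uint64_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (page_size_cap & page_size) != 0;
}

/* Number of 4K page-list entries needed to cover sg_entries OS pages
 * when the OS page size is a multiple of 4K. */
unsigned int fastreg_page_list_len(unsigned int sg_entries,
				   unsigned int os_page_size)
{
	assert(os_page_size >= SKETCH_SIZE_4K);
	assert(os_page_size % SKETCH_SIZE_4K == 0);
	return sg_entries * (os_page_size / SKETCH_SIZE_4K);
}
```

With a mask whose lowest set bit is bit 12, page_size_supported(mask, 512) is false and page_size_supported(mask, 4096) is true. On a 64KB-page architecture, fastreg_page_list_len(128, 65536) gives 2048 entries where a 4KB-page system needs only 128.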
[PATCH opensm] opensm/osm_db_pack.c: Removed uneeded asserts
From: Alex Netes ale...@mellanox.com

Out of range lids isn't a fatal event and SM code just ignores these lids.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_db_pack.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index 708a875..8cddd06 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -73,14 +73,18 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
 	if (!p_num)
 		return 1;
 	tmp = strtoul(p_num, NULL, 0);
-	CL_ASSERT(tmp < 0x10000);
+	if (tmp >= 0xC000)
+		return 1;
+
 	*p_min_lid = (uint16_t) tmp;

 	p_num = strtok_r(NULL, " \t", &p_next);
 	if (!p_num)
 		return 1;
 	tmp = strtoul(p_num, NULL, 0);
-	CL_ASSERT(tmp < 0x10000);
+	if (tmp >= 0xC000)
+		return 1;
+
 	*p_max_lid = (uint16_t) tmp;

 	return 0;
--
1.7.8.2
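The behaviour the patch aims for, rejecting an out-of-range LID with an error return instead of asserting, can be shown in a standalone form. This is a sketch, not the opensm function: parse_lid_range() and parse_one_lid() are hypothetical names, and the sketch tokenizes with strtoul()'s end pointer rather than strtok_r() so that it stays within standard C. The 0xC000 bound is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_LID_LIMIT 0xC000ul	/* bound used by the patch */

/* Parse one number starting at str; *end is set past the digits consumed.
 * Returns 0 on success, 1 if no number is present or it is out of range. */
static int parse_one_lid(const char *str, char **end, uint16_t *lid)
{
	unsigned long tmp = strtoul(str, end, 0);

	if (*end == str)		/* nothing was converted */
		return 1;
	if (tmp >= SKETCH_LID_LIMIT)	/* out of range: report, don't assert */
		return 1;

	*lid = (uint16_t)tmp;
	return 0;
}

/* Parse "<min_lid> <max_lid>" separated by blanks or tabs.
 * Returns 0 on success and 1 on any malformed or out-of-range value. */
int parse_lid_range(const char *lid_str, uint16_t *min_lid, uint16_t *max_lid)
{
	char *end;

	assert(lid_str != NULL && min_lid != NULL && max_lid != NULL);

	if (parse_one_lid(lid_str, &end, min_lid))
		return 1;
	if (parse_one_lid(end, &end, max_lid))
		return 1;
	return 0;
}
```

Returning an error lets the caller skip a bad record and continue, which matches the commit message's point that such LIDs are simply ignored.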
Re: infiniband build warning
On Mon, Jul 22, 2013 at 8:40 AM, Hefty, Sean sean.he...@intel.com wrote:
>> I am seeing build warnings in drivers/infiniband/core/cma.c starting with v3.11-rc1. These can be reproduced with gcc 4.6.3. Would you consider applying the following fix ?
>
> A patch to fix this was submitted to the linux-rdma list last week.

Hi Roland,

There is a nice set of patches (IPoIB virtualization fixes, mlx5 fixes, iser patches to cope with Connect-IB having no FMR, etc.) that were posted to linux-rdma shortly before and after 3.11-rc1 came out, and we are at rc2 now. I think we need to get them upstream around this time so that it's early enough in the cycle. Makes sense?

Or.
[ANNOUNCE] dapl-2.0.38
Rupert, please pull new dapl-2.0.38 released package into OFED 3.5.2

Thanks,
Arlin

--

Latest Packages (see ChangeLog for recent changes):

md5sum: 21b933fb24ed86d5c5413d9a269f913d dapl-2.0.38.tar.gz

For v2.0 package install RPM packages as follows:

dapl-2.0.38-1
dapl-utils-2.0.38-1
dapl-devel-2.0.38-1
dapl-debuginfo-2.0.38-1

Summary of v2.0 changes:

Release 2.0.38 fixes (OFED 3.5.2)

dapltest: add -n parameter to override default server port number (45278)
ucm,scm: UD mode creates many CR objects per EP that needs cleaned up
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM