Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
On 7/6/2015 2:22 AM, Steve Wise wrote:

The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try to glean what they need, have them pass in the intended roles and attributes for the MR to be allocated, and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities.

We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for an MR. These are documented in the enums themselves.

New services exported:

rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device.

rdma_get_dma_mr() - allocate a DMA MR using the application's intended MR roles and MR attributes. This uses rdma_device_access_flags().

rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the application's intended MR roles and MR attributes. This uses rdma_device_access_flags().
Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/core/verbs.c |  30 +++
 include/rdma/ib_verbs.h         | 106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+	int access_flags = attrs;
+
+	if (roles & RDMA_MRR_RECV)
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+	if (roles & RDMA_MRR_WRITE_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+	if (roles & RDMA_MRR_READ_DEST) {
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(pd->device,
+					rdma_start_port(pd->device)))
+			access_flags |= IB_ACCESS_REMOTE_WRITE;
+	}
+
+	if (roles & RDMA_MRR_READ_SOURCE)
+		access_flags |= IB_ACCESS_REMOTE_READ;
+
+	if (roles & RDMA_MRR_ATOMIC_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+	if (roles & RDMA_MRR_MW_BIND)
+		access_flags |= IB_ACCESS_MW_BIND;
+
+	return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 			     struct ib_phys_buf *phys_buf_array,
 			     int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

 /**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+	/* lkey used in a ib_recv_wr sge */
+	RDMA_MRR_RECV		= 1,
+
+	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+	RDMA_MRR_SEND		= (1 << 1),
+
+	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_READ_SOURCE	= (1 << 2),
+
+	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+	RDMA_MRR_READ_DEST	= (1 << 3),
+
+	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
+
+	/* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_WRITE_DEST	= (1 << 5),
+
+	/*
+	 * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
+	 */
+	RDMA_MRR_ATOMIC_SOURCE	= (1 << 6),
+
+	/*
+	 * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
+	 * wr.atomic.rkey
+	 */
+	RDMA_MRR_ATOMIC_DEST	= (1 << 7),
+
+	/* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
+	RDMA_MRR_MW_BIND	= (1 << 8),
+};
+
+/**
+ * rdma_mr_attributes - attributes for rdma memory regions
+ */
+enum {
+	RDMA_MRA_ZERO_BASED	  = IB_ZERO_BASED,
+	RDMA_MRA_ACCESS_ON_DEMAND = IB_ACCESS_ON_DEMAND,
+};
+
+/**
+ * rdma_device_access_flags - Returns the device-specific MR access flags.
+ * @pd: The protection domain associated with the memory region.
+ * @roles: The intended roles of the MR
+ * @attrs: The desired attributes of the MR
+ *
+ * Use the intended roles from @roles along with @attrs and the device
+ *
Re: [PATCH V5 4/5] RDMA/isert: Set REMOTE_WRITE on DMA MRs to support iWARP devices
On 7/5/2015 8:45 PM, Steve Wise wrote: iWARP devices require REMOTE_WRITE for MRs used as the destination of an RDMA READ. So if the device protocol is iWARP, then set REMOTE_WRITE when allocating the DMA MR. Signed-off-by: Steve Wise sw...@opengridcomputing.com Reviewed-by: Sagi Grimberg sa...@mellanox.com -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth
On 7/5/2015 8:44 PM, Steve Wise wrote:

Currently the sg tablesize, which dictates the fast register page list depth to use, does not take into account the limits of the rdma device. So adjust it once we discover the device fastreg max depth limit. Also adjust max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/ulp/iser/iscsi_iser.c | 9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6a594aa..de8730d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 					   SHOST_DIX_GUARD_CRC);
 	}

+	/*
+	 * Limit the sg_tablesize and max_sectors based on the device
+	 * max fastreg page list length.
+	 */
+	shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
+		ib_conn->device->dev_attr.max_fast_reg_page_list_len);
+	shost->max_sectors = min_t(unsigned int,
+		1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
+

The min statement is meaningless for max_sectors - you do a min between the default sg_tablesize and the frpl length, so the maximum sg_tablesize is 128, which is 1024 max_sectors.
Re: [PATCH V5 5/5] RDMA/isert: Limit read depth based on the device max_sge_rd capability
On 7/5/2015 8:45 PM, Steve Wise wrote: Use the device's max_sge_rd capability to compute the target's read sge depth. Save both the read and write max_sge values in the isert_conn struct, and use these when creating RDMA_WRITE/READ work requests. Signed-off-by: Steve Wise sw...@opengridcomputing.com Reviewed-by: Sagi Grimberg sa...@mellanox.com
Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found
On Jun 17, 2015, at 10:25 AM, Doug Ledford dledf...@redhat.com wrote: The patch is accepted, I just haven't pushed it out yet. Is there a timeline for when this patch will be available in the upstream git repo and released in a new version of libibverbs? I ask because we'd like to see this patch get into upstream distro libibverbs releases. Once that happens, we can start planning the end of the horrible hackarounds we had to put into place (e.g., in Open MPI) to suppress the misleading libibverbs output. Thanks! -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [PATCH for-next V6 3/5] IB/uverbs: Enable device removal when there are active user space applications
On 6/30/2015 9:40 PM, Jason Gunthorpe wrote:
On Tue, Jun 30, 2015 at 01:26:05PM +0300, Yishai Hadas wrote:

 struct ib_uverbs_device {
-	struct kref ref;
+	struct kref comp_ref;
+	struct kref free_ref;

So.. I was looking at this, and there is something wrong with the existing code. This old code:

	cdev_del(&uverbs_dev->cdev);
	[..]
	wait_for_completion(&uverbs_dev->comp);
-	kfree(uverbs_dev);

has built into it an assumption that when cdev_del returns there can be no possible open() running. Which doesn't appear to be true: cdev calls open unlocked and relies on refcounting to make everything work out.

The patch that introduces this bug was added 5 years ago by Alex Chiang and Signed-off-by: Roland Dreier. Look at commit ID 2a72f212263701b927559f6850446421d5906c41; it can be seen also at: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a72f212263701b

Before this commit there was a device look-up table, protected by a spin_lock, used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one, as dev might be freed just after:

	dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev)

but before the kref_get. In addition, this buggy patch added some dead code, as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result, the comment above ib_uverbs_open saying the open method will either immediately run or return -ENXIO is wrong, as the -ENXIO path can never happen.
 static int ib_uverbs_open(struct inode *inode, struct file *filp)
 {
@@ -631,13 +628,10 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	struct ib_uverbs_file *file;
 	int ret;

-	spin_lock(&map_lock);
-	dev = dev_table[iminor(inode) - IB_UVERBS_BASE_MINOR];
+	dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
 	if (dev)
 		kref_get(&dev->ref);
-	spin_unlock(&map_lock);
-
-	if (!dev)
+	else
 		return -ENXIO;

Doug/Jason, AFAIK V6 addressed all open comments raised by Jason, including the last one that asked to use 2 separate krefs for both complete and free; it didn't introduce the problem above. I believe that we should go forward and take the series. Please consider that this series fixes an existing oops in patch #1 and adds missing functionality in the kernel: enable device removal when there are active user space clients. To fix the existing 5-year-old bug, an orthogonal patch that fixes the buggy patch should be sent. Alex/Roland: please review the above. Is there any chance you'll contribute a patch that solves that problem? Any comment?
RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration
Hello Steve, how can I reproduce the original bug/issue and then test this fix on iw_cxgb4? Have iwpmd running and start an rdma application. Leave the rdma application running and restart iwpmd. When you stop the application, if you have debugging enabled, you will see that the remove mapping request from the client driver to the port mapper service isn't going through (without the fix, it isn't allowed, because the client is treated as unregistered). Tatyana -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise Sent: Monday, July 06, 2015 12:21 PM To: Nikolova, Tatyana E; 'Doug Ledford' Cc: Lacombe, John S; linux-rdma@vger.kernel.org Subject: RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration -Original Message- From: Tatyana Nikolova [mailto:tatyana.e.nikol...@intel.com] Sent: Thursday, July 02, 2015 12:46 PM To: Doug Ledford Cc: john.s.laco...@intel.com; sw...@opengridcomputing.com; linux-rdma@vger.kernel.org Subject: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration This patch has been reworked, but is very similar to the previously submitted (10/16/2014) patch, which was forgotten: [PATCH 1/1] RDMA/core: Fixes for port mapper client registration https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg21512.html This patch is addressing the same issue as the previous one - the clients which have provided their mapping info to the user space service after it is restarted are treated as unregistered and are not allowed to make remove mapping requests. If the user space port mapper service is restarted, it notifies the port mapper clients and they provide the mapping information they have in use. At this time, with the fix, the registration type of the client is set to IWPM_REG_INCOMPL and the client is allowed to make remove mapping requests to the user space service.
There are some improvements in the current patch:
1) Adding functions to set/check registration type
2) New clients (calling iwpm_init()) have their registration set to IWPM_REG_UNDEF
3) If the port mapper user space service is not available, then the client registration stays IWPM_REG_UNDEF and the registration type won't be checked until the service becomes available (obviously no mappings are possible if the user space service isn't running).

For these changes to take effect, no provider changes are necessary. The patch contains fixes for the current registration logic.

Hey Tatyana, how can I reproduce the original bug/issue and then test this fix on iw_cxgb4?
RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
-Original Message- From: Haggai Eran [mailto:hagg...@mellanox.com] Sent: Monday, July 06, 2015 2:09 AM To: Steve Wise; dledf...@redhat.com Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

On 06/07/2015 02:22, Steve Wise wrote:

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+	int access_flags = attrs;
+
+	if (roles & RDMA_MRR_RECV)
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+	if (roles & RDMA_MRR_WRITE_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+	if (roles & RDMA_MRR_READ_DEST) {
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(pd->device,
+					rdma_start_port(pd->device)))
+			access_flags |= IB_ACCESS_REMOTE_WRITE;
+	}
+
+	if (roles & RDMA_MRR_READ_SOURCE)
+		access_flags |= IB_ACCESS_REMOTE_READ;
+
+	if (roles & RDMA_MRR_ATOMIC_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;

I think you need LOCAL_WRITE for ATOMIC_SOURCE in order to receive the results of the atomic operation.

Ok. I'm not familiar with atomics. I'll verify with the IB spec and update the code as needed.

+
+	if (roles & RDMA_MRR_MW_BIND)
+		access_flags |= IB_ACCESS_MW_BIND;
+
+	return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
Re: [PATCH v7 1/4] IB/netlink: Add defines for local service requests through netlink
+/* Local service status attribute */
+enum {
+	LS_NLA_STATUS_SUCCESS = 0,
+	LS_NLA_STATUS_EINVAL,
+	LS_NLA_STATUS_ENODATA,
+	LS_NLA_STATUS_MAX
+};

So, this is never used; there seems to be a bunch of never-used stuff - please audit everything and get rid of the cruft before re-sending anything.

Well, EINVAL and ENODATA are not used by the kernel, but are used by the application (ibacm). Should this file (include/uapi/rdma/rdma_netlink.h) contain only defines used by both kernel and application? In that case, the kernel may see an unrecognized status value if it ever decides to check the status code when the response status is ERR.

Get rid of the status value completely, it serves no current purpose. If in future we need something we can always add a new attribute. Don't pollute the kernel headers with ibacm implementation details. We need a way to encode three reply types, I suggest:

RDMA_NL_LS_F_ERR - For some reason the listener could not respond. The kernel should issue the request on its own. Identical to a timeout.

Good flags, but an empty reply with no LS_NLA_TYPE_PATH_RECORDs - The listener responded and there are no paths. Return a no-paths failure to the requestor.

Good flags, and up to 6 LS_NLA_TYPE_PATH_RECORDs - Success.

The current implementation can be easily modified to handle these three cases. Are you going to use this scheme or do you have a different idea?

There are only two remaining problems I see with the actual netlink schema:

1) There is no easy indication what port the request is coming from. User space absolutely needs that to be able to forward a request on to the proper SA. Yes, we could look at the SGID, but with gid aliases that seems like a lot of work for a performant API. Ideas? Include the hardware port GUID? Port Number? Device Name? Not sure, but it does need to be addressed.

We can pass the device name and port number to the user application. The device and port_num are two of the parameters to ib_sa_path_rec_get().
That might be best; an ifindex would be even better, but we don't have one... Jason
Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found
On 07/06/2015 09:35 AM, Jeff Squyres (jsquyres) wrote: On Jun 17, 2015, at 10:25 AM, Doug Ledford dledf...@redhat.com wrote: The patch is accepted, I just haven’t pushed it out yet. Is there a timeline for when this patch will be available in the upstream git repo and released in a new version of libibverbs? I ask because we'd like to see this patch get into upstream distro libibverbs releases. Once that happens, we can start planning the end of the horrible hackarounds we had to put into place (e.g., in Open MPI) to suppress the misleading libibverbs output. Thanks! The goal I had in mind was to work through this merge window until we are in RCs on kernel stuff, then shift my focus to libibverbs and try to work through a few of the outstanding issues for it and maybe have a release prior to the next merge window for the kernel. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH for-next V6 3/5] IB/uverbs: Enable device removal when there are active user space applications
On Mon, Jul 06, 2015 at 05:08:08PM +0300, Yishai Hadas wrote:

The patch that introduces this bug was added 5 years ago by Alex Chiang and Signed-off-by: Roland Dreier. Look at commit ID 2a72f212263701b927559f6850446421d5906c41; it can be seen also at: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a72f212263701b

Perhaps, but this one also looks involved as well:

commit 055422ddbb0a7610c5f57a056743d7336a39e90f
Author: Alexander Chiang achi...@hp.com
Date: Tue Feb 2 19:07:49 2010 +

    IB/uverbs: Convert *cdev to cdev in struct ib_uverbs_device

    Instead of storing a pointer to a cdev, embed the entire struct cdev.

Embedding the cdev without using a parent kobject looks like the root mistake.

AFAIK V6 addressed all open comments raised by Jason, including the last one that asked to use 2 separate krefs for both complete and free; it didn't introduce the problem above.

It does make it worse, though: previously the module locking would make it unlikely to ever hit any problem here, but now we have a naked, fully exposed race where release races with kfree, resulting in a use-after-free. I'd think hitting it is quite likely if the new feature is being used, and subtle memory corruption is not something we want to see in the kernel. So, I'd say, yes, it is an old bug, but it was unlikely to be hit. This patch series makes it much more likely, so it needs to be fixed.

In any event, I'm not sure what you are complaining about - this series absolutely reworks the lifetime model of ib_uverbs_device, so why on earth do you think it is OK to have a buggy new implementation just because the previous version was buggy? *Especially* when someone takes the time to point out the mistake and tells you exactly how to fix it, and it is *trivial* to do?
Even worse: I went through and audited the lifetime of V6's new model, and I think that is *absolutely* something you should have done before sending V1 :( Jason
RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration
-Original Message- From: Tatyana Nikolova [mailto:tatyana.e.nikol...@intel.com] Sent: Thursday, July 02, 2015 12:46 PM To: Doug Ledford Cc: john.s.laco...@intel.com; sw...@opengridcomputing.com; linux-rdma@vger.kernel.org Subject: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration This patch has been reworked, but is very similar to the previously submitted (10/16/2014) patch, which was forgotten: [PATCH 1/1] RDMA/core: Fixes for port mapper client registration https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg21512.html This patch is addressing the same issue as the previous one - the clients which have provided their mapping info to the user space service after it is restarted, are treated as unregistered and are not allowed to make remove mapping requests. If the user space port mapper service is restarted, it notifies the port mapper clients and they provide the mapping information they have in use. At this time with the fix the registration type of the client is set to IWPM_REG_INCOMPL and the client is allowed to make remove mapping requests to the user space service. There are some improvements in the current patch: 1) Adding functions to set/check registration type 2) New clients (calling iwpm_init()) have their registration set to IWPM_REG_UNDEF 3) If the port mapper user space service is not available, then the client registration stays IWPM_REG_UNDEF and the registration type won't be checked until the service becomes available (obviously no mappings are possible, if the user space service isn't running). For these changes to take effect no provider changes are necessary. The patch contains fixes for the current registration logic. Hey Tatyana, how can I reproduce the original bug/issue and then test this fix on iw_cxgb4?
Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
On 7/6/2015 5:37 PM, Steve Wise wrote: -Original Message- From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il] Sent: Monday, July 06, 2015 2:54 AM To: Steve Wise; dledf...@redhat.com Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

On 7/6/2015 2:22 AM, Steve Wise wrote:

The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try to glean what they need, have them pass in the intended roles and attributes for the MR to be allocated, and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities. We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for an MR. These are documented in the enums themselves. New services exported: rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device. rdma_get_dma_mr() - allocate a DMA MR using the application's intended MR roles and MR attributes. This uses rdma_device_access_flags(). rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the application's intended MR roles and MR attributes. This uses rdma_device_access_flags().
Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/core/verbs.c |  30 +++
 include/rdma/ib_verbs.h         | 106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+	int access_flags = attrs;
+
+	if (roles & RDMA_MRR_RECV)
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+	if (roles & RDMA_MRR_WRITE_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+	if (roles & RDMA_MRR_READ_DEST) {
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(pd->device,
+					rdma_start_port(pd->device)))
+			access_flags |= IB_ACCESS_REMOTE_WRITE;
+	}
+
+	if (roles & RDMA_MRR_READ_SOURCE)
+		access_flags |= IB_ACCESS_REMOTE_READ;
+
+	if (roles & RDMA_MRR_ATOMIC_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+	if (roles & RDMA_MRR_MW_BIND)
+		access_flags |= IB_ACCESS_MW_BIND;
+
+	return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 			     struct ib_phys_buf *phys_buf_array,
 			     int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

 /**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+	/* lkey used in a ib_recv_wr sge */
+	RDMA_MRR_RECV		= 1,
+
+	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+	RDMA_MRR_SEND		= (1 << 1),
+
+	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_READ_SOURCE	= (1 << 2),
+
+	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+	RDMA_MRR_READ_DEST	= (1 << 3),
+
+	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
+
+	/* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_WRITE_DEST	= (1 << 5),
+
+	/*
+	 * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
+	 */
+	RDMA_MRR_ATOMIC_SOURCE	= (1 << 6),
+
+	/*
+	 * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
+	 * wr.atomic.rkey
+	 */
+	RDMA_MRR_ATOMIC_DEST	= (1 << 7),
+
+	/* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
+	RDMA_MRR_MW_BIND	= (1 << 8),
+};
+
+/**
+ *
RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg Sent: Monday, July 06, 2015 11:18 AM To: Steve Wise; dledf...@redhat.com Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target- de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags On 7/6/2015 5:37 PM, Steve Wise wrote: -Original Message- From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il] Sent: Monday, July 06, 2015 2:54 AM To: Steve Wise; dledf...@redhat.com Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target- de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags On 7/6/2015 2:22 AM, Steve Wise wrote: The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try and glean what they need, have them pass in the intended roles and attributes for the MR to be allocated and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities. We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for a MR. These are documented in the enums themselves. New services exported: rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device. rdma_get_dma_mr() - allocate a dma mr using the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). 
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c |   30 +++
 include/rdma/ib_verbs.h         |  106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+	int access_flags = attrs;
+
+	if (roles & RDMA_MRR_RECV)
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+	if (roles & RDMA_MRR_WRITE_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+	if (roles & RDMA_MRR_READ_DEST) {
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(pd->device,
+					rdma_start_port(pd->device)))
+			access_flags |= IB_ACCESS_REMOTE_WRITE;
+	}
+
+	if (roles & RDMA_MRR_READ_SOURCE)
+		access_flags |= IB_ACCESS_REMOTE_READ;
+
+	if (roles & RDMA_MRR_ATOMIC_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+	if (roles & RDMA_MRR_MW_BIND)
+		access_flags |= IB_ACCESS_MW_BIND;
+
+	return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 			     struct ib_phys_buf *phys_buf_array,
 			     int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

+/**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+	/* lkey used in a ib_recv_wr sge */
+	RDMA_MRR_RECV		= 1,
+
+	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+	RDMA_MRR_SEND		= (1 << 1),
+
+	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_READ_SOURCE	= (1 << 2),
+
+	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+	RDMA_MRR_READ_DEST	= (1 << 3),
+
+	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
+
+	/* rkey
RE: [PATCH v7 1/4] IB/netlink: Add defines for local service requests through netlink
From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
Sent: Friday, July 03, 2015 5:16 PM
To: Wan, Kaike
Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira
Subject: Re: [PATCH v7 1/4] IB/netlink: Add defines for local service requests through netlink

On Tue, Jun 30, 2015 at 09:45:52AM -0400, kaike@intel.com wrote:

 @@ -7,12 +7,14 @@ enum {
 	RDMA_NL_RDMA_CM = 1,
 	RDMA_NL_NES,
 	RDMA_NL_C4IW,
 +	RDMA_NL_SA,

I think this should be RDMA_NL_LS to be consistent with the rest; the SA resolve op should be something like:

 RDMA_NL_GET_TYPE(RDMA_NL_LS, RDMA_NL_LS_OP_RESOLVE_PATH)

I'd probably also add a comment:

 +	RDMA_NL_LS,	/* RDMA Local Services */

I have no idea what 'local services' means, it seems like a silly name for this, but whatever.

Changed.

 +/* Local service group opcodes */
 +enum {
 +	RDMA_NL_LS_OP_RESOLVE = 0,
 +	RDMA_NL_LS_OP_SET_TIMEOUT,
 +	RDMA_NL_LS_NUM_OPS
 +};

I think you should document the schema for each operation here in a comment for future readers.

Added.

 +/* Local service netlink message flags */
 +#define RDMA_NL_LS_F_OK		0x0100	/* Success response */
 +#define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */

These overlap with other generic netlink flags; that seems OK, but I didn't look too hard. Drop RDMA_NL_LS_F_OK, we don't need both OK and ERR: !ERR == OK.

RDMA_NL_LS_F_OK dropped.

You might need a RDMA_NL_LS_F_REPLY however, see below.

Added.

 +/* Local service attribute type */
 +#define RDMA_NLA_F_MANDATORY	(1 << 13)
 +#define RDMA_NLA_TYPE_MASK	~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
 +				  RDMA_NLA_F_MANDATORY)

More brackets for this macro:

 #define RDMA_NLA_TYPE_MASK	(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | RDMA_NLA_F_MANDATORY))

Added.

 +/* Local service status attribute */
 +enum {
 +	LS_NLA_STATUS_SUCCESS = 0,
 +	LS_NLA_STATUS_EINVAL,
 +	LS_NLA_STATUS_ENODATA,
 +	LS_NLA_STATUS_MAX
 +};

So, this is never used. There seems to be a bunch of never-used stuff - please audit everything and get rid of the cruft before re-sending anything.
Well, EINVAL and ENODATA are not used by the kernel, but are used by the application (ibacm). Should this file (include/uapi/rdma/rdma_netlink.h) contain only defines used by both kernel and application? In that case, the kernel may see an unrecognized status value if it ever decides to check the status code when the response status is ERR.

We need a way to encode three reply types. I suggest:

- RDMA_NL_LS_F_ERR: for some reason the listener could not respond. The kernel should issue the request on its own. Identical to a timeout.
- Good flags, but an empty reply with no LS_NLA_TYPE_PATH_RECORDs: the listener responded and there are no paths. Return a "no paths" failure to the requestor.
- Good flags, and up to 6 LS_NLA_TYPE_PATH_RECORDs: success.

The current implementation can be easily modified to handle these three cases.

 +struct rdma_nla_ls_status {
 +	__u32 status;
 +};

Never used, drop it.

OK.

 +/* Local service pathrecord attribute: struct ib_path_rec_data */
 +
 +/* Local service timeout attribute */
 +struct rdma_nla_ls_timeout {
 +	__u32 timeout;
 +};

I don't think the SET_TIMEOUT schema makes much sense; there is not much reason for a nested attribute, just use the rdma_nla_ls_timeout as the value. If we need extension then we can add optional attributes after it later without breaking.

OK.

 +/* Local Service ServiceID attribute */
 +struct rdma_nla_ls_service_id {
 +	__u64 service_id;
 +};

Drop the struct, just use a u64.

OK.

 +/* Local Service DGID/SGID attribute: big endian */
 +struct rdma_nla_ls_gid {
 +	__u8	gid[16];
 +};

Wish there was a common GID type we could use, but OK..

 +/* Local Service Traffic Class attribute */
 +struct rdma_nla_ls_tclass {
 +	__u8	tclass;
 +};

Drop.

OK.

 +/* Local service path use attribute */
 +enum {
 +	LS_NLA_PATH_USE_ALL = 0,
 +	LS_NLA_PATH_USE_UNIDIRECTIONAL,
 +	LS_NLA_PATH_USE_GMP,
 +	LS_NLA_PATH_USE_MAX
 +};

Document how these work.

Done.
 +
 +struct rdma_nla_ls_path_use {
 +	__u8	use;
 +};
 +
 +/* Local Service Pkey attribute */
 +struct rdma_nla_ls_pkey {
 +	__u16	pkey;
 +};
 +
 +/* Local Service Qos Class attribute */
 +struct rdma_nla_ls_qos_class {
 +	__u16	qos_class;
 +};

Drop all of these.

Done.

There are only two remaining problems I see with the actual netlink schema:

1) There is no easy indication what port the request is coming from. User space absolutely needs that to be able to forward a request on to the proper SA. Yes, we could look at the SGID, but with GID aliases that seems like a lot of work for a performant API. Ideas? Include the hardware port GUID? Port Number? Device
Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()
On Mon, Jun 29, 2015 at 05:52:18AM -0400, Andy Walls wrote: On June 29, 2015 2:55:05 AM EDT, Ingo Molnar mi...@kernel.org wrote: * Andy Walls a...@silverblocksystems.net wrote: On Fri, 2015-06-26 at 10:45 +0200, Ingo Molnar wrote: * Luis R. Rodriguez mcg...@suse.com wrote: On Thu, Jun 25, 2015 at 08:51:47AM +0200, Ingo Molnar wrote: * Luis R. Rodriguez mcg...@do-not-panic.com wrote: From: Luis R. Rodriguez mcg...@suse.com On built-in kernels this warning will always splat, as this is part of the module init. Fix that by shifting the PAT requirement check out under the code that does the quasi-probe for the device. This device driver relies on an existing driver to find its own devices; it looks for that device driver and its own found devices, then uses driver_for_each_device() to try to see if it can probe each of those devices as a framebuffer device with ivtvfb_init_card(). We tuck the PAT requirement check then on the ivtvfb_init_card() call, making the check at least require an ivtv device present before complaining. Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot] Signed-off-by: Luis R.
Rodriguez <mcg...@suse.com>
---
 drivers/media/pci/ivtv/ivtvfb.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 4cb365d..8b95eef 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -38,6 +38,8 @@
     Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/fb.h>
@@ -1171,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
 {
 	int rc;

+#ifdef CONFIG_X86_64
+	if (pat_enabled()) {
+		pr_warn("ivtvfb needs PAT disabled, boot with nopat kernel parameter\n");
+		return -ENODEV;
+	}
+#endif
+
 	if (itv->osd_info) {
 		IVTVFB_ERR("Card %d already initialised\n", ivtvfb_card_id);
 		return -EBUSY;

Same argument as for ipath: why not make arch_phys_wc_add() fail on PAT and return -1, and check it in arch_phys_wc_del()?

The arch_phys_wc_add() is a no-op for PAT systems, but for PAT to work we not only need to add this in where we replace the MTRR call, we also need to convert ioremap_nocache() calls to ioremap_wc() - but only if things were split up already.

Hi Ingo,

We don't need to do that: for such legacy drivers we can fall back to UC just fine, and inform the user that by booting with 'nopat' the old behavior will be back...

This is really a user experience decision. IMO anyone who is still using ivtvfb and an old conventional PCI PVR-350 to render, at SDTV resolution, an X Desktop display or video playback on a television screen, isn't going to give a hoot about modern things like PAT. The user will simply want the framebuffer performance they are accustomed to having with their system. UC will probably yield unsatisfactory performance for an ivtvfb framebuffer.
With that in mind, I would think it better to obviously and clearly disable the ivtvfb framebuffer module with PAT enabled, so the user will check the log and read the steps needed to obtain acceptable performance. Maybe that's me just wanting to head off the "poor ivtvfb performance with latest kernel" bug reports. Whatever the decision, my stock response to bug reports related to this will always be "What do the logs say?".

So what if that frame buffer is their only (working) frame buffer? A slow framebuffer is still much better at giving people logs to look at than a non-working one.

Thanks,
Ingo

Hi Ingo,

For an old DVR setup, I can see that being the case. So a sluggish framebuffer is better than none. OK, so I knew I had considered this strategy before; I did, and here's the recap: I originally had proposed this same exact thing with my first patch set, but [0] had set out to require a new __arch_phys_wc_add() which forces the setting without checking if pat_enabled() is true. This was set out as a transient API in hopes that device drivers that require work but hadn't been converted to split the ioremap'd calls would later be modified / fixed with the split. Here's an update for the status of each driver:

Driver: fusion
File: drivers/message/fusion/mptbase.c
This code was commented out; the code was just removed, so this no longer applies.

ivtv
Re: IBV_SEND_INLINE Behavior Differing from Spec
Roland, My libmlx4 is named libmlx4-rdmav2.so and timestamped 2014-01-01 (from the package libmlx4-1, 1.0.5-1ubuntu1 for Ubuntu Trusty 14.04). I use a mutex-protected FIFO of send buffers to make sure only one thread is able to acquire a particular send buffer at a time, and my application does not modify a send buffer after it passes it to my message-sending method. I've also replicated this with a single-threaded version of the application; something as simple as zeroing out the first byte of the send buffer on the line after ibv_post_send() is enough to occasionally trigger this behavior. Thanks, Christopher On Sun, Jul 5, 2015 at 12:28 PM, Roland Dreier rol...@purestorage.com wrote: On Fri, Jul 3, 2015 at 2:50 PM, Christopher Mitchell christop...@cemetech.net wrote: The manpage for ibv_post_send indicates that if you specify the IBV_SEND_INLINE flag, that you can immediately re-use the buffer after ibv_post_send() returns. However, I have found that if I modify the buffer immediately after ibv_post_send(), the changes are rarely reflected in the presumably already-posted message. I'm using ConnectX-3 cards and RC connections, if that's relevant. Is this a known issue, and/or is it possible I'm missing something obvious? I can't see any way to get the behavior you're seeing. What version of libmlx4 are you using? The reason I say I don't see how that can happen is that in mlx4_post_send() in libmlx4, if IBV_SEND_INLINE is set, the code memcpys the data into the descriptor and does not keep any reference to the pointers passed in through the sg list. So I don't see how the hardware could even detect that you change the data after ibv_post_send(). Or what do you mean about after ibv_post_send()? Are you possibly changing the data from another thread and racing with the memcpy before ibv_post_send() returns? - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 0/5] iSER support for iWARP
Hi Steve & Co,

On Sun, 2015-07-05 at 12:44 -0500, Steve Wise wrote: The following series implements support for iWARP transports in the iSER initiator and target. This is based on Doug's k.o/for-4.2 branch. I've tested this on cxgb4 and mlx4 hardware.

Changes since V4: iser: fixed compiler warning; isert: back to setting REMOTE_WRITE only for iWARP devices.

Changes since V3: Fixed commit messages based on feedback. iser: adjust max_sectors; isert: split into 2 patches; isert: always set REMOTE_WRITE for dma mrs.

Changes since V2: The transport-independent work is removed from this series and will be submitted in a subsequent series. This V3 series now enables iWARP using existing core services.

Changes since V1: Introduce and use transport-independent RDMA core services for allocating DMA MRs and computing fast register access flags. Correctly set the device max_sge_rd capability in several rdma device drivers. isert: use device capability max_sge_rd for the read sge depth. isert: change max_sge to max_write_sge in struct isert_conn.

---
Sagi Grimberg (1):
  mlx4, mlx5, mthca: Expose max_sge_rd correctly

Steve Wise (4):
  RDMA/isert: Limit read depth based on the device max_sge_rd capability
  RDMA/isert: Set REMOTE_WRITE on DMA MRs to support iWARP devices
  RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth
  ipath,qib: Expose max_sge_rd correctly

 drivers/infiniband/hw/ipath/ipath_verbs.c    |  1
 drivers/infiniband/hw/mlx4/main.c            |  1
 drivers/infiniband/hw/mlx5/main.c            |  1
 drivers/infiniband/hw/mthca/mthca_provider.c |  1
 drivers/infiniband/hw/qib/qib_verbs.c        |  1
 drivers/infiniband/ulp/iser/iscsi_iser.c     |  9
 drivers/infiniband/ulp/isert/ib_isert.c      | 53 ++
 drivers/infiniband/ulp/isert/ib_isert.h      |  3 +
 8 files changed, 60 insertions(+), 10 deletions(-)

Very excited to see iWARP support for iser-initiator + iser-target. ;) No objections to taking the iser-target changes through Doug's infiniband.git.
For the target pieces:

Acked-by: Nicholas Bellinger <n...@linux-iscsi.org>

Also Doug, since this series is sufficiently small, and allows a class of hardware that people are expecting to function with iser to 'just work', I'd recommend to go ahead and push this series for v4.2-rc code.

Thank you,
--nab
Re: [PATCH] rds: rds_ib_device.refcount overflow
On 24/06/2015 07:54, Wengang Wang wrote:

There lacks a dropping of rds_ib_device.refcount in case rds_ib_alloc_fmr failed (mr pool running out). This leads to a refcount overflow. A complaint at line 117 (see below) is seen. From the vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up, so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN).

 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 116 {
 117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
 118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
 119 		queue_work(rds_wq, &rds_ibdev->free_work);
 120 }

The fix is to drop the refcount when rds_ib_alloc_fmr fails.

Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	}

 	ibmr = rds_ib_alloc_fmr(rds_ibdev);
-	if (IS_ERR(ibmr))
+	if (IS_ERR(ibmr)) {
+		rds_ib_dev_put(rds_ibdev);
 		return ibmr;
+	}

 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)

It seems like the function indeed is missing a put on the rds_ibdev in that case.

Reviewed-by: Haggai Eran <hagg...@mellanox.com>

You may also want to add:
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
Re: [PATCH] rds: rds_ib_device.refcount overflow
Haggai,

Thanks for the review! I will add the message you suggested and re-post.

thanks,
wengang

On 2015-07-06 14:18, Haggai Eran wrote:
On 24/06/2015 07:54, Wengang Wang wrote:

There lacks a dropping of rds_ib_device.refcount in case rds_ib_alloc_fmr failed (mr pool running out). This leads to a refcount overflow. A complaint at line 117 (see below) is seen. From the vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up, so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN).

 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 116 {
 117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
 118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
 119 		queue_work(rds_wq, &rds_ibdev->free_work);
 120 }

The fix is to drop the refcount when rds_ib_alloc_fmr fails.

Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	}

 	ibmr = rds_ib_alloc_fmr(rds_ibdev);
-	if (IS_ERR(ibmr))
+	if (IS_ERR(ibmr)) {
+		rds_ib_dev_put(rds_ibdev);
 		return ibmr;
+	}

 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)

It seems like the function indeed is missing a put on the rds_ibdev in that case.

Reviewed-by: Haggai Eran <hagg...@mellanox.com>

You may also want to add:
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
[PATCH] rds: rds_ib_device.refcount overflow
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping of rds_ib_device.refcount in case rds_ib_alloc_fmr failed (mr pool running out). This leads to a refcount overflow. A complaint at line 117 (see below) is seen. From the vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up, so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN).

 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 116 {
 117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
 118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
 119 		queue_work(rds_wq, &rds_ibdev->free_work);
 120 }

The fix is to drop the refcount when rds_ib_alloc_fmr fails.

Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
Reviewed-by: Haggai Eran <hagg...@mellanox.com>
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	}

 	ibmr = rds_ib_alloc_fmr(rds_ibdev);
-	if (IS_ERR(ibmr))
+	if (IS_ERR(ibmr)) {
+		rds_ib_dev_put(rds_ibdev);
 		return ibmr;
+	}

 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)
--
2.1.0
Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
On 06/07/2015 02:22, Steve Wise wrote:

 +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
 +{
 +	int access_flags = attrs;
 +
 +	if (roles & RDMA_MRR_RECV)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +
 +	if (roles & RDMA_MRR_WRITE_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
 +
 +	if (roles & RDMA_MRR_READ_DEST) {
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +		if (rdma_protocol_iwarp(pd->device,
 +					rdma_start_port(pd->device)))
 +			access_flags |= IB_ACCESS_REMOTE_WRITE;
 +	}
 +
 +	if (roles & RDMA_MRR_READ_SOURCE)
 +		access_flags |= IB_ACCESS_REMOTE_READ;
 +
 +	if (roles & RDMA_MRR_ATOMIC_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;

I think you need LOCAL_WRITE for ATOMIC_SOURCE in order to receive the results of the atomic operation.

 +
 +	if (roles & RDMA_MRR_MW_BIND)
 +		access_flags |= IB_ACCESS_MW_BIND;
 +
 +	return access_flags;
 +}
 +EXPORT_SYMBOL(rdma_device_access_flags);
RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
-Original Message- From: Christoph Hellwig [mailto:h...@infradead.org] Sent: Monday, July 06, 2015 12:25 AM To: Steve Wise Cc: dledf...@redhat.com; sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags On Sun, Jul 05, 2015 at 06:22:00PM -0500, Steve Wise wrote: The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try and glean what they need, have them pass in the intended roles and attributes for the MR to be allocated and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities. We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for a MR. These are documented in the enums themselves. New services exported: rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device. rdma_get_dma_mr() - allocate a dma mr using the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). 
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c |   30 +++
 include/rdma/ib_verbs.h         |  106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);

 +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
 +{
 +	int access_flags = attrs;

Please add an assert for the values that are allowed for attrs. It also would be highly useful to add a kerneldoc comment describing the function and the parameters. Also __bitwise sparse tricks to ensure the right flags are passed wouldn't be a too bad idea.

Can you explain the __bitwise sparse tricks? Or point me to an example.

 +/**
 + * rdma_mr_attributes - attributes for rdma memory regions
 + */
 +enum {
 +	RDMA_MRA_ZERO_BASED = IB_ZERO_BASED,
 +	RDMA_MRA_ACCESS_ON_DEMAND = IB_ACCESS_ON_DEMAND,
 +};

Why not specify these as a separate numerical namespace?

To avoid having to translate them.

 +/**
 + * rdma_device_access_flags - Returns the device-specific MR access flags.
 + * @pd: The protection domain associated with the memory region.
 + * @roles: The intended roles of the MR
 + * @attrs: The desired attributes of the MR
 + *
 + * Use the intended roles from @roles along with @attrs and the device
 + * capabilities to generate the needed access rights.
 + *
 + * Return: the ib_access_flags value needed to allocate the MR.
 + */
 +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs);

Oh, here we have kerneldoc comments, just in the wrong place. Please move them to the function definition.

Ok.
RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
-Original Message- From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il] Sent: Monday, July 06, 2015 2:59 AM To: Steve Wise; dledf...@redhat.com Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target- de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags On 7/6/2015 2:22 AM, Steve Wise wrote: The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try and glean what they need, have them pass in the intended roles and attributes for the MR to be allocated and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities. We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for a MR. These are documented in the enums themselves. New services exported: rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device. rdma_get_dma_mr() - allocate a dma mr using the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). 
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c |   30 +++
 include/rdma/ib_verbs.h         |  106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+	int access_flags = attrs;
+
+	if (roles & RDMA_MRR_RECV)
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+	if (roles & RDMA_MRR_WRITE_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+	if (roles & RDMA_MRR_READ_DEST) {
+		access_flags |= IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(pd->device,
+					rdma_start_port(pd->device)))
+			access_flags |= IB_ACCESS_REMOTE_WRITE;
+	}
+
+	if (roles & RDMA_MRR_READ_SOURCE)
+		access_flags |= IB_ACCESS_REMOTE_READ;
+
+	if (roles & RDMA_MRR_ATOMIC_DEST)
+		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+	if (roles & RDMA_MRR_MW_BIND)
+		access_flags |= IB_ACCESS_MW_BIND;
+
+	return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 			     struct ib_phys_buf *phys_buf_array,
 			     int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

+/**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+	/* lkey used in a ib_recv_wr sge */
+	RDMA_MRR_RECV		= 1,
+
+	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+	RDMA_MRR_SEND		= (1 << 1),
+
+	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_READ_SOURCE	= (1 << 2),
+
+	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+	RDMA_MRR_READ_DEST	= (1 << 3),
+
+	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
+
+	/* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_WRITE_DEST	= (1 << 5),
+
+	/*
+	 * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
+	 */
+	RDMA_MRR_ATOMIC_SOURCE	= (1 << 6),
+
+	/*
+	 * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
+	 * wr.atomic.rkey
+	 */
+	RDMA_MRR_ATOMIC_DEST	= (1 << 7),
+
+	/* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
+	RDMA_MRR_MW_BIND	=
RE: [PATCH V3 0/5] Transport-independent MRs
-----Original Message-----
From: Christoph Hellwig [mailto:h...@infradead.org]
Sent: Monday, July 06, 2015 12:26 AM
To: Steve Wise
Cc: dledf...@redhat.com; sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org
Subject: Re: [PATCH V3 0/5] Transport-independent MRs

On Sun, Jul 05, 2015 at 06:21:53PM -0500, Steve Wise wrote:
> This series introduces transport-independent RDMA core services for
> allocating DMA MRs and computing fast register access flags. Included
> are changes to the iSER and NFSRDMA ULPs to make use of the new
> services.

Can you convert all of them and remove the old APIs?

I can. These are the only transport-independent ULPs at this point. But if we want to remove ib_get_dma_mr(), then I can do this.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
-----Original Message-----
From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
Sent: Monday, July 06, 2015 2:54 AM
To: Steve Wise; dledf...@redhat.com
Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; trond.mykleb...@primarydata.com; bfie...@fieldses.org
Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

On 7/6/2015 2:22 AM, Steve Wise wrote:
> The semantics for MR access flags are not consistent across RDMA
> protocols. So rather than have applications try to glean what they
> need, have them pass in the intended roles and attributes for the MR
> to be allocated, and let the RDMA core select the appropriate access
> flags given the roles, attributes, and device capabilities.
>
> We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
> possible roles and attributes for a MR. These are documented in the
> enums themselves.
>
> New services exported:
>
> rdma_device_access_flags() - given the intended MR roles and
> attributes passed in, return the ib_access_flags bits for the device.
>
> rdma_get_dma_mr() - allocate a dma mr using the application's intended
> MR roles and MR attributes. This uses rdma_device_access_flags().
>
> rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
> for a fast register WR given the application's intended MR roles and
> MR attributes. This uses rdma_device_access_flags().
RE: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth
-----Original Message-----
From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
Sent: Monday, July 06, 2015 2:51 AM
To: Steve Wise; dledf...@redhat.com
Cc: infinip...@intel.com; sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org
Subject: Re: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

On 7/5/2015 8:44 PM, Steve Wise wrote:
> Currently the sg tablesize, which dictates the fast register page list
> depth to use, does not take into account the limits of the rdma
> device. So adjust it once we discover the device fastreg max depth
> limit. Also adjust max_sectors based on the resulting sg tablesize.
>
> Signed-off-by: Steve Wise sw...@opengridcomputing.com
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c |    9 +
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 6a594aa..de8730d 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
>  					   SHOST_DIX_GUARD_CRC);
>  	}
>
> +	/*
> +	 * Limit the sg_tablesize and max_sectors based on the device
> +	 * max fastreg page list length.
> +	 */
> +	shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
> +		ib_conn->device->dev_attr.max_fast_reg_page_list_len);
> +	shost->max_sectors = min_t(unsigned int,
> +		1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
> +

The min statement is meaningless for max_sectors - you do a min between the default sg_tablesize and the frpl length - so the maximum sg_tablesize is 128, which yields 1024 max_sectors.

I'm not following. What if ib_conn->device->dev_attr.max_fast_reg_page_list_len is, say, 32? Then shost->sg_tablesize is set to 32, and max_sectors is set to (32 * 4K) >> 9 == 256 512B sectors.