Multicast Filtering - mlx4_SET_MCAST_FLTR
Hi,

I was going through the mlx4 code and noticed that mlx4_SET_MCAST_FLTR calls mlx4_SET_MCAST_FLTR_wrapper, which in turn has an empty body. So I was wondering whether the multicast filtering functionality is disabled. Is QP_ATTACH the replacement for it? I couldn't work it out, so I wanted your help on this...

Thanks
Best Regards,
Bob

-- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)
On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

I don't think you understand how deep the problem Or is describing goes. [...Appropriate and correct critique...]

This thread has made me realize that even as I am able to carve out more time to work on things like IB maintainership, I no longer have the desire to spend my time maintaining the IB subsystem. Since my current level of activity is clearly hurting the community, I've decided to step down as maintainer.

I am sad to see you leave. Without you InfiniBand would have never been accepted upstream at all.

Thank you for all the hard work,
Ira
Re: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)
On Thu, 2015-04-09 at 10:41 -0700, Roland Dreier wrote:

On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

I don't think you understand how deep the problem Or is describing goes. [...Appropriate and correct critique...]

This thread has made me realize that even as I am able to carve out more time to work on things like IB maintainership, I no longer have the desire to spend my time maintaining the IB subsystem. Since my current level of activity is clearly hurting the community, I've decided to step down as maintainer.

Thank you for the long, thankless years of service. I wish you the best in whatever endeavors you choose to focus on!

-- 
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)
This thread has made me realize that even as I am able to carve out more time to work on things like IB maintainership, I no longer have the desire to spend my time maintaining the IB subsystem. Since my current level of activity is clearly hurting the community, I've decided to step down as maintainer.

Thank you for all of your work over the years. I know that your technical skills and opinion have been widely appreciated and respected by the entire community.

I concur, and I'd like to thank you for the personal time that you dedicated to this project.
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:

These are exactly the tests I proposed Jason. I'm not sure I see your point here. I guess my point is that although the scenario of all the different items seems complex, it really does boil down to needing only exactly what I proposed earlier to fulfill the entire test matrix.

I have no problem with minimizing a bitmap, but I want the accessors to make sense first. My specific problem with your suggestion was combining cap_ib_mad, cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt. Not only do the three cap things not return the same value for all situations, the documentary knowledge is lost by the reduction.

I'd prefer we look at this from a 'what do the call sites need' view, not a 'how do we minimize' view.

I've written this before: The mess here is that it is too hard to know what the call sites are actually checking for when it is some baroque conditional.

Jason
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On Thu, Apr 09, 2015 at 02:42:24PM +0200, Michael Wang wrote:

On 04/08/2015 10:10 PM, Jason Gunthorpe wrote: [snip]

Some of the other checks in this file revolve around pkey, I'm not sure what rocee does there? cap_pkey_supported ?

I'm not sure if this counts as a capability... how shall we describe it?

I'm not sure how rocee uses pkey, but maybe the GRH and pkey thing would work well together under a single 'cap_ethernet_ah' ?

Jason
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On 04/08/2015 08:29 PM, Doug Ledford wrote:

On Tue, 2015-04-07 at 14:42 +0200, Michael Wang wrote:

Add new callback query_transport() and implement for each HW.

My response here is going to be a long email, but that's because it's easier to respond to the various patches all in one response in order to preserve context. So, while I'm responding to patch 1 of 17, my response will cover all 17 patches in whole.

Thanks for the review :-)

Mapping List: [snip]

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..a9587c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
 	} mandatory_table[] = {
 		IB_MANDATORY_FUNC(query_device),
 		IB_MANDATORY_FUNC(query_port),
+		IB_MANDATORY_FUNC(query_transport),
 		IB_MANDATORY_FUNC(query_pkey),
 		IB_MANDATORY_FUNC(query_gid),
 		IB_MANDATORY_FUNC(alloc_pd),

I'm concerned about the performance implications of this. The size of this patchset already points out just how many places in the code we have to check for various aspects of the device transport in order to do the right thing. Without going through the entire list to see how many are on critical hot paths, I'm sure some of them are on at least partially critical hot paths (like creation of new connections). I would prefer to see this change be implemented via a device attribute, not a functional call query. That adds a needless function call in these paths.

That's exactly the first issue that came to my mind while working on this. Mostly I was influenced by the current device callback mechanism: we have plenty of query callbacks and they are widely used in hot paths, so I finally decided to use query_transport() to utilize the existing mechanism.
Actually, I had learned that bitmask operations are somewhat expensive too, while the callback may cost only two registers, one instruction, and two jumps. So I guess we need a benchmark to tell the difference in performance, and I just picked the easier way as a first step :-P

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..83370de 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
 	if (device->get_link_layer)
 		return device->get_link_layer(device, port_num);
 
-	switch (rdma_node_get_transport(device->node_type)) {
+	switch (device->query_transport(device, port_num)) {
 	case RDMA_TRANSPORT_IB:
+	case RDMA_TRANSPORT_IBOE:
 		return IB_LINK_LAYER_INFINIBAND;

If we are preserving ABI, then this looks wrong. Currently, IBOE returns transport IB and link layer Ethernet. It should not return link layer IB, it does not support IB link layer operations (such as MAD access).

That's my bad, IBOE is ETH link layer.

[snip] };

I'm also concerned about this. I would like to see this enum essentially turned into a bitmap. One that is constructed in such a way that we can always get the specific test we need with only one compare against the overall value. In order to do so, we need to break it down into the essential elements that are part of each of the transports. So, for instance, we can define the two link layers we have so far, plus reserve one for OPA which we know is coming:

The idea sounds interesting, but frankly speaking I'm already starting to worry about the size of this patch set... I would really prefer to move optimizing/reforming work like this to the next stage, after this pioneer patch set settles down and works stably. After all, we have already gotten rid of the old transport helpers, so reforming on top of that should be far easier and clearer.
The next version will be reorganized to separate the implementation from the wrapper replacement, which makes the patch set even bigger. Fortunately, since the logic is not very complex, we are still able to handle it. I would really prefer that we focus on performance and conciseness after the infrastructure is built up.

	RDMA_LINK_LAYER_IB   = 0x0001,
	RDMA_LINK_LAYER_ETH  = 0x0002,
	RDMA_LINK_LAYER_OPA  = 0x0004,
	RDMA_LINK_LAYER_MASK = 0x000f,

[snip]

From patch 2/17:

+static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
+{
+	enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+	return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
+}

This looks wrong. IBOE doesn't have IB management. At least it doesn't have subnet management.

This helper could actually be erased in the end :-) After Sean's suggestion on the cma stuff, nowhere needs this raw helper anymore; cap_ib_cm(), cap_iw_cm() and cap_ib_mad() are enough.
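The link-layer bits Doug proposes can be sketched as a per-port attribute that is tested with one mask instead of a callback. This is a minimal user-space illustration, not kernel code: the RDMA_LINK_LAYER_* values are the ones quoted above, while HYP_PROTO_IB, struct port_attr, and the helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link-layer bits exactly as quoted in the thread. */
enum {
	RDMA_LINK_LAYER_IB   = 0x0001,
	RDMA_LINK_LAYER_ETH  = 0x0002,
	RDMA_LINK_LAYER_OPA  = 0x0004,
	RDMA_LINK_LAYER_MASK = 0x000f,
};

/* Hypothetical protocol bit, placed above the link-layer field. */
enum { HYP_PROTO_IB = 0x0010 };

/* Hypothetical per-port attribute, filled in once at registration. */
struct port_attr {
	uint32_t transport_bits;
};

/* Extract the link-layer field with a single mask, no indirect call. */
static inline uint32_t port_link_layer(const struct port_attr *p)
{
	return p->transport_bits & RDMA_LINK_LAYER_MASK;
}

/* True when the port runs over an Ethernet link layer. */
static inline bool port_link_is_eth(const struct port_attr *p)
{
	return port_link_layer(p) == RDMA_LINK_LAYER_ETH;
}

/* True when the port speaks the IB protocol, whatever the link layer. */
static inline bool port_proto_is_ib(const struct port_attr *p)
{
	return (p->transport_bits & HYP_PROTO_IB) != 0;
}
```

Under these assumptions an IBoE port would be described as RDMA_LINK_LAYER_ETH | HYP_PROTO_IB, which matches the review comment above: the protocol is IB, but the link layer is Ethernet.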
Re: RDMA Multicasting
On Thu, 9 Apr 2015, Caitlin Bestler wrote:

RDMA requires specific delivery semantics. Successful completion of an untagged message (an RDMA Send) implies that all prior tagged (Write/Read) packets have been successfully placed in user memory.

That seems to be incompatible with a multicast delivery mechanism because there is state to all endpoints involved.

Just putting RDMA packets over an unreliable transport will not accomplish that.

I am not sure what an RDMA packet is. RDMA is a memory to memory transfer action. A packet is information sent on a medium. A packet that describes an RDMA action to be taken between certain endpoints? Infiniband is lossless and thus what unreliable means is also quite foggy.

You are probably better off using RDMA ideas over UDP/UD, and doing the direct memory placement from your own code, instead.

So send the memory transfer info via multicast datagram to the endpoints and then run the transfer from the endpoint.
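The last suggestion, sending the memory-transfer information in a multicast datagram and letting each endpoint run the transfer itself, might look like this at the payload level. The sketch only encodes and decodes a descriptor. The xfer_desc layout, its field names, and the helper functions are assumptions made for illustration, not an existing wire format, and the multicast send and the transfer itself are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define XFER_DESC_WIRE_LEN 16

/* Hypothetical description of one memory region to fetch. */
struct xfer_desc {
	uint64_t remote_addr; /* address of the source buffer */
	uint32_t rkey;        /* remote key granting access to it */
	uint32_t length;      /* number of bytes to transfer */
};

static inline void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static inline uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize in network byte order; returns bytes written, 0 if too small. */
static inline size_t xfer_desc_pack(const struct xfer_desc *d,
				    uint8_t *buf, size_t buflen)
{
	if (buflen < XFER_DESC_WIRE_LEN)
		return 0;
	put_be32(buf, (uint32_t)(d->remote_addr >> 32));
	put_be32(buf + 4, (uint32_t)d->remote_addr);
	put_be32(buf + 8, d->rkey);
	put_be32(buf + 12, d->length);
	return XFER_DESC_WIRE_LEN;
}

/* Parse a received datagram payload; returns 0 on success, -1 if short. */
static inline int xfer_desc_unpack(const uint8_t *buf, size_t buflen,
				   struct xfer_desc *d)
{
	if (buflen < XFER_DESC_WIRE_LEN)
		return -1;
	d->remote_addr = ((uint64_t)get_be32(buf) << 32) | get_be32(buf + 4);
	d->rkey = get_be32(buf + 8);
	d->length = get_be32(buf + 12);
	return 0;
}
```

A sender would pack one descriptor per datagram and each receiver would unpack it before issuing its own transfer, so the per-endpoint state stays at the endpoints rather than in the multicast delivery.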
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On 04/08/2015 10:10 PM, Jason Gunthorpe wrote: [snip]

As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm

I actually really prefer cap_mandatory_grh - that is what is going on here. ie based on that name (as a reviewer) I'd expect to see the mad layer check that the mandatory GRH is always present, or blow up.

Sounds good, will be in next version :-)

Regards,
Michael Wang

Some of the other checks in this file revolve around pkey, I'm not sure what rocee does there? cap_pkey_supported ?

Jason
[PATCH] ib_uverbs: Fix pages leak when using XRC SRQs
Hello,

When an application using XRCs abruptly terminates, the mmaped pages of the CQ buffers are leaked. This comes from the fact that when resources are released in ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their refcount is not 0.

When creating an XRC SRQ, we increment the associated CQ refcount. This refcount is only decremented when the SRQ is released.

Therefore we need to release the SRQs prior to the CQs to make sure that all references to the CQs are gone before trying to release these.

Signed-off-by: Sebastien Dugue sebastien.du...@bull.net
---
 drivers/infiniband/core/uverbs_main.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 259dcc7..88cce9b 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -246,6 +246,17 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(uqp);
 	}
 
+	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
+		struct ib_srq *srq = uobj->object;
+		struct ib_uevent_object *uevent =
+			container_of(uobj, struct ib_uevent_object, uobject);
+
+		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+		ib_destroy_srq(srq);
+		ib_uverbs_release_uevent(file, uevent);
+		kfree(uevent);
+	}
+
 	list_for_each_entry_safe(uobj, tmp, &context->cq_list, list) {
 		struct ib_cq *cq = uobj->object;
 		struct ib_uverbs_event_file *ev_file = cq->cq_context;
@@ -258,17 +269,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(ucq);
 	}
 
-	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
-		struct ib_srq *srq = uobj->object;
-		struct ib_uevent_object *uevent =
-			container_of(uobj, struct ib_uevent_object, uobject);
-
-		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
-		ib_destroy_srq(srq);
-		ib_uverbs_release_uevent(file, uevent);
-		kfree(uevent);
-	}
-
 	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
 		struct ib_mr *mr = uobj->object;
 
-- 
1.9.1
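The ordering problem this patch addresses can be reproduced with a toy model. The following is a simplified user-space sketch, not the uverbs code: toy_cq, toy_srq, and the helper functions are hypothetical stand-ins, and the rule that destroying a CQ fails while its reference count is nonzero is modeled on the patch description above.

```c
#include <stdbool.h>

struct toy_cq {
	int refcount;   /* number of SRQs still pointing at this CQ */
	bool destroyed;
};

struct toy_srq {
	struct toy_cq *cq;
	bool destroyed;
};

/* Creating an SRQ takes a reference on its CQ. */
static inline void toy_srq_create(struct toy_srq *srq, struct toy_cq *cq)
{
	srq->cq = cq;
	srq->destroyed = false;
	cq->refcount++;
}

/* Destroying the SRQ is the only thing that drops that reference. */
static inline void toy_srq_destroy(struct toy_srq *srq)
{
	srq->destroyed = true;
	srq->cq->refcount--;
}

/* Refuses to destroy a CQ that is still referenced. */
static inline int toy_cq_destroy(struct toy_cq *cq)
{
	if (cq->refcount != 0)
		return -1;
	cq->destroyed = true;
	return 0;
}

/* Tear down one SRQ/CQ pair in either order; returns the CQs leaked. */
static inline int toy_cleanup(struct toy_srq *srq, struct toy_cq *cq,
			      bool srq_first)
{
	int leaked = 0;

	if (srq_first)
		toy_srq_destroy(srq);
	if (toy_cq_destroy(cq) != 0)
		leaked++;	/* the failure is not retried, as at context cleanup */
	if (!srq_first)
		toy_srq_destroy(srq);
	return leaked;
}
```

Calling toy_cleanup() with srq_first set to false models the old order and leaves the CQ behind; with srq_first set to true, which is the order the patch establishes, the CQ is released.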
Re: RDMA Multicasting
On Wed, 8 Apr 2015, Allen Andrews wrote:

I am trying to find out if RDMA Multicasting is supported on RoCE in Linux. If so, is it supported above the verbs interface?

RDMA multicasting? You mean sending UD (or UDP) messages via the RDMA API? That works even without RoCE.
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On Thu, 2015-04-09 at 10:01 -0600, Jason Gunthorpe wrote:

On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:

These are exactly the tests I proposed Jason. I'm not sure I see your point here. I guess my point is that although the scenario of all the different items seems complex, it really does boil down to needing only exactly what I proposed earlier to fulfill the entire test matrix.

I have no problem with minimizing a bitmap, but I want the accessors to make sense first. My specific problem with your suggestion was combining cap_ib_mad, cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt. Not only do the three cap things not return the same value for all situations, the documentary knowledge is lost by the reduction.

I'd prefer we look at this from a 'what do the call sites need' view, not a 'how do we minimize' view.

I've written this before: The mess here is that it is too hard to know what the call sites are actually checking for when it is some baroque conditional.

The two goals, being specific about what the test is returning and minimizing the bitmap footprint, are not necessarily opposed. One can do both at the same time.

-- 
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
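Doug's closing point, that descriptive accessors and a compact bitmap can coexist, might be sketched like this. The accessor names cap_ib_mad, cap_ib_sa, and cap_ib_smi come from the thread; the CAP_* bit values, struct port_caps, and the accessor signatures are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, one per question a call site asks. */
enum {
	CAP_IB_MAD = 1u << 0,
	CAP_IB_SMI = 1u << 1,
	CAP_IB_SA  = 1u << 2,
};

/* Hypothetical per-port capability word. */
struct port_caps {
	uint32_t bits;
};

/* Each accessor names what the call site needs; each is one mask test. */
static inline bool cap_ib_mad(const struct port_caps *p)
{
	return (p->bits & CAP_IB_MAD) != 0;
}

static inline bool cap_ib_smi(const struct port_caps *p)
{
	return (p->bits & CAP_IB_SMI) != 0;
}

static inline bool cap_ib_sa(const struct port_caps *p)
{
	return (p->bits & CAP_IB_SA) != 0;
}
```

With this shape, a port that handles MADs but has no SMI or SA would set only CAP_IB_MAD, so the three accessors can return different values while the storage stays a single word.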
Re: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)
On Thu, Apr 09, 2015 at 10:41:13AM -0700, Roland Dreier wrote:

This thread has made me realize that even as I am able to carve out more time to work on things like IB maintainership, I no longer have the desire to spend my time maintaining the IB subsystem. Since my current level of activity is clearly hurting the community, I've decided to step down as maintainer.

Thank you for all of your work over the years. I know that your technical skills and opinion have been widely appreciated and respected by the entire community.

Regards,
Jason
Re: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers
On 04/08/2015 07:02 PM, Hefty, Sean wrote: [snip]

The wrapper makes sense, but do we have the guarantee that IBoE port won't be used for AF_IB address? I just can't locate the place we filtered it out...

I can't think of a reason why IBoE wouldn't work with AF_IB, but I'm not sure if anyone has tested it. The original check would have let IBoE through. When I suggested checking for IB transport, I meant the actual transport protocol, which would have included both IB and IBoE.

Got it :-)

@@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct [snip]
 	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
+		(rdma_transport_ib(cma_dev->device, p)) ?
 		ARPHRD_INFINIBAND : ARPHRD_ETHER;

This wants the link layer, or maybe use cap_ipoib.

Is this related with ipoib only?

ARPHRD_INFINIBAND is related to ipoib. In your next update, maybe go with tech_ib. I don't know the status of ipoib over iboe.

Will be in next version :-)

Regards,
Michael Wang
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On 04/08/2015 10:10 PM, Jason Gunthorpe wrote: [snip]

Some of the other checks in this file revolve around pkey, I'm not sure what rocee does there? cap_pkey_supported ?

I'm not sure if this counts as a capability... how shall we describe it?

Regards,
Michael Wang

Jason
Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
On Wed, 2015-04-08 at 14:10 -0600, Jason Gunthorpe wrote:

On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:

To straighten all this out, lets break management out into the two distinct types:

rdma_port_ib_fabric_mgmt() - fabric specific management tasks: MAD, SM, multicast. The proper test for this with my bitmap above is a simple transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.

rdma_port_conn_mgmt() - connection management, which we currently support on everything except USNIC (correct Sean?), so a test would be something like !(transport & RDMA_TRANSPORT_USNIC). This is then split out into two subgroups, IB style and iWARP style connection management (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my above bitmap, since I didn't give IBOE its own transport type, these subgroups still boil down to the simple tests transport & iWARP and transport & IB like they do today.

There is a lot more variation here than just these two tests, and those two tests won't scale to include OPA.

                 IB   ROCEE   OPA
SMI              Y    N       Y   (though the OPA smi looked a bit different)
IB SMP           Y    N       N
OPA SMP          N    N       Y
GMP              Y    Y       Y
SA               Y    N       Y
PM               Y    Y       Y   (? guessing for OPA)
CM               Y    Y       Y
GMP needs GRH    N    Y       N

You can still break this down to a manageable bitmap. SMI, SMP, and SA are all essentially the same and can be combined to one bitmap that is

	IB_SM  0x1
	OPA_SM 0x2

and the defines are such that IB devices define IB_SM, and OPA devices define IB_SM and OPA_SM. Any minor differences between OPA and IB can be handled by testing just the OPA_SM bit. This will exclude all IBOE devices and iWARP devices.

GMP, PM, and CM are all the same, and are all identical to transport == INFINIBAND.

GMP needs GRH happens to be precisely the same as ib_dev_is_iboe.

These are exactly the tests I proposed Jason. I'm not sure I see your point here.
I guess my point is that although the scenario of all the different items seems complex, it really does boil down to needing only exactly what I proposed earlier to fulfill the entire test matrix.

-- 
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
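Doug's claim that the table reduces to a few bits can be checked mechanically. The sketch below transcribes the table rows as data and derives each row from a small bitmap. IB_SM and OPA_SM and their values are from the message above; TRANSPORT_IB, LINK_ETH, the PORT_* definitions, and all helper names are hypothetical, and reading "IB SMP" as "IB_SM set and OPA_SM clear" is one interpretation of the text, not something the thread states.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	IB_SM        = 0x1, /* from the thread */
	OPA_SM       = 0x2, /* from the thread */
	TRANSPORT_IB = 0x4, /* hypothetical: transport == INFINIBAND */
	LINK_ETH     = 0x8, /* hypothetical: Ethernet link layer */
};

/* Hypothetical port definitions for the three table columns. */
enum {
	PORT_IB    = IB_SM | TRANSPORT_IB,
	PORT_ROCEE = TRANSPORT_IB | LINK_ETH,
	PORT_OPA   = IB_SM | OPA_SM | TRANSPORT_IB,
};

/* One table column; PM and CM follow GMP and are folded into gmp. */
struct mgmt_row {
	bool smi, ib_smp, opa_smp, gmp, sa, gmp_needs_grh;
};

/* Derive every table entry from the bitmap alone. */
static inline struct mgmt_row derive_row(uint32_t port)
{
	struct mgmt_row r;

	r.smi = (port & IB_SM) != 0;
	r.ib_smp = (port & (IB_SM | OPA_SM)) == IB_SM;
	r.opa_smp = (port & OPA_SM) != 0;
	r.gmp = (port & TRANSPORT_IB) != 0;
	r.sa = (port & IB_SM) != 0;
	r.gmp_needs_grh = (port & TRANSPORT_IB) && (port & LINK_ETH);
	return r;
}

static inline bool row_equal(struct mgmt_row a, struct mgmt_row b)
{
	return a.smi == b.smi && a.ib_smp == b.ib_smp &&
	       a.opa_smp == b.opa_smp && a.gmp == b.gmp &&
	       a.sa == b.sa && a.gmp_needs_grh == b.gmp_needs_grh;
}

/* Compare the derived rows with the table as posted. */
static inline bool table_matches(void)
{
	const struct mgmt_row ib    = { true,  true,  false, true, true,  false };
	const struct mgmt_row rocee = { false, false, false, true, false, true  };
	const struct mgmt_row opa   = { true,  false, true,  true, true,  false };

	return row_equal(derive_row(PORT_IB), ib) &&
	       row_equal(derive_row(PORT_ROCEE), rocee) &&
	       row_equal(derive_row(PORT_OPA), opa);
}
```

Under these assumptions every entry in the table is recovered from four bits, which is the reduction Doug describes; whether the call sites should see those bits directly or only through named accessors is the separate question Jason raises.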