Multicast Filtering - mlx4_SET_MCAST_FLTR

2015-04-09 Thread Bob Biloxi
Hi,

I was going through the mlx4 code and noticed that the function
mlx4_SET_MCAST_FLTR calls mlx4_SET_MCAST_FLTR_wrapper, which in
turn has an empty body.


So, I was just wondering if the multicast filtering functionality is disabled?

Is QP_ATTACH the replacement for this?

Couldn't understand so wanted your help on this...


Thanks

Best Regards,
Bob
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-09 Thread Weiny, Ira
 
 On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe
 jguntho...@obsidianresearch.com wrote:
  I don't think you understand how deep the problem Or is describing
  goes.
 
  [...Appropriate and correct critique...]
 
 This thread has made me realize that even as I am able to carve out more time
 to work on things like IB maintainership, I no longer have the desire to spend
 my time maintaining the IB subsystem.  Since my current level of activity is
 clearly hurting the community, I've decided to step down as maintainer.
 

I am sad to see you leave.  Without you InfiniBand would have never been 
accepted upstream at all.

Thank you for all the hard work,
Ira


Re: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-09 Thread Doug Ledford
On Thu, 2015-04-09 at 10:41 -0700, Roland Dreier wrote:
 On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe
 jguntho...@obsidianresearch.com wrote:
  I don't think you understand how deep the problem Or is describing
  goes.
 
  [...Appropriate and correct critique...]
 
 This thread has made me realize that even as I am able to carve out
 more time to work on things like IB maintainership, I no longer have
 the desire to spend my time maintaining the IB subsystem.  Since my
 current level of activity is clearly hurting the community, I've
 decided to step down as maintainer.

Thank you for the long, thankless years of service.  I wish you the best
in whatever endeavors you choose to focus on!

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD


RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-09 Thread Hefty, Sean
  This thread has made me realize that even as I am able to carve out
  more time to work on things like IB maintainership, I no longer
  have the desire to spend my time maintaining the IB subsystem.
  Since my current level of activity is clearly hurting the community,
  I've decided to step down as maintainer.
 
 Thank you for all of your work over the years. I know that your
 technical skills and opinion have been widely appreciated and
 respected by the entire community.

I concur, and I'd like to thank you for the personal time that you dedicated to 
this project.


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Jason Gunthorpe
On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:

 These are exactly the tests I proposed Jason.  I'm not sure I see your
 point here.  I guess my point is that although the scenario of all the
 different items seems complex, it really does boil down to needing only
 exactly what I proposed earlier to fulfill the entire test matrix.

I have no problem with minimizing a bitmap, but I want the accessors
to make sense first.

My specific problem with your suggestion was combining cap_ib_mad,
cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt.

Not only do the three cap things not return the same value for all
situations, the documentary knowledge is lost by the reduction.

I'd prefer we look at this from a 'what do the call sites need' view,
not a 'how do we minimize' view.

I've written this before: The mess here is that it is too hard to know
what the call sites are actually checking for when it is some baroque
conditional.

Jason


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Jason Gunthorpe
On Thu, Apr 09, 2015 at 02:42:24PM +0200, Michael Wang wrote:
 On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
 [snip]
  
  Some of the other checks in this file revolve around pkey, I'm not
  sure what rocee does there? cap_pkey_supported ?
 
 I'm not sure if this count in capability... how shall we describe it?

I'm not sure how rocee uses pkey, but maybe the GRH and pkey thing
would work well together under a single 'cap_ethernet_ah' ?

Jason


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Michael Wang
On 04/08/2015 08:29 PM, Doug Ledford wrote:
 On Tue, 2015-04-07 at 14:42 +0200, Michael Wang wrote:
 Add new callback query_transport() and implement for each HW.
 
 My response here is going to be a long email, but that's because it's
 easier to respond to the various patches all in one response in order to
 preserve context.  So, while I'm responding to patch 1 of 17, my
 response will cover all 17 patches in whole.

Thanks for the review :-)

 
 Mapping List:
[snip]

 diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
 index 18c1ece..a9587c4 100644
 --- a/drivers/infiniband/core/device.c
 +++ b/drivers/infiniband/core/device.c
 @@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
  	} mandatory_table[] = {
  		IB_MANDATORY_FUNC(query_device),
  		IB_MANDATORY_FUNC(query_port),
 +		IB_MANDATORY_FUNC(query_transport),
  		IB_MANDATORY_FUNC(query_pkey),
  		IB_MANDATORY_FUNC(query_gid),
  		IB_MANDATORY_FUNC(alloc_pd),
 
 I'm concerned about the performance implications of this.  The size of
 this patchset already points out just how many places in the code we
 have to check for various aspects of the device transport in order to do
 the right thing.  Without going through the entire list to see how many
 are on critical hot paths, I'm sure some of them are on at least
 partially critical hot paths (like creation of new connections).  I
 would prefer to see this change be implemented via a device attribute,
 not a functional call query.  That adds a needless function call in
 these paths.

That's exactly the first issue that came to my mind while working on this.

Mostly I was influenced by the current device callback mechanism: we have
plenty of query callbacks and they are widely used in hot paths, so I
finally decided to add query_transport() and reuse the existing mechanism.

Actually, I once learned that bitmask operations are somewhat expensive
too, while the callback may only cost two registers, one instruction and
two jumps. I guess we need some benchmarks to tell the difference in
performance, so I just picked the easier way as a first step :-P
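
[Editor's sketch, not part of the original message.] The two options being weighed above can be put side by side in a small userspace model: a per-port callback, as this patch set adds, versus an attribute filled in once at registration, as Doug suggests. Every name here (demo_device, port_transport, the DEMO_* enum) is hypothetical and only stands in for the kernel's struct ib_device; it says nothing about which option is faster, which would need the benchmark Michael mentions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the transport types discussed in the thread. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IBOE,
	DEMO_TRANSPORT_IWARP
};

/* Hypothetical device model, NOT the kernel's struct ib_device. */
struct demo_device {
	/* Option 1: per-port callback, in the style of query_transport(). */
	enum demo_transport (*query_transport)(const struct demo_device *dev,
					       uint8_t port_num);
	/* Option 2: attribute stored once when the device registers. */
	enum demo_transport port_transport[2];
};

/* Example driver callback; ports are numbered from 1 as in the verbs API. */
static enum demo_transport demo_query_transport(const struct demo_device *dev,
						uint8_t port_num)
{
	return dev->port_transport[port_num - 1];
}

/* Check through the callback: one indirect call per test. */
static int is_iboe_via_callback(const struct demo_device *dev, uint8_t port_num)
{
	return dev->query_transport(dev, port_num) == DEMO_TRANSPORT_IBOE;
}

/* Check through the attribute: a load and a compare, no call. */
static int is_iboe_via_attribute(const struct demo_device *dev, uint8_t port_num)
{
	return dev->port_transport[port_num - 1] == DEMO_TRANSPORT_IBOE;
}
```

Both helpers answer the same question; the difference under discussion is only whether each call site pays for an indirect function call.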

 
 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
 index f93eb8d..83370de 100644
 --- a/drivers/infiniband/core/verbs.c
 +++ b/drivers/infiniband/core/verbs.c
 @@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
  	if (device->get_link_layer)
  		return device->get_link_layer(device, port_num);
  
 -	switch (rdma_node_get_transport(device->node_type)) {
 +	switch (device->query_transport(device, port_num)) {
  	case RDMA_TRANSPORT_IB:
 +	case RDMA_TRANSPORT_IBOE:
  		return IB_LINK_LAYER_INFINIBAND;
 
 If we are preserving ABI, then this looks wrong.  Currently, IBOE
 returns transport IB and link layer Ethernet.  It should not return
 link layer IB, it does not support IB link layer operations (such as MAD
 access).

That's my bad, IBOE is ETH link layer.
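
[Editor's sketch, not part of the original message.] The correction agreed on here amounts to moving the IBOE case to the Ethernet branch of the switch. A minimal standalone model follows, assuming the enum names below (DEMO_* prefixed, hypothetical) mirror the transports and link layers named in the patch; it is not the code of the next patch version.

```c
/* Hypothetical mirrors of the transport and link-layer enums in the patch. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IBOE,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC
};

enum demo_link_layer {
	DEMO_LINK_LAYER_UNSPECIFIED,
	DEMO_LINK_LAYER_INFINIBAND,
	DEMO_LINK_LAYER_ETHERNET
};

/*
 * Fallback mapping from transport to link layer. IBOE runs the IB
 * transport protocol over an Ethernet link, so it belongs with the
 * Ethernet cases, not with plain IB.
 */
static enum demo_link_layer demo_link_layer_of(enum demo_transport tp)
{
	switch (tp) {
	case DEMO_TRANSPORT_IB:
		return DEMO_LINK_LAYER_INFINIBAND;
	case DEMO_TRANSPORT_IBOE:
	case DEMO_TRANSPORT_IWARP:
	case DEMO_TRANSPORT_USNIC:
		return DEMO_LINK_LAYER_ETHERNET;
	default:
		return DEMO_LINK_LAYER_UNSPECIFIED;
	}
}
```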

 
[snip]
  };
 
 I'm also concerned about this.  I would like to see this enum
 essentially turned into a bitmap.  One that is constructed in such a way
 that we can always get the specific test we need with only one compare
 against the overall value.  In order to do so, we need to break it down
 into the essential elements that are part of each of the transports.
 So, for instance, we can define the two link layers we have so far, plus
 reserve one for OPA which we know is coming:

The idea sounds interesting, but frankly speaking I'm already starting to
worry about the size of this patch set...

I would really prefer to move optimizing/reforming work like this to the next
stage, after this pioneer patch set has settled down and is working stably.
After all, we will already have gotten rid of the old transport helpers, so
reforming on top of that should be far easier and clearer.

The next version will be reorganized to separate the implementation from the
wrapper replacement, which makes the patch set even bigger. Fortunately, since
the logic is not very complex, we are still able to handle it. I would really
prefer that we focus on performance and conciseness after the infrastructure
is built up.

 
 RDMA_LINK_LAYER_IB   = 0x0001,
 RDMA_LINK_LAYER_ETH  = 0x0002,
 RDMA_LINK_LAYER_OPA  = 0x0004,
 RDMA_LINK_LAYER_MASK = 0x000f,
[snip]
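
[Editor's sketch, not part of the original message.] The four link-layer values quoted above are enough to show the "one compare against the overall value" idea. The flag values are taken from the quote; DEMO_OTHER_BIT and the helper names are hypothetical, added only to show that bits outside the link-layer nibble do not disturb the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the thread. */
#define RDMA_LINK_LAYER_IB   0x0001u
#define RDMA_LINK_LAYER_ETH  0x0002u
#define RDMA_LINK_LAYER_OPA  0x0004u
#define RDMA_LINK_LAYER_MASK 0x000fu

/* Hypothetical: some unrelated property stored above the link-layer nibble. */
#define DEMO_OTHER_BIT       0x0010u

/* True if the port's link layer is exactly link_layer: one mask, one compare. */
static inline bool demo_link_layer_is(uint32_t transport, uint32_t link_layer)
{
	return (transport & RDMA_LINK_LAYER_MASK) == link_layer;
}

/* True if the port's link layer is any member of set: one mask, one test. */
static inline bool demo_link_layer_in(uint32_t transport, uint32_t set)
{
	return (transport & RDMA_LINK_LAYER_MASK & set) != 0;
}
```

For example, a call site that handles both IB and OPA links could test demo_link_layer_in(t, RDMA_LINK_LAYER_IB | RDMA_LINK_LAYER_OPA) instead of chaining two comparisons.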
 
 From patch 2/17:
 
 
 +static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
 +{
 +	enum rdma_transport_type tp = device->query_transport(device, port_num);
 +
 +	return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
 +}
 
 This looks wrong.  IBOE doesn't have IB management.  At least it doesn't
 have subnet management.

This helper can actually be removed in the end :-) After Sean's suggestion on
the cma stuff, nowhere needs this raw helper anymore; cap_ib_cm(), cap_iw_cm()
and cap_ib_mad() are enough.

 
 

Re: RDMA Multicasting

2015-04-09 Thread Christoph Lameter
On Thu, 9 Apr 2015, Caitlin Bestler wrote:

 RDMA requires specific delivery semantics. Successful completion of an
 untagged message (an RDMA Send) implies that all prior tagged
 (Write/Read) packets have been successfully placed in user memory.

That seems to be incompatible with a multicast delivery mechanism
because there is state to all endpoints involved.

 Just putting RDMA packets over an unreliable transport will not accomplish 
 that.

I am not sure what an RDMA packet is. RDMA is a memory-to-memory
transfer action. A packet is information sent on a medium. A packet that
describes an RDMA action to be taken between certain endpoints?

Infiniband is lossless and thus what unreliable means is also quite
foggy.

 You are probably better off using RDMA ideas over UDP/UD, and doing the
 direct memory placement from your own code, instead.

So send the memory transfer info via multicast datagram to the
endpoints and then run the transfer from the endpoint.


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Michael Wang
On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
[snip]
 
 As Sean pointed out, force_grh should be rdma_dev_is_iboe().  The cm
 
 I actually really prefer cap_mandatory_grh - that is what is going on
 here. ie based on that name (as a reviewer) I'd expect to see the mad
 layer check that the mandatory GRH is always present, or blow up.

Sounds good, will be in next version :-)

Regards,
Michael Wang

 
 Some of the other checks in this file revolve around pkey, I'm not
 sure what rocee does there? cap_pkey_supported ?
 
 Jason
 


[PATCH] ib_uverbs: Fix pages leak when using XRC SRQs

2015-04-09 Thread Sébastien Dugué

  Hello,

  When an application using XRCs abruptly terminates, the mmaped pages
of the CQ buffers are leaked.

  This comes from the fact that when resources are released in
ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their
refcount is not 0.

  When creating an XRC SRQ, we increment the associated CQ refcount.
This refcount is only decremented when the SRQ is released.

  Therefore we need to release the SRQs prior to the CQs to make sure
that all references to the CQs are gone before trying to release these.

Signed-off-by: Sebastien Dugue sebastien.du...@bull.net
---
 drivers/infiniband/core/uverbs_main.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 259dcc7..88cce9b 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -246,6 +246,17 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(uqp);
 	}
 
+	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
+		struct ib_srq *srq = uobj->object;
+		struct ib_uevent_object *uevent =
+			container_of(uobj, struct ib_uevent_object, uobject);
+
+		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+		ib_destroy_srq(srq);
+		ib_uverbs_release_uevent(file, uevent);
+		kfree(uevent);
+	}
+
 	list_for_each_entry_safe(uobj, tmp, &context->cq_list, list) {
 		struct ib_cq *cq = uobj->object;
 		struct ib_uverbs_event_file *ev_file = cq->cq_context;
@@ -258,17 +269,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(ucq);
 	}
 
-	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
-		struct ib_srq *srq = uobj->object;
-		struct ib_uevent_object *uevent =
-			container_of(uobj, struct ib_uevent_object, uobject);
-
-		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
-		ib_destroy_srq(srq);
-		ib_uverbs_release_uevent(file, uevent);
-		kfree(uevent);
-	}
-
 	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
 		struct ib_mr *mr = uobj->object;
 
-- 
1.9.1
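
[Editor's sketch, not part of the original message.] The ordering problem the patch describes can be reproduced with a toy reference-count model: an XRC SRQ takes a reference on its CQ, and destroying a CQ is refused while that count is non-zero. All types and functions below (demo_cq, demo_srq, demo_cleanup, ...) are hypothetical userspace stand-ins, not the uverbs code.

```c
#include <errno.h>
#include <stddef.h>

/* Toy completion queue: usecnt counts objects that still reference it. */
struct demo_cq {
	int usecnt;
	int destroyed;
};

/* Toy XRC shared receive queue holding a reference on one CQ. */
struct demo_srq {
	struct demo_cq *cq;
	int destroyed;
};

/* Creating an XRC SRQ takes a reference on the associated CQ. */
static void demo_create_xrc_srq(struct demo_srq *srq, struct demo_cq *cq)
{
	srq->cq = cq;
	srq->destroyed = 0;
	cq->usecnt++;
}

/* Destroying a CQ is refused while anything still references it. */
static int demo_destroy_cq(struct demo_cq *cq)
{
	if (cq->usecnt)
		return -EBUSY;
	cq->destroyed = 1;
	return 0;
}

/* Destroying the SRQ is the only place the CQ reference is dropped. */
static void demo_destroy_srq(struct demo_srq *srq)
{
	if (srq->cq) {
		srq->cq->usecnt--;
		srq->cq = NULL;
	}
	srq->destroyed = 1;
}

/*
 * One-shot cleanup pass over a single SRQ/CQ pair.
 * Returns the number of CQs left undestroyed (leaked).
 */
static int demo_cleanup(struct demo_srq *srq, struct demo_cq *cq, int srq_first)
{
	int leaked = 0;

	if (srq_first)
		demo_destroy_srq(srq);
	if (demo_destroy_cq(cq) != 0)
		leaked = 1;	/* cleanup does not retry, so the CQ stays */
	if (!srq_first)
		demo_destroy_srq(srq);
	return leaked;
}
```

With srq_first set to 0 the model leaks the CQ, which corresponds to the old ordering; with srq_first set to 1 it does not, which is the ordering the patch introduces.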



Re: RDMA Multicasting

2015-04-09 Thread Christoph Lameter
On Wed, 8 Apr 2015, Allen Andrews wrote:

 I am trying to find out if RDMA Multicasting is supported on RoCE in Linux.
 If so, is it supported above the verbs interface?

RDMA multicasting? You mean sending UD (or UDP) messages via the RDMA API?
That works even without RoCE.



Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Doug Ledford
On Thu, 2015-04-09 at 10:01 -0600, Jason Gunthorpe wrote:
 On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:
 
  These are exactly the tests I proposed Jason.  I'm not sure I see your
  point here.  I guess my point is that although the scenario of all the
  different items seems complex, it really does boil down to needing only
  exactly what I proposed earlier to fulfill the entire test matrix.
 
 I have no problem with minimizing a bitmap, but I want the accessors
 to make sense first.
 
 My specific problem with your suggestion was combining cap_ib_mad,
 cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt.
 
 Not only do the three cap things not return the same value for all
 situations, the documentary knowledge is lost by the reduction.
 
 I'd prefer we look at this from a 'what do the call sites need' view,
 not a 'how do we minimize' view.
 
 I've written this before: The mess here is that it is too hard to know
 what the call sites are actually checking for when it is some baroque
 conditional.

The two goals, being specific about what the test is returning and
minimizing the bitmap footprint, are not necessarily opposed.  One can
do both at the same time.

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD


Re: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-09 Thread Jason Gunthorpe
On Thu, Apr 09, 2015 at 10:41:13AM -0700, Roland Dreier wrote:

 This thread has made me realize that even as I am able to carve out
 more time to work on things like IB maintainership, I no longer
 have the desire to spend my time maintaining the IB subsystem.
 Since my current level of activity is clearly hurting the community,
 I've decided to step down as maintainer.

Thank you for all of your work over the years. I know that your
technical skills and opinion have been widely appreciated and
respected by the entire community.

Regards,
Jason


Re: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers

2015-04-09 Thread Michael Wang
On 04/08/2015 07:02 PM, Hefty, Sean wrote:
[snip]

 The wrapper make sense, but do we have the guarantee that IBoE port won't
 be used for AF_IB address? I just can't locate the place we filtered it
 out...
 
 I can't think of a reason why IBoE wouldn't work with AF_IB, but I'm not sure 
 if anyone has tested it.  The original check would have let IBoE through.  
 When I suggested checking for IB transport, I meant the actual transport 
 protocol, which would have included both IB and IBoE.

Got it :-)

 
 @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct
[snip]
 
  	id_priv->id.route.addr.dev_addr.dev_type =
 -		(rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
 +		(rdma_transport_ib(cma_dev->device, p)) ?
  		ARPHRD_INFINIBAND : ARPHRD_ETHER;

 This wants the link layer, or maybe use cap_ipoib.

 Is this related with ipoib only?
 
 ARPHRD_INFINIBAND is related to ipoib.  In your next update, maybe go with
 tech_ib.  I don't know the status of ipoib over iboe.

Will be in next version :-)

Regards,
Michael Wang

 


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Michael Wang
On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
[snip]
 
 Some of the other checks in this file revolve around pkey, I'm not
 sure what rocee does there? cap_pkey_supported ?

I'm not sure if this count in capability... how shall we describe it?

Regards,
Michael Wang

 
 Jason
 


Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

2015-04-09 Thread Doug Ledford
On Wed, 2015-04-08 at 14:10 -0600, Jason Gunthorpe wrote:
 On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:
 
  To straighten all this out, let's break management out into the two
  distinct types:
  
  rdma_port_ib_fabric_mgmt() - fabric specific management tasks: MAD, SM,
  multicast.  The proper test for this with my bitmap above is a simple
  transport & RDMA_MGMT_IB test.  It will be true for IB and OPA fabrics.
 
  rdma_port_conn_mgmt() - connection management, which we currently
  support on everything except USNIC (correct Sean?), so a test would be
  something like !(transport & RDMA_TRANSPORT_USNIC).  This is then split
  out into two subgroups, IB style and iWARP style connection management
  (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()).  In my
  above bitmap, since I didn't give IBOE its own transport type, these
  subgroups still boil down to the simple tests transport & iWARP and
  transport & IB like they do today.
 
 There is a lot more variation here than just these two tests, and those
 two tests won't scale to include OPA.
 
                 IB   ROCEE   OPA
 SMI             Y    N       Y    (though the OPA smi looked a bit different)
 IB SMP          Y    N       N
 OPA SMP         N    N       Y
 GMP             Y    Y       Y
 SA              Y    N       Y
 PM              Y    Y       Y    (? guessing for OPA)
 CM              Y    Y       Y
 GMP needs GRH   N    Y       N
 

You can still break this down to a manageable bitmap.

SMI, SMP, and SA are all essentially the same and can be combined to one
bitmap that is

IB_SM  0x1
OPA_SM 0x2

and the defines are such that IB devices define IB_SM, and OPA devices
define IB_SM and OPA_SM.  Any minor differences between OPA and IB can
be handled by testing just the OPA_SM bit.  This will exclude all IBOE
devices and iWARP devices.

GMP, PM, and CM are all the same, and are all identical to transport ==
INFINIBAND.

GMP needs GRH happens to be precisely the same as ib_dev_is_iboe.

These are exactly the tests I proposed Jason.  I'm not sure I see your
point here.  I guess my point is that although the scenario of all the
different items seems complex, it really does boil down to needing only
exactly what I proposed earlier to fulfill the entire test matrix.
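
[Editor's sketch, not part of the original message.] One way to read the proposal above is as a small bitmap behind accessors named for what each call site asks, which also speaks to Jason's request. DEMO_IB_SM and DEMO_OPA_SM follow the two values given above; DEMO_XPORT_IB, DEMO_LINK_ETH, the DEMO_PORT_* combinations and the demo_cap_* names are hypothetical, and treating OPA as carrying the IB transport bit is an assumption drawn from the Y entries in Jason's table.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two subnet-management bits from the message. */
#define DEMO_IB_SM    0x1u
#define DEMO_OPA_SM   0x2u
/* Hypothetical extra bits needed to cover the rest of the table. */
#define DEMO_XPORT_IB 0x4u	/* speaks the IB transport protocol */
#define DEMO_LINK_ETH 0x8u	/* runs over an Ethernet link */

/* Hypothetical per-port values for the three columns of the table. */
#define DEMO_PORT_IB   (DEMO_XPORT_IB | DEMO_IB_SM)
#define DEMO_PORT_IBOE (DEMO_XPORT_IB | DEMO_LINK_ETH)
#define DEMO_PORT_OPA  (DEMO_XPORT_IB | DEMO_IB_SM | DEMO_OPA_SM)

/* SMI and SA rows: any port with IB-style subnet management. */
static inline bool demo_cap_smi(uint32_t p) { return (p & DEMO_IB_SM) != 0; }
static inline bool demo_cap_sa(uint32_t p)  { return (p & DEMO_IB_SM) != 0; }

/* IB SMP row: IB subnet management without the OPA variant. */
static inline bool demo_cap_ib_smp(uint32_t p)
{
	return (p & (DEMO_IB_SM | DEMO_OPA_SM)) == DEMO_IB_SM;
}

/* OPA SMP row: only the OPA bit matters. */
static inline bool demo_cap_opa_smp(uint32_t p) { return (p & DEMO_OPA_SM) != 0; }

/* GMP, PM and CM rows: same answer as "transport is IB". */
static inline bool demo_cap_gmp(uint32_t p) { return (p & DEMO_XPORT_IB) != 0; }

/* "GMP needs GRH" row: IB transport over Ethernet, i.e. IBoE. */
static inline bool demo_cap_gmp_needs_grh(uint32_t p)
{
	return (p & (DEMO_XPORT_IB | DEMO_LINK_ETH)) ==
	       (DEMO_XPORT_IB | DEMO_LINK_ETH);
}
```

Each accessor stays a single mask-and-compare while keeping a name that documents what is being asked, so the two goals in the thread are not in conflict in this sketch.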


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD