Re: RDMAoE verbs questions
On Wed, Dec 09, 2009 at 02:48:43PM -0700, Jason Gunthorpe wrote:
>
> Also, if we really must do this, can you please send a patch to Roland
> that at least adds the constants for IB, ASAP. Ideally to be included
> in OFED 1.5.
>
> At least that way people can make updates to check it prior to RDMAoE
> actually appearing..
>

Sure, I'll send the patch tomorrow, and it will be in the next
OFED-RDMAoE build too.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RDMAoE verbs questions
On Wed, Dec 09, 2009 at 11:06:41AM -0800, Roland Dreier wrote:
>
> > It looks good to me. Thanks, I will take it for RDMAoE.
>
> Great... as Jason suggested, please also add in the appropriate reserved
> fields to pad the struct to a 32 bit boundary and zero them in the
> wrapper. So if there is a next time we don't have this problem again.

Also, if we really must do this, can you please send a patch to Roland
that at least adds the constants for IB, ASAP. Ideally to be included
in OFED 1.5.

At least that way people can make updates to check it prior to RDMAoE
actually appearing..

Jason
Re: RDMAoE verbs questions
> It looks good to me. Thanks, I will take it for RDMAoE.

Great... as Jason suggested, please also add in the appropriate reserved
fields to pad the struct to a 32 bit boundary and zero them in the
wrapper. So if there is a next time we don't have this problem again.

 - R.
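The request above (explicit reserved fields up to a 32-bit boundary, zeroed in the wrapper) can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct port_attr_sketch`, `old_provider_query()` and `query_port_zeroed()` are hypothetical stand-ins, not the real libibverbs definitions, and the struct is cut down to a few members.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced struct; the real struct ibv_port_attr has more members. */
struct port_attr_sketch {
    uint32_t port_cap_flags;
    uint8_t  phys_state;
    uint8_t  link_layer;   /* new member */
    uint8_t  reserved[2];  /* explicit padding up to a 32-bit boundary */
};

/* With explicit reserved bytes the size is a whole number of 32-bit words. */
static_assert(sizeof(struct port_attr_sketch) % 4 == 0,
              "struct should end on a 32-bit boundary");

/* Stand-in for an old provider: it only writes the original members. */
static int old_provider_query(struct port_attr_sketch *attr)
{
    attr->port_cap_flags = 0x1;
    attr->phys_state = 5;
    return 0;
}

/* Wrapper: clear every byte an old provider may leave untouched, then query. */
int query_port_zeroed(struct port_attr_sketch *attr)
{
    attr->link_layer = 0;
    memset(attr->reserved, 0, sizeof attr->reserved);
    return old_provider_query(attr);
}
```

Because the reserved bytes are named members, a later extension can give one of them a meaning and rely on callers built against this header seeing zero there.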
Re: RDMAoE verbs questions
On Fri, Dec 04, 2009 at 08:03:31PM -0800, Roland Dreier wrote:
>
> > Yes, every Linux arch aligns structs to the min alignment for the
> > members, so at least 32 in this case.
> >
> > However, it doesn't really matter, look at ibv_cmd_query_port, it
> > doesn't zero the padding. So there must be an ABI bump to ensure that
> > new code links to a library that doesn't fill the new member with
> > garbage.
> >
> > This is a messy one, the low level libraries have to be revved somehow too..
> > ops.query_port2() I guess.
>
> Actually I think we can fix this in libibverbs without having to break
> anything. It's a little bit devious, but if we do something like:
>
>     // ... add link_layer member in padding of struct ibv_port_attr
>
>     enum {
>             IBV_LINK_LAYER_UNSPECIFIED,
>             IBV_LINK_LAYER_INFINIBAND,
>             IBV_LINK_LAYER_ETHERNET,
>     };
>
>     static inline int __ibv_query_port(struct ibv_context *context,
>                                        uint8_t port_num,
>                                        struct ibv_port_attr *port_attr)
>     {
>             port_attr->link_layer = IBV_LINK_LAYER_UNSPECIFIED;
>             return ibv_query_port(context, port_num, port_attr);
>     }
>
>     // ... rest of file...
>
>     #define ibv_query_port(context, port_num, port_attr) \
>             __ibv_query_port(context, port_num, port_attr)
>
> then I think legacy apps should be OK (port_attr size doesn't change,
> binary compat is still there), and new apps that do check link_layer
> should also be OK ... if they use an old library and/or old driver,
> they'll see LINK_LAYER_UNSPECIFIED, which means that IBoE is not supported.
>
> What do you think, does this work?
>
>  - R.

It looks good to me. Thanks, I will take it for RDMAoE.
Re: RDMAoE verbs questions
On Fri, Dec 04, 2009 at 08:03:31PM -0800, Roland Dreier wrote:

> then I think legacy apps should be OK (port_attr size doesn't change,
> binary compat is still there), and new apps that do check link_layer
> should also be OK ... if they use an old library and/or old driver,
> they'll see LINK_LAYER_UNSPECIFIED, which means that IBoE is not supported.
>
> What do you think, does this work?

Yeah, that should be fine, quite sneaky indeed..

Maybe zero the whole padding though, for future?

Jason
Re: RDMAoE verbs questions
> Yes, every Linux arch aligns structs to the min alignment for the
> members, so at least 32 in this case.
>
> However, it doesn't really matter, look at ibv_cmd_query_port, it
> doesn't zero the padding. So there must be an ABI bump to ensure that
> new code links to a library that doesn't fill the new member with
> garbage.
>
> This is a messy one, the low level libraries have to be revved somehow too..
> ops.query_port2() I guess.

Actually I think we can fix this in libibverbs without having to break
anything. It's a little bit devious, but if we do something like:

    // ... add link_layer member in padding of struct ibv_port_attr

    enum {
            IBV_LINK_LAYER_UNSPECIFIED,
            IBV_LINK_LAYER_INFINIBAND,
            IBV_LINK_LAYER_ETHERNET,
    };

    static inline int __ibv_query_port(struct ibv_context *context,
                                       uint8_t port_num,
                                       struct ibv_port_attr *port_attr)
    {
            port_attr->link_layer = IBV_LINK_LAYER_UNSPECIFIED;
            return ibv_query_port(context, port_num, port_attr);
    }

    // ... rest of file...

    #define ibv_query_port(context, port_num, port_attr) \
            __ibv_query_port(context, port_num, port_attr)

then I think legacy apps should be OK (port_attr size doesn't change,
binary compat is still there), and new apps that do check link_layer
should also be OK ... if they use an old library and/or old driver,
they'll see LINK_LAYER_UNSPECIFIED, which means that IBoE is not supported.

What do you think, does this work?

 - R.
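For readers who want to try the macro-redirect trick outside libibverbs, here is a minimal self-contained sketch of the same pattern. Everything in it is a hypothetical stand-in (`struct port_attr_sketch`, `query_port()`, `query_port_compat()`, the `LINK_LAYER_*` constants); it mirrors the shape of the proposal above and is not the library's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants modelled on the enum proposed above. */
enum {
    LINK_LAYER_UNSPECIFIED,
    LINK_LAYER_INFINIBAND,
    LINK_LAYER_ETHERNET,
};

/* Hypothetical reduced attribute struct. */
struct port_attr_sketch {
    uint8_t phys_state;
    uint8_t link_layer;  /* new member living in former padding */
};

/* Stand-in for the "old library" entry point: it never writes link_layer. */
int query_port(int port_num, struct port_attr_sketch *attr)
{
    (void)port_num;
    attr->phys_state = 5;
    return 0;
}

/* Header-level wrapper: preset the new member, then call the real function.
 * The macro below is not defined yet, so this still names the function. */
static inline int query_port_compat(int port_num, struct port_attr_sketch *attr)
{
    assert(attr != NULL);
    attr->link_layer = LINK_LAYER_UNSPECIFIED;
    return query_port(port_num, attr);
}

/* From here on, every caller compiled against this header gets the wrapper. */
#define query_port(port_num, attr) query_port_compat(port_num, attr)
```

A caller that writes `query_port(1, &attr)` after this point is redirected to the wrapper at compile time, so an old library that leaves `link_layer` untouched still yields `LINK_LAYER_UNSPECIFIED` rather than stack garbage, while a newer library is free to overwrite it.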
RE: RDMAoE verbs questions
> Existing apps rely on transport_type == IBV_TRANSPORT_IB to indicate
> IB management is present. There are many examples of this.

> The art of API compatibility is to not break existing old apps, so you
> don't get to change the meaning of transport_type == IBV_TRANSPORT_IB
> to mean 'it is only IB verbs like'. That breaks the API.

> Adding a new field to port_attr preserves functionality but not
> compatibility. I hope you understand the difference.

I understand exactly what you mean, but I want to propose another way of
looking at the compatibility issue:

IB management is a network service. Just as an administrator might
mistakenly try to access an FTP server over the wrong eth interface, an IB
admin can mistakenly run an IB management application (or a non-rdmacm
app that is incompatible with RDMAoE) on an RDMAoE port. In both cases,
the service is unreachable, but otherwise no harm is done.

Basically, IB management is "supported" by RDMAoE: you use the same
verbs to access it but you don't get any answers... (Of course, in terms
of implementation, we can choose to drop SMPs or non-CM MADs rather than
sending them on the wire.)

There is a tradeoff here between transparently supporting existing
rdmacm apps versus making sure that non-rdmacm apps that may come across
an RDMAoE port do not attempt to use it.

Given that rdmacm has become the preferred approach, and that for
non-rdmacm apps the worst-case effect is being unable to create a
connection through a port that they were not designed to support to
begin with, we prefer the approach that we proposed.
Re: RDMAoE verbs questions
On Wed, Dec 02, 2009 at 12:38:31PM +0200, Liran Liss wrote:

> > So? There are substantial semantic differences for *all* non-rdmacm
> > applications. Even common ones like OpenMPI. You propose to ignore them?
>
> On the contrary! Any application that *does* care what the link layer is
> can look up a new field in port_attr (rather than a new node transport
> type).
> Applications that don't, both old and new, can continue as normal - no
> changes to the code are required.

Existing apps rely on transport_type == IBV_TRANSPORT_IB to indicate IB
management is present. There are many examples of this.

The art of API compatibility is to not break existing old apps, so you
don't get to change the meaning of transport_type == IBV_TRANSPORT_IB
to mean 'it is only IB verbs like'. That breaks the API.

Adding a new field to port_attr preserves functionality but not
compatibility. I hope you understand the difference.

> So, all relevant apps will work great with either IB or RDMAoE, in a
> transparent manner.

No, they won't. They will see transport_type == IBV_TRANSPORT_IB and
attempt to do PR queries to the SM. That won't work on RDMAoE.

Jason
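The disagreement above comes down to which check an application performs before it talks to the SM. A rough sketch of the two decision paths, assuming hypothetical names throughout (`TRANSPORT_*`, `LINK_*`, `legacy_uses_sa()`, `link_aware_uses_sa()` are illustrative and not libibverbs identifiers):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device-level transport values. */
enum transport_sketch { TRANSPORT_IB, TRANSPORT_IWARP };

/* Hypothetical per-port link layer values; zero means "not reported". */
enum { LINK_UNSPECIFIED, LINK_INFINIBAND, LINK_ETHERNET };

static_assert(LINK_UNSPECIFIED == 0,
              "a zeroed field must read as unspecified");

/* What an existing app does today: IB transport implies an SM to query. */
int legacy_uses_sa(enum transport_sketch transport)
{
    return transport == TRANSPORT_IB;
}

/* What an updated app could do: also consult the per-port field.
 * An unspecified link layer is treated as plain IB, since it means
 * the library or driver predates the new field. */
int link_aware_uses_sa(enum transport_sketch transport, uint8_t link_layer)
{
    if (transport != TRANSPORT_IB)
        return 0;
    return link_layer != LINK_ETHERNET;
}
```

On an Ethernet port that reports the IB transport type, the legacy check still says "query the SM", which is the behaviour Jason objects to; only code that has been updated to read the new field avoids it.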
Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 09:25:32AM +0200, Or Gerlitz wrote:
> Liran, where this limitation comes from? isn't the HCA supporting
> bridging (loopback connections) for RDMAoE? if this is the case
> maybe you should add a device capability to mark that.
>

Loopback support is not complete and is a temporary limitation. It
will be fixed in the near future.
Re: RDMAoE verbs questions
Paul Grun wrote:
> Why do you say that Or?

I said that because the latest patch set posted by Mellanox doesn't
support loopback. I hear now that this was a temporary limitation which
will be removed, so let it be.

Or.
RE: RDMAoE verbs questions
Hi Paul, you are not missing anything - loopback communication will work
in RDMAoE just as in IB.
--Liran

-----Original Message-----
From: Paul Grun [mailto:pg...@systemfabricworks.com]
Sent: Wednesday, December 02, 2009 10:55 AM
To: 'Or Gerlitz'; Liran Liss
Cc: 'Sean Hefty'; 'Jason Gunthorpe'; 'Eli Cohen'; 'Jeff Squyres'; linux-rdma@vger.kernel.org
Subject: RE: RDMAoE verbs questions

Why do you say that Or? I'm a hardware guy so can't comment on what the
s/w supports/prevents, but I can see no reason why loopback wouldn't be
supported. In fact, if I were guessing, I would guess that the spec
currently under development would support loopback.

What am I missing?

-Paul

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
Sent: Wednesday, December 02, 2009 12:09 AM
To: Liran Liss
Cc: Sean Hefty; Jason Gunthorpe; Eli Cohen; Jeff Squyres; linux-rdma@vger.kernel.org
Subject: Re: RDMAoE verbs questions

Liran Liss wrote:
> from an rdmacm app's point of view - there is no visible difference
> between IB and RDMAoE ports: both support the complete set of Verbs,
> just as any IB transport provider
>

wrong, local (loopback) communication aren't supported with RDMAoE.

Or.
RE: RDMAoE verbs questions
> So? There are substantial semantic differences for *all* non-rdmacm
> applications. Even common ones like OpenMPI. You propose to ignore them?

On the contrary! Any application that *does* care what the link layer is
can look up a new field in port_attr (rather than a new node transport
type).
Applications that don't, both old and new, can continue as normal - no
changes to the code are required.

>> RDMAoE *is* IB transport over Ethernet - we don't want different
>> devices with different node types exactly for this reason:
>> applications shouldn't care if they are running over IB or RDMAoE, and
>> shouldn't add another switch statement to support RDMAoE.

> Nonsense. RDMAoE is no such thing, it is utterly incompatible with the
> IB management model. It is some new protocol that is only about 90%
> compatible with IB.

You are missing the point here - RDMAoE is 100% compatible with IB at
the *transport* level, as reflected by the Verbs.
The point that the management model is different is true, but
irrelevant.
The only transport-related issue that matters is addressing, but for
user-space apps, it is completely abstracted by the rdmacm.

Non-rdmacm apps fall into 2 main categories:
1. IB management+diagnostics apps - these are irrelevant for an Eth
   network anyway.
2. High-performance middleware such as MPI and SHMEM - these perform
   optimizations according to the link protocol anyway.

So, all relevant apps will work great with either IB or RDMAoE, in a
transparent manner.
Re: RDMAoE verbs questions
Liran Liss wrote:
> from an rdmacm app's point of view - there is no visible difference
> between IB and RDMAoE ports: both support the complete set of Verbs,
> just as any IB transport provider

wrong, local (loopback) communication aren't supported with RDMAoE.

Or.
Re: RDMAoE verbs questions
On Tue, 1 Dec 2009, Eli Cohen wrote:

> On Mon, Nov 30, 2009 at 09:03:47AM -0500, Jeff Squyres wrote:
> > Per my prior question: is it expected that IBoE will function
> > *exactly* the same as real IB? The addition of the port attribute
> > seems to imply not.
>
> IBoE and IB should work exactly the same from the perspective of a
> user level application that makes use of rdmacm to create connections.
> Such apps can ignore the new attribute. And we believe this should
> cover the vast majority of apps. The new port attribute optionally
> allows the distinction for apps that need it (e.g. those that do not
> use the rdmacm, apps that have a reason to prefer one over the other
> when there is a choice, etc).

rdmacm only? I would expect syscall API compatibility?

For our use case this needs to work with multicast traffic.
Re: RDMAoE verbs questions
On Wed, 25 Nov 2009, Jason Gunthorpe wrote:

> If you have a single physical chip with two ports and they are running
> different protocols it seems much cleaner to me to report it to verbs
> apps as two devices.
>
> Doing this avoids creating compatibility problems.

Right. Mellanox has some limitation between how each port can be
configured. But that could also be checked with two independent devices.
Re: RDMAoE verbs questions
On Tue, Dec 01, 2009 at 06:22:06PM +0200, Liran Liss wrote:

> > Dealing with ABI compatibility is a different issue, this new scheme
> > is API incompatible due to the change in semantics for existing values.
>
> For rdmacm applications, there are no semantic changes between IB and
> RDMAoE.

So? There are substantial semantic differences for *all* non-rdmacm
applications. Even common ones like OpenMPI. You propose to ignore them?

> > Please look at my message regarding using multiple devices, perhaps
> > you can improve on that general idea.
>
> RDMAoE *is* IB transport over Ethernet - we don't want different devices
> with different node types exactly for this reason: applications
> shouldn't care if they are running over IB or RDMAoE, and shouldn't add
> another switch statement to support RDMAoE.

Nonsense. RDMAoE is no such thing, it is utterly incompatible with the
IB management model. It is some new protocol that is only about 90%
compatible with IB.

Apps using rdmacm shouldn't care one way or the other, apps that don't
*need* a new transport type.

Jason
RE: RDMAoE verbs questions
The issue is that from an rdmacm app's point of view - there is no
visible difference between IB and RDMAoE ports: both support the
complete set of Verbs, just as any IB transport provider.
Therefore, both ports should reside on a node with an IB transport type.

-----Original Message-----
From: Sean Hefty [mailto:sean.he...@intel.com]
Sent: Tuesday, December 01, 2009 6:27 PM
To: Liran Liss; Jason Gunthorpe
Cc: Eli Cohen; Jeff Squyres; linux-rdma@vger.kernel.org
Subject: RE: RDMAoE verbs questions

>RDMAoE *is* IB transport over Ethernet

RDMAoE carries something that looks a lot like IB L3 or an IPv6 header,
so it isn't exactly IB transport over Ethernet.
Re: RDMAoE verbs questions
On Mon, Nov 30, 2009 at 02:01:45PM -0600, Todd Rimmer wrote:
>
> If a given architecture rounds sizeof(transport) up to 16 bits or 32 bits,
> then the replacement field should be uint16_t or uint32_t respectively,
> otherwise existing binary applications which fetch transport will fetch
> additional undefined bytes which follow it in the new structure.
>
> The big question is whether all presently supported architectures use the
> same size for enum?
>
> I did a quick test program and on SLES10 x86_64 sizeof(an enum) is 32 bits.
> Hence uint8_t would break binary compatibility on that platform.
>

You probably refer to binary compatibility between different versions
of the RDMAoE libibverbs (e.g. the one with the enum and the one with
uint8_t). I think this is not a big issue, and recompiling should solve it.
RE: RDMAoE verbs questions
>RDMAoE *is* IB transport over Ethernet

RDMAoE carries something that looks a lot like IB L3 or an IPv6 header,
so it isn't exactly IB transport over Ethernet.
RE: RDMAoE verbs questions
> Dealing with ABI compatibility is a different issue, this new scheme
> is API incompatible due to the change in semantics for existing values.

For rdmacm applications, there are no semantic changes between IB and
RDMAoE.

> Please look at my message regarding using multiple devices, perhaps
> you can improve on that general idea.

RDMAoE *is* IB transport over Ethernet - we don't want different devices
with different node types exactly for this reason: applications
shouldn't care if they are running over IB or RDMAoE, and shouldn't add
another switch statement to support RDMAoE.

In addition, any existing rdmacm app can work transparently over RDMAoE
without recompilation or relinking.
Re: RDMAoE verbs questions
On Mon, Nov 30, 2009 at 10:50:02AM -0800, Roland Dreier wrote:
>
> I was thinking the same thing, although maybe a name like "link_layer"
> would be clearer?
>

Sure, that's clearer - I'll change that.
Re: RDMAoE verbs questions
On Mon, Nov 30, 2009 at 09:03:47AM -0500, Jeff Squyres wrote:
> Per my prior question: is it expected that IBoE will function
> *exactly* the same as real IB? The addition of the port attribute
> seems to imply not.

IBoE and IB should work exactly the same from the perspective of a
user level application that makes use of rdmacm to create connections.
Such apps can ignore the new attribute. And we believe this should
cover the vast majority of apps. The new port attribute optionally
allows the distinction for apps that need it (e.g. those that do not
use the rdmacm, apps that have a reason to prefer one over the other
when there is a choice, etc).

>
> Additionally, per Jason's question, why not simply expose this as an
> additional device? E.g., can you APM across a real IB port and an
> IBoE port on the same CX2? I'm guessing that they're *effectively*
> different devices; it might make sense to expose them as different
> virtual device_t's. Just my $0.02.

Yes. It should be possible to do APM across them. And with the ABI
compatible link-protocol attribute we see no reason to force the
isolation.
RE: RDMAoE verbs questions
> > diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> > index 07d4395..f7fe68d 100644
> > --- a/include/infiniband/verbs.h
> > +++ b/include/infiniband/verbs.h
> > @@ -192,7 +192,7 @@ struct ibv_port_attr {
> >          uint8_t                 active_width;
> >          uint8_t                 active_speed;
> >          uint8_t                 phys_state;
> > -        enum rdma_transport_type transport;
> > +        uint8_t                 transport;

If a given architecture rounds sizeof(transport) up to 16 bits or 32 bits,
then the replacement field should be uint16_t or uint32_t respectively,
otherwise existing binary applications which fetch transport will fetch
additional undefined bytes which follow it in the new structure.

The big question is whether all presently supported architectures use the
same size for enum?

I did a quick test program and on SLES10 x86_64 sizeof(an enum) is 32 bits.
Hence uint8_t would break binary compatibility on that platform.

Todd Rimmer
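A quick test along the lines Todd describes could look something like the sketch below. The structs are hypothetical, cut-down stand-ins for the two variants of `struct ibv_port_attr` (one ending in an enum, one ending in `uint8_t`), and the helper names are made up for illustration. The width of an enum is implementation-defined, so the result depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for enum rdma_transport_type. */
enum transport_type_sketch { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_RDMAOE };

/* Variant with the enum member, as in the earlier RDMAoE patches. */
struct attr_enum {
    uint32_t port_cap_flags;
    uint8_t  active_width;
    uint8_t  active_speed;
    uint8_t  phys_state;
    enum transport_type_sketch transport;
};

/* Variant with the proposed uint8_t member. */
struct attr_u8 {
    uint32_t port_cap_flags;
    uint8_t  active_width;
    uint8_t  active_speed;
    uint8_t  phys_state;
    uint8_t  transport;
};

/* uint8_t members sit back to back, so the new field reuses the next byte. */
static_assert(offsetof(struct attr_u8, transport) ==
              offsetof(struct attr_u8, phys_state) + 1,
              "uint8_t members are expected to pack without gaps");

/* Width of the enum on this compiler, in bytes. */
size_t enum_size(void)
{
    return sizeof(enum transport_type_sketch);
}

/* Nonzero only if both variants have the same size and field offset. */
int layouts_match(void)
{
    return sizeof(struct attr_enum) == sizeof(struct attr_u8) &&
           offsetof(struct attr_enum, transport) ==
           offsetof(struct attr_u8, transport);
}
```

Where the enum is wider than one byte, `layouts_match()` returns 0: a binary built against the enum variant reads the field at a different offset and width than one built against the `uint8_t` variant, which is the incompatibility being pointed out.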
Re: RDMAoE verbs questions
On Mon, Nov 30, 2009 at 10:50:02AM -0800, Roland Dreier wrote:
>
> > If we change struct ibv_port_attr transport field from enum to uint8,
> > we eliminate binary compatibility problems. That's because the previous
> > field is aligned to 16 bits address so that leaves us 16 bits more.
> >
> > diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> > index 07d4395..f7fe68d 100644
> > --- a/include/infiniband/verbs.h
> > +++ b/include/infiniband/verbs.h
> > @@ -192,7 +192,7 @@ struct ibv_port_attr {
> >          uint8_t                 active_width;
> >          uint8_t                 active_speed;
> >          uint8_t                 phys_state;
> > -        enum rdma_transport_type transport;
> > +        uint8_t                 transport;
> >  };
>
> Do all architectures round up the structure size? ie is this always
> going to preserve ABI?

Yes, every Linux arch aligns structs to the min alignment for the
members, so at least 32 in this case.

However, it doesn't really matter, look at ibv_cmd_query_port, it
doesn't zero the padding. So there must be an ABI bump to ensure that
new code links to a library that doesn't fill the new member with
garbage.

This is a messy one, the low level libraries have to be revved somehow too..
ops.query_port2() I guess.

Jason
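Both halves of Jason's answer (the struct size is rounded up, but the padding is never initialised) can be seen in a small sketch. The two structs are hypothetical, reduced layouts rather than the real `struct ibv_port_attr`, the helper names are invented, and the sizes noted assume a platform where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout before the change: 7 bytes of members. */
struct attr_before {
    uint32_t port_cap_flags;
    uint16_t lid;
    uint8_t  phys_state;
};

/* Hypothetical layout after the change: the new byte uses the tail padding. */
struct attr_after {
    uint32_t port_cap_flags;
    uint16_t lid;
    uint8_t  phys_state;
    uint8_t  link_layer;
};

/* Bytes of tail padding in the old layout. */
size_t tail_padding_before(void)
{
    return sizeof(struct attr_before) -
           (offsetof(struct attr_before, phys_state) + sizeof(uint8_t));
}

/* Nonzero if adding the member did not change the struct size. */
int same_size(void)
{
    return sizeof(struct attr_before) == sizeof(struct attr_after);
}

/* Simulate a caller whose buffer holds garbage and a library that only
 * fills the members it knew about before the change. */
uint8_t link_layer_after_old_fill(uint8_t garbage)
{
    struct attr_after a;

    memset(&a, garbage, sizeof a);  /* whatever was on the stack */
    a.port_cap_flags = 0;           /* old library writes these... */
    a.lid = 1;
    a.phys_state = 5;
    return a.link_layer;            /* ...and never touches this one */
}
```

The size check passes, so old binaries keep working, but `link_layer_after_old_fill()` hands back whatever byte the caller started with, which is why something has to guarantee the new member is initialised.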
Re: RDMAoE verbs questions
> If we change struct ibv_port_attr transport field from enum to uint8,
> we eliminate binary compatibility problems. That's because the previous
> field is aligned to 16 bits address so that leaves us 16 bits more.
>
> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> index 07d4395..f7fe68d 100644
> --- a/include/infiniband/verbs.h
> +++ b/include/infiniband/verbs.h
> @@ -192,7 +192,7 @@ struct ibv_port_attr {
>          uint8_t                 active_width;
>          uint8_t                 active_speed;
>          uint8_t                 phys_state;
> -        enum rdma_transport_type transport;
> +        uint8_t                 transport;
>  };

Do all architectures round up the structure size? ie is this always
going to preserve ABI?

> Moreover, I would like to change the field's name from transport to
> link_protocol. Let me know if that makes more sense to you.

I was thinking the same thing, although maybe a name like "link_layer"
would be clearer?

 - R.
Re: RDMAoE verbs questions
On Mon, Nov 30, 2009 at 03:34:06PM +0200, Eli Cohen wrote:

> If we change struct ibv_port_attr transport field from enum to uint8,
> we eliminate binary compatibility problems. That's because the previous
> field is aligned to 16 bits address so that leaves us 16 bits more.

Dealing with ABI compatibility is a different issue, this new scheme
is API incompatible due to the change in semantics for existing values.

Please look at my message regarding using multiple devices, perhaps
you can improve on that general idea.

Jason
Re: RDMAoE verbs questions
Per my prior question: is it expected that IBoE will function
*exactly* the same as real IB? The addition of the port attribute
seems to imply not.

Additionally, per Jason's question, why not simply expose this as an
additional device? E.g., can you APM across a real IB port and an
IBoE port on the same CX2? I'm guessing that they're *effectively*
different devices; it might make sense to expose them as different
virtual device_t's. Just my $0.02.

On Nov 30, 2009, at 8:34 AM, Eli Cohen wrote:

> On Tue, Nov 24, 2009 at 05:11:36PM -0700, Jason Gunthorpe wrote:
> > On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:
> >
> > > 2. I am somewhat confused by the overloading of the term "transport".
> > > It appears that a device will have
> > > ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE
> > > devices. The only way to tell the difference is to examine the new
> > > ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or
> > > RDMA_TRANSPORT_RDMAOE.
> >
> > I haven't seen these patches but this seems poor to me. I think any
> > app that isn't using rdmacm will need patching and support for RDMAOE
> > (certainly all mine will). libibverbs shouldn't overload the existing
> > transport_type checks for something that is not 100% compatible with
> > IB.
> >
> > Is the same true for openmpi? If you try to run it as is on a RDMAOE
> > interface will it work? If not I think that alone should kill this
> > idea..
> >
>
> If we change struct ibv_port_attr transport field from enum to uint8,
> we eliminate binary compatibility problems. That's because the previous
> field is aligned to 16 bits address so that leaves us 16 bits more.
>
> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> index 07d4395..f7fe68d 100644
> --- a/include/infiniband/verbs.h
> +++ b/include/infiniband/verbs.h
> @@ -192,7 +192,7 @@ struct ibv_port_attr {
>          uint8_t                 active_width;
>          uint8_t                 active_speed;
>          uint8_t                 phys_state;
> -        enum rdma_transport_type transport;
> +        uint8_t                 transport;
>  };
>
> Moreover, I would like to change the field's name from transport to
> link_protocol. Let me know if that makes more sense to you.

--
Jeff Squyres
jsquy...@cisco.com
Re: RDMAoE verbs questions
Sorry -- I replied from my PDA last week but the list rejected the mail.

All I have is what was sent to the OMPI list (although I see Pasha
attached the patch in a later mail on this thread). Note that we (OMPI)
tend to operate a bit differently than OpenFabrics -- we don't typically
send patches to lists, we don't typically use git, etc.

On Nov 25, 2009, at 9:59 AM, Or Gerlitz wrote:

> Jeff Squyres wrote:
> > Here's one thread:
> > http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
>
> Jeff, looking on the threads you have sent, I didn't find a way to
> download the patch in a form which can be applied on a source tree, is
> there a way to do it through this archive? are these patches available
> from some git tree @mellanox or elsewhere? does anyone have the email
> address of Vasily Philipov (/vasily_at_[hidden]/), if yes, can you or
> Pasha please ask him to send me or better, this list the proposed
> patch, many thanks.
>
> Or

--
Jeff Squyres
jsquy...@cisco.com
Re: RDMAoE verbs questions
On Tue, Nov 24, 2009 at 05:11:36PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:
>
> > 2. I am somewhat confused by the overloading of the term "transport".
> > It appears that a device will have
> > ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE
> > devices. The only way to tell the difference is to examine the new
> > ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or
> > RDMA_TRANSPORT_RDMAOE.
>
> I haven't seen these patches but this seems poor to me. I think any
> app that isn't using rdmacm will need patching and support for RDMAOE
> (certainly all mine will). libibverbs shouldn't overload the existing
> transport_type checks for something that is not 100% compatible with
> IB.
>
> Is the same true for openmpi? If you try to run it as is on a RDMAOE
> interface will it work? If not I think that alone should kill this
> idea..
>

If we change struct ibv_port_attr transport field from enum to uint8,
we eliminate binary compatibility problems. That's because the previous
field is aligned to 16 bits address so that leaves us 16 bits more.

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 07d4395..f7fe68d 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -192,7 +192,7 @@ struct ibv_port_attr {
         uint8_t                 active_width;
         uint8_t                 active_speed;
         uint8_t                 phys_state;
-        enum rdma_transport_type transport;
+        uint8_t                 transport;
 };

Moreover, I would like to change the field's name from transport to
link_protocol. Let me know if that makes more sense to you.
Re: RDMAoE verbs questions
Pavel Shamis (Pasha) wrote:
> The only reason for these changes is the fact that for IB devices we
> prefer to use our own Open MPI connection managers. If we decide to
> use RDMA-CM for all devices, the number of changes will be zero...

whatever, currently, this change is still there, and best if you remove
it and find another way to set this predicate.

> So we decided to use the current OMPI code as is; in future maybe we
> will implement our own OMPI rdmacm code that will not have all these
> workaround flows.

just to make sure I am with you, all in all, only one patch is proposed
to ompi for rdmaoe support and it is the patch which we discuss above.
This patch does three things:

1. changes BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB to look at the port
   transport type
2. if the port transport is rdmaoe, don't run loopback connections on IB
3. some change in the qp destroy logic
4. that's it... correct?

can you comment on #2? why aren't loopback connections supported?

Or.
Re: RDMAoE verbs questions
Or Gerlitz wrote:
> Pavel Shamis (Pasha) wrote:
>> The patch is attached
> Thanks, this patch basically replaces checks for the device transport
> type to be IB to a check that makes sure either the former happens or
> the port transport type is rdmaoe. As Jason, Tziporet and noted, the
> port transport type seems to be bad and non-compatible/operable idea,
> so it should and probably could be avoided.

The only reason for these changes is the fact that for IB devices we
prefer to use our own Open MPI connection managers. If we decide to use
RDMA-CM for all devices, the number of changes will be zero...

> I see another patch @
> http://www.open-mpi.org/community/lists/devel/2009/11/7063.php can you
> send that one as well. The you sent patch isn't signed so I can't
> address the author in further replies (unless you are the author),
> also it wasn't generated with the -p option of diff which would show
> for each change what is the effected function, doing so would help in
> the review.

Ohh. It was our first patch. Some iWARP devices have the "first packet"
issue, so the RDMACM code in OMPI has a bunch of workaround code that
tries to resolve the issue at the MPI level. We don't have such problems
with IB, so we tried to disable the workaround flow for IB devices, but
it did not work well. So we decided to use the current OMPI code as is;
in the future maybe we will implement our own OMPI rdmacm code that will
not have all these workaround flows.

Regards,
Pasha
Re: RDMAoE verbs questions
Pavel Shamis (Pasha) wrote:
> The patch is attached

Thanks. This patch basically replaces checks that the device transport
type is IB with a check that either the former holds or the port
transport type is rdmaoe. As Jason, Tziporet and noted, the port
transport type seems to be a bad and non-compatible/operable idea, so it
should and probably could be avoided.

I see another patch @
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Can you send that one as well?

The patch you sent isn't signed, so I can't address the author in
further replies (unless you are the author). Also, it wasn't generated
with the -p option of diff, which would show the affected function for
each change; doing so would help in the review.

Or.
Re: RDMAoE verbs questions
Or,

The patch is attached.

Regards,
Pasha.

Or Gerlitz wrote:
> Jeff Squyres wrote:
>> Here's one thread:
>> http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
>
> Jeff, looking on the threads you have sent, I didn't find a way to
> download the patch in a form which can be applied on a source tree, is
> there a way to do it through this archive? are these patches available
> from some git tree @mellanox or elsewhere? does anyone have the email
> address of Vasily Philipov (/vasily_at_[hidden]/), if yes, can you op
> Pasha please ask him to send me or better, this list the proposed
> patch, many thanks.
>
> Or

diff -r 16b0d6d73529 ompi/config/ompi_check_openib.m4
--- a/ompi/config/ompi_check_openib.m4 Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/config/ompi_check_openib.m4 Sun Nov 15 14:58:37 2009 +0200
@@ -13,7 +13,7 @@
 # Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved.
 # Copyright (c) 2006-2007 Los Alamos National Security, LLC. All rights
 # reserved.
-# Copyright (c) 2006-2008 Mellanox Technologies. All rights reserved.
+# Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -204,6 +204,21 @@
 [$1_have_ibcm=1
 $1_LIBS="-libcm $$1_LIBS"])])
 fi
+
+ # Check support for RDMAoE devices
+ $1_have_rdmaoe=0
+ AC_CHECK_DECLS([RDMA_TRANSPORT_RDMAOE],
+ [$1_have_rdmaoe=1], [],
+ [#include ])
+
+ AC_MSG_CHECKING([if RDMAoE support is enabled])
+ if test "1" = "$$1_have_rdmaoe"; then
+AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support])
+AC_MSG_RESULT([yes])
+ else
+AC_MSG_RESULT([no])
+ fi
+
 ])

 # Check to see if works.
It is known to
diff -r 16b0d6d73529 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c Sun Nov 15 14:58:37 2009 +0200
@@ -354,6 +354,13 @@
 }
 #endif

+#ifdef OMPI_HAVE_RDMAOE
+if(RDMA_TRANSPORT_RDMAOE == (openib_btl->ib_port_attr.transport) &&
+OPAL_PROC_ON_LOCAL_NODE(ompi_proc->proc_flags)) {
+continue;
+}
+#endif
+
 if(NULL == (ib_proc = mca_btl_openib_proc_create(ompi_proc))) {
 return OMPI_ERR_OUT_OF_RESOURCE;
 }
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/base.h
--- a/ompi/mca/btl/openib/connect/base.h Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/base.h Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,7 @@
 /*
 * Copyright (c) 2007-2008 Cisco Systems, Inc. All rights reserved.
 *
+ * Copyright (c) 2009 Mellanox Technologies. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -13,6 +14,17 @@

 #include "connect/connect.h"

+#ifdef OMPI_HAVE_RDMAOE
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
+(((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
+(RDMA_TRANSPORT_RDMAOE == ((btl)->ib_port_attr.transport))) ? \
+true : false)
+#else
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
+((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ? \
+true : false)
+#endif
+
 BEGIN_C_DECLS

 /*
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,6 @@
 /*
 * Copyright (c) 2007-2009 Cisco Systems, Inc. All rights reserved.
- * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All rights reserved.
 *
 * $COPYRIGHT$
 *
@@ -653,7 +653,7 @@
 we're in an old version of OFED that is IB only (i.e., no
 iWarp), so we can safely assume that we can use this CPC. */
 #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
-if (IBV_TRANSPORT_IB != btl->device->ib_dev->transport_type) {
+if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
 BTL_VERBOSE(("ibcm CPC only supported on InfiniBand; skipped on %s:%d",
 ibv_get_device_name(btl->device->ib_dev),
 openib_btl->port_num));
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c Sun Nov 15 14:58:37 2009 +0200
@@ -12,7
Re: Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 11:53:47AM +0200, Pavel Shamis (Pasha) wrote:
>
> >I think I'm asking you about the non RDMACM stuff in openmpi, ibcm,
> >xoob, etc. I can't tell at glance if any of them will be safe to run
> >on RDMAoE as-is..
> >
> The oob and xoob are custom ompi mpi connection manager that were
> created specially for Infiniband ONLY. So as result they do what they
> supposed to do - they work with infiniband devices :-)
> (and do not work with Iwarp and others.). So we do not break anything
> here.

Just so everyone is clear here, the test to run only on Infiniband in
the above cases is done by checking the transport_type for Infiniband:

  if (IBV_TRANSPORT_IB != btl->device->ib_dev->transport_type) {

The code protected by this doesn't work on RDMAoE so RDMAoE must not
set transport_type to IBV_TRANSPORT_IB.

Jason
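To make the concern concrete, here is a minimal, self-contained sketch
of the two kinds of check. It does not use libibverbs: the enums, the
struct and both functions are made-up stand-ins for
ibv_device.transport_type, the proposed per-port field and the Open MPI
predicate, so treat it as an illustration of the logic only.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up stand-ins for the values discussed in the thread; the real
 * definitions live in <infiniband/verbs.h> and the proposed patches. */
enum model_dev_transport { MODEL_DEV_IB, MODEL_DEV_IWARP };
enum model_port_link { MODEL_PORT_IB, MODEL_PORT_RDMAOE };

struct model_port {
    enum model_dev_transport dev_transport; /* per-device transport_type */
    enum model_port_link     port_link;     /* proposed per-port field */
};

/* The check existing applications already contain: it looks at the
 * device transport only. */
static bool legacy_treats_as_ib(const struct model_port *p)
{
    return p->dev_transport == MODEL_DEV_IB;
}

/* A check that also consults the per-port field; this is the logical
 * negation of the patched BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB. */
static bool patched_treats_as_ib(const struct model_port *p)
{
    return p->dev_transport == MODEL_DEV_IB &&
           p->port_link != MODEL_PORT_RDMAOE;
}
```

With an RDMAoE port that still reports an IB device transport, the
legacy check answers "IB" while the patched one does not. Unpatched
applications only ever run the legacy check, which is the compatibility
problem described above.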
Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 04:41:08PM +0200, Eli Cohen wrote:
> On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
> >
> > In practice, we have seen that applications *do* need to query the
> > transport type -- at least (real) IB vs. iWARP. It is your
> > expectation that IB and IBoE will function identically?
> >
> > Can you discuss the "transport" vs. "transport_type" questions?
> >
>
> The reason for identifying each specific port with its own transport
> is to allow devices which may configure each port differently to be
> distinguishable. ConnectX is one such device.

As far as I can tell there is no reason for a multi-port device to be
represented through verbs as a single device with multiple protocols.

If you have a single physical chip with two ports and they are running
different protocols it seems much cleaner to me to report it to verbs
apps as two devices.

Doing this avoids creating compatibility problems.

Jason
Re: RDMAoE verbs questions
Jeff Squyres wrote:
> Here's one thread:
> http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

Jeff, looking at the threads you sent, I didn't find a way to download
the patch in a form that can be applied to a source tree. Is there a way
to do it through this archive? Are these patches available from some git
tree @mellanox or elsewhere?

Does anyone have the email address of Vasily Philipov
(/vasily_at_[hidden]/)? If yes, can you or Pasha please ask him to send
the proposed patch to me or, better, to this list? Many thanks.

Or
Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
>
> In practice, we have seen that applications *do* need to query the
> transport type -- at least (real) IB vs. iWARP. It is your
> expectation that IB and IBoE will function identically?
>
> Can you discuss the "transport" vs. "transport_type" questions?
>

The reason for identifying each specific port with its own transport
is to allow devices which may configure each port differently to be
distinguishable. ConnectX is one such device.
Re: RDMAoE verbs questions
On Nov 25, 2009, at 9:04 AM, Tziporet Koren wrote:

> Note that application does not need to query the transport type, but
> we thought it can be good to know also from debug perspective. Thus I
> think sysfs is the best place.

In practice, we have seen that applications *do* need to query the
transport type -- at least (real) IB vs. iWARP. Is it your expectation
that IB and IBoE will function identically?

Can you discuss the "transport" vs. "transport_type" questions?

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
Re: RDMAoE verbs questions
Jason Gunthorpe wrote:
> On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:
>> 2. I am somewhat confused by the overloading of the term "transport".
>> It appears that a device will have
>> ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE
>> devices. The only way to tell the difference is to examine the new
>> ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or
>> RDMA_TRANSPORT_RDMAOE.
>
> I haven't seen these patches but this seems poor to me. I think any
> app that isn't using rdmacm will need patching and support for RDMAOE
> (certainly all mine will). libibverbs shouldn't overload the existing
> transport_type checks for something that is not 100% compatible with
> IB.

Good catch - I agree that the ABI should be 100% backward compatible,
and we will fix this. We can add a sysfs option to query the transport
type, or add another verb.

Note that applications do not need to query the transport type, but we
thought it would also be good to know from a debug perspective. Thus I
think sysfs is the best place.

Opinions?

Tziporet
Re: RDMAoE verbs questions
On Nov 24, 2009, at 11:52 PM, Jason Gunthorpe wrote:

> > OMPI uses RDMACM (among others), so I'm not sure I follow what you're
> > asking me...?
>
> I think I'm asking you about the non RDMACM stuff in openmpi, ibcm,
> xoob, etc. I can't tell at glance if any of them will be safe to run
> on RDMAoE as-is..

Wait, I think I might have been mistaken. I'm looking through the
patches this morning and I don't see the "don't allow host-loopback if
it's IBoE" logic. The only places I see the check for real IB vs. IBoE
is when deciding to use IBCM or OOB connection schemes (which, as Pasha
said, are designed to be [real] IB only).

But, as you mentioned, there definitely are apps that don't use RDMACM
and use an "out of band" (i.e., OOB) mechanism for making IB QP's. They
therefore might have similar issues (need to check for real IB vs.
IBoE).

Sorry for the confusion... I'm going to chalk it up to the fact that it
was late at night when I sent that. :-)

-- 
Jeff Squyres
jsquy...@cisco.com
Re: RDMAoE verbs questions
On Nov 25, 2009, at 2:25 AM, Or Gerlitz wrote:

> > I was reviewing Mellanox's Open MPI patches for RDMAoE support
>
> Can you send us point to the patch series (mail thread or some
> repository where they sit)?

Here's one thread:

http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

the latest patch in that thread is here:

http://www.open-mpi.org/community/lists/devel/2009/11/7119.php

Here's another thread with a slightly different topic, but with
elements of IBoE support in it:

http://www.open-mpi.org/community/lists/devel/2009/11/7120.php

> > 1. It looks like there is a new field on the ibv_port_attr struct:
> > transport. Is it expected that all device drivers will start filling
> > in this value, or is it done in the OF core code somewhere?
>
> Please note that this field isn't present in the distro provided IB
> stack and hence it is highly recommended to avoid referring it in
> your code,

FWIW: we have configure tests checking for this field (just like we
have configure tests checking for transport_type, because that wasn't
always there, either).

However, it is a little disturbing that, based on this conversation,
that field name may change, and therefore we'll have to add *more*
configure logic to figure out what exact field to check. The same is
true for all the IBoE code -- since none of that code has been approved
yet, it's risky to base any code off it. :-\

> as least some of us (...) are for decoupling ompi from ofed, so lets
> not put sticks in the wheels of that process.

Hear hear (let's remove MPI from OFED! :-) ). But I think that this is
a separate issue.

-- 
Jeff Squyres
jsquy...@cisco.com
Re: Re: RDMAoE verbs questions
> I think I'm asking you about the non RDMACM stuff in openmpi, ibcm,
> xoob, etc. I can't tell at glance if any of them will be safe to run
> on RDMAoE as-is..

The oob and xoob are custom OMPI connection managers that were created
specifically for InfiniBand ONLY. As a result they do what they are
supposed to do - they work with InfiniBand devices :-) (and do not work
with iWARP and others). So we do not break anything here.

Regards,
Pasha.
Re: RDMAoE verbs questions
Jeff Squyres wrote:
> I was reviewing Mellanox's Open MPI patches for RDMAoE support

Hi Jeff,

Can you point us to the patch series (mail thread or some repository
where they sit)?

> 1. It looks like there is a new field on the ibv_port_attr struct:
> transport. Is it expected that all device drivers will start filling
> in this value, or is it done in the OF core code somewhere?

Please note that this field isn't present in the distro-provided IB
stack, and hence it is highly recommended to avoid referring to it in
your code. At least some of us (...) are for decoupling ompi from ofed,
so let's not put sticks in the wheels of that process.

> the Open MPI RDMAOE patch implies that host loopback is not supported
> in RDMAOE mode (but it is in IB mode). To be clear, the OMPI code had
> to do something different for real IB vs. RDMAOE in at least 1 or 2
> places

Liran, where does this limitation come from? Doesn't the HCA support
bridging (loopback connections) for RDMAoE? If this is the case, maybe
you should add a device capability to mark that.

Or.
Re: RDMAoE verbs questions
On Tue, Nov 24, 2009 at 09:12:53PM -0500, Jeff Squyres wrote:
> On Nov 24, 2009, at 7:11 PM, Jason Gunthorpe wrote:
>
> >Is the same true for openmpi? If you try to run it as is on a RDMAOE
> >interface will it work? If not I think that alone should kill this
> >idea..
> >
> OMPI uses RDMACM (among others), so I'm not sure I follow what you're
> asking me...?

I think I'm asking you about the non RDMACM stuff in openmpi, ibcm,
xoob, etc. I can't tell at glance if any of them will be safe to run
on RDMAoE as-is..

At least it looks like there are some basic problems, like oob doesn't
exchange a GID, or setup the AH to use a GRH.

Basically, if setting transport_type == IB for RDMAoE breaks existing
stuff then it probably isn't a good strategy.

Jason
Re: RDMAoE verbs questions
On Nov 24, 2009, at 7:11 PM, Jason Gunthorpe wrote:

> Is the same true for openmpi? If you try to run it as is on a RDMAOE
> interface will it work? If not I think that alone should kill this
> idea..

OMPI uses RDMACM (among others), so I'm not sure I follow what you're
asking me...?

The check that I was referring to is the one OMPI does when checking
reachability (before it makes QPs). If the peer proc is on the same
server and it's real IB, OMPI concludes "yes, this works". If the peer
proc is on the same server and it's IBoE, OMPI concludes "no, this
won't work."

But I'm not actually sure that's a 100% standards compliant test. We
have a long-standing bug ticket open to change this test to actually
try to open a QP to the same host and see if it works rather than
relying on the transport_type. However, all [real] IB devices that we
run with seem to obey this semantic. So there's at least historical
precedent...? (that might be a weak argument)

-- 
Jeff Squyres
jsquy...@cisco.com
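The two ways of answering the same-host reachability question can be
contrasted in a short sketch. Everything here is hypothetical: the enum,
the function names and the probe callback are invented for illustration,
and the stub probe merely stands in for a real attempt to create and
connect a QP to the local host, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up link types standing in for "real IB" and "IBoE". */
enum model_link { MODEL_LINK_IB, MODEL_LINK_IBOE };

/* Hypothetical probe: returns true if a loopback connection to the
 * same host could actually be established. */
typedef bool (*loopback_probe_fn)(void *ctx);

/* Current heuristic as described above: a same-host peer is assumed
 * reachable on real IB and assumed unreachable on IBoE. */
static bool same_host_ok_by_link_type(enum model_link link)
{
    return link == MODEL_LINK_IB;
}

/* Alternative from the bug ticket: do not infer anything from the link
 * type, just try it and report what happened. */
static bool same_host_ok_by_probe(loopback_probe_fn probe, void *ctx)
{
    return probe(ctx);
}

/* Stub probe for demonstration: reads the outcome from a flag instead
 * of talking to hardware. */
static bool stub_probe_from_flag(void *ctx)
{
    return *(const bool *)ctx;
}
```

The probe variant keeps the decision independent of any transport or
link field, at the cost of doing real work on the device before the
answer is known.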
Re: RDMAoE verbs questions
On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:

> 2. I am somewhat confused by the overloading of the term "transport".
> It appears that a device will have
> ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE
> devices. The only way to tell the difference is to examine the new
> ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or
> RDMA_TRANSPORT_RDMAOE.

I haven't seen these patches but this seems poor to me. I think any
app that isn't using rdmacm will need patching and support for RDMAOE
(certainly all mine will). libibverbs shouldn't overload the existing
transport_type checks for something that is not 100% compatible with
IB.

Is the same true for openmpi? If you try to run it as is on a RDMAOE
interface will it work? If not I think that alone should kill this
idea..

Jason