Re: RDMAoE verbs questions

2009-12-09 Thread Roland Dreier

  It looks good to me. Thanks, I will take it for RDMAoE.

Great... as Jason suggested, please also add in the appropriate reserved
fields to pad the struct to a 32 bit boundary and zero them in the
wrapper.  So if there is a next time we don't have this problem again.
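A minimal sketch of what Roland is asking for. It assumes a trimmed-down, hypothetical stand-in for the tail of struct ibv_port_attr; port_attr_sketch, query_port_sketch and old_provider are illustrative names, not libibverbs API. The new byte-sized field plus explicit reserved bytes bring the struct to a 32-bit boundary, and the library-side wrapper zeroes the whole struct before the provider fills it. Anything an older provider does not set therefore reads back as 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the tail of struct ibv_port_attr. */
struct port_attr_sketch {
	uint32_t port_cap_flags;
	uint16_t lid;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
	uint8_t  link_layer;   /* new field, taken from former padding */
	uint8_t  reserved[2];  /* explicit pad to a 32-bit boundary */
};

/* The named members now cover every byte: no implicit padding is left. */
_Static_assert(sizeof(struct port_attr_sketch) == 12,
	       "expected no implicit padding");

/* Hypothetical provider hook, standing in for a driver's query_port. */
typedef int (*provider_query_port_fn)(struct port_attr_sketch *attr);

/* Library-side wrapper: zero everything first, so link_layer and
 * reserved[] are 0 ("unspecified") when an older provider skips them. */
int query_port_sketch(provider_query_port_fn provider,
		      struct port_attr_sketch *attr)
{
	memset(attr, 0, sizeof(*attr));
	return provider(attr);
}

/* Example provider written before link_layer existed. */
int old_provider(struct port_attr_sketch *attr)
{
	attr->lid = 1;
	attr->active_width = 2;
	attr->active_speed = 1;
	attr->phys_state = 5;
	return 0;
}
```

Zeroing in the wrapper, rather than relying on each provider, means the guarantee holds no matter which driver library is loaded.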

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMAoE verbs questions

2009-12-09 Thread Jason Gunthorpe
On Wed, Dec 09, 2009 at 11:06:41AM -0800, Roland Dreier wrote:
 
   It looks good to me. Thanks, I will take it for RDMAoE.
 
 Great... as Jason suggested, please also add in the appropriate reserved
 fields to pad the struct to a 32 bit boundary and zero them in the
 wrapper.  So if there is a next time we don't have this problem again.

Also, if we really must do this, can you please send a patch to Roland
that at least adds the constants for IB, ASAP. Ideally to be included
in OFED 1.5.

At least that way people can make updates to check it prior to RDMAoE
actually appearing..

Jason


Re: RDMAoE verbs questions

2009-12-09 Thread Eli Cohen
On Wed, Dec 09, 2009 at 02:48:43PM -0700, Jason Gunthorpe wrote:
 
 Also, if we really must do this, can you please send a patch to Roland
 that at least adds the constants for IB, ASAP. Ideally to be included
 in OFED 1.5.
 
 At least that way people can make updates to check it prior to RDMAoE
 actually appearing..
 

Sure, I'll send the patch tomorrow and it will be in the next
OFED-RDMAoE build too.


Re: RDMAoE verbs questions

2009-12-04 Thread Jason Gunthorpe
On Fri, Dec 04, 2009 at 08:03:31PM -0800, Roland Dreier wrote:

 then I think legacy apps should be OK (port_attr size doesn't change,
 binary compat is still there), and new apps that do check link_layer
 should also be OK ... if they use an old library and/or old driver,
 they'll see LINK_LAYER_UNSPECIFIED, which means that IBoE is not supported.
 
 What do you think, does this work?

Yeah, that should be fine, quite sneaky indeed..

Maybe zero the whole padding though, for future?
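A sketch of how a new application might consume such a field. It assumes hypothetical constants in which 0 means "unspecified"; the names are illustrative, and only the UNSPECIFIED meaning comes from the thread. Because an old library or driver leaves the byte zeroed, the application has to read 0 as "IBoE not supported", i.e. plain IB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-layer values; 0 must mean "unspecified" so that
 * zeroed padding from an old library is interpreted correctly. */
enum link_layer_sketch {
	LINK_LAYER_UNSPECIFIED_SKETCH = 0,
	LINK_LAYER_INFINIBAND_SKETCH  = 1,
	LINK_LAYER_ETHERNET_SKETCH    = 2,
};

/* Returns true when a non-rdmacm application may assume native IB
 * (SM/SA, LID addressing, MADs). An old library/driver reports
 * UNSPECIFIED, which cannot be IBoE, so it is treated as IB. */
bool port_is_native_ib(uint8_t link_layer)
{
	switch (link_layer) {
	case LINK_LAYER_UNSPECIFIED_SKETCH:
	case LINK_LAYER_INFINIBAND_SKETCH:
		return true;
	default:
		/* Ethernet, or any value this application does not know. */
		return false;
	}
}
```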

Jason


RE: RDMAoE verbs questions

2009-12-03 Thread Liran Liss
 Existing apps rely on transport_type == IBV_TRANSPORT_IB to indicate
IB management is present. There are many examples of this.
 The art of API compatibility is to not break existing old apps, so you
don't get to change the meaning of transport_type == IBV_TRANSPORT_IB to
mean 'it is only IB verbs like'. That breaks the API.
 Adding a new field to port_attr preserves functionality but not
compatibility. I hope you understand the difference.

I understand exactly what you mean, but I want to propose another way of
looking at the compatibility issue:

IB management is a network service. Just as an administrator might
mistakenly try to access an FTP server over a wrong eth interface, an IB
admin can mistakenly run an IB management application (or a non-rdmacm
app that is incompatible with RDMAoE) on an RDMAoE port. In both cases,
the service is unreachable, but otherwise no harm is done.
Basically, IB management is supported by RDMAoE --- you use the same
verbs to access it but you don't get any answers...
(Of course, in terms of implementation, we can choose to drop SMPs
or non-CM MADs rather than sending them on the wire.)
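The "drop rather than send" option Liran mentions could look roughly like the filter below. This is a sketch of one possible policy, not the actual driver code: it keeps only Communication Management MADs and drops SMPs and every other class. The management class values (0x01 LID-routed SM, 0x81 directed-route SM, 0x03 SA, 0x07 CM) are from the InfiniBand Architecture specification; rdmaoe_should_send_mad is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MAD management classes from the InfiniBand Architecture spec. */
enum {
	MGMT_CLASS_SUBN_LID_ROUTED     = 0x01,
	MGMT_CLASS_SUBN_ADM            = 0x03,
	MGMT_CLASS_CM                  = 0x07,
	MGMT_CLASS_SUBN_DIRECTED_ROUTE = 0x81,
};

/* Hypothetical egress policy for an RDMAoE port: only CM MADs are put
 * on the wire; SMPs and all other management classes are dropped. */
bool rdmaoe_should_send_mad(uint8_t mgmt_class)
{
	return mgmt_class == MGMT_CLASS_CM;
}
```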

There is a tradeoff here between transparently supporting existing
rdmacm apps versus making sure that non-rdmacm apps that may come across
an RDMAoE port do not attempt to use it. 
Given the fact that rdmacm has become a preferred approach, and that for
non-rdmacm apps the worst case effect is that of being unable to create
a connection through a port that they were not designed to support to
begin with, we prefer the approach that we proposed.


RE: RDMAoE verbs questions

2009-12-02 Thread Liran Liss

 So? There are substantial semantic differences for *all* non-rdmacm
applications. Even common ones like OpenMPI. You propose to ignore them?

On the contrary! Any application that *does* care what the link layer is
can look up a new field in port_attr (rather than a new node transport
type).
Applications that don't, both old and new, can continue as normal - no
changes to the code are required.

 RDMAoE *is* IB transport over Ethernet - we don't want different
 devices with different node types exactly for this reason:
 applications shouldn't care if they are running over IB or RDMAoE, and
 shouldn't add another switch statement to support RDMAoE.

 Nonsense. RDMAoE is no such thing, it is utterly incompatible with the
IB management model. It is some new protocol that is only about 90%
compatible with IB.

You are missing the point here - RDMAoE is 100% compatible with IB at
the *transport* level, as reflected by the Verbs.
The point that the management model is different is true, but
irrelevant. The only transport-related issue that matters is addressing,
but for user-space apps, it is completely abstracted by the rdmacm.

Non-rdmacm apps fall into 2 main categories:
1. IB management+diagnostics apps - these are irrelevant for an Eth
network anyway.
2. High-performance middleware such as MPI and SHMEM - these perform
optimizations according to the link protocol anyway.

So, all relevant apps will work great with either IB or RDMAoE, in a
transparent manner.


RE: RDMAoE verbs questions

2009-12-02 Thread Liran Liss
Hi Paul, you are not missing anything - loopback communication will work
in RDMAoE just as in IB.
--Liran

-----Original Message-----
From: Paul Grun [mailto:pg...@systemfabricworks.com] 
Sent: Wednesday, December 02, 2009 10:55 AM
To: 'Or Gerlitz'; Liran Liss
Cc: 'Sean Hefty'; 'Jason Gunthorpe'; 'Eli Cohen'; 'Jeff Squyres';
linux-rdma@vger.kernel.org
Subject: RE: RDMAoE verbs questions

Why do you say that Or?  
I'm a hardware guy so can't comment on what the s/w supports/prevents,
but I can see no reason why loopback wouldn't be supported.  In fact, if
I were guessing, I would guess that the spec currently under development
would support loopback. What am I missing?
-Paul

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
Sent: Wednesday, December 02, 2009 12:09 AM
To: Liran Liss
Cc: Sean Hefty; Jason Gunthorpe; Eli Cohen; Jeff Squyres;
linux-rdma@vger.kernel.org
Subject: Re: RDMAoE verbs questions

Liran Liss wrote:
 from an rdmacm app's point of view - there is no visible difference
between IB and RDMAoE ports: both support the complete set of Verbs,
just as any IB transport provider
   
Wrong, local (loopback) communication isn't supported with RDMAoE.

Or.


Re: RDMAoE verbs questions

2009-12-02 Thread Or Gerlitz

Paul Grun wrote:
Why do you say that Or? 
I said that because the latest patch set posted by Mellanox doesn't support
loopback. I now hear that this was a temporary limitation which will be
removed, so let it be.


Or.


Re: RDMAoE verbs questions

2009-12-01 Thread Eli Cohen
On Mon, Nov 30, 2009 at 09:03:47AM -0500, Jeff Squyres wrote:
 Per my prior question: is it expected that IBoE will function
 *exactly* the same as real IB?  The addition of the port attribute
 seems to imply not.

IBoE and IB should work exactly the same from the perspective of a
user level application that makes use of rdmacm to create connections.
Such apps can ignore the new attribute. And we believe this should
cover the vast majority of apps.  The new port attribute optionally
allows the distinction for apps that need it (e.g. those that do not
use the rdmacm, apps that have a reason to prefer one over the other
when there is a choice, etc).

 
 Additionally, per Jason's question, why not simply expose this as an
 additional device?  E.g., can you APM across a real IB port and an
 IBoE port on the same CX2?  I'm guessing that they're *effectively*
 different devices; it might make sense to expose them as different
 virtual device_t's.  Just my $0.02.

Yes. It should be possible to do APM across them. And with the ABI
compatible link-protocol attribute we see no reason to force the
isolation.


Re: RDMAoE verbs questions

2009-12-01 Thread Eli Cohen
On Mon, Nov 30, 2009 at 10:50:02AM -0800, Roland Dreier wrote:
 
 I was thinking the same thing, although maybe a name like link_layer
 would be clearer?
 

Sure, that's clearer - I'll change that.


RE: RDMAoE verbs questions

2009-12-01 Thread Sean Hefty
RDMAoE *is* IB transport over Ethernet

RDMAoE carries something that looks a lot like IB L3 or an IPv6 header, so it
isn't exactly IB transport over Ethernet.



Re: RDMAoE verbs questions

2009-12-01 Thread Eli Cohen
On Mon, Nov 30, 2009 at 02:01:45PM -0600, Todd Rimmer wrote:
 
 If a given architecture rounds sizeof(transport) up to 16 bits or 32 bits, 
 then the replacement field should be uint16_t or uint32_t respectively, 
 otherwise existing binary applications which fetch transport will fetch 
 additional undefined bytes which follow it in the new structure.
 
 The big question is whether all presently supported architectures use the 
 same size for enum?
 
 I did a quick test program and on SLES10 x86_64 sizeof(an enum) is 32 bits.
 Hence uint8_t would break binary compatibility on that platform.
 
You probably refer to binary compatibility between different versions
of the RDMAoE libibverbs (e.g. the one with enum and the one with
uint8_t). I think this is not a big issue and recompiling should solve
that.


Re: RDMAoE verbs questions

2009-12-01 Thread Jason Gunthorpe
On Tue, Dec 01, 2009 at 06:22:06PM +0200, Liran Liss wrote:
  Dealing with ABI compatibility is a different issue, this new scheme
 is API incompatible due to the change in semantics for existing values.
 
 For rdmacm applications, there are no semantic changes between IB and
 RDMAoE.

So? There are substantial semantic differences for *all* non-rdmacm
applications. Even common ones like OpenMPI. You propose to ignore them?

  Please look at my message regarding using multiple devices, perhaps
 you can improve on that general idea.
 
 RDMAoE *is* IB transport over Ethernet - we don't want different devices
 with different node types exactly for this reason: applications
 shouldn't care if they are running over IB or RDMAoE, and shouldn't add
 another switch statement to support RDMAoE.

Nonsense. RDMAoE is no such thing, it is utterly incompatible with the
IB management model. It is some new protocol that is only about 90%
compatible with IB.

Apps using rdmacm shouldn't care one way or the other, apps that don't
*need* a new transport type.

Jason


Re: RDMAoE verbs questions

2009-11-30 Thread Eli Cohen
On Tue, Nov 24, 2009 at 05:11:36PM -0700, Jason Gunthorpe wrote:
 On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:
 
  2. I am somewhat confused by the overloading of the term transport.   
  It appears that a device will have  
  ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE  
  devices.  The only way to tell the difference is to examine the new  
  ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or  
  RDMA_TRANSPORT_RDMAOE.
 
 I haven't seen these patches but this seems poor to me. I think any
 app that isn't using rdmacm will need patching and support for RDMAOE
 (certainly all mine will). libibverbs shouldn't overload the existing
 transport_type checks for something that is not 100% compatible with
 IB.
 
 Is the same true for openmpi? If you try to run it as is on a RDMAOE
 interface will it work? If not I think that alone should kill this
 idea..
 
If we change struct ibv_port_attr transport field from enum to uint8,
we eliminate binary compatibility problems. That's because the previous
field is aligned to a 16-bit address, so that leaves us 16 bits more.

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 07d4395..f7fe68d 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -192,7 +192,7 @@ struct ibv_port_attr {
uint8_t active_width;
uint8_t active_speed;
uint8_t phys_state;
-   enum rdma_transport_type transport;
+   uint8_t transport;
 };

Moreover, I would like to change the field's name from transport to
link_protocol. Let me know if that makes more sense to you.



Re: RDMAoE verbs questions

2009-11-30 Thread Jeff Squyres

Sorry -- I replied from my PDA last week but the list rejected the mail.

All I have is what was sent to the OMPI list (although I see Pasha  
attached the patch in a later mail on this thread).  Note that we  
(OMPI) tend to operate a bit differently than OpenFabrics -- we don't  
typically send patches to lists, we don't typically use git, etc.




On Nov 25, 2009, at 9:59 AM, Or Gerlitz wrote:


Jeff Squyres wrote:
 Here's one thread:
 http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Jeff, looking at the threads you have sent, I didn't find a way to
download the patch in a form which can be applied to a source tree. Is
there a way to do it through this archive? Are these patches available
from some git tree @mellanox or elsewhere? Does anyone have the email
address of Vasily Philipov (/vasily_at_[hidden]/)? If so, can you or
Pasha please ask him to send me, or better, this list the proposed
patch?

many thanks.

Or





--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-30 Thread Jeff Squyres
Per my prior question: is it expected that IBoE will function  
*exactly* the same as real IB?  The addition of the port attribute  
seems to imply not.


Additionally, per Jason's question, why not simply expose this as an  
additional device?  E.g., can you APM across a real IB port and an  
IBoE port on the same CX2?  I'm guessing that they're *effectively*  
different devices; it might make sense expose them as different  
virtual device_t's.  Just my $0.02.




On Nov 30, 2009, at 8:34 AM, Eli Cohen wrote:


On Tue, Nov 24, 2009 at 05:11:36PM -0700, Jason Gunthorpe wrote:
 On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:

  2. I am somewhat confused by the overloading of the term transport.
  It appears that a device will have
  ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE
  devices.  The only way to tell the difference is to examine the new
  ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or
  RDMA_TRANSPORT_RDMAOE.

 I haven't seen these patches but this seems poor to me. I think any
 app that isn't using rdmacm will need patching and support for RDMAOE
 (certainly all mine will). libibverbs shouldn't overload the existing
 transport_type checks for something that is not 100% compatible with
 IB.

 Is the same true for openmpi? If you try to run it as is on a RDMAOE
 interface will it work? If not I think that alone should kill this
 idea..

If we change struct ibv_port_attr transport field from enum to uint8,
we eliminate binary compatibility problems. That's because the previous
field is aligned to a 16-bit address, so that leaves us 16 bits more.

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 07d4395..f7fe68d 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -192,7 +192,7 @@ struct ibv_port_attr {
uint8_t active_width;
uint8_t active_speed;
uint8_t phys_state;
-   enum rdma_transport_type transport;
+   uint8_t transport;
 };

Moreover, I would like to change the field's name from transport to
link_protocol. Let me know if that makes more sense to you.





--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-30 Thread Jason Gunthorpe
On Mon, Nov 30, 2009 at 03:34:06PM +0200, Eli Cohen wrote:

 If we change struct ibv_port_attr transport field from enum to uint8,
 we eliminate binary compatibility problems. That's because the previous
 field is aligned to a 16-bit address, so that leaves us 16 bits more.

Dealing with ABI compatibility is a different issue, this new scheme
is API incompatible due to the change in semantics for existing values.

Please look at my message regarding using multiple devices, perhaps
you can improve on that general idea.

Jason


Re: RDMAoE verbs questions

2009-11-30 Thread Roland Dreier

  If we change struct ibv_port_attr transport field from enum to uint8,
  we eliminate binary compatibility problems. That's because the previous
  field is aligned to a 16-bit address, so that leaves us 16 bits more.
  
  diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
  index 07d4395..f7fe68d 100644
  --- a/include/infiniband/verbs.h
  +++ b/include/infiniband/verbs.h
  @@ -192,7 +192,7 @@ struct ibv_port_attr {
  uint8_t active_width;
  uint8_t active_speed;
  uint8_t phys_state;
  -   enum rdma_transport_type transport;
  +   uint8_t transport;
   };

Do all architectures round up the structure size?  ie is this always
going to preserve ABI?

  Moreover, I would like to change the field's name from transport to
  link_protocol. Let me know if that makes more sense to you.

I was thinking the same thing, although maybe a name like link_layer
would be clearer?

 - R.


Re: RDMAoE verbs questions

2009-11-30 Thread Jason Gunthorpe
On Mon, Nov 30, 2009 at 10:50:02AM -0800, Roland Dreier wrote:
 
   If we change struct ibv_port_attr transport field from enum to uint8,
   we eliminate binary compatibility problems. That's because the previous
   field is aligned to a 16-bit address, so that leaves us 16 bits more.
   
   diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
   index 07d4395..f7fe68d 100644
   +++ b/include/infiniband/verbs.h
   @@ -192,7 +192,7 @@ struct ibv_port_attr {
   uint8_t active_width;
   uint8_t active_speed;
   uint8_t phys_state;
   -   enum rdma_transport_type transport;
   +   uint8_t transport;
};
 
 Do all architectures round up the structure size?  ie is this always
 going to preserve ABI?

Yes, every Linux arch aligns structs to the min alignment for the
members, so at least 32 in this case.

However, it doesn't really matter, look at ibv_cmd_query_port, it
doesn't zero the padding. So there must be an ABI bump to ensure that
new code links to a library that doesn't fill the new member with
garbage.

This is a messy one, the low level libraries have to be revved somehow too..
ops.query_port2() I guess.

Jason


RE: RDMAoE verbs questions

2009-11-30 Thread Todd Rimmer
   diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
   index 07d4395..f7fe68d 100644
   --- a/include/infiniband/verbs.h
   +++ b/include/infiniband/verbs.h
   @@ -192,7 +192,7 @@ struct ibv_port_attr {
   uint8_t active_width;
   uint8_t active_speed;
   uint8_t phys_state;
   -   enum rdma_transport_type transport;
   +   uint8_t transport;

If a given architecture rounds sizeof(transport) up to 16 bits or 32 bits, then 
the replacement field should be uint16_t or uint32_t respectively, otherwise 
existing binary applications which fetch transport will fetch additional 
undefined bytes which follow it in the new structure.

The big question is whether all presently supported architectures use the same 
size for enum?

I did a quick test program and on SLES10 x86_64 sizeof(an enum) is 32 bits.
Hence uint8_t would break binary compatibility on that platform.
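Todd's concern can be checked with a few sizeof/offsetof comparisons. The sketch below uses hypothetical, reduced structs that mimic only the tail of struct ibv_port_attr. On common Linux ABIs, where uint32_t is 4-byte aligned, a trailing uint8_t lands in what used to be tail padding, so the struct size matches the layout with no extra field. A trailing enum (4 bytes on x86_64, as Todd measured) grows the struct instead. The exact numbers are ABI-dependent, so the checks compare relations rather than fixed sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum transport_sketch { TRANSPORT_IB_SKETCH, TRANSPORT_RDMAOE_SKETCH };

/* Hypothetical reduced tails of struct ibv_port_attr. */
struct tail_no_field {          /* layout without any new field */
	uint32_t port_cap_flags;
	uint16_t lid;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
};

struct tail_enum {              /* trailing enum field */
	uint32_t port_cap_flags;
	uint16_t lid;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
	enum transport_sketch transport;
};

struct tail_u8 {                /* trailing uint8_t, as in Eli's patch */
	uint32_t port_cap_flags;
	uint16_t lid;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
	uint8_t  transport;
};

/* Offset of the first byte after phys_state, i.e. where padding begins. */
size_t tail_padding_start(void)
{
	return offsetof(struct tail_no_field, phys_state) + 1;
}
```

This only addresses layout; as Jason points out in the thread, the former padding must also be zeroed by the library before the new byte can be trusted.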

Todd Rimmer



Re: RDMAoE verbs questions

2009-11-26 Thread Pavel Shamis (Pasha)

Or Gerlitz wrote:

Pavel Shamis (Pasha) wrote:

The patch is attached
Thanks, this patch basically replaces checks for the device transport 
type to be IB to a check that makes sure either the former happens or 
the port transport type is rdmaoe. As Jason, Tziporet and noted, the 
port transport type seems to be a bad, non-compatible/interoperable idea,
so it should and probably could be avoided.
The only reason for these changes is the fact that for IB devices we
prefer to use our own Open MPI connection managers. If we decide
to use RDMA-CM for all devices, the number of changes will be zero...


I see another patch @ 
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
can you send that one as well? The patch you sent isn't signed, so I
can't address the author in further replies (unless you are the
author). Also, it wasn't generated with the -p option of diff, which
would show the affected function for each change; doing so
would help in the review.
Ohh. It was our first patch. Some iWARP devices have the first-packet
issue, so the RDMACM code in OMPI has a bunch of workaround code
that tries to resolve the issue at the MPI level. We don't have such problems
with IB, so we tried to disable the workaround flow for IB devices, but
it did not work well. So we decided to use the current OMPI code as is;
in the future maybe we will implement our own OMPI rdmacm code that will not
have all these workaround flows.


Regards,

Pasha


Re: RDMAoE verbs questions

2009-11-26 Thread Or Gerlitz
Pavel Shamis (Pasha) wrote:
 The only reason for these changes is the fact that for IB devices we
 prefer to use our own open mpi connection managers. If we
 decide to use RDMA-CM for all devices, the number of changes will be zero...

whatever, currently, this change is still there, and best if you remove it 
and find another way to set this predicate.

 So we decided to use the current ompi code as is; in the future maybe we will
 implement our own ompi rdmacm code that will not have all these workaround flows.

Just to make sure I am with you: all in all, only one patch is proposed to ompi
for rdmaoe support, and it is the patch we discuss above. This patch does three
things:

1. changes BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB to look at the port
transport type
2. if the port transport is rdmaoe don't run loopback connections on IB
3. some change in the qp destroy logic
4. that's it...

Correct? Can you comment on #2? Why aren't loopback connections supported?

Or.


Re: RDMAoE verbs questions

2009-11-25 Thread Jeff Squyres

On Nov 25, 2009, at 2:25 AM, Or Gerlitz wrote:


 I was reviewing Mellanox's Open MPI patches for RDMAoE support

Can you point us to the patch series (mail thread or some
repository where they sit)?



Here's one thread:

http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

the latest patch in that thread is here:

http://www.open-mpi.org/community/lists/devel/2009/11/7119.php

Here's another thread with a slightly different thread, but with  
elements of IBoE support in it:


http://www.open-mpi.org/community/lists/devel/2009/11/7120.php


 1. It looks like there is a new field on the ibv_port_attr struct:
 transport. Is it expected that all device drivers will start filling
 in this value, or is it done in the OF core code somewhere?
Please note that this field isn't present in the distro-provided IB
stack, and hence it is highly recommended to avoid referring to it in your
code,



FWIW: we have configure tests checking for this field (just like we  
have configure tests checking for transport_type, because that wasn't  
always there, either).  However, it is a little disturbing that based  
on this conversation, that field name may change, and therefore we'll  
have to add *more* configure logic to figure out what exact field to  
check.  The same is true for all the IBoE code -- since none of that  
code has been approved yet, it's risky to base any code off it.  :-\



at least some of us (...) are for decoupling ompi from ofed, so
let's not put sticks in the wheels of that process.



Hear hear (let's remove MPI from OFED! :-) ).  But I think that this  
is a separate issue.


--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-25 Thread Jeff Squyres

On Nov 24, 2009, at 11:52 PM, Jason Gunthorpe wrote:

 OMPI uses RDMACM (among others), so I'm not sure I follow what you're
 asking me...?

I think I'm asking you about the non RDMACM stuff in openmpi, ibcm,
xoob, etc. I can't tell at glance if any of them will be safe to run
on RDMAoE as-is..




Wait, I think I might have been mistaken.  I'm looking through the  
patches this morning and I don't see the "don't allow host-loopback if
it's IBoE" logic.  The only places I see the check for real IB vs.
IBoE is when deciding to use IBCM or OOB connection schemes (which, as  
Pasha said, are designed to be [real] IB only).
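The decision Jeff describes boils down to a small predicate: the IB-only connection schemes (OOB, IBCM) need a device that reports IB transport on a port that is not Ethernet-based, while RDMACM works on either. The sketch below is hypothetical; the enum and function names are illustrative and are not Open MPI's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the application learned about a port. */
enum dev_transport_sketch { DEV_TRANSPORT_IB, DEV_TRANSPORT_IWARP };
enum port_link_sketch { PORT_LINK_UNSPECIFIED, PORT_LINK_IB, PORT_LINK_ETH };

enum cpc_sketch { CPC_NONE, CPC_RDMACM, CPC_OOB };

/* Prefer the IB-only OOB scheme on real IB ports; otherwise fall back to
 * RDMACM, which hides IB vs. IBoE addressing from the application. */
enum cpc_sketch pick_connection_scheme(enum dev_transport_sketch dev,
				       enum port_link_sketch link,
				       bool rdmacm_available)
{
	bool real_ib = dev == DEV_TRANSPORT_IB && link != PORT_LINK_ETH;

	if (real_ib)
		return CPC_OOB;
	return rdmacm_available ? CPC_RDMACM : CPC_NONE;
}
```

An unspecified link layer is treated as real IB here, on the reasoning that only a library new enough to know about IBoE can report Ethernet.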


But, as you mentioned, there definitely are apps that don't use RDMACM  
and use an out of band (i.e., OOB) mechanism for making IB QPs.
They therefore might have similar issues (need to check for real IB  
vs. IBoE).


Sorry for the confusion... I'm going to chalk it up to the fact that  
it was late at night when I sent that.  :-)


--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-25 Thread Tziporet Koren

Jason Gunthorpe wrote:

On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:

  
2. I am somewhat confused by the overloading of the term transport.   
It appears that a device will have  
ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE  
devices.  The only way to tell the difference is to examine the new  
ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or  
RDMA_TRANSPORT_RDMAOE.



I haven't seen these patches but this seems poor to me. I think any
app that isn't using rdmacm will need patching and support for RDMAOE
(certainly all mine will). libibverbs shouldn't overload the existing
transport_type checks for something that is not 100% compatible with
IB.

  
Good catch - I agree that the ABI should be 100% backward compatible, 
and we will fix this.

We can add a sysfs option to query the transport type, or add another verb.

Note that applications do not need to query the transport type, but we
thought it could also be useful to know from a debugging perspective.

Thus I think sysfs is the best place.
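If the information were exposed through sysfs as Tziporet suggests, a consumer would read and parse a short string. The sketch below assumes a per-port file containing "InfiniBand" or "Ethernet" followed by a newline; the file contents, the enum and the function names are assumptions for illustration, not something the thread specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical values a per-port sysfs attribute might contain. */
enum sysfs_link_sketch { SYSFS_LINK_UNKNOWN, SYSFS_LINK_IB, SYSFS_LINK_ETH };

/* Parse the attribute text, ignoring a trailing newline. */
enum sysfs_link_sketch parse_link_layer(const char *text)
{
	size_t len = strcspn(text, "\n");

	if (len == strlen("InfiniBand") && strncmp(text, "InfiniBand", len) == 0)
		return SYSFS_LINK_IB;
	if (len == strlen("Ethernet") && strncmp(text, "Ethernet", len) == 0)
		return SYSFS_LINK_ETH;
	return SYSFS_LINK_UNKNOWN;
}

/* Read the attribute from `path`; a missing file (e.g. an older kernel)
 * maps to SYSFS_LINK_UNKNOWN rather than an error. */
enum sysfs_link_sketch read_link_layer(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return SYSFS_LINK_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return parse_link_layer(buf);
}
```

A caller would pass something like a per-device, per-port path under /sys/class/infiniband/; that path layout is likewise an assumption here.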

Opinions?

Tziporet



Re: RDMAoE verbs questions

2009-11-25 Thread Eli Cohen
On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
 
 In practice, we have seen that applications *do* need to query the
 transport type -- at least (real) IB vs. iWARP.  It is your
 expectation that IB and IBoE will function identically?
 
 Can you discuss the transport vs. transport_type questions?
 

The reason for identifying each specific port with its own transport
is to allow devices which may configure each port differently to be
distinguishable. ConnectX is one such device.


Re: RDMAoE verbs questions

2009-11-25 Thread Or Gerlitz

Jeff Squyres wrote:
Here's one thread:  
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Jeff, looking at the threads you have sent, I didn't find a way to
download the patch in a form which can be applied to a source tree. Is
there a way to do it through this archive? Are these patches available
from some git tree @mellanox or elsewhere? Does anyone have the email
address of Vasily Philipov (/vasily_at_[hidden]/)? If so, can you or
Pasha please ask him to send me, or better, this list the proposed patch?
Many thanks.


Or



Re: RDMAoE verbs questions

2009-11-25 Thread Jason Gunthorpe
On Wed, Nov 25, 2009 at 04:41:08PM +0200, Eli Cohen wrote:
 On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
  
  In practice, we have seen that applications *do* need to query the
  transport type -- at least (real) IB vs. iWARP.  Is it your
  expectation that IB and IBoE will function identically?
  
  Can you discuss the transport vs. transport_type questions?
  
 
 The reason for identifying each specific port with its own transport
 is to allow devices which may configure each port differently to be
 distinguishable. ConnectX is one such device.

As far as I can tell there is no reason for a multi-port device to
be represented through verbs as a single device with multiple
protocols.

If you have a single physical chip with two ports and they are running
different protocols it seems much cleaner to me to report it to verbs
apps as two devices.

Doing this avoids creating compatibility problems.
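Exposing a dual-mode chip as one verbs device per protocol amounts to grouping the physical ports by protocol. This is a minimal sketch. The types (port_proto, logical_dev) are hypothetical and do not come from libibverbs or the mlx4 driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port protocol, as configured on a dual-mode chip. */
enum port_proto { PROTO_IB, PROTO_RDMAOE };

#define MAX_PORTS 4

/* A logical verbs device: one protocol, a subset of the physical ports. */
struct logical_dev {
    enum port_proto proto;
    int nports;
    int phys_port[MAX_PORTS]; /* physical port numbers, 1-based */
};

/* Group the ports of one physical device by protocol, so that each
 * logical device reports a single transport. Returns the number of
 * logical devices written to out (at most nports). */
static int split_by_protocol(const enum port_proto *proto, int nports,
                             struct logical_dev *out)
{
    assert(nports >= 0 && nports <= MAX_PORTS);
    int ndev = 0;
    for (int p = 0; p < nports; p++) {
        int d;
        for (d = 0; d < ndev; d++)
            if (out[d].proto == proto[p])
                break;
        if (d == ndev) {            /* first port seen with this protocol */
            out[ndev].proto = proto[p];
            out[ndev].nports = 0;
            ndev++;
        }
        out[d].phys_port[out[d].nports++] = p + 1;
    }
    return ndev;
}
```

Each logical device then carries a single transport. Existing device-level transport_type checks keep meaning what they always meant.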

Jason


Re: RDMAoE verbs questions

2009-11-25 Thread Pavel Shamis (Pasha)

Or,
The patch is attached.

Regards,
Pasha.

Or Gerlitz wrote:

Jeff Squyres wrote:
Here's one thread:  
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Jeff, looking at the threads you sent, I didn't find a way to
download the patch in a form which can be applied to a source tree. Is
there a way to do it through this archive? Are these patches available
from some git tree @mellanox or elsewhere? Does anyone have the email
address of Vasily Philipov (/vasily_at_[hidden]/)? If so, can you or
Pasha please ask him to send me, or better, this list, the proposed
patch? Many thanks.


Or




diff -r 16b0d6d73529 ompi/config/ompi_check_openib.m4
--- a/ompi/config/ompi_check_openib.m4	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/config/ompi_check_openib.m4	Sun Nov 15 14:58:37 2009 +0200
@@ -13,7 +13,7 @@
 # Copyright (c) 2006-2008 Cisco Systems, Inc.  All rights reserved.
 # Copyright (c) 2006-2007 Los Alamos National Security, LLC.  All rights
 # reserved.
-# Copyright (c) 2006-2008 Mellanox Technologies. All rights reserved.
+# Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved.
 # $COPYRIGHT$
 # 
 # Additional copyrights may follow
@@ -204,6 +204,21 @@
[$1_have_ibcm=1
$1_LIBS=-libcm $$1_LIBS])])
fi
+		   
+   # Check support for RDMAoE devices
+   $1_have_rdmaoe=0
+   AC_CHECK_DECLS([RDMA_TRANSPORT_RDMAOE],
+  [$1_have_rdmaoe=1], [],
+  [#include <infiniband/verbs.h>])
+
+   AC_MSG_CHECKING([if RDMAoE support is enabled])
+   if test 1 = $$1_have_rdmaoe; then
+AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support])
+AC_MSG_RESULT([yes])
+   else
+AC_MSG_RESULT([no])
+   fi
+
   ])
 
 # Check to see if infiniband/driver.h works.  It is known to
diff -r 16b0d6d73529 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c	Sun Nov 15 14:58:37 2009 +0200
@@ -354,6 +354,13 @@
 }
 #endif
 
+#ifdef OMPI_HAVE_RDMAOE
+if(RDMA_TRANSPORT_RDMAOE == (openib_btl->ib_port_attr.transport) &&
+OPAL_PROC_ON_LOCAL_NODE(ompi_proc->proc_flags)) {
+continue;
+}
+#endif
+
 if(NULL == (ib_proc = mca_btl_openib_proc_create(ompi_proc))) {
 return OMPI_ERR_OUT_OF_RESOURCE;
 }
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/base.h
--- a/ompi/mca/btl/openib/connect/base.h	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/base.h	Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2007-2008 Cisco Systems, Inc.  All rights reserved.
  *
+ * Copyright (c) 2009  Mellanox Technologies.  All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -13,6 +14,17 @@
 
 #include "connect/connect.h"
 
+#ifdef OMPI_HAVE_RDMAOE
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)   \
+(((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
+(RDMA_TRANSPORT_RDMAOE == ((btl)->ib_port_attr.transport))) ?  \
+true : false)
+#else
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)   \
+((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ?   \
+true : false)
+#endif
+
 BEGIN_C_DECLS
 
 /*
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c	Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2007-2009 Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2008  Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All rights reserved.
  *
  * $COPYRIGHT$
  * 
@@ -653,7 +653,7 @@
we're in an old version of OFED that is IB only (i.e., no
iWarp), so we can safely assume that we can use this CPC. */
 #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
-if (IBV_TRANSPORT_IB != btl->device->ib_dev->transport_type) {
+if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
 BTL_VERBOSE(("ibcm CPC only supported on InfiniBand; skipped on %s:%d",
 ibv_get_device_name(btl->device->ib_dev),
 openib_btl->port_num));
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c	Sun Nov 15 14:58:37 2009 +0200

Re: RDMAoE verbs questions

2009-11-25 Thread Or Gerlitz

Pavel Shamis (Pasha) wrote:

The patch is attached
Thanks. This patch basically replaces the checks that the device
transport type is IB with a macro that also treats a port whose
transport type is RDMAoE as non-IB. As Jason, Tziporet and noted, the
port transport type seems to be a bad and non-compatible/interoperable
idea, so it should, and probably could, be avoided.
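Here is the patch's BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB macro restated as a plain function, next to the device-only check it replaces. The enums are local stand-ins and are not the IBV_TRANSPORT_* / RDMA_TRANSPORT_* values from the patched headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the device-level and port-level transport values. */
enum dev_transport { DEV_TRANSPORT_IB, DEV_TRANSPORT_IWARP };
enum port_transport { PORT_TRANSPORT_IB, PORT_TRANSPORT_RDMAOE };

/* Pre-patch check: only the device-level transport type is consulted. */
static bool old_check_if_not_ib(enum dev_transport dev)
{
    return dev != DEV_TRANSPORT_IB;
}

/* Mirrors the patched macro: a port counts as "not IB" if the device is
 * not IB, or if the port itself runs RDMAoE. */
static bool check_if_not_ib(enum dev_transport dev, enum port_transport port)
{
    return dev != DEV_TRANSPORT_IB || port == PORT_TRANSPORT_RDMAOE;
}
```

The gap between the two functions is the compatibility problem raised in this thread. Code that looks only at the device-level transport type, as the pre-patch checks did, treats an RDMAoE port as IB.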


I see another patch @ 
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Can you send that one as well? The patch you sent isn't signed, so I
can't address the author in further replies (unless you are the author).
Also, it wasn't generated with the -p option of diff, which shows the
affected function for each change; doing so would help the review.


Or.



Re: RDMAoE verbs questions

2009-11-24 Thread Jason Gunthorpe
On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:

 2. I am somewhat confused by the overloading of the term transport.   
 It appears that a device will have  
 ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE  
 devices.  The only way to tell the difference is to examine the new  
 ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or  
 RDMA_TRANSPORT_RDMAOE.

I haven't seen these patches but this seems poor to me. I think any
app that isn't using rdmacm will need patching and support for RDMAOE
(certainly all mine will). libibverbs shouldn't overload the existing
transport_type checks for something that is not 100% compatible with
IB.

Is the same true for openmpi? If you try to run it as is on a RDMAOE
interface will it work? If not I think that alone should kill this
idea..

Jason


Re: RDMAoE verbs questions

2009-11-24 Thread Jeff Squyres

On Nov 24, 2009, at 7:11 PM, Jason Gunthorpe wrote:


Is the same true for openmpi? If you try to run it as is on a RDMAOE
interface will it work? If not I think that alone should kill this
idea..




OMPI uses RDMACM (among others), so I'm not sure I follow what you're  
asking me...?


The checks that I was referring to are where OMPI checks
reachability (before it makes QPs).  If the peer proc is on the same
server and it's real IB, OMPI concludes "yes, this works."  If the
peer proc is on the same server and it's IBoE, OMPI concludes "no,
this won't work."
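The reachability rule described here comes down to a small function. The patch implements it by skipping local peers on RDMAoE ports in btl_openib.c. This sketch uses a hypothetical enum and ignores every other check the BTL may apply before it decides that a peer is reachable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port-level transport value. */
enum port_transport { PORT_TRANSPORT_IB, PORT_TRANSPORT_RDMAOE };

/* A peer on the same node is reachable over real IB (HCA loopback),
 * but not over RDMAoE. Remote peers are treated as reachable here. */
static bool peer_reachable(enum port_transport port, bool peer_is_local)
{
    if (peer_is_local && port == PORT_TRANSPORT_RDMAOE)
        return false;
    return true;
}
```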


But I'm not actually sure that's a 100% standards compliant test.  We  
have a long-standing bug ticket open to change this test to actually  
try to open a QP to the same host and see if it works rather than  
relying on the transport_type.  However, all [real] IB devices that we  
run with seem to obey this semantic.  So there's at least historical  
precedent...?  (that might be a weak argument)
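The alternative in the bug ticket is to actually open a QP to the same host. That decision could be structured so the transport type never enters it. In this sketch the probe is an injected, hypothetical callback. A real probe would have to create and connect QPs through verbs, which is not shown. The stub probes exist only to exercise the caching logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe: returns true if a loopback QP connection on this
 * port succeeded. Injected so the decision logic runs without hardware. */
typedef bool (*loopback_probe_fn)(void *ctx);

enum probe_state { PROBE_NOT_RUN, PROBE_OK, PROBE_FAILED };

struct port_loopback {
    enum probe_state state;   /* cached result, at most one probe per port */
    loopback_probe_fn probe;
    void *ctx;
};

/* Local peers are reachable iff the (cached) loopback probe succeeded;
 * no transport_type inspection is involved. */
static bool local_peer_reachable(struct port_loopback *pl)
{
    if (pl->state == PROBE_NOT_RUN)
        pl->state = pl->probe(pl->ctx) ? PROBE_OK : PROBE_FAILED;
    return pl->state == PROBE_OK;
}

/* Stub probes for illustration: count invocations, return a fixed result. */
static bool stub_probe_ok(void *ctx)   { ++*(int *)ctx; return true; }
static bool stub_probe_fail(void *ctx) { ++*(int *)ctx; return false; }
```

Because the result is cached per port, each port is probed once regardless of how many local peers are checked.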


--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-24 Thread Jason Gunthorpe
On Tue, Nov 24, 2009 at 09:12:53PM -0500, Jeff Squyres wrote:
 On Nov 24, 2009, at 7:11 PM, Jason Gunthorpe wrote:
 
 Is the same true for openmpi? If you try to run it as is on a RDMAOE
 interface will it work? If not I think that alone should kill this
 idea..
 
  
 OMPI uses RDMACM (among others), so I'm not sure I follow what you're  
 asking me...?

I think I'm asking you about the non-RDMACM stuff in openmpi: ibcm,
xoob, etc. I can't tell at a glance if any of them will be safe to run
on RDMAoE as-is.

At least it looks like there are some basic problems, like oob doesn't
exchange a GID or set up the AH to use a GRH.
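The oob problem is about addressing. An RDMAoE peer is addressed by GID, so the exchanged connection data has to include the GID, and the address handle has to carry a GRH. This sketch uses hypothetical structs that mirror the relevant fields; in libibverbs these fields live in struct ibv_ah_attr and its grh member. The sgid_index and hop_limit values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical out-of-band payload: the GID travels alongside the LID. */
struct oob_conn_info {
    uint16_t lid;
    uint8_t  gid[16];
    uint32_t qpn;
};

/* Local mirror of the address-handle fields involved. */
struct ah_sketch {
    uint16_t dlid;
    uint8_t  is_global;
    uint8_t  dgid[16];
    uint8_t  sgid_index;
    uint8_t  hop_limit;
    uint8_t  port_num;
};

/* Fill the AH from the peer's info. With use_grh (needed on RDMAoE, where
 * the peer is addressed by GID) the GRH fields are set; otherwise, as for
 * IB within one subnet, only the LID-based fields are filled. */
static void fill_ah(struct ah_sketch *ah, const struct oob_conn_info *peer,
                    uint8_t port, bool use_grh)
{
    assert(ah != NULL && peer != NULL);
    memset(ah, 0, sizeof *ah);
    ah->dlid = peer->lid;
    ah->port_num = port;
    if (use_grh) {
        ah->is_global = 1;
        memcpy(ah->dgid, peer->gid, sizeof ah->dgid);
        ah->sgid_index = 0;   /* assumption: local GID at index 0 */
        ah->hop_limit = 1;    /* assumption: peer is one hop away */
    }
}
```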

Basically, if setting transport_type == IB for RDMAoE breaks existing
stuff then it probably isn't a good strategy.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMAoE verbs questions

2009-11-24 Thread Or Gerlitz

Jeff Squyres wrote:

I was reviewing Mellanox's Open MPI patches for RDMAoE support

Hi Jeff,

Can you send us a pointer to the patch series (mail thread or some
repository where they sit)?


1. It looks like there is a new field on the ibv_port_attr struct: 
transport. Is it expected that all device drivers will start filling 
in this value, or is it done in the OF core code somewhere?
Please note that this field isn't present in the distro-provided IB
stack, and hence it is highly recommended to avoid referring to it in
your code. At least some of us (...) are for decoupling OMPI from OFED,
so let's not put sticks in the wheels of that process.


the Open MPI RDMAOE patch implies that host loopback is not supported 
in RDMAOE mode (but it is in IB mode).  To be clear, the OMPI code had 
to do something different for real IB vs. RDMAOE in at least 1 or 2 places
Liran, where does this limitation come from? Doesn't the HCA support
bridging (loopback connections) for RDMAoE? If this is the case, maybe
you should add a device capability to mark that.


Or.
