RE: [openib-general] RDMA connection and address translation API

2005-08-29 Thread Guy German
Sean wrote:
> >>It looks like this would work.  If a client wanted to create multiple
> >>connections to the same remote service (for example, to separate control and
> >>data), then it seems more efficient to move the asynchronous at outside of
> >>the connect call.
> >>- Sean
> 
> That's a good point. What I had in mind was mainly simplicity for the
> consumer - save him dealing with another upcall. 
> 
> Maybe caching in the at module would make things better, but I agree 
> that for multiple connections to the same remote service, the
> asynchronous at approach seems more appropriate.

OTOH,
after thinking about it some more, there might be problems in letting
each and every consumer do his own caching. at.c has a (not yet
implemented) mechanism, with invalidation, for caching tables.

Do we really want to let the consumer handle all the cases of routing
tables changing on the fly etc., or centralize it in one place (i.e.
at.c)?

What do you think, Sean ?

Guy
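
For illustration, the centralized option could look roughly like the sketch
below. It assumes a small fixed-size table keyed by IPv4 address and a single
generation counter that is bumped whenever routing changes. None of these
names (at_cache, at_cache_lookup, AT_CACHE_SLOTS, ...) come from at.c; they
are hypothetical, and the real mechanism may look quite different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AT_CACHE_SLOTS 8        /* hypothetical table size */

struct at_entry {
    uint32_t ip;                /* destination IPv4 address, host order */
    uint8_t  gid[16];           /* resolved GID */
    unsigned gen;               /* cache generation when the entry was stored */
    int      used;
};

struct at_cache {
    struct at_entry slot[AT_CACHE_SLOTS];
    unsigned gen;               /* bumped on every invalidate */
};

void at_cache_init(struct at_cache *c)
{
    assert(c != 0);
    memset(c, 0, sizeof(*c));
    c->gen = 1;
}

/* Store (or overwrite) the mapping for ip in its direct-mapped slot. */
void at_cache_insert(struct at_cache *c, uint32_t ip, const uint8_t gid[16])
{
    struct at_entry *e = &c->slot[ip % AT_CACHE_SLOTS];

    e->ip = ip;
    memcpy(e->gid, gid, 16);
    e->gen = c->gen;
    e->used = 1;
}

/* Return 1 and copy the GID on a valid hit, 0 on a miss or a stale entry. */
int at_cache_lookup(const struct at_cache *c, uint32_t ip, uint8_t gid[16])
{
    const struct at_entry *e = &c->slot[ip % AT_CACHE_SLOTS];

    if (!e->used || e->ip != ip || e->gen != c->gen)
        return 0;
    memcpy(gid, e->gid, 16);
    return 1;
}

/* Called from one place when routes change: every cached entry goes stale. */
void at_cache_invalidate(struct at_cache *c)
{
    c->gen++;
}
```

With this shape a consumer never sees the invalidation; it simply gets a miss
and falls back to a fresh (asynchronous) lookup.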


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Caitlin Bestler
 

> -Original Message-
> From: Guy German [mailto:[EMAIL PROTECTED] 
> Sent: Friday, August 26, 2005 12:28 PM
> To: Caitlin Bestler; Sean Hefty; James Lentini
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address 
> translation API
> 
> > What do you think about this flow ? 
> > 1. resolve device and port from ip address - synchronous operation 
> >    (like at.c resolve_ip)
> > 2. rdma_create_qp (device+port) - modifies qp to init with default 
> >    pkey index
> > 3. ib_post_recvs(...);
> > 4. cma_connect - asynchronous at, modify qp with correct pkey index,
> >    cm_connect
> 
> Caitlin wrote:
> >At least with iWARP a QP is not bound to a specific port, or even to an 
> >IP Address. It is only bound to the RDMA Device (RNIC) and Protection 
> >Domain. The same QP can be re-used for a new connection with a new IP 
> >address. Indeed, that is exactly what would happen with 
> >application-layer controlled failover (such as iSER).
> 
> In ib, in order to post receive the QP needs to be in init.
> In order to modify qp to init, you need port and pkey_index.
> If iWARP can post receive without it, the iwarp 
> implementation of "rdma_create_qp" can ignore the port attribute.
> 

The closest equivalent of a pkey_index would be the VLAN ID, which
is at L2 and totally transparent to an iWARP QP. You can definitely
post receive buffers before knowing anything about the TCP connection
(or SCTP association/stream) that will provide the LLP service.

> The other option that was suggested to solve the sync 
> problem (need of post receive before connect) is to retrieve 
> the path synchronously, which will require unnecessary 
> upcall handling for iwarp consumers.
> 
The generic requirement is that the QP passed to the connect
method is ready to be moved to a connected state as soon as
the connection establishment exchanges have finished.

If I follow what you are proposing, you are trying to find a way
to do this for IB automatically as a by-product of determining what
device to use. I don't see any problem with this, as long as the
"port" being returned from the first call is defined in such a
way that it can have a void value when the transport does not need
this refinement. Avoiding transport-dependent steps is good for
encouraging development of RDMA-aware applications.
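
A minimal sketch of a "port" that can carry a void value, assuming the first
call returns a small structure rather than a bare port number. All names here
(RDMA_PORT_NONE, rdma_local_addr, rdma_resolve_local) are hypothetical and are
not part of any proposed API. The sentinel 0 is an assumption that relies on
IB channel adapter ports being numbered from 1.

```c
#include <assert.h>
#include <stdint.h>

#define RDMA_PORT_NONE 0    /* hypothetical "void" port value */

enum rdma_transport { RDMA_TRANSPORT_IB, RDMA_TRANSPORT_IWARP };

struct rdma_local_addr {
    enum rdma_transport transport;
    int     device_index;   /* stand-in for a device pointer */
    uint8_t port;           /* RDMA_PORT_NONE when the transport ignores it */
};

/* Tell the caller whether the port field carries any information. */
int rdma_port_is_meaningful(const struct rdma_local_addr *la)
{
    assert(la != 0);
    return la->port != RDMA_PORT_NONE;
}

/* Build the result of the first (synchronous) call for either transport. */
struct rdma_local_addr rdma_resolve_local(enum rdma_transport t,
                                          int device_index, uint8_t ib_port)
{
    struct rdma_local_addr la;

    la.transport = t;
    la.device_index = device_index;
    /* Only IB needs the port to move the QP to INIT; iWARP gets the void value. */
    la.port = (t == RDMA_TRANSPORT_IB) ? ib_port : RDMA_PORT_NONE;
    return la;
}
```

The consumer passes the structure on unchanged, so the application code stays
identical for both transports.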




RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German
>What do you think about this flow ?
>1. resolve device and port from ip address - synchronous operation
>   (like at.c resolve_ip)
>2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
>3. ib_post_recvs(...);
>4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect

>>It looks like this would work.  If a client wanted to create multiple
>>connections to the same remote service (for example, to separate control and
>>data), then it seems more efficient to move the asynchronous at outside of the
>>connect call.
>>- Sean

That's a good point. What I had in mind was mainly simplicity for the
consumer - save him dealing with another upcall. 

Maybe caching in the at module would make things better, but I agree 
that for multiple connections to the same remote service, the
asynchronous at approach seems more appropriate.

So ...
Does everyone else think that we should change the API of the cm 
abstraction to do the asynchronous at before connection ? 
(This should concern mostly the iWARP guys - Caitlin, Tom etc.)

Thanks,
Guy


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Sean Hefty
>What do you think about this flow ?
>1. resolve device and port from ip address - synchronous operation
>   (like at.c resolve_ip)
>2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
>3. ib_post_recvs(...);
>4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect

It looks like this would work.  If a client wanted to create multiple
connections to the same remote service (for example, to separate control and
data), then it seems more efficient to move the asynchronous at outside of the
connect call.

- Sean
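
The efficiency argument can be illustrated with a small sketch in which the
translation step is separate from connect, so two connections to the same
peer share one lookup. Everything here is a stand-in: rdma_path, rdma_resolve
and rdma_connect_path are hypothetical names, and the "asynchronous" step
completes inline to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

struct rdma_path {          /* hypothetical result of address translation */
    uint32_t dst_ip;
    uint16_t pkey_index;
    int      resolved;
};

int at_queries;             /* how many times the (slow) translation ran */

/* Stand-in for the asynchronous at step; here it completes immediately. */
void rdma_resolve(uint32_t dst_ip, struct rdma_path *path)
{
    assert(path != 0);
    at_queries++;
    path->dst_ip = dst_ip;
    path->pkey_index = 0;   /* placeholder value, not a real lookup */
    path->resolved = 1;
}

/* Stand-in for connect: it only consumes an already resolved path. */
int rdma_connect_path(const struct rdma_path *path, uint16_t service)
{
    assert(path != 0);
    (void)service;
    return path->resolved ? 0 : -1;
}
```

A client would call rdma_resolve() once and then rdma_connect_path() twice
(control and data); if the translation lived inside connect, it would run
once per connection.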



RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German
> What do you think about this flow ? 
> 1. resolve device and port from ip address - synchronous operation 
>(like at.c resolve_ip)
> 2. rdma_create_qp (device+port) - modifies qp to init with 
> default pkey index 
> 3. ib_post_recvs(...); 
> 4. cma_connect -  asynchronous at, modify qp with correct 
> pkey index, cm_connect

Caitlin wrote:
>At least with iWARP a QP is not bound to a specific port, or even
>to an IP Address. It is only bound to the RDMA Device (RNIC) and
>Protection Domain. The same QP can be re-used for a new connection
>with a new IP address. Indeed, that is exactly what would happen
>with application-layer controlled failover (such as iSER).

In ib, in order to post receive the QP needs to be in init.
In order to modify qp to init, you need port and pkey_index.
If iWARP can post receive without it, the iwarp implementation
of "rdma_create_qp" can ignore the port attribute.

The other option that was suggested to solve the sync problem
(need of post receive before connect) is to retrieve the path
synchronously, which will require unnecessary upcall handling
for iwarp consumers.

Guy


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Caitlin Bestler
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Guy German
> Sent: Friday, August 26, 2005 1:27 AM
> To: Sean Hefty; James Lentini
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address 
> translation API
> 
> >> We need to insert in here:
> >>
> >> ib_modify_qp(...);  /* somehow uses address resolution... */ 
> >> ib_post_recvs(...);
> >>
> >
> >or add a new call to create the qp and modify it to init (an analog to 
> >the socket(2) function).
> 
> Sean> This approach seems reasonable to me.  Maybe something like:
> Sean> rdma_create_qp(rdma_addr_info);
> 
> Sean> Uses the output from the address resolution to create the QP on 
> Sean> the correct device and transitions it to the INIT state.  The user 
> Sean> can now post any work requests that they want.  For example, with 
> Sean> iWarp, I believe that even send work requests can be posted in the
> Sean> INIT state.
> 
> What do you think about this flow ? 
> 1. resolve device and port from ip address - synchronous operation 
>    (like at.c resolve_ip)
> 2. rdma_create_qp (device+port) - modifies qp to init with 
>    default pkey index
> 3. ib_post_recvs(...);
> 4. cma_connect - asynchronous at, modify qp with correct pkey index,
>    cm_connect
> 

At least with iWARP a QP is not bound to a specific port, or even
to an IP Address. It is only bound to the RDMA Device (RNIC) and
Protection Domain. The same QP can be re-used for a new connection
with a new IP address. Indeed, that is exactly what would happen
with application-layer controlled failover (such as iSER).




RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread James Lentini


On Thu, 25 Aug 2005, Sean Hefty wrote:

> >> Any way providing src/dst IPs in the CM Private data is simple, 
> >> and we can come with IBTA extension blessing that data structure 
> >> as a general way to map IP oriented protocols over IB (a 1-2 page 
> >> draft at the most) This way it can also address Caitlin concerns 
> >> regarding NFS & IETF (since now it's a transport specific issue)
> >
> >How long do you estimate it would take to standardize an IP<->GID 
> >mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? 
> >A year?
> >
> >Let's assume that everyone on this list is in agreement.
> 
> Does anyone in the IB world disagree with adding IP addresses in the 
> CM private data area?  Would we want to extend this concept to SIDR 
> as well?

I think we should focus on providing a mechanism to allow ULPs to use 
IP addresses on InfiniBand networks. 

Service discovery (SIDR) seems like a separate issue. The ability to 
ask "What UD QPN is this service using?" seems useful on its own.


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German
>> We need to insert in here:
>>
>> ib_modify_qp(...);  /* somehow uses address resolution... */
>> ib_post_recvs(...);
>>
>
>or add a new call to create the qp and modify it to init (an analog to
>the socket(2) function).

Sean> This approach seems reasonable to me.  Maybe something like:
Sean> rdma_create_qp(rdma_addr_info);

Sean> Uses the output from the address resolution to create the QP on the 
Sean> correct device and transitions it to the INIT state.  The user can 
Sean> now post any work requests that they want.  For example, with iWarp, 
Sean> I believe that even send work requests can be posted in the INIT state.

What do you think about this flow ? 
1. resolve device and port from ip address - synchronous operation 
   (like at.c resolve_ip)
2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
3. ib_post_recvs(...);
4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect

Guy
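
The four steps above can be sketched as a tiny state machine. This is only a
model under stated assumptions: the struct and function names are
hypothetical, the local resolution returns fixed placeholder values, and the
asynchronous at step inside connect is collapsed into a parameter that
carries the resolved pkey index.

```c
#include <assert.h>
#include <stdint.h>

enum qp_state { QP_RESET, QP_INIT, QP_CONNECTED };

struct local_addr {             /* output of step 1 */
    int     device;
    uint8_t port;
};

struct qp {
    enum qp_state state;
    struct local_addr la;
    uint16_t pkey_index;
    int posted_recvs;
};

/* Step 1: synchronous local resolution (placeholder: device 0, port 1). */
struct local_addr resolve_local(uint32_t src_ip)
{
    struct local_addr la;

    (void)src_ip;
    la.device = 0;
    la.port = 1;
    return la;
}

/* Step 2: create the QP and move it to INIT with the default pkey index. */
struct qp create_qp(struct local_addr la)
{
    struct qp q;

    q.state = QP_INIT;
    q.la = la;
    q.pkey_index = 0;
    q.posted_recvs = 0;
    return q;
}

/* Step 3: receives may be posted once the QP has left RESET. */
int post_recv(struct qp *q)
{
    assert(q != 0);
    if (q->state == QP_RESET)
        return -1;
    q->posted_recvs++;
    return 0;
}

/* Step 4: once the path is known, fix up the pkey index and connect. */
int connect_qp(struct qp *q, uint16_t resolved_pkey_index)
{
    assert(q != 0);
    if (q->state != QP_INIT)
        return -1;
    q->pkey_index = resolved_pkey_index;
    q->state = QP_CONNECTED;
    return 0;
}
```

The point of the ordering is that post_recv() succeeds before connect_qp() is
ever called, which is the "post receive before connect" requirement.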



Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig
On Thu, Aug 25, 2005 at 01:18:06PM -0400, Talpey, Thomas wrote:
> At 12:56 PM 8/25/2005, Caitlin Bestler wrote:
> >Generic code MUST support both IPv4 and IPv6 addresses.
> >I've even seen code that actually does this.
> 
> Let me jump ahead to the root question. How will the NFS layer know
> what address to resolve?
> 
> On IB mounts, it will need to resolve a hostname or numeric string to
> a GID, in order to provide the address to connect. On TCP/UDP, or
> iWARP mounts, it must resolve to IP address. The mount command
> has little or no context to perform these lookups, since it does not
> know what interface will be used to form the connection.
> 
> In exports, the server must inspect the source network of each
> incoming request, in order to match against /etc/exports. If there
> are wildcards in the file, a GID-specific algorithm must be applied.
> Historically, /etc/exports contains hostnames and IPv4 netmasks/
> addresses.
> 
> In either case, I think it is a red herring to assume that the GID
> is actually an IPv6 address. They are not assigned by the sysadmin,
> they are not subnetted, and they are quite foreign to many users.
> IPv6 support for Linux NFS isn't even submitted yet, btw.
> 
> With an IP address service, we don't have to change a line of 
> NFS code.

I think this shows that using IP addresses in any service over
infiniband that isn't actually IP networking is extremely stupid.
Just stop living in the illusion that it makes sense and use IB-specific
addressing, namely IB, and stop all these layering violations into IP,
which is much higher up the stack.



RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv
> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 2:37 PM
> To: 'James Lentini'; Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
> 
> >> Any way providing src/dst IPs in the CM Private data is simple, and we
> >> can come with IBTA extension blessing that data structure as a general
> >> way to map IP oriented protocols over IB (a 1-2 page draft at the most)
> >> This way it can also address Caitlin concerns regarding NFS & IETF
> >> (since now it's a transport specific issue)
> >
> >How long do you estimate it would take to standardize an IP<->GID
> >mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
> >year?
> >
> >Let's assume that everyone on this list is in agreement.
> 
> Does anyone in the IB world disagree with adding IP addresses in the CM
> private data area?  Would we want to extend this concept to SIDR as well?
> 
> - Sean

I am re-sending my proposal from 2004 as text (attached).
It also addresses the ServiceID issue, and can be a baseline for
discussions.
Feel free to change it.

Yaron


   Mapping of iWarp/TCP connections to InfiniBand


AUTHOR   Yaron Haviv  ([EMAIL PROTECTED])
VERSION  0.30, Mon June 28 2004


I.  INTRODUCTION


 InfiniBand and iWarp semantics are similar, especially with the latest
 Verb Extensions. The major difference is in the way connections are 
 established: iWarp uses TCP-based connection establishment, while 
 InfiniBand uses a CM for that. 
 Another related difference is that in iWarp a user can start in a 
 standard TCP mode and migrate to RDMA verbs in the middle of a session.

 The following document provides a general mapping from iWarp/TCP 
 connection establishment to InfiniBand, which can be used by ULPs over 
 InfiniBand or by any future iWarp protocols. It imitates the SDP
 connection establishment process and CM headers (it does not require SDP;
 it just uses the same data formats for CM messages).
 
 
II. Establishing TCP/iWarp-like connections over InfiniBand

 In order to emulate an iWarp connection, it is required to open an 
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to the RDMA mode; this requires exchanging receive buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection 
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the SDP fields
 that are not relevant are marked Reserved. 

 iWarp emulation CM Request (Hello) Private Data header
  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|                              len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|                          DesRemRcvSz                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                          LocalRcvSz                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32|          Local Port           |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|                        Src IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|                        Src IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|                        Src IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|                        Src IP ( 31-00
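
As a rough illustration, the fields visible in the diagram above (which is cut
off in this copy after the Src IP words) could be laid out as the C struct
below. The field widths are an assumption based on the SDP-style layout the
draft says it reuses: an 8-bit MID, 16-bit bufs, 4-bit version and flow
control fields, and a 16-bit port. The struct name and member names are
hypothetical, multi-byte fields would be big-endian on the wire, and anything
that follows Src IP in the original draft is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the visible part of the Hello private data. */
struct iwarp_emu_hello {
    uint8_t  mid;               /* message id */
    uint8_t  rsvd0;
    uint16_t bufs;              /* receive buffers posted (network order) */
    uint32_t len;               /* message length (network order) */
    uint32_t rsvd1;
    uint32_t rsvd2;
    uint8_t  majv_minv;         /* MajVer in the high nibble, MinVer in the low */
    uint8_t  ipv_flowc;         /* IPVer in the high nibble, FlowC in the low */
    uint16_t rsvd3;
    uint32_t des_rem_rcv_sz;    /* desired remote receive size */
    uint32_t local_rcv_sz;      /* local receive size */
    uint16_t local_port;        /* TCP-style port (network order) */
    uint16_t rsvd4;
    uint8_t  src_ip[16];        /* bits 127..0, most significant byte first */
};

/* Pack two 4-bit values into one byte, first field in the high nibble. */
uint8_t pack_nibbles(unsigned hi, unsigned lo)
{
    assert(hi <= 0xf && lo <= 0xf);
    return (uint8_t)((hi << 4) | lo);
}
```

Under these assumptions the struct is 48 bytes with no padding, which lines
up with the row labels in the diagram if they are read as the offset at which
each 32-bit row ends.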

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Sean Hefty
>> Any way providing src/dst IPs in the CM Private data is simple, and we
>> can come with IBTA extension blessing that data structure as a general
>> way to map IP oriented protocols over IB (a 1-2 page draft at the most)
>> This way it can also address Caitlin concerns regarding NFS & IETF
>> (since now it's a transport specific issue)
>
>How long do you estimate it would take to standardize an IP<->GID
>mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
>year?
>
>Let's assume that everyone on this list is in agreement.

Does anyone in the IB world disagree with adding IP addresses in the CM private
data area?  Would we want to extend this concept to SIDR as well?

- Sean



RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Sean Hefty
>Sean> Another possibility could be to add a list of receives to
>Sean> rdma_connect().
>
>Guy> I added this to both connect and accept calls
>
>I don't think this is a good idea.  Let's try to streamline the
>connect call, not add every single possible feature to it.

I don't think that we want to add a list of receives to the connect call either.
I only mentioned that it was a possibility.

- Sean



RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Sean Hefty
>> We need to insert in here:
>>
>> ib_modify_qp(...);  /* somehow uses address resolution... */
>> ib_post_recvs(...);
>>
>
>or add a new call to create the qp and modify it to init (an analog to
>the socket(2) function).

This approach seems reasonable to me.  Maybe something like:

rdma_create_qp(rdma_addr_info);

Uses the output from the address resolution to create the QP on the correct
device and transitions it to the INIT state.  The user can now post any work
requests that they want.  For example, with iWarp, I believe that even send work
requests can be posted in the INIT state.

- Sean



Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Tue, 23 Aug 2005, Roland Dreier wrote:

> The listen side is even simpler:
> 
> rdma_listen():
> inputs: local service, event callback, consumer context
> 
> Wait for connection requests and pass events to the consumer's
> callback.  I'm not sure if/how we want to support binding to
> a particular IP address.  The current IB CM in Linux doesn't
> support binding a listen to a single device or port, and even
> if it did it's not clear how to handle binding to one IP
> address when a port has more than one IP.
> 
> I guess the event callback would receive a device pointer and
> the same RDMA transport address union I talked about above
> when discussing address resolution.
> 
> It would be possible to have another function like
> rdma_getpeername() that takes the transport address and
> returns a source IP address.

To be complete, the API needs an rdma_getpeername() function:

rdma_getpeername():
inputs: connected QP
outputs: peer IP address

> In the IB case this would do an
> ATS reverse lookup.  However, I hate this idea.  iSER already
> uses the CM private data to pass the source IP in the IB case,
> and I would much rather fix NFS/RDMA to do the same thing (so
> we can just kill ATS as an address resolution method).
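
A sketch of the rdma_getpeername() idea, assuming the peer address is
recorded once at connection time (for example, taken from the CM private data
as iSER does) instead of being looked up through ATS afterwards. The types
and function names below are hypothetical stand-ins, not an existing
interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct rdma_peer_addr {
    int     family;             /* 4 for IPv4, 6 for IPv6 (hypothetical codes) */
    uint8_t addr[16];           /* IPv4 uses the first 4 bytes */
};

struct rdma_conn {              /* stand-in for a connected QP's state */
    int connected;
    struct rdma_peer_addr peer;
};

/* Record the peer address when the connection is established. */
void rdma_conn_established(struct rdma_conn *c, const struct rdma_peer_addr *peer)
{
    assert(c != 0 && peer != 0);
    c->peer = *peer;
    c->connected = 1;
}

/* inputs: connected QP (here: its connection state); outputs: peer IP. */
int rdma_getpeername_sketch(const struct rdma_conn *c, struct rdma_peer_addr *out)
{
    assert(c != 0 && out != 0);
    if (!c->connected)
        return -1;
    *out = c->peer;
    return 0;
}
```

With the address stored at connect time, the call is a local copy and needs
no reverse lookup on the fabric.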


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Talpey, Thomas
At 12:56 PM 8/25/2005, Caitlin Bestler wrote:
>Generic code MUST support both IPv4 and IPv6 addresses.
>I've even seen code that actually does this.

Let me jump ahead to the root question. How will the NFS layer know
what address to resolve?

On IB mounts, it will need to resolve a hostname or numeric string to
a GID, in order to provide the address to connect. On TCP/UDP, or
iWARP mounts, it must resolve to IP address. The mount command
has little or no context to perform these lookups, since it does not
know what interface will be used to form the connection.

In exports, the server must inspect the source network of each
incoming request, in order to match against /etc/exports. If there
are wildcards in the file, a GID-specific algorithm must be applied.
Historically, /etc/exports contains hostnames and IPv4 netmasks/
addresses.

In either case, I think it is a red herring to assume that the GID
is actually an IPv6 address. They are not assigned by the sysadmin,
they are not subnetted, and they are quite foreign to many users.
IPv6 support for Linux NFS isn't even submitted yet, btw.

With an IP address service, we don't have to change a line of 
NFS code.

Tom.


>
>So supporting GIDs is not that much of an issue as long
>as no IB network IDs are assigned with a meaning that
>conflicts with any reachable IPv6 network ID. (In other
>words, assign GIDs so that they are in fact valid IPv6
>addresses. Something that was always planned to be one
>option for GIDs).
>
>
>
>> -Original Message-
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
>> Sent: Thursday, August 25, 2005 9:48 AM
>> To: Tom Tucker
>> Cc: openib-general@openib.org
>> Subject: RE: [openib-general] RDMA connection and address 
>> translation API
>> 
>> 
>> 
>> On Wed, 24 Aug 2005, Tom Tucker wrote:
>> 
>> > > 
>> > >  - It's not just preventing connections to the wrong local address.
>> > >NFS-RDMA wants the remote source address (ie getpeername()) so that
>> > >it can look it up in the exports list.
>> > 
>> > Agreed. But you could also get rid of ATS by allowing GIDs to be 
>> > specified in the exports file and then treating them like
>> > IPv6 addresses for the purpose of subnet comparisons.
>> 
>> Could generic code use both GIDs and IPv4 addresses? 
>> 
>> 
>


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Sean Hefty wrote:

> >With this in mind, I believe that the connection API needs to be
> >something more like the following:
> >
> >rdma_resolve_address():
> >inputs: dest IP address, qos, npaths,
> >done callback, opaque context
> > done callback params: status, local RDMA device,
> >RDMA transport address, context
> ...
> >rdma_connect():
> >inputs: local QP, RDMA transport address, destination service,
> >private data, timeout, event callback, opaque context
> 
> Have we agreed that this is the functionality that we should be 
> aiming towards?

I think so, but as you pointed out the local QP must be in the init 
state.

> 
> >rdma_resolve_address(...);
> >/* wait for resolution */
> >ib_create_qp(...) /* use device pointer we got from 
> > rdma_resolve_address()
> >*/
> 
> We need to insert in here: 
> 
> ib_modify_qp(...);  /* somehow uses address resolution... */
> ib_post_recvs(...);
> 

or add a new call to create the qp and modify it to init (an analog to 
the socket(2) function).

> >rdma_connect(...); /* pass transport address we got from
> >rdma_resolve_address() */
> >/* wait for connection to finish... */
> 
> Another possibility could be to add a list of receives to 
> rdma_connect().

The caller might also want to setup memory windows. Requiring the qp 
to be in the init state before calling connect seems cleaner to me.
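
To make the callback shape of rdma_resolve_address() concrete, here is a
minimal sketch. It keeps only the destination IP, the done callback and the
opaque context from the quoted proposal (qos and npaths are left out), and it
invokes the callback inline, whereas a real implementation would return first
and call back later. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct rdma_xport_addr {        /* hypothetical "RDMA transport address" */
    uint8_t raw[16];
};

typedef void (*rdma_resolve_done)(int status, int device,
                                  const struct rdma_xport_addr *addr,
                                  void *context);

/* Sketch only: resolves nothing, just echoes the IP into the last 4 bytes. */
int rdma_resolve_address_sketch(uint32_t dst_ip, rdma_resolve_done done,
                                void *context)
{
    struct rdma_xport_addr addr = { {0} };

    if (!done)
        return -1;
    addr.raw[12] = (uint8_t)(dst_ip >> 24);
    addr.raw[13] = (uint8_t)(dst_ip >> 16);
    addr.raw[14] = (uint8_t)(dst_ip >> 8);
    addr.raw[15] = (uint8_t)dst_ip;
    done(0, 0, &addr, context);     /* status 0, placeholder device 0 */
    return 0;
}

/* Example consumer callback: stash the result in the caller's context. */
struct resolve_result {
    int called;
    int status;
    struct rdma_xport_addr addr;
};

void save_result(int status, int device,
                 const struct rdma_xport_addr *addr, void *context)
{
    struct resolve_result *r = context;

    assert(r != 0 && addr != 0);
    (void)device;
    r->called = 1;
    r->status = status;
    r->addr = *addr;
}
```

The consumer would create the QP on the returned device, move it to init,
post receives, and only then hand the saved transport address to the connect
call.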

> 
> - Sean
> 
> 


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Fab Tillier wrote:

> Performing a forward lookup via ARP is going to be a lot faster than 
> ATS if the ARP entry already exists.

ATS responses could also be cached.


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv
> -Original Message-
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 12:21 PM
> To: Yaron Haviv
> Cc: Fab Tillier; Roland Dreier; openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
> 
> 
> 
> On Wed, 24 Aug 2005, Yaron Haviv wrote:
> 
> > Any way providing src/dst IPs in the CM Private data is simple, and we
> > can come with IBTA extension blessing that data structure as a general
> > way to map IP oriented protocols over IB (a 1-2 page draft at the most)
> > This way it can also address Caitlin concerns regarding NFS & IETF
> > (since now it's a transport specific issue)
> 
> How long do you estimate it would take to standardize an IP<->GID
> mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
> year?
> 
> Let's assume that everyone on this list is in agreement.

James, I can identify enough IBTA members on this list.
If the group is in agreement, I believe it's a rather short process,
since it's just a minor definition and the IBTA doesn't have much on its
agenda these days.

For example, Hal added a feature to the SM (client re-register ..) in
weeks, based on the OpenIB input.
We also don't have to wait for a finalized spec to implement, just like we
implement IPoIB without an IETF RFC (only a draft).

By the way, a quick path could be to define it in DAT and hand it over to
the IBTA; after all, ATS is also not an IBTA standard.

Yaron


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler
Generic code MUST support both IPv4 and IPv6 addresses.
I've even seen code that actually does this.

So supporting GIDs is not that much of an issue as long
as no IB network IDs are assigned with a meaning that
conflicts with any reachable IPv6 network ID. (In other
words, assign GIDs so that they are in fact valid IPv6
addresses. Something that was always planned to be one
option for GIDs).
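
If GIDs are treated as 128-bit addresses in the same way as IPv6, the subnet
comparison for an exports-style match reduces to a prefix compare. The helper
below is a minimal sketch; the function name is hypothetical and it says
nothing about how prefixes would actually be written in /etc/exports.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the first prefix_len bits of a and b are equal, else 0. */
int addr128_prefix_match(const uint8_t a[16], const uint8_t b[16],
                         unsigned prefix_len)
{
    unsigned full = prefix_len / 8;     /* whole bytes to compare */
    unsigned rem = prefix_len % 8;      /* leftover bits in the next byte */
    unsigned i;

    assert(prefix_len <= 128);
    for (i = 0; i < full; i++)
        if (a[i] != b[i])
            return 0;
    if (rem) {
        uint8_t mask = (uint8_t)(0xff << (8 - rem));

        if ((a[full] ^ b[full]) & mask)
            return 0;
    }
    return 1;
}
```

The same routine works for a GID and for an IPv6 address, which is the sense
in which generic code could handle both without a separate GID algorithm.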



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Thursday, August 25, 2005 9:48 AM
> To: Tom Tucker
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address 
> translation API
> 
> 
> 
> On Wed, 24 Aug 2005, Tom Tucker wrote:
> 
> > > 
> > >  - It's not just preventing connections to the wrong local address.
> > >NFS-RDMA wants the remote source address (ie getpeername()) so that
> > >it can look it up in the exports list.
> > 
> > Agreed. But you could also get rid of ATS by allowing GIDs to be 
> > specified in the exports file and then treating them like
> > IPv6 addresses for the purpose of subnet comparisons.
> 
> Could generic code use both GIDs and IPv4 addresses? 
> 
> 



Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Talpey, Thomas
At 12:34 PM 8/25/2005, Roland Dreier wrote:
>All implementations of NFS/RDMA on top of IB had better interoperate,
>right?  Which means that someone has to specify which address
>translation mechanism is the choice for NFS/RDMA.

Correct. At the moment the existing NFS/RDMA implementations
use ATS (Sun's and NetApp's).

>NFS/RDMA is being defined on top of an abstract RDMA interface.
>Someone has to write a spec for how that RDMA abstraction is
>translated into packets on the wire for each transport that NFS/RDMA
>will run on top of.

Well, we did. We specify the ULP payload of all the messages
in those two IETF documents. What we didn't do is define how
each transport handles IP addressing, that is a transport issue.

We don't need address translation over iWARP, since that uses
IP. Over IB, so far, we have used ATS. I am perfectly fine with
a better solution, but ATS has been fine too.

I am catching up to this discussion, so this is just one reply.

Tom.


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Tom Tucker wrote:

> > 
> >  - It's not just preventing connections to the wrong local address.
> >    NFS-RDMA wants the remote source address (ie getpeername()) so that
> >    it can look it up in the exports list.
> 
> Agreed. But you could also get rid of ATS by allowing GIDs to 
> be specified in the exports file and then treating them like 
> IPv6 addresses for the purpose of subnet comparisons.

Could generic code use both GIDs and IPv4 addresses? 
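
One way generic code could treat both uniformly, as a minimal sketch
under assumptions: hold every address in 16 bytes, store IPv4 addresses
in IPv4-mapped form (::ffff:a.b.c.d), and compare bit prefixes. The names
struct addr16, addr16_from_ipv4() and addr16_prefix_match() are
hypothetical and do not come from any existing NFS or OpenIB code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address container: holds an IB GID, an IPv6
 * address, or an IPv4 address in IPv4-mapped form (::ffff:a.b.c.d). */
struct addr16 {
    uint8_t b[16];
};

/* Store an IPv4 address (given in host byte order) in IPv4-mapped form. */
static struct addr16 addr16_from_ipv4(uint32_t ip)
{
    struct addr16 a;

    memset(a.b, 0, sizeof(a.b));
    a.b[10] = 0xff;
    a.b[11] = 0xff;
    a.b[12] = (uint8_t)(ip >> 24);
    a.b[13] = (uint8_t)(ip >> 16);
    a.b[14] = (uint8_t)(ip >> 8);
    a.b[15] = (uint8_t)ip;
    return a;
}

/* Return 1 if the first prefix_len bits of x and y are equal, else 0. */
static int addr16_prefix_match(const struct addr16 *x,
                               const struct addr16 *y,
                               unsigned int prefix_len)
{
    unsigned int full = prefix_len / 8;  /* whole bytes to compare */
    unsigned int rest = prefix_len % 8;  /* leftover high-order bits */

    assert(x != NULL && y != NULL);
    if (prefix_len > 128)
        return 0;
    if (memcmp(x->b, y->b, full) != 0)
        return 0;
    if (rest) {
        uint8_t mask = (uint8_t)(0xff << (8 - rest));

        return (x->b[full] & mask) == (y->b[full] & mask);
    }
    return 1;
}
```

With this layout an IPv4 /24 entry in the exports file corresponds to a
120-bit prefix (96 + 24), while a GID subnet prefix is the first 64 bits.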


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Roland Dreier
Roland> No, I think we just need to realize that a perfectly
Roland> transport neutral protocol implementation is not
Roland> achievable.

James> It is achievable. Although the IB and iWARP protocols are
James> different, they can provide the same services to NFS-RDMA.

Not really.  This is just hiding the transport dependence in some
other layer and then pretending it doesn't exist.  IB and iWARP can
provide the same services to NFS/RDMA, but only through some
intermediate layer that implements the actual transport-dependent wire
protocol.

James> IB is missing one service that iWARP has, namely that nodes
James> can be identified with IP addresses. The ATS mechanism
James> provides this capability for IB networks. If there are
James> better mechanisms that do the same thing, then NFS-RDMA can
James> use them.

All implementations of NFS/RDMA on top of IB had better interoperate,
right?  Which means that someone has to specify which address
translation mechanism is the choice for NFS/RDMA.

James> The important thing is not to push this up into the
James> ULPs. The NFS-RDMA protocol is being standardized in the
James> IETF. There is no reason to upset that process. If an
James> additional IB specific protocol is necessary, it should be
James> standardized in the IBTA.

NFS/RDMA is being defined on top of an abstract RDMA interface.
Someone has to write a spec for how that RDMA abstraction is
translated into packets on the wire for each transport that NFS/RDMA
will run on top of.

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Roland Dreier wrote:

> James> I agree with Caitlin. The eventual solution cannot force
> James> protocol modifications in ULPs.
> 
> Does this mean we're stuck with the current use of ATS in NFS-RDMA?

NFS-RDMA requires that the lower layer provide IP addressing. ATS is 
one proposal and the only one being documented and standardized in a 
standards organization. Any other solution that was documented and 
standardized should be considered. 

Since this will involve the wire protocol, it can't be OpenIB 
specific.

> Surely there's still time to fix the protocol.

I believe that a solution can be found without impacting the NFS-RDMA 
specifications.


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Yaron Haviv wrote:

> Anyway, providing src/dst IPs in the CM private data is simple, and we
> can come up with an IBTA extension blessing that data structure as a general
> way to map IP-oriented protocols over IB (a 1-2 page draft at the most).
> This way it can also address Caitlin's concerns regarding NFS & IETF
> (since now it's a transport-specific issue).

How long do you estimate it would take to standardize an IP<->GID 
mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A 
year?

Let's assume that everyone on this list is in agreement.


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Guy German
On Thu, 2005-08-25 at 08:58 -0700, Roland Dreier wrote:
> Sean> Another possibility could be to add a list of receives to
> Sean> rdma_connect().
> 
> Guy> I added this to both connect and accept calls
> 
> I don't think this is a good idea.  Let's try to streamline the
> connect call, not add every single possible feature to it.
> 
>  - R.

I think it is a good solution for the sync problem that Sean raised - in
the case where we modify the QP inside the abstraction layer.
We can take it out (i.e. getting the path and modifying the QP to INIT
*before* connect), but I think this will be more complicated for the
consumers (especially the iWARP ones).
I am not saying we *have* to do it - this is just a suggestion.

Guy



Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Caitlin Bestler wrote:

> NFS over RDMA does not do that.
> 
> Shouldn't that be the end of discussion on abusing CM private data
> unless you are talking *solely* about IB private data. And if that is
> the discussion, should not such a strategy be proposed to IETF
> and/or IBTA for an NFSoRDMA for IB official mapping?

Since this is IB specific, I think it should be addressed in the IBTA.

> The other end of the NFSoRDMA connection is not necessarily
> running OpenIB or even Linux and is not party to any of these
> discussions.
> 
> > 
> > My resistance is that ATS is just complexity without any benefit.  It
> > doesn't provide additional security.  It doesn't solve the
> > multi-homing problem we're talking about now.  Once you've thrown away
> > information by turning your IP address into an IB GID, there's no
> > magic way ATS can recreate that information and be psychic about which
> > of the multi-homed IPs you actually meant.  So why not just put the IP
> > addressing information into the CM private data, the way that the SDP
> > protocol already does?


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Roland Dreier wrote:

> James> You need to consider what makes sense for *both* ib and
> James> iwarp. Keep in mind that the correct API will allow a
> James> consumer to use ib and iwarp devices transparently. In
> James> other words there will be one code path that supports both.
> 
> James> If we were to adopt your proposal, the consumer would need
> James> to perform unnecessary operations on iWARP.
> 
> No, I think we just need to realize that a perfectly transport neutral
> protocol implementation is not achievable.  

It is achievable. Although the IB and iWARP protocols are different, 
they can provide the same services to NFS-RDMA.

IB is missing one service that iWARP has, namely that nodes can be 
identified with IP addresses. The ATS mechanism provides this 
capability for IB networks. If there are better mechanisms that do the 
same thing, then NFS-RDMA can use them. 

The important thing is not to push this up into the ULPs. The NFS-RDMA
protocol is being standardized in the IETF. There is no reason to 
upset that process. If an additional IB specific protocol is 
necessary, it should be standardized in the IBTA.

> It's unfortunate that kDAPL fooled people by hiding the details of 
> the wire protocol under a supposedly "neutral API," but the fact is 
> that mapping an abstract RDMA transport to a real implementation 
> will always involve arbitrary transport-dependent choices.

The kDAPL API *is* transport neutral. This has been demonstrated at 
several interoperability tests at which the same applications were run 
on both IB and iWARP.

kDAPL isn't the only transport neutral networking API. The Sockets API 
supports UDP and TCP transports via the same interface. 

I believe we are very close to reaching agreement on a transport 
neutral RDMA connection API. Comparing your API proposal to the API 
that we proposed at the BOF, they are very similar. The most important 
similarity is that both use IP addressing. 

The only real point of debate is over how to perform the address 
translation (IP <-> GID) on IB. I believe we should separate that from 
the API discussion. 


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Roland Dreier
Sean> Another possibility could be to add a list of receives to
Sean> rdma_connect().

Guy> I added this to both connect and accept calls

I don't think this is a good idea.  Let's try to streamline the
connect call, not add every single possible feature to it.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler
The data required when doing a qp-modify-to-rts is inherently
transport specific. IB requires a set of data obtained from the
IB CM protocol (or the equivalent data through application specific
black magic), while iWARP requires a handle for a TCP connection
(assumed to be a socket, but not explicitly required to be so).

The problem is that when the RDMAC specified the iWARP modify qp
to RTS behaviour they did not foresee the non-technical barriers
to simply using a socket handle to specify transfer of ownership
of a TCP connection from one stack to another.
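
The transport-specific inputs described above could be modelled as a
tagged union. This is only a sketch: enum rdma_transport, struct
rts_params and rts_params_valid() are hypothetical names, and the IB
members shown (remote QPN, starting PSN, remote LID) are an illustrative
subset of what the CM exchange and path resolution provide, not a
complete list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport tag; not an existing OpenIB type. */
enum rdma_transport {
    RDMA_TRANSPORT_IB,
    RDMA_TRANSPORT_IWARP
};

/* Hypothetical per-transport inputs for the final transition to RTS. */
struct rts_params {
    enum rdma_transport transport;
    union {
        struct {
            uint32_t remote_qpn;  /* learned from the IB CM exchange */
            uint32_t start_psn;   /* starting packet sequence number */
            uint16_t remote_lid;  /* from the resolved path */
        } ib;
        struct {
            int tcp_handle;       /* established TCP connection, e.g. a socket */
        } iwarp;
    } u;
};

/* Return 1 if the parameters look complete enough to attempt RTS. */
static int rts_params_valid(const struct rts_params *p)
{
    assert(p != NULL);
    switch (p->transport) {
    case RDMA_TRANSPORT_IB:
        return p->u.ib.remote_qpn != 0 && p->u.ib.remote_lid != 0;
    case RDMA_TRANSPORT_IWARP:
        return p->u.iwarp.tcp_handle >= 0;
    }
    return 0;
}
```

A transport-neutral connect call would fill in one arm of the union
internally, so the consumer never has to supply either set of values.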
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Thursday, August 25, 2005 7:54 AM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> 
> 
> On Wed, 24 Aug 2005, Roland Dreier wrote:
> 
> > Sean> Is the idea that the user calls connect() and then receives
> > Sean> a single callback indicating that the connection has been
> > Sean> established?  If so, then the user may need to modify the QP
> > Sean> to the INIT state, which would require some knowledge
> > Sean> already of the path.  We would also need to be clear on
> > Sean> whether the QP is expected to be in the INIT state before
> > Sean> connect is called, or if it could be in any arbitrary state.
> > Sean> The other alternative is to provide multiple callbacks
> > Sean> during connection establishment.
> > 
> > To me it makes sense for the generic CM API to be defined 
> so that an 
> > IB QP must be in the INIT state before being passed to connect().
> 
> Will the ib_modify_qp() function be made transport neutral? I 
> see some fields in the ib_qp_attr structure that are IB specific.
> 
> I think the RDMA connection API should perform all the QP 
> state transitions for the ULP. How about a new call to create 
> the QP and perform all QP state transitions necessary for the 
> posting receive work requests?


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini


On Wed, 24 Aug 2005, Roland Dreier wrote:

> Sean> Is the idea that the user calls connect() and then receives
> Sean> a single callback indicating that the connection has been
> Sean> established?  If so, then the user may need to modify the QP
> Sean> to the INIT state, which would require some knowledge
> Sean> already of the path.  We would also need to be clear on
> Sean> whether the QP is expected to be in the INIT state before
> Sean> connect is called, or if it could be in any arbitrary state.
> Sean> The other alternative is to provide multiple callbacks
> Sean> during connection establishment.
> 
> To me it makes sense for the generic CM API to be defined so that an
> IB QP must be in the INIT state before being passed to connect().

Will the ib_modify_qp() function be made transport neutral? I see some 
fields in the ib_qp_attr structure that are IB specific.

I think the RDMA connection API should perform all the QP state 
transitions for the ULP. How about a new call to create the QP and 
perform all QP state transitions necessary for the posting receive 
work requests?


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler
Good point. But that's about wire behavior, not what an application sees.

And yes, the RDMA device must behave as though its IP layer
were part of the host stack. That is a strong argument for
standardizing many of those interactions rather than relying
on fully compliant parallel processing.
 

-Original Message-
From: Christoph Hellwig [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 25, 2005 1:52 AM
To: Caitlin Bestler
Cc: Christoph Hellwig; openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

On Wed, Aug 24, 2005 at 02:22:31PM -0700, Caitlin Bestler wrote:
> Not if the host connects two disjoint networks and does not route 
> between them. Such a host should/may be configured to reject any 
> packet that arrives with a destination address that does not match the 
> expected destination address for the port it arrives upon.

While you can configure a Linux system to reject such requests through a bunch
of crude hacks, the default and fully RFC compliant behaviour is to always
reply to ARP requests for any IP address assigned to the system.  RDMA CM
implementations must work the same.




RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Guy German
On Wed, 2005-08-24 at 18:28 -0700, Sean Hefty wrote:
> Another possibility could be to add a list of receives to rdma_connect().

I added this to both connect and accept calls

Guy




Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig
On Wed, Aug 24, 2005 at 02:22:31PM -0700, Caitlin Bestler wrote:
> Not if the host connects two disjoint networks and does not route
> between them. Such a host should/may be configured to reject any
> packet that arrives with a destination address that does not match
> the expected destination address for the port it arrives upon.

While you can configure a Linux system to reject such requests through
a bunch of crude hacks, the default and fully RFC compliant behaviour
is to always reply to ARP requests for any IP address assigned to the
system.  RDMA CM implementations must work the same.


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig
On Wed, Aug 24, 2005 at 02:15:09PM -0700, Roland Dreier wrote:
> Roland> Well, that's not what I would expect.  Suppose I have a
> Roland> device configured with local addresses 192.168.11.12 and
> Roland> 192.168.98.99 and I
> 
> Christoph> You never configure a device with local addresses.  IP
> Christoph> addresses are always a per-host attribute in Linux.
> 
> I don't think this is really true.  In some ways Linux behaves as if
> IP addresses are per-host (eg ARP responses can go out any interface)
> but really IP addresses are attached to an interface.  Every struct
> net_device has a struct in_device, and every struct in_device has a
> list of struct in_ifaddrs for the device's IP addresses.

This is correct, but the user-visible effect is what I said above.
When you do an ARP query for any of the IP addresses of a Linux box
you'll get a response even if that interface isn't on the network.

Even if you don't think that's enough you can assign any number of
IP and other networking addresses to a given device even formally,
rendering the notion of an IP address <-> network device relation
rather moot.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>With this in mind, I believe that the connection API needs to be
>something more like the following:
>
>rdma_resolve_address():
>inputs: dest IP address, qos, npaths,
>done callback, opaque context
>   done callback params: status, local RDMA device,
>RDMA transport address, context
...
>rdma_connect():
>inputs: local QP, RDMA transport address, destination service,
>private data, timeout, event callback, opaque context

Have we agreed that this is the functionality that we should be aiming towards?

>rdma_resolve_address(...);
>/* wait for resolution */
>ib_create_qp(...) /* use device pointer we got from rdma_resolve_address()
>*/

We need to insert in here: 

ib_modify_qp(...);  /* somehow uses address resolution... */
ib_post_recvs(...);

>rdma_connect(...); /* pass transport address we got from
>rdma_resolve_address() */
>/* wait for connection to finish... */

Another possibility could be to add a list of receives to rdma_connect().
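
The ordering constraint in the sequence above (resolve, move the QP to
INIT, post receives, then connect) can be sketched as a small state
check. Everything here is a mock: struct mock_qp and the mock_*()
functions are hypothetical stand-ins, not the ib_*() verbs or the
proposed rdma_*() calls.

```c
#include <assert.h>
#include <stddef.h>

enum mock_qp_state { MOCK_QP_RESET, MOCK_QP_INIT, MOCK_QP_CONNECTING };

struct mock_qp {
    enum mock_qp_state state;
    int posted_recvs;
};

/* Stand-in for the RESET -> INIT transition, which needs information
 * (port, pkey index) that only address resolution can provide. */
static int mock_modify_to_init(struct mock_qp *qp, int have_resolved_addr)
{
    assert(qp != NULL);
    if (!have_resolved_addr || qp->state != MOCK_QP_RESET)
        return -1;
    qp->state = MOCK_QP_INIT;
    return 0;
}

/* Stand-in for posting a receive: rejected while the QP is in RESET. */
static int mock_post_recv(struct mock_qp *qp)
{
    assert(qp != NULL);
    if (qp->state == MOCK_QP_RESET)
        return -1;
    qp->posted_recvs++;
    return 0;
}

/* Stand-in for the connect call: requires a QP already in INIT. */
static int mock_connect(struct mock_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != MOCK_QP_INIT)
        return -1;
    qp->state = MOCK_QP_CONNECTING;
    return 0;
}
```

Passing a list of receives into the connect call would fold the second
and third steps into one, at the cost of a wider connect interface.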

- Sean




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv
> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 7:29 PM
> To: Yaron Haviv
> Cc: James Lentini; Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
> 
> 
> Yaron, has anyone raised all this in the IBTA WG?
> 

I raised it about a year ago, but didn't really follow up on it.
At the time the IBTA was also busy with other, more urgent stuff (verb ext..)
We are working with a few key IBTA members to re-surface it with the need
for an abstract CM.

See the following text that was proposed (a year ago, as is).
It is slightly different from your proposal but can be altered if needed.

It basically uses the SDP header and marks one of the fields with 01 (FlowC)
to indicate it's not SDP; this way even SDP can use it.
It also covers a nice idea raised by MS & SUN to extend SDP to accept
PUT & GET operations for RDMA, so you can get a BSD-like API with a few
additional calls rather than a totally new API like DAPL.


Establishing TCP/iWarp-like connections over InfiniBand
=======================================================

 In order to emulate an iWarp connection, it is required to open an
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to RDMA mode; this requires exchanging receiver buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the
 non-relevant SDP fields are marked Reserved.

 iWarp emulation CM Request (Hello) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|  MID  | Rsvd  | bufs  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|  len  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|   Reserved|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|   Reserved|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |   Reserved|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|  DesRemRcvSz  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|  LocalRcvSz   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32| Local Port|   Reserved|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|   Src IP (127-96) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|   Src IP ( 95-64) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|   Src IP ( 63-32) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|   Src IP ( 31-00) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 52|   Dst IP (127-96) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 56|   Dst IP ( 95-64) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 60|   Dst IP ( 63-32) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 64|   Dst IP ( 31-00) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Figure 1 CM Hello private data structure   
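
 A C view of the Hello private data, as a sketch only. The diagram above
 does not show field widths unambiguously, so the widths below are
 assumptions inferred from the row order and from reading the row labels
 as ending byte offsets (64 bytes in total, Src IP starting at byte 32
 and Dst IP at byte 48). struct emu_hello, its member names and
 emu_hello_pack_nibbles() are hypothetical. On the wire, multi-byte
 fields would be in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout: one 32-bit row of the diagram per group of members. */
struct emu_hello {
    uint8_t  mid;             /* row 04: MID (assumed 8 bits) */
    uint8_t  rsvd0;           /*         Rsvd (assumed 8 bits) */
    uint16_t bufs;            /*         bufs (assumed 16 bits) */
    uint32_t len;             /* row 08 */
    uint32_t rsvd1[2];        /* rows 12 and 16: Reserved */
    uint8_t  maj_min_ver;     /* row 20: MajVer/MinVer (assumed nibbles) */
    uint8_t  ipver_flowc;     /*         IPVer/FlowC (assumed nibbles) */
    uint16_t rsvd2;           /*         Reserved */
    uint32_t des_rem_rcv_sz;  /* row 24 */
    uint32_t local_rcv_sz;    /* row 28 */
    uint16_t local_port;      /* row 32: Local Port (assumed 16 bits) */
    uint16_t rsvd3;           /*         Reserved */
    uint8_t  src_ip[16];      /* rows 36-48 */
    uint8_t  dst_ip[16];      /* rows 52-64 */
};

/* Pack two 4-bit values into one byte, high nibble first. */
static uint8_t emu_hello_pack_nibbles(uint8_t hi, uint8_t lo)
{
    assert(hi < 16 && lo < 16);
    return (uint8_t)((hi << 4) | lo);
}
```

 Under these assumptions a listener that filters on the destination
 address would compare the 16 bytes starting at offset 48.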
  

 iWarp emulation CM Response (HelloReply) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|  MID  | Rsvd  | bufs  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 1:58 PM
> 
> On Wed, 24 Aug 2005, Fab Tillier wrote:
> 
> > > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, August 24, 2005 11:03 AM
> > >
> > > Fab> Why can't the IPV field be ignored?  If a listen wants only
> > > Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> > > Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> > > Fab> address, and would set the offset to that of the hello
> > > Fab> message's destination address (32).
> > >
> > > Yes, you're right for SDP.  I guess if we're comfortable mandating
> > > that all protocols put their source and destination IPs in the private
> > > data for the IB case, then this works.  Of course it's somewhat
> > > awkward to pass this information into the transport-neutral CM API but
> > > I think this can be worked around.
> >
> > I don't know if we need to mandate IP usage - it's up to the
> > application.  Any application that wants to have similar semantics
> > to the way socket listens work (especially when bound to one of
> > multiple IP addresses on a port) the application would have to
> > define its private data to accommodate this.
> >
> >  At the IB level, the contents of the private data are still opaque,
> > even to the CM.  The CM would only expose the ability to have it
> > perform an initial triage of requests by doing binary comparisons
> > over regions of private data.  It doesn't know (or need to know)
> > what the data represents - it only cares about finding a match (or
> > not).  The CM doesn't define any sort of policy here, and I don't
> > think it should.  It's just bytes to the CM, and it's doing a blind
> > comparison without interpreting the contents.
> 
> You need to consider what makes sense for *both* ib and iwarp. Keep in
> mind that the correct API will allow a consumer to use ib and iwarp
> devices transparently. In other words there will be one code path that
> supports both.

I believe using the private data makes the most sense from the IB perspective.
One could even argue that it is the only way to provide positive "getpeername"
functionality.  Use of the IB private data does not require identical use of
private data in other technologies.
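
The listen-side triage described in the quoted text (a blind binary
comparison over a region of private data) might look like the sketch
below. struct pdata_filter and pdata_filter_match() are hypothetical
names, not part of the existing CM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listen filter: compare `len` bytes of the incoming
 * private data, starting at `offset`, against `pattern`. */
struct pdata_filter {
    size_t offset;
    size_t len;
    const uint8_t *pattern;
};

/* Return 1 if the private data matches the filter, else 0. The bytes
 * are compared blindly; their meaning is never interpreted. */
static int pdata_filter_match(const struct pdata_filter *f,
                              const uint8_t *pdata, size_t pdata_len)
{
    assert(f != NULL && pdata != NULL);
    /* Reject filters that would read past the end of the private data. */
    if (f->offset > pdata_len || f->len > pdata_len - f->offset)
        return 0;
    return memcmp(pdata + f->offset, f->pattern, f->len) == 0;
}
```

A listen bound to a single IPv4 address would then use a 16-byte pattern
with 12 zero bytes followed by the 4 address bytes, with the offset set
to that of the destination address in the hello message.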

> If we were to adopt your proposal, the consumer would need to perform
> unnecessary operations on iWARP.

It doesn't have to impact the client if there's some intermediate abstraction to
isolate the client from the IB CM details (including private data use).

> A transport neutral client would be forced to put IP information into
> its CM private data on iWARP.
> 
> Likewise, a transport neutral server would be forced to pass a
> private data offset and binary blob to the listen API call on iWARP.
> 
> Neither of these make sense.

A higher-level CM abstraction could implement the policy of private data use
when running on IB without the client's involvement.  The end result still is
that you end up with a wire protocol that needs to be documented so that someone
without that exact CM abstraction knows where and how to format the private data
as well as how to interpret it.  If the IBTA defines something like this, all
these issues go away.  I don't know if the IBTA can define this without
affecting existing protocols like SDP and iSER that already define how to
encapsulate the source and destination information in the private data.

Using the private data, either by the client or some IB-specific CM abstraction,
will remove the need for any reverse lookups.  A forward lookup to validate the
incoming source GID to the source IP in the private data can validate the IP
address.  Performing a forward lookup via ARP is going to be a lot faster than
ATS if the ARP entry already exists.  On large fabrics, ARP is also going to
scale better since there's not one single entity responsible for responding to
every node's requests. 
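
The forward-lookup check described above can be sketched as follows,
assuming a simple in-memory neighbour table standing in for the ARP
cache. struct neigh_entry, neigh_lookup() and source_ip_is_valid() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical neighbour-cache entry: IPv4 address -> IB GID. */
struct neigh_entry {
    uint32_t ip;
    uint8_t gid[16];
};

/* Forward lookup: return the cached entry for an IP, or NULL on a miss.
 * A real stack would issue an ARP query on a miss. */
static const struct neigh_entry *neigh_lookup(const struct neigh_entry *tbl,
                                              size_t n, uint32_t ip)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].ip == ip)
            return &tbl[i];
    return NULL;
}

/* Return 1 if the source IP claimed in the private data resolves to the
 * GID that the connection request actually arrived from, else 0. */
static int source_ip_is_valid(const struct neigh_entry *tbl, size_t n,
                              uint32_t claimed_ip, const uint8_t *req_gid)
{
    const struct neigh_entry *e;

    assert(tbl != NULL && req_gid != NULL);
    e = neigh_lookup(tbl, n, claimed_ip);
    if (e == NULL)
        return 0;
    return memcmp(e->gid, req_gid, sizeof(e->gid)) == 0;
}
```

No reverse (GID to IP) query is needed: the claimed IP comes from the
private data and only has to be confirmed against the source GID.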

> These API problems are secondary to the burden you would be placing on
> the protocols. As has been mentioned in a previous email, extending
> the current protocols to use this convention will require further
> standardization and in some cases may not be compatible with their
> current architecture.

I think biting the bullet now on establishing these standards for applications
using IP addressing over IB, whether in the IBTA or in each application, is
going to give us the best long term result.

- Fab



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Yaron> The current implementation may not use the private data
Yaron> field (since it's not critical/mandatory) but the intention
Yaron> is to add it to address multi homed hosts, we would like to
Yaron> push such a definition into IBTA so every IP oriented ULP
Yaron> can use it, several people expressed interest in such a
Yaron> definition, this can also support NFS/RDMA or any other IP
Yaron> based ULP.

Strange as it may seem, I agree completely with Yaron ;)

It would make perfect sense to take a couple of the reserved bits in
the CM REQ format and turn them into an "IP address present" field (a
couple of bits so we can distinguish between v4 and v6).  When this
field is set, then the first (or last, or whatever) 32 bytes of the
private data would hold the source and destination IP address.
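
A minimal sketch of that idea, under stated assumptions: the flag values,
the choice of the first 32 bytes, and the placement of an IPv4 address in
the last 4 bytes of each zero-padded 16-byte slot are all assumptions
made here for illustration; none of this is defined by the IBTA. The
names enum ip_present, req_ip_encode_v4() and req_ip_decode_v4() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for a two-bit "IP address present" field. */
enum ip_present {
    IP_PRESENT_NONE = 0,
    IP_PRESENT_V4 = 1,
    IP_PRESENT_V6 = 2
};

#define REQ_IP_LEN 32  /* 16 bytes source + 16 bytes destination */

/* Write an IPv4 address into the last 4 bytes of a zeroed 16-byte slot. */
static void put_v4(uint8_t *slot, uint32_t ip)
{
    memset(slot, 0, 12);
    slot[12] = (uint8_t)(ip >> 24);
    slot[13] = (uint8_t)(ip >> 16);
    slot[14] = (uint8_t)(ip >> 8);
    slot[15] = (uint8_t)ip;
}

/* Read an IPv4 address back from the last 4 bytes of a 16-byte slot. */
static uint32_t get_v4(const uint8_t *slot)
{
    return ((uint32_t)slot[12] << 24) | ((uint32_t)slot[13] << 16) |
           ((uint32_t)slot[14] << 8) | (uint32_t)slot[15];
}

/* Fill the first REQ_IP_LEN bytes of private data; returns the flag
 * value the sender would place in the REQ. */
static enum ip_present req_ip_encode_v4(uint8_t *pdata,
                                        uint32_t src, uint32_t dst)
{
    assert(pdata != NULL);
    put_v4(pdata, src);
    put_v4(pdata + 16, dst);
    return IP_PRESENT_V4;
}

/* Returns 0 and fills src/dst if the flag says IPv4 is present, else -1. */
static int req_ip_decode_v4(enum ip_present flag, const uint8_t *pdata,
                            uint32_t *src, uint32_t *dst)
{
    assert(pdata != NULL && src != NULL && dst != NULL);
    if (flag != IP_PRESENT_V4)
        return -1;
    *src = get_v4(pdata);
    *dst = get_v4(pdata + 16);
    return 0;
}
```

A receiver that sees the flag clear treats all of the private data as
opaque ULP bytes, so existing protocols would be unaffected.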

Having this standardized also gives us the ability to deal with the
concerns around connections initiated in userspace.  The kernel proxy
for the user CM can make sure that any REQs sent with the "IP address
present" field set actually has an IP assigned to the local system.
Remote systems would still need to treat CM messages from QPs other
than QP 1 as untrusted.

Of course for real security some stronger authentication is needed in
any case (even in the iWARP case the source IP can't be trusted; an
attacker could DOS the real owner of the IP, flood the switches' MAC
tables so it becomes a hub, and then take over any IP it wants).

The only unfortunate thing about all this is that the SDP Hello
message format is already frozen, and it seems a little too
specialized for generic use (eg we don't want a "Max Zcopy
Advertisements" field).

Yaron, has anyone raised all this in the IBTA WG?

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv
> -Original Message-
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 5:51 PM
> To: Yaron Haviv
> Cc: Roland Dreier; openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
> 
> 
> 
> Which draft contains this? I found
> 
> http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt
> 

James,

You should look at :
http://www.haifa.il.ibm.com/satran/ips/draft-ietf-ips-iser-05-candidate.txt

The 05 rev really adds all the InfiniBand-related stuff.
You can see how the association between IB & IP is done using IPoIB.

The current implementation may not use the private data field (since it's
not critical/mandatory), but the intention is to add it to address
multi-homed hosts. We would like to push such a definition into the IBTA so
every IP-oriented ULP can use it. Several people have expressed interest in
such a definition, and it can also support NFS/RDMA or any other IP-based ULP.


> but the HELLO header in section 9.3 does not contain any IP address
> information.
> 
> > I believe it can be a good idea to use the same approach for
> > NFS/RDMA and eliminate the need for reverse ATS lookup (there may be
> > some conflicts when multiple IPs exist per node). We may just use
> > the SDP hello header as is, with unused fields zeroed. This will allow
> > all ULPs to use the same mechanism.
> 
> NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is
> that this could not be easily accommodated in the current standards
> for that reason.

Not sure why that is the case. If we add an IBTA definition of a CM
exchange for IP-based ULPs (i.e. send src/dst IP and optionally ports),
you can have an NFS/RDMA spec that doesn't need any IB/iWarp-specific
definitions, since the differences are pushed down to the IBTA.

In the case of NFS/RDMA over another (non-IB, non-iWarp) transport you can
specify that providing the IP addressing is the responsibility of the
underlying transport.

Yaron



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker
 

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 24, 2005 4:03 PM
> To: Tom Tucker
> Cc: Sean Hefty; Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> Tom> The issue is that this connection will be established when
> Tom> the server may only want to accept requests that are
> Tom> targeted to the 10.10.1.1 address.  I don't get why this is
> Tom> such a big deal. You can preclude this behavior by simply
> Tom> keeping a one to one mapping between the IPv4 addresses and
> Tom> the GIDs using the existing protocols and without mandating a
> Tom> private data format across *all* ulps and transports.
> 
> Well, a few problems with what you say:
> 
>  - ATS does not help at all with the case of a multi-homed interface.
>Unless the remote system puts the IP it's trying to connect to
>somewhere in the connection request, there is no way to be psychic
>and recover this information.

I thought a single HCA could have multiple GIDs. All
I'm advocating is that a "correct" multi-homed configuration
has a one-to-one mapping between its IP addresses and its GIDs.

> 
>  - Mandating ATS use is dictating protocol design just as much as
>requiring the CM private data to carry source and destination IP
>addresses.

I think ATS dictates the kinds of authentication that can be 
done by the server over an IB transport, but not the protocol 
design. Certainly the private data can have additional 
authentication data (which I think is what you're advocating). 

> 
>  - It's not just preventing connections to the wrong local address.
>NFS-RDMA wants the remote source address (ie getpeername()) so that
>it can look it up in the exports list.

Agreed. But you could also get rid of ATS by allowing GIDs to 
be specified in the exports file and then treating them like 
IPv6 addresses for the purpose of subnet comparisons.

> 
>  - Saying that a given GID may only have a single IP address is
>definitely a case of the cure being worse than the disease.  I
>don't think we can forbid perfectly valid multi-homed
>    configurations just because it's inconvenient for us to support them.

I think our different perspectives come from what 
we consider to be "perfectly valid multi-homed configurations". 
One approach advocates overloading private data, the other 
advocates overloading address assignments. 

My approach suffers from the fact that multiple IP addresses
for the same GID are just aliases that are interchangeable and at 
the remote end indistinguishable. The private data approach 
suffers from the need to mandate private data formats across 
all ulps and transports.

I prefer the former limitation/cost. 

> 
> By the way, as far as I can tell, there is NO formal 
> documentation of the NFS-RDMA wire protocol.  The current 
> draft (draft-ietf-nfsv4-rpcrdma-01.txt) simply says:
> 
>  This protocol is designed to function with equivalent semantics
>  over all appropriate RDMA transports.  In its abstract form, this
>  protocol does not implement RDMA directly. [...]  It therefore
>  becomes a useful, implementable standard when mapped onto a
>  specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].
> 
>  [...]
> 
>  In setting up a new RDMA connection, the first action by an RPC
>  client will be to obtain a transport address for the server.  The
>  mechanism used to obtain this address, and to open an RDMA
>  connection is dependent on the type of RDMA transport, and outside
>  the scope of this protocol.
> 
> So it seems perfectly reasonable and acceptable for the 
> mapping of NFS-RDMA onto IB to specify that the source and 
> destination IP addresses for an IB connection are placed in 
> the CM private data.
> This seems much easier than trying to turn ATS into an IETF standard.
> 
>  - R.
> 

I think there is a way to get rid of ATS as I described above without 
overloading the private data.

Phew -- I'm exhausted. I'm going to go write code ;-)
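
For illustration, the exports-file idea above (treating GIDs like IPv6
addresses for the purpose of subnet comparisons) could look roughly like
the sketch below. The function name addr128_in_subnet and the prefix-length
parameter are assumptions made for this example only; nothing like this is
defined in the thread or in any exports parser mentioned here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the 128-bit address `addr` (a GID or
 * an IPv6 address) falls inside the subnet `net`/`prefix_len`, else 0.
 */
static int addr128_in_subnet(const uint8_t addr[16], const uint8_t net[16],
			     unsigned prefix_len)
{
	unsigned full = prefix_len / 8;	/* whole bytes covered by the prefix */
	unsigned rem = prefix_len % 8;	/* leftover bits in the next byte */
	uint8_t mask;

	if (prefix_len > 128)
		return 0;
	if (memcmp(addr, net, full) != 0)
		return 0;
	if (rem == 0)
		return 1;

	/* Compare only the top `rem` bits of the partially covered byte. */
	mask = (uint8_t)(0xff << (8 - rem));
	return (addr[full] & mask) == (net[full] & mask);
}
```

An exports entry would then carry a 16-byte network value and a prefix
length, and the server would compare the peer's GID against it the same
way it compares an IPv6 source address today.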




Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Roland> No, I think we just need to realize that a perfectly
Roland> transport neutral protocol implementation is not
Roland> achievable.  It's unfortunate that kDAPL fooled people by
Roland> hiding the details of the wire protocol under a supposedly
Roland> "neutral API," but the fact is that mapping an abstract
Roland> RDMA transport to a real implementation will always
Roland> involve arbitrary transport-dependent choices.

Further: if we would be willing to say that transport-neutral
protocols must use a "kDAPL wire protocol," then there's no problem in
defining that wire protocol to put the source and destination IP
address somewhere in the CM private data.  The current "kDAPL wire
protocol" happens to use ATS to try and achieve this (although it
doesn't handle the multi-homed case), but that is no more and no less
of an arbitrary protocol design choice.

So in a nutshell, my objection to using ATS is that it is an arbitrary
design choice that doesn't work as well as other equally valid choices.

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
James> NFS/RDMA is not specific to iWARP or InfiniBand. My
James> understanding is that this could not be easily accommodated
James> in the current standards for that reason.

Yes, it seems that there will need to be some additional NFS/RDMA
drafts describing the iWARP and IB wire protocols before the standard
is complete.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
The requirement is solely that System Administrators
for each host directly attached to Network X agree on the
basic addressing characteristics for Network X.

This onerous challenge is successfully overcome on every
IP subnet in the world every day for such details as
what the subnet is, what the mask is, etc. Further,
two adjoining subnets won't be able to talk unless
their administrators have arranged for them to agree
on what their network identifiers are, etc.

For the specific question it is even less of a
problem than theory suggests. A rule such as "non
IPv4 subnets are direct translated while IPv4 subnets
use IPv4" is actually quite simple to implement.
That could even be extended to allow *some* IPv6
subnets to be translated so that multiple IPv6
aliases for a single GID could be identified
(that is, if anyone has a need for such a thing).
 

-Original Message-
From: Roland Dreier [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 24, 2005 2:45 PM
To: Caitlin Bestler
Cc: Roland Dreier; James Lentini; openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

Caitlin> So with this wealth of options available, do you agree
Caitlin> that there is no reason to elevate any of these issues to
Caitlin> being visible to a transport neutral application?

No -- the fact that there are a wealth of options actually means that picking
one is an arbitrary choice we impose on transport neutral implementations and
is de facto mandating a wire protocol.

 - R.




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini

On Wed, 24 Aug 2005, Yaron Haviv wrote:

> > On Tue, 23 Aug 2005, Roland Dreier wrote:
> > 
> > > It would be possible to have another function like
> > > rdma_getpeername() that takes the transport address and returns
> > > a source IP address.  In the IB case this would do an ATS
> > > reverse lookup.  However, I hate this idea.  iSER already uses
> > > the CM private data to pass the source IP in the IB case,
> >  
> > I know this is how IB SDP works, but I don't think iSER works this
> > way.
> >
> > The code in the tree calls dat_ep_connect() with a NULL private 
> > data pointer.
> >
> > There is an iSER HELLO message described in iser_header.h that contains
> > IP addresses, but I'm not certain that this is part of the current
> > protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused). 
> 
> James,
> 
> iSER doesn't mandate the source IP in general since it's doing a much
> stronger authentication during Login.
> However, we believe using a similar header to SDP can help the Passive
> side:
> a. know which destination IP was targeted (in a multi-homed environment)
> b. validate the source (for implementations that want to, for some
> reason)
> 
> That's why the draft suggested adding the source/dst IP in the private
> data just like SDP does, 

Which draft contains this? I found

http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt

but the HELLO header in section 9.3 does not contain any IP address 
information.

> I believe it can be a good idea to use the same approach for 
> NFS/RDMA and eliminate the need for reverse ATS lookup (that may have 
> some conflicts when multiple IPs exist per node). We may just use 
> the SDP hello header as is with unused fields zeroed. This will allow 
> all ULPs to use the same mechanism.

NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is 
that this could not be easily accommodated in the current standards 
for that reason.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Caitlin> So with this wealth of options available, do you agree
Caitlin> that there is no reason to elevate any of these issues to
Caitlin> being visible to a transport neutral application?

No -- the fact that there are a wealth of options actually means that
picking one is an arbitrary choice we impose on transport neutral
implementations and is de facto mandating a wire protocol.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
I think it would be more accurate to state that DAPL requires the 128-bit
"IA Address space" to be administratively subdivided so that each "subnet"
unambiguously translates to a specific IA reached network and that
translation of the "IA Address" into and from that network's wire protocol
is not visible to the DAT Consumer.

ATS is indeed *one* solution for doing so. Adding RARP to IPoIB would make
for another solution. Direct translation is also a valid solution for IPv6
compatible network IDs.

So with this wealth of options available, do you agree that there is no
reason to elevate any of these issues to being visible to a transport
neutral application?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Wednesday, August 24, 2005 2:31 PM
To: James Lentini
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

Roland> No, I think we just need to realize that a perfectly
Roland> transport neutral protocol implementation is not
Roland> achievable.  It's unfortunate that kDAPL fooled people by
Roland> hiding the details of the wire protocol under a supposedly
Roland> "neutral API," but the fact is that mapping an abstract
Roland> RDMA transport to a real implementation will always
Roland> involve arbitrary transport-dependent choices.

Further: if we would be willing to say that transport-neutral protocols must
use a "kDAPL wire protocol," then there's no problem in defining that wire
protocol to put the source and destination IP address somewhere in the CM
private data.  The current "kDAPL wire protocol" happens to use ATS to try
and achieve this (although it doesn't handle the multi-homed case), but that
is no more and no less of an arbitrary protocol design choice.

So in a nutshell, my objection to using ATS is that it is an arbitrary design
choice that doesn't work as well as other equally valid choices.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Wednesday, August 24, 2005 2:03 PM
To: Tom Tucker
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API



By the way, as far as I can tell, there is NO formal documentation of the
NFS-RDMA wire protocol.  The current draft (draft-ietf-nfsv4-rpcrdma-01.txt)
simply says:

 This protocol is designed to function with equivalent semantics
 over all appropriate RDMA transports.  In its abstract form, this
 protocol does not implement RDMA directly. [...]  It therefore
 becomes a useful, implementable standard when mapped onto a
 specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].

 [...]

 In setting up a new RDMA connection, the first action by an RPC
 client will be to obtain a transport address for the server.  The
 mechanism used to obtain this address, and to open an RDMA
 connection is dependent on the type of RDMA transport, and outside
 the scope of this protocol.

So it seems perfectly reasonable and acceptable for the mapping of NFS-RDMA
onto IB to specify that the source and destination IP addresses for an IB
connection are placed in the CM private data.
This seems much easier than trying to turn ATS into an IETF standard.

 - R.


NFS over RDMA was intended to be implemented using DAPL in a transport
neutral way. Now having the transport layer *add* data before the
private data is legitimate for any specific transport. It would just
have to be defined independently of openib and linux.

Basically, any solution that allows NFS over RDMA to be coded with
the *same* set of kDAPL calls to listen/connect/accept/reject would
be compliant with the intent -- as long as the mapping to wire protocols
was straightforward and allowed non-kDAPL implementations. For example,
mapping the DAPL private data to the IETF MPA Request/Reply frame
Private Data certainly qualifies as "straightforward".




Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
James> You need to consider what makes sense for *both* ib and
James> iwarp. Keep in mind that the correct API will allow a
James> consumer to use ib and iwarp devices transparently. In
James> other words there will be one code path that supports both.

James> If we were to adopt your proposal, the consumer would need
James> to perform unnecessary operations on iWARP.

No, I think we just need to realize that a perfectly transport neutral
protocol implementation is not achievable.  It's unfortunate that
kDAPL fooled people by hiding the details of the wire protocol under a
supposedly "neutral API," but the fact is that mapping an abstract
RDMA transport to a real implementation will always involve arbitrary
transport-dependent choices.

To use an analogy, the IP layer is mostly insulated from the details
of the L2 transport it's using by the net_device abstraction.
However, there are a few things that require code like:

int arp_mc_map(u32 addr, u8 *haddr, struct net_device *dev, int dir)
{
	switch (dev->type) {
	case ARPHRD_ETHER:
	case ARPHRD_FDDI:
	case ARPHRD_IEEE802:
		ip_eth_mc_map(addr, haddr);
		return 0;
	case ARPHRD_IEEE802_TR:
		ip_tr_mc_map(addr, haddr);
		return 0;
	case ARPHRD_INFINIBAND:
		ip_ib_mc_map(addr, haddr);
		return 0;
	default:
		if (dir) {
			memcpy(haddr, dev->broadcast, dev->addr_len);
			return 0;
		}
	}
	return -EINVAL;
}

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
Not if the host connects two disjoint networks and does not route
between them. Such a host should/may be configured to reject any
packet that arrives with a destination address that does not match
the expected destination address for the port it arrives upon. 

One of the things that iWARP vendors strive for is to ensure that
all such existing filtering/safety rules on accepting connections
are left 100% intact.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christoph Hellwig
Sent: Wednesday, August 24, 2005 2:00 PM
To: Caitlin Bestler
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

On Wed, Aug 24, 2005 at 11:14:08AM -0700, Caitlin Bestler wrote:
> The consensus when this issue was debated in the DAT Collaborative was 
> that there was no transport neutral way to specify a set of addresses 
> to listen on other than "all addresses supported by this device".

That doesn't make any sense at all for iWarp as that uses IP addressing which
in Linux is host-, not device-based.



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Roland> Well, that's not what I would expect.  Suppose I have a
Roland> device configured with local addresses 192.168.11.12 and
Roland> 192.168.98.99 and I

Christoph> You never configure a device with local addresses.  IP
Christoph> addresses are always a per-host attribute in Linux.

I don't think this is really true.  In some ways Linux behaves as if
IP addresses are per-host (eg ARP responses can go out any interface)
but really IP addresses are attached to an interface.  Every struct
net_device has a struct in_device, and every struct in_device has a
list of struct in_ifaddrs for the device's IP addresses.

 - R.
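
To make the per-interface view above concrete, here is a minimal
user-space model of the net_device -> in_device -> in_ifaddr chain. The
model_* types and dev_has_addr() are simplified stand-ins invented for
this sketch; they are not the kernel's definitions, and addresses are kept
in host byte order purely for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct in_ifaddr: one IPv4 address per node. */
struct model_ifaddr {
	struct model_ifaddr *next;
	uint32_t local;			/* host byte order in this model */
};

/* Simplified stand-in for struct in_device: owns the address list. */
struct model_in_device {
	struct model_ifaddr *ifa_list;
};

/* Simplified stand-in for struct net_device. */
struct model_net_device {
	const char *name;
	struct model_in_device *ip;	/* NULL if no IPv4 configuration */
};

/* Returns 1 if `addr` is configured on this particular device, else 0. */
static int dev_has_addr(const struct model_net_device *dev, uint32_t addr)
{
	const struct model_ifaddr *ifa;

	if (dev == NULL || dev->ip == NULL)
		return 0;
	for (ifa = dev->ip->ifa_list; ifa != NULL; ifa = ifa->next)
		if (ifa->local == addr)
			return 1;
	return 0;
}
```

A listen bound to 192.168.11.12 could use a walk like this to decide which
devices it applies to, which is the per-device reading of the address
rather than the per-host one.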


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Tom> The issue is that this connection will be established when
Tom> the server may only want to accept requests that are
Tom> targeted to the 10.10.1.1 address.  I don't get why this is
Tom> such a big deal. You can preclude this behavior by simply
Tom> keeping a one to one mapping between the IPv4 addresses and
Tom> the GIDs using the existing protocols and without mandating a
Tom> private data format across *all* ulps and transports.

Well, a few problems with what you say:

 - ATS does not help at all with the case of a multi-homed interface.
   Unless the remote system puts the IP it's trying to connect to
   somewhere in the connection request, there is no way to be psychic
   and recover this information.

 - Mandating ATS use is dictating protocol design just as much as
   requiring the CM private data to carry source and destination IP
   addresses.

 - It's not just preventing connections to the wrong local address.
   NFS-RDMA wants the remote source address (ie getpeername()) so that
   it can look it up in the exports list.

 - Saying that a given GID may only have a single IP address is
   definitely a case of the cure being worse than the disease.  I
   don't think we can forbid perfectly valid multi-homed
   configurations just because it's inconvenient for us to support them.

By the way, as far as I can tell, there is NO formal documentation of
the NFS-RDMA wire protocol.  The current draft (draft-ietf-nfsv4-rpcrdma-01.txt)
simply says:

 This protocol is designed to function with equivalent semantics
 over all appropriate RDMA transports.  In its abstract form, this
 protocol does not implement RDMA directly. [...]  It therefore
 becomes a useful, implementable standard when mapped onto a
 specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].

 [...]

 In setting up a new RDMA connection, the first action by an RPC
 client will be to obtain a transport address for the server.  The
 mechanism used to obtain this address, and to open an RDMA
 connection is dependent on the type of RDMA transport, and outside
 the scope of this protocol.

So it seems perfectly reasonable and acceptable for the mapping of
NFS-RDMA onto IB to specify that the source and destination IP
addresses for an IB connection are placed in the CM private data.
This seems much easier than trying to turn ATS into an IETF standard.

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Christoph Hellwig
On Wed, Aug 24, 2005 at 11:14:08AM -0700, Caitlin Bestler wrote:
> The consensus when this issue was debated in the DAT Collaborative was
> that there was no transport neutral way to specify a set of addresses to
> listen on other than "all addresses supported by this device".

That doesn't make any sense at all for iWarp as that uses IP addressing
which in Linux is host-, not device-based.



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Christoph Hellwig
On Wed, Aug 24, 2005 at 09:26:42AM -0700, Roland Dreier wrote:
> Tom> I think I understand, but the purpose of specifying the IP
> Tom> address in the listen is not to filter incoming connect
> Tom> requests, but rather to determine which devices I listen
> Tom> on. I think this works for the IB case as well. So the
> Tom> utility of the IP address specified in the listen is only to
> Tom> determine which devices the sid is created on. Does this make
> Tom> sense or am I missing something?
> 
> Well, that's not what I would expect.  Suppose I have a device
> configured with local addresses 192.168.11.12 and 192.168.98.99 and I

You never configure a device with local addresses.  IP addresses are
always a per-host attribute in Linux.



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


On Wed, 24 Aug 2005, Fab Tillier wrote:

> > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 24, 2005 11:03 AM
> > 
> > Fab> Why can't the IPV field be ignored?  If a listen wants only
> > Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> > Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> > Fab> address, and would set the offset to that of the hello
> > Fab> message's destination address (32).
> > 
> > Yes, you're right for SDP.  I guess if we're comfortable mandating
> > that all protocols put their source and destination IPs in the private
> > data for the IB case, then this works.  Of course it's somewhat
> > awkward to pass this information into the transport-neutral CM API but
> > I think this can be worked around.
> 
> I don't know if we need to mandate IP usage - it's up to the 
> application.  Any application that wants to have similar semantics 
> to the way socket listens work (especially when bound to one of 
> multiple IP addresses on a port) the application would have to 
> define its private data to accommodate this.
>
>  At the IB level, the contents of the private data are still opaque, 
> even to the CM.  The CM would only expose the ability to have it 
> perform an initial triage of requests by doing binary comparisons 
> over regions of private data.  It doesn't know (or need to know) 
> what the data represents - it only cares about finding a match (or 
> not).  The CM doesn't define any sort of policy here, and I don't 
> think it should.  It's just bytes to the CM, and it's doing a blind 
> comparison without interpreting the contents.

You need to consider what makes sense for *both* ib and iwarp. Keep in 
mind that the correct API will allow a consumer to use ib and iwarp 
devices transparently. In other words, there will be one code path that 
supports both.

If we were to adopt your proposal, the consumer would need to perform 
unnecessary operations on iWARP.

A transport neutral client would be forced to put IP information into 
its CM private data on iWARP.

Likewise, a transport neutral server would be forced to pass a 
private data offset and binary blob to the listen API call on iWARP.

Neither of these makes sense.

These API problems are secondary to the burden you would be placing on 
the protocols. As has been mentioned in a previous email, extending 
the current protocols to use this convention will require further 
standardization and in some cases may not be compatible with their 
current architecture.
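
For reference, the private-data comparison Fab describes in the quoted
text (a 16-byte compare buffer, first 12 bytes zero, IPv4 address in the
last 4, matched at offset 32) amounts to something like the sketch below.
The struct pdata_filter type and both function names are hypothetical and
exist only for this illustration; no such CM interface is defined in the
thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical listen-side filter: compare `len` bytes of the REQ private
 * data starting at `offset` against `pattern`.  The bytes are opaque to
 * whoever performs the comparison.
 */
struct pdata_filter {
	size_t offset;
	size_t len;
	uint8_t pattern[16];
};

/* Build the filter described for an IPv4 listen: 12 zero bytes followed
 * by the 4 address bytes, matched at offset 32 of the private data. */
static void pdata_filter_ipv4(struct pdata_filter *f, const uint8_t ip[4])
{
	memset(f, 0, sizeof(*f));
	f->offset = 32;
	f->len = 16;
	memcpy(&f->pattern[12], ip, 4);
}

/* Returns 1 if the private data matches the filter, 0 otherwise. */
static int pdata_match(const struct pdata_filter *f,
		       const uint8_t *pdata, size_t pdata_len)
{
	if (f->offset + f->len > pdata_len)
		return 0;	/* private data too short to hold the field */
	return memcmp(pdata + f->offset, f->pattern, f->len) == 0;
}
```

The comparison itself stays a blind memcmp(); the objection above is about
who has to supply the offset and the blob, not about the cost of the match.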


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

So the listening server takes the IP address from the private data, uses
AT to get the GID and then compares it to the GID in the connect
request? 

It feels to me like this private data thing is a case of the cure is
worse than the disease. As I understand it, we're trying to avoid the
following:

server:

dev = ib_get_device(10.10.1.1 /* src ip */, 0 /* dest ip */);

/* GID has IP addresses 10.10.1.1, 10.10.1.2 */
ib_listen(dev, 10.10.1.1 /* listen bind address */, 143 /* port */,
          10 /* backlog */);

client:

dev = ib_get_device(0 /* src wildcard */, 10.10.1.2 /* dest ip */);

ib_connect(dev, 0 /* src */, 10.10.1.2 /* dest */, 143 /* port */, ...);


The issue is that this connection will be established when the server
may only want to accept requests that are targeted to the 10.10.1.1
address.  I don't get why this is such a big deal. You can preclude this
behavior by simply keeping a one to one mapping between the IPv4
addresses and the GIDs using the existing protocols and without
mandating a private data format across *all* ulps and transports.

If I'm being painfully stupid...please feel free to tell me. 

> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 24, 2005 2:12 PM
> To: Tom Tucker; Roland Dreier
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address 
> translation API
> 
> >Because it would be better to configure your network "properly". 
> >Putting IP addresses in private data is fundamentally insecure since 
> >any user mode client can spoof the IP address.
> 
> A simple forward lookup could detect this.
> 
> - Sean
> 
> 


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Tom> Isn't this inevitable regardless of whether or not we have a
Tom> transport independent connection API? I thought ATS was
Tom> required by NFS for authentication/authorization. Sorry in
Tom> advance if I'm confused --- again.

Current NFS-RDMA code uses and relies on ATS.  However I hope that we
can fix the NFS-RDMA draft to get rid of this.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

Isn't this inevitable regardless of whether or not we have a transport
independent connection API? I thought ATS was required by NFS for
authentication/authorization. Sorry in advance if I'm confused ---
again.

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 3:27 PM
> To: James Lentini
> Cc: Caitlin Bestler; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> James> I agree with Caitlin. The eventual solution cannot force
> James> protocol modifications in ULPs.
> 
> Does this mean we're stuck with the current use of ATS in NFS-RDMA?
> Surely there's still time to fix the protocol.
> 
>  - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
James> I agree with Caitlin. The eventual solution cannot force
James> protocol modifications in ULPs.

Does this mean we're stuck with the current use of ATS in NFS-RDMA?
Surely there's still time to fix the protocol.

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


On Wed, 24 Aug 2005, Caitlin Bestler wrote:

> On 8/24/05, Fab Tillier <[EMAIL PROTECTED]> wrote:
> > 
> > I think if all ULPs provide their source and destination IP in the 
> > private data, you can eliminate the reverse lookup altogether.  A 
> > simple forward lookup is all that's needed to validate that the 
> > source GID in the REQ matches the reported source IP in the 
> > private data.  The forward lookup could be done via ATS or via 
> > ARP, but the CM doesn't need to care which method is used.
> 
> That is not an option.
> 
> The applications are expecting source/destination network addresses 
> that come from a network layer, not from the peer application. IP 
> has no problem meeting this requirement. This is an IB problem that 
> needs to be solved within the scope of IB without changing any ULPs.

I agree with Caitlin. The eventual solution cannot force protocol 
modifications in ULPs.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Sean> Is the idea that the user calls connect() and then receives
Sean> a single callback indicating that the connection has been
Sean> established?  If so, then the user may need to modify the QP
Sean> to the INIT state, which would require some knowledge
Sean> already of the path.  We would also need to be clear on
Sean> whether the QP is expected to be in the INIT state before
Sean> connect is called, or if it could be in any arbitrary state.
Sean> The other alternative is to provide multiple callbacks
Sean> during connection establishment.

To me it makes sense for the generic CM API to be defined so that an
IB QP must be in the INIT state before being passed to connect().

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>If the connect call succeeds in establishing a connection, the ULP's
>QP should be ready for posting work requests. This simplifies the ULP
>considerably.
>
>The API should not create the QP. That would create race conditions
>for certain protocols. For example, consider a protocol in which the
>first message was a send from the server to the client. To properly
>implement such a protocol, the client must post a receive work request
>before initiating a connection.

Thanks for the clarification.  This is similar to what I was thinking as well.
I guess we should note that in order to post receives to the QP, it at least
needs to be in the INIT state.  Would this be done by the CM abstraction or the
user?  For IB, the following fields need to be set when transitioning to INIT:
enable RDMA, PKey index, and physical port.

Is the idea that the user calls connect() and then receives a single callback
indicating that the connection has been established?  If so, then the user may
need to modify the QP to the INIT state, which would require some knowledge
already of the path.  We would also need to be clear on whether the QP is
expected to be in the INIT state before connect is called, or if it could be in
any arbitrary state.  The other alternative is to provide multiple callbacks
during connection establishment.
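
The ordering questions above can be made concrete with a small model. This is only a sketch, not verbs code: `ModelQP`, `modify_to_init`, `post_recv` and `connect` are hypothetical names, and the attribute names stand in for the three fields listed above (enable RDMA, PKey index, physical port). It assumes the variant in which the caller moves the QP to INIT before calling connect.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class QPState(Enum):
    RESET = auto()
    INIT = auto()


@dataclass
class ModelQP:
    """Toy stand-in for an IB queue pair; not a real verbs object."""
    state: QPState = QPState.RESET
    attrs: dict = field(default_factory=dict)
    recv_queue: list = field(default_factory=list)


# Hypothetical names for the fields needed on the RESET -> INIT transition:
# access_flags ("enable RDMA"), pkey_index, port_num (physical port).
INIT_ATTRS = ("access_flags", "pkey_index", "port_num")


def modify_to_init(qp, **attrs):
    """Move the model QP to INIT, insisting on all three attributes."""
    missing = [name for name in INIT_ATTRS if name not in attrs]
    if missing:
        raise ValueError(f"missing INIT attributes: {missing}")
    qp.attrs.update(attrs)
    qp.state = QPState.INIT


def post_recv(qp, buffer_id):
    """Queue a receive; only allowed once the QP has left RESET."""
    if qp.state is QPState.RESET:
        raise RuntimeError("QP must be at least in INIT to post receives")
    qp.recv_queue.append(buffer_id)


def connect(qp):
    """Assumed contract: the caller hands over a QP already in INIT."""
    if qp.state is not QPState.INIT:
        raise RuntimeError("connect() expects a QP in the INIT state")
    return "connecting"
```

With this contract the consumer needs the port and PKey index before connect(), which is exactly the "some knowledge already of the path" concern; the alternative would be for the CM abstraction to perform `modify_to_init` itself.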

- Sean




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


On Wed, 24 Aug 2005, Sean Hefty wrote:

> I guess that I'd like to clarify what the operation of a connect 
> call would do.  Would it be responsible for modifying the QP?  If 
> so, could such a call also allocate the QP?  Note that I'm not 
> advocating either of these, just trying to determine what the 
> behavior of the API would be.

If the connect call succeeds in establishing a connection, the ULP's 
QP should be ready for posting work requests. This simplifies the ULP 
considerably.

The API should not create the QP. That would create race conditions 
for certain protocols. For example, consider a protocol in which the 
first message was a send from the server to the client. To properly 
implement such a protocol, the client must post a receive work request 
before initiating a connection.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Fab Tillier
> Sent: Wednesday, August 24, 2005 3:00 PM
> To: 'Roland Dreier'
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
> 
> > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 24, 2005 11:03 AM
> >
> > Fab> Why can't the IPV field be ignored?  If a listen wants only
> > Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> > Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> > Fab> address, and would set the offset to that of the hello
> > Fab> message's destination address (32).
> >
> > Yes, you're right for SDP.  I guess if we're comfortable mandating
> > that all protocols put their source and destination IPs in the private
> > data for the IB case, then this works.  Of course it's somewhat
> > awkward to pass this information into the transport-neutral CM API but
> > I think this can be worked around.
> 
> I don't know if we need to mandate IP usage - it's up to the application.
> Any application that wants semantics similar to the way socket listens
> work (especially when bound to one of multiple IP addresses on a port)
> would have to define its private data to accommodate this.
> 

The context of this discussion is a common API for iWARP/IB ULPs.
In that case they all use IP addresses (since that is the common
addressing).

Anyone who uses the IB-specific API beneath this abstraction level can
provide whatever data they want to the CM.

In any case, providing src/dst IPs in the CM private data is simple, and
we can come up with an IBTA extension blessing that data structure as a
general way to map IP-oriented protocols over IB (a 1-2 page draft at the
most). This way it can also address Caitlin's concerns regarding NFS &
IETF (since it then becomes a transport-specific issue).

Yaron


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>Because it would be better to configure your network "properly". Putting
>IP addresses in private data is fundamentally insecure since any user
>mode client can spoof the IP address.

A simple forward lookup could detect this.

- Sean



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker
 

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 24, 2005 1:17 PM
> To: Tom Tucker
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> Tom> Good point, although for iWARP it will work the way that you
> Tom> expect.  For IB, admittedly it's more complex and would
> Tom> require ATS. There seems to be significant reluctance around
> Tom> ATS and I don't understand the issues. Can you provide a
> Tom> quick synopsis?
> 
> My resistance is that ATS is just complexity without any benefit.  

IMHO the benefit is that you have a transport independent addressing
mechanism -- albeit with some limitations as you've mentioned. In this
case, the vast majority of clients enjoy the benefit without suffering
the limitations.

> ... It
> doesn't provide additional security.  It doesn't solve the
> multi-homing problem we're talking about now.  

Whenever a single GID maps to multiple IP addresses, I agree, it is a
limitation. However, I don't believe that this is strictly necessary.

> ... Once you've thrown away
> information by turning your IP address into an IB GID, there's no
> magic way ATS can recreate that information and be psychic about which
> of the multi-homed IPs you actually meant.  

I agree, so don't do that. If you want it to work properly, then you
need to map GIDS to IP addresses. 

> ... So why not just put the IP
> addressing information into the CM private data, the way that the SDP
> protocol already does?
> 
>  - R.
> 

Because it would be better to configure your network "properly". Putting
IP addresses in private data is fundamentally insecure since any user
mode client can spoof the IP address. 




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:18 AM
> 
> For IB, using private data to listen on a specific IP address seems the
> easiest thing to do.  (Maybe we could do it by mapping different IP
> addresses to different service IDs, requiring registration and lookup?)

The problem with the SID method is that the SID namespace is smaller than the
IPV6 address name space.  There's no way to get every possible IPV6 address
represented by a 64-bit SID.  This further ignores the rules for SIDs in the IB
specification.  I think private data is the only way to do this properly.
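
The namespace argument is a pigeonhole count. As a quick check, using only the 64-bit SID and 128-bit IPv6 widths mentioned above:

```python
SID_BITS = 64     # width of an IB service ID
IPV6_BITS = 128   # width of an IPv6 address

sid_count = 2 ** SID_BITS
ipv6_count = 2 ** IPV6_BITS

# Any mapping from IPv6 addresses to SIDs must, on average, fold this many
# distinct addresses onto each SID, so it cannot be one-to-one.
addresses_per_sid = ipv6_count // sid_count
```

Here `addresses_per_sid` comes out to 2**64, and that is before subtracting the parts of the SID space that the IB specification already reserves.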

> If the CM abstraction layer expected those values to be returned in the
> REP message, it could validate that the remote side is using the same
> protocol to ensure some degree of backwards compatibility.
> 
> I don't know if it makes more sense to push private data checks into the
> actual CM or keep them in a CM abstraction layer.  My guess is that the
> former may be the easier implementation.

I think putting the checks in the CM makes the most sense, though it should be
done in a generic fashion.  A CM abstraction layer could then simply apply a
policy for private data usage - where in the private data it stores the IP
address information.

Layering it this way allows the private data compare to be used for things other
than IP addresses.  Add functionality without imposing policy.

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:03 AM
> 
> Fab> Why can't the IPV field be ignored?  If a listen wants only
> Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> Fab> address, and would set the offset to that of the hello
> Fab> message's destination address (32).
> 
> Yes, you're right for SDP.  I guess if we're comfortable mandating
> that all protocols put their source and destination IPs in the private
> data for the IB case, then this works.  Of course it's somewhat
> awkward to pass this information into the transport-neutral CM API but
> I think this can be worked around.

I don't know if we need to mandate IP usage - it's up to the application.  Any
application that wants semantics similar to the way socket listens work
(especially when bound to one of multiple IP addresses on a port) would have to
define its private data to accommodate this.
 
At the IB level, the contents of the private data are still opaque, even to the
CM.  The CM would only expose the ability to have it perform an initial triage
of requests by doing binary comparisons over regions of private data.  It
doesn't know (or need to know) what the data represents - it only cares about
finding a match (or not).  The CM doesn't define any sort of policy here, and I
don't think it should.  It's just bytes to the CM, and it's doing a blind
comparison without interpreting the contents.
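
A minimal sketch of such a blind comparison, assuming the listener supplies an offset and a compare buffer. The function names and the 92-byte buffer size are illustrative assumptions, not an existing CM interface; the offset of 32 is the hello-message destination offset quoted earlier in this thread.

```python
import ipaddress

DST_ADDR_OFFSET = 32   # destination-address offset quoted in the thread
PRIVATE_DATA_SIZE = 92  # assumed private data size, for illustration only


def private_data_matches(private_data, offset, compare):
    """Byte-for-byte comparison; the CM never interprets the contents."""
    end = offset + len(compare)
    if end > len(private_data):
        return False
    return private_data[offset:end] == compare


def ipv4_compare_buffer(addr):
    """16-byte buffer: 12 zero bytes followed by the 4-byte IPv4 address."""
    return bytes(12) + ipaddress.IPv4Address(addr).packed


def fake_req(dst_ip, size=PRIVATE_DATA_SIZE):
    """Build made-up private data carrying dst_ip at DST_ADDR_OFFSET."""
    data = bytearray(size)
    data[DST_ADDR_OFFSET:DST_ADDR_OFFSET + 16] = ipv4_compare_buffer(dst_ip)
    return bytes(data)
```

A listener bound to 192.168.11.12 would register `ipv4_compare_buffer("192.168.11.12")` with offset 32; the policy of what those bytes mean stays entirely with the layer above the CM.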

- Fab




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:14 AM
> 
> On 8/24/05, Fab Tillier <[EMAIL PROTECTED]> wrote:
> > > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, August 24, 2005 10:16 AM
> > >
> > > Fab> Knowledge of actual IP addresses would be up to the consumer.
> > > Fab> However, the IB CM can facilitate checks by allowing the user
> > > Fab> to specify an offset and length in the private data to match
> > > Fab> to for incoming requests.
> > >
> > > This seems too complex and at the same time too limited to me.  For
> > > one thing -- although I think ATS should die -- this doesn't support
> > > ATS reverse lookups.
> >
> > I think if all ULPs provide their source and destination IP in the private
> > data, you can eliminate the reverse lookup altogether.  A simple forward
> > lookup is all that's needed to validate that the source GID in the REQ
> > matches the reported source IP in the private data.  The forward lookup
> > could be done via ATS or via ARP, but the CM doesn't need to care which
> > method is used.
> 
> That is not an option.
> 
> The applications are expecting source/destination network addresses
> that come from a network layer, not from the peer application. IP has
> no problem meeting this requirement. This is an IB problem that needs
> to be solved within the scope of IB without changing any ULPs.

If the app wants to use source/destination network addresses, there isn't a
problem.  The problem is the app wants to use IP addresses, which are *not*
network addresses in IB.  So the app needs to decide between one of two things -
be aware of IB network addresses, or provide meaning to IP addresses over IB.
The latter can't be done reliably under the covers - ATS reverse lookups won't
tell you the IP the source actually used, and there's no way to do so without
either using private data in the CM REQ or requiring a 1:1 mapping of IB:IP
addresses.  The 1:1 IB:IP mapping is not feasible, so the only way to know what
IP address the application used is to embed that into the private data.  I would
expect protocols that try to use IP as their addressing would accommodate this
in their IB usage, just like SDP accommodates it in the hello message.
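
To illustrate why a reverse lookup cannot recover the address on a multi-homed port, here is a small sketch. The GID strings are made up, and the table stands in for whatever ATS or ARP would return; only the two 192.168.x.x addresses come from this thread.

```python
from collections import defaultdict

# Hypothetical forward table (IP -> GID). Two IPs share one port GID.
IP_TO_GID = {
    "192.168.11.12": "fe80::2:c903:0:1",
    "192.168.98.99": "fe80::2:c903:0:1",
    "192.168.11.20": "fe80::2:c903:0:2",
}


def build_reverse_map(ip_to_gid):
    """Invert the forward table; a GID may map back to several IPs."""
    reverse = defaultdict(list)
    for ip, gid in ip_to_gid.items():
        reverse[gid].append(ip)
    return dict(reverse)


REVERSE = build_reverse_map(IP_TO_GID)
candidates = sorted(REVERSE["fe80::2:c903:0:1"])
```

`candidates` holds both 192.168.11.12 and 192.168.98.99, so the GID alone cannot say which one the peer meant; only data carried in the request itself can.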

- Fab



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
NFS over RDMA does not do that.

Shouldn't that be the end of the discussion on abusing CM private data,
unless you are talking *solely* about IB private data? And if that is
the discussion, shouldn't such a strategy be proposed to the IETF
and/or IBTA as an official NFSoRDMA mapping for IB?

The other end of the NFSoRDMA connection is not necessarily
running OpenIB or even Linux and is not party to any of these
discussions.

> 
> My resistance is that ATS is just complexity without any benefit.  It
> doesn't provide additional security.  It doesn't solve the
> multi-homing problem we're talking about now.  Once you've thrown away
> information by turning your IP address into an IB GID, there's no
> magic way ATS can recreate that information and be psychic about which
> of the multi-homed IPs you actually meant.  So why not just put the IP
> addressing information into the CM private data, the way that the SDP
> protocol already does?
> 
>  - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 10:16 AM
> 
> Fab> Knowledge of actual IP addresses would be up to the consumer.
> Fab> However, the IB CM can facilitate checks by allowing the user
> Fab> to specify an offset and length in the private data to match
> Fab> to for incoming requests.
> 
> This seems too complex and at the same time too limited to me.  For
> one thing -- although I think ATS should die -- this doesn't support
> ATS reverse lookups.

I think if all ULPs provide their source and destination IP in the private data,
you can eliminate the reverse lookup altogether.  A simple forward lookup is all
that's needed to validate that the source GID in the REQ matches the reported
source IP in the private data.  The forward lookup could be done via ATS or via
ARP, but the CM doesn't need to care which method is used.
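
The forward-lookup check described above could look roughly like this. It is a sketch under stated assumptions: `validate_source` and the `resolve` callable are hypothetical, and the resolver is simply any function from an IP to a GID (backed by ATS, ARP, or anything else).

```python
def validate_source(req_source_gid, claimed_source_ip, resolve):
    """Accept the claimed IP only if it resolves to the GID seen in the REQ.

    resolve is a caller-supplied function mapping an IP string to a GID
    string, or None when the address is unknown.
    """
    expected_gid = resolve(claimed_source_ip)
    return expected_gid is not None and expected_gid == req_source_gid


# Made-up resolver data for illustration.
KNOWN = {
    "192.168.11.12": "fe80::2:c903:0:1",
    "192.168.11.20": "fe80::2:c903:0:2",
}

honest = validate_source("fe80::2:c903:0:1", "192.168.11.12", KNOWN.get)
spoofed = validate_source("fe80::2:c903:0:2", "192.168.11.12", KNOWN.get)
```

The check works even when several IPs share a GID, because it only asks whether the claimed IP belongs to the sending port, never which IP a GID "really" is.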
 
> For another, it doesn't handle something like
> the SDP Hello header, where the IP version is at a certain offset, and
> then the IP address is interpreted according to the IP version.

Why can't the IPV field be ignored?  If a listen wants only IPV4 addresses, it
would specify a 16-byte compare buffer with the first 12 bytes zero, the next 4
filled with the IPV4 address, and would set the offset to that of the hello
message's destination address (32).

> What makes it really ugly is that it's perfectly reasonable for one
> consumer to listen to a service at 192.168.11.12 and another consumer
> to listen to the same service at 192.168.98.99.  How do we handle this
> in the IB case??

As long as the service IP address (the local address on the listening side) is
always advertised in the same place in the private data, this isn't a problem.
The compare lengths and offsets would be identical for both services, but the
compare buffer contents would differ.  Did I miss what you were getting at?
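
A sketch of how two listeners on the same service could be told apart this way. `ListenTable` and its methods are hypothetical; the offset of 32 and the two addresses are the ones used in this thread, and the service ID is an arbitrary example value.

```python
import ipaddress

DST_ADDR_OFFSET = 32  # destination-address offset quoted in the thread
ADDR_LEN = 16


def addr_field(ip):
    """16-byte address field; IPv4 is left-padded with 12 zero bytes."""
    return ipaddress.ip_address(ip).packed.rjust(ADDR_LEN, b"\0")


class ListenTable:
    """Same offset and length for every listener; only the bytes differ."""

    def __init__(self):
        self._listeners = {}  # (service_id, compare bytes) -> consumer

    def listen(self, service_id, ip, consumer):
        key = (service_id, addr_field(ip))
        if key in self._listeners:
            raise ValueError("address already in use for this service")
        self._listeners[key] = consumer

    def dispatch(self, service_id, private_data):
        """Return the consumer whose compare buffer matches, else None."""
        found = private_data[DST_ADDR_OFFSET:DST_ADDR_OFFSET + ADDR_LEN]
        return self._listeners.get((service_id, found))


def make_req(dst_ip, size=92):
    """Made-up request private data carrying only a destination address."""
    data = bytearray(size)
    data[DST_ADDR_OFFSET:DST_ADDR_OFFSET + ADDR_LEN] = addr_field(dst_ip)
    return bytes(data)
```

Registering consumer A at 192.168.11.12 and consumer B at 192.168.98.99 under one service ID then routes each incoming request by the destination bytes alone.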

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>> I think if all ULPs provide their source and destination IP in the private
>> data, you can eliminate the reverse lookup altogether.  A simple forward
>> lookup is all that's needed to validate that the source GID in the REQ
>> matches the reported source IP in the private data.  The forward lookup
>> could be done via ATS or via ARP, but the CM doesn't need to care which
>> method is used.
>>
>
>That is not an option.
>
>The applications are expecting source/destination network addresses
>that come from a network layer, not from the peer application. IP has
>no problem meeting this requirement. This is an IB problem that needs
>to be solved within the scope of IB without changing any ULPs.

IB can solve the problem by exposing fewer bytes of private data.  ULPs do not
need to know that part of the IB private data is actually used by the CM
abstraction layer.  ULPs that make use of this new interface change anyway.
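
One way to picture this: the abstraction layer reserves a header at the front of the private data and hands the ULP only the remainder. The layout below is purely an assumption for illustration (16 bytes source, 16 bytes destination, 92 bytes total); no such format is defined in this thread.

```python
import ipaddress

TOTAL_SIZE = 92   # assumed size of the IB private data area
HEADER_SIZE = 32  # hypothetical: 16-byte source + 16-byte destination
ULP_SIZE = TOTAL_SIZE - HEADER_SIZE  # what the ULP is told it may use


def _pack_ip(addr):
    """16-byte field; IPv4 is left-padded with 12 zero bytes."""
    return ipaddress.ip_address(addr).packed.rjust(16, b"\0")


def wrap(src_ip, dst_ip, ulp_data=b""):
    """CM abstraction layer: prepend the addresses the ULP never sees."""
    if len(ulp_data) > ULP_SIZE:
        raise ValueError(f"ULP private data limited to {ULP_SIZE} bytes")
    payload = _pack_ip(src_ip) + _pack_ip(dst_ip) + ulp_data
    return payload.ljust(TOTAL_SIZE, b"\0")


def unwrap(private_data):
    """Split received data into (source field, destination field, ULP area)."""
    return (private_data[:16],
            private_data[16:HEADER_SIZE],
            private_data[HEADER_SIZE:])
```

The ULP-visible area shrinks to `ULP_SIZE` bytes and is zero-padded, so a ULP that needs an exact length would have to carry it in its own data.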

- Sean



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Caitlin Bestler
> Sent: Wednesday, August 24, 2005 2:14 PM
> To: Fab Tillier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
> 
> 
> The applications are expecting source/destination network addresses
> that come from a network layer, not from the peer application. IP has
> no problem meeting this requirement. This is an IB problem that needs
> to be solved within the scope of IB without changing any ULPs.
> 

To my understanding, IB private data fields are IB CM specific,
so embedding src/dst IP in them doesn't change the ULP and could be
considered part of the IB CM.

You can look at the private data in that case as a replacement for the
TCP CM (Syn/SynAck exchange), where the Syn packet includes IPs & ports.

Yaron 


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>Fab> Why can't the IPV field be ignored?  If a listen wants only
>Fab> IPV4 addresses, it would specify a 16-byte compare buffer
>Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
>Fab> address, and would set the offset to that of the hello
>Fab> message's destination address (32).
>
>Yes, you're right for SDP.  I guess if we're comfortable mandating
>that all protocols put their source and destination IPs in the private
>data for the IB case, then this works.  Of course it's somewhat
>awkward to pass this information into the transport-neutral CM API but
>I think this can be worked around.

For IB, using private data to listen on a specific IP address seems the easiest
thing to do.  (Maybe we could do it by mapping different IP addresses to
different service IDs, requiring registration and lookup?)  If the CM
abstraction layer expected those values to be returned in the REP message, it
could validate that the remote side is using the same protocol to ensure some
degree of backwards compatibility.

I don't know if it makes more sense to push private data checks into the actual
CM or keep them in a CM abstraction layer.  My guess is that the former may be
the easier implementation.

- Sean



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Tom> Good point, although for iWARP it will work the way that you
Tom> expect.  For IB, admittedly it's more complex and would
Tom> require ATS. There seems to be significant reluctance around
Tom> ATS and I don't understand the issues. Can you provide a
Tom> quick synopsis?

My resistance is that ATS is just complexity without any benefit.  It
doesn't provide additional security.  It doesn't solve the
multi-homing problem we're talking about now.  Once you've thrown away
information by turning your IP address into an IB GID, there's no
magic way ATS can recreate that information and be psychic about which
of the multi-homed IPs you actually meant.  So why not just put the IP
addressing information into the CM private data, the way that the SDP
protocol already does?

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler
On 8/24/05, Fab Tillier <[EMAIL PROTECTED]> wrote:
> > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 24, 2005 10:16 AM
> >
> > Fab> Knowledge of actual IP addresses would be up to the consumer.
> > Fab> However, the IB CM can facilitate checks by allowing the user
> > Fab> to specify an offset and length in the private data to match
> > Fab> to for incoming requests.
> >
> > This seems too complex and at the same time too limited to me.  For
> > one thing -- although I think ATS should die -- this doesn't support
> > ATS reverse lookups.
> 
> I think if all ULPs provide their source and destination IP in the private
> data, you can eliminate the reverse lookup altogether.  A simple forward
> lookup is all that's needed to validate that the source GID in the REQ
> matches the reported source IP in the private data.  The forward lookup
> could be done via ATS or via ARP, but the CM doesn't need to care which
> method is used.
> 

That is not an option.

The applications are expecting source/destination network addresses
that come from a network layer, not from the peer application. IP has
no problem meeting this requirement. This is an IB problem that needs
to be solved within the scope of IB without changing any ULPs.

> > For another, it doesn't handle something like
> > the SDP Hello header, where the IP version is at a certain offset, and
> > then the IP address is interpreted according to the IP version.
> 
> Why can't the IPV field be ignored?  If a listen wants only IPV4 addresses, it
> would specify a 16-byte compare buffer with the first 12 bytes zero, the next
> 4 filled with the IPV4 address, and would set the offset to that of the hello
> message's destination address (32).
> 
> > What makes it really ugly is that it's perfectly reasonable for one
> > consumer to listen to a service at 192.168.11.12 and another consumer
> > to listen to the same service at 192.168.98.99.  How do we handle this
> > in the IB case??
> 
> As long as the service IP address (the local address on the listening side) is
> always advertised in the same place in the private data, this isn't a problem.
> The compare lengths and offsets would be identical for both services, but the
> compare buffer contents would differ.  Did I miss what you were getting at?
> 

The consensus when this issue was debated in the DAT Collaborative was
that there was no transport-neutral way to specify a set of addresses to listen
on other than "all addresses supported by this device".

As noted in another posting, it is easy to support "all for device" and "this
address only" with transport neutral interfaces. Anything else is problematic.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker
 

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 24, 2005 11:27 AM
> To: Tom Tucker
> Cc: Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> Tom> I think I understand, but the purpose of specifying the IP
> Tom> address in the listen is not to filter incoming connect
> Tom> requests, but rather to determine which devices I listen
> Tom> on. I think this works for the IB case as well. So the
> Tom> utility of the IP address specified in the listen is only to
> Tom> determine which devices the sid is created on. Does this make
> Tom> sense or am I missing something?
> 
> Well, that's not what I would expect.  Suppose I have a 
> device configured with local addresses 192.168.11.12 and 
> 192.168.98.99 and I start listening for some service at the 
> address 192.168.11.12.  I don't think I should see a 
> connection request if a remote system tries to connect to 
> 192.168.98.99 (even though it's the same network interface as 
> 192.168.11.12).
> 
>  - R.
> 
Good point, although for iWARP it will work the way that you expect.
For IB, admittedly it's more complex and would require ATS. There seems
to be significant reluctance around ATS and I don't understand the
issues. Can you provide a quick synopsis?



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Fab> Why can't the IPV field be ignored?  If a listen wants only
Fab> IPV4 addresses, it would specify a 16-byte compare buffer
Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
Fab> address, and would set the offset to that of the hello
Fab> message's destination address (32).

Yes, you're right for SDP.  I guess if we're comfortable mandating
that all protocols put their source and destination IPs in the private
data for the IB case, then this works.  Of course it's somewhat
awkward to pass this information into the transport-neutral CM API but
I think this can be worked around.

Roland> What makes it really ugly is that it's perfectly
Roland> reasonable for one consumer to listen to a service at
Roland> 192.168.11.12 and another consumer to listen to the same
Roland> service at 192.168.98.99.  How do we handle this in the IB
Roland> case??

Fab> As long as the service IP address (the local address on the
Fab> listening side) is always advertised in the same place in the
Fab> private data, this isn't a problem.  The compare lengths and
Fab> offsets would be identical for both services, but the compare
Fab> buffer contents would differ.  Did I miss what you were
Fab> getting at?

No, I think I confused myself.  As long as the CM can get at the IP
information, it can figure out which consumer is which.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Wednesday, August 24, 2005 1:43 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
> 
> 
> 
> On Tue, 23 Aug 2005, Roland Dreier wrote:
> 
> > It would be possible to have another function like
> > rdma_getpeername() that takes the transport address and
> > returns a source IP address.  In the IB case this would do an
> > ATS reverse lookup.  However, I hate this idea.  iSER already
> > uses the CM private data to pass the source IP in the IB case,
> 
> I know this is how IB SDP works, but I don't think iSER works this
> way.
> 
> The code in the tree calls dat_ep_connect() with a NULL private data
> pointer.
> 
> There is an iSER HELLO message described in iser_header.h that contains
> IP addresses, but I'm not certain that this is part of the current
> protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused).

James,

iSER doesn't mandate the source IP in general, since it does much
stronger authentication during Login.
However, we believe using a header similar to SDP's can help the passive
side:
a. know which destination IP was targeted (in a multihomed environment)
b. validate the source, for implementations that want to do so for some
reason

That's why the draft suggested adding the source/destination IP to the
private data, just as SDP does. I believe it would be a good idea to use the
same approach for NFS/RDMA and eliminate the need for a reverse ATS lookup
(which may have conflicts when multiple IPs exist per node).
We could just use the SDP hello header as is, with unused fields zeroed.
This would allow all ULPs to use the same mechanism.
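
As a rough illustration of carrying the source and destination IP in CM
private data, here is a minimal sketch. The 36-byte layout, the format
string and the function names are all made up for this example; this is
NOT the real SDP Hello header, only the general idea of a shared header
with unused fields zeroed.

```python
import ipaddress
import struct

# Hypothetical layout (an assumption, not the SDP Hello format):
#   byte  0      IP version (4 or 6)
#   bytes 1-3    reserved, zeroed
#   bytes 4-19   source IP (IPv4 is left-padded with zeros)
#   bytes 20-35  destination IP (same padding)
HELLO_FMT = "!B3x16s16s"
HELLO_LEN = struct.calcsize(HELLO_FMT)


def _pad(addr):
    """Return the packed address as 16 bytes, zero-padding IPv4 on the left."""
    return addr.packed.rjust(16, b"\x00")


def pack_hello(src, dst):
    """Build the private data the active side would place in its request."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if s.version != d.version:
        raise ValueError("source and destination must share an IP version")
    return struct.pack(HELLO_FMT, s.version, _pad(s), _pad(d))


def unpack_hello(data):
    """Recover (version, source IP, destination IP) on the passive side."""
    version, src, dst = struct.unpack(HELLO_FMT, data[:HELLO_LEN])
    if version == 4:
        return version, str(ipaddress.IPv4Address(src[-4:])), str(ipaddress.IPv4Address(dst[-4:]))
    if version == 6:
        return version, str(ipaddress.IPv6Address(src)), str(ipaddress.IPv6Address(dst))
    raise ValueError("unknown IP version in hello header")
```

With something like this, the passive side learns which local IP was
targeted without any reverse lookup.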

Yaron




Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


On Tue, 23 Aug 2005, Roland Dreier wrote:

> It would be possible to have another function like
> rdma_getpeername() that takes the transport address and
> returns a source IP address.  In the IB case this would do an
> ATS reverse lookup.  However, I hate this idea.  iSER already
> uses the CM private data to pass the source IP in the IB case,

I know this is how IB SDP works, but I don't think iSER works this 
way.

The code in the tree calls dat_ep_connect() with a NULL private data 
pointer. 

There is an iSER HELLO message described in iser_header.h that contains 
IP addresses, but I'm not certain that this is part of the current 
protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused).


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Fab> I think the IB CM needs to be able to do two things.  It
Fab> needs to allow a listen to be bound to a specific port -
Fab> using the port GUID or the LID or something along those
Fab> lines.

Yes, this is probably a good idea.

Fab> Knowledge of actual IP addresses would be up to the consumer.
Fab> However, the IB CM can facilitate checks by allowing the user
Fab> to specify an offset and length in the private data to match
Fab> against incoming requests.

This seems too complex and at the same time too limited to me.  For
one thing -- although I think ATS should die -- this doesn't support
ATS reverse lookups.  For another, it doesn't handle something like
the SDP Hello header, where the IP version is at a certain offset, and
then the IP address is interpreted according to the IP version.
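
To make the second objection concrete, here is a small sketch of a
version-dependent match. The layout (one version byte followed by a 4-
or 16-byte destination address) and the function names are assumptions
for illustration only, not the real SDP Hello format.

```python
import ipaddress


def dest_ip(private_data):
    """Decode the destination IP; the version byte decides how many bytes to read."""
    version = private_data[0]
    if version == 4:
        return ipaddress.IPv4Address(private_data[1:5])
    if version == 6:
        return ipaddress.IPv6Address(private_data[1:17])
    raise ValueError("unknown IP version %d" % version)


def matches_listen(private_data, listen_ip):
    """True if the request targets the IP address the consumer listens on."""
    return dest_ip(private_data) == ipaddress.ip_address(listen_ip)
```

A single fixed (offset, length, value) compare cannot express this
directly, because the length of the address field depends on a value
read from another offset.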

What makes it really ugly is that it's perfectly reasonable for one
consumer to listen to a service at 192.168.11.12 and another consumer
to listen to the same service at 192.168.98.99.  How do we handle this
in the IB case??

 - R.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


> > However, there's another problem with trying to lump address
> > translation and connection into a single "connect" call, and this
> > problem looks fundamental and fatal to me.  The connect call takes a
> > QP pointer, but to create a QP the consumer needs to know which local
> > device to use.  However, the consumer doesn't know which device to use
> > until the destination address has been resolved to a route, including
> > a local interface.
> 
> The proposition, also presented (I believe) in the OpenIB workshop,
> includes a function called ib_cma_get_device, that retrieves the device
> (for qp creation purposes) according to the destination address and the
> local routing table. 

That function was included in the presentation. Given that the 
discussion focused on the proper location of address translation, it 
is understandable that its presence was overlooked.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 9:27 AM
> 
> Tom> I think I understand, but the purpose of specifying the IP
> Tom> address in the listen is not to filter incoming connect
> Tom> requests, but rather to determine which devices I listen
> Tom> on. I think this works for the IB case as well. So the
> Tom> utility of the IP address specified in the listen is only to
> Tom> determine which devices the sid is created on. Does this make
> Tom> sense or am I missing something?
> 
> Well, that's not what I would expect.  Suppose I have a device
> configured with local addresses 192.168.11.12 and 192.168.98.99 and I
> start listening for some service at the address 192.168.11.12.  I
> don't think I should see a connection request if a remote system tries
> to connect to 192.168.98.99 (even though it's the same network
> interface as 192.168.11.12).

I think the IB CM needs to be able to do two things.  It needs to allow a listen
to be bound to a specific port - using the port GUID or the LID or something
along those lines.  The Windows CM currently takes a port GUID as input to allow
binding requests to a local IB port.  Incoming MADs are matched based on which
port they came in on.  This does introduce the limitation that sending CM MADs
to a port other than the one you wish to connect to won't have the desired
result if the ULP performs port filtering.  I don't think this is a big deal.

Knowledge of actual IP addresses would be up to the consumer.  However, the IB
CM can facilitate checks by allowing the user to specify an offset and length in
the private data to match against incoming requests.  ULPs that want to
distinguish between IP addresses on a given port would put the IP in their
private data, and instruct the CM to compare a specific value at a specific
offset and length for every incoming REQ.  The Windows CM does this - a listen
takes as input a private data compare buffer, buffer length, and offset within
the REQ private data to perform the comparison.

Without the CM performing the private data comparison for the client, there is
no way for the CM to route a request to the proper consumer based on something
like IP.
Using a generic private data compare mechanism enables the users to do whatever
they feel like, without putting knowledge of IP addresses and whatnot into the
IB CM or dictating how clients must use their private data.

A lookup of a listen for an incoming request changes from just being based on
SID to taking as additional parameters the port GUID on which the REQ was
received and the REQ's private data in case a private data compare needs to be
performed.
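
The lookup described above could be sketched roughly as follows. The
Listen class, its field names and the first-match policy are
assumptions made for this example; they are not taken from the Windows
CM or the Linux IB CM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listen:
    name: str
    sid: int
    port_guid: Optional[int] = None  # None means "any local port"
    cmp_offset: int = 0              # offset into the REQ private data
    cmp_data: bytes = b""            # empty means "no private data compare"

    def matches(self, sid, port_guid, private_data):
        if sid != self.sid:
            return False
        if self.port_guid is not None and port_guid != self.port_guid:
            return False
        end = self.cmp_offset + len(self.cmp_data)
        # A REQ whose private data is too short yields a shorter slice
        # and therefore fails the comparison.
        return private_data[self.cmp_offset:end] == self.cmp_data


def find_listen(listens, sid, port_guid, private_data):
    """Return the first listen matching SID, port GUID and private data."""
    for listen in listens:
        if listen.matches(sid, port_guid, private_data):
            return listen
    return None
```

Because the first match wins in this sketch, more specific listens
would need to be registered ahead of catch-all ones.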

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Steve Wise
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 11:27 AM
> To: Tom Tucker
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> Tom> I think I understand, but the purpose of specifying the IP
> Tom> address in the listen is not to filter incoming connect
> Tom> requests, but rather to determine which devices I listen
> Tom> on. I think this works for the IB case as well. So the
> Tom> utility of the IP address specified in the listen is only to
> Tom> determine which devices the sid is created on. Does this make
> Tom> sense or am I missing something?
> 
> Well, that's not what I would expect.  Suppose I have a device
> configured with local addresses 192.168.11.12 and 192.168.98.99 and I
> start listening for some service at the address 192.168.11.12.  I
> don't think I should see a connection request if a remote system tries
> to connect to 192.168.98.99 (even though it's the same network
> interface as 192.168.11.12).
> 

I agree, Roland.  ULPs that listen on a specific address expect only
connection requests that were sent to that IP address.  I think we want to
provide this functionality. 




Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Tom> I think I understand, but the purpose of specifying the IP
Tom> address in the listen is not to filter incoming connect
Tom> requests, but rather to determine which devices I listen
Tom> on. I think this works for the IB case as well. So the
Tom> utility of the IP address specified in the listen is only to
Tom> determine which devices the sid is created on. Does this make
Tom> sense or am I missing something?

Well, that's not what I would expect.  Suppose I have a device
configured with local addresses 192.168.11.12 and 192.168.98.99 and I
start listening for some service at the address 192.168.11.12.  I
don't think I should see a connection request if a remote system tries
to connect to 192.168.98.99 (even though it's the same network
interface as 192.168.11.12).
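
The bind() semantics expected here can be sketched as below. It
assumes the destination IP of the request is somehow available to the
CM (which, as discussed elsewhere in the thread, IB does not provide in
any standard way), and the function name is hypothetical.

```python
import ipaddress


def should_deliver(listen_addr, request_dst):
    """Deliver a request only if it was sent to the address being listened on."""
    listen = ipaddress.ip_address(listen_addr)
    if listen.is_unspecified:  # 0.0.0.0 or :: acts as a wildcard
        return True
    return listen == ipaddress.ip_address(request_dst)
```

So a listen on 192.168.11.12 would not see a request sent to
192.168.98.99, even when both addresses sit on the same interface.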

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker
 

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 24, 2005 11:04 AM
> To: Tom Tucker
> Cc: Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address 
> translation API
> 
> Tom> The listen side, however, I think needs a little tweaking. It
> Tom> would be beneficial if the client can specify either an IP
> Tom> address and port to listen on (effectively selecting a
> Tom> particular device), or a wild card (all RDMA devices). An NFS
> Tom> server is an example of the latter. This is trivial to do by
> Tom> providing an address to the listen call where a '0'
> Tom> represents a wild card.
> 
> I agree that it's useful to be able to pass a sockaddr to 
> bind a listen to (just like the bind() call in userspace).  
> However, the problem is that in the IB world, an incoming 
> connection request does not come with a destination IP 
> address in any standard way.  So I don't know the right way 
> to implement bind() in the IB case.

I think I understand, but the purpose of specifying the IP address in
the listen is not to filter incoming connect requests, but rather to
determine which devices I listen on. I think this works for the IB case
as well. So the utility of the IP address specified in the listen is
only to determine which devices the sid is created on. Does this make
sense or am I missing something?

> 
> By the way, an IP address/port does not necessarily select a 
> single RDMA device.  It's a perfectly valid configuration to 
> have 10 network interfaces all with the same local IP address.
> 
>  - R.
> 
Yes, and in this case, all devices with the same IP address would end up
listening in the same way that specifying a wildcard (0) would result in
multiple devices listening. 




Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier
Tom> The listen side, however, I think needs a little tweaking. It
Tom> would be beneficial if the client can specify either an IP
Tom> address and port to listen on (effectively selecting a
Tom> particular device), or a wild card (all RDMA devices). An NFS
Tom> server is an example of the latter. This is trivial to do by
Tom> providing an address to the listen call where a '0'
Tom> represents a wild card.

I agree that it's useful to be able to pass a sockaddr to bind a
listen to (just like the bind() call in userspace).  However, the
problem is that in the IB world, an incoming connection request does
not come with a destination IP address in any standard way.  So I
don't know the right way to implement bind() in the IB case.

By the way, an IP address/port does not necessarily select a single
RDMA device.  It's a perfectly valid configuration to have 10 network
interfaces all with the same local IP address.

 - R.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Steve Wise
Roland, this looks good!  A few comments below...

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 12:07 AM
> To: openib-general@openib.org
> Subject: [openib-general] RDMA connection and address translation API
> 
> At the OpenIB workshop on Monday, we had some discussion about a
> high-level transport-neutral API for connection handling.  After
> giving the topic some more thought, I've come to the conclusion that
> neither the kDAPL API nor the new API that was presented are usable.
> In this email, I'll try to detail my reasoning and sketch what I
> believe is the correct API.
> 
> The new API that we looked at was essentially the following (I'm
> recreating this from memory, so I apologize if I misrepresent it):
> 
> listen(local_ip_address, service_id, listen_callback)
> connect(local_qp, remote_ip_address, qos, service_id,
> private_data, connect_callback)
> 
> We already discussed the problem with having the listen callback pass
> the consumer a remote source address -- doing this requires the
> connection handling module to do an ATS reverse lookup in the IB case,
> which the consumer might not want.  I think there's agreement that the
> correct thing here is for the listen callback to pass a transport
> address to the consumer and provide a function that the consumer can
> call to perform an ATS reverse lookup if desired.  This isn't a major
> problem and can be dealt with.
> 
> However, there's another problem with trying to lump address
> translation and connection into a single "connect" call, and this
> problem looks fundamental and fatal to me.  The connect call takes a
> QP pointer, but to create a QP the consumer needs to know which local
> device to use.  However, the consumer doesn't know which device to use
> until the destination address has been resolved to a route, including
> a local interface.
> 
> As far as I can tell, kDAPL punts on this and simply requires the
> consumer to handle the route lookup itself before calling
> dat_ep_connect().  It seems that current kDAPL consumers similarly
> punt on this issue: the iSER initiator and the NFS-RDMA client both
> just use a single device which is statically discovered at init time.
>

Yes, DAPL punts on this.

> It seems that the kDAPL connection model has a serious flaw, in that
> it pushes the complexity of route lookup into the consumer.  Further,
> we have strong evidence that this routing code is hard to write and
> that consumers will just ignore this complexity and hard-code
> solutions that don't work under all configurations.
>

I agree!
 
> With this in mind, I believe that the connection API needs to be
> something more like the following:
> 
> rdma_resolve_address():
> inputs: dest IP address, qos, npaths,
> done callback, opaque context
>   done callback params: status, local RDMA device,
> RDMA transport address, context
> 
> This function starts the process of resolving an IP address to
> an RDMA device and address.  When the resolution is complete,
> the callback is called with a status.  If the status is
> "success" then the callback also gets the device pointer and
> transport address (as well as the original context that the
> consumer passed in).
> 
> The "RDMA transport address" type is a union containing
> transport-dependent data.  In the IB case, it's all of the
> SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
> iWARP case, it's the source IP, destination IP and QOS.
> 
> npaths can be either 1 or 2 in the IB case; if it's 2, then
> the resolver will try to find a primary and alternate path for
> APM.  In the iWARP case, I guess npaths will always be 1, and
> I guess anyone who wants to use iWARP over multihomed SCTP
> will probably have to use some lower-level API.
> 
> By the way, we may also have to have the option of passing in
> a local netdev so that we can handle link-local IPv6
> addresses.  There may be other cases I haven't thought of yet.
> I just hope we can avoid going all the way to the horror of
> the getaddrinfo() API.
> 
> I also hope we can agree to use IPoIB ARP to resolve the
> address in the IB case; having a flag or some other hack in
> the API to expose the option of ATS seems unacceptably ugly.
> 
> rdma_connect():
> inputs: local QP, RDMA transport address, destination service,
> private data, timeout, event callback, opaque context
> 
> This function takes the resolved address and actually 
> connects.
> 
> I'm not sure how we want to abstract the IB service vs. iWARP
> TCP port number difference.  I guess it's OK to have iWARP
> consumers stick their (16-bit) port number in a

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker
Roland:

Steve and I came to the same conclusion on the airplane ride back to
Austin. Whereas plain old TCP/IP selects a device at the bottom of the
stack, RDMA transports must select the device at the top because
pre-connect resources must be allocated and these resources are
associated with a particular device.
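
The ordering constraint can be sketched as follows: resolve first, then
allocate the QP on the device the resolver returned, then connect. The
names rdma_resolve_address() and rdma_connect() come from the proposal
quoted below, but the signatures here are heavily simplified, and the
static table, device names and create_qp() helper are assumptions for
illustration only.

```python
from dataclasses import dataclass


@dataclass
class TransportAddress:
    transport: str  # "ib" or "iwarp"
    fields: dict    # transport-dependent data (LIDs, SL, IPs, ...)


# Hypothetical table standing in for the real ARP / path record lookups.
_RESOLVER = {
    "192.168.11.20": ("mthca0", TransportAddress("ib", {"dlid": 0x12, "sl": 0})),
    "10.0.0.20": ("iwarp0", TransportAddress("iwarp", {"dst_ip": "10.0.0.20"})),
}


def rdma_resolve_address(dest_ip, done, context):
    """Call done(status, device, address, context) once resolution finishes."""
    entry = _RESOLVER.get(dest_ip)
    if entry is None:
        done(-1, None, None, context)
    else:
        device, addr = entry
        done(0, device, addr, context)


def create_qp(device):
    """Pre-connect resources are tied to one device, so it must be known here."""
    return {"device": device, "state": "INIT"}


def rdma_connect(qp, addr, service):
    qp["state"] = "CONNECTING"
    qp["remote"] = (addr.transport, service)
    return qp


def connect_to(dest_ip, service):
    result = {}

    def on_resolved(status, device, addr, ctx):
        if status != 0:
            ctx["error"] = status
            return
        ctx["qp"] = rdma_connect(create_qp(device), addr, service)

    rdma_resolve_address(dest_ip, on_resolved, result)
    return result
```

In this toy version the callback runs immediately; a real resolver
would complete asynchronously.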

I think you've absolutely nailed the active side (by the way, I think
the ib_at_route_by_ip service already performs the necessary routing
function). The listen side, however, I think needs a little tweaking. It
would be beneficial if the client could specify either an IP address and
port to listen on (effectively selecting a particular device), or a wild
card (all RDMA devices). An NFS server is an example of the latter. This
is trivial to do by providing an address to the listen call where a '0'
represents a wild card.
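
A minimal sketch of that listen-side device selection, assuming each
RDMA device exposes a list of local IP addresses. The device names and
the devices_for_listen() helper are made up for this example.

```python
import ipaddress


def devices_for_listen(listen_addr, device_addrs):
    """Return the devices a listen applies to.

    device_addrs maps a device name to its list of local IP strings.
    An unspecified address (the '0' wild card) selects every device.
    """
    listen = ipaddress.ip_address(listen_addr)
    if listen.is_unspecified:
        return sorted(device_addrs)
    return sorted(
        name
        for name, addrs in device_addrs.items()
        if any(ipaddress.ip_address(a) == listen for a in addrs)
    )
```

Note that a specific address can still select more than one device if
several devices are configured with the same IP.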

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 12:07 AM
> To: openib-general@openib.org
> Subject: [openib-general] RDMA connection and address translation API
> 
> At the OpenIB workshop on Monday, we had some discussion 
> about a high-level transport-neutral API for connection 
> handling.  After giving the topic some more thought, I've 
> come to the conclusion that neither the kDAPL API nor the new 
> API that was presented are usable.
> In this email, I'll try to detail my reasoning and sketch 
> what I believe is the correct API.
> 
> The new API that we looked at was essentially the following 
> (I'm recreating this from memory, so I apologize if I 
> misrepresent it):
> 
> listen(local_ip_address, service_id, listen_callback)
> connect(local_qp, remote_ip_address, qos, service_id,
> private_data, connect_callback)
> 
> We already discussed the problem with having the listen 
> callback pass the consumer a remote source address -- doing 
> this requires the connection handling module to do an ATS 
> reverse lookup in the IB case, which the consumer might not 
> want.  I think there's agreement that the correct thing here 
> is for the listen callback to pass a transport address to the 
> consumer and provide a function that the consumer can call to 
> perform an ATS reverse lookup if desired.  This isn't a major 
> problem and can be dealt with.
> 
> However, there's another problem with trying to lump address 
> translation and connection into a single "connect" call, and 
> this problem looks fundamental and fatal to me.  The connect 
> call takes a QP pointer, but to create a QP the consumer 
> needs to know which local device to use.  However, the 
> consumer doesn't know which device to use until the 
> destination address has been resolved to a route, including a 
> local interface.
> 
> As far as I can tell, kDAPL punts on this and simply requires 
> the consumer to handle the route lookup itself before calling 
> dat_ep_connect().  It seems that current kDAPL consumers 
> similarly punt on this issue: the iSER initiator and the 
> NFS-RDMA client both just use a single device which is 
> statically discovered at init time.
> 
> It seems that the kDAPL connection model has a serious flaw, 
> in that it pushes the complexity of route lookup into the 
> consumer.  Further, we have strong evidence that this routing 
> code is hard to write and that consumers will just ignore 
> this complexity and hard-code solutions that don't work under 
> all configurations.
> 
> With this in mind, I believe that the connection API needs to 
> be something more like the following:
> 
> rdma_resolve_address():
> inputs: dest IP address, qos, npaths,
> done callback, opaque context
>   done callback params: status, local RDMA device,
> RDMA transport address, context
> 
> This function starts the process of resolving an IP address to
> an RDMA device and address.  When the resolution is complete,
> the callback is called with a status.  If the status is
> "success" then the callback also gets the device pointer and
> transport address (as well as the original context that the
> consumer passed in).
> 
> The "RDMA transport address" type is a union containing
> transport-dependent data.  In the IB case, it's all of the
> SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
> iWARP case, it's the source IP, destination IP and QOS.
> 
> npaths can be either 1 or 2 in the IB case; if it's 2, then
> the resolver will try to find a primary and alternate path for
> APM.  In the iWARP case, I guess npaths will always be 1, and
> I guess anyone who wants to use iWARP over multihomed SCTP
> will probably have to use some lower-level API.
> 
> By the way, we may also have to have the option of passing in
> a local netdev so that we can handle link-local IPv6
> addresses.  There may be other

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Guy German
Hi,

- Here is a header file for cm abstraction API proposition.
- This is just a preliminary suggestion, for review.
- All comments are welcome.
- Please read the notes in the header remarks
- I am attaching the file and will send it later in a different message,
to the list.
- I think that the ib_ prefix should be changed to rdma_, but that
should be done for the rest of the verbs as well, if we are claiming
that the ib verbs abstract iwarp.
- I think that the main difference between the 2 propositions is the
question of whether or not to expose the consumer to the address
resolution. I believe this suggestion (of covering it in the cma) is
simpler, because it saves unnecessary upcall handling for the consumer.
In any case - I don't believe this is clear cut, and would like to hear
other opinions from people on the list.
- Also please see my embedded answer to this mail


Thanks,
Guy.

> We already discussed the problem with having the listen callback pass
> the consumer a remote source address -- doing this requires the
> connection handling module to do an ATS reverse lookup in the IB case,
> which the consumer might not want.  I think there's agreement that the
> correct thing here is for the listen callback to pass a transport
> address to the consumer and provide a function that the consumer can
> call to perform an ATS reverse lookup if desired.  This isn't a major
> problem and can be dealt with.

I agree. This is corrected in the current suggestion

> However, there's another problem with trying to lump address
> translation and connection into a single "connect" call, and this
> problem looks fundamental and fatal to me.  The connect call takes a
> QP pointer, but to create a QP the consumer needs to know which local
> device to use.  However, the consumer doesn't know which device to use
> until the destination address has been resolved to a route, including
> a local interface.

The proposition, also presented (I believe) in the OpenIB workshop,
includes a function called ib_cma_get_device that retrieves the device
(for QP creation purposes) according to the destination address and the
local routing table. This is done synchronously, and it is implemented
today in the at module. If using link-local IPv6 addresses, I think that
this function isn't even necessary (if I understand it correctly, you
already need to know which device to go out from).
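
For illustration, a synchronous device lookup of this kind might boil
down to a longest-prefix match against the routing table. The sketch
below is an assumption about the general idea only; it does not show how
ib_cma_get_device or the at module is actually implemented, and the
routes and interface names are invented.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface).
ROUTES = [
    ("192.168.11.0/24", "ib0"),
    ("192.168.0.0/16", "ib1"),
    ("0.0.0.0/0", "eth0"),
]


def lookup_device(dest_ip, routes=ROUTES):
    """Return the interface of the most specific route covering dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None
```

The lookup needs no upcall because it only consults local state.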

> As far as I can tell, kDAPL punts on this and simply requires the
> consumer to handle the route lookup itself before calling
> dat_ep_connect().  It seems that current kDAPL consumers similarly
> punt on this issue: the iSER initiator and the NFS-RDMA client both
> just use a single device which is statically discovered at init time.
> 
> It seems that the kDAPL connection model has a serious flaw, in that
> it pushes the complexity of route lookup into the consumer.  Further,
> we have strong evidence that this routing code is hard to write and
> that consumers will just ignore this complexity and hard-code
> solutions that don't work under all configurations.
> With this in mind, I believe that the connection API needs to be
> something more like the following:
> 
> rdma_resolve_address():
> inputs: dest IP address, qos, npaths,
> done callback, opaque context
>   done callback params: status, local RDMA device,
> RDMA transport address, context
> 
> This function starts the process of resolving an IP address to
> an RDMA device and address.  When the resolution is complete,
> the callback is called with a status.  If the status is
> "success" then the callback also gets the device pointer and
> transport address (as well as the original context that the
> consumer passed in).

In the address resolution you have 2 upcalls (from IP to GID and from
GID to path). So, if you are already covering one upcall in the cma, why
not cover both?
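
A small sketch of what covering both upcalls could look like: the two
asynchronous steps are chained internally and the consumer only sees
the final completion. The tables, GID value and function names are
hypothetical, and the callbacks run immediately here rather than
asynchronously.

```python
GIDS = {"192.168.11.20": "fe80::2:c903:0:1"}           # hypothetical IP -> GID
PATHS = {"fe80::2:c903:0:1": {"dlid": 0x12, "sl": 0}}  # hypothetical GID -> path


def ip_to_gid(ip, done):
    """First upcall: resolve an IP address to a GID."""
    gid = GIDS.get(ip)
    done(0 if gid else -1, gid)


def gid_to_path(gid, done):
    """Second upcall: resolve a GID to a path record."""
    path = PATHS.get(gid)
    done(0 if path else -1, path)


def resolve(ip, done):
    """Chain both steps; the consumer's callback fires exactly once."""

    def on_gid(status, gid):
        if status != 0:
            done(status, None)
            return
        gid_to_path(gid, done)

    ip_to_gid(ip, on_gid)
```

Whether this chaining lives in the cma or in the consumer is exactly
the design question being debated.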

> The "RDMA transport address" type is a union containing
> transport-dependent data.  In the IB case, it's all of the
> SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
> iWARP case, it's the source IP, destination IP and QOS.
> 
> npaths can be either 1 or 2 in the IB case; if it's 2, then
> the resolver will try to find a primary and alternate path for
> APM.  In the iWARP case, I guess npaths will always be 1, and
> I guess anyone who wants to use iWARP over multihomed SCTP
> will probably have to use some lower-level API.
> 
> By the way, we may also have to have the option of passing in
> a local netdev so that we can handle link-local IPv6
> addresses.  There may be other cases I haven't thought of yet.
> I just hope we can avoid going all the way to the horror of
> the getaddrinfo() API.
> 
> I also hope we can agree to use IPoIB ARP to resolve the
> address in the IB case; having a flag

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty
>However, there's another problem with trying to lump address
>translation and connection into a single "connect" call, and this
>problem looks fundamental and fatal to me.  The connect call takes a
>QP pointer, but to create a QP the consumer needs to know which local
>device to use.  However, the consumer doesn't know which device to use
>until the destination address has been resolved to a route, including
>a local interface.

I agree that this is a fairly serious issue with the proposed API.  I guess that
I'd like to clarify what the operation of a connect call would do.  Would it be
responsible for modifying the QP?  If so, could such a call also allocate the
QP?  Note that I'm not advocating either of these, just trying to determine what
the behavior of the API would be.

>Wait for connection requests and pass events to the consumer's
>callback.  I'm not sure if/how we want to support binding to
>a particular IP address.  The current IB CM in Linux doesn't
>support binding a listen to a single device or port, and even
>if it did it's not clear how to handle binding to one IP
>address when a port has more than one IP.

I don't think that it would be overly difficult to bind IB CM listen requests to
a specific port or LID, or based on matching specific private data.

- Sean
