RE: [openib-general] RDMA connection and address translation API

2005-08-29 Thread Guy German

Sean wrote:
 It looks like this would work.  If a client wanted to create multiple
 connections to the same remote service (for example, to separate control and
 data), then it seems more efficient to move the asynchronous at outside of
 the connect call.
 - Sean
 
 That's a good point. What I had in mind was mainly simplicity for the
 consumer - save him dealing with another upcall.
 
 Maybe caching in the at module would make things better, but I agree
 that for multiple connections to the same remote service, the
 asynchronous at approach seems more appropriate.

OTOH,
After thinking about it some more, there might be problems in letting
each and every consumer do his own caching. at.c has a (not yet
implemented) mechanism for caching tables with invalidation.

Do we really want to let the consumer handle all the cases of routing
tables changing on the fly etc., or centralize it in one place (i.e.
at.c)?
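
To make the centralized option concrete, here is a minimal user-space sketch of a translation cache with a single invalidate entry point. Every name in it (at_cache, at_cache_lookup, at_cache_invalidate, the 8-slot table) is hypothetical and chosen for illustration; it is not the at.c interface, whose caching is not implemented yet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AT_CACHE_SLOTS 8

/* One cached IPv4 -> GID translation (hypothetical layout). */
struct at_entry {
    uint32_t ip;        /* IPv4 address, host byte order */
    uint8_t  gid[16];   /* resolved IB GID */
    int      valid;
};

struct at_cache {
    struct at_entry slot[AT_CACHE_SLOTS];
};

void at_cache_init(struct at_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}

/* Store a translation, overwriting whatever occupied the slot. */
void at_cache_insert(struct at_cache *c, uint32_t ip, const uint8_t gid[16])
{
    struct at_entry *e = &c->slot[ip % AT_CACHE_SLOTS];

    e->ip = ip;
    memcpy(e->gid, gid, 16);
    e->valid = 1;
}

/* Return 1 and copy the GID on a hit, 0 on a miss. */
int at_cache_lookup(const struct at_cache *c, uint32_t ip, uint8_t gid[16])
{
    const struct at_entry *e = &c->slot[ip % AT_CACHE_SLOTS];

    if (!e->valid || e->ip != ip)
        return 0;
    memcpy(gid, e->gid, 16);
    return 1;
}

/* Single place to call when routing or neighbour state changes. */
void at_cache_invalidate(struct at_cache *c, uint32_t ip)
{
    struct at_entry *e = &c->slot[ip % AT_CACHE_SLOTS];

    if (e->valid && e->ip == ip)
        e->valid = 0;
}
```

With the cache owned by one module, a routing change needs one at_cache_invalidate() call rather than a notification to every consumer that kept its own copy.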

What do you think, Sean ?

Guy


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German

 We need to insert in here:

 ib_modify_qp(...);  /* somehow uses address resolution... */
 ib_post_recvs(...);


or add a new call to create the qp and modify it to init (an analog to
the socket(2) function).

Sean This approach seems reasonable to me.  Maybe something like:
Sean rdma_create_qp(rdma_addr_info);

Sean Uses the output from the address resolution to create the QP on the 
Sean correct device and transitions it to the INIT state.  The user can 
Sean now post any work requests that they want.  For example, with iWarp, 
Sean I believe that even send work requests can be posted in the INIT state.

What do you think about this flow ? 
1. resolve device and port from ip address - synchronous operation 
   (like at.c resolve_ip)
2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
3. ib_post_recvs(...);
4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect
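
The four steps above can be sketched as a small user-space mock. All mock_* names, the reduced state set and the hard-coded device/port are assumptions made for illustration; none of this is the ib_verbs or CM API, and the asynchronous lookup in step 4 is collapsed into a direct call.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified QP states; real IB QPs have more (SQD, SQE, ERR). */
enum mock_qp_state { MOCK_QP_RESET, MOCK_QP_INIT, MOCK_QP_RTR, MOCK_QP_RTS };

struct mock_qp {
    enum mock_qp_state state;
    int      device;
    uint8_t  port;
    uint16_t pkey_index;
    unsigned recvs_posted;
};

/* Step 1: synchronous lookup of the local device and port for an IP.
 * The answer is hard-coded; a real resolver would consult routing. */
int mock_resolve_ip(uint32_t dst_ip, int *device, uint8_t *port)
{
    assert(device != 0 && port != 0);
    (void)dst_ip;
    *device = 0;
    *port = 1;
    return 0;
}

/* Step 2: create the QP on device+port and move it to INIT, pkey index 0. */
void mock_rdma_create_qp(struct mock_qp *qp, int device, uint8_t port)
{
    qp->state = MOCK_QP_INIT;
    qp->device = device;
    qp->port = port;
    qp->pkey_index = 0;
    qp->recvs_posted = 0;
}

/* Step 3: receives may only be posted once the QP has left RESET. */
int mock_post_recv(struct mock_qp *qp)
{
    if (qp->state == MOCK_QP_RESET)
        return -1;
    qp->recvs_posted++;
    return 0;
}

/* Step 4: fix up the pkey index from the (here already known) path,
 * then connect. Only a QP that is in INIT is accepted. */
int mock_cma_connect(struct mock_qp *qp, uint16_t resolved_pkey_index)
{
    if (qp->state != MOCK_QP_INIT)
        return -1;
    qp->pkey_index = resolved_pkey_index;
    qp->state = MOCK_QP_RTS;   /* stands in for INIT -> RTR -> RTS */
    return 0;
}
```

The point of the ordering is that receives posted in step 3 are already queued when the connection comes up in step 4.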

Guy


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread James Lentini



On Thu, 25 Aug 2005, Sean Hefty wrote:

  Any way providing src/dst IPs in the CM Private data is simple, 
  and we can come with IBTA extension blessing that data structure 
  as a general way to map IP oriented protocols over IB (a 1-2 page 
  draft at the most) This way it can also address Caitlin concerns 
  regarding NFS  IETF (since now it's a transport specific issue)
 
 How long do you estimate it would take to standardize an IP-GID 
 mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? 
 A year?
 
 Let's assume that everyone on this list is in agreement.
 
 Does anyone in the IB world disagree with adding IP addresses in the 
 CM private data area?  Would we want to extend this concept to SIDR 
 as well?

I think we should focus on providing a mechanism to allow ULPs to use 
IP addresses on InfiniBand networks. 

Service discovery (SIDR) seems like a separate issue. The ability to 
ask "What UD QPN is this service using?" seems useful on its own.

RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Caitlin Bestler

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Guy German
 Sent: Friday, August 26, 2005 1:27 AM
 To: Sean Hefty; James Lentini
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address 
 translation API

  We need to insert in here:

  ib_modify_qp(...);  /* somehow uses address resolution... */ 
  ib_post_recvs(...);

 or add a new call to create the qp and modify it to init (an analog to
 the socket(2) function).

 Sean This approach seems reasonable to me.  Maybe something like:
 Sean rdma_create_qp(rdma_addr_info);

 Sean Uses the output from the address resolution to create the QP on
 Sean the correct device and transitions it to the INIT state.  The user
 Sean can now post any work requests that they want.  For example, with
 Sean iWarp, I believe that even send work requests can be posted in the
 Sean INIT state.

 What do you think about this flow ? 
 1. resolve device and port from ip address - synchronous operation
    (like at.c resolve_ip)
 2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
 3. ib_post_recvs(...);
 4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect

At least with iWARP a QP is not bound to a specific port, or even
to an IP Address. It is only bound to the RDMA Device (RNIC) and
Protection Domain. The same QP can be re-used for a new connection
with a new IP address. Indeed, that is exactly what would happen
with application-layer controlled failover (such as iSER).


RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German

 What do you think about this flow ? 
 1. resolve device and port from ip address - synchronous operation 
(like at.c resolve_ip)
 2. rdma_create_qp (device+port) - modifies qp to init with 
 default pkey index 
 3. ib_post_recvs(...); 
 4. cma_connect -  asynchronous at, modify qp with correct 
 pkey index, cm_connect

Caitlin wrote:
At least with iWARP a QP is not bound to a specific port, or even
to an IP Address. It is only bound to the RDMA Device (RNIC) and
Protection Domain. The same QP can be re-used for a new connection
with a new IP address. Indeed, that is exactly what would happen
with application-layer controlled failover (such as iSER).

In IB, in order to post receives the QP needs to be in INIT.
In order to modify the QP to INIT, you need the port and pkey_index.
If iWARP can post receives without them, the iWARP implementation
of rdma_create_qp can ignore the port attribute.

The other option that was suggested to solve the sync problem
(the need to post receives before connect) is to retrieve the path
synchronously, which would require unnecessary upcall handling
for iWARP consumers.

Guy

RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Guy German

What do you think about this flow ?
1. resolve device and port from ip address - synchronous operation
   (like at.c resolve_ip)
2. rdma_create_qp (device+port) - modifies qp to init with default pkey index
3. ib_post_recvs(...);
4. cma_connect - asynchronous at, modify qp with correct pkey index, cm_connect

It looks like this would work.  If a client wanted to create multiple
connections to the same remote service (for example, to separate control and
data), then it seems more efficient to move the asynchronous at outside of the
connect call.
- Sean

That's a good point. What I had in mind was mainly simplicity for the
consumer - save him dealing with another upcall.

Maybe caching in the at module would make things better, but I agree
that for multiple connections to the same remote service, the
asynchronous at approach seems more appropriate.

So ...
Does everyone else think that we should change the API of a cm
abstraction to asynchronous at before connection ?
(This should concern mostly the iWARP guys - Caitlin, Tom etc.)

Thanks,
Guy

RE: [openib-general] RDMA connection and address translation API

2005-08-26 Thread Caitlin Bestler

 -Original Message-
 From: Guy German [mailto:[EMAIL PROTECTED] 
 Sent: Friday, August 26, 2005 12:28 PM
 To: Caitlin Bestler; Sean Hefty; James Lentini
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address 
 translation API

  What do you think about this flow ? 
  1. resolve device and port from ip address - synchronous operation
     (like at.c resolve_ip)
  2. rdma_create_qp (device+port) - modifies qp to init with default
     pkey index
  3. ib_post_recvs(...);
  4. cma_connect - asynchronous at, modify qp with correct pkey index,
     cm_connect

 Caitlin wrote:
 At least with iWARP a QP is not bound to a specific port, or even to an
 IP Address. It is only bound to the RDMA Device (RNIC) and Protection
 Domain. The same QP can be re-used for a new connection with a new IP
 address. Indeed, that is exactly what would happen with
 application-layer controlled failover (such as iSER).

 In ib, in order to post receive the QP need to be in init.
 In order to modify qp to init, you need port and pkey_index.
 If iWARP can post receive without it, the iwarp 
 implementation of rdma_create_qp can ignore the port attribute.

The closest equivalent of a pkey_index would be the VLAN ID, which
is at L2 and totally transparent to an iWARP QP. You can definitely
post receive buffers before knowing anything about the TCP connection
(or SCTP association/stream) that will provide the LLP service.

 The other option, that was suggested to solve the sync 
 problem (need of post receive before connect) is to retrieve 
 the path synchronically, which will require an unnecessary 
 upcall handling for iwarp consumers.

The generic requirement is that the QP passed to the connect
method is ready to be moved to a connected state as soon as
the connection establishment exchanges have finished.

If I follow what you are proposing, you are trying to find a way
to do this for IB automatically as a by-product of determining what
device to use. I don't see any problem with this, as long as the
port being returned from the first call is defined in such a
way that it can have a void value when the transport does not need
this refinement. Avoiding transport-dependent steps is good for
encouraging development of RDMA-aware applications.
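
A minimal sketch of the "void port" idea, assuming hypothetical sketch_* names and using 0 as the void value (IB HCA ports are numbered from 1). It is not a proposed signature for rdma_create_qp, only an illustration of how one call path can serve both transports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_transport { SKETCH_IB, SKETCH_IWARP };

/* 0 serves as the "void" port; IB HCA ports are numbered from 1. */
#define SKETCH_PORT_NONE 0

/* Output of the resolve step (hypothetical layout). */
struct sketch_addr_info {
    enum sketch_transport transport;
    int     device;
    uint8_t port;      /* SKETCH_PORT_NONE when the transport has no use for it */
};

struct sketch_qp {
    int     device;
    uint8_t port;
    int     in_init;   /* 1 once receives may be posted */
};

/* Create a QP from the resolver output. IB needs a real port to reach
 * INIT; iWARP ignores the field, so callers use one code path for both. */
int sketch_create_qp(const struct sketch_addr_info *ai, struct sketch_qp *qp)
{
    assert(ai != NULL && qp != NULL);
    if (ai->transport == SKETCH_IB && ai->port == SKETCH_PORT_NONE)
        return -1;
    qp->device = ai->device;
    qp->port = (ai->transport == SKETCH_IB) ? ai->port : SKETCH_PORT_NONE;
    qp->in_init = 1;
    return 0;
}
```

The consumer never branches on the transport; only the implementation behind the call does.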


Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig

On Wed, Aug 24, 2005 at 02:15:09PM -0700, Roland Dreier wrote:
 Roland Well, that's not what I would expect.  Suppose I have a
 Roland device configured with local addresses 192.168.11.12 and
 Roland 192.168.98.99 and I
 
 Christoph You never configure a device with local addresses.  IP
 Christoph addresses are always a per-host attribute in Linux.
 
 I don't think this is really true.  In some ways Linux behaves as if
 IP addresses are per-host (eg ARP responses can go out any interface)
 but really IP addresses are attached to an interface.  Every struct
 net_device has a struct in_device, and every struct in_device has a
 list of struct in_ifaddrs for the device's IP addresses.

This is correct, but the user-visible effect is what I said above.
When you do an ARP query for any of the IP addresses of a Linux box
you'll get a response even if that interface isn't on the network.

Even if you don't think that's enough you can assign any number of
IP and other networking addresses to a given device even formally,
rendering the notion of an IP address - network device relation
rather moot.
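
The difference between the two views can be modelled in a few lines. This is a toy user-space model, not kernel code: the sk_* names are hypothetical, and the two addresses from Roland's example are placed on separate interfaces purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ADDRS 4
#define MAX_IFS   4

struct sk_iface {
    uint32_t addrs[MAX_ADDRS];   /* IPv4 addresses, host byte order */
    int      naddrs;
};

struct sk_host {
    struct sk_iface ifs[MAX_IFS];
    int             nifs;
};

/* Build an IPv4 address from dotted-quad components. */
uint32_t sk_ip(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

int sk_iface_has(const struct sk_iface *ifp, uint32_t ip)
{
    for (int i = 0; i < ifp->naddrs; i++)
        if (ifp->addrs[i] == ip)
            return 1;
    return 0;
}

/* Should an ARP request for target_ip, received on interface rx_if,
 * be answered?
 * strict == 0: yes if any interface on the host owns the address.
 * strict != 0: yes only if the receiving interface owns it. */
int sk_arp_should_reply(const struct sk_host *h, int rx_if,
                        uint32_t target_ip, int strict)
{
    assert(rx_if >= 0 && rx_if < h->nifs);
    if (strict)
        return sk_iface_has(&h->ifs[rx_if], target_ip);
    for (int i = 0; i < h->nifs; i++)
        if (sk_iface_has(&h->ifs[i], target_ip))
            return 1;
    return 0;
}
```

The non-strict branch corresponds to the per-host behaviour described above; the strict branch is the per-interface view.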

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig

On Wed, Aug 24, 2005 at 02:22:31PM -0700, Caitlin Bestler wrote:
 Not if the host connects two disjoint networks and does not route
 between them. Such a host should/may be configured to reject any
 packet that arrives with a destination address that does not match
 the expected destination address for the port it arrives upon.

While you can configure a Linux system to reject such requests through
a bunch of crude hacks, the default and fully RFC compliant behaviour
is to always reply to ARP requests for any IP address assigned to the
system.  RDMA CM implementations must work the same.

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler

Good point. But that's about wire behavior, not what an application sees.

And yes, the RDMA device must behave as though its IP layer
were part of the host stack. That is a strong argument for
standardizing many of those interactions rather than relying
on fully compliant parallel processing.
 

-Original Message-
From: Christoph Hellwig [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 25, 2005 1:52 AM
To: Caitlin Bestler
Cc: Christoph Hellwig; openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

On Wed, Aug 24, 2005 at 02:22:31PM -0700, Caitlin Bestler wrote:
 Not if the host connects two disjoint networks and does not route 
 between them. Such a host should/may be configured to reject any 
 packet that arrives with a destination address that does not match the 
 expected destination address for the port it arrives upon.

While you can configure a Linux system to reject such request through a bunch
of crude hacks, the default and fully RFC compliant behaviour is to always
reply to ARP requests for any IP address assigned to the system.  RDMA CM
implementations must work the same.



Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Roland Dreier wrote:

 Sean Is the idea that the user calls connect() and then receives
 Sean a single callback indicating that the connection has been
 Sean established?  If so, then the user may need to modify the QP
 Sean to the INIT state, which would require some knowledge
 Sean already of the path.  We would also need to be clear on
 Sean whether the QP is expected to be in the INIT state before
 Sean connect is called, or if it could be in any arbitrary state.
 Sean The other alternative is to provide multiple callbacks
 Sean during connection establishment.
 
 To me it makes sense for the generic CM API to be defined so that an
 IB QP must be in the INIT state before being passed to connect().

Will the ib_modify_qp() function be made transport neutral? I see some 
fields in the ib_qp_attr structure that are IB specific.

I think the RDMA connection API should perform all the QP state 
transitions for the ULP. How about a new call to create the QP and 
perform all QP state transitions necessary for the posting receive 
work requests?

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler

The data required when doing a qp-modify-to-rts is inherently
transport specific. IB requires a set of data obtained from the
IB CM protocol (or the equivalent data through application specific
black magic), while iWARP requires a handle for a TCP connection
(assumed to be a socket, but not explicitly required to be so).
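
One way to picture "inherently transport specific" is a tagged union. The field selection below is an assumption made for illustration (a hypothetical subset for IB, a bare connection handle for iWARP); it does not mirror ib_qp_attr or any RNIC verbs structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sk_transport { SK_IB, SK_IWARP };

/* Hypothetical subset of what the IB CM exchange yields. */
struct sk_ib_rts {
    uint32_t remote_qpn;
    uint32_t start_psn;
    uint16_t dlid;
};

/* iWARP hands over an established TCP connection instead. */
struct sk_iwarp_rts {
    int tcp_handle;    /* e.g. a socket descriptor; negative means "none" */
};

struct sk_rts_params {
    enum sk_transport transport;
    union {
        struct sk_ib_rts    ib;
        struct sk_iwarp_rts iwarp;
    } u;
};

/* Check that the transport-specific half needed for RTS is filled in. */
int sk_rts_params_ok(const struct sk_rts_params *p)
{
    assert(p != NULL);
    switch (p->transport) {
    case SK_IB:
        return p->u.ib.remote_qpn != 0 && p->u.ib.dlid != 0;
    case SK_IWARP:
        return p->u.iwarp.tcp_handle >= 0;
    }
    return 0;
}
```

A transport-neutral modify call would have to accept something of this shape, which is why the two halves share no fields.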

The problem is that when the RDMAC specified the iWARP modify qp
to RTS behaviour they did not foresee the non-technical barriers
to simply using a socket handle to specify transfer of ownership
of a TCP connection from one stack to another.
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
 Sent: Thursday, August 25, 2005 7:54 AM
 To: Roland Dreier
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API
 
 
 
 On Wed, 24 Aug 2005, Roland Dreier wrote:
 
  Sean Is the idea that the user calls connect() and then receives
  Sean a single callback indicating that the connection has been
  Sean established?  If so, then the user may need to modify the QP
  Sean to the INIT state, which would require some knowledge
  Sean already of the path.  We would also need to be clear on
  Sean whether the QP is expected to be in the INIT state before
  Sean connect is called, or if it could be in any arbitrary state.
  Sean The other alternative is to provide multiple callbacks
  Sean during connection establishment.
  
  To me it makes sense for the generic CM API to be defined so that an
  IB QP must be in the INIT state before being passed to connect().
 
 Will the ib_modify_qp() function be made transport neutral? I 
 see some fields in the ib_qp_attr structure that are IB specific.
 
 I think the RDMA connection API should perform all the QP 
 state transitions for the ULP. How about a new call to create 
 the QP and perform all QP state transitions necessary for the 
 posting receive work requests?

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Roland Dreier

Sean Another possibility could be to add a list of receives to
Sean rdma_connect().

Guy I added this to both connect and accept calls

I don't think this is a good idea.  Let's try to streamline the
connect call, not add every single possible feature to it.

 - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Roland Dreier wrote:

 James You need to consider what makes sense for *both* ib and
 James iwarp. Keep in mind that the correct API will allow a
 James consumer to use ib and iwarp devices transparently. In
 James other words there will be one code path that supports both.
 
 James If we were to adopt your proposal, the consumer would need
 James to perform unnecessary operations on iWARP.
 
 No, I think we just need to realize that a perfectly transport neutral
 protocol implementation is not achievable.  

It is achievable. Although the IB and iWARP protocols are different, 
they can provide the same services to NFS-RDMA.

IB is missing one service that iWARP has, namely that nodes can be 
identified with IP addresses. The ATS mechanism provides this 
capability for IB networks. If there are better mechanisms that do the 
same thing, then NFS-RDMA can use them. 

The important thing is not to push this up into the ULPs. The NFS-RDMA 
protocol is being standardized in the IETF. There is no reason to 
upset that process. If an additional IB specific protocol is 
necessary, it should be standardized in the IBTA.

 It's unfortunate that kDAPL fooled people by hiding the details of 
 the wire protocol under a supposedly neutral API, but the fact is 
 that mapping an abstract RDMA transport to a real implementation 
 will always involve arbitrary transport-dependent choices.

The kDAPL API *is* transport neutral. This has been demonstrated at 
several interoperability tests at which the same applications were run 
on both IB and iWARP.

kDAPL isn't the only transport neutral networking API. The Sockets API 
supports UDP and TCP transports via the same interface. 

I believe we are very close to reaching agreement on a transport 
neutral RDMA connection API. Comparing your API proposal to the API 
that we proposed at the BOF, they are very similar. The most important 
similarity is that both use IP addressing. 

The only real point of debate is over how to perform the address 
translation (IP -> GID) on IB. I believe we should separate that from 
the API discussion. 

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Caitlin Bestler wrote:

 NFS over RDMA does not do that.
 
 Shouldn't that be the end of discussion on abusing CM private data
 unless you are talking *solely* about IB private data. And if that is
 the discussion, should not such a strategy be proposed to IETF
 and/or IBTA for an NFSoRDMA for IB official mapping?

Since this is IB specific, I think it should be addressed in the IBTA.

 The other end of the NFSoRDMA connection is not necessarily
 running OpenIB or even Linux and is not party to any of these
 discussions.
 
  
  My resistance is that ATS is just complexity without any benefit.  It
  doesn't provide additional security.  It doesn't solve the
  multi-homing problem we're talking about now.  Once you've thrown away
  information by turning your IP address into an IB GID, there's no
  magic way ATS can recreate that information and be psychic about which
  of the multi-homed IPs you actually meant.  So why not just put the IP
  addressing information into the CM private data, the way that the SDP
  protocol already does?
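
As an illustration of carrying the addresses in CM private data, here is a sketch that packs a source and destination IPv4 address into the front of a private-data buffer. The 9-byte layout (version byte plus two big-endian addresses) and the sk_* names are invented for this example; this is not the SDP hello header format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PRIV_HDR_LEN 9   /* 1 version byte + two 4-byte IPv4 addresses */

static void sk_put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t sk_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the header at the start of buf. Returns the number of bytes
 * used, or -1 if the buffer is too small. */
int sk_priv_pack(uint8_t *buf, size_t buflen, uint32_t src_ip, uint32_t dst_ip)
{
    assert(buf != NULL);
    if (buflen < SK_PRIV_HDR_LEN)
        return -1;
    buf[0] = 1;                       /* format version */
    sk_put_be32(buf + 1, src_ip);
    sk_put_be32(buf + 5, dst_ip);
    return SK_PRIV_HDR_LEN;
}

/* Recover both addresses on the passive side. Returns 0 on success,
 * -1 on a short buffer or unknown version. */
int sk_priv_unpack(const uint8_t *buf, size_t buflen,
                   uint32_t *src_ip, uint32_t *dst_ip)
{
    assert(buf != NULL && src_ip != NULL && dst_ip != NULL);
    if (buflen < SK_PRIV_HDR_LEN || buf[0] != 1)
        return -1;
    *src_ip = sk_get_be32(buf + 1);
    *dst_ip = sk_get_be32(buf + 5);
    return 0;
}
```

Because both addresses travel with the connection request, the passive side knows which of its multi-homed addresses was meant and can answer a getpeername()-style query without a reverse lookup.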

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Guy German

On Thu, 2005-08-25 at 08:58 -0700, Roland Dreier wrote:
 Sean Another possibility could be to add a list of receives to
 Sean rdma_connect().
 
 Guy I added this to both connect and accept calls
 
 I don't think this is a good idea.  Let's try to streamline the
 connect call, not add every single possible feature to it.
 
  - R.

I think it is a good solution for the sync problem that Sean raised - in
the case where we modify the qp inside the abstraction layer.
We can take it out (i.e. getting the path and modifying the qp to init
*before* connect) but I think this will be more complicated for the
consumers (especially the iwarp ones).
I am not saying we *have* to do it - this is just a suggestion.

Guy


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Yaron Haviv wrote:

 Any way providing src/dst IPs in the CM Private data is simple, and we
 can come with IBTA extension blessing that data structure as a general
 way to map IP oriented protocols over IB (a 1-2 page draft at the most)
 This way it can also address Caitlin concerns regarding NFS  IETF
 (since now it's a transport specific issue)

How long do you estimate it would take to standardize an IP-GID 
mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A 
year?

Let's assume that everyone on this list is in agreement.

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Roland Dreier wrote:

 James I agree with Caitlin. The eventual solution cannot force
 James protocol modifications in ULPs.
 
 Does this mean we're stuck with the current use of ATS in NFS-RDMA?

NFS-RDMA requires that the lower layer provide IP addressing. ATS is 
one proposal and the only one being documented and standardized in a 
standards organization. Any other solution that was documented and 
standardized should be considered. 

Since this will involve the wire protocol, it can't be OpenIB 
specific.

 Surely there's still time to fix the protocol.

I believe that a solution can be found without impacting the NFS-RDMA 
specifications.

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Roland Dreier

Roland No, I think we just need to realize that a perfectly
Roland transport neutral protocol implementation is not
Roland achievable.

James It is achievable. Although the IB and iWARP protocols are
James different, they can provide the same services to NFS-RDMA.

Not really.  This is just hiding the transport dependence in some
other layer and then pretending it doesn't exist.  IB and iWARP can
provide the same services to NFS/RDMA, but only through some
intermediate layer that implements the actual transport-dependent wire
protocol.

James IB is missing one service that iWARP has, namely that nodes
James can be identified with IP addresses. The ATS mechanism
James provides this capability for IB networks. If there are
James better mechanisms that do the same thing, then NFS-RDMA can
James use them.

All implementations of NFS/RDMA on top of IB had better interoperate,
right?  Which means that someone has to specify which address
translation mechanism is the choice for NFS/RDMA.

James The important things is not to push this up into the
James ULPs. The NFS-RDMA protocol is being standardized in the
James IETF. There is no reason to upset that process. If an
James additional IB specific protocol is necessary, it should be
James standardized in the IBTA.

NFS/RDMA is being defined on top of an abstract RDMA interface.
Someone has to write a spec for how that RDMA abstraction is
translated into packets on the wire for each transport that NFS/RDMA
will run on top of.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Tom Tucker wrote:

  
   - It's not just preventing connections to the wrong local address.
 NFS-RDMA wants the remote source address (ie getpeername()) so that
 it can look it up in the exports list.
 
 Agreed. But you could also get rid of ATS by allowing GIDs to 
 be specified in the exports file and then treating them like 
 IPv6 addresses for the purpose of subnet comparisons.

Could generic code use both GIDs and IPv4 addresses? 

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Talpey, Thomas

At 12:34 PM 8/25/2005, Roland Dreier wrote:
All implementation of NFS/RDMA on top of IB had better interoperate,
right?  Which means that someone has to specify which address
translation mechanism is the choice for NFS/RDMA.

Correct. At the moment the existing NFS/RDMA implementations
use ATS (Sun's and NetApp's).

NFS/RDMA is being defined on top of an abstract RDMA interface.
Someone has to write a spec for how that RDMA abstraction is
translated into packets on the wire for each transport that NFS/RDMA
will run on top of.

Well, we did. We specify the ULP payload of all the messages
in those two IETF documents. What we didn't do is define how
each transport handles IP addressing, that is a transport issue.

We don't need address translation over iWARP, since that uses
IP. Over IB, so far, we have used ATS. I am perfectly fine with
a better solution, but ATS has been fine too.

I am catching up to this discussion, so this is just one reply.

Tom.

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Caitlin Bestler

Generic code MUST support both IPv4 and IPv6 addresses.
I've even seen code that actually does this.

So supporting GIDs is not that much of an issue as long
as no IB network IDs are assigned with a meaning that
conflicts with any reachable IPv6 network ID. (In other
words, assign GIDs so that they are in fact valid IPv6
addresses. Something that was always planned to be one
option for GIDs).
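
A sketch of how generic code might compare IPv4 addresses, IPv6 addresses and GIDs with one routine: treat everything as a 128-bit value, store IPv4 in the IPv4-mapped form ::ffff:a.b.c.d, and compare prefixes. The sk_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write the IPv4-mapped IPv6 form ::ffff:a.b.c.d of ip into out. */
void sk_v4_to_mapped(uint32_t ip, uint8_t out[16])
{
    memset(out, 0, 10);
    out[10] = 0xff;
    out[11] = 0xff;
    out[12] = (uint8_t)(ip >> 24);
    out[13] = (uint8_t)(ip >> 16);
    out[14] = (uint8_t)(ip >> 8);
    out[15] = (uint8_t)ip;
}

/* Return 1 if the first `bits` bits of a and b are equal, else 0.
 * Works the same for IPv6 addresses, mapped IPv4 addresses and GIDs. */
int sk_prefix_match(const uint8_t a[16], const uint8_t b[16], unsigned bits)
{
    unsigned full = bits / 8;   /* whole bytes to compare */
    unsigned rem = bits % 8;    /* leftover bits in the next byte */
    uint8_t mask;

    assert(a != NULL && b != NULL);
    if (bits > 128)
        return 0;
    if (memcmp(a, b, full) != 0)
        return 0;
    if (rem == 0)
        return 1;
    mask = (uint8_t)(0xFFu << (8 - rem));
    return (a[full] & mask) == (b[full] & mask);
}
```

An IPv4 /24 in an exports entry becomes a /120 on the mapped form, while a GID subnet prefix is compared as a /64, so one comparison routine covers both cases.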



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
 Sent: Thursday, August 25, 2005 9:48 AM
 To: Tom Tucker
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address 
 translation API
 
 
 
 On Wed, 24 Aug 2005, Tom Tucker wrote:
 
   
   - It's not just preventing connections to the wrong local address.
  NFS-RDMA wants the remote source address (ie getpeername()) so that
  it can look it up in the exports list.
  
  Agreed. But you could also get rid of ATS by allowing GIDs to be 
  specified in the exports file and then treating them like
  IPv6 addresses for the purpose of subnet comparisons.
 
 Could generic code use both GIDs and IPv4 addresses? 

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv

 -Original Message-
 From: James Lentini [mailto:[EMAIL PROTECTED]
 Sent: Thursday, August 25, 2005 12:21 PM
 To: Yaron Haviv
 Cc: Fab Tillier; Roland Dreier; openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address translation
 API
 
 
 
 On Wed, 24 Aug 2005, Yaron Haviv wrote:
 
  Any way providing src/dst IPs in the CM Private data is simple, and we
  can come with IBTA extension blessing that data structure as a general
  way to map IP oriented protocols over IB (a 1-2 page draft at the most)
  This way it can also address Caitlin concerns regarding NFS  IETF
  (since now it's a transport specific issue)
 
 How long do you estimate it would take to standardize an IP-GID
 mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
 year?
 
 Let's assume that everyone on this list is in agreement.

James, I can identify enough IBTA members on this list.
If the group is in agreement, I believe it's a rather short process,
since it's just a minor definition and the IBTA doesn't have much on its
agenda these days.

For example, Hal added a feature to the SM (client re-register ..) in
weeks, based on the OpenIB input.
We also don't have to wait for a finalized spec to implement it, just as we
implement IPoIB without an IETF RFC (only a draft).

By the way, a quick path could be to define it in DAT and hand it over to
the IBTA; after all, ATS is also not an IBTA standard.

Yaron

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Fab Tillier wrote:

 Performing a forward lookup via ARP is going to be a lot faster than 
 ATS if the ARP entry already exists.

ATS responses could also be cached.
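
As a rough illustration of what caching ATS responses could look like, here is a minimal sketch in C. It is not taken from at.c or any OpenIB code: the names (ats_cache_store, ats_cache_lookup, ats_cache_invalidate), the table size, and the direct-mapped layout are all assumptions made for the example. The invalidate call stands in for whatever would flush an entry when a mapping changes.

```c
#include <stdint.h>
#include <string.h>

#define ATS_CACHE_SLOTS 8   /* arbitrary size, chosen for the example */

/* One cached IP -> GID mapping (hypothetical layout). */
struct ats_cache_entry {
    int      valid;
    uint32_t ip;            /* IPv4 address, host byte order */
    uint8_t  gid[16];
};

static struct ats_cache_entry ats_cache[ATS_CACHE_SLOTS];

/* Direct-mapped: each IP address maps to exactly one slot. */
static unsigned ats_slot(uint32_t ip)
{
    return ip % ATS_CACHE_SLOTS;
}

/* Remember the GID returned for ip, overwriting whatever was in the slot. */
void ats_cache_store(uint32_t ip, const uint8_t gid[16])
{
    struct ats_cache_entry *e = &ats_cache[ats_slot(ip)];

    e->valid = 1;
    e->ip = ip;
    memcpy(e->gid, gid, 16);
}

/* Returns 1 and fills gid_out on a hit, 0 on a miss. */
int ats_cache_lookup(uint32_t ip, uint8_t gid_out[16])
{
    const struct ats_cache_entry *e = &ats_cache[ats_slot(ip)];

    if (!e->valid || e->ip != ip)
        return 0;
    memcpy(gid_out, e->gid, 16);
    return 1;
}

/* Drop the entry for ip, e.g. when the mapping is known to have changed. */
void ats_cache_invalidate(uint32_t ip)
{
    struct ats_cache_entry *e = &ats_cache[ats_slot(ip)];

    if (e->valid && e->ip == ip)
        e->valid = 0;
}
```

A miss would fall back to a real ATS query; a hit avoids it, which is the point being made about ARP entries that already exist.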
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Wed, 24 Aug 2005, Sean Hefty wrote:

 With this in mind, I believe that the connection API needs to be
 something more like the following:
 
 rdma_resolve_address():
 inputs: dest IP address, qos, npaths,
 done callback, opaque context
  done callback params: status, local RDMA device,
 RDMA transport address, context
 ...
 rdma_connect():
 inputs: local QP, RDMA transport address, destination service,
 private data, timeout, event callback, opaque context
 
 Have we agreed that this is the functionality that we should be 
 aiming towards?

I think so, but as you pointed out the local QP must be in the init 
state.

 
 rdma_resolve_address(...);
 /* wait for resolution */
 ib_create_qp(...) /* use device pointer we got from 
  rdma_resolve_address()
 */
 
 We need to insert in here: 
 
 ib_modify_qp(...);  /* somehow uses address resolution... */
 ib_post_recvs(...);
 

or add a new call to create the qp and modify it to init (an analog to 
the socket(2) function).

 rdma_connect(...); /* pass transport address we got from
 rdma_resolve_address() */
 /* wait for connection to finish... */
 
 Another possibility could be to add a list of receives to 
 rdma_connect().

The caller might also want to setup memory windows. Requiring the qp 
to be in the init state before calling connect seems cleaner to me.
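
The ordering constraint discussed here (the QP must reach the init state before connect is called) can be sketched with a small mock in C. Nothing below is the real verbs or CM API: mock_qp and the mock_* functions are hypothetical stand-ins, and the only thing modeled is which calls are legal in which state.

```c
#include <errno.h>

/* Hypothetical, simplified QP states for the sketch. */
enum mock_qp_state { MOCK_QPS_RESET, MOCK_QPS_INIT, MOCK_QPS_CONNECTING };

struct mock_qp {
    enum mock_qp_state state;
    int posted_recvs;
};

/* Stand-in for creating a QP: it starts out in the reset state. */
void mock_create_qp(struct mock_qp *qp)
{
    qp->state = MOCK_QPS_RESET;
    qp->posted_recvs = 0;
}

/* Stand-in for the modify-to-init step that has to follow creation. */
int mock_modify_to_init(struct mock_qp *qp)
{
    if (qp->state != MOCK_QPS_RESET)
        return -EINVAL;
    qp->state = MOCK_QPS_INIT;
    return 0;
}

/* Receives may only be posted once the QP has left the reset state. */
int mock_post_recv(struct mock_qp *qp)
{
    if (qp->state == MOCK_QPS_RESET)
        return -EINVAL;
    qp->posted_recvs++;
    return 0;
}

/* Stand-in for the connect call: it refuses a QP that is not in init. */
int mock_connect(struct mock_qp *qp)
{
    if (qp->state != MOCK_QPS_INIT)
        return -EINVAL;
    qp->state = MOCK_QPS_CONNECTING;
    return 0;
}
```

With this shape, the caller is free to post receives (or do other setup such as memory windows) between the init transition and the connect call, and connect itself stays small.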

 
 - Sean
 
 

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Talpey, Thomas

At 12:56 PM 8/25/2005, Caitlin Bestler wrote:
Generic code MUST support both IPv4 and IPv6 addresses.
I've even seen code that actually does this.

Let me jump ahead to the root question. How will the NFS layer know
what address to resolve?

On IB mounts, it will need to resolve a hostname or numeric string to
a GID, in order to provide the address to connect. On TCP/UDP, or
iWARP mounts, it must resolve to IP address. The mount command
has little or no context to perform these lookups, since it does not
know what interface will be used to form the connection.

In exports, the server must inspect the source network of each
incoming request, in order to match against /etc/exports. If there
are wildcards in the file, a GID-specific algorithm must be applied.
Historically, /etc/exports contains hostnames and IPv4 netmasks/
addresses.
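
For reference, the IPv4 netmask comparison that an exports-style check relies on can be sketched as follows. This is an illustrative helper only, not code from the NFS server; the name ipv4_in_subnet and the host-byte-order convention are assumptions for the example.

```c
#include <stdint.h>

/*
 * Returns 1 if addr falls inside network/prefix_len, 0 otherwise.
 * Addresses are IPv4 in host byte order; prefix_len is 0..32.
 */
int ipv4_in_subnet(uint32_t addr, uint32_t network, unsigned prefix_len)
{
    uint32_t mask;

    if (prefix_len > 32)
        return 0;
    /* A shift by 32 is undefined in C, so handle /0 separately. */
    mask = prefix_len ? (uint32_t)(0xFFFFFFFFu << (32 - prefix_len)) : 0;
    return (addr & mask) == (network & mask);
}
```

A GID has no administrator-assigned subnet structure of this kind, which is why a wildcard match on GIDs would need its own algorithm.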

In either case, I think it is a red herring to assume that the GID
is actually an IPv6 address. They are not assigned by the sysadmin,
they are not subnetted, and they are quite foreign to many users.
IPv6 support for Linux NFS isn't even submitted yet, btw.

With an IP address service, we don't have to change a line of 
NFS code.

Tom.



So supporting GIDs is not that much of an issue as long
as no IB network IDs are assigned with a meaning that
conflicts with any reachable IPv6 network ID. (In other
words, assign GIDs so that they are in fact valid IPv6
addresses. Something that was always planned to be one
option for GIDs).



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of James Lentini
 Sent: Thursday, August 25, 2005 9:48 AM
 To: Tom Tucker
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address 
 translation API
 
 
 
 On Wed, 24 Aug 2005, Tom Tucker wrote:
 
   
- It's not just preventing connections to the wrong local address.
  NFS-RDMA wants the remote source address (ie getpeername()) so that
  it can look it up in the exports list.
  
  Agreed. But you could also get rid of ATS by allowing GIDs to be 
  specified in the exports file and then treating them like
  IPv6 addresses for the purpose of subnet comparisons.
 
 Could generic code use both GIDs and IPv4 addresses? 

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread James Lentini



On Tue, 23 Aug 2005, Roland Dreier wrote:

 The listen side is even simpler:
 
 rdma_listen():
 inputs: local service, event callback, consumer context
 
 Wait for connection requests and pass events to the consumer's
 callback.  I'm not sure if/how we want to support binding to
 a particular IP address.  The current IB CM in Linux doesn't
 support binding a listen to a single device or port, and even
 if it did it's not clear how to handle binding to one IP
 address when a port has more than one IP.
 
 I guess the event callback would receive a device pointer and
 the same RDMA transport address union I talked about above
 when discussing address resolution.
 
 It would be possible to have another function like
 rdma_getpeername() that takes the transport address and
 returns a source IP address.

To be complete, the API needs an rdma_getpeername() function:

rdma_getpeername():
inputs: connected QP
outputs: peer IP address

 In the IB case this would do an
 ATS reverse lookup.  However, I hate this idea.  iSER already
 uses the CM private data to pass the source IP in the IB case,
 and I would much rather fix NFS/RDMA to do the same thing (so
 we can just kill ATS as an address resolution method).

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Sean Hefty

Sean Another possibility could be to add a list of receives to
Sean rdma_connect().

Guy I added this to both connect and accept calls

I don't think this is a good idea.  Let's try to streamline the
connect call, not add every single possible feature to it.

I don't think that we want to add a list of receives to the connect call either.
I only mentioned that it was a possibility.

- Sean


RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED]
 Sent: Thursday, August 25, 2005 2:37 PM
 To: 'James Lentini'; Yaron Haviv
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address translation API

  Any way providing src/dst IPs in the CM Private data is simple, and we
  can come with IBTA extension blessing that data structure as a general
  way to map IP oriented protocols over IB (a 1-2 page draft at the most)
  This way it can also address Caitlin concerns regarding NFS  IETF
  (since now it's a transport specific issue)

 How long do you estimate it would take to standardize an IP-GID
 mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
 year?

 Let's assume that everyone on this list is in agreement.

 Does anyone in the IB world disagree with adding IP addresses in the CM
 private data area?  Would we want to extend this concept to SIDR as well?

 - Sean

I am re-sending my proposal from 2004 as text (attached).
It also addresses the ServiceID issue and can serve as a baseline for
discussion.
Feel free to change it.

Yaron

   Mapping of iWarp/TCP connections to InfiniBand

AUTHOR   Yaron Haviv  ([EMAIL PROTECTED])
VERSION  0.30, Mon June 28 2004

I.  INTRODUCTION

 InfiniBand and iWarp semantics are similar, especially with the latest
 Verb Extensions. The major difference is in the way connections are
 established: iWarp uses TCP-based connection establishment, while
 InfiniBand uses a CM for that.
 A related difference is that in iWarp a user can start in a
 standard TCP mode and migrate to RDMA verbs in the middle of a session.

 This document provides a general mapping from iWarp/TCP
 connection establishment to InfiniBand, which can be used by ULPs over
 InfiniBand or by any future iWarp protocols. It imitates the SDP
 connection establishment process and CM headers (it does not require SDP;
 it just uses the same data formats for CM messages).

II. Establishing TCP/iWarp-like connections over InfiniBand

 To emulate an iWarp connection, it is necessary to open an
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to RDMA mode; this requires exchanging the receive buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the SDP fields
 that are not relevant are marked Reserved.

 iWarp emulation CM Request (Hello) Private Data header

 Each row below is one 32-bit word (bits 0-31):

   04 | MID | Rsvd | bufs
   08 | len
   12 | Reserved
   16 | Reserved
   20 | MajVer | MinVer | IPVer | FlowC | Reserved
   24 | DesRemRcvSz
   28 | LocalRcvSz
   32 | Local Port | Reserved
   36 | Src IP (127-96)
   40 | Src IP ( 95-64)
   44 | Src IP ( 63-32)
   48 | Src IP ( 31-00)
   52 | Dst IP (127-96)
   56 | Dst IP ( 95
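
To make the header layout above more concrete, here is a hedged C sketch of how the Hello private data might be declared. It is not part of the proposal. The struct name and field names are hypothetical, and the split of the multi-field words (8-bit MID and Rsvd, 16-bit bufs, 4-bit MajVer/MinVer/IPVer/FlowC, 16-bit Local Port) is an assumption borrowed from the SDP Hello format that the text says it imitates. It also assumes the labels 04..56 count bytes at the end of each word, so Src IP starts at byte 32, and it stops at Dst IP because the layout is cut off there.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-array fields keep the struct free of padding and endian issues. */
struct iwarp_emul_hello {
    uint8_t mid;                /* message ID (assumed 8 bits) */
    uint8_t rsvd0;
    uint8_t bufs[2];            /* big-endian (assumed 16 bits) */
    uint8_t len[4];             /* big-endian */
    uint8_t rsvd1[8];           /* two reserved words */
    uint8_t majv_minv;          /* MajVer/MinVer nibbles (assumed) */
    uint8_t ipv_flowc;          /* IPVer/FlowC nibbles (assumed) */
    uint8_t rsvd2[2];
    uint8_t des_rem_rcv_sz[4];  /* big-endian */
    uint8_t local_rcv_sz[4];    /* big-endian */
    uint8_t local_port[2];      /* big-endian (assumed 16 bits) */
    uint8_t rsvd3[2];
    uint8_t src_ip[16];         /* bits 127..0, most significant first */
    uint8_t dst_ip[16];
};

/* Store the local port in network (big-endian) byte order. */
void hello_set_port(struct iwarp_emul_hello *h, uint16_t port)
{
    h->local_port[0] = (uint8_t)(port >> 8);
    h->local_port[1] = (uint8_t)(port & 0xff);
}

/* Read the local port back into host order. */
uint16_t hello_get_port(const struct iwarp_emul_hello *h)
{
    return (uint16_t)((h->local_port[0] << 8) | h->local_port[1]);
}
```

Under these assumptions the fields shown occupy 64 bytes of CM private data.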

Re: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Christoph Hellwig

On Thu, Aug 25, 2005 at 01:18:06PM -0400, Talpey, Thomas wrote:
 At 12:56 PM 8/25/2005, Caitlin Bestler wrote:
 Generic code MUST support both IPv4 and IPv6 addresses.
 I've even seen code that actually does this.
 
 Let me jump ahead to the root question. How will the NFS layer know
 what address to resolve?
 
 On IB mounts, it will need to resolve a hostname or numeric string to
 a GID, in order to provide the address to connect. On TCP/UDP, or
 iWARP mounts, it must resolve to IP address. The mount command
 has little or no context to perform these lookups, since it does not
 know what interface will be used to form the connection.
 
 In exports, the server must inspect the source network of each
 incoming request, in order to match against /etc/exports. If there
 are wildcards in the file, a GID-specific algorithm must be applied.
 Historically, /etc/exports contains hostnames and IPv4 netmasks/
 addresses.
 
 In either case, I think it is a red herring to assume that the GID
 is actually an IPv6 address. They are not assigned by the sysadmin,
 they are not subnetted, and they are quite foreign to many users.
 IPv6 support for Linux NFS isn't even submitted yet, btw.
 
 With an IP address service, we don't have to change a line of 
 NFS code.

I think this shows that using IP addresses in any service over
infiniband that isn't actually IP networking is extremely stupid.
Just stop living in the illusion that it makes sense and use IB-specific
addressing, namely IB, and stop all these layering violations into IP,
which is much higher up the stack.


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty

However, there's another problem with trying to lump address
translation and connection into a single connect call, and this
problem looks fundamental and fatal to me.  The connect call takes a
QP pointer, but to create a QP the consumer needs to know which local
device to use.  However, the consumer doesn't know which device to use
until the destination address has been resolved to a route, including
a local interface.

I agree that this is a fairly serious issue with the proposed API.  I guess that
I'd like to clarify what the operation of a connect call would do.  Would it be
responsible for modifying the QP?  If so, could such a call also allocate the
QP?  Note that I'm not advocating either of these, just trying to determine what
the behavior of the API would be.

Wait for connection requests and pass events to the consumer's
callback.  I'm not sure if/how we want to support binding to
a particular IP address.  The current IB CM in Linux doesn't
support binding a listen to a single device or port, and even
if it did it's not clear how to handle binding to one IP
address when a port has more than one IP.

I don't think that it would be overly difficult to bind IB CM listen requests to
a specific port or LID, or based on matching specific private data.

- Sean


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Guy German

Hi,

- Here is a header file for cm abstraction API proposition.
- This is just a preliminary suggestion, for review.
- All comments are welcome.
- Please read the notes in the header remarks
- I am attaching the file and will send it later in a different message,
to the list.
- I think that the ib_ prefix should be changed to rdma_, but that
should be done for the rest of the verbs as well, if we are claiming
that the ib verbs abstract iwarp.
- I think that the main difference between the 2 propositions is the
question of whether or not to expose the consumer to the address
resolution. I believe this suggestion (of covering it in the cma) is
simpler, because it saves unnecessary upcall handling for the consumer.
In any case - I don't believe this is clear cut, and would like to hear
other opinions from people on the list.
- Also please see my embedded answer to this mail


Thanks,
Guy.

 We already discussed the problem with having the listen callback pass
 the consumer a remote source address -- doing this requires the
 connection handling module to do an ATS reverse lookup in the IB case,
 which the consumer might not want.  I think there's agreement that the
 correct thing here is for the listen callback to pass a transport
 address to the consumer and provide a function that the consumer can
 call to perform an ATS reverse lookup if desired.  This isn't a major
 problem and can be dealt with.

I agree. This is corrected in the current suggestion

 However, there's another problem with trying to lump address
 translation and connection into a single connect call, and this
 problem looks fundamental and fatal to me.  The connect call takes a
 QP pointer, but to create a QP the consumer needs to know which local
 device to use.  However, the consumer doesn't know which device to use
 until the destination address has been resolved to a route, including
 a local interface.

The proposition, also presented (I believe) at the OpenIB workshop,
includes a function called ib_cma_get_device that retrieves the device
(for qp creation purposes) according to the destination address and the
local routing table. This is done synchronously, and it is implemented
today in the at module. If using link-local IPv6 addresses, I think that
this function isn't even necessary (If I understand it correctly - you
need to know which device to get out from).

 As far as I can tell, kDAPL punts on this and simply requires the
 consumer to handle the route lookup itself before calling
 dat_ep_connect().  It seems that current kDAPL consumers similarly
 punt on this issue: the iSER initiator and the NFS-RDMA client both
 just use a single device which is statically discovered at init time.
 
 It seems that the kDAPL connection model has a serious flaw, in that
 it pushes the complexity of route lookup into the consumer.  Further,
 we have strong evidence that this routing code is hard to write and
 that consumers will just ignore this complexity and hard-code
 solutions that don't work under all configurations.
 With this in mind, I believe that the connection API needs to be
 something more like the following:
 
 rdma_resolve_address():
 inputs: dest IP address, qos, npaths,
 done callback, opaque context
   done callback params: status, local RDMA device,
 RDMA transport address, context
 
 This function starts the process of resolving an IP address to
 an RDMA device and address.  When the resolution is complete,
 the callback is called with a status.  If the status is
 success then the callback also gets the device pointer and
 transport address (as well as the original context that the
 consumer passed in).
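
 A minimal C sketch of the callback shape described above, for
 illustration only. The mock_* names and struct contents are hypothetical,
 the mock completes synchronously where a real resolver would complete
 later, and the "resolution" itself is a placeholder that just derives a
 fake DLID from the address.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device and transport address types. */
struct mock_rdma_device { const char *name; };
struct mock_transport_addr { unsigned dlid; };

/* Done callback: status, local RDMA device, transport address, context. */
typedef void (*resolve_done_fn)(int status,
                                struct mock_rdma_device *dev,
                                const struct mock_transport_addr *addr,
                                void *context);

static struct mock_rdma_device mock_dev = { "mock0" };

/*
 * Start "resolving" dest_ip. Device and address are only passed to the
 * callback on success; the context is always handed back unchanged.
 */
int mock_resolve_address(unsigned dest_ip, resolve_done_fn done, void *context)
{
    struct mock_transport_addr addr;

    if (!done)
        return -1;
    if (dest_ip == 0) {                 /* treat 0 as unresolvable */
        done(-1, NULL, NULL, context);
        return 0;
    }
    addr.dlid = dest_ip & 0xffff;       /* placeholder, not a real lookup */
    done(0, &mock_dev, &addr, context);
    return 0;
}

/* Example consumer: the context points at a place to record the result. */
struct resolve_result {
    int status;
    struct mock_rdma_device *dev;
    unsigned dlid;
};

void save_result(int status, struct mock_rdma_device *dev,
                 const struct mock_transport_addr *addr, void *context)
{
    struct resolve_result *r = context;

    r->status = status;
    r->dev = dev;
    r->dlid = addr ? addr->dlid : 0;
}
```

 The consumer would create its QP on the returned device inside (or after)
 the callback, which is what removes the chicken-and-egg problem with
 passing a QP to a combined resolve-and-connect call.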

In the address resolution you have 2 upcalls (from ip to gid and from
gid to path). So, if you are already covering one upcall in the cma, why
not cover both ?

 The RDMA transport address type is a union containing
 transport-dependent data.  In the IB case, it's all of the
 SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
 iWARP case, it's the source IP, destination IP and QOS.
 
 npaths can be either 1 or 2 in the IB case; if it's 2, then
 the resolver will try to find a primary and alternate path for
 APM.  In the iWARP case, I guess npaths will always be 1, and
 I guess anyone who wants to use iWARP over multihomed SCTP
 will probably have to use some lower-level API.
 
 By the way, we may also have to have the option of passing in
 a local netdev so that we can handle link-local IPv6
 addresses.  There may be other cases I haven't thought of yet.
 I just hope we can avoid going all the way to the horror of
 the getaddrinfo() API.
 
 I also hope we can agree to use IPoIB ARP to resolve the
 address in the IB case; having a flag or some other hack in
 the API to expose the option of

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

Roland:

Steve and I came to the same conclusion on the airplane ride back to
Austin. Whereas plain old TCP/IP selects a device at the bottom of the
stack, RDMA transports must select the device at the top because
pre-connect resources must be allocated and these resources are
associated with a particular device.

I think you've absolutely nailed the active side (by the way, I think
the ib_at_route_by_ip service already performs the necessary routing
function). The listen side, however, I think needs a little tweaking. It
would be beneficial if the client can specify either an IP address and
port to listen on (effectively selecting a particular device), or a wild
card (all RDMA devices). An NFS server is an example of the latter. This
is trivial to do by providing an address to the listen call where a '0'
represents a wild card.

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
 Sent: Wednesday, August 24, 2005 12:07 AM
 To: openib-general@openib.org
 Subject: [openib-general] RDMA connection and address translation API
 
 At the OpenIB workshop on Monday, we had some discussion 
 about a high-level transport-neutral API for connection 
 handling.  After giving the topic some more thought, I've 
 come to the conclusion that neither the kDAPL API nor the new 
 API that was presented are usable.
 In this email, I'll try to detail my reasoning and sketch 
 what I believe is the correct API.
 
 The new API that we looked at was essentially the following 
 (I'm recreating this from memory, so I apologize if I 
 misrepresent it):
 
 listen(local_ip_address, service_id, listen_callback)
 connect(local_qp, remote_ip_address, qos, service_id,
 private_data, connect_callback)
 
 We already discussed the problem with having the listen 
 callback pass the consumer a remote source address -- doing 
 this requires the connection handling module to do an ATS 
 reverse lookup in the IB case, which the consumer might not 
 want.  I think there's agreement that the correct thing here 
 is for the listen callback to pass a transport address to the 
 consumer and provide a function that the consumer can call to 
 perform an ATS reverse lookup if desired.  This isn't a major 
 problem and can be dealt with.
 
 However, there's another problem with trying to lump address 
 translation and connection into a single connect call, and 
 this problem looks fundamental and fatal to me.  The connect 
 call takes a QP pointer, but to create a QP the consumer 
 needs to know which local device to use.  However, the 
 consumer doesn't know which device to use until the 
 destination address has been resolved to a route, including a 
 local interface.
 
 As far as I can tell, kDAPL punts on this and simply requires 
 the consumer to handle the route lookup itself before calling 
 dat_ep_connect().  It seems that current kDAPL consumers 
 similarly punt on this issue: the iSER initiator and the 
 NFS-RDMA client both just use a single device which is 
 statically discovered at init time.
 
 It seems that the kDAPL connection model has a serious flaw, 
 in that it pushes the complexity of route lookup into the 
 consumer.  Further, we have strong evidence that this routing 
 code is hard to write and that consumers will just ignore 
 this complexity and hard-code solutions that don't work under 
 all configurations.
 
 With this in mind, I believe that the connection API needs to 
 be something more like the following:
 
 rdma_resolve_address():
 inputs: dest IP address, qos, npaths,
 done callback, opaque context
   done callback params: status, local RDMA device,
 RDMA transport address, context
 
 This function starts the process of resolving an IP address to
 an RDMA device and address.  When the resolution is complete,
 the callback is called with a status.  If the status is
 success then the callback also gets the device pointer and
 transport address (as well as the original context that the
 consumer passed in).
 
 The RDMA transport address type is a union containing
 transport-dependent data.  In the IB case, it's all of the
 SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
 iWARP case, it's the source IP, destination IP and QOS.
 
 npaths can be either 1 or 2 in the IB case; if it's 2, then
 the resolver will try to find a primary and alternate path for
 APM.  In the iWARP case, I guess npaths will always be 1, and
 I guess anyone who wants to use iWARP over multihomed SCTP
 will probably have to use some lower-level API.
 
 By the way, we may also have to have the option of passing in
 a local netdev so that we can handle link-local IPv6
 addresses.  There may be other cases I haven't thought of yet.
 I just hope we can avoid going all the way to the

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Steve Wise

Roland, this looks good!  A few comments below...

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
 Sent: Wednesday, August 24, 2005 12:07 AM
 To: openib-general@openib.org
 Subject: [openib-general] RDMA connection and address translation API

 At the OpenIB workshop on Monday, we had some discussion about a
 high-level transport-neutral API for connection handling.  After
 giving the topic some more thought, I've come to the conclusion that
 neither the kDAPL API nor the new API that was presented are usable.
 In this email, I'll try to detail my reasoning and sketch what I
 believe is the correct API.

 The new API that we looked at was essentially the following (I'm
 recreating this from memory, so I apologize if I misrepresent it):

 listen(local_ip_address, service_id, listen_callback)
 connect(local_qp, remote_ip_address, qos, service_id,
 private_data, connect_callback)

 We already discussed the problem with having the listen callback pass
 the consumer a remote source address -- doing this requires the
 connection handling module to do an ATS reverse lookup in the IB case,
 which the consumer might not want.  I think there's agreement that the
 correct thing here is for the listen callback to pass a transport
 address to the consumer and provide a function that the consumer can
 call to perform an ATS reverse lookup if desired.  This isn't a major
 problem and can be dealt with.

 However, there's another problem with trying to lump address
 translation and connection into a single connect call, and this
 problem looks fundamental and fatal to me.  The connect call takes a
 QP pointer, but to create a QP the consumer needs to know which local
 device to use.  However, the consumer doesn't know which device to use
 until the destination address has been resolved to a route, including
 a local interface.

 As far as I can tell, kDAPL punts on this and simply requires the
 consumer to handle the route lookup itself before calling
 dat_ep_connect().  It seems that current kDAPL consumers similarly
 punt on this issue: the iSER initiator and the NFS-RDMA client both
 just use a single device which is statically discovered at init time.

Yes, DAPL punts on this.

 It seems that the kDAPL connection model has a serious flaw, in that
 it pushes the complexity of route lookup into the consumer.  Further,
 we have strong evidence that this routing code is hard to write and
 that consumers will just ignore this complexity and hard-code
 solutions that don't work under all configurations.

I agree!

 With this in mind, I believe that the connection API needs to be
 something more like the following:

 rdma_resolve_address():
 inputs: dest IP address, qos, npaths,
 done callback, opaque context
   done callback params: status, local RDMA device,
 RDMA transport address, context

 This function starts the process of resolving an IP address to
 an RDMA device and address.  When the resolution is complete,
 the callback is called with a status.  If the status is
 success then the callback also gets the device pointer and
 transport address (as well as the original context that the
 consumer passed in).

 The RDMA transport address type is a union containing
 transport-dependent data.  In the IB case, it's all of the
 SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
 iWARP case, it's the source IP, destination IP and QOS.
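
 One way the transport address union described above might be written
 in C is sketched here. The type and field names are hypothetical, the
 field widths are guesses for illustration, and IPv4 is assumed for the
 iWARP member; only the list of fields and the npaths limits come from the
 description.

```c
#include <stdint.h>

enum rdma_sketch_transport { RDMA_SKETCH_IB, RDMA_SKETCH_IWARP };

/* IB member: the path-related fields named in the description. */
struct ib_addr_sketch {
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint16_t slid;
    uint16_t dlid;
    uint8_t  sl;
};

/* iWARP member: source IP, destination IP and QOS (IPv4 assumed). */
struct iwarp_addr_sketch {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint8_t  qos;
};

/* Tagged union: the transport field says which member is valid. */
struct rdma_addr_sketch {
    enum rdma_sketch_transport transport;
    union {
        struct ib_addr_sketch    ib;
        struct iwarp_addr_sketch iwarp;
    } u;
};

/* npaths may be 1 or 2 for IB (primary plus alternate), always 1 for iWARP. */
int rdma_addr_npaths_max(const struct rdma_addr_sketch *a)
{
    return a->transport == RDMA_SKETCH_IB ? 2 : 1;
}
```

 A consumer would treat the union as opaque and hand it straight from the
 resolve callback to the connect call.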

 npaths can be either 1 or 2 in the IB case; if it's 2, then
 the resolver will try to find a primary and alternate path for
 APM.  In the iWARP case, I guess npaths will always be 1, and
 I guess anyone who wants to use iWARP over multihomed SCTP
 will probably have to use some lower-level API.

 By the way, we may also have to have the option of passing in
 a local netdev so that we can handle link-local IPv6
 addresses.  There may be other cases I haven't thought of yet.
 I just hope we can avoid going all the way to the horror of
 the getaddrinfo() API.

 I also hope we can agree to use IPoIB ARP to resolve the
 address in the IB case; having a flag or some other hack in
 the API to expose the option of ATS seems unacceptably ugly.

 rdma_connect():
 inputs: local QP, RDMA transport address, destination service,
 private data, timeout, event callback, opaque context

 This function takes the resolved address and actually 
 connects.

 I'm not sure how we want to abstract the IB service vs. iWARP
 TCP port number difference.  I guess it's OK to have iWARP
 consumers stick their (16-bit) port number in a 64-bit
 parameter, even if it's not the prettiest API.

 To head off the knee-jerk

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Tom The listen side, however, I think needs a little tweaking. It
Tom would be beneficial if the client can specify either an IP
Tom address and port to listen on (effectively selecting a
Tom particular device), or a wild card (all RDMA devices). An NFS
Tom server is an example of the latter. This is trivial to do by
Tom providing an address to the listen call where a '0'
Tom represents a wild card.

I agree that it's useful to be able to pass a sockaddr to bind a
listen to (just like the bind() call in userspace).  However, the
problem is that in the IB world, an incoming connection request does
not come with a destination IP address in any standard way.  So I
don't know the right way to implement bind() in the IB case.

By the way, an IP address/port does not necessarily select a single
RDMA device.  It's a perfectly valid configuration to have 10 network
interfaces all with the same local IP address.

 - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Tom I think I understand, but the purpose of specifying the IP
Tom address in the listen is not to filter incoming connect
Tom requests, but rather to determine which devices I listen
Tom on. I think this works for the IB case as well. So the
Tom utility of the IP address specified in the listen is only to
Tom determine which devices the sid is created on. Does this make
Tom sense or am I missing something?

Well, that's not what I would expect.  Suppose I have a device
configured with local addresses 192.168.11.12 and 192.168.98.99 and I
start listening for some service at the address 192.168.11.12.  I
don't think I should see a connection request if a remote system tries
to connect to 192.168.98.99 (even though it's the same network
interface as 192.168.11.12).
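
The semantics I have in mind can be sketched roughly as follows. This is
only an illustration: struct listener, LISTEN_ANY and listen_matches are
made-up names, and the wildcard value 0 follows Tom's suggestion. It says
nothing about how the destination IP would be obtained in the IB case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LISTEN_ANY 0u  /* hypothetical wildcard: listen on every local address */

/* Hypothetical listener record: bound IPv4 address (host order) and port. */
struct listener {
    uint32_t ip;
    uint16_t port;
};

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Returns 1 if a request sent to dst_ip:dst_port belongs to this listener. */
static int listen_matches(const struct listener *l, uint32_t dst_ip, uint16_t dst_port)
{
    assert(l != NULL);
    if (l->port != dst_port)
        return 0;
    return l->ip == LISTEN_ANY || l->ip == dst_ip;
}
```

With a listener bound to 192.168.11.12, a request addressed to
192.168.98.99 does not match, even if both addresses sit on the same
interface.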

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Steve Wise

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
 Sent: Wednesday, August 24, 2005 11:27 AM
 To: Tom Tucker
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API

 Tom I think I understand, but the purpose of specifying the IP
 Tom address in the listen is not to filter incoming connect
 Tom requests, but rather to determine which devices I listen
 Tom on. I think this works for the IB case as well. So the
 Tom utility of the IP address specified in the listen is only to
 Tom determine which devices the sid is created on. Does this make
 Tom sense or am I missing something?

 Well, that's not what I would expect.  Suppose I have a device
 configured with local addresses 192.168.11.12 and 192.168.98.99 and I
 start listening for some service at the address 192.168.11.12.  I
 don't think I should see a connection request if a remote system tries
 to connect to 192.168.98.99 (even though it's the same network
 interface as 192.168.11.12).

I agree, Roland.  ULPs that listen on a specific address expect only
connection requests that were sent to that IP address.  I think we want to
provide this functionality.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini



  However, there's another problem with trying to lump address
  translation and connection into a single connect call, and this
  problem looks fundamental and fatal to me.  The connect call takes a
  QP pointer, but to create a QP the consumer needs to know which local
  device to use.  However, the consumer doesn't know which device to use
  until the destination address has been resolved to a route, including
  a local interface.
 
 The proposal, also presented (I believe) at the OpenIB workshop,
 includes a function called ib_cma_get_device, which retrieves the device
 (for QP creation purposes) according to the destination address and the
 local routing table.

That function was included in the presentation. Given that the 
discussion focused on the proper location of address translation, it 
is understandable that its presence was overlooked.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Fab I think the IB CM needs to be able to do two things.  It
Fab needs to allow a listen to be bound to a specific port -
Fab using the port GUID or the LID or something along those
Fab lines.

Yes, this is probably a good idea.

Fab Knowledge of actual IP addresses would be up to the consumer.
Fab However, the IB CM can facilitate checks by allowing the user
Fab to specify an offset and length in the private data to match
Fab to for incoming requests.

This seems too complex and at the same time too limited to me.  For
one thing -- although I think ATS should die -- this doesn't support
ATS reverse lookups.  For another, it doesn't handle something like
the SDP Hello header, where the IP version is at a certain offset, and
then the IP address is interpreted according to the IP version.

What makes it really ugly is that it's perfectly reasonable for one
consumer to listen to a service at 192.168.11.12 and another consumer
to listen to the same service at 192.168.98.99.  How do we handle this
in the IB case??

 - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini



On Tue, 23 Aug 2005, Roland Dreier wrote:

 It would be possible to have another function like
 rdma_getpeername() that takes the transport address and
 returns a source IP address.  In the IB case this would do an
 ATS reverse lookup.  However, I hate this idea.  iSER already
 uses the CM private data to pass the source IP in the IB case,

I know this is how IB SDP works, but I don't think iSER works this 
way.

The code in the tree calls dat_ep_connect() with a NULL private data 
pointer. 

The iSER HELLO message described in iser_header.h contains IP 
addresses, but I'm not certain that this is part of the current 
protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused).

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:openib-general-
 [EMAIL PROTECTED] On Behalf Of James Lentini
 Sent: Wednesday, August 24, 2005 1:43 PM
 To: Roland Dreier
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address translation API

 On Tue, 23 Aug 2005, Roland Dreier wrote:

  It would be possible to have another function like
  rdma_getpeername() that takes the transport address and
  returns a source IP address.  In the IB case this would do an
  ATS reverse lookup.  However, I hate this idea.  iSER already
  uses the CM private data to pass the source IP in the IB case,

 I know this is how IB SDP works, but I don't think iSER works this
 way.

 The code in the tree calls dat_ep_connect() with a NULL private data
 pointer.

 The iSER HELLO message described in iser_header.h contains IP
 addresses, but I'm not certain that this is part of the current
 protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused).

James,

iSER doesn't mandate the source IP in general, since it does much
stronger authentication during Login.
However, we believe using a header similar to SDP's can help the passive
side:
a. know which destination IP was targeted (in a multi-homed environment)
b. validate the source, for implementations that want to do so for some
reason

That's why the draft suggested adding the source/destination IP to the
private data just as SDP does. I believe it can be a good idea to use the
same approach for NFS/RDMA and eliminate the need for a reverse ATS lookup
(which may have conflicts when multiple IPs exist per node).
We could just use the SDP hello header as is, with unused fields zeroed.
This would allow all ULPs to use the same mechanism.
Yaron


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 24, 2005 11:27 AM
 To: Tom Tucker
 Cc: Roland Dreier; openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API

 Tom I think I understand, but the purpose of specifying the IP
 Tom address in the listen is not to filter incoming connect
 Tom requests, but rather to determine which devices I listen
 Tom on. I think this works for the IB case as well. So the
 Tom utility of the IP address specified in the listen is only to
 Tom determine which devices the sid is created on. Does this make
 Tom sense or am I missing something?

 Well, that's not what I would expect.  Suppose I have a 
 device configured with local addresses 192.168.11.12 and 
 192.168.98.99 and I start listening for some service at the 
 address 192.168.11.12.  I don't think I should see a 
 connection request if a remote system tries to connect to 
 192.168.98.99 (even though it's the same network interface as 
 192.168.11.12).

  - R.

Good point, although for iWARP it will work the way that you expect.
For IB, admittedly it's more complex and would require ATS. There seems
to be significant reluctance around ATS and I don't understand the
issues. Can you provide a quick synopsis?


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

On 8/24/05, Fab Tillier [EMAIL PROTECTED] wrote:
  From: Roland Dreier [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, August 24, 2005 10:16 AM
 
  Fab Knowledge of actual IP addresses would be up to the consumer.
  Fab However, the IB CM can facilitate checks by allowing the user
  Fab to specify an offset and length in the private data to match
  Fab to for incoming requests.
 
  This seems too complex and at the same time too limited to me.  For
  one thing -- although I think ATS should die -- this doesn't support
  ATS reverse lookups.
 
 I think if all ULPs provide their source and destination IP in the private
 data, you can eliminate the reverse lookup altogether.  A simple forward
 lookup is all that's needed to validate that the source GID in the REQ
 matches the reported source IP in the private data.  The forward lookup
 could be done via ATS or via ARP, but the CM doesn't need to care which
 method is used.
 

That is not an option.

The applications are expecting source/destination network addresses
that come from a network layer, not from the peer application. IP has
no problem meeting this requirement. This is an IB problem that needs
to be solved within the scope of IB without changing any ULPs.

  For another, it doesn't handle something like
  the SDP Hello header, where the IP version is at a certain offset, and
  then the IP address is interpreted according to the IP version.
 
 Why can't the IPV field be ignored?  If a listen wants only IPV4 addresses, it
 would specify a 16-byte compare buffer with the first 12 bytes zero, the next 4
 filled with the IPV4 address, and would set the offset to that of the hello
 message's destination address (32).
 
  What makes it really ugly is that it's perfectly reasonable for one
  consumer to listen to a service at 192.168.11.12 and another consumer
  to listen to the same service at 192.168.98.99.  How do we handle this
  in the IB case??
 
 As long as the service IP address (the local address on the listening side) is
 always advertised in the same place in the private data, this isn't a problem.
 The compare lengths and offsets would be identical for both services, but the
 compare buffer contents would differ.  Did I miss what you were getting at?
 

The consensus when this issue was debated in the DAT Collaborative was
that there was no transport-neutral way to specify a set of addresses to listen
on other than "all addresses supported by this device".

As noted in another posting, it is easy to support "all for device" and "this
address only" with transport-neutral interfaces. Anything else is problematic.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty

Fab Why can't the IPV field be ignored?  If a listen wants only
Fab IPV4 addresses, it would specify a 16-byte compare buffer
Fab with the first 12 bytes zero, the next 4 filled with the IPV4
Fab address, and would set the offset to that of the hello
Fab message's destination address (32).

Yes, you're right for SDP.  I guess if we're comfortable mandating
that all protocols put their source and destination IPs in the private
data for the IB case, then this works.  Of course it's somewhat
awkward to pass this information into the transport-neutral CM API but
I think this can be worked around.

For IB, using private data to listen on a specific IP address seems the easiest
thing to do.  (Maybe we could do it by mapping different IP addresses to
different service IDs, requiring registration and lookup?)  If the CM
abstraction layer expected those values to be returned in the REP message, it
could validate that the remote side is using the same protocol to ensure some
degree of backwards compatibility.

I don't know if it makes more sense to push private data checks into the actual
CM or keep them in a CM abstraction layer.  My guess is that the former may be
the easier implementation.

- Sean


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:openib-general-
 [EMAIL PROTECTED] On Behalf Of Caitlin Bestler
 Sent: Wednesday, August 24, 2005 2:14 PM
 To: Fab Tillier
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address translation API
 
 
 The applications are expecting source/destination network addresses
 that come from a network layer, not from the peer application. IP has
 no problem meeting this requirement. This is an IB problem that needs
 to be solved within the scope of IB without changing any ULPs.
 

To my understanding, IB private data fields are IB CM specific,
so embedding the source/destination IP in them doesn't change the ULP and
could be considered part of the IB CM.

You can look at the private data in that case as a replacement for the
TCP CM (SYN/SYN-ACK exchange); a SYN packet includes IPs and ports.

Yaron 

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Fab Why can't the IPV field be ignored?  If a listen wants only
Fab IPV4 addresses, it would specify a 16-byte compare buffer
Fab with the first 12 bytes zero, the next 4 filled with the IPV4
Fab address, and would set the offset to that of the hello
Fab message's destination address (32).

Yes, you're right for SDP.  I guess if we're comfortable mandating
that all protocols put their source and destination IPs in the private
data for the IB case, then this works.  Of course it's somewhat
awkward to pass this information into the transport-neutral CM API but
I think this can be worked around.

Roland What makes it really ugly is that it's perfectly
Roland reasonable for one consumer to listen to a service at
Roland 192.168.11.12 and another consumer to listen to the same
Roland service at 192.168.98.99.  How do we handle this in the IB
Roland case??

Fab As long as the service IP address (the local address on the
Fab listening side) is always advertised in the same place in the
Fab private data, this isn't a problem.  The compare lengths and
Fab offsets would be identical for both services, but the compare
Fab buffer contents would differ.  Did I miss what you were
Fab getting at?

No, I think I confused myself.  As long as the CM can get at the IP
information, it can figure out which consumer is which.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty

 I think if all ULPs provide their source and destination IP in the private
 data, you can eliminate the reverse lookup altogether.  A simple forward
 lookup is all that's needed to validate that the source GID in the REQ
 matches the reported source IP in the private data.  The forward lookup
 could be done via ATS or via ARP, but the CM doesn't need to care which
 method is used.


That is not an option.

The applications are expecting source/destination network addresses
that come from a network layer, not from the peer application. IP has
no problem meeting this requirement. This is an IB problem that needs
to be solved within the scope of IB without changing any ULPs.

IB can solve this by exposing fewer bytes of private data.  ULPs do not
need to know that part of the IB private data is actually used by the CM
abstraction layer.  ULPs that make use of this new interface change anyway.

- Sean


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier

 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 10:16 AM
 
 Fab Knowledge of actual IP addresses would be up to the consumer.
 Fab However, the IB CM can facilitate checks by allowing the user
 Fab to specify an offset and length in the private data to match
 Fab to for incoming requests.
 
 This seems too complex and at the same time too limited to me.  For
 one thing -- although I think ATS should die -- this doesn't support
 ATS reverse lookups.

I think if all ULPs provide their source and destination IP in the private data,
you can eliminate the reverse lookup altogether.  A simple forward lookup is all
that's needed to validate that the source GID in the REQ matches the reported
source IP in the private data.  The forward lookup could be done via ATS or via
ARP, but the CM doesn't need to care which method is used.
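
A rough sketch of that validation, under the assumption that some forward
mapping from IP to GID is available. The table here is a stand-in for ATS
or ARP, and struct gid, struct fwd_entry, forward_lookup and
validate_source are hypothetical names, not existing CM calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct gid {
    uint8_t raw[16];
};

/* Hypothetical forward-mapping entry (IP -> GID), standing in for ATS or ARP. */
struct fwd_entry {
    uint32_t   ip;   /* IPv4 address, host byte order */
    struct gid gid;
};

/* Forward lookup: returns the GID registered for ip, or NULL if unknown. */
static const struct gid *forward_lookup(const struct fwd_entry *tbl, size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].ip == ip)
            return &tbl[i].gid;
    return NULL;
}

/* Returns 1 if the source IP claimed in private data resolves to the REQ's source GID. */
static int validate_source(const struct fwd_entry *tbl, size_t n,
                           uint32_t claimed_ip, const struct gid *req_gid)
{
    const struct gid *g;

    assert(req_gid != NULL);
    g = forward_lookup(tbl, n, claimed_ip);
    return g != NULL && memcmp(g->raw, req_gid->raw, sizeof g->raw) == 0;
}
```

Several IPs may map to the same GID without breaking this check, which is
why only the forward direction is needed.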
 
 For another, it doesn't handle something like
 the SDP Hello header, where the IP version is at a certain offset, and
 then the IP address is interpreted according to the IP version.

Why can't the IPV field be ignored?  If a listen wants only IPV4 addresses, it
would specify a 16-byte compare buffer with the first 12 bytes zero, the next 4
filled with the IPV4 address, and would set the offset to that of the hello
message's destination address (32).
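
To make the compare-buffer idea concrete, here is a small sketch. The
offset of 32 and the 12-zero-bytes-plus-IPv4 layout come from the
paragraph above; struct pdata_compare, compare_for_ipv4 and pdata_matches
are hypothetical names for illustration, not an existing CM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HELLO_DST_IP_OFFSET 32  /* offset cited above for the hello destination address */
#define IP_FIELD_LEN        16  /* room for an IPv6 address */

/* Hypothetical listen-side filter: where to look, how much, and what to expect. */
struct pdata_compare {
    size_t  offset;
    size_t  len;
    uint8_t data[IP_FIELD_LEN];
};

/* Build the filter for an IPv4 listen: 12 zero bytes, then the 4 address bytes. */
static void compare_for_ipv4(struct pdata_compare *c, const uint8_t ip[4])
{
    assert(c != NULL && ip != NULL);
    c->offset = HELLO_DST_IP_OFFSET;
    c->len = IP_FIELD_LEN;
    memset(c->data, 0, 12);
    memcpy(c->data + 12, ip, 4);
}

/* Blind binary comparison over the region; the bytes are never interpreted. */
static int pdata_matches(const struct pdata_compare *c, const uint8_t *pdata, size_t pdata_len)
{
    assert(c != NULL && pdata != NULL);
    if (c->offset + c->len > pdata_len)
        return 0;
    return memcmp(pdata + c->offset, c->data, c->len) == 0;
}
```

Two listens on the same service would share the offset and length and
differ only in the contents of data.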

 What makes it really ugly is that it's perfectly reasonable for one
 consumer to listen to a service at 192.168.11.12 and another consumer
 to listen to the same service at 192.168.98.99.  How do we handle this
 in the IB case??

As long as the service IP address (the local address on the listening side) is
always advertised in the same place in the private data, this isn't a problem.
The compare lengths and offsets would be identical for both services, but the
compare buffer contents would differ.  Did I miss what you were getting at?

- Fab


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

NFS over RDMA does not do that.

Shouldn't that be the end of the discussion on abusing CM private data,
unless you are talking *solely* about IB private data? And if that is
the discussion, shouldn't such a strategy be proposed to the IETF
and/or IBTA as an official NFSoRDMA mapping for IB?

The other end of the NFSoRDMA connection is not necessarily
running OpenIB or even Linux and is not party to any of these
discussions.

 
 My resistance is that ATS is just complexity without any benefit.  It
 doesn't provide additional security.  It doesn't solve the
 multi-homing problem we're talking about now.  Once you've thrown away
 information by turning your IP address into an IB GID, there's no
 magic way ATS can recreate that information and be psychic about which
 of the multi-homed IPs you actually meant.  So why not just put the IP
 addressing information into the CM private data, the way that the SDP
 protocol already does?
 
  - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier

 From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 11:14 AM

 On 8/24/05, Fab Tillier [EMAIL PROTECTED] wrote:
   From: Roland Dreier [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, August 24, 2005 10:16 AM

   Fab Knowledge of actual IP addresses would be up to the consumer.
   Fab However, the IB CM can facilitate checks by allowing the user
   Fab to specify an offset and length in the private data to match
   Fab to for incoming requests.

   This seems too complex and at the same time too limited to me.  For
   one thing -- although I think ATS should die -- this doesn't support
   ATS reverse lookups.

  I think if all ULPs provide their source and destination IP in the private
  data, you can eliminate the reverse lookup altogether.  A simple forward
  lookup is all that's needed to validate that the source GID in the REQ
  matches the reported source IP in the private data.  The forward lookup
  could be done via ATS or via ARP, but the CM doesn't need to care which
  method is used.

 That is not an option.

 The applications are expecting source/destination network addresses
 that come from a network layer, not from the peer application. IP has
 no problem meeting this requirement. This is an IB problem that needs
 to be solved within the scope of IB without changing any ULPs.

If the app wants to use source/destination network addresses, there isn't a
problem.  The problem is the app wants to use IP addresses, which are *not*
network addresses in IB.  So the app needs to decide between one of two things -
be aware of IB network addresses, or provide meaning to IP addresses over IB.
The latter can't be done reliably under the covers - ATS reverse lookups won't
tell you the IP the source actually used, and there's no way to do so without
either using private data in the CM REQ or requiring a 1:1 mapping of IB:IP
addresses.  The 1:1 IB:IP mapping is not feasible, so the only way to know what
IP address the application used is to embed that into the private data.  I would
expect protocols that try to use IP as their addressing would accommodate this
in their IB usage, just like SDP accommodates it in the hello message.

- Fab


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier

 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 11:03 AM
 
 Fab Why can't the IPV field be ignored?  If a listen wants only
 Fab IPV4 addresses, it would specify a 16-byte compare buffer
 Fab with the first 12 bytes zero, the next 4 filled with the IPV4
 Fab address, and would set the offset to that of the hello
 Fab message's destination address (32).
 
 Yes, you're right for SDP.  I guess if we're comfortable mandating
 that all protocols put their source and destination IPs in the private
 data for the IB case, then this works.  Of course it's somewhat
 awkward to pass this information into the transport-neutral CM API but
 I think this can be worked around.

I don't know if we need to mandate IP usage - it's up to the application.  Any
application that wants semantics similar to the way socket listens work
(especially when bound to one of multiple IP addresses on a port) would have
to define its private data to accommodate this.
 
At the IB level, the contents of the private data are still opaque, even to the
CM.  The CM would only expose the ability to have it perform an initial triage
of requests by doing binary comparisons over regions of private data.  It
doesn't know (or need to know) what the data represents - it only cares about
finding a match (or not).  The CM doesn't define any sort of policy here, and I
don't think it should.  It's just bytes to the CM, and it's doing a blind
comparison without interpreting the contents.

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier

 From: Sean Hefty [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 11:18 AM
 
 For IB, using private data to listen on a specific IP address seems the
 easiest thing to do.  (Maybe we could do it by mapping different IP
 addresses to different service IDs, requiring registration and lookup?)

The problem with the SID method is that the SID namespace is smaller than the
IPV6 address name space.  There's no way to get every possible IPV6 address
represented by a 64-bit SID.  This further ignores the rules for SIDs in the IB
specification.  I think private data is the only way to do this properly.

 If the CM abstraction layer expected those values to be returned in the
 REP message, it could validate that the remote side is using the same
 protocol to ensure some degree of backwards compatibility.
 
 I don't know if it makes more sense to push private data checks into the
 actual CM or keep them in a CM abstraction layer.  My guess is that the
 former may be the easier implementation.

I think putting the checks in the CM makes the most sense, though it should be
done in a generic fashion.  A CM abstraction layer could then simply apply a
policy for private data usage - where in the private data it stores the IP
address information.

Layering it this way allows the private data compare to be used for things other
than IP addresses.  Add functionality without imposing policy.

- Fab


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 24, 2005 1:17 PM
 To: Tom Tucker
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API

 Tom Good point, although for iWARP it will work the way that you
 Tom expect.  For IB, admittedly it's more complex and would
 Tom require ATS. There seems to be significant reluctance around
 Tom ATS and I don't understand the issues. Can you provide a
 Tom quick synopsis?

 My resistance is that ATS is just complexity without any benefit.  

IMHO the benefit is that you have a transport independent addressing
mechanism -- albeit with some limitations as you've mentioned. In this
case, the vast majority of clients enjoy the benefit without suffering
the limitations.

 ... It
 doesn't provide additional security.  It doesn't solve the
 multi-homing problem we're talking about now.  

Whenever a single GID maps to multiple IP addresses, I agree, it is a
limitation. However, I don't believe that this is strictly necessary.

 ... Once you've thrown away
 information by turning your IP address into an IB GID, there's no
 magic way ATS can recreate that information and be psychic about which
 of the multi-homed IPs you actually meant.  

I agree, so don't do that. If you want it to work properly, then you
need to map GIDs to IP addresses. 

 ... So why not just put the IP
 addressing information into the CM private data, the way that the SDP
 protocol already does?

  - R.

Because it would be better to configure your network properly. Putting
IP addresses in private data is fundamentally insecure since any user
mode client can spoof the IP address. 


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty

Because it would be better to configure your network properly. Putting
IP addresses in private data is fundamentally insecure since any user
mode client can spoof the IP address.

A simple forward lookup could detect this.

- Sean


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:openib-general-
 [EMAIL PROTECTED] On Behalf Of Fab Tillier
 Sent: Wednesday, August 24, 2005 3:00 PM
 To: 'Roland Dreier'
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address translation API

  From: Roland Dreier [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, August 24, 2005 11:03 AM

  Fab Why can't the IPV field be ignored?  If a listen wants only
  Fab IPV4 addresses, it would specify a 16-byte compare buffer
  Fab with the first 12 bytes zero, the next 4 filled with the IPV4
  Fab address, and would set the offset to that of the hello
  Fab message's destination address (32).

  Yes, you're right for SDP.  I guess if we're comfortable mandating
  that all protocols put their source and destination IPs in the private
  data for the IB case, then this works.  Of course it's somewhat
  awkward to pass this information into the transport-neutral CM API but
  I think this can be worked around.

 I don't know if we need to mandate IP usage - it's up to the application.
 Any application that wants to have similar semantics to the way socket
 listens work (especially when bound to one of multiple IP addresses on a
 port) the application would have to define its private data to accommodate
 this.

The context of this discussion is a common API for iWARP/IB ULPs.
In that case they all use IP addresses (since that's the common
addressing).

If someone uses the IB-specific API under this abstraction level, he
can provide whatever data he wants to the CM.

Anyway, providing source/destination IPs in the CM private data is simple,
and we can come up with an IBTA extension blessing that data structure as a
general way to map IP-oriented protocols over IB (a 1-2 page draft at the
most). This way it can also address Caitlin's concerns regarding NFS and the
IETF (since now it's a transport-specific issue).

Yaron

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini



On Wed, 24 Aug 2005, Sean Hefty wrote:

 I guess that I'd like to clarify what the operation of a connect 
 call would do.  Would it be responsible for modifying the QP?  If 
 so, could such a call also allocate the QP?  Note that I'm not 
 advocating either of these, just trying to determine what the 
 behavior of the API would be.

If the connect call succeeds in establishing a connection, the ULP's 
QP should be ready for posting work requests. This simplifies the ULP 
considerably.

The API should not create the QP. That would create race conditions 
for certain protocols. For example, consider a protocol in which the 
first message was a send from the server to the client. To properly 
implement such a protocol, the client must post a receive work request 
before initiating a connection.
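
The ordering constraint above can be sketched with a toy model in C. This is
only an illustration: struct model_qp and the model_* helpers are hypothetical
and do not correspond to any verbs or CM call. The model simply treats an
incoming send that finds no posted receive as a failure.

```c
/* Toy model of a QP: just a receive-queue depth and a connection flag. */
struct model_qp {
    int posted_recvs;   /* receive work requests currently posted */
    int connected;      /* set once the connection is established */
};

/* Hypothetical stand-in for posting one receive work request. */
static void model_post_recv(struct model_qp *qp)
{
    qp->posted_recvs++;
}

/* Hypothetical stand-in for a successful connect call. */
static void model_connect(struct model_qp *qp)
{
    qp->connected = 1;
}

/*
 * Deliver one incoming send from the peer.
 * Returns 0 if a receive buffer was available, -1 otherwise.
 */
static int model_deliver_send(struct model_qp *qp)
{
    if (!qp->connected || qp->posted_recvs == 0)
        return -1;
    qp->posted_recvs--;
    return 0;
}
```

If the ULP owns the QP, it can call model_post_recv() before model_connect(),
so the server's first send always finds a buffer. If the connect call created
the QP itself, there would be no point at which the ULP could post that
receive ahead of the connection.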

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini



On Wed, 24 Aug 2005, Caitlin Bestler wrote:

 On 8/24/05, Fab Tillier [EMAIL PROTECTED] wrote:
  
  I think if all ULPs provide their source and destination IP in the 
  private data, you can eliminate the reverse lookup altogether.  A 
  simple forward lookup is all that's needed to validate that the 
  source GID in the REQ matches the reported source IP in the 
  private data.  The forward lookup could be done via ATS or via 
  ARP, but the CM doesn't need to care which method is used.
 
 That is not an option.
 
 The applications are expecting source/destination network addresses 
 that come from a network layer, not from the peer application. IP 
 has no problem meeting this requirement. This is an IB problem that 
 needs to be solved within the scope of IB without changing any ULPs.

I agree with Caitlin. The eventual solution cannot force protocol 
modifications in ULPs.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

James I agree with Caitlin. The eventual solution cannot force
James protocol modifications in ULPs.

Does this mean we're stuck with the current use of ATS in NFS-RDMA?
Surely there's still time to fix the protocol.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker


Isn't this inevitable regardless of whether or not we have a
transport-independent connection API? I thought ATS was required by NFS
for authentication/authorization. Sorry in advance if I'm confused ---
again.

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
 Sent: Wednesday, August 24, 2005 3:27 PM
 To: James Lentini
 Cc: Caitlin Bestler; openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API
 
 James I agree with Caitlin. The eventual solution cannot force
 James protocol modifications in ULPs.
 
 Does this mean we're stuck with the current use of ATS in NFS-RDMA?
 Surely there's still time to fix the protocol.
 
  - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Tom Isn't this inevitable regardless of whether or not we have a
Tom transport independent connection API. I thought ATS was
Tom required by NFS for authentication/authorization. Sorry in
Tom advance if I'm confused --- again.

Current NFS-RDMA code uses and relies on ATS.  However I hope that we
can fix the NFS-RDMA draft to get rid of this.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker


So the listening server takes the IP address from the private data, uses
AT to get the GID and then compares it to the GID in the connect
request? 
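
A minimal sketch of that check in C, under stated assumptions: struct
at_entry, at_forward_lookup() and req_source_is_valid() are hypothetical
names, the static table stands in for whatever ATS or ARP would return, and
the GID is modeled as a plain 16-byte array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address translation entry: one IPv4 address -> one GID. */
struct at_entry {
    uint32_t ip;        /* IPv4 address, host byte order for simplicity */
    uint8_t  gid[16];   /* 128-bit IB GID */
};

/* Stand-in for the table an ATS or ARP based lookup would consult. */
static const struct at_entry at_table[] = {
    { 0x0a0a0101u, { 0xfe, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01 } },
    { 0x0a0a0102u, { 0xfe, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x02 } },
};

/* Forward lookup: return the GID registered for ip, or NULL if unknown. */
static const uint8_t *at_forward_lookup(uint32_t ip)
{
    size_t i;

    for (i = 0; i < sizeof(at_table) / sizeof(at_table[0]); i++)
        if (at_table[i].ip == ip)
            return at_table[i].gid;
    return NULL;
}

/*
 * Return 1 if the source IP claimed in the private data resolves to the
 * source GID carried in the connect request, 0 otherwise.
 */
static int req_source_is_valid(uint32_t claimed_ip, const uint8_t req_gid[16])
{
    const uint8_t *gid = at_forward_lookup(claimed_ip);

    return gid != NULL && memcmp(gid, req_gid, 16) == 0;
}
```

A request that claims 10.10.1.1 but arrives from the GID registered for
10.10.1.2 is rejected, which is the spoofing case Sean mentions below.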

It feels to me like this private data thing is a case of the cure is
worse than the disease. As I understand it, we're trying to avoid the
following:

server:

dev = ib_get_device(10.10.1.1 /* src ip */, 0 /* dest ip */);

/* GID has IP addresses 10.10.1.1, 10.10.1.2 */
ib_listen(dev, 10.10.1.1 /* listen bind address */, 143 /* port */,
          10 /* backlog */);


client:

dev = ib_get_device(0 /* src wildcard */, 10.10.1.2 /* dest ip */);


ib_connect(dev, 0 /* src */, 10.10.1.2 /* dest */, 143 /* port */, ...);


The issue is that this connection will be established when the server
may only want to accept requests that are targeted to the 10.10.1.1
address.  I don't get why this is such a big deal. You can preclude this
behavior by simply keeping a one to one mapping between the IPv4
addresses and the GIDs using the existing protocols and without
mandating a private data format across *all* ulps and transports.

If I'm being painfully stupid...please feel free to tell me. 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 24, 2005 2:12 PM
 To: Tom Tucker; Roland Dreier
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address 
 translation API
 
 Because it would be better to configure your network properly. 
 Putting IP addresses in private data is fundamentally insecure since 
 any user mode client can spoof the IP address.
 
 A simple forward lookup could detect this.
 
 - Sean
 
 

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini



On Wed, 24 Aug 2005, Fab Tillier wrote:

  From: Roland Dreier [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, August 24, 2005 11:03 AM
  
  Fab Why can't the IPV field be ignored?  If a listen wants only
  Fab IPV4 addresses, it would specify a 16-byte compare buffer
  Fab with the first 12 bytes zero, the next 4 filled with the IPV4
  Fab address, and would set the offset to that of the hello
  Fab message's destination address (32).
  
  Yes, you're right for SDP.  I guess if we're comfortable mandating
  that all protocols put their source and destination IPs in the private
  data for the IB case, then this works.  Of course it's somewhat
  awkward to pass this information into the transport-neutral CM API but
  I think this can be worked around.
 
 I don't know if we need to mandate IP usage - it's up to the 
 application.  Any application that wants to have similar semantics 
 to the way socket listens work (especially when bound to one of 
 multiple IP addresses on a port) the application would have to 
 define its private data to accommodate this.

  At the IB level, the contents of the private data are still opaque, 
 even to the CM.  The CM would only expose the ability to have it 
 perform an initial triage of requests by doing binary comparisons 
 over regions of private data.  It doesn't know (or need to know) 
 what the data represents - it only cares about finding a match (or 
 not).  The CM doesn't define any sort of policy here, and I don't 
 think it should.  It's just bytes to the CM, and it's doing a blind 
 comparison without interpreting the contents.
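
The blind comparison described above might look roughly like the following C
sketch. The names struct listen_filter, private_data_matches() and
make_ipv4_filter() are hypothetical and not part of any existing CM
interface. The offset of 32 used in the usage note is simply the value quoted
in this thread for the hello message's destination address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listen-side filter: opaque bytes compared at an offset. */
struct listen_filter {
    size_t  offset;         /* where in the private data to compare */
    size_t  len;            /* number of bytes to compare */
    uint8_t pattern[16];    /* opaque to the CM, never interpreted */
};

/* Return 1 if the private data matches the filter, 0 otherwise. */
static int private_data_matches(const struct listen_filter *f,
                                const uint8_t *pdata, size_t pdata_len)
{
    /* Reject filters that would read past the end of the private data. */
    if (f->len > sizeof(f->pattern) || f->offset > pdata_len ||
        f->len > pdata_len - f->offset)
        return 0;
    return memcmp(pdata + f->offset, f->pattern, f->len) == 0;
}

/*
 * Build the 16-byte compare buffer described in the thread: the first
 * 12 bytes zero, the last 4 holding the IPv4 address bytes.
 */
static void make_ipv4_filter(struct listen_filter *f, size_t offset,
                             const uint8_t ipv4[4])
{
    memset(f, 0, sizeof(*f));
    f->offset = offset;
    f->len = 16;
    memcpy(f->pattern + 12, ipv4, 4);
}
```

A listener bound to 10.10.1.1 would call make_ipv4_filter(&f, 32, ip) once and
the CM would run private_data_matches() on each incoming request without
knowing that the bytes happen to be an IP address.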

You need to consider what makes sense for *both* ib and iwarp. Keep in 
mind that the correct API will allow a consumer to use ib and iwarp 
devices transparently. In other words, there will be one code path that 
supports both.

If we were to adopt your proposal, the consumer would need to perform 
unnecessary operations on iWARP.

A transport neutral client would be forced to put IP information into 
its CM private data on iWARP.

Likewise, a transport neutral server would be forced to pass a 
private data offset and binary blob to the listen API call on iWARP.

Neither of these makes sense. 

These API problems are secondary to the burden you would be placing on 
the protocols. As has been mentioned in a previous email, extending 
the current protocols to use this convention will require further 
standardization and in some cases may not be compatible with their 
current architecture.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Christoph Hellwig

On Wed, Aug 24, 2005 at 09:26:42AM -0700, Roland Dreier wrote:
 Tom I think I understand, but the purpose of specifying the IP
 Tom address in the listen is not to filter incoming connect
 Tom requests, but rather to determine which devices I listen
 Tom on. I think this works for the IB case as well. So the
 Tom utility of the IP address specified in the listen is only to
 Tom determine which devices the sid is created on. Does this make
 Tom sense or am I missing something?
 
 Well, that's not what I would expect.  Suppose I have a device
 configured with local addresses 192.168.11.12 and 192.168.98.99 and I

You never configure a device with local addresses.  IP addresses are
always a per-host attribute in Linux.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Christoph Hellwig

On Wed, Aug 24, 2005 at 11:14:08AM -0700, Caitlin Bestler wrote:
 The consensus when this issue was debated in the DAT Collaborative was
 that there was no transport neutral way to specify a set of addresses to
 listen on other than all addresses supported by this device.

That doesn't make any sense at all for iWarp as that uses IP addressing
which in Linux is host-, not device-based.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Tom The issue is that this connection will be established when
Tom the server may only want to accept requests that are
Tom targeted to the 10.10.1.1 address.  I don't get why this is
Tom such a big deal. You can preclude this behavior by simply
Tom keeping a one to one mapping between the IPv4 addresses and
Tom the GIDs using the existing protocols and without mandating a
Tom private data format across *all* ulps and transports.

Well, a few problems with what you say:

 - ATS does not help at all with the case of a multi-homed interface.
   Unless the remote system puts the IP it's trying to connect to
   somewhere in the connection request, there is no way to be psychic
   and recover this information.

 - Mandating ATS use is dictating protocol design just as much as
   requiring the CM private data to carry source and destination IP
   addresses.

 - It's not just preventing connections to the wrong local address.
   NFS-RDMA wants the remote source address (ie getpeername()) so that
   it can look it up in the exports list.

 - Saying that a given GID may only have a single IP address is
   definitely a case of the cure being worse than the disease.  I
   don't think we can forbid perfectly valid multi-homed
   configurations just because it's inconvenient for us to support them.
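
The multi-homed problem in the first point can be illustrated with a small C
sketch. It is a model only: struct ip_gid_map and reverse_lookup() are
hypothetical, and the table stands in for whatever an ATS reverse lookup
would consult.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ATS-style record: one IPv4 address registered for a GID. */
struct ip_gid_map {
    uint32_t ip;        /* IPv4 address, host byte order for simplicity */
    uint8_t  gid[16];   /* 128-bit IB GID */
};

/*
 * Reverse lookup: collect the IPs registered for gid.
 * Writes at most max addresses to out and returns the total number found.
 */
static size_t reverse_lookup(const struct ip_gid_map *tbl, size_t n,
                             const uint8_t gid[16],
                             uint32_t *out, size_t max)
{
    size_t i, found = 0;

    for (i = 0; i < n; i++) {
        if (memcmp(tbl[i].gid, gid, 16) != 0)
            continue;
        if (found < max)
            out[found] = tbl[i].ip;
        found++;
    }
    return found;
}
```

When two IP addresses are registered for the same GID, reverse_lookup()
returns 2 and nothing in the GID says which of them the peer was trying to
reach. That information has to travel in the connection request itself.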

By the way, as far as I can tell, there is NO formal documentation of
the NFS-RDMA wire protocol.  The current draft (draft-ietf-nfsv4-rpcrdma-01.txt)
simply says:

 This protocol is designed to function with equivalent semantics
 over all appropriate RDMA transports.  In its abstract form, this
 protocol does not implement RDMA directly. [...]  It therefore
 becomes a useful, implementable standard when mapped onto a
 specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].

 [...]

 In setting up a new RDMA connection, the first action by an RPC
 client will be to obtain a transport address for the server.  The
 mechanism used to obtain this address, and to open an RDMA
 connection is dependent on the type of RDMA transport, and outside
 the scope of this protocol.

So it seems perfectly reasonable and acceptable for the mapping of
NFS-RDMA onto IB to specify that the source and destination IP
addresses for an IB connection are placed in the CM private data.
This seems much easier than trying to turn ATS into an IETF standard.

 - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Roland Well, that's not what I would expect.  Suppose I have a
Roland device configured with local addresses 192.168.11.12 and
Roland 192.168.98.99 and I

Christoph You never configure a device with local addresses.  IP
Christoph addresses are always a per-host attribute in Linux.

I don't think this is really true.  In some ways Linux behaves as if
IP addresses are per-host (eg ARP responses can go out any interface)
but really IP addresses are attached to an interface.  Every struct
net_device has a struct in_device, and every struct in_device has a
list of struct in_ifaddrs for the device's IP addresses.
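
As a rough userspace model of that relationship (not the kernel definitions:
the model_* structs below are simplified stand-ins, and their field names are
only loosely modeled on the kernel's), the per-interface address list can be
walked like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one IPv4 address attached to an interface. */
struct model_in_ifaddr {
    struct model_in_ifaddr *next;   /* next address on the same interface */
    uint32_t local;                 /* IPv4 address, host byte order here */
};

/* Simplified stand-in for the per-interface IPv4 state. */
struct model_in_device {
    struct model_in_ifaddr *ifa_list;
};

/* Simplified stand-in for a network interface. */
struct model_net_device {
    const char *name;
    struct model_in_device *ip_ptr; /* NULL if no IPv4 state is attached */
};

/* Return 1 if addr is configured on dev, 0 otherwise. */
static int dev_has_addr(const struct model_net_device *dev, uint32_t addr)
{
    const struct model_in_ifaddr *ifa;

    if (dev->ip_ptr == NULL)
        return 0;
    for (ifa = dev->ip_ptr->ifa_list; ifa != NULL; ifa = ifa->next)
        if (ifa->local == addr)
            return 1;
    return 0;
}
```

In this model an interface carrying both 192.168.11.12 and 192.168.98.99
answers yes for either address, while a different interface on the same host
does not.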

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

Not if the host connects two disjoint networks and does not route
between them. Such a host should/may be configured to reject any
packet that arrives with a destination address that does not match
the expected destination address for the port it arrives upon. 

One of the things that iWARP vendors strive for is to ensure that
all such existing filtering/safety rules on accepting connections
are left 100% intact.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christoph Hellwig
Sent: Wednesday, August 24, 2005 2:00 PM
To: Caitlin Bestler
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

On Wed, Aug 24, 2005 at 11:14:08AM -0700, Caitlin Bestler wrote:
 The consensus when this issue was debated in the DAT Collaborative was 
 that there was no transport neutral way to specify a set of addresses 
 to listen on other than all addresses supported by this device.

That doesn't make any sense at all for iWarp as that uses IP addressing which
in Linux is host-, not device-based.


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

James You need to consider what makes sense for *both* ib and
James iwarp. Keep in mind that the correct API will allow a
James consumer to use ib and iwarp devices transparently. In
James other words there will be one code path that supports both.

James If we were to adopt your proposal, the consumer would need
James to perform unnecessary operations on iWARP.

No, I think we just need to realize that a perfectly transport neutral
protocol implementation is not achievable.  It's unfortunate that
kDAPL fooled people by hiding the details of the wire protocol under a
supposedly neutral API, but the fact is that mapping an abstract
RDMA transport to a real implementation will always involve arbitrary
transport-dependent choices.

To use an analogy, the IP layer is mostly insulated from the details
of the L2 transport it's using by the net_device abstraction.
However, there are a few things that require code like:

int arp_mc_map(u32 addr, u8 *haddr, struct net_device *dev, int dir)
{
	switch (dev->type) {
	case ARPHRD_ETHER:
	case ARPHRD_FDDI:
	case ARPHRD_IEEE802:
		ip_eth_mc_map(addr, haddr);
		return 0;
	case ARPHRD_IEEE802_TR:
		ip_tr_mc_map(addr, haddr);
		return 0;
	case ARPHRD_INFINIBAND:
		ip_ib_mc_map(addr, haddr);
		return 0;
	default:
		if (dir) {
			memcpy(haddr, dev->broadcast, dev->addr_len);
			return 0;
		}
	}
	return -EINVAL;
}

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Wednesday, August 24, 2005 2:03 PM
To: Tom Tucker
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

By the way, as far as I can tell, there is NO formal documentation of the
NFS-RDMA wire protocol.  The current draft (draft-ietf-nfsv4-rpcrdma-01.txt)
simply says:

 This protocol is designed to function with equivalent semantics
 over all appropriate RDMA transports.  In its abstract form, this
 protocol does not implement RDMA directly. [...]  It therefore
 becomes a useful, implementable standard when mapped onto a
 specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].

 [...]

 In setting up a new RDMA connection, the first action by an RPC
 client will be to obtain a transport address for the server.  The
 mechanism used to obtain this address, and to open an RDMA
 connection is dependent on the type of RDMA transport, and outside
 the scope of this protocol.

So it seems perfectly reasonable and acceptable for the mapping of NFS-RDMA
onto IB to specify that the source and destination IP addresses for an IB
connection are placed in the CM private data.
This seems much easier than trying to turn ATS into an IETF standard.

 - R.

caitlin
NFS over RDMA was intended to be implemented using DAPL in a transport
neutral way. Now having the transport layer *add* data before the
private data is legitimate for any specific transport. It would just
have to be defined independently of openib and linux.

Basically, any solution that allows NFS over RDMA to be coded with
the *same* set of kDAPL calls to listen/connect/accept/reject would
be compliant with the intent -- as long as the mapping to wire protocols
was straightforward and allowed non-kDAPL implementations. For example,
mapping the DAPL private data to the IETF MPA Request/Reply frame 
Private Data certainly qualifies as straightforward.
/caitlin


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

I think it would be more accurate to state that DAPL requires the 128-bit
IA Address space to be administratively subdivided so that each subnet
unambiguously translates to a specific IA reached network and that
translation of the IA Address into and from that network's wire protocol
is not visible to the DAT Consumer.

ATS is indeed *one* solution for doing so. Adding RARP to IPoIB would make
for another solution. Direct translation is also a valid solution for IPv6
compatible network IDs.

So with this wealth of options available, do you agree that there is no
reason to elevate any of these issues to being visible to a transport
neutral application? 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Wednesday, August 24, 2005 2:31 PM
To: James Lentini
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

Roland No, I think we just need to realize that a perfectly
Roland transport neutral protocol implementation is not
Roland achievable.  It's unfortunate that kDAPL fooled people by
Roland hiding the details of the wire protocol under a supposedly
Roland neutral API, but the fact is that mapping an abstract
Roland RDMA transport to a real implementation will always
Roland involve arbitrary transport-dependent choices.

Further: if we would be willing to say that transport-neutral protocols must
use a kDAPL wire protocol, then there's no problem in defining that wire
protocol to put the source and destination IP address somewhere in the CM
private data.  The current kDAPL wire protocol happens to use ATS to try
and achieve this (although it doesn't handle the multi-homed case), but that
is no more and no less of an arbitrary protocol design choice.

So in a nutshell, my objection to using ATS is that it is an arbitrary design
choice that doesn't work as well as other equally valid choices.

 - R.

Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Caitlin So with this wealth of options available, do you agree
Caitlin that there is no reason to elevate any of these issues to
Caitlin being visible to a transport neutral application?

No -- the fact that there are a wealth of options actually means that
picking one is an arbitrary choice we impose on transport neutral
implementations and is de facto mandating a wire protocol.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread James Lentini


On Wed, 24 Aug 2005, Yaron Haviv wrote:

  On Tue, 23 Aug 2005, Roland Dreier wrote:
  
   It would be possible to have another function like
   rdma_getpeername() that takes the transport address and returns
   a source IP address.  In the IB case this would do an ATS
   reverse lookup.  However, I hate this idea.  iSER already uses
   the CM private data to pass the source IP in the IB case,
   
  I know this is how IB SDP works, but I don't think iSER works this
  way.
 
  The code in the tree calls dat_ep_connect() with a NULL private 
  data pointer.
 
  There is an iSER HELLO message described in iser_header.h that contains
  IP addresses, but I'm not certain that this is part of the current
  protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused). 
 
 James,
 
 iSER doesn't mandate the source IP in general since it's doing a much
 stronger authentication during Login.
 However, we believe using a header similar to SDP's can help the passive
 side:
 a. know which destination IP was targeted (in a multi-homed environment)
 b. validate the source, for implementations that want to do so for some
 reason
 
 That's why the draft suggested adding the source/dst IP in the private
 data just like SDP does, 

Which draft contains this? I found

http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt

but the HELLO header in section 9.3 does not contain any IP address 
information.

 I believe it can be a good idea to use the same approach for 
 NFS/RDMA and eliminate the need for reverse ATS lookup (which may have 
 some conflicts when multiple IPs exist per node). We may just use 
 the SDP hello header as is with unused fields zeroed. This will allow 
 all ULPs to use the same mechanism.

NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is 
that this could not be easily accommodated in the current standards 
for that reason.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Caitlin Bestler

The requirement is solely that the System Administrators
for each host directly attached to Network X agree on the
basic addressing characteristics for Network X.

This onerous challenge is successfully overcome on every
IP subnet in the world every day for such details as 
what the subnet is, what the mask is, etc. Further, 
two adjoining subnets won't be able to talk unless
their administrators have arranged for them to agree
on what their network identifiers are, etc.

For the specific question it is even less of a
problem than theory suggests. A rule such as non
IPv4 subnets are direct translated while IPv4 subnets
use IPv4 is actually quite simple to implement.
That could even be extended to allow *some* IPv6
subnets to be translated so that multiple IPv6
aliases for a single GID could be identified
(that is, if anyone has a need for such a thing).
 

-Original Message-
From: Roland Dreier [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 24, 2005 2:45 PM
To: Caitlin Bestler
Cc: Roland Dreier; James Lentini; openib-general@openib.org
Subject: Re: [openib-general] RDMA connection and address translation API

Caitlin So with this wealth of options available, do you agree
Caitlin that there is no reason to elevate any of these issues to
Caitlin being visible to a transport neutral application?

No -- the fact that there are a wealth of options actually means that picking
one is an arbitrary choice we impose on transport neutral implementations and
is de facto mandating a wire protocol.

 - R.



Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

James NFS/RDMA is not specific to iWARP or InfiniBand. My
James understanding is that this could not be easily accommodated
James in the current standards for that reason.

Yes, it seems that there will need to be some additional NFS/RDMA
drafts describing the iWARP and IB wire protocols before the standard
is complete.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Tom Tucker

 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 24, 2005 4:03 PM
 To: Tom Tucker
 Cc: Sean Hefty; Roland Dreier; openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address 
 translation API

 Tom The issue is that this connection will be established when
 Tom the server may only want to accept requests that are
 Tom targeted to the 10.10.1.1 address.  I don't get why this is
 Tom such a big deal. You can preclude this behavior by simply
 Tom keeping a one to one mapping between the IPv4 addresses and
 Tom the GIDs using the existing protocols and without mandating a
 Tom private data format across *all* ulps and transports.

 Well, a few problems with what you say:

  - ATS does not help at all with the case of a multi-homed interface.
Unless the remote system puts the IP it's trying to connect to
somewhere in the connection request, there is no way to be psychic
and recover this information.

I thought a single HCA could have multiple GIDs. All 
I'm advocating is that a correct multi-homed configuration 
has a one-to-one mapping between its IP addresses and its GIDs.

  - Mandating ATS use is dictating protocol design just as much as
requiring the CM private data to carry source and destination IP
addresses.

I think ATS dictates the kinds of authentication that can be 
done by the server over an IB transport, but not the protocol 
design. Certainly the private data can have additional 
authentication data (which I think is what you're advocating). 

  - It's not just preventing connections to the wrong local address.
NFS-RDMA wants the remote source address (ie getpeername()) so that
it can look it up in the exports list.

Agreed. But you could also get rid of ATS by allowing GIDs to 
be specified in the exports file and then treating them like 
IPv6 addresses for the purpose of subnet comparisons.
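
One way to picture that comparison is the C sketch below. It assumes a
hypothetical exports entry made of a 16-byte prefix plus a prefix length,
with gid_in_prefix() as an illustrative helper name. No such exports syntax
exists today as far as I know.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the first prefix_len bits of gid equal those of prefix,
 * the same way an IPv6 address is tested against a subnet.
 */
static int gid_in_prefix(const uint8_t gid[16], const uint8_t prefix[16],
                         unsigned prefix_len)
{
    unsigned full = prefix_len / 8;     /* whole bytes to compare */
    unsigned rem = prefix_len % 8;      /* leftover bits in the next byte */

    if (prefix_len > 128)
        return 0;
    if (memcmp(gid, prefix, full) != 0)
        return 0;
    if (rem != 0) {
        uint8_t mask = (uint8_t)(0xff << (8 - rem));

        if ((gid[full] & mask) != (prefix[full] & mask))
            return 0;
    }
    return 1;
}
```

With this, an export restricted to a given 64-bit subnet prefix would admit
any client whose source GID shares that prefix, with no IP lookup involved.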

  - Saying that a given GID may only have a single IP address is
definitely a case of the cure being worse than the disease.  I
don't think we can forbid perfectly valid multi-homed
configurations just because it's inconvenient for us to support them.

I think our different perspectives come from what 
we consider to be perfectly valid multi-homed configurations. 
One approach advocates overloading private data, the other 
advocates overloading address assignments. 

My approach suffers from the fact that multiple IP addresses
for the same GID are just aliases that are interchangeable and at 
the remote end indistinguishable. The private data approach 
suffers from the need to mandate private data formats across 
all ulps and transports.

I prefer the former limitation/cost. 

 By the way, as far as I can tell, there is NO formal 
 documentation of the NFS-RDMA wire protocol.  The current 
 draft (draft-ietf-nfsv4-rpcrdma-01.txt) simply says:

  This protocol is designed to function with equivalent semantics
  over all appropriate RDMA transports.  In its abstract form, this
  protocol does not implement RDMA directly. [...]  It therefore
  becomes a useful, implementable standard when mapped onto a
  specific RDMA transport, such as iWARP [RDDP] or Infiniband [IB].

  [...]

  In setting up a new RDMA connection, the first action by an RPC
  client will be to obtain a transport address for the server.  The
  mechanism used to obtain this address, and to open an RDMA
  connection is dependent on the type of RDMA transport, and outside
  the scope of this protocol.

 So it seems perfectly reasonable and acceptable for the 
 mapping of NFS-RDMA onto IB to specify that the source and 
 destination IP addresses for an IB connection are placed in 
 the CM private data.
 This seems much easier than trying to turn ATS into an IETF standard.

  - R.

I think there is a way to get rid of ATS as I described above without 
overloading the private data.

Phew -- I'm exhausted. I'm going to go write code ;-)


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Roland No, I think we just need to realize that a perfectly
Roland transport neutral protocol implementation is not
Roland achievable.  It's unfortunate that kDAPL fooled people by
Roland hiding the details of the wire protocol under a supposedly
Roland neutral API, but the fact is that mapping an abstract
Roland RDMA transport to a real implementation will always
Roland involve arbitrary transport-dependent choices.

Further: if we would be willing to say that transport-neutral
protocols must use a kDAPL wire protocol, then there's no problem in
defining that wire protocol to put the source and destination IP
address somewhere in the CM private data.  The current kDAPL wire
protocol happens to use ATS to try and achieve this (although it
doesn't handle the multi-homed case), but that is no more and no less
of an arbitrary protocol design choice.

So in a nutshell, my objection to using ATS is that it is an arbitrary
design choice that doesn't work as well as other equally valid choices.

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

 -Original Message-
 From: James Lentini [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 5:51 PM
 To: Yaron Haviv
 Cc: Roland Dreier; openib-general@openib.org
 Subject: RE: [openib-general] RDMA connection and address translation API

 Which draft contains this? I found

 http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt

James,

You should look at :
http://www.haifa.il.ibm.com/satran/ips/draft-ietf-ips-iser-05-candidate.txt

The 05 rev really adds all the InfiniBand-related stuff.
You can see how the association between IB and IP is done using IPoIB.

The current implementation may not use the private data field (since it's
not critical/mandatory), but the intention is to add it to address
multi-homed hosts. We would like to push such a definition into IBTA so every
IP-oriented ULP can use it. Several people expressed interest in such a
definition; this can also support NFS/RDMA or any other IP-based ULP.

 but the HELLO header in section 9.3 does not contain any IP address
 information.

  I believe it can be a good idea to use the same approach for
  NFS/RDMA and eliminate the need for reverse ATS lookup (there may be
  some conflicts when multiple IPs exist per node). We may just use
  the SDP hello header as is, with unused fields zeroed. This will allow
  all ULPs to use the same mechanism.

 NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is
 that this could not be easily accommodated in the current standards
 for that reason.

I'm not sure why that is the case. If we add an IBTA definition of a CM
exchange for IP-based ULPs (i.e. send src/dst IP and optionally ports),
you can have an NFS/RDMA spec that doesn't need any IB/iWarp-specific
definitions, since the differences are pushed down to the IBTA.

In the case of NFS/RDMA over another (non-IB or iWarp) transport, you can
specify that providing the IP addressing is the responsibility of the
underlying transport.

Yaron


Re: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Roland Dreier

Yaron The current implementation may not use the private data
Yaron field (since its not critical/mandatory) but the intention
Yaron is to add it to address multi homed hosts, we would like to
Yaron push such a definition into IBTA so every IP oriented ULP
Yaron can use it, several people expressed interest in such a
Yaron definition, this can also support NFS/RDMA or any other IP
Yaron based ULP.

Strange as it may seem, I agree completely with Yaron ;)

It would make perfect sense to take a couple of the reserved bits in
the CM REQ format and turn them into an IP address present field (a
couple of bits so we can distinguish between v4 and v6).  When this
field is set, then the first (or last, or whatever) 32 bytes of the
private data would hold the source and destination IP address.
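
A minimal sketch of the layout described above, assuming the first 32 bytes of the private data carry a 16-byte source slot followed by a 16-byte destination slot. The enum values, the name cm_put_ip_pair, and the choice to right-align an IPv4 address in a zeroed slot are illustrative assumptions; none of them comes from a CM specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for the proposed "IP address present" field. */
enum cm_ip_present { CM_IP_NONE = 0, CM_IP_V4 = 1, CM_IP_V6 = 2 };

#define CM_IP_SLOT  16               /* one 128-bit address slot */
#define CM_IP_BYTES (2 * CM_IP_SLOT) /* source slot + destination slot */

/*
 * Write the source and destination addresses into the first 32 bytes of
 * the private data.  src/dst point to 4 bytes (v4) or 16 bytes (v6) in
 * network byte order.  Returns 0 on success, -1 on bad input.
 */
int cm_put_ip_pair(uint8_t *pdata, size_t pdata_len, enum cm_ip_present ver,
                   const uint8_t *src, const uint8_t *dst)
{
    size_t alen;

    assert(pdata != NULL && src != NULL && dst != NULL);
    if (ver == CM_IP_V4)
        alen = 4;
    else if (ver == CM_IP_V6)
        alen = 16;
    else
        return -1;
    if (pdata_len < CM_IP_BYTES)
        return -1;

    memset(pdata, 0, CM_IP_BYTES);
    /* Assumption: a v4 address is right-aligned in its zeroed slot. */
    memcpy(pdata + CM_IP_SLOT - alen, src, alen);
    memcpy(pdata + CM_IP_BYTES - alen, dst, alen);
    return 0;
}
```

The receiver would read the present field first and then interpret the two slots accordingly; the rest of the private data stays available to the ULP.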

Having this standardized also gives us the ability to deal with the
concerns around connections initiated in userspace.  The kernel proxy
for the user CM can make sure that any REQ sent with the IP address
present field set actually carries an IP assigned to the local system.
Remote systems would still need to treat CM messages from QPs other
than QP 1 as untrusted.

Of course for real security some stronger authentication is needed in
any case (even in the iWARP case the source IP can't be trusted; an
attacker could DoS the real owner of the IP, flood the switch's MAC
tables so it becomes a hub, and then take over any IP it wants).

The only unfortunate thing about all this is that the SDP Hello
message format is already frozen, and it seems a little too
specialized for generic use (eg we don't want a Max Zcopy
Advertisements field).

Yaron, has anyone raised all this in the IBTA WG?

 - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier

 From: James Lentini [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 1:58 PM

 On Wed, 24 Aug 2005, Fab Tillier wrote:

   From: Roland Dreier [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, August 24, 2005 11:03 AM

   Fab Why can't the IPV field be ignored?  If a listen wants only
   Fab IPV4 addresses, it would specify a 16-byte compare buffer
   Fab with the first 12 bytes zero, the next 4 filled with the IPV4
   Fab address, and would set the offset to that of the hello
   Fab message's destination address (32).

   Yes, you're right for SDP.  I guess if we're comfortable mandating
   that all protocols put their source and destination IPs in the private
   data for the IB case, then this works.  Of course it's somewhat
   awkward to pass this information into the transport-neutral CM API but
   I think this can be worked around.

  I don't know if we need to mandate IP usage - it's up to the
  application.  Any application that wants semantics similar to the
  way socket listens work (especially when bound to one of multiple
  IP addresses on a port) would have to define its private data to
  accommodate this.

   At the IB level, the contents of the private data are still opaque,
  even to the CM.  The CM would only expose the ability to have it
  perform an initial triage of requests by doing binary comparisons
  over regions of private data.  It doesn't know (or need to know)
  what the data represents - it only cares about finding a match (or
  not).  The CM doesn't define any sort of policy here, and I don't
  think it should.  It's just bytes to the CM, and it's doing a blind
  comparison without interpreting the contents.
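
The blind comparison described above can be sketched as follows. This is only an illustration: the struct pdata_filter type and both function names are hypothetical, and the 16-byte pattern follows the IPv4 example from earlier in the thread (12 zero bytes followed by the 4 address bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listen filter: compare `len` bytes at `offset` in the private data. */
struct pdata_filter {
    size_t  offset;
    size_t  len;
    uint8_t pattern[16];
};

/* Byte-for-byte match; the contents are never interpreted. */
bool pdata_filter_match(const struct pdata_filter *f,
                        const uint8_t *pdata, size_t pdata_len)
{
    assert(f != NULL && pdata != NULL);
    if (f->len > sizeof(f->pattern))
        return false;
    if (f->offset > pdata_len || f->len > pdata_len - f->offset)
        return false;                 /* region falls outside the private data */
    return memcmp(pdata + f->offset, f->pattern, f->len) == 0;
}

/* Filter for an IPv4 listen: 12 zero bytes, then the address in network order. */
struct pdata_filter pdata_filter_ipv4(size_t dst_offset, const uint8_t addr[4])
{
    struct pdata_filter f = { .offset = dst_offset, .len = 16 };

    memcpy(f.pattern + 12, addr, 4);
    return f;
}
```

A listener bound to one of several local addresses would register one such filter per address, and the CM would hand each incoming REQ to the first listener whose filter matches.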

 You need to consider what makes sense for *both* ib and iwarp. Keep in
 mind that the correct API will allow a consumer to use ib and iwarp
 devices transparently. In other words there will be one code path that
 supports both.

I believe using the private data makes the most sense from the IB perspective.
One could even argue that it is the only way to provide positive getpeername
functionality.  Use of the IB private data does not require identical use of
private data in other technologies.

 If we were to adopt your proposal, the consumer would need to perform
 unnecessary operations on iWARP.

It doesn't have to impact the client if there's some intermediate abstraction to
isolate the client from the IB CM details (including private data use).

 A transport neutral client would be forced to put IP information into
 its CM private data on iWARP.

 Likewise, a transport neutral server would be forced to pass a
 private data offset and binary blob to the listen API call on iWARP.

 Neither of these makes sense.

A higher-level CM abstraction could implement the policy of private data use
when running on IB without the client's involvement.  The end result still is
that you end up with a wire protocol that needs to be documented so that someone
without that exact CM abstraction knows where and how to format the private data
as well as how to interpret it.  If the IBTA defines something like this, all
these issues go away.  I don't know if the IBTA can define this without
affecting existing protocols like SDP and iSER that already define how to
encapsulate the source and destination information in the private data.

Using the private data, either by the client or some IB-specific CM abstraction,
will remove the need for any reverse lookups.  A forward lookup that checks the
incoming source GID against the source IP in the private data can validate the IP
address.  Performing a forward lookup via ARP is going to be a lot faster than
ATS if the ARP entry already exists.  On large fabrics, ARP is also going to
scale better since there's not one single entity responsible for responding to
every node's requests. 
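
As a rough sketch of that forward-lookup check: resolve the IP claimed in the private data to a GID and compare it with the GID the request actually arrived from. The table below is a hypothetical stand-in for whatever neighbour cache (for example ARP over IPoIB) performs the lookup, and all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical neighbour-cache entry: an IP address and the GID it resolves to. */
struct neigh_entry {
    uint8_t ip[16];
    uint8_t gid[16];
};

/* Forward lookup (IP -> GID); returns NULL when the IP is not in the table. */
const uint8_t *neigh_lookup(const struct neigh_entry *tbl, size_t n,
                            const uint8_t ip[16])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(tbl[i].ip, ip, 16) == 0)
            return tbl[i].gid;
    return NULL;
}

/* Accept the claimed source IP only if it resolves to the GID the REQ came from. */
bool peer_ip_matches_gid(const struct neigh_entry *tbl, size_t n,
                         const uint8_t claimed_ip[16],
                         const uint8_t src_gid[16])
{
    const uint8_t *gid;

    assert(claimed_ip != NULL && src_gid != NULL);
    gid = neigh_lookup(tbl, n, claimed_ip);
    return gid != NULL && memcmp(gid, src_gid, 16) == 0;
}
```

Because several IP addresses may map to the same GID, the check runs from IP to GID; the reverse direction is the ambiguous one.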

 These API problems are secondary to the burden you would be placing on
 the protocols. As has been mentioned in a previous email, extending
 the current protocols to use this convention will require further
 standardization and in some cases may not be compatible with their
 current architecture.

I think biting the bullet now on establishing these standards for applications
using IP addressing over IB, whether in the IBTA or in each application, is
going to give us the best long term result.

- Fab


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 24, 2005 7:29 PM
 To: Yaron Haviv
 Cc: James Lentini; Roland Dreier; openib-general@openib.org
 Subject: Re: [openib-general] RDMA connection and address translation API
 
 
 Yaron, has anyone raised all this in the IBTA WG?
 

I raised it about a year ago, but didn't really follow up on it.
At the time IBTA was also busy with other, more urgent stuff (verb ext..).
We are working with a few key IBTA members to resurface it, together with
the need for an abstract CM.

See the following text that was proposed (a year ago, as is).
It is slightly different from your proposal but can be altered if needed.

It basically uses the SDP header and marks one of the fields (FlowC) with 01
to indicate it's not SDP; this way even SDP can use it.
It also covers a nice idea raised by MS and SUN to extend SDP to accept
PUT and GET operations for RDMA, so you can get a BSD-like API with a few
additional APIs rather than a totally new API like DAPL.


Establishing TCP/iWarp-like connections over InfiniBand
=======================================================

 In order to emulate an iWarp connection, it is required to open an
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to the RDMA mode; this requires exchanging receiver buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the
 non-relevant SDP fields are Reserved.

 iWarp emulation CM Request (Hello) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|                              len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|                          DesRemRcvSz                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                          LocalRcvSz                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32|          Local Port           |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|                        Src IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|                        Src IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|                        Src IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|                        Src IP ( 31-00)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 52|                        Dst IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 56|                        Dst IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 60|                        Dst IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 64|                        Dst IP ( 31-00)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Figure 1 CM Hello private data structure
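
One way to picture Figure 1 in code is a 64-byte C struct. This is a sketch under assumptions: the field widths are taken from the SDP Hello header that the proposal says it reuses (8-bit MID, 16-bit bufs, 4-bit version fields), MajVer and IPVer are assumed to occupy the high nibble of their bytes, multi-byte fields are assumed big-endian on the wire, and every identifier is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the Hello private data in Figure 1. */
struct iwarp_emul_hello {
    uint8_t  mid;
    uint8_t  rsvd0;
    uint16_t bufs;            /* big-endian on the wire (assumed) */
    uint32_t len;
    uint32_t rsvd1;
    uint32_t rsvd2;
    uint8_t  majver_minver;   /* MajVer in the high nibble, MinVer in the low */
    uint8_t  ipver_flowc;     /* IPVer in the high nibble, FlowC in the low */
    uint16_t rsvd3;
    uint32_t des_rem_rcv_sz;
    uint32_t local_rcv_sz;
    uint16_t local_port;
    uint16_t rsvd4;
    uint8_t  src_ip[16];
    uint8_t  dst_ip[16];
};

/* The row labels in Figure 1 are end offsets, so Src IP starts at byte 32. */
static_assert(sizeof(struct iwarp_emul_hello) == 64, "Hello header is 64 bytes");
static_assert(offsetof(struct iwarp_emul_hello, src_ip) == 32, "Src IP offset");
static_assert(offsetof(struct iwarp_emul_hello, dst_ip) == 48, "Dst IP offset");

/* Pack two 4-bit fields into one byte, first field in the high nibble. */
uint8_t hello_pack_nibbles(uint8_t hi, uint8_t lo)
{
    return (uint8_t)(((hi & 0x0f) << 4) | (lo & 0x0f));
}
```

With this packing, an IPv4 request marked as non-SDP (FlowC = 01) would carry hello_pack_nibbles(4, 1) in the ipver_flowc byte.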
  

 iWarp emulation CM Response (HelloReply) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|  len

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Sean Hefty

With this in mind, I believe that the connection API needs to be
something more like the following:

rdma_resolve_address():
    inputs: dest IP address, qos, npaths,
            done callback, opaque context
    done callback params: status, local RDMA device,
            RDMA transport address, context
...
rdma_connect():
    inputs: local QP, RDMA transport address, destination service,
            private data, timeout, event callback, opaque context

Have we agreed that this is the functionality that we should be aiming towards?

rdma_resolve_address(...);
/* wait for resolution */
ib_create_qp(...); /* use device pointer we got from rdma_resolve_address() */

We need to insert in here: 

ib_modify_qp(...);  /* somehow uses address resolution... */
ib_post_recvs(...);

rdma_connect(...); /* pass transport address we got from rdma_resolve_address() */
/* wait for connection to finish... */

Another possibility could be to add a list of receives to rdma_connect().

- Sean
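
The call ordering discussed above (resolve the address, create the QP on the returned device, move it to INIT, post receives, then connect) can be sketched as a small state tracker. Nothing here is an existing API: the types and conn_* functions are hypothetical, and restricting receive posting to the window between INIT and connect is a simplification made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one connection attempt; not a real API. */
enum conn_step { STEP_NEW, STEP_RESOLVED, STEP_QP_INIT, STEP_CONNECTING };

struct conn {
    enum conn_step step;
    void *device;        /* local RDMA device reported by address resolution */
    int recvs_posted;
};

/* Called from the address resolution "done" callback. */
int conn_resolved(struct conn *c, int status, void *device)
{
    assert(c != NULL);
    if (c->step != STEP_NEW || status != 0)
        return -1;
    c->device = device;
    c->step = STEP_RESOLVED;
    return 0;
}

/* Stands in for creating the QP on the resolved device and moving it to INIT. */
int conn_qp_init(struct conn *c)
{
    if (c->step != STEP_RESOLVED)
        return -1;
    c->step = STEP_QP_INIT;
    return 0;
}

/* In this sketch, receives are accepted only between INIT and connect. */
int conn_post_recv(struct conn *c)
{
    if (c->step != STEP_QP_INIT)
        return -1;
    c->recvs_posted++;
    return 0;
}

/* The connect request is allowed only once the QP is in INIT. */
int conn_connect(struct conn *c)
{
    if (c->step != STEP_QP_INIT)
        return -1;
    c->step = STEP_CONNECTING;
    return 0;
}
```

Folding the QP creation and the INIT transition into one step mirrors the rdma_create_qp() idea raised elsewhere in the thread; passing a list of receives to the connect call would instead merge the last two steps.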


