Re: [openib-general] [RFC] IB address translation using ARP

2005-10-17 Thread Christoph Hellwig
On Fri, Oct 14, 2005 at 08:38:18AM -0700, Caitlin Bestler wrote:
 I can't think of a better example of something that is truly
 brain dead than an application *written* to use Sockets Direct
 Protocol.

I think you confuse specifically written to support with specifically
written to support only.  And yes, in the days of getaddrinfo, writing
an application specific to a protocol instead of to IP+Stream or Dgram
semantics is a pretty bad idea.
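A minimal C sketch of the getaddrinfo()-based style described above, using only POSIX calls: the client asks for stream semantics and lets the resolver choose the address family, trying each returned address until one connects. The function name connect_stream is illustrative, not taken from any library discussed in this thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a stream connection to host:port without naming a protocol or an
 * address family: getaddrinfo() decides (IPv4, IPv6, ...), the caller only
 * asks for SOCK_STREAM semantics.  Returns a connected fd, or -1.
 */
int connect_stream(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* any family the resolver offers */
    hints.ai_socktype = SOCK_STREAM; /* stream semantics, not "TCP" */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each candidate address in order until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* success: keep this fd */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Nothing in the function mentions TCP or AF_INET, so the same code keeps working if the resolver starts returning other families.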

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-17 Thread Christoph Hellwig
On Fri, Oct 14, 2005 at 03:39:53PM -0700, Grant Grundler wrote:
 Open source does NOT ignore legacy applications:
 1) Anyone can continue to update and run on the linux kernel version
they have source code for if they don't want to (or can't) change
the application or newer kernels break the ABI.
Many people are still very happy using 2.4 linux kernels.

Actually if your application plays by the rules and breaks with a new
kernel that's a major bug.  We definitely guarantee that applications
that use the defined syscall interface work on new kernels indefinitely.
That doesn't mean they will get all the new features, though.



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-17 Thread Tom Tucker
On Mon, 2005-10-17 at 10:11 -0700, Sean Hefty wrote:
 I think the current CMA could probably be better.
 
 Can you please clarify what you would change to the CMA API or implementation?
 I would rather get changes in sooner, rather than waiting until it has been
 pushed upstream.

At first blush, the API looks good to me. The kinds of changes I was
pondering were related to hiding some of the routing issues. For
example, if the app. doesn't bind the rdma_cm_id prior to calling
rdma_connect, the code will lookup and use the default route instead of
returning -EINVAL.

These kinds of things allow the app to use bind if it wants control,
or not use bind (and simplify the code) if it is happy to take the
defaults.

I was planning to do a patch and submit it for review, but if you'd
prefer talking through it -- that's fine too.
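A small C model of the proposed behavior, under stated assumptions: every struct and function name here is hypothetical and not part of the CMA. An explicit bind wins; without one, the proposal falls back to the default route instead of failing with -EINVAL. (Later librdmacm releases accept a NULL source address in rdma_resolve_addr() and pick one from the routing tables, which is close to this idea.)

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are part of the CMA. */
struct model_cm_id {
    int      bound;         /* did the app bind the id explicitly? */
    uint32_t src_addr;      /* bound source address, valid if bound */
};

struct model_default_route {
    int      present;       /* is a default route configured? */
    uint32_t src_addr;      /* preferred source address for that route */
};

/*
 * Choose the source address for a connect.  With use_default == 0 this
 * mimics "unbound id => -EINVAL"; with use_default == 1 it mimics the
 * proposed "unbound id => take the default route".
 */
int pick_source(const struct model_cm_id *id,
                const struct model_default_route *dflt,
                int use_default, uint32_t *src_out)
{
    if (id->bound) {                /* app wants control: honor its bind */
        *src_out = id->src_addr;
        return 0;
    }
    if (use_default && dflt->present) {
        *src_out = dflt->src_addr;  /* app is happy with the defaults */
        return 0;
    }
    return -EINVAL;
}
```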


 And to be clear, the current interface is not attempting to abstract QPs, CQs,
 or other hardware resources.
 

Absolutely. 
 - Sean
 


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-17 Thread Sean Hefty

Tom Tucker wrote:

At first blush, the API looks good to me. The kinds of changes I was
pondering were related to hiding some of the routing issues. For
example, if the app. doesn't bind the rdma_cm_id prior to calling
rdma_connect, the code will lookup and use the default route instead of
returning -EINVAL.


From an app's perspective, they need to perform the following on the client 
side:

rdma_create_id();
rdma_resolve_addr();
rdma_create_qp();
rdma_resolve_route();
rdma_connect();

Before rdma_resolve_addr() is called, the rdma_cm_id is not associated with a 
local device.  So, rdma_resolve_addr() must be called before a QP can be allocated.


I had planned on making rdma_resolve_route() optional, but it complicates device 
removal handling.  It can still be done, but only saves the client about 2 lines 
of code.


Note that both rdma_resolve_addr() and rdma_resolve_route() are asynchronous 
for IB.
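The ordering constraints above can be modeled in a few lines of C. This is a hypothetical sketch, not the CMA: the model_* names are invented, and because the real rdma_resolve_addr() and rdma_resolve_route() complete asynchronously (later librdmacm versions report RDMA_CM_EVENT_ADDR_RESOLVED and RDMA_CM_EVENT_ROUTE_RESOLVED), each step is collapsed into a flag.

```c
#include <errno.h>

/* Hypothetical state for one connection id; not the real rdma_cm_id. */
struct model_id {
    int has_device;   /* set once address resolution binds a local device */
    int has_qp;
    int has_route;
    int connected;
};

int model_resolve_addr(struct model_id *id)
{
    id->has_device = 1;      /* real call: wait for the address-resolved event */
    return 0;
}

int model_create_qp(struct model_id *id)
{
    if (!id->has_device)
        return -EINVAL;      /* no device yet, so nowhere to allocate a QP */
    id->has_qp = 1;
    return 0;
}

int model_resolve_route(struct model_id *id)
{
    if (!id->has_device)
        return -EINVAL;
    id->has_route = 1;       /* real call: wait for the route-resolved event */
    return 0;
}

int model_connect(struct model_id *id)
{
    if (!id->has_qp || !id->has_route)
        return -EINVAL;      /* needs both a QP and a resolved route */
    id->connected = 1;
    return 0;
}
```

In this model the QP and route steps only depend on address resolution, not on each other, so their relative order is a convention rather than a hard requirement.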


I was planning to do a patch and submit it for review, but if you'd
prefer talking through it -- that's fine 


Either will work.  I can accept a patch or modify the CMA directly if it's a 
fairly straightforward change.


- Sean


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-16 Thread Tom Tucker

At 50,000 feet, I don't think anyone disagrees with these lines of
reasoning, however, there are some practical design issues that don't
yield to the architectural rubric of design by rule of least
astonishment. 

It may be more complex than it needs to be; so propose an API, submit a
patch. I think the current CMA could probably be better. This will give
everyone something concrete to consider. 

IMHO, at this point, these philosophical arguments serve only to consume
network bandwidth.

On Thu, 2005-10-13 at 13:55 -0400, Caitlin Bestler wrote:
 I agree with Mike's analysis. But I'd also like to point out that even
 when source compatibility is not a requirement, source familiarity
 is. That is, even when recoding is feasible the API should only
 introduce new concepts as required to improve efficiency. The
 shift from socket model to QP/CQ is challenging enough as is.
 It's also where the benefit is. Changing how the application
 requests and accepts connections is just piling on more things
 for the developers to learn onto an already very full plate, and
 with nowhere near the same benefit.
  
 The simple, IP/DNS-centric methods that Mike outlined will
 work on either iWARP or IB, and are very easily understood
 by those familiar with existing sockets/IP network development.

 The more complex models provide minor enhancements for
 very corner cases at the very heavy concept of requiring 
 the developer to understand a lot more about network topology.
  

Per the above, I don't view these issues as minor enhancements or
corner cases, these are features of the network software layer that
most applications rely on. 




Re: [openib-general] [RFC] IB address translation using ARP

2005-10-14 Thread Grant Grundler
On Fri, Oct 14, 2005 at 08:38:18AM -0700, Caitlin Bestler wrote:
  That's not how SDP is going to work on Linux, though.  We're
  not into your crude hacks to let broken applications work
  with new technology business.  Applications will have to use
  SDP directly or via getaddrinfo and we will never put in a
  broken sockets switch.
 
 I can't think of a better example of something that is truly
 brain dead than an application *written* to use Sockets Direct
 Protocol.

That wasn't hch's point. His point was that the kernel would never make
SDP transparent to user space. But using LD_PRELOAD and libsdp.so,
SDP can be used transparently by the application.
It's the user's (ok, maybe sysadmin's) choice.

...
 So if you aren't preserving the sockets API what is
 the point in using the protocol?

Yes, the intent is to let the application continue using the sockets API.
But if sysadmin is asking for AF_INET or AF_INET6, then they want TCP/IP
*plus* netfilter and other features in the linux kernel. Not something else.
If the sysadmin decides they don't need netfilter/tcp, then they can
use LD_PRELOAD as noted above.
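The split described above (the kernel's AF_INET always means TCP/IP; a preloaded library may opt a process into SDP) comes down to a per-call policy. A hedged C sketch of that policy follows; sdp_pick_domain is hypothetical, and sdp_family is passed in because the family number an SDP provider registers is an assumption here. A real shim would define its own socket(), call a function like this, and forward to the libc socket() obtained with dlsym(RTLD_NEXT, "socket").

```c
#include <sys/socket.h>

/*
 * Decide which address family a preloaded shim should pass to the real
 * socket().  Only IPv4 stream sockets are redirected, and only when the
 * sysadmin enabled SDP for this process; everything else is untouched.
 */
int sdp_pick_domain(int domain, int type, int sdp_enabled, int sdp_family)
{
    if (sdp_enabled && domain == AF_INET && type == SOCK_STREAM)
        return sdp_family;   /* opted in: bypass the kernel TCP/IP stack */
    return domain;           /* AF_INET keeps meaning TCP/IP + netfilter */
}
```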


  And can you _please_ stop all this time to market and
  similar business crap?  That simply doesn't matter when 
  designing something properly.
 
 If we really were to play stop-the-world-while-I-redesign-it
 games then the resulting solution would not use sockets, TCP
 or even Linux.

Well, linux kernel doesn't play stop-the-world-while-I-redesign-it.
The revolution happened (open source collaborative development).
Linux kernel development is now an evolution.

Rule #1 for linux kernel development: labor is free
We *know* that's not true in commercial reality.  But kernel development
just works that way because of its origins and Linus likes it that way.
If someone wants something changed in the linux kernel, they can
develop/submit the changes themselves or pay someone to do the work.
In either case, Linus doesn't pay for it.

Seems like a sufficient number of smart people agree with him and play
the game the way he has defined it. The folks who do NOT like his game,
grab some version of the source tree and do what they like with it (as
long as they meet licensing requirements). That's ok too.


 Real solutions, from NICs through Operating
 Systems, recognize that their legacy is part of their strength
 as well as a nuisance.

Legacy is definitely a linux strength.

Open source does NOT ignore legacy applications:
1) Anyone can continue to update and run on the linux kernel version
   they have source code for if they don't want to (or can't) change
   the application or newer kernels break the ABI.
   Many people are still very happy using 2.4 linux kernels.

 [ Linux kernel has no ABI obligation to closed source apps given
   the availability of source code. That's what vendors like RH, SuSE,
   and their competitors are paid to provide - support for stable ABI. ]

2) kernel developers DO modify open source user programs to work
   with updated kernel interfaces if there is a clear advantage.
   scsitools and pciutils might be good examples.
   X.org might be a more contemporary one.

3) kernel developers do NOT break an API/ABI just because it's tuesday
   and they had a bad burrito for lunch. They eat their own dogfood and
   don't want to have ABI events on their box once a week either.
   Some ABIs have been deprecated or intentionally broken to improve things.
   But that's not the norm.  We know it's not painless.

4) deprecated functionality is clearly marked and only removed after
   a reasonably long period (at least 12 months, usually 2-3 years).
   I know apps live longer than that.

I live in many worlds: paid to provide stable ABI, be good citizen,
make changes available upstream, and upstream is cost effective for HP.

hth,
grant
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Michael Krause


At 03:14 PM 10/12/2005, Caitlin Bestler wrote:

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]] On Behalf Of Sean Hefty
 Sent: Wednesday, October 12, 2005 2:36 PM
 To: Michael Krause
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Michael Krause wrote:
  1. Applications want to use existing API to identify remote endnodes /
  services.
 
 To clarify, the applications want to use IP based addressing
 to identify remote endnodes. The connection API is under development.
 
 

No, I think Mike's comment was dead on. Applications want to
use the existing API. They want to use the existing API even
when the API is clearly defective. Note that there are several
generations of host-resolution APIs for the IP world, with the
earlier ones clearly being heavily inferior (not thread safe,
not IPv4/IPv6 neutral, etc). But they have not been eliminated.
Why? Because applications want to use the existing API.
If application developers were rational and totally open to
adopt new ideas instantly then the active side would ask to
make a connection to a *service*, not to a host with a service
qualifier.
A new API may be under development to meet new needs. But keep in
mind that the application developers expect it to be as close to
what they are used to as possible, and will grumble that it is
not 100% compatible. 
This all comes down to economics, which is why some ULPs such as SDP are
created. Let's examine SDP for a moment. The purpose of SDP is
to enable synchronous and asynchronous Sockets applications to
transparently run unmodified over an RDMA-capable
interconnect. Unmodified means no source code changes and no
recompile required (this is possible if the Sockets library is a shared
library and dynamically linked). The first part of unmodified
means that the existing address / service resolution API calls work
(further, no change to the address family, etc. is required to make this
work either). Hence, pick any of the get* API calls that are in use
today and they should just work. 
How does this work? The SDP implementation takes on the burden for
the application developer. For iWARP, there really isn't anything
special that has to be done as these calls all should provide the
necessary information. The port mapper protocol would be invoked
which would map to the actual RDMA listen QP and target RNIC. For
IB, there is some additional work both in using SID as well as resolving
the IP address to the IB address vector but the work isn't that hard
to implement (we know this because this has all been
implemented on various OS within the industry). The same will be
true for NFS/RDMA and iSER - again all use the existing interfaces to
identify the address / service and map to an address vector (and again,
all of this has been implemented on various OS within the
industry).
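For the IB side, one concrete form of the port-to-SID mapping is the convention used by the later Linux RDMA CM: the service ID is the port-space value shifted left 16 bits plus the port number, with RDMA_PS_TCP = 0x0106. A minimal C sketch, assuming that convention; it postdates this thread and is not necessarily what the SDP implementations mentioned above use. The function returns a host-order value; on the wire the service ID is carried big-endian.

```c
#include <stdint.h>

/* TCP port-space value used by the later Linux RDMA CM (RDMA_PS_TCP). */
#define PS_TCP 0x0106u

/* Map an IP port number into a 64-bit IB service ID: (space << 16) + port. */
uint64_t port_to_service_id(uint16_t port_space, uint16_t port)
{
    return ((uint64_t)port_space << 16) + port;
}
```

For example, TCP port 80 maps to service ID 0x0000000001060050.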
This transparency makes ISVs and customers very happy as they can take advantage
of RDMA technologies without having to go through the lengthy and
expensive qualification process that comes when any application is
modified / recompiled. This keeps costs low and improves
TTM. As for the RDMA connection API, that is simply attempting to
abstract to a common interface that any ULP implementation can use to
access either iWARP or IB. The RDMA connection API should not
be viewed as something end application developers will use, but as
something aimed at middleware developers. This allows everyone to use IP addresses,
port spaces, etc. through the existing application API while allowing
RDMA to transparently add some intelligence to the process and eventually
enable new capabilities like policy management (e.g. how best to map ULP
QoS needs to a given path, service rate, etc.) without permuting
everything above. Keeping things transparent is best for all.
Attempting to require end application developers to modify their code
will result in slower adoption and reduced utilization of RDMA
technologies within the industry. It really is all about economics
and re-using the existing ecosystem / infrastructure.
Mike



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Caitlin Bestler



I agree with Mike's analysis. But I'd also like to point out that even
when source compatibility is not a requirement, source familiarity
is. That is, even when recoding is feasible the API should only
introduce new concepts as required to improve efficiency. The
shift from socket model to QP/CQ is challenging enough as is.
It's also where the benefit is. Changing how the application
requests and accepts connections is just piling on more things
for the developers to learn onto an already very full plate, and
with nowhere near the same benefit.

The simple, IP/DNS-centric methods that Mike outlined will
work on either iWARP or IB, and are very easily understood
by those familiar with existing sockets/IP network development.
The more complex models provide minor enhancements for
very corner cases at the very heavy concept of requiring
the developer to understand a lot more about network topology.


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-12 Thread Caitlin Bestler





  
  
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
 Sent: Wednesday, October 12, 2005 8:24 AM
 To: Hal Rosenstock; Sean Hefty
 Cc: Openib
 Subject: RE: [openib-general] [RFC] IB address translation using ARP

 At 07:45 AM 10/10/2005, Hal Rosenstock wrote:
 On Sun, 2005-10-09 at 10:19, Sean Hefty wrote:
  I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?

  I'm referring to the case that iWarp is running over TCP. I know that it can
  run over SCTP, but I'm not familiar with the details of that protocol. With
  TCP, this is an end-to-end connection, so layering iWarp over it, only the
  endpoints need to deal with it. I believe the same is true for SCTP.

 Yes, SCTP is similar in those regards.

 SCTP creates a connection and then multiplexes a set of sessions over
 it. You can conceptually think of it as akin to IB RD but where all QP
 are bound to the same EEC.

SCTP preserves all QP to QP semantics, including buffers posted to specific
buffers and credits. So SCTP allows multiple in-flight messages for each
RDMA stream in the association.


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-12 Thread Michael Krause


At 09:59 AM 10/12/2005, Caitlin Bestler wrote:




 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Krause
 Sent: Wednesday, October 12, 2005 8:24 AM
 To: Hal Rosenstock; Sean Hefty
 Cc: Openib
 Subject: RE: [openib-general] [RFC] IB address translation using ARP

 At 07:45 AM 10/10/2005, Hal Rosenstock wrote:
 On Sun, 2005-10-09 at 10:19, Sean Hefty wrote:
  I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?

  I'm referring to the case that iWarp is running over TCP. I know that it can
  run over SCTP, but I'm not familiar with the details of that protocol. With
  TCP, this is an end-to-end connection, so layering iWarp over it, only the
  endpoints need to deal with it. I believe the same is true for SCTP.

 Yes, SCTP is similar in those regards.

 SCTP creates a connection and then multiplexes a set of sessions over
 it. You can conceptually think of it as akin to IB RD but where all
 QP are bound to the same EEC.

 SCTP preserves all QP to QP semantics, including buffers posted to specific
 buffers and credits. So SCTP allows multiple in-flight messages for each
 RDMA stream in the association.
Yep. This is where iWARP differs from IB RD in that IB restricts
this to a single in-flight message per EEC at a time while iWARP allows
multiple in-flight over either transport type supported. The logic
behind why IB RD was constructed the way it was is somewhat complex but
one of the core requirements was to enable a QP to communicate across
multiple EEC while preserving an ordering domain within an EEC.
Given all of this needed to be implemented in hardware, i.e. without host
software intervention, for both main data path and error management, the
restriction to a single message was required. I and several others
had created a proprietary RDMA RC followed by a RD implementation 10+
years ago so we had a reasonable understanding of the error / complexity
trade-offs. Given the distances were within a usec of each other
and one could support multiple EEC per endnode pair, the performance /
scaling impacts were not seen as overly restrictive and met the software
application usage models quite nicely. Anyway, there are
differences between iWARP / SCTP and IB RD so people cannot equate them
beyond some base conceptual level aspects.
Mike


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-12 Thread Caitlin Bestler
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Sean Hefty
 Sent: Wednesday, October 12, 2005 2:36 PM
 To: Michael Krause
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Michael Krause wrote:
  1. Applications want to use existing API to identify remote 
 endnodes / 
  services.
 
 To clarify, the applications want to use IP based addressing 
 to identify remote endnodes.  The connection API is under development.
 


No, I think Mike's comment was dead on. Applications want to
use the existing API. They want to use the existing API even
when the API is clearly defective. Note that there are several
generations of host-resolution APIs for the IP world, with the
earlier ones clearly being heavily inferior (not thread safe,
not IPv4/IPv6 neutral, etc). But they have not been eliminated.

Why? Because applications want to use the existing API.

If application developers were rational and totally open to
adopt new ideas instantly then the active side would ask to
make a connection to a *service*, not to a host with a service
qualifier.

A new API may be under development to meet new needs. But keep in
mind that the application developers expect it to be as close to
what they are used to as possible, and will grumble that it is
not 100% compatible.



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-12 Thread Sean Hefty

Caitlin Bestler wrote:

No, I think Mike's comment was dead on. Applications want to
use the existing API. They want to use the existing API even
when the API is clearly defective. Note that there are several
generations of host-resolution APIs for the IP world, with the
earlier ones clearly being heavily inferior (not thread safe,
not IPv4/IPv6 neutral, etc). But they have not been eliminated.


What existing API are you referring to?  We have SDP to support standard 
sockets.

- Sean


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Hal Rosenstock
On Sun, 2005-10-09 at 10:19, Sean Hefty wrote: 
 I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?
 
 I'm referring to the case that iWarp is running over TCP.  I know that it can
 run over SCTP, but I'm not familiar with the details of that protocol.  With
 TCP, this is an end-to-end connection, so layering iWarp over it, only the
 endpoints need to deal with it.  I believe the same is true for SCTP.

Yes, SCTP is similar in those regards.

 Doesn't a routing decision still need to be made at the IP layer ?
 
 Routing of the IP packets is done at the IP layer, but I don't see how this
 affects iWarp.

It does under the covers, those covers being IP routing.

 Doesn't the IP next hop need to be determined (e.g. gateway when the
 destination is off the local IP subnet) ? Is there something that
 precludes iWARP from working across IP subnets ?
 
 I can't think of anything that would preclude iWarp from working 
 across subnets.

Doesn't the IP next hop need determining in that case ? Why is that not
relevant ? I don't think the iWARP connection is end to end in all
cases.
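The next-hop question can be illustrated with a standard longest-prefix-match lookup. In this C sketch (a hypothetical IPv4 table in host byte order; names are illustrative) the address whose link-layer address actually gets resolved is the destination when it is on-link and the gateway otherwise. For iWARP this lookup is done by the ordinary host IP stack underneath the RDMA layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 route entry; gateway == 0 means "on-link". */
struct rt_entry {
    uint32_t prefix;
    uint8_t  plen;          /* prefix length, 0..32 */
    uint32_t gateway;
};

static uint32_t plen_to_mask(uint8_t plen)
{
    /* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
    return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

/*
 * Longest-prefix match.  Returns the IP address to ARP for: the destination
 * itself if its route is on-link, otherwise the route's gateway.  Returns 0
 * if no route matches.
 */
uint32_t next_hop(uint32_t dst, const struct rt_entry *tbl, size_t n)
{
    const struct rt_entry *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = plen_to_mask(tbl[i].plen);
        if ((dst & mask) == (tbl[i].prefix & mask) &&
            (best == NULL || tbl[i].plen > best->plen))
            best = &tbl[i];
    }
    if (best == NULL)
        return 0;
    return best->gateway ? best->gateway : dst;
}
```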

-- Hal



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Hal Rosenstock
Hi Tom,

On Sun, 2005-10-09 at 13:10, Tom Tucker wrote: 
 On Sun, 2005-10-09 at 07:57 -0700, Sean Hefty wrote:
  It is theoretically possible to support all this on an IPoIB based
  network. Multiple subnets, multiple routes to remote peers, ICMP
  redirect, multiple IP addresses for each physical interface, yada yada
  yada. But IMHO, the only way to do this would be to tie directly into
  the existing routing,  ARP, ICMP, etc... subsystems in Linux. Otherwise
  you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
  
  The current implementation ties into the standard Linux ARP tables.  If
  connections were made over TCP/IP, using IPoIB, then I don't think that there
  would be any issues.  The issues only arise because of the desire to use TCP/IP
  network addresses over a non-TCP/IP network.
  
  code. This belief is why I've been a proponent of mapping GIDs to one
  and only one IP address and treating it for management purposes as the
  equivalent of an IP address. Without this, the whole mechanism for
  determining routes, etc.. breaks down. If you treat the GID like a MAC
  address -- it breaks, because a MAC address can have multiple IP
  addresses -- the observation that led to the conclusion that ATS was
  broken in the first place.
  
  We should be able to handle the case where a GID has multiple IP addresses bound
  to it.  But even if we added a 1:1 restriction, the connection over IB issue
  still exists.
 
 I agree, except for RARP.

Not sure what you mean except for RARP. Can you elaborate ?
 
[snip...]

  I don't view a GID as an IP address because we're not sending and receiving IP
  packets on the GID.  IPoIB treats GIDs as only part of a MAC address, which I
  think is the proper view.
 
  Anyway, returning back to the original problem of connecting to an IB gateway if
  given a destination IP address on a different subnet...  I'm slowly convincing
  myself that either the CMA or AT should do this.  (I believe that the ib_addr
  code will do this now, but still wasn't sure that we wanted it to.)
  
 
 IMHO, you need a service separate from the CMA to do address
 translation. My (iWARP's) rationale for this is that there are two
 clients of the service, the CM and IP. For CM, you need it to elect a
 route and thereby a local interface. For IP you need it because routes
 change and ARP entries time out. 
 
 BTW, can you educate me ... is the following what you're thinking:
 
 On the client side...
 
 - route is discovered by looking at the Linux routing table
 - local interface is IPoIB (looks at rdma_ptr embedded in netdev struct)
 - send ARP AT message over local IB interface

It's just a normal IPoIB ARP to the destination IP address initiated by
AT. (With ATS, it could have been an SA Get ServiceRecord as an
alternative).

I think the current CMA code handles client above and server but not
(bridging) gateway below.

 At the gateway...bridging to IP

 - ARP AT query received on IB interface
 - Lookup route to destination IP address in gateway's route table.
 - If next hop's Ethernet address is already known, it is returned
  
  hardware (may not be ethernet)

 - Otherwise, local interface identified is IPoEthernet
 - New ARP query goes out on the local interface from the route
 - When response comes back, answer is returned.

 At the gateway...bridging to IPoIB
 
 - ARP AT message received on IB interface, delivered to AT
 - Lookup route to destination IP address in gateway's route table
 - If next hop's Ethernet address is already known, it is returned
 - otherwise, local interface identified in route is IPoIB
 - New ARP AT query goes out on the local interface
 - When response comes back, answer is returned.
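The gateway steps above (answer from cache if the next hop's hardware address is known, otherwise send a query and answer when the reply arrives), together with the point that ARP entries time out, can be sketched as a small direct-mapped cache. Everything here is hypothetical: the names, the 8-slot table, and the counter standing in for actually transmitting an ARP request. Addresses live in a 20-byte buffer, enough for an IPoIB hardware address; a 6-byte Ethernet address would use only a prefix.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8
#define HWADDR_LEN 20       /* room for an IPoIB address; Ethernet uses 6 */

enum lookup_result { LOOKUP_HIT, LOOKUP_QUERY_SENT, LOOKUP_PENDING };

struct arp_entry {
    uint32_t ip;            /* next-hop IPv4 address; 0 = empty slot */
    int      resolved;      /* 0 while a query is outstanding */
    long     expires;       /* resolved: goes stale; pending: retry time */
    uint8_t  hw[HWADDR_LEN];
};

struct arp_cache {
    struct arp_entry e[CACHE_SLOTS];
    long ttl;               /* lifetime of a resolved entry */
    long retry;             /* wait before re-sending a query */
    unsigned queries_sent;  /* stand-in for transmitting ARP requests */
};

/* Direct-mapped: a colliding address simply evicts the previous entry. */
static struct arp_entry *slot_for(struct arp_cache *c, uint32_t ip)
{
    return &c->e[ip % CACHE_SLOTS];
}

/* Handle a query for next hop `ip` arriving at time `now`. */
enum lookup_result gw_lookup(struct arp_cache *c, uint32_t ip, long now,
                             uint8_t hw_out[HWADDR_LEN])
{
    struct arp_entry *ent = slot_for(c, ip);

    if (ent->ip == ip && now < ent->expires) {
        if (!ent->resolved)
            return LOOKUP_PENDING;          /* query already in flight */
        memcpy(hw_out, ent->hw, HWADDR_LEN);
        return LOOKUP_HIT;                  /* answer from the cache */
    }

    /* Absent, expired, or slot owned by another address: (re)query. */
    memset(ent, 0, sizeof *ent);
    ent->ip = ip;
    ent->expires = now + c->retry;
    c->queries_sent++;
    return LOOKUP_QUERY_SENT;
}

/* Record a reply; the caller would then answer the waiting requester. */
void gw_on_reply(struct arp_cache *c, uint32_t ip,
                 const uint8_t hw[HWADDR_LEN], long now)
{
    struct arp_entry *ent = slot_for(c, ip);

    if (ent->ip != ip || ent->resolved)
        return;                             /* unsolicited or duplicate */
    memcpy(ent->hw, hw, HWADDR_LEN);
    ent->resolved = 1;
    ent->expires = now + c->ttl;
}
```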

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Caitlin Bestler
On 10/9/05, Sean Hefty [EMAIL PROTECTED] wrote:
 I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?

 I'm referring to the case that iWarp is running over TCP. I know that it can
 run over SCTP, but I'm not familiar with the details of that protocol. With
 TCP, this is an end-to-end connection, so layering iWarp over it, only the
 endpoints need to deal with it. I believe the same is true for SCTP.

The main impact of SCTP is that even the IP address can change
under the covers. So not only is there routing that is transparent
to the RDMA consumer, there is also selection of source/destination
IP addresses.



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Caitlin Bestler
On 10 Oct 2005 10:45:59 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
 On Sun, 2005-10-09 at 10:19, Sean Hefty wrote:
  I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?

  I'm referring to the case that iWarp is running over TCP. I know that it can
  run over SCTP, but I'm not familiar with the details of that protocol. With
  TCP, this is an end-to-end connection, so layering iWarp over it, only the
  endpoints need to deal with it. I believe the same is true for SCTP.

 Yes, SCTP is similar in those regards.

  Doesn't a routing decision still need to be made at the IP layer ?

  Routing of the IP packets is done at the IP layer, but I don't see how this
  affects iWarp.

 It does under the covers, those covers being IP routing.

  Doesn't the IP next hop need to be determined (e.g. gateway when the
  destination is off the local IP subnet) ? Is there something that
  precludes iWARP from working across IP subnets ?

  I can't think of anything that would preclude iWarp from working
  across subnets.

 Doesn't the IP next hop need determining in that case ? Why is that not
 relevant ? I don't think the iWARP connection is end to end in all
 cases.

Of course it's end to end. It's just that only the end points understand
that it is an iWARP connection.

Or more properly, the underlying transport (or LLP) connections 
are end to end, but the iWARP semantics exist only in the RDMA
endpoints.

That is why iWARP works across multiple subnets. We've actually
done true worldwide connections. The existing IP network carries
the iWARP traffic because it is indeed just TCP traffic to the
intermediate network.



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Hal Rosenstock
On Mon, 2005-10-10 at 11:50, Caitlin Bestler wrote:
 Doesn't the IP next hop need determining in that case ? Why is
 that not 
 relevant ? I don't think the iWARP connection is end to end in
 all
 cases.
 
 
 Of course it's end to end. It's just that only the end points
 understand that it is an iWARP connection.

What about the case of iWARP - IB ?

 Or more properly, the underlying transport (or LLP) connections 
 are end to end, but the iWARP semantics exist only in the RDMA
 endpoints.
 
 That is why iWARP works across multiple subnets.

  ^^^
  IP subnets
  We've actually
 done true worldwide connections. The existing IP network carries
 the iWARP traffic because it is indeed just TCP traffic to the
 intermediate network.

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Sean Hefty

Tom Tucker wrote:

Again, I don't think that the binding is the issue, so much as the desire to use
an address for a protocol that isn't actually being used for communication.  


Not to be pedantic, but if binding or mapping or somesuch weren't an
issue we wouldn't need AT. 


We need AT because we're not using network addresses.  If a client used an IP 
address and ran over IP, we wouldn't need to do anything special.



IMHO, you need a service separate from the CMA to do address
translation. My (iWARP's) rationale for this is that there are two
clients of the service, the CM and IP. For CM, you need it to elect a
route and thereby a local interface. For IP you need it because routes
change and ARP entries time out. 


The connection management and address translation are separate services, with 
the CMA calling the address translation for the user.  You may want to look at 
ib_addr for details on how the address translation works.



- route is discovered by looking at the Linux routing table

^
address mapping from IP to GID/Pkey.


- local interface is IPoIB (looks at rdma_ptr embedded in netdev struct)
The address translation looks only at the hardware and broadcast addresses.  No 
additional rdma_ptr is needed with ib_addr.



- send ARP AT message over local IB interface
It sends a normal IP ARP to get the remote hardware address, which contains the 
destination GID.  An ARP is sent only if the mapping isn't available in the 
local ARP table.
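On IPoIB the "remote hardware address" returned by ARP is 20 bytes: one byte of reserved/flag bits and a 24-bit queue pair number, followed by the 16-byte port GID (the layout later written up in RFC 4391). That is why the destination GID can be read starting at byte offset 4. A minimal user-space sketch of that decoding; the struct and function names are hypothetical and are not part of ib_addr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20   /* 4-byte flags/QPN field + 16-byte GID */
#define IB_GID_LEN        16

/* Hypothetical decoded form of an IPoIB link-layer address. */
struct ipoib_hwaddr_sketch {
    uint8_t  flags;              /* first byte: reserved/flag bits */
    uint32_t qpn;                /* next 24 bits: queue pair number */
    uint8_t  gid[IB_GID_LEN];    /* port GID (subnet prefix + GUID) */
};

/* Split a 20-byte IPoIB hardware address (e.g. from an ARP reply)
 * into its QPN and GID parts. Input bytes are in network order. */
static void ipoib_decode_hwaddr(const uint8_t hw[IPOIB_HW_ADDR_LEN],
                                struct ipoib_hwaddr_sketch *out)
{
    assert(hw != NULL && out != NULL);
    out->flags = hw[0];
    out->qpn = ((uint32_t)hw[1] << 16) | ((uint32_t)hw[2] << 8) | hw[3];
    memcpy(out->gid, hw + 4, IB_GID_LEN);   /* GID starts at offset 4 */
}
```

The same offset applies to the local side: the sender's own IPoIB device address also holds its GID after the first four bytes.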


At this point, the client has the SGID, DGID, and PKey.  It then issues a path 
record query to obtain the route to the destination.  The CMA doesn't really 
care if that destination is the actual destination or some gateway.


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Sean Hefty

Hal Rosenstock wrote:

What about the case of iWARP - IB ?


Crossing IB shouldn't matter.  iWarp should simply cross the IB subnet using 
IPoIB.  You could build a gateway to make the transfer across IB more efficient, 
but it's not required.


- Sean


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Tom Tucker
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 10, 2005 12:37 PM
 To: Tom Tucker
 Cc: Sean Hefty; Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Tom Tucker wrote:
 Again, I don't think that the binding is the issue, so much 
 as the desire to use
 an address for a protocol that isn't actually being used 
 for communication.  
  
  Not to be pedantic, but if binding or mapping or somesuch weren't an
  issue we wouldn't need AT. 
 
 We need AT because we're not using network addresses.  If a 
 client used an IP 
 address and ran over IP, we wouldn't need to do anything special.

agreed.

 
  IMHO, you need a service separate from the CMA to do address
  translation. My (iWARP's) rationale for this is that there are two
  clients of the service, the CM and IP. For CM, you need it 
 to elect a
  route and thereby a local interface. For IP you need it 
 because routes
  change and ARP entries time out. 
 
 The connection management and address translation are 
 separate services, with 
 the CMA calling the address translation for the user.  You 
 may want to look at 
 ib_addr for details on how the address translation works.

Very cool. I've applied the patch and will take a look.

 
  - route is discovered by looking at the Linux routing table
  ^
  address mapping from IP to GID/Pkey.

I think I understand where I'm upside down now. In my world, 
you don't know which interface to send the ARP request on 
until you've identified the local interface and you can't 
identify the local interface until you've looked up the route.
Not all interfaces have a path to all remote peers.

In your world, you can't look up the path record until you've 
identified the remote GID. What I don't get is, if you have more 
than one IB interface, which interface do you submit your IPoIB ARP 
request on? All of them?

 
  - local interface is IPoIB (looks at rdma_ptr embedded in 
 netdev struct)
 The address translation looks only at the hardware and 
 broadcast addresses.  No 
 additional rdma_ptr is needed with ib_addr.
 

Cool, I must have misunderstood an earlier discussion.

  - send ARP AT message over local IB interface
 It sends a normal IP ARP to get the remote hardware address, 
 which contains the 
 destination GID.  An ARP is sent only if the mapping isn't 
 available in the 
 local ARP table.

Not sure what a normal IP ARP message is. In my world, ARP and 
IP are peer protocols. ARP does not sit on top of IP, nor is it a 
special kind of IP message. Forgive my ignorance, but does IPoIB 
have ARP built into it?

But regardless, how do you know which local interface to send the 
IP ARP message on?

 
 At this point, the client has the SGID, DGID, and PKey.  It 
 then issues a path 
 record query to obtain the route to the destination.  The 
 CMA doesn't really 
 care if that destination is the actual destination or some gateway.

Thanks for the clarifications.
 
 - Sean
 


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Sean Hefty

Tom Tucker wrote:
I think I understand where I'm upside down now. In my world, 
you don't know which interface to send the ARP request on 
until you've identified the local interface and you can't 
identify the local interface until you've looked up the route.

Not all interfaces have a path to all remote peers.


We have the same restriction.  I lookup the route based on the destination IP 
address to get the local interface.
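Looking up the route by destination IP amounts to a longest-prefix match over the routing table; the winning entry names the device (and so the IB port) on which the ARP request goes out. A simplified, hypothetical model of that selection follows. It is not the Linux FIB, which uses far more elaborate data structures and also weighs metrics, TOS and policy rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 route entry (addresses in host order). */
struct route_sketch {
    uint32_t    prefix;     /* network address */
    int         prefix_len; /* 0..32 */
    uint32_t    gateway;    /* 0 means the destination is on-link */
    const char *dev;        /* outgoing interface, e.g. "ib0" */
};

static uint32_t prefix_mask(int len)
{
    assert(len >= 0 && len <= 32);
    return len == 0 ? 0 : 0xffffffffu << (32 - len);  /* avoid shift by 32 */
}

/* Return the most specific route covering dst, or NULL if none matches. */
static const struct route_sketch *
route_lookup(const struct route_sketch *tbl, size_t n, uint32_t dst)
{
    const struct route_sketch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].prefix_len);
        if ((dst & m) == (tbl[i].prefix & m) &&
            (best == NULL || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    return best;
}
```

With a default route on eth0 plus 192.168.0.0/16 on ib0 and 192.168.1.0/24 on ib1, a destination of 192.168.1.5 selects ib1, so that is the only interface the ARP is sent on.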


In your world, you can't look up the path record until you've 
identified the remote GID. What I don't get is, if you have more 
than one IB interface, which interface do you submit your IPoIB ARP 
request on? All of them?


It's based on the device returned by the route lookup.  I've attached the 
relevant code portion below.  If the code below fails, I generate an ARP, wait 
for the reply, then re-execute the code.


Not sure what a normal IP ARP message is. In my world, ARP and 
IP are peer protocols. ARP does not sit on top of IP, nor is it a 
special kind of IP message. Forgive my ignorance, but does IPoIB 
have ARP built into it?


I was being confusing.  The ARP is sent on the IPoIB net_device to map an IP 
address to the remote hardware address.  There's nothing special about the ARP.


- Sean

static int addr_resolve_remote(struct sockaddr_in *src_in,
			       struct sockaddr_in *dst_in,
			       struct ib_addr *addr)
{
	u32 src_ip = src_in->sin_addr.s_addr;
	u32 dst_ip = dst_in->sin_addr.s_addr;
	struct flowi fl;
	struct rtable *rt;
	struct neighbour *neigh;
	int ret;

	/* Route lookup on the destination selects the local (IPoIB) device. */
	memset(&fl, 0, sizeof fl);
	fl.nl_u.ip4_u.daddr = dst_ip;
	fl.nl_u.ip4_u.saddr = src_ip;
	ret = ip_route_output_key(&rt, &fl);
	if (ret)
		goto out;

	/* Look for a resolved ARP entry for the destination on that device. */
	neigh = neigh_lookup(&arp_tbl, &dst_ip, rt->idev->dev);
	if (!neigh) {
		ret = -ENODATA;
		goto err1;
	}

	if (!(neigh->nud_state & NUD_VALID)) {
		ret = -ENODATA;
		goto err2;
	}

	/* No source address given: use the one chosen by the route. */
	if (!src_ip) {
		src_in->sin_family = dst_in->sin_family;
		src_in->sin_addr.s_addr = rt->rt_src;
	}

	/* IPoIB hardware addresses carry the GID after a 4-byte QPN field. */
	addr->sgid = *(union ib_gid *) (neigh->dev->dev_addr + 4);
	addr->dgid = *(union ib_gid *) (neigh->ha + 4);
	addr->pkey = addr_get_pkey(neigh->dev);

err2:
	neigh_release(neigh);
err1:
	ip_rt_put(rt);
out:
	return ret;
}

static void addr_send_arp(struct sockaddr_in *dst_in)
{
	struct rtable *rt;
	struct flowi fl;
	u32 dst_ip = dst_in->sin_addr.s_addr;

	/* Route to the destination to find the outgoing device and source IP. */
	memset(&fl, 0, sizeof fl);
	fl.nl_u.ip4_u.daddr = dst_ip;
	if (ip_route_output_key(&rt, &fl))
		return;

	/* Broadcast an ARP request on that device; the reply fills the ARP table. */
	arp_send(ARPOP_REQUEST, ETH_P_ARP, dst_ip, rt->idev->dev, rt->rt_src,
		 NULL, rt->idev->dev->dev_addr, NULL);
	ip_rt_put(rt);
}
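The resolve/ARP/retry cycle Sean describes (try the ARP table, send an ARP on a miss, wait for the reply, then try again) can be modelled in user space as follows. Everything here is a hypothetical stand-in: the single-entry cache replaces the kernel neighbour table, and send_arp_and_wait() replaces sending the ARP and waiting for its reply, with a configurable number of lost replies so the retry limit matters. A kernel implementation would have to wait asynchronously instead of calling a blocking function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry stand-in for the kernel neighbour table. */
struct arp_cache_sketch {
    uint32_t ip;
    bool     valid;   /* plays the role of NUD_VALID */
    int      drop;    /* number of ARP replies that will be lost */
};

/* Try to resolve dst from the cache: 0 on a hit, -ENODATA on a miss. */
static int try_resolve(const struct arp_cache_sketch *c, uint32_t dst)
{
    return (c->valid && c->ip == dst) ? 0 : -ENODATA;
}

/* Stand-in for "send an ARP request and wait for the reply". A lost
 * reply leaves the cache unchanged; a delivered one populates it. */
static void send_arp_and_wait(struct arp_cache_sketch *c, uint32_t dst)
{
    if (c->drop > 0) {
        c->drop--;
        return;
    }
    c->ip = dst;
    c->valid = true;
}

/* Resolve dst, sending up to max_tries ARP requests on cache misses.
 * Returns 0 on success or -ENODATA if the address never resolved. */
static int resolve_with_retry(struct arp_cache_sketch *c, uint32_t dst,
                              int max_tries)
{
    int ret = try_resolve(c, dst);

    for (int i = 0; ret == -ENODATA && i < max_tries; i++) {
        send_arp_and_wait(c, dst);
        ret = try_resolve(c, dst);
    }
    return ret;
}
```

Bounding the number of ARP attempts keeps a connection request to an unreachable address from retrying forever; the caller gets -ENODATA, matching the error used by the resolution routine in the kernel code.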


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Michael Krause


At 10:40 AM 10/10/2005, Sean Hefty wrote:
Hal Rosenstock wrote:
What about the case of iWARP - IB ?
Crossing IB shouldn't matter. iWarp should simply cross the IB
subnet using IPoIB. You could build a gateway to make the transfer
across IB more efficient, but it's not required.
I don't understand this statement. iWARP is RDMA based and if
someone wanted to build a gateway with IB in between, it should be mapped
to an IB RC connection 1:1. Going through IPoIB is a waste and
would result in a very poor performing solution (not that such a solution
would deliver stellar performance to start with). Prior similar
solutions used ULP over IB and the gateway then provided ULP over TOE and
would then be easily extended to do iWARP. In general, you would
want to have defined domains for each interconnect and not try to add
poor ROI superset functionality of one over the other - waste of time and
money.
Mike


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Sean Hefty

Michael Krause wrote:

What about the case of iWARP - IB ?


Crossing IB shouldn't matter.  iWarp should simply cross the IB subnet 
using IPoIB.  You could build a gateway to make the transfer across IB 
more efficient, but it's not required.


I don't understand this statement.  iWARP is RDMA based and if someone 


I was referring to the case where both endpoints are running over iWarp, with IB 
being one of the subnets being crossed.  I believe that you're referring to one 
side running over iWarp, and the other running over IB, with an application 
level gateway in between.


For the latter case, I would think that the gateway needs to establish iWarp 
connections for any IP addresses that reside on the IB subnet behind it, with a 
separate IB connection on the back-end.  It seems to me that this would occur 
transparently to the application using iWarp.


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Michael Krause


At 01:59 PM 10/10/2005, Sean Hefty wrote:
Michael Krause wrote:


What about the case of iWARP - IB ?
Crossing IB shouldn't matter. iWarp should simply cross the IB
subnet using IPoIB. You could build a gateway to make the transfer
across IB more efficient, but it's not required.
I don't understand this statement. iWARP is RDMA based and if someone

I was referring to the case where both endpoints are running over iWarp,
with IB being one of the subnets being crossed. I believe that
you're referring to one side running over iWarp, and the other running
over IB, with an application level gateway in between.
For the latter case, I would think that the gateway needs to establish
iWarp connections for any IP addresses that reside on the IB subnet
behind it, with a separate IB connection on the back-end. It seems
to me that this would occur transparently to the application using
iWarp.
iWARP with IB in between seems like a waste of time to do (very small if
any market for such a beast). IB HCA on a host with an iWARP edge
device may be reasonable but again seems like a waste to construct.
These types of corner usage models, while worth understanding in order to see
if there are any architectural issues and to ensure they are not precluded,
really are just that, corner cases, and little time or effort should be
spent on their support.
Mike


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-10 Thread Hal Rosenstock
On Mon, 2005-10-10 at 13:40, Sean Hefty wrote:
 Hal Rosenstock wrote:
  What about the case of iWARP - IB ?
 
 Crossing IB shouldn't matter.  iWarp should simply cross the IB subnet using 
 IPoIB.  You could build a gateway to make the transfer across IB more 
 efficient, 
 but it's not required.

I was referring to gatewaying to an IB end client from iWARP.

-- Hal



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-09 Thread Sean Hefty
I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?

I'm referring to the case that iWarp is running over TCP.  I know that it can
run over SCTP, but I'm not familiar with the details of that protocol.  With
TCP, this is an end-to-end connection, so with iWarp layered over it, only the
endpoints need to deal with it.  I believe the same is true for SCTP.

Doesn't a routing decision still need to be made at the IP layer ?

Routing of the IP packets is done at the IP layer, but I don't see how this
affects iWarp.

Doesn't the IP next hop need to be determined (e.g. gateway when the
destination is off the local IP subnet) ? Is there something that
precludes iWARP from working across IP subnets ?

I can't think of anything that would preclude iWarp from working across subnets.


- Sean



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-09 Thread Sean Hefty
It is theoretically possible to support all this on an IPoIB based
network. Multiple subnets, multiple routes to remote peers, ICMP
redirect, multiple IP addresses for each physical interface, yada yada
yada. But IMHO, the only way to do this would be to tie directly into
the existing routing,  ARP, ICMP, etc... subsystems in Linux. Otherwise
you'll end up recreating a gigantic (and I mean GIGANTIC) amount of

The current implementation ties into the standard Linux ARP tables.  If
connections were made over TCP/IP, using IPoIB, then I don't think that there
would be any issues.  The issues only arise because of the desire to use TCP/IP
network addresses over a non-TCP/IP network.

code. This belief is why I've been a proponent of mapping GIDs to one
and only one IP address and treating it for management purposes as the
equivalent of an IP address. Without this, the whole mechanism for
determining routes, etc.. breaks down. If you treat the GID like a MAC
address -- it breaks, because a MAC address can have multiple IP
addresses -- the observation that led to the conclusion that ATS was
broken in the first place.

We should be able to handle the case where a GID has multiple IP addresses bound
to it.  But even if we added a 1:1 restriction, the connection over IB issue
still exists.
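A GID with several IP addresses bound to it is the ordinary ARP situation: the forward mapping (IP address to hardware address) stays a function, while the reverse mapping becomes one-to-many, which is where reverse lookups such as RARP run into trouble. A minimal sketch over a hypothetical IP-to-GID table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical IP -> GID binding, as an ARP table over IPoIB would hold. */
struct ip_gid_binding {
    uint32_t ip;
    uint8_t  gid[GID_LEN];
};

/* Forward lookup (ARP direction): at most one GID per IP address. */
static const uint8_t *gid_for_ip(const struct ip_gid_binding *t, size_t n,
                                 uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].ip == ip)
            return t[i].gid;
    return NULL;
}

/* Reverse lookup (RARP direction): count the IP addresses bound to gid.
 * A result greater than 1 means the reverse mapping is ambiguous. */
static size_t ips_for_gid(const struct ip_gid_binding *t, size_t n,
                          const uint8_t gid[GID_LEN])
{
    size_t count = 0;

    assert(gid != NULL);
    for (size_t i = 0; i < n; i++)
        if (memcmp(t[i].gid, gid, GID_LEN) == 0)
            count++;
    return count;
}
```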

I know there is significant resistance to this idea, but I just don't
see how we get this generically resolved without binding the two
addressing schemes more closely. With the current binding, I just don't
think it works.

Again, I don't think that the binding is the issue, so much as the desire to use
an address for a protocol that isn't actually being used for communication.  I
don't view a GID as an IP address because we're not sending and receiving IP
packets on the GID.  IPoIB treats GIDs as only part of a MAC address, which I
think is the proper view.

Anyway, returning back to the original problem of connecting to an IB gateway if
given a destination IP address on a different subnet...  I'm slowly convincing
myself that either the CMA or AT should do this.  (I believe that the ib_addr
code will do this now, but still wasn't sure that we wanted it to.)

- Sean




RE: [openib-general] [RFC] IB address translation using ARP

2005-10-09 Thread Tom Tucker
On Sun, 2005-10-09 at 07:57 -0700, Sean Hefty wrote:
 It is theoretically possible to support all this on an IPoIB based
 network. Multiple subnets, multiple routes to remote peers, ICMP
 redirect, multiple IP addresses for each physical interface, yada yada
 yada. But IMHO, the only way to do this would be to tie directly into
 the existing routing,  ARP, ICMP, etc... subsystems in Linux. Otherwise
 you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
 
 The current implementation ties into the standard Linux ARP tables.  If
 connections were made over TCP/IP, using IPoIB, then I don't think that there
 would be any issues.  The issues only arise because of the desire to use 
 TCP/IP
 network addresses over a non-TCP/IP network.
 
 code. This belief is why I've been a proponent of mapping GIDs to one
 and only one IP address and treating it for management purposes as the
 equivalent of an IP address. Without this, the whole mechanism for
 determining routes, etc.. breaks down. If you treat the GID like a MAC
 address -- it breaks, because a MAC address can have multiple IP
 addresses -- the observation that led to the conclusion that ATS was
 broken in the first place.
 
 We should be able to handle the case where a GID has multiple IP addresses 
 bound
 to it.  But even if we added a 1:1 restriction, the connection over IB issue
 still exists.

I agree, except for RARP.

 
 I know there is significant resistance to this idea, but I just don't
 see how we get this generically resolved without binding the two
 addressing schemes more closely. With the current binding, I just don't
 think it works.
 
 Again, I don't think that the binding is the issue, so much as the desire to 
 use
 an address for a protocol that isn't actually being used for communication.  

Not to be pedantic, but if binding or mapping or somesuch weren't an
issue we wouldn't need AT. 

 I
 don't view a GID as an IP address because we're not sending and receiving IP
 packets on the GID.  IPoIB treats GIDs as only part of a MAC address, which I
 think is the proper view. 

 Anyway, returning back to the original problem of connecting to an IB gateway 
 if
 given a destination IP address on a different subnet...  I'm slowly 
 convincing
 myself that either the CMA or AT should do this.  (I believe that the ib_addr
 code will do this now, but still wasn't sure that we wanted it to.)
 

IMHO, you need a service separate from the CMA to do address
translation. My (iWARP's) rationale for this is that there are two
clients of the service, the CM and IP. For CM, you need it to elect a
route and thereby a local interface. For IP you need it because routes
change and ARP entries time out. 

BTW, can you educate me ... is the following what you're thinking:

On the client side...

- route is discovered by looking at the Linux routing table
- local interface is IPoIB (looks at rdma_ptr embedded in netdev struct)
- send ARP AT message over local IB interface

At the gateway...bridging to IP

- ARP AT query received on IB interface
- Lookup route to destination IP address in gateway's route table. 
- If next hop's Ethernet address is already known, it is returned 
- Otherwise, local interface identified is IPoEthernet
- New ARP query goes out on the local interface from the route
- When response comes back, answer is returned.

At the gateway...bridging to IPoIB

- ARP AT message received on IB interface, delivered to AT
- Lookup route to destination IP address in gateway's route table
- If next hop's Ethernet address is already known, it is returned
- otherwise, local interface identified in route is IPoIB
- New ARP AT query goes out on the local interface
- When response comes back, answer is returned.

Thanks,



 - Sean
 
 


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-08 Thread Tom Tucker
On Fri, 2005-10-07 at 20:13 -0400, Hal Rosenstock wrote:
 On Fri, 2005-10-07 at 19:57, Sean Hefty wrote:
  Hal Rosenstock wrote:
   Would an iWARP connection jump across IP subnets ? It would need to
   determine that it could do this (ala NHRP with ATM). Also, could there
   be other RDMA networks between them (like IB) ?
  
  if iWarp is on top of TCP, I don't think that it would care about IP 
  subnets.
 
 I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ? 
 Doesn't a routing decision still need to be made at the IP layer ?
 Doesn't the IP next hop need to be determined (e.g. gateway when the
 destination is off the local IP subnet) ? Is there something that
 precludes iWARP from working across IP subnets ?
 
 -- Hal
 
I've just read through this entire thread for the first time, and I
sense considerable confusion about how IP routing works. I know I'm
confused ;-)

With sockets, the path to the remote peer is determined *after* the
connection request is submitted by the app (connect(...)). The app has
no idea which local interface will ultimately handle this connection or
what the path (route) is to the remote peer. It simply says
connect(67.65.105.4, ...). In fact, TCP doesn't know this either! Like
Hal suggests, the connect request (SYN packet) gets all the way down to
IP where the least cost route is selected, and if not already known, the
Ethernet address is determined (arp) for the next hop. The reasons for
this are varied but include: routes may change, Ethernet addresses for
next hops change, all within the lifetime of a connection. Almost
certainly if the connection lasts more than 15 minutes.
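The point that the route (and with it the local interface) is chosen only after connect() is visible from user space. A common trick is to connect() a UDP socket, which transmits nothing but makes the kernel select a route and source address, and then read the choice back with getsockname(). A sketch using standard POSIX socket calls; the helper name local_addr_for() is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel which local IPv4 address it would use to reach dst_ip.
 * connect() on a UDP socket sends no packets; it only associates the
 * socket with a route, after which getsockname() reports the source
 * address that route selected. Writes a dotted quad into buf.
 * Returns 0 on success, -1 on any error. */
static int local_addr_for(const char *dst_ip, char *buf, socklen_t buflen)
{
    struct sockaddr_in dst, local;
    socklen_t len = sizeof local;
    int ret = -1;
    int fd;

    assert(dst_ip != NULL && buf != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);            /* any port; nothing is sent */
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) == 1 &&
        connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, buf, buflen) != NULL)
        ret = 0;

    close(fd);
    return ret;
}
```

Calling local_addr_for("67.65.105.4", buf, sizeof buf) would report the address of whichever interface the routing table picks for that peer; the application itself never names an interface.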

The route identifies the local interface, and next hop IP. An interface
is only ever on a single subnet. The ARP broadcast is issued on this
interface and is only on this one subnet. We're not broadcasting across
subnets. Note that the local interface is logical, and a single
Ethernet NIC may have multiple IP addresses and may in fact be on
multiple subnets if using VLAN. 
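The next-hop rule behind this explanation: if the destination is on the outgoing interface's own subnet, the destination itself is resolved with ARP; otherwise the gateway named by the route is resolved, so the broadcast never leaves the local subnet. A hypothetical sketch with host-byte-order IPv4 addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one local interface (host byte order). */
struct iface_sketch {
    uint32_t addr;        /* interface IP address */
    int      prefix_len;  /* subnet prefix length, 1..32 */
};

/* Same subnet <=> the network parts of both addresses are equal. */
static int on_link(const struct iface_sketch *ifc, uint32_t dst)
{
    assert(ifc->prefix_len >= 1 && ifc->prefix_len <= 32);
    uint32_t mask = 0xffffffffu << (32 - ifc->prefix_len);
    return (ifc->addr & mask) == (dst & mask);
}

/* The address that must be resolved with ARP on this interface: the
 * destination itself if it is on the local subnet, otherwise the gateway. */
static uint32_t arp_target(const struct iface_sketch *ifc, uint32_t dst,
                           uint32_t gateway)
{
    return on_link(ifc, dst) ? dst : gateway;
}
```

For an interface at 192.168.1.5/24 with gateway 192.168.1.1, a connection to 67.65.105.4 results in an ARP for 192.168.1.1, never for 67.65.105.4.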

It is theoretically possible to support all this on an IPoIB based
network. Multiple subnets, multiple routes to remote peers, ICMP
redirect, multiple IP addresses for each physical interface, yada yada
yada. But IMHO, the only way to do this would be to tie directly into
the existing routing,  ARP, ICMP, etc... subsystems in Linux. Otherwise
you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
code. This belief is why I've been a proponent of mapping GIDs to one
and only one IP address and treating it for management purposes as the
equivalent of an IP address. Without this, the whole mechanism for
determining routes, etc.. breaks down. If you treat the GID like a MAC
address -- it breaks, because a MAC address can have multiple IP
addresses -- the observation that led to the conclusion that ATS was
broken in the first place.

I know there is significant resistance to this idea, but I just don't
see how we get this generically resolved without binding the two
addressing schemes more closely. With the current binding, I just don't
think it works.

If I'm off in the weeds, please let me know ... and I'll cease spouting
off.
 


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Michael Krause


At 06:38 AM 9/30/2005, Caitlin Bestler wrote:

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Roland Dreier
 Sent: Thursday, September 29, 2005 6:50 PM
 To: Sean Hefty
 Cc: Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Sean Can you explain how RDMA works in this case? This is simply
 Sean performing IP routing, and not IB routing, correct? Are you
 Sean referring to a protocol running on top of IP or IB directly?
 Sean Is the router establishing a second reliable connection on
 Sean the backend? Does it simply translate headers as packets
 Sean pass through in this case?
 
 I think the usage model is the following: you have some magic 
 device that has an IB port on one side and something else 
 on the other side. Think of something like a gateway that
 talks SDP on the IB side and TCP/IP on the other side.
 
 You configure your IPoIB routing so that this magic device is 
 the next hop for talking to hosts on the IP network on the other side.
 
 Now someone tries to make an SDP connection to an IP address 
 on the other side of the magic device. Routing tables + ARP
 give it the GID of the IB port of this magic device. It 
 connects to the magic device and runs SDP to talk to the magic 
 device, and the magic device magically splices this into a 
 TCP connection to the real destination.
 
 Or the same idea for an NFS/RDMA - NFS/UDP gateway, etc.
 
Those examples are all basically application level gateways.
As such they would have no transport or connection setup
implications. The application level gateway simply offers
a service on network X that it fulfills on network Y. But
as far as network X is concerned the gateway IS the
server.
It must be viewed as such. The cross over point between the two
domains represents independent management domains, trust domains,
reliable delivery domains, etc. 
I do not believe it is possible to construct a transport
layer gateway that bridges RDMA between IB and iWARP while
appearing to be a normal RDMA endpoint on both networks.
Higher level gateways will be possible for many
applications, but I don't see how that relates to
connection establishment. That would require having
an end-to-end reliable connection, complete with flow
control semantics, that bridged the two networks by
some method other than encapsulation or tunneling.
We took steps to ensure that both IB and iWARP could transmit packets in
the main data path very efficiently between the two interconnects but it
was never envisioned that a connection was truly end-to-end transparent
across the gateway component. I think most of the architects would
not support such an effort to define such a beast. There are many
issues in attempting such an offering. Just examine all of the
problems with the existing iSCSI to FC solutions; they ignore a number of
customer issues and hence have been relegated in many customer minds as
TTM, play toys not ready for prime time. This is one of the many
reasons why iSCSI has not taken off as the hype portrayed.
It would be best to define a CM architecture that enabled communication
between like endpoints and avoid the gateway dilemma. Let the gateway
provider work out such issues as there are many requirements already on
each side of these interconnects.
Mike



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Michael Krause


At 06:24 AM 9/30/2005, Yaron Haviv wrote:
 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, September 29, 2005 9:50 PM
 To: Sean Hefty
 Cc: Yaron Haviv; Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 I think the usage model is the following: you have some magic device
 that has an IB port on one side and something else on the other
 side. Think of something like a gateway that talks SDP on the IB side
 and TCP/IP on the other side.
 
Also applicable to two IB ports, e.g. forwarding SDP traffic from one IB
partition to SDP on another partition (may even be the same port with
two P_Keys), and doing some load-balancing or traffic management in
between, overall there are many use cases for that. 
While I can envision how an endpoint could communicate with another in
separate partitions, doing so really violates the spirit of the
partitioning where endpoints must be in the same partition in order to
see one another and communicate. Attempting to create an
intermediary who has insights into both and then somehow is able to
communicate how to find one another using some proprietary (can't be
through standards that I can think of) method, seems like way too much
complexity to be worth it.
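For reference, the partition rule in question: a 16-bit P_Key carries a 15-bit partition number plus a membership bit (full or limited), and two ports can communicate only if their partition numbers match and at least one of them is a full member. A minimal sketch of that check as understood from the InfiniBand Architecture specification; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u   /* bit 15: 1 = full, 0 = limited member */
#define PKEY_BASE_MASK   0x7fffu   /* bits 0-14: partition number */

/* Can two ports holding P_Keys a and b communicate? They must be in the
 * same partition, and two limited members may not talk to each other. */
static bool pkeys_match(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return false;
    return ((a | b) & PKEY_FULL_MEMBER) != 0;
}
```

A port configured with two P_Keys is a member of two partitions, which is what an intermediary bridging them would need.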
Mike


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Sean Hefty
It would be best to define a CM architecture that enabled communication
between like endpoints and avoid the gateway dilemma. Let the gateway
provider work out such issues as there are many requirements already
on each side of these interconnects.


I've given this some more thought since the original postings and agree with
you.  It doesn't seem right to me to have the CM establish a connection to
something that is not the specified destination, under the assumption that
whatever is being connected to is a gateway.  I think it would be better for the
application to determine that the actual destination is on a different subnet,
locate the gateway, and issue a connection request to the gateway.

- Sean



RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Yaron Haviv
 
 From: Michael Krause [mailto:[EMAIL PROTECTED]
 Sent: Friday, October 07, 2005 12:29 PM
 To: Yaron Haviv
 Cc: Openib
 Subject: RE: [openib-general] [RFC] IB address translation using ARP
 
 At 06:24 AM 9/30/2005, Yaron Haviv wrote:
 
  -Original Message-
  From: Roland Dreier [ mailto:[EMAIL PROTECTED]
  Sent: Thursday, September 29, 2005 9:50 PM
  To: Sean Hefty
  Cc: Yaron Haviv; Openib
  Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
  I think the usage model is the following: you have some magic device
  that has an IB port on one side and something else on the other
  side.  Think of something like a gateway that talks SDP on the IB side
  and TCP/IP on the other side.
 
 
 Also applicable to two IB ports, e.g. forwarding SDP traffic from one IB
 partition to SDP on another partition (may even be the same port with
 two P_Keys), and doing some load-balancing or traffic management in
 between, overall there are many use cases for that.
 
 While I can envision how an endpoint could communicate with another in
 separate partitions, doing so really violates the spirit of the
 partitioning where endpoints must be in the same partition in order to see
 one another and communicate.  

Mike, 
This is exactly the same case as two IPoIB interfaces over the same port, with two
partitions configured and IP routing between them, or a layer 7 proxy that
connects two network segments.
I don’t see anything wrong with such a model.

 Attempting to create an intermediary who has
 insights into both and then somehow is able to communicate how to find one
 another using some proprietary (can't be through standards that I can
 think of) method, seems like way too much complexity to be worth it.
 

Assuming the ULPs on both sides are standard, how the proxy is built and how
it functions is application-dependent, just as people build XML proxies that
don’t need to obey any standard besides being transparent to both sides.
OpenIB should not block the ability to provide gateway/proxy functionality, or
to route traffic beyond a single IP addressing hop.
This is just matching IB to capabilities already available in iWarp.

Yaron


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Yaron Haviv
 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:openib-general-
 [EMAIL PROTECTED] On Behalf Of Sean Hefty
 Sent: Friday, October 07, 2005 12:40 PM
 To: 'Michael Krause'; Caitlin Bestler
 Cc: Openib
 Subject: RE: [openib-general] [RFC] IB address translation using ARP
 
 It would be best to define a CM architecture that enabled communication
 between like endpoints and avoid the gateway dilemma. Let the gateway
 provider work out such issues as there are many requirements already
 on each side of these interconnects.
 
 
 I've given this some more thought since the original postings and agree
 with you.  It doesn't seem right to me to have the CM establish a
 connection to something that is not the specified destination, under the
 assumption that whatever is being connected to is a gateway.  I think it
 would be better for the application to determine that the actual
 destination is on a different subnet, locate the gateway, and issue a
 connection request to the gateway.
 
 - Sean
 

Sean, I believe this is exactly how it has been proposed.
The gateway is the endpoint in IB, and the IB CM request is made against
the gateway. The gateway may decide to create its own connection on the
other side based on IB headers, private data, or even application data
(depending on the type of gateway). This just requires that traffic
targeted to a certain IP range/subnet/non-local destination ends up at the
gateway without the need to specify each address individually
(just like it's done in IP).
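As an illustration of the routing idea above (not code from this thread), a destination IP can be mapped to a gateway with a longest-prefix-match lookup, so a whole range such as 192.168.0.0/16 goes to one gateway without listing addresses individually. The `struct route` table, `route_lookup()`, and the `IP4()` helper are hypothetical names, and the linear scan is only for clarity; real routing tables use more efficient structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad parts. */
#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical route entry: destination prefix, prefix length, and the
 * gateway that handles it (0 means the prefix is on-link). */
struct route {
    uint32_t prefix;
    uint8_t  len;      /* 0..32 */
    uint32_t gateway;
};

static uint32_t prefix_mask(uint8_t len)
{
    /* A 32-bit shift by 32 is undefined behaviour, so treat /0 specially. */
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Return the most specific entry covering dst, or NULL if none does. */
const struct route *route_lookup(const struct route *tbl, size_t n,
                                 uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].len);

        if ((dst & m) == (tbl[i].prefix & m) &&
            (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    return best;
}
```

For example, with a /24 on-link entry, a 192.168.0.0/16 entry pointing at gateway 10.0.0.254, and a default route, `route_lookup()` returns the /16 entry for 192.168.5.7 and the default entry for 8.8.8.8.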

Yaron



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Sean Hefty

Yaron Haviv wrote:

Sean, I believe this is exactly how it has been proposed
The gateway is the endpoint in IB, and the IB CM request is done against
the gateway, the gateway may decide to create its own connection on the


Yes - I agree with that.  I'm referring to the RDMA connection manager, versus 
the IB connection manager.



targeted to a certain IP range/subnet/non-local will end up in the
gateway without the need to specify address by address individually
(just like it's done in IP)


IP is connectionless, so I'm not sure how to relate from IP to the RDMA CM. 
With TCP, the connection is to the actual endpoint, not the IP router.  This 
seems more similar to an application requesting a connection to a proxy server.


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Hal Rosenstock
On Fri, 2005-10-07 at 16:10, Sean Hefty wrote:
 Yaron Haviv wrote:
  Sean, I believe this is exactly how it has been proposed
  The gateway is the endpoint in IB, and the IB CM request is done against
  the gateway, the gateway may decide to create its own connection on the
 
 Yes - I agree with that.  I'm referring to the RDMA connection manager,
 versus the IB connection manager.
 
  targeted to a certain IP range/subnet/non-local will end up in the
  gateway without the need to specify address by address individually
  (just like it's done in IP)
 
 IP is connectionless, so I'm not sure how to relate from IP to the RDMA CM. 

IP is connectionless but has been implemented on top of connection-oriented
link layers, which may gateway to other connection-oriented link layers or
to connectionless link layers. I think it is analogous to that.

-- Hal

 With TCP, the connection is to the actual endpoint, not the IP router.  This
 seems more similar to an application requesting a connection to a proxy server.
 
 - Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Sean Hefty

Hal Rosenstock wrote:
IP is connectionless, so I'm not sure how to relate from IP to the RDMA CM. 



IP is connectionless but has been implemented on top of connection
oriented link layers which may gateway to other connection oriented link
layers or non connection oriented link layers. I think it is analogous
to that.


I didn't think that IP was even being run in this case.  Aren't we talking about 
an application level gateway?  If the RDMA CM ran a protocol that ensured that 
data sent from the source reached the actual destination, then this would make 
more sense to me.  But the protocol is coming from the client.


I just don't think that the RDMA CM should connect to a gateway under the 
assumption that a client is running a protocol that operates this way.  If the 
source and destination were both running iWarp, then wouldn't a connection be 
established to the actual destination, and not a gateway?


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Hal Rosenstock
On Fri, 2005-10-07 at 17:02, Sean Hefty wrote:
 Hal Rosenstock wrote:
 IP is connectionless, so I'm not sure how to relate from IP to the RDMA CM. 
  
  
  IP is connectionless but has been implemented on top of connection
  oriented link layers which may gateway to other connection oriented link
  layers or non connection oriented link layers. I think it is analogous
  to that.
 
 I didn't think that IP was even being run in this case.  Aren't we talking
 about an application level gateway?

Yes.

 If the RDMA CM ran a protocol that ensured that data sent from the source
 reached the actual destination, then this would make more sense to me.  But
 the protocol is coming from the client.

Wouldn't the gateway/host reject or drop the connection if it couldn't
do what was required ?
 
 I just don't think that the RDMA CM should connect to a gateway under the
 assumption that a client is running a protocol that operates this way.  If the
 source and destination were both running iWarp, then wouldn't a connection be
 established to the actual destination, and not a gateway?

Would it shortcut the connection across IP subnets or go through a
gateway in that case ?

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Sean Hefty

Hal Rosenstock wrote:

If the RDMA CM ran a protocol that ensured that data sent from the source
reached the actual destination, then this would make more sense to me.  But
the protocol is coming from the client.


Wouldn't the gateway/host reject or drop the connection if it couldn't do
what was required ?


I would assume so, and maybe that's sufficient.  The one problem that I see if
this feature weren't in the RDMA CM is that clients may need to be
transport-aware (assuming that an iWarp connection would go directly to the
destination).


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Hal Rosenstock
On Fri, 2005-10-07 at 17:30, Sean Hefty wrote:
 Hal Rosenstock wrote:
  If the RDMA CM ran a protocol that ensured that data sent from the source
  reached the actual destination, then this would make more sense to me.  But
  the protocol is coming from the client.
  
  Wouldn't the gateway/host reject or drop the connection if it couldn't do
  what was required ?
 
 I would assume so, and maybe that's sufficient.  The one problem that I see if
 this feature weren't in the RDMA CM is that clients may need to be transport
 aware.  (Assuming that an iWarp connection would go directly to the destination.)

Would an iWARP connection jump across IP subnets? It would need to
determine that it could do this (à la NHRP with ATM). Also, could there
be other RDMA networks between them (like IB)?

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Sean Hefty

Hal Rosenstock wrote:

Would an iWARP connection jump across IP subnets ? It would need to
determine that it could do this (ala NHRP with ATM). Also, could there
be other RDMA networks between them (like IB) ?


If iWarp is on top of TCP, I don't think that it would care about IP subnets.

- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Hal Rosenstock
On Fri, 2005-10-07 at 19:57, Sean Hefty wrote:
 Hal Rosenstock wrote:
  Would an iWARP connection jump across IP subnets ? It would need to
  determine that it could do this (ala NHRP with ATM). Also, could there
  be other RDMA networks between them (like IB) ?
 
 If iWarp is on top of TCP, I don't think that it would care about IP subnets.

I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ? 
Doesn't a routing decision still need to be made at the IP layer ?
Doesn't the IP next hop need to be determined (e.g. gateway when the
destination is off the local IP subnet) ? Is there something that
precludes iWARP from working across IP subnets ?

-- Hal





RE: [openib-general] [RFC] IB address translation using ARP

2005-09-30 Thread Yaron Haviv
 -----Original Message-----
 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 29, 2005 9:50 PM
 To: Sean Hefty
 Cc: Yaron Haviv; Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 I think the usage model is the following: you have some magic device
 that has an IB port on one side and something else on the other
 side.  Think of something like a gateway that talks SDP on the IB side
 and TCP/IP on the other side.
 

Also applicable to two IB ports, e.g. forwarding SDP traffic from one IB
partition to SDP on another partition (may even be the same port with
two P_Keys), and doing some load-balancing or traffic management in
between, overall there are many use cases for that. 

Yaron


RE: [openib-general] [RFC] IB address translation using ARP

2005-09-30 Thread Caitlin Bestler
 

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
 Sent: Thursday, September 29, 2005 6:50 PM
 To: Sean Hefty
 Cc: Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Sean Can you explain how RDMA works in this case?  This is simply
 Sean performing IP routing, and not IB routing, correct?  Are you
 Sean referring to a protocol running on top of IP or IB directly?
 Sean Is the router establishing a second reliable connection on
 Sean the backend?  Does it simply translate headers as packets
 Sean pass through in this case?
 
 I think the usage model is the following: you have some magic 
 device that has an IB port on one side and something else 
 on the other side.  Think of something like a gateway that 
 talks SDP on the IB side and TCP/IP on the other side.
 
 You configure your IPoIB routing so that this magic device is 
 the next hop for talking to hosts on the IP network on the other side.
 
 Now someone tries to make an SDP connection to an IP address 
 on the other side of the magic device.  Routing tables + ARP 
 give it the GID of the IB port of this magic device.  It 
 connects to the magic device and runs SDP to talk to the magic
 device, and the magic device magically splices this into a 
 TCP connection to the real destination.
 
 Or the same idea for an NFS/RDMA - NFS/UDP gateway, etc.
 

Those examples are all basically application level gateways.
As such they would have no transport or connection setup
implications. The application level gateway simply offers
a service on network X that it fulfills on network Y. But
as far as network X is concerned the gateway IS the server.

I do not believe it is possible to construct a transport
layer gateway that bridges RDMA between IB and iWARP while
appearing to be a normal RDMA endpoint on both networks.
Higher level gateways will be possible for many
applications, but I don't see how that relates to
connection establishment. That would require having
an end-to-end reliable connection, complete with flow
control semantics, that bridged the two networks by
some method other than encapsulation or tunneling.




Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Hal Rosenstock
On Wed, 2005-09-28 at 21:26, Sean Hefty wrote:
 Here's a first attempt at an API / implementation (that compiles only) for
 an address translation module for IB using ARP.  The code should check the
 ARP cache for information, but is missing the actual ARP processing.

Where would the path record lookup subsequent to the ARP go ? It would
be here as well prior to the connect, right ?

 (We should be able to pull that from ib_at.)

or sdp_link which has the more temporal netdev references currently :-)

 The API is similar to the route
 portion of ib_at, but corrects issues with canceling requests.

What are you referring to here ?

 Only the destination IP address is required for input.
 
 The intent is that the CMA will use this service to locate the
 proper RDMA device GUID

This is the outgoing device, right ?

  and port to use in establishing a connection.
 Hopefully, this makes it clearer how I envision address translation wrt
 the CMA.

When/if there are multiple paths, how is the selection performed ?

Also, on the passive side, would a rdma_resolve_route also be done or
something else or wouldn't just a path lookup suffice here ? If it is
the latter, is that hidden under the rdma_accept or handled otherwise ?

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 09:59, Hal Rosenstock wrote:
 On Wed, 2005-09-28 at 21:26, Sean Hefty wrote:
  Here's a first attempt at an API / implementation (that compiles only) for
  an address translation module for IB using ARP.  The code should check the
  ARP cache for information, but is missing the actual ARP processing.
 
 Where would the path record lookup subsequent to the ARP go ? It would
 be here as well prior to the connect, right ?
 
  (We should be able to pull that from ib_at.)
 
 or sdp_link which has the more temporal netdev references currently :-)
 
  The API is similar to the route
  portion of ib_at, but corrects issues with canceling requests.
 
 What are you referring to here ?
 
  Only the destination IP address is required for input.
  
  The intent is that the CMA will use this service to locate the
  proper RDMA device GUID
 
 This is the outgoing device, right ?
 
   and port to use in establishing a connection.
  Hopefully, this makes it clearer how I envision address translation wrt
  the CMA.
 
 When/if there are multiple paths, how is the selection performed ?
 
 Also, on the passive side, would a rdma_resolve_route also be done or
 something else or wouldn't just a path lookup suffice here ? If it is
 the latter, is that hidden under the rdma_accept or handled otherwise ?

A couple more comments about the emerging implementation for address
translation:

What happens if the destination IP address is a local one ? I think
there is some missing code here.

Also, shouldn't non-subnet-local destination IP addresses be handled?

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Sean Hefty wrote:

struct ib_addr_svc* ib_addr_create_svc(void *context, ib_addr_handler handler);

void ib_addr_destroy_svc(struct ib_addr_svc *svc);


On second thought, I think the need to create/destroy a service can be removed
without changing the functionality.



void ib_addr_cancel(struct ib_addr_svc *svc, struct ib_addr *addr);


If we make cancel a blocking call, I think that we could also ensure that a 
callback will not occur after cancel returns.  Not sure if we want this 
restriction, or that it really helps a ULP that's following a call to resolve 
with a path record query.
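The idea above, that a blocking cancel could guarantee no callback after it returns, can be sketched as follows, with POSIX threads standing in for kernel locking. The `addr_req` type and its functions are hypothetical and are not the proposed `ib_addr` API: completion checks a `canceled` flag under a lock before calling back, and the blocking cancel waits until any callback already in progress has returned.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type, standing in for ib_addr_handler. */
typedef void (*addr_handler)(int status, void *context);

struct addr_req {
    pthread_mutex_t lock;
    pthread_cond_t  idle;        /* broadcast when a running callback returns */
    bool            canceled;    /* set by addr_req_cancel() */
    bool            done;        /* callback already delivered (or started) */
    bool            in_callback; /* handler is executing right now */
    addr_handler    handler;
    void           *context;
};

void addr_req_init(struct addr_req *req, addr_handler handler, void *context)
{
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->idle, NULL);
    req->canceled = false;
    req->done = false;
    req->in_callback = false;
    req->handler = handler;
    req->context = context;
}

/* Called by the resolver when the lookup finishes (e.g. from a worker). */
void addr_req_complete(struct addr_req *req, int status)
{
    pthread_mutex_lock(&req->lock);
    if (req->canceled || req->done) {
        pthread_mutex_unlock(&req->lock);
        return;                          /* canceled or duplicate: no callback */
    }
    req->done = true;
    req->in_callback = true;
    pthread_mutex_unlock(&req->lock);

    req->handler(status, req->context);  /* run without holding the lock */

    pthread_mutex_lock(&req->lock);
    req->in_callback = false;
    pthread_cond_broadcast(&req->idle);
    pthread_mutex_unlock(&req->lock);
}

/* Blocking cancel: after it returns the handler is not running and will
 * never be called.  Must not be called from inside the handler itself. */
void addr_req_cancel(struct addr_req *req)
{
    pthread_mutex_lock(&req->lock);
    req->canceled = true;
    while (req->in_callback)
        pthread_cond_wait(&req->idle, &req->lock);
    pthread_mutex_unlock(&req->lock);
}
```

The restriction Sean mentions shows up in the last comment: a ULP that calls the blocking cancel from inside its own callback would wait on itself forever, so the guarantee costs callers some freedom.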


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 12:05, Sean Hefty wrote:
  What happens if the destination IP address is a local one ? I think
  there is some missing code here.
 
 I think there's code in at.c to handle that case that could be re-used.

Yes. This is the code related to ip_dev_find which has been discussed on
the list.

  Also, shouldn't non subnet local destination IP addresses be handled ?
 
 How does that map to the IB subnet?

or IP subnet in the case of iWARP, right ? It's still an outgoing
interface just more than 1 IP hop away.

   Would it require global routing,

Yes.

  or are non-subnet local addresses a valid configuration on a local IB subnet?

You need to end up ARPing for the next hop router.
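The next-hop rule described here can be sketched as follows: ARP for the destination itself when it is on the local IP subnet, otherwise ARP for the router. This assumes a single interface with one netmask and one default gateway; `struct ipoib_if` and `next_hop()` are hypothetical names, and a local destination address is only reported, not resolved.

```c
#include <stdint.h>

/* Hypothetical view of one IPoIB interface; addresses in host byte order. */
struct ipoib_if {
    uint32_t addr;     /* local IP address */
    uint32_t netmask;  /* e.g. 0xFFFFFF00 for a /24 */
    uint32_t gateway;  /* next hop router on this subnet, 0 if none */
};

enum nh_kind { NH_LOCAL, NH_ONLINK, NH_GATEWAY, NH_UNREACHABLE };

/* Decide which IP address must be ARPed for to reach dst.  *arp_target is
 * set to that address, or to 0 when no ARP is needed or possible. */
enum nh_kind next_hop(const struct ipoib_if *ifp, uint32_t dst,
                      uint32_t *arp_target)
{
    *arp_target = 0;
    if (dst == ifp->addr)
        return NH_LOCAL;                 /* our own address: no ARP */
    if ((dst & ifp->netmask) == (ifp->addr & ifp->netmask)) {
        *arp_target = dst;               /* same IP subnet: ARP the real destination */
        return NH_ONLINK;
    }
    if (ifp->gateway != 0) {
        *arp_target = ifp->gateway;      /* off-subnet: ARP the next hop router */
        return NH_GATEWAY;
    }
    return NH_UNREACHABLE;
}
```

Whenever `NH_GATEWAY` is returned, the GID obtained from ARP belongs to the router rather than the final destination, which is the question the rest of this thread turns on.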

-- Hal




Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 12:40, Sean Hefty wrote:
 Hal Rosenstock wrote:
 How does that map to the IB subnet?
  
  or IP subnet in the case of iWARP, right ? It's still an outgoing
  interface just more than 1 IP hop away.
 
 The intent of the module is only to deal with IB.  Although, it seems generic
 enough that it could return hardware addresses for anything.  I just don't know
 if there's a need for this functionality outside of IB.
 
   Would it require global routing,
  
  Yes.
 
 If it requires global routing of IB, then I think that we should defer it until
 global routing is available.  At least this was my original thinking.

I was referring to IP not IB routing.

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Hal Rosenstock wrote:

Would it require global routing,


Yes.


If it requires global routing of IB, then I think that we should defer it until 
global routing is available.  At least this was my original thinking.



I was referring to IP not IB routing.


If we restrict IB to a single subnet, do we need to worry about IP routing?  My 
assumption was no.  Is this an invalid assumption?


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 12:57, Sean Hefty wrote:
 Hal Rosenstock wrote:
  Would it require global routing,
 
 Yes.
 
 If it requires global routing of IB, then I think that we should defer it until
 global routing is available.  At least this was my original thinking.
  
  
  I was referring to IP not IB routing.
 
 If we restrict IB to a single subnet, do we need to worry about IP routing?  My
 assumption was no.  Is this an invalid assumption?

I think so. There is nothing that precludes having multiple IPoIB
subnets on the same IB subnet.

-- Hal



Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Hal Rosenstock wrote:
If we restrict IB to a single subnet, do we need to worry about IP routing?  My 
assumption was no.  Is this an invalid assumption?


I think so. There is nothing that precludes having multiple IPoIB
subnets on the same IB subnet.


This seems similar to having multiple IP subnets on the same Ethernet subnet.

I'm struggling with understanding how translation can even occur in this case. 
What DGID is used when querying for the path record, and how is it obtained?  If 
this is a valid configuration, then it seems that we're still without a solution.


What does SDP do in this case?

- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Hal Rosenstock wrote:
I'm struggling with understanding how translation can even occur in this case. 
What DGID is used when querying for the path record, and how is it obtained?


Isn't it the DGID of the next hop IP router? (I suppose in the case of
multiple IPoIB subnets on the same IB subnet, it could shortcut somehow,
like NHRP does with ATM relative to CLIP (Classic IP over ATM).)


How is the DGID of the next hop IP router used when connecting?  As an aside, do 
the IPoIB subnets all fall into the same broadcast domain?



What does SDP do in this case?


Same as AT. It does the route lookup, ARPs for the next hop IP router, and
then asks for that router's PathRecord.


I guess I'm confused here.  This gives a path record between the host system and 
the IP router.  How is that used to establish a connection to the actual 
destination?  What values (DLID, DGID, pkey, etc.) go in the CM REQ message, and 
how are those values obtained?


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Yaron Haviv wrote:

4. Send an ARP on the net device to find the destination MAC.

Note the destination IP in the ARP phase is either the REAL destination
IP in the case of a local subnet, or the IP router's address in the case
of a gateway/router.

5. Issue a path record query between the source/dest GIDs (DGID taken from
the IPoIB MAC in the ARP result).
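Step 5 relies on the layout of the IPoIB hardware address, which is 20 bytes: a reserved byte and a 24-bit QP number, followed by the 16-byte port GID (the format later published as RFC 4391). A minimal sketch of pulling the DGID out of the ARP result follows; `struct ib_gid_raw` and the helper names are hypothetical, and the SA path record query itself is not shown.

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20  /* 1 reserved byte + 3-byte QPN + 16-byte GID */

struct ib_gid_raw {
    uint8_t raw[16];
};

/* The QP number occupies bytes 1..3, most significant byte first. */
uint32_t ipoib_hw_qpn(const uint8_t hw[IPOIB_HW_ADDR_LEN])
{
    return ((uint32_t)hw[1] << 16) | ((uint32_t)hw[2] << 8) | hw[3];
}

/* The DGID to use in the path record query is bytes 4..19. */
void ipoib_hw_dgid(const uint8_t hw[IPOIB_HW_ADDR_LEN],
                   struct ib_gid_raw *dgid)
{
    memcpy(dgid->raw, hw + 4, sizeof dgid->raw);
}
```

When the ARP was for a router, this GID is the router's port GID, not that of the final destination.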


In the case of gateway/router, isn't the returned GID for the router?  How is 
this used to establish a connection with the real destination?


- Sean


RE: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Yaron Haviv
 -----Original Message-----
 From: Sean Hefty [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 29, 2005 5:16 PM
 To: Yaron Haviv
 Cc: Hal Rosenstock; Openib
 Subject: Re: [openib-general] [RFC] IB address translation using ARP
 
 Yaron Haviv wrote:
  4. Send an ARP on the net device to find the destination MAC.
 
  Note the destination IP in the ARP phase is either the REAL destination
  IP in the case of a local subnet, or the IP router's address in the case
  of a gateway/router.
 
  5. Issue a path record query between the source/dest GIDs (DGID taken from
  the IPoIB MAC in the ARP result).
 
 In the case of gateway/router, isn't the returned GID for the router?  How
 is this used to establish a connection with the real destination?
 
 - Sean

The RC connection is established with the DGID of the router (it's the
equivalent of a MAC address, and that's OK). The ServiceID + private data in
the case of SDP or iSER (or NFS-R, assuming the IBTA proposal passes)
also contain info on the REAL destination IP that can be used by the
proxy.
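To make the private-data idea concrete, here is a sketch of how a client could pass the real destination to the proxy in the CM REQ private data and how the proxy could read it back. The 8-byte layout and the `proxy_pdata_*` names are hypothetical; this is not the SDP or iSER wire format, each of which defines its own private data. The only hard constraint assumed is that it fits in the REQ private data area (92 bytes in IB).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private-data layout (NOT the SDP or iSER format):
 *   byte 0     version (1)
 *   byte 1     reserved, zero
 *   bytes 2-3  real destination port, big-endian
 *   bytes 4-7  real destination IPv4 address, big-endian   */
#define PROXY_PDATA_VERSION 1
#define PROXY_PDATA_LEN     8

/* Client side: fill buf; returns bytes written, or 0 if buf is too small. */
size_t proxy_pdata_pack(uint8_t *buf, size_t cap,
                        uint32_t dst_ip, uint16_t dst_port)
{
    if (cap < PROXY_PDATA_LEN)
        return 0;
    buf[0] = PROXY_PDATA_VERSION;
    buf[1] = 0;
    buf[2] = (uint8_t)(dst_port >> 8);
    buf[3] = (uint8_t)(dst_port & 0xFF);
    buf[4] = (uint8_t)(dst_ip >> 24);
    buf[5] = (uint8_t)(dst_ip >> 16);
    buf[6] = (uint8_t)(dst_ip >> 8);
    buf[7] = (uint8_t)(dst_ip & 0xFF);
    return PROXY_PDATA_LEN;
}

/* Proxy side: parse the REQ private data; returns 0 on success, -1 if the
 * data is short or of an unknown version (the proxy should then reject). */
int proxy_pdata_unpack(const uint8_t *buf, size_t len,
                       uint32_t *dst_ip, uint16_t *dst_port)
{
    if (len < PROXY_PDATA_LEN || buf[0] != PROXY_PDATA_VERSION)
        return -1;
    *dst_port = (uint16_t)((buf[2] << 8) | buf[3]);
    *dst_ip = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
              ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    return 0;
}
```

A version byte is included so that a proxy can reject a REQ it does not understand, matching Hal's point that the gateway would reject or drop a connection it cannot serve.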

By the way, there is a section on that in the IETF iSER draft about
iSER-to-iSCSI routing, but it's a general solution, just as applicable
to someone doing HTTP proxy to SDP, NFS/TCP to NFS/RDMA, or SDP to SDP,
etc.






Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty

Yaron Haviv wrote:

The RC connection is established with the DGID of the router (it's the
equivalent of a MAC address and it's OK), the ServiceID + private data in
the case of SDP or iSER (or NFS-R assuming the IBTA proposal will pass)
also contains info on the REAL destination IP that can be used by the
proxy.


I think I'm missing some fairly important concepts here.

Can you explain how RDMA works in this case?  This is simply performing IP 
routing, and not IB routing, correct?  Are you referring to a protocol running 
on top of IP or IB directly?  Is the router establishing a second reliable 
connection on the backend?  Does it simply translate headers as packets pass 
through in this case?


My focus so far has been trying to connect directly over IB, but using IP
addresses.


- Sean


Re: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Roland Dreier
Sean Can you explain how RDMA works in this case?  This is simply
Sean performing IP routing, and not IB routing, correct?  Are you
Sean referring to a protocol running on top of IP or IB directly?
Sean Is the router establishing a second reliable connection on
Sean the backend?  Does it simply translate headers as packets
Sean pass through in this case?

I think the usage model is the following: you have some magic device
that has an IB port on one side and something else on the other
side.  Think of something like a gateway that talks SDP on the IB side
and TCP/IP on the other side.

You configure your IPoIB routing so that this magic device is the next
hop for talking to hosts on the IP network on the other side.

Now someone tries to make an SDP connection to an IP address on the
other side of the magic device.  Routing tables + ARP give it the GID
of the IB port of this magic device.  It connects to the magic device
and runs SDP to talk to the magic device, and the magic device
magically splices this into a TCP connection to the real destination.

Or the same idea for an NFS/RDMA - NFS/UDP gateway, etc.

 - R.


RE: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Sean Hefty
I think the usage model is the following: you have some magic device
that has an IB port on one side and something else on the other
side.  Think of something like a gateway that talks SDP on the IB side
and TCP/IP on the other side.

You configure your IPoIB routing so that this magic device is the next
hop for talking to hosts on the IP network on the other side.

Now someone tries to make an SDP connection to an IP address on the
other side of the magic device.  Routing tables + ARP give it the GID
of the IB port of this magic device.  It connects to the magic device
and runs SDP to talk to the magic device, and the magic device
magically splices this into a TCP connection to the real destination.

Thanks for the clarification.

- Sean
