from:"Yaron Haviv"

Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)

2006-10-03 Thread Yaron Haviv

> -Original Message-
> From: rick [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 03, 2006 4:54 PM
> To: Michael Krause
> Cc: Fabian Tillier; Yaron Haviv; Roland Dreier (rdreier); Kuchimanchi,
> Ramachandra; openib-General
> Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm
> Virtual Ethernet I/O controller (VEx)
> 
> For what it's worth: As a customer who is using the SS stack - we were
> more than pleased that we could achieve IPoIB (and RDS) failover without
> using the bonding driver. I believe this is a direct result of the Virtual
> NIC approach SS is using.

Rick, if such functionality (w/o the bonding driver) is needed,
it can also be implemented in IPoIB (we had it in our old stack).
It has no direct relation to the Virtual NIC.

It may even be preferable to have it in IPoIB rather than in a proprietary
gateway driver, so that IB nodes in the same fabric can use that
functionality as well.

The only point I'm making is that anyone can add an overlay driver for
his proprietary HW as he likes and put it in the OFED distribution, but if
this is becoming an internal part of the OpenFabrics kernel code, then:
1. Let's look at how we solve the problems from a more general perspective
2. Let's not duplicate code where we can avoid it
3. Let's make sure it's documented and reviewed (code- and
architecture-wise)

We have kept those standards for all other solutions; I think it's just
as fair to demand them in this case as well.

Yaron 

> 
> Michael Krause wrote:
> 
> >Silverstorm is executing a usage model that the IBTA used to develop the IB
> >protocols.   What is the problem with that?  If it works and integrates
> >into the stack, then this seems like an appropriate bit of functionality to
> >support.   The fact that one can use a standard ULP to communicate to a TCA
> >as an alternative which is supported by the existing stack is a customer
> >product decision at the end of the day.   If Silverstorm or any IHV can
> >show value and that it works in the stack, then it seems appropriate to
> >support.  Isn't that a fundamental principle of being an open source effort?
> >
> >
> >Mike
> >
> >
> >At 12:31 PM 10/3/2006, Fabian Tillier wrote:
> >
> >
> >>Hi Yaron,
> >>
> >>On 10/3/06, Yaron Haviv <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >>>I'm trying to figure out why this protocol makes sense
> >>>As far as I understand, IPoIB can provide a Virtual NIC functionality
> >>>just as well (maybe even better), with two restrictions:
> >>>1. Lack of support for Jumbo Frames
> >>>2. Doesn't support protocols other than IP (e.g. IPX, ..)
> >>>
> >>>
> >>Whether to use a router or virtual NIC approach for connectivity to
> >>Ethernet subnets is a design decision.  We could argue until we are
> >>blue in the face about which architecture is "better", but that's
> >>really not relevant.
> >>
> >>
> >>
> >>>I believe we should first see if such a driver is needed and if IPoIB
> >>>UD/RC cannot be leveraged for that, maybe the Ethernet emulation can
> >>>just be an extension to IPoIB RC, hitting 3 birds in one stone (same
> >>>infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all
> >>>nodes not just Gateways)
> >>>
> >>>
> >>You're joking right?  Are you really arguing that SilverStorm should
> >>not develop a driver to support its existing devices?  This really
> >>isn't complicated:
> >>
> >>1). SilverStorm has a virtual NIC hardware device.
> >>2). SilverStorm is committed to support OpenFabrics.
> >>
> >>The above two statements lead to the following conclusion: SilverStorm
> >>needs a driver for its devices that works with the OpenFabrics stack.
> >>This is totally orthogonal to and independent of working on IPoIB RC
> >>or any IETF efforts to define something new.
> >>
> >>- Fab
> >>
> >>___
> >>openib-general mailing list
> >>openib-general@openib.org
> >>http://openib.org/mailman/listinfo/openib-general
> >>
> >>To unsubscribe, please visit
> >>http://openib.org/mailman/listinfo/openib-general
> >>
> >>

Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)

2006-10-03 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Rimmer, Todd
> Sent: Monday, October 02, 2006 5:46 PM
> To: Scott Weitzenkamp (sweitzen); Kuchimanchi, Ramachandra; Roland Dreier
> (rdreier)
> Cc: openib-General
> Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm
> Virtual Ethernet I/O controller (VEx)
> 
> > From: Scott Weitzenkamp (sweitzen)
> > Sent: Monday, October 02, 2006 4:22 PM
> > To: Kuchimanchi, Ramachandra; Roland Dreier (rdreier)
> > Cc: openib-General
> > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm
> > Virtual Ethernet I/O controller (VEx)
> >
> > Is this communication protocol documented anywhere?  How does this
> > feature compare to IPoIB and SDP?
> >
> This protocol is distinct from IPoIB and SDP.
> 
> In brief:
> 
> IPoIB treats an IB fabric as a LAN.  As such it has UD semantics.
> 
> SDP essentially treats the HCA as a TOE and leverages IB's RC semantics
> to emulate TCP/IP SOCK_STREAM sockets.
> 
> This protocol implements the interface to communicate to the SilverStorm
> VEx Ethernet Virtual IO Controllers.  The VEx card presents a true
> Ethernet NIC to the host and essentially treats IB as an IO bus to allow
> a host CPU to use the VEx card as its NIC.
> 
> Todd Rimmer
> 

Todd,

I'm trying to figure out why this protocol makes sense.
As far as I understand, IPoIB can provide Virtual NIC functionality
just as well (maybe even better), with two restrictions:
1. Lack of support for jumbo frames
2. No support for protocols other than IP (e.g. IPX)

Restriction 1 can easily be addressed using IPoIB RC, and the question is
whether 2 is really a problem (how many people use IPX or AppleTalk these
days?). And if 2 is a problem, why isn't it addressed in the greater scope
of supporting Ethernet emulation between any IB nodes, and not just from a
host to a gateway device?

If this is a real requirement, why hasn't SilverStorm worked with the
industry and standardization bodies such as IBTA or IETF to come up with a
standard and interoperable way to address it, rather than just trying to
push a proprietary driver and a point solution into the kernel?

I believe we should first see whether such a driver is needed and whether
IPoIB UD/RC can be leveraged for that. Maybe the Ethernet emulation can
just be an extension to IPoIB RC, hitting three birds with one stone (same
infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all
nodes, not just gateways).

Yaron


RE: [openib-general] iSER & FC-SAN performance

2006-03-12 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Mohit Katiyar, Noida
> Sent: Friday, March 10, 2006 11:36 AM
> To: openib-general@openib.org
> Subject: [openib-general] iSER & FC-SAN performance
> 
> Hi All,
> Is any performance data for iSER available, or any results from tests
> that were performed on iSER?
> Are there any operational issues with iSER performance in a general
> FC-SAN environment? I am also looking for reference material on
> operating iSER in an FC gateway environment.
> Thanks in advance
> 
> Mohit Katiyar

Mohit,

Your question should be broken into two: one is iSER performance and
the other is IB-to-FC gateway performance, and both are very
implementation dependent.

The iSER initiator uses zero copy and can map a large SCSI command to a
single send+RDMA transaction. This allows very high bandwidth and low CPU
utilization as the message size grows. The performance we saw is >900MB/s
per initiator, tested with a 4X SDR link (1000 MB/s capable); with DDR it
may achieve more.
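
For reference, the "1000 MB/s capable" figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the usual 2.5 Gb/s per-lane SDR signalling rate and 8b/10b line coding, ignores packet header overhead, and the 900 MB/s value is just the lower bound quoted above.

```python
# Rough payload-rate estimate for a 4X InfiniBand link (sketch, not a benchmark).
LANES = 4                      # "4X" link width
SDR_SIGNAL_GBPS = 2.5          # per-lane SDR signalling rate (assumed)
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line coding (assumed)

def data_rate_mbytes_per_s(lanes, signal_gbps, efficiency=ENCODING_EFFICIENCY):
    """Convert a raw signalling rate into usable MB/s (decimal units)."""
    data_gbps = lanes * signal_gbps * efficiency
    return data_gbps * 1000 / 8   # Gb/s -> MB/s

sdr = data_rate_mbytes_per_s(LANES, SDR_SIGNAL_GBPS)
ddr = data_rate_mbytes_per_s(LANES, 2 * SDR_SIGNAL_GBPS)  # DDR doubles the lane rate
measured = 900                 # lower bound of the figure reported above

print(f"4X SDR: {sdr:.0f} MB/s, 4X DDR: {ddr:.0f} MB/s")
print(f"measured / SDR link capacity: {measured / sdr:.0%}")
```

So the reported number is already close to what the SDR link itself can carry, which is why DDR is where any further gain would come from.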

As for IB-FC gateway performance, it depends on the HW architecture of
the gateway rather than on whether it's iSER or SRP: e.g. the memory BW
capacity of the gateway, PCI-X vs. PCI Express, CPU capacity, and the FC
ports.
When targeting larger messages, the memory/bus bandwidth becomes much more
critical than the CPU capacity.

As an example, an iSER-FCP gateway would typically implement a store &
forward design (other designs are possible as well): the SCSI command
is intercepted, the data is fetched to memory (using RDMA), and a SCSI
transaction is performed on the FC side (where the FC adapter fetches
the data using DMA). In a good gateway implementation multiple I/Os can
flow in parallel (asynchronously), which can sustain the 900MB/s in some
architectures. In addition, multiple gateways can be aggregated to scale
bandwidth to many GB/s of data, while still enjoying a single name space
and even emulating a single session by leveraging iSCSI mechanisms.
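
The store & forward flow with several I/Os in flight could be sketched roughly as follows. This is a toy model only: rdma_read and fc_submit are hypothetical stand-ins (no real RDMA or FC API is used), and the command layout is made up for illustration.

```python
import asyncio

async def rdma_read(cmd):
    """Stand-in for fetching the write payload into gateway memory via RDMA."""
    await asyncio.sleep(0)              # yield, as an async completion would
    return bytes(cmd["length"])         # zero-filled buffer of the right size

async def fc_submit(lun, buf, fc_log):
    """Stand-in for issuing the SCSI transaction on the FC side."""
    await asyncio.sleep(0)
    fc_log.append((lun, len(buf)))

async def handle_write(cmd, fc_log):
    buf = await rdma_read(cmd)                   # store: data lands in gateway memory
    await fc_submit(cmd["lun"], buf, fc_log)     # forward: FC adapter DMAs it out
    return len(buf)

async def gateway(cmds):
    fc_log = []
    # All intercepted commands are processed concurrently, not one at a time.
    sizes = await asyncio.gather(*(handle_write(c, fc_log) for c in cmds))
    return sum(sizes), fc_log

cmds = [{"lun": i % 2, "length": 64 * 1024} for i in range(8)]
total, log = asyncio.run(gateway(cmds))
print(total, len(log))
```

Every byte crosses gateway memory twice in this design (once in, once out), which is why memory/bus bandwidth rather than CPU tends to be the limit for large messages.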

The different options for iSER to FC gateways would include:
1. Voltaire FCR product
2. FalconStore NSS product on a PC platform (with IB & FC adapters) 

You can approach the different vendors to get more details on their
products & performance; I can point you to the right contacts offline.

Yaron



RE: [openib-general] Suggested components to support in 1.0

2006-02-24 Thread Yaron Haviv



> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Bob Woodruff
> Sent: Friday, February 24, 2006 12:23 PM
> To: 'Bryan O'Sullivan'; openib-general
> Subject: RE: [openib-general] Suggested components to support in 1.0
> 
> Bryan wrote,
> >Components that I don't know what to do about, and will likely want to
> >drop unless someone can vouch for them:
> 
>  > * iSER
>  > * SRP
>  > * uDAPL
> 
> 
> We need uDAPL and I am sure people want SRP and
> I think both are in good shape.
> I am not sure that iSer is quite ready, but will let Voltaire make that
> call.
> 
> woody

Woody, I believe that OpenIB iSER is quickly getting there, given the
amount of dedicated work Or, Dan, and others have put into it.

We would definitely vote for it.

Yaron
 

RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB

2005-12-02 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Richard Frank
> Sent: Thursday, December 01, 2005 1:03 PM
> To: Grant Grundler
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable
> DatagramSockets) to OpenIB
> 
> We do not see any deficiencies - the RDS specification and current
> implementation so far meet our requirements and is working very well.
> 
> There is more we will want to do further down the road - such as access
> the RDS sockets via AIO so we can add zero copy support.
> 

Richard,

In the document you published a few weeks ago you listed latency and CPU%
as key goals.

I assume that to really get the latency down you need a user space
implementation that can leverage polling. Any plans to work in user
space?

Several other comments/suggestions, if I may (you may have already taken
them into account):

As a UDP consumer, isn't there a need to support multicast as well, and
potentially leverage IB multicast for scalability?

I feel that there is not much benefit in eliminating the reliability
checks in the upper (UDP) consumer, since their CPU and latency overhead
is negligible. You may even just go with a UC implementation. Also,
UDP consumers may want to use RDS without modifying the application, or
may accept dropped packets or oversubscription (since they are
interested in the most recent data).

And it is very important to tie the RDS implementation to the IP stack
for routing information/resolution, ARP, etc.
That way it would be transparent from the management/configuration side
as well, not requiring separate configuration files, and would deal
better with dynamic environments and failures, like a real UDP would.

Yaron

> 
> On Thu, 2005-12-01 at 08:16 -0800, Grant Grundler wrote:
> > On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote:
> > > Any progress to report on the port of RDS from the SilverStorm
> > > proprietary stack to the standard Linux stack?  I think it would
> > > really move the discussion forward if there were some code that people
> > > could build and use.
> >
> > As primary consumer of RDS, I think Oracle first needs to decide if
> > the deficiencies that Mike Krause pointed out are acceptable or not.
> >
> > grant

RE: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4

2005-11-29 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Tuesday, November 29, 2005 6:30 PM
> To: Kanevsky, Arkady
> Cc: Ted H. Kim; [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [swg] RE: [openib-general] socket based connectionmodel for
> IBproposal -round 4
> 
> Kanevsky, Arkady wrote:
> > Sean,
> > SWG discussed today the proposal to extend the private data format to
> > SIDR_REQ.
> > The group does not see the need for it since the ULP is not RDMA aware.
> > That is, the ULP does not use RDMA operations.
> > Do you have some specific ULP in mind for this functionality?
> > For UDP a different IP address can be used for each message. There is no
> > persistent connection.
> 
> I didn't have any particular ULP in mind.  I was thinking more of a generic
> application that wanted to use UDP style addressing over IB, similar to what's
> being discussed for using TCP style addressing over IB.
> 
> It seems that there needs to be a way to map a given destination address to a
> remote QP/qkey.  Regardless if the IP address is carried in each ULP message, it
> would still need to be in the SIDR REQ in order to locate the correct QP.
> 

Sean,

How about using ARP to get from IP to DGID+Partition,
followed by a SIDR exchange to map DGID+PKey+Service to QKey & QP?

It is the same concept as CMA, which first uses the IP stack (ARP etc.) to
get to the remote end-point (in that case the GID+PKey combination),
followed by SA-PR and CM REQ; we just substitute the CM REQ with a SIDR REQ.
It may not solve all the cases, but probably most of the practical ones.

Anyway, the packets will need to carry some header (since it's not a
connected model), and you can add more information in that header (e.g.
you can use the IPoIB header as is, which already contains the src/dst IP).
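
To make the two-step lookup concrete, here is a minimal sketch. The tables are hypothetical stand-ins for the ARP cache and for the answers a remote SIDR responder would give; the addresses, service ID, QP number and Q_Key are all made up, and no real CM or verbs API is involved.

```python
from typing import NamedTuple

class IbEndpoint(NamedTuple):
    dgid: str   # destination GID learned through ARP on the IPoIB interface
    pkey: int   # partition key of that interface

class UdTarget(NamedTuple):
    qpn: int    # remote QP number returned in the SIDR reply
    qkey: int   # Q_Key to place in outgoing datagrams

# Step 1 table: IP -> (DGID, P_Key), as ARP over IPoIB would provide.
ARP_TABLE = {"192.168.10.7": IbEndpoint("fe80::2:c903:0:1a2b", 0x8001)}

# Step 2 table: (endpoint, ServiceID) -> (QPN, Q_Key), as a SIDR REQ/REP would provide.
SIDR_RESPONDERS = {
    (ARP_TABLE["192.168.10.7"], 0x10000BB8): UdTarget(qpn=0x4A, qkey=0x1EE7),
}

def resolve(ip, service_id):
    endpoint = ARP_TABLE[ip]                          # IP -> DGID + partition
    target = SIDR_RESPONDERS[(endpoint, service_id)]  # DGID + PKey + service -> QP/QKey
    return endpoint, target

endpoint, target = resolve("192.168.10.7", 0x10000BB8)
print(endpoint.dgid, hex(endpoint.pkey), hex(target.qpn), hex(target.qkey))
```

A path record query would still sit between the two steps in practice; it is left out here to keep the mapping itself visible.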

Yaron

RE: [swg] RE: [openib-general] round 2 - proposal for socket based connection model

2005-10-26 Thread Yaron Haviv

> -Original Message-
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 25, 2005 6:39 PM
> To: Tom Tucker; Kanevsky, Arkady
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: [swg] RE: [openib-general] round 2 - proposal for socket based
> connection model
> 
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Tucker
> > Sent: Tuesday, October 25, 2005 2:56 PM
> > To: Kanevsky, Arkady
> > Cc: [EMAIL PROTECTED]; openib-general@openib.org
> > Subject: RE: [openib-general] round 2 - proposal for socket
> > based connection model
> >
> > Arkady:
> >
> > I may actually have a constructive comment about the protocol
> > (private data format). One thing I noticed is that *almost*
> > everything in the private data header is available in the
> > native iWARP protocol header except the ZB and SI bits.  If
> > these bits become part of the canonical private data header,
> > then does that require an iWARP transport to use the header
> > too even though only two bits are useful?
> >
> > Sorry if this is a dumb question,
> >
> 
> I'm not sure I followed why these were needed myself.

I believe ZBTO and Remote Invalidation are mandatory in iWarp, right?

These are two RDMA features that are available in iWarp and are new
to IB (optional in version 1.2).
A ULP that is supposed to run on both may want to know whether the peer
supports them, so it can use the correct verbs.

E.g. if the peer doesn't support remote invalidation, the ULP will need
to use the Send verb and invalidate the FMR locally; if it does support it,
the ULP can use the new "Send with Invalidate" verb, which can improve
performance and security.
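
A small sketch of how a ULP might act on those two bits, following the wording above. The function and enum names are hypothetical, and treating both features as always present on iWarp is the assumption stated in this message, not something checked against the spec here.

```python
from enum import Enum

class Transport(Enum):
    IB = "ib"
    IWARP = "iwarp"

def peer_capabilities(transport, zb_bit=False, si_bit=False):
    """Return (zero_based_offsets, send_with_invalidate) for the peer."""
    if transport is Transport.IWARP:
        # Assumed mandatory on iWarp, so nothing needs to be negotiated.
        return True, True
    # On IB the features are optional; use what the private data header says.
    return zb_bit, si_bit

def completion_plan(send_with_invalidate):
    """List the operations the ULP performs to finish an I/O."""
    if send_with_invalidate:
        return ["send_with_invalidate"]
    return ["send", "invalidate_fmr_locally"]

_, si = peer_capabilities(Transport.IB, zb_bit=True, si_bit=False)
print(completion_plan(si))
_, si = peer_capabilities(Transport.IWARP)
print(completion_plan(si))
```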

I don't see why iWarp needs to negotiate it; CMA can just return true on
both bits when the transport is iWarp.

These are generic parameters that will be needed by more than one ULP
that wants to know which verbs are supported by the generic RDMA
layer; that's why they are in the generic portion of the header.

Yaron




RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model

2005-10-26 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Tuesday, October 25, 2005 1:26 PM
> To: Sean Hefty
> Cc: [EMAIL PROTECTED]; openib-general@openib.org; dat-
> [EMAIL PROTECTED]
> Subject: RE: [openib-general] RE: [dat-discussions] round 2 - proposal
> forsocket based connection model
> 
> Think of a single API that supports iWARP and IB (transport independent
> API).
> To a connection listener it provides the IP 5-tuple + private data.
> For IB it means that CM parses REQ and extracts IP 5-tuple as separate
> fields from private data.
> Listener does not parse the private data encoding of the proposal.
> 
> So the CM needs to know whether it needs to encode the IP 5-tuple on the
> requestor side and whether it needs to parse it on the responder side.
> Arkady
> 

Arkady, I agree with Sean: you can encode the Dest Port in the ServiceID.
And if you really want to verify that it's using that format, you can look
at the upper 48 bits of the ServiceID.
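
As a sketch of what that could look like: a 64-bit ServiceID made of a fixed 48-bit prefix plus the 16-bit destination port. The prefix value below is purely a placeholder for illustration; the real value would have to come from whatever the proposal or IBTA ends up assigning.

```python
# Hypothetical 48-bit prefix marking "ServiceID carries an IP port" (placeholder value).
IP_PORT_PREFIX = 0x0000_0000_0001

def port_to_service_id(port, prefix=IP_PORT_PREFIX):
    """Pack a 16-bit port into the low bits of a 64-bit ServiceID."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (prefix << 16) | port

def service_id_to_port(service_id, prefix=IP_PORT_PREFIX):
    """Return the port if the upper 48 bits match the prefix, else None."""
    if service_id >> 16 != prefix:
        return None
    return service_id & 0xFFFF

sid = port_to_service_id(3260)          # 3260 is the well-known iSCSI port
print(hex(sid), service_id_to_port(sid))
print(service_id_to_port(0xDEAD_0000_0000_0CBC))   # different prefix
```

The same prefix check is also one way to implement option (b) from the list of alternatives in this message: a different prefix per protocol class.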

We may need to distinguish between explicit RDMA protocols (iSER,
NFS-RDMA, RDP, etc.) and implicit RDMA (SDP, where the socket
application doesn't know it is using RDMA). This can be done in 3 ways:
a. a port mapper, b. a different ServiceID prefix, or c. a bit in the CM
REQ header.

Also, I'm not sure why we need the protocol (UDP, TCP, SCTP, ...): since we
emulate RDMA we shouldn't care whether it's TCP or SCTP, and UDP is
unconnected and can't drive RDMA anyway.

Yaron


> 
> Arkady Kanevsky   email: [EMAIL PROTECTED]
> Network Appliance phone: 781-768-5395
> 375 Totten Pond Rd.  Fax: 781-895-1195
> Waltham, MA 02451-2010  central phone: 781-768-5300
> 
> 
> 
> > -Original Message-
> > From: Sean Hefty [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, October 25, 2005 1:08 PM
> > To: Kanevsky, Arkady
> > Cc: Caitlin Bestler; [EMAIL PROTECTED];
> > openib-general@openib.org; [EMAIL PROTECTED]
> > Subject: Re: [openib-general] RE: [dat-discussions] round 2 -
> > proposal for socket based connection model
> >
> >
> > Kanevsky, Arkady wrote:
> > > Correct.
> > > But this does raise the question of how the responder CM knows that it
> > > needs to parse the private data. I suspect this will be done via a new
> > > version of CM. But a usage of some of the CM REQ reserved fields is also
> > > possible. In other words, the current CM version assumes that CM only
> > > supports one version and there is no need to support more than 1
> > > version.
> >
> > The responder knows how to parse the private data based on the service ID
> > that they're listening on.  This is how it's done today, and how it will
> > still need to be done.  What is the motivation to change it?
> >
> > What data is beyond the addressing?  How does the responder know how to
> > interpret that?
> >
> > - Sean
> >

RE: [openib-general] iSER details

2005-10-24 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Monday, October 24, 2005 6:31 AM
> To: Mohit Katiyar, Noida
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] iSER details
> 
> On Mon, 2005-10-24 at 06:09, Mohit Katiyar, Noida wrote:
> > Can anyone tell me where can I find the specifications of iSER
> > protocol on Infiniband. I could not find any document which provides
> > specification specially according to Infiniband, all the doc were on
> > iWarp. If anyone can guide me in this
> 
> There are 2 relevant I-Ds:
> 
> iSCSI Extensions for RDMA Specification
> http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt
> 

As Hal indicates, the iSER-05 IETF draft already incorporates InfiniBand
and has already passed last call.
There aren't many differences between IB and iWarp. IBTA is also working
on the IP address mapping over InfiniBand that will be leveraged by
iSER/IB and NFS/RDMA, and on a few other clarifications/issues.

Note one key difference in the IETF draft: IB negotiates the Login
over the RC connection, whereas in iWarp it's over a TCP connection (which
then transitions to RDMA RC).

Some more detailed material can be found at
http://www.haifa.il.ibm.com/satran/ips/iSER-in-an-IB-network-V9.pdf
It's a little old, but many sections are still relevant.

Yaron


RE: [dat-discussions] RE: [openib-general] Re: [swg] Re: private data...

2005-10-23 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:dat-
> [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Thursday, October 20, 2005 5:07 PM
> To: [EMAIL PROTECTED]; Sean Hefty
> Cc: Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org
> Subject: RE: [dat-discussions] RE: [openib-general] Re: [swg] Re: private
> data...
> 
> 
> Once this is defined ULP can decide on which Service ID(s) to listen.
> Requestor can send conn req to a specific Service ID (IB specific)
> or use higher level abstraction - TCP port.
> CM may be capable to translate TCP port to Service ID based on ULP.
> For example, iSER over IPoIB will be mapped to one Service ID and
> native iSER over IB will be mapped to another. But this is not simple.
> On another hand every intermediate level protocol (SDP, IPoIB) can
> do conversion. But this is also hard and is extension of existing
> protocol.

A small correction: there is no iSER over IPoIB, just iSER over native
RDMA.
There can be an iSCSI/TCP session running over IPoIB, but then it's a
connectionless UD session (without a ServiceID). Also, the iSER spec
defines that iSCSI/iSER takes precedence over iSCSI/TCP.

To add to the ongoing discussion, one of the major benefits of
maintaining the TCP port numbers for RDMA protocols is the ability to
leverage existing naming services and configuration mechanisms.

E.g. NFS uses port mappers; other protocols use DHCP, DNS, SLP, iSNS,
well-defined numbers, or other mechanisms. This way the upper layers
above the transport stay the same and don't care whether it's IB or iWarp,
or even plain TCP.

If we don't preserve a simple/linear port mapping, we probably need to
reinvent name services for RDMA as well.

Yaron

[openib-general] RE: iWARP emulation protocol

2005-10-18 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 18, 2005 2:53 PM
> To: Kanevsky, Arkady
> Cc: Roland Dreier; Yaron Haviv; openib-general@openib.org;
> [EMAIL PROTECTED]
> Subject: Re: iWARP emulation protocol
> 
> 
> The proposal doesn't talk about mapping from TCP port numbers into a
> 16-bit range of IB service IDs.  I think this is necessary.
> 

I agree, that's part of the other proposals.

> Also, putting the destination address in the REP message doesn't make
> sense to me.  The destination IP and port number is something that the
> initiator of the connection is sending to the destination, not the
> other way around.  The passive side of the connection (receiver of the
> REQ) needs the destination IP as part of the REQ so that it can decide
> whether to accept the connection; the active side (sender of the REQ)
> knows who it is trying to talk to, so having the address information
> in the REP is not useful.

Also agree; REP just needs a few fields (ver, capabilities).

Yaron

RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Friday, October 07, 2005 12:40 PM
> To: 'Michael Krause'; Caitlin Bestler
> Cc: Openib
> Subject: RE: [openib-general] [RFC] IB address translation using ARP
> 
> >It would be best to define a CM architecture that enabled communication
> >between like endpoints and avoid the gateway dilemma. Let the gateway
> >provider work out such issues as there are many requirements already
> >on each side of these interconnects.
> 
> 
> I've given this some more thought since the original postings and agree with
> you.  It doesn't seem right to me to have the CM establish a connection to
> something that is not the specified destination, under the assumption that
> whatever is being connected to is a gateway.  I think it would be better for the
> application to determine that the actual destination is on a different subnet,
> locate the gateway, and issue a connection request to the gateway.
> 
> - Sean
> 

Sean, I believe this is exactly how it has been proposed.
The gateway is the endpoint in IB, and the IB CM request is made against
the gateway. The gateway may decide to create its own connection on the
other side based on IB headers, private data, or even application data
(depending on the type of gateway). This just requires that traffic
targeted to a certain IP range/subnet/non-local destination ends up at the
gateway without the need to specify each address individually
(just like it's done in IP).
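
The "just like in IP" part boils down to an ordinary next-hop decision made before ARP. A minimal sketch, with a made-up local subnet and gateway address (in a real system these come from the IP routing table, not from constants):

```python
import ipaddress

# Hypothetical configuration of the IPoIB interface.
LOCAL_SUBNET = ipaddress.ip_network("10.0.0.0/24")
GATEWAY_IP = ipaddress.ip_address("10.0.0.1")

def arp_target(dest):
    """Pick the IP to ARP for: the destination itself if local, else the gateway."""
    dest = ipaddress.ip_address(dest)
    return dest if dest in LOCAL_SUBNET else GATEWAY_IP

print(arp_target("10.0.0.42"))    # local: resolve the real destination
print(arp_target("172.16.5.9"))   # non-local: resolve the gateway instead
```

The GID that ARP returns for the chosen IP is then the one the CM request goes to, so for non-local traffic the IB endpoint is the gateway, with the real destination carried in the private data.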

Yaron


RE: [openib-general] [RFC] IB address translation using ARP

2005-10-07 Thread Yaron Haviv

> 
> From: Michael Krause [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 07, 2005 12:29 PM
> To: Yaron Haviv
> Cc: Openib
> Subject: RE: [openib-general] [RFC] IB address translation using ARP
> 
> At 06:24 AM 9/30/2005, Yaron Haviv wrote:
> 
> > -Original Message-
> > From: Roland Dreier [ mailto:[EMAIL PROTECTED]
> > Sent: Thursday, September 29, 2005 9:50 PM
> > To: Sean Hefty
> > Cc: Yaron Haviv; Openib
> > Subject: Re: [openib-general] [RFC] IB address translation using ARP
> >
> > I think the usage model is the following: you have some magic device
> > that has an IB port on one side and "something else" on the other
> > side.  Think of something like a gateway that talks SDP on the IB side
> > and TCP/IP on the other side.
> >
> 
> >Also applicable to two IB ports, e.g. forwarding SDP traffic from one IB
> >partition to SDP on another partition (may even be the same port with
> >two P_Keys), and doing some load-balancing or traffic management in
> >between, overall there are many use cases for that.
> 
> While I can envision how an endpoint could communicate with another in
> separate partitions, doing so really violates the spirit of the
> partitioning where endpoints must be in the same partition in order to see
> one another and communicate.  

Mike, 
This is exactly the same case as two IPoIB interfaces over the same port with two
partitions configured and IP routing between them, or a layer 7 proxy that
connects two network segments.
I don’t see anything wrong with such a model.

> Attempting to create an intermediary who has
> insights into both and then somehow is able to communicate how to find one
> another using some proprietary (can't be through standards that I can
> think of) method, seems like way too much complexity to be worth it.
> 

Assuming the ULPs on both sides are standard, how the proxy is built and how
it functions is application dependent, just like the proxies people build for
XML, which don’t need to obey any standard besides being transparent to both sides.
OpenIB should not block the ability to provide gateway/proxy functionality, or
to route traffic beyond a single IP addressing hop.
This just matches IB to capabilities already available in iWarp.

Yaron

RE: [openib-general] [RFC] IB address translation using ARP

2005-09-30 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 29, 2005 9:50 PM
> To: Sean Hefty
> Cc: Yaron Haviv; Openib
> Subject: Re: [openib-general] [RFC] IB address translation using ARP
> 
> I think the usage model is the following: you have some magic device
> that has an IB port on one side and "something else" on the other
> side.  Think of something like a gateway that talks SDP on the IB side
> and TCP/IP on the other side.
> 

This is also applicable to two IB ports, e.g. forwarding SDP traffic from one
IB partition to SDP on another partition (may even be the same port with
two P_Keys), and doing some load-balancing or traffic management in
between. Overall there are many use cases for that.

Yaron

RE: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Yaron Haviv

> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 29, 2005 5:16 PM
> To: Yaron Haviv
> Cc: Hal Rosenstock; Openib
> Subject: Re: [openib-general] [RFC] IB address translation using ARP
> 
> Yaron Haviv wrote:
> > 4. send an arp on the net device find destination MAC
> >
> > Note the destination IP in the ARP phase is either the REAL destination
> > IP in case of a local subnet, or the IP router IP address in case of a
> > gateway/router.
> >
> > 5. issue a path record between the source/dest GIDs (DGID taken from ARP
> > Result IPoIB MAC)
> 
> In the case of gateway/router, isn't the returned GID for the router?  How is
> this used to establish a connection with the real destination?
> 
> - Sean

The RC connection is established with the DGID of the router (it's the
equivalent of a MAC address, and that's OK). The ServiceID + private data in
the case of SDP or iSER (or NFS-R, assuming the IBTA proposal passes)
also contain info on the REAL destination IP that can be used by the
proxy.

By the way, there is a section on that in the IETF iSER draft talking
about iSER to iSCSI routing, but it's a general solution just as
applicable to someone doing HTTP proxy to SDP, or NFS/TCP to NFS/RDMA,
or SDP to SDP, etc.



RE: [openib-general] [RFC] IB address translation using ARP

2005-09-29 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Thursday, September 29, 2005 2:58 PM
> To: Hal Rosenstock
> Cc: Openib
> Subject: Re: [openib-general] [RFC] IB address translation using ARP
> 
> Hal Rosenstock wrote:
> >>I'm struggling with understanding how translation can even occur in this case.
> >>What DGID is used when querying for the path record, and how is it obtained?
> >
> > Isn't it the DGID of the next hop IP router ? (I suppose in the case of
> > multiple IPoIB subnets on the same IB subnet, it could shortcut somehow
> > like NHRP does in terms of ATM v. CLIP (Classic IP over ATM).
> 
> How is the DGID of the next hop IP router used when connecting?  As an aside, do
> the IPoIB subnets all fall into the same broadcast domain?
> 
> >>What does SDP do in this case?
> >
> > Same as AT. It does the route lookup and ARPs for and then asks for the
> > PathRecord of the next hop IP router.
> 
> I guess I'm confused here.  This gives a path record between the host system and
> the IP router.  How is that used to establish a connection to the actual
> destination?  What values (DLID, DGID, pkey, etc.) go in the CM REQ message, and
> how are those values obtained?
> 
> - Sean

The idea, as Hal was describing, follows the common IP model:
1. Per destination IP (and TOS in the IP case), find the outgoing route entry.
2. If it's a subnet covered by an adapter (IPoIB in our case, which can have
multiple per port, each with its own P_Key), find the net device to use.
3. If it's not in one of my subnets, then find the IP of the router
covering that destination (e.g. default gateway), and the net
device I need to use (a device/port/partition combination).
4. Send an ARP on the net device to find the destination MAC.

Note the destination IP in the ARP phase is either the REAL destination
IP in case of a local subnet, or the IP router's IP address in case of a
gateway/router.

5. Issue a path record query between the source/dest GIDs (DGID taken from
the IPoIB MAC in the ARP result).

That's how it's done in SDP & ib_at, I believe.

The generalization beyond a local subnet is very important
if we want to address all sorts of applications and configurations,
and it is not related to IB routing.

E.g. a proxy/LB application that sits in between two IP subnets (both
over IB), future mapping from IB to external iWarp subnets, IP routers,
etc.
It also follows the exact flow as in GbE/IP.
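
The five steps above can be sketched roughly as follows. This is only an
illustration of the lookup order, not the SDP or ib_at code: the Route
class, the arp_table dictionary (standing in for an ARP exchange over
IPoIB) and all addresses and GIDs are made-up names for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """Hypothetical route entry: a subnet, the net device, optional gateway."""
    network: ipaddress.IPv4Network
    device: str                 # e.g. an IPoIB interface (port + P_Key)
    gateway: Optional[str]      # None means the subnet is directly attached


def resolve_next_hop(dst_ip, routes):
    """Steps 1-3: longest-prefix route lookup, then pick the IP to ARP for."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [r for r in routes if dst in r.network]
    if not matches:
        raise LookupError("no route to " + dst_ip)
    best = max(matches, key=lambda r: r.network.prefixlen)
    # Local subnet: ARP for the real destination. Otherwise: ARP for the router.
    arp_target = best.gateway if best.gateway else dst_ip
    return best.device, arp_target


def resolve_path(dst_ip, routes, arp_table, src_gid):
    """Steps 4-5: 'ARP' (a dict lookup here) gives the DGID for the path query."""
    device, arp_target = resolve_next_hop(dst_ip, routes)
    dgid = arp_table[(device, arp_target)]   # GID taken from the IPoIB MAC
    return {"device": device, "sgid": src_gid, "dgid": dgid}


routes = [
    Route(ipaddress.ip_network("10.0.0.0/24"), "ib0", None),
    Route(ipaddress.ip_network("0.0.0.0/0"), "ib0", "10.0.0.1"),
]
arp_table = {
    ("ib0", "10.0.0.7"): "fe80::7",   # local peer
    ("ib0", "10.0.0.1"): "fe80::1",   # router
}
local = resolve_path("10.0.0.7", routes, arp_table, "fe80::2")
remote = resolve_path("192.168.5.9", routes, arp_table, "fe80::2")
```

For the remote destination the path record is requested towards the
router's GID, which is the point made above: the DGID plays the role of
the next-hop MAC, not of the final destination.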

Yaron


RE: [openib-general][PATCH][RFC]: CMA IB implementation

2005-09-24 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Thursday, September 22, 2005 12:28 PM
> To: Guy German
> Cc: Openib
> Subject: Re: [openib-general][PATCH][RFC]: CMA IB implementation
> 
> Guy German wrote:
> > I don't think this layer should replace ib_at. If you think there are
> > things to be fixed in the ib_at, I suggest we fix them. I do believe
> > that the original purpose of this generic cm was to serve ulps that
> > don't want to be transport oriented (e.g. iSER).
> 
> Based on discussions from last month, the general agreement was to use CM
> private data in place of ATS.  Once that's done, I don't see a need for ib_at.
> (Also, put simply, I don't believe that ATS can work.)  I think that a
> combination of what Roland, including his original API design, and Yaron
> proposed is the right direction to go.
> 

Sean, my response is somewhat behind.

Anyway, ib_at doesn't depend on or directly connect to ATS.
ATS was just one way to translate IP to GID.

IB_AT provides a way to eventually translate src/dst IP + QoS attributes
to a set of layer 2 attributes and QP parameters in one place for a few
ULPs,
with potential enhancements to implement a central address cache and a
central QoS & partitioning configuration mechanism. Basically it's the
IB equivalent of TCP/IP's IP & Ethernet resolution and routing layers.

Having said that, it doesn't really matter if it's part of the CM or
external, as long as we keep the functionality and implementation.

To address partitioning, IB_AT suggests using the P_Key value derived from
the IPoIB interface, also allowing a consumer/ULP to override those
values with its own. This forms the exact behavior you would expect
from Ethernet or iWarp, mapping the RDMA sessions to the VLAN used by
that interface.

To address QoS, the IB_AT model suggests taking by default the SL value from
the IPoIB interface of that subnet, which took it from the SA MCRecord
(the ULP can override that).
This allows a user to create two subnets over the fabric, each mapped to
a different SL/VL with its BW/priority reservation, and on the ULP side
he just needs to configure the ULP with different BW requirements to work
over a different subnet (which is what people already do today in many cases,
since they use separate fabrics, e.g. one for NFS and one for MPI).

The API was also designed to let users override the default values
derived from IPoIB, so a sophisticated user/ULP can always get the best
granularity.
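
A minimal sketch of the "default from the IPoIB interface, ULP may
override" rule described above. The class and function names are invented
for illustration and are not the ib_at API; the only facts assumed are
that a P_Key is a 16-bit value and an SL is a 4-bit value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IpoibInterface:
    """Hypothetical view of an IPoIB interface and the values it carries."""
    name: str
    pkey: int   # partition key of the interface (16 bits)
    sl: int     # service level the interface got from its multicast group


@dataclass(frozen=True)
class ConnAttrs:
    pkey: int
    sl: int


def connection_attrs(iface: IpoibInterface,
                     pkey_override: Optional[int] = None,
                     sl_override: Optional[int] = None) -> ConnAttrs:
    """Take P_Key and SL from the interface unless the ULP overrides them."""
    pkey = iface.pkey if pkey_override is None else pkey_override
    sl = iface.sl if sl_override is None else sl_override
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    if not 0 <= sl <= 15:
        raise ValueError("SL must fit in 4 bits")
    return ConnAttrs(pkey=pkey, sl=sl)


# Two subnets over one fabric, each mapped to its own partition and SL.
nfs_if = IpoibInterface("ib0.8001", pkey=0x8001, sl=2)
mpi_if = IpoibInterface("ib0.8002", pkey=0x8002, sl=5)

default_attrs = connection_attrs(nfs_if)
tuned_attrs = connection_attrs(mpi_if, sl_override=7)
```

A ULP that is simply pointed at a different subnet picks up that subnet's
P_Key and SL without any extra configuration; only the "sophisticated"
consumer passes overrides.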

Yaron 

> - Sean

RE: [openib-general] Managing SRP devices via iSCSI ?

2005-09-19 Thread Yaron Haviv


>From: [EMAIL PROTECTED]
[mailto:openib-general->[EMAIL PROTECTED] On Behalf Of Rick Frank
>Sent: Monday, September 19, 2005 10:02 PM
>To: openib-general@openib.org
>Subject: [openib-general] Managing SRP devices via iSCSI ?
>
>One key argument I've heard in favor of iSER vs SRP is that iSCSI (top
>level 
>iSER driver) has a very strong management infrastructure - as it is
fairly 
>mature.
>
>However, iSER seems to be just gaining steam in terms of direct
attached 
>storage supporting this protocol .vs. SRP.
>
>Would it not be possible to implement some glue between SRP and iSCSI
to 
>allow for the discovery and management of SRP devices ?

Rick,

The question is why bother with a new approach when iSER is what you
just suggested?

a. After all, iSER transactions are similar to SRP ones (derived from
SRP) with a few enhancements in favor of iSER (SRQ, FMR, MC/S, immediate,
recovery, ...).

b. The iSER header and naming convention are derived from iSCSI, whereas SRP
naming and header structure are different, forcing redundant translation
between the two and ruling out some functionality such
as Portals, MC/S, ACA, etc. It makes more sense to just use the iSCSI base
header format (like iSER does).

c. The iWarp guys who are now joining OpenIB will never use this non-standard,
IB specific SRP/iSCSI hybrid, but rather the real iSER.

d. SRP, which was initially defined in T10, lost all its momentum in T10
(the last SRP meeting was 2 years ago), so I'm not sure how you will
standardize your proposal, whereas iSER is in IETF (an integral part of
iSCSI/IPS) and serves IB & iWarp, guaranteeing its momentum will grow and
it will be enhanced over time.

So I believe overall it's simpler to move SRP implementations to iSER
(some vendors already wisely do that) than to somehow define a non-standard
SRP with iSCSI management. After all, iSER is just what you propose
(improved SRP with iSCSI services), and is already defined (last call in
IETF).

By the way, I wouldn't deduce a whole lot from a few early experiments with
SRP storage in the market about SRP adoption among key storage vendors or
about their future plans.

If you are interested in more details on iSER, let me know.

Yaron

RE: [openib-general] RDMA Generic Connection Management

2005-08-31 Thread Yaron Haviv

> -Original Message-
> From: Talpey, Thomas [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 30, 2005 12:54 PM
> To: Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA Generic Connection Management
> 
> At 10:55 AM 8/30/2005, Yaron Haviv wrote:
> >The iSCSI discovery may return multiple src & dst IP addresses and
the
> >iSCSI multipath implementation will open multiple connections.
> >There are many TCP/IP protocols that do that at the upper layers
(e.g.
> >GridFTP, ..), not sure how NFS does it.
> 
> 
> To answer the question of how NFS "finds out" about multiple
> connections and trunking, the answer is generally that the mount
> command tells it. Mount can get this information from the command
> line, or DNS. I believe Solaris uses the command line approach. There
> may be a way to use the RPC portmapper for it, but the portmapper
> isn't used by NFSv4.
> 
> Bottom line? NFS would love to have a way to learn multipathing
> topology. But it needs to follow existing practice, such as having
> an IP address / DNS expression. If the only way to find it is to query
> fabric services, that's not very compelling.
> 
> Tom.

Tom, from your description it looks like the multipathing is done based
on IP addressing (like iSCSI/iSER, GridFTP, ...) and resolved by the ULP
or its name service. In that case the ULP probably opens a few connections
from one or more IPs to one or more other IPs.

This means that we don't need a transport dependent mechanism as long as
each port is associated with a unique IP (like we do today in OpenIB).
(Another good reason to use IP addressing.)

Yaron

RE: [openib-general] Re: RDMA Generic Connection Management

2005-08-31 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Tuesday, August 30, 2005 2:36 PM
> To: Talpey, Thomas
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] Re: RDMA Generic Connection Management
> 
> Thomas> Well, you're saying somebody has to do it, right? Is it
> Thomas> easier to fob this off to upper layers that (frankly)
> Thomas> don't care what hardware they're talking to!? This means
> Thomas> we have N copies of this, and N ways to do it. Talk about
> Thomas> cacheline pingpong.
> 
> Upper layers have the luxury of being able to do this at a
> per-connection level, can sleep, etc.  If we push it down into the
> verbs, then we have to do it in every verbs call, including the fast
> path verbs call.  And that means we get into all sorts of crazy code
> to deal with a device disappearing between a consumer calling
> ib_post_send() and the core code being entered, etc.
> 
> Right now we have a very simple set of rules:
> 

If all the ULPs need to do exactly the same thing, or the implementation is
different for IB/iWarp, then we should probably do it under the API, like
it's defined in kDAPL.

Also note that with virtual machines this type of event may be more
frequent, and we may want to decouple the ULPs from the actual hardware
device as much as we can.

Yaron 

RE: [openib-general] RDMA Generic Connection Management

2005-08-30 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Monday, August 29, 2005 3:35 PM
> To: Guy German
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA Generic Connection Management
> 
> 
> What happens if multiple devices can reach the destination address?
> How will they be enumerated to the consumer?
> 

Since it's an IP based approach, it will work like traditional IP.
A preference is given to a device on the same subnet as the destination.
In GbE, if two NICs are on the same subnet then only one will be selected.

You can also use a LAG solution that will balance connections over
multiple links, but it is done at the L2-3 layers (not exposed to the
ULP).

We should probably use the same approach and provide a single device
handle to the ULP. We may have a virtual device handle representing a few
similar parallel devices (just like a LAG group has a virtual MAC). It may
also be a good idea to pass an enum with some preference (e.g. single path
or redundant or ...).

Specifically in iSER, the redundancy is handled in the upper layers.
The iSCSI discovery may return multiple src & dst IP addresses and the
iSCSI multipath implementation will open multiple connections.
There are many TCP/IP protocols that do that at the upper layers (e.g.
GridFTP, ...); I'm not sure how NFS does it.

Also note that there was a new addendum to the IB Multipath record query
that Hal & I proposed in IBTA, which enables a client to ask "what are all
the options to get from point A to point B?", where A & B are identified by
one of the GIDs we know about, and we can specify a flag for same
port/hca/system preferences. This can be implemented under AT if we
want.

Yaron




RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv

> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 2:37 PM
> To: 'James Lentini'; Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation
API
> 
> >> Any way providing src/dst IPs in the CM Private data is simple, and
we
> >> can come with IBTA extension blessing that data structure as a
general
> >> way to map IP oriented protocols over IB (a 1-2 page draft at the
most)
> >> This way it can also address Caitlin concerns regarding NFS & IETF
> >> (since now it's a transport specific issue)
> >
> >How long do you estimate it would take to standardize an IP<->GID
> >mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
> >year?
> >
> >Let's assume that everyone on this list is in agreement.
> 
> Does anyone in the IB world disagree with adding IP addresses in the
CM
> private
> data area?  Would we want to extend this concept to SIDR as well?
> 
> - Sean

I'm re-sending my proposal from 2004 as text (attached).
It also addresses the ServiceID issue, and can be a baseline for
discussions.
Feel free to change it.

Yaron


   Mapping of iWarp/TCP connections to InfiniBand


AUTHOR   Yaron Haviv  ([EMAIL PROTECTED])
VERSION  0.30, Mon June 28 2004


I.  INTRODUCTION


 InfiniBand and iWarp semantics are similar, especially with the latest
 Verb Extensions. The major difference is in the way connections are
 established: iWarp uses TCP based connection establishment, while
 InfiniBand uses a CM for that.
 Another related difference is that in iWarp a user can start in a
 standard TCP mode and migrate to RDMA verbs in the middle of a session.

 The following document provides a general mapping from iWarp/TCP
 connection establishment to InfiniBand, which can be used by ULPs over
 InfiniBand or by any other future iWarp protocols. It imitates the SDP
 connection establishment process and CM headers (it does not require SDP;
 it just uses the same data formats for CM messages).
 
 
II. Establishing TCP/iWarp-like connections over InfiniBand

 In order to emulate an iWarp connection, it is required to open an
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to the RDMA mode; this requires exchanging receiver buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the non-relevant
 SDP fields are Reserved.

 iWarp emulation CM Request (Hello) Private Data header
  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|                              len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|                          DesRemRcvSz                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                          LocalRcvSz                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32|          Local Port           |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|                        Src IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|                        Src IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|                        Src IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|                        Src IP ( 31-00

RE: [openib-general] RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv

> -Original Message-
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 12:21 PM
> To: Yaron Haviv
> Cc: Fab Tillier; Roland Dreier; openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation
API
> 
> 
> 
> On Wed, 24 Aug 2005, Yaron Haviv wrote:
> 
> > Any way providing src/dst IPs in the CM Private data is simple, and
we
> > can come with IBTA extension blessing that data structure as a
general
> > way to map IP oriented protocols over IB (a 1-2 page draft at the
most)
> > This way it can also address Caitlin concerns regarding NFS & IETF
> > (since now it's a transport specific issue)
> 
> How long do you estimate it would take to standardize an IP<->GID
> mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
> year?
> 
> Let's assume that everyone on this list is in agreement.

James, I can identify enough IBTA members on this list.
In case the group is in agreement, I believe it's a rather short process,
since it's just a minor definition and IBTA doesn't have much on its
agenda these days.

For example, Hal added a feature to the SM (client re-register ...) in
weeks, based on the OpenIB input.
We also don't have to wait for a finalized spec to implement, just like we
implement IPoIB without an IETF RFC (only a draft).

By the way, a quick path could be to define it in DAT and hand it over to
IBTA; after all, ATS is also not an IBTA standard.

Yaron

[openib-general] RE: RDMA connection and address translation API

2005-08-25 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 12:13 PM
> To: Michael S. Tsirkin
> Cc: Yaron Haviv; openib-general@openib.org
> Subject: Re: RDMA connection and address translation API
> 
> Michael> Wouldnt it be better to use some bits in the service ID
> Michael> field for this?
> 
> This would also be OK.  But Annex 3 of the IBA spec has already
> defined the service ID field without any reserved bits we can use.
> For example, if the first byte is 0x01, then the IETF is allowed to
> use any value they want for the rest of the service ID.  So if we want
> to keep backwards compatibility with the spec, this approach might be
> difficult.
> 

The IB ServiceID is 64 bits and a TCP port is 16 bits, so we can still take
some bits in the middle to define what Michael was proposing. This may
be a simpler change in IBTA than changing the CM header, but both
options are valid.
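
To make the bit budget concrete, here is a purely hypothetical layout, not
anything defined by Annex 3 or proposed in IBTA: the first byte is kept as
a prefix, the low 16 bits carry the TCP port, and an 8-bit field directly
above the port stands in for "some bits in the middle". The field
positions and the function names are assumptions for this sketch only.

```python
PORT_MASK = 0xFFFF      # TCP port: low 16 bits (assumed placement)
FLAG_SHIFT = 16         # hypothetical 8-bit field just above the port
FLAG_MASK = 0xFF
PREFIX_SHIFT = 56       # first (most significant) byte of the 64-bit ID


def make_service_id(prefix: int, flags: int, port: int) -> int:
    """Compose a 64-bit service ID from prefix byte, middle bits and port."""
    if not 0 <= prefix <= 0xFF:
        raise ValueError("prefix must fit in one byte")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags must fit in 8 bits")
    if not 0 <= port <= PORT_MASK:
        raise ValueError("port must fit in 16 bits")
    return (prefix << PREFIX_SHIFT) | (flags << FLAG_SHIFT) | port


def split_service_id(service_id: int):
    """Inverse of make_service_id: return (prefix, flags, port)."""
    prefix = (service_id >> PREFIX_SHIFT) & 0xFF
    flags = (service_id >> FLAG_SHIFT) & FLAG_MASK
    port = service_id & PORT_MASK
    return prefix, flags, port


sid = make_service_id(prefix=0x01, flags=0x06, port=3260)
```

Even with a prefix byte and a 16-bit port there are 40 bits left in
between, which is why taking a few of them looks feasible.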

Yaron

> Anyway, what's the disadvantage of using a reserved bit or two from
> the CM REQ?
> 
>  - R.

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 7:29 PM
> To: Yaron Haviv
> Cc: James Lentini; Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation
API
> 
> 
> Yaron, has anyone raised all this in the IBTA WG?
> 

I raised it about a year ago, but didn't really follow up on it.
At the time IBTA was also busy with other more urgent stuff (verb ext...).
We are working with a few key IBTA members to resurface it, along with the
need for an abstract CM.

See the following text that was proposed (a year ago, as is).
It is slightly different from your proposal but can be altered if needed.

It basically uses the SDP header and marks one of the fields with 01 (FlowC)
to indicate it's not SDP; this way even SDP can use it.
It also covers a nice idea raised by MS & SUN to extend SDP to accept
PUT & GET operations for RDMA, so you can get a BSD-like API with a few
additional APIs rather than a totally new API like DAPL.


Establishing TCP/iWarp-like connections over InfiniBand
=======================================================

 In order to emulate an iWarp connection, it is required to open an
 InfiniBand RC connection and associate it with IP addresses and TCP ports.
 In addition, protocols may transfer control/login packets before
 the migration to the RDMA mode; this requires exchanging receiver buffer
 size and depth for initial usage (the ULPs will manage the flow control
 for the duration of the connection).

 The mapping uses the same data structures already defined for connection
 establishment in SDP (IBTA Sockets Direct Protocol), which accomplish the
 same goal of mapping TCP socket addressing to InfiniBand; the non-relevant
 SDP fields are Reserved.

 iWarp emulation CM Request (Hello) Private Data header
  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|                              len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|                          DesRemRcvSz                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                          LocalRcvSz                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32|          Local Port           |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|                        Src IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|                        Src IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|                        Src IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|                        Src IP ( 31-00)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 52|                        Dst IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 56|                        Dst IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 60|                        Dst IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 64|                        Dst IP ( 31-00)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 Figure 1 CM Hello private data structure   
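
 As an illustration only, the 64-byte Hello layout of Figure 1 could be
 packed and parsed as below. Several details are assumptions made for this
 sketch and are not stated in the text: network (big-endian) byte order,
 MajVer and IPVer in the high nibble of their bytes, the len field holding
 the total header length, and an IPv4 address stored in the last 4 bytes
 of the 16-byte field with the first 12 bytes zero. The function names are
 made up.

```python
import ipaddress
import struct

# MID, Rsvd, bufs, len, Reserved x2, MajVer|MinVer, IPVer|FlowC, Reserved,
# DesRemRcvSz, LocalRcvSz, Local Port, Reserved, Src IP, Dst IP
HELLO_FMT = ">BBHIIIBBHIIHH16s16s"
HELLO_LEN = struct.calcsize(HELLO_FMT)   # 64 bytes, matching the figure


def _ip16(addr: str) -> bytes:
    """Return the address as 16 bytes; IPv4 is left-padded with zeros."""
    return ipaddress.ip_address(addr).packed.rjust(16, b"\x00")


def pack_hello(mid, bufs, maj, minor, ipver, flowc,
               des_rem_rcv_sz, local_rcv_sz, local_port, src_ip, dst_ip):
    """Build the Hello private data; all reserved fields are zeroed."""
    return struct.pack(
        HELLO_FMT,
        mid, 0, bufs,
        HELLO_LEN,                       # len (assumed: total header length)
        0, 0,
        (maj << 4) | minor,
        (ipver << 4) | flowc,
        0,
        des_rem_rcv_sz, local_rcv_sz,
        local_port, 0,
        _ip16(src_ip), _ip16(dst_ip),
    )


def unpack_hello(data: bytes) -> dict:
    """Parse the fields a listener or proxy would care about."""
    (mid, _, bufs, length, _, _, ver, ipv_flow, _,
     des, loc, port, _, src, dst) = struct.unpack(HELLO_FMT, data[:HELLO_LEN])
    ipver = ipv_flow >> 4

    def to_ip(raw):
        if ipver == 4:
            return str(ipaddress.IPv4Address(raw[12:]))
        return str(ipaddress.IPv6Address(raw))

    return {
        "mid": mid, "bufs": bufs, "len": length,
        "maj": ver >> 4, "minor": ver & 0xF,
        "ipver": ipver, "flowc": ipv_flow & 0xF,
        "des_rem_rcv_sz": des, "local_rcv_sz": loc,
        "local_port": port, "src_ip": to_ip(src), "dst_ip": to_ip(dst),
    }


# FlowC = 1 marks the message as "not SDP", as described in the mail above.
msg = pack_hello(mid=0, bufs=8, maj=1, minor=0, ipver=4, flowc=1,
                 des_rem_rcv_sz=4096, local_rcv_sz=4096,
                 local_port=3260, src_ip="10.0.0.2", dst_ip="10.0.1.9")
fields = unpack_hello(msg)
```

 A gateway that accepts the connection on its own GID can read dst_ip
 from the parsed header to learn the real destination.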
  

 iWarp emulation CM Response (HelloReply) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

> -Original Message-
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 5:51 PM
> To: Yaron Haviv
> Cc: Roland Dreier; openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation
API
> 
> 
> 
> Which draft contains this? I found
> 
> http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt
> 

James,

You should look at:
http://www.haifa.il.ibm.com/satran/ips/draft-ietf-ips-iser-05-candidate.txt

The 05 rev really adds all the InfiniBand related stuff.
You can see how the association between IB & IP is done using IPoIB.

The current implementation may not use the private data field (since it's
not critical/mandatory), but the intention is to add it to address
multi-homed hosts. We would like to push such a definition into IBTA so every
IP oriented ULP can use it. Several people expressed interest in such a
definition, and it can also support NFS/RDMA or any other IP based ULP.


> but the HELLO header in section 9.3 does not contain any IP address
> information.
> 
> > I believe it can be a good idea to use the same approach for
> > NFS/RDMA and eliminate the need for reverse ATS lookup (the may have
> > some conflicts when multiple IPs exists per node). We may just use
> > the SDP hello header as is with unused fields zeroed This will allow
> > all ULPs to use the same mechanism
> 
> NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is
> that this could not be easily accommodated in the current standards
> for that reason.

Not sure why that is the case. If we add an IBTA definition of a CM
exchange for IP based ULPs (i.e. send src/dst IP and optionally ports),
you can have an NFS/RDMA spec that doesn't need any IB/iWarp
specific definitions, since the differences are pushed down to the IBTA.

In the case of NFS/RDMA over another (non IB or iWarp) transport, you can
specify that providing the IP addressing is the responsibility of the
underlying transport.

Yaron


RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Fab Tillier
> Sent: Wednesday, August 24, 2005 3:00 PM
> To: 'Roland Dreier'
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation
API
> 
> > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 24, 2005 11:03 AM
> >
> > Fab> Why can't the IPV field be ignored?  If a listen wants only
> > Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> > Fab> with the first 12 bytes zero, the next 4 filled with the
IPV4
> > Fab> address, and would set the offset to that of the hello
> > Fab> message's destination address (32).
> >
> > Yes, you're right for SDP.  I guess if we're comfortable mandating
> > that all protocols put their source and destination IPs in the
private
> > data for the IB case, then this works.  Of course it's somewhat
> > awkward to pass this information into the transport-neutral CM API
but
> > I think this can be worked around.
> 
> I don't know if we need to mandate IP usage - it's up to the
application.
> Any
> application that wants to have similar semantics to the way socket
listens
> work
> (especially when bound to one of multiple IP addresses on a port) the
> application would have to define its private data to accommodate this.
> 

The context of this discussion is a common API for iWarp/IB ULPs.
In that case they all use IP addresses (since it's the common
addressing).

If someone uses the IB specific API under this abstraction level, he
can provide whatever data he wants to the CM.

Anyway, providing src/dst IPs in the CM private data is simple, and we
can come up with an IBTA extension blessing that data structure as a general
way to map IP oriented protocols over IB (a 1-2 page draft at the most).
This way it can also address Caitlin's concerns regarding NFS & IETF
(since now it's a transport specific issue).

Yaron
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Caitlin Bestler
> Sent: Wednesday, August 24, 2005 2:14 PM
> To: Fab Tillier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
> 
> 
> The applications are expecting source/destination network addresses
> that come from a network layer, not from the peer application. IP has
> no problem meeting this requirement. This is an IB problem that needs
> to be solved within the scope of IB without changing any ULPs.
> 

To my understanding, the IB private data fields are IB CM specific,
so embedding the src/dst IPs in them doesn't change the ULP and could be
considered part of the IB CM.

You can look at the private data in that case as a replacement for the
TCP connection setup (SYN/SYN-ACK exchange), where the SYN packet
includes the IPs & ports.
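
As a rough illustration of the SYN analogy, the sketch below (Python, illustrative only) packs the same information a SYN carries, source/destination IPs and ports, into a private-data blob and reads it back on the passive side. The field layout is an assumption made up for this example; it is not the SDP hello header or any IBTA-defined structure.

```python
import ipaddress
import struct
from typing import NamedTuple

# Hypothetical layout: src port, dst port, 16-byte src IP, 16-byte dst IP,
# all in network byte order (36 bytes total).
_LAYOUT = struct.Struct("!HH16s16s")


class ConnInfo(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def _pack_ipv4(ip: str) -> bytes:
    """Store an IPv4 address in a 16-byte field: 12 zero bytes + 4 bytes."""
    return bytes(12) + ipaddress.IPv4Address(ip).packed


def _unpack_ipv4(raw: bytes) -> str:
    return str(ipaddress.IPv4Address(raw[12:]))


def encode(info: ConnInfo) -> bytes:
    """Active side: build the private data sent with the connect request."""
    return _LAYOUT.pack(info.src_port, info.dst_port,
                        _pack_ipv4(info.src_ip), _pack_ipv4(info.dst_ip))


def decode(private_data: bytes) -> ConnInfo:
    """Passive side: recover the addressing info from the private data."""
    src_port, dst_port, src_raw, dst_raw = _LAYOUT.unpack(
        private_data[:_LAYOUT.size])
    return ConnInfo(_unpack_ipv4(src_raw), src_port,
                    _unpack_ipv4(dst_raw), dst_port)
```

The ULP itself never touches this blob in such a scheme; the connection layer fills it in on connect and strips it on accept, which is the sense in which it could be considered part of the CM.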

Yaron 

RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Wednesday, August 24, 2005 1:43 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
> 
> 
> 
> On Tue, 23 Aug 2005, Roland Dreier wrote:
> 
> > It would be possible to have another function like
> > rdma_getpeername() that takes the transport address and
> > returns a source IP address.  In the IB case this would do an
> > ATS reverse lookup.  However, I hate this idea.  iSER already
> > uses the CM private data to pass the source IP in the IB case,
> 
> I know this is how IB SDP works, but I don't think iSER works this
> way.
> 
> The code in the tree calls dat_ep_connect() with a NULL private data
> pointer.
> 
> There is an iSER HELLO message described in iser_header.h contains IP
> addresses, but I'm not certain that this is part of the current
> protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused).

James,

iSER doesn't mandate the source IP in general, since it does a much
stronger authentication during Login.
However, we believe using a header similar to SDP's can help the passive
side:
a. know which destination IP was targeted (in a multi-homed environment)
b. validate the source, for implementations that want to do so for some
reason

That's why the draft suggested adding the source/dst IPs in the private
data just like SDP does. I believe it can be a good idea to use the same
approach for NFS/RDMA and eliminate the need for the reverse ATS lookup
(which may have some conflicts when multiple IPs exist per node).
We may just use the SDP hello header as is, with unused fields zeroed.
This will allow all ULPs to use the same mechanism.

Yaron


RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-19 Thread Yaron Haviv

> 
> Also on the IB side the AT code probably needs to be reviewed and
> improved.  The API should be simpler, and I don't like the way AT
> sticks its tentacles into the IPoIB driver and network stack.
> 

The AT implementation was based on the code from SDP.
I assume that changes similar to the ones you propose would need to
apply to SDP as well, or SDP would need to use the same lib as the other
ULPs.

Yaron

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-19 Thread Yaron Haviv

> -Original Message-
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 19, 2005 10:22 AM
> To: Roland Dreier
> Cc: Yaron Haviv; Christoph Hellwig; Grant Grundler; open-
> [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> On Thu, Aug 18, 2005 at 09:24:24PM -0700, Roland Dreier wrote:
> > Yaron> Not every one wants to keep on doing target discovery with
> > Yaron> Python scripts,
> >
> > Come on, this is just a stupid statement.  The whole point of putting
> > device management in userspace is so that everybody has the
> > flexibility to use whatever discovery mechanism they want.
> 
> And just FYI.  If you ever want an iSER implementation merged it will
> have to work the same way.  Look at how the open-iscsi TCP initiator
> does it.

Good point. The high-level functionality in iSER
is all done in Open-iSCSI and its userspace extensions;
iSER just deals with the data transfer and is layered under Open-iSCSI.

By the way, can you point me to the iSCSI HBA that delivers better
performance, latency, and memory consumption?
And what about the price of that HBA and the attached 10GbE switch?

Yaron

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-18 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 19, 2005 12:24 AM
> To: Yaron Haviv
> Cc: Christoph Hellwig; Grant Grundler; [EMAIL PROTECTED];
> openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> Yaron> Not every one wants to keep on doing target discovery with
> Yaron> Python scripts,
> 
> Come on, this is just a stupid statement.  The whole point of putting
> device management in userspace is so that everybody has the
> flexibility to use whatever discovery mechanism they want.

You know, there is a small problem in storage: people don't want to just
use what "they want", but rather standard management, discovery,
security, HA, etc., which are quite essential for commercial customers.

 
> I agree that the SRP and iSER protocols are basically equivalent at a
> technical level: they both transport SCSI over RDMA.  If you want to
> compare existing implementations, I'd much rather use my SRP driver's
> 1600 lines of code over your 14000+ lines of x86-only iSER on top of
> 1+ lines of kDAPL (not even counting the iSCSI core).

Not sure how you do your LOC counting or what's included in it.
In any case, a protocol that is generalized to multiple transports and
has built-in discovery, error recovery, global routing/naming,
authentication, built-in multi-pathing, multiple connections per
session, optimizations for small messages, comprehensive management and
configuration with industry standard APIs, etc.
probably needs to have more LOC than one that just tunnels SCSI commands
from one predefined point to another (by the way, is DM, CFM and/or
Python included in the 1400 :))

The important things are how many LOC are on the command path and how
optimized the protocol is. This code runs SCSI at 850-900MB/s and at the
same time provides the most comprehensive set of features, and is
managed out of the box with industry standard tools.

A variation of that code runs today on PPC, so I assume it's not an
issue to make sure it runs over PPC.

In any case, leaving aside the religious discussion, iSER needs to get
into OpenIB and customers will then decide whatever they want. To get it
in we need:
1. iSER developers to comply with Linux requirements and address any
constructive feedback
2. an API that can be used by ULP developers that want to be
transport independent (till then kDAPL would need to be used)

Yaron

> 
>  - R.

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-18 Thread Yaron Haviv

-Original Message-
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 18, 2005 7:45 PM
> To: Grant Grundler
> Cc: Yaron Haviv; [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> > If kDAPL for any reason doesn't get pushed upstream to kernel.org,
> > we effectively don't have iSER or NFS/RDMA in linux.
> > Since I think without them, linux won't be competitive in the
> > commercial market place.
> 
> iser doesn't matter at all in the marketplace.  nfs/rdma matters and
> even if netapp/citi keeps beeing ignorant I will port it over to the
> infiniband/rdma layer myself.  I'll hopefully have some iwarp cards
> soon.

Christoph,

Can you help me understand how you would address the CM issue? Would you
add IB/iWarp-specific code into all the ULPs (NFS, SDP, MPI, Lustre,
iSER, ..)?

Regarding iSER, you are entitled to your opinion.
Many others won't agree with you and think that in the long run iSER
will be the only viable block storage alternative in OpenIB, mainly
since it fits the IB/iWarp generalization and is much more complete
than the alternatives, and with the recent IETF moves people can't claim
it's non-standard anymore.

Not everyone wants to keep on doing target discovery with Python
scripts, and some prefer just using existing code and management from
iSCSI rather than inventing new mechanisms just for IB.

Yaron

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-18 Thread Yaron Haviv

> -Original Message-
> From: Grant Grundler [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 18, 2005 7:41 PM
> To: Yaron Haviv
> Cc: Grant Grundler; Christoph Hellwig; [EMAIL PROTECTED];
> openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> 
> > Until OpenIB will define another layer that can be used for both, there
> > is no other viable alternative for iSER to be implemented on top
> > In future if a new common API/Layer will be provided iSER can change to
> > support it
> 
> I've understood that the openib.org Verbs API can be changed to make
> it "transport neutral" - ie support RNICs.  RNIC vendors don't seem
> to be interested in submitting patches for that. Did someone think
> they can drop kDAPL into openib.org SVN and roland would automatically
> push that into kernel.org?
> 
> I'm not convinced of that and worry that iSER and NFS/RDMA won't
> make it into kernel.org as things stand now.
> 

Grant,

The verbs portion deals with the data path operations (after the
connection is established); the connection establishment process is
very different.
The IB CM is implemented on top of the verbs, an iWarp-specific CM would
also need to be developed in parallel (it interacts with the TCP stack
..), and common ULPs need a single mechanism to use both (in the DAPL
case a BSD-like API using IP addresses).

Again, I'm not saying kDAPL is the ultimate solution or that it will
last in its current form; it's just the only thing we can use today. If
someone comes up with a better implementation we can just change iSER.

In one of the previous threads I suggested building a hybrid layer that
uses the current verbs APIs for verb-type operations, and the DAPL code
for the connection establishment, resulting in simpler/shorter code.
This would present a middle ground addressing the concerns on both
sides.


Yaron 

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-18 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Grant Grundler
> Sent: Thursday, August 18, 2005 2:18 PM
> To: Christoph Hellwig
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> On Thu, Aug 18, 2005 at 07:43:17PM +0200, Christoph Hellwig wrote:
> ...
> > > The same as last time, the code didn't change at all.  It's still
> > > totally ignorant about such essential things as dma mapping, has
> > > creative new abuse for struct iovec, it's still based on iovecs,
> >
> >  "... still based on kdapl" of course
> 
> Yeah, I was wondering about that. When I was off on vacation
> in July (and OLS), kDAPL was committed to the svn repository.
> Has anyone reviewed that?
> 
> I was under the impression kDAPL would never make it into
> the openib.org source tree. Or has something changed?
> 

Grant,

Currently kDAPL is the ONLY layer that can be abstracted over both IB &
iWarp, due to the different CM models of the two interconnects.
iSER and NFS/RDMA are common to both IB & iWarp and are implemented to
run on both.

Until OpenIB defines another layer that can be used for both, there
is no other viable alternative for iSER to be implemented on top of.
In the future, if a new common API/layer is provided, iSER can change to
support it.

We also appreciate your productive feedback on the code; the team will
address it.

Yaron



> grant

RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISERinitiator

2005-08-18 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Thursday, August 18, 2005 8:36 AM
> To: Dan Bar Dov
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of
> ISERinitiator
> 
> On Thu, Aug 18, 2005 at 03:14:05PM +0300, Dan Bar Dov wrote:
> > I just checked in a first version of iSCSI Extensions for RDMA
> > Protocol (ISER) initiator under infiniband/ulp/iser. This
> > implements the ISER datamover, a transport layer alternative to
> > TCP/IP usable by iSCSI. This ISER transport has been tested with
> > the open-iscsi opensource project, and against the Voltaire
> > Fibre-Channel Router (FCR) and Voltaire's Native-IB storage kit.
> >
> > All the iSCSI features including device management are available
> > seamlessly with the iSCSI/ISER initiator. ISER simply puts iSCSI
> > on steroids.
> >
> > The ISER implementation makes use of the openIB/kDAPL. Please note
> > that several kDAPL patches that were submitted to the list are
> > necessary for this implementation to work.
> 
> The code is complete crap, please remove it again.

Christoph,

iSER is part of OpenIB just like any other ULP,
and there needs to be a productive process for adding it to the stack.

Your feedback is valuable, but we need more details on what
concerns you. The iSER team is committed to addressing any feedback
presented on this list; after all, it's just an initial posting.

Yaron 

> 

[openib-general] FW: [Ips] iSER over IB - Consensus call

2005-08-16 Thread Yaron Haviv

FYI, for those who don't track the IETF iSCSI WG:

In the last IETF meeting in Paris, iSER (iSCSI RDMA) over InfiniBand was
discussed again, and as you can see below the IETF gave its green light
to make the few semantic changes in the iSER RFC and generalize it to IB.
Note also that the iSER over IB/iWarp RFC is in Last Call status.

It is interesting to see the convergence with OpenIB adding iWarp
drivers, and IETF adding IB to the iSER RFC, resulting in a common set
of Drivers, ULPs, and remote boot support. 

Yaron

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Tuesday, August 09, 2005 11:02 PM
To: ips@ietf.org
Subject: [Ips] iSER over IB - Consensus call

The IPS WG Paris meeting discussed:

iSER over InfiniBand (draft-hufferd-iser-ib-00.txt)

Proposal for text edits to iSER to permit use on other transports,
including InfiniBand.  Also will help enable iSER to be defined over
SCTP.

This draft is (or at least is intended to be) entirely editorial -
it does not (or at least is not intended to) make any technical
changes to the iSER draft that has passed WG Last Call.

The draft Paris minutes record the following:

Sense of room: Want to proceed towards applying these changes
(after careful review and WG rough consensus) to the approved
iSER draft so that there is one draft that is broadly applicable
rather than the current iSER draft plus a draft that modifies
that draft to broaden it.

Anyone who objects to this sense of the room in Paris should post
to the list with reasons for the objection, otherwise the sense of
the room to proceed in this direction will become the rough consensus
of the IPS WG.

If the WG does proceed in this direction, the next step will be a WG
Last Call on draft-hufferd-iser-ib-00.txt, with all changes/comments/etc.
to be posted to the list, even editorial ones.  After conclusion of
that WG Last Call, the resulting edits can be applied to produce a new
version of the iSER draft.  We'll try to get this done by the end
of August, but it may take a bit longer.

Thanks,
--David

David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953 FAX: +1 (508) 293-7786
[EMAIL PROTECTED]        Mobile: +1 (978) 394-7754


___
Ips mailing list
Ips@ietf.org
https://www1.ietf.org/mailman/listinfo/ips

RE: [openib-general] [iSER]How to use the iSER with the UNH iSCSI

2005-08-04 Thread Yaron Haviv

Ian,

Currently the UNH iSCSI doesn't support the "Datamover API", which is a new API
defined in the IETF that enables iSCSI to run over offload technologies such as
iSER.

In addition, the iSER code that is in OpenIB covers only the Initiator side.
The Target code has been (and is being) integrated into a few commercial
products, or can be provided under some licensing.

There are a few people who intend to enable the Datamover API in the UNH iSCSI
and integrate it with iSER. They would be happy to see more helping hands; if
you are interested I can hook you up with them.

Yaron 

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Ian Jiang
> Sent: Thursday, August 04, 2005 3:10 AM
> To: openib-general@openib.org
> Subject: [openib-general] [iSER]How to use the iSER with the UNH iSCSI
> 
> Hi, everybody!
> Thanks for all the replis to my "How to get the dat_headers_1_1.tgz"!
> I downloaded the dapl_beta2.06.tgz as Itamar told me.
> And I made some modification to the iSER to use it on the x86_64 platform.
> 
> I got through the compiling finally, but here is another question:
> How to use the iSER with the UNH iSCSI? I have the UNH iSCSI running on my
> system at present. Need I modify it and reinstall?
> 
> And I'm not sure if the dapl_beta2.06 has to be installed to run the iSER.
> In fact, I did not compile or install the dapl before installing the iSER.
> 
> Any suggestion is appriciated!
> 
> Ian Jiang
> [EMAIL PROTECTED]
> 
> Computer Architecture Laboratory
> Institute of Computing Technology
> Chinese Academy of Sciences
> Beijing,P.R.China
> Zip code: 100080
> Tel: +86-10-62564394(office)
> 

RE: [openib-general] Re: [Rdma-developers] Meeting(07/22) summary:OpenRDMA community development discussion

2005-08-01 Thread Yaron Haviv

> -Original Message-
> From: Fab Tillier [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 01, 2005 1:14 PM
> 
> > From: Sean Hefty [mailto:[EMAIL PROTECTED]
> >
> > Yaron Haviv wrote:
> > > we can spend time and discuss theories and intentions, at the end of the
> > > day an iWarp RNIC cannot just reside under IB-Verbs without major
> > > changes to the overall infrastructure.
> >
> > I don't disagree with having a common connection library that supports both
> > IB and iWarp, or that you could derive a solution from kDAPL.  But based on
> > the proposed APIs that I've seen, I believe that an RNIC could reside under
> > IB verbs with minimal changes, and would likely be the best engineered
> > solution for including RNIC support in Linux.
> 
> Just for clarity, when you say verbs you exclude connection
> establishment/management, right?
> 
> I think keeping the two distinct is important in this discussion, as it seems
> there is some confusion - some people refer to verbs as verbs + CM, others as
> just verbs.
> 
> Here's my take from the discussions so far:
> - RNICs can probably be made to work under the IB verbs (with changes of
> course).
> - RNICs can probably not be made to work under the IB CM (not that I've seen
> this suggested).
> 

Fab, I made the same distinction between pure verbs & the broader API
(+CM, SA, ..)

I agree that the pure send, receive, .. verbs are similar, with minor
differences,
and we may just want to adopt them with minor changes.

On the other hand, it would not be efficient to try and bend the iWarp CM
model to the (complex) IB one; we should rather use a simpler one, such
as the one in DAPL, that fits both camps.

In IB we need to use a CM and a bunch of SA queries, where the ULP
doesn't really need all that and can do with a simple BSD-like
connection request (that may map to a more complex IB or iWarp model
underneath).

The DAPL/BSD-like connection mechanism has enough ways to imply
security/QoS/etc. (using the src/dst IP, the network implied from the IP,
and kDAPL QoS or BSD TOS, ..), so a user doesn't need direct access to
the SA for connections; at the most we can add some flags to it.

Yaron


> - Fab


RE: [openib-general] Re: [Rdma-developers] Meeting (07/22) summary:OpenRDMA community development discussion

2005-08-01 Thread Yaron Haviv

> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 01, 2005 11:14 AM
> To: Yaron Haviv
> Cc: Christoph Hellwig; Tom Duffy; Venkata Jagana; rdma-
> [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [openib-general] Re: [Rdma-developers] Meeting (07/22)
> summary:OpenRDMA community development discussion
> 
> Yaron> It would probably be wise to try and merge that effort with
> Yaron> IB-verbs etc' (e.g. make the verbs portion of the API
> Yaron> closer), and on the same time preserve the effort that was
> Yaron> done in kDAPL to overcome the differences (e.g. in the CM,
> Yaron> addressing portions)
> 
> This doesn't seem like the right approach to me but we'll be happy to
> review your patches.

So how would you reconcile the differences between IB & iWarp,
specifically in the connection establishment portion?
In your approach, would I need to access different CM APIs for IB & for
iWarp in my ULP?

From my perspective the current kDAPL solves that problem (w/o any
additional patches), and we are trying to re-invent the wheel here.

If patches are really needed they can probably be applied to the kDAPL
code (i.e. remove redundant code/simplify kDAPL); however, this is an
optimization that can always be done later.

Yaron


RE: [openib-general] Re: [Rdma-developers] Meeting (07/22) summary:OpenRDMA community development discussion

2005-07-31 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Friday, July 29, 2005 8:02 AM
> To: Tom Duffy
> Cc: Venkata Jagana; [EMAIL PROTECTED]; Christoph
> Hellwig; openib-general@openib.org
> Subject: [openib-general] Re: [Rdma-developers] Meeting (07/22)
> summary:OpenRDMA community development discussion
> 
> On Thu, Jul 28, 2005 at 02:02:08PM -0700, Tom Duffy wrote:
> > At OLS (and in previous forums), the kernel maintainers have made it
> > *very* clear that there should only be one API.
> 
> _and_ that this api is neither RNIC-PI or KDAPL.  In fact for anything
> that doesn't look very similar to the current IB midlayer you'd need
> very convincing arguments.
> 

I assume it is not as simplistic as that.
The iWarp CM model is quite different from IB's, and iWarp doesn't have
an SA/SM and a bunch of other IB-specific things.

For example:
The correct common abstraction is one where a user can issue a
connection by using a logical end-point address (such as an IP), and
doesn't have to deal with the IB- or iWarp-specific CM state machine or
SA/SM.

If you look at DAPL you can break it into two parts. The first is the
simple verbs (e.g. send, ..), where it's just a simple overlay on top of
the verbs (and may be redundant).
However, there is a second part that implements a simple connection
establishment model (much like BSD) that can be mapped to both IB (CM,
SA, ..) and iWarp (TCP SYN/SYN-ACK, ARP, etc.). This serves a couple of
main purposes:
a. make it simple for the ULP developer and put the complex part in a
common place
b. define a common model for different HW
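
The two purposes above can be sketched roughly as follows. This is only an illustration in Python of the shape of such an abstraction, not kDAPL or any proposed OpenIB API: the class names, the route table keyed by IP subnet, and the step descriptions are all assumptions invented for the example. The ULP calls one `connect()` with an IP and port, and the transport-specific steps stay in a common place underneath.

```python
import ipaddress
from abc import ABC, abstractmethod
from typing import List, Tuple


class Transport(ABC):
    """Transport-specific connection establishment, hidden from the ULP."""

    @abstractmethod
    def connect(self, dst_ip: str, port: int) -> List[str]:
        """Return the steps performed (strings, for illustration only)."""


class IBTransport(Transport):
    def connect(self, dst_ip: str, port: int) -> List[str]:
        return [f"resolve {dst_ip} to an IB address",
                "SA path record query",
                f"IB CM REQ/REP/RTU for port {port}"]


class IWarpTransport(Transport):
    def connect(self, dst_ip: str, port: int) -> List[str]:
        return [f"ARP/route lookup for {dst_ip}",
                f"TCP SYN/SYN-ACK to port {port}",
                "move the connection to RDMA mode"]


class ConnectionManager:
    """Single entry point: the ULP only supplies a logical IP end-point."""

    def __init__(self) -> None:
        self._routes: List[Tuple[ipaddress.IPv4Network, Transport]] = []

    def add_route(self, cidr: str, transport: Transport) -> None:
        self._routes.append((ipaddress.IPv4Network(cidr), transport))

    def connect(self, dst_ip: str, port: int) -> List[str]:
        addr = ipaddress.IPv4Address(dst_ip)
        for network, transport in self._routes:
            if addr in network:
                return transport.connect(dst_ip, port)
        raise LookupError(f"no RDMA transport reaches {dst_ip}")
```

With this shape, an iSER or NFS/RDMA initiator would call `cm.connect("10.0.0.5", 3260)` without knowing whether the subnet is served by an IB HCA or an iWarp RNIC.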

We can spend time discussing theories and intentions, but at the end of
the day an iWarp RNIC cannot just reside under IB-Verbs without major
changes to the overall infrastructure.
Several guys spent some time looking it over and came up with an
abstraction that IS possible on top of IB & iWarp & foo; it is called
DAPL (or IT, as another similar alternative).

It would probably be wise to try and merge that effort with IB-verbs
etc.
(e.g. make the verbs portion of the API closer), and at the same time
preserve the effort that was done in kDAPL to overcome the differences
(e.g. in the CM and addressing portions).

Yaron


RE: [openib-general] IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Fab Tillier
> Sent: Thursday, July 07, 2005 11:37 PM
> To: Hal Rosenstock; 'Eitan Zahavi'
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: RE: [openib-general] IBDM and IBMgtSim Proposal Comments
> 
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, July 07, 2005 10:56 AM
> >
> > In the OpenIB architecture, umad is the lowest layer library and the
> > diagnostics are built on that.
> 
> That's only true in the *Linux* OpenIB Architecture.  Windows is different -
> the access layer already provides support for user-level MAD clients, and the
> API is very close (if not identical) to the IBAL interface OpenSM was
> originally written to.
> 

From my understanding, the main advantage of using the OSM
vendor-specific layer is that it is also present in Windows?
Or does it have some other advantage over the umad layer (from Hal's
response it seems like umad has better layering/functionality)?

If that is the case, then you could also suggest replacing the OpenIB
verbs layer or CM, etc. with the IBAL one because it's present in
Windows.


I believe that if we want to make a major change in a management
infrastructure that is alive and kicking (it can probably be improved,
like always),
we need a much better reason than "it's done this way in Windows".

Yaron



RE: [openib-general] [iser]about the target

2005-07-05 Thread Yaron Haviv









Ian,

An iSER target is basically an iSCSI target with an add-on iSER transport.

As you may know, Voltaire contributed the iSER Initiator and also has a full
Target implementation that was tested with it.

There are a few Target solutions that are possible:

- There is at least one major storage vendor working on a target that will be
  available later on
- Voltaire also provides a software package that can turn a server into an
  iSCSI/iSER target (not open source)
- As well as gateway solutions from iSER to FC and GbE
- In addition UNH has started some work on enabling iSER on their Open Source
  iSCSI target

Logically, an iSER target is a (much faster) iSCSI Target.

So, just as in iSCSI, you can bridge iSER to FC, just like the Cisco MDS
bridges from iSCSI to FC.

There are standards that define the exact mapping between the FC naming and
iSCSI naming.

Yaron

 











From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Tuesday, July 05, 2005 7:28 PM
To: Ian Jiang; openib-general@openib.org
Subject: Re: [openib-general] [iser]about the target



 

At 06:07 PM 7/4/2005, Ian Jiang wrote:



Hi!
I am new to the iSER.
On " https://openib.org/tiki/tiki-index.php?page=iSER", it
is said that iSER currently contains initiator only (no target). Will the
target come out later? How did they test the iSER initiator without a iSER
target?
Could you give some explaination?


From a practical perspective, there are very few iSCSI targets shipping
today.  Most people had envisioned iSER over IB to a gateway Ethernet
device since native IB storage is also quite rare in terms of real
product.  For many of us, our push for iSER over IB was to replace SRP
which has a deficient ecosystem thus not really used beyond some basic Fibre
Channel gateway cards.  

Mike




Thanks!



Ian Jiang
[EMAIL PROTECTED]

Computer Architecture Laboratory
Institute of Computing Technology
Chinese Academy of Sciences
Beijing,P.R.China
Zip code: 100080
Tel: +86-10-62564394(office)



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general









RE: [openib-general] performance counters in /sys

2005-05-19 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Thursday, May 19, 2005 11:22 PM
>
> On Thu, 2005-05-19 at 16:11, Mark Seger wrote:
> >
> > The only other thing that could be useful would be an extra field for
> > the protocol, such that for a given interface/port, I could see the
> > traffic counters for each type of protocol that one might choose to
> > support, such as mpi, portals, etc.
> 
> There are no hardware counters for these. These would need to be filled
> in somehow by software.
> 

Mark/Hal,

I believe you can use the per-VL counters for that
(IB allows counting traffic on a specific VL).
By matching ULPs to VLs (e.g. through the ib_at lib we suggested),
you can get both congestion isolation per traffic type and the
ability to count traffic per ULP
(note that up to 8 VLs are supported in the Mellanox chips).
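
A minimal sketch of that accounting idea, in plain user-space C rather
than kernel code. The ULP list, the ULP-to-VL table, and the demo_*
names are hypothetical; on real hardware the per-VL numbers would come
from the port counters and the VL assignment from the fabric
configuration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_VLS 8   /* assumed number of data VLs, per the note above */

enum demo_ulp { DEMO_ULP_IPOIB, DEMO_ULP_MPI, DEMO_ULP_NFS_RDMA, DEMO_ULP_COUNT };

/* Hypothetical central assignment: one VL per traffic type. */
const uint8_t demo_ulp_to_vl[DEMO_ULP_COUNT] = {
    [DEMO_ULP_IPOIB]    = 0,
    [DEMO_ULP_MPI]      = 1,
    [DEMO_ULP_NFS_RDMA] = 2,
};

struct demo_vl_counters {
    uint64_t xmit_bytes[DEMO_NUM_VLS];
};

/* Account a transmitted packet against the VL it was sent on. */
void demo_count_xmit(struct demo_vl_counters *c, uint8_t vl, uint64_t bytes)
{
    assert(vl < DEMO_NUM_VLS);
    c->xmit_bytes[vl] += bytes;
}

/* Per-ULP traffic is simply the counter of the VL assigned to that ULP. */
uint64_t demo_ulp_xmit_bytes(const struct demo_vl_counters *c, enum demo_ulp ulp)
{
    return c->xmit_bytes[demo_ulp_to_vl[ulp]];
}
```

The point of the design is that no ULP has to keep its own counters:
once each ULP owns a VL, a per-VL counter is already a per-ULP counter.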

Yaron

RE: [openib-general] IB Address Translation service

2005-03-05 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Saturday, March 05, 2005 6:18 PM
> To: David M. Brean
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> On Sat, 2005-03-05 at 10:22, David M. Brean wrote:
> > There is an I-D for DHCP on IB.  IPoIB defines a "broadcast" address and
> > DHCP (and ARP) on IB use it.  Could make RARP work using this mechanism,
> > but as someone else pointed out, the IB hardware address contains a
> > QPN.  The I-D for IPoIB says something like:
> >
> > The link-layer address for IPoIB includes the QPN which might not be
> > constant across reboots or even across network interface resets.
> > Cached QPN entries, such as in static ARP entries or in RARP servers
> > will only work if the implementation(s) using these options ensure
> > that the QPN associated with an interface is invariant across
> > reboots/network resets.
> 
> That may be the requirement but I think there are some issues with
> keeping the QPN invariant. Quoting Dror Goldenberg
> (http://openib.org/pipermail/openib-general/2004-November/006765.html):
> "Assigning specific QPN for ipoib requires allocation of QPN space which
> is beyond IB spec verbs. Current verbs do not allow it. I don't have any
> objection for that, except that you have to hold a set of preallocated
> QPs with specific numbers and hand them over to privileged consumer when
> requested to.  I wouldn't commit that it will work on any HCA
> architecture."
> 
> -- Hal
> 

Just to add to Hal and Dave: it is not only that the QPN may not be
constant. You can actually have several valid QPNs, one or more per
partition. Since each partition reflects the notion of an IP
VLAN/network, RARP should return a different IP per partition, and the
RARP caller should use a different QPN in each case.

I believe all the emails in this thread clarify why RARP is not a valid
approach.

Yaron

RE: [openib-general] putting in dead wood for DAPL and similarabomination

2005-03-03 Thread Yaron Haviv

> -Original Message-
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 03, 2005 5:48 AM
> To: Yaron Haviv
> Cc: Christoph Hellwig; James Lentini; openib-general@openib.org
> Subject: Re: [openib-general] putting in dead wood for DAPL and
> similarabomination
> 
> The current iSER code is 10928 LOC, add to that 22155 LOC of kDAPL (not
> including the actual provider for IB) and 5822 LOC linux-iscsi kernel
> code. Compare that to the 25412 LOC total for drivers/infiniband in Linux
> 2.6.11.

As Tom indicated, we expect a significant code shrink for kDAPL. It will
be much more Linux friendly when we are done with it, and some parts will
be rewritten.
The iSER code is also not optimal in terms of LOC, and we can clean up
some redundant code if we are in an LOC contest. I believe that after we
glue all the layers together we will focus on reducing LOC and test code.
 
> Here's the challenge: if someone gets me the funding I'll write
> complete iSER of IB implementation in less than 10k LOC based on the
> open-iscsi code if someone gets me the funding.

You know, there is also the challenge of making it work, perform,
interoperate, and support some features; it is not all about LOC :)
Anyway, thanks for offering us support; we may take you up on it some day.
 
Yaron

RE: [openib-general] putting in dead wood for DAPL and similarabomination

2005-03-02 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Wednesday, March 02, 2005 11:49 PM
> To: James Lentini
> Cc: Christoph Hellwig; openib-general@openib.org
> Subject: Re: [openib-general] putting in dead wood for DAPL and
> similarabomination
> 
> On Wed, Mar 02, 2005 at 11:11:35AM -0500, James Lentini wrote:
> > DAPL has been efficiently supported on top of InfiniBand, iWARP, the
> > Virtual Interface Architecture, Quadrics, and Myrinet.
> 
> And I've not seen any kernel submission for either of them - and what's
> important no single kDAPL application that actually shows any benefit
> that way.  Voltaire's iSER implementation would surely be smaller when
> directly written to the OpenIB interface, and is already smaller than
> the whole kDAPL layer.

Christoph, the reason the iSER code is very thin is that it uses kDAPL
(and Linux iSCSI); it doesn't need to deal with SA calls, CM calls,
LIDs, GIDs, and a bunch of other things.

Besides being RDMA transport independent, DAPL enables people to code to
RDMA without being intimately familiar with the HW. We saw people coding
to it within days, which I can't say for Verbs.

Abstraction layers are not new to Linux. Sockets is another type of
abstraction with multiple protocols/families underneath, or even
Ethernet.
Why aren't you suggesting one TCP implementation for ATM cards, another
for PPP, etc.?

Yaron
 

RE: [openib-general] IB Address Translation service

2005-03-02 Thread Yaron Haviv

> -Original Message-
> From: Tom Duffy [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 02, 2005 1:02 AM
> To: Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] IB Address Translation service
> 
> [ putting back on list ]
> 
> On Wed, 2005-03-02 at 00:29 +0200, Yaron Haviv wrote:
> > Did you try RARP with IPoIB ?
> 
> I have not.
> 
> > I thought that there is some issue that it doesn't work
> 
> Currently, the rarpd only works with ethernet, but I don't see why this
> couldn't be fixed.
> 

Tom, the IPoIB HW address consists of GID+QPN+..
To issue a RARP I believe you need to supply the full HW address
to get the IP address back. How would you know the remote IPoIB QPN? Or
can you do it without a QPN?
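
To make the QPN problem concrete, here is a small C sketch of an
IPoIB-style hardware address. It assumes the 20-byte layout described
in the IPoIB draft (one reserved/flags byte, a 24-bit QPN, then the
16-byte GID); the demo_* names and the sample values are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPOIB_ALEN 20  /* 1 reserved/flags byte + 3 QPN bytes + 16 GID bytes */

/* Pack a 24-bit QPN and a 16-byte port GID into an IPoIB-style HW address. */
void demo_ipoib_hwaddr(uint8_t out[DEMO_IPOIB_ALEN], uint32_t qpn,
                       const uint8_t gid[16])
{
    assert(qpn <= 0xFFFFFF);
    out[0] = 0;                      /* reserved/flags */
    out[1] = (uint8_t)(qpn >> 16);   /* QPN, most significant byte first */
    out[2] = (uint8_t)(qpn >> 8);
    out[3] = (uint8_t)qpn;
    memcpy(out + 4, gid, 16);
}

/* A cached address only matches if the QPN is still the one in use. */
int demo_hwaddr_matches(const uint8_t a[DEMO_IPOIB_ALEN],
                        const uint8_t b[DEMO_IPOIB_ALEN])
{
    return memcmp(a, b, DEMO_IPOIB_ALEN) == 0;
}
```

Two addresses built from the same GID but different QPNs do not compare
equal, so a RARP server keyed on the full address would miss as soon as
the interface comes back with a new QPN.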

Yaron 
 

RE: [openib-general] putting in dead wood for DAPL and similarabomination

2005-03-01 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Wednesday, March 02, 2005 12:06 AM
> To: openib-general@openib.org
> Subject: [openib-general] putting in dead wood for DAPL and
> similarabomination
> 
> Please don't put in things like the address translation service or
> memory windows for DAPL folks.  The IB code in the kernel already
> has far too much unused stuff and adding more will not go past reviews
> for kernel inclusions - as will DAPL itself exactly because of such
> utter stupidities.  

Even if your approach to DAPL were right, you would still have an address
translation service in SDP, and would need one for NFS/RDMA, another
one for iSER, another one for Lustre, etc. (even if they are coded
directly to the verbs), not to mention other protocols that access the SA
(e.g. SRP, ..).

So is your idea to duplicate that functionality for all the ULPs?
Would that make the code simpler and easier to maintain?

Yaron


RE: [openib-general] IB Address Translation service

2005-03-01 Thread Yaron Haviv

Eric, let me correct some of your assumptions, which are exactly what
this API is meant to protect against; see below.

> -Original Message-
> From: Eric W. Biederman [mailto:[EMAIL PROTECTED] On Behalf Of Eric W.
> Biederman
> Sent: Tuesday, March 01, 2005 9:18 AM
> To: Yaron Haviv
> Cc: Roland Dreier; shaharf; openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> "Yaron Haviv" <[EMAIL PROTECTED]> writes:
> 
> > > -Original Message-
> > > From: [EMAIL PROTECTED] [mailto:openib-general-
> > > [EMAIL PROTECTED] On Behalf Of Roland Dreier
> > > Sent: Monday, February 28, 2005 7:13 PM
> > > To: shaharf
> > > Cc: openib-general@openib.org
> > > Subject: Re: [openib-general] IB Address Translation service
> > >
> > > This API seems overly complex and at the same time too inflexible to
> > > me.  However, rather than getting bogged down nitpicking about APIs, I
> > > think we have to take a few steps back.
> >
> > I believe the API is very flexible, but we are pretty open to hear what
> > you think is needed in addition
> >
> > > First, let's understand the problem we're trying to solve.  Who are
> > > the consumers of this address translation service?
> >
> > The first problem is that most ULPs use valid IP addresses for
> > simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc.) and someone needs to
> > resolve it to an IB address and device to use IB. This should take into
> > account cases where there are more than one HCAs in the system.
> > Preferable/optionally the ULP would like to know which partition to use
> > if there is more than one, and leverage on the IP subnetting done by
> > IPoIB.
> 
> I am confused.  In any sane network the translation is:
> Hostname -> address.
> 
> IP because it spans multiple networks does:
> Hostname -> IP address -> hw address.
> 
> IB because it can span multiple IB networks does:
> GUID+QPN -> LID + QPN.
> 
> So what is wrong with simply doing:
> Hostname -> GUID
> ???

1. In standard protocols such as SDP, iSER, NFS/RDMA, Oracle, .. (unlike
OSU MPICH) the name service is one of the standard IP name services
mapping host names to IP addresses, and the ULP accepts a destination IP
and NOT a host name.

2. The InfiniBand hardware address is a GID, not a LID. The LID is a path
attribute, introduced to avoid the slow 48-bit lookup done in Ethernet
and to enable multi-pathing. A LID is dynamically allocated, and you
may also have multiple LIDs per port.
(The OSU MPICH implementation is a bad example of IB citizenship.)

So to summarize:

Ethernet:   Host Name -> IP -> MAC Address 
InfiniBand: Host Name -> IP -> GID Address -> Path (LID, SL, ..)

So if we intend to rely on standard name services we can start with IP
(or implement a proprietary name service for Name->HW Addr if we wish).

Then we need to translate an IP to a HW address (GID/GUID) and the
equivalent of VLANs (partitions). This is provided by the
ib_at_route_by_ip call.
Internally it is based on IP and IPoIB mechanisms, similar to how
Libor implemented it in SDP (and optionally, if we see a need, on ATS).

Then in IB we need to resolve a GID to path attributes, which consist of
LID, SL/VL, MTU, etc.
The inputs are the source, destination, partition and QoS
attributes, and the result is a path. Since IB also supports
multi-pathing, a user may receive multiple paths that can be used for
high availability, performance aggregation, or source-based routing.
A path may also travel through isolated congestion domains using VLs.

The ib_at_paths_by_route call resolves a HW address plus preferences
to one or more path records that are then used by the ULP and the CM.
It can also be used by non-IP based ULPs such as SRP or MPICH. That is
why the API, unlike the current SDP implementation, is divided into two
calls: one for the HW address and one for the path.
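
As a rough illustration of the two-call split, here is a minimal
user-space C sketch. It is not the ib_at code: the demo_* names, the
structures, and the table contents are all made up, with static tables
standing in for the IPoIB/ARP lookup and for the SA query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_route {            /* stage 1 result: IP -> HW address */
    uint64_t dguid;            /* low 64 bits of the destination GID */
    uint16_t pkey;             /* partition, the IB analogue of a VLAN */
};

struct demo_path {             /* stage 2 result: GID -> path attributes */
    uint16_t dlid;
    uint8_t  sl;
    uint16_t mtu;
};

/* Stand-in for the IPoIB/ARP (or ATS) lookup. Returns 0 on success. */
int demo_route_by_ip(uint32_t dst_ip, struct demo_route *out)
{
    static const struct { uint32_t ip; struct demo_route r; } neigh[] = {
        { 0x0A000002u, { 0x0002c90108667400ull, 0xFFFF } },  /* 10.0.0.2 */
    };
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof(neigh) / sizeof(neigh[0]); i++) {
        if (neigh[i].ip == dst_ip) {
            *out = neigh[i].r;
            return 0;
        }
    }
    return -1;
}

/* Stand-in for the SA path query. Returns the number of paths found. */
int demo_paths_by_route(const struct demo_route *r, struct demo_path *paths, int max)
{
    static const struct { uint64_t guid; struct demo_path p; } sa[] = {
        { 0x0002c90108667400ull, { 0x12, 0, 2048 } },
        { 0x0002c90108667400ull, { 0x13, 0, 2048 } },  /* second LID, same port */
    };
    size_t i;
    int n = 0;

    for (i = 0; i < sizeof(sa) / sizeof(sa[0]) && n < max; i++) {
        if (sa[i].guid == r->dguid)
            paths[n++] = sa[i].p;
    }
    return n;
}
```

A non-IP ULP would skip the first call and feed its own GID and P_Key
straight into the second one, which is the reason for keeping the two
stages separate.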

Currently OSU MPICH uses a proprietary name and LID+QP assignment. It
doesn't work the standard IB way with the SA and CM, so it does not make
use of a lot of IB capabilities and is also more static and less
robust. I wouldn't use it as the example for ULP implementation.
The MPI layer, which has no idea about fabric
routing/utilization/availability, ends up determining the path.
Another simple scenario an application may require is to run MPI and NFS
on different IB VLs. Today you need to manually configure (recompile)
that in each ULP; with this proposal it can be done automatically with a
central configuration on the SM.

On the other hand SDP uses same mechanisms; however we cannot use it for
other ULP's (e.g. kDAPL), and also it is missing functional

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Tuesday, March 01, 2005 2:07 AM
To: openib-general@openib.org
Subject: RE: [openib-general] IB Address Translation service

At 11:47 AM 2/28/2005, Yaron Haviv wrote:

>It would be a mistake to attempt to use anything but IP addresses (v4 or
>v6) from an application perspective. Mapping to IB must be application
>transparent to be viable.
>
>Mike

Mike, the whole idea behind the proposed ib_at calls is to provide
semantics matching between IP and IB in a way where the applications
won't feel the difference, but will still make use of all IB
capabilities. Advanced applications can still use the advanced IB
specific functionality through the optional parameters.

ATS is just a minor option in the API (the less important one).

Yaron

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: Libor Michalek [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 01, 2005 2:04 AM
> To: Yaron Haviv
> Cc: Paul Baxter; openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> On Mon, Feb 28, 2005 at 09:55:50PM +0200, Yaron Haviv wrote:
> > From: Libor Michalek
> > >
> > > The two are not interoperable, they
> > > reside in parallel, and succeed in producing much confusion. (IMO)
> >
> > One note, the two can be made interoperable, if nodes that use IPoIB
> > register them self in the ATS database as well (which has its merits for
> > reverse resolution that cannot be satisfied by IPoIB), this way the
> > nodes that just use ATS can locate the IPoIB ones.
> 
>   This relies on each node in a fabric keeping the information between
> the two parallel methods in sync. Which leads to the question, why have
> two independent methods for getting the exact same information? The
> only logical answer is that there are some nodes which can only use
> one of the methods. In which case the two sets of data are not identical,
> because of these nodes, which succeeds in producing much confusion. Not
> to mention the race conditions between keeping a centralized database
> (ATS) in sync with the distributed mechanism. (ARP)
> 
>   For these reasons I cringe at hearing IP address and ATS in the
> same sentence, I really wish DAT had chosen a different name for
> the addresses.
> 
>   Really, we all discussed this years ago in the IETF, the merits of
> using broadcast vs. centralized data store, and a solution was
> developed. This is why open standards bodies are so useful.
> 

Libor, I agree with most of your statements here. I also advocated
using ARP-based mechanisms in the DAT calls rather than ATS,
and our DAPL implementation enables ARP-based resolution in addition to
ATS.
The one thing that ATS provides and that is not possible with ARP is
reverse resolution (GID->IP). Any ideas how to achieve that without ATS?

Protocols such as SDP and iSER pass the source IP address as part of
the CM REQ private data, so they don't really need the reverse
translation. The DAT people have tried to make a generalized mechanism.

I assume James or Arkady should comment on the need for ATS and DAPL
reverse resolution.

Another approach could be to provide ATS support to user applications
only, and eliminate the kDAPL support for those functions.
Also, I kind of like Paul's application for thin IB clients.

> > Anyway the merits of the proposed API go much beyond the use of ATS,
> > so I hope we don't just hang on that one.
> 
>   Agreed, there is certainly a lot more to discuss than just ATS and ARP.
> 

Any comments on my email explaining the forward resolution mechanisms
(IP->GID, GID->path)? (not relating to ATS)

Yaron

> 
> -Libor

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: Tom Duffy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 01, 2005 1:38 AM
> To: Yaron Haviv
> Cc: Paul Baxter; openib-general@openib.org
> Subject: RE: [openib-general] IB Address Translation service
> 
> On Mon, 2005-02-28 at 21:47 +0200, Yaron Haviv wrote:
> > And as you mentioned there is value to have the same API for different
> > resolution mechanisms, the SDP code can be altered in future to ride
> > over the proposed API, so it can be used without TCP/IP.
> 
> I am not sure you are gaining much by having SDP use straight ATS.
> Already, once the ARP table is filled with the information, it is a
> local cached lookup.


Tom, the value to Paul is not performance related, but the ability to
resolve an IP to a GID without requiring a TCP/IP implementation. I can
think of some applications that would like a thin client and still use
valid IPs.

Anyway, as I mentioned, the ATS support is just one (very minor) thing in
the API we proposed. Can I assume you don't have comments on the main
functionality in ib_at?

Yaron

RE: [openib-general] Question

2005-02-28 Thread Yaron Haviv

Ron, I believe ibnetdiscover uses direct-route MADs,
so it can work even when the fabric is not fully initialized.

Yaron

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Ronald G. Minnich
> Sent: Tuesday, March 01, 2005 12:07 AM
> To: openib-general@openib.org
> Subject: [openib-general] Question
> 
> 
> If ibnetdiscover can do stuff like this:
> hcaguids=0xc074660801c90200
> Hca 2 "H-0002c901086674c0"  # MT23108 InfiniHost Mellanox
> Technologies
> [1] "S-0002c90112c08b40"[2] # lid 0 lmc 0
> 
> 
> etc. etc.
> 
> i.e., probe all the way to the edge of the network and find things out,
> what could be going on such that opensm won't work at all? I did an svn
> update and complete rebuild friday. But opensm is still totally stuck.
> 
> I have power cycled all switches, and indeed the whole system. I did yank
> (yet another) dead power supply on one mellanox switch, but still ..
> ibnetdiscover is happy, and opensm is not.
> 
> opensm -r does not help.
> 
> I'm basically baffled. What is ibnetdiscover able to do that opensm is not
> able to do?
> 
> thanks
> 
> ron

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Libor Michalek
> Sent: Monday, February 28, 2005 8:55 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> 
>   SDP does implement a subset of the proposed functionality for
> resolving IP addresses to PathRecords which can then be used in
> a CM REQ request, plus some basic caching. All the code is isolated
> to a single file, sdp_link.c. There's really only a single entry
> point API, plus a completion function:
> 
> int sdp_link_path_lookup(u32 dst_addr,
>                          u32 src_addr,
>                          int bound_dev_if,
>                          void (*completion)(u64 id,
>                                             int status,
>                                             u32 dst_addr,
>                                             u32 src_addr,
>                                             u8  hw_port,
>                                             struct ib_device *ca,
>                                             struct ib_sa_path_rec *path,
>                                             void *arg),
>                          void *arg,
>                          u64  *id);
> 
>   The values are based on strictly what is needed by either the Linux
> routing code to resolve the address, or the IB APIs to establish the
> connection. The implementation has three stages:
> 
>   - src/dst IP address -> IPoIB net_device, IB ca, IB port, IB pkey.
>   - dst IP address and IPoIB net_device -> dst GID using IPoIB ARP
>   - dst GID -> PathRecord using ib_sa.
> 
>   A cancel function based on the 'id' parameter would be a nice to have
> but is not strictly necessary, since the lookup will eventually complete
> one way or another and any dead connection will be cleaned up at that
> point.
> 

Libor, the idea is that ib_at provides similar functionality.
Sahar looked through your SDP code prior to proposing the API.
We would like to have a common API, for all the ULPs, that provides that
functionality, specifically now that we are implementing kDAPL over
OpenIB.

To summarize the differences:

The reasons we broke it into two functions (IP->GID, GID->Path) rather
than having a single IP->Path API (as we used to have in our gen1 stack)
are:

a. some consumers will only need the 1st part (e.g. just to know which
HCA to use)
b. some may use only the 2nd part (e.g. IPoIB, SRP)
c. you can get parameters from the first part (e.g. P_Key) and decide to
override them with your own (P_Key, etc.)
d. the 2nd function provides more options for multipath, partitioning,
QoS
e. we can now more easily use different IP resolution mechanisms (ARP or
ATS) without changing the 2nd function.

We added source IP and TOS as optional parameters for the IP->GID call,
just because an IP route can be defined per src/dst/TOS, and that is
already part of Linux.
We added multipath, IB QoS, etc. because we have more than one
application that needs them today. For example, people who want to run
IPC/MPI and NFS on the same fabric may want 2 separate VLs and SLs, some
applications need APM support, some applications need source-based
routing, ..
Since there are commercial SMs that can provide all the advanced
capabilities, we want to enable the OpenIB stack to make use of them.

By default you can nest the 2 functions (call the first, and then use
its result to call the second). What you get is that the ULP will
use the HCA/port associated with the IPoIB subnet, the same
partition as the IPoIB interface, and the same QoS/SL as IPoIB.
Optionally the consumer can supply his own QoS, partitioning, etc. to the
2nd function if he knows where to take them from.
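
The default-versus-override behaviour could look roughly like the C
sketch below. The structures and the demo_build_query helper are
hypothetical and only illustrate the idea of inheriting the IPoIB values
unless the consumer explicitly supplies its own; they are not the
proposed ib_at structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_ipoib_defaults {   /* what the first call reports for the IPoIB interface */
    uint16_t pkey;
    uint8_t  sl;
};

struct demo_path_prefs {       /* optional per-consumer overrides for the second call */
    bool     has_pkey;
    uint16_t pkey;
    bool     has_sl;
    uint8_t  sl;
};

struct demo_path_query {       /* what is finally handed to the path lookup */
    uint16_t pkey;
    uint8_t  sl;
};

/* Start from the IPoIB values; apply only the overrides that are present. */
struct demo_path_query demo_build_query(const struct demo_ipoib_defaults *d,
                                        const struct demo_path_prefs *prefs)
{
    struct demo_path_query q;

    assert(d != NULL);
    q.pkey = d->pkey;
    q.sl = d->sl;
    if (prefs != NULL && prefs->has_pkey)
        q.pkey = prefs->pkey;
    if (prefs != NULL && prefs->has_sl)
        q.sl = prefs->sl;
    return q;
}
```

Passing NULL preferences gives the plain nested behaviour; a consumer
that only cares about QoS sets has_sl and leaves the partition alone.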

An example of what you can get with the default mode:
You define a few partitions in the fabric (central configuration), each
with its IP subnet and SL (from the MCRecord),
and then just use different IP subnets for isolation (partitions or VLs).
There is NO need for any manual/local configuration on the host side, and
your IPC and storage are now running on separate VLs and/or partitions.
What we get is something we can explain to users and developers who
haven't read the 1000s of pages of the IB book and don't regularly sit
in IBTA meetings.

Yaron


RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Libor Michalek
> Sent: Monday, February 28, 2005 9:49 PM
> To: Paul Baxter
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> The two are not interoperable, they
> reside in parallel, and succeed in producing much confusion. (IMO)

One note: the two can be made interoperable if nodes that use IPoIB
register themselves in the ATS database as well (which has its merits for
reverse resolution, which cannot be satisfied by IPoIB). This way the
nodes that just use ATS can locate the IPoIB ones.

That is how it works (successfully) in the Voltaire gen1 stack.

Anyway, the merits of the proposed API go much beyond the use of ATS,
so I hope we don't just hang on that one.

Yaron

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Paul Baxter
> Sent: Monday, February 28, 2005 9:32 PM
> To: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> Having now just read Yaron's reply, I am even more convinced that this is
> the right way to go albeit I can't comment on the API etc (Could someone
> explain the differences in using ARP and ATS. )

Paul,

ATS (Address Translation Service) is based on each node registering a
service record in the SM/SA that ties its GID and P_Key to its IP
address.
When a node wants to map an IP address to an IB address, it issues an SA
query to the SM/SA with the IP, which returns GID+P_Key values that can
then be used by the ULP.

ATS is a standard defined by DAT and recently also by ICSC.

As I mentioned, in the IP to GID API you can specify whether to resolve
based on the IP infrastructure (like the one Libor described), based on
ATS, or Default (first try IP/ARP, then ATS).
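
The three resolution modes, and the reverse lookup that only the ATS
records can answer, can be sketched in C as follows. Everything here is
illustrative: the demo_* names, the record layout, and the table
contents are assumptions, and the two arrays merely stand in for the
ARP cache and the SA service records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LEN(a) (sizeof(a) / sizeof((a)[0]))

enum demo_resolve_mode { DEMO_MODE_ARP, DEMO_MODE_ATS, DEMO_MODE_DEFAULT };

struct demo_record { uint32_t ip; uint64_t guid; uint16_t pkey; };

/* Stand-ins for the two data sources; the contents are made up. */
const struct demo_record demo_arp_table[] = {
    { 0x0A000002u, 0x1111ull, 0xFFFF },
};
const struct demo_record demo_ats_table[] = {
    { 0x0A000002u, 0x1111ull, 0xFFFF },
    { 0x0A000003u, 0x2222ull, 0xFFFF },  /* thin client, registered in ATS only */
};

static const struct demo_record *
demo_find(const struct demo_record *t, size_t n, uint32_t ip)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < n; i++)
        if (t[i].ip == ip)
            return &t[i];
    return NULL;
}

/* Forward resolution: IP -> GID + P_Key, honouring the requested mode. */
const struct demo_record *demo_resolve_ip(uint32_t ip, enum demo_resolve_mode mode)
{
    const struct demo_record *r = NULL;

    if (mode == DEMO_MODE_ARP || mode == DEMO_MODE_DEFAULT)
        r = demo_find(demo_arp_table, DEMO_LEN(demo_arp_table), ip);
    if (r == NULL && (mode == DEMO_MODE_ATS || mode == DEMO_MODE_DEFAULT))
        r = demo_find(demo_ats_table, DEMO_LEN(demo_ats_table), ip);
    return r;
}

/* Reverse resolution: GID -> IP, answered from the ATS records only. */
const struct demo_record *demo_reverse(uint64_t guid)
{
    size_t i;

    for (i = 0; i < DEMO_LEN(demo_ats_table); i++)
        if (demo_ats_table[i].guid == guid)
            return &demo_ats_table[i];
    return NULL;
}
```

In this sketch a node that registered only in ATS is invisible in ARP
mode but is found in Default mode, which is the fallback order
described above.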

And as you mentioned, there is value in having the same API for different
resolution mechanisms. The SDP code can be altered in the future to ride
over the proposed API, so it can be used without TCP/IP.

Yaron

RE: [openib-general] IB Address Translation service

2005-02-28 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Monday, February 28, 2005 7:13 PM
> To: shaharf
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
> 
> This API seems overly complex and at the same time too inflexible to
> me.  However, rather than getting bogged down nitpicking about APIs, I
> think we have to take a few steps back.

I believe the API is very flexible, but we are pretty open to hear what
you think is needed in addition.

> First, let's understand the problem we're trying to solve.  Who are
> the consumers of this address translation service?

The first problem is that most ULPs use valid IP addresses for
simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc.) and someone needs to
resolve them to an IB address and device in order to use IB. This should
take into account cases where there is more than one HCA in the system.
Preferably/optionally the ULP would like to know which partition to use
if there is more than one, and leverage the IP subnetting done by
IPoIB.

It is possible to replicate the same code you have in SDP (which is also
not complete) across all ULPs, but I assume a better way is to provide it
in one central place.
There are also two proposed address resolution mechanisms, one is ARP,
used by SDP, and one is ATS, used by some DAPL consumers, and we believe
it is better to combine them under the same API.

The second problem relates to mapping an IB GID to one or more path
records.
This is also something needed by ALL ULPs. Today each ULP provides the
minimal subset of path resolution functionality without taking into
account topics such as partitioning, QoS, source routing and
multi-pathing.
Some of these require special SA queries (such as the SA Multipath
Record query and the QoSPath query).
I don't think it makes sense to put all this functionality into each ULP
either.

Then we can also discuss whether it makes sense to have each path
resolution call go to the SA, or whether it makes more sense to cache
those paths.
And if we cache, doesn't it make more sense to cache/invalidate the
routes for all ULPs in one place rather than implementing it in each ULP?
Also, I am not sure how a 1000-node cluster functions without the caching.
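
A shared path cache of the kind discussed here might be sketched in C
like this. It is a toy, direct-mapped cache with made-up names (demo_*)
and a fake SA query that just counts how often it is called; a real
implementation would need locking, collision handling, and an actual
PathRecord query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 64

struct demo_cached_path {
    bool     valid;
    uint64_t dguid;
    uint16_t pkey;
    uint16_t dlid;
};

struct demo_path_cache {
    struct demo_cached_path slot[DEMO_CACHE_SLOTS];
    unsigned sa_queries;           /* how often we had to go to the SA */
};

size_t demo_slot(uint64_t dguid, uint16_t pkey)
{
    return (size_t)((dguid ^ pkey) % DEMO_CACHE_SLOTS);
}

/* Stand-in for an SA path query; the returned LID is made up. */
uint16_t demo_sa_query(struct demo_path_cache *c, uint64_t dguid)
{
    c->sa_queries++;
    return (uint16_t)(dguid & 0xFFFF);
}

/* Shared by all ULPs: hit the SA only on a miss. */
uint16_t demo_lookup(struct demo_path_cache *c, uint64_t dguid, uint16_t pkey)
{
    struct demo_cached_path *e;

    assert(c != NULL);
    e = &c->slot[demo_slot(dguid, pkey)];
    if (!(e->valid && e->dguid == dguid && e->pkey == pkey)) {
        e->dlid = demo_sa_query(c, dguid);
        e->dguid = dguid;
        e->pkey = pkey;
        e->valid = true;
    }
    return e->dlid;
}

/* Called once, e.g. on a fabric change, on behalf of every ULP. */
void demo_invalidate(struct demo_path_cache *c, uint64_t dguid, uint16_t pkey)
{
    struct demo_cached_path *e = &c->slot[demo_slot(dguid, pkey)];

    if (e->valid && e->dguid == dguid && e->pkey == pkey)
        e->valid = false;
}
```

With one cache in a central place, a single invalidation refreshes the
path for every ULP, and repeated connections to the same destination
cost one SA query instead of one per ULP.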
 
And the last problem is related to reverse resolution from IB to IP
addresses, which is needed for DAPL as well as for different management
and diagnostic tools that want to know which node/port is really
behind a given GID address.

So how would you suggest we go about it?
Duplicate all of that in each ULP?
Refrain from implementing advanced routing, partitioning, QoS (we can't
really maintain all that advanced code for each ULP)?

Our idea is to provide those few helper functions that enable people to
make full use of IB and its features without reading the whole IB spec
and holding a PhD.
If you remove all the remarks from the library, you will see it is very
slim, and to my understanding it includes all the relevant input and
output parameters for each of the 3 functions I mentioned.

As Shahar mentioned, this is just a proposal, and if you see anything
missing in the API, or a better way to address the requirements I just
listed, I'm happy to hear it.

The API doesn't define the implementation, which we can discuss once we
agree on the functionality and interfaces, and you have some valid
questions there that need to be addressed.

Yaron
 
> Second, let's come up with the right architecture to solve the
> problem.  Are we implementing a library in userspace or a kernel
> module?  Do we have a single cache or do we need multiple caching
> policies?  And so on...
> 
> Finally, we can design the API.
> 
> Thanks,
>   Roland
> 
> 
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

2005-02-11 Thread Yaron Haviv

Just to add: there is a Lustre NAL over kDAPL in development,
and a few other application-specific protocols done over kDAPL that I know of.
All the protocols Arkady mentioned can work on both RDMA technologies
and were designed that way (I'm familiar with their code and
architecture).

Another great benefit of kDAPL is that it simplifies the Verbs &
Access Layer API, making the ULPs simpler to implement: a lot of
common functionality is done by a shared library (kDAPL),
with a socket-like connection establishment flow, etc.

If we want IB to be successful, we need to find a way for software
developers to easily build implementations over it (even if not all of
the applications are open and part of the Linux tree). Forcing all Linux
RDMA developers to code to Verbs, CM, SA, ... is probably not the best
approach (the IB spec is many pages, as you all know).

For all the guys worried about performance degradation: in the latest Verbs
vs. DAPL benchmark we did, we got exactly the same BW and ONLY a 200ns latency
difference.

There is agreement that the current kDAPL API and implementation are not
Linux friendly and, as mentioned before, a bunch of people volunteered at
Sonoma to do the work involved in changing it, and agreed to make the kDAPL
API different from uDAPL and more suitable for the kernel.

Yaron

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Friday, February 11, 2005 1:49 AM
> To: Libor Michalek; Matt Leininger
> Cc: Christoph Hellwig; openib-general@openib.org; Tom Duffy
> Subject: RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB
> Workshop
> 
> For kDAPL:
> The iSER has been submitted to Open Ib by Voltaire already.
> NFS-RDMA is at http://sourceforge.net/projects/nfs-rdma/.
> 
> For uDAPL: Oracle, DB2 and MPI.
> I am not aware if there is an open source MPI version on uDAPL.
> 
> These are publicly known.
> 
> As far as changing the uDAPL or kDAPL APIs:
> There are applications already written to them.
> There are implementations of these APIs on other platforms besides
> Linux.
> It is in nobody's interest to splinter the user community.
> We need the same API on all platforms.
> If there is a good technical reason to change some specific APIs we
> should consider it.
> But the "burn the spec" approach is not a rational one.
> If we need to change the implementation or some definitions in header
> files, that is feasible.
> 
> As far as other transports: as people already mentioned, there is iWARP
> (IETF RDDP).
> IBM talked at the BOF about RNIC PI, which is being developed as a level
> of abstraction on the lower end to "discover" all the needed info about
> the RNIC/HCA.
> It is still not ready, so we will start with gen2.
> But let's not lose sight of what DAPL brings:
> OS independent,
> Transport independent,
> RDMA APIs!!!
> 
> Thanks for jumping on the code so quickly.
> Arkady
> Chair of DAT Collaborative
> 
> Arkady Kanevsky   email: [EMAIL PROTECTED]
> Network Appliance phone: 781-768-5395
> 375 Totten Pond Rd.  Fax: 781-895-1195
> Waltham, MA 02451-2010  central phone: 781-768-5300
> 
> 
> 
> > -Original Message-
> > From: Libor Michalek [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, February 10, 2005 4:03 PM
> > To: Matt Leininger
> > Cc: Christoph Hellwig; openib-general@openib.org; Tom Duffy
> > Subject: Re: FW: [openib-general] Minutes from DAPL BOF at
> > OpenIB Workshop
> >
> >
> > On Thu, Feb 10, 2005 at 12:36:39PM -0800, Matt Leininger wrote:
> > > On Thu, 2005-02-10 at 12:27 -0800, Grant Grundler wrote:
> > > > On Thu, Feb 10, 2005 at 12:05:58PM -0800, Matt Leininger wrote:
> > > > >   uDAPL - Oracle, MPI
> > > > >   kDAPL - iSER, NFS over RDMA, Lustre?
> > > >
> > > > Lustre will use Sandia Portals AFAIK.
> > > > Anyone know what Portals will use?
> > > > They might directly program to VAPI or something.
> > > >
> > >   There will be a Portals over verbs.  At some point there may be a
> > > Portals over kDAPL to support both RDMA ethernet and IB.
> >
> >   Yup, that's one of the bigger questions: can it abstract
> > away the differences between two different RDMA technologies?
> > Having RDMA ethernet and IB providers for kDAPL is
> > insufficient; one would need to show an actual, non-trivial
> > protocol that works on top of either provider with little or
> > no modification/ifdef'ing.
> >
> > -Libor

[openib-general] SDP socket address family

2004-10-14 Thread Yaron Haviv

There seems to be a conflict between the SDP socket address family
number currently in use (26) and the current Linux kernel, which already
allocates address family number 26 to the LLC protocol.

Should we change it from 26, and if so, to what?
 
Below are some related header-file snippets:
 
SuSE-9.1  /usr/include/linux/socket.h:
---
#define AF_IRDA 23  /* IRDA sockets */
#define AF_PPPOX    24  /* PPPoX sockets    */
#define AF_WANPIPE  25  /* Wanpipe API Sockets */
#define AF_LLC  26  /* Linux LLC    */
#define AF_BLUETOOTH    31  /* Bluetooth sockets    */
#define AF_MAX  32  /* For now.. */
 

Voltaire's sdp/sdp-sockets/sdp-sockets.h:
---
# define AF_IBT  26
 

TopSpin's infiniband/ulp/sdp/sdp_inet.h:
---
/*
 * constants shared between user and kernel space.
 */
#define AF_INET_SDP 26 /* SDP socket protocol family */
#define AF_INET_STR "AF_INET_SDP"  /* SDP enabled enviroment variable */
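
The collision can be checked mechanically against the table quoted
above. The sketch below is only an illustration: it knows nothing
beyond the entries shown in the SuSE-9.1 snippet, AF_SDP_CANDIDATE and
its value 27 are hypothetical placeholders (not an allocated or agreed
number), and af_conflict() is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the SuSE-9.1 linux/socket.h snippet above. */
#define KERN_AF_LLC  26
#define KERN_AF_MAX  32

/* Hypothetical candidate; 27 is a placeholder, not an allocated number. */
#define AF_SDP_CANDIDATE 27

/* Fail the build if the candidate repeats the known collision or does
 * not fit below AF_MAX. */
#if AF_SDP_CANDIDATE == KERN_AF_LLC
#error "candidate address family collides with AF_LLC"
#endif
#if AF_SDP_CANDIDATE >= KERN_AF_MAX
#error "candidate address family must be below AF_MAX"
#endif

struct af_entry {
    int         num;
    const char *name;
};

/* Only the families listed in the quoted header; other numbers are
 * simply unknown to this table. */
static const struct af_entry quoted_afs[] = {
    { 23, "AF_IRDA" },
    { 24, "AF_PPPOX" },
    { 25, "AF_WANPIPE" },
    { 26, "AF_LLC" },
    { 31, "AF_BLUETOOTH" },
};

/* Return the name of the quoted family that already uses 'af', or NULL
 * if none of the quoted entries uses it. */
const char *af_conflict(int af)
{
    assert(af >= 0);   /* address family numbers are non-negative */
    for (size_t i = 0; i < sizeof(quoted_afs) / sizeof(quoted_afs[0]); i++)
        if (quoted_afs[i].num == af)
            return quoted_afs[i].name;
    return NULL;
}
```

With this table, af_conflict(26) reports AF_LLC for both the Voltaire
AF_IBT and the TopSpin AF_INET_SDP definitions. A NULL result only means
"not in the quoted snippet"; a real choice still has to be checked
against the full kernel header.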
 
Yaron


RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER

2004-09-30 Thread Yaron Haviv

As I mentioned before, I think the best approach in the long run is to
have a well-known loopback LID (one that stays as an alias even after
the port has changed its LID, so as not to break apps), just like in any
other stack.

From a short investigation I once did, I think it is possible to create
one even on the current Mellanox HW by leveraging the multicast support
with small firmware changes; maybe Mellanox can comment on that.

It is also possible to leverage APM (plus the SMI port change events)
if we don't want to deal with the HCAs.
In any case, we want the apps to be able to recover gracefully from any
RC failure (not just LID changes).

Doing manual configuration on each host violates the whole idea of zero
configuration and utility computing that we all advocate.

Yaron

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Friday, October 01, 2004 1:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER

At 03:40 PM 9/30/2004, David M. Brean wrote:

The IBA provides two mechanisms for updating subnet management data:

1) through the verbs - see Modify HCA (section 11.2.1.3)
2) through Subnet management packets (SMPs) - see Subnet Management
Class (section 14.2)

The IBA only supports updating the LID via SMPs (#2 above) and an entity
using SMPs must have the M_Key.  If that entity doesn't have the M_Key,
then it can't reliably change the LID.

In addition, the IBA allows an endnode to request, through the verbs
interface provided for the "node reinitialization" (see 14.4.4)
mechanism, that subnet management state, such as the LID, be preserved, when a
port transitions through the DOWN state.  However, the SM may not honor
that request so the endnode must handle that possibility because LID assignment
policy is owned by the SM.  Furthermore, this mechanism is used on ports
that have previously been initialized by the SM (maybe that's why it's called
the reinitialization function :)).

Given the mechanisms in the specification, I think that it's possible to have IB
clients use loopback, even under the endnode power-up scenario, while the port
is not in the ACTIVE state, and have them continue without disruption when the
port is made ACTIVE on the subnet by the SM, by using the reinitialization
mechanism.  This is a very useful mechanism for various failover
situations.


This is a reasonable approach, where the loopback LID being used is updated upon
the port being initialized (akin to solving this in the CI but still allowing
CM to work with a known LID).  It avoids any complexity in the SM having to
preserve a LID that may not be optimal or potentially unique within the
subnet.

I'm not sure this would work, but it seems to me that the APM mechanism could be
used to set up an alternate path with the newly configured LID and then transfer
the connection to it.  It may take a bit of work in the CM, as APM is nominally
set up during these exchanges.

There is no current IBA mechanism or protocol for an endnode to set
just the LID, even if it had the M_Key, and have the SM preserve that value.


Agreed.

Mike

-David

Roland Dreier wrote:

I don't see anything in the spec that forbids a CA from having an
arbitrary value in PortInfo:LID after initialization but before the SM
discovery (please correct me if I missed something).  I also don't see
anything that forbids an SM implementation from providing a mechanism
for preserving the LIDs it finds or administratively assigning LIDs.

Of course none of this is required but I don't see a problem with
allowing it.




RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER

2004-09-29 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Thursday, September 30, 2004 3:13 AM
> To: Michael Krause
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fwd: Re: [openib-general] static LID
> computation with TS_HOST_DRIVER
> 
> Michael> The SM is the only entity that is supposed to assign LID
> Michael> as well as the subnet prefix.  The SM should not trust
> Michael> any CA / switch configuration if it has not configured it
> Michael> thus should wipe it out and replace it with what it deems
> Michael> best.
> 
> I don't see anything in the spec that forbids a CA from having an
> arbitrary value in PortInfo:LID after initialization but before the SM
> discovery (please correct me if I missed something).  I also don't see
> anything that forbids an SM implementation from providing a mechanism
> for preserving the LIDs it finds or administratively assigning LIDs.
> 

While I agree that other SMs in a recovery/merge phase should try to
preserve the LIDs, I think a CA shouldn't set its own LID, just as it is
not supposed to change its own P_Key table, because it is not aware of
the policy and/or the bigger picture.
Applications should be designed to deal with LID changes and other RC
connection failures.

But anyway, out of curiosity: how do you generate a unique LID (locally,
on the host) for every node in a large fabric when the
ports are down and the nodes don't talk to each other? (I hope not
through Ethernet :))

And how do you anticipate the LMC value (LID spacing)?
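
The LMC concern can be shown with a little arithmetic: a port with LMC
value n answers to 2^n consecutive LIDs starting at a base LID whose low
n bits are zero. The helpers below are a sketch for illustration; the
function names are made up and not taken from any stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of LIDs a port answers to for a given LMC (0..7): 2^LMC. */
unsigned lids_per_port(unsigned lmc)
{
    assert(lmc <= 7);   /* LMC is a 3-bit field */
    return 1u << lmc;
}

/* A base LID must have its low LMC bits clear. */
bool base_lid_is_aligned(uint16_t base_lid, unsigned lmc)
{
    return (base_lid & (lids_per_port(lmc) - 1)) == 0;
}

/* True if the LID ranges of two ports overlap. */
bool lid_ranges_overlap(uint16_t base_a, unsigned lmc_a,
                        uint16_t base_b, unsigned lmc_b)
{
    unsigned start_a = base_a;
    unsigned start_b = base_b;
    unsigned end_a = start_a + lids_per_port(lmc_a) - 1;
    unsigned end_b = start_b + lids_per_port(lmc_b) - 1;

    return start_a <= end_b && start_b <= end_a;
}
```

Two hosts that locally pick the adjacent LIDs 0x10 and 0x11 are fine
with LMC 0, but the ranges overlap as soon as the SM configures LMC 1
on the first port, and 0x11 is not even a valid base LID for LMC 1. A
host cannot know this in advance because the LMC is the SM's decision.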

Yaron 

RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER

2004-09-29 Thread Yaron Haviv

I agree with Dave that a static LID is problematic and we should think of
other short- and longer-term alternatives to it.
(There are many cases where the SM may dictate a non-random LID
allocation policy, e.g. LMC configuration changes, subnet merge, ..., and
the HCA is not aware of it.)

I believe the need for it comes from applications that want to talk
to some kind of loopback adapter without depending on the port state,
or even before the port is up.

A better solution, which the IBTA needs to look at, is creating a well-known
loopback LID value that apps use when they want to talk locally (like IP
127...).

It may even be feasible to implement something on the existing HCA HW
(by using one of the unused multicast LIDs and some firmware changes).
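
As a rough sketch of the idea, the code below classifies a LID by the
standard IBA ranges and treats one value from the multicast range as a
loopback alias. The value 0xFFFE, the name LOOPBACK_LID and the
software substitution in resolve_dlid() are assumptions made up for
illustration; no such LID is defined by the IBTA, and the suggestion
above is to do the aliasing in HCA firmware, not in software.

```c
#include <assert.h>
#include <stdint.h>

enum lid_class {
    LID_RESERVED,    /* 0x0000 */
    LID_UNICAST,     /* 0x0001 - 0xBFFF */
    LID_MULTICAST,   /* 0xC000 - 0xFFFE */
    LID_PERMISSIVE   /* 0xFFFF */
};

enum lid_class lid_classify(uint16_t lid)
{
    if (lid == 0x0000)
        return LID_RESERVED;
    if (lid <= 0xBFFF)
        return LID_UNICAST;
    if (lid <= 0xFFFE)
        return LID_MULTICAST;
    return LID_PERMISSIVE;
}

/* Hypothetical well-known loopback LID, picked arbitrarily from the
 * multicast range for this example only. */
#define LOOPBACK_LID 0xFFFE

#if LOOPBACK_LID < 0xC000 || LOOPBACK_LID > 0xFFFE
#error "the example loopback alias must sit in the multicast LID range"
#endif

/* Map the alias to whatever LID the local port holds right now, so an
 * application that addressed LOOPBACK_LID survives a LID change. */
uint16_t resolve_dlid(uint16_t dlid, uint16_t port_lid)
{
    assert(lid_classify(port_lid) == LID_UNICAST);
    return dlid == LOOPBACK_LID ? port_lid : dlid;
}
```

An application would always address LOOPBACK_LID for local traffic; if
the SM later moves the port from LID 0x12 to 0x40, the alias still
resolves to the local port and nothing has to be reconfigured.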

Yaron

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, September 29, 2004 5:24 PM
> To: David M. Brean
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fwd: Re: [openib-general] static LID computation
> with TS_HOST_DRIVER
> 
> David> Ok.  How does the port inform the SM that it has a
> David> "preferred" LID?
> 
> The port will already have a LID assigned when the SM discovers it.
> My understanding is that the SM is "encouraged" to preserve a port's
> LID if it doesn't conflict with any other LIDs, and this is what we're
> relying on.
> 
>  - Roland

RE: [openib-general] Re: [openib-commits] r894 - gen2/branches/roland-merge/src/linux-kernel/infiniband/ulp/ipoib

2004-09-27 Thread Yaron Haviv

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Monday, September 27, 2004 9:18 PM
> To: Tom Duffy
> Cc: [EMAIL PROTECTED]
> Subject: Re: [openib-general] Re: [openib-commits] r894 -
> gen2/branches/roland-merge/src/linux-kernel/infiniband/ulp/ipoib
> 
> Tom> Doh.  You beat me to the punch.  I was working on the same
> Tom> thing (although, I was trying to do it with a kthread).
> 
> Sorry dude...
> 
> Tom> What do you think is the next step on the TODO that I could
> Tom> start working on?  Don't want to step on your toes...
> 
> I think I've done all the straightforward work on IPoIB now.  We can
> try to figure out how to make it a "native" driver now (ie use the
> full 20 byte HW address instead of hashing down to 6 bytes, etc).  I
> had some inconclusive discussions on [EMAIL PROTECTED] about this
> last week but I still don't know how to do it.
> 

Exposing a 20 byte HW address to the upper stack may result in some
unexpected behavior with various networking tools such as sniffers,
a variety of DHCP servers, and a few other protocols that use the
hardware addresses.

I suggest we don't rush to incorporate the 20 byte support and instead focus on
more urgent matters. In any case, when we do get to it, we should allow the user
to configure IPoIB to work in the 6 byte mode, to keep
compatibility with those apps/protocols.
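
For reference, the sketch below lays out the 20 byte address (one
reserved/flags byte, a 24-bit QPN and the 16 byte port GID, as in the
IPoIB link-layer address format) and folds it into a 6 byte,
Ethernet-style address. The fold is a made-up XOR reduction used only to
illustrate the 6 byte mode; it is not the hashing used by any existing
stack, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HW_ADDR_LEN   20   /* 1 reserved/flags + 3 QPN + 16 GID */
#define MAC_ADDR_LEN   6

/* Build the 20 byte IPoIB hardware address. */
void hw_addr_build(uint8_t out[HW_ADDR_LEN], uint32_t qpn,
                   const uint8_t gid[16])
{
    assert(qpn <= 0xFFFFFF);            /* QPN is 24 bits wide */
    out[0] = 0;                         /* reserved / flags */
    out[1] = (uint8_t)(qpn >> 16);
    out[2] = (uint8_t)(qpn >> 8);
    out[3] = (uint8_t)qpn;
    memcpy(out + 4, gid, 16);
}

/* Hypothetical reduction to 6 bytes: XOR-fold all 20 bytes, then mark
 * the result as a locally administered unicast address. Distinct 20
 * byte addresses can collide, which is the price of the 6 byte mode. */
void hw_addr_fold6(uint8_t out[MAC_ADDR_LEN], const uint8_t hw[HW_ADDR_LEN])
{
    memset(out, 0, MAC_ADDR_LEN);
    for (int i = 0; i < HW_ADDR_LEN; i++)
        out[i % MAC_ADDR_LEN] ^= hw[i];
    out[0] = (uint8_t)((out[0] | 0x02) & 0xFE);
}
```

A configuration switch of the kind suggested above would simply choose
which of the two forms is reported to the upper stack, so that tools
expecting 6 byte addresses keep working.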

Yaron
 