Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
> -----Original Message-----
> From: rick [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 03, 2006 4:54 PM
> To: Michael Krause
> Cc: Fabian Tillier; Yaron Haviv; Roland Dreier (rdreier); Kuchimanchi, Ramachandra; openib-General
> Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
>
> For what it's worth: As a customer who is using the SS stack - we were
> more than pleased that we could achieve IPOIB (and RDS) failover without
> using the bonding driver. I believe this is direct result of the Virtual
> NIC approach SS is using.

Rick,

If such functionality (without the bonding driver) is needed, it can also be implemented in IPoIB (we had it in our old stack). It has no direct relation to the Virtual NIC. It may even be preferable to have it in IPoIB rather than in a proprietary gateway driver, so that IB nodes in the same fabric can use that functionality as well.

The only point I'm making is that anyone can add an overlay driver for his proprietary HW as he likes and put it in the OFED distribution, but if this is becoming an internal portion of the OpenFabrics kernel, then:

1. Let's look at how we solve the problems from a more general perspective
2. Let's not duplicate code where we can avoid it
3. Let's make sure it's documented and reviewed (both the code and the architecture)

We have kept those standards for all other solutions; I think it's just as fair to demand them in this case as well.

Yaron

> Michael Krause wrote:
>
> >Silverstorm is executing a usage model that the IBTA used to develop the IB
> >protocols. What is the problem with that? If it works and integrates
> >into the stack, then this seems like an appropriate bit of functionality to
> >support. The fact that one can use a standard ULP to communicate to a TCA
> >as an alternative which is supported by the existing stack is a customer
> >product decision at the end of the day. If Silverstorm or any IHV can
> >show value and that it works in the stack, then it seems appropriate to
> >support. Isn't that a fundamental principle of being an open source effort?
> >
> >Mike
> >
> >At 12:31 PM 10/3/2006, Fabian Tillier wrote:
> >
> >>Hi Yaron,
> >>
> >>On 10/3/06, Yaron Haviv <[EMAIL PROTECTED]> wrote:
> >>
> >>>I'm trying to figure out why this protocol makes sense
> >>>As far as I understand, IPoIB can provide a Virtual NIC functionality
> >>>just as well (maybe even better), with two restrictions:
> >>>1. Lack of support for Jumbo Frames
> >>>2. Doesn't support protocols other than IP (e.g. IPX, ..)
> >>>
> >>Whether to use a router or virtual NIC approach for connectivity to
> >>Ethernet subnets is a design decision. We could argue until we are
> >>blue in the face about which architecture is "better", but that's
> >>really not relevant.
> >>
> >>>I believe we should first see if such a driver is needed and if IPoIB
> >>>UD/RC cannot be leveraged for that, maybe the Ethernet emulation can
> >>>just be an extension to IPoIB RC, hitting 3 birds in one stone (same
> >>>infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all
> >>>nodes not just Gateways)
> >>>
> >>You're joking right? Are you really arguing that SilverStorm should
> >>not develop a driver to support its existing devices? This really
> >>isn't complicated:
> >>
> >>1). SilverStorm has a virtual NIC hardware device.
> >>2). SilverStorm is committed to support OpenFabrics.
> >>
> >>The above two statements lead to the following conclusion: SilverStorm
> >>needs a driver for its devices that works with the OpenFabrics stack.
> >>This is totally orthogonal to and independent of working on IPoIB RC
> >>or any IETF efforts to define something new.
> >>- Fab

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Rimmer, Todd
> Sent: Monday, October 02, 2006 5:46 PM
> To: Scott Weitzenkamp (sweitzen); Kuchimanchi, Ramachandra; Roland Dreier (rdreier)
> Cc: openib-General
> Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
>
> > From: Scott Weitzenkamp (sweitzen)
> > Sent: Monday, October 02, 2006 4:22 PM
> > To: Kuchimanchi, Ramachandra; Roland Dreier (rdreier)
> > Cc: openib-General
> > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
> >
> > Is this communication protocols documented anywhere? How does this
> > feature compare to IPoIB and SDP?
>
> This protocol is distinct from IPoIB and SDP.
>
> In brief:
>
> IPoIB treats an IB fabric as a LAN. As such it has UD semantics.
>
> SDP essentially treats the HCA as a TOE and leverages IB's RC semantics
> to emulate TCP/IP SOCK_STREAM sockets.
>
> This protocol implements the interface to communicate to the SilverStorm
> VEx Ethernet Virtual IO Controllers. The VEx card presents a true
> Ethernet NIC to the host and essentially treats IB as an IO bus to allow
> a host CPU to use the VEx card as its NIC.
>
> Todd Rimmer

Todd,

I'm trying to figure out why this protocol makes sense. As far as I understand, IPoIB can provide Virtual NIC functionality just as well (maybe even better), with two restrictions:

1. Lack of support for jumbo frames
2. No support for protocols other than IP (e.g. IPX)

Restriction 1 can easily be addressed using IPoIB RC, and the question is whether 2 is really a problem (how many people use IPX or AppleTalk these days?). And if 2 is a problem, why isn't it addressed in the greater scope of supporting Ethernet emulation between any IB nodes, and not just from a host to a gateway device?

If this is a real requirement, why hasn't SilverStorm worked with the industry and standardization bodies such as IBTA or IETF to come up with a standard and interoperable way to address it, rather than just trying to push a proprietary driver and a point solution into the kernel?

I believe we should first see whether such a driver is needed and whether IPoIB UD/RC cannot be leveraged for that. Maybe the Ethernet emulation can just be an extension to IPoIB RC, hitting three birds with one stone (same infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all nodes, not just gateways).

Yaron
RE: [openib-general] iSER & FC-SAN performance
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Mohit Katiyar, Noida
> Sent: Friday, March 10, 2006 11:36 AM
> To: openib-general@openib.org
> Subject: [openib-general] iSER & FC-SAN performance
>
> Hi All,
> Are there any performance related data of iSER is available or some test
> results that were performed on iSER?
> Are there any kinds of operational issues in the performance of iSER in
> an general FC-SAN environment? I am also looking for some reference
> material of operations of iSER on FC Gateway environment?
> Thanks in advance
>
> Mohit Katiyar

Mohit,

Your question should be broken into two: one is iSER performance and the other is IB-to-FC gateway performance. Both are very implementation dependent.

The iSER initiator uses zero copy and can map a large SCSI command to a single send+RDMA transaction. This allows for very high bandwidth and low CPU utilization as the message size grows. The performance we saw is >900 MB/s per initiator; this was tested with a 4X SDR link (1000 MB/s capable), and with DDR it may achieve more.

As for IB-FC gateway performance, it depends on the HW architecture of the gateway rather than on whether it's iSER or SRP, e.g. the memory bandwidth capacity of the gateway, PCI-X vs. PCI Express, CPU capacity, and FC ports. When targeting larger messages, the memory/bus bandwidth becomes much more critical than the CPU capacity.

As an example, an iSER-FCP gateway would typically implement a store & forward design (other designs can be achieved as well): the SCSI command would be intercepted, data would be fetched to memory (using RDMA), and a SCSI transaction would be performed on the FC side (where the FC adapter will fetch the data using DMA). In a good gateway implementation multiple I/Os can flow in parallel (asynchronously); this can sustain the 900 MB/s in some architectures. In addition, multiple gateways can be aggregated to scale bandwidth to many GB/s of data, while still enjoying the single name space and even emulating a single session by leveraging iSCSI mechanisms.

The different options for iSER to FC gateways would include:

1. Voltaire FCR product
2. FalconStor NSS product on a PC platform (with IB & FC adapters)

You can address the different vendors to get more details on their products & performance; I can point you to the right contacts offline.

Yaron
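For readers wondering where the "1000 MB/s capable" figure for a 4X SDR link comes from, here is a minimal arithmetic sketch. It assumes the usual InfiniBand SDR/DDR parameters (2.5 and 5 Gb/s signalling per lane, 8b/10b encoding); the function name is made up for illustration and is not part of any OpenIB code.

```python
def ib_data_rate_mb_per_s(lanes: int, signal_gbps: float) -> float:
    """Usable data rate in MB/s for an 8b/10b-encoded link (illustrative helper)."""
    # 8b/10b encoding carries 8 data bits in every 10 signalled bits.
    data_gbps = lanes * signal_gbps * 8 / 10
    # Convert Gb/s to MB/s (1 Gb/s = 1000 Mb/s, 8 bits per byte).
    return data_gbps * 1000 / 8


sdr_4x = ib_data_rate_mb_per_s(lanes=4, signal_gbps=2.5)
ddr_4x = ib_data_rate_mb_per_s(lanes=4, signal_gbps=5.0)

print(f"4X SDR: {sdr_4x:.0f} MB/s, 4X DDR: {ddr_4x:.0f} MB/s")
print(f"900 MB/s is {900 / sdr_4x:.0%} of the 4X SDR data rate")
```

Under these assumptions the measured >900 MB/s per initiator is about 90% of what the SDR link can carry, which is why a DDR link leaves room for more.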
RE: [openib-general] Suggested components to support in 1.0
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Bob Woodruff
> Sent: Friday, February 24, 2006 12:23 PM
> To: 'Bryan O'Sullivan'; openib-general
> Subject: RE: [openib-general] Suggested components to support in 1.0
>
> Bryan wrote,
> >Components that I don't know what to do about, and will likely want to
> >drop unless someone can vouch for them:
> >
> > * iSER
> > * SRP
> > * uDAPL
>
> We need uDAPL and I am sure people want SRP and
> I think both are in good shape.
> I am not sure that iSer is quite ready, but will let Voltaire make that
> call.
>
> woody

Woody,

I believe that OpenIB iSER is quickly getting there, given the amount of dedicated work Or, Dan, and others have put into it. We would definitely vote for it.

Yaron
RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Richard Frank
> Sent: Thursday, December 01, 2005 1:03 PM
> To: Grant Grundler
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
>
> We do not see any deficiencies - the RDS specification and current
> implementation so far meet our requirements and is working very well.
>
> There is more we will want to do further down the road - such as access
> the RDS sockets via AIO so we can add zero copy support.

Richard,

In the document you published a few weeks ago you listed latency and CPU% as key goals. I assume that to really get the latency down you need a user space implementation that can leverage polling. Are there any plans to work in user space?

Several other comments/suggestions, if I may (you may already have taken them into account):

As a UDP consumer, isn't there a need to support multicast as well, and potentially leverage IB multicast for scalability?

I feel that there is not much benefit in eliminating the reliability checks in the upper (UDP) consumer, since their CPU and latency overhead is negligible; you may even just go with a UC implementation. Also, UDP consumers may want to use RDS without modifying the application, or may accept dropped packets or oversubscription (since they are interested in the most recent data).

It is also very important to tie the RDS implementation to the IP stack for routing information/resolution, ARP, etc., so that it becomes transparent from the management/configuration side as well, does not require separate configuration files, and deals with dynamic environments and failures the way a real UDP would.

Yaron

> On Thu, 2005-12-01 at 08:16 -0800, Grant Grundler wrote:
> > On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote:
> > > Any progress to report on the port of RDS from the SilverStorm
> > > proprietary stack to the standard Linux stack? I think it would
> > > really move the discussion forward if there were some code that people
> > > could build and use.
> >
> > As primary consumer of RDS, I think Oracle first needs to decide if
> > the deficiencies that Mike Krause pointed out are acceptable or not.
> >
> > grant
RE: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Tuesday, November 29, 2005 6:30 PM
> To: Kanevsky, Arkady
> Cc: Ted H. Kim; [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4
>
> Kanevsky, Arkady wrote:
> > Sean,
> > SWG discussed today the extending private data format proposal to
> > SIDR_REQ.
> > The group does not see the need for it since ULP is no RDMA aware.
> > That is ULP does not use RDMA operations.
> > Do you have some specific ULP in mind for this functionality?
> > For UDP a different IP address can be used for each message. There is no
> > persistent connection.
>
> I didn't have any particular ULP in mind. I was thinking more of a generic
> application that wanted to use UDP style addressing over IB, similar to what's
> being discussed for using TCP style addressing over IB.
>
> It seems that there needs to be a way to map a given destination address to a
> remote QP/qkey. Regardless if the IP address is carried in each ULP message, it
> would still need to be in the SIDR REQ in order to locate the correct QP.

Sean,

How about using ARP to get from the IP address to DGID+Partition, followed by a SIDR exchange to map DGID+PKey+Service to QKey & QP?

It is the same concept as CMA, which first uses the IP stack (ARP etc.) to get to the remote end-point (in that case a GID+PKey combination), followed by SA-PR and CM REQ; we just substitute a SIDR REQ for the CM REQ.

It may not solve all the cases, but probably most of the practical ones.

In any case the packets will need to carry some header (since it's not a connected model), and you can add more information to that header (e.g. you can use the IPoIB header as is, which already contains the src/dst IP).

Yaron
RE: [swg] RE: [openib-general] round 2 - proposal for socket based connection model
> -----Original Message-----
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 25, 2005 6:39 PM
> To: Tom Tucker; Kanevsky, Arkady
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: [swg] RE: [openib-general] round 2 - proposal for socket based connection model
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Tucker
> > Sent: Tuesday, October 25, 2005 2:56 PM
> > To: Kanevsky, Arkady
> > Cc: [EMAIL PROTECTED]; openib-general@openib.org
> > Subject: RE: [openib-general] round 2 - proposal for socket
> > based connection model
> >
> > Arkady:
> >
> > I may actually have a constructive comment about the protocol
> > (private data format). One thing I noticed is that *almost*
> > everything in the private data header is available in the
> > native iWARP protocol header except the ZB and SI bits. If
> > these bits become part of the canonical private data header,
> > then does that require an iWARP transport to use the header
> > too even though only two bits are useful?
> >
> > Sorry if this is a dumb question,
>
> I'm not sure I followed why these were needed myself.

I believe ZBTO and Remote Invalidation are mandatory in iWARP, right? These are two RDMA features that are available in iWARP and are new to IB (optional in version 1.2). A ULP that is supposed to run on both may want to know whether the peer supports them, so that it can use the correct verbs. For example, if the peer doesn't support remote invalidation, the ULP will need to use the Send verb and invalidate the FMR locally; if the peer does support it, the ULP can use the new "Send with Invalidate" verb, which can improve performance and security.

I don't see why iWARP needs to negotiate it; CMA can just return true on both bits in the iWARP case.

These are generic parameters that will be needed by more than one ULP that wants to know which verbs are supported by the generic RDMA layer; that's why they are in the generic portion of the header.

Yaron
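The verb selection described above can be sketched roughly as follows. This is only an illustration of the decision logic under the assumptions stated in the message (iWARP always reports both capabilities; IB reports whatever the peer advertised in the ZB and SI bits). All names here (`PeerCaps`, `caps_for_transport`, `completion_plan`, and the operation strings) are hypothetical and do not correspond to any real verbs or CMA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerCaps:
    """Capabilities learned at connection setup (hypothetical structure)."""
    zero_based_to: bool    # the "ZB" bit from the private data header
    send_invalidate: bool  # the "SI" bit from the private data header


def caps_for_transport(transport: str, zb: bool = False, si: bool = False) -> PeerCaps:
    """Return the peer capabilities a generic layer could report to the ULP."""
    if transport == "iwarp":
        # Assumption from the message: both features are mandatory in iWARP,
        # so nothing needs to be negotiated on the wire.
        return PeerCaps(zero_based_to=True, send_invalidate=True)
    # For IB the bits come from the peer's advertised private data.
    return PeerCaps(zero_based_to=zb, send_invalidate=si)


def completion_plan(caps: PeerCaps) -> list[str]:
    """List the (hypothetical) operations a ULP would issue to finish an I/O."""
    if caps.send_invalidate:
        return ["send_with_invalidate"]
    # Fallback: plain Send, then invalidate the memory registration locally.
    return ["send", "local_invalidate"]


print(completion_plan(caps_for_transport("iwarp")))
print(completion_plan(caps_for_transport("ib", zb=False, si=False)))
```

The point of keeping the two bits in a generic place is that every ULP can make this choice the same way, without transport-specific code paths.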
RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Tuesday, October 25, 2005 1:26 PM
> To: Sean Hefty
> Cc: [EMAIL PROTECTED]; openib-general@openib.org; dat-[EMAIL PROTECTED]
> Subject: RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model
>
> Think of a single API that supports iWARP and IB (transport independent
> API).
> To a connection listener it provides the IP 5-tuple + private data.
> For IB it means that CM parses REQ and extracts IP 5-tuple as separate
> fields from private data.
> Listener does not parse the private data encoding of the proposal.
>
> So CM need to know if it need to encode IP 5-tuple on requestor side
> and if need to parse on responder side.
> Arkady

Arkady,

I agree with Sean: you can encode the destination port in the ServiceID, and if you really want to verify that it is using that format you can look at the upper 48 bits of the ServiceID.

We may need to distinguish between explicit RDMA protocols (iSER, NFS-RDMA, RDP, etc.) and implicit RDMA (SDP, where the socket application doesn't know it is using RDMA). This can be done in three ways:

a. a port mapper,
b. a different ServiceID prefix, or
c. a bit in the CM REQ header.

Also, I'm not sure why we need the protocol (UDP, TCP, SCTP, ...). Since we emulate RDMA, we shouldn't care whether it's TCP or SCTP, and UDP is unconnected and can't drive RDMA anyway.

Yaron

> Arkady Kanevsky                       email: [EMAIL PROTECTED]
> Network Appliance                     phone: 781-768-5395
> 375 Totten Pond Rd.                   Fax: 781-895-1195
> Waltham, MA 02451-2010                central phone: 781-768-5300
>
> > -----Original Message-----
> > From: Sean Hefty [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, October 25, 2005 1:08 PM
> > To: Kanevsky, Arkady
> > Cc: Caitlin Bestler; [EMAIL PROTECTED];
> > openib-general@openib.org; [EMAIL PROTECTED]
> > Subject: Re: [openib-general] RE: [dat-discussions] round 2 -
> > proposal for socket based connection model
> >
> > Kanevsky, Arkady wrote:
> > > Correct.
> > > But this does bring the question how responder CM knows that it need
> > > to parse the private data. I suspect this will be done via new version
> > > of CM. But a suage of some of the CM REQ reserved fields are also
> > > possible. Anotherwords the current CM version assumes that CM only
> > > supports one version and there is no need to support more than 1
> > > version.
> >
> > The responder knows how to parse the private data based on the service ID that
> > they're listening on. This is how it's done today, and how it will still need
> > to be done. What is the motivation to change it?
> >
> > What data is beyond the addressing? How does the responder know how to
> > interpret that?
> >
> > - Sean
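To make the "destination port in the ServiceID, check the upper 48 bits" idea concrete, here is a minimal sketch. It assumes a 64-bit ServiceID whose low 16 bits carry the port and whose upper 48 bits carry a format prefix, and it uses option (b) above, a different prefix for explicit and implicit RDMA. The prefix values are placeholders invented for this example; they are not values assigned by IBTA or used by SDP.

```python
PORT_BITS = 16
PORT_MASK = (1 << PORT_BITS) - 1
PREFIX_LIMIT = 1 << 48

# Placeholder 48-bit prefixes, invented for illustration only.
EXPLICIT_RDMA_PREFIX = 0x0000_0000_0001  # e.g. iSER, NFS-RDMA
IMPLICIT_RDMA_PREFIX = 0x0000_0000_0002  # e.g. SDP


def encode_service_id(prefix: int, port: int) -> int:
    """Pack a 48-bit prefix and a 16-bit port into a 64-bit ServiceID."""
    if not 0 <= port <= PORT_MASK:
        raise ValueError("port must fit in 16 bits")
    if not 0 <= prefix < PREFIX_LIMIT:
        raise ValueError("prefix must fit in 48 bits")
    return (prefix << PORT_BITS) | port


def decode_service_id(service_id: int) -> tuple[int, int]:
    """Split a ServiceID into (upper 48-bit prefix, port)."""
    return service_id >> PORT_BITS, service_id & PORT_MASK


def is_explicit_rdma(service_id: int) -> bool:
    """Verify the format by looking only at the upper 48 bits."""
    prefix, _ = decode_service_id(service_id)
    return prefix == EXPLICIT_RDMA_PREFIX


sid = encode_service_id(EXPLICIT_RDMA_PREFIX, 3260)  # 3260 is the iSCSI TCP port
print(hex(sid), decode_service_id(sid), is_explicit_rdma(sid))
```

Because the mapping is linear in the port number, existing port-based naming and configuration can be reused unchanged on the listener side.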
RE: [openib-general] iSER details
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Monday, October 24, 2005 6:31 AM
> To: Mohit Katiyar, Noida
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] iSER details
>
> On Mon, 2005-10-24 at 06:09, Mohit Katiyar, Noida wrote:
> > Can anyone tell me where can I find the specifications of iSER
> > protocol on Infiniband. I could not find any document which provides
> > specification specially according to Infiniband, all the doc were on
> > iWarp. If anyone can guide me in this
>
> There are 2 relevant I-Ds:
>
> iSCSI Extensions for RDMA Specification
> http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt

As Hal indicates, the iSER-05 IETF draft already incorporates InfiniBand and has already passed last call. There aren't many differences between IB and iWARP. IBTA is also working on the IP address mapping over InfiniBand that will be leveraged by iSER/IB and NFS/RDMA, and on a few other clarifications/issues.

Note that one key difference in the IETF draft is that IB negotiates the Login over the RC connection, whereas in iWARP it's done over a TCP connection (which then transitions to RDMA RC).

Some more detailed material can be found at http://www.haifa.il.ibm.com/satran/ips/iSER-in-an-IB-network-V9.pdf

It's a little old, but many sections are still relevant.

Yaron
RE: [dat-discussions] RE: [openib-general] Re: [swg] Re: private data...
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:dat-[EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Thursday, October 20, 2005 5:07 PM
> To: [EMAIL PROTECTED]; Sean Hefty
> Cc: Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org
> Subject: RE: [dat-discussions] RE: [openib-general] Re: [swg] Re: private data...
>
> Once this is defined ULP can decide on which Service ID(s) to listen.
> Requestor can send conn req to a specific Service ID (IB specific)
> or use higher level abstraction - TCP port.
> CM may be capable to translate TCP port to Service ID based on ULP.
> For example, iSER over IPoIB will be mapped to one Service ID and
> native iSER over IB will be mapped to another. But this is not simple.
> On another hand every intermediate level protocol (SDP, IPoIB) can
> do conversion. But this is also hard and is extension of existing
> protocol.

A small correction: there is no iSER over IPoIB, just iSER over native RDMA. There can be an iSCSI/TCP session running over IPoIB, but then it's a connectionless UD session (without a ServiceID). Also, the iSER spec defines that iSCSI/iSER takes precedence over iSCSI/TCP.

To add to the ongoing discussion, one of the major benefits of maintaining the TCP port numbers for RDMA protocols is the ability to leverage existing naming services and configuration mechanisms. For example, NFS uses port mappers, and other protocols use DHCP, DNS, SLP, iSNS, well-defined numbers, or other mechanisms. This way the upper layers beyond the transport stay the same and don't care whether it's IB, iWARP, or even plain TCP.

If we don't preserve a simple/linear port mapping, we will probably need to reinvent name services for RDMA as well.

Yaron
[openib-general] RE: iWARP emulation protocol
> -----Original Message-----
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 18, 2005 2:53 PM
> To: Kanevsky, Arkady
> Cc: Roland Dreier; Yaron Haviv; openib-general@openib.org; [EMAIL PROTECTED]
> Subject: Re: iWARP emulation protocol
>
> The proposal doesn't talk about mapping from TCP port numbers into a
> 16-bit range of IB service IDs. I think this is necessary.

I agree; that's part of the other proposals.

> Also, putting the destination address in the REP message doesn't make
> sense to me. The destination IP and port number is something that the
> initiator of the connection is sending to the destination, not the
> other way around. The passive side of the connection (receiver of the
> REQ) needs the destination IP as part of the REQ so that it can decide
> whether to accept the connection; the active side (sender of the REQ)
> knows who it is trying to talk to, so having the address information
> in the REP is not useful.

Also agreed; the REP just needs a few fields (version, capabilities).

Yaron
RE: [openib-general] [RFC] IB address translation using ARP
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Friday, October 07, 2005 12:40 PM
> To: 'Michael Krause'; Caitlin Bestler
> Cc: Openib
> Subject: RE: [openib-general] [RFC] IB address translation using ARP
>
> >It would be best to define a CM architecture that enabled communication
> >between like endpoints and avoid the gateway dilemma. Let the gateway
> >provider work out such issues as there are many requirements already
> >on each side of these interconnects.
>
> I've given this some more thought since the original postings and agree with
> you. It doesn't seem right to me to have the CM establish a connection to
> something that is not the specified destination, under the assumption that
> whatever is being connected to is a gateway. I think it would be better for the
> application to determine that the actual destination is on a different subnet,
> locate the gateway, and issue a connection request to the gateway.
>
> - Sean

Sean,

I believe this is exactly how it has been proposed. The gateway is the endpoint in IB, and the IB CM request is made against the gateway. The gateway may decide to create its own connection on the other side based on IB headers, private data, or even application data (depending on the type of gateway). This just requires that traffic targeted at a certain IP range/subnet, or at any non-local address, ends up at the gateway without the need to specify each address individually (just as it's done in IP).

Yaron
RE: [openib-general] [RFC] IB address translation using ARP
> From: Michael Krause [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 07, 2005 12:29 PM
> To: Yaron Haviv
> Cc: Openib
> Subject: RE: [openib-general] [RFC] IB address translation using ARP
>
> At 06:24 AM 9/30/2005, Yaron Haviv wrote:
>
> > -----Original Message-----
> > From: Roland Dreier [ mailto:[EMAIL PROTECTED]
> > Sent: Thursday, September 29, 2005 9:50 PM
> > To: Sean Hefty
> > Cc: Yaron Haviv; Openib
> > Subject: Re: [openib-general] [RFC] IB address translation using ARP
> >
> > I think the usage model is the following: you have some magic device
> > that has an IB port on one side and "something else" on the other
> > side. Think of something like a gateway that talks SDP on the IB side
> > and TCP/IP on the other side.
> >
> > >Also applicable to two IB ports, e.g. forwarding SDP traffic from one IB
> > >partition to SDP on another partition (may even be the same port with
> > >two P_Keys), and doing some load-balancing or traffic management in
> > >between, overall there are many use cases for that.
>
> While I can envision how an endpoint could communicate with another in
> separate partitions, doing so really violates the spirit of the
> partitioning where endpoints must be in the same partition in order to see
> one another and communicate.

Mike,

This is exactly the same case as two IPoIB interfaces over the same port with two partitions, configured with IP routing between them, or a layer 7 proxy that connects two network segments. I don’t see anything wrong with such a model.

> Attempting to create an intermediary who has
> insights into both and then somehow is able to communicate how to find one
> another using some proprietary (can't be through standards that I can
> think of) method, seems like way too much complexity to be worth it.

Assuming the ULPs on both sides are standard, how the proxy is built and how it functions is application dependent, just as people build proxies for XML that don’t need to obey any standard besides being transparent to both sides.

OpenIB should not block the ability to provide gateway/proxy functionality, or to route traffic beyond a single IP addressing hop. This is just matching IB to capabilities already available in iWARP.

Yaron
RE: [openib-general] [RFC] IB address translation using ARP
> -----Original Message-----
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 29, 2005 9:50 PM
> To: Sean Hefty
> Cc: Yaron Haviv; Openib
> Subject: Re: [openib-general] [RFC] IB address translation using ARP
>
> I think the usage model is the following: you have some magic device
> that has an IB port on one side and "something else" on the other
> side. Think of something like a gateway that talks SDP on the IB side
> and TCP/IP on the other side.

This is also applicable to two IB ports, e.g. forwarding SDP traffic from one IB partition to SDP on another partition (it may even be the same port with two P_Keys) and doing some load balancing or traffic management in between. Overall there are many use cases for that.

Yaron
RE: [openib-general] [RFC] IB address translation using ARP
> -----Original Message-----
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 29, 2005 5:16 PM
> To: Yaron Haviv
> Cc: Hal Rosenstock; Openib
> Subject: Re: [openib-general] [RFC] IB address translation using ARP
>
> Yaron Haviv wrote:
> > 4. send an arp on the net device find destination MAC
> >
> > Note the destination IP in the ARP phase is either the REAL destination
> > IP in case of a local subnet, or the IP router IP address in case of a
> > gateway/router.
> >
> > 5. issue a path record between the source/dest GIDs (DGID taken from ARP
> > Result IPoIB MAC)
>
> In the case of gateway/router, isn't the returned GID for the router? How is
> this used to establish a connection with the real destination?
>
> - Sean

The RC connection is established with the DGID of the router (it's the equivalent of a MAC address, and that is OK). The ServiceID + private data in the case of SDP or iSER (or NFS-R, assuming the IBTA proposal passes) also contains info on the REAL destination IP, which the proxy can use to route.

By the way, there is a section on that in the IETF iSER draft covering iSER to iSCSI routing, but it's a general solution, just as applicable to someone doing an HTTP proxy to SDP, NFS/TCP to NFS/RDMA, SDP to SDP, etc.
RE: [openib-general] [RFC] IB address translation using ARP
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Sean Hefty > Sent: Thursday, September 29, 2005 2:58 PM > To: Hal Rosenstock > Cc: Openib > Subject: Re: [openib-general] [RFC] IB address translation using ARP > > Hal Rosenstock wrote: > >>I'm struggling with understanding how translation can even occur in this > case. > >>What DGID is used when querying for the path record, and how is it > obtained? > > > > Isn't it the DGID of the next hop IP router ? (I suppose in the case of > > multiple IPoIB subnets on the same IB subnet, it could shortcut somehow > > like NHRP does in terms of ATM v. CLIP (Classic IP over ATM). > > How is the DGID of the next hop IP router used when connecting? As an > aside, do > the IPoIB subnets all fall into the same broadcast domain? > > >>What does SDP do in this case? > > > > Same as AT. It does the route lookup and ARPs for and then asks for the > > PathRecord of the next hop IP router. > > I guess I'm confused here. This gives a path record between the host > system and > the IP router. How is that used to establish a connection to the actual > destination? What values (DLID, DGID, pkey, etc.) go in the CM REQ > message, and > how are those values obtained? > > - Sean The idea as Hal was describing is following the common IP model: 1. per destination IP (and TOS in IP case) find the outgoing route entry 2. if it's a subnet covered by an adapter (IPoIB in our case, can have multiple per port each with its own P_Key), find the net device to use 3. if its not in one of my subnets than what is the IP of the router covering that destination (e.g. default gateway), and what is the net device I need to use (a device/port/partition combination). 4. send an arp on the net device find destination MAC Note the destination IP in the ARP phase is either the REAL destination IP in case of a local subnet, or the IP router IP address in case of a gateway/router. 5. 
issue a path record query between the source/dest GIDs (DGID taken from the IPoIB MAC in the ARP result)

That's how it's done in SDP & ib_at, I believe.

The generalization beyond a local subnet is very important if we want to address all sorts of applications and configurations, and it is not related to IB routing: e.g. a proxy/LB application that sits between two IP subnets (both over IB), future mapping from IB to external iWarp subnets, IP routers, etc. It also follows exactly the same flow as GbE/IP.

Yaron
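The five resolution steps listed in this message can be sketched roughly as follows. This is only an illustrative Python model, not the ib_at or SDP code: `Route`, `next_hop`, `resolve_dgid`, the device names and the ARP table are hypothetical stand-ins. The one format detail it relies on is the 20-byte IPoIB link-layer address (1 reserved byte, 3-byte QPN, 16-byte GID).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str                    # destination subnet, e.g. "10.0.1.0/24"
    device: str                    # hypothetical IPoIB net device (port + P_Key)
    gateway: Optional[str] = None  # None: the subnet is directly attached


def next_hop(routes, dst_ip):
    """Steps 1-3: longest-prefix match, returning (device, IP to ARP for)."""
    dst = ipaddress.ip_address(dst_ip)
    best, best_len = None, -1
    for route in routes:
        net = ipaddress.ip_network(route.prefix)
        if dst in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    if best is None:
        raise LookupError(f"no route to {dst_ip}")
    # Local subnet: ARP for the real destination. Otherwise: ARP for the router.
    return best.device, best.gateway or dst_ip


def resolve_dgid(routes, arp_table, dst_ip):
    """Steps 4-5: look up the IPoIB hardware address and extract the DGID."""
    device, arp_ip = next_hop(routes, dst_ip)
    hw_addr = arp_table[(device, arp_ip)]   # stands in for a real ARP exchange
    if len(hw_addr) != 20:
        raise ValueError("IPoIB hardware address must be 20 bytes")
    return device, arp_ip, hw_addr[4:]      # bytes 4..19 hold the GID


# Hypothetical configuration: two IPoIB subnets plus a default gateway.
ROUTES = [
    Route("10.0.1.0/24", "ib0"),
    Route("10.0.2.0/24", "ib0.8001"),
    Route("0.0.0.0/0", "ib0", gateway="10.0.1.1"),
]
ROUTER_HW = bytes([0x00, 0x00, 0x04, 0x04]) + bytes(range(16))
ARP_TABLE = {("ib0", "10.0.1.1"): ROUTER_HW}

print(next_hop(ROUTES, "10.0.2.7"))
print(resolve_dgid(ROUTES, ARP_TABLE, "192.168.5.5")[2].hex())
```

The returned DGID is what would go into the path record query; for an off-subnet destination it is the router's GID, which is the case Sean asked about.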
RE: [openib-general][PATCH][RFC]: CMA IB implementation
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Sean Hefty > Sent: Thursday, September 22, 2005 12:28 PM > To: Guy German > Cc: Openib > Subject: Re: [openib-general][PATCH][RFC]: CMA IB implementation > > Guy German wrote: > > I don't think this layer should replace ib_at. If you think there are > > things to be fixed in the ib_at, I suggest we fix them. I do believe > > that the original purpose of this generic cm was to serve ulps that > > don't want to be transport oriented (e.g. iSER). > > Based on discussions from last month, the general agreement was to use CM > private data in place of ATS. Once that's done, I don't see a need for > ib_at. > (Also, put simply, I don't believe that ATS can work.) I think that a > combination of what Roland, including his original API design, and Yaron > proposed is the right direction to go. > Sean, my response is somewhat behind Any way ib_at doesn't depend or directly connect to ATS ATS was just one way to translate IP to GID IB_AT provides a way to eventually translate src/dst IP + QoS attributes to a set of layer 2 attributes and QP parameters in one place for few ULPs And with potential enhancements to implement central address cache and central QoS & Partitioning configuration mechanism. Basically it's the IB equivalent of TCP/IPs IP & Eth resolution and routing layers. Having said that it doesn't really matter if its part of the CM or external if we keep the functionality and implementation To address partitioning IB_AT suggest using the P_Key value derived from the IPoIB interface, also allowing a consumer/ULP to override those values with its own. This forming the exact behavior as you would expect from an Ethernet or iWarp mapping the RDMA sessions to the VLAN used by that Interface. To address QoS IB_AT model suggest taking by default the SL value from the IPoIB interface of that subnet which took it from the SA MCRecord (can override that with ULP). 
This allows a user to create two subnets over the fabric, each mapped to a different SL/VL with its own BW/priority reservation. On the ULP side, he just needs to configure ULPs with different BW requirements to work over different subnets (which is what people already do today in many cases, since they use separate fabrics, e.g. one for NFS and one for MPI).

The API was also designed to let users override the default values derived from IPoIB, so a sophisticated user/ULP can always get the finest granularity.

Yaron

> - Sean
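The "defaults from the IPoIB interface, with ULP override" behavior described in this message could look something like the sketch below. It is written in Python purely for illustration; `IpoibInterface`, `PathAttrs` and `resolve_attrs` are hypothetical names, not the ib_at API, and the P_Key/SL values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IpoibInterface:
    name: str   # hypothetical net device name
    pkey: int   # P_Key of the partition the interface lives in
    sl: int     # SL the interface took from the SA MCRecord


@dataclass(frozen=True)
class PathAttrs:
    pkey: int
    sl: int


def resolve_attrs(iface: IpoibInterface,
                  pkey: Optional[int] = None,
                  sl: Optional[int] = None) -> PathAttrs:
    """Take P_Key and SL from the IPoIB interface unless the ULP overrides them."""
    return PathAttrs(
        pkey=iface.pkey if pkey is None else pkey,
        sl=iface.sl if sl is None else sl,
    )


# Two subnets on one fabric, each mapped to its own partition and SL.
NFS_IF = IpoibInterface("ib0", pkey=0x8001, sl=1)
MPI_IF = IpoibInterface("ib0.8002", pkey=0x8002, sl=3)

print(resolve_attrs(NFS_IF))          # defaults inherited from the interface
print(resolve_attrs(MPI_IF, sl=5))    # a ULP overriding only the SL
```

Using `None` as the "not overridden" marker keeps SL 0 and P_Key 0 usable as explicit override values.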
RE: [openib-general] Managing SRP devices via iSCSI ?
>From: [EMAIL PROTECTED] [mailto:openib-general->[EMAIL PROTECTED] On Behalf Of Rick Frank >Sent: Monday, September 19, 2005 10:02 PM >To: openib-general@openib.org >Subject: [openib-general] Managing SRP devices via iSCSI ? > >One key argument I've heard in favor of iSER vs SRP is that iSCSI (top >level >iSER driver) has a very strong management infrastructure - as it is fairly >mature. > >However, iSER seems to be just gaining steam in terms of direct attached >storage supporting this protocol .vs. SRP. > >Would it not be possible to implement some glue between SRP and iSCSI to >allow for the discovery and management of SRP devices ? Rick, The question is why bother with a new approach when iSER is what you just suggested ? a. After all iSER transactions are similar to SRP ones (derived from SRP) with few enhancements in favor of iSER (SRQ, FMR, MC/S, immediate, recovery,..). b. iSER header and naming convention is derived from iSCSI, where as SRP naming and header structure is different forcing redundant translation between the two, and some functionality that wouldn't be possible such as Portals, MC/S, ACA, etc', makes more sense to just use the iSCSI base header format (like iSER does). c. iWarp guys that now join OpenIB will never use this non standard, IB specific SRP/iSCSI hybrid but rather the real iSER. d. SRP which was initially defined in T10 lost all its momentum in T10 (last SRP meeting was 2 years ago), not sure how you will standardize your proposal, where iSER is in IETF (integral part of iSCSI/IPS) and serves IB & iWarp, guaranteeing its momentum will grow, and it will be enhanced over time. So I believe overall it's simpler to move SRP implementations to iSER, (some vendors already wisely do that) than somehow define a non standard SRP with iSCSI management, after all iSER is just what you propose (improved SRP with iSCSI services), and is already defined (last call in IETF). 
By the way, I wouldn't deduce a whole lot from a few early experiments with SRP storage in the market about SRP adoption among key storage vendors or about their future plans.

If you are interested in more details on iSER, let me know.

Yaron
RE: [openib-general] RDMA Generic Connection Management
> -----Original Message-----
> From: Talpey, Thomas [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 30, 2005 12:54 PM
> To: Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] RDMA Generic Connection Management
>
> At 10:55 AM 8/30/2005, Yaron Haviv wrote:
> >The iSCSI discovery may return multiple src & dst IP addresses and the
> >iSCSI multipath implementation will open multiple connections.
> >There are many TCP/IP protocols that do that at the upper layers (e.g.
> >GridFTP, ..), not sure how NFS does it.
> >
> To answer the question of how NFS "finds out" about multiple
> connections and trunking, the answer is generally that the mount
> command tells it. Mount can get this information from the command
> line, or DNS. I believe Solaris uses the command line approach. There
> may be a way to use the RPC portmapper for it, but the portmapper
> isn't used by NFSv4.
>
> Bottom line? NFS would love to have a way to learn multipathing
> topology. But it needs to follow existing practice, such as having
> an IP address / DNS expression. If the only way to find it is to query
> fabric services, that's not very compelling.
>
> Tom.

Tom, from your description it looks like the multipathing is based on IP addressing (like iSCSI/iSER, GridFTP, ...) and resolved by the ULP or its name service. In that case the ULP probably opens a few connections from one or more IPs to one or more other IPs.

This means that we don't need a transport-dependent mechanism as long as each port is associated with a unique IP (as we do today in OpenIB). (Another good reason to use IP addressing.)

Yaron
RE: [openib-general] Re: RDMA Generic Connection Management
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Tuesday, August 30, 2005 2:36 PM
> To: Talpey, Thomas
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] Re: RDMA Generic Connection Management
>
> Thomas> Well, you're saying somebody has to do it, right? Is it
> Thomas> easier to fob this off to upper layers that (frankly)
> Thomas> don't care what hardware they're talking to!? This means
> Thomas> we have N copies of this, and N ways to do it. Talk about
> Thomas> cacheline pingpong.
>
> Upper layers have the luxury of being able to do this at a
> per-connection level, can sleep, etc. If we push it down into the
> verbs, then we have to do it in every verbs call, including the fast
> path verbs call. And that means we get into all sorts of crazy code
> to deal with a device disappearing between a consumer calling
> ib_post_send() and the core code being entered, etc.
>
> Right now we have a very simple set of rules:
>

If all the ULPs need to do exactly the same thing, or the implementation differs between IB and iWarp, then we should probably do it under the API, as it is defined in kDAPL.

Also note that with virtual machines this type of event may be more frequent, and we may want to decouple the ULPs from the actual hardware device as much as we can.

Yaron
RE: [openib-general] RDMA Generic Connection Management
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of James Lentini
> Sent: Monday, August 29, 2005 3:35 PM
> To: Guy German
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA Generic Connection Management
>
>
> What happens if multiple devices can reach the destination address?
> How will they be enumerated to the consumer?
>

Since it's an IP-based approach, it will work like traditional IP: preference is given to a device on the same subnet as the destination. In GbE, if two NICs are on the same subnet then only one will be selected. You can also use a LAG solution that balances connections over multiple links, but that is done at layers 2-3 (not exposed to the ULP).

We should probably use the same approach and provide a single device handle to the ULP. We may have a virtual device handle representing a few similar parallel devices (just as a LAG group has a virtual MAC). It may also be a good idea to pass an enum with some preference (e.g. single path, redundant, ...).

Specifically in iSER, the redundancy is handled in the upper layers: the iSCSI discovery may return multiple src & dst IP addresses, and the iSCSI multipath implementation will open multiple connections. There are many TCP/IP protocols that do that at the upper layers (e.g. GridFTP, ...); not sure how NFS does it.

Also note that Hal and I proposed a new addendum to the IB MultiPath record query in the IBTA that enables a client to ask "what are all the options to get from point A to point B?", where A & B are identified by one of the GIDs we know about, and a flag can specify same port/HCA/system preferences. This can be implemented under AT if we want.

Yaron
RE: [openib-general] RDMA connection and address translation API
> -Original Message- > From: Sean Hefty [mailto:[EMAIL PROTECTED] > Sent: Thursday, August 25, 2005 2:37 PM > To: 'James Lentini'; Yaron Haviv > Cc: openib-general@openib.org > Subject: RE: [openib-general] RDMA connection and address translation API > > >> Any way providing src/dst IPs in the CM Private data is simple, and we > >> can come with IBTA extension blessing that data structure as a general > >> way to map IP oriented protocols over IB (a 1-2 page draft at the most) > >> This way it can also address Caitlin concerns regarding NFS & IETF > >> (since now it's a transport specific issue) > > > >How long do you estimate it would take to standardize an IP<->GID > >mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A > >year? > > > >Let's assume that everyone on this list is in agreement. > > Does anyone in the IB world disagree with adding IP addresses in the CM > private > data area? Would we want to extend this concept to SIDR as well? > > - Sean I send my proposal from 2004 re-send again as text (attached) Also addresses the ServiceID issue, this can be a baseline for discussions Feel free to change Yaron Mapping of iWarp/TCP connections to InfiniBand AUTHOR Yaron Haviv ([EMAIL PROTECTED]) VERSION 0.30, Mon June 28 2004 I. INTRODUCTION InfiniBand and iWarp semantics are similar especially with the latest Verb Extensions, the major difference is in the way connections are established, iWarp uses TCP based connection establishment while InfiniBand uses a CM for that. Another related difference is that in iWarp a user can start in a standard TCP mode and migrate to RDMA verbs in the middle of a session. The following document provides a general mapping from iWarp/TCP connection establishment to InfiniBand which can be used by ULPs over InfiniBand or by any other future iWarp protocols, it imitates the SDP connection establishment process and CM headers (does not require SDP, just have the same data formats for CM messages). II. 
Establishing a TCP/iWarp like connections over InfiniBand

In order to emulate an iWarp connection, it is required to open an InfiniBand RC connection, associate it with IP addresses and TCP ports.

In addition protocols may transfer control/login packets before the migration to the RDMA mode; this requires exchanging receiver buffer size and depth for initial usage (the ULPs will manage the flow control for the duration of the connection).

The mapping uses the same data structures already defined for connection establishment in SDP (IBTA Socket Direct Protocol) which accomplish the same goal of mapping TCP Sockets addressing to InfiniBand, the non relevant SDP fields were Reserved.

iWarp emulation CM Request (Hello) Private Data header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 04|      MID      |     Rsvd      |             bufs              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 08|                              len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20| MajVer| MinVer| IPVer | FlowC |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24|                          DesRemRcvSz                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                          LocalRcvSz                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32|          Local Port           |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 36|                        Src IP (127-96)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 40|                        Src IP ( 95-64)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 44|                        Src IP ( 63-32)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 48|                        Src IP ( 31-00
RE: [openib-general] RDMA connection and address translation API
> -----Original Message-----
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 12:21 PM
> To: Yaron Haviv
> Cc: Fab Tillier; Roland Dreier; openib-general@openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
>
>
>
> On Wed, 24 Aug 2005, Yaron Haviv wrote:
>
> > Any way providing src/dst IPs in the CM Private data is simple, and we
> > can come with IBTA extension blessing that data structure as a general
> > way to map IP oriented protocols over IB (a 1-2 page draft at the most)
> > This way it can also address Caitlin concerns regarding NFS & IETF
> > (since now it's a transport specific issue)
>
> How long do you estimate it would take to standardize an IP<->GID
> mechanism (ATS, CM embedded, ...) in the IBTA? 3 months? 6 months? A
> year?
>
> Let's assume that everyone on this list is in agreement.

James, I can identify enough IBTA members on this list. If the group is in agreement, I believe it's a rather short process, since it's just a minor definition and the IBTA doesn't have much on its agenda these days. For example, Hal added a feature to the SM (client re-register ...) in weeks, based on OpenIB input.

We also don't have to wait for a finalized spec to implement, just as we implemented IPoIB without an IETF RFC (only a draft).

By the way, a quick path could be to define it in DAT and hand it over to the IBTA; after all, ATS is also not an IBTA standard.

Yaron
[openib-general] RE: RDMA connection and address translation API
> -----Original Message-----
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 12:13 PM
> To: Michael S. Tsirkin
> Cc: Yaron Haviv; openib-general@openib.org
> Subject: Re: RDMA connection and address translation API
>
> Michael> Wouldnt it be better to use some bits in the service ID
> Michael> field for this?
>
> This would also be OK. But Annex 3 of the IBA spec has already
> defined the service ID field without any reserved bits we can use.
> For example, if the first byte is 0x01, then the IETF is allowed to
> use any value they want for the rest of the service ID. So if we want
> to keep backwards compatibility with the spec, this approach might be
> difficult.
>

The IB ServiceID is 64 bits and a TCP port is 16 bits, so we can still take some bits in the middle to define what Michael was proposing. This may be a simpler change in the IBTA than changing the CM header, but both options are valid.

Yaron

> Anyway, what's the disadvantage of using a reserved bit or two from
> the CM REQ?
>
> - R.
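To make the bit budget concrete, here is a small Python sketch of carrying a 16-bit TCP port inside a 64-bit ServiceID. The layout is an assumption made up for illustration (port in the low 16 bits, everything above it treated as an opaque prefix); it is not the layout of any IBTA annex, and `make_service_id`, `port_from_service_id` and the example prefix are hypothetical.

```python
SERVICE_ID_BITS = 64
PORT_BITS = 16
PORT_MASK = (1 << PORT_BITS) - 1


def make_service_id(prefix: int, port: int) -> int:
    """Combine an opaque 48-bit-aligned prefix with a TCP port (assumed layout)."""
    if not 0 <= port <= PORT_MASK:
        raise ValueError("TCP port must fit in 16 bits")
    if prefix < 0 or prefix >> SERVICE_ID_BITS:
        raise ValueError("prefix must fit in 64 bits")
    if prefix & PORT_MASK:
        raise ValueError("low 16 bits of the prefix are reserved for the port")
    return prefix | port


def port_from_service_id(service_id: int) -> int:
    """Recover the TCP port from a ServiceID built by make_service_id()."""
    return service_id & PORT_MASK


EXAMPLE_PREFIX = 0x0000000012340000   # hypothetical value, for illustration only
sid = make_service_id(EXAMPLE_PREFIX, 3260)
print(hex(sid), port_from_service_id(sid))
```

With 48 bits left over after the port, there is room for the kind of marker bits Michael suggested, subject to whatever the spec already reserves.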
RE: [openib-general] RDMA connection and address translation API
> -Original Message- > From: Roland Dreier [mailto:[EMAIL PROTECTED] > Sent: Wednesday, August 24, 2005 7:29 PM > To: Yaron Haviv > Cc: James Lentini; Roland Dreier; openib-general@openib.org > Subject: Re: [openib-general] RDMA connection and address translation API > > > Yaron, has anyone raised all this in the IBTA WG? > I raised it about a year ago, but didn't really followed up on it At the time IBTA was also busy with other more urgent stuff (verb ext..) We work with few key IBTA members to re-surface it with the need for an abstract CM See the following text that was proposed (a Year ago as is) It is slightly different than your proposal but can be altered if needed It basically uses SDP header and marks one of the fields with 01 (FlowC) to indicate it's not SDP, this way even SDP can use it Also it covers some nice idea raised by MS & SUN to extend SDP to accept PUT & GET operations for RDMA, so you can get a BSD like API with few additional APIs rather than have a totally new API like DAPL Establishing a TCP/iWarp like connections over InfiniBand = In order to emulate an iWarp connection, it is required to open an InfiniBand RC connection, associate it with IP addresses and TCP ports In addition protocols may transfer control/login packets before the migration to the RDMA mode; this requires exchanging receiver buffer size and depth for initial usage (the ULP's will manage the flow control for the duration of the connection). The mapping uses the same data structures already defined for connection establishment in SDP (IBTA Socket Direct Protocol) which accomplish the same goal of mapping TCP Sockets addressing to InfiniBand, the non relevant SDP fields were Reserved. 
iWarp emulation CM Request (Hello) Private Data header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 04| MID | Rsvd | bufs | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 08| len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 12| Reserved| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 16| Reserved| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 20| MajVer| MinVer| IPVer | FlowC | Reserved| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 24| DesRemRcvSz | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 28| LocalRcvSz | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 32| Local Port| Reserved| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 36| Src IP (127-96) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 40| Src IP ( 95-64) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 44| Src IP ( 63-32) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 48| Src IP ( 31-00) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 52| Dst IP (127-96) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 56| Dst IP ( 95-64) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 60| Dst IP ( 63-32) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 64| Dst IP ( 31-00) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 1 CM Hello private data structure iWarp emulation CM Response (HelloReply) Private Data header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 04| MID | Rsvd | bufs | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RE: [openib-general] RDMA connection and address translation API
> -Original Message- > From: James Lentini [mailto:[EMAIL PROTECTED] > Sent: Wednesday, August 24, 2005 5:51 PM > To: Yaron Haviv > Cc: Roland Dreier; openib-general@openib.org > Subject: RE: [openib-general] RDMA connection and address translation API > > > > Which draft contains this? I found > > http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-04.txt > James, You should look at : http://www.haifa.il.ibm.com/satran/ips/draft-ietf-ips-iser-05-candidate. txt The 05 rev really adds all the InfiniBand related stuff You can see how the association between IB & IP is done using IPoIB The current implementation may not use the private data field (since its not critical/mandatory) but the intention is to add it to address multi homed hosts, we would like to push such a definition into IBTA so every IP oriented ULP can use it, several people expressed interest in such a definition, this can also support NFS/RDMA or any other IP based ULP. > but the HELLO header in section 9.3 does not contain any IP address > information. > > > I believe it can be a good idea to use the same approach for > > NFS/RDMA and eliminate the need for reverse ATS lookup (the may have > > some conflicts when multiple IPs exists per node). We may just use > > the SDP hello header as is with unused fields zeroed This will allow > > all ULPs to use the same mechanism > > NFS/RDMA is not specific to iWARP or InfiniBand. My understanding is > that this could not be easily accommodated in the current standards > for that reason. Not sure why is that the case, if we add an IBTA definition of CM exchange for IP based ULP's (i.e. send src/dst IP and optionally ports) you can now have an NFS/RDMA spec that doesn't need to have any IB/iWarp specific definitions, since the differences are pushed down to the IBTA In case of NFS/RDMA over other (non IB or iWarp) transport you can specify that providing the IP addressing is a responsibility of the underline transport. 
Yaron
RE: [openib-general] RDMA connection and address translation API
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Fab Tillier > Sent: Wednesday, August 24, 2005 3:00 PM > To: 'Roland Dreier' > Cc: openib-general@openib.org > Subject: RE: [openib-general] RDMA connection and address translation API > > > From: Roland Dreier [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, August 24, 2005 11:03 AM > > > > Fab> Why can't the IPV field be ignored? If a listen wants only > > Fab> IPV4 addresses, it would specify a 16-byte compare buffer > > Fab> with the first 12 bytes zero, the next 4 filled with the IPV4 > > Fab> address, and would set the offset to that of the hello > > Fab> message's destination address (32). > > > > Yes, you're right for SDP. I guess if we're comfortable mandating > > that all protocols put their source and destination IPs in the private > > data for the IB case, then this works. Of course it's somewhat > > awkward to pass this information into the transport-neutral CM API but > > I think this can be worked around. > > I don't know if we need to mandate IP usage - it's up to the application. > Any > application that wants to have similar semantics to the way socket listens > work > (especially when bound to one of multiple IP addresses on a port) the > application would have to define its private data to accommodate this. 
>

The context of this discussion is a common API for iWarp/IB ULPs. In that case they all use IP addresses (since that's the common addressing). If someone uses the IB-specific API under this abstraction level, he can provide whatever data he wants to the CM.

Anyway, providing src/dst IPs in the CM private data is simple, and we can come up with an IBTA extension blessing that data structure as a general way to map IP-oriented protocols over IB (a 1-2 page draft at the most). This way it can also address Caitlin's concerns regarding NFS & IETF (since it is now a transport-specific issue).

Yaron
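Fab's compare-buffer idea quoted in this message (a 16-byte pattern with 12 zero bytes followed by the IPv4 address, matched at a fixed offset in the private data) can be sketched as below. This is a Python illustration only, not the CM listen API: `ipv4_compare_buffer` and `listen_matches` are hypothetical helpers, and the offset of 32 is simply the value Fab mentions, taken as a parameter.

```python
import socket


def ipv4_compare_buffer(addr: str) -> bytes:
    """16-byte pattern: 12 zero bytes, then the 4-byte IPv4 address."""
    return bytes(12) + socket.inet_aton(addr)


def listen_matches(private_data: bytes, offset: int, compare: bytes) -> bool:
    """True when private_data carries exactly `compare` at `offset`."""
    return private_data[offset:offset + len(compare)] == compare


DST_OFFSET = 32   # offset quoted in the thread; treat it as an assumption

# Fake 64-byte private data carrying destination 192.168.0.10.
request = bytearray(64)
request[DST_OFFSET + 12:DST_OFFSET + 16] = socket.inet_aton("192.168.0.10")

print(listen_matches(bytes(request), DST_OFFSET, ipv4_compare_buffer("192.168.0.10")))
print(listen_matches(bytes(request), DST_OFFSET, ipv4_compare_buffer("192.168.0.11")))
```

A listener bound to one of several IPs on a port would register its own pattern, which gives the socket-like "listen on this address only" behavior without the ULP parsing the header itself.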
RE: [openib-general] RDMA connection and address translation API
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Caitlin Bestler
> Sent: Wednesday, August 24, 2005 2:14 PM
> To: Fab Tillier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] RDMA connection and address translation API
>
>
> The applications are expecting source/destination network addresses
> that come from a network layer, not from the peer application. IP has
> no problem meeting this requirement. This is an IB problem that needs
> to be solved within the scope of IB without changing any ULPs.
>

To my understanding, IB private data fields are IB CM specific, so embedding the src/dst IP in them doesn't change the ULP and could be considered part of the IB CM. You can look at the private data in that case as a replacement for the TCP connection setup (SYN/SYN-ACK exchange), and a SYN packet includes IPs & ports.

Yaron
RE: [openib-general] RDMA connection and address translation API
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of James Lentini > Sent: Wednesday, August 24, 2005 1:43 PM > To: Roland Dreier > Cc: openib-general@openib.org > Subject: Re: [openib-general] RDMA connection and address translation API > > > > On Tue, 23 Aug 2005, Roland Dreier wrote: > > > It would be possible to have another function like > > rdma_getpeername() that takes the transport address and > > returns a source IP address. In the IB case this would do an > > ATS reverse lookup. However, I hate this idea. iSER already > > uses the CM private data to pass the source IP in the IB case, > > I know this is how IB SDP works, but I don't think iSER works this > way. > > The code in the tree calls dat_ep_connect() with a NULL private data > pointer. > > There is an iSER HELLO message described in iser_header.h contains IP > addresses, but I'm not certain that this is part of the current > protocol (ISER_HELLO_LEN and ISER_HELLO_REPLY_LEN are unused). James, iSER doesn't mandate the source IP in general since its doing a much stronger authentication during Login However we believe using a similar header to SDP can help the Passive side a. know which destination IP was targeted (in a multi homed environment) b. for some implementations that want to validate the source for some reason that's why the draft suggested adding the source/dst IP in the private data just like SDP does, I believe it can be a good idea to use the same approach for NFS/RDMA and eliminate the need for reverse ATS lookup (the may have some conflicts when multiple IPs exists per node). 
We could just use the SDP hello header as is, with unused fields zeroed. This would allow all ULPs to use the same mechanism. Yaron
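As a rough illustration of carrying the src/dst IP in the CM private data, here is a minimal Python sketch. The field layout, the names pack_hello/unpack_hello, and the 92-byte limit are assumptions made for illustration only; this is not the actual SDP hello header format, and a real ULP would follow whatever the relevant spec or draft defines.

```python
import ipaddress
import struct

# Hypothetical layout, NOT the real SDP hello header:
# version(1) | ip_version(1) | src_port(2) | dst_port(2) | src_ip(16) | dst_ip(16)
_FMT = "!BBHH16s16s"
# Assumed size of the IB CM REQ private data area.
REQ_PRIVATE_DATA_MAX = 92


def pack_hello(src_ip, dst_ip, src_port, dst_port):
    """Build a private-data blob carrying the connection's IPs and ports."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP version")
    # IPv4 addresses are right-aligned in the 16-byte field; unused bytes are zero.
    payload = struct.pack(
        _FMT, 1, src.version, src_port, dst_port,
        src.packed.rjust(16, b"\x00"), dst.packed.rjust(16, b"\x00"),
    )
    if len(payload) > REQ_PRIVATE_DATA_MAX:
        raise ValueError("hello does not fit in the private data area")
    return payload


def unpack_hello(data):
    """Passive side: recover (src_ip, dst_ip, src_port, dst_port)."""
    size = struct.calcsize(_FMT)
    _version, ip_version, src_port, dst_port, src, dst = struct.unpack(_FMT, data[:size])
    width = 4 if ip_version == 4 else 16
    return (
        str(ipaddress.ip_address(src[-width:])),
        str(ipaddress.ip_address(dst[-width:])),
        src_port,
        dst_port,
    )
```

With something along these lines the passive side learns which destination IP was targeted without any reverse lookup, which is the point made above for multi-homed nodes.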
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> > Also on the IB side the AT code probably needs to be reviewed and > improved. The API should be simpler, and I don't like the way AT > sticks its tentacles into the IPoIB driver and network stack. > The AT implementation was based on the code from SDP. I assume that changes similar to the ones you propose would also need to be applied to SDP, or SDP would need to use the same lib as the other ULPs. Yaron
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> -Original Message- > From: Christoph Hellwig [mailto:[EMAIL PROTECTED] > Sent: Friday, August 19, 2005 10:22 AM > To: Roland Dreier > Cc: Yaron Haviv; Christoph Hellwig; Grant Grundler; open- > [EMAIL PROTECTED]; openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > On Thu, Aug 18, 2005 at 09:24:24PM -0700, Roland Dreier wrote: > > Yaron> Not every one wants to keep on doing target discovery with > > Yaron> Python scripts, > > > > Come on, this is just a stupid statement. The whole point of putting > > device management in userspace is so that everybody has the > > flexibility to use whatever discovery mechanism they want. > > And just FYI. If you ever want an iSER implementation merged it will > have to work the same way. Look at how the open-iscsi TCP initator does > it. Good point. The high-level functionality in iSER is all done in Open-iSCSI and its userspace extensions; iSER just deals with the data transfer and is layered under Open-iSCSI. By the way, can you point me to an iSCSI HBA that delivers better performance, latency, and memory consumption? And what about the price of that HBA and the attached 10GbE switch? Yaron
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> -Original Message- > From: Roland Dreier [mailto:[EMAIL PROTECTED] > Sent: Friday, August 19, 2005 12:24 AM > To: Yaron Haviv > Cc: Christoph Hellwig; Grant Grundler; [EMAIL PROTECTED]; > openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > Yaron> Not every one wants to keep on doing target discovery with > Yaron> Python scripts, > > Come on, this is just a stupid statement. The whole point of putting > device management in userspace is so that everybody has the > flexibility to use whatever discovery mechanism they want. You know, there is a small problem in storage: people don't want to just use whatever "they want", but rather standard management, discovery, security, HA, etc., which are quite essential for commercial customers. > I agree that the SRP and iSER protocols are basically equivalent at a > technical level: they both transport SCSI over RDMA. If you want to > compare existing implementations, I'd much rather use my SRP driver's > 1600 lines of code over your 14000+ lines of x86-only iSER on top of > 1+ lines of kDAPL (not even counting the iSCSI core). 
Not sure how you do your LOC counting or what's included in it. In any case, a protocol that is generalized to multiple transports and has built-in discovery, error recovery, global routing/naming, authentication, built-in multi-pathing, multiple connections per session, optimizations for small messages, and comprehensive management and configuration with industry-standard APIs will probably need more LOC than one that just tunnels SCSI commands from one predefined point to another (by the way, are DM, CFM and/or Python included in the 1400 :)). The important things are how many LOC are on the command path and how optimized the protocol is. This code runs SCSI at 850-900MB/s and at the same time provides the most comprehensive set of features, and it is managed out of the box with industry-standard tools. A variation of that code runs today on PPC, so I assume it's not an issue to make sure it runs on PPC. In any case, leaving aside the religious discussion, iSER needs to get into OpenIB, and customers will then decide whatever they want. To get it in we need: 1. iSER developers to comply with Linux requirements and address any constructive feedback 2. an API that can be used by ULP developers who want to be transport independent (till then kDAPL would need to be used) Yaron > > - R.
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
-Original Message- > From: Christoph Hellwig [mailto:[EMAIL PROTECTED] > Sent: Thursday, August 18, 2005 7:45 PM > To: Grant Grundler > Cc: Yaron Haviv; [EMAIL PROTECTED]; openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > > If kDAPL for any reason doesn't get pushed upstream to kernel.org, > > we effectively don't have iSER or NFS/RDMA in linux. > > Since I think without them, linux won't be competitive in the > > commercial market place. > > iser doesn't matter at all in the marketplace. nfs/rdma matters and > even if netapp/citi keeps beeing ignorant I will port it over to the > infiniband/rdma layer myself. I'll hopefully have some iwarp cards > soon. Christoph, can you help me understand how you would address the CM issue? Would you add IB/iWarp-specific code to all the ULPs (NFS, SDP, MPI, Lustre, iSER, ..)? Regarding iSER, you are entitled to your opinion. Many others won't agree with you and think that in the long run iSER will be the only viable block storage alternative in OpenIB, mainly because it fits the IB/iWarp generalization and is much more complete than the alternatives, and with the recent IETF moves people can't claim it's non-standard anymore. Not everyone wants to keep on doing target discovery with Python scripts, and some prefer just using existing code and management from iSCSI rather than inventing new mechanisms just for IB. Yaron
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> -Original Message- > From: Grant Grundler [mailto:[EMAIL PROTECTED] > Sent: Thursday, August 18, 2005 7:41 PM > To: Yaron Haviv > Cc: Grant Grundler; Christoph Hellwig; [EMAIL PROTECTED]; > openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > > > Until OpenIB will define another layer that can be used for both, there > > is no other viable alternative for iSER to be implemented on top > > In future if a new common API/Layer will be provided iSER can change to > > support it > > I've understood that the openib.org Verbs API can be changed to make > it "transport neutral" - ie support RNICs. RNIC vendors don't seem > to be interested in submitting patches for that. Did someone think > they can drop kDAPL into openib.org SVN and roland would automatically > push that into kernel.org? > > I'm not convinced of that and worry that iSER and NFS/RDMA won't > make it into kernel.org as things stand now. > Grant, the verbs portion deals with the data path operations (after the connection has been established); the connection establishment process is very different. The IB CM is implemented on top of the verbs, an iWarp-specific CM would also need to be developed in parallel (it interacts with the TCP stack ..), and common ULPs need a single mechanism to use both (in the DAPL case, a BSD-like API using IP addresses). Again, I'm not saying kDAPL is the ultimate solution or that it will last in its current form; it's just the only thing we can use today. If someone comes up with a better implementation, we can just change iSER. In one of the previous threads I suggested building a hybrid layer that uses the current verb APIs for verb-type operations and the DAPL code for connection establishment, resulting in simpler/shorter code. This would be a middle ground addressing the concerns on both sides. Yaron
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Grant Grundler > Sent: Thursday, August 18, 2005 2:18 PM > To: Christoph Hellwig > Cc: [EMAIL PROTECTED]; openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > On Thu, Aug 18, 2005 at 07:43:17PM +0200, Christoph Hellwig wrote: > ... > > > The same as last time, the code didn't change at all. It's still > > > totally ignorant about such essential things as dma mapping, has > > > creative new abuse for struct iovec, it's still based on iovecs, > > > > "... still based on kdapl" of course > > Yeah, I was wondering about that. When I was off on vacation > in July (and OLS), kDAPL was committed to the svn repository. > Has anyone reviewed that? > > I was under the impression kDAPL would never make it into > the openib.org source tree. Or has something changed? > Grant, currently kDAPL is the ONLY layer that can be abstracted over both IB & iWarp, due to the different CM models of the two interconnects. iSER and NFS/RDMA are common to both IB & iWarp and are implemented to run on both. Until OpenIB defines another layer that can be used for both, there is no other viable layer for iSER to be implemented on top of. In future, if a new common API/layer is provided, iSER can change to support it. We also appreciate your productive feedback on the code; the team will address it. Yaron > grant
RE: [openib-general] [ANNOUNCE] Initial trunk checkin of ISER initiator
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Christoph Hellwig > Sent: Thursday, August 18, 2005 8:36 AM > To: Dan Bar Dov > Cc: [EMAIL PROTECTED]; openib-general@openib.org > Subject: Re: [openib-general] [ANNOUNCE] Initial trunk checkin of > ISERinitiator > > On Thu, Aug 18, 2005 at 03:14:05PM +0300, Dan Bar Dov wrote: > > I just checked in a first version of iSCSI Extensions for RDMA > > Protocol (ISER) initiator under infiniband/ulp/iser. This > > implements the ISER datamover, a transport layer alternative to > > TCP/IP usable by iSCSI. This ISER transport has been tested with > > the open-iscsi opensource project, and against the Voltaire > > Fibre-Channel Router (FCR) and Voltaire's Native-IB storage kit. > > > > All the iSCSI features including device management are available > > seamlessly with the iSCSI/ISER initiator. ISER simply puts iSCSI > > on steroids. > > > > The ISER implementation makes use of the openIB/kDAPL. Please note > > that several kDAPL patches that were submitted to the list are > > necessary for this implementation to work. > > The code is complete crap, please remove it again. Christoph, iSER is part of OpenIB just like any other ULP, and there needs to be a productive process for adding it to the stack. Your feedback is valuable, but we need more details on what concerns you. The iSER team is committed to addressing any feedback presented on this list; after all, it's just an initial posting. Yaron
[openib-general] FW: [Ips] iSER over IB - Consensus call
FYI, for those who don't track the IETF iSCSI WG: at the last IETF meeting in Paris, iSER (iSCSI RDMA) over InfiniBand was discussed again, and as you can see below the IETF gave its green light to make the few semantic changes in the iSER RFC and generalize it to IB. Note also that the iSER over IB/iWarp RFC is in Last Call status. It is interesting to see the convergence, with OpenIB adding iWarp drivers and the IETF adding IB to the iSER RFC, resulting in a common set of drivers, ULPs, and remote boot support. Yaron -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Tuesday, August 09, 2005 11:02 PM To: ips@ietf.org Subject: [Ips] iSER over IB - Consensus call The IPS WG Paris meeting discussed: iSER over InfiniBand (draft-hufferd-iser-ib-00.txt) Proposal for text edits to iSER to permit use on other transports, including InfiniBand. Also will help enable iSER to be defined over SCTP. This draft is (or at least is intended to be) entirely editorial - it does not (or at least is not intended to) make any technical changes to the iSER draft that has passed WG Last Call. The draft Paris minutes record the following: Sense of room: Want to proceed towards applying these changes (after careful review and WG rough consensus) to the approved iSER draft so that there is one draft that is broadly applicable rather than the current iSER draft plus a draft that modifies that draft to broaden it. Anyone who objects to this sense of the room in Paris should post to the list with reasons for the objection, otherwise the sense of the room to proceed in this direction will become the rough consensus of the IPS WG. If the WG does proceed in this direction, the next step will be a WG Last Call on draft-hufferd-iser-ib-00.txt, with all changes/comments/etc. to be posted to the list, even editorial ones. After conclusion of that WG Last Call, the resulting edits can be applied to produce a new version of the iSER draft. 
We'll try to get this done by the end of August, but it may take a bit longer. Thanks, --David David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 [EMAIL PROTECTED] Mobile: +1 (978) 394-7754 ___ Ips mailing list Ips@ietf.org https://www1.ietf.org/mailman/listinfo/ips
RE: [openib-general] [iSER]How to use the iSER with the UNH iSCSI
Ian, currently the UNH iSCSI doesn't support the "Datamover API", a new API defined in the IETF that enables iSCSI to run over offload technologies such as iSER. In addition, the iSER code that is in OpenIB covers only the Initiator side. The Target code has been (and is being) integrated into a few commercial products, or can be provided under license. There are a few people who intend to enable the Datamover API in the UNH iSCSI and integrate it with iSER. They would be happy to see more helping hands; if you are interested I can hook you up with them. Yaron > -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Ian Jiang > Sent: Thursday, August 04, 2005 3:10 AM > To: openib-general@openib.org > Subject: [openib-general] [iSER]How to use the iSER with the UNH iSCSI > > Hi, everybody! > Thanks for all the replis to my "How to get the dat_headers_1_1.tgz"! > I downloaded the dapl_beta2.06.tgz as Itamar told me. > And I made some modification to the iSER to use it on the x86_64 platform. > > I got through the compiling finally, but here is another question: > How to use the iSER with the UNH iSCSI? I have the UNH iSCSI running on my > system at present. Need I modify it and reinstall? > > And I'm not sure if the dapl_beta2.06 has to be installed to run the iSER. > In fact, I did not compile or install the dapl before installing the iSER. > > Any suggestion is appriciated! 
> > Ian Jiang > [EMAIL PROTECTED] > > Computer Architecture Laboratory > Institute of Computing Technology > Chinese Academy of Sciences > Beijing,P.R.China > Zip code: 100080 > Tel: +86-10-62564394(office)
RE: [openib-general] Re: [Rdma-developers] Meeting(07/22) summary:OpenRDMA community development discussion
> -Original Message- > From: Fab Tillier [mailto:[EMAIL PROTECTED] > Sent: Monday, August 01, 2005 1:14 PM > > > From: Sean Hefty [mailto:[EMAIL PROTECTED] > > > > Yaron Haviv wrote: > > > we can spend time and discuss theories and intentions, at the end of > the > > > day an iWarp RNIC cannot just reside under IB-Verbs without major > > > changes to the overall infrastructure. > > > > I don't disagree with having a common connection library that supports > both > > IB and iWarp, or that you could derive a solution from kDAPL. But based > on > > the proposed APIs that I've seen, I believe that an RNIC could reside > under > > IB verbs with minimal changes, and would likely be the best engineered > > solution for including RNIC support in Linux. > > Just for clarity, when you say verbs you exclude connection > establishment/management, right? > > I think keeping the two distinct is important in this discussion, as it > seems > there is some confusion - some people refer to verbs as verbs + CM, others > as > just verbs. > > Here's my take from the discussions so far: > - RNICs can probably be made to work under the IB verbs (with changes of > course). > - RNICs can probably not be made to work under the IB CM (not that I've > seen > this suggested). > Fab, I made the same distinction between pure verbs & the broader API (+CM, SA, ..). I agree that the pure send, receive, .. verbs are similar, with minor differences, and we may just want to adopt them with minor changes. On the other hand, it would not be efficient to try to bend the iWarp CM model to the (complex) IB one; it would be better to use a simpler model, such as the one in DAPL, that fits both camps. In IB we need to use a CM and a bunch of SA queries, whereas the ULP doesn't really need all that and can do with a simple BSD-like connection request (which may map to a more complex IB or iWarp model underneath). The DAPL/BSD-like connection mechanism has enough ways to imply security/QoS/etc. (using a src/dst IP, the network implied by the IP, and kDAPL QoS or BSD TOS, ..), so a user doesn't need direct access to the SA for connections; at most we can add some flags to it. Yaron > - Fab
RE: [openib-general] Re: [Rdma-developers] Meeting (07/22) summary:OpenRDMA community development discussion
> -Original Message- > From: Roland Dreier [mailto:[EMAIL PROTECTED] > Sent: Monday, August 01, 2005 11:14 AM > To: Yaron Haviv > Cc: Christoph Hellwig; Tom Duffy; Venkata Jagana; rdma- > [EMAIL PROTECTED]; openib-general@openib.org > Subject: Re: [openib-general] Re: [Rdma-developers] Meeting (07/22) > summary:OpenRDMA community development discussion > > Yaron> It would probably be wise to try and merge that effort with > Yaron> IB-verbs etc' (e.g. make the verbs portion of the API > Yaron> closer), and on the same time preserve the effort that was > Yaron> done in kDAPL to overcome the differences (e.g. in the CM, > Yaron> addressing portions) > > This doesn't seem like the right approach to me but we'll be happy to > review your patches. So how would you reconcile the differences between IB & iWarp, specifically in the connection establishment portion? In your approach, would I need to access different CM APIs for IB & for iWarp in my ULP? From my perspective the current kDAPL solves that problem (w/o any additional patches), and we are trying to re-invent the wheel here. If patches are really needed they can probably be applied to the kDAPL code (i.e. remove redundant code/simplify kDAPL); however, this is an optimization that can always be done later. Yaron
RE: [openib-general] Re: [Rdma-developers] Meeting (07/22) summary:OpenRDMA community development discussion
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Christoph Hellwig > Sent: Friday, July 29, 2005 8:02 AM > To: Tom Duffy > Cc: Venkata Jagana; [EMAIL PROTECTED]; Christoph > Hellwig; openib-general@openib.org > Subject: [openib-general] Re: [Rdma-developers] Meeting (07/22) > summary:OpenRDMA community development discussion > > On Thu, Jul 28, 2005 at 02:02:08PM -0700, Tom Duffy wrote: > > At OLS (and in previous forums), the kernel maintainers have made it > > *very* clear that there should only be one API. > > _and_ that this api is neither RNIC-PI or KDAPL. In fact for anything > that doesn't look very similar to the current IB midlayer you'd need > very convincing arguments. > I assume it is not as simple as that. The iWarp CM model is quite different from IB's, and iWarp doesn't have an SA/SM and a bunch of other IB-specific things. For example: the correct common abstraction is one where a user can issue a connection by using a logical end-point address (such as an IP), and doesn't have to deal with the IB- or iWarp-specific CM state machine or the SA/SM. If you look at DAPL, you can break it into the simple verbs (e.g. send, ..), where it's just a simple overlay on top of the verbs (and may be redundant). However, there is a second part that implements a simple connection establishment model (much like BSD) that can be mapped to either IB (CM, SA, ..) or iWarp (TCP SYN/SYN-ACK, ARP, etc.). This serves a couple of main purposes: a. make it simple for the ULP developer and put the complex part in a common place b. define a common model for different HW. We can spend time discussing theories and intentions, but at the end of the day an iWarp RNIC cannot just reside under IB-Verbs without major changes to the overall infrastructure. 
Several guys spent some time looking it over and came up with an abstraction that IS possible on top of IB & iWarp & foo. It is called DAPL (or IT, as another similar alternative). It would probably be wise to try to merge that effort with IB-verbs etc. (e.g. bring the verbs portion of the API closer), and at the same time preserve the effort that was put into kDAPL to overcome the differences (e.g. in the CM and addressing portions). Yaron
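To make the "connect by logical address" idea concrete, here is a minimal Python sketch of a BSD-like connect call with transport-specific back ends. All class and function names are hypothetical, and the step descriptions are placeholders for the IB (CM, SA) and iWarp (ARP, TCP SYN/SYN-ACK) work mentioned above; this is not kDAPL or any existing OpenIB API.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical back end hiding the transport-specific connection setup."""

    @abstractmethod
    def connect(self, dst_ip, port):
        """Return the list of transport-specific steps performed."""


class IBTransport(Transport):
    def connect(self, dst_ip, port):
        # Placeholders for the IB-side work (address resolution, SA, CM).
        return [
            f"resolve {dst_ip} to a GID",
            "query the SA for a path record",
            f"run the IB CM handshake for port {port}",
        ]


class IWarpTransport(Transport):
    def connect(self, dst_ip, port):
        # Placeholders for the iWarp-side work (ARP, TCP handshake).
        return [
            f"ARP for {dst_ip}",
            f"TCP SYN/SYN-ACK with {dst_ip}:{port}",
            "switch the connection to RDMA mode",
        ]


def rdma_connect(transport, dst_ip, port):
    """What the ULP sees: an IP, a port, and nothing transport specific."""
    return transport.connect(dst_ip, port)
```

A ULP written against rdma_connect() never touches LIDs, GIDs or the CM state machine; swapping IBTransport for IWarpTransport changes only the steps taken underneath, which is the purpose (a) and (b) above describe.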
RE: [openib-general] IBDM and IBMgtSim Proposal Comments
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Fab Tillier > Sent: Thursday, July 07, 2005 11:37 PM > To: Hal Rosenstock; 'Eitan Zahavi' > Cc: [EMAIL PROTECTED]; openib-general@openib.org > Subject: RE: [openib-general] IBDM and IBMgtSim Proposal Comments > > > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] > > Sent: Thursday, July 07, 2005 10:56 AM > > > > In the OpenIB architecture, umad is the lowest layer library and the > > diagnostics are built on that. > > That's only true in the *Linux* OpenIB Architecture. Windows is different > - the > access layer already provides support for user-level MAD clients, and the > API is > very close (if not identical) to the IBAL interface OpenSM was originally > written to. > From my understanding, the main advantage of using the OSM vendor-specific layer is that it is also present in Windows? Or does it have some other advantage over the umad layer (from Hal's response it seems that umad has better layering/functionality)? If that is the case, then you could just as well suggest replacing the OpenIB verbs layer, CM, etc. with the IBAL ones because they are present in Windows. I believe that if we want to make a major change to a management infrastructure that is alive and kicking (it can probably be improved, as always), we need a much better reason than "it's done this way in Windows". Yaron
RE: [openib-general] [iser]about the target
Ian, an iSER target is basically an iSCSI target with an iSER transport added on. As you may know, Voltaire contributed the iSER Initiator and also has a full Target implementation that was tested with it. There are a few possible Target solutions: at least one major storage vendor is working on a target that will be available later on; Voltaire also provides a software package that can turn a server into an iSCSI/iSER target (not open source), as well as gateway solutions from iSER to FC and GbE; in addition, UNH has started some work on enabling iSER on their open source iSCSI target. Logically, an iSER target is a (much faster) iSCSI target, so you can bridge iSER to FC just as the Cisco MDS bridges from iSCSI to FC. There are standards that define the exact mapping between FC naming and iSCSI naming. Yaron From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause Sent: Tuesday, July 05, 2005 7:28 PM To: Ian Jiang; openib-general@openib.org Subject: Re: [openib-general] [iser]about the target At 06:07 PM 7/4/2005, Ian Jiang wrote: Hi! I am new to the iSER. On " https://openib.org/tiki/tiki-index.php?page=iSER", it is said that iSER currently contains initiator only (no target). Will the target come out later? How did they test the iSER initiator without a iSER target? Could you give some explaination? From a practical perspective, there are very few iSCSI targets shipping today. Most people had envisioned iSER over IB to a gateway Ethernet device since native IB storage is also quite rare in terms of real product. For many of us, our push for iSER over IB was to replace SRP which has a deficient ecosystem thus not really used beyond some basic Fibre Channel gateway cards. Mike Thanks! 
Ian Jiang [EMAIL PROTECTED] Computer Architecture Laboratory Institute of Computing Technology Chinese Academy of Sciences Beijing,P.R.China Zip code: 100080 Tel: +86-10-62564394(office)
RE: [openib-general] performance counters in /sys
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Hal Rosenstock > Sent: Thursday, May 19, 2005 11:22 PM > > On Thu, 2005-05-19 at 16:11, Mark Seger wrote: > > > > The only other thing that could be useful would be an extra field for > > the protocol, such that for a given interface/port, I could see the > > traffic counters for each type of protocol that one might choose to > > support, such as mpi, portals, etc. > > There are no hardware counters for these. These would need to be filled > in somehow by software. > Mark/Hal, I believe you can use the per-VL counters for that (IB allows counting traffic on a specific VL). By matching ULPs to VLs (e.g. through the ib_at lib we suggested), you get both congestion isolation per traffic type and the ability to count traffic per ULP (note that up to 8 VLs are supported in the Mellanox chips). Yaron
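The per-VL accounting suggested above could look roughly like the following Python sketch. The ULP-to-VL assignment and the function name are hypothetical, and reading the per-VL counters from the hardware is left out; the only figure taken from the mail is the limit of 8 VLs.

```python
# Limit mentioned above for the Mellanox chips.
MAX_VLS = 8

# Hypothetical assignment of one ULP per VL, for illustration only.
ULP_TO_VL = {"ipoib": 0, "sdp": 1, "iser": 2, "mpi": 3}


def per_ulp_counters(vl_counters, ulp_to_vl=ULP_TO_VL):
    """Translate per-VL traffic counters into per-ULP counters.

    vl_counters maps a VL number to a byte (or packet) count; how those
    counts are obtained from the port is outside this sketch.
    """
    vls = list(ulp_to_vl.values())
    if len(set(vls)) != len(vls):
        # Two ULPs sharing a VL could not be told apart by the counters.
        raise ValueError("each ULP needs its own VL to be counted separately")
    if any(vl < 0 or vl >= MAX_VLS for vl in vls):
        raise ValueError(f"VL numbers must be in the range 0..{MAX_VLS - 1}")
    return {ulp: vl_counters.get(vl, 0) for ulp, vl in ulp_to_vl.items()}
```

The one-ULP-per-VL check matters: as soon as two protocols share a VL, the hardware counter can no longer attribute traffic to either of them.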
RE: [openib-general] IB Address Translation service
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Hal Rosenstock > Sent: Saturday, March 05, 2005 6:18 PM > To: David M. Brean > Cc: openib-general@openib.org > Subject: Re: [openib-general] IB Address Translation service > > On Sat, 2005-03-05 at 10:22, David M. Brean wrote: > > There is an I-D for DHCP on IB. IPoIB defines a "broadcast" address and > > DHCP (and ARP) on IB use it. Could make RARP work using this mechanism, > > but as someone else pointed out, the IB hardware address contains a > > QPN. The I-D for IPoIB says something like: > > > > The link-layer address for IPoIB includes the QPN which might not be > > constant across reboots or even across network interface resets. > > Cached QPN entries, such as in static ARP entries or in RARP servers > > will only work if the implementation(s) using these options ensure > > that the QPN associated with an interface is invariant across > > reboots/network resets. > > That may be the requirement but I think there are some issues with > keeping the QPN invariant. Quoting Dror Goldenberg > (http://openib.org/pipermail/openib-general/2004-November/006765.html): > "Assigning specific QPN for ipoib requires allocation of QPN space which > is beyond IB spec verbs. Current verbs do not allow it. I don't have any > objection for that, except that you have to hold a set of preallocated > QPs with specific numbers and hand them over to privileged consumer when > requested to. I wouldn't commit that it will work on any HCA > architecture." > > -- Hal > Just to add to Hal and Dave: it is not only that the QPN may not be constant. You can actually have several valid QPNs, one or more per partition. Since each partition reflects the notion of an IP VLAN/network, the RARP should return a different IP per partition, and the RARP caller should use a different QPN in each case. 
I believe all the emails in this thread clarify why RARP is not a valid approach. Yaron
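The per-partition point can be illustrated with a small Python sketch of a reverse lookup keyed on (port GID, P_Key) instead of on a link-layer address containing a QPN. The table contents, the GID and the P_Key values are made-up sample values, and the function names are hypothetical; this is not the ATS or RARP wire format.

```python
# Made-up sample data: one port that is a member of two partitions,
# each partition acting as a separate IP network.
REVERSE_TABLE = {
    ("fe80::2:c903:0:1", 0x8001): "192.168.1.10",
    ("fe80::2:c903:0:1", 0x8002): "192.168.2.10",
}


def reverse_lookup(port_gid, pkey, table=REVERSE_TABLE):
    """Return the IP configured for this port in this partition, or None."""
    return table.get((port_gid, pkey))


def ips_for_port(port_gid, table=REVERSE_TABLE):
    """All IPs of a port across partitions, showing a GID alone is ambiguous."""
    return sorted(ip for (gid, _pkey), ip in table.items() if gid == port_gid)
```

A lookup that supplied only the hardware address would have to pick one of the two answers, which is the ambiguity described above; the QPN does not help, since it may change across resets.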
RE: [openib-general] putting in dead wood for DAPL and similar abomination
> -----Original Message-----
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 03, 2005 5:48 AM
> To: Yaron Haviv
> Cc: Christoph Hellwig; James Lentini; openib-general@openib.org
> Subject: Re: [openib-general] putting in dead wood for DAPL and
> similarabomination
>
> The current iSER code is 10928 LOC, add to that 22155 LOC of kDAPL (not
> including the actual provider for IB) and 5822 LOC linux-iscsi kernel
> code. Compare that to the 25412 LOC total for drivers/infiniband in Linux
> 2.6.11.

As Tom indicated, we expect a significant code shrink for kDAPL. It will be much more Linux friendly when we are done with it, and some parts will be rewritten. The iSER code is also not optimal in terms of LOC, and we can clean up some redundant code if we are in an LOC contest. I believe that after we glue all the layers together we will focus on reducing LOC and test code.

> Here's the challenge: if someone gets me the funding I'll write
> complete iSER of IB implementation in less than 10k LOC based on the
> open-iscsi code if someone gets me the funding.

You know, there is also the challenge of making it work, perform, interoperate, and support some features; not everything is about LOC :)

Anyway, thanks for offering us support; we may take you up on that some day.

Yaron
RE: [openib-general] putting in dead wood for DAPL and similarabomination
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Wednesday, March 02, 2005 11:49 PM
> To: James Lentini
> Cc: Christoph Hellwig; openib-general@openib.org
> Subject: Re: [openib-general] putting in dead wood for DAPL and
> similarabomination
>
> On Wed, Mar 02, 2005 at 11:11:35AM -0500, James Lentini wrote:
> > DAPL has been efficiently supported on top of InfiniBand, iWARP, the
> > Virtual Interface Architecture, Quadrics, and Myrinet.
>
> And I've not seen any kernel submittsion for either of them - and what's
> important no single kDAPL application that actually shows any benefit
> that way. Volatair's iSER implementation would surely be smaller when
> directly written to the OpenIB interface, and is already smaller than
> the whole kDAPL layer.

Christoph, the reason the iSER code is very thin is that it uses kDAPL (and Linux iSCSI); it doesn't need to deal with SA calls, CM calls, LIDs, GIDs, and a bunch of other things.

Besides being RDMA transport independent, DAPL enables people to code to RDMA without being intimately familiar with the HW. We saw people coding to it within days, which I can't say for Verbs.

Abstraction layers are not new to Linux: Sockets is another type of abstraction with multiple protocols/families underneath, or even Ethernet. Why aren't you suggesting one TCP implementation for ATM cards, another for PPP, etc.?

Yaron
RE: [openib-general] IB Address Translation service
> -----Original Message-----
> From: Tom Duffy [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 02, 2005 1:02 AM
> To: Yaron Haviv
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] IB Address Translation service
>
> [ putting back on list ]
>
> On Wed, 2005-03-02 at 00:29 +0200, Yaron Haviv wrote:
> > Did you try RARP with IPoIB ?
>
> I have not.
>
> > I thought that there is some issue that it doesn't work
>
> Currently, the rarpd only works with ethernet, but I don't see why this
> couldn't be fixed.
>

Tom, the IPoIB HW address consists of GID+QPN+.. To issue a RARP request, I believe you have to supply the full HW address to get the IP address back. How would you know the remote IPoIB QPN? Or can you do it without a QPN?

Yaron
RE: [openib-general] putting in dead wood for DAPL and similarabomination
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Christoph Hellwig
> Sent: Wednesday, March 02, 2005 12:06 AM
> To: openib-general@openib.org
> Subject: [openib-general] putting in dead wood for DAPL and
> similarabomination
>
> Please don't put in things like the address translation service or
> memory windows for DAPL folks. The IB code in the kernel already
> has far too much unused stuff and adding more will not go past reviews
> for kernel inclusions - as will DAPL itself exactly because of such
> utter stupidities.

Even if your approach to DAPL was right, you still have an address translation service in SDP, and would need one for NFS/RDMA, another for iSER, another for Lustre, etc. (even if they are coded directly to the verbs), not to mention other protocols that access the SA (e.g. SRP, ..).

So is your idea to duplicate that functionality for all the ULPs? Would that make the code simpler and easier to maintain?

Yaron
RE: [openib-general] IB Address Translation service
Eric, let me correct some of your assumptions Which this API is actually targeting to protect against, see below > -Original Message- > From: Eric W. Biederman [mailto:[EMAIL PROTECTED] On Behalf Of Eric W. > Biederman > Sent: Tuesday, March 01, 2005 9:18 AM > To: Yaron Haviv > Cc: Roland Dreier; shaharf; openib-general@openib.org > Subject: Re: [openib-general] IB Address Translation service > > "Yaron Haviv" <[EMAIL PROTECTED]> writes: > > > > -Original Message- > > > From: [EMAIL PROTECTED] [mailto:openib-general- > > > [EMAIL PROTECTED] On Behalf Of Roland Dreier > > > Sent: Monday, February 28, 2005 7:13 PM > > > To: shaharf > > > Cc: openib-general@openib.org > > > Subject: Re: [openib-general] IB Address Translation service > > > > > > This API seems overly complex and at the same time too inflexible to > > > me. However, rather than getting bogged down nitpicking about APIs, I > > > think we have to take a few steps back. > > > > I believe the API is very flexible, but we are pretty open to here what > > you think is needed in addition > > > > > First, let's understand the problem we're trying to solve. Who are > > > the consumers of this address translation service? > > > > The first problem is that most ULPs use valid IP addresses for > > simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc') and someone needs to > > resolve it to an IB address and device to use IB. This should take into > > account cases where there are more than one HCAs in the system. > > Preferable/optionally the ULP would like to know which partition to use > > if there is more than one, and leverage on the IP subnetting done by > > IPoIB. > > I am confused. In any sane network the translation is: > Hostname -> address. > > IP because it spans multiple networks does: > Hostname -> IP address -> hw address. > > IB because it can span multiple IB networks does: > GUID+QPN -> LID + QPN. > > So what is wrong with simply doing: > Hostname -> GUID > ??? 1. 
In standard protocols such as SDP, iSER, NFS/RDMA, Oracle, .. (unlike OSU MPICH) the name service is one of the standard IP name services mapping host names to IP addresses, and the ULP accepts a destination IP and NOT a host name.

2. The InfiniBand hardware address is a GID, not a LID. The LID is a path attribute, introduced to avoid the slow 48-bit lookup done in Ethernet and to enable multi-pathing. A LID is dynamically allocated, and you may also have multiple LIDs per port. (The OSU MPICH implementation is a bad example of IB citizenship.)

So to summarize:

Ethernet:   Host Name -> IP -> MAC Address
InfiniBand: Host Name -> IP -> GID Address -> Path (LID, SL, ..)

So if we intend to rely on standard name services we can start with IP (or implement a proprietary name service for Name->HW Addr if we wish).

Then we need to translate an IP to a HW address (GID/GUID) and the equivalent of VLANs (partitions). This is provided by the ib_at_route_by_ip call. Internally it is based on IP and IPoIB mechanisms, similar to how Libor implemented it in SDP (and optionally, if we see a need, on ATS).

Then in IB we need to resolve a GID to path attributes, which consist of LID, SL/VL, MTU, etc. The inputs are the source, destination, partition and QoS attributes, and the result is a path. Since IB also supports multi-pathing, a user may receive multiple paths that can be used for high availability, performance aggregation, or source-based routing. A path may also travel through isolated congestion domains using VLs.

The ib_at_paths_by_route call resolves a HW address plus preferences to one or more path records that are then used by the ULP and CM. It can also be used by non-IP based ULPs such as SRP or MPICH; that is why the API, unlike the current SDP implementation, is divided into 2 calls, one for the HW address and one for the path.
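The two-stage chain described above (IP to GID plus partition, then GID to one or more paths) can be sketched as follows. This is only an illustration under stated assumptions: every name prefixed with demo_ is hypothetical, the signatures are not those of the proposed ib_at calls, and static tables stand in for the ARP cache and the SA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stage 1 result: "hardware address" plus partition. Hypothetical type. */
struct demo_route {
    uint64_t dgid_hi, dgid_lo;  /* destination GID as two 64-bit halves */
    uint16_t pkey;              /* partition, the IB analogue of a VLAN */
};

/* Stage 2 result: one usable path to that GID. Hypothetical type. */
struct demo_path {
    uint16_t dlid;
    uint8_t  sl;
};

/* Toy stand-ins for the ARP cache and the SA; a real stack would query them. */
static const struct { uint32_t ip; struct demo_route route; } arp_table[] = {
    { 0x0a000002u, { 0xfe80000000000000ull, 0x0002c90100000001ull, 0xffff } },
};
static const struct { uint64_t gid_lo; struct demo_path path; } sa_table[] = {
    { 0x0002c90100000001ull, { 0x0011, 0 } },
    { 0x0002c90100000001ull, { 0x0012, 0 } },  /* second path (multi-pathing) */
};

/* Stage 1: IP -> GID + P_Key. Returns 0 on success, -1 if the IP is unknown. */
static int demo_route_by_ip(uint32_t ip, struct demo_route *out)
{
    for (size_t i = 0; i < sizeof(arp_table) / sizeof(arp_table[0]); i++) {
        if (arp_table[i].ip == ip) {
            *out = arp_table[i].route;
            return 0;
        }
    }
    return -1;
}

/* Stage 2: GID -> at most max paths. Returns the number of paths written. */
static size_t demo_paths_by_route(const struct demo_route *r,
                                  struct demo_path *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof(sa_table) / sizeof(sa_table[0]) && n < max; i++) {
        if (sa_table[i].gid_lo == r->dgid_lo)
            out[n++] = sa_table[i].path;
    }
    return n;
}
```

Keeping the stages separate means a consumer that already holds a GID (SRP, for example) can call only the second stage, while an IP-based ULP chains both.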
Currently OSU MPICH uses proprietary name and LID+QP assignment. It doesn't work the standard IB way with SA & CM, so it doesn't make use of a lot of IB capabilities, and it is also more static and less robust; I wouldn't use it as the example for ULP implementation. The MPI layer, which has no idea about the fabric routing/utilization/availability, is determining the path.

Another simple scenario: your application requires running MPI and NFS on different IB VLs. Today you need to configure that manually (recompile) in each ULP; with this proposal it can be done automatically with a central configuration on the SM.

On the other hand SDP uses the same mechanisms; however, we cannot use it for other ULPs (e.g. kDAPL), and it is also missing functional
RE: [openib-general] IB Address Translation service
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Tuesday, March 01, 2005 2:07 AM
To: openib-general@openib.org
Subject: RE: [openib-general] IB Address Translation service

At 11:47 AM 2/28/2005, Yaron Haviv wrote:

>It would be a mistake to attempt to use anything by IP addresses (v4 or v6) from an
>application perspective. Mapping to IB must be application transparent to be viable.
>
>Mike

Mike, the whole idea behind the proposed ib_at calls is to provide semantics matching between IP and IB such that applications won't feel the difference but can still make use of all IB capabilities. Advanced applications can still use the advanced IB-specific functionality through the optional parameters. ATS is just a minor option in the API (the less important one).

Yaron
RE: [openib-general] IB Address Translation service
> -Original Message- > From: Libor Michalek [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 01, 2005 2:04 AM > To: Yaron Haviv > Cc: Paul Baxter; openib-general@openib.org > Subject: Re: [openib-general] IB Address Translation service > > On Mon, Feb 28, 2005 at 09:55:50PM +0200, Yaron Haviv wrote: > > From: Libor Michalek > > > > > > The two are not interoperable, they > > > reside in parallel, and succeed in producing much confusion. (IMO) > > > > One note, the two can be made interoperable, if nodes that use IPoIB > > register them self in the ATS database as well (which has its merits for > > reverse resolution that cannot be satisfied by IPoIB), this way the > > nodes that just use ATS can locate the IPoIB ones. > > This relies on each node in a fabric keeping the information between > the two parallel methods in sync. Which leads to the question, why have > two independent methods for getting the exact same information? The > only logical answer is that there are some nodes which can only use > one of the methods. In which case the two sets of data are not identical, > because of these nodes, which succeeds in producing much confusion. Not > to mention the race conditions between keeping a centralized database > (ATS) > in sync with the distributed mechanism. (ARP) > > For these reasons I cringe at hearing IP address and ATS in the > same sentence, I really wish DAT had chosen a different name for > the addresses. > > Really, we all discussed this years ago in the IETF, the merits of > using broadcast vs. centralized data store, and a solution was > developed. This is why open standards bodies are so useful. > Libor, I agree with most of your statements here, I also advocated to use ARP based mechanisms in the DAT calls rather than ATS. And our DAPL implementation enable ARP based resolution in addition to ATS The one thing that ATS provide and is not possible with ARP is reverse resolution GID->IP, any ideas how to achieve that without ATS ? 
Protocols such as SDP and iSER pass the source IP address as part of the CM REQ private data, so they don't really need the reverse translation. The DAT people have tried to make a generalized mechanism; I assume James or Arkady should comment on the need for ATS and DAPL reverse resolution.

One other approach would be to provide ATS support to user applications only, and eliminate the kDAPL support for those functions. Also, I kind of like Paul's application for thin IB clients.

> > Anyway the merits if the proposed API goes much beyond the use of ATS,
> > so I hope we don't just hang on that one.
>
> Agreed, there is certainly a lot more to discuss then just ATS and ARP.
>

Any comments on my email explaining the forward resolution mechanisms (IP->GID, GID->path)? (not relating to ATS)

Yaron

>
> -Libor
RE: [openib-general] IB Address Translation service
> -----Original Message-----
> From: Tom Duffy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 01, 2005 1:38 AM
> To: Yaron Haviv
> Cc: Paul Baxter; openib-general@openib.org
> Subject: RE: [openib-general] IB Address Translation service
>
> On Mon, 2005-02-28 at 21:47 +0200, Yaron Haviv wrote:
> > And as you mentioned there is value to have the same API for different
> > resolution mechanisms, the SDP code can be altered in future to ride
> > over the proposed API, so it can be used without TCP/IP.
>
> I am not sure you are gaining much by having SDP use straight ATS.
> Already, once the ARP table is filled with the information, it is a
> local cached lookup.

Tom, the value to Paul is not performance related, but the ability to resolve an IP to a GID without requiring a TCP/IP implementation. I can think of some applications that would like a thin client and still use valid IPs.

Anyway, as I mentioned, ATS support is just one (very minor) thing in the API we proposed. Can I assume you don't have comments on the main functionality in ib_at?

Yaron
RE: [openib-general] Question
Ron, I believe netdiscover uses direct route MADs So it can work also when the fabric is not fully initialized Yaron > -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Ronald G. Minnich > Sent: Tuesday, March 01, 2005 12:07 AM > To: openib-general@openib.org > Subject: [openib-general] Question > > > If ibnetdiscover can do stuff like this: > hcaguids=0xc074660801c90200 > Hca 2 "H-0002c901086674c0" # MT23108 InfiniHost Mellanox > Technologies > [1] "S-0002c90112c08b40"[2] # lid 0 lmc 0 > > > etc. etc. > > i.e., probe all the way to the edge of the network and find things out, > what could be going on such that opensm won't work at all? I did an svn > update and complete rebuild friday. But opensm is still totally stuck. > > I have power cycled all switches, and indeed the whole system. I did yank > (yet another) dead power supply on one mellanox switch, but still .. > ibnetdiscover is happy, and opensm is not. > > opensm -r does not help. > > I'm basically baffled. What is ibnetdiscover able to do that opensm is not > able to do? > > thanks > > ron > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] IB Address Translation service
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Libor Michalek > Sent: Monday, February 28, 2005 8:55 PM > To: Roland Dreier > Cc: openib-general@openib.org > Subject: Re: [openib-general] IB Address Translation service > > > SDP does implement a subset of the proposed functionality for > resolving IP addresses to PathRecords which can then be used in > a CM REQ request, plus some basic caching. All the code is isolated > to a single file, sdp_link.c. There's really only a single entry > point API, plus a completion function: > > int sdp_link_path_lookup(u32 dst_addr, >u32 src_addr, >int bound_dev_if, >void (*completion)(u64 id, > int status, > u32 dst_addr, > u32 src_addr, > u8 hw_port, > struct ib_device *ca, > struct ib_sa_path_rec *path, > void *arg), >void *arg, >u64 *id); > > The values are based on strictly what is needed by either the Linux > routing code to resolve the address, or the IB APIs to establish the > connection. The implementation has three stages: > > - src/dst IP address -> IPoIB net_device, IB ca, IB port, IB pkey. > - dst IP address and IPoIB net_device -> dst GID using IPoIB ARP > - dst GID -> PathRecord using ib_sa. > > A cancel function based on the 'id' parameter would be a nice to have > but is not strictly necessary, since the lookup will eventually compelte > one way or another and any dead connection will be cleaned up at that > point. > Libor the idea is that ib_at provides similar functionality Sahar looked through your SDP code prior to proposing the API We would like to have a common API for all the ULP's that provide that functionality, and specifically now when we implement kDAPL over OpenIB. To summaries the differences: The reasons we broken it to two functions (IP->GID, GID->Path) and not have an IP->Path API (like we also used to have in our gen1 stack) are: a. some consumers will only need the 1st part (e.g. just to know which HCA to use) b. 
some may use only the 2nd part (e.g. IPoIB, SRP)
c. you can get parameters from the first part (e.g. the P_Key) and decide to override them with your own (your own P_Key, etc.)
d. the 2nd function provides more options for multipath, partitioning, QoS
e. we can now more easily use different IP resolution mechanisms (ARP or ATS) without changing the 2nd function.

We added source IP and TOS as optional parameters for the IP->GID call, simply because an IP route can be defined per src/dst/TOS, and that is already part of Linux. We added multipath, IB QoS, etc. because we have more than one application that needs them today: e.g. people who want to run IPC/MPI and NFS on the same fabric may want 2 separate VLs and SLs, some applications need APM support, some applications need source-based routing, ..

Since there are commercial SMs that can provide all the advanced capabilities, we want to enable the OpenIB stack to make use of them.

By default you can nest the 2 functions (call the first, then use its result to call the second). The result is that the ULP will use the HCA/port associated with the IPoIB subnet, the same partition as the IPoIB interface, and the same QoS/SL as IPoIB. Optionally the consumer can supply his own QoS, partitioning, etc. to the 2nd function if he knows where to take it from.

An example of what you can get with the default mode: you define a few partitions in the fabric (central configuration), each with its IP subnet and SL (from the MCRecord), and then just use different IP subnets for isolation (partitions or VLs). There is NO need for any manual/local configuration on the host side, and your IPC and storage are now running on separate VLs and/or partitions. What we get is something we can explain to users and developers who haven't read the 1000s of pages of the IB book and don't sit regularly in IBTA meetings.
Yaron
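The default nesting described in this message, where the second call inherits the partition and SL found by the first unless the consumer overrides them, might look like the sketch below. The struct and function names are hypothetical and are not part of the proposed API; a NULL pointer is simply this sketch's way of saying "no override".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the first (IP -> GID) step learned from the IPoIB interface. */
struct demo_route_info { uint16_t pkey; uint8_t sl; };

/* Optional consumer overrides; a NULL member means "inherit from the route". */
struct demo_path_prefs { const uint16_t *pkey; const uint8_t *sl; };

/* Parameters that would be handed to the second (GID -> path) step. */
struct demo_path_req { uint16_t pkey; uint8_t sl; };

static struct demo_path_req demo_build_req(const struct demo_route_info *route,
                                           const struct demo_path_prefs *prefs)
{
    /* Default mode: same partition and SL as the IPoIB interface. */
    struct demo_path_req req = { route->pkey, route->sl };

    if (prefs && prefs->pkey)
        req.pkey = *prefs->pkey;  /* consumer-supplied partition */
    if (prefs && prefs->sl)
        req.sl = *prefs->sl;      /* consumer-supplied service level */
    return req;
}
```

With prefs set to NULL, per-subnet isolation follows entirely from the central fabric configuration; a consumer that knows better can override a single field without touching the other.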
RE: [openib-general] IB Address Translation service
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Libor Michalek
> Sent: Monday, February 28, 2005 9:49 PM
> To: Paul Baxter
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
>
> The two are not interoperable, they
> reside in parallel, and succeed in producing much confusion. (IMO)

One note: the two can be made interoperable if nodes that use IPoIB also register themselves in the ATS database (which has its merits for reverse resolution, which IPoIB cannot satisfy). This way the nodes that use only ATS can locate the IPoIB ones. That is how it works (successfully) in the Voltaire gen1 stack.

Anyway, the merits of the proposed API go much beyond the use of ATS, so I hope we don't just hang on that one.

Yaron
RE: [openib-general] IB Address Translation service
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Paul Baxter
> Sent: Monday, February 28, 2005 9:32 PM
> To: openib-general@openib.org
> Subject: Re: [openib-general] IB Address Translation service
>
> Having now just read Yaron's reply, I am even more convinced that this is
> the right way to go albeit I can't comment on the API etc (Could someone
> explain the differences in using ARP and ATS. )

Paul, ATS (Address Translation Service) is based on each node registering a service record in the SM/SA that binds its GID and P_Key to its IP address. When a node wants to map an IP address to an IB address, it issues an SA query to the SM/SA with the IP, which returns GID+P_Key values that can then be used by the ULP. ATS is a standard defined by DAT and recently also by ICSC.

As I mentioned, in the IP to GID API you can specify whether to resolve based on the IP infrastructure (like the one Libor described), based on ATS, or Default (first try IP/ARP, then ATS).

And as you mentioned, there is value in having the same API for different resolution mechanisms. The SDP code can be altered in the future to ride over the proposed API, so it can be used without TCP/IP.

Yaron
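The three resolution modes mentioned above (IP/ARP only, ATS only, or Default, which tries IP/ARP first and then ATS) can be sketched as a small dispatcher. Everything here is an assumption made for illustration: the enum, the callback type and the toy backends are hypothetical and do not reflect the actual ib_at interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode selector for the IP -> GID step. */
enum demo_at_mode { DEMO_AT_DEFAULT, DEMO_AT_IP, DEMO_AT_ATS };

/* A backend returns 0 and fills *gid_lo on success, -1 otherwise. */
typedef int (*demo_resolver)(uint32_t ip, uint64_t *gid_lo);

static int demo_resolve(enum demo_at_mode mode, uint32_t ip, uint64_t *gid_lo,
                        demo_resolver arp, demo_resolver ats)
{
    switch (mode) {
    case DEMO_AT_IP:
        return arp(ip, gid_lo);
    case DEMO_AT_ATS:
        return ats(ip, gid_lo);
    default:
        /* Default: first try IP/ARP, then fall back to ATS. */
        if (arp(ip, gid_lo) == 0)
            return 0;
        return ats(ip, gid_lo);
    }
}

/* Toy backends: the ARP side knows one host, the ATS side knows two. */
static int demo_arp(uint32_t ip, uint64_t *gid_lo)
{
    if (ip != 0x0a000002u)
        return -1;
    *gid_lo = 0x1111;
    return 0;
}

static int demo_ats(uint32_t ip, uint64_t *gid_lo)
{
    if (ip != 0x0a000002u && ip != 0x0a000003u)
        return -1;
    *gid_lo = (ip == 0x0a000002u) ? 0x1111 : 0x2222;
    return 0;
}
```

The caller sees one entry point regardless of mechanism, which is what would let a thin client without a TCP/IP stack use ATS while an SDP-style consumer keeps using the ARP cache.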
RE: [openib-general] IB Address Translation service
> -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Roland Dreier > Sent: Monday, February 28, 2005 7:13 PM > To: shaharf > Cc: openib-general@openib.org > Subject: Re: [openib-general] IB Address Translation service > > This API seems overly complex and at the same time too inflexible to > me. However, rather than getting bogged down nitpicking about APIs, I > think we have to take a few steps back. I believe the API is very flexible, but we are pretty open to here what you think is needed in addition > First, let's understand the problem we're trying to solve. Who are > the consumers of this address translation service? The first problem is that most ULPs use valid IP addresses for simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc') and someone needs to resolve it to an IB address and device to use IB. This should take into account cases where there are more than one HCAs in the system. Preferable/optionally the ULP would like to know which partition to use if there is more than one, and leverage on the IP subnetting done by IPoIB. It is possible to replicate the same code you have in SDP (which is also not complete) across all ULP's, I assume a better way is to provide it in one central place. There are also two proposed address resolution mechanisms, one is ARP used by SDP, and one is ATS used by some DAPL consumers, and we believe it is better to combine them under the same API. The second problem relates to mapping of IB GID to one or more Path records This is also something needed for ALL ULP's. today each ULP provides the minimal subset of path resolution functionality without taking into account topics such as partitioning, QoS, source routing and multi-pathing. Some of these require using special SA queries (such as SA Multipath Record query and QoSPath Query). I don't think it make sense to put all this functionality into each ULP as well. 
Then we can also discuss whether it makes sense to have each path resolution call lead us to the SA, or whether it makes more sense to cache those paths. And if we cache, doesn't it make more sense to cache/invalidate the routes for all ULPs rather than implementing it in each ULP? I am also not sure how a 1000 node cluster functions without the caching.

The last problem is reverse resolution from IB to IP addresses, which is needed for DAPL as well as for different management and diagnostic tools that want to know which node/port is really behind a given GID.

So how would you suggest going about it? Duplicate all of that in each ULP? Refrain from implementing advanced routing, partitioning, QoS (we can't really maintain all that advanced code for each ULP)?

Our idea is to provide a few helper functions that enable people to make full use of IB and its features without reading the whole IB spec and without a PhD. If you strip all the remarks from the library, you will see it is very slim, and to my understanding it includes all the relevant input and output parameters for each of the 3 functions I mentioned.

As Shahar mentioned, this is just a proposal, and if you see anything missing in the API, or a better way to address the requirements I just listed, I'm happy to hear it. The API doesn't define the implementation, which we can discuss once we agree on the functionality and interfaces, and you have some valid questions there that need to be addressed.

Yaron

> Second, let's come up with the right architecture to solve the
> problem. Are we implementing a library in userspace or a kernel
> module? Do we have a single cache or do we need multiple caching
> policies? And so on...
>
> Finally, we can design the API.
> > Thanks, > Roland > > > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop
Just to add, there is a Lustre NAL over kDAPL in development, and a few other application-specific protocols done over kDAPL that I know of.

All the protocols Arkady mentioned can work on both RDMA technologies and were designed that way (I'm familiar with their code and architecture).

Another great benefit of kDAPL is that it simplifies the Verbs & Access Layer API, making the ULPs simpler to implement: a lot of common functionality is done by a shared library (kDAPL), with a socket-like connection establishment flow, etc.

If we want IB to be successful we need to find a way for software developers to easily build implementations over it (even if not all of the applications are open and part of the Linux tree). Forcing all Linux RDMA developers to code to Verbs, CM, SA, ... is probably not the best approach (the IB spec is many pages, as you all know).

For all the guys worried about performance degradation: in the latest Verbs vs DAPL benchmark we did, we got 100% the same BW and ONLY 200ns latency difference.

There is agreement that the current kDAPL API and implementation are not Linux friendly, and as mentioned before a bunch of people volunteered at Sonoma to do the work involved in changing it, and agreed to make the kDAPL API different from uDAPL and more suitable for the kernel.

Yaron

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
> Sent: Friday, February 11, 2005 1:49 AM
> To: Libor Michalek; Matt Leininger
> Cc: Christoph Hellwig; openib-general@openib.org; Tom Duffy
> Subject: RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop
>
> For kDAPL:
> The iSER has been submitted to Open Ib by Voltaire already.
> NFS-RDMA is at http://sourceforge.net/projects/nfs-rdma/.
>
> For uDAPL: Oracle, DB2 and MPI.
> I am not aware if there is an open source MPI version on uDAPL.
>
> These are publicly known.
>
> As far as changing the uDAPL or kDAPL APIs.
> There are application already writen to them. > There are implementation of these APIs on other platforms besides Linux. > It is in nobody's interest to splinter the user community. > We need the same API on all platforms. > If there is a good technical reason to change some specific APIs we > should consider it. > But the "burn the spec" approach is not a rationale one. > If we need to change implementation or some definitions in header files > it is feasible. > > As far as other transport. As people already mentioned iWARP (IETF > RDDP). > IBM talked at the BOF about RNIC PI which is being developed as a level > of > abstraction on the lower end to "discover" all the need info about > RNIC/HCA. > It is still no ready so we will start with gen2. > But lets not loose site of what DAPL brings: > OS independent, > Transport independent, > RDMA APIs!!! > > Thanks for jumping on the code so quickly. > Arkady > Chair of DAT Collaborative > > Arkady Kanevsky email: [EMAIL PROTECTED] > Network Appliance phone: 781-768-5395 > 375 Totten Pond Rd. Fax: 781-895-1195 > Waltham, MA 02451-2010 central phone: 781-768-5300 > > > > > -Original Message- > > From: Libor Michalek [mailto:[EMAIL PROTECTED] > > Sent: Thursday, February 10, 2005 4:03 PM > > To: Matt Leininger > > Cc: Christoph Hellwig; openib-general@openib.org; Tom Duffy > > Subject: Re: FW: [openib-general] Minutes from DAPL BOF at > > OpenIB Workshop > > > > > > On Thu, Feb 10, 2005 at 12:36:39PM -0800, Matt Leininger wrote: > > > On Thu, 2005-02-10 at 12:27 -0800, Grant Grundler wrote: > > > > On Thu, Feb 10, 2005 at 12:05:58PM -0800, Matt Leininger wrote: > > > > > uDAPL - Oracle, MPI > > > > > kDAPL - iSER, NFS over RDMA, Lustre? > > > > > > > > Lustre will use Sandia Portals AFAIK. > > > > Anyone know what Portals will use? > > > > They might directly program to VAPI or something. > > > > > > > There will be a Portals over verbs. 
At some point there may be a > > > Portals over kDAPL to support both RDMA ethernet and IB. > > > > Yup, that's one of the bigger questions, can it abstract > > away the differences between two different RDMA technologies? > > Having a RDMA ethernet and IB providers for kDAPL is > > insufficient, one would need to show an actual, non-trivial, > > protocol that works ontop of either provider with no, or > > little, modification/ifdef'ing. > > > > -Libor > > > > > > ___ > > openib-general mailing list > > openib-general@openib.org > > http://openib.org/mailman/listinfo/openib-> general > > > > To > > unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general _
[openib-general] SDP socket address family
There seems to be a conflict between the currently used SDP socket address
family number (26) and the current Linux kernel: Linux allocates this
address family number (26) to the 'LLC' protocol.

Any ideas whether we should change it from 26, and to what?

Below are some related header-file snippets:

SuSE-9.1 /usr/include/linux/socket.h:
---
#define AF_IRDA 23 /* IRDA sockets */
#define AF_PPPOX 24 /* PPPoX sockets */
#define AF_WANPIPE 25 /* Wanpipe API Sockets */
#define AF_LLC 26 /* Linux LLC */
#define AF_BLUETOOTH 31 /* Bluetooth sockets */
#define AF_MAX 32 /* For now.. */

Voltaire's sdp/sdp-sockets/sdp-sockets.h:
---
# define AF_IBT 26

TopSpin's infiniband/ulp/sdp/sdp_inet.h:
---
/*
 * constants shared between user and kernel space.
 */
#define AF_INET_SDP 26 /* SDP socket protocol family */
#define AF_INET_STR "AF_INET_SDP" /* SDP enabled enviroment variable */

Yaron
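To make the collision concrete, here is a minimal sketch that scans the three quoted header fragments for `AF_*` numbers and reports any value claimed by more than one name. The helper names (`parse_families`, `find_conflicts`, `unused_in_fragment`) are invented for this illustration, and the list of unused numbers reflects only the SuSE-9.1 fragment quoted above, not the full kernel header, so it should not be read as a recommendation of a safe value.

```python
import re
from collections import defaultdict

# Header fragments copied from the message above (assumed to be complete
# only for the range they show).
HEADERS = {
    "linux/socket.h": """
#define AF_IRDA 23 /* IRDA sockets */
#define AF_PPPOX 24 /* PPPoX sockets */
#define AF_WANPIPE 25 /* Wanpipe API Sockets */
#define AF_LLC 26 /* Linux LLC */
#define AF_BLUETOOTH 31 /* Bluetooth sockets */
#define AF_MAX 32 /* For now.. */
""",
    "sdp-sockets.h": "# define AF_IBT 26\n",
    "sdp_inet.h": (
        "#define AF_INET_SDP 26 /* SDP socket protocol family */\n"
        '#define AF_INET_STR "AF_INET_SDP"\n'
    ),
}

# Matches "#define AF_NAME <decimal>"; string-valued defines are skipped.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(AF_\w+)\s+(\d+)\b", re.MULTILINE)


def parse_families(text):
    """Return {name: number} for every numeric AF_* define in text."""
    return {name: int(value) for name, value in DEFINE_RE.findall(text)}


def find_conflicts(headers):
    """Return {number: [(file, name), ...]} for numbers used more than once."""
    owners = defaultdict(list)
    for filename, text in headers.items():
        for name, number in parse_families(text).items():
            if name != "AF_MAX":  # AF_MAX is a limit, not a family
                owners[number].append((filename, name))
    return {n: users for n, users in owners.items() if len(users) > 1}


def unused_in_fragment(text):
    """Numbers between the lowest listed family and AF_MAX with no define."""
    families = parse_families(text)
    limit = families.pop("AF_MAX")
    used = set(families.values())
    return [n for n in range(min(used), limit) if n not in used]


if __name__ == "__main__":
    for number, users in sorted(find_conflicts(HEADERS).items()):
        print(number, users)
    print(unused_in_fragment(HEADERS["linux/socket.h"]))
```

Any replacement number would still have to stay below `AF_MAX` and be coordinated with the kernel maintainers, since a gap in one distribution's header does not guarantee that the number is unallocated upstream.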
RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER
As I mentioned before, I think the best approach in the long run is to
have a well-known Loopback LID (one that will stay as an alias even after
the port has changed its LID, so as not to break apps), just like in any
other stack.

From a short research I did once, I think it is possible to create one
even in the current Mellanox HW by leveraging the multicast support with
small firmware changes; maybe Mellanox can comment on that.

It is also possible to leverage APM (plus the SMI port change events) if
we don't want to deal with the HCAs.

And in any case we want the apps to be able to recover gracefully from any
RC failures (not just LID changes).

Doing manual configuration on each host violates the whole idea of zero
configuration and utility computing that we all advocate.

Yaron

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Friday, October 01, 2004 1:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER

At 03:40 PM 9/30/2004, David M. Brean wrote:

The IBA provides two mechanisms for updating subnet management data:

1) through the verbs - see Modify HCA (section 11.2.1.3)
2) through Subnet management packets (SMPs) - see Subnet Management Class
   (section 14.2)

The IBA only supports updating the LID via SMPs (#2 above), and an entity
using SMPs must have the M_Key. If that entity doesn't have the M_Key,
then it can't reliably change the LID.

In addition, the IBA allows an endnode to request, through the verbs
interface provided for the "node reinitialization" (see 14.4.4) mechanism,
that subnet management state, such as the LID, be preserved when a port
transitions through the DOWN state. However, the SM may not honor that
request, so the endnode must handle that possibility because LID
assignment policy is owned by the SM. Furthermore, this mechanism is used
on ports that have previously been initialized by the SM (maybe that's why
it's called the reinitialization function :)).

Given the mechanisms in the specification, I think that it's possible to
have IB clients use loopback, even under the endnode power-up scenario,
while the port is not in the ACTIVE state, and have them continue without
disruption when the port is made ACTIVE on the subnet by the SM with use
of the reinitialization mechanism.

This is a very useful mechanism for various failover situations. This is a
reasonable approach where the loopback LID being used is updated upon the
port being initialized (akin to solving this in the CI but still allowing
the CM to work with a known LID). It avoids any complexity in the SM
having to preserve a LID that may not be optimal or potentially unique
within the subnet. Not sure this would work, but it seems to me that the
APM mechanism could be used to configure a new configured LID and then
transfer the connection to the configured LID. It may take a bit of work
in the CM, as APM is nominally set up during these exchanges.

There is no current IBA mechanism or protocol for an endnode to set just
the LID, even if it had the M_Key, and have the SM preserve that value.

Agreed.

Mike

-David

Roland Dreier wrote:

I don't see anything in the spec that forbids a CA from having an
arbitrary value in PortInfo:LID after initialization but before the SM
discovery (please correct me if I missed something). I also don't see
anything that forbids an SM implementation from providing a mechanism for
preserving the LIDs it finds or administratively assigning LIDs.

Of course none of this is required but I don't see a problem with allowing
it.
RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Thursday, September 30, 2004 3:13 AM
> To: Michael Krause
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fwd: Re: [openib-general] static LID
> computation with TS_HOST_DRIVER
>
> Michael> The SM is the only entity that is supposed to assign LID
> Michael> as well as the subnet prefix. The SM should not trust
> Michael> any CA / switch configuration if it has not configured it
> Michael> thus should wipe it out and replace it with what it deems
> Michael> best.
>
> I don't see anything in the spec that forbids a CA from having an
> arbitrary value in PortInfo:LID after initialization but before the SM
> discovery (please correct me if I missed something). I also don't see
> anything that forbids an SM implementation from providing a mechanism
> for preserving the LIDs it finds or administratively assigning LIDs.
>

While I agree that other SMs in a recovery/merge phase should try to
preserve the LIDs, I think a CA shouldn't assign its own LID, just as it
is not supposed to change its own P_Key table, because it is not aware of
the policy and/or the bigger picture.

Applications should be designed to deal with LID changes or other RC
connection failures.

But anyway, out of curiosity: how do you generate a unique LID (locally,
on the host) for every node in a large fabric when the ports are down and
the nodes don't talk to each other? (I hope not through Ethernet :))
Or how do you anticipate the LMC value (LID spacing)?

Yaron
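The LMC question can be illustrated with a little arithmetic. In IBA, a port with LID Mask Control value LMC answers to 2^LMC consecutive LIDs starting at its base LID, and the low LMC bits of the base LID are zero. The sketch below, with invented helper names (`lid_range`, `overlapping_ports`) and made-up host names and LIDs, shows how LIDs chosen locally without knowledge of the fabric-wide LMC can end up overlapping.

```python
UNICAST_LID_FIRST = 0x0001
UNICAST_LID_LAST = 0xBFFF


def lid_range(base_lid, lmc):
    """Return the range of LIDs a port owns for a given base LID and LMC."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC is a 3-bit value (0..7)")
    count = 1 << lmc  # a port answers to 2**LMC consecutive LIDs
    if base_lid % count != 0:
        raise ValueError("base LID must have its low LMC bits clear")
    last = base_lid + count - 1
    if base_lid < UNICAST_LID_FIRST or last > UNICAST_LID_LAST:
        raise ValueError("LID range falls outside the unicast range")
    return range(base_lid, base_lid + count)


def overlapping_ports(assignments):
    """Return sorted pairs of port names whose LID ranges overlap.

    assignments maps a port name to a (base_lid, lmc) tuple.
    """
    names = sorted(assignments)
    clashes = []
    for i, first in enumerate(names):
        a = lid_range(*assignments[first])
        for second in names[i + 1:]:
            b = lid_range(*assignments[second])
            if a.start < b.stop and b.start < a.stop:
                clashes.append((first, second))
    return clashes


if __name__ == "__main__":
    # Hypothetical hosts that each picked a LID on their own.
    chosen = {"host-a": (0x10, 2), "host-b": (0x12, 0), "host-c": (0x20, 0)}
    print(list(lid_range(0x10, 2)))
    print(overlapping_ports(chosen))
```

Here host-a assumed LMC 2 and host-b assumed LMC 0, so host-b's single LID lands inside host-a's block, which is the kind of clash an SM with a fabric-wide view would avoid.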
RE: Fwd: Re: [openib-general] static LID computation with TS_HOST_DRIVER
I agree with Dave that a static LID is problematic and we should think of
other short- and longer-term alternatives to it. (There are many cases
where the SM may dictate a non-random LID allocation policy, e.g. LMC
configuration changes, subnet merge, ..., and the HCA is not aware of it.)

I believe that the need for it comes from applications that want to talk
to some kind of loopback adapter without depending on the port state, or
even before the port is up.

A better solution that the IBTA needs to look at is creating a well-known
Loopback LID value that apps use when they want to talk locally (like IP
127...).

It may even be feasible to implement something on the existing HCA HW (by
using one of the unused multicast LIDs and some firmware changes).

Yaron

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, September 29, 2004 5:24 PM
> To: David M. Brean
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fwd: Re: [openib-general] static LID computation
> with TS_HOST_DRIVER
>
> David> Ok. How does the port inform the SM that it has a
> David> "preferred" LID?
>
> The port will already have a LID assigned when the SM discovers it.
> My understanding is that the SM is "encouraged" to preserve a port's
> LID if it doesn't conflict with any other LIDs, and this is what we're
> relying on.
>
> - Roland
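As a rough illustration of the loopback-LID idea, the sketch below models a port that treats one reserved destination LID as "myself" regardless of what LID the SM has assigned. The value `LOOPBACK_LID = 0xFFFE` is purely an assumption for this example (the proposal only says "one of the unused multicast LIDs"; the multicast LID range is 0xC000 to 0xFFFE), and the `Port` class and its methods are hypothetical, not part of any existing stack.

```python
MULTICAST_LID_FIRST = 0xC000
MULTICAST_LID_LAST = 0xFFFE

# Assumed value for illustration only; no such LID is defined by IBA.
LOOPBACK_LID = 0xFFFE


class Port:
    """Toy model of an HCA port whose LID is owned by the SM."""

    def __init__(self):
        self.lid = 0  # 0 means "no LID assigned yet" (port not configured)

    def assign_lid(self, lid):
        """Called when the SM sets or changes PortInfo:LID."""
        self.lid = lid

    def is_local(self, dlid):
        """True if a packet sent to dlid should be looped back locally."""
        if dlid == LOOPBACK_LID:
            return True  # the alias works in every port state
        return self.lid != 0 and dlid == self.lid


if __name__ == "__main__":
    port = Port()
    print(port.is_local(LOOPBACK_LID))  # usable before the SM has run
    port.assign_lid(0x12)
    port.assign_lid(0x34)  # SM reassigns the LID, e.g. after a subnet merge
    print(port.is_local(0x12), port.is_local(LOOPBACK_LID))
```

An application that connected to the alias never sees the change from 0x12 to 0x34, whereas one that cached the port's real LID would have to recover from a broken connection.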
RE: [openib-general] Re: [openib-commits] r894 - gen2/branches/roland-merge/src/linux-kernel/infiniband/ulp/ipoib
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Monday, September 27, 2004 9:18 PM
> To: Tom Duffy
> Cc: [EMAIL PROTECTED]
> Subject: Re: [openib-general] Re: [openib-commits] r894 -
> gen2/branches/roland-merge/src/linux-kernel/infiniband/ulp/ipoib
>
> Tom> Doh. You beat me to the punch. I was working on the same
> Tom> thing (although, I was trying to do it with a kthread).
>
> Sorry dude...
>
> Tom> What do you think is the next step on the TODO that I could
> Tom> start working on? Don't want to step on your toes...
>
> I think I've done all the straightforward work on IPoIB now. We can
> try to figure out how to make it a "native" driver now (ie use the
> full 20 byte HW address instead of hashing down to 6 bytes, etc). I
> had some inconclusive discussions on [EMAIL PROTECTED] about this
> last week but I still don't know how to do it.
>

Exposing a 20-byte HW address to the upper stack may result in unexpected
behavior with various networking tools such as sniffers, a variety of DHCP
servers, and a few other protocols that use the hardware addresses.

I suggest we don't rush to incorporate the 20-byte support and instead
think of more urgent matters. In any case, when we do get to it, we should
allow the user to configure IPoIB to work in the 6-byte mode, to keep
compatibility with those apps/protocols.

Yaron
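For readers unfamiliar with the two address sizes under discussion: the IPoIB link-layer address is 20 bytes (one flags/reserved byte, a 24-bit QPN, and the 16-byte port GID), while most tooling expects a 6-byte Ethernet-style address. The sketch below packs the 20-byte form and folds it to 6 bytes. The folding shown (a SHA-1 digest with the locally-administered bit set) is an assumption made up for this example; the thread does not say how the existing driver hashes the address, and the QPN and GID values are invented.

```python
import hashlib
import ipaddress


def pack_hw_address(qpn, gid, flags=0):
    """Build the 20-byte IPoIB address: flags(1) + QPN(3) + GID(16)."""
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN is a 24-bit value")
    gid_bytes = ipaddress.IPv6Address(gid).packed  # GIDs use IPv6 layout
    return bytes([flags]) + qpn.to_bytes(3, "big") + gid_bytes


def fold_to_6_bytes(hw_address):
    """Hypothetical 20-to-6 byte fold; NOT the driver's actual hash."""
    if len(hw_address) != 20:
        raise ValueError("expected a 20-byte IPoIB hardware address")
    digest = hashlib.sha1(hw_address).digest()
    # Set the locally-administered bit and clear the multicast bit so the
    # result looks like a valid unicast Ethernet address.
    first = (digest[0] | 0x02) & 0xFE
    return bytes([first]) + digest[1:6]


if __name__ == "__main__":
    # Made-up QPN and GID, for illustration only.
    native = pack_hw_address(0x000404, "fe80::2:c901:2345:6789")
    folded = fold_to_6_bytes(native)
    print(len(native), native.hex(":"))
    print(len(folded), folded.hex(":"))
```

Any fold of this kind is lossy, so two ports can in principle map to the same 6-byte value; the native 20-byte form avoids that but is exactly what the tools mentioned above may not expect, which is why a configurable 6-byte mode is being suggested.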