Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace
If we have a dedicated ABI call for this mapping, then it seems reasonable to make it device independent.

However, this mapping is really only used when creating address handles. So we can base the mapping on the (device-specific) create_ah() flow, but provide generic mapping functions for all devices to use (this is roughly what happens now). Also, using create_ah() doesn't introduce an ABI call that is specific to ib-->eth mappings.

This is similar to how device-specific ib_reg_user_mr() functions call the generic ib_umem_get()...

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, May 13, 2010 10:18 PM
To: Sean Hefty
Cc: 'Eli Cohen'; Eli Cohen; Linux RDMA list; ewg
Subject: Re: [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

 > Basically, what I want to understand is why does this change make sense?
 >
 > @@ -1139,6 +1139,10 @@ struct ib_device {
 >                               struct ib_grh *in_grh,
 >                               struct ib_mad *in_mad,
 >                               struct ib_mad *out_mad);
 > +   int   (*get_eth_l2_addr)(struct ib_device *device, u8 port,
 > +                            union ib_gid *dgid, int sgid_idx,
 > +                            u8 *mac, u16 *vlan_id, u8 *tagged);
 > +

Yes, that was pretty much my original question. Why do we have a verb for userspace to call a device-specific method to do the mapping?

The layering seems wrong somewhere if we have a generic verb to do this mapping, but then put the mapping in device-specific code.

 - R.
--
Roland Dreier || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] OFED-1.5.1 failure over iWarp
The rdma_dev_addr refers to an L2 netdevice, so it makes perfect sense that the hw addresses stored in src/dst_dev_addr are MACs for both iWARP and RoCEE (as is already the case).

Note that with this approach, dev_type is no longer sufficient to determine the ibdev type. Following the "spirit" of the current code, it is probably cma_acquire_dev()'s job to fill in the missing ibdev type information after matching the netdev to an ibdev.

As for the match process, we could encode the MAC in one of a RoCEE port's gids, but this entry would be a dummy, i.e., it would only serve for this matching process. In contrast to iWARP, RoCEE gids really *are* gids, and serve as the port's *network* addresses.

In the current implementation, the link-local GID is a fully-qualified L3 address, which borrows from IPv6's automatic configuration scheme; it is always present and usable. So, the current suggestion of using the link-local gid for device matching has the advantage that the GID table contains only usable L3 gids - no dummies.

I don't know which of these alternatives is "cleaner".
--Liran

P.S. - I really wish that we had a cleaner way to match an ibdev to a netdev without overloading the gid table entries. Basically, it should be the job of the entity that created the netdev to make this association, and stuff a pointer in the netdev. Another option is to register a list of "L2 HW addresses" with an ibdev's port (i.e., in a different structure than the gid table), so the lookup would be straightforward.

-----Original Message-----
From: Or Gerlitz [mailto:ogerl...@voltaire.com]
Sent: Thursday, February 04, 2010 10:29 AM
To: Sean Hefty; Steve Wise; Liran Liss
Cc: 'Eli Cohen'; OpenFabrics EWG
Subject: Re: [ewg] OFED-1.5.1 failure over iWarp

Sean Hefty wrote:
> If I look at what's there today, we're trying to find some way to
> match the net_device src_dev_addr with some sort of address associated with
> an ib_device.
> In the case of actual IB, the net_device src_dev_addr contains the
> SGID, which provides the mapping.
> Steve, can you please clarify the iWarp case for me? For iWarp,
> doesn't the src_dev_addr contain the MAC? So, the 'GID's reported for
> an iWarp device is really just the MAC. Is this correct?
> If this is the case, then couldn't rocee (I hate that name) report its
> MAC as one of its GIDs? This would ensure that the mapping between
> net_device and ib_device was correct.

Sean, AFAIK, reporting the MAC as one of the GIDs was part of the IBoE (feel free not to use names which you don't like) design presented a couple of times, wasn't it, Eli, Liran?

Or.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
The link-local address that we are currently passing down from the rdmacm encodes a MAC address that was obtained through neighbor discovery; so we are safe.

There are RDMAoE applications (some in the embedded space) that do not use the rdmacm. Some of these rely on custom L2 address assignment and would like to completely avoid the use of neighbor discovery. For these, we can clearly state the requirement that the "Interface Identifier" in the link-local address that they pass down should be such that it encodes a valid MAC address that the interface currently responds to.

In the future we also intend to allow the use of (non-link-local) IP addresses encoded in the GIDs. And we will definitely use neighbor discovery to translate those.
--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 30, 2009 7:34 AM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

 > RFC 4291, Appendix A.

Thanks for the pointer. As far as I can tell from reading some IPv6 stuff, it really is broken to try to go from a link-local IPv6 address back to a L2 ethernet address. For example, RFC 2464 (pointed to by RFC 4291) says:

   Ethernet Address
      The 48 bit Ethernet IEEE 802 address, in canonical bit order.
      This is the address the interface currently responds to, and may
      be different from the built-in address used to derive the
      Interface Identifier.

It really seems to be setting ourselves up for trouble not to use neighbor discovery to map IPv6 addresses to link-layer addresses.

 - R.
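To make the stated requirement concrete, here is a minimal sketch of how an application that bypasses the rdmacm might build a link-local GID whose Interface Identifier encodes the MAC, using the modified EUI-64 rule of RFC 4291, Appendix A. The function name mac_to_ll_gid() is hypothetical and is not part of the patch set; a GID is treated here as a plain 16-byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patches): build a 16-byte link-local
 * GID (fe80::/64 prefix) whose interface identifier is the modified
 * EUI-64 form of the given 48-bit MAC (RFC 4291, Appendix A).
 */
void mac_to_ll_gid(const uint8_t mac[6], uint8_t gid[16])
{
    assert(mac != NULL && gid != NULL);

    memset(gid, 0, 16);
    gid[0] = 0xfe;              /* link-local prefix fe80::/64 */
    gid[1] = 0x80;

    gid[8]  = mac[0] ^ 0x02;    /* invert the universal/local bit */
    gid[9]  = mac[1];
    gid[10] = mac[2];
    gid[11] = 0xff;             /* ff:fe is inserted in the middle */
    gid[12] = 0xfe;
    gid[13] = mac[3];
    gid[14] = mac[4];
    gid[15] = mac[5];
}
```

The sketch only covers the encoding; it cannot guarantee that the MAC is one the interface currently responds to, which is exactly the concern raised in the RFC 2464 excerpt quoted above.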
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
All addressing code now resides within a rdmaoe-specific flow in the cma, so the changes do not seem invasive. Is there any specific change that concerns you?

http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf describes the IBTA definition in progress, which is in line with the current driver stack implementation.
--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 23, 2009 9:20 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

 > In any case, this is not a correctness issue that prohibits
 > experimentation with rdmaoe multicast on any network today.

I agree -- nothing prevents experimentation. I am just leery about making invasive changes to the core stack in the absence of any documented design for IBoE (that I've seen at least).

 - R.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
RFC 4291, Appendix A.
--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 23, 2009 9:18 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

 > RFC 3041 deals with static global IP addresses on the Internet,
 > especially for portable devices.
 > rdmaoe allows using link-local GIDs for applications residing on the
 > same subnet, so I don't see the relevance.

I guess you're right -- I was confused about when random addresses are used for generating stateless autoconfig addresses, and I guess even with RFC3041 they are not for link-local scope.

However, do you know of anything in the IPv6 RFCs that guarantees that link-local IPv6 addresses are generated using ethernet addresses?

 - R.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
See below.
--Liran

> I understand that this is your assessment of the situation, looking on the series present at the ofed1.5 rdmaoe branch in a black box manner yields that many many files are touched, see below. Coming and saying that changes in your HW LL driver are out of the scope for other companies to discuss is not acceptable, since we provide enterprise ready stack based on your HW driver.

LL: Any comments on our low-level driver are more than welcome. That being said, we have been running extensive testing on this code base for several months now and see no stability issues.

> all the rdmaoe materials saying the lossless traffic class is a must, are you saying that this works well also without it? then why from architect point of view you have posed this requirement?

LL: lossless traffic can be achieved today using global pause, for example. PFC is still important; we will submit initial patches that support it next week.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
In the past few months of review, the responsibility for rdmaoe addressing was moved to the rdmacm. So, any future addressing enhancements can be confined to the rdmacm module without breaking existing APIs.

RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance. Note that for rdmacm apps, the intention is to map the IP addresses that were assigned to the host's interfaces. Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.

Regarding multicast, current switches will flood the traffic just as any other non-IP multicast traffic (e.g., fcoe). Using switches that support multicast pruning for additional ethertypes, you can optimize the traffic and achieve the same link utilization as normal IP multicast. In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.
--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:35 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

 > Having lots of testing exposure can help in validating that all the
 > edge cases are handled..

To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.

 > Are there a set of cases that you have in mind ?

For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc

 - R.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
90% of the changes are either in the mlx4 driver, or self-contained in the rdmaoe flow of the cma, which handles rdmaoe addressing and connection setup.

The rest of the changes indeed touch various locations of the stack, but they are either definitions or follow the same logic:

    if (rdma_is_transport(ib_device, RDMA_TRANSPORT_RDMAOE))
        do_something_rdmaoe_specific();

The patches don't change the logic of existing flows at all, so we are not risking *anything* in terms of the stability of the current stack.

As for vlan id and priorities - we are fully aware of the importance of exposing vlan ids and priorities to the user, but thanks for pointing this out. There are deployments today that work fine with the current patches; but in any case, we are planning to send a follow-up patch set that adds vlan+priority support in the near future.
--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Or Gerlitz
Sent: Friday, November 20, 2009 1:39 AM
To: Richard Frank
Cc: Sean Hefty; Roland Dreier; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

Richard Frank wrote:
> How can 1500 lines out of 240k lines be a big change.. do I have these
> numbers right
> - is the big change you are referring too?

Rick, the change set is way not self contained but rather touches various parts of the core IB stack (rdma-cm module, ib address resolution module, ib uverbs module and even the mad module) and of course some of the kernel and user space IB hw specific libraries.

> What is the risk area that you are worried about .. do you think it
> will break current transports or existing ULPs?

yes, this would be simply not supportable, think about that, you want to hand your customers with a code which didn't pass review nor acceptance by the Linux IB stack maintainers (Roland and Sean), say, next a crash happens at this or that module / line, next, what you expect the maintainers to do?

> If it's just about how the implementation is done.. can this be
> resolved concurrently with getting the bits available for evaluation now..

an rdmaoe branch at the git tree was set and releases are maintained, it's all you need for evaluation, five lines later you're talking on deployments...

> As RoCEE is totally transparent to existing ULPs.. any potential
> changes would not be visible.. and therefore not an issue for ULP / clients going forward.. right?

this is how you see things, since the IBTA IBXoE annex isn't released, you just don't know what would be the bottom line.

> Oracle would like to see RoCEE get into 1.5

you guys have sent a note to the rds developer community that Oracle recently moved from 1.3.x to 1.4.y, no special work is expected on 1.5.z and that you have lots of plans for 1.6.w ... what's the urgency to get these bits into 1.5?

> We are testing with RoCEE now and plan to deploy it fairly soon.. in
> very large configuratio

the proposed patch set doesn't let you use non zero VLAN, aren't you expecting Ethernet customers to trivially require that? also you can't use non zero traffic class (priority bits), where all the IBXoE materials are talking about how much working on a lossless traffic class is a must... if indeed this is the case, the patch set is useless without the ability to specify a traffic class, as CEE switches would typically (always?) set only some of the traffic classes to be lossless (e.g. the ones used for FCoE, IBXoE) and the rest to be lossy

Or
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
As far as core APIs go, the patch set introduces 2 basic additions rather than changes:
- A new ABI function to resolve gids to macs - ib_get_mac()
- A new kernel ib_device function to get the port transport - ib_get_port_transport().

There are no changes to the Verbs API.

All the address resolution stuff is contained in the cma code, so I think we could extend its logic in the future without breaking things at the interface level. Do you have anything specific in mind?
--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:17 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

 > How can 1500 lines out of 240k lines be a big change.. do I have these
 > numbers right - is the
 > big change you are referring too?

If there are significant changes to the core APIs -- and IBoE has exactly this impact -- then yes it can be a big change even if the line count is small.

 > What is the risk area that you are worried about .. do you think it
 > will break current
 > transports or existing ULPs ?

I am worried that no one has thought through all the issues and corner cases around address resolution, multicast, etc, and that when we do get a standardized version of IBoE, we'll have to break core APIs yet again.

 - R.
RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE support onlyQP1
Hi Robert,

Your suggestion to represent RDMAoE as a transport indeed makes the code simpler. Thus, we will have:

    switch (port_transport) {
    case RDMA_TRANSPORT_IB:
        ...
        break;
    case RDMA_TRANSPORT_RDMAOE:
        ...
        break;
    case RDMA_TRANSPORT_IWARP:
        ...
        break;
    }

instead of:

    switch (port_transport) {
    case RDMA_TRANSPORT_IB:
        if (port_type == IB) {
            ...
        } else {
            ...
        }
        break;
    case RDMA_TRANSPORT_IWARP:
        ...
        break;
    }

which is cleaner. In addition, for places in which IB and RDMAOE behave the same, we will have:

    case RDMA_TRANSPORT_IB:
    case RDMA_TRANSPORT_RDMAOE:
        ...
        break;

which will make this fact explicit.

The only difference is that the switch() will operate on port transport rather than node transport. (We can add a wrapper so that, if the ib_dev didn't register a port-transport function, it defaults to the node transport.)

Thanks!
--Liran

-----Original Message-----
From: Liran Liss
Sent: Tuesday, July 14, 2009 11:53 AM
To: 'Woodruff, Robert J'; Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE support onlyQP1

S.B.
--Liran

> Trying to emulate IB for mad services is a total hack and not how this new transport should be added into the core. It should be its own transport type, just like iWarp was added.
> You should start with adding a new transport type to ib_verbs.h, e.g.,

LL: it is not a hack: RDMAoE will probably use mad services at least for connection management, and additional ones in the future.

--- ib_verbs.h 2009-07-13 09:06:10.0 -0400
+++ ib_verbs_new.h 2009-07-14 03:00:23.0 -0400
@@ -64,12 +64,14 @@ enum rdma_node_type {
 	RDMA_NODE_IB_CA = 1,
 	RDMA_NODE_IB_SWITCH,
 	RDMA_NODE_IB_ROUTER,
-	RDMA_NODE_RNIC
+	RDMA_NODE_RNIC,
+	RDMA_NODE_IBXOE
 };

LL: a multi-port HCA can have both IB and Ethernet ports, so this is not a per-node thing.

 enum rdma_transport_type {
 	RDMA_TRANSPORT_IB,
-	RDMA_TRANSPORT_IWARP
+	RDMA_TRANSPORT_IWARP,
+	RDMA_TRANSPORT_IBXOE
 };

LL: thanks, we will look into this. I am not sure that "transport" is the right terminology, since we are using the IB transport layer.

 enum rdma_transport_type
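The wrapper mentioned in the reply above (fall back to the node transport when a device does not register a port-transport function) could look roughly like the following. This is only a sketch under stated assumptions: the struct, the callback name and every `sketch_` identifier are made up for illustration and are not the kernel's ib_device or the functions from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative enum; the thread uses both RDMAOE and IBXOE as names. */
enum sketch_transport {
    SKETCH_TRANSPORT_IB,
    SKETCH_TRANSPORT_IWARP,
    SKETCH_TRANSPORT_RDMAOE
};

/* Hypothetical stand-in for an ib_device with an optional per-port hook. */
struct sketch_ib_device {
    enum sketch_transport node_transport;
    enum sketch_transport (*get_port_transport)(struct sketch_ib_device *dev,
                                                uint8_t port);
};

/* Wrapper: use the per-port hook if registered, else the node transport. */
enum sketch_transport sketch_port_transport(struct sketch_ib_device *dev,
                                            uint8_t port)
{
    assert(dev != NULL);
    if (dev->get_port_transport)
        return dev->get_port_transport(dev, port);
    return dev->node_transport;
}

/* Example hook for a mixed HCA: port 1 is IB, any other port is RDMAoE. */
enum sketch_transport sketch_mixed_ports(struct sketch_ib_device *dev,
                                         uint8_t port)
{
    (void)dev;
    return port == 1 ? SKETCH_TRANSPORT_IB : SKETCH_TRANSPORT_RDMAOE;
}

/* Shared IB/RDMAoE behaviour made explicit via case fall-through. */
int sketch_uses_ib_cm(struct sketch_ib_device *dev, uint8_t port)
{
    switch (sketch_port_transport(dev, port)) {
    case SKETCH_TRANSPORT_IB:
    case SKETCH_TRANSPORT_RDMAOE:
        return 1;
    case SKETCH_TRANSPORT_IWARP:
        return 0;
    }
    return 0;
}
```

With this shape, existing drivers that never set the hook keep their node-level behaviour, while a multi-port device with both IB and Ethernet ports can answer per port.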
RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only
Oops, I meant "exits" instead of "exists"...

-----Original Message-----
From: Liran Liss
Sent: Tuesday, July 14, 2009 11:16 AM
To: 'Woodruff, Robert J'; Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only

This is exactly the same as for iWARP: IPoIB checks the node transport, and if it is != IB, it exists. For RDMAoE, we do the same check but at the port level.

-----Original Message-----
From: general-boun...@lists.openfabrics.org [mailto:general-boun...@lists.openfabrics.org] On Behalf Of Woodruff, Robert J
Sent: Tuesday, July 14, 2009 12:04 AM
To: Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only

Eli Cohen wrote,
>We don't want IPoIB to work over RDMAoE since it will give worse
>performance than working directly on Ethernet interfaces which are a
>prerequisite to RDMAoE anyway.

This is another reason why NOT to try to add IBxOE under the IB transport, but rather add it as its own transport type. We should not need to hack all the InfiniBand ULPs to now have to know the difference between real IB and IBxOE.
RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE support onlyQP1
S.B.
--Liran

> Trying to emulate IB for mad services is a total hack and not how this new transport should be added into the core. It should be its own transport type, just like iWarp was added.
> You should start with adding a new transport type to ib_verbs.h, e.g.,

LL: it is not a hack: RDMAoE will probably use mad services at least for connection management, and additional ones in the future.

--- ib_verbs.h 2009-07-13 09:06:10.0 -0400
+++ ib_verbs_new.h 2009-07-14 03:00:23.0 -0400
@@ -64,12 +64,14 @@ enum rdma_node_type {
 	RDMA_NODE_IB_CA = 1,
 	RDMA_NODE_IB_SWITCH,
 	RDMA_NODE_IB_ROUTER,
-	RDMA_NODE_RNIC
+	RDMA_NODE_RNIC,
+	RDMA_NODE_IBXOE
 };

LL: a multi-port HCA can have both IB and Ethernet ports, so this is not a per-node thing.

 enum rdma_transport_type {
 	RDMA_TRANSPORT_IB,
-	RDMA_TRANSPORT_IWARP
+	RDMA_TRANSPORT_IWARP,
+	RDMA_TRANSPORT_IBXOE
 };

LL: thanks, we will look into this. I am not sure that "transport" is the right terminology, since we are using the IB transport layer.

 enum rdma_transport_type
RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only
This is exactly the same as for iWARP: IPoIB checks the node transport, and if it is != IB, it exists. For RDMAoE, we do the same check but at the port level.

-----Original Message-----
From: general-boun...@lists.openfabrics.org [mailto:general-boun...@lists.openfabrics.org] On Behalf Of Woodruff, Robert J
Sent: Tuesday, July 14, 2009 12:04 AM
To: Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only

Eli Cohen wrote,
>We don't want IPoIB to work over RDMAoE since it will give worse
>performance than working directly on Ethernet interfaces which are a
>prerequisite to RDMAoE anyway.

This is another reason why NOT to try to add IBxOE under the IB transport, but rather add it as its own transport type. We should not need to hack all the InfiniBand ULPs to now have to know the difference between real IB and IBxOE.
[ewg] RE: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet
>Let's just say that at this point I completely disagree with where these patches try to abstract the differences, which are many.
>RDMA apps that want to use this and IB without going through an abstraction will need different code -- just like they would for iWarp, which also provides RDMA over Ethernet, and is a standard. IB mad and SA query modules are not
>appropriate places for abstracting the differences between IB, iWarp, and whatever name we give this.
>This could change depending on whether this is really trying to be IB with a different L2, or is just another RDMA protocol that runs on Ethernet.
>- Sean

Sean,

These are indeed real concerns; I know that the cma is the natural place for abstracting transport differences, but I am worried about non-cma InfiniBand ULPs which can work just as well with RDMAoE (perhaps we can specifically expose RDMAoE "path queries" as a simple library function).

We will rethink our approach to SA queries and post new patches shortly. Note that without SA query emulation, the RDMAoE patches really amount to just a few cosmetic changes to ib_core...:)

Thanks for the feedback.
--Liran
[ewg] RE: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet
S.B.
--Liran

>RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
>over Ethernet, providing IB capabilities for Ethernet fabrics. The
>packets are standard Ethernet frames with an Ethertype, an IB GRH,
>unmodified IB transport headers and payload. HCA RDMAoE ports are no
>different than regular IB ports from the RDMA stack perspective.

I would refer to this as IBoE, not RDMAoE. The RDMA stack should see these ports different than regular IB HCA ports. There are a lot of differences that should not simply be hidden or incorrectly assumed: QP0, QoS, multiple paths, routing(?), no SA, etc.

LL: the RDMA stack will see that the port has different link types. SLs map cleanly to VLAN user priorities.

>IB subnet management and SA services are not required for RDMAoE
>operation;

Then I would not try to emulate it at all. As Hal mentioned in a separate post, there are too many ways to interact with the SA that an emulation won't cover.

LL: you need to emulate *enough* so that typical applications don't need to worry about the link type. SA path queries are the best example. Otherwise, every RDMA application (not necessarily a CMA app) will need to have different code paths depending on the link type.

>Ethernet management practices are used instead. In Ethernet, nodes are
>commonly referred to by applications by means of an IP address. RDMAoE
>treats IP addresses that were assigned to the corresponding Ethernet
>port as GIDs, and makes use of the IP stack to bind a destination
>address to the corresponding netdevice (just as the CMA does today for
>IB and iWARP) and to obtain its L2 MAC addresses.

Is the actual L3 address an IP address, or just an encoded IP address in an IBoE L3 address? What L3 protocol is being used and will it interoperate with some peer L3 protocol (IP or IB)?

LL: RDMAoE uses GIDs that encode IP addresses. For IPv6, this is straightforward. We use mapped address for IPv4 (::0x). Currently, RDMAoE is not routable, as the IB routing specs are not complete. However, nothing prohibits making it so in the future (either Eth to Eth or Eth to IB).

- Sean
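As an illustration of the remark that SLs map cleanly to VLAN user priorities, the sketch below packs an IB service level and a VLAN id into an 802.1Q tag control field (3-bit priority, 1-bit DEI, 12-bit VLAN id). The thread does not spell out the mapping, so using the low three bits of the 4-bit SL as the priority is an assumption, and sketch_sl_to_vlan_tci() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build an 802.1Q TCI from an IB service level and
 * a VLAN id. ASSUMPTION: the priority is taken from the low three bits
 * of the SL; the DEI bit is left clear.
 */
uint16_t sketch_sl_to_vlan_tci(uint8_t sl, uint16_t vlan_id)
{
    assert(sl <= 15);                        /* SL is a 4-bit field */

    uint16_t pcp = (uint16_t)(sl & 0x7);     /* 3-bit user priority */
    return (uint16_t)((pcp << 13) | (vlan_id & 0x0fff));
}
```

Under this assumption two SLs that differ only in the top bit share one Ethernet priority, so a deployment that needs all sixteen SLs to stay distinct would need a different mapping.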
[ewg] RE: [ofa-general] [PATCH 3/9] ib_core: RDMAoE support only QP1
> Which modules will use QP1 and for what purpose? I see sa_query/multicast, but there's not an actual SA. I'm guessing that the ib_cm works without changes.

Currently, QP1 will be used only for the CM, which indeed doesn't require any changes for RDMAoE. However, we can gradually extend the support for additional QP1 services in the future.

> To clarify, do all IBoE packets carry a GRH?

Yes.
[ewg] RE: [ofa-general] [PATCH 2/9] ib_core: kernel API for GID -->MAC translations
> Why not just use IP to MAC calls? Or use the MAC as the GUID?

We do use standard OS services to map the IP addresses (that were encoded in the GID) to MACs.

GIDs encode IP addresses rather than MACs to enable users to use the node names that they are used to. Specifically, we will feed all IP addresses that were assigned to the Ethernet interface into the corresponding port GID table. This will also enable routing in the future.

The only exception is IPv6 link-local addresses, which already encode the MAC. In this case, a simple algorithmic operation extracts the MAC without requiring ARP, etc.

> Do the GIDs follow the IB GID format?

Yes.