S.B. Regards, --Liran -----Original Message----- From: Or Gerlitz [mailto:[email protected]] Sent: Tuesday, July 14, 2009 12:43 PM To: Eli Cohen Cc: Sean Hefty; Roland Dreier; Jack Morgenstein; general-list; Liran Liss Subject: Re: [PATCH 0/8 v3] RDMAoE support
Eli Cohen wrote: > RDMAoE allows running the IB transport protocol using Ethernet frames allowing the deployment of IB semantics on lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames with an IEEE assigned Ethertype, a GRH, unmodified IB transport headers and payload. Hi Eli and the team @ Mellanox Before going into more detailed review and comments, I'd like to try and clarify few issues. 1. GRH - is it a must per your design to have it also for unicast packets? or maybe it just simplifies things, or both? I assume you may need it for the CM logic to keep working the way it used to. LL: GRH is required for all packets; it keeps things simple and provides length information. 2. is there any reason not to restrict the design for supporting addr / rdma-cm based consumers? e.g in the same manner that iWARP is? LL: There is no reason to restrict RDMAoE to RDMACM apps: all existing IB applications can work over RDMAoE. 3. CM services - note that once the ULP did the address resolution, the addr / rdma-cm data structure can/has the src/dest MACs, so basically why not push all the changes to the mad code and eimplement the MADs over raw or even datagram socket? LL: we already have a great CM protocol that runs over QP1 and is a perfect fit for RDMAoE; we don't see any reason to change this. 4. regarding your rdmaoe_sa - aside from being non review-able! (since in the same patch 3/8 you move tons of code from one place to another and add changes) it seems to me an overkill. I would like to see a solution which takes advantage of the SA non existence under Ethernet, and hence route resolution becomes no-op as it is with iWARP, as for multicast join, since it is a local operation it should and can be done by the rdma-cm, no need to modify the layers below it, expect for exposing API for the rdma-cm to do so. LL: As I said earlier, we do *not* want to restrict ourselves artificially to RDMACM. The rmdaoa "sa" code does more than just path resolution - it provides SL, MTU, rate, packet lifetime, and other params. The same thing is also true for multicast joins. In terms of the code, it is very simple: - move useful definitions that were previously in multicast.c and sa_query.c to header files. - implement only what is needed for rdmaoe in rdmaoe_sa.c. Perhaps we can make better code reuse (calling the SA's cmp_rec(), for example). We will look into it; thanks! I would just like to note that there is a tradeoff here: keeping rdmaoe code separate for independent development vs. reducing code duplication. Perhaps, as rmdaoe matures, we can aggregate the code to reduce duplication. 5. 8/8 says "Currently, each IB port has a single GID entry in its table and that GID entery equals the link local IPv6 address" - does your design support IPv4 as well, how? LL: Yes, we intend to use standard IPv4-mapped addresses (::0xffff<a.b.c.d>). > To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and mlx4_ib drivers must be loaded, and the netdevice for the corresponding RDMAoE port must be running. 6. it seems that your design is somehow too tightly coupled to the connectX hca and the mlx4 driver, please note that the design should allow for implementing software rdmaoe provider as well LL: on the contrary, there is *nothing* in the architecture that is HW-specific. This is intentional --- SW rdmaoe will be straighforward, efficient (as is SW FCoE), and simple (this is rdma transport after all!) to implement over any nic. We already know of people that are looking into that. > Individual ports of a multi port HCA can be independently configured > as Ethernet (with support for RDMAoE) or IB, as is already the case. > [...] Following is a series of 8 patches based on version 2.6.30 of > the Linux kernel > Does the mainline kernel has all the patches to do so - I wasn't sure if this is the case. can you send the instructions? > This new series reflects changes based on feedback from the community > on the previous set of patches. The whole series is tagged v3 There's not a single mentioning of a change vs the previous versions, how do you expect someone not to treat it as v1?! LL: thanks for the remark, we will send a list of changes in our following patches. For some reason you have chosen to use cross-posting, I don't think these patches need that. Or. _______________________________________________ general mailing list [email protected] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
