Re: [ewg] [PATCH 0/8 v3] RDMAoE support
Eli Cohen wrote:
> RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
> using Ethernet frames, allowing the deployment of IB semantics on
> lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames
> with an IEEE-assigned Ethertype, a GRH, unmodified IB transport
> headers and payload. Aside from the considerations pointed out below,
> RDMAoE ports are functionally equivalent to regular IB ports from the
> RDMA stack perspective.
>
> IB subnet management and SA services are not required for RDMAoE
> operation; Ethernet management practices are used instead. In
> Ethernet, nodes are commonly referred to by applications by means of
> an IP address. RDMAoE encodes the IP addresses that were assigned to
> the corresponding Ethernet port into its GIDs, and makes use of the IP
> stack to bind a destination address to the corresponding netdevice
> (just as the CMA does today for IB and iWARP) and to obtain its L2 MAC
> addresses.
>
> The RDMA Verbs API is syntactically unmodified. When referring to
> RDMAoE ports, address handles are required to contain GIDs and the L2
> address fields in the API are ignored. The Ethernet L2 information is
> then obtained by the vendor-specific driver (both in kernel- and
> user-space) while modifying QPs to RTR and creating address handles.
>
> In order to maximize transparency for applications, RDMAoE implements
> a dedicated API that provides services equivalent to some of those
> provided by the IB-SA. The current approach is strictly local but may
> evolve in the future. This API is implemented using an independent
> source code file which allows for seamless evolution of the code
> without affecting the IB native SA interfaces. We have successfully
> tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.
>
> To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
> mlx4_ib drivers must be loaded, and the netdevice for the
> corresponding RDMAoE port must be running. Individual ports of a
> multi-port HCA can be independently configured as Ethernet (with
> support for RDMAoE) or IB, as is already the case.
>
> Following is a series of 8 patches based on version 2.6.30 of the
> Linux kernel. This new series reflects changes based on feedback from
> the community on the previous set of patches. The whole series is
> tagged v3.
>
> Signed-off-by: Eli Cohen
>

I agree with Or here. I really do not think that making RDMAoE
transparent to applications is worth pushing a lot of compatibility
code into the kernel. The winner here is definitely rdmaoe_sa - 1000
lines of useless code which boils down to kernel_bind and
kernel_setsockopt. Why do you need all this code to hold state,
refcounts, whatever - if the kernel already does this for you?

If an application uses IB - let it use real IB. If it uses RDMA - let
it use all RDMA implementations out there (IB, iWARP, RDMAoE).
Therefore, I think the correct place to add RDMAoE is under rdma_cm.
If a consumer wants to use RDMAoE - it should use rdma_cm.

Looks like you are trying to add something that is between RDMAoE and
IBoE, and put a lot of hacky bypass logic in core and ULPs.

--Yossi
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

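
The cover letter says RDMAoE encodes the IP addresses assigned to the
Ethernet port into its GIDs, but it does not spell out the bit layout.
The following is only a sketch of one possible encoding, assuming an
IPv4-mapped IPv6 layout (::ffff:a.b.c.d) in the 128-bit GID. The
struct gid type and ipv4_to_gid() helper are hypothetical names used
for illustration; they are not taken from the patch series.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A GID is 128 bits wide, the same size as an IPv6 address. */
struct gid {
	uint8_t raw[16];
};

/*
 * Hypothetical helper: place an IPv4 address into a GID.
 *
 * ASSUMPTION: the IPv4-mapped IPv6 layout is used, i.e. ten zero
 * bytes, two 0xff bytes, then the four address bytes. The layout
 * actually used by the patches is not stated in this thread.
 *
 * Returns 0 on success, -1 if the string is not a valid IPv4 address.
 */
int ipv4_to_gid(const char *dotted, struct gid *gid)
{
	struct in_addr addr;

	assert(gid != NULL);

	if (inet_pton(AF_INET, dotted, &addr) != 1)
		return -1;

	memset(gid->raw, 0, sizeof(gid->raw));
	gid->raw[10] = 0xff;
	gid->raw[11] = 0xff;
	/* s_addr is already in network byte order, so copy it as-is. */
	memcpy(&gid->raw[12], &addr.s_addr, sizeof(addr.s_addr));
	return 0;
}
```

With this layout, a port holding 192.168.1.10 would get a GID whose
last four bytes are 192, 168, 1, 10. A consumer going through rdma_cm,
as suggested in the reply above, would not need to build GIDs by hand
at all.
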
[ewg] [PATCH 0/8 v3] RDMAoE support
RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
using Ethernet frames, allowing the deployment of IB semantics on
lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames
with an IEEE-assigned Ethertype, a GRH, unmodified IB transport
headers and payload. Aside from the considerations pointed out below,
RDMAoE ports are functionally equivalent to regular IB ports from the
RDMA stack perspective.

IB subnet management and SA services are not required for RDMAoE
operation; Ethernet management practices are used instead. In
Ethernet, nodes are commonly referred to by applications by means of
an IP address. RDMAoE encodes the IP addresses that were assigned to
the corresponding Ethernet port into its GIDs, and makes use of the IP
stack to bind a destination address to the corresponding netdevice
(just as the CMA does today for IB and iWARP) and to obtain its L2 MAC
addresses.

The RDMA Verbs API is syntactically unmodified. When referring to
RDMAoE ports, address handles are required to contain GIDs and the L2
address fields in the API are ignored. The Ethernet L2 information is
then obtained by the vendor-specific driver (both in kernel- and
user-space) while modifying QPs to RTR and creating address handles.

In order to maximize transparency for applications, RDMAoE implements
a dedicated API that provides services equivalent to some of those
provided by the IB-SA. The current approach is strictly local but may
evolve in the future. This API is implemented using an independent
source code file which allows for seamless evolution of the code
without affecting the IB native SA interfaces. We have successfully
tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.

To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
mlx4_ib drivers must be loaded, and the netdevice for the
corresponding RDMAoE port must be running. Individual ports of a
multi-port HCA can be independently configured as Ethernet (with
support for RDMAoE) or IB, as is already the case.

Following is a series of 8 patches based on version 2.6.30 of the
Linux kernel. This new series reflects changes based on feedback from
the community on the previous set of patches. The whole series is
tagged v3.

Signed-off-by: Eli Cohen

 drivers/infiniband/core/Makefile          |    2
 drivers/infiniband/core/addr.c            |   20
 drivers/infiniband/core/agent.c           |   12
 drivers/infiniband/core/cma.c             |  124 +++
 drivers/infiniband/core/mad.c             |   48 +
 drivers/infiniband/core/multicast.c       |   43 -
 drivers/infiniband/core/multicast.h       |   79 ++
 drivers/infiniband/core/rdmaoe_sa.c       |  942 ++
 drivers/infiniband/core/sa.h              |   24
 drivers/infiniband/core/sa_query.c        |   26
 drivers/infiniband/core/ud_header.c       |  111 +++
 drivers/infiniband/core/uverbs.h          |    1
 drivers/infiniband/core/uverbs_cmd.c      |   33 +
 drivers/infiniband/core/uverbs_main.c     |    1
 drivers/infiniband/core/verbs.c           |   17
 drivers/infiniband/hw/mlx4/ah.c           |  228 ++-
 drivers/infiniband/hw/mlx4/main.c         |  276 +++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |   30
 drivers/infiniband/hw/mlx4/qp.c           |  253 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    3
 drivers/net/mlx4/cmd.c                    |    6
 drivers/net/mlx4/en_main.c                |   15
 drivers/net/mlx4/en_port.c                |    4
 drivers/net/mlx4/en_port.h                |    3
 drivers/net/mlx4/intf.c                   |   20
 drivers/net/mlx4/main.c                   |    6
 drivers/net/mlx4/mlx4.h                   |    1
 include/linux/mlx4/cmd.h                  |    1
 include/linux/mlx4/device.h               |   31
 include/linux/mlx4/driver.h               |   16
 include/linux/mlx4/qp.h                   |    8
 include/rdma/ib_addr.h                    |   53 +
 include/rdma/ib_pack.h                    |   26
 include/rdma/ib_user_verbs.h              |   21
 include/rdma/ib_verbs.h                   |   22
 include/rdma/rdmaoe_sa.h                  |   66 ++
 36 files changed, 2333 insertions(+), 239 deletions(-)
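
The frame format described above (an Ethernet header, then a GRH,
then the unmodified IB transport headers) can be sketched as packed C
structs. This is an illustrative layout only. The struct and function
names are hypothetical and are not the definitions used by the
patches. The field sizes follow the standard 14-byte Ethernet header,
40-byte GRH and 12-byte BTH. The Ethertype value is left to the caller
because the cover letter does not state it, and VLAN tags and trailers
are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard untagged Ethernet header (14 bytes). */
struct eth_hdr_wire {
	uint8_t  dst_mac[6];
	uint8_t  src_mac[6];
	uint16_t ethertype;            /* IEEE-assigned value, not given here */
} __attribute__((packed));

/* IB Global Route Header (40 bytes), same shape as an IPv6 header. */
struct grh_wire {
	uint32_t version_tclass_flow;  /* 4-bit version, 8-bit tclass, 20-bit flow */
	uint16_t payload_len;
	uint8_t  next_hdr;
	uint8_t  hop_limit;
	uint8_t  sgid[16];             /* source GID */
	uint8_t  dgid[16];             /* destination GID */
} __attribute__((packed));

/* IB Base Transport Header (12 bytes), carried unmodified. */
struct bth_wire {
	uint8_t  opcode;
	uint8_t  flags;                /* SE, M, pad count, transport version */
	uint16_t pkey;
	uint32_t dest_qp;              /* 8 reserved bits + 24-bit QPN */
	uint32_t ack_psn;              /* ack-request bit, reserved, 24-bit PSN */
} __attribute__((packed));

/* Hypothetical view of the headers at the start of an RDMAoE frame. */
struct rdmaoe_frame_hdr {
	struct eth_hdr_wire eth;
	struct grh_wire     grh;
	struct bth_wire     bth;
} __attribute__((packed));

static_assert(sizeof(struct eth_hdr_wire) == 14, "Ethernet header is 14 bytes");
static_assert(sizeof(struct grh_wire) == 40, "GRH is 40 bytes");
static_assert(sizeof(struct bth_wire) == 12, "BTH is 12 bytes");

/* Bytes of header that precede any extended transport headers/payload. */
size_t rdmaoe_header_overhead(void)
{
	return sizeof(struct rdmaoe_frame_hdr);
}
```

Under these assumptions the GRH starts at byte 14 of the frame and the
BTH at byte 54, so the GIDs that carry the encoded IP addresses sit
directly behind the MAC addresses obtained through the IP stack.
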