Re: [ewg] [PATCH 0/8 v3] RDMAoE support

2009-07-17 Thread Yossi Etigin
Eli Cohen wrote:
> RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
> over Ethernet frames, enabling the deployment of IB semantics on
> lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames
> with an IEEE assigned Ethertype, a GRH, unmodified IB transport
> headers and payload. Aside from the considerations pointed out below,
> RDMAoE ports are functionally equivalent to regular IB ports from the
> RDMA stack perspective.
> 
> IB subnet management and SA services are not required for RDMAoE
> operation; Ethernet management practices are used instead. In
> Ethernet, nodes are commonly referred to by applications by means of
> an IP address. RDMAoE encodes the IP addresses that were assigned to
> the corresponding Ethernet port into its GIDs, and makes use of the IP
> stack to bind a destination address to the corresponding netdevice
> (just as the CMA does today for IB and iWARP) and to obtain its L2 MAC
> addresses.
> 
> The RDMA Verbs API is syntactically unmodified. When referring to
> RDMAoE ports, address handles are required to contain GIDs and the L2
> address fields in the API are ignored. The Ethernet L2 information is
> then obtained by the vendor-specific driver (both in kernel- and
> user-space) while modifying QPs to RTR and creating address handles.
> 
> In order to maximize transparency for applications, RDMAoE implements
> a dedicated API that provides services equivalent to some of those
> provided by the IB-SA. The current approach is strictly local but may
> evolve in the future. This API is implemented using an independent
> source code file which allows for seamless evolution of the code
> without affecting the IB native SA interfaces. We have successfully
> tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.
> 
> To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
> mlx4_ib drivers must be loaded, and the netdevice for the
> corresponding RDMAoE port must be running. Individual ports of a
> multi-port HCA can be independently configured as Ethernet (with support for
> RDMAoE) or IB, as is already the case.
> 
> Following is a series of 8 patches based on version 2.6.30 of the
> Linux kernel. This new series reflects changes based on feedback from
> the community on the previous set of patches. The whole series is
> tagged v3.
> 
> Signed-off-by: Eli Cohen 
> 

I agree with Or here: I really do not think that making RDMAoE transparent
to applications is worth pushing a lot of compatibility code into the kernel.
The winner here is definitely rdmaoe_sa: 1000 lines of useless code that boil
down to kernel_bind and kernel_setsockopt. Why do you need all this code to
hold state, refcounts, and so on, if the kernel already does this for you?

If an application uses IB, let it use real IB. If it uses RDMA, let it use
all RDMA implementations out there (IB, iWARP, RDMAoE).

Therefore, I think the correct place to add RDMAoE is under rdma_cm.
If a consumer wants to use RDMAoE, it should use rdma_cm. It looks like you are
trying to add something that sits between RDMAoE and IBoE, and putting a lot of
hacky bypass logic in the core and the ULPs.

--Yossi

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [PATCH 0/8 v3] RDMAoE support

2009-07-13 Thread Eli Cohen
RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
over Ethernet frames, enabling the deployment of IB semantics on
lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames
with an IEEE assigned Ethertype, a GRH, unmodified IB transport
headers and payload. Aside from the considerations pointed out below,
RDMAoE ports are functionally equivalent to regular IB ports from the
RDMA stack perspective.
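
To make the frame layout described above concrete, here is a rough sketch
in plain C. The struct and field names are hypothetical and do not come from
the patches; only the header sizes (14-byte Ethernet header, 40-byte GRH,
12-byte BTH) are taken from the Ethernet and IB wire formats. The Ethertype
value is deliberately left out, since it is not given in this posting.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative wire layouts only; all names here are hypothetical. */
struct rdmaoe_eth_hdr {
	uint8_t  dst_mac[6];
	uint8_t  src_mac[6];
	uint16_t ethertype;		/* IEEE-assigned value, not specified here */
} __attribute__((packed));

struct rdmaoe_grh {
	uint32_t version_tclass_flow;	/* IPVer(4) | TClass(8) | FlowLabel(20) */
	uint16_t payload_len;
	uint8_t  next_hdr;
	uint8_t  hop_limit;
	uint8_t  sgid[16];		/* source GID */
	uint8_t  dgid[16];		/* destination GID */
} __attribute__((packed));

struct rdmaoe_bth {
	uint8_t  opcode;
	uint8_t  flags;			/* SE | M | PadCnt | TVer */
	uint16_t pkey;
	uint32_t dest_qp;		/* reserved(8) | DestQP(24) */
	uint32_t ack_psn;		/* A(1) | reserved(7) | PSN(24) */
} __attribute__((packed));

/* Ethernet header, then GRH, then the unmodified IB base transport header. */
struct rdmaoe_frame_hdrs {
	struct rdmaoe_eth_hdr eth;
	struct rdmaoe_grh     grh;
	struct rdmaoe_bth     bth;
} __attribute__((packed));

_Static_assert(sizeof(struct rdmaoe_eth_hdr) == 14, "Ethernet header is 14 bytes");
_Static_assert(sizeof(struct rdmaoe_grh) == 40, "GRH is 40 bytes");
_Static_assert(sizeof(struct rdmaoe_bth) == 12, "BTH is 12 bytes");

/*
 * Minimum header length before any extended transport headers or payload.
 * Opcode-specific headers (RETH, AETH, ...) would follow the BTH.
 */
size_t rdmaoe_min_hdr_len(void)
{
	return sizeof(struct rdmaoe_frame_hdrs);
}
```

The point of the sketch is only that everything after the Ethernet header is
what an IB port would put on the wire for a packet carrying a GRH.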

IB subnet management and SA services are not required for RDMAoE
operation; Ethernet management practices are used instead. In
Ethernet, nodes are commonly referred to by applications by means of
an IP address. RDMAoE encodes the IP addresses that were assigned to
the corresponding Ethernet port into its GIDs, and makes use of the IP
stack to bind a destination address to the corresponding netdevice
(just as the CMA does today for IB and iWARP) and to obtain its L2 MAC
addresses.
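
As an illustration of what encoding an IP address into a GID could look
like, here is a minimal sketch in plain C. It assumes an IPv4-mapped IPv6
style layout (::ffff:a.b.c.d) for the 16-byte GID; that layout is an
assumption made for the example, and the encoding actually used by the
patches may differ. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID container for this example. */
struct example_gid {
	uint8_t raw[16];
};

/*
 * Store an IPv4 address (network byte order, 4 bytes) in a GID using
 * the IPv4-mapped layout: 80 zero bits, 16 one bits, then the address.
 * This layout is an assumption, not taken from the patch series.
 */
void example_ipv4_to_gid(const uint8_t ip[4], struct example_gid *gid)
{
	assert(gid != NULL);
	memset(gid->raw, 0, sizeof(gid->raw));
	gid->raw[10] = 0xff;
	gid->raw[11] = 0xff;
	memcpy(&gid->raw[12], ip, 4);
}

/*
 * Recover the IPv4 address from a GID built by example_ipv4_to_gid().
 * Returns 0 on success, -1 if the GID does not have the mapped prefix.
 */
int example_gid_to_ipv4(const struct example_gid *gid, uint8_t ip[4])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	assert(gid != NULL);
	if (memcmp(gid->raw, prefix, sizeof(prefix)) != 0)
		return -1;
	memcpy(ip, &gid->raw[12], 4);
	return 0;
}
```

With a reversible mapping of this kind, the destination IP address can be
recovered from the GID alone, and the L2 MAC address can then be looked up
through the IP stack as described above.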

The RDMA Verbs API is syntactically unmodified. When referring to
RDMAoE ports, address handles are required to contain GIDs and the L2
address fields in the API are ignored. The Ethernet L2 information is
then obtained by the vendor-specific driver (both in kernel- and
user-space) while modifying QPs to RTR and creating address handles.

In order to maximize transparency for applications, RDMAoE implements
a dedicated API that provides services equivalent to some of those
provided by the IB-SA. The current approach is strictly local but may
evolve in the future. This API is implemented using an independent
source code file which allows for seamless evolution of the code
without affecting the IB native SA interfaces. We have successfully
tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.

To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
mlx4_ib drivers must be loaded, and the netdevice for the
corresponding RDMAoE port must be running. Individual ports of a
multi-port HCA can be independently configured as Ethernet (with support for
RDMAoE) or IB, as is already the case.

Following is a series of 8 patches based on version 2.6.30 of the
Linux kernel. This new series reflects changes based on feedback from
the community on the previous set of patches. The whole series is
tagged v3.

Signed-off-by: Eli Cohen 


 drivers/infiniband/core/Makefile          |    2
 drivers/infiniband/core/addr.c            |   20
 drivers/infiniband/core/agent.c           |   12
 drivers/infiniband/core/cma.c             |  124 +++
 drivers/infiniband/core/mad.c             |   48 +
 drivers/infiniband/core/multicast.c       |   43 -
 drivers/infiniband/core/multicast.h       |   79 ++
 drivers/infiniband/core/rdmaoe_sa.c       |  942 ++
 drivers/infiniband/core/sa.h              |   24
 drivers/infiniband/core/sa_query.c        |   26
 drivers/infiniband/core/ud_header.c       |  111 +++
 drivers/infiniband/core/uverbs.h          |    1
 drivers/infiniband/core/uverbs_cmd.c      |   33 +
 drivers/infiniband/core/uverbs_main.c     |    1
 drivers/infiniband/core/verbs.c           |   17
 drivers/infiniband/hw/mlx4/ah.c           |  228 ++-
 drivers/infiniband/hw/mlx4/main.c         |  276 +++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |   30
 drivers/infiniband/hw/mlx4/qp.c           |  253 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    3
 drivers/net/mlx4/cmd.c                    |    6
 drivers/net/mlx4/en_main.c                |   15
 drivers/net/mlx4/en_port.c                |    4
 drivers/net/mlx4/en_port.h                |    3
 drivers/net/mlx4/intf.c                   |   20
 drivers/net/mlx4/main.c                   |    6
 drivers/net/mlx4/mlx4.h                   |    1
 include/linux/mlx4/cmd.h                  |    1
 include/linux/mlx4/device.h               |   31
 include/linux/mlx4/driver.h               |   16
 include/linux/mlx4/qp.h                   |    8
 include/rdma/ib_addr.h                    |   53 +
 include/rdma/ib_pack.h                    |   26
 include/rdma/ib_user_verbs.h              |   21
 include/rdma/ib_verbs.h                   |   22
 include/rdma/rdmaoe_sa.h                  |   66 ++
 36 files changed, 2333 insertions(+), 239 deletions(-)