RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
The link local address that we are currently passing down from the rdmacm encodes a MAC address that was obtained through neighbor discovery; so we are safe.

There are RDMAoE applications (some in the embedded space) that do not use the rdmacm. Some of these rely on custom L2 address assignment and would like to completely avoid the use of neighbor discovery. For these, we can clearly state the requirement that the Interface Identifier in the link local address that they pass down should be such that it encodes a valid MAC address that the interface currently responds to.

In the future we also intend to allow the use of (non link local) IP addresses encoded in the GIDs. And we will definitely use neighbor discovery to translate those.

--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 30, 2009 7:34 AM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

> RFC 4291, Appendix A.

Thanks for the pointer. As far as I can tell from reading some IPv6 stuff, it really is broken to try to go from a link-local IPv6 address back to a L2 ethernet address. For example, RFC 2464 (pointed to by RFC 4291) says:

    Ethernet Address
        The 48 bit Ethernet IEEE 802 address, in canonical bit order. This is the address the interface currently responds to, and may be different from the built-in address used to derive the Interface Identifier.

It really seems to be setting ourselves up for trouble not to use neighbor discovery to map IPv6 addresses to link-layer addresses.

 - R.

___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
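To make the requirement above concrete, here is a minimal sketch of how a MAC could be recovered from a link-local GID whose Interface Identifier is in the modified EUI-64 form of RFC 4291, Appendix A. The function name `ll_gid_to_mac` is hypothetical and is not taken from the patch set. The sketch only decodes what the identifier encodes; as Roland points out, that need not be the address the interface currently responds to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the RDMAoE patches): recover a MAC from a
 * link-local address whose Interface Identifier is a modified EUI-64.
 * Returns false if the prefix is not fe80::/64 or the ff:fe filler is absent,
 * in which case the identifier was not derived from a 48-bit MAC this way.
 */
bool ll_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
    static const uint8_t ll_prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

    assert(gid != NULL && mac != NULL);

    if (memcmp(gid, ll_prefix, sizeof(ll_prefix)) != 0)
        return false;
    if (gid[11] != 0xff || gid[12] != 0xfe)
        return false;

    mac[0] = (uint8_t)(gid[8] ^ 0x02); /* undo the inverted universal/local bit */
    mac[1] = gid[9];
    mac[2] = gid[10];
    mac[3] = gid[13];
    mac[4] = gid[14];
    mac[5] = gid[15];
    return true;
}
```

A caller that gets `false` back would have to fall back to some other resolution method, such as neighbor discovery.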
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
> RFC 4291, Appendix A.

Thanks for the pointer. As far as I can tell from reading some IPv6 stuff, it really is broken to try to go from a link-local IPv6 address back to a L2 ethernet address. For example, RFC 2464 (pointed to by RFC 4291) says:

    Ethernet Address
        The 48 bit Ethernet IEEE 802 address, in canonical bit order. This is the address the interface currently responds to, and may be different from the built-in address used to derive the Interface Identifier.

It really seems to be setting ourselves up for trouble not to use neighbor discovery to map IPv6 addresses to link-layer addresses.

 - R.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
RFC 4291, Appendix A.

--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 23, 2009 9:18 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

> RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance.

I guess you're right -- I was confused about when random addresses are used for generating stateless autoconfig addresses, and I guess even with RFC3041 they are not for link-local scope. However, do you know of anything in the IPv6 RFCs that guarantees that link-local IPv6 addresses are generated using ethernet addresses?

 - R.
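For reference, the construction that RFC 4291, Appendix A describes can be sketched as below: split the 48-bit MAC, insert ff:fe in the middle, and invert the universal/local bit. The function name `mac_to_ll_addr` is made up for this illustration. Note that the appendix describes how such an identifier is formed; the sketch does not show that every link-local address on a network was generated this way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build fe80::/64 plus a modified EUI-64 Interface
 * Identifier from a 48-bit Ethernet MAC, following RFC 4291, Appendix A.
 */
void mac_to_ll_addr(const uint8_t mac[6], uint8_t addr[16])
{
    assert(mac != NULL && addr != NULL);

    memset(addr, 0, 16);
    addr[0] = 0xfe;                      /* link-local prefix fe80::/64 */
    addr[1] = 0x80;
    addr[8] = (uint8_t)(mac[0] ^ 0x02);  /* invert the universal/local bit */
    addr[9] = mac[1];
    addr[10] = mac[2];
    addr[11] = 0xff;                     /* ff:fe filler between the two MAC halves */
    addr[12] = 0xfe;
    addr[13] = mac[3];
    addr[14] = mac[4];
    addr[15] = mac[5];
}
```

For example, the MAC 00:02:c9:12:34:56 maps to fe80::202:c9ff:fe12:3456.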
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
All addressing code now resides within a rdmaoe-specific flow in the cma, so the changes do not seem invasive. Is there any specific change that concerns you?

http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf describes the IBTA definition in progress, which is in line with the current driver stack implementation.

--Liran

-----Original Message-----
From: Roland Dreier [mailto:rdre...@cisco.com]
Sent: Monday, November 23, 2009 9:20 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

> In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.

I agree -- nothing prevents experimentation. I am just leery about making invasive changes to the core stack in the absence of any documented design for IBoE (that I've seen at least).

 - R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
On Fri, Nov 20, 2009 at 01:38:59AM +0200, Or Gerlitz wrote:
> yes, this would be simply not supportable, think about that, you want to hand your customers with a code which didn't pass review nor acceptance by the Linux IB stack maintainers (Roland and Sean), say, next a crash happens at this or that module / line, next, what you expect the maintainers to do?

Saying that the patch set did not go through a review process would not inaccurate. Below is a brief log of major changes done on the RDMAoE patches for your reference. A detailed correspondence can be found at the openfabrics general list.

Rev1 - June 15 2009, first patch set sent for review
Rev2 - June 25 2009, Sean - move path resolution to a new module (rdmaoe_sa)
Rev3 - July 13 2009, Sean, Roland - share data structs between multicast.c and rdmaoe_sa.c, distinguish between rdmaoe and ib calls at the cma, increment ABI version
Rev4 - Aug 5 2009, Woody, Sean, Or - ports are differentiated by port protocol rather than port type, move rdmaoe sa functionality to cma
Rev5 - Aug 19 2009, Roland, Sean - don't use broadcast MACs to map multicast GIDs, MAD service disabled for userspace, add rdma_is_transport_supported()
Announce - Sep 17 2009, OFED-RDMAoE branch announced, daily builds available
Rev6 - Nov 16 2009, NIC programming moved from CMA to hw driver so verbs consumers can utilize it
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
As far as core APIs go, the patch set introduces 2 basic additions rather than changes:
- A new ABI function to resolve gids to macs - ib_get_mac()
- A new kernel ib_device function to get the port transport - ib_get_port_transport().

There are no changes to the Verbs API. All the address resolution stuff is contained in the cma code, so I think we could extend its logic in the future without breaking things at the interface level.

Do you have anything specific in mind?

--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:17 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

> How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to?

If there are significant changes to the core APIs -- and IBoE has exactly this impact -- then yes it can be a big change even if the line count is small.

> What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs ?

I am worried that no one has thought through all the issues and corner cases around address resolution, multicast, etc, and that when we do get a standardized version of IBoE, we'll have to break core APIs yet again.

 - R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Would like to fix a typo. I meant below:

Saying that the patch set did not go through a review process would **be** inaccurate.

On Mon, Nov 23, 2009 at 10:11:21AM +0200, Eli Cohen wrote:
> On Fri, Nov 20, 2009 at 01:38:59AM +0200, Or Gerlitz wrote:
> > yes, this would be simply not supportable, think about that, you want to hand your customers with a code which didn't pass review nor acceptance by the Linux IB stack maintainers (Roland and Sean), say, next a crash happens at this or that module / line, next, what you expect the maintainers to do?
>
> Saying that the patch set did not go through a review process would not inaccurate. Below is a brief log of major changes done on the RDMAoE patches for your reference. A detailed correspondence can be found at the openfabrics general list.
>
> Rev1 - June 15 2009, first patch set sent for review
> Rev2 - June 25 2009, Sean - move path resolution to a new module (rdmaoe_sa)
> Rev3 - July 13 2009, Sean, Roland - share data structs between multicast.c and rdmaoe_sa.c, distinguish between rdmaoe and ib calls at the cma, increment ABI version
> Rev4 - Aug 5 2009, Woody, Sean, Or - ports are differentiated by port protocol rather than port type, move rdmaoe sa functionality to cma
> Rev5 - Aug 19 2009, Roland, Sean - don't use broadcast MACs to map multicast GIDs, MAD service disabled for userspace, add rdma_is_transport_supported()
> Announce - Sep 17 2009, OFED-RDMAoE branch announced, daily builds available
> Rev6 - Nov 16 2009, NIC programming moved from CMA to hw driver so verbs consumers can utilize it
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
90% of the changes are either in the mlx4 driver, or self-contained in the rdmaoe flow of the cma, which handles rdmaoe addressing and connection setup. The rest of the changes indeed touch various locations of the stack, but they are either definitions or follow the same logic:

    if (rdma_is_transport(ib_device, RDMA_TRANSPORT_RDMAOE))
            do_something_rdmaoe_specific();

The patches don't change the logic of existing flows at all, so we are not risking *anything* in terms of the stability of the current stack.

As for vlan id and priorities - we are fully aware of the importance of exposing vlan ids and priorities to the user, but thanks for pointing this out. There are deployments today that work fine with the current patches; but in any case, we are planning to send a follow-up patch set that adds vlan+priority support in the near future.

--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Or Gerlitz
Sent: Friday, November 20, 2009 1:39 AM
To: Richard Frank
Cc: Sean Hefty; Roland Dreier; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

Richard Frank richard.fr...@oracle.com wrote:
> How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to?

Rick, the change set is way not self contained but rather touches various parts of the core IB stack (rdma-cm module, ib address resolution module, ib uverbs module and even the mad module) and of course some of the kernel and user space IB hw specific libraries.

> What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs?

yes, this would be simply not supportable, think about that, you want to hand your customers with a code which didn't pass review nor acceptance by the Linux IB stack maintainers (Roland and Sean), say, next a crash happens at this or that module / line, next, what you expect the maintainers to do?

> If it's just about how the implementation is done.. can this be resolved concurrently with getting the bits available for evaluation now..

an rdmaoe branch at the git tree was set and releases are maintained, it's all you need for evaluation, five lines later you're talking on deployments...

> As RoCEE is totally transparent to existing ULPs.. any potential changes would not be visible.. and therefore not an issue for ULP / clients going forward.. right?

this is how you see things, since the IBTA IBXoE annex isn't released, you just don't know what would be the bottom line.

> Oracle would like to see RoCEE get into 1.5

you guys have sent a note to the rds developer community that Oracle recently moved from 1.3.x to 1.4.y, no special work is expected on 1.5.z and that you have lots of plans for 1.6.w ... what's the urgency to get these bits into 1.5?

> We are testing with RoCEE now and plan to deploy it fairly soon.. in very large configuratio

the proposed patch set doesn't let you use non zero VLAN, aren't you expecting Ethernet customers to trivially require that? also you can't use non zero traffic class (priority bits), where all the IBXoE materials are talking about how much working on a lossless traffic class is a must... if indeed this is the case, the patch set is useless without the ability to specify a traffic class, as CEE switches would typically (always?) set only some of the traffic classes to be lossless (e.g. the ones used for FCoE, IBXoE) and the rest to be lossy

Or
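For readers unfamiliar with where the VLAN id and priority bits discussed above live on the wire: both are carried in the 16-bit Tag Control Information field of an IEEE 802.1Q tag, with the priority in the top 3 bits and the VLAN id in the low 12 bits. The sketch below packs the two values; `vlan_tci` is a hypothetical name, and nothing here reflects how the follow-up patch set would expose these fields.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack an 802.1Q Tag Control Information field.
 * Layout: PCP priority (3 bits) | DEI (1 bit, left as 0) | VLAN id (12 bits).
 */
uint16_t vlan_tci(uint8_t priority, uint16_t vlan_id)
{
    assert(priority <= 7);      /* 3-bit priority code point */
    assert(vlan_id <= 0x0fff);  /* 12-bit VLAN identifier */

    return (uint16_t)((priority << 13) | vlan_id);
}
```

With priority 0 and VLAN id 0 the field is all zeros, which corresponds to the only combination Or says the current patches allow.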
Re: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
> facts... the patch set sent from downtown Yoqne'am isn't an addition of feature

turns out that some folks from the Mellanox RD group found this sentence insulting, and I am apologizing for that. Mentioning the geographic location of the developers didn't come to serve why I find the patch set this or that, but rather send the author of the email I was responding on to go and do homework in his own company office.

Or.
RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
In the past few months of review, the responsibility for rdmaoe addressing was moved to the rdmacm. So, any future addressing enhancements can be confined to the rdmacm module without breaking existing APIs.

RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance. Note that for rdmacm apps, the intention is to map the IP addresses that were assigned to the host's interfaces. Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.

Regarding multicast, current switches will flood the traffic just as any other non-IP multicast traffic (e.g., fcoe). Using switches that support multicast pruning for additional ethertypes, you can optimize the traffic and achieve the same link utilization as normal IP multicast. In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.

--Liran

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:35 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

> Having lots of testing exposure can help in validating that all the edge cases are handled..

To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.

> Are there a set of cases that you have in mind ?

For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc

 - R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
FWIW: the dealbreaker for me is that we're already at 1.5rc2. By OFED's own rules, new features are not to be allowed. Or you can reset the release clock and target Jan/Feb.

Mellanox already has their own OFED distribution -- since there appears to be strong desire to get this stuff released ASAP, is there an issue with releasing it through Mellanox OFED? Then later release it through community OFED in the next go-round?

On Nov 23, 2009, at 4:18 AM, Liran Liss wrote:

> In the past few months of review, the responsibility for rdmaoe addressing was moved to the rdmacm. So, any future addressing enhancements can be confined to the rdmacm module without breaking existing APIs.
>
> RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance. Note that for rdmacm apps, the intention is to map the IP addresses that were assigned to the host's interfaces. Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.
>
> Regarding multicast, current switches will flood the traffic just as any other non-IP multicast traffic (e.g., fcoe). Using switches that support multicast pruning for additional ethertypes, you can optimize the traffic and achieve the same link utilization as normal IP multicast. In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.
>
> --Liran
>
> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
> Sent: Thursday, November 19, 2009 9:35 PM
> To: Richard Frank
> Cc: o...@lists.openfabrics.org; OpenFabrics EWG
> Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
>
> > Having lots of testing exposure can help in validating that all the edge cases are handled..
>
> To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.
>
> > Are there a set of cases that you have in mind ?
>
> For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc
>
>  - R.

--
Jeff Squyres
jsquy...@cisco.com
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Jeff Squyres wrote:
> FWIW: the dealbreaker for me is that we're already at 1.5rc2. By OFED's own rules, new features are not to be allowed. Or you can reset the release clock and target Jan/Feb.
>
> Mellanox already has their own OFED distribution -- since there appears to be strong desire to get this stuff released ASAP, is there an issue with releasing it through Mellanox OFED. Then later release it through community OFED in the next go-round?

We will discuss this in the meeting today

Tziporet
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Liran Liss wrote:
> The patches don't change the logic of existing flows at all, so we are not risking *anything* in terms of the stability of the current stack.

I understand that this is your assessment of the situation. Looking at the series present at the ofed1.5 rdmaoe branch in a black box manner shows that many, many files are touched, see below. Coming and saying that changes in your HW LL driver are out of the scope for other companies to discuss is not acceptable, since we provide an enterprise ready stack based on your HW driver.

> As for vlan id and priorities - we are fully aware of the importance of exposing vlan ids and priorities to the user, but thanks for pointing this out.

sure, I have been saying this since my first look at the patches, a couple of months ago, good that someone listens now.

> There are deployments today that work fine with the current patches

all the rdmaoe materials say the lossless traffic class is a must, are you saying that this works well also without it? then why, from an architect's point of view, have you posed this requirement?

Or.
$ cat rdmaoe_patches
core_0300_refine_device_personality.patch
core_0310-Add-RDMAoE-transport-protocol.patch
core_0320_rdmaoe_support_qp1.patch
core_0330_umad-Enable-support-only-for-IB-ports.patch
core_0340_Enable-CM-support-for-RDMAoE.patch
core_0350-CMA-device-binding.patch
core_0360_RDMAoE-UD-packet-packing-support.patch
core_0370-support-RDMAoE-from-userspace.patch
core_0380_mcast-support-to-rdmaoe.patch
core_0390_cma-move-netdev-mac.patch
mlx4_2000_RDMAoE-address-resolution.patch
mlx4_2010_RDMAoE-support-allow-interfaces-to-correspond.patch
mlx4_2020_handle_mcast_mac.patch
mlx4_2030_fix_port_num.patch
mlx4_2040_Fix-multicast-handling.patch
xxx_rdmaoe_port_notice.patch

$ cat rdmaoe_patches | xargs diffstat
 b/drivers/infiniband/core/cm.c | 25 -
 b/drivers/infiniband/core/cma.c | 54 +-
 b/drivers/infiniband/core/mad.c | 41 +-
 b/drivers/infiniband/core/multicast.c | 4
 b/drivers/infiniband/core/sa_query.c | 39 +-
 b/drivers/infiniband/core/ucm.c | 8
 b/drivers/infiniband/core/ucma.c | 2
 b/drivers/infiniband/core/user_mad.c | 6
 b/drivers/infiniband/core/verbs.c | 25 +
 b/drivers/infiniband/hw/mlx4/main.c | 56 ++
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c | 12
 b/include/rdma/ib_addr.h | 93
 b/include/rdma/ib_verbs.h | 11
 b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3
 b/net/sunrpc/xprtrdma/svc_rdma_transport.c | 3
 drivers/infiniband/core/cm.c | 5
 drivers/infiniband/core/cma.c | 254 +++--
 drivers/infiniband/core/ucm.c | 13
 drivers/infiniband/core/ucma.c | 31 +
 drivers/infiniband/core/user_mad.c | 16
 drivers/infiniband/hw/mlx4/ah.c | 22 -
 drivers/infiniband/hw/mlx4/main.c | 110 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 13
 drivers/infiniband/hw/mlx4/qp.c | 29 +
 include/rdma/ib_verbs.h | 4
 ofed_kernel-fixes/drivers/infiniband/core/agent.c | 39 +-
 ofed_kernel-fixes/drivers/infiniband/core/local_sa.c | 22 -
 ofed_kernel-fixes/drivers/infiniband/core/mad.c | 45 +-
 ofed_kernel-fixes/drivers/infiniband/core/notice.c | 4
 ofed_kernel-fixes/drivers/infiniband/core/ud_header.c | 111 +
 ofed_kernel-fixes/drivers/infiniband/core/uverbs.h | 1
 ofed_kernel-fixes/drivers/infiniband/core/uverbs_cmd.c | 32 +
 ofed_kernel-fixes/drivers/infiniband/core/uverbs_main.c | 1
 ofed_kernel-fixes/drivers/infiniband/core/verbs.c | 9
 ofed_kernel-fixes/drivers/infiniband/hw/mlx4/ah.c | 187 +++--
 ofed_kernel-fixes/drivers/infiniband/hw/mlx4/mad.c | 32 +
 ofed_kernel-fixes/drivers/infiniband/hw/mlx4/main.c | 309 ++--
 ofed_kernel-fixes/drivers/infiniband/hw/mlx4/mlx4_ib.h | 19
 ofed_kernel-fixes/drivers/infiniband/hw/mlx4/qp.c | 169 ++--
 ofed_kernel-fixes/drivers/net/mlx4/en_main.c | 15
 ofed_kernel-fixes/drivers/net/mlx4/en_port.c | 4
 ofed_kernel-fixes/drivers/net/mlx4/en_port.h | 3
 ofed_kernel-fixes/drivers/net/mlx4/fw.c | 3
 ofed_kernel-fixes/drivers/net/mlx4/intf.c | 20 +
 ofed_kernel-fixes/drivers/net/mlx4/main.c | 6
 ofed_kernel-fixes/drivers/net/mlx4/mlx4.h | 1
 ofed_kernel-fixes/include/linux/mlx4/cmd.h | 1
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Is this code new? We've been evaluating versions of it since before June/2009. We are currently testing with OFED-RDMAoE-1.5-20091116-0620.tgz.

Our plans are to move from OFED 1.4.2 to OFED 1.5.x in June/2010.. It takes us this long to complete internal testing.

Has anyone else done any evaluation / testing with RDMAoE / RoCEE?

Jeff Squyres wrote:
> FWIW: the dealbreaker for me is that we're already at 1.5rc2. By OFED's own rules, new features are not to be allowed. Or you can reset the release clock and target Jan/Feb.
>
> Mellanox already has their own OFED distribution -- since there appears to be strong desire to get this stuff released ASAP, is there an issue with releasing it through Mellanox OFED. Then later release it through community OFED in the next go-round?
>
> On Nov 23, 2009, at 4:18 AM, Liran Liss wrote:
>
> > In the past few months of review, the responsibility for rdmaoe addressing was moved to the rdmacm. So, any future addressing enhancements can be confined to the rdmacm module without breaking existing APIs.
> >
> > RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance. Note that for rdmacm apps, the intention is to map the IP addresses that were assigned to the host's interfaces. Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.
> >
> > Regarding multicast, current switches will flood the traffic just as any other non-IP multicast traffic (e.g., fcoe). Using switches that support multicast pruning for additional ethertypes, you can optimize the traffic and achieve the same link utilization as normal IP multicast. In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.
> >
> > --Liran
> >
> > -----Original Message-----
> > From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
> > Sent: Thursday, November 19, 2009 9:35 PM
> > To: Richard Frank
> > Cc: o...@lists.openfabrics.org; OpenFabrics EWG
> > Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
> >
> > > Having lots of testing exposure can help in validating that all the edge cases are handled..
> >
> > To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.
> >
> > > Are there a set of cases that you have in mind ?
> >
> > For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc
> >
> >  - R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
We at OSU have done testing of MVAPICH2 1.4 against the OFED-RDMAoE branch mentioned below. Everything works well. In fact, we made a formal release of MVAPICH2 1.4 with RDMAoE support last month.

Thanks,

DK

> Is this code new ? We've been evaluating versions of it since before June/2009. We are currently testing with OFED-RDMAoE-1.5-20091116-0620.tgz.
>
> Our plans are to move from OFED 1.4.2 to OFED 1.5.x in June/2010.. It takes us this long to complete internal testing.
>
> Has anyone else done any evaluation / testing with RDMAoE / RoCEE ?
>
> Jeff Squyres wrote:
> > FWIW: the dealbreaker for me is that we're already at 1.5rc2. By OFED's own rules, new features are not to be allowed. Or you can reset the release clock and target Jan/Feb.
> >
> > Mellanox already has their own OFED distribution -- since there appears to be strong desire to get this stuff released ASAP, is there an issue with releasing it through Mellanox OFED. Then later release it through community OFED in the next go-round?
> >
> > On Nov 23, 2009, at 4:18 AM, Liran Liss wrote:
> >
> > > In the past few months of review, the responsibility for rdmaoe addressing was moved to the rdmacm. So, any future addressing enhancements can be confined to the rdmacm module without breaking existing APIs.
> > >
> > > RFC 3041 deals with static global IP addresses on the Internet, especially for portable devices. rdmaoe allows using link-local GIDs for applications residing on the same subnet, so I don't see the relevance. Note that for rdmacm apps, the intention is to map the IP addresses that were assigned to the host's interfaces. Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.
> > >
> > > Regarding multicast, current switches will flood the traffic just as any other non-IP multicast traffic (e.g., fcoe). Using switches that support multicast pruning for additional ethertypes, you can optimize the traffic and achieve the same link utilization as normal IP multicast. In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.
> > >
> > > --Liran
> > >
> > > -----Original Message-----
> > > From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
> > > Sent: Thursday, November 19, 2009 9:35 PM
> > > To: Richard Frank
> > > Cc: o...@lists.openfabrics.org; OpenFabrics EWG
> > > Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
> > >
> > > > Having lots of testing exposure can help in validating that all the edge cases are handled..
> > >
> > > To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.
> > >
> > > > Are there a set of cases that you have in mind ?
> > >
> > > For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc
> > >
> > >  - R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
In any case, this is not a correctness issue that prohibits experimentation with rdmaoe multicast on any network today.

I agree -- nothing prevents experimentation. I am just leery about making invasive changes to the core stack in the absence of any documented design for IBoE (that I've seen at least).

- R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to?

If there are significant changes to the core APIs -- and IBoE has exactly this impact -- then yes it can be a big change even if the line count is small.

What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs ?

I am worried that no one has thought through all the issues and corner cases around address resolution, multicast, etc, and that when we do get a standardized version of IBoE, we'll have to break core APIs yet again.

- R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
I am worried that no one has thought through all the issues and corner cases around address resolution, multicast, etc, and that when we do get a standardized version of IBoE, we'll have to break core APIs yet again.

Having lots of testing exposure can help in validating that all the edge cases are handled.. Are there a set of cases that you have in mind ?

Roland Dreier wrote:

How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to?

If there are significant changes to the core APIs -- and IBoE has exactly this impact -- then yes it can be a big change even if the line count is small.

What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs ?

I am worried that no one has thought through all the issues and corner cases around address resolution, multicast, etc, and that when we do get a standardized version of IBoE, we'll have to break core APIs yet again.

- R.
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Having lots of testing exposure can help in validating that all the edge cases are handled..

To some extent -- but there also needs to be some thinking involved to make sure that the interface can actually handle future cases.

Are there a set of cases that you have in mind ?

For example -- how is multicast going to interact with IGMP on ethernet switches? How is address resolution going to be done (current patches seem to assume that stateless IPv6 link-local addresses contain the ethernet address, which is not valid if RFC 3041 is used)? etc

- R.
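The address-resolution concern above can be made concrete with a small sketch. It assumes the usual modified EUI-64 mapping (RFC 4291, Appendix A) between a 48-bit MAC and the interface identifier of a link-local address; the function names and sample addresses are hypothetical and are not taken from the rdmaoe patches. It shows that the reverse mapping only works when the identifier really was derived from a MAC, which a randomized (RFC 3041 style) identifier is not.

```python
import ipaddress


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build fe80::/64 + modified EUI-64 interface identifier from a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # invert the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


def link_local_to_mac(addr: str):
    """Try to recover a MAC from a link-local address.

    Returns None when the interface identifier does not look like a
    modified EUI-64 value (for example, a randomized identifier).
    """
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_link_local:
        return None
    iid = ip.packed[8:]
    if iid[3:5] != b"\xff\xfe":
        return None  # not derived from a 48-bit MAC
    octets = bytearray(iid[:3] + iid[5:])
    octets[0] ^= 0x02  # undo the universal/local bit inversion
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    ll = mac_to_link_local("00:02:c9:01:02:03")  # sample MAC, illustration only
    print(ll, "->", link_local_to_mac(str(ll)))
    # A randomized identifier carries no MAC at all:
    print(link_local_to_mac("fe80::1c2d:3e4f:5a6b:7c8d"))
```

Even when the reverse mapping succeeds, the recovered value is only the address the identifier was derived from; it need not be the address the interface currently responds to, which is the case neighbor discovery handles.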
RE: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
Please see some comments below marked as [Sujal] related to the acceptance of motions related to RoCEE at the BOD meeting:

-Original Message-
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Woodruff, Robert J
Sent: Thursday, November 19, 2009 11:31 AM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: [ewg] RE: [ofw] SC'09 BOF - Meeting notes

The arguments against including it are:

1.) We have agreed in the EWG to follow a process where code that is to be included in OFED be first reviewed and accepted, or at least queued for acceptance in a future kernel. So far, since the spec is not yet done, Roland has expressed concerns about the current implementation and how the final spec may require changes to the implementation, and as such, does not want to push something upstream, only to have to make changes later that could impact people that have started to use the early experimental version.

[Sujal] It was disclosed at the BOD meeting that there is no defined process for inclusion of new features in OFED releases, rather it is based on discussions and consensus that happen in EWG meetings. This was the basis for acceptance of the modifications to the motion at BOD and the subsequent voting and acceptance (14 voted in favor, 2 opposed)

2.) We have also discussed in the past that part of the problems with OFED being able to meet its committed release dates are because we have in the past allowed major changes into the release way after feature freeze. We have discussed that this is not the way we should be working. So, since OFED-1.5 is already at RC2, I think it is too late to add such a major change.

3.) Since there is a complete branch version of OFED-1.5 that includes the RoCEE patches, people that want to try this experimental branch can download that tar ball and use it. It is also possible for Mellanox to include the feature in their release to support their current customers.
I would rather see this kept as an experimental branch for a while and allow people to get some air time and testing on it before seeing it go into the main code base. We have to be more and more careful now with the OFED code base as lots of people are using it in production and we have to be very careful not to de-stabilize the code.

[Sujal] Once again there is no defined and accepted process in the EWG about air time etc, and EWG needs to work on implementing the instructions from the BOD as best as it can using current practices - which is discussions and consensus within EWG and respecting the overwhelming number of BOD members who expressed strong interest to have the technology be part of OFED (and WinOF).

my 2 cents on this one, but it is up to the full EWG members to discuss the options and make the final decision.

woody

-Original Message-
From: Richard Frank [mailto:richard.fr...@oracle.com]
Sent: Thursday, November 19, 2009 10:59 AM
To: Woodruff, Robert J
Cc: OpenFabrics EWG; o...@lists.openfabrics.org
Subject: Re: [ofw] SC'09 BOF - Meeting notes

How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to? What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs ?

If it's just about how the implementation is done.. can this be resolved concurrently with getting the bits available for evaluation now.. As RoCEE is totally transparent to existing ULPs.. any potential changes would not be visible.. and therefore not an issue for ULP / clients going forward.. right ?

Oracle would like to see RoCEE get into 1.5. We are testing with RoCEE now and plan to deploy it fairly soon.. in very large configurations... so we'd like to see other folks pick it up and try it out.. ASAP... to allow for time to get fixes into a 1.5.x release.. It would be great if RoCEE were part of 1.5 even if it were listed as evaluation.. for now.
Woodruff, Robert J wrote: Hmmm - the original mail I sent did not seem to show up on the list. Maybe the spam filters caught it because of the attachment. Re-sending without the attachment. If anyone wants a copy of the final slides, let me know and I can send them directly. Below is the notes from the BOF. woody -Original Message- From: Woodruff, Robert J Sent: Thursday, November 19, 2009 10:16 AM To: Woodruff, Robert J; Tziporet Koren; Gilad Shainer; Yiftah Shahar; Betsy Zeller; Smith, Stan; HalRosenstock; Jeff Squyres; DKPanda; pg...@systemfabricsworks.com Cc: h...@us.ibm.com; bb...@systemfabricworks.com; pg...@systemfabricsworks.com; rpear...@sxystemfabricsworks.com; OpenFabrics EWG; o...@lists.openfabrics.org Subject: SC'09 BOF - Meeting notes and Final Slides Here are just a few notes from the OFA BOF at SC'09. Stan also took a few notes and can add any additional comments if I missed anything in these notes. We had some
RE: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
Sujal wrote,

[Sujal] It was disclosed at the BOD meeting that there is no defined process for inclusion of new features in OFED releases, rather it is based on discussions and consensus that happen in EWG meetings. This was the basis for acceptance of the modifications to the motion at BOD and the subsequent voting and acceptance (14 voted in favor, 2 opposed)

Let me clarify what I said at the board meeting when it was asked if there was a documented process for including code in OFED. The answer was no, there is not a document that defines the process. That does not mean that we do not have a process, just that it is not documented. I think that if you ask people in the EWG, they will tell you that we have agreed to submit code upstream before including it in OFED. I think that there have been and are sometimes exceptions to this, but we would like to follow that process in general whenever possible.

[Sujal] Once again there is no defined and accepted process in the EWG about air time etc, and EWG needs to work on implementing the instructions from the BOD as best as it can using current practices - which is discussions and consensus within EWG and respecting the overwhelming number of BOD members who expressed strong interest to have the technology be part of OFED (and WinOF).

We have also discussed in the EWG that it is probably not a good idea to include major new code changes late in the release cycle to components (like the core). You should probably also discuss this with Tziporet and Betsy to get their thoughts on this one.

woody
Re: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
Woodruff, Robert J wrote:

Sujal wrote,

[Sujal] It was disclosed at the BOD meeting that there is no defined process for inclusion of new features in OFED releases, rather it is based on discussions and consensus that happen in EWG meetings. This was the basis for acceptance of the modifications to the motion at BOD and the subsequent voting and acceptance (14 voted in favor, 2 opposed)

Let me clarify what I said at the board meeting when it was asked if there was a documented process for including code in OFED. The answer was no, there is not a document that defines the process. That does not mean that we do not have a process, just that it is not documented. I think that if you ask people in the EWG, they will tell you that we have agreed to submit code upstream before including it in OFED. I think that there have been and are sometimes exceptions to this, but we would like to follow that process in general whenever possible.

[Sujal] Once again there is no defined and accepted process in the EWG about air time etc, and EWG needs to work on implementing the instructions from the BOD as best as it can using current practices - which is discussions and consensus within EWG and respecting the overwhelming number of BOD members who expressed strong interest to have the technology be part of OFED (and WinOF).

We have also discussed in the EWG that it is probably not a good idea to include major new code changes late in the release cycle to components (like the core). You should probably also discuss this with Tziporet and Betsy to get their thoughts on this one.

woody

Woody,

The past couple of times when I've spoken on HP's behalf at Sonoma, in the system vendor (i.e. OFA customer) panel, I've said more or less the same thing:

1. We like the OFED code, a single code base that supports RDMA hardware from many vendors.
2. Sometimes we have to use vendor-supplied code, because we cannot get the latest technology, bug fixes, etc. on the schedule we require.

The value of OFED code is diluted when system vendors must use differentiated OFED code. HP thinks we should work to get the RDMAoE code into 1.5, marked as evaluation if that is EWG's assessment, rather than push it off to 1.6. This is important technology that should not be held back.

Regards
Bob Souza, HP
bob.so...@hp.com
Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes
Richard Frank richard.fr...@oracle.com wrote:

How can 1500 lines out of 240k lines be a big change.. do I have these numbers right - is the big change you are referring to?

Rick, the change set is way not self contained but rather touches various parts of the core IB stack (rdma-cm module, ib address resolution module, ib uverbs module and even the mad module) and of course some of the kernel and user space IB hw specific libraries.

What is the risk area that you are worried about .. do you think it will break current transports or existing ULPs?

yes, this would be simply not supportable. think about that: you want to hand your customers code which didn't pass review nor acceptance by the Linux IB stack maintainers (Roland and Sean). say a crash then happens at this or that module / line; what do you expect the maintainers to do?

If it's just about how the implementation is done.. can this be resolved concurrently with getting the bits available for evaluation now..

an rdmaoe branch in the git tree was set up and releases are maintained; that's all you need for evaluation. five lines later you're talking about deployments...

As RoCEE is totally transparent to existing ULPs.. any potential changes would not be visible.. and therefore not an issue for ULP / clients going forward.. right?

this is how you see things. since the IBTA IBXoE annex isn't released, you just don't know what the bottom line would be.

Oracle would like to see RoCEE get into 1.5

you guys have sent a note to the rds developer community that Oracle recently moved from 1.3.x to 1.4.y, no special work is expected on 1.5.z and that you have lots of plans for 1.6.w ... what's the urgency to get these bits into 1.5?

We are testing with RoCEE now and plan to deploy it fairly soon.. in very large configurations

the proposed patch set doesn't let you use a non zero VLAN, aren't you expecting Ethernet customers to trivially require that?
also you can't use a non zero traffic class (priority bits), where all the IBXoE materials are talking about how much working on a lossless traffic class is a must... if indeed this is the case, the patch set is useless without the ability to specify a traffic class, as CEE switches would typically (always?) set only some of the traffic classes to be lossless (e.g. the ones used for FCoE, IBXoE) and the rest to be lossy

Or
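To illustrate what "non zero VLAN" and "non zero traffic class (priority bits)" mean on the wire, here is a minimal sketch of the IEEE 802.1Q tag layout: a 16-bit TPID of 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible indicator and a 12-bit VLAN ID. The helper names are hypothetical, and the priority value 3 and VLAN 100 are arbitrary illustrations, not values taken from the patch set or from any switch configuration.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(vlan_id: int, priority: int, dei: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: TPID, then PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> dict:
    """Unpack a 4-byte 802.1Q tag into its priority, DEI and VLAN ID."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0xFFF}


if __name__ == "__main__":
    # What a zero VLAN and zero priority restriction amounts to:
    print(build_vlan_tag(0, 0).hex())
    # A tagged frame on VLAN 100 in a (hypothetically lossless) priority 3:
    tag = build_vlan_tag(100, 3)
    print(tag.hex(), parse_vlan_tag(tag))
```

A switch that makes only some priorities lossless decides per frame from these three priority bits, which is why being unable to set them matters for the argument above.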
Re: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
get the RDMAoE code into 1.5, marked as evaluation if that is EWG's assessment rather than push it off to 1.6. This is important technology that should not be held back

It would be great if RoCEE were part of 1.5 even if it were listed as evaluation.. for now.

this is leading edge technology, so saying that it is for early evaluation is appropriate

what is the thing you want to list as evaluation, the whole ofed version or the rdmaoe part of it? in case you refer to the latter, this is impossible, since the rdmaoe patch set touches various areas of the IB stack code and is way not self contained, so the whole IB stack becomes for evaluation. this stack has been developed since June 2004 and maintained in the kernel since early 2005; over the last year we see more and more people (e.g. from the FSI market) looking at and deploying IB, as the technology and code became enterprise ready. Do you really want that after five years, in January 2010, this stack will become experimental? why?

Or.
Re: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
It was disclosed at the BOD meeting that there is no defined process for inclusion of new features in OFED releases

facts... the patch set sent from downtown Yoqne'am isn't an addition of a feature but rather poses changes everywhere in the IB stack, so maybe the BOD should get together again and discuss things based on facts?

Or.
RE: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
Or wrote,

get the RDMAoE code into 1.5, marked as evaluation if that is EWG's assessment rather than push it off to 1.6. This is important technology that should not be held back

It would be great if RoCEE were part of 1.5 even if it were listed as evaluation.. for now.

this is leading edge technology, so saying that it is for early evaluation is appropriate

what is the thing you want to list as evaluation, the whole ofed version or the rdmaoe part of it? in case you refer to the latter, this is impossible, since the rdmaoe patch set touches various areas of the IB stack code and is way not self contained, so the whole IB stack becomes for evaluation. this stack has been developed since June 2004 and maintained in the kernel since early 2005; over the last year we see more and more people (e.g. from the FSI market) looking at and deploying IB, as the technology and code became enterprise ready. Do you really want that after five years, in January 2010, this stack will become experimental? why?

Or.

There are already 2 complete branches of OFED-1.5, one with and one without the RDMAoE code. This includes daily builds and a complete installer for each branch. So what I would suggest is that the OFED-1.5 code http://www.openfabrics.org/~vlad/builds/ofed-1.5/ be used for the production OFED-1.5 release and the ofed-rdmaoe-1.5 code tar balls http://www.openfabrics.org/~vlad/builds/ofed-rdmaoe-1.5/ be considered evaluation or experimental for OFED-1.5, until all of the technical issues that the maintainers have raised with the RDMAoE code are resolved.

Since the RDMAoE code touches so many of the core components and is not just an isolated component that could be marked as experimental within one OFED-1.5 tar ball, I think we need to continue to have the two complete branches, daily builds, and tar balls for each branch, as we have now.
Re: [ewg] RE: [ofw] SC'09 BOF - Meeting notes
Obviously I meant the IBoE functionality, not the entire stack.

Carl

Or Gerlitz wrote:

get the RDMAoE code into 1.5, marked as evaluation if that is EWG's assessment rather than push it off to 1.6. This is important technology that should not be held back

It would be great if RoCEE were part of 1.5 even if it were listed as evaluation.. for now.

this is leading edge technology, so saying that it is for early evaluation is appropriate

what is the thing you want to list as evaluation, the whole ofed version or the rdmaoe part of it? in case you refer to the latter, this is impossible, since the rdmaoe patch set touches various areas of the IB stack code and is way not self contained, so the whole IB stack becomes for evaluation. this stack has been developed since June 2004 and maintained in the kernel since early 2005; over the last year we see more and more people (e.g. from the FSI market) looking at and deploying IB, as the technology and code became enterprise ready. Do you really want that after five years, in January 2010, this stack will become experimental? why?

Or.