Hello, Ales,

This is a fork of the thread to go back to discuss some of the items
raised in the most recent instance of the OVN A/V Community meeting [6].
On Fri, Jun 28, 2024 at 11:03 AM Ales Musil <amu...@redhat.com> wrote:
> On Tue, Jun 25, 2024 at 6:52 PM Frode Nordahl <fnord...@ubuntu.com> wrote:
>>
>> Hello,
>>
>> We are increasingly seeing requests for integration between OVN
>> powered CMSs/workloads and the fabric.
>>
>> As a side note, this is a very interesting topic to me personally,
>> and I think there are opportunities in the long term for this class
>> of software to fill a void for more automated and SDN-like ways of
>> managing the physical network, as previously closed physical switch
>> hardware is increasingly opening up to programmatic extension and
>> control.
>>
>> While very exciting, it will take a while, both in terms of evolving
>> how networking teams are organized and in terms of the longevity of
>> networking gear making entity-wide refresh cycles very long, not to
>> mention gathering agreement and momentum to build such a thing from
>> the pieces we have.
>>
>> So to be pragmatic, we need to integrate with something that fabric
>> network engineers are comfortable with and that is already available
>> on most networking hardware, be it closed or open, today.
>>
>> The most ubiquitous routing protocol, which has prevailed in modern
>> layer 3 only data center designs [0], is BGP.
>>
>> Use cases:
>> * Allow the fabric to locate and direct traffic to reroutable
>>   resources such as IPv4/IPv6 prefixes, Floating IPs (FIPs) and
>>   Load Balancer VIPs.
>> * Use the fabric as a load balancer, announcing the same service IP
>>   on multiple hosts (anycast).
>> * Aggregate announcements from stacked CMSes (e.g. Kubernetes
>>   running on top of OpenStack).
>>
>> Requirements:
>> * The data path must be hardware offloaded, i.e. the next hop
>>   address the peer resolves for announcements of OVN resources
>>   needs to be an LRP IP.
>> * Minimize configuration overhead through the use of IPv6 LLAs for
>>   peering, routing both IPv4 and IPv6 prefixes over an IPv6 BGP
>>   session [1] (aka "BGP Unnumbered").
>> * Support ECMP out of the host, i.e. use L3 interfaces potentially
>>   connecting to two different ToRs, instead of bonds, avoiding the
>>   additional complexity of multi-chassis bonds.
>> * Support BGP authentication [2][3], i.e. the source and destination
>>   addresses and ports in packet headers cannot be changed.
>> * Compatibility:
>>   * Running a BGP protocol suite on the host is becoming a thing in
>>     its own right, and our users may have requirements of their own
>>     that influence their choice of implementation. We need to take
>>     this into account and choose integration methods that allow OVN
>>     to work with multiple protocol suite implementations.
>>   * While we have the power to change and fix issues in popular
>>     routing protocol suites, such as FRR, we need to be able to
>>     integrate with versions that exist on networking hardware out
>>     there today.
>>
>> Limitations that influence/dictate implementation choices:
>> * Peering with IPv6 LLAs to meet the configuration overhead
>>   requirement makes the peering relationship point to point.
>> * Popular BGP implementations, such as FRR, which is used as the
>>   routing protocol suite by many open source ToR NOSes, do not
>>   accept sending/receiving an IPv6 LLA next hop with the route, so
>>   the BGP peer address will be used as the next hop. (There are even
>>   mentions of third-party next hops currently not being supported,
>>   but I am not sure if that is accurate [4].)
>> * As mentioned above, BGP authentication requires IP headers to be
>>   unchanged for the BGP TCP packets going to/from the BGP speaker.
>>
>> Proposed implementation:
>>
>> We are in the process of preparing some RFC/PoC patches that at a
>> high level will:
>> * Manage a VRF in the system serving two purposes:
>>   * Leaking of route information from ovn-controller to the VRF
>>     routing table, which a routing protocol suite can redistribute
>>     subject to configuration.
>>   * Provide an IP endpoint that a VRF-aware application, such as
>>     FRR, can bind to, serving as a BGP speaker on behalf of an OVN
>>     LRP IP.
>> * Attach an OVN VIF to this VRF with data path rules that:
>>   * Forward required traffic destined to the OVN LRP IP to the VRF.
>>   * Forward required traffic from the application bound to the VRF
>>     as if it originated from the OVN LRP IP.
>>
>> Hopefully we'll have something up on the list before the end of this
>> week, which makes it real and easier to reason about for further
>> discussion.
>>
>> Prior art:
>>
>> We recognize that there already exists a third-party approach to
>> this in the ovn-bgp-agent [5] governed by OpenStack, and our goal
>> with this work is to provide a tighter integration that might cater
>> generically for other CMSes and use cases.
>>
>> 0: https://datatracker.ietf.org/doc/html/rfc7938
>> 1: https://datatracker.ietf.org/doc/html/rfc5549
>> 2: https://datatracker.ietf.org/doc/html/rfc2385
>> 3: https://datatracker.ietf.org/doc/html/rfc5925
>> 4: https://github.com/FRRouting/frr/blob/cc3519f3e6eaa06f762e0d447202df32df66e129/bgpd/bgp_route.c#L2719
>> 5: https://docs.openstack.org/ovn-bgp-agent/latest/

6: https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416209.html

> Hi Frode,
>
> looking forward to the RFC.

As we agreed, the current set of patches that we have
[7][8][9][10][11][12][13] will not be considered for the 24.09
release, as we would like to make it more feature complete and target
the 25.03 release instead. In that context I guess they serve as the
RFC patches.
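To make the quoted VRF-based approach a bit more concrete, a minimal
FRR configuration along the following lines could redistribute the
leaked kernel routes over BGP unnumbered sessions. This is only a
sketch: the VRF name (ovnvrf), ASN (64512) and interface names
(eth1/eth2) are made up, and the exact statements depend on the FRR
version in use:

```
router bgp 64512 vrf ovnvrf
 ! BGP unnumbered: peer via IPv6 LLA on point-to-point L3 interfaces
 neighbor eth1 interface remote-as external
 neighbor eth2 interface remote-as external
 address-family ipv4 unicast
  ! announce routes leaked into the VRF table by ovn-controller
  redistribute kernel
 exit-address-family
 address-family ipv6 unicast
  neighbor eth1 activate
  neighbor eth2 activate
  redistribute kernel
 exit-address-family
```

Peering on two interfaces toward two different ToRs would also give
the ECMP-out-of-the-host behavior from the requirements list, without
multi-chassis bonds.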
7: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416038.html
8: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416039.html
9: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416040.html
10: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416042.html
11: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416041.html
12: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416043.html
13: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416044.html

In addition to the above there is the LRP BGP redirect patch from
Martin [14], which could be useful independently.

14: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416095.html

Discussion points from the meeting:

1) OVN and Netlink code

In the meeting you raised some concerns about introducing Netlink code
into the OVN repository. I agree with you 100% that the part of [10]
that vendors code from OVS (the contents of
route-exchange-netlink-private.h) should instead be a patch for OVS.
However, the parts of [10] that provide higher-layer helper functions,
consuming OVS library code, do not naturally fit in OVS, as OVS itself
has no use for them.

As a quick reminder, we are looking at Netlink because it provides a
simple and established API for the exchange of this type of
information, one that is already supported by all routing protocol
suites out there. It is not tied to any particular data path type; we
could theoretically even use it as IPC between two userspace
processes, removing the kernel from the picture, given support on the
routing protocol suite side. (There has been some discussion of this
for BIRD:
https://bird.network.cz/pipermail/bird-users/2021-September/015707.html)

Would it be possible to reach some compromise to include only the
parts that consume OVS library code (route-exchange-netlink.{c|h})?
While a plugin-based approach was also suggested, and we have prior
examples of successfully using that, it does not come without
substantial cost.
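As a rough illustration of why Netlink is attractive here, the
route-insertion side amounts to packing a small, well-documented
message. The sketch below is not taken from the patches; it only
builds an RTM_NEWROUTE request with the Python standard library
(constants copied from linux/netlink.h and linux/rtnetlink.h), and the
prefix and table number are made up:

```python
# Hedged sketch (not OVN code): build an RTM_NEWROUTE Netlink message
# for inserting an IPv4 route into a specific (e.g. VRF) routing table.
import socket
import struct

RTM_NEWROUTE = 24
NLM_F_REQUEST = 0x01
NLM_F_REPLACE = 0x100
NLM_F_CREATE = 0x400
RTA_DST = 1
RTA_TABLE = 15
RT_SCOPE_UNIVERSE = 0
RTN_UNICAST = 1
RTPROT_STATIC = 4

def rtattr(rta_type: int, payload: bytes) -> bytes:
    """Encode one routing attribute, padded to a 4-byte boundary."""
    length = 4 + len(payload)
    pad = (4 - length % 4) % 4
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * pad

def build_newroute(prefix: str, prefixlen: int, table: int, seq: int = 1) -> bytes:
    """Build an RTM_NEWROUTE request for an IPv4 prefix in a given table."""
    # struct rtmsg: family, dst_len, src_len, tos, table,
    #               protocol, scope, type, flags
    rtmsg = struct.pack(
        "=BBBBBBBBI",
        socket.AF_INET, prefixlen, 0, 0, 0,
        RTPROT_STATIC, RT_SCOPE_UNIVERSE, RTN_UNICAST, 0,
    )
    attrs = rtattr(RTA_DST, socket.inet_aton(prefix))
    attrs += rtattr(RTA_TABLE, struct.pack("=I", table))
    payload = rtmsg + attrs
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack(
        "=IHHII",
        16 + len(payload), RTM_NEWROUTE,
        NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE,
        seq, 0,
    )
    return header + payload

msg = build_newroute("192.0.2.0", 24, table=1000)
# In a real program this would be sent over
# socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
```

The same wire format is what a routing protocol suite listens for, so
the exchange works whether the messages go through the kernel or, in
theory, directly between two userspace processes.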
So I want to explore what options there are to host this inside the
main repository.

2) OVS OpenFlow extensions

One of the counter proposals you brought up was to add OVS OpenFlow
extensions to allow OVN to instruct OVS to insert routes into a system
routing table. While I see this could be a clear separation of
concerns between OVN and OVS, with OpenFlow being OVN's native
integration language, I struggle a bit with the general usefulness of
such an extension. Our use case for inserting routes into a system
routing table is purely the exchange of control plane information with
some external system, such as a routing protocol suite, and we have no
interest in using it for actual data path control. This is in contrast
to how OVN uses OpenFlow generally, which as far as I understand is to
control the data path.

[ snip ]

> Another important part that we should keep in mind if possible, is
> the EVPN use case: to be able to configure VXLAN tunnels based on
> the info that we will receive.
>
> I'm not sure how far/deep in the actual design you are, but maybe
> the following might be helpful in some way. What I had in mind was a
> sort of plugin that would expose the info of bound entities that are
> interesting in terms of BGP (it could be configurable), so mainly
> FIPs, LBs, GW router IPs. For the import part (which would be
> applicable only for the GW LR) we would create entries in the SB DB
> similarly to what we currently do with Multicast_Group (BGP_Routes?
> EVPN_Tunnels?). Northd could consume those values and configure
> logical flows and encaps as needed.

While the EVPN part is not a priority for us at this point in time, we
will of course be interested in making sure that the work we put into
stage 1 (ovn-controller redistributing FIPs, LBs and GW router IPs)
and stage 2 (ovn-controller learning routes) will be consumable by a
stage 3.

--
Frode Nordahl

> Let me know if that makes sense.
>
> Thanks,
> Ales
>
> --
> Ales Musil
> Senior Software Engineer - OVN Core
> Red Hat EMEA
> amu...@redhat.com

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev