Hello OVN community,

This is a follow-up to the message I sent today [1]. This second part
focuses on some ideas I have to remove the limitations that were
mentioned in the previous email.

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-September/052695.html

If you didn't read it, my goal is to start a discussion about how we
could improve OVN on the following topics:

- Reduce the memory and CPU footprint of ovn-controller and ovn-northd.
- Support scaling of L2 connectivity across larger clusters.
- Simplify CMS interoperability.
- Allow support for alternative datapath implementations.

Disclaimer:

This message does not mention anything about L3/L4 features of OVN.
I didn't have time to work on these yet. I hope we can discuss how
these fit with my ideas.

Distributed MAC learning
========================

Use one OVS bridge per logical switch with MAC learning enabled. Only
create the bridge if the logical switch has a port bound to the local
chassis.

Pros:

- Minimal OpenFlow rules required in each bridge (ACLs and NAT mostly).
- No central MAC binding table required.
- MAC table aging comes for free.
- No southbound DB access for learned addresses or for aging.

Cons:

- How to manage seamless upgrades?
- Requires ovn-controller to move/plug ports in the correct bridge.
- Multiple OpenFlow connections (one per managed bridge).
- Requires ovn-trace to be reimplemented differently (maybe other tools
  as well).

Use multicast for overlay networks
==================================

Use a unique 24-bit VNI per overlay network. Derive a multicast group
address from that VNI. Use VXLAN address learning [2] to remove the need
for ovn-controller to know the destination chassis for every MAC address
in advance.

[2] https://datatracker.ietf.org/doc/html/rfc7348#section-4.2

Pros:

- Nodes do not need to know about others in advance. The control plane
  load is distributed across the cluster.
- 24-bit VNI allows for more than 16 million logical switches. No need
  for extended GENEVE tunnel options.
- Limited and scoped "flooding" with IGMP/MLD snooping enabled in
  top-of-rack switches. Multicast is only used for BUM traffic.
- Only one VXLAN output port per implemented logical switch on a given
  chassis.

Cons:

- OVS does not support VXLAN address learning yet.
- The number of usable multicast groups in a fabric network may be
  limited?
- How to manage seamless upgrades and interoperability with older OVN
  versions?

Connect ovn-controller to the northbound DB
===========================================

This idea builds on a previous proposal to move logical flow creation
into ovn-controller [3].

[3] https://patchwork.ozlabs.org/project/ovn/patch/20210625233130.3347463-1-num...@ovn.org/

If the first two proposals are implemented, the southbound database can
be removed from the picture. ovn-controller can directly translate the
northbound schema into OVS configuration: bridges, ports and flow rules.

For other components that require access to the southbound DB (e.g.
neutron metadata agent), ovn-controller should provide an interface to
expose state and configuration data for local consumption.

All state information present in the NB DB should be moved to a separate
state database [4] for CMS consumption.

[4] https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403675.html

For those who like visuals, I have started working on basic use cases
and how they would be implemented without a southbound database [5].

[5] https://link.excalidraw.com/p/readonly/jwZgJlPe4zhGF8lE5yY3

Pros:

- The northbound DB is smaller by design: reduced network bandwidth and
  memory usage in all chassis.
- If we keep the northbound read-only for ovn-controller, it removes
  scaling issues when one controller updates one row that needs to be
  replicated everywhere.
- The northbound schema knows nothing about flows. We could introduce
  alternative dataplane backends configured by ovn-controller via
  plugins. I have done a minimal PoC to check if it could work with the
  Linux network stack [6].

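To make the plugin idea a bit more concrete, here is a rough Python
sketch of what a flow-agnostic backend interface could look like. It is
only an illustration under my own assumptions: the DataplaneBackend
class, its method names and the reconcile() helper are hypothetical and
are not taken from OVN or from the PoC. It assumes that local ports are
identified the way ovn-controller does today, by matching the iface-id
of local interfaces against logical port names, and it creates a bridge
only for logical switches that have at least one local port.

```python
from abc import ABC, abstractmethod


class DataplaneBackend(ABC):
    """Hypothetical plugin interface (not an existing OVN API)."""

    @abstractmethod
    def ensure_bridge(self, name):
        """Create the bridge for a logical switch if it is missing."""

    @abstractmethod
    def plug_port(self, bridge, port):
        """Attach a local port to the bridge of its logical switch."""

    @abstractmethod
    def remove_bridge(self, name):
        """Delete a bridge that no longer has any local port."""


class InMemoryBackend(DataplaneBackend):
    """Toy backend that only records the desired state in a dict."""

    def __init__(self):
        self.bridges = {}  # bridge name -> set of plugged port names

    def ensure_bridge(self, name):
        self.bridges.setdefault(name, set())

    def plug_port(self, bridge, port):
        self.bridges[bridge].add(port)

    def remove_bridge(self, name):
        self.bridges.pop(name, None)


def reconcile(backend, logical_switches, local_ifaces):
    """Drive a backend from northbound-like data only.

    logical_switches: dict of switch name -> list of logical port names.
    local_ifaces: set of logical port names present on this chassis.
    Unplugging ports that left the chassis is not handled here.
    """
    for switch, ports in logical_switches.items():
        local = sorted(set(ports) & set(local_ifaces))
        if not local:
            # No local port: this chassis does not need the bridge.
            backend.remove_bridge(switch)
            continue
        backend.ensure_bridge(switch)
        for port in local:
            backend.plug_port(switch, port)


if __name__ == "__main__":
    nb = {"ls-red": ["vm1", "vm2"], "ls-blue": ["vm3"]}
    backend = InMemoryBackend()
    reconcile(backend, nb, local_ifaces={"vm1"})
    print(backend.bridges)  # {'ls-red': {'vm1'}}
```

A real backend (OVS, Linux bridge, ...) would replace InMemoryBackend
and would also have to deal with ACLs, NAT and port removal, which this
sketch leaves out on purpose.
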
[6] https://github.com/rjarry/ovn-nb-agent/blob/main/backend/linux/bridge.go

Cons:

- This would be a serious API breakage for systems that depend on the
  southbound DB.
- Can all OVN constructs be implemented without a southbound DB?
- Is the community interested in alternative datapaths?

Closing thoughts
================

I mainly focused on OpenStack use cases for now, but I think these
proposals could benefit Kubernetes as well.

I hope I didn't bore everyone to death. Let me know what you think.

Cheers!

-- 
Robin Jarry
Red Hat, Telco/NFV