On 3/2/23 11:15, Ilya Maximets wrote: > On 3/1/23 23:36, Ilya Maximets wrote: >> It's common to have 'hairpin_snat_ip' to be configured for each and >> every Load Balancer in a setup. For example, ovn-kubernetes does that, >> setting '169.254.169.5 fd69::5' value unconditionally for every LB: >> >> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314 >> >> Current implementation of ovn-controller will create an OpenFlow rule >> for every datapath for each VIP in case of 'hairpin_snat_ip', because >> we need to handle the case where LBs with the same VIP are applied to >> different datapaths and have different SNAT IPs. >> >> In large scale setups with tens of thousands of LBs and hundreds of >> nodes, this generates millions of OpenFlow rules, making ovn-controller >> choke and fall into unrecoverable disconnection loop with poll >> intervals up to 200 seconds in ovn-heater's density-heavy scenario with >> just 120 nodes. It also generates rules with thousands of conjunctions >> that do not fit into a single FLOW_MOD, breaking the OpenFlow >> management communication. >> >> Fix that by partially reverting cited commit for Load Balancers with >> 'hairpin_snat_ip' specified. This will allow us to avoid construction >> of OpenFlow rules that do not fit into a single OpenFlow message. >> In order to fight the number of flows we have to generate we will >> generate flows for local datapaths only. In modern ovn-kubernetes >> setups, in general, there should be only one local datapath. >> >> Optimization for Load Balancers without 'hairpin_snat_ip' specified >> can still be preserved by keeping these wider flows at a lower priority. >> The same way as we do now. That allows to limit the blast radius of >> this change. E.g. OpenStack and older ovn-kubernetes deployments will >> not be affected. >> >> Fixes: 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows") >> Reported-at: https://bugzilla.redhat.com/2171423 >> Suggested-by: Dumitru Ceara <dce...@redhat.com> >> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> >> --- >> >> Version 2: >> >> - This is less of a v2, but an alternative solution. A modified >> version of what Dumitru suggested during review of 1. Difference >> is that it's a partial revert of the cited commit, not a full one. >> i.e. optimization preserved for a case without hairpin_snat_ip. >> >> - Posting now due to pressing time constraints for a release. >> I didn't finish high-scale performance tests yet. I'll post >> results for that tomorrow. Preliminary ovn-heater run with >> 120 nodes density-heavy worked fine without issues. 500 node >> tests are queued. > > 500-node tests finished. And I see almost no performance difference > between cases with and without hairpin_snat_ip and the current main > branch without hairpin_snat_ip. All 3 setups have the same amount of > OF rules in ovn-k-ish setups with ovn-heater. CPU usage is approximately > on the same level as well. >
Nice! I'm also running this through the ovn-kubernetes CI but I expect it to work fine: https://github.com/dceara/ovn/actions/runs/4312305009 Acked-by: Dumitru Ceara <dce...@redhat.com> Thanks a lot for fixing this! Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev