On 3/2/23 11:15, Ilya Maximets wrote:
> On 3/1/23 23:36, Ilya Maximets wrote:
>> It's common to have 'hairpin_snat_ip' to be configured for each and
>> every Load Balancer in a setup.  For example, ovn-kubernetes does that,
>> setting '169.254.169.5 fd69::5' value unconditionally for every LB:
>>   
>> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314
>>
>> Current implementation of ovn-controller will create an OpenFlow rule
>> for every datapath for each VIP in case of 'hairpin_snat_ip', because
>> we need to handle the case where LBs with the same VIP are applied to
>> different datapaths and have different SNAT IPs.
>>
>> In large scale setups with tens of thousands of LBs and hundreds of
>> nodes, this generates millions of OpenFlow rules, making ovn-controller
>> choke and fall into unrecoverable disconnection loop with poll
>> intervals up to 200 seconds in ovn-heater's density-heavy scenario with
>> just 120 nodes.  It also generates rules with thousands of conjunctions
>> that do not fit into a single FLOW_MOD, breaking the OpenFlow
>> management communication.
>>
>> Fix that by partially reverting cited commit for Load Balancers with
>> 'hairpin_snat_ip' specified.  This will allow us to avoid construction
>> of OpenFlow rules that do not fit into a single OpenFlow message.
>> In order to fight the number of flows we have to generate we will
>> generate flows for local datapaths only.  In modern ovn-kubernetes
>> setups, in general, there should be only one local datapath.
>>
>> Optimization for Load Balancers without 'hairpin_snat_ip' specified
>> can still be preserved by keeping these wider flows at a lower priority.
>> The same way as we do now.  That allows to limit the blast radius of
>> this change.  E.g. OpenStack and older ovn-kubernetes deployments will
>> not be affected.
>>
>> Fixes: 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")
>> Reported-at: https://bugzilla.redhat.com/2171423
>> Suggested-by: Dumitru Ceara <dce...@redhat.com>
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>
>> Version 2:
>>
>>   - This is less of a v2, but an alternative solution.  A modified
>>     version of what Dumitru suggested during review of 1.  Difference
>>     is that it's a partial revert of the cited commit, not a full one.
>>     i.e. optimization preserved for a case without hairpin_snat_ip.
>>
>>   - Posting now due to pressing time constraints for a release.
>>     I didn't finish high-scale performance tests yet.  I'll post
>>     results for that tomorrow.  Preliminary ovn-heater run with
>>     120 nodes density-heavy worked fine without issues.  500 node
>>     tests are queued.
> 
> 500-node tests finished.  And I see almost no performance difference
> between cases with and without hairpin_snat_ip and the current main
> branch without hairpin_snat_ip.  All 3 setups have the same amount of
> OF rules in ovn-k-ish setups with ovn-heater.  CPU usage is approximately
> on the same level as well.
> 

Nice!  I'm also running this through the ovn-kubernetes CI but I expect
it to work fine:

https://github.com/dceara/ovn/actions/runs/4312305009

Acked-by: Dumitru Ceara <dce...@redhat.com>

Thanks a lot for fixing this!

Regards,
Dumitru


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to