Hi Srini, I can share what works for us with ~1k hypervisors:
On Tue, Mar 05, 2024 at 09:51:43PM -0800, Sri kor via discuss wrote:
> Hi Team,
>
>
> Currently, we are using OVN in RAFT cluster mode. We have 3 NB and SB
> ovsdb-servers operating in RAFT cluster mode. Currently we have 500
> hypervisors connected to this RAFT cluster.
>
> For our next deployment, our scale would increase to 3000 hypervisors. To
> accommodate this scaled hypervisors, we are migrating to DB relay with
> multigroup deployment model. This increase helps with OVN SB DB read
> transactions. But for write transactions, only the leader in the RAFT
> cluster can update the DB. This creates a load on the leader of RAFT. Is
> there a way to address the load on the RAFT cluster leader?

We do the following:

* If you need TLS on the ovsdb path, move it out to a reverse proxy that
  does plain L4 TLS termination (e.g. traefik or similar).
* Have nobody besides northd connect to the SB DB directly; everyone else
  needs to use a relay.
* Do not run backups on the cluster leader, but on one of the current
  followers.
* Increase the raft election timeout significantly (we have 120s in
  there). However, afaik there is a patch in 3.3 that improves this.
* If you generate metrics or similar from database content, generate them
  on the relays instead of on the raft cluster.

Overall, when our southbound DB had issues, most of the time the cause was
some client constantly reconnecting to it and thereby pulling a full DB
dump every time.

>
>
> As the scale increases, number updates coming to the ovn-controller from
> OVN SB increases. that creates pressure on ovn-controller. Is there a way
> to minimize the load on ovn-controller?

We have not seen any issue there yet. However, if you are using Python
tooling outside of OVN (e.g. OpenStack), ensure that JSON parsing via the C
library is available in the ovs lib. This brings significant performance
benefits if you have a lot of updates.
You can check with `python3 -c "import ovs.json; print(ovs.json.PARSER)"`,
which should return "C".
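If you would rather do that check from inside your own tooling than by hand, a minimal sketch could look like the one below. It relies only on the `ovs.json.PARSER` attribute mentioned above; the helper name `ovs_json_parser` is made up for illustration and is not part of OVS or OVN, and treating a missing `ovs` package as `None` is my assumption about what is convenient.

```python
import importlib


def ovs_json_parser():
    """Return ovs.json.PARSER, or None if it cannot be determined.

    "C" means the C extension is used for JSON parsing. None means the
    ovs Python package is not importable or has no PARSER attribute.
    Hypothetical helper name, not part of OVS or OVN.
    """
    try:
        ovs_json = importlib.import_module("ovs.json")
    except ImportError:
        # ovs Python library is not installed in this environment.
        return None
    return getattr(ovs_json, "PARSER", None)


if __name__ == "__main__":
    parser = ovs_json_parser()
    if parser == "C":
        print("ovs JSON parsing uses the C extension")
    elif parser is None:
        print("ovs Python library not importable (or no PARSER attribute)")
    else:
        print(f"ovs JSON parsing is not using the C extension: {parser}")
```

Running this once at startup of the tooling and logging the result should be enough to notice a deployment that silently lost the C extension.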
>
> I wish there is a way for ovn-controller to subscribe to updates specific
> to this hypervisor. Are there any known ovn-controller subscription methods
> available and being used OVS community?

Yes, they do that by default. However, for us this created increased load
on the relays because of the additional filtering and JSON serialization
needed per target node. So we turned it off and thereby accept more network
bandwidth in exchange for less ovsdb load. The relevant setting is
`external_ids:ovn-monitor-all`.

Thanks
Felix

>
>
> How can I optimize the load on the leader node in an OVN RAFT cluster to
> handle increased write transactions?
>
>
>
> Thanks,
>
> Srini
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss