On Wed, Jul 5, 2023 at 9:59 AM Terry Wilson <twil...@redhat.com> wrote:
>
> On Fri, Jun 30, 2023 at 7:09 PM Han Zhou via discuss <ovs-discuss@openvswitch.org> wrote:
> >
> > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <ovs-discuss@openvswitch.org> wrote:
> > >
> > > Hi Ilya,
> > >
> > > thank you for the detailed reply
> > >
> > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > Hi everyone,
> > > >
> > > > Hi, Felix.
> > > >
> > > > > we are currently running an OVN deployment with 450 nodes. We run a 3-node cluster for the northbound database and a 3-node cluster for the southbound database.
> > > > > Between the southbound cluster and the ovn-controllers we have a layer of 24 ovsdb relays.
> > > > > The setup is using TLS for all connections, however the TLS server is handled by a traefik reverse proxy to offload this from the ovsdb
> > > >
> > > > The very important part of the system description is: what versions of OVS and OVN are you using in this setup? If it's not the latest 3.1 and 23.03, then it's hard to talk about what/if performance improvements are actually needed.
> > >
> > > We are currently running ovs 3.1 and ovn 22.12 (in the process of upgrading to 23.03). `monitor-all` is currently disabled, but we want to try that as well.
> >
> > Hi Felix, did you try upgrading and enabling "monitor-all"? How does it look now?
> >
> > > > > Northd and Neutron are connecting directly to the north- and southbound databases without the relays.
> > > >
> > > > One of the big things that is annoying is that Neutron connects to the Southbound database at all. There are some reasons to do that, but ideally that should be avoided. I know that in the past limiting the number of metadata agents was one of the mitigation strategies for scaling issues. Also, why can't it connect to relays? There shouldn't be too many transactions flowing towards the Southbound DB from Neutron.
> > >
> > > Thanks for that suggestion, that definitely makes sense.
> >
> > Does this make a big difference? How many Neutron - SB connections are there?
> > What rings a bell is that Neutron is using the python OVSDB library, which hasn't implemented the fast-resync feature (if I remember correctly).
>
> python-ovs has supported monitor_cond_since since v2.17.0 (though there may have been a bug that was fixed in 2.17.1). If fast resync isn't happening, then it should be considered a bug. With that said, I remember that when I looked at it a year or two ago, ovsdb-server didn't really use fast resync/monitor_cond_since unless it was running in raft cluster mode (it would reply, but with the last-txn-id as 0, IIRC?). Does the ovsdb-relay code actually return the last-txn-id? I can set up an environment and run some tests, but maybe someone else already knows.
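
For reference, the fast-resync handling lives entirely inside python-ovs, so there is nothing for a client to enable; a bare-bones IDL client looks roughly like this (just a sketch for illustration -- the schema path and remote below are made up, and monitor_cond_since is negotiated internally by the library on each (re)connect):

    # Minimal python-ovs IDL client sketch (illustrative only).
    import ovs.db.idl
    import ovs.poller

    SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"  # illustrative path
    REMOTE = "tcp:127.0.0.1:6642"               # illustrative remote

    helper = ovs.db.idl.SchemaHelper(SCHEMA)
    helper.register_all()                       # monitor every table

    idl = ovs.db.idl.Idl(REMOTE, helper)
    poller = ovs.poller.Poller()
    seqno = idl.change_seqno
    while True:
        idl.run()                               # processes updates, reconnects as needed
        if idl.change_seqno != seqno:
            seqno = idl.change_seqno            # local replica changed
        idl.wait(poller)
        poller.block()

Watching a client like that reconnect should make it easy to see whether the server answers with a real last-txn-id or falls back to a full monitor reply.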
Looks like ovsdb-relay does support last-txn-id now: https://github.com/openvswitch/ovs/commit/a3e97b1af1bdcaa802c6caa9e73087df7077d2b1, but only in v3.0+.

> > At the same time, there is the leader-transfer-for-snapshot feature, which automatically transfers the leader whenever a snapshot is to be written, which would happen frequently if your environment is very active.
>
> I believe snapshots should only be happening "no less frequently than 24 hours, with snapshots if there are more than 100 log entries and the log size has doubled, but no more frequently than every 10 mins", or something pretty close to that. So it seems like once the system got up to its expected size, you would just see snapshots every 24 hours, since you obviously can't double in size forever. But it's possible I'm reading that wrong.
>
> > When a leader transfer happens, if Neutron sets the option "leader-only" (only connect to the leader) for the SB DB (could someone confirm?), then all Neutron workers would reconnect to the new leader. With fast-resync, like what's implemented in the C IDL and Go, a client that has cached the data would only request the delta when reconnecting. But since the python lib doesn't have this, the Neutron server would re-download the full data when reconnecting ...
> > This is speculation based on the information I have, and the assumptions need to be confirmed.
> >
> > > > > We needed to increase various timeouts on the ovsdb-server and client side to get this to a mostly stable state:
> > > > > * inactivity probes of 60 seconds (for all connections between ovsdb-server, relay and clients)
> > > > > * cluster election time of 50 seconds
> > > > >
> > > > > As long as none of the relays restarts, the environment is quite stable.
> > > > > However, we quite regularly see the "Unreasonably long xxx ms poll interval" messages, ranging from 1000 ms up to 40000 ms.
> > > >
> > > > With the latest versions of OVS/OVN, the CPU usage on Southbound DB servers without relays in our weekly 500-node ovn-heater runs stays below 10% during the test phase. No large poll intervals are getting registered.
> > > >
> > > > Do you have more details on the circumstances under which these large poll intervals occur?
> > >
> > > It seems to mostly happen on the initial connection of some client to the ovsdb. From the few times we ran perf there, it looks like the time is spent in creating a monitor and, during that, sending out the updates to the client side.
> >
> > This is one of the worst-case scenarios for OVSDB: many clients initializing connections to it at the same time, while the size of the data downloaded by each client is big.
> > OVSDB relay, from what I understand, should greatly help with this. You have 24 relay nodes, which are supposed to share the burden. Are the SB DB and the relay instances running with sufficient CPU resources?
> > Is it clear which clients' initial connections (ovn-controller or Neutron) are causing this? If it is Neutron, the above speculation about the lack of fast-resync in the Neutron workers may be worth checking.
> >
> > > If it is of interest I can try and get a perf report once this occurs again.
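
On the "leader-only" question: in python-ovs this looks like a plain constructor argument on the Idl class, alongside the client-side inactivity probe (at least in recent releases -- please double-check the exact keywords against the python-ovs version actually in use, and note that Neutron goes through ovsdbapp, which may expose its own knobs instead). A read-mostly worker could, in principle, avoid being pinned to the leader and use a longer probe, along these lines:

    # Hypothetical sketch of a non-leader-only SB client with a 60 s probe;
    # remotes and schema path are made up for illustration.
    import ovs.db.idl

    SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"
    REMOTE = "tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"

    helper = ovs.db.idl.SchemaHelper(SCHEMA)
    helper.register_all()

    idl = ovs.db.idl.Idl(
        REMOTE, helper,
        probe_interval=60000,   # client-side inactivity probe, in milliseconds
        leader_only=False,      # don't drop the connection on a leader transfer
    )

Whether that is appropriate for Neutron depends on which of its connections actually need to follow the leader; this is only meant to show where the setting lives on the client side.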
> > > > > If a large number of relays restart simultaneously, they can also cause the ovsdb cluster to fail, as the poll interval exceeds the cluster election time.
> > > > > This happens with the relays already syncing the data from all 3 ovsdb servers.
> > > >
> > > > There was a performance issue with upgrades and simultaneous reconnections, but it should be mostly fixed on the current master branch, i.e. in the upcoming 3.2 release:
> > > > https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259&state=*
> > >
> > > That sounds like it might be similar to when our issue occurs. I'll see if we can try this out.
> > >
> > > > > We would like to improve this significantly, to ensure on the one hand that our ovsdb clusters will survive unplanned load without issues, and on the other hand to keep the poll intervals short.
> > > > > We would like to ensure a short poll interval to allow us to act on distributed-gateway-port failovers and failovers of virtual ports in a timely manner (ideally below 1 second).
> > > >
> > > > These are good goals. But are you sure they are not already addressed with the most recent versions of OVS/OVN?
> > >
> > > I was not sure, but all your feedback helped clarify that.
> > >
> > > > > To do this we found the following solutions that were discussed in the past:
> > > > > 1. Implementing multithreading for ovsdb
> > > > > https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=&state=*&q=multithreading&archive=&delegate=
> > > >
> > > > We moved the compaction process to a separate thread in 3.0. This partially addressed the multi-threading topic. General handling of client requests/updates in separate threads will require significant changes in the internal architecture, AFAICT. So, I'd like to avoid doing that unless necessary. So far we were able to overcome almost all the performance challenges with simple algorithmic changes instead.
> > >
> > > I definitely get that, since that would be quite a complex change to do. The only benefit I would see in having clients in separate threads is that it reduces the impact of performance challenges.
> > > E.g. it would still allow the cluster to work together healthily and make progress, but individual reconnects would be slow.
> > > That benefit would be quite significant from my perspective, as it makes the solution more resilient. But I'm not sure if it's worth the additional complexity.
> >
> > Multithreading for general OVSDB tasks (transactions, monitoring) seems more complex to implement, and the outcome should be very similar to OVSDB relay (which is multi-process instead of multi-threading), except that multi-threading may have a smaller memory footprint.
> > Multithreading for RAFT cluster RPC may help keep the cluster healthy when server load is high, but the same can be achieved by setting a longer election timer. I agree there is a subtle difference when you want fast failure detection for things like a node crash but can tolerate overloaded servers that can barely respond to clients.
> >
> > Looking forward to hearing back from you regarding the situation.
> >
> > Thanks,
> > Han
> >
> > > > > 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
> > > > > https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> > > >
> > > > There was an ovsdb-etcd project, but it didn't manage to provide better performance in comparison with ovsdb-server. So it was ultimately abandoned: https://github.com/IBM/ovsdb-etcd
> > >
> > > > > Both of these discussions are from 2016; I'm not sure if more up-to-date ones exist.
> > > > > I would like to ask if there are already existing discussions on scaling ovsdb further/faster?
> > > >
> > > > This again comes down to the question of what versions you're using. I'm currently not aware of any major performance issues for ovsdb-server on the most recent code, besides the conditional monitoring, which is not entirely OVSDB server's issue. And it is also likely to become a bit better in 3.2:
> > > > https://patchwork.ozlabs.org/project/openvswitch/patch/20230518121425.550048-1-i.maxim...@ovn.org/
> > >
> > > That also sounds like a quite interesting change that might help us here.
> > >
> > > > > From my perspective, whatever such a solution might be, it would no longer require relays and would allow the ovsdb servers to handle load gracefully.
> > > > > I personally see multithreading for ovsdb as quite promising, as that would allow us to separate the raft/cluster communication from the client connections.
> > > > > This should allow us to keep the cluster healthy even under significant pressure from clients.
> > > >
> > > > Again, good goals. I'm just not sure if we actually need to do something or if they are already achievable with the most recent code.
> > > >
> > > > I understand that testing on prod is not an option, so it's unlikely we'll have an accurate test. But maybe you can participate in the initiative [1] for the creation of ovn-heater OpenStack scenarios that might be close to the workloads you have? This way upstream will be able to test your use-cases or at least something similar.
> > > >
> > > > Most of our current efforts are focused on the ovn-kubernetes use-case, because we don't have many details on what high-scale OpenStack deployments look like.
> > > >
> > > > [1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404488.html
> > >
> > > That looks very interesting and would also help us run scale tests. I'll get in contact with whoever is working on that to help out as well.
> > >
> > > > Best regards, Ilya Maximets.
> > > >
> > > > > Thank you
> > > > >
> > > > > --
> > > > > Felix Huettner
> > >
> > > Thanks for all of the detailed insights.
> > > Felix
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss