On 6/6/23 21:10, Ilya Maximets wrote:
> On 3/27/23 21:43, Ilya Maximets wrote:
>> Change sets in the OVSDB monitor store all the changes that happened
>> between a particular transaction ID and now.  The initial change set
>> basically contains all the data.
>>
>> On each monitor request a new initial change set is created by creating
>> an empty change set and adding all the database rows.  It is then
>> converted into a JSON reply and immediately untracked and destroyed.
>>
>> This causes significant performance issues if many clients request
>> new monitors at the same time.  For example, that happens after a
>> database schema conversion, because conversion triggers cancellation
>> of all monitors.  After cancellation, every client sends a new monitor
>> request.  The server then creates a new initial change set, sends a
>> reply, destroys the initial change set, and repeats that for each
>> client.  On a system with a 200 MB database and 500 clients, a
>> cluster of 3 servers spends 20 minutes replying to all the clients
>> (200 MB x 500 = 100 GB):
>>
>>   timeval|WARN|Unreasonably long 1201525ms poll interval
>>
>> Of course, all the clients are already disconnected due to inactivity
>> at this point.  When they re-connect, the server accepts new
>> connections one at a time, so inactivity probes are no longer
>> triggered, but it still takes another 20 minutes to handle all the
>> incoming connections.
>>
>> Let's keep the initial change set around for as long as the monitor
>> itself exists.  This allows us to avoid constructing a new change set
>> on each new monitor request and even to utilize the JSON cache in
>> some cases.  All that at a relatively small maintenance cost, since
>> we'll need to commit changes to one extra change set on every
>> transaction.  The measured memory usage increase due to keeping
>> around a shallow copy of the database is about 10%.  The measured
>> CPU usage difference during normal operation is negligible.
>>
>> With this change it takes only 30 seconds to send out all the monitor
>> replies in the example above.  So, it's a 40x performance improvement.
>> On a more reasonable setup with 250 nodes, the process takes 8-10
>> seconds instead of 4-5 minutes.
>>
>> Conditional monitoring will benefit from this change as well, though
>> the results may be less impressive due to the lack of a JSON cache.
>>
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>  ovsdb/monitor.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
>> index 191befcae..3cdd03b20 100644
>> --- a/ovsdb/monitor.c
>> +++ b/ovsdb/monitor.c
>> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor *dbmon,
>>      ovs_assert(mcs);
>>      if (--mcs->n_refs == 0) {
>>          if (mcs == dbmon->init_change_set) {
>> -            dbmon->init_change_set = NULL;
>> +            /* The initial change set should exist as long as the
>> +             * monitor itself. */
>> +            mcs->n_refs++;
>> +            return;
>>          } else if (mcs == dbmon->new_change_set) {
>>              dbmon->new_change_set = NULL;
>>          }
>
> Hi.  I'm thinking about backporting this one patch to 3.1.  It's our latest
> stable version and it's starting to get used at scale.  Without the change
> it's really hard to manage simultaneous re-connections in large (250+ node)
> OVN clusters.
>
> The change itself is fairly simple, so it shouldn't cause any issues in 3.1.
> We've been running all our scale tests with it since it was accepted,
> without any issues.
>
> Backporting to 2.17 is probably not advisable, because 2.17 doesn't have
> the lazy-copy mechanism for database objects.
>
> Any thoughts on this?
No objections on my side.  I agree with the backport to 3.1.

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev