On 4/6/23 10:47, Ilya Maximets wrote: > On 4/6/23 10:43, Ilya Maximets wrote: >> On 4/6/23 10:21, Dumitru Ceara wrote: >>> On 3/27/23 21:43, Ilya Maximets wrote: >>>> Change sets in OVSDB monitor are storing all the changes that happened >>>> between a particular transaction ID and now. Initial change set >>>> basically contains all the data. >>>> >>>> On each monitor request a new initial change set is created by creating >>>> an empty change set and adding all the database rows. Then it is >>>> converted into JSON reply and immediately untracked and destroyed. >>>> >>>> This is causing significant performance issues if many clients are >>>> requesting new monitors at the same time. For example, that is >>>> happening after database schema conversion, because conversion triggers >>>> cancellation of all monitors. After cancellation, every client sends >>>> a new monitor request. The server then creates a new initial change >>>> set, sends a reply, destroys initial change set and repeats that for >>>> each client. On a system with 200 MB database and 500 clients, >>>> cluster of 3 servers spends 20 minutes replying to all the clients >>>> (200 MB x 500 = 100 GB): >>>> >>>> timeval|WARN|Unreasonably long 1201525ms poll interval >>>> >>>> Of course, all the clients are already disconnected due to inactivity >>>> at this point. When they are re-connecting back, server accepts new >>>> connections one at a time, so inactivity probes will not be triggered >>>> anymore, but it still takes another 20 minutes to handle all the >>>> incoming connections. >>>> >>>> Let's keep the initial change set around for as long as the monitor >>>> itself exists. This will allow us to not construct a new change set >>>> on each new monitor request and even utilize the JSON cache in some >>>> cases. All that at a relatively small maintenance cost, since we'll >>>> need to commit changes to one extra change set on every transaction. >>>> Measured memory usage increase due to keeping around a shallow copy >>>> of a database is about 10%. Measured CPU usage difference during >>>> normal operation is negligible. >>> >>> I'm sure I'm missing something but I'm seeing this when triggering a >>> schema conversion in an OVN sandbox (clustered NB): >>> >>> parse_txn() >>> -> update_schema() >>> -> ovsdb_replace() >>> -> ovsdb_monitor_prereplace_db() >>> // Here we iterate and delete each jsonrpc_monitor >>> // associated with the db. >>> // Deleting the last one will trigger the monitor to be >>> // deleted too: >>> -> ovsdb_jsonrpc_monitor_destroy() >>> -> ovsdb_monitor_remove_jsonrpc_monitor() >>> -> ovsdb_monitor_destroy() >>> >>> At this point the initial change set gets destroyed too. And, AFAICT, >>> this is before the clients reconnected. This confuses me, I guess; how >>> come your change works? :) >> >> This is a correct series of events. The database schema changed, so >> the structure of the change set has to be modified. There is no >> simple way to do that (we'll need an equivalent of ovsdb_convert() to >> do that), so the change set is destroyed. >> >> When the very first connection is established for the converted database, > > or the very first monitor request for the converted database is received, > like in our case (no disconnections with db_change_aware), > >> the new initial change set will be created from scratch. But it will >> stay and will be re-used for each subsequent connection afterwards. >> That's the goal of this patch. >> >> Does that make sense? >>
Ah, OK, I misinterpreted the intention as "maintaining the change set across monitor cancellation". ovsdb_monitor_get_initial() will get the initial change set for clients 2 .. N. Thanks for the clarification! Acked-by: Dumitru Ceara <dce...@redhat.com> >> Best regards, Ilya Maximets. >> >>> >>>> >>>> With this change it takes only 30 seconds to send out all the monitor >>>> replies in the example above. So, it's a 40x performance improvement. >>>> On a more reasonable setup with 250 nodes, the process takes up to >>>> 8-10 seconds instead of 4-5 minutes. >>>> >>>> Conditional monitoring will benefit from this change as well, however >>>> results might be less impressive due to lack of JSON cache. >>>> >>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> >>>> --- >>>> ovsdb/monitor.c | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c >>>> index 191befcae..3cdd03b20 100644 >>>> --- a/ovsdb/monitor.c >>>> +++ b/ovsdb/monitor.c >>>> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor >>>> *dbmon, >>>> ovs_assert(mcs); >>>> if (--mcs->n_refs == 0) { >>>> if (mcs == dbmon->init_change_set) { >>>> - dbmon->init_change_set = NULL; >>>> + /* The initial change set should exist as long as the >>>> + * monitor itself. */ >>>> + mcs->n_refs++; >>>> + return; >>>> } else if (mcs == dbmon->new_change_set) { >>>> dbmon->new_change_set = NULL; >>>> } >>> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev