On 6/7/23 12:23, Ilya Maximets wrote:
> On 6/7/23 00:48, Han Zhou wrote:
>> On Tue, Jun 6, 2023 at 3:08 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>> On 6/6/23 22:12, Han Zhou wrote:
>>>> On Tue, Jun 6, 2023 at 12:10 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>> On 3/27/23 21:43, Ilya Maximets wrote:
>>>>>> Change sets in OVSDB monitor are storing all the changes that happened
>>>>>> between a particular transaction ID and now.  Initial change set
>>>>>> basically contains all the data.
>>>>>>
>>>>>> On each monitor request a new initial change set is created by creating
>>>>>> an empty change set and adding all the database rows.  Then it is
>>>>>> converted into JSON reply and immediately untracked and destroyed.
>>>>>>
>>>>>> This is causing significant performance issues if many clients are
>>>>>> requesting new monitors at the same time.  For example, that is
>>>>>> happening after database schema conversion, because conversion triggers
>>>>>> cancellation of all monitors.  After cancellation, every client sends
>>>>>> a new monitor request.  The server then creates a new initial change
>>>>>> set, sends a reply, destroys initial change set and repeats that for
>>>>>> each client.  On a system with 200 MB database and 500 clients,
>>>>>> cluster of 3 servers spends 20 minutes replying to all the clients
>>>>>> (200 MB x 500 = 100 GB):
>>>>>>
>>>>>>   timeval|WARN|Unreasonably long 1201525ms poll interval
>>>>>>
>>>>>> Of course, all the clients are already disconnected due to inactivity
>>>>>> at this point.  When they are re-connecting back, server accepts new
>>>>>> connections one at a time, so inactivity probes will not be triggered
>>>>>> anymore, but it still takes another 20 minutes to handle all the
>>>>>> incoming connections.
>>>>>>
>>>>>> Let's keep the initial change set around for as long as the monitor
>>>>>> itself exists.  This will allow us to not construct a new change set
>>>>>> on each new monitor request and even utilize the JSON cache in some
>>>>>> cases.  All that at a relatively small maintenance cost, since we'll
>>>>>> need to commit changes to one extra change set on every transaction.
>>>>>> Measured memory usage increase due to keeping around a shallow copy
>>>>>> of a database is about 10%.  Measured CPU usage difference during
>>>>>> normal operation is negligible.
>>>>>>
>>>>>> With this change it takes only 30 seconds to send out all the monitor
>>>>>> replies in the example above.  So, it's a 40x performance improvement.
>>>>>> On a more reasonable setup with 250 nodes, the process takes up to
>>>>>> 8-10 seconds instead of 4-5 minutes.
>>>>>>
>>>>>> Conditional monitoring will benefit from this change as well, however
>>>>>> results might be less impressive due to lack of JSON cache.
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>>> ---
>>>>>>  ovsdb/monitor.c | 5 ++++-
>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
>>>>>> index 191befcae..3cdd03b20 100644
>>>>>> --- a/ovsdb/monitor.c
>>>>>> +++ b/ovsdb/monitor.c
>>>>>> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor *dbmon,
>>>>>>      ovs_assert(mcs);
>>>>>>      if (--mcs->n_refs == 0) {
>>>>>>          if (mcs == dbmon->init_change_set) {
>>>>>> -            dbmon->init_change_set = NULL;
>>>>>> +            /* The initial change set should exist as long as the
>>>>>> +             * monitor itself. */
>>>>>> +            mcs->n_refs++;
>>>>>> +            return;
>>>>>>          } else if (mcs == dbmon->new_change_set) {
>>>>>>              dbmon->new_change_set = NULL;
>>>>>>          }
>>>>>
>>>>> Hi.  I'm thinking about backporting this one patch to 3.1.
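[Editor's note: for readers skimming the archive, the keep-alive trick in the
patch above can be distilled into a small standalone refcounting toy.  The
names mirror the patch, but this is an illustrative sketch, not the actual
ovsdb-server structures.]

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the real ovsdb-server structures. */
struct change_set {
    int n_refs;
};

struct monitor {
    struct change_set *init_change_set;
    struct change_set *new_change_set;
};

static struct change_set *
change_set_create(void)
{
    struct change_set *cs = calloc(1, sizeof *cs);
    cs->n_refs = 1;
    return cs;
}

/* Mirrors the patched untrack logic: when the last reference to the
 * initial change set is dropped, re-take the reference and bail out
 * instead of freeing it, so the set stays cached for the lifetime of
 * the monitor and can serve future monitor requests.  Any other
 * change set is destroyed as before. */
static void
untrack_change_set(struct monitor *m, struct change_set *cs)
{
    assert(cs);
    if (--cs->n_refs == 0) {
        if (cs == m->init_change_set) {
            /* The initial change set should exist as long as the
             * monitor itself. */
            cs->n_refs++;
            return;
        } else if (cs == m->new_change_set) {
            m->new_change_set = NULL;
        }
        free(cs);
    }
}
```

The point of the sketch: after the last client untracks the initial change
set, the monitor still holds it with one reference, so the next monitor
request finds it ready instead of rebuilding it from the whole database.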
>>>>> It's our latest
>>>>> stable version and it's starting to get used at scale.  Without the
>>>>> change it's really hard to manage simultaneous re-connections in
>>>>> large (250+ nodes) OVN clusters.
>>>>>
>>>>> The change itself is fairly simple, so it shouldn't cause any issues
>>>>> in 3.1.  We've been running all our scale tests with it since it was
>>>>> accepted, without any issues.
>>>>>
>>>>> Backporting to 2.17 is probably not advisable, because 2.17 doesn't
>>>>> have the lazy-copy mechanism for database objects.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Sorry for missing this in the original code review.  I think there is a
>>>> problem in this patch.  (It was one of my TODOs years ago, and this is
>>>> what I found from my notes :))
>>>>
>>>> It may cause problems because clients monitoring the same tables and
>>>> columns could have different conditions, and a new client may come with
>>>> a column in its condition that doesn't exist in the monitor.  The new
>>>> column will be added to the monitor in ovsdb_monitor_condition_bind().
>>>> However, the initial change set generated before that would not have
>>>> such columns.  It would crash when evaluating conditions for the new
>>>> client against the initial set.  The solution is to always include all
>>>> columns in the initial change set if we want to keep it.  Maybe we
>>>> should have a test case for this.
>>>
>>> Hmm.  So, it's a case where a new monitor has a condition on a column
>>> that it doesn't monitor.  Right?
>>>
>>> It should not be hard to write a test for this scenario.
>>>
>>> I guess another solution would be to destroy the initial change set
>>> whenever a new column is added to the monitor.  That should not really
>>> happen in real-world applications, as I don't think OVN, for example,
>>> has conditions on columns that it doesn't monitor.
>>> So, destruction will
>>> not be triggered in normal cases and we will not have any side effects
>>> like the increased memory/CPU usage we would get from adding all the
>>> columns.
>>>
>> Probably having conditions on columns that a client doesn't monitor is
>> not abnormal, although I don't immediately recall any such use cases
>> in OVN.
>
> Me neither, but I guess one simple use case is that if someone has
> a condition like '<column B> equals A', there is no need for them
> to monitor column B, because every row received from the server
> will have the same value in this column.
>
>> However, I like your approach, because when it comes to scale, a large
>> number of clients should have condition columns in common, so the
>> destruction should happen in very rare cases, even if many of them have
>> conditions on columns that they don't monitor.
>
> Yeah, also columns seem to never be removed from the monitor,
> so the total number of cases where the change set needs to be
> destroyed is limited by the number of columns in the table.
>
>>> Maybe something like this (not tested):
>>>
>>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
>>> index 04dcd2298..247f6f615 100644
>>> --- a/ovsdb/monitor.c
>>> +++ b/ovsdb/monitor.c
>>> @@ -478,6 +478,7 @@ ovsdb_monitor_add_column(struct ovsdb_monitor *dbmon,
>>>                           enum ovsdb_monitor_selection select,
>>>                           bool monitored)
>>>  {
>>> +    struct ovsdb_monitor_change_set *mcs;
>>>      struct ovsdb_monitor_table *mt;
>>>      struct ovsdb_monitor_column *c;
>>>
>>> @@ -488,6 +489,18 @@ ovsdb_monitor_add_column(struct ovsdb_monitor *dbmon,
>>>          return column->name;
>>>      }
>>>
>>> +    mcs = dbmon->init_change_set;
>>> +    if (mcs) {
>>> +        /* A new column is going to be added to the monitor.  Existing
>>> +         * initial change set doesn't have it, so can no longer be used.
>>> +         * Initial change set is never used by more than one session at
>>> +         * the same time, so it's safe to destroy it here. */
>>> +        ovs_assert(mcs->n_refs == 1);
>>> +        ovsdb_monitor_json_cache_destroy(dbmon, mcs);
>>> +        ovsdb_monitor_change_set_destroy(mcs);
>>> +        dbmon->init_change_set = NULL;
>>> +    }
>>> +
>>>      if (mt->n_columns >= mt->allocated_columns) {
>>>          mt->columns = x2nrealloc(mt->columns, &mt->allocated_columns,
>>>                                   sizeof *mt->columns);
>>> ---
>>>
>>> What do you think?
>>>
>>
>> At a quick glance, it looks good to me.
>
> Ack.  I'll work on a unit test for this and post the formal patch.
> Thanks!
>
>>
>> Thanks,
>> Han
>>
>>>>
>>>> With this addressed, I have no objection for backporting to 3.1.
I backported the patch and the fix to 3.1 now.  Thanks!

Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
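[Editor's note: as a closing illustration of the failure mode Han describes
above — a cached change-set row only carries the columns the monitor knew
about when the set was built, so a condition on a later-added column finds
nothing to evaluate.  A minimal toy lookup, with purely illustrative names
and column names, not the real ovsdb-server code:]

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A change-set row in this toy only carries the columns that existed
 * in the monitor at the time the change set was created. */
#define MAX_COLUMNS 4

struct toy_row {
    const char *names[MAX_COLUMNS];
    int values[MAX_COLUMNS];
    size_t n;
};

/* Returns a pointer to the stored value, or NULL if the row was built
 * before the column was added to the monitor. */
static const int *
toy_row_get(const struct toy_row *row, const char *column)
{
    for (size_t i = 0; i < row->n; i++) {
        if (!strcmp(row->names[i], column)) {
            return &row->values[i];
        }
    }
    return NULL;
}

/* Evaluating '<column> == value' against a stale row.  Without the
 * NULL check, this is exactly the kind of dereference the thread is
 * worried about when a new client's condition names a column that the
 * cached initial change set never recorded; hence the fix above
 * destroys the cached set whenever a column is added. */
static bool
toy_condition_matches(const struct toy_row *row, const char *column,
                      int value)
{
    const int *stored = toy_row_get(row, column);
    return stored && *stored == value;
}
```

In the toy, a lookup of a column that was added to the monitor after the row
was recorded returns NULL, which is why the cached initial change set must be
invalidated (or built with all columns) before such a condition is evaluated.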