On 6/6/23 21:10, Ilya Maximets wrote:
> On 3/27/23 21:43, Ilya Maximets wrote:
>> Change sets in OVSDB monitor are storing all the changes that happened
>> between a particular transaction ID and now.  Initial change set
>> basically contains all the data.
>>
>> On each monitor request a new initial change set is created by creating
>> an empty change set and adding all the database rows.  Then it is
>> converted into JSON reply and immediately untracked and destroyed.
>>
>> This is causing significant performance issues if many clients are
>> requesting new monitors at the same time.  For example, that happens
>> after a database schema conversion, because conversion triggers
>> cancellation of all monitors.  After cancellation, every client sends
>> a new monitor request.  The server then creates a new initial change
>> set, sends a reply, destroys initial change set and repeats that for
>> each client.  On a system with 200 MB database and 500 clients,
>> cluster of 3 servers spends 20 minutes replying to all the clients
>> (200 MB x 500 = 100 GB):
>>
>>   timeval|WARN|Unreasonably long 1201525ms poll interval
>>
>> Of course, all the clients are already disconnected due to inactivity
>> at this point.  When they re-connect, the server accepts new
>> connections one at a time, so inactivity probes will not be triggered
>> anymore, but it still takes another 20 minutes to handle all the
>> incoming connections.
>>
>> Let's keep the initial change set around for as long as the monitor
>> itself exists.  This will allow us to not construct a new change set
>> on each new monitor request and even utilize the JSON cache in some
>> cases.  All that at a relatively small maintenance cost, since we'll
>> need to commit changes to one extra change set on every transaction.
>> The measured increase in memory usage from keeping a shallow copy of
>> the database around is about 10%.  The measured difference in CPU
>> usage during normal operation is negligible.
>>
>> With this change it takes only 30 seconds to send out all the monitor
>> replies in the example above.  So, it's a 40x performance improvement.
>> On a more reasonable setup with 250 nodes, the process takes up to
>> 8-10 seconds instead of 4-5 minutes.
>>
>> Conditional monitoring will benefit from this change as well; however,
>> the results may be less impressive due to the lack of a JSON cache.
>>
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>  ovsdb/monitor.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
>> index 191befcae..3cdd03b20 100644
>> --- a/ovsdb/monitor.c
>> +++ b/ovsdb/monitor.c
>> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor *dbmon,
>>      ovs_assert(mcs);
>>      if (--mcs->n_refs == 0) {
>>          if (mcs == dbmon->init_change_set) {
>> -            dbmon->init_change_set = NULL;
>> +            /* The initial change set should exist as long as the
>> +             * monitor itself. */
>> +            mcs->n_refs++;
>> +            return;
>>          } else if (mcs == dbmon->new_change_set) {
>>              dbmon->new_change_set = NULL;
>>          }
> 
> Hi.  I'm thinking about backporting this one patch to 3.1.  It's our latest
> stable version and it's starting to get used at scale.  Without the change
> it's really hard to manage simultaneous re-connections in large (250+ nodes)
> OVN clusters.
> 
> The change itself is fairly simple, so it shouldn't cause any issues in 3.1.
> We have been running all our scale tests with it since it was accepted,
> without any issues.
> 
> Backporting to 2.17 is probably not advisable, because 2.17 doesn't have
> the lazy-copy mechanism for database objects.
> 
> Any thoughts on this?

No objections on my side.  I agree with the backport to 3.1.

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
