On 4/6/23 10:47, Ilya Maximets wrote:
> On 4/6/23 10:43, Ilya Maximets wrote:
>> On 4/6/23 10:21, Dumitru Ceara wrote:
>>> On 3/27/23 21:43, Ilya Maximets wrote:
>>>> Change sets in OVSDB monitor are storing all the changes that happened
>>>> between a particular transaction ID and now.  Initial change set
>>>> basically contains all the data.
>>>>
>>>> On each monitor request a new initial change set is created by creating
>>>> an empty change set and adding all the database rows.  Then it is
>>>> converted into JSON reply and immediately untracked and destroyed.
>>>>
>>>> This is causing significant performance issues if many clients are
>>>> requesting new monitors at the same time.  For example, that is
>>>> happening after database schema conversion, because conversion triggers
>>>> cancellation of all monitors.  After cancellation, every client sends
>>>> a new monitor request.  The server then creates a new initial change
>>>> set, sends a reply, destroys initial change set and repeats that for
>>>> each client.  On a system with 200 MB database and 500 clients,
>>>> cluster of 3 servers spends 20 minutes replying to all the clients
>>>> (200 MB x 500 = 100 GB):
>>>>
>>>>   timeval|WARN|Unreasonably long 1201525ms poll interval
>>>>
>>>> Of course, all the clients are already disconnected due to inactivity
>>>> at this point.  When they are re-connecting back, server accepts new
>>>> connections one at a time, so inactivity probes will not be triggered
>>>> anymore, but it still takes another 20 minutes to handle all the
>>>> incoming connections.
>>>>
>>>> Let's keep the initial change set around for as long as the monitor
>>>> itself exists.  This will allow us to not construct a new change set
>>>> on each new monitor request and even utilize the JSON cache in some
>>>> cases.  All that at a relatively small maintenance cost, since we'll
>>>> need to commit changes to one extra change set on every transaction.
>>>> Measured memory usage increase due to keeping around a shallow copy
>>>> of a database is about 10%.  Measured CPU usage difference during
>>>> normal operation is negligible.
>>>
>>> I'm sure I'm missing something but I'm seeing this when triggering a
>>> schema conversion in an OVN sandbox (clustered NB):
>>>
>>> parse_txn()
>>> -> update_schema()
>>>    -> ovsdb_replace()
>>>       -> ovsdb_monitor_prereplace_db()
>>>          // Here we iterate and delete each jsonrpc_monitor
>>>          // associated with the db.
>>>          // Deleting the last one will trigger the monitor to be
>>>          // deleted too:
>>>          -> ovsdb_jsonrpc_monitor_destroy()
>>>             -> ovsdb_monitor_remove_jsonrpc_monitor()
>>>                -> ovsdb_monitor_destroy()
>>>
>>> At this point the initial change set gets destroyed too.  And, AFAICT,
>>> this is before the clients reconnected.  This confuses me, I guess; how
>>> come your change works? :)
>>
>> This is a correct series of events.  The database schema changed, so
>> the structure of the change set has to be modified.  There is no
>> simple way to do that (we'll need an equivalent of ovsdb_convert() to
>> do that), so the change set is destroyed.
>>
>> When the very first connection is established for the converted database,
> 
> or the very first monitor request for the converted database is received,
> like in our case (no disconnections with db_change_aware),
> 
>> the new initial change set will be created from scratch.  But it will
>> stay and will be re-used for each subsequent connection afterwards.
>> That's the goal of this patch.
>>
>> Does that make sense?
>>

Ah, OK, I misinterpreted the intention as "maintaining the change set
across monitor cancellation".  ovsdb_monitor_get_initial() will get the
initial change set for clients 2 .. N.

Thanks for the clarification!

Acked-by: Dumitru Ceara <dce...@redhat.com>

>> Best regards, Ilya Maximets.
>>
>>>
>>>>
>>>> With this change it takes only 30 seconds to send out all the monitor
>>>> replies in the example above.  So, it's a 40x performance improvement.
>>>> On a more reasonable setup with 250 nodes, the process takes up to
>>>> 8-10 seconds instead of 4-5 minutes.
>>>>
>>>> Conditional monitoring will benefit from this change as well, however
>>>> results might be less impressive due to lack of JSON cache.
>>>>
>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>> ---
>>>>  ovsdb/monitor.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
>>>> index 191befcae..3cdd03b20 100644
>>>> --- a/ovsdb/monitor.c
>>>> +++ b/ovsdb/monitor.c
>>>> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor 
>>>> *dbmon,
>>>>      ovs_assert(mcs);
>>>>      if (--mcs->n_refs == 0) {
>>>>          if (mcs == dbmon->init_change_set) {
>>>> -            dbmon->init_change_set = NULL;
>>>> +            /* The initial change set should exist as long as the
>>>> +             * monitor itself. */
>>>> +            mcs->n_refs++;
>>>> +            return;
>>>>          } else if (mcs == dbmon->new_change_set) {
>>>>              dbmon->new_change_set = NULL;
>>>>          }
>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to