On Tue, Jun 6, 2023 at 12:10 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 3/27/23 21:43, Ilya Maximets wrote:
> > Change sets in OVSDB monitor are storing all the changes that happened
> > between a particular transaction ID and now.  Initial change set
> > basically contains all the data.
> >
> > On each monitor request a new initial change set is created by creating
> > an empty change set and adding all the database rows.  Then it is
> > converted into JSON reply and immediately untracked and destroyed.
> >
> > This is causing significant performance issues if many clients are
> > requesting new monitors at the same time.  For example, that is
> > happening after database schema conversion, because conversion triggers
> > cancellation of all monitors.  After cancellation, every client sends
> > a new monitor request.  The server then creates a new initial change
> > set, sends a reply, destroys initial change set and repeats that for
> > each client.  On a system with 200 MB database and 500 clients,
> > cluster of 3 servers spends 20 minutes replying to all the clients
> > (200 MB x 500 = 100 GB):
> >
> >   timeval|WARN|Unreasonably long 1201525ms poll interval
> >
> > Of course, all the clients are already disconnected due to inactivity
> > at this point.  When they are re-connecting back, server accepts new
> > connections one at a time, so inactivity probes will not be triggered
> > anymore, but it still takes another 20 minutes to handle all the
> > incoming connections.
> >
> > Let's keep the initial change set around for as long as the monitor
> > itself exists.  This will allow us to not construct a new change set
> > on each new monitor request and even utilize the JSON cache in some
> > cases.  All that at a relatively small maintenance cost, since we'll
> > need to commit changes to one extra change set on every transaction.
> > Measured memory usage increase due to keeping around a shallow copy
> > of a database is about 10%.  Measured CPU usage difference during
> > normal operation is negligible.
> >
> > With this change it takes only 30 seconds to send out all the monitor
> > replies in the example above.  So, it's a 40x performance improvement.
> > On a more reasonable setup with 250 nodes, the process takes up to
> > 8-10 seconds instead of 4-5 minutes.
> >
> > Conditional monitoring will benefit from this change as well, however
> > results might be less impressive due to lack of JSON cache.
> >
> > Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> > ---
> >  ovsdb/monitor.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> > index 191befcae..3cdd03b20 100644
> > --- a/ovsdb/monitor.c
> > +++ b/ovsdb/monitor.c
> > @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct
ovsdb_monitor *dbmon,
> >      ovs_assert(mcs);
> >      if (--mcs->n_refs == 0) {
> >          if (mcs == dbmon->init_change_set) {
> > -            dbmon->init_change_set = NULL;
> > +            /* The initial change set should exist as long as the
> > +             * monitor itself. */
> > +            mcs->n_refs++;
> > +            return;
> >          } else if (mcs == dbmon->new_change_set) {
> >              dbmon->new_change_set = NULL;
> >          }
>
> Hi.  I'm thinking about backporting this one patch to 3.1.  It's our
latest
> stable version and it's starting to get used at scale.  Without the change
> it's really hard to manage simultaneous re-connections in large (250+
nodes)
> OVN clusters.
>
> The change itself is fairly simple, so shouldn't cause any issues in 3.1.
> We're running all our scale tests with it since it was accepted without
> any issues.
>
> Backporting to 2.17 is probably not advisable, because 2.17 doesn't have
> lazy-copy mechanism for database objects.
>
> Any thoughts on this?
>
> Best regards, Ilya Maximets.

Sorry for missing this in the original code review. I think there is a
problem in this patch. (It was one of my TODOs years ago, and this is what
I found from my notes :))

It may cause problems because clients monitoring the same tables and
columns could have different conditions, and if a new client comes with a
column in the condition that doesn't exist in the monitor. The new column
will be added to the monitor in ovsdb_monitor_condition_bind(). However,
the initial change set generated before that would not have such columns.
It would crash when evaluating conditions for the new client against the
initial set. The solution is to always include all columns for initial
change set if we want to keep it. Maybe we should have a test case for this.

With this addressed, I have no objection for backporting to 3.1.

Thanks,
Han
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to