Thanks Vladislav and Dumitru for reporting and fixing the issue.
The impact of the issue is more than just the memory spikes in
ovn-controller. More importantly, it incurs much higher load on SB DB
because the conditions are flooded with all the localnet ports regardless
of whether they belong to local datapaths.
Please see more details in a related fix I just posted:
https://patchwork.ozlabs.org/project/ovn/patch/20230810155351.2853369-1-hz...@ovn.org/

In addition, I want to comment on Dumitru's patch (sorry I still didn't see
you posted for review yet)
https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8,
I think it is better to avoid using the "optional" version for port_binding
monitoring. For the same reason, if my patch above is not applied, the SB
DB overload problem is still there because the unconditional monitoring of
port_binding initially adds all localnet ports to local_lports and later to
the monitor conditions. In addition, if there are a huge amount of PBs, it
may still cause memory pressure. I understand the intention of avoiding
churns of building local_datapaths, but it may be better to be addressed in
a separate patch and we can review and discuss accordingly.

Still, looking forward to the formal patch for this.

Thanks,
Han

On Tue, Aug 1, 2023 at 3:09 AM Vladislav Odintsov <odiv...@gmail.com> wrote:
>
> Hi Dumitru!
>
> I’ve performed test on the host, where ovn-controller (22.09.x) without
your patch and without any local datapaths consumed 3+GiB of ram after
start and the process start took 100% CPU during ~40 seconds after start.
> With your patch ovn-controller starts during ~5 seconds with max cpu
consumption seen in top ~35-36%.
> Memory consumption doesn’t exceed 220-222 MiB on the same host with same
NB/SB contents.
>
> Many thanks for this improvement!
>
> I guess after this patch got merged to main branch, it should be
backported down to 22.03 (branch, to which initial commit [0] introduced
such behaviour was backported).
>
> 0:
https://github.com/ovn-org/ovn/commit/525829c47d09a114de8536d23784c458977d8321
>
> > On 31 Jul 2023, at 14:43, Vladislav Odintsov <odiv...@gmail.com> wrote:
> >
> > Thanks Dumitru!
> >
> > I’ll test this patch in a few days.
> >
> >> On 28 Jul 2023, at 14:36, Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> Hi Vladislav,
> >>
> >> After quite some time trying to implement the IDL API change to allow
> >> setting a different default monitor condition and mostly struggling
with
> >> ovn-controller using that properly I kind of gave up and decided to
> >> approach this in a different way.
> >>
> >> We have guidelines about supported upgrade scenarios [0] so we can use
> >> the same guidelines for defining which tables ovn-controller is
entitled
> >> to assume exist in the SB (without having to check).
> >>
> >> I ended up with:
> >>
https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8
> >>
> >> I know it's not perfect but it might be the least risky and a good
> >> enough solution.
> >>
> >> It would be great if you could try it out on your data set too.
> >>
> >> Thanks,
> >> Dumitru
> >>
> >> [0]
> >>
https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28
> >>
> >> On 7/25/23 10:19, Vladislav Odintsov wrote:
> >>> Many thanks for the information!
> >>>
> >>>> On 25 Jul 2023, at 11:14, Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>
> >>>> On 7/24/23 21:10, Vladislav Odintsov wrote:
> >>>>> Hi Dumitru,
> >>>>>
> >>>>
> >>>> Hi Vladislav,
> >>>>
> >>>>> I just wanted to ask wether you need any help (maybe, testing) in
this?
> >>>>> I’m ready to check this on my dataset if you were successful to
> >>>>> implement a fix.
> >>>>>
> >>>>
> >>>> Thanks for offering to help.  I didn't get the chance to properly
write
> >>>> and test the patches for this (we need a change in OVS IDL first and
> >>>> then one in OVN).  It would be great if you could try them out on
your
> >>>> data sets so I'll CC you on the patches when posting them.  I hope
to do
> >>>> that this week.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>>>> On 12 Jul 2023, at 12:15, Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>>>
> >>>>>> On 7/12/23 00:01, Ilya Maximets wrote:
> >>>>>>> On 7/11/23 19:01, Dumitru Ceara wrote:
> >>>>>>>> On 7/11/23 18:33, Vladislav Odintsov wrote:
> >>>>>>>>> Hi Dumitru,
> >>>>>>>>>
> >>>>>>>>> The system on which I reproduced this issue is running 22.09.x
> >>>>>>>>> version. I’ve tried to upgrade ovn-controller to main branch +
your
> >>>>>>>>> patch. Please, note that it has test error: [1].
> >>>>>>>>> After two minutes after upgrade it still consumed 3.3G.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Ack, I need to re-think the patch then.  Maybe a hard deadline
to run
> >>>>>>>> malloc_trim() at least once every X seconds.  I'll see what I
can come
> >>>>>>>> up with.
> >>>>>>>>
> >>>>>>>>> I tried to backport your patch to 22.09, it required to backport
> >>>>>>>>> also this commit: [2] and it failed some tests: [3].
> >>>>>>>>>
> >>>>>>>>> But I’ve got general question: prior to commit that I mentioned
in
> >>>>>>>>> initial mail, ovn-controller even didn’t try load such amount of
> >>>>>>>>> data. And now it does and IIUC, your patch just releases memory
> >>>>>>>>> that was freed after ovn-controller fully loaded.
> >>>>>>>>> I’m wonder wether it should load that excess data at all? Seems
> >>>>>>>>> like it did.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
> >>>>>>>> conditions on available tables.") it's kind of expected indeed:
> >>>>>>>>
> >>>>>>>> Initially all tables are "unavailable" because we didn't get the
schema
> >>>>>>>> so we don't set any condition for any table.
> >>>>>>>>
> >>>>>>>> After ovn-controller connects to the SB for the first time it
will
> >>>>>>>> determine that the SB tables are in the schema so it will
explicitly add
> >>>>>>>> them to the monitor condition and restrict the SB data it is
> >>>>>>>> interested in.
> >>>>>>>>
> >>>>>>>> Maybe we need to change the IDL/CS modules to wait with the
> >>>>>>>> monitor_cond/monitor_cond_since until instructed by the client
> >>>>>>>> (ovn-controller).  Ilya do you have any thoughts on this matter?
> >>>>>>>
> >>>>>>> So, AFAICT, the issue is that we're running with
> >>>>>>> 'monitor_everything_by_default'
> >>>>>>> option, the default condition is 'true' and the monitor request
for
> >>>>>>> the main
> >>>>>>> database is sent out immediately after receiving the schema, so
the
> >>>>>>> application
> >>>>>>> has no time to react.
> >>>>>>>
> >>>>>>> I think, there are few possible solutions for this:
> >>>>>>>
> >>>>>>> 1. Introduce a new state in the CS state machine, e.g.
> >>>>>>> CS_S_SERVER_SCHEMA_RCEIVED, and move out from this state in the
run()
> >>>>>>> callback.  This way the application will have a chance to set up
> >>>>>>> conditions
> >>>>>>> before they are sent.  Slightly not intuitive.
> >>>>>>>
> >>>>>>> 2. A variation on what you suggested, i.e. enter the
> >>>>>>> CS_S_SERVER_SCHEMA_RCEIVED
> >>>>>>> state and wait for some sort of the signal from the application to
> >>>>>>> proceed.
> >>>>>>> Sounds a bit counter-intuitive for an IDL user.
> >>>>>>>
> >>>>>>> 3. Introduce an application callback that can be called from the
> >>>>>>> ovsdb_idl_compose_monitor_request() the same way as this function
> >>>>>>> is getting
> >>>>>>> called form the ovsdb_cs_send_monitor_request().  An application
> >>>>>>> will be
> >>>>>>> able to influence conditions before they are sent.
> >>>>>>> Might be tricky due to new->req->ack state transition.
> >>>>>>>
> >>>>>>> 4. Make the default condition configurable, e.g. by an additional
> >>>>>>> argument
> >>>>>>> 'default_condition' = true/false for an ovsdb_idl_create().  This
> >>>>>>> way the
> >>>>>>> application will not get any data until conditions are actually
set.
> >>>>>>>
> >>>>>>> 5. Or it maybe just a separate config function that will set
default
> >>>>>>> conditions
> >>>>>>> to 'false' and will need to be called before the first run().
> >>>>>>>
> >>>>>>> 6. Change behavior of 'monitor_everything_by_default' argument.
Make it
> >>>>>>> actually add all the tables to the monitor, but with the 'false'
> >>>>>>> condition.
> >>>>>>> Result should technically be the same.  Might be tricky to get
> >>>>>>> right though
> >>>>>>> with all the backward compatibility.
> >>>>>>>
> >>>>>>> Option 5 might be the better option of these.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>
> >>>>>> I think option 5 sounds the simplest to implement indeed.  It also
> >>>>>> doesn't induce any compatibility issues as you mentioned.
> >>>>>>
> >>>>>> The only "issue" is we'd probably want this backported to stable
OVN
> >>>>>> releases so it means we need to bump the submodule version to an
> >>>>>> unreleased version of OVS.  But that's an OVN problem and we
discussed
> >>>>>> similar instances of it before.
> >>>>>>
> >>>>>> I'll prepare a patch soon.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Dumitru
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Vladislav Odintsov
> >>>>>
> >>>>
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> d...@openvswitch.org <mailto:d...@openvswitch.org>
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>>
> >>> Regards,
> >>> Vladislav Odintsov
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >> _______________________________________________
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> > Regards,
> > Vladislav Odintsov
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
> Regards,
> Vladislav Odintsov
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to