Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-08-11 Thread Dumitru Ceara
On 8/10/23 18:56, Dumitru Ceara wrote:
> On 8/10/23 18:15, Han Zhou wrote:
>> Thanks Vladislav and Dumitru for reporting and fixing the issue.
>> The impact of the issue is more than just the memory spikes in
>> ovn-controller. More importantly, it incurs much higher load on SB DB
>> because the conditions are flooded with all the localnet ports regardless
>> of whether they belong to local datapaths.
>> Please see more details in a related fix I just posted:
>> https://patchwork.ozlabs.org/project/ovn/patch/20230810155351.2853369-1-hz...@ovn.org/
>>
>> In addition, I want to comment on Dumitru's patch (sorry, I still didn't
>> see you post it for review yet)
>> https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8,
>> I think it is better to avoid using the "optional" version for port_binding
>> monitoring. For the same reason, if my patch above is not applied, the SB
>> DB overload problem is still there because the unconditional monitoring of
>> port_binding initially adds all localnet ports to local_lports and later to
>> the monitor conditions. In addition, if there is a huge number of PBs, it
>> may still cause memory pressure. I understand the intention of avoiding
>> churn when building local_datapaths, but that may be better addressed in
>> a separate patch that we can review and discuss accordingly.
>>
> 
> The unfortunate part is that the local datapath "churn" will
> also cause conntrack zones to be flushed and disrupt traffic.  What's
> worse (in the ovn-k8s case) is that some of the zones (zone 0) might be
> shared with the host.  So, just restarting ovn-controller (with
> conditional monitoring enabled) will trigger conntrack entries to be
> cleared for non-OVN sessions too.
> 
> 

I think I found a reasonable way forward.  I posted a formal series:

https://patchwork.ozlabs.org/project/ovn/list/?series=368430&state=*

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-08-10 Thread Dumitru Ceara
On 8/10/23 18:15, Han Zhou wrote:
> Thanks Vladislav and Dumitru for reporting and fixing the issue.
> The impact of the issue is more than just the memory spikes in
> ovn-controller. More importantly, it incurs much higher load on SB DB
> because the conditions are flooded with all the localnet ports regardless
> of whether they belong to local datapaths.
> Please see more details in a related fix I just posted:
> https://patchwork.ozlabs.org/project/ovn/patch/20230810155351.2853369-1-hz...@ovn.org/
> 
> In addition, I want to comment on Dumitru's patch (sorry, I still didn't
> see you post it for review yet)
> https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8,
> I think it is better to avoid using the "optional" version for port_binding
> monitoring. For the same reason, if my patch above is not applied, the SB
> DB overload problem is still there because the unconditional monitoring of
> port_binding initially adds all localnet ports to local_lports and later to
> the monitor conditions. In addition, if there is a huge number of PBs, it
> may still cause memory pressure. I understand the intention of avoiding
> churn when building local_datapaths, but that may be better addressed in
> a separate patch that we can review and discuss accordingly.
> 

The unfortunate part is that the local datapath "churn" will
also cause conntrack zones to be flushed and disrupt traffic.  What's
worse (in the ovn-k8s case) is that some of the zones (zone 0) might be
shared with the host.  So, just restarting ovn-controller (with
conditional monitoring enabled) will trigger conntrack entries to be
cleared for non-OVN sessions too.


> Still, looking forward to the formal patch for this.
> 

It's still on my todo list; I didn't get a chance to do it yet, sorry.

Regards,
Dumitru

> Thanks,
> Han
> 
> On Tue, Aug 1, 2023 at 3:09 AM Vladislav Odintsov  wrote:
>>
>> Hi Dumitru!
>>
>> I've performed a test on the host where ovn-controller (22.09.x) without
>> your patch and without any local datapaths consumed 3+ GiB of RAM after
>> start, and the process took 100% CPU for ~40 seconds after start.
>> With your patch, ovn-controller starts in ~5 seconds with max CPU
>> consumption seen in top of ~35-36%.
>> Memory consumption doesn't exceed 220-222 MiB on the same host with the
>> same NB/SB contents.
>>
>> Many thanks for this improvement!
>>
>> I guess after this patch gets merged to the main branch, it should be
>> backported down to 22.03 (the branch to which the initial commit [0] that
>> introduced such behaviour was backported).
>>
>> 0:
> https://github.com/ovn-org/ovn/commit/525829c47d09a114de8536d23784c458977d8321
>>
>>> On 31 Jul 2023, at 14:43, Vladislav Odintsov  wrote:
>>>
>>> Thanks Dumitru!
>>>
>>> I’ll test this patch in a few days.
>>>
 On 28 Jul 2023, at 14:36, Dumitru Ceara  wrote:

 Hi Vladislav,

 After quite some time trying to implement the IDL API change to allow
 setting a different default monitor condition and mostly struggling
> with
 ovn-controller using that properly I kind of gave up and decided to
 approach this in a different way.

 We have guidelines about supported upgrade scenarios [0] so we can use
 the same guidelines for defining which tables ovn-controller is
> entitled
 to assume exist in the SB (without having to check).

 I ended up with:

> https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8

 I know it's not perfect but it might be the least risky and a good
 enough solution.

 It would be great if you could try it out on your data set too.

 Thanks,
 Dumitru

 [0]

> https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28

 On 7/25/23 10:19, Vladislav Odintsov wrote:
> Many thanks for the information!
>
>> On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
>>
>> On 7/24/23 21:10, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>
>> Hi Vladislav,
>>
>>> I just wanted to ask whether you need any help (maybe testing) with
>>> this?
>>> I'm ready to check this on my dataset if you were successful in
>>> implementing a fix.
>>>
>>
>> Thanks for offering to help.  I didn't get the chance to properly
> write
>> and test the patches for this (we need a change in OVS IDL first and
>> then one in OVN).  It would be great if you could try them out on
> your
>> data sets so I'll CC you on the patches when posting them.  I hope
> to do
>> that this week.
>>
>> Regards,
>> Dumitru
>>
 On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:

 On 7/12/23 00:01, Ilya Maximets wrote:
> On 7/11/23 19:01, Dumitru Ceara wrote:
>> On 7/11/23 18:33, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>> The system on which I reproduced this 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-08-10 Thread Han Zhou
Thanks Vladislav and Dumitru for reporting and fixing the issue.
The impact of the issue is more than just the memory spikes in
ovn-controller. More importantly, it incurs much higher load on SB DB
because the conditions are flooded with all the localnet ports regardless
of whether they belong to local datapaths.
Please see more details in a related fix I just posted:
https://patchwork.ozlabs.org/project/ovn/patch/20230810155351.2853369-1-hz...@ovn.org/

In addition, I want to comment on Dumitru's patch (sorry, I still didn't
see you post it for review yet)
https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8,
I think it is better to avoid using the "optional" version for port_binding
monitoring. For the same reason, if my patch above is not applied, the SB
DB overload problem is still there because the unconditional monitoring of
port_binding initially adds all localnet ports to local_lports and later to
the monitor conditions. In addition, if there is a huge number of PBs, it
may still cause memory pressure. I understand the intention of avoiding
churn when building local_datapaths, but that may be better addressed in
a separate patch that we can review and discuss accordingly.

Still, looking forward to the formal patch for this.

Thanks,
Han

On Tue, Aug 1, 2023 at 3:09 AM Vladislav Odintsov  wrote:
>
> Hi Dumitru!
>
> I've performed a test on the host where ovn-controller (22.09.x) without
> your patch and without any local datapaths consumed 3+ GiB of RAM after
> start, and the process took 100% CPU for ~40 seconds after start.
> With your patch, ovn-controller starts in ~5 seconds with max CPU
> consumption seen in top of ~35-36%.
> Memory consumption doesn't exceed 220-222 MiB on the same host with the
> same NB/SB contents.
>
> Many thanks for this improvement!
>
> I guess after this patch gets merged to the main branch, it should be
> backported down to 22.03 (the branch to which the initial commit [0] that
> introduced such behaviour was backported).
>
> 0:
https://github.com/ovn-org/ovn/commit/525829c47d09a114de8536d23784c458977d8321
>
> > On 31 Jul 2023, at 14:43, Vladislav Odintsov  wrote:
> >
> > Thanks Dumitru!
> >
> > I’ll test this patch in a few days.
> >
> >> On 28 Jul 2023, at 14:36, Dumitru Ceara  wrote:
> >>
> >> Hi Vladislav,
> >>
> >> After quite some time trying to implement the IDL API change to allow
> >> setting a different default monitor condition and mostly struggling
with
> >> ovn-controller using that properly I kind of gave up and decided to
> >> approach this in a different way.
> >>
> >> We have guidelines about supported upgrade scenarios [0] so we can use
> >> the same guidelines for defining which tables ovn-controller is
entitled
> >> to assume exist in the SB (without having to check).
> >>
> >> I ended up with:
> >>
https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8
> >>
> >> I know it's not perfect but it might be the least risky and a good
> >> enough solution.
> >>
> >> It would be great if you could try it out on your data set too.
> >>
> >> Thanks,
> >> Dumitru
> >>
> >> [0]
> >>
https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28
> >>
> >> On 7/25/23 10:19, Vladislav Odintsov wrote:
> >>> Many thanks for the information!
> >>>
>  On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
> 
>  On 7/24/23 21:10, Vladislav Odintsov wrote:
> > Hi Dumitru,
> >
> 
>  Hi Vladislav,
> 
> > I just wanted to ask whether you need any help (maybe testing) with
> > this?
> > I'm ready to check this on my dataset if you were successful in
> > implementing a fix.
> >
> 
>  Thanks for offering to help.  I didn't get the chance to properly
write
>  and test the patches for this (we need a change in OVS IDL first and
>  then one in OVN).  It would be great if you could try them out on
your
>  data sets so I'll CC you on the patches when posting them.  I hope
to do
>  that this week.
> 
>  Regards,
>  Dumitru
> 
> >> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
> >>
> >> On 7/12/23 00:01, Ilya Maximets wrote:
> >>> On 7/11/23 19:01, Dumitru Ceara wrote:
>  On 7/11/23 18:33, Vladislav Odintsov wrote:
> > Hi Dumitru,
> >
> > The system on which I reproduced this issue is running 22.09.x
> > version. I’ve tried to upgrade ovn-controller to main branch +
your
> > patch. Please, note that it has test error: [1].
> > After two minutes after upgrade it still consumed 3.3G.
> >
> 
>  Ack, I need to re-think the patch then.  Maybe a hard deadline
to run
>  malloc_trim() at least once every X seconds.  I'll see what I
can come
>  up with.
> 
> > I tried to backport your patch to 22.09; it required backporting
> > this commit as well: [2], and it failed some tests: [3].
> 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-08-01 Thread Vladislav Odintsov
Hi Dumitru!

I've performed a test on the host where ovn-controller (22.09.x) without your
patch and without any local datapaths consumed 3+ GiB of RAM after start, and
the process took 100% CPU for ~40 seconds after start.
With your patch, ovn-controller starts in ~5 seconds with max CPU
consumption seen in top of ~35-36%.
Memory consumption doesn't exceed 220-222 MiB on the same host with the same
NB/SB contents.

Many thanks for this improvement!

I guess after this patch gets merged to the main branch, it should be
backported down to 22.03 (the branch to which the initial commit [0] that
introduced such behaviour was backported).

0: 
https://github.com/ovn-org/ovn/commit/525829c47d09a114de8536d23784c458977d8321

> On 31 Jul 2023, at 14:43, Vladislav Odintsov  wrote:
> 
> Thanks Dumitru!
> 
> I’ll test this patch in a few days.
> 
>> On 28 Jul 2023, at 14:36, Dumitru Ceara  wrote:
>> 
>> Hi Vladislav,
>> 
>> After quite some time trying to implement the IDL API change to allow
>> setting a different default monitor condition and mostly struggling with
>> ovn-controller using that properly I kind of gave up and decided to
>> approach this in a different way.
>> 
>> We have guidelines about supported upgrade scenarios [0] so we can use
>> the same guidelines for defining which tables ovn-controller is entitled
>> to assume exist in the SB (without having to check).
>> 
>> I ended up with:
>> https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8
>> 
>> I know it's not perfect but it might be the least risky and a good
>> enough solution.
>> 
>> It would be great if you could try it out on your data set too.
>> 
>> Thanks,
>> Dumitru
>> 
>> [0]
>> https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28
>> 
>> On 7/25/23 10:19, Vladislav Odintsov wrote:
>>> Many thanks for the information!
>>> 
 On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
 
 On 7/24/23 21:10, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
 
 Hi Vladislav,
 
> I just wanted to ask whether you need any help (maybe testing) with this?
> I'm ready to check this on my dataset if you were successful in
> implementing a fix.
> 
 
 Thanks for offering to help.  I didn't get the chance to properly write
 and test the patches for this (we need a change in OVS IDL first and
 then one in OVN).  It would be great if you could try them out on your
 data sets so I'll CC you on the patches when posting them.  I hope to do
 that this week.
 
 Regards,
 Dumitru
 
>> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
>> 
>> On 7/12/23 00:01, Ilya Maximets wrote:
>>> On 7/11/23 19:01, Dumitru Ceara wrote:
 On 7/11/23 18:33, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
> The system on which I reproduced this issue is running 22.09.x
> version. I’ve tried to upgrade ovn-controller to main branch + your
> patch. Please, note that it has test error: [1].
> After two minutes after upgrade it still consumed 3.3G.
> 
 
 Ack, I need to re-think the patch then.  Maybe a hard deadline to run
 malloc_trim() at least once every X seconds.  I'll see what I can come
 up with.
 
> I tried to backport your patch to 22.09; it required backporting
> this commit as well: [2], and it failed some tests: [3].
> 
> But I've got a general question: prior to the commit that I mentioned in
> the initial mail, ovn-controller didn't even try to load such an amount of
> data. Now it does, and IIUC your patch just releases memory
> that was freed after ovn-controller fully loaded.
> I wonder whether it should load that excess data at all? It seems
> like it did.
> 
 
 Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
 conditions on available tables.") it's kind of expected indeed:
 
 Initially all tables are "unavailable" because we didn't get the schema
 so we don't set any condition for any table.
 
 After ovn-controller connects to the SB for the first time it will
 determine that the SB tables are in the schema so it will explicitly 
 add
 them to the monitor condition and restrict the SB data it is
 interested in.
 
 Maybe we need to change the IDL/CS modules to delay sending the
 monitor_cond/monitor_cond_since request until instructed by the client
 (ovn-controller).  Ilya, do you have any thoughts on this matter?
>>> 
>>> So, AFAICT, the issue is that we're running with
>>> 'monitor_everything_by_default'
>>> option, the default condition is 'true' and the monitor request for
>>> the main
>>> database is sent out immediately after receiving the schema, so the
>>> application

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-31 Thread Vladislav Odintsov
Thanks Dumitru!

I’ll test this patch in a few days.

> On 28 Jul 2023, at 14:36, Dumitru Ceara  wrote:
> 
> Hi Vladislav,
> 
> After quite some time trying to implement the IDL API change to allow
> setting a different default monitor condition and mostly struggling with
> ovn-controller using that properly I kind of gave up and decided to
> approach this in a different way.
> 
> We have guidelines about supported upgrade scenarios [0] so we can use
> the same guidelines for defining which tables ovn-controller is entitled
> to assume exist in the SB (without having to check).
> 
> I ended up with:
> https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8
> 
> I know it's not perfect but it might be the least risky and a good
> enough solution.
> 
> It would be great if you could try it out on your data set too.
> 
> Thanks,
> Dumitru
> 
> [0]
> https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28
> 
> On 7/25/23 10:19, Vladislav Odintsov wrote:
>> Many thanks for the information!
>> 
>>> On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
>>> 
>>> On 7/24/23 21:10, Vladislav Odintsov wrote:
 Hi Dumitru,
 
>>> 
>>> Hi Vladislav,
>>> 
 I just wanted to ask whether you need any help (maybe testing) with this?
 I'm ready to check this on my dataset if you were successful in
 implementing a fix.
 
>>> 
>>> Thanks for offering to help.  I didn't get the chance to properly write
>>> and test the patches for this (we need a change in OVS IDL first and
>>> then one in OVN).  It would be great if you could try them out on your
>>> data sets so I'll CC you on the patches when posting them.  I hope to do
>>> that this week.
>>> 
>>> Regards,
>>> Dumitru
>>> 
> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
> 
> On 7/12/23 00:01, Ilya Maximets wrote:
>> On 7/11/23 19:01, Dumitru Ceara wrote:
>>> On 7/11/23 18:33, Vladislav Odintsov wrote:
 Hi Dumitru,
 
 The system on which I reproduced this issue is running 22.09.x
 version. I’ve tried to upgrade ovn-controller to main branch + your
 patch. Please, note that it has test error: [1].
 After two minutes after upgrade it still consumed 3.3G.
 
>>> 
>>> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
>>> malloc_trim() at least once every X seconds.  I'll see what I can come
>>> up with.
>>> 
 I tried to backport your patch to 22.09; it required backporting
 this commit as well: [2], and it failed some tests: [3].
 
 But I've got a general question: prior to the commit that I mentioned in
 the initial mail, ovn-controller didn't even try to load such an amount of
 data. Now it does, and IIUC your patch just releases memory
 that was freed after ovn-controller fully loaded.
 I wonder whether it should load that excess data at all? It seems
 like it did.
 
>>> 
>>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
>>> conditions on available tables.") it's kind of expected indeed:
>>> 
>>> Initially all tables are "unavailable" because we didn't get the schema
>>> so we don't set any condition for any table.
>>> 
>>> After ovn-controller connects to the SB for the first time it will
>>> determine that the SB tables are in the schema so it will explicitly add
>>> them to the monitor condition and restrict the SB data it is
>>> interested in.
>>> 
>>> Maybe we need to change the IDL/CS modules to delay sending the
>>> monitor_cond/monitor_cond_since request until instructed by the client
>>> (ovn-controller).  Ilya, do you have any thoughts on this matter?
>> 
>> So, AFAICT, the issue is that we're running with
>> 'monitor_everything_by_default'
>> option, the default condition is 'true' and the monitor request for
>> the main
>> database is sent out immediately after receiving the schema, so the
>> application
>> has no time to react.
>> 
>> I think there are a few possible solutions for this:
>> 
>> 1. Introduce a new state in the CS state machine, e.g.
>>  CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
>>  callback.  This way the application will have a chance to set up
>> conditions
>>  before they are sent.  Slightly not intuitive.
>> 
>> 2. A variation on what you suggested, i.e. enter the
>> CS_S_SERVER_SCHEMA_RECEIVED
>>  state and wait for some sort of the signal from the application to
>> proceed.
>>  Sounds a bit counter-intuitive for an IDL user.
>> 
>> 3. Introduce an application callback that can be called from the
>>  ovsdb_idl_compose_monitor_request() the same way as this function
>> is getting
>>  called from the ovsdb_cs_send_monitor_request().  An application
>> will be

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-28 Thread Dumitru Ceara
Hi Vladislav,

After quite some time trying to implement the IDL API change to allow
setting a different default monitor condition and mostly struggling with
ovn-controller using that properly I kind of gave up and decided to
approach this in a different way.

We have guidelines about supported upgrade scenarios [0] so we can use
the same guidelines for defining which tables ovn-controller is entitled
to assume exist in the SB (without having to check).

I ended up with:
https://github.com/dceara/ovn/commit/f5e8b9bcba61a2528b67854bb4211981a99feaa8

I know it's not perfect but it might be the least risky and a good
enough solution.

It would be great if you could try it out on your data set too.

Thanks,
Dumitru

[0]
https://github.com/ovn-org/ovn/blob/5a1d82cb28c554276e0c17718f808b8f244cb162/Documentation/intro/install/ovn-upgrades.rst?plain=1#L28

On 7/25/23 10:19, Vladislav Odintsov wrote:
> Many thanks for the information!
> 
>> On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
>>
>> On 7/24/23 21:10, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>
>> Hi Vladislav,
>>
>>> I just wanted to ask whether you need any help (maybe testing) with this?
>>> I'm ready to check this on my dataset if you were successful in
>>> implementing a fix.
>>>
>>
>> Thanks for offering to help.  I didn't get the chance to properly write
>> and test the patches for this (we need a change in OVS IDL first and
>> then one in OVN).  It would be great if you could try them out on your
>> data sets so I'll CC you on the patches when posting them.  I hope to do
>> that this week.
>>
>> Regards,
>> Dumitru
>>
 On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:

 On 7/12/23 00:01, Ilya Maximets wrote:
> On 7/11/23 19:01, Dumitru Ceara wrote:
>> On 7/11/23 18:33, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>> The system on which I reproduced this issue is running 22.09.x
>>> version. I’ve tried to upgrade ovn-controller to main branch + your
>>> patch. Please, note that it has test error: [1].
>>> After two minutes after upgrade it still consumed 3.3G.
>>>
>>
>> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
>> malloc_trim() at least once every X seconds.  I'll see what I can come
>> up with.
>>
>>> I tried to backport your patch to 22.09; it required backporting
>>> this commit as well: [2], and it failed some tests: [3].
>>>
>>> But I've got a general question: prior to the commit that I mentioned in
>>> the initial mail, ovn-controller didn't even try to load such an amount of
>>> data. Now it does, and IIUC your patch just releases memory
>>> that was freed after ovn-controller fully loaded.
>>> I wonder whether it should load that excess data at all? It seems
>>> like it did.
>>>
>>
>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
>> conditions on available tables.") it's kind of expected indeed:
>>
>> Initially all tables are "unavailable" because we didn't get the schema
>> so we don't set any condition for any table.
>>
>> After ovn-controller connects to the SB for the first time it will
>> determine that the SB tables are in the schema so it will explicitly add
>> them to the monitor condition and restrict the SB data it is
>> interested in.
>>
>> Maybe we need to change the IDL/CS modules to delay sending the
>> monitor_cond/monitor_cond_since request until instructed by the client
>> (ovn-controller).  Ilya, do you have any thoughts on this matter?
>
> So, AFAICT, the issue is that we're running with
> 'monitor_everything_by_default'
> option, the default condition is 'true' and the monitor request for
> the main
> database is sent out immediately after receiving the schema, so the
> application
> has no time to react.
>
> I think there are a few possible solutions for this:
>
> 1. Introduce a new state in the CS state machine, e.g.
>   CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
>   callback.  This way the application will have a chance to set up
> conditions
>   before they are sent.  Slightly not intuitive.
>
> 2. A variation on what you suggested, i.e. enter the
> CS_S_SERVER_SCHEMA_RECEIVED
>   state and wait for some sort of the signal from the application to
> proceed.
>   Sounds a bit counter-intuitive for an IDL user.
>
> 3. Introduce an application callback that can be called from the
>   ovsdb_idl_compose_monitor_request() the same way as this function
> is getting
>   called from the ovsdb_cs_send_monitor_request().  An application
> will be
>   able to influence conditions before they are sent.
>   Might be tricky due to new->req->ack state transition.
>
> 4. Make the default condition configurable, e.g. by an additional
> argument
>   'default_condition' = true/false for an 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-25 Thread Vladislav Odintsov
Many thanks for the information!

> On 25 Jul 2023, at 11:14, Dumitru Ceara  wrote:
> 
> On 7/24/23 21:10, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
> 
> Hi Vladislav,
> 
>> I just wanted to ask whether you need any help (maybe testing) with this?
>> I'm ready to check this on my dataset if you were successful in
>> implementing a fix.
>> 
> 
> Thanks for offering to help.  I didn't get the chance to properly write
> and test the patches for this (we need a change in OVS IDL first and
> then one in OVN).  It would be great if you could try them out on your
> data sets so I'll CC you on the patches when posting them.  I hope to do
> that this week.
> 
> Regards,
> Dumitru
> 
>>> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
>>> 
>>> On 7/12/23 00:01, Ilya Maximets wrote:
 On 7/11/23 19:01, Dumitru Ceara wrote:
> On 7/11/23 18:33, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
>> The system on which I reproduced this issue is running 22.09.x
>> version. I’ve tried to upgrade ovn-controller to main branch + your
>> patch. Please, note that it has test error: [1].
>> After two minutes after upgrade it still consumed 3.3G.
>> 
> 
> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
> malloc_trim() at least once every X seconds.  I'll see what I can come
> up with.
> 
>> I tried to backport your patch to 22.09; it required backporting
>> this commit as well: [2], and it failed some tests: [3].
>> 
>> But I've got a general question: prior to the commit that I mentioned in
>> the initial mail, ovn-controller didn't even try to load such an amount of
>> data. Now it does, and IIUC your patch just releases memory
>> that was freed after ovn-controller fully loaded.
>> I wonder whether it should load that excess data at all? It seems
>> like it did.
>> 
> 
> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
> conditions on available tables.") it's kind of expected indeed:
> 
> Initially all tables are "unavailable" because we didn't get the schema
> so we don't set any condition for any table.
> 
> After ovn-controller connects to the SB for the first time it will
> determine that the SB tables are in the schema so it will explicitly add
> them to the monitor condition and restrict the SB data it is
> interested in.
> 
> Maybe we need to change the IDL/CS modules to delay sending the
> monitor_cond/monitor_cond_since request until instructed by the client
> (ovn-controller).  Ilya, do you have any thoughts on this matter?
 
 So, AFAICT, the issue is that we're running with
 'monitor_everything_by_default'
 option, the default condition is 'true' and the monitor request for
 the main
 database is sent out immediately after receiving the schema, so the
 application
 has no time to react.
 
 I think there are a few possible solutions for this:
 
 1. Introduce a new state in the CS state machine, e.g.
  CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
   callback.  This way the application will have a chance to set up
 conditions
   before they are sent.  Slightly not intuitive.
 
 2. A variation on what you suggested, i.e. enter the
 CS_S_SERVER_SCHEMA_RECEIVED
   state and wait for some sort of the signal from the application to
 proceed.
   Sounds a bit counter-intuitive for an IDL user.
 
 3. Introduce an application callback that can be called from the
   ovsdb_idl_compose_monitor_request() the same way as this function
 is getting
  called from the ovsdb_cs_send_monitor_request().  An application
 will be
   able to influence conditions before they are sent.
   Might be tricky due to new->req->ack state transition.
 
 4. Make the default condition configurable, e.g. by an additional
 argument
   'default_condition' = true/false for an ovsdb_idl_create().  This
 way the
   application will not get any data until conditions are actually set.
 
 5. Or it maybe just a separate config function that will set default
 conditions
   to 'false' and will need to be called before the first run().
 
 6. Change behavior of 'monitor_everything_by_default' argument.  Make it
   actually add all the tables to the monitor, but with the 'false'
 condition.
   Result should technically be the same.  Might be tricky to get
 right though
   with all the backward compatibility.
 
 Option 5 might be the better option of these.
 
 What do you think?
 
>>> 
>>> I think option 5 sounds the simplest to implement indeed.  It also
>>> doesn't induce any compatibility issues as you mentioned.
>>> 
>>> The only "issue" is we'd probably want this backported to stable OVN
>>> releases so it means we need to bump the submodule version to an
>>> unreleased version of 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-25 Thread Dumitru Ceara
On 7/24/23 21:10, Vladislav Odintsov wrote:
> Hi Dumitru,
> 

Hi Vladislav,

> I just wanted to ask whether you need any help (maybe testing) with this?
> I'm ready to check this on my dataset if you were successful in
> implementing a fix.
> 

Thanks for offering to help.  I didn't get the chance to properly write
and test the patches for this (we need a change in OVS IDL first and
then one in OVN).  It would be great if you could try them out on your
data sets so I'll CC you on the patches when posting them.  I hope to do
that this week.

Regards,
Dumitru

>> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
>>
>> On 7/12/23 00:01, Ilya Maximets wrote:
>>> On 7/11/23 19:01, Dumitru Ceara wrote:
 On 7/11/23 18:33, Vladislav Odintsov wrote:
> Hi Dumitru,
>
> The system on which I reproduced this issue is running the 22.09.x
> version. I’ve tried to upgrade ovn-controller to the main branch + your
> patch. Please note that it has a test error: [1].
> Two minutes after the upgrade it still consumed 3.3G.
>

 Ack, I need to re-think the patch then.  Maybe a hard deadline to run
 malloc_trim() at least once every X seconds.  I'll see what I can come
 up with.

> I tried to backport your patch to 22.09; it required backporting
> this commit as well: [2], and it failed some tests: [3].
>
> But I’ve got a general question: prior to the commit that I mentioned
> in the initial mail, ovn-controller didn’t even try to load such an
> amount of data. And now it does and, IIUC, your patch just releases
> (back to the OS) memory that was freed after ovn-controller fully
> loaded.
> I wonder whether it should load that excess data at all? Seems like
> it did.
>

 Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
 conditions on available tables.") it's kind of expected indeed:

 Initially all tables are "unavailable" because we didn't get the schema
 so we don't set any condition for any table.

 After ovn-controller connects to the SB for the first time it will
 determine that the SB tables are in the schema so it will explicitly add
 them to the monitor condition and restrict the SB data it is
 interested in.

 Maybe we need to change the IDL/CS modules to wait with the
 monitor_cond/monitor_cond_since until instructed by the client
 (ovn-controller).  Ilya do you have any thoughts on this matter?
>>>
>>> So, AFAICT, the issue is that we're running with the
>>> 'monitor_everything_by_default' option, the default condition is
>>> 'true' and the monitor request for the main database is sent out
>>> immediately after receiving the schema, so the application has no
>>> time to react.
>>>
>>> I think there are a few possible solutions for this:
>>>
>>> 1. Introduce a new state in the CS state machine, e.g.
>>>    CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the
>>>    run() callback.  This way the application will have a chance to
>>>    set up conditions before they are sent.  Slightly unintuitive.
>>>
>>> 2. A variation on what you suggested, i.e. enter the
>>>    CS_S_SERVER_SCHEMA_RECEIVED state and wait for some sort of
>>>    signal from the application to proceed.  Sounds a bit
>>>    counter-intuitive for an IDL user.
>>>
>>> 3. Introduce an application callback that can be called from the
>>>    ovsdb_idl_compose_monitor_request() the same way as this function
>>>    is getting called from the ovsdb_cs_send_monitor_request().  An
>>>    application will be able to influence conditions before they are
>>>    sent.  Might be tricky due to the new->req->ack state transition.
>>>
>>> 4. Make the default condition configurable, e.g. by an additional
>>>    argument 'default_condition' = true/false for ovsdb_idl_create().
>>>    This way the application will not get any data until conditions
>>>    are actually set.
>>>
>>> 5. Or maybe just a separate config function that sets the default
>>>    conditions to 'false' and needs to be called before the first
>>>    run().
>>>
>>> 6. Change the behavior of the 'monitor_everything_by_default'
>>>    argument.  Make it actually add all the tables to the monitor,
>>>    but with a 'false' condition.  The result should technically be
>>>    the same.  Might be tricky to get right though with all the
>>>    backward compatibility.
>>>
>>> Option 5 might be the best option of these.
>>>
>>> What do you think?
>>>
>>
>> I think option 5 sounds the simplest to implement indeed.  It also
>> doesn't induce any compatibility issues as you mentioned.
>>
>> The only "issue" is we'd probably want this backported to stable OVN
>> releases so it means we need to bump the submodule version to an
>> unreleased version of OVS.  But that's an OVN problem and we discussed
>> similar instances of it before.
>>
>> I'll prepare a patch soon.
>>
>> Best regards,
>> Dumitru
> 
> 
> Regards,
> Vladislav Odintsov
> 

___
dev 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-24 Thread Vladislav Odintsov
Hi Dumitru,

I just wanted to ask whether you need any help (maybe, testing) with this?
I’m ready to check this on my dataset if you have managed to implement a fix.

> On 12 Jul 2023, at 12:15, Dumitru Ceara  wrote:
> 
> On 7/12/23 00:01, Ilya Maximets wrote:
>> On 7/11/23 19:01, Dumitru Ceara wrote:
>>> On 7/11/23 18:33, Vladislav Odintsov wrote:
 Hi Dumitru,
 
 The system on which I reproduced this issue is running the 22.09.x version.
 I’ve tried to upgrade ovn-controller to the main branch + your patch.
 Please note that it has a test error: [1].
 Two minutes after the upgrade it still consumed 3.3G.
 
>>> 
>>> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
>>> malloc_trim() at least once every X seconds.  I'll see what I can come
>>> up with.
>>> 
 I tried to backport your patch to 22.09; it required backporting this
 commit as well: [2], and it failed some tests: [3].

 But I’ve got a general question: prior to the commit that I mentioned in the
 initial mail, ovn-controller didn’t even try to load such an amount of data.
 And now it does and, IIUC, your patch just releases (back to the OS) memory
 that was freed after ovn-controller fully loaded.
 I wonder whether it should load that excess data at all? Seems like it did.
 
>>> 
>>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
>>> conditions on available tables.") it's kind of expected indeed:
>>> 
>>> Initially all tables are "unavailable" because we didn't get the schema
>>> so we don't set any condition for any table.
>>> 
>>> After ovn-controller connects to the SB for the first time it will
>>> determine that the SB tables are in the schema so it will explicitly add
>>> them to the monitor condition and restrict the SB data it is interested in.
>>> 
>>> Maybe we need to change the IDL/CS modules to wait with the
>>> monitor_cond/monitor_cond_since until instructed by the client
>>> (ovn-controller).  Ilya do you have any thoughts on this matter?
>> 
>> So, AFAICT, the issue is that we're running with the
>> 'monitor_everything_by_default' option, the default condition is 'true'
>> and the monitor request for the main database is sent out immediately
>> after receiving the schema, so the application has no time to react.
>> 
>> I think there are a few possible solutions for this:
>> 
>> 1. Introduce a new state in the CS state machine, e.g.
>>    CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
>>    callback.  This way the application will have a chance to set up
>>    conditions before they are sent.  Slightly unintuitive.
>> 
>> 2. A variation on what you suggested, i.e. enter the
>>    CS_S_SERVER_SCHEMA_RECEIVED state and wait for some sort of signal
>>    from the application to proceed.  Sounds a bit counter-intuitive for
>>    an IDL user.
>> 
>> 3. Introduce an application callback that can be called from the
>>    ovsdb_idl_compose_monitor_request() the same way as this function is
>>    getting called from the ovsdb_cs_send_monitor_request().  An
>>    application will be able to influence conditions before they are sent.
>>    Might be tricky due to the new->req->ack state transition.
>> 
>> 4. Make the default condition configurable, e.g. by an additional
>>    argument 'default_condition' = true/false for ovsdb_idl_create().
>>    This way the application will not get any data until conditions are
>>    actually set.
>> 
>> 5. Or maybe just a separate config function that sets the default
>>    conditions to 'false' and needs to be called before the first run().
>> 
>> 6. Change the behavior of the 'monitor_everything_by_default' argument.
>>    Make it actually add all the tables to the monitor, but with a 'false'
>>    condition.  The result should technically be the same.  Might be
>>    tricky to get right though with all the backward compatibility.
>> 
>> Option 5 might be the best option of these.
>> 
>> What do you think?
>> 
> 
> I think option 5 sounds the simplest to implement indeed.  It also
> doesn't induce any compatibility issues as you mentioned.
> 
> The only "issue" is we'd probably want this backported to stable OVN
> releases so it means we need to bump the submodule version to an
> unreleased version of OVS.  But that's an OVN problem and we discussed
> similar instances of it before.
> 
> I'll prepare a patch soon.
> 
> Best regards,
> Dumitru


Regards,
Vladislav Odintsov

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-12 Thread Dumitru Ceara
On 7/12/23 00:01, Ilya Maximets wrote:
> On 7/11/23 19:01, Dumitru Ceara wrote:
>> On 7/11/23 18:33, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>> The system on which I reproduced this issue is running the 22.09.x
>>> version. I’ve tried to upgrade ovn-controller to the main branch + your
>>> patch. Please note that it has a test error: [1].
>>> Two minutes after the upgrade it still consumed 3.3G.
>>>
>>
>> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
>> malloc_trim() at least once every X seconds.  I'll see what I can come
>> up with.
>>
>>> I tried to backport your patch to 22.09; it required backporting this
>>> commit as well: [2], and it failed some tests: [3].
>>>
>>> But I’ve got a general question: prior to the commit that I mentioned in
>>> the initial mail, ovn-controller didn’t even try to load such an amount
>>> of data. And now it does and, IIUC, your patch just releases (back to
>>> the OS) memory that was freed after ovn-controller fully loaded.
>>> I wonder whether it should load that excess data at all? Seems like it
>>> did.
>>>
>>
>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
>> conditions on available tables.") it's kind of expected indeed:
>>
>> Initially all tables are "unavailable" because we didn't get the schema
>> so we don't set any condition for any table.
>>
>> After ovn-controller connects to the SB for the first time it will
>> determine that the SB tables are in the schema so it will explicitly add
>> them to the monitor condition and restrict the SB data it is interested in.
>>
>> Maybe we need to change the IDL/CS modules to wait with the
>> monitor_cond/monitor_cond_since until instructed by the client
>> (ovn-controller).  Ilya do you have any thoughts on this matter?
> 
> So, AFAICT, the issue is that we're running with the
> 'monitor_everything_by_default' option, the default condition is 'true'
> and the monitor request for the main database is sent out immediately
> after receiving the schema, so the application has no time to react.
> 
> I think there are a few possible solutions for this:
> 
> 1. Introduce a new state in the CS state machine, e.g.
>    CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
>    callback.  This way the application will have a chance to set up
>    conditions before they are sent.  Slightly unintuitive.
> 
> 2. A variation on what you suggested, i.e. enter the
>    CS_S_SERVER_SCHEMA_RECEIVED state and wait for some sort of signal
>    from the application to proceed.  Sounds a bit counter-intuitive for
>    an IDL user.
> 
> 3. Introduce an application callback that can be called from the
>    ovsdb_idl_compose_monitor_request() the same way as this function is
>    getting called from the ovsdb_cs_send_monitor_request().  An
>    application will be able to influence conditions before they are sent.
>    Might be tricky due to the new->req->ack state transition.
> 
> 4. Make the default condition configurable, e.g. by an additional
>    argument 'default_condition' = true/false for ovsdb_idl_create().
>    This way the application will not get any data until conditions are
>    actually set.
> 
> 5. Or maybe just a separate config function that sets the default
>    conditions to 'false' and needs to be called before the first run().
> 
> 6. Change the behavior of the 'monitor_everything_by_default' argument.
>    Make it actually add all the tables to the monitor, but with a 'false'
>    condition.  The result should technically be the same.  Might be
>    tricky to get right though with all the backward compatibility.
> 
> Option 5 might be the best option of these.
> 
> What do you think?
> 

I think option 5 sounds the simplest to implement indeed.  It also
doesn't induce any compatibility issues as you mentioned.

The only "issue" is we'd probably want this backported to stable OVN
releases so it means we need to bump the submodule version to an
unreleased version of OVS.  But that's an OVN problem and we discussed
similar instances of it before.

I'll prepare a patch soon.

Best regards,
Dumitru



Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-11 Thread Ilya Maximets
On 7/11/23 19:01, Dumitru Ceara wrote:
> On 7/11/23 18:33, Vladislav Odintsov wrote:
>> Hi Dumitru,
>>
>> The system on which I reproduced this issue is running the 22.09.x
>> version. I’ve tried to upgrade ovn-controller to the main branch + your
>> patch. Please note that it has a test error: [1].
>> Two minutes after the upgrade it still consumed 3.3G.
>>
> 
> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
> malloc_trim() at least once every X seconds.  I'll see what I can come
> up with.
> 
>> I tried to backport your patch to 22.09; it required backporting this
>> commit as well: [2], and it failed some tests: [3].
>>
>> But I’ve got a general question: prior to the commit that I mentioned in
>> the initial mail, ovn-controller didn’t even try to load such an amount
>> of data. And now it does and, IIUC, your patch just releases (back to
>> the OS) memory that was freed after ovn-controller fully loaded.
>> I wonder whether it should load that excess data at all? Seems like it
>> did.
>>
> 
> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
> conditions on available tables.") it's kind of expected indeed:
> 
> Initially all tables are "unavailable" because we didn't get the schema
> so we don't set any condition for any table.
> 
> After ovn-controller connects to the SB for the first time it will
> determine that the SB tables are in the schema so it will explicitly add
> them to the monitor condition and restrict the SB data it is interested in.
> 
> Maybe we need to change the IDL/CS modules to wait with the
> monitor_cond/monitor_cond_since until instructed by the client
> (ovn-controller).  Ilya do you have any thoughts on this matter?

So, AFAICT, the issue is that we're running with the
'monitor_everything_by_default' option, the default condition is 'true' and
the monitor request for the main database is sent out immediately after
receiving the schema, so the application has no time to react.

I think there are a few possible solutions for this:

1. Introduce a new state in the CS state machine, e.g.
   CS_S_SERVER_SCHEMA_RECEIVED, and move out from this state in the run()
   callback.  This way the application will have a chance to set up
   conditions before they are sent.  Slightly unintuitive.

2. A variation on what you suggested, i.e. enter the
   CS_S_SERVER_SCHEMA_RECEIVED state and wait for some sort of signal from
   the application to proceed.  Sounds a bit counter-intuitive for an IDL
   user.

3. Introduce an application callback that can be called from the
   ovsdb_idl_compose_monitor_request() the same way as this function is
   getting called from the ovsdb_cs_send_monitor_request().  An application
   will be able to influence conditions before they are sent.
   Might be tricky due to the new->req->ack state transition.

4. Make the default condition configurable, e.g. by an additional argument
   'default_condition' = true/false for ovsdb_idl_create().  This way the
   application will not get any data until conditions are actually set.

5. Or maybe just a separate config function that sets the default conditions
   to 'false' and needs to be called before the first run().

6. Change the behavior of the 'monitor_everything_by_default' argument.  Make
   it actually add all the tables to the monitor, but with a 'false'
   condition.  The result should technically be the same.  Might be tricky to
   get right though with all the backward compatibility.

Option 5 might be the best option of these.

What do you think?
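[Editor's note] A tiny, self-contained sketch of how option 5 could look from the application's point of view. The names (`fake_idl`, `fake_idl_set_default_cond_false`) are hypothetical stand-ins, not the real OVS IDL API; the point is only the ordering: the flag must be set before the first run(), so the very first monitor request already carries a 'false' condition for tables the application has not configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the IDL state relevant to option 5. */
struct fake_idl {
    bool cond_false_by_default; /* set by the proposed config function */
    bool app_condition_set;     /* has the app installed a condition?  */
};

/* The proposed config function: must be called before the first run(). */
static void fake_idl_set_default_cond_false(struct fake_idl *idl)
{
    idl->cond_false_by_default = true;
}

/* Condition a monitor request would carry for a table the application has
 * not configured yet: the legacy default replicates every row ("[true]");
 * option 5 replicates nothing ("[false]") until the app sets a real
 * condition. */
static const char *fake_compose_condition(const struct fake_idl *idl)
{
    if (idl->app_condition_set) {
        return "[app-condition]";
    }
    return idl->cond_false_by_default ? "[false]" : "[true]";
}
```

With this ordering, a client like ovn-controller would no longer download the entire Southbound database before it gets a chance to narrow its conditions.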

Best regards, Ilya Maximets.


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-11 Thread Dumitru Ceara
On 7/11/23 18:33, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
> The system on which I reproduced this issue is running the 22.09.x
> version. I’ve tried to upgrade ovn-controller to the main branch + your
> patch. Please note that it has a test error: [1].
> Two minutes after the upgrade it still consumed 3.3G.
> 

Ack, I need to re-think the patch then.  Maybe a hard deadline to run
malloc_trim() at least once every X seconds.  I'll see what I can come
up with.
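[Editor's note] A minimal sketch of that "hard deadline" idea, illustrative only: the interval constant and the usage-drop heuristic are assumptions, not the actual ovn-controller logic. When this predicate returns true, the caller would invoke glibc's malloc_trim(0) to return freed heap memory to the OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_TRIM_INTERVAL 30 /* seconds; assumed value */

/* Trim when memory usage has dropped significantly since the last trim,
 * or unconditionally once the hard deadline has passed, so freed memory
 * is handed back at least every MAX_TRIM_INTERVAL seconds. */
static bool
should_trim(time_t now, time_t last_trim,
            size_t usage_at_last_trim, size_t usage_now)
{
    if (now - last_trim >= MAX_TRIM_INTERVAL) {
        return true; /* hard deadline: trim no matter what */
    }
    return usage_now < usage_at_last_trim / 2; /* usage halved */
}
```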

> I tried to backport your patch to 22.09; it required backporting this
> commit as well: [2], and it failed some tests: [3].
> 
> But I’ve got a general question: prior to the commit that I mentioned in
> the initial mail, ovn-controller didn’t even try to load such an amount
> of data. And now it does and, IIUC, your patch just releases (back to
> the OS) memory that was freed after ovn-controller fully loaded.
> I wonder whether it should load that excess data at all? Seems like it
> did.
> 

Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
conditions on available tables.") it's kind of expected indeed:

Initially all tables are "unavailable" because we didn't get the schema
so we don't set any condition for any table.

After ovn-controller connects to the SB for the first time it will
determine that the SB tables are in the schema so it will explicitly add
them to the monitor condition and restrict the SB data it is interested in.

Maybe we need to change the IDL/CS modules to wait with the
monitor_cond/monitor_cond_since until instructed by the client
(ovn-controller).  Ilya do you have any thoughts on this matter?

Thanks,
Dumitru

> 1: https://github.com/odivlad/ovn/actions/runs/5519795488/jobs/10065618661
> 2: 
> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
> 3: https://github.com/odivlad/ovn/actions/runs/5519892057
> 
>> On 11 Jul 2023, at 14:39, Dumitru Ceara  wrote:
>>
>> On 7/10/23 22:20, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>> thanks for digging into this! I highly appreciate your help!
>>>
>>
>> No worries, my pleasure! :)
>>
>>> Please, see my answers inline.
>>>
 On 10 Jul 2023, at 15:28, Dumitru Ceara  wrote:

 On 7/10/23 12:57, Dumitru Ceara wrote:
> On 7/6/23 13:00, Vladislav Odintsov wrote:
>>
>>
>>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>>>
>>> Hi Dumitru,
>>>
>>> thanks for the quick response!
>>>
 On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:

 On 7/5/23 17:14, Vladislav Odintsov wrote:
> Hi,
>

 Hi Vladislav,

> we’ve noticed there is a huge ovn-controller memory consumption
> introduced with [0] compared to a version without its changes in the
> ovn-controller.c part (just an OVS submodule bump without the
> ovn-controller changes doesn’t trigger such behaviour).
>

 Thanks for reporting this!

> On an empty host connected to a working cluster, ovn-controller normally
> consumes ~175 MiB RSS, while the same host updated to a version with
> commit [0] consumes 3.3 GiB RSS. The start time and CPU consumption
> during process start of the affected version are higher than for the
> normal version.
>
> Before upgrade (normal functioning):
>
> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
> ofctrl_sb_flow_ref_usage-KB:18
> 174.2 MiB + 327.5 KiB = 174.5 MiB ovn-controller
>
> After upgrade to an affected version I’ve checked memory report while 
> ovn-controller was starting and at some time there was a bigger 
> amount of OVN_Southbound cells comparing to "after start" time:
>
> during start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
> after start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
>
> cells during start:
> 11388742
>
> cells after start:
> 343896
>

 Are you running with ovn-monitor-all=true on this host?
>>>
>>> No, it has default false.
>>>

 I guess it's unlikely but I'll try just in case: would it be possible 
 to
 share the SB database?

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-11 Thread Vladislav Odintsov
Hi Dumitru,

The system on which I reproduced this issue is running the 22.09.x version.
I’ve tried to upgrade ovn-controller to the main branch + your patch. Please
note that it has a test error: [1].
Two minutes after the upgrade it still consumed 3.3G.

I tried to backport your patch to 22.09; it required backporting this commit
as well: [2], and it failed some tests: [3].

But I’ve got a general question: prior to the commit that I mentioned in the
initial mail, ovn-controller didn’t even try to load such an amount of data.
And now it does and, IIUC, your patch just releases (back to the OS) memory
that was freed after ovn-controller fully loaded.
I wonder whether it should load that excess data at all? Seems like it did.

1: https://github.com/odivlad/ovn/actions/runs/5519795488/jobs/10065618661
2: 
https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
3: https://github.com/odivlad/ovn/actions/runs/5519892057

> On 11 Jul 2023, at 14:39, Dumitru Ceara  wrote:
> 
> On 7/10/23 22:20, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
>> thanks for digging into this! I highly appreciate your help!
>> 
> 
> No worries, my pleasure! :)
> 
>> Please, see my answers inline.
>> 
>>> On 10 Jul 2023, at 15:28, Dumitru Ceara  wrote:
>>> 
>>> On 7/10/23 12:57, Dumitru Ceara wrote:
 On 7/6/23 13:00, Vladislav Odintsov wrote:
> 
> 
>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>> 
>> Hi Dumitru,
>> 
>> thanks for the quick response!
>> 
>>> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
>>> 
>>> On 7/5/23 17:14, Vladislav Odintsov wrote:
 Hi,
 
>>> 
>>> Hi Vladislav,
>>> 
 we’ve noticed there is a huge ovn-controller memory consumption
 introduced with [0] compared to a version without its changes in the
 ovn-controller.c part (just an OVS submodule bump without the
 ovn-controller changes doesn’t trigger such behaviour).
 
>>> 
>>> Thanks for reporting this!
>>> 
 On an empty host connected to a working cluster, ovn-controller normally
 consumes ~175 MiB RSS, while the same host updated to a version with
 commit [0] consumes 3.3 GiB RSS. The start time and CPU consumption
 during process start of the affected version are higher than for the
 normal version.
 
 Before upgrade (normal functioning):
 
 #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
 ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
 ofctrl_sb_flow_ref_usage-KB:18
 174.2 MiB + 327.5 KiB = 174.5 MiB  ovn-controller
 
 After upgrade to an affected version I’ve checked memory report while 
 ovn-controller was starting and at some time there was a bigger amount 
 of OVN_Southbound cells comparing to "after start" time:
 
 during start:
 
 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller
 
 after start:
 
 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller
 
 
 cells during start:
 11388742
 
 cells after start:
 343896
 
>>> 
>>> Are you running with ovn-monitor-all=true on this host?
>> 
>> No, it has default false.
>> 
>>> 
>>> I guess it's unlikely but I'll try just in case: would it be possible to
>>> share the SB database?
>> 
>> Unfortunately, no. But I can say it’s about 450 M in size after 
>> compaction. And there are about 1M mac_bindings if it’s important :).
> 
 
 I tried in a sandbox, before and after the commit in question, with a
 script that adds 1M mac bindings on top of the sample topology built by
 tutorial/ovn-setup.sh.
 
 I see ovn-controller memory usage going to >3GB before the commit you
 blamed and to >1.9GB after the same commit.  So it looks different than
 what you reported but still worrying that we use so much memory for mac
 bindings.
 
 I'm assuming however that quite a few of the 1M mac bindings in your
 setup are stale so would it be possible to enable mac_binding aging?  It
 needs to be enabled per router with something like:
 
 $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
 
 The threshold is in seconds and is a hard limit (for now) 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-11 Thread Dumitru Ceara
On 7/10/23 22:20, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
> thanks for digging into this! I highly appreciate your help!
> 

No worries, my pleasure! :)

> Please, see my answers inline.
> 
>> On 10 Jul 2023, at 15:28, Dumitru Ceara  wrote:
>>
>> On 7/10/23 12:57, Dumitru Ceara wrote:
>>> On 7/6/23 13:00, Vladislav Odintsov wrote:


> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>
> Hi Dumitru,
>
> thanks for the quick response!
>
>> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
>>
>> On 7/5/23 17:14, Vladislav Odintsov wrote:
>>> Hi,
>>>
>>
>> Hi Vladislav,
>>
>>> we’ve noticed there is a huge ovn-controller memory consumption
>>> introduced with [0] compared to a version without its changes in the
>>> ovn-controller.c part (just an OVS submodule bump without the
>>> ovn-controller changes doesn’t trigger such behaviour).
>>>
>>
>> Thanks for reporting this!
>>
>>> On an empty host connected to a working cluster, ovn-controller normally
>>> consumes ~175 MiB RSS, while the same host updated to a version with
>>> commit [0] consumes 3.3 GiB RSS. The start time and CPU consumption
>>> during process start of the affected version are higher than for the
>>> normal version.
>>>
>>> Before upgrade (normal functioning):
>>>
>>> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
>>> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
>>> ofctrl_sb_flow_ref_usage-KB:18
>>> 174.2 MiB + 327.5 KiB = 174.5 MiB   ovn-controller
>>>
>>> After upgrade to an affected version I’ve checked memory report while 
>>> ovn-controller was starting and at some time there was a bigger amount 
>>> of OVN_Southbound cells comparing to "after start" time:
>>>
>>> during start:
>>>
>>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
>>> idl-outstanding-txns-Open_vSwitch:1
>>> 3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>>>
>>> after start:
>>>
>>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
>>> idl-outstanding-txns-Open_vSwitch:1
>>> 3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>>>
>>>
>>> cells during start:
>>> 11388742
>>>
>>> cells after start:
>>> 343896
>>>
>>
>> Are you running with ovn-monitor-all=true on this host?
>
> No, it has default false.
>
>>
>> I guess it's unlikely but I'll try just in case: would it be possible to
>> share the SB database?
>
> Unfortunately, no. But I can say it’s about 450 M in size after 
> compaction. And there are about 1M mac_bindings if it’s important :).

>>>
>>> I tried in a sandbox, before and after the commit in question, with a
>>> script that adds 1M mac bindings on top of the sample topology built by
>>> tutorial/ovn-setup.sh.
>>>
>>> I see ovn-controller memory usage going to >3GB before the commit you
>>> blamed and to >1.9GB after the same commit.  So it looks different than
>>> what you reported but still worrying that we use so much memory for mac
>>> bindings.
>>>
>>> I'm assuming however that quite a few of the 1M mac bindings in your
>>> setup are stale so would it be possible to enable mac_binding aging?  It
>>> needs to be enabled per router with something like:
>>>
>>> $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
>>>
>>> The threshold is in seconds and is a hard limit (for now) after which a
>>> mac binding entry is removed.  There's work in progress to only clear
>>> arp cache entries that are really stale [1].
> 
> Yeah, I know about mac_binding ageing, but we currently use our own mechanism
> to delete excess records and are waiting for the patchset you’ve mentioned.
> 

Ack.

>>>
 But if you are interested in any specific information, let me know it, I 
 can check.

>>>
>>> How many "local" datapaths do we have on the hosts that exhibit high
>>> memory usage?
>>>
>>> The quickest way I can think of to get this info is to run this on the
>>> hypervisor:
>>> $ ovn-appctl ct-zone-list | grep snat -c
>>>
>>> Additional question: how many mac_bindings are "local", i.e., associated
>>> to local datapaths?
> 
> For both questions: 0.
> 

OK, that really suggests that malloc_trim() didn't get called.  There's
really no reason (especially with ovn-monitor-all=false) to have such a
high memory usage in ovn-controller if there's no local datapath.

>>>
>
>>
>>> I guess it could be connected with this problem. Can anyone look at 
>>> this and comment please?
>>>
>>

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Vladislav Odintsov
Hi Dumitru,

thanks for digging into this! I highly appreciate your help!

Please, see my answers inline.

> On 10 Jul 2023, at 15:28, Dumitru Ceara  wrote:
> 
> On 7/10/23 12:57, Dumitru Ceara wrote:
>> On 7/6/23 13:00, Vladislav Odintsov wrote:
>>> 
>>> 
 On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
 
 Hi Dumitru,
 
 thanks for the quick response!
 
> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
> 
> On 7/5/23 17:14, Vladislav Odintsov wrote:
>> Hi,
>> 
> 
> Hi Vladislav,
> 
>> we’ve noticed there is a huge ovn-controller memory consumption
>> introduced with [0] compared to a version without its changes in the
>> ovn-controller.c part (just an OVS submodule bump without the
>> ovn-controller changes doesn’t trigger such behaviour).
>> 
> 
> Thanks for reporting this!
> 
>> On an empty host connected to a working cluster, ovn-controller normally
>> consumes ~175 MiB RSS, while the same host updated to a version with
>> commit [0] consumes 3.3 GiB RSS. The start time and CPU consumption
>> during process start of the affected version are higher than for the
>> normal version.
>> 
>> Before upgrade (normal functioning):
>> 
>> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
>> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
>> ofctrl_sb_flow_ref_usage-KB:18
>> 174.2 MiB + 327.5 KiB = 174.5 MiBovn-controller
>> 
>> After upgrade to an affected version I’ve checked memory report while 
>> ovn-controller was starting and at some time there was a bigger amount 
>> of OVN_Southbound cells comparing to "after start" time:
>> 
>> during start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>> 3.3 GiB + 327.0 KiB =   3.3 GiB  ovn-controller
>> 
>> after start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>> 3.3 GiB + 327.0 KiB =   3.3 GiB  ovn-controller
>> 
>> 
>> cells during start:
>> 11388742
>> 
>> cells after start:
>> 343896
>> 
> 
> Are you running with ovn-monitor-all=true on this host?
 
 No, it has default false.
 
> 
> I guess it's unlikely but I'll try just in case: would it be possible to
> share the SB database?
 
 Unfortunately, no. But I can say it’s about 450 M in size after 
 compaction. And there are about 1M mac_bindings if it’s important :).
>>> 
>> 
>> I tried in a sandbox, before and after the commit in question, with a
>> script that adds 1M mac bindings on top of the sample topology built by
>> tutorial/ovn-setup.sh.
>> 
>> I see ovn-controller memory usage going to >3GB before the commit you
>> blamed and to >1.9GB after the same commit.  So it looks different than
>> what you reported but still worrying that we use so much memory for mac
>> bindings.
>> 
>> I'm assuming however that quite a few of the 1M mac bindings in your
>> setup are stale so would it be possible to enable mac_binding aging?  It
>> needs to be enabled per router with something like:
>> 
>> $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
>> 
>> The threshold is in seconds and is a hard limit (for now) after which a
>> mac binding entry is removed.  There's work in progress to only clear
>> arp cache entries that are really stale [1].

Yeah, I know about mac_binding ageing, but we currently use our own mechanism 
to delete excess records and are waiting for the patchset you’ve mentioned.

>> 
>>> But if you are interested in any specific information, let me know it, I 
>>> can check.
>>> 
>> 
>> How many "local" datapaths do we have on the hosts that exhibit high
>> memory usage?
>> 
>> The quickest way I can think of to get this info is to run this on the
>> hypervisor:
>> $ ovn-appctl ct-zone-list | grep snat -c
>> 
>> Additional question: how many mac_bindings are "local", i.e., associated
>> to local datapaths?

For both questions: 0.

>> 
 
> 
>> I guess it could be connected with this problem. Can anyone look at this 
>> and comment please?
>> 
> 
> Does the memory usage persist after SB is upgraded too?  I see the
> number of SB idl-cells goes down eventually which means that eventually
> the periodic malloc_trim() call would free up memory.  We trim on idle
> since
> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
> 
 
 In this upgrade the DB schemas were not upgraded, so they’re up to date.

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Dumitru Ceara
On 7/10/23 12:57, Dumitru Ceara wrote:
> On 7/6/23 13:00, Vladislav Odintsov wrote:
>>
>>
>>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>>>
>>> Hi Dumitru,
>>>
>>> thanks for the quick response!
>>>
 On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:

 On 7/5/23 17:14, Vladislav Odintsov wrote:
> Hi,
>

 Hi Vladislav,

> we’ve noticed there is a huge ovn-controller memory consumption 
> introduced with [0] compared to the version without its changes in 
> ovn-controller.c part (just OVS submodule bump without ovn-controller 
> changes doesn’t trigger such behaviour).
>

 Thanks for reporting this!

> On an empty host connected to working cluster ovn-controller normally 
> consumes ~175 MiB RSS, and the same host updated to a version with commit 
> [0] consumes 3.3 GiB RSS. The start time and CPU consumption during 
> process start of an affected version is higher than for the normal 
> version.
>
> Before upgrade (normal functioning):
>
> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
> ofctrl_sb_flow_ref_usage-KB:18
> 174.2 MiB + 327.5 KiB = 174.5 MiB ovn-controller
>
> After upgrade to an affected version I’ve checked memory report while 
> ovn-controller was starting and at some time there was a bigger amount of 
> OVN_Southbound cells compared to the "after start" time:
>
> during start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
> after start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
>
> cells during start:
> 11388742
>
> cells after start:
> 343896
>

 Are you running with ovn-monitor-all=true on this host?
>>>
>>> No, it has default false.
>>>

 I guess it's unlikely but I'll try just in case: would it be possible to
 share the SB database?
>>>
>>> Unfortunately, no. But I can say it’s about 450 M in size after compaction. 
>>> And there are about 1M mac_bindings if it’s important :).
>>
> 
> I tried in a sandbox, before and after the commit in question, with a
> script that adds 1M mac bindings on top of the sample topology built by
> tutorial/ovn-setup.sh.
> 
> I see ovn-controller memory usage going to >3GB before the commit you
> blamed and to >1.9GB after the same commit.  So it looks different than
> what you reported but still worrying that we use so much memory for mac
> bindings.
> 
> I'm assuming however that quite a few of the 1M mac bindings in your
> setup are stale so would it be possible to enable mac_binding aging?  It
> needs to be enabled per router with something like:
> 
>  $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
> 
> The threshold is in seconds and is a hard limit (for now) after which a
> mac binding entry is removed.  There's work in progress to only clear
> arp cache entries that are really stale [1].
> 
>> But if you are interested in any specific information, let me know it, I can 
>> check.
>>
> 
> How many "local" datapaths do we have on the hosts that exhibit high
> memory usage?
> 
> The quickest way I can think of to get this info is to run this on the
> hypervisor:
>  $ ovn-appctl ct-zone-list | grep snat -c
> 
> Additional question: how many mac_bindings are "local", i.e., associated
> to local datapaths?
> 
>>>

> I guess it could be connected with this problem. Can anyone look at this 
> and comment please?
>

 Does the memory usage persist after SB is upgraded too?  I see the
 number of SB idl-cells goes down eventually which means that eventually
 the periodic malloc_trim() call would free up memory.  We trim on idle
 since
 https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded

>>>
>>> In this upgrade the DB schemas were not upgraded, so they’re up to date.
>>>
 Are you using a different allocator?  E.g., jemalloc.
>>>
>>> No, this issue reproduces with gcc.
>>>
> 
> Can we run a test and see if malloc_trim() was actually called?  I'd
> first disable lflow-cache log rate limiting:
> 
>  $ ovn-appctl vlog/disable-rate-limit lflow_cache
> 
> Then check if malloc_trim() was called after the lflow-cache detected
> inactivity.  You'd see logs like:
> 
> "lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago):
> trimming 

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Dumitru Ceara
On 7/6/23 13:00, Vladislav Odintsov wrote:
> 
> 
>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>>
>> Hi Dumitru,
>>
>> thanks for the quick response!
>>
>>> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
>>>
>>> On 7/5/23 17:14, Vladislav Odintsov wrote:
 Hi,

>>>
>>> Hi Vladislav,
>>>
 we’ve noticed there is a huge ovn-controller memory consumption introduced 
 with [0] compared to the version without its changes in ovn-controller.c part 
 (just OVS submodule bump without ovn-controller changes doesn’t trigger 
 such behaviour).

>>>
>>> Thanks for reporting this!
>>>
 On an empty host connected to working cluster ovn-controller normally 
 consumes ~175 MiB RSS, and the same host updated to a version with commit 
 [0] consumes 3.3 GiB RSS. The start time and CPU consumption during 
 process start of an affected version is higher than for the normal version.

 Before upgrade (normal functioning):

 #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
 ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
 ofctrl_sb_flow_ref_usage-KB:18
 174.2 MiB + 327.5 KiB = 174.5 MiB  ovn-controller

 After upgrade to an affected version I’ve checked memory report while 
 ovn-controller was starting and at some time there was a bigger amount of 
 OVN_Southbound cells compared to the "after start" time:

 during start:

 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller

 after start:

 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller


 cells during start:
 11388742

 cells after start:
 343896

>>>
>>> Are you running with ovn-monitor-all=true on this host?
>>
>> No, it has default false.
>>
>>>
>>> I guess it's unlikely but I'll try just in case: would it be possible to
>>> share the SB database?
>>
>> Unfortunately, no. But I can say it’s about 450 M in size after compaction. 
>> And there are about 1M mac_bindings if it’s important :).
> 

I tried in a sandbox, before and after the commit in question, with a
script that adds 1M mac bindings on top of the sample topology built by
tutorial/ovn-setup.sh.

I see ovn-controller memory usage going to >3GB before the commit you
blamed and to >1.9GB after the same commit.  So it looks different than
what you reported but still worrying that we use so much memory for mac
bindings.

I'm assuming however that quite a few of the 1M mac bindings in your
setup are stale so would it be possible to enable mac_binding aging?  It
needs to be enabled per router with something like:

 $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60

The threshold is in seconds and is a hard limit (for now) after which a
mac binding entry is removed.  There's work in progress to only clear
arp cache entries that are really stale [1].
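
Conceptually, the hard threshold behaves like a simple age-based eviction.  A
simplified model (illustrative only, not the actual OVN code; the function name
and data layout here are hypothetical):

```python
def expire_mac_bindings(bindings, now, threshold):
    """Drop every MAC binding whose age has reached the threshold.

    bindings: dict mapping a MAC address to the timestamp (in seconds)
    when the binding was created/last refreshed.  This models the hard
    limit set via options:mac_binding_age_threshold: once an entry is
    older than the threshold it is removed, whether or not it is still
    in active use.
    """
    return {mac: ts for mac, ts in bindings.items() if now - ts < threshold}


bindings = {"00:00:00:00:00:aa": 0, "00:00:00:00:00:bb": 50}
# At t=60 with a 60 s threshold, the first entry has aged out.
remaining = expire_mac_bindings(bindings, now=60, threshold=60)
```

This also illustrates why the threshold is described as a hard limit: entries
still in use are aged out too, which is what the work-in-progress series [1]
aims to improve.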

> But if you are interested in any specific information, let me know it, I can 
> check.
> 

How many "local" datapaths do we have on the hosts that exhibit high
memory usage?

The quickest way I can think of to get this info is to run this on the
hypervisor:
 $ ovn-appctl ct-zone-list | grep snat -c

Additional question: how many mac_bindings are "local", i.e., associated
to local datapaths?

>>
>>>
 I guess it could be connected with this problem. Can anyone look at this 
 and comment please?

>>>
>>> Does the memory usage persist after SB is upgraded too?  I see the
>>> number of SB idl-cells goes down eventually which means that eventually
>>> the periodic malloc_trim() call would free up memory.  We trim on idle
>>> since
>>> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
>>>
>>
>> In this upgrade the DB schemas were not upgraded, so they’re up to date.
>>
>>> Are you using a different allocator?  E.g., jemalloc.
>>
>> No, this issue reproduces with gcc.
>>

Can we run a test and see if malloc_trim() was actually called?  I'd
first disable lflow-cache log rate limiting:

 $ ovn-appctl vlog/disable-rate-limit lflow_cache

Then check if malloc_trim() was called after the lflow-cache detected
inactivity.  You'd see logs like:

"lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago):
trimming cache"

The fact that the number of SB idl-cells goes down and memory doesn't
seems to indicate we might have a bug in the auto cache trimming mechanism.
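
For reference, the "trim on idle" behaviour can be modelled roughly like this
(a simplified sketch, not the actual ovn-controller code; the class and method
names are hypothetical, and the real implementation calls malloc_trim()):

```python
class IdleTrimmer:
    """Toy model of trimming memory once a cache has been idle long enough.

    note_activity() records cache activity; run() is called periodically
    and returns True exactly once per idle period, at the point where the
    real code would log "Detected cache inactivity ..." and trim.
    """

    def __init__(self, inactivity_ms=30000):
        self.inactivity_ms = inactivity_ms
        self.last_active_ms = 0
        self.trimmed = True  # nothing to trim until the first activity

    def note_activity(self, now_ms):
        self.last_active_ms = now_ms
        self.trimmed = False

    def run(self, now_ms):
        """Return True if a trim should happen on this iteration."""
        idle = now_ms - self.last_active_ms
        if not self.trimmed and idle >= self.inactivity_ms:
            self.trimmed = True  # trim at most once per idle period
            return True
        return False
```

The symptom described above (cell count drops but RSS stays high) would match
the `run()` path never firing, or firing without the trim actually releasing
memory back to the OS.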

In any case, a malloc_trim() can be manually triggered by flushing the

Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-06 Thread Vladislav Odintsov


> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
> 
> Hi Dumitru,
> 
> thanks for the quick response!
> 
>> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
>> 
>> On 7/5/23 17:14, Vladislav Odintsov wrote:
>>> Hi,
>>> 
>> 
>> Hi Vladislav,
>> 
>>> we’ve noticed there is a huge ovn-controller memory consumption introduced 
>>> with [0] compared to the version without its changes in ovn-controller.c part 
>>> (just OVS submodule bump without ovn-controller changes doesn’t trigger 
>>> such behaviour).
>>> 
>> 
>> Thanks for reporting this!
>> 
>>> On an empty host connected to working cluster ovn-controller normally 
>>> consumes ~175 MiB RSS, and the same host updated to a version with commit 
>>> [0] consumes 3.3 GiB RSS. The start time and CPU consumption during process 
>>> start of an affected version is higher than for the normal version.
>>> 
>>> Before upgrade (normal functioning):
>>> 
>>> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
>>> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
>>> ofctrl_sb_flow_ref_usage-KB:18
>>> 174.2 MiB + 327.5 KiB = 174.5 MiB   ovn-controller
>>> 
>>> After upgrade to an affected version I’ve checked memory report while 
>>> ovn-controller was starting and at some time there was a bigger amount of 
>>> OVN_Southbound cells compared to the "after start" time:
>>> 
>>> during start:
>>> 
>>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
>>> idl-outstanding-txns-Open_vSwitch:1
>>> 3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>>> 
>>> after start:
>>> 
>>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>>> ovn-controller)| grep ovn
>>> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
>>> idl-outstanding-txns-Open_vSwitch:1
>>> 3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>>> 
>>> 
>>> cells during start:
>>> 11388742
>>> 
>>> cells after start:
>>> 343896
>>> 
>> 
>> Are you running with ovn-monitor-all=true on this host?
> 
> No, it has default false.
> 
>> 
>> I guess it's unlikely but I'll try just in case: would it be possible to
>> share the SB database?
> 
> Unfortunately, no. But I can say it’s about 450 M in size after compaction. 
> And there are about 1M mac_bindings if it’s important :).

But if you are interested in any specific information, let me know it, I can 
check.

> 
>> 
>>> I guess it could be connected with this problem. Can anyone look at this 
>>> and comment please?
>>> 
>> 
>> Does the memory usage persist after SB is upgraded too?  I see the
>> number of SB idl-cells goes down eventually which means that eventually
>> the periodic malloc_trim() call would free up memory.  We trim on idle
>> since
>> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
>> 
> 
> In this upgrade the DB schemas were not upgraded, so they’re up to date.
> 
>> Are you using a different allocator?  E.g., jemalloc.
> 
> No, this issue reproduces with gcc.
> 
>> 
>>> 
>>> 0: 
>>> https://github.com/ovn-org/ovn/commit/1b0dbde940706e5de6e60221be78a278361fa76d
>>> 
>>> 
>>> 
>>> Regards,
>>> Vladislav Odintsov
>>> 
>> 
>> Regards,
>> Dumitru
>> 
>> ___
>> dev mailing list
>> d...@openvswitch.org  
>> 
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
> Regards,
> Vladislav Odintsov
> 


Regards,
Vladislav Odintsov



Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-05 Thread Vladislav Odintsov
Hi Dumitru,

thanks for the quick response!

> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
> 
> On 7/5/23 17:14, Vladislav Odintsov wrote:
>> Hi,
>> 
> 
> Hi Vladislav,
> 
>> we’ve noticed there is a huge ovn-controller memory consumption introduced 
>> with [0] compared to the version without its changes in ovn-controller.c part 
>> (just OVS submodule bump without ovn-controller changes doesn’t trigger such 
>> behaviour).
>> 
> 
> Thanks for reporting this!
> 
>> On an empty host connected to working cluster ovn-controller normally 
>> consumes ~175 MiB RSS, and the same host updated to a version with commit 
>> [0] consumes 3.3 GiB RSS. The start time and CPU consumption during process 
>> start of an affected version is higher that for the normal version.
>> 
>> Before upgrade (normal functioning):
>> 
>> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
>> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
>> ofctrl_sb_flow_ref_usage-KB:18
>> 174.2 MiB + 327.5 KiB = 174.5 MiBovn-controller
>> 
>> After upgrade to an affected version I’ve checked memory report while 
>> ovn-controller was starting and at some time there was a bigger amount of 
>> OVN_Southbound cells compared to the "after start" time:
>> 
>> during start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>>  3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>> 
>> after start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>>  3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
>> 
>> 
>> cells during start:
>> 11388742
>> 
>> cells after start:
>> 343896
>> 
> 
> Are you running with ovn-monitor-all=true on this host?

No, it has default false.

> 
> I guess it's unlikely but I'll try just in case: would it be possible to
> share the SB database?

Unfortunately, no. But I can say it’s about 450 M in size after compaction. And 
there are about 1M mac_bindings if it’s important :).

> 
>> I guess it could be connected with this problem. Can anyone look at this and 
>> comment please?
>> 
> 
> Does the memory usage persist after SB is upgraded too?  I see the
> number of SB idl-cells goes down eventually which means that eventually
> the periodic malloc_trim() call would free up memory.  We trim on idle
> since
> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
> 

In this upgrade the DB schemas were not upgraded, so they’re up to date.

> Are you using a different allocator?  E.g., jemalloc.

No, this issue reproduces with gcc.

> 
>> 
>> 0: 
>> https://github.com/ovn-org/ovn/commit/1b0dbde940706e5de6e60221be78a278361fa76d
>> 
>> 
>> 
>> Regards,
>> Vladislav Odintsov
>> 
> 
> Regards,
> Dumitru
> 


Regards,
Vladislav Odintsov



Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-05 Thread Dumitru Ceara
On 7/5/23 17:14, Vladislav Odintsov wrote:
> Hi,
> 

Hi Vladislav,

> we’ve noticed there is a huge ovn-controller memory consumption introduced 
> with [0] compared to the version without its changes in ovn-controller.c part 
> (just OVS submodule bump without ovn-controller changes doesn’t trigger such 
> behaviour).
>

Thanks for reporting this!

> On an empty host connected to working cluster ovn-controller normally 
> consumes ~175 MiB RSS, and the same host updated to a version with commit [0] 
> consumes 3.3 GiB RSS. The start time and CPU consumption during process start 
> of an affected version is higher than for the normal version.
> 
> Before upgrade (normal functioning):
> 
> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof ovn-controller)| 
> grep ovn
> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
> ofctrl_sb_flow_ref_usage-KB:18
> 174.2 MiB + 327.5 KiB = 174.5 MiB ovn-controller
> 
> After upgrade to an affected version I’ve checked memory report while 
> ovn-controller was starting and at some time there was a bigger amount of 
> OVN_Southbound cells compared to the "after start" time:
> 
> during start:
> 
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
>   3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
> 
> after start:
> 
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
>   3.3 GiB + 327.0 KiB =   3.3 GiB ovn-controller
> 
> 
> cells during start:
> 11388742
> 
> cells after start:
> 343896
> 
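
(As an aside, the counter lines in the `memory/show` output above can be
compared with a small script.  A rough sketch with a hypothetical helper,
assuming only the `name:value` token format shown in the reports:)

```python
def parse_memory_show(line):
    """Parse one line of `ovn-appctl -t ovn-controller memory/show` output,
    e.g. "idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131",
    into a dict of counter name -> integer value."""
    counters = {}
    for token in line.split():
        name, _, value = token.rpartition(":")
        if name and value.isdigit():
            counters[name] = int(value)
    return counters


during = parse_memory_show(
    "idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131")
after = parse_memory_show(
    "idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131")
# The transient growth during startup: ~11M extra SB IDL cells.
growth = during["idl-cells-OVN_Southbound"] - after["idl-cells-OVN_Southbound"]
```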

Are you running with ovn-monitor-all=true on this host?

I guess it's unlikely but I'll try just in case: would it be possible to
share the SB database?

> I guess it could be connected with this problem. Can anyone look at this and 
> comment please?
> 

Does the memory usage persist after SB is upgraded too?  I see the
number of SB idl-cells goes down eventually which means that eventually
the periodic malloc_trim() call would free up memory.  We trim on idle
since
https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded

Are you using a different allocator?  E.g., jemalloc.

> 
> 0: 
> https://github.com/ovn-org/ovn/commit/1b0dbde940706e5de6e60221be78a278361fa76d
> 
> 
> 
> Regards,
> Vladislav Odintsov
> 

Regards,
Dumitru
