Hi Dumitru,

I just wanted to ask wether you need any help (maybe, testing) in this?
I’m ready to check this on my dataset if you were successful to implement a fix.

> On 12 Jul 2023, at 12:15, Dumitru Ceara <dce...@redhat.com> wrote:
> 
> On 7/12/23 00:01, Ilya Maximets wrote:
>> On 7/11/23 19:01, Dumitru Ceara wrote:
>>> On 7/11/23 18:33, Vladislav Odintsov wrote:
>>>> Hi Dumitru,
>>>> 
>>>> The system on which I reproduced this issue is running 22.09.x version. 
>>>> I’ve tried to upgrade ovn-controller to main branch + your patch. Please, 
>>>> note that it has test error: [1].
>>>> After two minutes after upgrade it still consumed 3.3G.
>>>> 
>>> 
>>> Ack, I need to re-think the patch then.  Maybe a hard deadline to run
>>> malloc_trim() at least once every X seconds.  I'll see what I can come
>>> up with.
>>> 
>>>> I tried to backport your patch to 22.09, it required to backport also this 
>>>> commit: [2] and it failed some tests: [3].
>>>> 
>>>> But I’ve got general question: prior to commit that I mentioned in initial 
>>>> mail, ovn-controller even didn’t try load such amount of data. And now it 
>>>> does and IIUC, your patch just releases memory that was freed after 
>>>> ovn-controller fully loaded.
>>>> I’m wonder wether it should load that excess data at all? Seems like it 
>>>> did.
>>>> 
>>> 
>>> Looking again at 1b0dbde94070 ("ovn-controller: Only set monitor
>>> conditions on available tables.") it's kind of expected indeed:
>>> 
>>> Initially all tables are "unavailable" because we didn't get the schema
>>> so we don't set any condition for any table.
>>> 
>>> After ovn-controller connects to the SB for the first time it will
>>> determine that the SB tables are in the schema so it will explicitly add
>>> them to the monitor condition and restrict the SB data it is interested in.
>>> 
>>> Maybe we need to change the IDL/CS modules to wait with the
>>> monitor_cond/monitor_cond_since until instructed by the client
>>> (ovn-controller).  Ilya do you have any thoughts on this matter?
>> 
>> So, AFAICT, the issue is that we're running with 
>> 'monitor_everything_by_default'
>> option, the default condition is 'true' and the monitor request for the main
>> database is sent out immediately after receiving the schema, so the 
>> application
>> has no time to react.
>> 
>> I think, there are few possible solutions for this:
>> 
>> 1. Introduce a new state in the CS state machine, e.g.
>>   CS_S_SERVER_SCHEMA_RCEIVED, and move out from this state in the run()
>>   callback.  This way the application will have a chance to set up conditions
>>   before they are sent.  Slightly not intuitive.
>> 
>> 2. A variation on what you suggested, i.e. enter the 
>> CS_S_SERVER_SCHEMA_RCEIVED
>>   state and wait for some sort of the signal from the application to proceed.
>>   Sounds a bit counter-intuitive for an IDL user.
>> 
>> 3. Introduce an application callback that can be called from the
>>   ovsdb_idl_compose_monitor_request() the same way as this function is 
>> getting
>>   called form the ovsdb_cs_send_monitor_request().  An application will be
>>   able to influence conditions before they are sent.
>>   Might be tricky due to new->req->ack state transition.
>> 
>> 4. Make the default condition configurable, e.g. by an additional argument
>>   'default_condition' = true/false for an ovsdb_idl_create().  This way the
>>   application will not get any data until conditions are actually set.
>> 
>> 5. Or it maybe just a separate config function that will set default 
>> conditions
>>   to 'false' and will need to be called before the first run().
>> 
>> 6. Change behavior of 'monitor_everything_by_default' argument.  Make it
>>   actually add all the tables to the monitor, but with the 'false' condition.
>>   Result should technically be the same.  Might be tricky to get right though
>>   with all the backward compatibility.
>> 
>> Option 5 might be the better option of these.
>> 
>> What do you think?
>> 
> 
> I think option 5 sounds the simplest to implement indeed.  It also
> doesn't induce any compatibility issues as you mentioned.
> 
> The only "issue" is we'd probably want this backported to stable OVN
> releases so it means we need to bump the submodule version to an
> unreleased version of OVS.  But that's an OVN problem and we discussed
> similar instances of it before.
> 
> I'll prepare a patch soon.
> 
> Best regards,
> Dumitru


Regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to