On 4/11/24 15:43, Chris Riches wrote:
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:

Hi Chris, Ilya,

>>> From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>>
>>> [...]
>>>
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>> Uff.  Sounds like ovn-controller went off the rails.
>>
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check.  If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart.  That's huge.
>>
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
> 
> I'm not sure exactly what the test was doing when this was observed, so
> I don't know whether that transaction volume is within the realm of
> possibility or if we're looking at a failure to perform compaction on
> time. It would be nice to have an enhanced safety-net for DB size, as we
> were only a few hundred MB away from hitting filesystem space issues as
> well.
> 
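
On the safety net Ilya suggested above ("more than 10 MB and increased
10x" regardless of the time), a minimal Python sketch of what such a
trigger could look like follows. This is not ovsdb-server's actual code:
the function name, its parameters and the 600-second interval are
assumptions made for illustration, and which size serves as the baseline
is exactly the open question raised further down. Only the 10 MB and 10x
thresholds come from the proposal itself.

```python
MB = 1024 * 1024


def should_compact(size_now, baseline_size, secs_since_baseline,
                   min_interval=600):
    """Hypothetical compaction decision; NOT the real ovsdb-server logic.

    size_now            -- current database file size in bytes
    baseline_size       -- size recorded at the baseline (assumed here to be
                           the size right after the last compaction)
    secs_since_baseline -- seconds elapsed since that baseline was taken
    min_interval        -- assumed stand-in for the "10-20 minutes" timer
    """
    # Regular rule as described in the thread: once the interval has
    # passed, compact if the database has at least doubled in size.
    if secs_since_baseline >= min_interval and size_now >= 2 * baseline_size:
        return True

    # Proposed safety net: ignore the timer entirely if the database is
    # larger than 10 MB and has grown at least 10x.
    return size_now > 10 * MB and size_now >= 10 * baseline_size
```

With a 29 KB baseline, a 2.4 GB file would trip the second condition
immediately, while a file that is still small in absolute terms would
keep waiting for the regular timer.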

To rule out any known issues, what OVN version is running on that setup?

>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if
>> the database doubles the size since the previous check.
> 
> I presume you mean if it doubled in size since the previous
> *compaction*? If we only compact when it doubles since the last *check*,
> then it would be easy for it to slightly-less-than-double every 10-20
> minutes and never trigger the compaction while still growing exponentially.
> 
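
To illustrate why the baseline matters, here is a small, simplified
simulation of the scenario described above. Everything in it is an
assumption for illustration (the function name, the 1.9x growth factor,
and the idea that compaction shrinks the file back to its starting
size); it does not model ovsdb-server's real behaviour.

```python
def simulate(growth_per_check, checks, baseline_is_last_check,
             start_size=1.0):
    """Return the relative DB size after `checks` periodic checks.

    The (hypothetical) rule is "compact if the size has at least doubled
    relative to the baseline". The baseline is either the size seen at the
    previous check or the size right after the previous compaction.
    """
    size = start_size
    baseline = start_size
    for _ in range(checks):
        size *= growth_per_check
        if size >= 2 * baseline:
            # Simplification: compaction shrinks the DB back to start_size.
            size = start_size
            baseline = size
        elif baseline_is_last_check:
            # Baseline moves forward at every check, compacted or not.
            baseline = size
    return size


# Growth of 1.9x per check never reaches 2x the previous check's size.
unbounded = simulate(1.9, 10, baseline_is_last_check=True)
bounded = simulate(1.9, 10, baseline_is_last_check=False)
```

With the baseline taken at the previous check, the file is never
compacted and ends up several hundred times its starting size after ten
checks. With the baseline taken at the previous compaction, the rule
fires on every second check and the size stays bounded.
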
> I'm happy to discuss compaction approaches (though my expertise is very
> much in host service management and not OVS itself), but do you think
> there's merit in having this extended timeout as a backstop too?
> 

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
