On 4/11/24 15:43, Chris Riches wrote:
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:
>>> From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>>
>>> [...]
>>>
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>> Uff.  Sounds like ovn-controller went off the rails.
>>
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check.  If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart.  That's huge.
>>
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
> 
> I'm not sure exactly what the test was doing when this was observed, so 
> I don't know whether that transaction volume is within the realm of 
> possibility or if we're looking at a failure to perform compaction on 
> time. It would be nice to have an enhanced safety-net for DB size, as we 
> were only a few hundred MB away from hitting filesystem space issues as 
> well.

The compaction check is on the path in the main event loop, so it
should not be possible to avoid it, especially for a standalone
database.  The database will stop executing transactions until
compaction is done.
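
A minimal sketch of why a check placed on the main loop cannot be
starved by a busy transaction stream.  It assumes a simplified
single-threaded server.  `run_loop`, `should_compact`, `compact` and the
`db` dict are hypothetical names for illustration only, not
ovsdb-server's actual (C) code.

```python
def run_loop(db, batches, should_compact, compact):
    """Hypothetical single-threaded server loop.

    The compaction check runs on every iteration, before new transactions
    are taken, so heavy traffic cannot skip it.  While compact() runs, no
    transactions are processed.
    """
    log = []
    for batch in batches:
        if should_compact(db):
            compact(db)  # blocks the loop until compaction finishes
            log.append("compact")
        for txn_size in batch:
            db["size"] += txn_size
            log.append("txn")
    return log
```

For example, with a trigger of "size >= 200" and a compaction that
shrinks the database to 10 bytes, two batches of transactions produce
the order txn, txn, compact, txn: compaction always happens between
batches, never interleaved with them.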

The transaction rate is very high, but I guess it might be possible
with very small transactions like the ones we have here.
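
The rate estimate is simple arithmetic: growth divided by (transaction
size times window length).  The sketch below assumes about 2.4e9 bytes
of growth within one 10 to 20 minute compaction window and, since the
example records were elided above, transaction sizes of 200 to 400
bytes; those sizes are assumptions, not measured values.
`implied_txn_rate` is a hypothetical helper.

```python
def implied_txn_rate(growth_bytes, window_seconds, txn_bytes):
    """Transactions per second needed to add growth_bytes in window_seconds."""
    return growth_bytes / (txn_bytes * window_seconds)


GROWTH = 2.4e9  # ~2.4 GB, decimal units assumed; the 29 KB of live data is negligible

for minutes in (10, 20):
    for txn_bytes in (200, 400):  # assumed per-transaction sizes
        rate = implied_txn_rate(GROWTH, minutes * 60, txn_bytes)
        print(f"{minutes} min, {txn_bytes} B/txn: {rate:,.0f} txn/s")
```

Under these assumptions the result ranges from 5,000 to 20,000
transactions per second, i.e. on the order of 10K/s.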

I need to experiment with it and maybe I'll post some patches to
force compaction earlier under extreme conditions like these.
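
One possible shape for such a trigger, combining the usual rule with
the "more than 10 MB and increased 10x, regardless of the time" idea
from earlier in the thread.  This is a hypothetical sketch, not a patch:
the constant names, the 10-minute lower bound, ">= 2" for "doubled" and
binary megabytes are all assumptions.

```python
MIN_INTERVAL_S = 10 * 60          # assumed lower bound of the 10-20 minute window
FORCE_MIN_BYTES = 10 * 1024 ** 2  # "more than 10 MB" (binary MB assumed)
FORCE_FACTOR = 10                 # "increased 10x"


def should_compact(size_now, size_after_last_compaction, seconds_since_last):
    """Hypothetical compaction trigger.

    Normal rule: enough time has passed and the database has at least
    doubled since the last compaction.
    Emergency rule: the database is above FORCE_MIN_BYTES and at least
    FORCE_FACTOR times its post-compaction size, regardless of time.
    """
    grown = size_now / max(size_after_last_compaction, 1)
    if seconds_since_last >= MIN_INTERVAL_S and grown >= 2:
        return True
    return size_now > FORCE_MIN_BYTES and grown >= FORCE_FACTOR
```

The size floor keeps the emergency rule from firing on tiny databases,
where a 10x jump is cheap and harmless; the time-based rule still
handles ordinary growth.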

> 
>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if 
>> the database doubles the size since the previous check.
> 
> I presume you mean if it doubled in size since the previous 
> *compaction*? If we only compact when it doubles since the last *check*, 
> then it would be easy for it to slightly-less-than-double every 10-20 
> minutes and never trigger the compaction while still growing exponentially.

Yes, I meant compaction, not the check, sorry.  So, this scenario is
covered and should not be possible.
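
To make the difference between the two baselines concrete, the sketch
below grows a database by 1.9x per check period and reports when the
"doubled" test first fires.  `first_compaction` is a hypothetical
helper; the 1.9x factor is just an example of "slightly less than
double".

```python
def first_compaction(growth_per_check, checks, baseline):
    """Return the check index at which compaction first triggers, or None.

    baseline="check": compare with the size at the previous check.
    baseline="compaction": compare with the size right after the last
    compaction (here the starting size, since nothing was compacted yet).
    """
    start = size = 1.0
    prev_check = size
    for i in range(1, checks + 1):
        size *= growth_per_check
        ref = prev_check if baseline == "check" else start
        if size >= 2 * ref:
            return i
        prev_check = size
    return None
```

With the previous check as the baseline, 1.9x growth never triggers,
even after 20 periods of exponential growth; with the last compaction as
the baseline, it triggers on the second check.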

> 
> I'm happy to discuss compaction approaches (though my expertise is very 
> much in host service management and not OVS itself), but do you think 
> there's merit in having this extended timeout as a backstop too?

Yep, I applied the change for now.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to