On 4/11/24 15:43, Chris Riches wrote:
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:

Hi Chris, Ilya,

>>> From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>>
>>> [...]
>>>
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>>
>> Uff. Sounds like ovn-controller went off the rails.
>>
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check. If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart. That's huge.
>>
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
>
> I'm not sure exactly what the test was doing when this was observed, so
> I don't know whether that transaction volume is within the realm of
> possibility or if we're looking at a failure to perform compaction on
> time. It would be nice to have an enhanced safety-net for DB size, as we
> were only a few hundred MB away from hitting filesystem space issues as
> well.
>

To rule out any known issues, what OVN version is running on that setup?

>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if
>> the database doubles the size since the previous check.
>
> I presume you mean if it doubled in size since the previous
> *compaction*? If we only compact when it doubles since the last *check*,
> then it would be easy for it to slightly-less-than-double every 10-20
> minutes and never trigger the compaction while still growing exponentially.
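
To make the two readings of "doubles the size" concrete, here is a rough
Python sketch. It is not ovsdb-server code and does not claim to match its
real logic: the function names, the 1 MB starting size and the 1.9x growth
per check are all made up for illustration. It compares a doubling test
against the size at the last *check* with one against the size at the last
*compaction*, and adds the size-based trigger suggested above ("more than
10 MB and increased 10x").

```python
MB = 1024 * 1024


def simulate(growth_per_check, checks, baseline_policy,
             start_size=1 * MB, compacted_size=1 * MB):
    """Return the (hypothetical) DB file size after each periodic check.

    baseline_policy selects what the doubling test compares against:
      "last_check"      - the size seen at the previous check
      "last_compaction" - the size right after the previous compaction
    """
    size = start_size
    baseline = start_size
    history = []
    for _ in range(checks):
        # The file grows by a constant factor between two checks.
        size = int(size * growth_per_check)
        if size >= 2 * baseline:
            # Doubling test passed: compact and reset the baseline.
            size = compacted_size
            baseline = size
        elif baseline_policy == "last_check":
            # No compaction, but the baseline still moves up to the
            # current size, so slow-enough growth is never caught.
            baseline = size
        history.append(size)
    return history


def should_force_compaction(size, baseline, min_size=10 * MB, factor=10):
    """Size-only trigger: large enough and grown 'factor' times,
    regardless of how long ago the last compaction happened."""
    return size > min_size and size >= factor * baseline


if __name__ == "__main__":
    for policy in ("last_check", "last_compaction"):
        sizes = simulate(1.9, 10, policy)
        print(policy, [round(s / MB, 1) for s in sizes])
    print(should_force_compaction(2400 * MB, 1 * MB))
```

With growth just under 2x per check, the "last_check" variant never
compacts and keeps growing, while the "last_compaction" variant compacts
on every second check. The size-only trigger would fire in either case
once the file is both past 10 MB and 10x its post-compaction size.
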
>
> I'm happy to discuss compaction approaches (though my expertise is very
> much in host service management and not OVS itself), but do you think
> there's merit in having this extended timeout as a backstop too?
>

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev