On 4/11/24 10:59, Chris Riches wrote:
> On 10/04/2024 23:31, Ilya Maximets wrote:
>> On 4/10/24 17:48, Chris Riches wrote:
>>> If the database is particularly large (multi-GB), ovsdb-server can take
>> Hi, Chris.  May I ask how you ended up with a multi-GB database?
>> I would understand if it was an OVN Southbound DB, for example,
>> but why the local database that only stores ports/bridges and
>> some other not that large things ends up with so much data?
>>
>> Sounds a little strange.
>>
>> Best regards, Ilya Maximets.
> 
> I'd like to understand that too, and it's a separate RCA we're working 
> on but haven't reached a conclusion yet.
> 
> From what we know so far, the DB was full of stale connection-tracking 
> information such as the following:
> 
> {
>    "_date": 1710858766431,
>    "Bridge": {
>      "49cb85cd-b085-4af8-98a2-56030dd614b9": {
>        "external_ids": [
>          "map",
>          [
>            [
>              "ct-zone-lrp-ext_gw_port_48a89ae3-6528-4851-a277-e21db02518ad",
>              "4"
>            ],
>            [
>              "external",
>              "true"
>            ]
>          ]
>        ]
>      }
>    },
>    "_comment": "ovn-controller: modifying OVS tunnels '5995b338-3080-44b1-9251-58080cc878f7'"
> }
> 
> Once the host was recovered by putting in the timeout increase, 
> ovsdb-server successfully started and GCed the database down from 2.4 
> *GB* to 29 *KB*. Had this happened before the host restart, we would 
> have never seen this problem. But since it seems possible to end up 
> booting with such a large DB, we figured a timeout increase was a 
> sensible measure to take.

Uff.  Sounds like ovn-controller went off the rails.

Normally, ovsdb-server compacts the database once every 10-20 minutes,
if the database has doubled in size since the previous check.  If all
the transactions are as small as the one above, reaching 2.4 GB would
mean ovn-controller made about 10K transactions per second in the
10-20 minutes before the restart.  That's huge.
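
A rough back-of-the-envelope check of that estimate, as a minimal Python
sketch.  The 400-byte record size is an assumption eyeballed from the example
transaction quoted above, not a measured value, and the helper name is
hypothetical:

```python
# Back-of-the-envelope estimate of the transaction rate needed to grow
# the database to 2.4 GB within one compaction interval.

DB_SIZE_BYTES = 2.4e9      # 2.4 GB, the size reported before compaction
RECORD_SIZE_BYTES = 400    # ASSUMPTION: rough on-disk size of one small transaction


def transactions_per_second(db_size, record_size, window_seconds):
    """Average rate needed to write db_size bytes of records in the window."""
    return db_size / record_size / window_seconds


for minutes in (10, 20):
    rate = transactions_per_second(DB_SIZE_BYTES, RECORD_SIZE_BYTES, minutes * 60)
    print(f"{minutes} min window: ~{rate:,.0f} transactions/s")
```

Under that assumption, the 10-minute window works out to roughly 10K
transactions per second and the 20-minute window to about half of that.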

I wonder if this can be addressed with a better compaction strategy.
Something like forcing compaction if "the database is larger than 10 MB
and has grown 10x", regardless of the time elapsed.
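
The idea could be sketched roughly as follows.  This is only an illustration
in Python, not the actual ovsdb-server logic: the function name, the constants
and the exact form of the regular time-gated check are assumptions based on
the description above.

```python
# Illustrative compaction decision: the regular time-gated check plus a
# forced path for runaway growth.  All names and thresholds are hypothetical.

MIN_INTERVAL_S = 10 * 60            # ASSUMPTION: lower bound of the 10-20 minute interval
FORCE_MIN_SIZE = 10 * 1024 * 1024   # "more than 10 MB"
FORCE_GROWTH = 10                   # "increased 10x"


def should_compact(size_now, size_at_last_compaction, seconds_since_last):
    """Return True if the database should be compacted now."""
    # Forced path: large database that grew 10x, regardless of the time.
    if (size_now > FORCE_MIN_SIZE
            and size_now >= FORCE_GROWTH * size_at_last_compaction):
        return True
    # Regular path: enough time has passed and the database has doubled.
    return (seconds_since_last >= MIN_INTERVAL_S
            and size_now >= 2 * size_at_last_compaction)


# A 29 KB database that balloons to 2.4 GB within a minute is compacted
# right away instead of waiting for the next periodic check.
print(should_compact(2_400_000_000, 29_000, 60))
```

The 10 MB floor keeps small databases from being compacted repeatedly just
because they grew 10x from a tiny starting size.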

Best regards, Ilya Maximets.