On 4/13/26 4:11 PM, Felix Huettner wrote:
> On Tue, Apr 07, 2026 at 10:36:19PM +0200, Ilya Maximets wrote:
>> On 3/17/26 9:28 AM, Felix Huettner via dev wrote:
>> > Previously the transaction history was hard limited to 100 entries.
>> > However in fast changing environments (e.g. with 20 transactions/sec)
>> > this means that transactions are only in the history for a few seconds.
>> > If now a client reconnects and tries to use monitor_cond_since it has
>> > only a short timeframe where this will reliably work.
>> >
>> > We make the history limit configurable here so that use can choose the
>>
>> *user
>>
>> > speed vs memory tradeoff as needed.
>> > We still keep the limit based on size of the history, as syncing a
>> > history larger than the actual database size does not make sense in any
>> > way.
>> >
>> > Signed-off-by: Felix Huettner <[email protected]>
>> > ---
>>
>> Hi, Felix. Thanks for the patch! And sorry for the delay (lots of random
>> security stuff lately...).
>>
>> I was thinking about this problem for a long time but didn't make a patch
>> so far, but the problem definitely needs some solution.
>>
>> One other use case here beside the basic transaction rate is when you
>> run a sync command and every ovn-controller sends a tiny transaction
>> to update the sequence number. In a large cluster the entire history
>> will be overwritten multiple times within a few seconds.
>>
>> However, what I was thinking is maybe we should just drop the static
>> history limit and always rely on the dynamic one. Have you tried this
>> approach? Or do you see the configured history limit being hit after
>> you increase it? If so, is the memory consumption significantly
>> different than just allowing it to grow up to the database size?
>>
>> Just trying to see if we can drop the limit and avoid extra config knobs.
>> I saw people hacking the code to set the limit to 5000 in practice, but
>> I'm not sure if this value is even reachable in the normal operation, or
>> is it always capped by the database size in the end anyway.
>
> Hi Ilya,
>
> thanks for the feedback.
>
> So just to get some rough numbers from one of our production southbound
> clusters:
> * atoms: 23_255_133
> * txn-history: 100
> * txn-history-atoms: 100_196
>
> So we are currently at roughly 1000 atoms per history entry.
> This would mean that we could get up to roughly 23_000 transaction
> history entries.
> At our current rate of roughly 20 transactions/sec this would last us
> for around 19 minutes.
> I am not sure how well a "struct ovs_list" works with so many entries :)

In general, we're not iterating over this list unless a client reconnects,
so it should be fine.

>
> So for me this raises the question of whether it still makes sense to be
> able to reconnect after 15 minutes and then get all the updates that
> happened in the meantime. My feeling would be that reconnects generally should be
> in the few seconds to maybe a minute category. If something takes longer
> it most probably was broken in one way or another.

That's fair. Though there are events where reconnection could take a minute,
e.g. if there are some DNS problems that usually take time to resolve. But
yes, something like a minute might be a good default, while 15 minutes is
indeed a bit excessive.

>
> My first idea of this change had actually been to allow the user to
> specify a duration for the transaction history. This would be more in
> line with the actual needs that I see. I forgot by now why I did not
> implement it that way, but that would be the other option I currently
> see.
>
> Removing the cap altogether would definitely be nicer, but I'm really
> concerned about the health of it :)
>
> What would be your opinion on that?

I agree that fully removing the cap may be a little too aggressive and not
needed in the vast majority of cases.

I like the time idea though. There is one issue with it:
If we're reading the transactions from disk on startup, then we will have
all of them with the same timestamp, potentially bloating the history. We
do have the timestamp in the database file itself, but those are wall clock
timestamps and there is no reliable way to convert them to monotonic ones.
Using wall clock time will cause weird artifacts during time adjustments.

A solution could be to not have the time-based limit enabled when read_db()
is called from open_db() and only enable it afterwards. Let the default
history limit of 100 transactions be enforced during the
open_db()->read_db().

In short, in order of evaluation:

  n_history <= 1                            --> keep
  n_history_atoms > n_db_atoms              --> remove
  n_txn_history <= 100                      --> keep
  retention && timestamp + retention >= now --> keep
  else                                      --> remove

Then we can keep the ovsdb_txn_history_init() as is and make it initialize
retention to zero and only update it after the open_db()->read_db() is
complete.
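
To make the evaluation order above concrete, here is a rough, untested
sketch. All names in it (struct history_state, history_keep_oldest(),
HISTORY_MIN_ENTRIES, the millisecond timestamps) are made up for
illustration and are not the actual ovsdb code; it only assumes a monotonic
clock and that retention == 0 means the time-based limit is disabled:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the values the checks look at. */
struct history_state {
    size_t n_txn_history;        /* Entries currently in the history. */
    size_t n_txn_history_atoms;  /* Atoms held by those entries. */
    size_t n_db_atoms;           /* Atoms in the database itself. */
    long long retention_ms;      /* 0 = time-based limit disabled. */
};

/* Assumed floor: the last 100 transactions are always kept. */
#define HISTORY_MIN_ENTRIES 100

/* Returns true if the oldest history entry should be kept, false if it
 * should be removed.  'oldest_ms' and 'now_ms' are monotonic timestamps. */
static bool
history_keep_oldest(const struct history_state *h,
                    long long oldest_ms, long long now_ms)
{
    if (h->n_txn_history <= 1) {
        return true;                 /* Always keep the last entry. */
    }
    if (h->n_txn_history_atoms > h->n_db_atoms) {
        return false;                /* History larger than the database. */
    }
    if (h->n_txn_history <= HISTORY_MIN_ENTRIES) {
        return true;                 /* Below the static floor. */
    }
    if (h->retention_ms && oldest_ms + h->retention_ms >= now_ms) {
        return true;                 /* Still inside the retention window. */
    }
    return false;
}
```

With retention_ms left at zero, this falls back to today's behavior of at
most 100 entries, which is what we want while the database is still being
read from disk.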

We can make the time-based retention default to 60 seconds. IIUC, that
will result in ~1200 transactions in the history in your setup. That will
probably be enough for most other setups, and if not, then users can
increase or reduce the value through the config.

This solution retains the last 100 transactions even if the system stayed
idle for a while, but I do not remember requests to reduce the history size,
so it should be fine, I guess.

If we're configuring time, a simple "transaction-history" may suffice as the
config option name.

What do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev