On 8/2/23 15:45, Ilya Maximets wrote: > During initial read of a database file all the file transactions are > added to the transaction history. The history run with the history > size checks is only executed after the whole file is processed. > If, for some reason, the file contains way too many transactions, > this behavior may result in excessive memory consumption up to > hundreds of GBs. For example, here is a log entry about memory usage > after reading a file with 100K+ OVN NbDB transactions: > > |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds > |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0 > raft-log:123309 txn-history:123307 txn-history-atoms:1647022868 > > In this particular case ovsdb-server allocated 95 GB of RAM in order > to accommodate 1.6 billion ovsdb atoms in the history, while only 3 > million atoms are in the actual database.
Wow.. > > Fix that by running history size checks after applying each file > transaction. This way the memory usage while reading the database > from the example stays at about 1 GB mark. History size checks are > cheap in comparison with transaction replay, so the additional calls > do not reduce performance. > > We could've just moved the history run into ovsdb_txn_replay_commit(), > but it seems more organic to call it externally, since we have init() > and destroy() functions called externally as well. > > Since the history run will be executed shortly after reading the > database and actual memory consumption peak is not always logged, > there seem to be no reliable way to unit test for the issue without > adding extra testing infrastructure into the code. > > Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.") > Reported-at: https://bugzilla.redhat.com/2228464 > Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> > --- > ovsdb/ovsdb-server.c | 3 ++- > ovsdb/relay.c | 1 + > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c > index 8e623118b..cf09c9079 100644 > --- a/ovsdb/ovsdb-server.c > +++ b/ovsdb/ovsdb-server.c > @@ -235,7 +235,7 @@ main_loop(struct server_config *config, > > SHASH_FOR_EACH_SAFE (node, all_dbs) { > struct db *db = node->data; > - ovsdb_txn_history_run(db->db); > + > ovsdb_storage_run(db->db->storage); > read_db(config, db); > /* Run triggers after storage_run and read_db to make sure new > raft > @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db, > if (!error && !uuid_is_zero(txnid)) { > db->db->prereq = *txnid; > } > + ovsdb_txn_history_run(db->db); > } > return error; > } > diff --git a/ovsdb/relay.c b/ovsdb/relay.c > index b035cb492..27ff196b7 100644 > --- a/ovsdb/relay.c > +++ b/ovsdb/relay.c > @@ -413,6 +413,7 @@ ovsdb_relay_run(void) > } > ovsdb_cs_event_destroy(event); > } > + ovsdb_txn_history_run(ctx->db); > } > } > It's always cool and somewhat worrying to see such a commit-log-size-to-loc ratio but I think this is fine: Acked-by: Dumitru Ceara <dce...@redhat.com> Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev