On Wed, Feb 27, 2019 at 5:26 PM Ben Pfaff <b...@ovn.org> wrote:
>
> On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> > In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> > problem, at least with 1k HVs. However, if we restart the ovsdb-server,
> > then depending on the number of HVs and the scale of logical objects,
> > e.g. the number of logical ports, the SB ovsdb-server becomes an obvious
> > bottleneck.
> >
> > In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> > connected by a single logical router), restarting the SB ovsdb-server
> > resulted in 100% CPU usage of ovsdb-server for more than 1 hour. All HVs
> > (and northd) are reconnecting and resyncing a large amount of data at
> > the same time.
> >
> > A similar problem would happen in a failover scenario. With an
> > active-active cluster, the problem can be alleviated slightly, because
> > only 1/3 (assuming a 3-node cluster) of the HVs need to resync data from
> > new servers, but it is still a serious problem.
> >
> > For a detailed discussion of the problem and solutions, see:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> Thanks.
>
> When I apply this series, I get a reproducible test failure in test
> 1920 "schema conversion online - clustered". It's an error from Address
> Sanitizer. I'm appending the testsuite.log.
>
Thanks, Ben, for catching this. I should enable AddressSanitizer for the
regression tests. Please find below a patch that fixes this bug. I will
send v4 with this fix applied to patch 2/5, "ovsdb-server: Transaction
history tracking".

----8><----------------------------------------------------><8----
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index ea7dd23..cfc96b3 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -538,6 +538,9 @@ ovsdb_replace(struct ovsdb *dst, struct ovsdb *src)
         ovsdb_trigger_prereplace_db(trigger);
     }
 
+    /* Destroy txn history. */
+    ovsdb_txn_history_destroy(dst);
+
     struct ovsdb_schema *tmp_schema = dst->schema;
     dst->schema = src->schema;
     src->schema = tmp_schema;
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 0081840..b3f4946 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -1415,7 +1415,9 @@ ovsdb_txn_history_destroy(struct ovsdb *db)
 
     struct ovsdb_txn_history_node *txn_h_node, *next;
     LIST_FOR_EACH_SAFE (txn_h_node, next, node, &db->txn_history) {
+        ovs_list_remove(&txn_h_node->node);
         ovsdb_txn_destroy_cloned(txn_h_node->txn);
         free(txn_h_node);
     }
+    db->n_txn_history = 0;
 }

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
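For readers without the OVS tree at hand, here is a minimal standalone sketch of why the destroy function should unlink each node and reset the counter once ovsdb_replace() calls it on a db that stays in use. It assumes a hypothetical hand-rolled circular list (`demo_list`) and hypothetical `demo_db` / `demo_history_*` names standing in for OVS's list helpers, `struct ovsdb` and ovsdb_txn_history_destroy(); it illustrates the pattern and is not the actual OVS code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal circular doubly linked list; a simplified stand-in
 * for the intrusive list used by OVS, not OVS's own list implementation. */
struct demo_list {
    struct demo_list *prev, *next;
};

static void demo_list_init(struct demo_list *l) { l->prev = l->next = l; }

static int demo_list_is_empty(const struct demo_list *l) { return l->next == l; }

static void demo_list_push_back(struct demo_list *head, struct demo_list *elem)
{
    elem->prev = head->prev;
    elem->next = head;
    head->prev->next = elem;  /* Writes into the current last node (or head). */
    head->prev = elem;
}

/* Unlink 'elem' so that its neighbours no longer point at it. */
static void demo_list_remove(struct demo_list *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
}

/* Hypothetical stand-ins for struct ovsdb and its history nodes. */
struct demo_history_node {
    struct demo_list node;  /* First member, so a list pointer converts back to the node. */
    int txn_id;
};

struct demo_db {
    struct demo_list txn_history;
    unsigned int n_txn_history;
};

static void demo_history_init(struct demo_db *db)
{
    demo_list_init(&db->txn_history);
    db->n_txn_history = 0;
}

static void demo_history_add(struct demo_db *db, int txn_id)
{
    struct demo_history_node *h = malloc(sizeof *h);
    if (!h) {
        abort();
    }
    h->txn_id = txn_id;
    demo_list_push_back(&db->txn_history, &h->node);
    db->n_txn_history++;
}

/* Mirrors the fixed destroy: unlink each node before freeing it and reset
 * the counter.  Without the unlink, txn_history.prev/next would still point
 * at freed nodes, so the next demo_history_add() would write through
 * head->prev into freed memory (a use-after-free). */
static void demo_history_destroy(struct demo_db *db)
{
    struct demo_list *pos = db->txn_history.next;
    while (pos != &db->txn_history) {
        struct demo_list *next = pos->next;  /* Saved before freeing, as LIST_FOR_EACH_SAFE does. */
        demo_list_remove(pos);
        free((struct demo_history_node *) pos);
        pos = next;
    }
    db->n_txn_history = 0;
    assert(demo_list_is_empty(&db->txn_history));
}
```

With both changes in place, calling demo_history_destroy() and then demo_history_add() on the same db is safe; drop the demo_list_remove() call and that add writes through a dangling pointer, which is the kind of error AddressSanitizer reports.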