On 8/24/22 17:43, Ilya Maximets wrote:
> On 8/23/22 16:42, Dumitru Ceara wrote:
>> After removing the possibility of disabling logical datapath groups, the
>> above mentioned state should only be relevant when ovn-northd is started
>> with more than one lflow processing thread (--n-threads > 1), to avoid
>> a very inefficient first lflow processing run due to a suboptimally
>> sized hmap.
>>
>> There's no need to ever go back to this state now that we don't allow
>> users to disable logical datapath groups.  Moreover, the previous code
>> was wrongly setting the state to STATE_INIT_HASH_SIZES at every
>> iteration, essentially disabling parallelization.
>>
>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>> Reported-by: Ilya Maximets <i.maxim...@ovn.org>
>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>> ---
>
> Hi, Dumitru.  Thanks for the patch!
> I've been using it in multiple scale tests since it was posted,
> and it does re-enable the parallel lflow build.
>
> One issue, though.  I'm not sure how to fix it or what the desired
> behavior is, but here is what I noticed:
>
> 1. A cluster with 3 northd instances starts up.
> 2. After a while, all of them have been active at least once, due
>    to leadership transfers or other things.
> 3. If one instance stays active for a long time, a lot of database
>    changes can accumulate.
> 4. Once the leadership flips, the newly active northd is not prepared
>    for the number of logical flows it suddenly needs to manage.  So
>    it starts parallel processing with an hmap significantly smaller
>    than it actually needs.  That iteration takes noticeably more
>    time than a usual one.  After the first iteration after becoming
>    active, it can process lflows at normal speed.
>
> So, I wonder if we can fall into a catastrophic scenario where the
> current hash map size is very small and the number of flows after
> re-connection is very high, so the threads are only fighting for
> locks all the time.
>
> Do you have some thoughts on this?
>
> Should we disable parallelism when northd goes into stand-up state?
* I meant 'standby', of course. :/

> Best regards, Ilya Maximets.
>
>>  northd/northd.c | 10 ----------
>>  1 file changed, 10 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 7e2681865..33943bfaf 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>>      }
>>  }
>>
>> -static void init_worker_pool(void)
>> -{
>> -    /* If parallelization is enabled, make sure locks are initialized. */
>> -    if (parallelization_state != STATE_NULL) {
>> -        lflow_hash_lock_init();
>> -        parallelization_state = STATE_INIT_HASH_SIZES;
>> -    }
>> -}
>> -
>>  static void
>>  build_mcast_groups(struct lflow_input *data,
>>                     const struct hmap *datapaths,
>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>>                  struct ovsdb_idl_txn *ovnsb_txn)
>>  {
>>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>> -    init_worker_pool();
>>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>>                   input_data->sbrec_chassis_by_name,
>>                   input_data->sbrec_chassis_by_hostname);

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev