On 8/23/22 16:42, Dumitru Ceara wrote: > After removing the possibility of disabling logical datapath groups the > above mentioned state should only be relevant when ovn-northd is started > with more than one lflow processing threads (--n-threads > 1), to avoid > a very inefficient first lflow processing run due to a suboptimally > sized hmap. > > There's no need to ever go back to this state now that we don't allow > users to disable logical datapath groups. Moreover, the previous code > was wrongfully setting the state to STATE_INIT_HASH_SIZES at every > iteration, essentially disabling parallelization. > > Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath > groups.") > Reported-by: Ilya Maximets <i.maxim...@ovn.org> > Signed-off-by: Dumitru Ceara <dce...@redhat.com> > ---
Hi, Dumitru. Thanks for the patch! I've been using it in multiple scale tests since it was posted, and it does re-enable parallel lflow build. One issue though. Not sure how to fix that or what is the desired behavior, but what I noticed is: 1. Cluster with 3 northd instances starts up. 2. After a while all of them has been active at least once, due to leadership transfers or other things. 3. If one instance stays active for long, a lot of database changes can be accumulated. 4. Once the leadership flips, new active northd is not prepared for the amount of logical flows it suddenly needs to manage. So, it starts parallel processing with hmap significantly smaller than it actually needs right now. That iteration takes noticeably more time than a usual one. After the first iteration after becoming active, it can process lflows with a normal speed. So, I wonder if we can fall into a catastrophic scenario where the current hash map size is very small and amount of flows after re-connection is very high, so threads are only fighting for locks all the time. Do you have some thoughts on this? Should we disable parallelism when northd goes into stand-up state? Best regards, Ilya Maximets. > northd/northd.c | 10 ---------- > 1 file changed, 10 deletions(-) > > diff --git a/northd/northd.c b/northd/northd.c > index 7e2681865..33943bfaf 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads) > } > } > > -static void init_worker_pool(void) > -{ > - /* If parallelization is enabled, make sure locks are initialized. */ > - if (parallelization_state != STATE_NULL) { > - lflow_hash_lock_init(); > - parallelization_state = STATE_INIT_HASH_SIZES; > - } > -} > - > static void > build_mcast_groups(struct lflow_input *data, > const struct hmap *datapaths, > @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data, > struct ovsdb_idl_txn *ovnsb_txn) > { > stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec()); > - init_worker_pool(); > ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn, > input_data->sbrec_chassis_by_name, > input_data->sbrec_chassis_by_hostname); _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev