On Tue, Aug 24, 2021 at 7:56 AM Anton Ivanov <anton.iva...@cambridgegreys.com> wrote: > > > On 24/08/2021 12:46, Ilya Maximets wrote: > > On 8/24/21 1:18 PM, Anton Ivanov wrote: > >> On 24/08/2021 12:05, Ilya Maximets wrote: > >>> On 8/24/21 7:36 AM, Anton Ivanov wrote: > >>>> On 23/08/2021 22:36, Ilya Maximets wrote: > >>>>> On 8/23/21 10:37 PM, Anton Ivanov wrote: > >>>>>> On 23/08/2021 21:26, Ilya Maximets wrote: > >>>>>>> On 8/23/21 10:20 PM, Anton Ivanov wrote: > >>>>>>>> Should not be the case. > >>>>>>>> > >>>>>>>> The map is pre-sized for the size from the previous iterations. > >>>>>>>> > >>>>>>>> Line 12861 in my tree which is probably a few commits out of date: > >>>>>>>> > >>>>>>>> fast_hmap_size_for(&lflows, max_seen_lflow_size); > >>>>>>>> > >>>>>>>> And immediately after building the lflows: > >>>>>>>> > >>>>>>>> if (hmap_count(&lflows) > max_seen_lflow_size) { > >>>>>>>> max_seen_lflow_size = hmap_count(&lflows); > >>>>>>>> } > >>>>>>>> > >>>>>>>> So the map SHOULD be sized correctly - to the most recent seen lflow > >>>>>>>> count. > >>>>>>> Please, re-read the commit message. It's a first run of the loop > >>>>>>> and the 'max_seen_lflow_size' is default 128 at this point. > >>>>>> Ack, > >>>>>> > >>>>>> Not using auto-resizing in single threaded mode is a bug. Thanks for > >>>>>> fixing it. > >>>>>> > >>>>>> From that perspective the patch is a straight +1 from me. > >>>>>> > >>>>>> From the perspective of the use case stated in the commit message- > >>>>>> I am not sure it addresses it. > >>>>>> > >>>>>> If northd is in single-threaded mode and is tackling a GIGANTIC> > >>>>>> database, it may never complete the first iteration before the > >>>>>> expiration of the timers and everyone deciding that northd is AWOL. > >>>>> Well, how do you suggest to fix that? Obviously, we can always create > >>>>> a database that northd will never be able to process in a reasonable > >>>>> amount of time. And it doesn't matter if it's single- or > >>>>> multi-threaded. > >>>>> > >>>>> In this case NbDB is only 9MB in size, which is very reasonable, and > >>>>> northd on my laptop takes more than 15 minutes to process it (I killed > >>>>> it at this point). With the patch applied it took only 11 seconds. > >>>>> So, for me, this patch pretty much fixes the issue. 11 seconds is not > >>>>> that bad, e.g. ovn-k8s configures inactivity probes for clients to 180. > >>>>> It would be great to reduce, but we're not there yet. > >>>>> > >>>>>> In that case, if it is multi-threaded from the start, it should > >>>>>> probably > >>>>>> grab the sizing from the lflow table hash in south db. That would be a > >>>>>> good approximation for the first run. > >>>>> This will not work for a case where SbDB is empty for any reason while > >>>>> NbDB is not. And there is still a case where northd initially connects > >>>>> to semi-empty databases and after few iterations NbDB receives a big > >>>>> transaction and generates a big update for northd. > >>>> A partial fix is to resize to optimum size the hash after lflow > >>>> processing. > >>> I'm not sure I understand what you mean here, because resizing after > >>> lflow processing will not help. The lflow processing itself is the > >>> very slow part that we're trying to make faster here. > >> That can be the case only with dpgroups. Otherwise lflows are just a > >> destination to dump data and the bucket sizing is irrelevant because there > >> is never any lookup inside lflows during processing. The lflow map is just > >> used to store data. So if it is suboptimal at the exit of build_lflows() > >> the resize will fix it before the first lookup shortly thereafter. > >> > >> Are you running it with dpgroups enabled? In that case there are lookups > >> inside lflows during build which happen under a per-bucket lock. So in > >> addition to suboptimal size when searching the contention depends on the > >> number of buckets. If they are too few, the system becomes heavily > >> contended resulting in ridiculous computation sizes. > > Oh, I see. Indeed, without dp-groups there is no lookup during > > lflow build. I missed that. So, yes, I agree that for a case > > without dp-groups, re-sizing after lflow processing should work. > > We need that for parallel case. > > > > Current patch (use hmap_insert() that resizes if needed) helps > > for all non-parallel cases. > > Indeed. It should go in. >
Why can't we have hmap_insert() for both parallel and non parallel configs to start with and switch over to hmap_insert_fast() when ovn-northd has successfully connected to SB DB and has approximated on the more accurate hmap size ? Thanks Numan > I will sort out the other cases to the extent possible. > > Brgds, > > A. > > > > > I'm mostly running dp-groups + non-parallel which is a default > > case for ovn-heater/ovn-k8s. > > > >> For the case of "dpgroups + parallel + first iteration + pre-existing > >> large database" there is no cure short of pre-allocating the hash to > >> maximum size. > > Yeah, dp-groups + parallel is a hard case. > > > >> I am scale testing that as well as resize (for non-dp-groups cases) at > >> present. > >> > >> Brgds, > >> > >> A. > >> > >>>> If the sizing was correct - 99.9% of the case this will be a noop. > >>>> > >>>> If the sizing was incorrect, it will be resized so that the DP searches > >>>> and all other ops which were recently added for flow reduction will work > >>>> optimally. > >>>> > >>>> This still does not work for lflow compute with dpgroups + parallel upon > >>>> initial connect and without a SB database to use for size guidance. It > >>>> will work for all other cases. > >>>> > >>>> I will send two separate patches to address the cases which can be > >>>> easily addressed and see what can be done with the dp+parallel upon > >>>> initial connect to an empty sb database. > >>>> > >>>> Brgds, > >>>> > >>>> A > >>>> > >>>>>> A. > >>>>>> > >>>>>>>> A. > >>>>>>>> > >>>>>>>> On 23/08/2021 21:02, Ilya Maximets wrote: > >>>>>>>>> 'lflow_map' is never expanded because northd always uses fast > >>>>>>>>> insertion. This leads to the case where we have a hash map > >>>>>>>>> with only 128 initial buckets and every ovn_lflow_find() ends > >>>>>>>>> up iterating over n_lflows / 128 entries. It's thousands of > >>>>>>>>> logical flows or even more. For example, it takes forever for > >>>>>>>>> ovn-northd to start up with the Northbound Db from the 120 node > >>>>>>>>> density-heavy test from ovn-heater, because every lookup is slower > >>>>>>>>> than previous one. I aborted the process after 15 minutes of > >>>>>>>>> waiting, because there was no sign that it will converge. With > >>>>>>>>> this change applied the loop completes in only 11 seconds. > >>>>>>>>> > >>>>>>>>> Hash map will be pre-allocated to the maximum seen number of > >>>>>>>>> logical flows on a second iteration, but this doesn't help for > >>>>>>>>> the first iteration when northd first time connects to a big > >>>>>>>>> Northbound database, which is a common case during failover or > >>>>>>>>> cluster upgrade. And there is an even trickier case where big > >>>>>>>>> NbDB transaction that explodes the number of logical flows received > >>>>>>>>> on not the first run. > >>>>>>>>> > >>>>>>>>> We can't expand the hash map in case of parallel build, so this > >>>>>>>>> should be fixed separately. > >>>>>>>>> > >>>>>>>>> CC: Anton Ivanov <anton.iva...@cambridgegreys.com> > >>>>>>>>> Fixes: 74daa0607c7f ("ovn-northd: Introduce parallel lflow build") > >>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> > >>>>>>>>> --- > >>>>>>>>> northd/ovn-northd.c | 6 +++++- > >>>>>>>>> 1 file changed, 5 insertions(+), 1 deletion(-) > >>>>>>>>> > >>>>>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > >>>>>>>>> index 3d8e21a4f..40cf957c0 100644 > >>>>>>>>> --- a/northd/ovn-northd.c > >>>>>>>>> +++ b/northd/ovn-northd.c > >>>>>>>>> @@ -4387,7 +4387,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, > >>>>>>>>> struct ovn_datapath *od, > >>>>>>>>> nullable_xstrdup(ctrl_meter), > >>>>>>>>> ovn_lflow_hint(stage_hint), where); > >>>>>>>>> hmapx_add(&lflow->od_group, od); > >>>>>>>>> - hmap_insert_fast(lflow_map, &lflow->hmap_node, hash); > >>>>>>>>> + if (!use_parallel_build) { > >>>>>>>>> + hmap_insert(lflow_map, &lflow->hmap_node, hash); > >>>>>>>>> + } else { > >>>>>>>>> + hmap_insert_fast(lflow_map, &lflow->hmap_node, hash); > >>>>>>>>> + } > >>>>>>>>> } > >>>>>>>>> /* Adds a row with the specified contents to the > >>>>>>>>> Logical_Flow table. */ > > > -- > Anton R. Ivanov > Cambridgegreys Limited. Registered in England. Company Number 10273661 > https://www.cambridgegreys.com/ > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev