On 24/08/2021 22:49, Ilya Maximets wrote:
On 8/24/21 10:07 PM, Anton Ivanov wrote:
On 23/08/2021 22:36, Ilya Maximets wrote:
On 8/23/21 10:37 PM, Anton Ivanov wrote:
On 23/08/2021 21:26, Ilya Maximets wrote:
On 8/23/21 10:20 PM, Anton Ivanov wrote:
Should not be the case.

The map is pre-sized to the size seen in previous iterations.

Line 12861 in my tree, which is probably a few commits out of date:

       fast_hmap_size_for(&lflows, max_seen_lflow_size);

And immediately after building the lflows:

       if (hmap_count(&lflows) > max_seen_lflow_size) {
           max_seen_lflow_size = hmap_count(&lflows);
       }

So the map SHOULD be sized correctly - to the most recently seen lflow count.
Please re-read the commit message.  It's the first run of the loop,
and 'max_seen_lflow_size' is still the default 128 at this point.
Ack,

Not using auto-resizing in single-threaded mode is a bug. Thanks for fixing it.

From that perspective, the patch is a straight +1 from me.

From the perspective of the use case stated in the commit message, I am not
sure it addresses it.

If northd is in single-threaded mode and is tackling a GIGANTIC database, it
may never complete the first iteration before the timers expire and
everyone decides that northd is AWOL.
Well, how do you suggest fixing that?  Obviously, we can always create
a database that northd will never be able to process in a reasonable
amount of time, and it doesn't matter whether it's single- or multi-threaded.
I will send the additional "baseline" fixes for the parallel build - resizing and initial
sizing - tomorrow; they are fairly trivial and have tested out OK.

However, they do not address the fact that, with dp_groups, the overall "heatmap" has moved:
a lot of processing once again happens outside the "parallel" portion, in the
single-threaded part.

Unless I am mistaken, this can be improved.

Namely, at present, after the lflows are computed with dp_groups, they are walked in
full: single-datapath flows are separated into their own hmap and reprocessed. That is
suboptimal for the parallel build (and possibly suboptimal for single-threaded as well).

Unless I am mistaken, when dp_groups are enabled, all lflows can initially be inserted
into a separate "single datapath" hmap. If the dp_group for an lflow grows to
more than one datapath, the flow is then moved to the main lflow hmap. This way, the
computation will generate both types of flows straight away (optionally in parallel), and
there will be no need for a full single-threaded walk of the lflows after they have been
generated.
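
For illustration, a minimal sketch of that two-map scheme against the OVS
hmap/hmapx API. The helper name, the 'is_new' flag, and the 'single_dp_lflows'
map are assumptions made for this sketch, not the actual northd code:

    /* Hypothetical sketch: keep lflows that apply to a single datapath in
     * 'single_dp_lflows' and promote them to the main 'lflows' map the
     * moment their datapath group grows past one entry. */
    static void
    do_ovn_lflow_add_split(struct hmap *lflows, struct hmap *single_dp_lflows,
                           struct ovn_lflow *lflow, struct ovn_datapath *od,
                           uint32_t hash, bool is_new)
    {
        hmapx_add(&lflow->od_group, od);

        if (is_new) {
            /* Freshly allocated lflow: assume it stays on one datapath
             * and keep it in the single-datapath map for now. */
            hmap_insert(single_dp_lflows, &lflow->hmap_node, hash);
        } else if (hmapx_count(&lflow->od_group) == 2) {
            /* The datapath group just grew past one entry: promote the
             * lflow to the main map.  Its hash is unchanged, so the node
             * is simply relinked without recomputing anything. */
            hmap_remove(single_dp_lflows, &lflow->hmap_node);
            hmap_insert(lflows, &lflow->hmap_node, hash);
        }
    }

A lookup helper would then have to probe both maps, but both probes are O(1),
and the full single-threaded separation pass after generation disappears. Under
the parallel build both maps would need locking, as noted below.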

One question (so I can get some idea of which optimizations are worth it and
which aren't): what are the percentage and overall numbers of single-datapath
lflows?
From the DB that I have, I extracted the following information:

Total lflows generated: 9,916,227
Ended up in SbDB: 540,795 (462,196 have no dp_group, 78,599 have a dp_group)
The on-disk size of this DB with dp groups enabled is 270 MB.

So the lflows hashmap contains ~540K flows, and 460K of them are single
flows.  That is still 20 times fewer than the number of lflows that northd
generated, so the performance improvement from parallelizing this
part might not be significant when dp-groups are enabled.

Actually, removing the creation of a 460K-flow map and the 460K-flow map walk, and replacing the "pop"/"push" map moves with a fast merge (all of which is possible), should give a significant performance boost even in a single thread. All of that is parallelizable too (with some locking).
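
As a rough illustration of the "fast merge" idea (a sketch against the public
OVS hmap API, not tested code): each hmap_node already stores its hash, so
nodes can be relinked from one map into another without recomputing or
reallocating anything per element:

    #include "openvswitch/hmap.h"

    /* Hypothetical sketch: move every node from 'src' into 'dst' without
     * recomputing hashes.  Pre-sizing 'dst' once up front (e.g., with
     * hmap_reserve()) avoids repeated expansion during the merge. */
    static void
    hmap_merge_fast(struct hmap *dst, struct hmap *src)
    {
        struct hmap_node *node;

        while ((node = hmap_first(src)) != NULL) {
            hmap_remove(src, node);
            hmap_insert(dst, node, node->hash);
        }
    }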

If dp groups are disabled, it
will be very hard for both northd and SbDB to handle a database of this
size, even from the memory consumption standpoint.  The database will take
around 5 GB on disk; in memory, as a parsed JSON object, it will be huge.
I'd not advise running a setup of that size without dp groups.  The node
will run out of memory very fast.

Brgds,

A.

In this case the NbDB is only 9 MB in size, which is very reasonable, and
northd on my laptop takes more than 15 minutes to process it (I killed
it at that point).  With the patch applied, it took only 11 seconds.
So, for me, this patch pretty much fixes the issue.  11 seconds is not
that bad; e.g., ovn-k8s configures inactivity probes for clients to 180 seconds.
It would be great to reduce this further, but we're not there yet.

In that case, if it is multi-threaded from the start, it should probably
take the initial sizing from the size of the lflow table in the Southbound DB.
That would be a good approximation for the first run.
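
For example, something along these lines (a sketch only; the IDL iteration
macro and the fast_hmap_size_for() helper from the parallel-build patches are
assumed to be available in this scope):

    /* Hypothetical sketch: pre-size the lflow map from the number of
     * Logical_Flow rows currently in the Southbound DB, as a first-run
     * approximation of how many lflows will be generated. */
    size_t sb_lflow_count = 0;
    const struct sbrec_logical_flow *sb_flow;

    SBREC_LOGICAL_FLOW_FOR_EACH (sb_flow, ctx->ovnsb_idl) {
        sb_lflow_count++;
    }
    fast_hmap_size_for(&lflows, MAX(sb_lflow_count, max_seen_lflow_size));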
This will not work for the case where the SbDB is empty for any reason while
the NbDB is not.  And there is still the case where northd initially connects
to semi-empty databases and, after a few iterations, the NbDB receives a big
transaction and generates a big update for northd.

A.

A.

On 23/08/2021 21:02, Ilya Maximets wrote:
'lflow_map' is never expanded because northd always uses fast
insertion.  This leads to the case where we have a hash map
with only 128 initial buckets and every ovn_lflow_find() ends
up iterating over n_lflows / 128 entries.  That's thousands of
logical flows or even more.  For example, it takes forever for
ovn-northd to start up with the Northbound DB from the 120-node
density-heavy test from ovn-heater, because every lookup is slower
than the previous one.  I aborted the process after 15 minutes of
waiting because there was no sign that it would converge.  With
this change applied, the loop completes in only 11 seconds.

The hash map will be pre-allocated to the maximum seen number of
logical flows on the second iteration, but this doesn't help on
the first iteration, when northd connects to a big
Northbound database for the first time, which is a common case during
failover or cluster upgrade.  And there is an even trickier case, where
a big NbDB transaction that explodes the number of logical flows is
received not on the first run.

We can't expand the hash map in the case of a parallel build, so this
should be fixed separately.
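
The behavior the patch relies on, roughly sketched (simplified from OVS
lib/hmap.h; not the verbatim implementation): hmap_insert() checks the load
factor and expands the bucket array, while hmap_insert_fast() only links the
node, so the bucket array never moves underneath parallel workers, but the
map also never grows:

    /* Simplified sketch of the two insertion paths. */
    static void
    sketch_hmap_insert_fast(struct hmap *hmap, struct hmap_node *node,
                            size_t hash)
    {
        /* Link the node into its bucket; never resize. */
        struct hmap_node **bucket = &hmap->buckets[hash & hmap->mask];
        node->hash = hash;
        node->next = *bucket;
        *bucket = node;
        hmap->n++;
    }

    static void
    sketch_hmap_insert(struct hmap *hmap, struct hmap_node *node, size_t hash)
    {
        sketch_hmap_insert_fast(hmap, node, hash);
        if (hmap->n / 2 > hmap->mask) {
            /* Load factor exceeded: double the bucket count and relink
             * every node.  This is the step that never happened in
             * northd before this patch. */
            hmap_expand(hmap);
        }
    }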

CC: Anton Ivanov <anton.iva...@cambridgegreys.com>
Fixes: 74daa0607c7f ("ovn-northd: Introduce parallel lflow build")
Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
---
 northd/ovn-northd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 3d8e21a4f..40cf957c0 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4387,7 +4387,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
                            nullable_xstrdup(ctrl_meter),
                            ovn_lflow_hint(stage_hint), where);
     hmapx_add(&lflow->od_group, od);
-    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
+    if (!use_parallel_build) {
+        hmap_insert(lflow_map, &lflow->hmap_node, hash);
+    } else {
+        hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
+    }
 }
 
 /* Adds a row with the specified contents to the Logical_Flow table. */


--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

