The culprit has been found and neutralized.

It is now marginally faster (~ 0.5%) than master.

I will do full retests and scale tests for all combinations, including parallel,
and if it tests out OK I will post v8 tomorrow.

A.

On 10/09/2021 17:13, Ilya Maximets wrote:
On 9/10/21 4:19 PM, Anton Ivanov wrote:
On 10/09/2021 13:29, Ilya Maximets wrote:
On 9/10/21 2:23 PM, Anton Ivanov wrote:
On 10/09/2021 13:18, Anton Ivanov wrote:
On 10/09/2021 12:43, Ilya Maximets wrote:
On 9/9/21 11:02 PM, Mark Michelson wrote:
Hi Anton,

On a high level, this change results in some parts of the parallelized hashmap 
being unused. Should we keep the hashrow_locks structure and its APIs, for 
instance?

See below for more comments.

On 9/2/21 9:27 AM, anton.iva...@cambridgegreys.com wrote:
From: Anton Ivanov <anton.iva...@cambridgegreys.com>

Restore parallel build with dp groups using an rwlock instead
of per-row locking as the underlying mechanism.

This provides an improvement of ~10% end-to-end on ovn-heater
under virtualization, despite awakening some qemu gremlin
which makes qemu climb to silly CPU usage. The gain on
bare metal is likely to be higher.

Signed-off-by: Anton Ivanov <anton.iva...@cambridgegreys.com>
---
 northd/ovn-northd.c | 215 ++++++++++++++++++++++++++++++++------------
 1 file changed, 159 insertions(+), 56 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 1f903882b..4537c55dd 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -59,6 +59,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ovs-thread.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
@@ -4372,7 +4373,26 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
 static bool use_logical_dp_groups = false;
 static bool use_parallel_build = true;
 
-static struct hashrow_locks lflow_locks;
+static struct ovs_rwlock flowtable_lock;
With the change from the custom hashrow_locks to using an ovs_rwlock, I think 
it's important to document what data this lock is intending to protect.

From what I can tell, this lock specifically is intended to protect access
to the hmaps in the global lflow_state structure, and it's also intended to
protect all ovn_datapaths' od_group hmaps. This is not something that is
immediately obvious just from a global rwlock declaration.
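
For illustration, one way to spell that out at the declaration itself; the comment
wording below (and the idea of optionally tagging fields with OVS_GUARDED_BY) is an
assumption about the intended scheme, not text from the patch:

/* Assumed documentation sketch, not from the patch: this lock protects
 * the single_od and multiple_od hmaps in the global lflow_state table,
 * as well as the od_group maps described above, while logical flows
 * are being built.  The protected fields could additionally be tagged
 * with OVS_GUARDED_BY(flowtable_lock) so the Clang thread-safety
 * analysis can check accesses. */
static struct ovs_rwlock flowtable_lock;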

+
+static void ovn_make_multi_lflow(struct ovn_lflow *old_lflow,
+                              struct ovn_datapath *od,
+                              struct lflow_state *lflow_map,
+                              uint32_t hash)
+{
+    hmapx_add(&old_lflow->od_group, od);
+    hmap_remove(&lflow_map->single_od, &old_lflow->hmap_node);
+    if (use_parallel_build) {
+        hmap_insert_fast(&lflow_map->multiple_od, &old_lflow->hmap_node, hash);
+    } else {
+        hmap_insert(&lflow_map->multiple_od, &old_lflow->hmap_node, hash);
+    }
+}
+
+#ifdef __clang__
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wthread-safety"
+#endif
The section above needs a comment to explain why -Wthread-safety is ignored.
Please use the OVS_NO_THREAD_SAFETY_ANALYSIS marker instead of manually disabling
the diagnostics.
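
For reference, a minimal sketch of what that suggestion could look like, assuming
it is applied to whichever helper actually takes the lock (do_parallel_insert() is
a made-up name for illustration, not a function from the patch):

/* Sketch only: instead of ignoring -Wthread-safety for a whole region of
 * the file, annotate the one function whose locking pattern the analysis
 * cannot follow.  OVS_NO_THREAD_SAFETY_ANALYSIS is provided by the OVS
 * headers alongside the other Clang thread-safety macros. */
static void
do_parallel_insert(struct lflow_state *lflow_map)
    OVS_NO_THREAD_SAFETY_ANALYSIS
{
    ovs_rwlock_wrlock(&flowtable_lock);
    /* ... insert into lflow_map->single_od / multiple_od ... */
    ovs_rwlock_unlock(&flowtable_lock);
}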


On a side note, I ran some tests with this patch set applied and it drops
performance of the non-parallel case by 20% on my laptop with databases from
ovn-heater's 120-node density test.

I see you mentioned a 10% improvement in the commit message. I assume that's
with parallelization enabled, right?
Yes. And no difference on heater tests up to 36 fake nodes with 7200 ports.

In any case, some performance testing without parallelization is required before
accepting the change, as that is the default configuration.
They were done. I am somewhat surprised by your measurement; can you please
provide some more details?
I have databases from a previous 120-node ovn-heater run.
I'm loading them to the OVN sandbox like this:

make sandbox SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db 
--sbdb-source=/tmp/ovnsb_db.db"

And performing a couple of small operations on the northbound database just to
trigger the northd processing loop.
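
(For example, an add/remove pair like the following is enough to poke northd; the
switch name is arbitrary and these are not necessarily the exact operations used:)

ovn-nbctl ls-add tmp-poke
ovn-nbctl ls-del tmp-poke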

Configuration has dp-groups enabled and parallelization disabled.
Can you run v7 against that? It should fix three performance regressions I found
in v6. Each of them is relatively small, but they may add up for a large database.
v7 seems only slightly better, by a couple of percent.
Nowhere close to the current master.

And stopwatches in v7 are still messed up.
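
(For context, those are the per-stage timings that northd registers with the
stopwatch module; assuming the default control socket name, they can be dumped
in the sandbox with:)

ovn-appctl -t ovn-northd stopwatch/show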

I will run ovn-heater with this, but to be honest, I do not expect any
difference. I did not see any difference before at the scales I was testing.

If it is still slower than the original code, it will be interesting to find
the culprit. They should be nearly identical.

A.

You are saying "database from 120 nodes" - are we talking about a stopwatch
measurement here?
Which is the case. The patch moved it to where it should not be. It will be
fixed in the next revision.
Not only stopwatches.  Poll intervals in general grew too.

A.

Best regards, Ilya Maximets.


--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
