On 10/5/22 08:19, Han Zhou wrote:
> On Fri, Sep 30, 2022 at 7:01 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> Some network components are compute node-specific, and such
>> components are often replicated, almost identically, for multiple
>> nodes in the cluster.
>>
>> One such example is the case of Kubernetes NodePort services, which
>> translate (in the ovn-kubernetes case) to Load_Balancer objects
>> applied to each and every node's logical gateway router.  These load
>> balancers are almost identical, the main difference being that they
>> use different VIPs (the node's IP).
>>
>> With the current OVN load balancer design, this becomes a problem at
>> scale because the number of load balancers that must be configured is
>> N x M (N nodes times M services); e.g., 120 nodes and 2K services, as
>> in the benchmark below, already mean 240K load balancers.
>>
>> This series proposes a new concept in OVN: virtual network component
>> templates.  The goal of the templates is to help reduce resource
>> consumption in the OVN central components in specific cases like the one
>> described above.
>>
>> To achieve that, the CMS will instead configure a "templated" load
>> balancer for every service and apply that single template record to
>> the cluster-wide load balancer group.  This template is then
>> instantiated differently on different compute nodes.  The
>> instantiation is controlled through per-chassis "template variables"
>> configured by the CMS in the new NB.Template_Var table.
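
To make this concrete, the provisioning looks roughly like this (a
sketch using the same CLI syntax as the benchmark script at the end of
this mail; the service port, IPs and chassis names are made up):

  # One template LB per service, shared by all nodes through the LB
  # group; "^vip" and "^backends" are template variable references.
  ovn-nbctl --template lb-add lb-svc "^vip:30080" "^backends" tcp

  # Per-chassis values for the variables; each chassis instantiates
  # the same template record with its own node IP and backends.
  ovn-nbctl create chassis_template_var name=vip value=10.0.0.1 \
      chassis_name=chassis-1
  ovn-nbctl create chassis_template_var name=vip value=10.0.0.2 \
      chassis_name=chassis-2
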
>>
> Thanks Dumitru for the great improvement!
> 

Thanks for reviewing this!

>> A synthetic benchmark simulating what an OpenShift router (using
>> NodePort services) scale test would do shows the following preliminary
>> results:
>> A. 120 nodes, 2K NodePort services:
>> - before:
>>   - Southbound DB size on disk (compacted): ~385MB
>>   - Southbound DB memory usage (RSS): ~3GB
>>   - Southbound DB logical flows: 720K
>>
>> - after:
>>   - Southbound DB size on disk (compacted): ~100MB
>>   - Southbound DB memory usage (RSS): ~250MB
>>   - Southbound DB logical flows: 6K
>>
>> B. 250 nodes, 2K NodePort services:
>> - after (didn't run the "before" test as it was taking way too long):
>>   - Southbound DB size on disk (compacted): ~155MB
>>   - Southbound DB memory usage (RSS): ~760MB
>>   - Southbound DB logical flows: 6K

I'll add the (hacky) benchmark script below just for clarity.
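
For scenario A above, the invocation is along the lines of (the
backends-per-LB value here is illustrative):

  ./tutorial/node-template-lb-stress.sh 120 2000 2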

> 
> A quick question about the test: how many LSPs per node?  I am just
> wondering how the number of lflows could stay the same (6K) when the
> number of nodes increased from 120 to 250.  For some of my scale
> tests, the number of lflows is far larger than this even if I don't
> create any LBs (also consider that an ovn-k8s deployment has at least
> an ext-LS and a GR per node).

I really only focused on the logical flows (and SB.Load_Balancers)
generated for NB.Load_Balancers provisioned the way ovn-k8s provisions
them today, so the test doesn't add a lot of LSPs.  However, in the
"OpenShift router" scenario I was trying to address, the load due to
LSPs is also minimal; it's precisely the huge number of (very similar)
load balancers that causes issues.
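
(For reference, the counts can be pulled straight from the Southbound
DB with something like:

  ovn-sbctl --bare --columns=_uuid list Logical_Flow | grep -c .
  ovn-sbctl --bare --columns=_uuid list Load_Balancer | grep -c .
)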

> I have no doubt of the effectiveness of this improvement, but just need to
> understand the numbers better since I am also doing scale tests and
> measurements on top of this patch series.

Sure, makes complete sense.  And if we can find even more use cases for
component templates, even better!

> 
> Thanks,
> Han

Thanks,
Dumitru

---
diff --git a/tutorial/node-template-lb-stress.sh b/tutorial/node-template-lb-stress.sh
new file mode 100755
index 0000000000..e1a051182a
--- /dev/null
+++ b/tutorial/node-template-lb-stress.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
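+# Hacky stress test for templated load balancers.
+# Usage: node-template-lb-stress.sh NUM_ROUTERS NUM_LBS NUM_BACKENDS_PER_LB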
+
+nrtr=$1
+nlb=$2
+nbackends=$3
+
+echo "ROUTERS        : $nrtr"
+echo "LBS            : $nlb"
+echo "BACKENDS PER LB: $nbackends"
+
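+# Run ovn-nbctl/ovn-sbctl in daemon mode to speed up the calls below.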
+export OVN_NB_DAEMON=$(ovn-nbctl --detach)
+export OVN_SB_DAEMON=$(ovn-sbctl --detach)
+trap "killall -9 ovn-nbctl; killall -9 ovn-sbctl" EXIT
+
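+# Per simulated node: one gateway router (+ router port) and one switch
+# (+ switch port), all attached to a single shared load balancer group.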
+lbg=$(ovn-nbctl create load_balancer_group name=lbg)
+for i in $(seq $nrtr); do
+    r=lr-$i
+    lrp=lrp-$i
+    echo Router $r
+    ovn-nbctl lr-add $r -- set logical_router $r load_balancer_group=$lbg
+    ovn-nbctl lrp-add $r $lrp 00:00:00:00:01:00 88.88.88.88
+    s=ls-$i
+    echo Switch $s
+    ovn-nbctl ls-add $s -- set logical_switch $s load_balancer_group=$lbg
+    lsp=lsp-$i
+    echo LSP $lsp
+    ovn-nbctl lsp-add $s $lsp
+    ovs-vsctl add-port br-int $lsp -- set interface $lsp external_ids:iface-id=$lsp
+done
+
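+# One templated LB per service; "^vip:$l" and "^backends$l" reference
+# per-chassis template variables instead of concrete values.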
+for l in $(seq $nlb); do
+    lb=lb-$l
+    ovn-nbctl --template lb-add $lb "^vip:$l" "^backends$l" tcp
+    lb_uuid=$(ovn-nbctl --columns _uuid --bare find load_balancer name=$lb)
+    ovn-nbctl add load_balancer_group $lbg load_balancer $lb_uuid
+done
+
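+# Per-chassis template variable values, batched 1000 per ovn-nbctl call.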
+for i in $(seq $nrtr); do
+    ovn-nbctl create chassis_template_var name=vip value=42.42.42.$i chassis_name="chassis-$i"
+
+    cmd=
+    for j in $(seq $nlb); do
+        echo "CREATING TEMPLATE VARS for RTR $i LB $j"
+        backends=""
+        for k in $(seq $nbackends); do
+            j1=$(expr $j / 250)
+            j2=$(expr $j % 250)
+            backends="42.$k.$j1.$j2:$j,$backends"
+        done
+        cmd="$cmd -- create chassis_template_var name=backends$j 
value=\"$backends\" chassis_name=\"chassis-$i\""
+        if [ $(expr $j % 1000) -eq "0" ]; then
+            ovn-nbctl $cmd
+            cmd=
+        fi
+    done
+    ovn-nbctl $cmd
+done
