On Tue, Sep 6, 2022 at 6:42 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 8/31/22 16:59, Han Zhou wrote:
> > On Wed, Aug 31, 2022 at 1:38 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> On 8/31/22 02:17, Han Zhou wrote:
> >>>> On Tue, Aug 30, 2022 at 8:18 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>
> >>>> Hi Han,
> >>>>
> >>>> On 8/24/22 08:40, Han Zhou wrote:
> >>>>> The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
> >>>>> registers is causing a critical dataplane performance impact for
> >>>>> short-lived connections, because it unwildcards megaflows with an exact
> >>>>> match on dst IP and L4 ports. Any new connection with a different
> >>>>> client-side L4 port will encounter a datapath flow miss and upcall to
> >>>>> ovs-vswitchd, which makes typical use cases such as HTTP/1.0-based
> >>>>> RESTful API calls suffer big performance degradations.
> >>>>>
> >>>>> These fields (dst IP and port) were saved to registers to solve a
> >>>>> problem in the LB hairpin use case when different VIPs share
> >>>>> overlapping backend+port [0]. The change [0] might not have had as wide
> >>>>> a performance impact as it does now because, at that time, one of the
> >>>>> match conditions, "REGBIT_CONNTRACK_NAT == 1", was set only for
> >>>>> established and natted traffic, while now the impact is more obvious
> >>>>> because REGBIT_CONNTRACK_NAT is set for all IP traffic (if any VIP is
> >>>>> configured on the LS) since commit [1], after several other indirectly
> >>>>> related optimizations and refactors.
> >>>>>
> >>>>> This patch fixes the problem by modifying the priority-120 flows in
> >>>>> ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for
> >>>>> any traffic with REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
> >>>>> port only for traffic matching the LB VIPs, because these are the ones
> >>>>> that need to be saved for the hairpin purpose. The existing
> >>>>> priority-110 flows will match the rest of the traffic just like before
> >>>>> but will not save dst IP and L4 port, so any server->client traffic
> >>>>> would not unwildcard megaflows with client-side L4 ports.
> >>>>>
> >>>>
> >>>> Just to be 100% sure, the client->server traffic will still generate
> >>>> up to V x H flows where V is "the number of VIP:port tuples" and H is
> >>>> "the number of potential (dp_)hash values", right?
> >>>
> >>> Yes, if all the VIPs and backends are being accessed at the same time.
> >>> This is expected. For any given pair of client<->server, regardless of
> >>> the connections (source port number changes), the packets should match
> >>> the same mega-flow and be handled by the fast path (instead of going to
> >>> userspace, which is the behavior before this patch).
> >>>
> >>
> >> Thanks for the confirmation!
> >>
> >>>>
> >>>>> [0] ce0ef8d59850 ("Properly handle hairpin traffic for VIPs with shared backends.")
> >>>>> [1] 0038579d1928 ("northd: Optimize ct nat for load balancer traffic.")
> >>>>>
> >>>>> Signed-off-by: Han Zhou <hz...@ovn.org>
> >>>>> ---
> >>>>> v1 -> v2: Add the missing changes for ovn-northd.8.xml which I forgot
> >>>>>           to commit before sending v1.
> >>>>>
> >>>>>  northd/northd.c         | 125 +++++++++++++++++++++++++---------------
> >>>>>  northd/ovn-northd.8.xml |  14 ++---
> >>>>>  tests/ovn-northd.at     |  87 ++++++++++------------------
> >>>>>  tests/ovn.at            |  14 ++---
> >>>>>  4 files changed, 118 insertions(+), 122 deletions(-)
> >>>>>
> >>>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>>> index 7e2681865..860641936 100644
> >>>>> --- a/northd/northd.c
> >>>>> +++ b/northd/northd.c
> >>>>> @@ -273,15 +273,15 @@ enum ovn_stage {
> >>>>>   * |    | REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} |   |                  |
> >>>>>   * |    |     REGBIT_ACL_LABEL                         | X |                  |
> >>>>>   * +----+----------------------------------------------+ X |                  |
> >>>>> - * | R1 |         ORIG_DIP_IPV4 (>= IN_STATEFUL)       | R |                  |
> >>>>> + * | R1 |         ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |                  |
> >>>>>   * +----+----------------------------------------------+ E |                  |
> >>>>> - * | R2 |         ORIG_TP_DPORT (>= IN_STATEFUL)       | G |                  |
> >>>>> + * | R2 |         ORIG_TP_DPORT (>= IN_PRE_STATEFUL)   | G |                  |
> >>>>>   * +----+----------------------------------------------+ 0 |                  |
> >>>>>   * | R3 |                  ACL LABEL                   |   |                  |
> >>>>>   * +----+----------------------------------------------+---+------------------+
> >>>>>   * | R4 |                   UNUSED                     |   |                  |
> >>>>> - * +----+----------------------------------------------+ X | ORIG_DIP_IPV6    |
> >>>>> - * | R5 |                   UNUSED                     | X | (>= IN_STATEFUL) |
> >>>>> + * +----+----------------------------------------------+ X | ORIG_DIP_IPV6(>= |
> >>>>> + * | R5 |                   UNUSED                     | X | IN_PRE_STATEFUL) |
> >>>>>   * +----+----------------------------------------------+ R |                  |
> >>>>>   * | R6 |                   UNUSED                     | E |                  |
> >>>>>   * +----+----------------------------------------------+ G |                  |
> >>>>> @@ -5899,43 +5899,17 @@ build_pre_stateful(struct ovn_datapath *od,
> >>>>>      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_STATEFUL, 0, "1", "next;");
> >>>>>      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_STATEFUL, 0, "1", "next;");
> >>>>>
> >>>>> -    const char *ct_lb_action = features->ct_no_masked_label
> >>>>> -                               ? "ct_lb_mark"
> >>>>> -                               : "ct_lb";
> >>>>> -    const char *lb_protocols[] = {"tcp", "udp", "sctp"};
> >>>>> -    struct ds actions = DS_EMPTY_INITIALIZER;
> >>>>> -    struct ds match = DS_EMPTY_INITIALIZER;
> >>>>> -
> >>>>> -    for (size_t i = 0; i < ARRAY_SIZE(lb_protocols); i++) {
> >>>>> -        ds_clear(&match);
> >>>>> -        ds_clear(&actions);
> >>>>> -        ds_put_format(&match, REGBIT_CONNTRACK_NAT" == 1 && ip4 && %s",
> >>>>> -                      lb_protocols[i]);
> >>>>> -        ds_put_format(&actions, REG_ORIG_DIP_IPV4 " = ip4.dst; "
> >>>>> -                                REG_ORIG_TP_DPORT " = %s.dst; %s;",
> >>>>> -                      lb_protocols[i], ct_lb_action);
> >>>>> -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_STATEFUL, 120,
> >>>>> -                      ds_cstr(&match), ds_cstr(&actions));
> >>>>> -
> >>>>> -        ds_clear(&match);
> >>>>> -        ds_clear(&actions);
> >>>>> -        ds_put_format(&match, REGBIT_CONNTRACK_NAT" == 1 && ip6 && %s",
> >>>>> -                      lb_protocols[i]);
> >>>>> -        ds_put_format(&actions, REG_ORIG_DIP_IPV6 " = ip6.dst; "
> >>>>> -                                REG_ORIG_TP_DPORT " = %s.dst; %s;",
> >>>>> -                      lb_protocols[i], ct_lb_action);
> >>>>> -        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_STATEFUL, 120,
> >>>>> -                      ds_cstr(&match), ds_cstr(&actions));
> >>>>> -    }
> >>>>> +    /* Note: priority-120 flows are added in build_lb_rules_pre_stateful(). */
> >>>>>
> >>>>> -    ds_clear(&actions);
> >>>>> -    ds_put_format(&actions, "%s;", ct_lb_action);
> >>>>> +    const char *ct_lb_action = features->ct_no_masked_label
> >>>>> +                               ? "ct_lb_mark;"
> >>>>> +                               : "ct_lb;";
> >>>>>
> >>>>>      ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_STATEFUL, 110,
> >>>>> -                  REGBIT_CONNTRACK_NAT" == 1", ds_cstr(&actions));
> >>>>> +                  REGBIT_CONNTRACK_NAT" == 1", ct_lb_action);
> >>>>>
> >>>>>      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_STATEFUL, 110,
> >>>>> -                  REGBIT_CONNTRACK_NAT" == 1", ds_cstr(&actions));
> >>>>> +                  REGBIT_CONNTRACK_NAT" == 1", ct_lb_action);
> >>>>>
> >>>>>      /* If REGBIT_CONNTRACK_DEFRAG is set as 1, then the packets should be
> >>>>>       * sent to conntrack for tracking and defragmentation. */
> >>>>> @@ -5945,8 +5919,6 @@ build_pre_stateful(struct ovn_datapath *od,
> >>>>>      ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_STATEFUL, 100,
> >>>>>                    REGBIT_CONNTRACK_DEFRAG" == 1", "ct_next;");
> >>>>>
> >>>>> -    ds_destroy(&actions);
> >>>>> -    ds_destroy(&match);
> >>>>>  }
> >>>>>
> >>>>>  static void
> >>>>> @@ -6841,22 +6813,16 @@ build_qos(struct ovn_datapath *od, struct hmap *lflows) {
> >>>>>  }
> >>>>>
> >>>>>  static void
> >>>>> -build_lb_rules(struct hmap *lflows, struct ovn_northd_lb *lb, bool ct_lb_mark,
> >>>>> -               struct ds *match, struct ds *action,
> >>>>> -               const struct shash *meter_groups)
> >>>>> +build_lb_rules_pre_stateful(struct hmap *lflows, struct ovn_northd_lb *lb,
> >>>>> +                            bool ct_lb_mark, struct ds *match,
> >>>>> +                            struct ds *action)
> >>>>>  {
> >>>>>      for (size_t i = 0; i < lb->n_vips; i++) {
> >>>>>          struct ovn_lb_vip *lb_vip = &lb->vips[i];
> >>>>> -        struct ovn_northd_lb_vip *lb_vip_nb = &lb->vips_nb[i];
> >>>>> -        const char *ip_match = NULL;
> >>>>> -
> >>>>>          ds_clear(action);
> >>>>>          ds_clear(match);
> >>>>> +        const char *ip_match = NULL;
> >>>>>
> >>>>> -        /* Make sure that we clear the REGBIT_CONNTRACK_COMMIT flag. Otherwise
> >>>>> -         * the load balanced packet will be committed again in
> >>>>> -         * S_SWITCH_IN_STATEFUL. */
> >>>>> -        ds_put_format(action, REGBIT_CONNTRACK_COMMIT" = 0; ");
> >>>>>          /* Store the original destination IP to be used when generating
> >>>>>           * hairpin flows.
> >>>>>           */
> >>>>> @@ -6887,6 +6853,67 @@ build_lb_rules(struct hmap *lflows, struct ovn_northd_lb *lb, bool ct_lb_mark,
> >>>>>              ds_put_format(action, REG_ORIG_TP_DPORT " = %"PRIu16"; ",
> >>>>>                            lb_vip->vip_port);
> >>>>>          }
> >>>>> +        ds_put_format(action, "%s;", ct_lb_mark ? "ct_lb_mark" : "ct_lb");
> >>>>> +
> >>>>> +        ds_put_format(match, "%s.dst == %s", ip_match, lb_vip->vip_str);
> >>>>> +        if (lb_vip->vip_port) {
> >>>>> +            ds_put_format(match, " && %s.dst == %d", proto, lb_vip->vip_port);
> >>>>> +        }
> >>>>> +
> >>>>> +        struct ovn_lflow *lflow_ref = NULL;
> >>>>> +        uint32_t hash = ovn_logical_flow_hash(
> >>>>> +                ovn_stage_get_table(S_SWITCH_IN_PRE_STATEFUL),
> >>>>> +                ovn_stage_get_pipeline(S_SWITCH_IN_PRE_STATEFUL), 120,
> >>>>> +                ds_cstr(match), ds_cstr(action));
> >>>>> +
> >>>>> +        for (size_t j = 0; j < lb->n_nb_ls; j++) {
> >>>>> +            struct ovn_datapath *od = lb->nb_ls[j];
> >>>>> +
> >>>>> +            if (!ovn_dp_group_add_with_reference(lflow_ref, od)) {
> >>>>> +                lflow_ref = ovn_lflow_add_at_with_hash(
> >>>>> +                        lflows, od, S_SWITCH_IN_PRE_STATEFUL, 120,
> >>>>> +                        ds_cstr(match), ds_cstr(action),
> >>>>> +                        NULL, NULL, &lb->nlb->header_,
> >>>>> +                        OVS_SOURCE_LOCATOR, hash);
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>
> >>>> I see how this can fix the dataplane issue because we're limiting the
> >>>> number of dp flows, but this now induces a measurable penalty on the
> >>>> control plane side because we essentially double the number of logical
> >>>> flows that ovn-northd generates for load balancer VIPs.
> >>>>
> >>>> Using a NB database [2] from a 250-node density-heavy test [3] we ran
> >>>> in-house, I see the following:
> >>>>
> >>>> a. before this patch (main):
> >>>> - number of SB lflows:         146233
> >>>> - SB size on disk (compacted): 70MB
> >>>> - northd poll loop intervals:  8525ms
> >>>>
> >>>> b. with this patch:
> >>>> - number of SB lflows:         163303
> >>>> - SB size on disk (compacted): 76MB
> >>>> - northd poll loop intervals:  9958ms
> >>>>
> >>>> [2] https://people.redhat.com/~dceara/250-density-heavy/20220830/ovnnb_db.db.gz
> >>>> [3] https://github.com/dceara/ovn-heater/blob/main/test-scenarios/ocp-250-density-heavy.yml
> >>>>
> >>>> This raises some concerns.  However, I went ahead and also ran a test
> >>>> with Ilya's load balancer optimization series [4] (up for review) and
> >>>> I got:
> >>>>
> >>>> c. with this patch + Ilya's series:
> >>>> - number of SB lflows:         163303  << Didn't change compared to "b"
> >>>> - SB size on disk (compacted): 76MB    << Didn't change compared to "b"
> >>>> - northd poll loop intervals:  6437ms  << 25% better than "a"
> >>>>
> >>>> Then I ran a test with main + Ilya's series.
> >>>>
> >>>> d. main + Ilya's series:
> >>>> - number of SB lflows:         146233  << Didn't change compared to "a"
> >>>> - SB size on disk (compacted): 70MB    << Didn't change compared to "a"
> >>>> - northd poll loop intervals:  5413ms  << 35% better than "a"
> >>>>
> >>>> [4] https://patchwork.ozlabs.org/project/ovn/list/?series=315213&state=*
> >>>>
> >>>> I guess what I'm trying to say is that if we go ahead with the
> >>>> approach you proposed, and especially if we want to backport it to the
> >>>> LTS, we should consider accepting/backporting other optimizations too
> >>>> (such as Ilya's) that would compensate (and more) for the control plane
> >>>> performance hit.
> >>>>
> >>> Thanks for the scale test!
> >>> I do expect some cost in the control plane, and your test result is
> >>> aligned with my expectation. It is unpleasant to see a ~10% SB size
> >>> increase (in the density-heavy and LB/service-heavy test scenario), but
> >>> if it is necessary to fix the dataplane performance bug I think it is
> >>> worth it. After all, everything the control plane does is to serve the
> >>> dataplane, right? The resulting dataplane performance boost is at least
> >>> 10x for short-lived connection scenarios.
> >>> It's great to see that Ilya's patches help the LB scale, and I have no
> >>> objection to backporting them to stable branches if that is considered
> >>> necessary, but I'd consider it independent of the current patch.
> >>>
> >>
> >> They are independent.  But the problem is your fix will visibly impact
> >> control plane latency (~16% on my setup).  So, if we can avoid some of
> >> that hit by considering both patchsets together, I'd vote for that.
> >>
> > Again, no objection to backporting Ilya's patches, but these are the
> > criteria in my mind:
> >
> > 1. For this patch, it improves dataplane performance (to me it is in fact
> > fixing a performance bug) at the cost of the control plane. This is not
> > uncommon in the OVN project history - when adding some new features
> > sometimes it does add control plane cost. When this happens, whether to
> > accept the new feature is based on the value of the feature v.s. the cost
> > of the control plane - a tradeoff has to be made in many cases. In this
> > case, for a 16% control plane latency increase we get a 10x dataplane
> > speed up, so I would vote for it.
>
> Hi Han,
>
> I'm not arguing that we don't get dataplane speed up.  I'm sure we will
> and it should be a decent amount.  But I've been trying to replicate
> this 10x boost with an ovn-kubernetes kind deployment and I didn't
> really manage to.
>
> I did this on a 28 core Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz in our
> lab.
>
> a. build an ovnkube image (with and without your patch); essentially
> what our ovn-kubernetes CI jobs do:
>
> https://github.com/ovn-org/ovn/blob/main/.github/workflows/ovn-kubernetes.yml#L39
>
> b. start a kind cluster with the image we built at step "a".
>
> c. edit the ovs-node daemonset definition and increase the resource
> limits to 16 CPUs and 2GB RSS (the kind defaults are very low: 0.2CPU
> and 300MB).  This restarts the ovs-daemon pods, wait for those to come
> back up.
>
> d. start an nginx deployment with 50 replicas: this is essentially 50
> backends accepting http requests on port 80.
>
> e. create a service load balancing http requests received on a port,
> e.g. 42424, to port 80 on the pods in the nginx deployment above.
>
> f. create a clients deployment with 100 replicas: 100 pods running the
> following bash script, a hack to simulate 40 new connections with
> different ephemeral source ports every 2 seconds:
>
> i=0
> while true; do
>   i=$((i+1))
>   [ $((i%40)) -eq 0 ] && sleep 2
>   nc -z -v $SERVICE_VIP 42424 &
> done
>
> Comparing the runs with and without your patch applied I see improvement:
>
> 1. without this patch applied:
> - 17K, 18K, 27K datapath flows on the 3 nodes of the cluster
> - OVS cpu usage on the 3 nodes: 30-50%
>
> 2. with this patch applied:
> - 9K, 9.5K, 9.K datapath flows on the 3 nodes of the cluster
> - OVS cpu usage on the 3 nodes: 10-20%
>
> In both cases the whole system is not overloaded; OVS doesn't reach its
> CPU limits.
>
> I see clear improvement but I was expecting to see more than this.  It
> does look like my test doesn't generate enough unique sources per second
> so I was wondering if you could share your test methodology?
>

Thanks Dumitru for the performance test. I guess if you reduce the number
of clients while increasing the connection rate per client you would see
a bigger difference. And I am not sure why you saw 10-20% OVS CPU usage
with the patch applied. If all the 100 clients keep sending packets you
should have a constant 100xN (N=2 or 3) megaflows instead of 9K (unless
there are other problems causing flow misses). BTW, I assume you don't
have HW-offload enabled, right? Could you also try sending to a backend
directly instead of the VIP and see the difference?
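
For example, something along these lines is what I have in mind - just a
sketch, where SERVICE_VIP/BACKEND_IP are placeholders for your environment
and the ports are the ones from your description:

#!/bin/bash
# Drive connections from a single client pod as fast as it can manage,
# first against the VIP and then against one backend directly, and compare
# OVS CPU usage and datapath flow counts between the two runs.
for dst in "$SERVICE_VIP:42424" "$BACKEND_IP:80"; do
    host=${dst%:*}; port=${dst#*:}
    echo "=== $dst ==="
    # 10k sequential connections, each from a new ephemeral source port.
    time for i in $(seq 10000); do
        nc -z -w 1 "$host" "$port" >/dev/null 2>&1
    done
    # Rough checks on the node while the loop runs (from a separate shell):
    #   ovs-appctl dpctl/dump-flows | wc -l
    #   top -b -n 1 -p "$(pidof ovs-vswitchd)"
done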

Here is a brief test I ran, with the patch below applied on top of the
system test case "load-balancing". It is basically a single client sending
requests to one of the LB backends using "ab" (the Apache benchmark tool).
-------------------------------------------------------------------------
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 992813614..3adc72da1 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -1430,10 +1430,12 @@ ovn-nbctl lsp-add bar bar3 \
 # Config OVN load-balancer with a VIP.
 ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"
 ovn-nbctl ls-lb-add foo lb1
+ovn-nbctl ls-lb-add bar lb1

 # Create another load-balancer with another VIP.
 lb2_uuid=`ovn-nbctl create load_balancer name=lb2 vips:30.0.0.3="172.16.1.2,172.16.1.3,172.16.1.4"`
 ovn-nbctl ls-lb-add foo lb2
+ovn-nbctl ls-lb-add bar lb2

 # Config OVN load-balancer with another VIP (this time with ports).
 ovn-nbctl set load_balancer $lb2_uuid vips:'"30.0.0.2:8000"'='"172.16.1.2:80,172.16.1.3:80,172.16.1.4:80"'
@@ -1503,6 +1505,10 @@ tcp,orig=(src=192.168.1.2,dst=30.0.0.3,sport=<cleared>,dport=<cleared>),reply=(s
 
tcp,orig=(src=192.168.1.2,dst=30.0.0.3,sport=<cleared>,dport=<cleared>),reply=(src=172.16.1.4,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=2,protoinfo=(state=<cleared>)
 ])

+#check ovn-nbctl --apply-after-lb acl-add foo from-lport 1002 "ip4 && ip4.dst == {172.16.1.2,172.16.1.3,172.16.1.4} && tcp.dst == 80" allow-related
+#check ovn-nbctl --apply-after-lb acl-add foo from-lport 1002 "ip4" allow-related
+#check ovn-nbctl --wait=hv sync
+
 dnl Test load-balancing that includes L4 ports in NAT.
 dnl Each server should have at least one connection.
 OVS_WAIT_FOR_OUTPUT([
@@ -1516,6 +1522,15 @@ tcp,orig=(src=192.168.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(s
 
tcp,orig=(src=192.168.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=172.16.1.4,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,mark=2,protoinfo=(state=<cleared>)
 ])

+ip netns exec foo1 ab -n 10000 -c 1 -q 172.16.1.2:80/
+
 # Configure selection_fields.
 ovn-nbctl set load_balancer $lb2_uuid selection_fields="ip_src,ip_dst,tp_src,tp_dst"
 OVS_WAIT_UNTIL([
-----------------------------------------------------------------------

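(For reference, the modified test can be run on its own with something like
the commands below - a sketch, not exact steps: the make target name and
test keyword may vary per tree, it needs root, and "ab" from apache2-utils
must be installed on the host.)

# Apply the test tweak above (hypothetical file name), then run only the
# "load-balancing" system test against the kernel datapath.
git apply lb-test-tweak.patch
sudo make check-kernel TESTSUITEFLAGS="-k load-balancing"
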
Tests were run on 24 cores x Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz.
The test results are:

// Before the change
Concurrency Level:      1
Time taken for tests:   12.913 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      21450000 bytes
HTML transferred:       19590000 bytes
Requests per second:    774.41 [#/sec] (mean)
Time per request:       1.291 [ms] (mean)
Time per request:       1.291 [ms] (mean, across all concurrent requests)
Transfer rate:          1622.18 [Kbytes/sec] received

OVS CPU: ~280%

// After the change
Concurrency Level:      1
Time taken for tests:   5.629 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      21450000 bytes
HTML transferred:       19590000 bytes
Requests per second:    1776.62 [#/sec] (mean)
Time per request:       0.563 [ms] (mean)
Time per request:       0.563 [ms] (mean, across all concurrent requests)
Transfer rate:          3721.54 [Kbytes/sec] received

OVS CPU: ~0%

We can see that the RPS is about 2.3x higher (1776 v.s. 774) even with a
single thread (-c 1). It is not the 10x I guessed earlier, but keep in
mind that each "ab" connection includes a complete TCP handshake, the
request/response, and finally the teardown - usually 6 packets per
connection - so at least half of the packets still go through the fast
path even before the change. But for the "first" packet of each
connection, the speed-up should be at the 10x level, which we can easily
tell with a ping test by comparing the RTT of the first packet, which
goes through the slow path, against the later packets, which take the
fast path only.
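
For instance, a quick sketch using the namespaces and addresses from the
test above:

# The first echo request takes the slow path (an upcall to ovs-vswitchd)
# and typically shows a much larger RTT than the following ones, which hit
# the megaflow installed in the datapath.
ip netns exec foo1 ping -c 5 172.16.1.2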

And the CPU utilization of OVS is huge before the change, while after the
change it is ~0. So in more complex scenarios with the system overloaded,
I believe the difference would be even bigger.
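
For anyone reproducing this, a few standard ways to watch the OVS side
during such a run (a sketch; any equivalent tooling works):

ovs-appctl upcall/show                    # upcall handler stats and flow counts
ovs-appctl dpctl/dump-flows | wc -l       # number of megaflows currently installed
pidstat -u -p "$(pidof ovs-vswitchd)" 1   # ovs-vswitchd CPU usage over time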

> For the OVN code changes, they look good to me.  I think we should apply
> them to the main branch.  For that:
>
> Acked-by: Dumitru Ceara <dce...@redhat.com>
>
> For backporting the change to stable branches I'd really like to wait
> until we first understand the exact scenario in which the megaflow
> unwildcarding created such a big performance regression in ovn-kubernetes.

Thank you for all the reviews and tests. I applied the patch to main and
am waiting for your confirmation before backporting to the stable branches.
For branch-22.09, though, I think we should probably backport it now,
before it is released?

Han

>
> Thanks,
> Dumitru
>
> >
> > 2. For backporting Ilya's optimization, I'd evaluate it using the
> > criteria for any optimization backporting. Say the patch has a 10-20%
> > speed up, and we think that speed up is critical for a branch for some
> > reason, and the changes are well contained - then I'd vote for the
> > backport.
> >
> > Does this make sense?
> > @Numan Siddique <num...@ovn.org> @Mark Michelson <mmich...@redhat.com>
> > any comments?
> >
> > Thanks,
> > Han
> >
> >> Given that Ilya's changes seem very contained I don't think this should
> >> be a problem.  They just need a proper review first.
> >>
> >>>> I tried to think of an alternative that wouldn't require doubling the
> >>>> number of VIP logical flows but couldn't come up with one.  Maybe you
> >>>> have more ideas?
> >>>>
> >>> This is the approach that has the least impact and changes to existing
> >>> pipelines I have thought about so far. The earlier attempt wouldn't
> >>> increase the number of flows but it had to compromise by either failing
> >>> to support a corner case (that we are still not sure is really useful
> >>> at all) or losing HW offloading capability. I'll still think about some
> >>> more approaches, but the NAT/LB/ACL pipelines are very complex now and
> >>> I'd avoid big changes in case any corner case is broken. I'd spend more
> >>> time sorting out different use cases and the documentation before
> >>> making more optimizations, but I do think we should fix this
> >>> performance bug as soon as we can. (I am also curious why we haven't
> >>> heard other OVN users complain about this yet ...)
> >>
> >> It's a guess, but we probably didn't notice this issue because it
> >> should be more visible when there are lots of short-lived connections.
> >>
> >> Thanks,
> >> Dumitru
> >>
> >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
