On Wed, Jul 6, 2022 at 8:45 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> Hi Han,
>
> On 7/6/22 00:41, Han Zhou wrote:
> > The ls_in_pre_stateful priority-120 flow that saves the dst IP and port
> > to registers causes a critical dataplane performance impact on
> > short-lived connections, because it unwildcards megaflows with an exact
> > match on the dst IP and L4 ports. Any new connection with a different
> > client-side L4 port encounters a datapath flow miss and an upcall to
> > ovs-vswitchd.
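> >
> > For example (an illustrative sketch with made-up addresses, not actual
> > output), each short-lived connection ends up with its own datapath
> > megaflow along the lines of:
> >
> >   recirc_id(0),in_port(2),eth_type(0x0800),
> >   ipv4(dst=10.96.0.10,frag=no),tcp(src=34567,dst=80), actions:...
> >
> > A second connection from the same client, differing only in tcp
> > src=34568, cannot reuse this megaflow and triggers another upcall.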
> >
> > These fields (dst IP and port) were saved to registers to solve a
> > problem in the LB hairpin use case where different VIPs share an
> > overlapping backend+port [0]. The change [0] might not have had as wide
> > a performance impact at the time as it does now, because back then the
> > match condition "REGBIT_CONNTRACK_NAT == 1" was set only for
> > established and natted traffic. The impact is more obvious now because,
> > since commit [1] (and after several other indirectly related
> > optimizations and refactors), REGBIT_CONNTRACK_NAT is set for all IP
> > traffic (if any VIP is configured on the LS).
> >
> > The changes that introduced the performance problem have their own
> > value (they fix problems or optimize performance), so we don't want to
> > revert any of them (and it would also not be straightforward to revert
> > any of them, because there have been many changes and refactors on top
> > of them).
> >
> > Change [0] itself added an alternative way to solve the overlapping
> > backends problem, which utilizes ct fields instead of saving the dst IP
> > and port to registers. This patch switches entirely to that approach
> > and removes the flows/actions that save the dst IP and port, to avoid
> > the dataplane performance problem for short-lived connections.
> >
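> > As a rough sketch (the OVS match field names are real; the addresses
> > and ports are hypothetical), the ct-based hairpin check matches on the
> > conntrack original-direction tuple instead of the registers, e.g.:
> >
> >   ct_state=+trk+dnat,ip,nw_src=10.244.0.5,ct_nw_dst=10.96.0.10,
> >   ct_nw_proto=6,ct_tp_dst=80
> >
> > rather than comparing the packet against values saved in reg1/reg2.
> >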
> > (With this approach, the ct_state DNAT match is not HW-offload
> > friendly, so it may result in those flows not being offloaded; this is
> > supposed to be solved in a follow-up patch.)
> >
> > [0] ce0ef8d59850 ("Properly handle hairpin traffic for VIPs with shared
> > backends.")
> > [1] 0038579d1928 ("northd: Optimize ct nat for load balancer traffic.")
> >
> > Signed-off-by: Han Zhou <hz...@ovn.org>
> > ---
>
> I think the main concern I have is that this forces us to choose between:
> a. non-HWOL-friendly flows (reduced performance)
> b. less functionality (with the knob in patch 3/3 set to false).
>
Thanks Dumitru for the comments! I agree the solution is not ideal, but
if we look at it from a different angle, even with a), for most
pod->service traffic the performance is still much better than it is
today (a non-offloaded kernel datapath is still much better than the
userspace slowpath). And *hopefully* b) is ok for most use cases that
want HW-offload capability.

> Change [0] was added to address the case when a service in kubernetes is
> exposed via two different k8s services objects that share the same
> endpoints.  That translates in ovn-k8s to two different OVN load
> balancer VIPs that share the same backends.  For such cases, if the
> service is being accessed by one of its own backends we need to be able
> to differentiate based on the VIP address it used to connect to.
>
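> To make that concrete (a hypothetical example, not taken from the bug
> report), two OVN load balancers configured as
>
>   ovn-nbctl lb-add lb1 10.96.0.10:80 10.244.0.5:8080
>   ovn-nbctl lb-add lb2 10.96.0.20:80 10.244.0.5:8080
>
> share the backend 10.244.0.5.  If the pod at 10.244.0.5 connects to
> either VIP and is selected as the backend itself (hairpin), the reply
> must be un-DNATed to the VIP that was actually used, so the VIP cannot
> be inferred from the backend alone.
>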
> CC: Tim Rozet, Dan Williams for some more input from the ovn-k8s side on
> how common it is that an OVN-networked pod accesses two (or more)
> services that might have the pod itself as a backend.
>

Yes, we definitely need input from the ovn-k8s side. The information we
have so far: change [0] was made to fix a bug [2] reported by Tim.
However, the bug description didn't mention anything about two VIPs
sharing the same backend. Tim also mentioned in the ovn-k8s meeting last
week that the original user bug report behind [2] was [3], and [3] was in
fact a completely different problem (although it is related to hairpin,
too). So, I am under the impression that "an OVN-networked pod accesses
two (or more) services that might have the pod itself as a backend" might
be a very rare use case, if it exists at all.

> If this turns out to be mandatory I guess we might want to also look
> into alternatives like:
> - getting help from the HW to offload matches like ct_tuple()

I believe this is going to happen in the future, since HWOL capabilities
are continuously being enhanced.

> - limiting the impact of "a." only to some load balancers (e.g., would
> it help to use different hairpin lookup tables for such load balancers?)

I am not sure if this would work, and I am not sure if it is a good
approach, either. In general, I believe it is possible to solve the
problem with more complex pipelines, but we need to keep in mind that it
is quite easy to introduce other performance problems (either control
plane or data plane) - many of the changes that led to the current
implementation were performance optimizations, some for the control
plane, some for HWOL, and some for reducing recirculations. I'd avoid
complexity unless it is really necessary. Let's get more input on the
problem, and based on that we can decide whether we want to move to a
more complex solution.

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1931599
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1903651

Thanks,
Han