On Mon, May 23, 2022 at 3:49 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/21/22 12:49, Frode Nordahl wrote:
> > On Thu, May 19, 2022 at 3:39 PM Frode Nordahl
> > <frode.nord...@canonical.com> wrote:
> >>
> >> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>
> >>> On 5/13/22 10:36, Frode Nordahl wrote:
> >>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> 
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
> >>>>
> >>>> After OVN started colocating sNAT and dNAT operations in the same CT
> >>>> zone [0] the above mentioned OVS commit appears to also break OVN [1].
> >>>> I have been discussing with Numan and he has found a correlation with
> >>>> the behavior of the open vswitch kernel data path conntrack
> >>>> `skb_nfct_cached` function, i.e. if that is changed to always return
> >>>> 'false' the erratic behavior disappears.
> >>>>
> >>>> So I guess the question then becomes, is this a OVS userspace or OVS
> >>>> kernel problem?
> >>>>
> >>>> We have a reproducer in [3].
> >>>>
> >>>> 0: 
> >>>> https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
> >>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
> >>>> 2: 
> >>>> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
> >>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
> >>>>
> >>>
> >>> Hrm.  I think, there is a logical bug in implementations of ct()
> >>> datapath action in both datapaths.
> >>>
> >>> The problem appears when the OpenFlow pipeline for the packet contains
> >>> several ct() actions.  NXAST_CT action (i.e. every ct() action) must
> >>> always put the packet into an untracked state.  Tracked state will be
> >>> set only in the 'recirc_table' where the packet is cloned by the ct()
> >>> action for further processing.
> >>>
> >>> If an OF pipeline looks like this:
> >>>
> >>>   actions=ct(),something,something,ct(),something
> >>>
> >>> each action must be entered with the 'untracked' conntrack state in
> >>> the packet metadata.
> >>>
> >>> However, ct() action in the datapath, unlike OpenFlow, doesn't work
> >>> this way.  It modifies the conntrack state for the packet it is 
> >>> processing.
> >>> During the flow translation OVS inserts recirculation right after the
> >>> datapath ct() action.  This way conntrack state affects only processing
> >>> after recirculation.  Sort of.
> >>>
> >>> The issue is that not all OpenFlow ct() actions have recirc_table
> >>> specified.  These actions supposed to change the state of the conntrack
> >>> subsystem, but they still must leave the packet itself in the untracked
> >>> state with no strings attached.  But since datapath ct() actions doesn't
> >>> work that way and since recirculation is not inserted (no recirc_table
> >>> specified), packet after conntrack execution carries the state along to
> >>> all other actions.
> >>> This doesn't impact normal actions, but seems to break subsequent ct()
> >>> actions on the same pipeline.
> >>>
> >>> In general, it looks to me that ct() action in the datapath is
> >>> not supposed to cache any connection data inside the packet's metadata.
> >>> This seems to be the root cause of the problem.  Fields that OF knows
> >>> about should not trigger any issues if carried along with a packet,
> >>> but datapath-specific metadata can not be cleared, because OFproto
> >>> simply doesn't know about it.
> >>>
> >>> But, I guess, not caching the connection and other internal structures
> >>> might significantly impact datapath performance.  Will it?
> >>>
> >>> Looking at the reproducer in [3], it has, for example, the flow with the
> >>> following actions:
> >>>
> >>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
> >>>           set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
> >>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
> >>>           ct(zone=8),recirc(0x4)
> >>>
> >>> So, if the first ct() will change some datapath-specific packet metadata,
> >>> the second ct() may try to use that information.
> >>>
> >>> It looks like during the flow translation we must add ct_clear action
> >>> explicitly after every ct() action, unless it was the last action in
> >>> the list.  This will end up with adding a lot of ct_clear actions
> >>> everywhere.
> >>>
> >>> Another option is the patch below that tries to track if ct_clear
> >>> is required and adds that action before the next ct() action in
> >>> the datapath.
> >>>
> >>> BTW, the test [3] fails with both kernel and userspace datapaths.
> >>>
> >>> The change below should fix the problem, I think.
> >>> It looks like we also have to put ct_clear action before translating
> >>> output to the patch port if we're in 'conntracked' state.
> >>>
> >>> And I do not know how to fix the problem for kernels without ct_clear
> >>> support.  I don't think we can clear metadata that is internal to
> >>> the kernel.  Ideas are welcome.
> >>>
> >>> CC: Aaron, Paolo.
> >>>
> >>> Any thoughts?
> >>
> >> Thank you so much for the detailed explanation of what's going on and
> >> for providing a proposed fix.
> >>
> >> I took it for a spin and it does indeed appear to fix the OVN hairpin
> >> issue, but it does unfortunately not appear to fix the issue Liam
> >> reported for the ML2/OVS use case [4].
> >>
> >> When trying that use case with your patch I see this in the Open vSwitch 
> >> log:
> >> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over
> >> max translation depth 64 on bridge br-int while processing
> >> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped
> >> 2 log messages in last 205 seconds (most recently, 187 seconds ago)
> >> due to excessive rate
> >> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow:
> >> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >>
> >> bridge("br-int")
> >> ----------------
> >>  0. priority 0, cookie 0x56d6d58c018b51bd
> >>     goto_table:60
> >> 60. priority 3, cookie 0x56d6d58c018b51bd
> >>     NORMAL
> >>      >>>> received packet on unknown port 5 <<<<
> >>      >> no input bundle, dropping
> >>
> >> Final flow: unchanged
> >> Megaflow: 
> >> recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
> >> Datapath actions: drop
> >>
> >>
> >> On the back of your explanation and the log output I made an attempt
> >> at adding this patch on top of yours:
> >> ---
> >> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >> index 110dab0ec..2955e8e4d 100644
> >> --- a/ofproto/ofproto-dpif-xlate.c
> >> +++ b/ofproto/ofproto-dpif-xlate.c
> >> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts,
> >> size_t ofpacts_len,
> >>           * resubmit to the frozen actions.
> >>           */
> >>          case OFPACT_RESUBMIT:
> >> +            ctx->pending_ct_clear = true;
> >>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
> >>              continue;
> >>          case OFPACT_GOTO_TABLE:
> >> ---
> >>
> >> And the two patches together do appear to resolve the issue reported
> >> in [4] as well as the OVN hairpin issue [1]. It does however make a
> >> couple of tests fail, so I need to look into if that is expected from
> >> the change or if the approach must be changed.
> >>
> >> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
> >
> > The following patch also works and does not cause any tests to fail.
> >
> > ---
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 110dab0ec..905f6994d 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts,
> > size_t ofpacts_len,
> >              break;
> >
> >          case OFPACT_CT_CLEAR:
> > -            if (ctx->conntracked || ctx->pending_ct_clear) {
> > +            if (ctx->conntracked || ctx->pending_ct_clear
> > +                || (flow->ct_state && flow->ct_state & CS_TRACKED))
> > +            {
> >                  compose_ct_clear_action(ctx);
> >              }
> >              break;
> > ---
> >
> > The OpenStack ML2/OVS Open vSwitch firewall driver appears to make
> > explicit use of ct_clear action as part of progressing packets through
> > the tables and the optimization in 355fef6f2 interrupts that workflow.
> >
> > I am not quite sure if the definition of `struct
> > xlate_ctx->conntracked` is "this flow has been exposed to conntrack"
> > or if it is "conntrack sees this flow as established", if anyone can
> > shed light on that we would know if the above patch is ok as is or if
> > this flow state should have set struct xlate_ctx->conntracked to true
> > in the first place.
>
> I beleive that ctx->conntracked should men that "this flow has been exposed
> to conntrack".  And it looks like the condition:
>  (!ctx->conntracked && flow->ct_state & CS_TRACKED)
> should not be possible.  We're, probably, missing clear_conntrack() call
> somewhere.
>
> One thing I seem to miss in my patch is the conntrack clear after returning
> from the patch port processing:
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 110dab0ec..53d4f78b2 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct 
> xport *in_dev,
>
>      /* The out bridge's conntrack execution should have no effect on the
>       * original bridge. */
> -    ctx->conntracked = old_conntrack;
>      ctx->pending_ct_clear = old_ct_clear;
> +    if (ctx->conntracked) {
> +        /* Conntrack was involved in the other bridge.  We need to clear
> +         * whatever information was cached in the datapath. */
> +        ctx->pending_ct_clear = true;
> +    }
> +    ctx->conntracked = old_conntrack;
>
>      /* The fact that the out bridge exits (for any reason) does not mean
>       * that the original bridge should exit.  Specifically, if the out
> ---
>
> But that doesn't seem to be related to the (!ctx->conntracked &&
> flow->ct_state & CS_TRACKED) case...

Thank you for confirming that ctx->conntracked should have been set in
this flow state, that led me onto the possible root and fix of this
particular use case.

Enabling debug log for dpif module while initiating a new flow I see
that the flow can enter the system as a miss upcall without
recirculation and with ct_state set:
2022-05-24T10:42:57.869Z|00036|dpif(handler6)|DBG|system@ovs-system:
miss upcall:
recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0x21),ct_zone(0),ct_mark(0),ct_label(0),ct_tuple6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,proto=58,src_port=128,dst_port=0),eth(src=fa:16:3e:81:22:4d,dst=fa:16:3e:e5:e7:b2),eth_type(0x86dd),ipv6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,label=0x357f1,proto=58,tclass=0,hlimit=64,frag=no),icmpv6(type=128,code=0)

Looking at the code it appears that for this state the
ctx->conntracked would not be set, it is currently initialized to
'false' and only updated based on frozen state for recirculated
packets.

Adding this patch resolves inconsistency we discussed above and
subsequently the ML2/OVS problem:
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 110dab0ec..7bc7426ac 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -7852,6 +7852,12 @@ xlate_actions(struct xlate_in *xin, struct
xlate_out *xout)
         goto exit;
     }

+    if (!xin->frozen_state
+        && xin->flow.ct_state
+        && xin->flow.ct_state & CS_TRACKED) {
+        ctx.conntracked = true;
+    }
+
     /* Tunnel metadata in udpif format must be normalized before
translation. */
     if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) {
         const struct tun_table *tun_tab = ofproto_get_tun_tab(
---

What do you think?

-- 
Frode Nordahl

> >
> > The output from `conntrack -E` with the above patch is in [5], and
> > without it is in [6].
> >
> > 5: https://pastebin.ubuntu.com/p/tQr6PhXD9X/
> > 6: https://pastebin.ubuntu.com/p/WkDmpc6xRZ/
> >
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to