On 5/21/22 12:49, Frode Nordahl wrote:
> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl
> <frode.nord...@canonical.com> wrote:
>>
>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>
>>> On 5/13/22 10:36, Frode Nordahl wrote:
>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
>>>>
>>>> After OVN started colocating sNAT and dNAT operations in the same CT
>>>> zone [0], the above-mentioned OVS commit appears to also break OVN [1].
>>>> I have been discussing this with Numan, and he has found a correlation
>>>> with the behavior of the Open vSwitch kernel datapath conntrack
>>>> `skb_nfct_cached` function [2]: if that is changed to always return
>>>> 'false', the erratic behavior disappears.
>>>>
>>>> So I guess the question then becomes: is this an OVS userspace or an
>>>> OVS kernel problem?
>>>>
>>>> We have a reproducer in [3].
>>>>
>>>> 0: https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
>>>> 2: https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
>>>>
>>>
>>> Hrm. I think there is a logical bug in the implementations of the ct()
>>> datapath action in both datapaths.
>>>
>>> The problem appears when the OpenFlow pipeline for the packet contains
>>> several ct() actions. The NXAST_CT action (i.e. every OpenFlow ct()
>>> action) must always put the packet into an untracked state. Tracked
>>> state will be set only in the 'recirc_table', where the packet is
>>> cloned by the ct() action for further processing.
>>>
>>> If an OF pipeline looks like this:
>>>
>>>   actions=ct(),something,something,ct(),something
>>>
>>> each action must be entered with the 'untracked' conntrack state in
>>> the packet metadata.
>>>
>>> However, the ct() action in the datapath, unlike OpenFlow, doesn't work
>>> this way: it modifies the conntrack state of the very packet it is
>>> processing. During flow translation, OVS inserts a recirculation right
>>> after the datapath ct() action, so that conntrack state affects only
>>> the processing after recirculation. Sort of.
>>>
>>> The issue is that not all OpenFlow ct() actions have a recirc_table
>>> specified. These actions are supposed to change the state of the
>>> conntrack subsystem, but they still must leave the packet itself in
>>> the untracked state with no strings attached. But since datapath ct()
>>> actions don't work that way, and since no recirculation is inserted
>>> (no recirc_table specified), the packet after conntrack execution
>>> carries the state along to all subsequent actions.
>>> This doesn't impact normal actions, but it seems to break subsequent
>>> ct() actions on the same pipeline.
>>>
>>> In general, it looks to me like the ct() action in the datapath is
>>> not supposed to cache any connection data inside the packet's metadata.
>>> This seems to be the root cause of the problem. Fields that OF knows
>>> about should not trigger any issues if carried along with a packet,
>>> but datapath-specific metadata cannot be cleared, because ofproto
>>> simply doesn't know about it.
>>>
>>> But, I guess, not caching the connection and other internal structures
>>> might significantly impact datapath performance. Will it?
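
To make that contract concrete, here is a minimal sketch of two
hypothetical OpenFlow rules (table, zone, and port numbers are made up
for illustration):

  # recirc_table given: tracked state is visible only in table 10,
  # where ct() recirculates a clone of the packet.
  table=0, ip, actions=ct(table=10,zone=8)

  # no recirc_table: ct() only updates the conntrack subsystem; the
  # packet should continue to output:2 in the untracked state.
  table=0, ip, actions=ct(commit,zone=8),output:2
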
>>>
>>> Looking at the reproducer in [3], it has, for example, a flow with the
>>> following actions:
>>>
>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
>>>           set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
>>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
>>>           ct(zone=8),recirc(0x4)
>>>
>>> So, if the first ct() changes some datapath-specific packet metadata,
>>> the second ct() may try to use that information.
>>>
>>> It looks like during flow translation we must add a ct_clear action
>>> explicitly after every ct() action, unless it was the last action in
>>> the list. That would end up adding a lot of ct_clear actions
>>> everywhere.
>>>
>>> Another option is the patch below, which tries to track whether a
>>> ct_clear is required and adds that action before the next ct() action
>>> in the datapath.
>>>
>>> BTW, the test [3] fails with both the kernel and userspace datapaths.
>>>
>>> The change below should fix the problem, I think.
>>> It looks like we also have to put a ct_clear action before translating
>>> output to a patch port if we're in the 'conntracked' state.
>>>
>>> And I do not know how to fix the problem for kernels without ct_clear
>>> support. I don't think we can clear metadata that is internal to the
>>> kernel. Ideas are welcome.
>>>
>>> CC: Aaron, Paolo.
>>>
>>> Any thoughts?
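
To illustrate on the reproducer flow quoted above, the "ct_clear after
every ct()" variant would make the translated datapath actions look
roughly like this (a sketch, not actual translator output):

  ct(commit,zone=8,mark=0/0x1,nat(src)),ct_clear,
  set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
  set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
  ct(zone=8),recirc(0x4)

The inserted ct_clear restores the untracked state that the OpenFlow
model promises before the second ct() executes; the final ct(zone=8)
needs no ct_clear because recirc(0x4) immediately follows it and is
meant to see the tracked state.
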
>>
>> Thank you so much for the detailed explanation of what's going on and
>> for providing a proposed fix.
>>
>> I took it for a spin and it does indeed appear to fix the OVN hairpin
>> issue, but it unfortunately does not appear to fix the issue Liam
>> reported for the ML2/OVS use case [4].
>>
>> When trying that use case with your patch I see this in the Open
>> vSwitch log:
>> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over max translation depth 64 on bridge br-int while processing ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped 2 log messages in last 205 seconds (most recently, 187 seconds ago) due to excessive rate
>> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow: ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>>
>> bridge("br-int")
>> ----------------
>>  0. priority 0, cookie 0x56d6d58c018b51bd
>>     goto_table:60
>> 60. priority 3, cookie 0x56d6d58c018b51bd
>>     NORMAL
>>      >>>> received packet on unknown port 5 <<<<
>>     no input bundle, dropping
>>
>> Final flow: unchanged
>> Megaflow: recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
>> Datapath actions: drop
>>
>> On the back of your explanation and the log output I made an attempt
>> at adding this patch on top of yours:
>> ---
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index 110dab0ec..2955e8e4d 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
>>           * resubmit to the frozen actions.
>>           */
>>          case OFPACT_RESUBMIT:
>> +            ctx->pending_ct_clear = true;
>>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
>>              continue;
>>          case OFPACT_GOTO_TABLE:
>> ---
>>
>> And the two patches together do appear to resolve the issue reported
>> in [4] as well as the OVN hairpin issue [1]. It does, however, make a
>> couple of tests fail, so I need to look into whether that is expected
>> from the change or whether the approach must be changed.
>>
>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
>
> The following patch also works and does not cause any tests to fail.
>
> ---
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 110dab0ec..905f6994d 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
>              break;
>
>          case OFPACT_CT_CLEAR:
> -            if (ctx->conntracked || ctx->pending_ct_clear) {
> +            if (ctx->conntracked || ctx->pending_ct_clear
> +                || (flow->ct_state && flow->ct_state & CS_TRACKED))
> +            {
>                  compose_ct_clear_action(ctx);
>              }
>              break;
> ---
>
> The OpenStack ML2/OVS Open vSwitch firewall driver appears to make
> explicit use of the ct_clear action as part of progressing packets
> through the tables, and the optimization in 355fef6f2 interrupts that
> workflow.
>
> I am not quite sure whether the definition of `struct
> xlate_ctx->conntracked` is "this flow has been exposed to conntrack"
> or "conntrack sees this flow as established"; if anyone can shed light
> on that, we would know whether the above patch is OK as is or whether
> this flow state should have set struct xlate_ctx->conntracked to true
> in the first place.

I believe that ctx->conntracked should mean "this flow has been exposed
to conntrack". And it looks like the condition

  (!ctx->conntracked && flow->ct_state & CS_TRACKED)

should not be possible. We're probably missing a clear_conntrack() call
somewhere.

One thing I seem to have missed in my patch is the conntrack clear
after returning from patch port processing:

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 110dab0ec..53d4f78b2 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,

         /* The out bridge's conntrack execution should have no effect on the
          * original bridge. */
-        ctx->conntracked = old_conntrack;
         ctx->pending_ct_clear = old_ct_clear;
+        if (ctx->conntracked) {
+            /* Conntrack was involved in the other bridge.  We need to clear
+             * whatever information was cached in the datapath. */
+            ctx->pending_ct_clear = true;
+        }
+        ctx->conntracked = old_conntrack;

         /* The fact that the out bridge exits (for any reason) does not mean
          * that the original bridge should exit.  Specifically, if the out
---

But that doesn't seem to be related to the
(!ctx->conntracked && flow->ct_state & CS_TRACKED) case...

> The output from `conntrack -E` with the above patch is in [5], and
> without it is in [6].
>
> 5: https://pastebin.ubuntu.com/p/tQr6PhXD9X/
> 6: https://pastebin.ubuntu.com/p/WkDmpc6xRZ/
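
For reference, the patch-port case that hunk covers can be pictured
with a hypothetical two-bridge setup (bridge and port names invented
for illustration):

  br-int: table=0, in_port=vm1,
          actions=ct(commit,zone=1),output:patch-to-phy
  br-phy: table=0, in_port=patch-to-int,
          actions=ct(zone=2,table=1)

Output to the patch port inlines the br-phy pipeline into the same
translation, so its ct() leaves ctx->conntracked set; without queueing
a ct_clear on return, the datapath state cached by br-phy's conntrack
lookup would leak back into the rest of the br-int pipeline.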