On 5/21/22 12:49, Frode Nordahl wrote:
> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl
> <frode.nord...@canonical.com> wrote:
>>
>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>
>>> On 5/13/22 10:36, Frode Nordahl wrote:
>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> 
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
>>>>
>>>> After OVN started colocating sNAT and dNAT operations in the same CT
>>>> zone [0] the above mentioned OVS commit appears to also break OVN [1].
>>>> I have been discussing with Numan and he has found a correlation with
>>>> the behavior of the open vswitch kernel data path conntrack
>>>> `skb_nfct_cached` function, i.e. if that is changed to always return
>>>> 'false' the erratic behavior disappears.
>>>>
>>>> So I guess the question then becomes, is this a OVS userspace or OVS
>>>> kernel problem?
>>>>
>>>> We have a reproducer in [3].
>>>>
>>>> 0: 
>>>> https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
>>>> 2: 
>>>> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
>>>>
>>>
>>> Hrm.  I think, there is a logical bug in implementations of ct()
>>> datapath action in both datapaths.
>>>
>>> The problem appears when the OpenFlow pipeline for the packet contains
>>> several ct() actions.  NXAST_CT action (i.e. every ct() action) must
>>> always put the packet into an untracked state.  Tracked state will be
>>> set only in the 'recirc_table' where the packet is cloned by the ct()
>>> action for further processing.
>>>
>>> If an OF pipeline looks like this:
>>>
>>>   actions=ct(),something,something,ct(),something
>>>
>>> each action must be entered with the 'untracked' conntrack state in
>>> the packet metadata.
>>>
>>> However, ct() action in the datapath, unlike OpenFlow, doesn't work
>>> this way.  It modifies the conntrack state for the packet it is processing.
>>> During the flow translation OVS inserts recirculation right after the
>>> datapath ct() action.  This way conntrack state affects only processing
>>> after recirculation.  Sort of.
>>>
>>> The issue is that not all OpenFlow ct() actions have recirc_table
>>> specified.  These actions supposed to change the state of the conntrack
>>> subsystem, but they still must leave the packet itself in the untracked
>>> state with no strings attached.  But since datapath ct() actions doesn't
>>> work that way and since recirculation is not inserted (no recirc_table
>>> specified), packet after conntrack execution carries the state along to
>>> all other actions.
>>> This doesn't impact normal actions, but seems to break subsequent ct()
>>> actions on the same pipeline.
>>>
>>> In general, it looks to me that ct() action in the datapath is
>>> not supposed to cache any connection data inside the packet's metadata.
>>> This seems to be the root cause of the problem.  Fields that OF knows
>>> about should not trigger any issues if carried along with a packet,
>>> but datapath-specific metadata can not be cleared, because OFproto
>>> simply doesn't know about it.
>>>
>>> But, I guess, not caching the connection and other internal structures
>>> might significantly impact datapath performance.  Will it?
>>>
>>> Looking at the reproducer in [3], it has, for example, the flow with the
>>> following actions:
>>>
>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
>>>           set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
>>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
>>>           ct(zone=8),recirc(0x4)
>>>
>>> So, if the first ct() will change some datapath-specific packet metadata,
>>> the second ct() may try to use that information.
>>>
>>> It looks like during the flow translation we must add ct_clear action
>>> explicitly after every ct() action, unless it was the last action in
>>> the list.  This will end up with adding a lot of ct_clear actions
>>> everywhere.
>>>
>>> Another option is the patch below that tries to track if ct_clear
>>> is required and adds that action before the next ct() action in
>>> the datapath.
>>>
>>> BTW, the test [3] fails with both kernel and userspace datapaths.
>>>
>>> The change below should fix the problem, I think.
>>> It looks like we also have to put ct_clear action before translating
>>> output to the patch port if we're in 'conntracked' state.
>>>
>>> And I do not know how to fix the problem for kernels without ct_clear
>>> support.  I don't think we can clear metadata that is internal to
>>> the kernel.  Ideas are welcome.
>>>
>>> CC: Aaron, Paolo.
>>>
>>> Any thoughts?
>>
>> Thank you so much for the detailed explanation of what's going on and
>> for providing a proposed fix.
>>
>> I took it for a spin and it does indeed appear to fix the OVN hairpin
>> issue, but it does unfortunately not appear to fix the issue Liam
>> reported for the ML2/OVS use case [4].
>>
>> When trying that use case with your patch I see this in the Open vSwitch log:
>> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over
>> max translation depth 64 on bridge br-int while processing
>> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped
>> 2 log messages in last 205 seconds (most recently, 187 seconds ago)
>> due to excessive rate
>> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow:
>> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>>
>> bridge("br-int")
>> ----------------
>>  0. priority 0, cookie 0x56d6d58c018b51bd
>>     goto_table:60
>> 60. priority 3, cookie 0x56d6d58c018b51bd
>>     NORMAL
>>      >>>> received packet on unknown port 5 <<<<
>>      >> no input bundle, dropping
>>
>> Final flow: unchanged
>> Megaflow: 
>> recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
>> Datapath actions: drop
>>
>>
>> On the back of your explanation and the log output I made an attempt
>> at adding this patch on top of yours:
>> ---
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index 110dab0ec..2955e8e4d 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts,
>> size_t ofpacts_len,
>>           * resubmit to the frozen actions.
>>           */
>>          case OFPACT_RESUBMIT:
>> +            ctx->pending_ct_clear = true;
>>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
>>              continue;
>>          case OFPACT_GOTO_TABLE:
>> ---
>>
>> And the two patches together do appear to resolve the issue reported
>> in [4] as well as the OVN hairpin issue [1]. It does however make a
>> couple of tests fail, so I need to look into if that is expected from
>> the change or if the approach must be changed.
>>
>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
> 
> The following patch also works and does not cause any tests to fail.
> 
> ---
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 110dab0ec..905f6994d 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts,
> size_t ofpacts_len,
>              break;
> 
>          case OFPACT_CT_CLEAR:
> -            if (ctx->conntracked || ctx->pending_ct_clear) {
> +            if (ctx->conntracked || ctx->pending_ct_clear
> +                || (flow->ct_state && flow->ct_state & CS_TRACKED))
> +            {
>                  compose_ct_clear_action(ctx);
>              }
>              break;
> ---
> 
> The OpenStack ML2/OVS Open vSwitch firewall driver appears to make
> explicit use of ct_clear action as part of progressing packets through
> the tables and the optimization in 355fef6f2 interrupts that workflow.
> 
> I am not quite sure if the definition of `struct
> xlate_ctx->conntracked` is "this flow has been exposed to conntrack"
> or if it is "conntrack sees this flow as established", if anyone can
> shed light on that we would know if the above patch is ok as is or if
> this flow state should have set struct xlate_ctx->conntracked to true
> in the first place.

I beleive that ctx->conntracked should men that "this flow has been exposed
to conntrack".  And it looks like the condition:
 (!ctx->conntracked && flow->ct_state & CS_TRACKED)
should not be possible.  We're, probably, missing clear_conntrack() call
somewhere.

One thing I seem to miss in my patch is the conntrack clear after returning
from the patch port processing:

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 110dab0ec..53d4f78b2 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct 
xport *in_dev,
 
     /* The out bridge's conntrack execution should have no effect on the
      * original bridge. */
-    ctx->conntracked = old_conntrack;
     ctx->pending_ct_clear = old_ct_clear;
+    if (ctx->conntracked) {
+        /* Conntrack was involved in the other bridge.  We need to clear
+         * whatever information was cached in the datapath. */
+        ctx->pending_ct_clear = true;
+    }
+    ctx->conntracked = old_conntrack;
 
     /* The fact that the out bridge exits (for any reason) does not mean
      * that the original bridge should exit.  Specifically, if the out
---

But that doesn't seem to be related to the (!ctx->conntracked &&
flow->ct_state & CS_TRACKED) case...

> 
> The output from `conntrack -E` with the above patch is in [5], and
> without it is in [6].
> 
> 5: https://pastebin.ubuntu.com/p/tQr6PhXD9X/
> 6: https://pastebin.ubuntu.com/p/WkDmpc6xRZ/
> 

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to