On Thu, 26 May 2022 at 14:45, Ilya Maximets <i.maxim...@ovn.org> wrote:

> On 5/26/22 13:00, Frode Nordahl wrote:
> > On Wed, May 25, 2022 at 9:55 AM Frode Nordahl
> > <frode.nord...@canonical.com> wrote:
> >>
> >> On Tue, May 24, 2022 at 1:32 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>
> >>> On 5/24/22 12:54, Frode Nordahl wrote:
> >>>> On Mon, May 23, 2022 at 3:49 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>
> >>>>> On 5/21/22 12:49, Frode Nordahl wrote:
> >>>>>> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl
> >>>>>> <frode.nord...@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>>>>
> >>>>>>>> On 5/13/22 10:36, Frode Nordahl wrote:
> >>>>>>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
> >>>>>>>>>
> >>>>>>>>> After OVN started colocating sNAT and dNAT operations in the same
> >>>>>>>>> CT zone [0], the above-mentioned OVS commit appears to also break
> >>>>>>>>> OVN [1].  I have been discussing with Numan and he has found a
> >>>>>>>>> correlation with the behavior of the Open vSwitch kernel datapath
> >>>>>>>>> conntrack `skb_nfct_cached` function [2], i.e. if that is changed
> >>>>>>>>> to always return 'false' the erratic behavior disappears.
> >>>>>>>>>
> >>>>>>>>> So I guess the question then becomes, is this an OVS userspace
> >>>>>>>>> or an OVS kernel problem?
> >>>>>>>>>
> >>>>>>>>> We have a reproducer in [3].
> >>>>>>>>>
> >>>>>>>>> 0: https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
> >>>>>>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
> >>>>>>>>> 2: https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
> >>>>>>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hrm.  I think there is a logical bug in the implementations of the
> >>>>>>>> ct() datapath action in both datapaths.
> >>>>>>>>
> >>>>>>>> The problem appears when the OpenFlow pipeline for the packet
> >>>>>>>> contains several ct() actions.  The NXAST_CT action (i.e. every
> >>>>>>>> ct() action) must always put the packet into an untracked state.
> >>>>>>>> The tracked state will be set only in the 'recirc_table' where the
> >>>>>>>> packet is cloned by the ct() action for further processing.
> >>>>>>>>
> >>>>>>>> If an OF pipeline looks like this:
> >>>>>>>>
> >>>>>>>>   actions=ct(),something,something,ct(),something
> >>>>>>>>
> >>>>>>>> each action must be entered with the 'untracked' conntrack state
> >>>>>>>> in the packet metadata.
> >>>>>>>>
> >>>>>>>> However, the ct() action in the datapath, unlike in OpenFlow,
> >>>>>>>> doesn't work this way.  It modifies the conntrack state for the
> >>>>>>>> packet it is processing.  During the flow translation OVS inserts a
> >>>>>>>> recirculation right after the datapath ct() action.  This way the
> >>>>>>>> conntrack state affects only processing after recirculation.
> >>>>>>>> Sort of.
> >>>>>>>>
> >>>>>>>> The issue is that not all OpenFlow ct() actions have recirc_table
> >>>>>>>> specified.  These actions are supposed to change the state of the
> >>>>>>>> conntrack subsystem, but they still must leave the packet itself in
> >>>>>>>> the untracked state with no strings attached.  But since datapath
> >>>>>>>> ct() actions don't work that way, and since recirculation is not
> >>>>>>>> inserted (no recirc_table specified), the packet after conntrack
> >>>>>>>> execution carries the state along to all other actions.
> >>>>>>>> This doesn't impact normal actions, but seems to break subsequent
> >>>>>>>> ct() actions on the same pipeline.
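> >>>>>>>>
> >>>>>>>> Roughly, as an illustrative sketch only (recirc id made up):
> >>>>>>>>
> >>>>>>>>   OpenFlow: ct(table=10,zone=8) -> datapath: ct(zone=8),recirc(0xN)
> >>>>>>>>   OpenFlow: ct(commit,zone=8)   -> datapath: ct(commit,zone=8)
> >>>>>>>>
> >>>>>>>> In the second case no recirculation is inserted, so whatever state
> >>>>>>>> the datapath ct() leaves behind stays visible to the rest of the
> >>>>>>>> datapath action list.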
> >>>>>>>>
> >>>>>>>> In general, it looks to me that the ct() action in the datapath is
> >>>>>>>> not supposed to cache any connection data inside the packet's
> >>>>>>>> metadata.  This seems to be the root cause of the problem.  Fields
> >>>>>>>> that OF knows about should not trigger any issues if carried along
> >>>>>>>> with a packet, but datapath-specific metadata cannot be cleared,
> >>>>>>>> because OFproto simply doesn't know about it.
> >>>>>>>>
> >>>>>>>> But, I guess, not caching the connection and other internal
> >>>>>>>> structures might significantly impact datapath performance.  Will it?
> >>>>>>>>
> >>>>>>>> Looking at the reproducer in [3], it has, for example, the flow
> >>>>>>>> with the following actions:
> >>>>>>>>
> >>>>>>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
> >>>>>>>>           set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
> >>>>>>>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
> >>>>>>>>           ct(zone=8),recirc(0x4)
> >>>>>>>>
> >>>>>>>> So, if the first ct() changes some datapath-specific packet
> >>>>>>>> metadata, the second ct() may try to use that information.
> >>>>>>>>
> >>>>>>>> It looks like during the flow translation we must add a ct_clear
> >>>>>>>> action explicitly after every ct() action, unless it is the last
> >>>>>>>> action in the list.  This would end up adding a lot of ct_clear
> >>>>>>>> actions everywhere.
> >>>>>>>>
> >>>>>>>> Another option is the patch below, which tries to track whether a
> >>>>>>>> ct_clear is required and adds that action before the next ct()
> >>>>>>>> action in the datapath.
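> >>>>>>>>
> >>>>>>>> For illustration only, with that approach the example flow above
> >>>>>>>> would presumably end up translated to something like (a sketch,
> >>>>>>>> not actual output):
> >>>>>>>>
> >>>>>>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
> >>>>>>>>           set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
> >>>>>>>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
> >>>>>>>>           ct_clear,ct(zone=8),recirc(0x4)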
> >>>>>>>>
> >>>>>>>> BTW, the test [3] fails with both kernel and userspace datapaths.
> >>>>>>>>
> >>>>>>>> The change below should fix the problem, I think.
> >>>>>>>> It looks like we also have to put a ct_clear action before
> >>>>>>>> translating output to the patch port if we're in the 'conntracked'
> >>>>>>>> state.
> >>>>>>>>
> >>>>>>>> And I do not know how to fix the problem for kernels without
> >>>>>>>> ct_clear support.  I don't think we can clear metadata that is
> >>>>>>>> internal to the kernel.  Ideas are welcome.
> >>>>>>>>
> >>>>>>>> CC: Aaron, Paolo.
> >>>>>>>>
> >>>>>>>> Any thoughts?
> >>>>>>>
> >>>>>>> Thank you so much for the detailed explanation of what's going on
> >>>>>>> and for providing a proposed fix.
> >>>>>>>
> >>>>>>> I took it for a spin and it does indeed appear to fix the OVN hairpin
> >>>>>>> issue, but it unfortunately does not appear to fix the issue Liam
> >>>>>>> reported for the ML2/OVS use case [4].
> >>>>>>>
> >>>>>>> When trying that use case with your patch I see this in the Open
> >>>>>>> vSwitch log:
> >>>>>>>
> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over
> >>>>>>> max translation depth 64 on bridge br-int while processing
> >>>>>>>
> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >>>>>>>
> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped
> >>>>>>> 2 log messages in last 205 seconds (most recently, 187 seconds ago)
> >>>>>>> due to excessive rate
> >>>>>>>
> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow:
> >>>>>>>
> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >>>>>>>
> >>>>>>> bridge("br-int")
> >>>>>>> ----------------
> >>>>>>>  0. priority 0, cookie 0x56d6d58c018b51bd
> >>>>>>>     goto_table:60
> >>>>>>> 60. priority 3, cookie 0x56d6d58c018b51bd
> >>>>>>>     NORMAL
> >>>>>>>      >>>> received packet on unknown port 5 <<<<
> >>>>>>>      >> no input bundle, dropping
> >>>>>>>
> >>>>>>> Final flow: unchanged
> >>>>>>> Megaflow:
> recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
> >>>>>>> Datapath actions: drop
> >>>>>>>
> >>>>>>>
> >>>>>>> On the back of your explanation and the log output I made an attempt
> >>>>>>> at adding this patch on top of yours:
> >>>>>>> ---
> >>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>>>> index 110dab0ec..2955e8e4d 100644
> >>>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>>>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >>>>>>>           * resubmit to the frozen actions.
> >>>>>>>           */
> >>>>>>>          case OFPACT_RESUBMIT:
> >>>>>>> +            ctx->pending_ct_clear = true;
> >>>>>>>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
> >>>>>>>              continue;
> >>>>>>>          case OFPACT_GOTO_TABLE:
> >>>>>>> ---
> >>>>>>>
> >>>>>>> And the two patches together do appear to resolve the issue reported
> >>>>>>> in [4] as well as the OVN hairpin issue [1]. It does, however, make a
> >>>>>>> couple of tests fail, so I need to look into whether that is expected
> >>>>>>> from the change or if the approach must be changed.
> >>>>>>>
> >>>>>>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
> >>>>>>
> >>>>>> The following patch also works and does not cause any tests to fail.
> >>>>>>
> >>>>>> ---
> >>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>>> index 110dab0ec..905f6994d 100644
> >>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>>> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >>>>>>              break;
> >>>>>>
> >>>>>>          case OFPACT_CT_CLEAR:
> >>>>>> -            if (ctx->conntracked || ctx->pending_ct_clear) {
> >>>>>> +            if (ctx->conntracked || ctx->pending_ct_clear
> >>>>>> +                || (flow->ct_state && flow->ct_state & CS_TRACKED))
> >>>>>> +            {
> >>>>>>                  compose_ct_clear_action(ctx);
> >>>>>>              }
> >>>>>>              break;
> >>>>>> ---
> >>>>>>
> >>>>>> The OpenStack ML2/OVS Open vSwitch firewall driver appears to make
> >>>>>> explicit use of the ct_clear action as part of progressing packets
> >>>>>> through the tables, and the optimization in 355fef6f2 interrupts that
> >>>>>> workflow.
> >>>>>>
> >>>>>> I am not quite sure if the definition of `struct
> >>>>>> xlate_ctx->conntracked` is "this flow has been exposed to conntrack"
> >>>>>> or if it is "conntrack sees this flow as established".  If anyone can
> >>>>>> shed light on that, we would know whether the above patch is OK as is
> >>>>>> or whether this flow state should have set struct
> >>>>>> xlate_ctx->conntracked to true in the first place.
> >>>>>
> >>>>> I believe that ctx->conntracked should mean that "this flow has been
> >>>>> exposed to conntrack".  And it looks like the condition:
> >>>>>  (!ctx->conntracked && flow->ct_state & CS_TRACKED)
> >>>>> should not be possible.  We're probably missing a clear_conntrack()
> >>>>> call somewhere.
> >>>>>
> >>>>> One thing I seem to have missed in my patch is the conntrack clear
> >>>>> after returning from the patch port processing:
> >>>>>
> >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>> index 110dab0ec..53d4f78b2 100644
> >>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>> @@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
> >>>>>
> >>>>>      /* The out bridge's conntrack execution should have no effect on the
> >>>>>       * original bridge. */
> >>>>> -    ctx->conntracked = old_conntrack;
> >>>>>      ctx->pending_ct_clear = old_ct_clear;
> >>>>> +    if (ctx->conntracked) {
> >>>>> +        /* Conntrack was involved in the other bridge.  We need to clear
> >>>>> +         * whatever information was cached in the datapath. */
> >>>>> +        ctx->pending_ct_clear = true;
> >>>>> +    }
> >>>>> +    ctx->conntracked = old_conntrack;
> >>>>>
> >>>>>      /* The fact that the out bridge exits (for any reason) does not mean
> >>>>>       * that the original bridge should exit.  Specifically, if the out
> >>>>> ---
> >>>>>
> >>>>> But that doesn't seem to be related to the (!ctx->conntracked &&
> >>>>> flow->ct_state & CS_TRACKED) case...
> >>>>
> >>>> Thank you for confirming that ctx->conntracked should have been set in
> >>>> this flow state; that led me to the possible root cause and fix of this
> >>>> particular use case.
> >>>>
> >>>> Enabling the debug log for the dpif module while initiating a new flow,
> >>>> I see that the flow can enter the system as a miss upcall without
> >>>> recirculation and with ct_state set:
> >>>> 2022-05-24T10:42:57.869Z|00036|dpif(handler6)|DBG|system@ovs-system:
> >>>> miss upcall:
> >>>>
> recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0x21),ct_zone(0),ct_mark(0),ct_label(0),ct_tuple6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,proto=58,src_port=128,dst_port=0),eth(src=fa:16:3e:81:22:4d,dst=fa:16:3e:e5:e7:b2),eth_type(0x86dd),ipv6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,label=0x357f1,proto=58,tclass=0,hlimit=64,frag=no),icmpv6(type=128,code=0)
> >>>>
> >>>> Looking at the code it appears that for this state (ct_state 0x21,
> >>>> i.e. new|trk) ctx->conntracked would not be set; it is currently
> >>>> initialized to 'false' and only updated based on the frozen state for
> >>>> recirculated packets.
> >>>>
> >>>> Adding this patch resolves the inconsistency we discussed above and,
> >>>> subsequently, the ML2/OVS problem:
> >>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>> index 110dab0ec..7bc7426ac 100644
> >>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>> @@ -7852,6 +7852,12 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
> >>>>          goto exit;
> >>>>      }
> >>>>
> >>>> +    if (!xin->frozen_state
> >>>> +        && xin->flow.ct_state
> >>>> +        && xin->flow.ct_state & CS_TRACKED) {
> >>>> +        ctx.conntracked = true;
> >>>> +    }
> >>>> +
> >>>>      /* Tunnel metadata in udpif format must be normalized before
> >>>> translation. */
> >>>>      if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) {
> >>>>          const struct tun_table *tun_tab = ofproto_get_tun_tab(
> >>>> ---
> >>>>
> >>>> What do you think?
> >>>>
> >>>
> >>> Hrm.  I suspected something like this.  My guess is that the packet
> >>> gets conntracked in OVS, then leaves OVS, gets routed inside the
> >>> kernel and enters OVS again while retaining the ct state from the
> >>> previous run.  Can you confirm that?
> >>
> >> In this setup there is an interface present in the system, attached to
> >> OVS as an interface of type internal. It's essentially there to
> >> give a path into an OpenStack-managed overlay network for some
> >> management daemons running "outside the cloud".
> >>
> >> Without the patches we've been working on in this thread so far,
> >> looking at the OF flow configuration side the packets traverse the
> >> tables like this:
> >> table=0, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,59)
> >> table=59, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,60)
> >> table=60, n_packets=4, n_bytes=384, priority=100,in_port="o-hm0"
> >> actions=load:0x3->NXM_NX_REG5[],load:0x1->NXM_NX_REG6[],resubmit(,71)
> >> table=71, n_packets=65, n_bytes=7670, priority=110,ct_state=+trk
> >> actions=ct_clear,resubmit(,71)
> >>
> >> Since we currently ignore the request for the ct_clear action in this
> >> state the packet will be resubmitted until hitting the limit in the
> >> last rule above.
> >>
> >> The OVS kernel event tracing output from the same flow looks like this:
> >>             ping-207769  [010] ..... 33681.265450: ovs_dp_upcall:
> >> dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0
> >> skbaddr=00000000662a695d len=118 data_len=0 truesize=768 nr_frags=0
> >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >> keyaddr=00000000900b6d94 eth_type=0xdd86 ct_state=21 ct_orig_proto=3a
> >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951
> >> upcall_mru=0
> >>           <idle>-0       [010] ..s.. 33686.518866: ovs_dp_upcall:
> >> dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0
> >> skbaddr=00000000ab9954ee len=86 data_len=0 truesize=768 nr_frags=0
> >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >> keyaddr=00000000cfad77d1 eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951
> >> upcall_mru=0
> >>        handler12-205602  [010] ..... 33686.519158:
> >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >> action_type=3 action_len=12 action_data=0000000046f8c388 is_last=0
> >>        handler12-205602  [010] ..... 33686.519159:
> >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >> action_type=1 action_len=4 action_data=00000000acdfdf2a is_last=0
> >>        handler12-205602  [010] ..... 33686.519197:
> >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >> action_type=4 action_len=4 action_data=0000000069492ddd is_last=0
> >>        handler12-205602  [010] ..... 33686.519197:
> >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >> action_type=1 action_len=4 action_data=00000000e5733215 is_last=1
> >>           <idle>-0       [011] ..s.. 33686.521143: ovs_dp_upcall:
> >> dpaddr=0000000053422f7e dp_name=ovs-system dev=gre_sys
> >> skbaddr=00000000369641d8 len=78 data_len=0 truesize=768 nr_frags=0
> >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >> keyaddr=00000000dc600580 eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=3508503525
> >> upcall_mru=0
> >>        handler13-205603  [011] ..... 33686.521383:
> >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >> dev=gre_sys skbaddr=000000007efe6e4d len=78 data_len=0 truesize=768
> >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >> recirc_id=0x00000000 keyaddr=00000000303b841b eth_type=0xdd86
> >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >> action_type=1 action_len=4 action_data=0000000033bb5908 is_last=1
> >>
> >> So to me it looks like the packet is conntracked before entering OVS
> >> the first time, and I guess we would have the same situation if it
> >> took a path like you suggested above.
> >>
> >>> If that's the case, I think we actually have to clear the ct state
> >>> while the packet is first entering the kernel datapath.  Otherwise
> >>> it can match on incorrect flows in the kernel cache.  Marking
> >>> it as conntracked will probably be against the logic of the OF
> >>> pipeline.
> >>
> >> We used to clear this from the OVS userspace side, but I agree
> >> with you that that is problematic. This is probably why Numan was on
> >> the track of that in-kernel NF cache function.
> >>
> >>> Something like this (not tested):
> >>>
> >>> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> >>> index 82a74f998966..ebea64494196 100644
> >>> --- a/net/openvswitch/vport.c
> >>> +++ b/net/openvswitch/vport.c
> >>> @@ -451,6 +451,12 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
> >>>                 kfree_skb(skb);
> >>>                 return error;
> >>>         }
> >>> +
> >>> +       /* Packets entering OVS have to be in untracked state.  Clearing
> >>> +        * here to be sure that no state persisted from the last pass
> >>> +        * through OVS.
> >>> +        */
> >>> +       ovs_ct_clear(skb, &key);
> >>> +
> >>>         ovs_dp_process_packet(skb, &key);
> >>>         return 0;
> >>>  }
> >>> ---
> >>
> >> Yep, this works:
> >>             ping-3265    [011] .....    97.820835: ovs_dp_upcall:
> >> dpaddr=00000000a05ba3f9 dp_name=ovs-system dev=o-hm0
> >> skbaddr=000000003b1e06d0 len=118 data_len=0 truesize=768 nr_frags=0
> >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >> keyaddr=0000000077e1165d eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2787207471
> >> upcall_mru=0
> >>
> >> An end-to-end workload test for the system described in [4] with this
> >> patched kernel is also successful. For good measure I also ran through
> >> the OVN kernel system testsuite and observed no regressions with this
> >> kernel.
> >>
> >>
> >> So to sum up, I guess we have two distinct bugs here: we need your
> >> proposed patch for ofproto-dpif to fix the OVN hairpin issue, and the
> >> kernel patch for the issue of conntracked packets entering the system,
> >> which affects ML2/OVS.
> >
> > If we agree on how to proceed I'd be happy to help put this into
> > formal patches with tests and appropriate attribution, etc. Let me
> > know what you think and how you would prefer we split the work of
> > bringing this to a resolution.
> >
>
> I'm working on a good commit message for the kernel piece above
> (ovs_vport_receive), will send it soon.
>

Great, thank you for doing that!

> If you can help with stitching all the pieces for the OVS userspace
> part and writing tests for it, that would be great.
>

I will work on this part and hopefully submit tomorrow.

--
Frode Nordahl

> Best regards, Ilya Maximets.
