On Thu, 26 May 2022 at 14:45, Ilya Maximets <i.maxim...@ovn.org> wrote:
> On 5/26/22 13:00, Frode Nordahl wrote:
> > On Wed, May 25, 2022 at 9:55 AM Frode Nordahl <frode.nord...@canonical.com> wrote:
> >>
> >> On Tue, May 24, 2022 at 1:32 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>
> >>> On 5/24/22 12:54, Frode Nordahl wrote:
> >>>> On Mon, May 23, 2022 at 3:49 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>
> >>>>> On 5/21/22 12:49, Frode Nordahl wrote:
> >>>>>> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl <frode.nord...@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>>>>
> >>>>>>>> On 5/13/22 10:36, Frode Nordahl wrote:
> >>>>>>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup </tl;dr>
> >>>>>>>>>
> >>>>>>>>> After OVN started colocating SNAT and DNAT operations in the same CT zone [0], the above-mentioned OVS commit appears to also break OVN [1]. I have been discussing this with Numan and he has found a correlation with the behavior of the Open vSwitch kernel datapath conntrack `skb_nfct_cached` function [2]: if that is changed to always return 'false', the erratic behavior disappears.
> >>>>>>>>>
> >>>>>>>>> So I guess the question then becomes: is this an OVS userspace or an OVS kernel problem?
> >>>>>>>>>
> >>>>>>>>> We have a reproducer in [3].
> >>>>>>>>>
> >>>>>>>>> 0: https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
> >>>>>>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
> >>>>>>>>> 2: https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
> >>>>>>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
> >>>>>>>>
> >>>>>>>> Hrm. I think there is a logical bug in the implementation of the ct() action in both datapaths.
> >>>>>>>>
> >>>>>>>> The problem appears when the OpenFlow pipeline for the packet contains several ct() actions. The NXAST_CT action (i.e. every OpenFlow ct() action) must always put the packet into an untracked state. Tracked state will be set only in the 'recirc_table', where the packet is cloned by the ct() action for further processing.
> >>>>>>>>
> >>>>>>>> If an OF pipeline looks like this:
> >>>>>>>>
> >>>>>>>>   actions=ct(),something,something,ct(),something
> >>>>>>>>
> >>>>>>>> each action must be entered with the 'untracked' conntrack state in the packet metadata.
> >>>>>>>>
> >>>>>>>> However, the ct() action in the datapath, unlike OpenFlow, doesn't work this way. It modifies the conntrack state of the packet it is processing. During flow translation OVS inserts a recirculation right after the datapath ct() action, so the conntrack state affects only the processing after recirculation. Sort of.
> >>>>>>>>
> >>>>>>>> The issue is that not all OpenFlow ct() actions have a recirc_table specified. These actions are supposed to change the state of the conntrack subsystem, but they must still leave the packet itself in the untracked state, with no strings attached.
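A minimal OpenFlow sketch of the shape being described (the zone, NAT address, and table number below are made up for illustration): both ct() actions in such a rule should be entered with the packet in the untracked state, and only the clone that the second ct() sends through conntrack into table 15 continues as tracked.

  actions=ct(commit,zone=8,nat(src=172.18.2.10)),ct(zone=8,table=15)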
> >>>>>>>> But since datapath ct() actions don't work that way, and since no recirculation is inserted (no recirc_table specified), the packet carries the conntrack state along to all the remaining actions after the conntrack execution. This doesn't impact normal actions, but it seems to break subsequent ct() actions on the same pipeline.
> >>>>>>>>
> >>>>>>>> In general, it looks to me like the ct() action in the datapath is not supposed to cache any connection data inside the packet's metadata. This seems to be the root cause of the problem. Fields that OF knows about should not trigger any issues if carried along with a packet, but datapath-specific metadata cannot be cleared, because OFproto simply doesn't know about it.
> >>>>>>>>
> >>>>>>>> But, I guess, not caching the connection and other internal structures might significantly impact datapath performance. Will it?
> >>>>>>>>
> >>>>>>>> Looking at the reproducer in [3], it has, for example, a flow with the following actions:
> >>>>>>>>
> >>>>>>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
> >>>>>>>>     set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
> >>>>>>>>     set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
> >>>>>>>>     ct(zone=8),recirc(0x4)
> >>>>>>>>
> >>>>>>>> So, if the first ct() changes some datapath-specific packet metadata, the second ct() may try to use that information.
> >>>>>>>>
> >>>>>>>> It looks like during flow translation we must add a ct_clear action explicitly after every ct() action, unless it is the last action in the list. That would end up adding a lot of ct_clear actions everywhere.
> >>>>>>>>
> >>>>>>>> Another option is the patch below, which tries to track whether a ct_clear is required and adds that action before the next ct() action in the datapath.
> >>>>>>>>
> >>>>>>>> BTW, the test [3] fails with both the kernel and the userspace datapath.
> >>>>>>>>
> >>>>>>>> The change below should fix the problem, I think. It looks like we also have to put a ct_clear action before translating output to a patch port if we're in the 'conntracked' state.
> >>>>>>>>
> >>>>>>>> And I do not know how to fix the problem for kernels without ct_clear support. I don't think we can clear metadata that is internal to the kernel. Ideas are welcome.
> >>>>>>>>
> >>>>>>>> CC: Aaron, Paolo.
> >>>>>>>>
> >>>>>>>> Any thoughts?
> >>>>>>>
> >>>>>>> Thank you so much for the detailed explanation of what's going on and for providing a proposed fix.
> >>>>>>>
> >>>>>>> I took it for a spin and it does indeed appear to fix the OVN hairpin issue, but unfortunately it does not appear to fix the issue Liam reported for the ML2/OVS use case [4].
> >>>>>>>
> >>>>>>> When trying that use case with your patch I see this in the Open vSwitch log:
> >>>>>>>
> >>>>>>> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over max translation depth 64 on bridge br-int while processing ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >>>>>>> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped 2 log messages in last 205 seconds (most recently, 187 seconds ago) due to excessive rate
> >>>>>>> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow: ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >>>>>>>
> >>>>>>> bridge("br-int")
> >>>>>>> ----------------
> >>>>>>>  0. priority 0, cookie 0x56d6d58c018b51bd
> >>>>>>>     goto_table:60
> >>>>>>> 60. priority 3, cookie 0x56d6d58c018b51bd
> >>>>>>>     NORMAL
> >>>>>>>      >>>> received packet on unknown port 5 <<<<
> >>>>>>>      >> no input bundle, dropping
> >>>>>>>
> >>>>>>> Final flow: unchanged
> >>>>>>> Megaflow: recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
> >>>>>>> Datapath actions: drop
> >>>>>>>
> >>>>>>> On the back of your explanation and the log output I made an attempt at adding this patch on top of yours:
> >>>>>>> ---
> >>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>>>> index 110dab0ec..2955e8e4d 100644
> >>>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>>>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >>>>>>>           * resubmit to the frozen actions.
> >>>>>>>           */
> >>>>>>>          case OFPACT_RESUBMIT:
> >>>>>>> +            ctx->pending_ct_clear = true;
> >>>>>>>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
> >>>>>>>              continue;
> >>>>>>>          case OFPACT_GOTO_TABLE:
> >>>>>>> ---
> >>>>>>>
> >>>>>>> And the two patches together do appear to resolve the issue reported in [4] as well as the OVN hairpin issue [1]. It does, however, make a couple of tests fail, so I need to look into whether that is expected from the change or whether the approach must be changed.
> >>>>>>>
> >>>>>>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
> >>>>>>
> >>>>>> The following patch also works and does not cause any tests to fail.
> >>>>>>
> >>>>>> ---
> >>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>>> index 110dab0ec..905f6994d 100644
> >>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>>> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >>>>>>              break;
> >>>>>>
> >>>>>>          case OFPACT_CT_CLEAR:
> >>>>>> -            if (ctx->conntracked || ctx->pending_ct_clear) {
> >>>>>> +            if (ctx->conntracked || ctx->pending_ct_clear
> >>>>>> +                || (flow->ct_state && flow->ct_state & CS_TRACKED))
> >>>>>> +            {
> >>>>>>                  compose_ct_clear_action(ctx);
> >>>>>>              }
> >>>>>>              break;
> >>>>>> ---
> >>>>>>
> >>>>>> The OpenStack ML2/OVS Open vSwitch firewall driver appears to make explicit use of the ct_clear action as part of progressing packets through the tables, and the optimization in 355fef6f2 interrupts that workflow.
> >>>>>>
> >>>>>> I am not quite sure whether the definition of `struct xlate_ctx->conntracked` is "this flow has been exposed to conntrack" or "conntrack sees this flow as established". If anyone can shed light on that, we would know whether the above patch is OK as is, or whether this flow state should have set `struct xlate_ctx->conntracked` to true in the first place.
> >>>>>
> >>>>> I believe that ctx->conntracked should mean "this flow has been exposed to conntrack". And it looks like the condition (!ctx->conntracked && flow->ct_state & CS_TRACKED) should not be possible. We're probably missing a clear_conntrack() call somewhere.
> >>>>>
> >>>>> One thing I seem to be missing in my patch is a conntrack clear after returning from patch port processing:
> >>>>>
> >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>> index 110dab0ec..53d4f78b2 100644
> >>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>> @@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
> >>>>>
> >>>>>      /* The out bridge's conntrack execution should have no effect on the
> >>>>>       * original bridge. */
> >>>>> -    ctx->conntracked = old_conntrack;
> >>>>>      ctx->pending_ct_clear = old_ct_clear;
> >>>>> +    if (ctx->conntracked) {
> >>>>> +        /* Conntrack was involved in the other bridge. We need to clear
> >>>>> +         * whatever information was cached in the datapath. */
> >>>>> +        ctx->pending_ct_clear = true;
> >>>>> +    }
> >>>>> +    ctx->conntracked = old_conntrack;
> >>>>>
> >>>>>      /* The fact that the out bridge exits (for any reason) does not mean
> >>>>>       * that the original bridge should exit.  Specifically, if the out
> >>>>> ---
> >>>>>
> >>>>> But that doesn't seem to be related to the (!ctx->conntracked && flow->ct_state & CS_TRACKED) case...
> >>>>
> >>>> Thank you for confirming that ctx->conntracked should have been set in this flow state; that led me to the likely root cause and a fix for this particular use case.
> >>>>
> >>>> Enabling debug logging for the dpif module while initiating a new flow, I see that the flow can enter the system as a miss upcall without recirculation and with ct_state set:
> >>>>
> >>>> 2022-05-24T10:42:57.869Z|00036|dpif(handler6)|DBG|system@ovs-system: miss upcall:
> >>>> recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0x21),ct_zone(0),ct_mark(0),ct_label(0),ct_tuple6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,proto=58,src_port=128,dst_port=0),eth(src=fa:16:3e:81:22:4d,dst=fa:16:3e:e5:e7:b2),eth_type(0x86dd),ipv6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,label=0x357f1,proto=58,tclass=0,hlimit=64,frag=no),icmpv6(type=128,code=0)
> >>>>
> >>>> Looking at the code, it appears that for this state ctx->conntracked would not be set; it is currently initialized to 'false' and only updated from the frozen state for recirculated packets.
> >>>>
> >>>> Adding this patch resolves the inconsistency we discussed above, and subsequently the ML2/OVS problem:
> >>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>> index 110dab0ec..7bc7426ac 100644
> >>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>> @@ -7852,6 +7852,12 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
> >>>>          goto exit;
> >>>>      }
> >>>>
> >>>> +    if (!xin->frozen_state
> >>>> +        && xin->flow.ct_state
> >>>> +        && xin->flow.ct_state & CS_TRACKED) {
> >>>> +        ctx.conntracked = true;
> >>>> +    }
> >>>> +
> >>>>      /* Tunnel metadata in udpif format must be normalized before translation. */
> >>>>      if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) {
> >>>>          const struct tun_table *tun_tab = ofproto_get_tun_tab(
> >>>> ---
> >>>>
> >>>> What do you think?
> >>>
> >>> Hrm. I suspected something like this. My guess is that the packet gets conntracked in OVS, then leaves OVS, gets routed inside the kernel, and enters OVS again while retaining the ct state from the previous run. Can you confirm that?
> >>
> >> In this setup there is an interface present in the system, attached to OVS using an interface of type internal. It's essentially there to give a path into an OpenStack-managed overlay network for some management daemons running "outside the cloud".
> >>
> >> Without the patches we've been working on in this thread so far, and looking at the OF flow configuration side, the packets traverse the tables like this:
> >>   table=0, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,59)
> >>   table=59, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,60)
> >>   table=60, n_packets=4, n_bytes=384, priority=100,in_port="o-hm0" actions=load:0x3->NXM_NX_REG5[],load:0x1->NXM_NX_REG6[],resubmit(,71)
> >>   table=71, n_packets=65, n_bytes=7670, priority=110,ct_state=+trk actions=ct_clear,resubmit(,71)
> >>
> >> Since we currently ignore the request for the ct_clear action in this state, the packet is resubmitted until it hits the translation depth limit in the last rule above.
> >>
> >> The OVS kernel event tracing output for the same flow looks like this:
> >> ping-207769 [010] ..... 33681.265450: ovs_dp_upcall: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000662a695d len=118 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=00000000900b6d94 eth_type=0xdd86 ct_state=21 ct_orig_proto=3a ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951 upcall_mru=0
> >> <idle>-0 [010] ..s.. 33686.518866: ovs_dp_upcall: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000ab9954ee len=86 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=00000000cfad77d1 eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951 upcall_mru=0
> >> handler12-205602 [010] ..... 33686.519158: ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1 action_type=3 action_len=12 action_data=0000000046f8c388 is_last=0
> >> handler12-205602 [010] ..... 33686.519159: ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1 action_type=1 action_len=4 action_data=00000000acdfdf2a is_last=0
> >> handler12-205602 [010] ..... 33686.519197: ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1 action_type=4 action_len=4 action_data=0000000069492ddd is_last=0
> >> handler12-205602 [010] ..... 33686.519197: ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1 action_type=1 action_len=4 action_data=00000000e5733215 is_last=1
> >> <idle>-0 [011] ..s.. 33686.521143: ovs_dp_upcall: dpaddr=0000000053422f7e dp_name=ovs-system dev=gre_sys skbaddr=00000000369641d8 len=78 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=00000000dc600580 eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=3508503525 upcall_mru=0
> >> handler13-205603 [011] ..... 33686.521383: ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system dev=gre_sys skbaddr=000000007efe6e4d len=78 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=00000000303b841b eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1 action_type=1 action_len=4 action_data=0000000033bb5908 is_last=1
> >>
> >> So to me it looks like the packet is conntracked before entering OVS the first time, and I guess we would have the same situation if it took a path like the one you suggested above.
> >>
> >>> If that's the case, I think we actually have to clear the ct state when the packet first enters the kernel datapath. Otherwise it can match on incorrect flows in the kernel flow cache. Marking it as conntracked would probably go against the logic of the OF pipeline.
> >>
> >> We did previously clear this from the OVS userspace side, but I agree with you that that is problematic. This is probably why Numan was on the trail of that in-kernel conntrack cache function (`skb_nfct_cached`).
> >>
> >>> Something like this (not tested):
> >>>
> >>> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> >>> index 82a74f998966..ebea64494196 100644
> >>> --- a/net/openvswitch/vport.c
> >>> +++ b/net/openvswitch/vport.c
> >>> @@ -451,6 +451,12 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
> >>>                 kfree_skb(skb);
> >>>                 return error;
> >>>         }
> >>> +
> >>> +       /* Packets entering OVS has to be in untracked state.  Clearing here
> >>> +        * to be sure that no state persisted from the last pass through OVS.
> >>> +        */
> >>> +       ovs_ct_clear(skb, &key);
> >>> +
> >>>         ovs_dp_process_packet(skb, &key);
> >>>         return 0;
> >>> }
> >>> ---
> >>
> >> Yep, this works:
> >> ping-3265 [011] ..... 97.820835: ovs_dp_upcall: dpaddr=00000000a05ba3f9 dp_name=ovs-system dev=o-hm0 skbaddr=000000003b1e06d0 len=118 data_len=0 truesize=768 nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000 keyaddr=0000000077e1165d eth_type=0xdd86 ct_state=00 ct_orig_proto=00 ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2787207471 upcall_mru=0
> >>
> >> An end-to-end workload test of the system described in [4] with this patched kernel is also successful. For good measure I also ran the OVN kernel system testsuite and observed no regressions with this kernel.
> >>
> >> So to sum up, I guess we have two distinct bugs here: we need your proposed ofproto-dpif patch to fix the OVN hairpin issue, and the kernel patch for the already-conntracked-packet-entering-the-system issue that affects ML2/OVS.
> >
> > If we agree on how to proceed, I'd be happy to help put this into formal patches with tests and appropriate attribution etc. Let me know what you think and how you would prefer we split the work of bringing this to a resolution.
>
> I'm working on a good commit message for the kernel piece above (ovs_vport_receive), will send it soon.
Great, thank you for doing that!

> If you can help with stitching all the pieces for the OVS userspace part and writing tests for it, that would be great.

I will work on this part and hopefully submit tomorrow.

--
Frode Nordahl

> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
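As a rough sketch of the intended effect of the userspace fix discussed above (illustrative only, not output from the actual patches): with a ct_clear emitted before the next ct() whenever the packet may already carry conntrack state, the reproducer's datapath actions quoted earlier in the thread would look roughly like

  actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
    set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
    set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
    ct_clear,ct(zone=8),recirc(0x4)

so that the second ct() no longer sees state cached by the first one.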