On Thu, Mar 24, 2022 at 4:47 PM Mark Michelson <mmich...@redhat.com> wrote:
>
> Hi Numan,
>
> This message got buried in my inbox, so I'm sorry for only replying now. I
> have some replies inline below.
>
> On 3/18/22 16:34, Numan Siddique wrote:
> > On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson <mmich...@redhat.com> wrote:
> >>
> >> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
> >> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
> >> accounts for certain situations, such as hairpinned traffic, where we
> >> still need a separate SNAT and DNAT zone.
> >>
> >> However, one situation it did not account for was when traffic traverses
> >> a logical router and is DNATted as a result of a load balancer, and then
> >> needs to be SNATted when it egresses the router. In this situation, we
> >> try to use the same CT zone for the SNAT as for the load balancer DNAT,
> >> which does not work.
> >>
> >> This commit fixes the issue by installing a flow in the OUT_SNAT stage
> >> of the logical router egress pipeline for each SNAT. This flow will fall
> >> back to using a separate CT zone for SNAT if the ct_label.natted flag is
> >> set.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
> >>
> >> Signed-off-by: Mark Michelson <mmich...@redhat.com>
> >
> >
> > Hi Mark,
> >
> > A few comments:
> >
> >   - There are a few test cases failing with this patch.
>
> Interesting. I'll have to double-check that, but I hadn't noticed it
> happening locally.
>
> >
> >   - I'm pretty sure the issue will not be seen if the load balancer is
> > also associated with the logical switch.  In that case, DNAT would
> > happen in the logical switch pipeline.
> >     But we should still address this issue, as it's a regression
> > introduced by commit 4deac4509abb.
> >
> >   - The only small concern I have with this patch is the use of the
> > ct_label.natted field in the egress pipeline.  In the ingress pipeline
> > of the router we send the packet to conntrack and then do NAT.
> >     I'm just worried about what happens if the ct states/labels are not
> > valid or get cleared as the packet moves from the ingress to the egress
> > pipeline.  It looks like we don't clear the conntrack state now.
> >     But future patches could.  I think we should document this somewhere.
>
> The tricky thing here is that I don't know of any other reliable way to
> tell in the egress pipeline whether the packet was NATted in the ingress
> pipeline. AFAIK, there is no scenario where the router
> egress pipeline can run on a different node than the router ingress
> pipeline, so at least the way it currently works should be reliable.
>
> Where is the best place to document this?

I think a code comment in northd.c would be helpful.  I'd also suggest seeing
whether we can document this in ovn-northd.8.xml while updating that file for
the newly added logical flow(s).  I don't have a strong preference, so please
see if it makes sense to document it in ovn-northd.8.xml too.
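
For the northd.c side, something along these lines near the new flow is the
kind of wording I have in mind (just a rough sketch of a comment, please
adjust as needed):

    /* This flow relies on ct_label.natted surviving from the router
     * ingress pipeline (where the packet is sent to conntrack and possibly
     * DNATted by a load balancer) into the egress pipeline.  Today nothing
     * clears the conntrack state in between, and the egress pipeline always
     * runs on the same node as the ingress pipeline, but if either of those
     * assumptions ever changes, this flow will need to be revisited. */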

Numan

>
> >
> > -  This patch needs to add the relevant documentation to
> > ovn-northd.8.xml for the newly added flows.
> >
> > -  Instead of adding the flows in the "lr_out_snat" stage, I'd
> > suggest adding the below flow in the "lr_out_chk_dnat_local" stage:
> >
> >      table=0 (lr_out_chk_dnat_local), priority=50   , match=(ip &&
> > ct_label.natted == 1),  action=(reg9[4] = 1; next;)
> >
> > We already have flows in the "lr_out_snat" stage that use the
> > "ct_snat" action instead of "ct_snat_in_czone" if reg9[4] is set.
> > Can you please see if this suggestion is possible?
>
> That sounds perfectly reasonable. I'll test it out and see if it works.
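
Thanks.  Roughly what I had in mind is something like the below in northd.c
(an untested sketch; the stage and macro names are from memory, so please
double-check them):

    /* If conntrack reports that the packet was already NATted (e.g. by a
     * load balancer DNAT), set REGBIT_DST_NAT_IP_LOCAL (reg9[4]) so that
     * the existing flows in "lr_out_snat" use ct_snat in the separate zone
     * instead of ct_snat_in_czone. */
    ovn_lflow_add(lflows, od, S_ROUTER_OUT_CHECK_DNAT_LOCAL, 50,
                  "ip && ct_label.natted == 1",
                  REGBIT_DST_NAT_IP_LOCAL" = 1; next;");

That way the existing reg9[4]-based fallback flows can be reused instead of
adding extra per-NAT flows in "lr_out_snat".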
>
> >
> >
> > Thanks
> > Numan
>
>
> >
> >
> >> ---
> >> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
> >> fail on some systems, and at this point it's not clear what causes it.
> >> On my development system running Fedora 33 and kernel 5.14.18-100, the
> >> test fails because the pings fail. In a container running Fedora 33 and
> >> kernel 5.10.19-200, the test succeeds. It's unknown whether the kernel
> >> version or some other setting has a bearing on the success or failure
> >> of the test. The OpenFlow that ovn-controller generates is identical
> >> in both the success and failure scenarios.
> >>
> >> If this inconsistency causes issues with CI, then this patch may need to
> >> be re-examined to determine if there are other parameters that may be
> >> needed to correct the referenced issue.
> >> ---
> >>   northd/northd.c     |  53 ++++++++++++++------
> >>   tests/ovn-northd.at |  36 ++++++++++++++
> >>   tests/system-ovn.at | 117 ++++++++++++++++++++++++++++++++++++++++++++
> >>   3 files changed, 190 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index b264fb850..866a81310 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows, struct ovn_datapath *od,
> >>                                   priority, ds_cstr(match),
> >>                                   ds_cstr(actions), &nat->header_);
> >>
> >> -        if (!stateless) {
> >> -            ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> >> -            ds_clear(actions);
> >> -            if (distributed) {
> >> -                ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> >> -                              ETH_ADDR_ARGS(mac));
> >> -            }
> >> -            ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
> >> -                          nat->external_ip);
> >> -            if (nat->external_port_range[0]) {
> >> -                ds_put_format(actions, ",%s", nat->external_port_range);
> >> -            }
> >> -            ds_put_format(actions, ");");
> >> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> -                                    priority + 1, ds_cstr(match),
> >> -                                    ds_cstr(actions), &nat->header_);
> >> +        if (stateless) {
> >> +            return;
> >> +        }
> >> +
> >> +        struct ds match_natted = DS_EMPTY_INITIALIZER;
> >> +        struct ds actions_natted = DS_EMPTY_INITIALIZER;
> >> +
> >> +        ds_clone(&match_natted, match);
> >> +
> >> +        ds_clear(actions);
> >> +
> >> +        if (distributed) {
> >> +            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> >> +                          ETH_ADDR_ARGS(mac));
> >>           }
> >> +
> >> +        ds_clone(&actions_natted, actions);
> >> +
> >> +        ds_put_format(&actions_natted, "ct_snat(%s", nat->external_ip);
> >> +        if (nat->external_port_range[0]) {
> >> +            ds_put_format(&actions_natted, ",%s", nat->external_port_range);
> >> +        }
> >> +        ds_put_format(&actions_natted, ");");
> >> +
> >> +        ds_put_cstr(&match_natted, " && ct_label.natted == 1");
> >> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> +                                priority + 2, ds_cstr(&match_natted),
> >> +                                ds_cstr(&actions_natted), &nat->header_);
> >> +
> >> +        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> >> +        ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; %s",
> >> +                      ds_cstr(&actions_natted));
> >> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> +                                priority + 1, ds_cstr(match),
> >> +                                ds_cstr(actions), &nat->header_);
> >> +
> >> +        ds_destroy(&match_natted);
> >> +        ds_destroy(&actions_natted);
> >>       }
> >>   }
> >>
> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >> index 60d91a771..d9657bcb8 100644
> >> --- a/tests/ovn-northd.at
> >> +++ b/tests/ovn-northd.at
> >> @@ -5974,6 +5974,42 @@ AT_CHECK([grep -e "(lr_in_ip_routing   ).*outport" lr0flows | sed 's/table=../ta
> >>     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;)
> >>   ])
> >>
> >> +AT_CLEANUP
> >> +
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([nat -- SNAT flows for DNAT traffic])
> >> +# This test makes sure that SNAT flows are added that ensure a unique
> >> +# SNAT zone is used for traffic that is natted. We don't need an elaborate
> >> +# setup just to ensure the flow is present. We just need to create the router
> >> +# and add an SNAT to it.
> >> +
> >> +ovn_start
> >> +
> >> +check ovn-nbctl lr-add r0
> >> +check ovn-nbctl lrp-add r0 r0p1 00:00:00:00:00:01 10.0.0.1
> >> +check ovn-nbctl lrp-set-gateway-chassis r0p1 hv1
> >> +check ovn-nbctl lr-nat-add r0 snat 10.0.0.1 192.168.0.1
> >> +check ovn-nbctl --wait=sb sync
> >> +
> >> +ovn-sbctl lflow-list r0 | grep lr_out_snat | sed 's/table=../table=??/' > snat_flows
> >> +
> >> +# We expect three SNAT flows for the NAT we created.
> >> +# Priority-161 flow that uses CT zone stored in reg11
> >> +# Priority-162 flow that uses CT zone stored in reg12 if local dnat is detected
> >> +# Priority-163 flow that uses CT zone stored in reg12 if traffic is natted
> >> +# Since priorities are unique, we can grep based on that and check the resulting
> >> +# flows are as expected.
> >> +
> >> +AT_CHECK([grep priority=163 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
> >> +])
> >> +AT_CHECK([grep priority=162 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[[4]] == 1), action=(reg9[[4]] = 0; ct_snat(10.0.0.1);)
> >> +])
> >> +AT_CHECK([grep priority=161 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)
> >> +])
> >> +
> >>   AT_CLEANUP
> >>   ])
> >>
> >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >> index d4c22e7e9..0b29becc8 100644
> >> --- a/tests/system-ovn.at
> >> +++ b/tests/system-ovn.at
> >> @@ -7975,3 +7975,120 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >>
> >>   AT_CLEANUP
> >>   ])
> >> +
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([SNAT in separate zone from DNAT])
> >> +
> >> +CHECK_CONNTRACK()
> >> +CHECK_CONNTRACK_NAT()
> >> +ovn_start
> >> +OVS_TRAFFIC_VSWITCHD_START()
> >> +ADD_BR([br-int])
> >> +
> >> +# Set external-ids in br-int needed for ovn-controller
> >> +ovs-vsctl \
> >> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> >> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> >> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> >> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> >> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> >> +
> >> +# The goal of this test is to ensure that when traffic is first DNATted
> >> +# (by way of a load balancer), and then SNATted, the SNAT happens in a
> >> +# separate conntrack zone from the DNAT.
> >> +
> >> +start_daemon ovn-controller
> >> +
> >> +check ovn-nbctl ls-add public
> >> +
> >> +check ovn-nbctl lr-add r1
> >> +check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
> >> +check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
> >> +check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
> >> +
> >> +check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
> >> +check ovn-nbctl lr-lb-add r1 r1_lb
> >> +
> >> +check ovn-nbctl ls-add s1
> >> +check ovn-nbctl lsp-add s1 s1_r1
> >> +check ovn-nbctl lsp-set-type s1_r1 router
> >> +check ovn-nbctl lsp-set-addresses s1_r1 router
> >> +check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
> >> +
> >> +check ovn-nbctl lsp-add s1 vm1
> >> +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
> >> +
> >> +check ovn-nbctl lsp-add public public_r1
> >> +check ovn-nbctl lsp-set-type public_r1 router
> >> +check ovn-nbctl lsp-set-addresses public_r1 router
> >> +check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
> >> +
> >> +check ovn-nbctl lr-add r2
> >> +check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
> >> +check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
> >> +check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
> >> +check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
> >> +
> >> +check ovn-nbctl ls-add s2
> >> +check ovn-nbctl lsp-add s2 s2_r2
> >> +check ovn-nbctl lsp-set-type s2_r2 router
> >> +check ovn-nbctl lsp-set-addresses s2_r2 router
> >> +check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
> >> +
> >> +check ovn-nbctl lsp-add s2 vm2
> >> +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
> >> +
> >> +check ovn-nbctl lsp-add public public_r2
> >> +check ovn-nbctl lsp-set-type public_r2 router
> >> +check ovn-nbctl lsp-set-addresses public_r2 router
> >> +check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
> >> +
> >> +ADD_NAMESPACES(vm1)
> >> +ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
> >> +         "173.0.1.1")
> >> +ADD_NAMESPACES(vm2)
> >> +ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
> >> +         "173.0.2.1")
> >> +
> >> +check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
> >> +check ovn-nbctl --wait=hv sync
> >> +
> >> +# Next, make sure that a ping works as expected
> >> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
> >> +[0], [dnl
> >> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >> +])
> >> +
> >> +# Finally, make sure that conntrack shows two separate zones being used for
> >> +# DNAT and SNAT
> >> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
> >> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> >> +icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x2
> >> +])
> >> +
> >> +# The final two entries appear identical here. That is because FORMAT_CT
> >> +# scrubs the zone numbers. In actuality, the zone numbers are different,
> >> +# which is why there are two entries.
> >> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
> >> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> >> +icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +])
> >> +
> >> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >> +
> >> +as ovn-sb
> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> +
> >> +as ovn-nb
> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> +
> >> +as northd
> >> +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
> >> +
> >> +as
> >> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >> +/connection dropped.*/d"])
> >> +AT_CLEANUP
> >> +])
> >> --
> >> 2.31.1
> >>
> >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev