On Thu, Apr 6, 2023 at 5:31 PM Ilya Maximets <i.maxim...@ovn.org> wrote:

> On 4/6/23 16:53, Ales Musil wrote:
> >
> >
> > On Thu, Apr 6, 2023 at 4:48 PM Ilya Maximets <i.maxim...@ovn.org
> <mailto:i.maxim...@ovn.org>> wrote:
> >
> >     On 4/6/23 16:37, Ales Musil wrote:
> >     > There are essentially three problems with the current
> >     > combination of DGP + SNAT + LB:
> >     >
> >     > 1) The first packet is being SNATed in common zone due
> >     > to a problem with pinctrl not preserving ct_mark/ct_label.
> >     > The commit would create a SNAT entry within the same with DNAT
> >     > entry.
> >     >
> >     > 2) The unSNAT for reply always happened in common zone because of
> >     > the loopback check which would be triggered only when we loop
> >     > the packet through the LR. Now there are two possibilities how
> >     > the reply packet would be handled:
> >     >
> >     > a) If the entry for SNAT in common zone did not time out yet, the
> >     > packet would be passed through unSNAT in common zone which would
> >     > be fine and continue on. However, the unDNAT wouldn't work due to
> >     > the limitation of CT not capable of doing unSNAT/unDNAT on the same
> >     > packet twice in the same zone. So the reply would arrive to
> >     > the original interface, but with wrong source address.
> >     >
> >     > b) If the entry for SNAT timed out it would loop and do unSNAT
> correctly
> >     > in separate zone and then also unDNAT. This is not possible
> anymore with
> >     > a recent change 8c341b9d (northd: Drop packets destined to router
> owned NAT IP for DGP).
> >     > The reply would be dropped before looping after that change co the
> traffic
> >     > would never arrive to the original interface.
> >     >
> >     > 3) The unDNAT was happening only if the DGP was outport meaning
> >     > the reply traffic was routed out, but in the opposite case
> >     > the unDNAT was happening only because of the looping which made
> >     > outport=inport. That's why it worked before introduction of
> explicit drop.
> >     >
> >     > In order to fix all those issues do two changes:
> >     >
> >     > 1) Include inport in the unDNAT match, so that we have both
> >     > routing directions covered e.g. (inport == "dgp_port" || outport
> == "dpg_port").
> >     >
> >     > 2) Always use the separate zone for SNAT and DNAT. As the common
> >     > zone was needed for HWOL make the common zone optional with
> >     > configuration option called "use_common_zone". This option is
> >     > by default "false" and can be specified per LR. Use of separate
> >     > zones also eliminates the need for the flag propagation
> >     > in "lr_out_chk_dnat_local" stage, removing the match on
> ct_mark/ct_label.
> >     >
> >     > The "SNAT in separate zone from DNAT" system test is moved to run
> only
> >     > with kernel datapath. The reason is that this test doesn't work
> with
> >     > userspace datapath due to recirculation limit, currently set to 6
> [0].
> >     >
> >     > [0]
> https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102
> <
> https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102
> >
> >     > Reported-at: https://bugzilla.redhat.com/2161281 <
> https://bugzilla.redhat.com/2161281>
> >     > Signed-off-by: Ales Musil <amu...@redhat.com <mailto:
> amu...@redhat.com>>
> >     > ---
> >     > v2: Fix flaky system test.
> >     > v3: Rebase on top of current main.
> >     > v4: Rebase on top of current main.
> >     >     Move the system test to system-ovn-kmod
> >     >     to unblock the failures with userspace.
> >     > v5: Rebase on top of current main.
> >     > v6: Rebase on top of current main.
> >     >     Change the config to a global option instead.
> >     > v7: Address comments from Dumitru:
> >     >     Rename the stateless helper to better reflect
> >     >     that it applies only to snat_and_dnat type.
> >     >     Call the common zone and separate zone functions
> >     >     separately based on condition.
> >     > ---
> >     >  northd/northd.c          | 536
> +++++++++++++++++++++------------------
> >     >  northd/ovn-northd.8.xml  |  90 +------
> >     >  ovn-nb.xml               |   9 +
> >     >  tests/ovn-northd.at <http://ovn-northd.at>      | 213
> +++++++++++-----
> >     >  tests/ovn.at <http://ovn.at>             |   3 +
> >     >  tests/system-ovn-kmod.at <http://system-ovn-kmod.at> | 166
> ++++++++++++
> >     >  tests/system-ovn.at <http://system-ovn.at>      | 117 ---------
> >     >  7 files changed, 634 insertions(+), 500 deletions(-)
> >     >
> >     > diff --git a/northd/northd.c b/northd/northd.c
> >     > index 7a3886de0..1d13c3d8e 100644
> >     > --- a/northd/northd.c
> >     > +++ b/northd/northd.c
> >     > @@ -66,6 +66,9 @@ static bool check_lsp_is_up;
> >     >
> >     >  static bool install_ls_lb_from_router;
> >     >
> >     > +/* Use common zone for SNAT and DNAT if this option is set to
> "true". */
> >     > +static bool use_common_zone = false;
> >     > +
> >     >  /* MAC allocated for service monitor usage. Just one mac is
> allocated
> >     >   * for this purpose and ovn-controller's on each chassis will
> make use
> >     >   * of this mac when sending out the packets to monitor the
> services
> >     > @@ -10662,6 +10665,8 @@
> build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> >     >                                       enum
> lrouter_nat_lb_flow_type type,
> >     >                                       struct ovn_datapath *od)
> >     >  {
> >     > +    struct ovn_port *dgp = od->l3dgw_ports[0];
> >     > +
> >     >      const char *undnat_action;
> >     >
> >     >      switch (type) {
> >     > @@ -10673,9 +10678,12 @@
> build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> >     >          break;
> >     >      case LROUTER_NAT_LB_FLOW_NORMAL:
> >     >      case LROUTER_NAT_LB_FLOW_MAX:
> >     > -        undnat_action = od->is_gw_router ? "ct_dnat;" :
> "ct_dnat_in_czone;";
> >     > +        undnat_action = !od->is_gw_router && use_common_zone
> >     > +                        ? "ct_dnat_in_czone;"
> >     > +                        : "ct_dnat;";
> >     >          break;
> >     >      }
> >     > +
> >     >      /* Store the match lengths, so we can reuse the ds buffer. */
> >     >      size_t new_match_len = ctx->new_match->length;
> >     >      size_t undnat_match_len = ctx->undnat_match->length;
> >     > @@ -10702,10 +10710,9 @@
> build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> >     >          return;
> >     >      }
> >     >
> >     > -    ds_put_format(ctx->undnat_match,
> >     > -                  ") && outport == %s && is_chassis_resident(%s)",
> >     > -                  od->l3dgw_ports[0]->json_key,
> >     > -                  od->l3dgw_ports[0]->cr_port->json_key);
> >     > +    ds_put_format(ctx->undnat_match, ") && (inport == %s ||
> outport == %s)"
> >     > +                  " && is_chassis_resident(%s)", dgp->json_key,
> dgp->json_key,
> >     > +                  dgp->cr_port->json_key);
> >     >      ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT,
> 120,
> >     >                              ds_cstr(ctx->undnat_match),
> undnat_action,
> >     >                              &ctx->lb->nlb->header_);
> >     > @@ -11157,6 +11164,14 @@ lrouter_dnat_and_snat_is_stateless(const
> struct nbrec_nat *nat)
> >     >             !strcmp(nat->type, "dnat_and_snat");
> >     >  }
> >     >
> >     > +static inline bool
> >     > +lrouter_use_common_zone_in_nat(const struct ovn_datapath *od,
> >     > +                               const struct nbrec_nat *nat)
> >     > +{
> >     > +    return !od->is_gw_router && use_common_zone &&
> >     > +           !lrouter_dnat_and_snat_is_stateless(nat);
> >     > +}
> >     > +
> >     >  /* Handles the match criteria and actions in logical flow
> >     >   * based on external ip based NAT rule filter.
> >     >   *
> >     > @@ -13648,85 +13663,90 @@ build_lrouter_ipv4_ip_input(struct
> ovn_port *op,
> >     >      }
> >     >  }
> >     >
> >     > +static void
> >     > +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows,
> >     > +                                      struct ovn_datapath *od,
> >     > +                                      const struct nbrec_nat *nat,
> >     > +                                      struct ds *match, bool
> distributed,
> >     > +                                      bool is_v6, struct ovn_port
> *l3dgw_port)
> >     > +{
> >     > +    if (strcmp(nat->type, "snat") && strcmp(nat->type,
> "dnat_and_snat")) {
> >     > +        return;
> >     > +    }
> >     > +
> >     > +    ds_clear(match);
> >     > +
> >     > +    struct ds zone_match = DS_EMPTY_INITIALIZER;
> >     > +
> >     > +    ds_put_format(match, "ip && ip%c.dst == %s && inport == %s",
> >     > +                  is_v6 ? '6' : '4', nat->external_ip,
> l3dgw_port->json_key);
> >     > +    ds_clone(&zone_match, match);
> >
> >     Not sure how big of a problem it is, but it would be better to avoid
> >     cloning.  It's an extra memory allocation and copy.  Instead, it
> should
> >     be possible to save the length, create one flow, truncate, create a
> >     second one.  We do that in many other places.
> >
> >
> >     > +
> >     > +    ds_put_cstr(match, " && flags.loopback == 0");
> >     > +
> >     > +    /* Update common zone match for the hairpin traffic. */
> >     > +    ds_put_cstr(&zone_match, " && flags.loopback == 1"
> >     > +                             " && flags.use_snat_zone == 1");
> >     > +
> >     > +
> >     > +    if (!distributed && od->n_l3dgw_ports) {
> >     > +        /* Flows for NAT rules that are centralized are only
> >     > +         * programmed on the gateway chassis. */
> >     > +        ds_put_format(match, " && is_chassis_resident(%s)",
> >     > +                      l3dgw_port->cr_port->json_key);
> >     > +        ds_put_format(&zone_match, " && is_chassis_resident(%s)",
> >     > +                      l3dgw_port->cr_port->json_key);
> >
> >     If we add this before checking flags, we'll not need to update both
> matches,
> >     i.e. we can do that before the clone / before saving the length.
> >
> >
> > That was my original idea. There is one caveat that it affects a ton of
> tests.
> > Would you agree to have this change as next patch that will target
> specifically
> > this so the tests are easier to review?
>
> AFAICT, only 24 lines needs changing in tests for this.  And you're
> touching
> more than 200 already.  So, I'm not sure if it's worth a separate patch.
>

Yeah it wasn't that bad afterall, the ds_clone was removed in v8.


>
> >
> >
> >
> >     > +    }
> >     > +
> >     > +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> >     > +                            100, ds_cstr(match),
> "ct_snat_in_czone;",
> >     > +                            &nat->header_);
> >     > +
> >     > +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> >     > +                            100, ds_cstr(&zone_match), "ct_snat;",
> >     > +                            &nat->header_);
> >     > +
> >     > +    ds_destroy(&zone_match);
> >     > +}
> >
> >     Best regards, Ilya Maximets.
> >
> >
> >
> > Thanks,
> > Ales
> >
> > --
> >
> > Ales Musil
> >
> > Senior Software Engineer - OVN Core
> >
> > Red Hat EMEA <https://www.redhat.com>
> >
> > amu...@redhat.com <mailto:amu...@redhat.com>    IM: amusil
> >
> > <https://red.ht/sig>
> >
>
>
Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

amu...@redhat.com    IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to