On 4/6/23 16:53, Ales Musil wrote:
> 
> 
> On Thu, Apr 6, 2023 at 4:48 PM Ilya Maximets <i.maxim...@ovn.org 
> <mailto:i.maxim...@ovn.org>> wrote:
> 
>     On 4/6/23 16:37, Ales Musil wrote:
>     > There are essentially three problems with the current
>     > combination of DGP + SNAT + LB:
>     >
>     > 1) The first packet is being SNATed in common zone due
>     > to a problem with pinctrl not preserving ct_mark/ct_label.
>     > The commit would create a SNAT entry within the same with DNAT
>     > entry.
>     >
>     > 2) The unSNAT for reply always happened in common zone because of
>     > the loopback check which would be triggered only when we loop
>     > the packet through the LR. Now there are two possibilities how
>     > the reply packet would be handled:
>     >
>     > a) If the entry for SNAT in common zone did not time out yet, the
>     > packet would be passed through unSNAT in common zone which would
>     > be fine and continue on. However, the unDNAT wouldn't work due to
>     > the limitation of CT not capable of doing unSNAT/unDNAT on the same
>     > packet twice in the same zone. So the reply would arrive to
>     > the original interface, but with wrong source address.
>     >
>     > b) If the entry for SNAT timed out it would loop and do unSNAT correctly
>     > in separate zone and then also unDNAT. This is not possible anymore with
>     > a recent change 8c341b9d (northd: Drop packets destined to router owned 
> NAT IP for DGP).
>     > The reply would be dropped before looping after that change co the 
> traffic
>     > would never arrive to the original interface.
>     >
>     > 3) The unDNAT was happening only if the DGP was outport meaning
>     > the reply traffic was routed out, but in the opposite case
>     > the unDNAT was happening only because of the looping which made
>     > outport=inport. That's why it worked before introduction of explicit 
> drop.
>     >
>     > In order to fix all those issues do two changes:
>     >
>     > 1) Include inport in the unDNAT match, so that we have both
>     > routing directions covered e.g. (inport == "dgp_port" || outport == 
> "dpg_port").
>     >
>     > 2) Always use the separate zone for SNAT and DNAT. As the common
>     > zone was needed for HWOL make the common zone optional with
>     > configuration option called "use_common_zone". This option is
>     > by default "false" and can be specified per LR. Use of separate
>     > zones also eliminates the need for the flag propagation
>     > in "lr_out_chk_dnat_local" stage, removing the match on 
> ct_mark/ct_label.
>     >
>     > The "SNAT in separate zone from DNAT" system test is moved to run only
>     > with kernel datapath. The reason is that this test doesn't work with
>     > userspace datapath due to recirculation limit, currently set to 6 [0].
>     >
>     > 
> [0]https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102
>  
> <https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102>
>     > Reported-at: https://bugzilla.redhat.com/2161281 
> <https://bugzilla.redhat.com/2161281>
>     > Signed-off-by: Ales Musil <amu...@redhat.com <mailto:amu...@redhat.com>>
>     > ---
>     > v2: Fix flaky system test.
>     > v3: Rebase on top of current main.
>     > v4: Rebase on top of current main.
>     >     Move the system test to system-ovn-kmod
>     >     to unblock the failures with userspace.
>     > v5: Rebase on top of current main.
>     > v6: Rebase on top of current main.
>     >     Change the config to a global option instead.
>     > v7: Address comments from Dumitru:
>     >     Rename the stateless helper to better reflect
>     >     that it applies only to snat_and_dnat type.
>     >     Call the common zone and separate zone functions
>     >     separately based on condition.
>     > ---
>     >  northd/northd.c          | 536 +++++++++++++++++++++------------------
>     >  northd/ovn-northd.8.xml  |  90 +------
>     >  ovn-nb.xml               |   9 +
>     >  tests/ovn-northd.at <http://ovn-northd.at>      | 213 +++++++++++-----
>     >  tests/ovn.at <http://ovn.at>             |   3 +
>     >  tests/system-ovn-kmod.at <http://system-ovn-kmod.at> | 166 ++++++++++++
>     >  tests/system-ovn.at <http://system-ovn.at>      | 117 ---------
>     >  7 files changed, 634 insertions(+), 500 deletions(-)
>     >
>     > diff --git a/northd/northd.c b/northd/northd.c
>     > index 7a3886de0..1d13c3d8e 100644
>     > --- a/northd/northd.c
>     > +++ b/northd/northd.c
>     > @@ -66,6 +66,9 @@ static bool check_lsp_is_up;
>     > 
>     >  static bool install_ls_lb_from_router;
>     > 
>     > +/* Use common zone for SNAT and DNAT if this option is set to "true". 
> */
>     > +static bool use_common_zone = false;
>     > +
>     >  /* MAC allocated for service monitor usage. Just one mac is allocated
>     >   * for this purpose and ovn-controller's on each chassis will make use
>     >   * of this mac when sending out the packets to monitor the services
>     > @@ -10662,6 +10665,8 @@ build_distr_lrouter_nat_flows_for_lb(struct 
> lrouter_nat_lb_flows_ctx *ctx,
>     >                                       enum lrouter_nat_lb_flow_type 
> type,
>     >                                       struct ovn_datapath *od)
>     >  {
>     > +    struct ovn_port *dgp = od->l3dgw_ports[0];
>     > +
>     >      const char *undnat_action;
>     > 
>     >      switch (type) {
>     > @@ -10673,9 +10678,12 @@ build_distr_lrouter_nat_flows_for_lb(struct 
> lrouter_nat_lb_flows_ctx *ctx,
>     >          break;
>     >      case LROUTER_NAT_LB_FLOW_NORMAL:
>     >      case LROUTER_NAT_LB_FLOW_MAX:
>     > -        undnat_action = od->is_gw_router ? "ct_dnat;" : 
> "ct_dnat_in_czone;";
>     > +        undnat_action = !od->is_gw_router && use_common_zone
>     > +                        ? "ct_dnat_in_czone;"
>     > +                        : "ct_dnat;";
>     >          break;
>     >      }
>     > +
>     >      /* Store the match lengths, so we can reuse the ds buffer. */
>     >      size_t new_match_len = ctx->new_match->length;
>     >      size_t undnat_match_len = ctx->undnat_match->length;
>     > @@ -10702,10 +10710,9 @@ build_distr_lrouter_nat_flows_for_lb(struct 
> lrouter_nat_lb_flows_ctx *ctx,
>     >          return;
>     >      }
>     > 
>     > -    ds_put_format(ctx->undnat_match,
>     > -                  ") && outport == %s && is_chassis_resident(%s)",
>     > -                  od->l3dgw_ports[0]->json_key,
>     > -                  od->l3dgw_ports[0]->cr_port->json_key);
>     > +    ds_put_format(ctx->undnat_match, ") && (inport == %s || outport == 
> %s)"
>     > +                  " && is_chassis_resident(%s)", dgp->json_key, 
> dgp->json_key,
>     > +                  dgp->cr_port->json_key);
>     >      ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT, 120,
>     >                              ds_cstr(ctx->undnat_match), undnat_action,
>     >                              &ctx->lb->nlb->header_);
>     > @@ -11157,6 +11164,14 @@ lrouter_dnat_and_snat_is_stateless(const 
> struct nbrec_nat *nat)
>     >             !strcmp(nat->type, "dnat_and_snat");
>     >  }
>     > 
>     > +static inline bool
>     > +lrouter_use_common_zone_in_nat(const struct ovn_datapath *od,
>     > +                               const struct nbrec_nat *nat)
>     > +{
>     > +    return !od->is_gw_router && use_common_zone &&
>     > +           !lrouter_dnat_and_snat_is_stateless(nat);
>     > +}
>     > +
>     >  /* Handles the match criteria and actions in logical flow
>     >   * based on external ip based NAT rule filter.
>     >   *
>     > @@ -13648,85 +13663,90 @@ build_lrouter_ipv4_ip_input(struct ovn_port 
> *op,
>     >      }
>     >  }
>     > 
>     > +static void
>     > +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows,
>     > +                                      struct ovn_datapath *od,
>     > +                                      const struct nbrec_nat *nat,
>     > +                                      struct ds *match, bool 
> distributed,
>     > +                                      bool is_v6, struct ovn_port 
> *l3dgw_port)
>     > +{
>     > +    if (strcmp(nat->type, "snat") && strcmp(nat->type, 
> "dnat_and_snat")) {
>     > +        return;
>     > +    }
>     > +
>     > +    ds_clear(match);
>     > +
>     > +    struct ds zone_match = DS_EMPTY_INITIALIZER;
>     > +
>     > +    ds_put_format(match, "ip && ip%c.dst == %s && inport == %s",
>     > +                  is_v6 ? '6' : '4', nat->external_ip, 
> l3dgw_port->json_key);
>     > +    ds_clone(&zone_match, match);
> 
>     Not sure how big of a problem it is, but it would be better to avoid
>     cloning.  It's an extra memory allocation and copy.  Instead, it should
>     be possible to save the length, create one flow, truncate, create a
>     second one.  We do that in many other places.
> 
> 
>     > +
>     > +    ds_put_cstr(match, " && flags.loopback == 0");
>     > +
>     > +    /* Update common zone match for the hairpin traffic. */
>     > +    ds_put_cstr(&zone_match, " && flags.loopback == 1"
>     > +                             " && flags.use_snat_zone == 1");
>     > +
>     > +
>     > +    if (!distributed && od->n_l3dgw_ports) {
>     > +        /* Flows for NAT rules that are centralized are only
>     > +         * programmed on the gateway chassis. */
>     > +        ds_put_format(match, " && is_chassis_resident(%s)",
>     > +                      l3dgw_port->cr_port->json_key);
>     > +        ds_put_format(&zone_match, " && is_chassis_resident(%s)",
>     > +                      l3dgw_port->cr_port->json_key);
> 
>     If we add this before checking flags, we'll not need to update both 
> matches,
>     i.e. we can do that before the clone / before saving the length.
> 
> 
> That was my original idea. There is one caveat that it affects a ton of tests.
> Would you agree to have this change as next patch that will target 
> specifically
> this so the tests are easier to review?

AFAICT, only 24 lines needs changing in tests for this.  And you're touching
more than 200 already.  So, I'm not sure if it's worth a separate patch.

>  
> 
> 
>     > +    }
>     > +
>     > +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>     > +                            100, ds_cstr(match), "ct_snat_in_czone;",
>     > +                            &nat->header_);
>     > +
>     > +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>     > +                            100, ds_cstr(&zone_match), "ct_snat;",
>     > +                            &nat->header_);
>     > +
>     > +    ds_destroy(&zone_match);
>     > +}
> 
>     Best regards, Ilya Maximets.
> 
> 
> 
> Thanks,
> Ales
> 
> -- 
> 
> Ales Musil
> 
> Senior Software Engineer - OVN Core
> 
> Red Hat EMEA <https://www.redhat.com>
> 
> amu...@redhat.com <mailto:amu...@redhat.com>    IM: amusil
> 
> <https://red.ht/sig>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to