On 4/19/23 07:32, Ales Musil wrote:
> On Tue, Apr 18, 2023 at 11:39 PM Dumitru Ceara <dce...@redhat.com> wrote:
> 
>> On 4/12/23 07:56, Ales Musil wrote:
>>> There are essentially three problems with the current
>>> combination of DGP + SNAT + LB:
>>>
>>> 1) The first packet is being SNATed in common zone due
>>> to a problem with pinctrl not preserving ct_mark/ct_label.
>>> The commit would create a SNAT entry within the same with DNAT
>>> entry.
>>>
>>> 2) The unSNAT for reply always happened in common zone because of
>>> the loopback check which would be triggered only when we loop
>>> the packet through the LR. Now there are two possibilities how
>>> the reply packet would be handled:
>>>
>>> a) If the entry for SNAT in common zone did not time out yet, the
>>> packet would be passed through unSNAT in common zone which would
>>> be fine and continue on. However, the unDNAT wouldn't work due to
>>> the limitation of CT not capable of doing unSNAT/unDNAT on the same
>>> packet twice in the same zone. So the reply would arrive to
>>> the original interface, but with wrong source address.
>>>
>>> b) If the entry for SNAT timed out it would loop and do unSNAT correctly
>>> in separate zone and then also unDNAT. This is not possible anymore with
>>> a recent change 8c341b9d (northd: Drop packets destined to router owned
>> NAT IP for DGP).
>>> The reply would be dropped before looping after that change co the
>> traffic
>>> would never arrive to the original interface.
>>>
>>> 3) The unDNAT was happening only if the DGP was outport meaning
>>> the reply traffic was routed out, but in the opposite case
>>> the unDNAT was happening only because of the looping which made
>>> outport=inport. That's why it worked before introduction of explicit
>> drop.
>>>
>>> In order to fix all those issues do two changes:
>>>
>>> 1) Include inport in the unDNAT match, so that we have both
>>> routing directions covered e.g. (inport == "dgp_port" || outport ==
>> "dpg_port").
>>>
>>> 2) Always use the separate zone for SNAT and DNAT. As the common
>>> zone was needed for HWOL make the common zone optional with
>>> configuration option called "use_common_zone". This option is
>>> by default "false" and can be specified per LR. Use of separate
>>> zones also eliminates the need for the flag propagation
>>> in "lr_out_chk_dnat_local" stage, removing the match on ct_mark/ct_label.
>>>
>>> The "SNAT in separate zone from DNAT" system test is moved to run only
>>> with kernel datapath. The reason is that this test doesn't work with
>>> userspace datapath due to recirculation limit, currently set to 6 [0].
>>>
>>> [0]
>> https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102
>>> Reported-at: https://bugzilla.redhat.com/2161281
>>> Signed-off-by: Ales Musil <amu...@redhat.com>
>>> ---
>>> v2: Fix flaky system test.
>>> v3: Rebase on top of current main.
>>> v4: Rebase on top of current main.
>>>     Move the system test to system-ovn-kmod
>>>     to unblock the failures with userspace.
>>> v5: Rebase on top of current main.
>>> v6: Rebase on top of current main.
>>>     Change the config to a global option instead.
>>> v7: Address comments from Dumitru:
>>>     Rename the stateless helper to better reflect
>>>     that it applies only to snat_and_dnat type.
>>>     Call the common zone and separate zone functions
>>>     separately based on condition.
>>> v8: Address comments from Ilya and Dumitru.
>>>     Change the helper for common zone.
>>>     Avoid ds_clone.
>>> v9: Rebase on top of current main.
>>>     Address comments from Dumitru.
>>>     Split the function even further for three
>>>     possible cases: stateless, common zone, separate zone.
>>> ---
>>>  northd/northd.c          | 621 +++++++++++++++++++++++----------------
>>>  northd/ovn-northd.8.xml  |  90 +-----
>>>  ovn-nb.xml               |   9 +
>>>  tests/ovn-northd.at      | 237 ++++++++++-----
>>>  tests/ovn.at             |   3 +
>>>  tests/system-ovn-kmod.at | 166 +++++++++++
>>>  tests/system-ovn.at      | 117 --------
>>>  7 files changed, 721 insertions(+), 522 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index da3c8cf77..8b8f411df 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -66,6 +66,9 @@ static bool check_lsp_is_up;
>>>
>>>  static bool install_ls_lb_from_router;
>>>
>>> +/* Use common zone for SNAT and DNAT if this option is set to "true". */
>>> +static bool use_common_zone = false;
>>> +
>>>  /* MAC allocated for service monitor usage. Just one mac is allocated
>>>   * for this purpose and ovn-controller's on each chassis will make use
>>>   * of this mac when sending out the packets to monitor the services
>>> @@ -10657,11 +10660,19 @@ struct lrouter_nat_lb_flows_ctx {
>>>      const struct shash *meter_groups;
>>>  };
>>>
>>> +static inline bool
>>> +lrouter_use_common_zone(const struct ovn_datapath *od)
>>> +{
>>> +    return !od->is_gw_router && use_common_zone;
>>> +}
>>> +
>>>  static void
>>>  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx
>> *ctx,
>>>                                       enum lrouter_nat_lb_flow_type type,
>>>                                       struct ovn_datapath *od)
>>>  {
>>> +    struct ovn_port *dgp = od->l3dgw_ports[0];
>>> +
>>>      const char *undnat_action;
>>>
>>>      switch (type) {
>>> @@ -10673,9 +10684,12 @@ build_distr_lrouter_nat_flows_for_lb(struct
>> lrouter_nat_lb_flows_ctx *ctx,
>>>          break;
>>>      case LROUTER_NAT_LB_FLOW_NORMAL:
>>>      case LROUTER_NAT_LB_FLOW_MAX:
>>> -        undnat_action = od->is_gw_router ? "ct_dnat;" :
>> "ct_dnat_in_czone;";
>>> +        undnat_action = lrouter_use_common_zone(od)
>>> +                        ? "ct_dnat_in_czone;"
>>> +                        : "ct_dnat;";
>>>          break;
>>>      }
>>> +
>>>      /* Store the match lengths, so we can reuse the ds buffer. */
>>>      size_t new_match_len = ctx->new_match->length;
>>>      size_t undnat_match_len = ctx->undnat_match->length;
>>> @@ -10702,10 +10716,9 @@ build_distr_lrouter_nat_flows_for_lb(struct
>> lrouter_nat_lb_flows_ctx *ctx,
>>>          return;
>>>      }
>>>
>>> -    ds_put_format(ctx->undnat_match,
>>> -                  ") && outport == %s && is_chassis_resident(%s)",
>>> -                  od->l3dgw_ports[0]->json_key,
>>> -                  od->l3dgw_ports[0]->cr_port->json_key);
>>> +    ds_put_format(ctx->undnat_match, ") && (inport == %s || outport ==
>> %s)"
>>> +                  " && is_chassis_resident(%s)", dgp->json_key,
>> dgp->json_key,
>>> +                  dgp->cr_port->json_key);
>>>      ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT, 120,
>>>                              ds_cstr(ctx->undnat_match), undnat_action,
>>>                              &ctx->lb->nlb->header_);
>>> @@ -13649,84 +13662,101 @@ build_lrouter_ipv4_ip_input(struct ovn_port
>> *op,
>>>  }
>>>
>>>  static void
>>> -build_lrouter_in_unsnat_flow(struct hmap *lflows, struct ovn_datapath
>> *od,
>>> -                             const struct nbrec_nat *nat, struct ds
>> *match,
>>> -                             struct ds *actions, bool distributed, bool
>> is_v6,
>>> -                             struct ovn_port *l3dgw_port)
>>> +build_lrouter_in_unsnat_match(struct ovn_datapath *od,
>>> +                              const struct nbrec_nat *nat, struct ds
>> *match,
>>> +                              bool distributed, bool is_v6,
>>> +                              struct ovn_port *l3dgw_port)
>>>  {
>>> -    /* Ingress UNSNAT table: It is for already established connections'
>>> -    * reverse traffic. i.e., SNAT has already been done in egress
>>> -    * pipeline and now the packet has entered the ingress pipeline as
>>> -    * part of a reply. We undo the SNAT here.
>>> -    *
>>> -    * Undoing SNAT has to happen before DNAT processing.  This is
>>> -    * because when the packet was DNATed in ingress pipeline, it did
>>> -    * not know about the possibility of eventual additional SNAT in
>>> -    * egress pipeline. */
>>> -    if (strcmp(nat->type, "snat") && strcmp(nat->type,
>> "dnat_and_snat")) {
>>> -        return;
>>> -    }
>>> +    ds_clear(match);
>>>
>>> -    bool stateless = lrouter_dnat_and_snat_is_stateless(nat);
>>> -    if (od->is_gw_router) {
>>> -        ds_clear(match);
>>> -        ds_clear(actions);
>>> -        ds_put_format(match, "ip && ip%s.dst == %s",
>>> -                      is_v6 ? "6" : "4", nat->external_ip);
>>> -        if (stateless) {
>>> -            ds_put_format(actions, "next;");
>>> -        } else {
>>> -            ds_put_cstr(actions, "ct_snat;");
>>> -        }
>>> +    ds_put_format(match, "ip && ip%c.dst == %s",
>>> +                  is_v6 ? '6' : '4', nat->external_ip);
>>>
>>> -        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>>> -                                90, ds_cstr(match), ds_cstr(actions),
>>> -                                &nat->header_);
>>> -    } else {
>>> +    if (!od->is_gw_router) {
>>>          /* Distributed router. */
>>>
>>>          /* Traffic received on l3dgw_port is subject to NAT. */
>>> -        ds_clear(match);
>>> -        ds_clear(actions);
>>> -        ds_put_format(match, "ip && ip%s.dst == %s && inport == %s && "
>>> -                      "flags.loopback == 0", is_v6 ? "6" : "4",
>>> -                      nat->external_ip, l3dgw_port->json_key);
>>> +        ds_put_format(match, " && inport == %s", l3dgw_port->json_key);
>>> +
>>>          if (!distributed && od->n_l3dgw_ports) {
>>>              /* Flows for NAT rules that are centralized are only
>>> -            * programmed on the gateway chassis. */
>>> +             * programmed on the gateway chassis. */
>>>              ds_put_format(match, " && is_chassis_resident(%s)",
>>>                            l3dgw_port->cr_port->json_key);
>>>          }
>>> +    }
>>> +}
>>>
>>> -        if (stateless) {
>>> -            ds_put_format(actions, "next;");
>>> -        } else {
>>> -            ds_put_cstr(actions, "ct_snat_in_czone;");
>>> -        }
>>> +static void
>>> +build_lrouter_in_unsnat_stateless_flow(struct hmap *lflows,
>>> +                                       struct ovn_datapath *od,
>>> +                                       const struct nbrec_nat *nat,
>>> +                                       struct ds *match,
>>> +                                       bool distributed, bool is_v6,
>>> +                                       struct ovn_port *l3dgw_port)
>>> +{
>>> +    if (strcmp(nat->type, "snat") && strcmp(nat->type,
>> "dnat_and_snat")) {
>>> +        return;
>>> +    }
>>>
>>> -        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>>> -                                100, ds_cstr(match), ds_cstr(actions),
>>> -                                &nat->header_);
>>> +    uint16_t priority = od->is_gw_router ? 90 : 100;
>>>
>>> -        if (!stateless) {
>>> -            ds_clear(match);
>>> -            ds_clear(actions);
>>> -            ds_put_format(match, "ip && ip%s.dst == %s && inport == %s
>> && "
>>> -                          "flags.loopback == 1 && flags.use_snat_zone
>> == 1",
>>> -                          is_v6 ? "6" : "4", nat->external_ip,
>>> -                          l3dgw_port->json_key);
>>> -            if (!distributed && od->n_l3dgw_ports) {
>>> -                /* Flows for NAT rules that are centralized are only
>>> -                * programmed on the gateway chassis. */
>>> -                ds_put_format(match, " && is_chassis_resident(%s)",
>>> -                            l3dgw_port->cr_port->json_key);
>>> -            }
>>> -            ds_put_cstr(actions, "ct_snat;");
>>> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>>> -                                    100, ds_cstr(match),
>> ds_cstr(actions),
>>> -                                    &nat->header_);
>>> -        }
>>> +    build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6,
>>> +                                  l3dgw_port);
>>> +
>>> +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>>> +                            priority, ds_cstr(match), "next;",
>>> +                            &nat->header_);
>>> +}
>>> +
>>> +static void
>>> +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows,
>>> +                                      struct ovn_datapath *od,
>>> +                                      const struct nbrec_nat *nat,
>>> +                                      struct ds *match, bool
>> distributed,
>>> +                                      bool is_v6, struct ovn_port
>> *l3dgw_port)
>>> +{
>>> +    if (strcmp(nat->type, "snat") && strcmp(nat->type,
>> "dnat_and_snat")) {
>>> +        return;
>>>      }
>>> +
>>> +    build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6,
>>> +                                  l3dgw_port);
>>> +
>>> +    ds_put_cstr(match, " && flags.loopback == 0");
>>> +
>>> +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
>>> +                            100, ds_cstr(match), "ct_snat_in_czone;",
>>> +                            &nat->header_);
>>> +
>>> +    ds_chomp(match, '0');
>>
>> I missed this in v8.  I think this obfuscates a bit the code.  Would you
>> be ok with the following incremental (I can squash it in when applying
>> the patch)?
>>
> 
> Sure.
> 

Thanks a lot for the effort of fixing this, applied to main!

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to