On Tue, Apr 18, 2023 at 11:39 PM Dumitru Ceara <dce...@redhat.com> wrote:
> On 4/12/23 07:56, Ales Musil wrote: > > There are essentially three problems with the current > > combination of DGP + SNAT + LB: > > > > 1) The first packet is being SNATed in common zone due > > to a problem with pinctrl not preserving ct_mark/ct_label. > > The commit would create a SNAT entry within the same with DNAT > > entry. > > > > 2) The unSNAT for reply always happened in common zone because of > > the loopback check which would be triggered only when we loop > > the packet through the LR. Now there are two possibilities how > > the reply packet would be handled: > > > > a) If the entry for SNAT in common zone did not time out yet, the > > packet would be passed through unSNAT in common zone which would > > be fine and continue on. However, the unDNAT wouldn't work due to > > the limitation of CT not capable of doing unSNAT/unDNAT on the same > > packet twice in the same zone. So the reply would arrive to > > the original interface, but with wrong source address. > > > > b) If the entry for SNAT timed out it would loop and do unSNAT correctly > > in separate zone and then also unDNAT. This is not possible anymore with > > a recent change 8c341b9d (northd: Drop packets destined to router owned > NAT IP for DGP). > > The reply would be dropped before looping after that change co the > traffic > > would never arrive to the original interface. > > > > 3) The unDNAT was happening only if the DGP was outport meaning > > the reply traffic was routed out, but in the opposite case > > the unDNAT was happening only because of the looping which made > > outport=inport. That's why it worked before introduction of explicit > drop. > > > > In order to fix all those issues do two changes: > > > > 1) Include inport in the unDNAT match, so that we have both > > routing directions covered e.g. (inport == "dgp_port" || outport == > "dpg_port"). > > > > 2) Always use the separate zone for SNAT and DNAT. As the common > > zone was needed for HWOL make the common zone optional with > > configuration option called "use_common_zone". This option is > > by default "false" and can be specified per LR. Use of separate > > zones also eliminates the need for the flag propagation > > in "lr_out_chk_dnat_local" stage, removing the match on ct_mark/ct_label. > > > > The "SNAT in separate zone from DNAT" system test is moved to run only > > with kernel datapath. The reason is that this test doesn't work with > > userspace datapath due to recirculation limit, currently set to 6 [0]. > > > > [0] > https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102 > > Reported-at: https://bugzilla.redhat.com/2161281 > > Signed-off-by: Ales Musil <amu...@redhat.com> > > --- > > v2: Fix flaky system test. > > v3: Rebase on top of current main. > > v4: Rebase on top of current main. > > Move the system test to system-ovn-kmod > > to unblock the failures with userspace. > > v5: Rebase on top of current main. > > v6: Rebase on top of current main. > > Change the config to a global option instead. > > v7: Address comments from Dumitru: > > Rename the stateless helper to better reflect > > that it applies only to snat_and_dnat type. > > Call the common zone and separate zone functions > > separately based on condition. > > v8: Address comments from Ilya and Dumitru. > > Change the helper for common zone. > > Avoid ds_clone. > > v9: Rebase on top of current main. > > Address comments from Dumitru. > > Split the function even further for three > > possible cases: stateless, common zone, separate zone. > > --- > > northd/northd.c | 621 +++++++++++++++++++++++---------------- > > northd/ovn-northd.8.xml | 90 +----- > > ovn-nb.xml | 9 + > > tests/ovn-northd.at | 237 ++++++++++----- > > tests/ovn.at | 3 + > > tests/system-ovn-kmod.at | 166 +++++++++++ > > tests/system-ovn.at | 117 -------- > > 7 files changed, 721 insertions(+), 522 deletions(-) > > > > diff --git a/northd/northd.c b/northd/northd.c > > index da3c8cf77..8b8f411df 100644 > > --- a/northd/northd.c > > +++ b/northd/northd.c > > @@ -66,6 +66,9 @@ static bool check_lsp_is_up; > > > > static bool install_ls_lb_from_router; > > > > +/* Use common zone for SNAT and DNAT if this option is set to "true". */ > > +static bool use_common_zone = false; > > + > > /* MAC allocated for service monitor usage. Just one mac is allocated > > * for this purpose and ovn-controller's on each chassis will make use > > * of this mac when sending out the packets to monitor the services > > @@ -10657,11 +10660,19 @@ struct lrouter_nat_lb_flows_ctx { > > const struct shash *meter_groups; > > }; > > > > +static inline bool > > +lrouter_use_common_zone(const struct ovn_datapath *od) > > +{ > > + return !od->is_gw_router && use_common_zone; > > +} > > + > > static void > > build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx > *ctx, > > enum lrouter_nat_lb_flow_type type, > > struct ovn_datapath *od) > > { > > + struct ovn_port *dgp = od->l3dgw_ports[0]; > > + > > const char *undnat_action; > > > > switch (type) { > > @@ -10673,9 +10684,12 @@ build_distr_lrouter_nat_flows_for_lb(struct > lrouter_nat_lb_flows_ctx *ctx, > > break; > > case LROUTER_NAT_LB_FLOW_NORMAL: > > case LROUTER_NAT_LB_FLOW_MAX: > > - undnat_action = od->is_gw_router ? "ct_dnat;" : > "ct_dnat_in_czone;"; > > + undnat_action = lrouter_use_common_zone(od) > > + ? "ct_dnat_in_czone;" > > + : "ct_dnat;"; > > break; > > } > > + > > /* Store the match lengths, so we can reuse the ds buffer. */ > > size_t new_match_len = ctx->new_match->length; > > size_t undnat_match_len = ctx->undnat_match->length; > > @@ -10702,10 +10716,9 @@ build_distr_lrouter_nat_flows_for_lb(struct > lrouter_nat_lb_flows_ctx *ctx, > > return; > > } > > > > - ds_put_format(ctx->undnat_match, > > - ") && outport == %s && is_chassis_resident(%s)", > > - od->l3dgw_ports[0]->json_key, > > - od->l3dgw_ports[0]->cr_port->json_key); > > + ds_put_format(ctx->undnat_match, ") && (inport == %s || outport == > %s)" > > + " && is_chassis_resident(%s)", dgp->json_key, > dgp->json_key, > > + dgp->cr_port->json_key); > > ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT, 120, > > ds_cstr(ctx->undnat_match), undnat_action, > > &ctx->lb->nlb->header_); > > @@ -13649,84 +13662,101 @@ build_lrouter_ipv4_ip_input(struct ovn_port > *op, > > } > > > > static void > > -build_lrouter_in_unsnat_flow(struct hmap *lflows, struct ovn_datapath > *od, > > - const struct nbrec_nat *nat, struct ds > *match, > > - struct ds *actions, bool distributed, bool > is_v6, > > - struct ovn_port *l3dgw_port) > > +build_lrouter_in_unsnat_match(struct ovn_datapath *od, > > + const struct nbrec_nat *nat, struct ds > *match, > > + bool distributed, bool is_v6, > > + struct ovn_port *l3dgw_port) > > { > > - /* Ingress UNSNAT table: It is for already established connections' > > - * reverse traffic. i.e., SNAT has already been done in egress > > - * pipeline and now the packet has entered the ingress pipeline as > > - * part of a reply. We undo the SNAT here. > > - * > > - * Undoing SNAT has to happen before DNAT processing. This is > > - * because when the packet was DNATed in ingress pipeline, it did > > - * not know about the possibility of eventual additional SNAT in > > - * egress pipeline. */ > > - if (strcmp(nat->type, "snat") && strcmp(nat->type, > "dnat_and_snat")) { > > - return; > > - } > > + ds_clear(match); > > > > - bool stateless = lrouter_dnat_and_snat_is_stateless(nat); > > - if (od->is_gw_router) { > > - ds_clear(match); > > - ds_clear(actions); > > - ds_put_format(match, "ip && ip%s.dst == %s", > > - is_v6 ? "6" : "4", nat->external_ip); > > - if (stateless) { > > - ds_put_format(actions, "next;"); > > - } else { > > - ds_put_cstr(actions, "ct_snat;"); > > - } > > + ds_put_format(match, "ip && ip%c.dst == %s", > > + is_v6 ? '6' : '4', nat->external_ip); > > > > - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > > - 90, ds_cstr(match), ds_cstr(actions), > > - &nat->header_); > > - } else { > > + if (!od->is_gw_router) { > > /* Distributed router. */ > > > > /* Traffic received on l3dgw_port is subject to NAT. */ > > - ds_clear(match); > > - ds_clear(actions); > > - ds_put_format(match, "ip && ip%s.dst == %s && inport == %s && " > > - "flags.loopback == 0", is_v6 ? "6" : "4", > > - nat->external_ip, l3dgw_port->json_key); > > + ds_put_format(match, " && inport == %s", l3dgw_port->json_key); > > + > > if (!distributed && od->n_l3dgw_ports) { > > /* Flows for NAT rules that are centralized are only > > - * programmed on the gateway chassis. */ > > + * programmed on the gateway chassis. */ > > ds_put_format(match, " && is_chassis_resident(%s)", > > l3dgw_port->cr_port->json_key); > > } > > + } > > +} > > > > - if (stateless) { > > - ds_put_format(actions, "next;"); > > - } else { > > - ds_put_cstr(actions, "ct_snat_in_czone;"); > > - } > > +static void > > +build_lrouter_in_unsnat_stateless_flow(struct hmap *lflows, > > + struct ovn_datapath *od, > > + const struct nbrec_nat *nat, > > + struct ds *match, > > + bool distributed, bool is_v6, > > + struct ovn_port *l3dgw_port) > > +{ > > + if (strcmp(nat->type, "snat") && strcmp(nat->type, > "dnat_and_snat")) { > > + return; > > + } > > > > - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > > - 100, ds_cstr(match), ds_cstr(actions), > > - &nat->header_); > > + uint16_t priority = od->is_gw_router ? 90 : 100; > > > > - if (!stateless) { > > - ds_clear(match); > > - ds_clear(actions); > > - ds_put_format(match, "ip && ip%s.dst == %s && inport == %s > && " > > - "flags.loopback == 1 && flags.use_snat_zone > == 1", > > - is_v6 ? "6" : "4", nat->external_ip, > > - l3dgw_port->json_key); > > - if (!distributed && od->n_l3dgw_ports) { > > - /* Flows for NAT rules that are centralized are only > > - * programmed on the gateway chassis. */ > > - ds_put_format(match, " && is_chassis_resident(%s)", > > - l3dgw_port->cr_port->json_key); > > - } > > - ds_put_cstr(actions, "ct_snat;"); > > - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > > - 100, ds_cstr(match), > ds_cstr(actions), > > - &nat->header_); > > - } > > + build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6, > > + l3dgw_port); > > + > > + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > > + priority, ds_cstr(match), "next;", > > + &nat->header_); > > +} > > + > > +static void > > +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows, > > + struct ovn_datapath *od, > > + const struct nbrec_nat *nat, > > + struct ds *match, bool > distributed, > > + bool is_v6, struct ovn_port > *l3dgw_port) > > +{ > > + if (strcmp(nat->type, "snat") && strcmp(nat->type, > "dnat_and_snat")) { > > + return; > > } > > + > > + build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6, > > + l3dgw_port); > > + > > + ds_put_cstr(match, " && flags.loopback == 0"); > > + > > + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > > + 100, ds_cstr(match), "ct_snat_in_czone;", > > + &nat->header_); > > + > > + ds_chomp(match, '0'); > > I missed this in v8. I think this obfuscates a bit the code. Would you > be ok with the following incremental (I can squash it in when applying > the patch)? > Sure. Thanks, Ales > > diff --git a/northd/northd.c b/northd/northd.c > index 8d46358338..3854a2852f 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -13740,15 +13740,20 @@ build_lrouter_in_unsnat_in_czone_flow(struct > hmap *lflows, > build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6, > l3dgw_port); > > - ds_put_cstr(match, " && flags.loopback == 0"); > + /* We're adding two flows: one matching on "M1 && flags.loopback == > 0" and > + * the second one matching on "M && flags.loopback == 1 && M2". > + * Reuse the common part of the match string. > + */ > + size_t common_match_len = match->length; > > + ds_put_cstr(match, " && flags.loopback == 0"); > ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > 100, ds_cstr(match), "ct_snat_in_czone;", > &nat->header_); > > - ds_chomp(match, '0'); > + ds_truncate(match, common_match_len); > /* Update common zone match for the hairpin traffic. */ > - ds_put_cstr(match, "1 && flags.use_snat_zone == 1"); > + ds_put_cstr(match, " && flags.loopback == 1 && flags.use_snat_zone == > 1"); > > ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, > 100, ds_cstr(match), "ct_snat;", > --- > > The rest looks OK to me. > > Regards, > Dumitru > > -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> amu...@redhat.com IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev