On 4/19/23 07:32, Ales Musil wrote: > On Tue, Apr 18, 2023 at 11:39 PM Dumitru Ceara <dce...@redhat.com> wrote: > >> On 4/12/23 07:56, Ales Musil wrote: >>> There are essentially three problems with the current >>> combination of DGP + SNAT + LB: >>> >>> 1) The first packet is being SNATed in common zone due >>> to a problem with pinctrl not preserving ct_mark/ct_label. >>> The commit would create a SNAT entry within the same with DNAT >>> entry. >>> >>> 2) The unSNAT for reply always happened in common zone because of >>> the loopback check which would be triggered only when we loop >>> the packet through the LR. Now there are two possibilities how >>> the reply packet would be handled: >>> >>> a) If the entry for SNAT in common zone did not time out yet, the >>> packet would be passed through unSNAT in common zone which would >>> be fine and continue on. However, the unDNAT wouldn't work due to >>> the limitation of CT not capable of doing unSNAT/unDNAT on the same >>> packet twice in the same zone. So the reply would arrive to >>> the original interface, but with wrong source address. >>> >>> b) If the entry for SNAT timed out it would loop and do unSNAT correctly >>> in separate zone and then also unDNAT. This is not possible anymore with >>> a recent change 8c341b9d (northd: Drop packets destined to router owned >> NAT IP for DGP). >>> The reply would be dropped before looping after that change co the >> traffic >>> would never arrive to the original interface. >>> >>> 3) The unDNAT was happening only if the DGP was outport meaning >>> the reply traffic was routed out, but in the opposite case >>> the unDNAT was happening only because of the looping which made >>> outport=inport. That's why it worked before introduction of explicit >> drop. >>> >>> In order to fix all those issues do two changes: >>> >>> 1) Include inport in the unDNAT match, so that we have both >>> routing directions covered e.g. (inport == "dgp_port" || outport == >> "dpg_port"). >>> >>> 2) Always use the separate zone for SNAT and DNAT. As the common >>> zone was needed for HWOL make the common zone optional with >>> configuration option called "use_common_zone". This option is >>> by default "false" and can be specified per LR. Use of separate >>> zones also eliminates the need for the flag propagation >>> in "lr_out_chk_dnat_local" stage, removing the match on ct_mark/ct_label. >>> >>> The "SNAT in separate zone from DNAT" system test is moved to run only >>> with kernel datapath. The reason is that this test doesn't work with >>> userspace datapath due to recirculation limit, currently set to 6 [0]. >>> >>> [0] >> https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102 >>> Reported-at: https://bugzilla.redhat.com/2161281 >>> Signed-off-by: Ales Musil <amu...@redhat.com> >>> --- >>> v2: Fix flaky system test. >>> v3: Rebase on top of current main. >>> v4: Rebase on top of current main. >>> Move the system test to system-ovn-kmod >>> to unblock the failures with userspace. >>> v5: Rebase on top of current main. >>> v6: Rebase on top of current main. >>> Change the config to a global option instead. >>> v7: Address comments from Dumitru: >>> Rename the stateless helper to better reflect >>> that it applies only to snat_and_dnat type. >>> Call the common zone and separate zone functions >>> separately based on condition. >>> v8: Address comments from Ilya and Dumitru. >>> Change the helper for common zone. >>> Avoid ds_clone. >>> v9: Rebase on top of current main. >>> Address comments from Dumitru. >>> Split the function even further for three >>> possible cases: stateless, common zone, separate zone. >>> --- >>> northd/northd.c | 621 +++++++++++++++++++++++---------------- >>> northd/ovn-northd.8.xml | 90 +----- >>> ovn-nb.xml | 9 + >>> tests/ovn-northd.at | 237 ++++++++++----- >>> tests/ovn.at | 3 + >>> tests/system-ovn-kmod.at | 166 +++++++++++ >>> tests/system-ovn.at | 117 -------- >>> 7 files changed, 721 insertions(+), 522 deletions(-) >>> >>> diff --git a/northd/northd.c b/northd/northd.c >>> index da3c8cf77..8b8f411df 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -66,6 +66,9 @@ static bool check_lsp_is_up; >>> >>> static bool install_ls_lb_from_router; >>> >>> +/* Use common zone for SNAT and DNAT if this option is set to "true". */ >>> +static bool use_common_zone = false; >>> + >>> /* MAC allocated for service monitor usage. Just one mac is allocated >>> * for this purpose and ovn-controller's on each chassis will make use >>> * of this mac when sending out the packets to monitor the services >>> @@ -10657,11 +10660,19 @@ struct lrouter_nat_lb_flows_ctx { >>> const struct shash *meter_groups; >>> }; >>> >>> +static inline bool >>> +lrouter_use_common_zone(const struct ovn_datapath *od) >>> +{ >>> + return !od->is_gw_router && use_common_zone; >>> +} >>> + >>> static void >>> build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx >> *ctx, >>> enum lrouter_nat_lb_flow_type type, >>> struct ovn_datapath *od) >>> { >>> + struct ovn_port *dgp = od->l3dgw_ports[0]; >>> + >>> const char *undnat_action; >>> >>> switch (type) { >>> @@ -10673,9 +10684,12 @@ build_distr_lrouter_nat_flows_for_lb(struct >> lrouter_nat_lb_flows_ctx *ctx, >>> break; >>> case LROUTER_NAT_LB_FLOW_NORMAL: >>> case LROUTER_NAT_LB_FLOW_MAX: >>> - undnat_action = od->is_gw_router ? "ct_dnat;" : >> "ct_dnat_in_czone;"; >>> + undnat_action = lrouter_use_common_zone(od) >>> + ? "ct_dnat_in_czone;" >>> + : "ct_dnat;"; >>> break; >>> } >>> + >>> /* Store the match lengths, so we can reuse the ds buffer. */ >>> size_t new_match_len = ctx->new_match->length; >>> size_t undnat_match_len = ctx->undnat_match->length; >>> @@ -10702,10 +10716,9 @@ build_distr_lrouter_nat_flows_for_lb(struct >> lrouter_nat_lb_flows_ctx *ctx, >>> return; >>> } >>> >>> - ds_put_format(ctx->undnat_match, >>> - ") && outport == %s && is_chassis_resident(%s)", >>> - od->l3dgw_ports[0]->json_key, >>> - od->l3dgw_ports[0]->cr_port->json_key); >>> + ds_put_format(ctx->undnat_match, ") && (inport == %s || outport == >> %s)" >>> + " && is_chassis_resident(%s)", dgp->json_key, >> dgp->json_key, >>> + dgp->cr_port->json_key); >>> ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT, 120, >>> ds_cstr(ctx->undnat_match), undnat_action, >>> &ctx->lb->nlb->header_); >>> @@ -13649,84 +13662,101 @@ build_lrouter_ipv4_ip_input(struct ovn_port >> *op, >>> } >>> >>> static void >>> -build_lrouter_in_unsnat_flow(struct hmap *lflows, struct ovn_datapath >> *od, >>> - const struct nbrec_nat *nat, struct ds >> *match, >>> - struct ds *actions, bool distributed, bool >> is_v6, >>> - struct ovn_port *l3dgw_port) >>> +build_lrouter_in_unsnat_match(struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, struct ds >> *match, >>> + bool distributed, bool is_v6, >>> + struct ovn_port *l3dgw_port) >>> { >>> - /* Ingress UNSNAT table: It is for already established connections' >>> - * reverse traffic. i.e., SNAT has already been done in egress >>> - * pipeline and now the packet has entered the ingress pipeline as >>> - * part of a reply. We undo the SNAT here. >>> - * >>> - * Undoing SNAT has to happen before DNAT processing. This is >>> - * because when the packet was DNATed in ingress pipeline, it did >>> - * not know about the possibility of eventual additional SNAT in >>> - * egress pipeline. */ >>> - if (strcmp(nat->type, "snat") && strcmp(nat->type, >> "dnat_and_snat")) { >>> - return; >>> - } >>> + ds_clear(match); >>> >>> - bool stateless = lrouter_dnat_and_snat_is_stateless(nat); >>> - if (od->is_gw_router) { >>> - ds_clear(match); >>> - ds_clear(actions); >>> - ds_put_format(match, "ip && ip%s.dst == %s", >>> - is_v6 ? "6" : "4", nat->external_ip); >>> - if (stateless) { >>> - ds_put_format(actions, "next;"); >>> - } else { >>> - ds_put_cstr(actions, "ct_snat;"); >>> - } >>> + ds_put_format(match, "ip && ip%c.dst == %s", >>> + is_v6 ? '6' : '4', nat->external_ip); >>> >>> - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, >>> - 90, ds_cstr(match), ds_cstr(actions), >>> - &nat->header_); >>> - } else { >>> + if (!od->is_gw_router) { >>> /* Distributed router. */ >>> >>> /* Traffic received on l3dgw_port is subject to NAT. */ >>> - ds_clear(match); >>> - ds_clear(actions); >>> - ds_put_format(match, "ip && ip%s.dst == %s && inport == %s && " >>> - "flags.loopback == 0", is_v6 ? "6" : "4", >>> - nat->external_ip, l3dgw_port->json_key); >>> + ds_put_format(match, " && inport == %s", l3dgw_port->json_key); >>> + >>> if (!distributed && od->n_l3dgw_ports) { >>> /* Flows for NAT rules that are centralized are only >>> - * programmed on the gateway chassis. */ >>> + * programmed on the gateway chassis. */ >>> ds_put_format(match, " && is_chassis_resident(%s)", >>> l3dgw_port->cr_port->json_key); >>> } >>> + } >>> +} >>> >>> - if (stateless) { >>> - ds_put_format(actions, "next;"); >>> - } else { >>> - ds_put_cstr(actions, "ct_snat_in_czone;"); >>> - } >>> +static void >>> +build_lrouter_in_unsnat_stateless_flow(struct hmap *lflows, >>> + struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, >>> + struct ds *match, >>> + bool distributed, bool is_v6, >>> + struct ovn_port *l3dgw_port) >>> +{ >>> + if (strcmp(nat->type, "snat") && strcmp(nat->type, >> "dnat_and_snat")) { >>> + return; >>> + } >>> >>> - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, >>> - 100, ds_cstr(match), ds_cstr(actions), >>> - &nat->header_); >>> + uint16_t priority = od->is_gw_router ? 90 : 100; >>> >>> - if (!stateless) { >>> - ds_clear(match); >>> - ds_clear(actions); >>> - ds_put_format(match, "ip && ip%s.dst == %s && inport == %s >> && " >>> - "flags.loopback == 1 && flags.use_snat_zone >> == 1", >>> - is_v6 ? "6" : "4", nat->external_ip, >>> - l3dgw_port->json_key); >>> - if (!distributed && od->n_l3dgw_ports) { >>> - /* Flows for NAT rules that are centralized are only >>> - * programmed on the gateway chassis. */ >>> - ds_put_format(match, " && is_chassis_resident(%s)", >>> - l3dgw_port->cr_port->json_key); >>> - } >>> - ds_put_cstr(actions, "ct_snat;"); >>> - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, >>> - 100, ds_cstr(match), >> ds_cstr(actions), >>> - &nat->header_); >>> - } >>> + build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6, >>> + l3dgw_port); >>> + >>> + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, >>> + priority, ds_cstr(match), "next;", >>> + &nat->header_); >>> +} >>> + >>> +static void >>> +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows, >>> + struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, >>> + struct ds *match, bool >> distributed, >>> + bool is_v6, struct ovn_port >> *l3dgw_port) >>> +{ >>> + if (strcmp(nat->type, "snat") && strcmp(nat->type, >> "dnat_and_snat")) { >>> + return; >>> } >>> + >>> + build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6, >>> + l3dgw_port); >>> + >>> + ds_put_cstr(match, " && flags.loopback == 0"); >>> + >>> + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, >>> + 100, ds_cstr(match), "ct_snat_in_czone;", >>> + &nat->header_); >>> + >>> + ds_chomp(match, '0'); >> >> I missed this in v8. I think this obfuscates a bit the code. Would you >> be ok with the following incremental (I can squash it in when applying >> the patch)? >> > > Sure. >
Thanks a lot for the effort of fixing this, applied to main! _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev