On Fri, Mar 22, 2024 at 7:51 AM Roberto Bartzen Acosta <roberto.aco...@luizalabs.com> wrote:
>
> Hi Mark,
>
> Thanks for your feedback.
>
> On Mon, Mar 18, 2024 at 11:53 AM Mark Michelson <mmich...@redhat.com>
> wrote:
>
> > Hi Roberto,
> >
> > I have some concerns about this patch. Let's use the test case you added
> > as an example network. Let's bind the vms and DGPs to hypervisors:
> >
> > * vm1 and lr1-ts1 are bound to hypervisor hv1
> > * vm2 and lr1-ts2 are bound to hypervisor hv2
> >
> > Now imagine a packet arrives on lr1-ts1. The packet gets load balanced
> > and sent to vm2. In this case, since lr1-ts1 is on hv1, the ct_lb_mark()
> > action runs there, creating a conntrack entry on hv1. However, the
> > packet eventually is tunneled to hv2 so that it can be output to vm2.
> >
> > Now vm2 replies to the packet. Is there anything that ensures that the
> > reply from vm2 gets sent to hv1 for the egress pipeline of lr1? If so,
> > then this should work fine since the packet will be unDNATted as
> > expected on hv1. However, if the packet runs all of lr1's pipelines on
> > hv2, then there is no conntrack entry present, and the attempt to unDNAT
> > will not succeed. The packet will either be dropped because of invalid
> > CT, or the packet will have an incorrect source IP and port. Either way,
> > this isn't what is desired.
>
> Yep, you're right! That makes sense.
> If the packet comes back via a chassis that does not have the LB
> conntrack entry corresponding to the initial request, this will trigger
> a miss in the pipeline.
>
> I tried to disable CT tracking for the LB by configuring the parameter
> on the router, but I still wasn't successful. E.g.:
> options:lb_force_snat_ip="172.16.200.201"
>
> Even changing lb_force_snat_ip on the router, or creating a "stateless"
> SNAT rule, I still see this action in the output pipeline in the
> highest-priority table (table=1).
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-01" && is_chassis_resident("cr-incus-net40-lr-lrp-01")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-02" && is_chassis_resident("cr-incus-net40-lr-lrp-02")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-03" && is_chassis_resident("cr-incus-net40-lr-lrp-03")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-04" && is_chassis_resident("cr-incus-net40-lr-lrp-04")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-ext" && is_chassis_resident("cr-incus-net40-lr-lrp-ext")), action=(ct_dnat_in_czone;)
>
> Considering that the return traffic when using multiple DGPs is probably
> spread by ECMP, any idea on how to change the rules so that the LB output
> rules are 'stateless' (not CT dependent) in this case with multiple DGPs?
> I imagine this would solve the problem and guarantee a correct return via
> any chassis.
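The per-chassis conntrack miss Mark describes can be modeled in a few lines. This is a toy illustration, not OVN code: the ConnTrack class, the hypervisor objects, and all addresses are invented for the example, but it shows why unDNAT fails when the reply egresses via a chassis that never saw the original DNAT:

```python
# Toy model of per-hypervisor conntrack (NOT OVN code; all names and
# addresses here are hypothetical, for illustration only).

class ConnTrack:
    """One conntrack table per hypervisor: maps the expected reply
    tuple back to the original VIP so unDNAT can restore it."""
    def __init__(self):
        self.entries = {}

    def dnat(self, src, vip, backend):
        # Forward direction (ct_lb_mark): record state so the reply
        # from this backend to this client can later be unDNATted.
        self.entries[(backend, src)] = vip
        return (src, backend)

    def undnat(self, src, dst):
        # Reverse direction (lr_out_undnat): only works if dnat()
        # ran on this same hypervisor.
        vip = self.entries.get((src, dst))
        if vip is None:
            return None  # CT miss: reply leaves with the backend's IP
        return (vip, dst)

hv1, hv2 = ConnTrack(), ConnTrack()

# Request arrives on hv1 (where lr1-ts1 resides): DNAT VIP -> vm2 there.
client, vip, vm2 = "198.51.100.7", "30.0.0.1", "172.16.0.102"
hv1.dnat(client, vip, vm2)

# Reply from vm2 is only restored correctly if it egresses through hv1.
assert hv1.undnat(vm2, client) == (vip, client)  # unDNAT succeeds
assert hv2.undnat(vm2, client) is None           # CT miss on hv2
```

The asymmetry is exactly the ECMP-return concern above: with multiple DGPs, the reply may legitimately pick a chassis whose conntrack table is empty for that connection.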
I think the only way to solve your problem is to add NAT support for a router having multiple DGPs. I'm not sure how easy that is or if it is even possible to support. We need to explore this. Numan > > Thanks, > Roberto > > PS: I have a complex load balancer scenario for the use case with multiple > DPGs, and 'internal' VIP addresses + OVN interconnect. I can explain the > general context if you are interested :) > > > > On 2/19/24 16:33, Roberto Bartzen Acosta wrote: > > > This commit fixes the build_distr_lrouter_nat_flows_for_lb function to > > > include one NAT flow entry for each DGP in use. Since we have added > > support > > > to create multiple gateway ports per logical router, it's necessary to > > > include in the LR nat rules pipeline a specific entry for each attached > > > DGP. Otherwise, the ingress traffic is only redirected when the incoming > > > LRP matches the chassis_resident field. > > > > > > Considering that DNAT rules for DGPs were implemented with the need to > > > configure the DGP-related gateway-port column, the load-balancer NAT rule > > > configuration can use a similar idea. In this case, we don't know the LRP > > > responsible for the incoming traffic, and therefore we must apply the > > > load-balancer automatically created NAT rule in all DGPs to allow the > > > incoming traffic. 
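The expansion the commit message describes (one NAT flow per DGP rather than pinning to l3dgw_ports[0]) can be sketched as follows. This is an illustrative Python model, not northd code; the port names, VIP, and backends are invented, and the match/action strings merely mimic the lr_in_dnat flows shown in the test below:

```python
# Sketch of the per-DGP NAT-flow expansion: emit one lr_in_dnat flow
# per distributed gateway port, each gated on is_chassis_resident()
# for that DGP's chassis-redirect port. Names are hypothetical.

def build_dnat_flows(dgps, vip, backends):
    flows = []
    for dgp in dgps:
        match = (f"ct.new && !ct.rel && ip4 && ip4.dst == {vip} && "
                 f'is_chassis_resident("cr-{dgp}")')
        action = f"ct_lb_mark(backends={','.join(backends)});"
        flows.append((match, action))
    return flows

flows = build_dnat_flows(
    ["lr1-ts1", "lr1-ts2", "lr1_public"],
    "30.0.0.1",
    ["172.16.0.103", "172.16.0.102", "172.16.0.101"],
)
assert len(flows) == 3  # one flow per DGP, not just l3dgw_ports[0]
assert 'is_chassis_resident("cr-lr1-ts2")' in flows[1][0]
```

This mirrors the loop over od->n_l3dgw_ports in the patch: ingress traffic can then be DNATted on whichever DGP chassis it arrives at, instead of only the first gateway port.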
> > > > > > Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2054322 > > > Fixes: 15348b7b806f ("ovn-northd: Multiple distributed gateway port > > support.") > > > Signed-off-by: Roberto Bartzen Acosta <roberto.aco...@luizalabs.com> > > > --- > > > northd/en-lr-stateful.c | 12 ------ > > > northd/northd.c | 14 ++++--- > > > tests/ovn-northd.at | 92 +++++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 100 insertions(+), 18 deletions(-) > > > > > > diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c > > > index 6d0192487..7ffa4a690 100644 > > > --- a/northd/en-lr-stateful.c > > > +++ b/northd/en-lr-stateful.c > > > @@ -537,18 +537,6 @@ lr_stateful_record_create(struct lr_stateful_table > > *table, > > > > > > table->array[od->index] = lr_stateful_rec; > > > > > > - /* Load balancers are not supported (yet) if a logical router has > > multiple > > > - * distributed gateway port. Log a warning. */ > > > - if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) { > > > - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); > > > - VLOG_WARN_RL(&rl, "Load-balancers are configured on logical " > > > - "router %s, which has %"PRIuSIZE" distributed " > > > - "gateway ports. 
Load-balancer is not supported " > > > - "yet when there is more than one distributed " > > > - "gateway port on the router.", > > > - od->nbr->name, od->n_l3dgw_ports); > > > - } > > > - > > > return lr_stateful_rec; > > > } > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > index 2c3560ce2..7eb943d2f 100644 > > > --- a/northd/northd.c > > > +++ b/northd/northd.c > > > @@ -10919,10 +10919,9 @@ static void > > > build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx > > *ctx, > > > enum lrouter_nat_lb_flow_type > > type, > > > struct ovn_datapath *od, > > > - struct lflow_ref *lflow_ref) > > > + struct lflow_ref *lflow_ref, > > > + struct ovn_port *dgp) > > > { > > > - struct ovn_port *dgp = od->l3dgw_ports[0]; > > > - > > > const char *undnat_action; > > > > > > switch (type) { > > > @@ -10953,7 +10952,7 @@ build_distr_lrouter_nat_flows_for_lb(struct > > lrouter_nat_lb_flows_ctx *ctx, > > > > > > if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) { > > > ds_put_format(ctx->new_match, " && is_chassis_resident(%s)", > > > - od->l3dgw_ports[0]->cr_port->json_key); > > > + dgp->cr_port->json_key); > > > } > > > > > > ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, > > ctx->prio, > > > @@ -11164,8 +11163,11 @@ build_lrouter_nat_flows_for_lb( > > > if (!od->n_l3dgw_ports) { > > > bitmap_set1(gw_dp_bitmap[type], index); > > > } else { > > > - build_distr_lrouter_nat_flows_for_lb(&ctx, type, od, > > > - lb_dps->lflow_ref); > > > + for (size_t i = 0; i < od->n_l3dgw_ports; i++) { > > > + struct ovn_port *dgp = od->l3dgw_ports[i]; > > > + build_distr_lrouter_nat_flows_for_lb(&ctx, type, od, > > > + lb_dps->lflow_ref, > > dgp); > > > + } > > > } > > > > > > if (lb->affinity_timeout) { > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > index 6fdd761da..fa24935e1 100644 > > > --- a/tests/ovn-northd.at > > > +++ b/tests/ovn-northd.at > > > @@ -12313,3 +12313,95 @@ check_engine_stats northd recompute nocompute 
> > > check_engine_stats lflow recompute nocompute > > > > > > AT_CLEANUP > > > + > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > +AT_SETUP([Load balancer with Distributed Gateway Ports (DGP)]) > > > +ovn_start > > > + > > > +check ovn-nbctl ls-add public > > > +check ovn-nbctl lr-add lr1 > > > + > > > +# lr1 DGP ts1 > > > +check ovn-nbctl ls-add ts1 > > > +check ovn-nbctl lrp-add lr1 lr1-ts1 00:00:01:02:03:04 172.16.10.1/24 > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts1 chassis-2 > > > + > > > +# lr1 DGP ts2 > > > +check ovn-nbctl ls-add ts2 > > > +check ovn-nbctl lrp-add lr1 lr1-ts2 00:00:01:02:03:05 172.16.20.1/24 > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts2 chassis-3 > > > + > > > +# lr1 DGP public > > > +check ovn-nbctl lrp-add lr1 lr1_public 00:de:ad:ff:00:01 173.16.0.1/16 > > > +check ovn-nbctl lrp-add lr1 lr1_s1 00:de:ad:fe:00:02 172.16.0.1/24 > > > +check ovn-nbctl lrp-set-gateway-chassis lr1_public chassis-1 > > > + > > > +check ovn-nbctl ls-add s1 > > > +# s1 - lr1 > > > +check ovn-nbctl lsp-add s1 s1_lr1 > > > +check ovn-nbctl lsp-set-type s1_lr1 router > > > +check ovn-nbctl lsp-set-addresses s1_lr1 "00:de:ad:fe:00:02 172.16.0.1" > > > +check ovn-nbctl lsp-set-options s1_lr1 router-port=lr1_s1 > > > + > > > +# s1 - backend vm1 > > > +check ovn-nbctl lsp-add s1 vm1 > > > +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 172.16.0.101" > > > + > > > +# s1 - backend vm2 > > > +check ovn-nbctl lsp-add s1 vm2 > > > +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 172.16.0.102" > > > + > > > +# s1 - backend vm3 > > > +check ovn-nbctl lsp-add s1 vm3 > > > +check ovn-nbctl lsp-set-addresses vm3 "00:de:ad:01:00:03 172.16.0.103" > > > + > > > +# Add the lr1 DGP ts1 to the public switch > > > +check ovn-nbctl lsp-add public public_lr1_ts1 > > > +check ovn-nbctl lsp-set-type public_lr1_ts1 router > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts1 router > > > +check ovn-nbctl lsp-set-options public_lr1_ts1 router-port=lr1-ts1 > > 
nat-addresses=router > > > + > > > +# Add the lr1 DGP ts2 to the public switch > > > +check ovn-nbctl lsp-add public public_lr1_ts2 > > > +check ovn-nbctl lsp-set-type public_lr1_ts2 router > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts2 router > > > +check ovn-nbctl lsp-set-options public_lr1_ts2 router-port=lr1-ts2 > > nat-addresses=router > > > + > > > +# Add the lr1 DGP public to the public switch > > > +check ovn-nbctl lsp-add public public_lr1 > > > +check ovn-nbctl lsp-set-type public_lr1 router > > > +check ovn-nbctl lsp-set-addresses public_lr1 router > > > +check ovn-nbctl lsp-set-options public_lr1 router-port=lr1_public > > nat-addresses=router > > > + > > > +# Create the Load Balancer lb1 > > > +check ovn-nbctl --wait=sb lb-add lb1 "30.0.0.1" > > "172.16.0.103,172.16.0.102,172.16.0.101" > > > + > > > +# Associate load balancer to s1 > > > +check ovn-nbctl ls-lb-add s1 lb1 > > > +check ovn-nbctl --wait=sb sync > > > + > > > +ovn-sbctl dump-flows s1 > s1flows > > > +AT_CAPTURE_FILE([s1flows]) > > > + > > > +AT_CHECK([grep "ls_in_pre_stateful" s1flows | ovn_strip_lflows | grep > > "30.0.0.1"], [0], [dnl > > > + table=??(ls_in_pre_stateful ), priority=120 , match=(reg0[[2]] == 1 > > && ip4.dst == 30.0.0.1), action=(reg1 = 30.0.0.1; ct_lb_mark;) > > > +]) > > > +AT_CHECK([grep "ls_in_lb" s1flows | ovn_strip_lflows | grep > > "30.0.0.1"], [0], [dnl > > > + table=??(ls_in_lb ), priority=110 , match=(ct.new && > > ip4.dst == 30.0.0.1), action=(reg0[[1]] = 0; > > ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);) > > > +]) > > > + > > > +# Associate load balancer to lr1 with DGP > > > +check ovn-nbctl lr-lb-add lr1 lb1 > > > +check ovn-nbctl --wait=sb sync > > > + > > > +ovn-sbctl dump-flows lr1 > lr1flows > > > +AT_CAPTURE_FILE([lr1flows]) > > > + > > > +AT_CHECK([grep "lr_in_dnat" lr1flows | ovn_strip_lflows | grep > > "30.0.0.1"], [0], [dnl > > > + table=??(lr_in_dnat ), priority=110 , match=(ct.new && > > !ct.rel && ip4 && ip4.dst == 
30.0.0.1 && > > is_chassis_resident("cr-lr1-ts1")), > > action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);) > > > + table=??(lr_in_dnat ), priority=110 , match=(ct.new && > > !ct.rel && ip4 && ip4.dst == 30.0.0.1 && > > is_chassis_resident("cr-lr1-ts2")), > > action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);) > > > + table=??(lr_in_dnat ), priority=110 , match=(ct.new && > > !ct.rel && ip4 && ip4.dst == 30.0.0.1 && > > is_chassis_resident("cr-lr1_public")), > > action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);) > > > +]) > > > + > > > +AT_CLEANUP > > > +]) > > > -- > > > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev