Hi Dumitru,

Thanks a lot for looking at the change. It's my mistake; the LB use case totally slipped my mind.
The cross-product-based approach is not generic enough to address LB as well (or future use cases for CT zones). Please ignore this patch; I will submit a new one with a more generic approach soon.

There is one change in this patch that can be taken, i.e. replacing CT_ZONE_DB_QUEUED with CT_ZONE_OF_QUEUED, but I will submit a separate patch for it as well.

Thanks,
Regards,
Ankur

From: Dumitru Ceara <dce...@redhat.com>
Sent: Friday, August 7, 2020 1:58 AM
To: svc.mail.git <svc.mail....@nutanix.com>; ovs-dev@openvswitch.org <ovs-dev@openvswitch.org>; Ankur Sharma <ankur.sha...@nutanix.com>
Subject: Re: [ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment logic for logical routers

On 8/4/20 5:55 AM, Ankur Sharma wrote:
> From: Ankur Sharma <ankur.sha...@nutanix.com>
>
> BACKGROUND:
> a. ovn-controller assigns CT ZONES for local ports and datapaths.
> b. If a local port/datapath is cleaned up from a chassis, then the
>    corresponding CT ZONE is "unassigned"/"freed" up.
>
> ISSUE:
> The above logic and implementation leave stale CT entries in the
> datapath, which may get reused unexpectedly, thereby causing
> issues such as packets going through ct_nat(SNAT_IP_NEW) and getting
> a stale IP as the SNAT IP.
>
> a. As a part of CT zone unassignment, the implementation should FLUSH the
>    corresponding CT entries, i.e. it should do a FLUSH by ZONE.
>    As of now, the implementation avoids the flushing, thereby leaving
>    stale CT entries.
>
> b. Similarly, since the implementation relies on datapath existence
>    for assigning/unassigning CT ZONEs, simple operations like
>    moving the logical router from one external logical switch to
>    another may not cause any CT ZONE reassignment, and thereby
>    stale CT entries might get consumed when they should not have
>    been.
>
> c. a. and b. combined cause the following:
>    i. Start traffic that is to be SNATed from an internal endpoint to an
>       external endpoint. Let us say the internal endpoint IP is
>       50.0.0.10, the external endpoint IP is 8.8.8.8, and the
>       logical router port IP (and hence the SNAT IP) is 100.0.0.10.
>
>    ii. Detach the logical router from the old external logical switch
>        and attach it to a new external logical switch. As a result
>        of this operation, the new router port IP becomes 200.0.0.10,
>        which also becomes the new SNAT IP.
>
>    iii. The observation has been that traffic initiated in i. above
>         still ends up using the OLD SNAT IP, i.e. 100.0.0.10, rather than
>         200.0.0.10.
>
>    iv. iii. above happened because, although in the OVS DP the IP for
>        the NAT action is 200.0.0.10, the traffic is ongoing, so the
>        existing CT entries come into use and end up NATing to the old
>        SNAT IP 100.0.0.10. For example:
>
>        OVS DP STATE
>        recirc_id(0),in_port(16),....ct(commit,zone=1,nat(src=200.0.0.10))
>
>        CT STATE
>        icmp,orig=(src=50.0.0.10,dst=8.8.8.8,id=2288,type=8,code=0),
>        reply=(src=8.8.8.8,dst=100.0.0.10,id=2288,type=0,code=0),zone=1
>
> FIX:
> This patch improves the overall CT ZONE management by doing the following:
> a. Do a FLUSH by CT ZONE once we identify that a zone has to be freed up.
> b. From the datapath perspective, restrict CT ZONE assignment ONLY
>    to logical routers that have NAT rules enabled.

Hi Ankur,

Maybe I'm missing something, but doesn't this cause stateful ACLs, load balancers applied to logical switches, and load balancers applied to logical routers that don't have NAT configured to use the default conntrack zone (0)? I don't think that's OK.
As a matter of fact, most of the system-ovn.at tests related to conntrack fail because conntrack entries are created in the default zone, e.g.:

$ make check-kernel
  1: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT FAILED (system-ovn.at:116)
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/1/stdout 2020-08-07 04:50:11.167836393 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0)
  2: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT - IPv6 FAILED (system-ovn.at:296)
  3: ovn -- 2 LRs connected via LS, gateway router, easy SNAT FAILED (system-ovn.at:448)
  4: ovn -- 2 LRs connected via LS, gateway router, easy SNAT - IPv6 FAILED (system-ovn.at:560)
  5: ovn -- multiple gateway routers, SNAT and DNAT FAILED (system-ovn.at:730)
  6: ovn -- multiple gateway routers, SNAT and DNAT - IPv6 FAILED (system-ovn.at:958)
  7: ovn -- load-balancing FAILED (system-ovn.at:1153)
  8: ovn -- load-balancing - IPv6 ok
  9: ovn -- load-balancing - same subnet. ok
 10: ovn -- load-balancing - same subnet. - IPv6 ok
 11: ovn -- load balancing in gateway router FAILED (system-ovn.at:1829)
./system-ovn.at:1829: ovs-appctl dpctl/dump-conntrack | grep "dst=30.0.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | sort | uniq | sed -e 's/zone=[0-9]*/zone=<cleared>/'
--- - 2020-08-07 04:46:41.414957151 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/11/stdout 2020-08-07 04:46:41.412798625 -0400
@@ -1,3 +1,3 @@
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
 12: ovn -- load balancing in gateway router - IPv6 FAILED (system-ovn.at:2033)
 13: ovn -- multiple gateway routers, load-balancing FAILED (system-ovn.at:2209)
 14: ovn -- multiple gateway routers, load-balancing - IPv6 FAILED (system-ovn.at:2376)
[...]
 30: ovn -- ECMP symmetric reply FAILED (system-ovn.at:4684)
./system-ovn.at:4684: ovs-appctl dpctl/dump-conntrack | grep "dst=172.16.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | sort | uniq | \
sed -e 's/zone=[0-9]*/zone=<cleared>/' | sed -e 's/labels=0x[0-9a-f]*00000401020400000000/labels=0x00000401020400000000/'
--- - 2020-08-07 04:56:22.403499981 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/30/stdout 2020-08-07 04:56:22.401442917 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x00000401020400000000
+icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),labels=0x00000401020400000000

> c. Instead of using the logical router UUID as the CT zone key, use the
>    cross product of the logical router and the logical router port that
>    connects to the external logical switch.
>
> Signed-off-by: Ankur Sharma <ankur.sha...@nutanix.com>
> ---
>  controller/ovn-controller.c | 37 +++++++++++++++++++++++++++----------
>  controller/physical.c       | 18 ++++++++++++------
>  lib/ovn-util.c              | 10 ++++++----
>  lib/ovn-util.h              |  3 ++-
>  4 files changed, 47 insertions(+), 21 deletions(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 5ca32ac..9a6746e 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -521,17 +521,34 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>          sset_add(&all_users, user);
>      }
>
> -    /* Local patched datapath (gateway routers) need zones assigned. */
> +    /* Local patched datapath (gateway routers) need zones assigned.
> +     * Only local logical routers with atleast one NAT rule are considered for
> +     * CT zone assignment.*/
>      const struct local_datapath *ld;
>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
> -        /* XXX Add method to limit zone assignment to logical router
> -         * datapaths with NAT */
> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
> -        sset_add(&all_users, dnat);
> -        sset_add(&all_users, snat);
> -        free(dnat);
> -        free(snat);
> +        const char *dp_nblr = smap_get(&ld->datapath->external_ids,
> +                                       "logical-router");
> +        if (dp_nblr) {
> +            for (size_t iter = 0; iter < ld->n_peer_ports; iter++) {
> +                const struct sbrec_port_binding *peer_binding =
> +                    ld->peer_ports[iter].remote;
> +                const struct sbrec_port_binding *local_binding =
> +                    ld->peer_ports[iter].local;
> +
> +                if (peer_binding->nat_addresses) {
> +                    char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                    &local_binding->header_.uuid,
> +                                                    "dnat");
> +                    char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                    &local_binding->header_.uuid,
> +                                                    "snat");
> +                    sset_add(&all_users, dnat);
> +                    sset_add(&all_users, snat);
> +                    free(dnat);
> +                    free(snat);
> +                }
> +            }
> +        }
>      }
>
>      /* Delete zones that do not exist in above sset. */
> @@ -541,7 +558,7 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>                       ct_zone->data, ct_zone->name);
>
>              struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> -            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
> +            pending->state = CT_ZONE_OF_QUEUED;
>              pending->zone = ct_zone->data;
>              pending->add = false;
>              shash_add(pending_ct_zones, ct_zone->name, pending);
> diff --git a/controller/physical.c b/controller/physical.c
> index 535c777..cc497e0 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -218,18 +218,24 @@ static struct zone_ids
>  get_zone_ids(const struct sbrec_port_binding *binding,
>               const struct simap *ct_zones)
>  {
> -    struct zone_ids zone_ids;
> +    struct zone_ids zone_ids = {0};
>
>      zone_ids.ct = simap_get(ct_zones, binding->logical_port);
>
> -    const struct uuid *key = &binding->datapath->header_.uuid;
> +    const struct uuid *key1 = &binding->datapath->header_.uuid;
> +    const struct uuid *key2 = &binding->header_.uuid;
>
> -    char *dnat = alloc_nat_zone_key(key, "dnat");
> -    zone_ids.dnat = simap_get(ct_zones, dnat);
> +    char *dnat = alloc_nat_zone_key(key1, key2, "dnat");
> +
> +    if (simap_contains(ct_zones, dnat)) {
> +        zone_ids.dnat = simap_get(ct_zones, dnat);
> +    }

Both simap_get() and simap_contains() call simap_find() internally. If simap_get() can't find the key it will return a default value of 0. There's no need for the "if (simap_contains(ct_zones, dnat)) {". We can just directly:

    zone_ids.dnat = simap_get(ct_zones, dnat);

>      free(dnat);
>
> -    char *snat = alloc_nat_zone_key(key, "snat");
> -    zone_ids.snat = simap_get(ct_zones, snat);
> +    char *snat = alloc_nat_zone_key(key1, key2, "snat");
> +    if (simap_contains(ct_zones, snat)) {
> +        zone_ids.snat = simap_get(ct_zones, snat);
> +    }

Same here.

Thanks,
Dumitru

>      free(snat);
>
>      return zone_ids;
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index cdb5e18..cba7355 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -327,14 +327,16 @@ destroy_lport_addresses(struct lport_addresses *laddrs)
>      free(laddrs->ipv6_addrs);
>  }
>
> -/* Allocates a key for NAT conntrack zone allocation for a provided
> - * 'key' record and a 'type'.
> +/* Allocates a key for NAT conntrack zone allocation for provided
> + * 'keys' and a 'type'.
>   *
>   * It is the caller's responsibility to free the allocated memory. */
>  char *
> -alloc_nat_zone_key(const struct uuid *key, const char *type)
> +alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                   const char *type)
>  {
> -    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
> +    return xasprintf(UUID_FMT"_"UUID_FMT"_%s", UUID_ARGS(key1),
> +                     UUID_ARGS(key2), type);
>  }
>
>  const char *
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index 0f7b501..fe86bf8 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -77,7 +77,8 @@ bool extract_sbrec_binding_first_mac(const struct sbrec_port_binding *binding,
>
>  void destroy_lport_addresses(struct lport_addresses *);
>
> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
> +char *alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                         const char *type);
>
>  const char *default_nb_db(void);
>  const char *default_sb_db(void);

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev