Hi Dumitru,

Thanks a lot for looking at the change.
It my mistake, LB use case totally slipped through my mind.

Cross product based approach is not generic enough to address LB as well (and 
future use cases for CT ZONE).

Please ignore this patch, i will submit a new one with a more generic approach 
soon.

There is one change in this patch that can be taken, i.e replacing 
CT_ZONE_DB_QUEUED with CT_ZONE_OF_QUEUED,
but i will submit a separate patch for it as well.


Thanks

Regards,
Ankur




From: Dumitru Ceara <dce...@redhat.com>
Sent: Friday, August 7, 2020 1:58 AM
To: svc.mail.git <svc.mail....@nutanix.com>; ovs-dev@openvswitch.org 
<ovs-dev@openvswitch.org>; Ankur Sharma <ankur.sha...@nutanix.com>
Subject: Re: [ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment 
logic for logical routers 
 
On 8/4/20 5:55 AM, Ankur Sharma wrote:
> From: Ankur Sharma <ankur.sha...@nutanix.com>
> 
> BACKGROUND:
> a. ovn-controller assigns CT ZONES for local ports and datapaths.
> b. If a local port/datapath is cleaned up from a chassis, then
>    corresponding CT ZONE is "unassigned"/"freed" up.
> 
> ISSUE:
> Above logic and implementations leaves stale CT entries in the
> datapath, which may get reused unexpectedly, thereby causing
> issues like, packets going through ct_nat(SNAT_IP_NEW) and getting
> a stale IP as SNAT IP etc.
> 
> a. As a part of CT Zone unassign, implementation should FLUSH the
>    corresponding CT entries, i.e it should do FLUSH by ZONE.
>    As os now, implementation avoids the flushing, thereby leaving
>    stale CT entries.
> 
> b. Similarly, since the implementation relies on datapath existence
>    for assign/unassign of CT ZONEs. Hence, simple operations like
>    moving the logical router from one external logical switch to
>    another, may not cause any CT ZONE reassignment and thereby
>    stale CT entries might get consumed, when they should not have
>    been.
> 
> c. a. and b. combined causes following:
>    i. Start a to be SNATed traffic from internal endpoint to an
>       external endpoint. Let us say internal endpoint IP is
>       50.0.0.10 and external endpoint ip is 8.8.8.8 and
>       logical router port ip (and hence SNAT ip) is 100.0.0.10.
> 
>   ii. Detach the logical router from old external logical switch
>       and attach to new external logical switch. As a result
>       of this operation, new router port ip becomes 200.0.0.10
>       , which also becomes the new SNAT ip.
> 
>  iii. The observation has been that traffic initiated in i. above
>       still ends up using OLD SNAT IP, i.e 100.0.0.10, rather than
>       200.0.0.10
> 
>  iv. iii. above happened, because although from OVS DP, the IP for
>      NAT action is 200.0.0.10, however, since its an ongoing traffic,
>      hence the CT entries come in use and end up NATing to old SNAT
>      ip 100.0.0.10. For example:
> 
>      OVS DP STATE
>      recirc_id(0),in_port(16),....ct(commit,zone=1,nat(src=200.0.0.10))
> 
>      CT STATE
>      icmp,orig=(src=50.0.0.10,dst=8.8.8.8,id=2288,type=8,code=0),
>      reply=(src=8.8.8.8,dst=100.0.0.10,id=2288,type=0,code=0),zone=1
> 
> FIX:
> This patch improves the overall CT ZONE management by doing following:
> a. Do a FLUSH by CT ZONE, once we identify that a zone has to be freed up.
> b. From datapath perspective, restrict the CT ZONE assignment ONLY
>    to logical routers that has NAT rules enabled.

Hi Ankur,

Maybe I'm missing something but doesn't this cause stateful ACLs, load
balancers applied to logical_switches and load balancers applied to
logical routers that don't have nat configured to use the default
conntrack zone (0)? I don't think that's ok.

As a matter of fact, most of the system-ovn.at tests related to
conntrack fail because conntrack entries are created in the default
zone, e.g.:

$ make check-kernel
1: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT FAILED
(system-ovn.at:116)

+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/1/stdout
2020-08-07 04:50:11.167836393 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0)

  2: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT - IPv6
FAILED (system-ovn.at:296)
  3: ovn -- 2 LRs connected via LS, gateway router, easy SNAT FAILED
(system-ovn.at:448)
  4: ovn -- 2 LRs connected via LS, gateway router, easy SNAT - IPv6
FAILED (system-ovn.at:560)
  5: ovn -- multiple gateway routers, SNAT and DNAT  FAILED
(system-ovn.at:730)
  6: ovn -- multiple gateway routers, SNAT and DNAT - IPv6 FAILED
(system-ovn.at:958)
  7: ovn -- load-balancing                           FAILED
(system-ovn.at:1153)
  8: ovn -- load-balancing - IPv6                    ok
  9: ovn -- load-balancing - same subnet.            ok
 10: ovn -- load-balancing - same subnet. - IPv6     ok
 11: ovn -- load balancing in gateway router         FAILED
(system-ovn.at:1829)

./system-ovn.at:1829: ovs-appctl dpctl/dump-conntrack | grep
"dst=30.0.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
sort | uniq |
sed -e 's/zone=[0-9]*/zone=<cleared>/'
--- -   2020-08-07 04:46:41.414957151 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/11/stdout
2020-08-07 04:46:41.412798625 -0400
@@ -1,3 +1,3 @@
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)


 12: ovn -- load balancing in gateway router - IPv6  FAILED
(system-ovn.at:2033)
 13: ovn -- multiple gateway routers, load-balancing FAILED
(system-ovn.at:2209)
 14: ovn -- multiple gateway routers, load-balancing - IPv6 FAILED
(system-ovn.at:2376)

[...]

 30: ovn -- ECMP symmetric reply                     FAILED
(system-ovn.at:4684)

./system-ovn.at:4684: ovs-appctl dpctl/dump-conntrack | grep
"dst=172.16.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
sort | uniq | \
sed -e 's/zone=[0-9]*/zone=<cleared>/' |
sed -e
's/labels=0x[0-9a-f]*00000401020400000000/labels=0x00000401020400000000/'
--- -   2020-08-07 04:56:22.403499981 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/30/stdout
2020-08-07 04:56:22.401442917 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x00000401020400000000
+icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),labels=0x00000401020400000000

> c. Instead of using logical router uuid as ct zone key, use crossproduct
>    of logical router and logical router port that connects to external
>    logical switch.
> 
> Signed-off-by: Ankur Sharma <ankur.sha...@nutanix.com>
> ---
>  controller/ovn-controller.c | 37 +++++++++++++++++++++++++++----------
>  controller/physical.c       | 18 ++++++++++++------
>  lib/ovn-util.c              | 10 ++++++----
>  lib/ovn-util.h              |  3 ++-
>  4 files changed, 47 insertions(+), 21 deletions(-)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 5ca32ac..9a6746e 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -521,17 +521,34 @@ update_ct_zones(const struct sset *lports, const struct 
> hmap *local_datapaths,
>          sset_add(&all_users, user);
>      }
>  
> -    /* Local patched datapath (gateway routers) need zones assigned. */
> +    /* Local patched datapath (gateway routers) need zones assigned.
> +     * Only local logical routers with atleast one NAT rule are considered 
> for
> +     * CT zone assignment.*/
>      const struct local_datapath *ld;
>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
> -        /* XXX Add method to limit zone assignment to logical router
> -         * datapaths with NAT */
> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
> -        sset_add(&all_users, dnat);
> -        sset_add(&all_users, snat);
> -        free(dnat);
> -        free(snat);
> +       const char *dp_nblr = smap_get(&ld->datapath->external_ids,
> +                                      "logical-router");
> +       if (dp_nblr) {
> +          for (size_t iter = 0; iter < ld->n_peer_ports; iter++) {
> +             const struct sbrec_port_binding *peer_binding =
> +                                             ld->peer_ports[iter].remote;
> +             const struct sbrec_port_binding *local_binding =
> +                                             ld->peer_ports[iter].local;
> +
> +             if (peer_binding->nat_addresses) {
> +                char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                &local_binding->header_.uuid,
> +                                                "dnat");
> +                char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                &local_binding->header_.uuid,
> +                                                "snat");
> +                sset_add(&all_users, dnat);
> +                sset_add(&all_users, snat);
> +                free(dnat);
> +                free(snat);
> +             }
> +          }
> +       }
>      }
>  
>      /* Delete zones that do not exist in above sset. */
> @@ -541,7 +558,7 @@ update_ct_zones(const struct sset *lports, const struct 
> hmap *local_datapaths,
>                       ct_zone->data, ct_zone->name);
>  
>              struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> -            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
> +            pending->state = CT_ZONE_OF_QUEUED;
>              pending->zone = ct_zone->data;
>              pending->add = false;
>              shash_add(pending_ct_zones, ct_zone->name, pending);
> diff --git a/controller/physical.c b/controller/physical.c
> index 535c777..cc497e0 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -218,18 +218,24 @@ static struct zone_ids
>  get_zone_ids(const struct sbrec_port_binding *binding,
>               const struct simap *ct_zones)
>  {
> -    struct zone_ids zone_ids;
> +    struct zone_ids zone_ids = {0};
>  
>      zone_ids.ct = simap_get(ct_zones, binding->logical_port);
>  
> -    const struct uuid *key = &binding->datapath->header_.uuid;
> +    const struct uuid *key1 = &binding->datapath->header_.uuid;
> +    const struct uuid *key2 = &binding->header_.uuid;
>  
> -    char *dnat = alloc_nat_zone_key(key, "dnat");
> -    zone_ids.dnat = simap_get(ct_zones, dnat);
> +    char *dnat = alloc_nat_zone_key(key1, key2, "dnat");
> +
> +    if (simap_contains(ct_zones, dnat)) {
> +       zone_ids.dnat = simap_get(ct_zones, dnat);
> +    }

Both simap_get() and simap_contains() call simap_find() internally.  If
simap_get() can't find the key it will return a default value of 0.
There's no need for the "if (simap_contains(ct_zones, dnat)) {". We can
just directly:

zone_ids.dnat = simap_get(ct_zones, dnat);

>      free(dnat);
>  
> -    char *snat = alloc_nat_zone_key(key, "snat");
> -    zone_ids.snat = simap_get(ct_zones, snat);
> +    char *snat = alloc_nat_zone_key(key1, key2, "snat");
> +    if (simap_contains(ct_zones, snat)) {
> +       zone_ids.snat = simap_get(ct_zones, snat);
> +    }

Same here.

Thanks,
Dumitru

>      free(snat);
>  
>      return zone_ids;
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index cdb5e18..cba7355 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -327,14 +327,16 @@ destroy_lport_addresses(struct lport_addresses *laddrs)
>      free(laddrs->ipv6_addrs);
>  }
>  
> -/* Allocates a key for NAT conntrack zone allocation for a provided
> - * 'key' record and a 'type'.
> +/* Allocates a key for NAT conntrack zone allocation for provided
> + * 'keys' and a 'type'.
>   *
>   * It is the caller's responsibility to free the allocated memory. */
>  char *
> -alloc_nat_zone_key(const struct uuid *key, const char *type)
> +alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                   const char *type)
>  {
> -    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
> +    return xasprintf(UUID_FMT"_"UUID_FMT"_%s", UUID_ARGS(key1),
> +                     UUID_ARGS(key2), type);
>  }
>  
>  const char *
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index 0f7b501..fe86bf8 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -77,7 +77,8 @@ bool extract_sbrec_binding_first_mac(const struct 
> sbrec_port_binding *binding,
>  
>  void destroy_lport_addresses(struct lport_addresses *);
>  
> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
> +char *alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                         const char *type);
>  
>  const char *default_nb_db(void);
>  const char *default_sb_db(void);
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to