Re: [ovs-dev] [PATCH ovn v2] Allow explicit setting of the SNAT zone on a gateway router.

2020-11-16 Thread Numan Siddique
On Tue, Nov 17, 2020 at 12:14 AM Mark Michelson  wrote:
>
> Thanks for the review Numan.
>
> On 11/16/20 4:19 AM, Numan Siddique wrote:
> > On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson  wrote:
> >>
> >> In certain situations, OVN may coexist with other applications on a
> >> host. Traffic from OVN and the other applications may then go out a
> >> shared gateway. If OVN traffic and the other application traffic use
> >> different conntrack zones for SNAT, then it is possible for the shared
> >> gateway to assign conflicting source IP:port combinations. By sharing
> >> the same conntrack zone, there will be no conflicting assignments.
> >>
> >> In this commit, we introduce options:snat-ct-zone for northbound logical
> >> routers. By setting this option, users can explicitly set the conntrack
> >> zone for the logical router so that it will match the zone used by
> >> non-OVN traffic on the host.
> >>
> >> The biggest side effects of this patch are:
> >> 1) southbound datapath changes now result in recalculating CT zones in
> >> ovn-controller. This can result in recomputing physical flows in more
> >> situations than previously.
> >> 2) The table 65 flow to transition between datapaths is no longer
> >> associated with a port binding. This is because the flow refers to
> >> the peer datapath's CT zones, which can now be updated due to changes
> >> on that datapath. The flow therefore may need to be updated either
> >> due to the port binding being changed or the peer datapath being
> >> changed.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311
> >> Signed-off-by: Mark Michelson 
> >
> > Hi Mark,
> >
> > Thanks for the patch.
> >
> > I did some testing with the below logical resources
> >
> > **
> > switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1)
> >  port sw1-lr0
> >  type: router
> >  addresses: ["00:00:00:00:ff:02"]
> >  router-port: lr0-sw1
> >  port sw1-port2
> >  addresses: ["40:54:00:00:00:04 20.0.0.4"]
> > switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0)
> >  port sw0-lr0
> >  type: router
> >  addresses: ["00:00:00:00:ff:01"]
> >  router-port: lr0-sw0
> >  port sw0-port1
> >  addresses: ["50:54:00:00:00:03 10.0.0.3"]
> >  port sw0-lr1
> >  type: router
> >  router-port: lr1-sw0
> > router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1)
> >  port lr1-sw0
> >  mac: "00:00:00:00:fa:21"
> >  networks: ["10.0.0.20/24"]
> > router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0)
> >  port lr0-sw1
> >  mac: "00:00:00:00:ff:02"
> >  networks: ["20.0.0.1/24"]
> >  port lr0-public
> >  mac: "00:00:20:20:12:13"
> >  networks: ["172.168.0.100/24"]
> >  gateway chassis: [chassis-1]
> > *
> >
> >
> > If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x"
> > and If 'x' is already allocated for sw0 snat, then the
> > both sw0 snat and lr0 snat share the same zone id. Is that expected ?
>
> That is not expected. If you request a zone for lr0, then sw0 should
> have its zone re-assigned to something different. I can add a test case
> for this and debug it. Looking at the code, it's not obvious to me why
> it's failing.
>
> >
> > If I set the same zone id 'y' for both lr0 and lr1, then I observed
> > that both lr0 and lr1 have the same zone id 'y' and I also
> > see the warning in the ovn-controller log.
> >
> > Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0
> > is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'.
> > Seems like it's not consistent.
>
> I wrote the code with the expectation that if duplicate zone assignments
> were detected, the oldest request would be honored and the newest would
> be ignored. If you first requested 'y' for lr0 and then later requested
> 'y' for lr1, the expectation is that when you requested 'y' for lr1,
> you'd see the warning, and lr1 would be auto-assigned a different zone
> id. I think the inconsistent behavior may be occurring due to the
> hashing of the datapaths. The order in which datapaths are visited does
> not correlate with the order that the datapaths were assigned their zone
> IDs. So if the newer assignment is visited in the datapath traversal
> before the older assignment, they'll both get assigned the same zone.
> But if the newer assignment is visited in the datapath traversal after
> the older assignment, it works as intended. This could be corrected, but
> the point may be moot based on discussion below.
>
> >
> > Can there be a  possibility that there are 2 shared gateways in one
> > node and both want to share the same zone id ?
> >
> > i.e lr0 is associated with one shared gateway and lr1 with another
> > shared gateway (on the same host) and both want to use the zone id 0 ?
>
> Yes, I guess that could very well be possible. I guess there's no harm
> in having multiple gateways share the 

Re: [ovs-dev] [PATCH ovn v2] Allow explicit setting of the SNAT zone on a gateway router.

2020-11-16 Thread Mark Michelson

Thanks for the review Numan.

On 11/16/20 4:19 AM, Numan Siddique wrote:

On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson  wrote:


In certain situations, OVN may coexist with other applications on a
host. Traffic from OVN and the other applications may then go out a
shared gateway. If OVN traffic and the other application traffic use
different conntrack zones for SNAT, then it is possible for the shared
gateway to assign conflicting source IP:port combinations. By sharing
the same conntrack zone, there will be no conflicting assignments.

In this commit, we introduce options:snat-ct-zone for northbound logical
routers. By setting this option, users can explicitly set the conntrack
zone for the logical router so that it will match the zone used by
non-OVN traffic on the host.

The biggest side effects of this patch are:
1) southbound datapath changes now result in recalculating CT zones in
ovn-controller. This can result in recomputing physical flows in more
situations than previously.
2) The table 65 flow to transition between datapaths is no longer
associated with a port binding. This is because the flow refers to
the peer datapath's CT zones, which can now be updated due to changes
on that datapath. The flow therefore may need to be updated either
due to the port binding being changed or the peer datapath being
changed.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311
Signed-off-by: Mark Michelson 


Hi Mark,

Thanks for the patch.

I did some testing with the below logical resources

**
switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1)
 port sw1-lr0
 type: router
 addresses: ["00:00:00:00:ff:02"]
 router-port: lr0-sw1
 port sw1-port2
 addresses: ["40:54:00:00:00:04 20.0.0.4"]
switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0)
 port sw0-lr0
 type: router
 addresses: ["00:00:00:00:ff:01"]
 router-port: lr0-sw0
 port sw0-port1
 addresses: ["50:54:00:00:00:03 10.0.0.3"]
 port sw0-lr1
 type: router
 router-port: lr1-sw0
router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1)
 port lr1-sw0
 mac: "00:00:00:00:fa:21"
 networks: ["10.0.0.20/24"]
router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0)
 port lr0-sw1
 mac: "00:00:00:00:ff:02"
 networks: ["20.0.0.1/24"]
 port lr0-public
 mac: "00:00:20:20:12:13"
 networks: ["172.168.0.100/24"]
 gateway chassis: [chassis-1]
*


If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x"
and If 'x' is already allocated for sw0 snat, then the
both sw0 snat and lr0 snat share the same zone id. Is that expected ?


That is not expected. If you request a zone for lr0, then sw0 should 
have its zone re-assigned to something different. I can add a test case 
for this and debug it. Looking at the code, it's not obvious to me why 
it's failing.




If I set the same zone id 'y' for both lr0 and lr1, then I observed
that both lr0 and lr1 have the same zone id 'y' and I also
see the warning in the ovn-controller log.

Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0
is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'.
Seems like it's not consistent.


I wrote the code with the expectation that if duplicate zone assignments 
were detected, the oldest request would be honored and the newest would 
be ignored. If you first requested 'y' for lr0 and then later requested 
'y' for lr1, the expectation is that when you requested 'y' for lr1, 
you'd see the warning, and lr1 would be auto-assigned a different zone 
id. I think the inconsistent behavior may be occurring due to the 
hashing of the datapaths. The order in which datapaths are visited does 
not correlate with the order that the datapaths were assigned their zone 
IDs. So if the newer assignment is visited in the datapath traversal 
before the older assignment, they'll both get assigned the same zone. 
But if the newer assignment is visited in the datapath traversal after 
the older assignment, it works as intended. This could be corrected, but 
the point may be moot based on discussion below.




Can there be a  possibility that there are 2 shared gateways in one
node and both want to share the same zone id ?

i.e lr0 is associated with one shared gateway and lr1 with another
shared gateway (on the same host) and both want to use the zone id 0 ?


Yes, I guess that could very well be possible. I guess there's no harm 
in having multiple gateways share the same zone assignment if they both 
were explicitly configured to use that particular zone assigment. That 
actually would simplify the code a bit.




Thanks
Numan



---
  controller/ovn-controller.c | 89 +++-
  controller/physical.c   |  2 +-
  lib/ovn-util.c  |  7 +++
  lib/ovn-util.h  |  1 +
  northd/ovn-northd.c | 10 
  ovn-nb.xml  

Re: [ovs-dev] [PATCH ovn v2] Allow explicit setting of the SNAT zone on a gateway router.

2020-11-16 Thread Numan Siddique
On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson  wrote:
>
> In certain situations, OVN may coexist with other applications on a
> host. Traffic from OVN and the other applications may then go out a
> shared gateway. If OVN traffic and the other application traffic use
> different conntrack zones for SNAT, then it is possible for the shared
> gateway to assign conflicting source IP:port combinations. By sharing
> the same conntrack zone, there will be no conflicting assignments.
>
> In this commit, we introduce options:snat-ct-zone for northbound logical
> routers. By setting this option, users can explicitly set the conntrack
> zone for the logical router so that it will match the zone used by
> non-OVN traffic on the host.
>
> The biggest side effects of this patch are:
> 1) southbound datapath changes now result in recalculating CT zones in
>ovn-controller. This can result in recomputing physical flows in more
>situations than previously.
> 2) The table 65 flow to transition between datapaths is no longer
>associated with a port binding. This is because the flow refers to
>the peer datapath's CT zones, which can now be updated due to changes
>on that datapath. The flow therefore may need to be updated either
>due to the port binding being changed or the peer datapath being
>changed.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311
> Signed-off-by: Mark Michelson 

Hi Mark,

Thanks for the patch.

I did some testing with the below logical resources

**
switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1)
port sw1-lr0
type: router
addresses: ["00:00:00:00:ff:02"]
router-port: lr0-sw1
port sw1-port2
addresses: ["40:54:00:00:00:04 20.0.0.4"]
switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0)
port sw0-lr0
type: router
addresses: ["00:00:00:00:ff:01"]
router-port: lr0-sw0
port sw0-port1
addresses: ["50:54:00:00:00:03 10.0.0.3"]
port sw0-lr1
type: router
router-port: lr1-sw0
router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1)
port lr1-sw0
mac: "00:00:00:00:fa:21"
networks: ["10.0.0.20/24"]
router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0)
port lr0-sw1
mac: "00:00:00:00:ff:02"
networks: ["20.0.0.1/24"]
port lr0-public
mac: "00:00:20:20:12:13"
networks: ["172.168.0.100/24"]
gateway chassis: [chassis-1]
*


If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x"
and If 'x' is already allocated for sw0 snat, then the
both sw0 snat and lr0 snat share the same zone id. Is that expected ?

If I set the same zone id 'y' for both lr0 and lr1, then I observed
that both lr0 and lr1 have the same zone id 'y' and I also
see the warning in the ovn-controller log.

Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0
is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'.
Seems like it's not consistent.

Can there be a  possibility that there are 2 shared gateways in one
node and both want to share the same zone id ?

i.e lr0 is associated with one shared gateway and lr1 with another
shared gateway (on the same host) and both want to use the zone id 0 ?

Thanks
Numan


> ---
>  controller/ovn-controller.c | 89 +++-
>  controller/physical.c   |  2 +-
>  lib/ovn-util.c  |  7 +++
>  lib/ovn-util.h  |  1 +
>  northd/ovn-northd.c | 10 
>  ovn-nb.xml  |  7 +++
>  tests/ovn.at| 91 +
>  7 files changed, 194 insertions(+), 13 deletions(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index a06cae3cc..54c8fd2db 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -531,6 +531,21 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct ovsdb_idl 
> *ovnsb_idl,
>  }
>  }
>
> +static void
> +add_pending_ct_zone_entry(struct shash *pending_ct_zones,
> +  enum ct_zone_pending_state state,
> +  int zone, bool add, const char *name)
> +{
> +VLOG_DBG("%s ct zone %"PRId32" for '%s'",
> + add ? "assigning" : "removing", zone, name);
> +
> +struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> +pending->state = state; /* Skip flushing zone. */
> +pending->zone = zone;
> +pending->add = add;
> +shash_add(pending_ct_zones, name, pending);
> +}
> +
>  static void
>  update_ct_zones(const struct sset *lports, const struct hmap 
> *local_datapaths,
>  struct simap *ct_zones, unsigned long *ct_zone_bitmap,
> @@ -540,6 +555,7 @@ update_ct_zones(const struct sset *lports, const struct 
> hmap *local_datapaths,
>  int scan_start = 1;
>  const char *user;
>  struct sset all_users = SSET_INITIALIZER(_users);
> +struct simap req_snat_zones = SIMAP_INITIALIZER(_snat_zones);
>
>