On Tue, Nov 17, 2020 at 12:14 AM Mark Michelson <mmich...@redhat.com> wrote: > > Thanks for the review Numan. > > On 11/16/20 4:19 AM, Numan Siddique wrote: > > On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson <mmich...@redhat.com> wrote: > >> > >> In certain situations, OVN may coexist with other applications on a > >> host. Traffic from OVN and the other applications may then go out a > >> shared gateway. If OVN traffic and the other application traffic use > >> different conntrack zones for SNAT, then it is possible for the shared > >> gateway to assign conflicting source IP:port combinations. By sharing > >> the same conntrack zone, there will be no conflicting assignments. > >> > >> In this commit, we introduce options:snat-ct-zone for northbound logical > >> routers. By setting this option, users can explicitly set the conntrack > >> zone for the logical router so that it will match the zone used by > >> non-OVN traffic on the host. > >> > >> The biggest side effects of this patch are: > >> 1) southbound datapath changes now result in recalculating CT zones in > >> ovn-controller. This can result in recomputing physical flows in more > >> situations than previously. > >> 2) The table 65 flow to transition between datapaths is no longer > >> associated with a port binding. This is because the flow refers to > >> the peer datapath's CT zones, which can now be updated due to changes > >> on that datapath. The flow therefore may need to be updated either > >> due to the port binding being changed or the peer datapath being > >> changed. > >> > >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311 > >> Signed-off-by: Mark Michelson <mmich...@redhat.com> > > > > Hi Mark, > > > > Thanks for the patch. > > > > I did some testing with the below logical resources > > > > ****** > > switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1) > > port sw1-lr0 > > type: router > > addresses: ["00:00:00:00:ff:02"] > > router-port: lr0-sw1 > > port sw1-port2 > > addresses: ["40:54:00:00:00:04 20.0.0.4"] > > switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0) > > port sw0-lr0 > > type: router > > addresses: ["00:00:00:00:ff:01"] > > router-port: lr0-sw0 > > port sw0-port1 > > addresses: ["50:54:00:00:00:03 10.0.0.3"] > > port sw0-lr1 > > type: router > > router-port: lr1-sw0 > > router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1) > > port lr1-sw0 > > mac: "00:00:00:00:fa:21" > > networks: ["10.0.0.20/24"] > > router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0) > > port lr0-sw1 > > mac: "00:00:00:00:ff:02" > > networks: ["20.0.0.1/24"] > > port lr0-public > > mac: "00:00:20:20:12:13" > > networks: ["172.168.0.100/24"] > > gateway chassis: [chassis-1] > > ***** > > > > > > If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x" > > and If 'x' is already allocated for sw0 snat, then the > > both sw0 snat and lr0 snat share the same zone id. Is that expected ? > > That is not expected. If you request a zone for lr0, then sw0 should > have its zone re-assigned to something different. I can add a test case > for this and debug it. Looking at the code, it's not obvious to me why > it's failing. > > > > > If I set the same zone id 'y' for both lr0 and lr1, then I observed > > that both lr0 and lr1 have the same zone id 'y' and I also > > see the warning in the ovn-controller log. > > > > Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0 > > is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'. > > Seems like it's not consistent. > > I wrote the code with the expectation that if duplicate zone assignments > were detected, the oldest request would be honored and the newest would > be ignored. If you first requested 'y' for lr0 and then later requested > 'y' for lr1, the expectation is that when you requested 'y' for lr1, > you'd see the warning, and lr1 would be auto-assigned a different zone > id. I think the inconsistent behavior may be occurring due to the > hashing of the datapaths. The order in which datapaths are visited does > not correlate with the order that the datapaths were assigned their zone > IDs. So if the newer assignment is visited in the datapath traversal > before the older assignment, they'll both get assigned the same zone. > But if the newer assignment is visited in the datapath traversal after > the older assignment, it works as intended. This could be corrected, but > the point may be moot based on discussion below. > > > > > Can there be a possibility that there are 2 shared gateways in one > > node and both want to share the same zone id ? > > > > i.e lr0 is associated with one shared gateway and lr1 with another > > shared gateway (on the same host) and both want to use the zone id 0 ? > > Yes, I guess that could very well be possible. I guess there's no harm > in having multiple gateways share the same zone assignment if they both > were explicitly configured to use that particular zone assigment. That > actually would simplify the code a bit.
Thanks for the reply and explanation. To simplify the code and to avoid checking for duplicate zone id assignments, we could 1. Allow only zone id 0 for configuration from CMS. This is very restrictive. 2. OVN can reserve the zone ids from 0 to say 100 for CMS configuration. And CMS is free to use the zone ids within this reserve. ovn-controllers will allocate zone ids starting from 101 for the normal logical ports/routers. This is not very restrictive and would also simplify the code. What do you think of this ? My personal preference would be (2). I am fine with the present proposal too if (1) or (2) will not solve the requirement. Thanks Numan > > > > > Thanks > > Numan > > > > > >> --- > >> controller/ovn-controller.c | 89 +++++++++++++++++++++++++++++++----- > >> controller/physical.c | 2 +- > >> lib/ovn-util.c | 7 +++ > >> lib/ovn-util.h | 1 + > >> northd/ovn-northd.c | 10 ++++ > >> ovn-nb.xml | 7 +++ > >> tests/ovn.at | 91 +++++++++++++++++++++++++++++++++++++ > >> 7 files changed, 194 insertions(+), 13 deletions(-) > >> > >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >> index a06cae3cc..54c8fd2db 100644 > >> --- a/controller/ovn-controller.c > >> +++ b/controller/ovn-controller.c > >> @@ -531,6 +531,21 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct > >> ovsdb_idl *ovnsb_idl, > >> } > >> } > >> > >> +static void > >> +add_pending_ct_zone_entry(struct shash *pending_ct_zones, > >> + enum ct_zone_pending_state state, > >> + int zone, bool add, const char *name) > >> +{ > >> + VLOG_DBG("%s ct zone %"PRId32" for '%s'", > >> + add ? "assigning" : "removing", zone, name); > >> + > >> + struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending); > >> + pending->state = state; /* Skip flushing zone. */ > >> + pending->zone = zone; > >> + pending->add = add; > >> + shash_add(pending_ct_zones, name, pending); > >> +} > >> + > >> static void > >> update_ct_zones(const struct sset *lports, const struct hmap > >> *local_datapaths, > >> struct simap *ct_zones, unsigned long *ct_zone_bitmap, > >> @@ -540,6 +555,7 @@ update_ct_zones(const struct sset *lports, const > >> struct hmap *local_datapaths, > >> int scan_start = 1; > >> const char *user; > >> struct sset all_users = SSET_INITIALIZER(&all_users); > >> + struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones); > >> > >> SSET_FOR_EACH(user, lports) { > >> sset_add(&all_users, user); > >> @@ -554,6 +570,25 @@ update_ct_zones(const struct sset *lports, const > >> struct hmap *local_datapaths, > >> char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, > >> "snat"); > >> sset_add(&all_users, dnat); > >> sset_add(&all_users, snat); > >> + > >> + int req_snat_zone = datapath_snat_ct_zone(ld->datapath); > >> + if (req_snat_zone >= 0) { > >> + struct simap_node *node; > >> + bool collision = false; > >> + SIMAP_FOR_EACH (node, &req_snat_zones) { > >> + if (node->data == req_snat_zone) { > >> + VLOG_WARN("Datapaths %.*s and " UUID_FMT " request > >> SNAT " > >> + "CT zone %d\n", UUID_LEN, node->name, > >> + UUID_ARGS(&ld->datapath->header_.uuid), > >> + req_snat_zone); > >> + collision = true; > >> + break; > >> + } > >> + } > >> + if (!collision) { > >> + simap_put(&req_snat_zones, snat, req_snat_zone); > >> + } > >> + } > >> free(dnat); > >> free(snat); > >> } > >> @@ -564,17 +599,51 @@ update_ct_zones(const struct sset *lports, const > >> struct hmap *local_datapaths, > >> VLOG_DBG("removing ct zone %"PRId32" for '%s'", > >> ct_zone->data, ct_zone->name); > >> > >> - struct ct_zone_pending_entry *pending = xmalloc(sizeof > >> *pending); > >> - pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */ > >> - pending->zone = ct_zone->data; > >> - pending->add = false; > >> - shash_add(pending_ct_zones, ct_zone->name, pending); > >> + add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_DB_QUEUED, > >> + ct_zone->data, false, > >> ct_zone->name); > >> > >> bitmap_set0(ct_zone_bitmap, ct_zone->data); > >> simap_delete(ct_zones, ct_zone); > >> } > >> } > >> > >> + /* Prioritize requested CT zones */ > >> + struct simap_node *snat_req_node; > >> + SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) { > >> + struct simap_node *node = simap_find(ct_zones, > >> snat_req_node->name); > >> + if (node) { > >> + if (node->data == snat_req_node->data) { > >> + /* Already have this zone reserved */ > >> + continue; > >> + } else { > >> + /* Zone has changed for this node. delete old entry */ > >> + bitmap_set0(ct_zone_bitmap, node->data); > >> + simap_delete(ct_zones, node); > >> + } > >> + } else if (snat_req_node->data > 0 && > >> + bitmap_is_set(ct_zone_bitmap, snat_req_node->data)) { > >> + /* Uh oh. Someone else already has this zone assigned. > >> + * We need to find who and remove them from ct_zones so > >> + * that they get re-assigned a new zone below > >> + */ > >> + struct simap_node *next; > >> + SIMAP_FOR_EACH_SAFE (node, next, ct_zones) { > >> + if (node->data == snat_req_node->data) { > >> + simap_delete(ct_zones, node); > >> + break; > >> + } > >> + } > >> + } > >> + > >> + add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED, > >> + snat_req_node->data, true, > >> + snat_req_node->name); > >> + > >> + bitmap_set1(ct_zone_bitmap, snat_req_node->data); > >> + simap_put(ct_zones, snat_req_node->name, snat_req_node->data); > >> + } > >> + simap_destroy(&req_snat_zones); > >> + > >> /* xxx This is wasteful to assign a zone to each port--even if no > >> * xxx security policy is applied. */ > >> > >> @@ -596,13 +665,8 @@ update_ct_zones(const struct sset *lports, const > >> struct hmap *local_datapaths, > >> } > >> scan_start = zone + 1; > >> > >> - VLOG_DBG("assigning ct zone %"PRId32" to '%s'", zone, user); > >> - > >> - struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending); > >> - pending->state = CT_ZONE_OF_QUEUED; > >> - pending->zone = zone; > >> - pending->add = true; > >> - shash_add(pending_ct_zones, user, pending); > >> + add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED, > >> + zone, true, user); > >> > >> bitmap_set1(ct_zone_bitmap, zone); > >> simap_put(ct_zones, user, zone); > >> @@ -2330,6 +2394,7 @@ main(int argc, char *argv[]) > >> > >> engine_add_input(&en_ct_zones, &en_ovs_open_vswitch, NULL); > >> engine_add_input(&en_ct_zones, &en_ovs_bridge, NULL); > >> + engine_add_input(&en_ct_zones, &en_sb_datapath_binding, NULL); > >> engine_add_input(&en_ct_zones, &en_runtime_data, NULL); > >> > >> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); > >> diff --git a/controller/physical.c b/controller/physical.c > >> index 1bc2c389b..00c4ca4fd 100644 > >> --- a/controller/physical.c > >> +++ b/controller/physical.c > >> @@ -926,7 +926,7 @@ consider_port_binding(struct ovsdb_idl_index > >> *sbrec_port_binding_by_name, > >> > >> ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 100, > >> binding->header_.uuid.parts[0], > >> - &match, ofpacts_p, &binding->header_.uuid); > >> + &match, ofpacts_p, hc_uuid); > >> return; > >> } > >> > >> diff --git a/lib/ovn-util.c b/lib/ovn-util.c > >> index 18aac8da3..f0fb796b4 100644 > >> --- a/lib/ovn-util.c > >> +++ b/lib/ovn-util.c > >> @@ -532,6 +532,13 @@ datapath_is_switch(const struct > >> sbrec_datapath_binding *ldp) > >> { > >> return smap_get(&ldp->external_ids, "logical-switch") != NULL; > >> } > >> + > >> +int > >> +datapath_snat_ct_zone(const struct sbrec_datapath_binding *dp) > >> +{ > >> + return smap_get_int(&dp->external_ids, "snat-ct-zone", -1); > >> +} > >> + > >> > >> struct tnlid_node { > >> struct hmap_node hmap_node; > >> diff --git a/lib/ovn-util.h b/lib/ovn-util.h > >> index 3496673b2..ea8226571 100644 > >> --- a/lib/ovn-util.h > >> +++ b/lib/ovn-util.h > >> @@ -107,6 +107,7 @@ uint32_t ovn_logical_flow_hash(const struct uuid > >> *logical_datapath, > >> uint16_t priority, > >> const char *match, const char *actions); > >> bool datapath_is_switch(const struct sbrec_datapath_binding *); > >> +int datapath_snat_ct_zone(const struct sbrec_datapath_binding *ldp); > >> void ovn_conn_show(struct unixctl_conn *conn, int argc OVS_UNUSED, > >> const char *argv[] OVS_UNUSED, void *idl_); > >> > >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > >> index 4d4190cb9..8ce756c8a 100644 > >> --- a/northd/ovn-northd.c > >> +++ b/northd/ovn-northd.c > >> @@ -1179,6 +1179,16 @@ ovn_datapath_update_external_ids(struct > >> ovn_datapath *od) > >> smap_add(&ids, "interconn-ts", ts); > >> } > >> } > >> + > >> + /* Set snat-ct-zone */ > >> + if (od->nbr) { > >> + int nat_default_ct = smap_get_int(&od->nbr->options, > >> + "snat-ct-zone", -1); > >> + if (nat_default_ct >= 0) { > >> + smap_add_format(&ids, "snat-ct-zone", "%d", nat_default_ct); > >> + } > >> + } > >> + > >> sbrec_datapath_binding_set_external_ids(od->sb, &ids); > >> smap_destroy(&ids); > >> } > >> diff --git a/ovn-nb.xml b/ovn-nb.xml > >> index 5e6c6c7a3..619c0e299 100644 > >> --- a/ovn-nb.xml > >> +++ b/ovn-nb.xml > >> @@ -1961,6 +1961,13 @@ > >> unique key for each datapath by itself. However, if it is > >> configured, > >> <code>ovn-northd</code> honors the configured value. > >> </column> > >> + <column name="options" key="snat-ct-zone" > >> + type='{"type": "integer", "minInteger": 0, "maxInteger": > >> 65535}'> > >> + Use the requested conntrack zone for SNAT with this router. This > >> can be > >> + useful if egress traffic from the host running OVN comes from > >> both OVN > >> + and other sources. This way, OVN and the other sources can make > >> use of > >> + the same conntrack zone. > >> + </column> > >> </group> > >> > >> <group title="Common Columns"> > >> diff --git a/tests/ovn.at b/tests/ovn.at > >> index 378d11921..4f8c1a3e1 100644 > >> --- a/tests/ovn.at > >> +++ b/tests/ovn.at > >> @@ -22686,3 +22686,94 @@ AT_CHECK([test "$encap_rec_mvtep" == > >> "$encap_rec_mvtep1"], [0], []) > >> > >> OVN_CLEANUP([hv1]) > >> AT_CLEANUP > >> + > >> +AT_SETUP([ovn -- snat default ct zone]) > >> +ovn_start > >> + > >> +net_add n1 > >> +sim_add hv1 > >> +ovs-vsctl add-br br-phys > >> +as hv1 > >> +ovn_attach n1 br-phys 192.168.0.10 > >> + > >> +ovn-nbctl ls-add sw0 > >> +ovn-nbctl lsp-add sw0 sw0-p1 > >> +ovn-nbctl lsp-set-addresses sw0-p1 "00:00:00:00:00:02 10.0.0.2" > >> + > >> +ovn-nbctl lr-add gw_router > >> +ovn-nbctl set Logical_Router gw_router options:chassis="hv1" > >> + > >> +ovn-nbctl lrp-add gw_router gw_router-sw0 00:00:00:00:00:01 10.0.0.1/24 > >> +ovn-nbctl lsp-add sw0 sw0-gw_router > >> +ovn-nbctl lsp-set-addresses sw0-gw_router router > >> +ovn-nbctl set Logical_Switch_Port sw0-gw_router type=router \ > >> + options:router-port=gw_router-sw0 \ > >> + > >> +ovn-nbctl lr-nat-add gw_router snat 192.168.0.1 10.0.0.0/24 > >> + > >> +as hv1 ovs-vsctl -- add-port br-int hv1-vif1 -- \ > >> + set interface hv1-vif1 external-ids:iface-id=sw0-p1 > >> + > >> +ovn-nbctl --wait=hv sync > >> + > >> +ro_nb_uuid=$(ovn-nbctl get Logical_Router gw_router _uuid) > >> +sw_nb_uuid=$(ovn-nbctl get Logical_Switch sw0 _uuid) > >> +ro_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding > >> external-ids:logical-router=${ro_nb_uuid}) > >> +sw_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding > >> external-ids:logical-switch=${sw_nb_uuid}) > >> +ro_meta=$(ovn-sbctl get Datapath_Binding ${ro_sb_uuid} tunnel_key) > >> +ro_meta=$(printf %#x ${ro_meta}) > >> +sw_meta=$(ovn-sbctl get Datapath_Binding ${sw_sb_uuid} tunnel_key) > >> +sw_meta=$(printf %#x ${sw_meta}) > >> + > >> +echo "ro_nb_uuid: ${ro_nb_uuid}" > >> +echo "sw_nb_uuid: ${sw_nb_uuid}" > >> +echo "ro_sb_uuid: ${ro_sb_uuid}" > >> +echo "sw_sb_uuid: ${sw_sb_uuid}" > >> +echo "ro_meta: ${ro_meta}" > >> +echo "sw_meta: ${sw_meta}" > >> + > >> +as hv1 > >> +ovs-vsctl list bridge br-int > >> +snat_zone=$(printf %#x $(ovs-vsctl get bridge br-int > >> external-ids:ct-zone-${ro_sb_uuid}_snat | tr -d \")) > >> + > >> +echo "snat_zone: ${snat_zone}" > >> + > >> +as hv1 ovs-ofctl dump-flows br-int > offlows_pre > >> +AT_CAPTURE_FILE([offlows_pre]) > >> +# We should have a flow in table 33 that transitions from the ingress > >> pipeline > >> +# to the egress pipeline of gw_router. > >> +AT_CHECK_UNQUOTED([grep -c > >> "table=33.*metadata=${ro_meta}.*load:${snat_zone}->NXM_NX_REG12[]" > >> offlows_pre], [0], [dnl > >> +1 > >> +]) > >> + > >> +# We should have a flow in table 65 that transitions from the egress > >> pipeline > >> +# of sw0 to the ingress pipeline of gw_router. > >> +AT_CHECK_UNQUOTED([grep -c > >> "table=65.*metadata=${sw_meta}.*load:${snat_zone}->NXM_NX_REG12[]" > >> offlows_pre], [0], [dnl > >> +1 > >> +]) > >> + > >> +ovn-nbctl --wait=hv set Logical_Router gw_router options:snat-ct-zone=666 > >> + > >> +as hv1 > >> +snat_zone=$(ovs-vsctl get bridge br-int > >> external-ids:ct-zone-${ro_sb_uuid}_snat | tr -d \") > >> + > >> +echo "snat_zone: ${snat_zone}" > >> + > >> +AT_CHECK([test "${snat_zone}" = "666"], [0], []) > >> + > >> +as hv1 ovs-ofctl dump-flows br-int > offlows_post > >> +AT_CAPTURE_FILE([offlows_post]) > >> +# We should have a flow in table 33 that transitions from the ingress > >> pipeline > >> +# to the egress pipeline of gw_router. > >> +AT_CHECK_UNQUOTED([grep -c > >> "table=33.*metadata=${ro_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], > >> [0], [dnl > >> +1 > >> +]) > >> + > >> +# We should have a flow in table 65 that transitions from the egress > >> pipeline > >> +# of sw0 to the ingress pipeline of gw_router. > >> +AT_CHECK_UNQUOTED([grep -c > >> "table=65.*metadata=${sw_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], > >> [0], [dnl > >> +1 > >> +]) > >> + > >> +OVN_CLEANUP([hv1]) > >> +AT_CLEANUP > >> -- > >> 2.25.4 > >> > >> _______________________________________________ > >> dev mailing list > >> d...@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev