On 10/24/22 04:43, Ales Musil wrote:
On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <amu...@redhat.com
<mailto:amu...@redhat.com>> wrote:
Hi Mark,
please see my comment below.
Thanks for the review.
On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <mmich...@redhat.com
<mailto:mmich...@redhat.com>> wrote:
An issue was filed where CT zone 0 was assigned to both a
logical router
SNAT and to a logical port. CT zone 0 is typically "reserved"
and not
assigned by ovn-controller; however, since SNAT zones are
configurable,
it is possible for ovn-controller to assign this zone at the CMS's
request. This accounts for how CT zone 0 can be assigned for
SNAT. There
was also a small bug in the incremental processing that could
result in
a logical port being assigned zone 0.
In the specific issue report, CT zones were restored from the
OVSDB when
ovn-controller started, and the conflicting CT zones were already
present. ovn-controller dutifully loaded these zones up. But
then there
was nothing that would allow for the conflict to be resolved
afterwards.
It is unknown how these conflicts entered into the OVSDB in the
first
place. This change does not purport to prevent conflicts from
entering
the OVSDB. However, it does make the following changes that should
further safeguard against unwanted behavior:
* ct_zones_runtime_data_handler() now assigns zones starting at
1 instead of 0. This makes it use the same range as
update_ct_zones().
* update_ct_zones() now keeps a new simap for zones that are
assigned
but not due to an SNAT zone request. This allows for us to
guarantee
that when there is a conflict between a previously
auto-assigned CT
zone and a newly-requested zone, we are guaranteed to remove the
auto-assigned CT zone.
* The removal of conflicting auto-assigned CT zones is now performed
before dealing with the newly requested zone. This makes it
so that if
we load conflicting zones from the OVSDB or if there is some
issue
that results in conflicting zones being assigned, we will
correct the
issue.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
<https://bugzilla.redhat.com/show_bug.cgi?id=2126406>
Signed-off-by: Mark Michelson <mmich...@redhat.com
<mailto:mmich...@redhat.com>>
---
controller/ovn-controller.c | 53
++++++++++++++++++-------------------
1 file changed, 26 insertions(+), 27 deletions(-)
diff --git a/controller/ovn-controller.c
b/controller/ovn-controller.c
index 8895c7a2b..5fd9bbc40 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -659,7 +659,8 @@ update_ct_zones(const struct shash
*binding_lports,
const char *user;
struct sset all_users = SSET_INITIALIZER(&all_users);
struct simap req_snat_zones =
SIMAP_INITIALIZER(&req_snat_zones);
- unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
+ unsigned long
unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
+ struct simap unreq_snat_zones =
SIMAP_INITIALIZER(&unreq_snat_zones);
struct shash_node *shash_node;
SHASH_FOR_EACH (shash_node, binding_lports) {
@@ -696,49 +697,46 @@ update_ct_zones(const struct shash
*binding_lports,
bitmap_set0(ct_zone_bitmap, ct_zone->data);
simap_delete(ct_zones, ct_zone);
} else if (!simap_find(&req_snat_zones, ct_zone->name)) {
- bitmap_set1(unreq_snat_zones, ct_zone->data);
+ bitmap_set1(unreq_snat_zones_map, ct_zone->data);
+ simap_put(&unreq_snat_zones, ct_zone->name,
ct_zone->data);
}
}
/* Prioritize requested CT zones */
struct simap_node *snat_req_node;
SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
- struct simap_node *node = simap_find(ct_zones,
snat_req_node->name);
- if (node) {
- if (node->data == snat_req_node->data) {
- /* No change to this request, so no action
needed */
- continue;
- } else {
- /* Zone request has changed for this node.
delete old entry */
- bitmap_set0(ct_zone_bitmap, node->data);
- simap_delete(ct_zones, node);
- }
- }
-
/* Determine if someone already had this zone
auto-assigned.
* If so, then they need to give up their assignment since
* that zone is being explicitly requested now.
*/
- if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
- struct simap_node *dup;
- SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
- if (dup != snat_req_node && dup->data ==
snat_req_node->data) {
- simap_delete(ct_zones, dup);
+ if (bitmap_is_set(unreq_snat_zones_map,
snat_req_node->data)) {
+ struct simap_node *unreq_node;
+ SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
+ if (unreq_node->data == snat_req_node->data) {
break;
}
}
This does not seem right, considering the scenario when we have two
zones A=0 and B=0 and at the same time requesting SNAT=0,
the B wouldn't be cleared and there would be conflict. Which makes
me wonder if it's not the scenario of how we got into the same zone
scenario in the first place?
Right now, the only way I know how to make that happen is by
purposefully setting those zones in the OVSDB. You're correct that if
the OVSDB already had A=0 and B=0, then neither the original code nor
the patched code will correct the issue.
IMO it would make more sense to run the
simap_find_and_delete(ct_zones, unreq_node->name); in the loop
without break, to really remove anything that has the same id.
Having this patch applied the could just run recompute without
worrying how it got into this situation in the first place and it
should fix it. WDYT?
That's a good idea, since this will address the scenario you have
outlined above. I had a bit of concern about "wasted" cycles, but it's
probably better to be safe here, especially since you have demonstrated
a scenario where this patched code will not resolve the issue.
+ ovs_assert(unreq_node);
+
+ simap_find_and_delete(ct_zones, unreq_node->name);
+
/* Set this bit to 0 so that if multiple datapaths
have requested
* this zone, we don't needlessly double-detect
this condition.
*/
- bitmap_set0(unreq_snat_zones, snat_req_node->data);
+ bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
}
- add_pending_ct_zone_entry(pending_ct_zones,
CT_ZONE_OF_QUEUED,
- snat_req_node->data, true,
- snat_req_node->name);
-
- bitmap_set1(ct_zone_bitmap, snat_req_node->data);
- simap_put(ct_zones, snat_req_node->name,
snat_req_node->data);
+ struct simap_node *node = simap_find(ct_zones,
snat_req_node->name);
+ if (node && node->data != snat_req_node->data) {
+ /* Zone request has changed for this node. delete
old entry and
+ * create new one*/
+ add_pending_ct_zone_entry(pending_ct_zones,
CT_ZONE_OF_QUEUED,
+ snat_req_node->data, true,
+ snat_req_node->name);
+ bitmap_set0(ct_zone_bitmap, node->data);
+ bitmap_set1(ct_zone_bitmap, snat_req_node->data);
+ node->data = snat_req_node->data;
+ }
}
/* xxx This is wasteful to assign a zone to each
port--even if no
@@ -756,6 +754,7 @@ update_ct_zones(const struct shash
*binding_lports,
}
simap_destroy(&req_snat_zones);
+ simap_destroy(&unreq_snat_zones);
sset_destroy(&all_users);
}
@@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct
engine_node *node, void *data)
struct hmap *tracked_dp_bindings =
&rt_data->tracked_dp_bindings;
struct tracked_datapath *tdp;
- int scan_start = 0;
+ int scan_start = 1;
bool updated = false;
--
2.37.3
_______________________________________________
dev mailing list
d...@openvswitch.org <mailto:d...@openvswitch.org>
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
<https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
Thanks,
Ales
--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA <https://www.redhat.com>
amu...@redhat.com <mailto:amu...@redhat.com> IM: amusil
<https://red.ht/sig>
And I have also found a reproducer.
If you run "snat-ct-zone with common NAT zone" with the below diff
applied you should see
multiple zones having zone id=0. It can be also reproduced with your
patch so we probably need to adjust that loop.
diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
index f8b8db4df..6cf7d93a3 100644
--- a/tests/ovn.at <http://ovn.at>
+++ b/tests/ovn.at <http://ovn.at>
@@ -32366,6 +32366,9 @@ as hv1
check ovs-vsctl add-br br-phys
ovn_attach n1 br-phys 192.168.0.1
check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv
external-ids:iface-id=ls0-hv
+check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1
external-ids:iface-id=ls0-hv1
+check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2
external-ids:iface-id=ls0-hv2
+check ovs-vsctl set open . external_ids:ovn-monitor-all=true
check ovn-nbctl lr-add lr0
@@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0
00:00:00:00:00:01 10.0.0.1
check ovn-nbctl lsp-add ls0 ls0-hv
check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2"
+check ovn-nbctl lsp-add ls0 ls0-hv1
+check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3"
+
+check ovn-nbctl lsp-add ls0 ls0-hv2
+check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4"
+
check ovn-nbctl ls-add ext
check ovn-nbctl lsp-add ext ext-lr0
check ovn-nbctl lsp-set-type ext-lr0 router
@@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1}
check test "$dnat_zone" = "$DNAT_ZONE_REG"
check test "$snat_zone" = "$DNAT_ZONE_REG"
-check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=666
+check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0
dnat_zone=$(ovs-ofctl dump-flows br-int
table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d
'=' -f 2)
dnat_zone=${dnat_zone::-1}
@@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1}
check test "$dnat_zone" = "$SNAT_ZONE_REG"
check test "$snat_zone" = "$SNAT_ZONE_REG"
+ovs-appctl -t ovn-controller exit --restart
+
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0"
+
+start_daemon ovn-controller
+sleep 2
+
+ovn-appctl ct-zone-list
+
Thanks for the reproducer. I will add this to v2 and add a co-author
credit to you.
OVN_CLEANUP([hv1])
AT_CLEANUP
Thanks,
Ales
--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA <https://www.redhat.com>
amu...@redhat.com <mailto:amu...@redhat.com> IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev