On Wed, Nov 5, 2025 at 11:05 PM Rosemarie O'Riorden <[email protected]> wrote:
> > > On 11/5/25 2:50 PM, Ilya Maximets wrote: > > On 11/5/25 7:15 PM, Rosemarie O'Riorden wrote: > >> Hi Ales, thanks so much for the review! > >> > >> I agree with all of the changes you suggest. I thought checking the > >> transaction count was necessary because there are two cases where it > >> could be wrong, but it seems the test consistently fails at checking the > >> number of tunnels in both of those cases, so it should be fine without. > >> > >> Thanks again! > >> > >> > >> On 11/5/25 6:31 AM, Ales Musil wrote: > >>> > >>> > >>> On Wed, Nov 5, 2025 at 12:57 AM Rosemarie O'Riorden via dev <ovs- > >>> [email protected] <mailto:[email protected]>> wrote: > >>> > >>> If you set a chassis' encap ip to be the same as that of another > >>> chassis, OVN should detect it, log an error, and delete the chassis > >>> record. > >>> > >>> But this was not the case; OVN would allow the transaction. > However, the > >>> commit to the southbound database would fail, because there was > already > >>> an encap record with that same ip and type (from the other > chassis). > >>> The OVS db on the problematic chassis would then rapidly increase > in > >>> size. Essentially, the controller would get stuck in a loop of > deleting > >>> and recreating the tunnels. > >>> > >>> This change makes it so when encaps are reconfigured, before being > >>> built, in chassis_build_encaps(), the controller detects if there > is > >>> already an existing encap tunnel of the same ip and type. If so, > and > >>> there are no good (unique) encaps configured to use, only bad ones > (as > >>> in already used on another chassis), the controller logs an error, > >>> refuses to build the encaps, and deletes the associated chassis and > >>> chassis private record. If there are both good encaps IPs and bad > ones > >>> configured, the good ones will be used to build encaps and the bad > ones > >>> will be ignored. > >>> > >>> In chassis_build_encaps(), only if an Encap record does not already > >>> exist will a new record be built. And options will only be updated > if > >>> they've changed. > >>> > >>> Another change this patch introduces is if there's currently a > >>> southbound transaction in flight, encaps_run() will not run. This > is to > >>> maintain symmetry with chassis_run()'s behavior. This change > prevents an > >>> inconsistency where the controller saw ports in the local OVS db > but not > >>> in sb-db, and then deleted and recreated them. > >>> > >>> Reported-at: https://issues.redhat.com/browse/FDP-1481 <https:// > >>> issues.redhat.com/browse/FDP-1481> > >>> > >>> > >>> nit: This should be https://issues.redhat.com/browse/FDP-2013 > <https:// > >>> issues.redhat.com/browse/FDP-2013> > >>> > >>> > >>> > >>> Co-authored-by: Nicholas Hubbard <[email protected] > >>> <mailto:[email protected]>> > >>> Signed-off-by: Nicholas Hubbard <[email protected] > >>> <mailto:[email protected]>> > >>> Signed-off-by: Rosemarie O'Riorden <[email protected] > >>> <mailto:[email protected]>> > >>> --- > >>> > >>> > >>> Hi Rosemarie, > >>> > >>> thank you for the patch, I have a couple of comments down below. > >>> > >>> > >>> v4: > >>> - Fix bug Jacob discovered where ovn-controller got stuck in a > loop. > >>> -> Change chassis_build_encaps() to not always recreate all > >>> encaps, but > >>> instead, only modify records if they've changed, and create > new > >>> records if > >>> there are not existing ones. > >>> -> Document above change with comments. > >>> -> Also add a comment to explain how good IPs will be used even > if bad > >>> ones are configured. > >>> -> Make encap_lookup_by_ip_and_type() not return const, so encap > >>> recs can be > >>> modified > >>> -> Add ovnsb_idl_txn != NULL as a requirement for encaps_run() > to run > >>> to have symmetry with chassis_build_encaps() so controller > won't > >>> unnecessarily delete and recreate tunnels. > >>> - Change test in many ways > >>> -> Update name. > >>> -> Update method of verifying success, check # of transactions > >>> with jsonrpc > >>> debug logging. > >>> -> Add the case where there are good and bad IPs. > >>> -> Check that the warning I added appears. > >>> - Update commit message to reflect all changes above. > >>> > >>> v3: > >>> - Update misleading comment > >>> - Add deletion of chassis private record > >>> - Move declaration function name to same line as return type > >>> - Delete "Transaction causes multiple rows" error from exceptions > of > >>> affected > >>> test > >>> - Update my own error to be more specific in exceptions of > affected test > >>> - Update test to not be racy; Check for db row counts instead of > >>> saving file > >>> size after change > >>> - Update commit message to reflect changes > >>> > >>> v2: Fix typo in title. > >>> --- > >>> controller/chassis.c | 133 > +++++++++++++++++++++++++----------- > >>> controller/chassis.h | 3 +- > >>> controller/encaps.c | 3 +- > >>> controller/encaps.h | 1 + > >>> controller/ovn-controller.c | 7 +- > >>> lib/chassis-index.c | 20 ++++++ > >>> lib/chassis-index.h | 3 + > >>> tests/ovn-controller.at <http://ovn-controller.at> | 65 > ++++++ > >>> ++++++++++-- > >>> 8 files changed, 186 insertions(+), 49 deletions(-) > >>> > >>> diff --git a/controller/chassis.c b/controller/chassis.c > >>> index 07bef1c56..2b6bfc247 100644 > >>> --- a/controller/chassis.c > >>> +++ b/controller/chassis.c > >>> @@ -77,6 +77,12 @@ struct ovs_chassis_cfg { > >>> bool ct_label_flush; > >>> }; > >>> > >>> +enum chassis_update_status { > >>> + CHASSIS_UPDATED, > >>> + CHASSIS_NOT_UPDATED, > >>> + CHASSIS_NEED_DELETE > >>> +}; > >>> + > >>> static void > >>> ovs_chassis_cfg_init(struct ovs_chassis_cfg *cfg) > >>> { > >>> @@ -699,7 +705,8 @@ chassis_build_encaps(struct ovsdb_idl_txn > >>> *ovnsb_idl_txn, > >>> const char *encap_ip_default, > >>> const char *chassis_id, > >>> const char *encap_csum, > >>> - size_t *n_encap) > >>> + size_t *n_encap, > >>> + struct ovsdb_idl_index *sbrec_encaps) > >>> { > >>> size_t tunnel_count = 0; > >>> > >>> @@ -713,20 +720,45 @@ chassis_build_encaps(struct ovsdb_idl_txn > >>> *ovnsb_idl_txn, > >>> > >>> SSET_FOR_EACH (encap_ip, encap_ip_set) { > >>> SSET_FOR_EACH (encap_type, encap_type_set) { > >>> - struct sbrec_encap *encap = > >>> sbrec_encap_insert(ovnsb_idl_txn); > >>> - > >>> - sbrec_encap_set_type(encap, encap_type); > >>> - sbrec_encap_set_ip(encap, encap_ip); > >>> - if (encap_ip_default && !strcmp(encap_ip_default, > >>> encap_ip)) { > >>> - struct smap _options; > >>> - smap_clone(&_options, &options); > >>> - smap_add(&_options, "is_default", "true"); > >>> - sbrec_encap_set_options(encap, &_options); > >>> - smap_destroy(&_options); > >>> + struct sbrec_encap *encap_rec = > >>> + encap_lookup_by_ip_and_type(sbrec_encaps, > >>> + encap_ip, encap_type); > >>> + if (encap_rec && strcmp(encap_rec->chassis_name, > >>> chassis_id)) { > >>> + static struct vlog_rate_limit rl = > >>> VLOG_RATE_LIMIT_INIT(5, 1); > >>> + VLOG_WARN_RL(&rl, "'%s' already has encap ip '%s' > and " > >>> + "type '%s', cannot duplicate on > '%s'", > >>> + encap_rec->chassis_name, > encap_rec->ip, > >>> + encap_rec->type, chassis_id); > >>> + > >>> + /* Build encaps for valid IPs even if bad IPs are > >>> + * configured. */ > >>> + continue; > >>> + } > >>> + > >>> + /* Only if an Encap record does not already exist do > we > >>> create a > >>> + * new record. */ > >>> + struct sbrec_encap *encap; > >>> > >>> > >>> No need to add a new variable, the encap_rec can be reused. > >>> > >>> > >>> + if (!encap_rec) { > >>> + encap = sbrec_encap_insert(ovnsb_idl_txn); > >>> + sbrec_encap_set_type(encap, encap_type); > >>> + sbrec_encap_set_ip(encap, encap_ip); > >>> + sbrec_encap_set_chassis_name(encap, chassis_id); > >>> } else { > >>> - sbrec_encap_set_options(encap, &options); > >>> + encap = encap_rec; > >>> + } > >>> + > >>> + if (!smap_is_empty(&encap->options) || > >>> + !smap_equal(&options, &encap->options)) { > >>> + if (encap_ip_default && !strcmp(encap_ip_default, > >>> encap_ip)) { > >>> + struct smap _options; > >>> + smap_clone(&_options, &options); > >>> + smap_add(&_options, "is_default", "true"); > >>> + sbrec_encap_set_options(encap, &_options); > >>> + smap_destroy(&_options); > >>> + } else { > >>> + sbrec_encap_set_options(encap, &options); > >>> + } > >>> } > >>> - sbrec_encap_set_chassis_name(encap, chassis_id); > >>> > >>> encaps[tunnel_count] = encap; > >>> tunnel_count++; > >>> @@ -778,7 +810,7 @@ update_supported_sset(struct sset *supported) > >>> > >>> static void > >>> remove_unsupported_options(const struct sbrec_chassis > *chassis_rec, > >>> - bool *updated) > >>> + enum chassis_update_status > *update_status) > >>> { > >>> struct sset supported_options = > >>> SSET_INITIALIZER(&supported_options); > >>> update_supported_sset(&supported_options); > >>> @@ -789,7 +821,7 @@ remove_unsupported_options(const struct > >>> sbrec_chassis *chassis_rec, > >>> VLOG_WARN("Removing unsupported key \"%s\" from > chassis > >>> record.", > >>> node->key); > >>> sbrec_chassis_update_other_config_delkey(chassis_rec, > >>> node->key); > >>> - *updated = true; > >>> + *update_status = CHASSIS_UPDATED; > >>> } > >>> } > >>> > >>> @@ -827,25 +859,28 @@ chassis_get_record(struct ovsdb_idl_txn > >>> *ovnsb_idl_txn, > >>> } > >>> > >>> /* Update a Chassis record based on the config in the ovs config. > >>> - * Returns true if 'chassis_rec' was updated, false otherwise. > >>> + * Returns CHASSIS_UPDATED if 'chassis_rec' was updated, > >>> CHASSIS_NEED_DELETE if > >>> + * another chassis exists with an encap with same IP and type as > >>> 'chassis', and > >>> + * returns CHASSIS_NOT_UPDATED otherwise. > >>> */ > >>> -static bool > >>> +static enum chassis_update_status > >>> chassis_update(const struct sbrec_chassis *chassis_rec, > >>> struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> const struct ovs_chassis_cfg *ovs_cfg, > >>> const char *chassis_id, > >>> - const struct sset *transport_zones) > >>> + const struct sset *transport_zones, > >>> + struct ovsdb_idl_index *sbrec_encaps) > >>> { > >>> - bool updated = false; > >>> + enum chassis_update_status update_status = > CHASSIS_NOT_UPDATED; > >>> > >>> if (strcmp(chassis_id, chassis_rec->name)) { > >>> sbrec_chassis_set_name(chassis_rec, chassis_id); > >>> - updated = true; > >>> + update_status = CHASSIS_UPDATED; > >>> } > >>> > >>> if (strcmp(ovs_cfg->hostname, chassis_rec->hostname)) { > >>> sbrec_chassis_set_hostname(chassis_rec, > ovs_cfg->hostname); > >>> - updated = true; > >>> + update_status = CHASSIS_UPDATED; > >>> } > >>> > >>> if (chassis_other_config_changed(ovs_cfg, chassis_rec)) { > >>> @@ -856,12 +891,12 @@ chassis_update(const struct sbrec_chassis > >>> *chassis_rec, > >>> sbrec_chassis_verify_other_config(chassis_rec); > >>> sbrec_chassis_set_other_config(chassis_rec, > &other_config); > >>> smap_destroy(&other_config); > >>> - updated = true; > >>> + update_status = CHASSIS_UPDATED; > >>> } > >>> > >>> update_chassis_transport_zones(transport_zones, chassis_rec); > >>> > >>> - remove_unsupported_options(chassis_rec, &updated); > >>> + remove_unsupported_options(chassis_rec, &update_status); > >>> > >>> /* If any of the encaps should change, update them. */ > >>> bool tunnels_changed = > >>> @@ -871,7 +906,7 @@ chassis_update(const struct sbrec_chassis > >>> *chassis_rec, > >>> ovs_cfg->encap_csum, > >>> chassis_rec); > >>> if (!tunnels_changed) { > >>> - return updated; > >>> + return update_status; > >>> } > >>> > >>> struct sbrec_encap **encaps; > >>> @@ -881,10 +916,16 @@ chassis_update(const struct sbrec_chassis > >>> *chassis_rec, > >>> chassis_build_encaps(ovnsb_idl_txn, > &ovs_cfg->encap_type_set, > >>> &ovs_cfg->encap_ip_set, > >>> ovs_cfg->encap_ip_default, > chassis_id, > >>> - ovs_cfg->encap_csum, &n_encap); > >>> + ovs_cfg->encap_csum, &n_encap, > >>> + sbrec_encaps); > >>> sbrec_chassis_set_encaps(chassis_rec, encaps, n_encap); > >>> free(encaps); > >>> - return true; > >>> + > >>> + if (!n_encap) { > >>> + return CHASSIS_NEED_DELETE; > >>> + } > >>> + > >>> + return CHASSIS_UPDATED; > >>> } > >>> > >>> /* If this is a chassis_private config update after we initialized > >>> the record > >>> @@ -935,7 +976,8 @@ chassis_run(struct ovsdb_idl_txn > *ovnsb_idl_txn, > >>> const char *chassis_id, > >>> const struct ovsrec_bridge *br_int, > >>> const struct sset *transport_zones, > >>> - const struct sbrec_chassis_private **chassis_private) > >>> + const struct sbrec_chassis_private **chassis_private, > >>> + struct ovsdb_idl_index *sbrec_encaps) > >>> { > >>> struct ovs_chassis_cfg ovs_cfg; > >>> > >>> @@ -956,22 +998,31 @@ chassis_run(struct ovsdb_idl_txn > *ovnsb_idl_txn, > >>> * modified in the ovs table. > >>> */ > >>> if (chassis_rec && ovnsb_idl_txn) { > >>> - bool updated = chassis_update(chassis_rec, ovnsb_idl_txn, > >>> &ovs_cfg, > >>> - chassis_id, > transport_zones); > >>> - > >>> - if (!existed || updated) { > >>> - ovsdb_idl_txn_add_comment(ovnsb_idl_txn, > >>> - "ovn-controller: %s chassis > >>> '%s'", > >>> - !existed ? "registering" : > >>> "updating", > >>> - chassis_id); > >>> - } > >>> + enum chassis_update_status update_status = > >>> + chassis_update(chassis_rec, ovnsb_idl_txn, &ovs_cfg, > >>> + chassis_id, > >>> + transport_zones, sbrec_encaps); > >>> + > >>> + *chassis_private = > chassis_private_get_record(ovnsb_idl_txn, > >>> + sbrec_chassis_private_by_name, > >>> chassis_id); > >>> > >>> - *chassis_private = > >>> - chassis_private_get_record(ovnsb_idl_txn, > >>> - > sbrec_chassis_private_by_name, > >>> + if (update_status == CHASSIS_NEED_DELETE) { > >>> + sbrec_chassis_delete(chassis_rec); > >>> + chassis_rec = NULL; > >>> + if (*chassis_private) { > >>> + sbrec_chassis_private_delete(*chassis_private); > >>> + } > >>> > >>> > >>> nit: We can return early here instead of nesting. > >>> > >>> > >>> + } else { > >>> + if (!existed || update_status == CHASSIS_UPDATED) { > >>> + ovsdb_idl_txn_add_comment(ovnsb_idl_txn, > >>> + "ovn-controller: %s > >>> chassis '%s'", > >>> + !existed ? "registering" > >>> + : "updating", > chassis_id); > >>> + } > >>> + if (*chassis_private) { > >>> + chassis_private_update(*chassis_private, > chassis_rec, > >>> chassis_id); > >>> - if (*chassis_private) { > >>> - chassis_private_update(*chassis_private, chassis_rec, > >>> chassis_id); > >>> + } > >>> } > >>> } > >>> > >>> diff --git a/controller/chassis.h b/controller/chassis.h > >>> index 03cc2f906..14251bdad 100644 > >>> --- a/controller/chassis.h > >>> +++ b/controller/chassis.h > >>> @@ -44,7 +44,8 @@ const struct sbrec_chassis *chassis_run( > >>> const struct ovsrec_open_vswitch_table *, > >>> const char *chassis_id, const struct ovsrec_bridge *br_int, > >>> const struct sset *transport_zones, > >>> - const struct sbrec_chassis_private **chassis_private); > >>> + const struct sbrec_chassis_private **chassis_private, > >>> + struct ovsdb_idl_index *sbrec_chassis_encaps); > >>> bool chassis_cleanup(struct ovsdb_idl_txn *ovs_idl_txn, > >>> struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> const struct ovsrec_open_vswitch_table *, > >>> diff --git a/controller/encaps.c b/controller/encaps.c > >>> index 67f8c9c9c..fd34a8ea0 100644 > >>> --- a/controller/encaps.c > >>> +++ b/controller/encaps.c > >>> @@ -549,6 +549,7 @@ create_evpn_tunnels(struct tunnel_ctx *tc) > >>> > >>> void > >>> encaps_run(struct ovsdb_idl_txn *ovs_idl_txn, > >>> + struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> const struct ovsrec_bridge *br_int, > >>> const struct sbrec_chassis_table *chassis_table, > >>> const struct sbrec_chassis *this_chassis, > >>> @@ -557,7 +558,7 @@ encaps_run(struct ovsdb_idl_txn *ovs_idl_txn, > >>> const struct sset *transport_zones, > >>> const struct ovsrec_bridge_table *bridge_table) > >>> { > >>> - if (!ovs_idl_txn || !br_int) { > >>> + if (!ovs_idl_txn || !ovnsb_idl_txn || !br_int) { > >>> return; > >>> } > >>> > >>> diff --git a/controller/encaps.h b/controller/encaps.h > >>> index 6f9891ee5..3f4a82d68 100644 > >>> --- a/controller/encaps.h > >>> +++ b/controller/encaps.h > >>> @@ -30,6 +30,7 @@ struct sset; > >>> > >>> void encaps_register_ovs_idl(struct ovsdb_idl *); > >>> void encaps_run(struct ovsdb_idl_txn *ovs_idl_txn, > >>> + struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> const struct ovsrec_bridge *br_int, > >>> const struct sbrec_chassis_table *, > >>> const struct sbrec_chassis *, > >>> diff --git a/controller/ovn-controller.c > b/controller/ovn-controller.c > >>> index c2dab41c1..6ef32d6d5 100644 > >>> --- a/controller/ovn-controller.c > >>> +++ b/controller/ovn-controller.c > >>> @@ -6728,6 +6728,9 @@ main(int argc, char *argv[]) > >>> struct ovsdb_idl_index > *sbrec_advertised_mac_binding_index_by_dp > >>> = ovsdb_idl_index_create1(ovnsb_idl_loop.idl, > >>> > >>> &sbrec_advertised_mac_binding_col_datapath); > >>> + struct ovsdb_idl_index *sbrec_encaps > >>> > >>> > >>> nit: We should use the naming convention as for other indexes: > >>> "sbrec_encaps_index_by_ip_and_type". > >>> > >>> + = ovsdb_idl_index_create2(ovnsb_idl_loop.idl, > >>> + &sbrec_encap_col_type, > >>> &sbrec_encap_col_ip); > >>> > >>> ovsdb_idl_track_add_all(ovnsb_idl_loop.idl); > >>> ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, > >>> @@ -7450,7 +7453,7 @@ main(int argc, char *argv[]) > >>> > sbrec_chassis_private_by_name, > >>> ovs_table, chassis_id, > >>> br_int, &transport_zones, > >>> - &chassis_private); > >>> + &chassis_private, > sbrec_encaps); > >>> } > >>> > >>> /* If any OVS feature support changed, force a full > >>> recompute. > >>> @@ -7489,7 +7492,7 @@ main(int argc, char *argv[]) > >>> } > >>> > >>> if (chassis && ovs_feature_set_discovered()) { > >>> - encaps_run(ovs_idl_txn, br_int, > >>> + encaps_run(ovs_idl_txn, ovnsb_idl_txn, br_int, > >>> > >>> sbrec_chassis_table_get(ovnsb_idl_loop.idl), > >>> chassis, > >>> > >>> sbrec_sb_global_first(ovnsb_idl_loop.idl), > >>> diff --git a/lib/chassis-index.c b/lib/chassis-index.c > >>> index 4b38036cb..c1e0234e3 100644 > >>> --- a/lib/chassis-index.c > >>> +++ b/lib/chassis-index.c > >>> @@ -115,3 +115,23 @@ ha_chassis_group_lookup_by_name( > >>> > >>> return retval; > >>> } > >>> + > >>> +/* Finds and returns the encap with the given ip and type, or NULL > >>> if no such > >>> + * encap exists. */ > >>> +struct sbrec_encap * > >>> +encap_lookup_by_ip_and_type(struct ovsdb_idl_index > >>> *sbrec_chassis_encaps, > >>> + const char *ip, const char *type) > >>> +{ > >>> + struct sbrec_encap *target = sbrec_encap_index_init_row( > >>> + sbrec_chassis_encaps); > >>> + > >>> + sbrec_encap_index_set_ip(target, ip); > >>> + sbrec_encap_index_set_type(target, type); > >>> + > >>> + struct sbrec_encap *retval = sbrec_encap_index_find( > >>> + sbrec_chassis_encaps, target); > >>> + > >>> + sbrec_encap_index_destroy_row(target); > >>> + > >>> + return retval; > >>> +} > >>> diff --git a/lib/chassis-index.h b/lib/chassis-index.h > >>> index bc654da13..3c88ae631 100644 > >>> --- a/lib/chassis-index.h > >>> +++ b/lib/chassis-index.h > >>> @@ -35,5 +35,8 @@ chassis_private_lookup_by_name( > >>> struct ovsdb_idl_index *ha_chassis_group_index_create(struct > >>> ovsdb_idl *idl); > >>> const struct sbrec_ha_chassis_group > *ha_chassis_group_lookup_by_name( > >>> struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const > >>> char *name); > >>> +struct sbrec_encap *encap_lookup_by_ip_and_type( > >>> + struct ovsdb_idl_index *sbrec_chassis_encaps, > >>> + const char *ip, const char *type); > >>> > >>> #endif /* lib/chassis-index.h */ > >>> diff --git a/tests/ovn-controller.at <http://ovn-controller.at> b/ > >>> tests/ovn-controller.at <http://ovn-controller.at> > >>> index bbf939d25..9842bd976 100644 > >>> --- a/tests/ovn-controller.at <http://ovn-controller.at> > >>> +++ b/tests/ovn-controller.at <http://ovn-controller.at> > >>> @@ -277,7 +277,8 @@ ovs-vsctl -- set Open_vSwitch . external- > >>> ids:hostname="${new_sysid}" \ > >>> -- set Open_vSwitch . external-ids:ovn- > >>> remote="${current_remote}" > >>> > >>> OVS_WAIT_UNTIL([ > >>> - grep -q 'Transaction causes multiple rows in \\"Encap\\" table > >>> to have identical values' hv/ovn-controller.log > >>> + grep -q "'hv' already has encap ip '192.168.0.1' and type > >>> 'geneve', " \ > >>> + "cannot duplicate on 'hv-foo'" hv/ovn-controller.log > >>> ]) > > Since Ilya pointed out "geneve" could also be "vxlan", should we replace > this with: > > +grep -qE "'hv' already has encap ip '192.168.0.1' and type" \ > + "'(geneve|vxlan)', cannot duplicate on 'hv-foo'" \ > + hv/ovn-controller.log > > This doesn't work in the other case though, so I suppose the type could > be removed entirely from the error message. I figured it was best to > include as much information as possible, but if it's up to the order in > the hashmap that doesn't seem very important... > > >>> > >>> # Destroy the stale entries manually and ovn-controller should now > >>> be able > >>> @@ -305,12 +306,12 @@ check ovn-sbctl destroy chassis_private . -- > >>> destroy chassis . > >>> # - could not create datapath br-int of unknown type foobar > >>> # - unknown datapath type bar > >>> # - could not create datapath br-int of unknown type bar > >>> -# - Transaction causes multiple rows ... > >>> # - failed to create bridge br-int: Address family not supported > by > >>> protocol # due to previous errors. > >>> +# - 'hv' already has encap ip '192.168.0.1' and type 'geneve', > >>> cannot duplicate on 'hv-foo' > >>> OVN_CLEANUP_SBOX([hv], ["/foo/d > >>> /bar/d > >>> -/Transaction causes multiple rows/d > >>> -/failed to create bridge/d"]) > >>> +/failed to create bridge/d > >>> +/already has encap ip.*cannot duplicate on/d"]) > >>> > >>> > >>> nit: We shouldn't change the order. > >>> > >>> > >>> > >>> OVN_CLEANUP_VSWITCH([main]) > >>> as ovn-sb > >>> @@ -3788,3 +3789,59 @@ wait_ports 0 > >>> OVN_CLEANUP([hv1 > >>> /Invalid VXLAN port number.*/d]) > >>> AT_CLEANUP > >>> + > >>> +OVN_FOR_EACH_NORTHD([ > >>> +AT_SETUP([ovn-controller - two chassis set same encap ip and > type]) > >>> +AT_KEYWORDS([ovn]) > >>> +ovn_start > >>> +net_add n1 > >>> + > >>> +sim_add hv1 > >>> +as hv1 > >>> +check ovs-vsctl add-br br-phys > >>> +ovn_attach n1 br-phys 192.168.0.11 > >>> + > >>> +sim_add hv2 > >>> +as hv2 > >>> +check ovs-vsctl add-br br-phys > >>> +ovn_attach n1 br-phys 192.168.0.12 > >>> + > >>> +# Wait for tunnels to appear in OVS, encaps to appear in sb-db, > and set > >>> +# controller's jsonrpc debug logging. > >>> +as hv2 > >>> +OVS_WAIT_UNTIL([test 1 = $(ovs-vsctl find Interface type=geneve | > >>> grep -c _uuid)]) > >>> > >>> > >>> nit: We can use OVS_WAIT_FOR_OUTPUT instead. > >>> > >>> > >>> +wait_row_count Encap 4 > >>> +as hv1 > >>> +OVS_WAIT_UNTIL([test 1 = $(ovs-vsctl find Interface type=geneve | > >>> grep -c _uuid)]) > >>> +wait_row_count Encap 4 > >>> +ovn-appctl vlog/set jsonrpc:dbg > >>> +as hv2 > >>> +ovn-appctl vlog/set jsonrpc:dbg > >>> + > >>> +# Give hv2 the same encap IP as hv1 (.11) with other valid IPs. > >>> +check ovs-vsctl set Open_vSwitch . \ > >>> + external-ids:ovn-encap- > >>> ip="192.168.0.10,192.168.0.11,192.168.0.12,192.168.0.13" > >>> + > >>> +as hv1 wait_row_count Encap 2 chassis_name="hv1" > >>> + > >>> +# Make sure tunnels are created for the other three valid IPs > >>> configured. > >>> +as hv2 > >>> +OVS_WAIT_UNTIL([test 3 = $(ovs-vsctl find Interface type=geneve | > >>> grep -c _uuid)]) > >>> +wait_row_count Encap 6 chassis_name="hv2" > >>> +OVS_WAIT_UNTIL([check test 1 = $(cat hv2/ovn-controller.log | > >>> + grep -c "'hv1' already has encap ip '192.168.0.11' > >>> and type " > >>> + "'geneve', cannot duplicate on 'hv2'")]) > > > > This check fails with different compiler flags. Depending on the > > order inside the hash map, we'll get either geneve or vxlan here. > > Again, either remove the type from the warning altogether, or we could > match on only the beginning of the warning. > > > > >>> > >>> + > >>> +# Make sure controller logs have correct number of transactions. > >>> +OVS_WAIT_UNTIL([check test 2 = $(cat hv2/ovn-controller.log | grep > >>> transact | wc -l)]) > >>> +OVS_WAIT_UNTIL([check test 1 = $(cat hv1/ovn-controller.log | grep > >>> transact | wc -l)]) > >>> > >>> > >>> I don't think the transaction count is helping TBF. IMO we can > >>> completely skip it. > > > > We can't really skipt it. For exmaple, if you make the follwoing change > in > > the code on top of the suggested follow up: > > > > diff --git a/controller/chassis.c b/controller/chassis.c > > index 4afb6da93..8168a9e82 100644 > > --- a/controller/chassis.c > > +++ b/controller/chassis.c > > @@ -737,12 +737,12 @@ chassis_build_encaps(struct ovsdb_idl_txn > *ovnsb_idl_txn, > > > > /* Only if an Encap record does not already exist do we > create a > > * new record. */ > > - if (!encap_rec) { > > + //if (!encap_rec) { > > encap_rec = sbrec_encap_insert(ovnsb_idl_txn); > > sbrec_encap_set_type(encap_rec, encap_type); > > sbrec_encap_set_ip(encap_rec, encap_ip); > > sbrec_encap_set_chassis_name(encap_rec, chassis_id); > > - } > > + //} > > > > if (!smap_is_empty(&encap_rec->options) || > > !smap_equal(&options, &encap_rec->options)) { > > --- > > > > The test is still passing, but in the short time while the test is > ruuning, > > ovn-controller manages to send hundreds of transactions towards Sb, > re-creating > > all the encaps from scratch on every iteration. Which is one of the > issues this > > patch is trying to fix, beside the transations sent towards OVS database. > > Ah yes, I was trying to figure out how to make the test pass wrongly and > this is it. > > I thought removing the check for !ovnsb_idl_txn from encaps_run() would > let it pass with extra transactions but it still failed at the check for > # of tunnels. > > Yes we should keep the transaction count. > > > > > Best regards, Ilya Maximets. > > > > -- > Rosemarie O'Riorden > Lowell, MA, United States > [email protected] > > Thank you Rosemarie and Ilya, yeah that makes sense I have applied the diff, added back the transaction check and made the warning check typeless, in the end it doesn't matter what type of the tunnel it is. I went ahead and merged this into main and backportred all the way down to 24.03. Regards, Ales _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
