Hi Mark, Ales, On 10/10/25 2:51 AM, Mark Michelson wrote: > On Thu, Oct 9, 2025 at 5:25 PM Mark Michelson <[email protected]> wrote: >> >> Hi Ales, >> >> I've been thinking through this change, and I think the idea might have some >> flaws. Let's consider an arbitrary engine node B that takes engine node A as >> input. In this scenario, engine node B writes to the SB DB and engine node A >> does not. With this change, we mark engine node B as SB_WRITE, but we do not >> do the same with node A. >> >> We perform an engine run where recomputation is not allowed because the SB >> DB is read only. When engine node A runs, since it does not have SB_WRITE >> specified, it is able to recompute its data. However, when engine node B >> runs, it cannot recompute its data. >> >> Now, what happens during the next engine run? >> * Let's say engine node A has no new input data. Engine node A's state will >> be EN_UNCHANGED. In this case, engine node B sees that engine node A is >> unchanged, so engine node B does not run. However, B really should run >> because in the previous engine node run, A recomputed. In this case, B's >> data is incorrect. >> * Let's say engine node A has new data and can process it incrementally. >> Engine node A's state will be EN_UPDATED. >> * Engine node B may also try to incrementally handle A's tracked data. >> However, the only tracked data that B will see is from the current engine >> run. The data that A recomputed in the previous engine run will be unknown >> to B. In this case, B's data is incorrect. >> * Engine node B may not have an input handler for A. In this case, B >> will recompute and all will be well. >> * Let's say engine node A has new data and cannot process it incrementally. >> In this case, A will recompute, and then B will also likely recompute. The >> data integrity is fine in this case. However, A has had to recompute twice. >> Before this change, A would only have recomputed on the second engine run. >> >> In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4 cases, the >> data is correct in engine node B, but A has to recompute twice. And in 1 of >> the 4 cases, the data is correct in engine node B and we successfully >> avoided an unnecessary recomputation of A. The only time this change has a >> positive effect is if the SB_WRITE node has no input handlers for its input >> nodes, and the input nodes are able to incrementally process their data. > > I've realized that in the scenario I outlined, a couple of those > outcomes can't happen. When the first engine run happens, B will fail > to recompute, and the engine run will be canceled. In ovn-controller, > if an engine run is canceled, then we call > engine_set_force_recompute_immediate(). This means that the next run > will force all nodes to recompute. This means that the second and > third outcomes (both based on A being able to incrementally process > its data) cannot happen. Also, when engine_force_recompute is true, > the state of input nodes is not considered. This means that if A is > unchanged, it does not matter. B will still recompute and will > therefore have all of A's data. So my concerns about data integrity > are not correct. > > However, the fact that the engine forces all nodes to recompute means > that A will recompute on both engine runs. This means that with the > patch, A will recompute two times across the two engine runs. Without > the patch, A only recomputes once. So I still think the patch is > hurting more than it is helping. >
Do we need to consider storing more granular, maybe per-node, information about "forced recompute". I.e., store the reason why forced recompute is needed and if the reason was "recompute because the run was cancelled" then don't recompute the nodes that actually managed to recompute in the previous run? Also, it might be good to quantify the impact of the current change. Maybe running some large scale control plane tests with ovn-heater would provide some insights. Regards, Dumitru >> >> Another consideration is that the incremental engine in northd hardcodes the >> allow_recompute argument to true when it calls engine_run(). Therefore this >> change will have no effect on the northd incremental engine. > > ovn-northd doesn't have any consideration for canceled engine runs > since it knows engine runs can't be canceled there. If it were > possible to cancel engine runs in ovn-northd, I imagine it would have > the same logic as ovn-controller. > > >> >> For the above reasons, I don't think this patch should be accepted in its >> current state. >> >> On Thu, Oct 9, 2025 at 8:47 AM Ales Musil via dev <[email protected]> >> wrote: >>> >>> The engine would cancel node recompute if it wasn't allowed, >>> typically the SB is in read-only mode. That would lead to cancel >>> of the node recompute and force recompute the next engine run, which >>> is very wasteful. Add a way to mark the engine nodes as SB write. >>> Only nodes that are marked as SB write will lead to cancel when >>> recompute is not allowed. Other nodes (SB read-only) will be able >>> to proceed with recompute without any issue. >>> >>> To help with correct marking of the nodes there is a new IDL mechinsm >>> that will assert if the txn is marked as read-only but something >>> attempted to write into it. >>> >>> Signed-off-by: Ales Musil <[email protected]> >>> --- >>> controller/ovn-controller.c | 6 +++--- >>> lib/inc-proc-eng.c | 7 ++++++- >>> lib/inc-proc-eng.h | 6 ++++++ >>> northd/inc-proc-northd.c | 40 ++++++++++++++++++------------------- >>> ovs | 2 +- >>> 5 files changed, 36 insertions(+), 25 deletions(-) >>> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>> index 6fbf3a227..3cc7ab340 100644 >>> --- a/controller/ovn-controller.c >>> +++ b/controller/ovn-controller.c >>> @@ -6758,7 +6758,7 @@ main(int argc, char *argv[]) >>> ENGINE_NODE(template_vars, CLEAR_TRACKED_DATA); >>> ENGINE_NODE(ct_zones, CLEAR_TRACKED_DATA, IS_VALID); >>> ENGINE_NODE(ovs_interface_shadow, CLEAR_TRACKED_DATA); >>> - ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA); >>> + ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA, SB_WRITE); >>> ENGINE_NODE(non_vif_data); >>> ENGINE_NODE(mff_ovn_geneve); >>> ENGINE_NODE(ofctrl_is_connected); >>> @@ -6779,9 +6779,9 @@ main(int argc, char *argv[]) >>> ENGINE_NODE(acl_id, IS_VALID); >>> ENGINE_NODE(route); >>> ENGINE_NODE(route_table_notify); >>> - ENGINE_NODE(route_exchange); >>> + ENGINE_NODE(route_exchange, SB_WRITE); >>> ENGINE_NODE(route_exchange_status); >>> - ENGINE_NODE(garp_rarp); >>> + ENGINE_NODE(garp_rarp, SB_WRITE); >>> ENGINE_NODE(host_if_monitor); >>> ENGINE_NODE(neighbor); >>> ENGINE_NODE(neighbor_table_notify); >>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c >>> index 78187c8fd..bd366cb90 100644 >>> --- a/lib/inc-proc-eng.c >>> +++ b/lib/inc-proc-eng.c >>> @@ -27,6 +27,7 @@ >>> #include "openvswitch/hmap.h" >>> #include "openvswitch/poll-loop.h" >>> #include "openvswitch/vlog.h" >>> +#include "ovsdb-idl.h" >>> #include "inc-proc-eng.h" >>> #include "timeval.h" >>> #include "unixctl.h" >>> @@ -426,7 +427,7 @@ engine_recompute(struct engine_node *node, bool allowed, >>> reason = xvasprintf(reason_fmt, reason_args); >>> va_end(reason_args); >>> >>> - if (!allowed) { >>> + if (node->sb_write && !allowed) { >>> VLOG_DBG("node: %s, recompute (%s) canceled", node->name, reason); >>> engine_set_node_state(node, EN_CANCELED, "recompute not allowed"); >>> goto done; >>> @@ -565,10 +566,14 @@ engine_run(bool recompute_allowed) >>> return; >>> } >>> >>> + struct ovsdb_idl_txn *sb_txn = engine_get_context()->ovnsb_idl_txn; >>> + >>> engine_run_canceled = false; >>> struct engine_node *node; >>> VECTOR_FOR_EACH (&engine_nodes, node) { >>> + ovsdb_idl_txn_assert_read_only(sb_txn, !node->sb_write); >>> engine_run_node(node, recompute_allowed); >>> + ovsdb_idl_txn_assert_read_only(sb_txn, false); >>> >>> if (node->state == EN_CANCELED) { >>> node->stats.cancel++; >>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h >>> index d9174effa..9d39045b0 100644 >>> --- a/lib/inc-proc-eng.h >>> +++ b/lib/inc-proc-eng.h >>> @@ -272,6 +272,9 @@ struct engine_node { >>> >>> /* Engine stats. */ >>> struct engine_stats stats; >>> + >>> + /* Indication if the node writes to SB DB. */ >>> + bool sb_write; >>> }; >>> >>> /* Initialize the data for the engine nodes. It calls each node's >>> @@ -435,6 +438,9 @@ void engine_ovsdb_node_add_index(struct engine_node *, >>> const char *name, >>> #define COMPUTE_FAIL_INFO(NAME) \ >>> .get_compute_failure_info = en_##NAME##_compute_failure_info, >>> >>> +#define SB_WRITE(NAME) \ >>> + .sb_write = true >>> + >>> #define ENGINE_NODE2(NAME, ARG1) \ >>> ENGINE_NODE_DEF_START(NAME, #NAME) \ >>> ARG1(NAME), \ >>> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c >>> index ff9515be7..a5e4f9a53 100644 >>> --- a/northd/inc-proc-northd.c >>> +++ b/northd/inc-proc-northd.c >>> @@ -153,22 +153,22 @@ enum sb_engine_node { >>> >>> /* Define engine nodes for other nodes. They should be defined as static to >>> * avoid sparse errors. */ >>> -static ENGINE_NODE(northd, CLEAR_TRACKED_DATA); >>> -static ENGINE_NODE(sync_from_sb); >>> +static ENGINE_NODE(northd, CLEAR_TRACKED_DATA, SB_WRITE); >>> +static ENGINE_NODE(sync_from_sb, SB_WRITE); >>> static ENGINE_NODE(sampling_app); >>> -static ENGINE_NODE(lflow); >>> -static ENGINE_NODE(mac_binding_aging); >>> +static ENGINE_NODE(lflow, SB_WRITE); >>> +static ENGINE_NODE(mac_binding_aging, SB_WRITE); >>> static ENGINE_NODE(mac_binding_aging_waker); >>> static ENGINE_NODE(northd_output); >>> -static ENGINE_NODE(sync_meters); >>> -static ENGINE_NODE(sync_to_sb); >>> -static ENGINE_NODE(sync_to_sb_addr_set); >>> -static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA); >>> -static ENGINE_NODE(fdb_aging); >>> +static ENGINE_NODE(sync_meters, SB_WRITE); >>> +static ENGINE_NODE(sync_to_sb, SB_WRITE); >>> +static ENGINE_NODE(sync_to_sb_addr_set, SB_WRITE); >>> +static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA, SB_WRITE); >>> +static ENGINE_NODE(fdb_aging, SB_WRITE); >>> static ENGINE_NODE(fdb_aging_waker); >>> -static ENGINE_NODE(sync_to_sb_lb); >>> -static ENGINE_NODE(sync_to_sb_pb); >>> -static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA); >>> +static ENGINE_NODE(sync_to_sb_lb, SB_WRITE); >>> +static ENGINE_NODE(sync_to_sb_pb, SB_WRITE); >>> +static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA, SB_WRITE); >>> static ENGINE_NODE(lb_data, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(lr_nat, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(lr_stateful, CLEAR_TRACKED_DATA); >>> @@ -176,20 +176,20 @@ static ENGINE_NODE(ls_stateful, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(route_policies); >>> static ENGINE_NODE(routes); >>> static ENGINE_NODE(bfd); >>> -static ENGINE_NODE(bfd_sync); >>> -static ENGINE_NODE(ecmp_nexthop); >>> -static ENGINE_NODE(multicast_igmp); >>> -static ENGINE_NODE(acl_id); >>> -static ENGINE_NODE(advertised_route_sync); >>> -static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA); >>> +static ENGINE_NODE(bfd_sync, SB_WRITE); >>> +static ENGINE_NODE(ecmp_nexthop, SB_WRITE); >>> +static ENGINE_NODE(multicast_igmp, SB_WRITE); >>> +static ENGINE_NODE(acl_id, SB_WRITE); >>> +static ENGINE_NODE(advertised_route_sync, SB_WRITE); >>> +static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA, SB_WRITE); >>> static ENGINE_NODE(dynamic_routes); >>> static ENGINE_NODE(group_ecmp_route, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(datapath_logical_router, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(datapath_logical_switch, CLEAR_TRACKED_DATA); >>> -static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA); >>> +static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA, SB_WRITE); >>> static ENGINE_NODE(datapath_synced_logical_router, CLEAR_TRACKED_DATA); >>> static ENGINE_NODE(datapath_synced_logical_switch, CLEAR_TRACKED_DATA); >>> -static ENGINE_NODE(ic_learned_svc_monitors); >>> +static ENGINE_NODE(ic_learned_svc_monitors, SB_WRITE); >>> >>> void inc_proc_northd_init(struct ovsdb_idl_loop *nb, >>> struct ovsdb_idl_loop *sb) >>> diff --git a/ovs b/ovs >>> index 852f07e52..42d99eb38 160000 >>> --- a/ovs >>> +++ b/ovs >>> @@ -1 +1 @@ >>> -Subproject commit 852f07e5251c6a0c0d5c43dc980d12a4f1bcd370 >>> +Subproject commit 42d99eb38725c6e6c99d558d6130db7ade9ba121 >>> -- >>> 2.51.0 >>> >>> _______________________________________________ >>> dev mailing list >>> [email protected] >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
