Hi Ales,

I've been thinking through this change, and I think the idea might have
some flaws. Let's consider an arbitrary engine node B that takes engine
node A as input. In this scenario, engine node B writes to the SB DB and
engine node A does not. With this change, we mark engine node B as
SB_WRITE, but we do not do the same with node A.

We perform an engine run where recomputation is not allowed because the SB
DB is read only. When engine node A runs, since it does not have SB_WRITE
specified, it is able to recompute its data. However, when engine node B
runs, it cannot recompute its data.

Now, what happens during the next engine run?
* Let's say engine node A has no new input data. Engine node A's state will
be EN_UNCHANGED. In this case, engine node B sees that engine node A is
unchanged, so engine node B does not run. However, B really should run
because in the previous engine node run, A recomputed. In this case, B's
data is incorrect.
* Let's say engine node A has new data and can process it incrementally.
Engine node A's state will be EN_UPDATED.
    * Engine node B may also try to incrementally handle A's tracked data.
However, the only tracked data that B will see is from the current engine
run. The data that A recomputed in the previous engine run will be unknown
to B. In this case, B's data is incorrect.
    * Engine node B may not have an input handler for A. In this case, B
will recompute and all will be well.
* Let's say engine node A has new data and cannot process it incrementally.
In this case, A will recompute, and then B will also likely recompute. The
data integrity is fine in this case. However, A has had to recompute twice.
Before this change, A would only have recomputed on the second engine run.

In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4 cases,
the data is correct in engine node B, but A has to recompute twice. And in
1 of the 4 cases, the data is correct in engine node B and we successfully
avoided an unnecessary recomputation of A. The only time this change has a
positive effect is if the SB_WRITE node has no input handlers for its input
nodes, and the input nodes are able to incrementally process their data.

Another consideration is that the incremental engine in northd hardcodes
the allow_recompute argument to true when it calls engine_run(). Therefore
this change will have no effect on the northd incremental engine.

For the above reasons, I don't think this patch should be accepted in its
current state.

On Thu, Oct 9, 2025 at 8:47 AM Ales Musil via dev <[email protected]>
wrote:
>
> The engine would cancel node recompute if it wasn't allowed,
> typically the SB is in read-only mode. That would lead to cancel
> of the node recompute and force recompute the next engine run, which
> is very wasteful. Add a way to mark the engine nodes as SB write.
> Only nodes that are marked as SB write will lead to cancel when
> recompute is not allowed. Other nodes (SB read-only) will be able
> to proceed with recompute without any issue.
>
> To help with correct marking of the nodes there is a new IDL mechinsm
> that will assert if the txn is marked as read-only but something
> attempted to write into it.
>
> Signed-off-by: Ales Musil <[email protected]>
> ---
>  controller/ovn-controller.c |  6 +++---
>  lib/inc-proc-eng.c          |  7 ++++++-
>  lib/inc-proc-eng.h          |  6 ++++++
>  northd/inc-proc-northd.c    | 40 ++++++++++++++++++-------------------
>  ovs                         |  2 +-
>  5 files changed, 36 insertions(+), 25 deletions(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 6fbf3a227..3cc7ab340 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -6758,7 +6758,7 @@ main(int argc, char *argv[])
>      ENGINE_NODE(template_vars, CLEAR_TRACKED_DATA);
>      ENGINE_NODE(ct_zones, CLEAR_TRACKED_DATA, IS_VALID);
>      ENGINE_NODE(ovs_interface_shadow, CLEAR_TRACKED_DATA);
> -    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA);
> +    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA, SB_WRITE);
>      ENGINE_NODE(non_vif_data);
>      ENGINE_NODE(mff_ovn_geneve);
>      ENGINE_NODE(ofctrl_is_connected);
> @@ -6779,9 +6779,9 @@ main(int argc, char *argv[])
>      ENGINE_NODE(acl_id, IS_VALID);
>      ENGINE_NODE(route);
>      ENGINE_NODE(route_table_notify);
> -    ENGINE_NODE(route_exchange);
> +    ENGINE_NODE(route_exchange, SB_WRITE);
>      ENGINE_NODE(route_exchange_status);
> -    ENGINE_NODE(garp_rarp);
> +    ENGINE_NODE(garp_rarp, SB_WRITE);
>      ENGINE_NODE(host_if_monitor);
>      ENGINE_NODE(neighbor);
>      ENGINE_NODE(neighbor_table_notify);
> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
> index 78187c8fd..bd366cb90 100644
> --- a/lib/inc-proc-eng.c
> +++ b/lib/inc-proc-eng.c
> @@ -27,6 +27,7 @@
>  #include "openvswitch/hmap.h"
>  #include "openvswitch/poll-loop.h"
>  #include "openvswitch/vlog.h"
> +#include "ovsdb-idl.h"
>  #include "inc-proc-eng.h"
>  #include "timeval.h"
>  #include "unixctl.h"
> @@ -426,7 +427,7 @@ engine_recompute(struct engine_node *node, bool
allowed,
>      reason = xvasprintf(reason_fmt, reason_args);
>      va_end(reason_args);
>
> -    if (!allowed) {
> +    if (node->sb_write && !allowed) {
>          VLOG_DBG("node: %s, recompute (%s) canceled", node->name,
reason);
>          engine_set_node_state(node, EN_CANCELED, "recompute not
allowed");
>          goto done;
> @@ -565,10 +566,14 @@ engine_run(bool recompute_allowed)
>          return;
>      }
>
> +    struct ovsdb_idl_txn *sb_txn = engine_get_context()->ovnsb_idl_txn;
> +
>      engine_run_canceled = false;
>      struct engine_node *node;
>      VECTOR_FOR_EACH (&engine_nodes, node) {
> +        ovsdb_idl_txn_assert_read_only(sb_txn, !node->sb_write);
>          engine_run_node(node, recompute_allowed);
> +        ovsdb_idl_txn_assert_read_only(sb_txn, false);
>
>          if (node->state == EN_CANCELED) {
>              node->stats.cancel++;
> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> index d9174effa..9d39045b0 100644
> --- a/lib/inc-proc-eng.h
> +++ b/lib/inc-proc-eng.h
> @@ -272,6 +272,9 @@ struct engine_node {
>
>      /* Engine stats. */
>      struct engine_stats stats;
> +
> +    /* Indication if the node writes to SB DB. */
> +    bool sb_write;
>  };
>
>  /* Initialize the data for the engine nodes. It calls each node's
> @@ -435,6 +438,9 @@ void engine_ovsdb_node_add_index(struct engine_node
*, const char *name,
>  #define COMPUTE_FAIL_INFO(NAME) \
>          .get_compute_failure_info = en_##NAME##_compute_failure_info,
>
> +#define SB_WRITE(NAME) \
> +    .sb_write = true
> +
>  #define ENGINE_NODE2(NAME, ARG1) \
>      ENGINE_NODE_DEF_START(NAME, #NAME) \
>      ARG1(NAME), \
> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> index ff9515be7..a5e4f9a53 100644
> --- a/northd/inc-proc-northd.c
> +++ b/northd/inc-proc-northd.c
> @@ -153,22 +153,22 @@ enum sb_engine_node {
>
>  /* Define engine nodes for other nodes. They should be defined as static
to
>   * avoid sparse errors. */
> -static ENGINE_NODE(northd, CLEAR_TRACKED_DATA);
> -static ENGINE_NODE(sync_from_sb);
> +static ENGINE_NODE(northd, CLEAR_TRACKED_DATA, SB_WRITE);
> +static ENGINE_NODE(sync_from_sb, SB_WRITE);
>  static ENGINE_NODE(sampling_app);
> -static ENGINE_NODE(lflow);
> -static ENGINE_NODE(mac_binding_aging);
> +static ENGINE_NODE(lflow, SB_WRITE);
> +static ENGINE_NODE(mac_binding_aging, SB_WRITE);
>  static ENGINE_NODE(mac_binding_aging_waker);
>  static ENGINE_NODE(northd_output);
> -static ENGINE_NODE(sync_meters);
> -static ENGINE_NODE(sync_to_sb);
> -static ENGINE_NODE(sync_to_sb_addr_set);
> -static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA);
> -static ENGINE_NODE(fdb_aging);
> +static ENGINE_NODE(sync_meters, SB_WRITE);
> +static ENGINE_NODE(sync_to_sb, SB_WRITE);
> +static ENGINE_NODE(sync_to_sb_addr_set, SB_WRITE);
> +static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA, SB_WRITE);
> +static ENGINE_NODE(fdb_aging, SB_WRITE);
>  static ENGINE_NODE(fdb_aging_waker);
> -static ENGINE_NODE(sync_to_sb_lb);
> -static ENGINE_NODE(sync_to_sb_pb);
> -static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA);
> +static ENGINE_NODE(sync_to_sb_lb, SB_WRITE);
> +static ENGINE_NODE(sync_to_sb_pb, SB_WRITE);
> +static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA, SB_WRITE);
>  static ENGINE_NODE(lb_data, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(lr_nat, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(lr_stateful, CLEAR_TRACKED_DATA);
> @@ -176,20 +176,20 @@ static ENGINE_NODE(ls_stateful, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(route_policies);
>  static ENGINE_NODE(routes);
>  static ENGINE_NODE(bfd);
> -static ENGINE_NODE(bfd_sync);
> -static ENGINE_NODE(ecmp_nexthop);
> -static ENGINE_NODE(multicast_igmp);
> -static ENGINE_NODE(acl_id);
> -static ENGINE_NODE(advertised_route_sync);
> -static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA);
> +static ENGINE_NODE(bfd_sync, SB_WRITE);
> +static ENGINE_NODE(ecmp_nexthop, SB_WRITE);
> +static ENGINE_NODE(multicast_igmp, SB_WRITE);
> +static ENGINE_NODE(acl_id, SB_WRITE);
> +static ENGINE_NODE(advertised_route_sync, SB_WRITE);
> +static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA, SB_WRITE);
>  static ENGINE_NODE(dynamic_routes);
>  static ENGINE_NODE(group_ecmp_route, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(datapath_logical_router, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(datapath_logical_switch, CLEAR_TRACKED_DATA);
> -static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA);
> +static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA, SB_WRITE);
>  static ENGINE_NODE(datapath_synced_logical_router, CLEAR_TRACKED_DATA);
>  static ENGINE_NODE(datapath_synced_logical_switch, CLEAR_TRACKED_DATA);
> -static ENGINE_NODE(ic_learned_svc_monitors);
> +static ENGINE_NODE(ic_learned_svc_monitors, SB_WRITE);
>
>  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
>                            struct ovsdb_idl_loop *sb)
> diff --git a/ovs b/ovs
> index 852f07e52..42d99eb38 160000
> --- a/ovs
> +++ b/ovs
> @@ -1 +1 @@
> -Subproject commit 852f07e5251c6a0c0d5c43dc980d12a4f1bcd370
> +Subproject commit 42d99eb38725c6e6c99d558d6130db7ade9ba121
> --
> 2.51.0
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to