Re: [ovs-dev] [PATCH v2 ovn] controller: introduce coverage_counters for ovn-controller incremental processing

2021-02-18 Thread Lorenzo Bianconi
> On 17/02/2021 19:33, Lorenzo Bianconi wrote:
> 
> Should the subject have an underscore? 'coverage_counters'

Oops, sorry, I forgot to fix it.

> 
> > To help understand system behaviour for debugging purposes,
> > introduce coverage counters for the run handlers of ovn-controller
> > I-P engine nodes. Moreover, add a global counter for engine aborts.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1890902
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> > Changes since v1:
> > - drop handler counters and add global abort counter
> > - improve documentation and naming scheme
> > - introduce engine_set_node_updated utility macro
> > ---
> >  controller/ovn-controller.c | 39 +++--
> >  lib/inc-proc-eng.c  |  5 +
> >  lib/inc-proc-eng.h  | 12 +++-
> >  3 files changed, 45 insertions(+), 11 deletions(-)
> > 
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 4343650fc..42eed9ebd 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -69,6 +69,7 @@
> >  #include "stopwatch.h"
> >  #include "lib/inc-proc-eng.h"
> >  #include "hmapx.h"
> > +#include "coverage.h"
> >  
> >  VLOG_DEFINE_THIS_MODULE(main);
> >  
> > @@ -85,6 +86,18 @@ static unixctl_cb_func lflow_cache_flush_cmd;
> >  static unixctl_cb_func lflow_cache_show_stats_cmd;
> >  static unixctl_cb_func debug_delay_nb_cfg_report;
> >  
> > +/* Coverage counters for run handlers of OVN controller
> > + * incremental processing nodes
> > + */
> > +ENGINE_RUN_COVERAGE_DEFINE(flow_output);
> > +ENGINE_RUN_COVERAGE_DEFINE(runtime_data);
> > +ENGINE_RUN_COVERAGE_DEFINE(addr_sets);
> > +ENGINE_RUN_COVERAGE_DEFINE(port_groups);
> > +ENGINE_RUN_COVERAGE_DEFINE(ct_zones);
> > +ENGINE_RUN_COVERAGE_DEFINE(mff_ovn_geneve);
> > +ENGINE_RUN_COVERAGE_DEFINE(ofctrl_is_connected);
> > +ENGINE_RUN_COVERAGE_DEFINE(physical_flow_changes);
> 
> 
> I would like these counters to be part of the framework (i.e.
> inc-proc-eng.c/h) rather than being exposed out. The reasoning is that
> we shouldn't rely on a developer to modify them correctly when updating
> the _handler() and the _run() functions as it is too error-prone IMHO.
> For example, if a _run() function exits early and also sets a node to
> updated or aborts a node, will the developer/reviewer remember to update
> the corresponding counter? Also, if we need to make changes to the
> counters, it would be preferable to make the changes in one location.
> 
> I realize that this causes issues for per-node counters as currently
> implemented: inc-proc-eng.c has no idea about which nodes are which at
> compilation time and COVERAGE_INC() is a macro which requires templating
> the name of the node at compilation time.
> 
> For this reason, I suggest the following:
> 
> * Add counters to 'struct engine_node' and/or 'struct engine_context'
> * Create ovs-appctl commands, something like `ovs-appctl -t
> ovn-controller inc-proc-eng/stats-show` and `ovs-appctl -t
> ovn-controller inc-proc-eng/stats-clear`, that show and clear these
> statistics for the user
> 
> This follows a common pattern in OVS/OVN code, e.g.
> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/dpdk/pmd.rst#pmd-thread-statistics.
> 
> The next question is which counters to expose. Since exposing counters
> through the `appctl` commands above should be easy, I would suggest
> exposing as many as you can. However, you could also get feedback from
> those in the community who have more operational experience with OVN.
> 
> Here are some suggestions:
> 
> * Total number of engine runs
> * Total number of engine aborts
> * Total number of engine recomputes
> * Number of runs for each node
> * Number of computes for each node
> * Number of recomputes for each node
> * Number of transitions to ABORT for each node
> * Number of transitions to UNCHANGED for each node
> * Number of transitions to UPDATED for each node
> 
> What do you think?

Yes, I like the idea. Avoiding the COVERAGE_* macros will make the code more
manageable. I will work on a PoC, thanks :)

Regards,
Lorenzo

> 
> > +
> >  #define DEFAULT_BRIDGE_NAME "br-int"
> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> >  #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> > @@ -955,6 +968,10 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
> >  SB_NODE(dns, "dns") \
> >  SB_NODE(load_balancer, "load_balancer")
> >  
> > +#define SB_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME);
> > +SB_NODES
> > +#undef SB_NODE
> > +
> >  enum sb_engine_node {
> >  #define SB_NODE(NAME, NAME_STR) SB_##NAME,
> >  SB_NODES
> > @@ -972,6 +989,10 @@ enum sb_engine_node {
> >  OVS_NODE(interface, "interface") \
> >  OVS_NODE(qos, "qos")
> >  
> > +#define OVS_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(ovs_##NAME);
> > +OVS_NODES
> > +#undef OVS_NODE
> > +
> >  enum ovs_engine_node {
> >  #define OVS_NODE(NAME, NAME_STR) OVS_##NAME,
> >  

Re: [ovs-dev] [PATCH v2 ovn] controller: introduce coverage_counters for ovn-controller incremental processing

2021-02-18 Thread Mark Gray
On 17/02/2021 19:33, Lorenzo Bianconi wrote:

Should the subject have an underscore? 'coverage_counters'

> To help understand system behaviour for debugging purposes,
> introduce coverage counters for the run handlers of ovn-controller
> I-P engine nodes. Moreover, add a global counter for engine aborts.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1890902
> Signed-off-by: Lorenzo Bianconi 
> ---
> Changes since v1:
> - drop handler counters and add global abort counter
> - improve documentation and naming scheme
> - introduce engine_set_node_updated utility macro
> ---
>  controller/ovn-controller.c | 39 +++--
>  lib/inc-proc-eng.c  |  5 +
>  lib/inc-proc-eng.h  | 12 +++-
>  3 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 4343650fc..42eed9ebd 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -69,6 +69,7 @@
>  #include "stopwatch.h"
>  #include "lib/inc-proc-eng.h"
>  #include "hmapx.h"
> +#include "coverage.h"
>  
>  VLOG_DEFINE_THIS_MODULE(main);
>  
> @@ -85,6 +86,18 @@ static unixctl_cb_func lflow_cache_flush_cmd;
>  static unixctl_cb_func lflow_cache_show_stats_cmd;
>  static unixctl_cb_func debug_delay_nb_cfg_report;
>  
> +/* Coverage counters for run handlers of OVN controller
> + * incremental processing nodes
> + */
> +ENGINE_RUN_COVERAGE_DEFINE(flow_output);
> +ENGINE_RUN_COVERAGE_DEFINE(runtime_data);
> +ENGINE_RUN_COVERAGE_DEFINE(addr_sets);
> +ENGINE_RUN_COVERAGE_DEFINE(port_groups);
> +ENGINE_RUN_COVERAGE_DEFINE(ct_zones);
> +ENGINE_RUN_COVERAGE_DEFINE(mff_ovn_geneve);
> +ENGINE_RUN_COVERAGE_DEFINE(ofctrl_is_connected);
> +ENGINE_RUN_COVERAGE_DEFINE(physical_flow_changes);


I would like these counters to be part of the framework (i.e.
inc-proc-eng.c/h) rather than being exposed out. The reasoning is that
we shouldn't rely on a developer to modify them correctly when updating
the _handler() and the _run() functions as it is too error-prone IMHO.
For example, if a _run() function exits early and also sets a node to
updated or aborts a node, will the developer/reviewer remember to update
the corresponding counter? Also, if we need to make changes to the
counters, it would be preferable to make the changes in one location.

I realize that this causes issues for per-node counters as currently
implemented: inc-proc-eng.c has no idea about which nodes are which at
compilation time and COVERAGE_INC() is a macro which requires templating
the name of the node at compilation time.
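To make the compile-time constraint concrete, here is a minimal sketch. COVERAGE_DEFINE()/COVERAGE_INC() below are simplified stand-ins for lib/coverage.h (not the real definitions), and the n_runs field and engine_run_node() are hypothetical. A node's own run function can name its counter as a literal token; generic framework code only has a node pointer, so it needs a per-node field instead.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the OVS coverage macros: one static counter per name,
 * selected by token pasting at compile time. */
#define COVERAGE_DEFINE(COUNTER) static unsigned int counter_##COUNTER
#define COVERAGE_INC(COUNTER) (counter_##COUNTER++)

COVERAGE_DEFINE(addr_sets_run);

struct engine_node {
    const char *name;                   /* Only known at run time. */
    uint64_t n_runs;                    /* Hypothetical framework-owned counter. */
    void (*run)(struct engine_node *);
};

/* Node code can name its counter, because the name is a literal token. */
static void
en_addr_sets_run(struct engine_node *node)
{
    (void) node;
    COVERAGE_INC(addr_sets_run);
}

/* Framework code only has 'node'.  COVERAGE_INC(node->name) would expand to
 * (counter_node->name++), which refers to no counter at all, so the generic
 * path bumps a per-node field instead, in one place for every node. */
static void
engine_run_node(struct engine_node *node)
{
    assert(node && node->run);
    node->n_runs++;
    node->run(node);
}
```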

For this reason, I suggest the following:

* Add counters to 'struct engine_node' and/or 'struct engine_context'
* Create ovs-appctl commands, something like `ovs-appctl -t
ovn-controller inc-proc-eng/stats-show` and `ovs-appctl -t
ovn-controller inc-proc-eng/stats-clear`, that show and clear these
statistics for the user

This follows a common pattern in OVS/OVN code, e.g.
https://github.com/openvswitch/ovs/blob/master/Documentation/topics/dpdk/pmd.rst#pmd-thread-statistics.
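A minimal sketch of what the bodies of such show/clear commands could do, assuming per-node counters live in struct engine_node. The struct layout, field names and output format are hypothetical. A real handler would be registered with unixctl_command_register() and answer through unixctl_command_reply(); both are left out so the snippet stands alone.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct engine_stats {           /* Hypothetical per-node counters. */
    uint64_t runs;
    uint64_t recomputes;
    uint64_t aborts;
};

struct engine_node {
    const char *name;
    struct engine_stats stats;
};

/* Body of a hypothetical "inc-proc-eng/stats-show" handler: one line per
 * node.  Returns the length written (excluding the NUL), or -1 if 'buf'
 * is too small. */
static int
engine_stats_show(const struct engine_node *nodes, size_t n_nodes,
                  char *buf, size_t size)
{
    size_t len = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n_nodes; i++) {
        const struct engine_stats *s = &nodes[i].stats;
        int w = snprintf(buf + len, size - len,
                         "%s: runs=%" PRIu64 " recomputes=%" PRIu64
                         " aborts=%" PRIu64 "\n",
                         nodes[i].name, s->runs, s->recomputes, s->aborts);
        if (w < 0 || (size_t) w >= size - len) {
            return -1;
        }
        len += (size_t) w;
    }
    return (int) len;
}

/* Body of a hypothetical "inc-proc-eng/stats-clear" handler. */
static void
engine_stats_clear(struct engine_node *nodes, size_t n_nodes)
{
    for (size_t i = 0; i < n_nodes; i++) {
        memset(&nodes[i].stats, 0, sizeof nodes[i].stats);
    }
}
```

Keeping the counters in the node struct means the show/clear code never needs to know node names at compile time; it simply walks whatever nodes the engine was built with.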

The next question is which counters to expose. Since exposing counters
through the `appctl` commands above should be easy, I would suggest
exposing as many as you can. However, you could also get feedback from
those in the community who have more operational experience with OVN.

Here are some suggestions:

* Total number of engine runs
* Total number of engine aborts
* Total number of engine recomputes
* Number of runs for each node
* Number of computes for each node
* Number of recomputes for each node
* Number of transitions to ABORT for each node
* Number of transitions to UNCHANGED for each node
* Number of transitions to UPDATED for each node
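A sketch of how the counters listed above could be maintained entirely inside the framework, assuming every state change goes through one setter. The functions engine_begin_iteration() and engine_eval_node(), the stats structs, and the rule of counting a transition only when the state actually changes are all assumptions for illustration, not existing inc-proc-eng code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum engine_node_state {
    EN_STALE, EN_UPDATED, EN_UNCHANGED, EN_ABORTED, EN_STATE_MAX
};

struct engine_node_stats {
    uint64_t runs;                       /* Times the node was evaluated. */
    uint64_t computes;                   /* Handled incrementally. */
    uint64_t recomputes;                 /* Full recompute through run(). */
    uint64_t transitions[EN_STATE_MAX];  /* Entries into each state. */
};

struct engine_node {
    const char *name;
    enum engine_node_state state;
    struct engine_node_stats stats;
};

/* Engine-wide totals. */
static struct {
    uint64_t runs;
    uint64_t aborts;
    uint64_t recomputes;
} engine_stats;

/* The single place where node state changes, so run() and handler code
 * never has to remember to bump a counter. */
static void
engine_set_node_state(struct engine_node *node, enum engine_node_state state)
{
    assert(state < EN_STATE_MAX);
    if (node->state != state) {
        node->stats.transitions[state]++;
        node->state = state;
    }
}

/* Start of one engine iteration: count it and mark every node stale. */
static void
engine_begin_iteration(struct engine_node *nodes, size_t n_nodes)
{
    engine_stats.runs++;
    for (size_t i = 0; i < n_nodes; i++) {
        engine_set_node_state(&nodes[i], EN_STALE);
    }
}

/* Evaluate one node.  'recompute' and 'outcome' stand in for what the
 * node's change handlers and run() function decided. */
static void
engine_eval_node(struct engine_node *node, bool recompute,
                 enum engine_node_state outcome)
{
    node->stats.runs++;
    if (recompute) {
        node->stats.recomputes++;
        engine_stats.recomputes++;
    } else {
        node->stats.computes++;
    }
    if (outcome == EN_ABORTED) {
        engine_stats.aborts++;
    }
    engine_set_node_state(node, outcome);
}
```

Whether to count every call to the setter or only real state changes is a design choice; this sketch counts only real changes.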

What do you think?

> +
>  #define DEFAULT_BRIDGE_NAME "br-int"
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>  #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> @@ -955,6 +968,10 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>  SB_NODE(dns, "dns") \
>  SB_NODE(load_balancer, "load_balancer")
>  
> +#define SB_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME);
> +SB_NODES
> +#undef SB_NODE
> +
>  enum sb_engine_node {
>  #define SB_NODE(NAME, NAME_STR) SB_##NAME,
>  SB_NODES
> @@ -972,6 +989,10 @@ enum sb_engine_node {
>  OVS_NODE(interface, "interface") \
>  OVS_NODE(qos, "qos")
>  
> +#define OVS_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(ovs_##NAME);
> +OVS_NODES
> +#undef OVS_NODE
> +
>  enum ovs_engine_node {
>  #define OVS_NODE(NAME, NAME_STR) OVS_##NAME,
>  OVS_NODES
> @@ -1011,7 +1032,7 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
>  ofctrl_seqno_flush();
>  binding_seqno_flush();
>  }
> -engine_set_node_state(node, EN_UPDATED);
> +engine_set_node_updated(node, ofctrl_is_connected);
>  return;
>  }
>  engine_set_node_state(node, EN_UNCHANGED);
> @@ -1067,7 +1088,7 @@ 

[ovs-dev] [PATCH v2 ovn] controller: introduce coverage_counters for ovn-controller incremental processing

2021-02-17 Thread Lorenzo Bianconi
To help understand system behaviour for debugging purposes,
introduce coverage counters for the run handlers of ovn-controller
I-P engine nodes. Moreover, add a global counter for engine aborts.

https://bugzilla.redhat.com/show_bug.cgi?id=1890902
Signed-off-by: Lorenzo Bianconi 
---
Changes since v1:
- drop handler counters and add global abort counter
- improve documentation and naming scheme
- introduce engine_set_node_updated utility macro
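The lib/inc-proc-eng.h hunk that adds ENGINE_RUN_COVERAGE_DEFINE() and engine_set_node_updated() is not quoted in this thread. A minimal sketch of how such helpers could wrap the coverage macros, assuming they token-paste a `<node>_run` counter name; COVERAGE_DEFINE()/COVERAGE_INC() here are simplified stand-ins for lib/coverage.h so the snippet compiles on its own, and en_addr_sets_run() is reduced to a one-argument version.

```c
#include <assert.h>

/* Stand-ins for the OVS coverage macros: one static counter per name,
 * selected by token pasting at compile time. */
#define COVERAGE_DEFINE(COUNTER) static unsigned int counter_##COUNTER
#define COVERAGE_INC(COUNTER) (counter_##COUNTER++)

enum engine_node_state { EN_STALE, EN_UPDATED, EN_UNCHANGED, EN_ABORTED };

struct engine_node {
    const char *name;
    enum engine_node_state state;
};

static void
engine_set_node_state(struct engine_node *node, enum engine_node_state state)
{
    assert(node);
    node->state = state;
}

/* Assumed shape of the helpers: define a "<node>_run" counter, and bump it
 * whenever a run handler marks its node updated. */
#define ENGINE_RUN_COVERAGE_DEFINE(NAME) COVERAGE_DEFINE(NAME##_run)
#define engine_set_node_updated(NODE, NAME)         \
    do {                                            \
        COVERAGE_INC(NAME##_run);                   \
        engine_set_node_state(NODE, EN_UPDATED);    \
    } while (0)

ENGINE_RUN_COVERAGE_DEFINE(addr_sets);

/* Mirrors the tail of en_addr_sets_run() after this patch. */
static void
en_addr_sets_run(struct engine_node *node)
{
    /* ... rebuild the address sets ... */
    engine_set_node_updated(node, addr_sets);
}
```

With this shape, any path that sets EN_UPDATED through plain engine_set_node_state() instead of the helper silently skips the counter.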
---
 controller/ovn-controller.c | 39 +++--
 lib/inc-proc-eng.c  |  5 +
 lib/inc-proc-eng.h  | 12 +++-
 3 files changed, 45 insertions(+), 11 deletions(-)
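The SB_NODES/OVS_NODES hunks in this patch rely on the X-macro pattern: one list is expanded several times with different definitions of SB_NODE(). A self-contained sketch with a shortened list; the counter expansion is a stand-in for ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME), and sb_node_names[] and sb_node_from_name() are hypothetical additions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Shortened node list; the real SB_NODES has many more entries. */
#define SB_NODES                               \
    SB_NODE(dns, "dns")                        \
    SB_NODE(load_balancer, "load_balancer")

/* Expansion 1: one counter per node (stand-in for the patch's
 * ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME)). */
#define SB_NODE(NAME, NAME_STR) static unsigned int sb_##NAME##_run;
SB_NODES
#undef SB_NODE

/* Expansion 2: the node enum, following ovn-controller.c. */
enum sb_engine_node {
#define SB_NODE(NAME, NAME_STR) SB_##NAME,
    SB_NODES
#undef SB_NODE
};

/* Expansion 3: a name table indexed by the enum. */
static const char *const sb_node_names[] = {
#define SB_NODE(NAME, NAME_STR) NAME_STR,
    SB_NODES
#undef SB_NODE
};

/* Run-time lookup by name, e.g. for a debugging command. */
static int
sb_node_from_name(const char *name)
{
    for (size_t i = 0; i < sizeof sb_node_names / sizeof sb_node_names[0];
         i++) {
        if (!strcmp(sb_node_names[i], name)) {
            return (int) i;
        }
    }
    return -1;
}
```

The name table gives a run-time index, while the counters remain separately named variables; nothing links the two except the source list.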

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 4343650fc..42eed9ebd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -69,6 +69,7 @@
 #include "stopwatch.h"
 #include "lib/inc-proc-eng.h"
 #include "hmapx.h"
+#include "coverage.h"
 
 VLOG_DEFINE_THIS_MODULE(main);
 
@@ -85,6 +86,18 @@ static unixctl_cb_func lflow_cache_flush_cmd;
 static unixctl_cb_func lflow_cache_show_stats_cmd;
 static unixctl_cb_func debug_delay_nb_cfg_report;
 
+/* Coverage counters for run handlers of OVN controller
+ * incremental processing nodes
+ */
+ENGINE_RUN_COVERAGE_DEFINE(flow_output);
+ENGINE_RUN_COVERAGE_DEFINE(runtime_data);
+ENGINE_RUN_COVERAGE_DEFINE(addr_sets);
+ENGINE_RUN_COVERAGE_DEFINE(port_groups);
+ENGINE_RUN_COVERAGE_DEFINE(ct_zones);
+ENGINE_RUN_COVERAGE_DEFINE(mff_ovn_geneve);
+ENGINE_RUN_COVERAGE_DEFINE(ofctrl_is_connected);
+ENGINE_RUN_COVERAGE_DEFINE(physical_flow_changes);
+
 #define DEFAULT_BRIDGE_NAME "br-int"
 #define DEFAULT_PROBE_INTERVAL_MSEC 5000
 #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
@@ -955,6 +968,10 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 SB_NODE(dns, "dns") \
 SB_NODE(load_balancer, "load_balancer")
 
+#define SB_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME);
+SB_NODES
+#undef SB_NODE
+
 enum sb_engine_node {
 #define SB_NODE(NAME, NAME_STR) SB_##NAME,
 SB_NODES
@@ -972,6 +989,10 @@ enum sb_engine_node {
 OVS_NODE(interface, "interface") \
 OVS_NODE(qos, "qos")
 
+#define OVS_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(ovs_##NAME);
+OVS_NODES
+#undef OVS_NODE
+
 enum ovs_engine_node {
 #define OVS_NODE(NAME, NAME_STR) OVS_##NAME,
 OVS_NODES
@@ -1011,7 +1032,7 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
 ofctrl_seqno_flush();
 binding_seqno_flush();
 }
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, ofctrl_is_connected);
 return;
 }
 engine_set_node_state(node, EN_UNCHANGED);
@@ -1067,7 +1088,7 @@ en_addr_sets_run(struct engine_node *node, void *data)
 addr_sets_init(as_table, &as->addr_sets);
 
 as->change_tracked = false;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, addr_sets);
 }
 
 static bool
@@ -1147,7 +1168,7 @@ en_port_groups_run(struct engine_node *node, void *data)
 port_groups_init(pg_table, &pg->port_groups);
 
 pg->change_tracked = false;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, port_groups);
 }
 
 static bool
@@ -1482,7 +1503,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 
 binding_run(&b_ctx_in, &b_ctx_out);
 
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, runtime_data);
 }
 
 static bool
@@ -1604,8 +1625,7 @@ en_ct_zones_run(struct engine_node *node, void *data)
 &ct_zones_data->current, ct_zones_data->bitmap,
 &ct_zones_data->pending, &rt_data->ct_updated_datapaths);
 
-
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, ct_zones);
 }
 
 /* The data in the ct_zones node is always valid (i.e., no stale pointers). */
@@ -1639,7 +1659,7 @@ en_mff_ovn_geneve_run(struct engine_node *node, void *data)
 enum mf_field_id mff_ovn_geneve = ofctrl_get_mf_field_id();
 if (ed_mff_ovn_geneve->mff_ovn_geneve != mff_ovn_geneve) {
 ed_mff_ovn_geneve->mff_ovn_geneve = mff_ovn_geneve;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, mff_ovn_geneve);
 return;
 }
 engine_set_node_state(node, EN_UNCHANGED);
@@ -1714,7 +1734,7 @@ en_physical_flow_changes_run(struct engine_node *node, void *data)
 {
 struct ed_type_pfc_data *pfc_tdata = data;
 pfc_tdata->recompute_physical_flows = true;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, physical_flow_changes);
 }
 
 /* ct_zone changes are not handled incrementally but a handler is required
@@ -2034,8 +2054,7 @@ en_flow_output_run(struct engine_node *node, void *data)
 init_physical_ctx(node, rt_data, &p_ctx);
 
 physical_run(&p_ctx, &fo->flow_table);
-
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, flow_output);
 }
 
 static