On Fri, Sep 22, 2023 at 2:40 PM Han Zhou <hz...@ovn.org> wrote: > > On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odiv...@gmail.com> > wrote: > > > > Hi Dumitru, > > > > if eventually this patch got merged, please remove next lines from its > commit message: > > > > "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will > > be done in the next patch." > > > > Unfortunately I was unsuccessful in removing these probing direction > (default 60s). > > Maybe you can give any hint how to solve this [1]. > > > > 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html > > > > > On 11 Sep 2023, at 15:57, Dumitru Ceara <dce...@redhat.com> wrote: > > > > > > On 9/11/23 14:54, Dumitru Ceara wrote: > > >> On 6/14/23 20:02, Vladislav Odintsov wrote: > > >>> Hi Mark, > > >>> > > >>> thanks for taking time to look on this! > > >>> > > >>> Your point with a hung OVS is really an interesting case. > > >>> > > >>> From one hand it’s a possible situation, and from another I guess > it’s much higher probability for OVS to be busy by some other work rather > than to hang in a loop. In my installation the first happening sometimes > (OVS business by real work). The reasons can be different but I can’t say > that it’s better to drop OF connection in such a moment (possibly > triggering ha chassis-redirect ports failover). > > >>> > > >>> At the same time if we’re talking about L3GW node, which has a > gateway chassis redirect port, the hung OVS should be detected by other > nodes with BFD probes (if it’s a ha chassis-redirect port). > > >>> And if it’s a normal hypervisor (with just a vif ports), so what can > we do with it? I guess if we drop OF connection, it won’t make any positive > changes. Maybe just release memory needed for handling this connection and > not allocate buffers…? Unfortunately I can’t evaluate this. > > >>> > > >>> Moreover, the OVN default is a disabled probing. What can be a > possible recommended values to set probes if to leave it as is? How should > users find out that they have to configure this? Currently in docs there is > only information that this option configures probe interval. But it’s not > obvious when to configure it and which value to set. > > >>> > > >> > > >> In support of this, quoting the original commit that added the probe > > >> config knob: > > >> > > >> "The main intention of this patch is to fix this recompuation issue for > > >> older versions (there by requesting backport), it still would be > > >> beneficial with the incremental processing engine." > > >> > > >> That was back in 2019: > > >> > > >> > https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 > > To clarify a little more, the effect of the above commit in fact changed > the default behavior of probe from disabled (which is the default behavior > of all unix/punix rconn) to 5s, and was changed back to default disabled by: > https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783 > > So I am totally fine to remove this config if no one would ever need to > enable the probe. > > > >> > > >> Since then lots of performance improvements landed in OVN > > >> (ovn-controller too), we're probably fine. Checking > > >> ovn-org/ovn-kubernetes code I see they were also setting > > >> ovn-openflow-probe-interval "just in case" so they probably won't be > > >> affected by it getting removed: > > >> > > >> > https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e > > >> > > >>> I hope this makes sense. > > >>> > > >> > > >> The patch in its current form looks OK to me, I'm OK to merge it if > Mark > > >> doesn't have anything against it; therefore: > > >> > > >> Acked-by: Dumitru Ceara <dce...@redhat.com> > > >> > > > > > > I forgot to mention this earlier but, since this patch was posted way > > > before soft freeze and since the changes seem contained, it's probably > > > fair to apply this to the 23.09 branch too. What do you think, Mark? > > > > > > Thanks! > > > > > >> Regards, > > >> Dumitru > > >> > > >>> Also, which Han’s patch to OVS you’re talking about? I looked through > open OVS patches and haven’t seen any. Please point me out. > > >>> > > I don't remember that I posted any patch related to this. We have a patch > downstream that changed the hardcoded 60s to 0 in OVS for the mgmt > controller. > > > >>> regards, > > >>> Vladislav Odintsov > > >>> > > >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmich...@redhat.com> wrote: > > >>>> > > >>>> Hi, > > >>>> > > >>>> Thanks for linking the email thread, since I had the exact same > concern that Numan did about detecting if ovs-vswitchd goes down. It sounds > like because of the nature of unix sockets, this disconnection is still > detected by OVN and failover can occur as expected. > > >>>> > > >>>> But what about if ovs-vswitchd is still running but in a > "compromised" state. For instance, what if a coding error in ovs-vswitchd > results in it running an infinite loop? In such a case, the unix connection > would still be alive, but OVN would not be able to communicate with the > ovs-vswitchd process. > > >>>> > > >>>> Does this situation eventually lead to OVN disconnecting anyway > after it fills up the unix socket's buffer? If so, how long does that take? > Or does it lead to OVN slowly growing in memory usage while it waits > forever to be able to write to the socket? > > Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller > and reconnect would not help anyway, right? > > > >>>> > > >>>> I think removing the hard-coded 5 second probe intervals from > pinctrl.c and features.c is a good idea. But I think we should leave the > configurable probe interval used by ofctrl.c for the case I described > before. Since Han's patch to OVS should disable probe intervals from the > OVS side by default, we would not trigger bad behavior simply because OVN > has a lot of work to do during a recompute. What do you think? > > >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote: > > >>>>> OpenFlow connection is established over unix socket, which is a > reliable > > >>>>> connection and doesn't require additional probing. > > >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd > > >>>>> direction is disabled for all three connections: > > >>>>> - OF flows management connection, > > >>>>> - OF features negotiation connection, > > >>>>> - pinctrl connection. > > >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed > as > > >>>>> non-needed anymore. > > >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing > will > > >>>>> be done in the next patch. > > >>>>> Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html > > >>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com> > > >>>>> --- > > >>>>> NEWS | 5 +++++ > > >>>>> controller/ofctrl.c | 14 ++------------ > > >>>>> controller/ofctrl.h | 4 +--- > > >>>>> controller/ovn-controller.8.xml | 14 -------------- > > >>>>> controller/ovn-controller.c | 21 +-------------------- > > >>>>> controller/pinctrl.c | 2 +- > > >>>>> lib/features.c | 7 ++----- > > >>>>> 7 files changed, 12 insertions(+), 55 deletions(-) > > >>>>> diff --git a/NEWS b/NEWS > > >>>>> index 645acea1f..bd63b187b 100644 > > >>>>> --- a/NEWS > > >>>>> +++ b/NEWS > > >>>>> @@ -1,5 +1,10 @@ > > >>>>> Post v23.06.0 > > >>>>> ------------- > > >>>>> + - Disable OpenFlow inactivity probing between ovn-controller and > OVS. > > >>>>> + OF connection is established over unix socket, which is a > reliable > > >>>>> + connection method and doesn't require additional probing. > > >>>>> + external_ids:ovn-openflow-probe-interval configuration option > for > > >>>>> + ovn-controller is no longer matter. > > nit: s/is no longer matter/is removed > > Acked-by: Han Zhou <hz...@ovn.org> > > (waiting for Mark's confirm about this concerns)
OK. I went ahead and applied this patch to the main. Since there were no comments from Mark I assume he is fine with the patch. Numan > > > >>>>> OVN v23.06.0 - 01 Jun 2023 > > >>>>> -------------------------- > > >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > >>>>> index 64a444ff6..788634494 100644 > > >>>>> --- a/controller/ofctrl.c > > >>>>> +++ b/controller/ofctrl.c > > >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct > ofp_header *, enum ofptype); > > >>>>> void > > >>>>> ofctrl_init(struct ovn_extend_table *group_table, > > >>>>> - struct ovn_extend_table *meter_table, > > >>>>> - int inactivity_probe_interval) > > >>>>> + struct ovn_extend_table *meter_table) > > >>>>> { > > >>>>> - swconn = rconn_create(inactivity_probe_interval, 0, > > >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); > > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > > >>>>> tx_counter = rconn_packet_counter_create(); > > >>>>> hmap_init(&installed_lflows); > > >>>>> hmap_init(&installed_pflows); > > >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) > > >>>>> return rconn_is_connected(swconn); > > >>>>> } > > >>>>> -void > > >>>>> -ofctrl_set_probe_interval(int probe_interval) > > >>>>> -{ > > >>>>> - if (swconn) { > > >>>>> - rconn_set_probe_interval(swconn, probe_interval); > > >>>>> - } > > >>>>> -} > > >>>>> - > > >>>>> void > > >>>>> ofctrl_get_memory_usage(struct simap *usage) > > >>>>> { > > >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h > > >>>>> index 105f9370b..46bfccd85 100644 > > >>>>> --- a/controller/ofctrl.h > > >>>>> +++ b/controller/ofctrl.h > > >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { > > >>>>> /* Interface for OVN main loop. */ > > >>>>> void ofctrl_init(struct ovn_extend_table *group_table, > > >>>>> - struct ovn_extend_table *meter_table, > > >>>>> - int inactivity_probe_interval); > > >>>>> + struct ovn_extend_table *meter_table); > > >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int, > > >>>>> const struct ovsrec_open_vswitch_table *, > > >>>>> struct shash *pending_ct_zones); > > >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct > ovn_desired_flow_table *, > > >>>>> bool ofctrl_is_connected(void); > > >>>>> -void ofctrl_set_probe_interval(int probe_interval); > > >>>>> void ofctrl_get_memory_usage(struct simap *usage); > > >>>>> #endif /* controller/ofctrl.h */ > > >>>>> diff --git a/controller/ovn-controller.8.xml > b/controller/ovn-controller.8.xml > > >>>>> index f61f43008..52eb137d3 100644 > > >>>>> --- a/controller/ovn-controller.8.xml > > >>>>> +++ b/controller/ovn-controller.8.xml > > >>>>> @@ -147,20 +147,6 @@ > > >>>>> </p> > > >>>>> </dd> > > >>>>> - > <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> > > >>>>> - <dd> > > >>>>> - <p> > > >>>>> - The inactivity probe interval of the OpenFlow connection > to the > > >>>>> - OpenvSwitch integration bridge, in seconds. > > >>>>> - If the value is zero, it disables the connection > keepalive feature. > > >>>>> - </p> > > >>>>> - > > >>>>> - <p> > > >>>>> - If the value is nonzero, then it will be forced to a > value of > > >>>>> - at least 5s. > > >>>>> - </p> > > >>>>> - </dd> > > >>>>> - > > >>>>> <dt><code>external_ids:ovn-encap-type</code></dt> > > >>>>> <dd> > > >>>>> <p> > > >>>>> diff --git a/controller/ovn-controller.c > b/controller/ovn-controller.c > > >>>>> index 3a81a13fb..732e7a690 100644 > > >>>>> --- a/controller/ovn-controller.c > > >>>>> +++ b/controller/ovn-controller.c > > >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func > debug_ignore_startup_delay; > > >>>>> #define DEFAULT_BRIDGE_NAME "br-int" > > >>>>> #define DEFAULT_DATAPATH "system" > > >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 > > >>>>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" > > >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" > > >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct > ovsrec_ssl_table *ssl_table) > > >>>>> } > > >>>>> } > > >>>>> -static int > > >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) > > >>>>> -{ > > >>>>> - const struct ovsrec_open_vswitch *cfg = > ovsrec_open_vswitch_first(ovs_idl); > > >>>>> - if (!cfg) { > > >>>>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; > > >>>>> - } > > >>>>> - > > >>>>> - const struct ovsrec_open_vswitch_table *ovs_table = > > >>>>> - ovsrec_open_vswitch_table_get(ovs_idl); > > >>>>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); > > >>>>> - return get_chassis_external_id_value_int( > > >>>>> - &cfg->external_ids, chassis_id, > > >>>>> - "ovn-openflow-probe-interval", > OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); > > >>>>> -} > > >>>>> - > > >>>>> /* Retrieves the pointer to the OVN Southbound database from > 'ovs_idl' and > > >>>>> * updates 'sbdb_idl' with that pointer. */ > > >>>>> static void > > >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) > > >>>>> engine_get_internal_data(&en_lb_data); > > >>>>> ofctrl_init(&lflow_output_data->group_table, > > >>>>> - &lflow_output_data->meter_table, > > >>>>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); > > >>>>> + &lflow_output_data->meter_table); > > >>>>> ofctrl_seqno_init(); > > >>>>> unixctl_command_register("group-table-list", "", 0, 0, > > >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) > > >>>>> &reset_ovnsb_idl_min_index, > > >>>>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); > > >>>>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); > > >>>>> - > ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); > > >>>>> struct ovsdb_idl_txn *ovnsb_idl_txn > > >>>>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); > > >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c > > >>>>> index c396ad4c2..a86db3f32 100644 > > >>>>> --- a/controller/pinctrl.c > > >>>>> +++ b/controller/pinctrl.c > > >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) > > >>>>> static long long int svc_monitors_next_run_time = LLONG_MAX; > > >>>>> static long long int send_prefixd_time = LLONG_MAX; > > >>>>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > > >>>>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { > > >>>>> long long int bfd_time = LLONG_MAX; > > >>>>> diff --git a/lib/features.c b/lib/features.c > > >>>>> index 97c9c86f0..ad3357570 100644 > > >>>>> --- a/lib/features.c > > >>>>> +++ b/lib/features.c > > >>>>> @@ -28,11 +28,10 @@ > > >>>>> #include "openvswitch/ofp-meter.h" > > >>>>> #include "openvswitch/ofp-util.h" > > >>>>> #include "ovn/features.h" > > >>>>> +#include "controller/ofctrl.h" > > >>>>> VLOG_DEFINE_THIS_MODULE(features); > > >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 > > >>>>> - > > >>>>> struct ovs_feature { > > >>>>> enum ovs_feature_value value; > > >>>>> const char *name; > > >>>>> @@ -82,8 +81,7 @@ static void > > >>>>> ovs_feature_rconn_setup(const char *br_name) > > >>>>> { > > >>>>> if (!swconn) { > > >>>>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, > 0, > > >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); > > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << > OFP15_VERSION); > > >>>>> } > > >>>>> if (!rconn_is_connected(swconn)) { > > >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) > > >>>>> } > > >>>>> free(target); > > >>>>> } > > >>>>> - rconn_set_probe_interval(swconn, > FEATURES_DEFAULT_PROBE_INTERVAL_SEC); > > >>>>> } > > >>>>> static bool > > >>>> > > >>> _______________________________________________ > > >>> dev mailing list > > >>> d...@openvswitch.org > > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > Regards, > > Vladislav Odintsov > > > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev