Hi Dumitru, if eventually this patch got merged, please remove next lines from its commit message:
"Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will be done in the next patch." Unfortunately I was unsuccessful in removing these probing direction (default 60s). Maybe you can give any hint how to solve this [1]. 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html > On 11 Sep 2023, at 15:57, Dumitru Ceara <dce...@redhat.com> wrote: > > On 9/11/23 14:54, Dumitru Ceara wrote: >> On 6/14/23 20:02, Vladislav Odintsov wrote: >>> Hi Mark, >>> >>> thanks for taking time to look on this! >>> >>> Your point with a hung OVS is really an interesting case. >>> >>> From one hand it’s a possible situation, and from another I guess it’s much >>> higher probability for OVS to be busy by some other work rather than to >>> hang in a loop. In my installation the first happening sometimes (OVS >>> business by real work). The reasons can be different but I can’t say that >>> it’s better to drop OF connection in such a moment (possibly triggering ha >>> chassis-redirect ports failover). >>> >>> At the same time if we’re talking about L3GW node, which has a gateway >>> chassis redirect port, the hung OVS should be detected by other nodes with >>> BFD probes (if it’s a ha chassis-redirect port). >>> And if it’s a normal hypervisor (with just a vif ports), so what can we do >>> with it? I guess if we drop OF connection, it won’t make any positive >>> changes. Maybe just release memory needed for handling this connection and >>> not allocate buffers…? Unfortunately I can’t evaluate this. >>> >>> Moreover, the OVN default is a disabled probing. What can be a possible >>> recommended values to set probes if to leave it as is? How should users >>> find out that they have to configure this? Currently in docs there is only >>> information that this option configures probe interval. But it’s not >>> obvious when to configure it and which value to set. >>> >> >> In support of this, quoting the original commit that added the probe >> config knob: >> >> "The main intention of this patch is to fix this recompuation issue for >> older versions (there by requesting backport), it still would be >> beneficial with the incremental processing engine." >> >> That was back in 2019: >> >> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 >> >> Since then lots of performance improvements landed in OVN >> (ovn-controller too), we're probably fine. Checking >> ovn-org/ovn-kubernetes code I see they were also setting >> ovn-openflow-probe-interval "just in case" so they probably won't be >> affected by it getting removed: >> >> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e >> >>> I hope this makes sense. >>> >> >> The patch in its current form looks OK to me, I'm OK to merge it if Mark >> doesn't have anything against it; therefore: >> >> Acked-by: Dumitru Ceara <dce...@redhat.com> >> > > I forgot to mention this earlier but, since this patch was posted way > before soft freeze and since the changes seem contained, it's probably > fair to apply this to the 23.09 branch too. What do you think, Mark? > > Thanks! > >> Regards, >> Dumitru >> >>> Also, which Han’s patch to OVS you’re talking about? I looked through open >>> OVS patches and haven’t seen any. Please point me out. >>> >>> regards, >>> Vladislav Odintsov >>> >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmich...@redhat.com> wrote: >>>> >>>> Hi, >>>> >>>> Thanks for linking the email thread, since I had the exact same concern >>>> that Numan did about detecting if ovs-vswitchd goes down. It sounds like >>>> because of the nature of unix sockets, this disconnection is still >>>> detected by OVN and failover can occur as expected. >>>> >>>> But what about if ovs-vswitchd is still running but in a "compromised" >>>> state. For instance, what if a coding error in ovs-vswitchd results in it >>>> running an infinite loop? In such a case, the unix connection would still >>>> be alive, but OVN would not be able to communicate with the ovs-vswitchd >>>> process. >>>> >>>> Does this situation eventually lead to OVN disconnecting anyway after it >>>> fills up the unix socket's buffer? If so, how long does that take? Or does >>>> it lead to OVN slowly growing in memory usage while it waits forever to be >>>> able to write to the socket? >>>> >>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c >>>> and features.c is a good idea. But I think we should leave the >>>> configurable probe interval used by ofctrl.c for the case I described >>>> before. Since Han's patch to OVS should disable probe intervals from the >>>> OVS side by default, we would not trigger bad behavior simply because OVN >>>> has a lot of work to do during a recompute. What do you think? >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote: >>>>> OpenFlow connection is established over unix socket, which is a reliable >>>>> connection and doesn't require additional probing. >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd >>>>> direction is disabled for all three connections: >>>>> - OF flows management connection, >>>>> - OF features negotiation connection, >>>>> - pinctrl connection. >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >>>>> non-needed anymore. >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >>>>> be done in the next patch. >>>>> Reported-at: >>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com> >>>>> --- >>>>> NEWS | 5 +++++ >>>>> controller/ofctrl.c | 14 ++------------ >>>>> controller/ofctrl.h | 4 +--- >>>>> controller/ovn-controller.8.xml | 14 -------------- >>>>> controller/ovn-controller.c | 21 +-------------------- >>>>> controller/pinctrl.c | 2 +- >>>>> lib/features.c | 7 ++----- >>>>> 7 files changed, 12 insertions(+), 55 deletions(-) >>>>> diff --git a/NEWS b/NEWS >>>>> index 645acea1f..bd63b187b 100644 >>>>> --- a/NEWS >>>>> +++ b/NEWS >>>>> @@ -1,5 +1,10 @@ >>>>> Post v23.06.0 >>>>> ------------- >>>>> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >>>>> + OF connection is established over unix socket, which is a reliable >>>>> + connection method and doesn't require additional probing. >>>>> + external_ids:ovn-openflow-probe-interval configuration option for >>>>> + ovn-controller is no longer matter. >>>>> OVN v23.06.0 - 01 Jun 2023 >>>>> -------------------------- >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >>>>> index 64a444ff6..788634494 100644 >>>>> --- a/controller/ofctrl.c >>>>> +++ b/controller/ofctrl.c >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, >>>>> enum ofptype); >>>>> void >>>>> ofctrl_init(struct ovn_extend_table *group_table, >>>>> - struct ovn_extend_table *meter_table, >>>>> - int inactivity_probe_interval) >>>>> + struct ovn_extend_table *meter_table) >>>>> { >>>>> - swconn = rconn_create(inactivity_probe_interval, 0, >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> tx_counter = rconn_packet_counter_create(); >>>>> hmap_init(&installed_lflows); >>>>> hmap_init(&installed_pflows); >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >>>>> return rconn_is_connected(swconn); >>>>> } >>>>> -void >>>>> -ofctrl_set_probe_interval(int probe_interval) >>>>> -{ >>>>> - if (swconn) { >>>>> - rconn_set_probe_interval(swconn, probe_interval); >>>>> - } >>>>> -} >>>>> - >>>>> void >>>>> ofctrl_get_memory_usage(struct simap *usage) >>>>> { >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >>>>> index 105f9370b..46bfccd85 100644 >>>>> --- a/controller/ofctrl.h >>>>> +++ b/controller/ofctrl.h >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >>>>> /* Interface for OVN main loop. */ >>>>> void ofctrl_init(struct ovn_extend_table *group_table, >>>>> - struct ovn_extend_table *meter_table, >>>>> - int inactivity_probe_interval); >>>>> + struct ovn_extend_table *meter_table); >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int, >>>>> const struct ovsrec_open_vswitch_table *, >>>>> struct shash *pending_ct_zones); >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct >>>>> ovn_desired_flow_table *, >>>>> bool ofctrl_is_connected(void); >>>>> -void ofctrl_set_probe_interval(int probe_interval); >>>>> void ofctrl_get_memory_usage(struct simap *usage); >>>>> #endif /* controller/ofctrl.h */ >>>>> diff --git a/controller/ovn-controller.8.xml >>>>> b/controller/ovn-controller.8.xml >>>>> index f61f43008..52eb137d3 100644 >>>>> --- a/controller/ovn-controller.8.xml >>>>> +++ b/controller/ovn-controller.8.xml >>>>> @@ -147,20 +147,6 @@ >>>>> </p> >>>>> </dd> >>>>> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >>>>> - <dd> >>>>> - <p> >>>>> - The inactivity probe interval of the OpenFlow connection to the >>>>> - OpenvSwitch integration bridge, in seconds. >>>>> - If the value is zero, it disables the connection keepalive >>>>> feature. >>>>> - </p> >>>>> - >>>>> - <p> >>>>> - If the value is nonzero, then it will be forced to a value of >>>>> - at least 5s. >>>>> - </p> >>>>> - </dd> >>>>> - >>>>> <dt><code>external_ids:ovn-encap-type</code></dt> >>>>> <dd> >>>>> <p> >>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>>>> index 3a81a13fb..732e7a690 100644 >>>>> --- a/controller/ovn-controller.c >>>>> +++ b/controller/ovn-controller.c >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >>>>> #define DEFAULT_BRIDGE_NAME "br-int" >>>>> #define DEFAULT_DATAPATH "system" >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >>>>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table >>>>> *ssl_table) >>>>> } >>>>> } >>>>> -static int >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >>>>> -{ >>>>> - const struct ovsrec_open_vswitch *cfg = >>>>> ovsrec_open_vswitch_first(ovs_idl); >>>>> - if (!cfg) { >>>>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >>>>> - } >>>>> - >>>>> - const struct ovsrec_open_vswitch_table *ovs_table = >>>>> - ovsrec_open_vswitch_table_get(ovs_idl); >>>>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>> - return get_chassis_external_id_value_int( >>>>> - &cfg->external_ids, chassis_id, >>>>> - "ovn-openflow-probe-interval", >>>>> OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >>>>> -} >>>>> - >>>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >>>>> * updates 'sbdb_idl' with that pointer. */ >>>>> static void >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >>>>> engine_get_internal_data(&en_lb_data); >>>>> ofctrl_init(&lflow_output_data->group_table, >>>>> - &lflow_output_data->meter_table, >>>>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>>> + &lflow_output_data->meter_table); >>>>> ofctrl_seqno_init(); >>>>> unixctl_command_register("group-table-list", "", 0, 0, >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >>>>> &reset_ovnsb_idl_min_index, >>>>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >>>>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >>>>> - >>>>> ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>>> struct ovsdb_idl_txn *ovnsb_idl_txn >>>>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>>>> index c396ad4c2..a86db3f32 100644 >>>>> --- a/controller/pinctrl.c >>>>> +++ b/controller/pinctrl.c >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >>>>> static long long int svc_monitors_next_run_time = LLONG_MAX; >>>>> static long long int send_prefixd_time = LLONG_MAX; >>>>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >>>>> long long int bfd_time = LLONG_MAX; >>>>> diff --git a/lib/features.c b/lib/features.c >>>>> index 97c9c86f0..ad3357570 100644 >>>>> --- a/lib/features.c >>>>> +++ b/lib/features.c >>>>> @@ -28,11 +28,10 @@ >>>>> #include "openvswitch/ofp-meter.h" >>>>> #include "openvswitch/ofp-util.h" >>>>> #include "ovn/features.h" >>>>> +#include "controller/ofctrl.h" >>>>> VLOG_DEFINE_THIS_MODULE(features); >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >>>>> - >>>>> struct ovs_feature { >>>>> enum ovs_feature_value value; >>>>> const char *name; >>>>> @@ -82,8 +81,7 @@ static void >>>>> ovs_feature_rconn_setup(const char *br_name) >>>>> { >>>>> if (!swconn) { >>>>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> } >>>>> if (!rconn_is_connected(swconn)) { >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >>>>> } >>>>> free(target); >>>>> } >>>>> - rconn_set_probe_interval(swconn, >>>>> FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >>>>> } >>>>> static bool >>>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev